The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

Fuzzy-Classification Assisted Solution Preselection in Evolutionary Optimization

Aimin Zhou,1 Jinyuan Zhang,1 Jianyong Sun,2 Guixu Zhang1
1East China Normal University, 3663 North Zhongshan Road, Shanghai, 200062, China
2Xi'an Jiaotong University, No. 28, Xianning West Road, Xi'an, Shaanxi, 710049, China
{amzhou, gxzhang}@cs.ecnu.edu.cn, jyzhang@stu.ecnu.edu.cn, jy.sun@xjtu.edu.cn

Abstract

In evolutionary optimization, preselection is an efficient operator for improving search efficiency: it aims to filter out unpromising candidate solutions before fitness evaluation. Most existing preselection operators rely on fitness values, surrogate models, or classification models. Classification based preselection regards preselection as a classification procedure, i.e., differentiating promising from unpromising candidate solutions. However, the difference between the promising and unpromising classes becomes fuzzy as the run proceeds, since all remaining solutions are likely to be promising. Facing this challenge, this paper proposes a fuzzy classification based preselection (FCPS) scheme, which uses a membership function to measure the quality of candidate solutions. The proposed FCPS scheme is applied to two state-of-the-art evolutionary algorithms on a test suite. The experimental results show the potential of FCPS to improve algorithm performance.

Introduction

Sophisticated optimization problems are faced in a variety of real-world applications and arise in areas ranging from science to engineering (Polak 1997). Generally, an optimization problem can be formulated as

$$\min_{x \in \Omega} f(x) \qquad (1)$$

where $x = (x_1, \ldots, x_n)$ is a decision variable vector, $\Omega = \prod_{i=1}^{n} [a_i, b_i]$ defines the feasible region of the search space, and $f: \mathbb{R}^n \to \mathbb{R}$ denotes the objective function.

When optimization problems are friendly, i.e., the objective and/or constraints are linear or convex, linear programming or convex optimization methods are of great interest (Polak 1997). However, real-world applications are usually more complicated: the objective and/or constraints might be nonlinear, non-convex, or even lack closed forms (Szu 1986; Yuille and Rangarajan 2002). For such problems, traditional optimization methods may become inefficient or even fail to work. To this end, derivative-free or heuristic optimization methods, which do not make strong assumptions about the problems to be optimized, have attracted much attention (Wang et al. 2013; Mallipeddi and Suganthan 2014; Qian, Yu, and Zhou 2015). Among them, evolutionary algorithms (EAs) (Bäck, Fogel, and Michalewicz 1997) are a promising family, which includes genetic algorithms (GAs), particle swarm optimization (PSO), differential evolution (DE) (Das and Suganthan 2011), and estimation of distribution algorithms (EDAs) (Larrañaga and Lozano 2001), to name a few. Evolutionary optimization is a population based iterative heuristic paradigm that uses a population of solutions to approximate the optimum of (1). As fitness-driven methods, EAs do not use (approximated) derivatives to generate candidate solutions.
Instead, they sample candidate solutions from the current solutions, evaluate the candidates, and select promising ones into the next generation. Therefore, selection operators play a vital role in EAs. Generally, there are three types of selection: the mating selection, the preselection, and the environmental selection. The mating selection chooses parent solutions for sampling candidate solutions. The environmental selection chooses promising solutions for the next generation. Preselection may have different meanings (Mahfoud 1992); in this paper it refers to selecting promising solutions from a set of candidate solutions.

In evolutionary optimization, the preselection operator can improve algorithm performance significantly if it is used correctly (Emmerich et al. 2002; Jin 2011): unpromising candidate solutions can be discarded before fitness evaluation, and computational resources can thus be saved. A key issue in preselection is how to measure the quality of candidate solutions, i.e., how to judge whether a candidate solution is promising or not. According to the quality measurement, preselection operators can be roughly classified into three categories: (1) fitness based approaches (Wang, Cai, and Zhang 2011; Li, Zhou, and Zhang 2014), (2) surrogate model based approaches (Jin 2003; 2011; Chen, Xie, and Zou 2013), and (3) classification based approaches (Lu, Tang, and Yao 2011; Lu and Tang 2012; Yu, Qian, and Hu 2016; Hu, Qian, and Yu 2017; Zhang, Zhou, and Zhang 2018; Wang, Qian, and Yu 2018).

Classification based preselection (CPS) (Zhang, Zhou, and Zhang 2018) regards the selection as a classification procedure, where the chosen candidate solutions belong to the promising class and the discarded ones to the unpromising class. In this approach, the visited solutions are used to build a classification model that predicts the quality, i.e., promising or unpromising, of the candidate solutions rather than their fitness values, as is done by the fitness and surrogate model based approaches, and the promising candidates are then chosen. This approach is therefore arguably more natural in evolutionary optimization. Since it uses a binary classification model (Zhang, Zhou, and Zhang 2018), CPS, which is actually a binary-class classification based preselection (BCPS), needs a training data set with two classes. However, after several generations of an EA, all remaining solutions in the current population are likely to be promising, and the gap between the promising and the unpromising solutions becomes unclear and fuzzy. In the classification community, there exists a class of so-called fuzzy classification methods (Keller, Gray, and Givens 1985; Derrac, Garcia, and Herrera 2014), which judge data quality with a membership function (Klir and Yuan 1995) when the features of the data are fuzzy. Applying fuzzy classification models to assist the preselection procedure might therefore be more suitable than applying binary classification models.
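The practical difference can be seen in a toy illustration (the numbers below are made up for exposition and are not taken from the paper): a crisp classifier may assign all candidates of one parent to the promising class, giving preselection no way to rank them, whereas fuzzy membership degrees still induce an ordering.

```python
# Illustrative only: M = 4 candidates sampled for one parent solution.
crisp_labels = [+1, +1, +1, +1]    # BCPS: a four-way tie, no ranking possible
fuzzy_memberships = [              # FCPS: (promising, unpromising) degrees
    (0.91, 0.09), (0.64, 0.36), (0.88, 0.12), (0.71, 0.29),
]
# FCPS picks the candidate with maximal membership to the promising class.
best = max(range(len(fuzzy_memberships)), key=lambda i: fuzzy_memberships[i][0])
print(best)  # 0
```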
Following this idea, this paper proposes a fuzzy classification based preselection (FCPS) strategy for evolutionary optimization. In an FCPS assisted EA, a fuzzy classification model is first built from the current solutions. Then, for each parent solution, a set of candidate solutions is sampled by the variation operator. Finally, for each parent solution, the candidate solution with the maximal fuzzy membership degree to the promising class is chosen for fitness evaluation.

The major contributions are as follows: (1) the fuzzy classification model is introduced to assist preselection in evolutionary optimization, named FCPS; (2) the FCPS strategy is applied to two state-of-the-art EAs, and the experimental studies demonstrate its advantages.

The rest of the paper is organized as follows. The next section presents the basic idea of the FCPS strategy and introduces the FCPS assisted EA framework. After that, CPS assisted EAs are empirically studied on a set of test instances (Yao, Liu, and Lin 1999). Finally, the paper is concluded with some remarks on future work.

Fuzzy Classification based Preselection

This section introduces the three main components of FCPS, i.e., training set definition, model building, and candidate solution labeling and selection, and then presents a general EA framework with FCPS.

Training Set Definition

In the current population, each solution x has an objective value (fitness) f(x). In classification, each training sample, i.e., a solution in our case, needs a label l. Therefore, each solution x must be assigned a label l ∈ {+1, −1}, where +1 denotes a promising sample and −1 an unpromising one. In this way, the current population is partitioned into two training classes. This paper uses the following training set definition strategies (Zhang, Zhou, and Zhang 2018), which are also illustrated in Figure 1.

Figure 1: Training set definition strategies.

- Mean fitness separation (MS): the solutions with objective values less than the mean fitness value are assigned label +1, and the others are assigned label −1.
- First quartile fitness separation (Q1S): the best 25% of the solutions are assigned label +1, and the others label −1.
- Median fitness separation (Q2S): the best 50% of the solutions are assigned label +1, and the others label −1.
- Third quartile fitness separation (Q3S): the best 75% of the solutions are assigned label +1, and the others label −1.

Model Building

Fuzzy classification (Derrac, Garcia, and Herrera 2014) is a fuzzy rule-based classification approach. In real-world applications, many concepts cannot be described precisely, and in some situations the relevant concepts need not be described exactly. Based on such observations, Zadeh proposed fuzzy mathematics on the basis of fuzzy set theory in 1965 (Zadeh 1965; Zadeh, Klir, and Yuan 1996). As one of the popular methods in the fuzzy mathematics field, fuzzy classification is often applied in a fuzzy environment, using precise methods to model and classify imprecise and vague data. As in many other fuzzy mathematics methods, a fuzzy membership function (Klir and Yuan 1995) is used in fuzzy classification to facilitate the classification process in a fuzzy manner. The membership function is defined by the truth value of the fuzzy function (Klir and Yuan 1995) and indicates the degree to which the data to be classified belong to a certain category. According to the fuzzy membership degrees, fuzzy classification groups the data with the same characteristics into fuzzy sets. In other words, the determination of the membership function determines the result of the fuzzy classification. In addition, the membership function is specific to each practical problem and has to be constructed for the actual problem at hand.
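To make the membership function concrete, the following is a minimal sketch of a fuzzy KNN classifier in the style of Keller, Gray, and Givens (1985), where a query point's membership in each class is an inverse-distance-weighted average of its neighbors' (here crisp) memberships. The paper's FKNN follows Derrac, Garcia, and Herrera (2014) and replaces distances with fuzzy similarities, so treat this as an illustration rather than the exact model used:

```python
import numpy as np

class FuzzyKNN:
    """Keller-style fuzzy KNN for two classes (+1 promising, -1 unpromising)."""

    def __init__(self, k=5, m=2.0):
        self.k = k  # number of neighbors
        self.m = m  # fuzzifier controlling how fast a neighbor's influence decays

    def fit(self, X, labels):
        self.X = np.asarray(X, dtype=float)
        labels = np.asarray(labels)
        # Crisp training memberships: column 0 = promising, column 1 = unpromising.
        self.U = np.stack([(labels == +1).astype(float),
                           (labels == -1).astype(float)], axis=1)
        return self

    def predict_membership(self, x):
        """Return (membership to promising, membership to unpromising) for x."""
        d = np.linalg.norm(self.X - np.asarray(x, dtype=float), axis=1) + 1e-12
        nn = np.argsort(d)[: self.k]
        w = 1.0 / d[nn] ** (2.0 / (self.m - 1.0))  # inverse-distance weights
        return (self.U[nn] * w[:, None]).sum(axis=0) / w.sum()
```

When all k neighbors share one crisp label, the membership degenerates to (1, 0) or (0, 1); mixed neighborhoods yield graded degrees, which is exactly what FCPS uses to rank otherwise tied candidates.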
Let m = Fclass(x) denote a fuzzy classification model for a binary classification problem, where x is a feature vector and m = (m1, m2) is the vector of membership degrees of x to the two classes. Some commonly used fuzzy classification models are based on neural networks, K nearest neighbors (KNN), etc. (Keller, Gray, and Givens 1985; Derrac, Garcia, and Herrera 2014). This paper uses the fuzzy KNN (FKNN), which replaces the distance in the original KNN with a fuzzy similarity (Derrac, Garcia, and Herrera 2014).

Candidate Solution Labeling and Selection

In FCPS, for each parent solution x, M candidate solutions Y = {y1, y2, ..., yM} are sampled. The candidate solution with the maximal membership degree to the promising class is chosen.

FCPS Assisted EA

A general FCPS assisted EA is shown in Algorithm 1.

Algorithm 1: FCPS-EA Framework
   // Initialization
1  Initialize the population P = {x1, x2, ..., xN};
   // Main loop
2  while the termination condition is not satisfied do
     // Training set definition
3    Assign each individual x ∈ P a label l ∈ {+1, −1};
     // Model building
4    Train a fuzzy classifier model m = Fclass(x) based on the data set {<x, l> | x ∈ P};
5    foreach x ∈ P do
       // Candidate solution generation
6      Sample candidate solutions Y = {y1, ..., yM};
       // Candidate solution labeling and selection
7      Predict the membership of each y ∈ Y by m = Fclass(y);
8      Let V ⊆ Y contain the candidate solutions with maximal membership degree to the promising class;
9      Randomly choose a y ∈ V as the offspring of x;
       // Environmental selection
10     if f(y) < f(x) then
11       Set x = y;

In each generation, FCPS-EA maintains a set of N solutions P = {x1, x2, ..., xN} and their objective values f(x1), f(x2), ..., f(xN). Some components of Algorithm 1 are explained as follows.

Initialization: In Line 1, N solutions are uniformly and randomly sampled from Ω to initialize the population P.

Stopping condition: In Line 2, the algorithm stops when the number of function evaluations exceeds the given maximum number FES.

Training set definition: In Line 3, each solution in the current population is assigned a label l ∈ {+1, −1}, where l = +1 denotes a promising solution and l = −1 an unpromising one.

Model building: In Line 4, a fuzzy classifier is trained on the labeled samples.

Candidate solution generation: In Line 6, a set of M candidate solutions is sampled with a reproduction operator.

Candidate solution labeling and selection: In Lines 7-9, the fuzzy membership degrees of the candidate solutions to the promising class are computed by the membership function of the fuzzy classifier, the candidates with maximal membership degree are picked out, and one of them is finally selected at random as the offspring solution.

Environmental selection: In Lines 10-11, the better of the current and candidate solutions is selected into the next generation according to the objective function values.
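A compact Python rendering of Algorithm 1 may help to fix ideas. It reuses the FuzzyKNN class sketched earlier, labels the population by mean fitness separation (the MS strategy defined above), and plugs in a simple DE/rand/1-style variation operator as a stand-in reproduction operator; the paper instantiates FCPS inside CoDE and EDA/LS, so everything below is an illustrative sketch rather than the authors' implementation:

```python
import numpy as np

# Assumes the FuzzyKNN class from the previous sketch is in scope.

def fcps_ea(f, lo, hi, N=30, M=4, max_fes=300_000, seed=0):
    """Minimal sketch of Algorithm 1 (FCPS-EA), minimizing f over the box [lo, hi]."""
    rng = np.random.default_rng(seed)
    P = rng.uniform(lo, hi, size=(N, len(lo)))     # Line 1: initialization
    F = np.array([f(x) for x in P])
    fes = N
    while fes + N <= max_fes:                      # Line 2: stopping condition
        labels = np.where(F < F.mean(), +1, -1)    # Line 3: MS training set
        model = FuzzyKNN(k=5).fit(P, labels)       # Line 4: model building
        for i in range(N):                         # Line 5
            # Line 6: M candidates from a DE/rand/1-style stand-in operator.
            cands = []
            for _ in range(M):
                a, b, c = P[rng.choice(N, size=3, replace=False)]
                cands.append(np.clip(a + 0.5 * (b - c), lo, hi))
            # Lines 7-9: keep a candidate with maximal "promising" membership
            # (ties broken by max() here; Algorithm 1 breaks them randomly).
            y = max(cands, key=lambda c: model.predict_membership(c)[0])
            fy = f(y)                              # only the winner is evaluated
            fes += 1
            if fy < F[i]:                          # Lines 10-11: environmental selection
                P[i], F[i] = y, fy
    return P[np.argmin(F)], float(F.min())
```

For instance, `fcps_ea(lambda x: float(np.sum(x ** 2)), np.full(30, -100.0), np.full(30, 100.0), max_fes=30_000)` runs the sketch on a 30-dimensional sphere function, similar to f1 of the YLL suite used below.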
Experimental Study

Algorithms in Study

The proposed FCPS strategy is integrated into two state-of-the-art evolutionary optimization algorithms, i.e., the composite differential evolution (CoDE) (Wang, Cai, and Zhang 2011) and the hybrid estimation of distribution algorithm with cheap and expensive local search (EDA/LS) (Zhou, Sun, and Zhang 2015), to study its performance.

CoDE: CoDE (Wang, Cai, and Zhang 2011) is a multi-operator based DE. It utilizes three mutation strategies, i.e., DE/rand/1/bin, DE/rand/2/bin, and DE/current-to-rand/1, along with three groups of control parameters for candidate solution generation. In each generation, the three combinations of mutation strategy and parameters are used to generate three candidate solutions for each parent, and the one with the best fitness value is chosen as the actual offspring solution. For details, see (Wang, Cai, and Zhang 2011).

EDA/LS: EDA/LS (Zhou, Sun, and Zhang 2015) is a hybrid EDA that employs both cheap and expensive local search methods. Its basic idea is to use both global statistical information and individual location information for solution generation. For details, see (Zhou, Sun, and Zhang 2015).

Table 1: Median, mean, and standard deviation of the results obtained by CoDE, EDA/LS, and their variants on f1-f13 after 300,000 FES over 30 independent runs. Ranks are given in brackets; '+', '−', and '≈' mark the Wilcoxon rank sum outcome versus the FCPS variant.

      FCPS-CoDE                                  BCPS-CoDE                                    CoDE
      median / mean / std                        median / mean / std                          median / mean / std
f1    1.86e-145[1]  1.30e-143[1]   3.04e-143  |  5.69e-104[2]  1.37e-102[2](-)  4.19e-102  |  2.02e-67[3]   4.88e-67[3](-)   7.01e-67
f2    4.83e-77[1]   1.64e-76[1]    2.55e-76   |  2.41e-53[2]   3.93e-53[2](-)   5.47e-53   |  1.44e-35[3]   2.53e-35[3](-)   4.15e-35
f3    9.84e-36[1]   5.84e-34[1]    1.76e-33   |  9.16e-26[2]   9.14e-25[2](-)   2.03e-24   |  7.13e-17[3]   9.63e-16[3](-)   4.53e-15
f4    8.86e-29[1]   4.53e-27[1]    1.43e-26   |  2.51e-23[2]   1.51e-22[2](-)   4.25e-22   |  5.37e-16[3]   8.89e-16[3](-)   9.94e-16
f5    8.69e-06[3]   2.66e-01[3]    1.01e+00   |  7.83e-24[1]   2.39e-01[2](+)   9.56e-01   |  5.54e-13[2]   7.97e-02[1](+)   5.64e-01
f6    0.00e+00[1]   0.00e+00[1]    0.00e+00   |  0.00e+00[1]   0.00e+00[1](≈)   0.00e+00   |  0.00e+00[1]   0.00e+00[1](≈)   0.00e+00
f7    1.62e-03[1]   1.81e-03[1]    7.43e-04   |  1.97e-03[2]   2.14e-03[2](-)   7.93e-04   |  2.94e-03[3]   2.84e-03[3](-)   9.36e-04
f8    0.00e+00[1]   6.71e+01[3]    1.06e+02   |  0.00e+00[1]   4.74e+00[2](≈)   2.34e+01   |  0.00e+00[1]   2.37e+00[1](≈)   1.67e+01
f9    1.99e+00[3]   2.22e+00[3]    1.40e+00   |  0.00e+00[1]   5.97e-01[2](+)   7.52e-01   |  0.00e+00[1]   0.00e+00[1](+)   0.00e+00
f10   4.44e-15[1]   4.44e-15[1]    0.00e+00   |  4.44e-15[1]   4.44e-15[1](≈)   0.00e+00   |  4.44e-15[1]   4.44e-15[1](≈)   0.00e+00
f11   0.00e+00[1]   1.31e-03[3]    4.79e-03   |  0.00e+00[1]   7.39e-04[2](≈)   2.69e-03   |  0.00e+00[1]   1.97e-04[1](≈)   1.39e-03
f12   1.57e-32[1]   1.57e-32[1]    5.53e-48   |  1.57e-32[1]   1.57e-32[1](≈)   5.53e-48   |  1.57e-32[1]   2.07e-03[3](≈)   1.47e-02
f13   1.35e-32[1]   3.66e-04[3]    2.01e-03   |  1.35e-32[1]   2.20e-04[2](≈)   1.55e-03   |  1.35e-32[1]   1.35e-32[1](≈)   1.11e-47
+/-/≈                                         |                2/5/6                       |                2/5/6

      FCPS-EDA/LS                                BCPS-EDA/LS                                  EDA/LS
      median / mean / std                        median / mean / std                          median / mean / std
f1    3.09e-153[1]  5.57e-153[1]   6.59e-153  |  2.26e-140[2]  3.08e-139[2](-)  1.90e-138  |  3.59e-130[3]  5.63e-130[3](-)  7.39e-130
f2    5.47e-81[1]   6.48e-81[1]    4.87e-81   |  5.58e-70[2]   7.65e-70[2](-)   6.72e-70   |  9.54e-65[3]   1.08e-64[3](-)   7.42e-65
f3    9.25e-38[1]   1.08e-35[1]    3.10e-35   |  1.73e-37[2]   1.17e-34[3](≈)   6.18e-34   |  3.70e-37[3]   1.96e-35[2](≈)   6.41e-35
f4    9.95e-47[1]   1.31e-46[1]    9.62e-47   |  3.44e-45[2]   1.12e-39[3](-)   7.91e-39   |  2.01e-42[3]   3.99e-40[2](-)   2.80e-39
f5    1.94e-29[3]   1.33e-01[3]    7.28e-01   |  1.12e-29[1]   2.27e-29[1](≈)   3.07e-29   |  1.74e-29[2]   2.48e-29[2](≈)   2.97e-29
f6    0.00e+00[1]   0.00e+00[1]    0.00e+00   |  0.00e+00[1]   0.00e+00[1](≈)   0.00e+00   |  0.00e+00[1]   0.00e+00[1](≈)   0.00e+00
f7    2.03e-03[1]   2.13e-03[1]    5.50e-04   |  2.22e-03[3]   2.24e-03[2](-)   5.95e-04   |  2.19e-03[2]   2.27e-03[3](-)   6.18e-04
f8    0.00e+00[1]   0.00e+00[1]    0.00e+00   |  0.00e+00[1]   0.00e+00[1](≈)   0.00e+00   |  0.00e+00[1]   0.00e+00[1](≈)   0.00e+00
f9    0.00e+00[1]   0.00e+00[1]    0.00e+00   |  0.00e+00[1]   1.99e-02[3](≈)   1.41e-01   |  0.00e+00[1]   0.00e+00[1](≈)   0.00e+00
f10   4.44e-15[1]   4.44e-15[1]    0.00e+00   |  4.44e-15[1]   4.44e-15[1](≈)   0.00e+00   |  4.44e-15[1]   4.44e-15[1](≈)   0.00e+00
f11   0.00e+00[1]   3.29e-04[3]    1.80e-03   |  0.00e+00[1]   0.00e+00[1](≈)   0.00e+00   |  0.00e+00[1]   0.00e+00[1](≈)   0.00e+00
f12   1.57e-32[1]   1.57e-32[1]    5.53e-48   |  1.57e-32[1]   1.57e-32[1](≈)   5.53e-48   |  1.57e-32[1]   1.57e-32[1](≈)   5.53e-48
f13   1.35e-32[1]   1.35e-32[1]    1.11e-47   |  1.35e-32[1]   1.35e-32[1](≈)   1.11e-47   |  1.35e-32[1]   1.35e-32[1](≈)   1.11e-47
+/-/≈                                         |                0/4/9                       |                0/4/9

Figure 2: The mean function value of the best individuals obtained by CoDE and its variants versus FES on four instances over 30 independent runs.

Figure 3: Bar plots of the median FES needed by EDA/LS and its variants to reach the same objective values on four instances over 30 independent runs.

Experimental Settings

The first 13 benchmark functions from the YLL test suite (Yao, Liu, and Lin 1999) are employed for the empirical study. Among them, f1-f4 are unimodal; f5 is unimodal for n = 2 and n = 3 and multimodal for n > 3; f6 is a step function; f7 contains white noise; and f8-f13 are multimodal. All test instances have an optimal objective value of 0. For the definitions of these functions, see (Yao, Liu, and Lin 1999). The variable dimension is n = 30 for all instances. The population size is N = 150 for EDA/LS and its variants and N = 30 for CoDE and its variants. The stopping condition is 300,000 fitness evaluations (FES) for all algorithms. Each algorithm is executed on each test instance for 30 independent runs. For the FCPS based approaches, the number of candidate solutions is M = 4. The other control parameters are the same as in the original algorithms (Zhou, Sun, and Zhang 2015; Wang, Cai, and Zhang 2011).

In the experiments, FCPS is also compared with BCPS. As reported in (Zhang, Zhou, and Zhang 2018), the classification and regression tree (CART) based BCPS approach performs best among the binary classification models other than K nearest neighbors (KNN). Thus, to assess the efficiency of the FCPS model, the CART model is employed for BCPS in this paper. The MS strategy is used for training set definition. In the following subsections, CoDE and EDA/LS, their variants with FCPS, i.e., FCPS-CoDE and FCPS-EDA/LS, and their variants with BCPS (Zhang, Zhou, and Zhang 2018), i.e., BCPS-CoDE and BCPS-EDA/LS, are empirically studied. The Wilcoxon rank sum test is used to compare the experimental results: '+', '−', and '≈' in the tables indicate that the value obtained by an algorithm is smaller than, greater than, or statistically similar to that obtained by its FCPS based counterpart at the 95% significance level, and the values in brackets denote the corresponding ranks.
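For readers reproducing these comparisons, the rank sum test and the '+/−/≈' marks can be computed with SciPy; the following is a minimal sketch, where the helper name and the median-based direction check are our own choices rather than details from the paper:

```python
import numpy as np
from scipy.stats import ranksums

def wilcoxon_mark(other_runs, fcps_runs, alpha=0.05):
    """Return '+', '-', or '≈' for an algorithm versus its FCPS counterpart.
    '+' means the algorithm's final values are significantly smaller (better,
    since all test problems are minimized); '≈' means no significant difference."""
    _, p = ranksums(other_runs, fcps_runs)
    if p >= alpha:                 # not significant at the 95% level
        return "≈"
    return "+" if np.median(other_runs) < np.median(fcps_runs) else "-"
```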
Experimental Results

Table 1 summarizes the median, mean, and standard deviation of the results obtained by CoDE, EDA/LS, and their variants on the 13 YLL test instances after 300,000 FES over 30 runs. Table 2 records the rank values obtained by the algorithms. The experimental results are summarized as follows.

CoDE and its variants: The results in Table 1 suggest that on f1-f4 and f7, FCPS-CoDE performs better than CoDE; on f6, f8, and f10-f13, the two algorithms obtain similar results; and on f5 and f9, FCPS-CoDE performs worse than CoDE. On f1-f4 and f7, FCPS-CoDE performs better than BCPS-CoDE; on f5 and f9, BCPS-CoDE outperforms FCPS-CoDE; and on the other six instances, the two algorithms obtain similar results.

EDA/LS and its variants: The results in Table 1 suggest that on f3, f5, f6, and f8-f13, FCPS-EDA/LS, BCPS-EDA/LS, and EDA/LS obtain similar results, and on f1, f2, f4, and f7, FCPS-EDA/LS obtains the best results.

The Wilcoxon rank sum test also shows that on most of the instances, the FCPS variants work no worse than the original algorithms and the corresponding BCPS variants. The rank values in Table 2 further suggest that, according to both the median and the mean values, the FCPS variants of CoDE and EDA/LS always achieve the best mean rank values.

Table 2: Statistical results of the rank values in Table 1. For each algorithm, the columns give the number of instances on which it ranks 1st, 2nd, and 3rd, together with its mean rank, based on the median values (left) and the mean values (right).

               median-based                mean-based
               1    2    3   mean rank     1    2    3   mean rank
FCPS-CoDE     11    0    2     1.31        8    0    5     1.77
BCPS-CoDE      8    5    0     1.38        3   10    0     1.77
CoDE           7    1    5     1.85        7    0    6     1.92
FCPS-EDA/LS   12    0    1     1.15       11    0    2     1.31
BCPS-EDA/LS    8    4    1     1.46        7    3    3     1.69
EDA/LS         7    2    4     1.77        7    3    3     1.69

Figure 2 plots the runtime performance of FCPS-CoDE, BCPS-CoDE, and CoDE on four test instances. It shows that on f1, f7, and f10, FCPS-CoDE converges faster than BCPS-CoDE and CoDE, while on f5 BCPS-CoDE is the fastest. Figure 3 plots the median FES needed by FCPS-EDA/LS, BCPS-EDA/LS, and EDA/LS to reach the same objective values on four representative test instances, f1, f5, f7, and f10. The figure suggests that on all of these instances FCPS-EDA/LS uses the smallest number of FES to reach the given fitness values.

The above results indicate that the FCPS assisted approaches work no worse than the original algorithms and the BCPS based approaches on most of the 13 test instances. This also suggests that FCPS can be regarded as a general strategy for improving the performance of existing EAs.

Influence of Training Set Definition Strategies

This section studies the four training set definition strategies, with CoDE as the basic optimizer. The statistical results for FCPS-CoDE with the four strategies on four test instances after 300,000 FES over 30 independent runs are shown in Figure 4.

Figure 4: The runtime performance of FCPS-CoDE with the four training set definition strategies on four instances over 30 independent runs.

Figure 4 shows that FCPS-CoDE with Q1S, Q2S, and Q3S perform similarly and are worse than FCPS-CoDE with MS. This suggests that the population can be stably partitioned into two classes by the mean objective value of the current population; the reason might be that MS is statistically more stable than the other three strategies. MS is therefore a promising default choice.

Comparison between FCPS and BCPS Strategies

This section investigates why the fuzzy classification model works better than the binary classification model. The following experiment is conducted. For a given population, both FKNN and CART models are built; the models are then used to predict the quality of the newly generated candidate solutions, and an offspring solution is chosen according to each model. The two offspring solutions are compared according to their real function values, and the number of win solutions achieved by the FKNN model is recorded (the number of win solutions achieved by the CART model equals the population size minus that achieved by the FKNN model). This comparison is applied in each generation. The test instances f1-f13 are used, CoDE is the basic optimizer, and the population size is 30.
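A sketch of one generation of this win-counting experiment is given below. It reuses the FuzzyKNN class from the earlier sketch and uses scikit-learn's DecisionTreeClassifier as a CART counterpart; choosing the CART offspring by the predicted probability of the promising class is our simplification of the BCPS choice rule, not a detail stated in the paper:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # CART-style model

def fknn_wins_one_generation(f, P, F, candidate_sets):
    """Count parents for which the FKNN-chosen offspring beats the CART-chosen one.
    candidate_sets[i] is the (M, n) array of candidates sampled for parent i."""
    labels = np.where(F < F.mean(), +1, -1)      # MS training set definition
    fknn = FuzzyKNN(k=5).fit(P, labels)          # from the earlier sketch
    cart = DecisionTreeClassifier().fit(P, labels)
    wins = 0
    for cands in candidate_sets:
        y_fknn = max(cands, key=lambda y: fknn.predict_membership(y)[0])
        proba = cart.predict_proba(cands)        # columns ordered by cart.classes_
        promising_col = list(cart.classes_).index(+1)
        y_cart = cands[np.argmax(proba[:, promising_col])]
        if f(y_fknn) < f(y_cart):                # compare true objective values
            wins += 1
    return wins
```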
The experimental results are shown in Table 3.

Table 3: The median, mean, max, and min of the win numbers obtained by FKNN-CoDE compared with CART-CoDE in each generation. "Win generations" is the number of generations (out of 10,000 in total, given 300,000 FES and a population of 30) in which FKNN wins the majority of the comparisons.

                 f1    f2    f3    f4    f5    f6    f7    f8    f9    f10   f11   f12   f13
mean             22    22    21    21    26    20    15    20    20    25    20    20    29
median           22    22    21    21    30    20    15    20    20    25    20    20    30
max              30    29    29    29    30    28    25    29    30    30    29    29    30
min              11    12    11    11    12    10     5    10    10    10    10    10    12
win generations  9971  9967  9914  9950  9960  9830  5998  9841  9805  9995  9799  9715  9990

From Table 3, it can be seen that 16.67%-100.00% of the offspring solutions chosen by the FKNN model are better than those chosen by the CART model; conversely, only 0.00%-83.33% of the offspring solutions chosen by the CART model are better than those chosen by the FKNN model. The median and mean values are quite consistent. According to the mean values, 50%-96.67% of the offspring solutions chosen by the FKNN model are better than those chosen by the CART model. Except on f7, in over 97% of the generations more than 50.00% of the offspring solutions chosen by the FKNN model are better than those chosen by the CART model on all the other test instances. This suggests that FKNN performs better than CART at choosing offspring solutions.

Comparison between Fuzzy Classification and Surrogate Models

This section compares FCPS and surrogate model based preselection (SPS) assisted CoDE. The surrogate models employed in this section are SVM, GP, Btree, and RBF. For simplicity, the name of the model is used directly to denote CoDE with the corresponding strategy. Table 4 summarizes the median results obtained by FCPS, SVM, GP, Btree, and RBF assisted CoDE on the 13 YLL test instances after 300,000 FES over 30 independent runs.

Table 4: The median results (with ranks in brackets) obtained by FCPS, SVM, GP, Btree, and RBF assisted CoDE on f1-f13.

            FCPS            SVM             GP              Btree            RBF
f1     1.86e-145[1]    5.15e-49[5]     2.42e-54[3]     1.20e-107[2]     4.44e-53[4]
f2     4.83e-77[1]     2.23e-31[3]     3.65e-31[4]     3.41e-55[2]      5.05e-31[5]
f3     9.84e-36[1]     3.72e-09[5]     6.31e-18[3]     3.09e-27[2]      1.82e-15[4]
f4     8.86e-29[1]     8.57e-11[4]     9.61e-12[3]     8.11e-24[2]      3.35e-10[5]
f5     8.69e-06[4]     5.86e-12[2]     1.16e-05[5]     1.25e-25[1]      1.36e-10[3]
f6     0.00e+00[1]     0.00e+00[1]     0.00e+00[1]     0.00e+00[1]      0.00e+00[1]
f7     1.62e-03[2]     1.64e-03[3]     1.27e-03[1]     2.04e-03[4]      3.24e-03[5]
f8     0.00e+00[1]     1.18e+02[5]     0.00e+00[1]     0.00e+00[1]      0.00e+00[1]
f9     1.99e+00[5]     0.00e+00[1]     0.00e+00[1]     0.00e+00[1]      0.00e+00[1]
f10    4.44e-15[1]     7.99e-15[2]     7.99e-15[2]     7.99e-15[2]      7.99e-15[2]
f11    0.00e+00[1]     0.00e+00[1]     0.00e+00[1]     0.00e+00[1]      0.00e+00[1]
f12    1.57e-32[1]     1.57e-32[1]     1.57e-32[1]     1.57e-32[1]      1.57e-32[1]
f13    1.35e-32[1]     1.35e-32[1]     1.35e-32[1]     1.35e-32[1]      1.35e-32[1]
mean rank  1.62            2.62            2.08            1.62             2.62

Figure 5: The mean runtime performance of FCPS, SVM, GP, Btree, and RBF assisted CoDE on four instances.

Table 5: The median, mean, max, and min of the win numbers obtained by FKNN-CoDE compared with Btree-CoDE in each generation. "Win generations" is defined as in Table 3.

                 f1    f2    f3    f4    f5    f6    f7    f8    f9    f10   f11   f12   f13
mean             22    23    22    23    26    18    16    19    19    24    19    19    29
median           22    23    22    22    26    18    15    19    19    24    19    18    30
max              30    30    30    29    30    28    25    29    29    30    28    29    30
min              10    11    12    11    12     8     5     7     6    13     8     8    14
win generations  9934  9984  9986  9973  9983  9249  5807  9252  9323  9995  9273  9121  9996
The rank values in Table 4 suggest that FCPS and Btree assisted CoDE achieve the best mean rank values. Figure 5 plots the mean runtime performance of FCPS, SVM, GP, Btree, and RBF assisted CoDE on four test instances. The figure shows that on f1 and f10, FCPS assisted CoDE converges faster than the others, with Btree assisted CoDE second best; on f5, Btree assisted CoDE performs best; and on f7, GP assisted CoDE is faster than the others, while FCPS and Btree assisted CoDE perform similarly. It can thus be concluded that FCPS assisted CoDE outperforms the surrogate model assisted variants, and that among the surrogate models Btree is the best one.

To investigate why the fuzzy classification model works better than the surrogate model, the following experiment is conducted. For a given population, both FKNN and Btree models are built; the models are then used to predict the quality of the newly generated candidate solutions, and an offspring solution is chosen according to each of the models. The two offspring solutions are compared according to their real function values, and the number of win solutions achieved by the FKNN model is recorded (the number of win solutions achieved by the Btree model equals the population size minus that achieved by the FKNN model). This comparison is applied in each generation. The test instances f1-f13 are used and the population size is 30. The experimental results are shown in Table 5.

From Table 5, it can be seen that 16.67%-100.00% of the offspring solutions chosen by the FKNN model are better than those chosen by the Btree model; conversely, only 0.00%-83.33% of the offspring solutions chosen by the Btree model are better than those chosen by the FKNN model. The median and mean values are quite consistent. According to the mean values, 53.33%-96.67% of the offspring solutions chosen by the FKNN model are better than those chosen by the Btree model. Except on f7, in over 91.00% of the generations more than 50.00% of the offspring solutions chosen by the FKNN model are better than those chosen by the Btree model on all the other test instances. This suggests that FKNN performs better than Btree at choosing offspring solutions, and that FCPS improves the performance of EAs more than surrogate models do.

Conclusions

This paper proposed a fuzzy classification based preselection (FCPS) strategy for evolutionary optimization. FCPS applies a fuzzy classification model instead of a binary classification model to assist preselection. To apply FCPS in an EA, a fuzzy classification model is first built from the current solutions, which form the promising and unpromising training data sets. Then, for each parent solution, a set of candidate solutions is sampled by the variation operator. Finally, the candidate with the maximal membership degree to the promising class according to the fuzzy classification model is chosen as the offspring solution. FCPS was applied to two state-of-the-art algorithms, i.e., CoDE (Wang, Cai, and Zhang 2011) and EDA/LS (Zhou, Sun, and Zhang 2015), and studied on the 13 YLL test instances (Yao, Liu, and Lin 1999). The CART based BCPS scheme was also employed for comparison. The experimental results suggest that in most cases, the FCPS based approach works better than the BCPS based approach and the original algorithm.
Further analysis also showed that the fuzzy classification model is more efficient at prediction in evolutionary optimization than the CART model and the surrogate models. Future work to improve the performance of FCPS-EA could proceed along the following lines: (a) finding a more suitable way to combine fuzzy classification with EAs, and (b) reducing the computational cost of building the fuzzy classification model.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61731009, 61673180, and 61703382) and the Fundamental Research Funds for the Central Universities.

References

Bäck, T.; Fogel, D. B.; and Michalewicz, Z. 1997. Handbook of Evolutionary Computation. CRC Press.

Chen, Y.; Xie, W.; and Zou, X. 2013. How can surrogates influence the convergence of evolutionary algorithms? Swarm and Evolutionary Computation 12:18-23.

Das, S., and Suganthan, P. N. 2011. Differential evolution: A survey of the state-of-the-art. IEEE Transactions on Evolutionary Computation 15(1):4-31.

Derrac, J.; Garcia, S.; and Herrera, F. 2014. Fuzzy nearest neighbor algorithms: Taxonomy, experimental analysis and prospects. Information Sciences 260:98-119.

Emmerich, M.; Giotis, A.; Ozdemir, M.; Bäck, T.; and Giannakoglou, K. 2002. Metamodel-assisted evolution strategies. In International Conference on Parallel Problem Solving from Nature (PPSN'02), 361-370.

Hu, Y.-Q.; Qian, H.; and Yu, Y. 2017. Sequential classification-based optimization for direct policy search. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI'17), 2029-2035.

Jin, Y. 2003. A comprehensive survey of fitness approximation in evolutionary computation. Soft Computing 9(1):3-12.

Jin, Y. 2011. Surrogate-assisted evolutionary computation: Recent advances and future challenges. Swarm and Evolutionary Computation 1(2):61-70.

Keller, J. M.; Gray, M. R.; and Givens, J. A. 1985. A fuzzy k-nearest neighbor algorithm. IEEE Transactions on Systems, Man, and Cybernetics (4):580-585.

Klir, G., and Yuan, B. 1995. Fuzzy Sets and Fuzzy Logic, volume 4. New Jersey: Prentice Hall.

Larrañaga, P., and Lozano, J. A. 2001. Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation, volume 2. Springer Science & Business Media.

Li, Y.; Zhou, A.; and Zhang, G. 2014. An MOEA/D with multiple differential evolution mutation operators. In Proceedings of the 2014 IEEE Congress on Evolutionary Computation (CEC'14), 397-404.

Lu, X.-F., and Tang, K. 2012. Classification- and regression-assisted differential evolution for computationally expensive problems. Journal of Computer Science and Technology 27(5):1024-1034.

Lu, X.; Tang, K.; and Yao, X. 2011. Classification-assisted differential evolution for computationally expensive problems. In Proceedings of the 2011 IEEE Congress on Evolutionary Computation (CEC'11), 1986-1993.

Mahfoud, S. W. 1992. Crowding and preselection revisited. In International Conference on Parallel Problem Solving From Nature (PPSN'92), 27-36.

Mallipeddi, R., and Suganthan, P. N. 2014. Unit commitment - a survey and comparison of conventional and nature inspired algorithms. International Journal of Bio-Inspired Computation 6(2):71-90.

Polak, E. 1997. Optimization: Algorithms and Consistent Approximations. Springer-Verlag New York.

Qian, C.; Yu, Y.; and Zhou, Z.-H. 2015. Pareto ensemble pruning.
In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI'15), 2935-2941.

Szu, H. H. 1986. Non-convex optimization. In Real-Time Signal Processing IX, volume 698, 59-68. International Society for Optics and Photonics.

Wang, Z.; Zoghi, M.; Hutter, F.; Matheson, D.; and De Freitas, N. 2013. Bayesian optimization in high dimensions via random embeddings. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI'13), 1778-1784.

Wang, Y.; Cai, Z.; and Zhang, Q. 2011. Differential evolution with composite trial vector generation strategies and control parameters. IEEE Transactions on Evolutionary Computation 15:55-66.

Wang, H.; Qian, H.; and Yu, Y. 2018. Noisy derivative-free optimization with value suppression. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI'18), 1447-1454.

Yao, X.; Liu, Y.; and Lin, G. 1999. Evolutionary programming made faster. IEEE Transactions on Evolutionary Computation 3(2):82-102.

Yu, Y.; Qian, H.; and Hu, Y.-Q. 2016. Derivative-free optimization via classification. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI'16), 2286-2292.

Yuille, A. L., and Rangarajan, A. 2002. The concave-convex procedure (CCCP). In Advances in Neural Information Processing Systems 14 (NIPS'01), 1033-1040.

Zadeh, L. A.; Klir, G. J.; and Yuan, B. 1996. Fuzzy Sets, Fuzzy Logic, and Fuzzy Systems: Selected Papers. Advances in Fuzzy Systems - Applications and Theory.

Zadeh, L. A. 1965. Fuzzy logic and its applications. New York, NY, USA.

Zhang, J.; Zhou, A.; and Zhang, G. 2018. Preselection via classification: A case study on global optimization. International Journal of Bio-Inspired Computation 11(4):267-281.

Zhou, A.; Sun, J.; and Zhang, Q. 2015. An estimation of distribution algorithm with cheap and expensive local search methods. IEEE Transactions on Evolutionary Computation 19(6):807-822.