# Multi-Objective Bayesian Optimization with Active Preference Learning

Ryota Ozaki¹, Kazuki Ishikawa¹, Youhei Kanzaki¹, Shion Takeno³, Ichiro Takeuchi²,³, Masayuki Karasuyama¹
¹Nagoya Institute of Technology, ²Nagoya University, ³RIKEN AIP
ozaki.ryota.mllab.nit@gmail.com, ishikawa.kazuki.mllab.nit@gmail.com, kanzaki.youhei.mllab.nit@gmail.com, shion.takeno@riken.jp, takeuchi.ichiro@mae.nagoya-u.ac.jp, karasuyama@nitech.ac.jp

Abstract

Many real-world black-box optimization problems require optimizing multiple criteria simultaneously. However, in a multi-objective optimization (MOO) problem, identifying the whole Pareto front requires a prohibitive search cost, while in many practical scenarios the decision maker (DM) only needs one specific solution among the set of Pareto optimal solutions. We propose a Bayesian optimization (BO) approach for identifying the most preferred solution in MOO with expensive objective functions, in which a Bayesian preference model of the DM is estimated adaptively and interactively from two types of supervision, called pairwise preference and improvement request. To explore the most preferred solution, we define an acquisition function that incorporates uncertainty in both the objective functions and the DM preference. Further, to minimize the interaction cost with the DM, we also propose an active learning strategy for the preference estimation. We empirically demonstrate the effectiveness of the proposed method on benchmark function optimization and on hyper-parameter optimization problems for machine learning models.

1 Introduction

In many real-world problems, simultaneously optimizing multiple expensive black-box functions f_1(x), ..., f_L(x) is often required. For example, in an experimental design for developing novel drugs, there can be several criteria for evaluating drug performance, such as the efficacy of the drug and the strength of side effects; pursuing high efficacy usually worsens the side effects. Another example is the hyper-parameter optimization of a machine learning model, in which multiple criteria such as accuracy on different classes, computational efficiency, and social fairness should be optimized simultaneously. In general, multi-objective optimization (MOO) problems have multiple optimal solutions (Figure 1 (a)), called Pareto optimal solutions, and MOO solvers typically aim to find all of them. However, the search cost of this approach often becomes prohibitive because the number of Pareto optimal solutions can be large even for small L. On the other hand, in many practical scenarios a decision maker (DM) only needs one of the optimal solutions that matches their demands (e.g., a drug developer may choose just one drug design with the best balance between efficacy and side effects). Therefore, if we can incorporate the DM's preference at the MOO stage, solutions preferred by the DM can be identified efficiently without enumerating all Pareto solutions. However, directly defining their preference mathematically can be difficult for the DM.
We propose a Bayesian optimization (BO) method for preference-based MOO that optimizes x through an interactive (human-in-the-loop) estimation of the preference, based on weak supervision provided by the DM. Let U : ℝ^L → ℝ be a utility function that quantifies the DM preference. We assume that when the DM prefers f ∈ ℝ^L to f′ ∈ ℝ^L, the utility function values satisfy U(f) > U(f′). Using U, the optimization problem can be formulated as max_x U(f(x)), where f(x) = (f_1(x), ..., f_L(x))⊤. Our proposed method is based on a Bayesian modeling of f(x) and U, by which uncertainty in both the objective functions and the DM preference can be incorporated. For f(x), we use Gaussian process (GP) regression, following the standard convention of BO. For U, we employ a parametrized utility function based on the Chebyshev scalarization function (CSF), because of its simplicity and its ability to identify any Pareto optimal point depending on the setting of the preference parameter w ∈ ℝ^L (Figure 1 (b)).

To estimate the utility function U (i.e., to estimate the parameter w), we consider two types of weak supervision provided through interaction with the DM. First, we use the pairwise comparison (PC) over two vectors f, f′ ∈ ℝ^L. In this supervision, the DM answers whether U(f) > U(f′) holds based on their preference. In the context of preference learning (e.g., Chu and Ghahramani 2005), it is widely known that such a relative comparison is often much easier for the DM to provide than the exact value of U(f). As the second type of preference information, we propose to use the improvement request (IR) for a given f ∈ ℝ^L. The DM provides the index ℓ for which the ℓ-th dimension of f, i.e., f_ℓ, should be improved most among the L dimensions. To our knowledge, this form of supervision has not been studied in preference learning, even though it is easy to provide and conveys important information about the DM. We show that IR can be formulated as a weak supervision of the gradient of the utility function.

We show that the well-known expected improvement (EI) acquisition function of BO can be defined for the optimization problem max_x U(f(x)). Here, the expectation is taken over both U and f(x), so that exploration is performed based on the current uncertainty of both. Further, to reduce the querying cost to the DM, we also propose an active learning strategy that selects effective queries (PC or IR) for estimating the preference parameter w. Our contributions are summarized as follows:
- We propose a preference-based MOO algorithm combining BO and preference learning, in which both the MOO objective f(x) and the DM preference (utility function) are modeled in a Bayesian manner.
- Bayesian preference learning for the utility function is proposed based on two types of weak supervision, i.e., PC and IR. In particular, IR is a novel paradigm of preference learning.
- Active learning for PC and IR is also proposed. Our approach is based on BALD (Bayesian Active Learning by Disagreement) (Houlsby et al. 2011), which uses mutual information to measure the benefit of querying.
- Numerical experiments on several benchmark functions and on hyper-parameter optimization of machine learning models show superior performance of our framework compared with baselines such as an MOO extension of BO without preference information.
2 Problem Setting

We consider a multi-objective optimization (MOO) problem that maximizes L objective functions. Let f_ℓ(x) for ℓ ∈ [L] be the objective functions and f(x) = (f_1(x), ..., f_L(x))⊤ be their vector representation, where x ∈ ℝ^d is an input vector. In general, an MOO problem can have multiple optimal solutions, called Pareto optimal points. For example, in Figure 1 (a), all the green points are optimal. See the MOO literature (e.g., Giagkiozis and Fleming 2015) for the precise definition of Pareto optimality. Since the number of Pareto optimal points can be large even for small L (e.g., Ishibuchi, Tsukamoto, and Nojima 2008), identifying all of them can be computationally prohibitive. Instead, we consider optimizing a utility function U : ℝ^L → ℝ that satisfies U(f) > U(f′) when the DM prefers f to f′, where f, f′ ∈ ℝ^L. The optimization problem that seeks the solution preferred by the DM is

    max_x U(f(x)).    (1)

In our problem setting, both f(x) and U(f) are assumed to be unknown. For querying the objective function f(x), we follow the standard setting of BO. Let y_i = f(x_i) + ε be a noisy observation of f(x_i), where ε ∼ N(0, σ²I) is an independent noise term with variance σ² and identity matrix I. Since observing y_i incurs a high observation cost, a sample-efficient optimization strategy is required.

Figure 1: (a) Illustration of Pareto optimal points. The green points are Pareto optimal because, from each of them, no point can improve both objectives simultaneously. The dashed line is the underlying set of all Pareto optimal points, called the Pareto front. (b) Illustration of CSF (see 3.2 for details). The direction of the red arrow corresponds to w. The red dashed lines are the contour lines of CSF, by which the red star becomes the optimal point.

On the other hand, directly observing the value of the utility function U(f(x)) is usually difficult, because it is hard to define a numerical score of the DM preference. Instead, we assume that we can query the DM for the following two types of weak supervision:
- Pairwise Comparison (PC): PC indicates the relative preference over two given vectors f and f′, i.e., the DM answers whether f is preferred to f′ or not.
- Improvement Request (IR): IR indicates the dimension ℓ of a given f for which the DM considers improvement to be required most among ℓ ∈ [L].
For the DM, these two types of information are much easier to provide than the exact value of U(f) itself. PC is a well-known form of supervision in the contexts of preference learning (Chu and Ghahramani 2005) and dueling bandits (Sui et al. 2018). On the other hand, IR has, to our knowledge, not been studied in these contexts, although it captures a practically quite common scenario.

3 Proposed Method

In this section, we first describe the modeling of the MOO objective functions f(x) in 3.1. Next, in 3.2, we describe the modeling of the utility function U that represents the DM preference. In 3.3, we show the acquisition functions for BO of U(f(x)) and for active learning of U. The entire algorithm of the proposed method is then shown in 3.4. Finally, we discuss variations of the utility function in 3.5. Due to space limitations, some technical details are omitted; see our extended pre-print (Ozaki et al. 2023) for further details.

3.1 Gaussian Process for Objective Functions

As a surrogate model of f(x), we employ Gaussian process (GP) regression. For simplicity, the L-dimensional output of f(x) is modeled by L independent GPs, each of which has k(x, x′) as a kernel function. When we observe a set of t observations D_GP = {(x_i, y_i)}_{i=1}^t, the predictive distribution is obtained as the posterior p(f(x) | D_GP), for which the well-known analytical calculation is available (see, e.g., Williams and Rasmussen 2006). Although we use the independent setting for the L outputs here, correlation among them can also be incorporated by using a multi-output GP (Alvarez et al. 2012).
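The description above maps directly onto standard GP tooling. The following is a minimal sketch, assuming a scikit-learn backend with an RBF kernel and a fixed noise level, of the independent-output surrogate and of drawing posterior samples of f(x) (which the acquisition function in 3.3 relies on); it is an illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF


def fit_independent_gps(X, Y, noise_var=1e-2):
    """Fit one GP per objective dimension (independent-output surrogate of f)."""
    # Y has shape (t, L): t observations of the L objective values.
    return [
        GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=noise_var).fit(X, Y[:, l])
        for l in range(Y.shape[1])
    ]


def sample_objectives(gps, X_cand, n_samples, seed=0):
    """Draw posterior samples of f on candidate inputs; returns shape (n_samples, N, L)."""
    draws = [gp.sample_y(X_cand, n_samples=n_samples, random_state=seed + l)
             for l, gp in enumerate(gps)]          # each draw has shape (N, n_samples)
    return np.stack(draws, axis=-1).transpose(1, 0, 2)


# Toy usage: a 1-D input and two objectives.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(10, 1))
Y = np.column_stack([np.sin(3 * X[:, 0]), np.cos(3 * X[:, 0])])
gps = fit_independent_gps(X, Y)
f_samples = sample_objectives(gps, np.linspace(0, 1, 50)[:, None], n_samples=8)
print(f_samples.shape)  # (8, 50, 2)
```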
3.2 Bayesian Modeling for Utility Function

The simplest choice of the utility function is a linear function U(f) = Σ_{ℓ=1}^L w_ℓ f_ℓ with weight parameters w_ℓ ≥ 0. However, with a linear utility function, depending on the shape of the Pareto front, it is possible that 1) there exists a Pareto optimal solution that cannot be identified by any weight parameters, and 2) only the one objective function with the highest weight is exclusively optimized (Figure 2).

Figure 2: Illustrative examples of contour plots of the linear utility function U(f) = Σ_{ℓ=1}^L w_ℓ f_ℓ, where w_ℓ ≥ 0. The plots (a) and (b) have different coefficients. In (a), the blue point is optimal in the sense of U, while the red point is optimal in (b). On the other hand, the green point can never be the optimal point for any selection of the coefficients.

To incorporate the DM preference, a variety of scalarization functions have been studied (Bechikh et al. 2015). We mainly consider the following Chebyshev scalarization function (CSF) (Giagkiozis and Fleming 2015):

    U(f(x)) = min { f_1(x)/w_1, ..., f_L(x)/w_L },    (2)

where w = (w_1, ..., w_L)⊤ is a preference vector that satisfies Σ_{ℓ∈[L]} w_ℓ = 1 and w_ℓ > 0. Figure 1 (b) is an illustration of CSF. In the MOO literature (Giagkiozis and Fleming 2015), it is known that for any Pareto optimal solution f*, there exists a weighting vector w under which the maximizer x of (2) gives f* = f(x). For further discussion on the choice of the utility function, see 3.5.

Likelihood for Pairwise Comparison. We define the likelihood of PC, inspired by existing work on preference learning with a GP (Chu and Ghahramani 2005). Suppose that the DM prefers f^(i) to f^(i′), which we write as f^(i) ≻ f^(i′). We assume that the preference observation is generated from the underlying true utility U contaminated with Gaussian noise:

    f^(i) ≻ f^(i′)  ⟺  U(f^(i)) + ϵ_i > U(f^(i′)) + ϵ′_i,

where ϵ_i, ϵ′_i ∼ N(0, σ²_PC) are independent Gaussian noise terms with variance σ²_PC. For a given set of n preference observations {f^(i) ≻ f^(i′)}_{i=1}^n, the likelihood is written as

    p({f^(i) ≻ f^(i′)}_{i=1}^n | w) = ∏_{i=1}^n Φ( (U(f^(i)) − U(f^(i′))) / (√2 σ_PC) ),    (3)

where Φ is the cumulative distribution function of the standard Gaussian distribution.
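To make (2) and (3) concrete, here is a small NumPy sketch of the CSF utility and the PC log-likelihood; the √2·σ_PC scaling follows from the difference of the two independent Gaussian noise terms above, and the function names and noise value are illustrative assumptions rather than the authors' code.

```python
import numpy as np
from scipy.stats import norm


def csf_utility(f, w):
    """Chebyshev scalarization U(f) = min_l f_l / w_l (applied along the last axis)."""
    return np.min(np.asarray(f) / w, axis=-1)


def pc_log_likelihood(pc_observations, w, sigma_pc=0.1):
    """Log of the likelihood (3); pc_observations is a list of (f_preferred, f_other) pairs."""
    log_lik = 0.0
    for f_win, f_lose in pc_observations:
        diff = csf_utility(f_win, w) - csf_utility(f_lose, w)
        # P(f_win preferred) = Phi(diff / (sqrt(2) * sigma_PC)) under the Gaussian noise model.
        log_lik += norm.logcdf(diff / (np.sqrt(2.0) * sigma_pc))
    return log_lik


# Example: L = 2, and the DM reported a preference for the first vector of the pair.
w = np.array([0.7, 0.3])
observations = [(np.array([0.9, 0.4]), np.array([0.5, 0.5]))]
print(pc_log_likelihood(observations, w))
```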
Likelihood for Improvement Request. For a given f^(i) = (f^(i)_1, ..., f^(i)_L)⊤, if the DM considers that f^(i)_{ℓ_i} has a higher priority to be improved than another f^(i)_{ℓ′_i}, we write ℓ_i ≻ ℓ′_i. In IR, an observation is the dimension ℓ_i for which the DM requires improvement most strongly among the L dimensions. This can be regarded as observing the L − 1 relations ℓ_i ≻ ℓ′_i for ℓ′_i ∈ [L] \ {ℓ_i}.

The observation ℓ_i ≻ ℓ′_i for f^(i) can be interpreted as saying that the gradient of U(f^(i)) with respect to f^(i)_{ℓ_i} is larger than that with respect to f^(i)_{ℓ′_i}. In other words, moving along the direction of f^(i)_{ℓ_i} should improve the preference U(f^(i)) more rapidly than moving along the direction of f^(i)_{ℓ′_i} (Figure 3).

Figure 3: Assume that the DM requests that the direction of f_2 should be improved rather than f_1 at f^(i). This indicates that the utility function should increase more when f^(i) increases along the axis of f_2 (the direction of the black arrow) than along the axis of f_1 (the white arrow).

Let g_ℓ(f) = ∂U(f)/∂f_ℓ be the ℓ-th dimension of the gradient of U(f). Then, the event ℓ_i ≻ ℓ′_i is characterized through the underlying gradient g_ℓ(f) as

    ℓ_i ≻ ℓ′_i  ⟺  g_{ℓ_i}(f^(i)) − g_{ℓ′_i}(f^(i)) > e_i,

where e_i ∼ N(0, σ²_IR) is observation noise with variance σ²_IR. Suppose that we have m observations {ℓ_i ≻ ℓ′_i}_{i=1}^m in total. The likelihood is

    p({ℓ_i ≻ ℓ′_i}_{i=1}^m | w) = ∏_{i=1}^m ∏_{ℓ′_i ∈ [L] \ {ℓ_i}} Φ( (g_{ℓ_i}(f^(i)) − g_{ℓ′_i}(f^(i))) / σ_IR ).    (4)

Prior and Posterior. We employ the Dirichlet distribution p(w) = (1/B(α)) ∏_{i=1}^L w_i^{α_i − 1} as a prior distribution of w because it respects the constraint ||w||_1 = 1, where B is the beta function and α = [α_1, ..., α_L] is a parameter. Let D_pre be a set of PC and IR observations. The posterior distribution of w is written as

    p(w | D_pre) ∝ p(w) p({f^(i) ≻ f^(i′)}_{i=1}^n | w) p({ℓ_j ≻ ℓ′_j}_{j=1}^m | w).    (5)

Figure 4: A schematic illustration of the procedure of the proposed framework. x^(t) is the x selected at the t-th iteration of BO. The DM can add preference information at any time point. A pair in a PC observation is represented by two triangles of the same color. Each black arrow represents the direction that the DM requests to improve. In D′_pre, the DM newly adds a request for improvement in the horizontal direction.

3.3 Acquisition Functions

Bayesian Optimization of Utility Function. Our purpose is to identify the x that maximizes U(f(x)), for which we employ the well-known expected improvement (EI) criterion of Bayesian optimization. Let x_best = argmax_{x ∈ Θ} U(f(x)) be the best x in the set Θ consisting of the x_i contained in the observed set D_GP. The acquisition function that selects the next x is defined by

    α_EI(x) = E_{f(x), w} [ max { U(f(x)) − U(f(x_best)), 0 } ].    (6)

Unlike the standard EI in BO, the expectation is taken jointly over f(x) and w, and the current best term U(f(x_best)) is also a random variable. Therefore, the analytical calculation of (6) is difficult, and we employ the Monte Carlo (MC) method to evaluate it. The objective function f(x) can be sampled easily because it is represented by the GPs. The parameter w of the utility function is sampled from the posterior (5), for which we use standard Markov chain MC (MCMC) sampling. Note that since both f and U are represented as Bayesian models, uncertainty with respect to both the objective functions and the user preference is incorporated in our acquisition function.
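For reference, the sketch below shows one way to evaluate (6) by Monte Carlo for a single candidate x, assuming posterior samples of f(x) and f(x_best) (from the GPs) and of w (from MCMC on (5)) are already available; the pairing of samples and the toy inputs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def csf_utility(f, w):
    """Chebyshev scalarization U(f) = min_l f_l / w_l along the last axis."""
    return np.min(np.asarray(f) / w, axis=-1)


def mc_expected_improvement(f_x_samples, f_best_samples, w_samples):
    """Monte Carlo estimate of alpha_EI(x) = E_{f(x), w}[ max{U(f(x)) - U(f(x_best)), 0} ].

    f_x_samples, f_best_samples, w_samples: arrays of shape (S, L), one row per joint sample."""
    u_x = csf_utility(f_x_samples, w_samples)
    u_best = csf_utility(f_best_samples, w_samples)
    return np.mean(np.maximum(u_x - u_best, 0.0))


# Toy usage with S = 2000 joint samples and L = 2 objectives.
rng = np.random.default_rng(0)
S, L = 2000, 2
w_samples = rng.dirichlet(np.full(L, 2.0), size=S)          # stand-in for MCMC samples of w
f_x_samples = rng.normal([0.6, 0.6], 0.1, size=(S, L))      # stand-in for GP samples of f(x)
f_best_samples = rng.normal([0.5, 0.5], 0.1, size=(S, L))   # stand-in for GP samples of f(x_best)
print(mc_expected_improvement(f_x_samples, f_best_samples, w_samples))
```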
Active Learning for Utility Function Estimation. We also propose an active learning (AL) acquisition function for efficiently estimating the utility function U. Since PC and IR observations require interaction with the DM, accurate preference estimation with a minimum number of observations is desired. Our AL acquisition function is based on the Bayesian active learning framework called BALD (Bayesian Active Learning by Disagreement) (Houlsby et al. 2011). BALD is an information-theoretic approach in which the next query is selected by maximizing the mutual information (MI) between an observation and a model parameter. We describe the case of PC only, because almost the same procedure is derived for IR. We need to select a pair f and f′ that efficiently reduces the uncertainty of w. Let

    z_PC = 1 if f ≻ f′, and z_PC = 0 if f′ ≻ f,

be the indicator of the preference given by the DM. Our AL acquisition function is defined by the MI between z_PC and w:

    MI(z_PC; w) = H[z_PC] − E_w[ H[z_PC | w] ],    (7)

where H represents the entropy. Houlsby et al. (2011) show that this difference-of-entropies representation results in a simpler computation than other equivalent representations of MI. For the first term of (7), since z_PC follows the Bernoulli distribution, we have

    H[z_PC] = − Σ_{z_PC ∈ {1, 0}} p(z_PC) log p(z_PC).

Unfortunately, p(z_PC), in which w is marginalized out, is difficult to evaluate analytically. On the other hand, the conditional distribution p(z_PC | w) is easy to evaluate, as shown in (3). Therefore, we employ the sampling-based approximation p(z_PC) ≈ Σ_{w ∈ W} p(z_PC | w) / |W|, where W is a set of w generated from the posterior (5) (e.g., by MCMC sampling). For the second term of (7), the same sample set W can be used:

    E_w[ H[z_PC | w] ] ≈ − (1/|W|) Σ_{w ∈ W} Σ_{z_PC ∈ {1, 0}} p(z_PC | w) log p(z_PC | w).
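The sampling approximation above is straightforward to implement. Below is an illustrative sketch of the BALD score (7) for one candidate PC query (f, f′), reusing the probit PC model of (3) and a set W of samples of w; here prior samples stand in for the MCMC posterior samples purely so the example is runnable.

```python
import numpy as np
from scipy.stats import norm


def p_prefer(f, f_prime, w, sigma_pc=0.1):
    """p(z_PC = 1 | w): probability that the DM prefers f to f_prime under the probit PC model (3)."""
    diff = np.min(f / w) - np.min(f_prime / w)
    return norm.cdf(diff / (np.sqrt(2.0) * sigma_pc))


def bernoulli_entropy(p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def bald_score(f, f_prime, w_samples, sigma_pc=0.1):
    """MI(z_PC; w) = H[z_PC] - E_w[H[z_PC | w]], both terms approximated with the sample set W."""
    p_cond = np.array([p_prefer(f, f_prime, w, sigma_pc) for w in w_samples])
    marginal_entropy = bernoulli_entropy(p_cond.mean())       # H[z_PC], with w marginalized out
    expected_conditional = bernoulli_entropy(p_cond).mean()   # E_w[H[z_PC | w]]
    return marginal_entropy - expected_conditional


# The pair (f, f') shown to the DM is the one maximizing bald_score over candidate pairs.
rng = np.random.default_rng(1)
W = rng.dirichlet([2.0, 2.0], size=500)
print(bald_score(np.array([0.8, 0.3]), np.array([0.4, 0.6]), W))
```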
3.4 Algorithm

The procedure of the proposed framework is shown in Figure 4 and Algorithm 1 (MBO-APL: Multi-objective BO with Active Preference Learning). Note that although Algorithm 1 also only describes the case of PC, the procedure for IR is almost the same. BO and the preference learning can run in parallel, because the training of the GPs and of w are independent. When the DM adds preference data (PC and/or IR) to D_pre, the posterior of w is updated. The updated posterior can be used immediately in the next acquisition function calculation of BO (in Figure 4, for example, p(w | D′_pre) can be used to determine x^(t+2)).

Algorithm 1: Proposed Method
 1: procedure MBO-APL
 2:   Run ACTIVE-PREF-LEARNING in background
 3:   for t = 1, ... do
 4:     Fit GPs to D_GP = {(x_i, y_i)}_{i=1}^t
 5:     x_{t+1} ← argmax_x α_EI(x) using the current p(w | D_pref)
 6:     Observe (x_{t+1}, y_{t+1}) and D_GP ← D_GP ∪ {(x_{t+1}, y_{t+1})}
 7:   end for
 8: end procedure
 9: procedure ACTIVE-PREF-LEARNING
10:   for t = 1, ... do
11:     Update p(w | D_pref) with the current D_pref
12:     f, f′ ← argmax_{f, f′} MI(z_PC; w)
13:     Query z_PC to the DM and add the result to D_pref
14:   end for
15: end procedure

3.5 Selection of the Utility Function

Although we mainly focus on (2) as a simple example of the utility function, different utility functions can be used in our framework. For example, in the MOO literature, the following augmented CSF is often used (Bechikh et al. 2015; Hakanen and Knowles 2017):

    U(f(x)) = min_{ℓ ∈ [L]} (f_ℓ(x) − f^ref_ℓ) / w_ℓ + ρ Σ_{ℓ ∈ [L]} (f_ℓ(x) − f^ref_ℓ) / w_ℓ,

where ρ > 0 is an augmentation coefficient (usually a small constant) and f^ref_ℓ is a reference point. If the DM can directly provide the reference point, f^ref_ℓ is a fixed constant; it is also possible to estimate f^ref_ℓ as a random variable from PC and IR. The second term avoids weakly Pareto optimal solutions. Using the augmented CSF in our framework is easy, because the posterior of w (and of f^ref_ℓ) can be derived in the same manner as described in 3.2.

Instead of parametric models such as CSF, nonparametric approaches are also applicable to defining the utility function. Since our problem setting is the maximization of f(x), the utility function U should be monotonically increasing. Therefore, for example, GP regression with a monotonicity constraint (Riihimäki and Vehtari 2010) can be used to build a highly flexible utility function. However, because of this flexibility, the GP model may require a larger number of observations to estimate the DM preference accurately. In this sense, the simple CSF-based approach and the GP-based approach are expected to trade off flexibility against the complexity of the estimation.

4 Related Work

In the MOO literature, incorporating DM preferences into exploration algorithms has been studied. For example, a hyper-rectangle or a vector is often used to represent the preference of the DM in the objective function space (e.g., Hakanen and Knowles 2017; Palar et al. 2018; He et al. 2020; Paria, Kandasamy, and Póczos 2020). As another example, Abdolshah et al. (2019) represent the DM preference by an order of importance over the objective functions. In particular, Paria, Kandasamy, and Póczos (2020) use a formulation similar to our approach, in which a CSF with random parameters can be used as the utility function. These approaches do not estimate a model of the DM preference, so the DM needs to specify the detailed performance requirement directly, which is often difficult in practice. For example, in the case of the hyper-rectangle, there may not exist any solution in the region specified by the DM. Interactive preference estimation with MOO has also been studied, mainly in the context of evolutionary algorithms (Hakanen et al. 2016). For example, Taylor et al. (2021) combine preference learning with a multi-objective evolutionary algorithm. On the other hand, to our knowledge, combining an interactively estimated Bayesian preference model with multi-objective BO has not been studied, although multi-objective extensions of BO have been widely studied (e.g., Emmerich 2005; Hernández-Lobato, Hoffman, and Ghahramani 2014). Astudillo and Frazier (2020) and Jerry Lin et al. (2022) consider a different preference-based BO in which the DM can have an arbitrary preference over the multi-dimensional output space, meaning that the problem setting is no longer MOO (in the case of MOO, the preference should be monotonically increasing). Further, these studies do not discuss the IR-type supervision.

5 Experiments

We perform two types of experiments. First, in 5.1, we evaluate the performance of our MI-based active learning (described in 3.3), which efficiently learns the preference model (2). Next, in 5.2, we evaluate the performance of the entire framework of the proposed method, using benchmark functions and two settings of hyper-parameter optimization for machine learning models. In all experiments, the true utility function (the underlying true DM preference) is defined by (2) with a parameter w_true determined by sampling from the Dirichlet distribution (α = (2, ..., 2)⊤). The GPs for f(x) employ the RBF kernel. Preference observations are generated with the noise variance σ_PC = σ_IR = 0.1. Due to space limitations, some experimental details are omitted; see our extended pre-print (Ozaki et al. 2023) for further details and additional results.
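For readers who want to reproduce this protocol, the sketch below simulates a DM whose true preference is the CSF (2) with w_true ~ Dirichlet(2, ..., 2) and who answers PC and IR queries under Gaussian noise with σ_PC = σ_IR = 0.1; the exact way the noise enters the IR answer (perturbing the CSF subgradient) is our assumption, not a detail taken from the paper.

```python
import numpy as np


def csf_utility(f, w):
    return np.min(f / w)


def csf_subgradient(f, w):
    """(Sub)gradient of U(f) = min_l f_l / w_l: 1/w_l at the minimizing dimension, 0 elsewhere."""
    g = np.zeros_like(w)
    l_star = int(np.argmin(f / w))
    g[l_star] = 1.0 / w[l_star]
    return g


def make_simulated_dm(L, seed=0, sigma_pc=0.1, sigma_ir=0.1):
    rng = np.random.default_rng(seed)
    w_true = rng.dirichlet(np.full(L, 2.0))   # true DM preference, alpha = (2, ..., 2)

    def answer_pc(f, f_prime):
        """Noisy pairwise comparison: True if the DM prefers f to f_prime."""
        return (csf_utility(f, w_true) + rng.normal(0.0, sigma_pc)
                > csf_utility(f_prime, w_true) + rng.normal(0.0, sigma_pc))

    def answer_ir(f):
        """Noisy improvement request: the dimension with the largest (noisy) gradient."""
        return int(np.argmax(csf_subgradient(f, w_true) + rng.normal(0.0, sigma_ir, size=L)))

    return w_true, answer_pc, answer_ir


w_true, answer_pc, answer_ir = make_simulated_dm(L=3, seed=42)
f_a, f_b = np.array([0.9, 0.2, 0.5]), np.array([0.4, 0.6, 0.5])
print(w_true, answer_pc(f_a, f_b), answer_ir(f_a))
```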
5.1 Preference Learning with Active Query Selection

We evaluate the estimation accuracy of the preference parameter w, comparing our MI-based acquisition function with random query selection. We iteratively added the preference information of the DM; at each iteration, one PC observation and one IR observation are provided. Figure 5 shows the results. In each plot, the horizontal axis is the iteration and the vertical axis is w_error = (1/T) Σ_{t=1}^T ||w_true − w_t||_2, where w_1, ..., w_T are sampled from the posterior (T = 1000). The results are averages over 10 runs. We can see that the error decreases rapidly, and that active learning clearly improves the accuracy. Even when L = 10, w_error decreased to around 0.1 within a few tens of iterations. Since w_error becomes 0 only when the posterior generates the exact w_true all T times, we consider w_error ≈ 0.1 sufficiently small to accelerate our preference-based BO (note that ||w_t||_1 = 1).

Figure 5: Estimation error of w (iteration vs. w_error; random selection vs. active learning).

5.2 Utility Function Optimization

We evaluate the efficiency of the proposed method through the simple regret on the true utility function U_{w_true}:

    max_x U_{w_true}(f(x)) − max_{x ∈ Θ_s} U_{w_true}(f(x)),

where Θ_s is the set of x already observed. This is the difference in utility value between the true optimum (the first term) and the current best value (the second term). The true parameter w_true was determined by sampling from the Dirichlet distribution. We assume that one PC and one IR observation are obtained each time the DM provides preference information. The results are shown as the average and the standard error over 10 runs. The performance was compared with three methods: random selection (Random), a random scalarization-based multi-objective BO called MOBO-RS (Paria, Kandasamy, and Póczos 2020), and multi-attribute BO with a preference-based expected improvement acquisition function, called EI-UU (Astudillo and Frazier 2020). In the hyper-parameter optimization experiments, we also compare our proposed method with two other multi-objective BO methods, Expected Hypervolume Improvement (EHI) (Emmerich and Klinkenberg 2008) and Max-value Entropy Search for Multi-objective Bayesian Optimization (MESMO) (Belakaria, Deshwal, and Doppa 2019). Further, we evaluate the performance of EI (6) with the fixed ground-truth preference vector w_true, referred to as EI with True Preference (EI-TP). We regard EI-TP as the best possible baseline for our proposed method, because it has complete knowledge of the DM preference from the beginning. We used benchmark functions and two problem settings of hyper-parameter optimization for machine learning models (cost-sensitive learning and fairness-aware machine learning).

Benchmark Functions. We use the well-known MOO benchmark functions DTLZ1 and DTLZ3 (Deb et al. 2005). In both functions, the input and output dimensions are d = 3 and L = 3. In addition, we use the benchmark functions Kursawe (Kursawe 1990) and Schaffer2 (Schaffer et al. 1985), in which (d, L) = (3, 2) and (d, L) = (1, 2), respectively. We prepare 1000 input candidates by taking grid points in each input dimension. At every iteration, one PC and one IR observation are selected by active learning, and the posterior of w is updated.
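Before turning to the results, note that the two evaluation measures above (w_error from 5.1 and the simple regret from 5.2) are straightforward to compute over a finite candidate pool; the sketch below is illustrative, with hypothetical array shapes and toy data.

```python
import numpy as np


def w_error(w_true, w_posterior_samples):
    """w_error = (1/T) * sum_t ||w_true - w_t||_2 over T posterior samples (Sec. 5.1)."""
    return float(np.mean(np.linalg.norm(w_posterior_samples - w_true, axis=1)))


def simple_regret(F_candidates, observed_idx, w_true):
    """max_x U(f(x)) minus the best U(f(x)) among observed x, on a finite candidate pool (Sec. 5.2).

    F_candidates: (N, L) true objective values of all candidates; observed_idx: indices queried so far."""
    u = np.min(F_candidates / w_true, axis=1)   # CSF utility of every candidate
    return float(u.max() - u[observed_idx].max())


# Toy usage: N = 100 candidates, L = 2 objectives, 5 of them observed so far.
rng = np.random.default_rng(0)
F = rng.uniform(0.0, 1.0, size=(100, 2))
w_true = np.array([0.6, 0.4])
print(simple_regret(F, observed_idx=[0, 10, 20, 30, 40], w_true=w_true))
print(w_error(w_true, rng.dirichlet([2.0, 2.0], size=1000)))
```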
Figure 6: Simple regret on benchmark functions (iteration vs. simple regret for the proposed method, EI with True Preference, Random, MOBO-RS, and EI-UU).

The results are shown in Figure 6. We see that our proposed method efficiently reduces the simple regret compared with Random, MOBO-RS (which seeks the entire Pareto front), and EI-UU. On the other hand, the proposed method and EI-TP show similar performance, which indicates that the posterior of w provides sufficiently useful information for efficient exploration. We further provide results on other benchmark functions, a sensitivity evaluation with respect to the number of samples in EI, and an ablation study on PC and IR in Appendices E.2, F.3, and F.1, respectively, of Ozaki et al. (2023).

Hyper-parameter Optimization for Cost-sensitive Learning. In many real-world applications of multi-class classification, a DM may have different preferences over each type of misclassification (Elkan 2001; Li et al. 2021). For example, in disease diagnosis, the misclassification cost for the disease class can be higher than that for the healthy class. In learning algorithms, the cost balance is often controlled by hyper-parameters. In this section, we consider a hyper-parameter optimization problem for L-class classification models that takes the importance of each class into account. For the hyper-parameter optimization of cost-sensitive learning, we used LightGBM (Ke et al. 2017) and neural networks. For LightGBM, x is defined by the L-dimensional class weight parameters, which control the importance of each class. In the case of neural networks, the weighted cross-entropy loss Σ_{i=1}^L λ_i y_i log ŷ_i was used, where λ_i > 0 is the class weight, y_i ∈ {0, 1} is the one-hot encoding of the label, and ŷ_i is the corresponding predicted class probability. In this case, x consists of λ_1, ..., λ_L. The objective functions f(x) are defined by the recall of each class on the validation set (e.g., f_1 is the recall of the first class, f_2 the recall of the second class, and so on). The datasets for the classifiers are Waveform-5000 (L = 3) and CIFAR-10 (L = 10) (Krizhevsky, Hinton et al. 2009). For Waveform-5000, we used LightGBM. For CIFAR-10, we used ResNet18 (He et al. 2016) pre-trained on ImageNet (Deng et al. 2009). The input dimension (the number of hyper-parameters) equals the number of classes, i.e., d = L.

Figure 7: Simple regret on the hyper-parameter optimization for cost-sensitive learning (Waveform-5000 and CIFAR-10).

Figure 7 shows the results. Overall, we see the same tendency as in the benchmark function optimization. The proposed method outperformed MOBO-RS, EHI, MESMO, and EI-UU, and was comparable with EI-TP. On the CIFAR-10 dataset, which has the highest output dimension (10), EI-TP was better than the proposed method. When the output dimension is high, the preference estimation can be more difficult, but we still clearly see that the proposed method drastically accelerates the exploration compared with searching the entire Pareto front. Note that EHI is not applied to the CIFAR-10 dataset because its acquisition function values are difficult to calculate in such a high output dimension.
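For concreteness, the sketch below spells out the weighted cross-entropy loss and the per-class recall objectives used in the cost-sensitive setting above; it omits the actual model training (LightGBM / ResNet18), and the helper names and toy data are illustrative.

```python
import numpy as np


def weighted_cross_entropy(y_onehot, y_prob, lam):
    """Mean of - sum_l lam_l * y_l * log(p_l); the class weights lam are the hyper-parameters x."""
    return float(-np.mean(np.sum(lam * y_onehot * np.log(np.clip(y_prob, 1e-12, 1.0)), axis=1)))


def per_class_recall(y_true, y_pred, L):
    """Objectives f_1, ..., f_L: recall of each class on the validation set."""
    return np.array([np.mean(y_pred[y_true == l] == l) if np.any(y_true == l) else 0.0
                     for l in range(L)])


# Toy usage with 3 classes and random predictions.
rng = np.random.default_rng(0)
L, n = 3, 200
lam = np.array([1.0, 2.0, 0.5])                 # candidate class weights (the input x)
y_true = rng.integers(0, L, size=n)
y_prob = rng.dirichlet(np.ones(L), size=n)      # predicted class probabilities
print(weighted_cross_entropy(np.eye(L)[y_true], y_prob, lam))
print(per_class_recall(y_true, y_prob.argmax(axis=1), L))
```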
Hyper-parameter Optimization for Fair Classification. As another example of preference-aware hyper-parameter optimization, we consider fairness in classification. Although specific sensitive attributes (e.g., gender, race) should not affect the prediction results for fairness, high fairness generally degrades the classification accuracy (Zafar et al. 2017). Zafar et al. (2017) propose a classification method that maximizes fairness under accuracy constraints and has a hyper-parameter γ to control the trade-off between fairness and accuracy. We consider hyper-parameter optimization with two-dimensional objective functions consisting of the level of fairness (the p%-rule (Biddle 2006)) and the classification accuracy. We use the logistic regression classifier proposed by Zafar et al. (2017), and the hyper-parameter γ, which controls the trade-off between fairness and accuracy, corresponds to x (d = 1). For two datasets, Adult (Becker and Kohavi 1996) and Bank (Moro, Rita, and Cortez 2012), we prepare 201 candidates of x by taking grid points in [0, 1]. The data preprocessing and some other settings comply with Zafar et al. (2017).

Figure 8: Simple regret on the hyper-parameter optimization for fairness-aware learning.

Figure 8 shows the results. The proposed method finds a better or the best solution in fewer iterations than the other methods, especially on the Adult dataset.

6 Conclusion

We proposed a multi-objective Bayesian optimization (BO) method in which the preference of the decision maker (DM) is estimated adaptively in a human-in-the-loop manner. The DM's preference is represented by a Chebyshev scalarization based utility function, for which we assume that pairwise comparisons (PC) and improvement requests (IR) are provided as weak supervision by the DM. Our acquisition function is based on the well-known expected improvement, by which uncertainty in both the original objective functions and the preference model can be incorporated. We further proposed a mutual information based active learning strategy that reduces the interaction cost with the DM. Empirical evaluation indicated that the proposed method accelerates the optimization on several benchmark functions, and applications to hyper-parameter optimization of machine learning models were also shown.

Acknowledgements

This work was supported by MEXT KAKENHI (20H00601, 21H03498, 22H00300, 23K17817), JST CREST (JPMJCR21D3, JPMJCR22N2), JST Moonshot R&D (JPMJMS2033-05), JST AIP Acceleration Research (JPMJCR21U2), NEDO (JPNP18002, JPNP20006), RIKEN Center for Advanced Intelligence Project, JSPS KAKENHI Grant Number JP21J14673, and JST ACT-X (JPMJAX23CD).

References

Abdolshah, M.; Shilton, A.; Rana, S.; Gupta, S.; and Venkatesh, S. 2019. Multi-objective Bayesian optimisation with preferences over objectives. In Wallach, H.; Larochelle, H.; Beygelzimer, A.; d'Alché-Buc, F.; Fox, E.; and Garnett, R., eds., Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc.

Alvarez, M. A.; Rosasco, L.; Lawrence, N. D.; et al. 2012. Kernels for vector-valued functions: A review. Foundations and Trends in Machine Learning, 4(3): 195–266.
Astudillo, R.; and Frazier, P. 2020. Multi-attribute Bayesian optimization with interactive preference learning. In International Conference on Artificial Intelligence and Statistics, 4496–4507. PMLR.

Bechikh, S.; Kessentini, M.; Said, L. B.; and Ghédira, K. 2015. Preference Incorporation in Evolutionary Multiobjective Optimization: A Survey of the State-of-the-Art. Volume 98 of Advances in Computers, 141–207. Elsevier.

Becker, B.; and Kohavi, R. 1996. Adult. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5XW20.

Belakaria, S.; Deshwal, A.; and Doppa, J. R. 2019. Max-value entropy search for multi-objective Bayesian optimization. Advances in Neural Information Processing Systems, 32.

Biddle, D. 2006. Adverse impact and test validation: A practitioner's guide to valid and defensible employment testing. Gower Publishing, Ltd.

Chu, W.; and Ghahramani, Z. 2005. Preference learning with Gaussian processes. In Proceedings of the 22nd International Conference on Machine Learning, 137–144.

Deb, K.; Thiele, L.; Laumanns, M.; and Zitzler, E. 2005. Scalable test problems for evolutionary multiobjective optimization. Springer.

Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255. IEEE.

Elkan, C. 2001. The Foundations of Cost-Sensitive Learning. In Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI'01, 973–978. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

Emmerich, M.; and Klinkenberg, J.-W. 2008. The computation of the expected improvement in dominated hypervolume of Pareto front approximations. Rapport technique, Leiden University, 34: 7–3.

Emmerich, M. T. M. 2005. Single- and Multi-objective Evolutionary Design Optimization Assisted by Gaussian Random Field Metamodels. PhD thesis, FB Informatik, University of Dortmund.

Giagkiozis, I.; and Fleming, P. J. 2015. Methods for multi-objective optimization: An analysis. Information Sciences, 293: 338–350.

Hakanen, J.; Chugh, T.; Sindhya, K.; Jin, Y.; and Miettinen, K. 2016. Connections of reference vectors and different types of preference information in interactive multiobjective evolutionary algorithms. In 2016 IEEE Symposium Series on Computational Intelligence (SSCI), 1–8.

Hakanen, J.; and Knowles, J. D. 2017. On Using Decision Maker Preferences with ParEGO. In Trautmann, H.; Rudolph, G.; Klamroth, K.; Schütze, O.; Wiecek, M.; Jin, Y.; and Grimme, C., eds., Evolutionary Multi-Criterion Optimization, 282–297. Cham: Springer International Publishing.

He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.

He, Y.; Sun, J.; Song, P.; Wang, X.; and Usmani, A. S. 2020. Preference-driven Kriging-based multiobjective optimization method with a novel multipoint infill criterion and application to airfoil shape design. Aerospace Science and Technology, 96: 105555.

Hernández-Lobato, J. M.; Hoffman, M. W.; and Ghahramani, Z. 2014. Predictive Entropy Search for Efficient Global Optimization of Black-box Functions. In Advances in Neural Information Processing Systems 27, 918–926. Curran Associates, Inc.

Houlsby, N.; Huszár, F.; Ghahramani, Z.; and Lengyel, M. 2011. Bayesian Active Learning for Classification and Preference Learning.
Ishibuchi, H.; Tsukamoto, N.; and Nojima, Y. 2008. Evolutionary many-objective optimization: A short review. In 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), 2419–2426.

Jerry Lin, Z.; Astudillo, R.; Frazier, P.; and Bakshy, E. 2022. Preference Exploration for Efficient Bayesian Optimization with Multiple Outcomes. In Camps-Valls, G.; Ruiz, F. J. R.; and Valera, I., eds., Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, 4235–4258. PMLR.

Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; and Liu, T.-Y. 2017. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 30.

Krizhevsky, A.; Hinton, G.; et al. 2009. Learning multiple layers of features from tiny images.

Kursawe, F. 1990. A variant of evolution strategies for vector optimization. In International Conference on Parallel Problem Solving from Nature, 193–197. Springer.

Li, M.; Zhang, X.; Thrampoulidis, C.; Chen, J.; and Oymak, S. 2021. AutoBalance: Optimized Loss Functions for Imbalanced Data. Advances in Neural Information Processing Systems, 34: 3163–3177.

Moro, S.; Rita, P.; and Cortez, P. 2012. Bank Marketing. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5K306.

Ozaki, R.; Ishikawa, K.; Kanzaki, Y.; Suzuki, S.; Takeno, S.; Takeuchi, I.; and Karasuyama, M. 2023. Multi-Objective Bayesian Optimization with Active Preference Learning. arXiv:2311.13460.

Palar, P. S.; Yang, K.; Shimoyama, K.; Emmerich, M.; and Bäck, T. 2018. Multi-Objective Aerodynamic Design with User Preference Using Truncated Expected Hypervolume Improvement. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO '18, 1333–1340. New York, NY, USA: Association for Computing Machinery.

Paria, B.; Kandasamy, K.; and Póczos, B. 2020. A flexible framework for multi-objective Bayesian optimization using random scalarizations. In Uncertainty in Artificial Intelligence, 766–776. PMLR.

Riihimäki, J.; and Vehtari, A. 2010. Gaussian processes with monotonicity information. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 645–652. JMLR Workshop and Conference Proceedings.

Schaffer, J. D.; et al. 1985. Multiple objective optimization with vector evaluated genetic algorithms. In Proceedings of an International Conference on Genetic Algorithms and Their Applications, 93–100.

Sui, Y.; Zoghi, M.; Hofmann, K.; and Yue, Y. 2018. Advancements in Dueling Bandits. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, 5502–5510. AAAI Press.

Taylor, K.; Ha, H.; Li, M.; Chan, J.; and Li, X. 2021. Bayesian Preference Learning for Interactive Multi-objective Optimisation. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO '21, 466–475. New York, NY, USA: Association for Computing Machinery.

Williams, C. K.; and Rasmussen, C. E. 2006. Gaussian processes for machine learning, volume 2. MIT Press, Cambridge, MA.

Zafar, M. B.; Valera, I.; Rodriguez, M. G.; and Gummadi, K. P. 2017. Fairness constraints: Mechanisms for fair classification. In Artificial Intelligence and Statistics, 962–970. PMLR.