# Projective Preferential Bayesian Optimization

Petrus Mikkola 1, Milica Todorović 2, Jari Järvi 2, Patrick Rinke 2, Samuel Kaski 1 3

1 Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland. 2 Department of Applied Physics, Aalto University, Espoo, Finland. 3 The University of Manchester, UK. Correspondence to: Petrus Mikkola, Samuel Kaski.

Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020. Copyright 2020 by the author(s).

Abstract

Bayesian optimization is an effective method for finding extrema of a black-box function. We propose a new type of Bayesian optimization for learning user preferences in high-dimensional spaces. The central assumption is that the underlying objective function cannot be evaluated directly, but instead a minimizer along a projection can be queried, which we call a projective preferential query. The form of the query allows for feedback that is natural for a human to give, and which enables interaction. This is demonstrated in a user experiment in which the user feedback comes in the form of optimal position and orientation of a molecule adsorbing to a surface. We demonstrate that our framework is able to find a global minimum of a high-dimensional black-box function, which is an infeasible task for existing preferential Bayesian optimization frameworks that are based on pairwise comparisons.

1. Introduction

Let f : X → R be a black-box function defined on a hypercube X = ∏_{d=1}^{D} [a_d, b_d], where D ≥ 2. Without loss of generality we assume that 0 ∈ X. The objective is to find a global minimizer

x* = argmin_{x ∈ X} f(x).    (1)

We assume, as in Preferential Bayesian Optimization (PBO; González et al., 2017), that f is not directly accessible. In PBO, queries to f can be done in pairs of points x, x' ∈ X, and the binary feedback indicates whether f(x) > f(x'). In contrast, in our work we assume that queries to f are done over the projection onto a projection vector ξ ∈ Ξ ⊂ X. The feedback is the optimal scalar projection, that is, the length α of the projection in the direction ξ. We assume that there are zero coordinates in ξ, and these coordinates are set to fixed values described by a reference vector x ∈ X. Formally, given a query (ξ, x), the feedback is obtained as a minimizer over the possible scalar projections,

α* = argmin_{α ∈ I_ξ} f(αξ + x),    (2)

where I_ξ ≡ {α ∈ R | αξ + x ∈ X}.

Figure 1. An illustration of a projective preferential query on molecular properties: in which rotation is the molecule most likely to bind to the surface? Here, x ∈ X describes the location and orientation of a molecule as a vector x = (X, Y, Z, a, b, c). In the figure, the projective preferential query (ξ, x) finds the optimal rotation along the horizontal plane, defined by the coordinate b. This corresponds to setting ξ_i = 0 for all coordinates i except the one corresponding to b; by rotating the molecule, the expert then gives the optimal value α for the coordinate b. The other coordinates are kept fixed (these are determined by x).

What are then natural use cases for such projective preferential queries? The main motivation comes from humans serving as the oracles. The form of the query enables efficient learning of user preferences over choice sets in which each choice has multiple attributes, and in particular over continuous choice sets.
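To make the query mechanics concrete, the following is a minimal sketch (not part of the paper or of its released code) of how a projective preferential query (2) can be answered when the oracle is a numerical test function rather than a human. The six-hump camel function, the dense grid over α, and all names here are illustrative assumptions.

```python
import numpy as np

def six_hump_camel(v):
    """Six-hump camel test function on [-3, 3] x [-2, 2] (illustrative oracle)."""
    x, y = v
    return (4 - 2.1 * x**2 + x**4 / 3) * x**2 + x * y + (-4 + 4 * y**2) * y**2

def projective_query(f, xi, x_ref, bounds, num_alpha=2001):
    """Answer a projective preferential query (xi, x_ref), eq. (2):
    return alpha* = argmin_{alpha in I_xi} f(alpha * xi + x_ref),
    where I_xi keeps alpha * xi + x_ref inside the box `bounds`."""
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    # Feasible interval I_xi for alpha (assumes xi has at least one non-zero entry).
    with np.errstate(divide="ignore", invalid="ignore"):
        r_lo = np.where(xi != 0, (lo - x_ref) / xi, -np.inf)
        r_hi = np.where(xi != 0, (hi - x_ref) / xi, np.inf)
    a_min = np.max(np.minimum(r_lo, r_hi))
    a_max = np.min(np.maximum(r_lo, r_hi))
    alphas = np.linspace(a_min, a_max, num_alpha)   # dense grid stands in for the oracle's search
    values = [f(a * xi + x_ref) for a in alphas]
    return alphas[int(np.argmin(values))]

bounds = [(-3.0, 3.0), (-2.0, 2.0)]
xi = np.array([0.0, 1.0])      # query along the second coordinate only
x_ref = np.array([0.1, 0.0])   # the zero coordinates of xi are fixed by the reference vector
alpha_star = projective_query(six_hump_camel, xi, x_ref, bounds)
print(alpha_star)              # the scalar feedback for this query
```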
An important application is knowledge elicitation from trained professionals (doctors, physicists, etc.). For example, we may learn a material scientist's preferences, that is, insight based on prior knowledge and experience, over molecular translations and orientations as a molecule adsorbs to a surface. In this case, a projective preferential query could correspond to finding an optimal rotation (see Figure 1), which the scientist can easily give by rotating the molecule in a visual interface.

Probabilistic preference learning is a relatively new topic in machine learning research but has a longer history in econometrics and psychometrics (McFadden, 1981; 2001; Stern, 1990; Thurstone, 1927). A wide range of applications of these models exists, for instance in computer graphics (Brochu et al., 2010), expert knowledge elicitation (Soufiani et al., 2013), revenue management systems of airlines (Carrier & Weatherford, 2015), rating systems, and almost any application that involves modeling user preferences.

An established probabilistic model is the Thurstone-Mosteller (TM) model, which describes a process of pairwise comparisons (Thurstone, 1927; Mosteller, 1951). In the preference learning context, models based on the TM model can be applied to learning preferences from pairwise comparison feedback (e.g. Chu & Ghahramani, 2005). An extension of this research into the interactive learning setting is studied by, among others, Brochu et al. (2008) and González et al. (2017). All these approaches resort to pairwise feedback. Koyama et al. (2017) proposed an extension of Bayesian optimization for learning optimal parameter values for visual design tasks by letting a user give feedback as a slider manipulation. They considered only two pairwise comparisons per slider: they tested including more sampling points but did not observe any significant improvement in the optimization behavior.

A drawback of most preference learning frameworks is their incapability to handle high-dimensional input spaces. The underlying reason is a combinatorial explosion in the number of possible comparisons with respect to the number of dimensions D, O(K^{2D}), given K grid points per dimension. This implies that a single pairwise comparison has low information content in high-dimensional spaces. This problem was mitigated by González et al. (2017) by capturing the correlations among pairs of queries (duels). However, it is still difficult to scale that method to high-dimensional spaces, say higher than 2-dimensional (see Section 4). Furthermore, the numerical computations become infeasible in a high-dimensional setting, especially the optimization of an acquisition function or finding a Condorcet winner.

In this paper, we introduce a Bayesian framework, which we call Projective Preferential Bayesian Optimization (PPBO), that scales to high-dimensional input spaces. A main reason is that the information content of a projective preferential query is much higher than that of a pairwise preferential query: a projective preferential query is equivalent to infinitely many pairwise comparisons along a projection. An important consequence is that with projective preferential queries, the user's workload in answering the queries is considerably reduced. Source code is available at https://github.com/AaltoPML/PPBO.
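For a concrete sense of the scale of the comparison explosion mentioned above, a worked example with grid size and dimension chosen purely for illustration:

```latex
% Illustrative values: K = 10 grid points per dimension, D = 6 dimensions.
K^D = 10^6 \text{ candidate points}, \qquad
\binom{K^D}{2} = \frac{K^D (K^D - 1)}{2} \approx \tfrac{1}{2} K^{2D} = 5 \times 10^{11} \text{ possible duels},
% whereas a single projective preferential query already returns the preferred point
% along an entire one-dimensional slice of the same space.
```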
2. Learning preferences from projective preferential feedback

In this section we introduce a Bayesian framework capable of dealing with projective preferential data. A central idea is to model the user's utility function, that is f, as a Gaussian process, as first proposed by Chu & Ghahramani (2005). We extend this line of study to allow projective preferential queries, by deriving a tractable likelihood, proposing a method to approximate it, and introducing four acquisition criteria for enabling interactive learning in this setting. In this paper, for convenience, we formulate the method for maximization instead of minimization as in (2), without loss of generality.

2.1. Likelihood

Our probabilistic model of user preferences is built upon Thurstone's law of comparative judgement (Thurstone, 1927). A straightforward way to formalize this would be to assume that pairwise comparisons are corrupted by Gaussian noise: x ≻ x' if and only if f(x) + ε > f(x') + ε', where the latent function f is a utility function that characterizes the user preferences described by the preference relation ≻. The standard assumption is that ε and ε' are independently and identically distributed Gaussians. Here, we deviate slightly from this assumption: given two alternatives (αξ + x), (βξ + x) ∈ X, we assume that αξ + x ≻ βξ + x if and only if f(αξ + x) + W(α) > f(βξ + x) + W(β), where W is a Gaussian white noise process with zero mean and autocorrelation E(W(t)W(t + τ)) = σ² if τ = 0, and zero otherwise.

We would like to find the likelihood of an observation (α, (ξ, x)) that corresponds to uncountably many pairwise comparisons: αξ + x ≻ βξ + x for all β ≠ α. For each comparison we condition on W(α) (more details in the Supplementary material),

P(αξ + x ≻ βξ + x | W(α) = w) = 1 − Φ( [f(βξ + x) − f(αξ + x) − w] / σ ),

where Φ is the cumulative distribution function of the standard normal distribution. For a single comparison we have

P(αξ + x ≻ βξ + x) = 1 − [Φ ∗ φ]( [f(βξ + x) − f(αξ + x)] / σ ),

where φ is the probability density of the standard normal distribution and ∗ is the convolution operator. For infinitely many comparisons, we first consider a finite number of comparisons m. By their independence, we have

P(αξ + x ≻ β₁ξ + x, ..., αξ + x ≻ β_mξ + x) = ∏_{j=1}^{m} ( 1 − [Φ ∗ φ]( [f(β_j ξ + x) − f(αξ + x)] / σ ) ).

By letting the number of points m in an increasing sequence β₁, ..., β_m of a partition of the interval I_ξ \ {α} approach infinity, we can interpret this as a Volterra (product) integral, which gives the likelihood of a single observation as proportional to

exp( − ∫_{I_ξ} [Φ ∗ φ]( [f(βξ + x) − f(αξ + x)] / σ ) dβ ).

First, we introduce notation. Assume that N projective preferential queries have been performed and gathered into a dataset D = {(α_i, (ξ^i, x^i))}_{i=1}^{N}. The joint log-likelihood of the dataset D, denoted as L(D|f), then takes the form

L(D|f) = − ∑_{i=1}^{N} ∫_{I_{ξ^i}} [Φ ∗ φ]( [f(βξ^i + x^i) − f(α_i ξ^i + x^i)] / σ ) dβ.

For every data instance (α_i, (ξ^i, x^i)), we also consider a sequence of pseudo-observations {(β_j^i ξ^i, x^i)}_{j=1}^{m}. Technically, the pseudo-observations are Monte-Carlo samples needed for integrating the likelihood. The latent function values evaluated at those points are gathered into a vector,

f^(i) ≡ ( f(α_i ξ^i + x^i), {f(β_j^i ξ^i + x^i)}_{j=1}^{m} ).

The latent function vector over all points is formed by concatenating these: f ≡ (f^(i))_{i=1}^{N}.
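As a numerical illustration of the single-observation likelihood (a minimal sketch, not the authors' implementation): for the standard normal Φ and φ used above, the convolution has the closed form [Φ ∗ φ](z) = Φ(z/√2), which is used below as a sanity check for the Gauss-Hermite quadrature mentioned in Section 2.3. The toy utility, the value of σ, and the uniform pseudo-observations are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def phi_conv_cdf(z, n_nodes=32):
    """Gauss-Hermite approximation of [Phi * phi](z) = int Phi(z - w) phi(w) dw.
    For standard normal Phi and phi this equals Phi(z / sqrt(2)) exactly."""
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    return float(np.sum(w * norm.cdf(z - np.sqrt(2.0) * t)) / np.sqrt(np.pi))

def nll_single_observation(f, alpha, xi, x_ref, betas, sigma=0.1):
    """Monte-Carlo approximation of the single-observation negative log-likelihood
    int_{I_xi} [Phi * phi]((f(beta xi + x) - f(alpha xi + x)) / sigma) d beta,
    with `betas` standing in for the pseudo-observations and |I_xi| taken as 1
    (the normalization adopted later in Section 2.3)."""
    f_alpha = f(alpha * xi + x_ref)
    terms = [phi_conv_cdf((f(b * xi + x_ref) - f_alpha) / sigma) for b in betas]
    return float(np.mean(terms))

# Sanity check of the quadrature against the closed form.
print(phi_conv_cdf(0.7), norm.cdf(0.7 / np.sqrt(2.0)))

# Toy usage: a 2-D quadratic utility and one query along the second coordinate.
utility = lambda v: -np.sum((v - np.array([0.3, 0.6])) ** 2)
xi, x_ref = np.array([0.0, 1.0]), np.array([0.3, 0.0])
betas = np.random.default_rng(0).uniform(0.0, 1.0, size=25)
print(nll_single_observation(utility, alpha=0.6, xi=xi, x_ref=x_ref, betas=betas))
```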
The user's utility function f is modelled as a Gaussian process (Rasmussen & Williams, 2005). The GP model fits this objective ideally, since it is flexible (non-parametric) and can conveniently handle uncertainty (predictive distributions can be derived analytically). In particular, it allows us to have insight into those regions of the space X in which either we are uncertain about the user preferences due to lack of data, or the user gives inconsistent feedback. A possible reason for the latter is that one of the preference axioms, transitivity or completeness, is violated in those regions. A weak preference relation ≽ is complete if for all x, y ∈ X, either x ≽ y or y ≽ x holds; that is, a user is able to reveal their preferences over all possible pairwise comparisons. Similarly, ≽ is transitive if for any x, y, z ∈ X, (x ≽ y and y ≽ z) implies x ≽ z; that is, a user has consistent preferences. This, together with the continuity of ≽, guarantees the existence of a real-valued continuous utility function that represents ≽ (Debreu, 1954).

Thus, we assume, as Chu & Ghahramani (2005), that the prior of the utility function follows a zero-mean Gaussian process,

p(f) = (2π)^{-n/2} |Σ|^{-1/2} exp( −½ f^⊤ Σ^{-1} f ),

where n is the length of the latent vector f, and the ij-th element of the covariance matrix is determined by a kernel k as Σ_ij = k(x^i, x^j). Throughout the paper, we assume the squared exponential kernel k(x, x') = σ_f² exp( −‖x − x'‖² / (2l²) ), where σ_f and l are hyperparameters.

2.3. Posterior

For the sake of simplicity, we use the Laplace approximation for the posterior distribution. A maximum a posteriori (MAP) estimate is needed for that,

argmax_f P(f|D) = argmax_f ( log P(f) + L(D|f) ) = argmax_f T(f),

where we denote the functional (log-scaled posterior, up to an additive constant)

T(f) ≡ −½ f^⊤ Σ^{-1} f − ∑_{i=1}^{N} ∫_{I_{ξ^i}} [Φ ∗ φ]( [f(βξ^i + x^i) − f(α_i ξ^i + x^i)] / σ ) dβ.

The convolution Φ ∗ φ can be efficiently approximated by Gauss-Hermite quadrature. The outer integral is approximated as a Monte-Carlo integral,

∫_{I_ξ} [Φ ∗ φ]( [f(βξ + x) − f(αξ + x)] / σ ) dβ ≈ (|I_ξ| / m) ∑_{j=1}^{m} [Φ ∗ φ]( [f(β_j ξ + x) − f(αξ + x)] / σ ),

where |I_ξ| denotes the length of the interval I_ξ, and the pseudo-observations β_j ξ, for j = 1, ..., m, are sampled from a suitable distribution. Our choice is to use a family of truncated generalized normal (TGN) distributions, since it provides a continuous transformation from the uniform distribution to the truncated normal distribution, such that the locations of the distributions can be specified. The idea is to concentrate pseudo-observations more densely around the optimal value αξ as the number of queries increases. For more details, see the Supplementary material.

For notational convenience, define

Δ_{i,j}(f) ≡ [ f(β_j^i ξ^i + x^i) − f(α_i ξ^i + x^i) ] / σ.

If the domain is normalized to X = ∏_{d=1}^{D} [0, 1], and the projections are normalized to ξ = ξ/‖ξ‖, then |I_ξ| = 1. Hence, under this normalization, the functional T can be approximated as

T(f) ≈ −½ f^⊤ Σ^{-1} f − (1/m) ∑_{i=1}^{N} ∑_{j=1}^{m} [Φ ∗ φ]( Δ_{i,j}(f) ).

The MAP estimate can be efficiently solved by a second-order iterative optimization algorithm, since the gradient and the Hessian of T can be easily derived. The Laplace approximation of the posterior amounts to the second-order Taylor approximation of the log-posterior around the MAP estimate. On the ordinary (non-log) scale, this reads

P(f|D) ≈ P(f_MAP|D) exp( −½ (f − f_MAP)^⊤ H (f − f_MAP) ),

where the matrix H is the negative Hessian of the log-posterior at the MAP estimate, H ≡ −∇∇ log P(f|D)|_{f = f_MAP} = Σ^{-1} + Λ.¹ In other words, the posterior distribution is approximated as a multivariate normal distribution with mean f_MAP and covariance matrix (Σ^{-1} + Λ)^{-1}.

¹ We denote the matrix of second partial derivatives evaluated at the MAP estimate as Λ ≡ (1/m) ∑_{i=1}^{N} ∑_{j=1}^{m} ∂²/(∂f ∂f^⊤) [Φ ∗ φ]( Δ_{i,j}(f) ) |_{f = f_MAP}.

2.4. Predictive distribution

Based on the well-known properties of the multivariate Gaussian distribution, the predictive distribution of f is also Gaussian. Given test locations (y^(1), ..., y^(M)), consider the N × M matrix K ≡ [k(y^(j), x^(i))]_{ij}. The predictive mean and the predictive covariance at the test locations are (for more details, see Rasmussen & Williams (2005) or Chu & Ghahramani (2005))

μ_pred = K^⊤ Σ^{-1} f_MAP,
Σ_pred = Σ₀ − K^⊤ (Σ + Λ^{-1})^{-1} K,

where Σ₀ is the covariance matrix of the test locations.
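To make the Laplace-approximated predictive equations concrete, here is a minimal NumPy sketch (not the authors' implementation); the kernel hyperparameters and the placeholder inputs are illustrative assumptions, and f_MAP and Λ are taken as given rather than computed by the MAP optimization described above.

```python
import numpy as np

def se_kernel(A, B, sigma_f=1.0, ell=0.3):
    """Squared exponential kernel k(x, x') = sigma_f^2 exp(-||x - x'||^2 / (2 l^2))."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-0.5 * sq / ell**2)

def laplace_predict(X_train, f_map, Lam, Y_test, sigma_f=1.0, ell=0.3):
    """Predictive mean and covariance of the Laplace-approximated posterior:
    mu_pred = K^T Sigma^{-1} f_MAP,  Sigma_pred = Sigma_0 - K^T (Sigma + Lam^{-1})^{-1} K."""
    jitter = 1e-8 * np.eye(len(X_train))
    Sigma = se_kernel(X_train, X_train, sigma_f, ell) + jitter
    K = se_kernel(X_train, Y_test, sigma_f, ell)       # training-by-test cross-covariance
    Sigma0 = se_kernel(Y_test, Y_test, sigma_f, ell)
    mu = K.T @ np.linalg.solve(Sigma, f_map)
    middle = np.linalg.inv(Sigma + np.linalg.inv(Lam + jitter))
    return mu, Sigma0 - K.T @ middle @ K

# Placeholder inputs, only to exercise the equations (not real PPBO data).
rng = np.random.default_rng(0)
X_train = rng.uniform(size=(12, 2))            # latent-function input locations
f_map = rng.normal(size=12)                    # stand-in for the MAP estimate
Lam = np.diag(rng.uniform(0.5, 2.0, size=12))  # stand-in for the likelihood curvature term
Y_test = rng.uniform(size=(5, 2))
mu_pred, Sigma_pred = laplace_predict(X_train, f_map, Lam, Y_test)
print(mu_pred.shape, Sigma_pred.shape)         # (5,), (5, 5)
```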
3. Sequential learning by projective preferential query

In this section we discuss how to select the next projective preferential query (ξ, x). We choose the next query as a maximizer of an acquisition function α(ξ, x); for instance, we consider a modified version of the expected improvement acquisition function (Jones et al., 1998). The optimization (ξ, x)_next = argmax_{(ξ,x)} α(ξ, x) is carried out by using Bayesian optimization (more details in the Supplementary material).

If the oracle is a human, this allows us to learn user preferences in an iterative loop, making PPBO interactive. However, this interesting special case, in which f is the utility function of a human, also brings forth issues due to bounded rationality. We apply here the following narrow definition of this more general concept (see Simon, 1990): bounded rationality is the idea that users give feedback that reflects their preferences, but within the limits of the information available to them and their mental capabilities.

3.1. The effects of bounded rationality on the optimal next query

If the oracle is a human, it is important to realize that the optimal next (ξ, x) is not solely the one that optimally balances the exploration-exploitation trade-off, as it is for a perfect oracle; the optimal (ξ, x) also takes into account human cognitive capabilities and limitations. For instance, the more non-zero coordinates there are in ξ, the greater the cognitive burden on a human user, and the harder it becomes to give useful feedback. Thus, if there is a human in the loop, the choice of ξ should take into account both the optimization needs and what types of queries are convenient for the user.

The projective preferential feedback (2) may not be single-valued, or even well-defined, for all ξ ∈ Ξ if the oracle is a human. For instance, the user may not be able to explicate their preferences with respect to the d-th attribute; that is, the preferences do not satisfy the completeness axiom. Formally, this means that if ξ = e_d (the d-th standard unit vector), then for some x ∈ X, argmax_{α ∈ I_ξ} f(αξ + x) is multi-valued, a random variable, or not well-defined, depending on how we interpret the scenario in which the user should say "I do not know" but instead gives an arbitrary feedback. Fortunately, this incompleteness can be easily handled when a GP is used for modelling f; it just implies that the posterior variance is high along the dimension d. A possible solution would be to allow the answer "I do not know", and to design an acquisition function that is capable of discovering and avoiding those regions of the space X where the user gives inconsistent feedback due to any source of bounded rationality. This challenge is left for future research. It is noteworthy that the acquisition function we introduce next performed well in the user experiment covered in Section 5.
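Schematically, the interactive loop described at the start of this section looks as follows. This is only a skeleton under illustrative assumptions: fit_gp_posterior and acquisition stand in for the model of Section 2 and for any of the acquisition strategies of this section, and projective_query / six_hump_camel are reused from the first sketch.

```python
import numpy as np

def ppbo_loop(oracle, bounds, acquisition, fit_gp_posterior, budget=20, n_cand=64, rng=None):
    """Schematic PPBO loop: query -> feedback -> model update -> next query.
    oracle(xi, x_ref) returns the scalar feedback of eq. (2);
    acquisition(model, xi, x_ref) scores a candidate query;
    fit_gp_posterior(data) refits the preference model on the data so far."""
    rng = np.random.default_rng() if rng is None else rng
    D = len(bounds)
    data = []
    # Initial queries: one coordinate projection per dimension, random reference vector.
    for d in range(D):
        xi = np.eye(D)[d]
        x_ref = np.array([rng.uniform(lo, hi) for lo, hi in bounds]) * (1.0 - xi)
        data.append((oracle(xi, x_ref), xi, x_ref))
    model = fit_gp_posterior(data)
    for _ in range(budget - D):
        # Candidate search kept deliberately naive: random coordinate projections,
        # scored by the chosen acquisition function.
        cands = [(np.eye(D)[rng.integers(D)],
                  np.array([rng.uniform(lo, hi) for lo, hi in bounds])) for _ in range(n_cand)]
        xi, x_ref = max(cands, key=lambda c: acquisition(model, *c))
        x_ref = x_ref * (1.0 - xi)                  # zero out the queried coordinate
        data.append((oracle(xi, x_ref), xi, x_ref))
        model = fit_gp_posterior(data)
    return model, data

# Toy run with stand-ins (random scores, no real model) just to exercise the control flow.
toy_bounds = [(-3.0, 3.0), (-2.0, 2.0)]
_, toy_data = ppbo_loop(
    oracle=lambda xi, x: projective_query(six_hump_camel, xi, x, toy_bounds),
    bounds=toy_bounds,
    acquisition=lambda model, xi, x: np.random.rand(),
    fit_gp_posterior=lambda data: None,
    budget=6,
)
print(len(toy_data))   # 6 queries answered
```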
3.2. Expected improvement by projective preferential query

We define the expected improvement by projective preferential query at the n-th iteration as

EI_n(ξ, x) ≡ E_n[ max( max_{α ∈ I_ξ} f(αξ + x) − μ*_n, 0 ) ],    (3)

where μ*_n denotes the highest value of the predictive posterior mean, and the expectation is conditioned on the data up to the n-th iteration. The maximum over α models the anticipated feedback. EI_n can be approximated as a Monte-Carlo integral (up to a multiplicative constant that does not depend on (ξ, x)),

EI_n(ξ, x) ≈ (1/K) ∑_{k=1}^{K} max( max_{α ∈ I_ξ} f^k(αξ + x) − μ*_n, 0 ),    (4)

where max_{α ∈ I_ξ} f^k(αξ + x) is approximated by using discrete² Thompson sampling as described in Hernández-Lobato et al. (2014). Discrete Thompson sampling draws a finite sample from the GP posterior distribution, and then returns the maximum over the sample. The steps needed to approximate EI_n are summarized in Algorithm 1.

Algorithm 1: Approximate EI_n(ξ, x)
  input: (ξ, x), K ≥ 1, J ≥ 1
  1. Compute (Σ + Λ^{-1})^{-1}
  for k = 1, 2, ..., K do
    2. Draw (β_j)_{j=1}^{J}
    3. Draw ( f(β_j ξ + x) )_{j=1}^{J} from the predictive distribution N(μ_pred, Σ_pred)
    4. z_k ← max_j ( f(β_j ξ + x) )_{j=1}^{J}
  end for
  output: (1/K) ∑_{k=1}^{K} max( z_k − μ*_n, 0 )

The bottlenecks are the first and the third steps. In the third step, a predictive covariance matrix of size J × J needs to be computed, and then a sample from the multivariate normal distribution needs to be drawn. Hence, the time complexity of Algorithm 1 is O(N³m³ + KN²m²J + KNmJ² + KJ³), where the terms come from a matrix inversion (the first step), two matrix multiplications, and a Cholesky decomposition, respectively. Recall that N, m, K and J refer to the number of observations, pseudo-observations, Monte-Carlo samples, and grid points, respectively.

² Another alternative is to consider continuous Thompson sampling to draw a continuous sample path from the GP model, and then maximize it. The method is based on Bochner's theorem and the equivalence between a Bayesian linear model with random features and a Gaussian process. For more details, see Hernández-Lobato et al. (2014).

3.3. Pure exploitation and exploration

In the experiments we use pure exploitation and pure exploration as baselines. A natural interpretation of pure exploitation in our context is to select the next query (ξ, x) such that ξ + x = x*_n ≡ argmax_{x' ∈ X} μ_n(x'), where μ_n(x') is the posterior mean of the GP model at location x', given all the data D_n gathered so far. We interpret pure exploration as maximization of the GP variance of a query (ξ, x) given the anticipated feedback. That is, the purely explorative acquisition strategy maximizes the acquisition function

Explore(ξ, x) ≡ V_n[ max_{α ∈ I_ξ} f(αξ + x) ].    (5)

In practice, (5) is approximated by Monte-Carlo integration and discrete Thompson sampling in the same vein as in Algorithm 1.
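As a concrete illustration of Algorithm 1 (again a sketch under placeholder assumptions, not the authors' implementation), the acquisition value of a candidate (ξ, x) can be estimated by drawing K joint posterior samples of f at the J projected points β_jξ + x and averaging the improvements of their maxima; the predictive mean and covariance could come, for example, from the laplace_predict sketch above.

```python
import numpy as np

def approximate_ei(mu_pred, cov_pred, mu_star, K=50, rng=None):
    """Monte-Carlo approximation of EI_n(xi, x) in the spirit of Algorithm 1.
    mu_pred, cov_pred: predictive mean/covariance of f at the J projected points;
    mu_star: current highest value of the predictive posterior mean."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 3: K joint samples of f along the projection (discrete Thompson sampling).
    samples = rng.multivariate_normal(mu_pred, cov_pred, size=K)
    # Step 4: the anticipated feedback is the maximum of each sampled path.
    z = samples.max(axis=1)
    # Output: average improvement over the incumbent.
    return float(np.maximum(z - mu_star, 0.0).mean())

# Toy usage with placeholder predictive quantities.
J = 40
mu_pred = np.sin(np.linspace(0.0, 3.0, J))
dist = np.subtract.outer(np.arange(J), np.arange(J)) / 5.0
cov_pred = 0.05 * np.exp(-0.5 * dist**2)
print(approximate_ei(mu_pred, cov_pred, mu_star=0.8))
```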
3.4. Preferential coordinate descent (PCD)

The fourth acquisition strategy corresponds to the interesting special case where ξ = e_d (the d-th standard unit vector), and the coordinates d are rotated in a cyclical order over the queries. The reference vector x = (x_1, ..., x_{d−1}, 0, x_{d+1}, ..., x_D) can be chosen in several ways, but it is natural to consider an exploitative strategy in which x is set to x*_n, except for the d-th coordinate, which is set to zero. We call this acquisition strategy Preferential Coordinate Descent (PCD), since PPBO with PCD acquisition is closely related to a coordinate descent algorithm that successively minimizes an objective function along coordinate directions.

The PPBO method with PCD acquisition (PPBO-PCD) differs from classical coordinate descent in two ways. First, PPBO-PCD assumes that direct function evaluations are not possible, but projective preferential queries are. Second, it models the black-box function f (as a GP), whereas classical coordinate descent does not. This makes PPBO-PCD able to take advantage of past queries from every one-dimensional optimization. When comparing to other acquisition strategies, we show that PCD performs well in numerical experiments (when f is not a utility function but a numerical test function). This agrees with results in the optimization literature; for instance, if f is pseudoconvex with a continuous gradient, and X is compact and convex with a nice boundary, then the coordinate descent algorithm converges to a global minimum (Spall, 2012, Corollary 3.1). However, PCD may not perform so well in high-dimensional spaces, since it cannot query in between the dimensions. For instance, the expected improvement by projective preferential query outperformed PCD on a 20D test function (see Section 4), since it allows querying arbitrary projections.

4. Numerical experiments

In this section we demonstrate the efficiency of the PPBO method in high-dimensional spaces, and experiment with various acquisition strategies in numerical experiments on simulated functions. The goal is to find a global minimum of a black-box function f by querying it either through (i) pairwise comparisons or (ii) projective preferential queries. For (i) we use the PBO method of González et al. (2017), which is state of the art among Gaussian process preference learning frameworks that are based on pairwise comparisons. For (ii) we use the PPBO method as introduced in this paper. The four different acquisition strategies introduced in Section 3 are compared against a baseline that samples a random (ξ, x). For the PBO method, we consider a random and a dueling Thompson sampling acquisition strategy. In total, seven different methods are compared: expected improvement by projective preferential query (PPBO-EI), pure exploitation (PPBO-EXT), pure exploration (PPBO-EXR), preferential coordinate descent (PPBO-PCD), random (PPBO-RAND), and, for PBO, random (PBO-RAND) and a variant of dueling-Thompson sampling (PBO-DTS). For more details, see the Supplementary material.

For f we consider four different test functions: Six-hump-camel2D, Hartmann6D, Levy10D and Ackley20D.³ We add a small Gaussian error term to the test function outputs. There are as many initial queries as there are dimensions in a test function. The i-th initial query corresponds to ξ = e_i, that is, to the i-th coordinate projection, and the reference vector x is drawn uniformly at random. We consider a total budget of 100 queries.

Figure 2. Convergence of different methods across 4 objective functions. The results are averaged over 25 different random initializations (for PBO we ran 100 seeds). The vertical axis represents the value of the true objective function at the best guess: x_opt = argmax_x μ_n(x) in PPBO, and x_opt = the Condorcet winner in PBO. The horizontal axis represents the number of projective preferential queries in PPBO, and the number of pairwise comparisons in PBO. The black horizontal dashed line indicates the global minimum of the objective function (a small Gaussian noise is added to the function values). The standard deviations are depicted by the vertical lines.

The results are depicted in Figure 2.⁴ PPBO-PCD obtained the best performance on three of the four test functions. On the high-dimensional test function Ackley20D, PPBO-EI performed best. Unsurprisingly, all PPBO variants clearly outperformed all PBO variants. Since the performance gap between PPBO-RAND and PBO-RAND is so large, we conclude that, from the optimization perspective, whenever a projective preferential query is possible, a PPBO-type of approach should be preferred over an approach that is based on pairwise comparisons. However, we note that it is better to think of PPBO as a complement, not as a substitute, for PBO. In applications, pairwise comparisons may be preferred, for instance, if they are more convenient to a user and the underlying choice space is low-dimensional.

³ https://www.sfu.ca/~ssurjano/optimization.html
⁴ All experiments on each test function were run on a computing infrastructure of 24x Xeon Gold 6148 2.40GHz cores and 72GB RAM. The longest experiment (Ackley20D) took 24h in total.
To illustrate the low information content of pairwise comparisons, we ran a test on the Six-hump-camel2D function. We trained a GP classifier with 2000 random queries (duels), and found a Condorcet winner (see González et al., 2017) by maximizing the soft-Copeland score (33 × 33 Monte-Carlo samples used for the integration) with Bayesian optimization (500 iterations with 10 optimization restarts). This took 41 minutes on an 8th-generation Intel i5 CPU, and the distance to a true global minimizer was ‖x_c − x_true‖ = ‖(0.1770, −0.0488) − (0.0898, −0.7126)‖ ≈ 0.67; the corresponding function value was 0.1052, compared to the true global minimum value of −1.0316. In contrast, PPBO-RAND reached this level of accuracy within the first few queries, as seen from Figure 2.

5. User experiment

In this section we demonstrate the capability of PPBO to correctly and efficiently encode user preferences from projective preferential feedback.

We consider a materials science problem of a single organic molecule adsorbing onto an inorganic surface. This is a key step in understanding the structure at the interface between organic and inorganic films inside electronic devices, coatings, solar cells and other materials of technological relevance. The molecule can bind in different adsorption configurations, altering the electronic properties at the interface and affecting device performance. Exploring the structure and property phase space of materials with accurate but costly computer simulations is a difficult task. Our objective is to find the most stable surface adsorption configuration through human intuition and subsequent computer simulations. The optimal configuration is the one that minimises the computed adsorption energy.

Our test case is the adsorption of a non-symmetric, bulky molecule, camphor, on the flat surface of a (111)-plane-terminated Cu slab. Some understanding of chemical bonding is required to infer correct adsorption configurations. The user is asked to consider the adsorption structure as a function of molecular orientation and translation near the surface. These are represented with 6 physical variables: angles α, β, γ of molecular rotation around the X, Y, Z Cartesian axes (in the range [0, 360] deg.), and distances x, y, z of translation above the surface (with lattice vectors following the translational symmetry of the surface). The internal structures of the molecule and the surface were kept fixed, since little structural deformation is expected with adsorption. A similar organic/inorganic model system and experiment scenario was previously employed to detect the most stable surface structures with autonomous BO, given the energies of sampled configurations (Todorović et al., 2019).

In this interactive experiment, the users encode their preferred adsorption geometry as a location in the 6-dimensional phase space. We employ the quantum-mechanical atomistic simulation code FHI-aims (Blum et al., 2009) to i) compute the adsorption energy E of this preferred choice, and ii) optimise the structure from this initial position to find the nearest local energy minimum in phase space, E*. We also consider the number of optimization steps N needed to reach the nearest minimum as a measure of quality of the initial location.
There are four different test users: two materials science experts (human: a PhD student and an experienced researcher, both of whom know the optimal solution), a non-expert (human), and a random bot (computer). The hypothesis is that the materials science experts should obtain structures associated with lower-energy minimum points.

We consider only coordinate projections, that is, ξ ∈ {e_1, ..., e_6}. In other words, we let the user choose the optimal value for one dimension at a time. The total number of queries was 24, of which 6 were initial queries. The i-th initial query corresponded to ξ = e_i, that is, to the i-th coordinate projection. The initial values of the reference coordinate vector x were fixed to the same value across all user sessions. For acquisition, we used the expected improvement by projective preferential query. Since we allowed only coordinate projections for ξ, we first selected ξ_{n+1} = argmax_{ξ ∈ {e_1,...,e_6}} ∫ EI_n(ξ, x) dx, and then either x_{n+1} = argmax_x μ_n(x) (EI-EXT), or the next x_{n+1} was drawn uniformly at random (EI-RAND). The computer bot gave random values to the queries; to provide some consistency for the bot, x_{n+1} was selected by maximizing a standard expected improvement function (EI-EI).

The results are summarized in Table 1.

Table 1. Results of the user experiment. The adsorption energies were computed using density functional theory (DFT) methods. The energies represent the adsorption energies of the relaxed structures corresponding to the most preferred configurations.

User         Acq. of (ξ, x)   E (eV)    E* (eV)   N
Expert 1     EI-RAND          -0.454    -1.023    62
Expert 1     EI-EXT           -0.371    -1.029    39
Expert 2     EI-RAND          -0.611    -1.007    37
Expert 2     EI-EXT           -0.564    -1.030    45
Non-expert   EI-RAND          -0.481    -0.771    32
Non-expert   EI-EXT           -0.511    -0.762    56
Bot          EI-EI            -0.365    -0.643    70
Bot          EI-EI            -0.612    -0.753    63
Bot          EI-EI             2.231    -0.959    127
Bot          EI-EI            -0.216    -0.783    88

Our first observation is that PPBO can distinguish between the choices made by a human and a computer bot. Human choices pinpoint atomic arrangements that are close to nearby local minima (small N), while the random bot's choices are far less reasonable and require much subsequent computation to optimise the structures. For all human users, the preferred molecular structures were placed somewhat high above the surface, which led to relatively high E values. With this query arrangement, it appears the z variable was the most difficult one to estimate visually. Human-preferred molecular orientations were favourable, so the structures were optimised quickly (few N steps).

The quality of a user's preferences is best judged by the depth of the nearest energy basin, denoted by E*. It describes the energy of the structure preferred by a user. Here, there is a marked divide by expertise. The structures refined from the choices of the bot and the non-expert are local minima of adsorption, characterised by weak dispersive interactions.
The experts' choices led to two low-energy structure types that compete for the global minimum, and feature strong chemical bonding of the O atom to the Cu surface. Thus, the data (Table 1, column E*: rows 1-4 versus rows 5-10) supports our hypothesis: the materials science experts do obtain structures associated with lower-energy minimum points. The findings above demonstrate that the PPBO framework was able to encode the expert knowledge described via preferences. However, since there are only 10 samples, further work will be needed to validate the results.

6. Conclusions

In this paper we have introduced a new Bayesian framework, PPBO, for learning user preferences from a special kind of feedback, which we call projective preferential feedback. The feedback is equivalent to a minimizer along a projection. Its form is especially applicable in a human-in-the-loop context. We demonstrated this in a user experiment in which the user gives the feedback as an optimal position or orientation of a molecule adsorbing to a surface. PPBO was capable of encoding the user preferences in this case.

We demonstrated that PPBO can deal with high-dimensional spaces where existing preferential Bayesian optimization frameworks that are based on pairwise comparisons, such as IBO (Brochu, 2010) or PBO (González et al., 2017), have difficulties operating. In the numerical experiments, the performance gap between PPBO and PBO was so large that we conclude: whenever a projective preferential query is possible, a PPBO-type of approach is preferable from the optimization perspective. However, we note that it is better to think of PPBO as a complement, not as a substitute, for PBO. In applications, pairwise comparisons may be preferred, for instance, if they are more convenient to a user.

In summary, if it is possible to pose a projective preferential query, then PPBO provides an efficient way for preference learning in high-dimensional problems. In particular, PPBO can be used for efficient expert knowledge elicitation in high-dimensional settings, which are important in many fields.

Acknowledgements

This research was supported by the Academy of Finland (Flagship programme: Finnish Center for Artificial Intelligence FCAI; and grants 320181, 319264, 313195, 292334 and 316601). Computational resources were provided by the CSC IT Center for Science, and the Aalto Science-IT Project.

References

Blum, V., Gehrke, R., Hanke, F., Havu, P., Havu, V., Ren, X., Reuter, K., and Scheffler, M. Ab initio molecular simulations with numeric atom-centered orbitals. Computer Physics Communications, 180(11):2175-2196, 2009.

Brochu, E. Interactive Bayesian optimization: learning user preferences for graphics and animation. PhD thesis, University of British Columbia, 2010.

Brochu, E., de Freitas, N., and Ghosh, A. Active preference learning with discrete choice data. In Platt, J. C., Koller, D., Singer, Y., and Roweis, S. T. (eds.), Advances in Neural Information Processing Systems 20, pp. 409-416. Curran Associates, Inc., 2008.

Brochu, E., Brochu, T., and de Freitas, N. A Bayesian interactive optimization approach to procedural animation design.
In Proceedings of the 2010 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '10, pp. 103-112. Eurographics Association, 2010.

Carrier, E. and Weatherford, L. Implementation of MNL choice models in the passenger origin-destination simulator (PODS). Journal of Revenue and Pricing Management, 14, 2015.

Chu, W. and Ghahramani, Z. Preference learning with Gaussian processes. In Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pp. 137-144. ACM, 2005.

Debreu, G. Representation of a preference ordering by a numerical function. Decision Processes, pp. 159-165, 1954.

González, J., Dai, Z., Damianou, A., and Lawrence, N. D. Preferential Bayesian optimization. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, ICML '17, pp. 1282-1291, 2017.

Hernández-Lobato, J. M., Hoffman, M. W., and Ghahramani, Z. Predictive entropy search for efficient global optimization of black-box functions. In Proceedings of the 27th International Conference on Neural Information Processing Systems, Volume 1, NIPS '14, pp. 918-926. MIT Press, 2014.

Jones, D. R., Schonlau, M., and Welch, W. J. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455-492, 1998.

Koyama, Y., Sato, I., Sakamoto, D., and Igarashi, T. Sequential line search for efficient visual design optimization by crowds. ACM Transactions on Graphics, 36(4), 2017.

McFadden, D. Econometric models of probabilistic choice. Structural Analysis of Discrete Data with Econometric Applications, pp. 198-272, 1981.

McFadden, D. Economic choices. American Economic Review, 91(3):351-378, 2001.

Mosteller, F. Remarks on the method of paired comparisons: The least squares solution assuming equal standard deviations and equal correlations. Psychometrika, 16(1):3-9, 1951.

Rasmussen, C. E. and Williams, C. K. I. Gaussian Processes for Machine Learning. The MIT Press, 2005.

Simon, H. Reason in Human Affairs. Stanford University Press, 1990.

Soufiani, H. A., Parkes, D. C., and Xia, L. Preference elicitation for general random utility models. In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence, 2013.

Spall, J. C. Cyclic seesaw process for optimization and identification. Journal of Optimization Theory and Applications, 154(1):187-208, 2012.

Stern, H. A continuum of paired comparisons models. Biometrika, 77(2):265-273, 1990.

Thurstone, L. L. A law of comparative judgment. Psychological Review, 34(4):273-286, 1927.

Todorović, M., Gutmann, M., Corander, J., and Rinke, P. Bayesian inference of atomistic structure in functional materials. npj Computational Materials, 5:1-7, 2019.