Published as a conference paper at ICLR 2024

OPTIMISTIC BAYESIAN OPTIMIZATION WITH UNKNOWN CONSTRAINTS

Quoc Phong Nguyen¹, Wan Theng Ruth Chew², Le Song², Bryan Kian Hsiang Low² & Patrick Jaillet¹
¹LIDS and EECS, Massachusetts Institute of Technology, USA
²School of Computing, National University of Singapore, Singapore
qphongmp@gmail.com, chew.ruth@u.nus.edu, le.song@u.nus.edu, lowkh@comp.nus.edu.sg, jaillet@mit.edu

ABSTRACT

Though some research efforts have been dedicated to constrained Bayesian optimization (BO), there remains a notable absence of a principled approach with a theoretical performance guarantee in the decoupled setting. Such a setting involves independent evaluations of the objective function and constraints at different inputs, and is hence a relaxation of the commonly studied coupled setting, where the functions must be evaluated together. As a result, the decoupled setting requires an adaptive selection between evaluating either the objective function or a constraint, in addition to selecting an input (as in the coupled setting). This paper presents a novel constrained BO algorithm with a provable performance guarantee that can address the above relaxed setting. Specifically, it considers the fundamental trade-off between exploration and exploitation in constrained BO and, interestingly, affords a noteworthy connection to active learning. The performance of our proposed algorithms is also empirically evaluated using several synthetic and real-world optimization problems.

1 INTRODUCTION

In real-world applications, we often encounter expensive-to-evaluate black-box objective functions that can only be assessed through simulations or experimentation. Examples include optimizing the hyperparameters of a machine learning model (Wistuba et al., 2018; Perrone et al., 2020) and choosing experiments in the fields of material and drug design (Schweidtmann et al., 2018).
To address these problems, Bayesian optimization (BO) has gained prominence as a widely adopted approach (Brochu et al., 2010; Frazier, 2018; Garnett, 2022). It is an iterative model-based approach that employs a probabilistic model, e.g., a Gaussian process (GP), to estimate the unknown objective function. At each iteration, BO searches for the optimal solution by strategically selecting an input query to evaluate the objective function, maintaining a balance between exploiting promising areas and exploring poorly estimated regions. BO encompasses many well-established techniques such as the probability of improvement (Kushner, 1964), expected improvement (EI) (Mockus et al., 1978), Gaussian process upper confidence bound (GP-UCB) (Srinivas et al., 2010), the knowledge-gradient approach (Frazier et al., 2008), and information-theoretic approaches: entropy search (Hennig and Schuler, 2012), predictive entropy search (PES) (Hernández-Lobato et al., 2014), and max-value entropy search (MES) (Wang and Jegelka, 2017). Beyond the black-box objective function, recent advancements in BO have focused on addressing the prevalent presence of black-box constraints. For example, there often exist prediction-time constraints and class-wise performance constraints when tuning machine learning models (Hernández-Lobato et al., 2016; Takeno et al., 2022), and these constraints are just as costly to evaluate as the objective function. Constrained BO has led to many BO extensions such as EIC (an EI-based method) (Gardner et al., 2014), a knowledge gradient-based method (Chen et al., 2021), CMES-IBO (an MES-based method) (Takeno et al., 2022), augmented Lagrangian approaches (Gramacy et al., 2016; Picheny et al., 2016), and upper trust bound (UTB) (a GP-UCB-based method) (Priem et al., 2020). However, existing approaches primarily concentrate on empirical performance and lack a theoretical analysis to ensure consistent performance.
Recently, theoretical studies for constrained BO have gained attention via the works of Lu and Paulson (2022) and Xu et al. (2023). While Lu and Paulson (2022) perform the analysis by proposing a penalty-based regret, Xu et al. (2023) analyse the cumulative regret due to the objective function and the constraint violation separately. Besides the above largely unexplored theoretical analysis, existing works often overlook the potential for evaluating the objective function and constraints independently at different inputs, known as decoupled queries. Specifically, the above works, including the theoretical studies by Lu and Paulson (2022) and Xu et al. (2023), require simultaneous evaluations of the objective function and constraints at an input query, known as coupled queries. The distinction between coupled and decoupled queries was first mentioned in the work of Gelbart et al. (2014), who discuss a chicken-and-egg pathology that prevents extending a myopic BO approach such as EIC to the decoupled setting. Later, Hernández-Lobato et al. (2016) introduce a principled approach based on PES, namely PESC, to address constrained BO with decoupled queries. While PESC is functionally equivalent to a lookahead approach, it leverages the symmetry of mutual information to avoid performing the actual lookahead computation. Regrettably, its implementation is fairly complex, making it less accessible to practitioners. Besides, it lacks a theoretical performance guarantee. While ADMMBO (Ariafar et al., 2019) deals with decoupled queries, it deterministically alternates the evaluations of the objective function and constraints, without exploiting the benefits of an adaptive selection approach. Hence, the question of devising an approach that offers a theoretical performance guarantee, is adaptable to decoupled queries, and can be readily implemented by practitioners remains unanswered.
In this paper, we address this question by proposing a simple algorithm with a theoretical performance guarantee, especially in the decoupled setting. Notably, our algorithm is myopic and avoids expensive lookaheads. In Sec. 2, we introduce a regret that does not require any penalty parameter, unlike that in the work of Lu and Paulson (2022). Then, we discuss the exploration-exploitation trade-off in constrained BO in Sec. 3. Specifically, we introduce a new form of exploration, namely horizontal exploration, resulting from the presence of black-box constraints; it is to be distinguished from the vertical exploration in unconstrained BO (Srinivas et al., 2010). We then design a unified approach that handles coupled and decoupled queries from this perspective. More importantly, our algorithms are shown to be no-regret in Theorem 3.3 and App. B. While the viewpoint of balancing exploration and exploitation is inherently grounded in BO, especially in the bandit setting like GP-UCB (Srinivas et al., 2010), Sec. 3.3 shows that the choice of the function to query can also be framed within the well-known uncertainty sampling paradigm in the active learning literature (Settles, 2009). In Sec. 3.4, we propose an estimator for approximating the optimal solution at each BO iteration with a theoretical performance guarantee. To empirically demonstrate the performance of our algorithms, we present several experiments using both synthetic and real-world optimization problems in Sec. 4.

2 CONSTRAINED BAYESIAN OPTIMIZATION AND REGRET DEFINITION

Let $f$ be a real-valued black-box objective function and $\mathcal{C}$ be a finite set of real-valued black-box constraints. Let the compact subset $\mathcal{X} \subset \mathbb{R}^d$ be the input domain and $d \in \mathbb{N}^+$ be the input dimension. We consider the following constrained optimization problem:

$$\max_{x \in \mathcal{S}} f(x) \quad \text{where the feasible region } \mathcal{S} \triangleq \{x \in \mathcal{X} \mid c(x) \ge \lambda_c\ \forall c \in \mathcal{C}\} \text{ and } \lambda_c \in \mathbb{R}\ \forall c \in \mathcal{C}. \tag{1}$$

An equality constraint can be transformed into two inequality constraints.
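As a concrete illustration of the problem in equation (1), the following sketch sets up a hypothetical 1-D objective with one inequality constraint and finds the constrained maximizer by brute force over a grid. The functions `f`, `c`, and the threshold `lam` are invented for illustration; in BO they would be expensive black boxes that cannot be enumerated like this.

```python
import numpy as np

def f(x):              # hypothetical black-box objective
    return -(x - 0.3) ** 2

def c(x):              # hypothetical black-box constraint; feasible iff c(x) >= lam
    return 1.0 - x

lam = 0.6
xs = np.linspace(0.0, 1.0, 1001)
feasible = xs[c(xs) >= lam]                  # S = {x : c(x) >= lam}, i.e., x <= 0.4
x_star = feasible[np.argmax(f(feasible))]    # constrained maximizer over the grid
print(round(float(x_star), 3))               # 0.3 (unconstrained maximizer happens to be feasible)
```

Here the constraint is inactive at the optimum; shrinking `lam`'s slack (e.g., `lam = 0.8`) would make the constraint active and move the maximizer to the feasibility boundary.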
Let us denote the set of the objective function and constraints as $F \triangleq \{f\} \cup \mathcal{C}$ and denote a function in $F$ as $h$. To identify the optimal solution $x^* \in \arg\max_{x \in \mathcal{S}} f(x)$, we employ BO, which is an algorithm operating in a sequential manner. In the decoupled query setting, at iteration $t$, we gather a noisy observation of the function query $h_t \in F$ evaluated at an input query $x_t \in \mathcal{X}$:

$$y_{h_t}(x_t) \triangleq h_t(x_t) + \epsilon_{h_t}(x_t), \quad \epsilon_{h_t}(x_t) \sim \mathcal{N}(0, \sigma^2_{h_t}). \tag{2}$$

For instance, at iteration $t$, the algorithm may decide to query the objective function (i.e., $h_t = f$), while it may query a constraint (i.e., $h_{t'} = c$ for some $c \in \mathcal{C}$) at a different iteration $t' \ne t$. On the contrary, in the coupled query setting, at iteration $t$, we query for observations $\{y_h(x_t)\}_{h \in F}$ of all functions $h \in F$ evaluated at the same input query $x_t \in \mathcal{X}$. The coupled setting is less challenging since it does not require specifying the function query.

Let $D_{h,t}$ denote the set of observed inputs of $h$ until iteration $t$ (including $x_t$) and $D_t \triangleq \cup_{h \in F} D_{h,t}$. Then, $D_0$ consists of initial observed inputs. Let $y_h(D_{h,t}) \triangleq \{y_h(x)\}_{x \in D_{h,t}}$ denote the set of observations from $h$ at $D_{h,t}$. BO effectively utilizes the acquired observations $\{y_h(D_{h,t-1})\}_{h \in F}$ in the previous $t-1$ iterations to formulate a strategy for determining the next input query $x_t$, the next function query $h_t$, and an estimator, denoted as $\tilde{x}^*_t$, for approximating the optimal solution $x^*$. We will discuss the probabilistic model of $h \in F$ given the acquired observations and a performance metric of BO, called the regret, in the rest of this section. Then, we will elaborate on our strategy of choosing $x_t$, $h_t$, and $\tilde{x}^*_t$ in the following sections.

Gaussian process. For each $h \in F$, we model $h$ with a Gaussian process (GP). This implies that every finite subset of $\{h(x)\}_{x \in \mathcal{X}}$ follows a multivariate Gaussian distribution. The GP is fully specified by its prior mean $m_h(x)$ and its kernel $k_h(x, x') \triangleq \mathrm{cov}(h(x), h(x'))$. We employ the commonly used squared exponential (SE) kernel.
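For intuition, the closed-form GP posterior used throughout can be sketched as follows, assuming a zero prior mean and an SE kernel; the hyperparameters `ls` (lengthscale), `sf` (signal scale), and `sn` (noise scale) are invented values for illustration, not the paper's settings.

```python
import numpy as np

def se_kernel(A, B, ls=0.2, sf=1.0):
    # Squared exponential kernel between 1-D input arrays A and B.
    d = A[:, None] - B[None, :]
    return sf**2 * np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X_obs, y_obs, X_query, sn=0.01):
    # Standard closed-form GP posterior mean/variance at X_query.
    K = se_kernel(X_obs, X_obs) + sn**2 * np.eye(len(X_obs))
    Ks = se_kernel(X_query, X_obs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(se_kernel(X_query, X_query)) - np.sum(v**2, axis=0)
    return mu, var

X_obs = np.array([0.1, 0.5, 0.9])
y_obs = np.sin(2 * np.pi * X_obs)
mu, var = gp_posterior(X_obs, y_obs, np.array([0.5]))
print(mu, var)  # posterior collapses (variance near noise level) at an observed input
```

In the decoupled setting, one such posterior is maintained per function $h \in F$, each conditioned only on its own observation set $D_{h,t}$.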
At iteration $t$, given the observations $y_h(D_{h,t-1})$ from the previous $t-1$ iterations, the posterior distribution of $h(x)$ follows a Gaussian distribution with a closed-form posterior mean and variance, denoted as $\mu_{h,t-1}(x)$ and $\sigma^2_{h,t-1}(x)$, respectively.¹

Regrets. To analyse the theoretical performance of constrained BO, we propose the following instantaneous regret $r$, comprising that of the objective function, $r_f$, and of the constraints, $r_c$:

$$r(x_t) \triangleq \max_{h \in F} r_h(x_t) \quad \text{where } r_f(x_t) \triangleq \max(0, f(x^*) - f(x_t)); \quad \forall c \in \mathcal{C},\ r_c(x_t) \triangleq \max(0, \lambda_c - c(x_t)). \tag{3}$$

Then, our goal is to design BO algorithms that achieve a sublinear cumulative regret:

$$\lim_{T \to \infty} \frac{R_T}{T} = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} r(x_t) = 0, \tag{4}$$

as this implies that $\min_{x \in \{x_t\}_{t=1}^T} r(x) \le \frac{1}{T} \sum_{t=1}^{T} r(x_t)$ approaches $0$ as $T$ approaches $\infty$.

Remark 2.1 (Instantaneous regret as a sum). Alternatively, the instantaneous regret can be defined as a sum of the instantaneous regrets of the objective function and the constraints:

$$s(x_t) \triangleq \sum_{h \in F} r_h(x_t). \tag{5}$$

Let us consider the case of a single constraint $\mathcal{C} = \{c_0\}$. For two inputs $x_t \ne x'_t$, if $r_f(x_t) = r_f(x'_t) = 1$ and $r_{c_0}(x_t) = 0$ while $r_{c_0}(x'_t) = 1$, then $r(x_t) = r(x'_t) = 1$ while $s(x_t) = 1 < 2 = s(x'_t)$. As a result, $s(x)$ is more effective than $r(x)$ at measuring the suboptimality of a solution. Nevertheless, a sublinear cumulative regret w.r.t. $r$ implies a sublinear cumulative regret w.r.t. $s$ and vice versa, since $r(x_t) \le s(x_t) \le |F|\, r(x_t)$. We revisit $s(x_t)$ in Sec. 3.4 when discussing the estimator $\tilde{x}^*_t$.

3 OPTIMISTIC BAYESIAN OPTIMIZATION WITH UNKNOWN CONSTRAINTS

To simplify the derivation, we consider the case of a finite input domain $\mathcal{X}$ and utilize the following Lemma 5.1 of Srinivas et al. (2010) with a modification: applying the union bound over all functions in $F$ (Lu and Paulson, 2022).

Lemma 3.1. Pick $\delta \in (0, 1)$ and set $\beta_t = 2 \log(|F||\mathcal{X}|t^2\pi^2/(6\delta))$. Then,

$$|h(x) - \mu_{h,t-1}(x)| \le \beta_t^{1/2} \sigma_{h,t-1}(x) \quad \forall x \in \mathcal{X}\ \forall t \ge 1\ \forall h \in F \tag{6}$$

holds with probability $\ge 1 - \delta$.
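A minimal sketch of the confidence machinery in Lemma 3.1: `beta_t` implements the stated union-bound width, and the bounds are the posterior mean shifted by `sqrt(beta_t)` posterior standard deviations. The domain size and `delta` below are assumed values for illustration.

```python
import numpy as np

def beta_t(t, n_funcs, n_inputs, delta=0.05):
    # beta_t = 2 log(|F| |X| t^2 pi^2 / (6 delta)) from Lemma 3.1.
    return 2 * np.log(n_funcs * n_inputs * t**2 * np.pi**2 / (6 * delta))

def confidence_bounds(mu, sigma, bt):
    u = mu + np.sqrt(bt) * sigma   # u_{h,t-1}(x), upper confidence bound
    l = mu - np.sqrt(bt) * sigma   # l_{h,t-1}(x), lower confidence bound
    return u, l

bt = beta_t(t=1, n_funcs=2, n_inputs=100)     # |F| = 2 (one f, one c), |X| = 100
u, l = confidence_bounds(np.array([0.0]), np.array([0.5]), bt)
print(l[0] <= 0.0 <= u[0])  # True: the band brackets the posterior mean
```

Note that $\beta_t$ grows only logarithmically in $t$, $|F|$, and $|\mathcal{X}|$, which is what keeps the regret bounds below sublinear.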
This suggests

$$u_{h,t-1}(x) \triangleq \mu_{h,t-1}(x) + \beta_t^{1/2} \sigma_{h,t-1}(x) \quad \text{and} \quad l_{h,t-1}(x) \triangleq \mu_{h,t-1}(x) - \beta_t^{1/2} \sigma_{h,t-1}(x) \tag{7}$$

as the upper and lower confidence bounds of $h(x)$ for all $h \in F$, $x \in \mathcal{X}$, and $t \ge 1$, respectively.

¹Please refer to Rasmussen and Williams (2006) for the closed-form expressions.

3.1 INPUT QUERY

The classic unconstrained GP-UCB work of Srinivas et al. (2010) (i.e., $\mathcal{C} = \emptyset$) balances between exploiting the current posterior belief by selecting inputs with a high posterior mean $\mu_{f,t-1}(x)$, and exploring inputs with highly uncertain evaluations of $f$ by selecting those with a high posterior standard deviation $\sigma_{f,t-1}(x)$. Specifically, GP-UCB (Srinivas et al., 2010) selects an input query that maximizes an optimistic objective function evaluation $u_{f,t-1}(x)$:

$$x^{\text{GP-UCB}}_t = \arg\max_{x \in \mathcal{X}} u_{f,t-1}(x) = \arg\max_{x \in \mathcal{X}} \mu_{f,t-1}(x) + \underbrace{\beta_t^{1/2} \sigma_{f,t-1}(x)}_{\text{vertical exploration bonus}}. \tag{8}$$

Let us call $\beta_t^{1/2} \sigma_{f,t-1}(x)$ the vertical exploration bonus as it encourages us to be optimistic about the unknown objective function evaluations.² It is shown as the blue region in Fig. 1a. In the presence of unknown constraints $\mathcal{C}$, apart from the above optimistic objective function evaluation $u_{f,t-1}(x)$, we additionally consider an optimistic feasible region, denoted as $O_t$, which may contain some infeasible inputs. It represents the exploration of the feasible region, which we refer to as horizontal exploration as opposed to the vertical exploration in the optimistic objective function evaluation. Ideally, $O_t$ is a superset of the feasible region $\mathcal{S}$ to ensure that the optimal solution $x^*$ remains in $O_t$. We consider the following optimistic feasible region:

$$O_t \triangleq \{x \in \mathcal{X} \mid u_{c,t-1}(x) \ge \lambda_c\ \forall c \in \mathcal{C}\}, \tag{9}$$

which aligns with our goal that $O_t \supseteq \mathcal{S}$ with high probability, since $u_{c,t-1}(x) \ge c(x)$ with high probability (Lemma 3.1). By considering an optimistic feasible region, we avoid the subtle issue that the probability mass of the feasible region is $0$ given the GP posterior beliefs of the constraints.
This issue affects several existing approaches such as EIC, PESC, and CMES-IBO, as discussed in App. A. Combining the vertical and horizontal explorations, we select the input query $x_t$ that maximizes the optimistic objective function $u_{f,t-1}$ restricted to the optimistic feasible region $O_t$:

$$x_t \triangleq \arg\max_{x \in O_t} u_{f,t-1}(x). \tag{10}$$

We derive an upper confidence bound of $r_f(x_t)$ similar to that in GP-UCB, and an additional upper confidence bound of $r_c(x_t)$, both of which hold with probability $\ge 1 - \delta$ (see App. B):

$$r_f(x_t) \le 2\beta_t^{1/2} \sigma_{f,t-1}(x_t) \quad \text{and} \quad r_c(x_t) \le 2\beta_t^{1/2} \sigma_{c,t-1}(x_t). \tag{11}$$

If the queries are coupled, we can achieve a no-regret BO algorithm by simply obtaining observations $\{y_h(x_t)\}_{h \in F}$ at iteration $t$, as elaborated in App. B. This algorithm for the coupled setting is called UCB-C to distinguish it from another algorithm, namely UCB-D, for the decoupled setting described in the next section. While a condition resembling equation (9) is utilized in the work of Priem et al. (2020), they do not offer any theoretical analysis. Furthermore, they maximize a variant of the EI criterion as opposed to the upper confidence bound $u_{f,t-1}$ in our approach. Besides, the optimization problem in equation (10) can also be framed as the unconstrained penalized acquisition function in the work of Lu and Paulson (2022). It suffers from an extra penalty parameter requiring automated fine-tuning, which is left as future work by Lu and Paulson (2022). The algorithm most closely related to our UCB-C is the recent CONFIG algorithm (Xu et al., 2023), which is accompanied by theoretical bounds on the cumulative regret from the objective function and the constraints. However, decoupled queries remain unexplored in these studies. In the next section, we address this scenario by adaptively selecting the function query $h_t$ while maintaining the theoretical performance guarantee. For the choice of $x_t$ in equation (10) to exist, $O_t$ must be non-empty.
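The input-query rule of equations (9)-(10) can be sketched on a discretized domain: keep only the points whose constraint upper bounds clear the thresholds, then maximize the objective's upper bound among them. The bound arrays below are invented stand-ins for the GP posterior quantities.

```python
import numpy as np

def select_input(u_f, u_c_list, lams, xs):
    # O_t = {x : u_{c,t-1}(x) >= lam_c for all c}; then argmax of u_{f,t-1} over O_t.
    in_Ot = np.ones(len(xs), dtype=bool)
    for u_c, lam in zip(u_c_list, lams):
        in_Ot &= (u_c >= lam)           # optimistic feasibility per constraint
    assert in_Ot.any(), "O_t is empty (see the prior-mean recommendation in the text)"
    idx = np.where(in_Ot)[0]
    return xs[idx[np.argmax(u_f[idx])]]

xs = np.linspace(0.0, 1.0, 11)
u_f = xs.copy()                          # assumed UCB of f: grows with x
u_c = [1.0 - xs]                         # assumed UCB of the single constraint: shrinks with x
xt = select_input(u_f, u_c, lams=[0.5], xs=xs)
print(xt)  # 0.5: the largest u_f among points with u_c >= 0.5
```

The optimism is visible in the example: points with uncertain constraint values are retained in $O_t$ as long as their upper bound clears $\lambda_c$, so the query can land on inputs whose feasibility is not yet resolved.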
This holds with probability $\ge 1 - \delta$ according to Lemma 3.1 if the optimization problem is feasible. To avoid the subtle case of $O_t = \emptyset$, we recommend setting the GP prior mean of each constraint $c$ to $\lambda_c$ in practice.

3.2 FUNCTION QUERY

To begin with, we provide insights into a rational function query selection strategy in the decoupled setting. Then, we rigorously translate them into a concrete strategy.

²The term vertical refers to the output of $f$, often plotted on the (vertical) y-axis, as opposed to the term horizontal, which refers to the input of $f$, often plotted on the (horizontal) x-axis, e.g., in Fig. 1a.

Figure 1: (a) Plot of the vertical and horizontal exploration bonuses in the space of the objective function; and (b) Plot of the horizontal exploration bonus in the space of the constraint function.

Remark 3.2 (On a rational function query selection strategy). In the decoupled setting, querying all functions in $F$ at every iteration can be inefficient, since the possibility of querying/evaluating functions in $F$ independently is left unexploited. On one hand, it is unnecessary to evaluate any constraint at an input query that is likely to be feasible, as illustrated in Fig. 2a in Sec. 4. On the other hand, if there is a significant risk of a constraint violation at $x_t$, it is likely that querying the most-violated constraint eliminates $x_t$ from the feasible region (hence, from being $x^*$), in which case querying the objective function at $x_t$ is redundant. This is illustrated in Figs. 2b-c in Sec. 4.

From the regret analysis perspective, the decoupled setting poses a challenge in bounding the instantaneous regret $r(x_t)$ at each iteration.
This is because $r(x_t)$ depends on all instantaneous regrets $\{r_h(x_t)\}_{h \in F}$, while we do not evaluate all functions in $F$ at each iteration, unlike in the coupled setting. Therefore, it is necessary to establish a connection across $\{r_h(x_t)\}_{h \in F}$. To motivate our choice of $h_t$, we partition the optimistic feasible region $O_t$ into a $\nu_t$-relaxed feasible confidence region $S_t$ and an uncharted region $U_t$:

$$O_t = S_t \cup U_t \tag{12}$$

$$S_t \triangleq \{x \in \mathcal{X} \mid l_{c,t-1}(x) \ge \lambda_c - \nu_t\ \forall c \in \mathcal{C}\} \cap O_t \quad \text{and} \quad U_t \triangleq O_t \setminus S_t, \tag{13}$$

where $\nu_t \ge 0$ is a constraint-relaxation parameter ($S_t$ and $U_t$ are illustrated in Fig. 1b). Recall that $l_{c,t-1}(x) \le c(x)$ holds with high probability (Lemma 3.1); hence, $S_t \subseteq \tilde{S}_{\nu_t}$ also holds with high probability, where $\tilde{S}_{\nu_t} \triangleq \{x \in \mathcal{X} \mid c(x) \ge \lambda_c - \nu_t\ \forall c \in \mathcal{C}\}$ is a $\nu_t$-relaxation of $\mathcal{S}$. Therefore, $S_t$ consists of feasible inputs w.r.t. $\tilde{S}_{\nu_t}$ with high probability. Furthermore, any input $x$ in the uncharted region $U_t$ satisfies

$$\forall c \in \mathcal{C},\ u_{c,t-1}(x) \ge \lambda_c; \quad \exists c \in \mathcal{C},\ \lambda_c - \nu_t > l_{c,t-1}(x). \tag{14}$$

Hence, the uncharted region $U_t$ consists of inputs whose feasibility w.r.t. $\mathcal{S}$ is unknown (because $u_{c,t-1}(x) \ge \lambda_c > l_{c,t-1}(x)$) and whose risk of a constraint violation is sufficiently high (because $\lambda_c - l_{c,t-1}(x) > \nu_t$ for some $c \in \mathcal{C}$), as illustrated in Fig. 1b. We interpret $U_t$ as the horizontal exploration bonus. Its role in the optimistic feasible region $O_t$ is analogous to that of the vertical exploration bonus $\beta_t^{1/2}\sigma_{f,t-1}(x)$ in the optimistic objective function evaluation $u_{f,t-1}(x)$ (illustrated in Fig. 1a and equation (8) vs. equation (12)). Let us translate the two intuitive cases in Remark 3.2 into the following concrete conditions.

Querying a constraint when $x_t \in U_t$. When $x_t \in U_t$, the risk of a constraint violation at $x_t$ is sufficiently high ($\lambda_c - l_{c,t-1}(x_t) > \nu_t$, illustrated in Fig. 1b). From Remark 3.2, we query the most-violated constraint, defined as the one with the highest risk of a constraint violation, i.e., $\arg\max_{c \in \mathcal{C}} \lambda_c - l_{c,t-1}(x_t)$. It is noted that as $\nu_t$ decreases, the size of the uncharted region $U_t$ is non-decreasing (see Fig. 1b).
Hence, we use $\nu_t$ to control the size of $U_t$, i.e., to control the horizontal exploration bonus. However, assigning a small value to $\nu_t$ is risky because we may excessively query a constraint, i.e., excessive horizontal exploration. Let us consider an extreme scenario: $\nu_t = 0$ and there exists an iteration $t$ such that a constraint $c \in \mathcal{C}$ is active at $x_t$, i.e., $c(x_t) = \lambda_c$. In this case, $\lambda_c - l_{c,t-1}(x_t) = c(x_t) - l_{c,t-1}(x_t) \ge 0$, so the algorithm will keep querying a constraint without querying the objective function.

Algorithm 1 UCB-D
Require: $\mathcal{X}$, $D_0$
1: Update GP posterior beliefs: $\{(\mu_{h,0}, \sigma_{h,0})\}_{h \in F}$
2: for $t = 1, \dots, T$ do
3:   $x_t \leftarrow \arg\max_{x \in O_t} u_{f,t-1}(x)$
4:   $c_t \leftarrow \arg\max_{c \in \mathcal{C}} \lambda_c - l_{c,t-1}(x_t)$  // most-violated constraint
5:   if $\lambda_{c_t} - l_{c_t,t-1}(x_t) > 2\beta_t^{1/2}\sigma_{f,t-1}(x_t)$ then  // $x_t \in U_t$
6:     $h_t \leftarrow c_t$  // query the most-violated constraint
7:   else  // $x_t \in S_t$
8:     $h_t \leftarrow f$  // query the objective function
9:   end if
10:  $y_{h_t}(D_{h_t,t}) \leftarrow y_{h_t}(D_{h_t,t-1}) \cup \{y_{h_t}(x_t)\}$
11:  Update GP posterior belief: $\mu_{h_t,t}, \sigma_{h_t,t}$
12: end for

Querying the objective function when $x_t \in S_t$. When $x_t \in S_t$, it is likely that $x_t$ is a feasible solution (relaxed by $\nu_t$), as illustrated in Fig. 1b. From Remark 3.2, we prefer querying the objective function. However, assigning a large value to $\nu_t$ is risky because we may excessively query the objective function, i.e., insufficient horizontal exploration. In particular, if $\nu_t > \max_{c \in \mathcal{C}} \max_{x \in \mathcal{X}} \lambda_c - l_{c,t-1}(x)$, then $U_t = \emptyset$ and $O_t = S_t$. This means $x_t \in S_t$, and the algorithm queries the objective function. To resolve the dilemma of setting $\nu_t$ too large (excessively querying the objective function) or too small (excessively querying a constraint), we let $\nu_t$ be self-tuned by tying its value to the vertical exploration bonus, i.e., setting $\nu_t = 2\beta_t^{1/2}\sigma_{f,t-1}(x_t)$. The broad intuition is that if $\nu_t$ is too large, the algorithm repeatedly queries the objective function, which reduces the vertical exploration bonus $\beta_t^{1/2}\sigma_{f,t-1}(x_t)$. This, in turn, reduces $\nu_t$ since $\nu_t = 2\beta_t^{1/2}\sigma_{f,t-1}(x_t)$.
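The function-query rule above (lines 4-8 of Algorithm 1, with the self-tuned $\nu_t$) can be sketched as follows. The bound values passed in are assumed to come from the GP posteriors at $x_t$; everything here is a minimal illustration, not the paper's implementation.

```python
import numpy as np

def choose_function_query(l_c_at_xt, lams, sigma_f_at_xt, bt):
    # l_c_at_xt: lower confidence bounds l_{c,t-1}(x_t) for each constraint c.
    risks = [lam - l_c for l_c, lam in zip(lams, l_c_at_xt)][::0] if False else \
            [lam - l_c for l_c, lam in zip(l_c_at_xt, lams)]
    c_star = int(np.argmax(risks))            # most-violated constraint c_t
    nu_t = 2 * np.sqrt(bt) * sigma_f_at_xt    # self-tuned relaxation nu_t
    if risks[c_star] > nu_t:                  # x_t in the uncharted region U_t
        return ("constraint", c_star)
    return ("objective", None)                # x_t in S_t: query f

# High violation risk (0.9 - 0.2 = 0.7) vs. small objective uncertainty -> query c_0.
print(choose_function_query([0.2], lams=[0.9], sigma_f_at_xt=0.05, bt=4.0))  # ('constraint', 0)
```

Once the objective's posterior has collapsed at $x_t$ (small `sigma_f_at_xt`), even modest constraint risk triggers a constraint query, which is exactly the self-tuning behaviour described in the text.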
On the contrary, $\nu_t$ is too small only if $2\beta_t^{1/2}\sigma_{f,t-1}(x_t)$ is too small. From equation (11), this implies that $r_f(x_t)$ is small, so it is justifiable to refrain from querying the objective function. The resulting algorithm, called UCB-D, for the decoupled setting is described in Algorithm 1. In App. C, we prove the following Theorem 3.3 on the cumulative regret of Algorithm 1.

Theorem 3.3. The cumulative regret $R_T$ of Algorithm 1 is bounded by

$$R_T \le \sqrt{|F|\, T\, \beta_T \max_{h \in F} C_h \gamma_{h,T}} \quad \forall T \ge 1 \tag{15}$$

with probability $\ge 1 - \delta$, where $C_h \triangleq 8/\log(1+\sigma_h^{-2})$ and $\gamma_{h,T}$, adopted from the work of Srinivas et al. (2010), is the maximum information gain from observing $T$ noisy evaluations of $h$.

Srinivas et al. (2010) show that $\gamma_{h,T}$ is sublinear for some commonly used kernels, including the SE and Matérn kernels. Hence, Theorem 3.3 suggests that Algorithm 1 results in a cumulative regret that grows sublinearly when employing GPs with these kernels.

3.3 FUNCTION QUERY FROM AN ACTIVE LEARNING PERSPECTIVE

The choice of the input query $x_t$ in GP-UCB exhibits an intriguing link to the concept of information gain found in the active learning literature, where one seeks the most informative data point or its approximate equivalent, the most uncertain data point, as discussed in the work of Srinivas et al. (2010). Interestingly, one can view the choice of the function query $h_t$ as an uncertainty sampling strategy as well (i.e., seeking the "most uncertain data point") (Settles, 2009). Let us denote the upper confidence bounds of the instantaneous regrets w.r.t. the objective function $f$ and a constraint $c$ in equation (11) as

$$u_{r_f,t-1}(x_t) \triangleq 2\beta_t^{1/2}\sigma_{f,t-1}(x_t) \ge r_f(x_t) \tag{16}$$

$$u_{r_c,t-1}(x_t) \triangleq \max(0, \lambda_c - l_{c,t-1}(x_t)) \ge r_c(x_t). \tag{17}$$

While the nature of the uncertainty in active learning differs from that of the instantaneous regret in our problem, we are interested in minimizing their values in both scenarios.
Hence, we employ the uncertainty sampling paradigm to choose the function query by setting

$$h_t = \arg\max_{h \in F} u_{r_h,t-1}(x_t). \tag{18}$$

This resulting strategy, interestingly, coincides with the choice of $h_t$ in Algorithm 1. It is noted that the uncertainty sampling approach may not be effective when there exists uncertainty that cannot be reduced with observations, referred to as aleatoric uncertainty in the work of Hüllermeier and Waegeman (2021). Fortunately, our instantaneous regrets are analogous to the epistemic uncertainty that can be reduced with observations. Specifically, the more observations we obtain, the better the objective function and constraints (and hence $x^*$) are estimated. Thus, a smaller regret can be achieved.

Remark 3.4. If the evaluation of $h \in F$ incurs a cost $l(h) > 0$, then we would like to choose the function query by maximizing a cost-aware upper confidence bound of the instantaneous regret. Specifically, the function query is chosen as $h_t = \arg\max_{h \in F} u_{r_h,t-1}(x_t)/l(h)$, which is interpreted as the upper confidence bound of the instantaneous regret per unit cost (Swersky et al., 2013).

3.4 ESTIMATOR OF THE OPTIMAL SOLUTION

Remark 2.1 states that $s(x)$ is more effective than $r(x)$ in assessing the suboptimality of a solution. Hence, we would like to use $s(x)$ to propose an estimator $\tilde{x}^*_t$ for approximating the optimal solution $x^*$. From equation (11), we obtain an upper confidence bound of $s(x)$ at iteration $t$:

$$s(x) = \sum_{h \in F} r_h(x) \le \sum_{h \in F} u_{r_h,t-1}(x), \tag{19}$$

where $u_{r_h,t-1}$ is defined in equations (16) and (17). We would like to select the input with the lowest upper confidence bound of $s(x)$ as the estimator by considering all iterations up to $t$:

$$\tilde{x}^*_t = \tilde{x}_{\kappa(t)}, \tag{20}$$

where

$$\tilde{x}_t \triangleq \arg\min_{x \in \mathcal{X}} \sum_{h \in F} u_{r_h,t-1}(x) \quad \text{and} \quad \kappa(t) \triangleq \arg\min_{t'=1,\dots,t} \sum_{h \in F} u_{r_h,t'-1}(\tilde{x}_{t'}). \tag{21}$$

Then, App. D proves the following lemma.

Lemma 3.5.
By picking the estimator in equation (20), it holds with probability $\ge 1 - \delta$ that, $\forall t \ge 1$,

$$s(\tilde{x}^*_t) \le |F| \sqrt{|F|\, \beta_t \max_{h \in F} C_h \gamma_{h,t} / t}, \tag{22}$$

where $C_h \triangleq 8/\log(1+\sigma_h^{-2})$ and $\gamma_{h,t}$, adopted from the work of Srinivas et al. (2010), is the maximum information gain from observing $t$ noisy evaluations of $h$.

Hence, when $\gamma_{h,t}$ is sublinear (e.g., when the kernel is SE or Matérn (Srinivas et al., 2010)), the sum $s(\tilde{x}^*_t)$ of instantaneous regrets at the estimator approaches $0$ as $t \to \infty$.

4 EXPERIMENTS

This section validates the empirical performance of our algorithms (UCB-C in the coupled setting and UCB-D in the decoupled setting) by comparing them with EIC (Gardner et al., 2014), ADMMBO (Ariafar et al., 2019), and the state-of-the-art CMES-IBO, which significantly outperforms other existing approaches including EIC and PESC in the work of Takeno et al. (2022). We did not conduct a comparison with PESC due to the challenge of maintaining a consistent initial configuration for PESC, as emphasized by Takeno et al. (2022). Moreover, in the decoupled setting, PESC is not currently available in the primary branch of the Spearmint tool at https://github.com/HIPS/Spearmint. This also highlights the complexity involved in implementing PESC for decoupled queries and its limited accessibility to practitioners. For these baselines, we select the estimator $\tilde{x}^*_t = \arg\max_{x \in \mathcal{X}} \mu_{f,t-1}(x)$ such that $\forall c \in \mathcal{C},\ \Pr(c(x) \ge \lambda_c) \ge \sqrt[|\mathcal{C}|]{0.95}$, as suggested by Takeno et al. (2022). This definition may be undefined in the presence of an equality constraint, so we do not consider equality constraints in our experiments. The estimator in our algorithms is described in equation (20). To illustrate the instantaneous regrets of both the objective function and constraints, we plot the average and standard error (over 10 repeated experiments) of the sum $s(\tilde{x}^*_t)$ (equation (5)) of these regrets at the estimator against the number of queries $|D_t|$. The noise standard deviation is set to $\sigma_h = 0.01\ \forall h \in F$. Additional details are described in App. E.
Figure 2: Synthetic problems: (a-c) Plots of the upper confidence bound $u_{f,t-1}$ (plotted as the gray heatmap), constraints (plotted as contour lines with arrows showing the side of the feasible region), the optimum $x^*$, the estimator $\tilde{x}^*_t$, and input queries of UCB-D; (d-f) Plots of the percentage of the objective function (the dotted area) and constraints selected as $h_t$ by UCB-D; (g-i) Plots of the instantaneous regret $s(\tilde{x}^*_t)$ against the number of queries.

4.1 SYNTHETIC PROBLEMS

The experiments are conducted on 3 synthetic constrained optimization problems, each labeled in the format [S-A{#_of_active_constraints_at_x*}]: [S-A0], [S-A1], and [S-A2] with 0, 1, and 2 active constraints, respectively (the formulations are described in App. E.1). These problems have 2 input dimensions, so we can visualize the constraints and the input queries as shown in Figs. 2a-c.

[S-A0]: The constraint is inactive at $x^*$, which is located far from the boundary of the feasible region $\mathcal{S}$ (Fig. 2a). Thus, UCB-D does not require precise boundary estimations of $\mathcal{S}$ (i.e., of $c_0$) to pinpoint $x^*$. This results in a sparse allocation of input queries around the boundary of $\mathcal{S}$. Furthermore, around the estimator $\tilde{x}^*_t$, a substantial number of queries are evaluated at the objective function (i.e., $h_t = f$, plotted as the yellow pluses) due to the high certainty that the input query is feasible, which aligns with Remark 3.2. Specifically, Fig. 2d shows that more than 70% of the 60 input queries are evaluated at the objective function $f$, plotted as the dotted area.
Despite the distance between the estimator $\tilde{x}^*_t$ and the optimum $x^*$, the difference between $f(\tilde{x}^*_t)$ and $f(x^*)$ is minimal because $s(\tilde{x}^*_t)$ is small in Fig. 2g for UCB-D.

[S-A1]: At $x^*$, the constraint $c_0$ is inactive but, unlike in [S-A0], the constraint $c_1$ is active (Fig. 2b). Thus, pinpointing $x^*$ requires precise boundary estimations of $c_1$ around $x^*$, but not of $c_0$. This results in a sparse allocation of input queries around the boundary of $c_0$ and a denser allocation around the boundary of $c_1$, especially near $\tilde{x}^*_t$ in Fig. 2b. Specifically, Fig. 2e shows that only a small number of the input queries are evaluated at the inactive $c_0$.

[S-A2]: Both constraints $c_0$ and $c_1$ are active at $x^*$ (Fig. 2c). Thus, UCB-D requires precise boundary estimations of both $c_0$ and $c_1$ around $x^*$ to pinpoint $x^*$. This results in a dense allocation of input queries around the boundaries of both $c_0$ and $c_1$, especially around $x^*$ in Fig. 2c. Specifically, Fig. 2f shows that the input queries are allocated roughly equally to the objective function and the 2 constraints. Despite the distance between $\tilde{x}^*_t$ and $x^*$, the difference between $f(\tilde{x}^*_t)$ and $f(x^*)$ is minimal because $s(\tilde{x}^*_t)$ is small in Fig. 2i for UCB-D.

In Figs. 2b-c, we also observe that only a minority of the function queries $h_t = f$ are located far away from the feasible region $\mathcal{S}$, which aligns with Remark 3.2. Regarding the instantaneous regret, Figs. 2g-i show that our UCB-D converges faster than the other algorithms. Hence, UCB-D is more query-efficient, as explained by the above discussion.
UCB-C performs competitively with the state-of-the-art CMES-IBO, as both are designed for the coupled setting. ADMMBO does not work well, probably because it requires tuning the number of evaluations of $f$ and $c$ at each BO iteration. In our experiments, ADMMBO evaluates $f$ and $c$ once at each BO iteration to be consistent with EIC, CMES-IBO, and UCB-C.

Figure 3: Real-world problems: Plots of (a-c) the instantaneous regret $s(\tilde{x}^*_t)$ and (d-f) the percentage of function queries (by UCB-D) against the number of queries.

4.2 REAL-WORLD PROBLEMS

In this section, we introduce 3 optimization problems utilizing real-world objective functions and constraints. These problems serve to assess the effectiveness of our algorithms in practice. We select a real-world problem of optimizing a gas transmission compressor design, referred to as [Gas], from Kumar et al. (2020). It has $d = 4$ input dimensions and $|\mathcal{C}| = 1$ constraint. The problem of tuning hyperparameters of a convolutional neural network (CNN), referred to as [CNN], is taken from the work of Takeno et al. (2022). In the [CNN] problem, a two-layer CNN is trained on a class-imbalanced CIFAR-10 dataset. The goal is to maximize the overall accuracy across 10 classes subject to the constraint that the recall of each class is at least 0.5, i.e., $|\mathcal{C}| = 10$. There are $d = 5$ hyperparameters to be optimized. The final experiment, referred to as [QChip], involves maximizing the coupling strength of a synthetic superconducting quantum chip (Yan et al., 2018).
While coupling strength is critical to the chip's performance, it must be carefully controlled within the constraints of the coupling energy to prevent issues such as noise, cross-talk between qubits, and poor gate fidelity (Kwon et al., 2021). In particular, we maximize the coupling strength subject to |C| = 2 constraints specifying the desirable range of the coupling energy, by adjusting d = 11 geometric features that describe the physical dimensions and arrangement of the quantum chip components. To create the ground-truth functions, we obtain a dataset of 393 data points, rigorously generated through extensive simulations using electrical simulation software (see App. E.3). Figs. 3(a-c) show that our UCB-D and UCB-C converge faster than the other baseline methods. Therefore, for the same number of queries, our algorithms outperform the other baseline methods in identifying superior designs for the above real-world problems. In Figs. 3d and 3f, we observe that the number of queries to the objective function dominates that to the constraints in the [Gas] and [QChip] experiments, hinting that the constraints are inactive at the optimal solution, which aligns with the ground truth. Fig. 3e shows that in the [CNN] experiment, UCB-D initially focuses on identifying a feasible input, as it allocates few queries to the objective function at the start. Locating a feasible input is more challenging in this experiment due to the large number of constraints.
5 CONCLUSION
In this paper, we propose a novel constrained BO algorithm with a provable performance guarantee that adaptively selects not only the input query but also the function query to account for the decoupled setting. We formulate the algorithm from the standpoint of the fundamental exploration-exploitation trade-off and, interestingly, cast it under the uncertainty sampling paradigm in the active learning literature.
As our constrained BO solution requires only the confidence bounds of the function evaluations, we believe the approach can be applied to other BO problems such as BO of risk measures (Cakmak et al., 2020; Nguyen et al., 2021b;a) and meta-BO (Nguyen et al., 2023).
REPRODUCIBILITY STATEMENT
We have described detailed proofs of the theoretical results in App. B, C, and D. These proofs utilize an assumption from the work of Srinivas et al. (2010), as elaborated in Sec. 3. Regarding the experimental results, we have included both the code and the datasets in the submission. We have also provided a more detailed description of the experiment settings in App. E.
ACKNOWLEDGEMENT
This research is supported by AI Singapore under grant AISG2-RP-2020-018. This research is part of the programme DesCartes and is supported by the National Research Foundation, Prime Minister's Office, Singapore under its Campus for Research Excellence and Technological Enterprise (CREATE) programme.
REFERENCES
S. Ariafar, J. Coll-Font, D. H. Brooks, and J. G. Dy. ADMMBO: Bayesian optimization with unknown constraints using ADMM. JMLR, 20(123):1-26, 2019.
E. Brochu, V. M. Cora, and N. de Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv:1012.2599, 2010.
S. Cakmak, R. Astudillo, P. I. Frazier, and E. Zhou. Bayesian optimization of risk measures. In Proc. NeurIPS, 2020.
W. Chen, S. Liu, and K. Tang. A new knowledge gradient-based method for constrained Bayesian optimization. arXiv preprint arXiv:2101.08743, 2021.
P. I. Frazier. A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811, 2018.
P. I. Frazier, W. B. Powell, and S. Dayanik. A knowledge-gradient policy for sequential information collection. SIAM Journal on Control and Optimization, 47(5):2410-2439, 2008.
J. R. Gardner, M. J. Kusner, Z. E. Xu, K. Q. Weinberger, and J. P.
Cunningham. Bayesian optimization with inequality constraints. In Proc. ICML, pages 937-945, 2014.
R. Garnett. Bayesian Optimization. Cambridge University Press, 2022. In preparation.
M. A. Gelbart, J. Snoek, and R. P. Adams. Bayesian optimization with unknown constraints. arXiv preprint arXiv:1403.5607, 2014.
J. Ghosh and M. R. Geller. Controlled-NOT gate with weakly coupled qubits: Dependence of fidelity on the form of interaction. Physical Review A, 81(5), 2010.
R. B. Gramacy, G. A. Gray, S. Le Digabel, H. K. Lee, P. Ranjan, G. Wells, and S. M. Wild. Modeling an augmented Lagrangian for blackbox constrained optimization. Technometrics, 58(1):1-11, 2016.
M. Gupta, G. V. Graziano, M. Pendharkar, J. T. Dong, C. P. Dempsey, C. Palmstrøm, and V. S. Pribiag. Gate-tunable superconducting diode effect in a three-terminal Josephson device. Nature Communications, 14(1), 2023.
T. Hao, R. Shaydulin, M. Pistoia, and J. Larson. Exploiting in-constraint energy in constrained variational quantum optimization. In 2022 IEEE/ACM Third International Workshop on Quantum Computing Software (QCS), pages 100-106, 2022.
P. Hennig and C. J. Schuler. Entropy search for information-efficient global optimization. JMLR, pages 1809-1837, 2012.
J. M. Hernández-Lobato, M. W. Hoffman, and Z. Ghahramani. Predictive entropy search for efficient global optimization of black-box functions. In Proc. NIPS, pages 918-926, 2014.
J. M. Hernández-Lobato, M. A. Gelbart, R. P. Adams, M. W. Hoffman, and Z. Ghahramani. A general framework for constrained Bayesian optimization using information-based search. JMLR, 17(1):5549-5601, 2016.
E. Hüllermeier and W. Waegeman. Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. Machine Learning, 110:457-506, 2021.
A. Kumar, G. Wu, M. Z. Ali, R. Mallipeddi, P. N. Suganthan, and S. Das.
A test-suite of non-convex constrained optimization problems from the real-world and some baseline results. Swarm and Evolutionary Computation, 56:100693, 2020.
H. J. Kushner. A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. Journal of Basic Engineering, 86(1):97-106, 1964.
S. Kwon, A. Tomonaga, G. L. Bhai, S. J. Devitt, and J.-S. Tsai. Gate-based superconducting quantum computing. Journal of Applied Physics, 129(4), 2021.
F.-Y. Li and L.-J. Jin. Quantum chip design optimization and automation in superconducting coupler architecture. Quantum Science and Technology, 8(4):045015, 2023.
B. Lienhard, J. Braumuller, W. Woods, D. Rosenberg, G. Calusine, S. Weber, A. Vepsalainen, K. O'Brien, T. P. Orlando, S. Gustavsson, and W. D. Oliver. Microwave packaging for superconducting qubits. In 2019 IEEE MTT-S International Microwave Symposium (IMS), 2019.
Q. Liu, Y. Huang, Y. Du, Z. Zhao, M. Geng, Z. Zhang, and K. Wei. Advances in chip-based quantum key distribution. Entropy, 24:1334, 2022.
C. Lu and J. A. Paulson. No-regret Bayesian optimization with unknown equality and inequality constraints using exact penalty functions. IFAC-PapersOnLine, 55(7):895-902, 2022.
X. Y. Lu, S. Ashhab, W. Cui, R. Wu, and F. Nori. Two-qubit gate operations in superconducting circuits with strong coupling and weak anharmonicity. New Journal of Physics, 14:073041, 2012.
D. A. B. Miller. Optical physics of quantum wells. Technical report, Stanford University, 1997.
J. Mockus, V. Tiešis, and A. Žilinskas. The application of Bayesian methods for seeking the extremum. In L. C. W. Dixon and G. P. Szegö, editors, Towards Global Optimization 2, pages 117-129. North-Holland Publishing Company, 1978.
Q. P. Nguyen, Z. Dai, B. K. H. Low, and P. Jaillet. Optimizing conditional value-at-risk of black-box functions. In Proc. NeurIPS, pages 4170-4180, 2021a.
Q. P. Nguyen, Z. Dai, B. K. H. Low, and P. Jaillet.
Value-at-risk optimization with Gaussian processes. In Proc. ICML, pages 8063-8072, 2021b.
Q. P. Nguyen, B. K. H. Low, and P. Jaillet. Meta-VBO: Utilizing prior tasks in optimizing risk measures with Gaussian processes. In Proc. ICLR, 2023.
V. Perrone, H. Shen, A. Zolic, I. Shcherbatyi, A. Ahmed, T. Bansal, M. Donini, F. Winkelmolen, R. Jenatton, J. B. Faddoul, et al. Amazon SageMaker automatic model tuning: Scalable black-box optimization. arXiv preprint arXiv:2012.08489, 2020.
V. Picheny, R. B. Gramacy, S. Wild, and S. Le Digabel. Bayesian optimization under mixed constraints with a slack-variable augmented Lagrangian. In Proc. NeurIPS, volume 29, 2016.
R. Priem, N. Bartoli, Y. Diouane, and A. Sgueglia. Upper trust bound feasibility criterion for mixed constrained Bayesian optimization with application to aircraft design. Aerospace Science and Technology, 105:105980, 2020.
C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
M. Sarovar, T. Proctor, K. Rudinger, K. Young, E. Nielsen, and R. Blume-Kohout. Detecting crosstalk errors in quantum information processors. Quantum, 4:321, 2020.
A. M. Schweidtmann, A. D. Clayton, N. Holmes, E. Bradford, R. A. Bourne, and A. A. Lapkin. Machine learning meets continuous flow chemistry: Automated optimization towards the Pareto front of multiple objectives. Chemical Engineering Journal, 352:277-282, 2018.
E. A. Sete, N. Didier, A. Q. Chen, S. Kulshreshtha, R. Manenti, and S. Poletto. Parametric-resonance entangling gates with a tunable coupler. Physical Review Applied, 16(2), 2021.
B. Settles. Active learning literature survey. Technical report, University of Wisconsin-Madison, 2009.
N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proc. ICML, pages 1015-1022, 2010.
K. Swersky, J. Snoek, and R. P. Adams. Multi-task Bayesian optimization. In Proc. NeurIPS, volume 26, 2013.
S. Takeno, T.
Tamura, K. Shitara, and M. Karasuyama. Sequential and parallel constrained max-value entropy search via information lower bound. In Proc. ICML, pages 20960-20986, 2022.
Z. Wang and S. Jegelka. Max-value entropy search for efficient Bayesian optimization. In Proc. ICML, pages 3627-3635, 2017.
M. Wistuba, N. Schilling, and L. Schmidt-Thieme. Scalable Gaussian process-based transfer surrogates for hyperparameter optimization. Machine Learning, 107(1):43-78, 2018.
Y. Wu, W. S. Bao, S. Cao, F. Chen, M. C. Chen, X. Chen, T. H. Chung, H. Deng, Y. Du, D. Fan, et al. Strong quantum computational advantage using a superconducting quantum processor. Physical Review Letters, 127(18):180501, 2021.
W. Xu, Y. Jiang, B. Svetozarevic, and C. Jones. Constrained efficient global optimization of expensive black-box functions. In Proc. ICML, pages 38485-38498, 2023.
F. Yan, P. Krantz, Y. Sung, M. Kjaergaard, D. L. Campbell, T. P. Orlando, S. Gustavsson, and W. D. Oliver. Tunable coupling scheme for implementing high-fidelity two-qubit gates. Physical Review Applied, 10(5), 2018.
C. H. Yang, R. C. C. Leon, J. C. C. Hwang, A. Saraiva, T. Tanttu, W. Huang, J. Camirand Lemyre, K. W. Chan, K. Y. Tan, F. E. Hudson, et al. Operation of a silicon quantum processor unit cell above one kelvin. Nature, 580(7803):350-354, 2020.