# collaborative_bayesian_optimization_with_fair_regret__43316c3c.pdf Collaborative Bayesian Optimization with Fair Regret Rachael Hwee Ling Sim * 1 Yehong Zhang * 2 Bryan Kian Hsiang Low 1 Patrick Jaillet 3 Bayesian optimization (BO) is a popular tool for optimizing complex and costly-to-evaluate blackbox objective functions. To further reduce the number of function evaluations, any party performing BO may be interested to collaborate with others to optimize the same objective function concurrently. To do this, existing BO algorithms have considered optimizing a batch of input queries in parallel and provided theoretical bounds on their cumulative regret reflecting inefficiency. However, when the objective function values are correlated with real-world rewards (e.g., money), parties may be hesitant to collaborate if they risk incurring larger cumulative regret (i.e., smaller real-world reward) than others. This paper shows that fairness and efficiency are both necessary for the collaborative BO setting. Inspired by social welfare concepts from economics, we propose a new notion of regret capturing these properties and a collaborative BO algorithm whose convergence rate can be theoretically guaranteed by bounding the new regret, both of which share an adjustable parameter for trading off between fairness vs. efficiency. We empirically demonstrate the benefits (e.g., increased fairness) of our algorithm using synthetic and real-world datasets. 1. Introduction Bayesian optimization (BO) (Dai et al., 2019; 2020a;b; Ling et al., 2016; Zhang et al., 2019) is a popular tool for optimizing complex (e.g., possibly noisy, non-convex, and/or with no closed-form expression/derivative) and costly-toevaluate black-box objective functions given a limited budget. BO algorithms can achieve competitive optimization *Equal contribution 1Department of Computer Science, National University of Singapore, Republic of Singapore 2Peng Cheng Laboratory, People s Republic of China 3Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, USA. Correspondence to: Bryan Kian Hsiang Low . Proceedings of the 38 th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s). performance by sequentially selecting input queries for evaluating the objective function f to trade off between sampling at or close to a likely maximizer (i.e., exploitation) vs. sampling in an unobserved input region (i.e., exploration). BO has been used in a wide variety of real-world applications such as hyperparameter tuning of machine learning (ML) models (Shahriari et al., 2015), drug development, chemical/material design (Griffiths & Hern andez-Lobato, 2020; Zhang et al., 2020), hotspot discovery via a mobility-ondemand (Mo D) system with autonomous vehicles (AVs) (Chen et al., 2013; 2015; Kharkovskii et al., 2020), among others. In these applications, each function evaluation (e.g., evaluating molecular properties, Mo D AV cruising to a location) is economically costly and/or time-consuming. Naturally, any party who performs BO would be interested in reducing the number of function evaluations and hence the cost/time needed. One appealing solution is collaborative ML (Sim et al., 2020) which allows every party to share data from function evaluations with the other parties optimizing the same f concurrently instead of evaluating the expensive f on its own. The works of Cheng et al. (2020) and Fitch (2018) have discussed the importance and benefits of collaboration in drug discovery and precision agriculture. At first glance, existing batch BO algorithms (Daxberger & Low, 2017; Desautels et al., 2014; Wu & Frazier, 2016) may seem like a suitable solution. In each iteration, these algorithms select a batch of input queries and each input query is assigned to a collaborating party for evaluating f in parallel. However, these batch BO algorithms do not optimize the assignment of input queries to parties. Some parties may be unintentionally assigned input queries to evaluate f with only small values. This possibility discourages collaboration and data sharing. Why is this so? Each party usually incurs some cost to evaluate f and is particular about its value as it is often correlated with some real-world reward. Any party would not want to unfairly receive smaller real-world reward and less valuable information than other collaborating parties. For example, when companies operating Mo D AVs collaborate to discover mobility demand hotspots, a party evaluating at a location with a higher mobility demand is more likely to pick up passengers and earn fares. So, any Mo D company would not want to solely evaluate at locations with low mobility demand and altruistically provide information to the other parties. When farmers collaborate Collaborative Bayesian Optimization with Fair Regret to optimize the conditions (e.g., fertilizer composition) to grow crops, a farmer would not like to always test conditions with poor yield to benefit others. Likewise, a research group collaborating with others in chemical or drug design would not want to study significantly fewer or less promising molecules than the others. Then, what guarantees should a collaborative BO algorithm provide to address every party s concerns? Firstly, the algorithm must have good efficiency: It should be able to maximize the evaluated function values across all parties (i.e., cumulative utility) and simultaneously find the maximizer. The cumulative utility (CU) here is closely related to the notion of cumulative regret used to measure the optimization performance of several BO algorithms (Daxberger & Low, 2017; Desautels et al., 2014; Hoang et al., 2018; Srinivas et al., 2010). Next, to avoid the unfair real-world rewards in the aforementioned examples, the algorithm needs to ensure fairness by reducing differences between the CUs of all parties. However, incorporating both efficiency and fairness in a BO algorithm is nontrivial due to the lack of a formal notion of fairness and the trade-off between them. The trade-off arises as fairness can be achieved by preventing some parties from getting larger CUs. For example, the fair strategy of selecting the same input queries for all parties may hurt efficiency due to redundant function evaluations. To address this challenge, our work here considers a collaborative BO mechanism where a trusted mediator jointly selects the input queries for all parties based on their data, as in batch BO algorithms. A new notion of regret is defined based on a social welfare concept from economics for capturing both inefficiency and unfairness. We then design a collaborative BO algorithm whose convergence rate can be theoretically guaranteed by bounding the new regret, i.e., the algorithm would not produce inefficient and unfair assignments. This is novelly achieved by considering each party s CU up to the previous BO iteration when selecting input queries in the current BO iteration to reduce unfairness. The specific contributions of our work here include: Defining new notions of fair regret based on a social welfare concept economics called the generalized Gini social-evaluation function (G2SF) (Weymark, 1981) that considers both efficiency and fairness (Sec. 3); Proposing a collaborative BO algorithm that can theoretically guarantee its convergence rate by bounding the new fair regret, achieve asymptotic no-regret performance, and address the concern of collaborative fairness (Sec. 4); Designing an adjustable parameter that can be used to trade off between fairness vs. efficiency in both the new fair regret and our collaborative BO algorithm (Sec. 3); Demonstrating the increased fairness and other properties of our collaborative BO algorithm empirically (Sec. 5). To the best of our knowledge, this is the first algorithm that addresses collaborative fairness in BO and considers fairness in the cumulative sense. The only other fair BO work (Perrone et al., 2020) has focused on mitigating biases in the outputs, which is a different line of fairness concept. More literature related to fairness (including the various fairness concepts in multi-armed bandit) and incentivizing exploration will be discussed later in Sec. 6. 2. Problem Formulation and Background Our problem setting considers n parties jointly optimizing a black-box objective function f : X R where X Rd denotes a domain of d-dimensional input feature vectors.1 We assume that each party can evaluate the objective function f at any input x X and the availability of a trusted mediator2 whom every honest party is willing to share its data with. In each iteration t = 1, . . . , T, the mediator selects an input query xi t to be assigned to each party i who then evaluates f at the assigned input xi t to observe a noisy realized output yi t f(xi t) + ϵ where ϵ N(0, σ2) with noise variance σ2. Let [n] {1, . . . , n}. Let Xt (xi t)i [n] and yt (yi t) i [n] be an input matrix and the corresponding vector of noisy realized outputs in iteration t, respectively. After t iterations, the mediator has a pooled dataset D1:t (X1:t, y1:t) where X1:t (Xj)j=1,...,t and y1:t (yj)j=1,...,t. The number T of iterations can be decided in advance or determined when any party wishes to leave the collaboration. 2.1. Batch Bayesian Optimization (BO) The objective of a conventional batch BO is to find a global maximizer x arg maxx X f(x) of the objective function f. To achieve this, the black-box objective function f is modeled as a Gaussian process (GP), that is, every finite subset of {f(x)}x X follows a multivariate Gaussian distribution (Rasmussen & Williams, 2006). A GP is fully specified by its prior mean mx E[f(x)] and covariance kxx cov[f(x), f(x )] for all x, x X. In iteration t, a GP model can produce a predictive distribution of the function outputs f Xt (f(xi t))i [n] at any input matrix Xt: p(f Xt|D1:t 1) = N(µXt|D1:t 1, ΣXt|D1:t 1) where µXt|D1:t 1 and ΣXt|D1:t 1 are, respectively, the predictive/posterior mean vector and covariance matrix which can be computed analytically (Hoang et al., 2015; 2016). Then, in each iteration t, a batch BO algorithm selects a batch of inputs Xt X n to maximize some acquisition function that is computed using the predictive distribution 1Our theoretical analysis considers a discrete input domain. However, our results can be generalized to a continuous, compact input domain via a suitable discretization (Srinivas et al., 2010). 2In reality, the mediator can be a government agency or an industry association who acts in everyone s best interest equally. Collaborative Bayesian Optimization with Fair Regret of f Xt. For example, the distributed batch GP upper confidence bound (DB-GP-UCB) algorithm (Daxberger & Low, 2017) selects Xt that trades off between sampling inputs with large posterior mean (i.e., exploitation) vs. those with high information gain on f X (i.e., exploration): max Xt X n 1 µXt|D1:t 1 + p αt I[f X ; yt | D1:t 1] (1) where parameter αt is set to trade off between exploitation vs. exploration for bounding its cumulative regret (Sec. 2.2), while I[f X ; yt | D1:t 1] = 0.5 log | I + σ 2ΣXt|D1:t 1 | is the information gain (or, equivalently, reduction in uncertainty) on f X by evaluating f at Xt to observe yt given the dataset D1:t 1 from previous iterations 1, . . . , t 1, and is larger when the inputs within Xt are more diverse and dissimilar to the inputs X1:t 1 from earlier iterations. 2.2. Regret and Utility The above DB-GP-UCB algorithm is designed to minimize the full cumulative regret which is a common BO objective. Let ri t f(x ) f(xi t) be the instantaneous regret incurred by assigning xi t to party i in iteration t. The full instantaneous regret is the sum of instantaneous regrets incurred by all parties of the batch in iteration t: rt Pn i=1 ri t. The full cumulative regret is the sum of full instantaneous regrets over iterations t = 1, . . . , T: RT PT t=1 rt. When a BO algorithm asymptotically achieves no regret (i.e., lim T RT /T = 0), it will eventually converge to a global maximum and every party will evaluate f at the assigned maximizer x . Let the utility ui t f(xi t) achieved by party i in iteration t be the function output f(xi t) that it has evaluated at the assigned xi t. The full utility is the sum of utilities achieved by all parties of the batch in iteration t: ut Pn i=1 ui t. The full cumulative utility is the sum of full utilities over iterations t = 1, . . . , T: UT PT t=1 ut. Both ut and UT are considered measures of efficiency. The full cumulative regret equals to the loss in utility from not knowing x beforehand: RT = n Tf(x ) UT . So, minimizing RT is equivalent to maximizing UT . Though RT is more commonly used than UT in the BO literature, we have introduced the notion of utility to ease the discussion of the concept of fairness. The properties formalized below are satisfied by rt and ut. Such properties should also be satisfied by alternative notions of regret and utility so that achieving no regret would imply convergence to a global maximum for every party: E1 Monotonicity. If the utility of any party improves in any iteration t, ceteris paribus, then ut should increase and rt should decrease: Let {ui t}i [n] and {ˆui t}i [n] denote any two sets of utilities achieved by parties i [n] in iteration t, and rt (ut) and ˆrt (ˆut) be the corresponding full instantaneous regrets (full utilities). For t = 1, . . . , T, i [n] (ui t > ˆui t) ( j [n] \ {i} uj t = ˆuj t) (ut > ˆut) (rt < ˆrt) . E2 Instantaneity. The full instantaneous regret in iteration t is 0 if every party i [n] is assigned x to evaluate f: For t = 1, . . . , T, ( i [n] xi t = x ) rt = 0 . 2.3. Social Welfare and Fairness In this subsection, we will introduce a number of social welfare and fairness concepts from economics which will be referenced later in this paper. Suppose that there are n parties and each party i [n] has a utility ai. Let a (ai)i [n] be a utility vector. A social welfare function projects a utility vector to a real value and naturally induces an ordering ( ) of the utility vectors: A utility vector b is preferred over a (i.e., denoted as a b) if the social welfare SW(b) of b is larger than the social welfare SW(a) of a: SW(a) < SW(b). Some commonly considered social welfare functions and orderings (Endriss, 2018) are listed below: Utilitarian social welfare (USW) is defined as a sum of the utilities of all parties i [n]: SWutil(a) Pn i=1 ai. Maximizing the USW maximizes the averaged utility over all parties, but it does not consider fairness. For example, given n = 3, the utility vector (3, 3, 3) is obviously fairer than (1, 1, 7), but their USWs are the same. The full cumulative utility UT defined in Sec. 2.2 is a form of USW. Egalitarian (max-min) social welfare (ESW) is defined as the minimum utility of any party i [n]: SWegal(a) mini [n] ai. ESW prefers the fairer outcome of (3, 3, 3) over (1, 1, 7). Leximin ordering is an ordering method that cannot be induced from any social welfare function. A utility vector b is preferred over a in a Leximin ordering (i.e., denoted as a lex b) iff a and b are identical, or after sorting in ascending order to obtain a and b , there exists an i [n] s.t. ai < bi and aj = bj for all j < i. Note that this implies SWegal(a) SWegal(b). Leximin ordering extends the idea of maximizing the utility of the worst-off party in ESW to the second worst-off party next, then the third, and so on. Next, how do we decide if the ordering a b is fair? Intuitively, given a utility vector a, if a party with a higher utility transfers 1/2 of its excess utility to another worse-off party, then it will result in a new utility vector b with more similar utilities and so, b should be preferred for fairness. This central intuition on fairness is captured by the Pigou Dalton principle (PDP) (Dalton, 1920; Pigou, 1912). Formally, a social welfare function or an ordering satisfies PDP in a strong sense if b is preferred over a (i.e., a b) whenever there exist i, j [n] s.t. (a) k [n] \ {i, j} ak = bk, Collaborative Bayesian Optimization with Fair Regret (b) ai + aj = bi + bj, and (c) |ai aj| > |bi bj|. To satisfy the PDP in the weak sense, we only require that a b. For an unfairness metric represented by a function H to satisfy the PDP, we will instead require H(a) > H(b) which means that the unfairness in a is more than that in b. This general concept of fairness gives us flexibility in selecting a fair and efficient notion of social welfare. 3. Notions of Fair Regret in Collaborative BO Recall that we consider n parties jointly optimizing a common black-box objective function f. A mediator repeatedly selects separate input queries to be assigned to different parties who then evaluate f at these assigned inputs. During the BO process, each party shares its data with the mediator only and does not know the data of the other parties. Such an information asymmetry process provides two key implications. Firstly, each party prefers to stay in the collaboration instead of leaving to optimize f independently because evaluating f at the inputs assigned by the mediator is the only way to benefit from more data gathered by all the parties.3 Next, it prevents any self-interested party from evaluating f at inputs with large function values/real-world rewards discovered by the other parties, which may increase unfairness. Moreover, it protects the privacy of the parties (e.g., location history of Mo D AV). To incentivize any selfinterested party (e.g., an Mo D company or a research group) to join the collaboration, our work proposes a collaborative BO algorithm that considers both efficiency and fairness when selecting the input queries. To do this, the conventional batch BO objective of minimizing the full cumulative regret RT (Sec. 2.2) is no longer suitable since it is equivalent to maximizing the full cumulative utility which is a form of USW (Sec. 2.3) no fairness is considered during the BO process. A new notion of regret, which incorporates both efficiency and fairness, is needed. Following the notations in Sec. 2.2, let U i T PT t=1 ui t be the individual cumulative utility (CU) of party i, and u T (U i T )i [n] be a vector of individual CUs of all parties. We can define a generalized CU by aggregating the individual CUs of all parties in a utility vector u T using a function W which projects u T to a real value: GUT W(u T ). Note that GUT = UT when W is set as USW. In our work here, we propose a suitable function W such that maximizing W corresponds to (a) maximizing the individual CU of each party (i.e., efficiency) and (b) reducing the differences between the individual CUs of all parties (i.e., fairness). To achieve this, W can be any welfare function that satisfies both monotonicity (E1) and the PDP. We will also define W using the generalized Gini social-evaluation function 3We will show this implication later via empirical evaluation. A rigorous mechanism that can incentivize the parties to adhere to the mediator and stay in the collaboration is left to future work. (G2SF) since it is more general than the welfare functions described in Sec. 2.3 and, more importantly, provides an adjustable parameter that can be used to trade off between efficiency vs. fairness, as detailed in Sec 3.2. 3.1. Fair Regret Given the generalized CU (i.e., GUT ), we can define a generalized cumulative regret similar to that described in Sec. 2.2: GRT W(Tf ) GUT where f f(x )1. Though it seems natural and ideal to consider minimizing GRT directly, it is challenging to do so within the iterative BO process since GUT = W(u T ) depends on the dataset of every party in all the BO iterations and may not be decomposable into independent terms across iterations t = 1, . . . , T for certain choices of W, unlike UT that can be decomposed into additive terms u1, . . . , u T . To tackle this challenge, we propose to greedily optimize the trade-off between efficiency vs. fairness in each BO iteration instead. Specifically, let U i t Pt t =1 ui t be the individual t-step cumulative utility (t-CU) of party i for t = 1, . . . , T and ut (U i t)i [n] be a vector of individual t-CUs of all parties. Correspondingly, let the generalized t-CU be gt W(ut). We define a new notion of instantaneous fair regret st by subtracting gt from the maximally achievable value in iteration t: st W(f + ut 1) gt (2) = W((f(x ) + U i t 1)i [n]) W((f(xi t) + U i t 1)i [n]) where the equality is due to the definition of ut and ui t = f(xi t). To understand (2), for t = 1, . . . , T, the maximally achievable value of the individual t-CU of any party i [n] is f(x ) + U i t 1 instead of tf(x ) as its individual (t 1)- CU U i t 1 achieved in previous iterations 1, . . . , t 1 cannot be changed. This new fair regret satisfies instantaneity (E2 in Sec. 2.2) since st = 0 if xi t = x for all i [n]. Then, a fair cumulative regret can be defined as ST PT t=1 st. From Sec. 2, before iteration t, any party i can only observe the noisy realized outputs (yi t ) t =1,...,t 1 instead of the noiseless function outputs (f(xi t )) t =1,...,t 1. Consequently, since each party i does not know the value of U i t 1, U i t 1 should not be directly used in our algorithm. Instead, it is reasonable to use the noisy realized/observed variant of the individual (t 1)-CU: λi t Pt 1 t =1 yi t as the parties are particular about observing the realized outputs (e.g., mobility demand correlating with the resulting income) in the real-world applications.4 So, we can modify st to obtain a revised notion of instantaneous fair regret s t instead: s t W((f(x ) + λi t)i [n]) W((f(xi t) + λi t)i [n]) (3) 4In practice, when f is unknown/black-box, RT and UT are computed by also approximating f(xi t) with yi t. Collaborative Bayesian Optimization with Fair Regret which still satisfies instantaneity (E2) since s t = 0 if xi t = x for all i [n]. Similarly, let g t W((f(xi t) + λi t)i [n]). Remark 1. We consider fairness in individual t-CUs among the parties in every BO iteration t = 1, . . . , T instead of only at the final iteration T. This is desirable for real-world applications as the parties may be concerned about fairness during the BO process and may not fix T in advance. Remark 2. Cooperative game theory (CGT) (Chalkiadakis et al., 2011) is another framework that may be incorporated into BO to achieve fairness by distributing/assigning the rewards/input queries to all parties according to their data valuations. Our work here chooses to adopt social welfare concepts instead of CGT as the social welfare functions are cheaper to compute and easier to incorporate into BO than the data valuation functions (e.g., Shapley value) from CGT. However, we will empirically study a CGT concept known as individual rationality which states that each party should not be worse off via collaborating than working alone. 3.2. Generalized Gini Social-Evaluation Function Our above notion of fair regret (3) is defined over a general welfare function W. We will next introduce the specific form of W that is used in our BO algorithm. The Gini index is a popular measure of income inequality and is related to the Gini social-evaluation function (GSF). A fairer distribution of incomes should have a smaller Gini index and larger GSF value. Formally, let w1, . . . , wn be a sequence of positive and non-increasing weights, w (wi)i [n] be a weight vector, and φ be a sorting operator that sorts the elements of its input vector (e.g., ut) in an ascending order and returns a new sorted vector. GSF is a subclass of ordered weighted average functions. According to GSF, we can define the generalized t-CU as gt W(ut) = w φ(ut) (4) s.t. its corresponding Gini weights are set as odd numbers in a descending order: wi = 2(n i) + 1 for all i [n]. Intuitively, (4) assigns a larger weight wi to a smaller individual t-CU U i t and outputs a larger gt for a utility vector ut whose elements are more similar, which aligns with our intuition that such values of individual t-CUs of all parties are fairer and more desirable. For example, given 2 parties, the gt with utility vector (5, 5) is W(5, 5) = 20 which is larger than gt with (8, 2) or (2, 8) (i.e., 3 2 + 1 8 = 14). However, both their CUs are equal (i.e., 5 + 5 = 8 + 2). As there is no strong reason for the Gini weights or any other choice of weights, the work of Weymark (1981) has generalized W to the generalized Gini social-evaluation function (G2SF) which allows an arbitrary sequence of positive nonincreasing weights. Positive weights address monotonicity (E1), while pairing non-increasing (decreasing) weights with increasing utilities has been shown to satisfy the PDP in the weak (strong) sense (see Axiom 6 in (Weymark, 1981) for more details) and thus addresses fairness . The weights w can be used to trade off between efficiency vs. fairness, as discussed later in Sec. 3.3. 3.3. Efficiency vs. Fairness in the Fair Regret When W is set as a G2SF function, the weights w can be used to trade off between efficiency vs. fairness in g t (or s t) and gt (or st). For simplicity, we will focus our discussion in this section on gt (or st) but it would also apply to g t (or s t). To devise a way to control the efficiency vs. fairness trade-off, we start by examining two extreme cases: If wi = 1 for all i [n], then gt reduces to USW (Sec. 2.3). Maximizing gt would lead to maximum efficiency (i.e., largest UT ). The fair cumulative regret is ST = RT since in each of its additive terms st (2), U i t 1 in its first and second terms would cancel out. If w1 = 1 and wi = 0 for i = 2, . . . , n, then gt reduces to ESW (Sec. 2.3) which only considers the utility of the worst-off party. In this case, st does not satisfy monotonicity (E1 in Sec. 2.2) as the utilities of the other n 1 parties are assigned a weight of 0 and hence ignored. In the first case, the weights are identical, while in the second case, the relative difference between w1 vs. wi for i = 2, . . . , n is large. Intuitively, smaller relative differences between wi s would cause the larger weighted utilities to make up a bigger proportion of gt and hence cause gt to induce an ordering that may prefer more efficient but less fair utility vectors. To better control the differences between the non-increasing wi for all i [n], we set wi = ρi 1 with 0 < ρ 1 s.t. a single parameter ρ can be used to trade off between efficiency vs. fairness, as detailed below: Proposition 1. Set wi = ρi 1 for all i [n] with 0 < ρ 1. Then, st will satisfy monotonicity (E1) and the Pigou-Dalton principle (PDP) on fairness. Moreover, when ρ = 1, W is USW s.t. gt and st only satisfy the PDP in the weak sense; 0 < ρ < 1, gt and st satisfy PDP in the strong sense; ρ 0, wi/w1 0 for i = 2, . . . , n, and the preference of utility vectors induced by gt converges to that of ESW and Leximin ordering: a lex b W(a) < W(b). Its proof is in Appendix B.1. Informally, Proposition 1 holds as w1, . . . , wn forms a decreasing sequence that satisfies the conditions of PDP outlined in the work of Weymark (1981) and the convergence to leximin ordering is described in the work of Endriss (2018). Concretely, suppose that there are 3 parties. The party with the largest, mid, and smallest U i t is weighted 1, ρ, and ρ2, respectively. As ρ decreases from 1 to 0, the relative differences between the parties weights increase. Minimizing the fair regret st Collaborative Bayesian Optimization with Fair Regret defined w.r.t. ρ would lead to more fairness and smaller Ut. Remark 3. How would gt compare with ut when varying ρ? Note that as ρ increases, Pn i=1 wi increases and becomes n when ρ = 1. For a better comparison, we should use normalized weights wi/ Pn i=1 wi to compute the normalized gt. For a fixed (f(xi t) + U i t 1)i [n], when ρ decreases, the normalized weights of the parties with smaller individual t-CUs increase, hence decreasing the normalized gt to be less than ut/n. The normalized gt and ST (i.e., computed using the normalized weights) will be used in Sec. 5. Remark 4. There are other ways to set w to control the trade-off between efficiency vs. fairness. For example, we can set wi = ρ(n i) + 1 for all i [n] with ρ > 1 s.t. the weight wi decreases linearly in i, like in the original Gini weights (see paragraph after (4)). However, it is not used as the preference of utility vectors induced by gt converges to Leximin ordering more slowly, as compared to the exponentially decreasing weights in Proposition 1. 4. Collaborative BO with Fair Regret We will now design a collaborative BO algorithm whose convergence rate can be theoretically guaranteed by bounding its fair (cumulative) regret s t (3) (S T PT t=1 s t).5 To achieve this, we propose a variant of the DB-GP-UCB algorithm (Sec. 2.1) which jointly selects n input queries Xt = (xi t)i [n] to be assigned to all n parties in each iteration t by maximizing the following acquisition function that accounts for fairness via G2SF: Xt arg max Xt X n i=1 wiφi((λi t + µxi t|D1:t 1)i [n]) αt I(f X ; yt | D1:t 1) where φ is the same sorting operator defined previously in (4), φi is the i-th element of the sorted vector returned by φ, λi t Pt 1 t =1 yi t is a realized/observed variant of the individual (t 1)-CU (i.e., U i t 1) of each party i, and αt is still a parameter that is set to trade off between exploitation vs. exploration. The first (exploitation) term in (5) is an ordered weighted average of λi t + µxi t|D1:t 1 for all i [n] and thus satisfies both monotonicity (E1) and the PDP. In each iteration t, our collaborative BO algorithm (5) selects n input queries Xt and assigns each input query xi t in Xt to party i for evaluating the objective function f. Such a selection has to trade off between (a) sampling near to a likely maximizer (i.e., with a large posterior mean µxi t|D1:t 1), (b) sampling inputs that can yield large information gain I(f X ; yt | D1:t 1) to improve the belief of f X (i.e., exploration), and (c) balancing the expected individual t-CUs between the parties by correcting past observed 5BO algorithms usually cannot minimize RT or S T directly. unfairness in λi t. To achieve (c), the parties with smaller λi t should exploit inputs with larger posterior mean while the parties with larger λi t may be assigned to sample inputs with large information gain. Specifically, given any Xt, the first term in (5) is maximized when the party with the k-th smallest λi t is assigned to evaluate f at x with the k-th largest µx|D1:t 1 due to the PDP property of function W (see Appendix B.2). So, our algorithm requires both ordered weighted averaging and λi t to achieve fairness. Note that without the second exploration term, maximizing our acquisition function in (5) is equivalent to maximizing each party s expected utility separately. Though G2SF is only applied to the first exploitation term in (5), fairness affects the second exploration term and the exploitation-exploration trade-off through αt which will be defined in Theorem 1. Remark 5. One may be inspired by the acquisition function in (5) to consider an alternative two-step batch BO algorithm to incorporate fairness: (a) Select Xt using the DB-GP-UCB algorithm (1), and (b) permute the order of xi t in Xt s.t. xi t with a larger expected utility µxi t|D1:t 1 is assigned to (or paired with) a smaller λi t. However, this heuristic can only partially alleviate some fairness concerns as it will ignore the numerical values of each λi t when selecting Xt and may be less aggressive at correcting for past unfairness. More importantly, it does not allow us to control the trade-off between efficiency vs. fairness. 4.1. Regret Bound The convergence rate of our collaborative BO algorithm (5) can be theoretically guaranteed via a probabilistic bound on its fair cumulative regret S T : Theorem 1. Let δ (0, 1), γT max X1:T I[f X ; y1:T ], C be a constant with C I[fx; (yi t ) i [i] | D1:t 1] for all i [n 1], x X, and t = 1, . . . , T, C1 4/ log (1 + σ 2), and αt C1(Pn i=1 w2 i ) exp (2C) log (|X|t2π2/(6δ)) . Then, for the modified fair cumulative regret S T incurred by our collaborative BO algorithm (5), S T = PT t=1 s t 2(TαT γT )1/2 holds with probability of at least 1 δ. Its proof and more details about the constants can be found in Appendix B.3. With the exception of αt, the constants are the same as the ones defined in the work of Daxberger & Low (2017). Theorem 1 implies that lim T S T /T = 0 and our algorithm can eventually converge to a global maximum, hence achieving asymptotic no-regret performance. That is, there exists an iteration t where s t = 0 and all parties evaluate f at x with no inefficiency and unfairness introduced. By bounding s t, we bound the inefficiency and unfairness (i.e., adjusted based on historical observed utilities) introduced in iteration t, i.e., we rule out some inefficient and less fair assignments, as described in Fig. 1. Note that a smaller ρ for our collaborative BO algorithm Collaborative Bayesian Optimization with Fair Regret Figure 1. Overview of our notion of instantaneous fair regret: s t considers λi t (i.e., realized/observed variant of individual (t 1)- CU which cannot be changed) and is the loss in the generalized t-CU g t from evaluating f at {xi t}i [n] instead of the maximizer x . As W captures both efficiency and fairness, when s t > 0, s t reflects the inefficiency from not evaluating x in iteration t and some unfairness. The red and blue boxes enclose assignments with the same full utility ut of values a and b, respectively. A fairer assignment is of a darker shade and increases g t. By bounding s t, we rule out some assignments due to their inefficiency (almost the entire red box) or unfairness (pale blue). would lead to relatively more input queries selected for exploration over exploitation due to two reasons: (a) Observe that αt in Theorem 1 depends on the weights wi s of G2SF. A smaller ρ will lead to a greater ratio of p Pn i=1 w2 i in the exploration term to Pn i=1 wi (i.e., total weight of the exploitation term). (b) Even after adjusting for the ratio, a smaller ρ would lead to a smaller exploitation term in (5) (see Remark 3 in Sec. 3.3). This increased exploration may cause the observed standard full cumulative regret RT (and possibly S T ) to increase. Furthermore, as the order of parties in the computation of g t may change across iterations, a smaller ρ (i.e., associated with more fairness) would reduce the efficiency of multiple parties. 5. Experiments and Discussion This section empirically evaluates the performance and properties of our collaborative BO algorithm using a benchmark function: Hartmann-6d, and three real-world collaborative black-box optimization problems: (a) hyperparameter tuning of a logistic regression (LR) model with a mobile sensor dataset (Anguita et al., 2013), (b) hyperparameter tuning of a convolutional neural network (CNN) with federated extended MNIST (FEMNIST) dataset (Caldas et al., 2018), and (c) mobility demand hotspot discovery on a traffic dataset (Chen et al., 2013). The performance of our collaborative BO algorithm is compared with that of the distributed batch GP-UCB (DB-GP-UCB) algorithm (Daxberger & Low, 2017) which has been empirically shown to outperform several existing batch BO algorithms (Daxberger & Low, 2017; Kharkovskii et al., 2020) but, like them, does not consider fairness. Note that DB-GP-UCB is equivalent to our algorithm with ρ = 1, i.e., each party s expected individual t-CU has equal weight. We evaluate the performance of the tested algorithms in terms of the following metrics: Inefficiency, measured using full cumulative regret (CR) (Sec. 2.2) averaged across parties, i.e., RT /n; Unfairness, measured using Ut/n gt which is the difference between the averaged individual t-CU vs. the generalized t-CU. When computing gt, we set6 ρ = 0.2 and normalize the weights wi s.t. Pn i=1 wi = 1 for a fair comparison. So, Ut/n gt is the generalized Gini absolute inequality indices (Weymark, 1981) and is an unfairness metric that satisfies PDP as gt satisfies PDP (Proposition 1) and Ut/n is a constant if the precondition of PDP holds. Its minimum value is 0 when U i t for all i [n] are equal (i.e., the fairest outcome); Unfairness and inefficiency, measured together using the normalized ST by setting ρ = 0.2. For the real-world experiments, as f(xi t) is unknown, we approximate f(xi t) with yt i for all i [n] and t = 1, . . . , T and so, ST = S T . We aim to show that our collaborative BO algorithm with ρ < 1 can improve fairness and observe the effect on efficiency. Following the previous GP-UCB works (Kandasamy et al., 2015; 2016) and our theoretical result (Theorem 1), we set αt = c1d(Pn i=1 w2 i ) log (c2t) in (5). We consider two settings: (i) Fix c1: c1 and c2 are fixed across different ρ s, and (ii) Vary c1: c2 is fixed but c1 varies for different ρ s s.t. the ratio of p Pn i=1 w2 i in the exploration term to Pn i=1 wi (i.e., total weight of the exploitation term) is a constant. Recall from the last paragraph of Sec. 4.1 that a smaller ρ will increase the exploration relative to exploitation and hence reduce the efficiency. Setting (ii) is used to elaborate and mitigate this effect. Details on the choices of c1 and c2 are shown in Appendix C.1. For each experiment, we repeat 10 runs of BO with different random seeds and plot the standard error as the error bars. The objective function is modeled as a GP with kxx chosen to be a squared exponential kernel. We describe the experimental setting of each tested objective function below: Hartmann-6d function. We consider n = 3 parties. The objective function f has d = 6 input dimensions in [0, 1]6. In each run, 10 data points are randomly selected for each party as initialization (i.e., T0 = 10 in Algorithm 1 of Appendix A). Hyperparameter tuning of LR with mobile sensor dataset. We consider n = 5 parties and each party trains a LR model for activity recognition using sensor data gathered by 10 mobile sensors.7 These parties collaborate to tune 6Note that this is distinct from ρ used in (5) which is varied in each experiment. 7Every party represents a company who owns the data of multiple users. They share a common objective function as the distributions of the companies data pooled from multiple users are likely Collaborative Bayesian Optimization with Fair Regret three hyperparameters of a LR: batch size in [20, 100], L2 regularization parameter and learning rate in [10 5, 1]. The output of the objective function is the validation accuracy of the LR model. We set T0 = 2. Every party is particular about its individual CU and the best ui t that it has evaluated as it would use the corresponding LR model for real-world activity recognition. Hyperparameter tuning of CNN with FEMNIST. We consider n = 10 parties and each party trains a CNN for character recognition using handwritten digits from 10 authors. The parties collaborate to tune three hyperparameters: learning rate, learning rate decay, and regularization parameters in [10 5, 1]. The output of the objective function is the validation accuracy of the CNN model and T0 = 2. Mobility demand hotspot discovery on traffic dataset. The traffic dataset includes 2506 input regions, each of which has a mobility demand.8 We adopt the settings in (Chen et al., 2015; Kharkovskii et al., 2020) and consider n = 8 parties. Each party is a company that operates a fleet of Mo D AVs and wants to discover the region with the highest mobility demand.9 If an Mo D AV picks up a passenger, the company deploys another Mo D AV to take its place. In each BO iteration, the mediator would assign each company to evaluate the mobility demand at an input region that is connected/near to its current input region. This constraint makes it harder to achieve efficiency and fairness as an Mo D AV cruising in low mobility demand regions cannot evaluate at known higher mobility demand regions that are too far away. The aim here is to show that our collaborative BO algorithm can improve fairness.10 Observations about the notions of regret. Fig. 2 shows results of the full CR RT /n and fair CR ST against different ρ s used in our collaborative BO algorithm (5) for each experiment. It can be observed that as ρ decreases, the inefficiency metric RT /n generally increases for both fixed and varying c1. This is expected since with a smaller ρ, the relative weight of the exploration term in (5) is larger and the exploitation term is smaller (i.e., discussed in Sec. 4.1 and Proposition 1), which reduces efficiency. However, in Fig. 2b, RT /n decreases when ρ decreases from 1 to around 0.7. This may be due to the objective function of the LR being easy to maximize. For example, there are multiple hyperparameters that can achieve similar competitive valida- to be similar. 8Appendix C.1 visualizes the mobility demand distribution. 9An Mo D AV visiting a higher mobility demand region is more likely to pick up passengers, earn fares, and collect more useful information. 10The efficiency of our collaborative BO algorithm in this experiment can be improved via a non-myopic approach (Kharkovskii et al., 2020), or relaxing the locality constraints and adopting discrete optimization techniques to maximize the acquisition function. However, this is non-trivial and not the focus of this work. 0.2 0.4 0.6 0.8 1.0 ρ Full or Fair CR RT/n (Fix c1) ST (Fix c1) RT/n (Vary c1) ST (Vary c1) 0.2 0.4 0.6 0.8 1.0 ρ Full or Fair CR RT/n (Fix c1) ST (Fix c1) RT/n (Vary c1) ST (Vary c1) (a) Hartmann-6d (b) Hyp. Tuning, LR 0.2 0.4 0.6 0.8 1.0 ρ Full or Fair CR RT/n (Fix c1) ST (Fix c1) RT/n (Vary c1) ST (Vary c1) 0.2 0.4 0.6 0.8 1.0 ρ Full or Fair CR RT/n (Fix c1) ST (Fix c1) RT/n (Vary c1) ST (Vary c1) (c) Hyp. Tuning, CNN (d) Traffic Figure 2. Full CR RT /n and normalized fair CR ST averaged over 10 BO runs incurred by the tested algorithms using different ρ s in (5). The dotted horizontal lines represent the results of ρ = 1 (i.e., DB-GP-UCB). tion accuracies for LR. Therefore, increasing the exploration (i.e., decreasing ρ) may lead to earlier discovery of inputs with small regret and thus improve efficiency. Moreover, as ρ decreases from 1 to 0.2, Fig. 2 shows that the fair CR ST , which measures both inefficiency and unfairness, first decreases and then increases. In Fig. 2, we plot the fair CR ST of DB-GP-UCB (equivalent to ours with ρ = 1) as a dotted line to aid comparison with ST of other ρ s. The decrease of ST is due to the improved fairness (and possibly efficiency) while the subsequent increase of ST is due to larger inefficiency having a dominant effect. For each experiment, since DB-GP-UCB (ρ = 1) does not consider fairness, its ST may be larger than those with smaller ρ s even though its RT /n (when ρ = 1) is smaller. Furthermore, it can be observed that when ρ is small, the blue lines (Vary c1) usually lie below the red lines (Fix c1), which shows that varying c1 can reduce the inefficiency significantly. In Figs. 2b-c, the gap between ST and RT /n, which reflects unfairness, is the largest when ρ = 1. This is expected as when ρ = 1, the mediator may unintentionally assign a party to evaluate at inputs with smaller function values multiple times. In Appendix C.2, we plot more graphs including Rt/n vs. iteration t for various ρ s; the observations are similar to the above. Observations about fairness. To evaluate the fairness achieved by the tested algorithms, Fig. 3 shows results of the unfairness metric Ut/n gt averaged over iterations t = 1, . . . , T against different ρ s used in our collaborative BO algorithm (5). It can be observed that for all the experiments, the averaged (Ut/n gt) decreases as ρ decreases. This observation holds for various choices of fixed Collaborative Bayesian Optimization with Fair Regret 0.2 0.4 0.6 0.8 1.0 ρ Averaged (Ut/n gt) (Fix c1) (Vary c1) 0.2 0.4 0.6 0.8 1.0 ρ Averaged (Ut/n gt) (Fix c1) (Vary c1) (a) Hartmann-6d (b) Hyp. Tuning, LR 0.2 0.4 0.6 0.8 1.0 ρ Averaged (Ut/n gt) (Fix c1) (Vary c1) 0.2 0.4 0.6 0.8 1.0 ρ Averaged (Ut/n gt) (Fix c1) (Vary c1) (c) Hyp. Tuning, CNN (d) Traffic Figure 3. Averaged (Ut/n gt) incurred by the tested algorithms using different ρ s in (5) for various objective functions. c1 and even for varying c1. Combining this observation with that about RT /n, we suggest to use varying c1 in the real-world applications for achieving more fairness and efficiency. However, one may ask whether an increase in Ut/n gt is due to more unfairness or the increase of the Ut/n alone. To answer this question, we plot Ut/n gt vs. Ut/n for different ρ s in Appendix C.3. It shows that even for the same Ut/n, Ut/n gt generally decreases when ρ decreases, which indicates that our collaborative BO algorithm with a smaller ρ can achieve fairer outcomes. Individual rationality. In CGT, a desirable property of a collaboration is individual rationality, which states that each party should not be worse off via collaborating than working alone (Chalkiadakis et al., 2011). In our collaborative BO setting, it is individually rational for any party to participate in the collaboration if it can obtain a better estimate of x for the same CR or, alternatively, smaller CR for the same estimate of x compared to performing BO alone. In Appendix C.5, we show that for some ρ s, our collaborative BO algorithm would satisfy this property by assuming that parties would perform GP-UCB (Srinivas et al., 2010) on their own. More experimental results on the simple regret, collaborative BO with n = 50 parties, and some detailed analysis of λi t are given in Appendix C. 6. Related Work To the best of our knowledge, there is no existing work on collaborative fairness in BO. The closely related works are multi-armed bandit (MAB) with various fairness considerations and batch BO, as summarized in Table 1. Our work builds upon batch BO but additionally minimizes the differences between the individual CUs of all parties to encourage collaboration. The MAB works consider significantly dif- Table 1. Summary of related works. Description References Batch BO. Select the input of each party one at a time (Alvi et al., 2019; Azimi et al., 2010; Contal et al., 2013; Desautels et al., 2014; Gonz alez et al., 2016) Batch BO. Optimize the batch of inputs jointly (Chevalier & Ginsbourger, 2013; Daxberger & Low, 2017; Shah & Ghahramani, 2015; Wu & Frazier, 2016) MAB. Fairness across arms (Liu et al., 2017; Patil et al., 2020) MAB. Fairness between objectives (Busa-Fekete et al., 2017) MAB. Fairness across parties (Hossain et al., 2020) MAB. Incentivizing exploration (Frazier et al., 2014; Mansour et al., 2020) ferent notions of fairness. Only the work of Hossain et al. (2020) has studied fairness across parties but its setting differs as every party receives a (different) utility whenever an arm is pulled. The incentivizing exploration (IE) works are not designed to ensure fairness across parties. Instead, they tackle a separate problem of incentivizing each selfinterested party to adopt the mediator s recommendation and explore arms with smaller utility, which benefits the group but may not benefit itself. Another key difference is that in most IE applications, parties are less keen on exploration as they are not intent on finding the maximizer x and do not participate in multiple rounds. Nevertheless, designing a fair and incentive-compatible (Mansour et al., 2020) collaborative BO algorithm that does not rely on external incentives (e.g., monetary payment) is an important but nontrivial future work. 7. Conclusion This paper describes the first BO algorithm that considers collaborative fairness. We propose new notions of fair (cumulative) regret and a collaborative BO algorithm whose convergence rate can be theoretically guaranteed by bounding the new fair regret. By controlling parameter ρ, the parties can select a set of weights for G2SF to trade off between efficiency vs. fairness. This is empirically demonstrated using a benchmark function and three real-world experiments. With this collaborative BO algorithm that considers each party s historical utilities, a mediator can ensure fairness across iterations while greedily selecting input queries for the current iteration. By considering fairness in every iteration up to T, the differences between the (historical) CUs of all parties are reduced and can thus be corrected without significantly hurting efficiency in the current iteration. Acknowledgements This research is supported by the National Research Foundation, Singapore under its AI Singapore Programme (Award No: AISG2-RP-2020-018). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore. R. H. L. Sim is also supported by the Singapore-MIT Alliance for Research and Technology (SMART) Ph.D. Fellowship. Collaborative Bayesian Optimization with Fair Regret Alvi, A. S., Ru, B., Calliess, J., Roberts, S. J., and Osborne, M. A. Asynchronous batch Bayesian optimisation with improved local penalisation. In Proc. ICML, 2019. Anguita, D., Ghio, A., Oneto, L., Parra, X., and Reyes Ortiz, J. L. A public domain dataset for human activity recognition using smartphones. In Proc. ESANN, 2013. Azimi, J., Fern, A., and Fern, X. Z. Batch Bayesian optimization via simulation matching. In Proc. Neur IPS, pp. 109 117, 2010. Busa-Fekete, R., Sz or enyi, B., Weng, P., and Mannor, S. Multi-objective bandits: Optimizing the generalized Gini index. In Proc. ICML, pp. 625 634, 2017. Caldas, S., Duddu, S. M. K., Wu, P., Li, T., Koneˇcn y, J., Mc Mahan, H. B., Smith, V., and Talwalkar, A. LEAF: A benchmark for federated settings. ar Xiv:1812.01097, 2018. Chalkiadakis, G., Elkind, E., and Wooldridge, M. Computational aspects of cooperative game theory. In Brachman, R. J., Cohen, W. W., and Dietterich, T. G. (eds.), Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2011. Chen, J., Low, K. H., and Tan, C. K.-Y. Gaussian processbased decentralized data fusion and active sensing for mobility-on-demand system. In Proc. RSS, 2013. Chen, J., Low, K. H., Jaillet, P., and Yao, Y. Gaussian process decentralized data fusion and active sensing for spatiotemporal traffic modeling and prediction in mobilityon-demand systems. IEEE Trans. Autom. Sci. Eng., 12: 901 921, 2015. Cheng, F., Ma, Y., Uzzi, B., and Loscalzo, J. Importance of scientific collaboration in contemporary drug discovery and development: A detailed network analysis. BMC Biology, 18(138), 2020. Chevalier, C. and Ginsbourger, D. Fast computation of the multi-points expected improvement with applications in batch selection. In Proc. LION, pp. 59 69, 2013. Contal, E., Buffoni, D., Robicquet, A., and Vayatis, N. Parallel Gaussian process optimization with upper confidence bound and pure exploration. In Proc. ECML/PKDD, pp. 225 240, 2013. Dai, Z., Yu, H., Low, B. K. H., and Jaillet, P. Bayesian optimization meets Bayesian optimal stopping. In Proc. ICML, pp. 1496 1506, 2019. Dai, Z., Chen, Y., Low, B. K. H., Jaillet, P., and Ho, T.-H. R2B2: Recursive reasoning-based Bayesian optimization for no-regret learning in games. In Proc. ICML, 2020a. Dai, Z., Low, B. K. H., and Jaillet, P. Federated Bayesian optimization via Thompson sampling. In Proc. Neur IPS, pp. 9687 9699, 2020b. Dalton, H. The measurement of the inequality of incomes. The Economic Journal, 30(119):348 361, 1920. Daxberger, E. A. and Low, B. K. H. Distributed batch Gaussian process optimization. In Proc. ICML, pp. 951 960, 2017. Desautels, T., Krause, A., and Burdick, J. W. Parallelizing exploration-exploitation tradeoffs in Gaussian process bandit optimization. JMLR, 15:4053 4103, 2014. Endriss, U. Lecture notes on fair division. ar Xiv:1806.04234, 2018. Fitch, D. Opinion: The value of collaboration in Ag technology, 2018. URL https://www.precisionag. com/digital-farming/. Frazier, P., Kempe, D., Kleinberg, J., and Kleinberg, R. Incentivizing exploration. In Proc. EC, pp. 5 22, 2014. Gonz alez, J., Dai, Z., Hennig, P., and Lawrence, N. D. Batch Bayesian optimization via local penalization. In Proc. AISTATS, 2016. Griffiths, R. and Hern andez-Lobato, J. M. Constrained Bayesian optimization for automatic chemical design using variational autoencoders. Chemical Science, 11(2): 577 586, 2020. Hoang, T. N., Hoang, Q. M., and Low, K. H. A unifying framework of anytime sparse Gaussian process regression models with stochastic variational inference for big data. In Proc. ICML, pp. 569 578, 2015. Hoang, T. N., Hoang, Q. M., and Low, K. H. A distributed variational inference framework for unifying parallel sparse Gaussian process regression models. In Proc. ICML, pp. 382 391, 2016. Hoang, T. N., Hoang, Q. M., Ouyang, R., and Low, K. H. Decentralized high-dimensional Bayesian optimization with factor graphs. In Proc. AAAI, pp. 3231 3238, 2018. Hossain, S., Micha, E., and Shah, N. Fair algorithms for multi-agent multi-armed bandits. ar Xiv:2007.06699, 2020. Kandasamy, K., Schneider, J., and P oczos, B. High dimensional Bayesian optimisation and bandits via additive models. In Proc. ICML, pp. 295 304, 2015. Kandasamy, K., Dasarathy, G., Oliva, J., Schneider, J., and P oczos, B. Gaussian process optimisation with multifidelity evaluations. In Proc. Neur IPS, 2016. Collaborative Bayesian Optimization with Fair Regret Kharkovskii, D., Ling, C. K., and Low, K. H. Nonmyopic Gaussian process optimization with macro-actions. In Proc. AISTATS, pp. 4593 4604, 2020. Ling, C. K., Low, K. H., and Jaillet, P. Gaussian process planning with Lipschitz continuous reward functions: Towards unifying Bayesian optimization, active learning, and beyond. In Proc. AAAI, pp. 1860 1866, 2016. Liu, Y., Radanovic, G., Dimitrakakis, C., Mandal, D., and Parkes, D. Calibrated fairness in bandits. In Proc. KDD Workshop on Fairness, Accountability, and Transparency in Machine Learning, 2017. Mansour, Y., Slivkins, A., and Syrgkanis, V. Bayesian incentive-compatible bandit exploration. Operations Research, 68(4):1132 1161, 2020. Patil, V., Ghalme, G., Nair, V., and Narahari, Y. Achieving fairness in the stochastic multi-armed bandit problem. In Proc. AAAI, pp. 5379 5386, 2020. Perrone, V., Donini, M., Kenthapadi, K., and Archambeau, C. Fair Bayesian optimization. ar Xiv:2006.05109, 2020. Pigou, A. C. Wealth and Welfare. Macmillan and Co., London, 1912. Rasmussen, C. E. and Williams, C. K. I. Gaussian Processes for Machine Learning. MIT Press, 2006. Shah, A. and Ghahramani, Z. Parallel predictive entropy search for batch global optimization of expensive objective functions. In Proc. Neur IPS, pp. 3312 3320, 2015. Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and de Freitas, N. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104 (1):148 175, 2015. Sim, R. H. L., Zhang, Y., Chan, M. C., and Low, B. K. H. Collaborative machine learning with incentiveaware model rewards. In Proc. ICML, pp. 8927 8936, 2020. Srinivas, N., Krause, A., Kakade, S., and Seeger, M. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proc. ICML, pp. 1015 1022, 2010. Weymark, J. A. Generalized Gini inequality indices. Mathematical Social Sciences, 1(4):409 430, 1981. Wu, J. and Frazier, P. The parallel knowledge gradient method for batch Bayesian optimization. In Proc. Neur IPS, pp. 3126 3134, 2016. Zhang, Y., Dai, Z., and Low, B. K. H. Bayesian optimization with binary auxiliary information. In Proc. UAI, pp. 1222 1232, 2019. Zhang, Y., Apley, D. W., and Chen, W. Bayesian optimization for materials design with mixed quantitative and qualitative variables. Sci. Rep., 10(4924), 2020.