# bayesian_optimizationbased_combinatorial_assignment__6cd10806.pdf Bayesian Optimization-Based Combinatorial Assignment* Jakob Weissteiner1,3 , Jakob Heiss2,3 , Julien Siems1 and Sven Seuken1,3 1University of Zurich 2ETH Zurich 3ETH AI Center weissteiner@ifi.uzh.ch, jakob.heiss@math.ethz.ch, juliensiems@gmail.com, seuken@ifi.uzh.ch We study the combinatorial assignment domain, which includes combinatorial auctions and course allocation. The main challenge in this domain is that the bundle space grows exponentially in the number of items. To address this, several papers have recently proposed machine learning-based preference elicitation algorithms that aim to elicit only the most important information from agents. However, the main shortcoming of this prior work is that it does not model a mechanism s uncertainty over values for not yet elicited bundles. In this paper, we address this shortcoming by presenting a Bayesian optimization-based combinatorial assignment (BOCA) mechanism. Our key technical contribution is to integrate a method for capturing model uncertainty into an iterative combinatorial auction mechanism. Concretely, we design a new method for estimating an upper uncertainty bound that can be used to define an acquisition function to determine the next query to the agents. This enables the mechanism to properly explore (and not just exploit) the bundle space during its preference elicitation phase. We run computational experiments in several spectrum auction domains to evaluate BOCA s performance. Our results show that BOCA achieves higher allocative efficiency than state-of-the-art approaches. 1 Introduction Many economic problems require finding an efficient combinatorial assignment of multiple indivisible items to multiple agents. Popular examples include combinatorial auctions (CAs), combinatorial exchanges (CEs), and combinatorial course allocation. In CAs, heterogeneous items are allocated among a set of bidders, e.g., for the sale of spectrum licenses (Cramton 2013). In CEs, items are allocated among agents who can be sellers and buyers at the same time, e.g., for the reallocation of catch shares (Bichler, Fux, and Goeree 2019). In course allocation, course seats are allocated among students at universities (Budish 2011). In all these domains, agents have preferences over bundles of items. In particular, agents preferences may exhibit complementarities and substitutabilities among items. Copyright 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. *The full paper including appendix is available on ar Xiv via: https://arxiv.org/abs/2208.14698. These authors contributed equally. A mechanism that allows agents to report values for bundles rather than just for individual items can achieve significantly higher efficiency. However, this also implies that agents preferences are exponentially-sized (i.e., for m items there are 2m different bundles), which implies that agents cannot report values for all bundles. Therefore, the key challenge in combinatorial assignment is the design of a preference elicitation (PE) algorithm that is (i) practically feasible w.r.t. elicitation costs and (ii) smart, i.e., it should elicit the information that is most useful for achieving high efficiency. 1.1 Iterative Combinatorial Auctions (ICAs) While the PE challenge is common to all combinatorial assignment problems, it has been studied most intensely in the CA domain (Sandholm and Boutilier 2006). In CAs with general valuations, the amount of communication needed to guarantee full efficiency is exponential in the number of items (Nisan and Segal 2006). Thus, practical CAs cannot provide efficiency guarantees. In practice, iterative combinatorial auctions (ICAs) are therefore employed, where the auctioneer interacts with bidders over rounds, eliciting a limited (and thus practically feasible) amount of information, aiming to find a highly efficient allocation. ICAs are widely used; e.g., for the sale of licenses to build offshore wind farms (Ausubel and Cramton 2011). The provision of spectrum licenses via the combinatorial clock auction (CCA) (Ausubel, Cramton, and Milgrom 2006) has generated more than $20 billion in total revenue (Ausubel and Baranov 2017). Thus, increasing the efficiency of such realworld ICAs by only 1% point translates into monetary gains of hundreds of millions of dollars. 1.2 ML-Powered Preference Elicitation In recent years, researchers have used machine learning (ML) to design smart PE algorithms. Most related to this paper is the work by Brero, Lubin, and Seuken (2018, 2021), who developed the first practical ML-powered ICA that outperforms the CCA. The main idea of their mechanism is two-fold: first, they train a separate support vector regression model to learn each bidder s full value function from a small set of bids; second, they solve an ML-based winner determination problem (WDP) to determine the allocation with the highest predicted social welfare, and they use this allocation to generate the next set of queries to all bidders. The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23) This process repeats in an iterative fashion until a fixed number of queries has been asked. Thus, their ML-powered ICA can be interpreted as a form of combinatorial Bayesian optimization (BO) (see Appendix C). In several follow-up papers, this work has been extended by developing more sophisticated ML methods for this problem. Weissteiner and Seuken (2020) integrated neural networks (NN) in their ICA and further increased efficiency. Weissteiner et al. (2022b) used Fourier transforms (FTs) to leverage different notions of sparsity of value functions. Finally, Weissteiner et al. (2022a) achieved state-of-the-art (SOTA) performance using monotone-value NNs (MVNNs), which incorporate important prior domain knowledge. The main shortcoming of this prior work is that all of these approaches are myopic in the sense that the resulting mechanisms simply query the allocation with the highest predicted welfare. In particular, the mechanisms do not have any model of uncertainty over bidders values for not yet elicited bundles, although handling uncertainty in a principled manner is one of the key requirements of a smart PE algorithm (Bonilla, Guo, and Sanner 2010). Thus, the mechanisms cannot properly control the exploration-exploitation trade-off inherent to BO. For ML-based ICAs, this means that the mechanisms may get stuck in local minima, repeatedly querying one part of the allocation space while not exploring other, potentially more efficient allocations. 1.3 Our Contributions In this paper, we address this main shortcoming of prior work and show how to integrate a notion of model uncertainty (i.e., epistemic uncertainty) over agents preferences into iterative combinatorial assignment. Concretely, we design a Bayesian optimization-based combinatorial assignment (BOCA)1 mechanism that makes use of model uncertainty in its query generation module. The main technical challenge is to design a new method for estimating an upper uncertainty bound that can be used to define an acquisition function to determine the next query. For this we combine MVNNs (Weissteiner et al. 2022a) with neural optimizationbased model uncertainty (NOMU) (Heiss et al. 2022), a recently introduced method to estimate model uncertainty for NNs. In detail, we make the following contributions: 1. We present a modified NOMU algorithm (Section 3.1), tailored to CAs, exploiting monotonicity of agents preferences and the discrete (finite) nature of this setting. 2. We show that generic parameter initialization for monotone NNs can dramatically fail and propose a new initialization method for MVNNs based on uniform mixture distributions (Section 3.2). 3. We present a more succinct mixed integer linear program for MVNNs to solve the ML-based WDP (Section 3.3). 4. We experimentally show that BOCA outperforms prior approaches in terms of efficiency (Section 4). Although our contribution applies to any combinatorial assignment setting, we focus on CAs to simplify the notation 1The acronym BOCA has also been used for a different method, namely for Bayesian optimisation with continuous approximations by Kandasamy et al. (2017). and because there exist well-studied preference generators for CAs that we use for our experiments. Our source code is publicly available on Git Hub via: https://github.com/marketdesignresearch/BOCA. 1.4 Related Work on Model Uncertainty Estimating model uncertainty for NNs is an active area of research in AI and ML, with a plethora of new methods proposed every year. Classic methods can be broadly categorized into (i) ensemble methods: training multiple different NNs to estimate model uncertainty (Gal and Ghahramani 2016; Lakshminarayanan, Pritzel, and Blundell 2017; Wenzel et al. 2020) and (ii) Bayesian NNs (BNNs): assuming a prior distribution over parameters and then estimating model uncertainty by approximating the intractable posterior (Graves 2011; Blundell et al. 2015; Hern andez-Lobato and Adams 2015; Ober and Rasmussen 2019). However, for ML-based iterative combinatorial assignment, a key requirement is to be able to efficiently solve the ML-based WDP based on these uncertainty estimates. As there is no known computationally tractable method to perform combinatorial optimization over ensembles or BNNs, we cannot use these approaches for ML-based ICAs. In contrast, NOMU (Heiss et al. 2022) enables the computationally efficient optimization over its uncertainty predictions, which is why we use it as a building block for BOCA. 2 Preliminaries In this section, we present our formal model (Section 2.1), review the ML-based ICA by Brero, Lubin, and Seuken (2021) (Section 2.2), briefly review Bayesian optimization (BO) (Section 2.3), and review monotone-value neural networks (MVNNs) by Weissteiner et al. (2022a) (Section 2.4) as well as neural optimization-based model uncertainty (NOMU) by Heiss et al. (2022) (Section 2.5). 2.1 Formal Model for ICAs We consider a CA with n bidders and m indivisible items. Let N = {1, . . . , n} and M = {1, . . . , m} denote the set of bidders and items. We denote by x X = {0, 1}m a bundle of items represented as an indicator vector, where xj = 1 iff item j M is contained in x. Bidders true preferences over bundles are represented by their (private) value functions vi : X R+, i N, i.e., vi(x) represents bidder i s true value for bundle x X. By a = (a1, . . . , an) X n we denote an allocation of bundles to bidders, where ai is the bundle bidder i obtains. We denote the set of feasible allocations by F = a X n : P i N aij 1, j M . We let p Rn + denote the bidders payments. We assume that bidders have quasilinear utility functions ui of the form ui(a, p) = vi(ai) pi. This implies that the (true) social welfare V (a) of an allocation a is equal to the sum of all bidders values P i N vi(ai). We let a argmaxa F V (a) denote a social-welfare maximizing, i.e., efficient, allocation. The efficiency of any allocation a F is determined as V (a)/V (a ). An ICA mechanism defines how the bidders interact with the auctioneer and how the allocation and payments are determined. We denote a bidder s (possibly untruthful) reported value function by ˆvi : X R+. In this paper, we consider ICAs that ask bidders to iteratively report their values ˆvi(x) for bundles x selected by the mechanism. A finite set of reported bundle-value pairs of bidder i is denoted as Ri = x(l), ˆvi(x(l)) , x(l) X. Let R = (R1, . . . , Rn) be the tuple of reported bundle-value pairs obtained from all bidders. We define the reported social welfare of an allocation a given R as b V (a|R) := P i N: (ai,ˆvi(ai)) Ri ˆvi(ai), where (ai, ˆvi(ai)) Ri ensures that only values for reported bundles contribute. The ICA s optimal allocation a R F and payments p(R) Rn + are computed based on the elicited reports R only. More formally, a R F given reports R is defined as a R argmax a F b V (a|R). (1) As the auctioneer can only query a limited number of bundles |Ri| Qmax (e.g., Qmax = 100), an ICA needs a practically feasible and smart PE algorithm. 2.2 A Machine Learning-Powered ICA We now provide a high-level review of the machine learning-powered combinatorial auction (MLCA) by Brero, Lubin, and Seuken (2021) (please see Appendix A for further details). MLCA proceeds in rounds until a maximum number of value queries per bidder Qmax is reached. In each round, for every bidder i, an ML model Ai is trained on the bidder s reports Ri to learn an approximation of bidders value functions. Next, MLCA generates new value queries by computing the allocation with the highest predicted social welfare. Concretely, it computes qnew = (qnew i )n i=1 with qnew i X \ Ri by solving an ML-based WDP: qnew argmax a F i N Ai(ai) (2) The idea is the following: if the Ai s are good surrogate models of the bidders value functions, then the efficiency of qnew is likely to be high as well. Thus, in each round, bidders are providing value reports on bundles that are guaranteed to fit into a feasible allocation and that together are predicted to have high social welfare. Additionally, bidders are also allowed to submit push-bids, enabling them to submit information to the auctioneer that they deem useful, even if they are not explicitly queried about it. At the end of each round, MLCA receives reports Rnew from all bidders for the newly generated queries qnew, and updates the overall elicited reports R. When Qmax is reached, MLCA computes an allocation a R that maximizes the reported social welfare (Equation (1)) and determines VCG payments p(R) based on all reports (see Appendix Definition B.1). Remark 1 (IR, No-Deficit, and Incentives of MLCA). Brero, Lubin, and Seuken (2021) showed that MLCA satisfies individual rationality (IR) and no-deficit, with any ML algorithm. They also studied MLCA s incentive properties; this is important, since manipulations may lower efficiency. Like all deployed ICAs (including the CCA), MLCA is not strategyproof. However, they argued that it has good incentives in practice; and given two additional assumptions, bidding truthfully is an ex-post Nash equilibrium. We present a detailed summary of their incentive analysis in Appendix B. 2.3 Bayesian Optimization Background In this section, we briefly review Bayesian optimization (BO). BO refers to a class of machine learning-based gradient-free optimization methods, which, for a given black-box objective function f : X R, aim to solve max x X f(x) (3) in an iterative manner. Specifically, given a budget of T queries (i.e., function evaluations of f), a BO algorithm generates queries {x(1), . . . , x(T )} with the aim that max x {x(1),...,x(T )} f(x) max x X f(x). (4) In each BO step t, the algorithm selects a new input point x(t) X and observes a (potentially noisy) output y(t) = f(x(t)) + ε(t), (5) where ε(t) is typically assumed to be i.i.d. Gaussian, i.e., ε(t) N(0, σ2).2 The BO algorithm s decision rule for selecting the query x(t) is based on 1. A probabilistic model representing an (approximate) posterior distribution over f (e.g., Gaussian processes, NOMU, ensembles, BNNs, etc.). 2. An acquisition function A : X R that uses this probabilistic model to determine the next query x(t) argmaxx X A(x) by properly trading off exploration and exploitation. See Appendix C.3 for popular examples of acquisition functions including: Upper uncertainty bound (u UB) (aka upper confidence bound (UCB)) (Srinivas et al. 2012) Expected improvement (Frazier 2018, Section 4.1) Thompson sampling (Chapelle and Li 2011) Remark 2. MLCA (Section 2.2) can be seen as a combinatorial BO algorithm with acquisition function A(a) := P i N Ai(ai) (see Appendix C for a discussion). 2.4 MVNNs: Monotone-Value Neural Networks MVNNs (Weissteiner et al. 2022a) are a new class of NNs specifically designed to represent monotone combinatorial valuations. First, we reprint the definition of MVNNs and then discuss their desirable properties. Definition 1 (MVNN, Weissteiner et al. (2022a)). An MVNN Mθ i : X R+ for bidder i N is defined as Mθ i (x) := W i,Kiϕ0,ti,Ki 1 . . . ϕ0,ti,1(W i,1x + bi,1) . . . (6) Ki+1 N is the number of layers (Ki 1 hidden layers), 2In this paper, we assume that σ2 = 0. {ϕ0,ti,k}Ki 1 k=1 are the MVNN-specific activation functions with cutoff ti,k > 0, called bounded Re LU (b Re LU): ϕ0,ti,k( ) := min(ti,k, max(0, )) (7) W i := (W i,k)Ki k=1 with W i,k 0 and bi := (bi,k)Ki 1 k=1 with bi,k 0 are the non-negative weights and nonpositive biases of dimensions di,k di,k 1 and di,k, whose parameters are stored in θ = (W i, bi). MVNNs are particularly well suited for the design of combinatorial assignment mechanism for two reasons. First, MVNNs are universal in the set of monotone and normalized value functions (Weissteiner et al. 2022a, Theorem 1), i.e., any ˆvi : X R+ that satisfies the following two properties can be represented exactly as an MVNN Mθ i : 1. Monotonicity (M) ( additional items increase value ): For A, B 2M: if A B it holds that ˆvi(A) ˆvi(B) 2. Normalization (N) ( no value for empty bundle ): ˆvi( ) = ˆvi((0, . . . , 0)) := 0, Second, Weissteiner et al. (2022a) showed that an MVNNbased WDP, i.e., argmax a F P i N Mθ i (ai), can be succinctly encoded as a MILP, which is key for the design of MVNNbased iterative combinatorial assignment mechanisms. Finally, Weissteiner et al. (2022a) experimentally showed that using MVNNs as Ai in MLCA leads to SOTA performance. 2.5 NOMU Recently, Heiss et al. (2022) introduced a novel method to estimate model uncertainty for NNs: neural optimizationbased model uncertainty (NOMU). In contrast to other methods (e.g., ensembles), NOMU represents an upper uncertainty bound (u UB) as a single and MILP-formalizable NN. Thus, NOMU is particularly well suited for iterative combinatorial assignment, where u UB-based winner determination problems (WDPs) need to be solved hundreds of times to generate new informative queries. This, together with NOMU s strong performance in noiseless BO, is the reason why we build on it and define a modified NOMU algorithm tailored to iterative combinatorial assignment (Section 3.1). 3 Bayesian Optimization-Based ICA In this section, we describe the design of our Bayesian optimization-based combinatorial assignment (BOCA) mechanism. While the design is general, we here present it for the CA setting, leading to a BO-based ICA. Recall that MLCA generates new value queries by solving the MLbased WDP qnew argmax a F i N Ai(ai) (see Section 2.2). For the design of BOCA, we integrate a proper notion of uncertainty into MLCA by using a bidder-specific upper uncertainty bound (u UB), taking the role of the ML model Ai, to define our acquisition function A(a) := P i N Ai(ai). To define our u UB and make it amenable to MLCA, we proceed in three steps: First, we combine MVNNs with a modified NOMU algorithm that is tailored to the characteristics of combinatorial assignment (Section 3.1). Second, we highlight the importance of proper parameter initialization for MVNNs and propose a more robust method (Section 3.2). Third, we present a more succinct MILP for MVNNs (Section 3.3). In the remainder of the paper, we make the following assumption: Assumption 1. For all agents i N, the true and reported value functions vi and ˆvi fulfill the Monotonicity (M) and Normalization (N) property (see Section 2.4). 3.1 Model Uncertainty for Monotone NNs We propose a modified NOMU architecture and loss that is specifically tailored to combinatorial assignment. Concretely, our algorithm is based on the following two key characteristics of combinatorial assignment: (i) since agents value functions are monotonically increasing, the u UBs need to be monotonically increasing too, and (ii) due to the (finite) discrete input space, one can derive a closed-form expression of the 100%-u UB as an MVNN. Before we present our modified NOMU architecture and loss, we introduce the MVNN-based 100%-u UB. Let H denote a hypothesis class of functions f : X R for some input space X and let HDtrain := {f H : f(x(l)) = y(l), l = 1, . . . , ntrain} denote the set of all functions from H that fit exactly through training points Dtrain = x(l), f(x(l)) ntrain Definition 2 (100%-u UB). For a hypothesis class H and a training set Dtrain, we define the 100%-u UB as f 100%-u UB(x) := supf HDtrain f(x) for every x X. In the following, let V := {ˆv : X R+| satisfy (N) and (M)} (8) denote the set of all value functions that satisfy the normalization and monotonicity property. Next, we define the 100%-u UB. In Theorem 1, we show that for H = V the 100%-u UB can be explicitly represented as an MVNN. Theorem 1 (MVNN-based 100%-u UB). Let ((1, . . . , 1), ˆvi(1, . . . , 1)) Dtrain. Then for H = V it holds that f 100%-u UB(x) = maxf VDtrain f(x) for all x X and f 100%-u UB VDtrain can be represented as a two hidden layer MVNN with ntrain neurons per layer, which we denote as M100%-u UB i going forward.3 Proof. The proof for Theorem 1 is provided in Appendix D.1. It follows a similar idea as the universality proof in (Weissteiner et al. 2022a, Theorem 1). In particular, Equation (27) in Appendix D.1 provides the closed-form expression of f 100%-u UB as MVNN M100%-u UB i . Using the MVNN-based 100%-u UB M100%-u UB i , we can now define our modified NOMU architecture and loss. The Architecture. Towards defining the architecture, we first observe that if the true function is monotonically increasing, the corresponding u UB needs to be monotonically increasing as well (Proposition 1 and 2 in Appendix D.2). 3Note that M100%-u UB i ( ) depends on a training set Dtrain, but we omit this dependency in our notation to improve readability. Mu UB i (x) Mmean i (x) Mean-network u UB-network Figure 1: MNOMU i : a modification of NOMU s original architecture for the combinatorial assignment domain. Given that bidders value functions are monotone (Assumption 1), this implies that our u UB must also be monotonically increasing. Thus, instead of the original NOMU architecture that outputs the (raw) uncertainty (i.e., an estimate of the posterior standard deviation) which is not monotone, we can modify NOMU s architecture and directly output the monotone u UB. Given this, we propose the following architecture MNOMU i to estimate the u UB for bidder i N. MNOMU i consists of two sub-MVNNs with two outputs: the mean prediction Mmean i : X R and the estimated u UB Mu UB i : X R. In Figure 1, we provide a schematic representation of MNOMU i (see Appendix D.2 for details). The Loss. Next, we formulate a new NOMU loss function Lπ tailored to combinatorial assignment. Since we have a closed-form expression of the 100%-u UB as MVNN M100%-u UB i (Theorem 1), we are able to enforce that Mmean i Mu UB i M100%-u UB i via the design of our new loss function. Let Mmean i be a trained mean-MVNN with a standard loss (e.g., MAE and L2-regularization). Using Mmean i and M100%-u UB i , we then only train the parameters θ of Mu UB i with loss Lπ and L2-regularization parameter λ > 0, i.e., minimizing Lπ(Mu UB i ) + λ θ 2 2 via gradient descent. In particular, the parameters of M100%-u UB i and Mmean i are not influenced by the training of Mu UB i (see Appendix D.3 for details on the loss and training procedure). Definition 3 (NOMU Loss Tailored to Combinatorial Assignment). Let π = (πsqr, πexp, cexp, π, π) R5 + be a tuple of hyperparameters and let s(Mu UB i , x) := min{Mu UB i (x), M100%-u UB i (x)} Mmean i (x) for all x X. For a training set Dtrain, Lπ is defined as Lπ(Mu UB i ) := πsqr l=1 Lβ 1 Mu UB i (x(l)), y(l) (9a) [0,1]m g cexps(Mu UB i , x) dx (9b) + πexpcexpπ Z [0,1]m Lβ 1 (Mu UB i (x) M100%-u UB i (x))+ dx + πexpcexpπ [0,1]m Lβ 1 (Mmean i (x) Mu UB i (x))+ dx , where Lβ 1 is the smooth L1-loss with threshold β (see Appendix Definition D.1), ( )+ the positive part, and g := 1 + ELU4 is convex monotonically increasing with ELU being the exponential linear unit (see Appendix Definition D.2). The interpretations of the four terms are as follows: (9a) enforces that Mu UB i fits through the training data. (9b) pushes Mu UB i up as long as it is below the 100%-u UB M100%-u UB i . This force gets weaker the further Mu UB i is above the mean Mmean i (especially if cexp is large). πexp controls the overall strength of (9b) and cexp controls how fast this force increases when Mu UB i Mmean i . Thus, increasing πexp increases the u UB and increasing cexp increases the u UBs in regions where it is close to Mmean i . Weakening (9b) (i.e., πexpcexp 0) leads to Mu UB i Mmean i . Strengthening (9b) by increasing πexpcexp in relation to regularization5 leads to Mu UB i M100%-u UB i . (9c) enforces that Mu UB i M100%-u UB i . The strength of this term is determined by π (πexpcexp), where π is the (9c)- specific hyperparameter and πexpcexp adjusts the strength of (9c) to (9b). (9d) enforces Mu UB i Mmean i . The interpretation of π and πexpcexp is analogous to (9c). As in (Heiss et al. 2022), in the implementation of Lπ, we approximate Equations (9b) to (9d) via Monte Carlo integration using additional, artificial input points Dart := x(l) nart l=1 i.i.d. Unif([0, 1]m). Visualization of the u UB. In Figure 2, we present a visualization of the output of MNOMU i (i.e., Mmean i and Mu UB i ) and M100%-u UB i for the national bidder in the LSVM domain of the spectrum auction test suite (SATS) (Weiss, Lubin, and Seuken 2017). In noiseless regression, uncertainty should vanish at observed training points, but (model) uncertainty should remain about value predictions for bundles that are very different from the bundles observed in training. Figure 2 shows that our u UB Mu UB i nicely fulfills this. Moreover, we have shown in Appendix D.2 that Mu UB i is monotonically increasing, since we assume that value functions fulfill the monotonicity property. This implies that once we observe a value for the full bundle, we obtain a globally bounded 100%-u UB, i.e., see M100%-u UB i in Figure 2. Furthermore, we see that M100%-u UB i jumps to a high value when only a single item is added to an already queried bundle, but then often stays constant (e.g., |x| = 12, . . . , 18 in Figure 2). Thus, using such a 100%-u UB in our acquisition function, BOCA would only add a single item to an already queried bundle to have more items left for the other bidders instead of properly exploring the bundle space. Our u UB Mu UB i circumvents this via implicit and explicit regularization and yields a useful u UB. 4In our notation, g( ) is the analog of the function e( ) used in the original NOMU loss in (Heiss et al. 2022). 5Regularization can be early stopping or a small number of neurons (implicit) or L2-regularization on the parameters (explicit). 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Number of Items in Bundle |x| := Pm i=1 xi [0,1]-scaled Value Function vi Mean MVNN Mmean i u UB Mu UB i 100%-u UB M100% u UB i Training Points Test Points Figure 2: Mmean i , Mu UB i and M100%-u UB i along an increasing 1D subset-path (i.e., for all bundles x(j), x(k) on the x-axis it holds that for j k : x(j) x(k)). 3.2 Parameter Initialization for MVNNs We now discuss how to properly initialize parameters for MVNNs. Importantly, the MVNN-based u UBs Mu UB i are MVNNs. As we will show next, to achieve the best performance of BOCA (in fact of any MVNN training), an adjusted, non-generic parameter initialization is important. Generic Initialization. For standard NNs, it is most common to use a parameter initialization with zero mean µk := E h W i,k j,l i = 0 and non-zero variance σ2 k := V h W i,k j,l i = 0. Then the mean of each preactivated neuron of the first hidden layer is zero and the variance V h W i,1x i = di,0σ2 1x2, if W i,1 j,l di,0 l=1 are i.i.d., where x2 = 1 di,0 Pdi,0 l=1 x2 l .6 Analogously, one can compute the conditional mean and the conditional variance of a pre-activated neuron in any layer k by replacing x by the output zi,k 1 of the previous layer, i.e., E h W i,kzi,k 1 zi,k 1i = 0 and V h W i,kzi,k 1 zi,k 1i = di,k 1σ2 k(zi,k 1)2 . For σk di,k 1 , the conditional mean and variance do not depend on the layer dimensions di,k, which is why generic initialization methods scale the initial distribution by sk 1 Problem. Unfortunately, this generic initialization approach can dramatically fail for MVNNs: For any non-zero initialization, the non-negativity constraint of the weights implies that the mean µk > 0. This implies that the mean of a pre-activated neuron in the first hidden layer is E h W i,1x i = di,0µ1 x. For a generic scaling sk one would obtain µk 1 di,k 1 and thus the mean 6We assume that the biases bi,k = 0 are all initialized to zero throughout Section 3.2 to keep the notation simpler, while we formulate everything for the general case including random biases in Appendix E and in our code. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Number of Items in Bundle |x| := Pm i=1 xi [0,1]-scaled Value Function vi Mean MVNN Mmean i u UB Mu UB i 100%-u UB M100% u UB i Training Points Test Points Figure 3: In contrast to our proposed initialization (see Figure 2), training fails with generic initialization already for relatively small [64,64]-architectures that were used here. di,0 x of the pre-activated neurons diverges to infinity with a rate of di,0 as di,0 . Analogously, the pre-activated neurons of every layer diverge to infinity as di,k 1 . This is particularly problematic for b Re LUs (as used in MVNNs) as their gradient is zero on [0, ti,k]c. Figure 3 shows that both MVNNs Mmean i and Mu UB i get stuck. This happens because already at initialization, every neuron in the first hidden layer has a preactivation that is larger than ti,1 for every training point. This could be solved by scaling down the initial weights even more, e.g., W i,k j,l Unif[0, 2 di,k 1 ] resulting in µk = 1 di,k 1 . However, since for W i,k j,l Unif[0, 2 di,k 1 ] it holds that σ2 k 1 (di,k 1)2 , this induces a new problem of van- ishing conditional variance V h W i,kzi,k 1 zi,k 1i with a rate of O( 1 di,k 1 ) for wide (i.e., di,k 1 large) MVNNs. Overall, it is impossible to simultaneously solve both problems by just scaling the distribution by a factor sk, because the conditional mean E h W i,kzi,k 1 scales with sk di,k 1 and the conditional variance V h W i,kzi,k 1 zi,k 1i scales with s2 k di,k 1. Thus, for wide MVNNs, one of those two problems (i.e., either diverging expectation or vanishing variance) would persist. Solution. We introduce a new initialization method that solves both problems at the same time. For this, we propose a mixture distribution of two different uniform distributions (see Appendix Definition E.1). For each layer k, we independently sample all weights W i,k jl i.i.d. with probability (1 pk) from Unif[0, Ak], and with probability pk from Unif[0, Bk]. If we choose pk and Ak small enough, we can get arbitrarily small µk while not reducing σk too much. In Appendix E, we provide formulas for how to choose Ak, Bk and pk depending on di,k 1. In Theorem 3 in Appendix E, we prove that, if the parameters are chosen in this way, then the conditional mean and conditional variance neither explode nor vanish with increasing di,k 1 but rather stay con- stant for large di,k 1. Note that, in Figure 2, for Mmean i and Mu UB i , we used our proposed initialization method for suitable Ak, Bk and pk, such that the problem induced by a generic initialization from Figure 3 is resolved. 3.3 Mixed Integer Linear Program (MILP) A key step in ML-powered iterative combinatorial assignment mechanisms is finding the (predicted) social welfaremaximizing allocation, i.e., solving the ML-based WDP. Thus, a key requirement posed on any acquisition function A in such a mechanism is to be able to efficiently solve max a F A(a). Recall that, to define our acquisition func- tion A, we use A(a) = P i N Ai(ai) where the Ai s are bidder-specific upper uncertainty bounds. Thus, the MLbased WDP becomes i N Ai(ai). (10) Weissteiner et al. (2022a) proposed a MILP for MVNNs with Ai := Mθ i to efficiently solve eq. (10). Their MILP was based on a reformulation of the min( , ) and max( , ) in the b Re LU activation min(max( , 0), t). Thus, it required twice the number of binary variables and linear constraints as for a plain Re LU-NN. Since we use an MVNN-based u UB Ai := Mu UB i to define our acquisition function, we could directly use their MILP formulation. However, instead, we propose a new MILP, which is significantly more succinct. For this, let oi,k := W i,kzi,k 1 + bi,k be the pre-activated output and zi,k := ϕ0,ti,k(oi,k) be the output of the kth layer with li,k oi,k ui,k, where the tight lower (upper) bound li,k (ui,k) is derived by forward-propagating the empty (full) bundle (Weissteiner et al. 2022a, Fact 1). In Theorem 2, we state our new MILP (see Appendix F.1 for the proof).7 Theorem 2 (MVNN MILP Tailored to Combinatorial Assignment). Let Ai = Mu UB i be our MVNN-based u UBs. The ML-based WDP (10) can be formulated as the following MILP: max a F,zi,k,αi,k,βi,k i N W i,Kizi,Ki 1 ) s.t. for i N and k {1, . . . , Ki 1} zi,0 = ai (12) zi,k αi,k ti,k (13) zi,k oi,k li,k (1 αi,k) (14) zi,k βi,k ti,k (15) zi,k oi,k + (ti,k u)βi,k (16) αi,k {0, 1}di,k, βi,k {0, 1}di,k (17) Note that for each neuron of Ai = Mu UB i , our new MILP has only 4 linear constraints, i.e., respective components of eqs. (13) to (16), compared to 8 in (Weissteiner et al. 2022a). Moreover, in contrast to the MILP in (Weissteiner et al. 2022a), our MILP does not make use of any big-M constraints, which are known to be numerically unstable. 7All vector inequalities should be understood component-wise. 4 Experiments In this section, we experimentally evaluate the performance of BOCA in CAs. To this end, we equip the MLCA mechanism (see Section 2.2) with our new acquisition function A(a) = P i N Mu UB i (ai). We compare the efficiency of BOCA against the previously proposed MVNN-based and NN-based MLCA from (Weissteiner et al. 2022a) which do not explicitly model the mechanism s uncertainty over values for not yet elicited bundles.8 We use our new parameter initialization method (Section 3.2) for Mu UB i , and we use our new MILP (Theorem 2) for solving the WDPs. Experiment Setup. To generate synthetic CA instances, we use the following three domains from the spectrum auction test suite (SATS) (Weiss, Lubin, and Seuken 2017): LSVM, SRVM, and MRVM (see Appendix G.1 for details).9 SATS gives us access to the true optimal allocation a , which we use to measure the efficiency loss, i.e., 1 V (a R)/V (a ) when eliciting reports R via MLCA. We report efficiency loss (and not revenue), as spectrum auctions are government-run, with a mandate to maximize welfare (Cramton 2013). See Appendix G.6 for a discussion of the corresponding results on revenue. To enable a fair comparison against prior work, for each domain, we use Qinit = 40 initial random queries (including the full bundle for the calculation of M100%-u UB i ) and set the query budget to Qmax = 100 (see Appendix G.8 for results for Qinit = 20). We terminate any mechanism in an intermediate iteration if it already found an allocation with 0% efficiency loss. Hyperparameter Optimization (HPO). We use random search (RS) (Bergstra and Bengio 2012) to optimize the hyperparameters of the mean MVNN Mmean i and of our MVNN-based u UB Mu UB i . The HPO includes the NNarchitecture parameters, training parameters, NOMU parameters, and initialization parameters (see Section 3.2). RS was carried out independently for each bidder type and SATS domain with a budget of 500 configurations, where each configuration was evaluated on 100 SATS instances. For each instance, the MVNNs Mmean i and Mu UB i were trained on uniformly at random chosen bundle-value pairs Dtrain and evaluated on a disjoint test set of different bundlevalue pairs Dtest. To select the winner configuration, we consider as evaluation metric the quantile-loss on the test set and the MAE on the training set, i.e., for each configuration and instance we calculate 1 |Dtest| (x,y) Dtest max{(y Mu UB i (x))q, (Mu UB i (x) y)(1 q)} + MAE(Dtrain), (18) which we then average over all 100 instances. We used four quantile parameters q {0.6, 0.75, 0.9, 0.95} in eq. (18) 8In these methods, uncertainty over not yet elicited bundles is only modeled via the retraining of the (MV)NNs in each round, i.e., the random parameter initialization of the (MV)NNs. This can be seen as simple form of Thompson sampling (see last paragraph in Appendix C). 9We do not use GSVM, as Weissteiner et al. (2022a) already achieved 0% efficiency loss in GSVM via MVNN-based MLCA. EFFICIENCY LOSS IN % T-TEST FOR EFFICIENCY: DOMAIN QMAX BOCA MVNN-MLCA NN-MLCA FT-MLCA RS H0 : µMVNN-MLCA µBOCA H0 : µNN-MLCA µBOCA LSVM 100 0.39 0.30 00.70 0.40 02.91 1.44 01.54 0.65 31.73 2.15 p VAL = 9e 2 p VAL = 3e 4 SRVM 100 0.06 0.02 00.23 0.06 01.13 0.22 00.72 0.16 28.56 1.74 p VAL = 5e 6 p VAL = 2e 13 MRVM 100 7.77 0.34 08.16 0.41 09.05 0.53 10.37 0.57 48.79 1.13 p VAL = 8e 2 p VAL = 2e 5 Table 1: BOCA vs MVNN-MLCA, NN-MLCA, Fourier transform (FT)-MLCA and random search (RS). Shown are averages and a 95% CI on a test set of 50 instances. Winners based on a t-test with significance level of 1% are marked in grey. 40 60 80 100 Number of Queries Efficiency Loss in % NN-MLCA MVNN-MLCA BOCA 40 60 80 100 Number of Queries NN-MLCA MVNN-MLCA BOCA 40 60 80 100 Number of Queries NN-MLCA MVNN-MLCA BOCA Figure 4: Efficiency loss paths (i.e., regret plots) of BOCA compared to the results from Weissteiner et al. (2022a) of MVNNMLCA and NN-MLCA without any notion of uncertainty. Shown are averages with 95% CIs over 50 CA instances. to achieve different levels of exploration (i.e., the resulting u UBs become larger the more we increase q in eq. (18)). This evaluation metric simultaneously measures the quality of the u UB on the test data (via the quantile-loss) as well as the quality of the u UB predictions on the training data (via the MAE). For each quantile q and SATS domain, we then proceed with the winner configuration of Mu UB i and evaluate the efficiency of BOCA on a separate set of 50 instances. Details on hyperparameter ranges and the training procedure are provided in Appendices G.2 and G.3. Results. In Table 1, we show the average efficiency loss of each approach after Qmax = 100 queries (see Appendix G.5 for details). We see that BOCA significantly outperforms MVNN-MLCA (Weissteiner et al. 2022a) in SRVM, and it performs on-par in LSVM and MRVM, with a better average performance. Since MVNNs previously achieved SOTA performance, BOCA also outperforms the other benchmarks (i.e., NN (Weissteiner and Seuken 2020) and FT-MLCA (Weissteiner et al. 2022b)). RS s poor performance highlights the intrinsic difficulty of this task. The amount of exploration needed is domain dependent (e.g., multi-modality of the objective), which explains why the significance of BOCA s improvement varies across domains. However, our results also show that using an u UB (as in BOCA) instead of just a mean prediction (as in MVNN-MLCA) never hurts. Figure 4 shows the efficiency loss path for all domains. We see that the superior (average) performance of Mu UB i does not only hold at the end of the auction (at Qmax = 100), but also for a large range of queries: in LSVM, BOCA is better for [70,100]; in SRVM, BOCA is significantly better for [70,100]; in MRVM, BOCA is better for [50,100]. See Appendix G.6 for results on revenue where BOCA significantly outperforms MVNN-MLCA also for MRVM. In Appendix G.7, we study to what degree BOCA s performance increase is due to (a) our uncertainty model (Section 3.1) versus (b) our new parameter initialization method (Section 3.2). Finally, in Appendix G.8, we provide further experiments for a reduced number of Qinit = 20 initial queries, which lead to similar results as shown in Table 1. 5 Conclusion In this paper, we have proposed a Bayesian optimizationbased combinatorial assignment (BOCA) mechanism. On a conceptual level, our main contribution was the integration of model uncertainty over agents preferences into MLbased preference elicitation. On a technical level, we have designed a new method for estimating an upper uncertainty bound that exploits the monotonicity of agents preferences in the combinatorial assignment domain and the finite nature of this setting. Our experiments have shown that BOCA performs as good or better than the SOTA in terms of efficiency. An interesting direction for future work is the evaluation of BOCA in other combinatorial assignment domains, such as combinatorial exchanges or course allocation (e.g., see (Soumalias et al. 2023)). Finally, it would also be interesting to apply BOCA s conceptual idea in the combinatorial BO settings outside of combinatorial assignment. Acknowledgments We thank the anonymous reviewers for helpful comments. This paper is part of a project that has received funding from the European Research Council (ERC) under the European Union s Horizon 2020 research and innovation program (Grant agreement No. 805542). References Ausubel, L.; and Cramton, P. 2011. Auction design for wind rights. Report to Bureau of Ocean Energy Management, Regulation and Enforcement. Ausubel, L. M.; and Baranov, O. 2017. A practical guide to the combinatorial clock auction. Economic Journal, 127(605): F334 F350. Ausubel, L. M.; Cramton, P.; and Milgrom, P. 2006. The clock-proxy auction: A practical combinatorial auction design. In Cramton, P.; Shoham, Y.; and Steinberg, R., eds., Combinatorial Auctions, 115 138. MIT Press. Bergstra, J.; and Bengio, Y. 2012. Random search for hyperparameter optimization. Journal of machine learning research, 13(2). Bichler, M.; Fux, V.; and Goeree, J. K. 2019. Designing combinatorial exchanges for the reallocation of resource rights. Proceedings of the National Academy of Sciences, 116(3): 786 791. Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; and Wierstra, D. 2015. Weight uncertainty in neural networks. In 32nd International Conference on Machine Learning (ICML). Bonilla, E. V.; Guo, S.; and Sanner, S. 2010. Gaussian process preference elicitation. Advances in neural information processing systems, 23. Brero, G.; Lubin, B.; and Seuken, S. 2018. Combinatorial Auctions via Machine Learning-based Preference Elicitation. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. Brero, G.; Lubin, B.; and Seuken, S. 2021. Machine Learning-powered Iterative Combinatorial Auctions. ar Xiv preprint ar Xiv:1911.08042. Budish, E. 2011. The combinatorial assignment problem: Approximate competitive equilibrium from equal incomes. Journal of Political Economy, 119(6): 1061 1103. Chapelle, O.; and Li, L. 2011. An Empirical Evaluation of Thompson Sampling. In NIPS. Cramton, P. 2013. Spectrum auction design. Review of Industrial Organization, 42(2): 161 190. Frazier, P. I. 2018. A tutorial on Bayesian optimization. ar Xiv preprint ar Xiv:1807.02811. Gal, Y.; and Ghahramani, Z. 2016. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In 33rd International Conference on Machine Learning (ICML), 1050 1059. Graves, A. 2011. Practical variational inference for neural networks. In Advances in neural information processing systems, 2348 2356. Heiss, J. M.; Weissteiner, J.; Wutte, H. S.; Seuken, S.; and Teichmann, J. 2022. NOMU: Neural Optimization-based Model Uncertainty. In Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, 8708 8758. PMLR. Hern andez-Lobato, J. M.; and Adams, R. 2015. Probabilistic backpropagation for scalable learning of bayesian neural networks. In International Conference on Machine Learning, 1861 1869. Kandasamy, K.; Dasarathy, G.; Schneider, J.; and P oczos, B. 2017. Multi-fidelity bayesian optimisation with continuous approximations. In International Conference on Machine Learning, 1799 1808. PMLR. Lakshminarayanan, B.; Pritzel, A.; and Blundell, C. 2017. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in neural information processing systems, 6402 6413. Nisan, N.; and Segal, I. 2006. The communication requirements of efficient allocations and supporting prices. Journal of Economic Theory, 129(1): 192 224. Ober, S. W.; and Rasmussen, C. E. 2019. Benchmarking the neural linear model for regression. ar Xiv preprint ar Xiv:1912.08416. Sandholm, T.; and Boutilier, C. 2006. Preference elicitation in combinatorial auctions. Combinatorial auctions, 10. Soumalias, E.; Zamanlooy, B.; Weissteiner, J.; and Seuken, S. 2023. Machine Learning-powered Course Allocation. ar Xiv preprint ar Xiv:2210.00954. Srinivas, N.; Krause, A.; Kakade, S. M.; and Seeger, M. W. 2012. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting. IEEE Transactions on Information Theory, 58(5): 3250 3265. Weiss, M.; Lubin, B.; and Seuken, S. 2017. Sats: A universal spectrum auction test suite. In Proceedings of the 16th Conference on Autonomous Agents and Multi Agent Systems, 51 59. Weissteiner, J.; Heiss, J.; Siems, J.; and Seuken, S. 2022a. Monotone-Value Neural Networks: Exploiting Preference Monotonicity in Combinatorial Assignment. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, 541 548. International Joint Conferences on Artificial Intelligence Organization. Main Track. Weissteiner, J.; and Seuken, S. 2020. Deep Learning Powered Iterative Combinatorial Auctions. Proceedings of the AAAI Conference on Artificial Intelligence, 34(02): 2284 2293. Weissteiner, J.; Wendler, C.; Seuken, S.; Lubin, B.; and P uschel, M. 2022b. Fourier Analysis-based Iterative Combinatorial Auctions. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI22, 549 556. International Joint Conferences on Artificial Intelligence Organization. Main Track. Wenzel, F.; Snoek, J.; Tran, D.; and Jenatton, R. 2020. Hyperparameter ensembles for robustness and uncertainty quantification. ar Xiv preprint ar Xiv:2006.13570.