# Deep Learning Powered Iterative Combinatorial Auctions

The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)

Jakob Weissteiner, University of Zurich, weissteiner@ifi.uzh.ch
Sven Seuken, University of Zurich, seuken@ifi.uzh.ch

Abstract

In this paper, we study the design of deep learning-powered iterative combinatorial auctions (ICAs). We build on prior work where preference elicitation was done via kernelized support vector regressions (SVRs). However, the SVR-based approach has limitations because it requires solving a machine learning (ML)-based winner determination problem (WDP). With expressive kernels (like Gaussians), the ML-based WDP cannot be solved for large domains. While linear or quadratic kernels have better computational scalability, these kernels have limited expressiveness. In this work, we address these shortcomings by using deep neural networks (DNNs) instead of SVRs. We first show how the DNN-based WDP can be reformulated into a mixed integer program (MIP). Second, we experimentally compare the prediction performance of DNNs against SVRs. Third, we present experimental evaluations in two medium-sized domains which show that even ICAs based on relatively small-sized DNNs lead to higher economic efficiency than ICAs based on kernelized SVRs. Finally, we show that our DNN-powered ICA also scales well to very large CA domains.

1 Introduction

Combinatorial auctions (CAs) are used to allocate multiple heterogeneous items to bidders in domains where these items may be substitutes or complements. Specifically, in a CA, bidders are allowed to submit bids on bundles of items rather than on individual items. CAs are widely used in practice, including for the sale of airport landing and take-off slots (Rassenti, Smith, and Bulfin 1982), in industrial procurement (Bichler et al. 2006), and for the sale of spectrum licenses (Cramton 2013).

One of the main challenges in large CAs is that the bundle space grows exponentially in the number of items. This typically makes it impossible for the bidders to report their full value function, even for medium-sized domains. Thus, careful preference elicitation is needed in CAs. Nisan and Segal (2006) have shown that to achieve full efficiency and support general value functions, exponential communication in the number of items is needed in the worst case. Thus, practical auction designs cannot provide efficiency guarantees in large CA domains. Instead, many recent proposals for CAs have focused on iterative combinatorial auctions (ICAs), where the auctioneer interacts with bidders over multiple rounds, eliciting a limited amount of information, aiming to find a highly efficient allocation.

ICAs have found widespread application in practice. For example, just between 2008 and 2014, the combinatorial clock auction (CCA) (Ausubel, Cramton, and Milgrom 2006) has been used to conduct more than 15 spectrum auctions and has generated more than $20 billion in total revenue (Ausubel and Baranov 2017). Another important application of ICAs are auctions for building offshore wind farms (Ausubel and Cramton 2011). Given the value of the resources allocated in these real-world ICAs, increasing their efficiency by 1-2 percentage points already translates into welfare gains of millions of dollars. Therefore, improving the efficiency of ICAs is an important research challenge.
1.1 Machine Learning and Mechanism Design

Researchers have proposed various ways to further increase the efficiency of CAs by integrating machine learning (ML) methods into the mechanism. This research goes back to Blum et al. (2004) and Lahaie and Parkes (2004), who studied the relationship between computational learning theory and preference elicitation in CAs. More recently, Brero and Lahaie (2018) and Brero, Lahaie, and Seuken (2019) introduced a Bayesian CA where they integrated ML into a CA to achieve faster convergence. In a different strand of research, Dütting et al. (2015; 2017), Narasimhan, Agarwal, and Parkes (2016) and Golowich, Narasimhan, and Parkes (2018) used ML to directly learn a new mechanism (following the automated mechanism design paradigm).

Most related to the present paper is the work by Brero, Lubin, and Seuken (2017; 2018; 2019), who proposed an ML-powered ICA. The core of their auction is an ML-powered preference elicitation algorithm. As part of their algorithm, they used kernelized support vector regressions (SVRs) to learn the nonlinear value functions of bidders. Recently, Brero, Lubin, and Seuken (2019) showed that their ML-based ICA achieves even higher efficiency than the CCA. However, because of runtime complexity issues, Brero, Lubin, and Seuken (2018; 2019) focused on SVRs with linear and quadratic kernels. This leaves room for improvement, since bidders' valuations can have more complex structures than can be captured by linear or quadratic kernels.

1.2 Our Approach Using Deep Neural Networks

In this paper, we propose using DNNs instead of SVRs in ML-powered ICAs. In each round of the auction, we approximate bidders' value functions by DNNs and subsequently solve an optimization problem, a DNN-based winner determination problem (WDP), to determine which query to ask every bidder in the next round. Since our design involves doing this in each round of the auction, a central requirement for the practical implementation of the auction mechanism is to efficiently solve these DNN-based WDPs. Therefore, we show how to reformulate the WDP based on DNNs with rectified linear units (ReLUs) as activation functions into a (linear) mixed integer program (MIP) (Section 4). Our approach is related to a recent line of research that uses MIP formulations to study specific properties of DNNs. For example, Cheng, Nührenberg, and Ruess (2017) studied resilience properties of DNNs. Similarly, Fischetti and Jo (2018) used a MIP formulation for finding adversarial examples in image recognition.

To experimentally evaluate the performance of our DNN-based approach, we use the Spectrum Auction Test Suite (SATS) (Weiss, Lubin, and Seuken 2017) to generate synthetic CA instances (Section 5). We first compare the prediction performance of DNNs against SVRs in the two medium-sized domains GSVM and LSVM. Then we compare the economic efficiency of our DNN-powered ICA against the SVR-powered ICA. In GSVM (a domain perfectly suited for the quadratic kernel), our DNN-powered ICA matches the efficiency of the SVR-powered ICA, while in the more complex LSVM domain, our DNN-powered ICA outperforms the SVR-powered ICA by 1.74 percentage points. Finally, we also demonstrate that our DNN-based approach scales well to a very large domain, by evaluating it in the MRVM domain (with 98 items and 10 bidders). Overall, our results show that, perhaps surprisingly, even small-sized neural networks can be advantageous for the design of ICAs.
2 Preliminaries

We now present our formal model and review the ML-powered ICA by Brero, Lubin, and Seuken (2018).^1

^1 We compare our DNN-powered ICA against the mechanism described in (Brero, Lubin, and Seuken 2018) because, when we wrote this paper, (Brero, Lubin, and Seuken 2019) was not yet available. We slightly adapt the notation and use $B_i$ instead of $\vartheta_i$.

2.1 Iterative Combinatorial Auction

We consider a CA setting with $n$ bidders and $m$ indivisible items. Let $N := \{1, \dots, n\}$ and $M := \{1, \dots, m\}$ denote the set of bidders and items, respectively. We denote by $x \in \mathcal{X} := \{0,1\}^m$ a bundle of items represented as an indicator vector, where $x_j = 1$ iff item $j \in M$ is contained in $x$. Bidders' true preferences over bundles are represented by their (private) value functions $v_i : \{0,1\}^m \to \mathbb{R}_+$, $i \in N$, i.e., $v_i(x)$ represents bidder $i$'s true value for bundle $x$.

Let $v := (v_1, \dots, v_n)$ denote the vector of bidders' value functions. The (possibly untruthful) reported valuations are denoted by $\hat{v}_i$ and $\hat{v}$, respectively. By $a := (a_1, \dots, a_n) \in \mathcal{X}^n$ we denote an allocation of bundles to bidders, where $a_i \in \mathcal{X}$ is the bundle bidder $i$ obtains. An allocation $a$ is feasible if each item is allocated to at most one bidder, i.e., $\forall j \in M : \sum_{i \in N} a_{ij} \le 1$. We denote the set of feasible allocations by $\mathcal{F} := \{a \in \mathcal{X}^n : \sum_{i \in N} a_{ij} \le 1, \; \forall j \in M\}$. Payments are denoted by $p = (p_1, \dots, p_n) \in \mathbb{R}^n$, where $p_i$ is bidder $i$'s payment. Furthermore, we assume that bidders have quasilinear utility functions $u_i(a) := v_i(a_i) - p_i$. The (true) social welfare of an allocation $a$ is defined as $V(a) := \sum_{i \in N} v_i(a_i)$. Let $a^* \in \arg\max_{a \in \mathcal{F}} V(a)$ be a feasible, social-welfare maximizing, i.e., efficient, allocation given true value functions $v$. Then the efficiency of any feasible allocation $a \in \mathcal{F}$ is measured in terms of $a^*$ by $V(a)/V(a^*)$.

An ICA mechanism defines how the bidders interact with the auctioneer, how the final allocation is determined, and how payments are computed. In this paper, we only consider ICAs that ask bidders to iteratively report their valuations $\hat{v}_i(x)$ for particular bundles $x$ selected by the mechanism. A finite set of such reported bundle-value pairs of bidder $i$ is denoted as $B_i := \{(x^{(k)}, \hat{v}_i(x^{(k)}))\}_{k \in \{1, \dots, n_i\}}$, $n_i \in \mathbb{N}$, $x^{(k)} \in \mathcal{X}$, where $n_i$ is the total number of bundle-value pairs reported by bidder $i$. We let $B := (B_1, \dots, B_n)$ denote the tuple of reported bundle-value pairs obtained from all bidders. We define the reported social welfare of an allocation $a$ given $B$ as

$$\hat{V}(a|B) := \sum_{i \in N : (a_i, \hat{v}_i(a_i)) \in B_i} \hat{v}_i(a_i), \quad (1)$$

where the condition $(a_i, \hat{v}_i(a_i)) \in B_i$ ensures that only values for reported bundles contribute to the sum. Finally, the optimal feasible allocation $a^*_B$ given $B$ is defined as

$$a^*_B \in \arg\max_{a \in \mathcal{F}} \hat{V}(a|B). \quad (2)$$

In the ICA mechanisms we consider in this paper, the final outcome is only computed based on the reported values $B$ at termination. Specifically, the mechanism determines a feasible allocation $a^*_B \in \mathcal{F}$ and charges payments $p$. As the auctioneer can generally only ask each bidder $i$ for a limited number of bundle-value pairs $B_i$, the ICA mechanism needs a sophisticated preference elicitation algorithm. This leads to the following preference elicitation problem, where the goal is to find an (approximately) efficient allocation with a limited number of value queries. More formally:

Problem 1 (PREFERENCE ELICITATION IN ICA). Given a cap $c_e$ on the number of value queries in an ICA, elicit from each bidder $i \in N$ a set of reported bundle-value pairs $B_i$ with $|B_i| \le c_e$ such that the resulting efficiency of $a^*_B$ is maximized, i.e.,

$$B \in \arg\max_{B : |B_i| \le c_e} \frac{V(a^*_B)}{V(a^*)}. \quad (3)$$
In practice, a small domain-dependent cap on the number of queries is chosen, e.g., $c_e \le 500$.

2.2 SVR-powered ICA

We now present a brief review of the ML-based ICA introduced by Brero, Lubin, and Seuken (2018). At the core of their auction is an ML-based preference elicitation algorithm, which we reprint here as Algorithm 1.

Algorithm 1: ML-BASED ELICITATION (Brero et al. 2018)
Parameter: Machine learning algorithm $\mathcal{A}$
1: $B^0$ = initial tuple of reported bundle-value pairs at $t = 0$
2: do
3:   $t = t + 1$
4:   Estimation step: $\tilde{V}^t := \mathcal{A}(B^{t-1})$
5:   Optimization step: $a^{(t)} \in \arg\max_{a \in \mathcal{F}} \tilde{V}^t(a)$
6:   for each bidder $i$ do
7:     if $a_i^{(t)} \notin B_i^{t-1}$ then
8:       Query value $\hat{v}_i(a_i^{(t)})$
9:       $B_i^t = B_i^{t-1} \cup \{(a_i^{(t)}, \hat{v}_i(a_i^{(t)}))\}$
10:    else
11:      $B_i^t = B_i^{t-1}$
12:    end
13:  end
14: while $\exists\, i \in N : a_i^{(t)} \notin B_i^{t-1}$
15: Output tuple of reported bundle-value pairs $B^t$

This algorithm is a procedure to determine $B$, i.e., for each bidder $i$ a set of reported bundle-value pairs $B_i$. Note that the algorithm is described in terms of a generic ML algorithm $\mathcal{A}$, which is used in the estimation step (Line 4) to obtain the estimated social welfare function $\tilde{V}^t$ in iteration $t$. In the optimization step (Line 5), an ML-based winner determination problem is then solved to find an allocation $a^{(t)}$ that maximizes $\tilde{V}^t$. Finally, given the allocation $a^{(t)}$ from iteration $t$, each bidder $i$ is asked to report his value for the bundle $a_i^{(t)}$. The algorithm stops when it reaches an allocation $a^{(t)}$ for which all bidders have already reported their values for the corresponding bundles $a_i^{(t)}$.

As the ML algorithm $\mathcal{A}$, Brero, Lubin, and Seuken (2018) used a sum of kernelized SVRs, i.e.,

$$\tilde{V}^t := \sum_{i \in N} \mathrm{SVR}_i. \quad (4)$$

Given a bundle $x$, each $\mathrm{SVR}_i$ computes the predicted value as $\mathrm{SVR}_i(x) = w_i \cdot \phi(x)$, where the weights $w_i$ are determined through training on the reported bundle-value pairs $B_i^{t-1}$. Kernelized SVRs are a popular non-linear regression technique, where a linear model is fitted on transformed data. The transformation of bundles $x$ is implicitly conducted by setting a kernel $k(x, x') := \phi(x)^T \phi(x')$ in the dual optimization problem (Smola and Schölkopf 2004).

Brero, Lubin, and Seuken (2018) called their entire auction mechanism the Pseudo Vickrey-Clarke-Groves mechanism (PVM). We reprint it here as Algorithm 2.

Algorithm 2: PVM (Brero et al. 2018)
1: Run Algorithm 1 $n+1$ times: $B^{(-\emptyset)}, B^{(-1)}, \dots, B^{(-n)}$.
2: Determine allocations: $a^{(-\emptyset)}, a^{(-1)}, \dots, a^{(-n)}$, where $a^{(-i)} \in \arg\max_{a \in \mathcal{F}} \hat{V}(a|B^{(-i)})$.
3: Pick $a^{pvm} \in \{a^{(-\emptyset)}, a^{(-1)}, \dots, a^{(-n)}\}$ with maximal $\hat{V}$.
4: Charge each bidder $i$ according to:
$$p_i := \sum_{j \neq i} \hat{v}_j\big(a_j^{(-i)}\big) - \sum_{j \neq i} \hat{v}_j\big(a_j^{pvm}\big). \quad (5)$$

PVM calls the preference elicitation algorithm (Algorithm 1) $n+1$ times: once including all bidders (called the main economy) and $n$ times excluding a different bidder in each run (called the marginal economies). The motivation for this design, which is inspired by the VCG mechanism, is to obtain payments such that the auction aligns bidders' incentives with allocative efficiency. Here, $B^{(-i)}$ denotes the output of Algorithm 1 when excluding bidder $i$ from the set of bidders. For each of the reported bundle-value pairs $B^{(-i)}$ obtained from the $n+1$ runs, PVM calculates a corresponding allocation that maximizes the reported social welfare (Line 2). The final allocation $a^{pvm}$ is determined as the allocation of the $n+1$ runs with the largest reported social welfare (Line 3). Finally, VCG-style payments are calculated (Line 4).

3 Deep Neural Network-powered ICA

In this section, we present the high-level design of our DNN-powered ICA and discuss its advantages compared to the SVR-based design by Brero, Lubin, and Seuken (2018).
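As a reference point for the discussion below, the following minimal Python sketch mirrors the elicitation loop of Algorithm 1 with a generic ML algorithm $\mathcal{A}$. The helper names fit_models, solve_ml_wdp, and query_value are placeholders of our own for the estimation step, the ML-based WDP solver, and a value query to a bidder; they are not part of the original mechanism description.

```python
def ml_based_elicitation(initial_reports, fit_models, solve_ml_wdp, query_value):
    """Sketch of Algorithm 1: alternate estimation and optimization steps until the
    proposed allocation only contains bundles whose values were already reported."""
    # reports[i] maps a bundle (tuple of 0/1 indicators) to bidder i's reported value
    reports = {i: dict(pairs) for i, pairs in initial_reports.items()}   # B^0
    while True:
        models = fit_models(reports)            # estimation step (Line 4)
        allocation = solve_ml_wdp(models)       # optimization step (Line 5): a^(t)
        queried = False
        for i, bundle in allocation.items():
            if bundle not in reports[i]:        # bidder i has not yet valued a_i^(t)
                reports[i][bundle] = query_value(i, bundle)   # Lines 8-9
                queried = True
        if not queried:                         # stopping criterion (Line 14)
            return reports                      # output B^t (Line 15)
```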
Observe that the choice of the ML algorithm $\mathcal{A}$ affects Algorithm 1 in two ways: first, in the estimation step (Line 4), $\mathcal{A}$ determines how well we can predict bidders' valuations; second, in the optimization step (Line 5), it determines the complexity of the ML-based WDP. Thus, our situation is different from standard supervised learning because of the added optimization step. In particular, when choosing $\mathcal{A}$, we must also ensure that we obtain a practically feasible ML-based WDP. Given that we have to solve the optimization step hundreds of times throughout an auction, in practice, we must impose a time limit on this step. In our experiments (Section 5), we follow Brero, Lubin, and Seuken (2018) and impose a 1-hour time limit on this step.

To make the optimization step feasible, Brero, Lubin, and Seuken (2018) used SVRs with quadratic kernels, for which the ML-based WDP is a quadratic integer program (QIP) that is still practically solvable within a 1-hour time limit for most settings. However, note that a quadratic kernel, while more expressive than a linear kernel, can still at most model two-way interactions between the items. To this end, Brero, Lubin, and Seuken (2017; 2019) also evaluated more expressive kernels (Gaussian and exponential). Even though these kernels had good prediction performance, the corresponding ML-based WDPs were so complex that they always timed out and had a large optimization gap, thus leading to worse economic efficiency than the quadratic kernel. However, using SVRs with quadratic kernels leaves room for improvement, since bidders' valuations can be more complex than can be captured by quadratic kernels.

In this work, we show how these shortcomings can be addressed by using DNNs instead of SVRs in the estimation and optimization steps of Algorithm 1. DNNs are a concatenation of affine and non-linear mappings (see Figure 1). They consist of several layers, which are themselves composed of multiple nodes. Between each of the layers an affine transformation is applied, which is followed by a non-linear mapping called the activation function. One advantage of DNNs compared to (nonlinear) kernelized SVRs is that, for any number of layers and nodes, we always obtain a (linear) MIP for the DNN-based WDP whose size only grows linearly in the number of bidders and items (as we will show in Section 4). The key insight for this is to use rectified linear units (ReLUs) as activation functions. Furthermore, in contrast to SVRs, DNNs do not use predefined feature transformations. While with SVRs the choice of a good kernel usually relies on prior domain knowledge, DNNs automatically learn features in the process of training.

Following Brero, Lubin, and Seuken (2018), we decompose the estimated social welfare function in Line 4 of Algorithm 1, $\tilde{V}^t = \mathcal{A}(B^{t-1})$, as follows:

$$\tilde{V}^t := \sum_{i \in N} \tilde{v}_i^t, \quad (6)$$

where $\tilde{v}_i^t$ is an estimate of bidder $i$'s true value function $v_i$ and is trained on the data set $B_i^{t-1}$, i.e., the values queried up to round $t-1$. In this work, for every $i \in N$, we model $\tilde{v}_i^t$ using a fully connected feed-forward DNN $\mathcal{N}_i : \{0,1\}^m \to \mathbb{R}_+$. Consequently, the estimated social welfare function $\tilde{V}^t$ is given as a sum of DNNs, i.e.,

$$\tilde{V}^t := \sum_{i \in N} \mathcal{N}_i. \quad (7)$$

Note that each bidder's value function is modeled as a distinct DNN (with different architectures and parameters) because bidders' preferences are usually highly idiosyncratic.^2

^2 A second reason for this construction is that it prevents bidders from influencing each other's ML models. Please see (Brero, Lubin, and Seuken 2019) for a detailed incentive analysis of PVM.

4 MIP Formulation of the DNN-based Winner Determination Problem

We now present the parameterization of each DNN and show how to reformulate the DNN-based WDP into a MIP.
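Before turning to the optimization step, note that the estimation step (Line 4) amounts to fitting one small feed-forward ReLU network per bidder on its reported bundle-value pairs. The following sketch (using Keras, cf. Footnote 4) shows one way this could look; the function name, the hidden-layer sizes, and the training settings are illustrative assumptions rather than the exact configuration used in our experiments.

```python
from tensorflow import keras

def fit_bidder_dnn(bundles, values, hidden=(32, 32), epochs=300):
    """Fit one value-function DNN N_i on bidder i's reported bundle-value pairs.

    bundles: array of shape (n_i, m) with 0/1 bundle indicators
    values:  array of shape (n_i,) with bidder i's reported values
    """
    m = bundles.shape[1]
    model = keras.Sequential()
    model.add(keras.layers.Dense(hidden[0], activation="relu", input_shape=(m,)))
    for d in hidden[1:]:                                  # further hidden ReLU layers
        model.add(keras.layers.Dense(d, activation="relu"))
    model.add(keras.layers.Dense(1, activation="relu"))   # ReLU output keeps values >= 0
    # ADAM optimizer and MAE loss; learning rate, L2 regularization, and dropout
    # would be tuned per bidder type in a hyperparameter search (Section 5.2).
    model.compile(optimizer="adam", loss="mean_absolute_error")
    model.fit(bundles, values, epochs=epochs, batch_size=len(values), verbose=0)
    return model
```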
In the remainder of this section, we focus on the optimization step (Line 5) of Algorithm 1.

4.1 Setting up the DNN-based WDP

Figure 1 shows a schematic representation of a DNN $\mathcal{N}_i$. We now define the parameters of the DNNs. To simplify the exposition, we consider a fixed iteration step $t$ and no longer highlight the dependency of all variables on $t$.

[Figure 1: Schematic representation of a DNN $\mathcal{N}_i$, with hidden layers of dimensions $d_1^i, d_2^i, \dots, d_{K_i-1}^i$ connected by the affine maps $W^{i,0}(\cdot) + b^{i,0}$, $W^{i,1}(\cdot) + b^{i,1}$, ..., $W^{i,K_i-1}(\cdot) + b^{i,K_i-1}$.]

Each DNN $\mathcal{N}_i$ consists of $K_i - 1$ hidden layers for $K_i \in \mathbb{N}$, with the $k$th hidden layer containing $d_k^i$ hidden nodes, where $k \in \{1, \dots, K_i - 1\}$. As $\mathcal{N}_i$ maps bundles $x \in \{0,1\}^m$ to values, the dimension of the input layer $d_0^i$ is equal to the number of items $m$ (i.e., $d_0^i := m$) and the dimension of the output layer is equal to 1 (i.e., $d_{K_i}^i := 1$). Hence, in total, a single DNN consists of $K_i + 1$ layers. Furthermore, let $\varphi : \mathbb{R} \to \mathbb{R}_+$, $\varphi(s) := \max(0, s)$ denote the ReLU activation function.^3 The affine mappings between the $k$th and the $(k+1)$st layer are parameterized by a matrix $W^{i,k} \in \mathbb{R}^{d_{k+1}^i \times d_k^i}$ and a bias $b^{i,k} \in \mathbb{R}^{d_{k+1}^i}$, $k \in \{0, \dots, K_i - 1\}$.

^3 $\varphi$ acts on vectors componentwise, i.e., for $s \in \mathbb{R}^k$ let $\varphi(s) := (\varphi(s_1), \dots, \varphi(s_k))$.

To estimate the parameters $W^{i,k}$ and $b^{i,k}$ from data (i.e., from the bundle-value pairs $B_i$) we use the ADAM algorithm, which is a popular gradient-based optimization technique (Kingma and Ba 2015).^4 This is done in the estimation step in Line 4 of Algorithm 1. Thus, after the estimation step, $W^{i,k}$ and $b^{i,k}$ are constants. In summary, given estimated parameters $W^i := \{W^{i,k}\}_{0 \le k \le K_i - 1}$ and $b^i := \{b^{i,k}\}_{0 \le k \le K_i - 1}$, each DNN $\mathcal{N}_i(W^i, b^i) : \{0,1\}^m \to \mathbb{R}_+$ represents the following nested function:

$$\mathcal{N}_i(W^i, b^i)(x) = \varphi\Big(W^{i,K_i-1}\,\varphi\big(\dots\varphi(W^{i,0}x + b^{i,0})\dots\big) + b^{i,K_i-1}\Big). \quad (8)$$

^4 For fitting the DNNs in all of our experiments we use Python 3.5.3, Keras 2.2.4 and TensorFlow 1.13.1.

The DNN-based WDP in the optimization step in Line 5 of Algorithm 1 can now be formulated as follows:

$$\max_{a \in \mathcal{X}^n} \;\sum_{i \in N} \mathcal{N}_i\big(W^i, b^i\big)(a_i) \quad \text{(OP1)}$$
$$\text{s.t.} \quad \sum_{i \in N} a_{ij} \le 1, \quad j \in M$$
$$a_{ij} \in \{0,1\}, \quad j \in M,\, i \in N.$$

4.2 The MIP Formulation

In its general form, (OP1) is a nonlinear, non-convex optimization problem and there do not exist practically feasible algorithms that are guaranteed to find a globally optimal solution. Therefore, we now reformulate (OP1) into a MIP.

Consider bidder $i \in N$. For every layer $k \in \{1, \dots, K_i\}$ let $o^{i,k} \in \mathbb{R}_+^{d_k^i}$ denote the output of the $k$th layer, which can be recursively calculated as

$$o^{i,k} = \varphi(W^{i,k-1}o^{i,k-1} + b^{i,k-1}) = \max(0,\, W^{i,k-1}o^{i,k-1} + b^{i,k-1}). \quad (9)$$

For $k \in \{1, \dots, K_i\}$, we introduce $d_k^i$ binary decision variables that determine which node in the corresponding layer is active, represented as a vector $y^{i,k} \in \{0,1\}^{d_k^i}$. We also introduce $2d_k^i$ continuous variables, represented as vectors $z^{i,k}, s^{i,k} \in \mathbb{R}^{d_k^i}$. Each $z^{i,k}$ corresponds to the positive components of the output value $o^{i,k}$ of each layer, and each $s^{i,k}$ is used as a slack variable representing the absolute value of the negative components of the pre-activation $W^{i,k-1}o^{i,k-1} + b^{i,k-1}$. In our final MIP formulation, we will make use of big-M constraints. For our theoretical results to hold, we need to make the following standard assumption.

Assumption 1 (BIG-M CONSTRAINT). For all $i \in N$ and $k \in \{1, \dots, K_i\}$ there exists a large enough constant $L \in \mathbb{R}_+$ such that $\big|(W^{i,k-1}o^{i,k-1} + b^{i,k-1})_j\big| \le L$ for $1 \le j \le d_k^i$.
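One simple way to obtain a constant $L$ satisfying Assumption 1 for a trained network is to propagate the input box $[0,1]^m$ through the layers with interval arithmetic (the same idea we use to tighten the bound in Section 5.5). The following sketch is illustrative; the function name and the choice to return one bound per layer are our own.

```python
import numpy as np

def layerwise_big_m(weights, biases):
    """Interval-arithmetic bounds L_k with |(W^{k-1} o^{k-1} + b^{k-1})_j| <= L_k,
    assuming the input bundle lies in {0,1}^m.  weights/biases are numpy arrays."""
    lo = np.zeros(weights[0].shape[1])           # componentwise lower bound of the input
    hi = np.ones(weights[0].shape[1])            # componentwise upper bound of the input
    bounds = []
    for W, b in zip(weights, biases):
        W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
        pre_lo = W_pos @ lo + W_neg @ hi + b     # lower bound on the pre-activation
        pre_hi = W_pos @ hi + W_neg @ lo + b     # upper bound on the pre-activation
        bounds.append(float(np.max(np.abs([pre_lo, pre_hi]))))
        lo, hi = np.maximum(pre_lo, 0.0), np.maximum(pre_hi, 0.0)  # ReLU output bounds
    return bounds                                # one valid big-M constant per layer
```

A single constant $L := \max_k L_k$ then satisfies Assumption 1; using the per-layer (or even per-node) bounds instead yields a tighter MIP formulation.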
In the following lemma, we show how to recursively encode a layer of $\mathcal{N}_i$, given the output value of the previous layer, as multiple linear constraints.^5

^5 In the following, all constraints containing vectors are defined componentwise.

Lemma 1. Let $W^{i,k-1}o^{i,k-1} + b^{i,k-1} \in \mathbb{R}^{d_k^i}$ be fixed for a $k \in \{1, \dots, K_i\}$. Furthermore, let $z^{i,k}, s^{i,k} \in \mathbb{R}^{d_k^i}$, $y^{i,k} \in \{0,1\}^{d_k^i}$. Consider the following linear constraints:

$$z^{i,k} - s^{i,k} = W^{i,k-1}o^{i,k-1} + b^{i,k-1} \quad (10)$$
$$0 \le z^{i,k} \le y^{i,k} \cdot L \quad (11)$$
$$0 \le s^{i,k} \le (1 - y^{i,k}) \cdot L. \quad (12)$$

The polytope defined by (10)-(12) is not empty, and every element $(z^{i,k}, s^{i,k}, y^{i,k})$ of this polytope satisfies $z^{i,k} = o^{i,k}$.

Proof. For notational convenience, let $c := W^{i,k-1}o^{i,k-1} + b^{i,k-1}$. Non-emptiness follows since $|c_j| \le L$, $1 \le j \le d_k^i$, by Assumption 1. From Constraints (11) and (12) it follows, for $1 \le j \le d_k^i$, that if $z_j^{i,k} > 0$ then $s_j^{i,k} = 0$, and if $s_j^{i,k} > 0$ then $z_j^{i,k} = 0$. We now have to distinguish the following three cases for each component of $c$:

- $c_j < 0 \implies y_j^{i,k} = 0,\; s_j^{i,k} = -c_j,\; z_j^{i,k} = 0 = \varphi(c_j)$
- $c_j > 0 \implies y_j^{i,k} = 1,\; s_j^{i,k} = 0,\; z_j^{i,k} = c_j = \varphi(c_j)$
- $c_j = 0 \implies (z_j^{i,k}, s_j^{i,k}, y_j^{i,k}) \in \{(0,0,0), (0,0,1)\}$

Combining all cases yields $z^{i,k} = \varphi(c) = o^{i,k}$.

Given Lemma 1, we can now reformulate the DNN-based WDP as a MIP. For this, we let $W^i, b^i$ denote the estimated parameters corresponding to $\mathcal{N}_i(W^i, b^i)$. Furthermore, let $a \in \mathcal{X}^n$, $y^{i,k} \in \{0,1\}^{d_k^i}$ and $z^{i,k}, s^{i,k} \in \mathbb{R}^{d_k^i}$ for $1 \le k \le K_i$, set $z^{i,0} := a_i$, and let $L$ be a constant satisfying Assumption 1.

$$\max_{a \in \mathcal{X}^n,\, z^{i,k},\, s^{i,k},\, y^{i,k}} \;\sum_{i \in N} z^{i,K_i} \quad \text{(OP2)}$$
$$\text{s.t.} \quad z^{i,k} - s^{i,k} = W^{i,k-1}z^{i,k-1} + b^{i,k-1}$$
$$\phantom{\text{s.t.}} \quad 0 \le z^{i,k} \le y^{i,k} \cdot L$$
$$\phantom{\text{s.t.}} \quad 0 \le s^{i,k} \le (1 - y^{i,k}) \cdot L$$
$$\phantom{\text{s.t.}} \quad y^{i,k} \in \{0,1\}^{d_k^i}$$
$$\text{for all } i \in N,\; k \in \{1, \dots, K_i\},$$
$$a_{ij} \in \{0,1\}, \quad j \in M,\, i \in N$$
$$\sum_{i \in N} a_{ij} \le 1, \quad j \in M.$$

We are now ready to state our main theorem.

Theorem 1 (MIP FORMULATION). The DNN-based WDP as defined in (OP1) is equivalent to the MIP defined in (OP2).

Proof. Consider (OP1). For each bidder $i \in N$, we first set $z^{i,0}$ equal to the input bundle $a_i$. Then we proceed by using Lemma 1 for $k = 1$, i.e., we reformulate the output of the 1st layer as the linear constraints (10), (11) and (12). We iterate this procedure until we have reformulated the final layer, i.e., $k = K_i$. Doing so for each bidder $i \in N$ and adding the feasibility constraints yields (OP2).

Using the MIP formulation (OP2), we can solve the DNN-based WDP using standard optimization packages like CPLEX.^6 We now provide a simple worked example for how to reformulate (OP1) into (OP2), which illustrates Theorem 1.

^6 For solving MIPs of the form (OP2) in our experiments we use CPLEX 12.8.0.0 with the Python library DOCPLEX 2.4.61.

Example 1. Consider a setting with one bidder ($n = 1$), $m$ items ($d_0^1 = m$) and one hidden layer ($K_1 = 2$). Given $W^{1,0} \in \mathbb{R}^{d_1^1 \times m}$, $W^{1,1} \in \mathbb{R}^{1 \times d_1^1}$, $b^{1,0} \in \mathbb{R}^{d_1^1}$, and $b^{1,1} \in \mathbb{R}$, (OP1) can be written as

$$\max_{a_1 \in \mathcal{X}} \;\max\Big(0,\; W^{1,1}\max\big(0,\; W^{1,0}a_1 + b^{1,0}\big) + b^{1,1}\Big)$$
$$\text{s.t.} \quad a_{1j} \in \{0,1\}, \quad j \in M,$$

where the constraint $\sum_{i \in N} a_{ij} \le 1,\; j \in M$, is redundant in this case, since we only consider one bidder. First, we replace the inner maximum using $y^{1,1} \in \{0,1\}^{d_1^1}$, $z^{1,1}, s^{1,1} \in \mathbb{R}^{d_1^1}$ and arrive at the equivalent optimization problem

$$\max_{a_1 \in \mathcal{X},\, z^{1,1},\, s^{1,1},\, y^{1,1}} \;\max\big(0,\; W^{1,1}z^{1,1} + b^{1,1}\big)$$
$$\text{s.t.} \quad z^{1,1} - s^{1,1} = W^{1,0}a_1 + b^{1,0}$$
$$\phantom{\text{s.t.}} \quad 0 \le z^{1,1} \le y^{1,1} \cdot L$$
$$\phantom{\text{s.t.}} \quad 0 \le s^{1,1} \le (1 - y^{1,1}) \cdot L$$
$$\phantom{\text{s.t.}} \quad a_{1j} \in \{0,1\}, \quad j \in M.$$

Applying Lemma 1 again, using $y^{1,2} \in \{0,1\}$ and $z^{1,2}, s^{1,2} \in \mathbb{R}$, yields the final MIP formulation

$$\max_{a_1 \in \mathcal{X},\, z^{1,k},\, s^{1,k},\, y^{1,k},\, k \in \{1,2\}} \; z^{1,2}$$
$$\text{s.t.} \quad z^{1,1} - s^{1,1} = W^{1,0}a_1 + b^{1,0}$$
$$\phantom{\text{s.t.}} \quad 0 \le z^{1,1} \le y^{1,1} \cdot L$$
$$\phantom{\text{s.t.}} \quad 0 \le s^{1,1} \le (1 - y^{1,1}) \cdot L$$
$$\phantom{\text{s.t.}} \quad z^{1,2} - s^{1,2} = W^{1,1}z^{1,1} + b^{1,1}$$
$$\phantom{\text{s.t.}} \quad 0 \le z^{1,2} \le y^{1,2} \cdot L$$
$$\phantom{\text{s.t.}} \quad 0 \le s^{1,2} \le (1 - y^{1,2}) \cdot L$$
$$\phantom{\text{s.t.}} \quad a_{1j} \in \{0,1\}, \quad j \in M.$$
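For completeness, the following sketch shows one way (OP2) could be assembled with the DOCPLEX library mentioned in Footnote 6, applying Lemma 1 layer by layer. The function name and its argument conventions are our own; a production implementation would additionally tighten the big-M constant per layer (see Section 5.5).

```python
from docplex.mp.model import Model

def build_dnn_wdp_mip(weights, biases, n_items, big_m=5000):
    """Build the MIP (OP2) for trained per-bidder ReLU networks.

    weights[i], biases[i]: lists of numpy layer parameters W^{i,k}, b^{i,k} of
    bidder i's DNN, as estimated in Line 4 of Algorithm 1.
    """
    mip = Model(name="dnn_wdp")
    n_bidders = len(weights)
    # allocation variables a_{ij}
    a = {(i, j): mip.binary_var(name=f"a_{i}_{j}")
         for i in range(n_bidders) for j in range(n_items)}
    # feasibility: each item is allocated to at most one bidder
    for j in range(n_items):
        mip.add_constraint(mip.sum(a[i, j] for i in range(n_bidders)) <= 1)
    obj_terms = []
    for i in range(n_bidders):
        # z^{i,0} is the input bundle a_i itself
        z_prev = [a[i, j] for j in range(n_items)]
        for k, (W, b) in enumerate(zip(weights[i], biases[i]), start=1):
            d_k = W.shape[0]
            z = mip.continuous_var_list(d_k, lb=0, name=f"z_{i}_{k}")
            s = mip.continuous_var_list(d_k, lb=0, name=f"s_{i}_{k}")
            y = mip.binary_var_list(d_k, name=f"y_{i}_{k}")
            for r in range(d_k):
                pre_act = mip.sum(float(W[r, c]) * z_prev[c]
                                  for c in range(W.shape[1])) + float(b[r])
                mip.add_constraint(z[r] - s[r] == pre_act)      # constraint (10)
                mip.add_constraint(z[r] <= y[r] * big_m)        # constraint (11)
                mip.add_constraint(s[r] <= (1 - y[r]) * big_m)  # constraint (12)
            z_prev = z
        obj_terms.append(z_prev[0])   # scalar output z^{i,K_i} of bidder i's network
    mip.maximize(mip.sum(obj_terms))
    return mip
```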
Remark 1. The number of decision variables in the MIP defined in (OP2) is given by

$$\underbrace{\sum_{i \in N}}_{\#\text{bidders}} \Bigg( \underbrace{m}_{\#\text{items}} \;+\; \underbrace{3}_{y^{i,k},\, s^{i,k},\, z^{i,k}} \cdot \underbrace{\sum_{k=1}^{K_i-1}}_{\#\text{hidden layers}} \underbrace{d_k^i}_{\#\text{nodes per layer}} \Bigg).$$

5 Experimental Evaluation

In this section, we evaluate the performance of our deep learning-powered ICA and compare it against the SVR-based approach using quadratic kernels by Brero, Lubin, and Seuken (2018). We release our code under an open-source license at: https://github.com/marketdesignresearch/DL-ICA.

5.1 Experiment Setup

Spectrum auctions are one of the most prominent applications of CAs, which is why we choose them for our experiments. Specifically, we use the Spectrum Auction Test Suite (SATS) version 0.6.4 (Weiss, Lubin, and Seuken 2017).^7 SATS enables us to generate thousands of CA instances in different domains. Furthermore, we have access to each bidder's true value $v_i(x)$ for all $2^m$ possible bundles $x \in \mathcal{X}$ as well as the efficient allocation $a^* \in \mathcal{F}$, which we can use to measure the efficiency of any other allocation $a$ by $V(a)/V(a^*)$. We evaluate our approach in the following three domains:

The Global Synergy Value Model (GSVM) (Goeree and Holt 2010) consists of 6 regional bidders, 1 national bidder, and 18 items. In GSVM, the value of a package increases by a certain percentage with every additional item of interest. Thus, the value of a bundle only depends on the total number of items contained in the bundle, which makes it one of the simplest models in SATS. In fact, bidders' valuations can be exactly learned by SVRs with quadratic kernels (Brero, Lubin, and Seuken 2019), which implies that the valuations exhibit at most two-way interactions between items.

The Local Synergy Value Model (LSVM) (Scheffel, Ziegler, and Bichler 2012) consists of 5 regional bidders, 1 national bidder, and 18 items. The items are arranged on a rectangle of size 3 × 6. The national bidder is interested in all items, while the regional bidders are only interested in certain subsets of items. Complementarities arise from spatial proximity of items and are modeled via a logistic function, which makes it more complex than GSVM.

The Multi-Region Value Model (MRVM) (Weiss, Lubin, and Seuken 2017) consists of 98 items and 10 bidders. It models large US and Canadian spectrum auctions and captures both geography (different regions) and frequency dimensions (different bands). A bidder is categorized as national, regional, or local, depending on the magnitude of the synergies between different regions.

^7 Experiments were conducted on machines with Intel Xeon E5-2650 v4 2.20GHz processors with 40 logical cores.

In Sections 5.2 and 5.3, we first evaluate our approach in detail using the two medium-sized domains GSVM and LSVM. Then, in Section 5.4, we use MRVM to evaluate how well our approach scales to very large domains.

5.2 Prediction Performance

We first compare the prediction performance of DNNs to SVRs. Using SATS, we generate a data set of bundle-value pairs $\{(x^{(k)}, v_i(x^{(k)}))\}$ for all bidders $i \in N$. For each auction domain, we draw 100 auction instances uniformly at random. For each such instance, we sample, for each bidder type, a training set $T$ of equal size and a disjoint test set $V$ consisting of all remaining bundles, i.e., $|V| := 2^{|M|} - |T|$. For each bidder type, we train the ML algorithm on $T$ and test it on $V$. We report the mean absolute error (MAE) for both bidder types averaged over the 100 instances.^8 We denote by $[d_1, d_2, d_3]$ a 3-hidden-layer DNN with $d_1$, $d_2$ and $d_3$ hidden nodes, respectively.
For both SVRs and DNNs, we performed a hyperparameter optimization for each bidder type. For the DNNs, we optimized the architecture,^9 the L2-regularization parameter for the affine mappings, the dropout rate per layer, and the learning rate of ADAM. For SVRs with a quadratic kernel $k(x, y) := x^T y + \gamma (x^T y)^2$, we optimized $\gamma$ (i.e., the influence of the quadratic term), the regularization parameter $C$, and the loss function parameter $\epsilon$. In what follows, we present the winner models resulting from this hyperparameter optimization.

^8 For training the DNNs, we use the MAE as the loss function.

^9 We considered the following architectures: for the national bidders: [10], [100], [10,10], ..., [100,100,100,100], and for the regional bidders: [32], [100], [32,32], ..., [100,100,100,100].

In Table 1, we present prediction performance results for the GSVM domain. Consider the last column of the table, which shows the MAE on the test set. We observe the very good prediction performance of the SVRs and in particular that the test error converges to 0 when increasing $|T|$. This is due to the fact that in GSVM, bidders' value functions can be perfectly captured by quadratic kernels. In this sense, GSVM represents a worst-case auction domain w.r.t. our comparison of DNNs against quadratically-kernelized SVRs. Looking at the performance of the DNNs, we observe that the test error also decreases with $|T|$ but, not surprisingly, is always larger than for the SVRs with quadratic kernels. Furthermore, we observe that the optimal architectures are always 1-hidden-layer networks.

| ML Algorithm | Train size | Bidder Type | Architecture/Kernel | MAE train | MAE test |
|---|---|---|---|---|---|
| DNNs | 50 | National | [100] | 1.99 | 5.25 (0.11) |
| | 50 | Regional | [100] | 2.06 | 6.20 (0.19) |
| | 100 | National | [100] | 2.10 | 3.66 (0.07) |
| | 100 | Regional | [100] | 2.71 | 4.64 (0.13) |
| | 200 | National | [100] | 1.61 | 2.22 (0.05) |
| | 200 | Regional | [100] | 2.11 | 2.89 (0.09) |
| SVRs | 50 | National | quadratic | 0.03 | 4.38 (0.11) |
| | 50 | Regional | quadratic | 0.03 | 4.98 (0.20) |
| | 100 | National | quadratic | 0.03 | 1.71 (0.04) |
| | 100 | Regional | quadratic | 0.03 | 2.07 (0.07) |
| | 200 | National | quadratic | 0.03 | 0.12 (0.00) |
| | 200 | Regional | quadratic | 0.03 | 0.13 (0.00) |

Table 1: Prediction performance in GSVM. Train size denotes $|T|$. All results are averaged over 100 auction instances. For MAE test, standard errors are shown in parentheses.

In Table 2, we present the results for the more complex LSVM domain.

| ML Algorithm | Train size | Bidder Type | Architecture/Kernel | MAE train | MAE test |
|---|---|---|---|---|---|
| DNNs | 50 | National | [10] | 24.68 | 29.90 (0.23) |
| | 50 | Regional | [100] | 4.22 | 16.58 (0.39) |
| | 100 | National | [10, 10, 10] | 9.01 | 25.62 (0.36) |
| | 100 | Regional | [100] | 5.01 | 13.74 (0.26) |
| | 200 | National | [10, 10] | 10.52 | 20.58 (0.21) |
| | 200 | Regional | [100, 100, 100] | 3.64 | 11.27 (0.23) |
| SVRs | 50 | National | quadratic | 18.51 | 32.61 (0.59) |
| | 50 | Regional | quadratic | 3.11 | 15.30 (0.34) |
| | 100 | National | quadratic | 20.03 | 27.86 (0.28) |
| | 100 | Regional | quadratic | 3.21 | 14.21 (0.28) |
| | 200 | National | quadratic | 20.03 | 25.44 (0.16) |
| | 200 | Regional | quadratic | 8.23 | 12.67 (0.26) |

Table 2: Prediction performance in LSVM. Train size denotes $|T|$. All results are averaged over 100 auction instances. For MAE test, standard errors are shown in parentheses.

We observe that for $|T| = 50$, the DNNs and SVRs with quadratic kernels have similar test error. But for $|T| = 100$ and $|T| = 200$, the DNNs significantly outperform the SVRs with quadratic kernels. Specifically, DNNs better capture the national bidder in LSVM, which is important, since this bidder is interested in all items and usually gets a large portion of the items in the final allocation, which matters a lot for efficiency. In contrast to GSVM, we observe that for $|T| \ge 100$, multi-hidden-layer networks were found to be best.
This suggests that DNNs may indeed be advantageous for capturing more complex preference structures.^10

^10 We also evaluated the prediction performance of other kernels (linear, Gaussian and exponential) and observed that DNNs were as good or better for almost all combinations of bidder types and $|T|$.

5.3 Efficiency Results

Finally, we compare the economic efficiency of our DNN-powered ICA against the SVR-powered ICA. When conducting the efficiency experiments, we follow Brero, Lubin, and Seuken (2018) and assume that bidders answer all value queries truthfully. Furthermore, we also use their experiment setup and define a cap $c_e$ on the total number of queries in Algorithm 1 and set $c_e := 50$. The initial set of reported bundle-value pairs $B_i^0$ per bidder $i$ is drawn uniformly at random and set to be equal across bidders. We denote the number of initial reports by $c_0 := |B_i^0|$, $i \in N$, resulting in a maximum of $c_0 + n \cdot (c_e - c_0)$ queries per bidder.

GSVM. In Table 3, we first present the results from comparing nine different network architectures on a training set of 100 GSVM instances. As we can see, the winning model is among the largest multi-layer networks we tested. It is noteworthy that the one-hidden-layer network (R:[100], N:[100]), which performed best in terms of prediction performance, did not perform best in terms of efficiency.

| DNN Architectures^11 | $c_0$ | Efficiency % | Revenue %^12 |
|---|---|---|---|
| R:[16, 16], N:[10, 10] | 40 | 98.53 | 69.26 |
| R:[16, 16], N:[10, 10] | 30 | 98.41 | 68.29 |
| R:[16, 16], N:[10, 10, 10] | 40 | 98.51 | 68.91 |
| R:[16, 16], N:[10, 10, 10] | 30 | 98.32 | 68.59 |
| R:[32, 32], N:[10, 10] | 40 | 98.75 | 71.14 |
| **R:[32, 32], N:[10, 10]** | **30** | **98.94** | **68.47** |
| R:[32, 32], N:[10, 10, 10] | 40 | 98.69 | 71.63 |
| R:[32, 32], N:[10, 10, 10] | 30 | 98.92 | 68.88 |
| R:[100], N:[100] | 30 | 98.27 | 66.73 |

Table 3: Efficiency results for 9 configurations of a DNN-powered ICA on a training set of 100 GSVM auction instances. The selected winner model is marked in bold. All results are averaged over the 100 auction instances.

In Table 4, we compare the performance of the winner model from Table 3 against the SVR-based approach on a test set of 100 GSVM instances. Even though GSVM can be perfectly captured by quadratic kernels, our DNN-based approach achieves a similar result w.r.t. efficiency, where the difference is not statistically significant (p = 0.337).

| Auction Mechanism | #Queries | Max #Queries | Efficiency % | Revenue % | t-test on Efficiency^13 |
|---|---|---|---|---|---|
| VCG | 218 | 218 | 100.00 (0.00) | 80.4 | - |
| SVR-ICA | 41.9 | 42.8 | 98.85 (0.13) | 77.80 | 0.337 |
| DNN-ICA | 53 | 78 | 98.63 (0.18) | 67.81 | |

Table 4: A comparison of the DNN-powered ICA against the SVR-powered ICA and VCG (as reported in Brero, Lubin, and Seuken (2018)) on a test set of 100 GSVM instances. All results are averaged over the 100 instances. For efficiency, standard errors are shown in parentheses.

^11 We denote by R and N the architectures used for the regional and national bidders, respectively.

^12 Revenue is calculated as $\big(\sum_{i \in N} p_i^{pvm}\big) / V(a^*)$.

^13 We performed a two-sided unpaired Welch two-sample t-test with $H_0: \mu_1 = \mu_2$ against $H_A: \mu_1 \neq \mu_2$. We thank Brero, Lubin, and Seuken (2018) for providing us with the detailed results of their experiments to enable all t-tests reported in this paper.

LSVM. We now turn to the more complex LSVM domain. We first select a winner model based on a training set of 100 LSVM instances (Table 5).
As in GSVM, the best model is among the largest architectures, and the best model w.r.t. prediction performance does not yield the highest efficiency.

| DNN Architectures | $c_0$ | Efficiency % | Revenue % |
|---|---|---|---|
| R:[16, 16], N:[10, 10] | 40 | 97.40 | 60.51 |
| R:[16, 16], N:[10, 10] | 30 | 96.87 | 56.85 |
| R:[16, 16], N:[10, 10, 10] | 40 | 97.45 | 61.15 |
| R:[16, 16], N:[10, 10, 10] | 30 | 97.12 | 59.31 |
| R:[32, 32], N:[10, 10] | 40 | 97.40 | 62.01 |
| R:[32, 32], N:[10, 10] | 30 | 96.83 | 59.07 |
| **R:[32, 32], N:[10, 10, 10]** | **40** | **97.74** | **61.95** |
| R:[32, 32], N:[10, 10, 10] | 30 | 97.12 | 59.56 |
| R:[100], N:[10] | 40 | 96.78 | 58.71 |

Table 5: Efficiency results for 9 configurations of a DNN-powered ICA on a training set of 100 LSVM auction instances. The selected winner model is marked in bold. All results are averaged over the 100 auction instances.

In Table 6, we compare the performance of the selected winner model from Table 5 against the SVR-based approach on a test set of 100 new auction instances. Here, we see that our DNN-powered ICA substantially outperforms the SVR-powered ICA by 1.74 percentage points, and this effect is highly statistically significant (p = 0.000038). This demonstrates the advantage of DNNs over SVRs with quadratic kernels in complex domains like LSVM.

| Auction Mechanism | #Queries | Max #Queries | Efficiency % | Revenue % | t-test on Efficiency |
|---|---|---|---|---|---|
| VCG | 218 | 218 | 100.00 (0.00) | 83.4 | - |
| SVR-ICA | 48.2 | 52.8 | 96.03 (0.33) | 65.60 | 0.000038 |
| DNN-ICA | 65 | 77 | 97.74 (0.24) | 62.45 | |

Table 6: A comparison of the DNN-powered ICA against the SVR-powered ICA and VCG (as reported in Brero, Lubin, and Seuken (2018)) on a test set of 100 LSVM auction instances. All results are averaged over the 100 instances. For efficiency, standard errors are shown in parentheses.

In Figure 2, we present a histogram of the efficiency obtained by the selected winner model on the test set. We see that for 29 auction instances, our approach (impressively) obtains an economic efficiency of 100%. However, for two instances, the efficiency is less than 90%. Thus, it is a promising avenue for future work to investigate these outliers to further increase the average efficiency.

[Figure 2: Histogram of efficiency results in LSVM of the selected DNN winner model on the test set. Summary statistics: mean 97.74, std 2.35, min 86.51, 25th percentile 96.67, median 98.33, 75th percentile 99.48, max 100.0.]

Remark 2. In Tables 4 and 6, we see that our DNN-based approach achieves lower revenue than the SVR-based approach. This may be explained as follows. A bidder's payment in PVM depends on the difference between the social welfare in the marginal and the main economy. However, PVM has one oddity: a bidder's payment may be negative. This happens more frequently with DNNs than with SVRs: consider an auction where bidder $i$ is not allocated in the main economy. Then, ideally, the allocation (and thus the welfare) should be exactly the same in the marginal economy where bidder $i$ is excluded, resulting in a zero payment. When using SVRs, this is guaranteed if the same set of bundle-value pairs is used in the main and marginal economy, because SVRs use a deterministic algorithm for estimation. In contrast, DNNs use a non-deterministic algorithm, sometimes resulting in different allocations in the main and marginal economies. However, this is more a limitation of PVM itself. In practice, one should lower-bound the payments, as also suggested by Brero, Lubin, and Seuken (2019). Lower-bounding all payments by zero increases the revenue in GSVM by 7.9 percentage points and in LSVM by 8.3 percentage points.
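To illustrate this fix, the following sketch computes PVM payments as in Eq. (5) and optionally clips them at zero. The function name and the input conventions (per-bidder reported values under the final and marginal allocations) are illustrative assumptions, not part of the original mechanism description.

```python
def pvm_payments(values_final, values_marginal, lower_bound_zero=True):
    """Compute PVM payments (Eq. 5).

    values_final[j]:       bidder j's reported value for its bundle in the final
                           allocation a^pvm
    values_marginal[i][j]: bidder j's reported value for its bundle in the marginal
                           economy from which bidder i was excluded, i.e., under a^(-i)
    """
    n = len(values_final)
    payments = []
    for i in range(n):
        marginal_welfare = sum(values_marginal[i][j] for j in range(n) if j != i)
        main_welfare_others = sum(values_final[j] for j in range(n) if j != i)
        p_i = marginal_welfare - main_welfare_others
        if lower_bound_zero:
            p_i = max(0.0, p_i)   # lower-bound payments at zero, as discussed above
        payments.append(p_i)
    return payments
```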
5.4 Scaling to Larger Domains

We now present results for the MRVM domain (with 98 items and 10 bidders) to show that our DNN-powered ICA also scales well to larger domains (we present detailed runtime results in Section 5.5). We use the experiment setup of Brero, Lubin, and Seuken (2018) and set $c_e := 100$. In Table 7, we present the results for different DNN architectures and different values of $c_0$, evaluated on a training set of MRVM instances. First, we observe that the efficiency increases as we decrease $c_0$. This can be explained by the fact that a smaller $c_0$ tends to lead to more iterations of the preference elicitation algorithm, resulting in a larger number of elicited bundle-value pairs. In terms of which DNN architectures performed better or worse, no clear pattern emerged.

| DNN Architectures | $c_0$ | Efficiency % | Revenue % |
|---|---|---|---|
| L:[10, 10], R:[32, 32], N:[32, 32] | 70 | 93.54 | 31.02 |
| L:[10, 10], R:[32, 32], N:[32, 32] | 50 | 94.07 | 33.51 |
| L:[16, 16], R:[16, 16], N:[16, 16] | 30 | 94.46 | 31.39 |
| L:[10, 10], R:[32, 32], N:[32, 32] | 30 | 94.73 | 31.88 |
| L:[16, 16], R:[16, 16], N:[16, 16] | 20 | 94.88 | 30.31 |
| L:[10, 10], R:[32, 32], N:[32, 32] | 20 | 94.42 | 34.23 |
| **L:[16, 16], R:[16, 16], N:[16, 16]** | **10** | **95.00** | **31.97** |
| L:[10, 10], R:[32, 32], N:[32, 32] | 10 | 94.54 | 34.78 |
| L:[10, 10], R:[16, 16, 16], N:[16, 16, 16] | 10 | 94.74 | 31.12 |

Table 7: Efficiency results for 9 configurations of a DNN-powered ICA on a training set of 19 MRVM auction instances. The selected winner model is marked in bold. All results are averaged over the 19 auction instances. L, R, and N denote the architectures used for the local, regional, and national bidders, respectively.

In Table 8, we compare the performance of the selected winner model from Table 7 against the SVR-based approach on a test set of 50 MRVM instances. We see that our DNN-powered ICA outperforms the SVR-powered ICA by 0.46 percentage points. While this is a relatively modest increase in efficiency, this effect is statistically significant (p = 0.0268). We also observe that our DNN-based approach (while obeying all caps) asks a larger number of queries than the SVR-based approach. It is unclear how much of the efficiency increase is driven by the DNN or by the larger number of queries. Future work should compare the two approaches by holding the total number of queries constant.

| Auction Mechanism | #Queries | Max #Queries | Efficiency % | Revenue % | t-test on Efficiency |
|---|---|---|---|---|---|
| VCG | 298 | 298 | 100.00 (0.00) | 44.3 | - |
| SVR-ICA | 265 | 630 | 94.58 (0.14) | 35.20 | 0.0268 |
| DNN-ICA | 334 | 908 | 95.04 (0.14) | 30.59 | |

Table 8: A comparison of the DNN-powered ICA against the SVR-powered ICA and VCG (as reported in Brero, Lubin, and Seuken (2018)) on a test set of 50 MRVM auction instances. All results are averaged over the 50 instances. For efficiency, standard errors are shown in parentheses.

5.5 Runtime Analysis

In Table 9, we present runtime results for our DNN-powered ICA for the winner models from Tables 4, 6 and 8. Specifically, we show average runtime results of the MIP (OP2), of an iteration of Algorithm 1, and of a whole auction (PVM).

| Domain | MIP Runtime | Iteration Runtime | Auction Runtime |
|---|---|---|---|
| GSVM | 15.90 sec | 30.51 sec | 44 min |
| LSVM | 39.75 sec | 51.69 sec | 65 min |
| MRVM | 3.67 sec | 26.75 sec | 457 min |

Table 9: Average runtime results of the selected DNN winner models in different domains. All values are averaged over 100 (GSVM and LSVM) and 50 (MRVM) auction instances.

We observe that the average runtime of a whole auction is approximately 1 hour in GSVM and LSVM and 7 hours in the larger MRVM domain.
The increase in total runtime in MRVM can be explained by the fact that we use a smaller number of initial queries ($c_0 := 10$) and a larger total query cap ($c_e := 100$) compared to LSVM and GSVM. This results in a larger number of iterations of Algorithm 1. Additionally, MRVM consists of 10 bidders, resulting in 11 calls of Algorithm 1, in contrast to 7 calls in LSVM and 8 calls in GSVM. Even though the average MIP runtime is smaller in MRVM, the larger number of iterations and bidders leads to this increase in total runtime. Overall, these results show that our DNN-based approach is practically feasible and scales well to the larger MRVM domain. Brero, Lubin, and Seuken (2018) do not provide runtime information, so we cannot provide a runtime comparison with SVRs.^14

^14 In conversations with the authors, they told us that for Gaussian and exponential kernels, their MIPs always timed out (1-hour cap) in GSVM, LSVM and MRVM. The average MIP runtime for the quadratic kernel was a few seconds in GSVM and LSVM. In MRVM, the quadratic kernel also regularly timed out, resulting in an average MIP runtime of 30 min and of 36 h for a whole auction.

In Figure 3, we present additional MIP runtime results for selected DNN architectures in LSVM (results in GSVM and MRVM are qualitatively similar). We observe two effects: First, increasing the number of nodes per layer slightly increases the average runtime. Second, adding an additional layer (for the national bidder) significantly increases the average runtime. Not surprisingly, the largest DNN architectures lead to the highest runtime.

[Figure 3: MIP runtimes (in sec.) of (OP2), based on 50 different LSVM instances, for the architectures R:[16,16], N:[10,10]; R:[32,32], N:[10,10]; R:[16,16], N:[10,10,10]; and R:[32,32], N:[10,10,10], with $c_0 := 40$ and $c_e := 50$.]

The runtime of our MIPs heavily depends on the size of the big-M constant $L$. In practice, $L$ should be chosen as small as possible to obtain a MIP formulation that is as tight as possible. We initialized $L := 5000$ and tightened this bound further by using interval arithmetic (see, e.g., Tjeng, Xiao, and Tedrake (2019)). Recently, Singh et al. (2018) proposed a novel technique for tightening such MIP formulations. Evaluating this in more detail is subject to future work.

6 Conclusion

In this paper, we have designed a deep learning-powered ICA. We have compared our approach against prior work using SVRs with quadratic kernels. Our experimental results have shown that our DNN-based approach leads to significantly higher economic efficiency in complex auction domains and scales well to large domains. On a technical level, our main contribution was to reformulate the DNN-based WDP into a MIP. The main insight to achieve this was to use ReLU activation functions, which can be rewritten as multiple linear constraints. From an experimental point of view, we were pleasantly surprised to see that even DNNs with a small number of layers and nodes and with a small number of training samples (i.e., bids) were able to achieve high prediction performance and ultimately high economic efficiency in the overall auction mechanism. Future work should investigate the trade-off between larger DNN architectures and the resulting MIP runtime, to further increase efficiency.

Acknowledgments

We thank Gianluca Brero, Nils Olberg, and Stefania Ionescu for insightful discussions and the anonymous reviewers for helpful comments.
This paper is part of a project that has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (Grant agreement No. 805542).

References

Ausubel, L. M., and Baranov, O. 2017. A practical guide to the combinatorial clock auction. Economic Journal 127(605):F334-F350.

Ausubel, L., and Cramton, P. 2011. Auction design for wind rights. Report to Bureau of Ocean Energy Management, Regulation and Enforcement.

Ausubel, L. M.; Cramton, P.; and Milgrom, P. 2006. The clock-proxy auction: A practical combinatorial auction design. In Combinatorial Auctions. MIT Press. 115-138.

Bichler, M.; Davenport, A.; Hohner, G.; and Kalagnanam, J. 2006. Industrial procurement auctions. In Combinatorial Auctions. MIT Press. 593-612.

Blum, A.; Jackson, J.; Sandholm, T.; and Zinkevich, M. 2004. Preference elicitation and query learning. Journal of Machine Learning Research 5:649-667.

Brero, G., and Lahaie, S. 2018. A Bayesian clearing mechanism for combinatorial auctions. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence.

Brero, G.; Lahaie, S.; and Seuken, S. 2019. Fast iterative combinatorial auctions via Bayesian learning. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence.

Brero, G.; Lubin, B.; and Seuken, S. 2017. Probably approximately efficient combinatorial auctions via machine learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence.

Brero, G.; Lubin, B.; and Seuken, S. 2018. Combinatorial auctions via machine learning-based preference elicitation. In Proceedings of the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence.

Brero, G.; Lubin, B.; and Seuken, S. 2019. Machine learning-powered iterative combinatorial auctions. arXiv preprint, https://arxiv.org/abs/1911.08042.

Cheng, C.-H.; Nührenberg, G.; and Ruess, H. 2017. Maximum resilience of artificial neural networks. In Automated Technology for Verification and Analysis (ATVA 2017), Lecture Notes in Computer Science, volume 10482. Springer, Cham.

Cramton, P. 2013. Spectrum auction design. Review of Industrial Organization 42(2):161-190.

Dütting, P.; Fischer, F.; Jirapinyo, P.; Lai, J. K.; Lubin, B.; and Parkes, D. C. 2015. Payment rules through discriminant-based classifiers. ACM Transactions on Economics and Computation 3(1):5.

Dütting, P.; Feng, Z.; Narasimhan, H.; Parkes, D. C.; and Ravindranath, S. S. 2017. Optimal auctions through deep learning. In Proceedings of the 36th International Conference on Machine Learning.

Fischetti, M., and Jo, J. 2018. Deep neural networks and mixed integer linear optimization. Constraints 23(3):296-309.

Goeree, J. K., and Holt, C. A. 2010. Hierarchical package bidding: A paper & pencil combinatorial auction. Games and Economic Behavior 70(1):146-169.

Golowich, N.; Narasimhan, H.; and Parkes, D. C. 2018. Deep learning for multi-facility location mechanism design. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence and the Twenty-Third European Conference on Artificial Intelligence, 261-267.

Kingma, D., and Ba, J. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations.

Lahaie, S. M., and Parkes, D. C. 2004. Applying learning algorithms to preference elicitation. In Proceedings of the 5th ACM Conference on Electronic Commerce.

Narasimhan, H.; Agarwal, S. B.; and Parkes, D. C. 2016.
Automated mechanism design without money via machine learning. In Proceedings of the 25th International Joint Conference on Artificial Intelligence.

Nisan, N., and Segal, I. 2006. The communication requirements of efficient allocations and supporting prices. Journal of Economic Theory 129(1):192-224.

Rassenti, S. J.; Smith, V. L.; and Bulfin, R. L. 1982. A combinatorial auction mechanism for airport time slot allocation. The Bell Journal of Economics 402-417.

Scheffel, T.; Ziegler, G.; and Bichler, M. 2012. On the impact of package selection in combinatorial auctions: an experimental study in the context of spectrum auction design. Experimental Economics 15(4):667-692.

Singh, G.; Gehr, T.; Mirman, M.; Püschel, M.; and Vechev, M. 2018. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, 10802-10813.

Smola, A. J., and Schölkopf, B. 2004. A tutorial on support vector regression. Statistics and Computing 14(3):199-222.

Tjeng, V.; Xiao, K. Y.; and Tedrake, R. 2019. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations.

Weiss, M.; Lubin, B.; and Seuken, S. 2017. SATS: A universal spectrum auction test suite. In Proceedings of the 16th Conference on Autonomous Agents and Multi-Agent Systems.