# Optimal Auctions through Deep Learning

Paul Dütting*¹, Zhe Feng*², Harikrishna Narasimhan*², David C. Parkes*², Sai S. Ravindranath*²

## Abstract

Designing an incentive compatible auction that maximizes expected revenue is an intricate task. The single-item case was resolved in a seminal piece of work by Myerson in 1981. Even after 30-40 years of intense research the problem remains unsolved for seemingly simple multi-bidder, multi-item settings. In this work, we initiate the exploration of the use of tools from deep learning for the automated design of optimal auctions. We model an auction as a multi-layer neural network, frame optimal auction design as a constrained learning problem, and show how it can be solved using standard pipelines. We prove generalization bounds and present extensive experiments, recovering essentially all known analytical solutions for multi-item settings, and obtaining novel mechanisms for settings in which the optimal mechanism is unknown.

## 1. Introduction

Optimal auction design is one of the cornerstones of economic theory. It is of great practical importance, as auctions are used across industries and by the public sector to organize the sale of their products and services. Concrete examples are the US FCC Incentive Auction, the sponsored search auctions conducted by web search engines such as Google, and the auctions run on platforms such as eBay. In the standard independent private valuations model, each bidder has a valuation function over subsets of items, drawn independently from not necessarily identical distributions. It is assumed that the auctioneer knows the distributions and can (and will) use this information in designing the auction. A major difficulty in designing auctions is that valuations are private and bidders need to be incentivized to report their valuations truthfully. The goal is to learn an incentive compatible auction that maximizes revenue.
*Equal contribution. ¹London School of Economics. ²Harvard University. Correspondence to: Zhe Feng. Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

In a seminal piece of work, Myerson resolved the optimal auction design problem when there is a single item for sale (Myerson, 1981). Quite astonishingly, even after 30-40 years of intense research, the problem is not completely resolved even for a simple setting with two bidders and two items. While there have been some elegant partial characterization results (Manelli & Vincent, 2006; Pavlov, 2011; Haghpanah & Hartline, 2015; Giannakopoulos & Koutsoupias, 2015; Daskalakis et al., 2017; Yao, 2017), and an impressive sequence of recent algorithmic results (Cai et al., 2012a;b; 2013; Hart & Nisan, 2017; Babaioff et al., 2014; Yao, 2015; Cai & Zhao, 2017; Chawla et al., 2010), most of them apply to the weaker notion of Bayesian incentive compatibility (BIC). Our focus is on designing auctions that satisfy dominant-strategy incentive compatibility (DSIC), which is the more robust and desirable notion of incentive compatibility.

A recent, concurrent line of work started to bring in tools from machine learning and computational learning theory to design auctions from samples of bidder valuations. Much of the effort here has focused on analyzing the sample complexity of designing revenue-maximizing auctions (see e.g. Cole & Roughgarden (2014); Mohri & Medina (2016)). A handful of works has leveraged machine learning to optimize different aspects of mechanisms (Lahaie, 2011; Dütting et al., 2014; Narasimhan et al., 2016), but none of these offers the generality and flexibility of our approach.
There have also been computational approaches to auction design, under the agenda of automated mechanism design (Conitzer & Sandholm, 2002; 2004; Sandholm & Likhodedov, 2015), but these are limited to specialized classes of auctions known to be incentive compatible.

**Our contribution.** In this work we provide the first, general-purpose, end-to-end approach for solving the multi-item auction design problem. We use multi-layer neural networks to encode auction mechanisms, with bidder valuations being the input and allocation and payment decisions being the output. We then train the networks using samples from the value distributions, so as to maximize expected revenue subject to constraints for incentive compatibility. To be able to tackle this problem using standard pipelines, we restate the incentive compatibility constraint as requiring the expected ex post regret for the auction to be zero. We adopt the augmented Lagrangian method to solve the resulting constrained optimization problem, where in each iteration we push gradients through the regret term by solving an inner optimization problem to find the optimal misreport for each bidder and valuation profile. We describe network architectures for bidders with additive, unit-demand, and combinatorial valuations, and present extensive experiments that show that: (a) Our approach is capable of recovering essentially all analytical solutions for multi-item settings that have been obtained over the past 30-40 years, finding auctions with almost optimal revenue and vanishingly small regret that match the allocation and payment rules of the theoretically optimal auctions to surprising accuracy. (b) Our approach finds high-revenue auctions with negligibly small regret in settings in which the optimal auction is unknown, matching or outperforming state-of-the-art computational results (Sandholm & Likhodedov, 2015).
(c) Whereas the largest setting presently studied in the analytical literature is one with 2 bidders and 2 items, our approach learns auctions for larger settings, such as a 5-bidder, 10-item setting, where optimal auctions have been hard to design, and finds low-regret auctions that yield higher revenue than strong baselines. We also prove a novel generalization bound, which implies that, with high probability, for our architectures high revenue and low regret on the training data translate into high revenue and low regret on freshly sampled valuations.

**Discussion.** By focusing on expected ex post regret we adopt a quantifiable relaxation of dominant-strategy incentive compatibility, first introduced in (Dütting et al., 2014). Our experiments suggest that this relaxation is an effective tool for approximating the optimal DSIC auctions. While not strictly limited to neural networks, our approach benefits from the expressive power of neural networks and the ability to enforce complex constraints in the training problem using the standard pipeline. A key advantage of our method over state-of-the-art automated mechanism design approaches (such as (Sandholm & Likhodedov, 2015)) is that we optimize over a broader class of not necessarily incentive compatible mechanisms, and are constrained only by the expressivity of the neural network architecture. While the original work on automated auction design framed the problem as a linear program (LP) (Conitzer & Sandholm, 2002; 2004), follow-up work has acknowledged that this approach has severe scalability issues, as it requires a number of constraints and variables that is exponential in the number of agents and items (Guo & Conitzer, 2010). We find that even for a small setting with 2 bidders and 3 items (and a discretization of the value into 5 bins per item) the LP takes 69 hours to complete, since it needs to handle about $10^5$ decision variables and $4 \times 10^6$ constraints.
For the same setting, our approach found an auction with lower regret in just over 9 hours (see Table 1).

**Further related work.** Prior sample complexity results are available for the design of optimal single-item auctions (Cole & Roughgarden, 2014; Mohri & Medina, 2016; Huang et al., 2015), single-bidder, multi-item auctions (Dughmi et al., 2014), general single-parameter settings (Morgenstern & Roughgarden, 2015), combinatorial auctions (Balcan et al., 2016; Morgenstern & Roughgarden, 2016; Syrgkanis, 2017), and allocation mechanisms (both with and without money) (Narasimhan & Parkes, 2016). Several other research groups have recently applied deep nets and inference tools to economic problems different from the one we consider here. These include the use of neural networks to predict the behavior of human participants in strategic scenarios (Hartford et al., 2016), an automated equilibrium analysis of mechanisms (Thompson et al., 2017), deep nets for causal inference (Hartford et al., 2017; Louizos et al., 2017), and deep reinforcement learning for solving combinatorial games (Raghu et al., 2018).¹

## 2. Auction Design as a Learning Problem

**Auction design basics.** We consider a setting with a set of $n$ bidders $N = \{1, \ldots, n\}$ and $m$ items $M = \{1, \ldots, m\}$. Each bidder $i$ has a valuation function $v_i : 2^M \to \mathbb{R}_{\geq 0}$, where $v_i(S)$ denotes how much the bidder values the subset of items $S \subseteq M$. In the simplest case, a bidder may have additive valuations, where she has a value for each individual item in $M$, and her value for a subset of items $S \subseteq M$ is $v_i(S) = \sum_{j \in S} v_i(\{j\})$. Bidder $i$'s valuation function is drawn independently from a distribution $F_i$ over possible valuation functions $V_i$. We write $v = (v_1, \ldots, v_n)$ for a profile of valuations, and denote $V = \prod_{i=1}^n V_i$. The auctioneer knows the distributions $F = (F_1, \ldots, F_n)$, but does not know the bidders' realized valuations $v$.
The bidders report their valuations (perhaps untruthfully), and an auction decides on an allocation of items to the bidders and charges them payments. We denote an auction $(g, p)$ as a pair of allocation rules $g_i : V \to 2^M$ and payment rules $p_i : V \to \mathbb{R}_{\geq 0}$ (these rules can be randomized). Given bids $b = (b_1, \ldots, b_n) \in V$, the auction computes an allocation $g(b)$ and payments $p(b)$. A bidder with valuation $v_i$ receives utility $u_i(v_i, b) = v_i(g_i(b)) - p_i(b)$ for bid profile $b$. Bidders are strategic and seek to maximize their utility, and may report bids that are different from their valuations. Let $v_{-i}$ denote the valuation profile $v = (v_1, \ldots, v_n)$ without element $v_i$, similarly for $b_{-i}$, and let $V_{-i} = \prod_{j \neq i} V_j$ denote the possible valuation profiles of bidders other than bidder $i$. An auction is dominant-strategy incentive compatible (DSIC) if each bidder's utility is maximized by reporting truthfully no matter what the other bidders report. In other words, $u_i(v_i, (v_i, b_{-i})) \geq u_i(v_i, (b_i, b_{-i}))$ for every bidder $i$, every valuation $v_i \in V_i$, every bid $b_i \in V_i$, and all bids $b_{-i} \in V_{-i}$ from others. An auction is (ex post) individually rational (IR) if each bidder receives a non-negative utility, i.e. $u_i(v_i, (v_i, b_{-i})) \geq 0$ for all $i \in N$, $v_i \in V_i$, and $b_{-i} \in V_{-i}$. In a DSIC auction, it is in the best interest of each bidder to report truthfully, and so the revenue on valuation profile $v$ is $\sum_i p_i(v)$. Optimal auction design seeks to identify a DSIC auction that maximizes expected revenue.

¹There has also been follow-up work to the present paper that extends our approach to budget-constrained bidders (Feng et al., 2018) and to the facility location problem (Golowich et al., 2018), and that develops specialized architectures for single-bidder settings that satisfy IC (Shen et al., 2019).

**Formulation as a learning problem.**
We pose the problem of optimal auction design as a learning problem, where in place of a loss function that measures error against a target label, we adopt the negated, expected revenue on valuations drawn from $F$. We are given a parametric class of auctions, $(g^w, p^w) \in \mathcal{M}$, for parameters $w \in \mathbb{R}^d$ (some $d \in \mathbb{N}$), and a sample of bidder valuation profiles $S = \{v^{(1)}, \ldots, v^{(L)}\}$ drawn i.i.d. from $F$.² The goal is to find an auction that minimizes the negated, expected revenue $-\sum_{i \in N} p^w_i(v)$, among all auctions in $\mathcal{M}$ that satisfy incentive compatibility. In particular, we introduce constraints in the learning problem to ensure that the chosen auction satisfies incentive compatibility. For this, we define the ex post regret for each bidder to measure the extent to which an auction violates incentive compatibility. Fixing the bids of others, the ex post regret for a bidder is the maximum increase in her utility, considering all possible non-truthful bids. We will be interested in the expected ex post regret for bidder $i$:

$$\mathit{rgt}_i(w) = \mathbb{E}\Big[\max_{v'_i \in V_i} u^w_i\big(v_i; (v'_i, v_{-i})\big) - u^w_i\big(v_i; (v_i, v_{-i})\big)\Big],$$

where the expectation is over $v \sim F$ and $u^w_i(v_i, b) = v_i(g^w_i(b)) - p^w_i(b)$ for given model parameters $w$. We assume that $F$ has full support on the space of valuation profiles $V$, and, recognizing that the regret is non-negative, an auction satisfies DSIC if and only if $\mathit{rgt}_i(w) = 0$ for all $i \in N$. Given this, we re-formulate the learning problem as minimizing the expected loss, i.e. the expected negated revenue, subject to the expected ex post regret being 0 for each bidder:

$$\min_{w \in \mathbb{R}^d} \; \mathbb{E}_{v \sim F}\Big[-\sum_{i \in N} p^w_i(v)\Big] \quad \text{s.t.} \quad \mathit{rgt}_i(w) = 0, \;\; \forall i \in N.$$

²Note that there is no need to compute equilibrium inputs: we sample true profiles, and seek to learn rules that are IC.
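As a toy illustration (not an example from the paper), consider single-item first-price and second-price auctions: the first-price auction has strictly positive expected ex post regret (bid shading pays off), while the DSIC second-price auction has zero regret. A minimal sketch, estimating the regret by grid search over misreports in place of a gradient-based inner optimizer:

```python
import random

def second_price(bids):
    """Single item: highest bid wins, winner pays the second-highest bid."""
    w = max(range(len(bids)), key=lambda i: bids[i])
    return w, sorted(bids)[-2]

def first_price(bids):
    """Single item: highest bid wins, winner pays her own bid."""
    w = max(range(len(bids)), key=lambda i: bids[i])
    return w, bids[w]

def util(mech, i, v_i, bids):
    w, pay = mech(bids)
    return v_i - pay if w == i else 0.0

def empirical_regret(mech, i, n=2, samples=2000, grid=50):
    """Estimate rgt_i: expected max utility gain over misreports."""
    rng = random.Random(0)
    total = 0.0
    for _ in range(samples):
        v = [rng.random() for _ in range(n)]
        truthful = util(mech, i, v[i], v)
        best = truthful
        for k in range(grid + 1):
            dev = list(v)
            dev[i] = k / grid          # candidate misreport
            best = max(best, util(mech, i, v[i], dev))
        total += best - truthful
    return total / samples

rgt_fp = empirical_regret(first_price, 0)   # positive: shading below v_i helps
rgt_sp = empirical_regret(second_price, 0)  # zero: truthful bidding is dominant
print(rgt_fp, rgt_sp)
```

For two U[0, 1] bidders the first-price regret estimate is roughly $\mathbb{E}[(v_0 - v_1)^+] \approx 1/6$, while the second-price regret is exactly zero on every profile.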
Given a sample $S$ of $L$ valuation profiles from $F$, we estimate the empirical ex post regret for bidder $i$ as

$$\widehat{\mathit{rgt}}_i(w) = \frac{1}{L}\sum_{\ell=1}^{L} \max_{v'_i \in V_i}\Big[ u^w_i\big(v^{(\ell)}_i; (v'_i, v^{(\ell)}_{-i})\big) - u^w_i\big(v^{(\ell)}_i; v^{(\ell)}\big)\Big], \qquad (1)$$

and seek to minimize the empirical loss subject to the empirical regret being zero for all bidders:

$$\min_{w \in \mathbb{R}^d} \; -\frac{1}{L}\sum_{\ell=1}^{L}\sum_{i=1}^{n} p^w_i(v^{(\ell)}) \quad \text{s.t.} \quad \widehat{\mathit{rgt}}_i(w) = 0, \;\; \forall i \in N. \qquad (2)$$

**Individual rationality.** We will additionally require the designed auction to satisfy IR, which can be ensured by restricting our search space to a class of parametrized auctions $(g^w, p^w)$ that charge no bidder more than her expected value for an allocation. In Section 3, we model the allocation and payment rules as neural networks and incorporate the IR requirement in the architecture.

**Generalization bound.** We bound the gap between the expected regret and the empirical regret in terms of the number of sampled valuation profiles. We show a similar result for revenue. Our bounds hold for any auction chosen from a finite-capacity class, and imply that solving (2) with a large sample yields an auction with near-optimal expected revenue and close-to-zero expected regret (we note that in practice, we may not be able to solve (2) exactly). We measure the capacity of an auction class using a definition of covering numbers used in the ranking literature (Rudin & Schapire, 2009). We define the $\ell_{\infty,1}$ distance between auctions $(g, p), (g', p') \in \mathcal{M}$ as $\max_{v \in V} \sum_{i,j} |g_{ij}(v) - g'_{ij}(v)| + \sum_i |p_i(v) - p'_i(v)|$. For any $\epsilon > 0$, let $\mathcal{N}_\infty(\mathcal{M}, \epsilon)$ be the minimum number of balls of radius $\epsilon$ required to cover $\mathcal{M}$ under the $\ell_{\infty,1}$ distance.

**Theorem 1.** For each bidder $i$, assume w.l.o.g. that the valuation function satisfies $v_i(S) \leq 1$ for all $S \subseteq M$. Let $\mathcal{M}$ be a class of auctions that satisfy individual rationality. Fix $\delta \in (0, 1)$.
With probability at least $1 - \delta$ over the draw of a sample $S$ of $L$ profiles from $F$, for any $(g^w, p^w) \in \mathcal{M}$,

$$\mathbb{E}_{v \sim F}\Big[\sum_{i \in N} p^w_i(v)\Big] \;\geq\; \frac{1}{L}\sum_{\ell=1}^{L}\sum_{i \in N} p^w_i(v^{(\ell)}) \;-\; 2n\,\Delta_L \;-\; Cn\sqrt{\frac{\log(1/\delta)}{L}},$$

and

$$\frac{1}{n}\sum_{i=1}^{n} \mathit{rgt}_i(w) \;\leq\; \frac{1}{n}\sum_{i=1}^{n} \widehat{\mathit{rgt}}_i(w) \;+\; 2\,\Delta_L \;+\; C'\sqrt{\frac{\log(1/\delta)}{L}},$$

where $\Delta_L = \inf_{\epsilon > 0}\Big\{\frac{\epsilon}{n} + 2\sqrt{2\log(\mathcal{N}_\infty(\mathcal{M}, \epsilon/2))/L}\Big\}$ and $C, C'$ are distribution-independent constants.

See the appendix for the proof. If the term $\Delta_L$ goes to 0 as the sample size $L$ increases, the above bounds go to 0 as $L \to \infty$. In Theorem 2 in Section 3, we bound $\Delta_L$ for the neural network architectures we present in this paper.

## 3. Neural Network Architecture

We describe neural network architectures, which we refer to as RegretNet, for modeling multi-item auctions. We consider bidders with additive, unit-demand, and general combinatorial valuations. The architectures contain two logically distinct components: the allocation and payment networks.

**Additive valuations.** A bidder has additive valuations if the bidder's value for a bundle of items $S \subseteq M$ is the sum of her values for the individual items in $S$, i.e. $v_i(S) = \sum_{j \in S} v_i(j)$. In this case, the bidders report only their valuations for individual items. The architecture for this setting models a randomized allocation network $g^w : \mathbb{R}^{nm} \to [0, 1]^{nm}$ and a payment network $p^w : \mathbb{R}^{nm} \to \mathbb{R}^n_{\geq 0}$, both of which are modeled as feed-forward, fully-connected networks with tanh activations. The input layer of the networks consists of bids $b_{ij}$ representing the valuation of bidder $i$ for item $j$. The allocation network outputs a vector of allocation probabilities $z_{1j} = g_{1j}(b), \ldots, z_{nj} = g_{nj}(b)$ for each item $j \in [m]$. To ensure feasibility, i.e. that the probability of an item being allocated is at most 1, the allocations are computed using a softmax activation function, so that for all items $j$, $\sum_{i=1}^n z_{ij} \leq 1$. To accommodate the possibility of an item not being assigned to any bidder, we include a dummy node in the softmax computation, which holds the residual allocation probability.
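A minimal numpy sketch of this feasibility construction (ours, not the paper's TensorFlow implementation): appending a dummy-bidder row to each item's softmax guarantees $\sum_{i} z_{ij} \leq 1$ for every item $j$:

```python
import numpy as np

def additive_alloc(scores):
    """scores: (n, m) raw outputs of the final allocation layer."""
    n, m = scores.shape
    s = np.vstack([scores, np.zeros((1, m))])     # dummy bidder holds residual prob.
    e = np.exp(s - s.max(axis=0, keepdims=True))  # numerically stable softmax
    z = e / e.sum(axis=0, keepdims=True)          # each column sums to exactly 1
    return z[:n]                                  # drop dummy: sum_i z_ij <= 1

rng = np.random.default_rng(0)
z = additive_alloc(rng.normal(size=(3, 5)))       # 3 bidders, 5 items
print(z.sum(axis=0))                              # every entry strictly below 1
```

The residual probability held by the dummy row is exactly the probability that the item goes unallocated.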
Bundling of items is possible because the output units allocating items to the same bidder can be correlated. The payment network outputs, for each bidder, the amount the bidder should pay in expectation for this particular bid profile. To ensure that the auction satisfies individual rationality, i.e. does not charge a bidder more than her expected value for the allocation, the network first computes a fractional payment $\tilde{p}_i \in [0, 1]$ for each bidder $i$ using a sigmoidal unit, and outputs the payment $p_i = \tilde{p}_i \sum_{j=1}^m z_{ij} b_{ij}$, where the $z_{ij}$'s are outputs from the allocation network. An overview of the architecture is shown in Figure 1, where the revenue and regret are computed as functions of the parameters of the allocation and payment networks.

**Unit-demand valuations.** A bidder has unit-demand valuations when the bidder's value for a bundle of items $S \subseteq M$ is the maximum value she assigns to any one item in the bundle, i.e. $v_i(S) = \max_{j \in S} v_i(j)$. The allocation network for unit-demand bidders is the feed-forward network shown in Figure 2. For revenue maximization in this setting, it can be shown that it is sufficient to consider allocation rules that assign at most one item to each bidder.³ In the case of randomized allocation rules, this requires that the total allocation for each bidder is at most 1, i.e. $\sum_j z_{ij} \leq 1$ for all $i \in [n]$. We also require that no item is over-allocated, i.e. $\sum_i z_{ij} \leq 1$ for all $j \in [m]$. Hence, we design allocation networks for which the matrix of output probabilities $[z_{ij}]_{i,j=1}^{n,m}$ is doubly stochastic.⁴ In particular, we have the allocation network compute two sets of scores, the $s_{ij}$'s and the $s'_{ij}$'s, with the first set of scores normalized along the rows and the second set of scores normalized along the columns. Both normalizations can be performed by passing these scores through softmax functions.
The allocation for bidder $i$ and item $j$ is then computed as the minimum of the corresponding normalized scores:

$$z_{ij} = \varphi^{DS}_{ij}(s, s') = \min\left\{\frac{e^{s_{ij}}}{\sum_{k=1}^{n+1} e^{s_{kj}}},\; \frac{e^{s'_{ij}}}{\sum_{k=1}^{m+1} e^{s'_{ik}}}\right\},$$

where indices $n+1$ and $m+1$ denote dummy inputs that correspond to an item not being allocated to any bidder, and a bidder not being allocated any item, respectively.

**Lemma 1.** $\varphi^{DS}(s, s')$ is doubly stochastic for all $s, s' \in \mathbb{R}^{nm}$. For any doubly stochastic allocation $z \in [0, 1]^{nm}$, there exist $s, s' \in \mathbb{R}^{nm}$ for which $z = \varphi^{DS}(s, s')$.

The payment network is the same as in Figure 1.

**Combinatorial valuations.** We also consider bidders with general, combinatorial valuations. In the present work, we develop this architecture only for a small number of items.⁵ In this case, each bidder $i$ reports a bid $b_{i,S}$ for every bundle of items $S \subseteq M$ (except the empty bundle, for which her valuation is taken as zero). The allocation network has an output $z_{i,S} \in [0, 1]$ for each bidder $i$ and bundle $S$, denoting the probability that the bidder is allocated the bundle. To prevent the items from being over-allocated, we require that the probability that an item appears in a bundle allocated to some bidder is at most 1. We also require that the total allocation to a bidder is at most 1:

$$\sum_{i \in N}\;\sum_{S \subseteq M:\, j \in S} z_{i,S} \leq 1, \;\; \forall j \in M; \qquad (3)$$

$$\sum_{S \subseteq M} z_{i,S} \leq 1, \;\; \forall i \in N. \qquad (4)$$

³This holds by a simple reduction argument: for any IC auction that allocates multiple items, one can construct an IC auction with the same revenue by retaining only the most-preferred item among those allocated to the bidder.

⁴A randomized allocation represented by a doubly stochastic matrix can be decomposed into a lottery over deterministic one-to-one assignments (Birkhoff, 1946; von Neumann, 1953).

⁵With more items, combinatorial valuations can be succinctly represented using appropriate bidding languages; see, e.g., (Boutilier & Hoos, 2001).

Figure 1: The allocation and payment networks for a setting with $n$ additive bidders and $m$ items. The inputs are bids from the bidders for each item; the allocation network $g$ (parameters $w_g$) outputs the $z_{ij}$'s and the payment network $p$ (parameters $w_p$) outputs the payments. The metrics rev and each $\mathit{rgt}_i$ are defined as functions of the parameters $w = (w_g, w_p)$ of the two networks. [Network diagram omitted.]

Figure 2: The allocation network for settings with (a) $n$ unit-demand bidders and $m$ items, and (b) 2 combinatorial bidders and 2 items. In (a), $\bar{s}_{ij} = e^{s_{ij}}/\sum_{k=1}^{n+1} e^{s_{kj}}$ and $\bar{s}'_{ij} = e^{s'_{ij}}/\sum_{k=1}^{m+1} e^{s'_{ik}}$, and $z_{ij} = \min\{\bar{s}_{ij}, \bar{s}'_{ij}\}$. The payment networks for these settings are the same as in Figure 1. [Network diagram omitted.]

We refer to an allocation that satisfies constraints (3)-(4) as combinatorial feasible. To enforce these constraints, we have the allocation network compute a set of scores for each item and a set of scores for each agent. Specifically, there is a group of bidder-wise scores $s_{i,S}$, $S \subseteq M$, for each bidder $i \in N$, and a group of item-wise scores $s^{(j)}_{i,S}$, $i \in N$, $S \subseteq M$, for each item $j \in M$. Each group of scores is normalized using a softmax function: $\bar{s}_{i,S} = \exp(s_{i,S})/\sum_{S'} \exp(s_{i,S'})$ and $\bar{s}^{(j)}_{i,S} = \exp(s^{(j)}_{i,S})/\sum_{i',S'} \exp(s^{(j)}_{i',S'})$. The allocation for bidder $i$ and bundle $S \subseteq M$ is defined as the minimum of the normalized bidder-wise score $\bar{s}_{i,S}$ for $i$ and the normalized item-wise scores $\bar{s}^{(j)}_{i,S}$ for each $j \in S$:

$$z_{i,S} = \varphi^{CF}_{i,S}(s, s^{(1)}, \ldots, s^{(m)}) = \min\big\{\bar{s}_{i,S},\; \bar{s}^{(j)}_{i,S} : j \in S\big\}.$$

**Lemma 2.** $\varphi^{CF}(s, s^{(1)}, \ldots, s^{(m)})$ is combinatorial feasible for all $s, s^{(1)}, \ldots, s^{(m)} \in \mathbb{R}^{n2^m}$. For any combinatorial feasible allocation $z \in [0, 1]^{n2^m}$, there exist $s, s^{(1)}, \ldots, s^{(m)} \in \mathbb{R}^{n2^m}$ for which $z = \varphi^{CF}(s, s^{(1)}, \ldots, s^{(m)})$.

Figure 2(b) shows the network architecture for a setting with 2 bidders and 2 items. For ease of exposition, we ignore the empty bundle in our discussion.
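The $\varphi^{CF}$ construction can be sketched in numpy for the 2-bidder, 2-item case, with hypothetical random scores in place of learned network outputs (a sketch, not the paper's implementation); taking the minimum of softmax-normalized score groups enforces constraints (3) and (4) by construction:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def phi_cf(bidder_scores, item_scores, bundles, n, m):
    """bidder_scores: (n, |bundles|); item_scores[j]: (n, |bundles|) for item j.
    A dummy score (0) is appended to every softmax group, so each group of
    normalized scores sums to strictly less than 1."""
    nb = len(bundles)
    # bidder-wise: softmax over bidder i's bundle scores plus a dummy
    z = np.array([softmax(np.append(bidder_scores[i], 0.0))[:nb] for i in range(n)])
    # item-wise: for each item j, softmax over all (bidder, bundle containing j) scores
    for j in range(m):
        idx = [k for k, S in enumerate(bundles) if j in S]
        flat = softmax(np.append(item_scores[j][:, idx].ravel(), 0.0))
        sj = flat[:n * len(idx)].reshape(n, len(idx))
        z[:, idx] = np.minimum(z[:, idx], sj)   # min over all relevant score groups
    return z

n, m = 2, 2
bundles = [{0}, {1}, {0, 1}]                     # non-empty bundles of 2 items
rng = np.random.default_rng(0)
z = phi_cf(rng.normal(size=(n, 3)),
           [rng.normal(size=(n, 3)) for _ in range(m)],
           bundles, n, m)
print(z.sum(axis=1))                             # (4): each bidder's total <= 1
```

Each $z_{i,S}$ is bounded by every softmax group it participates in, so summing over a group can never exceed that group's total mass, which is at most 1.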
For each bidder $i \in \{1, 2\}$, the network computes three scores $s_{i,\{1\}}$, $s_{i,\{2\}}$, and $s_{i,\{1,2\}}$, one for each bundle that she can be assigned, and normalizes them using a softmax function. The network also computes four scores for item 1: $s^{(1)}_{1,\{1\}}$, $s^{(1)}_{2,\{1\}}$, $s^{(1)}_{1,\{1,2\}}$, and $s^{(1)}_{2,\{1,2\}}$, one for each assignment in which item 1 is present, and similarly four scores for item 2: $s^{(2)}_{1,\{2\}}$, $s^{(2)}_{2,\{2\}}$, $s^{(2)}_{1,\{1,2\}}$, and $s^{(2)}_{2,\{1,2\}}$. Each set of scores is then normalized by a separate softmax function. The final allocation for each bidder $i$ is: $z_{i,\{1\}} = \min\{\bar{s}_{i,\{1\}}, \bar{s}^{(1)}_{i,\{1\}}\}$, $z_{i,\{2\}} = \min\{\bar{s}_{i,\{2\}}, \bar{s}^{(2)}_{i,\{2\}}\}$, and $z_{i,\{1,2\}} = \min\{\bar{s}_{i,\{1,2\}}, \bar{s}^{(1)}_{i,\{1,2\}}, \bar{s}^{(2)}_{i,\{1,2\}}\}$. The payment network for combinatorial bidders has the same structure as the one in Figure 1, computing a fractional payment $\tilde{p}_i \in [0, 1]$ for each bidder $i$ using a sigmoidal unit, and outputting the payment $p_i = \tilde{p}_i \sum_{S \subseteq M} z_{i,S}\, b_{i,S}$, where the $z_{i,S}$'s are the outputs from the allocation network.

**Covering number bounds.** We now bound the term $\Delta_L$ in the generalization bound in Theorem 1 for the neural networks presented above.

**Theorem 2.** For RegretNet with $R$ hidden layers, $K$ nodes per hidden layer, $d_a$ parameters in the allocation network, $d_p$ parameters in the payment network, and the vector of all model parameters satisfying $\|w\|_1 \leq W$, the term $\Delta_L$ is bounded as follows for the different bidder valuation types: (a) additive valuations: $\Delta_L \leq O\big(\sqrt{R(d_a + d_p)\log(LW\max\{K, mn\})/L}\big)$; (b) unit-demand valuations: $\Delta_L \leq O\big(\sqrt{R(d_a + d_p)\log(LW\max\{K, mn\})/L}\big)$; (c) combinatorial valuations: $\Delta_L \leq O\big(\sqrt{R(d_a + d_p)\log(LW\max\{K, n2^m\})/L}\big)$.

The proof is given in the appendix. As the sample size $L \to \infty$, the term $\Delta_L \to 0$. The dependence of the above result on the number of layers, nodes, and parameters in the network is similar to standard covering number bounds for neural networks (Anthony & Bartlett, 2009). Note that the logarithm in the bound for combinatorial valuations cancels the exponential dependence on the number of items $m$.
## 4. Optimization and Training

We use the augmented Lagrangian method to solve the constrained training problem in (2) over the space of neural network parameters $w$. We first define the Lagrangian function for the optimization problem, augmented with a quadratic penalty term for violating the constraints:

$$C_\rho(w; \lambda) = -\frac{1}{L}\sum_{\ell=1}^{L}\sum_{i \in N} p^w_i(v^{(\ell)}) + \sum_{i \in N} \lambda_i\, \widehat{\mathit{rgt}}_i(w) + \frac{\rho}{2} \sum_{i \in N} \big(\widehat{\mathit{rgt}}_i(w)\big)^2,$$

where $\lambda \in \mathbb{R}^n$ is a vector of Lagrange multipliers, and $\rho > 0$ is a fixed parameter that controls the weight on the quadratic penalty. The solver alternates between the following updates in each iteration on the model parameters and the Lagrange multipliers: (a) $w^{new} \in \operatorname{argmin}_w C_\rho(w; \lambda^{old})$, and (b) $\lambda^{new}_i = \lambda^{old}_i + \rho\, \widehat{\mathit{rgt}}_i(w^{new})$ for all $i \in N$.

The solver is described in Algorithm 1. We divide the training sample $S$ into mini-batches of size $B$, and perform several passes over the training samples (with random shuffling of the data after each pass). We denote the mini-batch received at iteration $t$ by $S_t = \{v^{(1)}, \ldots, v^{(B)}\}$. The update (a) on model parameters involves an unconstrained optimization of $C_\rho$ over $w$ and is performed using a gradient-based optimizer. Let $\widetilde{\mathit{rgt}}_i(w)$ denote the empirical regret in (1) computed on mini-batch $S_t$.

**Algorithm 1** RegretNet Training

- **Input:** mini-batches $S_1, \ldots, S_T$ of size $B$
- **Parameters:** $\forall t$: $\rho_t > 0$; $\gamma > 0$, $\eta > 0$, $Q \in \mathbb{N}$, $R \in \mathbb{N}$
- **Initialize:** $w^0 \in \mathbb{R}^d$, $\lambda^0 \in \mathbb{R}^n$
- **for** $t = 0$ **to** $T$:
  - Receive mini-batch $S_t = \{v^{(1)}, \ldots, v^{(B)}\}$
  - Initialize misreports $v'^{(\ell)}_i \in V_i$, $\forall \ell \in [B], i \in N$
  - **for** $r = 0$ **to** $R$: $\forall \ell \in [B], i \in N$: $v'^{(\ell)}_i \leftarrow v'^{(\ell)}_i + \gamma\, \nabla_{v'_i} u^w_i\big(v^{(\ell)}_i; (v'^{(\ell)}_i, v^{(\ell)}_{-i})\big)$
  - Compute regret gradient, $\forall \ell \in [B], i \in N$: $g_{\ell,i} = \nabla_w\big[u^w_i\big(v^{(\ell)}_i; (v'^{(\ell)}_i, v^{(\ell)}_{-i})\big) - u^w_i\big(v^{(\ell)}_i; v^{(\ell)}\big)\big]\big|_{w = w^t}$
  - Compute the Lagrangian gradient using (5) and update: $w^{t+1} \leftarrow w^t - \eta\, \nabla_w C_{\rho_t}(w^t; \lambda^t)$
  - Update the Lagrange multipliers once every $Q$ iterations: **if** $t$ is a multiple of $Q$, $\lambda^{t+1}_i \leftarrow \lambda^t_i + \rho_t\, \widetilde{\mathit{rgt}}_i(w^{t+1})$, $\forall i \in N$; **else** $\lambda^{t+1} \leftarrow \lambda^t$

The gradient of $C_\rho$ with respect to
$w$ for fixed $\lambda^t$ is given by:

$$\nabla_w C_\rho(w; \lambda^t) = -\frac{1}{B}\sum_{\ell=1}^{B}\sum_{i \in N} \nabla_w\, p^w_i(v^{(\ell)}) + \sum_{i \in N}\sum_{\ell=1}^{B} \lambda^t_i\, g_{\ell,i} + \rho \sum_{i \in N}\sum_{\ell=1}^{B} \widetilde{\mathit{rgt}}_i(w)\, g_{\ell,i}, \qquad (5)$$

where

$$g_{\ell,i} = \nabla_w\Big[\max_{v'_i \in V_i} u^w_i\big(v^{(\ell)}_i; (v'_i, v^{(\ell)}_{-i})\big) - u^w_i\big(v^{(\ell)}_i; v^{(\ell)}\big)\Big].$$

Note that the terms $\widetilde{\mathit{rgt}}_i$ and $g_{\ell,i}$ in turn involve a max over misreports for each bidder $i$ and valuation profile $\ell$. We solve the inner maximization over misreports using another gradient-based optimizer, and push the gradient through the utility differences at the optimal misreports. In particular, we maintain misreports $v'^{(\ell)}_i$ for each $i$ and valuation profile $\ell$. For every update on the model parameters $w^t$, we perform $R$ gradient updates to compute the optimal misreports: $v'^{(\ell)}_i \leftarrow v'^{(\ell)}_i + \gamma\, \nabla_{v'_i} u^w_i\big(v^{(\ell)}_i; (v'^{(\ell)}_i, v^{(\ell)}_{-i})\big)$, for some $\gamma > 0$. In our experiments, we use the Adam optimizer (Kingma & Ba, 2014) for the updates on the model $w$ and the misreports $v'^{(\ell)}_i$. Since the optimization problem we seek to solve is non-convex, the solver is not guaranteed to reach a globally optimal solution. However, our method proves very effective in our experiments. The learned auctions incur very low regret and closely match the structure of the optimal auctions in settings where this is known.

Figure 3: (a)-(b): Test revenue and regret for (a) single-bidder, 2-item and (b) 2-bidder, 2-item settings. (c): Plot of test revenue and regret as a function of training epochs for setting (I). [Plot (c) omitted; panels (a) and (b) are reproduced as tables below.]

(a)

| Dist. | OPT | rev | rgt |
|---|---|---|---|
| (I) | 0.550 | 0.554 | < 0.001 |
| (II) | 2.137 | 2.137 | < 0.001 |

(b)

| Dist. | rev | rgt | VVCA | AMAbsym |
|---|---|---|---|---|
| (III) | 0.878 | < 0.001 | 0.860 | 0.862 |
| (IV) | 2.871 | < 0.001 | 2.741 | 2.765 |
| (V) | 4.270 | < 0.001 | 4.209 | 3.748 |

## 5. Experimental Results

We demonstrate that our approach can recover near-optimal auctions for essentially all settings for which the optimal solution is known, and that it can find new auctions for settings where there is no known analytical solution. We present the complete set of experiments in the appendix and include a representative subset of the results here.

**Setup.**
We implemented our framework using the TensorFlow deep learning library.⁶ We used Glorot uniform initialization (Glorot & Bengio, 2010) for all networks and the tanh activation function at the hidden nodes. For all the experiments, we used a sample of 640,000 valuation profiles for training and a sample of 10,000 profiles for testing. The augmented Lagrangian solver was run for a maximum of 80 epochs with a mini-batch size of 128. The value of $\rho$ in the augmented Lagrangian was set to 1.0 and incremented every 2 epochs. An update on $w^t$ was performed for every mini-batch using the Adam optimizer with learning rate 0.001. For each update on $w^t$, we ran $R = 25$ misreport update steps with learning rate 0.1. At the end of the 25 updates, the optimized misreports for the current mini-batch were cached and used to initialize the misreports for the same mini-batch in the next epoch. An update on $\lambda^t$ was performed once every 100 mini-batches (i.e. $Q = 100$). Our experiments were run on a compute cluster with NVIDIA GPUs.

**Evaluation.** In addition to the revenue of the learned auction on a test set, we also evaluate the regret, averaged across all bidders and test valuation profiles, $\mathit{rgt} = \frac{1}{n}\sum_{i=1}^n \widehat{\mathit{rgt}}_i(g, p)$. Each $\widehat{\mathit{rgt}}_i$ involves a max of the utility function over misreports $v'_i \in V_i$ (see (1)). We evaluate these terms by running gradient ascent on $v'_i$ with a step size of 0.1 for 2000 iterations (we test 1000 different random initial $v'_i$ and report the one that achieves the largest regret).

⁶https://github.com/saisrivatsan/deep-opt-auctions

**Single bidder.** Even in the simple setting of single-bidder auctions, there are analytical solutions only for special cases. We give the first computational approach that can handle the general design problem, and compare to the available analytical results.
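The regret-evaluation procedure above, gradient ascent over misreports with random restarts, can be sketched as follows; `soft_first_price` and the finite-difference gradient are our own illustrative stand-ins (a smooth, manipulable single-item mechanism), not the paper's code:

```python
import numpy as np

def eval_regret(util, v, i, steps=200, lr=0.1, restarts=20, seed=0):
    """Estimate bidder i's ex post regret at profile v by gradient ascent over
    misreports in [0, 1], with random restarts (finite-difference gradient)."""
    rng = np.random.default_rng(seed)
    truthful = util(i, v[i], v)
    best = truthful
    for _ in range(restarts):
        b, eps = rng.random(), 1e-4
        for _ in range(steps):
            hi, lo = list(v), list(v)
            hi[i], lo[i] = b + eps, b - eps
            g = (util(i, v[i], hi) - util(i, v[i], lo)) / (2 * eps)
            b = min(1.0, max(0.0, b + lr * g))      # ascent step, clipped to [0, 1]
        dev = list(v)
        dev[i] = b
        best = max(best, util(i, v[i], dev))        # keep the strongest misreport
    return best - truthful

# hypothetical smooth mechanism: soft winner determination, pay-your-bid
def soft_first_price(i, v_i, bids):
    z = np.exp(10 * np.array(bids))
    z /= z.sum()
    return z[i] * (v_i - bids[i])

r = eval_regret(soft_first_price, [0.8, 0.3], 0)
print(round(r, 3))   # positive: this mechanism is manipulable
```

Random restarts matter because the utility surface of a learned auction need not be concave in the misreport; the reported regret is the maximum over the restarts.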
We show that not only are we able to learn auctions with near-optimal revenue, but we are also able to learn allocation rules that resemble the theoretically optimal rule with surprising accuracy. We consider:

(I) A single bidder with additive valuations over 2 items, where the item values are drawn from U[0, 1]. The optimal auction is given by Manelli & Vincent (2006).

(II) A single bidder with unit-demand valuations over 2 items, where the item values are drawn from U[2, 3]. The optimal mechanism is given by Pavlov (2011).

Figure 3(a) presents the revenue and regret of the final auctions learned for settings (I) and (II) on the test set, with an architecture with two hidden layers and 100 nodes per layer.⁷ The revenue of the learned auctions is very close to the optimal revenue, with negligibly small regret. In some cases the learned auctions achieve revenue slightly above that of the optimal incentive compatible auction. This is possible because of the small, non-zero regret that they incur. The visualizations of the learned allocation rules in Figure 4(a)-(b) show that our approach also closely recovers the structure of the optimal auction. Figure 3(c) presents a plot of revenue and regret as a function of the training epochs. The solver adaptively tunes the Lagrange multiplier on the regret, focusing on revenue in the initial iterations and on regret in later iterations.

**Multiple bidders.** We next compare to the state-of-the-art computational results of Sandholm & Likhodedov (2015) for settings for which the optimal auction is not known. These auctions are obtained by searching over a parametrized class of incentive compatible auctions. Unlike these prior methods, we do not need to search over a specific class of incentive compatible auctions, and are limited only by the expressive power of the networks used. We show that this leads to novel auction designs that match or outperform the state-of-the-art mechanisms.
(III) 2 additive bidders and 2 items, where bidders draw their value for each item from U[0, 1].

(IV) 2 bidders and 2 items, with $v_{1,1}, v_{1,2}, v_{2,1}, v_{2,2} \sim U[1, 2]$, $v_{1,\{1,2\}} = v_{1,1} + v_{1,2} + C_1$ and $v_{2,\{1,2\}} = v_{2,1} + v_{2,2} + C_2$, where $C_1, C_2 \sim U[-1, 1]$.

(V) 2 bidders and 2 items, with $v_{1,1}, v_{1,2} \sim U[1, 2]$, $v_{2,1}, v_{2,2} \sim U[1, 5]$, $v_{1,\{1,2\}} = v_{1,1} + v_{1,2} + C_1$ and $v_{2,\{1,2\}} = v_{2,1} + v_{2,2} + C_2$, where $C_1, C_2 \sim U[-1, 1]$.

We adopt the same experimental setup as in settings (I)-(II). We compare the trained mechanism with the optimal auctions from the VVCA and AMAbsym families of incentive compatible auctions from (Sandholm & Likhodedov, 2015).

7 Based on evaluations on a held-out set, we found the gains to be negligible when we used more layers or nodes.

Figure 4: Allocation rules learned for the single-bidder, two-item settings: (a) I and (b) II. The solid regions describe the probability that the bidder is allocated item 1 (left) and item 2 (right) for different valuation inputs. The optimal auctions are described by the regions separated by the dashed black lines, with the numbers in black giving the optimal probability of allocation in each region.

Figure 5: (a) Revenue and regret on the validation set for auctions learned for setting (VI) using different architectures. (b) Test revenue and regret for settings (VI)-(VII):

Distribution    rev     rgt       Item-wise Myerson   Bundled Myerson
VI: 3 x 10      5.541   < 0.002   5.310               5.009
VII: 5 x 10     6.778   < 0.005   6.716               5.453

Figure 3(b) summarizes our results. Our approach leads to significant revenue improvements and tiny regret. Comparing with Figure 3(a), where the regret of (I) afforded a revenue advantage over OPT of around 0.004, or 0.72%, it seems highly unlikely that the tiny non-zero regret explains the revenue advantages over these prior results.

Scaling up. We also consider settings with up to 5 bidders and 10 items.
Due to the exponential nature of the problem, this is several orders of magnitude more complex than what the existing analytical literature can handle. For the settings that we study, running a separate Myerson auction for each item is optimal in the limit of the number of bidders (Palfrey, 1983). This yields a very strong, but still improvable, benchmark.

(VI) 3 additive bidders and 10 items, where bidders draw their value for each item from U[0, 1].

(VII) 5 additive bidders and 10 items, where bidders draw their value for each item from U[0, 1].

Table 1: Test revenue, regret, IR violation, and run-time for RegretNet and the LP for a 2-bidder, 3-item setting with uniform valuations:

Dist.   Method                 rev     rgt       IR viol.   Run-time
2 x 3   RegretNet              1.291   < 0.001   0          9 hrs
        LP (D: 5 bins/value)   1.53    0.019     0.027      69 hrs

For setting (VI), we show in Figure 5(a) the revenue and regret of the learned auction on a validation sample of 10,000 profiles, obtained with different architectures. Here (R, K) denotes an architecture with R hidden layers and K nodes per layer. The (5, 100) architecture has the lowest regret among all the 100-node networks for both settings above. Figure 5(b) shows that the final learned auctions yield higher revenue (with tiny regret) compared to the baselines.

Comparison to LP. We also compare the running time of our algorithm with the LP approach proposed in (Conitzer & Sandholm, 2002; 2004). To be able to run the LP to completion, we consider a smaller setting with 2 additive bidders and 3 items, with item values drawn from U[0, 1]. The LP is solved with the commercial solver Gurobi. We handle continuous valuations by discretizing the value into 5 bins per item (resulting in $10^5$ decision variables and $4 \times 10^6$ constraints) and then rounding a continuous input valuation profile to the nearest discrete profile (for evaluation). See the appendix for further discussion of the LP. The results are shown in Table 1.
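As a sanity check on the item-wise Myerson benchmark, the per-item revenue for additive bidders with i.i.d. U[0, 1] values can be estimated by simulating a second-price auction with the Myerson reserve of 0.5 on each item. The following Monte Carlo sketch is our own illustration, not the authors' code:

```python
import numpy as np

# Monte Carlo estimate of the item-wise Myerson benchmark (illustrative
# assumption: for i.i.d. U[0, 1] item values, the optimal single-item
# auction is a second-price auction with reserve price 0.5, run on each
# item independently for additive bidders).

def myerson_item_revenue(n_bidders, n_samples=200_000, reserve=0.5, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.uniform(0.0, 1.0, size=(n_samples, n_bidders))
    top = np.sort(v, axis=1)[:, -2:]        # [second-highest, highest] per profile
    sold = top[:, 1] >= reserve             # item sells iff highest bid meets reserve
    price = np.maximum(top[:, 0], reserve)  # winner pays max(reserve, second-highest)
    return np.mean(np.where(sold, price, 0.0))

# Setting (VI): 3 additive bidders, 10 items -> 10x the per-item revenue.
rev_per_item = myerson_item_revenue(n_bidders=3)
print(round(10 * rev_per_item, 2))  # close to the 5.310 reported in Figure 5(b)
```

For 3 bidders the exact per-item value is 0.53125, so the 10-item benchmark is 5.3125, matching the 5.310 reported for setting (VI) up to rounding.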
We also report the violations in the IR constraints incurred by the LP on the test set; for L valuation profiles, this is measured by $\frac{1}{Ln} \sum_{\ell=1}^{L} \sum_{i \in N} \max\{-u_i(v^{(\ell)}), 0\}$. Due to the coarse discretization, the LP approach suffers significant IR violations (and as a result yields higher revenue). We were not able to run an LP for this setting with finer discretizations in more than 1 week of compute time. In contrast, our approach yields much lower regret and no IR violations (as the neural networks satisfy IR by design), in just around 9 hours. In fact, even for the larger settings (VI) and (VII), the running time of our algorithm was less than 13 hours.

6. Conclusion

Neural networks have been deployed successfully for exploration in other contexts, e.g., for the discovery of new drugs (Gómez-Bombarelli et al., 2018). We believe that there is ample opportunity for applying deep learning in the context of economic design. We have demonstrated how standard pipelines can re-discover and surpass the analytical and computational progress in optimal auction design that has been made over the past 30-40 years. While our approach can easily solve problems that are orders of magnitude more complex than what could previously be solved with the standard LP-based approach, a natural next step would be to scale this approach further, up to industry scale. We envision that progress at scale will come through addressing the benchmarking question (e.g., through standardized benchmarking suites) and through innovations in the network architecture.

Acknowledgments

This research is supported in part by NSF EAGER #124110.

References

Anthony, M. and Bartlett, P. L. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1st edition, 2009.

Babaioff, M., Immorlica, N., Lucier, B., and Weinberg, S. M. A simple and approximately optimal mechanism for an additive buyer. In Proceedings of the 55th IEEE Symposium on Foundations of Computer Science, pp.
21-30, 2014.

Balcan, M.-F., Sandholm, T., and Vitercik, E. Sample complexity of automated mechanism design. In Proceedings of the 29th Conference on Neural Information Processing Systems, pp. 2083-2091, 2016.

Birkhoff, G. Tres observaciones sobre el algebra lineal. Universidad Nacional de Tucumán, Revista A, 5:147-151, 1946.

Boutilier, C. and Hoos, H. H. Bidding languages for combinatorial auctions. In Proceedings of the 17th International Joint Conference on Artificial Intelligence, pp. 1211-1217, 2001.

Cai, Y. and Zhao, M. Simple mechanisms for subadditive buyers via duality. In Proceedings of the 49th ACM Symposium on Theory of Computing, pp. 170-183, 2017.

Cai, Y., Daskalakis, C., and Weinberg, S. M. Optimal multi-dimensional mechanism design: Reducing revenue to welfare maximization. In Proceedings of the 53rd IEEE Symposium on Foundations of Computer Science, pp. 130-139, 2012a.

Cai, Y., Daskalakis, C., and Weinberg, S. M. An algorithmic characterization of multi-dimensional mechanisms. In Proceedings of the 44th ACM Symposium on Theory of Computing, pp. 459-478, 2012b.

Cai, Y., Daskalakis, C., and Weinberg, S. M. Understanding incentives: Mechanism design becomes algorithm design. In Proceedings of the 54th IEEE Symposium on Foundations of Computer Science, pp. 618-627, 2013.

Chawla, S., Hartline, J. D., Malec, D. L., and Sivan, B. Multi-parameter mechanism design and sequential posted pricing. In Proceedings of the 42nd ACM Symposium on Theory of Computing, pp. 311-320, 2010.

Cole, R. and Roughgarden, T. The sample complexity of revenue maximization. In Proceedings of the 46th ACM Symposium on Theory of Computing, pp. 243-252, 2014.

Conitzer, V. and Sandholm, T. Complexity of mechanism design. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, pp. 103-110, 2002.

Conitzer, V. and Sandholm, T. Self-interested automated mechanism design and implications for optimal combinatorial auctions.
In Proceedings of the 5th ACM Conference on Electronic Commerce, pp. 132-141, 2004.

Daskalakis, C., Deckelbaum, A., and Tzamos, C. Strong duality for a multiple-good monopolist. Econometrica, 85:735-767, 2017.

Dughmi, S., Han, L., and Nisan, N. Sampling and representation complexity of revenue maximization. In Proceedings of the 10th Conference on Web and Internet Economics, pp. 277-291, 2014.

Dütting, P., Fischer, F., Jirapinyo, P., Lai, J., Lubin, B., and Parkes, D. C. Payment rules through discriminant-based classifiers. ACM Transactions on Economics and Computation, 3(1):5, 2014.

Feng, Z., Narasimhan, H., and Parkes, D. C. Deep learning for revenue-optimal auctions with budgets. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems, pp. 354-362, 2018.

Giannakopoulos, Y. and Koutsoupias, E. Selling two goods optimally. In Proceedings of the 42nd International Colloquium on Automata, Languages, and Programming, pp. 650-662, 2015.

Glorot, X. and Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, 2010.

Golowich, N., Narasimhan, H., and Parkes, D. C. Deep learning for multi-facility location mechanism design. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 261-267, 2018.

Guo, M. and Conitzer, V. Computationally feasible automated mechanism design: General approach and case studies. In Proceedings of the 24th AAAI Conference on Artificial Intelligence, 2010.

Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., and Aspuru-Guzik, A. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2):268-276, 2018.

Haghpanah, N. and Hartline, J.
Multi-dimensional virtual values and second-degree price discrimination. CoRR, abs/1404.1341, 2015.

Hart, S. and Nisan, N. Approximate revenue maximization with multiple items. Journal of Economic Theory, 172:313-347, 2017.

Hartford, J. S., Wright, J. R., and Leyton-Brown, K. Deep learning for predicting human strategic behavior. In Proceedings of the 29th Conference on Neural Information Processing Systems, pp. 2424-2432, 2016.

Hartford, J. S., Lewis, G., Leyton-Brown, K., and Taddy, M. Deep IV: A flexible approach for counterfactual prediction. In Proceedings of the 34th International Conference on Machine Learning, pp. 1414-1423, 2017.

Huang, Z., Mansour, Y., and Roughgarden, T. Making the most of your samples. In Proceedings of the 16th ACM Conference on Economics and Computation, pp. 45-60, 2015.

Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.

Lahaie, S. A kernel-based iterative combinatorial auction. In Proceedings of the 25th AAAI Conference on Artificial Intelligence, pp. 695-700, 2011.

Louizos, C., Shalit, U., Mooij, J. M., Sontag, D., Zemel, R. S., and Welling, M. Causal effect inference with deep latent-variable models. In Proceedings of the 30th Conference on Neural Information Processing Systems, pp. 6449-6459, 2017.

Manelli, A. and Vincent, D. Bundling as an optimal selling mechanism for a multiple-good monopolist. Journal of Economic Theory, 127(1):1-35, 2006.

Mohri, M. and Medina, A. M. Learning algorithms for second-price auctions with reserve. Journal of Machine Learning Research, 17:74:1-74:25, 2016.

Morgenstern, J. and Roughgarden, T. On the pseudo-dimension of nearly optimal auctions. In Proceedings of the 28th Conference on Neural Information Processing Systems, pp. 136-144, 2015.

Morgenstern, J. and Roughgarden, T. Learning simple auctions. In Proceedings of the 29th Conference on Learning Theory, pp. 1298-1318, 2016.

Myerson, R. Optimal auction design.
Mathematics of Operations Research, 6:58-73, 1981.

Narasimhan, H. and Parkes, D. C. A general statistical framework for designing strategy-proof assignment mechanisms. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2016.

Narasimhan, H., Agarwal, S., and Parkes, D. C. Automated mechanism design without money via machine learning. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, pp. 433-439, 2016.

Palfrey, T. Bundling decisions by a multiproduct monopolist with incomplete information. Econometrica, 51(2):463-483, 1983.

Pavlov, G. Optimal mechanism for selling two goods. B.E. Journal of Theoretical Economics, 11:1-35, 2011.

Raghu, M., Irpan, A., Andreas, J., Kleinberg, R., Le, Q. V., and Kleinberg, J. M. Can deep reinforcement learning solve Erdos-Selfridge-Spencer games? In Proceedings of the 35th International Conference on Machine Learning, pp. 4235-4243, 2018.

Rudin, C. and Schapire, R. E. Margin-based ranking and an equivalence between AdaBoost and RankBoost. Journal of Machine Learning Research, 10:2193-2232, 2009.

Sandholm, T. and Likhodedov, A. Automated design of revenue-maximizing combinatorial auctions. Operations Research, 63(5):1000-1025, 2015.

Shen, W., Tang, P., and Zuo, S. Automated mechanism design via neural networks. In Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems, 2019. Forthcoming.

Syrgkanis, V. A sample complexity measure with applications to learning optimal auctions. In Proceedings of the 20th Conference on Neural Information Processing Systems, pp. 5358-5365, 2017.

Thompson, D., Newman, N., and Leyton-Brown, K. The positronic economist: A computational system for analyzing economic mechanisms. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, pp. 720-727, 2017.

von Neumann, J. A certain zero-sum two-person game equivalent to an optimal assignment problem. Annals of Mathematical Studies, 28:5-12, 1953.

Yao, A.
C.-C. An n-to-1 bidder reduction for multi-item auctions and its applications. In Proceedings of the 26th ACM-SIAM Symposium on Discrete Algorithms, pp. 92-109, 2015.

Yao, A. C.-C. Dominant-strategy versus Bayesian multi-item auctions: Maximum revenue determination and comparison. In Proceedings of the 18th ACM Conference on Economics and Computation, pp. 3-20, 2017.