# Optimal-er Auctions through Attention

Dmitry Ivanov (HSE University & Technion, Israel), Iskander Safiulin (independent researcher, Russia), Igor Filippov (independent researcher, Russia), Ksenia Balabaeva (ITMO University & BIOCAD, Russia)

dimonenka@mail.ru, divanov@campus.technion.ac.il

36th Conference on Neural Information Processing Systems (NeurIPS 2022).

Abstract

RegretNet is a recent breakthrough in the automated design of revenue-maximizing auctions. It combines the flexibility of deep learning with a regret-based relaxation of the Incentive Compatibility (IC) constraint (that participants prefer to bid truthfully) in order to approximate optimal auctions. We propose two independent improvements of RegretNet. The first is a neural architecture based on attention layers, denoted RegretFormer. The second is a loss function that requires explicit specification of an acceptable IC violation, denoted the regret budget. We investigate both modifications in an extensive experimental study that includes settings with constant and inconstant numbers of items and participants, as well as novel validation procedures tailored to regret-based approaches. We find that RegretFormer consistently outperforms RegretNet in revenue (i.e., is optimal-er) and that our loss function both simplifies hyperparameter tuning and allows one to unambiguously control the revenue-regret trade-off by selecting the regret budget. Code is available here.

1 Introduction

Several decades ago, Myerson [42] proposed a general solution to the problem of optimal (revenue-maximizing) design of single-item auctions. Despite the collaborative effort of the community, results of the same generality for multi-item auctions have not been obtained. As an alternative to analytical solutions, automated auction design [7, 8] takes the perspective of constrained optimization. In particular, the objective is to maximize the revenue while satisfying the Incentive Compatibility (IC) and Individual Rationality (IR) constraints. This framework can provide approximate solutions in settings where the optimal mechanisms are unknown, as well as hint at what the analytical solutions may look like. Whereas the early algorithms utilize linear programming and classic machine learning, modern algorithms focus on deep learning. Dütting et al. [17] pioneered this approach with their RegretNet architecture. RegretNet is a neural network that represents a mechanism by mapping a matrix of bids that n participants report for m items into a probabilistic allocation matrix and a payment vector. It is trained via gradient-based optimization on a mixture of two objectives: to maximize revenue and to minimize regret, a relaxation of the IC constraint proposed in [15]. RegretNet recovers near-optimal revenue in the analytically solved settings and outperforms the previous state of the art while providing vanishing regret guarantees.

In this paper, we propose two independent improvements of RegretNet. First, we propose a neural architecture based on attention layers. We name it RegretFormer after the widely known Transformer [60]. An illustration of the architecture is provided in Figure 1.

Figure 1: Our architecture RegretFormer. The description is provided in Section 3.1.

RegretFormer has several crucial properties. On the one hand, it is by design insensitive to the order of items and participants, which is desirable for learning symmetric auctions.
Note that it can also be modified to learn asymmetric mechanisms. On the other hand, it does not assume a predetermined constant input size, which allows a single network to learn from data consisting of auctions of varying sizes, as well as to generalize to settings unseen during training.

Second, we propose an alternative objective. Instead of optimizing arbitrary mixtures of revenue and regret, our approach maximizes revenue given an explicitly specified regret budget, i.e., an acceptable violation of the IC constraint. This is technically implemented through the method of Lagrange multipliers and dual gradient descent. Our approach allows us to unambiguously balance revenue and regret by varying the regret budget. As a bonus, we find that our loss function is significantly less sensitive to the choice of loss-related hyperparameters.

We conduct an extensive experimental study to verify the strengths of our modifications. Specifically, we compare our and existing architectures in both symmetric and asymmetric settings with constant input size, as well as in novel settings with inconstant input size. We find that RegretFormer consistently outperforms existing architectures in revenue given the same regret budget. We additionally verify this result with two novel validation procedures. The first procedure is based on estimating the regret of one network given the optimal misreports approximated by another network. The second procedure is based on network distillation. Regarding our loss modification, we confirm that specifying the regret budget results in the intended revenue-regret trade-offs.

2 Background

2.1 Problem statement

Let $N = \{1, \dots, n\}$ be a set of $n$ bidders and $M = \{1, \dots, m\}$ be a set of $m$ items. Each bidder $i$ evaluates the items with a valuation function $v_i: 2^M \to \mathbb{R}$. In our study, we only consider additive valuations, meaning that for each item $j \in M$ a bidder has a valuation $v_i(j)$, and the valuation of a subset $S \subseteq M$ equals the sum of the valuations of the items in the subset: $v_i(S) = \sum_{j \in S} v_i(j)$. The valuation function can also be extended to evaluate probabilistic assignments: $v_i(z_{ij}) = z_{ij} v_i(j)$, where $z_{ij}$ is the probability that participant $i$ is assigned item $j$. The valuation function $v_i$ is drawn independently from the distribution $F_i$ over all possible valuation functions $V_i$. The valuations of all bidders form a valuation profile $v = (v_1, \dots, v_n)$. The auctioneer knows neither the bidders' valuations nor their distribution $F = (F_1, \dots, F_n)$, but has a sample $L = (v^1, \dots, v^{|L|})$ of valuation profiles drawn from $F$. The bidders independently report their bids $b = (b_1, \dots, b_n) \in V$, based on which the auction allocates the items and charges payments. Formally, an auction is defined as a tuple $(g = (g_1, \dots, g_n), p = (p_1, \dots, p_n))$ of probabilistic allocation rules $g_i: V \to [0, 1]^M$ and payment rules $p_i: V \to \mathbb{R}_{\geq 0}$. Define the bidder's utility as $u_i(v_i, b) = v_i(g_i(b)) - p_i(b)$. Define the valuation profile without the valuation $v_i$ as $v_{-i}$. Similarly, define the bids without $b_i$ as $b_{-i}$, and the valuation profiles of all bidders except $i$ as $V_{-i} = \prod_{k \neq i} V_k$. An auction is dominant strategy incentive compatible (DSIC) if the utility of each bidder is maximized by reporting their true valuation regardless of the bids of the other bidders: $\forall i, v, b: u_i(v_i, v) \geq u_i(v_i, (b_i, v_{-i}))$. An auction is ex-post individually rational (IR) if each agent's utility is non-negative for all possible valuations and bids: $\forall i, v, b: u_i(v_i, (b_i, v_{-i})) \geq 0$. Define the revenue of a profile as the sum of the bidders' payments: $\sum_i p_i(v)$.
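To make these definitions concrete, the following minimal sketch (hypothetical names; assumes additive valuations and a probabilistic allocation) computes a bidder's utility $u_i(v_i, b) = v_i(g_i(b)) - p_i(b)$:

```python
import numpy as np

def bidder_utility(valuations, alloc_probs, payment):
    """Utility of a single bidder under additive valuations.

    valuations:  (m,) true per-item valuations v_i(j)
    alloc_probs: (m,) allocation probabilities z_ij
    payment:     scalar payment p_i
    """
    expected_value = float(np.dot(alloc_probs, valuations))  # sum_j z_ij * v_i(j)
    return expected_value - payment

# Two items valued 0.8 and 0.3, each allocated with probability 0.5, payment 0.4:
print(bidder_utility(np.array([0.8, 0.3]), np.array([0.5, 0.5]), 0.4))  # ~0.15
```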
The goal of optimal auction design is to maximize the expected revenue subject to the DSIC and IR constraints. The problem is analytically solved for single-item auctions in the seminal work of Myerson [42], provided the valuation distributions are known. There are no general solutions for auctions with $m > 1$. The task of optimal auction design can be viewed as an optimization or a machine learning problem, in which case the expected revenue takes the place of the objective. Consider a class of auctions $(g^w, p^w) \in \mathcal{M}$ parameterized by $w \in \mathbb{R}^d$ for some $d \in \mathbb{N}$, and a set of bidder valuation profiles $L$ drawn as independent and identically distributed variables from $F$. The goal of the optimization procedure is to find an auction that minimizes the negated expected revenue $-\mathbb{E}_{v \sim F}\left[\sum_{i \in N} p_i(v; w)\right]$ while satisfying the DSIC and IR constraints.

2.2 RegretNet

RegretNet [17] is a deep learning solution for optimal auction design. It can be used in multi-bidder and multi-item settings for which the analytical solutions are unknown. The architecture of RegretNet consists of two networks: the allocation network and the payment network. Both networks take a flattened constant-sized bid matrix $B_{n \times m}$ as input and process it through several fully-connected layers. The allocation network maps the matrix of bids to categorical distributions of allocations over participants for each item: $A^{net}(B_{n \times m}) = Z_{n \times m}$, where $z_{ij}$ is the probability of allocating the $j$-th item to the $i$-th bidder. To allow for the possibility of an item remaining unallocated, the output of the final layer has size $(n + 1) \times m$, imitating an additional dummy participant with zero valuations. Then, to obtain properly scaled allocation probabilities over participants for each item, a softmax activation is applied to the output of the final layer corresponding to each item. Specifically, $z_{ij} = e^{\bar{z}_{ij}} / \sum_{i=1}^{n+1} e^{\bar{z}_{ij}}$, where $\bar{z}_{ij}$ denotes the unnormalized allocation probability (i.e., logit). The payment network maps the matrix of bids to a vector of payment fractions: $P^{net}(B_{n \times m}) = \hat{P}_n$, where $\hat{p}_i$ is the fraction of the $i$-th bidder's expected utility that the bidder transfers to the auctioneer. The payment is then computed as $p_i = \hat{p}_i \sum_{j=1}^{m} z_{ij} b_{ij}$ for $i = 1, \dots, n$. To properly scale $\hat{p}_i \in [0, 1]$, a sigmoid activation is applied to the output of the final layer. Because the payment cannot exceed the bidder's expected utility, the mechanisms discovered by RegretNet always satisfy the IR constraint.

Given a sample of profiles $L$, RegretNet aims to optimize the following empirical objective:

$$\min_w\ -\frac{1}{|L|} \sum_{l \in L} \sum_{i \in N} p_i(v^l; w) \quad \text{s.t.} \quad rgt_i(v^l; w) = 0, \quad \forall i \in N,\ \forall l \in L \qquad (1)$$

where $rgt_i$ denotes the $i$-th bidder's ex-post regret. Regret is a quantifiable relaxation of the DSIC constraint first introduced in [15]. It is defined as the difference between the $i$-th bidder's utilities given the optimal misreport (the one that provides the highest utility) and the truthful bid:

$$rgt_i(v^l; w) = \max_{v'_i} u_i(v^l_i, (v'_i, v^l_{-i}); w) - u_i(v^l_i, v^l; w) \qquad (2)$$
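To make the allocation and payment parameterization above concrete, here is a minimal PyTorch sketch of the two output heads (the trunk, names, and sizes are our illustrative assumptions, not the authors' released code):

```python
import torch
import torch.nn as nn

class RegretNetHeads(nn.Module):
    """Sketch of RegretNet-style output heads: allocations via an item-wise
    softmax over n + 1 rows (the extra row is a dummy participant that keeps
    an item unallocated), payments as a sigmoid fraction of expected utility."""

    def __init__(self, n: int, m: int, hidden: int = 128):
        super().__init__()
        self.n, self.m = n, m
        self.trunk = nn.Sequential(nn.Linear(n * m, hidden), nn.Tanh())
        self.alloc_head = nn.Linear(hidden, (n + 1) * m)
        self.pay_head = nn.Linear(hidden, n)

    def forward(self, bids: torch.Tensor):                 # bids: (batch, n, m)
        h = self.trunk(bids.flatten(1))
        logits = self.alloc_head(h).view(-1, self.n + 1, self.m)
        z = torch.softmax(logits, dim=1)[:, : self.n]      # drop the dummy row
        frac = torch.sigmoid(self.pay_head(h))             # fraction in [0, 1]
        payments = frac * (z * bids).sum(dim=2)            # p_i = frac_i * sum_j z_ij b_ij
        return z, payments  # payments never exceed expected utility, so IR holds
```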
To solve the constrained optimization problem (1) over the space of the network parameters $w$, the authors of RegretNet employ the augmented Lagrangian method. It yields a loss function that combines revenue maximization with a penalty for violations of the DSIC constraints:

$$\mathcal{L}_{outer}(w) = \sum_{i \in N} \left[ -\tilde{P}_i + \lambda_i \tilde{R}_i + \frac{\rho}{2} \tilde{R}_i^2 \right] \qquad (3)$$

where $B$ denotes a mini-batch, $\tilde{P}_i = \frac{1}{|B|} \sum_{l \in B} p_i(v^l; w)$ is the average revenue from the $i$-th participant, $\tilde{R}_i = \frac{1}{|B|} \sum_{l \in B} \widetilde{rgt}_i(v'_i, v^l; w)$ is the average approximate regret of the $i$-th participant, and $\widetilde{rgt}_i(v'_i, v^l; w) = u_i(v^l_i, (v'_i, v^l_{-i}); w) - u_i(v^l_i, v^l; w)$ is an approximation of $rgt_i$. To compute the approximate regret, the optimal misreports $v'_i$ are estimated via gradient descent in an inner optimization loop that minimizes the $i$-th participant's negated utility as a function of their misreport:

$$\mathcal{L}_{inner}(v'_i) = -u_i(v^l_i, (v'_i, v^l_{-i}); w) \qquad (4)$$

The Lagrange multipliers $\lambda_i$ and $\rho$ are periodically updated throughout training according to the schedules $\lambda_i \leftarrow \lambda_i + \rho \tilde{R}_i$ and $\rho \leftarrow \rho + \rho_\Delta$. The starting values of $\lambda_i$ and $\rho$ and the learning rate $\rho_\Delta$ are hyperparameters that control the trade-off between revenue and regret of the learned mechanism.

2.3 EquivariantNet

EquivariantNet is an alternative architecture proposed by Rahme et al. [51] to effectively deal with symmetric auctions. It relies on a theorem originally proven by [10] stating that for a symmetric auction there exists an optimal solution that is permutation-equivariant, i.e., insensitive to the order of inputs. Like RegretNet, EquivariantNet has an allocation network and a payment network. Both networks consist of compositions of exchangeable layers that preserve permutation-equivariance [25]. Their definition is provided in the Appendix. The authors observe that EquivariantNet produces competitive results but does not outperform RegretNet in revenue.

2.4 Related work

Early work that explores automated solutions to auction design formulates the problem as a linear program [7, 54, 8] or searches within specific families of DSIC auctions [35, 55]. The former approach suffers from scalability issues due to the curse of dimensionality [24], while the latter searches within a limited family of auctions that might not contain the optimal mechanism. Classic machine learning has also been applied to auction design [34, 15, 44], but these approaches are considered less flexible and general than the deep learning alternatives [17]. In recent years, multiple extensions of RegretNet have been proposed, including extensions to optimal auction design with budget constraints [18], fairness constraints [33], and human preferences over desirable allocations [49], as well as to the facility location problem [19] and the matching problem [53]. Shen et al. [56] and Dütting et al. [17] propose alternatives to RegretNet that are exactly DSIC but are only applicable to auctions with one bidder. Curry et al. [9] combine the approach of searching within limited families of DSIC auctions with deep learning. Rahme et al. [52] propose a simplified loss function for RegretNet that is easier to tune, as well as a potentially faster regret estimation procedure based on training an adversarial bidder network. Deep learning has been applied to other aspects of mechanism design, such as iterative combinatorial auctions [62], minimizing the economic burden on bidders within the Groves family of auctions [58], optimal bidding strategies [46, 45], optimal redistribution mechanisms [38], and e-commerce advertising [36]. Several studies have combined reinforcement learning with auction design [59, 29, 47, 1].
A plethora of research has also focused on the sample complexity of designing revenue-maximizing auctions [6, 41, 13, 3, 40, 39, 43, 57, 20, 28, 26, 23, 21].

3 Our modifications of RegretNet

In this section, we describe our two modifications of RegretNet. The first is a novel neural architecture for optimal auction design, which we denote as RegretFormer. The second is an alternative constrained objective that yields a more convenient loss function.

3.1 RegretFormer: enhancing RegretNet with attention layers

The neural architecture of RegretNet has several issues. One issue is the sensitivity of RegretNet to the order of items and participants in the input bid matrix. Such order should not affect the outcome of a symmetric auction [10, 50]. Even in non-symmetric auctions, permutation-equivariant solutions sometimes achieve near-optimal revenue [27, 2, 30, 31, 32]. This makes it desirable to wire equivariance to the input order into the architecture. Another issue is that RegretNet assumes a constant number of participants and items. This severely limits its practical applicability due to the inability to generalize to unseen input sizes and to learn from data with heterogeneous input sizes. Finally, the fully-connected layers used in RegretNet have limited expressivity and may not have the right inductive biases for auction design.

As a remedy, we propose RegretFormer. The architecture of RegretFormer is illustrated in Figure 1. The network takes the matrix of bids $B_{n \times m}$ as input. First, we apply an exchangeable layer [25] to transform each bid into an initial vector of features that contains information about other bids. Then, we apply item-wise (i.e., row-wise) and participant-wise (i.e., column-wise) self-attention layers [60] to each feature vector corresponding to each bid. For a given bid, the outputs of the two self-attention layers are transformed into a single feature vector through a fully-connected layer. These self-attention transformations can be applied several times. Finally, by averaging over rows and columns, we transform the output of the self-attention layers into the allocation matrix $Z_{n \times m}$ and the payment vector $P_n$. Note that each layer shares its parameters across all bids to ensure that the network is insensitive to the order of bids and is applicable to different input sizes. The detailed definitions and order of all layers are provided in the Appendix; a simplified sketch of one attention block is given at the end of this subsection.

Our architecture has several advantages. On the one hand, it leverages the expressivity of attention layers, which help achieve state-of-the-art performance in many diverse and complex problems, e.g., [60, 12, 63]. We empirically demonstrate the superiority of RegretFormer in Section 4. On the other hand, it maintains equivariance and invariance to the order of items and participants. The former forces the resulting mechanisms to be symmetric, which drastically reduces the solution search space. The latter allows RegretFormer to learn from data with inconstant input sizes, as well as to generalize to unseen settings. Note that RegretFormer can still learn asymmetric mechanisms by utilizing positional encoding, which we demonstrate empirically.

We note that a concurrent work by Duan et al. [14] also combines Transformers and RegretNet. However, their focus is on integrating contextual information of bidders and items into the framework, whereas we perform a wider and more accurate comparison of neural architectures.
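As a rough illustration of the row/column attention idea, one block could look as follows (a simplified sketch under our own naming; the exact layer definitions are in the Appendix):

```python
import torch
import torch.nn as nn

class RowColSelfAttention(nn.Module):
    """Self-attention across items (within each participant's row) and across
    participants (within each item's column), merged into one vector per bid.
    All parameters are shared across bids, so the block is equivariant to
    permutations of rows/columns and agnostic to the input size."""

    def __init__(self, d: int, heads: int = 4):
        super().__init__()
        self.attn_items = nn.MultiheadAttention(d, heads, batch_first=True)
        self.attn_parts = nn.MultiheadAttention(d, heads, batch_first=True)
        self.merge = nn.Linear(2 * d, d)

    def forward(self, x: torch.Tensor):                     # x: (batch, n, m, d)
        b, n, m, d = x.shape
        rows = x.reshape(b * n, m, d)                       # attend over items
        rows, _ = self.attn_items(rows, rows, rows)
        cols = x.permute(0, 2, 1, 3).reshape(b * m, n, d)   # attend over participants
        cols, _ = self.attn_parts(cols, cols, cols)
        rows = rows.reshape(b, n, m, d)
        cols = cols.reshape(b, m, n, d).permute(0, 2, 1, 3)
        return self.merge(torch.cat([rows, cols], dim=-1))  # one vector per bid
```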
3.2 Specifying the regret budget

Both RegretNet and EquivariantNet optimize a mixture of two conflicting objectives (3), namely revenue maximization and regret minimization, and control their trade-off with hyperparameters such as the initial values and schedules of the Lagrange multipliers. There are two issues with this approach. The first issue is sensitivity: these hyperparameters have to be precisely tuned for each experiment and, as Rahme et al. [52] show, may massively degrade the performance if selected improperly. The second issue is uninterpretability: while increasing any of $\lambda_i$, $\rho$, $\rho_\Delta$ tightens the regret budget of the learned mechanism, the exact effect of such changes on the resulting revenue and regret is unpredictable. Furthermore, the recipe for hyperparameter selection in new settings is unclear. Rahme et al. [52] propose a mixture of objectives that mitigates the first issue, but the second issue remains unresolved.

We propose an alternative perspective that mitigates both issues. Instead of balancing the two objectives, we maximize the revenue given a maximal regret budget $R_{max}$, which is pre-specified by the designer. This corresponds to a relaxed version of the constrained objective (1):

$$\min_w\ -\frac{1}{|L|} \sum_{l \in L} \sum_{i \in N} p_i(v^l; w) \quad \text{s.t.} \quad \sum_{i \in N} rgt_i(v^l; w) \leq R_{max} \qquad (5)$$

Instead of the constrained objective (5), we optimize its dual by introducing the Lagrange multiplier $\gamma$ (note that $R_{max}$ does not depend on $w$ and thus can be dropped):

$$\mathcal{L}_{outer}(w) = -\sum_{i \in N} \tilde{P}_i + \gamma \sum_{i \in N} \tilde{R}_i \qquad (6)$$

Critically, unlike the original RegretNet, we do not hand-select the Lagrange multiplier. Instead, $\gamma$ is dynamically updated to enforce constraint satisfaction while exhausting all available regret budget $R_{max}$. To this end, we employ dual gradient descent [4]. Specifically, we iterate between one gradient update of the network parameters $w$ to minimize (6) and one update of $\gamma$ according to:

$$\gamma \leftarrow \gamma + \gamma_\Delta \left[ \log\Big( \sum_{i \in N} \tilde{R}_i \Big/ \sum_{i \in N} \tilde{P}_i \Big) - \log(R_{max}) \right] \qquad (7)$$

where $\gamma_\Delta$ is the learning rate for the dual variable. For convenience, in (7) we normalize the regret estimate by the revenue. This way, $R_{max} \in [0, 1]$ specifies the regret budget as a percentage of revenue. We also apply logarithms to both terms of the difference in (7) for faster convergence of $\gamma$.

Additionally, we apply a decreasing schedule to $R_{max}$. If the regret budget $R_{max}$ is too tight at the beginning of training, the network may fail to escape the local optima of low revenue and zero regret. To avoid that, we initialize $R_{max}$ at a higher value $R^{start}_{max}$ and exponentially anneal it to the desirable budget $R^{end}_{max}$ throughout training. This leads the network to first find solutions with high revenue and then tighten the regret. This results in the following update rule for the regret budget:

$$R_{max} \leftarrow \max\left( R^{end}_{max},\ R^{mult}_{max} \cdot R_{max} \right) \qquad (8)$$

where $R^{mult}_{max} < 1$ controls the speed of annealing of $R_{max}$ toward $R^{end}_{max}$. Another example of applying dual gradient descent to enforce a constraint can be found in [48].

The proposed approach has several advantages. On the one hand, it resolves the dichotomy of two conflicting objectives. While the regret budget needs to be explicitly set based on the designer's preferences, all other hyperparameters are then straightforward to tune by maximizing the revenue given the specified regret budget. This is unlike the original objective of RegretNet (3), where the regret budget is chosen implicitly through uninterpretable loss-related hyperparameters such as $\lambda_i$, $\rho$, $\rho_\Delta$. On the other hand, our approach is also significantly less sensitive to loss-related hyperparameters.
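For illustration, a minimal sketch of the dual update (7) and the budget schedule (8) could look as follows (the names are ours, and the non-negativity clamp on the multiplier is an added safety assumption):

```python
import math

def update_gamma(gamma, sum_regret, sum_revenue, r_max, lr_gamma=0.01):
    """One dual ascent step on gamma, cf. Eq. (7): gamma grows while the
    revenue-normalized regret exceeds the budget r_max and shrinks otherwise."""
    ratio = sum_regret / max(sum_revenue, 1e-8)
    gamma += lr_gamma * (math.log(ratio + 1e-12) - math.log(r_max))
    return max(gamma, 0.0)  # multiplier of an inequality constraint stays non-negative

def anneal_budget(r_max, r_max_end, r_max_mult):
    """Exponential schedule of Eq. (8): start loose, tighten toward R_max^end."""
    return max(r_max_end, r_max_mult * r_max)  # requires r_max_mult < 1
```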
We demonstrate this insensitivity empirically by using the same hyperparameters related to the budget $R_{max}$ and its schedule in all our experiments, in contrast to the objective (3), which requires a uniquely tuned set of loss-related hyperparameters to perform well in a given setting [52].

4 Experiments

In this section, we empirically investigate our modifications. We compare our RegretFormer with RegretNet and EquivariantNet in settings with a constant number of participants and items used by Dütting et al. [17], as well as in novel settings with inconstant input sizes that we denote as multi-settings. All three networks are trained given the same regret budgets using our approach from Section 3.2. Additionally, we vary the regret budgets and further investigate the differences between the architectures using novel validation procedures. In all experiments, the valuations of all participants are additive and are independently drawn for each item from the uniform distribution: $v_i(j) \sim U[0, 1]$. Training details, hyperparameters, and additional results are reported in the Appendix.

4.1 Comparison of architectures under constant input sizes

The settings only differ in the number of participants $n$ and items $m$, so we denote them as nxm. We conduct experiments in five settings: {1x2, 2x2, 2x3, 2x5, 3x10}. The 1x2 setting is the celebrated Manelli-Vincent auction, the analytical solution for which is provided in [37]; the optimal revenue for this auction equals 0.55. For the rest of the settings, the analytical solutions are unknown. We compare our and existing neural architectures in the five settings given two different regret budgets $R_{max} \in \{10^{-3}, 10^{-4}\}$. We additionally include the classic VCG [61, 5, 22] and Myerson [42] mechanisms for comparison, both of which are DSIC and ex-post IR. The Myerson auction is optimal for $m = 1$. We report its two extensions to multi-good settings: the item-wise extension, where each item is sold separately, and the bundled extension, where all items are sold as a single bundle. We report the results in Table 1. Additionally, we report the learning curves in the Appendix.

Table 1: Architecture comparison. As in [16], revenue is summed over participants, and regret is averaged over participants. The highest revenue in each setting is highlighted in bold. For brevity, we only report aggregated standard deviations: the average standard deviation of revenue is 0.006 for 1x2, 0.011 for 2x2, 0.009 for 2x3, 0.033 for 2x5, and 0.019 for 3x10; the average standard deviation of regret is 0.00018 for $R_{max} = 10^{-3}$ and 0.00003 for $R_{max} = 10^{-4}$.
| $R_{max}$ | setting | RegretNet revenue | RegretNet regret | EquivariantNet revenue | EquivariantNet regret | RegretFormer revenue | RegretFormer regret |
|---|---|---|---|---|---|---|---|
| $10^{-3}$ | 1x2 | 0.572 | 0.0007 | **0.586** | 0.00065 | 0.571 | 0.00075 |
| | 2x2 | 0.889 | 0.00055 | 0.878 | 0.0008 | **0.908** | 0.00054 |
| | 2x3 | 1.317 | 0.00102 | 1.365 | 0.00084 | **1.416** | 0.00089 |
| | 2x5 | 2.339 | 0.00142 | 2.437 | 0.00146 | **2.453** | 0.00102 |
| | 3x10 | 5.59 | 0.00204 | 5.744 | 0.00167 | **6.121** | 0.00179 |
| $10^{-4}$ | 1x2 | 0.551 | 0.00007 | 0.548 | 0.00013 | **0.556** | 0.00014 |
| | 2x2 | 0.825 | 0.00005 | 0.75 | 0.00005 | **0.861** | 0.00006 |
| | 2x3 | 1.249 | 0.00007 | 1.226 | 0.0001 | **1.327** | 0.00011 |
| | 2x5 | 2.121 | 0.00013 | 2.168 | 0.00017 | **2.339** | 0.00015 |
| | 3x10 | 5.02 | 0.00062 | 5.12 | 0.00025 | **5.745** | 0.00022 |

| setting | VCG revenue | VCG regret | Myerson item-wise revenue | Myerson item-wise regret | Myerson bundled revenue | Myerson bundled regret |
|---|---|---|---|---|---|---|
| 1x2 | 0 | 0 | 0.5 | 0 | 0.544 | 0 |
| 2x2 | 0.667 | 0 | 0.833 | 0 | 0.839 | 0 |
| 2x3 | 1 | 0 | 1.25 | 0 | 1.278 | 0 |
| 2x5 | 1.667 | 0 | 2.083 | 0 | 2.188 | 0 |
| 3x10 | 5 | 0 | 5.312 | 0 | 5.003 | 0 |

Table 2: Ratio of the estimated regret to the regret budget. Our approach from Section 3.2 implies that this ratio should be close to 1 during training. Optimal misreports are estimated by minimizing (4) using 50 and 1000 gradient descent steps during training and validation, respectively.

| $R_{max}$ | setting | RegretNet train | RegretNet valid | EquivariantNet train | EquivariantNet valid | RegretFormer train | RegretFormer valid |
|---|---|---|---|---|---|---|---|
| $10^{-3}$ | 1x2 | 1.12 | 1.22 | 1.04 | 1.11 | 1.01 | 1.31 |
| | 2x2 | 0.97 | 1.24 | 1.41 | 1.82 | 0.89 | 1.19 |
| | 2x3 | 1.07 | 1.55 | 1.11 | 1.23 | 1.02 | 1.26 |
| | 2x5 | 0.94 | 1.21 | 1.11 | 1.2 | 0.8 | 0.83 |
| | 3x10 | 0.89 | 1.09 | 0.9 | 0.87 | 1.03 | 0.88 |
| $10^{-4}$ | 1x2 | 0.94 | 1.27 | 0.92 | 2.37 | 1.31 | 2.52 |
| | 2x2 | 0.95 | 1.94 | 1.73 | 1.33 | 0.93 | 1.39 |
| | 2x3 | 1.52 | 1.12 | 1.57 | 1.63 | 1.6 | 1.66 |
| | 2x5 | 1.04 | 1.23 | 1.02 | 1.57 | 0.95 | 1.28 |
| | 3x10 | 0.9 | 3.71 | 1.05 | 1.46 | 0.88 | 1.15 |

In all settings but 1x2, and given both regret budgets, RegretFormer outperforms both RegretNet and EquivariantNet in revenue. The performance gap is absent in the simplest setting but becomes especially prominent in the larger settings. While the permutation-equivariance of RegretFormer likely plays a role, it cannot fully explain the results, since EquivariantNet also has this property. We therefore attribute the superiority of RegretFormer to the expressivity of attention layers. Additionally, in Figure 2, we provide the solutions found by RegretNet and RegretFormer in the 1x2 setting. Both networks find allocation probabilities that smoothly approximate the optimal solution.

Figure 2: Allocation probabilities: (a, b) RegretNet in 1x2; (c, d) RegretFormer in 1x2; (e, f) RegretFormer without PE in the asymmetric setting; (g, h) RegretFormer with PE in the asymmetric setting. The optimal mechanisms are described by the regions separated by the dashed black lines, with the numbers in black denoting the optimal probability of allocation in the region.

In the bigger settings, EquivariantNet can also outperform RegretNet. This result is somewhat surprising, since it was not observed by the authors of EquivariantNet, but it can simply be explained by the fact that they do not test the architecture in settings that are complex enough. Given the smallest regret budget, both baselines sometimes underperform Myerson. While RegretFormer outperforms Myerson in revenue, its regret is still non-zero. This highlights a potential problem of the regret-based approach: when any violation of DSIC is highly undesirable, better revenue might be achieved with classic mechanisms that are also guaranteed to be DSIC. A potential downside of RegretFormer compared to the baselines is that it takes longer to train and requires more parameters to reach optimal performance. We elaborate on this in the Appendix.

4.2 Does the regret budget function as intended?

The core feature of our approach from Section 3.2 is that the resulting regret should be uniquely and unambiguously specified through the regret budget. In particular, the total regret normalized by the total revenue should equal the regret budget.
To verify this quantitatively, we present the ratios of the estimated normalized total regret to the regret budget in Table 2. A ratio higher than 1 means that the regret budget is exceeded, and vice versa. Below we summarize our findings. During training, the ratios are indeed close to 1 for all three architectures, meaning that our approach functions as intended. Additionally, we report the ratios estimated during validation, which involves more gradient descent steps to find the optimal misreports than during training, making the regret estimates more precise. Unsurprisingly, these are usually higher than the ratios estimated during training. For a tighter regret approximation and a better match with the budget, one could increase the number of inner optimization steps during training, provided that a longer training time is acceptable.

4.3 Is the performance gap genuine?

In Table 1, we observed that RegretFormer outperforms both baselines and attributed its success to the expressivity of attention layers. However, there is an alternative explanation. Recall that regret is approximated in the inner optimization (4) by repeatedly back-propagating through the whole neural network $w$. It follows that the regret approximation depends on the neural architecture, including the number, types, sizes, and order of layers. In the absence of a regret approximation procedure that is identical for all networks, there is no guarantee that similar regret estimates for different neural architectures correspond to similar actual regret values, i.e., for different architectures $w_1, w_2$: $\tilde{R}(w_1) \approx \tilde{R}(w_2) \nRightarrow R(w_1) \approx R(w_2)$, where $R(w)$ is the expected regret and $\tilde{R}(w)$ is its approximation. In particular, we are concerned whether RegretFormer achieves higher revenue by approximating the regret worse, rather than by maximizing the revenue better. We design two procedures to test this hypothesis.

The first procedure is denoted as cross-misreports. It is based on estimating the regret of one network on the optimal misreports approximated by another network: $\tilde{R}^{cross}_i(w_1, w_2) = \frac{1}{|B|} \sum_{l \in B} \widetilde{rgt}_i(v'_i(w_2), v^l; w_1)$. We apply this procedure to the trained networks in all five settings given $R_{max} = 10^{-3}$. If all networks approximate regret equally well, we expect the regret estimated on cross-misreports not to exceed the normally computed regret.
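A minimal sketch of this procedure (our illustrative code, assuming each mechanism maps bids to an allocation-payment pair) is given below:

```python
import torch

def utility(mech, true_vals, bids):
    """u_i = sum_j z_ij v_ij - p_i for every bidder; mech(bids) -> (z, p)."""
    z, p = mech(bids)
    return (z * true_vals).sum(-1) - p                     # shape (batch, n)

def cross_misreport_regret(mech_a, mech_b, vals, i, steps=200, lr=0.1):
    """Optimize bidder i's misreports against mech_b, then measure the regret
    these misreports induce on mech_a, cf. R^cross_i(w1, w2) in Section 4.3."""
    misreport = vals[:, i].clone().requires_grad_(True)
    opt = torch.optim.Adam([misreport], lr=lr)
    for _ in range(steps):                                 # inner loop, Eq. (4)
        bids = vals.clone()
        bids[:, i] = misreport
        loss = -utility(mech_b, vals, bids)[:, i].mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    bids = vals.clone()
    bids[:, i] = misreport.detach()
    with torch.no_grad():                                  # evaluate on mech_a
        return (utility(mech_a, vals, bids)[:, i]
                - utility(mech_a, vals, vals)[:, i]).mean()
```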
However, if the regret estimates of RegretFormer were higher on the misreports estimated by RegretNet or EquivariantNet, this would point toward poor regret approximation by RegretFormer. We report the results in Table 3. We do not find any evidence that RegretFormer underestimates the regret: in all settings, it finds misreports that produce regret either higher than or within a standard deviation of the misreports of the other two architectures. The second procedure is based on network distillation. The results are similar to Table 3 in that RegretFormer does not appear to underestimate the regret, so we report them in the Appendix.

Table 3: Cross-misreport regret estimates. The highest regret for each network is highlighted in bold.

| setting | regret of | misreports of RegretNet | misreports of EquivariantNet | misreports of RegretFormer |
|---|---|---|---|---|
| 1x2 | RegretNet | **0.00079** | 0.00043 | 0.00074 |
| | EquivariantNet | 0.00102 | 0.00071 | **0.00129** |
| | RegretFormer | 0.00076 | 0.00046 | **0.00092** |
| 2x2 | RegretNet | **0.00050** | 0.00031 | 0.00024 |
| | EquivariantNet | 0.00021 | **0.00050** | 0.00022 |
| | RegretFormer | **0.00065** | 0.00056 | 0.00059 |
| 2x3 | RegretNet | **0.00116** | 0.00062 | 0.00044 |
| | EquivariantNet | 0.00020 | **0.00071** | 0.00028 |
| | RegretFormer | 0.00072 | 0.00087 | **0.00094** |
| 2x5 | RegretNet | **0.00149** | 0.00065 | 0.00060 |
| | EquivariantNet | 0.00075 | **0.00142** | 0.00071 |
| | RegretFormer | 0.00089 | **0.00120** | 0.00109 |
| 3x10 | RegretNet | **0.00198** | 0.00035 | 0.00012 |
| | EquivariantNet | 0.00035 | **0.00168** | 0.00014 |
| | RegretFormer | 0.00178 | 0.00178 | **0.00222** |

4.4 Learning asymmetric mechanisms using positional encoding

To handle asymmetric distributions, our architecture can be augmented with Positional Encoding (PE), a common technique for transformers to incorporate information about the order of elements. To demonstrate this, we study a setting with one bidder, two items, and non-identically distributed valuations over the items, where $v_1 \sim U[4, 16]$ and $v_2 \sim U[4, 7]$. The optimal mechanism is given by Daskalakis et al. [11]. We apply RegretNet, RegretFormer without PE, and RegretFormer with PE given $R_{max} = 10^{-3}$. As PE, we use the technique proposed by Vaswani et al. [60] that adds fixed values from [-1, 1] to the input, computed as sine and cosine functions of position-dependent frequencies. Note that this PE has no learnable parameters. The results are reported in Table 4.

Table 4: Asymmetric setting from Daskalakis et al. [11], $R_{max} = 10^{-3}$.

| | Optimal | RegretNet | RegretFormer | RegretFormer + PE |
|---|---|---|---|---|
| revenue | 9.781 | 9.951 | 9.967 | 10.056 |
| regret | 0 | 0.0072 | 0.0102 | 0.0099 |

While RegretFormer with PE unsurprisingly performs on par with RegretNet, RegretFormer without PE learns a symmetric solution that is not much worse. In Figure 2, we provide illustrations of the solutions found by the two versions of RegretFormer. These show that RegretFormer with PE closely approximates the allocation probabilities of the optimal mechanism, whereas RegretFormer without PE finds a symmetric solution that resembles a middle ground between the optimal allocation probabilities for the two items.

4.5 Comparison of architectures in multi-settings

In practice, it may be desirable for a single network to be applicable to multiple input sizes, e.g., to save computation or due to limited data. To test our network in such cases, we introduce multi-settings. A multi-setting is a uniform mixture of the constant-sized settings studied so far. In our experiments, we mix the following settings: {2x3, 2x4, 2x5, 2x6, 2x7, 3x3, 3x4, 3x5, 3x6, 3x7}. We set $R_{max} = 10^{-3}$. To adapt RegretNet to multi-settings, we fix the input size to the maximum possible numbers of items and participants in the multi-setting and pad unused valuations with zeros. The results are presented in Table 5.

Table 5: Multi-setting results. We report the highest regret identified with the cross-misreport procedure (see full cross-misreport results in the Appendix). (*) Note that EquivariantNet achieves unrealistically high revenue at a much higher regret.

| setting | RegretNet revenue | RegretNet regret | EquivariantNet(*) revenue | EquivariantNet(*) regret | RegretFormer revenue | RegretFormer regret |
|---|---|---|---|---|---|---|
| average | 2.573 | 0.00352 | 2.889 | 0.00989 | 2.703 | 0.00391 |
| 2x3 | 1.386 | 0.00305 | 1.504 | 0.00554 | 1.432 | 0.00246 |
| 2x4 | 1.855 | 0.00341 | 2.070 | 0.00925 | 1.951 | 0.00317 |
| 2x5 | 2.339 | 0.00362 | 2.646 | 0.01270 | 2.471 | 0.00391 |
| 2x6 | 2.866 | 0.00425 | 3.226 | 0.01597 | 3.006 | 0.00439 |
| 2x7 | 3.346 | 0.00457 | 3.834 | 0.01951 | 3.553 | 0.00481 |
| 3x3 | 1.652 | 0.00322 | 1.795 | 0.00358 | 1.708 | 0.00251 |
| 3x4 | 2.217 | 0.00264 | 2.449 | 0.00508 | 2.312 | 0.00336 |
| 3x5 | 2.786 | 0.00277 | 3.108 | 0.00709 | 2.911 | 0.00421 |
| 3x6 | 3.362 | 0.00340 | 3.787 | 0.00916 | 3.521 | 0.00476 |
| 3x7 | 3.921 | 0.00430 | 4.467 | 0.01101 | 4.140 | 0.00553 |
Comparing RegretFormer with RegretNet, our architecture consistently achieves higher revenue for the same regret budget. While EquivariantNet achieves even higher revenue, after applying our cross-misreport validation procedure we find that EquivariantNet fails to minimize regret due to poorly approximating the optimal misreports. Full results of the cross-misreport validation, as well as out-of-setting experiments, are reported in the Appendix.

5 Conclusion

In this study, we achieve a new state of the art in the application of deep learning to optimal auction design. Our RegretFormer leverages recent advances in deep learning to unlock the full potential of regret-based optimization while enforcing equivariance and invariance to the order of participants and items. We test the effectiveness of RegretFormer in multiple experimental settings and find that our network consistently outperforms the existing analogues. In addition, we rethink the objective formulation of RegretNet. The resulting loss function disentangles balancing the revenue-regret trade-off from optimizing performance and reduces the burden of hyperparameter tuning. Finally, we suggest two validation procedures for comparing regret-based approaches that may find use in future studies.

Acknowledgments

All authors. We sincerely thank Xeniya Adayeva for creating the illustration of RegretFormer (Figure 1). You can contact Xeniya at xeniya.adayeva@gmail.com regarding scientific illustrations.

Dmitry Ivanov. This research was supported in part through computational resources of HPC facilities at HSE University, Russian Federation. Support from the Basic Research Program of the National Research University Higher School of Economics is gratefully acknowledged.

References

[1] R. Ai, B. Lyu, Z. Wang, Z. Yang, and M. I. Jordan. A reinforcement learning approach in multi-phase second-price auction design. arXiv preprint arXiv:2210.10278, 2022.
[2] S. Alaei, J. Hartline, R. Niazadeh, E. Pountourakis, and Y. Yuan. Optimal auctions vs. anonymous pricing. Games and Economic Behavior, 118:494-510, 2019.
[3] M.-F. F. Balcan, T. Sandholm, and E. Vitercik. Sample complexity of automated mechanism design. Advances in Neural Information Processing Systems, 29, 2016.
[4] S. Boyd, S. P. Boyd, and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[5] E. H. Clarke. Multipart pricing of public goods. Public Choice, pages 17-33, 1971.
[6] R. Cole and T. Roughgarden. The sample complexity of revenue maximization. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, pages 243-252, 2014.
[7] V. Conitzer and T. Sandholm. Complexity of mechanism design. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, pages 103-110, 2002.
[8] V. Conitzer and T. Sandholm. Self-interested automated mechanism design and implications for optimal combinatorial auctions. In Proceedings of the 5th ACM Conference on Electronic Commerce, pages 132-141, 2004.
[9] M. J. Curry, T. Sandholm, and J. P. Dickerson. Differentiable economics for randomized affine maximizer auctions. CoRR, abs/2202.02872, 2022. URL https://arxiv.org/abs/2202.02872.
[10] C. Daskalakis and S. M. Weinberg. Symmetries and optimal multi-dimensional mechanism design. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 370-387, 2012.
[11] C. Daskalakis, A. Deckelbaum, and C. Tzamos. Strong duality for a multiple-good monopolist. Econometrica, 85(3):735-767, 2017.
[12] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171-4186. Association for Computational Linguistics, 2019.
[13] P. Dhangwatnotai, T. Roughgarden, and Q. Yan. Revenue maximization with a single sample. Games and Economic Behavior, 91:318-333, 2015.
[14] Z. Duan, J. Tang, Y. Yin, Z. Feng, X. Yan, M. Zaheer, and X. Deng. A context-integrated transformer-based neural network for auction design. In International Conference on Machine Learning. PMLR, 2022.
[15] P. Dütting, F. Fischer, P. Jirapinyo, J. K. Lai, B. Lubin, and D. C. Parkes. Payment rules through discriminant-based classifiers. ACM Transactions on Economics and Computation, 3(1), 2015.
[16] P. Dütting, Z. Feng, N. Golowich, H. Narasimhan, D. C. Parkes, and S. S. Ravindranath. Machine learning for optimal economic design. In The Future of Economic Design, pages 495-515. Springer, 2019.
[17] P. Dütting, Z. Feng, H. Narasimhan, D. Parkes, and S. S. Ravindranath. Optimal auctions through deep learning. In International Conference on Machine Learning, pages 1706-1715. PMLR, 2019.
[18] Z. Feng, H. Narasimhan, and D. C. Parkes. Deep learning for revenue-optimal auctions with budgets. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems, pages 354-362, 2018.
[19] N. Golowich, H. Narasimhan, and D. C. Parkes. Deep learning for multi-facility location mechanism design. In IJCAI, pages 261-267, 2018.
[20] Y. A. Gonczarowski and N. Nisan. Efficient empirical revenue maximization in single-parameter auction environments. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 856-868, 2017.
[21] Y. A. Gonczarowski and S. M. Weinberg. The sample complexity of up-to-ε multi-dimensional revenue maximization. Journal of the ACM (JACM), 68(3):1-28, 2021.
[22] T. Groves. Incentives in teams. Econometrica: Journal of the Econometric Society, pages 617-631, 1973.
[23] C. Guo, Z. Huang, and X. Zhang. Settling the sample complexity of single-parameter revenue maximization. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 662-673, 2019.
[24] M. Guo and V. Conitzer. Computationally feasible automated mechanism design: General approach and case studies. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 24, pages 1676-1679, 2010.
[25] J. Hartford, D. Graham, K. Leyton-Brown, and S. Ravanbakhsh. Deep models of interactions across sets. In International Conference on Machine Learning, pages 1909-1918. PMLR, 2018.
[26] J. Hartline and S. Taggart. Sample complexity for non-truthful mechanisms. In Proceedings of the 2019 ACM Conference on Economics and Computation, pages 399-416, 2019.
[27] J. D. Hartline and T. Roughgarden. Simple versus optimal mechanisms. In Proceedings of the 10th ACM Conference on Electronic Commerce, pages 225-234, 2009.
[28] Z. Huang, Y. Mansour, and T. Roughgarden. Making the most of your samples. SIAM Journal on Computing, 47(3):651-674, 2018.
[29] J. Jin, C. Song, H. Li, K. Gai, J. Wang, and W. Zhang. Real-time bidding with multi-agent reinforcement learning in display advertising. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 2193-2201, 2018.
[30] Y. Jin, P. Lu, Q. Qi, Z. G. Tang, and T. Xiao. Tight approximation ratio of anonymous pricing. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 674-685, 2019.
[31] Y. Jin, P. Lu, Z. G. Tang, and T. Xiao. Tight revenue gaps among simple mechanisms. SIAM Journal on Computing, 49(5):927-958, 2020.
[32] P. Kothari, S. Singla, D. Mohan, A. Schvartzman, and S. M. Weinberg. Approximation schemes for a unit-demand buyer with independent items via symmetries. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 220-232. IEEE, 2019.
[33] K. Kuo, A. Ostuni, E. Horishny, M. J. Curry, S. Dooley, P.-y. Chiang, T. Goldstein, and J. P. Dickerson. ProportionNet: Balancing fairness and revenue for auction design with deep learning. arXiv preprint arXiv:2010.06398, 2020.
[34] S. Lahaie. A kernel-based iterative combinatorial auction. In Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011.
[35] A. Likhodedov, T. Sandholm, et al. Approximating revenue-maximizing combinatorial auctions. In AAAI, volume 5, pages 267-274, 2005.
[36] X. Liu, C. Yu, Z. Zhang, Z. Zheng, Y. Rong, H. Lv, D. Huo, Y. Wang, D. Chen, J. Xu, et al. Neural auction: End-to-end learning of auction mechanisms for e-commerce advertising. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 3354-3364, 2021.
[37] A. M. Manelli and D. R. Vincent. Bundling as an optimal selling mechanism for a multiple-good monopolist. Journal of Economic Theory, 127(1):1-35, 2006.
[38] P. Manisha, C. Jawahar, and S. Gujar. Learning optimal redistribution mechanisms through neural networks. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pages 345-353, 2018.
[39] M. Mohri and A. M. Medina. Learning algorithms for second-price auctions with reserve. The Journal of Machine Learning Research, 17(1):2632-2656, 2016.
[40] J. Morgenstern and T. Roughgarden. Learning simple auctions. In Conference on Learning Theory, pages 1298-1318. PMLR, 2016.
[41] J. H. Morgenstern and T. Roughgarden. On the pseudo-dimension of nearly optimal auctions. Advances in Neural Information Processing Systems, 28, 2015.
[42] R. B. Myerson. Optimal auction design. Mathematics of Operations Research, 6(1):58-73, 1981.
[43] H. Narasimhan and D. C. Parkes. A general statistical framework for designing strategyproof assignment mechanisms. In Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence (UAI), 2016.
[44] H. Narasimhan, S. B. Agarwal, and D. C. Parkes. Automated mechanism design without money via machine learning. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, 2016.
[45] T. Nedelec, N. El Karoui, and V. Perchet. Learning to bid in revenue-maximizing auctions. In International Conference on Machine Learning, pages 4781-4789. PMLR, 2019.
[46] T. Nedelec, J. Baudet, V. Perchet, and N. El Karoui. Adversarial learning for revenue-maximizing auctions. In 20th International Conference on Autonomous Agents and Multiagent Systems, 2021.
[47] K. T. Nguyen. A bandit learning algorithm and applications to auction design. Advances in Neural Information Processing Systems, 33:12070-12079, 2020.
[48] X. B. Peng, A. Kanazawa, S. Toyer, P. Abbeel, and S. Levine. Variational discriminator bottleneck: Improving imitation learning, inverse RL, and GANs by constraining information flow. In International Conference on Learning Representations, 2018.
[49] N. Peri, M. Curry, S. Dooley, and J. Dickerson. PreferenceNet: Encoding human preferences in auction design with deep learning. Advances in Neural Information Processing Systems, 34, 2021.
[50] T. Qin, F. He, D. Shi, W. Huang, and D. Tao. Benefits of permutation-equivariance in auction mechanisms. Advances in Neural Information Processing Systems, 35, 2022.
[51] J. Rahme, S. Jelassi, J. Bruna, and S. M. Weinberg. A permutation-equivariant neural network architecture for auction design. Proceedings of the AAAI Conference on Artificial Intelligence, 35(6):5664-5672, May 2021. URL https://ojs.aaai.org/index.php/AAAI/article/view/16711.
[52] J. Rahme, S. Jelassi, and S. M. Weinberg. Auction learning as a two-player game. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=YHdeAO61l6T.
[53] S. S. Ravindranath, Z. Feng, S. Li, J. Ma, S. D. Kominers, and D. C. Parkes. Deep learning for two-sided matching. arXiv preprint arXiv:2107.03427, 2021.
[54] T. Sandholm. Automated mechanism design: A new application area for search algorithms. In International Conference on Principles and Practice of Constraint Programming, pages 19-36. Springer, 2003.
[55] T. Sandholm and A. Likhodedov. Automated design of revenue-maximizing combinatorial auctions. Operations Research, 63(5):1000-1025, 2015.
[56] W. Shen, P. Tang, and S. Zuo. Automated mechanism design via neural networks. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pages 215-223, 2019.
[57] V. Syrgkanis. A sample complexity measure with applications to learning optimal auctions. Advances in Neural Information Processing Systems, 30, 2017.
[58] A. Tacchetti, D. Strouse, M. Garnelo, T. Graepel, and Y. Bachrach. A neural architecture for designing truthful and efficient auctions. CoRR, abs/1907.05181, 2019. URL http://arxiv.org/abs/1907.05181.
[59] P. Tang. Reinforcement mechanism design. In IJCAI, pages 5146-5150, 2017.
[60] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
[61] W. Vickrey. Counterspeculation, auctions, and competitive sealed tenders. The Journal of Finance, 16(1):8-37, 1961.
[62] J. Weissteiner and S. Seuken. Deep learning powered iterative combinatorial auctions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 2284-2293, 2020.
[63] L. Yuan, Y. Chen, T. Wang, W. Yu, Y. Shi, Z.-H. Jiang, F. E. Tay, J. Feng, and S. Yan. Tokens-to-Token ViT: Training vision transformers from scratch on ImageNet. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 558-567, 2021.

Checklist

1. For all authors...
   (a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes]
   (b) Did you describe the limitations of your work? [Yes] See the end of Section 4.1.
   (c) Did you discuss any potential negative societal impacts of your work? [No] We do not foresee such an impact.
   (d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]
2. If you ran experiments...
   (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We include the code as a URL.
   (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See the Appendix.
   (c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [Yes] In Table 1, we report aggregated standard deviations. In the Appendix, we report the learning curves with error bars.
   (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See the Appendix.