# learning_to_bid_in_revenuemaximizing_auctions__ce93db71.pdf Learning to bid in revenue-maximizing auctions Thomas Nedelec 1 2 Noureddine El Karoui 3 1 Vianney Perchet 2 1 We consider the problem of the optimization of bidding strategies in prior-dependent revenuemaximizing auctions, when the seller fixes the reserve prices based on the bid distributions. Our study is done in the setting where one bidder is strategic. Using a variational approach, we study the complexity of the original objective and we introduce a relaxation of the objective functional in order to use gradient descent methods. Our approach is simple, general and can be applied to various value distributions and revenuemaximizing mechanisms. The new strategies we derive yield massive uplifts compared to the traditional truthfully bidding strategy. 1. Introduction Modern marketplaces like Uber, Amazon or Ebay enable sellers to fine-tune their selling mechanism by reusing their large number of past interactions with consumers. In the online advertising or the electricity markets, billions of auctions are occurring everyday between the same bidders and sellers. Based on the data gathered, different approaches learn complex mechanisms maximizing the seller revenue (Conitzer and Sandholm, 2002; Ostrovsky and Schwarz, 2011; Paes Leme et al., 2016; Golrezaei et al., 2017). Most of the literature has focused on the auctioneer side (Milgrom and Tadelis, 2018). Algorithms focused on the bidder s standpoint to enable them to be strategic against any smart data-driven selling mechanisms are lacking. These algorithms should ideally strengthen the balance of power driving the relationship between buyers and sellers. Our main objective is to exhibit simple robust algorithmic procedures that take advantage of various data-dependent revenuemaximizing mechanisms. This represents a big step forward in understanding possible strategic behaviors in revenue 1Criteo AI Lab 2CMLA, ENS Paris Saclay 3UC, Berkeley. Correspondence to: Thomas Nedelec . Proceedings of the 36 th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s). maximizing auctions. This is a new argument supporting the Wilson doctrine (Wilson, 1987) claiming that datadependent revenue maximizing algorithms are not robust to strategic bidders. 1.1. Framework In the early stage of the market design literature (see, e.g., Myerson (1981)), a typical underlying assumption is that the bidders value distributions were commonly known to the seller and other bidders. This can be justified if different group of bidders with the same value distribution are interacting successively with one seller. In the aforementioned modern applications, the same bidders have billions of interactions everyday with the seller. Even if the latter does not know the value distribution beforehand, it might use in many cases the past bid distributions as proxies of value distribution. Several mechanisms based on the value distribution of bidders have already been introduced. We will focus on the lazy second price auction with personalized reserve price (Paes Leme et al., 2016), the Myerson auction (Myerson, 1981), the eager version of the second price auction and the boosted second price auction (Golrezaei et al., 2017). When repeating these auctions (every day, or every milli-second, depending on the context) and if the bidder is myopic, i.e optimizing per stage and not long-term revenue, it is optimal to bid truthfully at each auction. So with myopic bidders, bids and values have the same distribution and the seller can design optimally the mechanism based on the former. Non-myopic bidders optimize their long-term expected utility taking into account that their current strategy will imply a certain mechanism (for instance a specific reserve price) in the future. More precisely, we will consider the following steady state analysis. Assume the valuations of a bidder vi R are drawn from a specific distribution Fi; a bidding strategy is a mapping βi from R into R that indicates the actual bid Bi = βi(vi) when the value is vi. As a consequence, the distribution of bids FBi is the push-forward of Fi by βi. In the steady state, the seller uses the distributions of bids FBi to choose a specific auction mechanism M(FBi) among a given class of mechanisms M. The objective of a long-term strategic bidder is to find her strategy βi that maximizes her expected utility when vi Fi, she Learning to bid in revenue-maximizing auctions bids βi(vi) and the induced mechanism is M(FBi). This steady-state objective is particularly relevant in modern applications as most of the data-driven selling mechanisms are using large batches of bids as examples to update their mechanism. In terms of game theory, these interactions are a game between the seller - whose strategy is to pick a mechanism design that maps bid distributions to reserve prices - and the bidders - who chose bidding strategies. Our overarching objective is to derive the best-response, for a given bidder i, to the strategy of the seller (i.e., a given mechanism) and the strategies of the other bidders (i.e., their bid distributions). 1.2. Contributions Our main contributions are the following. We first introduce the optimization problem that strategic bidders are facing when the seller is optimizing personalized reserve prices based on their bid distributions. A straightforward optimization can fail because the objective is discontinuous as a function of the bidding strategy. To circumvent this issue, we introduce a new relaxation of the problem which is stable to local perturbations of the objective function and computationally tractable and efficient. We numerically optimize this new objective through a simple neural network and get very significant improvements in bidder utility compared to truthful bidding. We also provide a theoretical analysis of thresholded strategies (introduced in Nedelec et al. (2018)) and show their (local) optimality as improvements of bidding strategies with non-zero reserve value. For the Myerson auction, the strategies learned by the model can be independently proved to be optimal. We apply the approach to other auction settings such as boosted second price or eager second price with monopoly price. We report massive uplifts compared to the traditional truthful strategy advocated in all these settings. Our simple approach can be plugged in any modern bidding algorithms learning distribution of the highest bid of the competition and we test it on other classes of mechanism without any known closed form optimal bidding strategies. We finally provide the code in Py Torch that has been used to run the different experiments. This approach opens avenues of research for designing good bidding strategies in many data-driven revenue-maximizing auctions. 1.3. Related work Starting with the seminal work of Myerson (1981), a rich line of work indicates the type of auctions that is revenuemaximizing for the seller. In the case of symmetric bidders (Myerson, 1981), one revenue maximizing auction is a second price auction with a reserve price equal to the monopoly price, i.e, the price r that maximizes r(1 F(r)). However, in most applications, the symmetric assumption is not satisfied (Golrezaei et al., 2017). In the asymmetric case, the Myerson auction is optimal (Myerson, 1981) but is difficult to implement in practice (Morgenstern and Roughgarden, 2015). In this case, a second price auction with a wellchosen vector of reserve prices guarantees at least one-half of the optimal revenue (Hartline and Roughgarden, 2009). In modern markets, some bidders are myopic simply because truthful bidding is a simple strategy to implement. Receiving truthful bid enables sellers to design various revenue maximizing auctions. (Conitzer and Sandholm, 2002) has therefore been interested in the automatic mechanism design that fine tunes mechanism based on some examples of bids. This work was extended recently in (D utting et al., 2017) with the use of deep learning. In (Ostrovsky and Schwarz, 2011; Medina and Mohri, 2014; Paes Leme et al., 2016), it is shown specifically how to learn the optimal reserve prices in the lazy second price auction. This practice was theoretically addressed by (Cole and Roughgarden, 2014; Huang et al., 2018; Devanur et al., 2016) looking at the sample complexity of a large class of auctions assuming an oracle offering iid examples of the value distribution. However, it is quite intuitive that non-myopic bidders should not bid truthfully. Robustness to strategic bidders has been studied in (Balseiro et al., 2017; Kanoria and Nazerzadeh, 2014; Epasto et al., 2018). A potential limitation of this type of approach is that it is either assumed that all bidders have the same value distribution (or up to ε for some specific metric on distributions) or that there is a very large number of bidders and a global mechanism designed so that any of them has no incentive to bid untruthfully. In (Ashlagi et al., 2016), an involved mechanism was designed that keeps the incentive compatibility property even if the seller is learning on former bids of the bidders. None of these papers have exhibited optimal strategies that can be used when the seller is optimizing her mechanism based on past bids. This strategic behavior has been studied for posted price with one bidder and one seller (Mohri and Munoz, 2015). An independent line of work has focused on learning to bid when the value is not known to the bidders (Weed et al., 2016; Feng et al., 2018). Some Bayes-Nash equilibria corresponding to games where bidders can choose their bid distribution were designed (Tang and Zeng, 2018; Abeille et al., 2018) with some derivations of seller revenue and bidders utility at these equilibria. However, no strategies corresponding to these equilibria were provided in the general case. Our work is finally strongly related to (Nedelec et al., 2018) where a new class of shading strategies for second price auctions with personalized reserve price is proposed. Our new optimization pipeline is very general and enables bidders to learn good bidding strategies in multiple Learning to bid in revenue-maximizing auctions settings and for any value distribution. 2. The bidder s optimization problem We introduce in this section the optimization problem, starting with the lazy second price auction with personalized reserve prices (formalized below). 2.1. Notations and setting To describe precisely our approach, we use the traditional setting of auction theory (see e.g. Krishna (2009)). Recall that Fi is the value distribution of bidder i and βi : R R her strategy that maps values to bids. The corresponding distribution of bids is then FBi = βi Fi, the push-forward of Fi w.r.t. βi. In the steady-state, we assume that the seller has the perfect knowledge of each bid distribution FBi. Notice that we have implicitly identified the distribution Fi (resp. FBi) with its cumulative distribution function (cdf) and use both terms exchangeably. We use fi (resp. f Bi) for the corresponding probability density function (pdf). For the sake of simplicity, let us first consider a lazy second price auction Krishna (2009). We recall that in this auction each bidder has a personalized reserve price. The item is attributed to the highest bidder, if she clears her reserve price, and not attributed otherwise; the winner then pays the maximum between the second highest bid and her reserve price. It is known that the optimal reserve price of bidder i is her monopoly price equal to argmaxr r(1 FBi(r)), or equivalently1 to ψ 1 Bi (0), where ψBi is the usual virtual value function defined as ψBi(b) = b 1 FBi(b) As a consequence, it is natural to assume that the strategy of bidder i does not impact the strategy of other bidders (that can be either myopic or not) and from now on, we assume that bids are independent. 2.2. A variational approach A fundamental result in auction theory is the Myerson lemma (Myerson, 1981). It expresses the expected payment of a bidder depending on her virtual value and the value distribution of the competition. An important notation is Gi, the cdf of the maximum bid of players other than i; obviously, if the other bidders are truthful, Gi is the distribution of the maximum value of the other bidders. Lemma 1 (Integrated version of the Myerson lemma). In a lazy second price auction with personalized reserve price 1at least for regular distributions, i.e., when ψ is non-decreasing ri, the payment of bidder i with continuous strategy βi is Π(βi) = EBi FBi ψBi(Bi)Gi(Bi)1(Bi ri) . Proof. The proof is similar to the original one (Myerson, 1981), see also (Krishna, 2009), so we do not spell it out. It consists of using Fubini s theorem and integration by parts to transform the standard form of the seller revenue, i.e. EBi FBi,Xj FBj max j =i (Bj, r)1[Bi maxj =i(Bj,r)] into the above equation. It then suffices to work along the lines mentioned above with Yi = maxj =i Bj and realize that i s expected payment can be written as EBi FBi,Yi Gi max(Yi, r)1[Bi max(Yi,r)] In lazy second price auction, the seller chooses as reserve price the monopoly price corresponding to the bid distribution of bidder i. In this case, Lemma 1 implies that the expected payment of bidder i is equal to Π(βi) = EB FBi ψBi(B)Gi(B)1(B ψ 1 Bi (0)) . In order to simplify the computation of the expectation and remove the dependence on βi, this expected payment can be rewritten in the space of values, by introducing hβi(x) ψFBi(βi(x)) , and noting the equivalent following formulation Π(βi) = EXi Fi hβi(Xi)Gi(βi(Xi))1(Xi xβi) , where xβi = h 1 βi (0) when hβi is increasing. We call it the reserve value, as it is the smallest value above which the seller accepts all bids from bidder i. The expected utility can be derived as a function of βi as U(βi) = EXi Fi (Xi hβi(Xi))Gi(β(Xi))1(Xi xβi) . (1) Finally, we remark that if βi is increasing and differentiable, hβi verifies a simple first order differential equation. Lemma 2. Suppose βi is increasing and differentiable then hβi(xi) = ψFBi(βi(x)) = βi(x) β i(x)1 Fi(x) fi(x) . (2) Proof. ψFBi(b) = b 1 FBi(b) f Bi(b) with FBi(b) = Fi(β 1 i (b)) and f Bi(b) = fi(β 1 i (b)/β i(β 1 i (b). Then, hβi(x) = ψBi(βi(x)) = βi(X) β i(X) 1 Fi(X) Learning to bid in revenue-maximizing auctions If we consider only monotonically increasing differentiable strategies, and we denote by I the class of such functions, the problem of the strategic bidder is therefore to solve maxβ I U(β) with U defined in Equation (1). This equation is crucial, as it indicates that optimizing over bidding strategy can be reduced to finding a distribution with a wellspecified virtual value h( ). A crucial difference between the long term vision and the classical, myopic (or one-shot) auction theory is that bidders also maximize expected utility. They might therefore be willing to sometime over-bid (incurring a negative utility at some specific auctions) if this reduces their reserve price. Indeed, having a lower reserve price increases the revenue of many other auctions. Lose small to win big. This reasoning is possible as there exist multiple interactions between bidders and seller, billions every day in the case of online advertising. 2.3. Discontinuity of the objective In the previous section, we assumed the reserve value was defined as h 1 βi (0), which is well defined only if hβi is increasing. This condition is complicated to ensure as, for instance, restricting the strategies to be increasing does not provide any guarantee on hβi. If the later is not increasing, then the function r(1 FBi(r)) that the seller maximizes might have several local optima, as illustrated with a specific bid distribution in Figure 1. We mention here that this distribution actually arises during our numerical optimization using first order splines as described in the next section. 0.0 0.2 0.4 0.6 0.8 1.0 possible reserve values revenue seller Revenue seller for various reserve values Figure 1. Revenue of the seller as a function of the reserve value. This shape of revenue by running the first order spline method described in Section 2.4 . For this distribution, there exists two local optima that are equivalent in terms of revenue for the seller but dramatically change the utility of the strategic bidder. The fact that r(1 FBi(r)) is not always strictly concave implies that the set of maximizer is not continuous but only upper hemi-continuous; stated otherwise, the reserve value xβi can jump from a small to high value with an arbitrarily small change in the bidding strategy. In the example of Figure 1, the reserve value switches from 0.18 to 0.58. As a consequence, the expected utility of the bidder, which is another function depending on xβi, might also jumps erratically. In the same example, the lower bound of integration increases from 0.18 to 0.58, so that the overall integral decreases from 0.14 to 0.09. This discontinuity makes the optimization of the real objective difficult. 2.4. An attempt with first-order splines A natural question is whether the buyer can compute shading strategies numerically. A first approach is to look back at the gradient of the bidder s utility in the direction of a certain function ρ, i.e., the directional derivative, that can be computed by elementary calculus. and to look at shading function expressed in a specific basis as i=1 ci(β)fi(x) , and try to optimize over ci. It would be also quite natural to do an isotonic regression and optimize over non-decreasing functions directly; this approach is tackled later on. A natural basis Splines (see e.g. (Hastie et al., 2001) for a practical introduction) are a natural candidate for the function fi s. In particular, first order splines are piecewise continuous functions, hence evaluating derivatives is trivial and it is easy to account in the formula above for the finitely many discontinuities of the derivative that will arise. If ξk s are given knots, first order splines are the functions f1 = 1, f2(x) = x, fk+2(x) = (x ξk)+ = max(x ξk, 0) . Higher order splines could of course also be used. Lemma 3. As described above, the optimal shading problem can be numerically approximated using steepest descent by a succession of linear programs, provided the nondecreasing constraint on β can be written linearly in ci. This is of course the case for 1st order spline. Proof. After the function is expanded in a basis, the functional gradient becomes a standard gradient, and the shading function can be improved with a steepest descent. If the reserve value is not one of the knots, the gradient above is easy to compute: each step of the optimization requires to solve a constrained LP to ensure that the solution is increasing. For 1st order splines, the derivative is constant between knots, thus checking that β ( ) 0 amounts to check finitely many linear constraints and so is amenable to an LP. The objective is not even continuous, though differentiable in a large part of the parameter space. The optimization problem is hard. In our experiments, we got significant improvement over bidding truthfully by using the above numerical method. However, we encountered the discontinuities of the optimization problem described above: our Learning to bid in revenue-maximizing auctions numerical optimizer got stuck at shading functions around which the reserve value was very unstable, which corresponds to revenue curves for the seller with several distant (approximate) local maxima: a small perturbation in function space does not induce much loss of the revenue on the seller side, but can have a huge impact on the reserve value and hence the buyer revenue. Note that in our numerical experiments we did not enforce the non-decreasing-constraint on β but ended up with solutions that were non-decreasing. More details on this approach are provided in Appendix D. This is precisely the reason why, in the next section, we introduce a relaxation of the problem that is easier to optimize and with the same solutions as our initial objective. We also change the class of shading functions we consider and use neural networks to fit them. Before we describe these experiments, we provide some theory for the problem of optimizing buyer revenue in lazy second price auctions. 3. Theory and a relaxation of the problem enabling the use of gradient descent 3.1. The family of optimal extensions of a strategy In the context of lazy second price auctions, any increasing and continuous bidding strategy β whose reserve value r is not 0 can be improved with β (x) = β(r)(1 F (r)) 1 F (x) on [0, r] where β(r) = r, i.e. by thresholding the virtual value below the current reserve value and keeping β = β on (r, + ). Indeed, Lemma 2 yields that hβ (x) = 0 on [0, r] and hβ = hβ elsewhere. So the seller is indifferent between setting the reserve price anywhere in [0, r] and we might assume she picks 0 (if she is welfare benevolent, or it is always possible to give an ε-incentive to pick 0, for ε arbitrarily small). According to Myerson s Lemma, the strategy β generates the same payment as β, so the revenue of the seller coming from this bidder is unchanged. On the other hand, that bidder wins more auctions with this new strategy, hence it improves her revenue and thus her expected utility. In this subsection, we address the question of whether the strategy β , which is simple and robust can be improved for the bidder. Our previous argument already shows that any improvement would be a strategy with 0 reserve value. Differentiating Equation (2) yields f(x)hβ(x) = (β(x)(F(x) 1)) . Let us denote by r the current reserve price; we rewrite the family of bidding strategies β with reserve value at 0 as elements of the following constraint set: E ψB(β(X))G(β(X))1[r0 X r] 0 , 0 r0 r , E ψB(β(X))G(β(X))1[0 X r] = 0 . For all those strategies, the seller revenue is maximal for the reserve value ropt = 0, and hence under the assumption of welfare benevolence, the seller will accept all bids of the bidder. It is also clear that this set of constraints define all possible strategies with reserve value 0. The strategy β (which is increasing and continuous, say) that maximizes the revenue of the bidder corresponds to max β E (X ψB(β(X)))G(β(X))1[X 0] under the constraints that gr0(β) = E ψB(β(X))G(β(X))1[r0 X r] 0 g0(β) = E ψB(β(X))G(β(X))1[0 X r] = 0 . Let us limit ourselves to not changing our strategy beyond r, e.g. by bidding truthfully beyond r. Then we effectively need to maximize max β F(β) = E (X hβ(X))G(β(X))1[0 X r] . with the continuity constraints that β(r) = r. The constraints can be rewritten into gr0(β) = E ψB(β(X))G(β(X))1[0 X r0] = E hβ(X)G(β(X))1[0 X r0] . along with gr(β) = 0. We call those strategies continuation strategies as they extend the bidding below the current reserve price/value. Remark : in this class of feasible strategies, the optimal reserve value for the seller is zero. So the discontinuities of the objective function in the broader class of strategies considered before, which stemmed from discontinuities of the reserve value as a function of the shading function, are not anymore problematic. The following theorem states one of our main results. Theorem 1. Let F, 1/(1 F) and G be differentiable on [0, r]. Suppose that the virtual value ψF is such that ψF (x) 0 on [0, r]. We consider increasing shading functions β on [0, r] with β(r) = r. Thresholding, i.e. using β (x) = r(1 F(r))/(1 F(x)) for 0 x r is locally optimal among continuation strategies for which β is differentiable on [0, r], provided G(β (x)) > 0 on [0, r]. It is also locally optimal among β s such that gr(β) is differentiable as function of r. Furthermore, if r < 1 and G(x) = min(x, 1), i.e. the competition s distribution is Uniform[0, 1], then thresholding is globally optimal among functions that are bounded by 1 and differentiable. Sketch of proof : the proof consists in keeping track of the slack function h(r) = gr(β), rewriting locally feasible Learning to bid in revenue-maximizing auctions β s as functions of h through differential equation manipulations and finally comparing their revenue and showing that the optimal h is zero for our objective. This requires somewhat lengthy and delicate manipulations. In the case of G(x) = min(x, 1), we are able to write all feasible β s as a function of h and carry out the program globally. We note that we did not require in our analysis that our optimization be limited to non-decreasing functions; it turns out that our local optima are optimal in larger class of functions. 3.2. One relaxation of the objective Instead of computing the exact reserve value in the definition of the expected utility of the bidder, we introduce a relaxation Ur of the objective corresponding to : Ur(βi) = E (Xi hβi(Xi))Gi(β(Xi))1[hβi(Xi) 0] . (3) We replaced 1[Xi xβi] by 1[hβi(Xi) 0]. This relaxation avoids to compute the reserve value at each step of the gradient descent and remove most of the discontinuities of the previous objective. We now prove that the function maximizing Equation. 3 has non-negative virtual value. The value of the relaxation objective at its optimum is equal to the one in the strategic bidder problem. Theorem 2. If an increasing and differentiable function βi is maximizing Ur(βi) = E (Xi hβi(Xi))Gi(βi(Xi)))1[hβi(Xi) 0] , it has non-negative virtual value, a reserve value equal to zero and Ur(βi) = U(βi) with U(βi) = E (Xi hβi(Xi))Gi(βi(Xi)))1[Xi xβi] . Proof. We use the fact that if hβ(x) < 0 on a certain interval [a,b], we can find a new strategy β+ with higher Ur. Let us consider the rightmost interval [a,b] where hβ(x) < 0. On [b, + ], β+ = β. Then on [a,b], β+(x) = β(b)(1 F(b)/(1 F(x)). β+ verifies hβ+(x) = 0 on [a, b]. Then if we denote T = β+(a), we define β+ on [0,a] as β+(x) = β(x) + (T β(a))(1 F(a))/(1 F(x)). We have hβ+ = hβ on [0,a]. β+ is continuous. With f(x)hβ(x) = (β(x)(F(x) 1)) , we see that β(1 F) is non-decreasing on [a,b]. Hence β+(a) β(a). Therefore, x, β+(x) β(x) and G(β+(x)) G(β(x)). Hence, Ur(β ) Ur(β). Then, we tackle the next interval where hβ(x) < 0 by doing the same manipulation on β+. We conclude by induction on the intervals where hβ 0. Thus, a solution of the relaxation has a virtual value positive everywhere and a reserve value equal to zero. In this case, Ur(βi) = U(βi). This new objective enables to run simple gradient descent algorithms without the need to recompute the reserve value at each iteration. It is also more stable than the original one since a local change of the virtual value does not completely change the value of the objective, which could be the case when the reserve value were part of the objective. 4. Experimental setup We present in this section the complete approach and report the uplift of the new bidding strategies in various revenuemaximizing auctions. 4.1. Our architecture To fit the optimal strategies, we use a simple one-layer neural network with 200 Re Lus. We replace the indicator function by a sigmoid function to have a fully differentiable objective and we optimize Uη(βi) = EXi Fi (Xi hβi(Xi))Gi(β(Xi))σ(ηhβi(Xi)) . with σ(x) = 1 1+exp( x) and η = 1000. We start with a batch size of 10000 examples, sampled according to the value distribution of the bidder. We use a stochastic gradient algorithm (SGD) with a decreasing learning rate starting at 0.001. The full code in Py Torch is provided with the paper. The learning of an affine shading strategy is also provided in the notebook and is reaching already very decent performance. In our setting, we assume that Gi is known. However, we could replace its expression by an approximation ˆ Gi learned from past examples of bids of the competition or on the winning distribution of bidder i computed on past auctions (in practice one may have to use survival analysis techniques to account for censoring of the observations). The results for the lazy second price auction with personalized price are presented in Table 1 and in Table 2. 4.2. Extension to other types of auction Our approach can easily be extended to many other types of auctions. Only a few lines of code are needed to adapt the objective to other mechanisms. The Myerson auction. The Myerson auction (Myerson, 1981) consists in using the virtual value both for the allocation and payment rules. The item is allocated to the bidder with the highest non-negative virtual value that pays: ψ 1 Bi (max(max j =i ψBj(Xj), 0)) As for the lazy second price auction, we can use the Myerson lemma and show that the expected utility of the strategic bidder using the strategy β in the Myerson auction is Ui(βi) = E ([Xi hβi(Xi)]FZ(hβi(Xi))) . Learning to bid in revenue-maximizing auctions with FZ the cumulative distribution function of Z = max2 j K(0, ψj(Xj)), Xi is the value of bidder i, and hβi = ψBi(βi(Xi)) is the virtual value function associated with the bid distribution. For some distribution, the optimal strategy can be analytically computed. For instance, for the uniform distribution, we can prove this lemma which defines the optimal strategies. Lemma 4 (Shading against (K 1) uniform bidders). Suppose that x has a positive density on its support and assume that x is bounded by (K +1)/(K 1). Let ϵ > 0 be chosen by bidder 1 arbitrarily close to 0. Let us call h(ϵ) K (x) = K ϵ 1+ϵx if x [0, (1 + ϵ)/(K 1)) , K x 1 K 1 if x (1 + ϵ)/(K 1) . A near-optimal shading strategy is for bidder 1 to shade her value through β(ϵ) 1 (x) = E h(ϵ) K (t)|t x . As ϵ goes to 0+, this strategy approaches the optimum. If the support of x is within (1/(K 1), (K + 1)/(K 1)), then ϵ can be taken equal to 0. The full proof is in Appendix C. Since in this specific setting optimal strategies have a known closed form, our optimization pipeline can be tested to see if it recovers these strategies. With the same pipeline used in Section 4.1, we optimize Ui(βi) = E ([Xi hβi(Xi)]FZ(hβi(Xi))) . Appendix E.2 focuses on the uniform distribution where our algorithm recover exactly the strategies proposed in Lemma 9 showing the robustness of our approach. The interest of the optimization pipeline is the direct extension to all possible value distributions without the need to solve at each time a new system of differential equations. The performance with an exponential value distribution is provided in Table 1. Eager second price auction with monopoly reserve prices. The eager second price auction consists of running a second price auction but only among bidders that clear their personalized reserve price. The objective function is very similar to the one of the lazy second price auction except that the winning distribution is different below the reserve price of the other bidders. Indeed, if all other bidders are below their reserve price, the strategic bidders that bids above his monopoly price is sure to win and only pays her monopoly price. We provide more details in Appendix E.3. The boosted second price auction. (BSP) Two small variants of the boosted second price auction (BSP) (Golrezaei et al., 2017) can also be addressed. We deal with the BSP auction as it seems to be one of the state of the art alternative to the second price auction with personalized reserve price to be used in practice and deals with heterogeneities between bidders. In the original paper, the seller computes first the reserve prices of each bidder based on their bid distributions. Then, the algorithm computes a boosting factor γi > 0 for each bidder by counterfactually maximizing the revenue of the seller. More precisely, the auction is ran according to : Algorithm 1 Boosted second price (r, γ) - First each bidder i submits his bid bi - Define S as a set of bidders whose bids exceed their reserve price, i.e, S = {i : bi ri} - If the set S is empty, the item is not allocated. Otherwise, the item is allocated to bidder i with the highest boosted bid, i.e., i = argmaxi S{biγi} and she pays max{ri , maxi S,i =i {biγi/γi }}. For other bidders, the payment is zero. To explain intuitively our two objectives corresponding to this auction, we consider first the example of the family of generalized Pareto distributions. As the virtual value of all distributions in this family is affine, the boosted second price auction is strictly equivalent to the Myerson auction in this family. It explains why this auction can perform well in practice for the seller since it avoids to compute exactly the virtual value by approximating it by a linear fit. In the first model, we assume that the seller first makes an affine-fit through an L2 regression on the virtual values she observes. Then, she runs a Myerson auction based on these L2 fits. In the case of Generalized Pareto distributions, this procedure results exactly in the BSP auction. If we note ˆψBi the L2 fit of the virtual value corresponding to the bid distrution and ˆhβi = ˆψBi βi, we optimize the Myerson objective with ˆhβi corresponding to the fit of hβi. The main difference with the BSP auction for non-generalized pareto distributions is that the fit is used to compute the reserve price. In the second objective, we adress this limitation by computing first the reserve price ri based on the observed ψBi. Then, the algorithm computes a linear fit of ψBi on bids higher than ri. This linear fit is used as the boosting parameter for bidder i. To make the objective differentiable, we consider the relaxation where ri is assumed to be minr(ψBi(r) > 0). We verify retrospectively that the final strategy verifies minr(ψBi(r) > 0) = argmax(ri(1 FBi(ri))) Our experiments show that our approach can also be empirically generalized to more advanced, intricate, practical and modern settings, on top of working well theoretically on the lazy second price auctions. Learning to bid in revenue-maximizing auctions Auction Type K=2 K=3 K=4 Baselines Utility of truthful strategy (in revenue maximizing) 0.30 0.24 0.21 Utility of truthful strategy (in welfare maximizing) 0.50 0.33 0.25 Lazy second price auction Utility of strategic bidder 0.45 0.001 0.31 0.001 0.24 0.001 Uplift vs truthful bidding +50% +29% +14% Eager second price auction Utility of strategic bidder 0.52 0.02 0.33 0.02 0.25 0.02 Uplift vs truthful bidding +73% +37% +19% Myerson auction Utility of strategic bidder 0.64 0.001 0.45 0.001 0.35 0.001 Uplift vs truthful bidding +113% +87% +67% Boosted second price Utility of strategic bidder 0.48 0.03 0.41 0.001 0.32 0.001 Uplift vs truthful bidding +60% +71% +52% Table 1. All bidders have an exponential value distribution with parameter λ = 1. The strategic bidder has K-1 opponents bidding truthfully and having a reserve price equal to 1.0, their monopoly price. The reserve price of the strategic bidder is computed on her bid distribution. For each run, the evaluation is based on 106 samples, and we average the performances over 10 learnings. The utility of the strategic bidder can be higher that in the welfare-maximizing auction because revenue maximizing auctions remove competition below the reserve price, as illustrated by some examples in Appendix E. 4.3. Evaluation and results Two different value distributions were used to run the experiments: the exponential distribution in Table 1 and the uniform distribution in Table 2 in Appendix. We focus on a small number of bidders since it is where the reserve price play an important role for the seller. (Celis et al., 2014) also noticed that the median of the number of participants in online advertising auctions is 6. To compute the real performance of the strategy, we are conservative in the computation of the reserve price since we use ri = max(b|ψBi(b) < 0). We then compute the performance by computing the objective (expected utility) with Monte-Carlo simulations. For the lazy second price with personalized reserve price, we use for instance U(βi) = E (Xi hβi(Xi))Gi(β(Xi))1(βi(Xi) ri) . We compare the performance of our strategies with two baselines: the utility of one bidder bidding truthfully in a second price auction without reserve price (the welfare maximizing auction) and in a second price auction with monopoly price (with symmetric bidders, this auction is equivalent to the Myerson auction and is revenue-maximizing for the seller). For BSP, we report results for the second objective which is the closest one to the corresponding procedure of (Golrezaei et al., 2017). The first one gives similar uplifts that the strategic behavior in the Myerson auction. The order of magnitude of the uplift reported is significant. We observe that BSP and the Myerson auction are less robust to strategic behavior than the lazy second price auction with personalized reserve price. Indeed, as in the eager version of the second price auction there is no competition when all other bidders are below their reserve price. It is not the case for the lazy second price auction explaining why the uplift are slightly lower for this specific auction. We focused on the stationary case where the strategic bidder has to choose one strategy implying a bid distribution and the seller will immediately optimize their mechanism according to this bid distribution. However, our differential approach allows some generalizations. In future work, we could adapt the differential approach to more dynamic settings where the seller uses a particular dynamic to update the reserve price based on past bids of the bidders. 5. Conclusion In this paper, we showed that machine learning can be efficiently used on the bidder side to learn how to shade in revenue-maximizing auctions that are optimized based on past bids (or a distribution announced by the bidder to which she commits). Our work, both theoretical and practical, complements the classical approach using statistical learning from the seller s standpoint showing that strategic bidding can be implemented in some of the main revenue-maximizing auctions. Our work also raises questions about many automatic mechanism procedures since many are based on the assumption of having observed past truthful bids in order to optimize mechanisms. From an industry point of view, our work provides a new argument to come back to simple and more transparent auction mechanisms that are less subject to optimization on both the bidders and the seller s sides. Learning to bid in revenue-maximizing auctions Acknowledgements V. Perchet has benefited from the support of the FMJH Program Gaspard Monge in optimization and operations research (supported in part by EDF) and from the CNRS through the PEPS program. N. El Karoui gratefully acknowledges support from grant NSF DMS 1510172. Marc Abeille, Cl ement Calauz enes, Noureddine El Karoui, Thomas Nedelec, and Vianney Perchet. 2018. Explicit shading strategies for repeated truthful auctions. ar Xiv preprint ar Xiv:1805.00256. Itai Ashlagi, Constantinos Daskalakis, and Nima Haghpanah. 2016. Sequential mechanisms with ex-post participation guarantees. In Proceedings of EC 2016. Santiago R Balseiro, Vahab S Mirrokni, and Renato Paes Leme. 2017. Dynamic mechanisms with martingale utilities. In Management Science. L Elisa Celis, Gregory Lewis, Markus Mobius, and Hamid Nazerzadeh. 2014. Buy-it-now or take-a-chance: Price discrimination through randomized auctions. Management Science 60, 12 (2014), 2927 2948. Richard Cole and Tim Roughgarden. 2014. The sample complexity of revenue maximization. In Proceedings of Theory of computing. Vincent Conitzer and Tuomas Sandholm. 2002. Complexity of mechanism design. In Proceedings of UAI 2002. Morgan Kaufmann Publishers Inc., 103 110. Nikhil R Devanur, Zhiyi Huang, and Christos-Alexandros Psomas. 2016. The sample complexity of auctions with side information. In Proceedings of Theory of Computing. Paul D utting, Zhe Feng, Harikrishna Narasimhan, and David C Parkes. 2017. Optimal auctions through deep learning. ar Xiv preprint ar Xiv:1706.03459 (2017). Alessandro Epasto, Mohammad Mahdian, Vahab Mirrokni, and Song Zuo. 2018. Incentive-aware learning for large markets. In Proceedings of WWW 2018. Zhe Feng, Chara Podimata, and Vasilis Syrgkanis. 2018. Learning to bid without knowing your value. In Proceedings of EC 2018. N. Golrezaei, M. Lin, V. Mirrokni, and H. Nazerzadeh. 2017. Boosted Second-price Auctions for Heterogeneous Bidders. In Management Science. Jason D Hartline and Tim Roughgarden. 2009. Simple versus optimal mechanisms. In Proceedings of EC 2009. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2001. The Elements of Statistical Learning. Springer New York Inc., New York, NY, USA. Zhiyi Huang, Yishay Mansour, and Tim Roughgarden. 2018. Making the most of your samples. SIAM J. Comput. Yash Kanoria and Hamid Nazerzadeh. 2014. Dynamic Reserve Prices for Repeated Auctions: Learning from Bids. In Proceedings of WINE 2014. V. Krishna. 2009. Auction Theory. Andres M Medina and Mehryar Mohri. 2014. Learning theory and algorithms for revenue optimization in second price auctions with reserve. In Proceedings of ICML 2014. Paul R Milgrom and Steven Tadelis. 2018. How Artificial Intelligence and Machine Learning Can Impact Market Design. Technical Report. National Bureau of Economic Research. Mehryar Mohri and Andres Munoz. 2015. Revenue optimization against strategic buyers. In Proceedings of NIPS 2015. Jamie H Morgenstern and Tim Roughgarden. 2015. On the pseudo-dimension of nearly optimal auctions. In Proceedings of NIPS 2015. R. B. Myerson. 1981. Optimal Auction Design. Math. Oper. Res. 6, 1. Thomas Nedelec, Marc Abeille, Cl ement Calauz enes, Noureddine El Karoui, Benjamin Heymann, and Vianney Perchet. 2018. Thresholding the virtual value: a simple method to increase welfare and lower reserve prices in online auction systems. ar Xiv preprint ar Xiv:1808.06979 (2018). M. Ostrovsky and M. Schwarz. 2011. Reserve prices in internet advertising auctions: A field experiment. In Proceedings of EC 2011. Renato Paes Leme, Martin Pal, and Sergei Vassilvitskii. 2016. A field guide to personalized reserve prices. In Proceedings of WWW 2016. Pingzhong Tang and Yulong Zeng. 2018. The price of prior dependence in auctions. In Proceedings of EC 2018. Jonathan Weed, Vianney Perchet, and Philippe Rigollet. 2016. Online learning in repeated auctions. In Proceedings of COLT 2016. R. Wilson. 1987. Game-theoretic analyses of trading processes. In Advances in Economic Theory.