# Myersonian Regression

MIT, cliu568@mit.edu · Renato Paes Leme, Google Research, renatoppl@google.com · Jon Schneider, Google Research, jschnei@google.com

**Abstract.** Motivated by pricing applications in online advertising, we study a variant of linear regression with a discontinuous loss function that we term *Myersonian regression*. In this variant, we wish to find a linear function $f : \mathbb{R}^d \to \mathbb{R}$ that well approximates a set of points $(x_i, v_i) \in \mathbb{R}^d \times [0, 1]$ in the following sense: we receive a loss of $v_i$ when $f(x_i) > v_i$ and a loss of $v_i - f(x_i)$ when $f(x_i) \le v_i$. This arises naturally in the economic application of designing a pricing policy for differentiated items (where the loss is the gap between the performance of our policy and the optimal Myerson prices). We show that Myersonian regression is NP-hard to solve exactly and, furthermore, that no fully polynomial-time approximation scheme exists for Myersonian regression conditioned on the Exponential Time Hypothesis being true. In contrast to this, we demonstrate a polynomial-time approximation scheme for Myersonian regression that obtains an $\epsilon m$-additive approximation to the optimal possible revenue and can be computed in time $O(\exp(\mathrm{poly}(1/\epsilon)) \cdot \mathrm{poly}(m, n))$. We show that this algorithm is stable and generalizes well over distributions of samples.

## 1 Introduction

In economics, the Myerson price of a distribution is the price that maximizes the revenue when selling to a buyer whose value is drawn from that distribution. Mathematically, if $F$ is the cdf of the distribution, then the Myerson price is
$$p^* = \operatorname{argmax}_p \; p \cdot (1 - F(p)).$$
In many modern applications such as online marketplaces and advertising, the seller doesn't just set one price $p$ but must instead price a variety of differentiated products. In these settings, a seller must design a policy to price items based on their features in order to optimize revenue. Thus, in this paper we study the contextual learning version of Myersonian pricing.
More formally, we get to observe a training dataset $\{(x_t, v_t)\}_{t=1..m}$ representing the bids of a buyer on differentiated products. We will assume that the bids $v_t \in [0, 1]$ come from a truthful auction and hence represent the maximum value a buyer is willing to pay for the product. Each product is represented by a vector of features $x_t \in \mathbb{R}^n$ normalized such that $\|x_t\|_2 \le 1$. The goal of the learner is to design a policy that suggests a price $\phi(x_t)$ for each product $x_t$ with the goal of maximizing the revenue on the underlying distribution $\mathcal{D}$ from which the pairs $(x_t, v_t)$ are drawn. In practice, one would train a pricing policy on historical bids (training) and apply this policy on future products (testing). Mathematically, we want to solve
$$\max_{\phi \in \mathcal{P}} \; \mathbb{E}_{(x,v) \sim \mathcal{D}}[\mathrm{REV}(\phi(x); v)] \tag{PP}$$
where $\mathcal{P}$ is a class of pricing policies and $\mathrm{REV}$ is the revenue function (see Figure 1)
$$\mathrm{REV}(p; v) = \max(p, 0) \cdot \mathbf{1}\{p \le v\},$$
having only access to samples of $\mathcal{D}$.

34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.

Figure 1: Revenue function.

Medina and Mohri [2014a] establish that if the class of policies $\mathcal{P}$ has good generalization properties (defined in terms of Rademacher complexity) then it is enough to solve the problem on the empirical distribution given by the samples. The policy that optimizes over the empirical distribution is typically called Empirical Risk Minimization (ERM). The missing piece in this puzzle is the algorithm, i.e. how to solve the ERM problem. Previous papers (Medina and Mohri [2014a], Medina and Vassilvitskii [2017], Shen et al. [2019]) approached this problem by designing heuristics for ERM and giving conditions on the data under which the heuristics perform well. In this paper we give the first provable approximation algorithm for the ERM problem without assumptions on the data. We also establish hardness of approximation that complements our algorithmic results. We believe these are the first hardness results for this problem.
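For concreteness, the revenue function and the empirical (ERM) objective can be written in a few lines. This is a minimal sketch; the names `rev` and `empirical_revenue` are illustrative, not from the paper:

```python
import numpy as np

def rev(price, value):
    # REV(p; v) = max(p, 0) * 1{p <= v}: the item sells iff the price does
    # not exceed the bid, and negative prices earn nothing.
    return max(price, 0.0) if price <= value else 0.0

def empirical_revenue(w, X, v):
    # Total revenue of the linear pricing policy x -> <w, x> on a dataset.
    # X: (m, n) array of features with ||x_t||_2 <= 1; v: (m,) bids in [0, 1].
    # This is the ERM objective that (MR) maximizes subject to ||w||_2 <= 1.
    prices = X @ w
    return float(np.sum(np.maximum(prices, 0.0) * (prices <= v)))
```

Pricing just below a bid earns almost the full value while pricing just above it earns nothing; this discontinuity is exactly what makes the ERM objective non-convex and hard to optimize directly.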
Even establishing whether exactly solving ERM was NP-hard for a reasonable class of pricing policies was open prior to this work.

**Myersonian regression.** We now define formally the ERM problem for linear pricing policies¹, which we call Myersonian regression. Recall that the dataset is of the form $\{(x_t, v_t)\}_{t=1..m}$ with $x_t \in \mathbb{R}^n$, $\|x_t\|_2 \le 1$ and $v_t \in [0, 1]$. The goal is to find a linear pricing policy $x \mapsto \langle w, x \rangle$ with $\|w\|_2 \le 1$ that maximizes the revenue on the dataset, i.e.
$$\max_{w \in \mathbb{R}^n;\, \|w\|_2 \le 1} \; \sum_{t=1}^{m} \mathrm{REV}(\langle w, x_t \rangle; v_t) \tag{MR}$$
It is worth noting that we restrict ourselves to 1-Lipschitz pricing policies by only considering policies with $\|w\|_2 \le 1$. Bounding the Lipschitz constant of the pricing policy is important to ensure that the problem is stable and hence generalizable. We will contrast it with the unregularized version of (MR) in which the constraint $\|w\|_2 \le 1$ is omitted:
$$\max_{w \in \mathbb{R}^n} \; \sum_{t=1}^{m} \mathrm{REV}(\langle w, x_t \rangle; v_t) \tag{UMR}$$
Without the Lipschitz constraint it is possible to come up with arbitrarily close datasets, in the sense that $\|\tilde{x}_t - x_t\| \le \epsilon$ and $|\tilde{v}_t - v_t| \le \epsilon$, generating vastly different revenue even as $\epsilon \to 0$. We will also show that (UMR) is APX-hard, i.e. it is NP-hard to approximate within $1 - \epsilon_0$ for some constant $\epsilon_0 > 0$.

**Our Results.** Our main result is a polynomial time approximation scheme (PTAS) using dimensionality reduction. We present two versions of the same algorithm. The first version of the PTAS has running time $O(e^{\mathrm{poly}(1/\epsilon)} \cdot \mathrm{poly}(n, m))$

¹The choice of linear function is actually not very restrictive. A common trick in machine learning is to map the features $x$ to a different space and train a linear model on $\phi(x)$. For example if $d = 2$, the features are $(x_1, x_2)$. By mapping $\phi(x) = (1, x_1, x_2, x_1^2, x_2^2, x_1 x_2) \in \mathbb{R}^6$, and training a linear function on $\phi(x)$, we are actually optimizing over all quadratic functions on the original features. Similarly, we can optimize over any polynomial of degree $k$ or even more complex functions with an adequate mapping.
and outputs an $L$-Lipschitz pricing policy with $L = O(\sqrt{n})$ that is an $\epsilon m$-additive approximation of the optimal 1-Lipschitz pricing policy. The second version of the PTAS has running time $O(n^{\mathrm{poly}(1/\epsilon)} \cdot \mathrm{poly}(n, m))$ and outputs a 1-Lipschitz pricing policy that is an $\epsilon m$-additive approximation of the optimal 1-Lipschitz pricing policy.

We complement this result by showing that the Myersonian regression problem (MR) is NP-hard, using a reduction from 1-IN-3-SAT. While it is not surprising that solving Myersonian regression exactly is NP-hard given the discontinuity in the reward function, this has actually been left open by several previous works. In fact, the same reduction implies that under the Exponential Time Hypothesis (ETH) any algorithm approximating it within an $\epsilon m$ additive factor must run in time at least $e^{\Omega(\mathrm{poly}(1/\epsilon))}$, therefore ruling out a fully polynomial-time approximation scheme (FPTAS) for the problem. This hardness of approximation perfectly complements our algorithmic results, showing that our guarantees are essentially the best that one can hope for.

Finally we discuss stability and generalization of the problem. We show that (UMR) is unstable in the sense that arbitrarily small perturbations in the input can lead to completely different solutions. On the other hand (MR) is stable in the sense that the optimal solution varies continuously with the input. We also discuss the setting in which there is an underlying distribution $\mathcal{D}$ on datapoints $(x, v)$ and, while we optimize on samples from $\mathcal{D}$, we care about the loss with respect to the underlying distribution. We also discuss stability of our algorithms and how to extend them to other loss functions. Due to space constraints, most proofs are deferred to the Supplementary Material.

**Related work.** Our work is in the broad area of learning for revenue optimization. The papers in this area can be categorized along two axes: online vs batch learning and contextual vs non-contextual.
In the online non-contextual setting, Kleinberg and Leighton [2003] give the optimal algorithm for a single buyer, which was later extended to optimal reserve pricing in auctions by Cesa-Bianchi et al. [2013]. In the online contextual setting there is a stream of recent work deriving optimal regret bounds for pricing (Amin et al. [2014], Cohen et al. [2016], Javanmard and Nazerzadeh [2016], Javanmard [2017], Lobel et al. [2017], Mao et al. [2018], Leme and Schneider [2018], Shah et al. [2019]). For batch learning in non-contextual settings there is a long line of work establishing tight sample complexity bounds for revenue optimization (Cole and Roughgarden [2014], Morgenstern and Roughgarden [2015, 2016]) as well as approximation algorithms for reserve price optimization (Paes Leme et al. [2016], Roughgarden and Wang [2019], Derakhshan et al. [2019]).

Our paper is in the setting of contextual batch learning. Medina and Mohri [2014a] started the work on this setting by showing generalization bounds via Rademacher complexity. They also observe that the loss function is discontinuous and non-convex and propose the use of a surrogate loss. They bound the difference between the pricing loss and the surrogate loss and design algorithms for minimizing the surrogate loss. Medina and Vassilvitskii [2017] design a pricing algorithm based on clustering, where first the features are clustered and then a non-contextual pricing algorithm is used on each cluster. Shen et al. [2019] replace the pricing loss by a convex loss function derived from the theory of market equilibrium and argue that the clearing price is a good approximation of the optimal price in real datasets. A common theme in the previous papers is to replace the pricing loss by a more amenable loss function and give conditions under which the new loss approximates the pricing loss. Instead, here we study the pricing loss directly.
We give the first hardness proof in this setting and also give a $(1 - \epsilon)$-approximation without any conditions on the data other than bounded norm.

Our approximation algorithm for this problem works by projecting down to a lower-dimensional linear subspace and solving the problem on this subspace. In this way, it is reminiscent of the area of compressed learning (Calderbank et al. [2009]), which studies whether it is possible to learn directly in a projected ("compressed") space. More generally, our algorithm fits into a large body of work which leverages the Johnson-Lindenstrauss lemma for designing efficient algorithms (see e.g. Linial et al. [1995] and Har-Peled et al. [2012]).

Hardness of approximation has been established for non-contextual pricing problems with multiple buyers, e.g. Paes Leme et al. [2016], Roughgarden and Wang [2019]. Such hardness results hinge on the interaction between different buyers and don't translate to single-buyer settings. The hardness result in our paper is of a different nature.

## 2 Approximation Algorithms

The main ingredient in the design of our algorithms will be the Johnson-Lindenstrauss lemma:

**Lemma 2.1** (Johnson-Lindenstrauss). Given a vector $x \in \mathbb{R}^n$ with $\|x\|_2 = 1$, if $\tilde{J}$ is a $k \times n$ matrix formed by taking $k$ random orthogonal vectors as rows, for $k = O(\epsilon^{-2} \log \delta^{-1})$, and $J = \sqrt{n/k}\, \tilde{J}$, then:
$$\Pr\big(\big|\, \|Jx\|_2 - 1 \,\big| > \epsilon\big) \le \delta.$$

The following is a direct consequence of the JL lemma:

**Lemma 2.2.** Let $J$ be the JL-projection with $k = O(\epsilon^{-2} \log(1/\epsilon))$, let $w^*$ be the optimal solution to (MR), and let $x_t$ be a point in the dataset with $\langle w^*, x_t \rangle \ge \epsilon$. Then with probability at least $1 - \epsilon$ the following inequalities hold:
$$(1 - \epsilon)\, \|x_t\|_2 \le \|Jx_t\|_2 \le (1 + \epsilon)\, \|x_t\|_2$$
$$(1 - \epsilon)\, \langle w^*, x_t \rangle \le \langle Jw^*, Jx_t \rangle \le (1 + \epsilon)\, \langle w^*, x_t \rangle$$

**PTAS - Version 1:** For the first version of the algorithm, we randomly sample $1/\epsilon$ JL-projections $J$ with $k = O(\epsilon^{-2} \log(1/\epsilon))$ and search over an $\epsilon$-net of the projected space.
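The two ingredients can be sketched numerically: a scaled random orthonormal projection as in Lemma 2.1, and a brute-force search over a discretized net of the projected ball. This is an illustrative sketch, not the paper's implementation: for readability the grid step is a coarse `eps` rather than the much finer step used in the analysis, and `k` is passed in directly rather than set to $O(\epsilon^{-2}\log(1/\epsilon))$.

```python
import itertools
import numpy as np

def jl_projection(n, k, rng):
    # k random orthonormal rows (QR of a Gaussian matrix), scaled by
    # sqrt(n/k) so that ||J x||_2 concentrates around ||x||_2.
    G = rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(G)       # Q has orthonormal columns, shape (n, k)
    return np.sqrt(n / k) * Q.T  # shape (k, n)

def total_revenue(prices, v):
    # Sum of REV(p_t; v_t) = max(p_t, 0) * 1{p_t <= v_t} over the dataset.
    return float(np.sum(np.maximum(prices, 0.0) * (prices <= v)))

def net_search(X, v, k=3, eps=0.5, seed=0):
    # Project the data, try every grid point w_hat of the (1 + eps)-ball
    # in R^k, and lift the best one back as w = J^T w_hat.
    rng = np.random.default_rng(seed)
    m, n = X.shape
    J = jl_projection(n, k, rng)
    XJ = X @ J.T
    grid = np.arange(-1.0, 1.0 + eps / 2, eps)
    best_w, best_rev = np.zeros(n), -1.0
    for coords in itertools.product(grid, repeat=k):
        w_hat = np.array(coords)
        if np.linalg.norm(w_hat) > 1 + eps:
            continue
        r = total_revenue(XJ @ w_hat, v)
        if r > best_rev:
            best_rev, best_w = r, J.T @ w_hat
    return best_w, best_rev
```

The grid has $(1/\epsilon)^{O(k)}$ points, which is the $e^{\mathrm{poly}(1/\epsilon)}$ factor in the running time, and the lifted solution $J^\top \hat{w}$ can have norm up to $O(\sqrt{n})$; this is exactly the Lipschitz blow-up that Version 2 of the PTAS removes.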
For each projection, we define a set of discretized vectors as:
$$D = \{\hat{w} : \hat{w} = \epsilon^5 z \text{ for } z \in \mathbb{Z}^k,\ \|\hat{w}\|_2 \le 1 + \epsilon\}$$
Then we search for the vector $\hat{w} \in D$ that maximizes
$$\sum_{t=1}^{m} \mathrm{REV}(\langle \hat{w}, Jx_t \rangle; v_t) \tag{1}$$
Over all projections, we output the vector $w = J^\top \hat{w}$ that maximizes the revenue.

**Theorem 2.3.** There is an algorithm with running time $O(e^{\mathrm{poly}(1/\epsilon)}\, \mathrm{poly}(n, m))$ that outputs a vector $w$ with $\|w\|_2 \le O(\sqrt{n})$ such that:
$$\sum_t \mathrm{REV}(\langle w, x_t \rangle; v_t) \ge R - O(\epsilon m)$$
where $R = \sum_t \mathrm{REV}(\langle w^*, x_t \rangle; v_t)$ for the optimal $w^*$ with $\|w^*\|_2 \le 1$.

*Proof.* The running time follows from the fact that $|D| \le (1/\epsilon)^{O(k)} = e^{O(\mathrm{poly}(1/\epsilon))}$. We show the approximation guarantee in three steps:

**Step 1: defining good points.** Let $w^*$ be the optimal solution to (MR). Say that a datapoint $(x_t, v_t)$ is *good* if $\epsilon \le \langle w^*, x_t \rangle \le v_t$ and the event in Lemma 2.2 happens. If $G$ is the set of indices $t$ corresponding to good datapoints, then with at least $1/2$ probability:
$$\sum_{t \in G} \langle w^*, x_t \rangle \ge R - 2\epsilon m$$
This is true since the points with $\langle w^*, x_t \rangle < \epsilon$ can only affect the revenue by at most $\epsilon$ each, and each of the remaining $m'$ points can fail to be good with probability at most $\epsilon$. The revenue loss in expectation is at most $\epsilon m'$, so by Markov's inequality it is at most $2\epsilon m'$ with probability at least $1/2$.

**Step 2: projection of the optimal solution.** Define $w' = (1 - 2\epsilon) Jw^*$ and define $\hat{w}$ to be the vector in $D$ obtained by rounding all coordinates of $w'$ to the nearest multiple of $\epsilon^5$. For any good index $t \in G$ we have:
$$\langle \hat{w}, Jx_t \rangle = \langle \hat{w} - w', Jx_t \rangle + \langle w', Jx_t \rangle \le (1 + \epsilon)\epsilon^5 \sqrt{k} + (1 - 2\epsilon)\langle Jw^*, Jx_t \rangle \le \epsilon^4 + (1 - \epsilon)\langle w^*, x_t \rangle \le v_t$$
and hence that datapoint generates revenue, since the price is below the value. And:
$$\langle \hat{w}, Jx_t \rangle = \langle \hat{w} - w', Jx_t \rangle + \langle w', Jx_t \rangle \ge -(1 + \epsilon)\epsilon^5 \sqrt{k} + (1 - 2\epsilon)\langle Jw^*, Jx_t \rangle \ge -\epsilon^4 + (1 - 5\epsilon)\langle w^*, x_t \rangle$$

**Step 3: bounding the revenue.**
Finally, note that $\langle w, x_t \rangle = \langle J^\top \hat{w}, x_t \rangle = \langle \hat{w}, Jx_t \rangle$, and for every good $t$ we have $\langle \hat{w}, Jx_t \rangle \le v_t$, so:
$$\sum_t \mathrm{REV}(\langle w, x_t \rangle; v_t) \ge \sum_{t \in G} \langle \hat{w}, Jx_t \rangle \ge \sum_{t \in G} \big[(1 - 5\epsilon)\langle w^*, x_t \rangle - O(\epsilon^4)\big] \ge (1 - 5\epsilon)(R - 2\epsilon m) - O(\epsilon m) = R - O(\epsilon m)$$
Since we sample $1/\epsilon$ independent JL projections and for each, we find an $O(\epsilon m)$-additive approximation with probability at least $1/2$, our algorithm achieves expected revenue $R - O(\epsilon m)$, as desired. □

**PTAS - Version 2:** The main drawback of the first version of the PTAS is that we output an $O(\sqrt{n})$-Lipschitz pricing policy that is an approximation to the optimal 1-Lipschitz pricing policy. With an increase in running time, it is possible to obtain the same approximation with a 1-Lipschitz pricing policy (i.e. $\|w\|_2 \le 1$). For that we will increase the dimension of the JL projection to $k = O(\epsilon^{-2} \log(n/\epsilon))$. This will allow us to have the following conditions hold simultaneously for all datapoints with probability at least $1 - \epsilon$:
$$(1 - \epsilon)\, \|x_t\|_2 \le \|Jx_t\|_2 \le (1 + \epsilon)\, \|x_t\|_2$$
$$\langle w^*, x_t \rangle - \epsilon^2 \le \langle Jw^*, Jx_t \rangle \le \langle w^*, x_t \rangle + \epsilon^2$$
This follows from the same argument as in Lemma 2.2, taking a union bound over all points. Now we repeat the following process $(1/\epsilon)^{O(k \log(1/\epsilon))}$ times: Choose a random point $\hat{w}$ in the unit ball in $\mathbb{R}^k$. For each such $\hat{w}$ we define the *important set* $\hat{G}(\hat{w})$ by $t \in \hat{G}(\hat{w})$ if $\epsilon/10 \le \langle \hat{w}, Jx_t \rangle \le v_t$. Now, we check (by solving a convex program) if there exists a vector $w \in \mathbb{R}^n$ with $\|w\|_2 \le 1$ such that:
$$\frac{\langle \hat{w}, Jx_t \rangle}{1 + 5\epsilon} \le \langle w, x_t \rangle \le v_t, \qquad \forall t \in \hat{G}(\hat{w})$$
If it exists, call it $w(\hat{w})$; otherwise discard $\hat{w}$. Over all $(1/\epsilon)^{O(k \log(1/\epsilon))}$ iterations, for all vectors $\hat{w}$ that weren't discarded, choose the one maximizing the objective (1) and output $w(\hat{w})$.

**Theorem 2.4.** There is an algorithm with running time $O(n^{\mathrm{poly}(1/\epsilon)}\, \mathrm{poly}(n, m))$ that outputs a vector $w$ with $\|w\|_2 \le 1$ such that:
$$\sum_t \mathrm{REV}(\langle w, x_t \rangle; v_t) \ge R - O(\epsilon m)$$
where $R = \sum_t \mathrm{REV}(\langle w^*, x_t \rangle; v_t)$ for the optimal $w^*$ with $\|w^*\|_2 \le 1$.

*Proof.* **Step 1: When $\hat{w}$ lies close to the projection of the optimum, the convex program is feasible.** Let $w' = (1 - 2\epsilon) Jw^*$. If $\|\hat{w} - w'\| \le \epsilon^5$ we will show that the convex program is solvable.
For $t \in \hat{G}(\hat{w})$ we have
$$\langle w^*, x_t \rangle \le \frac{1}{1 - 2\epsilon}\langle w', Jx_t \rangle + \epsilon^2 \le (1 + 3\epsilon)\big(\langle \hat{w}, Jx_t \rangle + (1 + \epsilon)\epsilon^5\big) + \epsilon^2 \le (1 + 5\epsilon)\, v_t$$
and
$$\langle w^*, x_t \rangle \ge \frac{1}{1 - 2\epsilon}\langle w', Jx_t \rangle - \epsilon^2 \ge (1 + 2\epsilon)\langle w', Jx_t \rangle - \epsilon^2 \ge (1 + 2\epsilon)\big(\langle \hat{w}, Jx_t \rangle - (1 + \epsilon)\epsilon^5\big) - \epsilon^2 > \langle \hat{w}, Jx_t \rangle.$$
Thus $w^*/(1 + 5\epsilon)$ is a solution to the convex program.

**Step 2: When $\hat{w}$ lies close to the projection of the optimum, any solution to the convex program achieves a good approximation.** If $\|\hat{w} - w'\| \le \epsilon^5$, then for each data point $x_t$ with $t \in \hat{G}(\hat{w})$,
$$\langle \hat{w}, Jx_t \rangle = \langle \hat{w} - w', Jx_t \rangle + \langle w', Jx_t \rangle \ge -(1 + \epsilon)\epsilon^5 + (1 - 2\epsilon)\langle Jw^*, Jx_t \rangle \ge -(1 + \epsilon)\epsilon^5 + (1 - 5\epsilon)\langle w^*, x_t \rangle.$$
Note the last step holds because $\langle w^*, x_t \rangle > \langle \hat{w}, Jx_t \rangle \ge \epsilon/10$ and $\langle Jw^*, Jx_t \rangle \ge \langle w^*, x_t \rangle - \epsilon^2$. Next, we deal with the datapoints with $t \notin \hat{G}(\hat{w})$. For these datapoints, either $\langle \hat{w}, Jx_t \rangle < \epsilon/10$, in which case
$$\langle w^*, x_t \rangle \le (1 + 5\epsilon)\langle w', Jx_t \rangle + \epsilon^2 \le (1 + 5\epsilon)\big(\langle \hat{w}, Jx_t \rangle + (1 + \epsilon)\epsilon^5\big) + \epsilon^2 \le \frac{11\epsilon}{100},$$
or $\langle \hat{w}, Jx_t \rangle > v_t$, in which case
$$\langle w^*, x_t \rangle \ge \frac{1}{1 - 2\epsilon}\langle w', Jx_t \rangle - \epsilon^2 \ge (1 + 2\epsilon)\langle w', Jx_t \rangle - \epsilon^2 \ge (1 + 2\epsilon)\big(\langle \hat{w}, Jx_t \rangle - (1 + \epsilon)\epsilon^5\big) - \epsilon^2 > v_t,$$
so such a datapoint generates no revenue even under $w^*$. Thus, the total revenue achieved by $w(\hat{w})$ is at least
$$\sum_{t \in \hat{G}(\hat{w})} \Big[-2\epsilon^5 + (1 - 5\epsilon)\, \mathrm{REV}(\langle w^*, x_t \rangle; v_t)\Big] \ge -2\epsilon^5 m + (1 - 10\epsilon) \sum_{t \in \hat{G}(\hat{w})} \mathrm{REV}(\langle w^*, x_t \rangle; v_t) \ge -2\epsilon^5 m + (1 - 10\epsilon)\Big(\sum_t \mathrm{REV}(\langle w^*, x_t \rangle; v_t) - \frac{11\epsilon m}{100}\Big) \ge \sum_t \mathrm{REV}(\langle w^*, x_t \rangle; v_t) - 25\epsilon m.$$

**Step 3: The algorithm finds a good approximation with probability $1 - O(\epsilon)$.** It suffices to show that our algorithm will choose some $\hat{w}$ such that $\|\hat{w} - w'\| \le \epsilon^5$ with probability $1 - O(\epsilon)$. Note $\|w'\|_2 \le (1 - 2\epsilon)(1 + \epsilon) \le 1 - \epsilon$. Thus the probability that a single random $\hat{w}$ lands within distance $\epsilon^5$ of $w'$ is at least $\epsilon^{5k}$. Since we choose $(1/\epsilon)^{O(k \log(1/\epsilon))}$ different points $\hat{w}$ independently at random, the probability that at least one of them lands within distance $\epsilon^5$ of $w'$ is at least $1 - \epsilon$. □

## 3 Hardness of approximation

Unlike $\ell_2$ and $\ell_1$ regression, Myersonian regression is NP-hard. We prove two hardness results. First we show that without the assumption $\|w\|_2 \le 1$, achieving a constant factor approximation is NP-hard.
Then we show that under the Exponential Time Hypothesis (ETH), any algorithm that achieves an $\epsilon m$-additive approximation for Myersonian regression must run in time at least $\exp(\Omega(\mathrm{poly}(1/\epsilon)))$.

**1-in-3-SAT.** We will rely on reductions from the 1-IN-3-SAT problem, which is NP-complete. The input to 1-IN-3-SAT is an expression in conjunctive normal form with 3 literals per clause (i.e. a collection of clauses of the type $X_i \vee X_j \vee X_k$). The problem is to determine if there is a truth assignment such that exactly one literal in each clause is true (and the remaining are false).

**GAP 1-in-3-SAT.** We will need a slightly stronger hardness result: 1-in-3-SAT is not only hard to solve exactly, but it is also hard to approximate the maximum number of clauses that can be satisfied. In particular, there are constants $0 < c_1 < c_2 \le 1$ such that given a 1-in-3-SAT instance, it is NP-hard to distinguish the following two cases:

- At most a $c_1$-fraction of the clauses can be satisfied.
- At least a $c_2$-fraction of the clauses can be satisfied.

**ETH.** The Exponential Time Hypothesis says that 3-SAT with $N$ variables can't be solved in time $O(2^{cN} \mathrm{poly}(N))$ for some constant $c > 0$. Since there is a linear time reduction from 3-SAT to 1-IN-3-SAT and 1-IN-3-SAT is NP-complete, ETH implies that there is no $O(2^{cN} \mathrm{poly}(N))$-time algorithm for 1-IN-3-SAT.

**Lemma 3.1.** There exists a constant $\epsilon > 0$ for which it is possible to reduce (in poly-time) an instance of $(c_1, c_2)$-GAP 1-in-3-SAT to computing a $(1 - \epsilon)$-approximation for an instance of the unregularized Myersonian regression problem (UMR).

**Theorem 3.2.** There is some constant $\epsilon > 0$ for which obtaining a $(1 - \epsilon)$-approximation for the unregularized Myersonian regression problem (UMR) is NP-hard.

The proof follows directly from Lemma 3.1 and the NP-hardness of GAP 1-IN-3-SAT. The previous result rules out a PTAS for (UMR). In contrast, we will see that while (MR) is still NP-hard to solve exactly, it admits a PTAS.
However, runtime that is superpolynomial in $1/\epsilon$ is necessary.

**Lemma 3.3.** It is possible to transform (in poly-time) an instance of 1-IN-3-SAT with $N$ variables into an instance of Myersonian regression with the promise $\|w\|_2 \le 1$, $n = O(N)$ and $m = O(N^5)$, in such a way that a satisfiable 1-IN-3-SAT instance will map to an instance of Myersonian regression with revenue $R$, where $R = O(N^{2.5})$, while any unsatisfiable instance will map to an instance with revenue at most $R - 0.5 N^{0.5}$.

If we assume ETH, we obtain a bound on the runtime of any approximation algorithm:

**Theorem 3.4.** Under ETH, any algorithm that achieves an $\epsilon m$-additive (or $(1 - \epsilon)$-multiplicative) approximation for Myersonian regression must run in time at least $2^{\Omega(\epsilon^{-1/6})} \mathrm{poly}(n, m)$.

*Proof.* Assume there is an approximation algorithm for Myersonian regression with running time $O(2^{c\, \epsilon^{-1/6}} \mathrm{poly}(n, m))$, where $c$ is the constant in the definition of ETH. Then for an instance of 1-IN-3-SAT with $N$ variables, consider the transformation in Lemma 3.3 and apply the approximation algorithm with $\epsilon = O(1/N^6)$. Such an approximation algorithm would run in time $O(2^{cN} \mathrm{poly}(N))$ and distinguish between the satisfiable and unsatisfiable cases of 1-IN-3-SAT, contradicting ETH. □

## 4 Stability, Generalization and Extensions

We start by commenting on the importance of the constraint $\|w\|_2 \le 1$ imposed on the problem (MR), which is closely related to stability and generalization.

**Offset term.** It will be convenient to allow a constant term in the pricing loss, i.e. we will look at pricing functions of the type $x \mapsto w_0 + \langle w, x \rangle$. This is equivalent to assuming that all the datapoints have $x_{t,1} = 1$ and $\|x_t\|_2 \le \sqrt{2}$. We renormalize such that we still have $\sum_{i=1}^n (x_{t,i})^2 \le 1$. We will make this assumption for the rest of this section. We note that this assumption doesn't affect the results in the previous sections. The positive results remain unchanged since we don't have any assumption on the data other than the norm being bounded by a constant.
Our hardness results can be easily adapted to the setting with an offset term. We can essentially force the constant term to be very small by adding $N^{103}$ data points with $v_t = 1/N^{100}$, $x_{t,1} = 1$ and all other coordinates 0.

**Stability.** We start by discussing the constraint $\|w\|_2 \le 1$ imposed on the problem (MR). Without this constraint, it is possible to completely change the objective function with a tiny perturbation of the problem data. Let $R$ be the optimal revenue in the unregularized Myersonian regression (UMR) for some instance $(x_t, v_t)$. A natural upper bound on $R$ is the maximum welfare, given by $W = \sum_{t=1}^m v_t$. Typically $R < W$. Consider such an instance. For any fixed $\delta > 0$ consider the following two instances:
$$\bar{x}_t = (x_t, 0) \in \mathbb{R}^{n+1} \qquad \tilde{x}_t = (x_t, \delta v_t) \in \mathbb{R}^{n+1}$$
The instances $(\bar{x}_t, v_t)_{t=1..m}$ and $(\tilde{x}_t, v_t)_{t=1..m}$ are very close to each other in the sense that the labels are the same and the features satisfy $\|\bar{x}_t - \tilde{x}_t\| \le \delta$ for all $t$. However, the optimal revenue of $(\bar{x}_t, v_t)_{t=1..m}$ under (UMR) is $R$, while the optimal revenue of $(\tilde{x}_t, v_t)_{t=1..m}$ is $W$, obtained by choosing $w = (0, \delta^{-1})$. This is true even as $\delta \to 0$. On the other hand, the solution of the regularized problem (MR) is Lipschitz-continuous in the data.

**Theorem 4.1.** Consider two instances $(\bar{x}_t, \bar{v}_t)_{t=1..m}$ and $(\tilde{x}_t, \tilde{v}_t)_{t=1..m}$ such that $\|\bar{x}_t - \tilde{x}_t\| \le \delta$ and $|\bar{v}_t - \tilde{v}_t| \le \delta$ for all $t$. If $\bar{R}$ and $\tilde{R}$ are the respective optimal values of (MR), then:
$$|\bar{R} - \tilde{R}| \le O(\delta m)$$

**Uniform Convergence and Generalization.** To understand generalization, we are concerned with the performance of the algorithm on a distribution $\mathcal{D}$ that generates datapoints $(x_t, v_t)$. We sample $m$ points from this distribution and obtain a dataset $S = \{(x_t, v_t);\ t = 1..m\}$. We want to compare, across all pricing policies $w$, the objective function on the sample:
$$F_S(w) = \frac{1}{m} \sum_{t=1}^m \mathrm{REV}(\langle w, x_t \rangle; v_t)$$
with the performance on the original distribution:
$$F_{\mathcal{D}}(w) = \mathbb{E}_{(x,v) \sim \mathcal{D}}\, \mathrm{REV}(\langle w, x \rangle; v)$$
Medina and Mohri [2014a] provide bounds for $|F_S(w) - F_{\mathcal{D}}(w)|$ by studying the empirical Rademacher complexity of the pricing function.
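The instability of (UMR) described above is easy to reproduce numerically: appending a coordinate equal to $\delta v_t$ (a perturbation of size at most $\delta$) lets the unbounded policy $w = (0, \ldots, 0, \delta^{-1})$ price every item at exactly its value, capturing the full welfare $W$. A minimal sketch, with all instance data generated synthetically:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, delta = 50, 3, 1e-6
X = rng.standard_normal((m, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)
v = rng.uniform(0.1, 1.0, size=m)

# Instance 1: extra feature identically zero. Instance 2: extra feature
# delta * v_t. The two feature matrices differ by at most delta per row.
X_bar = np.hstack([X, np.zeros((m, 1))])
X_tilde = np.hstack([X, (delta * v)[:, None]])
assert np.all(np.linalg.norm(X_bar - X_tilde, axis=1) <= delta)

# Without the constraint ||w||_2 <= 1, this huge-norm policy prices each
# item at exactly its bid on the perturbed instance:
w = np.zeros(n + 1)
w[-1] = 1.0 / delta
prices = X_tilde @ w
assert np.allclose(prices, v)  # revenue = sum(v) = full welfare W
```

On the unperturbed instance the extra coordinate is useless, so the optimal (UMR) values of two $\delta$-close instances differ by $W - R$ even as $\delta \to 0$; the constraint $\|w\|_2 \le 1$ rules out such policies and restores the $O(\delta m)$ stability of Theorem 4.1.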
The following statement follows directly from Theorem 3 in their paper. Note that while their theorem bounds only one direction, the same proof also works for the other direction.

**Theorem 4.2** (Medina and Mohri [2014a]). For any $\delta > 0$ it holds with probability $1 - \delta$ over the choice of a sample $S$ of size $m$ that:
$$|F_S(w) - F_{\mathcal{D}}(w)| \le O\left(\sqrt{\frac{n \log(m/n) + \log(1/\delta)}{m}}\right)$$

**Corollary 4.3.** Let $w_S$ be the output of the ERM algorithm on a sample $S$ of size $m = O(\epsilon^{-2}[n \log(n/\epsilon) + \log(1/\delta)])$. Then with probability $1 - \delta$ we have:
$$F_{\mathcal{D}}(w_S) \ge \max_{\|w\|_2 \le 1} F_{\mathcal{D}}(w) - O(\epsilon)$$

**Extensions to other loss functions.** While our results are phrased in terms of the pricing loss, they hold for any lower-semi-Lipschitz reward function, i.e. any function $r$ such that $r(p - \delta; v) \ge r(p; v) - \delta$ for all $\delta > 0$. An important example, studied in Medina and Mohri [2014a] and Shen et al. [2019], is the revenue of a second price auction with reserve price $p$. Given the two highest bids $v_1 \ge v_2$, the revenue function is written as:
$$\mathrm{SPA}(p; v_1, v_2) = \max(v_2, p) \cdot \mathbf{1}\{p \le v_1\}$$

## 5 Conclusion

We give the first approximation algorithm for learning a linear pricing function without any assumption on the data other than normalization. This provides a key missing component to the field of learning for revenue optimization, where ERM was shown to be optimal in Medina and Mohri [2014a] but there were no algorithms with provable guarantees for it. Our algorithm is polynomial in the number of feature dimensions $n$ and in the number of datapoints $m$, but exponential in the accuracy parameter $1/\epsilon$. We show that the exponential dependency on $1/\epsilon$ is necessary.

In this paper we assume that the bids in the dataset represent the buyer's true willingness to pay, as in Medina and Mohri [2014a], Medina and Vassilvitskii [2017], Shen et al. [2019]. An interesting avenue of investigation for future work is to understand how strategic buyers would change their bids in response to a contextual batch learning algorithm and how to design algorithms that are aware of strategic response.
This is a well studied problem in non-contextual online learning (Amin et al. [2013], Medina and Mohri [2014b], Drutsa [2017], Vanunts and Drutsa [2019], Nedelec et al. [2019]) as well as in online contextual learning (Amin et al. [2014], Golrezaei et al. [2019]). Formulating a model of strategic response to batch learning algorithms is itself open.

**Broader Impact Statement.** While our work is largely theoretical, we feel it can have downstream impact in the design of better marketplaces such as those for internet advertisement. Better pricing can increase both the efficiency of the market and the revenue of the platform. The latter is important since the revenue of platforms keeps such services (e.g. online newspapers) free for most users.

**Acknowledgments and Disclosure of Funding.** No funding to disclose. The authors would like to thank Andrés Muñoz Medina for helpful discussions.

## References

K. Amin, A. Rostamizadeh, and U. Syed. Learning prices for repeated auctions with strategic buyers. In Advances in Neural Information Processing Systems, pages 1169–1177, 2013.

K. Amin, A. Rostamizadeh, and U. Syed. Repeated contextual auctions with strategic buyers. In Advances in Neural Information Processing Systems 27, pages 622–630, 2014.

R. Calderbank, S. Jafarpour, and R. Schapire. Compressed learning: Universal sparse dimensionality reduction and learning in the measurement domain. Preprint, 2009.

N. Cesa-Bianchi, C. Gentile, and Y. Mansour. Regret minimization for reserve prices in second-price auctions. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1190–1204, 2013.

M. C. Cohen, I. Lobel, and R. Paes Leme. Feature-based dynamic pricing. In Proceedings of the 2016 ACM Conference on Economics and Computation, page 817. ACM, 2016.

R. Cole and T. Roughgarden. The sample complexity of revenue maximization.
In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, pages 243–252, 2014.

M. Derakhshan, N. Golrezaei, and R. Paes Leme. LP-based approximation for personalized reserve prices. In Proceedings of the 2019 ACM Conference on Economics and Computation, page 589, 2019.

A. Drutsa. Horizon-independent optimal pricing in repeated auctions with truthful and strategic buyers. In Proceedings of the 26th International Conference on World Wide Web, pages 33–42, 2017.

N. Golrezaei, A. Javanmard, and V. Mirrokni. Dynamic incentive-aware learning: Robust pricing in contextual auctions. In Advances in Neural Information Processing Systems, pages 9756–9766, 2019.

S. Har-Peled, P. Indyk, and R. Motwani. Approximate nearest neighbor: Towards removing the curse of dimensionality. Theory of Computing, 8(1):321–350, 2012.

A. Javanmard. Perishability of data: dynamic pricing under varying-coefficient models. The Journal of Machine Learning Research, 18(1):1714–1744, 2017.

A. Javanmard and H. Nazerzadeh. Dynamic pricing in high-dimensions. Working paper, University of Southern California, 2016.

R. Kleinberg and T. Leighton. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In 44th Annual IEEE Symposium on Foundations of Computer Science, pages 594–605. IEEE, 2003.

R. P. Leme and J. Schneider. Contextual search via intrinsic volumes. In 59th IEEE Annual Symposium on Foundations of Computer Science, pages 268–282, 2018.

N. Linial, E. London, and Y. Rabinovich. The geometry of graphs and some of its algorithmic applications. Combinatorica, 15(2):215–245, 1995.

I. Lobel, R. P. Leme, and A. Vladu. Multidimensional binary search for contextual decision-making. In Proceedings of the 2017 ACM Conference on Economics and Computation, page 585, 2017.

J. Mao, R. P. Leme, and J. Schneider.
Contextual pricing for Lipschitz buyers. In Advances in Neural Information Processing Systems 31, pages 5648–5656, 2018.

A. M. Medina and M. Mohri. Learning theory and algorithms for revenue optimization in second price auctions with reserve. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 262–270, 2014a.

A. M. Medina and M. Mohri. Revenue optimization in posted-price auctions with strategic buyers. arXiv preprint arXiv:1411.6305, 2014b.

A. M. Medina and S. Vassilvitskii. Revenue optimization with approximate bid predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 1856–1864. Curran Associates Inc., 2017.

J. Morgenstern and T. Roughgarden. Learning simple auctions. In Conference on Learning Theory, pages 1298–1318, 2016.

J. H. Morgenstern and T. Roughgarden. On the pseudo-dimension of nearly optimal auctions. In Advances in Neural Information Processing Systems, pages 136–144, 2015.

T. Nedelec, N. El Karoui, and V. Perchet. Learning to bid in revenue maximizing auctions. In Companion Proceedings of The 2019 World Wide Web Conference, pages 934–935, 2019.

R. Paes Leme, M. Pal, and S. Vassilvitskii. A field guide to personalized reserve prices. In Proceedings of the 25th International Conference on World Wide Web, pages 1093–1102, 2016.

T. Roughgarden and J. R. Wang. Minimizing regret with multiple reserves. ACM Transactions on Economics and Computation (TEAC), 7(3):1–18, 2019.

V. Shah, R. Johari, and J. Blanchet. Semi-parametric dynamic contextual pricing. In Advances in Neural Information Processing Systems, pages 2360–2370, 2019.

W. Shen, S. Lahaie, and R. P. Leme. Learning to clear the market. In International Conference on Machine Learning (ICML), 2019.

A. Vanunts and A. Drutsa.
Optimal pricing in repeated posted-price auctions with different patience of the seller and the buyer. In Advances in Neural Information Processing Systems, pages 939–951, 2019.