The Pseudo-Dimension of Near-Optimal Auctions

Jamie Morgenstern*
Computer and Information Science, University of Pennsylvania, Philadelphia, PA
jamiemor@cis.upenn.edu

Tim Roughgarden
Stanford University, Palo Alto, CA
tim@cs.stanford.edu

*Part of this work was done while visiting Stanford University. Partially supported by a Simons Award for Graduate Students in Theoretical Computer Science, as well as NSF grant CCF-1415460.

Abstract

This paper develops a general approach, rooted in statistical learning theory, to learning an approximately revenue-maximizing auction from data. We introduce t-level auctions to interpolate between simple auctions, such as welfare maximization with reserve prices, and optimal auctions, thereby balancing the competing demands of expressivity and simplicity. We prove that such auctions have small representation error, in the sense that for every product distribution F over bidders' valuations, there exists a t-level auction with small t and expected revenue close to optimal. We show that the set of t-level auctions has modest pseudo-dimension (for polynomial t) and therefore leads to small learning error. One consequence of our results is that, in arbitrary single-parameter settings, one can learn a mechanism with expected revenue arbitrarily close to optimal from a polynomial number of samples.

1 Introduction

In the traditional economic approach to identifying a revenue-maximizing auction, one first posits a prior distribution over all unknown information, and then solves for the auction that maximizes expected revenue with respect to this distribution. The first obstacle to making this approach operational is the difficulty of formulating an appropriate prior. The second obstacle is that, even if an appropriate prior distribution is available, the corresponding optimal auction can be far too complex and unintuitive for practical use. This motivates the goal of identifying auctions that are simple and yet nearly optimal in terms of expected revenue. In this paper, we apply tools from learning theory to address both of these challenges.

In our model, we assume that bidders' valuations (i.e., willingness to pay) are drawn from an unknown distribution F. A learning algorithm is given i.i.d. samples from F; for example, these could represent the outcomes of comparable transactions that were observed in the past. The learning algorithm suggests an auction to use for future bidders, and its performance is measured by comparing the expected revenue of its output auction to that earned by the optimal auction for the distribution F.

The possible outputs of the learning algorithm correspond to some set C of auctions. We view C as a design parameter that can be selected by a seller, along with the learning algorithm. A central goal of this work is to identify classes C that balance representation error (the amount of revenue sacrificed by restricting to auctions in C) with learning error (the generalization error incurred by learning over C from samples). That is, we seek a set C that is rich enough to contain an auction that closely approximates an optimal auction (whatever F might be), yet simple enough that the best auction in C can be learned from a small amount of data. Learning theory offers tools both for rigorously defining the simplicity of a set C of auctions, through well-known complexity measures such as the
pseudo-dimension, and for quantifying the amount of data necessary to identify the approximately best auction from C. Our goal of learning a near-optimal auction also requires understanding the representation error of different classes C; this task is problem-specific, and we develop the necessary arguments in this paper.

1.1 Our Contributions

The primary contributions of this paper are the following. First, we show that well-known concepts from statistical learning theory can be directly applied to reason about learning from data an approximately revenue-maximizing auction. Precisely, for a set C of auctions and an arbitrary unknown distribution F over valuations in [1, H], $O\!\left(\frac{H^2}{\epsilon^2}\, d_C \log \frac{H}{\epsilon}\right)$ samples from F are enough to learn (up to a $1 - \epsilon$ factor) the best auction in C, where $d_C$ denotes the pseudo-dimension of the set C (defined in Section 2). Second, we introduce the class of t-level auctions, to interpolate smoothly between simple auctions, such as welfare maximization subject to individualized reserve prices (when $t = 1$), and the complex auctions that can arise as optimal auctions (as $t \to \infty$). Third, we prove that in quite general auction settings with n bidders, the pseudo-dimension of the set of t-level auctions is $O(nt \log nt)$. Fourth, we quantify the number t of levels required for the set of t-level auctions to have low representation error, with respect to the optimal auctions that arise from arbitrary product distributions F. For example, for single-item auctions and several generalizations thereof, if $t = \Omega(H/\epsilon)$, then for every product distribution F there exists a t-level auction with expected revenue at least $1 - \epsilon$ times that of the optimal auction for F.

In the above sense, the t in t-level auctions is a tunable "sweet spot," allowing a designer to balance the competing demands of expressivity (to achieve near-optimality) and simplicity (to achieve learnability). For example, given a fixed amount of past data, our results indicate how much auction complexity (in the form of the number of levels t) one can employ without risking overfitting the auction to the data. Alternatively, given a target approximation factor $1 - \epsilon$, our results give sufficient conditions on t, and consequently on the number of samples, needed to achieve this approximation factor. The resulting sample complexity upper bound has polynomial dependence on H, $1/\epsilon$, and the number n of bidders. Known results [1, 8] imply that any method of learning a $(1-\epsilon)$-approximate auction from samples must have sample complexity with polynomial dependence on all three of these parameters, even for single-item auctions.

1.2 Related Work

The present work shares much of its spirit and high-level goals with Balcan et al. [4], who proposed applying statistical learning theory to the design of near-optimal auctions. The first-order difference between the two works is that our work assumes bidders' valuations are drawn from an unknown distribution, while Balcan et al. [4] study the more demanding prior-free setting. Since no auction can achieve near-optimal revenue ex post, Balcan et al. [4] define their revenue benchmark with respect to a set G of auctions: on each input v, the benchmark is the maximum revenue obtained by any auction of G on v. The idea of learning from samples enters the work of Balcan et al. [4] through the internal randomness of their partitioning of bidders, rather than through an exogenous distribution over inputs (as in this work).
Both our work and theirs require polynomial dependence on H and $1/\epsilon$: ours in terms of the necessary number of samples, theirs in terms of the necessary number of bidders; both also depend on a measure of the complexity of the class G (in our case, the pseudo-dimension; in theirs, an analogous measure). The primary improvement of our work over the results in Balcan et al. [4] is that our results apply to single-item auctions, matroid feasibility, and arbitrary single-parameter settings (see Section 2 for definitions), while their results apply only to single-parameter settings with unlimited supply.¹ We also view it as a feature that our sample complexity upper bounds can be deduced directly from well-known results in learning theory; we can focus instead on the non-trivial and problem-specific work of bounding the pseudo-dimension and representation error of well-chosen auction classes.

Elkind [12] also considers a model similar to ours, but only for the special case of single-item auctions. While her proposed auction format is similar to ours, our results cover the far more general case of arbitrary single-parameter settings and distributions with non-finite support; our sample complexity bounds are also better, even in the case of a single-item auction (linear rather than quadratic dependence on the number of bidders). On the other hand, the learning algorithm in [12] (for single-item auctions) is computationally efficient, while ours is not.

Cole and Roughgarden [8] study single-item auctions with n bidders with valuations drawn from independent (not necessarily identical) regular distributions (see Section 2), and prove upper and lower bounds (polynomial in n and $1/\epsilon$) on the sample complexity of learning a $(1-\epsilon)$-approximate auction. While the formalism in their work is inspired by learning theory, no formal connections are offered; in particular, both their upper and lower bounds were proved from scratch. Our positive results include single-item auctions as a very special case and, for bounded or MHR valuations, our sample complexity upper bounds are much better than those in Cole and Roughgarden [8]. Huang et al. [15] consider learning the optimal price from samples when there is a single buyer and a single seller; this problem was also studied implicitly in [10]. Our general positive results obviously cover the bounded-valuation and MHR settings in [15], though the specialized analysis in [15] yields better (indeed, almost optimal) sample complexity bounds, as a function of $1/\epsilon$ and/or H.

Medina and Mohri [17] show how to use a combination of the pseudo-dimension and Rademacher complexity to measure the sample complexity of selecting a single reserve price for the VCG mechanism to optimize revenue. In our notation, this corresponds to analyzing a single set C of auctions (VCG with a reserve). Medina and Mohri [17] do not address the expressivity vs. simplicity trade-off that is central to this paper. Dughmi et al. [11] also study the sample complexity of learning good auctions, but their main results are negative (exponential sample complexity), for the difficult scenario of multi-parameter settings. (All settings in this paper are single-parameter.)

Our work on t-level auctions also contributes to the literature on simple, approximately revenue-maximizing auctions (e.g., [6, 14, 7, 9, 21, 24, 2]).

¹See Balcan et al. [3] for an extension to the case of a large finite supply.
Here, one takes the perspective of a seller who knows the valuation distribution F but is bound by a simplicity constraint on the auction deployed, thereby ruling out the optimal auction. Our results that bound the representation error of t-level auctions (Theorems 3.4, 4.1, 5.4, and 6.2) can be interpreted as a principled way to trade off the simplicity of an auction with its approximation guarantee. While previous work in this literature generally left the term "simple" safely undefined, this paper effectively proposes the pseudo-dimension of an auction class as a rigorous and quantifiable simplicity measure.

2 Preliminaries

This section reviews useful terminology and notation standard in Bayesian auction design and learning theory.

Bayesian Auction Design We consider single-parameter settings with n bidders. This means that each bidder has a single unknown parameter, its valuation or willingness to pay for winning. (Every bidder has value 0 for losing.) A setting is specified by a collection X of subsets of $\{1, 2, \ldots, n\}$; each such subset represents a collection of bidders that can simultaneously win. For example, in a setting with k copies of an item, where no bidder wants more than one copy, X would be all subsets of $\{1, 2, \ldots, n\}$ of cardinality at most k. A generalization of this case, studied in the supplementary materials (Section 5), is matroid settings. These satisfy: (i) whenever $S \in X$ and $T \subseteq S$, also $T \in X$; and (ii) for any two sets $I_1, I_2 \in X$ with $|I_1| < |I_2|$, there is always an augmenting element $i \in I_2 \setminus I_1$ such that $I_1 \cup \{i\} \in X$. The supplementary materials (Section 6) also consider arbitrary single-parameter settings, where the only assumption is that $\emptyset \in X$. To ease comprehension, we often illustrate our main ideas using single-item auctions (where X consists of the singletons and the empty set).

We assume bidders' valuations are drawn from a continuous joint cumulative distribution F. Except in the extension in Section 4, we assume that the support of F is limited to $[1, H]^n$. As in most of optimal auction theory [18], we usually assume that F is a product distribution, with $F = F_1 \times F_2 \times \cdots \times F_n$ and each $v_i \sim F_i$ drawn independently but not necessarily identically. The virtual value of bidder i is $\phi_i(v_i) = v_i - \frac{1 - F_i(v_i)}{f_i(v_i)}$. A distribution satisfies the monotone hazard rate (MHR) condition if $f_i(v_i)/(1 - F_i(v_i))$ is nondecreasing; intuitively, its tails are no heavier than those of an exponential distribution. In a fundamental paper, Myerson [18] proved that when every virtual valuation function is nondecreasing (the regular case), the auction that maximizes expected revenue for n Bayesian bidders chooses winners in a way that maximizes the sum of the virtual values of the winners. This auction is known as Myerson's auction, which we refer to as M. The result can be extended to the general, non-regular case by replacing the virtual valuation functions with ironed virtual valuation functions. The details are well understood but technical; see Myerson [18] and Hartline [13] for details.
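To make these definitions concrete, the following sketch (ours, not from the paper) computes virtual values and the monopoly reserve $\phi_i^{-1}(0)$ for two standard regular distributions.

```python
# Illustration (not from the paper): virtual values phi(v) = v - (1 - F(v))/f(v)
# for two standard regular distributions, and the monopoly reserve phi^{-1}(0).

def phi_uniform(v, h=1.0):
    """Virtual value for v ~ Uniform[0, h]: F(v) = v/h and f(v) = 1/h,
    so phi(v) = v - (1 - v/h) * h = 2v - h."""
    return 2 * v - h

def phi_exponential(v, rate=1.0):
    """Virtual value for v ~ Exp(rate): 1 - F(v) = exp(-rate*v) and
    f(v) = rate * exp(-rate*v), so phi(v) = v - 1/rate.
    The hazard rate f/(1-F) = rate is constant, hence MHR holds."""
    return v - 1.0 / rate

if __name__ == "__main__":
    # Monopoly reserves phi^{-1}(0): h/2 for Uniform[0, h], 1/rate for Exp(rate).
    print(phi_uniform(0.5))      # 0.0 -> reserve price 1/2 for Uniform[0, 1]
    print(phi_exponential(1.0))  # 0.0 -> reserve price 1 for Exp(1)
```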
Sample Complexity, VC Dimension, and the Pseudo-Dimension This section reviews several well-known definitions from learning theory. Suppose there is some domain Q, and let c be some unknown target function $c : Q \to \{0, 1\}$. Let D be an unknown distribution over Q. We wish to understand how many labeled samples $(x, c(x))$, with $x \sim D$, are necessary and sufficient to be able to output a hypothesis $\hat c$ which agrees with c almost everywhere with respect to D.

The distribution-independent sample complexity of learning c depends fundamentally on the complexity of the set of binary functions C from which we are choosing $\hat c$. We define the relevant complexity measure next. Let S be a set of m samples from Q. The set S is said to be shattered by C if, for every subset $T \subseteq S$, there is some $c_T \in C$ such that $c_T(x) = 1$ if $x \in T$ and $c_T(y) = 0$ if $y \notin T$. That is, ranging over all $c \in C$ induces all $2^{|S|}$ possible projections onto S. The VC dimension of C, denoted VC(C), is the size of the largest set S that can be shattered by C.

Let $\mathrm{err}_S(\hat c) = \frac{1}{|S|} \sum_{x \in S} |c(x) - \hat c(x)|$ denote the empirical error of $\hat c$ on S, and let $\mathrm{err}(\hat c) = \mathbb{E}_{x \sim D}[|c(x) - \hat c(x)|]$ denote the true expected error of $\hat c$ with respect to D. A key result from learning theory [23] is: for every distribution D, a sample S of size $O\!\left(\frac{1}{\epsilon^2}\left(VC(C) + \ln \frac{1}{\delta}\right)\right)$ is sufficient to guarantee that $\mathrm{err}_S(\hat c) \in [\mathrm{err}(\hat c) - \epsilon,\ \mathrm{err}(\hat c) + \epsilon]$ for every $\hat c \in C$, with probability $1 - \delta$. In this case, the error on the sample is close to the true error, simultaneously for every hypothesis in C. In particular, choosing the hypothesis with the minimum sample error minimizes the true error, up to $2\epsilon$. We say C is $(\epsilon, \delta)$-uniformly learnable with sample complexity m if, given a sample S of size m, with probability $1 - \delta$, for all $c \in C$, $|\mathrm{err}_S(c) - \mathrm{err}(c)| < \epsilon$; thus, any class C is $(\epsilon, \delta)$-uniformly learnable with $m = O\!\left(\frac{1}{\epsilon^2}\left(VC(C) + \ln \frac{1}{\delta}\right)\right)$ samples. Conversely, for every learning algorithm A that uses fewer samples than the VC dimension of C, there exists a distribution $D'$ and a constant $q > 0$ such that, with probability at least q, A outputs a hypothesis $\hat c' \in C$ with $\mathrm{err}(\hat c') > \mathrm{err}(\hat c) + 2\epsilon$ for some $\hat c \in C$. That is, the true error of the output hypothesis is more than $2\epsilon$ larger than that of the best hypothesis in the class.

To learn real-valued functions, we need a generalization of the VC dimension (which concerns binary functions). The pseudo-dimension [19] does exactly this.² Formally, let $c : Q \to [0, H]$ be a real-valued function over Q, and C the class we are learning over. Let S be a sample drawn from D, $|S| = m$, labeled according to c. Both the empirical and true error of a hypothesis $\hat c$ are defined as before, though $|\hat c(x) - c(x)|$ can now take on values in [0, H] rather than in {0, 1}. Let $(r_1, \ldots, r_m) \in [0, H]^m$ be a set of targets for S. We say $(r_1, \ldots, r_m)$ witnesses the shattering of S by C if, for each $T \subseteq S$, there exists some $c_T \in C$ such that $c_T(x_i) \ge r_i$ for all $x_i \in T$ and $c_T(x_i) < r_i$ for all $x_i \notin T$. If there exists some $\vec r$ witnessing the shattering of S, we say S is shatterable by C. The pseudo-dimension of C, denoted $d_C$, is the size of the largest set S which is shatterable by C.

²The fat-shattering dimension is a weaker condition that is also sufficient for sample complexity bounds.

The sample complexity upper bounds of this paper are derived from the following theorem, which states that the distribution-independent sample complexity of learning over a class of real-valued functions C is governed by the class's pseudo-dimension.

Theorem 2.1 (e.g., [1]) Suppose C is a class of real-valued functions with range in [0, H] and pseudo-dimension $d_C$. For every $\epsilon > 0$ and $\delta \in [0, 1]$, the sample complexity of $(\epsilon, \delta)$-uniformly learning with respect to C is
$$m = O\!\left(\left(\frac{H}{\epsilon}\right)^2 \left(d_C \ln \frac{H}{\epsilon} + \ln \frac{1}{\delta}\right)\right).$$

Moreover, the guarantee in Theorem 2.1 is realized by the learning algorithm that simply outputs the function $c \in C$ with the smallest empirical error on the sample.
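To get a feel for the bound of Theorem 2.1, the sketch below evaluates it numerically; the constant factor of 1 is a hypothetical stand-in, since the theorem specifies the bound only up to a constant.

```python
import math

def sample_bound(H, eps, delta, d):
    """Evaluate the Theorem 2.1 sample-complexity bound
    (H/eps)^2 * (d * ln(H/eps) + ln(1/delta)), with constant factor 1
    (the true constant is hidden inside the O(.) notation)."""
    return (H / eps) ** 2 * (d * math.log(H / eps) + math.log(1 / delta))

# Example: H = 10, eps = 0.1, delta = 0.05, and d = pseudo-dimension of
# n-bidder t-level auctions, which Theorem 3.3 bounds by O(nt log(nt)).
n, t = 5, 20
d = n * t * math.log(n * t)  # hidden constant dropped, for illustration
print(f"samples needed (up to constants): {sample_bound(10, 0.1, 0.05, d):,.0f}")
```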
All of our arguments give the same upper bounds on the pseudo-dimension and the fat-shattering dimension of various auction classes, so we present the stronger statements.

Applying Pseudo-Dimension to Auction Classes For the remainder of this paper, we consider classes of truthful auctions C.³ When we discuss some auction $c \in C$, we treat $c : [0, H]^n \to \mathbb{R}$ as the function that maps (truthful) bid tuples to the revenue achieved on them by the auction c. Then, rather than minimizing error, we aim to maximize revenue. In our setting, the guarantee of Theorem 2.1 directly implies that, with probability at least $1 - \delta$ (over the m samples), the empirical revenue maximization learning algorithm (which returns the auction $c \in C$ with the highest average revenue on the samples) chooses an auction with expected revenue (over the true underlying distribution F) that is within an additive $\epsilon$ of the maximum possible.

³An auction is truthful if truthful bidding is a dominant strategy for every bidder. That is: for every bidder i, and all possible bids by the other bidders, i maximizes its expected utility (value minus price paid) by bidding its true value. In the single-parameter settings that we study, the expected revenue of the optimal non-truthful auction (measured at a Bayes-Nash equilibrium with respect to the prior distribution) is no larger than that of the optimal truthful auction.
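The empirical revenue maximization algorithm is simple to state in code. Below is a minimal sketch under our own illustrative conventions: an auction is represented as a function from a bid tuple to the revenue it earns, and the candidate class is finite.

```python
def empirical_revenue_maximizer(candidates, samples):
    """Empirical revenue maximization: given a finite class of candidate
    auctions (each a function mapping a bid tuple to the revenue it earns)
    and i.i.d. sample bid tuples, return the auction with the highest
    average revenue on the samples."""
    def avg_revenue(auction):
        return sum(auction(v) for v in samples) / len(samples)
    return max(candidates, key=avg_revenue)

# Example with single-bidder posted prices (a 1-level auction class):
# a posted price p earns p if the bid is at least p, and 0 otherwise.
posted = [lambda v, p=p: p if v[0] >= p else 0.0 for p in (1.0, 2.0, 5.0)]
best = empirical_revenue_maximizer(posted, samples=[(1.5,), (2.5,), (6.0,)])
print(best((2.5,)))  # revenue of the empirically best posted price on a bid
```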
3 Single-Item Auctions

To illustrate our ideas, we first focus on single-item auctions. The results of this section are generalized significantly in the supplementary materials (see Sections 5 and 6). Section 3.1 defines the class of t-level single-item auctions, gives an example, and interprets these auctions as approximations to virtual welfare maximizers. Section 3.2 proves that the pseudo-dimension of the set of such auctions is $O(nt \log nt)$, which by Theorem 2.1 implies a sample-complexity upper bound. Section 3.3 proves that taking $t = \Omega(\epsilon^{-1} \log H)$ yields low representation error.

3.1 t-Level Auctions: The Single-Item Case

We now introduce t-level auctions, or $C_t$ for short. Intuitively, one can think of each bidder as facing one of t possible prices, where the price she faces depends on the values of the other bidders. Consider, for each bidder i, t numbers $0 \le \ell_{i,0} \le \ell_{i,1} \le \cdots \le \ell_{i,t-1}$. We refer to these t numbers as thresholds. This set of tn numbers defines a t-level auction with the following allocation rule. Consider a valuation tuple v:

1. For each bidder i, let $t_i(v_i)$ denote the index $\tau$ of the largest threshold $\ell_{i,\tau}$ that lower bounds $v_i$ (or $-1$ if $v_i < \ell_{i,0}$). We call $t_i(v_i)$ the level of bidder i.
2. Sort the bidders from highest level to lowest level and, within a level, use a fixed lexicographic tie-breaking ordering $\succ$ to pick the winner.⁴
3. Award the item to the first bidder in this sorted order (unless $t_i(v_i) = -1$ for every bidder i, in which case there is no sale).

The payment rule is the unique one that renders truthful bidding a dominant strategy and charges 0 to losing bidders: the winning bidder pays the lowest bid at which she would continue to win. It is important for us to understand this payment rule in detail; there are three interesting cases. Suppose bidder i is the winner. In the first case, i is the only bidder who might be allocated the item (all other bidders have level $-1$), in which case her bid must be at least her lowest threshold. In the second case, there are multiple bidders at her level, so she must bid high enough to be at her level (and, since ties are broken lexicographically, her threshold for that level is what she must pay to win). In the final case, she need not compete at her level: she can either pay one level above her competition (in which case her position in the tie-breaking ordering does not matter) or bid at the same level as her highest-level competitors (in which case she only wins if she dominates all of those bidders according to $\succ$). Formally, the payment $p_i$ of the winner i (if any) is as follows. Let $\tau$ denote the highest level such that there are at least two bidders at or above level $\tau$, and let I be the set of bidders other than i whose level is at least $\tau$.

Monop If $\tau = -1$, then $p_i = \ell_{i,0}$ (she is the only potential winner, but must have level 0 to win).

Mult If $t_i(v_i) = \tau$, then $p_i = \ell_{i,\tau}$ (she needs to be at level $\tau$).

Unique If $t_i(v_i) > \tau$: if $i \succ i'$ for all $i' \in I$, she pays $p_i = \ell_{i,\tau}$; otherwise she pays $p_i = \ell_{i,\tau+1}$ (she either needs to be at level $\tau + 1$, in which case her position in $\succ$ does not matter, or at level $\tau$, in which case she must be the highest according to $\succ$).

⁴When the valuation distributions are regular, this tie-breaking can be done by value, or randomly; when it is done by value, this equates to a generalization of VCG with non-anonymous reserves (and is incentive compatible, with representation error identical to this analysis, when bidders are regular).

We now describe a particular t-level auction, and demonstrate each case of the payment rule.

Example 3.1 Consider the following 4-level auction for bidders a, b, c. Let $\ell_{a,\cdot} = [2, 4, 6, 8]$, $\ell_{b,\cdot} = [1.5, 5, 9, 10]$, and $\ell_{c,\cdot} = [1.7, 3.9, 6, 7]$. For example, if bidder a bids less than 2 she is at level $-1$, a bid in [2, 4) puts her at level 0, a bid in [4, 6) at level 1, a bid in [6, 8) at level 2, and a bid of at least 8 at level 3. Let $a \succ b \succ c$.

Monop If $v_a = 3$, $v_b < 1.5$, $v_c < 1.7$, then b and c are at level $-1$ (to which the item is never allocated). So a wins and pays 2, the minimum she needs to bid to be at level 0.

Mult If $v_a \ge 8$, $v_b \ge 10$, $v_c < 7$, then a and b are both at level 3, and $a \succ b$, so a wins and pays 8 (the minimum she needs to bid to be at level 3).

Unique If $v_a \ge 8$, $v_b \in [5, 9)$, $v_c \in [3.9, 6)$, then a is at level 3, and b and c are at level 1. Since $a \succ b$ and $a \succ c$, a need only pay 4 (enough to be at level 1). If, on the other hand, $v_a \in [4, 6)$, $v_b \in [5, 9)$ and $v_c \ge 6$, then c has level at least 2 (while a and b have level 1), but c needs to pay 6 since $a \succ c$ and $b \succ c$.

Remark 3.2 (Connection to virtual valuation functions) t-level auctions are naturally interpreted as discrete approximations to virtual welfare maximizers, and our representation error bound in Theorem 3.4 makes this precise. Each level $\tau$ corresponds to a constraint of the form "if any bidder has level at least $\tau$, do not sell to any bidder with level less than $\tau$." We can interpret the $\ell_{i,\tau}$'s (with $\tau$ fixed and i ranging over bidders) as the bidder values that map to some common virtual value. For example, 1-level auctions treat all values below the single threshold as having negative virtual value, and above the threshold use values as proxies for virtual values. 2-level auctions use the second threshold to refine these virtual value estimates, and so on. With this interpretation, it is intuitively clear that as $t \to \infty$, it is possible to estimate bidders' virtual valuation functions, and thus approximate Myerson's optimal auction, to arbitrary accuracy.
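To make the allocation and payment rules concrete, here is a sketch of a single-item t-level auction in code; the implementation is ours (not from the paper), with tie-breaking by bidder index, and it reproduces the outcomes of Example 3.1.

```python
import bisect

def t_level_auction(thresholds, bids):
    """Run a single-item t-level auction.
    thresholds[i] is bidder i's nondecreasing list [l_{i,0}, ..., l_{i,t-1}];
    ties are broken lexicographically by index (lower index wins).
    Returns (winner, payment), or (None, 0.0) if there is no sale."""
    # Level of bidder i: index of the largest threshold <= bid, or -1.
    levels = [bisect.bisect_right(th, b) - 1 for th, b in zip(thresholds, bids)]
    if max(levels) == -1:
        return None, 0.0
    # Winner: highest level, then lowest index (the fixed ordering).
    winner = min(range(len(bids)), key=lambda i: (-levels[i], i))
    # Payment: the lowest bid at which the winner would still win.
    others = [lv for i, lv in enumerate(levels) if i != winner]
    tau = max(others) if others else -1  # competitors' highest level
    if tau == -1:
        return winner, thresholds[winner][0]           # Monop case
    if any(i < winner for i, lv in enumerate(levels)
           if i != winner and lv == tau):
        return winner, thresholds[winner][tau + 1]     # Unique, loses ties
    return winner, thresholds[winner][tau]             # Mult / Unique, wins ties

# Example 3.1: l_a = [2,4,6,8], l_b = [1.5,5,9,10], l_c = [1.7,3.9,6,7].
th = [[2, 4, 6, 8], [1.5, 5, 9, 10], [1.7, 3.9, 6, 7]]
print(t_level_auction(th, [3, 1.0, 1.0]))   # (0, 2): Monop case
print(t_level_auction(th, [8, 10, 1.0]))    # (0, 8): Mult case
print(t_level_auction(th, [8, 6, 4]))       # (0, 4): Unique, a wins ties
print(t_level_auction(th, [5, 6, 6.5]))     # (2, 6): Unique, c loses ties
```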
3.2 The Pseudo-Dimension of t-Level Auctions

This section shows that the pseudo-dimension of the class of t-level single-item auctions with n bidders is $O(nt \log nt)$. Combining this with Theorem 2.1 immediately yields sample complexity bounds (parameterized by t) for learning the best such auction from samples.

Theorem 3.3 For a fixed tie-breaking order $\succ$, the set of n-bidder single-item t-level auctions has pseudo-dimension $O(nt \log(nt))$.

Proof: Recall from Section 2 that we need to upper bound the size of every set that is shatterable using t-level auctions. Fix a set of samples $S = \{v^1, \ldots, v^m\}$ of size m and a potential witness $R = (r_1, \ldots, r_m)$. Each auction c induces a binary labeling of the samples $v^j$ of S (whether c's revenue on $v^j$ is at least $r_j$ or strictly less than $r_j$). The set S is shattered with witness R if and only if ranging over all t-level auctions yields all $2^m$ distinct labelings of S. We upper-bound the number of distinct labelings of S given by t-level auctions (for an arbitrary fixed potential witness R), counting the labelings in two stages. Note that S involves nm numbers (one value $v_i^j$ for each bidder i and each sample j), and a t-level auction involves nt numbers (t thresholds $\ell_{i,\tau}$ for each bidder i). Call two t-level auctions with thresholds $\{\ell_{i,\tau}\}$ and $\{\hat\ell_{i,\tau}\}$ equivalent if:

1. The relative order of the $\ell_{i,\tau}$'s agrees with that of the $\hat\ell_{i,\tau}$'s, in that both induce the same permutation of $\{1, 2, \ldots, n\} \times \{0, 1, \ldots, t-1\}$.
2. Merging the sorted list of the $v_i^j$'s with the sorted list of the $\ell_{i,\tau}$'s yields the same partition of the $v_i^j$'s as does merging it with the sorted list of the $\hat\ell_{i,\tau}$'s.

Note that this is an equivalence relation. If two t-level auctions are equivalent, every comparison between a valuation and a threshold, or between two valuations, is resolved identically by those auctions. Using the defining properties of equivalence, a crude upper bound on the number of equivalence classes is

$$(nm + nt)^{2nt}. \quad (1)$$

We now upper-bound the number of distinct labelings of S that can be generated by the t-level auctions in a single equivalence class C. First, since all comparisons between two numbers (valuations or thresholds) are resolved identically for all auctions in C, each bidder i in each sample $v^j$ of S is assigned the same level (across auctions in C), and the winner (if any) in each sample $v^j$ is constant across all of C. By the same reasoning, the identity of the parameter that gives the winner's payment (some $\ell_{i,\tau}$) is uniquely determined by pairwise comparisons (recall Section 3.1) and hence is common across all auctions in C. The payments $\ell_{i,\tau}$, however, can vary across auctions in the equivalence class. For a bidder i and level $\tau \in \{0, 1, \ldots, t-1\}$, let $S_{i,\tau} \subseteq S$ be the subset of samples in which bidder i wins and pays $\ell_{i,\tau}$. The revenue obtained by each auction in C on a sample of $S_{i,\tau}$ is simply $\ell_{i,\tau}$ (and independent of all other parameters of the auction). Thus, ranging over all t-level auctions in C generates at most $|S_{i,\tau}| + 1$ distinct binary labelings of $S_{i,\tau}$: the possible subsets of $S_{i,\tau}$ for which an auction meets the corresponding target $r_j$ form a nested collection. Summarizing, within the equivalence class C of t-level auctions, varying a parameter $\ell_{i,\tau}$ generates at most $|S_{i,\tau}| + 1$ different labelings of the samples $S_{i,\tau}$ and has no effect on the other samples. Since the subsets $\{S_{i,\tau}\}_{i,\tau}$ are disjoint, varying all of the $\ell_{i,\tau}$'s (i.e., ranging over C) generates at most

$$\prod_{i,\tau} \big(|S_{i,\tau}| + 1\big) \le (nm + nt)^{nt} \quad (2)$$

distinct labelings of S. Combining (1) and (2), the class of all t-level auctions produces at most $(nm + nt)^{3nt}$ distinct labelings of S. Since shattering S requires $2^m$ distinct labelings, we conclude that $2^m \le (nm + nt)^{3nt}$, implying $m = O(nt \log nt)$ as claimed.
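For completeness, the final implication can be expanded as follows; this short derivation is our own filling-in of the standard last step.

```latex
% From 2^m <= (nm + nt)^{3nt}, take base-2 logarithms:
m \;\le\; 3nt \log_2\big(n(m + t)\big).
% If m <= nt we are done, so assume m > nt >= max(n, t); then
% n(m + t) <= 2nm <= 2m^2, and for m >= 2:
m \;\le\; 3nt \log_2(2m^2) \;\le\; 3nt\,(1 + 2\log_2 m) \;\le\; 9nt \log_2 m.
% Any m satisfying m <= c \log_2 m obeys m = O(c \log c), hence
m \;=\; O\big(nt \log(nt)\big).
```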
3.3 The Representation Error of Single-Item t-Level Auctions

In this section, we show that for every bounded product distribution, there exists a t-level auction with expected revenue close to that of the optimal single-item auction. The analysis rounds an optimal auction to a t-level auction without losing much expected revenue. This is done by using thresholds to approximate each bidder's virtual valuation function: the lowest threshold at the bidder's monopoly reserve price, the next group of thresholds at the values at which the bidder's virtual value surpasses successive small additive steps, and the remaining thresholds at the values where the bidder's virtual value reaches successive powers of roughly $1 + \epsilon$. Theorem 3.4 formalizes this intuition.

Theorem 3.4 Suppose F is a product distribution over $[1, H]^n$. If $t = \Omega\!\left(\frac{1}{\epsilon} + \log_{1+\epsilon} H\right)$, then $C_t$ contains a single-item auction with expected revenue at least $1 - \epsilon$ times the optimal expected revenue.

Theorem 3.4 follows immediately from the following lemma, with $v^* = \gamma = 1$. We prove this more general result for later use.

Lemma 3.5 Consider n bidders with valuations in [0, H] and with $\Pr[\max_i v_i > v^*] \ge \gamma$. Then $C_t$ contains a single-item auction with expected revenue at least $1 - \epsilon$ times that of an optimal auction, for $t = O\!\left(\frac{1}{\epsilon\gamma} + \log_{1+\epsilon} \frac{H}{v^*}\right)$.

Proof: Consider a fixed bidder i. We define t thresholds for i, bucketing i by her virtual value, and prove that the t-level auction A using these thresholds for each bidder closely approximates the expected revenue of the optimal auction M. Let $\epsilon_0 > 0$ be a parameter fixed later. Set $\ell_{i,0} = \phi_i^{-1}(0)$, bidder i's monopoly reserve.⁵ For $\tau \in [1, \lceil 1/(\gamma\epsilon_0) \rceil]$, let $\ell_{i,\tau} = \phi_i^{-1}(\tau \cdot \gamma \epsilon_0 v^*)$; these thresholds discretize the virtual values in $[0, v^*]$ into additive steps of $\gamma \epsilon_0 v^*$. For $\tau \in (\lceil 1/(\gamma\epsilon_0) \rceil,\ \lceil 1/(\gamma\epsilon_0) \rceil + \lceil \log_{1+\epsilon/2}(H/v^*) \rceil]$, let $\ell_{i,\tau} = \phi_i^{-1}\big(v^* (1 + \epsilon/2)^{\tau - \lceil 1/(\gamma\epsilon_0) \rceil}\big)$; these discretize the virtual values above $v^*$ into multiplicative steps of $1 + \epsilon/2$.

Consider a fixed valuation profile v. Let $i^*$ denote the winner according to A, and $i'$ the winner according to the optimal auction M. If there is no winner, we interpret $\phi_{i^*}(v_{i^*})$ or $\phi_{i'}(v_{i'})$ as 0. Recall that M always awards the item to a bidder with the highest positive virtual value (or to no one, if no such bidder exists). The definition of the thresholds immediately implies the following.

1. A only allocates to bidders with nonnegative ironed virtual value.
2. If there is no tie (that is, there is a unique bidder at the highest level), then $i' = i^*$.
3. When there is a tie at level $\tau$, the virtual value of the winner of A is close to that of the winner of M: if $\tau \in [0, \lceil 1/(\gamma\epsilon_0) \rceil]$, then $\phi_{i'}(v_{i'}) - \phi_{i^*}(v_{i^*}) \le \gamma \epsilon_0 v^*$; if $\tau \in (\lceil 1/(\gamma\epsilon_0) \rceil,\ \lceil 1/(\gamma\epsilon_0) \rceil + \lceil \log_{1+\epsilon/2}(H/v^*) \rceil]$, then $\phi_{i^*}(v_{i^*}) \ge \phi_{i'}(v_{i'}) / (1 + \epsilon/2) \ge (1 - \epsilon/2)\, \phi_{i'}(v_{i'})$.

These facts imply that
$$\mathbb{E}_v[\mathrm{Rev}(A)] = \mathbb{E}_v[\phi_{i^*}(v_{i^*})] \ge (1 - \epsilon/2)\, \mathbb{E}_v[\phi_{i'}(v_{i'})] - \gamma \epsilon_0 v^* = (1 - \epsilon/2)\, \mathbb{E}_v[\mathrm{Rev}(M)] - \gamma \epsilon_0 v^*. \quad (3)$$
The first and final equalities follow because the allocations of A and M depend only on ironed virtual values, not on the values themselves; thus the expected ironed virtual value of the winner, the expected unironed virtual value of the winner, and the expected revenue of the mechanism are equal (see [13], Chapter 3.5, for discussion). Since $\Pr[\max_i v_i > v^*] \ge \gamma$, it must be that $\mathbb{E}[\mathrm{Rev}(M)] \ge \gamma v^*$ (a posted price of $v^*$ achieves this revenue). Combining this with (3), and setting $\epsilon_0 = \epsilon/2$ so that $\gamma \epsilon_0 v^* \le (\epsilon/2)\, \mathbb{E}[\mathrm{Rev}(M)]$, implies $\mathbb{E}_v[\mathrm{Rev}(A)] \ge (1 - \epsilon)\, \mathbb{E}_v[\mathrm{Rev}(M)]$.

Combining Theorems 2.1 and 3.4 yields the following Corollary 3.6.

Corollary 3.6 Let F be a product distribution with all bidders' valuations in [1, H], and let $t = \Omega\!\left(\frac{1}{\epsilon} + \log_{1+\epsilon} H\right)$ as in Theorem 3.4. Assume that $m = \Omega\!\left(\left(\frac{H}{\epsilon}\right)^2 \left(nt \log(nt) \log \frac{H}{\epsilon} + \log \frac{1}{\delta}\right)\right)$. Then, with probability at least $1 - \delta$, the single-item empirical revenue maximizer of $C_t$ on a set of m samples from F has expected revenue at least $1 - \epsilon$ times that of the optimal auction.
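To illustrate the threshold construction in the proof of Lemma 3.5, the sketch below computes the levels for a bidder whose virtual valuation function has a closed-form inverse; the distribution and all parameter values are our own illustrative choices.

```python
import math

def level_thresholds(phi_inv, eps, gamma, v_star, H):
    """Thresholds from the proof of Lemma 3.5 (with eps_0 = eps/2):
    the monopoly reserve phi^{-1}(0), then values where the virtual value
    crosses additive steps of gamma*eps_0*v_star up to v_star, then values
    where it crosses multiplicative steps of (1 + eps/2) up to H."""
    eps0 = eps / 2
    low = math.ceil(1 / (gamma * eps0))
    high = math.ceil(math.log(H / v_star) / math.log(1 + eps / 2))
    thresholds = [phi_inv(0.0)]
    thresholds += [phi_inv(tau * gamma * eps0 * v_star) for tau in range(1, low + 1)]
    thresholds += [phi_inv(v_star * (1 + eps / 2) ** k) for k in range(1, high + 1)]
    return thresholds  # t = 1 + low + high levels

# Example: v ~ Uniform[0, H] has phi(v) = 2v - H, so phi^{-1}(w) = (w + H) / 2.
H = 10.0
ths = level_thresholds(lambda w: min((w + H) / 2, H),
                       eps=0.5, gamma=1.0, v_star=1.0, H=H)
print(len(ths), ths[:5])  # number of levels and the first few thresholds
```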
Open Questions

There are some significant opportunities for follow-up research. First, there is much to do on the design of computationally efficient (in addition to sample-efficient) algorithms for learning a near-optimal auction. The present work focuses on sample complexity, and our learning algorithms are generally not computationally efficient.⁶ The general research agenda here is to identify auction classes C for various settings such that:

1. C has low representation error;
2. C has small pseudo-dimension;
3. There is a polynomial-time algorithm to find an approximately revenue-maximizing auction from C on a given set of samples.⁷

There are also interesting open questions on the statistical side, notably for multi-parameter problems. While the negative result in [11] rules out a universally good upper bound on the sample complexity of learning a near-optimal mechanism in multi-parameter settings, we suspect that positive results are possible for several interesting special cases.

⁵Recall from Section 2 that $\phi_i$ denotes the virtual valuation function of bidder i. (From here on, we always mean the ironed version of virtual values.) It is convenient to assume that these functions are strictly increasing (not just nondecreasing); this can be enforced at the cost of losing an arbitrarily small amount of revenue.

⁶There is a clear parallel with computational learning theory [22]: while the information-theoretic foundations of classification (VC dimension, etc. [23]) have long been understood, that research area strives to understand which low-dimensional concept classes are learnable in polynomial time.

⁷The sample-complexity and performance bounds implied by pseudo-dimension analysis, as in Theorem 2.1, hold with such an approximation algorithm, with the algorithm's approximation factor carrying through to the learning algorithm's guarantee. See also [4, 11].

References

[1] Martin Anthony and Peter L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, New York, NY, 1999.
[2] Moshe Babaioff, Nicole Immorlica, Brendan Lucier, and S. Matthew Weinberg. A simple and approximately optimal mechanism for an additive buyer. SIGecom Exchanges, 13(2):31-35, 2015.
[3] Maria-Florina Balcan, Avrim Blum, and Yishay Mansour. Single price mechanisms for revenue maximization in unlimited supply combinatorial auctions. Technical report, Carnegie Mellon University, 2007.
[4] Maria-Florina Balcan, Avrim Blum, Jason D. Hartline, and Yishay Mansour. Reducing mechanism design to algorithm design via machine learning. Journal of Computer and System Sciences, 74(8):1245-1270, 2008.
[5] Yang Cai and Constantinos Daskalakis. Extreme-value theorems for optimal multidimensional pricing. In Proceedings of the 52nd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 522-531, 2011.
[6] Shuchi Chawla, Jason Hartline, and Robert Kleinberg. Algorithmic pricing via virtual valuations. In Proceedings of the 8th ACM Conference on Electronic Commerce, pages 243-251, 2007.
[7] Shuchi Chawla, Jason D. Hartline, David L. Malec, and Balasubramanian Sivan. Multi-parameter mechanism design and sequential posted pricing. In Proceedings of the 42nd ACM Symposium on Theory of Computing, pages 311-320, 2010.
[8] Richard Cole and Tim Roughgarden. The sample complexity of revenue maximization. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, pages 243-252, 2014.
[9] Nikhil Devanur, Jason Hartline, Anna Karlin, and Thach Nguyen.
Prior-independent multi-parameter mechanism design. In Internet and Network Economics (WINE), pages 122-133. Springer, 2011.
[10] Peerapong Dhangwatnotai, Tim Roughgarden, and Qiqi Yan. Revenue maximization with a single sample. In Proceedings of the 11th ACM Conference on Electronic Commerce, pages 129-138, 2010.
[11] Shaddin Dughmi, Li Han, and Noam Nisan. Sampling and representation complexity of revenue maximization. In Web and Internet Economics (WINE), volume 8877 of Lecture Notes in Computer Science, pages 277-291. Springer, 2014.
[12] Edith Elkind. Designing and learning optimal finite support auctions. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 736-745. SIAM, 2007.
[13] Jason Hartline. Mechanism Design and Approximation. Book draft, 2015.
[14] Jason D. Hartline and Tim Roughgarden. Simple versus optimal mechanisms. In Proceedings of the 10th ACM Conference on Electronic Commerce, 2009.
[15] Zhiyi Huang, Yishay Mansour, and Tim Roughgarden. Making the most of your samples. CoRR, abs/1407.2479, 2014. URL http://arxiv.org/abs/1407.2479.
[16] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, Cambridge, MA, 1994.
[17] Andres Munoz Medina and Mehryar Mohri. Learning theory and algorithms for revenue optimization in second price auctions with reserve. In Proceedings of the 31st International Conference on Machine Learning, pages 262-270, 2014.
[18] Roger B. Myerson. Optimal auction design. Mathematics of Operations Research, 6(1):58-73, 1981.
[19] David Pollard. Convergence of Stochastic Processes. Springer, 1984.
[20] T. Roughgarden and O. Schrijvers. Ironing in the dark. Submitted, 2015.
[21] Tim Roughgarden, Inbal Talgam-Cohen, and Qiqi Yan. Supply-limiting mechanisms. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 844-861, 2012.
[22] Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, 1984.
[23] Vladimir N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability & Its Applications, 16(2):264-280, 1971.
[24] Andrew Chi-Chih Yao. An n-to-1 bidder reduction for multi-item auctions and its applications. In Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 92-109, 2015.