# learning_to_price_homogeneous_data__3b37c69e.pdf Learning to Price Homogeneous Data Keran Chen UW-Madison kchen429@wisc.edu Joon Suk Huh UW-Madison jhuh23@wisc.edu Kirthevasan Kandasamy UW-Madison kandasamy@cs.wisc.edu We study a data pricing problem, where a seller has access to N homogeneous data points (e.g. drawn i.i.d. from some distribution). There are m types of buyers in the market, where buyers of the same type i have the same valuation curve vi : [N] [0, 1], where vi(n) is the value for having n data points. A priori, the seller is unaware of the distribution of buyers, but can repeat the market for T rounds so as to learn the revenue-optimal pricing curve p : [N] [0, 1]. To solve this online learning problem, we first develop novel discretization schemes to approximate any pricing curve. When compared to prior work, the size of our discretization schemes scales gracefully with the approximation parameter, which translates to better regret in online learning. Under assumptions like smoothness and diminishing returns which are satisfied by data, the discretization size can be reduced further. We then turn to the online learning problem, both in the stochastic and adversarial settings. On each round, the seller chooses an anonymous pricing curve pt. A new buyer appears and may choose to purchase some amount of data. She then reveals her type only if she makes a purchase. Our online algorithms build on classical algorithms such as UCB and FTPL, but require novel ideas to account for the asymmetric nature of this feedback and to deal with the vastness of the space of pricing curves. Our algorithms achieve e O(m T) regret in the stochastic setting and e O(m 3/2 T) regret in the adversarial setting. 1 Introduction Due to the rise in popularity of machine learning, there is an increased demand for data. However, not all users of data have the wherewithal to collect data on their own, and have to rely on data marketplaces to acquire the data they need. For example, a materials data platform (e.g. [18]), may have collected vast amounts of data from various proprietary sources. Materials scientists in smaller organizations and academia, who do not have large experimental apparatuses, may wish to purchase this data to aid in their research. Similarly, small businesses may wish to purchase customer data for advertising and product recommendations [4, 5], while small technology companies may wish to purchase data about cloud operations to optimize their computing infrastructure [2, 3]. Model. Motivated by the emergence of such data marketplaces, we study the following online data pricing problem. A seller has access to N homogeneous data points, (e.g. drawn i.i.d. from some distribution). He wishes to sell the data to a sequence of distinct buyers over T rounds, and intends to achieve large revenue. There are m types of buyers in the data marketplace, with all buyers in type i having the same valuation curve vi : [N] [0, 1] for the data, where vi(n) represents the buyer s value for having n points. As data is homogeneous, we can treat an agent s value as a function of the amount of data n (we will illustrate this in the sequel). Valuation curves are monotone non-decreasing, as more data is better. At each round t, the seller chooses a price curve pt : [N] [0, 1], where pt(n) is the price for purchasing n data points. Then, a buyer with type it arrives and purchases an amount of data that maximizes her utility (value minus price), provided that she can achieve non-negative utility. A buyer will reveal her type to the seller only if she makes a purchase, and only after she 38th Conference on Neural Information Processing Systems (Neur IPS 2024). makes the purchase. The seller has knowledge of valuation curves of the m types, but does not know the distribution q over types (stochastic setting), or the buyer sequence (adversarial setting). Moreover, he cannot practice non-anonymous (discriminatory) pricing, as he needs to choose the pricing curve pt without knowledge of the buyer s type on that round. While there is extensive research on revenue-optimal pricing and learning to price, data marketplaces merit special attention, both due to their recent emergence and the unique characteristics of data. Typically the number of data N (number of goods) is very large, but data usually satisfies additional properties such as smoothness (an agent s value does not increase significantly with a small amount of additional data) and diminishing returns (additional data is more valuable when a buyer has less data). To illustrate further, note that two steps are essential to develop an effective online learning solution for data pricing. (1) First, we need to solve the planning problem, i.e. find a revenue-optimal pricing curve when the type distribution q is known. (2) Second, when q is unknown, we need to combine the algorithm in step (1) with estimates for q to maximize long-term revenue. Methods in the existing literature fall short in both steps. (1) When the type distribution q is known, the data pricing problem resembles an ordered item pricing problem, which is known to be NP-hard [13, 25]. Hence, prior work has aimed at approximating the optimal pricing curves via discretization schemes. Unfortunately, existing discretization schemes have poor, often exponential, dependence on the approximation parameter ϵ. However, achieving sublinear regret in online learning requires choosing ϵ that vanishes with longer time horizons, i.e. ϵ 0 as T . Therefore, directly using existing discretization schemes in an online setting leads to poor statistical and computational properties of the associated online algorithm. This requires us to leverage the above properties of data to design discretization schemes with better dependence on ϵ. (2) While there is prior work on learning optimal prices [22, 27, 33], these techniques either fall short of addressing the complexities in our setting, or fail to account for the properties of data, and hence do not scale gracefully when the amount of data N is very large. Moreover, in our online learning setup, the seller faces a trade-off between setting high prices to maximize instantaneous revenue versus setting low prices so as to guarantee a purchase, which results in the buyer revealing their type, which in turn can be helpful in future rounds. Prior work has studied this asymmetric feedback model only in single-item markets which is significantly simpler, and only in the stochastic setting [23, 47]. 1.1 Summary of our contributions Our contributions in this work are threefold: (1) First, in 3, we develop discretization schemes for revenue-optimal data pricing under a variety of assumptions, which we will use later in our online learning schemes. (2) In 4, we study learning a revenue-optimal price in a stochastic setting, where the customer types on each round are drawn from a fixed but unknown distribution q. (3) Finally, in 5, we study online learning when the buyer types are chosen by an oblivious adversary. 1. Discretization (approximation) schemes for revenue-optimal data pricing. Assuming only monotonicity, we show that there is a discretization of size e O((N/ϵ)m) which is an O(ϵ) additive approximation to any pricing curve. When compared to prior work [14, 25], our discretization scheme has smaller dependence on ϵ 1 when the number of types m is small (see Table 1). This will be useful, both statistically and computationally, when we study the online setting, as we need to choose ϵ 0 as T to achieve sublinear regret. This is still quite large in real-world data marketplaces, where N may be very large. Hence, we also study two other assumptions. First, when valuations are smooth, satisfying an L-Lipschitz-like condition, we construct a discretization of size e O (L/ϵ2)m , which has no dependence on N. Next, under a diminishing returns condition, we construct a discretization of size O Jmϵ 3m logm(N) , which only has polylogarithmic dependence on N. Key insights. We first show that when there are only m types, for any price function p : [N] [0, 1], there exists an m step price function p whose revenue is at least as much as that of p on any type distribution q. An m step function is non-decreasing and changes values at most m times, allowing us to focus on this restricted class and thereby reduce the search space when m N. We then consider discretizations of the data space [N] and valuations [0, 1] which allow us to obtain an O(ϵ) approximation to any pricing curve, and then apply this insight to construct our discretizations. Finally, we show that with monotonicity and diminishing returns, similarly accurate approximations are attainable with substantially coarser discretizations. 2. Learning to price in the stochastic setting. Next, we turn to the online learning problem described in the beginning in a stochastic setting. On each round, our algorithm computes an Algorithm Assumptions Size of discretization Reference Hartline and Koltun [25] e O(2Nϵ N) Chawla et al. [14] M N O(ϵ 2 log ϵ 1) Algorithm 1 (ours) M, F e O(N mϵ m) Theorem 3.1 Algorithm 5 (ours) M, F, S e O Lmϵ 2m Theorem 3.2 Algorithm 2 (ours) M, F, D e O Jmϵ 3m logm N Theorem 3.3 Table 1: Comparison of discretization (approximation) schemes of prior work and our methods under various assumptions. All methods achieve a O(ϵ) additive approximation to any pricing curve. Here, M means Monotonicity, F means that there are a Finite (m) number of types, S means that the valuation curves satisfy a L-Lipschitz-like Smoothness condition (Assumption 1), and D means that they satisfy a Diminishing returns condition (Assumption 2). The e O notation suppresses log dependencies when there is already a polynomial dependence on a parameter. Prior work has exponential dependence in either N or ϵ 1. We wish to do better since (i) typically, the number of data N is very large and (ii) we need ϵ 0 as T to achieve sublinear regret. upper confidence bound (UCB) [8, 38] on the revenue for each price curve in the discretization previously developed; we then choose the price curve with the highest UCB. As summarized in Table 2, this algorithm achieves a e O(m T) bound on the regret for any discretization scheme, including those from prior work. In the stochastic setting, the key advantage of our discretization schemes is computational. Key insights. Both the design and the anlaysis of an algorithm is challenging in this setting due to two reasons: (i) the large size of the discretization and (i) the asymmetric nature of feedback. First, naively maintaining UCBs for each price leads to large confidence intervals, and hence large regret as the size of the discretization is large; instead, we construct confidence intervals on estimates of the type distribution, and translate them to UCBs for the revenue. Second, the asymmetric nature of the feedback places us between bandit and full-information settings. Treating this like a bandit setting would lead to poor, exponential dependence on m in the regret. However, we are unable to treat this as full information since the type distribution is revealed only if there is a purchase. Handling this asymmetry requires a delicate construction of the UCB. 3. Learning to price in the adversarial setting. We study learning in an adversarial setting where the types on each round may be chosen adversarially. Table 2 shows the regret and time complexity of our method when paired with various discretization schemes. In the adversarial setting, our discretization schemes offer both computational and statistical advantages compared to prior work. Key insights. Our algorithm builds on the Follow-the-Perturbed-Leader (FTPL) [31], originally designed for full-information settings and not directly applicable here. To handle asymmetric feedback, we use the information we have about the valuation curves to keep track of which customers would not have made a purchase given a price curve. If a purchase is made and we observe feedback, we use the usual FTPL update, but if not, we reward each pricing curve with the sum of revenue of all types that would not purchase in that current round. 1.2 Related work Dynamic pricing. The online posted-price mechanism, also known as dynamic pricing, is a central research area in algorithmic market design [19, 33]. In the most classical setting [33], the seller sets a price for an item in each round, and a buyer purchases the item only if their valuation exceeds the posted price. While several extensions of this setting have been explored for both parametric [12, 20, 28, 29, 32, 46] and non-parametric [11, 17, 39, 40, 44] demands, most focus on single-parameter demands, i.e., selling a single item to buyers. Our data pricing problem is multi-parameter, as demands are parameterized by multiple outcomes, i.e. the number of data points. Bayesian unit-demand pricing problem. Formally, our data pricing problem is a variant of the Bayesian Unit-demand Pricing Problem (BUPP) [13]. BUPP addresses the problem of (offline) revenue maximization over a known distribution of unit-demand buyers, meaning they want to buy at Setting Assumptions Regret bound Complexity per iteration Reference M, F, S e O ((LT)m) Theorem 4.1 M, F, D e O(Jm T 3m/2) Adversarial M, F, S e O ((LT)m) Theorem 5.1 M, F, D e O Jm T 3m/2 Discretization method Assumptions Complexity per iteration Regret (Adversarial) Hartline and Koltun [25] F e O(2Nϵ N) e O(m Chawla et al. [14] M, F N O(ϵ 2 log ϵ 1) e O m T 3/4 Table 2: Comparison of regret and time complexity of our online learning methods when paired with our discretization schemes and schemes from prior work. See Table 1 for a description of the assumptions. All methods, including [14, 25] achieve O(m T) regret in the stochastic setting. most one item from the inventory. In BUPP, a seller has N distinct items to sell to a unit-demand buyer whose valuations are v = (v1, . . . , v N), where vi is the value of the ith item. Given prices pi, i [N], the unit-demand buyer purchases a single item i [N] that maximizes their utility: vi pi. Assuming the valuation profile v follows a known distribution D, the goal of BUPP is to find the best prices {pi}i [N] that maximize the seller s expected revenue. Our data pricing problem is a variant of BUPP in two ways: (1) We study the sequential setting where type distributions are unknown, while valuation profiles for each type are known, and (2) We assume monotonic values v1 v N, which is natural in data pricing. Unfortunately, BUPP is a computationally intractable problem, as is ours. BUPP is known to be NP-hard even when D is a product distribution [16]. Moreover, even assuming that values are monotonic (i.e., v1 v N), the problem remains (strongly) NP-hard [14]. Therefore, we aim to provide a reasonably efficient no-regret algorithm for our problem, especially when the number of types m is a fixed constant. The previous works most relevant to our paper are Hartline and Koltun [25] and Chawla et al. [14], which study offline revenue maximization for unit-demand buyers. Buyers in our problem are also unitdemand, as each amount of data points can be seen as an individual item. Revenue maximization for unit-demand buyers is known to be computationally intractable [24], even with ordered (monotonic) buyer values [14], leading these works to focus on approximation algorithms. Hartline and Koltun [25] proposed an approximation algorithm with near-linear runtime in the number of buyers, given a fixed number of items. Chawla et al. [14] introduced a polynomial-time approximation scheme (PTAS) for unit-demand buyers with monotonic values. In this work, we extend the framework to the online setting with partial feedback, which has more practical implications. In addition, Balcan and Beyhaghi [10] provide new guarantees for learning revenue-maximizing menus of lotteries and two-part tariffs, demonstrating that their discretization technique yields efficient solutions for specific pricing models. Similar discretization methods could be investigated in future work to potentially improve our approach in more complex data pricing scenarios. Market design for data-sharing. In recent years, there has been a plethora of work devoted to algorithmic market design for data sharing [6, 7, 30, 43]. These works provide ingenious solutions to challenges unique to the data market, such as free replicability and the difficulty of valuation due to the combinatorial nature of data. Except for Agarwal et al. [6], the above-cited solutions are inherently offline or single-shot. While we focus on a simplified yet relevant setting where data comes from a single source, resulting in monotonic valuations, in this work, we tackle the problem in a sequential, dynamic setting, which has practical importance. In contrast to our approach, Agarwal et al. [6] considered the price to be a constant (i.e., a scalar rather than a price vector) to address the inherent computational intractability of multi-dimensional pricing. Instead, we maintain the price as a vector (i.e., a price function) but focus on cases where the valuation function satisfies natural properties such as monotonicity, smoothness, and diminishing returns. 2 Problem setting, assumptions, and challenges A seller has N homogeneous data points. There are m types of buyers who wish to purchase this data. A buyer of type i [m] has a valuation curve vi : [N] [0, 1], where vi(n) is her value for n data points. We will assume vi(n) is non-decreasing as more data is valuable, and further that vi(0) = 0. Example 1. To motivate this model, consider a seller with N ordered data points {x1, . . . , x N}, drawn i.i.d. from a distribution D. If a buyer purchases n points, she receives the first n points, Xn = {x1, . . . , xn}. Her ex-post value evi(Xn) may represent the accuracy of her ML model trained with Xn. However, as the buyer has not seen the data before the purchase, she does not know which specific points she will receive, and hence her (ex-ante) value vi(n) = EXn[evi(Xn)] is the expected model accuracy when n i.i.d points are drawn from D. The different types could be buyers who use the data for different tasks or models. For instance, with Image Net s [21], N 1.4 million data points, different types of buyers could perform different learning tasks such as object detection, identification, and segmentation, and/or train different models such as Alex Net [36], Res Net [26], and Goog Le Net [42]. Both empirically and theoretically, for many learning tasks, vi(n) is non-decreasing, and satisfies additional characteristics such as smoothness and/or diminishing returns. Pricing curves, buyer utility, and buyer purchase model. Let p : [N] [0, 1] be a pricing curve chosen by the seller. Let P = {p : [N] [0, 1] : p(0) = 0} denote the set of all pricing curves. If a buyer purchases n points, her utility is ui(n) = vi(n) p(n). If a buyer can achieve non-negative utility, i.e. vi(n) p(n) for some n [N], she will purchase an amount of data to maximize her utility. To fully specify the buyer s purchase model, we will assume that when there are multiple n which maximizes her utility, she will choose the largest such n. Formally, for a given pricing curve p, a buyer of type i will purchase ni,p points where, ni,p = 0 if vi(n) < p(n) for all n [N], max argmaxn [N] (vi(n) p(n)) otherwise. (1) Optimal revenue. It follows that the revenue from a buyer of type is p(ni,p). Let q = (q1, . . . , qm) be the distribution of the buyers. Under this distribution q, the expected revenue rev(p) for a price curve p, the optimal price p OPT, and the optimal revenue OPT as follows: i=1 qi p(ni,p), p OPT = argmax p P rev(p), OPT = rev(p OPT). (2) We have omitted the dependence on q in rev, p OPT, and OPT. There is no closed-form solution to finding the optimal pricing curve, even when q is known. Therefore, in 3, we explore discretization methods to approximate p OPT, which will then be used in 4 and 5 to develop online learning algorithms. Unfortunately, the size of this discretization can be very large in N and m without further assumptions. Therefore, we also consider two additional commonly satisfied conditions by data. Our first such assumption states that buyer valuation curves satisfy a Lipschitz-like smoothness condition with Lipschitz constant L/N. We use L/N instead of L since the number of data has a range [0, N], while the valuations only have a range [0, 1]. This condition states that a buyer s valuation does not change significantly if she only purchases a few additional points. Assumption 1 (Smoothness, S). For all n, n [N], we have vi(n + n ) vi(n) L Our second condition is based on the fact that data typically exhibits diminishing returns [34, 35]. This means that an additional data point is more valuable when there is less data, i.e. vi(n+1) vi(n) is decreasing with n. We will in fact make a stronger assumption, and justify it below. Assumption 2 (Diminishing returns, D). There exists some J > 0 such that, for all types i [m], and for all n [N], we have vi(n + 1) vi(n) J Assumption 2 quantifies the rate of decrease of diminishing returns. Following Example 1, the valuation (accuracy) curves for many learning problems take the form vi(n) = α βn γ; for instance, for binary classification in a VC class H, α may be the best accuracy in H, β O( d H) where d H is the VC dimension, and γ = 1/2 [41]; similarly, for nonparametric regression of a twice differentiable function, α and β are constants while γ = 2/5 [45]. In such cases, Assumption 2 is satisfied with J = βγ. Note that neither assumption subsumes the other: a non-concave Lipschitz function will not satisfy Assumption 2, while a suitable L for a function which satisfies Assumption 2 may need to be very large for Assumption 1 to hold for small n. 2.1 Learning to price in online settings In this work, we will also study how a seller may learn to maximize revenue. In our learning problem, the seller is aware of the valuation curves {vi}i of each type, but does not know the distribution of types (stochastic setting) or there may be no such distribution (adversarial setting). Setup. The seller repeats the data market for T rounds. At the beginning of each round, he chooses some price curve pt P. After the seller has chosen pt, a new buyer of type it [m] appears and purchases nt = nit,pt amount of data (see (1)). The buyer is aware of her own valuation curve. If she makes a purchase, that is if nt > 0, she pays pt(nt) to the seller and reveals her type it. Otherwise, the buyer will make no payment and not reveal her type. We have assumed that a priori, the seller is aware of the buyer valuation curves {vi}i [m], and that buyers are aware of their own valuation curves. In Example 1, a seller can profile how different machine learning models perform with different amounts of data and publish them ahead of time. The buyers can also gauge their value from these curves, even though they do not have access to the data. Next, we have also assumed that buyers will reveal their type after the purchase. In modern machine learning as a service platforms [1, 4, 18], buyers directly run their jobs in the seller s computing platform, so the seller can observe the buyers job type directly. Even if this is not the case, sellers can elicit this information via questionnaires and reviews from customers who have made a purchase [23]. Challenges. Despite these assumptions, the learning problem remains challenging for two main reasons. First, the space of price curves is vast: discretizing the valuations in [0, 1] into K bins, still leaves O(KN) possible price curves, which is both statistically and computationally intractable, especially for large N. Second, in addition to the exploration-exploitation trade-off usually encountered in sequential decision-making, the seller faces a tension between high instantaneous revenue and information acquisition: setting high prices can yield high immediate revenue if a purchase occurs, but it also increases the risk of no purchase, resulting in no revenue and crucially no feedback about the buyer type which could help him in future rounds. This trade-off was recently studied for single-item markets in a stochastic setting [23, 47], but is more complex in our multi-item problem. Moreover, to our knowledge, no existing work addresses this asymmetric feedback model in an adversarial setting, even for single-item markets. Next, we describe the buyer arrival model and define the regret for the learning problem in both stochastic and adversarial settings. Stochastic setting. Here, there is some fixed but unknown distribution of types q. On each round, a buyer of type it q is drawn independently. The optimal expected revenue OPT under type distribution q is as defined in (2). The regret RT is as defined below. We wish to design algorithms which have small expected regret E[RT ], where the expectation accounts for both the sampling of types it q and any randomness in the algorithm. We have, t=1 pt(nt) = T OPT t=1 pt(nit,pt). (3) Adversarial setting. Here, the types on each round {it}T t=1 are chosen arbitrarily, possibly by an oblivious adversary, ahead of time. The type on round t is revealed to the seller only at the end of the round, and only if there is a purchase. In the adversarial setting, we define our regret RT with respect to the single best price in P in hindsight. We wish to design algorithms with small expected regret E[RT ], where the expectation is with respect to any randomness in the algorithm. We have, RT = max p P t=1 p(nit,p) t=1 pt(nit,pt). (4) Algorithm 1 Price discretization scheme under monotonicity Given: Approximation parameter ϵ > 0. Let W be discretization of the valuation space [0, 1] defined as follows, Zi = ϵ(1 + ϵ)i; i 0, 1, . . . , log1+ϵ 1 ϵ Wi = Zi 1 + Zi 1 ϵk m ; k {1, 2, ..., (2 + ϵ)m } , W = log1+ϵ 1 ϵ [ Set P to be the class of all m-step functions mapping [N] to W. 3 Efficient discretization of price curves with small errors We first study the revenue maximization problem in the offline setting, where the seller knows both the valuation curves vi, i [m], and the type distribution q. Our goal is to design a discretization so as to achieve revenue within a gap of O(ϵ) from OPT. Before discussing our discretization algorithms, we first show that the optimal pricing curve is simple when there are at most m types. Lemma 3.1. Assume there are m types with non-decreasing value curves {vi}i [m]. For any nondecreasing price curve p, there exists an m-step price curve p that yields expected revenue at least that of p with respect to any distribution over the m types. Here, m-step refers to non-decreasing functions f : [N] [0, 1] where f(n + 1) f(n) > 0 in at most m points (i.e., at most m jumps). Lemma 3.1, proven in Appendix A.1, will be an important tool in all three discretization algorithms of this section. It will allow us to reduce the space of pricing curves as we only need to focus on m-step price curves. Next, we present our first discretization procedure in Algorithm 1, which only assumes the monotonicity of the valuation curves. Discretization scheme under monotonic valuations. Our discretization proecdure, outlined in Algorithm 1, adapts the method in Hartline and Koltun [25] using Lemma 3.1. For this, we will first construct a discretization W of the valuation space as follows. Let Zi = ϵ(1 + ϵ)i, i = 0, 1, . . . , log1+ϵ 1 ϵ be the powers of (1 + ϵ) on price space [ϵ, 1]. For each i, we let Wi be a uniform discretization of the interval [Zi 1, Zi+1) uniformly with gap Zi 1 ϵ m. Finally, let W be the union of all such Wi. According to Lemma 3.1, every price function in P has the same revenue as an m-step function. We set P to be all choices of non-decreasing m-step functions that take value in W. We have the following theorem about Algorithm 1 which we prove in Appendix A.2. Theorem 3.1. Consider the discretization P as constructed in Algorithm 1. For any type distribution, there exists p P such that rev(p) OPT O(ϵ). Moreover, we have |P| e(N 1) m m e (2 + ϵ) log1+ϵ 1 ϵ m e O N Discretization scheme for smooth monotonic valuations. Due to space constraints, we present our algorithm, under Assumption 1 in Appendix A.4. We have the following theorem about Algorithm 5. Theorem 3.2. Consider the discretization P as constructed in Algorithm 5. Under Assumption 1, for any type distribution, there exists p P such that rev(p) OPT O(ϵ). Moreover, |P| O logm 1+ϵ (1/ϵ) (L/ϵ)m e O L Discretization scheme for monotone valuations under diminishing returns. Finally, we study discretization schemes under the diminishing returns condition. Our procedure, outlined in Algorithm 2 proceeds as follows. We use the same discretization W of the valuation space from Algorithm 1. Next, we will discretize the dataspace [N]. To exploit the structure in the diminishing returns condition, we will need to do so more densely when n is small. For this, let Yi = 2Jm ϵ2 (1 + ϵ2)i, i = 0, . . . , log1+ϵ2 Nϵ2 2Jm be the powers of (1 + ϵ2) on data space 2Jm ϵ2 , N . For each i, the set Qi further partitions the interval [Yi, Yi+1) uniformly with gap Yi ϵ2 2Jm. For n smaller than 2Jm ϵ2 , we do not discretize it as the valuations may change rapidly when n is small. Let ND be the union of Algorithm 2 Price discretization scheme monotonic valuations under diminishing returns Given: Diminishing returns constant J, approximation parameter ϵ. Let W = S log1+ϵ 1 ϵ i=2 Wi, were Wis are the same as in Algorithm 1. Let ND be discretization of the interval [0, N] defined as follows, ϵ2 (1 + ϵ2)i , i = 0, 1, . . . , log1+ϵ2 Nϵ2 Qi = Yi + Yi ϵ2k , k = 0, 1, . . . , 2Jm , Q = l log1+ϵ2 Nϵ2 2Jm m [ ND = 1, 2, . . . , 2Jm The discretization price set P is the class of all m-step price curves on function space ND W. 1, 2, . . . , 2Jm ϵ2 and all the set Qi. Therefore, ND has a size of at most 2Jm ϵ2 +2Jm log1+ϵ2 Nϵ2 2Jm . We have the following theorem about Algorithm 2 which we prove in Appendix A.5. Theorem 3.3. Consider the discretization P as constructed in Algorithm 2. Under Assumption 2, for any type distribution, there exists p P such that rev(p) OPT O(ϵ). Moreover, logm 1+ϵ 1/ϵ e O J Proof outline. By Lemma 3.1, we may assume the optimal price curve p = {(n i , p i )}m i=1 is an m-step function, where p i denote the value of p on step i. We generate an m-step price curve p = {(ni, pi)}m i=1 on space ND W such that ni is obtained by rounding down n i to the closest value in ND, and pi p i /(1 + ϵ). We then show that if a buyer purchases at step i under price p , she will not purchase at step j < i under new price p. Therefore, the revenue from this buyer is at least pi p i /(1 + ϵ) = p i O(ϵ), which ensures that rev(p) OPT O(ϵ). 4 Online learning in the stochastic setting We now study the online learning problem outlined in 2.1 in the stochastic setting. Our Algorithm, outlined in Algorithm 3 is based on the classical upper confidence bound (UCB) algorithm for stochastic bandits [8, 38]. It takes a discretization P of the pricing curves as input, and on each round chooses a pt P which has the largest UCB on the revenue. The key challenge lies in constructing an UCB. As P is large, naively constructing UCB over prices in P will lead to a q |P|T log T upper bound, leading to poor, exponential dependence on m. This is the bound if we only observe the reward for the prices that are actually pulled, but do not observe the types after purchase. Therefore, naively applying UCB is like bandit feedback. On the other extreme, had we been in an alternative setting where we observe the type regardless of purchase, this is like a full information feedback because once observe the type, we know the revenue for all prices. Then UCB gives us q log(|P|)T log T upper bound. We are in an intermediate regime between bandit feedback and full information: The challenge in constructing the UCB arises because we only observe types upon purchase. As the key unknown is the type distribution, we maintain UCBs for it and translate them to UCBs for the revenue. In particular, our UCB depends on how many times a buyer could have purchased at a given round, which is a random quantity depending on the algorithm itself. Construction of UCB. We will now show how to construct the upper confidence bound c revt at the end of round t, which will be used in computing pt+1. For τ t, let Sτ, defined below in (5), be the set of types who would have purchased in round τ at price pτ had they appeared in that round. Then, for any type i [m], we define Ti,t to be the number of times that type i appears in set Sτ for Algorithm 3 Online data pricing in the stochastic setting. Given: time horizon T, discretization P of price curves. Set p1 to be the zero function. # Give data away for free on round 1. A buyer of type i1 q arrives and purchases N data points at price 0. for t = 2 to T do Compute the UCB c revt 1(p) on the revenue of p for each p P. # See (5), (6), and (7). Set pt = argmaxp P c revt 1(p). A buyer of type it q arrives, purchases nit,pt points, and pays pt(nit,pt). end for τ {1, . . . , t}. That is, Ti,t measures the number of times a buyer of type i would have purchased during the first t rounds. We have, Sτ = i [m] : n [N], vi(n) pτ(n) 0 , Ti,t = τ=1 I(i Sτ). (5) Note that as we use the 0 price function on round 1, i.e. p1( ) = 0, we have Ti,t > 0 for all t > 1. Next, we estimate qi via the fraction of times that type i has appeared in the past t rounds, provided that i Sτ for τ {1, . . . , t}. We have defined this quantity, qi,t below in (6). Via a standard application of Hoeffding s inequality, we can show that qi qi,t p (log T)/Ti,t with high probability. Using this, we can construct an upper confidence bound bqi,t as follows, qi,t = 1 Ti,t τ=1 I(i Sτ, iτ = i), bqi,t = qi,t + We now translate the UCBs on q to the UCBs on the revenue. Recall from (1) that a buyer of type i will purchase ni,p points at price p and the revenue from this buyer will be p(ni,p). Note that as the seller has access to the valuation curves, he can compute ni,p for any i and price curve p. Since rev(p) = Ei q[p(ni,p)], we have the following natural UCB for rev(p) on round t: c revt(p) = i=1 bqi,t p(ni,p). (7) This completes the description of our construction. The following theorem bounds the regret for Algorithm 3 when paired with any of the discretization schemes in 3. While the computational complexity of our method depends on |P|, there is no dependence on the regret because of the above construction of the UCB. The proof is given in Appendix C. Theorem 4.1. Suppose in Algorithm 3 we use a discretization P which is a O(1/ T) additive approximation to any price curve. Then, the regret of Algorithm 3 satisfies E[RT ] e O(m Proof challenges. When bounding the regret, we first observe that the subsets S [m] induces a partitioning of the price curves, where p belongs to the partition of S, if all types in S would make a purchase at price p, and all types in Sc would not make a purchase at price p. With this insight, we can view the action of a seller as not just choosing a price curve, but also choosing a set St [n]. That is, St can be viewed as a super-arm in a combinatorial semi-bandit problem [37]. 5 Online learning in the adversarial setting We now study the adversarial setting. Similar to the stochastic setting, our algorithm will use a discretization of the price curves from 3. We will control regret by bounding both the discretization error and the algorithm s regret relative to the best pricing curve in the discretization. Before proceeding, let us first contextualize our feedback model against prior work. If the buyers do not reveal their types, this becomes an adversarial bandit problem with |P| arms (pricing curves) [33]. Using an algorithm such as EXP-3 [9] results in large e O(T 1/2|P| 1/2) regret, which is not ideal due to |P| s exponential dependence in m. Conversely, if buyers reveal their types regardless of purchase, Algorithm 4 Online data pricing in the adversarial setting. Given: time horizon T, discretization P, perturbation parameter θ. For each p P, sample θp from an exponential distribution with pdf θe θx for t = 1 to T do Set price curve for the current round pt = argmax p P τ=1 rτ(p) + θp. A buyer of type it arrives, purchases nit,pt points, and pays pt(nit,pt). if nit,pt > 0 then Set rt(p) = p(nit,p) for all p P. # If there was a purchase else Set rt(p) = P i Sc t p(ni,p) for all p P. # See (5) for St. end if end for this is equivalent to full information feedback, where algorithms such as Hedge or Follow-theperturbed-leader (FTPL) [31] yield O(T 1/2 log 1/2 |P|) regret, translating to e O((m T) 1/2) with our discretization schemes in 3. In our intermediate regime, where feedback is only revealed upon purchase, we aim for a middle ground. We show our algorithm, outlined in Algorithm 4, achieves e O(m3/2T 1/2) regret, which is worse than full information, but still depends polynomially on m. Our algorithm takes a discretization P and a perturbation parameter θ as input. First, it samples a random perturbation θp from an exponential distribution with pdf θe θx for each pricing curve p in P. It maintains rewards {rt(p)}t,p for each round t and price curve p. On each round, it chooses the price curve that maximizes the perturbed cumulative reward Pt τ=1 rτ(p) + θp. This scheme is similar to FTPL, but the key difference is in how we design the rewards {rt(p)}t,p. To describe this, let St, defined exactly as in (5), be the set of agents who would have purchased in round t at price pt. At the end of the round, if there was a purchase, for all prices p P, we set the reward to be rt(p) = p(nit,p), i.e. the payment we would have received from the buyer at that round, had the price been p (see (1)). If there was no purchase, we know that it / St, in which case we set rt(p) = P i Sc t p(ni,p). In this case, rt(p) is an upper bound on p(nit,p), and this upper bound is tight around prices similar to the chosen price pt; in fact, rt(pt) = 0 if there was no purchase. Intuitively, rt(p) deals with the uncertainty of not knowing the type on round t by providing a large reward (as we are taking the sum) to prices that could have resulted in a purchase, which encourages exploration of such prices in future rounds. This intuition will help us bound the regret. Theorem 5.1 provides a bound on the regret for Algorithm 4. Its proof is given in Appendix B. Combining this with the size of P under the various assumptions in 3, we obtain e O(m 3/2 Theorem 5.1. Suppose in Algorithm 4 we use a discretization P which is a O(1/ T) additive approximation to any price curve. Let RT be as defined in (4). Then, for Algorithm 4, we have E[RT ] O m2θT + θ 1 1 + log P . Setting θ = q m2T , we have E[RT ] O m q 6 Conclusion and Discussion We designed revenue-optimal learning algorithms for pricing data. First, we leveraged properties like smoothness and diminishing returns to create novel discretization schemes for approximating any pricing curve. These schemes were then used in our learning algorithms to improve their statistical and computational properties. Our algorithms build on classical methods like UCB and FTPL but required significant adaptations to handle the vast space of pricing curves and the asymmetric feedback. An interesting future direction would be to relax the assumption that the seller knows the valuation curves vi. Computational complexity. Our algorithm is designed to achieve polynomial computational complexity with respect to the number of data points when the number of types is fixed, making it suitable for practical data pricing scenarios where the type count is typically small or bounded. While the overall computational cost grows exponentially with the number of types due to the problem s strong NP-hardness (see [14]), this design choice ensures computational feasibility in settings with large datasets and a limited number of types. [1] AWS Forecast. https://aws.amazon.com/forecast/, . Accessed: 2024-05-12. [2] AWS Data Hub. https://aws.amazon.com/blogs/big-data/tag/datahub/, . Accessed: 2024-05-11. [3] Azure Data Share. https://azure.microsoft.com/en-us/products/data-share. Accessed: 2024-05-10. [4] Delta Sharing. https://docs.databricks.com/en/data-sharing/index.html. Accessed: 2024-05-11. [5] Ads Data Hub. https://developers.google.com/ads-data-hub/guides/intro. Accessed: 2022-05-10. [6] A. Agarwal, M. Dahleh, and T. Sarkar. A marketplace for data: An algorithmic solution. In Proceedings of the 2019 ACM Conference on Economics and Computation, pages 701 726, 2019. [7] A. Agarwal, M. Dahleh, T. Horel, and M. Rui. Towards data auctions with externalities. ar Xiv preprint ar Xiv:2003.08345, 2020. [8] P. Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3(Nov):397 422, 2002. [9] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM journal on computing, 32(1):48 77, 2002. [10] M. F. Balcan and H. Beyhaghi. New guarantees for learning revenue maximizing menus of lotteries and two-part tariffs. Transactions on Machine Learning Research, 2024. [11] O. Besbes and A. Zeevi. Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Operations Research, 57(6):1407 1420, 2009. [12] O. Besbes and A. Zeevi. On the (surprising) sufficiency of linear models for dynamic pricing with demand learning. Management Science, 61(4):723 739, 2015. [13] S. Chawla, J. D. Hartline, and R. Kleinberg. Algorithmic pricing via virtual valuations. In Proceedings of the 8th ACM Conference on Electronic Commerce, pages 243 251, 2007. [14] S. Chawla, R. Rezvan, Y. Teng, and C. Tzamos. Pricing ordered items. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 722 735, 2022. [15] W. Chen, W. Hu, F. Li, J. Li, Y. Liu, and P. Lu. Combinatorial multi-armed bandit with general reward functions. Advances in Neural Information Processing Systems, 29, 2016. [16] X. Chen, I. Diakonikolas, D. Paparas, X. Sun, and M. Yannakakis. The complexity of optimal multidimensional pricing. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 1319 1328. SIAM, 2014. [17] W. C. Cheung, D. Simchi-Levi, and H. Wang. Dynamic pricing and demand learning with limited price experimentation. Operations Research, 65(6):1722 1731, 2017. [18] Citrine Informatics. Citrine Informatics Accelerating Materials Innovation. URL: https: //citrine.io/, 2024. Accessed: March 9, 2024. [19] A. V. Den Boer. Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys in Operations Research and Management Science, 20(1):1 18, 2015. [20] A. V. den Boer and B. Zwart. Simultaneously learning and optimizing using controlled variance pricing. Management Science, 60(3):770 783, 2014. [21] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248 255. Ieee, 2009. [22] M. Dudík, N. Haghtalab, H. Luo, R. E. Schapire, V. Syrgkanis, and J. W. Vaughan. Oracleefficient online learning and auction design. Journal of the ACM (JACM), 67(5):1 57, 2020. [23] W. Guo, N. Haghtalab, K. Kandasamy, and E. Vitercik. Leveraging reviews: Learning to price with buyer and seller uncertainty. In Proceedings of the 24th ACM Conference on Economics and Computation, pages 816 816, 2023. [24] V. Guruswami, J. D. Hartline, A. R. Karlin, D. Kempe, C. Kenyon, and F. Mc Sherry. On profit-maximizing envy-free pricing. In SODA, volume 5, pages 1164 1173, 2005. [25] J. D. Hartline and V. Koltun. Near-optimal pricing in near-linear time. In Proceedings of the 9th International Conference on Algorithms and Data Structures, WADS 05, page 422 431, Berlin, Heidelberg, 2005. Springer-Verlag. ISBN 3540281010. doi: 10.1007/11534273_37. URL https://doi.org/10.1007/11534273_37. [26] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770 778, 2016. [27] M. Jagadeesan, A. Wei, Y. Wang, M. Jordan, and J. Steinhardt. Learning equilibria in matching markets from bandit feedback. Advances in Neural Information Processing Systems, 34:3323 3335, 2021. [28] A. Javanmard. Perishability of data: dynamic pricing under varying-coefficient models. The Journal of Machine Learning Research, 18(1):1714 1744, 2017. [29] A. Javanmard and H. Nazerzadeh. Dynamic pricing in high-dimensions. The Journal of Machine Learning Research, 20(1):315 363, 2019. [30] R. Jia, D. Dao, B. Wang, F. A. Hubis, N. Hynes, N. M. Gürel, B. Li, C. Zhang, D. Song, and C. J. Spanos. Towards efficient data valuation based on the Shapley value. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1167 1176. PMLR, 2019. [31] A. Kalai and S. Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291 307, 2005. [32] N. B. Keskin and A. Zeevi. Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Operations Research, 62(5):1142 1167, 2014. [33] R. Kleinberg and T. Leighton. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., pages 594 605. IEEE, 2003. [34] A. Krause and C. Guestrin. Submodularity and its applications in optimized information gathering. ACM Transactions on Intelligent Systems and Technology (TIST), 2(4):1 20, 2011. [35] A. Krause, H. B. Mc Mahan, C. Guestrin, and A. Gupta. Robust submodular observation selection. Journal of Machine Learning Research, 9(12), 2008. [36] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 2012. [37] B. Kveton, Z. Wen, A. Ashkan, and C. Szepesvari. Tight regret bounds for stochastic combinatorial semi-bandits. In Artificial Intelligence and Statistics, pages 535 543. PMLR, 2015. [38] T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in applied mathematics, 6(1):4 22, 1985. [39] K. Misra, E. M. Schwartz, and J. Abernethy. Dynamic online pricing with incomplete information using multiarmed bandit experiments. Marketing Science, 38(2):226 252, 2019. [40] G. Perakis and D. Singhvi. Dynamic pricing with unknown nonparametric demand and limited price changes. Operations Research, 2023. [41] S. Shalev-Shwartz and S. Ben-David. Understanding machine learning: From theory to algorithms. Cambridge university press, 2014. [42] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1 9, 2015. [43] T. Wang, J. Rausch, C. Zhang, R. Jia, and D. Song. A principled approach to data valuation for federated learning. Federated Learning: Privacy and Incentive, pages 153 167, 2020. [44] Y. Wang, B. Chen, and D. Simchi-Levi. Multimodal dynamic pricing. Management Science, 67 (10):6136 6152, 2021. [45] L. Wasserman. All of nonparametric statistics. Springer Science & Business Media, 2006. [46] J. Xu and Y.-X. Wang. Logarithmic regret in feature-based dynamic pricing. Advances in Neural Information Processing Systems, 34:13898 13910, 2021. [47] H. Zhao and W. Chen. Stochastic one-sided full-information bandit. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 150 166. Springer, 2019. A Omitted Details from Section 3 A.1 Proof of Lemma 3.1 Lemma 3.1. Assume there are m types with non-decreasing value curves {vi}i [m]. For any nondecreasing price curve p, there exists an m-step price curve p that yields expected revenue at least that of p with respect to any distribution over the m types. Here, m-step refers to non-decreasing functions f : [N] [0, 1] where f(n + 1) f(n) > 0 in at most m points (i.e., at most m jumps). Proof of Lemma 3.1. Fix a price curve p. Let ni,p be the amount of data type i purchase at price curve p, that is argmax n [N] (vi(n) p(n)) For {ni,p}i [m], let π : [m] [m] be a permutation such that nπ(1),p nπ(2),p nπ(m),p. Let n(i) = nπ(i),p. Then, define a function p : [N] [0, 1] as follows, p n(1) , n n(1), p n(2) , n(1) < n n(2), ... p n(m 1) , n(m 2) < n n(m 1), p n(m) , n(m 1) < n N, so that p has at most m steps. Then, p has following properties, p(n) = p(n), when n n(1), n(2), . . . , n(m) , p(n) p(n), when n [N] \ n(1), n(2), . . . , n(m) . We next prove that for any i [m], after changing the price function from p to p, the type i buyer either purchases at (ni,p, p(ni,p)) or at (N, p(n(m))). For any type i and any amount of data n n(m), there exists k such that n(k 1) < n n(k) (let n(0) = 0), we then have vi(n) p(n) vi n(k) p n(k) (as vi is non-decreasing and p is a step function.) = vi n(k) p n(k) (as p n(k) = p n(k) ) vi(ni,p) p(ni,p) (as ni,p maximizes the buyer s utility.) = vi(ni,p) p(ni,p). (as p(ni,p) = p(ni,p)) As shown in the above, type i still prefers purchasing ni,p data over all n n(m) under price p. For n n(m) + 1, . . . , N , by the monotonicity of value curves, we have arg max n {n(m)+1,...,N} (vi(n) p(n)) Therefore, for any i [m], type i either purchases at (ni,p, p(ni,p)), or purchases at (N, p(N)) = (N, p(n(m))) under price p. No matter in which case, type i contributes no less revenue under p than p. It then follows that, for any type distribution q, rev( p) rev(p). A.2 Proof of Theorem 3.1 In this subsection, we prove Theorem 3.1 by decomposing it into three technical lemmas (Lemma A.1, A.2 and A.3). In Lemma A.1 and A.2, we prove the approximation guarantee of our discretization scheme and, in Lemma A.3 we provide an upper bound on the size of the discretization. Lemma A.1. For any type distribution, there exists a pricing function ep : [N] [ϵ, 1] such that rev(ep) OPT ϵ. Proof of Lemma A.1. Consider the optimal pricing function p : [N] [0, 1], i.e., OPT = rev(p ). Consider price curve ep : [N] [ϵ, 1] where ep(n) = max (ϵ, p (n)). Let J = {n [N] : ep(n) = p (n)} be the set of data quantities whose price under ep are the same as those under p. Any buyer type who would have purchased n J amount of data under p will purchase the same amount of data under ep. On the other hand, for buyer types who would have purchased n / J amount of data under p , since ep(n) = ϵ > p (n) for n / J, the expected revenue contribution from such buyers under p is at most ϵ, hence no matter they purchase or not under ep, we have rev(ep) OPT ϵ. Lemma A.2. For any ep [ϵ, 1]N there exists p P such that rev(p ) rev(ep)/(1 + ϵ), for any type distribution q. Proof of Lemma A.2. For m buyer types, by Lemma 3.1, there exists a non-decreasing step function p [ϵ, 1]N with at most m steps, whose expected revenue is at least rev(ep). Assume p has k steps, k m. To simplify the notation, for 1 j k, let pj denote the price p on jth step. That is, p1, n (0, i1] Z, p2, n (i1, i2] Z, ... pk, n (ik 1, N] Z. Where i1, . . . , ik 1 [N] are discontinuities in p. Recall the definitions of Z and W as stated in Algorithm 1, Zi = ϵ(1 + ϵ)i : i 0, 1, . . . , log1+ϵ 1 ϵ Wi = Zi 1 + Zi 1 ϵk m : k {1, 2, ..., (2 + ϵ)m } , W = log1+ϵ 1 ϵ [ Let ik = N and for each j [k], let Zij be the price obtained by rounding pj down to the nearest value in Z. By constructions of Z and W above, Wij is a partition of interval (Zij 1, Zij+1). Let wj be the price obtained by rounding pj down to the nearest value in Wij. Set dj = ϵ m Zij 1 and consider k-step function p defined by whose price at jth step (denoted pj) is wj (j 1)dj Wij, that is p1 = w1, for n (0, i1] Z, p2 = w2 d2, for n (i1, i2] Z, ... pk = wk (k 1)dk, for n (ik 1, N] Z. By the tie-breaking rule and the monotonicity of valuation curves, buyers only purchase among 0, i1, i1, . . . , ik number of data under p and p. Subclaim. Then, p and p satisfies the following rev(p) rev( p)/(1 + ϵ), (8) with respect to any type distribution. Proof of the Subclaim. We prove the above subclaim with two steps. Step 1: No buyer who prefers to purchase ij data under p would prefer ij data for some j < j under p (i.e., one with a less price). This is because, when going from price p to p, the increase in the buyer s utility for ij data is pj pj, which is higher than the increase pj pj for ij data. Formally, this can be seen as follows: For any j < j we have, pj pj wj pj = (j 1)dj, as pj wj and pj = wj (j 1)dj. Moreover, pj < wj + dj = pj pj < wj + dj pj = j dj . (9) The inequality pj < wj + dj holds because wj is the result of rounding down pj to the nearest value in Wij. By constructions of sets Z and W, we have dj dj which implies (j 1)dj j dj . Then, by combining the above inequalities, we obtain pj pj (j 1)dj j dj pj pj . (10) Consider a buyer with value curve v who prefers to purchase at ij under price p, then it must be v(ij) pj > v(ij ) pj . (11) Then, by combining (10) and (11), we have v(ij) pj > v(ij ) pj , therefore the buyer would not purchase at ij < ij under p. Step 2: Next, we claim that pj pj/(1 + ϵ) for all step j [k]. Since Zij is obtained by rounding pj down to the nearest value in Z, we have pj Zij = Zij 1 + ϵZij 1 = Zij 1 + mdj. (12) By (9) and the above, we have pj pj jdj Zij 1 + (m j)dj Zij 1, where the first inequality is by (9), the second is by (12), and the third is because m j. Then, it follows that pj pj j dj = ϵ j m Zij 1 ϵ Zij 1 ϵ pj = pj pj/(1 + ϵ). So far we have proved pj pj/(1 + ϵ) and no type wants to change their preference to a smaller amount of data under p. If one type purchase at pi under p and pk under p for k i, then pk pi pi/(1 + ϵ). Therefore, we have rev(p) rev( p)/(1 + ϵ) rev( p)/(1 + ϵ). Since the construction of price p is not relevant to type distribution, the above holds for any type distribution q, which proves the subclaim. Note that p constructed in the above subclaim is not necessarily non-decreasing as a larger amount of data surfers more price deduction when going from p to p. In this case, we can directly construct a non-decreasing price curve p P from p such that rev(p ) rev( p)/(1 + ϵ). Let S = {i [k] : j < i, s.t. pj > pi}. If S is empty, this implies that p is non-decreasing, hence setting p = p. If S is not empty, we define p as follows: Let p be a k-step function with the same jump points i1, . . . , ik as p. Let p i be the value of p on ith step. Then, for i / S, let p i = pi; and for i S, let p i = maxj / S,j p on set S. Next, we claim that pj p j is non-decreasing for all j [k]. Both ( pj pj)j [k] and p are non-decreasing with respect to j by the previous results. Hence, pj p j < pj p j pj+1 pj+1 = pj+1 p j+1, if j S, j + 1 / S, pj p j = pj p j pj+1 pj+1 = pj+1 p j+1, if j / S, j + 1 / S, pj p j = pj p j+1 pj+1 p j+1, if j / S, j + 1 S, (as p j+1 = p j) pj p j = pj p j+1 pj+1 p j+1, if j S, j + 1 S. (as p j+1 = p j) Therefore, any type that prefers to purchase at jth step under p would not prefer purchasing at any step j < j under p , and since p j pj pj/(1 + ϵ), we have rev(p ) rev( p)/(1 + ϵ) rev(ep)/(1 + ϵ). Lemma A.3. When n > m, P e N m m e (2 + ϵ) log1+ϵ 1 ϵ m. Proof of Lemma A.3. For any integer i m, the number of non-decreasing i-step price function is N 1 i |W | i , hence we have m e (2 + ϵ) log1+ϵ 1 ϵ In the last inequality, we use the fact that |W| (2 + ϵ)m log1+ϵ 1 ϵ . Finally, Theorem 3.1 follows directly from the above lemmas. Theorem 3.1. Consider the discretization P as constructed in Algorithm 1. For any type distribution, there exists p P such that rev(p) OPT O(ϵ). Moreover, we have |P| e(N 1) m m e (2 + ϵ) log1+ϵ 1 ϵ m e O N Proof of Theorem 3.1. Combining Lemma A.1 and Lemma A.2 together, we conclude that there exists price curve p P such that rev(p ) rev( p) 1 + ϵ OPT ϵ 1 + ϵ OPT 2ϵ 1 + ϵ = OPT O(ϵ). The size of P follows from Lemma A.3. A.3 Price discretization scheme for smooth monotonic valuations We study discretization schemes to approximate monotone valuations under the smoothness condition in Assumption 1. Our procedure is outlined in Algorithm 5. The discretization W of the valuation space follows Algorithm 1. Additionally, we uniformly split the data space into multiples of ϵN m L , denoting them as the set NS. We then set the discretization P to be the class of all m-step price curves on the function space NS W. The following theorem, proven in Appendix A.4, outlines the main properties of this discretization scheme: the size of the discretization has no dependence on the number of data N. Algorithm 5 Price discretization scheme for smooth monotonic valuations Given: Smoothness constant L, approximation parameter ϵ > 0. Let W be discretization of the valuation space [0, 1] given in Algorithm 1. Let NS be the following discretization of the interval [0, N], , NS = δk : k N Set P to be the class of all m-step functions mapping NS W. A.4 Proof of Theorem 3.2 Theorem 3.2. Consider the discretization P as constructed in Algorithm 5. Under Assumption 1, for any type distribution, there exists p P such that rev(p) OPT O(ϵ). Moreover, |P| O logm 1+ϵ (1/ϵ) (L/ϵ)m e O L Proof of Theorem 3.2. By Lemma 3.1, there is a revenue optimal price curve p : [N] [0, 1] which is a k-step function, for some k [m]. Where p can be compactly represented as the following set of tuples: {(n 1, p 1), (n 2, p 2), . . . , (n k, p k)} , where n 1, . . . , n k denote the locations of jumps and p i denote the value of p on step i [k] (i.e. p (n) = p i for n (n i 1, n i ]). Let ϵ := ϵ m. Next, we generate a price p using Algorithm 6, which ensures that the price curve p generated in the following step (13) is non-decreasing. We demonstrate that in each round of Algorithm 6, we incur a revenue loss of at most ϵ. If p i > p i 1 + ϵ, everything remains the same and thus does not affect the expected revenue. If not, we combine the price of step i with step i 1, let p j = p j p i p i 1 for j = i, . . . , k. During this process, buyers either make purchases at the same step, or switch to purchase at a higher step. Note that p i p i 1 < ϵ, so the revenue loss of each type is at most ϵ. This implies that the revenue loss in each round is at most ϵ. As there are k rounds, we lose expected revenue of at most m ϵ. We conclude that rev(p ) is within a gap of ϵ from OPT, i.e., rev(p ) OPT ϵ. Algorithm 6 Auxiliary Price Adjustment Input: Optimal price curve p . Let p = p . for i = 2, . . . , k do if p i < p i 1 + ϵ then for j = i, . . . , k do p j = p j p i p i 1 . end for end if end for Output: Price curve p . After combining some steps in Algorithm 6, Assume that p is a k-step function ( k k) represented by (n 1, p 1), (n 2, p 2), . . . , (n k, p k) . Then, we define a new price curve p P as follows: let δ := ϵN L , then p is a k-step function represented by {(n1, p1), (n2, p2), . . . , (n k, p k)} , δ, pi = p i i ϵ. (13) First, we show that no buyer who purchases at step i under p would purchase at step j < i under p. Let the buyer s valuation be v. First, we prove that the buyer s utility is non-negative at ni: v(ni) pi v(n i) δ L N pi (by L/N-Smoothness of v.) = v(n i) δ L N p i + i ϵ v(n i) ϵ p i + i ϵ (as δ L = v(n i) p i + (i 1) ϵ v(n i) p i 0. Then, we prove that the buyer s utility at ni is larger than that of nj for j < i, therefore, the buyer would not prefer buying at step j < i under price p. v(ni) pi (v(nj) pj) v(n i) δ L N v(n j) (pi pj) (by L/N-Smoothness of v.) = v(n i) δ L N v(n j) (p i p j (i j) ϵ) v(n i) ϵ v(n j) (p i p j (i j) ϵ) (as δ L = (v(n i) p i) (v(n j) p j) + (i j 1) ϵ (v(n i) p i) (v(n j) p j) (as i > j) 0. (as the buyer prefers ni than nk under p .) Finally, fix the type distribution (q1, . . . , qm), then we have rev(p ) rev(p) i=1 (p i pi) I(Type j purchase at p i under price p ) m ϵ = ϵ. (as ϵ = m ϵ.) Hence, rev(p) is within a gap of 2ϵ from OPT. We then apply Theorem 3.1 to price p. Therefore, it is enough to consider price functions from the set NS = kδ : k = 1, . . . , N δ [N] to W to approximate the revenue within O(ϵ) gap. Moreover, this discretization is of the size N δ |W | O log1+ϵ 1 A.5 Proof of Theorem 3.3 Theorem 3.3. Consider the discretization P as constructed in Algorithm 2. Under Assumption 2, for any type distribution, there exists p P such that rev(p) OPT O(ϵ). Moreover, logm 1+ϵ 1/ϵ e O J Proof of Theorem 3.3. For each i = 0, 1, . . . , l log1+ϵ2 Nϵ2 2Jm m , let Yi = 2Jm ϵ2 (1 + ϵ2)i , and Qi be the set nj Yi + Yiϵ2 2Jmk k : k = 1, . . . , 2Jm o , i.e., Qi splits the interval [Yi, Yi+1] equally into 2m J parts. The union of Qis and the set 1, 2, . . . , 2Jm ϵ2 form a set of grids on [0, N], denoted by ND. There are at most 2Jm ϵ2 + 2Jm log1+ϵ2 Nϵ2 2Jm grids in total. By Lemma 3.1, there is a revenue optimal price curve p : [N] [0, 1] which is a k-step function, for some k [m]. Where p can be compactly represented as the following set of tuples: {(n 1, p 1), (n 2, p 2), . . . , (n k, p k)} , where n 1, . . . , n k denote the locations of jumps and p i denote the value of p on step i [k] (i.e. p (n) = p i for n (n i 1, n i ]). Then, define a new k-step price curve p via {(n1, p1), (n2, p2), . . . , (nk, pk)} , where ni is given by ni round down n i to the closest grid in ND. Then we define pi below. If p i < ϵ(1 + ϵ), let pi = ϵ(1 + ϵ); otherwise, let Zn i be the price obtained by rounding p i down to the nearest value in Z. By constructions of Z and W above, Wn i is a partition of interval (Zn i 1, Zn i +1). Let wi be the price obtained by rounding p i down to the nearest value in Wn i . Set di = ϵ m Zn i 1. Then define pi = wi i di Wn i . First, we prove for i satisfying p i > ϵ(1 + ϵ), if a buyer purchases at ni under price p , she will not purchase at nj, j < i under new price p. We prove this property separately when ni 2Jm ϵ2 and ni > 2Jm (i) When ni > 2Jm The buyer s utility at ni under price p is, v(ni) pi = v(n j) p i + (p i pi (v(n i ) v(ni))) . (14) Let δi = v(n i ) v(ni). Then δi is upper bounded by, h=ni v(h + 1) v(h) ni (n i ni) 2m J + 1 = ϵ2 where the third inequality is due to Lemma A.4. By the construction of p, we have p i pi = Zni 1 ϵi Therefore, by (14), v(ni) pi v(n i ) p i 0, buyer s utility at ni under price p is non-negative. Next, we claim that v(ni) pi (v(nj) pj) 0. To prove this, for any j < i, let δj = v(n j) v(nj), then we have v(ni) pi (v(nj) pj) = v(n i ) p i (v(n j) p j) + (p i pi δi) (p j pj δj) Where v(n i ) p i (v(n j) p j) 0 because the buyer prefers n i over n j under price p . Recall that we have δj 0, then we bound δi δj as follows, δi δj δi ϵ2 By the construction of pi, we have, p i pi (p j pj) = Zni 1 ϵi (as Zni 1 Znj 1) Therefore, combining (17) and (18) together, we have v(ni) pi (v(nj) pj) v(n i ) p i (v(n j) p j) 0. We conclude that under price p, the buyer prefers ni over nj, for any j < i. (ii) When ni 2Jm In this case, ni = n i , and for any j < i, we still have nj = n j. First, we prove the buyer s utility at n i under p is non-negative: v(ni) pi = v(n i ) pi = v(n i ) p i + (p i pi) v(n i ) p i 0. Then, we show that the buyer prefers ni over nj under p: v(ni) pi (v(nj) pj) = v(n i ) p i (v(n j) p j) + (p i pi δi) (p j pj δj) = v(n i ) p i (v(n j) p j) + (p i pi) (p j pj) v(n i ) p i (v(n j) p j) where the first inequality is due to (18), and the second is because the buyer prefers n i over n j under p . So far we have completed the proof that for i satisfying p i > ϵ(1 + ϵ), if a buyer purchases at ni under price p , she will not purchase at nj, j < i under new price p. Then, similar to Step 2 in the proof of Lemma A.2, we have p p 1+ϵ pointwise. We then conclude the proof by observing rev(p) rev(p ) O(ϵ) 1 + ϵ = OPT O(ϵ). Lemma A.4. When ni > 2Jm ϵ2 , we have n j ni ni ϵ2 2Jm + 1. Proof of Lemma A.4. By the construction of discretization set, ni must have the following form, Yi + Yi ϵ2k , where Yi = 2Jm ϵ2 (1 + ϵ2)i for some i , k Z. Since n j is obtained by rounding down nj to the nearest grid in ND, nj satisfies the following inequality, nj n j Yi + Yi ϵ2(k + 1) Therefore, we have n i ni Yi + Yi ϵ2(k + 1) = Yi + Yi ϵ2(k + 1) 2Jm Yi + Yi ϵ2k Yi + Yi ϵ2(k + 1) 2Jm Yi + Yi ϵ2k Where in the last inequality, since Yi is an integer, and we have n i = Yi + Yi ϵ2k Yi , for k 0. B Proof of Theorem 5.1 Theorem 5.1. Suppose in Algorithm 4 we use a discretization P which is a O(1/ T) additive approximation to any price curve. Let RT be as defined in (4). Then, for Algorithm 4, we have E[RT ] O m2θT + θ 1 1 + log P . Setting θ = q m2T , we have E[RT ] O m q Proof of Theorem 5.1. Recall that the regret RT for the adversarial setting is RT = max p P t=1 r(it, p) t=1 r(it, pt) t=1 r(it, p) max p P t=1 r(it, p) | {z } Loss of revenue due to discretization t=1 r(it, p) t=1 r(it, pt). = RT (discretization regret) We decompose RT into two regrets. The first term is the sacrifice of revenue on discretization. The second term is the algorithm regret when competing against the optimal price within the discretization set P. According to Theorem 3.1, our discretization scheme approaches optimal revenue within a gap of 2ϵ 1+ϵ: t=1 r(it, p) max p P t=1 r(it, p) 2ϵT 1 + ϵ < 2ϵT. (20) Therefore, the first term can be bounded by 2ϵT. According to Theorem B.1, the second term discretization regret is upper bounded by E[RT ] 3m q T log P . (21) Combining (20) and (21) together, we have, E[RT ] 2ϵT + 3m q T log P = O m q T log P . (as ϵ = 1 Plug in the size of discretization set in Section 3, we have, E[RT ] = e O m 3/2 Theorem B.1. The discretization regret RT defined in (19) has upper bound O m q Proof of Theorem B.1. We first claim that rt(pt) = r(it, pt) all t. If the buyer make a purchase at round t, rt(pt) = r(it, pt) holds by definition. But if the buyer does not purchase at a price pt on round t, r(it, pt) = 0. Since Sc t contains all the types that would not make a purchase at pt, we have r(i, pt) = 0, i Sc t , and r(it, pt) = X i Sc t r(i, pt) = rt(pt) = 0. Therefore, rt(pt) = r(it, pt) holds for every round t [T]. Denote p as, p = argmax p P t=1 r(it, p). Then, we decompose the regret as follows, t=1 r(it, p ) E t=1 r(it, pt) t=1 r(it, p ) E t=1 (r(it, p ) rt(p )) t=1 rt(pt+1) t=1 rt(pt+1) rt(pt) We bound three terms in (22) separately. The first term. For any price p and any round t, we have rt(p) r(it, p) by definition. Hence, t=1 (r(it, p ) rt(p )) 0. (23) The second term. Since p = argmax p P PT t=1 r(it, p). We apply Lemma B.1 to p , t=1 rt(pt+1) θp1 θp . Note that both θp1 and θp are drawn i.i.d. from exponential distribution, E[θp1] E max p P θp E[θp ] E max p P θp t=1 rt(pt+1) E θp1 θp 1 + log P The third term. Note that for any price p P and any round t, rt(p) m. Therefore we have, E [rt(pt+1) rt(pt)] = P (pt+1 = pt) E [rt(pt+1) rt(pt) | pt+1 = pt] m P (pt+1 = pt) . The price curve on round t is pt, then by the price updation rule, pt = argmax p P τ=1 rτ(p) + θp, which is equivalent to, τ=1 rτ(pt), p P. For all p P, let ct 1,p denote ct 1,p , (25) then pt = p is equivalent to θp ct 1,p . (26) Subclaim. If θpt also satisfies the following condition (27), τ=1 rτ(pt) + m, p P, (27) then pt+1 = pt. Proof of the Subclaim. If (27) holds for all p P, τ=1 rτ(pt) + m τ=1 rτ(pt). (because p P, rt(p) [0, m]) pt = argmax p P τ=1 rτ(p) + θp = pt+1. Therefore, (27) is a sufficient condition for pt+1 = pt. We then bound the probability of pt+1 = pt by computing the probability of (27) happening. P (pt = pt+1) = X p P P (pt = p) P(pt+1 = p | pt = p) p P P (pt = p) P (pt+1 = p | θp ct 1,p) ( by (26)) p P P (pt = p) P (θp ct 1,p + m | θp ct 1,p) p P P (pt = p) e mθ Therefore, P (pt = pt+1) mθ. Hence, the third term can be bounded as E rt(pt+1) rt(pt) m2θ = t=1 E rt(pt+1) rt(pt) m2θT. (28) m2T . Combining the upper bounds for three terms (23), (24) and (28) together, we have E[RT ] 1 + log P θ + m2θT O m q Plugging in the size of the discretization set (Theorem 3.1), we have, E[RT ] e O m 3/2 Lemma B.1. For any p P, t=1 rt(pt+1) + θp1 t=1 rt(p) + θp. (29) Proof of Lemma B.1. We prove this by induction. For T = 0, the inequality θp1 θp holds by definition p1 = argmax p P θp. Assume that the inequality holds for some T. Then for any p P, t=1 rt(pt+1) + θp1 = t=1 rt(pt+1) + θp1 + r T +1(p T +2) t=1 rt(p T +2) + θp T +2 + r T +1(p T +2) t=1 rt(p T +2) + θp T +2 t=1 rt(p) + θp. Where the first inequality is by the induction hypothesis, and the second inequality is by p T +2 = arg max p P t=1 rt(p) + θp. By the induction, the inequality (29) holds for any T 0. C Proof of Theorem 4.1 In this section, we prove, Theorem 4.1, our regret upper bound of Algorithm 3. We prove the theorem by first decomposing the regret into two parts: Regret with respect to the best price in a discretized set (called discretization regret ) and the residual error due to discretization. The residual error is controlled by the approximation guarantees developed in Section 3. Then, the key lemma in this appendix is Lemma C.1 which controls the discretization. We prove Lemma C.1 using a technique adapted from Chen et al. [15]. Theorem 4.1. Suppose in Algorithm 3 we use a discretization P which is a O(1/ T) additive approximation to any price curve. Then, the regret of Algorithm 3 satisfies E[RT ] e O(m Proof of Theorem 4.1. For the sake of simplicity, we define r(i, p) as the revenue under type i and price p, i.e, r(i, p) = p(ni,p). Therefore, on every round, we have r(it, pt) = pt(nit,pt). Recall that the regret RT is t=1 pt(nit,pt) t=1 r(it, pt) = T OPT T max p P rev(p) | {z } Loss of revenue due to discretization + T max p P rev(p) t=1 r(it, pt). = RT (discretization regret) We decompose RT into two parts. The first term is the sacrifice of revenue on discretization. The second term is the algorithm regret when competing against the optimal price within the discretization set P. According to Theorem 3.1, our discretization scheme approaches OPT within a gap of 2ϵ 1+ϵ, OPT max p P rev(p) 2ϵ 1 + ϵ 2ϵ. Therefore, the first term can be bounded as, T OPT T max p P rev(p) 2ϵT. (31) By Lemma C.1, the second term, discretization regret, is upper bounded by E[RT ] 93m p T log T (32) Combining (31) and (32) together, we have, E[RT ] 2ϵT + 93m p T log T = e O(m T) ( as ϵ = 1 Lemma C.1. The discretization regret RT defined in (30) is at most e O(m Proof of Lemma C.1. The discretization regret RT T max p P rev(p) t=1 r(it, pt) t=1 (r(p , it) r(pt, it)) t=1 E [r(p , it) r(pt, it)] t=1 E [rev(p ) rev(pt)] t=1 E [(rev(p ) rev(pt)) I(At)] + t=1 E [(rev(p ) rev(pt)) I(Ac t)] t=1 E [δpt I(At)] + t=1 E [δpt I(Ac t)] . (33) We can further decompose E[RT ] into PT t=1 E [δpt I(At)] and PT t=1 E [δpt I(Ac t)]. Where for any round t, we define the good event At as follows, i [m] , qi bqi,t qi + 2 Define qi,t = Pt τ=1 I(i Sτ ,iτ =i) Pt s=1 I(i Sτ ) I(iτ =i) Pt τ=1 I(i Sτ ) . Note that I(iτ = i) is a random variable that follows Bernoulli distribution Ber(qi), and one can only observe I(iτ = i) when i Sτ, let xi,j denote the mean value of first j i.i.d. observations of I(is = i). Then, we have Ti,t , Ti,t = j |xi,j qi| > j=0 2 exp( 2 log T) Where in the first inequality, the event n qi,t qi > q Ti,t , Ti,t = j o indicates n |xi,j qi| > q j o , and the second inequality follows from Hoeffding s inequality. We then bound the second term in (33) t=1 E [δpt I(Ac t)] t=1 E [I(Ac t)] Define event Ht = n 0 < δpt < 2 P log T Ti,t 1 o . By Lemma C.3, we know that I(At 1, δpt > 0) = I 0 < δpt < X log T Ti,t 1 It remains to prove the upper bound for PT t=1 E [δpt I(AT )]. For t {1, . . . , T} and k Z+, let 2 log T, δpt > 0, + , δpt = 0, Ak,t = {i St : Ti,t 1 mk,t} . Then, we define an event Gk,t = {|Ak,t| βkm} , which means In the t-th round, at least βkm types in St has been observed at most mk,t times . Then, by Lemma C.5, we have t=1 I(Ht) δpt t=1 I (Gk,t, δpt > 0) δpt. For i [m], k Z+, t [T], define an event Gi,k,t = Gk,t {i St, Ti,t 1 mk,t} . Then by the definitions of Gk,t and Gi,k,t we have I (Gk,t, δpt > 0) 1 βkm i EB I (Gi,k,t, δpt > 0) . t=1 I(Ht) δpt X t=1 I (Gi,k,t, δpt > 0) δpt For any price function p, define δp = rev(p ) rev(p). If δp > 0, we call it a bad price. Let EB = {i [m] : type i would make a purchase at least one bad price}. For each type i EB, suppose i is contained in Ni bad prices p B i,1, p B i,2, . . . , p B i,Ni. Let δi,l = δp B i,l (l [Ni]). Without loss of generality, we assume δi,1 δi,2 δi,Ni. Let δi,min = δi,Ni. For convenience, we also define δi,0 = + , i.e., αk 2m δi,0 2 = 0. Then, we have t=1 I (Ht) δpt t=1 I (Gi,k,t, δpt > 0) δpt l=1 I Gi,k,t, pt = p B i,l δpt l=1 I Gi,k,t, pt = p B i,l δi,l l=1 I Ti,t 1 mk,t, pt = p B i,l δi,l 2 log T, pt = p B i,l 2 log T < Ti,t 1 αk 2 log T, pt = p B i,l 2 log T < Ti,t 1 αk 2 log T, pt = p B i,l 2 log T < Ti,t 1 αk 2 log T, pt = p B i,l 2 log T < Ti,t 1 αk 2 log T, i St 1 δ2 i,j 1 δ2 i,j 1 1068m log T X 1 δ2 i,j 1 δ2 i,j 1 where the last inequality is due to Lemma C.4. Finally, for each i EB we have 1 δ2 i,j 1 δ2 i,j 1 δi,j = 1 δi,Ni + 1 δ2 i,j (δi,j δi,j+1) 1 δi,Ni + Z δi,1 = 2 δi,Ni 1 It follows that t=1 I(Ht) δpt 1068m log T X 2 δi,min = m X 2136 δi,min log T (34) So far, the distribution-dependent regret bound is proven. To prove the distribution-independent bound, we decompose PT t=1 I(Ht) δpt into two parts: t=1 I(Ht) δpt = t=1 I (Ht, δpt ϵ) δpt + t=1 I (Ht, δpt > ϵ) δpt t=1 I (Ht, δpt > ϵ) δpt, where ϵ > 0 is a constant to be determined. The second term can be bounded in the same way as in the proof of the distribution-dependent regret bound, except that we only consider the case δpt > ϵ. (For each type i EB, suppose i is contained in Ni bad prices p B i,1, p B i,2, . . . , p B i,Ni. Let δi,l = δp B i,l (l [Ni]) satisfies δi,1 δi,2 . . . δi,Ni ϵ. Also let δi,min = δi,Ni.) Thus, we can replace (34) by t=1 I (Ht, δpt > ϵ) δpt m X i EB,δi,min>ϵ 2136 δi,min log T 2136m2 It follows that t=1 I(Ht) δSt ϵ T + 2136m2 Finally, letting ϵ = q 2136m2 log T t=1 I(Ht) δSt 2 p 2136m2T log T 93 p Lemma C.2. Under good event At, for any price function p, let Sp denote the set of types who would purchase at price p, then we have t [T], rev(p) c revt(p) rev(p) + X Proof of Lemma C.2. When At happens, qi bqi,t qi + 2 for all i [m]. Therefore, we have c revt(p) = i=1 bqi,t r(i, p) i=1 qi r(i, p) = rev(p) c revt(p) = i=1 bqi,t r(i, p) r(i, p) rev(p) + X The last inequality is by r(i, p) 1. Lemma C.3. For each t [T], under good event At 1, the following inequality holds, δpt = rev(p ) rev(pt) 2 X log T Ti,t 1 . Proof of Lemma C.3. When At 1 happens, by Lemma C.2, rev(p ) c revt 1(p ), rev(pt) c revt 1(pt) 2 X log T Ti,t 1 . It then follows that, δpt = rev(p ) rev(pt) c revt 1(p ) c revt 1(pt) 2 X log T Ti,t 1 Since pt = argmaxp P c revt 1(p), we have c revt 1(pt) c revt 1(p ). Lemma C.4 (Theorem 4 of Kveton et al. [37]). We can choose {αk}k 0 and {βk}k 0, which satisfy the following properties: {αk}k 0 and {βk}k 0 are positive and α1 > α2 > . . . and 1 = β0 > β1 > β2 > . . . , such that limk αk = limk βk = 0. Moreover, βk 1 βk αk 1, and αk βk < 267. Lemma C.5. On round t, if event Ht happens, then at least one event Gk,t, k Z+ happens, where Gk,t = {|Ak,t| βkm} , where Ak,t = {i St : Ti,t 1 mk,t} , and mk,t = αk m δpt 2 log T when δpt > 0 and + otherwise. Proof of Lemma C.5. Assume that Ht happens and that none of G1,t, G2,t, . . . happens. Then |Ak,t| < βkm for all k Z+. Let A0,t = St and Ak,t = St\Ak,t for k Z+ {0}. Thus Ak 1,t Ak,t for all k Z+. Note that limk mk,t = 0. Thus there exists N Z+such that Ak,t = St for all k N, and then we have St = S k=1 Ak,t\ Ak 1,t . Finally, note that for all i Ak,t, we have Ti,t 1 > mk,t. Therefore i Ak,t\ Ak 1,t i Ak,t\ Ak 1,t Ak,t\ Ak 1,t mk,t = |Ak 1,t\Ak,t| mk,t = |Ak 1,t| |Ak,t| mk,t = |St| m1,t + k=1 |Ak,t| 1 mk+1,t 1 mk,t k=1 βkm 1 mk+1,t 1 mk,t (βk 1 βk) m mk,t . Under event Ht, we have log T Ti,t 1 = 2 p (βk 1 βk) m mk,t = 2 βk 1 βk αk δpt δpt, where the last inequality is due to Lemma C.4. We reach a contradiction here, hence the lemma follows. D Miscellaneous D.1 Notations The following table contains the notations used in this paper. Notation Meaning N The total amount of data. n [N] The number of data. m The number of types. p : [N] [0, 1] A price curve. P A set of discretized price curves. vi : [N] [0, 1] The valuation curve for type i [m]. V = {vi : i [m]} The set of all valuation curves. ni,p The amount of data type i [m] purchases at price curve p. r(i, p) = p(ni,p) The revenue from type i [m] under price curve p. q = (q1, q2, . . . , qm) The type distribution. rev(p) The expected revenue under price p. it [m] The type of buyer on round t [T]. pt : [N] [0, 1] The price curve on round t [T]. St The set of types that would make a purchase at price pt. Sp The set of types that would make a purchase at price p. Ti,t = Pt τ=1 I(i Sτ) The number of times that type i appears in set Sτ for τ {1, . . . , t}. P = {p [N] [0, 1] : p(0) = 0} The set of all pricing curves. L Smoothness constant of valuation curves. J Diminishing return constant of valuation curves. Table 3: Table of notations. Neur IPS Paper Checklist Question: Do the main claims made in the abstract and introduction accurately reflect the paper s contributions and scope? Answer: [Yes] Justification: We fully included paper s contributions and scope in the appendix. See 1.1 for the summary of our contributions. 2. Limitations Question: Does the paper discuss the limitations of the work performed by the authors? Answer: [Yes] Justification: See 6 for future works on relaxing one of key assumptions of the paper. 3. Theory Assumptions and Proofs Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof? Answer: [Yes] Justification: We provided the full set of assumptions. Moreover, we provided the full proofs of each Lemma and Theorem in this paper both in main text and Appendices. Guidelines: The answer NA means that the paper does not include theoretical results. All the theorems, formulas, and proofs in the paper should be numbered and crossreferenced. All assumptions should be clearly stated or referenced in the statement of any theorems. The proofs can either appear in the main paper or the supplemental material, but if they appear in the supplemental material, the authors are encouraged to provide a short proof sketch to provide intuition. Inversely, any informal proof provided in the core of the paper should be complemented by formal proofs provided in appendix or supplemental material. Theorems and Lemmas that the proof relies upon should be properly referenced. 4. Experimental Result Reproducibility Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)? Answer: [NA] Justification: This paper does not include experiments. 5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [NA] Justification: This paper does not include experiments requiring code. 6. Experimental Setting/Details Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results? Answer: [NA] Justification: This paper does not include experiments. 7. Experiment Statistical Significance Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments? Answer: [NA] Justification: This paper does not include experiments. 8. Experiments Compute Resources Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [NA] Justification: This paper does not include experiments. 9. Code Of Ethics Question: Does the research conducted in the paper conform, in every respect, with the Neur IPS Code of Ethics https://neurips.cc/public/Ethics Guidelines? Answer: [Yes] Justification: Our research conforms, in every respect, with the Neur IPS Code of Ethics. 10. Broader Impacts Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed? Answer: [NA] Justification: Our research is theoretical and have no societal impact in a foreseeable future. 11. Safeguards Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)? Answer: [NA] Justification: Our paper poses no such risks. 12. Licenses for existing assets Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected? Answer: [NA] Justification: This paper does not use existing assets. 13. New Assets Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets? Answer: [NA] Justification: This paper does not release new assets. 14. Crowdsourcing and Research with Human Subjects Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)? Answer: [NA] Justification: This paper does not involve crowdsourcing nor research with human subjects. 15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained? Answer: [NA] Justification: This paper does not involve crowdsourcing nor research with human subjects.