# No-Regret Learning in Partially-Informed Auctions

Wenshuo Guo¹, Michael I. Jordan¹˒², Ellen Vitercik³

¹Department of Electrical Engineering & Computer Sciences, University of California, Berkeley, USA. ²Department of Statistics, University of California, Berkeley, USA. ³Department of Management Science & Engineering and Department of Computer Science, Stanford University, USA. Correspondence to: Wenshuo Guo.

*Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022. Copyright 2022 by the author(s).*

**Abstract.** Auctions with partially-revealed information about items are broadly employed in real-world applications, but the underlying mechanisms have limited theoretical support. In this work, we study a machine learning formulation of these types of mechanisms, presenting algorithms that are no-regret from the buyer's perspective. Specifically, a buyer who wishes to maximize his utility interacts repeatedly with a platform over a series of $T$ rounds. In each round, a new item is drawn from an unknown distribution and the platform publishes a price together with incomplete, "masked" information about the item. The buyer then decides whether to purchase the item. We formalize this problem as an online learning task where the goal is to have low regret with respect to a myopic oracle that has perfect knowledge of the distribution over items and the seller's masking function. When the distribution over items is known to the buyer and the mask is a SimHash function mapping $\mathbb{R}^d$ to $\{0,1\}^\ell$, our algorithm has regret $O((Td\ell)^{1/2})$. In a fully agnostic setting where the mask is an arbitrary function mapping to a set of size $n$ and the prices are stochastic, our algorithm has regret $O((Tn)^{1/2})$.

## 1. Introduction

Selling mechanisms play a crucial role in economic theory and have a wide range of applications across many industries (Post et al., 1995; Milgrom, 2004; Edelman et al., 2007; Milgrom, 2010; Arnosti et al., 2016). Under the canonical mechanism design model, buyers choose whether or not to buy items for sale based on their true values for those items. This fundamental model, however, assumes that the buyers know exactly how much they value the items for sale, which is often not the case. One of the overriding reasons that a buyer may not know their true values is information asymmetry: the seller may purposefully obfuscate information about an item for sale. For example, the seller may hide information about the item in the hopes of higher revenue (Gershkov, 2009). Alternatively, information about the item may be private, and the seller may wish to protect this sensitive information by revealing only partial information about the item. For instance, in online advertising auctions, bids represent how much advertisers are willing to pay to display their ad to a particular user. Historically, advertisers have bid based on uniquely identifying information about users, but there has been a growing effort to protect users' privacy by obfuscating this sensitive information (Juels, 2001; Guha et al., 2011; Epasto et al., 2021). In these scenarios, the buyer only has partial information about the item for sale but still must decide whether to make a purchase. This raises the question: how should a buyer determine their purchase strategy with only incomplete item information?

We study posted-price auctions, a fundamental and appealingly interpretable mechanism family, with incomplete item information.
In particular, the seller reveals obfuscated ("masked") information about the item using a fixed, unknown masking function. We study an online setting where, at each round, a fresh item is drawn from an unknown distribution (for example, a distribution over users visiting a webpage). The seller sets a price and the buyer chooses whether to buy the item based on the incomplete information that the seller provides. We propose no-regret learning algorithms for the buyer that achieve sub-linear regret compared to an oracle buyer who has perfect knowledge of the item distribution as well as the seller's masking function.

### 1.1. Our results

We study no-regret learning with incomplete item information in two settings:

1. First, we propose an algorithm for a setting where the item distribution is known to the buyer and the mask is a SimHash function mapping $[0,1]^d$ to $\{0,1\}^\ell$. In other words, each item is defined by $d$ real-valued features and the seller reveals $\ell$ bits about the item to the buyer, as defined by a function that is unknown to the buyer. This model has been studied from an applied perspective in the context of ad auctions (Epasto et al., 2021). We provide an algorithm with regret $O(\sqrt{Td\ell\log(T\ell/\delta)})$.

2. Next, we study a setting where the masking function is an arbitrary mapping from the set of all items, denoted $X$, to a finite set of size $n$. We propose an online learning algorithm with regret $O(\sqrt{T(n\log(T/n) + \log(1/\delta))})$ when the prices are stochastic, where $T$ is the length of the horizon.

In the first setting, where the masking function is a SimHash function mapping $[0,1]^d$ to $\{0,1\}^\ell$, the range of the masking function is of size $n = 2^\ell$, so our regret bound of $O((Td\ell)^{1/2})$ is exponentially better than the latter regret bound. We summarize these results in Table 1.

| Item distribution | Prices | Masking function $h$ | Regret |
|---|---|---|---|
| Known | Adversarial | SimHash $h : [0,1]^d \to \{0,1\}^\ell$ | $O(\sqrt{Td\ell\log(T\ell/\delta)})$ (Theorem 3.5) |
| Unknown | Stochastic | Arbitrary $h : X \to [n]$ | $O(\sqrt{T(n\log(T/n) + \log(1/\delta))})$ (Theorem 4.3) |
| Unknown | Adversarial | Arbitrary $h : X \to [n]$ | $O(T^{2/3}n^{1/3})$ (Remark 4.5) |

Table 1. Summary of regret bounds, which hold with probability at least $1 - \delta$.

### 1.2. Related work

This work draws on several threads of research on designing auctions with incomplete information, learning to bid, and privacy-preserving simple auctions.

**Auction design with incomplete information.** Auctions with incomplete value information have attracted much research attention. Several prior works have explored auction design where the information about the item may be incomplete to the buyer or the seller (Ganuza, 2004; Eső & Szentes, 2007; Bergemann & Pesendorfer, 2007; Arefeva & Meng, 2021; Roesler & Szentes, 2017; Li & Shi, 2017; Bergemann et al., 2021; Li & Jewitt, 2017). In particular, Ganuza (2004) studied the incentives of the auctioneer to release signals about the item that refine the buyers' private valuations before a second-price auction. In their model, the seller reveals a noisy item feature vector which is an unbiased estimator of the true one. Bergemann & Pesendorfer (2007) considered single-item multi-bidder auctions where the seller decides how accurately the bidders can learn their valuations. However, all of this prior work is limited to offline settings; it did not explore online purchase strategies that the buyer can adopt. To the best of our knowledge, this work constitutes the first analysis of no-regret learning algorithms in partially-informed posted-price auctions.
**Learning to bid with an unknown value.** Instead of approaching the problem from the side of the auctioneer who aims to maximize revenue over repeated rounds, another active line of research studies bidding strategies for bidders who do not know their values (Dikkala & Tardos, 2013; Weed et al., 2016; Feng et al., 2018; Balseiro & Gur, 2019). Feng et al. (2018) considered a single-item multi-bidder setting where the bidder learns to bid via partial feedback, and provided algorithms with regret bounds against the best fixed bid in hindsight. Dikkala & Tardos (2013) explored a setting where bidders must experiment in order to learn their valuations. The key difference between this work and ours is that our algorithms exploit the available partial information instead of having zero knowledge about the item being sold. This partial-information model allows us to trade off between the amount of information revealed about the item and the regret. Moreover, we compete with the best purchase policy, rather than the best fixed bid in hindsight.

**Private auctions.** Our theoretical model is motivated by recent work on designing privacy-enhanced auctions for practical usage. An important application of these auctions is online advertising, where the items being auctioned are user queries and the auctioneer must trade off between user privacy and revenue maximization (De Corniere & De Nijs, 2016; Rafieian & Yoganarasimhan, 2021; Epasto et al., 2021; Guha et al., 2011; Juels, 2001). Epasto et al. (2021) present a detailed exploration of Chrome's Federated Learning of Cohorts (FLoC) API, where user information is masked. Our contribution is to provide a formal treatment of such auctions, including algorithms with theoretical guarantees in the setting of arbitrary masking functions that are unknown to the buyer.

## 2. Preliminaries

We begin by formally defining the setting we study for auctions with partial information. We consider a setting with a single seller and a single buyer. There is a distribution $P$ over items, which are elements of an abstract set $X$. The buyer has a bounded valuation $v^*(x) \in [0, H]$ for every item $x \in X$ and some $H \in \mathbb{R}_+$.¹ The seller and buyer interact over a series of $T$ rounds. At each round $t \in [T]$, the seller draws an item $x_t \sim P$ i.i.d. and sets a price $p_t(x_t) \in [0, H]$ for the item. The seller does not reveal $x_t$ to the buyer, but rather reveals some partial information $h(x_t) \in Y$, where $h : X \to Y$ is a fixed masking function that maps to an abstract set $Y$. We assume that the price function $p_t$ does not give any more information about $x_t$ than is provided by the masking function $h$. In other words, if $h(x) = h(x')$, then $p_t(x) = p_t(x')$, which implies

$$\mathbb{E}[v^*(x) \mid h(x), p(x)] = \mathbb{E}[v^*(x) \mid h(x)]. \tag{1}$$

¹The assumption that there is a bound on the maximum amount that the agents value the item is widely made in prior research. In our main application, ad auctions, the value of an impression is typically very small, so this assumption is mild.

The buyer uses a strategy $s_t : Y \times \mathbb{R} \to \{0,1\}$ to decide whether to buy the item given the partially revealed item information and the price. Here $s_t(h(x_t), p_t(x_t)) = 1$ if and only if the buyer buys the item. Letting $b_t = s_t(h(x_t), p_t(x_t))$, the buyer's utility is $u_t = b_t(v^*(x_t) - p_t(x_t)) \in [-H, H]$. We summarize this process below.

**Model: Online model**
1: for $t = 1, 2, \ldots, T$ do
2: Seller selects a price function $p_t : X \to [0, H]$.
3: Item $x_t$ is sampled from $P$.
4: Seller publishes item information $h(x_t)$ and a price $p_t(x_t)$.
5: Buyer decides whether or not to buy: $b_t = s_t(h(x_t), p_t(x_t)) \in \{0,1\}$.
6: Buyer obtains reward $u_t = (v^*(x_t) - p_t(x_t)) \cdot b_t$.
7: if $b_t$ then {item is purchased}
8: Buyer observes $x_t$.
9: end if
10: end for
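To make the protocol concrete, here is a minimal Python simulation of one run of the online model. The specific item distribution, mask, valuation, and pricing rule below are stand-ins chosen purely for illustration; nothing in the paper prescribes them.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H, d = 1000, 1.0, 5

# Stand-in environment (illustrative assumptions, not from the paper):
# items are uniform on [0,1]^d, the mask keeps one sign bit, the value
# is the mean feature, and the price depends on x only through h(x),
# as required by Equation (1).
def sample_item():  return rng.uniform(0.0, 1.0, size=d)
def mask(x):        return int(x[0] > 0.5)           # h : X -> Y = {0, 1}
def value(x):       return float(np.mean(x))         # v*(x) in [0, H]
def price(masked):  return 0.4 + 0.2 * masked        # p_t(x) = p_t(h(x))

def buyer_strategy(masked, p):
    # Placeholder strategy s_t: buy whenever the price is below H/2.
    return int(p < H / 2)

total_utility = 0.0
for t in range(T):
    x_t = sample_item()
    y_t = mask(x_t)                  # seller publishes h(x_t) ...
    p_t = price(y_t)                 # ... and the price p_t(x_t)
    b_t = buyer_strategy(y_t, p_t)   # buyer sees only (h(x_t), p_t)
    total_utility += b_t * (value(x_t) - p_t)
    # If b_t == 1, the buyer additionally observes x_t (step 8), which
    # the learning algorithms of Sections 3 and 4 rely on.
print(f"average per-round utility: {total_utility / T:.4f}")
```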
**Example 2.1** (Advertising auctions). We instantiate our model in the context of advertising auctions, taking inspiration from Epasto et al. (2021). The seller is a platform and the buyer is an advertiser. Each item $x \in X$ describes a user who visits the platform. For example, $X = \mathbb{R}^d$ might denote features that uniquely identify each user. On round $t$, the advertiser has a value $v^*(x_t)$ for the opportunity to show the user $x_t$ an ad. In order to preserve user privacy, the platform does not reveal $x_t$ to the advertiser, but rather some summary $h(x_t)$. For example, Epasto et al. (2021) study a setting where $h$ is a SimHash function, so $x_t \in \mathbb{R}^d$ and $h(x_t) \in \{0,1\}^\ell$ for some $\ell < d$. The platform sets a price $p_t(x_t)$ which the advertiser pays to show the user an ad.

We study this problem from the perspective of the buyer: how should they select the strategy $s_t$ at each round to maximize their utility? We study two settings: one where the distribution $P$ is unknown to the buyer and the prices are stochastic (Section 4), and one where $P$ is known to the buyer and the prices are adversarial (Section 3).

### 2.2. Regret and the optimal strategy

We measure the regret of the buyer in our online model with respect to the optimal strategy $s^*$ of a myopic buyer² who has perfect knowledge of the distribution $P$ and the masking function $h$, but not the realized item $x_t$. To make this dependence on the environment clear, in any single round we use the notation $s^*(h(x), p(x), h, P)$ to denote the optimal strategy (we drop the subscript $t$ for simplicity). More formally, $s^*$ maximizes the expected utility:

$$s^* \in \operatorname*{argmax}_{s \in S} \; \mathbb{E}_{x \sim P}\big[(v^*(x) - p(x))\, s(h(x), p(x), h, P) \mid h(x)\big],$$

where $S$ represents the set of all decision functions $s(\cdot) : Y \times \mathbb{R} \times \{h(\cdot), P\} \to \{0,1\}$.

²A myopic buyer optimizes his utility separately in each round.

**Definition 2.2.** The buyer's (expected) regret with respect to the optimal strategy $s^*$, denoted $R_T$, is defined as

$$R_T = \sum_{t=1}^{T} \mathbb{E}\Big[\big(v^*(x_t) - p_t(x_t)\big)\, s^*(h(x_t), p_t(x_t), h(\cdot), P) - \big(v^*(x_t) - p_t(x_t)\big)\, s_t(h(x_t), p_t(x_t))\Big]. \tag{2}$$

In the following proposition, we identify the form of the optimal strategy $s^*$. The proof is in Appendix A.

**Proposition 2.3.** The strategy $s^*$ that maximizes $\mathbb{E}_{x\sim P}[(v^*(x) - p(x))\,s^*(h(x), p(x), h, P) \mid h(x)]$ is

$$s^*(h(x), p(x), h, P) = \mathbb{I}\Big(\mathbb{E}_{x \sim P}[v^*(x) \mid h(x)] > p(x)\Big).$$

Even when the buyer has no information about the distribution $P$ (Section 4), we show that he can guarantee low regret with respect to $s^*$ with either stochastic or adversarial prices, in polynomial per-round runtime. When the distribution $P$ is known (Section 3), we provide an algorithm with exponentially better regret.

## 3. Known Item Distribution

First, we focus on a specific class of masking functions, SimHash, motivated by recent practical applications in ad auctions (Epasto et al., 2021). Here, $X$ is a feature space $[0,1]^d$ and the masking function $h$ is a SimHash function that is unknown to the buyer. In other words, there are $\ell$ unknown vectors $w_1, \ldots, w_\ell \in \mathbb{R}^d$ such that the masking function, denoted $h_w$ with $w = (w_1, \ldots, w_\ell)$, is

$$h_w(x) = \big(\mathrm{sgn}(w_1 \cdot x), \ldots, \mathrm{sgn}(w_\ell \cdot x)\big).$$

We consider the setting where the distribution of the items is known to the buyer (for example, via historical data). We provide an algorithm that achieves regret $O(\sqrt{Td\ell\log(T\ell/\delta)})$ even under adversarial prices.
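For intuition, a SimHash mask can be sketched in a few lines. The Gaussian choice of the hyperplanes below is only for illustration; in the model, $w$ is fixed and unknown to the buyer.

```python
import numpy as np

def make_simhash(d, ell, seed=0):
    """Build h_w(x) = (sgn(w_1 . x), ..., sgn(w_ell . x)) with random
    Gaussian hyperplanes (an illustrative choice of w)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(ell, d))
    def h(x):
        return tuple((W @ x > 0).astype(int))  # the ell revealed bits
    return h, W

h, W = make_simhash(d=10, ell=4)
x = np.random.default_rng(1).uniform(0, 1, size=10)
print(h(x))  # e.g. (1, 0, 1, 1): all the seller reveals about x
```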
Since the masking function maps to a set of size $n = 2^\ell$, the regret depends only logarithmically on $n$. As we detail in Section 4, this algorithm achieves exponentially better regret than the algorithm we present for the setting where the distribution $P$ over items is unknown and the masking function is arbitrary.

**Algorithm 1: Explore-then-Commit (Known Distribution)**
1: Input: horizon $T$, distribution $P$, $d, \ell \in \mathbb{N}_+$, and $\delta \in (0,1)$.
2: Compute $t^* = \sqrt{4Td\ell\log(\ell/\delta)}$.
3: for $t = 1, 2, \ldots, t^*$ do {Exploration phase}
4: Receive $h(x_t)$ and price $p_t(x_t)$, where $x_t \sim P$ i.i.d.
5: Make decision $b_t = 1$ and observe $x_t$.
6: end for
7: Use linear programming to compute $\hat w = (\hat w_1, \ldots, \hat w_\ell)$ such that $h_{\hat w}(x_i) = h_w(x_i)$ for all $i \in [t^*]$.
8: for $t = t^*+1, t^*+2, \ldots, T$ do {Exploitation phase}
9: Receive $h_w(x_t)$ and price $p_t(x_t)$, where $x_t \sim P$ i.i.d.
10: Obtain an estimate $\hat Z_t$ of $\mathbb{E}_{x\sim P}[v^*(x) \mid x \in h_{\hat w}^{-1}(h_w(x_t))]$ using the integration algorithm of Lovász & Vempala (2006).
11: Make decision $b_t = \mathbb{I}(\hat Z_t \geq p_t(x_t))$.
12: end for

Algorithm 1 begins with an exploration phase of length $t^* = O(\sqrt{Td\ell})$, during which the buyer buys the item in each round. The algorithm then uses linear programming to solve for separators $\hat w = (\hat w_1, \ldots, \hat w_\ell)$, $\hat w_j \in \mathbb{R}^d$ for all $j \in [\ell]$, such that $\mathrm{sgn}(w_j \cdot x_i) = \mathrm{sgn}(\hat w_j \cdot x_i)$ for all $j \in [\ell]$ and $i \in [t^*]$. During the remaining rounds $t \in \{t^*+1, \ldots, T\}$, the algorithm exploits. Since the optimal strategy is to buy if $\mathbb{E}_{x\sim P}[v^*(x) \mid h_w(x) = h_w(x_t)] \geq p(x_t)$ (Prop. 2.3), the algorithm uses $h_w(x_t)$ and $\hat w = (\hat w_1, \ldots, \hat w_\ell)$ to compute an estimate of $\mathbb{E}_{x\sim P}[v^*(x) \mid h_w(x) = h_w(x_t)]$.

The intuition behind the estimate is the following. Letting $h_w^{-1}(h_w(x_t)) = \{x : h_w(x) = h_w(x_t)\}$ (a convex polytope), we have that $\mathbb{E}_{x\sim P}[v^*(x) \mid h_w(x) = h_w(x_t)] = \mathbb{E}_{x\sim P}[v^*(x) \mid x \in h_w^{-1}(h_w(x_t))]$. Since the buyer does not know $w$, we cannot compute the set $h_w^{-1}(h_w(x_t))$, but we can compute the set $h_{\hat w}^{-1}(h_w(x_t))$ using the estimated separators obtained from the exploration phase. Even still, the conditional expectation $\mathbb{E}_{x\sim P}[v^*(x) \mid x \in h_{\hat w}^{-1}(h_w(x_t))]$ may be challenging to compute in high dimensions. Therefore, we use a sampling algorithm by Lovász & Vempala (2006) to compute an estimate $\hat Z_t$ of $\mathbb{E}_{x\sim P}[v^*(x) \mid x \in h_{\hat w}^{-1}(h_w(x_t))]$. The buyer buys the item if $\hat Z_t \geq p_t(x_t)$.

To compute the estimate $\hat Z_t$, Lovász & Vempala (2006) require that if $\pi$ is the density function of $P$, then $v^*(x)\pi(x)$ is log-concave and well-rounded. Many well-studied distributions are log-concave, including the normal, exponential, uniform, and beta distributions, among many others. Moreover, every concave function that is nonnegative on its domain is log-concave. If $v^*$ and $\pi$ are log-concave, then $v^*(x)\pi(x)$ is also log-concave. For example, the Cartesian product of single-dimensional log-concave distributions (exponential, logistic, extreme value, Laplace, and beta distributions, among many others) is log-concave. Log-concavity has also been widely assumed in prior work in machine learning and high-dimensional statistics (e.g., Bagnoli & Bergstrom, 2006; Saumard & Wellner, 2014). The function $v^*(x)\pi(x)$ is well-rounded if, for any $A \subseteq X$, the distribution defined by

$$f_\pi(A) = \frac{\int_A v^*(x)\pi(x)\,dx}{\int_X v^*(x)\pi(x)\,dx}$$

is neither too spread out nor too concentrated. We include the formal definition in Appendix B (Def. B.1). Every log-concave function can be brought to a well-rounded position by an affine transformation of the space in polynomial time (Lovász & Vempala, 2006).
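Step 7 of Algorithm 1 reduces to one linear feasibility problem per bit. A minimal sketch using scipy is below; the unit margin is without loss of generality because sgn is scale-invariant, and we assume the exploration data is strictly separable, which holds in the realizable SimHash model whenever no sample lies exactly on a hyperplane.

```python
import numpy as np
from scipy.optimize import linprog

def recover_separators(X, bits):
    """Algorithm 1, step 7: find w_hat_j whose signs match the observed
    j-th mask bit on every explored item.

    X    : (t_star, d) array of items observed during exploration.
    bits : (t_star, ell) array of revealed mask bits in {0, 1}.
    """
    t_star, d = X.shape
    ell = bits.shape[1]
    W_hat = np.zeros((ell, d))
    for j in range(ell):
        y = 2.0 * bits[:, j] - 1.0   # bits {0,1} -> signs {-1,+1}
        # Feasibility LP: find w with y_i * (w . x_i) >= 1 for all i,
        # written as -y_i * (x_i . w) <= -1 with a zero objective.
        res = linprog(c=np.zeros(d),
                      A_ub=-y[:, None] * X,
                      b_ub=-np.ones(t_star),
                      bounds=[(None, None)] * d)
        W_hat[j] = res.x
    return W_hat
```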
**Regret analysis.** We now prove that the regret of Algorithm 1 is $O(\sqrt{Td\ell\log(T\ell/\delta)})$. To do so, we must contend with two sources of error: the fact that we use the learned linear separators $\hat w$ instead of $w$, and the estimation error introduced by the sampling algorithm. We begin our analysis by proving that for any $y \in \{0,1\}^\ell$ in the image of $h_w : X \to \{0,1\}^\ell$, the agent's expected value conditioned on $x \in h_{\hat w}^{-1}(y)$ is close to its true expected value conditioned on $x \in h_w^{-1}(y)$. In this section, we use the notation

$$\epsilon = \frac{\ell}{t^*}\left(d\ln\frac{2et^*}{d} + \ln\frac{2\ell}{\delta}\right).$$

**Lemma 3.1.** For any $y \in \{0,1\}^\ell$, with probability at least $1-\delta$ over $x_1, \ldots, x_{t^*} \sim P$,

$$\Big|\mathbb{E}\big[v^*(x) \mid x \in h_{\hat w}^{-1}(y)\big] - \mathbb{E}\big[v^*(x) \mid x \in h_w^{-1}(y)\big]\Big| \leq H\epsilon.$$

*Proof.* We can decompose the set $h_{\hat w}^{-1}(y)$ as

$$h_{\hat w}^{-1}(y) = \big(h_{\hat w}^{-1}(y) \cap h_w^{-1}(y)\big) \cup \big(h_{\hat w}^{-1}(y) \setminus h_w^{-1}(y)\big).$$

We can therefore write

$$\mathbb{E}_{x\sim P}\big[v^*(x) \mid x \in h_{\hat w}^{-1}(y)\big] = \mathbb{E}\big[v^*(x) \mid x \in h_{\hat w}^{-1}(y) \cap h_w^{-1}(y)\big]\Pr\big[x \in h_{\hat w}^{-1}(y) \cap h_w^{-1}(y)\big] + \mathbb{E}\big[v^*(x) \mid x \in h_{\hat w}^{-1}(y) \setminus h_w^{-1}(y)\big]\Pr\big[x \in h_{\hat w}^{-1}(y) \setminus h_w^{-1}(y)\big].$$

Similarly, we can write

$$\mathbb{E}_{x\sim P}\big[v^*(x) \mid x \in h_w^{-1}(y)\big] = \mathbb{E}\big[v^*(x) \mid x \in h_{\hat w}^{-1}(y) \cap h_w^{-1}(y)\big]\Pr\big[x \in h_{\hat w}^{-1}(y) \cap h_w^{-1}(y)\big] + \mathbb{E}\big[v^*(x) \mid x \in h_w^{-1}(y) \setminus h_{\hat w}^{-1}(y)\big]\Pr\big[x \in h_w^{-1}(y) \setminus h_{\hat w}^{-1}(y)\big].$$

Matching terms, we have that

$$\mathbb{E}_{x\sim P}\big[v^*(x) \mid x \in h_{\hat w}^{-1}(y)\big] - \mathbb{E}\big[v^*(x) \mid x \in h_w^{-1}(y)\big] = \mathbb{E}\big[v^*(x) \mid x \in h_{\hat w}^{-1}(y) \setminus h_w^{-1}(y)\big]\Pr\big[x \in h_{\hat w}^{-1}(y) \setminus h_w^{-1}(y)\big] - \mathbb{E}\big[v^*(x) \mid x \in h_w^{-1}(y) \setminus h_{\hat w}^{-1}(y)\big]\Pr\big[x \in h_w^{-1}(y) \setminus h_{\hat w}^{-1}(y)\big].$$

We know that $x \in h_{\hat w}^{-1}(y) \setminus h_w^{-1}(y)$ if and only if $h_{\hat w}(x) = y$ and $h_w(x) \neq y$, which means that $h_{\hat w}(x) \neq h_w(x)$. The following claim bounds $\Pr[h_{\hat w}(x) = y \text{ and } h_w(x) \neq y] \leq \Pr[h_{\hat w}(x) \neq h_w(x)]$.

**Claim 3.2.** With probability $1-\delta$, $\Pr_{x\sim P}[h_{\hat w}(x) \neq h_w(x)] \leq \epsilon$.

*Proof of Claim 3.2.* For a fixed $i \in [\ell]$, by the standard PAC learning generalization bound in the realizable setting (e.g., Theorem 4.8 of Anthony & Bartlett (2009)), we have that with probability $1 - \delta/\ell$,

$$\Pr\big[\mathrm{sgn}(w_i \cdot x) \neq \mathrm{sgn}(\hat w_i \cdot x)\big] \leq \frac{1}{t^*}\left(d\ln\frac{2et^*}{d} + \ln\frac{2\ell}{\delta}\right) = \frac{\epsilon}{\ell}.$$

Therefore, by a union bound, with probability $1-\delta$,

$$\Pr_{x\sim P}\big[h_{\hat w}(x) \neq h_w(x)\big] = \Pr\big[\exists\, i \in [\ell] \text{ such that } \mathrm{sgn}(w_i \cdot x) \neq \mathrm{sgn}(\hat w_i \cdot x)\big] \leq \epsilon,$$

as claimed.

By Claim 3.2 and the fact that $v^*(x) \in [0, H]$, we therefore know that $\mathbb{E}[v^*(x) \mid x \in h_{\hat w}^{-1}(y) \setminus h_w^{-1}(y)]\Pr_{x\sim P}[x \in h_{\hat w}^{-1}(y) \setminus h_w^{-1}(y)] \in [0, H\epsilon]$. By a symmetric argument, $\mathbb{E}[v^*(x) \mid x \in h_w^{-1}(y) \setminus h_{\hat w}^{-1}(y)]\Pr_{x\sim P}[x \in h_w^{-1}(y) \setminus h_{\hat w}^{-1}(y)] \in [0, H\epsilon]$. Therefore, the lemma statement holds.

Lemma 3.1 guarantees that $\mathbb{E}_{x\sim P}[v^*(x) \mid x \in h_{\hat w}^{-1}(y)]$ is a good approximation of $\mathbb{E}[v^*(x) \mid x \in h_w^{-1}(y)]$, which is the key quantity needed to compute the optimal policy (see Prop. 2.3). However, this estimate may be difficult to compute when $x$ is high-dimensional, despite the fact that $P$ is known. The integration algorithm of Lovász & Vempala (2006) allows us to estimate it in polynomial time, as we summarize in the following lemma.

**Lemma 3.3.** Suppose that $v^*(x)\pi(x)$ is log-concave and well-rounded. Then for any $y \in \{0,1\}^\ell$, with probability at least $1-\delta$, we can compute a constant $A$ in polynomial time such that $\big|A - \mathbb{E}_{x\sim P}[v^*(x) \mid x \in h_w^{-1}(y)]\big| \leq 2H\epsilon$.

*Proof.* By Lemma 3.1, with probability at least $1 - \delta/2$,

$$\Big|\mathbb{E}_{\tilde x\sim P}\big[v^*(\tilde x) \mid \tilde x \in h_{\hat w}^{-1}(h_w(x))\big] - \mathbb{E}_{\tilde x\sim P}\big[v^*(\tilde x) \mid \tilde x \in h_w^{-1}(h_w(x))\big]\Big| \leq \epsilon H.$$

By definition,

$$\mathbb{E}_{\tilde x\sim P}\big[v^*(\tilde x) \mid \tilde x \in h_{\hat w}^{-1}(h_w(x))\big] \propto \int_{h_{\hat w}^{-1}(h_w(x))} v^*(\tilde x)\pi(\tilde x)\,d\tilde x.$$

Then, by Lovász & Vempala (2006, Theorem 1.3), in polynomial runtime, with probability at least $1 - \delta/2$ we can compute a constant $A$ such that

$$\Big|A - \mathbb{E}_{\tilde x\sim P}\big[v^*(\tilde x) \mid \tilde x \in h_{\hat w}^{-1}(h_w(x))\big]\Big| \leq \epsilon\,\mathbb{E}_{\tilde x\sim P}\big[v^*(\tilde x) \mid \tilde x \in h_{\hat w}^{-1}(h_w(x))\big] \leq \epsilon H.$$

By the triangle inequality and a union bound, we have that with probability at least $1-\delta$, we can compute a value $A$ such that

$$\Big|A - \mathbb{E}_{\tilde x\sim P}\big[v^*(\tilde x) \mid \tilde x \in h_w^{-1}(h_w(x))\big]\Big| \leq 2\epsilon H,$$

which completes the proof.
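For intuition only, the conditional expectation in Lemma 3.3 can also be approximated by naive rejection sampling, as sketched below. This is not the estimator the analysis uses: unlike the Lovász-Vempala integration algorithm, rejection sampling has no polynomial-runtime guarantee and wastes almost all samples when the cell $h_{\hat w}^{-1}(y)$ has small probability mass.

```python
import numpy as np

def naive_conditional_value(sample_item, value, W_hat, y,
                            n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[v*(x) | x in h_{w_hat}^{-1}(y)].
    `sample_item(rng)` draws x ~ P and `value(x)` evaluates v*(x);
    both are assumed to be supplied by the caller."""
    rng = np.random.default_rng(seed)
    total, hits = 0.0, 0
    for _ in range(n_samples):
        x = sample_item(rng)
        if tuple((W_hat @ x > 0).astype(int)) == tuple(y):  # x lands in the cell
            total += value(x)
            hits += 1
    return total / hits if hits > 0 else None  # None: the cell was never hit
```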
**Lemma 3.4.** In each round $t \in \{t^*+1, \ldots, T\}$ of the exploitation phase in Algorithm 1, with probability at least $1-\delta$, the expected instantaneous regret incurred in round $t$ is at most

$$\frac{2H\ell}{t^*}\left(d\ln\frac{2et^*}{d} + \ln\frac{\ell\, 2^{\ell+2}}{\delta}\right).$$

*Proof.* Given $h_w(x_t)$ and price $p(x_t)$, denote by $A(h_w(x_t))$ the estimate of $\mathbb{E}_{x\sim P}[v^*(x) \mid x \in h_{\hat w}^{-1}(h_w(x_t))]$ obtained using the sampling algorithm (Algorithm 1, step 10). For simplicity of notation, we denote the decision of the oracle policy as $s^*$ and the decision of the learned policy as $s_t$:

$$s^* = \mathbb{I}\Big(\mathbb{E}_{x\sim P}\big[v^*(x) \mid x \in h_w^{-1}(h_w(x_t))\big] > p(x_t)\Big), \qquad s_t = \mathbb{I}\big(A(h_w(x_t)) > p(x_t)\big).$$

Now we bound the expected instantaneous regret in round $t$:

$$\mathbb{E}_{x_t\sim P}\big[(v^*(x_t) - p(x_t))\,s^* - (v^*(x_t) - p(x_t))\,s_t\big] = \mathbb{E}_{x_t\sim P}\big[(v^*(x_t) - p(x_t))(s^* - s_t)\big].$$

Let $\Delta$ denote the difference $\Delta = s^* - s_t$, so $\Delta \in \{-1, 0, 1\}$ is a random variable that depends on $x_t$. By the law of total expectation,

$$\mathbb{E}_{x_t\sim P}\big[(v^*(x_t) - p(x_t))\,\Delta\big] = \mathbb{E}_{x_t\sim P}\Big[\Delta\,\Big(\mathbb{E}_{x\sim P}\big[v^*(x) \mid x \in h_w^{-1}(h_w(x_t))\big] - p(x_t)\Big)\Big].$$

The variable $\Delta$ is only nonzero when $s^* \neq s_t$. Let $E$ denote the event that $s^* \neq s_t$ and let $p_E = \Pr_{x_t\sim P}[E]$. Then

$$\mathbb{E}_{x_t\sim P}\big[(v^*(x_t) - p(x_t))(s^* - s_t)\big] \leq p_E\,\mathbb{E}\Big[\Big|\mathbb{E}_{x}\big[v^*(x) \mid x \in h_w^{-1}(h_w(x_t))\big] - p(x_t)\Big| \;\Big|\; E\Big] \leq \mathbb{E}\Big[\Big|\mathbb{E}_{x}\big[v^*(x) \mid x \in h_w^{-1}(h_w(x_t))\big] - p(x_t)\Big| \;\Big|\; E\Big].$$

By definition, when event $E$ happens, we know that either

$$A(h_w(x_t)) \leq p(x_t) < \mathbb{E}_{x\sim P}\big[v^*(x) \mid x \in h_w^{-1}(h_w(x_t))\big] \quad\text{or}\quad \mathbb{E}_{x\sim P}\big[v^*(x) \mid x \in h_w^{-1}(h_w(x_t))\big] \leq p(x_t) < A(h_w(x_t)),$$

where in either case we have that

$$\Big|\mathbb{E}_{x\sim P}\big[v^*(x) \mid x \in h_w^{-1}(h_w(x_t))\big] - p(x_t)\Big| \leq \Big|\mathbb{E}_{x\sim P}\big[v^*(x) \mid x \in h_w^{-1}(h_w(x_t))\big] - A(h_w(x_t))\Big|.$$

Given $\delta' \in (0,1)$, let

$$\epsilon' = \frac{\ell}{t^*}\left(d\ln\frac{2et^*}{d} + \ln\frac{4\ell}{\delta'}\right).$$

By Lemma 3.3, for any value of $h_w(x) \in Y$, with probability at least $1-\delta'$,

$$\Big|\mathbb{E}_{x\sim P}\big[v^*(x) \mid x \in h_w^{-1}(h_w(x_t))\big] - A(h_w(x_t))\Big| \leq 2\epsilon' H.$$

Setting $\delta' = \delta/|Y| = \delta/2^\ell$ and taking a union bound over the elements of $Y$, we have that with probability at least $1-\delta$,

$$\mathbb{E}_{x_t\sim P}\big[(v^*(x_t) - p(x_t))\,s^* - (v^*(x_t) - p(x_t))\,s_t\big] \leq \mathbb{E}\Big[\Big|\mathbb{E}_{x}\big[v^*(x) \mid x \in h_w^{-1}(h_w(x_t))\big] - A(h_w(x_t))\Big| \;\Big|\; E\Big] \leq \frac{2H\ell}{t^*}\left(d\ln\frac{2et^*}{d} + \ln\frac{\ell\,2^{\ell+2}}{\delta}\right),$$

which completes the proof.

This instantaneous regret bound then implies a regret bound for Algorithm 1, whose proof is in Appendix B.

**Theorem 3.5.** With probability at least $1-\delta$, the regret of Algorithm 1 is $R_T = O\big(\sqrt{Td\ell\log(T\ell/\delta)}\big)$.

In Algorithm 1, we assumed that we knew the horizon $T$. This assumption can be lifted via a doubling trick; see Appendix B.

## 4. General Masking Functions

We next consider a more general setting where, in each round, an item $x_t$ is drawn from an unknown distribution $P$ and a published price $p_t$ is drawn from some fixed, unknown distribution. We study the case where the masking function is an arbitrary mapping $h : X \to [n]$. In this setting, is there a no-regret strategy that the buyer can use? We answer this question in the affirmative, building on the classical Exp4.VC algorithm (Beygelzimer et al., 2011). Out of the box, Exp4.VC has per-round runtime that is exponential in $n$, but we exploit the structure of our problem setting to obtain a polynomial per-round runtime. We prove that this algorithm has a regret bound of $O(\sqrt{T(n\log(T/n) + \log(1/\delta))})$ with probability at least $1-\delta$.

### 4.1. Exp4.VC algorithm

Based on the optimal strategy from Proposition 2.3, we define an infinite set of policies that take as input $(p_t, h(x_t))$ and return decisions in $\{0,1\}$ indicating whether or not the buyer should buy the item. Each policy is defined by a vector $v \in [0,H]^n$ as follows:

$$\pi_v(p_t, h(x_t)) = \mathbb{I}\big(v[h(x_t)] \geq p_t\big).$$

The optimal strategy from Proposition 2.3 corresponds to the policy $\pi_v$ with $v = (\mathbb{E}[v^*(x) \mid h(x) = 1], \ldots, \mathbb{E}[v^*(x) \mid h(x) = n])$. We use the notation $\Pi = \{\pi_v \mid v \in [0,H]^n\}$ to denote the set of all such policies.
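Concretely, a policy is nothing more than a vector of per-mask-value thresholds. The sketch below (our own naming; the empirical-mean estimator is an illustrative plug-in, not part of Algorithm 2) shows the policy and the vector that Proposition 2.3 singles out as optimal:

```python
import numpy as np

def pi(v, p_t, y_t):
    """Policy pi_v: buy iff v[h(x_t)] >= p_t, with y_t = h(x_t) in {0,...,n-1}."""
    return int(v[y_t] >= p_t)

# Proposition 2.3: the optimal policy uses v[i] = E[v*(x) | h(x) = i].
# With purchase feedback, a natural illustrative plug-in is the
# empirical mean of realized values per mask value:
def empirical_thresholds(n, observations):
    """observations: list of (mask_value, realized_value) pairs from
    rounds where the buyer purchased and therefore observed x_t."""
    v = np.zeros(n)
    for i in range(n):
        vals = [val for (y, val) in observations if y == i]
        if vals:
            v[i] = np.mean(vals)
    return v
```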
A key observation is that our problem can be framed as a contextual bandit problem with an oblivious adversary and an infinite set of contexts. At each round $t = 1, \ldots, T$, the buyer observes a stochastic context $(p_t, h(x_t))$ and makes a purchase decision. This is a contextual bandit problem with two arms: the first arm corresponds to the decision "no purchase" and the second arm corresponds to the decision "purchase." The reward of the first arm is always zero, while the reward of pulling the second arm depends on the item $x_t$ and the price $p_t$. We prove that the class of policies $\Pi$ has a VC dimension of $n$, which allows us to adapt Exp4.VC from Beygelzimer et al. (2011), a generic contextual bandit algorithm for policy classes with finite VC dimension.

**Algorithm 2: Exp4.VC with an unknown distribution**
1: Input: $T \geq 0$, $\delta \in (0,1)$.
2: Set $\tau = \sqrt{T\big(n\log(T/n) + \log(1/\delta)\big)}$.
3: for $t = 1, 2, \ldots, \tau$ do {Initialization phase}
4: Receive $h(x_t)$ and price $p_t$, where $x_t \sim P$ i.i.d.
5: Make decision $b_t \sim \mathrm{Bern}(0.5)$ at random.
6: end for
7: for $i = 1, 2, \ldots, n$ do {Extract a finite subset of policies}
8: Let $i_1, i_2, \ldots, i_{m_i} \in [\tau]$ be the set of indices where $h(x_{i_j}) = i$ for all $j \in \{1, 2, \ldots, m_i\}$.
9: Define $V_i = \{0, p_{i_1}, p_{i_2}, \ldots, p_{i_{m_i}}\}$.
10: end for
11: Define $V = \prod_{i=1}^n V_i$.
12: Set $\gamma = \sqrt{\frac{\log|V|}{2(T-\tau)}}$ and $w^v_{\tau+1} = 1$ for all $v \in V$.
13: for $t = \tau+1, \ldots, T$ do {Exp4.P subroutine}
14: Receive $h(x_t)$ and price $p_t$, where $x_t \sim P$ i.i.d.
15: Get advice vectors $\xi^v_t \in \{0,1\}^2$ for all $v \in V$, where $\xi^v_t = \big(1 - \pi_v(p_t, h(x_t)),\; \pi_v(p_t, h(x_t))\big)$.
16: Set $W_t = \sum_{v\in V} w^v_t$ and define $\bar\xi_t \in (0,1)^2$ as
$$\bar\xi_t[0] = (1-2\gamma)\sum_{v\in V}\frac{w^v_t\,\xi^v_t[0]}{W_t} + \gamma, \qquad \bar\xi_t[1] = (1-2\gamma)\sum_{v\in V}\frac{w^v_t\,\xi^v_t[1]}{W_t} + \gamma.$$
17: Draw decision $b_t \sim \mathrm{Bern}(\bar\xi_t[1])$ and receive reward $u_t = (v^*(x_t) - p_t)\,b_t$.
18: Set $\hat r_t = \big(0,\; b_t u_t / \bar\xi_t[1]\big)$.
19: for $v \in V$ do
20: Set $\hat C^v_t = \gamma \sum_{b\in\{0,1\}} \xi^v_t[b]\Big(\hat r_t[b] + \frac{1}{\bar\xi_t[b]}\sqrt{\frac{\log(|V|/\delta)}{2(T-\tau)}}\Big)$.
21: Set $w^v_{t+1} = w^v_t \exp\big(\hat C^v_t\big)$.
22: end for
23: end for

Algorithm 2 begins with an initialization phase of length $\tau$. In this phase, the buyer chooses their action uniformly at random and collects tuples $\{(h(x_t), p_t)\}_{t=1}^{\tau}$. The algorithm then uses these tuples to identify a finite (though exponentially large) representative subset of policies in $\Pi$. In particular, using the collected tuples, the algorithm partitions $\Pi$ into a finite set of equivalence classes, where two policies $\pi, \pi'$ are equivalent if they agree on the set of $\tau$ collected tuples. The buyer then constructs a finite set of policies $\hat\Pi$ by selecting one policy from each equivalence class. Algorithm 2 does this by first defining, for each $i \in [n]$, a set $V_i \subseteq \mathbb{R}$ which is the set of all prices from the first $\tau$ rounds for items $x_t$ with $h(x_t) = i$, together with $0$. The finite set $\hat\Pi$ of policies is then defined as $\hat\Pi = \{\pi_v : v \in \prod_{i=1}^n V_i\}$. We prove that $\hat\Pi$ contains a policy from each equivalence class in Lemma 4.2.

In the remaining rounds, Algorithm 2 follows the Exp4.P strategy (Beygelzimer et al., 2011), which runs multiplicative weight updates on each of the selected policies $\pi_v$ with $v \in \prod_{i=1}^n V_i$. Out of the box, Exp4.VC would therefore have a per-round runtime that is exponential in $n$, since $\prod_{i=1}^n V_i$ is exponentially large. However, with a careful analysis, we show that in our setting these multiplicative weight updates can be computed in polynomial time.
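The initialization-phase bookkeeping (lines 7-11 of Algorithm 2) is simple; a short sketch, with names of our own choosing, is below. Note that the Cartesian product $V = \prod_i V_i$ is never materialized; Section 4.3 shows how to work with the per-index sets directly.

```python
from collections import defaultdict

def build_threshold_sets(observed):
    """Lines 7-11 of Algorithm 2: from the initialization-phase tuples,
    build V_i = {0, p_i1, ..., p_im_i} for each mask value i.

    observed: list of (mask_value, price) tuples from rounds 1..tau.
    """
    V = defaultdict(lambda: {0.0})
    for i, p in observed:
        V[i].add(p)
    return {i: sorted(vals) for i, vals in V.items()}

# Example with two mask values and four initialization rounds:
print(build_threshold_sets([(0, 0.3), (1, 0.7), (0, 0.5), (1, 0.2)]))
# {0: [0.0, 0.3, 0.5], 1: [0.0, 0.2, 0.7]}
```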
### 4.2. Regret

The key first step is to show that although the set of policies $\Pi$ we need to consider is infinite, it has a finite VC dimension. The full proof is in Appendix C.

**Lemma 4.1.** The VC dimension of $\Pi$ is $n$.

*Proof sketch.* First, we show that the functions in $\Pi$ cannot be used to label $n+1$ contexts in all possible ways. Given $n+1$ contexts $(h(x_1), p_1), \ldots, (h(x_{n+1}), p_{n+1})$, by the pigeonhole principle there must exist at least two items $x_i$ and $x_j$ that have the same index: $h(x_i) = h(x_j)$. Therefore, for any policy $\pi_v$, the decisions for these two items are determined by the same threshold $v[h(x_i)] = v[h(x_j)]$. Without loss of generality, assume that $p_i < p_j$. There is no policy $\pi_v$ where the decision is to purchase item $j$ but not purchase item $i$, because this would imply that $p_j \leq v[h(x_j)] = v[h(x_i)] < p_i$. However, with fewer than $n+1$ items, since all items can use a different threshold, their decisions do not interfere with each other.

Next, in order to invoke the regret bound of Exp4.VC, we verify that $\hat\Pi$ is a representative set of policies. Formally, suppose we partition $\Pi$ into a set of equivalence classes where policies $\pi$ and $\pi'$ are equivalent if they agree on the set of $\tau$ tuples collected in the initialization phase. We prove that $\hat\Pi$ contains a policy from each equivalence class.

**Lemma 4.2.** Let $V$ be defined as in Algorithm 2. Then

$$\big\{(\pi_v(h(x_1), p_1), \ldots, \pi_v(h(x_\tau), p_\tau)) : v \in [0,H]^n\big\} = \big\{(\pi_v(h(x_1), p_1), \ldots, \pi_v(h(x_\tau), p_\tau)) : v \in V\big\}.$$

The proof of this lemma can be found in Appendix C. Lemmas 4.1 and 4.2 imply the following regret bound:³

**Theorem 4.3.** With probability $1-\delta$, Algorithm 2 achieves regret $R_T = O\big(\sqrt{T(n\log(T/n) + \log(1/\delta))}\big)$.

*Proof.* This theorem follows directly from Lemmas 4.1 and 4.2 and Theorem 5 of Beygelzimer et al. (2011).

³By running $n$ copies of Exp4.VC in parallel, one for each context, we would obtain a regret bound of $O(\sqrt{Tn\log(Tn/\delta)})$; the more careful analysis in this section improves the dependence on $n$.

### 4.3. Computational Complexity

The key challenge in applying Exp4.VC out of the box is that it computes multiplicative weight updates over every policy $\pi_v$ with $v \in V$, an exponential number of policies. We show that by exploiting our problem structure, we can perform these multiplicative weight updates in each round in polynomial time. In particular, we show that we can efficiently compute the purchase probabilities $\bar\xi_t[0]$ and $\bar\xi_t[1]$ without computing the multiplicative weights $w^v_t$ for each $v \in V$ explicitly, and therefore Algorithm 2 can be run with polynomial per-round runtime. Intuitively, rather than sum over every vector in $V = \prod_{i=1}^n V_i$ as in the definitions of $\bar\xi_t[0]$ and $\bar\xi_t[1]$, we show how to sum over the individual elements of the sets $V_1, \ldots, V_n$, of which there are at most $\tau + n$ in total. We provide the complete proof in Appendix C.

**Theorem 4.4.** The purchase probabilities $\bar\xi_t[0]$ and $\bar\xi_t[1]$ in Algorithm 2 can be computed in $O(n + \tau) = O\big(n + \sqrt{T(n\log(T/n) + \log(1/\delta))}\big)$ time.

*Proof sketch.* Our proof begins with the observation that for each index $i \in [n]$, the thresholds $0 \leq p_{i_1} \leq \cdots \leq p_{i_{m_i}}$ in $V_i$ divide the price range $[0, H]$ into $m_i+1$ non-overlapping "buckets": $[0, p_{i_1}), [p_{i_1}, p_{i_2}), \ldots, [p_{i_{m_i}}, H]$. Using the notation $m = \max_i m_i$, the total number of buckets is $O(mn)$. Each context $(h(x_t), p_t)$ corresponds to exactly one bucket: the bucket $[p_{i_j}, p_{i_{j+1}})$ containing $p_t$, where $h(x_t) = i$. Moreover, the decision of each policy is constant on each bucket, since the policies all use the same thresholds, namely, the boundaries of these buckets. Given a vector $v \in V$ and a bucket $k$, let $a^v_k \in \{0,1\}$ be the policy's recommendation of "buy" or "do not buy" for any item that falls in that bucket. This allows us to rewrite the advice vector $\xi^v_t$ as an alternative sum:

$$\xi^v_t = \Big(\sum_{k : a^v_k = 0} \mathbb{I}(\text{item in } k),\; \sum_{k : a^v_k = 1} \mathbb{I}(\text{item in } k)\Big).$$

Intuitively, $\mathbb{I}(\text{item in } k)$ is only nonzero for the bucket that the item belongs to, and the policy's decision for the item is the same as its decision for that bucket. Using this fine-grained argument, we then show that the purchase probabilities $\bar\xi_t[0]$ and $\bar\xi_t[1]$ can be computed in polynomial time without explicitly computing the exponentially many weights $w^v_t$.
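The appendix (Claims C.2 and C.3) turns this observation into closed-form products over buckets. The sketch below, under our own naming and ignoring numerical issues such as working in log-space, maintains one cumulative score per bucket and recovers the normalizing constant and the weight fraction of the "buy" policies in time linear in the number of buckets.

```python
import numpy as np

def bucket_weight_vectors(cum_scores):
    """cum_scores[i] is a list of (c0, c1) pairs, one per bucket
    j = 0..m_i, where cb = sum over past rounds of d_{i,j} * f_b.
    Returns, for each i, the vector (g_i(t, b^0), ..., g_i(t, b^{m_i})):
    the threshold p_{i,j} buys in buckets <= j and passes above them."""
    out = []
    for scores in cum_scores:
        e0 = np.exp([c0 for c0, _ in scores])   # per-bucket "do not buy" factor
        e1 = np.exp([c1 for _, c1 in scores])   # per-bucket "buy" factor
        pre = np.cumprod(e1)                    # product of e1 over buckets <= j
        suf = np.cumprod(e0[::-1])[::-1]        # product of e0 over buckets >= j
        g = [pre[j] * (suf[j + 1] if j + 1 < len(scores) else 1.0)
             for j in range(len(scores))]
        out.append(np.array(g))
    return out

def buy_weight_fraction(g, i_t, j_t):
    """Fraction (sum of w_t^v over policies that buy) / W_t for a
    context in bucket (i_t, j_t): only index i_t's threshold matters."""
    totals = [gi.sum() for gi in g]
    W_t = np.prod(totals)
    buy = np.prod([tot for i, tot in enumerate(totals) if i != i_t])
    return buy * g[i_t][j_t:].sum() / W_t   # thresholds with j >= j_t buy
```

Under these assumed names, line 16 of Algorithm 2 becomes $\bar\xi_t[1] = (1-2\gamma)\cdot\texttt{buy\_weight\_fraction}(g, i_t, j_t) + \gamma$, with $\bar\xi_t[0]$ analogous.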
**Remark 4.5.** Under adversarial prices, we can run $n$ independent copies of an algorithm for Lipschitz contextual bandits (for example, the algorithm from Section 8.3 of the textbook by Slivkins (2019)) to obtain an expected regret bound of $O(T^{2/3}n^{1/3})$.

## 5. Conclusion

We presented learning algorithms for buyers who participate in auctions with limited item information. This model captures a broad set of practical applications, including advertising auctions. Our algorithms are no-regret with respect to an oracle buyer who has perfect knowledge of the distribution over items and the masking function that the seller uses to obfuscate the item information. We proposed no-regret learning algorithms in a variety of settings, including when the distribution over items is either known or unknown to the buyer, and when the prices are either stochastic or adversarial. To the best of our knowledge, this is the first result on no-regret learning strategies for buyers with partial item information.

Many interesting questions remain open for future research. First, we have assumed that the valuation is bounded, and it would be an interesting direction for future research to extend our results to unbounded distributions. Second, we focused on posted prices in this work. It would be interesting to consider extensions to second-price auctions with partial item information. With stochastic prices, the problem seems potentially feasible, but adversarial prices might pose challenges because the adversary could block the learner from estimating the second-highest bid. Moreover, can the platform release partial information in a way that optimally trades off between revenue and privacy? When the distribution over items is unknown, our algorithm works with general masking functions. Can better purchasing strategies be developed by exploiting the properties of a specific set of masking functions?

## Acknowledgements

We wish to acknowledge support from the Vannevar Bush Faculty Fellowship program under grant number N00014-21-1-2941. WG acknowledges support from a Google PhD Fellowship.

## References

Anthony, M. and Bartlett, P. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 2009.

Arefeva, A. and Meng, D. Revealing information in auctions: The optimal auction versus the second-price auction. Economics Letters, 204:109895, 2021.

Arnosti, N., Beck, M., and Milgrom, P. Adverse selection and auction design for internet display advertising. American Economic Review, 106(10):2852-2866, 2016.

Bagnoli, M. and Bergstrom, T. Log-concave probability and its applications. In Rationality and Equilibrium, pp. 217-241. Springer, 2006.

Balseiro, S. R. and Gur, Y. Learning in repeated auctions with budgets: Regret minimization and equilibrium. Management Science, 65(9):3952-3968, 2019.

Bergemann, D. and Pesendorfer, M. Information structures in optimal auctions. Journal of Economic Theory, 137(1):580-609, 2007.
Bergemann, D., Heumann, T., and Morris, S. Selling impressions: Efficiency vs. competition. 2021.

Beygelzimer, A., Langford, J., Li, L., Reyzin, L., and Schapire, R. Contextual bandit algorithms with supervised learning guarantees. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 19-26. JMLR Workshop and Conference Proceedings, 2011.

De Corniere, A. and De Nijs, R. Online advertising and privacy. The RAND Journal of Economics, 47(1):48-72, 2016.

Dikkala, N. and Tardos, É. Can credit increase revenue? In International Conference on Web and Internet Economics, pp. 121-133. Springer, 2013.

Edelman, B., Ostrovsky, M., and Schwarz, M. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American Economic Review, 97(1):242-259, 2007.

Epasto, A., Muñoz Medina, A., Avery, S., Bai, Y., Busa-Fekete, R., Carey, C., Gao, Y., Guthrie, D., Ghosh, S., Ioannidis, J., et al. Clustering for private interest-based advertising. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 2802-2810, 2021.

Eső, P. and Szentes, B. Optimal information disclosure in auctions and the handicap auction. The Review of Economic Studies, 74(3):705-731, 2007.

Feng, Z., Podimata, C., and Syrgkanis, V. Learning to bid without knowing your value. In Proceedings of the 2018 ACM Conference on Economics and Computation, pp. 505-522, 2018.

Ganuza, J.-J. Ignorance promotes competition: an auction model with endogenous private valuations. RAND Journal of Economics, pp. 583-598, 2004.

Gershkov, A. Optimal auctions and information disclosure. Review of Economic Design, 13(4):335-344, 2009.

Guha, S., Cheng, B., and Francis, P. Privad: Practical privacy in online advertising. In USENIX Conference on Networked Systems Design and Implementation, pp. 169-182, 2011.

Juels, A. Targeted advertising... and privacy too. In Cryptographers' Track at the RSA Conference, pp. 408-424. Springer, 2001.

Li, D. and Jewitt, I. Cheap talk advertising in auctions: horizontally vs vertically differentiated products. 2017.

Li, H. and Shi, X. Discriminatory information disclosure. American Economic Review, 107(11):3363-3385, 2017.

Lovász, L. and Vempala, S. Fast algorithms for logconcave functions: Sampling, rounding, integration and optimization. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), pp. 57-68. IEEE, 2006.

Milgrom, P. Putting Auction Theory to Work. Cambridge University Press, 2004.

Milgrom, P. Simplified mechanisms with an application to sponsored-search auctions. Games and Economic Behavior, 70(1):62-70, 2010.

Post, D., Coppinger, S., and Sheble, G. Application of auctions as a pricing mechanism for the interchange of electric power. IEEE Transactions on Power Systems, 10(3):1580-1584, 1995.

Rafieian, O. and Yoganarasimhan, H. Targeting and privacy in mobile advertising. Marketing Science, 40(2):193-218, 2021.

Roesler, A.-K. and Szentes, B. Buyer-optimal learning and monopoly pricing. American Economic Review, 107(7):2072-2080, 2017.

Saumard, A. and Wellner, J. A. Log-concavity and strong log-concavity: a review. Statistics Surveys, 8:45-114, 2014.

Slivkins, A. Introduction to multi-armed bandits. Foundations and Trends in Machine Learning, 12(1-2):1-286, 2019.

Weed, J., Perchet, V., and Rigollet, P. Online learning in repeated auctions. In Conference on Learning Theory, pp. 1562-1583. PMLR, 2016.
## A. Proofs for Section 2

**Proposition 2.3.** The strategy $s^*$ that maximizes $\mathbb{E}_{x\sim P}[(v^*(x) - p(x))\,s^*(h(x), p(x), h, P) \mid h(x)]$ is

$$s^*(h(x), p(x), h, P) = \mathbb{I}\Big(\mathbb{E}_{x\sim P}[v^*(x) \mid h(x)] > p(x)\Big).$$

*Proof.* For a decision $b \in \{0,1\}$, let $R_b$ denote the expected utility of buying or not buying an item given the published information and price:

$$R_b \stackrel{\text{def}}{=} \mathbb{E}\big[(v^*(x) - p(x))\,b \mid h(x), p(x)\big] = \mathbb{E}\big[(v^*(x) - p(x))\,b \mid v^*(x) \geq p(x), h(x), p(x)\big]\,\mathbb{P}\big[v^*(x) \geq p(x) \mid h(x), p(x)\big] - \mathbb{E}\big[(p(x) - v^*(x))\,b \mid v^*(x) < p(x), h(x), p(x)\big]\,\mathbb{P}\big[v^*(x) < p(x) \mid h(x), p(x)\big].$$

Then, by definition, the optimal strategy is $s^*(h(x), p(x), h, P) = \mathbb{I}(R_1 > R_0) = \mathbb{I}(R_1 > 0)$, since $R_0 = 0$. Further, letting $(z)_+ = \max\{0, z\}$, by the law of total expectation,

$$\mathbb{E}\big[(v^*(x) - p(x))_+ \mid h(x), p(x)\big] = \mathbb{E}\big[(v^*(x) - p(x)) \mid v^*(x) \geq p(x), h(x), p(x)\big]\,\mathbb{P}\big[v^*(x) \geq p(x) \mid h(x), p(x)\big] + \mathbb{E}\big[0 \mid v^*(x) < p(x), h(x), p(x)\big]\,\mathbb{P}\big[v^*(x) < p(x) \mid h(x), p(x)\big] = \mathbb{E}\big[(v^*(x) - p(x)) \mid v^*(x) \geq p(x), h(x), p(x)\big]\,\mathbb{P}\big[v^*(x) \geq p(x) \mid h(x), p(x)\big].$$

Similarly, letting $(z)_- = \min\{0, z\}$,

$$\mathbb{E}\big[(v^*(x) - p(x))_- \mid h(x), p(x)\big] = -\,\mathbb{E}\big[(p(x) - v^*(x)) \mid v^*(x) < p(x), h(x), p(x)\big]\,\mathbb{P}\big[v^*(x) < p(x) \mid h(x), p(x)\big].$$

Therefore, the optimal strategy is

$$s^*(h(x), p(x), h, P) = \mathbb{I}\Big(\mathbb{E}\big[(v^*(x) - p(x))_+ \mid h(x), p(x)\big] + \mathbb{E}\big[(v^*(x) - p(x))_- \mid h(x), p(x)\big] > 0\Big) = \mathbb{I}\Big(\mathbb{E}\big[(v^*(x) - p(x)) \mid h(x), p(x)\big] > 0\Big) = \mathbb{I}\Big(\mathbb{E}\big[v^*(x) \mid h(x), p(x)\big] > p(x)\Big).$$

The proposition statement then follows from Equation (1).

## B. Proofs for Section 3

**Definition B.1** (Lovász & Vempala (2006)). Define the centroid of $f_\pi$ as $c_f = \int x\,df_\pi$ and the variance of $f_\pi$ as $\mathrm{var}(f_\pi) = \int \|x - c_f\|^2\,df_\pi$. Also denote by $L(\theta)$ the level set $\{x : v^*(x)\pi(x) \geq \theta\}$.

**Theorem 3.5.** With probability at least $1-\delta$, the regret of Algorithm 1 is $R_T = O\big(\sqrt{Td\ell\log(T\ell/\delta)}\big)$.

*Proof.* In the exploration phase, the total regret is upper bounded by $Ht^*$. Now consider the regret in the exploitation phase. Notice that the exploration phase yields $t^*$ i.i.d. samples $\{(x_i, h_w(x_i))\}_{i=1}^{t^*}$. By Lemma 3.4 and a union bound over all rounds, with probability at least $1 - T\delta'$, the total expected regret incurred by Algorithm 1 is

$$R_T \leq Ht^* + 2\epsilon' H(T - t^*), \quad\text{where}\quad \epsilon' = \frac{\ell}{t^*}\left(d\ln\frac{2et^*}{d} + \ln\frac{\ell\,2^{\ell+2}}{\delta'}\right).$$

Note that

$$\log\frac{\ell\,2^{\ell+2}}{\delta'} \leq \log\frac{\ell}{\delta'} + \ell\log 4 \leq \log\frac{\ell}{\delta'} + d\log 4.$$

Setting $\delta' = \delta/T$ and recalling that $t^* = \sqrt{4Td\ell\log(\ell/\delta)}$, we have that with probability at least $1-\delta$,

$$R_T \leq Ht^* + \frac{2H\ell(T - t^*)}{t^*}\left(d\ln\frac{2et^*}{d} + \log\frac{\ell T}{\delta} + d\log 4\right) = O\big(\sqrt{Td\ell\log(T\ell/\delta)}\big).$$

This completes the proof.

### B.1. Algorithm for an unknown horizon T

In Theorem 3.5, we assumed that we knew the time horizon $T$, which allowed us to set the correct length for the exploration phase. This assumption can be lifted by using the doubling trick, which runs the algorithm in independent intervals of doubling length, as summarized in Algorithm 3. The regret bound remains the same up to constant factors.

**Algorithm 3: Explore-then-Commit with Unknown Horizon**
1: Input: starting epoch length $T_0$.
2: for $i = 1, 2, \ldots$ do
3: $T_i \leftarrow 2^i T_0$.
4: Run Algorithm 1 with $T = T_i$.
5: end for

**Corollary B.2.** Suppose that $v^*(x)\pi(x)$ is log-concave and well-rounded. Then with probability at least $1-\delta$, the regret of Algorithm 3 is $R_T = O\big(\sqrt{Td\ell\log(T\ell/\delta)}\big)$.

*Proof.* Denote the total expected accumulated regret by $R_T$ and the expected regret accumulated in the interval of length $T_i$ by $R_{T_i}$. The number of intervals is $m \leq \log_2\frac{2T}{T_0}$. By Theorem 3.5, for some universal constant $C$, with probability at least $1 - \delta'$ in each interval,

$$R_{T_i} \leq C\sqrt{T_i\,d\ell\log(T_i\ell/\delta')}, \quad\text{where } T_i = 2^i T_0.$$

Let $\delta' = \delta/m$. Applying a union bound over all intervals, we have that with probability at least $1-\delta$,

$$R_T = \sum_{i=1}^m R_{T_i} \leq \sum_{i=1}^m C\sqrt{2^i T_0\,d\ell\log(2^i T_0\,\ell m/\delta)} = O\big(\sqrt{Td\ell\log(T\ell/\delta)}\big),$$

since the geometric sum is dominated by its largest term. This completes the proof.
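A sketch of Algorithm 3's control flow, with an assumed callable standing in for Algorithm 1:

```python
def explore_then_commit_with_doubling(run_algorithm_1, T0=8, total_rounds=100_000):
    """Algorithm 3: run Algorithm 1 on independent intervals of doubling
    length, so the true horizon never needs to be known in advance.
    `run_algorithm_1(T)` is a stand-in assumed to execute Algorithm 1
    for exactly T rounds."""
    rounds_used, i = 0, 1
    while rounds_used < total_rounds:
        T_i = (2 ** i) * T0
        run_algorithm_1(min(T_i, total_rounds - rounds_used))
        rounds_used += T_i
        i += 1
```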
## C. Proofs for Section 4

**Lemma 4.1.** The VC dimension of $\Pi$ is $n$.

*Proof.* We argue that the functions in $\Pi$ cannot be used to label $n+1$ input points in all possible ways. For simplicity, denote $h(x) = y \in [n]$. Consider a set of input points $\{(y_i, p_i)\}_{i=1}^{n+1}$. By the pigeonhole principle, there must exist at least two elements $(y_i, p_i), (y_j, p_j)$ such that $y_i = y_j$. Therefore, by definition, we have that for any $v \in \mathbb{R}^n$,

$$\pi_v(y_i, p_i) = \mathbb{I}(v[y_i] > p_i) = \mathbb{I}(v[y_j] > p_i), \qquad \pi_v(y_j, p_j) = \mathbb{I}(v[y_j] > p_j) = \mathbb{I}(v[y_i] > p_j).$$

Without loss of generality, assume that $p_i \leq p_j$. Then the pair of labels $\pi_v(y_i, p_i) = 0$ and $\pi_v(y_j, p_j) = 1$ can never be achieved for any $v \in \mathbb{R}^n$, since $v[y_j] > p_j$ implies $v[y_i] = v[y_j] > p_j \geq p_i$. Thus $\mathrm{VCdim}(\Pi) < n+1$.

Next, consider the set of $n$ input points $\{(i, 0.5)\}_{i=1}^n$. Each point in this set is labeled using a different index $i$, so all possible combinations of labels can be achieved using vectors $v \in \{0,1\}^n$. Thus we conclude that $\mathrm{VCdim}(\Pi) = n$.

**Lemma 4.2.** Let $\{(h(x_1), p_1), \ldots, (h(x_\tau), p_\tau)\}$ be a subset of $[n] \times [0, H]$. For each $i \in [n]$, let $i_1, i_2, \ldots, i_{m_i} \in [\tau]$ be the set of indices where $h(x_{i_j}) = i$ for all $j \in \{1, 2, \ldots, m_i\}$. Define $V_i = \{0, p_{i_1}, p_{i_2}, \ldots, p_{i_{m_i}}\}$ and $V = \prod_{i=1}^n V_i$. Then

$$\big\{(\pi_v(h(x_1), p_1), \ldots, \pi_v(h(x_\tau), p_\tau)) : v \in [0,H]^n\big\} = \big\{(\pi_v(h(x_1), p_1), \ldots, \pi_v(h(x_\tau), p_\tau)) : v \in V\big\}.$$

*Proof.* We will show that for every $v \in [0,H]^n$, there exists a vector $v_0 \in \prod_{i=1}^n V_i$ such that $\pi_v(h(x_j), p_j) = \pi_{v_0}(h(x_j), p_j)$ for every $j \in [\tau]$. To this end, fix an index $i \in [n]$ and, without loss of generality, let $i_1, i_2, \ldots, i_{m_i}$ be sorted such that $0 =: p_{i_0} < p_{i_1} < p_{i_2} < \cdots < p_{i_{m_i}}$. Let $j^\star \in \{0, 1, 2, \ldots, m_i\}$ be the largest index such that $v[i] \geq p_{i_{j^\star}}$. Define $v_0[i] = p_{i_{j^\star}}$. For every index $i_j$ with $j \leq j^\star$, we know that $v[i] \geq p_{i_{j^\star}} \geq p_{i_j}$ and $v_0[i] = p_{i_{j^\star}} \geq p_{i_j}$, so

$$\pi_v(h(x_{i_j}), p_{i_j}) = \mathbb{I}(v[i] \geq p_{i_j}) = 1 = \mathbb{I}(v_0[i] \geq p_{i_j}) = \pi_{v_0}(h(x_{i_j}), p_{i_j}).$$

Meanwhile, for every index $i_j$ with $j > j^\star$, we know that $p_{i_{j^\star}} = v_0[i] \leq v[i] < p_{i_j}$. Therefore, $\pi_v(h(x_{i_j}), p_{i_j}) = 0 = \pi_{v_0}(h(x_{i_j}), p_{i_j})$. In either case, we have that $\pi_v(h(x_{i_j}), p_{i_j}) = \pi_{v_0}(h(x_{i_j}), p_{i_j})$. Since this is true for every index $i \in [n]$, the lemma statement holds.

**Theorem 4.4.** The purchase probabilities $\bar\xi_t[0]$ and $\bar\xi_t[1]$ in Algorithm 2 can be computed in $O(n + \tau) = O\big(n + \sqrt{T(n\log(T/n) + \log(1/\delta))}\big)$ time.

*Proof.* Label the elements of $V_i$ as $0 =: p_{i,0} < p_{i,1} < \cdots < p_{i,m_i}$. Let $m = \max_i m_i$. For ease of notation, define the variables $p_{i,m_i+1} = p_{i,m_i+2} = \cdots = p_{i,m} = H$. For each $i \in [n]$, $j \in \{1, \ldots, m\}$, and $t \in \{\tau+1, \tau+2, \ldots, T\}$, we define the variable $d_{i,j}(t) = \mathbb{I}\big(h(x_t) = i \text{ and } p_t \in (p_{i,j-1}, p_{i,j}]\big)$. We also set $d_{i,0}(t) = \mathbb{I}\big(h(x_t) = i \text{ and } p_t = 0\big)$. Next, for each $v \in \prod_{i=1}^n V_i$, $i \in [n]$, and $j \in \{0, \ldots, m\}$, we define the variable $a^v_{i,j} = \mathbb{I}(v[i] \geq p_{i,j})$.

**Claim C.1.** For $b \in \{0,1\}$,

$$\xi^v_t[b] = \sum_{(i,j) : a^v_{i,j} = b} d_{i,j}(t). \tag{4}$$

*Proof of Claim C.1.* First, let $i_t = h(x_t)$. If $p_t = 0$, define $j_t = 0$; otherwise, define $j_t$ such that $p_t \in (p_{i_t,j_t-1}, p_{i_t,j_t}]$. Therefore, $d_{i,j}(t) = 1$ if $(i,j) = (i_t, j_t)$ and $d_{i,j}(t) = 0$ otherwise. This means that

$$\sum_{(i,j) : a^v_{i,j} = b} d_{i,j}(t) = \mathbb{I}\big(a^v_{i_t,j_t} = b\big). \tag{5}$$

We split the proof into two cases: $b = 0$ and $b = 1$.

Case 1: $b = 0$. We know that $\xi^v_t[0] = 1 - \pi_v(p_t, h(x_t)) = \mathbb{I}(v[h(x_t)] < p_t) = \mathbb{I}(v[i_t] < p_t)$. If $\xi^v_t[0] = 0$, then $v[i_t] \geq p_t$. Since $v \in V$, we know that $v[i_t] = p_{i_t,j}$ for some $j \in \{0, \ldots, m\}$. Moreover, since $p_t \in (p_{i_t,j_t-1}, p_{i_t,j_t}]$, the fact that $v[i_t] \geq p_t$ means that $v[i_t] \geq p_{i_t,j_t}$. Therefore, $a^v_{i_t,j_t} = 1$. By Equation (5), this means that $\sum_{(i,j):a^v_{i,j}=0} d_{i,j}(t) = 0$, so Equation (4) holds. Meanwhile, if $\xi^v_t[0] = 1$, then $v[i_t] < p_t \leq p_{i_t,j_t}$. Therefore, $a^v_{i_t,j_t} = 0$.
By Equation (5), this means that $\sum_{(i,j):a^v_{i,j}=0} d_{i,j}(t) = 1$, so Equation (4) holds.

Case 2: $b = 1$. We know that $\xi^v_t[1] = \mathbb{I}(v[i_t] \geq p_t)$. If $\xi^v_t[1] = 0$, then $v[i_t] < p_t$. By the same logic as the previous case, this means that $a^v_{i_t,j_t} = 0$. By Equation (5), this means that $\sum_{(i,j):a^v_{i,j}=1} d_{i,j}(t) = 0$, so Equation (4) holds. Meanwhile, if $\xi^v_t[1] = 1$, then $v[i_t] \geq p_t$. By the same logic as the previous case, $a^v_{i_t,j_t} = 1$. By Equation (5), this means that $\sum_{(i,j):a^v_{i,j}=1} d_{i,j}(t) = 1$, so Equation (4) holds.

This means that

$$\xi^v_t \cdot \hat r_t = \sum_{b\in\{0,1\}}\sum_{(i,j):a^v_{i,j}=b} d_{i,j}(t)\,\hat r_t[b] = \sum_{i=1}^n\sum_{j=0}^m d_{i,j}(t)\big(\hat r_t[0]\,\mathbb{I}(a^v_{i,j}=0) + \hat r_t[1]\,\mathbb{I}(a^v_{i,j}=1)\big) = \sum_{i=1}^n\sum_{j=0}^m d_{i,j}(t)\,\hat r_t[a^v_{i,j}].$$

Similarly,

$$\sum_{b\in\{0,1\}} \frac{\xi^v_t[b]}{\bar\xi_t[b]} = \sum_{b\in\{0,1\}}\sum_{(i,j):a^v_{i,j}=b} \frac{d_{i,j}(t)}{\bar\xi_t[b]} = \sum_{i=1}^n\sum_{j=0}^m \frac{d_{i,j}(t)}{\bar\xi_t[a^v_{i,j}]}.$$

Therefore, the weight update of Algorithm 2 can be written as

$$w^v_{t+1} = w^v_t \exp\left(\gamma\sum_{b\in\{0,1\}}\xi^v_t[b]\left(\hat r_t[b] + \frac{1}{\bar\xi_t[b]}\sqrt{\frac{\log(|V|/\delta)}{2(T-\tau)}}\right)\right) = w^v_t \exp\left(\sum_{i=1}^n\sum_{j=0}^m d_{i,j}(t)\,f_{a^v_{i,j}}(t)\right),$$

where

$$f_b(t) = \gamma\left(\hat r_t[b] + \frac{1}{\bar\xi_t[b]}\sqrt{\frac{\log(|V|/\delta)}{2(T-\tau)}}\right).$$

We can therefore write

$$w^v_{t+1} = \prod_{\sigma=\tau+1}^{t}\exp\left(\sum_{i=1}^n\sum_{j=0}^m d_{i,j}(\sigma)\,f_{a^v_{i,j}}(\sigma)\right) = \prod_{i=1}^n\prod_{j=0}^m \exp\left(\sum_{\sigma=\tau+1}^{t} d_{i,j}(\sigma)\,f_{a^v_{i,j}}(\sigma)\right).$$

Letting $g_{i,j}(t, b) = \exp\big(\sum_{\sigma=\tau+1}^{t} d_{i,j}(\sigma)\,f_b(\sigma)\big)$, we have that

$$w^v_{t+1} = \prod_{i=1}^n\prod_{j=0}^m g_{i,j}(t, a^v_{i,j}).$$

Let $b^0, b^1, \ldots, b^m$ be the set of $m+1$ increasing bit vectors of length $m+1$: $b^0 = (1, 0, 0, \ldots, 0)$, $b^1 = (1, 1, 0, \ldots, 0)$, $b^2 = (1, 1, 1, \ldots, 0)$, ..., $b^m = (1, 1, 1, \ldots, 1)$. For each $i \in [n]$ and $b^j$, define $g_i(t, b^j) = g_{i,0}(t, b^j[0])\,g_{i,1}(t, b^j[1])\cdots g_{i,m}(t, b^j[m])$. We now prove the following claim.

**Claim C.2.** The normalizing constant $W_t$ can be computed in polynomial time as

$$W_t = \prod_{i=1}^n \sum_{j=0}^{m_i} g_i(t, b^j).$$

*Proof of Claim C.2.* We first write

$$W_t = \sum_{v\in V} w^v_t = \sum_{v\in V}\prod_{i=1}^n\prod_{j=0}^m g_{i,j}(t, a^v_{i,j}) = \sum_{v\in V}\prod_{i=1}^n\prod_{j=0}^m g_{i,j}\big(t, \mathbb{I}(v[i] \geq p_{i,j})\big) = \prod_{i=1}^n\sum_{j_i=0}^{m_i}\prod_{j=0}^m g_{i,j}\big(t, \mathbb{I}(p_{i,j_i} \geq p_{i,j})\big).$$

This last equality holds because $V = \prod_{i=1}^n V_i$ and $V_i = \{p_{i,0}, \ldots, p_{i,m_i}\}$. Moreover,

$$\prod_{j=0}^m g_{i,j}\big(t, \mathbb{I}(p_{i,j_i} \geq p_{i,j})\big) = \prod_{j=0}^{j_i} g_{i,j}(t, 1)\prod_{j=j_i+1}^{m} g_{i,j}(t, 0) = g_i(t, b^{j_i}),$$

because $p_{i,j_i} \geq p_{i,j}$ for all $j \leq j_i$ and $p_{i,j_i} < p_{i,j}$ for all $j > j_i$. By definition of the vectors $b^0, \ldots, b^m$,

$$W_t = \prod_{i=1}^n\sum_{j_i=0}^{m_i} g_i(t, b^{j_i}),$$

as claimed.

We next prove that $\bar\xi_t$ can be computed in polynomial time. To do so, let $h(x_t) = i_t$. If $p_t = 0$, define $j_t = 0$; otherwise, define $j_t$ such that $p_t \in (p_{i_t,j_t-1}, p_{i_t,j_t}]$. Next, define $\overline m_i$ as follows:

$$\overline m_i = \begin{cases} j_t - 1 & \text{if } i = i_t \\ m_i & \text{otherwise.}\end{cases}$$

Similarly, define $\underline m_i$ as follows:

$$\underline m_i = \begin{cases} j_t & \text{if } i = i_t \\ 0 & \text{otherwise.}\end{cases}$$

Then $\bar\xi_t$ has the following form.

**Claim C.3.** The probabilities $\bar\xi_t$ can be computed in polynomial time as

$$\bar\xi_t[0] = (1-2\gamma)\frac{1}{W_t}\prod_{i=1}^n\sum_{j_i=0}^{\overline m_i} g_i(t, b^{j_i}) + \gamma, \qquad \bar\xi_t[1] = (1-2\gamma)\frac{1}{W_t}\prod_{i=1}^n\sum_{j_i=\underline m_i}^{m_i} g_i(t, b^{j_i}) + \gamma.$$

*Proof of Claim C.3.* Recall that $\xi^v_t[0] = \mathbb{I}(v[i_t] < p_t)$. Therefore,

$$\sum_{v\in V} w^v_t\,\xi^v_t[0] = \sum_{v\in V}\left(\prod_{i=1}^n\prod_{j=0}^m g_{i,j}(t, a^v_{i,j})\right)\mathbb{I}(v[i_t] < p_t) = \sum_{j_1=0}^{m_1}\cdots\sum_{j_n=0}^{m_n}\left(\prod_{i=1}^n g_i(t, b^{j_i})\right)\mathbb{I}(p_{i_t,j_{i_t}} < p_t).$$

This last equality holds because $V = \prod_{i=1}^n V_i$ and $V_i = \{p_{i,0}, \ldots, p_{i,m_i}\}$. Without loss of generality, suppose that $i_t = 1$, so $\mathbb{I}(p_{i_t,j_{i_t}} < p_t) = \mathbb{I}(p_{1,j_1} < p_t)$. Then

$$\sum_{v\in V} w^v_t\,\xi^v_t[0] = \sum_{j_1=0}^{m_1}\mathbb{I}(p_{1,j_1} < p_t)\,g_1(t, b^{j_1})\prod_{i=2}^n\sum_{j_i=0}^{m_i} g_i(t, b^{j_i}).$$

Since $p_t = 0$ if $j_t = 0$, and otherwise $p_t \in (p_{1,j_t-1}, p_{1,j_t}]$, we have that $\mathbb{I}(p_{1,j_1} < p_t) = 1$ if and only if $j_1 \leq j_t - 1$. Therefore,

$$\sum_{v\in V} w^v_t\,\xi^v_t[0] = \left(\sum_{j_1=0}^{j_t-1} g_1(t, b^{j_1})\right)\prod_{i=2}^n\sum_{j_i=0}^{m_i} g_i(t, b^{j_i}) = \prod_{i=1}^n\sum_{j_i=0}^{\overline m_i} g_i(t, b^{j_i}).$$

Similarly, since $\xi^v_t[1] = \mathbb{I}(v[i_t] \geq p_t)$, we have

$$\sum_{v\in V} w^v_t\,\xi^v_t[1] = \sum_{j_1=0}^{m_1}\cdots\sum_{j_n=0}^{m_n}\left(\prod_{i=1}^n g_i(t, b^{j_i})\right)\mathbb{I}(p_{i_t,j_{i_t}} \geq p_t).$$

Without loss of generality, suppose that $i_t = 1$, so $\mathbb{I}(p_{i_t,j_{i_t}} \geq p_t) = \mathbb{I}(p_{1,j_1} \geq p_t)$. Then

$$\sum_{v\in V} w^v_t\,\xi^v_t[1] = \sum_{j_1=0}^{m_1}\mathbb{I}(p_{1,j_1} \geq p_t)\,g_1(t, b^{j_1})\prod_{i=2}^n\sum_{j_i=0}^{m_i} g_i(t, b^{j_i}).$$

Since $p_t = 0$ if $j_t = 0$, and otherwise $p_t \in (p_{1,j_t-1}, p_{1,j_t}]$, we have that $\mathbb{I}(p_{1,j_1} \geq p_t) = 1$ if and only if $j_1 \geq j_t$. Therefore,

$$\sum_{v\in V} w^v_t\,\xi^v_t[1] = \left(\sum_{j_1=j_t}^{m_1} g_1(t, b^{j_1})\right)\prod_{i=2}^n\sum_{j_i=0}^{m_i} g_i(t, b^{j_i}) = \prod_{i=1}^n\sum_{j_i=\underline m_i}^{m_i} g_i(t, b^{j_i}).$$

Lastly, note that in the initialization phase of Algorithm 2, every decision $b_t$ is computed in $O(1)$ time.
In the Exp4.P subroutine, by Claim C.2 and Claim C.3, every iteration is also computed in polynomial time. Therefore, the theorem statement holds.