# Learning to Price Against a Moving Target

Renato Paes Leme¹, Balasubramanian Sivan¹, Yifeng Teng², Pratik Worah¹

**Abstract.** In the Learning to Price setting, a seller posts prices over time with the goal of maximizing revenue while learning the buyer's valuation. This problem is very well understood when values are stationary (fixed or i.i.d.). Here we study the problem where the buyer's value is a moving target, i.e., it changes over time either by a stochastic process or adversarially with bounded variation. In either case, we provide matching upper and lower bounds on the optimal revenue loss. Since the target is moving, any information learned soon becomes outdated, which forces the algorithms to keep switching between exploring and exploiting phases.

## 1. Introduction

Inspired by applications in electronic commerce, we study a problem where a seller repeatedly interacts with a buyer by setting prices for an item and observing whether the buyer purchases or not. These problems are characterized by two salient features: (i) binary feedback: we only observe whether the buyer purchased or not at the price we posted; (ii) discontinuous loss function: pricing just below the buyer's valuation incurs a small loss, while pricing just above it incurs a large loss since it results in a no-sale.

This problem has been studied under many different assumptions on how the buyer valuation $v_t$ changes over time: values fixed over time and i.i.d. draws each round were studied in (Kleinberg & Leighton, 2003; Devanur et al., 2019; Cesa-Bianchi et al., 2019), deterministic contextual values in (Amin et al., 2014; Cohen et al., 2016; Lobel et al., 2017; Leme & Schneider, 2018; Liu et al., 2021), contextual with parametric noise in (Javanmard & Nazerzadeh, 2019), and contextual with non-parametric noise in (Shah et al., 2019; Krishnamurthy et al., 2020). All those models are stationary in the sense that the buyer's model is i.i.d. across time. The exceptions are algorithms that consider valuations drawn adversarially (Kleinberg & Leighton, 2003), but that work still compares with the best single price in hindsight; i.e., even though the buyer model is non-stationary, the benchmark still is. Our main goal in this paper is to explore settings where both the buyer model and the benchmark are non-stationary. We compare our revenue with the first-best benchmark, namely the sum of the buyer's value at every single step. We will, however, assume that the buyer's valuation moves slowly.

*Equal contribution. ¹Google Research, New York, NY, USA. ²Department of Computer Sciences, University of Wisconsin–Madison, Madison, WI, USA. Correspondence to: Renato Paes Leme, Balasubramanian Sivan, Yifeng Teng, Pratik Worah. *Proceedings of the 38th International Conference on Machine Learning*, PMLR 139, 2021. Copyright 2021 by the author(s).

**Motivation.** Our main motivation for this study is online advertising. Display ads are mostly sold through first-price auctions with reserve prices (Paes Leme et al., 2020). In many market segments the auctions are thin, i.e., there is just one buyer, who bids just above the reserve when their value exceeds the reserve (both to pay as little as possible and to avoid revealing their true value) and doesn't bid otherwise. This scenario effectively offers just binary feedback, and also makes the reserve price the only pricing tool (i.e., there is not much auction competition). To see why buyer value changes are typically slow and unknown to the seller: the effective value of a buyer, even for two identical queries, is similar but not exactly the same due to factors such as remaining budget. A common scenario is that a buyer has a spend plan stating a target $\theta_t$ of their daily budget to be spent by time $t$. Bids often become a function of the ratio between the actual spend and the target spend.
The auction platform doesn't know the targets or the bidding formula, but it can use the fact that both the target and the actual spend, and hence the bids, change smoothly over time. Another important motivation is to effectively price buyers who are learning about their own valuation. This is a common setup in finance (Shreve, 2004), where traders constantly acquire new information about the products they are trading and update their valuations accordingly. Our results and techniques are presented in Section 3, after we formally define our model in Section 2.

**Related Work.** Our work sits at the intersection of two lines of work in online learning: online learning for pricing (discussed earlier in the introduction) and online learning with stronger benchmarks, such as tracking regret (Herbster & Warmuth, 2001; Luo & Schapire, 2015), adaptive regret (Hazan & Seshadhri, 2007), strongly adaptive online learning (Daniely et al., 2015) and shifting bandits (Foster et al., 2016; Lykouris et al., 2018). The difficulty in applying this line of work to pricing problems is that even when the valuation $v_t$ changes slightly, the loss function itself changes dramatically for certain prices. Instead, here we exploit the special structure of the revenue loss to obtain better regret bounds.

There is another line of work that studies revenue maximization in the presence of evolving buyer values (Kakade et al., 2013; Pavan et al., 2014; Chawla et al., 2016). While all these works consider the cumulative value over time as the benchmark, there are important differences: the first two papers have full feedback, as they design mechanisms that solicit buyer bids. Chawla et al. aim for simple pricing schemes yielding constant-factor approximations, while we seek to get much closer to the optimum. Moreover, in their model the values evolve only when the buyer purchases the good.

## 2. Model

**General setting.**
A seller repeatedly interacts with a buyer over $T$ time steps, offering to sell an item at each step. The buyer's value is $v_t \in [0,1]$ at time step $t \in [T]$, and it changes across time. The seller has no knowledge of the buyer's value at any time, not even at $t = 1$. At each time step $t$, the seller posts a price $p_t \in [0,1]$ to the buyer, and the buyer purchases the item if and only if she can afford it, i.e., $p_t \le v_t$. The binary signal $\sigma_t = \mathbf{1}\{v_t \ge p_t\} \in \{0,1\}$ of whether the item is sold is the only feedback the seller obtains at each time step. The seller's objective is to maximize his total revenue, $\mathrm{REV} = \sum_{t=1}^{T} p_t \sigma_t$.

**Loss metrics.** The benchmark we compare to is the hindsight-optimal revenue we can get from the buyer, which is the sum of her value across all time periods: $\mathrm{OPT} = \sum_{t=1}^{T} v_t$. The revenue loss at time step $t$ is defined by $\ell_R(p_t, v_t) = v_t - p_t\sigma_t$, and the revenue loss of a price vector $p = (p_1, \ldots, p_T)$ for value profile $v = (v_1, \ldots, v_T)$ is defined by

$$\ell_R(p, v) = \frac{1}{T}(\mathrm{OPT} - \mathrm{REV}) = \frac{1}{T}\sum_{t=1}^{T} (v_t - p_t\sigma_t).$$

The symmetric loss at time step $t$ is defined by $\ell_1(p_t, v_t) = |v_t - p_t|$, and the total symmetric loss of a price vector $p$ for value profile $v$ is defined by

$$\ell_1(p, v) = \frac{1}{T}\sum_{t=1}^{T} |v_t - p_t|.$$

Intuitively, the revenue loss measures how much revenue we lose compared to a hindsight-optimal selling strategy, while the symmetric loss measures how close our price guesses are to the buyer's true value vector.

## 3. Our Results and Techniques

Next we describe our main results in this model. Our results are given in terms of the changing rate, which we define as follows: consider a sequence of buyer valuations $v_1, v_2, \ldots, v_T$ and a sequence $\epsilon_1, \ldots, \epsilon_{T-1}$ (all $\epsilon_t \le 1$). We say that a sequence of buyer valuations $\{v_t\}_{t=1..T}$ has changing rate $\{\epsilon_t\}_{t=1..T-1}$ whenever

$$|v_{t+1} - v_t| \le \epsilon_t. \tag{1}$$

The average changing rate is $\bar\epsilon = \frac{1}{T-1}\sum_{t=1}^{T-1} \epsilon_t$.
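The two loss metrics can be sketched directly from the definitions above (a minimal sketch; the function names are mine, not the paper's):

```python
def revenue(prices, values):
    # Buyer purchases iff p_t <= v_t; the seller collects p_t on each sale.
    return sum(p for p, v in zip(prices, values) if p <= v)

def revenue_loss(prices, values):
    # Average gap between the first-best benchmark OPT = sum(v_t)
    # and the realized revenue.
    opt = sum(values)
    return (opt - revenue(prices, values)) / len(values)

def symmetric_loss(prices, values):
    # Average distance between posted prices and true values.
    return sum(abs(v - p) for p, v in zip(prices, values)) / len(values)
```

Note the asymmetry the introduction emphasizes: pricing $0.01$ below a value loses $0.01$ of revenue, while pricing $0.01$ above it loses the entire value.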
Our guarantees are a function of the average changing rate $\bar\epsilon$, but our algorithms are agnostic to both the changing-rate sequence and its average (except in the warmup, Section 4). We consider the problem in two environments:

1. **Adversarial:** an adaptive adversary chooses the $v_t$'s.
2. **Stochastic:** an adaptive adversary chooses a mean-zero distribution supported in $[-\epsilon_t, \epsilon_t]$ from which $v_{t+1} - v_t$ is drawn, so that $v_t$ is a bounded martingale. More generally, our results hold for any stochastic process satisfying Azuma's inequality.

We analyze three settings in total: symmetric loss in the adversarial environment, and revenue loss in both the adversarial and stochastic environments. In each of the three settings, we design our algorithms gradually, starting from the simple case where the changing rate is fixed and known (warmup, Section 4), then fixed and unknown (Section 5), and finally dynamic and unknown (Section 6).

**Dynamic, non-increasing changing rates.** In Section 6, where we study dynamic and unknown $\epsilon_t$, we focus primarily on the case where the changing rates $\epsilon_t$ are non-increasing over time. This is motivated by situations where buyers use a learning algorithm with non-increasing learning rates (quite common) to determine their value, or use a controller to bid that stabilizes once the value approaches the ground truth. For symmetric loss alone (Theorem 6.4), we give guarantees for any sequence $\epsilon_t$ (i.e., not just non-increasing), but we incur an additional $\log(T)$ factor in the loss.

**Our results.** For symmetric loss, we develop an algorithm with average loss $\tilde O(\bar\epsilon)$ (Theorem 6.1), and a slightly larger loss of $\tilde O(\bar\epsilon \log T)$ when the $\epsilon_t$'s are not necessarily non-increasing (Theorem 6.4). For revenue loss we show average loss of $\tilde O(\bar\epsilon^{1/2})$ in the adversarial setting (Theorem 6.2) and $\tilde O(\bar\epsilon^{2/3})$ in the stochastic setting (Theorem 6.3). Throughout, we use $\tilde O(f)$ to denote $O(f)\cdot\mathrm{polylog}(1/\bar\epsilon) + o(1)$, where the $o(1)$ is with respect to $T$.
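The stochastic environment can be instantiated with a minimal generator. The uniform increment distribution and the clipping to $[0,1]$ are my simplifying choices, not from the paper; clipping can only shrink a step, so the changing-rate bound (1) is preserved:

```python
import random

def martingale_values(T, eps, v1=0.5, rng=None):
    """Sample a value path for the stochastic environment: each increment
    v_{t+1} - v_t is mean-zero and supported in [-eps_t, eps_t] (uniform here),
    then clipped so v_t stays in [0, 1]."""
    rng = rng or random.Random(0)
    vs = [v1]
    for t in range(T - 1):
        e = eps[t] if isinstance(eps, list) else eps  # fixed or per-step rate
        v = vs[-1] + rng.uniform(-e, e)
        vs.append(min(1.0, max(0.0, v)))  # clip toward the interior
    return vs
```

(At the boundary the clipped increment is no longer exactly mean-zero; a faithful simulation would keep $v_1$ away from $0$ and $1$.)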
Surprisingly, our loss bounds for none of the three settings in the general case of dynamic and unknown changing rates (Section 6) can be improved even if the changing rate is fixed and known (Section 4, Theorems 4.1, 4.2, 4.3). That is, in our fairly general model, knowing the rate of change of the valuation, or having the rate of change be fixed, does not provide much help in learning the valuation function to profitably price against it!

**Our techniques.**

*Step 1: fixed and known $\epsilon$ (Section 4).* Here, the algorithm keeps in each period a confidence interval $[\ell_t, r_t]$ containing $v_t$. If $r_t - \ell_t$ is small, we price at the lower endpoint $\ell_t$, resulting in a larger interval $[\ell_{t+1}, r_{t+1}] = [\ell_t - \epsilon, r_t + \epsilon]$ in the next iteration. Once the interval is large enough, we shrink it by a binary search and restart the process. The algorithm thus keeps alternating between exploring and exploiting, with the length of the interval deciding when we do which.

*Step 2: fixed and unknown $\epsilon$ (Section 5).* Here, we start by guessing a very small value of $\epsilon$, say $\hat\epsilon = 1/T$, and behave as if we knew $\epsilon$. If we detect any inconsistency with our assumption, we double our guess: $\hat\epsilon \leftarrow 2\hat\epsilon$. It is easy to detect when the value is below our confidence interval (since we always price at the lower endpoint) but not so easy to detect when it is above. To address this issue, we randomly select certain rounds (with probability proportional to a power of our estimate $\hat\epsilon$) to be checking rounds. In those rounds, we price at the higher end of the interval.

*Step 3: dynamic, non-increasing and unknown $\epsilon_t$ (Section 6).* We again keep an estimate $\hat\epsilon$ as in Step 2, but now we adjust it in both directions. We increase $\hat\epsilon$ as in the previous step if we observe anything inconsistent with the guess. In the other direction, we optimistically halve the guess ($\hat\epsilon \leftarrow \hat\epsilon/2$) after we spend $1/\hat\epsilon$ periods with the same guess.

**Comment on average versus total loss.**
Our results are all framed in terms of the average loss instead of the total loss, thus suppressing the factor $T$. To see why, note that even if we knew $v_{t-1}$ exactly for each $t$, and even if values evolved stochastically, we would already incur an $O(\epsilon)$ loss per step, leading to a total regret of $O(T\epsilon)$. Consequently, the main question is not the dependence on $T$ (which is necessarily linear and cannot become any worse), but how much worse the per-step dependence on $\epsilon$ gets.

To see this in action, it is instructive to compare our algorithms with the algorithm in (Kleinberg & Leighton, 2003), which has sublinear loss with respect to a fixed-price benchmark. For a fixed $\epsilon$, consider the periodic sequence $v_t$ that starts at zero, increases by $\epsilon$ each period until reaching $1$, then decreases by $\epsilon$ each period until reaching $0$, and starts climbing again. The first-best benchmark is $\frac{T}{2} + O(\epsilon)$, while the best fixed-price benchmark is $\frac{T}{4} + O(\epsilon)$. Our algorithm guarantees total revenue $T(\frac{1}{2} - O(\sqrt\epsilon))$, while Kleinberg & Leighton guarantee a revenue of $T(\frac14 - o(1))$. In the supplementary material we show that for this example their algorithm indeed suffers a total loss of $\Omega(T)$ with respect to the first-best benchmark, while our algorithms suffer $T \cdot O(\sqrt\epsilon)$, i.e., a much better dependence on $\epsilon$.

## 4. Warmup: Buyer's Changing Rate is Fixed, Known

We begin by studying the symmetric loss in the adversarial environment. The result is straightforward via a binary-search algorithm that keeps track of a confidence interval $[\ell_t, r_t]$ containing the true value $v_t$ at each time step.

**Theorem 4.1.** *If the buyer has adversarial values and a fixed changing rate $\epsilon$ that is known to the seller, Algorithm 1 achieves symmetric loss $O(\epsilon)$. Further, no online algorithm can obtain symmetric loss less than $\Omega(\epsilon)$.*
**Algorithm 1** (Symmetric-loss minimizing algorithm for an adversarial buyer with known changing rate $\epsilon$):

```text
ℓ₁ ← 0;  r₁ ← 1
for each time step t do
    price at p_t ← (r_t + ℓ_t) / 2
    if the current value v_t < p_t then
        ℓ_{t+1} ← max(0, ℓ_t − ε);  r_{t+1} ← min(1, p_t + ε)
    else
        ℓ_{t+1} ← max(0, p_t − ε);  r_{t+1} ← min(1, r_t + ε)
    end if
end for
```

Next, we extend the algorithm for symmetric loss to an optimal algorithm for revenue loss.

**Theorem 4.2.** *If the buyer has adversarial values and a fixed changing rate $\epsilon$ that is known to the seller, there exists an online pricing algorithm with revenue loss $O(\epsilon^{1/2})$. Further, no online algorithm can obtain a revenue loss less than $\Omega(\epsilon^{1/2})$.*

*Proof.* We assume that $\epsilon \ge \frac{1}{T}$; otherwise, we can run the algorithm with $\epsilon' = \frac{1}{T}$ and get revenue loss $O((\epsilon')^{1/2}) = o(1) = O(\epsilon^{1/2})$.

The idea of the algorithm is as follows. The sale process starts with the seller not knowing the buyer's initial value $v_1$. However, since the buyer's value changes by at most $\epsilon$ in each step, the seller can quickly locate the buyer's value within $O(\epsilon)$ error by binary search. Such a small confidence interval containing the buyer's current value extends by $\epsilon$ at both its upper and lower bound after each step. We propose an algorithm that repeatedly prices at the bottom of the confidence interval and re-locates the buyer's current value whenever the confidence interval becomes too wide.

First, we have the following building-block algorithm that quickly returns the range of the buyer's value with additive accuracy $O(\epsilon)$. In the algorithm, the timestamp $t$ increases by one after each pricing query.

**Algorithm 2** (Locating the value of a buyer with changing rate $\epsilon$ to an interval of length $a \le 6\epsilon$):

```text
Input: current confidence interval [ℓ_t, r_t]
Run Algorithm 1 until the confidence interval [ℓ_t, r_t] satisfies r_t − ℓ_t < 4ε
Return [ℓ_t − ε, r_t + ε]
```

The algorithm repeatedly does binary search until the confidence interval has length $O(\epsilon)$.
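The tracking loop of Algorithm 1 can be sketched in runnable form (a minimal sketch; the function name is mine):

```python
def track_value(values, eps):
    """Algorithm 1 sketch: keep a confidence interval [lo, hi] for v_t,
    price at its midpoint, then halve toward the observed side and pad
    both ends by the known changing rate eps."""
    lo, hi = 0.0, 1.0
    prices = []
    for v in values:
        p = (lo + hi) / 2
        prices.append(p)
        if v < p:   # no sale: value lies below the midpoint
            lo, hi = max(0.0, lo - eps), min(1.0, p + eps)
        else:       # sale: value lies at or above the midpoint
            lo, hi = max(0.0, p - eps), min(1.0, hi + eps)
    return prices
```

For a slowly moving value, the interval width contracts toward $\Theta(\epsilon)$ (the recursion $w \mapsto w/2 + 2\epsilon$ has fixed point $4\epsilon$), so the midpoint prices track $v_t$ within $O(\epsilon)$.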
If at the beginning we have a confidence interval of length $b$ and at the end a confidence interval of length $a$, then the total number of steps needed is $O(\log\frac{b}{a})$, and the total loss of the algorithm is $O(\log\frac{b}{a})$, since the loss of each step is at most $1$.

Next, we present the full pricing algorithm for an adversarial buyer with changing rate $\epsilon$, using Algorithm 2 as a building block.

**Algorithm 3** (Revenue-loss minimizing algorithm for an adversarial buyer with known changing rate $\epsilon$):

```text
for each phase [t₀ + 1, t₀ + m] of length m = ε^{−1/2} do
    apply Algorithm 2 to locate the buyer's current value in an interval [ℓ_t, r_t] of length O(ε)
    for each remaining step t in the phase do
        price at p_t ← ℓ_t
        [ℓ_{t+1}, r_{t+1}] ← [ℓ_t − ε, r_t + ε]
    end for
end for
```

Now we analyze the revenue loss of the algorithm. In each phase of $m$ time steps, we first locate the buyer's value to a small range of $O(\epsilon)$. The revenue loss incurred by Algorithm 2 is $O(\log\frac1\epsilon) = \tilde O(1)$ in each phase (in fact $O(1)$ after the first phase, since the length of the confidence interval only needs to shrink by a constant fraction). Then we price at the bottom of the confidence interval for the remaining $\Theta(\epsilon^{-1/2})$ steps of the phase. Since the confidence interval $[\ell_t, r_t]$ expands by $2\epsilon$ each step, this is equivalent to saying that we wait until the confidence interval has length $3\sqrt\epsilon$. The revenue loss of each such step is at most $3\sqrt\epsilon$, since the buyer buys the item every time, giving at most $\epsilon^{-1/2}\cdot 3\sqrt\epsilon = O(1)$ over the entire phase; then we use binary search to shrink the confidence interval by a factor of $\frac12$, which has revenue loss $O(1)$ since it takes only $O(1)$ steps of Algorithm 2. Each phase thus takes $\Theta(\epsilon^{-1/2})$ steps with loss $O(1)$; therefore there are $O(T\epsilon^{1/2})$ phases in total, with loss $O(T\epsilon^{1/2})\cdot O(1) = O(T\epsilon^{1/2})$. Thus the total loss of the algorithm is $O(T\epsilon^{1/2})$, i.e., average revenue loss $O(\epsilon^{1/2})$.

We now show that this loss is tight: no online algorithm can obtain revenue loss $o(\epsilon^{1/2})$.
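A runnable sketch of Algorithm 3's explore/exploit alternation, with two simplifications of my own: re-location is triggered by the interval width reaching $3\sqrt\epsilon$ rather than by a fixed phase length, and the binary search stops at width $8\epsilon$ instead of $4\epsilon$ (the width recursion $w \mapsto w/2 + 2\epsilon$ has fixed point $4\epsilon$, so a strict $4\epsilon$ test may never fire):

```python
import math

def price_in_phases(values, eps):
    """Algorithm 3 sketch (known fixed eps): binary-search while the
    confidence interval is wide, otherwise sell at its lower endpoint
    while the interval drifts open by eps on each side per step."""
    lo, hi = 0.0, 1.0
    relocate_at = 3 * math.sqrt(eps)  # width at which a new phase begins
    locating, prices = True, []
    for v in values:
        p = (lo + hi) / 2 if locating else lo
        prices.append(p)
        if locating:
            # Algorithm 1/2 step: halve toward the observed side, pad by eps.
            if v < p:
                lo, hi = max(0.0, lo - eps), min(1.0, p + eps)
            else:
                lo, hi = max(0.0, p - eps), min(1.0, hi + eps)
            if hi - lo < 8 * eps:
                locating = False
        else:
            # Exploit: p = lo always sells while v_t stays in [lo, hi].
            lo, hi = max(0.0, lo - eps), min(1.0, hi + eps)
            if hi - lo >= relocate_at:
                locating = True
    return prices
```

During exploitation the per-step loss is at most the interval width $O(\sqrt\epsilon)$, matching the $O(\epsilon^{1/2})$ average revenue loss of the analysis above.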
By Yao's minimax principle, to show that there exists an adversarial buyer against which no randomized online algorithm can get revenue loss $o(\epsilon^{1/2})$, it suffices to show that there exists a random adversarial buyer such that no deterministic online algorithm can get such low revenue loss. The buyer's value sequence is predetermined as follows. In the beginning, the buyer has value $v_0 = \frac12$. The entire time horizon $[T]$ is partitioned into $T\epsilon^{1/2}$ phases, each of length $\epsilon^{-1/2}$. In each phase starting at time $t_0+1$ and ending at time $t_0+\epsilon^{-1/2}$, the buyer's values form a monotone sequence: with probability $\frac12$, $v_{t_0+j} = v_{t_0} + j\epsilon$ for $j \in [\epsilon^{-1/2}]$; with probability $\frac12$, $v_{t_0+j} = v_{t_0} - j\epsilon$ for $j \in [\epsilon^{-1/2}]$.

For any deterministic algorithm with price $p_t$ at each time, when a phase begins the algorithm must decide its pricing strategy without knowing which value instance of the buyer will be realized. Let $\hat t \in [t_0+1, t_0+\epsilon^{-1/2}]$ be the first time step in the phase such that $v_{t_0} - (\hat t - t_0)\epsilon < p_{\hat t}$. If such a $\hat t$ exists, then the revenue loss at time $\hat t$ is $v_{t_0} - (\hat t - t_0)\epsilon = v_{t_0} - O(\epsilon^{1/2})$ when the buyer's values decrease in the phase, and this case happens with probability $\frac12$. Then in expectation the revenue loss in this phase is $\Omega(v_{t_0}) - O(\epsilon^{1/2})$. Notice that the starting values $v_{t_0}$ of the phases form an unbiased random-walk sequence with step size $\epsilon^{1/2}$; since the buyer starts with $v_0 = \frac12$, with constant probability $v_{t_0} \ge \frac12$. Thus the expected revenue loss in the phase is $\Omega(1)$. If such a $\hat t$ does not exist, the algorithm receives identical feedback from both instances of the buyer throughout the phase, since the buyer can always afford to purchase the item.
As $p_t \le v_{t_0} - (t - t_0)\epsilon$ for each time $t$ in the phase, the revenue loss of the algorithm when the buyer's values increase in the phase is at least $\sum_{j=1}^{\epsilon^{-1/2}} \big((v_{t_0} + j\epsilon) - (v_{t_0} - j\epsilon)\big) = \sum_{j=1}^{\epsilon^{-1/2}} 2j\epsilon = \Omega(1)$. In either case each phase contributes expected revenue loss $\Omega(1)$; summing over the $T\epsilon^{1/2}$ phases gives total revenue loss $\Omega(T\epsilon^{1/2})$, i.e., average revenue loss $\Omega(\epsilon^{1/2})$. ∎

## 5. Buyer's Changing Rate is Fixed, Unknown

**Algorithm 5** (Symmetric-loss minimizing algorithm for an adversarial buyer with unknown changing rate $\epsilon$):

```text
[ℓ₁, r₁] ← [0, 1];  ε̂ ← 1/T
while ε̂ < 1/2 do
    for each three consecutive time steps t, t+1, t+2 do
        set price p_t ← ℓ_t
        set price p_{t+1} ← r_t + ε̂
        set price p_{t+2} ← (ℓ_t + r_t) / 2
        if the seller finds v_t < ℓ_t or v_{t+1} ≥ r_t + ε̂ then
            ε̂ ← 2ε̂;  break
        end if
        if v_{t+2} < p_{t+2} then
            ℓ_{t+3} ← ℓ_t − 3ε̂;  r_{t+3} ← p_{t+2} + ε̂
        else
            ℓ_{t+3} ← p_{t+2} − ε̂;  r_{t+3} ← r_t + 3ε̂
        end if
    end for
end while
```

Once $\hat\epsilon \ge \epsilon$, the invariant $v_t \in [\ell_t, r_t]$ always holds. Time steps $t$ and $t+1$ are "checking steps," used to check whether the current three steps $v_t$, $v_{t+1}$, $v_{t+2}$ incur too much loss. In particular, if $\hat\epsilon \ge \epsilon$, then $v_t \ge p_t$ and $v_{t+1} < p_{t+1}$ will always hold, and the symmetric loss is $O(\epsilon)$ per step by Theorem 4.1. On the other hand, in an iteration with $\hat\epsilon < \epsilon$, if $v_t \ge p_t = \ell_t$ and $v_{t+1} < p_{t+1} = r_t + \hat\epsilon$ hold, then $v_t, v_{t+1}, v_{t+2} \in [\ell_t - O(\epsilon), r_t + O(\epsilon)]$, since the buyer's value changes by at most $\epsilon$ per step. Thus in a round of three consecutive time steps $t, t+1, t+2$ with $v_t \ge p_t$ and $v_{t+1} < p_{t+1}$, the symmetric loss from the round is $O(r_t - \ell_t + \epsilon)$. Notice that there are only $O(\log T)$ different guesses $\hat\epsilon < \epsilon$. For each $\hat\epsilon$ the initialization has $O(1)$ symmetric loss, while the stable length of the confidence interval is $r_t - \ell_t = O(\hat\epsilon) = O(\epsilon)$; thus the average symmetric loss of each step is $O(\epsilon)$.

Using a similar checking-steps technique as for the symmetric loss, we can obtain the same revenue loss as in the setting where the seller knows the changing rate, whether the buyer has adversarial or stochastic values.

**Theorem 5.2.** *If the buyer has adversarial values and a fixed changing rate $\epsilon$ unknown to the seller, there exists an online pricing algorithm with revenue loss $\tilde O(\epsilon^{1/2})$. Further, no online algorithm can obtain a revenue loss less than $\Omega(\epsilon^{1/2})$.*

*Proof.*
The tightness result $\Omega(\epsilon^{1/2})$ is shown in Theorem 4.2, so we only need to construct an algorithm with revenue loss $\tilde O(\epsilon^{1/2})$. We want to apply the same checking-steps technique as in the pricing algorithm for symmetric loss. Recall that the checking steps in Algorithm 5 repeatedly price at the upper and lower bounds of the current confidence interval for the buyer's value, to make sure the value is not too far away from the confidence bounds. However, when minimizing revenue loss, the seller cannot afford to check the upper bound of the confidence interval frequently, since each such check is very likely to make the buyer reject the item and incur a large revenue loss. The solution is to add one random checking step to each phase of Algorithm 3: by pricing at the lower bound $\ell_t$ of the confidence interval at a normal time step $t$, and at the upper bound $r_{t+1}$ at a checking step $t+1$, the seller can tell whether the buyer's value has drifted far from the confidence interval maintained for it. Thus, for a phase with $m$ time steps of which $k$ are bad steps (steps where the buyer's value is far from the confidence interval), a random checking step detects a bad step with probability $\frac{k}{m}$. This means that after $O(m)$ bad steps in expectation, a bad step is detected and the algorithm moves to the next iteration with $\hat\epsilon$ doubled. The algorithm is stated as follows.
**Algorithm 6** (Revenue-loss minimizing algorithm for an adversarial buyer with unknown changing rate $\epsilon$):

```text
ε̂ ← 1/T
while ε̂ < 1/2 do
    for each phase [t₀ + 1, t₀ + m_ε̂] of length m_ε̂ = ε̂^{−1/2} do
        apply Algorithm 2 to locate the buyer's current value in an interval [ℓ_t, r_t] of length O(ε̂)
        randomly select a checking step t* ∈ [t₀ + 1, t₀ + m_ε̂]
        for each step t in the phase do
            if t ≠ t* then
                price at p_t ← ℓ_t
                if v_t < p_t then ε̂ ← 2ε̂ and break (terminate the phase)
            else
                price at p_t ← r_t
                if v_t ≥ p_t then ε̂ ← 2ε̂ and break (terminate the phase)
            end if
            [ℓ_{t+1}, r_{t+1}] ← [ℓ_t − ε̂, r_t + ε̂]
        end for
    end for
end while
```

The algorithm first guesses $\hat\epsilon = \frac1T$, and doubles $\hat\epsilon$ whenever it detects a piece of evidence of $\hat\epsilon$ being smaller than the true $\epsilon$. In particular, the algorithm maintains a confidence interval $[\ell_t, r_t]$ that should contain the buyer's current value $v_t$. When $\hat\epsilon$ first becomes larger than $\epsilon$ (and thus is at most $2\epsilon$), the algorithm runs smoothly without triggering any break statement, since $v_t \in [\ell_t, r_t]$ always holds. The revenue loss is then $O(\sqrt{\hat\epsilon}) = O(\sqrt\epsilon)$ per step, as analyzed in Theorem 4.2.

In an iteration with $\hat\epsilon < \epsilon$, there is first an additional $O(\log\frac{1}{\hat\epsilon}) = \tilde O(1)$ loss for the binary-search initialization of the confidence interval, compared to Algorithm 3 in the fixed-$\epsilon$ setting. Let the bad event $E_t$ denote $v_t \ge r_t + 2\epsilon$ or $v_t < \ell_t$. If bad events never happen, the additional loss is at most $2\epsilon$ per step (thus $O(T\epsilon)$ in total), since the analysis of Algorithm 3 uses the confidence interval $[\ell_t, r_t]$ rather than $[\ell_t, r_t + 2\epsilon]$ as here. In a phase with time steps $[t_0+1, t_0+m_{\hat\epsilon}]$, if an event $E_t$ happens, there are two cases. If $v_t < \ell_t$, it is detected when $p_t = \ell_t$, which happens at every non-checking step. If $v_t > r_t + 2\epsilon$, it is detected when $p_{t+1} = r_{t+1}$, i.e., when $t+1 = t^*$, since $v_{t+1} \ge v_t - \epsilon > r_t + \epsilon \ge r_{t+1}$. Therefore, since $t^*$ is selected uniformly at random from $[t_0+1, t_0+m_{\hat\epsilon}]$, if $k$ of the events $E_{t_0+1}, \ldots, E_{t_0+m_{\hat\epsilon}}$ happen, a bad event gets detected with probability $\frac{k}{m_{\hat\epsilon}}$.
Thus, in an iteration with fixed $\hat\epsilon < \epsilon$, by the time a bad event is detected, in expectation $m_{\hat\epsilon}$ bad events have occurred. Each bad event results in additional revenue loss at most $1$, thus $O(m_{\hat\epsilon}) = O(\hat\epsilon^{-1/2})$ per iteration. The total contribution of the bad events to the revenue loss is at most $\sum_{i=0}^{\log T} 2^{-i/2}\cdot O(T^{1/2}) = O(T^{1/2}) = T\cdot o(1)$.

To summarize, the total revenue loss of the algorithm is: $O(T\epsilon^{1/2})$ from Algorithm 3 with known $\epsilon$; plus the binary-search cost $\tilde O(1)$ for locating the position of $v_t$ in each iteration with a different $\hat\epsilon$; plus the additional cost $O(T\epsilon)$ for having a slightly larger confidence interval than Algorithm 3 in good events; plus a total revenue loss of $O(T^{1/2})$ from the bad events. Summing up all the costs above, the total revenue loss of Algorithm 6 is $T\cdot\tilde O(\epsilon^{1/2})$ over all time steps, thus $\tilde O(\epsilon^{1/2})$ on average. ∎

When the buyer has stochastic values, we can modify Algorithm 6 so that each phase is replaced by a phase of Algorithm 4, with each normal pricing step $t \ne t^*$ pricing at $p_t = \ell_{t_0} - O(\hat\epsilon^{1/3})$ and each checking step $t^*$ pricing at $p_{t^*} = r_{t_0} + O(\hat\epsilon^{1/3})$. The analysis is almost identical to that of Theorem 5.2.

**Theorem 5.3.** *If the buyer has stochastic values and a fixed changing rate $\epsilon$ unknown to the seller, there exists an online pricing algorithm with revenue loss $\tilde O(\epsilon^{2/3})$. Further, no online algorithm can obtain a revenue loss less than $\Omega(\epsilon^{2/3})$.*

## 6. Buyer's Changing Rate is Dynamic, Unknown

In this section, we study a more complicated setting where the buyer's value changes in a more dynamic way. In particular, $|v_{t+1} - v_t|$ is upper bounded by possibly different non-increasing rates $\epsilon_t$ that are unknown to the seller. For the symmetric-loss minimization problem with an adversarial buyer, we show that when the $\epsilon_t$ are non-increasing, the seller can still achieve a symmetric loss of $\tilde O(\bar\epsilon)$, as in the case of fixed $\epsilon$. The algorithm and its analysis have the same structure as those for the revenue-loss minimization problem and are omitted here.

**Theorem 6.1.**
*If the buyer has adversarial values and non-increasing changing rates $\epsilon_t$ unknown to the seller, then there exists an online pricing algorithm with symmetric loss $\tilde O(\bar\epsilon)$. Furthermore, no online algorithm can obtain a symmetric loss less than $\Omega(\bar\epsilon)$.*

For the revenue-loss minimization problem with an adversarial buyer, we can also recover the results of the previous sections, even when the $\epsilon_t$ are unknown to the seller. We describe the result and the algorithm in detail here.

**Theorem 6.2.** *If the buyer has adversarial values and non-increasing changing rates $\epsilon_t$ unknown to the seller, there exists an online pricing algorithm with revenue loss $\tilde O(\bar\epsilon^{1/2})$, where $\bar\epsilon = \frac1T\sum_{t=1}^{T}\epsilon_t$. Further, no online algorithm can obtain a revenue loss less than $\Omega(\bar\epsilon^{1/2})$.*

*Proof.* The $\Omega(\bar\epsilon^{1/2})$ tightness result was shown in Theorem 4.2 with all $\epsilon_t$ identical. Now we show that there exists an algorithm with revenue loss $\tilde O(\bar\epsilon^{1/2})$. When $\epsilon_t$ decreases, the seller needs to detect the trend in a timely manner; otherwise the loss per time step will not be comparable to $\sum_t \epsilon_t$. We propose the following algorithm for the seller, which repeatedly guesses the current level of the changing rate at each time step. The algorithm starts with the guess $\hat\epsilon = \frac12$ as an estimate of $\epsilon_t$, and reduces the guess $\hat\epsilon$ by a factor of $\frac12$ if over several time steps it cannot find any evidence supporting $\hat\epsilon < \epsilon_t$. Whenever the algorithm finds evidence supporting $\hat\epsilon < \epsilon_t$, it repeatedly doubles $\hat\epsilon$ and updates the confidence interval according to the new $\hat\epsilon$, until the evidence of $\hat\epsilon < \epsilon_t$ disappears. This dynamic update of $\hat\epsilon$ keeps the revenue loss bounded. To be more specific, $\hat\epsilon$ decreases by a factor of $\frac12$ when the seller has not observed any evidence of $\hat\epsilon < \epsilon_t$ for long enough: the algorithm tries to run $\hat\epsilon^{-1/2}$ phases identical to those of Algorithm 6, and halves $\hat\epsilon$ when the buyer passes all checking steps. The algorithm is described in Algorithm 7.
**Algorithm 7** (Revenue-loss minimizing algorithm for an adversarial buyer with unknown non-increasing changing rates $\epsilon_t$):

```text
ε̂ ← 1/2
while true do
    for ε̂^{−1/2} phases of length m_ε̂ = ε̂^{−1/2} do
        at the beginning of phase [t₀ + 1, t₀ + m_ε̂], apply Algorithm 2 to locate the
            buyer's current value in an interval [ℓ_t, r_t] of length O(ε̂)
        randomly select a checking step t* ∈ [t₀ + 1, t₀ + m_ε̂]
        for each step t in the phase do
            if t ≠ t* then
                price at p_t ← ℓ_t
                if v_t < p_t then ε̂ ← 2ε̂ and go back to the beginning of the
                    while loop (terminate the ε̂^{−1/2} phases)
            else
                price at p_t ← r_t
                if v_t ≥ p_t then ε̂ ← 2ε̂ and go back to the beginning of the
                    while loop (terminate the ε̂^{−1/2} phases)
            end if
            [ℓ_{t+1}, r_{t+1}] ← [ℓ_t − ε̂, r_t + ε̂]
        end for
    end for
    if ε̂ > 1/T then ε̂ ← ε̂/2
end while
```

Now we analyze the performance of the algorithm. Partition the time horizon $[T]$ into $\log T$ intervals $I_1, \ldots, I_{\log T}$, such that for each interval $I_i$ and each $t \in I_i$, $\epsilon_t \in (2^{-i}, 2^{-i+1}]$. Let $\epsilon_i^* = 2^{-i+1}$. We argue that in interval $I_i$ the total revenue loss is $\tilde O\big((\epsilon_i^*)^{-1/2} + |I_i|(\epsilon_i^*)^{1/2}\big)$. In interval $I_i$ the algorithm may start with $\hat\epsilon > \epsilon_i^*$, then gradually decrease $\hat\epsilon$ until it reaches $\epsilon_i^*$, never to exceed $\epsilon_i^*$ again within the interval. In this process, the revenue loss is $\tilde O(1)$ in each phase, as shown in the proof of Theorem 5.2, thus $\tilde O(\hat\epsilon^{-1/2})$ for each value $\hat\epsilon > \epsilon_i^*$, and at most $\sum_{\hat\epsilon > \epsilon_i^*} \tilde O(\hat\epsilon^{-1/2}) = \tilde O\big((\epsilon_i^*)^{-1/2}\big)$ in total.

Now we show that after $\hat\epsilon$ reaches $\epsilon_i^*$, the revenue loss is $O\big((\epsilon_i^*)^{1/2}\big)$ per step on average. First we study the revenue loss of each run of $\hat\epsilon^{-1/2}$ phases with guess $\hat\epsilon$. As in the previous proofs, a piece of bad evidence, i.e., evidence of $\hat\epsilon < \epsilon_t$, is the event $v_t < p_t$ at a non-checking step $t \ne t^*$, or $v_t \ge p_t$ at a checking step $t = t^*$. If no such evidence is detected, the revenue loss is at most some constant $c = O(1)$ in each phase, thus $c\hat\epsilon^{-1/2}$ for the $\hat\epsilon^{-1/2}$ phases spanning $\hat\epsilon^{-1}$ time steps.
We also observe that if we ran the algorithm with changing rate $\epsilon_i^*$, the revenue loss over such $\hat\epsilon^{-1}$ time steps would be $c(\epsilon_i^*)^{1/2}$ per step, thus $c\hat\epsilon^{-1}(\epsilon_i^*)^{1/2}$ in total. We argue that the per-step revenue loss in $I_i$ is at most $2c(\epsilon_i^*)^{1/2}$. In a run of $\hat\epsilon^{-1/2}$ phases where no bad evidence is found, the algorithm actually incurs $c\big(\hat\epsilon^{-1}(2\epsilon_i^*)^{1/2} - \hat\epsilon^{-1/2}\big) > c\hat\epsilon^{-1}(\epsilon_i^*)^{1/2} > c\hat\epsilon^{-1/2}$ less revenue loss than the expected benchmark ($2c(\epsilon_i^*)^{1/2}$ per step). In any phase with estimated changing rate $\hat\epsilon$ where a piece of bad evidence is found, as shown in the analysis of Algorithm 6, in expectation $m_{\hat\epsilon} = \hat\epsilon^{-1/2}$ steps with value outside the confidence bound have occurred, contributing at most $c\hat\epsilon^{-1/2}$ total additional revenue loss beyond the normal $2c(\epsilon_i^*)^{1/2}$ loss per step. Therefore, every time the algorithm goes through $\hat\epsilon^{-1/2}$ phases without bad evidence, it accrues at least $c\hat\epsilon^{-1/2}$ less revenue loss than expected; every time the algorithm with estimated changing rate $\hat\epsilon$ hits a phase with a piece of bad evidence, it accrues at most $c\hat\epsilon^{-1/2}$ more revenue loss than expected. Observe that in each iteration where $\hat\epsilon$ decreases by a factor of $\frac12$, no bad evidence was detected, while bad evidence was found in each iteration where $\hat\epsilon$ got doubled. Thus the number of iterations with no bad evidence detected is at least the number of iterations with bad evidence found, which means the algorithm incurs no more revenue loss than the expected $2c(\epsilon_i^*)^{1/2}$ per step. To summarize, in $I_i$, after $\hat\epsilon$ reaches $\epsilon_i^*$, the revenue loss is $O\big((\epsilon_i^*)^{1/2}\big)$ per step.

The reasoning above shows that in each time interval $I_i$, the total revenue loss is $\tilde O\big((\epsilon_i^*)^{-1/2} + |I_i|(\epsilon_i^*)^{1/2}\big)$. Summing over all $i$, the total revenue loss over all time steps is

$$\sum_{i \le \log T} \tilde O\big((\epsilon_i^*)^{-1/2} + |I_i|(\epsilon_i^*)^{1/2}\big) = \tilde O(T^{1/2}) + O(1)\sum_i |I_i|(\epsilon_i^*)^{1/2} \le T\cdot o(1) + O(1)\sum_i |I_i|\,\bar\epsilon^{1/2} = \tilde O(T\bar\epsilon^{1/2}),$$

where the inequality bounding $\sum_i |I_i|(\epsilon_i^*)^{1/2}$ is by the Cauchy–Schwarz inequality. Thus the average revenue loss per time step is $\tilde O(\bar\epsilon^{1/2})$. ∎
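The $\hat\epsilon$ dynamics of Algorithm 7 (double on bad evidence, halve after a clean run of phases) can be isolated in a small sketch. Here `has_bad_evidence` is a hypothetical callback of my own standing in for one phase's checking steps:

```python
def epsilon_schedule(has_bad_evidence, rounds, T):
    """Guess dynamics of Algorithm 7 only (no pricing): per guess eps_hat,
    run ~eps_hat**-0.5 phases; halve eps_hat if all pass, double as soon
    as one phase reports bad evidence."""
    eps_hat, history = 0.5, []
    for _ in range(rounds):
        clean = True
        for _ in range(max(1, int(eps_hat ** -0.5))):
            if has_bad_evidence(eps_hat):
                eps_hat = min(0.5, 2 * eps_hat)  # evidence that eps_hat < eps_t
                clean = False
                break
        if clean and eps_hat > 1.0 / T:
            eps_hat /= 2                          # optimistic halving
        history.append(eps_hat)
    return history
```

With an oracle that reports bad evidence exactly when the guess undershoots a true rate of $0.01$, the schedule descends geometrically and then oscillates within a factor of $2$ of the true rate, mirroring the halving/doubling pairing used in the proof.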
Such a result can also be extended to a stochastic buyer.

Theorem 6.3. If the buyer has stochastic values and a nonincreasing changing rate ε_t unknown to the seller, then there exists an online pricing algorithm with revenue loss O(ε̄^{2/3}), where ε̄ = (1/T) Σ_{t=1}^T ε_t. Furthermore, no online algorithm can obtain revenue loss less than Ω(ε̄^{2/3}).

For the revenue loss minimization problem, it is hard to obtain positive results when the changing rates ε_t are arbitrary, since setting a price slightly higher than the true value in a single step can already result in a huge revenue loss. Surprisingly, for the symmetric loss, even if ε_t can change arbitrarily across time steps, we can still achieve the O(ε̄^{1/2}) loss of the previous sections, losing only a small O(log T) factor.

Theorem 6.4. If the buyer has adversarial values and a dynamic changing rate ε_t unknown to the seller, there exists an online pricing algorithm with symmetric loss O(ε̄^{1/2} log T), where ε̄ = (1/T) Σ_{t ∈ [T]} ε_t. Further, no online algorithm can obtain symmetric loss less than Ω(ε̄^{1/2}).

Proof sketch. Suppose for a moment that the algorithm were allowed to make multiple pricing queries in a single time step. The algorithm maintains a correct confidence interval [ℓ_t, r_t] that contains the value v_t of the buyer at each time step. At time t + 1, the seller does not know the exact value change v_{t+1} − v_t; moreover, she does not even know a bound ε_t ≥ |v_{t+1} − v_t| on the change. However, the seller can price at ℓ_t − δ_j and r_t + δ_j repeatedly for every j ≥ 0, with δ_j = 2^j T^{-1}. Once j has increased enough that the algorithm finds ℓ_t − δ_j ≤ v_{t+1} < r_t + δ_j, the seller can price at (ℓ_t + r_t)/2 to get a new correct confidence interval [ℓ_{t+1}, r_{t+1}] = [ℓ_t − δ_j, (ℓ_t + r_t)/2] or [(ℓ_t + r_t)/2, r_t + δ_j]. Such an algorithm identifies the change at each step accurately and has O(ε̄^{1/2}) symmetric loss.
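Algorithm 8 below serializes exactly this probing scheme. A compact simulation (our own simplified Python rendering, with the buyer's value moving between the serialized probes; the function name and drift model are ours) illustrates how the confidence interval keeps tracking the value:

```python
import math

def algorithm8_sketch(values):
    """Simulation sketch of the serialized probing scheme: in each phase,
    probe lo - d_j and hi + d_j with d_j = 2^j / T at consecutive time
    steps until the (moving) value is bracketed, then halve the bracket
    with one midpoint query. Returns the total symmetric loss sum |p_t - v_t|."""
    T = len(values)
    lo, hi = 0.0, 1.0
    t, sym_loss = 0, 0.0
    while t < T:
        j, bracketed = 0, False
        while t < T and not bracketed:
            d = 2.0 ** j / T
            p_low = lo - d                    # probe below the old interval
            ok_low = values[t] >= p_low       # sale iff v_t >= price
            sym_loss += abs(p_low - values[t])
            t += 1
            if t >= T:
                break
            p_high = hi + d                   # probe above the old interval
            ok_high = values[t] < p_high      # no sale iff v_t < price
            sym_loss += abs(p_high - values[t])
            t += 1
            bracketed = ok_low and ok_high
            j += 1
        if not bracketed or t >= T:
            break
        mid = (lo + hi) / 2                   # one midpoint query
        sym_loss += abs(mid - values[t])
        if values[t] < mid:                   # no sale: value in lower half
            lo, hi = lo - d, mid
        else:                                 # sale: value in upper half
            lo, hi = mid, hi + d
        t += 1
    return sym_loss
```

After a short transient in which the interval shrinks from [0, 1], each phase spends a few steps re-bracketing the drifting value, so the average symmetric loss per step stays on the order of the drift rather than the horizon.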
Algorithm 8 Symmetric-loss minimizing algorithm for adversarial buyer with unknown dynamic changing rate ε_t

[ℓ_1, r_1] ← [0, 1]
Let t_1 + 1 = 1 be the starting time of the first phase
for each phase i over the time interval [t_i + 1, t_{i+1}], with to-be-determined stopping time t_{i+1}, do
  Let t_i + 1 be the starting time step of the phase
  for each integer j ≥ 0 do
    ε̂ ← δ_j = 2^j T^{-1}
    Set price p_{t_i+2j+1} ← ℓ_{t_i+1} − δ_j
    Set price p_{t_i+2j+2} ← r_{t_i+1} + δ_j
    if the seller finds v_{t_i+2j+1} ≥ p_{t_i+2j+1} and v_{t_i+2j+2} < p_{t_i+2j+2} then
      break (from this for loop)
    end if
  end for
  t_{i+1} ← t_i + 2j + 3
  Set p_{t_{i+1}} ← (ℓ_{t_i+1} + r_{t_i+1})/2
  if v_{t_{i+1}} < p_{t_{i+1}} then
    [ℓ_{t_{i+1}+1}, r_{t_{i+1}+1}] ← [ℓ_{t_i+1} − δ_j, p_{t_{i+1}}]
  else
    [ℓ_{t_{i+1}+1}, r_{t_{i+1}+1}] ← [p_{t_{i+1}}, r_{t_i+1} + δ_j]
  end if
end for

However, we are not allowed to make multiple pricing queries against the same value. The key observation is that when we serialize the pricing queries of such an algorithm with at most k queries per step, the symmetric loss increases only by a factor of O(k). Since the parallel algorithm makes at most O(log T) pricing queries in each step, the serialized algorithm's symmetric loss increases only by a factor of O(log T). Another specific example of such serialization is Algorithm 5, which minimizes the symmetric loss in the unknown fixed changing rate setting: every three consecutive steps t, t + 1, t + 2 can be viewed as the serialization of a single step with the three price queries ℓ_t − ε, r_t + ε, and (ℓ_t + r_t)/2. The analysis of the serialized Algorithm 8 for this unknown changing rate ε_t setting can thus be viewed as a generalization of the analysis of Algorithm 5 for the unknown fixed changing rate setting.

Acknowledgements

We thank the reviewers for their valuable feedback. Yifeng Teng was supported in part by NSF grants CCF-1617505 and SHF-1704117. Part of this work was done while Yifeng Teng was an intern at Google.

References

Amin, K., Rostamizadeh, A., and Syed, U. Repeated contextual auctions with strategic buyers.
In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pp. 622-630, 2014.

Cesa-Bianchi, N., Cesari, T., and Perchet, V. Dynamic pricing with finitely many unknown valuations. In Algorithmic Learning Theory, pp. 247-273. PMLR, 2019.

Chawla, S., Devanur, N. R., Karlin, A. R., and Sivan, B. Simple pricing schemes for consumers with evolving values. In Krauthgamer, R. (ed.), Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10-12, 2016, pp. 1476-1490. SIAM, 2016.

Cohen, M. C., Lobel, I., and Paes Leme, R. Feature-based dynamic pricing. In Proceedings of the 2016 ACM Conference on Economics and Computation, pp. 817. ACM, 2016.

Daniely, A., Gonen, A., and Shalev-Shwartz, S. Strongly adaptive online learning. In International Conference on Machine Learning, pp. 1405-1411. PMLR, 2015.

Devanur, N. R., Peres, Y., and Sivan, B. Perfect bayesian equilibria in repeated sales. Games Econ. Behav., 118:570-588, 2019.

Foster, D. J., Li, Z., Lykouris, T., Sridharan, K., and Tardos, E. Learning in games: robustness of fast convergence. In Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 4734-4742, 2016.

Hazan, E. and Seshadhri, C. Adaptive algorithms for online decision problems. In Electronic Colloquium on Computational Complexity (ECCC), volume 14, 2007.

Herbster, M. and Warmuth, M. K. Tracking the best linear predictor. Journal of Machine Learning Research, 1:281-309, 2001.

Javanmard, A. and Nazerzadeh, H. Dynamic pricing in high-dimensions. Journal of Machine Learning Research, 20(9):1-49, 2019. URL http://jmlr.org/papers/v20/17-357.html.

Kakade, S. M., Lobel, I., and Nazerzadeh, H. Optimal dynamic mechanism design and the virtual-pivot mechanism. Oper. Res., 61(4):837-854, 2013.

Kleinberg, R. and Leighton, T.
The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In 44th Annual IEEE Symposium on Foundations of Computer Science, pp. 594-605. IEEE, 2003.

Krishnamurthy, A., Lykouris, T., Podimata, C., and Schapire, R. Contextual search in the presence of irrational agents. arXiv preprint arXiv:2002.11650, 2020.

Leme, R. P. and Schneider, J. Contextual search via intrinsic volumes. In 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7-9, 2018, pp. 268-282, 2018. doi: 10.1109/FOCS.2018.00034. URL https://doi.org/10.1109/FOCS.2018.00034.

Liu, A., Leme, R. P., and Schneider, J. Optimal contextual pricing and extensions. pp. 1059-1078, 2021. doi: 10.1137/1.9781611976465.66. URL https://epubs.siam.org/doi/abs/10.1137/1.9781611976465.66.

Lobel, I., Leme, R. P., and Vladu, A. Multidimensional binary search for contextual decision-making. Operations Research, 2017.

Luo, H. and Schapire, R. E. Achieving all with no parameters: AdaNormalHedge. In Conference on Learning Theory, pp. 1286-1304. PMLR, 2015.

Lykouris, T., Sridharan, K., and Tardos, E. Small-loss bounds for online learning with partial information. In Conference on Learning Theory, pp. 979-986. PMLR, 2018.

Paes Leme, R., Sivan, B., and Teng, Y. Why do competitive markets converge to first-price auctions? In Proceedings of The Web Conference 2020, pp. 596-605, 2020.

Pavan, A., Segal, I., and Toikka, J. Dynamic mechanism design: A myersonian approach. Econometrica, 82(2):601-653, 2014. doi: https://doi.org/10.3982/ECTA10269. URL https://onlinelibrary.wiley.com/doi/abs/10.3982/ECTA10269.

Shah, V., Johari, R., and Blanchet, J. Semiparametric dynamic contextual pricing. In Advances in Neural Information Processing Systems 32, pp. 2363-2373, 2019. URL https://proceedings.neurips.cc/paper/2019/file/363763e5c3dc3a68b399058c34aecf2c-Paper.pdf.

Shreve, S. E. Stochastic calculus for finance II: Continuous-time models, volume 11.
Springer Science & Business Media, 2004.