# dynamic_budget_throttling_in_repeated_secondprice_auctions__84547aa6.pdf

Dynamic Budget Throttling in Repeated Second-Price Auctions

Zhaohua Chen*1, Chang Wang*2, Qian Wang*1, Yuqi Pan3, Zhuming Shi4, Zheng Cai5, Yukun Ren5, Zhihua Zhu5, Xiaotie Deng1,6

1 CFCS, School of Computer Science, Peking University 2 Northwestern University 3 School of Electronics Engineering and Computer Science, Peking University 4 Stony Brook University 5 Tencent Technology (Shenzhen) Co., Ltd. 6 CMAR, Institute for Artiﬁcial Intelligence, Peking University chenzhaohua@pku.edu.cn, wc@u.northwestern.edu, charlie@pku.edu.cn, pyq0419@stu.pku.edu.cn, zhuming.shi@stonybrook.edu, {zhengcai, rickyren, zhihuazhu}@tencent.com, xiaotie@pku.edu.cn

In today s online advertising markets, a crucial requirement for an advertiser is to control her total expenditure within a time horizon under some budget. Among various budget control methods, throttling has emerged as a popular choice, managing an advertiser s total expenditure by selecting only a subset of auctions to participate in. This paper provides a theoretical panorama of a single advertiser s dynamic budget throttling process in repeated second-price auctions. We ﬁrst establish a lower bound on the regret and an upper bound on the asymptotic competitive ratio for any throttling algorithm, respectively, when the advertiser s values are stochastic and adversarial. Regarding the algorithmic side, we propose the OGD-CB algorithm, which guarantees a near-optimal expected regret with stochastic values. On the other hand, when values are adversarial, we prove that this algorithm also reaches the upper bound on the asymptotic competitive ratio. We further compare throttling with pacing, another widely adopted budget control method, in repeated second-price auctions. In the stochastic case, we demonstrate that pacing is generally superior to throttling for the advertiser, supporting the well-known result that pacing is asymptotically optimal in this scenario. However, in the adversarial case, we give an exciting result indicating that throttling is also an asymptotically optimal dynamic bidding strategy. Our results bridge the gaps in theoretical research of throttling in repeated auctions and comprehensively reveal the ability of this popular budget-smoothing strategy.

Introduction

In recent years, the online advertising market has experienced signiﬁcant growth, driven by the rise of new social media platforms such as short videos. When a user submits an ad query to the market, an auction is held among all interested advertisers, and the winner is awarded the opportunity to display their ad. Owing to the vast market volume, it is common for advertisers to set a budget to regulate their expenditure over a speciﬁed period. Correspondingly, advertising platforms

*These authors contributed equally. Copyright 2024, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

provide advertisers with various budget control methods to choose from.

This work studies one of these methods, called throttling (a.k.a. probabilistic pacing), which is widely adopted by major advertising platforms including Facebook (Facebook 2017), Google (Karande, Mehta, and Srikant 2013), Linked In (Agarwal et al. 2014) and Yahoo! (Xu et al. 2015). Under this method, an advertiser s accumulated payment is controlled by being excluded from a fraction of auctions throughout the entire period (e.g., a day, a week, or a month). Compared to other budget management strategies (Balseiro et al. 2021), such as pacing (a.k.a. bid-shading), an essential feature of throttling is that an advertiser s bid is never altered in any auction instance. As revealed in the literature (Karande, Mehta, and Srikant 2013; Chen, Kroer, and Kumar 2021), this feature attracts a large number of advertisers to use the throttling strategy due to two primary reasons. (a) Some advertisers do not permit the platform to modify their bids, forcing the platform to exclude them from some auctions to control their budget, i.e., adopting the throttling strategy. (b) Strategies that modify bids, such as pacing, may allocate an advertiser to those auctions where she is superior to other advertisers, and could be detrimental for the advertiser to exploit other parts of the market. In contrast, with unmodiﬁed bids competing, the throttling strategy provides advertisers with a more straightforward and unbiased observation of various users, enabling them to gain a clearer understanding of their competitiveness in different market sectors. For instance, under pacing, a small budget on some market sectors to be explored would lead to a small pacing multiplier on the advertiser s bid, which would cause the advertiser to win nothing in this sector, considering other advertisers who are superior. In comparison, under throttling, even with a small budget, the advertiser has a chance to have a competitive bid (as the bid is not modiﬁed) and win in some auctions, thus helping the advertiser to explore the new market sector with a small cost. These phenomena illustrate the importance of throttling as a popular budget control method.

The throttling strategy has been extensively explored in the literature, primarily from an empirical perspective evalu-

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

ating the performance of speciﬁc algorithms (Agarwal et al. 2014; Karande, Mehta, and Srikant 2013; Xu et al. 2015). Some work has also considered the theoretical equilibrium problem when multiple buyers simultaneously adopt the throttling strategy (Balseiro et al. 2021; Chen, Kroer, and Kumar 2021). However, there is currently no literature that takes the perspective of the advertiser and concentrates on how to theoretically optimize their accumulated revenue through a throttling strategy in repeated auctions. The aim of this paper is to address this problem and design dynamic throttling algorithms that achieve good performance in various input models. In practice, such a throttling strategy is implemented through an auto-bidding service that receives the advertiser s values and makes binary choices on behalf of the advertiser in each auction. However, to simplify the terminology and description, this work shifts the throttling control to the advertiser as a subjective strategy that takes objective values as inputs, which is equivalent to the real-world scenario.

Our Contributions This work gives a theoretical panorama of an advertiser s dynamic throttling strategy in repeated second-price auctions. Speciﬁcally, our contributions are three-wise along the following lines.

Formalization, bounds, and impossibility results. We formalize the problem of dynamic throttling in repeated secondprice auctions from the perspective of a single advertiser in two value input models; namely, where private values v are stochastic and adversarial, respectively. To model other bidders bids, we assume the highest competing bids p to be stochastic and following an unknown i.i.d. distribution. When v are stochastic, we measure the performance of a throttling strategy by considering its regret and establish an Ω(

T) lower bound on the expected regret (Theorem 1). On the other hand, when v are adversarial, we measure the performance of a throttling strategy by considering its asymptotic competitive ratio and demonstrate that any throttling strategy s asymptotic competitive ratio cannot exceed the advertiser s regularized average budget (Theorem 2), i.e., the average budget divided by the maximum value. Note that the i.i.d. assumption on p is a common practice in the literature. In fact, effective learning could be impossible for the bidder without this assumption, as we show in the full version of this work (Chen et al. 2022b) that any throttling strategy can behave arbitrarily poorly when p are adversarial.

Our OGD-CB algorithm and analysis. Following the above bound results, we combine the online optimization method and the distribution estimation technique and propose an algorithm that is oblivious to the value input model. More technically, it is based on online gradient descent and conﬁdence intervals, referred to as the OGD-CB algorithm (Algorithm 1). Practically, the ability to observe a sample of the highest competing bid affects the algorithm s estimation accuracy of the distribution of p, and we consider our algorithm s performance under two information models in order: (a) full information feedback, where the advertiser can observe the highest competing bid in each round; (b) partial information feedback, where the advertiser can only acquire

the highest competing bid when she participates in an auction. The latter information structure presents greater difﬁculty in distribution estimation but is more practical for auto-bidding services in the industry, especially if the platform announces the highest bid to all participants after an auction (Google Ad Exchange 2022). Theoretically, we prove that with either full or partial information feedback, our OGD-CB algorithm achieves an O( T log T) regret with probability 1 O(1/T) for the stochastic value input model (Theorems 3 and 5). This result implies a near-optimal O( T log T) expected regret. For the adversarial value input model, we show that it also possesses an optimal asymptotic competitive ratio with either full or partial information feedback (Theorems 4 and 6). We summarize these performance guarantees together with the bound results in Table 1. In addition to its high performance, our algorithm has two additional advantages. Firstly, it is computation-friendly, as the decision and update process only takes constant time in each round. Second, compared with other algorithms, our solution does not rely on any particular structure of the distribution of the highest competing bid. Notably, our solution does not require the (interim) reward/cost function to be linear, convex/concave, or even continuous. In particular, our solution even works for discrete distributions.

Comparison between throttling and pacing. Subsequently, we compare the throttling strategy with the celebrated pacing strategy (Balseiro and Gur 2019) in repeated second-price auctions. It is worth noting that the latter is known to be asymptotically optimal when v and p are simultaneously stochastic or adversarial. When v and p are stochastic, we show that, in general cases, throttling results in a Θ(T) loss compared to pacing on the advertiser s expected revenue (Theorem 7). We also give special conditions under which these two strategies exhibit only an e O(

T) difference under full/partial information feedback (Theorem 8), for completeness. Excitingly, when v is adversarial and p is stochastic, we demonstrate that throttling is an asymptotically optimal bidding strategy under full/partial information feedback. Furthermore, our OGD-CB algorithm is also optimal in this case. This result reveals the importance of throttling as a budget-smoothing method in advertising, and ﬁlls the gaps in research on dynamic bidding strategies with adversarial values and stochastic competing bids under repeated second-price auctions.

Related Work In this part, we will review two popular budget management strategies in repeated auctions: throttling and pacing, further discussion on technically related works. Previous work on dynamic throttling mainly centers on experimental investigations. Among them, Karande, Mehta, and Srikant (2013) explore the concept of fair allocation in generalized second-price (GSP) auctions, wherein they present an optimal throttling algorithm for diverse objectives. Agarwal et al. (2014) also focus on GSP auctions from an advertiser s side and implements their algorithm in Linked In s ad serving system. Xu et al. (2015) evaluate a practical online throttling

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

Value Input Model Stochastic Adversarial

Information Structure Full Partial Full Partial

T) regret (Theorem 1)

(ρ/ v)-asymptotic competitive ratio (Theorem 2)

OGD-CB O( T log T) regret (Theorems 3 and 5)

(ρ/ v)-asymptotic competitive ratio (Theorems 4 and 6)

T: Number of auctions. ρ: Average budget. v: Maximum value.

Table 1: The bounds and the performance of OGD-CB under different value input models and information structures.

algorithm on the demand-side platform. Recently, Gui, Nair, and Niu (2022) show how to conduct causal inference of online advertising effects in the budget throttling market. On the theoretical side, some work (Balseiro et al. 2021; Chen, Kroer, and Kumar 2021) focuses on the market equilibrium when all advertisers simultaneously follow a random throttling strategy from either a continuous or discrete view. In contrast, our work examines the dynamics of throttling in the repeated advertising market. Additionally, Charles et al. (2013) study the regret-free allocation for advertisers ROI, and shows that such a heuristic outperforms the random throttling strategy for advertisers. Meanwhile, other work looks into a similar problem for Internet keyword search, known as the Ad Words problem (Mehta et al. 2007, 2013) in the framework of online matching. However, this line of work concentrates on the platform s side rather than the advertiser s side. Pacing is another well-studied budget control strategy, in which an advertiser shades her value by a constant factor on her bid. Existing work studies this strategy from both dynamic view (Balseiro and Gur 2019; Borgs et al. 2007; Gaitonde et al. 2023; Celli et al. 2022; Lee, Jalali, and Dasdan 2013) and equilibrium perspectives (Balseiro, Besbes, and Weintraub 2015; Conitzer et al. 2022a,b). Among these, the result of Balseiro and Gur (2019) is highly correlated with our solution, which also considers the dual space. Nevertheless, the analysis of their algorithm depends on the continuity of the distribution function, which is not necessary for our algorithm. Some papers compare various budget control methods and explore their relationships from the equilibrium view (Balseiro et al. 2021; Chen et al. 2022a; Balseiro, Kroer, and Kumar 2023). In particular, Balseiro et al. (2021) show that in the symmetric system equilibrium, throttling yields a higher proﬁt for the platform than pacing under certain assumptions. A part of our results extends the comparison between throttling and pacing from an advertiser s viewpoint from a dynamic view. Technically, our problem is closely related to the network revenue management (NRM) problem (Gallego and Van Ryzin 1994, 1997) and the contextual bandits with knapsacks (CBw K) problem. However, there are signiﬁcant differ-

ences between our work and the existing literature. In contrast to NRM, our agent s reward and cost are random with the highest competing bid, while such randomness is not present in NRM problems. Moreover, our setting diverges from the literature on CBw K in multiple ways. Firstly, early work in CBw K (Agrawal, Devanur, and Li 2016; Badanidiyuru, Langford, and Slivkins 2014) assumes that the context set is ﬁnite, whereas we do not make such assumptions. Secondly, some work in CBw K (Agrawal and Devanur 2016; Sivakumar, Zuo, and Banerjee 2022; Han et al. 2023; Slivkins, Sankararaman, and Foster 2023) assumes a speciﬁc relationship between the expectation of reward/cost and the context, i.e., requires a speciﬁc distribution of the highest competing bid. However, our work sets no such limitation on this distribution. Finally, other work (Wu et al. 2015; Balseiro, Besbes, and Pizarro 2023; Liu and Grigas 2022; Ai et al. 2022) supposes that the context is drawn stochastically i.i.d. from some distribution. In contrast, our solution is oblivious to the context input model, allowing it to deal with adversarial context inputs. This is particularly important for auto-bidding services, as the value for an ad slot can be affected by multiple features and could vary over time. Therefore, it is crucial to design algorithms that can handle different value input models simultaneously. At last, our problem is also correlated with the bandits with knapsacks (Bw K) problem (e.g., (Castiglioni, Celli, and Kroer 2022)). However, in that problem, the action in each round is chosen without any context, and the optimal action mode is universal. In contrast, in our problem, the optimal action is relevant to the value observed at the start of each auction.

Basic settings. In this work, we consider the repeated second-price auction market, where an advertiser with a budget constraint competes against other advertisers. The market comprises T rounds of auctions, and in each round t [T] := {1, 2, , T}, an item is to be sold to a buyer via a second-price auction. Here, we suppose that T 1. To match the more prevalent terminology in the literature, we refer to the advertiser as the buyer in the remaining parts. This work adopts the perspective of a ﬁxed buyer. In each

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

round t [T], she obtains a personal value for the item, represented by vt [0, v], where v is a constant upper bound on vt. We denote the highest competing bid against the buyer as pt, which is assumed to be i.i.d. sampled from an unknown distribution G with a support of [0, v]. This assumption comes from the mean-ﬁeld approximation (Balseiro, Besbes, and Weintraub 2015) and is commonly used in the literature. We use v to represent the buyer s value vector across T rounds, i.e., v := (vt)t [T ]; similarly, p := (pt)t [T ]. We assume the buyer has a total budget of B across all T rounds, with the maximum average expenditure being ρ := B/T per round. In this paper, we suppose that ρ v is a constant. This assumption comes from the practice in which the buyer is always asked to set a budget for a ﬁxed period, with relatively ﬁxed rounds of auctions. In each round t, the buyer makes a decision xt {0, 1} based on the value vt. The binary selection of the buyer reﬂects the nature of throttling, where xt = 1 denotes participation in the auction, and xt = 0 means saving the budget and not participating in the auction1. After the decision, the buyer receives a reward of xtrt and incurs a cost of xtct in this round, where rt and ct are deﬁned as:

rt = (vt pt)+, ct = pt1[vt pt].

In the above, (vt pt)+ in the expression for rt stands for the positive part of (vt pt). Thus, the buyer can obtain a positive reward and cost in round t only by opting to enter (i.e., xt = 1) and wins (i.e., vt pt). In this case, her cost is pt, and her reward is vt pt for a second-price auction. Once again, we mention here that in the literature on throttling, it is commonly assumed that the buyer bids truthfully as long as she enters an auction.

Information structure. We consider two different information models in this work: 1. [Full information feedback.] The buyer observes pt at the end of any round t. 2. [Partial information feedback.] The buyer observes pt at the end of round t only if she chooses to participate in the auction in this round, i.e., if xt = 1. In comparison to the full information feedback model, it is evident that the partial information feedback model is more challenging to manage since the buyer can access less information. Additionally, a natural model with even less information than the two we discuss is the bandit feedback model, in which the buyer only sees rt and ct in round t instead of pt. Recently, some research (Slivkins, Sankararaman, and Foster 2023; Han et al. 2023) has investigated this model in the problem of contextual bandits with knapsacks (CBw K), which is a generalization of our problem. Nevertheless, this line uses online regression techniques and has speciﬁc requirements on E[rt, ct | vt] as a function of vt, e.g., being linear. In other words, their method has strong assumptions on the distribution G of pt in our problem. As far as we know, these assumptions are necessary in the literature for bandits information feedback. In contrast, in this work, we do not

1It is required in throttling that the buyer truthfully bids. The motivation here is discussed in the Introduction.

impose any restriction on the distribution G. Our solution is oblivious to the distribution G, and thus model-free. In this respect, our information feedback model is comparable to existing works. We generally use Ht to denote the history that the buyer can access at the start of round t. For the full information feedback model, we use HF t := (vτ, xτ, pτ)1 τ<t to denote the buyer s view at the start of round t. We should note that in this model, pτ is always available to the buyer at the end of each round τ, and rτ and cτ can be inferred from vτ, xτ and pτ. For the partial information feedback model, since pτ is disclosed to the buyer if and only if xτ = 1, we accordingly deﬁne the history available at round t as HP t := (vτ, xτ, xτpτ)1 τ<t. It is worth noting that pτ cannot be deduced from xτ and xτpτ when xτ = 0. Likewise, rτ and cτ can also be derived using vτ, xτ and xτpτ in this information model.

The throttling strategy. With the above notation, we now formally deﬁne the buyer s throttling strategy. Prior to making a decision, the buyer can see Ht and vt in each round t. Denoting the buyer s single-round strategy in round t by βt : [0, v]2t 1 {0, 1}t 1 Γ {0, 1}, we have

xt = βt (Ht, vt, γt) .

Here, γt Γ is sampled from a probability space to depict the potential randomness involved in calculating xt. Let γ = (γt)t [T ]. As a result, β := (βt)t [T ] encompasses the buyer s single-round strategy for all T rounds and represents her overall throttling strategy. With v, p and randomness γ given, we denote the stopping time of strategy β by T β 0 (v, p, γ) T, i.e., the last round with xt = 1. When the context is clear, we abbreviate this as T0. Consequently, the total revenue of β with inputs v and p is given by:

U β(v, p, γ) :=

Value Model and Benchmark As previously suggested, this work examines two distinct input models for the values (vt)t [T ]: the stochastic model and the adversarial model.

Stochastic values and regret. Concerning the stochastic value model, it is assumed that for each t, vt is drawn i.i.d. from some unknown distribution F with a support on [0, v]. We additionally suppose that F and G are independent. In this case, the performance of a throttling strategy is assessed by comparing its reward with the ﬂuid adaptive throttling benchmark. The latter represents the optimal expected revenue of any random strategy given the value without exceeding the budget in expectation. Speciﬁcally,

OPT := T max π:[0, v] [0,1] Ev F,p G π(v) (v p)+ ,

s.t. Ev F,p G [π(v) p1[v p]] ρ. (1)

It is known that OPT provides an upper bound for the expected total reward of any throttling strategy with stochastic

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

values (Balseiro, Besbes, and Pizarro 2023). Consequently, we can deﬁne the regret of strategy β in relation to OPT given v, p and randomness γ. That is,

Regβ(v, p, γ) := OPT U β(v, p, γ).

Clearly, under stochastic values, our goal is to design a throttling strategy β that results in a low expectation of Regβ(v, p, γ). A stronger requirement is to ensure a small Regβ(v, p, γ) with high probability on samples (v, p) and randomness γ.

Adversarial values and asymptotic competitive ratio. We also consider the scenario that the inputs of (vt)t [T ] are adversarial on [0, v], which accounts for the situations where the buyer lacks conﬁdence in the item distribution. In this regard, we ﬁrst deﬁne the hindsight throttling benchmark, which is the optimal performance advertiser could attain with the beneﬁt of hindsight on v and p. Speciﬁcally,

UH(v, p) := max x {0,1}T

t=1 xt (vt pt)+ ,

t=1 xtpt1[vt pt] Tρ.

Evidently, UH(v, p) bounds U β(v, p, γ) for any strategy β and randomness γ. We then deﬁne a throttling strategy β to be asymptotically µ-competitive for µ (0, 1], if the following condition holds:

lim inf T ,B=ρT inf v,G

T Ep GT ,γ h U β(v, p, γ) µ UH(v, p) i

Bounds on Dynamic Throttling In this section, we explore the optimal performance attainable by any throttling strategy in the presence of either stochastic or adversarial values. In the case of stochastic values, we obtain a regret lower bound of Ω(

T). On the other hand, with adversarial values, we give an asymptotic competitive ratio upper bound of ρ/ v. In the full version of this work (Chen et al. 2022b), we additionally derive an impossibility result indicating that any throttling algorithm could achieve arbitrarily small reward facing adversarial p even when {vt}t T are constant. This result underscores the necessity of presuming stationary highest competing bids in the context of throttling.

Stochastic values. The following theorem gives our primary result for the regret lower bound with stochastic values: Theorem 1. There exists an instance tuple (F, G) and some constant Cl > 0, such that for any online throttling strategy β and 4|T (i.e., T is a multiple of 4), we have

Ev F T ,p GT ,γ Regβ(v, p, γ) Cl

Similar results have been documented in Vera and Banerjee (2021); Bumpensanti and Wang (2020); Arlotto and Gurvich (2019) in the literature of network revenue management.

However, in that problem, there is no randomness in the reward and cost given the request (value), which is a simpliﬁcation of our situation. The central idea of our proof is to give an appropriate problem instance with a degenerate ﬂuid solution, where even the optimal throttling strategy with the hindsight of v and p cannot avoid a regret of Ω(

T). Note that the performance of an online throttling strategy may only be inferior with partial information. As a result, Theorem 1 implies that no online throttling strategy can always guarantee an o(

T) regret with partial information feedback.

Adversarial values. With adversarial values, we settle an upper bound on the asymptotic competitive ratio of any throttling algorithm. In fact, we have the following theorem stating that such an upper bound is ρ/ v. Theorem 2. For any µ > ρ/ v,

lim inf T ,B=ρT inf v,G

T Ep GT ,γ h U β(v, p, γ) µ UH(v, p) i

Here, the lim inf notation stands for the limit inferior. The proof of Theorem 2 follows Balseiro and Gur (2019). More speciﬁcally, we ﬁrst apply Yao s principle (Yao 1977) and change our problem into constructing a hard problem instance with stochastic p and half-stochastic v that follows some vector distribution. Here, the hard instance should ensure that the asymptotic competitive ratio of any deterministic (rather than random) algorithm is no more than ρ/ v. We further let p be ﬁxed across time. Therefore we are further reduced to constructing a vector distribution of (v p)+ that blocks any deterministic throttling strategy.

The OGD-CB Algorithm and Performance In this section, we introduce an online throttling strategy called OGD-CB that is oblivious of the value model (stochastic or adversarial value) and works under either full or partial information feedback. For stochastic values, the regret of the algorithm is upper-bounded by O( T log T), which is near optimal based on the lower bound (Theorem 1). For adversarial values, the OGD-CB algorithm is asymptotically (ρ/ v)- competitive regardless of the information model, matching the upper bound given in Theorem 2.

The algorithm. Our OGD-CB algorithm is presented in Algorithm 1. The algorithm starts with a one-round exploration to make an appropriate initialization (Line 3 Line 5). In each of the following rounds, after observing the value, the algorithm chooses the action based on a dynamically updated pricing parameter λt (Line 9), which is updated to control the rate of budget expenditure (Line 10). The update of λt follows an online gradient descent (OGD) procedure for a series of proper online reward functions, with step size ηt. Intuitively, a large λt indicates that the budget is being spent too quickly, and the algorithm reduces the frequency of entering the market. Conversely, an average expenditure below the ideal ρ in past rounds will result in a descent of λt and encourage the algorithm to participate in the auction. This intuition has been inspired by Balseiro, Lu, and Mirrokni

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

Algorithm 1. The OGD-CB Algorithm.

Input: ρ, T. Initialization: I1 , B1 B, λ1 0.

1 for t 1 to T do

2 Observe vt;

/* A single round of exploration. */

3 if t = 1 then

4 xt 1, λt+1 λt;

/* Estimate the revenue and cost with a confidence bound. */

(log 2 + 2 log T)/(2|It|);

8 ert(vt) (P τ It(vt pτ)+)/|It| + ϵtvt, ect(vt) (P τ It pτ1[vt pτ])/|It| 2ϵtvt;

/* Choose the action according to the estimates. */

9 xt 1[ert(vt) λtect(vt)];

/* Online gradient descent on the pricing variable. */

10 ηt 1/( v

t), λt+1 (λt + ηt(xtect(vt) ρ))+;

/* Observe the sample. */

12 if (FULL-INFO) (PARTIAL-INFO xt = 1) then

13 Observe pt, It+1 It {t};

16 It+1 It;

/* Update the remaining budget. */

18 Bt+1 Bt xtct;

19 if Bt+1 < v then

(2023). However, a crucial distinction between our setting and that work is that the buyer is unaware of the (expected) revenue and cost given the value. To address this issue, we employ the distribution estimation method and the conﬁdence bound (CB) technique. Speciﬁcally, at the start of each round, the algorithm ﬁrst provides estimates of the expected revenue and cost based on the history (Line 8) and incorporates a bias using the conﬁdence bound, parameterized by ϵt, and then makes the decision and updates according to the biased

estimation. As the observation on the highest competing bid p accumulates, the estimation of the reward and the cost becomes more precise, and the bias value reduces to zero.

Stochastic values with full information feedback. We now analyze the performance of the OGD-CB algorithm with stochastic values and full information feedback. We show that in this scenario, OGD-CB achieves an O( T log T) regret bound against OPT, as given in the following theorem. Theorem 3. With full information feedback, when {vt}t T are sampled i.i.d. from some distribution F on [0, v], there is a constant CF, such that with probability at least 1 4/T, the OGD-CB algorithm guarantees

Regβ(v, p) CFp

Adversarial values with full information feedback. We further investigate the scenario where the value input could be adversarial. In this case, we establish that the OGD-CB algorithm attains a ρ/ v asymptotic competitive ratio, which matches the upper bound provided in Theorem 2. Theorem 4. With full information feedback, the OGD-CB algorithm guarantees

lim inf T ,B=ρT inf v,G

T Ep GT h U β(v, p) ρ

v UH(v, p) i 0.

Partial information feedback. In the partial information setting, the buyer can only observe pt if she chooses to enter the round t. As a result, the main obstacle here is that the algorithm may not gather sufﬁcient historical data to provide accurate estimates of r(vt) and c(vt). In other words, we need to simultaneously limit the failure probability and the estimation error. We overcome this issue by bounding the entering frequency. More precisely, we use an induction method to show that |It|, the entering frequency before round t, increases linearly with t. The result is expressed in the following important lemma.

Lemma 1. Let Ce := min{(1/2) (ρ/ v)2, (

2/4) (ρ/ v)}. In the partial information setting, for any t 2, the OGD-CB algorithm guarantees that |It| Ce (t 1). Utilizing Lemma 1, we can revisit Theorems 3 and 4 and obtain their corresponding versions under partial information. Speciﬁcally, we have the following results. Theorem 5. With partial information feedback, when {vt}t T are sampled i.i.d. from some distribution F on [0, v], there is a constant CP, such that with probability at least 1 4/T, the OGD-CB algorithm guarantees

Regβ(v, p) CPp

Theorem 6. With partial information feedback, the OGD-CB algorithm guarantees

lim inf T ,B=ρT inf v,G

T Ep GT h U β(v, p) ρ

v UH(v, p) i 0.

We notice that in Theorems 3 and 5, the difference between CP and CF satisiﬁes that

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

In other words, the difﬁculty of partial information feedback highly correlates with the ratio v/ρ. When this ratio increases, i.e., the buyer s average budget ρ becomes smaller, the partial information feedback becomes more challenging. Intuitively, with smaller ρ, the buyer has fewer chances to participate in auctions and thus learns less information on the distribution of the highest competing bid p with partial feedback, which leads to an increase in regret.

Comparison between Throttling and Pacing In this section, we aim to compare two popular budget control approaches in the dynamic setting: throttling and pacing. Throttling, which is the focus of this work, involves limiting the frequency of entering the auction, while pacing (Balseiro and Gur 2019), involves shading the buyer s value by an adaptive multiplier and bid the shaded value in each round. The latter approach has been extensively studied in the literature and widely adopted in the industry, as it is known to be the asymptotically optimal bidding strategy when both v and p are stochastic or adversarial. Speciﬁcally, we are most interested in comparing the two dynamic strategies. This part extends the result in Balseiro et al. (2021), which compares these two strategies in system equilibrium, i.e., when the dynamic process converges. We now compare the two strategies under stochastic and adversarial values, and restrict ourselves to full or partial information feedback model. We denote by U T(v, p, γ) the revenue of the optimal throttling strategy2 given v, p, and γ. For comparison, we use U P(v, p) to represent the revenue of the adaptive pacing strategy given in Balseiro and Gur (2019).

Stochastic values. With stochastic values, we begin by giving the following two assumptions.

Assumption 1. G is a continuous distribution on [0, v] with density strictly no less than some constant L > 0.

Assumption 2. For any λ > 0, the measure that r(v)/c(v) = λ is positive concerning distribution F.

Under these two assumptions, we have the following theorem.

Theorem 7. Under Assumptions 1 and 2, when Ev,p[p1[v p]] > ρ, we have

Ev,p U P(v, p) Ev,p,γ U T(v, p, γ) = Θ (T) .

We should emphasize that Assumptions 1 and 2 are not particularly strong. In fact, Assumption 1 holds for most parameterized continuous distribution families, while Assumption 2 holds true unless G has a special form, e.g., being a uniform distribution on [0, v]. As a result, we can conclude that under stochastic values, dynamic pacing (Balseiro and Gur 2019) would generally outperform even the optimal online throttling strategy by a linear term on the buyer s expected revenue.

2The optimality here concerns the expected total revenue. When v is stochastic, the expectation is taken on v, p and the algorithm randomness γ. When v is adversarial, the expectation is only on p and γ.

Nevertheless, for the completeness of this part, we also mention some special cases in which these two strategies have asymptotically similar performances. Theorem 8. Let β be the OGD-CB algorithm. When (a) the highest competing bid p is ﬁxed, or (b) Ev,p[p1[v p]] ρ, then under full or partial information feedback, we have Ev,p,γ U T(v, p, γ) Ev,p U P(v, p) = O

T , Ev,p U β(v, p) Ev,p U P(v, p) = O

These results conclude our discussions on the comparison of the two dynamic methods under stochastic values.

Adversarial values. Under adversarial values, we ﬁrst recall the result of Balseiro and Gur (2019) on the performance of dynamic pacing in this scenario. Proposition 2 (From Balseiro and Gur (2019)). We have the following:

lim inf T ,B=ρT inf v,p

T U P(v, p) ρ

v UH(v, p) 0.

A crucial point in Proposition 2 is that the adaptive pacing algorithm can handle the case when both v and p are adversarial and reaches optimality in this case (Balseiro and Gur 2019). However, by our further results in the full version (Chen et al. 2022b), this is impossible for any throttling strategy. Nevertheless, when only v is adversarial and p is stochastic, notice that Theorem 2 can be extended to arbitrary bidding strategies (which certainly includes throttling and pacing) without any modiﬁcation on the proof. Therefore, combining our positive result on the OGD-CB algorithm (Theorems 4 and 6), we conclude that OGD-CB is asymptotically optimal under this scenario.

Concluding Remarks This work provides a comprehensive discussion on the dynamic throttling strategy in repeated second-price auctions from a buyer s viewpoint. More speciﬁcally, we investigate the optimal performance of any throttling algorithm when the buyer s values and the highest competing bids are stochastic or adversarial. On top of that, we propose an OGD-CB algorithm that achieves (near) optimality under both full and partial information structure, regardless of the value input model when the highest competing bid is stochastic. Furthermore, we compare the dynamic throttling strategy to dynamic pacing under different settings. When the values are stochastic, dynamic throttling strategy generally exhibits a linear gap in comparison to dynamic pacing concerning the buyer s revenue. However, with adversarial values, we demonstrate that OGD-CB is asymptotically the best bidding strategies with full or partial information feedback.

Acknowledgements This work is supported by Tencent Marketing Solution RBFR202104. The authors also thank Santiago R. Balseiro, Chuyue Tang, Xiang Yan, and anonymous reviewers for their kind and useful suggestions.

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

References Agarwal, D.; Ghosh, S.; Wei, K.; and You, S. 2014. Budget pacing for targeted online advertisements at linkedin. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, 1613 1619. Agrawal, S.; and Devanur, N. 2016. Linear contextual bandits with knapsacks. Advances in Neural Information Processing Systems, 29. Agrawal, S.; Devanur, N. R.; and Li, L. 2016. An efﬁcient algorithm for contextual bandits with knapsacks, and an extension to concave objectives. In Conference on Learning Theory, 4 18. PMLR. Ai, R.; Chen, Z.; Deng, X.; Pan, Y.; Wang, C.; and Yang, M. 2022. On the Re-Solving Heuristic for (Binary) Contextual Bandits with Knapsacks. ar Xiv preprint ar Xiv:2211.13952. Arlotto, A.; and Gurvich, I. 2019. Uniformly bounded regret in the multisecretary problem. Stochastic Systems, 9(3): 231 260. Badanidiyuru, A.; Langford, J.; and Slivkins, A. 2014. Resourceful contextual bandits. In Conference on Learning Theory, 1109 1134. PMLR. Balseiro, S.; Kim, A.; Mahdian, M.; and Mirrokni, V. 2021. Budget-Management Strategies in Repeated Auctions. Operations Research, 69(3): 859 876. Balseiro, S.; Kroer, C.; and Kumar, R. 2023. Contextual standard auctions with budgets: Revenue equivalence and efﬁciency guarantees. Management Science. Balseiro, S. R.; Besbes, O.; and Pizarro, D. 2023. Survey of Dynamic Resource-Constrained Reward Collection Problems: Uniﬁed Model and Analysis. Operations Research. Balseiro, S. R.; Besbes, O.; and Weintraub, G. Y. 2015. Repeated auctions with budgets in ad exchanges: Approximations and design. Management Science, 61(4): 864 884. Balseiro, S. R.; and Gur, Y. 2019. Learning in repeated auctions with budgets: Regret minimization and equilibrium. Management Science, 65(9): 3952 3968. Balseiro, S. R.; Lu, H.; and Mirrokni, V. 2023. The best of many worlds: Dual mirror descent for online allocation problems. Operations Research, 71(1): 101 119. Borgs, C.; Chayes, J.; Immorlica, N.; Jain, K.; Etesami, O.; and Mahdian, M. 2007. Dynamics of bid optimization in online advertisement auctions. In Proceedings of the 16th international conference on World Wide Web, 531 540. Bumpensanti, P.; and Wang, H. 2020. A re-solving heuristic with uniformly bounded loss for network revenue management. Management Science, 66(7): 2993 3009. Castiglioni, M.; Celli, A.; and Kroer, C. 2022. Online learning with knapsacks: the best of both worlds. In International Conference on Machine Learning, 2767 2783. PMLR. Celli, A.; Colini-Baldeschi, R.; Kroer, C.; and Sodomka, E. 2022. The parity ray regularizer for pacing in auction markets. In Proceedings of the ACM Web Conference 2022, 162 172. Charles, D.; Chakrabarty, D.; Chickering, M.; Devanur, N. R.; and Wang, L. 2013. Budget smoothing for internet ad auctions: a game theoretic approach. In Proceedings of the

fourteenth ACM conference on Electronic commerce, 163 180. Chen, X.; Kroer, C.; and Kumar, R. 2021. Throttling equilibria in auction markets. ar Xiv preprint ar Xiv:2107.10923. Chen, Z.; Deng, X.; Li, J.; Wang, C.; and Yang, M. 2022a. Budget-Constrained Auctions with Unassured Priors. ar Xiv preprint ar Xiv:2203.16816. Chen, Z.; Wang, C.; Wang, Q.; Pan, Y.; Shi, Z.; Cai, Z.; Ren, Y.; Zhu, Z.; and Deng, X. 2022b. Dynamic Budget Throttling in Repeated Second-Price Auctions. ar Xiv preprint ar Xiv:2207.04690. Conitzer, V.; Kroer, C.; Panigrahi, D.; Schrijvers, O.; Stier Moses, N. E.; Sodomka, E.; and Wilkens, C. A. 2022a. Pacing equilibrium in ﬁrst price auction markets. Management Science, 68(12): 8515 8535. Conitzer, V.; Kroer, C.; Sodomka, E.; and Stier-Moses, N. E. 2022b. Multiplicative pacing equilibria in auction markets. Operations Research, 70(2): 963 989. Facebook. 2017. Your Guide to Facebook Bid Strategy. https://www.facebook.com/gms_hub/share/ biddingstrategyguide_ﬁnal.pdf?ref=advertising-principles. Accessed: 2022-06-28. Gaitonde, J.; Li, Y.; Light, B.; Lucier, B.; and Slivkins, A. 2023. Budget Pacing in Repeated Auctions: Regret and Efﬁciency without Convergence. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023), volume 251, 52. Schloss Dagstuhl Leibniz-Zentrum für Informatik. Gallego, G.; and Van Ryzin, G. 1994. Optimal dynamic pricing of inventories with stochastic demand over ﬁnite horizons. Management science, 40(8): 999 1020. Gallego, G.; and Van Ryzin, G. 1997. A multiproduct dynamic pricing problem and its applications to network yield management. Operations research, 45(1): 24 41. Google Ad Exchange. 2022. Bid data sharing. https://support. google.com/authorizedbuyers/answer/2696468. Accessed: 2022-06-28. Gui, G.; Nair, H.; and Niu, F. 2022. Auction Throttling and Causal Inference of Online Advertising Effects. In Proceedings of the 23rd ACM Conference on Economics and Computation, 297 297. Han, Y.; Zeng, J.; Wang, Y.; Xiang, Y.; and Zhang, J. 2023. Optimal Contextual Bandits with Knapsacks under Realizability via Regression Oracles. In International Conference on Artiﬁcial Intelligence and Statistics, 5011 5035. PMLR. Karande, C.; Mehta, A.; and Srikant, R. 2013. Optimizing budget constrained spend in search advertising. In Proceedings of the sixth ACM international conference on Web search and data mining, 697 706. Lee, K.-C.; Jalali, A.; and Dasdan, A. 2013. Real time bid optimization with smooth budget delivery in online advertising. In Proceedings of the seventh international workshop on data mining for online advertising, 1 9. Liu, H.; and Grigas, P. 2022. Online Contextual Decision Making with a Smart Predict-then-Optimize Method. ar Xiv preprint ar Xiv:2206.07316.

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

Mehta, A.; Saberi, A.; Vazirani, U.; and Vazirani, V. 2007. Adwords and generalized online matching. Journal of the ACM (JACM), 54(5): 22 es. Mehta, A.; et al. 2013. Online matching and ad allocation. Foundations and Trends in Theoretical Computer Science, 8(4): 265 368. Sivakumar, V.; Zuo, S.; and Banerjee, A. 2022. Smoothed adversarial linear contextual bandits with knapsacks. In International Conference on Machine Learning, 20253 20277. PMLR. Slivkins, A.; Sankararaman, K. A.; and Foster, D. J. 2023. Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression. In The Thirty Sixth Annual Conference on Learning Theory, 4633 4656. PMLR. Vera, A.; and Banerjee, S. 2021. The bayesian prophet: A lowregret framework for online decision making. Management Science, 67(3): 1368 1391. Wu, H.; Srikant, R.; Liu, X.; and Jiang, C. 2015. Algorithms with logarithmic or sublinear regret for constrained contextual bandits. Advances in Neural Information Processing Systems, 28. Xu, J.; Lee, K.-c.; Li, W.; Qi, H.; and Lu, Q. 2015. Smart pacing for effective online ad campaign optimization. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2217 2226. Yao, A. C.-C. 1977. Probabilistic computations: Toward a uniﬁed measure of complexity. In 18th Annual Symposium on Foundations of Computer Science (sfcs 1977), 222 227. IEEE Computer Society.

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)