# multichannel_autobidding_with_budget_and_roi_constraints__ffee1a8f.pdf Multi-channel Autobidding with Budget and ROI Constraints Yuan Deng 1 Negin Golrezaei 2 Patrick Jaillet 2 Jason Cheuk Nam Liang 2 Vahab Mirrokni 1 In digital online advertising, advertisers procure ad impressions simultaneously on multiple platforms, or so-called channels, such as Google Ads, Meta Ads Manager, etc., each of which consists of numerous ad auctions. We study how an advertiser maximizes total conversion (e.g. ad clicks) while satisfying aggregate return-on-investment (ROI) and budget constraints across all channels. In practice, an advertiser does not have control over, and thus cannot globally optimize, which individual ad auctions she participates in for each channel, and instead authorizes a channel to procure impressions on her behalf: the advertiser can only utilize two levers on each channel, namely setting a per-channel budget and per-channel target ROI. In this work, we first analyze the effectiveness of each of these levers for solving the advertiser s global multi-channel problem. We show that when an advertiser only optimizes over per-channel ROIs, her total conversion can be arbitrarily worse than what she could have obtained in the global problem. Further, we show that the advertiser can achieve the global optimal conversion when she only optimizes over per-channel budgets. In light of this finding, under a bandit feedback setting that mimics real-world scenarios where advertisers have limited information on ad auctions in each channels and how channels procure ads, we present an efficient learning algorithm that produces per-channel budgets whose resulting conversion approximates that of the global optimal problem. 1. Introduction In today s online advertising world, advertisers (including but not limited to small businesses, marketing practitioners, 1Google Research 2Massachusetts Institute of Technology. Correspondence to: Jason Cheuk Nam Liang . Proceedings of the 40 th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s). non-profits, etc.) have been embracing an expanding array of advertising platforms such as search engines, social media platforms, web publisher display, etc., which present a plenitude of channels for advertisers to procure ad impressions and obtain traffic. In this growing multi-channel environment, the booming online advertising activities have fueled extensive research and technological advancements in attribution analytics to answer questions like which channels are more effective in targeting certain users? Or, which channels produce more user conversion (e.g. ad clicks) or return-on-investment (ROI) with the same amount of investments? (see (Kannan et al., 2016) for a comprehensive survey on attribution analytics). Yet, this area of research has largely left out a crucial phase in the workflow of advertisers creation of a digital ad campaign, namely how advertisers interact with advertising channels, which is the physical starting point of a campaign. To illustrate the significance of advertiser-channel interactions, consider for example a small business who is relatively well-informed through attribution research that Google Ads and Meta ads are the two most effective channels for its products. The business instantiates its ad campaigns through interacting with the platforms ad management interfaces (see Figure 1), on which the business utilizes levers such as specifying budget and a target ROI1 to control campaigns. Channels then inputs these specified parameters into their autobidding procedures, where they procure impressions on the advertiser s behalf to procure ads through automated blackbox algorithms. Eventually, channels report performance metrics such as expenditure and conversion back to the advertiser once the campaign ends. Therefore, one of the most important decisions advertisers need to make involves how to optimize over these levers provided by channels. Unfortunately, this has rarely been addressed in attribution analytics and relevant literature. Hence, this works contributes to filling this vacancy by addressing two themes: How effective are these channel levers for advertisers to achieve their conversion goals? And how should advertisers optimize over such levers? To answer these questions, we study a setting where an advertiser simultaneously procures ads on multiple channels, 1Target ROI is the numerical inverse of CPA or cost per action on Google Ads, and cost per result goal in Meta Ads. Multi-channel Autobidding with Budget and ROI Constraints Figure 1. Interfaces on Google Ads (left) and Meta Ads Manager (right) for creating ad campaigns that allow advertisers to set per-channel budgets and ROIs. CPA, or cost per action on Google Ads, as well as cost per result goal on Meta Ads Manager, are effectively the inverse value for an advertiser s per-channel target ROI. Meta Ads Manager highlights its autobidding procedure maximizes total conversion while respecting advertisers per-channel target ROI (see red box highlighted), supporting the GL-OPT and CH-OPT models in Eq. (1), (3). each of which consists of multiple ad auctions that sell ad impressions. The advertiser s global optimization problem is to maximize total conversion over all channels, while respecting a global budget constraint that limits total spend, and a global ROI constraint that ensures total conversion is at least the target ROI times total spend. However, as channels operate as independent entities and conduct autobidding procurement on behalf of advertisers, there are no realistic means for an advertiser to implement the global optimization problem via optimizing over individual auctions. Instead, advertisers can only use two levers, namely a per-channel ROI and per-channel budget, to influence how channels should autobid for impressions. Our goal is to understand how effective are these levers by comparing the total conversion via optimizing levers versus the globally optimal conversion, and also present methodologies to help advertisers optimize over the usage of these levers. We summarize our contributions as followed: Main contributions. 1. Modelling ad procurement through per-channel ROI and budget levers. In Section 2 we develop a novel model for online advertisers to optimize over the per-channel ROI and budget levers to maximize total conversion over channels while respecting a global ROI and budget constraint. This multi-channel optimization model closely imitates realworld practices (see Figure 1 for evidence), and to the best of our knowledge is the first of its kind to characterize advertisers interactions with channels to run ad campaigns. 2. Solely optimizing per-channel budgets are sufficient to maximize conversion. In Theorem 3.2 of Section 3, we show that solely optimizing for per-channel ROIs is inadequate to optimize conversion across all channels, possibly resulting in arbitrarily worse total conversions compared to the global optimal where advertisers can optimize over individual auctions. In contrast, in Theorem 3.3 and Corollary 3.4 we show solely optimizing for per-channel budgets allows advertisers to achieve the global optimal. 3. Algorithm to optimize per-channel budget levers. Under a realistic bandit feedback structure where advertisers can only observe the total conversion and spend in each channel after making a per-channel budget decision, in Section 4 we develop an algorithm that augments stochastic gradient descent (SGD) with the upper-confidence bound (UCB) algorithm, and eventually outputs a per-channel budget profile with which advertisers can achieve O(T 1/3) approximation accuracy in total conversion compared to that of the optimal per-channel budget profile within T iterations. Our algorithm relates to constrained convex optimization with uncertain constraints and bandit feedback under a one point estimation regime, and to the best of our knowledge, our proposed algorithm is the first to handle such a setting; see discussions in Remark 4.9 of Section 4. Related works. We review literature that relates to key themes of this work: autobidding, budget and ROI management, and constrained optimization with bandit feedback. 1. Autobidding. The autobidding model has been formally developed in (Aggarwal et al., 2019), and has been analyzed through the lens of welfare efficiency or price of anarchy in (Deng et al., 2021; Balseiro et al., 2021a; Deng et al., 2022b; Mehta, 2022), as well as individual advertiser fairness in (Deng et al., 2022a). The autobidding model has also been compared to classic quasi-linear utility models in (Balseiro et al., 2021b). The autobidding model considered in these papers assume advertisers can directly optimize over individual auctions, whereas in this work we address a more realistic setting that mimics practice where advertisers can only use levers provided by channels, and let channels procure ads on their behalf. Multi-channel Autobidding with Budget and ROI Constraints 2. Budget and ROI management. Budget and ROI management strategies have been widely studied in the context of mechanism design and online learning. (Balseiro et al., 2017) studies the system equilibria of a range of budget management strategies in terms of the platforms profits and advertisers utility; (Balseiro & Gur, 2019; Balseiro et al., 2022) study online bidding algorithms (called pacing) that help advertisers achieve high utility in repeated second-price auctions while maintaining a budget constraint, whereas (Feng et al., 2022) studies similar algorithms but considers respecting a long term ROI constraint in addition to a fixed budget. All of these works on budget and ROI management focus on bidding strategies in a single repeated auction where advertisers decisions are bid values submitted directly to the auctions. In contrast, this work studies advertisers making decisions on how to adjust per-channel ROI and budget levers while leaving the bidding to channels blackbox algorithms. 3. Constrained optimization with bandit feedback. Lemma 4.5 of Section 4 shows the advertiser s optimization problem for per-channel budgets is a constrained convex problem with bandit feedback. Due to space limitations, here we only review two very recent works (Usmanova et al., 2019; Nguyen & Balasubramanian, 2022) that are most relevant to this paper, and refer readers to other related works in our extended literature review in Appendix A.1. Both works consider a similar convex setting to ours where the objective and constraints are only available through noisy function value evaluations. Although the two works achieve O(T 1) and O(T 1/2) respective approximation accuracy to the optimal solution, which contrasts our O(T 1/3) accuracy, these works impose several assumptions that are stronger than the ones that we consider. First, the objective and constraint functions are smooth (i.e. the gradients are Lipschitz continuous) and (Nguyen & Balasubramanian, 2022) further assume strong convexity. But in our work, our objectives and constraints are piece-wise linear and do not satisfy such salient properties. Second, and most importantly, both works consider a setting with two point estimations that allows the optimizer to access the objective and constraint function values twice in each iteration, enabling more efficient estimations. This work, however, lies in the one-point setting where we can only access function values once per iteration. Finally, we remark that the optimal accuracy/oracle complexity for the one-point setting for constrained (non-smooth) convex optimization with bandit feedback and unknown constraints remains an open question; see Remark 4.9 for more details. See Appendix A.1 for an extended literature review. 2. Preliminaries Advertisers global optimization problem. Consider an advertiser running a digital ad campaign to procure ad impressions on M N platforms such as Google Ads, Meta Ads Manager etc., each of which we call a channel. Each channel j consists of mj N parallel ad auctions, each of which corresponds to the sale of an ad impression.2 An ad auction n [mj] is associated with a value vj,n 0 that represents the expected conversion (e.g. number of clicks) of the impression on sale, and a cost dj,n 0 that is required for the purchase of the impression. For example, the cost in a single slot second-price auction is the highest competing bid from competitors in the market, and in a posted price auction the cost is simply the posted price by the seller of the impression. Writing vj = (vj,n)n [mj] and dj = (dj,n)n [mj], we assume that zj := (vj, dj) is sampled from some unknown discrete distribution pj supported on unknown finite set Fj Rmj + Rmj + . The advertiser s goal is to maximize total conversion of procured ad impressions, while subject to a return-oninvestment (ROI) constraint that states total conversion across all channels is no less than γ times total spend for some pre-specified target ROI 0 < γ < , as well as a budget constraint that states total spend over all channels is no greater than the total budget ρ 0. Mathematically, the advertiser s global optimization problem is: GL-OPT = max xj [0,1]mj , j [M] j [M] E v j xj j [M] E v j xj γd j xj 0 P j [M] E d j xj ρ . Here, the decision variable xj [0, 1]mj is a vector where xj,n denotes whether the impression in auction n for channel j is procured. We remark that x depends on the realization of z = (vj, dj)j [M] and is also random. We note that the ROI and budget constraints are taken in expectation because an advertiser procures impressions from a very large number of auctions in total, and thus the advertiser typically only requires to satisfy constraints in an average sense. GL-OPT is a widely adopted formulation for autobidding practices in modern online advertising; see e.g. (Aggarwal et al., 2019; Balseiro et al., 2021a; Deng et al., 2021; 2022b). In Section 5 we discuss more general advertiser objectives. Our overarching goal is to develop methodologies that enable an advertiser to achieve total campaign conversion that match GL-OPT. However, directly optimizing GL-OPT may not be plausible as we discuss in the following. Advertisers levers to solve their global problems. To 2Ad auctions for each channel may be run by the channel itself or other external ad inventory suppliers such as web publishers. Multi-channel Autobidding with Budget and ROI Constraints solve the global optimization problem GL-OPT, ideally advertisers would like to optimize over individual auctions across all channels. However, in reality channels operate as independent entities, and typically do not provide means for general advertisers to participate in specific individual auctions at their discretion. Instead, channels provide advertisers with specific levers to express their ad campaign goals on spend and conversion. In this work, we focus on two of the most widely used levers, namely the per-channel ROI target and per-channel budget (see illustration in Fig. 1). After an advertiser inputs these parameters to a channel, the channel then procures ads on behalf of the advertiser through autonomous programs (we call this programmatic process autobidding) to help advertiser achieve procurement results that match with the inputs. We elaborate on this later. Formally, we consider the setting where for each channel j, an advertiser is allowed to input a per-channel target ROI 0 γj < , and a per-channel budget ρj [0, ρ] where we recall ρ > 0 is the total advertiser budget for a certain campaign. Then, the channel uses these inputs in its autobidding algorithm to procure ads, and returns the total conversion Vj(γj, ρj; zj) 0 , as well as total spend Dj(γj, ρj; zj) 0 to the advertiser, where we recall zj = (vj, dj) Fj is the realized vector of value-cost pairs in channel j; Vj and Dj will be further specified later. As the advertiser has the freedom of choice to input either per-channel target ROI s, budgets, or both, we consider three options: 1. input only a per-channel target ROI; 2. input only a per-channel budget; 3. input both per-channel target ROI and budgets. Such options correspond to the following decision sets for (γj, ρj)j [M]: Per-channel budget only option: IB = {(γj, ρj)j [M] R2 M + : γj = 0, ρj [0, ρ] for j}. Per-channel target ROI only option: IR = {(γj, ρj)j [M] R2 M + : γj 0, ρj = for j}. General option: IG = {(γj, ρj)j [M] : γj 0, ρj [0, ρ] for j}. The advertiser s goal in practice is to maximize their total conversion of procured ad impressions through optimizing over per-channel budgets and target ROIs, while subject to the global ROI and budget constraint similar to those in GL-OPT. Mathematically, for any option I {IB, IR, IG}, the advertiser s optimization problem can be written as CH-OPT(I) = max (γj,ρj)j I j M E [Vj(γj, ρj; zj)] j M E [Vj(γj, ρj; zj) γDj(γj, ρj; zj)] 0 P j [M] E [Dj(γj, ρj; zj)] ρ , where the expectation is taken w.r.t. randomness in zj. We remark that for any channel j [M], the number of auctions mj as well as the distribution pj are fixed and not a function of the input parameters γj, ρj. The functions (Vj, Dj) that map per-channel target ROI and budgets γj, ρj to the total conversion and expenditure are specified by various factors including but not limited to channel j s autobidding algorithms deployed to procure ads on advertisers behalf, or the auction mechanisms that sell impressions. In this work, we study a general setup that closely mimics industry practices: we assume that on the behalf of the advertiser, each channel aims to optimize their conversion over all mj auctions while respecting the advertiser s input (i.e., per-channel target ROI and budgets). (See e.g. Meta Ads Manager in Figure 1 specifically highlights the channel s autobidding procurement methodology which supports this setup). Hence, each channel j s optimization problem can be written as x j(γj, ρj; zj) = arg max x [0,1]mj v j x s.t. v j x γjd j x, d j x ρj , (4) where x = (xn)n [Mj] [0, 1]mj denotes the vector of probabilities to win each of the parallel auctions, i.e. xn [0, 1] is the probability to win auction n [mj] in channel j. In light of this representation, the corresponding conversion and spend functions are given by Vj(γj, ρj; zj) = v j x j(γj, ρj; zj) Vj(γj, ρj) = E[Vj(γj, ρj; zj)] Dj(γj, ρj; zj) = d j x j(γj, ρj; zj) Dj(γj, ρj) = E[Dj(γj, ρj; zj)] . Here, the expectation is taken w.r.t. randomness in zj = (vj, dj). We assume that for any (γj, ρj) and realization zj, Vj(γj, ρj; zj) is bounded above by some absolute constant V (0, ) almost surely. We remark that Eq.(5) assumes channels are able to achieve optimal procurement performance. Later in Section 5, we briefly discuss setups where channels does not optimally solve for Eq.(4). Key questions and organization of this paper. 1. (Section 3) How effective are the per-channel ROI and budget levers to help advertisers achieve the globally optimal conversion GL-OPT while respecting the global ROI and budget constraints? In particular, for each option I {IB, IR, IG} defined in Eq. (2), what is the discrepancy between CH-OPT(I) versus the optimal GL-OPT? 2. (Section 4) How can advertisers optimize per-channel target ROIs and budgets to solve for CH-OPT(I)? 3. On the Efficacy of Per-channel Levers In this section, we examine the effectiveness of the perchannel target ROI and per-channel budget levers in achiev- Multi-channel Autobidding with Budget and ROI Constraints ing the global optimal GL-OPT. In particular, we study if the optimal solution to the channel problem CH-OPT(I) defined in Eq. (3) for I {IB, IR, IG} is equal to the global optimal GL-OPT. Our first result in this section is the following Lemma 3.1 which shows that GL-OPT serves as a theoretical upper bound for an advertiser s conversion through optimizing CH-OPT(I) with any option I. Lemma 3.1 (GL-OPT is the theoretical upper bound for conversion). For any option I {IB, IR, IG}, we have GL-OPT CH-OPT(I). The proof of Lemma 3.1 is deferred to Appendix B.1. In light of the theoretical upper bound GL-OPT, we are now interested in the gap between GL-OPT and CH-OPT(I) for option I {IB, IR, IG}. In the following Theorem 3.2, we show that there exists a problem instance under which the ratio CH-OPT(IR) GL-OPT nears 0, implying the per-channel ROIs alone fail to help advertisers optimize conversion. Theorem 3.2 (Per-channel ROI only option fails to optimize conversion). Consider an advertiser with a (global) target ROI of γ = 1 procuring impressions from M = 2 channels with 1 and 2 auctions, respectively. The advertiser has unlimited budget ρ = , and chooses the per-channel target ROI only option IR defined in Eq. (2). Assume there is only one realization of value-cost pairs z = (vj, dj)j [M] (i.e. the support F = F1 F2 is a singleton), and the realization is presented in the following table, where X > 0 is some arbitrary parameter. Then, lim X CH-OPT(IR) GL-OPT = 0. Channel 1 Channel 2 Auction 1 Auction 2 Auction 3 Value vj,n 1 X 2X Spend dj,n 0 1 + X 2(1 + X) See proof in Appendix B.2. In contrast, the budget only option IB in fact allows an advertiser s conversion to reach the theoretical upper bound GL-OPT through solely optimizing for per-channel budgets. This is formalized in the following theorem (see proof in Appendix B.3). Theorem 3.3 (Per-channel budget suffices to achieve optimal conversion). For the budget only option IB defined in Eq.(2), we have GL-OPT = CH-OPT(IB) for any global target ROI γ > 0 and total budget ρ > 0, even for ρ = . As an immediate extension of Theorem 3.3, the following Corollary 3.4 states per-channel ROI s in fact become redundant once advertisers optimize for per-channel budgets. Corollary 3.4 (Redundancy of per-channel ROIs). For the general option IG defined in Eq.(2) where an advertiser sets both per-channel ROI and budgets, we have GL-OPT = CH-OPT(IG) for any aggregate ROI γ > 0 and total budget ρ > 0, even for ρ = . Further, there exists an optimal solution (γj, ρj)j [M] to CH-OPT(IG), s.t. γj = 0 for all j [M]. In light of the redundancy of per-channel ROIs as illustrated in Corollary 3.4, in the rest of the paper we will fix γj = 0 for any channel j [M], and omit γj in all relevant notations; e.g. we will write Dj(ρj; zj) and Dj(ρj), instead of Dj(γj, ρj; zj) and Dj(γj, ρj). Equivalently, we will only consider the per-channel budget only option IB. 4. Optimization Algorithm for Per-channel Budgets under Bandit Feedback In this section, we develop an efficient algorithm to solve for per-channel budgets that optimize CH-OPT(IB) defined in Eq. (3), which achieves the theoretical optimal conversion, namely GL-OPT, as illustrated in Theorem 3.3. In particular, we consider algorithms that run over T > 0 periods, where each period for example corresponds to the duration of 1 hour or 1 day. At the end of T periods, the algorithm produces some per-channel budget profile (ρj)j [M] [0, ρ]M that approximates CH-OPT(IB), and satisfies aggregate ROI and budget constraints, namely P j M Vj(ρj) γ P j M Dj(ρj), P j [M] Dj(ρj) ρ , where we recall (Vj(ρj), Dj(ρj)) are defined in Eq. (5). The algorithm proceeds as follows: at the beginning of period t [T], the advertiser sets per-channel budgets (ρj,t)j [M], while simultaneously values and costs zt = (zj,t)j [M] = (vj,t, dj,t)j [M], where (vj,t, dj,t) Rmj + Rmj + are sampled (independently in each period) from finite support F = F1 . . . FM according to discrete distributions (pj)j [M]. Each channel j then takes as input ρj,t [0, ρ] and procures ads on behalf of the advertiser, and reports the total realized conversion Vj(ρj,t; zj,t) as well as total spend Dj(ρj,t; zj,t) to the advertiser (see definitions in Eq. (5)). For simplicity we assume any realization zj = (vj, dj) Fj admits an ordering vj,1 dj,1 > > vj,mj dj,mj for all channels j [M]. Bandit feedback: We highlight that the advertiser receives bandit feedback from channels, i.e. the advertiser only observes the numerical values Vj(ρj,t; zj,t) and Dj(ρj,t; zj,t), but does not get to observe Vj(ρ j; z j) and Dj(ρ j; z j) evaluated at any other per-channel budget ρ j = ρj,t and realized value-cost pairs z j = zj,t. We also make two mild assumptions: In Assumption 4.1, we assume that any channel will deplete input per-channel budgets. This is a natural assumption that mimics practical scenarios, e.g. marketing for small businesses that have moderate-sized budgets. In Assumption 4.2, we assume for any realization of value-cost pairs zj = (vj, dj) in a Multi-channel Autobidding with Budget and ROI Constraints channel j [M], there always exists an auction n [mj] in this channel whose value-to-cost ratio is at least γ, i.e. vj,n > γdj,n. Assumption 4.1 (Moderate budgets). Assume ρ < , and for any channel j [M], value-cost realization zj = (vj, dj) Fj, and per-channel budget ρj [0, ρ], the optimal solution x j(ρj; zj) defined in Eq. (4) is budget binding, i.e. Dj(ρj; zj) = d j x j(ρj; zj) = ρj. Assumption 4.2 (Strictly feasible global ROI constraints). Fix any channel j [M] and any realization of value-cost pairs zj = (vj, dj) Fj. Then, the channel s optimization problem in Eq. (4) is strictly feasible, i.e. the set xj [0, 1]mj : v j xj > γd j xj is nonempty. 4.1. Optimize per-channel budgets with SGD-UCB Here, we describe our algorithm to solve for optimal perchannel budgets w.r.t. CH-OPT(IB). Similar to most algorithms for constrained optimization, we take a dual stochastic gradient descent (SGD) approach; see a comprehensive survey on dual descent methods in (Bertsekas, 2014). First, we consider the Lagrangian functions w.r.t. CH-OPT(IB) where we let c = (λ, µ) R2 + be the dual variables corresponding to the ROI and budget constraints, respectively: Lj(ρj, c; zj) = (1 + λ)Vj(ρj; zj) (λγ + µ)ρj Lj(ρj, c) = E [Lj(ρj, c; zj)] . (6) Then, in each period t [T] given dual variables ct = (λt, γt), SGD decides on a primal decision, i.e. per-channel budget (ρj,t)j [M] by optimizing the following ρj,t = arg maxρj [0,ρ] Lj(ρj, ct; zj,t) . (7) Having observed the realized values (Vj(ρj,t; zj,t))j [M] (note that spend is (ρj,t)j [M] under Assumption 4.1), we calculate the current period violation in budget and ROI constraints, namely g1,t := P j M (Vj(ρj,t; zj,t) γρj,t) and g2,t = ρ P j [M] ρj,t. Next, we update dual variables via Π[0,CF ] (λt ηg1,t) and µt+1 = Π[0,CF ] (µt ηg2,t), where Π is the projection operator, η is some pre-specified step size, and CF is some dual variable upper bound specified in Eq. (9).3 However, we cannot realistically find the primal decisions by solving Eq. (7) since the function Lj( , ct; zj,t) is unknown due to the bandit feedback structure. Therefore, we provide a modification to SGD to handle this issue. We briefly note that although bandit feedback prevents naively applying SGD to our problem, this may not be the case in other online advertising scenarios that involve relevant learning tasks, underlining the challenges of our problem; see Remark A.1 in Appendix A.2 for comparisons with related works. 3One can also employ more general mirror descent dual variable updates; see e.g. (Balseiro et al., 2022). To handle bandit feedback, we take a natural approach to augment SGD with the celebrated upper-confidence bound (UCB) algorithm; see intro to UCB and multi-arm bandits in (Slivkins et al., 2019). In particular, we first discretize our per-channel budget decision set [0, ρ] into granular arms separated by distance δ > 0: A(δ) = {ak}k [K] where ak = (k 1)δ . (8) for K := ρ/δ + 1. In the following we will use the terms per-channel budget and arm interchangeably. In the spirit of UCB, in each period t we maintain some estimate j [M] of the conversions (Vj(ak))j [M] as well as an upper confidence bound UCBj,t(ak) for each arm ak using historical payoffs from periods in which arm ak is pulled. Finally, we update primal decisions for each channel j [M] using the best arm ρj,t = arg maxak A(δ) (1 + λt) V j,t(ak) + UCBj,t(ak) (λtγ + µt) ak. Finally, to ensure aggregate ROI and budget constraint satisfaction, we maintain variables that check ROI and budget balances, namely S1,t and S2,t, to record the cumulative ROI and spend across all channels up until period t. When the ROI balance check S1,t is too negative, or the budget balance check is too large, we stop the algorithm, and naively set some pre-defined small per-channel budget ρ (0, ρ) (later chosen in Theorem 4.8) during all periods after the stopping time denoted as τA. We remark that similar approaches to ensure constraint satisfaction has been introduced in e.g. (Balseiro et al., 2022; Feng et al., 2022). We summarize our algorithm, called SGD-UCB, in Algorithm 1.4 4.2. Analyzing the SGD-UCB algorithm In this subsection, we analyze the performance of SGDUCB in Algorithm 1, and present accuracy guarantees on the final output ρT = 1 T P t [T ] ρj,t backbone of our analysis strategy is to show the cumulative loss over T periods, namely T GL-OPT E h P j [M] Vj(ρj,t) i consists of three main parts, namely the stopping error due to some condition for the while loop being violated and naively setting a small perchannel budget ρ after the stopping time τA (see step 10); the error induced by UCB in our algorithm; and the error due to SGD (or what is typically viewed as the deviations from complementary slackness); see following Proposition 4.3. Then we further bound each part. 4There has been very recent works that combine SGD with adversarial bandit type algorithms such as EXP3 (Castiglioni et al., 2022), or with Thompson sampling which is another well-known algorithm for stochastic bandit problems (e.g. (Ding et al., 2021)), and works that employ SGD in bandit problems (e.g. (Han et al., 2021)). Yet to the best of our knowledge, our approach to integrate SGD with UCB is novel. Multi-channel Autobidding with Budget and ROI Constraints Algorithm 1 SGD-UCB 1: Input: Budget discretization set of arms A(δ) defined in Eq.(8). Step size η > 0. Initialize Nj,1(ak) = V j,1(ak) = 0 for all j [M] and k [K], and dual variables λ1 = µ1 = 0. Set ρ (0, ρ/M), β > 0 and dual variable upper bound CF = MV max n 1 βρ, 1 ρ Mρ where V maxj [M] maxρj [0,ρ] maxzj Fj Vj(ρj, zj) is the conversion upper bound. 2: Set initial constraint balance checks: S1,t = S2,t = 0 for t = 1, and start period counter t = 1. 3: while t T and S1,t γMρ + βρ(T t) 0 and S2,t + Mρ + Mρ(T t) ρT do 4: Set per-channel budget. For each channel j [M]: If t K, set ρj,t = at. Else if t > K, set ρj,t = arg maxak A(δ) V j,t(ak) + UCBj,t(ak) (λtγ+µt)ak where UCBj,t(ak) = q 2 log(T ) Nj,t(ak). 5: Observe realized conversion {Vj(ρj,t; zj,t)}j [M], and update for each arm k [K] and channel j [M] Nj,t+1(ak) = Nj,t(ak) + I{ρj,t = ak} V j,t+1(ak) = Nj,t(ak)V j,t(ak)+Vj(ρj,t;zj,t)I{ρj,t=ak} 6: Update dual variables. Calculate g1,t = P j M (Vj(ρj,t; zj,t) γρj,t) and g2,t = ρ P j [M] ρj,t. Then, set λt+1 = Π[0,CF ] (λt ηg1,t) , µt+1 = Π[0,CF ] (µt ηg2,t) . (10) 7: Update balance check: S1,t+1 = S1,t + g1,t and S2,t+1 = S2,t + P j [M] ρj,t. 8: Increment period counter t t + 1. 9: end while 10: Record τA = t 1 and for all t = τA + 1 . . . T set ρj,t = ρ for all j [M]. 11: Output ρT = 1 T P t [T ] ρj,t Proposition 4.3 (Regret decomposition). For any channel j [M] define ρ j(t) = arg maxρj [M] Lj(ρj; ct) to be the optimal per-channel budget w.r.t. dual variables ct = (λt, µt)t [T ]. Then T GL-OPT P j [M] Vj(ρj,t) is bounded by MV (T τA) | {z } Stopping error t [τA] (λtg1,t + µtg2,t) | {z } SGD complementary slackness deviations t [τA] Lj(ρ j(t), λt, µt) Lj(ρj,t, λt, µt) | {z } UCB error where τA [T] is defined in step 10 of Algorithm 1. Recall the definitions of g1,t and g2,t in step 5 of Algorithm 1, and the fact that the conversion Vj(ρj; zj) is bounded above by absolute constant V (0, ) almost surely. We bound the stopping error together with SGD complementary slackness violations in the following Lemma 4.4, which follows standard analyses for SGD; see proof in Appendix C.2. Lemma 4.4 (Bounding stopping error and complementary slackness deviations). Assume Assumptions 4.1 and 4.2 hold. Recall η > 0 is the step size. Then we have MV (T t [τA](λtg1,t + µtg2,t) O ηT + 1 Challenges in bounding UCB error due to adversarial contexts and continuum-arm dicretization. Bounding our UCB error is much more challenging than doing so in classic stochastic multi-arm bandit settings: first, our setup involves discretizing a continuum of arms i.e. our discretization in Eq.(8) for [0, ρ]; second, and more importantly, the dual variables {ct}t [T ] are effectively adversarial contexts since they are updated via SGD instead of being stochastically sampled from some nice distribution, and correspondingly the Lagrangian function Lj(ak, ct; zt) can be viewed as a reward function that maps any arm-context pair (ak, ct) to (stochastic) payoffs. Both continuum-arms and adversarial contexts have been notorious in making reward function estimations highly inefficient; see e.g. discussions in (Agrawal, 1995; Agarwal et al., 2014). We further elaborate on specific challenges that adversarial contexts bring about: 1. Boundedness of rewards. In classic stochastic multiarm bandtis and UCB, losses in total rewards grow linearly with the magnitude of rewards. In our setting, the reward function, i.e. the Lagrangian function Lj(ak, ct; zt), scales linearly with the magnitude of contexts (see Eq. (6), so large contexts (i.e. large dual variables) may lead to large losses. 2. Context-dependent exploration-exploitation tradeoffs. The typical trade-off for arm exploration and exploitation in our setting depends on the particular values of the contexts (i.e. the dual variables), which means there may exist bad contexts that lead to poor tradeoffs that require significantly more explorations to achieve accurate estimates of arm rewards than other good contexts. We elaborate more in Lemma 4.7 and discussions thereof. We first handle continuum arm discretization via showing the specific form of conversion functions V (ρj; z) in Eq. (4) induces salient structures for the Lagrangian function, namely it is continuous, piecewise linear, concave, and unimodal5; we present the proof in Appendix C.3 Lemma 4.5 (Structural properties). For any channel j [M] and per-channel budget ρj the conversion function Vj(ρj) is continuous, piece-wise linear, strictly increasing, 5A function f : R R is unimodal if y such that f(y) strictly increases when y y and strictly decreases when y y . Multi-channel Autobidding with Budget and ROI Constraints and concave. In particular, Vj(ρj) takes the form n [Sj] (sj,nρj + bj,n) I{ρj [rj,n 1, rj,n]} , where Sj N and {(sj,n, bj,n, rj,n)}n [Sj] only depend on the support Fj and distribution pj from which value and costs are sampled. These parameters satisfy sj,1 > > sj,Sj > 0 and 0 = rj,0 < rj,1 < < rj,Sj = ρ, as well as bj,n 0 s.t. sj,nrj,n + bj,n = sj,n+1rj,n + bj,n+1 for all n [Sj 1], with bj,1 = 0. This implies Vj(ρj) is continous in ρj. For any dual variables c = (λ, µ) R2 +, Lj(ρj, c) defined in Eq. (6) is continuous, piece-wise linear, concave, and unimodal in ρj. In particular, Lj(ρj, c) = PSj n=1 σj,n(c)ρj + b j,n I{ρj [rj,n 1, rj,n]} where the slope σj,n(c) = (1 + λ)sj,n (µ + γλ) and b j,n = (1 + λ)bj,n. This implies arg maxρj 0 Lj(ρj, c) = max{rj,n : n = 0, 1 . . . , Sj , σj,n(c) 0}. In fact, for any realized value-cost pairs z, the realization versions of the conversion and Lagrangians functions, namely Vj(ρj; z) and Lj(ρj, c; z), also satisfy the same properties as those of Vj(ρj) and Lj(ρj, c). We provide a visual illustration for these properties in Figure 2. Figure 2. Illustration of Lagrangian functions defined in Eq. (6) with Mj = 2 auctions in channel j, and support Fj that contains 3 elements, z(1) = ((8, 2), (2, 3)), z(2) = ((3, 4), (1, 4)), z(3) = ((8, 1), (4, 2)), and context c = (λ, µ) = (4, 2). Under Lemma 4.5, Sj = 5, where the turning points rj,0 . . . rj,Sj are indicated on the x-axis, and the optimal budget w.r.t. c is arg maxρj [0,ρ] Lj(ρj; c) = rj,2. The adjacent slopes in Eq. (11) are σ j (c) = σj,2(c), and σ+ j (c) = σj,3(c). We now handle the reward boundedness issue for the Lagrangian functions that arise from adversarial contexts: in Lemma 4.6 (proof in Appendix C.4), we show the Lagrangian functions are bounded by some absolute constants: Lemma 4.6 (Bounding Lagrangian functions). For any t [T], j [M] and ρj [0, ρ] we have (1 + γ) ρCF Lj(ρj, λt, µt) (1 + CF )V , where the dual variables (λt, µt)t [T ] are generated from Algorithm 1. We now address the context-dependent explorationexploitation tradeoff. To illustrate (e.g. Figure 2), define the slopes that are adjacent to the optimal per-channel budget w.r.t. c = (λ, µ) as followed: assume the nth turning point rj,n = arg maxρj [0,ρ] Lj(ρj, c), then σ j (c) = σj,n(c) and σ+ j (c) = σj,n+1(c) (11) Similar to standard exploration-exploitation tradeoffs in bandits, the flatter the slope (e.g. σ j (c) is close to 0), the more pulls required to accurately estimate rewards for sub-optimal arms on the slope, but the lower the loss in conversion for pulling sub-optimal arms. Our setting is challenging because the magnitude of this tradeoff, or equivalently adjacent slopes σ j (c) and σ+ j (c), depend on the adversarial contexts. In Lemma 4.7 we bound the UCB error by handling this context-dependent tradeoff through separately analyzing periods when the adjacent slopes σ j (c) and σ+ j (c) are less or greater than some parameter σ > 0 chosen later, and characterize the context-dependent tradeoff using σ. Lemma 4.7 (Bounding UCB error). Assume the discretization width δ satisfies δ < rj := minn [Sj] rj,n rj,n 1, where Sj and {rj,n}Sj n=0 are defined in Lemma 4.5. Then the UCB error in Proposition 4.3 is upper bounded by O δT + σT + 1 σδ , where σ > 0 is any positive number. See Appendix C.5 for the proof. Finally, we put together Proposition 4.3, Lemma 4.4 and 4.7, and obtain the main result Theorem 4.8 whose proof we detail in Appendix C.6 Theorem 4.8 (Putting everything together). Assume assumptions 4.1 and 4.2 hold. Take step size η = Θ(1/ T), discretization width δ = Θ(T 1/3) and β = ρ = 1 log(T ) in Algorithm 1, as well as σ = Θ(T 1/3) in Lemma 4.7. Then, for large enough T we have T GL-OPT E h P t [T ] P j [M] Vj(ρj,t) i O(T 2/3). Further, recalling ρT is the final output of Algorithm 1, we have j [M] E Vj(ρj,T ) O(T 1/3) and constraint satisfaction: ρ P j [M] E[ρj,T ] 0, as well as P j [M] E[Vj(ρj,T ) γρj,T ] 0. We make an important remark that distinguishes our result in Theorem 4.8 with related literature on convex optimization: Remark 4.9. In light of Lemma 4.5, the advertiser s optimization problem CH-OPT(IB) in Eq. (3) effectively becomes a convex problem (see Proposition C.5 in Appendix C.7). Hence it may be tempting for one to directly employ off-the-shelf convex optimization algorithms. However, our problem involves stochastic bandit feedback, and more importantly, uncertain constraints, meaning that we cannot analytically determine whether a primal decision satisfies the constraints of the problem. For Multi-channel Autobidding with Budget and ROI Constraints example, in CH-OPT(IB), for some primal decision (ρj)j [M], we cannot determine whether the ROI constraint P j M E [Vj(γj, ρj; zj) γDj(γj, ρj; zj)] 0 holds because the distribution (pj)j [M] from which z is sampled is unknown. To the best of our knowledge, there are only two recent works that handle a similar stochastic bandit feedback, and uncertain constraint setting, namely (Usmanova et al., 2019) and (Nguyen & Balasubramanian, 2022). Nevertheless, our setting is more challenging because these works consider a two-point estimation regime where one can make function evaluations to the objective and constraints twice each period, whereas our setting involves one-point estimation such that we can only make function calls once per period. We note the optimal oracle complexities for unknown constraint convex optimization with one-point bandit feedback, remains an open problem.6 5. Additional Discussions See more details on following discussions in Appendix A.3. General advertiser objectives. In GL-OPT and CH-OPT(I) we can consider more general objectives, namely P j [M] E v j xj αd j xj and P j M E [Vj(γj, ρj; zj) αVj(γj, ρj; zj)] for some private cost α [0, γ]. Our results in Section 3 still hold, and Algorithm 1 can still produce per-channel budgets that are approximately optimal via introducing α into the Lagrangian. Note α = 0 recovers models in the previous section, whereas α = 1 yields classic quasi-linear utilities. Ad auctions selling multiple impressions. In GL-OPT and channels autobidding problem Eq.(4), a single impression is sold per auction. Yet, our insights in Section 3 also hold for auctions that sell multiple impressions from which advertisers can at most procure 1, e.g. position auctions such as VCG or generalized second price (GSP). To see this, use our original notation (vj, dj) to represent the concatenation of all value-cost pairs of individual impressions across all auctions in channel j, while xj is the vector of indicators for individual impressions. Further, our SGD-UCB algorithm produces accurate lever estimates for auctions whose induced conversion function Vj(ρj) still possesses properties in Lemma 4.5. This holds for position auctions whose marginal cost for winning a higher position increases, e.g. VCG (but not GSP). Non-optimal autobidding in channels. Eq. (4) assumes channels adopt optimal autobidding , and natural questions regarding non-optimal autobidding lead to interesting future research. In such a scenario, an advertiser s (bandit) feedback in channel j is Vj(γj, ρj; zj) ϵj for some loss ϵj > 0. One potential resolution is to treat such loss as adversarial corruptions to bandit rewards, and 6See Table 4.1 in (Larson et al., 2019) for best known complexity bounds for one-point bandit feedback setups. augment SGD with bandit algorithms robust to adversarial corruptions; e.g. (Lykouris et al., 2018; Gupta et al., 2019). Agarwal, A., Hsu, D., Kale, S., Langford, J., Li, L., and Schapire, R. Taming the monster: A fast and simple algorithm for contextual bandits. In International Conference on Machine Learning, pp. 1638 1646. PMLR, 2014. Aggarwal, G., Badanidiyuru, A., and Mehta, A. Autobidding with constraints. In International Conference on Web and Internet Economics, pp. 17 30. Springer, 2019. Agrawal, R. The continuum-armed bandit problem. SIAM journal on control and optimization, 33(6):1926 1951, 1995. Audet, C. and Dennis Jr, J. E. A pattern search filter method for nonlinear programming without derivatives. SIAM Journal on Optimization, 14(4):980 1010, 2004. Audet, C. and Tribes, C. Mesh-based nelder mead algorithm for inequality constrained optimization. Computational Optimization and Applications, 71(2):331 352, 2018. Balseiro, S., Kim, A., Mahdian, M., and Mirrokni, V. Budget management strategies in repeated auctions. In Proceedings of the 26th International Conference on World Wide Web, pp. 15 23, 2017. Balseiro, S., Golrezaei, N., Mahdian, M., Mirrokni, V., and Schneider, J. Contextual bandits with cross-learning. Advances in Neural Information Processing Systems, 32, 2019a. Balseiro, S., Golrezaei, N., Mirrokni, V., and Yazdanbod, S. A black-box reduction in mechanism design with private cost of capital. Available at SSRN 3341782, 2019b. Balseiro, S., Deng, Y., Mao, J., Mirrokni, V., and Zuo, S. Robust auction design in the auto-bidding world. Advances in Neural Information Processing Systems, 34: 17777 17788, 2021a. Balseiro, S. R. and Gur, Y. Learning in repeated auctions with budgets: Regret minimization and equilibrium. Management Science, 65(9):3952 3968, 2019. Balseiro, S. R., Deng, Y., Mao, J., Mirrokni, V. S., and Zuo, S. The landscape of auto-bidding auctions: Value versus utility maximization. In Proceedings of the 22nd ACM Conference on Economics and Computation, pp. 132 133, 2021b. Balseiro, S. R., Lu, H., and Mirrokni, V. The best of many worlds: Dual mirror descent for online allocation problems. Operations Research, 2022. Multi-channel Autobidding with Budget and ROI Constraints Bertsekas, D. P. Constrained optimization and Lagrange multiplier methods. Academic press, 2014. Besbes, O., Gur, Y., and Zeevi, A. Stochastic multi-armedbandit problem with non-stationary rewards. Advances in neural information processing systems, 27, 2014. Braverman, M., Mao, J., Schneider, J., and Weinberg, M. Selling to a no-regret buyer. In Proceedings of the 2018 ACM Conference on Economics and Computation, pp. 523 538, 2018. B urmen, Á., Puhan, J., and Tuma, T. Grid restrained neldermead algorithm. Computational optimization and applications, 34(3):359 375, 2006. Castiglioni, M., Celli, A., Marchesi, A., Romano, G., and Gatti, N. A unifying framework for online optimization with long-term constraints. ar Xiv preprint ar Xiv:2209.07454, 2022. Cheung, W. C., Simchi-Levi, D., and Zhu, R. Learning to optimize under non-stationarity. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 1079 1087. PMLR, 2019. Deng, Y., Schneider, J., and Sivan, B. Prior-free dynamic auctions with low regret buyers. Advances in Neural Information Processing Systems, 32, 2019. Deng, Y., Mao, J., Mirrokni, V., and Zuo, S. Towards efficient auctions in an auto-bidding world. In Proceedings of the Web Conference 2021, pp. 3965 3973, 2021. Deng, Y., Golrezaei, N., Jaillet, P., Liang, J. C. N., and Mirrokni, V. Fairness in the autobidding world with machinelearned advice. ar Xiv preprint ar Xiv:2209.04748, 2022a. Deng, Y., Mao, J., Mirrokni, V., Zhang, H., and Zuo, S. Efficiency of the first-price auction in the autobidding world. ar Xiv preprint ar Xiv:2208.10650, 2022b. Ding, Q., Hsieh, C.-J., and Sharpnack, J. An efficient algorithm for generalized linear bandit: Online stochastic gradient descent and Thompson sampling. In International Conference on Artificial Intelligence and Statistics, pp. 1585 1593. PMLR, 2021. Drutsa, A. Optimal non-parametric learning in repeated contextual auctions with strategic buyer. In International Conference on Machine Learning, pp. 2668 2677. PMLR, 2020. Dzahini, K. J., Kokkolaras, M., and Le Digabel, S. Constrained stochastic blackbox optimization using a progressive barrier and probabilistic estimates. Mathematical Programming, pp. 1 58, 2022. Fasano, G., Liuzzi, G., Lucidi, S., and Rinaldi, F. A linesearch-based derivative-free approach for nonsmooth constrained optimization. SIAM journal on optimization, 24(3):959 992, 2014. Feng, Z., Padmanabhan, S., and Wang, D. Online bidding algorithms for return-on-spend constrained advertisers. ar Xiv preprint ar Xiv:2208.13713, 2022. Flaxman, A. D., Kalai, A. T., and Mc Mahan, H. B. Online convex optimization in the bandit setting: gradient descent without a gradient. ar Xiv preprint cs/0408007, 2004. Golrezaei, N., Jaillet, P., and Liang, J. C. N. Incentiveaware contextual pricing with non-parametric market noise. ar Xiv preprint ar Xiv:1911.03508, 2019a. Golrezaei, N., Javanmard, A., and Mirrokni, V. Dynamic incentive-aware learning: Robust pricing in contextual auctions. Advances in Neural Information Processing Systems, 32, 2019b. Golrezaei, N., Jaillet, P., Liang, J. C. N., and Mirrokni, V. Bidding and pricing in budget and ROI constrained markets. ar Xiv preprint ar Xiv:2107.07725, 2021a. Golrezaei, N., Manshadi, V., Schneider, J., and Sekar, S. Learning product rankings robust to fake users. In Proceedings of the 22nd ACM Conference on Economics and Computation, pp. 560 561, 2021b. Gupta, A., Koren, T., and Talwar, K. Better algorithms for stochastic bandits with adversarial corruptions. In Conference on Learning Theory, pp. 1562 1578. PMLR, 2019. Han, Y., Liang, Z., Wang, Y., and Zhang, J. Generalized linear bandits with local differential privacy. Advances in Neural Information Processing Systems, 34:26511 26522, 2021. Kannan, P., Reinartz, W., and Verhoef, P. C. The path to purchase and attribution modeling: Introduction to special section, 2016. Larson, J., Menickelly, M., and Wild, S. M. Derivativefree optimization methods. Acta Numerica, 28:287 404, 2019. Luo, H., Wei, C.-Y., Agarwal, A., and Langford, J. Efficient contextual bandits in non-stationary worlds. In Conference On Learning Theory, pp. 1739 1776. PMLR, 2018. Lykouris, T., Mirrokni, V., and Paes Leme, R. Stochastic bandits robust to adversarial corruptions. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pp. 114 122, 2018. Multi-channel Autobidding with Budget and ROI Constraints Mehta, A. Auction design in an auto-bidding setting: Randomization improves efficiency beyond vcg. In Proceedings of the ACM Web Conference 2022, pp. 173 181, 2022. Nguyen, A. and Balasubramanian, K. Stochastic zerothorder functional constrained optimization: Oracle complexity and applications. INFORMS Journal on Optimization, 2022. Pourmohamad, T. and Lee, H. K. The statistical filter approach to constrained optimization. Technometrics, 62 (3):303 312, 2020. Slivkins, A. et al. Introduction to multi-armed bandits. Foundations and Trends in Machine Learning, 12(12):1 286, 2019. Usmanova, I., Krause, A., and Kamgarpour, M. Safe convex learning under uncertain constraints. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 2106 2114. PMLR, 2019. Varian, H. R. Position auctions. international Journal of industrial Organization, 25(6):1163 1178, 2007. Multi-channel Autobidding with Budget and ROI Constraints Appendices for Multi-channel Autobidding with Budget and ROI constraints A. Additional material A.1. Extended Literature Review Generally speaking, our work focuses on advertisers impression procurement process or the interactions between advertisers and impression sellers, which has been addressed in a vast amount of literature in mechanism design and online learning; see e.g. (Braverman et al., 2018; Deng et al., 2019; Golrezaei et al., 2019b;a; Balseiro et al., 2019b; Golrezaei et al., 2021a) to name a few. In addition to the literature review on constrained optimization under bandit feedback in the introduction section, we also discuss additional related works in this area here. Constrained optimization under bandit feedback. Section 4 where we develop an algorithm to optimize over per-channel target budgets relates to the area of convex constrained optimization with bandit feedback (also referred to as zero-order or gradient-less feedback) since in light of Lemma 4.5 in Section 4 our problem of interest is also constrained and convex. First, there has been a plenitude of algorithms developed for deterministic constrained convex optimization under a bandit feedback structures where function evaluations for the objective and constraints are non-stochastic. Such algorithms include filter methods (Audet & Dennis Jr, 2004; Pourmohamad & Lee, 2020), barrier-type methods (Fasano et al., 2014; Dzahini et al., 2022), as well as Nelder-Mead type algorithms (B urmen et al., 2006; Audet & Tribes, 2018); see (Nguyen & Balasubramanian, 2022) and references therein for a comprehensive survey. In contrast to these works, our optimization algorithm developed in Section 4 handles noisy bandit feedback. Regarding works that also address stochastic settings, (Flaxman et al., 2004) presents online optimization algorithms under the known constraint regime, which assumes the optimizer can evaluate whether all constraints are satisfied, i.e. constraints are analytically available. Further, the algorithm achieves a O(T 1/4) accuracy. In this work, our setting is more complex as the optimizer (i.e. the advertiser) cannot tell whether the ROI constrained is satisfied (due to unknown value and cost distributions in each channels auctions). Yet our proposed algorithm can still achieve a more superior O(T 1/3) accuracy due to our specific problem structure. A.2. Additional materials for Section 4 Remark A.1. Our problem of interest to apply SGD under bandit feedback is more difficult than similar problems in related works that study online bidding strategies under budget and ROI constraints; see e.g. (Balseiro et al., 2017; 2022; Feng et al., 2022). To illustrate, consider for instance (Balseiro et al., 2017) in which a budget constrained advertiser s primal decision at period t is to submit a bid value bt after observing her value vt. The advertiser competes with some unknown highest competing bid dt in the market, and after submitting bid bt, does not observe dt if she does not win the competition, which involves a semi-bandit feedback structure. Nevertheless, the corresponding Lagrangian under SGD takes the special form Lj(b, µt; zt) = (vt (1 + µt)dt) I{b dt} where µt is the dual variable w.r.t. the budget constraint. This simply allows an advertiser to optimize for her primal decision by bidding arg maxb 0 Lj(b, ct; zt) = vt 1+µt . So even though (Balseiro et al., 2017; 2022; Feng et al., 2022) study dual SGD under bandit feedback, the special structures of their problem instances permits SGD to effectively optimize for primal decisions in each period, as opposed to Eq. (7) in our setting in which we cannot directly solve for the primal decision due to unknown conversion functions. A.3. Additional materials for Section 5 A.3.1. MORE GENERAL ADVERTISER OBJECTIVES In GL-OPT as well as CH-OPT(I) we can also consider more general objectives, namely maxx1,...,x M P j [M] E v j xj αd j xj and max(γj,ρj)j [M] I P j M E [Vj(γj, ρj; zj) αVj(γj, ρj; zj)] for some private cost α [0, γ]7 in GL-OPT and CH-OPT(I), respectively. When α = 0, we recover our considered models in the previous section, whereas in when α = 1, we obtain the classic quasi-linear utility. We remark that this private cost model has been introduced and studied in related literature; see (Balseiro et al., 2019b) and references therein. Nevertheless, when each channel s autobidding problem remains as is in Eq.(4), i.e. channels still aim to maximize conversion which 7If α > γ the ROI constraints in GL-OPT as well as CH-OPT(I) become redundant. Multi-channel Autobidding with Budget and ROI Constraints causes a misalignment between advertiser objectives and channel behavior, it is not difficult to see in our proofs that all our results still hold in Section 3, and our UCB-SGD algorithm still produces estimates of the same order of accuracy via introducing α into the Lagrangian. In other words, even if channels aim to maximize total conversion for advertisers, advertisers can optimize for GL-OPT with a private cost α through optimizing CH-OPT(I) that also incorporates the same private cost. A.3.2. AD AUCTIONS SELLING MULTIPLE IMPRESSIONS. Recall in GL-OPT and each channel s autobidding problem in Eq. (4) we implicitly assumed each auction sells of a single impression. Interestingly, our results in Section 3 that states solely optimizing per-channel budgets would be sufficient for the advertiser to achieve GL-OPT also holds in the scenario when each single auction corresponds to the sale of multiple impressions in which an advertiser can at most procure 1, or in other words, the auctions are position auctions such as VCG or generalized second price (GSP) auctions. To see this, we can use our original notation (vj, dj) to represent the concatenation of all value-cost pairs of all individual impressions across all auctions in channel j. Correspondingly the decision xj is the vector of indicators that determine whether an individual impression is procured, but with the additional linear constraint that says the sum of indicators within any auction sum up to less than 1 to represent an advertiser can win at most 1 impression. A similar representation can also be used for each channel s autobidding problem in Eq. (4). Then, it is not difficult to see the results and proofs of Theorem 3.3 and Corollary 3.4 still hold valid. Regarding the applicability of our proposed UCB-SGD algorithm in Algorithm 1, it suffices to check if the position auctions induce a conversion function Vj(ρj) that satisfies properties illustrated in Lemma 4.5, in particular whether Vj(ρj; zj) is continuous, concave, piecewise linear, and strictly increasing. We claim that this is true for VCG auctions. Fix some VCG auction n in channel j with L positions, each corresponding to a click through rate (CTR) θℓthat represents the probability of a user viewing that position (so it is natural to assume θ1 > θ2 > . . . θL; see (Varian, 2007) for more details. If an advertiser procures position ℓ, she acquires value v(ℓ) = θℓ v where v represents her expected conversion due to a user viewing her ad (i.e. the value of winning any position), and incurs expenditure p(ℓ) = PL ℓ =ℓ(θℓ θℓ +1)d(ℓ ), where d(ℓ ) is the ℓ th highest competing bid in the market, and we denote θL+1 = 0. Now, this single VCG auction can be viewed as L separated impressions (each sold in a separate single-impression auction) with value-cost pairs v(1) v(2), p(1) p(2) , . . . v(L) v(L+1), p(L) p(L+1) , where we denote v(L+1) = p(L+1) = 0. Correspondingly, the indicator decision variables xn,j in GL-OPT and the channel s autobidding problem Eq. (4) for the original multiimpression auction n of channel j can be viewed as a vector of indicator decisions x {0, 1}L for each of these separated impressions. Now, the proof of Lemma 4.5 still holds if we can show that procuring any separated impression ℓalso implies procuring impression ℓ+ 1, ℓ+ 2 . . . L, or equivalently, v(ℓ) v(ℓ+1) p(ℓ) p(ℓ+1) > v(ℓ ) v(ℓ +1) p(ℓ ) p(ℓ +1) for any ℓ > ℓ(since the advertiser s problem is the LP relaxation of the 0-1 knapsack problem as discussed in the proof of Lemma 4.5). It is thus easy to see this holds because v(ℓ) v(ℓ+1) p(ℓ) p(ℓ+1) = (θℓ θℓ+1) v (θℓ θℓ+1)d(ℓ) = v d(ℓ) (12) which decreases in ℓfor any ℓbecause d(ℓ) is the ℓth highest competing bid. In other words, marginal cost increases as one procures higher positions. Hence, the proof of Lemma 4.5 holds w.r.t. these separated impressions, and thus for VCG auctions, the induced conversion satisfies properties in Lemma 4.5. The insight here is that for any position auctions whose marginal cost increases as the advertiser procures higher positions, properties in Lemma 4.5 hold. However, this is not true for auctions like generalized second price (GSP). A.3.3. NON-OPTIMAL AUTOBIDDING IN CHANNELS. We recall in previous sections we assumed that each channel adopt optimal autobidding that solves Eq. (4) to optimality. This raises the natural question that whether our findings will still hold when channels do not procure ads optimally, perhaps because of non-stationary environments (Besbes et al., 2014; Luo et al., 2018; Cheung et al., 2019), or the presence of strategic market participants who aim to manipulate the market (Golrezaei et al., 2019a; Drutsa, 2020; Golrezaei et al., 2021b;a). In such a scenario, an advertiser s (bandit) conversion feedback in a channel j would be V (γj, ρj; zj) ϵj for some channel-specific and possibly adversarial loss ϵj > 0. One potential resolution is to treat such ϵj as adversarial corruptions to bandit rewards, and instead of integrating vanilla UCB with SGD as in Algorithm 1, augment SGD with bandit algorithms that are robust to corruptions; see e.g. (Lykouris et al., 2018; Gupta et al., 2019). Nevertheless, it remains an open question to prove how such augmentation would perform in our specific bandit-feedback constrained optimization Multi-channel Autobidding with Budget and ROI Constraints setup. This leads to potential research directions of both practical and theoretical significance. B. Proofs for Section 3 B.1. Proof of Lemma 3.1 Fix any option I {IB, IR, IG} defined in Eq. (2), and let (eγ, eρ) I be the optimal solution to CH-OPT(I). Note that for the per-channel ROI only option IR, we have eρj = and for the per-channel budget only we have eγj = 0 for all j [M]. Further, for any realization of value-cost pairs over all auctions z = (vj, dj)j [M], recall the optimal solution x j(eγj, eρj; zj) to Vj(eγj, eρj; zj) for each channel j [M] as defined in Eq. (4). Due to feasibility of (eγ, eρ) I for CH-OPT(I), we have j M E [Vj(eγj, eρj; zj)] γ X j M E [Dj(eγj, eρj; zj)] = X j [M] E v j x j(eγj, eρj; zj) γ X j [M] E d j x j(eγj, eρj; zj) where we used the definitions Vj(eγj, eρj; zj) = v j x j(eγj, eρj; zj) and Dj(eγj, eρj; zj) = d j x j(eγj, eρj; zj) in Eq. (5). This implies x j(eγj, eρj; zj) j [M] satisfies the ROI constraint in GL-OPT. A similar analysis implies x j(eγj, eρj; zj) j [M] also satisfies the budget constraint in GL-OPT. Therefore, x j(eγj, eρj; zj) j [M] is feasible to GL-OPT. So j [M] E v j x j(eγj, eρj; zj) = X j M [Vj(eγj, eρj; zj)] = CH-OPT(I) . (13) where the final equality follows from the assumption that (eγ, eρ) I is the optimal solution to CH-OPT(I). B.2. Proof of Theorem 3.2 Recall the value and cost instance: Channel 1 Channel 2 Auction 1 Auction 2 Auction 3 Value vj,n 1 X 2X Spend dj,n 0 1 + X 2(1 + X) Let eγ = (eγ1, eγ2) be the optimal solution to CH-OPT(IR) and recall under the option IR, we let per-channel budgets to be infinity. It is easy to see that eγ1 can be any arbitrary nonnegative number because the advertiser always wins auction 1, and eγ2 > X 1+X : if otherwise eγ2 X 1+X , then the optimal outcome of channel 2 is to win both auctions 2 and 3. However, in this case, the advertiser wins all auctions and acquires total value 1 + X + 2X = 1 + 3X, and incurs total spend 0 + (1 + X) + 2(1 + X) = 3 + 3X, which violates the ROI constraint in CH-OPT(IR) because 1+3X 3+3X < 1. Therefore the advertiser can only win auction 1, or in other words eγ2 > X 1+X . This implies that the optimal objective to CH-OPT(IR) is 1. On the other hand, it is easy to see that the optimal solution to GL-OPT is to only win auctions 1 and 2, yielding an optimal value of 1 + X. Therefore CH-OPT(IR) GL-OPT = 1 1+X . Taking X yeilds the desired result. B.3. Proof of Theorem 3.3 In light of Lemma 3.1, we only need to show CH-OPT(IB) GL-OPT. Let ex(z) = {exj(zj))}j [N] be the optimal solution to GL-OPT, and define eγj = 0 and eρj = E d j exj(zj)) to be the corresponding expected spend for each channel j under the optimal solution ex(z) to GL-OPT, respectively. We first argue that (eγj, eρj)j [M] is feasible to CH-OPT(IB). Recall the optimal solution x j(eγj, eρj; zj) to Vj(eγj, eρj; zj) for each channel j [M] as defined in Eq. (4), as well as the definitions Vj(eγj, eρj; zj) = v j x j(eγj, eρj; zj) and Dj(eγj, eρj; zj) = d j x j(eγj, eρj; zj) in Eq. (5). Then, we have E [Dj(eγj, eρj; zj)] = E d j x j(eγj, eρj); zj (i) eρj = E d j exj(zj) , (14) Multi-channel Autobidding with Budget and ROI Constraints where (i) follows from feasibility of x j(eγj, eρj; zj) to Vj(eγj, eρj; zj). Summing over j [M] we conclude that (eγj, eρj)j [M] satisfies the budget constraint in CH-OPT(IB): j [M] E [Dj(eγj, eρj; zj)] X j [M] E d j exj(zj) (i) ρ . (15) Here (i) follows from feasibility of ex(z) = {exj(zj))}j [N] to GL-OPT since it is the optimal solution. On the other hand, we have Vj(eγj, eρj; zj) = v j x j(eγj, eρj; zj) (i) v j exj(zj) (16) where (i) follows from optimality of x j(eγj, eρj; zj) to Vj(eγj, eρj; zj). Hence, we have j M E [Vj(eγj, eρj; zj)] X j M E v j exj(zj) (i) γ X j M E d j exj(zj) (ii) γ X j [M] E [Dj(eγj, eρj; zj)] (17) where (i) follows from feasibility of ex(z) = {exj(zj))}j [N] to GL-OPT since it is the optimal solution; (ii) follows from Eq. (14). Hence combining Eq. (15) (17) we can conclude that (eγj, eρj)j [M] is feasible to CH-OPT(IB). Finally, we have CH-OPT(IB) P j M E [Vj(eγj, eρj; zj)] P j M E v j exj(zj) = GL-OPT, where the last inequality follows from Eq. (16), and the final equality is because we assumed ex(z) = {exj(zj))}j [N] is the optimal solution to GL-OPT. B.4. Proof of Corollary 3.4 In light of Lemma 3.1, we only need to show CH-OPT(IG) GL-OPT. Let (eγ, eρ) IB be the optimal solution to CH-OPT(IB), and by definition of IB in Eq. (2) we have eγj = 0 for all j [M]. Since (eγ, eρ) is feasible to CH-OPT(IB), it is also feasible to CH-OPT(IG) since these two problems share the same ROI and budget constraints. Because they also share the same objectives, we have CH-OPT(IG) CH-OPT(IB) = GL-OPT (18) where the final equality follows from Theorem 3.3. C. Proofs for Section 4 C.1. Proof of Proposition 4.3 Let (ρ j)j [M] be the optimal per-channel budgets to CH-OPT(IB), and define µT = 1 τA P t [τA] µt as well as λT = 1 τA P t [τA] λt . Then j [M] Vj(ρj,t) (i) MV (T τA) + τACH-OPT(IB) X j [M] Vj(ρj,t) MV (T τA) + τA Lj(ρ j, λT , µT ) + ρ µT X j [M] Vj(ρj,t) (ii) MV (T τA) + ρ X t [τA] µt + X j [M] Lj(ρ j, λt, µt) X j [M] Lj(ρj,t, ct) λt (Vj(ρj,t) γρj,t) + µtρj,t (iii) MV (T τA) + X t [τA] Lj(ρ j(t), ct) Lj(ρj,t, ct) + X t [τA] (λtg1,t + µtg2,t) . Multi-channel Autobidding with Budget and ROI Constraints Here, (i) follows from Theorem 3.3 that states GL-OPT = CH-OPT(IB) and CH-OPT(IB) is apparently upper bounded by MV ; (ii) follows from the CH-OPT(IB) = P j [M] Vj(ρ j) and the definition of the Lagrangian in Eq. (6); in (iii) we define ρ j(t) = arg maxρj 0 Lj(ρj, ct) to be the optimal budget that maximizes the Lagrangian w.r.t. the dual variables ct = (λt, µt). C.2. Proof for Lemma 4.4 Recall g1,t = P j [M] (Vj,t(ρj,t; zj,t) γρj,t) and g2,t = ρ P j [M] ρj,t defined in Algorithm 1. Also recall τA [T] defined in step 10 of Algorithm 1. In the following, we will show MV (T τA) + X t [τA] (λtg1,t + µtg2,t) CF max{MV , ρ} + M 2V ρ max n 1 o + (γM 2 V 2 + ρ2) 2η C2 F = O ηT + 1 where we recall CF = MV max n 1 βρ, 1 ρ Mρ o defined in Eq. (9). From Lemma C.4, we have for any t [T], and λ, µ [0, CF ], τ [t] (λτ λ) g1,τ ηM 2 V 2 τ [t] (µτ µ) g2,τ ηρ2 2η µ2 , (21) where we used the fact that λ1 = µ1 = 0 in Algorithm 1. Suppose that τA = T and thus MV (T τA) = 0. Then, considering λ = µ = 0 in Eq. (21), we have t [τA] λtg1,t ηM 2 V 2 t [τA] µtg2,t ηρ2 Thus, Eq. (20) holds. If τA < T, then according to Algorithm 1, we either have S1,τA γMρ+βρ(T τA) < 0 or S2,τA +Mρ+Mρ(T τA) > ρT, where we recall S1,τA = P t [τA 1] g1,t and S2,τA = P t [τA 1] P j [M] ρj,t = P t [τA 1](ρ g2,t): If S1,τA γMρ + βρ(T τA) < 0, then we have P t [τA 1] g1,t < γMρ βρ(T τA). Hence, considering βρ [0, CF ] in Eq. (21), we have MV (T τA) + X t [τA] λtg1,t MV (T τA) + λτAg1,τA + X t [τA 1] λg1,t + ηM 2 V 2 2 (τA 1) + 1 < λτAg1,τA + MV (T τA) MV (T τA) + γM 2V ρ βρ + ηM 2 V 2 2 (τA 1) + 1 CF MV + γM 2V ρ βρ + ηM 2 V 2 where the final inequality uses the fact that τA T, λ CF , and g1,t MV for any t [T]. Hence, similar to Eq. (22) by further taking µ = 0 in Eq.(21) we show that Eq. (20) holds. Multi-channel Autobidding with Budget and ROI Constraints If S2,τA + Mρ + Mρ(T τA) > ρT, then we have P t [τA 1](ρ g2,t) > ρT Mρ Mρ(T τA), or equivalently P t [τA 1] g2,t < Mρ(T τA) + Mρ ρ(T τA) (ρ Mρ)(T τA) + Mρ. Hence, considering µ = MV ρ Mρ [0, CF ] in Eq.(21) we have MV (T τA) + X t [τA] µtg1,t MV (T τA) + µτAg2,τA + X t [τA 1] µg2,t + ηρ2 < µτAg2,τA + MV (T τA) MV (T τA) + M 2V ρ CF ρ + M 2V ρ where the final inequality uses the fact that τA T, λ CF , and g2,t ρ for any t [T]. Hence, similar to Eq. (22) by further taking λ = 0 in Eq.(21) we show that Eq. (20) holds. C.3. Proof of Lemma 4.5 We first show for any realization z = (zj)j [M] = (vj, dj)j [M], the conversion function Vj(ρj; zj) is piecewise linear, strictly inreasing, and concave for any j [M]. Fix any channel j which consists of mj parallel auctions, and recall that we assumed the orderding vj,1 dj,1 > vj,2 dj,2 > > vj,mj dj,mj for any realization zj. Then, with the option where the per-channel ROI is set to 0 (i.e. omitted) Vj(ρj; zj) is exactly the LP relaxation of a 0-1 knapsack, whose optimal solution x j(ρj; zj) is well known to be unique, and takes the form for any auction index n [mj]: x j,n(ρj; zj) = 1 if P n [n] dj,n ρj ρj P n [n 1] dj,n n [n] dj,n > ρj 0 otherwise where we denote dj,0 = 0. With this form, it is easy to see Vj(ρj; zj) = v j x j(ρj; zj) = X dj,n ρj + bj,n I {dj,0 + + dj,n 1 ρj dj,0 + + dj,n} (25) where we denote dj,0 = 0 and also bj,n = P n [n 1] vj,n vj,n n [n 1] dj,n and vj,0 = 0. It is easy to check that any two line segments, say [Xn 1, Xn] and [Xn, Xn+1] where we write Xn = dj,0 + + dj,n, intersect at ρj = Xn, because vj,n dj,n ρj + bj,n = vj,n+1 dj,n+1 ρj + bj,n+1 at ρj = Xn. Hence, from Eq. (25) we can conclude Vj(ρj; zj) is continuous, which further implies it is piecewise linear and strictly increasing. Further, the ordering vj,1 dj,1 > vj,2 dj,2 > > vj,mj dj,mj implies that the slopes on each segment [Xn, Xn+1] decreases as n increases, which implies Vj(ρj; zj) is concave. Since Vj(ρj) = E [Vj(ρj; zj)], where the expectation is taken w.r.t. randomness in zj, and since the zj is sampled from some discrete distribution pj on finite support Fj, Vj(ρj) is simply a weighted average over all (Vj(ρj; zj))zj Fj with weights in pj, so Vj(ρj) is also continuous, piecewise linear, strictly increasing, and concave, and thus can be written as in Lemma 4.5 with parameters {(sj,n, bj,n, rj,n)}n [Sj] that only depend on the support Fj and distribution pj. Finally, according to the definition of Lj(ρj, c) = E [Lj(ρj, c; zj)] and Lj(ρj, c; zj) = (1 + λ)Vj(ρj; zj) (λγ + µ)ρj as defined in Eq. (6), we have Lj(ρj, c) = (1 + λ)Vj(ρj) (λγ + µ)ρj (26) Multi-channel Autobidding with Budget and ROI Constraints which implies Lj(ρj, c) is continuous, piecewise linear, and concave because Vj(ρj) is continuous, piecewise linear, and concave as shown above. Combining Eq. (26) and the representation of Vj(ρj) in Lemma (4.5), we have Lj(ρj, c) = X n [Sj] (σj,n(c)ρj + (1 + λ)bj,n) I{rj,n 1 ρj rj,n} . (27) where the slope σj,n(c) = (1 + λ)sj, n (µ + γλ) decreases in n. Thus at the point rj,n = max{rj,n : n = 0, 1 . . . , Sj , σj,n(c) 0} in which the slope to the right turns negative for the first time, Lj(ρj, c) takes its maximum value maxρj 0 Lj(ρj, c), because to the left of rj,n , namely the region [0, rj,n ], Lj(ρj, c) strictly increases because slopes are positive; and to the right of rj,n , namely the region [rj,n , ρ], Lj(ρj, c) strictly decreases because slopes are negative. C.4. Proof for Lemma 4.6 Recall the definition of the Lagrangian function Lj(ρj, c; zj) = (1 + λ)Vj(ρj; zj) (λγ + µ)ρj in Eq.(6). Then, since Vj(ρj; zj) V , and λt, µt [0, CF ] for any period t [T] and per-channel budget ρj [0, ρ], we can conclude (1 + γ) ρCF Lj(ρj, λt, µt) (1 + CF )V . C.5. Proof for Lemma 4.7 In the following, instead of bounding P t [τA] Lj(ρ j,t, ct) Lj(ρj,t, ct), we bound P t [T ] Lj(ρ j,t, ct) Lj(ρj,t, ct) where we consider the hypothetical scenario in which we ignore the termination criteria for the while loop in Algorithm 1, and continue to set per-channel budgets based on steps 4-6 in the algorithm until the end of period T. This is due to the fact that P t [T ] Lj(ρ j,t, ct) Lj(ρj,t, ct) P t [τA] Lj(ρ j,t, ct) Lj(ρj,t, ct). We fix some channel j [M] and omit the subscript j when the context is clear. Also, we first introduce some definitions that will be used throughout our proof. Fix some positive constant σ > 0 whose value we choose later, and recall ak denotes the kth arm in the discretized budget set A(δ) as we defined in Eq. (8). Then we define the following k(c) = max ρj [0,ρ] Lj(ρj, c) Lj(ak, c) Cn = n c {ct}t [T ] : rj,n = arg max ρj 0 Lj(ρj, c) o for n = 0 . . . Sj C(σ) = c {ct}t [T ] : σ j (c) > σ, |σ+ j (c)| > σ for n = 0, . . . , Sj mk(c) = 8 log(T) 2 k(c) for (k, c) s.t. k(c) > 0 . Here, the adjacent slopes σ j (c) and σ+ j (c), which are defined in Eq.(11), represent the slopes that are adjacent to the optimal budget arg maxρj [0,ρ] Lj(ρj, c) for any context c = (λ, µ). Further, Sj and {rj,n}j [Sj] are defined in Lemma 4.5. Here we state in words the meanings of k(c), C(σ) and Cn, respectively. k(c) denotes the loss in contextual bandit rewards when pulling arm ak under context c. Cn is the set including all context ct under which the optimal per-channel budget arg maxρj 0 Lj(ρj, ct) is taken at the nth turning point rj,n (see Lemma 4.5). C(σ) is the set of all contexts, in which the adjacent slopes to the optimal point w.r.t. the context c, namely arg maxρj 0 Lj(ρj, c), have magnitude greater than σ, or in other words, the adjacent slopes are steep. On a related note, for any context c, we define the following adjacent regions that sandwich the optimal budget w.r.t.c U j (c) = [rj,n 1, rj,n] and U+ j (c) = [rj,n, rj,n+1] if c Cn . (29) In other words, if c Cn, per the definition of Cn above, arg maxρj [0,ρ] Lj(ρj, c) is located at the nth turning point rj,n, then U j (c) and U j (c) are respectively the left and right regions surrounding rj,n. Multi-channel Autobidding with Budget and ROI Constraints With the above definitions, we demonstrate how to bound the UCB-error. Define Nk,t = P τ t 1 I{ρj,τ = ak} to be the number of times arm k is pulled up to time t, then we can decompose the UCB error as followed t>K Lj(ρ j(t), ct) Lj(ρj,t, ct) = X1 + X2 + X3 where t>K:ct / C(σ) k [K] k(ct)I{ρj,t = ak, Nk,t mk(ct)} t>K:ct C(σ) k [K] k(ct)I{ρj,t = ak, Nk,t mk(ct)} t>K k(ct)I{ρj,t = ak, Nk,t > mk(ct)} . In Section C.5.1, we show that X1 e O(δT + σT + 1 δ ); in Section C.5.2 we show that X2 e O(δT + 1 δσ); in Section C.5.3 we show that X3 e O( 1 Remark C.1. In the following sections C.5.1, C.5.2 and C.5.3 where we bound X1, X2, and X3, respectively, we assume the optimal per-channel ρ j(t) = arg maxρj [0,ρ] Lj(ρj, ct) lies in the arm set A(δ) for all t. This is because otherwise, we can consider the following decomposition of the UCB error in period t as followed: Lj(ρ j(t), ct) Lj(ρj,t, ct) = Lj(ρ j(t), ct) Lj(a t , ct) + Lj(a t , ct) Lj(ρj,t, ct) where a t = arg max ak A(δ) Lj(ak, ct) The first term will yield an error in the order of O(δ) due to the Lagrangian function being unimodal, piecewise linear liner, which implies |a t ρ j(t)| δ so that Lj(ρ j(t), ct) Lj(a t , ct) = O(δ). Hence, this discretization error will accumulate to a magnitude of O(δT) over T periods, which leads to an additional error that is already accounted for in the statement of the lemma. C.5.1. BOUNDING X1. Our strategy to bound X1 consists of 4 steps, namely bounding the loss of arm ak at each context c / C(σ) when ak U j (c) lies on the left adjacent region of the optimal budget; ak < min U j (c) lies to the left of the left adjacent region; ak U+ j (c) lies on the right adjacent region of the optimal budget; and ak > max U+ j (c) lies to the right of the right adjacent region. Here we recall the adjacent regions are defined in Eq.(29). Step 1: ak U j (ct). For arm k such that ak U j (ct), recall Lemma 4.5 that Lj(a, ct) is linear in a for a U j (ct), so k(ct) = σ j (ct) (ρ j(t) ak) σρ where we used the condition that ct / C(σ) so the adjacent slopes have magnitude at most σ, and ρ j(t) ρ. Thus, summing over all such k we get t>K:ct / C(σ) k [K]:ak U j (ct) k(ct)I{ρj,t = ak, Nk,t mk(ct)} t>K:ct / C(σ) k [K]:ak U j (ct) σρ I{ρj,t = ak, Nk,t mk(ct)} σρT = O(σT) . (31) Step 2: ak < min U j (ct). For arm k such that ak < min U j (ct), we further split contexts into groups Cn for n = 0 . . . Sj (defined in Eq. (28)) based on whether the corresponding optimal budget w.r.t. the Lagrangian at the context is taken at the nth turning point (see Figure 2 of illustration). Then, for each context group n by defining k := max{k : ak < rj,n 1} Multi-channel Autobidding with Budget and ROI Constraints to be the arm closest to and less than rj,n 1, we have t>K:ct Cn/C(σ) k [K]:akK:ct Cn/C(σ) k [K]:akK I{ct = c, ρj,t = ak, Nk,t mk(c)}. In (i) we used the fact that the left end of the left adjacent region, i.e. min U j (ct) is exactly rj,n 1 because for context ct Cn the optimal budget arg maxρj [0,ρ] Lj(ρj, ct) is at the nth turning point; in (ii) we used the definition k := max{k : ak < rj,n 1} where we recall arms are indexed such that a1 < a2 < < a K. Note that in (ii) we separate out the arm ak because its distance to the optimal per-channel may be less than δ since it is the closest arm, and thus we ensure all other arms indexed by k [K] : ak < rj,n 1 δ, are at least δ away from the optimal per-channel budget; (iii) follows from the fact that under a context c Cn/C(σ), we have arg maxρj [0,ρ] Lj(ρj, c) = rj,n so k (c) = Lj(rj,n, c) Lj(rj,n 1, c) + Lj(rj,n 1, c) Lj(ak , c) = σ j (c)(rj,n rj,n 1) + σj,n 1(c)(rj,n 1 ak ) (iv) σρ + σj,n 1(c)δ (v) σρ + (1 + CF )sj,n 1δ , where in (iv) we used c Cn/C(σ) implies σ j (c) σ, as well as all rj,n ρ for any n and the fact that k lies on the line segment between points rj,n 2 and rj,n 1 since δ < minn [Sj] rj,n rj,n 1; in (v) we recall σj,n 1(c) = (1 + λ)sj,n 1 (µ + γλ) (1 + CF )sj,n 1 where CF is defined in Lemma 4.6. We now bound P c Cn/C(σ) k(c)Yk(c) in Eq. (32). It is easy to see the following inequality for any sequence of context c(1), . . . , c(ℓ) {ct}t [T ] (This is a slight generalization of an inequality result shown in (Balseiro et al., 2019a)): Yk(c(1)) + + Yk(c(ℓ)) max ℓ =1...ℓmk(c(ℓ )) . (33) This is because X ℓ [ℓ] Yk(c(ℓ )) = X ℓ [ℓ] I{ct = c(ℓ ), ρj,t = ak, Nk,t mk(c(ℓ ))} ℓ [ℓ] I{ct = c(ℓ ), ρj,t = ak, Nk,t max ℓ =1...ℓmk(c(ℓ ))} t>K I{ct {c(ℓ )}ℓ [ℓ], ρj,t = ak, Nk,t max ℓ =1...ℓmk(c(ℓ ))} max ℓ =1...ℓmk(c(ℓ )) . For simplicity denote L = |Cn/C(σ)|, and order contexts in c Cn/C(σ) as {c(ℓ)}ℓ [L] s.t. k(c(1)) > k(c(2)) > > k(c(L)), or equivalently mk(c(1)) < mk(c(2)) < < mk(c(L)) according to Eq.(28). Then multiplying Eq. (33) by by Multi-channel Autobidding with Budget and ROI Constraints k(c(ℓ)) k(c(ℓ+1)) (which is strictly positive due to the ordering of contexts), and summing ℓ= 1 . . . L we get c Cn/C(σ) k(c)Yk(c) = X ℓ [L] k(c(ℓ))Yk(c(ℓ)) X ℓ [L] mk(c(ℓ)) k(c(ℓ)) k(c(ℓ+1)) (i) = 8 log(T) X k(c(ℓ)) k(c(ℓ+1)) (ii) 8 log(T) Z = 8 log(T) k(c(L)) (iii) = 8 log(T) minc Cn/C(σ) k(c) . Here (i) follows from the definition of mk(c) in Eq. (28) where mk(c) = 8 log(T ) 2 k(c) ; both (ii) and (iii) follow from the ordering of contexts so that k(c(1)) > k(c(2)) > > k(c(L)). Note that for any c Cn/C(σ) and arm k such that ak < rj,n 1, we have k(c) = Lj(rj,n, c) Lj(rj,n 1, c) + Lj(rj,n 1, c) Lj(ak, c) > Lj(rj,n 1, c) Lj(ak, c) (i) σj,n 1(c)(rj,n 1 ak) (ii) (σj,n 1(c) σj,n(c)) (rj,n 1 ak) (iii) = (1 + λ) (sj,n 1 sj,n) (rj,n 1 ak) > (sj,n 1 sj,n) (rj,n 1 ak) , where in (i) we recall the slope σj,n 1(c) is defined in Lemma 4.5 and further (i) follows from concavity of Lj(ρj, c) in the first argument ρj; in (ii) we used the fact that σj,n(c) 0 since the optimal budget arg maxρj [0,ρ] Lj(ρj, c) is taken at the nth turning point, and is the largest turning point whose left slope is non-negative from Lemma 4.5; (iii) follows from the definition σj,n (c) = (1 + λ)sj,n (µ + γλ) for any n . Finally combining Eqs. (32), (34) and (35), and summing over n = 1 . . . Sj we get t>K:ct / C(σ) k [K]:akK:ct Cn/C(σ) k [K]:ak max U+ j (ct). The cases where arm ak U+ j (ct) and ak > max U+ j (ct) are symmetric to ak U j (ct) and ak < min U+ j (ct), respectively, and we omit from this paper. Multi-channel Autobidding with Budget and ROI Constraints Therefore, combining Eqs. (31) and (36) we can conclude X1 e O(δT + σT + 1 C.5.2. BOUNDING X2. We first rewrite X2 as followed t>K:ct C(σ) k [K] k(ct)I{ρj,t = ak, Nk,t mk(ct)} c Cn C(σ) k(c)I{ct = c, ρj,t = ak, Nk,t mk(c)} c Cn C(σ) k(c)Yk(c) k {k n ,k+ n } k(c)Yk(c) + X k [K]/{k n ,k+ n } k(c)Yk(c) (iii) Tδ(1 + CF ) X n [Sj] (sj,n + sj,n+1) + X k [K]/{k n ,k+ n } k(c)Yk(c) . where in (i) we define Yk(c) = P t>K I{ct = c, ρj,t = ak, Nk,t mk(c)}; in (ii) we separate out two arms k n and k+ n defined as followed: recall for context c Cn C(σ), the optimal budget arg maxρj [0,ρ] Lj(ρj, c) = rj,n is taken at the nth turning point per the definition of Cn in Eq. (28), and thereby we defined k n := max{k [K] : ak < rj,n} to be the arm closest to and no greater than rj,n, whereas k+ n := min{k [K] : ak > rj,n} to be the arm closest to and no less than rj,n; in (iii), for small enough δ < minn [Sj] rj,n rj,n 1, we know that k n lies on the line segment between rj,n 1 and rj,n, so k n (c) = σ j (c)(rj,n ak n ) σ j (c)δ (1 + CF )sj,n 1δ, where in the final inequality follows from the definition of σ j (c) = σj,n(c) = (1 + λ)sj,n (µ + γλ) (1 + λ)sj,n (1 + CF )sj,n where CF is defined in Eq. (4.6). A similar bound holds for k+ n (c). Then, following the same logic as Eqs. (33), (34), (35) in Section C.5.1 where we bound X1, we can bound P c Cn C(σ) k(c)Yk(c) as followed for any arm k [K]/{k n , k+ n }, i.e. arms who are at least δ away from the optimal per-channel budget w.r.t. c: c Cn C(σ) k(c)Yk(c) 8 log(T) minc Cn C(σ) k(c) . (39) Now, the set k [K]/{k n , k+ n } in Eq. (38) can be further split into two subsets, namely {k [K] : ak < rj,n δ} and {k [K] : ak > rj,n + δ} due to the definitions k n := max{k [K] : ak < rj,n} and k+ n := min{k [K] : ak > rj,n}. Therefore, for any k s.t. ak < rj,n δ and any c Cn C(σ), k(c) = Lj(rj,n, c) Lj(ak, c) σ j (c)(rj,n ak) σ(rj,n ak) , where the final inequality follows from the definition of C(σ) in Eq. (28) such that σ j (c) (σ) for c C(σ). Hence combining this with Eq. (39) we have k [K]:ak rj,n + δ}. Hence, combining Eqs. (38) and (40) we can conclude X2 e O δT + 1 Multi-channel Autobidding with Budget and ROI Constraints Here, similar to our bound in Eq. (36) for bounding X1, we hide all logarithmic factors using the notation e O, and note that the constants CF , (sj,n)n Sj, Sj are all absolute constants that depend only on the support Fj and corresponding sampling distribution pj for value-cost pairs; see definitions of these absolute constants in Lemma 4.5 and 4.6. C.5.3. BOUNDING X3. We first define L = (1 + γ) ρCF + (1 + CF ) V (42) where CF is specified in Lemma 4.6. Recalling the definition k(c) = maxρj [0,ρ] Lj(ρj, c) Lj(ak, c) in Eq. (28), and (1 + γ)ρCF Lj(ρj, c) (1 + CF ) V for any ρj [0, ρ] and context c (see Lemma 4.6), it is easy to see k(c) L k [K], c . (43) Then we bound X3 as followed t>K E [ k(c)I{ρj,t = ak, Nk,t > mk(c)}] t>K P (ρj,t = ak, Nk,t > mk(ct)) t>K P ˆVj,t(ak) λtγ + µt 1 + λt ak + UCBj,t(ak) ˆVj,t(ρ j(t)) λtγ + µt 1 + λt ρ j(t) + UCBj,t(ρ j(t)), Nk,t > mk(ct) , where (i) follows from Eq. (43); in (ii), recall that we choose arm ρj,t = ak because the estimated UCB rewards of arm ak is greater than that of any other arm including ρ j(t) according to the UCB-SGD (Algorithm 1), or mathematically, ˆVj,t(ak) λtγ+µt 1+λt ak + UCBj,t(ak) ˆVj,t(ρ j(t)) λtγ+µt 1+λt ρ j(t) + UCBj,t(ρ j(t)). Here we also used the fact that ρ j(t) lies in the arm set A(δ) for all t (see Remark C.1). Now let ˆRn(ak) denote the average conversion of arm k over its first n pulls, i.e. ˆRn(ak) = ˆVj,τ(ak) for τ = min{t [T] : Nk,t = n} (45) where we recall ˆVj,τ(ak) is the estimated conversion for arm ak in channel j during period τ as defined in Algorithm 1. In other words, τ is the period during which arm ak is pulled for the nth time so ˆRn(ak) = ˆVj,τ(ak). Hence, we continue with Eq. (44) as followed: P ˆVj,t(ak) λtγ + µt 1 + λt ak + UCBj,t(ak) ˆVj,t(ρ j(t)) λtγ + µt 1 + λt ρ j(t) + UCBj,t(ρ j(t)), Nk,t > mk(ct) P max n:mk(ct) ˆRn (ρ j(t)) + UCBn (ρ j(t)) λtγ + µt 1 + λt ρ j(t) Now, when the event n ˆRn(ak) + UCBn(ak) λtγ+µt 1+λt ak > ˆRn (ρ j(t)) + UCBn (ρ j(t)) λtγ+µt 1+λt ρ j(t) o occurs, it is easy to see that one of the following events must also occur: G1,n = Rn(ak) V (ak) + UCBn(ak) for n s.t. mk(ct) < n t G2,n = Rn (ρ j(t)) V (ρ j(t)) UCBn(ρ j(t)) for n s.t. 1 n t G3 = Vj(ρ j(t)) λtγ + µt 1 + λt ρ j(t) < Vj(ak) λtγ + µt 1 + λt ak + 2 UCBn(ak) (47) Multi-channel Autobidding with Budget and ROI Constraints Note that for n > mk(ct), we have UCBn(ak) = q mk(ct) = k(ct) 2 since we defined mk(c) = 8 log(T ) 2 k(c) in Eq. (28). Therefore Vj(ak) λtγ + µt 1 + λt ak + 2 UCBn(ak) < Vj(ak) λtγ + µt 1 + λt ak | {z } =L(ak,ct) + k(ct) (i) = Vj(ρ j(t)) λtγ + µt 1 + λt ρ j(t) | {z } =L(ρ j (t),ct)=maxa A(δ) L(a,ct) where (i) follows from the definition of k(c) = maxa A(δ) L(a, c) L(ak, c) in Eq. (28) for any context c. This implies that event G3 in Eq. (47) cannot hold for n > mk(ct). Therefore P ˆRn(ak) + UCBn(ak) λtγ + µt 1 + λt ak > ˆRn (ρ j(t)) + UCBn (ρ j(t)) λtγ + µt 1 + λt ρ j(t) P (G1,n G2,n ) . (48) From the standard UCB analysis and the Azuma Hoeffding s inequality, we have P(G1,n) V T 4 and P(G2,n ) V T 4 . Hence combining Eqs. (44) (46), (48) we can conclude n= mk(ct) +1 n =1 (P (G1,n) + P (G2,n )) n= mk(ct) +1 C.6. Proof for Theorem 4.8 Starting from Proposition 4.3, we get j [M] Vj(ρj,t) MV (T τA) + X t [τA] Lj(ρ j(t), ct) Lj(ρj,t, ct) t [τA] (λtg1,t + µtg2,t) i (i) MV (T τA) + O σT + δT + 1 where in (i) we applied Lemma 4.7 and 4.4. Taking η = 1/ T, δ = σ = T 1/3 (i.e. K = O(T 1/3) yields T GL-OPT j [M] Vj(ρj,t) i O(T 2/3). According to Lemma 4.5, Vj(ρj) is concave for all j [M], so O(T 1/3) GL-OPT 1 j [M] Vj(ρj,t) t [T ] ρj,t j [M] Vj(ρj,T ) Multi-channel Autobidding with Budget and ROI Constraints where in the final equality we used the definition ρT as defined in Algorithm 1. Regarding ROI constraint satisfaction, consider t [T ] E [g1,t] j [M] E [Vj(ρj,t; zj,t) γρj,t] j [M] E [Vj(ρj,t) γρj,t] t [T ] ρj,t t [T ] ρj,t ρj,T γρj,T ] . where (i) follows from Lemma C.3; in (ii) we again applied concavity of Vj(ρj). We omit the analysis for the budget constraint as it is similar to the above. C.7. Additional Results for Section 4 Proposition C.2. Assume Assumption 4.2 holds, and recall zj = (vj, dj) Fj is any realization of values and costs for channel j [M]. Then, for any channel j [M], we have minzj Fj vj,1 dj,1 > γ, where we recall the ordering vj,1 dj,1 > vj,2 dj,2 > > vj,mj dj,mj for any element zj = (vj, dj) Fj (see Section 4). Further, there exists some eρ (0, ρ) s.t. for any per-channel budget ρj eρ, we have Vj(ρj; zj) = vj,1 dj,1 ρj > γρj for any j [M]. Proof. Under Assumption 4.2, it is easy to see for any realization of value-cost pairs zj = (vj, dj) there always exists an auction n [mj] whose value-to-cost ratio is at least γ, i.e. vj,n > γdj,n. Hence we know that vj,1 dj,n > γ. Now, in Eq. (25) within the proof of Lemma 4.5, we showed Vj(ρj; zj) = v j x j(ρj; zj) = X dj,n ρj + bj,n I {dj,0 + + dj,n 1 ρj dj,0 + + dj,n} , where dj,0 = vj,0 = bj,1 = 0. This implies that for any ρj < dj,1, we have Vj(ρj; zj) = vj,1 dj,1 ρj > γρj. Therefore, we can take eρ = minj [M] minzj Fj dj,1, which ensures that for any ρj eρ and realization zj Fj we have Vj(ρj; zj) = vj,1 dj,1 ρj > γρj for any channel j [M]. Lemma C.3 (Constraint satisfaction). Assume Assumption 4.2 holds, and consider β = ρ = 1 log(T ) in Algorithm 1. Then, for large enough T we have t [T ] g1,t 0 and 1 T j [M] ρj,t ρ , where we recall g1,t = P j [M] (Vj(ρj,t; zj,t) γρj,t). Proof. Recall τA [T] defined in step 10 of Algorithm 1. If τA = T, then we know that Algorithm 1 does not exit the while loop, and therefore S1,t γMρ + βρ(T t) 0 for t = T, or equivalently S1,T γMρ > 0. Since we recall S1,T = P t [T 1] g1,t, we can conclude that P t [T ] g1,t = S1,T + g1,T Mρ + g1,T 0 since g1,T γMρ. Similarly, we also have S2,t + Mρ + ρ(T t) ρT for t = T, or equivalently S2,T ρT Mρ where we used the fact that ρ = 1/ log(T) < ρ for large enough T and M 2. Hence, recalling S2,T = P j [M] ρj,t, we can conclude that P j [M] ρj,t = S2,T + P j [M] ρj,T ρT Mρ + P j [M] ρj,T ρT since P j [M] ρj,T Mρ. Multi-channel Autobidding with Budget and ROI Constraints If τA < T, then we know that at the stopping time τA, the while loop in Algorithm 1 has not yet exited, so we have S1,τA γMρ + βρ(T τA) 0 and S2,τA + Mρ + Mρ(T τA) ρT (53) t [T ] g1,t = X t [τA 1] g1,t + g1,τA + t=τA+1 g1,t (i) γMρ βρ(T τA) + g1,τA + t=τA+1 g1,t γMρ βρ(T τA) γMρ + t=τA+1 g1,t (ii) = βρ(T τA) + Vj(ρ; zj,t) γρ (iii) βρ(T τA) + ρ min zj Fj vj,1 dj,1 γρ = βρ(T τA) + (T τA)M ρ min zj Fj vj,1 dj,1 γρ where (i) follows from S1,τA 1 = P t [τA 2] g1,t and Eq. (53); (ii) follows from Algorithm 1 where we set ρj,t = ρ for all j [M] and t = τA + 1 . . . T; for (iii), assuming the jth channel s realized value cost pairs zj,t is the element zj Fj, then Proposition C.2 says Vj(ρ; zj,t) vj,1 dj,1 ρ since ρ = 1 log(T ) < eρ for large enough T. Hence Vj(ρ; zj,t) minzj Fj vj,1 dj,1 ρ; (iv) follows from the fact that minzj Fj vj,1 dj,1 > γ according to Proposition C.2, so M minzj Fj vj,1 dj,1 Mγ + β since β = 1 log(T ) M minzj Fj vj,1 dj,1 Mγ for large enough T. Similarly, we have j [M] ρj,t = X j [M] ρj,t + X j [M] ρj,τA + (i) ρT Mρ Mρ(T τA) + X j [M] ρj,τA + M(T τA)ρ ρT Mρ Mρ(T τA) + Mρ + M(T τA)ρ where (i) follows from S2,τA = P j [M] ρj,t and Eq. (53), as well as in Algorithm 1 we set ρj,t = ρ for all j [M] and t = τA, τA + 1 . . . T. Lemma C.4. Let (λt, µt)t [T ] be the dual variables generated by Algorithm 1. Then for any λ, µ [0, CF ] and t [T] we have τ [t] (λτ λ) g1,τ ηM 2 V 2 τ [t] (µτ µ) g2,τ ηρ2 2η (µ µ1)2 . (56) where we recall g1,τ = P j [M] (Vj,τ(ρj,τ) γρj,τ) and g2,τ = ρ P j [M] ρj,τ. Multi-channel Autobidding with Budget and ROI Constraints Proof. We will show Eq. (56). Starting with the first inequality w.r.t. λτ s, we have (λτ λ) g1,τ = (λτ+1 λ) g1,τ + (λτ λτ+1) g1,τ (57) Since λτ+1 = Π[0,CF ] (λτ ηg1,τ)+ = arg minλ [0,CF ] (λ (λτ ηg1,τ))2, we have (λτ+1 (λτ ηg1,τ)) (λ λτ+1) 0 λ [0, CF ] . (58) (λτ+1 λ) g1,τ 1 η (λτ+1 λτ) (λ λτ+1) = 1 2η (λ λτ)2 (λ λτ+1)2 (λτ+1 λτ)2 . (59) Plugging the above back into Eq. (57) we get (λτ λ) g1,τ (λτ λτ+1) g1,τ + 1 2η (λ λτ)2 (λ λτ+1)2 (λτ+1 λτ)2 2g2 1,τ + 1 2η (λ λτ)2 (λ λτ+1)2 2η (λ λτ)2 (λ λτ+1)2 , where the final inequality follows from the fact that Vj,τ(ρj,τ) V for any j [M] and τ [t] so g1,τ M V . Summing the above over τ = 1 . . . t and telescoping we get X τ [t] (λτ λ) g1,τ ηM 2 V 2 2η (λ λ1)2 for λ [0, CF ] . Following the same arguments above we can show X τ [t] (µτ µ) g2,τ ηρ2 2η (µ µ1)2 for µ [0, CF ] . Proposition C.5. Under Assumption 4.2, the advertiser s per-channel only budget optimization problem, namely CH-OPT(IB) is a convex problem. Proof. Recalling the CH-OPT(IB) in Eq. (3) and the definition of IB in Eq. (2), we can write CH-OPT(IB) as CH-OPT(IB) = max (γj)j [M] I j M Vj(ρj) γ X j [M] ρj ρ , Here we used the definition Vj(ρj) = E [Vj(ρj; zj)] in Eq. (5), and Dj(ρj; zj) = ρj for any zj under Assumption 4.2. According to Lemma 4.5, Vj(ρj) is concave in ρj for any j, so the objective of CH-OPT(IB) maximizes a concave function. For the feasibility region, assume ρj and ρ j are feasible, then defining ρ j = θρj + (1 θ)ρ j for any θ [0, 1], we know that X Vj(ρ j ) γρ j (i) X θVj(ρj) + (1 θ)Vj(ρ j) γρ j j M (Vj(ρj) γρj) + (1 θ) X Vj(ρ j) γρ j Multi-channel Autobidding with Budget and ROI Constraints where (i) follows from concavity of Vj(ρj) and (ii) follows from feasiblity of ρj and ρ j. On the other hand it is apparent that P j [M] ρ j ρ. Hence we conclude that for any ρj and ρ j feasible, ρ j = θρj + (1 θ)ρ j is also feasible, so the feasible region of CH-OPT(IB) is convex. This concludes the statement of the proposition.