# Regularized Online Allocation Problems: Fairness and Beyond

Santiago Balseiro *1 2, Haihao Lu *3 2, Vahab Mirrokni 2

*Equal contribution. 1Columbia University, New York, USA. 2Google Research, New York, USA. 3University of Chicago. Correspondence to: Santiago R. Balseiro, Haihao Lu.

Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).

Abstract

Online allocation problems with resource constraints have a rich history in computer science and operations research. In this paper, we introduce the regularized online allocation problem, a variant that includes a non-linear regularizer acting on the total resource consumption. In this problem, requests repeatedly arrive over time and, for each request, a decision maker needs to take an action that generates a reward and consumes resources. The objective is to simultaneously maximize total rewards and the value of the regularizer subject to the resource constraints. Our primary motivation is the online allocation of internet advertisements, wherein firms seek to maximize additive objectives such as the revenue or efficiency of the allocation. By introducing a regularizer, firms can account for the fairness of the allocation or, alternatively, punish under-delivery of advertisements, two common desiderata in internet advertising markets. We design an algorithm for the setting in which arrivals are drawn independently from a distribution that is unknown to the decision maker. Our algorithm is simple, fast, and attains the optimal order of sub-linear regret compared to the optimal allocation with the benefit of hindsight. Numerical experiments confirm the effectiveness of the proposed algorithm and of the regularizers in an internet advertising application.

1. Introduction

Online allocation problems with resource constraints have abundant real-world applications and, as such, have been extensively studied in computer science and operations research. Prominent applications can be found in internet advertising and cloud computing, both of which are multibillion dollar markets. In display advertising, for example, a publisher typically signs contracts with many advertisers agreeing to deliver a fixed number of impressions within a limited time horizon. Impressions arrive sequentially over time, and the publisher needs to assign, in real time, each impression to one advertiser so as to maximize metrics such as the cumulative click-through rate or the number of conversions while satisfying contractual agreements on the number of impressions to be delivered (Feldman et al., 2010). In cloud computing, jobs arriving online need to be scheduled to one of many servers. Each job consumes resources from the server, which need to be shared with other jobs. The scheduler needs to assign jobs to servers to maximize metrics such as the cumulative revenue or efficiency of the allocation. When jobs' processing times are long compared to their arrival rates, this scheduling problem can be cast as an online allocation problem (Badanidiyuru et al., 2018).

The literature on online allocation problems focuses mostly on optimizing additively separable objectives such as the total click-through rate, revenue, or efficiency of the allocation.
In many settings, however, decision makers are also concerned about ancillary objectives such as fairness across advertisers, avoiding under-delivery of impressions, balancing the load across servers, or avoiding saturating resources. These metrics are, unfortunately, non-separable and cannot be readily accommodated by existing algorithms that are tailored for additively separable objectives. Thus motivated, in this paper, we introduce the regularized online allocation problem, a variant that includes a non-linear regularizer acting on the total resource consumption. The introduction of a regularizer allows the decision maker to simultaneously maximize an additively separable objective together with other metrics, such as fairness and load balancing, that are non-linear in nature.

More formally, we consider a finite horizon model in which requests arrive repeatedly over time. The decision maker is endowed with a fixed amount of resources that cannot be replenished. Each arriving request comes with a concave reward function and a consumption matrix. After observing the request, the decision maker needs to take an action that generates a reward and consumes resources. The objective of the decision maker is to maximize the sum of the cumulative reward and a regularizer that acts on the total resource consumption. (Our model can easily accommodate a regularizer that acts on other metrics such as, say, the cumulative rewards by adding dummy resources.)

Motivated by practical applications, we consider an incomplete information model in which requests are drawn independently from a distribution that is unknown to the decision maker. That is, when a request arrives, the decision maker observes the reward function and consumption matrix of the request before taking an action, but does not get to observe the reward functions and consumption matrices of future requests until their arrival. For example, in display advertising, publishers can estimate, based on the attributes of the visiting user, the click-through rates of each advertiser before assigning an impression. However, the click-through rates of future impressions are not known in advance as these depend on the attributes of the unknown, future visitors. The objective of this paper is to design simple algorithms that attain low regret relative to the best allocation when all requests are known in advance.

1.1. Our Results

Duality theory has been successfully used to tackle online allocation problems with additively separable objectives because it allows one to decouple a master, resource-constrained problem into simpler subproblems, one for each request. We show that similar techniques can be used to design algorithms for online allocation problems with a non-additively-separable regularizer. In particular, we propose a dual-descent algorithm that maintains a dual variable for each resource constraint. When a request arrives, the reward function is adjusted with the dual variables to capture the opportunity cost of consuming resources, and then actions are taken greedily with respect to the dual-variable-adjusted reward function. The dual variables are also used to determine an ideal resource consumption that optimizes the regularizer. A simple, yet key, observation is that by comparing the actual resource expenditure of the current action to the ideal resource consumption from the regularizer, we can construct a noisy, unbiased subgradient of the dual objective function.
Using these subgradients as inputs, our algorithm employs weighted online subgradient descent to update the dual variables after each request. We prove that the regret of our algorithm is of order $O(T^{1/2})$, where $T$ is the number of time periods, when resources are scaled proportionally to the length of the horizon. This rate is unimprovable under our minimal assumptions on the input.

When updating the dual variables, it is required to project dual variables to the feasible set. In standard online allocation problems, the dual feasible set is the non-negative orthant and the projection step is trivial. The introduction of a regularizer, however, alters the geometry of the dual feasible set. In many cases, the dual feasible set becomes more complex, which, in turn, results in a more difficult projection problem. By suitably picking the weights for subgradient descent, it is possible to adjust the update rule to better capture the geometry of the dual feasible set and obtain more tractable projection problems.

An advantage of our algorithm is that it is efficient and simple to implement. In many cases, the update rule can be implemented in linear time, and there is no need to solve auxiliary convex optimization problems on historical data as in other methods in the literature. To illustrate our approach, we discuss several regularizers that are useful in practice and numerically evaluate our algorithm on an internet advertising application using a max-min fairness regularizer. Our experiments confirm that our proposed algorithm attains $O(T^{1/2})$ regret as suggested by our theory, and showcase the trade-off between click-through rates and fairness: fairness can be significantly improved by reducing click-through rates by a small amount.

1.2. Related Work

Online allocation problems have been extensively studied in the computer science and operations research literature. Table 1 summarizes the differences between our work and the existing literature on online allocation problems.

There is a stream of literature that studies online allocation problems under adversarial input models, i.e., when the incoming requests are adversarially chosen (Mehta et al., 2007; Feldman et al., 2009). We focus, instead, on stochastic models in which the incoming requests are drawn i.i.d. from an unknown distribution. Early work on online allocation with stochastic input models focuses on linear reward functions, i.e., the case when the reward function is linear in the decision variable. Devanur and Hayes (2009) presented a two-phase dual training algorithm for linear reward functions that are proportional to the resource consumption, which attains $O(T^{2/3})$ regret. Feldman et al. (2010) introduced a similar algorithm for more general linear reward functions which obtains the same order of regret. Later on, Agrawal et al. (2014) proposed a new dual-based algorithm that periodically solves a linear program using all data collected so far in order to update the dual variables, which improves the regret bound to $O(T^{1/2})$. Devanur et al. (2019) studied more complicated algorithms that not only obtain $O(T^{1/2})$ regret, but also yield near-optimal dependence on other parameters of the problem, such as the number of resources. On the other hand, by a result of Arlotto and Gurvich (2019), $\Omega(T^{1/2})$ turns out to be the lowest attainable regret in this setting. With additional assumptions on the input, the regret bound can be further improved.
When the expected instance is non-degenerate, Jasin (2015) presented a new algorithm that attains $O(\log T)$ regret by periodically re-estimating the distribution of requests. When the distribution of requests is absolutely continuous with uniformly bounded densities, Li and Ye (2019) presented a different algorithm that attains $O(\log T)$ regret. Their algorithm updates the dual variables by periodically solving a linear program using all data collected so far.

| Papers | Objective | Resource constraints | Input model | Results |
|---|---|---|---|---|
| Mehta et al. (2007); Feldman et al. (2009) | Linear | Hard | Adversarial | Fixed comp. ratio |
| Devanur and Hayes (2009); Feldman et al. (2010); Agrawal et al. (2014); Devanur et al. (2019); Jasin (2015); Li and Ye (2019); Li et al. (2020) | Linear | Hard | Stochastic | Sublinear regret |
| Balseiro et al. (2020) | Nonlinear | Hard | Stochastic | Sublinear regret |
| Jenatton et al. (2016); Agrawal and Devanur (2015) | Non-separable | Soft | Stochastic | Sublinear regret |
| Eghbali and Fazel (2016) | Non-separable | Soft | Adversarial | Fixed comp. ratio |
| Tan et al. (2020) | Non-separable | Hard | Adversarial | Fixed comp. ratio |
| This paper | Non-separable | Hard | Stochastic | Sublinear regret |

Table 1: Comparison of our work with the existing literature on online allocation problems.

While the algorithms described above usually require periodically solving large linear programs, there is a recent line of work seeking simple algorithms that avoid solving large linear programs. Balseiro et al. (2020) studied a simple dual mirror descent algorithm for online allocation problems with concave reward functions, which attains $O(T^{1/2})$ regret. Our algorithm is similar to theirs in spirit, but our analysis is simpler as we do not need to explicitly bound the stopping time corresponding to the first time a resource is depleted. Their updates can be computed in linear time, avoiding the need to solve large linear programs. Soon afterward, Li et al. (2020) presented a similar fast algorithm that attains $O(T^{1/2})$ regret for linear rewards. Our proposed algorithm falls into this category: the update per iteration can be computed in linear time and there is no need to solve large convex optimization problems. A major difference is that we allow for a regularizer in the objective, which provides us with greater modeling power while increasing the difficulty of the analysis.

While most of the literature on online allocation focuses on maximizing an additively separable objective, other features of the allocation, such as fairness and load balancing, are sometimes crucial to the decision maker. For example, fairness is a central concept in welfare economics. Different reasonable metrics of equity have been proposed and studied: max-min fairness, which maximizes the reward of the worst-off agents (Nash Jr, 1950; Bansal and Sviridenko, 2006); proportional fairness, which makes sure that there is no alternative allocation that can lead to a positive aggregate proportional change for each agent (Azar et al., 2010; Bateni et al., 2018); and $\alpha$-fairness, which generalizes the previous notions (Mo and Walrand, 2000; Bertsimas et al., 2011; 2012) and recovers max-min fairness and proportional fairness as special cases when $\alpha \to \infty$ and $\alpha = 1$, respectively. The above line of work focuses on optimizing the fairness of an allocation problem in either static settings, adversarial settings, or stochastic settings that are different from ours.
In contrast, our framework is concerned with maximizing an additively separable objective together with an additional regularizer corresponding to fairness (or another desired ancillary objective). The regularizer can be viewed as a reward that is not separable over time. The existence of such a non-separable regularizer makes the theoretical analysis much harder than the one without the regularizer. For example, compared to Balseiro et al. (2020), one fundamental difficulty in the analysis is that the dual variables, which correspond to the opportunity cost of consuming a unit of resource, can be negative due to the existence of the regularizer. This precludes the stopping-time analysis introduced in Balseiro et al. (2020) from applying to our setting. We come up with a new analysis that overcomes this difficulty by utilizing the complementary slackness condition to control the performance up to the stopping time.

Indeed, there have been previous works studying online allocation problems with non-separable rewards, but most of them operate under very restrictive assumptions, precluding most of the interesting examples we present in Section 2.1. Jenatton et al. (2016) study a problem with soft resource constraints in which violations of the constraints are penalized using a non-separable regularizer. Thus, different from our paper, they allow the resource constraints to be violated. They provide an algorithm similar to ours, but their regret bounds depend on the optimal solution in hindsight, which, in general, is hard to control. Eghbali and Fazel (2016) consider maximizing a non-separable concave reward and require the reward to be monotone. They do not consider hard resource constraints and, moreover, they consider an adversarial input model. They provide similar primal-dual algorithms that are shown to attain fixed competitive ratios. Tan et al. (2020) study a problem similar to our regularized online allocation setting, but restrict attention to one resource. Compared to the above works, we consider stochastic inputs, which allow us to attain sublinear regret, while they consider adversarial inputs under which sublinear regret is not attainable.

Another work related to ours is Agrawal and Devanur (2015), whose focus is to solve general online stochastic convex programs that allow general concave objectives and convex constraints. When the value of the benchmark is known, they present fast algorithms; otherwise, their algorithms require periodically solving large convex optimization problems with the data collected so far to obtain a good estimate of the benchmark. In principle, our regularized online allocation problem (see Section 2 for details) can be reformulated as an instance of the online stochastic convex program presented in Agrawal and Devanur (2015). Such a reformulation makes the algorithms proposed in Agrawal and Devanur (2015) more complex than ours, as they require keeping track of additional dual variables and solving convex optimization programs on historical data (unless the optimal value of the objective is known). Moreover, their algorithm treats resource constraints as soft, i.e., they allow constraints to be violated and then prove that constraints are violated, in expectation, by an amount sublinear in the number of time periods $T$. Instead, in our setting, resource constraints are hard and cannot be violated, which is a fundamental requirement in many applications.
Additionally, our proposed algorithm is simple, fast, and does not require estimates of the value of the benchmark nor solving large convex optimization problems.

2. Problem Formulation

We consider the following generic convex problem with a finite horizon of $T$ time periods, resource constraints, and a regularizer $r$ on the resource consumption:

$$\max_{x:\,x_t\in\mathcal X}\ \sum_{t=1}^{T} f_t(x_t) + T\, r\Big(\tfrac{1}{T}\textstyle\sum_{t=1}^{T} b_t x_t\Big) \quad \text{s.t.}\quad \sum_{t=1}^{T} b_t x_t \le T\rho, \qquad (1)$$

where $x_t\in\mathcal X\subseteq\mathbb R^d$ is the action at time $t$, $f_t:\mathbb R^d\to\mathbb R_+$ is the non-negative concave reward function received at time $t$, $b_t\in\mathbb R_+^{m\times d}$ is the entry-wise non-negative cost matrix received at time $t$, $\rho\in\mathbb R_{++}^m$ is the positive resource constraint vector, and $r$ is a concave regularizer on the consumption. The assumption $b_t\ge 0$ implies that we cannot replenish resources once they are consumed. We assume that the action set $\mathcal X$ is a convex and compact set in $\mathbb R_+^d$, and $0\in\mathcal X$. The above assumptions imply that we can only take non-negative actions. Moreover, we can always take a void action by choosing $x_t=0$ in order to make sure we do not exceed the resource constraints. This guarantees the existence of a feasible solution.

We assume the request $(f_t,b_t)$ at time $t$ is generated i.i.d. from an unknown distribution $P\in\Delta(S)$ with finite support $S=\{(f^1,b^1),\dots,(f^n,b^n)\}$, where $\Delta(S)$ is the space of all probability distributions over $S$. In the online setting, at each time period $1\le t\le T$, we receive a request $(f_t,b_t)$ and use an algorithm $\mathcal A$ to make a real-time decision $x_t$ based on the current request $(f_t,b_t)$ and the previous history $\mathcal H_{t-1}:=\{f_s,b_s,x_s\}_{s=1}^{t-1}$:

$$x_t=\mathcal A(f_t,b_t\,|\,\mathcal H_{t-1}). \qquad (2)$$

Moreover, algorithm $\mathcal A$ must satisfy the constraints $\sum_{s=1}^{t} b_s x_s\le T\rho$ and $x_t\in\mathcal X$ for every $t\le T$. In particular, we define the expected reward of an algorithm $\mathcal A$ over a distribution $P\in\Delta(S)$ as

$$R(\mathcal A\,|\,P)=\mathbb E_P\Big[\sum_{t=1}^{T} f_t(x_t) + T\, r\Big(\tfrac{1}{T}\textstyle\sum_{t=1}^{T} b_t x_t\Big)\Big],$$

where $x_t$ is computed by (2). The baseline we compare with is the expected reward of the optimal solution when all requests are known in advance, which is also referred to as the offline problem in the computer science literature. This amounts to solving for the optimal allocation under full information of all requests and then taking expectations over all possible realizations:

$$\mathrm{OPT}(P)=\mathbb E_P\Big[\max_{x:\,x_t\in\mathcal X,\ \sum_{t=1}^T b_t x_t\le T\rho}\ \sum_{t=1}^{T} f_t(x_t) + T\, r\Big(\tfrac{1}{T}\textstyle\sum_{t=1}^{T} b_t x_t\Big)\Big]. \qquad (3)$$

Our goal is to design an algorithm $\mathcal A$ that attains low regret while satisfying the above constraints. We measure the regret of an algorithm as the worst-case difference, over distributions in $\Delta(S)$, between the expected performance of the benchmark and the algorithm:

$$\mathrm{Regret}(\mathcal A)=\sup_{P\in\Delta(S)}\big\{\mathrm{OPT}(P)-R(\mathcal A\,|\,P)\big\}.$$
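To make the benchmark concrete, the following is a minimal cvxpy sketch (ours, not the authors') of the inner maximization in (3) for a single sample path, under simplifying assumptions: linear rewards, simplex actions with identity consumption (so $b_t x_t = x_t$), and the max-min fairness regularizer introduced in Example 2 below. All names are illustrative.

```python
# Hindsight benchmark for one sample path, under the simplifying assumptions above.
import cvxpy as cp
import numpy as np

def hindsight_benchmark(q, rho, lam):
    """q: (T, m) reward vectors, rho: (m,) per-period budgets, lam: regularizer weight."""
    T, m = q.shape
    x = cp.Variable((T, m), nonneg=True)               # x[t] is the action at time t
    consumption = cp.sum(x, axis=0)                     # total consumption of each resource
    reward = cp.sum(cp.multiply(q, x))                  # sum_t f_t(x_t) with linear f_t
    fairness = lam * cp.min(cp.multiply(consumption, 1.0 / (T * rho)))  # r(avg consumption)
    constraints = [cp.sum(x, axis=1) <= 1,              # actions lie in the simplex X
                   consumption <= T * rho]              # hard resource constraints
    prob = cp.Problem(cp.Maximize(reward + T * fairness), constraints)
    prob.solve()
    return prob.value, x.value
```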
2.1. Examples of the Regularizer

We now describe some examples of the regularizer. First, by setting the regularizer to zero, we recover a standard online allocation problem.

Example 1 (No regularizer). When the regularizer is $r(a)=0$, we recover the non-regularized problem.

Our next example allows for max-min fairness guarantees, which have been studied extensively in the literature (Nash Jr, 1950; Bansal and Sviridenko, 2006; Mo and Walrand, 2000; Bertsimas et al., 2011; 2012). Here we state the regularizer in terms of consumption. In many settings, however, it is reasonable to state the fairness regularizer in terms of other quantities such as the cumulative utility of advertisers. As discussed in Example 6, such regularizers can be easily accommodated by introducing dummy resource constraints.

Example 2 (Max-min fairness). The regularizer is defined as $r(a)=\lambda\min_j(a_j/\rho_j)$, i.e., the minimum relative consumption. This regularizer imposes fairness on the consumption between different advertisers, making sure that no advertiser gets allocated a too-small number of ad slots. Here $\lambda>0$ is a parameter that captures the importance of the regularizer relative to the rewards.

In applications like cloud computing, load should be balanced across resources to avoid congestion. The following regularizer is reminiscent of the makespan objective in machine scheduling.

Example 3 (Load balancing). The regularizer is defined as $r(a)=-\lambda\max_j(a_j/\rho_j)$, i.e., the negative maximum relative consumption. This regularizer guarantees that consumption is evenly distributed across resources by making sure that no resource is in too much demand.

In some settings, the cost of utilizing resources is non-linear and convex because of decreasing returns to scale. The next regularizer allows us to capture situations in which the cost of utilizing resources increases as they become saturated.

Example 4 (Hinge loss of consumption). The regularizer is defined as $r(a)=-\sum_{j=1}^m c_j\max(a_j-t_j,0)$, a hinge-loss function with thresholds $t_j\in[0,\rho_j]$ and penalties $c_j$. This regularizer can be used when there is an extra variable cost $c_j$ for each unit of resource consumed above a threshold $t_j$.

To maximize reach, internet advertisers, in some cases, prefer that their budgets are spent as much as possible or that their reservation contracts are delivered as many impressions as possible. We can incorporate these features by having the firm pay a goodwill penalty whenever targets are not met.

Example 5 (Mirrored hinge loss of consumption). The regularizer is defined as $r(a)=-\sum_{j=1}^m c_j\max(t_j-a_j,0)$, a mirrored hinge-loss function with thresholds $t_j\in[0,\rho_j]$ and penalties $c_j$. This regularizer can be used when advertiser $j$ would like to spend at least $t_j$ and the firm pays a penalty $c_j$ for under-delivering.

The regularizers in the previous examples act exclusively on resource consumption. The next example shows that, by adding dummy resource constraints that never bind, it is possible to incorporate regularizers that act on other quantities.

Example 6 (Santa Claus regularizer from Bansal and Sviridenko 2006). The Santa Claus regularizer intends to make sure that the minimal reward of each advertiser is not too small. Here we consider the reward function $f_t(x_t)=\sum_{j=1}^m (q_t x_t)_j$, where $q_t\in\mathbb R^{m\times d}$ and $q_t x_t\in\mathbb R^m$ is the reward vector for the $m$ advertisers under decision $x_t$. While the reward function $f_t$ measures the total reward across all advertisers, the regularizer intends to make sure that the minimal reward among the advertisers is not too small. To reduce such a problem to our setting, we first add auxiliary budget constraints $\sum_{t=1}^T q_t x_t\le T\bar f e$, where $\bar f$ is an upper bound on the possible reward (so these constraints never bind), and then take the regularizer to be a max-min regularizer on the dummy resources, e.g., $\lambda\min_j \tfrac{1}{T}\sum_{t=1}^T (q_t x_t)_j$.
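As a quick illustration of how these regularizers look in code, here is a minimal NumPy sketch of Examples 2 and 5 evaluated on an average-consumption vector; the array shapes and names are our own assumptions, not part of the paper.

```python
# Example regularizers evaluated on an average-consumption vector `a` of shape (m,).
import numpy as np

def max_min_fairness(a, rho, lam):
    """Example 2: r(a) = lam * min_j a_j / rho_j."""
    return lam * np.min(a / rho)

def mirrored_hinge_loss(a, t, c):
    """Example 5: r(a) = -sum_j c_j * max(t_j - a_j, 0), a goodwill penalty for under-delivery."""
    return -np.sum(c * np.maximum(t - a, 0.0))
```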
2.2. The Dual Problem

Our algorithm is of a dual-descent nature and, thus, the Lagrangian dual problem of (3) and its constraint set play a key role. We construct a Lagrangian dual of (3) in which we move the constraints to the objective using a vector of Lagrange multipliers $\mu\in\mathbb R^m$. For $c\in\mathbb R^d$ and $\mu\in\mathbb R^m$ we define

$$f^*(c):=\sup_{x\in\mathcal X}\{f(x)-c^\top x\}\quad\text{and}\quad r^*(-\mu):=\sup_{a\le\rho}\{r(a)+\mu^\top a\}, \qquad (4)$$

as the conjugate function of $f$ restricted to $\mathcal X$ and the conjugate function of $r$ restricted to $\{a\,|\,a\le\rho\}$, respectively. (More precisely, $f^*(c)$ is the conjugate function of $f(x)+\mathbb 1\{x\in\mathcal X\}$ under the standard definition of conjugate functions, where $\mathbb 1\{x\in\mathcal X\}$ is the indicator function of the constraint; we redefine the conjugate function for simplicity.) Define $\mathcal D=\{\mu\in\mathbb R^m\,|\,r^*(-\mu)<+\infty\}$ as the set of dual variables for which the conjugate of the regularizer is bounded.

For a given distribution $P$, define the Lagrangian dual function $D(\mu\,|\,P):\mathcal D\to\mathbb R$ as

$$D(\mu\,|\,P):=\mathbb E_{(f,b)\sim P}\big[f^*(b^\top\mu)\big]+r^*(-\mu).$$

The following result shows that $D(\mu\,|\,P)$ provides a valid upper bound on $\mathrm{OPT}(P)$. All missing proofs are available in the appendix.

Proposition 1. It holds for any $\mu\in\mathcal D$ that $\mathrm{OPT}(P)\le T\,D(\mu\,|\,P)$.

Furthermore, we call

$$(\mathrm D):\ \inf_{\mu\in\mathcal D} T\,D(\mu\,|\,P) \qquad (5)$$

the dual problem to (1). As mentioned before, the feasible region of the dual problem, as given by $\mathcal D$, together with the conjugate of the regularizer, plays a key role in our algorithm and in our regret analysis. The following result provides some useful properties of the set $\mathcal D$.

Lemma 1. The set $\mathcal D$ is convex and the positive orthant is inside the recession cone of $\mathcal D$, i.e., $\mathbb R_+^m\subseteq\mathrm{recc}(\mathcal D)$.

Proof. Convexity follows from Proposition 1.1.6 of Bertsekas (2009). We prove the second part. Suppose $\mu\in\mathcal D$, namely $\sup_{a\le\rho}\{r(a)+\mu^\top a\}<+\infty$. Then it holds for any $e\in\mathbb R_+^m$ and $\lambda>0$ that

$$\sup_{a\le\rho}\{r(a)+(\mu+\lambda e)^\top a\}\le\sup_{a\le\rho}\{r(a)+\mu^\top a\}+\lambda e^\top\rho<+\infty,$$

since $e^\top a\le e^\top\rho$ whenever $a\le\rho$ and $e\ge 0$; thus $\mu+\lambda e\in\mathcal D$, which finishes the proof by the definition of the recession cone.

We next give explicit characterizations of the constraint set $\mathcal D$ and the conjugate $r^*$ for the sample regularizers stated in Section 2.1.

2.3. The Constraint Set D for the Examples

The next proposition presents the conjugate functions $r^*$, the corresponding domains $\mathcal D$, and optimal actions $a^*(-\mu)\in\arg\max_{a\le\rho}\{r(a)+\mu^\top a\}$ for the first five examples stated in Section 2.1.

Proposition 2. The following hold:

- Example 1: If $r(a)=0$, then $\mathcal D=\mathbb R_+^m$ and, for $\mu\in\mathcal D$, $r^*(-\mu)=\rho^\top\mu$ and $a^*(-\mu)=\rho$.
- Example 2: If $r(a)=\lambda\min_j(a_j/\rho_j)$, then $\mathcal D=\{\mu\in\mathbb R^m\,|\,\sum_{j\in S}\rho_j\mu_j\ge-\lambda\ \ \forall S\subseteq[m]\}$ and, for $\mu\in\mathcal D$, $r^*(-\mu)=\rho^\top\mu+\lambda$ and $a^*(-\mu)=\rho$.
- Example 3: If $r(a)=-\lambda\max_j(a_j/\rho_j)$, then $\mathcal D=\{\mu\in\mathbb R_+^m\,|\,\sum_{j=1}^m\rho_j\mu_j\ge\lambda\}$ and, for $\mu\in\mathcal D$, $r^*(-\mu)=\rho^\top\mu-\lambda$ and $a^*(-\mu)=\rho$.
- Example 4: If $r(a)=-\sum_{j=1}^m c_j\max(a_j-t_j,0)$, then $\mathcal D=\mathbb R_+^m$ and, for $\mu\in\mathcal D$, $r^*(-\mu)=\mu^\top t+\sum_{j=1}^m(\rho_j-t_j)\max(\mu_j-c_j,0)$, with $a_j^*(-\mu)=t_j$ if $\mu_j\in[0,c_j)$ and $a_j^*(-\mu)=\rho_j$ if $\mu_j\ge c_j$.
- Example 5: If $r(a)=-\sum_{j=1}^m c_j\max(t_j-a_j,0)$, then $\mathcal D=\{\mu\in\mathbb R^m\,|\,\mu\ge-c\}$ and, for $\mu\in\mathcal D$, $r^*(-\mu)=\mu^\top t+\sum_{j=1}^m(\rho_j-t_j)\max(\mu_j,0)$, with $a_j^*(-\mu)=t_j$ if $\mu_j\in[-c_j,0)$ and $a_j^*(-\mu)=\rho_j$ if $\mu_j\ge 0$.
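To make Proposition 2 concrete, here is a small NumPy check of the closed form for Example 5 (mirrored hinge loss); it is our own illustrative sketch with assumed array shapes, not code from the paper.

```python
# Closed-form conjugate and maximizer for Example 5; all arrays have length m.
import numpy as np

def conjugate_mirrored_hinge(mu, rho, t, c):
    """Return (r*(-mu), a*(-mu)) for r(a) = -sum_j c_j max(t_j - a_j, 0)."""
    if np.any(mu < -c):
        return np.inf, None                       # outside D the conjugate is unbounded
    a_star = np.where(mu >= 0, rho, t)            # a_j* = rho_j if mu_j >= 0, else t_j
    r_star = mu @ t + np.sum((rho - t) * np.maximum(mu, 0.0))
    return r_star, a_star
```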
3. Algorithm

Algorithm 1 presents the main algorithm we study in this paper.

Algorithm 1: Dual Subgradient Descent Algorithm

Input: initial dual solution $\mu_0$, total number of time periods $T$, initial resources $B_0=T\rho$, weight vector $w\in\mathbb R_{++}^m$, and step-size $\eta$.
For $t=0,\dots,T-1$ do:
1. Receive $(f_t,b_t)\sim P$.
2. Make the primal decision and update the remaining resources:
   $\tilde x_t=\arg\max_{x\in\mathcal X}\{f_t(x)-\mu_t^\top b_t x\}$,
   $x_t=\tilde x_t$ if $b_t\tilde x_t\le B_t$ and $x_t=0$ otherwise,
   $\tilde a_t=\arg\max_{a\le\rho}\{r(a)+\mu_t^\top a\}$,
   $B_{t+1}=B_t-b_t x_t$.
3. Obtain a stochastic subgradient of $D(\mu_t\,|\,P)$: $\tilde g_t=-b_t\tilde x_t+\tilde a_t$.
4. Update the dual variable by weighted, projected subgradient descent:
   $$\mu_{t+1}=\arg\min_{\mu\in\mathcal D}\ \langle\tilde g_t,\mu\rangle+\frac{1}{2\eta}\|\mu-\mu_t\|_w^2. \qquad (6)$$

Our algorithm keeps a dual variable $\mu_t\in\mathbb R^m$ for each resource that is updated using subgradient descent, which is the workhorse algorithm of online convex optimization (Hazan et al., 2016). At time $t$, the algorithm receives a request $(f_t,b_t)$ and computes the optimal response $\tilde x_t$ that maximizes an opportunity-cost-adjusted reward of this request based on the current dual solution $\mu_t$. It then takes this action (i.e., $x_t=\tilde x_t$) if the action does not exceed the resource constraints; otherwise it takes a void action (i.e., $x_t=0$). Additionally, it chooses a target resource consumption $\tilde a_t$ by maximizing the opportunity-cost-adjusted regularizer (in Proposition 2 we give closed-form solutions of $\tilde a_t$ for the regularizers we consider). Notice that it follows from the definition of the conjugate functions (4) that $-b_t\tilde x_t\in\partial_\mu f_t^*(b_t^\top\mu_t)$ and $\tilde a_t\in\partial_\mu r^*(-\mu_t)$. Thus $\tilde g_t:=-b_t\tilde x_t+\tilde a_t$ is an unbiased stochastic estimator of a subgradient of the dual function $D(\mu\,|\,P)$ at $\mu_t$:

$$\mathbb E_P[\tilde g_t]=\mathbb E_P[-b_t\tilde x_t]+\tilde a_t\in\partial D(\mu_t\,|\,P).$$

Finally, the algorithm utilizes $\tilde g_t$ to update the dual variable by performing an online subgradient descent step (6) with step-size $\eta$ and weights $w$. The descent step (6) can be interpreted as minimizing over $\mathcal D$ a first-order Taylor expansion of the dual objective plus a term that penalizes movement from the incumbent solution $\mu_t$ using the weighted $\ell_2$-norm $\|\cdot\|_w$, which is given by $\|x\|_w^2=\sum_{j=1}^m w_j x_j^2$ for some weight vector $w\in\mathbb R_{++}^m$.

Algorithm 1 only takes an initial dual variable and a step-size as inputs, and is thus simple to implement. In some cases, even though the constraint set $\mathcal D$ can be complicated, a proper choice of the weights $w$ may make the descent step (6) easily computable (in linear time or with a closed-form solution). We discuss good choices of the weights $w$ for Examples 2-5 in Appendix D. In particular, in Example 2, the dual feasible set $\mathcal D$ has an exponential number of constraints, but using weights $w_j=\rho_j^2$ we can cast (6) as a quadratic program with a linear number of constraints. In Example 3, the constraint set $\mathcal D$ is a simple polytope, and we can again cast (6) as a quadratic program with a linear number of constraints. Finally, in Examples 4 and 5, the set $\mathcal D$ is a simple box constraint, and we recover projected gradient descent by using the un-weighted Euclidean norm.
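The following is a runnable sketch of Algorithm 1 (our own, with assumptions spelled out in the comments): it specializes to linear rewards, simplex actions with identity consumption as in the experiments of Section 5, and the mirrored hinge-loss regularizer of Example 5, for which the projection in (6) with the un-weighted Euclidean norm reduces to clipping at $-c$.

```python
# A sketch of Algorithm 1 under the simplifying assumptions described above.
import numpy as np

def dual_subgradient_descent(Q, rho, t_min, c, eta, mu0=None):
    """Q: (T, m) click-through rates; rho: per-period budgets; t_min, c: Example 5 parameters."""
    T, m = Q.shape
    mu = np.zeros(m) if mu0 is None else mu0.copy()   # mu_0 = 0 lies in D = {mu >= -c}
    B = T * rho                                        # remaining resources, B_0 = T * rho
    X = np.zeros((T, m))                               # realized actions
    for t in range(T):
        q = Q[t]
        # Primal decision: maximize the dual-adjusted reward (q - mu)^T x over the simplex.
        x_tilde = np.zeros(m)
        j = int(np.argmax(q - mu))
        if q[j] - mu[j] > 0:
            x_tilde[j] = 1.0
        # Take the action only if it fits in the remaining resources; otherwise act void.
        x_t = x_tilde if np.all(x_tilde <= B) else np.zeros(m)
        X[t] = x_t
        B = B - x_t
        # Ideal consumption from the regularizer (closed form in Proposition 2, Example 5;
        # valid because mu stays in D = {mu >= -c} after each projection).
        a_tilde = np.where(mu >= 0, rho, t_min)
        # Stochastic subgradient of D(mu|P) and projected subgradient step (projection = clip).
        g = -x_tilde + a_tilde
        mu = np.maximum(mu - eta * g, -c)
    return X

# Illustrative usage with synthetic data.
rng = np.random.default_rng(0)
T, m = 1000, 4
Q = rng.uniform(size=(T, m))
rho = np.full(m, 0.3)
X = dual_subgradient_descent(Q, rho, t_min=0.1 * rho, c=np.full(m, 0.5), eta=1.0 / np.sqrt(T))
print("average consumption:", X.sum(axis=0) / T)
```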
4. Regret Bound

In this section, we present the worst-case regret bound of Algorithm 1 for solving (1). First, we state the assumptions required in our analysis.

Assumption 1 (Assumptions on the support of the distribution). There exist $\bar f\in\mathbb R_+$ and $\bar b\in\mathbb R_+$ such that for all requests $(f,b)\in S$ in the support, it holds that $f(x)\le\bar f$ for all $x\in\mathcal X$ and $\|bx\|_{w,*}\le\bar b$ for all $x\in\mathcal X$.

The upper bounds $\bar f$ and $\bar b$ impose regularity on the space of requests and appear in the regret bound. In the assumption above, we denote by $\|\cdot\|_{w,*}$ the dual norm of $\|\cdot\|_w$, which is given by $\|x\|_{w,*}^2=\sum_{j=1}^m x_j^2/w_j$.

Assumption 2 (Assumptions on the regularizer r). We assume the regularizer $r$ and the set $\mathcal D=\{\mu\in\mathbb R^m\,|\,r^*(-\mu)<+\infty\}$ satisfy:
1. The function $r(a)$ is concave.
2. The function $r(a)$ is $L$-Lipschitz continuous in the $\|\cdot\|_{w,*}$-norm on its effective domain, i.e., $|r(a_1)-r(a_2)|\le L\|a_1-a_2\|_{w,*}$ for any $a_1,a_2\le\rho$.
3. There exists a constant $\bar a$ such that, for any $\mu\in\mathcal D$, there exists $\tilde a\in\arg\max_{a\le\rho}\{r(a)+\mu^\top a\}$ such that $\|\tilde a\|_{w,*}\le\bar a$.
4. There exist $\underline r$ and $\bar r$ such that $\underline r\le r(bx)\le\bar r$ for any $x\in\mathcal X$ and $(f,b)\in S$.

The first part of the assumption is required to guarantee that the optimization problem is convex. The third part guarantees that the conjugate of the regularizer has bounded subgradients, while the last part imposes that the regularizer is bounded. Proposition 2 in Section 2.3 immediately implies that the assumption is satisfied by all the examples we consider in this paper. The previous assumptions imply, among other things, that the projection step (6) of the algorithm always admits a solution. First, continuity of the regularizer implies that $\mathcal D$ is closed by Proposition 1.1.6 of Bertsekas (2009). The objective is continuous and coercive. Therefore, the projection problem admits an optimal solution by the Weierstrass theorem.

The next theorem presents the worst-case regret bound of Algorithm 1.

Theorem 1. Consider Algorithm 1 with step-size $\eta>0$ and initial solution $\mu_0\in\mathcal D$. Suppose Assumptions 1-2 are satisfied. Then, it holds for any $T\ge 1$ that
$$\mathrm{Regret}(\mathcal A)\le C_1+C_2\,\eta T+\frac{C_3}{\eta},$$
where $C_1=(\bar f+\bar r+L(\bar b+\bar a)-\underline r)/\underline\rho$, $C_2=(\bar b+\bar a)^2/2$, $C_3=(L+C_1\|w\|_1^{1/2})^2+\|\mu_0\|_w^2$, and $\underline\rho=\min_{j\in[m]}\rho_j$.

While the result has a similar flavor to Balseiro et al. (2020), we would like to highlight that their stopping-time analysis does not work for our setting with the regularizer because the dual variables can be strictly negative in the presence of a regularizer.

Some observations are in order. First, the previous result implies that, by choosing a step-size of order $T^{-1/2}$, Algorithm 1 attains regret of order $O(T^{1/2})$ when the length of the horizon and the initial amount of resources are simultaneously scaled. Second, Lemma 1 from Arlotto and Gurvich (2019) implies that one cannot hope to attain regret lower than $\Omega(T^{1/2})$ under our assumptions. (Their result holds for the case of no regularizer, which is a special case of our setting.) Therefore, Algorithm 1 attains the optimal order of regret.

We prove Theorem 1 in three steps. Let $\tau_{\mathcal A}$ be the stopping time corresponding to the first time that a resource is depleted. The first step involves lower bounding the cumulative reward of the algorithm up to $\tau_{\mathcal A}$ in terms of the dual objective (evaluated at the average dual variable used by the algorithm) minus a complementary slackness term. Until the stopping time $\tau_{\mathcal A}$, the algorithm performs standard subgradient descent steps on the dual function and, as a consequence, seeks to minimize complementary slackness. In the second step, we upper bound the complementary slackness term using standard results for online subgradient descent. By picking a suitable point in the dual space and using the theory of convex conjugates, we can relate the bound on the complementary slackness term to the value of the regularizer. In the last step, we put everything together by using Proposition 1 to upper bound the optimal performance in terms of the dual function and then controlling the stopping time $\tau_{\mathcal A}$.
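For completeness, here is the standard step-size calculation behind the $O(T^{1/2})$ claim; it is our own one-line derivation from the bound in Theorem 1, not part of the paper's proof.

```latex
% Minimizing the regret bound of Theorem 1 over the step-size \eta.
\min_{\eta>0}\; C_1 + C_2\,\eta T + \frac{C_3}{\eta}
\;\Longrightarrow\;
\eta^{\star}=\sqrt{\frac{C_3}{C_2\,T}},
\qquad
\mathrm{Regret}(\mathcal{A}) \;\le\; C_1 + 2\sqrt{C_2\,C_3\,T} \;=\; O\!\big(T^{1/2}\big).
```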
5. Numerical Experiments

In this section, we present numerical experiments on a display advertisement allocation application regularized by max-min fairness on consumption (Example 2).

Dataset. We utilize the display advertisement dataset introduced in Balseiro et al. (2014). They consider a publisher who has agreed to deliver ad slots (the requests) to different advertisers (the resources) so as to maximize the cumulative click-through rates (the reward) of the assignment. In their paper, they estimate click-through rates using mixtures of log-normal distributions. We adopt their parametric model as a generative model and sample requests from their estimated distributions. We consider publisher 2 from their dataset, which has $m=12$ advertisers. Furthermore, in our experiments, we rescale the budgets so that $\sum_{j=1}^m\rho_j=1.5$ in order to make sure that the max-min fairness is strictly less than 1.

Regularized Online Problem. The goal here is to design an online allocation algorithm that maximizes the total expected click-through rate with a max-min fairness regularizer on resource consumption as in Example 2. Advertiser $j\in[m]$ can be assigned at most $T\rho_j$ ad slots and the decision variables lie in the simplex $\mathcal X=\{x\in\mathbb R_+^m\,|\,\sum_{j=1}^m x_j\le 1\}$. Denoting by $q_t\in\mathbb R^m$ the click-through rates of the $t$-th ad slot, the benchmark is given by:

$$\max_{x:\,x_t\in\mathcal X}\ \sum_{t=1}^T q_t^\top x_t+T\lambda\min_{j=1,\dots,m}\frac{\sum_{t=1}^T(x_t)_j}{T\rho_j}\quad\text{s.t.}\ \sum_{t=1}^T x_t\le T\rho, \qquad (8)$$

where $\lambda$ is the weight of the regularizer. In the experiments, we consider the regularization levels $\lambda\in\{0,0.1,0.01,0.001,0.0001\}$ and lengths of horizon $T\in\{10^2,10^3,2\times10^3,\dots,10^4\}$.

Implementation Details. In the numerical experiments, we implemented Algorithm 1 with weights $w_j=\rho_j^2$ and step-size $\eta=0.01\,T^{-1/2}$. The dual update (6) is computed by solving a convex quadratic program as stated in Appendix D using cvxpy (Diamond and Boyd, 2016). For each regularization level $\lambda$ and time horizon $T$, we randomly choose $T$ samples from their dataset, which are fed to Algorithm 1 sequentially. In order to compute the regret, we utilize the dual objective evaluated at the average dual variable, $D(\tfrac{1}{T}\sum_{t=1}^T\mu_t\,|\,P)$, as an upper bound on the benchmark. We report the average cumulative reward, the average max-min consumption fairness, and the average regret of 100 independent trials in Figure 1 and Figure 2. Ellipsoids in Figure 2(b) give 95% confidence regions for the point estimates.

[Figure 1: Plot of the regret versus the horizon $T$ for the regularized online allocation problem (8) with different regularization levels.]

[Figure 2: Plot of the reward $\sum_{t=1}^T q_t^\top x_t$ versus the max-min fairness $\min_{j=1,\dots,m}\sum_{t=1}^T(x_t)_j/(T\rho_j)$. Dots from left to right correspond to regularization levels $\lambda=0.0, 0.0001, 0.001, 0.01, 0.1$, respectively.]

Observations. Consistent with Theorem 1, Figure 1 suggests that regret grows at rate $O(\sqrt T)$ for all regularization levels. Figure 2 presents the trade-off between reward and fairness with the 95% confidence ellipsoids. In particular, we can double the max-min fairness while sacrificing only about 4% of the reward by choosing $\lambda=0.01$. This showcases that fairness can be significantly improved by solving the regularized problem with only a small reduction in reward.
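The quadratic program referenced in the implementation details lives in Appendix D, which is not reproduced here; below is our own hedged sketch of how the weighted update (6) for the max-min fairness regularizer can be posed in cvxpy, using an equivalent single-constraint description of $\mathcal D$ (the sum of the negative parts of $\rho_j\mu_j$ must be at least $-\lambda$). Function and variable names are illustrative.

```python
# One weighted dual update (6) for the max-min fairness regularizer, with weights w_j = rho_j**2.
import cvxpy as cp
import numpy as np

def dual_update_maxmin(mu_t, g_t, rho, lam, eta):
    """Solve mu_{t+1} = argmin_{mu in D} <g_t, mu> + (1/(2*eta)) * ||mu - mu_t||_w^2."""
    m = len(mu_t)
    mu = cp.Variable(m)
    step = g_t @ mu + (1.0 / (2.0 * eta)) * cp.sum_squares(cp.multiply(rho, mu - mu_t))
    feasible = [cp.sum(cp.multiply(rho, cp.minimum(mu, 0))) >= -lam]   # equivalent to mu in D
    cp.Problem(cp.Minimize(step), feasible).solve()
    return mu.value

# Illustrative usage.
rho = np.array([0.5, 0.5, 0.5])
mu_next = dual_update_maxmin(np.zeros(3), np.array([0.2, -0.1, 0.3]), rho, lam=0.01, eta=0.01)
```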
6. Conclusion and Future Directions

In this paper, we introduce the regularized online allocation problem, a novel variant of the online allocation problem that allows for regularization on resource consumption. We present multiple examples to showcase how the regularizer can help attain desirable properties, such as fairness and load balancing, and present a dual online subgradient descent algorithm for solving this problem with low regret. Future directions include extending the results in this work to more general input models (e.g., non-stationary stochastic inputs and adversarial inputs).

References

Shipra Agrawal and Nikhil R. Devanur. Fast algorithms for online stochastic convex programming. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '15, pages 1405–1424, USA, 2015. Society for Industrial and Applied Mathematics.

Shipra Agrawal, Zizhuo Wang, and Yinyu Ye. A dynamic near-optimal algorithm for online linear programming. Operations Research, 62(4):876–890, 2014.

Alessandro Arlotto and Itai Gurvich. Uniformly bounded regret in the multisecretary problem. Stochastic Systems, 9(3):231–260, 2019.

Yossi Azar, Niv Buchbinder, and Kamal Jain. How to allocate goods in an online market? In European Symposium on Algorithms, pages 51–62. Springer, 2010.

Ashwinkumar Badanidiyuru, Robert Kleinberg, and Aleksandrs Slivkins. Bandits with knapsacks. J. ACM, 65(3), March 2018.

Santiago Balseiro, Haihao Lu, and Vahab Mirrokni. Dual mirror descent for online allocation problems. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 613–628. PMLR, 13–18 Jul 2020.

Santiago R. Balseiro, Jon Feldman, Vahab Mirrokni, and S. Muthukrishnan. Yield optimization of display advertising with ad exchange. Management Science, 60(12):2886–2907, 2014.

Nikhil Bansal and Maxim Sviridenko. The Santa Claus problem. In Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, pages 31–40, 2006.

Mohammad Hossein Bateni, Yiwei Chen, Dragos Ciocan, and Vahab Mirrokni. Fair resource allocation in a volatile marketplace. Available at SSRN 2789380, 2018.

Dimitri P. Bertsekas. Convex Optimization Theory. Athena Scientific, Belmont, 2009.

Dimitris Bertsimas, Vivek F. Farias, and Nikolaos Trichakis. The price of fairness. Operations Research, 59(1):17–31, 2011.

Dimitris Bertsimas, Vivek F. Farias, and Nikolaos Trichakis. On the efficiency-fairness trade-off. Management Science, 58(12):2234–2250, 2012.

Gong Chen and Marc Teboulle. Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM Journal on Optimization, 3(3):538–543, 1993.

Nikhil R. Devanur and Thomas P. Hayes. The adwords problem: online keyword matching with budgeted bidders under random permutations. In Proceedings of the 10th ACM Conference on Electronic Commerce, EC '09, pages 71–78. ACM, 2009.

Nikhil R. Devanur, Kamal Jain, Balasubramanian Sivan, and Christopher A. Wilkens. Near optimal online algorithms and fast approximation algorithms for resource allocation problems. J. ACM, 66(1), January 2019.

Steven Diamond and Stephen Boyd. CVXPY: A Python-embedded modeling language for convex optimization. The Journal of Machine Learning Research, 17(1):2909–2913, 2016.

Reza Eghbali and Maryam Fazel. Designing smoothing functions for improved worst-case competitive ratio in online optimization. In Advances in Neural Information Processing Systems, pages 3287–3295, 2016.

Jon Feldman, Nitish Korula, Vahab Mirrokni, S. Muthukrishnan, and Martin Pál. Online ad assignment with free disposal. In Proceedings of the 5th International Workshop on Internet and Network Economics, WINE '09, pages 374–385. Springer-Verlag, 2009.

Jon Feldman, Monika Henzinger, Nitish Korula, Vahab S. Mirrokni, and Cliff Stein. Online stochastic packing applied to display ad allocation. In Proceedings of the 18th Annual European Conference on Algorithms: Part I, ESA '10, pages 182–194. Springer-Verlag, 2010.

Elad Hazan et al. Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3-4):157–325, 2016.

Stefanus Jasin. Performance of an LP-based control for revenue management with unknown demand parameters. Operations Research, 63(4):909–915, 2015.

Rodolphe Jenatton, Jim Huang, Dominik Csiba, and Cedric Archambeau. Online optimization and regret guarantees for non-additive long-term constraints. arXiv preprint arXiv:1602.05394, 2016.

Xiaocheng Li and Yinyu Ye. Online linear programming: Dual convergence, new algorithms, and regret bounds. 2019.

Xiaocheng Li, Chunlin Sun, and Yinyu Ye. Simple and fast algorithm for binary integer and online linear programming. arXiv preprint arXiv:2003.02513, 2020.

Aranyak Mehta, Amin Saberi, Umesh Vazirani, and Vijay Vazirani. Adwords and generalized online matching. J. ACM, 54(5):22-es, October 2007. ISSN 0004-5411.
Jeonghoon Mo and Jean Walrand. Fair end-to-end window-based congestion control. IEEE/ACM Transactions on Networking, 8(5):556–567, 2000.

John F. Nash Jr. The bargaining problem. Econometrica: Journal of the Econometric Society, pages 155–162, 1950.

Shai Shalev-Shwartz et al. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2012.

Xiaoqi Tan, Bo Sun, Alberto Leon-Garcia, Yuan Wu, and Danny H. K. Tsang. Mechanism design for online resource allocation: A unified approach. In Abstracts of the 2020 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems, pages 11–12, 2020.

Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning, ICML '03, pages 928–935. AAAI Press, 2003.