# nonstationary_dual_averaging_and_online_fair_allocation__3a3e2b54.pdf

Nonstationary Dual Averaging and Online Fair

Luofeng Liao, Yuan Gao, Christian Kroer

IEOR, Columbia University {ll3530,yg254,ck294}@columbia.edu

We consider the problem of fairly allocating sequentially arriving items to a set of individuals. For this problem, the recently-introduced PACE algorithm leverages the dual averaging algorithm to approximate competitive equilibria and thus generate online fair allocations. PACE is simple, distributed, and parameter-free, making it appealing for practical use in large-scale systems. However, current performance guarantees for PACE require i.i.d. item arrivals. Since real-world data is rarely i.i.d., or even stationary, we study the performance of PACE on nonstationary data. We start by developing new convergence results for the general dual averaging algorithm under three nonstationary input models: adversariallycorrupted stochastic input, ergodic input, and block-independent (including periodic) input. Our results show convergence of dual averaging up to errors caused by nonstationarity of the data, and recover the classical bounds when the input data is i.i.d. Using these results, we show that the PACE algorithm for online fair allocation simultaneously achieves best of many worlds guarantees against any of these nonstationary input models as well as against i.i.d. input. Finally, numerical experiments show strong empirical performance of PACE against nonstationary inputs.

1 Introduction

In fair division, the goal is to allocate a set of items, typically assumed divisible, among a set of agents with heterogeneous preferences, while guaranteeing fairness and efﬁciency properties. In this paper we are interested in how to fairly and efﬁciency allocate items that arrive online: at every time step one item arrives, and we must irrevocably assign it to some agent. Recently, there has been a growing literature on such online fair allocation problems (Azar et al., 2016; Balseiro et al., 2020; Gao et al., 2021; Bateni et al., 2021; Sinclair et al., 2021; Banerjee et al., 2022). Real-world systems that can be captured by such settings include Internet advertising systems, job recommender systems, cloud computing platforms, and many more. One of the key challenges in such problems is to balance the (often conﬂicting) goals of overall efﬁcient resource utilization with fairness guarantees for the individual agents.

For this setting, Gao et al. (2021) shows that a simple mechanism called PACE (Pace According to Current Estimated utility) generates asymptotically fair and efﬁcient allocations when the item arrivals are drawn in an i.i.d. manner. PACE gives each agent a per-time-step budget of faux currency, and the fair allocation is achieved by having agents participate in ﬁrst-price auctions for each item, using the faux money. By guaranteeing that each agent asymptotically spends their budget at the correct rate, the resulting allocations and prices converge to what is known as a competitive equilibrium from equal incomes (CEEI), which guarantees both fairness and efﬁciency. In PACE, each agent maintains a pacing multiplier to control their spending over time, and the pacing multipliers are updated based on buyers budgets and cumulative utilities. This is similar to how budget-management

36th Conference on Neural Information Processing Systems (Neur IPS 2022).

systems work in Internet ad auctions. PACE is highly decentralized due to its auction-based allocation, it does not require dividing the item, and it is also completely parameter free. This makes it suitable for large-scale practical implementation.

Yet in many large-scale settings, such as the context of fair recommender systems (Kroer et al., 2021;

Kroer and Stier-Moses, 2022) or Internet advertising, we would not expect items to be drawn i.i.d. from a single distribution. One alternative is to assume that data arrives adversarially. However, this leads to very pessimistic negative results and is not an accurate representation of the data one would expect to see in practice. Instead, one would expect the data to have a strong stochastic component, but with changes over time, e.g., due to ﬂow of trafﬁc, breaking news events, or system updates (Esfandiari et al., 2018; Balseiro et al., 2020).

Motivated by the above considerations, we study online fair allocation when the data exhibits nonstationary behavior. In particular, we focus on the performance of the PACE algorithm of Gao et al. (2021). We ask

How does PACE behave when nonstationarity is present in the stream of items?

We show that, under several data-input models, the fairness and efﬁciency guarantees of the PACE algorithm are still preserved, up to errors due to the nonstationarity of the data input. In this sense, we signiﬁcantly extend the main results in Gao et al. (2021). To show these results, we ﬁrst consider the more general setting of nonstationary stochastic optimization and develop new performance guarantees for dual averaging in this setting. Given the ubiquitous use of dual averaging in online and stochastic optimization, our results are of broader interest beyond (fair) resource allocation.

1.1 Summary of Contributions

Novel convergence results for dual averaging under three nonstationary settings.

We analyze the dual averaging (DA) algorithm for nonstationary stochastic optimization under different data input models, namely, (1) mildly corrupted, (2) ergodic and (3) periodic input data. Specifically, we consider the composite dual averaging algorithm, where the composite term is strongly convex. We show that, in all cases, the iterates generated by dual averaging (DA) converge to the optimal solution in mean square, where the bound on the mean-square error decomposes into two terms: i) the typical O(log t/t) guarantee known from the i.i.d. case, and ii) a term that depends on the amount of nonstationarity in the data input model. Our results recover the classical bounds under i.i.d. data input as a special case.

Theoretical fairness and efﬁciency guarantees of PACE for nonstationary item arrivals.

We consider the online fair allocation problem where item arrivals follow any of the three data input models that we consider for DA; these settings generalize the i.i.d. setting in Gao et al. (2021). Utilizing our convergence results for DA under nonstationary data input models, we show that, for item arrivals following these models, PACE ensures convergence of the pacing multipliers, again with a decomposition into a O(log t/t) term as well as a term depending on the nonstationarity. We then show that the agents realized utilities, envy, regrets, and expenditures all obtain convergence bounds based on the convergence of pacing multipliers. Our results show that PACE as an online fair resource allocation algorithm is robust against distributional uncertainty of the input and automatically adapts to many different data input models without any parameter tuning. In Appendix F we provide numerical experiments which corroborate the above theory and demonstrate the practical efﬁciency of PACE under different data input models.

An extensive review of related work is provided in Appendix A.

Notation. We use 1t to denote the vector of ones of length t and ej to denote the vector with one in the j-th entry and zeros in the others. We use (Θ) to denote the space of probability measures on a measurable space Θ, and n to denote the simplex in Rn. To measure the nonstationarity in the input data, we will use the total variation distance. Given two probability measures P and Q, it is deﬁned as P Q TV := (1/2)

dµ | dµ , where µ is a supporting measure.

2 Preliminaries on Online Fair Allocation

An online fair allocation instance with inﬁnitely divisible items with n agents and a ﬁnite horizon t consists of a tuple A = (n, t, Θ, Q, v), where Θ is the (possibly uncountable) measurable space of all possible items, with an associated σ-algebra M and a probability measure µ, the distribution Q (Θt) is the distribution over possible sequences of items γ = (θ1, . . . , θt) Θt, each of unit supply, and the set v = (v1, . . . , vn) L1

+(Θ)n is the set of valuation functions of the n agents. Here L1

+(Θ) is the space of positive integrable functions on Θ. Agent i sees a utility of vi(θ) in item θ Θ. Abusing notation we let vi(γ) =

vi(θ1), . . . , vi(θt)

denote the valuation for agent i of items in the sequence γ. Let Qτ be the marginal distribution of the item θτ at time τ and Q = (1/t)$t

τ=1Qτ. We assume

Θ vid Q = 1 for all i [n]. We further assume |v| := maxi vi < . We stress that the PACE algorithm that we study is not going to require access to either the valuation functions v or the set of possible items Θ; these are only required in order to discuss the resulting bounds.

Given an instance A, the decision maker allocates the stream of items γ one at a time, in an irrevocable manner. At time τ when item θτ is revealed, the decision maker must choose an allocation xτ = (xτ

1, . . . , xτ

n) n based on information available at that time, and allocate accordingly. Here the i-th entry of xτ is the fraction of item θτ allocated to agent i. On receiving her fraction, agent i realizes a utility of uτ

i := vi(θτ)xτ

i . We let x = (x1, . . . , xt) denote the sequence of allocations made over time. For agent i, let xi = (x1

i , . . . , xt

i) Rt denote the fraction of items given to agent i across time. With this notation, the total utility of agent i is xi, vi(γ) . The goal of the decision maker is to choose, in an online fashion, an allocation x such that it achieves some form of both efﬁciency and fairness guarantees.

2.1 Benchmark: The Hindsight Allocation

As a benchmark, we will consider the hindsight-optimal allocation. Suppose all items are presented to the decision maker as opposed to arriving one by one. In that case, a fair and efﬁcient allocation can be found by allocating using the Eisenberg-Gale (EG) convex program Eisenberg and Gale (1959). EG picks the allocation that maximizes the sum of weighted logarithmic utilities (which is equivalent to maximizing the weighted geometric mean of utilities):

max x 0,u 0

The weights Bi represent the priority given to each agent, and they they can be interpreted as budgets in a market-based interpretation of the EG allocation.1. We will focus on the case where Bi = t/n for all i, but all our results extend directly to the case of unequal weights, which can be useful in settings such as when buyers have quasilinear utilities Gao et al. (2021); Conitzer et al. (2019) or when it is desirable to give a larger allocation to certain agents.

The PACE algorithm asymptotically converges to the optimal dual solution, which is

βγ := arg min

max i [n] βivi(θτ) 1

We will also be interested in the underlying problem implied by the average item supplies s = d Q/ dµ. Letting

Θ vixi dµ, this leads to the inﬁnite-dimensional analogue of (1):

We let u denote the optimal utilities in Eq. (3). The inﬁnite-dimensional analogue of the dual (2) is the following. For any δ0 > 0, The inﬁnite-dimensional analogue of (2) is the following. For any δ0 > 0,

β := arg min 1 n(1+δ0) β 1+δ0

max i [n] βivi(θ)

1The hindsight allocation Eq. (1) can be interpreted as a competitive equilibrium from equal incomes (CEEI) in the corresponding Fisher market; see Appendix B for more details on this interpretation.

A rigorous mathematical treatment of the inﬁnite-dimensional program can be found in Gao and Kroer (2021) and (Gao et al., 2021, Section 2). Note the additional constraint in Eq. (4) on β does not affect the optimal solution since 1/n β

i 1; see Lemma 1 in Gao and Kroer (2021).

It is well-known that the hindsight allocation generated by the EG program enjoys the following efﬁciency and fairness properties:

1. Pareto optimality: we cannot strictly increase any agent s utility without decreasing some

other agents utility. 2. Envy-freeness: each agent prefers their own allocation to that of any other agent: vi(γ), x

k for all k = i. 3. Proportionality: every agent achieves at least as much utility as under the uniform alloca-

tion, i.e. vi(γ), x

i vi(γ), (1/n)1t . Therefore the hindsight EG allocation is the gold standard that we assume the decision maker would use if she had known the sequence of items γ in advance. However, in the online setting the decision maker does not know this sequence, and must therefore instead attempt to approximate an equally good allocation in online fashion.

For an item sequence γ, we let xγ denote the optimal hindsight allocation, which is an optimal solution to Eq. (1), and we denote the resulting total and average utility as

i vi(θτ), uγ

i := (1/t) U γ

2.2 Performance Metrics

We measure the performance of an online allocation rule x on the instance γ via the following two quantities. The regret of agent i is the difference between the total hindsight equilibrium utility U γ

i and their realized utilities uτ

Regi,t(γ) := U γ

The envy of agent i is the maximal extent to which they prefer the allocation of any other agent:

Envyi,t(γ) := max

vi(γ), xk vi(γ), xi

We seek to understand the worst-case behavior of an algorithm when facing a certain class of input distributions. For a given input distribution class C (Θt), we will develop bounds on the worstcase regret and envy under any distribution in C:

2.3 The PACE Algorithm

In this section, we review the PACE (Pace According to Current Estimated Utility) dynamics (Gao et al., 2021). In PACE, each item is allocated via ﬁrst-price auction, and each agent constructs bids by scaling their value by a pacing multiplier. The pacing multipliers are maintained using simple, distributed updates that can be handled either by the agents or by the platform.

Algorithmic details are displayed in Algorithm 1. Here Π[ℓ,h][x] = max{ℓ, min{x, h}}. At every time step τ an item θτ is revealed. At that point every agent comes up with a bid for that item, which is equal to their value for the item multiplied by their current pacing multiplier βτ

i . Then, the agents submit these bids to a ﬁrst-price auction, and the item is allocated to the highest bidder. For concreteness, we choose the bidder with the smallest index if a tie occurs, but any rule works. Each agent then observes their realized utility, updates their average utility received so far, and updates their pacing multiplier accordingly. As pointed out in Gao et al. (2021), PACE is an instantiation of dual averaging Xiao (2010) applied to the dual of the hindsight allocation program in (4).

PACE has many attributes desirable in real-world applications.

Algorithm 1: PACE(n, t, δ0) Input: number of agents n, horizon t, algorithm parameter δ0 > 0.

1 Initialize: Set β1 = (1 + δ0) 1n.

2 Environment draws the item sequence γ = {θ1, . . . , θt} from the distribution Q.

3 for τ = 1, . . . , t when item θτ is revealed do

4 Agent i bids βτ

i vi(θτ), the whole item θτ is allocated to the highest bidder iτ (with arbitrary tie breaking) iτ := min{arg maxi [n] βτ

i vi(θτ)} .

5 Agent i updates current average utility uτ

i = vi(θτ)1{i = iτ} , uτ

6 Agent i updates the pacing multiplier βτ+1

, where the interval

1 (1+δ0)n, 1 + δ0

Highlight 1. Decentralization. The PACE dynamics can be run in either centralized (by having the mechanism designer emulate the pacing process for each agent) or decentralized fashion (since the auction-based allocation is the only centralized step at each iteration), and are therefore suitable for Internet-scale online fair division and online Fisher market applications.

Highlight 2. Pure Allocation. PACE allows each item to be fully allocated to a single agent, even though the hindsight performance metric is allowed to utilize fractional allocations. While fractional allocations can be interpreted as randomized allocations in many large-scale settings, this may not always be desirable, for example when allocating food to food banks.

Highlight 3. Tuning-free. An important fact about the PACE dynamics is that each agent has no stepsize parameter whatsoever, which means that no stepsize tuning is required. Moreover, PACE is robust against the types of item arrivals since the algorithm needs neither knowledge of the item distribution P nor the input type C.

In addition to the regret and the envy performance metrics, we will also derive results for the following two quantities that characterize the long-run behavior of PACE. Let ut = (1/t) $t

τ=1uτ be the vector of average realized utilities for all agents. We will show that the agents utilities converge to those associated to the underlying ofﬂine fair allocation problem, u , deﬁned in Eq. (3), in a mean-square sense, i.e., E

0, as long as the error due to nonstationarity grows sublinearly in the number of time periods. Secondly, deﬁne the expenditure of agent i at time τ by bτ

i vi (θτ) 1 {i = iτ} . We will show (1/t) $t

i 1/n in mean square as well, as long as the error due to nonstationarity grows sublinearly in the number of time periods.

3 Main Results

This section introduces the main results of this paper: the behavior of PACE under three different types of nonstationary input models. All prior results on fair online allocation have been either for worst-case inputs (with much more conservative guarantees and not for the PACE algorithm) (Azar et al., 2016; Banerjee et al., 2022) or for i.i.d. input data Gao et al. (2021).

We ﬁrst introduce some notation that will be useful for describing these input models. For s > τ 1 let Qs(θ1:τ) denote the conditional distribution of θs given {θ1, . . . , θτ}. For a subset I of [t] let QI denote the joint distribution of the variables {θτ}τ I. Let Q = (1/t) $t

τ=1Qτ be the uniform mixture of {Qτ}τ. We study three types of input: independent input with adversarial corruption, ergodic and Markov input, and periodic input. For each input setting, we describe our main theorem for the performance guarantees of PACE here. The proofs are given in Appendix C, because these results rely on developing a theory of nonstationary performance of DA, which is done in Section 4.

3.1 Independent Input with Adversarial Corruption

Adversarial perturbation of a ﬁxed item distribution models scenarios where the items generally behave in a predictable manner, but for some time steps the input behaves erratically. Typically this is assumed to happen only for a small number of time steps. Such perturbation could be malicious,

for example when item arrivals are manipulated in favor of certain agents; or non-malicious, such as unpredictable surges of certain keywords on search engines (Esfandiari et al., 2018).

We study a type of adversarial perturbation where the item distribution at each time step might be corrupted by an arbitrary amount, but distributions at different time steps are independent of each other. We assume the average corruption is bounded by δ, as measured in TV distance. The set of distributions over sequences that we consider is then:

We use O to hide numeric constants and polynomials of n, |v| , and log t. Our main fair online allocation result for the adversarial corruption case is: Theorem 1 (Independent Case). For the adversarially corrupted and independent case, Algorithm 1 guarantees that for an instance A = (n, t, Θ, Q, v), we have

sup Q CID(δ)

, sup Q CID(δ)

sup Q CID(δ) Eγ Q

bt (1/n)1n 21

,sup Q CID(δ) Eγ Q

,sup Q CID(δ) Eγ Q

= O(δ+1/t) .

The result shows that the regret and envy performance metrics degrade linearly in the average corruption δ. In the i.i.d. case where δ = 0, we recover the

t regret rate and the 1/t rate of convergence for utilities and expenditures in terms of the mean-square error from Gao et al. (2021). If out of the t distributions of items in each time step only O(

t) are corrupted, each by a constant amount, then the

t regret and envy bounds, as well as 1/t convergence rates, are also preserved.

3.2 Ergodic Input and Markov Processes

To handle correlation across time, we next study ergodic inputs. For these inputs, strong correlation might be present for items sampled at nearby time steps, but the correlation between items decays as they are separated in time. For any integer ι such that 1 ι t 1, we measure the ι-step deviation from some distribution Π (Θ) by the quantity

δ(ι) := sup

γ sup τ=1,...,t ι

Qτ+ι(θ1:τ) Π TV .

Intuitively, this deﬁnition tells us that, no matter where and when we start the item arrival process, it takes only ι steps to get δ(ι)-close to the distribution Π. We will consider the set of ergodic input distributions whose ι-step deviation is bounded by δ:

CE(δ, ι) :=

Q (Θt) : sup

γ sup τ=1,...,t ι

Qτ+ι(θ1:τ) Π TV δ, for some Π (Θ)

Theorem 2 (Ergodic Case). For the ergodic case, Algorithm 1 guarantees that for an instance A = (n, t, Θ, Q, v), we have

sup Q CE(δ,ι)

, sup Q CE(δ,ι)

sup Q CE(δ,ι) Eγ Q

bt (1/n)1n 21

,sup Q CE(δ,ι) Eγ Q

,sup Q CE(δ,ι) Eγ Q

= O(δ+ι/t) . Remark 1 (Markov Input). We can specialize the result in Theorem 2 to fast mixing or Markov item sequences. Fast mixing means the deviation δ decreases exponentially, i.e., for all 1 ι t 1,

γ sup τ=1,...,t ι

Qτ+ι(θ1:τ) Π TV Mρι , (12)

for some M > 0, ρ [0, 1), and Π is the stationary distribution. Examples include ﬁnite statespace time-homogeneous Markov chains and uniformly ergodic Markov chains on general state spaces (Meyn and Tweedie, 2012, Chapter 16). In these cases, setting ι = log t+log M

log(ρ 1) = O( log t log(ρ 1)) implies δ 1/t. This means the Markov chain from which γ is generated takes O(log t) steps to get (1/t)-close to stationarity. The dominant term for the regret in Theorem 2 (further ignoring M) is then (1 + 1 log(ρ 1))1/2

t. The term in the parenthesis reﬂects the inﬂation caused by input

dependency. To recover the case of i.i.d. input, we simply send ρ 0 and the usual

t regret and envy rates and 1/t utility and expenditure convergence rates are again recovered.

3.3 Periodic Input

Item sequences often exhibit statistical periodic structure. For example, when allocating compute time to requestors, there will be more requests during weekdays and less on weekends. The compute request patterns vary throughout the week, and yet the weekly pattern would repeat over time. Similarly, Internet trafﬁc typically exhibits periodic structure.

Formally, assume that the horizon t divides into K blocks of time, each of size q. This divides the item sequence γ into consecutive blocks of length q. Within each block, we allow a distribution with arbitrary dependence between time steps, but we assume that the blocks, as a whole, are identically and independently distributed. We deﬁne the set of periodic input distributions as follows:

Q (Θq)K : Q1:q = Qq+1:2q = . . . = Qt q+1:t/

Theorem 3 (Periodic Case). For the periodic case, Algorithm 1 guarantees that for an instance A = (n, t, Θ, Q, v), we have

sup Q CP(q)

, sup Q CP(q)

sup Q CP(q) Eγ Q

bt (1/n)1n 21

,sup Q CP(q) Eγ Q

,sup Q CP(q) Eγ Q

= O(q2/t) .

If the length q of the blocks is of order o(t) then the time-averaged regret and envy are both vanishing. For the i.i.d. case, we can set q = 1 to recover the previous results.

Now we comment on dependence on the period length q. Suppose the item sequence consists of K blocks, and blocks are i.i.d. We still allow arbitrary dependence within a block. The proof of Theorem 3 essentially relies on the result (Theorem 9) that DA produces iterates whose squared error is of order O(q2/t). Consider dual averaging with the knowledge of the block structure q. Then the rate 1/K = q/t can be achieved by executing DA using one randomly chosen data point within a block, throwing away the rest in that same block. Such selection produces K i.i.d. samples from the distribution. In comparison, the rate in O(q2/t) is worse off by a factor of q due to not knowing the block-structure information.

4 Proof Technique: Nonstationary Dual Averaging

The PACE dynamics can be cast as online dual averaging (Xiao, 2010) applied to the dual of the hindsight allocation program in (1). This will be discussed in more detail in Section 4.2 and Appendix C.2. However, in order to characterize its performance under various types of nonstationary input, we ﬁrst extend the general convergence results for dual averaging to incorporate nonstationary input. The convergence results that we developed for dual averaging under three different types of input models are novel and are of independent interest.

We remark that the results for the stochastic setting given by Xiao (2010) cannot be used directly, since they rely on the stringent i.i.d. assumption. Duchi et al. (2012) consider ergodic mirror descent (MD) for convex problems under some (but not all) of our nonstationary input models. Their results and analysis cannot be used in our case either, since their results do not allow using the composite structure, whose strong convexity we leverage. Moreover, unlike DA, an MD-based approach would require tuning parameters such as stepsizes.

In this section, after introducing the setup of DA in Section 4.1, we present a DA convergence result for independent but not identical input in Section 4.3, for which we outline the proof idea and technical challenges in Section 4.3. Due to space limitations, we present DA convergence results for ergodic and periodic inputs in Appendix D.3.

In the nonstationary setup of DA, we emphasize that whenever we mention convergence, we mean convergence of DA iterates to the population-level optimum w

Π (or sometimes the hindsight optimum), up to some error caused by nonstationarity.

4.1 Review: The DA Algorithm

We review the dual averaging setup in the strongly convex case (Xiao, 2010, 1.1). Consider a stochastic optimization problem of the form

φ(w) := Ez Π

where w (Rd, ) is the variable, Ψ is a closed convex function with closed domain Dom Ψ := {w Rn : Ψ(w) < }. The expectation is taken over a probability distribution Π on a measurable space Z. For each z Z, the function f( , z) is convex and subdifferentiable (a subgradient always exists) on Dom Ψ. Let F(w, z) = f(z, w) + Ψ(w).

Assumptions. Let G(w, z) be a ﬁxed element in the set of subgradients wf(w, z).

1. For almost every z, it holds G(w, z) G, where = max w 1 s, w is the dual

norm. 2. There exists an F R such that F(w, z) F for all w Dom Ψ and (almost every) z. 3. Ψ is σ-strongly convex, i.e., Ψ(αw + (1 α)u) αΨ(w) + (1 α)Ψ(u) σ

2 α(1 α) w u 2 for w, u Dom Ψ. Because of our strong convexity assumption, the solution to (15) is unique. Associated with Π we deﬁne w

Π := arg min Ez Π

The dual averaging algorithm (DA) (Xiao, 2010, Algorithm 1) aims to produce a sequence converging to the optimal point w

Π or minimize the associated regret (Xiao, 2010, 1.2). The algorithmic details for DA are presented in Algorithm 2. Note that although Xiao (2010) only considers the case of i.i.d. data, DA iterates are deﬁned for every input sequence {zτ}t

τ=1, regardless of any distributional properties of the sequence.

The ﬁrst step in our analysis is a relationship between regret and the suboptimality wt w

Π derived by Xiao (2010). Consider the dual averaging algorithm with data {zτ}t

τ=1. We denote the one-step subgradient by gτ := G(wτ, zτ) and the average subgradient by gτ = ($τ

s=1 gs)/τ. Given data {zτ}t

τ=1, we deﬁne the regret and the sum of squared subgradient norms 2

F(wτ, zτ) F(w, zτ)

The bound t (6 + log t)G2/(2σ) holds in a deterministic manner due to the bounded subgradient assumption.

Xiao (2010) shows the following bound on suboptimality of wt Lemma 1 (Regret Bound, Section B.2 in Xiao (2010)). For any sequence {zτ}t

τ=1, any w Dom Ψ, any t = 1, 2, . . . , it holds wt+1 w 2 2 σt

4.2 Review: PACE as Dual Averaging

In this section we review how to cast PACE as dual averaging applied to the problem (4). This derivation was originally given in Gao et al. (2021). Let f(β, θ) = maxi βivi(θ), Ψ(β) = 1

i=1 log βi, in which case we get F(β, θ) = f(β, θ) + Ψ(β) = maxi [n]

i=1 log βi . Following (Gao and Kroer, 2021, 5), since f( , θ) is a piecewise linear function, a subgradient is G(β, θ) := viτ (θ)eiτ βf(β, θ) , where iτ = min{arg maxi βivi(θ)} is the index of the winning agent (see, e.g., Beck (2017, Theorem 3.50)). Based on this instantiation of DA, we get that the iterates {βτ}t+1

τ=1 generated by the PACE dynamics (Algorithm 1) are exactly the iterates wτ generated by DA(G, Ψ, γ) (Algorithm 2). A proof is given in Appendix C.2.

Before moving on to showing our results, let us ﬁrst touch on the fact that dual averaging provides worst-case regret guarantees. Naively, one may expect that these regret guarantees would directly translate into regret guarantees on the primal performance, meaning a bound on supγ Regi,t(γ).

2 See the ﬁrst equation on page 2584 in Xiao (2010). In Xiao (2010) s notation, set βτ = 0 all τ 1 and β0 = σ, plug in the bound h(w2) 2 g1 2

/σ and we have the expression of t in our paper.

Algorithm 2: DA(G, Ψ, {zτ}t

Input: subgradient G, regularizer Ψ and data {zτ}t

1 Initialize: set g0 = 0 and w1 = arg min Ψ.

2 for τ = 1, . . . , t do

3 Observe zτ and compute gτ = G(wτ, zτ).

4 Average subgradients (the dual average) via gτ = τ 1

5 Compute the next iterate wτ+1 = arg minw{ gτ, w + Ψ(w)}.

However, such dual regret bounds do not imply a worst-case primal regret bound. From a technical perspective, it is unclear how a regret bound on the dual objective would translate to a regret bound on envy or utilities. Secondly, based on results in the online fair allocation literature (Azar et al., 2016; Banerjee et al., 2022), it is known that not only can we not get a no-regret guarantee on the primal performance, it is not even possible to achieve a constant competitive ratio in the worst-case setting.

4.3 Independent Data with Adversarial Corruption

We present a DA convergence result with independent data in this section. Theorem statements for the ergodic case and periodic case are presented in Appendix D.3. Discarding the i.i.d. assumption on the data {zτ}t

τ=1, we let P be the joint distribution of {zτ}τ and let P τ be the marginal distribution of zτ. We study the relationship between the DA iterates wt+1 and w

Π by bounding the mean-square difference E{zτ }t

, and thus demonstrate in what sense the data distribution P should stay close to the i.i.d. distribution Π in order to preserve DA convergence.

We ﬁrst introduce a variant of CID(δ) with a target distribution Π:

CID(δ; Π) :=

Theorem 4 (DA Convergence, Independent Data with Adversarial Corruption). If {zτ}t

τ=1 P and P CID(δ, Π). Then for t 1,

(6 + log t)G2

σ δ = O(δ + 1/t) .

Moreover, the rate O(δ+1/t) applies to E

(See Appendix D.1).

Remark 2. From this result we can tell when DA retains last-iterate convergence. Suppose the number of corrupted data points is of order o(t) (assuming corruption on each data is of the same order), then the corruption per item is δ = o(1) and DA converges to the optimal solution as t . Furthermore, if the corruption per data point is of order δ = O(1/t), then the fast rate 1/t is

By Section 4.2, we get as a corollary that the PACE iterates {βt} will converge to β , the solution to the inﬁnite-dimensional dual program (4). This is the building block for our results in Section 3.

While the full proof is too long to ﬁt in the paper, we now give a proof sketch to show the main ideas behind how we derive Theorem 4. We begin the proof by noticing Lemma 1 is deterministic and valid for any {zτ}t

τ=1. Now set w = w

Π in Lemma 1. If the input data {zτ}t

τ=1 were i.i.d. from Π, i.e., P = Π t, then E[Rt(w

Π)] would be greater than zero, and we would obtain

(6 + log t)G2

However, in the nonstationary case, the regret E[Rt(w

Π)] might be negative. At a high level, our results are achieved by introducing appropriate measures of the nonstationarity and then lower bounding E[Rt(w

Π)] based on those measures.

To this end, we decompose the regret as follows. Write

F(wτ, zτ) φΠ(wτ)

φΠ(wτ) φΠ(w

By optimality of w

Π we have II 0. Using the bound on the TV distance between {P τ}τ and Π, and boundedness of F we can control the other two terms. The key is, conditional on Fτ 1, the iterate wτ is deterministic and the distribution of zτ is P τ due to independence assumption. For each term in the ﬁrst summation, we condition on Fτ 1 and obtain

|E[F(wτ, zτ) φΠ(wτ)|Fτ 1]| =

F(wτ, z)P τ(dz | z1:τ 1)

F(wτ, z) dΠ(z)|Fτ 1

F(wτ, z)P τ(dz)

F(wτ, z) dΠ(z)|Fτ 1

F(wτ, z)P τ(dz)

F(wτ, z) dΠ(z)

2 F P τ Π TV . The second sum can be handled similarly. For the detailed proof and a generalization see Appendix D.2 in Appendix D.

We now discuss how this paper handles nonstationarity differently from the existing literature. Let us use Balseiro et al. (2020) as a reference point, since they consider very similar categories of nonstationary inputs. From a online constrained optimization perspective, Balseiro et al. (2020) relies on the fact that their objective is separable across timesteps in order to handle nonstationarity. In contrast, this structure does not exist for the online fair allocation problem, because the objective takes the logarithm of the utility over time. Concretely, in Balseiro et al. (2020), the objective is of the form $t

τ=1 fτ(x). This type of time-separability occurs in all works that we are aware of for non-stationary inputs, also e.g. the ergodic mirror descent paper Duchi et al. (2012) (Duchi et al. (2012) is not an online setting, but the expectation in their objective is analogous to time separability). Time separability is used as a key property for deriving regret bounds in those papers. The separability enables translating dual regret to primal regret by a weak duality argument (see e.g. Prop. 1 in Balseiro et al. (2020) where a time-separable dual allows weak duality).

In contrast, our problem has time-separability only in the dual formulation, but not in the primal one, which is where we ultimately want guarantees (since we are interested in utilities converging). Our contribution to handling nonstationarity in online fair allocation is showing that a dual approach works even without the separability structure. We begin by deriving convergence guarantees on the last dual iterate, which we achieve by modifying the dual averaging proof to take into account nonstationarity and analyzing the stability of the dual variables in dual averaging. We note that this technique depends on strong convexity, unlike for the time-separable case. Next, to go from dual last-iterate convergence to primal regret bounds we use the ﬁrst-order optimality condition for EG program, which are speciﬁc techniques for our problem (such techniques were also used in the previous PACE paper Gao et al. (2021)).

5 Conclusion

We established new convergence results for dual averaging under nonstationary data input models, namely, adversarial corruption, ergodic, and block-independent input models. Leveraging these results, we showed that, for online fair allocation problems with item arrivals generated from the above nonstationary data input models, the PACE algorithm automatically adapts to them and achieves asymptotic fairness and efﬁciency without any parameter tuning. Numerical experiments demonstrated the effectiveness of PACE under these data input models.

Acknowledgments and Disclosure of Funding

This research was supported by the Ofﬁce of Naval Research Young Investigator Program under grant N00014-22-1-2530, and the National Science Foundation award IIS-2147361.

Azar, Y., Buchbinder, N., and Jain, K. (2016). How to allocate goods in an online market? Algorith-

mica, 74(2):589 601.

Balseiro, S., Lu, H., and Mirrokni, V. (2020). The best of many worlds: Dual mirror descent for

online allocation problems. ar Xiv preprint ar Xiv:2011.10124.

Balseiro, S. R. and Gur, Y. (2019). Learning in repeated auctions with budgets: Regret minimization

and equilibrium. Management Science, 65(9):3952 3968.

Banerjee, S., Gkatzelis, V., Gorokh, A., and Jin, B. (2022). Online nash social welfare maximiza-

tion with predictions. In Proceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1 19. SIAM.

Bateni, M., Chen, Y., Ciocan, D. F., and Mirrokni, V. (2021). Fair resource allocation in a volatile

marketplace. Operations Research.

Beck, A. (2017). First-Order Methods in Optimization, volume 25. SIAM.

Besbes, O., Gur, Y., and Zeevi, A. (2015). Non-stationary stochastic optimization. Operations

Research, 63(5):1227 1244.

Birnbaum, B., Devanur, N. R., and Xiao, L. (2011). Distributed algorithms via gradient descent

for ﬁsher markets. In Proceedings of the 12th ACM Conference on Electronic Commerce, pages 127 136. ACM.

Caragiannis, I., Kurokawa, D., Moulin, H., Procaccia, A. D., Shah, N., and Wang, J. (2019). The

unreasonable fairness of maximum nash welfare. ACM Transactions on Economics and Computation (TEAC), 7(3):1 32.

Cheung, Y. K., Cole, R., and Devanur, N. R. (2020). Tatonnement beyond gross substitutes? Gradi-

ent descent to the rescue. Games and Economic Behavior, 123:295 326.

Cheung, Y. K., Cole, R., and Tao, Y. (2018). Dynamics of distributed updating in ﬁsher markets. In

Proceedings of the 2018 ACM Conference on Economics and Computation, pages 351 368.

Cole, R., Devanur, N. R., Gkatzelis, V., Jain, K., Mai, T., Vazirani, V. V., and Yazdanbod, S. (2017).

Convex program duality, ﬁsher markets, and Nash social welfare. In 18th ACM Conference on Economics and Computation, EC 2017. Association for Computing Machinery, Inc.

Conitzer, V., Kroer, C., Panigrahi, D., Schrijvers, O., Sodomka, E., Stier-Moses, N. E., and Wilkens,

C. (2019). Pacing equilibrium in ﬁrst-price auction markets. In Proceedings of the 2019 ACM Conference on Economics and Computation. ACM.

Conitzer, V., Kroer, C., Sodomka, E., and Stier-Moses, N. E. (2021). Multiplicative pacing equilibria

in auction markets. Operations Research.

Duchi, J. C., Agarwal, A., Johansson, M., and Jordan, M. I. (2012). Ergodic mirror descent. SIAM

Journal on Optimization, 22(4):1549 1578.

Eisenberg, E. and Gale, D. (1959). Consensus of subjective probabilities: The pari-mutuel method.

The Annals of Mathematical Statistics, 30(1):165 168.

Esfandiari, H., Korula, N., and Mirrokni, V. (2018). Allocation with trafﬁc spikes: Mixing ad-

versarial and stochastic models. ACM Transactions on Economics and Computation (TEAC), 6(3-4):1 23.

Gao, Y. and Kroer, C. (2020). First-order methods for large-scale market equilibrium computation.

In Neural Information Processing Systems 2020, Neur IPS 2020.

Gao, Y. and Kroer, C. (2021). Inﬁnite-dimension ﬁsher markets and tractable fair division. ar Xiv

preprint ar Xiv:2010.03025.

Gao, Y., Kroer, C., and Peysakhovich, A. (2021). Online market equilibrium with application to fair

division. ar Xiv preprint ar Xiv:2103.12936.

Gkatzelis, V., Psomas, A., and Tan, X. (2021). Fair and efﬁcient online allocations with normalized

valuations. In Proceedings of the AAAI Conference on Artiﬁcial Intelligence, volume 35, pages 5440 5447.

Harper, F. M. and Konstan, J. A. (2016). The movielens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TIIS), 5(4):19.

Kroer, C., Peysakhovich, A., Sodomka, E., and Stier-Moses, N. E. (2021). Computing large com-

petitive equilibria using abstractions. Operations Research.

Kroer, C. and Stier-Moses, N. E. (2022). Market equilibrium models in large-scale internet markets.

Innovative Technology at the Interface of Finance and Operations. Springer Series in Supply Chain Management. Springer Natures, Forthcoming.

Manshadi, V., Niazadeh, R., and Rodilitz, S. (2021). Fair dynamic rationing. In Proceedings of the

22nd ACM Conference on Economics and Computation, pages 694 695.

Meyn, S. P. and Tweedie, R. L. (2012). Markov Chains and Stochastic Stability. Springer Science

& Business Media.

Nesterov, Y. (2003). Introductory Lectures on Convex Optimization: A Basic Course, volume 87.

Springer Science & Business Media.

Sinclair, S. R., Banerjee, S., and Yu, C. L. (2021). Sequential fair allocation: Achieving the optimal

envy-efﬁciency tradeoff curve. ar Xiv preprint ar Xiv:2105.05308.

Varian, H. R. (1974). Equity, envy, and efﬁciency. Journal of Economic Theory, 9(1):63 91.

Xiao, L. (2010). Dual averaging methods for regularized stochastic learning and online optimization.

Journal of Machine Learning Research, 11:2543 2596.

Zhang, L. (2011). Proportional response dynamics in the ﬁsher farket. Theoretical Computer Sci-

ence, 412(24):2691 2698.

The checklist follows the references. Please read the checklist guidelines carefully for information on how to answer these questions. For each question, change the default [TODO] to [Yes] , [No] , or [N/A] . You are strongly encouraged to include a justiﬁcation to your answer, either by referencing the appropriate section of your paper or providing a brief inline description. For example:

Did you include the license to the code and datasets? [Yes] See Section ??. Did you include the license to the code and datasets? [No] The code and the data are

proprietary. Did you include the license to the code and datasets? [N/A]

Please do not modify the questions and only use the provided macros for your answers. Note that the Checklist section does not count towards the page limit. In your paper, please delete this instructions block and only keep the Checklist section heading above along with the questions/answers below.

1. For all authors...

(a) Do the main claims made in the abstract and introduction accurately reﬂect the paper s

contributions and scope? [Yes] See Section 3 (b) Did you describe the limitations of your work? [No] Our work is mostly theoretical

and studies the fair-allocation algorithm PACE under various nonstationary settings. (c) Did you discuss any potential negative societal impacts of your work? [No] Not appli-

cable. (d) Have you read the ethics review guidelines and ensured that your paper conforms to

them? [Yes] We read them. 2. If you are including theoretical results...

(a) Did you state the full set of assumptions of all theoretical results? [Yes] See Section 3

and Section 4. (b) Did you include complete proofs of all theoretical results? [Yes] See Appendix C,

Appendix D and Appendix E. 3. If you ran experiments...

(a) Did you include the code, data, and instructions needed to reproduce the main ex-

perimental results (either in the supplemental material or as a URL)? [Yes] See Appendix F. (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they

were chosen)? [Yes] See Appendix F. (c) Did you report error bars (e.g., with respect to the random seed after running experi-

ments multiple times)? [No] No applicable. (d) Did you include the total amount of compute and the type of resources used (e.g., type

of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix F. 4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...

(a) If your work uses existing assets, did you cite the creators? [No] We use synthetic

data. (b) Did you mention the license of the assets? [No] N/A.

(c) Did you include any new assets either in the supplemental material or as a URL? [No]

N/A. (d) Did you discuss whether and how consent was obtained from people whose data

you re using/curating? [No] N/A. (e) Did you discuss whether the data you are using/curating contains personally identiﬁ-

able information or offensive content? [No] N/A. 5. If you used crowdsourcing or conducted research with human subjects...

(a) Did you include the full text of instructions given to participants and screenshots, if

applicable? [No] N/A. (b) Did you describe any potential participant risks, with links to Institutional Review

Board (IRB) approvals, if applicable? [No] N/A.

(c) Did you include the estimated hourly wage paid to participants and the total amount

spent on participant compensation? [No] N/A.