# Dynamic Fair Division with Partial Information

Gerdus Benadè (Boston University, benade@bu.edu) · Daniel Halpern (Harvard University, dhalpern@g.harvard.edu) · Alexandros Psomas (Purdue University, apsomas@cs.purdue.edu)

Abstract. We consider the fundamental problem of fairly and efficiently allocating T indivisible items among n agents with additive preferences. The items become available over a sequence of rounds, and every item must be allocated immediately and irrevocably before the next one arrives. Previous work shows that when the agents' valuations for the items are drawn from known distributions, it is possible (under mild technical assumptions) to find allocations that are envy-free with high probability and Pareto efficient ex-post. We study a partial-information setting, where it is possible to elicit ordinal but not cardinal information. When a new item arrives, the algorithm can query each agent for the relative rank of this item with respect to a subset of the past items. When values are drawn from i.i.d. distributions, we give an algorithm that is envy-free and $(1-\varepsilon)$-welfare-maximizing with high probability. We provide similar guarantees (envy-freeness and a constant approximation to welfare with high probability) even with minimally expressive queries that ask for a comparison to a single previous item. For independent but non-identical agents, we obtain envy-freeness and a constant approximation to Pareto efficiency with high probability. We prove that all our results are asymptotically tight.

## 1 Introduction

We consider the following fundamental problem in fair division. A set of T indivisible items, arriving one at a time, must be allocated among a set of n agents with additive preferences. The value $v_{i,t}$ that agent i has for the item in round t is realized once the item arrives. Each item must be allocated immediately and irrevocably upon arrival, and we ask that the overall allocation is fair and efficient.

Previous work on this problem shows that, despite the uncertainty about future valuations, one can achieve simultaneous fairness and efficiency when agents' values are stochastic. Specifically, when each $v_{i,t}$ is drawn i.i.d. from a distribution $\mathcal{D}$, the simple algorithm that maximizes welfare (each item is allocated to the agent with the highest value) is envy-free with high probability and (obviously) ex-post Pareto efficient [DGK+14, KPW16]. The same guarantee holds for independent and non-identical agents ($v_{i,t}$ is drawn from an agent-specific distribution $\mathcal{D}_i$) for the algorithm that maximizes weighted welfare [BG22]. Even when agents' valuations for an item are correlated (but items are independent), Pareto efficiency ex-post is compatible with strong fairness guarantees (envy-freeness with high probability, or envy-freeness up to one item ex-post) [ZP20].

Despite the computational simplicity of (most of) the aforementioned algorithms, an unappealing aspect, especially from a practical perspective, is the requirement that agents report an exact numerical value for each item. Eliciting expressive additive valuations can be impractical, e.g., due to agents' cognitive limitations. Motivated by such considerations, a growing body of work in AI studies what can be achieved by algorithms that only elicit ordinal information.
This idea originates from [PR06], who defined the notion of distortion to measure the worst-case deterioration of an aggregate cardinal objective (e.g., utilitarian social welfare) due to only having access to preferences of limited expressiveness, particularly ordinal rankings. Recent works prove bounds on the distortion in the context of many problems in social choice, e.g., voting [CNPS17, GKM17, MSW20, MW19, Kem20a, Kem20b, GHS20], matching [FRFZ14], and participatory budgeting [BNPS21]; see [AFRSV21] for a recent survey.

In this paper, we study the power and limits of eliciting ordinal information in dynamic fair division. The value $v_{i,t}$ of agent i for item t is drawn from an unknown distribution upon arrival, and the algorithm is provided, from each agent, partial ordinal information about this item, e.g., its rank relative to the past items allocated to this agent, or even just a single past item allocated to this agent. Under what distributional assumptions and elicitation constraints can we simultaneously achieve qualitative fairness and efficiency? We answer these questions.

### 1.1 Our Contribution

We start by establishing a separation between the cardinal setting and our ordinal one. Pareto efficiency alone is trivial (allocate all goods to the same agent) and, in the cardinal setting, it is known that Pareto efficiency ex-post is compatible with envy-freeness with high probability (as long as agents are independent). We prove (Theorem 4) that in our setting, even for the case of two i.i.d. agents and any known distribution, envy-freeness with high probability is incompatible with even a very mild notion of (exact) Pareto efficiency, one-swap Pareto efficiency, which requires that there is no beneficial one-to-one trade of items between agents (but allows for improvements via many-to-many trades of items).

We proceed to give an essentially matching positive result. For any number of i.i.d. agents and an unknown distribution $\mathcal{D}$, there exists an algorithm (Algorithm 1) that is envy-free with high probability and guarantees a $(1-\varepsilon)$-approximation to the optimal utilitarian social welfare (the sum of utilities), for all $\varepsilon > 0$, with high probability (Theorem 5). When an item arrives, the algorithm learns, for each agent i, its relative rank compared to a subset of prior items allocated to agent i (but otherwise knows nothing about the underlying numerical valuation). Our algorithm works in epochs. Each epoch has an exploration/sampling phase, where each agent gets a pre-determined number of items, followed by an exploitation/ranking phase, where each fresh item is given to an agent whose empirical quantile is largest. The goal is to make a sublinear number of errors compared to the ideal algorithm that allocates each item to the agent with the highest true quantile. The algorithm has to balance the need for sampling, which leads to more accurate empirical quantiles, against the number of inefficient allocations made while sampling. A significant technical barrier is that we cannot fix a target accuracy because the underlying distribution is unknown. That is, for every fixed accuracy for the empirical quantiles, there exists a distribution for which this accuracy is not good enough for even a constant approximation to the optimal welfare. Instead, we need to make our epochs progressively longer, thereby guaranteeing progressively better bounds.
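To make the elicitation primitive concrete, here is a minimal sketch (ours, not the paper's code) of how a reported relative rank among m previously sampled items converts into an empirical quantile; the $(m - \mathrm{rank} + 1)/(m+1)$ normalization is one arbitrary convention.

```python
def empirical_quantile(rank: int, m: int) -> float:
    """Rank 1 = best among the m sampled items."""
    return (m - rank + 1) / (m + 1)

def allocate(ranks: dict[int, int], m: int) -> int:
    """One exploitation-phase step: give the fresh item to the agent whose
    reported rank implies the highest empirical quantile."""
    return max(ranks, key=lambda agent: empirical_quantile(ranks[agent], m))

# Agent 0 ranks the fresh item 3rd among 10 samples; agent 1 ranks it 1st.
assert allocate({0: 3, 1: 1}, m=10) == 1
```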
Given this strong positive result, we explore the limits of what we can achieve when further restricting the amount of information available. What if each agent can remember only a single item previously allocated to them, and the fresh item is compared to just this one item? (The algorithm then only learns whether the new item is better or worse than the item in memory and may, at that time, choose to replace the item in memory.) Surprisingly, the aforementioned positive result can almost be recovered even in this very restrictive setting. We prove that there exists an algorithm (Algorithm 2) that is envy-free with high probability and guarantees a $((1-1/e)^2 - \varepsilon)$-approximation to the optimal welfare with high probability, for all $\varepsilon > 0$ (Theorem 9). Our algorithm again proceeds in epochs with gradually increasing exploration and exploitation phases; this time the goal is a sublinear number of differences compared to allocating each item to a uniformly random agent with quantile at least $1 - 1/n$, which we prove is envy-free and approximately efficient (Lemmas 1 and 2). When exploring, the algorithm puts a new item in memory, estimates its quantile, and rejects it if not sufficiently close to $1 - 1/n$. We need to sample enough to ensure high confidence in the estimated quantile, but also account for the additional sampling since an item's quantile might be far from $1 - 1/n$ to begin with; several technical details need to be accounted for. We give a near-matching lower bound: no algorithm with a memory of one item can achieve a 0.999 approximation to the social welfare with high probability; therefore, a constant approximation (which Algorithm 2 provides) is all we can hope for.

Finally, we relax the i.i.d. condition and study agents that are independent but not identical; each agent i's values are drawn from an unknown distribution $\mathcal{D}_i$. Even with unbounded memory, we show that it is impossible to get a $\frac{1+\sqrt{5}}{4} \approx .809$ approximation to Pareto efficiency with probability more than 2/3, even for two agents (Theorem 12). At the same time, we prove that Algorithms 1 and 2 are envy-free and $1/e$-approximately Pareto efficient with high probability! Note that, though both algorithms give the same formal guarantees and Algorithm 2 elicits strictly less information, one might still prefer to use Algorithm 1 since it has significantly shorter exploration phases. A summary of our results can be found in Table 1.

Table 1: Main Results

| Setting | Possibility | Impossibility |
| --- | --- | --- |
| Unbounded memory, i.i.d. | EF + $(1-\varepsilon)$-welfare (Theorem 5) | EF + PO (Theorem 4) |
| Limited memory, i.i.d. | EF + $.399$-welfare (Theorem 9) | $.999$-welfare (Theorem 8) |
| Unbounded memory, non-i.i.d. | EF + $.367$-PO (Theorem 13) | EF + $.809$-PO (Theorem 12) |

We leave the study of correlated agents as an interesting open problem. Finally, we note that beyond stochastic valuations, [BKPP18] show that it is possible to achieve sublinear envy by randomly allocating every item when agents' valuations are adversarially generated (and this is optimal); however, sublinear envy is incompatible with non-trivial efficiency even in the cardinal setting [ZP20].

### 1.2 Related Work

A number of works study fair division under ordinal preferences, e.g., [AGMW15, BEL10, BBL+17, NNR17], but often these models do not assume an underlying cardinal model and work directly on the ordinal preferences. [ABM16] assume underlying cardinal information and, among other results, bound the approximation ratio of truthful mechanisms that elicit rankings.
Closer to our work, [HS21] study rules that have access to the ranking of each agent's top-k items and bound the worst-case ratio between the social welfare of the allocation returned by a rule and the optimal social welfare. They also characterize the value of k needed to achieve prominent notions of fairness, namely envy-freeness up to one item (EF1) and the approximate maximin share guarantee (MMS), as well as bound the loss in efficiency incurred due to fairness constraints in this setting.

Our work contributes to the growing literature on dynamic fair division [KPS14, AAGW15, FPV15, FPV17, BKPP18, LLL18, HPPZ19, ZP20, GPT21, BKM22, GBI21, VPF21, BGGJ22, BMS22, KS22] (and we note that the welfare-maximizing algorithms of [DGK+14, KPW16, BG22] work in the dynamic setting, even though their settings are not explicitly dynamic). To the best of our knowledge, we are the first to study imperfect expressivity in a dynamic fair division setting.

## 2 Preliminaries

A set of T indivisible items/goods, labeled by $G = \{1, 2, \ldots, T\}$, needs to be allocated to a set of n agents, labeled by $N = \{1, \ldots, n\}$. Agent $i \in N$ assigns a value $v_{i,t} \in [0, 1]$ to item $t \in G$. We assume agents have additive valuation functions, so $v_i(S) = \sum_{t \in S} v_{i,t}$ for $S \subseteq G$. An allocation A is a partition of the items into bundles $A_1, \ldots, A_n$, where $A_i$ is the set of items assigned to agent $i \in N$. Each allocation has an associated utility profile $v(A) = (v_1(A_1), \ldots, v_n(A_n))$.

Items arrive online, one per round. The agents' valuations for the item in round t (the t-th item) are realized when the item arrives. Every item is allocated immediately and irrevocably before moving on to the next round. We write $G_t = \{1, 2, \ldots, t\}$ for the set of items that arrived in the first t rounds, and $A_i^t$ for the allocation of agent i after the t-th item was allocated.

We consider two different models which specify how values are generated. In the i.i.d. model, the value of agent i for item t is independently drawn from an unknown distribution $\mathcal{D}$ with CDF F, i.e., $v_{i,t} \sim \mathcal{D}$ for all $i \in N$ and $t \in G$. In the non-i.i.d. model, the value of item t for agent i is independently drawn from an unknown, agent-dependent distribution $\mathcal{D}_i$ with CDF $F_i$, i.e., $v_{i,t} \sim \mathcal{D}_i$ for all $i \in N$ and $t \in G$. We write $X_i$ for the random variable for i's valuation, and $X_{i,t}$ for the random variable for i's valuation for item t. It is often convenient to work directly with the quantile of an agent's value rather than the value itself; let $Q_i = F_i(X_i)$ and $Q_{i,t} = F_i(X_{i,t})$ be the random variables denoting the quantiles of agent i for the associated items. Note that all $Q_i$ and $Q_{i,t}$ are i.i.d. and follow a $\mathrm{Unif}[0,1]$ distribution. Unless explicitly stated otherwise, we assume all distributions are continuous (i.e., do not have point masses).

Ordinal information. We assume the realizations $v_{i,t}$ are not available. Instead, our algorithms have access to ordinal information. Specifically, given the current item t, the algorithm can access each agent's ranking of $S = \{t\} \cup M$, where $M \subseteq G_{t-1}$. The size of M, which we will informally refer to as the memory size, determines the complexity of eliciting information from each agent. In one extreme, agent i compares a new item t to a single item they had previously received, i.e., $M \subseteq A_i^{t-1}$ with $|M| \le 1$. In the other extreme, t is compared to all previous items she received, so $M = A_i^{t-1}$. We write $\sigma_i(S)$ for the ranking of agent i over a subset S of the items, and $\sigma_i^{-1}(S, t)$ for the position of item $t \in S$ with respect to S according to agent i. The highest-value item is in position 1 and the lowest in position $|S|$. For example, if $S = \{1, 4\}$, $v_{i,1} = 1$ and $v_{i,4} = 0.1$, then $\sigma_i(S) = (1, 4)$, $\sigma_i^{-1}(S, 1) = 1$, and $\sigma_i^{-1}(S, 4) = 2$.
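As a concrete illustration of this notation, here is a small sketch of ours (Exp(1) is an arbitrary stand-in for the unknown continuous distribution $\mathcal{D}$):

```python
import math
import random

def F(x: float) -> float:
    """CDF of Exp(1); for continuous D, the quantile Q = F(X) is Unif[0,1]."""
    return 1.0 - math.exp(-x)

values = [random.expovariate(1.0) for _ in range(5)]
quantiles = [F(v) for v in values]  # each quantile is Unif[0,1]-distributed

def sigma(vals: dict[int, float]) -> list[int]:
    """sigma_i(S): the items of S ordered by value, best first."""
    return sorted(vals, key=vals.get, reverse=True)

def sigma_inv(vals: dict[int, float], t: int) -> int:
    """sigma_i^{-1}(S, t): the 1-based position of item t in the ranking."""
    return sigma(vals).index(t) + 1

# The example from the text: S = {1, 4}, v_{i,1} = 1, v_{i,4} = 0.1.
S = {1: 1.0, 4: 0.1}
assert sigma(S) == [1, 4] and sigma_inv(S, 1) == 1 and sigma_inv(S, 4) == 2
```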
Algorithms. An algorithm $\mathcal{A}$, in each step t, queries each agent for ordinal information with respect to some subset M and then makes a (possibly randomized) allocation decision based on this ordinal information and the history so far. An instance of our problem is parameterized by the number of agents n and the (unknown) value distributions $\mathcal{D}_1, \ldots, \mathcal{D}_n$. Let $E_P(t)$ be the event that some algorithm satisfies property P (e.g., envy-freeness, PO, or ε-welfare) at time t. We are interested in the probability that an algorithm satisfies certain properties in the limit, as the number of rounds tends to infinity, where the randomness is over the random choices of the algorithm as well as the randomness in the valuations.

Definition 1. An algorithm satisfies P with high probability if $\lim_{t \to \infty} \Pr[E_P(t)] = 1$.

Note that this definition of high probability allows for dependency on n and the underlying distributions (i.e., they are treated as constants).

Efficiency notions. An allocation A Pareto dominates an allocation A′, denoted $A \succ A'$, when $v_i(A_i) \ge v_i(A'_i)$ for all $i \in N$ and there exists $j \in N$ with $v_j(A_j) > v_j(A'_j)$. An allocation A is Pareto efficient or Pareto optimal (PO) if there is no feasible (integral) allocation that Pareto dominates it. An allocation A′ is in the (one-)swap neighborhood of A when it can be created from A with a single exchange of items between one pair of agents. Formally, there exist $i, j \in N$ and items $z_j \in A_j$ and $z_i \in A_i$ so that $A'_i = (A_i \setminus \{z_i\}) \cup \{z_j\}$, $A'_j = (A_j \setminus \{z_j\}) \cup \{z_i\}$, and $A'_k = A_k$ for all other agents $k \ne i, j$. An allocation A is one-swap Pareto optimal (SPO) if it is undominated by any allocation in its swap neighborhood. We use a notion of approximate efficiency defined by [RF90], according to which an allocation A is α-Pareto efficient when $v(A)/\alpha$ is undominated. The social welfare of an allocation A is $\mathrm{sw}(A) = \sum_{i \in N} v_i(A_i)$. Let allocation $A^*$ denote a (social) welfare-optimal allocation, for which $\mathrm{sw}(A^*) \ge \mathrm{sw}(A)$ for all feasible allocations A. An allocation provides an α-approximation to welfare if $\mathrm{sw}(A) \ge \alpha \cdot \mathrm{sw}(A^*)$.

Fairness notions. We focus on a prominent notion of fairness called envy-freeness. An allocation $A^T = (A^T_1, \ldots, A^T_n)$ of T items is envy-free (EF) when $v_i(A^T_i) \ge v_i(A^T_j)$ for all $i, j \in N$, and c-strongly-envy-free (c-strong-EF) when $v_i(A_i) \ge v_i(A_j) + c \cdot T$ for all $i, j \in N$.

## 3 Ideal Quantile-based Algorithms

For our analysis, it will be useful to compare our algorithms with ideal algorithms that know exact quantile values for every item (and, in fact, several of our lower bounds apply to these stronger algorithms too). Two ideal algorithms of interest are (1) quantile maximization, which allocates each item to the agent with the highest quantile value for it, and (2) q-threshold, which allocates each item uniformly at random among agents whose quantile is at least q (and uniformly at random over all agents, if all quantile values are less than q). In the i.i.d. model, quantile maximization is the same as value maximization, and thus envy-free with high probability and ex-post welfare optimal.
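Both benchmarks are straightforward to state in code. The following sketch is ours; it assumes quantiles are directly observable, which the ideal algorithms may do but our actual (ordinal) algorithms may not:

```python
import random

def quantile_maximization(quantiles: list[float]) -> int:
    """Give the item to the agent with the highest quantile."""
    return max(range(len(quantiles)), key=lambda i: quantiles[i])

def q_threshold(quantiles: list[float], q: float) -> int:
    """Give the item uniformly at random among agents with quantile >= q;
    if no agent clears the threshold, uniformly among all agents."""
    eligible = [i for i, qi in enumerate(quantiles) if qi >= q]
    return random.choice(eligible or list(range(len(quantiles))))

n = 4
item_quantiles = [random.random() for _ in range(n)]  # quantiles are Unif[0,1]
winner = q_threshold(item_quantiles, q=(n - 1) / n)   # the threshold used throughout
```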
The property we will use is c-strong envy-freeness, for some distribution-dependent constant c, which we state as Lemma 1. This was essentially proved by [DGK+14]; we provide an alternate proof that also works, essentially unchanged, for the $\frac{n-1}{n}$-threshold algorithm; it can be found in Appendix A.1. The main idea is that an agent's value for an item, conditioned on them receiving it, is strictly larger than their value for an item received by another agent. Combined with the fact that the probability of receiving an item is equal for all agents, standard concentration results ensure each agent strictly prefers their own bundle with high probability.

Lemma 1 (essentially [DGK+14]). In the i.i.d. and non-i.i.d. models, the quantile maximization algorithm and the $\frac{n-1}{n}$-threshold algorithm are c-strongly-envy-free, with high probability, where $c = \min_{i \in N} (\mathbb{E}[X_i \mid Q_i \ge 1/2] - \mathbb{E}[X_i])/(4n)$.

Note that c is strictly positive since our distributions are continuous.

In the i.i.d. model, we show that the $\frac{n-1}{n}$-threshold algorithm guarantees a $((1-1/e)^2 - \varepsilon)$-approximation to welfare with high probability (Lemma 2). The main idea of this proof is that, with constant probability, some agent has a quantile of at least $\frac{n-1}{n}$ for each item. Additionally, giving an item to an agent with quantile at least $\frac{n-1}{n}$ generates, in expectation, welfare within a constant factor of giving it to the agent with the highest value.

Lemma 2. In the i.i.d. model, the $\frac{n-1}{n}$-threshold algorithm guarantees a $((1-1/e)^2 - \varepsilon)$-approximation to welfare, with high probability, for all $\varepsilon > 0$.

Proof. Let F be the CDF of an arbitrary continuous distribution. The expected contribution of an item to the welfare of the threshold algorithm is at least

$$\mathbb{E}_{Q \sim \mathrm{Unif}[0,1]}\!\left[F^{-1}(Q) \,\middle|\, Q \ge \tfrac{n-1}{n}\right] \cdot \Pr_{\mathbf{Q} \sim \mathrm{Unif}[0,1]^n}\!\left[\max_{i \in N} Q_i \ge \tfrac{n-1}{n}\right].$$

For the first term we have

$$\begin{aligned}
\mathbb{E}_{Q \sim \mathrm{Unif}[0,1]}\!\left[F^{-1}(Q) \,\middle|\, Q \ge \tfrac{n-1}{n}\right]
&= \int_{\frac{n-1}{n}}^{1} F^{-1}(q)\, f_{\mathrm{Unif}[\frac{n-1}{n},1]}(q)\, dq
= \int_{\frac{n-1}{n}}^{1} F^{-1}(q)\, n \, dq \\
&\ge \int_{\frac{n-1}{n}}^{1} F^{-1}(q)\, f_{\mathrm{Beta}[n,1]}(q)\, dq
\qquad \left(f_{\mathrm{Beta}[n,1]}(x) = n x^{n-1} \le n\right) \\
&= \mathbb{E}_{Q \sim \mathrm{Beta}[n,1]}\!\left[F^{-1}(Q) \,\middle|\, Q \ge \tfrac{n-1}{n}\right] \cdot \Pr_{Q \sim \mathrm{Beta}[n,1]}\!\left[Q \ge \tfrac{n-1}{n}\right] \\
&\ge \mathbb{E}_{Q \sim \mathrm{Beta}[n,1]}\!\left[F^{-1}(Q)\right] \cdot \Pr_{Q \sim \mathrm{Beta}[n,1]}\!\left[Q \ge \tfrac{n-1}{n}\right] \\
&= \mathbb{E}_{\mathbf{Q} \sim \mathrm{Unif}[0,1]^n}\!\left[F^{-1}(\max_{i \in N} Q_i)\right] \cdot \Pr_{\mathbf{Q} \sim \mathrm{Unif}[0,1]^n}\!\left[\max_{i \in N} Q_i \ge \tfrac{n-1}{n}\right],
\end{aligned}$$

where we used the fact that the maximum of n draws from $\mathrm{Unif}[0,1]$ follows a $\mathrm{Beta}[n,1]$ distribution. Since $\Pr[\max_{i \in N} Q_i \ge \frac{n-1}{n}] = 1 - (\frac{n-1}{n})^n \ge 1 - 1/e$, the expected contribution to the welfare is thus at least

$$\left(1 - \tfrac{1}{e}\right)^2 \, \mathbb{E}_{\mathbf{Q} \sim \mathrm{Unif}[0,1]^n}\!\left[F^{-1}(\max_{i \in N} Q_i)\right].$$

Finally, for any fixed $\varepsilon > 0$, standard Chernoff bounds tell us that, with high probability, the optimal welfare of T items is at most $T(1 + \varepsilon/2)\,\mathbb{E}_{\mathbf{Q}}[F^{-1}(\max_{i \in N} Q_i)]$, while the welfare of the threshold algorithm is at least $T(1 - \varepsilon/2)(1 - 1/e)^2\,\mathbb{E}_{\mathbf{Q}}[F^{-1}(\max_{i \in N} Q_i)]$. Indeed, the expected optimal welfare is equal to $T \cdot \mathbb{E}_{\mathbf{Q}}[F^{-1}(\max_{i \in N} Q_i)]$, the sum of T i.i.d. random variables with expectation $\mathbb{E}_{\mathbf{Q}}[F^{-1}(\max_{i \in N} Q_i)]$. The standard multiplicative Chernoff bound says that the probability that the sum of i.i.d. variables exceeds $(1 + \varepsilon/2)$ times its expectation $\mu$ is at most $\exp(-\mu\varepsilon^2/12)$. Plugging in $\mu = T \cdot \mathbb{E}_{\mathbf{Q}}[F^{-1}(\max_{i \in N} Q_i)]$, we get the desired statement. The statement about the welfare of the threshold algorithm follows similarly. Thus, the algorithm is a $(1-1/e)^2(1-\varepsilon/2)/(1+\varepsilon/2) \ge (1-1/e)^2(1-\varepsilon)$ approximation to welfare, with high probability.
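As an informal sanity check of the constant in Lemma 2 (not part of the paper's analysis), one can estimate the welfare ratio by simulation; Exp(1) is an arbitrary choice of continuous value distribution:

```python
import math
import random

def welfare_ratio(n: int = 3, T: int = 200_000, seed: int = 0) -> float:
    """Estimate threshold-algorithm welfare / optimal welfare over T items."""
    rng = random.Random(seed)
    q = (n - 1) / n
    thresh, opt = 0.0, 0.0
    for _ in range(T):
        vals = [rng.expovariate(1.0) for _ in range(n)]  # i.i.d. values
        quants = [1.0 - math.exp(-v) for v in vals]      # Q = F(X)
        eligible = [i for i in range(n) if quants[i] >= q]
        winner = rng.choice(eligible or list(range(n)))
        thresh += vals[winner]
        opt += max(vals)  # welfare-optimal: highest-value agent gets the item
    return thresh / opt

# Lemma 2 predicts the ratio stays above roughly (1 - 1/e)^2 ~ 0.399.
print(welfare_ratio(), ">=", (1 - 1 / math.e) ** 2)
```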
Finally, we prove that both ideal algorithms are approximately efficient. Let P be the following property of an allocation: all items for which exactly one agent has quantile at least $1 - 1/n$ are in the bundle of this agent. Both ideal algorithms (quantile maximization and $(1-1/n)$-threshold) satisfy P. We prove that, in the non-i.i.d. model, P implies an almost $1/e$ approximation to efficiency. Our proof uses the fact that there is a (roughly) $1/e$ probability that exactly one agent has the high quantile, so the value of an agent's bundle under an algorithm that satisfies P is, with high probability, a $1/e$ approximation to their value for their $T/n$ most valuable items. Therefore, when considering an alternate allocation A′, the agent who gets at most $T/n$ items in A′ cannot be improved upon by more than a $1/e$ factor. The proof can be found in Appendix A.2.

Lemma 3. In the non-i.i.d. model, every algorithm whose allocations satisfy P is $(1/e - \varepsilon)$-Pareto optimal, with high probability, for all $\varepsilon > 0$.

## 4 Unbounded Memory in the I.I.D. Model

We explore some fundamental limits of our setting. Efficiency by itself is easy: allocate all items to the same agent. However, in contrast to the cardinal setting, we find that one-swap Pareto efficiency is incompatible with envy-freeness with high probability, even for two i.i.d. agents, and even when the underlying distribution is known.

Theorem 4. In the i.i.d. model, even for n = 2 agents, there does not exist an algorithm $\mathcal{A}$ which is one-swap Pareto efficient and envy-free with high probability, even when values are sampled according to $\mathcal{D}$, for any continuous, bounded and known value distribution $\mathcal{D}$.

The proof can be found in Appendix B.1. The main idea is the following. As the agents are a priori identical, we can assume without loss of generality that $\mathcal{A}$ gives the first item to agent 1. The proof shows that, with positive probability, this decision becomes an irrevocable mistake, in the sense that agent 2 really liked the item and agent 1 did not. This mistake makes envy-freeness and one-swap Pareto efficiency incompatible.

Theorem 4 implies that when we have access to only ordinal information, we need to settle for some approximation to envy-freeness and efficiency. Our main positive result for this section is an algorithm that essentially matches the aforementioned lower bound.

Theorem 5. In the i.i.d. model, Algorithm 1 achieves envy-freeness and a $(1 - \varepsilon)$-approximation to welfare, with high probability, for all $\varepsilon > 0$.

Algorithm 1 works in epochs: each epoch k has an exploration/sampling phase, where each agent i receives a pre-determined set of items, denoted $G_i^k$, irrespective of their valuation. This is followed by an exploitation/ranking phase, where each item is given to the agent with the highest empirical quantile (with respect to items received in the preceding exploration phase, i.e., $G_i^k$). The key idea for the proof is that the exploration phases ensure nearly all of the items go to the agent who truly has the highest quantile. Using this, the good properties of the ideal quantile maximization algorithm carry over to Algorithm 1.

We start with a technical lemma, which gives us a bound on the length of the exploration period we need in each epoch. The following definition will be useful.

Definition 2. A sample of $n \cdot m$ items (where each agent is allocated exactly m items) is ε-accurate if, with probability at least $1 - \varepsilon$, the relative rank of a fresh item (with respect to the sample) is highest for the agent with the highest quantile value.

Lemma 6. If $\varepsilon, \delta \in (0, 1)$ and $m \in \mathbb{Z}^+$ are such that $\varepsilon > 2n\sqrt{\ln(2n/\delta)/(2m)}$, then giving m samples to each agent is ε-accurate with probability at least $1 - \delta$.

Algorithm 1: EF + $(1 - \varepsilon)$-Welfare
  for epoch k = 1, 2, … do
    Sampling Phase ($n \cdot k^4$ items):
      Give the j-th item in this phase to agent j (mod n).
    Ranking Phase ($k^8$ items):
      for each item g in this phase do
        Elicit $\sigma_i^{-1}(G_i^k \cup \{g\}, g)$ for all $i \in N$.
        Allocate g to an agent $j \in \arg\min_{i \in N} \sigma_i^{-1}(G_i^k \cup \{g\}, g)$.

Proof (of Lemma 6). We will use the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality [DKW56, Mas90] to show that the empirical CDF of sampled quantiles is reasonably close to a uniform distribution with probability $1 - \delta$. We then show this is sufficient to guarantee ε-accuracy for the chosen ε. Let $\hat F_i$ be the empirical CDF of the sampled quantiles for agent i, i.e., $\hat F_i(q)$ for $q \in [0, 1]$ is a random variable that describes the proportion of sampled items with quantile at most q. Note that $\hat F_i$ exactly captures agent i's ranking for a new item: if a fresh item has quantile $q_i$ for agent i and $q_j$ for agent j, then i ranks it higher than j exactly when $\hat F_i(q_i) > \hat F_j(q_j)$. Noting that the CDF of the actual quantile distribution (i.e., the uniform distribution) is the identity on [0, 1], the DKW inequality states that for all $\gamma > 0$, $\Pr\!\left[\sup_{q \in [0,1]} |\hat F_i(q) - q| > \gamma\right] \le 2e^{-2m\gamma^2}$. We want this condition to hold for all n agents, simultaneously, with probability at least $1 - \delta$, so we pick γ such that $2e^{-2m\gamma^2} \le \delta/n$ and apply a union bound; it suffices to choose $\gamma = \sqrt{\ln(2n/\delta)/(2m)}$.

We now show that the DKW condition ($\sup_{q \in [0,1]} |\hat F_i(q) - q| \le \gamma$) being satisfied for all agents i is sufficient to guarantee ε-accuracy. Consider sampling quantiles $Q_1, \ldots, Q_n$ for a fresh item. Let $i_{\max} \in \arg\max_{i \in N} Q_i$ be a quantile-maximizing agent (technically a random variable). Our goal is to show that, with probability at least $1 - \varepsilon$ (with respect to the samples $Q_1, \ldots, Q_n$), $\hat F_{i_{\max}}(Q_{i_{\max}}) > \hat F_j(Q_j)$ for all $j \ne i_{\max}$. This ensures that $i_{\max}$ has the highest empirical rank, and hence receives the item. Let $Q_{(1)}, \ldots, Q_{(n)}$ be the respective order statistics. A key observation is that $Q_{(n)} - Q_{(n-1)} \sim \mathrm{Beta}[1, n]$ [Gen19]. The PDF of a $\mathrm{Beta}[1, n]$ distribution is $f(x) = n(1-x)^{n-1}$ for $x \in [0, 1]$. Since $f(x) \le n$, we have $\Pr[Q_{(n)} - Q_{(n-1)} < \rho] < n\rho$ for all $\rho > 0$. Plugging in $\rho = 2\gamma$, we have $\Pr[Q_{(n)} - Q_{(n-1)} \le 2\gamma] < 2n\gamma$. We will show that as long as $\varepsilon > 2n\gamma$, ε-accuracy holds. First, we have $\Pr[Q_{(n)} - Q_{(n-1)} > 2\gamma] > 1 - \varepsilon$. Conditioned on $Q_{(n)} - Q_{(n-1)} > 2\gamma$, the item is given to $i_{\max}$. To see why, observe that $Q_{i_{\max}} = Q_{(n)}$ and $Q_j \le Q_{(n-1)}$ for all $j \ne i_{\max}$, by definition. Using the DKW condition, it follows that $\hat F_{i_{\max}}(Q_{i_{\max}}) \ge Q_{i_{\max}} - \gamma > Q_j + \gamma \ge \hat F_j(Q_j)$. We conclude that for $\varepsilon > 2n\sqrt{\ln(2n/\delta)/(2m)}$, ε-accuracy is satisfied with probability at least $1 - \delta$.
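The mechanics of Lemma 6 are easy to mirror numerically. The sketch below is ours (not the authors'): it builds each agent's empirical CDF from m sampled quantiles and checks whether the agent with the highest true quantile for a fresh item also wins the empirical comparison.

```python
import bisect
import random

rng = random.Random(1)

def accurate_trial(n: int = 3, m: int = 10_000) -> bool:
    samples = [sorted(rng.random() for _ in range(m)) for _ in range(n)]
    fresh = [rng.random() for _ in range(n)]  # true quantiles of a fresh item
    # Empirical CDF value = fraction of agent i's samples below her quantile.
    emp = [bisect.bisect_left(samples[i], fresh[i]) / m for i in range(n)]
    return max(range(n), key=emp.__getitem__) == max(range(n), key=fresh.__getitem__)

# Lemma 6: on all but a delta-fraction of sample draws, failures occur with
# probability at most roughly 2n * sqrt(ln(2n/delta) / (2m)).
accuracy = sum(accurate_trial() for _ in range(2_000)) / 2_000
```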
Using Lemma 6, we can get, for each epoch, a bound on the number of decisions where Algorithm 1 differs from the quantile maximization algorithm.

Lemma 7. The allocation of Algorithm 1 differs from that of the quantile maximization algorithm after T steps by at most f(T) items with high probability, where $f(T) \in O(\mathrm{poly}(n) \cdot T^{15/16})$.

Proof. We start by bounding the accuracy of Algorithm 1 in each epoch k. In epoch k, each agent receives $k^4$ items during the sampling phase. We claim that the sample in epoch k, for $k \ge 3n$, is $\varepsilon_k$-accurate for $\varepsilon_k := 3n/k^{3/2}$ with probability at least $1 - \delta_k$, for $\delta_k := 2n/e^{2k}$. Indeed, first note that by the choice of k, we have that $\varepsilon_k, \delta_k \in (0, 1)$. Hence, we just need to show that these values satisfy the inequality of Lemma 6. We have that

$$2n\sqrt{\frac{\ln(2n/\delta_k)}{2k^4}} = 2n\sqrt{\frac{2k}{2k^4}} = \frac{2n}{k^{3/2}} < \frac{3n}{k^{3/2}} = \varepsilon_k.$$

Next, fix a time T. Slightly abusing notation, let $k(t) = \min\{K \in \mathbb{N} \mid \sum_{k=1}^{K} (nk^4 + k^8) \ge t\}$ be the function that, given an item t, returns the epoch item t is in. Notice that $T \ge \sum_{k=1}^{k(T)-1} (nk^4 + k^8) \ge (k(T) - 1)^8$, and therefore $k(T) \le 2T^{1/8}$. In any run of the algorithm, we can classify every item $t \le T$ into one of four categories:

1. Item t was allocated in one of the first $3n - 1$ epochs, that is, $k(t) < 3n$.
2. Item t was allocated in the sampling phase of epoch $k(t) \ge 3n$.
3. Item t was allocated in the ranking phase of epoch $k(t) \ge 3n$, and the epoch was $\varepsilon_{k(t)}$-accurate.
4. Item t was allocated in the ranking phase of epoch $k(t) \ge 3n$, and the epoch was not $\varepsilon_{k(t)}$-accurate.

We say an item t was a mistake if it was given to an agent with a non-maximum quantile for it. We show that the number of mistakes in the four categories is bounded by $3^{10}n^9$, $2n T^{5/8}$, $9n T^{15/16}$, and $158n \ln(T)$, respectively, with high probability. This implies that the total number of mistakes is at most the sum of these quantities, which is $O(\mathrm{poly}(n) \cdot T^{15/16})$, with high probability, via a union bound.

The number of items in the first category is at most $\sum_{k=1}^{3n-1} (nk^4 + k^8) \le \sum_{k=1}^{3n} (n(3n)^4 + (3n)^8) = (3n)^5 n + (3n)^9 \le 3^{10} n^9$. Hence, the number of mistakes in the first category is also at most $3^{10} n^9$.

For the second category, since $k(T) \le 2T^{1/8}$, the total number of items in the sampling phases is (with probability 1) upper bounded by $\sum_{k=1}^{k(T)} nk^4 \le n\,k(T)^5 \le 2n T^{5/8}$.

Each item t in the third category has probability $\varepsilon_{k(t)}$ of being a mistake. The expected number of mistakes is therefore at most $\sum_{k=3n}^{k(T)} \varepsilon_k k^8 = \sum_{k=3n}^{k(T)} 3n k^{13/2} \le 3n\,k(T)^{15/2} \le 8n T^{15/16}$. Using Hoeffding's inequality, we get that with high probability the number of mistakes is at most $9n T^{15/16}$, since a deviation of $n T^{15/16}$ occurs with probability at most $\exp(-2n^2 T^{15/8}/T) = \exp(-2n^2 T^{7/8})$.

For the fourth category, the expected number of items is at most $\sum_{k=3n}^{k(T)} \delta_k k^8 = 2n \sum_{k=3n}^{k(T)} k^8 e^{-2k} \le 2n \sum_{k=1}^{\infty} k^8 e^{-2k} \le 158n$. Using Markov's inequality, we have that the number of mistakes is at most $158n \ln(T)$ with probability at least $1 - 1/\ln(T)$, i.e., with high probability.

Finally, we can prove Theorem 5 as a relatively straightforward consequence of Lemma 7, since the ideal quantile maximization algorithm satisfies the desired properties (e.g., Lemma 1). The full proof can be found in Appendix B.2.

## 5 Bounded Memory in the I.I.D. Model

In this section, we are interested in the more ambitious problem of designing dynamic algorithms with even more limited partial information: each agent is allowed to remember only a single item. We first show that, in this case, we need to settle for constant approximations to welfare. The main idea of the proof is a difficult value distribution: agents have essentially one of three values, low, medium, or high. The key idea is that no matter what quantile thresholds an algorithm uses, either, with reasonable probability, the highest-value agent has medium value but is indistinguishable from all low-value agents, or the highest-value agent has high value but is indistinguishable from all medium- and low-value agents. In either case, the algorithm makes a mistake with constant probability, and hence cannot guarantee close-to-optimal welfare. The full proof can be found in Appendix C.1.

Theorem 8. In the i.i.d. model, given a memory of one item per agent, there is no algorithm $\mathcal{A}$ that is .999-welfare maximizing with high probability for all continuous and bounded value distributions.

Algorithm 2: Bounded Memory
  for epoch k = 1, 2, … do
    Sampling Phase ($k^9$ items):
      NotWithinError ← N
      for trial = 1, …, $k^3$ do
        for each agent i ∈ NotWithinError do
          Allocate the next item to agent i, and update her memory.
        Test $k^6 \cdot |\mathrm{NotWithinError}|$ items ($k^6$ for each agent in NotWithinError).
        for each agent i ∈ NotWithinError do
          if the proportion of test items that agent i ranks below her item in memory is within $1/k^2$ of $(n-1)/n$ then
            NotWithinError ← NotWithinError \ {i}
    Ranking Phase ($k^{18}$ items):
      for each item g in this phase do
        if some agent i has a high signal (i.e., values g above her item in memory) then
          Give g to a uniformly random such agent.
        else
          Give g to an agent uniformly at random.

Our positive result matches this lower bound up to a constant.

Theorem 9. In the i.i.d. model, given a memory of one item per agent, Algorithm 2 achieves envy-freeness and a $((1-1/e)^2 - \varepsilon)$-approximation to welfare, with high probability, for all $\varepsilon > 0$.

Algorithm 2 works in epochs, similar to Algorithm 1. In each epoch's exploration/sampling phase, it tries to find, for each agent, an item whose quantile is close to $\frac{n-1}{n}$, the threshold of the ideal algorithm. Epoch k makes $k^3$ such attempts, and each candidate item is tested against $k^6$ fresh items to get an estimated quantile. If everything is within the error we can tolerate, the algorithm remembers this item for this epoch; otherwise, the agent has an arbitrary item in memory during this epoch. During the exploitation/ranking phase, Algorithm 2 tries to mimic the $\frac{n-1}{n}$-threshold algorithm (instead of the quantile maximization algorithm, as Algorithm 1 did), and, in fact, inherits its approximation factor (Lemma 2) exactly.

Our first technical lemma, Lemma 10, gives sufficient bounds on the various parameters of Algorithm 2 for a sample to be ε-accurate with respect to the ideal threshold algorithm; see Definition 3. Its proof can be found in Appendix C.2.

Definition 3. A set of n items in memory, one for each agent, is ε-accurate with respect to q if, with probability at least $1 - \varepsilon$, when a fresh item is sampled, the agents with true quantile above q are exactly those that value the fresh item more than their item in memory.

Lemma 10. For all $\varepsilon, \delta \in (0, 1)$, if (1) at least τ trials are done, with $\tau \ge \frac{\ln(2n/\delta)}{\varepsilon/(3n)}$, (2) at least ℓ test items are used per trial, with $\ell \ge \frac{18n^2 \ln(2n/\delta)}{\varepsilon^2}$, and (3) the tolerance for accepting an item is $\varepsilon/(3n)$, then the items in memory are ε-accurate (for all agents, simultaneously) with respect to $q = \frac{n-1}{n}$, with probability at least $1 - \delta$.

Though Lemmas 6 and 10 resemble each other (and are used in analogous ways), the proofs require different techniques, as the sampling processes are very different. Next, we prove an analogue of Lemma 7: the number of disagreements between Algorithm 2 and the ideal threshold algorithm is sublinear. (Note these are randomized algorithms, so by "differ on an item" we mean that the distributions over agents receiving the item differ.) The proofs of Lemmas 7 and 11 are similar, precisely because Lemma 6 matches Lemma 10. Theorem 9 then follows from Lemma 11, just as Theorem 5 followed from Lemma 7. The proofs of Lemma 11 and Theorem 9 can be found in Appendices C.3 and C.4, respectively.

Lemma 11. The allocation of Algorithm 2 differs from that of the $\frac{n-1}{n}$-threshold algorithm after T steps by at most f(T) items with high probability, where $f(T) \in O(\mathrm{poly}(n) \cdot T^{17/18})$.
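The core of Algorithm 2's sampling phase, estimating a memory item's quantile purely from pairwise comparisons, can be sketched as follows (our illustration, using the epoch parameters from the pseudocode; working with quantiles directly is without loss here, since comparisons between items depend only on the relative order of values):

```python
import random

def trial(n: int, k: int, rng: random.Random) -> tuple[float, bool]:
    """One sampling-phase trial of epoch k: a memory item with quantile q
    beats a fresh test item with probability q, so the win fraction over
    k^6 tests estimates the memory item's quantile."""
    memory = rng.random()                                    # fresh item into memory
    beats = sum(rng.random() < memory for _ in range(k**6))  # test comparisons
    estimate = beats / k**6                                  # estimated quantile
    accepted = abs(estimate - (n - 1) / n) <= 1 / k**2       # within tolerance?
    return memory, accepted

# Epoch k makes k^3 attempts per agent; with high probability some trial is
# accepted, and its true quantile is then close to (n-1)/n (Lemma 10).
n, k, rng = 3, 5, random.Random(0)
accepted = [q for q, ok in (trial(n, k, rng) for _ in range(k**3)) if ok]
```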
## 6 The Non-I.I.D. Model

In this section, we study the non-i.i.d. model. We first establish a strong lower bound: the following negative result holds even for algorithms that know the associated quantile of every fresh item.

Theorem 12. Even for 2 non-identical agents, there is no algorithm that is EF and c-PO with probability p, for $c > \frac{1+\sqrt{5}}{4} \approx .809$ and $p > 2/3$, for all continuous and bounded value distributions.

Here we provide a proof sketch of Theorem 12. The full proof can be found in Appendix D.1.

Proof Sketch. The main idea is that an algorithm, even if it can see the exact quantiles of all items, cannot distinguish between agents having potentially different kinds of distributions. At one extreme, agents may have nearly identical values for all items. At the other extreme, agents may only care about (i.e., have high value for), say, their top third of items. Consider the case of two agents. Notice that if both agents have nearly identical values for all items, then, to be EF, the bundle sizes of both agents must be approximately the same. We can then show that, no matter how an algorithm does this, it must be the case that one agent, say agent 1, receives a significant number of items with quantile outside of their top third and does not receive a significant number of items with quantile in their top third. However, suppose instead the value distributions were such that agent 2 still has nearly identical values for all items, but agent 1 only has high value for their top-third items. In the setting described above, agent 1 would be willing to trade many of their bottom two-thirds quantile items for the items agent 2 received in agent 1's top third. By setting up this trade correctly, both agents' total utilities go up by a significant amount. The existence of such a trade violates constant approximations to PO.

Algorithms 1 and 2 are envy-free with high probability, even in the non-i.i.d. model, since envy-freeness depends only on each agent's own valuations and does not require comparing across agents' distributions. Our last result shows that they also give a constant approximation to Pareto efficiency, by combining Lemma 3 with Lemmas 7 and 11. Its proof can be found in Appendix D.2.

Theorem 13. In the non-i.i.d. model, both Algorithm 1 (unbounded memory) and Algorithm 2 (one-item memory) are EF and $(1/e - \varepsilon)$-PO, with high probability, for all $\varepsilon > 0$.

Interestingly, the guarantees for Algorithm 2 in the non-i.i.d. model are only marginally worse compared to the i.i.d. model; the approximation ratio decreases from $(1-1/e)^2 \approx 0.4$ to $1/e \approx 0.37$. Finally, we note that, even though the formal guarantees in Theorem 13 are the same for the two algorithms, and even though Algorithm 2 uses a memory size of one, Algorithm 1 has the benefit of much shorter epoch lengths (in addition to better guarantees under the i.i.d. model).

## 7 Conclusion

To conclude, we have analyzed the online fair division problem when agents only reveal partial information. In multiple settings, we have proved tight bounds on what properties are attainable. As for future work, one direction would be to move past the assumption that agent values are drawn independently, as we may not expect this to hold in practice. Additionally, there are many other forms of partial information that could potentially achieve different properties. For example, if agents can compare not just single items but small subsets of items, then it may be possible to achieve stronger results, such as arbitrarily good approximations to PO, even in the non-i.i.d. setting.

## Acknowledgements

Alexandros Psomas is supported in part by an NSF CAREER award CCF-2144208, a Google Research Scholar Award, and by the Algorand Centres of Excellence program managed by Algorand Foundation.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of Algorand Foundation. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE1745303.

## References

[AAGW15] Martin Damyanov Aleksandrov, Haris Aziz, Serge Gaspers, and Toby Walsh. Online fair division: Analysing a food bank problem. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), pages 2540–2546, 2015.

[ABM16] Georgios Amanatidis, Georgios Birmpas, and Evangelos Markakis. On truthful mechanisms for maximin share allocations. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI), pages 31–37, 2016.

[AFRSV21] Elliot Anshelevich, Aris Filos-Ratsikas, Nisarg Shah, and Alexandros A. Voudouris. Distortion in social choice problems: The first 15 years and beyond. In Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI), pages 4294–4301, 2021.

[AGMW15] Haris Aziz, Serge Gaspers, Simon Mackenzie, and Toby Walsh. Fair assignment of indivisible objects under ordinal preferences. Artificial Intelligence, 227:71–92, 2015.

[BBL+17] Dorothea Baumeister, Sylvain Bouveret, Jérôme Lang, Nhan-Tam Nguyen, Trung Thanh Nguyen, Jörg Rothe, and Abdallah Saffidine. Positional scoring-based allocation of indivisible goods. Autonomous Agents and Multi-Agent Systems, 31(3):628–655, 2017.

[BEL10] Sylvain Bouveret, Ulle Endriss, and Jérôme Lang. Fair division under ordinal preferences: Computing envy-free allocations of indivisible goods. In Proceedings of the 19th European Conference on Artificial Intelligence (ECAI), pages 387–392, 2010.

[BG22] Yushi Bai and Paul Gölz. Envy-free and Pareto-optimal allocations for agents with asymmetric random valuations. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI), 2022. Forthcoming.

[BGGJ22] Siddhartha Banerjee, Vasilis Gkatzelis, Artur Gorokh, and Billy Jin. Online Nash social welfare maximization with predictions. In Proceedings of the 33rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1–19, 2022.

[BKM22] Siddharth Barman, Arindam Khan, and Arnab Maiti. Universal and tight online algorithms for generalized-mean welfare. In Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI), 2022. Forthcoming.

[BKPP18] Gerdus Benade, Aleksandr M. Kazachkov, Ariel D. Procaccia, and Christos-Alexandros Psomas. How to make envy vanish over time. In Proceedings of the 10th ACM Conference on Economics and Computation (EC), pages 593–610, 2018.

[BMS22] Anna Bogomolnaia, Hervé Moulin, and Fedor Sandomirskiy. On the fair division of a random object. Management Science, 68(2):1174–1194, 2022.

[BNPS21] Gerdus Benade, Swaprava Nath, Ariel D. Procaccia, and Nisarg Shah. Preference elicitation for participatory budgeting. Management Science, 67(5):2813–2827, 2021.

[CNPS17] Ioannis Caragiannis, Swaprava Nath, Ariel D. Procaccia, and Nisarg Shah. Subset selection via implicit utilitarian voting. Journal of Artificial Intelligence Research, 58:123–152, 2017.

[DGK+14] John P. Dickerson, Jonathan Goldman, Jeremy Karp, Ariel D. Procaccia, and Tuomas Sandholm. The computational rise and fall of fairness. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI), pages 1405–1411, 2014.
[DKW56] Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, pages 642–669, 1956.

[FPV15] Eric Friedman, Christos-Alexandros Psomas, and Shai Vardi. Dynamic fair division with minimal disruptions. In Proceedings of the 16th ACM Conference on Economics and Computation (EC), pages 697–713, 2015.

[FPV17] Eric Friedman, Christos-Alexandros Psomas, and Shai Vardi. Controlled dynamic fair division. In Proceedings of the 18th ACM Conference on Economics and Computation (EC), pages 461–478, 2017.

[FRFZ14] Aris Filos-Ratsikas, Søren Kristoffer Stiil Frederiksen, and Jie Zhang. Social welfare in one-sided matchings: Random priority and beyond. In Proceedings of the 7th International Symposium on Algorithmic Game Theory (SAGT), pages 1–12, 2014.

[GBI21] Artur Gorokh, Siddhartha Banerjee, and Krishnamurthy Iyer. From monetary to non-monetary mechanism design via artificial currencies. Mathematics of Operations Research, 2021.

[Gen19] James E. Gentle. Computational Statistics. Springer, 2019.

[GHS20] Vasilis Gkatzelis, Daniel Halpern, and Nisarg Shah. Resolving the optimal metric distortion conjecture. In Proceedings of the 61st Symposium on Foundations of Computer Science (FOCS), pages 1427–1438, 2020.

[GKM17] Ashish Goel, Anilesh K. Krishnaswamy, and Kamesh Munagala. Metric distortion of social choice rules: Lower bounds and fairness properties. In Proceedings of the 18th ACM Conference on Economics and Computation (EC), pages 287–304, 2017.

[GPT21] Vasilis Gkatzelis, Alexandros Psomas, and Xizhi Tan. Fair and efficient online allocations with normalized valuations. In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI), pages 5440–5447, 2021.

[HPPZ19] Jiafan He, Ariel D. Procaccia, Alexandros Psomas, and David Zeng. Achieving a fairer future by changing the past. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), pages 343–349, 2019.

[HS21] Daniel Halpern and Nisarg Shah. Fair and efficient resource allocation with partial information. In Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI), pages 5456–5463, 2021.

[Kem20a] David Kempe. An analysis framework for metric voting based on LP duality. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI), pages 2079–2086, 2020.

[Kem20b] David Kempe. Communication, distortion, and randomness in metric voting. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI), pages 2087–2094, 2020.

[KPS14] Ian Kash, Ariel D. Procaccia, and Nisarg Shah. No agent left behind: Dynamic fair division of multiple resources. Journal of Artificial Intelligence Research, 51:579–603, 2014.

[KPW16] David Kurokawa, Ariel D. Procaccia, and Junxing Wang. When can the maximin share guarantee be guaranteed? In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI), pages 523–529, 2016.

[KS22] Yasushi Kawase and Hanna Sumita. Online max-min fair allocation. In Proceedings of the 15th International Symposium on Algorithmic Game Theory (SAGT), pages 526–543, 2022.

[LLL18] Bo Li, Wenyang Li, and Yingkai Li. Dynamic fair division problem with general valuations. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), pages 375–381, 2018.

[Mas90] Pascal Massart. The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality. The Annals of Probability, pages 1269–1283, 1990.
[MSW20] Debmalya Mandal, Nisarg Shah, and David P. Woodruff. Optimal communication-distortion tradeoff in voting. In Proceedings of the 21st ACM Conference on Economics and Computation (EC), pages 795–813, 2020.

[MW19] Kamesh Munagala and Kangning Wang. Improved metric distortion for deterministic social choice rules. In Proceedings of the 20th ACM Conference on Economics and Computation (EC), pages 245–262, 2019.

[NNR17] Nhan-Tam Nguyen, Trung Thanh Nguyen, and Jörg Rothe. Approximate solutions to max-min fair and proportionally fair allocations of indivisible goods. In Proceedings of the 16th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), pages 262–271, 2017.

[PR06] Ariel D. Procaccia and Jeffrey S. Rosenschein. The distortion of cardinal preferences in voting. In Proceedings of the 10th International Workshop on Cooperative Information Agents (CIA), pages 317–331, 2006.

[RF90] Günther Ruhe and Bernd Fruhwirth. ε-optimality for bicriteria programs and its application to minimum cost flows. Computing, 44(1):21–34, 1990.

[VPF21] Shai Vardi, Alexandros Psomas, and Eric Friedman. Dynamic fair resource division. Mathematics of Operations Research, 2021.

[ZP20] David Zeng and Alexandros Psomas. Fairness-efficiency tradeoffs in dynamic fair division. In Proceedings of the 21st ACM Conference on Economics and Computation (EC), pages 911–912, 2020.