# online_task_assignment_problems_with_reusable_resources__650fb363.pdf

Online Task Assignment Problems with Reusable Resources

Hanna Sumita,1 Shinji Ito,2 Kei Takemura,2 Daisuke Hatano,3 Takuro Fukunaga,4

Naonori Kakimura,5 Ken-ichi Kawarabayashi 6

1 Tokyo Institute of Technology 2 NEC Corporation 3 RIKEN AIP 4 Chuo University 5 Keio University 6 National Institute of Informatics sumita@c.titech.ac.jp, {kei takemura, i-shinji}@nec.com, daisuke.hatano@riken.jp, fukunaga.07s@chuo-u.ac.jp, kakimura@math.keio.ac.jp, k keniti@nii.ac.jp

We study online task assignment problem with reusable resources, motivated by practical applications such as ridesharing, crowdsourcing and job hiring. In the problem, we are given a set of ofﬂine vertices (agents), and, at each time, an online vertex (task) arrives randomly according to a known time-dependent distribution. Upon arrival, we assign the task to agents immediately and irrevocably. The goal of the problem is to maximize the expected total proﬁt produced by completed tasks. The key features of our problem are (1) an agent is reusable, i.e., an agent comes back to the market after completing the assigned task, (2) an agent may reject the assigned task to stay the market, and (3) a task may accommodate multiple agents. The setting generalizes that of existing work in which an online task is assigned to one agent under (1). In this paper, we propose an online algorithm that is 1/2competitive for the above setting, which is tight. Moreover, when each agent can reject assigned tasks at most times, the algorithm is shown to have the competitive ratio /(3 1) 1/3. We also evaluate our proposed algorithm with numerical experiments.

Introduction Online task assignment problem has attracted extensive attention recently in combinatorial optimization and artiﬁcial intelligence. The problem models practical situations that assign agents to tasks arriving online. For example, in a rideshare platform (Dickerson et al. 2021; Dong et al. 2021; Lowalekar, Varakantham, and Jaillet 2020; Nanda et al. 2020) such as Uber and Lyft, we match drivers to riders where requests from riders arrive one by one. Other applications include crowdsourcing (Assadi, Hsu, and Jabbari 2015; Ho and Vaughan 2012; Xu et al. 2017) and job hiring (Anagnostopoulos et al. 2018; Dickerson et al. 2019). In this paper, we study online task assignment problem under known adversarial distributions. In the problem, we are given a time horizon T and a bipartite graph G = (U, V ; E), where U corresponds to a set of agents and V represents types of tasks. A type of a task is usually deﬁned by attributes of the task, e.g., pick-up/drop-off locations of

Copyright 2022, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

a rider in a rideshare platform and language skills required to complete a task in a crowdsourcing platform. An edge (u, v) E means that u has ability to process an online task v. In the problem, the set U of agents is known in advance (ofﬂine vertices), while we are given a vertex with a type v V (an online vertex) at each time, randomly according to a time-dependent distribution Dt over V . We identify an online vertex with its type v. On arrival of v, we immediately and irrevocably either assign some agents to v, or discard the chance of processing v. Then we obtain a proﬁt produced by an assigned agent if she completes a task. The goal is to design an online algorithm that maximizes the total proﬁt. We here assume that Dt is known in advance, which is called a Known Adversarial Distribution (KAD) model (Dickerson et al. 2021); see also (Alaei, Hajiaghayi, and Liaghat 2012). If Dt is the same for all t, it is called a Known I.I.D. (KIID) model. Motivated by practical applications, it has been studied to generalize the online task assignment problem with additional speciﬁc conditions. An example is reusability of agents (Dickerson et al. 2021). In the standard online task assignment problem, when an agent is assigned a task, she will leave immediately from the market. However, if agents are reusable, an agent comes back to the market after completing the task, and she can be matched to a new task again. Such situation especially arises in a rideshare platform with drivers and riders. Another example is rejections by agents (Nanda et al. 2020). It is natural that an agent is allowed to reject an assigned task if it is not satisfactory. Moreover, each agent may have an upper bound on the number of rejections, that is, when she rejects an assigned task times, then she must leave from the market. It should be noted that the previous work mentioned above studied the online bipartite matching where only one agent can be assigned to an online task, and it is not straightforward to extend1 the results to the online task assignment problem.

1We can consider a simple reduction that replaces an online vertex v with capacity bv as bv vertices with capacity 1 arriving sequentially. Such reduction, however, would not work in the KIID/KAD model because an online vertex is chosen independently at each time.

The Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)

In this paper, we introduce the online task assignment problem in the KAD model with all the constraints mentioned above, that is, the problem with the following constraints: A. (Reusability) an agent comes back to the market after she completes a task, where the occupation time is drawn from a known distribution, B. (Rejections) an agent u rejects an assignment with some probability, and has to leave after rejecting u times, and C. (Task capacity) an online vertex v can accommodate at most bv ofﬂine vertices. See the next section for a formal problem deﬁnition.

Our Contribution Our main result is to propose an online algorithm for the above problem. We prove that our algorithm has the following theoretical guarantees: 1/2-competitive for the unlimited rejection case, i.e., when u is + for all u (Theorem 1), /(3 1)-competitive for the general case (Theorem 2), where = maxu U u. We here evaluate the performance of online algorithms with the common measure called competitive ratio. We say that an algorithm is α-competitive if the expected proﬁt obtained by the algorithm is at least α times the ofﬂine optimal value. The formal deﬁnition is given in the next section. Our results and existing results are summarized in Table 1. Note that our algorithm is useful for the problem without reusability. In fact, we prove that our algorithm has the competitive ratio (1 1/e2)/2 > 0.432 in the KIID model without reusability, which improves on the competitive ratio 1/e (Nanda et al. 2020) for the same setting (Theorem 3). In addition, we show that no (adaptive) algorithm can achieve the competitive ratio better than 1/2. This is implied by a well-known example for the prophet inequality problem (Krengel and Sucheston 1977, 1978). We note that this hardness result is applied even in the setting of (Dickerson et al. 2021). Thus our result complements their result, which considers only non-adaptive algorithms. Due to the space limitation, we omit some results and the details of the proofs; see the full version (Sumita et al. 2022). Below let us describe technical highlights of the proof of our algorithms.

Technique To design an online algorithm, we ﬁrst construct an ofﬂine linear program (LP) that gives an upper bound on the ofﬂine optimal value. This is a similar approach to the online bipartite matching in the KIID/KAD model (Alaei, Hajiaghayi, and Liaghat 2012; Dickerson et al. 2021; Nanda et al. 2020). Our LP has variables xuv,t for each (u, v) E and t, and has linear inequalities that incorporate the constraints (A) (C) above. See the next section for the formal deﬁnition of our LP. Intuitively, each variable xuv,t corresponds to a probability that v arrives at time t and u is assigned to v. We shall use an LP optimal solution x uv,t to determine how to assign agents to v at each time. There are, however, several difﬁculties to obtain our results as described below.

The ﬁrst difﬁculty is to handle the task capacity condition, i.e., that we choose a set of agents each time. Given LP optimal solution x e,t for each edge e, we would like to have a distribution of feasible sets so that the probability of choosing e at time t is x e,t. For the online bipartite matching, this is easy; we just choose an edge e with probability proportional to x e,t at time t. However, for the online task assignment problem, such independent choice of edges may violate the task capacity condition. To avoid it, we use Carath eodory s theorem in convex analysis. The theorem allows us to decompose x e,t to a polynomial number of feasible sets, which can be regarded as a probability distribution on feasible sets. Another difﬁculty is that the LP optimal value may give a loose upper bound on the ofﬂine optimal value. In this case, ﬁnding a feasible set based on LP is not sufﬁcient; we may select an agent which should not be chosen. In fact, there exists a problem instance such that an algorithm similarly to Nanda et al. (2020) that just assigns an agent according to an LP optimal solution would fail to obtain the competitive ratio more than 1/3 (See the full version for a speciﬁc example). To overcome this difﬁculty, we adapt the idea of Alaei, Hajiaghayi, and Liaghat (2012), who analyzed a different online matching problem in the KAD model. Speciﬁcally, after ﬁnding a feasible set S based on x e,t, we decide whether to assign a task v to u for each agent u S. We compute the expected proﬁt that u earns at and after the current time t by assigning u to v, and, if it is larger than the one when not assigning, then we decide to assign u to v. To prove that the proposed algorithm admits desired competitive ratio, we evaluate the expected proﬁt that each vertex u earns. Let Rd u,t be the expected proﬁt that u earns at and after time t, when u has a remaining budget d. Then Rd u,t can be represented recursively with respect to t and d. Since the recursive equation is linear, we can bound Rd u,t by introducing another linear program and its dual. We note that when u is inﬁnite, the recursive equation becomes simpler depending on only t, that gives a better competitive ratio. Let us describe the differences from (Dickerson et al. 2021; Nanda et al. 2020). The algorithm of Dickerson et al. (2021) assumes that the algorithm can access a probability that u U is available at time t, i.e., u is not occupied by a task at time t. This assumption may be problematic because it is not easy to obtain such information precisely. In contrast, we adopt expected proﬁt after t to decide the assignment, which can be computed deterministically and efﬁciently by dynamic programming. Our algorithm is also different from Nanda et al. (2020), as their algorithm just uses x e,t as a probability mentioned above, which works only for the KIID model. Our algorithm is shown to admit a better competitive ratio in their setting, which improves on the competitive ratio 1/e by Nanda et al. (2020) (Theorem 3).

Experiments We evaluate the performance of our algorithm through experiments. We use a synthetic dataset and the real-world dataset of taxi trip records, similarly to previous work (Dickerson et al. 2021; Nanda et al. 2020). Our algorithm performs the best in most cases, and runs practi-

Reusability # Rejections # Assigned agents Online vertices Comp. ratio Hardness This work + 1 KAD 1/2 1/2 This work 1 KAD /(3 1) (Dickerson et al. 2021) NA 1 KAD 1/2 (1/2) This work 1 KIID > 0.432 (Nanda et al. 2020) 1 KIID 1/e 1 1/e

Table 1: Summary of our results and previous work. The algorithm (Dickerson et al. 2021) requires to know in advance the probability that an agent is available at each time.

cally fast enough. The results imply the superiority of our algorithm.

Related Work The online task assignment problem is a generalization of the online bipartite matching (where each vertex can match to at most one neighbor). Nowadays there exist a large body of literature on online matching. We here mention some of them closely related to our problem. See Mehta (2013) for the detailed survey. The online bipartite matching was introduced by Karp, Vazirani, and Vazirani (1990). They considered the adversarial input model, that is, the model that online vertices arrive in an adversarial order, and proposed a randomized (1 1/e)-approximation algorithm for the unweighted case. It is known that the ratio is tight (Birnbaum and Mathieu 2008; Mehta et al. 2007). When online vertices arrive independently according to a distribution (i.e., the KIID model), Manshadi, Gharan, and Saberi (2012) proposed a 0.702competitive algorithm and showed that the ratio cannot be better than 0.823. For the edge-weighted case, a 0.667competitive algorithm is known (Haeupler, Mirrokni, and Zadimoghaddam 2011). Online stochastic matching, introduced by Mehta and Panigrahi (2012), is the problem that an ofﬂine vertex accepts an assignment with a probability. This can be viewed as that inﬁnite number of rejections are allowed in the process. For the problem in the adversarial input model, Mehta, Waggoner, and Zadimoghaddam (2015) presented a 0.534competitive algorithm for the unweighted case when edge probabilities go to 0. For the KIID model, Brubach et al. (2016) gave a (1 1/e)-competitive algorithm, which works also in the edge-weighted case. Online bipartite matching in the KAD model was introduced by Alaei, Hajiaghayi, and Liaghat (2012) under a name of prophet-inequality matching, as the problem includes the prophet inequality problem (Krengel and Sucheston 1977, 1978) as a special case. Alaei, Hajiaghayi, and Liaghat (2012) extended the problem so that an ofﬂine vertex has a capacity. When ofﬂine vertices have capacities, the online bipartite matching is often called the Ad Words problem (Mehta et al. 2007). This variant is also studied extensively (Devanur and Hayes 2009; Lowalekar, Varakantham, and Jaillet 2020). Recently, online task assignment with the reusability condition receives attention. Dickerson et al. (2021) gave a 1/2-competitive algorithm for the KAD model, and variants of the problem have been studied (Dong et al. 2021;

Gong et al. 2021; Goyal, Iyengar, and Udwani 2021; Rusmevichientong, Sumida, and Topaloglu 2020); see also references therein. Our problem can be viewed as a general framework that uniﬁes the problem of Dickerson et al. and online stochastic matching with limited number of rejections. We also mention that our condition of the task capacity is close to the setting of the online assortment optimization (see e.g. (Gong et al. 2021)). In this problem, an online vertex is offered a set of ofﬂine vertices (an assortment), and the online vertex selects one of them. Nanda et al. (2020) discussed the online bipartite matching problem that maximizes fairness, instead of the total proﬁt. Here, the fairness means the smallest acceptance ratio in tasks. Note that our results can easily be extended to their setting to balance a trade-off between proﬁt and fairness.

In this section, we describe a formal deﬁnition of our problem. For a positive integer k, we denote [k] = {1, . . . , k}. We are given a bipartite graph G = (U, V ; E) with edge weight we 0, where U is the set of ofﬂine vertices and V is the set of types of online vertices. We are also given a time horizon T. Each ofﬂine vertex u U has a positive integer u, which is a budget of allowed rejections in the process. In addition, each edge (u, v) E has a random variable Ce [T] that represents the occupation time for u to complete the task v. In other words, when u accepts v, u is absent from the market, and will be available at time t + Ce, where Ce is drawn from a given distribution. We say that an ofﬂine vertex u is available at time t if u is in the market (i.e., not being occupied by some task) and u has not rejected online vertices u times. For each time t [T], an online vertex v arrives according to a probability distribution {pv,t}v.2 Upon arrival of a vertex v, we immediately and irrevocably either assign at most bv neighbors to v that are available, or discard v. When u U is assigned, u either accepts v with probability qe or rejects v with probability 1 qe, where e = (u, v). When u accepts v, we obtain a proﬁt we and u becomes absent from the market during the occupation time Ce. We note that we may assume u > 0 for all u U. When no rejection is allowed for u, i.e., u = 0, only edges e with qe = 1 can be matched to u. Therefore, we can remove all the edges e such that e is incident to u and qe < 1, and set u to 1.

2Nothing arrives at time t with probability 1 P

Our goal is to design an online (randomized) algorithm that maximizes the expected total proﬁt. The performance of an online algorithm is evaluated by the competitive ratio. In the subsequent sections, we deﬁne the ofﬂine optimal value and the competitive ratio. We note that our model includes the online bipartite matching problem studied in (Dickerson et al. 2021; Nanda et al. 2020) as special cases. We assume that bv = 1 for each v V . When u = + for any u U and the acceptance probability qe is one for each e E, we can ignore the constraint on rejections, and hence our model is identical to the one in (Dickerson et al. 2021). On the other hand, when probability distribution {pv,t}v is the same for all t [T] and Pr[Ce = T] = 1 for any e E, our model is the exactly one in the KIID model without reusable resources, which was studied in (Nanda et al. 2020). We also note that our model and the problem of Alaei, Hajiaghayi, and Liaghat (2012) have no inclusion relationships.

Ofﬂine Optimal Algorithms Given a problem instance, a sequence of online vertices is determined according to probability distributions {pv,t}v,t. We denote by I the probability distribution over all input sequences of online vertices. In the ofﬂine setting, we suppose that we know the sequence I of online vertices in advance. An ofﬂine algorithm that maximizes the expected proﬁt is called an ofﬂine optimal algorithm for I. We note that the algorithm does not know whether each ofﬂine vertex accepts or rejects an online vertex at each time. We denote the expected proﬁt of the ofﬂine optimal algorithm for I by OPT(I). Deﬁne the ofﬂine optimal value by EI I[OPT(I)]. Thus we consider the best algorithm for each I in the ofﬂine optimal value. We denote the set of edges incident to u by Eu = {(u, v) | (u, v) E} for each u U, and similarly Ev for v V . We introduce an ofﬂine LP whose optimal value is an upper bound of the ofﬂine optimal value:

e E weqexe,t

e Eu xe,t qe Pr[Ce t t + 1]

e Eu xe,tqe 1 (u U, t [T]) (1)

e Eu xe,t (1 qe Pr[Ce T t]) u

(u U) (2) X

e Ev xe,t pv,tbv (v V, t [T]) (3)

0 xe,t pv,t (v V, e Ev, t [T]) (4)

For each edge e = (u, v) and time t, a variable xe,t corresponds to a probability that e is chosen at time t. Intuitively, constraints (1) (3) corresponds to the constraints (A) (C) mentioned in the introduction, respectively.

Lemma 1. The optimal value of LP (Off) is at least the ofﬂine optimal value EI I[OPT(I)].

Proof. For each input sequence I, an ofﬂine optimal algorithm for I determines a probability that, when an online vertex v arrives at time t, v is assigned to ofﬂine vertices u for each e = (u, v) E and t [T]. By taking expectations over I, we set xe,t to be the probability that an online vertex v arrives at t and an ofﬂine vertex u is assigned to v for each e = (u, v) E and t [T]. We show that x deﬁned above is a feasible solution to the LP. By deﬁnition, xe,t is nonnegative and at most the probability pv,t, and hence (4) is satisﬁed. Let us see that the ﬁrst constraint (1) is valid. Fix an input sequence I, u U, and t [T]. Let vt be an online vertex arriving at time t < t (if exists). For any realization of acceptances/rejections and Ce s, we have any one of the following: u has no assignment, or u is occupied by vt

for some t < t with occupation time at least t t + 1. Since acceptances/rejections and Ce s are sampled independently, the expected number of online vertices occupying u is Pr[u accepts vt | I] + P t <t Pr[u accepts vt at t | I] Pr[C(u,vt ) t t + 1]. Since Pr[u accepts vt | I] = x(u,v)q(u,v) and the expected number is at most 1, (1) holds. To see the second constraint (2), let us ﬁx an input sequence I and u U. For any realization of acceptances/rejections and Ce s, the number of rejections plus assignments with u never coming back (i.e., Ce being at least T t + 1) is at most u. Then the expected number of rejections is P t [T ] Pr[u rejects some v V at t | I] = P e Eu xe,t(1 qe). The number of assignments that u never returns is P t [T ] Pr[u never comes back | I] = P t [T ] P e Eu xe,tqe Pr[Ce T t + 1]. Hence, we have P t [T ] P e Eu(xe,t(1 qe)+xe,tqe Pr[Ce T t+1]) u, and (2) holds. The third constraint (3) is satisﬁed because the expected number of assignments made for an online vertex v is at most pv,tbv. Therefore, the solution x deﬁned from ofﬂine optimal algorithms is feasible to the LP.

Note that the LP (Off) is a non-trivial extension of those in the previous papers (Dickerson et al. 2021; Nanda et al. 2020). The constraints on rejections can be naturally expressed as P t [T ] P e Eu xe,t(1 qe) u, similarly to the ofﬂine LP in (Nanda et al. 2020). However, this is not enough to show our result, and we need a stronger formulation (2) of the constraints. Example 1. Let us consider the following instance. Let U = {u}, V = {v1, v2, v3} and E = {(u, v) | v V }. Let also T = 3, = 1 and ε 1. Assume that, at time t, only a vertex vt can arrive, whose probability is pt. For our notational convenience, we identify an edge (u, vt) with t. Let p1 = p2 = 1, p3 = ε, q1 = q2 = 1/2, q3 = 1, w1 = 4/9, w2 = 6/9, and w3 = 4/(9ε). We set Pr[Ct = 1] = 1/2, Pr[Ct = 2] = 1/2, Pr[Ct = 3] = 0 for all t. We demonstrate the calculation of the expected proﬁt for an adaptive online algorithm. Consider an adaptive algorithm that we always assign an arriving vertex to u if pos-

sible. Then, at time 1, we obtain a proﬁt 4

9 in expectation. The probability that u is available at time 2 is q1 Pr[Ct = 1] = 1 4 because it is equal to the probability that u accepts v1 and C1 takes 1. Thus the probability that v2 is assigned to u at time 2 is Pr[(u is available at time 2) (v2 arrives at time 2)] = 1 4. Hence the expected proﬁt at time 2 is 6

2 = 1 12. By a similar discussion, the expected proﬁt at time 3 is 5

36. Therefore, the expected proﬁt in total is 2

9. The ofﬂine optimal value can be calculated as follows. When we know v3 arrives at time 3 in advance, the best strategy is that we discard v1 and v2 but assign v3 to obtain a proﬁt 4

9ε. On the other hand, suppose that v3 does not arrive and we know it. We describe that the expected proﬁt when we assign v1 at time 1 is at most 11

36. We obtain the expected proﬁt q1w1 = 2

9 from v1. Since v2 is the last vertex to arrive, we should assign v2 if u is available at time 2. The probability that u is available at time 2 is q1 Pr[Ct = 1] = 1

4 because it is equal to the probability that u accepts v1 and C1 takes 1. Thus the probability that v2 is assigned to u at time 2 is Pr[(u is available at time 2) (v2 arrives at time 2)] = 1

4. Hence the expected proﬁt at time 2 is 6

2 = 1 12. Therefore, the expected proﬁt in total is 2

9 + 1 12 = 11 36. On the other hand, the expected proﬁt obtained when we discard v1 is 1

3, and thus we see that it is better to discard v1. Since the probability that v3 arrives is ε, the ofﬂine optimal value is ε 4

9ε + (1 ε) 3

9 . Let us see a corresponding feasible solution x to the ofﬂine LP. We denote xuvt,t = xt for t = 1, 2, 3 for simplicity. Then as in the proof of Lemma 1, xt corresponds to the probability that vt is assigned to u. Since v1 is not chosen in any ofﬂine optimal algorithm, x1 is 0. Moreover, x2 is ε 0 + (1 ε) 1 = 1 ε since we choose v2 if v3 does not arrive. Finally, x3 is ε 1 + (1 ε) 0 = ε. The LP objective value for this solution x is 2

9εx3 = 7 3ε

Competitive Ratio

We evaluate the performance of an online algorithm by a competitive ratio. Let ALG(I) be the expected proﬁt of an online algorithm ALG when the input sequence is I. We say that an online algorithm is α-competitive if EI I[ALG(I)] αEI I[OPT(I)] for any instance.

Proposed Algorithm

In this section, we present our algorithm, and then analyze the competitive ratio in subsequent subsections. The overview of our proposed algorithm is described as follows. We ﬁrst ﬁnd an optimal solution x to LP (Off). Then we use x e,t to determine a probability that v comes at time t and we assign u to v where e = (u, v). That is, we choose a set S of at most bv vertices in U so that Pr[u S, v arrives at time t] = x e,t. Then, for each u S, we compute the expected proﬁt earned at and after t by assigning u to v, and, if it is larger than the one by not assigning, then the algorithm tries to assign u to v. Our algorithm is summarized in Algorithm 1. We now explain how to implement each step in more detail.

Finding a Set of Ofﬂine Vertices We ﬁrst explain how to design a probabilistic distribution to ﬁnd a set of ofﬂine vertices when v arrives at time t. We note that it is easy when bv = 1, as we can just choose a vertex u in U with probability x uv,t/pv,t. However, when bv 2, an independently random choice of ofﬂine vertices may violate the feasibility constraint. To avoid it, we construct a probability distribution over feasible vertex sets and choose a feasible vertex set according to the distribution. Let Sv = {S Ev | |S| bv}. For S Sv, its characteristic vector is a vector χS {0, 1}Ev such that (χS)e = 1 if and only if e S. Consider the convex hull Pv of all characteristic vectors χS (S Sv). Then the convex hull coincides with the polytope {y [0, 1]Ev | P e Ev yuv bv}. In addition, every vertex in Pv is an integral vector, which corresponds to a characteristic vector χS for some S Sv. For an optimal solution x of LP (Off), deﬁne yv,t = (x e,t/pv,t)e Ev. Then we can see that yv,t Pv since x satisﬁes (3). It follows from well-known Carath eodory theorem that yv,t can be decomposed as a convex combination of at most |Ev| + 1 vertices in Pv. Since every vertex in Pv corresponds to a characteristic vector of a feasible set, there exist Sv k S and λv,t k (k = 1, . . . , |Ev| + 1) such that

yv,t = P|Ev|+1 k=1 λv,t k χSv k, (5)

where P k λv,t k = 1 and λv,t k 0 for any k [|Ev| + 1]. We regard λv,t k (k = 1, . . . , |Ev| + 1) as a probability distribution over Sv. That is, we choose a set Sk Sv with probability λv,t k . Then the probability that u is chosen when

v arrives at t is P k:u Sv k λv,t k = yv,t uv = x uv,t pv,t .

We note that λv,t k s can be obtained in polynomial time by the constructive proof of Carath eodory theorem. See the full version for the detailed description.

Remark 1. In the above, we use only the fact that a subset of a feasible set is also feasible. Thus our algorithm would work even for more general constraints. We note, however, that it is required to solve linear programming problem over the convex hull Pv with constraints (1), (2), and (4). This implies that, to run our algorithm in polynomial time, we need to describe Pv efﬁciently. An example is a matroid constraint, that is, each online vertex v has a matroid on the ground set U and v chooses an independent set of the matroid. We note that a cardinality constraint is a special case of a matroid constraint.

Assigning Ofﬂine Vertices We next describe how to decide whether we assign each online vertex u Sv k to v. Let Rd u,t be the expected proﬁt that u earns at and after time t, when u has a remaining budget d. By deﬁnition, R0 u,t = 0. Also, for our notational convenience, we assume that Rd u,t = 0 for all d > T. If we assigned an online vertex v to u with remaining budget d > 1 at time t, then the expected proﬁt that u earns at and after t would be

Qd e,t := qe we + PT t ℓ=1 Pr[Ce = ℓ]Rd u,t+ℓ

+ (1 qe)Rd 1 u,t+1.

Algorithm 1: Proposed online algorithm

1: Solve LP (Off) to obtain an optimal solution x

2: for t = 1, . . . , T do 3: Let v be a vertex that arrives at time t (if none, skip) 4: Choose Sv k with probability λv,t k by (5) 5: for u Sv k do 6: Let d be the remaining budget of u 7: if Qd uv,t Rd u,t+1 and u is available then 8: Assign u to v 9: else 10: Do nothing for u

Otherwise, the expected proﬁt will be Rd u,t+1. In our algorithm, for each vertex u Sv k, we compare Qd uv,t and Rd u,t+1. If the former is larger, then this means that assigning u to v makes larger proﬁt than not assigning, and thus we assign u to v. Otherwise, i.e., if assigning u to v does not make enough proﬁt, then we do not assign u to v. Thus the expected proﬁt for a vertex u Sv k is max Qd uv,t, Rd u,t+1 . Since the probability that v arrives at time t and u is chosen by v is x e,t for each e = (u, v) Ev and t [T], Rd u,t can be represented recursively by

Rd u,t = P e Eu x e,t max Qd e,t, Rd u,t+1

+ 1 P e Eu x e,t Rd u,t+1. (6)

Our algorithm needs to compute Rd u,t for each d, u U and t [T]. This can be done efﬁciently in advance by dynamic programming with the above recursive equation (6).

Modifying Instance We ﬁrst observe that the total expected proﬁt of the algorithm is equal to P u U R u u,1. Since LP (Off) gives an upper bound on EI I[OPT(I)] by Lemma 1, we aim to determine the ratio between P u U R u u,1 and the LP optimal value P u U P e Eu P t weqex e,t. To this end, we ﬁx a vertex u in U, and we evaluate the ratio αu between R u u,1 and P e Eu P t weqex e,t. Then the competitive ratio of Algorithm 1 is at least minu αu. To obtain a lower bound of R u u,1, we modify a given instance to a simpler one with LP optimal solution x . We will show that the expected proﬁt for the modiﬁed instance gives a lower bound on that for the original instance. The new instance is deﬁned as follows. We deﬁne G = (U , V ; E ) where U = {u}, V = {vt | t [T]}, and E = {(u, vt) | t [T]}. In this instance, we have only one ofﬂine vertex u, and, at each t [T], only one vertex vt may arrive with edge (u, vt). We set the parameters3 for vt in the new instance as follows: the probability of arrival: p t = P e Eu x e,t; the probability of acceptance: q t = P e Eu x e,tqe/p t if p t > 0, and q t = 0 otherwise;

3For ease of notation, we use simpler subscripts, e.g., p t instead of puvt,t, as we have only one online vertex at time t.

the proﬁt: w t = P e Eu x e,tqewe/(p tq t) if p tq t > 0 and 0 otherwise; the distribution of occupation time (denoted by Ct):

Pr[Ct = ℓ] =

P e Eu x e,tqe Pr[Ce = ℓ]

p tq t (ℓ [T])

if p t, q t > 0, and otherwise, Pr[Ct = T] = 1; the capacity: b t = 1. The budget u of u is set the same value as in the original instance. Since x is a feasible solution to LP (Off), it holds that p t 1 and q t 1 by (4), and moreover,

ℓ=1 Pr[Ct = ℓ] =

P e Eu x e,tqe P ℓPr[Ce = ℓ] p tq t = 1.

Note also that P t [T ] w tq tp t = P e Eu P t weqex e,t. Let us execute Algorithm 1 for the modiﬁed instance, where we use p t instead of LP optimal solution x . Let Rd t be the expected proﬁt that u earns on and after t when u has a remaining budget d at time t. Similarly to (6) for Rd u,t, it holds that

Rd t = p t max n Qd t , Rd t+1 o + (1 p t) Rd t+1, (7)

Qd t = q t w t + PT t ℓ=1 Pr[Ct = ℓ] Rd t+ℓ

+ (1 q t) Rd 1 t+1 . (8)

We show that the modiﬁcation does not increase the proﬁt.

Lemma 2. Rd u,t Rd t for all d [ u] {0} and t [T].

By this lemma, it sufﬁces to lower-bound R u 1 to obtain a lower bound on R u u,1 for the original instance.

Analysis for the Unlimited Rejection Case In this section, we assume that the number of allowed rejections is unlimited, that is, u = + for u U. This means that we can ignore the constraint (2) in LP (Off). The main result of this section is to prove that Algorithm 1 is 1/2-competitive for this case. For that purpose, we ﬁrst modify the instance as in the previous section, and we will bound R u 1 from below. We observe that, when executing the algorithm for the modiﬁed instance, the remaining budget d of the vertex u is inﬁnite in the whole process. For simplicity, we denote Rt = R t and Qt = Q t . Then, by (8), it holds that

ℓ=1 Pr[Ct = ℓ]Rt+ℓ

+ (1 q t)Rt+1.

Moreover, it follows from (7) that

Rt = max {p t Qt + (1 p t)Rt+1, Rt+1} . (9)

We note that RT = p T q T w T . We can make the ﬁrst term in (9) simpler as Btw t + At Rt+1 + Bt PT t ℓ=2 Pr[Ct = ℓ]Rt+ℓ, where Bt = p tq t and At = p tq t Pr[Ct = 1] +

p t(1 q t) + (1 p t) for each t [T]. We note that At = p tq t Pr[Ct = 1] + 1 p tq t = 1 Bt Pr[Ct 2]. A lower bound on R1 is obtained by minimizing R1 subject to the condition (9). The minimization problem can be formulated as a linear programming problem as below. Note that we do not need to solve the LP, but use it for analysis.4

min R1 s.t. Rt Btw t + At Rt+1 + Bt PT t ℓ=2 Pr[Ct = ℓ]Rt+ℓ(t [T 1]) Rt Rt+1 (t [T 1]) RT BT w T Rt 0 (t [T]), (10)

Then the optimal value of the above LP gives a lower bound of R1. The dual of LP (10) is given by

max PT t=1 Btw tαt s.t. α1 + β1 1 α2 + β2 A1α1 + β1 αt + βt Pt 2 ℓ=1 αℓBℓPr[Cℓ= t ℓ] + At 1αt 1 + βt 1 (3 t T 1) αT PT 2 ℓ=1 αℓBℓPr[Cℓ= T ℓ] + AT 1αT 1 + βT 1 αt 0 (t [T]) βt 0 (t [T 1]). (11)

To show a lower bound on LP (10), we construct a feasible solution to the dual LP (11). Let γ 1/2. We set αt = γ for t [T], β1 = 1 γ, β2 = β1 γ + A1γ, and βt = βt 1 γ + At 1γ + Pt 2 ℓ=1 BℓPr[Cℓ= t ℓ]γ (3 t T 1).

Lemma 3. For 0 γ 1/2, αt and βt deﬁned as above are feasible to (11).

Therefore, αt and βt deﬁned as above are feasible to (11). The objective value for this solution is P t Btw tγ = γ P t p tq tw t. It follows from the LP duality theorem that the optimal value of LP (10) is at least γ P t p tq tw t. This is maximized when γ = 1/2, and in this case, R1 is lower-bounded by 1 2 P t p tq tw t. Since P t p tq tw t = P t P e Eu weqex e,t, we obtain R1 1 2 P t P e Eu weqex e,t. Summarizing, we have P

u U R u 1 1 2E[OPT(I)] by Lemma 1, which implies the following theorem.

Theorem 1. Algorithm 1 is 1/2-competitive for the problem with unlimited rejections.

Analysis for the Limited Rejection Case In this subsection, we prove that Algorithm 1 is /(3 1)- competitive for the general case with = maxu U u. As in the analysis for the unlimited rejection case (the previous subsection), we present a lower bound on Rd t . We ﬁrst show the following lemma.

Lemma 4. Rd 1 t d 1

d Rd t for each d [ u] and t [T].

4Such LP is called a factor-revealing LP.

By (7) with the above lemma, we have the following relationships:

Rd t max n pt ˆQd t + (1 pt) Rd t+1, Rd t+1 o , (12)

where ˆQd t = qt(wt + P ℓ 1 Pr[Ct = ℓ] Rd t+ℓ) + d 1

d (1 qt) Rd t+1. Recall that it sufﬁces to give a lower bound on R u 1 to obtain the competitive ratio of Algorithm 1. By (12), we can construct a linear program to bound R u 1 in a similar way to the previous subsection. See the full version for the details. Theorem 2. Algorithm 1 is a /(3 1)-competitive algorithm, where = maxu U u. We also show that our algorithm can be 1 2 1 1

e2 - competitive in the KIID model with non-reusable agents. Theorem 3. There is a ( 1 2 1/ 1 1 e2 1/ )-competitive algorithm for the problem in the KIID model with nonreusable resources, where = maxu U u. Note that 1 2 1/ 1 1 e2 1/ 1

e. Thus, our algorithm has a better competitive ratio than the 1 ecompetitive algorithm by Nanda et al. (2020).

Experiments In this section, we present experimental results to evaluate the performance of Algorithm 1. We describe the experimental environment in the full version.

Instance Setting We focus on the following four settings (a) (d); see also the full version for other settings. The setting (a) is the KIID model with non-reusable agents (i.e., Pr[Ce = T] = 1 for all e E) and u < T ( u U). This coincides with that of Nanda et al. (2020). The settings (b) (d) are the KAD model with reusable agents. In the setting (b), which is that of Dickerson et al. (2021), ofﬂine vertices always accept assignments (i.e., qe = 1 for all e E), while they may reject in (c) and (d). We set u < T ( u U) in (c) and u = + in (d). For each setting, we generate one instance and 1000 input sequences, and focus on the average proﬁts and runtime for evaluation.

Baseline Algorithms We compare our algorithm with the following baseline algorithms: (1) Random chooses ofﬂine vertices uniformly at random. (2) Greedy assigns at most bv available ofﬂine vertices u to v in the order of w(u,v)q(u,v). (3) NAdap* is a simple extension of NAdap proposed by Nanda et al. (2020). NAdap* solves LP (Off) and to assign a set of ofﬂine vertices with probability λv,t k . When bv = 1 for all v, NAdap* coincides with NAdap in the setting (a), and is also used in experiments in (Dickerson et al. 2021). In our experiments, we do not evaluate the algorithm by Dickerson et al. (2021) due to its impractical assumption.

Synthetic Dataset We generate synthetic datasets similarly to (Nanda et al. 2020) as follows. We set |U| = 30, |V | = 100, and T = 200. For each u U and v V , an edge (u, v) exists in E with probability 0.1. For each e E, we set

2 4 6 8 10 bv

Proﬁt / opt. val. of LP

Setting (a) of synthetic dataset

Random Greedy

NAdap* Proposed

(a) KIID without reusability

2 4 6 8 10 bv

Proﬁt / opt. val. of LP

Setting (b) of synthetic dataset

(b) KAD without rejection

2 4 6 8 10 bv

Proﬁt / opt. val. of LP

Setting (c) of synthetic dataset

(c) KAD with u 3

2 4 6 8 10 bv

Proﬁt / opt. val. of LP

Setting (d) of synthetic dataset

(d) KAD with u = +

Figure 1: The rates of average proﬁts for synthetic datasets.

qe U(0.5, 1) (uniform distribution) and we U(0, 1), respectively. For each u U and e Eu, the distribution of Ce for reusability is deﬁned as a binomial distribution B(20, ηu), where ηu U(0, 1). For the settings (a) and (c), u is drawn uniformly at random from {1, 2, 3} for each u U. We set the same bv for all v from {2, 4, 6, 8, 10}. Figure 1 plots the ratios between average proﬁts and the LP optimal values. We see that our algorithm (the black solid line) performs the best in almost all cases. Moreover, it obtains more than half of the LP optimal value, while Random and Greedy sometimes fail. NAdap* performs almost the same as us in (a), but is worse in (b) (d). All the algorithms run within less than 1 second for processing online vertices. Our algorithm needs additionally around 6 seconds on average because of the preprocess of solving LP (Off) and computing the table of Rd u,t s. For the detailed results, see the full version.

Real-world Dataset

We evaluate the performance for instances generated from the New York City yellow cabs dataset5, which is used also in existing work such as (Dickerson et al. 2021; Nanda et al. 2020). Using the data in January 2013 we set up a bipartite graph with |U| = 30 and |V | = 100 and parameters similarly to (Nanda et al. 2020). The detailed set-up is described in the full version. We set T = 100k (k [4]) in setting (a) and T = 288k (k [4]) in others. Figure 2 shows the ratios between average proﬁts and the LP optimal values. Note that our algorithm earns more than half of the LP optimal value in any cases. In settings (a) and (c), Random and Greedy cannot obtain even half of the LP optimal value for a large T. NAdap* performs as well as ours in settings except (c), in which it performs worse than our algorithm. Our algorithm may be too careful in (b) and

5http://www.andresmh.com/nyctaxitrips/

100 200 300 400 T

0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Proﬁt / opt. val. of LP

Setting (a) of real-world dataset

Random Greedy

NAdap* Proposed

(a) KIID without reusability

288 576 864 1152 T

0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Proﬁt / opt. val. of LP

Setting (b) of real-world dataset

(b) KAD without rejection

288 576 864 1152 T

0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Proﬁt / opt. val. of LP

Setting (c) of real-world dataset

(c) KAD with u 3

288 576 864 1152 T

0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Proﬁt / opt. val. of LP

Setting (d) of real-world dataset

(d) KAD with u = +

Figure 2: The rates of average proﬁts for real-world datasets.

(d), in which there is no need to care about the rejection constraint, and has less performance. On the other hand, since ofﬂine vertices easily become unavailable in (a) and (c), our algorithm, which considers a long-term effect of the current assignment, performs the best. All the algorithms run in less than 1 second for processing even 1152 online vertices. The preprocess in our algorithm completes in 400 seconds. Since the preprocess is done before arrival of online vertices, our algorithm makes decision as fast as others. For the detailed results, see the full version.

Conclusion In this paper, we studied the online task assignment problem with reusable resources in the KAD model, which generalizes the online bipartite matching in the KAD model. Our problem incorporates practical conditions arising in applications such as ridesharing and crowdsourcing. We proposed an online algorithm that is 1/2-competitive for the unlimited rejection case, which is tight, and /(3 1)-competitive for the general case. Practical usefulness of our algorithm is conﬁrmed by numerical experiments. For future work, as solving LP (Off) is time-consuming, it would be interesting to obtain a constant competitive ratio without solving the LP. One direction is to use an approximate solution to the LP in our algorithm. Another future work is to consider a model that, when a task is rejected, it is allowed to re-assign some other agents.

Acknowledgments We would like to thank the AAAI anonymous reviewers for valuable comments to improve the paper. HS was supported by JSPS KAKENHI Grant Numbers JP17K12646, JP21K17708, and JP21H03397, Japan. SI was supported by JST, ACT-I, Grant Number JPMJPR18U5, Japan. TF was supported by JSPS KAKENHI Grant Numbers JP20H05965, JP21K11759, and JP21H03397, Japan.

NK was supported by JSPS KAKENHI Grant Numbers JP20H05795, JP18H05291, and JP21H03397, Japan. KK was supported by JSPS KAKENHI Grant Number JP18H05291, Japan.

Alaei, S.; Hajiaghayi, M.; and Liaghat, V. 2012. Online Prophet-Inequality Matching with Applications to Ad Allocation. In Proceedings of the 13th ACM Conference on Electronic Commerce (EC), 18 35. Anagnostopoulos, A.; Castillo, C.; Fazzone, A.; Leonardi, S.; and Terzi, E. 2018. Algorithms for Hiring and Outsourcing in the Online Labor Market. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 1109 1118. Assadi, S.; Hsu, J.; and Jabbari, S. 2015. Online Assignment of Heterogeneous Tasks in Crowdsourcing Markets. In Proceedings of the 3rd AAAI Conference on Human Computation and Crowdsourcing (HCOMP), 12 21. Birnbaum, B.; and Mathieu, C. 2008. On-Line Bipartite Matching Made Simple. SIGACT News, 39(1): 80 87. Brubach, B.; Sankararaman, K. A.; Srinivasan, A.; and Xu, P. 2016. New Algorithms, Better Bounds, and a Novel Model for Online Stochastic Matching. In Proceedings of the 24th Annual European Symposium on Algorithms (ESA), volume 57, 24:1 24:16. Devanur, N. R.; and Hayes, T. P. 2009. The Adwords Problem: Online Keyword Matching with Budgeted Bidders under Random Permutations. In Proceedings of the 10th ACM Conference on Electronic Commerce (EC), 71 78. Dickerson, J. P.; Sankararaman, K. A.; Sarpatwar, K. K.; Srinivasan, A.; Wu, K.-L.; and Xu, P. 2019. Online Resource Allocation with Matching Constraints. In Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 1681 1689. Dickerson, J. P.; Sankararaman, K. A.; Srinivasan, A.; and Xu, P. 2021. Allocation Problems in Ride-Sharing Platforms: Online Matching with Ofﬂine Reusable Resources. ACM Transactions on Economics and Computation, 9(3). Dong, Z.; Das, S.; Fowler, P.; and Ho, C.-J. 2021. Efﬁcient Nonmyopic Online Allocation of Scarce Reusable Resources. In Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 447 455. Gong, X.-Y.; Goyal, V.; Iyengar, G. N.; Simchi-Levi, D.; Udwani, R.; and Wang, S. 2021. Online Assortment Optimization with Reusable Resources. Management Science. Forthcoming. Goyal, V.; Iyengar, G.; and Udwani, R. 2021. Asymptotically Optimal Competitive Ratio for Online Allocation of Reusable Resources. In Proceedings of the 17th Conference on Web and Internet Economics (WINE), 543. Available at ar Xiv:2002.02430. Haeupler, B.; Mirrokni, V. S.; and Zadimoghaddam, M. 2011. Online Stochastic Weighted Matching: Improved

Approximation Algorithms. In Proceedings of the 7th International Workshop on Internet and Network Economics (WINE), 170 181. Ho, C.-J.; and Vaughan, J. W. 2012. Online Task Assignment in Crowdsourcing Markets. In Proceedings of the 26th AAAI Conference on Artiﬁcial Intelligence (AAAI), 45 51. Karp, R. M.; Vazirani, U. V.; and Vazirani, V. V. 1990. An Optimal Algorithm for On-Line Bipartite Matching. In Proceedings of the 22nd Annual ACM Symposium on Theory of Computing (STOC), 352 358. Krengel, U.; and Sucheston, L. 1977. Semiamarts and ﬁnite values. Bulletin of the American Mathematical Society, 83(4): 745 747. Krengel, U.; and Sucheston, L. 1978. On semiamarts, amarts, and processes with ﬁnite value. Probability on Banach spaces, 4: 197 266. Lowalekar, M.; Varakantham, P.; and Jaillet, P. 2020. Competitive Ratios for Online Multi-Capacity Ridesharing. In Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 771 779. Manshadi, V. H.; Gharan, S. O.; and Saberi, A. 2012. Online Stochastic Matching: Online Actions Based on Ofﬂine Statistics. Mathematics of Operations Research, 37(4): 559 573. Mehta, A. 2013. Online Matching and Ad Allocation. Foundations and Trends in Theoretical Computer Science, 8(4): 265 368. Mehta, A.; and Panigrahi, D. 2012. Online matching with stochastic rewards. In Proceedings of the 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science (FOCS), 728 737. Mehta, A.; Saberi, A.; Vazirani, U.; and Vazirani, V. 2007. Ad Words and Generalized Online Matching. Journal of the ACM, 54(5): 22:1 22:19. Mehta, A.; Waggoner, B.; and Zadimoghaddam, M. 2015. Online Stochastic Matching with Unequal Probabilities. In Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 1388 1404. Nanda, V.; Xu, P.; Sankararaman, K. A.; Dickerson, J.; and Srinivasan, A. 2020. Balancing the Tradeoff between Proﬁt and Fairness in Rideshare Platforms during High-Demand Hours. In Proceedings of the 34th AAAI Conference on Artiﬁcial Intelligence (AAAI), 2210 2217. Rusmevichientong, P.; Sumida, M.; and Topaloglu, H. 2020. Dynamic Assortment Optimization for Reusable Products with Random Usage Durations. Management Science, 66(7): 2820 2844. Sumita, H.; Ito, S.; Takemura, K.; Hatano, D.; Fukunaga, T.; Kakimura, N.; and ichi Kawarabayashi, K. 2022. Online Task Assignment Problems with Reusable Resources. ar Xiv:2203.07605. Xu, P.; Srinivasan, A.; Sarpatwar, K. K.; and Wu, K.-L. 2017. Budgeted Online Assignment in Crowdsourcing Markets: Theory and Practice. In Proceedings of the 16th Conference on Autonomous Agents and Multiagent Systems (AAMAS), 1763 1765.