# safe_multiagent_pathfinding_with_time_uncertainty__3318000f.pdf Journal of Artificial Intelligence Research 70 (2021) 923 954 Submitted 09/2020; published 03/2021 Safe Multi-Agent Pathfinding with Time Uncertainty Tomer Shahar stomer@post.bgu.ac.il Shashank Shekhar shekhar@post.bgu.ac.il Dor Atzmon dorat@post.bgu.ac.il Ben-Gurion University of the Negev, Israel Abdallah Saffidine abdallah.saffidine@gmail.com The University of New South Wales, Sydney, Australia Brendan Juba bjuba@wustl.edu Washington University in St. Louis, USA Roni Stern Ben-Gurion University of the Negev, Israel sternron@post.bgu.ac.il Palo Alto Research Center, USA rstern@parc.com In many real-world scenarios, the time it takes for a mobile agent, e.g., a robot, to move from one location to another may vary due to exogenous events and be difficult to predict accurately. Planning in such scenarios is challenging, especially in the context of Multi-Agent Pathfinding (MAPF), where the goal is to find paths to multiple agents and temporal coordination is necessary to avoid collisions. In this work, we consider a MAPF problem with this form of time uncertainty, where we are only given upper and lower bounds on the time it takes each agent to move. The objective is to find a safe solution, which is a solution that can be executed by all agents and is guaranteed to avoid collisions. We propose two complete and optimal algorithms for finding safe solutions based on well-known MAPF algorithms, namely, A with Operator Decomposition (A + OD) and Conflict Based Search (CBS). Experimentally, we observe that on several standard MAPF grids the CBS-based algorithm performs better. We also explore the option of online replanning in this context, i.e., modifying the agents plans during execution, to reduce the overall execution cost. We consider two online settings: (a) when an agent can sense the current time and its current location, and (b) when the agents can also communicate seamlessly during execution. For each setting, we propose a replanning algorithm and analyze its behavior theoretically and empirically. Our experimental evaluation confirms that indeed online replanning in both settings can significantly reduce solution cost. 1. Introduction Multi-Agent Pathfinding (MAPF) is the problem of finding how to move a team of agents over edges in a graph from an initial configuration to a goal configuration. The key constraint is that agents must not occupy the same vertex or traverse the same edge at the same time during plan execution. A solution to a MAPF problem is a set of single-agent plans for all the agents that do not violate this constraint. MAPF problems arise in real-world applications, such as aircraft towing vehicles (Morris et al., 2016), video game characters (Silver, 2005), office robots (Veloso et al., 2015), and warehouse robots (Wurman et al., 2007). Modern MAPF solvers can optimally solve problems with over a hundred agents (Sharon et al., 2015; Boyarski et al., 2015b; Felner et al., 2018). c 2021 AI Access Foundation. All rights reserved. Shahar, Shekhar, Atzmon, Saffidine, Juba, & Stern However, most of the earlier work assumed the time it takes an agent to traverse an edge is fixed and known a priori. In reality, exogenous events may cause the edge traversal time to be non-deterministic. One approach to address this is to gather statistics over these exogenous events and reason about them (Wagner & Choset, 2017; Atzmon et al., 2020a; Ma et al., 2017). However, gathering sufficiently accurate statistics of exogenous events is often difficult. For example, in many automated warehouses, humans are responsible for packing items into boxes and the robots move the packages. The time it takes a human to pack a box may depend on many factors, such as the number of items in the box, their weight, and the human s fatigue. Creating an accurate statistical model for the packing time is a challenging problem on its own. In this work, we consider weaker knowledge about such exogenous events, in the form of upper and lower bounds.1 That is, we consider the case where each edge is associated with a time range that bounds the duration it takes an agent to traverse it. This form of uncertainty poses a challenge to current MAPF solvers, especially if one wants to find a safe solution. A solution is safe if it is guaranteed that when executing it there will not be any collision.2 We call the problem of finding a safe solution in this setting, the Multi-Agent Pathfinding with Time Uncertainty (MAPF-TU) problem. In the first part of this work, we adapt two popular MAPF solvers, namely A + OD (Standley, 2010) and Conflict-Based Search (CBS) (Sharon et al., 2015), to solve the MAPFTU problem. We call the adapted algorithms A + ODTU and CBSTU. Both A + ODTU and CBSTU are sound and complete, in the sense that any solution they return is safe, and if a safe solution exists they will return it. In addition, they can guarantee various forms of optimality. We implemented A + ODTU and CBSTU, and compared them on a set of standard MAPF benchmarks adapted to include time uncertainty. Our results show that CBSTU performs significantly better in almost all cases. A + ODTU and CBSTU are both offline algorithms, that is, they output a solution before the agents start to move. In the second part of this paper, we explore the option of online replanning to improve the solution returned by an offline MAPF-TU solver, i.e., the option to modify the given plan while executing it. Specifically, we consider two possible capabilities the agents may have that can be used for online replanning: sensing and communication. Sensing in our context is the ability of an agent to observe, during execution, its current time and location. Communication in our context is the ability to send the sensed information to all the other agents. We demonstrate that online replanning can reduce the overall execution cost even if the agents only sense and cannot communicate. We also demonstrate that the ability to communicate can lead to further reduction of execution costs. Then, we propose two fast online replanning techniques that preserve safety and can reduce the overall execution cost. We implemented both techniques and evaluated the benefit of using them on several benchmark domains. The results show that these online replanning techniques, especially when the agents can communicate, can reduce execution cost by a factor of 1.5 less than the cost of following the original offline solution. Moreover, the time overhead of using them is negligible compared to the time required to compute the original offline solution. 1. We show a way to obtain such bounds from observations in Section 9. 2. Such a solution is known as a strong solution in the planning literature (Cimatti et al., 2018). Safe Multi-Agent Pathfinding with Time Uncertainty The paper is structured as follows. In Section 2, we provide details on relevant background and related work. In Section 3, we define formally the MAPF-TU problem. Sections 4 and 5 introduce the A + ODTU and CBSTU algorithms for solving MAPF-TU, respectively, and in Section 6 we evaluate them empirically. Then, in Section 7, we analyze the benefit of online replanning in MAPF-TU and propose two fast algorithms for this purpose. In Section 8, we evaluate these online replanning algorithms experimentally. Finally, in Section 9, we discuss and outline several key extensions, such as creating an individual execution policy for each agent and bounding edge traversal time from experience. 2. Background and Related Work This work is related to two areas in Artificial Intelligence: multi-agent pathfinding (MAPF) and planning under temporal uncertainty. We briefly discuss the relevant literature below. 2.1 Multi-Agent Pathfinding (MAPF) An instance of a classical MAPF problem (Stern et al., 2019) is defined by a tuple G, s, t where G = (V, E) is a graph, s : [1, . . . , k] V maps an agent to its initial location, and t : [1, . . . , k] V maps an agent to its goal location. Time is assumed to be discretized. At the beginning of each time step, each agent is located in one of the graph vertices. Then, the agent can perform a single action: either wait in its current location, or move to one of the vertices that is adjacent to its current location. An action is a pair (u, v) where for a wait action u = v and for a move action (u, v) is an edge in the graph. A single-agent plan for an agent ai is a sequence of actions that move ai to its goal, i.e., a sequence of actions π = ((v0, v1), (v1, v2), . . . , (vn 1, vn)) where v0 = s(i) and vn = t(i). A solution to a MAPF instance is a vector of k single-agent plans. A solution is said to have a conflict ai, aj, x, t iffthe two agents ai and aj are planned to occupy the same vertex or edge x at the same time step t. We assume that edges are undirected, so a conflict also arises if agents swap locations over the same edge. We say that a solution to a MAPF instance is valid if it does not have any of these conflicts. That is, we consider only vertex, edge, and swapping conflicts, and do not consider other types of conflicts (Stern et al., 2019). Traditionally, the cost of a single-agent plan π in MAPF is the number of its constituent actions. In this work, we define the cost of a solution to be the sum of costs of its constituent single-agent plans. This solution cost function is known as the sum-of-costs (SOC). A valid solution is optimal when there is no other valid solution with a lower cost. A range of algorithms have been proposed in the past decade for finding optimal solutions to such MAPF instances (Felner et al., 2017), such as A + OD (Standley, 2010), ICTS (Sharon et al., 2013), M* (Wagner & Choset, 2015), CBS (Sharon et al., 2015), BCP (Lam et al., 2019b), as well as several algorithms that find optimal solutions by compiling the problem to Boolean satisfiability (Surynek, 2012; Surynek et al., 2016), Constraint programming (Bart ak et al., 2017), or Answer set programming (Erdem et al., 2013). The problem addressed by these optimal solvers has been termed classical MAPF (Stern et al., 2019). In classical MAPF, there is no uncertainty in the edge traversal time, and the edge traversal and the wait action have unit duration. These latter assumptions have been relaxed in recent work. Cohen et al. (2019) and Walker et al. (2018, 2020) suggested optimal and suboptimal algorithms for solving MAPF problems with non-unit edge traversal time Shahar, Shekhar, Atzmon, Saffidine, Juba, & Stern and non-unit but fixed wait action duration. Andreychuk et al. (2019) proposed an optimal algorithm to further allow wait actions with arbitrary durations. However, none of the above algorithms considers time uncertainty over edge durations. Various forms of MAPF with time uncertainty have also been studied in the past. Several algorithms were proposed for the MAPF Problem with Delay Probabilities (MAPFDP) problem, which is a variant of MAPF in which each agent is delayed with some given probability, which is known to the problem solver. The UM* (Wagner & Choset, 2017) algorithm returns a solution for a MAPF-DP problem in which the probability for every potential collisions to occur is below a given threshold. The p R-CBS (Atzmon et al., 2020a) algorithms returns a solution for a MAPF-DP problem in which the probability for successful execution (i.e., having no collisions) is above a given threshold. Neither algorithms guarantee that the solutions they return are safe, in the sense that a collision might still occur while executing these solutions. Approximate Minimization in Expectation (AME) is another algorithm for MAPF-DP, which aims to minimize the expected cost of a solution. However, it only guarantees no collisions occur in the next time step. Thus, it must have online communication to avoid conflicts, and hence is not safe. k R-CBS (Atzmon et al., 2020b) is an algorithm that creates solutions that can be executed without collisions, but only if each agent is not delayed more than k times. MAPFPOST (H onig et al., 2016) is an algorithm that gets a valid MAPF solution and reschedules the times in which agents visit their locations to avoid collisions. Thus, MAPF-POST is safe and can be applied to the MAPF-TU setting (which is defined later). However, it does not provide any solution quality guarantees and a lower cost solution or execution may exist, while the algorithms we propose in this work also provide optimality guarantees. 2.2 Planning under Temporal Uncertainty The type of time uncertainty addressed in our work has been considered in the context of the Simple Temporal Problem under Uncertainty (STPU) (Vidal & Fargier, 1999). Simple Temporal Problem (STP) (Dechter et al., 1991) is the problem of finding a schedule for a set of activities, where each activity has a set of temporal constraints over when it can be scheduled. In STP with Uncertainty (STPU), the duration of an activity may lie within some bounds but cannot be decided upfront, just like in MAPF-TU. However, MAPF-TU is a planning problem while STPU is a scheduling problem: a solution to MAPF-TU is a set of single-agent plans, one per agent, while a solution for STPU is a schedule for a given set of activities. The MAPF-TU problem is a non-deterministic planning problem, where the non-determinism manifests in the edges traversal durations. Cimatti et al. (2003) identified three types of solutions to non-deterministic planning problems: weak solutions, strong solutions, and strong cyclic solutions. A weak solution is a plan that may achieve the goal, a strong solution is a plan that must achieve the goal under any outcome of the non-determinism, and a strong cyclic solution is a plan that must achieve the goal but may do so via trial-and-error, and that trial-and-error may be infinitely long. In this work we aim to find a safe solution, which is a solution that leads all the agents to their respective goals without collisions as Safe Multi-Agent Pathfinding with Time Uncertainty long as the edge traversal durations are in their specified time ranges. Thus, a safe solution is a strong solution. Temporal planning (Coles et al., 2008; Coles & Coles, 2014; Cashmore et al., 2019) is a popular and well-researched sub-area of AI planning in which actions have durations. Cimatti et al. (2018) recently defined the Strong Temporal Planning with Uncertain action Durations (STPUD) problem, which is a temporal planning problem in which action durations are uncontrollable. In particular, they propose a planning approach that can handle cases where actions have uncertain durations that are within known bounds. The MAPF-TU problem can be seen as a special case of the STPUD problem, and it may be possible to translate the MAPF-TU problem to the single-agent STPUD problem and use their planner to provide strong solutions to MAPF-TU. However, the algorithms we propose also guarantee optimality, while their strong temporal planning algorithm does not. Moreover that it is evident from the planning literature that factored-approaches are often very efficient for multi-agent planning problems (Brafman & Domshlak, 2008; Pajarinen & Peltonen, 2011; Shekhar et al., 2019). This has especially been observed in MAPF, where all state of the art algorithms directly exploit the decoupled nature of the problem (Sharon et al., 2015, 2013; Lam et al., 2019a; Gange et al., 2019). 3. Safe MAPF with Time Uncertainty An instance of the Multi-Agent Pathfinding with Time Uncertainty problem (MAPF-TU) for k agents is defined by a tuple G, s, t, w , w+ , where G = (V, E), s, and t, are the same as in classical MAPF; and w : E N and w+ : E N are functions that return the minimal and maximal duration it takes an agent to traverse a given edge. That is, the time it takes an agent to traverse an edge e is a number in the range [w (e), w+(e)]. We refer to w (e) and w+(e) as the lower and upper bounds of the edge duration, respectively. Note that if an agent traversed the same edge twice, the durations of these traversals may be different. The only requirement is that these durations are in the specified time range [w (e), w+(e)]. Time is assumed to be discretized, and a solution is a vector of single-agent plans as defined for classical MAPF. A solution is said to have a potential conflict if there are two agents that may occupy the same vertex or edge at the same time. In this work, we aim to find a solution for a given MAPF-TU instance has no potential conflicts. We call such a solution a safe solution. 3.1 Computing the Potential Presence To identify a safe solution, we need to formally define what is a potential conflict. To this end, we introduce the notion of potential presence. The potential presence induced by a single-agent plan π at a vertex v, denoted τ(π, v), is defined as the set of time intervals in which an agent may be at v when following π. The potential presence is similarly defined for an edge. Example 1. (Figure 1) Consider a single-agent plan π in which the agent moves from its initial vertex s to its goal vertex g through vertices v1 and v2 without waiting. The time interval depicted above each edge is the range of possible edge durations. The potential Shahar, Shekhar, Atzmon, Saffidine, Juba, & Stern τ(π, s) = [0, 0] τ(π, v1) = [1, 3] τ(π, v2) = [2, 4] τ(π, g) = [3, 6] [1, 3] [1, 1] [1, 2] Figure 1: An example of the potential presence of the agent when moving from s to g. presences induced by π at vertices s, v1, v2, and g are {[0, 0]}, {[1, 3]}, {[2, 4]}, and {[3, 6]}, respectively. Computing the potential presence can be done as follows. Let π = (v0, v1) . . . (vn 1, vn) be a single-agent plan for some agent. For every vi = v0, the potential presence induced by π is given by τ(π, v) = [ 0 j c, if there exists a node N in OPEN such that N .cost = c, it can further be developed without expanding N at this point in time. This is safe to do since the cardinal conflict will reappear in all of the nodes in the subtree below N until it is chosen and resolved. However, if the algorithm decides to apply split to N, then it generates all the children for every cardinal conflict in N. In a preliminary set of experiments, we evaluated the impact of these two CBS improvements on the performance of CBSTU. The results showed that CBSTU with PC performs much better than basic CBSTU. CBSTU with BP also performed better than the basic CBSTU, but adding it on top of PC did not yield significant gains. Therefore, unless stated otherwise hereinafter, we assume that CBSTU is implemented only with PC. 6. Empirical Evaluation In this section, we evaluate empirically the performance of A + ODTU and CBSTU by performing a set of experiments. 6.1 Experimental Setup All our experiments were performed over the following 4-neighborhood grids: Open. An 8 8 grid with no obstacles. DAO. A grid from the ost003d map of the game Dragon Age Origins (DAO), made publicly available by Sturtevant (Sturtevant, 2012). Warehouse. A grid structured like an automated warehouse. Figure 4 depicts these grids, where black cells and green cells are obstacles. Safe Multi-Agent Pathfinding with Time Uncertainty k U = 0 U = 1 U = 2 U = 4 CBSTU A + ODTU CBSTU A + ODTU CBSTU A + ODTU CBSTU A + ODTU 3 100% 100% 100% 100% 100% 100% 100% 100% 7 100% 100% 100% 88% 94% 72% 58% 46% 10 100% 87% 80% 43% 54% 13% 12% 0% 15 100% 13% 18% 3% 0% 0% 0% 0% Table 1: Success rates results for CBSTU and A + ODTU algorithms on the Open grid. k U = 0 U = 1 U = 2 U = 4 CBSTU A + ODTU CBSTU A + ODTU CBSTU A + ODTU CBSTU A + ODTU 4 100% 100% 52% 17% 44% 23% 22% 13% 7 92% 93% 14% 0% 2% 0% 0% 0% 10 80% 70% 4% 0% 3% 0% 0% 0% 13 70% 27% 0% 0% 0% 0% 0% 0% Table 2: Success rates results for CBSTU and A + ODTU algorithms on the DAO grid. To define w (e) and w+(e) for every edge e we introduce a parameter U called the uncertainty rate. For a given value of U, we set w (e) to be a value chosen uniformly at random from the range [1, U + 1], and set w+(e) to be a value chosen uniformly at random from the range [w (e), U + 1]. Thus, the uncertainty rate parameter U allows control over the amount of uncertainty in the generated experiment, where U = 0 means no uncertainty. For each grid, we run experiments with an increasing number of agents, starting from 2 and going up to 20 agents. For each configuration of grid type, U, and number of agents, we created 50 experiments. These 50 experiments differ in the start and goal locations of the agents, which were randomly selected. We set a timeout of five minutes for each solving each MAPF-TU instance. The success rate of an algorithm is the ratio of problems solved before reaching this timeout. Unless stated otherwise, the objective function we optimized for was SOCpes. The existing MAPF code frameworks that we explored were not easily adaptable to the MAPF-TU setting. So, we implemented all algorithms from scratch, including the two CBS enhancements mentioned in Section 5.3. As such, the performance of our algorithms is not directly comparable to state-of-the-art implementations of algorithms for classical MAPF, even when U = 0, and scales less gracefully. Nevertheless, the source code for running all our experiments is publicly available at https://github.com/Tomer-Shahar/Conformant-CBS. 6.2 Experimental Results Table 1 and Table 2 show the success rates of CBSTU and A + ODTU for different values of U and numbers of agents on the Open and DAO grids, respectively. Rows correspond to different numbers of agents, and columns represent different U values and algorithms. Each inner cell contains success rates. We observe the following trends. First, increasing U significantly decreases the success rates of the algorithms. For example, in the DAO grid with 7 agents, CBSTU solves all prob- Shahar, Shekhar, Atzmon, Saffidine, Juba, & Stern 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 Success Rate Number of Agents U = 0 U = 1 U = 2 U = 4 Figure 5: Success rates results for CBSTU on the Open grid. lem instances with U = 0, but solves only 14% for U = 1. This trend is expected, as adding U means more conflicts and a longer runtime to resolve new conflicts. Similarly, adding more agents makes problems harder for both algorithms, reducing the overall success rates. Another clear trend is that CBSTU outperforms A + ODTU for all the instances except for the case when 7 agents with U = 0 on a DAO map. Indeed, CBS is also known to outperform A in classical MAPF on similar domains, and thus, the superior performance of its time uncertainty counter-part is expected. As CBSTU generally outperformed A + ODTU, we show results of only CBSTU hereinafter. Figure 5 shows the success rate of CBSTU with more agents than in Table 1. As expected adding more agents or increasing the uncertainty rate decreases the success rate. We note that CBSTU s performance when U = 0 is similar to the performance of Improved CBS (ICBS) on a similar grid (Boyarski et al., 2015b). 6.2.1 Sum of Costs Results To have a deeper understanding of the impact of U on CBSTU s behavior, Table 3 shows the results of experiments with a wider range of uncertainty rate values (U). These experiments were conducted on the Open grid with 9 agents. The table columns show the SOCopt (the Opt. column), SOCpes ( Pes. ), the difference between them ( Range ), the CBSTU computational runtime ( Time ) in seconds, and the success rate ( Success ). As seen in Tables 1 and 2, increasing U in general decreases the success rate. However, this relation between U and success rate is noisy, indicating that the complexity of MAPFTU is affected by other problem features as well. Similarly, the computational time does not fully correlate with U, but the general trend clearly indicates that higher uncertainty rate makes the problem, in general, harder to solve for CBSTU. Next, consider the difference between SOCopt and SOCpes (shown in the Range column), which we will refer to as the uncertainty range of the solution. The results clearly show that the uncertainty range increases significantly with U. For example, the uncertainty Safe Multi-Agent Pathfinding with Time Uncertainty U Sum of Cost (SOC) Time Success Opt. Pes. Range 0 47.7 47.7 0.0 0.04 100% 1 59.5 70.3 10.8 2.59 100% 2 73.6 96.9 23.3 5.59 98% 3 86.4 122.2 35.8 20.2 84% 4 104.0 146.7 42.7 19.4 70% 5 107.3 166.0 58.7 34.3 67% 6 122.5 179.4 56.0 30.1 69% 7 137.3 205.2 67.9 23.9 69% 8 145.9 232.8 86.9 32.6 53% 9 171.5 286.5 115.0 42.0 50% Table 3: SOCopt, SOCpes, runtime, and success rate results for Open grid with 9 agents. k U = 0 U = 1 U = 2 U = 4 CU PU CU PU CU PU CU PU 4 100% 100% 96% 100% 92% 98% 86% 96% 7 98% 98% 84% 96% 68% 92% 48% 80% 10 94% 94% 48% 76% 34% 68% 10% 56% 13 82% 82% 18% 42% 6% 28% 0% 18% Table 4: Success rates results for CBSTU on the Warehouse grid in the complete uncertainty (CU) and the partial uncertainty (PU) experiment types. range is 23.3, 42.7, and 86.9 for U=2, 4, and, 8, respectively. More generally, the results show that the uncertainty range increases linearly with U, with an increase of around 11 per U. Note that CBSTU and A + ODTU are both optimal and thus their solutions will have exactly the same SOCpes. Their SOCopt, however, can differ. While this is not reported these results, we have observed that both algorithms yielded solutions with SOCpes that were almost identical for all the problems, and thus have almost identical uncertainty range. 6.2.2 Warehouse Results Finally, we present results for the Warehouse grid. Here we performed two types of experiments: Complete Uncertainty (CU) and Partial Uncertainty (PU). CU experiments are the same as those described above. PU experiments are different in that they introduce uncertainty only on a specific set of edges. These edges are marked in gray in Figure 4 (right). The motivation for PU experiments is to simulate automated warehouse scenarios, where most of the uncertainty is condensed in specific areas such as the pickup depots and packaging areas. Obviously, experiments under PU and CU settings would be the same when U = 0. Shahar, Shekhar, Atzmon, Saffidine, Juba, & Stern Table 4 shows the success rate for different number of agents (k) and uncertainty rate (U). The results show that CBSTU performed better with PU than with CU in terms of the success rate. For example, 10 agents and U = 4, CBSTU with PU received a success rate of 56%, while CBSTU with CU received only a success rate of 10%. A similar trend was also observed on other grids. This is expected, since PU has strictly less uncertainty than CU. To summarize, we demonstrated that both CBSTU and A + ODTU can be applied to find safe and optimal solutions on real MAPF benchmarks. Increasing the uncertainty rate lowers the success rate and increases the SOC and uncertainty range of the returned solutions. If we compare the CBSTU and A + ODTU approaches, from the experiments we conducted, we observe that CBSTU performs significantly better. 7. Safe Replanning In some scenarios, acting agents can sense their environment, i.e., observe during the execution something that was unknown previously, and replan based on their updated knowledge (Seuken & Zilberstein, 2008). In this section, we consider how sensing and online replanning can be useful in the context of MAPF-TU. The type of sensing we focus on is where an agent can sense the current time when it arrives at a vertex v. This eliminates the uncertainty embodied by the potential presence of that agent at v. Therefore, replanning at this stage may yield a better solution. In the rest of this section, we investigate two sensing and replanning settings. In the first, every agent attempts to improve its current single-agent plan individually while ensuring safety of the resulting overall solution. In the second setting, the agents share sensing information with a central controller that may replan for multiple agents at once. We refer to the first setting as SENSE and the second setting as SENSE+COM. In both settings, we assume that the agents are initially given a safe solution to the current MAPF-TU instance, computed offline using a MAPF-TU algorithm such as CBSTU or A + ODTU. For each setting, we provide theoretical examples showcasing the potential gains of replanning and propose concrete replanning algorithms. Section 8 shows experimental results that measure the performance of these algorithms on MAPF-TU instances. The results confirm that our algorithms for SENSE and SENSE+COM can result in executing a solution with a lower cost. 7.1 The SENSE Setting [1 + n, 1 + n] [1, 1] [1, 1] [1, 1 + n] Figure 6: MAPF-TU instance showing the benefits of replanning when sensing might occur. Compared to the optimal, purely offline solution, the replanning solution decreases the pessimistic sum-of-costs from 4 + 3n to 4 + 2n, when sensing actually occurs at vertex w. Safe Multi-Agent Pathfinding with Time Uncertainty The following example demonstrates the potential benefit of sensing in the SENSE setting. Example 3. Consider the MAPF-TU instance shown in Figure 6, and the solution comprising the single-agent plan (s1, g2)(g2, g1) for a1 and the single-agent plan (s2, w) . . . (w, g2) for a2, where the agent waits in w for n time steps. This solution is safe. Moreover, it is an optimal solution, in terms of both optimistic and pessimistic SOC (which are 4 + 2n and 4 + 3n respectively). However, if we allow replanning then when agent a2 reaches vertex w it can decide to reduce the amount of waiting based on how long it actually took to go from s2 to w. For example, assume that it took n time steps for a2 to reach w. Following the original safe solution, the agent would wait in w for n additional time steps before moving to g2, reaching its goal at time 2n + 1. By sensing that it reached w at time step n, the agent can reduce the planned wait time to a single time step, reaching its goal at time n+2. In general, through sensing and replanning optimistic and pessimistic SOC can be reduced down to 4 + 2n and 4 + 2n, respectively. 7.1.1 Replanning Algorithm We propose the following replanning algorithm for the SENSE setting. Let ai be an agent following a single-agent plan πi that senses it has just reached vertex v and the current time is t. Our replanning algorithm searches for the lowest-cost single-agent plan from v to the goal of ai, that is guaranteed to avoid conflicts with the other agents execution of their single-agent plans. To this end, we restrict the new single-agent plan π i such that the potential presences it induces are subsets of the potential presences induced by πi. This means we can only choose to decrease the duration of its planned wait actions. Finding an optimal way to do this is a trivial each agent simply waits until it reaches the lower time bound of its next non-wait action. 7.1.2 Theoretical Analysis Let Π be the MAPF-TU solution prior to replanning and let Π be the MAPF-TU solution after replanning. Since the potential presences according to Π are either the same or a subset of the potential presences in Π, the solution Π must be safe. Introducing replanning cannot lead to worse SOC bounds since the search space for each agent during replanning includes its original single-agent plan. Thus, in the worst case the replanning will not change the current single-agent plan and SOC(Π) = SOC(Π ). Since our replanning algorithm only removes wait actions, the best-case reduction in SOC (i.e., SOC(Π) SOC(Π )) is bounded by the number of wait moves in the offline solution. 7.2 The SENSE+COM Setting In the SENSE+COM setting, agents share the information they sensed. This sharing occurs when an agent reaches a vertex and it is able to sense. Thus, the input to replanning here is a set of agents that arrived at a vertex at the same time and were able to sense. We call this set of agents the replanning agents. The following example demonstrates the potential benefit of sensing in the SENSE+COM setting, showing that this setting allows further reduction in SOC over what is possible in the SENSE setting. Shahar, Shekhar, Atzmon, Saffidine, Juba, & Stern [1, 1] [1, 1] Figure 7: MAPF-TU instance showing the benefits of replanning when sensing and communication might occur. Compared to the optimal, purely offline solution, the replanning solution decreases the optimistic SOC from 4 + n to 4, when sensing occurs by agent a1 in vertex g2. Example 4. Consider the MAPF-TU instance shown in Figure 7, and the solution comprising the single-agent plan (s1, g2)(g2, g1) for a1 and the single-agent plan (s2, w) . . . (w, g2) for a2, where the agent waits in w for n time steps. This solution is safe and optimal in terms of SOC (the pessimistic and optimistic SOC is 4 + n and 4 + 2n, respectively). However, if we allow replanning and communications, and if agent a1 senses when reaching vertex g2 and broadcasts its arrival time, then agent a2 can decide to reduce the amount of waiting based on how long it took going from s1 to g2. In the optimal case where agent a1 reaches g2 at time 1, the SOC range for the replanning approach is [4, 4 + 2n]. 7.2.1 Replanning Algorithm We propose to use CBSTU to generate a solution for the replanning agents that takes them from their current locations to their respective goal locations. To maintain safety, we initialize CBSTU with a set of CBSTU constraints that cover every potential presences of every agent that is not a replanning agent. This ensures that the agents with new singleagent plans will never conflict with agents that have not sensed. This is true even when the replanning phase assigns the agents a completely different solution to execute. This is why communication adds a facet of flexibility to replanning. 7.2.2 Theoretical Analysis Let Π be the MAPF-TU solution prior to replanning, ΠWOC be the MAPF-TU solution created after replanning under the SENSE setting, and ΠWC be the MAPF-TU solution created after replanning under the SENSE+COM setting. The ability to communicate expands the set of single-agent plans that may be assigned to the replanning agents. Thus, SOC(Π) SOC(ΠWOC) SOC(ΠWC) (4) for the SOC being optimized (pessimistic or optimistic). Note, that SOC(Π) is the SOC of the plan Π, as oppose to the cost of executing Π. For example, it may be that SOC(Π)=10 but executing Π would cost more. Thus, Equation 4 does not mean executing Π will necessarily cost more than executing ΠWOC, and that executing ΠWC will necessarily cost less than ΠWOC. Time complexity Since this variant uses a centralized planner, the runtime of every replanning step is in the worst-case exponential in the number of replanning agents. However, Safe Multi-Agent Pathfinding with Time Uncertainty the overall computation time is typically dominated by the time it takes to compute the initial solution since it always takes into consideration all the agents simultaneously with maximal uncertainty. At a certain time-step, even if all the agents were to sense and replan simultaneously sometime during the execution, this would still be a considerably easier MAPF-TU instance to solve compared to the initial, offline solution that was already found. This is because much of the time uncertainty is removed from the solution up to this point leading to fewer potential conflicts over shorter time intervals across the entire solution. Indeed, we observed in our experiments that all subsequent replanning stages were on an average several times faster than the time required to compute the initial solution. Safety The output of each replanning stage is guaranteed to be safe. This occurs because whenever replanning happens, each agent either sensed (and subsequently replans) or it did not sense (since it is currently traveling or did not sense for any other reason). For all agents that did not sense, their current single-agent plans are treated as constraints for all the agents that will replan. Then, the central planner computes new single-agent plans for the sensing agents, using a safe MAPF-TU algorithm. Since the other agents are taken into account through constraints, and a safe MAPF-TU solver is used, the solution found is guaranteed to contain no collisions in the future. Thus, the output for each replanning stage is a safe MAPF-TU solution. 8. Empirical Evaluation: Online Replanning In this section, we experimentally evaluate the benefit of using our online replanning algorithms for the SENSE and SENSE+COM setting. We performed a range of experiments and report here only a representative subset of them. Specifically, we report here the results only for the Open grid and for the Warehouse grid with partial uncertainty (PU). In each experiment, we first obtained an initial safe solution using CBSTU. Then, we simulated the execution of this solution by sampling uniformly the edges durations within their ranges. Whenever an agent finishes traversing an edge, it applies the appropriate replanning algorithm. Two types of experiments were performed, one where the objective is to minimize SOCopt and one where the objective is to minimize SOCpes. The main performance metric we consider here is the Final SOC which is the SOC (either SOCopt or SOCpes) of the solution the agents ended up executing, following all the performed replanning. 8.1 Results for the Open grid Table 5 shows the Final SOC results for 8 agents in the Open Grid when optimizing for SOCopt (left) and for SOCpes (right). The Initial SOC is the SOC (either SOCopt or SOCpes) of the initial solution. The SENSE and S+C columns represent the SENSE and SENSE+COM settings, respectively. Each line corresponds to a different uncertainty rates (U = 0, 1, . . . , 8). Every data cell is an average over 50 problem instances. Note that increasing U inherently increases the SOC since it increases the edge traversal duration. The results in Table 5 show that online replanning in this set of experiments is only marginally beneficial. This can be seen when comparing the Final SOC with the Initial SOC if they are the same then there is no benefit to online replanning. The largest difference between the average Final SOC and Initial SOC is for U = 7 in the SENSE Shahar, Shekhar, Atzmon, Saffidine, Juba, & Stern U Initial SOC Final SOC SENSE S+C 0 43.06 43.06 43.06 1 59.32 58.22 58.05 2 72.89 72.61 72.96 3 93.45 90.79 91.10 4 111.09 109.10 107.64 5 115.60 118.67 111.07 6 129.53 137.96 127.33 7 155.57 150.10 148.57 8 166.89 163.47 159.22 U Initial SOC Final SOC SENSE S+C 0 43.06 43.06 43.06 1 58.58 58.16 58.63 2 74.44 75.70 73.56 3 93.76 92.74 91.88 4 116.04 116.41 114.13 5 124.07 125.02 120.04 6 137.63 138.90 134.60 7 163.10 158.98 160.68 8 176.07 173.42 173.81 Table 5: Initial and Final SOC results for 8 agents in the Open grid, for the SENSE and SENSE+COM (S+C) settings. The left side shows results for SOCopt and the right side shows results for SOCpes. Agents U=1 U=2 U=3 U=4 2 11% 9% 12% 17% 3 11% 9% 11% 15% 4 9% 9% 10% 14% 5 8% 8% 9% 13% 6 7% 7% 9% 11% 7 7% 8% 9% 9% 8 6% 7% 9% 10% 9 6% 7% 9% - 10 5% 7% 9% - 11 4% - - - 12 4% - - - Agents U=1 U=2 U=3 U=4 2 0% 0% 0% 0% 3 0% 0% 0% 1% 4 0% 0% 0% 0% 5 0% 0% 1% 1% 6 0% 1% 1% 2% 7 0% 2% 1% 2% 8 0% 1% 2% 2% 9 0% 2% 2% 3% 10 0% 2% 2% 2% 11 1% 1% 1% 2% 12 1% 1% 1% 1% Table 6: Overall SOC reduction, SENSE+COM setting, Open grid. (left) Results for optimizing SOCopt, (right) Results for optimizing SOCpes. setting (150.10 Final SOC vs. 155.57 Initial SOC) and for U = 8 in the SENSE+COM setting (159.22 Final SOC vs. 166.89 Initial SOC). In almost all cases, the Final SOC when in the SENSE+COM setting is smaller than the Final SOC in the SENSE setting, which supports our theoretical analysis in the previous section. Next, we focus on the SENSE+COM setting and consider the impact of changing the number of agents on the ratio between the Final SOC and the Initial SOC. We refer to this ratio as the reduction in SOC due to online replanning. Table 6 shows the reduction in SOC for different numbers of agents (the table rows) and values of U (the columns), in the SENSE+COM setting. The left and right sides show results when optimizing for SOCopt and SOCpes, respectively. Cells marked with - indicate that the success rate is 10% or less, to avoid drawing conclusions from such experiments.5 5. Table 5 shows results for experiments with fewer agents, and so in all cases there the success rate was higher than 10%. Safe Multi-Agent Pathfinding with Time Uncertainty U Initial SOC Final SOC SENSE S+C 0 240.00 240.00 240.00 1 265.31 265.11 255.83 2 283.29 282.79 257.88 3 305.43 305.10 260.95 4 316.17 315.39 254.78 5 338.43 337.57 264.29 6 346.46 345.85 264.46 7 368.38 367.62 267.15 8 423.10 421.91 280.60 U Initial SOC Final SOC SENSE S+C 0 240.00 240.00 240.00 1 248.70 251.07 248.50 2 262.55 266.12 262.05 3 264.44 272.18 264.56 4 268.30 268.30 268.47 5 270.45 270.45 270.66 6 272.97 272.97 273.84 7 276.68 276.68 276.54 8 285.80 285.80 284.39 Table 7: Initial and Final SOC results for 8 agents in the Warehouse grid with PU, for the SENSE and SENSE+COM (S+C) settings. The left side shows results for SOCopt and the right side shows results for SOCpes. The results in this table show several interesting trends. First, we see that the effectiveness of online replanning (i.e,. the reduction in SOC) increases with U. This occurs because online replanning benefits from reducing the uncertainty through sensing (and communication in the SENSE+COM setting). Having more uncertainty (higher U) therefore provides more opportunities for the online replanning to improve on the initial safe solution. The second trend we observe is that here online replanning was significantly more effective when optimizing for SOCopt then when optimizing for SOCpes. For example, for U = 2 and 7 agents, in the SOCopt setting the reduction in SOC is 8% while in the SOCpes the reduction in SOC is only 2% for the same U and number of agents. We conjecture that this difference between the objective functions is because aiming to minimize SOCpes has an indirect effect of decreasing the SOC range of the initial solution. Since the Final SOC is planned to be within the SOC range, a smaller SOC range means limited opportunity for online replanning to improve over the Initial SOC. Thus, the reduction in SOC for SOCopt is significantly larger. 8.2 Warehouse Domain We performed a similar experiment on the Warehouse grid with 8 agents, where Table 7 follows the same format as Table 5. The results here emphasizes the trends observed for Open grid. First, the reduction in SOC when optimizing for the SOCpes objective is negligible. Similarly, there is no significant reduction in SOC in the SENSE setting, i.e, when agents cannot communicate. However, there is significant reduction SOC in the SENSE+COM setting when optimizing for SOCopt. This reduction increases with U, where for U = 8, the Initial SOC was 423.10 while the Final SOC is 280.60. This is expected, as a higher U means more uncertainty, a larger SOC range, and thus more opportunity for sensing to reduce uncertainty and lead to a better solution. 8.3 Experiments Summary In all experiments, we observed the following trends. Shahar, Shekhar, Atzmon, Saffidine, Juba, & Stern A larger SOC reduction in the SENSE+COM setting compared to the SENSE setting. Very limited SOC reduction in the SENSE setting. A larger SOC reduction when optimizing for SOCopt than when optimizing for SOCpes A larger SOC reduction when there is a higher uncertainty rate (U) The first trend corroborates our theoretical analysis, demonstrating that having the ability to share sensing information and coordinate with other agents is beneficial in MAPF-TU. The second trend suggests that the practical benefit in the SENSE setting with our replanning algorithm is limited. The third and fourth trends are related to the size of the initial uncertainty. When there is limited uncertainty over the duration of the initial solution, exhibited by a small SOC range, then there is limited opportunity for online replanning to improve the Final SOC by gaining information during execution. 9. Discussion There can be many variants and extensions to the MAPF-TU problem and the MAPF-TU solvers we proposed. In this section, we outline several key extensions. 9.1 Offline Planning for MAPF-TU with Sensing S = (v1, v2), (v2, v3), (v4, v1) [1, 1] [1, 1] Figure 8: MAPF-TU with sensing instance showing that replanning is not complete. The initial and goal vertices of the three agents are indicated in the bottom right. A policybased solution consists in Agents 1 and 2 waiting for 2 time steps, then moving to their goal vertices, and Agent 3 moves immediately to v3, then depending on the sense elapsed time, moves to its goal vertex or waits one time step and then moves. In Section 7, we proposed an online approach to consider agents ability to sense their current location and time. An alternative approach is to consider the agents ability to sense their location offline, i.e., when executing the solution. This means the planning algorithm will output an execution policy for each agent instead of a sequence of moves and waits. A policy may instruct the agent to perform different actions depending on the information it obtains via sensing. Considering sensing opportunities in this way can lead to significant cost reductions, compared to the replanning approach proposed in Section 7. Some problems cannot be solved with the replanning approaches we proposed, but can be solved with a carefully crafted offline policy. Safe Multi-Agent Pathfinding with Time Uncertainty Figure 8 illustrates an example of such an MAPF-TU problem instance. For this problem instance, there exists no safe, offline solution. This implies that we cannot perform online replanning. For safety, our current online approach begins by executing an offline solution and improves upon it. However, creating offline policies for multi-agent problems can be extremely difficult, e.g., in models such as Decentralized-POMDP (Bernstein et al., 2002). Developing efficient offine algorithms for MAPF-TU is left for future work. 9.2 Bounding Edge Traversal Time From Experience The main assumption in a MAPF-TU problem is that the upper and lower bounds on edges durations are given as input. While in some cases such information is given as input, another approach is to learn these upper and lower bounds from experience. We describe here one way to do so. Consider a setting where the time it takes to traverse an edge e is independent across executions and drawn from some unknown distribution De of edge-traversal durations. For every edge e, we collect M(e) samples e1, . . . e M(e) from De. Each sample is collected from a different execution, since multiple traversals of e may be correlated with each other. Note, however, that we do assume that every traversal of e, taken in isolation, has the same distribution De. Let L(e) = mini {1,...,M(e)} ei and U(e) = maxi {1,...M(e)} ei be the minimum and maximum observed duration it took to traverse e. The probability that the duration of a future traversal of e will be outside the interval [L(e), U(e)] is given by the following theorem. Theorem 3. If M(e) Ω( 1 δ), then with probability 1 δ over the prior executions, the delay on edge e lies in the interval [L(e), U(e)] with probability 1 ϵ. In other words, if the number of samples M(e) is at least Ω( 1 δ), then with probability higher than 1 δ the L and U of our sample satisfies that the probability that traversing e will take a duration not in [L(e), U(e)] is smaller than ϵ. The proof of Theorem 3 is given in Appendix A. Now, consider a MAPF-TU instance in which we set the lower and upper bounds edge durations of every edge e to be the observed lower and upper bounds L(e) and U(e), respectively. I.e., setting w (e) and w+(e) to be L(e) and U(e). We call this MAPF-TU instance an empirical MAPF-TU instance. We can use Theorem 3 to upper bound on the probability that a solution to an empirical MAPF-TU instance will cause a collision. In other words, we can compute a lower bound to the probability that a solution to an empirical MAPF-TU instance is indeed safe w.r.t. the real world. To do so, we introduce some additional notation. Let Π be a solution to an empirical MAPF-TU instance and let E be the number of distinct edges in Π. For any δ (0, 1), let ϵ(e, δ ) be the minimum ϵ for which the bound of Theorem 3 holds for the number of examples we possess for e and δ set to δ /E. That is, ϵ(e) = O 1 M(e) log E Theorem 4. Let Π = {π1, . . . , πk} be a solution to an empirical MAPF-TU problem. Then for δ, ϵ > 0 with probability 1 δ we obtain lower and upper bounds for every edge such that the probability that Π can be safely executed is at least 1 Pk i=1 P e πi ϵ(e). Shahar, Shekhar, Atzmon, Saffidine, Juba, & Stern Proof. Π is a safe solution to the corresponding empirical MAPF-TU instance. Therefore, if executing Π leads to a collision, then the duration of at least one move action in π exceeded the lower or upper bound of the corresponding edge duration. Taking the union bound over all move actions of all agents, we obtain that the probability of this occurring is at most Pk i=1 P e πi ϵ(e). Hence the probability no collisions occur is as claimed. The bound [L(e), U(e)] is always valid for the given ϵ(e) we obtain as a function of δ and M(e). However, if we are seeking a particular safety level, say some target ϵ for each edge giving an overall bound that the solution is safe with probability 1 ϵ, then the bounds thus obtained may be too conservative. Eventually, especially with a large sample size, our examples will include outlier events that occur with probability far less than ϵ . Here is an alternative for large sample sizes that will allow us to approach an optimal interval [L, U] for a given safety bound 1 ϵ. Theorem 5. For any α > 1, δ (0, 1), and ϵ (0, 1), let [L(e), U(e)] be any interval containing at least (1 ϵ α)M(e) examples for e where M(e) Ω( α ϵ(α 1)2 (log α δ)). Then, with probability 1 δ over prior executions, the delay for edge e lies in the interval [L(e), U(e)] with probability 1 ϵ. The proof for Theorem 5 is also given in Appendix 9.2. This theorem provides greater flexibility when constructing a MAPF-TU instance from empirical data, as one can take for every e any interval in [L(e), U(e)] that captures a sufficient number of samples. Then, one can use this more refined bound in Theorem 4 to obtain a more refined bound on the safety of the generate plans. A deeper study of this is a topic of future work. 10. Conclusion and Future Work In this paper, we studied the Multi-Agent Pathfinding with Time Uncertainty (MAPF-TU) problem, which is a Multi-Agent Pathfinding (MAPF) problem in which there is uncertainty over the time it takes an agent to traverse an edge. Specifically, for every edge we are given a lower and upper bound on the duration it takes to traverse it. We focused on the problem of finding a safe solution to a MAPF-TU instance, which is a solution that is guaranteed to avoid a conflict. To this end, we introduce the notion of potential presence and potential conflicts, where a solution is safe iffit has no potential conflicts. Then, we propose two algorithms, called A + ODTU and CBSTU, that find safe and optimal solutions to a MAPFTU instance. We implemented these algorithms and compared them experimentally on a variety of settings and maps. The results show that on our benchmark set of instances CBSTU significantly outperforms A + ODTU in terms of success rate. Then, we considered two online replanning settings, SENSE and SENSE+COM, where the agents have sensing or communicating capabilities while executing a solution. We demonstrate the potential benefit of online replanning in both settings, and propose online replanning algorithms that can improve the executed solution cost. We analyze these replanning algorithms theoretically, showing that using them is always advantageous. However, experimentally, we observed that signifcant benefit for replanning only appeared in the SENSE+COM when optimizing for the optimistic SOC. Finally, we discussed possible extensions to MAPF-TU. One such extension is to considering offline the possibility that the agents will sense new information and replan online. Safe Multi-Agent Pathfinding with Time Uncertainty Another extension suggests a statistical analysis that allows extracting from observed data the lower and upper bound edge traversal times. Yes another extension is where there lower and upper bound also on wait actions, e.g., for cases where an agent is not allowed to stay too long in some locations. There are many directions for future work. One can explore other objective functions to the MAPF-TU problem, such as minimizing the expected cost (assuming the traversal time distribution is known). Another direction is to adapt additional MAPF solvers to solve MAPF-TU, e.g., ICTS (Sharon et al., 2013), EPEA* (Goldenberg et al., 2014), or M* (Wagner & Choset, 2015). Finally, one may explore how to extend MAPF-TU to cases in which time is not discretized and the environment is continuous (Andreychuk et al., 2019), or when the bounds over the edge traversal durations change over time. Acknowledgments This research is partially funded by NSF award IIS-1908287 to Brendan Juba; and BSF grant #2018684 and ISF grant # 210/17 to Roni Stern. Appendix A. Sample Complexity Proofs In this appendix, we provide formal proofs for the main Theorems in Section 9.2. A.1 Proof for Theorem 3 This confidence bound is an easy consequence of basic statistical learning theory. We recall the number of examples needed to fit a class of Boolean functions is determined by the VC-dimension (Vapnik & Chervonenkis, 1971) of that class: Definition 1. A set of points of a domain X is shattered by a class C of Boolean functions on X if for every Boolean labeling of the set, there is some c C that gives each point in the set the desired label. C is then said to have VC-dimension d if the size of the largest set shattered by C contains d points. The exact asymptotic dependence of the confidence δ and accuracy ϵ obtainable for classes of a given VC-dimension and a given number of examples was recently determined by Hanneke (2016). We state this in slightly simplified form for our purposes6: Theorem 6 (cf. (Hanneke, 2016)). Let C be a class of Boolean functions of VC-dimension d. Then, for every δ, ϵ (0, 1), with probability 1 δ every c C that is true on Ω( 1 ϵ(d + log 1 δ) examples independently drawn from a distribution D on X satisfies Prx D[c(x) = 1] 1 ϵ. This bound is asymptotically optimal, but the constants hidden by the Ωare large. Other forms of this bound that feature an additional log 1 ϵ factor may give a better quantitative guarantee for the concrete values of ϵ that occur in practice. Theorem 3 is now an immediate consequence: 6. We obtain this statement from the usual form by fixing the labels to be all 1, since we are looking for a set that contains all of the examples seen so far. Shahar, Shekhar, Atzmon, Saffidine, Juba, & Stern Proof. Recall that the set of intervals has VC-dimension 2: it is easy to verify that no set of three points on the real line can be shattered (either two are identical or we can label the max/min points 1 and the middle point 0). It then follows from Hanneke s bound (Theorem 6) that with probability 1 δ, any interval that contains Ω(1 δ) examples drawn independently from some common distribution D will contain further points drawn from D with probability 1 ϵ. In particular, [L, U] is such an interval. A.2 Proof for Theorem 5 To prove this Theorem, we will need the relative-error agnostic / non-realizable version of the VC-dimension bound (Li et al., 2001), again stated in simplified form for our purposes7: Theorem 7 (cf. (Li et al., 2001)). Let C be a class of Boolean functions of VC-dimension d, let ϵ, δ (0, 1), and let α > 1. With probability 1 δ every c C that is true at least a (1 ϵ) fraction of m Ω( 1 ϵ(α 1)2 (d log 1 δ)) examples satisfies Pr[c(x) = 1] 1 αϵ. Theorem 5 is now immediate: Proof. Again, the class of intervals has VC-dimension 2. Thus, by Theorem 7 with ϵ set to ϵ/α, we see that if M(e) Ω( α ϵ(α 1)2 (log α δ)), every interval [L, U] that is true of (1 ϵ α)M(e) examples indeed satisfies Pr[e [L, U]] 1 α ϵ Appendix B. Unsound Range Constraints for Edge Conflicts The following example demonstrate that using a naive way to set range constraints to resolve a swapping conflict may lead to finding a suboptimal solution. [1, 1] [1, 1] Figure 9: A MAPF-TU instance showing that using naive range constraints for edge-based conflicts in CBSTU leads to suboptimal solutions. Example 5. Consider the MAPF-TU instance depicted in Figure 9, and assume the objective is to minimize optimistic SOC. The initial solution n (s1, g2), (g2, C), (C, g1) , (s2, C), (C, g2) o has a swapping conflict a1, a2, (g2, C), [2, 5] . However, imposing the constraints that agent a1 or agent a2 cannot traverse the edge (g2, C) at any time step in [2, 5] rules outs all 7. In Li et al s statement, we actually put ν = 2ϵ and replace α by α 1 4+(α 1) α 1 4 , so that if r is the true error and s < ϵ is the empirical error, the obtained bound on dν(r, s) = |r s| r+s+ν of α 1 4+(α 1) implies r < αϵ. Safe Multi-Agent Pathfinding with Time Uncertainty optimal solutions. Specifically, in all optimal solutions agent a1 has the single-agent plan (s1, A), (A, g2), (g2, C)(C, g1) and agent a2 has a single-agent plan in which a2 arrives to C at time step 4 and moves along (g2, C) immediately after. Andreychuk, A., Yakovlev, K. S., Atzmon, D., & Stern, R. (2019). Multi-agent pathfinding with continuous time. In the International Joint Conference on Artificial Intelligence (IJCAI), pp. 39 45. Atzmon, D., Stern, R., Felner, A., Sturtevant, N. R., & Koenig, S. (2020a). Probabilistic robust multi-agent path finding. In the International Conference on Automated Planning and Scheduling (ICAPS), pp. 29 37. Atzmon, D., Stern, R., Felner, A., Wagner, G., Bart ak, R., & Zhou, N.-F. (2020b). Robust multi-agent path finding and executing. Journal of Artificial Intelligence Research, 67, 549 579. Bart ak, R., Zhou, N.-F., Stern, R., Boyarski, E., & Surynek, P. (2017). Modeling and solving the multi-agent pathfinding problem in picat. In IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pp. 959 966. Bernstein, D. S., Givan, R., Immerman, N., & Zilberstein, S. (2002). The complexity of decentralized control of markov decision processes. Mathematics of operations research, 27(4), 819 840. Boyarski, E., Felner, A., Sharon, G., & Stern, R. (2015a). Don t split, try to work it out: Bypassing conflicts in multi-agent pathfinding. In the International Conference on Automated Planning and Scheduling (ICAPS), pp. 47 51. Boyarski, E., Felner, A., Stern, R., Sharon, G., Tolpin, D., Betzalel, O., & Shimony, S. E. (2015b). ICBS: improved conflict-based search algorithm for multi-agent pathfinding. In the International Joint Conference on Artificial Intelligence (IJCAI), pp. 740 746. Brafman, R. I., & Domshlak, C. (2008). From one to many: Planning for loosely coupled multi-agent systems. In Proceedings of the Eighteenth International Conference on Automated Planning and Scheduling, ICAPS 2008, Sydney, Australia, September 1418, 2008, pp. 28 35. Cashmore, M., Coles, A., Cserna, B., Karpas, E., Magazzeni, D., & Ruml, W. (2019). Replanning for situated robots. In the International Conference on Automated Planning and Scheduling (ICAPS), pp. 665 673. Cimatti, A., Do, M., Micheli, A., Roveri, M., & Smith, D. E. (2018). Strong temporal planning with uncontrollable durations. Artificial Intelligence, 256, 1 34. Cimatti, A., Pistore, M., Roveri, M., & Traverso, P. (2003). Weak, strong, and strong cyclic planning via symbolic model checking. Artificial Intelligence, 147(1), 35 84. Cohen, L., Uras, T., Kumar, T. S., & Koenig, S. (2019). Optimal and bounded-suboptimal multi-agent motion planning. In Symposium on Combinatorial Search (So CS). Shahar, Shekhar, Atzmon, Saffidine, Juba, & Stern Coles, A., & Coles, A. (2014). PDDL+ planning with events and linear processes. In the International Conference on Automated Planning and Scheduling (ICAPS), pp. 74 82. Coles, A., Fox, M., Long, D., & Smith, A. (2008). Planning with problems requiring temporal coordination. In the AAAI Conference on Artificial Intelligence (AAAI), pp. 892 897. Dechter, R., Meiri, I., & Pearl, J. (1991). Temporal constraint networks. Artificial Intelligence, 49(1-3), 61 95. Erdem, E., Kisa, D. G., Oztok, U., & Sch uller, P. (2013). A general formal framework for pathfinding problems with multiple agents. In AAAI Conference on Artificial Intelligence. Felner, A., Li, J., Boyarski, E., Ma, H., Cohen, L., Kumar, T. S., & Koenig, S. (2018). Adding heuristics to conflict-based search for multi-agent path finding. In the International Conference on Automated Planning and Scheduling (ICAPS), pp. 83 87. Felner, A., Stern, R., Shimony, S. E., Boyarski, E., Goldenberg, M., Sharon, G., Sturtevant, N. R., Wagner, G., & Surynek, P. (2017). Search-based optimal solvers for the multiagent pathfinding problem: Summary and challenges. In the International Symposium on Combinatorial Search (So CS), pp. 29 37. Gange, G., Harabor, D., & Stuckey, P. J. (2019). Lazy CBS: implicit conflict-based search using lazy clause generation. In the International Conference on Automated Planning and Scheduling (ICAPS), pp. 155 162. Goldenberg, M., Felner, A., Stern, R., Sharon, G., Sturtevant, N., Holte, R. C., & Schaeffer, J. (2014). Enhanced partial expansion a*. Journal of Artificial Intelligence Research, 50, 141 187. Hanneke, S. (2016). The optimal sample complexity of pac learning. The Journal of Machine Learning Research, 17(1), 1319 1333. H onig, W., Kumar, T. S., Cohen, L., Ma, H., Xu, H., Ayanian, N., & Koenig, S. (2016). Multi-agent path finding with kinematic constraints. In the International Conference on Automated Planning and Scheduling (ICAPS), pp. 477 485. Lam, E., Bodic, P. L., Harabor, D., & Stuckey, P. J. (2019a). Branch-and-cut-and-price for multi-agent pathfinding. In the International Joint Conference on Artificial Intelligence (IJCAI), pp. 1289 1296. Lam, E., Bodic, P. L., Harabor, D. D., & Stuckey, P. J. (2019b). Branch-and-cut-andprice for multi-agent pathfinding. In International Joint Conference on Artificial Intelligence (IJCAI), pp. 1289 1296. Li, J., Harabor, D., Stuckey, P. J., Ma, H., & Koenig, S. (2019). Symmetry-breaking constraints for grid-based multi-agent path finding. In the AAAI Conference on Artificial Intelligence (AAAI), pp. 6087 6095. Li, Y., Long, P. M., & Srinivasan, A. (2001). Improved bounds on the sample complexity of learning. Journal of Computer and System Sciences, 62(3), 516 527. Ma, H., Kumar, S., & Koenig, S. (2017). Multi-agent path finding with delay probabilities. In the AAAI Conference on Artificial Intelligence (AAAI), pp. 3605 3612. Safe Multi-Agent Pathfinding with Time Uncertainty Morris, R., Pasareanu, C. S., Luckow, K. S., Malik, W., Ma, H., Kumar, S. T. K., & Koenig, S. (2016). Planning, scheduling and monitoring for airport surface operations. In Planning for Hybrid Systems, Papers from the 2016 AAAI Workshop. Pajarinen, J., & Peltonen, J. (2011). Efficient planning for factored infinite-horizon decpomdps. In IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011, pp. 325 331. Seuken, S., & Zilberstein, S. (2008). Formal models and algorithms for decentralized decision making under uncertainty. Autonomous Agents and Multi-Agent Systems, 17(2), 190 250. Sharon, G., Stern, R., Felner, A., & Sturtevant, N. R. (2015). Conflict-based search for optimal multi-agent pathfinding. Artificial Intelligence, 219, 40 66. Sharon, G., Stern, R., Goldenberg, M., & Felner, A. (2013). The increasing cost tree search for optimal multi-agent pathfinding. Artificial Intelligence, 195, 470 495. Shekhar, S., Brafman, R. I., & Shani, G. (2019). A factored approach to deterministic contingent multi-agent planning. In Proceedings of the Twenty-Ninth International Conference on Automated Planning and Scheduling, ICAPS 2019, Berkeley, CA, USA, July 11-15, 2019, pp. 419 427. Silver, D. (2005). Cooperative pathfinding. In the First Artificial Intelligence and Interactive Digital Entertainment Conference, pp. 117 122. Standley, T. S. (2010). Finding optimal solutions to cooperative pathfinding problems.. In the AAAI Conference on Artificial Intelligence (AAAI), pp. 28 29. Stern, R., Sturtevant, N. R., Felner, A., Koenig, S., Ma, H., Walker, T. T., Li, J., Atzmon, D., Cohen, L., Kumar, T. K. S., Bart ak, R., & Boyarski, E. (2019). Multi-agent pathfinding: Definitions, variants, and benchmarks. In the International Symposium on Combinatorial Search (So CS), pp. 151 159. Sturtevant, N. R. (2012). Benchmarks for grid-based pathfinding. IEEE Transactions on Computational Intelligence and AI in Games, 4(2), 144 148. Surynek, P., Felner, A., Stern, R., & Boyarski, E. (2016). Efficient SAT approach to multiagent path finding under the sum of costs objective. In ECAI. Surynek, P. (2012). Towards optimal cooperative path planning in hard setups through satisfiability solving. In PRICAI, pp. 564 576. Vapnik, V., & Chervonenkis, A. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2), 264 280. Veloso, M. M., Biswas, J., Coltin, B., & Rosenthal, S. (2015). Cobots: Robust symbiotic autonomous mobile service robots. In the International Joint Conference on Artificial Intelligence (IJCAI), pp. 4423 4429. Vidal, T., & Fargier, H. (1999). Handling contingency in temporal constraint networks: from consistency to controllabilities. J. Exp. Theor. Artif. Intell., 11(1), 23 45. Wagner, G., & Choset, H. (2015). Subdimensional expansion for multirobot path planning. Artificial Intelligence, 219, 1 24. Shahar, Shekhar, Atzmon, Saffidine, Juba, & Stern Wagner, G., & Choset, H. (2017). Path planning for multiple agents under uncertainty. In the International Conference on Automated Planning and Scheduling (ICAPS), pp. 577 585. Wagner, G., Choset, H., & Siravuru, A. (2016). Multirobot sequential composition. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2081 2088. Walker, T. T., Sturtevant, N. R., & Felner, A. (2018). Extended increasing cost tree search for non-unit cost domains.. In IJCAI, pp. 534 540. Walker, T. T., Sturtevant, N. R., & Felner, A. (2020). Generalized and sub-optimal bipartite constraints for conflict-based search. In AAAI Conference on Artificial Intelligence, Vol. 34, pp. 7277 7284. Wurman, P. R., D Andrea, R., & Mountz, M. (2007). Coordinating hundreds of cooperative, autonomous vehicles in warehouses. In the AAAI Conference on Artificial Intelligence (AAAI), pp. 1752 1760.