# branchandcutandprice_for_multiagent_pathfinding__d813ed37.pdf

Branch-and-Cut-and-Price for Multi-Agent Pathﬁnding

Edward Lam1,2 , Pierre Le Bodic1 , Daniel Harabor1 and Peter J. Stuckey1,2

1Monash University, Melbourne, Australia 2CSIRO Data61, Melbourne, Australia {edward.lam, pierre.lebodic, daniel.harabor, peter.stuckey}@monash.edu

There are currently two broad strategies for optimal Multi-agent Pathﬁnding (MAPF): (1) searchbased methods, which model and solve MAPF directly, and (2) compilation-based solvers, which reduce MAPF to instances of well-known combinatorial problems, and thus, can beneﬁt from advances in solver techniques. In this work, we present an optimal algorithm, BCP, that hybridizes both approaches using branch-and-cut-and-price, a decomposition framework developed for mathematical optimization. We formalize BCP and compare it empirically against CBSH and CBSH-RM, two leading search-based solvers. Conclusive results on standard benchmarks indicate that its performance exceeds the state-of-the-art: solving more instances on smaller grids and scaling reliably to 100 or more agents on larger game maps.

1 Introduction

Multi-agent Pathﬁnding (MAPF) is an NP-hard graph optimization problem that involves ﬁnding a minimum-cost set of paths for a team of cooperating agents. Each agent must move from its start position to its goal position such that no two agents occupy the same vertex at the same time or cross an edge in opposite directions at the same time. One popular technique for optimal MAPF is Conﬂict-based Search (CBS) [Sharon et al., 2015]. CBS is a two-level search algorithm. At the low level, it computes a path for each agent independently. At the high level, CBS detects conﬂicts between pairs of agents and resolves them by splitting the current solution into two related subproblems, each of which involves replanning a single agent. Recursively resolving conﬂicts by splitting a subproblem into two children implicitly deﬁnes a search tree. The high-level search explores this tree using best-ﬁrst search and terminates when it expands a collision-free leaf. Optimal MAPF can also be solved using mixed integer programming (MIP) by modeling it as a multi-commodity ﬂow problem [Yu and La Valle, 2013]. The reduction involves a time-expanded graph whose vertices are indexed by space and time, and specialized gadgets to account for conﬂicts.

Despite this inefﬁcient representation, the model remains reasonably effective on small instances since MIP solvers are mature industry-grade software supported by decades of academic research. This paper proposes an optimal MAPF algorithm, called BCP, that combines the strengths of CBS (namely, decomposition and domain-speciﬁc reasoning) with the search performance of MIP. The algorithm is based on branch-and-cutand-price (also abbreviated BCP), a general framework that decomposes a difﬁcult optimization problem into a set of easier problems and then merges the individual results together in a branch-and-bound search tree [Desrosiers and L ubbecke, 2010; L ubbecke and Desrosiers, 2005; Desaulniers et al., 2005]. BCP is also a two-level algorithm. At the low level, it solves a series of single-agent pathﬁnding problems using dedicated algorithms, much like CBS. At the high level, it uses MIP to assign paths to agents and resolve conﬂicts. The high-level problem remains NP-hard but possesses structure that can be exploited by modern MIP technologies. The contributions of this paper are (1) a novel model of MAPF based on the BCP framework, (2) extensions of the basic model that drastically improve its performance, and (3) experimental results that highlight the advantages of the approach. The experiments compare two versions of BCP against two leading CBS variants: CBSH [Felner et al., 2018] and CBSH-RM [Li et al., 2019]. Results on 1,350 standard benchmarks across two small grids and two large game maps indicate that BCP exceeds state-of-the-art performance: (1) optimizing 99% of the instances compared to 85% by CBSHRM, (2) solving 96% of the hardest instances with 100 agents compared to 34% by CBSH-RM, and (3) achieving an average run-time of 5 seconds compared to 8 seconds by CBSHRM on the 1,143 instances solved by both algorithms. The remainder of this paper describes BCP and presents the experiments in detail.

2 Background This section summarizes CBS and the BCP technique applied to MAPF.

2.1 Conﬂict-based Search CBS breaks the MAPF problem into one high-level search and multiple single-agent low-level search problems. Initially, each agent plans its optimal path independently using

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

low-level A*. If none of these paths conﬂict, then an optimal solution to the MAPF problem is found. Otherwise, the high-level problem selects a conﬂict between two agents a1 and a2, and splits the current problem into two subproblems. In the ﬁrst subproblem, agent a1 is prevented from using the conﬂicting resource (vertex or edge at some time), and its optimal path is resolved under this new constraint using lowlevel search. In the second subproblem, agent a2 is similarly constrained and resolved. The high-level search continues by selecting a subproblem with the least possible cost, and either ﬁnding that it has no conﬂicts and hence produces an optimal solution, or that it must be split again into another two subproblems. This process continues until a solution is found.

2.2 Branch-and-Cut-and-Price The origins of the BCP method can be traced back more than half a century to early works in linear programming (LP) [Dantzig and Wolfe, 1960; Gilmore and Gomory, 1961; Gilmore and Gomory, 1963]. Today, its theoretical foundations are rigorously proven and well-understood [Desrosiers and L ubbecke, 2010; L ubbecke and Desrosiers, 2005; Desaulniers et al., 2005; Barnhart et al., 1998]. Several basic concepts of MIP that develop into BCP are reviewed below. Interested readers are advised to consult [Rader, 2010] for an introduction to MIP, and then the previous references for a formal treatment of the BCP technique.

Modeling To solve a problem using MIP, a formal mathematical model of the problem must ﬁrst be created. A model consists of an objective function that measures the quality of a solution, variables that represent decisions, and constraints that express relationships between the variables. A solution assigns a value to each variable such that their values respect the constraints. Variables can be categorized as integer or continuous. Integer variables must take a discrete, integral value in a solution, while continuous variables can take any value (integral or fractional). A solution is optimal if the objective function attains a global minimum. The MIP master problem of the BCP MAPF algorithm selects a minimum cost subset of paths from an extremely large set of paths subject to three classes of constraints: (1) exactly one path is selected per agent, (2) every vertex is visited in at most one of the selected paths, and (3) every edge is traversed in at most one of the selected paths. Each path is associated with an integer variable taking value 0 and 1, representing the proportion that the path is selected. A solution assigns 0 or 1 to every variable subject to the constraints of the three constraint classes.

Branch-and-bound NP-hard MIP problems are commonly solved using the branch-and-bound (BB) algorithm. BB sufﬁciently enumerates all solutions in a search tree by solving a relaxation problem at every node. BB begins with a tree containing only the root node, where it proceeds to solve the relaxation; usually but not always, an LP relaxation. An LP relaxation of a MIP model is an identical problem but drops, or relaxes, the requirement that all integer variables take integral values. In other words, the integer variables are

allowed to take fractional values, but otherwise preserves all constraints and variables. Solving this LP relaxation, which can be done in polynomial time, typically results in a fractional solution, i.e., a solution in which at least one integer variable takes fractional value, as opposed to an integer solution, in which all integer variables take integral values. The BB algorithm uses this fractional solution to branch. A branching rule is a subroutine that partitions the solution space of the LP relaxation into two disjoint sets such that a fractional value appears in neither. BB then solves the LP relaxation over the two partitions, and the process repeats. In simpler terms, a branching rule selects an integer variable taking fractional value in a solution, and creates two children nodes in which this variable cannot take fractional value. The fractional solution, and all others with the same fractionality, are removed at the expense of two more nodes in the search tree. For example, consider a fractional solution in which an agent selects two paths, each with 0.5 proportion. One path traverses edge (i, j) and the other path doesn t (perhaps traversing edge (i, k)). Summed over all paths, edge (i, j) is selected with proportion 0.5. A basic branching rule will ﬁx the edge (i, j) to proportion 0 in one child, forbidding the edge, and ﬁx the edge to proportion 1 in the other child, forcing the edge. (The branching rule does not specify anything about (i, k).) Then, (i, j) will appear with integral proportion (0 or 1) in every solution to the LP relaxation in both children. The search is now one step closer towards selecting all edges, and hence all paths, with integral proportion. Eventually, solving the LP relaxation naturally returns an integral solution, so branching will cease, resulting in a leaf node. At this point, the integer variables take integral values, so this solution is valid for the original MIP problem. It is accepted as the new best solution if it is better than the current best solution, i.e., its objective value is lower than that of the incumbent solution, if any. A beneﬁt of solving a relaxation at every node is that its optimal objective value provides a lower bound to the objective value of every solution in the entire subtree, and hence, can be used to prune the subtree if this lower bound is higher than the current best solution, i.e., the upper bound. In other words, any solution in this subtree cannot be better than the solutions already found, and hence, this subtree need not be explored. Therefore, higher lower bounds from tight relaxations can tremendously improve the speed of the search. Even though a branching rule deﬁnes how to create two children nodes, it does not specify which node should be solved next. That is the role of a node selector. The standard node selector in BB prefers nodes in a best-ﬁrst manner (lowest lower bound) but occasionally performs depth-ﬁrst diving. Because all integer variables will take integral values at some depth in the search tree, BB will terminate in ﬁnite but exponential time due to the combinatorics of branching. Since branching removes fractional solutions, never integer solutions, and bounding removes suboptimal solutions, never optimal solutions, BB is correct and optimal.

Branch-and-price Recall that the master problem selects paths from a large set P of paths. We might imagine that all possible paths for every

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

agent must be enumerated in P, from which it selects a subset of optimal paths. Since MAPF allows cycling and waiting, an inﬁnite number of paths is possible. However, based on mathematical arguments, a ﬁnite, usually exponential-size subset P of P is sufﬁcient to prove optimality; all other paths cannot appear in an optimal solution. This is obvious: for example, in a MAPF problem with one agent, nonsensically long paths looping round and round cannot contribute to an optimal solution since a shorter path without the loops exists. Branch-and-price (BP) is a variant of BB that omits some or all variables/paths initially from the master problem, and reinstates them during the search. Omitting variables is equivalent to ﬁxing their value to zero. Because some variables have been ﬁxed, part of the LP solution space is not searched. Since an optimal LP solution could lie in this part, the optimal objective value to this LP problem no longer provides a valid lower bound at this node. The set P is initialized with a few paths (possibly zero) and is iteratively ﬁlled by calling a pricer, an algorithm that generates better paths and adds them to P for the master problem to select. The pricer identiﬁes omitted variables that enlarge the solution space in the direction that must be explored, or guarantees that there are none. If variables are found, they are added to the master problem, which is solved again. Note that the master problem may or may not choose the new variables. This proceeds until a sufﬁcient portion of the original solution space is considered, at which point the pricer proves that none of the omitted variables could improve the current LP objective value even if included. Hence, the current LP optimal objective value (with many variables remaining omitted) provides exactly the same bound as a problem that includes all variables, which means BP enumerates the same solutions as BB, despite not considering all variables. This implies the correctness and optimality of BP.

Branch-and-cut Solutions with a vertex (resp. edge) collision are removed by a constraint stating that the vertex (resp. edge) can be used by at most one path. Even though the time horizon extends to inﬁnity, there is no need to check for conﬂicts at vertices and edges not currently in use or those at some distant time after all agents have reached their target. Therefore, only a ﬁnite, usually exponential-size subset of all collision constraints need to appear in the master problem. Branch-and-cut (BC) extends BB by omitting constraints from the master problem and adding them during the search. After solving the LP relaxation, BC calls a procedure known as a separator for every class of constraints (e.g., vertex collision and edge collision). A separator checks the paths chosen by the master problem and determines whether these paths exhibit a conﬂict of its class. It either adds to the master problem one or more constraints violated in the current LP solution, or concludes that all constraints of its class are satisﬁed. The new constraints will prohibit more than one agent from using the vertex or edge in all future solutions. This process repeats until no violated constraints are found. Vertex collision and edge collision are two classes of constraints categorized as problem constraints. Problem constraints are necessary and sufﬁcient to correctly model the

problem. A class of constraints can also be categorized as redundant. Redundant constraints are not necessary for modeling and solving the problem, but rather, they tighten the LP relaxation by cutting off fractional solutions, leading to a stronger lower bound and/or an integral LP solution in the next iteration; subsequently resulting in the subtree being pruned by bound or integrality much earlier than otherwise. If a class of problem constraints and its separator are correct, its constraints will only remove portions of the LP solution space that do not contain solutions valid according to the problem deﬁnition. If a class of redundant constraints and its separator are correct, its constraints will only remove fractional solutions, never integer solutions. Therefore, BC (with many constraints remaining omitted) obtains the same optimal solutions as BB (with all constraints included).

Branch-and-cut-and-price If variables are omitted, adding a constraint could cut off all currently known paths for an agent. Therefore, the pricer needs to be called again after a separator adds constraints in order to ﬁnd alternative paths that satisfy the new constraints, or to ensure that the existing paths remain feasible and optimal. The culmination of all the previous components leads to the framework named BCP, which retains the same correctness and completeness guarantees as BC and BP. To summarize, the master problem, which selects paths from P , the pricer, which adds better paths to P , and the separators, which resolve conﬂicts on the paths selected by the master problem, are repeatedly solved at a node until the chosen paths are fractionally free of conﬂicts and fractionally optimal. If any path is fractionally chosen, a branching rule creates two children nodes that do not contain the fractionality, bringing the search closer towards an integer solution. Assuming the correctness and completeness of all the components described previously, BCP the framework and the MAPF algorithm are correct and complete: the algorithm will ﬁnd an optimal solution in ﬁnite but exponential time if and only if one exists (excluding implementation issues such as bugs and numerical instability, as with any other approach). For many graph-related optimization problems, especially the Vehicle Routing Problems which bear many similarities with MAPF, models with an exponential number of variables and constraints can be mathematically proven to have an LP relaxation tighter than all other known models (e.g., [Letchford and Salazar-Gonz alez, 2006]), and solving these exponential-size MIP models with BCP far outperforms solving smaller polynomial-size MIP models with BB (e.g., [Fukasawa et al., 2006]).

3 BCP-B This section formalizes the MAPF problem and presents a basic, minimally-working algorithm, called BCP-B, that uses the BCP technique for optimal MAPF. This basic variant is extended in the next section.

3.1 Problem Deﬁnition Deﬁne a location l = (x, y) Z+ Z+ as a pair of space coordinates, and L as the set of all locations. A location l2 = (x2, y2) is a neighbor of l1 = (x1, y1) in the

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

north direction if x2 = x1 and y2 = y1 1,

south direction if x2 = x1 and y2 = y1 + 1,

west direction if x2 = x1 1 and y2 = y1, and

east direction if x2 = x1 + 1 and y2 = y1.

Let a vertex v = (l, t) L Z+ be a pair of a location and a time coordinate, and let V be the set of vertices. Deﬁne E as the set of edges, where an edge e = (v1, v2) = ((l1, t1), (l2, t2)) is a pair of vertices such that t2 = t1 + 1 and either l1 = l2 or l2 is a neighbor of l1. That is, an edge indicates a wait action or connects two adjacent locations in one timestep. Furthermore, denote the reverse of an edge e as e = ((l2, t1), (l1, t2)). Deﬁne a directed graph G = (V, E). Since there is no time horizon and edges connect vertices with increasing time, G is inﬁnite but acyclic. Deﬁne A as the set of agents, where every agent a A has a start location sa L and a goal location ga L, which may coincide. A path p of length k Z+ for agent a A is a sequence of k locations (l0, l1, l2, . . . , lk 1) such that l0 = sa, lk 1 = ga, and ((lt, t), (lt+1, t + 1)) E for all t {0, . . . , k 2}. The path p visits the vertices (lt, t) where t {0, . . . , k 1} and continues to visit vertices at its goal location after the path concludes, i.e., p also visits the vertices (ga, t) for all t {k, . . .}. The path p traverses the edges ((lt, t), (lt+1, t + 1)) where t {0, . . . , k 2}. The cost cp of a path is equal to its length. Then, an optimal solution to the MAPF problem is a set of paths, one path for each agent a A, that minimizes the sum of path lengths such that each (time-indexed) vertex is visited at most once, and each (time-indexed) edge and its reverse are traversed at most once. These two conditions are respectively referred to as vertex conﬂicts and edge conﬂicts.

3.2 Master Problem For every agent a A, assume the existence of a large pool Pa of candidate paths. The master problem chooses one path for every agent such that the sum of path costs is minimized. Additional constraints that prevent conﬂicts are added dynamically whenever a conﬂict is detected. For all a A, p Pa, deﬁne λp [0, 1] as a variable representing the proportion of selecting path p. A solution to the master problem maps every λp to a value between 0 to 1. The initial master problem (before adding additional constraints) is speciﬁed as the linear program:

p Pa cpλp (1)

subject to X

p Pa λp 1 a A, (αa) (2)

λp 0 a A, p Pa. (3)

Objective Function (1) minimizes the total cost of the selected paths. Constraint (2) ensures that every agent uses at least one path. By minimizing the total cost, no agent will use more than one path in an optimal solution. Let αa 0 be the dual variable of Constraint (2) as deﬁned by LP duality.

Constraint (3) are non-negativity constraints, which disallow negative proportions of a path. As currently stated, the master problem allows both vertex and edge conﬂicts. These are resolved by adding constraints found using the separators described as follows.

3.3 Resolving Vertex Conﬂicts Deﬁne xp v {0, 1} as a constant that indicates if path p visits vertex v V . The separator for vertex conﬂicts begins by calculating the number xv of times v is used:

p Pa xp vλp. (4)

The variable xv is positive if and only if at least one path p that uses v (i.e., the constant xp v = 1) is selected (i.e., the variable λp > 0). A vertex conﬂict occurs at v if and only if xv > 1. The separator then removes every solution that uses v more than once by adding the constraint X

p Pa xp vλp 1 v V. (βv) (5)

Denote its dual variable as βv. From duality theory, βv 0.

3.4 Resolving Edge Conﬂicts Edge conﬂicts are settled in a similar manner. Let yp e {0, 1} be a constant that indicates if path p traverses edge e E. The separator ﬁrst calculates the number ye of times e is used:

p Pa yp eλp.

An edge conﬂict occurs at e whenever ye + ye > 1. The conﬂict is resolved by adding the constraint X

p Pa (yp e + yp e )λp 1 e E. (γe) (6)

Let the dual variable of Constraint (6) be γe 0.

3.5 Finding Better Paths The pool Pa of candidate paths for any agent a is incrementally ﬁlled by a pricer that either (1) ﬁnds one or more paths that could potentially improve the current master problem solution, or (2) declares that no improving path exists. For every agent a, the pricer must solve the single-agent shortest path problem with a modiﬁed objective function. The output of the pricer is at least one path p for agent a with cost cp > 0. Within the pricer, xp v {0, 1} and yp e {0, 1} are binary decision variables, not constants, that respectively indicate whether p visits vertex v and traverses edge e. The modiﬁed objective function is

min cp := cp αa X

v V βvxp v X

e E γe(yp e + yp e ), (7)

which can be interpreted as follows. Recall from Sections 3.2 to 3.4 that αa 0, βv 0 and γe 0. If a chooses to use path p, it pays a penalty βv 0 if it visits a conﬂicting vertex v and pays γe 0 if it traverses a conﬂicting edge e. The term αa offsets the cost cp of p to measure whether

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

the new path is cost-effective compared to all existing paths of a. From LP theory, p can contribute to a better solution if and only if cp < 0. In this case, p is added to Pa, leading to another round of separation and pricing. If the pricer cannot ﬁnd a path p with cp < 0 for any agent a A, then no path can improve upon the current master problem solution, which is declared (fractionally) optimal. BCP-B implements the pricer by adapting an existing A* code to the modiﬁed path cost. Every edge e initially has cost 1. The penalty γe is added to the cost of e and e , and βv is added to all ﬁve incoming edges to v. Then, the A* algorithm ﬁnds a path p and outputs

ˆcp := cp X

v V βvxp v X

e E γe(yp e + yp e ).

The output path p is accepted into Pa if ˆcp < αa.

3.6 Resolving Fractional Solutions

The master problem solves a linear program, which typically produces fractional solutions, e.g., given an agent a A and two paths p1, p2 Pa, we can have λp1 = λp2 = 0.5. Whenever the pricer declares a solution with fractional values is optimal, branching must proceed to resolve the fractionality. At the very least, branching must split the problem into two subproblems such that the current fractional solution appears in neither subproblem. BCP-B branches on an agent-vertex pair, stipulating that an agent a must visit a vertex v in one child and must not visit v in the other child. The branching rule ﬁrst computes the constants xv as for Equation (4) and builds the set

Uv = {a A : X

p Pa xp vλp > 0}

of agents using the vertex v. At this stage, all violated Constraint (5) appear in the master problem, and hence, xv 1 for all v V . The branching rule then selects a vertex

v = arg min (l,t) V {t : 0 < xv < 1 |Uv| 2}.

That is, it chooses a vertex with the earliest time that has fractional value and is used by two or more agents. Next, it selects an agent

a = arg min a Uv {cp : p Pa λp > 0}.

That is, it chooses an agent a that is (fractionally) using a path visiting v and has a path of the shortest length among all paths used by all agents visiting v. The branching rule then creates two children nodes. In one child, a must visit v. In the other child, a must not visit v. This is in turn enforced in the master problem and the pricer at each child. The master, separation and pricing problems are then solved in both subproblems and the process repeats. Node selection and node pruning occur as in a regular branch-andbound. Globally optimal solutions appear in the leaves of the search tree, where all variables have integral value.

a1 a2 l1 l2 l3 l4

Figure 1: An example of a corridor conﬂict.

BCP-B includes Constraints (2), (3), (5) and (6), which are problem constraints necessary and sufﬁcient for correctly modeling MAPF. BCP tightens the LP relaxation of BCPB using redundant constraints that resolve corridor conﬂicts and rectangle symmetry conﬂicts. BCP also branches on path length in addition to the vertex branching in BCP-B.

4.1 Resolving Corridor Conﬂicts Corridor conﬂicts can appear when two agents fractionally cross a space of unit height and some length in opposite directions. In an integer solution, BCP-B forbids all types of conﬂicts, namely, vertex conﬂicts and edge conﬂicts, via Constraints (5) and (6). However, these constraints may not be violated by two agents fractionally crossing a corridor. Consider two agents a1 and a2 trying to cross the corridor in Figure 1. Even though the illustration portrays a corridor as a space surrounded by side obstacles, a corridor conﬂict can occur anywhere two agents cross, regardless of obstacles. Agent a1 has two paths p1,1 = (l1, l2, l3, l4) and p1,2 = (l1, l1, l2, l3, l4), which differs by a wait at time 0. Agent a2 also has two paths p2,1 = (l4, l3, l2, l1) and p2,2 = (l4, l3, l3, l2, l1). Assume that a master problem solution assigns value 0.5 to all four paths. Vertex conﬂicts do not occur in this solution since (l2, 2) and (l3, 2) are each used with value 1. Similarly, edge conﬂicts do not appear because ((l2, 1), (l3, 2)) and its reverse, and ((l2, 2), (l3, 3)) and its reverse are each used with value 1. These fractional solutions naturally arise very frequently in the LP relaxation. Pruning them is critical to tightening the lower bound, and hence, improves the performance of BCP. Corridor conﬂicts are removed using the constraint X

p Pa1 yp ((l1,t),(l2,t+1))λp + X

p Pa1 yp ((l1,t+1),(l2,t+2))λp+

p Pa2 yp ((l2,t),(l1,t+1))λp + X

p Pa2 yp ((l2,t+1),(l1,t+2))λp 1

(a1, a2, l1, l2, t), (8)

where a1, a2 A, l1, l2 L and t N such that a1 = a2, and l1 and l2 are neighbors. Constants yp e are deﬁned as in Section 3.4. The correctness of Constraint (8) is reasoned as follows. In an integer solution, if a1 uses ((l1, t), (l2, t + 1)), it arrives at l2 at t + 1, and hence, cannot simultaneously use ((l1, t + 1), (l2, t+2)). Also, a2 cannot use ((l2, t), (l1, t+1)) because it would incur an edge conﬂict, nor use ((l2, t+1), (l1, t+2)) because it would incur a vertex conﬂict at (l2, t+1). By symmetry, at most one of those four edges can be used, which is

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

Figure 2: An example of a rectangle symmetry conﬂict.

precisely Constraint (8). In a sense, this constraint combines Constraints (5) and (6) for two speciﬁc agents. Let πa1,a2,l1,l2,t 0 be the dual variable of Constraint (8). Adding one of Constraint (8) to the master problem will also modify the objective function of the pricer by appending

πa1,a2,l1,l2,typ ((l1,t),(l2,t+1)) πa1,a2,l1,l2,typ ((l1,t+1),(l2,t+2))

to Objective Function (7) when pricing agent a1, and

πa1,a2,l1,l2,typ ((l2,t),(l1,t+1)) πa1,a2,l1,l2,typ ((l2,t+1),(l1,t+2))

when pricing agent a2.

4.2 Resolving Rectangle Symmetry Conﬂicts Rectangle symmetry conﬂicts were recently introduced by [Li et al., 2019]. The idea is that any two agents entering and exiting a rectangle at precisely the right time must sustain a vertex conﬂict somewhere within the rectangle. Consider two agents a1 and a2 entering the gray rectangle in Figure 2. If the two agents use a single-agent-optimal path, the agents will always conﬂict somewhere within the rectangle. Let the edges of the two sides used by a1 be E1, shown in thick red lines. Let the edges of the two sides used by a2 be E2, shown in dashed blue lines. In BCP, these rectangle symmetries are removed using the constraint X

e E1 yp eλp + X

e E2 yp eλp 3 (a1, E1, a2, E2), (9)

where a1 = a2, and E1 E and E2 E are the edges timed precisely for entering and exiting the rectangle in an optimal-length path (no delay in the interior). Constraint (9) states that at most three of the four sides can be used at those exact timesteps. The correctness of this constraint relies on the proof given in [Li et al., 2019]. Let ρa1,E1,a2,E2 0 be the dual variable of Constraint (9). Adding one of Constraint (9) will modify the objective function of the pricer by appending, respectively, the terms

e E1 ρa1,E1,a2,E2yp e and X

e E2 ρa1,E1,a2,E2yp e

to Objective Function (7) when pricing agents a1 and a2.

4.3 Branching on Path Length Bounding the length of all paths for a particular agent bounds its cost in the entire subtree. In particular, ﬁxing the length of an agent ﬁxes its cost, regardless of the vertices it visits. Contrastingly, ﬁxing a vertex to be used or unused only indirectly affects the cost by rerouting the agent. Therefore, it is beneﬁcial to ﬁx path lengths early in the search. BCP employs

a tiered branching rule that ﬁrst branches on path length and only branches on vertices (as in Section 3.6) once every path used by an agent has the same length. The branching rule begins by calculating the set S of agents fractionally using paths of different lengths; deﬁned as

S = {a A : p1, p2 Pa, λp1 > 0, λp2 > 0, cp1 = cp2}.

From S, it chooses an agent a and a path p of shortest length that is fractionally used, i.e.,

(a, p) = arg min a S,p Pa {cp : λp > 0}.

It then creates two children nodes. In one child, a only uses paths with length less than or equal to cp. In the other child, a only uses paths with length greater than or equal to cp + 1. Branching on path lengths removes the current fractional solution and cannot remove an integer solution.

5 Experiments

This section compares the performance of BCP-B and BCP against CBSH and CBSH-RM.

5.1 Set-up BCP-B and BCP are implemented using the MIP solver SCIP 6.0.0 [Gleixner et al., 2018] with CPLEX as the LP solver. Codes for CBSH [Felner et al., 2018] and CBSH-RM [Li et al., 2019] are obtained from their authors. All four solvers are single-threaded and are run on an Intel Xeon E5-2660 V3 CPU at 2.6 GHz with a time-out of ﬁve minutes. The solvers are evaluated on the same instances as [Li et al., 2019], which consist of 1,350 standard benchmarks over two small grids and two large game maps sourced from the Moving AI repository [Sturtevant, 2018]. The 20x20 map is an empty 20 20 grid. In 10obs-20x20, 10% of the empty space is replaced with obstacles. The maps den520d and lak503d are considerably larger and have structured obstacles. Fifty instances are tested for every different number of agents.

5.2 Results Figure 3 shows the percentage of instances optimally solved by the four algorithms, categorized by map and the number of agents. BCP displays exceptional performance in comparison to the other three algorithms. It is obvious that the improvements detailed in Section 4 are worthwhile: BCP has the highest success rate, while BCP-B has the lowest, tied with CBSH. BCP optimizes 1,335 of the 1,350 instances compared to 1,145 by CBSH-RM. On the den520d instances with 120 agents, BCP and CBSH-RM respectively solve 49 and 36 instances. On the hardest lak503d map with 100 agents, BCP closes 48 instances in comparison to CBSH-RM at 17. Of the 1,335 instances optimized by BCP, 829 of them are solved at the root node, i.e. no branching occurred. On the other 506 instances with branching, the root node optimality gap averages 0.1%, with the largest at 1.3%. These results demonstrate that the lower bounds from BCP are extremely tight, making the problems reasonably easy. BCP found feasible solutions to all 15 instances on which it timed out. On these instances, the optimality gap averages

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

20 30 40 50 60

Percentage Solved

BCP CBSH-RM BCP-B CBSH

20 30 40 50

Percentage Solved

10obs-20x20

30 60 90 120

Percentage Solved

30 40 50 60 70 80 90100

Percentage Solved

Figure 3: Success rate for each map and number of agents. Higher is better.

20 30 40 50 60

20 30 40 50

10obs-20x20

30 60 90 120

30 40 50 60 70 80 90100

Figure 4: Average run-time computed over the instances solved optimally by all algorithms. Lower is better.

0.5%, with a maximum of 1.7%. A lower bound 1 lower than the feasible solution is found for 7 of the 15 instances, indicating that their optimal solutions are at most 1 lower than what is already found, if any. Figure 4 plots the average run-time averaged over the instances solved by all four algorithms. These charts conﬁrm that BCP has similar run-times for the easier instances but substantially shorter run-times for the harder instances.

6 Related Work Multi-agent Pathﬁnding is a special case of Multi-agent Planning; a well known problem that is often solved by modeling the set of agents as one large agent with many degrees of freedom [Torre no et al., 2018]. This approach, sometimes called joint planning, fails completely even for small instances of MAPF [Standley, 2010]. To improve on joint planning, stateof-the-art methods for optimal MAPF depend on some of the following decompositions and reasoning. Operator decomposition. Algorithms of this type plan in the joint space of all agents but in a way that tries to avoid an explosion in branching factor. OD [Standley, 2010] is a wellknown example that interleaves the planning of single agents to dramatically reduce the branching factor of the search. Another approach, EPEA*, is a partial expansion [Goldenberg et al., 2014] solver, which defers generating all but the most promising successors of a node. OD and EPEA* are both optimal and are often effective on MAPF problems with up to dozens of agents and with low congestion. In more challenging settings, these same methods often exhaust available memory long before ﬁnding a solution.

Compilation-based solvers. Algorithms of this type transform MAPF into related problems for which efﬁcient algorithms exist. Examples include SAT [Surynek et al., 2016], ASP [Erdem et al., 2013] and multi-commodity ﬂow [Yu and La Valle, 2013]. These approaches reason about the entire problem at once but use simple time-expanded models and currently without any specialized constraints and reasoning techniques developed speciﬁcally for MAPF. Such approaches exhibit state-of-the-art performance but typically do not scale well to large maps with many agents.

Two-level search-based solvers. Algorithms of this type reason about single agents at the low level and about the interactions between single-agent plans at the high level. Perhaps the most prominent example is CBS [Sharon et al., 2015], which includes a large family of optimal and bounded suboptimal variants [Felner et al., 2017]. Such leading MAPF algorithms can scale to large maps with many agents, often with the help of reasoning techniques developed speciﬁcally for MAPF; e.g. lower-bounding heuristics [Felner et al., 2018], branching strategies [Boyarski et al., 2015] and specialized constraints [Li et al., 2019].

7 Conclusion

This paper introduced BCP, a novel algorithm for Multi-agent Pathﬁnding that substantially improves upon current state-ofthe-art methods, including the recent algorithm CBSH-RM. These results demonstrate that close collaboration between the mathematical optimization and artiﬁcial intelligence communities can result in signiﬁcant advances.

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

References [Barnhart et al., 1998] Cynthia Barnhart, Ellis L. Johnson, George L. Nemhauser, Martin W. P. Savelsbergh, and Pamela H. Vance. Branch-and-price: Column generation for solving huge integer programs. Operations Research, 46(3):316 329, 1998. [Boyarski et al., 2015] Eli Boyarski, Ariel Felner, Roni Stern, Guni Sharon, David Tolpin, Oded Betzalel, and Eyal Shimony. ICBS: Improved conﬂict-based search algorithm for multi-agent pathﬁnding. In IJCAI, pages 740 746, 2015. [Dantzig and Wolfe, 1960] George B. Dantzig and Philip Wolfe. Decomposition principle for linear programs. Operations Research, 8(1):101 111, 1960. [Desaulniers et al., 2005] Guy Desaulniers, Jacques Desrosiers, and Marius M. Solomon. Column Generation. Springer US, 2005. [Desrosiers and L ubbecke, 2010] Jacques Desrosiers and Marco E. L ubbecke. Branch-price-and-cut algorithms. In Wiley Encyclopedia of Operations Research and Management Science. John Wiley & Sons, Inc., 2010. [Erdem et al., 2013] Esra Erdem, Doga G. Kisa, Umut Oztok, and Peter Schueller. A general formal framework for pathﬁnding problems with multiple agents. In AAAI, pages 290 296, 2013. [Felner et al., 2017] Ariel Felner, Roni Stern, Eyal Shimony, Eli Boyarski, Meir Goldenberg, Guni Sharon, Nathan R. Sturtevant, Glenn Wagner, and Pavel Surynek. Searchbased optimal solvers for the multi-agent pathﬁnding problem: Summary and challenges. In SOCS, pages 29 37, 2017. [Felner et al., 2018] Ariel Felner, Jiaoyang Li, Eli Boyarski, Hang Ma, Liron Cohen, T. K. Satish Kumar, and Sven Koenig. Adding heuristics to conﬂict-based search for multi-agent path ﬁnding. In ICAPS, pages 83 87, 2018. [Fukasawa et al., 2006] Ricardo Fukasawa, Humberto Longo, Jens Lysgaard, Marcus Poggi de Arag ao, Marcelo Reis, Eduardo Uchoa, and Renato F. Werneck. Robust branch-and-cut-and-price for the capacitated vehicle routing problem. Mathematical Programming, 106(3):491 511, 2006. [Gilmore and Gomory, 1961] P. C. Gilmore and Ralph E. Gomory. A linear programming approach to the cuttingstock problem. Operations Research, 9(6):849 859, 1961. [Gilmore and Gomory, 1963] P. C. Gilmore and Ralph E. Gomory. A linear programming approach to the cutting stock problem part II. Operations Research, 11(6):863 888, 1963. [Gleixner et al., 2018] Ambros Gleixner, Michael Bastubbe, Leon Eiﬂer, Tristan Gally, Gerald Gamrath, Robert Lion Gottwald, Gregor Hendel, Christopher Hojny, Thorsten Koch, Marco E. L ubbecke, Stephen J. Maher, Matthias Miltenberger, Benjamin M uller, Marc E. Pfetsch, Christian Puchert, Daniel Rehfeldt, Franziska Schl osser, Christoph Schubert, Felipe Serrano, Yuji Shinano,

Jan Merlin Viernickel, Matthias Walter, Fabian Wegscheider, Jonas T. Witt, and Jakob Witzig. The SCIP Optimization Suite 6.0. Technical report, Optimization Online, July 2018. [Goldenberg et al., 2014] Meir Goldenberg, Ariel Felner, Roni Stern, Guni Sharon, Nathan R. Sturtevant, Robert C. Holte, and Jonathan Schaeffer. Enhanced partial expansion A*. Journal of Artiﬁcial Intelligence Research, 50:141 187, 2014. [Letchford and Salazar-Gonz alez, 2006] Adam N. Letchford and Juan-Jos e Salazar-Gonz alez. Projection results for vehicle routing. Mathematical Programming, 105(2):251 274, 2006. [Li et al., 2019] Jiaoyang Li, Daniel Harabor, Peter J. Stuckey, Hang Ma, and Sven Koenig. Symmetry-breaking constraints for grid-based multi-agent path ﬁnding. In AAAI, 2019. [L ubbecke and Desrosiers, 2005] Marco E. L ubbecke and Jacques Desrosiers. Selected topics in column generation. Operations Research, 53(6):1007 1023, 2005. [Rader, 2010] David J. Rader. Deterministic operations research: models and methods in linear optimization. John Wiley & Sons, 2010. [Sharon et al., 2015] Guni Sharon, Roni Stern, Ariel Felner, and Nathan R. Sturtevant. Conﬂict-based search for optimal multi-agent pathﬁnding. Artiﬁcial Intelligence, 219:40 66, 2015. [Standley, 2010] Trevor S. Standley. Finding optimal solutions to cooperative pathﬁnding problems. In AAAI, pages 173 178, 2010. [Sturtevant, 2018] Nathan Sturtevant. Moving AI: Pathﬁnding benchmarks. https://movingai.com/benchmarks, 2018. [Surynek et al., 2016] Pavel Surynek, Ariel Felner, Roni Stern, and Eli Boyarski. Efﬁcient SAT approach to multiagent path ﬁnding under the sum of costs objective. In ECAI, pages 810 818, 2016. [Torre no et al., 2018] Alejandro Torre no, Eva Onaindia, Anton ın Komenda, and Michal Stolba. Cooperative multiagent planning: a survey. ACM Computing Surveys (CSUR), 50(6):84, 2018. [Yu and La Valle, 2013] Jingjin Yu and Steven M. La Valle. Planning optimal paths for multiple robots on graphs. In 2013 IEEE ICRA, pages 3612 3617, 2013.

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)