# hybrid_quantumclassical_multiagent_pathfinding__4d511b8e.pdf Hybrid Quantum-Classical Multi-Agent Pathfinding Thore Gerlach 1 2 Loong Kuan Lee 2 Frederic Barbaresco 3 Nico Piatkowski 2 Multi-Agent Path Finding (MAPF) focuses on determining conflict-free paths for multiple agents navigating through a shared space to reach specified goal locations. This problem becomes computationally challenging, particularly when handling large numbers of agents, as frequently encountered in practical applications like coordinating autonomous vehicles. Quantum Computing (QC) is a promising candidate in overcoming such limits. However, current quantum hardware is still in its infancy and thus limited in terms of computing power and error robustness. In this work, we present the first optimal hybrid quantum-classical MAPF algorithms which are based on branch-andcut-and-price. QC is integrated by iteratively solving QUBO problems, based on conflict graphs. Experiments on actual quantum hardware and results on benchmark data suggest that our approach dominates previous QUBO formulations and state-of-the-art MAPF solvers. 1. Introduction Emerging domains of large-scale resource allocation problems, such as assigning road capacity to vehicles, warehouse management or 3D airspace to Unmanned Aerial Vehicles (UAVs), often require Multi-Agent Pathfinding (MAPF) (Stern et al.; Choudhury et al.; Li et al., c) to determine feasible allocations. MAPF involves calculating non-colliding paths for a large number of agents simultaneously, presenting significant computational challenges in realistically sized scenarios. These challenges are becoming increasingly relevant in case of large-scale real-world applications. E.g., future UAV traffic in urban environments, driven by parcel delivery demands, is expected to involve managing thousands of flight paths (Doole et al., 2020). 1University of Bonn 2Fraunhofer IAIS 3Thales Land and Air Systems. Correspondence to: Thore Gerlach . Proceedings of the 42 nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s). The scalability of optimal state-of-the-art MAPF solvers is limited, since finding optimal solutions is NP-hard (Sharon et al.). Thus one often falls back to suboptimal or anytime methods (Li et al., a;b; Huang et al.; Okumura, 2023b;a; Li et al., 2021). Even though such methods are computationally efficient in finding a feasible solution, the solution quality can be insufficient which implies the urge for optimal solution methods. By reformulating MAPF as a multi-commodity flow problem, it can be solved optimally via Integer Linear Programming (ILP). However, the reduction uses an inefficient representation of the problem setting in terms of space complexity and is only effective on small instances. More popular techniques for optimal MAPF include Conflict-based Search (CBS) (Sharon et al.) and Branch-and-Cut-and-Price (BCP) (Lam et al., b). CBS is a two-level procedure, with splitting a search tree based on detected conflicts between agents and subsequent replanning. This tree is explored with best-first search until a collision-free node is found. BCP takes a different approach of considering a (possibly infeasible) solution which is then refined iteratively by successively adding paths and constraints. Similarly to CBS, BCP is a two-level algorithm: on the low level it solves a series of single-agent pathfinding problems, while the high level uses ILP to assign feasible paths to agents. While low level single agent pathfinding can be performed efficiently, the high level problems of both CBS and BCP remain NP-hard. Quantum Computing (QC) is considered promising for tackling certain NP-hard Combinatorial Optimization (CO) problems. This is due to its potential to leverage quantum phenomena to solve certain types of problems faster than classical methods. The quantum mechanical effect of superposition enables exploration of a vast solution space in parallel, which is particularly advantageous for finding the best solution from a large set of possibilities. During this exploration quantum entanglement allows the encoding of high-order relationships among problem variables, leading to an effective exploration of the solution space. In particular, QC is suited for solving Quadratic Unconstrained Binary Optimization (QUBO) problems (Punnen, 2022) min z {0,1}n z Qz , (1) where Q is an n n real matrix and n is the number of qubits. Despite this simple problem structure, it is NP- Hybrid Quantum-Classical Multi-Agent Pathfinding Pricing Separation (a) Initialize paths. (b) Identify conflicts. (c) Generate paths. (d) Solve QUBO. (e) Final solution. Figure 1: Schematic visualization of our quantum MAPF algorithm. (a) First, initial paths are generated for every agent with possible conflicts. (b) We enter the outer loop (separation), where we identify conflicts between paths and add them to the problem. (c) In the pricing step, we generate new paths for every agent and (d) find the best set of paths by solving a QUBO problem. This inner loop is repeated until adding a new path cannot improve the solution quality, while the outer loop is terminated when our set of chosen paths has no conflicts. (e) By construction, a conflict-free optimal set of paths is returned. hard and hence encompasses a wide range of real-world problems, such as chip design (Gerlach et al., 2024), flight gate assignment (Stollenwerk et al., a) and trajectory planning (Stollenwerk et al., b). One prominent QC method is Adiabatic Quantum Computing (AQC), which is grounded in the adiabatic theorem (Albash & Lidar, 2018). This theorem states that a quantum system will remain in its ground state if its Hamiltonian changes slowly enough and there is a sufficient energy gap from excited states. The problem is encoded into the final Hamiltonian, and the system is evolved gradually from an initial Hamiltonian. If done adiabatically, the system ends in the ground state of the problem Hamiltonian, which corresponds to the optimal solution. Approximations for AQC can be obtained by either using digital QC hardware, such as the Quantum Approximate Optimization Algorithm (Farhi et al., 2014), or by purposefully built analog devices, such as Quantum Annealers (Johnson et al., 2011). Despite QC s promise, we find ourselves in the Noisy Intermediate-scale Quantum (NISQ) (Preskill, 2018) era. QC devices face challenges including error rates, limited computing power and restricted hardware topology, which lead to suboptimal solutions obtained with currently available hardware. In this paper, we investigate the use of hardware-aware QC for MAPF by constructing two optimal hybrid quantumclassical algorithms based on QUBO subproblems. To the best of our knowledge, these are the first quantum algorithms for MAPF. We call our algorithms QUBO-and Price (QP) and QUBO-and-Cut-and-Price (QCP) which are based on the idea of BCP. We iteratively add paths (and constraints) to the problem and solve a QUBO, which leads to the applicability of QC. This gives us an upper bound on the best possible solution and a stopping condition tells us when our set of paths contains the optimal solution. An overview of QCP can be found in Fig. 1. Even though cur- rent quantum hardware is still limited, our framework is modular and designed to be compatible with future devices. Our contributions can be summarized as follows: Two novel optimal hybrid quantum-classical MAPF algorithms (QP and QCP), which iteratively solve restricted ILP problems by using QUBO, An optimality criterion with proof, generalizing the pricing problem of classical column generation, Hardware-aware QUBO formulations for the restricted ILP problems with using the concept of conflict graphs, leading to parallel solvable independent subproblems, Extensive experiments on benchmark datasets which show the superiority of our method over previous QUBO approaches on real quantum hardware and stateof-the-art MAPF solvers. 2. Related Work Our algorithms are based on BCP (Lam et al., b). Despite their optimality, such algorithms require a sophisticated branching strategy and are not guaranteed to find a good solution in reasonable time. An anytime adaption of MAPF BCP has been investigated (Lam et al., a), but further investigation of optimal anytime algorithms is of great interest (Okumura, 2023a). Heuristics are used for avoiding exponentially many branching steps, leading to efficient suboptimal algorithms (Sadykov et al.). However, such rounding heuristics can lead to unsatisfying results, making the investigation of QC for this task intriguing. QC is a promising candidate for large-scale planning problems (Stollenwerk et al., a; Li et al., 2024). In the area of multi-agent problems, flow-problem formulations have Hybrid Quantum-Classical Multi-Agent Pathfinding been investigated (Ali et al., 2024; Zhang et al., 2021; Tarquini et al., 2024; Davies & Kalidindi). These methods are edge-based, that is each edge in the spatio-temporal graph is represented by a qubit. Even though certain constraints can be integrated into the quantum state in this framework, the problem size is way beyond current QC hardware capabilities and also infeasible for near-term devices. Instead of representing all edges in the given graph, a different approach is taken in (Mart ın & Martin) by considering which path to choose as decision variables. This leads to introducing a QUBO formulation for the Unsplittable Multi-Commodity Flow problem by directly integrating the inequality constraints. Similar to BCP, they iteratively add paths to their problem. However, they present no theoretically sound criterion on when to stop, but have to rely on suboptimal heuristics. Furthermore, the large amount of constraints have to be incorporated into the QUBO formulation which either leads to a huge number of auxiliary variables or the need for an iterative optimization scheme to adapt Lagrangian parameters (M ucke & Gerlach, 2023). The authors of (Stollenwerk et al., b; Huang et al., 2022) circumvent this problem by using conflict graphs for representing possible constraints. However, their methods are not directly applicable to the MAPF setting, since only the starting times of preplanned trajectories are optimized. We combine the ideas of an iteratively expanding path-based approach with the concept of conflict graphs. A pricing criterion tells us when all variables are included which are part of an optimal solution. The proof is based on (R onnberg & Larsson), which however assumes negativity on reduced costs. We loosen this assumption and adapt it for general applicability to MAPF. 3. Background For notational convenience, we onforth denote matrices as bold capital letters (e.g. A), vectors as bold lowercase letters (a) and define B ..= {0, 1}. Furthermore, let 1 denote the vector consisting only of ones, with the dimension following from context. Lastly, let denote entry-wise inequality between vectors, i.e., a b ai bi i. The following sections will formalize MAPF (Sec. 3.1) and give a mathematical formulation of how to solve it (Sec. 3.2). 3.1. Multi-Agent Pathfinding The input to the MAPF problem is a set of agents A, a weighted undirected graph G = (V, E) and an origindestination pair for each agent. The graph G captures the underlying environment, where all possible agent states (e.g. location) are represented by V and E can be regarded as valid transitions from one state to another, with an underlying cost. V already captures environmental constraints, such as possible obstacles, while E captures motion constraints of the agents, e.g., velocity and maximum turning rates of UAVs. The goal of MAPF is now to find optimal paths from the origin to the destination for each agent, s.t. they avoid pairwise conflicts. For rating optimality, we use the objective of the sum of all weighted path lengths. We here consider classical collision conflicts, that is the vertex conflict and the (swapping) edge conflict. A vertex conflict between two agents exists if they move to the same node at the same time, while an edge conflict prohibits two agents to use the same edge at the same time. Introducing the time component, we allow the agents to start at different points in time, while also giving them the opportunity to wait at a certain location. For finding a solution to the MAPF problem, a spatiotemporal directed graph formulation is typically used. That is, we define GT = (VT , ET ) as the graph, with nodes v = (s, t) V {1, . . . , T} and edges e = (v, v ) = ((s, t), (s , t + 1)) VT VT with weights we = w(s,s ), where (s, s ) E and T N is a maximum allowed time horizon. We define the reverse edge of e = (s, t), (s , t + 1) as e ..= (s , t), (s, t + 1) for representing edge conflicts. GT is an acyclic weighted directed graph with |VT | = T |V | and |ET | = O (ST |V |), where S is the average number of possible state transitions. For example, if we consider the classical two-dimensional grid environment, an agent has five possible state transitions, i.e., wait, go north, go east, go south and go west. Every agent a A is obliged to find a path in GT from origin (oa, ta) VT to destination (da, ta + Ta) VT for a starting time ta [T] and Ta T ta. A path p is a sequence of edges p ..= (e1, . . . , e Ta 1), s.t. et = ((x, ta +t), (y, ta +t+1)), et+1 = ((y, ta +t+1), (z, ta + t+2)), e1 = ((oa, ta), (s, ta+1)) and e Ta = ((s, ta+Ta 1), (da, ta + Ta)). The cost/length cp of a path p is defined as the sum of all its edge weights, i.e., cp ..= P e p we. Specifically, we go over to the mathematical description of the MAPF objective. 3.2. Path-Based Formulation We encode each possible path p P for every agent by zp B. An ILP formulation is given by: MP : min z BN X p Pa cpzp (2a) p Pa zp = 1, a A (2b) p Pa xp vzp 1, i VT (2c) p Pa (yp e + yp e) zp 1, e ET , (2d) Hybrid Quantum-Classical Multi-Agent Pathfinding where xp v, yp e B indicate whether path p visits vertex v VT / edge e ET , Pa indicates the set of all possible paths for agent a and N = |P|, P = S a A |Pa|. Note that xp v, yp e are constant and we just optimize over zp. The constraint in (2b) ensures that exactly one path is chosen for every agent, while (2c) and (2d) avoid conflicts. We denote (2) as the Master Problem (MP). We have |VT | constraints for avoiding vertex conflicts and |ET | constraints for avoiding swapping conflicts. The number of our decision variables representing possible solutions now corresponds to the number of all possible paths for all agents, which is exponential in the maximum time horizon, N = O(|A|ST ). A large maximum time T or number of agents |A| can make the problem size increase very quickly, making it infeasible to solve it with current NISQ devices. 4. Methodology Not only the huge number of decision variables poses a challenge for current quantum computers, but also the number of constraints strongly impacts the solvability of the resulting QUBO formulation. We overcome this issue by considering the Restricted Master Problem (RMP) which optimizes over a subset of decision variables RMP : min z Z c z (3a) s.t. Dz 1 , (3b) with Z ..= {z Bn : P p Pa zp = 1, a A}, Pa Pa, n = |P|, P = S a A Pa and c Rn is the vector consisting of the corresponding path lengths. The constraint matrix D captures inequality constraints from (2): ( xp i , if i VT , yp i + yp i , if i ET . To get equivalence between (3) and (2), we use a two-loop iterative optimization scheme: the outer loop decides which constraints are not fulfilled and should be added to our problem and the inner loop optimizes over subsets of all possible paths for every agent, increasing the number of paths in every step (see Fig. 1 and Algorithm 1). This procedure builds with the hope that we already find a (nearly) optimal solution with not exploring the whole search space, both in terms of decision variables n and number of constraints m. The mathematical framework describing this technique is called Column/Row Generation (L ubbecke). The columns are identified with the decision variables and the rows with the constraints. It alternatingly solves a high-level (RMP) and low-level Pricing and Separation problems (PP and SP), which decide which variables/constraints to add to the MP. If solving the PP/SP tells us to not add any more variables, the solution to the MP is optimal by construction. Algorithm 1 QUBOANDPRICEANDCUT Input: Shortest independent paths P and z = 1 Output: Optimal feasible solution 1: C 2: while z is infeasible do Separation 3: Add violated constraints to C 4: while not (5) do Pricing 5: p arg minp cp(λ, C) Shortest path 6: P P {p } 7: λ OPTIMIZELAGRANGIAN(P, C) Solve (8) 8: z OPTIMIZEMASTER(P, C) QUBO (Sec. 4.3) 9: end while 10: end while This method was developed for Linear Programming (LP) (Dantzig) without the restriction of integrality of the variables in our case we do not optimize over the continuous domain [0, 1] but over the binary values {0, 1}. To cope with these kind of ILP problems, branching methods, such as BCP (Desrosiers & L ubbecke), are used. In the worst case, exponentially many branching steps are needed. We circumvent this problem by considering solutions to the ILP MP, instead of the relaxed LP version. Ideally, we want n N, while an optimum of RMP is also an optimum of MP. The next section governs how to add new paths to our problem and check whether a solution is equivalent. 4.1. Pricing Instead of using a relaxed LP formulation for the PP, we use Lagrangian Relaxation (LR) (Wolsey, 2020), which can be used for ILP problems of the form in (3). The partial Lagrangian is given by L(z, λ) ..= c z + λ (Dz 1), λ Rm + and the LR of the RMP is defined as LR : L(λ) ..= min z Z L(z, λ) = min z Z c(λ) z λ 1 , (4) where c(λ) ..= c + λ D is the Lagrangian reduced cost vector. Note that L(λ) v(RMP), where v( ) indicates the optimal value of the optimization problem. Theorem 4.1. Let ˆv be the objective value of a feasible solution z to RMP (D z 1, z Z and ˆv = c z) and λ Rm +. If v(RMP) = v(MP) then a A : min p Pa\Pa cp(λ) min p Pa cp(λ) < ˆv L(λ) . (5) Proof. We give a proof by showing that v(RMP) = v(MP) holds if (5) does not hold. The path variables not in RMP are implicitly assumed to take the value 0. Assume now, that one variable not included in the RMP takes value 1 for Hybrid Quantum-Classical Multi-Agent Pathfinding agent a A, i.e., P p Pa\Pa zp 1. It follows that v(MP) = min z Z p P cpzp : Dz 1, X L(z, λ) : X p P cp(λ)zp : X p P\Pa cp(λ)zp + min p Pa cp(λ) Cλ (6d) b A\{a} min p Pb cp(λ) + min p Pa cp(λ) Cλ (6e) = min p Pa cp(λ) min p Pa cp(λ) + L(λ) (6f) ˆv v(RMP) , (6g) with Pa = Pa \ Pa, Cλ = λ 1 and Z = {z BN : P p Pa zp = 1, a A}. (6g) holds since we assume that (5) does not hold. We follow that no feasible solution to MP with P p Pa\Pa zp 1 can be better than an optimal solution to RMP for each a A. As all paths not included in the RMP are in the set S a A Pa\Pa and v(MP) v(RMP) is always true, we conclude v(MP) = v(RMP). Hence, if v(MP) = v(RMP) then (5) holds. This theorem leads to an optimality criterion, telling us when the variables in the RMP contain the ones for an optimal solution to the MP. The pricing problem (PP) aims to find a path not included in the RMP with optimal reduced costs PP : min p Pa\Pa cp(λ), a A . (7) Even though Pa \ Pa can be exponentially large, (7) can be solved efficiently. It boils down to a k shortest path problem (k |Pa|) on the graph GT with updated edge weights λe is added to the cost of we and w e for conflicted edges and λv is added to all incoming edges to conflicted vertices in the current solution z. This can be done efficiently, e.g. using Yen s algorithm (Yen), which dynamically updates the new shortest path using the already computed paths, obtained with A*. Interestingly, (5) generalizes the stopping criterion in classical column generation. The RHS is an upper bound on the optimality gap between the RMP and LD, i.e., v(RMP) v(LD) ˆv L(λ). If that gap is 0, we exactly recover the criterion given in (Lam et al., b) given by cp αa < 0, where αa = minp Pa cp(λ) is the dual variable corresponding to the convexity constraint of agent a (2b). In typical column generation, the pricing is solved with an optimal dual solution of the LP relaxation of the RMP (Lam et al., b). In our case, we solve the Lagrangian dual (LD) LD : max λ Rm + L(λ) = max λ Rm + min z Z L(z, λ) . (8) Due to standard ILP theory (Geoffrion, 2009), the optimum of LD and the LP relaxation of RMP coincides, since Z is convex. Also, the dual parameters of this LP relaxation exactly correspond to the optimal λ . Instead of relying on optimal Lagrange parameters, our theorem allows any choice of λ Rm +. 4.2. Separation In addition to adding new paths to our RMP, we can take a similar approach with handling the constraints. Starting off with no inequality constraints ((2c) and (2d)), we can iteratively add them to the RMP, reducing computational overhead. Given a feasible solution z to the RMP, we check whether it is also feasible for the MP. That is, we add the row/constraint Di z 1 to the RMP if Di z > 1, i VT ET . However, to ensure that our algorithm is optimal, we need to find an optimal solution w.r.t. the current constraints. If this cannot be guaranteed, Algorithm 1 can diverge to a suboptimal solution. The performance of this procedure is evaluated in Sec. 5. 4.3. Solving RMP with QUBO It remains to clarify how to obtain a solution of (3). In fact, solving the RMP (Algorithm 1, Line 8) is the main computational bottleneck of our developed algorithm, since finding shortest paths (Algorithm 1, Line 5) and solving an LP problem (Algorithm 1, Line 7) can be done efficiently. Since we want the solutions to be solvable with QC, we reformulate the constrained problems into QUBO formulations. We have a look at three different approaches, which all rely on using penalty factors to integrate constraints. The one-hot constraint given in Z (2b) can be integrated by using min z Z c z min z Bn c z + X a A ωa o 1 z 1 2 , for large enough ωa o > 0, which leads to a QUBO formulation (1). Setting ωa o > maxp Pa cp always guarantees the equivalence. It remains to incorporate the inequality constraints. Slack Variables The first approach inserts slack variables for every inequality constraint, similar to (Davies & Kalidindi). This is often pursued in practice but can lead to large number of variables, especially when the number of constraints is large. The linear inequality constraint in (3) can be reformulated with using an auxiliary vector s Bn Hybrid Quantum-Classical Multi-Agent Pathfinding (a) Conflicted paths of examplary MAPF problem. (b) Corresponding disconnected conflict graph. Figure 2: Schematic visualization of a conflict graph, which has two connected components. This leads to the decomposition into independent subproblems with reduced problem size. with binary entries, since Dz 1 Dz 1 + s = 0. This leads to the slightly different but equivalent problem minz Z c z, s.t. Dz 1 + s = 0. This constrained problem can then be reformulated to an equivalent unconstrained problem by introducing a penalty parameter ωs > 0 min z Z,s Bm c z + ωs Dz 1 + s 2 . (9) Using this formulation, however, comes with the overhead of introducing m additional binary variables. One variable is needed for every inequality constraint, leading to a total QUBO dimension of n+m. As we are still in the NISQ era, it is better to resort to a QUBO formulation that uses fewer qubits. However, we have a look at the performance of real quantum hardware for this QUBO formulation in Sec. 5. Without Slack Variables Since D Bm n, we can avoid using slack variables by using the equivalence min z Z,s Bm Dz 1 + s min z Z since minn N,s B(n 1+s)2 = minn N(n 1 4. Thus, we do not optimize over any slack variables and reduce the corresponding QUBO size. Conflict Graph As a third concept, we have a look into Conflict Graphs (CG). The vertices of such a graph correspond to all paths considered in the problem and the edges indicate whether there is at least one conflict between a pair of paths. An examplary schematic visualization of a conflict graph is given in Fig. 2. We denote the adjacency matrix of CG graph as C. The RMP in (3) is equivalent to min z Z c z + ωcz Cz (11) where ωc R+ penalizes potential conflicts. Note that it is square and the dimension is only dependent on the number of considered paths and not on the number of constraints. Since the structure of the CG depends on the underlying problem, it may contain several connected components. Thus, the graph of the QUBO matrix can have multiple connected components, dependent on C. These connected components can be seen as smaller instances, giving us the ability to solve them independently. This can lead to large reduction of the considered problem sizes, which is very vital for current QCs, due to the limited number of available qubits. Furthermore, a lower density and well-behaved problem structure can have a huge effect on current quantum hardware, as we will see in Sec. 5. This leads to two different hybrid quantum-classical MAPF algorithms. QUBO-and-Price (QP) describes generating new paths and stopping when (5) is violated (inner loop in Fig. 1). Even though the QUBO solver might not be optimal, this procedure leads to generating all sufficient paths for obtaining an optimal solution. We denote the extension of iteratively adding constraints to our problem as QUBO-and-Cut-and-Price (QCP). It is only guaranteed to generate an optimal path set if the QUBO is solved to optimality in every step. However, we also investigate its suboptimal performance in the next section. 5. Experimental Evaluation We compare our algorithms QP and QCP with four stateof-the-art MAPF solvers: BCP (Lam et al., b), EECBS (Li et al., 2021), La CAM* (Okumura, 2023a) and LNS2 (Li et al., b). BCP is an optimal algorithm but has also been adapted to anytime, while EECBS is suboptimal. La CAM* and LNS2 are anytime algorithms, which can quickly obtain a good feasible solution. For more details on these methods, we refer to the original papers. The code was taken from their respective public repositories1234. We kept all standard parameters and set a maximum time limit of 180 s for all solvers. The experiments were conducted on a single core of the type Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz. For QP and QCP we generate the initial paths with Prioritized Path Planning (PPP) (Ma et al.). That is, every agent is planned successively, with removing edges and nodes of previously planned paths. No clever priorization heuristic is followed, we randomly sample which agent to be chosen next. We use a maximum limit of 30 pricing steps in total and compute optimal Lagrangian parameters λ with LP in every iteration. We compare the optimal solution of the RMP obtained by an ILP Brach-and-Bound method with the solutions obtained by two different QUBO solvers. As a classical baseline, we use the Simulated Annealing (SA) 1https://github.com/ed-lam/bcp-mapf 2https://github.com/Jiaoyang-Li/EECBS 3https://github.com/Kei18/lacam3 4https://github.com/Jiaoyang-Li/MAPF-LNS2 Hybrid Quantum-Classical Multi-Agent Pathfinding Figure 3: Relative performance comparison of our methods QP-ILP, QP-QUBO, QCP-ILP and QCP-QUBO to the baselines PPP, BCP, EECBS, LNS2 and La CAM* on four different maps with a varying number of agents. The relative upper bound of the optimality gap (top) is shown along with the relative total path costs (middle and bottom) averaged over all 25 scenarios. The lower the better, i.e., a value of 0 corresponds to the best performance, while 1 corresponds to the worst performing algorithm. implementation of D-Wave (D-Wave Systems Inc., 2025) and for real quantum hardware, we run experiments on a DWave Advantage system5.4 (D-Wave Systems Inc., 2021) quantum annealer (QA). Since both solvers are probabilistic, we generate 1000 samples each and use 1000 Monte Carlo sweeps for SA and an annealing time of 20 µs for QA. The remaining remaining unspecified parameters are set to their default values. We compare the three different QUBO formulations from Sec. 4.3 and denote them by SLACK (9), HALF (10) and CONFLICT (11). These QUBO formulations are dependent on penalty parameters, and understanding their impact is crucial for NISQ devices. Large penalty weights can increase the dynamic range of QUBO coefficients (M ucke et al.; Gerlach & Piatkowski, 2024) and decrease the spectral gap (Stollenwerk et al., a;b), reducing solution quality and requiring longer annealing times. In our experiments, we set these penalties to the values described in Sec. 4.3 for adhering the constraints and leave special tuning for future work. As maps and instances, we use the well-known Moving AI benchmark (Stern et al.). This benchmark includes 33 maps and 25 random scenarios, some of which were utilized in our experiments. Every scenario on each map (with some exceptions) consists of 1000 start-goal position pairs. To evaluate a solver on a given scenario, we run it on the first 20, 40, 60, 80 and 100 start-goal pairs for all 25 scenarios and indicate the average performance. Quantum Hardware Limitations We did not scale up our experiments further in terms of the number of agents (e.g. |A| = 1000) because the QUBO size grows rapidly with the number of paths and constraints, making the problem increasingly difficult to solve on current NISQ hardware. Even with conflict graph decomposition, the resulting QUBOs from larger environments often exceed the qubit capacity and connectivity limitations of existing quantum hardware. While the D-Wave Advantage System features an impressive 5, 670 physical qubits5, its limited qubit connectivity poses significant constraints. To embed arbitrary graphical structures, D-Wave relies on chaining multiple physical qubits into logical super-qubits, a process necessitated by the underlying hardware topology. The current Pegasus topology6 5https://www.dwavesys.com/solutions-and-products/systems 6https://www.dwavesys.com/media/jwwj5z3z/14-1026ac next-generation-topology-of-dw-quantum-processors.pdf Hybrid Quantum-Classical Multi-Agent Pathfinding Table 1: Total path costs for different maps using 100 agents averaged over 25 scenarios. We show the mean and the standard deviation and indicate the best performing method in bold an the second best by underlining the result. Environment LNS2 La CAM* QP-QUBO QP-ILP empty-32-32 2164.2 88.0 2164.2 87.6 2166.5 88.9 2163.4 88.6 random-32-32-10 2239.0 113.1 2236.8 112.2 2231.3 112.2 2225.4 111.5 room-64-64-8 6301.2 307.2 6219.2 285.4 6323.4 342.4 6107.1 303.7 maze-32-32-4 4725.4 334.6 4534.3 219.8 6332.5 1272.6 5715.9 1020.6 den312d 5499.4 232.6 5492.3 224.9 4559.0 195.2 4558.9 195.2 ost003d 15164.8 782.8 15164.2 782.2 15166.2 783.0 14901.8 1452.7 den520d 17391.3 970.8 17391.0 970.8 17389.5 973.5 17378.9 3541.7 supports embeddings of up to 182 densely connected logical qubits, while the upcoming Zephyr7 topology extends this to 232. Combined with inherent noise limitations, this means that only a fraction of the full qubit count can be effectively utilized, restricting practical problem sizes to a few hundred logical qubits at most. Since path variables are added in every iteration of our algorithm, considering problems with more than 100 agents would thus lead to unreasonable results to the aforementioned limitations. As the quantum hardware matures, more large-scale experiments will be conducted in future work. Algorithm Performance In Fig. 3, we depict the performance of our two algorithms QP and QCP for four different maps of the Moving AI benchmark and varying the number of agents. All performances are shown relative to the best and worst performing configuration, i.e., vrel ..= (v vbest)/(vworst vbest) and averaged over all 25 scenarios. This maps all obtained values to the range [0, 1], where 0 indicates the best and 1 the worst performance, respectively. For QC-QUBO and QCP-QUBO we use the CONFLICT formulation and the SA solver. The top plot row shows the mean relative upper bound ˆvrel vrel(LD) on the optimality gap for a different number of agents. It is not only a measure of solution quality, but it also quantifies the problem size. The higher this gap is, the more new paths are added to our problem during pricing, since it corresponds to the RHS in our optimality criterion (5). We can see that optimally solving the RMP (QCP-ILP and QP-ILP) often has a smaller gap than generating a possibly suboptimal solution by a QUBO solver. However, the QUBO methods largely improve upon the base PPP method. Even though QP clearly outperforms QCP for optimal solving (ILP) of RMP, QCP takes way less constraints into account and is thus computationally 7https://www.dwavesys.com/media/2uznec4s/14-1056aa zephyr topology of d-wave quantum processors.pdf more efficient. This is also beneficial for the QUBO solvers, since they can easier generate good solutions for a more well-behaved problem (less constraints). The mean relative total path cost for the single MAPF instances are depicted in the middle and bottom plot row. We compare the baselines PPP, BCP, EECBS, LNS2 and La CAM* with QP solving the RMP optimally (QP-ILP) and with QUBO (QP-QUBO). If a method does not return any solution in the given time window, we set its performance to the worst other performing method, allowing for a naive anytime comparison. It is evident that QP-ILP is always optimal, while QP-ILP almost always outperforms the best performing baselines LNS2 and La CAM*, which are better for 100 agents on room-64-64-8. The (sub)optimal algorithms BCP and EECBS are always outperformed by our methods due to their bad anytime performance. It is interesting that for no map they are able to find all optimal solutions in the given time frame. Since QP-QUBO is able to outperform LNS2 and La CAM* in many cases, our method is applicable without the need of exactly solving the RMP. In Table 1, we depict the mean and standard deviation of the total path costs averaged over 25 scenarios for different maps using 100 agents. It is evident, that our method with solving the RMP exactly (QP-ILP) is almost always optimal, except for maze-32-32-4. We use PPP for initializing the set of paths which can lead to bad initial results for a large number of agents especially for small corridors appearing in the maze-like maps. Adapting the initialization method for obtaining a valid set of conflict-free paths (e.g. using LNS2 or La CAM* instead) can mitigate this effect. Further, finding valid solutions to the RMP with our proposed QUBO formulation (QP-QUBO) is often also outperforming stateof-the-art heuristics (LNS2 and La CAM*). However, an increasing agent count makes it harder to solve the resulting high-dimensional QUBO problem, sometimes leading to unsatisfactory solver performance. With quantum hardware becoming more mature in the future, we expect the Hybrid Quantum-Classical Multi-Agent Pathfinding Figure 4: Performance comparison of different QUBO formulations for four different maps with 20-agent problems along with the optimal solution, where we run QP (Sec. 5) and QCP (Sec. 5) for 30 pricing steps. We compare SA and QA by indicating the total path cost of the best sample (top) and the number of infeasible solutions (bottom), i.e. (2c) and (2d) are violated. The cost is scaled by 10 3. performance of QP-QUBO to approach the one of QP-ILP. The most time consuming step in our algorithm is to solve the QUBO problem. However, a wall-clock time comparison is difficult for quantum devices nowadays, since there is a large communication overhead when using real quantum hardware over cloud services. Nevertheless, assuming perfect communication with the QA, we can generate solutions to a QUBO with an annealing time of around 50 µs. Due to its probabilistic nature, we have to generate only few thousand samples. In the near future, this could also lead to very good wall-clock time performance. QUBO Comparison The effect of using different QUBO formulations for solving the RMP is depicted in Fig. 4. We consider four different maps (maze-32-32-4, empty-32-32, random-32-32-10, room-32-32-4) and compare the results of QP (Sec. 5) and QCP (Sec. 5) for 20 agents. This leads to varying QUBO sizes between 20 and 400, depending on the underlying map and scenario. We use SA and QA for solving HALF and CONFLICT for QP and HALF and SLACK for QCP. In the top row, we show the cost of the best sample obtained by SA and QA over all 25 scenarios, along with the optimal solution (ILP). Only feasible samples are indicated here, that is only those who adhere the constraints. For QP, we can see that finding a feasible solution with the HALF QUBO is nearly almost optimal while the solution quality of CONFLICT slightly deteriorates. Comparing HALF and SLACK for QCP, we find that both solvers are able to find optimal solutions. However, QA has problems finding feasible solutions for the first map for all QUBO formulations. The number of infeasible solutions returned by SA and QA is depicted in the bottom row. While SA always finds feasible solutions for HALF, it is easier for QA to obtain feasibility with CONFLICT. This is due to the sparsity advantage of CONFLICT over HALF and the corresponding decomposition into independent subproblems. Since the hardware topology of current quantum computers is strongly limited, such properties have a large effect on the solution quality obtained. For QCP, we find that QA finds slightly more feasible solutions with HALF than SLACK. However, we note that only a few separation steps happened and thus only a small number of constraints have been generated. Using more pricing steps or scaling up the problem size would lead to way more included constraints, making the SLACK QUBO infeasible to solve. 6. Conclusion In this paper, we presented two novel optimal hybrid quantum-classical MAPF algorithms. Extending the classical approach of Branch-and-Price-and-Cut, we circumvent exponentially many branching steps. Paths and constraints are iteratively added to our problem which is then solved by utilizing different QUBO formulations, making our algorithms suited for quantum computing. On an ideal adiabatic quantum processor, the QUBO subproblems are solved to global optimality. We proof a generalization of the classically known criterion telling us that our currently available path set is optimal. Experiments indicate good performance of our algorithm compared to state-of-the-art MAPF algorithms. Evaluating different QUBO formulations indicates the superiority of our approaches over previously presented methods. With these insights, we conclude that the hardware-aware design of the QUBO problems can already lead to an advantage on near-term quantum devices. Hybrid Quantum-Classical Multi-Agent Pathfinding Acknowledgments This research has been funded by the Federal Ministry of Education and Research of Germany and the state of North Rhine Westphalia as part of the Lamarr Institute for Machine Learning and Artificial Intelligence. Impact Statement This paper presents work whose goal is to advance the fields of Optimization and Artificial Intelligence. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here. Albash, T. and Lidar, D. A. Adiabatic quantum computation. Reviews of Modern Physics, 90(1):015002, 2018. Ali, M., Ahmed, H., Malik, M. H., and Khalique, A. Multicommodity information flow through quantum annealer. Quantum Information Processing, 23(9):313, 2024. Choudhury, S., Solovey, K., Kochenderfer, M., and Pavone, M. Coordinated multi-agent pathfinding for drones and trucks over road networks. In Proceedings of the 21th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). D-Wave Systems Inc. Advantage system performance update. Technical Report 141054A-A, D-Wave Systems Inc., 2021. URL https://www.dwavequantum.com/media/ kjtlcemb/14-1054a-a_advantage_system_ performance_update.pdf. Technical Report. D-Wave Systems Inc. D-wave ocean sdk. https: //docs.ocean.dwavesys.com/, 2025. Version 8.1.0. Dantzig, G. B. Linear programming. Operations research, 50(1). Davies, E. and Kalidindi, P. Quantum algorithms for drone mission planning. In Proceedings of Quantum Technologies for Defence and Security. Desrosiers, J. and L ubbecke, M. E. Branch-price-and-cut algorithms. Encyclopedia of Operations Research and Management Science. Doole, M., Ellerbroek, J., and Hoekstra, J. Estimation of traffic density from drone-based delivery in very low level urban airspace. Journal of Air Transport Management, 88:101862, 2020. Farhi, E., Goldstone, J., and Gutmann, S. A quantum approximate optimization algorithm. ar Xiv preprint ar Xiv:1411.4028, 2014. Geoffrion, A. M. Lagrangean relaxation for integer programming. In Approaches to integer programming, pp. 82 114. Springer, 2009. Gerlach, T. and Piatkowski, N. Dynamic range reduction via branch-and-bound. ar Xiv preprint ar Xiv:2409.10863, 2024. Gerlach, T., Knipp, S., Biesner, D., Emmanouilidis, S., Hauber, K., and Piatkowski, N. Fpga-placement via quantum annealing. In Proceedings of the 32nd ACM/SIGDA International Symposium on Field Programmable Gate Arrays (ISFPGA), pp. 43. ACM, 2024. Huang, T., Li, J., Koenig, S., and Dilkina, B. Anytime multi-agent path finding via machine learning-guided large neighborhood search. In Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI). Huang, Z., Li, Q., Zhao, J., and Song, M. Variational quantum algorithm applied to collision avoidance of unmanned aerial vehicles. Entropy, 24(11):1685, 2022. Johnson, M., Amin, M., Gildert, S., Lanting, T., Hamze, F., Dickson, N., Harris, R., Berkley, A., Johansson, J., Bunyk, P., Chapple, E., Enderud, C., Hilton, J., Karimi, K., Ladizinsky, E., Ladizinsky, N., Oh, T., Perminov, I., Rich, C., Thom, M., Tolkacheva, E., Truncik, C., Uchaikin, S., Wang, J., Wilson, B., and Rose, G. Quantum annealing with manufactured spins. Nature, 473 (7346), 2011. Lam, E., Harabor, D. D., Stuckey, P. J., and Li, J. Exact anytime multi-agent path finding using branch-and-cutand-price and large neighborhood search. In Proceedings of the 33rd International Conference on Automated Planning and Scheduling (ICAPS), a. Lam, E., Le Bodic, P., Harabor, D. D., and Stuckey, P. J. Branch-and-cut-and-price for multi-agent pathfinding. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), b. Li, J., Chen, Z., Harabor, D., Stuckey, P. J., and Koenig, S. Anytime multi-agent path finding via large neighborhood search. In Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI), a. Li, J., Chen, Z., Harabor, D., Stuckey, P. J., and Koenig, S. Mapf-lns2: Fast repairing for multi-agent path finding via large neighborhood search. In Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI), b. Li, J., Tinka, A., Kiesel, S., Durham, J. W., Kumar, T. S., and Koenig, S. Lifelong multi-agent path finding in largescale warehouses. In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI), c. Hybrid Quantum-Classical Multi-Agent Pathfinding Li, J., Ruml, W., and Koenig, S. Eecbs: A boundedsuboptimal search for multi-agent path finding. In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI), pp. 12353. AAAI Press, 2021. Li, Q., Huang, Z., Jiang, W., Tang, Z., and Song, M. Quantum algorithms using infeasible solution constraints for collision-avoidance route planning. IEEE Transactions on Consumer Electronics, 2024. L ubbecke, M. E. Column generation. Encyclopedia of Operations Research and Management Science, 17. Ma, H., Harabor, D., Stuckey, P. J., Li, J., and Koenig, S. Searching with consistent prioritization for multi-agent path finding. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI). Mart ın, M. P. and Martin, S. Unsplittable multi-commodity flow problem via quantum computing. In Proceedings of the 9th International Conference on Control, Decision and Information Technologies (Co DIT). M ucke, S. and Gerlach, T. Efficient light source placement using quantum computing. In Proceedings of the Conference on Lernen, Wissen, Daten, Analysen (LWDA). CEUR-WS.org, 2023. M ucke, S., Gerlach, T., and Piatkowski, N. Optimumpreserving qubo parameter compression. Quantum Machine Intelligence, 7(1). Okumura, K. Improving lacam for scalable eventually optimal multi-agent pathfinding. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI), pp. 243, 2023a. Okumura, K. Lacam: Search-based algorithm for quick multi-agent pathfinding. In Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI), pp. 11655. AAAI Press, 2023b. Preskill, J. Quantum computing in the nisq era and beyond. Quantum, 2:79, 2018. Punnen, A. P. The quadratic unconstrained binary optimization problem. Springer, 2022. R onnberg, E. and Larsson, T. An integer optimality condition for column generation on zero one linear programs. Discrete Optimization, 31. Sadykov, R., Vanderbeck, F., Pessoa, A., Tahiri, I., and Uchoa, E. Primal heuristics for branch and price: The assets of diving methods. INFORMS Journal on Computing, 31(2). Sharon, G., Stern, R., Felner, A., and Sturtevant, N. Conflictbased search for optimal multi-agent path finding. In Proceedings of the 26th AAAI Conference on Artificial Intelligence (AAAI). Stern, R., Sturtevant, N., Felner, A., Koenig, S., Ma, H., Walker, T., Li, J., Atzmon, D., Cohen, L., Kumar, T., et al. Multi-agent pathfinding: Definitions, variants, and benchmarks. In Proceedings of the 12th International Symposium on Combinatorial Search (So CS). Stollenwerk, T., Lobe, E., and Jung, M. Flight gate assignment with a quantum annealer. In International Workshop on Quantum Technology and Optimization Problems, a. Stollenwerk, T., O Gorman, B., Venturelli, D., Mandra, S., Rodionova, O., Ng, H., Sridhar, B., Rieffel, E. G., and Biswas, R. Quantum annealing applied to de-conflicting optimal trajectories for air traffic management. IEEE Transactions on Intelligent Transportation Systems, 21 (1), b. Tarquini, S., Dragoni, D., Vandelli, M., and Tudisco, F. Testing quantum and simulated annealers on the drone delivery packing problem. ar Xiv preprint ar Xiv:2406.08430, 2024. Wolsey, L. A. Integer programming. John Wiley & Sons, 2020. Yen, J. Y. Finding the k shortest loopless paths in a network. Management Science, 17(11). Zhang, Y., Zhang, R., and Potter, A. C. Qed driven qaoa for network-flow optimization. Quantum, 5:510, 2021.