Real-Time Heuristic Search with LTLf Goals

Jaime Middleton1,4, Rodrigo Toro Icarte1,2, Jorge A. Baier1,2,3
1Department of Computer Science, Pontificia Universidad Católica de Chile, Santiago, Chile
2Centro Nacional de Inteligencia Artificial CENIA, Santiago, Chile
3Instituto Milenio Fundamentos de los Datos, Santiago, Chile
4Tucar SpA, Santiago, Chile
jamiddleton@uc.cl, rodrigo.toro@ing.puc.cl, jabaier@ing.puc.cl

In Real-Time Heuristic Search (RTHS) we are given a search graph G and a heuristic, and the objective is to find a path from a given start node to a given goal node in G. As such, one does not impose any trajectory constraints on the path besides reaching the goal. In this paper we consider a version of RTHS in which temporally extended goals can be defined over the form of the path. Such goals are specified in Linear Temporal Logic over Finite Traces (LTLf), an expressive language that has been considered in many other frameworks, such as Automated Planning, Synthesis, and Reinforcement Learning, but has not yet been studied in the context of RTHS. We propose a general automata-theoretic approach for RTHS, whereby LTLf goals are supported by searching over the cross product of the search graph and the automaton for the LTLf goal; specifically, we describe LTL-LRTA*, a version of LSS-LRTA*. Second, we propose an approach to produce heuristics for LTLf goals, based on existing goal-dependent heuristics. Finally, we propose a greedy strategy for RTHS with LTLf goals, which focuses search on making progress over the structure of the automaton; this yields LTL-LRTA*A. In our experimental evaluation over standard benchmarks we show that LTL-LRTA*A may outperform LTL-LRTA* substantially for a variety of LTLf goals.

1 Introduction

Real-time heuristic search (RTHS) [Korf, 1990] is an approach to solving search problems by interleaving search and execution. It is important for applications in which there is little time to search before an action has to be executed. Example applications include videogames and highly dynamic robotics. As originally defined, RTHS consists of reaching a goal state on a given search graph. However, many real-time applications may require agents to satisfy temporally extended goals (e.g., eventually fetch the key and then reach the door; eventually board the spaceship while avoiding locations with mud). Such kinds of goals, which impose constraints over the trajectory of states traversed by the agent, are typically represented in linear temporal logic (LTL) in areas such as automated planning [Cresswell and Coddington, 2004; Baier and McIlraith, 2006; Kabanza and Thiébaux, 2005; Gerevini et al., 2009; Simon and Röger, 2015], synthesis [De Giacomo and Vardi, 2015; Bonet et al., 2020], and reinforcement learning [Toro Icarte et al., 2018; Camacho et al., 2019; Vaezipoor et al., 2021]. In this paper¹ we consider adding goals expressed in linear temporal logic over finite traces (LTLf) [De Giacomo and Vardi, 2013] to RTHS. Thus, we assume we are given a search graph G, a vertex of G at which the agent is initially located, and an LTLf formula φ specifying legal trajectories for the agent. The problem consists of moving the agent along one such trajectory. We take a standard automata-theoretic approach in which we use the fact that an LTLf formula φ has a corresponding finite-state automaton Aφ which accepts the traces defined by φ.
We consider the cross product between the search space and the automaton to propose LTL-LRTA*, a version of LSS-LRTA* [Koenig and Sun, 2009] that searches over the cross-product between the search graph and Aφ. But simply searching over the cross-product representation does not address the important problem of guiding real-time search. To that end, we present two orthogonal contributions aimed at guiding search for LTLf goals. First, we present a method to construct a heuristic function for any LTLf goal, assuming we have a goal-independent heuristic ĥ in hand, where ĥ(s, g) estimates the cost of a path from a state s to a given goal state g. Second, we propose automata subgoaling, a novel approach to carrying out search within the RTHS algorithm that prioritizes making progress in the automaton for the LTLf formula. This is accomplished by ordering the search frontier by the distance Δ(q) from the current automaton state q to an accepting state of Aφ, and using A*'s priority function f = g + h as a tie breaker rather than as the main guiding function. When applying this principle to LTL-LRTA* we obtain LTL-LRTA*A. We prove that LTL-LRTA*A is complete under standard assumptions when the goal is such that its automaton's graph structure has no loops, excluding self-loops.

Since no algorithms for RTHS with LTLf goals exist, the objective of our experimental evaluation was to find the strengths and weaknesses of the algorithms we propose. For our experiments we use standard grids, StarCraft maps, and mazes, with a number of LTLf goals. We show that LTL-LRTA*A may outperform LTL-LRTA* by a significant margin. The relative performance of both algorithms depends on the quality of the heuristic being used. We conclude that using the information captured by the automaton for the goal, either by exploiting the automaton to build a heuristic or by using automata subgoaling, is important.

Although we are the first to propose using the structure of the goal's automaton to guide search in RTHS, this idea, as a general concept, is not new. Indeed, it has been considered before in planning with LTL goals [Kabanza and Thiébaux, 2005], but in a way that is not compatible with RTHS. Furthermore, the idea of using subgoaling has also been considered before in regular RTHS [Bulitko and Björnsson, 2009; Hernández and Baier, 2011]. The way in which we incorporate subgoaling, by changing the priority of Open, is, however, fundamentally different from previous work. While in this paper we incorporate our techniques into LSS-LRTA*, a generalization of LRTA* [Korf, 1990], our principles are general and can be applied to a number of other RTHS algorithms.

¹Our source code and appendix are publicly available at https://github.com/Jamidd/RTHS-with-LTLf-Goals

2 Real-Time Heuristic Search

Real-time heuristic search (RTHS) is an approach to solving search problems [Korf, 1990]. A key characteristic of the approach is that the amount of computation for decision making is bounded by a constant B, after which one or more actions may be performed. If the problem has not been solved yet, B more units of computation are allowed for a new decision-making episode. The loop repeats until the problem is solved.
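The decision-making loop just described is common to the whole RTHS family. As a reference point for the algorithms discussed later, here is a minimal Python sketch of that generic loop; the names bounded_search, update_heuristic, and next_moves are placeholders for the components that concrete algorithms such as LSS-LRTA* instantiate, not part of any specific implementation.

```python
def rths_loop(start, is_goal, bounded_search, update_heuristic, next_moves, budget):
    """Generic RTHS skeleton: interleave bounded search and execution.

    bounded_search(s, budget) -> information gathered by searching around s,
    update_heuristic(info)    -> improves the heuristic using that information,
    next_moves(s, info)       -> the state(s) the agent commits to move through.
    All three are algorithm-specific placeholders.
    """
    s = start
    while not is_goal(s):
        info = bounded_search(s, budget)   # at most `budget` units of computation
        update_heuristic(info)             # learn from the local search
        for t in next_moves(s, info):      # then execute one or more actions
            s = t
    return s
```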
A final-state search problem P = (S, E, c, sstart, G) is a tuple where (S, E, c) is a search graph: S is a set of states, E ⊆ S × S is a finite set of directed edges, c : S × S → (0, ∞] is a cost function, sstart ∈ S is the initial state, and G ⊆ S is a set of goal states. We denote the set of neighbors of state s as N(s), formally defined as N(s) = {t | (s, t) ∈ E}. A path π from s1 to sn is a sequence of states (s1, s2, . . . , sn) such that (si, si+1) ∈ E for all i ∈ {1, . . . , n−1}. Any path from sstart to a goal state in G is a solution to the search problem P = (S, E, c, sstart, G). The cost of path π, denoted by c(π), is the cumulative cost of all the edges traversed in π; that is, c(π) = Σ_{i=1}^{n−1} c(si, si+1). State s is a dead end if there is no path from s to a state in G.

To help the agent solve a search problem, RTHS algorithms use a heuristic function h : S → [0, ∞], such that h(s) estimates the cost of a path from s to a goal state in G. A heuristic function h is consistent iff h(s) = 0 for every state s ∈ G and h(s) ≤ c(s, t) + h(t) for every (s, t) ∈ E. The perfect heuristic h* is a heuristic such that h*(s) denotes the cost of a minimum-cost path between s and a state in G. Heuristic h is admissible iff h(s) ≤ h*(s) for every s ∈ S. Consistency implies admissibility.

A typical RTHS algorithm runs a main loop in which search is carried out in the vicinity of the current state. Such a search may not exceed a given computation bound, usually given in terms of the maximum number of states that can be expanded during search. The information gathered during search is used (1) to decide which state the agent will move to, and (2) to update the heuristic function. This process repeats until a goal state is found.

3 Search Problems with LTLf Goals

In this section, we formally define search problems with LTLf goals. To do so, we first describe the semantics of LTLf [Bienvenu et al., 2006; De Giacomo and Vardi, 2013]. We then show how to transform LTLf formulas into goals and heuristics for a search problem. All the theorems from this section are proven in Appendix A.

3.1 LTLf Goals and Automata

LTLf extends propositional logic with temporal operators such as next (◯), weak-next (●), and until (U). As in propositional logic, LTLf formulas are defined over a set of propositional symbols P. However, unlike propositional logic, LTLf formulas are satisfied with respect to traces, which are sequences of truth-value assignments to the propositions in P. To use LTLf in real-time search, we define the semantics of LTLf with respect to paths on a search graph. To that end, we assume the agent is capable of sensing the truth value of propositions in a given state. We formalize this notion using a labelling function L : S → 2^P, which maps each state s ∈ S to the set of propositions in P that are true in s. Given a path π = (s0, s1, . . . , sn) in a search graph, we say that π satisfies an LTLf formula φ with respect to L, denoted π ⊨ φ, iff π[0] ⊨ φ, where, for every i ∈ {0, . . . , n}:
1. π[i] ⊨ p iff p ∈ L(si), where p ∈ P.
2. π[i] ⊨ ¬φ iff π[i] ⊭ φ.
3. π[i] ⊨ (φ ∧ ψ) iff π[i] ⊨ φ and π[i] ⊨ ψ.
4. π[i] ⊨ ◯φ iff i < n and π[i+1] ⊨ φ.
5. π[i] ⊨ ●φ iff i < n implies that π[i+1] ⊨ φ.
6. π[i] ⊨ φ U ψ iff π[k] ⊨ ψ for some k ≥ i and π[j] ⊨ φ for every j ∈ {i, . . . , k−1}.

The connectives ∨ (or), □ (always), and ◊ (eventually) can be defined in terms of the above, as (φ ∨ ψ) ≝ ¬(¬φ ∧ ¬ψ), ◊φ ≝ true U φ, and □φ ≝ ¬◊¬φ.
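To make these semantics concrete, the following minimal Python sketch evaluates an LTLf formula over a labeled path. The tuple-based formula representation and the function names are our own illustrative choices, not part of the paper.

```python
def holds(phi, labels, i=0):
    """Check whether the path suffix starting at position i satisfies phi.

    `labels` is the list [L(s0), ..., L(sn)] of proposition sets along the path;
    formulas are nested tuples, e.g. ('until', ('atom', 'a'), ('atom', 'b')).
    """
    n = len(labels) - 1                      # index of the last state
    op = phi[0]
    if op == 'true':
        return True
    if op == 'atom':                         # p holds iff p is in L(si)
        return phi[1] in labels[i]
    if op == 'not':
        return not holds(phi[1], labels, i)
    if op == 'and':
        return holds(phi[1], labels, i) and holds(phi[2], labels, i)
    if op == 'next':                         # there must be a next position
        return i < n and holds(phi[1], labels, i + 1)
    if op == 'wnext':                        # vacuously true at the last position
        return i == n or holds(phi[1], labels, i + 1)
    if op == 'until':                        # psi at some k >= i, phi before it
        return any(holds(phi[2], labels, k) and
                   all(holds(phi[1], labels, j) for j in range(i, k))
                   for k in range(i, n + 1))
    raise ValueError(f"unknown operator: {op}")

def eventually(phi):
    """Derived connective: eventually(phi) == true U phi."""
    return ('until', ('true',), phi)

# Example: the path (s0, s1) with a true only in s1 satisfies "eventually a".
assert holds(eventually(('atom', 'a')), [frozenset(), frozenset({'a'})])
```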
A useful feature of LTLf is that any formula can be translated into an equivalent deterministic finite-state automaton (DFA) using standard libraries, such as Spot [Duret-Lutz et al., 2016] or LTLf2DFA [Fuggitti, 2019]. In the context of real-time search, translating an LTLf formula φ into a DFA results in a DFA Aφ = (Q, 2^P, δ, qstart, F), where Q is a finite set of automaton states, 2^P is the power set of the propositions in P (i.e., the range of the labelling function), δ : Q × 2^P → Q is the transition function, qstart ∈ Q is the initial state, and F ⊆ Q is the set of accepting states. Then, a path π = (s1, . . . , sn) in the search graph is accepted by Aφ iff qn ∈ F, where q1 = qstart and qi+1 = δ(qi, L(si+1)) for all i ∈ {1, . . . , n−1}. In other words, Aφ accepts π iff, starting from qstart and updating the DFA state according to δ and π, the DFA ends in an accepting state. Finally, we note that π ⊨ φ iff Aφ accepts π, by construction. We could say that the i-th automaton state represents the i-th subproblem to be solved in order to complete the goal. Below we use the notation Δ(q) to represent the minimum number of transitions required to reach an accepting state in Aφ from q. As such, Δ(q) = 0 if q ∈ F, and Δ(q) = ∞ if there is no path in Aφ from q to an accepting state.
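For illustration, the following Python sketch (with hypothetical names; not the paper's code) implements the acceptance check just defined and computes Δ(q) for every automaton state by a breadth-first search over the reversed transition graph.

```python
from collections import deque

class DFA:
    """DFA A_phi = (Q, 2^P, delta, q_start, F); delta maps an automaton
    state and a set of propositions to the next automaton state."""
    def __init__(self, states, delta, q_start, accepting):
        self.states, self.delta = states, delta
        self.q_start, self.accepting = q_start, accepting

    def accepts(self, path, L):
        """Accept path (s1, ..., sn) iff qn is accepting, where q1 = q_start
        and q_{i+1} = delta(q_i, L(s_{i+1}))."""
        q = self.q_start
        for s in path[1:]:
            q = self.delta(q, L(s))
        return q in self.accepting

def distance_to_accepting(dfa, label_alphabet):
    """Delta(q): minimum number of transitions from q to an accepting state
    (inf if unreachable). `label_alphabet` enumerates the label sets in 2^P."""
    # Reverse adjacency: which states reach q' in one transition?
    rev = {q: set() for q in dfa.states}
    for q in dfa.states:
        for sigma in label_alphabet:
            rev[dfa.delta(q, sigma)].add(q)
    # BFS backward from the accepting states.
    dist = {q: float('inf') for q in dfa.states}
    frontier = deque()
    for q in dfa.accepting:                 # Delta(q) = 0 for accepting states
        dist[q] = 0
        frontier.append(q)
    while frontier:
        q = frontier.popleft()
        for p in rev[q]:
            if dist[p] == float('inf'):
                dist[p] = dist[q] + 1
                frontier.append(p)
    return dist
```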
3.2 Running Example

As a simple example, consider the grid world shown in Figure 1a, which includes an agent, walls, and four marked locations: a, b, c, and d. We might consider that each marked location represents a subtask that the agent can solve by reaching that location. In this domain, we can define the set of propositional symbols P = {a, b, c, d} and its corresponding labelling function L such that p ∈ L(s) iff the proposition p is true in state s (i.e., the agent is at location p). Now we can consider different LTLf goals for the agent in this domain. For instance, ◊a (i.e., eventually a) requires that the agent reach a cell marked with a. The previous task might be considered a standard goal in RTHS. However, LTLf goals allow us to define more complex, temporally extended goals. Some examples include solving task a and then task b (i.e., ◊(a ∧ ◊b)), avoiding states in which a certain property d holds (i.e., □¬d), or combinations of LTLf goals (e.g., ◊(a ∧ ◊b) ∧ □¬d). Figures 1b and 1c show examples of two more complex LTLf goals with their corresponding DFAs. The first LTLf goal asks the agent to solve all the subtasks in alphabetical order. The second LTLf goal is similar to the first but also forces the agent to solve the subtasks in strict order; that is, if the agent solves task b before solving task a, then the agent fails at solving the overall task.

3.3 LTLf Search Problems (LSP)

Now we formally define a search problem with an LTLf goal. Intuitively, a search problem with an LTLf goal is like any other search problem, except that the objective for the agent is to traverse a sequence of states that satisfies an LTLf formula instead of reaching a particular goal state.

Definition 1 (LTLf Search Problem (LSP)). An LTLf search problem P = (S, E, c, sstart, P, φ, L) is a tuple, where (S, E, c) is a search graph, sstart ∈ S is the start state, φ is an LTLf formula over a set of propositions P, and L : S → 2^P is a labelling function.

Now we formally define what a solution to an LSP is.

Definition 2 (Solution to an LSP). Let P be an LSP. A path π = (s1, . . . , sn) is a solution to P iff π is a path over the search graph (S, E, c) from the initial state (i.e., s1 = sstart) and π ⊨ φ with respect to L. Moreover, the solution cost of π is c(π) = Σ_{i=1}^{n−1} c(si, si+1).

A standard technique for solving search problems with temporally extended goals is to exploit the cross-product between the automaton and the search graph [Bacchus et al., 1997; Baier and McIlraith, 2006]. By doing so we can compile away the temporal goal, transforming the LSP into a regular search problem. An important advantage of this transformation is that it allows solving an LSP with any off-the-shelf RTHS algorithm. The formal definition of a cross-product LSP and the equivalence between solving the LSP and the corresponding cLSP follow.

Theorem 1 (Cross-product LSP (cLSP)). Given an LSP P = (S, E, c, sstart, P, φ, L), let the DFA for φ be Aφ = (Q, 2^P, δ, qstart, F), and let Pφ be the cross-product LSP (cLSP) defined as (Sφ, Eφ, cφ, sstartφ, Gφ) such that:
1. Sφ = S × Q,
2. ((s, q), (t, r)) ∈ Eφ iff (s, t) ∈ E and r = δ(q, L(t)),
3. cφ((s, q), (t, r)) = c(s, t),
4. sstartφ = (sstart, qstart), and
5. (s, q) ∈ Gφ iff q ∈ F.
Then path π = (s1, . . . , sn) is a solution to P iff πφ = ((s1, q1), . . . , (sn, qn)) is a solution to Pφ, where q1 = qstart, s1 = sstart, and qi+1 = δ(qi, L(si+1)) for every i ∈ {1, . . . , n−1}. In addition, c(π) = cφ(πφ).

3.4 Heuristics for LSPs

The cross-product transformation allows us to solve any LSP with an off-the-shelf RTHS algorithm. However, heuristics are key to the performance of any heuristic search algorithm, including RTHS algorithms. While users of heuristic search algorithms are aware of methods to construct heuristics for a given standard search problem P with final-state goals, these techniques may not be readily applicable for building heuristics for LSPs. In this section we present a simple approach to generating a heuristic for any given LTLf formula from an existing final-state heuristic for the same underlying problem. More specifically, assume that, given a search graph (S, E) and a cost function c, we have a heuristic ĥ(s, t) that estimates the cost of a path from s to t. Henceforth we call such heuristics goal-independent, since they receive the goal as a parameter. For many search problems with fixed goal states (e.g., the 15-puzzle or the Pancake problem), goal-independent heuristics can be easily generated after a simple state transformation. These state transformations may even be applicable to pattern-database heuristics, which are goal-dependent. In addition, the domain-independent heuristics typically used in planning [Bonet and Geffner, 2000; Hoffmann and Nebel, 2001; Helmert, 2006] are goal-independent. Now we show how, given a heuristic ĥ(s, t) for search graph (S, E, c) and an LTLf formula φ, we can construct a heuristic hφ.

Definition 3 (The cross-product heuristic). Given an LSP P = (S, E, c, sstart, P, φ, L) and a goal-independent heuristic ĥ : S × S → [0, ∞) for G = (S, E, c), we define the cross-product heuristic hφ : S × Q → [0, ∞) as follows. Let Aφ = (Q, 2^P, δ, qstart, F) be the DFA representation of the LTLf formula φ, and let Sq ⊆ S be the set of all states that cause a transition from q ∈ Q to some q′ ∈ Q such that q′ ≠ q while an accepting state can still be reached from q′ in Aφ. That is, Sq = {s ∈ S | q′ = δ(q, L(s)), q′ ≠ q, Δ(q′) < ∞}. Then:
• hφ(s, q) = 0 if q ∈ F;
• hφ(s, q) = ∞ if Δ(q) = ∞;
• hφ(s, q) = min_{t ∈ Sq} [c̄(s, t) + hφ(t, δ(q, L(t)))] otherwise;
where c̄(s, t) = max(ĥ(s, t), ϵ) and ϵ is the minimum cost of any transition in G, that is, ϵ = min{c(s, t) | (s, t) ∈ E}.

Intuitively, hφ(s, q) estimates the cost of going from (s, q) to a goal state (s′, q′) ∈ Gφ by solving a simplification of the LSP P. This simplification considers only the subset of states Sq ⊆ S that can change the current DFA state q. The cost of going from s ∈ S to t ∈ Sq, which might not be connected in the original LSP, is then given by the heuristic ĥ(s, t). The system of equations from Definition 3 simply states the standard constraints that define a solution to the shortest-path problem that results from solving this simplified version of P [Wolsey, 1998]. Note that we introduced ϵ to ensure that all the costs are greater than zero in the simplified problem. This ensures that any solution to the system of equations will indeed reach an accepting state (if that state is reachable) in the simplified problem.

[Figure 1: Running example with two conceivable LTLf goals. (a) A grid world in which propositions a, b, c, d are true in certain cells. (b) DFA for ◊(a ∧ ◊(b ∧ ◊(c ∧ ◊d))), which requires the agent to solve all subtasks in alphabetical order. (c) DFA for ◊d ∧ (¬(b ∨ c) U a) ∧ (¬c U b) ∧ ◊(a ∧ ◊(b ∧ ◊c)), which is similar to the task of (b) but forces the agent to solve the subtasks in strict order. Transitions in the DFAs have been simplified using the fact that there is no state in which two propositions are true. In addition, states containing no outgoing transitions are assumed to have a self-loop labelled true.]

As an example, let us consider two possible cross-product heuristics for our running example (Figure 1a). In this domain, the cost of performing any action is equal to 1. Therefore, a simple heuristic for this problem is ĥ1(s, t) = 1 if s ≠ t, and zero otherwise. Let s be the current state of the agent in Figure 1a and φ be the LTLf formula from Figure 1b. Here, the cross-product heuristic will simply sum ĥ1(s, sa), ĥ1(sa, sb), ĥ1(sb, sc), and ĥ1(sc, sd). Then, h1φ(s, qstart) = 4 because, to reach an accepting state in the simplified problem, the agent has to hit locations a, b, c, and d. We note that ĥ1 is a particularly naive heuristic for this problem. A better option is to use the Manhattan distance ĥM to construct the cross-product heuristic hMφ. The Manhattan distance ĥM(s, t) estimates the cost of going from s to t by counting the number of moves needed to go from s to t if we ignore all obstacles. In the running example, hMφ(s, qstart) = 48 because that is the sum of the Manhattan distances to go through the states s, sa, sb, sc, and finally sd.

On the theoretical side, we now show that the cross-product heuristic hφ inherits properties from its base heuristic ĥ. In particular, if ĥ is admissible (resp. consistent) for G, then hφ is admissible (resp. consistent) for Pφ. As a result, the two previously discussed heuristics, namely h1φ and hMφ, are consistent for the running example.

Theorem 2 (Properties of the cross-product heuristic). Let P = (S, E, c, sstart, P, φ, L) be an LSP, Pφ be the cLSP for P, and ĥ : S × S → [0, ∞) be a heuristic for G = (S, E, c). Let hφ be the cross-product heuristic constructed from P and ĥ. Then, the following two properties hold. First, if ĥ is admissible for G then hφ is admissible for Pφ. And second, if ĥ is consistent for G then hφ is consistent for Pφ.
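As a concrete illustration of Definition 3, the following Python sketch computes hφ over the simplified problem. The paper solves the resulting shortest-path problem with Dijkstra's algorithm; here we use a simpler Bellman-Ford-style fixed point, which computes the same values since all costs are at least ϵ > 0. The data structures and names are our own assumptions, not the authors' implementation.

```python
import math

def cross_product_heuristic(Q, F, delta, L, marked, h_hat, eps):
    """Compute h_phi(t, q) for every marked state t and DFA state q by a
    fixed point over the simplified problem of Definition 3.

    Q, F, delta : DFA states, accepting states, and transition function
    L           : labelling function over search states
    marked      : states that can cause some DFA transition (union of the Sq)
    h_hat       : goal-independent heuristic h^(s, t)
    eps         : minimum edge cost in the search graph
    """
    dist = {(t, q): (0.0 if q in F else math.inf) for q in Q for t in marked}
    changed = True
    while changed:                          # converges: all costs are >= eps > 0
        changed = False
        for q in Q:
            if q in F:
                continue
            for s in marked:
                best = dist[(s, q)]
                for t in marked:
                    q2 = delta(q, L(t))
                    if q2 == q:             # t is not in Sq: it does not change q
                        continue
                    # If Delta(q2) is infinite, dist[(t, q2)] stays inf,
                    # so such t never improves the minimum (as in Sq's definition).
                    best = min(best, max(h_hat(s, t), eps) + dist[(t, q2)])
                if best < dist[(s, q)]:
                    dist[(s, q)] = best
                    changed = True
    return dist

def h_phi(s, q, dist, F, delta, L, marked, h_hat, eps):
    """Evaluate h_phi at an arbitrary search state s using the table above."""
    if q in F:
        return 0.0
    vals = [max(h_hat(s, t), eps) + dist[(t, delta(q, L(t)))]
            for t in marked if delta(q, L(t)) != q]
    return min(vals, default=math.inf)
```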
On the practical side, we note that the cost of computing the cross-product heuristic hφ(s, q) depends on the cardinality of Sq. In particular, the complexity of computing hφ(s, q) by solving the resulting shortest-path problem with Dijkstra's algorithm is O(v²), where v = 1 + Σ_{q∈Q} |Sq|. In some problems, such as grid worlds, |Sq| is much smaller than |S| and, thus, hφ can be precomputed for all cross-product states (s, q) ∈ S × Q in just a few seconds. However, computing hφ might be a challenge in highly combinatorial problems, such as the 15-puzzle. For those cases, we devise two possibilities. One option is to compute hφ using methods other than Dijkstra's. For instance, h1φ can be computed by only looking at the DFA, meaning that the complexity of computing h1φ can go down to O(|Q|²) regardless of the size of the Sq sets. Another option is to approximate hφ by removing the recursion from Definition 3. That is, we define h̃φ(s, q) exactly as hφ(s, q), except that when q ∉ F and Δ(q) < ∞, then h̃φ(s, q) = min{c̄(s, t) | t ∈ Sq}. We will refer to h̃φ as a myopic heuristic. h̃φ has two nice properties. First, Theorem 2 also applies to h̃φ. Second, the complexity of computing h̃φ(s, q) is just O(|Sq|). However, h̃φ is weaker than hφ, since h̃φ(s, q) ≤ hφ(s, q) for all (s, q) ∈ S × Q. We formally discuss all these properties in Appendix A.3.

Algorithm 1: LTL-LRTA*, a simple variant of LSS-LRTA* that solves an LSP by carrying out search over the cross-product representation.
Input: An LSP P = (S, E, c, sstart, P, φ, L), the DFA for φ defined as (Q, 2^P, δ, qstart, F), a heuristic h : S × Q → [0, ∞), and a positive integer k
Effect: The agent is moved through a path from sstart to a goal state in Gφ if a path exists
1 (snow, qnow) ← (sstart, qstart)
2 while qnow ∉ F do
3   Open, Closed ← Bounded-A*(snow, qnow)
4   if Open is empty then
5     print "no solution"
6     abort execution
7   π ← path from (snow, qnow) to the state at the top of Open
8   Dijkstra-Update(Open, Closed)
9   for each (s, q) in π do
10    (snow, qnow) ← (s, q)
11    Move agent to snow

4 Real-Time Search for LSPs

We have shown that an LSP can be solved by solving its equivalent cLSP, which does not have a temporally extended goal. Furthermore, we described an approach to computing heuristics to guide search for the cLSP. In this section we present two RTHS approaches for solving cLSPs. The first one, our baseline approach, is a straightforward modification of LSS-LRTA*. The second one uses a greedy technique that we call automata subgoaling, which, as we show later, may lead to significantly improved performance when solving cLSPs. We note that all the theoretical results from this section are proven in Appendix B.

4.1 RTHS over the Cross-Product State Space

The first technique we propose follows directly from Theorem 1, which establishes the equivalence between searching over an LSP and over its cross product with the automaton for the LTLf formula. Algorithm 1 describes LTL-LRTA*, a simple variant of LSS-LRTA* [Koenig and Sun, 2009] which receives an LSP P and searches over the corresponding cLSP. LTL-LRTA* takes as input an LSP P, the DFA for the LTLf goal φ, a heuristic function h : S × Q → [0, ∞) over the cross-product states, and a positive integer k.
Even though we assume the agent moves over the graph (S, E), in its main loop LTL-LRTA* maintains variables snow and qnow such that (snow, qnow) is the state of the corresponding cLSP where the agent currently is. The only algorithmic difference between LTL-LRTA* and LSS-LRTA* is that search is carried out over the cross-product representation. Specifically, for the search part of the algorithm, LTL-LRTA* invokes Bounded-A* in Line 3. Bounded-A* takes (snow, qnow) as the root of the search and expands nodes using the transition function of automaton Aφ; as such, to compute the neighbors of (s, q), it simply iterates over each neighbor t of s, generating a state of the form (t, δ(q, L(t))). As in any standard A* implementation, the priority used in the Open priority queue is given by f = g + h, where g(s, q) is the cost of the best path found so far to state (s, q), and h is the heuristic function. Bounded-A* stops after k expansions have been made or when the goal is at the top of Open. Finally, for decision-making, just like LSS-LRTA*, LTL-LRTA* traverses the path π from the current state to the state at the top of Open, that is, the one with the lowest f-value (Line 7, Algorithm 1). For the heuristic update (Line 8), the algorithm uses a version of Dijkstra's algorithm which receives the Open and Closed data structures previously returned by Bounded-A*, just as LSS-LRTA* would do. After the execution of Line 8, for every element (s, q) in Closed it holds that h(s, q) = min_{(t,r) ∈ N(s,q)} [c(s, t) + h(t, r)]. LTL-LRTA* was designed to be equivalent to LSS-LRTA* run over the cLSP for P, Pφ. It therefore inherits the following property of LSS-LRTA*.

Theorem 3. Let P be an LSP, and let Pφ = (Sφ, Eφ, cφ, sstartφ, Gφ) be its corresponding cLSP. Moreover, let hφ be a consistent heuristic for Pφ. Then LTL-LRTA*, run over P with hφ, is guaranteed to find a solution to P if a solution exists and no dead end in (Sφ, Eφ) is reachable from sstartφ.

4.2 Automata Subgoaling

The efficiency of LTL-LRTA* depends mainly on the quality of the given heuristic hφ. Now we present a technique that exploits the structure of the automaton Aφ as an additional source of guidance. The main intuition is that we can view automaton states as subgoals and focus the search on states in which progress is made on the automaton structure. We implement this in Bounded-A* by changing the priority of Open to be computed as (Δ(q), f(s, q)) for a state (s, q), assuming a lexicographic ordering. As such, as soon as the search finds a state (s, q) such that q is closer to an accepting state than any other state in Open, search focuses on such states, since they will be pushed to the top of Open. Since the order is lexicographic, the f function still guides the search among states that are equally close to an accepting state. The pseudo-code is in Appendix C. If the DFA has no cycles (although it might have self-loops), then LTL-LRTA*A is guaranteed to solve the LSP. This property is stated in the following theorem.

Theorem 4. Let P be an LSP, and Pφ be its corresponding cLSP. Assume Pφ contains no dead-end states reachable from sstartφ and that any path πφ = ((s1, q1), . . . , (sn, qn)) from sstartφ is such that Δ(qj) < Δ(qi) for every i, j such that 1 ≤ i < j ≤ n when qi ≠ qj. Then LTL-LRTA*A, run over P using a consistent hφ, finds a solution to P if a solution exists.
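The following Python sketch illustrates the modification: a bounded A* whose Open queue is keyed by the lexicographic pair (Δ(q), f) when subgoaling is enabled, and by f alone otherwise. It is a simplification with hypothetical names, not the paper's Bounded-A* implementation.

```python
import heapq
import itertools
import math

def bounded_a_star(root, neighbors, h, Delta, k, subgoaling=True):
    """Bounded A* over cross-product states (s, q), expanding at most k nodes.

    neighbors((s, q)) yields ((t, r), cost) pairs; h is a heuristic over
    cross-product states; Delta(q) is the DFA distance to an accepting state.
    """
    tie = itertools.count()                 # FIFO tie-breaking inside a key
    g = {root: 0.0}

    def key(node):
        _, q = node
        f = g[node] + h(node)
        return (Delta(q), f, next(tie)) if subgoaling else (f, next(tie))

    open_heap = [(key(root), root)]
    closed = set()
    expansions = 0
    while open_heap and expansions < k:
        _, node = heapq.heappop(open_heap)
        if node in closed:
            continue                        # stale queue entry, skip it
        _, q = node
        if Delta(q) == 0:                   # goal: leave it at the top of Open
            heapq.heappush(open_heap, (key(node), node))
            break
        closed.add(node)
        expansions += 1
        for succ, cost in neighbors(node):  # relax successor costs
            if g[node] + cost < g.get(succ, math.inf):
                g[succ] = g[node] + cost
                heapq.heappush(open_heap, (key(succ), succ))
    return open_heap, closed, g
```

Because the ordering is lexicographic, f still ranks states that share the same Δ(q), which is exactly the tie-breaking role described above.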
5 Empirical Evaluation

Since no approaches to RTHS with LTLf goals exist, the main objective of our empirical evaluation was to understand the effectiveness of the various techniques we propose. Therefore, we compare LTL-LRTA* and LTL-LRTA*A using different heuristic functions over three well-known benchmarks and five LTLf goals.

Heuristic   Domain      No. best solutions        Sol. gap
                        LTL     Ties    LTLA      (LTL / LTLA)
h1φ         Rooms       1020       0    1980      1.295
h1φ         StarCraft   1213       0    1787      1.158
h1φ         Maze        1272       0    1728      1.137
hMφ         Rooms        841     635    1524      1.020
hMφ         StarCraft    461    2008     531      1.002
hMφ         Maze         588    1716     696      1.005
h̃Mφ         Rooms         20       0    2980      31.912
h̃Mφ         StarCraft     27       0    2973      11.564
h̃Mφ         Maze         169       1    2830      3.106
Total                   5611    4360   17029      2.328

Table 1: Results for different map-heuristic combinations.

5.1 Experimental Setup

We tested LTL-LRTA* and LTL-LRTA*A with various problems, heuristics, and goals. We ran experiments on three families of grid problems proposed by Sturtevant [2012]: Rooms, StarCraft, and Maze.² In these domains, the agent can only move to empty cells, and each cell can be either empty or blocked. To build a search graph from the grid, we connect each cell to its 4 immediate cardinal neighbors, and the cost of every graph edge is 1. In addition, we randomly marked vertices of the graph with letters from {a, b, . . . , i}.

We evaluated the performance of LTL-LRTA* and LTL-LRTA*A using two cross-product heuristics, h1φ and hMφ, and one myopic heuristic, h̃Mφ. Intuitively, h1φ(s, q) is equal to the minimum distance between q and an accepting state in the DFA, where the cost of transitioning between DFA states is always 1. hMφ works similarly to h1φ(s, q), but the cost of moving between DFA states is given by the Manhattan distance between the environment states that the agent must reach to cause that transition. Finally, the myopic heuristic h̃Mφ is simply the Manhattan distance between the current state and the closest state that would cause a transition in the DFA. More details can be found in Section 3.4.

We evaluated the algorithms using five different LTLf goals. These goals exploit a wide range of the features that LTLf provides. As such, some of these goals include completing a sequence of tasks (e.g., the agent has to solve a set of tasks in order), partial-order tasks (e.g., some tasks must be solved before other tasks), disjunctive tasks (e.g., the agent can either do task 1 or task 2), and safety constraints (e.g., the agent must ensure that a condition holds as it solves the main task). These five goals are formally described in Appendix D.

²In Rooms we used all the maps with 8×8 rooms. In Maze we used the maps with hallways of width 32. In StarCraft we used the maps Blast Furnace, Catwalk Alley, Crossroads, Enigma, Inferno, Flooded Plains, Space Atoll, Octopus, Valleyof Re, and Wheel of War.

5.2 Comparison of RTHS Methods for LSPs

A summary of our results is shown in Table 1. In total, we ran 27,000 experiments. These considered 5 LTLf goals, 3 domains (i.e., Rooms, StarCraft, and Maze), 10 maps per domain, 5 problem instances per map (where each instance is a different placement of the letters in the map), 6 lookahead values (with k ∈ {32, 64, 128, 256, 512, 1024}), and 3 heuristics (h1φ, hMφ, and h̃Mφ). When placing letters in the maps, we either placed three letters of each type (i.e., three a's, three b's, etc.) or twenty-five.
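For reference, the following Python sketch (our own illustration, not the authors' benchmark code) shows how an ASCII grid of this kind can be turned into the 4-connected, unit-cost search graph and labelling function of an LSP.

```python
def grid_to_lsp(grid):
    """Build a 4-connected unit-cost search graph from an ASCII grid.

    `grid` is a list of equal-length strings: '.' is an empty cell, '#' is
    blocked, and a lowercase letter marks an empty cell with that proposition.
    Returns (states, neighbors, L): the state set, the neighbor function
    N(s), and the labelling function L(s) of the LSP.
    """
    states = {(r, c)
              for r, row in enumerate(grid)
              for c, cell in enumerate(row) if cell != '#'}

    def neighbors(s):                       # N(s): 4 cardinal moves, cost 1 each
        r, c = s
        moves = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        return [t for t in moves if t in states]

    def L(s):                               # letter-marked cells carry a proposition
        cell = grid[s[0]][s[1]]
        return frozenset(cell) if cell.isalpha() else frozenset()

    return states, neighbors, L

# Example: a tiny map with subtasks a and b.
states, N, L = grid_to_lsp(["a..#",
                            ".#.b",
                            "...."])
```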
Table 1 compares the performance of LTL-LRTA* and LTL-LRTA*A using two main metrics. The first metric is the number of best solutions, which is the number of times that an algorithm found a better solution than the other. In the table, column LTL refers to the number of problems where LTL-LRTA* found a better solution than LTL-LRTA*A, and column LTLA refers to the number of problems where LTL-LRTA*A found a better solution than LTL-LRTA*. We also include the number of ties. As the table shows, LTL-LRTA*A tends to find better solutions than LTL-LRTA*. Indeed, LTL-LRTA*A found the best solution in 17,029 problems out of 27,000 (and tied in 4,360). The table also shows that the advantage of LTL-LRTA*A over LTL-LRTA* is problem dependent. For instance, the performance of both methods is quite similar in StarCraft when using hMφ, whereas LTL-LRTA*A solves almost every problem faster than LTL-LRTA* in Rooms when using h̃Mφ.

The second performance metric is the solution gap. This metric compares the quality of the solutions found by each method. Specifically, we computed the solution gap by dividing the cost of the solution found by LTL-LRTA* by the cost of the solution found by LTL-LRTA*A, and then reported the geometric mean of these ratios across all the experiments. As Table 1 shows, LTL-LRTA*A found solutions that were over 2.3 times better than the solutions found by LTL-LRTA* on average. However, we only see large gaps for the myopic heuristic h̃Mφ. When using the cross-product heuristic h1φ, LTL-LRTA*A finds solutions that are around 1.2 times better, and when using hMφ, LTL-LRTA*A is only marginally better than LTL-LRTA*. We believe that part of the reason why LTL-LRTA*A performs similarly to LTL-LRTA* in this latter case is that hMφ is a well-informed (although expensive) heuristic. In particular, hMφ already propagates back the information that moving to DFA states that are closer to an accepting state decreases the heuristic value. As such, ordering Open by Δ does not make a large difference in the algorithm's performance.

We performed a per-lookahead analysis, customary in the RTHS literature. We observed that when a significant difference in solution cost exists between LTL-LRTA* and LTL-LRTA*A, such differences are similar across different lookahead values. In addition, we observed no relevant differences between goal types. A detailed analysis is included in the Appendix, Tables 5-14.

Finally, we note that LTL-LRTA* and LTL-LRTA*A have identical computational complexity given the same heuristic. For that reason, the fact that LTL-LRTA*A tends to find better solutions than LTL-LRTA* also implies that LTL-LRTA*A finds solutions faster than LTL-LRTA*.

                        Time gap w.r.t. h1φ
# items     Domain      hMφ      hMφ + A    h̃Mφ + A
3 items     Rooms       9.542    9.256      7.963
3 items     StarCraft   3.161    3.084      2.922
3 items     Maze        1.016    0.994      1.003
25 items    Rooms       3.395    3.420      3.327
25 items    StarCraft   2.449    2.381      2.490
25 items    Maze        1.532    1.518      1.631
100 items   Rooms       2.233    2.379      2.444
100 items   StarCraft   2.359    2.303      2.428
100 items   Maze        1.936    1.992      2.340
Total                   2.512    2.501      2.538

Table 2: Runtime results using different heuristics. "# items" corresponds to the number of instances of the same letter that appear in the map. Time gap is defined as the ratio between the runtime obtained by LTL-LRTA*A using h1φ and that of the corresponding algorithm.

5.3 Comparison of Heuristics for LSPs

Now we compare the performance of different heuristics for solving LSPs.
In Section 3.4, we proposed two ways to compute heuristics for LSPs: cross-product heuristics and myopic heuristics. Myopic heuristics are weaker than cross-product heuristics, but they are faster to compute. In particular, the complexity of computing the myopic heuristic is O(|Sq|), whereas the complexity of computing the cross-product heuristic is O(v²), where v = 1 + Σ_{q∈Q} |Sq|. Recall that Sq ⊆ S is the subset of states that change the current DFA state q ∈ Q to some q′ ∈ Q such that Δ(q′) < ∞. Thus, we would expect a cross-product heuristic to usually find better solutions than a myopic heuristic but, depending on the size of Sq, a myopic heuristic might find solutions faster than a cross-product heuristic. To verify this behavior empirically, we ran experiments using different numbers of duplicated letters on each map. As the number of letters increases, the size of Sq also increases, since there are more locations the agent could reach in order to change the current DFA state.

Table 2 shows a runtime comparison between the cross-product heuristic hMφ and the myopic heuristic h̃Mφ. That is, it reports how fast each method is able to find a solution for the LSP, regardless of the quality of that solution. The table normalizes the performance of each method with respect to the performance of LTL-LRTA*A using h1φ. Thus, a performance of 9.5 means that the method solved the LSP 9.5 times faster than LTL-LRTA*A using h1φ. Specifically, the table reports the performance of three methods: hMφ refers to LTL-LRTA* using hMφ, hMφ + A refers to LTL-LRTA*A using hMφ, and h̃Mφ + A refers to LTL-LRTA*A using h̃Mφ.

The results in Table 2 show the following. When we only have three instances of each letter (i.e., |Sq| is relatively small), the cross-product heuristic hMφ dominates. However, as we increase the number of instances of each letter, the myopic heuristic h̃Mφ tends to solve LSPs faster (on average) than the cross-product heuristic hMφ. In addition, the solutions found by the cross-product heuristic are usually better than the solutions found by the myopic heuristic.

                        Solution gap w.r.t. h1φ
# items     Domain      hMφ       hMφ + A    h̃Mφ + A
3 items     Rooms       30.012    30.256     21.056
3 items     StarCraft   4.11      4.112      3.587
3 items     Maze        1.008     1.006      0.980
25 items    Rooms       11.162    11.530     9.526
25 items    StarCraft   5.381     5.402      5.155
25 items    Maze        1.847     1.867      1.862
100 items   Rooms       4.791     5.037      4.535
100 items   StarCraft   4.94      4.959      4.709
100 items   Maze        2.602     2.675      3.063
Total                   4.558     4.628      4.240

Table 3: Solution cost analysis using different heuristics. "# items" corresponds to the number of instances of the same letter that appear in the map. Solution gap is defined as the ratio between the solution cost obtained by LTL-LRTA*A using h1φ and that of the corresponding algorithm.

Table 3 reports the solution gap of each method with respect to LTL-LRTA*A using h1φ, for the same algorithm configurations as Table 2. It shows that the cross-product heuristic hMφ tends to find better solutions than the myopic heuristics, regardless of the number of instances of each letter. This behavior is expected, since the cross-product heuristic is stronger than the myopic heuristic. Therefore, the decision to use a cross-product heuristic or a myopic heuristic is application- (and problem-) dependent. If the goal is to find the best possible solution, it is better to use a cross-product heuristic. However, if the goal is to get any solution as fast as possible and the Sq sets are large, then it might be better to use a myopic heuristic.
6 Conclusion

In this paper, we studied how to incorporate temporally extended goals into RTHS. We proposed to encode such goals using LTLf and formally defined search problems with LTLf goals, which we called LSPs. We exploited the relationship between LTLf and DFAs to construct a cross-product version of an LSP (cLSP). A cLSP can be solved by any off-the-shelf RTHS method. We studied different ways in which the structure of the DFA can be used to improve the performance of RTHS methods when solving a cLSP. Specifically, we (i) showed how to use the DFA to compute heuristics and (ii) proposed LTL-LRTA*A, an algorithm that guides the search by exploiting the DFA's structure. We studied the theoretical properties of our heuristics and algorithm. Our empirical results showed the benefits of exploiting the DFA structure for LSP solving.

Acknowledgments

We gratefully acknowledge funding from the National Center for Artificial Intelligence CENIA FB210017, Basal ANID.

References

[Bacchus et al., 1997] Fahiem Bacchus, Craig Boutilier, and Adam J. Grove. Structured solution methods for non-Markovian decision processes. In Proceedings of the 14th National Conference on Artificial Intelligence (AAAI), pages 112–117, 1997.

[Baier and McIlraith, 2006] Jorge A. Baier and Sheila A. McIlraith. Planning with temporally extended goals using heuristic search. In Proceedings of the 16th International Conference on Automated Planning and Scheduling (ICAPS), pages 342–345, 2006.

[Bienvenu et al., 2006] Meghyn Bienvenu, Christian Fritz, and Sheila A. McIlraith. Planning with qualitative temporal preferences. In Proceedings of the 10th International Conference on Knowledge Representation and Reasoning (KR), pages 134–144, June 2006.

[Bonet and Geffner, 2000] Blai Bonet and Hector Geffner. Planning with incomplete information as heuristic search in belief space. In Proceedings of the 5th International Conference on Artificial Intelligence Planning and Systems (AIPS), pages 52–61, 2000.

[Bonet et al., 2020] Blai Bonet, Giuseppe De Giacomo, Hector Geffner, Fabio Patrizi, and Sasha Rubin. High-level programming via generalized planning and LTL synthesis. In Diego Calvanese, Esra Erdem, and Michael Thielscher, editors, Proceedings of the 17th International Conference on Knowledge Representation and Reasoning (KR), pages 152–161, 2020.

[Bulitko and Björnsson, 2009] Vadim Bulitko and Yngvi Björnsson. kNN LRTA*: Simple subgoaling for real-time search. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), volume 4, 2009.

[Camacho et al., 2019] Alberto Camacho, Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Anthony Valenzano, and Sheila A. McIlraith. LTL and beyond: Formal languages for reward function specification in reinforcement learning. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), pages 6065–6073, 2019.

[Cresswell and Coddington, 2004] Stephen Cresswell and Alexandra M. Coddington. Compilation of LTL goal formulas into PDDL. In Ramón López de Mántaras and Lorenza Saitta, editors, Proceedings of the 16th European Conference on Artificial Intelligence (ECAI), pages 985–986, Valencia, Spain, August 2004. IOS Press.

[De Giacomo and Vardi, 2013] Giuseppe De Giacomo and Moshe Y. Vardi. Linear temporal logic and linear dynamic logic on finite traces.
In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), 2013.

[De Giacomo and Vardi, 2015] Giuseppe De Giacomo and Moshe Y. Vardi. Synthesis for LTL and LDL on finite traces. In Qiang Yang and Michael J. Wooldridge, editors, Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), pages 1558–1564. AAAI Press, 2015.

[Duret-Lutz et al., 2016] Alexandre Duret-Lutz, Alexandre Lewkowicz, Amaury Fauchille, Thibaud Michaud, Etienne Renault, and Laurent Xu. Spot 2.0: A framework for LTL and ω-automata manipulation. In Proceedings of the 14th International Symposium on Automated Technology for Verification and Analysis (ATVA), pages 122–129. Springer, 2016.

[Fuggitti, 2019] Francesco Fuggitti. LTLf2DFA, March 2019.

[Gerevini et al., 2009] Alfonso Gerevini, Patrik Haslum, Derek Long, Alessandro Saetti, and Yannis Dimopoulos. Deterministic planning in the fifth international planning competition: PDDL3 and experimental evaluation of the planners. Artificial Intelligence, 173(5-6):619–668, 2009.

[Helmert, 2006] Malte Helmert. The Fast Downward planning system. Journal of Artificial Intelligence Research, 26:191–246, 2006.

[Hernández and Baier, 2011] Carlos Hernández and Jorge A. Baier. Fast subgoaling for pathfinding via real-time search. In Proceedings of the 21st International Conference on Automated Planning and Scheduling (ICAPS), Freiburg, Germany, June 2011.

[Hoffmann and Nebel, 2001] Jörg Hoffmann and Bernhard Nebel. The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research, 14:253–302, 2001.

[Kabanza and Thiébaux, 2005] Froduald Kabanza and Sylvie Thiébaux. Search control in planning for temporally extended goals. In Proceedings of the 15th International Conference on Automated Planning and Scheduling (ICAPS), pages 130–139, 2005.

[Koenig and Sun, 2009] Sven Koenig and Xiaoxun Sun. Comparing real-time and incremental heuristic search for real-time situated agents. Autonomous Agents and Multi-Agent Systems, 18(3):313–341, 2009.

[Korf, 1990] Richard E. Korf. Real-time heuristic search. Artificial Intelligence, 42(2):189–211, 1990.

[Simon and Röger, 2015] Salomé Simon and Gabriele Röger. Finding and exploiting LTL trajectory constraints in heuristic search. In Levi Lelis and Roni Stern, editors, Proceedings of the 8th Symposium on Combinatorial Search (SoCS), pages 113–121. AAAI Press, 2015.

[Sturtevant, 2012] Nathan Sturtevant. Benchmarks for grid-based pathfinding. IEEE Transactions on Computational Intelligence and AI in Games, 4(2):144–148, 2012.

[Toro Icarte et al., 2018] Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Anthony Valenzano, and Sheila A. McIlraith. Teaching multiple tasks to an RL agent using LTL. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), pages 452–461, 2018.

[Vaezipoor et al., 2021] Pashootan Vaezipoor, Andrew C. Li, Rodrigo Toro Icarte, and Sheila A. McIlraith. LTL2Action: Generalizing LTL instructions for multi-task RL. In Proceedings of the 38th International Conference on Machine Learning (ICML), pages 10497–10508, 2021.

[Wolsey, 1998] Laurence A. Wolsey. Integer Programming. John Wiley & Sons, 1998.