# Counterplanning using Goal Recognition and Landmarks

Alberto Pozanco, Yolanda E-Martín, Susana Fernández, Daniel Borrajo
Departamento de Informática, Universidad Carlos III de Madrid
Avda. de la Universidad, 30. 28911 Leganés, Madrid, Spain
apozanco@pa.uc3m.es, yescuder@inf.uc3m.es, sfarregu@inf.uc3m.es, dborrajo@ia.uc3m.es

In non-cooperative multi-agent systems, agents might want to prevent their opponents from achieving their goals. One alternative for solving this task is counterplanning: generating a plan that allows an agent to block others from reaching their goals. In this paper, we introduce a fully automated domain-independent approach for counterplanning. It combines: goal recognition to infer an opponent's goal; landmark computation to identify subgoals that can be used to block the opponent's goal achievement; and classical automated planning to generate plans that prevent the opponent's goal achievement. Experimental results in several domains show the benefits of our novel approach.

## 1 Introduction

In non-cooperative multi-agent systems, agents might want to prevent their opponents from achieving their goals. This task has been named counterplanning [Carbonell, 1981]. Examples of non-cooperative multi-agent domains where this approach can provide great benefits are police controls, cyber security, or real-time strategy games, where this ability has been identified as one of the major challenges for Artificial Intelligence [Ontañón et al., 2013].

Most previous counterplanning approaches are based on domain-dependent solutions, such as rule-based systems [Carbonell, 1978; Rowe, 2003] or Hierarchical Task Networks (HTN) [Willmott et al., 2001]. Recently, there has been increasing interest in the study and generation of agents capable of reasoning about their own and their opponents' goals as well as their environment [Cox, 2007]. Some works follow the Goal-Driven Autonomy (GDA) process, which integrates a diverse set of AI components such as HTN planning or explanation generation [Molineaux et al., 2010; Weber et al., 2010]. Other works combine goal recognition and reasoning on actions, applying those techniques in domains such as identifying terrorist activity [Jarvis et al., 2005], air combat [Borck et al., 2015], real-time strategy games [Kabanza et al., 2010], or cyber security [Boddy et al., 2005; Edelkamp et al., 2009; Sarraute et al., 2012; Obes et al., 2013; Hoffmann, 2015]. Again, these approaches are domain-dependent. On the goal recognition side, they use plan [Kabanza et al., 2010], rule [Carbonell, 1978] or behavior [Borck et al., 2015] libraries to detect their opponents' goals. On the action reasoning side, they use stored policies [Carbonell, 1981], ask for human guidance following a mixed-initiative paradigm [Jarvis et al., 2005], or require heavy knowledge engineering processes, as in HTN-based approaches [Willmott et al., 2001].

In this paper we present a fully automatic domain-independent approach for counterplanning. This approach is based on: goal recognition, landmarks, and classical automated planning. Goal recognition aims to infer an agent's plan or goals from a set of observations. In general, the observed agent can be cooperative or competitive. We use this technique to infer an opponent's goals. Fact landmarks are propositions that must be true in all valid solution plans [Hoffmann et al., 2004].
We use landmarks to identify subgoals that can be used to block the opponent's goal achievement. Classical automated planning aims to generate a sequence of actions, namely a plan, which achieves some goals from an initial state. We use it to generate plans that prevent the opponent's goal achievement. The main idea of this novel approach is to: (1) quickly identify the actual opponent's goal g using planning-based goal recognition techniques; (2) compute the set of landmarks involved in the achievement of g; (3) select a counterplanning landmark, which is the first landmark where the opponent could be blocked; and (4) generate a plan to achieve the counterplanning landmark, and therefore to block the opponent's goal achievement. This approach shows how an opponent can be effectively blocked in different non-cooperative domains.

The rest of the paper is organized as follows. In the next section we review the basic notions of classical planning, goal recognition, and landmarks. In Section 3 we introduce our fully automatic domain-independent counterplanning approach, which includes the quick detection of goals using goal recognition, and the computation of landmarks to identify relevant counterplanning landmarks. In Section 4 we present an empirical study, and in Section 5 we discuss future work.

## 2 Background

### 2.1 Automated Planning

Automated Planning is the task of choosing and organizing a sequence of actions such that, when applied in a given initial state, it results in a goal state [Ghallab et al., 2004]. Formally, a single-agent STRIPS planning task can be defined as a tuple Π = ⟨F, A, I, G⟩, where F is a set of propositions, A is a set of instantiated actions, I ⊆ F is an initial state, and G ⊆ F is a set of goals. Each action a ∈ A is described by a set of preconditions (pre(a)), which represent literals that must be true in a state to execute the action, and a set of effects (eff(a)), which are the literals that are added (add(a) effects) or removed (del(a) effects) from the state after the action execution. The definition of each action might also include a cost c(a) (the default cost is one). The execution of an action a in a state s is defined by a function γ such that γ(s, a) = (s \ del(a)) ∪ add(a) if pre(a) ⊆ s, and γ(s, a) = s otherwise (the action cannot be applied). The output of a planning task is a sequence of actions, called a plan, π = (a1, . . . , an). The execution of a plan π in a state s can be defined recursively as:

Γ(s, π) = Γ(γ(s, a1), (a2, . . . , an)) if π ≠ ∅, and Γ(s, π) = s if π = ∅

A plan π is valid if G ⊆ Γ(I, π). The plan cost is commonly defined as c(π) = Σ_{ai ∈ π} c(ai). We will use the function PLANNER(Π) to refer to an algorithm that computes a plan π from a planning task Π.

### 2.2 Goal Recognition

Goal Recognition is the task of inferring another agent's goals through the observation of its interactions with the environment. The problem has captured the attention of several computer science communities [Albrecht et al., 1997; Geib and Goldman, 2009; Sukthankar et al., 2014]. Among them, planning-based goal recognition approaches have been shown to be a valid domain-independent alternative to infer agents' goals [Ramírez and Geffner, 2009; 2010; 2011; Pattison and Long, 2010; E-Martín et al., 2015; Vered and Kaminka, 2017; Pereira et al., 2017].
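To make the notation above concrete, here is a minimal Python sketch of the STRIPS semantics from Section 2.1 (γ, Γ, plan validity, and plan cost). The Action dataclass and the set-based state representation are illustrative assumptions, not the implementation used in the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset      # preconditions pre(a): propositions that must hold
    add: frozenset      # add effects add(a)
    delete: frozenset   # delete effects del(a)
    cost: int = 1       # default unit cost

def gamma(state, action):
    """Execute a single action: gamma(s, a)."""
    if not action.pre <= state:                 # pre(a) must be a subset of s
        return state                            # otherwise the action does not apply
    return (state - action.delete) | action.add

def execute(state, plan):
    """Execute a plan: Gamma(s, pi), iterating gamma over the actions."""
    for a in plan:
        state = gamma(state, a)
    return state

def is_valid(plan, initial_state, goals):
    """A plan is valid if G is a subset of Gamma(I, pi)."""
    return frozenset(goals) <= execute(frozenset(initial_state), plan)

def plan_cost(plan):
    """c(pi) = sum of the action costs."""
    return sum(a.cost for a in plan)
```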
Ramírez and Geffner [2010] developed an approach that assumes observations are actions, and formally defined a planning-based goal recognition problem as:

Definition 1 (Goal Recognition Problem) A goal recognition problem is a tuple T = ⟨P, G, O, Pr⟩ where P = ⟨F, A, I⟩ is a planning domain and initial conditions, G is the set of possible goals G, with G ⊆ F, O = (o1, . . . , om) is an observation sequence with each oi being an action in A, and Pr is a prior probability distribution over the goals in G.

The solution to a goal recognition problem is a probability distribution over the set of goals G ∈ G giving the relative likelihood of each goal. In this work we assume that Pr is uniform. We will use the function RECOGNIZEGOALS(F, A, I, G, O) to refer to an algorithm that solves the goal recognition problem. This function returns a list of tuples of the form ⟨goal, probability⟩, one for each goal in G.

### 2.3 Landmarks

In Automated Planning, landmarks were initially defined as sets of propositions that have to be true at some time in every solution plan [Hoffmann et al., 2004]. Formally:

Definition 2 (Fact Landmark) Given a planning task Π = ⟨F, A, I, G⟩, a formula LΠ ⊆ F is a fact landmark of Π iff LΠ is true in some state along all valid plan executions that achieve G from I.

This definition was later extended to include action landmarks [Richter and Westphal, 2010]. We will use the function EXTRACTLANDMARKS(F, A, I, G) to refer to an algorithm that computes a set of landmarks LΠ from a planning task Π.

## 3 Domain-Independent Counterplanning

We first formalize the two actors involved in a counterplanning problem as planning agents.

Definition 3 (Seeking agent) A seeking agent φ is an agent that has an associated planning task Πφ = ⟨Fφ, Aφ, Iφ, Gφ⟩, and pursues its goal Gφ by following a plan πφ computed from Πφ.

Definition 4 (Preventing agent) A preventing agent α is an agent that has an associated planning task Πα = ⟨Fα, Aα, Iα, Gα⟩. Gα is initialized to ∅. Then, Algorithm 1 (described later) computes a set of goals to be used for the counterplanning task.

There can be varied relations between Πφ and Πα, and between the information that one agent has about the other. For instance, the actions that both agents can perform could be the same (Aφ = Aα) or totally different (Aφ ∩ Aα = ∅). They could also have different or equal observations of the world. In this work, we make the following assumptions:

- φ's model is known by α except for its goal Gφ. In most of the real-world domains that we have selected as potential applications (e.g. police control, cyber security, strategy games, . . . ), Fφ, Iφ, and Aφ can all be assumed to be known;
- as in most literature on goal reasoning, α knows a set of potential goals Gφ (each goal G ∈ Gφ satisfying G ⊆ Fφ) that φ could be trying to achieve;
- deterministic action outcomes and full observability of those actions by α;
- both agents can follow optimal or suboptimal strategies to reach their goals;
- both agents stick to their plans. In other words, they do not replan or change their goals during execution; and
- the temporal duration of an action ai ∈ A is determined by its cost c(ai) (in this paper, we assume unit action costs).

Since both agents operate in a common environment, the execution of their actions affects the shared environment. Therefore, we assume any state of the environment s can be defined in terms of the set of propositions Fe (s ⊆ Fe), such that Fφ ∪ Fα ⊆ Fe. Additionally, some propositions must be in Fφ ∩ Fα, i.e. they will be observable and modifiable by both agents.
The individual execution of actions by any of the two agents in Fe will be based on the respective action sets. Hence, the execution of an action a (a ∈ Aφ ∪ Aα) in a state s is defined using the previous γ(s, a). Furthermore, the joint execution of one action per agent in the same time step t can be defined as follows.

Definition 5 (Joint execution of two actions) Given two actions aφ ∈ Aφ and aα ∈ Aα and an environment state s ⊆ Fe, the joint execution of both actions at a time step t results in a new state given by:

γφ,α(s, aφ, aα) = γ(γ(s, aα), aφ) if aφ is not mutex with aα, and γφ,α(s, aφ, aα) = γ(s, aα) otherwise

Similarly, the joint execution of two plans, Γφ,α(s, πφ, πα), can be defined by iterating the joint execution of the actions of those plans using γφ,α(s, aφ, aα). For simplicity, in this paper we assume that the preventing agent always executes its action first when two actions are mutex. We define two mutex actions as follows.

Definition 6 (Mutex actions) Two actions ax, ay executed at a time step t are mutex if any literal in eff(ax) deletes (adds) any literal in pre(ay), or if any literal in eff(ax) deletes (adds) a literal that is added (deleted) in eff(ay).

Using these definitions, we can now formally describe a counterplanning task.

Definition 7 (Counterplanning task) A counterplanning task is defined by a tuple CP = ⟨Πφ, Πα, Gφ, Oφ⟩ where Πφ is the planning task of φ, Πα is the planning task of the preventing agent, Gφ is the set of sets of goals that φ can potentially pursue, and Oφ = (o1, . . . , om) is a set of observations by α of the execution of a plan πφ = (o1, . . . , om, am+1, . . . , ak) that solves Πφ (we change the notation from oi to aj within πφ to differentiate between observations and future actions).

We assume that φ generates a plan πφ to solve its planning task Πφ prior to counterplanning, and that this plan (as well as its corresponding goals) is unknown to α. Then, at some time step m of the execution of πφ (where m can range from 1 to k, the length of πφ), given all observed actions from the execution of πφ, α has to infer φ's goals (from Gφ) and generate a solution to the counterplanning task, namely a counterplan.

Definition 8 (Counterplan) Given the φ agent's remaining plan πφ = (am+1, . . . , ak), a plan πα = (a1, . . . , an) is a valid counterplan for πφ if the joint execution of πα and πφ does not allow φ to achieve the goals in Gφ; that is, Gφ ⊈ Γφ,α(s, πφ, πα).

Our approach to solving counterplanning tasks assumes that α can delete (or add, in the case of negated literals) some proposition that φ needs in order to achieve its goals. There could be different definitions of needed literals. We use planning landmarks in this work. Therefore, we impose two constraints: the seeking agent φ and the preventing agent α share some propositions, Fφ ∩ Fα ≠ ∅; and at least one action a in α's model, a ∈ Aα, must delete (add) at least one of φ's plan landmarks.

Algorithm 1 shows the high-level algorithm used to solve a counterplanning task from the perspective of α. The algorithm first solves a goal recognition problem using RECOGNIZEGOALS given a planning domain, initial conditions, a set of candidate goals Gφ, and a set of observations Oφ. It returns Tφ, a probability distribution over the set of candidate goals Gφ in the form of tuples ⟨goal, probability⟩.
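As a rough illustration of the RECOGNIZEGOALS step, the sketch below follows the cost-difference formulation of Ramírez and Geffner [2010] under a uniform prior. The plan_cost callback is a hypothetical oracle (in practice, an optimal planner run on tasks compiled to either comply with or avoid the observations); it is a sketch, not the experimental implementation.

```python
import math

def recognize_goals(domain, initial_state, candidate_goals, observations, plan_cost, beta=1.0):
    """Sketch of RECOGNIZEGOALS in the style of Ramirez and Geffner [2010].

    `domain` bundles F and A; each candidate goal is assumed hashable (e.g. a frozenset).
    plan_cost(domain, initial_state, goal, obs, comply) is assumed to return the cost of a
    cheapest plan for `goal` that embeds the observation sequence (comply=True) or
    avoids it (comply=False).
    """
    scores = {}
    for goal in candidate_goals:
        cost_with_obs = plan_cost(domain, initial_state, goal, observations, comply=True)
        cost_without_obs = plan_cost(domain, initial_state, goal, observations, comply=False)
        delta = cost_with_obs - cost_without_obs
        # P(O|G) as a sigmoid of -beta * delta: lower delta means the goal explains O better
        likelihood = 1.0 / (1.0 + math.exp(beta * delta))
        scores[goal] = likelihood          # uniform prior Pr, so posterior is proportional to likelihood
    total = sum(scores.values())
    return [(goal, score / total) for goal, score in scores.items()]
```

Goals with a lower cost difference Δ(G, O) = cost(G, O) − cost(G, ¬O) explain the observations better and therefore receive a higher posterior probability.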
Then, the initial state of φ, Iφ, is updated with the given observations by advancing the state from the initial Iφ and applying all actions corresponding to the observations in Oφ. Next, we select the set of most probable goal sets G′φ from Tφ. For each goal g ∈ G′φ, we extract the landmarks of the new φ planning task using EXTRACTLANDMARKS. This computation returns the set of common landmarks among all the most probable sets of goals, LΠφ. Figure 1 shows an example of that computation in a navigation domain. If there are no common landmarks, the counterplanning task cannot be performed. Otherwise, the algorithm selects the set of counterplanning landmarks LΠφ,Πα in EXTRACTCPLANDMARKS. This process will be explained in detail later. As before, if there are no counterplanning landmarks, the counterplanning task cannot be performed. Otherwise, one of the landmarks in LΠφ,Πα is negated and returned as the preventing agent's goal Gα by SELECTGOAL. Finally, a plan πα is computed to achieve that goal such that it prevents φ from achieving its goals. In the next section we discuss how we select Gα from LΠφ.

Figure 1: Navigation example. The green node illustrates the current position of the agent. The rest of the nodes represent possible goals. The red node indicates a common landmark among the remaining goals after observing an action (moving to the right). The blue node refers to the actual goal of the seeking agent.

Algorithm 1 DOMAIN-INDEPENDENT COUNTERPLANNING
Inputs: Πφ, Πα, Gφ, Oφ
Outputs: πα
1: Tφ ← RECOGNIZEGOALS(Fφ, Aφ, Iφ, Gφ, Oφ)
2: Iφ ← UPDATE(Iφ, Aφ, Oφ)
3: LΠφ ← Fφ
4: G′φ ← goal(arg max_{t ∈ Tφ} probability(t))
5: πα ← ∅
6: for g ∈ G′φ do
7: LΠφ ← LΠφ ∩ EXTRACTLANDMARKS(Fφ, Aφ, Iφ, g)
8: if LΠφ ≠ ∅ then
9: LΠφ,Πα ← EXTRACTCPLANDMARKS(Πφ, Πα, LΠφ)
10: if LΠφ,Πα ≠ ∅ then
11: Gα ← SELECTGOAL(Πφ, Πα, LΠφ,Πα)
12: Iα ← UPDATE(Iα, Aα, Oφ)
13: πα ← PLANNER(Πα = ⟨Fα, Aα, Iα, Gα⟩)
14: return πα

### 3.1 Selecting Goals from Landmarks

Given a set of common landmarks LΠφ, two questions arise: (1) how many of those fact landmarks could be deleted (added) by α's model of the world (domain), so that φ cannot achieve them?; and (2) among the facts that α can delete (add), which one should become its goal Gα so as to effectively stop φ from achieving its goal?

The first question brings us to the following definition:

Definition 9 (Counterplanning landmark) Given the set of fact landmarks from Πφ, LΠφ, a landmark li ∈ LΠφ is a counterplanning landmark for α if there is an action a ∈ Aα whose effects negate li. If li is a positive literal, li should be in del(a). If li is a negative literal, li should be in add(a).

All the fact landmarks that comply with this condition are added to the counterplanning landmarks set of both agents, LΠφ,Πα. We will refer to this process as EXTRACTCPLANDMARKS(Πφ, Πα, LΠφ).

The second question requires further analysis. Given the definition of a fact landmark, α only has to delete (add) one fact landmark of Πφ to prevent the seeking agent from achieving its goal. The problem therefore turns into selecting a single goal from LΠφ,Πα to become the next goal for α, Gα. In most counterplanning domains, the earlier we discover the opponent's intentions (and thus stop him/her from achieving its goal), the better.
This is a common characteristic in: (1) cyber security domains, where we want to detect an intruder as soon as possible; (2) real-time strategy games, where we want to defeat our enemy in the shortest possible time; or (3) medical domains, where we want to prevent a disease from spreading at the earliest time. Thus, this type of problem presents some temporal aspects that we need to take into account. In particular, it is not useful for α to pursue a counterplanning fact landmark that φ is going to achieve before α can avoid it. We define this temporal subproblem as finding the First Counterplanning Landmark, FCL. Algorithm 2 shows the high-level algorithm used to find it. For each counterplanning landmark li, an optimal plan is computed for φ and α. It is done optimally to ensure that the returned values correspond to the shortest time (cost) in which each agent could reach that subgoal. If the cost (duration) of achieving li by α, solving Πα, is no greater than the cost (duration) of achieving li by φ, solving Πφ, and there is no other such landmark lj that φ can achieve with smaller cost, then li becomes the FCL. In other words, li is the first landmark in Πφ that α can achieve before φ. Therefore, the new Gα will be the negated FCL, since we want α to prevent φ from achieving the FCL. As a reminder, we are performing a one-step counterplanning episode. If φ performs some actions to re-achieve the FCL or changes its goals, then we assume α would have to start a new counterplanning episode.

Algorithm 2 SELECTGOAL
Inputs: Πφ, Πα, LΠφ,Πα
Outputs: FCL
1: FCL ← ∅
2: FCLCost ← ∞
3: for li in LΠφ,Πα do
4: πφ ← PLANNER(Πφ = ⟨Fφ, Aφ, Iφ, {li}⟩)
5: πα ← PLANNER(Πα = ⟨Fα, Aα, Iα, {¬li}⟩)
6: if c(πφ) >= c(πα) then
7: if c(πφ) < FCLCost then
8: FCLCost ← c(πφ)
9: FCL ← li
10: return FCL

### 3.2 Example

To illustrate our approach, let us consider a simple domain where a terrorist has committed an attack in the center of a city. Figure 2 shows the road network for this problem. The city police (α) knows that the terrorist (φ) wants to leave the city through either G1 (airport), G2 (train station), or G3 (bus terminal); so, Gφ = {G1, G2, G3}. The police has control over some cameras located at key points around the city (represented as nodes in Figure 2). The police actions consist of stopping the terrorist by setting a control at any of those points (Aα). So, the police wants to: (1) quickly know where the terrorist wants to go (G′φ); and (2) stop him/her as soon as possible to avoid panic breaking out.

Figure 2: Terrorist capture domain. The red circle represents the initial position of the terrorist. Green circles represent his/her possible goals. Arrow a1 indicates the terrorist's first observed action. L1, L2 and L3 refer to the landmark points that the terrorist has to pass through in order to reach G1.

When the cameras observe that the terrorist is at L1 (O1 = a1 ∈ Aφ), the police guesses, by doing goal recognition, that his/her goal is to reach G1 (G′φ). The police only has resources to set one control. It knows that the terrorist must pass through L1, L2 and L3 to reach the airport. Although these four spots (L1, L2, L3 and G1) are counterplanning landmarks LΠφ,Πα, the police can only set the control at L2, L3 and G1 before the terrorist reaches those places. Finally, the police goal Gα will be to set the control at L2, since it is the FCL, i.e. the first spot where the terrorist can be effectively stopped.
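The FCL selection that the example just walked through can be sketched in Python as follows. Here optimal_plan_cost is an assumed helper standing in for an optimal planner call on the corresponding single-landmark tasks, and the tuple-based encoding of negated goals is purely illustrative.

```python
import math

def select_goal(counterplanning_landmarks, seeker_task, preventer_task, optimal_plan_cost):
    """Sketch of Algorithm 2: pick the First Counterplanning Landmark (FCL).

    optimal_plan_cost(task, goal) is assumed to return the optimal cost of
    achieving `goal` in `task` (math.inf if unsolvable).
    """
    fcl = None
    fcl_cost = math.inf
    for landmark in counterplanning_landmarks:
        seeker_cost = optimal_plan_cost(seeker_task, {landmark})                  # c(pi_phi)
        preventer_cost = optimal_plan_cost(preventer_task, {("not", landmark)})   # c(pi_alpha) for not-li
        # alpha must be able to negate the landmark no later than phi reaches it...
        if seeker_cost >= preventer_cost:
            # ...and among those, keep the landmark phi would reach earliest
            if seeker_cost < fcl_cost:
                fcl_cost = seeker_cost
                fcl = landmark
    return fcl  # the preventing agent's goal G_alpha is then the negation of this landmark
```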
## 4 Experiments and Evaluation

We empirically evaluate our approach on the previously described new TERRORIST domain, as well as on other domains commonly used in goal recognition works, such as LOGISTICS, EASY IPC GRID, BLOCKS, and INTRUSION DETECTION. Each domain and problem defines Πφ. Additionally, in order to perform counterplanning, for each domain we have generated a new counterplanning domain, which defines the planning task Πα for α. The classical domain and the counterplanning domain comply with the requirements mentioned in Section 3. For each classical domain, we generate 10 random problems for φ. All the problems have the same number of objects. However, each problem has a different actual goal for φ, which is hidden from α. Their details are explained below.

- LOGISTICS. φ can deliver any package to any destination. It can do so by driving trucks or flying planes. α can break a truck or a plane to interrupt the delivery.
- TERRORIST. φ may want to get to any point in the map. It can do so by navigating through points that are connected. α can set a control at a point so that it can arrest φ.
- INTRUSION DETECTION. φ may want to perform a set of attacks on a pool of computers. It can do so by performing hacking actions such as deleting logs or gaining root access. α can perform administration actions, such as encrypting the information or changing the root password of a computer, to prevent φ from conducting its attack.
- BLOCKS. φ may want to put any block on top of another. It can do so by picking up and stacking blocks that are not painted. α can paint a block so that φ cannot pick it up, preventing it from achieving its goal.
- EASY IPC GRID. φ may want to get to any cell in the grid. It can do so by navigating through cells without doors or, if it has the right key, by opening them. α can navigate through those cells without needing a key, and can steal the key or change the lock of a door to block φ's plan.

The set of candidate goals Gφ always consists of 20% of all the possible goals in each problem. Therefore, bigger Gφ sets for the same domain mean bigger problems. In particular, for the TERRORIST domain with a problem map of 20 nodes, Gφ consists of 4 random destinations; when the map has 50 nodes, Gφ consists of 10 possible destinations. The hidden goal, i.e. the current goal of φ, Gφ, which is unknown to α, is always in Gφ. The set of observed actions was taken to be a subset of the plan solution πφ, ranging from 10% of the actions up to 70% of the actions. We did not include tests where the observed sequence is higher than 70% because there our counterplanning approach degrades rapidly. The reason is that the number of counterplanning landmarks decreases as the number of observed actions increases.

Our fully automatic domain-independent counterplanning approach works with any combination of goal recognition and classical planning approaches. For these tests, we have selected the following configuration of goal recognition techniques and planners. For the goal recognition part of our counterplanning technique, we have tested the aforementioned domains and problems using the Ramírez and Geffner [2010] approach with different planners. The planners are HSP*f [Haslum, 2008], an optimal planner, and LAMA [Richter et al., 2011], a satisficing planner.
LAMA is used in two modes: as a greedy planner that stops after the first plan is found (GREEDY LAMA); and as an anytime planner that reports the best plan found in a given time window (LAMA). The planning time for all the planners was limited to 1800 seconds. In all the domains, πφ is computed using GREEDY LAMA. For the optimal plan computations of the FCL, we use HSP*f. All the experiments were run on an Ubuntu machine with an Intel Core 2 Quad Q8400 running at 2.66 GHz.

Table 1 summarizes the experimental results. For each planner, each row shows average performance over the 10 problems in each domain. Each column represents a different measure of quality or performance:

- |Gφ|: number of goals in the candidate goal set.
- |πφ|, |πα|: average plan length (cost) for each agent.
- |LΠφ|: number of landmarks of the seeking agent's planning task.
- %Obs: percentage of the actions of πφ in Oφ. Higher percentages of observations mean that more actions of φ's plan have already been observed by α and, thus, executed by φ.
- Q: fraction of times that the actual goal Gφ was found to be the most likely goal G′φ. In our experiments, if G′φ consists of more than one goal, we select the one with more counterplanning landmarks as the most likely goal. Ideally, Q = 1.
- Qt: average time in seconds taken to solve the goal recognition problems.
- E: fraction of times that α, executing πα, succeeds in stopping φ from achieving its goals. Ideally, E = 1.
- Pe: penalty value computed as the number of steps in πφ that are successfully performed divided by the length of πφ. This penalty value represents the cost paid by α for each time step in which πφ has not been stopped. Lower values of Pe indicate better performance; ideally, Pe = 0.

As we can see, in all the domains, the higher the percentage of observations, the higher the Q values, as expected. The goal recognition task becomes easier as more actions have been observed (as reported in other goal recognition works). Regarding E, the fraction of times that α blocks φ from achieving its goals is clearly related to Q. Guessing the opponent's goal right usually yields more opportunities to block it. This is the case of INTRUSION DETECTION, where our counterplanning approach performs well. However, there are some cases in which we can guess φ's goal wrongly and still block its goal achievement (the Q value is lower than E). This happens when our analysis of the goal recognition process identifies a common landmark (where to stop φ's plan) but selects a wrong goal, as in some BLOCKS instances. The value of E is also closely related to the percentage of observations. Lower percentage values allow α to find many landmarks at which to effectively block φ. On the other hand, if most of the actions in πφ have already been observed (i.e. executed by the seeking agent), there will be just a few counterplanning landmarks left to prevent φ from achieving its goal. This is also connected with the penalty values Pe.
A lower percentage of observations implies that, if the opponent can be blocked, it can be done farther from the goal than if 50% or more of the plan has already been observed. The number of landmarks of φ's planning task also affects the counterplanning results. Domains with a higher number of landmarks will have a higher number of potential counterplanning landmarks at which to block the opponent. The experiments confirm this: domains such as LOGISTICS and BLOCKS are more likely to have many landmarks, and their penalty values are smaller than in the other domains, which have just a few landmarks.

Regarding the planners' performance, GREEDY LAMA seems to achieve the best overall results both in terms of quality (Q, E, Pe) and time (Qt). Since LAMA is an anytime planner, it sometimes takes more time to complete the goal recognition process than the optimal planner HSP*f. However, all planners scale poorly to bigger problem instances where Gφ increases. This entails worse goal recognition performance Q and, therefore, worse counterplanning performance E and Pe. We could speed up our technique by computing plan cost estimations instead of actual plans in order to improve performance. That would allow its use in real-time environments.

| Domain | \|Gφ\| | \|πφ\| | \|πα\| | \|LΠφ\| | %Obs | HSP*f Q | HSP*f Qt | HSP*f E | HSP*f Pe | LAMA Q | LAMA Qt | LAMA E | LAMA Pe | GREEDY LAMA Q | GREEDY LAMA Qt | GREEDY LAMA E | GREEDY LAMA Pe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LOGISTICS | 6 | 9.4 | 1 | 12.3 | 10 | 0.6 | 4.2 | 0.7 | 0.3 | 0.6 | 5.0 | 0.7 | 0.3 | 0.6 | 2.6 | 0.6 | 0.2 |
| | | | | | 30 | 0.6 | 7.5 | 0.7 | 0.4 | 0.6 | 10.7 | 0.7 | 0.4 | 0.7 | 2.6 | 0.8 | 0.4 |
| | | | | | 50 | 0.7 | 10.0 | 0.8 | 0.5 | 0.7 | 16.3 | 0.8 | 0.5 | 0.7 | 2.7 | 0.8 | 0.5 |
| | | | | | 70 | 0.9 | 14.3 | 0.6 | 0.8 | 0.9 | 28.8 | 0.6 | 0.8 | 0.9 | 2.8 | 0.7 | 0.8 |
| TERRORIST | 4 | 3.8 | 1 | 3.8 | 10 | 0.5 | 3.8 | 0.6 | 0.5 | 0.6 | 2.8 | 0.6 | 0.5 | 0.6 | 2.6 | 0.6 | 0.5 |
| | | | | | 30 | 0.6 | 4.0 | 0.8 | 0.7 | 0.6 | 2.9 | 0.8 | 0.7 | 0.6 | 2.8 | 0.8 | 0.7 |
| | | | | | 50 | 0.6 | 5.0 | 0.8 | 0.8 | 0.6 | 3.0 | 0.8 | 0.8 | 0.6 | 2.9 | 0.8 | 0.8 |
| | | | | | 70 | 0.9 | 5.1 | 0.9 | 1.0 | 0.9 | 3.3 | 0.9 | 1.0 | 0.9 | 3.0 | 0.9 | 1.0 |
| TERRORIST | 10 | 5.6 | 1 | 4.1 | 10 | 0.3 | 283.2 | 0.5 | 0.5 | 0.3 | 211.4 | 0.5 | 0.5 | 0.3 | 210.4 | 0.5 | 0.5 |
| | | | | | 30 | 0.5 | 287.2 | 0.6 | 0.8 | 0.4 | 235.0 | 0.6 | 0.8 | 0.4 | 227.3 | 0.6 | 0.8 |
| | | | | | 50 | 0.6 | 307.1 | 0.4 | 0.8 | 0.6 | 248.6 | 0.4 | 0.8 | 0.6 | 269.6 | 0.4 | 0.8 |
| | | | | | 70 | 0.6 | 325.8 | 0.4 | 1.0 | 0.6 | 321.1 | 0.4 | 1.0 | 0.6 | 322.7 | 0.4 | 1.0 |
| INTRUSION DETECTION | 6 | 4.3 | 1 | 4.8 | 10 | 1.0 | 0.5 | 1.0 | 0.4 | 1.0 | 1.0 | 1.0 | 0.4 | 1.0 | 0.7 | 1.0 | 0.4 |
| | | | | | 30 | 1.0 | 0.4 | 1.0 | 0.7 | 1.0 | 1.0 | 1.0 | 0.7 | 1.0 | 0.7 | 1.0 | 0.7 |
| | | | | | 50 | 1.0 | 0.4 | 1.0 | 0.7 | 1.0 | 1.0 | 1.0 | 0.7 | 1.0 | 1.0 | 1.0 | 0.7 |
| | | | | | 70 | 1.0 | 0.3 | 1.0 | 0.9 | 1.0 | 0.8 | 1.0 | 0.9 | 1.0 | 1.0 | 1.0 | 0.9 |
| BLOCKS | 10 | 8.6 | 1 | 17.6 | 10 | 0.2 | 836.6 | 0.4 | 0.2 | 0.2 | 482.1 | 0.4 | 0.2 | 0.2 | 155.3 | 0.4 | 0.2 |
| | | | | | 30 | 0.3 | 910.7 | 0.6 | 0.5 | 0.3 | 531.9 | 0.6 | 0.5 | 0.3 | 175.3 | 0.6 | 0.5 |
| | | | | | 50 | 0.5 | 980.3 | 0.8 | 0.5 | 0.5 | 602.7 | 0.8 | 0.5 | 0.6 | 195.8 | 0.9 | 0.6 |
| | | | | | 70 | 0.8 | 1070.3 | 0.7 | 0.8 | 0.8 | 655.3 | 0.7 | 0.8 | 0.8 | 207.8 | 0.7 | 0.7 |
| EASY IPC GRID | 4 | 12.6 | 4.5 | 6.3 | 10 | 0.3 | 5.6 | 0.1 | 1.0 | 0.3 | 1.9 | 0.1 | 1.0 | 0.1 | 1.7 | 0.1 | 0.8 |
| | | | | | 30 | 0.3 | 7.3 | 0.3 | 0.6 | 0.3 | 2.0 | 0.3 | 0.6 | 0.3 | 2.0 | 0.4 | 0.78 |
| | | | | | 50 | 0.1 | 9.8 | 0.3 | 0.7 | 0.1 | 2.8 | 0.3 | 0.7 | 0.1 | 2.5 | 0.4 | 0.8 |
| | | | | | 70 | 0.3 | 15.3 | 0.0 | – | 0.3 | 3.5 | 0.0 | – | 0.3 | 3.1 | 0.0 | – |

Table 1: Comparison of the counterplanning approach in five domains using optimal and approximated goal recognition methods. Figures shown are all averages over the set of problems as explained in the text. The metrics measured are: size of Gφ, length of the plans for each agent, number of landmarks |LΠφ|, percentage of observations, goal recognition accuracy Q and its time Qt, counterplanning accuracy E, and penalty value Pe.
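For reference, a per-instance evaluation of E and Pe could be sketched as below, jointly executing both plans with the preventing agent acting first (cf. Definition 5). This is a simplified hypothetical harness, not the authors' experimental code; in particular, it approximates the mutex rule by skipping a seeker action whose preconditions no longer hold.

```python
def gamma(state, action):
    """STRIPS progression, as in the Section 2.1 sketch."""
    return (state - action.delete) | action.add if action.pre <= state else state

def evaluate_counterplan(state, seeker_plan, preventer_plan, seeker_goals):
    """Return (blocked, penalty) for one instance: blocked feeds the E metric,
    penalty is Pe = seeker steps successfully performed / |pi_phi|."""
    performed = 0
    steps = max(len(seeker_plan), len(preventer_plan))
    for t in range(steps):
        if t < len(preventer_plan):
            state = gamma(state, preventer_plan[t])   # preventing agent acts first
        if t < len(seeker_plan):
            a = seeker_plan[t]
            if a.pre <= state:                        # seeker action still applicable?
                state = gamma(state, a)
                performed += 1
    blocked = not frozenset(seeker_goals) <= state    # counterplan succeeded if goals not reached
    penalty = performed / len(seeker_plan) if seeker_plan else 0.0
    return blocked, penalty
```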
Summarizing, the best scenario for our counterplanning technique (high E values and low Pe values) arises when the preventing agent guesses the seeker's actual goal (Q = 1) with a low percentage of observed actions (i.e. very soon), and there is a high number of landmarks in the seeker's planning task which the preventing agent can delete (add).

## 5 Conclusions and Future Work

We have presented a novel, fully automatic, domain-independent approach for counterplanning, which is based on classical planning techniques. We have formally defined the counterplanning task involving two planning agents: an agent that seeks to achieve some goals, and an agent that tries to prevent its opponent from achieving those goals. To successfully block an agent in a domain-independent way, we: (1) recognize the opponent's goals by observing its performed actions; (2) identify the counterplanning landmarks of its planning task; and (3) generate a sequence of actions to block its goal achievement process as soon as possible. Results show the benefits of our approach in preventing the opponent from achieving its goals in several domains. Its performance depends on the ability of the preventing agent to quickly infer the hidden goal, and on the number of landmarks of the seeker's planning task.

In this work, we assume we are given the candidate goals for the goal recognition process (as is usual in the goal recognition literature). Future work would consist of relaxing this assumption and considering Fφ as Gφ. We also assume unit action costs. In future work, we would generalize our approach by considering non-unit action costs. Additionally, a natural extension of this work would be to consider a seeking agent capable of changing his/her plans and goals. It also seems possible to extend our approach to deal with noisy observations and uncertainty about the seeking agent's behavior.

## Acknowledgements

This work has been partially supported by MINECO projects TIN2014-55637-C2-1-R and TIN2017-88476-C2-2-R.

## References

[Albrecht et al., 1997] David W. Albrecht, Ingrid Zukerman, Ann E. Nicholson, and Ariel Bud. Towards a Bayesian model for keyhole plan recognition in large domains. In User Modeling, pages 365-376. Springer, 1997.

[Boddy et al., 2005] Mark S. Boddy, Johnathan Gohde, Thomas Haigh, and Steven A. Harp. Course of action generation for cyber security using classical planning. In ICAPS, 2005.

[Borck et al., 2015] Hayley Borck, Justin Karneeb, Ron Alford, and David W. Aha. Case-based behavior recognition in beyond visual range air combat. In FLAIRS, 2015.

[Carbonell, 1978] Jaime G. Carbonell. POLITICS: Automated ideological reasoning. Cognitive Science, 2(1):27-51, 1978.

[Carbonell, 1981] Jaime G. Carbonell. Counterplanning: A strategy-based model of adversary planning in real-world situations. Artificial Intelligence, 16(3):295-329, 1981.

[Cox, 2007] Michael T. Cox. Perpetual self-aware cognitive agents. AI Magazine, 28(1):32, 2007.

[E-Martín et al., 2015] Yolanda E-Martín, María D. R-Moreno, and David E. Smith. A fast goal recognition technique based on interaction estimates. In IJCAI, 2015.

[Edelkamp et al., 2009] Stefan Edelkamp, Carsten Elfers, Mirko Horstmann, Marcus-Sebastian Schröder, Karsten Sohr, and Thomas Wagner. Early warning and intrusion detection based on combined AI methods. In ICAPS, 2009.

[Geib and Goldman, 2009] Christopher W. Geib and Robert P. Goldman.
A probabilistic plan recognition algorithm based on plan tree grammars. Artificial Intelligence, 173(11):1101-1132, 2009.

[Ghallab et al., 2004] Malik Ghallab, Dana Nau, and Paolo Traverso. Automated Planning: Theory and Practice. Elsevier, 2004.

[Haslum, 2008] Patrik Haslum. Additive and reversed relaxed reachability heuristics revisited. International Planning Competition, 2008.

[Hoffmann et al., 2004] Jörg Hoffmann, Julie Porteous, and Laura Sebastia. Ordered landmarks in planning. Journal of Artificial Intelligence Research, 22:215-278, 2004.

[Hoffmann, 2015] Jörg Hoffmann. Simulated penetration testing: From Dijkstra to Turing Test++. In ICAPS, 2015.

[Jarvis et al., 2005] Peter A. Jarvis, Teresa F. Lunt, and Karen L. Myers. Identifying terrorist activity with AI plan recognition technology. AI Magazine, 26(3):73, 2005.

[Kabanza et al., 2010] Froduald Kabanza, Philipe Bellefeuille, Francis Bisson, Abder Rezak Benaskeur, and Hengameh Irandoust. Opponent behaviour recognition for real-time strategy games. In AAAI Workshop on Plan, Activity, and Intent Recognition, 2010.

[Molineaux et al., 2010] Matthew Molineaux, Matthew Klenk, and David W. Aha. Goal-driven autonomy in a Navy strategy simulation. In AAAI, 2010.

[Obes et al., 2013] Jorge Lucangeli Obes, Carlos Sarraute, and Gerardo Richarte. Attack planning in the real world. arXiv preprint arXiv:1306.4044, 2013.

[Ontañón et al., 2013] Santiago Ontañón, Gabriel Synnaeve, Alberto Uriarte, Florian Richoux, David Churchill, and Mike Preuss. A survey of real-time strategy game AI research and competition in StarCraft. IEEE Transactions on Computational Intelligence and AI in Games, 5(4):293-311, 2013.

[Pattison and Long, 2010] David Pattison and Derek Long. Domain independent goal recognition. In STAIRS, 2010.

[Pereira et al., 2017] Ramon Fraga Pereira, Nir Oren, and Felipe Meneguzzi. Landmark-based heuristics for goal recognition. In AAAI, 2017.

[Ramírez and Geffner, 2009] Miquel Ramírez and Hector Geffner. Plan recognition as planning. In IJCAI, 2009.

[Ramírez and Geffner, 2010] Miquel Ramírez and Hector Geffner. Probabilistic plan recognition using off-the-shelf classical planners. In AAAI, 2010.

[Ramírez and Geffner, 2011] Miquel Ramírez and Hector Geffner. Goal recognition over POMDPs: Inferring the intention of a POMDP agent. In IJCAI, 2011.

[Richter and Westphal, 2010] Silvia Richter and Matthias Westphal. The LAMA planner: Guiding cost-based anytime planning with landmarks. Journal of Artificial Intelligence Research, 39:127-177, 2010.

[Richter et al., 2011] Silvia Richter, Matthias Westphal, and Malte Helmert. LAMA 2008 and 2011. In International Planning Competition, 2011.

[Rowe, 2003] Neil C. Rowe. Counterplanning deceptions to foil cyber-attack plans. In IEEE Workshop on Systems, Man and Cybernetics Society, 2003.

[Sarraute et al., 2012] Carlos Sarraute, Olivier Buffet, and Jörg Hoffmann. POMDPs make better hackers: Accounting for uncertainty in penetration testing. In AAAI, 2012.

[Sukthankar et al., 2014] Gita Sukthankar, Christopher Geib, Hung Hai Bui, David Pynadath, and Robert P. Goldman. Plan, Activity, and Intent Recognition: Theory and Practice. Newnes, 2014.

[Vered and Kaminka, 2017] Mor Vered and Gal A. Kaminka. Heuristic online goal recognition in continuous domains. In IJCAI, 2017.

[Weber et al., 2010] Ben George Weber, Michael Mateas, and Arnav Jhala. Applying goal-driven autonomy to StarCraft. In AIIDE, 2010.

[Willmott et al., 2001] Steven Willmott, Julian Richardson, Alan Bundy, and John Levine.
Applying adversarial planning techniques to Go. Theoretical Computer Science, 252(1-2):45-82, 2001.