# Privacy Preserving Plans in Partially Observable Environments

Sarah Keren and Avigdor Gal and Erez Karpas
{sarahn@tx,avigal@ie,karpase@}.technion.ac.il
Technion, Israel Institute of Technology

Big brother is watching, but his eyesight is not all that great, since he only has partial observability of the environment. In such a setting, agents may be able to preserve their privacy by hiding their true goal, following paths that may lead to multiple goals. In this work we present a framework that supports the offline analysis of goal recognition settings with non-deterministic system sensor models, in which the observer has partial (and possibly noisy) observability of the agent's actions, while the agent is assumed to have full observability of his environment. In particular, we propose a new variation of worst case distinctiveness (wcd), a measure that assesses the ability to perform goal recognition within a model. We describe a new efficient way to compute this measure via a novel compilation to classical planning. In addition, we discuss the tools agents have to preserve privacy, by keeping their goal ambiguous as long as possible. Our empirical evaluation shows the feasibility of the proposed solution.

## 1 Introduction

Modern life takes place in environments that are constantly being monitored. Smart cities are equipped with surveillance and traffic cameras, Internet purchasing activities are monitored by online vendors, social media is monitored by law enforcement, etc. While monitoring activities may assist in preventing crimes, helping consumers get good deals on products, and preventing online incitement, they also intrude on the privacy of individuals, providing means to track them and recognize their goals. In a perfect world it would be up to the individual to decide when and where to share information, what he perceives to be public space, and how he chooses his practices of seeing and being seen [Hatuka and Toch, 2016]. The world is, however, imperfect in more ways than one, and it is this imperfection that may allow us to preserve privacy even in monitored environments. Goal recognition systems may suffer from reduced and noisy observability due to lack of suitable sensors, insufficient sensor coverage, faulty sensors, inaccurate measurements, etc. The literature offers ways for goal recognition systems to handle partial observability [Ramirez and Geffner, 2011; Geib and Goldman, 2005; Avrahami-Zilberbrand et al., 2005], and a recent work [Keren et al., 2016] discussed the ability of such systems to recognize goals as early as possible in an extreme case of non-observability, in which an action can either be observed fully and accurately or not at all.

In this work we present a framework that supports the offline analysis of goal recognition under a full-fledged partial observability model that handles partial observability and the uncertainty that stems from it. In the proposed model, agents may either act optimally [Keren et al., 2014] or be boundedly suboptimal [Keren et al., 2015]. The agent follows his own strategy in a deterministic fashion and is fully aware of his environment and the impact of his actions. The goal recognition system has information about the agent's possible paths to his goal, but suffers from a limited ability to observe the agent's actions.
To model partial observability, we use non-deterministic system sensor models, in which the observer has partial (and possibly noisy) observability of the agent's actions, while the agent is assumed to have full observability of his environment. Similarly to Bonet and Geffner (2014), we model the system's partial observability using observation tokens that are emitted when an agent performs an action. We use a special null token to denote an action that cannot be observed by the system. The sensor model of the system is a function that maps each action into a set of possible observation tokens that may be emitted when the action is taken. Non-determinism and partial observability are modeled by mapping multiple actions into the same observation token, or by the same action possibly emitting different tokens.

Given the proposed model, we offer a revised version of worst case distinctiveness (wcd) [Keren et al., 2014], a measure that assesses the ability to perform goal recognition within a model. We define non-distinctive paths as paths that lead to different goals, yet can emit the same sequence of observation tokens, due to either shared actions between the paths or the inability of the system to distinguish different actions that emit the same observation token. We describe a new efficient way to compute this measure via a novel compilation to classical planning. wcd, being a worst case measure, takes into account all possible observation tokens for each action, regardless of how unlikely they are to occur. Therefore, our compilation uses an all token determinization (similarly to the all outcome determinization in probabilistic planning), which ignores the probability of each observation token. This measure serves as the basis for decision making for agents that wish to keep their goal ambiguous as long as possible to preserve privacy. To illustrate such decision making, we show how a user can choose, by using wcd, between two alternative monitoring environments that result in different levels of sensor refinement (e.g., by turning the phone's GPS locator on or off).

It is worth noting that our privacy preserving setting is different from other works that deal with privacy preservation in multi-agent planning, e.g., [Brafman, 2015]. In our setting, there is a single agent who wants to keep his goal private from an observer, while in the multi-agent planning setting, multiple agents must work together while attempting to keep some information private.

Figure 1: A partially observable goal recognition design problem

Example 1 To illustrate the problem at hand, consider Figure 1, which illustrates a simplified grid setting. An agent walks from a starting point s to one of two goals, g0 or g1. Figure 1(left) illustrates the actual path an agent takes, aiming at g0. The observation of this path differs according to the two possible sensor models depicted in Figure 1(middle) and Figure 1(right). Figure 1(middle) depicts a noisy sensor model. When the agent walks straight ahead, there are two possible observations that may be produced: walking either straight or to the right. If the agent walks to the left, the two possible observations are walking to the left or walking straight. In the worst case, walking to the left (towards g0) does not emit an observation token that disambiguates between the paths to the two goals.
On the right, the sensor is deterministic, but its granularity forces it to increase its grid size, so it cannot tell until late in the path which goal the agent targets (for example, because the GPS locator of the agent's phone is turned off and the use of location-by-cell decreases observability). The same observation token is shared by several of the original grid cells, making it impossible to recognize the goal of the agent early on.

The rest of the paper is organized as follows. A model for partially observable environments is given in Section 2, followed by wcd calculation (Section 3) and a discussion on how wcd can assist in privacy preservation (Section 4). Empirical evaluation is presented in Section 5, followed by related work (Section 6) and concluding remarks (Section 7).

## 2 Model

A model for partially observable goal recognition design with action tokens (grd-at) is given by the tuple $D = \langle P_D, G_D, leg(G_D), O_D, S_D \rangle$, where:

- $P_D = \langle F_D, I_D, A_D \rangle$ is a planning domain [Ramirez and Geffner, 2009] using the STRIPS formalism [Fikes and Nilsson, 1972]. $F_D$ is a set of fluents, $I_D \subseteq F_D$ is the initial state, and $A_D$ is a set of actions, each represented by a triple $a = \langle pre(a), add(a), del(a) \rangle$: the precondition, add, and delete lists, respectively, all subsets of $F_D$.
- $G_D$ is a set of possible goals, where each possible goal $g \in G_D$ is a subset of $F_D$.
- $leg(G_D) = \bigcup_{g \in G_D} leg(g)$ is a set of legal plans to each goal. A plan is a sequence of actions that takes the agent from $I_D$ to a goal in $G_D$. Legal plans are those allowed under the assumptions on the agent's behavior.
- $O_D$ is the observation token set, including the special token $o_\emptyset$, denoting that an action could be non-observable.
- $S_D : A_D \to 2^{O_D} \setminus \emptyset$ is a sensor model, mapping each action $a \in A_D$ into a set of observation tokens $S_D(a) \subseteq O_D$ that may be emitted when $a$ is executed.

An agent, aiming at one of the goals $g \in G_D$, enters the system at the initial state $I_D$ and executes one of the legal plans to $g$. The set of legal plans may include any plan an agent can take to achieve goal $g$, and can be described either explicitly or symbolically (e.g., the set of all optimal plans that do not make use of action $a$). Each action $a$ performed by the agent emits one of the observation tokens $o \in S_D(a)$, with the special token $o_\emptyset$ denoting that the action is not observed by the system. Note that the sensor model refers to how the goal recognition system observes the actions of the agent, while the agent himself is assumed to have full observability. We now formally define the relationship between a path and the set of observation sequences it could emit.

Definition 1 Given a path $\vec{\pi} = \langle a_1, \dots, a_n \rangle$, the set of possible observable projections of $\vec{\pi}$ in $D$, denoted $op_D(\vec{\pi})$ ($op(\vec{\pi})$ when clear from the context), is defined recursively (with $op(\langle\rangle) = \{\langle\rangle\}$) as:

$$op(\vec{\pi}) = \begin{cases} S(a_1) \cdot op(\langle a_2, \dots, a_n \rangle) & \text{if } o_\emptyset \notin S(a_1) \\ \big( (S(a_1) \setminus \{o_\emptyset\}) \cdot op(\langle a_2, \dots, a_n \rangle) \big) \cup op(\langle a_2, \dots, a_n \rangle) & \text{if } o_\emptyset \in S(a_1) \end{cases}$$

where $\cdot$ denotes prepending each token in the left-hand set to each sequence in the right-hand set.

The empty token $o_\emptyset$ is excluded from the observable projection of a path. This allows the model to account for settings in which there is no way to know if and when some action has been performed. The grd-at setting is therefore a generalization of the grd-po setting [Keren et al., 2016], in which each action is mapped to either the empty token (non-observable) or to the action name (observable).
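To make Definition 1 concrete, the following is a minimal Python sketch (our own rendering, not from the paper) that enumerates the observable projections of a path under an explicit sensor model given as a dictionary from action names to token sets; `NULL` plays the role of $o_\emptyset$.

```python
NULL = "o_null"  # stands for the special token o_empty (an unobserved execution)

def observable_projections(path, sensor):
    """Enumerate op(path) per Definition 1.

    path   -- tuple of action names
    sensor -- dict mapping each action name to the set of tokens it may emit
    """
    if not path:
        return {()}
    first, rest = path[0], path[1:]
    tail = observable_projections(rest, sensor)
    result = set()
    # Every non-null token the first action may emit is prepended to
    # every projection of the remaining path.
    for token in sensor[first] - {NULL}:
        result |= {(token,) + t for t in tail}
    # If the action may go unobserved, the projections of the remaining
    # path also appear with no token emitted for it at all.
    if NULL in sensor[first]:
        result |= tail
    return result

# Example: action "a" may go unobserved, so both 1- and 2-token sequences arise.
sensor = {"a": {"x", NULL}, "b": {"x", "y"}}
assert observable_projections(("a", "b"), sensor) == {
    ("x", "x"), ("x", "y"), ("x",), ("y",)}
```

Note how the number of projections can grow exponentially with path length, which is what makes the baseline computation described in the next section expensive.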
Next, we define the relationship between a goal and the path executed by an agent, as well as the relationship between a goal and the observation sequence emitted by the path.

Definition 2 A path $\vec{\pi}$ satisfies a goal $g$ if it follows a valid plan to $g$, i.e., $\exists \pi \in leg(g)$ s.t. $\vec{\pi}$ is a prefix of $\pi$. We denote the set of goals satisfied by path $\vec{\pi}$ in $D$ by $G^A_D(\vec{\pi})$. An observation sequence $\vec{o}$ satisfies a goal $g$ if $\exists \vec{\pi}$ that satisfies $g$ such that $\vec{o} \in op(\vec{\pi})$. We denote the set of goals satisfied by observation sequence $\vec{o}$ by $G^O_D(\vec{o})$.

Our analysis is based on the discovery of behaviors whose observable projection does not reveal the goal of the executing agent, i.e., of paths whose observable projection satisfies more than one goal. We define non-distinctive observation sequences and paths as follows.

Definition 3 $\vec{o}$ is a non-distinctive observation sequence if $|G^O_D(\vec{o})| > 1$, that is, if it satisfies more than one goal. Otherwise, it is distinctive. $\vec{\pi}$ is a non-distinctive path if $|G^A_D(\vec{\pi})| \geq 1$ and $\max_{\vec{o} \in op(\vec{\pi})} |G^O_D(\vec{o})| > 1$. Otherwise, it is distinctive.

A non-distinctive path is therefore a path that might lead to some goal, and at least one of its observable projections $\vec{o} \in op(\vec{\pi})$ is non-distinctive. Note that even if a path can only lead to one goal, it is possible that its observable projection is non-distinctive, and therefore the path is non-distinctive as well. Finally, we can define worst case distinctiveness (wcd) in terms of non-distinctive paths. We denote the set of non-distinctive paths in $D$ by $\vec{\Pi}_{nd}(D)$, and define the worst case distinctiveness of $D$ as the maximum number of steps an agent can take in $D$ without revealing his goal, that is,

$$wcd(D) = \max_{\vec{\pi} \in \vec{\Pi}_{nd}(D)} |\vec{\pi}|$$

We denote the non-distinctive paths that are prefixes of legal plans to $g_i$ by $\vec{\Pi}_{nd}(g_i)$, and the length of the longest path in $\vec{\Pi}_{nd}(g_i)$ by $wcd\text{-}g_i(D) = \max_{\vec{\pi} \in \vec{\Pi}_{nd}(g_i)} |\vec{\pi}|$. Following Keren et al. (2016), $wcd(D) = \max_i(wcd\text{-}g_i(D))$, a property exploited by the compilation presented next.

## 3 Calculating wcd

The baseline method for wcd calculation is a breadth first search through the space of paths. A search node (path) can be pruned if it does not represent a prefix of a legal plan, or if it is distinctive. In order to determine whether a path $\vec{\pi}$ is distinctive under the grd-at setting, we can solve a goal recognition problem for each possible observation sequence $\vec{o} \in op(\vec{\pi})$. If any of these observation sequences supports more than one possible goal, then $\vec{\pi}$ is non-distinctive. For a non-deterministic sensor model, the number of possible observation sequences of a path may be exponential in the path length. Thus, while the BFS method supports any possible set of legal plans, it is clearly inefficient.
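The baseline can be sketched as follows (a schematic rendering under our own naming; the legality and goal-recognition tests are left as callbacks, since they depend on how leg(G) is specified):

```python
from collections import deque

def wcd_bfs(actions_after, is_legal_prefix, goals_satisfied, projections):
    """Baseline wcd computation: BFS over the space of paths.

    actions_after(path)   -- actions applicable at the end of `path`
    is_legal_prefix(path) -- True iff `path` is a prefix of some legal plan
    goals_satisfied(obs)  -- set of goals an observation sequence satisfies
    projections(path)     -- op(path), as in Definition 1
    """
    wcd = 0
    queue = deque([()])
    while queue:
        path = queue.popleft()
        for action in actions_after(path):
            new_path = path + (action,)
            if not is_legal_prefix(new_path):
                continue  # prune: cannot be extended into a legal plan
            # Non-distinctive (Definition 3): some observable projection
            # still satisfies more than one goal.
            if any(len(goals_satisfied(obs)) > 1
                   for obs in projections(new_path)):
                wcd = max(wcd, len(new_path))
                queue.append(new_path)
            # distinctive paths are pruned: their extensions stay distinctive
    return wcd
```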
Next, we present a more efficient approach to calculate the wcd for the special cases of optimal or boundedly suboptimal (reaching a goal with a bounded cost beyond optimal) legal plans. Our approach uses a compilation to classical planning. For the sake of brevity, we only present the full details for optimal plans; the modification needed to fit the boundedly suboptimal case follows closely that of [Keren et al., 2015]. For a pair of goals $\langle g_0, g_1 \rangle$, we formulate a planning problem that finds $wcd\text{-}g_{0,1}$, the maximal non-distinctive path leading to $g_0$, assuming $g_1$ is the only other possible goal. The wcd of the model is the maximum over all (ordered) goal pairs in the domain [Keren et al., 2016]. The planning problem created for each goal pair (dubbed common-declare) is given in Figure 2. It involves two agents, each aiming at one of the goals. Note that although the compilation includes two agents, the proposed model serves a single agent with the objective of finding a plan to follow that potentially maximizes privacy.

The two-agent model is used to support two possible behaviors of agents aiming at different goals within a single search problem. This approach allows us to find the desired plan for an agent by solving a single planning problem using any off-the-shelf planner. Each agent (agent0 and agent1) has a copy $f_i$ of each fact $f$. Both agents start at the same initial state and aim at different goals. Actions are divided into real actions, which change the state of the world, and declare actions, which correspond to the emission of the observation tokens and are used to account for what the goal recognition system sees. Whenever an agent performs a real action that emits an observation token $k$ (indicated by the addition of $performed_{k,i}$ to the current state), it needs to declare the token before it can execute the next action (indicated by $declared_i$). Both agents perform real actions separately, but can perform a declare either separately ($Declare^k_i$, $Declare^{o_\emptyset}_i$) or jointly ($Declare^k_{0,1}$), if both emit the same observation token $k$. We guarantee the discovery of $wcd\text{-}g_{0,1}$ by allowing joint declares only as long as not-exposed (a boolean flag, initially set to true and allowing joint and unobserved declare statements) is true, and by assigning them a discount which is sufficiently small to prevent agents from deviating from an optimal plan. In addition, while not-exposed is true, agents can declare an empty token ($Declare^{o_\emptyset}_i$) with no cost. Once either agent performs a separate $Declare^k_i$ action (corresponding to the execution stage in which the goal of the agents is revealed), not-exposed is deleted once and for all, thus forcing the agents to report their action tokens separately and eliminating the discount agent0 got for declare actions. Note that no cost is assigned to agent1's declare actions, guaranteeing that an optimal solution maximizes the non-distinctive prefix of agent0, rather than choosing a path that maximizes the number of non-observable actions performed by both agents.

We account for the non-deterministic system sensor model by considering all the possible observation sequences that may be produced by a path. This is achieved by creating for each action $a \in A$ a separate action copy $a^k_i$ for each token $k \in S(a)$, indicating that agent $i$ executed action $a$ and observation token $k$ was emitted. This lets the planner choose the observation token each action occurrence emits, ensuring the compilation discovers the maximal non-distinctive path.

Using an optimal planner, we find an optimal plan for the compiled problem. The plan, which consists of actions for both agents, is divided into two parts: a joint declaration part, including all the real actions to which the joint declare actions refer, and a separate part. $wcd\text{-}g_{0,1}$ is the number of real actions agent0 performs before the first action whose token is declared separately. Given a solution $\pi_{P'}$ to $P'$, we mark the projection of $\pi_{P'}$ on each agent $i$ as $\pi_{P'}(g_i)$, which includes all actions in $A_i$ that appear in $\pi_{P'}$ (excluding the declare actions). Accordingly, the projection of the optimal solution $\pi^*_{P'}$ on each agent is marked as $\pi^*_{P'}(g_i)$. We guarantee that $\pi^*_{P'}(g_i)$ yields a legal plan for both agents in $D$ by bounding $\epsilon$, the penalty for declare actions, such that the maximal accumulated penalty is lower than the smallest possible diversion from a legal path of either agent. The minimal cost diversion is 1. Accordingly, whenever $\epsilon < \frac{1}{\max_i(C^*(g_i))}$, with $C^*(g_i)$ being the optimal cost of achieving $g_i$, both agents act optimally in $P'$ [Keren et al., 2014]. In addition, the model supports settings in which an agent may reach his goal without being observed.

For a grd-at problem $D = \langle P, G = \{g_0, g_1\}, leg(G), O, S \rangle$, where $P = \langle F, I, A \rangle$, we create a planning problem $P' = \langle F', I', A', G' \rangle$ with action costs $C'$, where:

- $F' = \{f_0, f_1 \mid f \in F\} \cup \{not\text{-}exposed\} \cup \{declared_i \mid i \in \{0,1\}\} \cup \{performed_{k,i} \mid k \in O, i \in \{0,1\}\}$
- $I' = \{f_0, f_1 \mid f \in I\} \cup \{not\text{-}exposed, declared_0, declared_1\}$
- $A' = A_0 \cup A_1 \cup \{Declare^k_{0,1} \mid k \in O\} \cup \{Declare^k_i \mid i \in \{0,1\}, k \in O\} \cup \{Declare^{o_\emptyset}_i \mid i \in \{0,1\}\}$, where
  - $A_i = \{a^k_i \mid a \in A, k \in S(a)\}$, with $a^k_i = \langle \{f_i \mid f \in pre(a)\} \cup \{declared_i\},\ \{f_i \mid f \in add(a)\} \cup \{performed_{k,i}\},\ \{f_i \mid f \in del(a)\} \cup \{declared_i\} \rangle$
  - $Declare^k_{0,1} = \langle \{performed_{k,0}, performed_{k,1}, not\text{-}exposed\},\ \{declared_0, declared_1\},\ \{performed_{k,0}, performed_{k,1}\} \rangle$
  - $Declare^{o_\emptyset}_i = \langle \{performed_{o_\emptyset,i}, not\text{-}exposed\},\ \{declared_i\},\ \{performed_{o_\emptyset,i}\} \rangle$
  - $Declare^k_i = \langle \{performed_{k,i}\},\ \{declared_i\},\ \{not\text{-}exposed, performed_{k,i}\} \rangle$
- $G' = \{f_0 \mid f \in g_0\} \cup \{f_1 \mid f \in g_1\} \cup \{declared_0, declared_1\}$
- $C'(a) = 1$ if $a \in A_i$; $\epsilon/2$ for $Declare^k_{0,1}$; $\epsilon$ for $Declare^k_0$; $0$ for $Declare^k_1$ and $Declare^{o_\emptyset}_i$

Figure 2: The common-declare compilation
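As an illustration of the all token determinization, the following sketch (our own simplification, with STRIPS actions as (pre, add, delete) triples of frozensets) builds just the per-agent action copies $a^k_i$ from Figure 2, one copy per emittable token, so that an off-the-shelf planner is free to choose which token each action occurrence emits:

```python
def agent_action_copies(actions, sensor, agent):
    """Build the copies a_i^k of Figure 2 for one agent.

    actions -- dict: name -> (pre, add, delete), each a frozenset of fluents
    sensor  -- dict: name -> set of tokens S(a)
    agent   -- agent index, 0 or 1
    """
    def tag(fluents):  # agent-indexed copy f_i of each fluent f
        return frozenset(f"{f}_{agent}" for f in fluents)

    copies = {}
    for name, (pre, add, delete) in actions.items():
        for token in sensor[name]:
            copies[f"{name}_{token}_{agent}"] = (
                tag(pre) | {f"declared_{agent}"},           # must have declared
                tag(add) | {f"performed_{token}_{agent}"},  # record emitted token
                tag(delete) | {f"declared_{agent}"},        # must declare again
            )
    return copies
```

The declare actions and the cost function of Figure 2 are generated analogously.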
Given a solution $\pi_{P'}$, $unexposed(\pi_{P'}(g_i))$ denotes the prefix of $\pi_{P'}(g_i)$ that includes all the actions, excluding the last one, performed by agent$_i$ before the first $Declare^k_i$ occurred.

Lemma 1 $unexposed(\pi_{P'}(g_i))$ is non-distinctive.

Proof: The compilation guarantees that any action in $unexposed(\pi_{P'}(g_i))$ is either non-observable, followed by a $Declare^{o_\emptyset}_i$, or observable, followed by a $Declare^k_{0,1}$ action in which both agents reported their pending token together. This means that the actions that appear in $unexposed(\pi_{P'}(g_i))$ form a sequence that produces an observable projection shared by both goals, and is therefore non-distinctive.

Lemma 1 claims that $unexposed(\pi_{P'}(g_i))$ is non-distinctive. Theorem 1 shows that the optimal solution to $P'$ yields $wcd\text{-}g_{0,1}$, thus concluding our proof of correctness.

Theorem 1 Given a grd-at model $D$ with two goals $\{g_0, g_1\}$ and a model $P'$ created according to the common-declare compilation, $wcd\text{-}g_{0,1}(D) = |unexposed(\pi^*_{P'}(g_0))|$.

Proof: The bound on $\epsilon$ described above guarantees that, apart from the no-cost operation Do Expose, the solution to $P'$ consists solely of actions that form a pair of optimal paths to each of the goals (in the case of bounded non-optimal agents, the cost of the paths is optimal with regard to the bound). Lemma 1 shows that $unexposed(\pi_{P'}(g_0))$ represents a non-distinctive path. Since the only way to minimize the cost of $P'$ is to maximize the number of actions agent0 performs before his first separate report, the optimal solution $\pi^*_{P'}$ is the solution to $P'$ that maximizes $|unexposed(\pi_{P'}(g_0))|$.

As a final note, while the compilation described in Figure 2 could admit many permutations of the same plan, we can optimize the process by disallowing some permutations with additional constraints. First, before the agents' goals are exposed, we force them to act in a round-robin fashion, where each round consists of applying a sequence of non-observable actions and ends with applying an action that emits a non-null observation token. Second, after exposing the goals, we force agent1 to wait until agent0 achieves his goal, similarly to [Keren et al., 2014]. It is easy to see that from any pair of plans to $g_0$ and $g_1$, we can construct a solution for our compiled model that respects these constraints. We omit the full description of the compilation with these constraints for the sake of clarity.

## 4 Privacy Preservation with wcd

Having introduced a method for computing the wcd of a model, we now discuss how it can be used to assist agents in their decision making. To start off, wcd can serve as a measure for privacy preservation, answering the question: what is the maximal duration an agent can keep his goal ambiguous? The user can choose a path that potentially maximizes his privacy, namely the path found by solving a planning problem $P'$ using the common-declare compilation, which allows him to stay ambiguous for at most wcd steps. Let $g_0$ be an agent's true goal. To maximize his privacy, the agent is interested in the wcd value with respect to $g_0$ only. Therefore, we seek $wcd\text{-}g_0(D)$, which can be computed as $\max_{g_i \in G_D \setminus \{g_0\}} wcd\text{-}g_{0,i}(D)$. While there is a single $wcd\text{-}g_0(D)$ value in a model, the compilation of Figure 2 returns only one among (possibly) several non-distinctive paths. An interesting question is whether choosing one such path over another may bring an agent closer to his goal, in terms of the number of steps.
The following proposition provides an interesting observation that supports the use of the chosen non-distinctive path.

Proposition 1 Let $\vec{\pi}$ and $\vec{\pi}'$ be two non-distinctive paths that satisfy a goal $g$ s.t. $|\vec{\pi}| = |\vec{\pi}'| = wcd\text{-}g$. Let $leg(g)$ be the set of optimal plans to $g$, and let $\vec{\pi}$ be a prefix of $\pi \in leg(g)$ and $\vec{\pi}'$ be a prefix of $\pi' \in leg(g)$. Then, $|\pi| - |\vec{\pi}| = |\pi'| - |\vec{\pi}'|$.

Proof: $|\vec{\pi}| = |\vec{\pi}'| = wcd\text{-}g$, which means that they are both maximal non-distinctive paths in the model; otherwise $wcd\text{-}g$ would not be the length of the longest non-distinctive path, contradicting its definition. Assume, by way of contradiction and w.l.o.g., that $|\pi| - |\vec{\pi}| > |\pi'| - |\vec{\pi}'|$. Therefore, $|\pi| - wcd\text{-}g > |\pi'| - wcd\text{-}g$, yielding $|\pi| > |\pi'|$, which means that $\pi$ is not an optimal plan, contradicting the fact that $\pi \in leg(g)$ and that $leg(g)$ is a set of optimal plans to $g$.

According to Proposition 1, any solution the planner provides may bring the agent as close as possible to his goal, while staying ambiguous about it, as long as the agent uses optimal plans. Whenever the set of legal plans is bounded optimal [Keren et al., 2015] with a budget of $b$, it can be shown that the difference between the proposed path and the closest plan to his goal cannot be more than $b$.

An agent can affect the environment's ability to monitor his actions. We illustrate this ability via a modification example of sensor cloaking, a general approach for modifying partially observable goal recognition models by partitioning (or bundling) actions to emit one or more observation tokens. This is equivalent to adding (or removing) sensors. Next, we examine the effect of these modifications on the grd-at model. We let $A_S[a]$ represent the set of actions in $A$ that share a common observation token with action $a$, and define the refinement relation between two sensor models as follows.

Definition 4 Let $S : A \to 2^O \setminus \emptyset$ and $S' : A \to 2^{O'} \setminus \emptyset$ be two sensor models (defined over the same set of actions $A$, but differing in their observation token sets $O$ and $O'$). $S'$ is a refinement of $S$ if for every action $a \in A$: (1) if $o_\emptyset \in S'(a)$ then $o_\emptyset \in S(a)$, and (2) $A_{S'}[a] \subseteq A_S[a]$.

Note that the token set of a given action can include the empty token in the refined model only if it was included in the original model.
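For explicit sensor models, checking Definition 4 is mechanical. A small sketch under our own naming (whether $o_\emptyset$ counts toward sharing a token is a modeling choice we leave visible in the code):

```python
NULL = "o_null"  # the special token o_empty

def confusable(S, a):
    """A_S[a]: the actions that share at least one observation token with a."""
    return {b for b in S if S[a] & S[b]}

def is_refinement(S_new, S_old):
    """Check Definition 4: does S_new refine S_old (same action set)?"""
    for a in S_old:
        # Condition (1): a may be unobservable under S_new only if it
        # already was under S_old.
        if NULL in S_new[a] and NULL not in S_old[a]:
            return False
        # Condition (2): the set of actions confusable with a must not grow.
        if not confusable(S_new, a) <= confusable(S_old, a):
            return False
    return True
```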
We show that cloaking by way of refinement cannot increase a model's wcd.

Theorem 2 Given two grd-at models $D$ and $D'$ that differ only in their respective sensor models $S$ and $S'$ and induced observation token sets $O$ and $O'$ (and therefore $\vec{\Pi}_{leg}(G_D) = \vec{\Pi}_{leg}(G_{D'})$), if $S'$ is a refinement of $S$ then for every $\vec{\pi} \in \vec{\Pi}_{leg}(G_D)$,

$$\max_{\vec{o}' \in op_{D'}(\vec{\pi})} |G^O_{D'}(\vec{o}')| \leq \max_{\vec{o} \in op_D(\vec{\pi})} |G^O_D(\vec{o})| \qquad (1)$$

Proof: According to Definition 1, the set of possible observation sequences generated by the execution of a path $\vec{\pi}$ is $op_D(\vec{\pi})$ in $D$ and $op_{D'}(\vec{\pi})$ in $D'$. Assume to the contrary that $S'$ is a refinement of $S$ but there exists $\vec{\pi} \in \vec{\Pi}_{leg}(G_D)$ such that $\max_{\vec{o}' \in op_{D'}(\vec{\pi})} |G^O_{D'}(\vec{o}')| > \max_{\vec{o} \in op_D(\vec{\pi})} |G^O_D(\vec{o})|$. Since $\vec{\pi} \in \vec{\Pi}_{leg}(G_D)$, there is at least one goal $g_{\vec{\pi}}$, the actual goal of the acting agent, which is satisfied by all the observable projections $\vec{o} \in op_D(\vec{\pi})$ (by Definition 2). Our assumption implies that there is at least one other goal $g_o$ ($g_o \neq g_{\vec{\pi}}$) s.t. at least one observable projection of $\vec{\pi}$ satisfies both $g_{\vec{\pi}}$ and $g_o$ in $D'$, but no such observable projection exists in $D$; otherwise Eq. 1 fails to hold. We now look at the path $\vec{\pi}_{g_o} \in \vec{\Pi}_{leg}(g_o)$ that shares a common observable projection with $\vec{\pi}$ in $D'$ (but not in $D$). $\vec{\pi}_{g_o}$ exists in both models, since the legal paths are the same in both. $\vec{\pi}$ and $\vec{\pi}_{g_o}$ share an observable projection in $D'$; therefore, for every prefix of $\vec{\pi}$ there is at least one prefix of $\vec{\pi}_{g_o}$ with which it shares an observable projection in $D'$. Let $i$ be the index of the first action $a_i$ in $\vec{\pi}$ s.t. the prefix $\vec{\pi}_{1..i} = \langle a_1, \dots, a_i \rangle$ shares no common observable projection with a prefix of $\vec{\pi}_{g_o}$ in $D$ (while there is always one in $D'$, due to the way $\vec{\pi}$ is selected). According to Definition 1, this can happen in one of two cases. Either $o_\emptyset \in S'(a_i)$ but $o_\emptyset \notin S(a_i)$ and $op(\vec{\pi}_{1..i}) \cap op(\vec{\pi}_{1..i-1}) \neq \emptyset$; since $a_i$ is the first action for which no common observable projection exists in $D$, we know that $op(\vec{\pi}_{1..i-1})$ shares its observable projection with some prefix of $\vec{\pi}_{g_o}$ both in $D$ and in $D'$, which contradicts condition (1) of the refinement assumption, namely that if $o_\emptyset \in S'(a)$ then $o_\emptyset \in S(a)$. Otherwise, there is at least one action in $\vec{\pi}_{g_o}$ that shares a common token with $a_i$ in $D'$ and not in $D$, which contradicts condition (2), namely that for every action $a \in A$, $A_{S'}[a] \subseteq A_S[a]$.

**LOG**

| | FULL | NO | POD-Obj | POD-Ac | POND |
|---|---|---|---|---|---|
| wcd | 1 | 1.2 | 1.2 | 13 | 13 |
| time(LS) | 2.85 | - | - | - | - |
| time(LE) | 35.1 | 83.75 | - | - | - |
| time(CD) | 263.8 | 107.1 | 94.7 | 117.3 | 397.3 |
| % CD | 0.8 | 0.9 | 0.9 | 0.85 | 0.7 |

**BLOCK**

| | FULL | NO | POD-Obj | POD-Ac | POND |
|---|---|---|---|---|---|
| wcd | 5.3 | 6.1 | 6.1 | 8.5 | 8.5 |
| time(LS) | 4.9 | - | - | - | - |
| time(LE) | 72.4 | 74.1 | - | - | - |
| time(CD) | 82 | 103.3 | 96.1 | 113.2 | 373.5 |
| % CD | 1.0 | 1.0 | 1.0 | 1.0 | 0.75 |

**GRID**

| | FULL | NO | POD | POND |
|---|---|---|---|---|
| wcd | 2.8 | 3.02 | 3.09 | 3.18 |
| time(LS) | 0.3 | - | - | - |
| time(LE) | 0.3 | 0.24 | - | - |
| time(CD) | 0.63 | 0.64 | 0.48 | 1.33 |
| % CD | 1.0 | 1.0 | 1.0 | 1.0 |

Table 1: wcd values, running time (seconds), and coverage ratio ('-' marks settings where a compilation does not apply)

Corollary 1 Given two grd-at models $D$ and $D'$ that differ only in their sensor model, s.t. $S'$ is a refinement of $S$, then $wcd(D') \leq wcd(D)$.

Proof: Let $\vec{\pi}$ be a non-distinctive path in $\vec{\Pi}_{nd}(D')$. $\vec{\pi} \in \vec{\Pi}_{leg}(G_D) \cap \vec{\Pi}_{leg}(G_{D'})$, since $D$ and $D'$ differ only in their sensor model. From Theorem 2, $\max_{\vec{o}' \in op_{D'}(\vec{\pi})} |G^O_{D'}(\vec{o}')| \leq \max_{\vec{o} \in op_D(\vec{\pi})} |G^O_D(\vec{o})|$. Therefore, $\vec{\pi} \in \vec{\Pi}_{nd}(D)$ (Definition 3). Since this holds for any non-distinctive path in $D'$, we get that $wcd(D') \leq wcd(D)$.

To illustrate Corollary 1, consider Example 1 and Figure 1. The model on the left is more refined than the one on the right. In the left image, the marked path becomes distinctive after one step, since moving left indicates the agent aims at $g_0$. In the model on the right, the path remains non-distinctive after the movement left, and stays non-distinctive for the next two steps.

Corollary 1 provides a useful tool for agents. By computing the wcd of two models that differ in their sensor model, an agent can make an educated decision on whether the difference in the wcd value is worth the cost that comes with less refined sensors. For example, by turning off the phone's GPS, an agent may reach a higher wcd value, yet this comes at the cost of less accurate positioning information that could confuse a navigation system. The agent can decide, based on his own utility function, which model is preferred.
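Corollary 1 thus turns wcd into a decision criterion. A hedged sketch of the two computations Section 4 describes (all function arguments here are placeholders for the machinery above):

```python
def wcd_for_goal(true_goal, goals, wcd_pair):
    """wcd-g0(D): max over all other goals g_i of wcd-g0,i(D) (Section 4).

    wcd_pair(g0, g1) -- solves the common-declare compilation for one pair
    """
    return max(wcd_pair(true_goal, g) for g in goals if g != true_goal)

def choose_sensor_model(candidates, wcd_of, utility):
    """Pick a monitoring configuration (e.g., GPS on vs. off) by trading
    privacy (higher wcd means longer ambiguity) against the cost of coarser
    sensing, according to the agent's own utility function."""
    return max(candidates, key=lambda model: utility(model, wcd_of(model)))
```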
## 5 Empirical Evaluation

We performed an evaluation of the grd-at framework to measure the effect non-deterministic partially observable sensor models have on the wcd value of a model, and the efficiency of wcd calculation using the common-declare compilation.

Datasets As the number of possible sensor models is huge, we manually created a few such models for each domain, based on some simple rules. We used 20 problems from the LOGISTICS (LOG) domain and 12 from the BLOCKSWORLD (BLOCK) domain, which were shown to be the most computationally challenging for goal recognition design in previous work [Keren et al., 2016], and 34 GRID-NAVIGATION (GRID) problems of size 25x25.

Setup For each problem, we created 5 versions (the sketch at the end of this section illustrates the resulting token maps):

- Fully observable (FULL): all actions are observable.
- Non observable actions (NO): the non-observable actions are Load and Unload in LOG, Pick Up and Put Down in BLOCK, and a random 10% of Move actions in GRID.
- Partially observable deterministic (POD): for LOG and BLOCK we created two POD sensor models, both keeping the same non-observable actions as NO. In POD-Ac, each observable action emits its type as an observation token (that is, in LOG all Drive actions are indistinguishable, as are Fly, Load Airplane, and Unload Airplane, and similarly Stack and Unstack in BLOCK). In POD-Obj, each observable action emits its first argument as an observation token (that is, we know which truck, airplane, package, or block was affected). For GRID we created a sensor model with reduced granularity, mapping each set of 4 target cells to the same token. Due to the domain's simplicity, we refrained from presenting two model variations.
- Partially observable non-deterministic (POND): for BLOCK and LOG, each observable action can emit one of the two observation tokens described in POD. For GRID, we followed the example in Figure 1(right), where the movement to every cell can produce a token corresponding to a movement from any one of its adjacent cells. For all domains we keep the same non-observable actions as in NO and POD.

We compared three compilations: latest split (LS) [Keren et al., 2014], latest expose (LE) [Keren et al., 2016], and common-declare (CD), using each compilation where it applies. We used the Fast Downward planning system [Helmert, 2006] running A* with the LM-cut heuristic [Helmert and Domshlak, 2009]. Experiments were run on Intel(R) Xeon(R) CPU X5690 machines, with a time limit of 30 minutes and a memory limit of 2 GB.

Results Table 1 summarizes the results, showing the average wcd and runtime for each compilation. The bottom row shows the ratio of problems solved by common-declare. The results show that wcd increases as observability decreases and uncertainty increases. The GRID domain demonstrates the ability of an agent to improve his privacy by reducing sensor granularity. Comparing POD with FULL, we see an increase of wcd, on average, by about 10%. As for running time, the more specialized compilations perform better when applicable, but the CD compilation solved most of the problems in all settings. The most time-demanding setting for common-declare is POND, due to the increased branching factor, with 50%-330% more time spent, on average, than the second most time-demanding setting.
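To make the POD/POND sensor models of the setup concrete, here is a sketch of how their token maps could be derived from grounded action names (the whitespace-separated name format and the exact set of non-observable actions are our assumptions for illustration):

```python
NO_OBSERVATION = {"load", "unload"}  # e.g., LOG actions made non-observable

def tokens(action, variant):
    """Possible observation tokens for a grounded action such as
    'drive truck1 depot1 depot2', under one of the setup's sensor models."""
    parts = action.split()
    kind, first_arg = parts[0], parts[1]
    if kind in NO_OBSERVATION:
        return {"o_null"}            # NO: the action is never observed
    if variant == "POD-Ac":
        return {kind}                # emits its type, e.g. 'drive'
    if variant == "POD-Obj":
        return {first_arg}           # emits its first argument, e.g. 'truck1'
    if variant == "POND":
        return {kind, first_arg}     # may non-deterministically emit either
    return {action}                  # FULL: the action itself is observed
```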
## 6 Related Work

The first to establish the connection between the closely related fields of automated planning and goal recognition were Ramirez and Geffner (2009), who presented a compilation of plan recognition problems into classical planning problems. Several works on plan recognition followed this approach [Agotnes, 2010; Pattison and Long, 2011; Ramirez and Geffner, 2010; 2011], using various automated planning techniques. We follow this approach as well and introduce a novel compilation into classical planning for finding wcd with non-deterministic sensor models.

Partial observability in goal recognition has been modeled in various ways [Ramirez and Geffner, 2011; Geib and Goldman, 2005; Avrahami-Zilberbrand et al., 2005]. In particular, observability can be modeled using a sensor model that includes an observation token for each action [Geffner and Bonet, 2013]. The grd-at model covers all these aspects and more. In particular, we present a sensor model in which the set of observation tokens $O$ includes the empty token $o_\emptyset$ and $A$ includes a no-cost action $a_{idle}$ by which an agent remains at his current position.

Goal recognition design was first introduced by Keren et al. (2014). This work, followed by several extensions [Keren et al., 2015; Son et al., 2016], offered tools to analyze and solve a grd model in fully observable settings. Another work [Keren et al., 2016] presented a model that accounts for non-observable actions. Our work extends the system model by accounting for non-deterministic sensor models that can reflect any arbitrary assignment of observation tokens emitted by actions. In addition, this work takes the point of view of an agent, demonstrating ways to affect the ability of a system to detect an agent's goal early on.

Distributed privacy-preserving multi-agent planning is a recent topic of interest [Bonisoli et al., 2014; Torreño et al., 2014; Luis and Borrajo, 2014; Brafman, 2015]. Agents supply a public interface and, through a distributed planning process, find a plan without sharing a complete model of their actions and local state with other agents. In our setting, an agent works alone within an environment and against an observer, and tries to keep only his goal hidden for as long as possible.

## 7 Conclusions

The paper proposes a method for evaluating partially observable models as a means to understanding the system's ability to identify an agent's goal. We extend the definition of the wcd measure and propose ways to calculate it. We also present methods an agent may take to preserve his privacy: modeling the environment, computing its wcd, identifying a path that maximizes wcd, and changing sensor granularity to improve privacy. Our empirical evaluation supports the feasibility of our approach to find the wcd of such rich sensor models. In future work, we will test our model on additional (real-world) domains and explore the rich set of possible sensor model modifications for changing wcd.

Acknowledgements The work was carried out in and partially supported by the Technion-Microsoft Electronic Commerce research center. The authors would like to thank Guy Shani for useful comments.

## References

[Agotnes, 2010] T. Agotnes. Domain independent goal recognition. In STAIRS 2010: Proceedings of the Fifth Starting AI Researchers' Symposium, volume 222, page 238. IOS Press, 2010.

[Avrahami-Zilberbrand et al., 2005] Dorit Avrahami-Zilberbrand, Gal Kaminka, and Hila Zarosim.
Fast and complete symbolic plan recognition: Allowing for duration, interleaved execution, and lossy observations. In Proc. of the AAAI Workshop on Modeling Others from Observations (MOO), 2005.

[Bonet and Geffner, 2014] Blai Bonet and Hector Geffner. Belief tracking for planning with sensing: Width, complexity and approximations. Journal of Artificial Intelligence Research, pages 923-970, 2014.

[Bonisoli et al., 2014] Andrea Bonisoli, Alfonso Gerevini, Alessandro Saetti, and Ivan Serina. A privacy-preserving model for the multi-agent propositional planning problem. In Proceedings of the ICAPS 2014 Workshop on Distributed and Multi-Agent Planning, 2014.

[Brafman, 2015] Ronen I. Brafman. A privacy preserving algorithm for multi-agent planning and search. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), Buenos Aires, Argentina, July 25-31, 2015, pages 1530-1536, 2015.

[Fikes and Nilsson, 1972] R. E. Fikes and N. J. Nilsson. STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2(3):189-208, 1972.

[Geffner and Bonet, 2013] Hector Geffner and Blai Bonet. A concise introduction to models and methods for automated planning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 8(1):1-141, 2013.

[Geib and Goldman, 2005] Christopher W. Geib and Robert P. Goldman. Partial observability and probabilistic plan/goal recognition. In Proceedings of the International Workshop on Modeling Other Agents from Observations (MOO-05), 2005.

[Hatuka and Toch, 2016] Tali Hatuka and Eran Toch. Being visible in public space: The normalisation of asymmetrical visibility. Urban Studies, page 0042098015624384, 2016.

[Helmert and Domshlak, 2009] Malte Helmert and Carmel Domshlak. Landmarks, critical paths and abstractions: What's the difference anyway? In Alfonso Gerevini, Adele E. Howe, Amedeo Cesta, and Ioannis Refanidis, editors, Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS 2009), Thessaloniki, Greece, September 19-23, 2009. AAAI, 2009.

[Helmert, 2006] Malte Helmert. The Fast Downward planning system. Journal of Artificial Intelligence Research (JAIR), 26:191-246, 2006.

[Keren et al., 2014] Sarah Keren, Avigdor Gal, and Erez Karpas. Goal recognition design. In ICAPS Conference Proceedings, June 2014.

[Keren et al., 2015] Sarah Keren, Avigdor Gal, and Erez Karpas. Goal recognition design for non-optimal agents. In Proceedings of the Conference of the American Association of Artificial Intelligence (AAAI 2015), January 2015.

[Keren et al., 2016] Sarah Keren, Avigdor Gal, and Erez Karpas. Goal recognition design with non-observable actions. In Proceedings of the Conference of the American Association of Artificial Intelligence (AAAI 2016), February 2016.

[Luis and Borrajo, 2014] Nerea Luis and Daniel Borrajo. Plan merging by reuse for multi-agent planning. In Proceedings of the ICAPS 2014 Workshop on Distributed and Multi-Agent Planning, 2014.

[Pattison and Long, 2011] D. Pattison and D. Long. Accurately determining intermediate and terminal plan states using Bayesian goal recognition. In Proceedings of the First Workshop on Goal, Activity and Plan Recognition (GAPRec 2011), page 32, 2011.

[Ramirez and Geffner, 2009] M. Ramirez and H. Geffner. Plan recognition as planning. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI 2009), 2009.

[Ramirez and Geffner, 2010] M. Ramirez and H. Geffner. Probabilistic plan recognition using off-the-shelf classical planners.
In Proceedings of the Conference of the American Association of Artificial Intelligence (AAAI 2010), 2010.

[Ramirez and Geffner, 2011] M. Ramirez and H. Geffner. Goal recognition over POMDPs: Inferring the intention of a POMDP agent. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI 2011), Volume Three, pages 2009-2014. AAAI Press, 2011.

[Son et al., 2016] Tran Cao Son, Orkunt Sabuncu, Christian Schulz-Hanke, Torsten Schaub, and William Yeoh. Solving goal recognition design using ASP. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2016.

[Torreño et al., 2014] Alejandro Torreño, Eva Onaindia, and Oscar Sapena. FMAP: Distributed cooperative multi-agent planning. Applied Intelligence, 41(2):606-626, 2014.