Zero-Knowledge Proofs for Classical Planning Problems

Augusto B. Corrêa, Clemens Büchner, Remo Christen
University of Basel, Switzerland
{augusto.blaascorrea,clemens.buechner,remo.christen}@unibas.ch

Abstract

In classical planning, the aim is to find a sequence of deterministic actions leading from the initial state to a goal state. In this work, we consider the scenario where a party who knows the solution to a planning task, called the prover, wants to convince a second party, the verifier, that it has the solution without revealing any information about the solution itself. This is relevant in domains where privacy is important, for example when plans contain sensitive information or when the solution should not be revealed upfront. We achieve this by introducing a zero-knowledge protocol for plan existence. By restricting ourselves to tasks with polynomially-bounded plan length, we are able to construct a protocol that can be run efficiently by both prover and verifier. The resulting protocol does not rely on any reduction, has a constant number of rounds, and runs in time polynomial in the size of the task.

Introduction

Alice needs to solve a bunch of real-world problems but is not sure how to deal with them. She remembers her neighbor Bob talking about similar problems and asks if he has any advice. Upon hearing Bob's praise of classical planning, Alice excitedly models all her problems as classical planning tasks. With the models in hand, only the solutions are missing. Conveniently, Bob is the author of the famous Fast Bobward planning system (cf. Hoffmann and Nebel 2001; Helmert 2006), which is allegedly very fast, and he agrees to solve Alice's problems. Since Bob does classical planning for a living, he expects to be compensated. Alice is happy to pay her neighbor, but wants to be sure that Bob can actually deliver what he promises. Bob assures Alice that he solved the tasks already, but refuses to show her the solutions upfront. The two are left with two questions: How can Alice be sure that Bob really solved her tasks? And how can Bob convince Alice without revealing anything about the solutions?

In this paper, we help Alice and Bob. We show a zero-knowledge protocol (Goldwasser, Micali, and Rackoff 1985) for proving plan existence. The protocol is interactive and probabilistic. It is interactive because a prover (Bob) exchanges messages with a verifier (Alice) to prove that it indeed knows a plan for the task; it is probabilistic because the verifier is capable of generating random bits, and because a successful outcome of the protocol only gives a high probability that the prover is not fooling the verifier. Yet, the protocol guarantees that the verifier does not learn anything besides the fact that there exists a plan for the task.

We are interested in a protocol that can be executed efficiently. We want the prover to be able to convince the verifier using few messages, while also limiting both to polynomial-time computations. This limits the scope of our protocol: classical planning is PSPACE-complete (Bylander 1994) and, while every language in PSPACE has a zero-knowledge proof (Ben-Or et al. 1988; Lund et al. 1992; Shamir 1992), the only known protocols for a PSPACE-complete language are the ones by Shamir (1992) and by Shen (1992) for true quantified Boolean formulas (QBF).
Unfortunately, these protocols do not have the desired efficiency, so reducing a planning task to a QBF would not meet our criteria. It is not known whether languages that are not in NP have efficient zero-knowledge protocols. Thus, our protocol is limited to tasks with polynomially-bounded plan lengths, an NP-complete fragment of classical planning.

Our protocol can be applied to areas where privacy is a concern. This is also the motivation for privacy-preserving multi-agent planning (Brafman 2015; Torreño et al. 2017), but the use-cases for each method are contrasting: privacy-preserving planners deal mostly with problems where actions and states should be kept secret; our protocol deals with problems where the solution should be kept secret. Moreover, the two approaches have different computational limitations. Privacy-preserving planners cannot be complete, strongly privacy-preserving, and efficient (for a particular notion of efficiency) at the same time (Tožička, Štolba, and Komenda 2017), while our protocol can only efficiently handle tasks in NP for which the prover already has a plan.

The idea of our protocol is to break up a plan into several transitions. By making transitions indistinguishable from each other, the prover guarantees that each individual transition does not reveal anything about the complete plan. At every execution of the protocol, the verifier checks a single transition to see if it is valid. The verifier then repeats the protocol until it is confident that the complete plan is valid. The presented protocol has a constant number of rounds (i.e., message exchanges) and only requires polynomial-time computation from the prover and the verifier.

Background

Classical Planning

We use propositional STRIPS (Fikes and Nilsson 1971) extended with negative preconditions. A planning task (or simply task) Π = ⟨V, A, I, G⟩ has the following components. The set V is a finite set of propositional variables. For any v ∈ V we say that v itself and ¬v are its literals; v is called a positive literal and ¬v is called a negative literal. Given a literal l, we write |l| to denote the variable associated with l. A state s ⊆ V is a subset of the variables denoting which variables are true in that state, i.e., v ↦ ⊤ for all v ∈ s and v ↦ ⊥ otherwise. A condition C is a set of literals such that there are no distinct l, l′ ∈ C where |l| = |l′|. Let C⁺ and C⁻ be the sets of all positive and negative literals in C, respectively. A state s satisfies a condition C (written s ⊨ C) if |l| ∈ s for all l ∈ C⁺ and |l| ∉ s for all l ∈ C⁻. We denote the variables mentioned in condition C as vars(C).

The set A is a finite set of actions a, each with a precondition pre(a) and an effect eff(a). Both precondition and effect are conditions and, without loss of generality, we assume that there is no literal l such that l ∈ pre(a) and l ∈ eff(a) for any a ∈ A. Finally, I is a state called the initial state, and G is a condition called the goal.

Action a ∈ A is applicable in state s if s ⊨ pre(a). If a is applicable in s, its application results in the successor state s⟦a⟧ = (s \ eff(a)⁻) ∪ eff(a)⁺ (identifying the literals in eff(a) with their variables). If a is not applicable in s, s⟦a⟧ is undefined. The notions of applicability and successor states are extended to finite sequences of actions π = ⟨a_1, ..., a_n⟩ in the natural way. We say that an action sequence π has a transition (s_{i-1}, a_i, s_i) iff a_i is in π and s_{i-1}⟦a_i⟧ = s_i, where s_{i-1} and s_i are states and 1 ≤ i ≤ n. A goal state of Π is a state s such that s ⊨ G.
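To make the STRIPS semantics above concrete, here is a minimal Python sketch of states, conditions, actions, applicability, and the successor function. The encoding and all names (Condition, Action, satisfies, successor) are our own illustrative choices, not part of the paper.

```python
from dataclasses import dataclass

# A state is a frozenset of variable names (exactly the variables that are true).
# A condition is a pair of sets: the positive and the negative literals.

@dataclass(frozen=True)
class Condition:
    pos: frozenset = frozenset()   # variables that must be true
    neg: frozenset = frozenset()   # variables that must be false

@dataclass(frozen=True)
class Action:
    name: str
    pre: Condition
    eff: Condition

def satisfies(state: frozenset, cond: Condition) -> bool:
    """s |= C iff all positive literals hold and no negative literal is violated."""
    return cond.pos <= state and not (cond.neg & state)

def successor(state: frozenset, action: Action):
    """Return s[[a]] = (s \\ eff(a)-) | eff(a)+, or None if a is not applicable."""
    if not satisfies(state, action.pre):
        return None
    return (state - action.eff.neg) | action.eff.pos

# Tiny example: move a package from A to B.
move = Action("move-A-B",
              Condition(pos=frozenset({"at-A"})),
              Condition(pos=frozenset({"at-B"}), neg=frozenset({"at-A"})))
s0 = frozenset({"at-A"})
print(successor(s0, move))   # frozenset({'at-B'})
```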
A plan for a state s of Π is an action sequence π that takes s to a goal state: s⟦π⟧ ⊨ G. The length of a plan π corresponds to the number of actions in π and is denoted by |π|. A plan for I is also called a plan for Π. A task Π is solvable if there exists a plan for Π.

In this paper, we are interested in the language of all planning tasks that have plans with length bounded by some natural number k. We call this language BOUNDEDPLANEX.

Definition 1 (BOUNDEDPLANEX). Given a planning task Π and k ∈ ℕ, the pair ⟨Π, k⟩ is in BOUNDEDPLANEX iff there exists a plan π for Π with |π| ≤ k.

Deciding if ⟨Π, k⟩ ∈ BOUNDEDPLANEX is PSPACE-complete in general (Bylander 1994). In this work, however, we are primarily interested in planning tasks that have plans of polynomial length in the representation size of the task. The representation size ‖Π‖ of a planning task is the length of a reasonable encoding of Π. This fragment of classical planning is NP-complete: nondeterministically guess a polynomially-long action sequence and check sequentially whether it is a plan for Π. Let F be a family of tasks and q be a positive polynomial such that for every task Π in F, there exists a plan π for Π such that |π| ≤ q(‖Π‖). Then there exists a constant c ∈ ℕ such that |π| ≤ q(‖Π‖) ∈ O(‖Π‖^c) for every task Π in F. Thus, we say that we are interested in families of tasks with plans bounded by O(‖Π‖^c) for some constant c.¹

¹ Note that c and q depend on the family of tasks F.

Computational Complexity

We assume familiarity with the classes P, NP, and PSPACE, polynomial reductions, and the concepts of hardness and completeness. We explain the concepts of interactive proofs and the class IP (Babai 1985; Goldwasser, Micali, and Rackoff 1985) next. Due to space limitations, we refer to the seminal textbook by Arora and Barak (2009) for a more comprehensive introduction.

Intuitively, an interactive proof for a language L and some common input x is a protocol between a prover P, who has unlimited computational power, and a probabilistic polynomial-time verifier V. Prover P claims to have a proof (sometimes called a witness or certificate) that x ∈ L and wants to convince V of that. P and V then perform a sequence of message exchanges, called rounds, so P can try to convince V. The verifier, however, does not trust P: it might be that P is a dishonest prover that does not have a proof for x ∈ L and is only trying to cheat the verifier.

Let ⟨V, P⟩(x) be the output of the interaction between V and P on a common input x. We write ⟨V, P⟩(x) = 1 if the verifier V accepts the protocol, i.e., if V was convinced by P that x ∈ L. A language L is in IP if there exists a protocol with a polynomial number of rounds satisfying two conditions with respect to the common input x:

x ∈ L ⟹ ∃P : Pr[⟨V, P⟩(x) = 1] ≥ 2/3    (1)
x ∉ L ⟹ ∀P : Pr[⟨V, P⟩(x) = 1] ≤ 1/3    (2)

where Pr[⟨V, P⟩(x) = 1] denotes the probability that V accepts x after interacting with P. In words, if x ∈ L then P can convince V to accept x with high probability, and if x ∉ L then P can only make V falsely accept x with low probability.² Condition (1) is called completeness, and (2) is called soundness. Given a language L, if there exists a probabilistic polynomial-time verifier V and a protocol satisfying (1) and (2) within a polynomial number of rounds, then L ∈ IP.

In more practical settings, there exists an additional efficiency requirement: P is allowed to use a lot of resources to find a proof for x ∈ L but has limited resources to prove it to V.
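Relating the NP-membership argument above back to planning: the polynomial-time certificate check is simply plan validation. The sketch below, reusing successor and satisfies from the earlier sketch, is the check a verifier could run if it were handed the plan directly; the point of the protocol in this paper is to achieve the same confidence without ever seeing π. The function name is_plan and the tuple encoding of the task are our own assumptions.

```python
def is_plan(task, pi, k):
    """Certificate check for <Pi, k> in BOUNDEDPLANEX: the sequence pi has
    length at most k, is applicable starting from I, and ends in a goal state.
    Here task is the tuple (V, A, I, G) from the definition above."""
    V, A, I, G = task
    if len(pi) > k:
        return False
    s = I
    for a in pi:                 # check every transition sequentially
        s = successor(s, a)      # from the earlier sketch
        if s is None:            # action not applicable: not a plan
            return False
    return satisfies(s, G)       # does the final state satisfy the goal?
```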
This efficiency requirement models the idea that we do not want the verification of the solution to be as expensive as solving the problem from scratch. Although IP = PSPACE (Lund et al. 1992; Shamir 1992), it is unclear whether PSPACE-complete languages have interactive proofs with this efficiency property. While languages in NP have very efficient protocols, to the best of our knowledge, the only known protocols for languages not in NP (Lund et al. 1992; Shamir 1992; Shen 1992) rely on very powerful provers.

² Constants 2/3 and 1/3 can be strengthened or weakened without changing the power of IP. See Arora and Barak (2009) for details.

Zero-Knowledge Proofs

An interactive proof is called a zero-knowledge proof if it does not provide a single additional bit of information to the verifier, besides whether x ∈ L or not (Goldwasser, Micali, and Rackoff 1985). Goldreich, Micali, and Wigderson (1986) proved that any decision problem in NP has a zero-knowledge proof, under the assumption that one-way functions exist. Later, Ben-Or et al. (1988) showed that, under the same assumption, every language in IP has a zero-knowledge proof.

Formally, zero-knowledge is defined under the simulation paradigm (Goldreich 2001): a protocol is zero-knowledge if the verifier can simulate the interaction with the prover on its own, without the prover. More formally, there should exist a polynomial-time probabilistic simulator M whose output distribution is computationally indistinguishable from the output distribution of the interaction between verifier V and prover P, denoted ⟨V, P⟩(x). In other words, there is no efficient algorithm that can distinguish between ⟨V, P⟩(x) and the output of M on input x, except with negligible probability.³ This should also hold for any arbitrary strategy that verifier V uses, even if it does not follow the protocol.

Definition 2 (Zero-Knowledge Proofs). Given a protocol between a verifier V and a prover P with common input x, the protocol is said to be zero-knowledge for some language L iff for every probabilistic polynomial-time verifier strategy V∗, there exists a probabilistic polynomial-time simulator M such that, when x ∈ L, the output distribution of ⟨V∗, P⟩(x) and the output distribution of M on x are computationally indistinguishable.

Definition 2 asserts that if V∗ could have learned anything from the protocol, then it could have also learned it by running the simulator M on x. In simpler terms, if there is an algorithm M that can imitate the interaction ⟨V, P⟩(x), the verifier V could simply execute this algorithm instead of interacting directly with P. The definition also quantifies over all possible verifier strategies to guarantee that the protocol is zero-knowledge even when the verifier deviates from the established protocol (i.e., it tries to cheat the prover). The common way of proving that a protocol is indeed zero-knowledge is by showing that such a simulator M exists (e.g., Goldwasser, Micali, and Rackoff 1985; Blum 1986; Impagliazzo and Yung 1987; Goldreich 2001). It is important to stress that the simulator must only be correct when x ∈ L, while there are no requirements when x ∉ L. This models the idea that the verifier learns nothing about the proof that x ∈ L, and in the case where x ∉ L there is nothing to be learned.

Commitment Schemes

An important ingredient in zero-knowledge protocols is the concept of commitment schemes (Goldreich 2001). Using a commitment scheme, the prover P can commit to a message while keeping it hidden from the verifier V.
This commitment can later be opened whenever P wants to reveal the content of the message to V. Computationally, a commitment scheme is a function that produces a commitment string Commit(x) given an object x and some key K_x ∈ {0, 1}^n. The commitment Commit(x) can be opened using the same key K_x. Commitment schemes have two important properties: unambiguity and secrecy. A scheme is unambiguous if using two different keys to commit the same object always produces two different commitments; it is secret if, given two objects x and y, their commitments Commit(x) and Commit(y) are computationally indistinguishable.

³ A function f(x) is called negligible iff |f(x)| ≤ 1/g(x) for all positive polynomials g(x) and sufficiently large x. In our case, we consider two distributions to be computationally indistinguishable if the statistical distance between them is negligible.

We write Commit(x) to represent the commitment of an object x. We say that the prover P opens or reveals x to V to indicate that P gives V the key K_x to open x. With some abuse of notation, we write Commit(L) to indicate the element-wise commitment of some sequence or set L. Commitment schemes can be seen as functions to encrypt objects such that their original value is only revealed on demand. In our work, we assume that the function Commit(·) is unambiguous and secret. Our results hold for any function satisfying these criteria. For more details on commitment schemes and concrete examples, we refer to the canonical book by Goldreich (2001).

Plan Existence Protocol

We now introduce our protocol, ZK-BOUNDEDPLANEX. With ZK-BOUNDEDPLANEX, a prover P who claims to know a plan π for Π, with |π| ≤ k ∈ O(‖Π‖^c) for some constant c ∈ ℕ, can convince a verifier V that ⟨Π, k⟩ ∈ BOUNDEDPLANEX in a polynomial number of rounds, while not disclosing any knowledge about π. Our protocol is inspired by the zero-knowledge protocol for Hamiltonian cycles by Blum (1986), and by the zero-knowledge simulation of computation by Impagliazzo and Yung (1987).

Let us give the high-level idea first. To start, the prover P transforms the planning task Π into a new task Π̂ where variables are permuted and actions are made indistinguishable. The trick here is that Π̂ still preserves all plans of Π, but actions and variables of Π̂ cannot be easily identified. Next, P sends Π̂ together with a commitment of π̂, an obfuscated version of the original plan π, to V. The prover then poses two possible options to the verifier: V can either choose to check the mapping from Π to Π̂ and confirm that the transformation is valid, or V can check a part of the plan π̂ on its own. The verifier V randomly chooses one of the two options. If V decides to check the transformation, P provides the transformation function to V, and V simply compares Π to Π̂, accepting if both are equivalent (otherwise, it rejects); if V decides to check the plan itself, it selects one random transition of π̂, P reveals this transition to V, and V computes locally whether this transition is valid or not. If it is valid, it accepts; otherwise it rejects. If π̂ is a valid plan for Π̂, then all transitions in the plan are valid. Otherwise, at least one transition is invalid (either not applicable or leading to a different successor state than the one committed by P) and V has a chance of 1/k = 1/q(‖Π̂‖) of spotting it, where q(‖Π̂‖) ∈ O(‖Π̂‖^c) is a positive polynomial as previously defined. There are some other details that V also needs to take care of.
For example, it needs to verify that π̂ starts at the initial state and ends at a goal state. We detail all these steps next.

P claims to have a plan π for Π, but V cannot be sure that this π is indeed a plan: it could be some random sequence of actions if P is a dishonest prover. We thus say that π is a valid plan if it is indeed a plan, and we say that π is an invalid plan if it is just some sequence of actions that does not lead from the initial state to some goal state, either because it is not applicable in the initial state, because it does not lead to the goal, or because a single transition is invalid. Similarly, we extend this notion to transitions: a transition (s_{i-1}, a_i, s_i) is invalid if s_{i-1}⟦a_i⟧ is undefined or s_{i-1}⟦a_i⟧ ≠ s_i.

ZK-BOUNDEDPLANEX: Step-by-Step

We formalize each step of our protocol next. A concrete example is available as a technical report (Corrêa, Büchner, and Christen 2022).

Step 0. P and V have as common input ⟨Π, k⟩, where Π = ⟨V, A, I, G⟩ and k ∈ O(‖Π‖^c) for some constant c ∈ ℕ. The prover P claims to know a plan π for Π with |π| ≤ k.

Step 1. P transforms Π into some new task Π̂ so that it is computationally hard for V to correctly identify actions and variables. To do this, P transforms the task in five different phases: (a) add a dummy action to the task that can be applied in any state without changing it; (b) and (c) make actions indistinguishable by padding preconditions and effects until they all have the same structure; (d) permute truth values of variables to hide their identity; and (e) define specific initial and goal states to allow deterministically checking that the plan starts from the expected initial state and ends in a valid goal state. We detail each phase next. Starting from Π, each of the following phases creates a new planning task that is a modification of the previous one. We write Π_a for the task created in phase (a), Π_b for the task created in phase (b), and so on.

(a) P introduces a dummy action a_dummy, where pre(a_dummy) = eff(a_dummy) = ∅, and then creates the planning task Π_a = ⟨V, A_a, I, G⟩, where A_a = A ∪ {a_dummy}. P then uses the action a_dummy to pad any plan π with |π| < k until it has length k. This allows our protocol to decide ⟨Π, k⟩ ∈ BOUNDEDPLANEX for tasks with plans strictly shorter than k while not revealing the length of the plan. The key property of a_dummy is that its application leaves the given state unchanged; its existence therefore does not affect plan existence.

(b) To ensure that all actions look similar, every action must 1) have the same number of mutual variables across precondition and effect, and 2) have the same number of literals in its precondition (resp. effect). We delay the second requirement to phase (c). More formally, let m(a) = |vars(pre(a)) ∩ vars(eff(a))| denote the number of mutual variables in the precondition and effect of a given action a. Our goal in this phase of the transformation is to obtain a set of actions A_b such that for every pair of actions a_1, a_2 ∈ A_b it holds that m(a_1) = m(a_2). To achieve this, let m = max_{a ∈ A_a} m(a). For every action a ∈ A_a, introduce n = m − m(a) new variables m^a_1, ..., m^a_n, and two new actions a′ and a″ such that pre(a′) = pre(a) ∪ {¬m^a_i | 1 ≤ i ≤ n}, eff(a′) = eff(a) ∪ {m^a_i | 1 ≤ i ≤ n}, pre(a″) = pre(a) ∪ {m^a_i | 1 ≤ i ≤ n}, and eff(a″) = eff(a) ∪ {¬m^a_i | 1 ≤ i ≤ n}. These actions change all newly introduced variables in unison, which is also the reason why P needs to duplicate the number of actions: one copy (a′) to simulate a in case they are all ⊥, and one copy (a″) to simulate a in case they are all ⊤.
Note that no other action affects the variables m^a_i and consequently all m^a_i always have the same value, ⊤ or ⊥. The initial state I remains unchanged because P considers all newly introduced variables to be ⊥ initially, as they are not in I. The goal G also remains unchanged because the newly introduced variables do not influence the reachability of G. In summary, this first transformation leads to Π_b = ⟨V_b, A_b, I, G⟩ where

V_b = V ∪ ⋃_{a ∈ A_a} {m^a_i | 1 ≤ i ≤ m − m(a)}  and  A_b = {a′ | a ∈ A_a} ∪ {a″ | a ∈ A_a}.

This phase can be done in O(|V||A|) steps and increases the representation size of the task by O(|V||A|).

(c) We continue by padding the action preconditions and effects so that all actions have the same number of preconditions and the same number of effects. For a ∈ A_b, let p(a) = |pre(a)| and e(a) = |eff(a)|. Consider Π_b as described above and let p = max_{a ∈ A_b} p(a) and e = max_{a ∈ A_b} e(a). For every action a ∈ A_b, introduce m = p − p(a) new variables p^a_1, ..., p^a_m and n = e − e(a) new variables e^a_1, ..., e^a_n. Furthermore, define a_c such that pre(a_c) = pre(a) ∪ {¬p^a_i | 1 ≤ i ≤ m} and eff(a_c) = eff(a) ∪ {¬e^a_i | 1 ≤ i ≤ n}. In doing so, all newly introduced variables only occur as negative literals after this transformation of the planning task. By leaving them out of the initial state as well, P ensures that all actions remain applicable and that all effects can only make these variables false, so they never change their value. The goal also remains unchanged, so the newly introduced variables have no influence on the solvability of the task. In summary, this transformation leads to Π_c = ⟨V_c, A_c, I, G⟩ where

V_c = V_b ∪ ⋃_{a ∈ A_b} {p^a_i | 1 ≤ i ≤ p − p(a)} ∪ ⋃_{a ∈ A_b} {e^a_i | 1 ≤ i ≤ e − e(a)}  and  A_c = {a_c | a ∈ A_b}.

This phase also takes O(|V||A|) steps and increases the task representation by the same amount.

(d) To hide the identity of the original Π in its transformed version Π_c, P creates a permuted version of Π_c using a uniformly chosen function ρ that randomly permutes variables and their truth values (i.e., positive to negative and vice versa) at the initial state I. The truth value of each variable is also permuted consistently in every place of occurrence (e.g., goal and actions). With some abuse of notation, we write ρ(V_c), ρ(A_c), ρ(I), and ρ(G) to denote the permuted versions of V_c, A_c, I, and G according to ρ. Let Π_d = ⟨V_d, A_d, I_d, G_d⟩ where V_d = ρ(V_c), A_d = ρ(A_c), I_d = ρ(I), and G_d = ρ(G). This phase does not influence the size of the task representation but takes time O(|V_c|), which is polynomial in |V|, as discussed in the previous transformation.

(e) In this last transformation, the task is changed to have a specific initial state and a unique goal state. To achieve this, P introduces two new variables v_I and v_G, and two new actions a_I and a_G, where pre(a_I) = {v_I} ∪ {¬v | v ∈ V_d}, eff(a_I) = {¬v_I} ∪ I_d, pre(a_G) = {¬v_I} ∪ G_d, and eff(a_G) = {v_G} ∪ {¬v | v ∈ V_d}. P changes the initial state to I_e = {v_I} and the goal to G_e = {v_G} ∪ {¬v | v ∈ V_d}. To ensure that only a_I is applicable in I_e, P adds ¬v_I to the preconditions of all actions and therefore defines a_e for all a ∈ A_d where pre(a_e) = pre(a) ∪ {¬v_I} and eff(a_e) = eff(a). In summary, this last transformation leads to Π_e = ⟨V_e, A_e, I_e, G_e⟩ where V_e = V_d ∪ {v_I, v_G} and A_e = {a_e | a ∈ A_d} ∪ {a_I, a_G}. This transformation can be done in polynomial time, too.

Let Π̂ = ⟨V̂, Â, Î, Ĝ⟩ where V̂ = V_e, Â = A_e, Î = I_e, and Ĝ = G_e. The task Π̂ is our transformed task and it will be used throughout the rest of the protocol. The overall transformation from Π to Π̂ can be done in time polynomial in ‖Π‖, and Π̂ is only polynomially larger than Π.
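To make the transformation more concrete, below is a simplified sketch of phases (a) and (d), building on the Condition/Action sketch from the background section. The helper names (phase_a_add_dummy, phase_d_permute, map_condition, map_state) are ours, and phases (b), (c), and (e) are omitted; the sketch only illustrates the dummy padding and the uniformly random renaming with polarity flips, not the full construction from the paper.

```python
import random

def phase_a_add_dummy(V, A, I, G, plan, k):
    """Phase (a): add a no-op dummy action and pad the plan with it up to
    length k, so that the plan length itself reveals nothing."""
    dummy = Action("dummy", Condition(), Condition())
    padded = list(plan) + [dummy] * (k - len(plan))
    return V, A | {dummy}, I, G, padded

def phase_d_permute(V, A, I, G, rng):
    """Phase (d): rename every variable and flip truth values uniformly at
    random, consistently in the initial state, the goal, and all actions."""
    order = rng.sample(sorted(V), len(V))             # random renaming of variables
    names = {v: f"x{i}" for i, v in enumerate(order)}
    flip = {v: rng.random() < 0.5 for v in V}         # flipped variables swap polarity

    def map_condition(cond):
        pos = {names[v] for v in cond.pos if not flip[v]} | {names[v] for v in cond.neg if flip[v]}
        neg = {names[v] for v in cond.neg if not flip[v]} | {names[v] for v in cond.pos if flip[v]}
        return Condition(frozenset(pos), frozenset(neg))

    def map_state(s):
        # a flipped variable is true in the new task iff it was false before
        return frozenset(names[v] for v in V if (v in s) != flip[v])

    new_A = frozenset(Action(f"a{i}", map_condition(a.pre), map_condition(a.eff))
                      for i, a in enumerate(A))
    return frozenset(names.values()), new_A, map_state(I), map_condition(G)

# Example usage with a seeded generator for reproducibility:
# V2, A2, I2, G2 = phase_d_permute(V, A, I, G, random.Random(0))
```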
In the rest of the paper, we say that σ is a transformation function such that σ(Π) = Π̂. Note that σ can be defined simply by describing each step taken by P, which takes polynomial time and space. P transforms the plan π for Π of length |π| ≤ k into a plan π̂ = ⟨a_1, ..., a_{k+2}⟩ for Π̂. This transformation is fairly straightforward so, due to space limits, we do not discuss it here. However, we note that the new initial and goal states in Π̂ imply that a_1 = a_I and a_{k+2} = a_G. As these two actions were not in the original task Π, π̂ has two steps more than π. To simplify notation, let ℓ = k + 2 and thus π̂ = ⟨a_1, ..., a_ℓ⟩.

Step 2. P creates the sequence S of states: S = ⟨s_0, ..., s_ℓ⟩, where s_0 and s_ℓ correspond to our unique initial and goal states, respectively, and s_{i-1}⟦a_i⟧ = s_i for all 1 ≤ i ≤ ℓ. In words, sequence S corresponds to the states visited by π̂, if π̂ is indeed a valid plan. Next, P commits the task Π̂ and each element in π̂ and S individually using some arbitrarily chosen commitment scheme. Recall that we write Commit(x) to represent the encrypted commitment of some object x, and Commit(L) represents the element-wise commitment of some sequence L. With some abuse of notation, we define Commit(Π̂) = ⟨Commit(V̂), Commit(Â), Commit(Î), Commit(Ĝ)⟩ where V̂, Â, Î, and Ĝ are each also committed element-wise. Finally, P sends Commit(Π̂), Commit(π̂), and Commit(S) to V.

Step 3. If |Commit(π̂)| > ℓ, the verifier rejects the protocol; otherwise it picks a random bit b ∈ {0, 1} and sends it to P. If b = 0, P reveals the function σ to V together with all keys to open Commit(Π̂). V opens Commit(Π̂) to obtain Π̂ and checks if σ(Π) = Π̂. If this is the case, V accepts the protocol; otherwise it rejects. If b = 1, P reveals s_0 and s_ℓ from Commit(S) = ⟨Commit(s_0), ..., Commit(s_ℓ)⟩ to V.

Step 4. V verifies that s_0 = {v_I}, the expected initial state, and s_ℓ = {v_G}, the expected goal state. If either of these comparisons fails, V rejects the protocol. Otherwise, it uniformly chooses an integer m ∈ {1, ..., ℓ} by flipping log ℓ fair coins and sends m to P. The value of m corresponds to the m-th transition of π̂ that V wants to verify on its own.

Step 5. The prover reveals s_{m-1} and s_m from Commit(S) and a_m from Commit(π̂). It also reveals V̂ and a_m from Commit(Â).⁴ Finally, V compares the action a_m in π̂ with the action obtained from Â, which should be the same, and verifies that all variables used in s_{m-1}, a_m, and s_m are indeed in V̂. If any of these checks fail, V rejects the protocol. Last, V checks locally if s_{m-1}⟦a_m⟧ = s_m. If this also passes, V accepts the protocol. Otherwise, V rejects it.

⁴ Note that P reveals two copies of a_m: one from the transition and one from the set of actions Â. This is done so the verifier can be sure that the action used in the transition is indeed part of the task Π̂.

All computations of the protocol can be done in polynomial time by the prover and the verifier. More explicitly, the verifier needs to pick log ℓ random bits, transform the task and compare the resulting task, compare two states to their expected values, and check a transition locally. This can all easily be done in polynomial time by a probabilistic algorithm. In fact, all the computations performed by the prover P are also polynomial. This means that any prover that simply happens to have the solution (e.g., because it obtained this information from a third party) could participate in the protocol.
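To make Steps 2 to 5 more tangible, here is a minimal sketch of one possible instantiation of Commit(·), a keyed hash commitment, together with the local check the verifier performs on the revealed transition in Step 5. The protocol itself only assumes an unambiguous and secret commitment scheme, so this instantiation and all helper names (encode, commit, open_commitment, check_transition) are our own illustrative choices; the sketch reuses Condition, Action, and successor from the earlier sketches.

```python
import hashlib
import os

def encode(obj) -> bytes:
    """Canonical byte encoding (sketch): set elements are sorted so that equal
    objects always produce the same bytes."""
    if isinstance(obj, frozenset):
        return b"{" + b",".join(sorted(encode(x) for x in obj)) + b"}"
    if isinstance(obj, Condition):
        return b"C(" + encode(obj.pos) + b";" + encode(obj.neg) + b")"
    if isinstance(obj, Action):
        return b"A(" + obj.name.encode() + b";" + encode(obj.pre) + b";" + encode(obj.eff) + b")"
    return str(obj).encode()

def commit(obj):
    """Commit(x): hash a fresh random key together with the encoding of x.
    The commitment hides x until the key is revealed, and the key ties the
    commitment to this particular object when it is opened."""
    key = os.urandom(32)
    return hashlib.sha256(key + encode(obj)).hexdigest(), key

def open_commitment(commitment: str, key: bytes, obj) -> bool:
    """Verifier side: check that (key, obj) really opens the given commitment."""
    return hashlib.sha256(key + encode(obj)).hexdigest() == commitment

def check_transition(s_prev, a, s_next) -> bool:
    """Step 5, verifier side: the revealed transition (s_{m-1}, a_m, s_m) is
    valid iff a_m is applicable in s_{m-1} and leads exactly to s_m."""
    return successor(s_prev, a) == s_next
```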
Note that a prover with such limited computational power can only participate because the transformations done in Step 1 can also be applied to the plan. The protocol also only needs a constant number of rounds. Together, the constant number of rounds and the polynomial-time steps make ZK-BOUNDEDPLANEX efficient enough to be performed in practice.

Completeness and Soundness

We first analyze the completeness and soundness of our protocol. Throughout the rest of the paper, let x = ⟨Π, k⟩. We rewrite Equations (1) and (2) in terms of our language BOUNDEDPLANEX:

x ∈ BOUNDEDPLANEX ⟹ ∃P : Pr[⟨V, P⟩(x) = 1] ≥ 2/3
x ∉ BOUNDEDPLANEX ⟹ ∀P : Pr[⟨V, P⟩(x) = 1] ≤ 1/3

where the first implication refers to the protocol's completeness and the second to the protocol's soundness.

We discuss completeness first. The verifier can only reject a valid input ⟨Π, k⟩ in one of the following cases: (1) P provides an invalid transformation function σ (Step 3); (2) π̂ does not start in s_I = {v_I} or does not end in s_G = {v_G} (Step 4); (3) one of the transitions in π̂ is invalid (Step 5). Under the assumption of an all-powerful prover, any honest prover P that can compute a plan can trivially pass all of these checks. Hence, our protocol has completeness of Pr[⟨V, P⟩(x) = 1] = 1.

For soundness, we denote by P an arbitrary dishonest prover that claims to have a plan for Π but in reality does not. Consider how P can try to trick V into accepting x although x ∉ BOUNDEDPLANEX. First, P can try to trick V by performing a wrong task transformation in Step 1. The verifier V would then reject it with probability 1/2, the probability of b = 0 in Step 3. By repeating the protocol t times, the probability of V accepting when x ∉ BOUNDEDPLANEX decreases to (1 − 1/2)^t, which is smaller than 1/3 when t ≥ 2. Second, P can perform a correct task transformation in Step 1 but provide an invalid plan in Step 2. We write π to denote this invalid plan. Note that the plan π is invalid when there is at least one wrong transition in π. This means that for some transition (s_{i-1}, a_i, s_i), the action a_i is either not applicable in s_{i-1} or does not lead to s_i. The verifier can only spot this mistake if b = 1 in Step 3 and m = i in Step 5. Hence, the probability of V accepting x is at most 1 − 1/(2ℓ). By repeating the protocol 3ℓ times, this probability decreases to (1 − 1/(2ℓ))^{3ℓ} < 1/3, since ℓ ≥ 2 by definition (indeed, (1 − 1/(2ℓ))^{3ℓ} ≤ e^{−3/2} ≈ 0.22). As ℓ = k + 2 and k is polynomial in ‖Π‖, the protocol only needs to be repeated a polynomial number of times.

Note that, in practice, P can optimize parts of the protocol in case it needs to be repeated several times. For example, phases (a) to (c) of Step 1 can be computed only once and cached for future reuse. As Step 1 is the most expensive step of the protocol (although it still runs in polynomial time), this caching can be very beneficial in practice.

Proof of Zero-Knowledge

We now prove that the ZK-BOUNDEDPLANEX protocol is zero-knowledge. We need to prove that there exists a polynomial-time probabilistic simulator M with an output distribution that is computationally indistinguishable from the output distribution of ⟨V∗, P⟩(x), where V∗ is a probabilistic polynomial-time verifier with an arbitrary strategy (i.e., it does not need to follow the protocol). Recall that the zero-knowledge definition is conditioned on cases where the input x belongs to the language, so M only needs to produce an output that is computationally indistinguishable from the output of ⟨V∗, P⟩(x) for the case of an honest prover P. Our simulation relies on the common technique that the simulator M guesses the random choices of V∗ in advance.
In our case, M guesses the values of b and m. If it does not guess correctly, then it simply restarts the simulation (e.g., Blum 1986; Arora and Barak 2009). The trick here is that M only needs to restart the simulation a polynomial number of times, and it only outputs once it guesses b and m correctly. Although it is possible that an exponential (for example) number of restarts is needed, such corner cases occur with negligible probability, so they are not sufficient to distinguish the two distributions. In such cases, M can just abort the process entirely after a certain number of steps.

Next, we explain how M works. We consider that M executes V∗ as a subroutine, so it can send messages to and simulate V∗ independently of its strategy. The simulation works as follows: M begins by randomly guessing b′ ∈ {0, 1} and m′ ∈ {1, ..., ℓ}. It then starts to act as a prover P by generating Π̂ = ⟨V̂, Â, Î, Ĝ⟩ as in Step 1 using some function σ. As previously argued, this transformation takes polynomial time, so M can do it within its time limits. The only difference to Step 1 is that M does not compute a plan π̂ for Π̂. It then produces π̂ and S, but differently from how P would, as M does not know a plan π̂. It starts with the sequence of states S and splits this procedure into two cases based on whether m′ ∈ {1, ℓ} or not.

First, assume m′ ∉ {1, ℓ}. Let a′ be a randomly selected action of Π̂ and let s′ be a state such that s′ ⊨ pre(a′), v_I ∉ s′, v_G ∉ s′, and all other variables in V̂ are set to true or false uniformly at random. Then M defines S = ⟨s_0, ..., s_ℓ⟩ where s_0 = s_I = {v_I}, s_ℓ = s_G = {v_G}, s_{m′−1} = s′, s_{m′} = s′⟦a′⟧, and all other states in S are chosen randomly (by uniformly assigning each variable to true or false in each state). The sequence π̂ is defined as π̂ = ⟨a_1, ..., a_ℓ⟩ where a_{m′} = a′ and all other actions are chosen randomly from Â. Note that the π̂ generated by M is (very) probably not a valid plan, but we use the same notation and terminology for simplicity. When m′ ∈ {1, ℓ}, instead of choosing a′ randomly, M must select a specific action according to the value of m′. If m′ = 1, then a_1 = a_I, and we define s_0 = s_I and s_1 = s_0⟦a_I⟧ in S. If m′ = ℓ, then a_ℓ = a_G, s_ℓ = s_G, and s_{ℓ−1} is defined such that s_{ℓ−1} ⊨ pre(a_G), v_I ∉ s_{ℓ−1}, v_G ∉ s_{ℓ−1}, and all other variables are set to true or false uniformly at random. The rest of the procedure is the same.

After that, M commits π̂, S, and Π̂ and sends them to V∗, as P would do at the end of Step 2. V∗ picks b ∈ {0, 1} and sends it to M. If b = b′, M continues the simulation; otherwise it restarts it. In the case where b = b′ = 0, M can send V∗ the function σ used to transform Π into Π̂ so V∗ can check that σ(Π) = Π̂. This can be trivially done by M, as it created σ on its own, so V∗ will accept the protocol. In this case, the interaction between V∗ and P and the interaction between V∗ and M are indistinguishable. In the case where b = b′ = 1, M continues simulating P (by sending the keys for s_0, s_ℓ, etc.) until V∗ sends m to M. If m ≠ m′, M restarts the simulation. Otherwise, it reveals the m-th transition to V∗, as P would do. Since this transition was constructed in a way that guarantees it to be valid, V∗ will accept the protocol.

The key idea of the simulation is that the only transition a verifier can view is the one it randomly chooses based on m. However, M only knows that the m′-th transition is valid. Therefore, M can only guarantee that V∗ accepts when m = m′. Otherwise, it needs to restart the simulation.
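The rewinding structure of M can be summarized in a few lines of code. The sketch below only shows the control flow: transform_task, fake_commitments, and reveal_transition stand for the constructions described above, and the verifier object with its choose_b/choose_m/check_* methods is an assumed interface for the (possibly cheating) verifier strategy V∗; none of these names come from the paper.

```python
import random

def simulate(task, k, verifier, max_restarts):
    """Skeleton of the simulator M: guess the verifier's random choices
    (b', m') in advance, run one protocol execution in which only the m'-th
    transition is guaranteed to be valid, and restart whenever a guess is wrong."""
    ell = k + 2
    for _ in range(max_restarts):
        b_guess = random.randint(0, 1)            # guess of b
        m_guess = random.randint(1, ell)          # guess of m
        sigma, hat_task = transform_task(task)    # Step 1, exactly as an honest prover
        commitments = fake_commitments(hat_task, ell, m_guess)
        b = verifier.choose_b(commitments)        # run V* as a subroutine
        if b != b_guess:
            continue                              # wrong guess: rewind and restart
        if b == 0:                                # V* checks the transformation
            return verifier.check_transformation(sigma, hat_task)
        m = verifier.choose_m(commitments)        # V* asks for one transition
        if m != m_guess:
            continue                              # wrong guess: rewind and restart
        return verifier.check_transition(reveal_transition(commitments, m))
    return None   # abort with a default output; needed only with negligible probability
```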
Before proving that the protocol ZK-BOUNDEDPLANEX is zero-knowledge, we prove some lemmas.

Lemma 1. Commit(Π̂), Commit(π̂), and Commit(S) computed by M are computationally indistinguishable from Commit(Π̂), Commit(π̂), and Commit(S) computed by a prover P.

Proof. Note that Commit(Π̂) is computed identically by M and P, so the two are trivially indistinguishable. Commit(π̂) and Commit(S) are computed differently (M generates random states and actions, while P uses the plan for that), but under the assumption of a commitment scheme that is hiding and binding, they are computationally indistinguishable when encrypted.

Lemma 2. The probability that b = b′ and m = m′ is at least 1/(2ℓ).

Proof. Given Lemma 1 and the fact that the elements in Commit(S) are also indistinguishable among themselves (analogously to Commit(π̂)), we can assume that a probabilistic polynomial-time V∗ chooses b and m in an oblivious manner (i.e., independently of the content of the encrypted information). For simplicity, we can then assume that it selects b and m even before the protocol with M or P has started. Thus, it follows that Pr[b = b′ and m = m′] = 1/(2ℓ).

Lemma 3. Let (s_{m−1}, a_m, s_m) be the single transition revealed to V∗. Then, (s_{m−1}, a_m, s_m) computed by M is computationally indistinguishable from (s_{m−1}, a_m, s_m) computed by a prover P.

Proof. Recall that the transformation of Step 1 makes actions indistinguishable (except for a_I and a_G) by padding the actions and permuting variables and truth values. Thus, the action a_m revealed to V∗ is indistinguishable from any other action, and so are the states s_{m−1} and s_m. However, (s_{m−1}, a_m, s_m) was computed differently by M and by P. When computed by P, it is a transition occurring in π̂. When computed by M, it is a randomly selected action in Â with randomly generated states s_{m−1} and s_m. But since variables are permuted according to a function chosen uniformly at random by both M and P, the transition (s_{m−1}, a_m, s_m) computed by M and the transition computed by a prover P are computationally indistinguishable from each other as well.

We are now set to prove that ZK-BOUNDEDPLANEX is indeed a zero-knowledge protocol.

Theorem 4. The protocol ZK-BOUNDEDPLANEX is zero-knowledge.

Proof. We want to prove that the output of the interaction between V∗ and M is indistinguishable from the output of the interaction between V∗ and P for any probabilistic polynomial-time verifier strategy V∗, conditioned on cases where x ∈ BOUNDEDPLANEX, and that M runs in probabilistic polynomial time. The fact that the interactions are computationally indistinguishable follows directly from Lemmas 1 and 3, as these are the only messages revealed by the prover to V∗.

It remains to show that M runs in probabilistic polynomial time. As shown in Lemma 2, Pr[b = b′ and m = m′] = 1/(2ℓ), and thus the expected number of restarts M makes until b = b′ and m = m′ is 2ℓ. Therefore, the expected running time of M is 2ℓ · t(|x|), where t(|x|) is the running time of V∗ with input x. There is a very small probability that the algorithm needs more than a polynomial number of restarts. To solve this, M can output some default value after polynomially many restarts; for instance, after 2ℓ · |x| restarts the probability that M has not yet guessed b and m correctly is (1 − 1/(2ℓ))^{2ℓ|x|} ≤ e^{−|x|}. Aborting therefore adds only a negligible statistical distance between the two distributions, as this probability is smaller than 1/q(|x|) for any positive polynomial q and sufficiently large |x|. As ℓ ∈ O(‖Π‖^c) for some constant c, we have that 2ℓ · t(|x|) ∈ O(‖Π‖^c) · t(|x|) and, since t(|x|) is polynomial in ‖Π‖ too, O(‖Π‖^c) · t(|x|) ⊆ O(poly(‖Π‖)).
Conclusion

We introduce ZK-BOUNDEDPLANEX, a zero-knowledge protocol to prove plan existence for tasks with polynomially long plans. At the core of our protocol lies a transformation of the original planning task Π to the new task Π̂, where variables are permuted to hide their identities and actions are made indistinguishable. After the prover commits to Π̂ and plan π̂, the verifier makes a probabilistic choice to either check the task transformation or the transformed plan. In the latter case, the verifier checks a single transition as well as the initial and goal states of π̂. If all checks pass, the verifier accepts the protocol. ZK-BOUNDEDPLANEX runs in a constant number of rounds, with every step being computable in time polynomial in ‖Π‖; the verifier has high confidence that a plan exists after repeating the protocol a polynomial number of times, while the prover is absolutely sure that no knowledge about the plan is revealed.

Acknowledgments

We especially thank André G. Pereira for giving us feedback on earlier versions of this work, and Adi Shamir for pointing us to relevant literature on zero-knowledge proofs. We also thank Simon Dold and Malte Helmert for helpful discussions. This work was funded by the Swiss National Science Foundation (SNSF) as part of the project Certified Correctness and Guaranteed Performance for Domain-Independent Planning (CCGP-Plan). Furthermore, this research was also partially supported by TAILOR, a project funded by the EU Horizon 2020 research and innovation programme under grant agreement no. 952215.

References

Arora, S.; and Barak, B. 2009. Computational Complexity: A Modern Approach. Cambridge University Press.

Babai, L. 1985. Trading Group Theory for Randomness. In Sedgewick, R., ed., Proceedings of the Seventeenth Annual ACM Symposium on Theory of Computing (STOC '85), 421–429. ACM Press.

Ben-Or, M.; Goldreich, O.; Goldwasser, S.; Håstad, J.; Kilian, J.; Micali, S.; and Rogaway, P. 1988. Everything Provable is Provable in Zero-Knowledge. In Goldwasser, S., ed., Proceedings of the Eighth Annual International Cryptology Conference (CRYPTO 1988), volume 403 of Lecture Notes in Computer Science, 37–56. Springer.

Blum, M. 1986. How to Prove a Theorem So No One Else Can Claim It. In Proceedings of the International Congress of Mathematicians (ICM 1986), volume 2, 1444–1451.

Brafman, R. I. 2015. A Privacy Preserving Algorithm for Multi-Agent Planning and Search. In Yang, Q.; and Wooldridge, M., eds., Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI 2015), 1530–1536. AAAI Press.

Bylander, T. 1994. The Computational Complexity of Propositional STRIPS Planning. Artificial Intelligence, 69(1–2): 165–204.

Corrêa, A. B.; Büchner, C.; and Christen, R. 2022. Zero Knowledge Proofs for Classical Planning Problems: Concrete Example. Technical Report CS-2022-002, University of Basel, Department of Mathematics and Computer Science.

Fikes, R. E.; and Nilsson, N. J. 1971. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence, 2: 189–208.

Goldreich, O. 2001. The Foundations of Cryptography, Volume 1: Basic Techniques. Cambridge University Press.

Goldreich, O.; Micali, S.; and Wigderson, A. 1986. Proofs that Yield Nothing But their Validity and a Methodology of Cryptographic Protocol Design (Extended Abstract). In Proceedings of the Twenty-Seventh Annual Symposium on Foundations of Computer Science (FOCS 1986), 174–187. IEEE Computer Society.
Goldwasser, S.; Micali, S.; and Rackoff, C. 1985. The Knowledge Complexity of Interactive Proof-Systems (Extended Abstract). In Sedgewick, R., ed., Proceedings of the Seventeenth Annual ACM Symposium on Theory of Computing (STOC '85), 291–304. ACM Press.

Helmert, M. 2006. The Fast Downward Planning System. Journal of Artificial Intelligence Research, 26: 191–246.

Hoffmann, J.; and Nebel, B. 2001. The FF Planning System: Fast Plan Generation Through Heuristic Search. Journal of Artificial Intelligence Research, 14: 253–302.

Impagliazzo, R.; and Yung, M. 1987. Direct Minimum Knowledge Computations. In Pomerance, C., ed., Proceedings of the Eighth Annual International Cryptology Conference (CRYPTO 1987), volume 293 of Lecture Notes in Computer Science, 40–51. Springer.

Lund, C.; Fortnow, L.; Karloff, H. J.; and Nisan, N. 1992. Algebraic Methods for Interactive Proof Systems. Journal of the ACM, 39(4): 859–868.

Shamir, A. 1992. IP = PSPACE. Journal of the ACM, 39(4): 869–877.

Shen, A. 1992. IP = PSPACE: Simplified Proof. Journal of the ACM, 39(4): 878–880.

Torreño, A.; Onaindia, E.; Komenda, A.; and Štolba, M. 2017. Cooperative Multi-Agent Planning: A Survey. ACM Computing Surveys, 50(6): 1–32.

Tožička, J.; Štolba, M.; and Komenda, A. 2017. The Limits of Strong Privacy Preserving Multi-Agent Planning. In Barbulescu, L.; Frank, J.; Mausam; and Smith, S. F., eds., Proceedings of the Twenty-Seventh International Conference on Automated Planning and Scheduling (ICAPS 2017), 297–305. AAAI Press.