# expressive_logical_combinators_for_free__f40ae13b.pdf

Expressive Logical Combinators for Free

Pierre Genev es CNRS, LIG Alan Schmitt

Inria, Rennes

A popular technique for the analysis of web query languages relies on the translation of queries into logical formulas. These formulas are then solved for satisﬁability using an off-the-shelf satisﬁability solver. A critical aspect in this approach is the size of the obtained logical formula, since it constitutes a factor that affects the combined complexity of the global approach. We present logical combinators whose beneﬁt is to provide an exponential gain in succinctness in terms of the size of the logical representation. This opens the way for solving a wide range of problems such as satisﬁability and containment for expressive query languages in exponential-time, even though their direct formulation into the underlying logic results in an exponential blowup of the formula size, yielding an incorrectly presumed two-exponential time complexity. We illustrate this from a practical point of view on a few examples such as numerical occurrence constraints and tree frontier properties which are concrete problems found with semi-structured data.

1 Introduction

Modal logics have recently been increasingly used in the analysis of domain speciﬁc languages such as XML Schemas [Dal-Zilio et al., 2004; B arcenas et al., 2011], XPath queries [Genev es et al., 2007], SPARQL queries [Chekol et al., 2012], and CSS selectors [Genev es et al., 2012]. They also serve as a formal framework for the analysis of programming languages such as CDuce [Benzaken et al., 2003; Gesbert et al., 2011], XQuery [Boag et al., 2005; Castagna and Xu, 2011], languages for XML access control [Murata et al., 2006; Kolovski et al., 2007] and, more generally, for No SQL programming languages [Gesbert et al., 2011; Benzaken et al., 2013]. The common goal is to characterize these languages in terms of expressivity and complexity, and to build adapted and effective type-checkers, static analyzers, and compilers.

Detailed afﬁliation of authors: P. Genev es (CNRS, Laboratoire d Informatique de Grenoble, Inria, Univ. Grenoble Alpes) and A. Schmitt (Inria, Research Center Rennes - Bretagne Atlantique).

Such an approach requires the construction of efﬁcient compilers that translate ﬁrst-class constructs (such as queries) into logical formulas. Those formulas are then solved for satisﬁability using an off-the-shelf satisﬁability solver such as the ones found in [Tanabe et al., 2005; Pan et al., 2006; Genev es et al., 2007; Genev es et al., 2015a; Tanabe et al., 2008]. A critical aspect is then the size of the obtained logical formula, since it is a factor that affects the combined complexity1 of the global approach. We show that a whole class of logical combinators (or macros ) can be used as an intermediate language between the query language and the logical language. Those logical combinators provide an exponential gain in succinctness over the corresponding explicit logical representation, yet preserving the typical exponential time complexity [Tanabe et al., 2005; Pan et al., 2006; Genev es et al., 2007; Genev es et al., 2015a; Tanabe et al., 2008] of the logical decision procedure. This opens the way for solving a wide range of problems such as query satisﬁability and query containment in exponential-time, even though their direct formulation into the underlying logic results in an exponential blowup of the formula size, yielding an incorrectly presumed twoexponential time complexity. Speciﬁcally, two essential steps are involved in the reduction of a problem to logical satisﬁability: (1) the translation of the initial problem into a logical formula, and (2) the actual satisﬁability check of the formula. Traditionally, the complexity of the satisﬁability test is stated in terms of the size of the formula, thus every duplication of subformulas during the ﬁrst step may affect the combined complexity and severely impact the practical applicability of the entire approach. Interestingly, we observe that a common form of µ-calculus sub-formula duplication has a very limited impact on combined complexity in existing implementations, such as [Tanabe et al., 2005; Pan et al., 2006; Genev es et al., 2015a; Tanabe et al., 2008]. The reason lies in the fact that satisﬁability-testing algorithms can operate directly on a Hintikka-set-like representation of formulas,

1In the context of problem-solving by reduction to logical satisﬁability, combined complexity considers the complexity of the translation of the problem into logic (taking into account any potential blow-up in size induced by the change in representation), composed with the complexity of testing satisﬁability of the formula.

Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015)

named lean, composed of atomic propositions and modal subformulas. In other terms, the lean can be seen as a succinct but sufﬁcient representation from which the logical satisﬁability of the formula can be decided. In this setting, we prove that the time complexity of decision procedures actually depends on the number of distinct atomic propositions and modal subformulas present in the lean. This makes explicit a notion of truth-status sharing for identical sub-formulas not exhibited in the analysis of the time complexity of such algorithms. We develop this idea in the context of the µ-calculus, whose expressive power subsumes the ones of many modal logics. More speciﬁcally we develop this idea using the alternation-free µ-calculus with converse modalities whose models are ﬁnite trees, following [Genev es et al., 2007]. Trees are encoded in binary, without loss of generality, through the ﬁrst-child and next-sibling modalities, respectively noted 1 and 2 in the manner of [Genev es et al., 2007]. In this setting, an elegant way of building a µ-calculus formula is to apply a combinator to another formula. For instance, split(X) = 1 X 2 X is a combinator that generates a formula such that the input formula must hold in both successors of the current node. In the previous example, although X is duplicated, the increase of the size of the lean generated from split(ϕ) when compared to the one generated from ϕ is only a small constant, independent of ϕ. The paper is organized as follows. We introduce the logical formulas and combinators in 2, state and prove our main result in 3. Finally, we report on practical applications in 4 with experimental results that illustrate that the lean size is indeed a better indicator of the problem s complexity than formula size.

2 Basic Logical Formulas and Combinators

We recall the syntax of the logic of [Genev es et al., 2007], used to prove properties about ﬁnite binary trees. We consider a set AP of atomic propositions, representing the tree node names, which includes a special reserved name σ; a set Var of variables, used in ﬁxpoints; and a set Prog = 1, 2, 1, 2

of programs, to describe navigation in a tree. Program 1 navigates to the ﬁrst child (left successor), program 2 navigates to the next sibling (right successor), program 1 to the parent (predecessor to the right, if it exists), and program 2 to the previous sibling (predecessor to the left, if it exists). We let a = a for any a Prog. A logical formula is deﬁned using the following syntax.

, , σ, or σ for all σ AP \ {σ};

x for all x Var;

ϕ1 ϕ2 or ϕ1 ϕ2 where ϕ1 and ϕ2 are logical formulas;

a ϕ or a where a Prog and ϕ is a logical formula;

µx.ϕ where x Var and ϕ is a logical formula.

We now give an intuition of the interpretation of formulas in the setting of ﬁnite trees: the interpretation of a formula is a set of focused trees, which are ﬁnite trees with a selected node. A formula is satisﬁable if there exists a tree such that

a node of this tree is selected by the formula (i.e., the set of focused trees is not empty). The truth formula selects all focused trees, i.e., every node of every tree, whereas selects none. The σ formula selects every focused trees whose selected node s name is σ, whereas σ selects the nodes with other names (the node name σ is used to represent names of nodes not occurring in the formula). Formula conjunction and disjunction correspond to set intersection and union, respectively. A formula a ϕ selects a node if the node reached following a is selected by formula ϕ. Formula a selects a node if there is no node reachable through a. Finally, a ﬁxpoint µx.ϕ is interpreted as the smallest ﬁxpoint (the intersection of every pre-ﬁxpoint). A formula is closed if every occurrence of a variable x is bound by an enclosing µx. In the following, we only consider formulas that are closed and whose recursion variables are guarded (there is at least one navigation step between a recursion µx and every variable x). Note that since we do not have general negation in formulas (see below), there is no requirement for formulas to be positive (i.e., disallow formulas of the form µx. . . . x): such formulas simply cannot be expressed. Finally, we write ϕ ψ if ϕ is a sub-formula of ψ, and ϕ ψ if it is not. A combinator F is a formula with zero or more occurrences of a placeholder, written X, possibly negated ( X). We write F{ϕ/X} for the combinator F where every instance of X has been replaced by the closed formula ϕ.2 We often write F(X) to make clear the name of the placeholder, and F(ϕ) for F{ϕ/X}. We consider formulas in negation normal form. The negation of a formula or combinator, written F, is obtained using regular De Morgan s laws extended with the cases of ﬁxpoints: µx.F = µx.F, x = x, X = X, and X = X. Moreover, the negation of a modality is a disjunction:

a F = a F a F = the modality is false either because there is no node in that direction, or because the node in that direction does not satisfy the sub-formula. Note that, in order to ensure that F = F for every formula, we have a special case when a disjunction is actually the negation of a modality:

a F a = a F Following [Genev es et al., 2007], the greatest and smallest ﬁxpoint coincide (provided a simple restriction on formulas, namely for cycle-free formulas using sets of ﬁnite trees as models, see [Genev es et al., 2007] for details). Every combinator presented here fulﬁll this restriction. Nevertheless, this work could also be done in a setting where the smallest and greatest ﬁxpoints differ, and in this case one deﬁnes µx.F as νx.F and νx.F as µx.F.

3 Deciding Combined Formulas The Lean Following [Pan et al., 2006; Genev es et al., 2007; Tanabe et al., 2008], we deﬁne the lean of F as follows, with LΓ(F)

2In case of a negated placeholder, we replace X with ϕ, the negation normal form of ϕ (obtained using De Morgan s laws).

deﬁned in Figure 1 (the environment Γ is the set of already unfolded ﬁxpoints). The main difference with the usual approaches is that we close the lean under negation.

Lean(F) = a | a {1, 2, 1, 2} {σ} L (F)

LΓ( ) = LΓ( ) = LΓ(x) = LΓ(X) = LΓ( X) =

LΓ(σ) = LΓ( σ) = {σ}

LΓ(F G) = LΓ(F G) = LΓ(F) LΓ(G)

LΓ( a F) = { a F, a F, a } LΓ(F)

LΓ( a ) = { a }

LΓ(µx.F) = LΓ(F) if x F

LΓ(µx.F) = if µx.F Γ or µx.F Γ

LΓ(µx.F) = LΓ {µx.F }(F{µx.F/x}) otherwise

Figure 1: Negation Closed Lean

Intuitively, the lean of a formula ϕ is the set of every atomic proposition occurring in ϕ, and every subformula that starts with a modality present in ϕ or in the expansion of the ﬁxpoints of ϕ. In particular, the lean does not directly include disjunctive or conjunctive subformulas. Kozen has shown in [Kozen, 1982] that expanding every ﬁxpoint once is sufﬁcient to generate every subformula that may need to be considered for satisﬁability. This single expansion is tracked using the Γ argument in the deﬁnition above. Moreover, Kozen has also shown the lean is linear in the size of the formula. We next make this bound more precise.

The Factorization Power of the Lean

We can now state the main theorem of this paper: the lean size is not impacted by the duplication of sub-formulas. We write |S| for the size of a set S.

Theorem 1. Let F be a combinator and ϕ a closed formula. We have the following.

|Lean(F{ϕ/X})| |Lean(F)| + |Lean(ϕ)|

The theorem is a direct consequence of Lemma 9 which is proved below in 3. We now give some intuition about this result, through a simple example. Recall the split(X) combinator deﬁned as 1 X 2 X. Since the elements of the lean are either atomic propositions (node names) and modalities, the lean of split(ϕ) includes the lean of ϕ and four new elements: 1 ϕ, 1 ϕ, 2 ϕ, and 2 ϕ. If we now consider split(split(ϕ)), we once again add only four formulas to the lean: 1 split(ϕ), 1 split(ϕ), 2 split(ϕ), and 2 split(ϕ). This linear growth, even though the formula s size increases exponentially, is due to the fact that modalities are considered atomically and are not split up in their components (e.g., 1 (ϕ ψ) is not split up into 1 ϕ and 1 ψ).

Satisﬁability-Testing Algorithms based on the Lean A typical approach to decide the satisﬁability of a formula is to ﬁrst build the lean, as described above, then to use a tableau-based algorithm implemented with BDDs [Tanabe et al., 2005; Pan et al., 2006; Genev es et al., 2007; Tanabe et al., 2008]. The time complexity of this approach is shown to be exponential in the size of the formula. More precisely, it is exponential in the size of the lean, which is in turn linear in the size of the formula. The essence of this paper is to show (both in theory and in practice) that the lean may grow much more slowly than the formula when sub-formulas are duplicated. This opens the way for solving a wide range of problems in exponentialtime even though their direct translation into the modal logic is exponential, as illustrated on concrete examples in 4.

Proof of Theorem 1 We deﬁne the number of recursive expansions of F or ϕ, written E (F), in Figure 2. We use this number to deﬁne inductive properties that depend on ﬁxpoints being expanded.

EΓ( ) = EΓ( ) = EΓ(x) = EΓ(X) = EΓ( X) = 0 EΓ(σ) = EΓ( σ) = EΓ( a ) = 0 EΓ(F G) = EΓ(F G) = EΓ(F) + EΓ(G) EΓ( a F) = EΓ(F)

EΓ(µx.F) def= EΓ(F) if x F

EΓ(µx.F) def= 0 if µx.F Γ or µx.F Γ

EΓ(µx.F) def= EΓ {µx.F }(F{µx.F/x}) + 1 otherwise

Figure 2: Number of Recursive Expansions

We write (Γ)ϕ X as the set of formulas G where either G {ϕ/X} or G {ϕ/X} is in Γ. We use this set to identify the formulas that were expanded. Deﬁnition 2. Given ϕ, and Γ, we deﬁne (Γ)ϕ X as follows.

{G | X G (G {ϕ/X} Γ G {ϕ/X} Γ)}

Lemma 3. We have F{G/x} = F{G/x}.

Proof. By induction on F, relying on the fact that x = x.

Lemma 4. We have F{G/X} = F{G/X}.

Proof. By induction on F, relying on the fact that X = X.

Lemma 5. If Γ Γ , then LΓ (F) LΓ(F).

Proof. By induction on the lexical order of EΓ(F) and the size of F. Base cases are immediate. For conjunction and disjunction, we may apply the induction hypothesis because EΓ(F1) and EΓ(F2) do not increase and the formula size decreases. This is also the case for the modality case. For the recursion case where x F, we distinguish three cases (we do not mention the negation µx.F in these cases):

if µx.F Γ, then necessarily µx.F Γ and we immediately conclude; if µx.F Γ , then we conclude by S for any set S; otherwise, we have µx.F in neither set, and we apply the induction hypothesis, as EΓ {µx.F }(F{µx.F/x}) is strictly smaller than EΓ(µx.F).

Lemma 6. We have LΓ {µx.F }(G) = LΓ {µx.F }(G).

Proof. By an immediate induction on the lexical order of EΓ(G) and the size of G.

Lemma 7. For any F, we have LΓ(F) = LΓ(F).

Proof. By induction on the lexical order of EΓ(F) and the size of F. The base cases , , x, X, X, σ, σ, a , and µx.F there x F are immediate. For F G, we compute as follows.

LΓ(F G) = LΓ(F) LΓ(G)

= LΓ(F) LΓ(G) by induction

The disjunction case is similar. For the recursion case, if µx.F or µx.F is in Γ, then µx.F or µx.F = µx.F is also in Γ and the result follows. Finally, if neither is in Γ, we compute as follows.

LΓ(µx.F) = LΓ {µx.F }(F{µx.F/x})

= LΓ {µx.F }(F{µx.F/x}) by induction

= LΓ {µx.F }(F{µx.F/x}) by Lemma 3

= LΓ {µx.F }(F{µx.F/x}) by Lemma 6

Lemma 8. For all Γ and F, if X F, then L(Γ)ϕ X(F) = L (F).

Proof. We prove the more general result: for all Γ , L(Γ)ϕ X Γ (F) = LΓ (F) by induction on the lexical order of EΓ (F) and the size of F. The result is immediate for the base cases, and by induction for the conjunction, disjunction, and modality cases. For the recursion case, if µx.F (or its negation) is in (Γ)ϕ X Γ , then it must be in Γ as X F and members of (Γ)ϕ X contain X by deﬁnition. Thus both sides are equal to . If neither µx.F nor its negation are in (Γ)ϕ X Γ , we have L(Γ)ϕ X Γ (µx.F) = L(Γ)ϕ X Γ {µx.F }(F{µx.F/x}) We next apply the induction hypothesis with Γ {µx.F} and F{µx.F/x}, thus we have L(Γ)ϕ X Γ {µx.F }(F{µx.F/x}) = LΓ {µx.F }(F{µx.F/x}) = LΓ (µx.F). We conclude by taking Γ = .

Lemma 9. Let F be a formula mentioning X, and ϕ a closed formula. We have L (F{ϕ/X}) L (F){ϕ/X} L (ϕ).

Proof. We prove the following more general property for any Γ by induction on the lexical order of E(Γ)ϕ X(F) and the size of F.

LΓ(F{ϕ/X}) L(Γ)ϕ X(F){ϕ/X} L (ϕ)

We ﬁrst deal with every case where X F. In this case, X also does not occur in LΓ(F).

LΓ(F{ϕ/X}) = LΓ(F)

L (F) by Lemma 5

= L(Γ)ϕ X(F) by Lemma 8

= L(Γ)ϕ X(F){ϕ/X}

L(Γ)ϕ X(F){ϕ/X} L (ϕ)

Case X. We compute as follows, using Lemma 5 for the last inclusion.

LΓ(X{ϕ/X}) = LΓ(ϕ) L (ϕ)

Case X. We compute as follows, using Lemma 5 for the set inclusion, and Lemma 7 to conclude.

LΓ(( X){ϕ/X}) = LΓ(ϕ) L (ϕ) = L (ϕ)

Case F G. We compute as follows.

LΓ((F G){ϕ/X})

= LΓ(F{ϕ/X}) LΓ(G{ϕ/X}) and by induction:

L(Γ)ϕ X(F){ϕ/X} L(Γ)ϕ X(G){ϕ/X} L (ϕ)

= L(Γ)ϕ X(F G){ϕ/X} L (ϕ)

Case F G. Identical to the previous case. Case a F. We compute as follows, using Lemma 4 and the induction hypothesis.

LΓ(( a F)){ϕ/X}

= LΓ( a (F{ϕ/X}))

= { a F{ϕ/X}; a F{ϕ/X}; a } LΓ(F{ϕ/X})

= { a F; a F; a }{ϕ/X} LΓ(F{ϕ/X})

{ a F; a F; a }{ϕ/X} L(Γ)ϕ X(F){ϕ/X} L (ϕ)

= L(Γ)ϕ X( a F){ϕ/X} L (ϕ)

Case µx.F with x F. We compute as follows.

LΓ((µx.F)){ϕ/X}

= LΓ(µx.F{ϕ/X}) ϕ closed

= LΓ(F{ϕ/X}) x F{ϕ/X}

L(Γ)ϕ X(F){ϕ/X} L (ϕ) by induction

= L(Γ)ϕ X(µx.F){ϕ/X} L (ϕ) x F

Case µx.F with x F.

If we have µx.F{ϕ/X} Γ or µx.F{ϕ/X} Γ then LΓ(µx.F{ϕ/X}) = and the result is immediate. Otherwise we compute as follows.

LΓ((µx.F){ϕ/X})

= LΓ(µx.F{ϕ/X}) ϕ closed

= LΓ {µx.F {ϕ/X}}(F{ϕ/X}{µx.F {ϕ/X}/x})

= LΓ {µx.F {ϕ/X}}(F{µx.F/x}{ϕ/X}) ϕ closed To apply the induction hypothesis, we show that

E(Γ)ϕ X(µx.F) = E(Γ {µx.F {ϕ/X}})ϕ X(F{µx.F/x}) + 1.

First, we have µx.F / (Γ)ϕ X and µx.F / (Γ)ϕ X, since otherwise, we would have µx.F{ϕ/X} Γ or µx.F{ϕ/X} = µx.F{ϕ/X} Γ, which we assumed to be false. Thus E(Γ)ϕ X(µx.F) = E(Γ)ϕ X {µx.F }(F{µx.F/x}) + 1. Next, we have (Γ {µx.F{ϕ/X}})ϕ X = (Γ)ϕ X {µx.F} by deﬁnition. Thus we have E(Γ)ϕ X(µx.F) = E(Γ {µx.F {ϕ/X}})ϕ X(F{µx.F/x}) + 1. We may thus apply the induction hypothesis and continue to compute.

L(Γ {µx.F {ϕ/X}})ϕ X(F{µx.F/x}){ϕ/X} L (ϕ)

= L(Γ)ϕ X {µx.F }(F{µx.F/x}){ϕ/X} L (ϕ)

As neither µx.F nor µx.F are in (Γ)ϕ X, we have the following equality: L(Γ)ϕ X(µx.F) = L(Γ)ϕ X {µx.F }(F{µx.F/x}). We may thus conclude the computation as follows. = L(Γ)ϕ X(µx.F){ϕ/X} L (ϕ)

We complete the proof by taking Γ to be and remarking that ( )ϕ X = .

4 Applications We provide an implementation of the proposed system [Genev es et al., 2015b]. Each example given in this section comes in a boxed version that can directly be used with the implementation [Genev es et al., 2015b]. Tests are thus reproducible. We present several examples where advanced properties on the underlying data structure (here a tree) can be formulated using combinators, and for which our result applies. In particular, after an introductory example, we consider numerical constraints on the global number of occurrences, and properties on the sequence of leaves in a tree. This means, for instance, that if one extends a query language (such as XPath) with such kinds of features, then our result applies: problems such as query satisﬁability and query containment would not be harder to solve in terms of computational complexity than they already are for the language without the extensions.

A Very Simple Example: Split The combinator split(X) introduced in Section 3 may generate arbitrary large formulas: for instance, let ϕ = a 1 b 2 µy.c 2 y, the formula ψ = split(split(split(ϕ))) uses 8 occurrences of ϕ. To give this formula to the implementation of [Genev es et al., 2015b], we write it as follows.

phi() = a & <1>b & <2>let $y = c | <2>$y in $y; split(#x) = <1>#x & <2>#x; split(split(split(phi())))

We then observe that ψ contains 24 atomic propositions, 38 modalities, 15 conjunctions, and 8 disjunctions (including duplicates). The size of the lean is only 19 (14 modalities and 5 atomic propositions3 Each new split( ) around the formula then only adds two elements to the lean. The satisﬁability check of the above formula is performed in 131ms (milliseconds) with the implementation [Genev es et al., 2015b], including 5ms for computing the lean and 104ms for computing the tableau. A sample satisfying tree of 33 nodes is also constructed in 39ms.

Document-Order Relation and Global Counting

Extending modal logics with operators for counting occurrences of speciﬁc nodes in trees is a notably difﬁcult problem [Lugiez, 2005; Dal-Zilio et al., 2004]. In this example we review how our result applies for a simple form of counting (with respect to constants) in trees. A very simple example of a combinator is the descendant relation that checks that a node satisfying some formula X is accessible in the subtree by any sequence of forward modalities. It is encoded as follows:

descendant(X) = 1 (µz.X 1 z 2 z)

A whole range of combinators to navigate in a tree can be deﬁned in a similar manner. In particular we can encode:

following(X) = ancestor or self(ψ) where

ψ = following sibling(descendant or self(X))

These combinators are easily deﬁned in our logic; as they are very commonly used, they are predeﬁned in [Genev es et al., 2015b]. They can be used as such to encode the socalled document-order relation . This relation corresponds to the ordering of nodes given by a depth-ﬁrst tree traversal: x y iff node y is visited after node x in a depthﬁrst tree traversal. We deﬁne the combinator next(X) = descendant(X) following(X) with which we can mimic the document-order relation (we write X next(Y ) for x y). Notice that this combinator duplicates formulas, since the placeholder X appears twice in its deﬁnition. The document-order relation can be used to express global counting properties in trees. For instance, if we want to encode the so-called concept of a nominal or more generally the fact that some formula ψ is satisﬁed by one and only one node in the tree we can write:

psi() = a & <1>b & <2>let $y = c | <2>$y in $y; next(#x) = descendant(#x) | following(#x); previous(#x) = preceding(#x) | ancestor(#x); nominal(#x) = #x & previous(#x) & next(#x); nominal(psi())

3The numbers we report in this paper correspond to the numbers reported by the implementation [Genev es et al., 2015b].

If we now want to force the existence of at least 4 different tree nodes that satisfy ψ, we can write:

psi() & next(psi() & next(psi() & next(psi())))

The full expansion of the above formula is notably large (if we count duplicates, the formula contains 468 atomic propositions, 871 modalities, 156 conjunctions, and 404 disjunctions). However, the size of its lean is 43 (38 modalities and 5 atomic propositions). The satisﬁability check of the latter formula above is performed in 205ms with the implementation [Genev es et al., 2015b], including 15ms for computing the lean and 124ms for the tableau. A sample satisfying tree is built in 78ms.

The Tree Frontier

In this example, borrowed from [Afanasiev et al., 2005], one describes constraints on a tree frontier. A tree frontier is the set of leaves (nodes without an outgoing 1 edge) ordered from left to right. A frontier node y is the successor of a frontier node x iff x y and there is no leaf node in between x and y in the document order. A simple case analysis shows that node y is the successor of a frontier node x in one of three cases:

1. Either x is a leaf with an immediate next sibling which is also a leaf (y);

2. or x is a leaf with an immediate next sibling which is not a leaf, in which case, by navigating downward in its subtree we reach the leftmost leaf (y);

3. or x is a leaf with no next sibling, in which case, by going up to the parent node recursively until we reach a parent node which has a next sibling, then going to this next sibling, and then, from this node, navigating downward in its subtree we reach the leftmost leaf (y).

This yields the following deﬁnition of a combinator that captures all the aforementioned cases with the help of a few neatly chosen auxiliary predicates:

next Frontier Node(Y ) = leaf up Until Rsibl(ψ)

In this deﬁnition, the placeholder Y is to be replaced by a formula that holds at the successor node, and:

leaf = 1 ψ = 2 down to first leaf(Y ) up Until Rsibl(X) = ( 2 X)

1 (( 2 X) x)

down to first leaf(Z) = µx.(leaf Z) 1 x

Using these combinators, we can now express properties on the tree frontier. For instance, the formula shown on Figure 3 states that the leftmost leaf is labeled a , and, by further navigation on the tree frontier, we encounter two other leaves labeled a . If we count duplicates, the corresponding formula

leaf() = <1>T; down_to_first_leaf(#z)= let $x= (leaf() & <0>#z)

|<1>$x in $x;

up_until_rsibl(#x) = (<2>T & #x)

| let $w = <-1>((<2>T & #x) | $w) | <-2>$w in $w;

next_frontier_node(#y) = leaf()

& up_until_rsibl(<2>down_to_first_leaf(#y));

down_to_first_leaf(a & next_frontier_node(a

& next_frontier_node(a)))

Figure 3: Tree Frontier Example.

contains 7 atomic propositions, 26 modalities, 23 variables, 10 ﬁxpoint binders, 7 negations, 7 conjunctions, and 16 disjunctions. However, the size of the corresponding lean is 22. The lean is only composed of 19 distinct modalities and 3 distinct atomic propositions. Each additional nested call to the combinator next Frontier Node(Y ) extends the lean by 6 modalities. However, the corresponding global formula goes from 26 modalities to 58 for the ﬁrst addition, then it goes to 122 for the second addition. It reaches 32762 modalities for the 10th addition, whereas the corresponding formula is solved for satisﬁability in 11052ms (lean size is 82). The satisﬁability check of the formula shown in Figure 3 is performed in 136ms with the implementation [Genev es et al., 2015b], including 3ms for computing the lean and 109ms for computing the tableau. A sample satisfying tree is built in 18ms.

5 Conclusion

We have presented the concept of logical combinators that avoid exponential increases in combined complexity due to sub-formula duplication. Our main result, of theoretical nature, has very practical consequences and applies for a large class of logical solvers such as the ones found in [Tanabe et al., 2005; Pan et al., 2006; Genev es et al., 2007; Genev es et al., 2015a; Tanabe et al., 2008]. We have further illustrated this result in the context of one of these satisﬁability solvers [Genev es et al., 2007; Genev es et al., 2015a], for which we have presented an indepth analysis. This analysis focuses on the time complexity of lean-based algorithms to decide the satisﬁability of a tree logic equipped with inverse programs, nominals, and counting introduced via combinators. The analysis highlights our result by showing that the lean automatically factorizes duplicated sub-formulas even for such advanced features, thus the complexity of the algorithm should not be stated in terms of the size of the initial formula but in terms of the size of the lean. A direct consequence of this observation is that the addition of nominals and a more general form of counting to the initial tree logic has no impact on decidability nor on its precise complexity bound. We have also reported on practical experiments using an implementation.

References [Afanasiev et al., 2005] L. Afanasiev, P. Blackburn, I. Dim-

itriou, B. Gaiffe, E. Goris, M. Marx, and M. de Rijke. PDL for ordered trees. Journal of Applied Non-Classical Logics, 15(2):115 135, 2005. [B arcenas et al., 2011] Everardo B arcenas, Pierre Genev es,

Nabil Laya ıda, and Alan Schmitt. Query reasoning on trees with types, interleaving, and counting. In IJCAI 2011: Proceedings of the 22nd International Joint Conference on Artiﬁcial Intelligence, pages 718 723, Jul 2011. [Benzaken et al., 2003] V. Benzaken, G. Castagna, and A. Frisch. CDuce: an XML-centric general-purpose language. In ICFP 03: ICFP 11: Proceedings of the ACM international conference on Functional programming, pages 51 63, 2003. [Benzaken et al., 2013] V eronique Benzaken, Giuseppe Castagna, Kim Nguyen, and J erˆome Sim eon. Static and dynamic semantics of nosql languages. In POPL 13: Proceedings of the ACM Symposium on Principles of Programming Languages, pages 101 114, 2013. [Boag et al., 2005] Scott Boag, Don Chamberlin, Mary Fern andez, Daniela Florescu, Jonathan Robie, and J erˆome Sim eon. XQuery 1.0: An XML query language, 2005. http://www.w3.org/TR/xquery/. [Castagna and Xu, 2011] Giuseppe Castagna and Zhiwu Xu.

Set-theoretic foundation of parametric polymorphism and subtyping. In ICFP 11: Proceedings of the 16th ACM SIGPLAN international conference on Functional Programming, pages 94 106, 2011. [Chekol et al., 2012] Melisachew Wudage Chekol, J erˆome

Euzenat, Pierre Genev es, and Nabil Laya ıda. SPARQL query containment under RDFS entailment regime. In IJCAR: Proceedings of the 6th International Joint Conference on Automated Reasoning, pages 134 148, 2012. [Dal-Zilio et al., 2004] Silvano Dal-Zilio, Denis Lugiez, and

Charles Meyssonnier. A logic you can count on. In POPL 04: Proceedings of the ACM Symposium on Principles of Programming Languages, 2004. [Genev es et al., 2007] Pierre Genev es, Nabil Laya ıda, and

Alan Schmitt. Efﬁcient static analysis of XML paths and types. In PLDI 07: Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, pages 342 351, 2007. [Genev es et al., 2012] Pierre Genev es, Nabil Laya ıda, and

Vincent Quint. On the analysis of cascading style sheets.

In WWW 12: Proceedings of the 21st World Wide Web Conference, pages 809 818, April 2012.

[Genev es et al., 2015a] Pierre Genev es, Nabil Laya ıda, Alan

Schmitt, and Nils Gesbert. Efﬁciently Deciding µCalculus with Converse over Finite Trees. ACM Transactions on Computational Logic, 16(2), 2015.

[Genev es et al., 2015b] Pierre Genev es, Nabil Laya ıda, Alan

Schmitt, and Nils Gesbert. The XML reasoning project: http://wam.inrialpes.fr/websolver/, April 2015. [Gesbert et al., 2011] Nils Gesbert, Pierre Genev es, and

Nabil Laya ıda. Parametric polymorphism and semantic subtyping: the logical connection. In ICFP 11: Proceedings of the 16th ACM SIGPLAN international conference on Functional programming, pages 107 116, 2011.

[Kolovski et al., 2007] Vladimir Kolovski, James Hendler,

and Bijan Parsia. Analyzing web access control policies. In Proceedings of the 16th international conference on World Wide Web, WWW 07, pages 677 686, 2007.

[Kozen, 1982] D. Kozen. Results on the propositional µCalculus. In ICALP 82: Proceedings of the International Colloquium on Automata, Languages, and Programming, 1982.

[Lugiez, 2005] Denis Lugiez. Multitree automata that count.

Theor. Comput. Sci., 333(1-2):225 263, 2005.

[Murata et al., 2006] Makoto Murata, Akihiko Tozawa, Michiharu Kudo, and Satoshi Hada. XML access control using static analysis. ACM Trans. Inf. Syst. Secur., 9(3):292 324, 2006.

[Pan et al., 2006] Guoqiang Pan, Ulrike Sattler, and Moshe Y. Vardi. BDD-based decision procedures for the modal logic K. Journal of Applied Non-classical Logics, 16(1-2):169 208, 2006.

[Tanabe et al., 2005] Yoshinori Tanabe, Koichi Takahashi,

Mitsuharu Yamamoto, Akihiko Tozawa, and Masami Hagiya. A decision procedure for the alternation-free twoway modal µ-calculus. In TABLEAUX, volume 3702 of Lecture Notes in Computer Science, pages 277 291, 2005.

[Tanabe et al., 2008] Yoshinori Tanabe, Koichi Takahashi,

and Masami Hagiya. A decision procedure for alternationfree modal µ calculi. In Advances in Modal Logic, pages 341 362, 2008.