
Lifted Inference with Linear Order Axiom

Jan Tóth, Ondřej Kuželka

Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic, {tothjan2, ondrej.kuzelka}@fel.cvut.cz

We consider the task of weighted first-order model counting (WFOMC) used for probabilistic inference in the area of statistical relational learning. Given a formula $\phi$, a domain size $n$ and a pair of weight functions, what is the weighted sum of all models of $\phi$ over a domain of size $n$? It was shown that computing WFOMC of any logical sentence with at most two logical variables can be done in time polynomial in $n$. However, it was also shown that the task is $\#P_1$-complete once we add the third variable, which inspired the search for extensions of the two-variable fragment that would still permit a running time polynomial in $n$. One such extension is the two-variable fragment with counting quantifiers. In this paper, we prove that adding a linear order axiom (which forces one of the predicates in $\phi$ to introduce a linear ordering of the domain elements in each model of $\phi$) on top of the counting quantifiers still permits a computation time polynomial in the domain size. We present a new dynamic programming-based algorithm which can compute WFOMC with linear order in time polynomial in $n$, thus proving our primary claim.

Introduction

The task of probabilistic inference is at the core of many statistical machine learning problems and much effort has been invested into performing inference faster. One of the techniques, aimed mostly at problems from the area of statistical relational learning (Getoor and Taskar 2007), is lifted inference (Van den Broeck et al. 2021). A very popular way to perform lifted inference is to encode the particular problem as an instance of the weighted first-order model counting (WFOMC) problem. It is worth noting that applications of WFOMC range much wider, making it an interesting research subject in its own right. For instance, it was used to aid in conjecturing recursive formulas in enumerative combinatorics (Barvínek et al. 2021). Computing WFOMC in the two-variable fragment of first-order logic (denoted as FO2) can be done in time polynomial in the domain size, which is also referred to as FO2 being domain-liftable (Van den Broeck 2011). Unfortunately, it was also shown that the same does not hold in FO3, where the problem turns out to be $\#P_1$-complete in general (Beame et al. 2015). That has inspired a search for extensions of FO2

Copyright 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

that would still be domain-liftable. Several new classes have been identified since then. Kazemi et al. (2016) introduced S2FO2 and S2RU. Kuusisto and Lutz (2018) extended the two-variable fragment with one functionality axiom and showed such a language to still be domain-liftable. That result was later generalized to the two-variable fragment with counting quantifiers, denoted C2 (Kuželka 2021). Moreover, van Bremen and Kuželka (2021b) proved that C2 extended by the tree axiom is still domain-liftable as well.¹ Another extension of C2 can be obtained by adding a linear order axiom. The linear order axiom (Libkin 2004) forces some relation in the language to introduce a linear (total) ordering on the domain elements. Such a constraint is inexpressible using only two variables, so it requires special treatment. This logic fragment has also received some attention from logicians (Charatonik and Witkowski 2015). In this paper, we show that extending C2 with a linear order axiom yields another domain-liftable language. We present a new dynamic programming-based algorithm for computing WFOMC in C2 with linear order. The algorithm's running time is polynomial in the domain size, meaning that C2 with linear order is domain-liftable. Even though our result is mostly of theoretical interest, we still provide some interesting applications and experiments. Among others, we perform exact inference in a Markov Logic Network (Richardson and Domingos 2006) on a random graph model similar to the one of Watts and Strogatz (Watts and Strogatz 1998).²

Background

Let us now review necessary concepts, definitions and assumptions as well as notation. We use boldface letters such as $\mathbf{k}$ to differentiate vectors from scalar values such as $n$. If we do not name individual vector components such as $\mathbf{k} = (k_1, k_2, \ldots, k_d)$, then the $i$-th element of $\mathbf{k}$ is denoted by $(\mathbf{k})_i$. Since our vectors only have non-negative entries, the sum of vector elements, i.e.,

¹ Other recent works in lifted inference not directly related to our work presented here are works of van Bremen and Kuželka (2021a), Malhotra and Serafini (2022) and Wang et al. (2022).

² This paper is accompanied by a technical report available at https://arxiv.org/abs/2211.01164

The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23)

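$\sum_{i=1}^{d} (\mathbf{k})_i$, always coincides with the $L_1$-norm. Hence, we use $|\mathbf{k}|$ as a shorthand for the sum. We also introduce a special name $\delta_j$ for a vector such that

$$(\delta_j)_i = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}$$

For a vector $\mathbf{k} = (k_1, k_2, \ldots, k_d)$ with $|\mathbf{k}| = n$,

$$\binom{|\mathbf{k}|}{\mathbf{k}} = \binom{n}{k_1, k_2, \ldots, k_d}$$

denotes the multinomial coefficient. We make use of one non-trivial identity of multinomial coefficients (Berge 1971), namely

$$\binom{n}{\mathbf{k}} = \sum_{j=1}^{d} \binom{n-1}{\mathbf{k} - \delta_j}.$$

We also assume the set of natural numbers $\mathbb{N}$ to contain zero and that $0^0 = 1$. We use $[n]$ to denote the set $\{1, 2, \ldots, n\}$.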
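The multinomial notation can be checked numerically. The following is a minimal Python sketch, not taken from the paper, that computes the multinomial coefficient of a vector and compares it with the sum of the coefficients of all vectors obtained by decrementing one positive entry (the Pascal-type recurrence that the proof of Lemma 1 relies on). The function names `multinomial` and `identity_rhs` are hypothetical, and terms with a negative entry are assumed to contribute zero.

```python
from math import factorial


def multinomial(k):
    """Multinomial coefficient |k|! / (k_1! * ... * k_d!) of a tuple k."""
    result = factorial(sum(k))
    for entry in k:
        result //= factorial(entry)  # every partial quotient is an integer
    return result


def identity_rhs(k):
    """Sum of multinomial(k - delta_j) over positions j with k_j > 0.

    Assumption: vectors with a negative entry contribute zero.
    """
    total = 0
    for j, entry in enumerate(k):
        if entry > 0:
            reduced = k[:j] + (entry - 1,) + k[j + 1:]
            total += multinomial(reduced)
    return total


k = (2, 1, 3)
print(multinomial(k), identity_rhs(k))
```

Both printed values agree for this vector, which is the behaviour the inductive step of the correctness proof depends on.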

First-Order Logic

We work with a function-free subset of first-order logic. The language is defined by a finite set of constants $\Delta$, a finite set of variables $\mathcal{V}$ and a finite set of predicates $\mathcal{P}$. If the arity of a predicate $P \in \mathcal{P}$ is $k$, we also write $P/k$. An atom has the form $P(t_1, t_2, \ldots, t_k)$ where $P/k \in \mathcal{P}$ and $t_i \in \Delta \cup \mathcal{V}$. A literal is an atom or its negation. Every atom and every literal is a formula. More complex formulas may be formed from existing formulas by logical connectives, or by surrounding them with a universal ($\forall x$) or an existential ($\exists x$) quantifier where $x \in \mathcal{V}$. A variable $x$ in a formula is called free if the formula contains no quantification over $x$. A formula is called a sentence if it contains no free variables. A formula is called ground if it contains no variables.

As is customary in computer science, we adopt the Herbrand semantics (Hinrichs and Genesereth 2006) with a finite domain. Since we have a finite domain with a one-to-one correspondence to the constant symbols, we denote the domain also with $\Delta$. We denote the Herbrand base by HB. We use $\omega$ to denote a possible world, i.e., any subset of HB. When we wish to restrict a possible world $\omega$ to only atoms with a particular predicate $P$, we write $\omega[P]$.

We work with logical sentences containing at most two variables (the language of FO2). We assume our FO2 sentences to be constant-free. Dealing with constants in lifted inference is a challenge in its own right. Treatment of conditioning on evidence as well as using constants in sentences is available in other literature (Van den Broeck and Davis 2012; Van Haaren et al. 2016).

Weighted Model Counting and Lifted Formulation

Throughout this paper, we study the weighted first-order model counting. We will also make use of its propositional variant, the weighted model counting. Let us formally define both these tasks.

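Definition 1. (Weighted Model Counting) Let $\phi$ be a logical formula over some propositional language $\mathcal{L}$. Let HB denote the Herbrand base of $\mathcal{L}$ (i.e., the set of all propositional variables). Let $w : HB \to \mathbb{R}$ and $\overline{w} : HB \to \mathbb{R}$ be a pair of weightings assigning a positive and a negative weight to each variable in $\mathcal{L}$. We define

$$\mathrm{WMC}(\phi, w, \overline{w}) = \sum_{\omega \subseteq HB : \omega \models \phi} \; \prod_{l \in \omega} w(l) \prod_{l \in HB \setminus \omega} \overline{w}(l).$$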

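Definition 2. (Weighted First-Order Model Counting) Let $\phi$ be a logical formula over some relational language $\mathcal{L}$. Let $n$ be the domain size. Let HB denote the Herbrand base of $\mathcal{L}$ over the domain $\Delta = \{1, 2, \ldots, n\}$. Let $\mathcal{P}$ be the set of the predicates of the language $\mathcal{L}$ and let $pred : HB \to \mathcal{P}$ map each atom to its corresponding predicate symbol. Let $w : \mathcal{P} \to \mathbb{R}$ and $\overline{w} : \mathcal{P} \to \mathbb{R}$ be a pair of weightings assigning a positive and a negative weight to each predicate in $\mathcal{L}$. We define

$$\mathrm{WFOMC}(\phi, n, w, \overline{w}) = \sum_{\omega \subseteq HB : \omega \models \phi} \; \prod_{l \in \omega} w(pred(l)) \prod_{l \in HB \setminus \omega} \overline{w}(pred(l)).$$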

Remark 1. Since for any domain $\Delta$ of size $n$, we can define a bijective mapping $\pi$ such that $\pi(\Delta) = \{1, 2, \ldots, n\}$, WFOMC is defined for an arbitrary domain of size $n$.
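For very small domains, the WFOMC definition can be evaluated literally by enumerating every subset of the Herbrand base. The Python sketch below is illustrative only and exponential in the number of ground atoms; the helper names `wfomc_brute_force` and `smokers` are hypothetical, and the input is assumed to be a universally quantified sentence whose quantifier-free part is given as a Python predicate. The sample formula is the familiar smokers rule, read here as Sm(x) ∧ Fr(x, y) ⇒ Sm(y).

```python
from itertools import product


def wfomc_brute_force(psi, predicates, n, w, w_bar):
    """Sum the weights of all models of (forall x forall y: psi) over {1..n}.

    predicates maps a predicate name to its arity; w and w_bar map a
    predicate name to its positive and negative weight.
    """
    domain = range(1, n + 1)
    atoms = [(name, args)
             for name, arity in predicates.items()
             for args in product(domain, repeat=arity)]
    total = 0
    for truth_values in product([False, True], repeat=len(atoms)):
        world = {atom for atom, value in zip(atoms, truth_values) if value}
        if all(psi(world, x, y) for x in domain for y in domain):
            weight = 1
            for (name, _), value in zip(atoms, truth_values):
                weight *= w[name] if value else w_bar[name]
            total += weight
    return total


def smokers(world, x, y):
    """Quantifier-free part Sm(x) and Fr(x, y) implies Sm(y)."""
    premise = ("Sm", (x,)) in world and ("Fr", (x, y)) in world
    return (not premise) or ("Sm", (y,)) in world


unit = {"Sm": 1, "Fr": 1}
print(wfomc_brute_force(smokers, {"Sm": 1, "Fr": 2}, 2, unit, unit))
```

With unit weights the result is simply the number of models, which makes such an enumerator a convenient reference when testing a lifted algorithm on tiny inputs.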

Cells and Domain-Liftability of FO2

We will not build on the original proof of domain-liftability of FO2 (Van den Broeck 2011; Van den Broeck, Meert, and Darwiche 2014), but rather on the more recent one (Beame et al. 2015). Let us review some parts of that proof as we make use of them later in the paper. An important concept is the one of a cell.

Definition 3. A cell of a first-order formula $\phi$ is a maximally consistent conjunction of literals formed from atoms in $\phi$ using only a single variable.

We will denote cells as $C_1(x), C_2(x), \ldots, C_p(x)$ and assume that they are ordered (indexed). Note, however, that the ordering is purely arbitrary.

Example 1. Consider $\phi = Sm(x) \land Fr(x, y) \Rightarrow Sm(y)$. Then there are four cells:

$$C_1(x) = Sm(x) \land Fr(x, x), \quad C_2(x) = Sm(x) \land \neg Fr(x, x), \quad C_3(x) = \neg Sm(x) \land Fr(x, x), \quad C_4(x) = \neg Sm(x) \land \neg Fr(x, x).$$
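Enumerating cells amounts to choosing a sign for every atom that can be formed with a single variable. The Python sketch below illustrates this for the two atoms Sm(x) and Fr(x, x); the helper names `enumerate_cells` and `show` are hypothetical, atoms are represented as plain strings, and the resulting order is just one of the arbitrary orderings mentioned above.

```python
from itertools import product


def enumerate_cells(atoms):
    """Return all 2^m sign assignments over the given single-variable atoms.

    Each cell is a tuple of (atom, is_positive) pairs.
    """
    return [tuple(zip(atoms, signs))
            for signs in product([True, False], repeat=len(atoms))]


def show(cell):
    """Render a cell as a conjunction of literals."""
    return " ∧ ".join(atom if positive else "¬" + atom
                      for atom, positive in cell)


cells = enumerate_cells(["Sm(x)", "Fr(x, x)"])
for index, cell in enumerate(cells, start=1):
    print(f"C{index}(x) = {show(cell)}")
```

The number of cells grows as 2 to the number of such atoms, which is why the number of cells p is treated as a constant with respect to the domain size.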

It turns out that if we fix a particular assignment of domain elements to the cells and then condition on such evidence, the WFOMC computation decomposes into mutually independent and symmetric parts, simplifying the computation significantly. When we say assignment of domain elements to cells, we mean a domain partitioning that allows empty partitions and that is ordered with respect to a chosen cell ordering. Each partition $S_j$ then holds the constants assigned to the cell $C_j$. Such a partitioning can be captured by a vector. We call such a vector a partitioning vector and often shorten the term to a p-vector.

Definition 4. Let $C_1, C_2, \ldots, C_p$ be cells of some logical formula. Let $n$ be the number of elements in a domain. A partitioning vector (or a p-vector) of order $n$ is any vector $\mathbf{k} \in \mathbb{N}^p$ such that $|\mathbf{k}| = n$.

Moreover, conditioning on some cells may immediately lead to an unsatisfiable formula. To avoid unnecessary computation with such cells, we only work with valid cells (van Bremen and Kuželka 2021a).

Definition 5. A valid cell of a first-order formula $\phi(x, y)$ is a cell of $\phi(x, y)$ that is also a model of $\phi(x, x)$.

Example 2. Consider $\phi = F(x, y) \land (G(x) \lor H(x))$. Cells setting both $G(x)$ and $H(x)$ to false are not valid cells of $\phi$.

Let us now introduce some notation for conditioning on particular (valid) cells. Denote

$$\psi_{ij}(x, y) = \psi(x, y) \land \psi(y, x) \land C_i(x) \land C_j(y), \qquad \psi_k(x) = \psi(x, x) \land C_k(x),$$

and define

$$r_{ij} = \mathrm{WMC}(\psi_{ij}(A, B), w', \overline{w}'), \quad (1)$$

$$w_k = \mathrm{WMC}(\psi_k(A), w, \overline{w}), \quad (2)$$

where $A, B \in \Delta$ and the weights $w', \overline{w}'$ are the same as $w, \overline{w}$ except for the atoms appearing in the cells conditioned on. Those weights are set to one, since the weights of the unary and binary reflexive atoms are already accounted for in the $w_k$ terms. Finally, we can write

$$\mathrm{WFOMC}(\phi, n, w, \overline{w}) = \sum_{\mathbf{k} \in \mathbb{N}^p : |\mathbf{k}| = n} \binom{n}{\mathbf{k}} \prod_{i,j \in [p] : i<j} r_{ij}^{(\mathbf{k})_i (\mathbf{k})_j} \prod_{i \in [p]} r_{ii}^{\binom{(\mathbf{k})_i}{2}} w_i^{(\mathbf{k})_i}, \quad (3)$$

which implies that universally quantified FO2 is domain-liftable since Equation 3 may be evaluated in time polynomial in $n$. Using a specialized Skolemization procedure for WFOMC (Van den Broeck, Meert, and Darwiche 2014), we can easily extend the result to the entire FO2 fragment.
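Equation 3 can be evaluated directly by iterating over all p-vectors of order n. The following Python sketch is a plain illustration rather than the authors' implementation; the names `p_vectors` and `wfomc_closed_form` are hypothetical. The sample values of r and w were computed by hand for the smokers formula Sm(x) ∧ Fr(x, y) ⇒ Sm(y) under unit weights, with the cells ordered as Sm ∧ Fr, Sm ∧ ¬Fr, ¬Sm ∧ Fr, ¬Sm ∧ ¬Fr, and are assumptions of this example.

```python
from math import comb, factorial


def p_vectors(n, p):
    """Yield all tuples of p non-negative integers that sum to n."""
    if p == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in p_vectors(n - first, p - 1):
            yield (first,) + rest


def multinomial(k):
    """Multinomial coefficient |k|! / (k_1! * ... * k_p!)."""
    result = factorial(sum(k))
    for entry in k:
        result //= factorial(entry)
    return result


def wfomc_closed_form(n, r, w):
    """Evaluate Equation 3 for a symmetric matrix r and cell weights w."""
    p = len(w)
    total = 0
    for k in p_vectors(n, p):
        term = multinomial(k)
        for i in range(p):
            # Python evaluates 0 ** 0 as 1, matching the paper's convention.
            term *= r[i][i] ** comb(k[i], 2) * w[i] ** k[i]
            for j in range(i + 1, p):
                term *= r[i][j] ** (k[i] * k[j])
        total += term
    return total


# Hand-computed for the smokers formula with unit weights (assumption).
r = [[4, 4, 2, 2],
     [4, 4, 2, 2],
     [2, 2, 4, 4],
     [2, 2, 4, 4]]
w = [1, 1, 1, 1]
print(wfomc_closed_form(2, r, w))
```

The number of p-vectors of order n is polynomial in n for a fixed number of cells p, which is where the polynomial bound comes from.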

Cardinality Constraints and Counting Quantifiers

WFOMC can be further generalized to WFOMC under cardinality constraints (Kuželka 2021). For a predicate $P \in \mathcal{P}$, we may extend the input formula by one or more cardinality constraints of the type $(|P| \bowtie k)$, where $\bowtie\; \in \{\leq, =, \geq\}$ and $k \in \mathbb{N}$. Intuitively, a cardinality constraint $(|P| = k)$ is satisfied in $\omega$ if there are exactly $k$ ground atoms with predicate $P$ in $\omega$. The inequality signs are treated similarly.

Counting quantifiers are a generalization of the traditional existential quantifier. For a variable $x \in \mathcal{V}$, we allow usage of a quantifier of the form $\exists^{\bowtie k} x$, where $\bowtie\; \in \{\leq, =, \geq\}$ and $k \in \mathbb{N}$. Satisfaction of formulas with counting quantifiers is defined naturally, in a similar manner to the satisfaction of cardinality constraints. For example, $\exists^{=k} x : \psi(x)$ is satisfied in $\omega$ if there are exactly $k$ constants $\{A_1, A_2, \ldots, A_k\}$ such that $\forall i \in [k] : \omega \models \psi(A_i)$.

Kuželka (2021) showed C2 to be a domain-liftable language. That was done by reducing WFOMC in C2 to WFOMC in FO2 under cardinality constraints and showing that the two-variable fragment with cardinality constraints is also domain-liftable.

Linear Order Axiom

Assuming logic with equality, we can encode that the predicate $R$ enforces a linear ordering on the domain using the following logical sentences (Libkin 2004):

1. $\forall x : R(x, x)$,
2. $\forall x \forall y : R(x, y) \lor R(y, x)$,
3. $\forall x \forall y : R(x, y) \land R(y, x) \Rightarrow (x = y)$,
4. $\forall x \forall y \forall z : R(x, y) \land R(y, z) \Rightarrow R(x, z)$.

The last sentence, expressing transitivity of the relation $R$, is the problematic one as it requires three logical variables. Hence, we will not simply append this axiomatic definition to the input formula but rather make use of a specialized algorithm. However, we must keep the axioms in mind when constructing cells. Substituting $x$ for both $y$ and $z$ into the axioms above leaves us with (after simplification) a single sentence enforcing reflexivity, i.e., $\forall x : R(x, x)$. Only cells adhering to this constraint can be valid.

Throughout this paper, we denote the constraint that a predicate $R$ introduces a linear order on the domain as $Linear(R)$. For easier readability, we also make use of the traditional symbol $\leq$ for the linear order predicate whenever possible. We also prefer the infix notation to the prefix one, as it is more commonly used together with the $\leq$ sign. We also use $(A < B)$ as a shorthand for $(A \leq B) \land \neg(B \leq A)$. We often write $\phi = \psi \land Linear(\leq)$, where we assume $\psi$ to be some logical sentence in FO2 or C2 and $\leq$ one of the predicates of the language of $\psi$. Let us formalize the model of such a sentence.

Definition 6. Let $\psi$ be a logical sentence possibly containing the binary predicate $\leq$. A possible world $\omega$ is a model of $\phi = \psi \land Linear(\leq)$ if and only if $\omega$ is a model of $\psi$, and $\omega[\leq]$ satisfies the linear order axioms.
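Whether a given relation satisfies the linear order axioms (reflexivity, totality, antisymmetry and transitivity, as in Libkin 2004) can be tested mechanically on a finite domain. The Python sketch below is only an illustration; `is_linear_order` and `count_linear_orders` are hypothetical names, and a relation is represented as a set of ordered pairs. Counting the relations that pass the test over a small domain should give one linear order per permutation of the domain.

```python
from itertools import product


def is_linear_order(relation, domain):
    """Check the four linear order axioms for a set of ordered pairs."""
    pairs = list(product(domain, repeat=2))
    reflexive = all((x, x) in relation for x in domain)
    total = all((x, y) in relation or (y, x) in relation for x, y in pairs)
    antisymmetric = all(
        x == y or not ((x, y) in relation and (y, x) in relation)
        for x, y in pairs)
    transitive = all(
        (x, z) in relation
        for x, y in pairs if (x, y) in relation
        for z in domain if (y, z) in relation)
    return reflexive and total and antisymmetric and transitive


def count_linear_orders(n):
    """Count binary relations on {1..n} that are linear orders (brute force)."""
    domain = list(range(1, n + 1))
    pairs = list(product(domain, repeat=2))
    count = 0
    for bits in product([False, True], repeat=len(pairs)):
        relation = {pair for pair, bit in zip(pairs, bits) if bit}
        count += is_linear_order(relation, domain)
    return count


print(count_linear_orders(3))
```

The transitivity check is the only one that inspects three elements at once, mirroring the observation that the fourth axiom is the one that falls outside the two-variable fragment.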

Our usual goal will be to compute WFOMC of $\phi$ over some domain. In such cases, part of the input will be weightings $(w, \overline{w})$. Since we are treating $\leq$ as a special predicate that is only supposed to enforce an ordering of domain elements in the models of $\phi$, we will always assume $w(\leq) = \overline{w}(\leq) = 1$.

One more consideration should be given to our assumption of having equality in the language. That is not a hard requirement since encoding equality in C2 (or FO2 with cardinality constraints) is relatively simple compared to full first-order logic. For example, we may use the axioms:

1. $\forall x : (x = x)$,
2. $\forall x \exists^{=1} y : (x = y)$.

Example 3. As a simple example of what the linear order axiom allows us to express, consider the sentence $\phi = \forall x \forall y : \psi(x, y) \land Linear(\leq)$, where

$$\psi(x, y) = T(x) \land (x \leq y) \Rightarrow T(y).$$

How can we interpret models of $\phi$? Due to $Linear(\leq)$, the predicate $\leq$ will define a total ordering on the domain, e.g., $1 \leq 2 \leq \ldots \leq n$. Thus, we can think of the domain as a sequence.

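Algorithm 1 Incremental WFOMC

Input: An FO2 sentence $\phi$, $n \in \mathbb{N}$, weightings $(w, \overline{w})$
Output: $\mathrm{WFOMC}(\phi, n, w, \overline{w})$
Require: $\forall i \in [n]\ \forall \mathbf{k} \in \mathbb{N}^p, |\mathbf{k}| = i : T_i[\mathbf{k}] = 0$
 1: for each cell $C_j$ do
 2:     $T_1[\delta_j] \leftarrow w_j$
 3: end for
 4: for $i = 2$ to $n$ do
 5:     for each cell $C_j$ do
 6:         for each $(\mathbf{k}_{old}, W_{old}) \in T_{i-1}$ do
 7:             $W_{new} \leftarrow W_{old} \cdot w_j \cdot \prod_{l=1}^{p} r_{jl}^{(\mathbf{k}_{old})_l}$
 8:             $\mathbf{k}_{new} \leftarrow \mathbf{k}_{old} + \delta_j$
 9:             $T_i[\mathbf{k}_{new}] \leftarrow T_i[\mathbf{k}_{new}] + W_{new}$
10:         end for
11:     end for
12: end for
13: return $\sum_{\mathbf{k} \in \mathbb{N}^p : |\mathbf{k}| = n} T_n[\mathbf{k}]$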

The formula $\psi(x, y)$ then seeks to split that sequence into its beginning (head of the sequence) and its end (tail of the sequence). The predicate $T/1$ denotes the tail of the sequence. Whenever there is a constant for which $T/1$ is set to true in a model (i.e., it is part of the tail), all greater constants also have $T/1$ set to true. Constants for which $T/1$ is set to false belong to the sequence head.

Approach

To prove our main result, we proceed as follows. First, we present a new algorithm based on dynamic programming that computes WFOMC of a universally quantified FO2 sentence in an incremental manner, and it does so in time polynomial in the domain size. Note that the assumption of universal quantification is not a limiting one, since we can apply the Skolemization for WFOMC to our input sentence before running the algorithm. Second, we show how to adapt the algorithm to compute WFOMC of a formula $\phi = \psi \land Linear(\leq)$, where $\psi$ is a universally quantified FO2 sentence. And third, we use the algorithm as a new WFOMC oracle in the reductions of WFOMC in C2 to WFOMC in FO2, thus proving C2 extended by a linear order axiom to be domain-liftable.

New Algorithm

Our algorithm for computing $\mathrm{WFOMC}(\phi, n, w, \overline{w})$ for an FO2 sentence $\phi$ works in an incremental manner. The domain size is inductively enlarged in a similar way as in the domain recursion rule (Van den Broeck 2011; Kazemi et al. 2016). For each domain size $i$, the WFOMC for each possible p-vector is computed. The results are tracked in a table $T_i$ which maps possible p-vectors to real numbers (the weighted counts). The results are then reused to compute entries in the table $T_{i+1}$. See Algorithm 1 for details.

To compute an entry $T_{i+1}[\mathbf{u}]$ for a p-vector $\mathbf{u}$, we must find all entries $T_i[\mathbf{k}]$ such that $\mathbf{k} + \delta_j = \mathbf{u}$ and $C_j$ is one of the cells. Intuitively speaking, we will assign the new domain element $(i + 1)$ to the cell $C_j$, which will extend the existing models with new ground atoms containing the new domain element. The models will be extended by atoms corresponding to the subformula $\psi_j(i + 1)$ (which, if we are only working with valid cells, are simply the positive literals from $C_j$) and by atoms corresponding to the subformula $\psi_{jk}(i + 1, i')$ for each cell $C_k$ and each domain element already processed (i.e., $1 \leq i' < i + 1$). As we can construct the new models by extending the old, we can also compute the new model weight from the old. The weight update can be seen on Line 7 of Algorithm 1.

To prove correctness of Algorithm 1, we prove that its result is the same as is specified in Equation 3. For better readability, we split the proof into an auxiliary lemma, which proves a particular property of table entries at the end of each iteration $i$, and the actual statement of the algorithm's correctness.

Lemma 1. At the end of iteration $i$ of the for-loop on lines 4-12, it holds that

$$T_i[\mathbf{k}] = \binom{i}{\mathbf{k}} \prod_{i,j \in [p] : i<j} r_{ij}^{(\mathbf{k})_i (\mathbf{k})_j} \prod_{i=1}^{p} r_{ii}^{\binom{(\mathbf{k})_i}{2}} w_i^{(\mathbf{k})_i},$$

for any $i \geq 2$ and any p-vector $\mathbf{k}$ such that $|\mathbf{k}| = i$.

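Proof. Let us prove Lemma 1 by induction on the iteration number. First, consider $i = 2$. When entering the loop for the first time, we have $T_1[\delta_j] = w_j$ for each cell $C_j$. Then, for a particular cell $C_j$ selected on Line 5, there are two cases to consider. The first case is $\mathbf{k}_{old} = \delta_j$. Then $W_{old} = w_j$ and

$$W_{new} = w_j w_j \Big( \prod_{i \in [p] : i \neq j} r_{ji}^{0} \Big) r_{jj} = 1 \cdot r_{jj} w_j^2.$$

Moreover, $\mathbf{k}_{new} = 2\delta_j$. Since this is the only scenario where we obtain such $\mathbf{k}_{new}$ and since $\binom{2}{2\delta_j} = 1$, we have

$$T_2[2\delta_j] = \binom{2}{2\delta_j} r_{jj}^{\binom{2}{2}} w_j^2.$$

The second possibility is that $\mathbf{k}_{old} = \delta_{j'}$, where $j' \neq j$. Then $W_{old} = w_{j'}$ and $W_{new} = w_{j'} w_j r_{jj'}$. The new p-vector $\mathbf{k}_{new} = \delta_j + \delta_{j'}$ will also be obtained when the selected cell is $C_{j'}$ and $\mathbf{k}_{old} = \delta_j$. The resulting $W_{new}$ will be the same as above, because $r_{jj'} = r_{j'j}$ for the values defined in Equation 1. Those values will be summed together (Line 9) and produce

$$T_2[\delta_j + \delta_{j'}] = 2 r_{jj'} w_j w_{j'} = \binom{2}{\delta_j + \delta_{j'}} r_{jj'} w_j w_{j'}.$$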

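Hence, the lemma holds at the end of the first iteration. Second, assume the claim holds at the end of iteration $i$. Let us investigate the entry $T_{i+1}[\mathbf{k}]$. For now, consider $\mathbf{k}$ without any zero entries. Then there are $p$ cases that will produce a particular p-vector $\mathbf{k} = (k_1, k_2, \ldots, k_p)$, namely

$\mathbf{k}_{old} = (k_1 - 1, k_2, \ldots, k_p)$ and cell $C_1$,
$\mathbf{k}_{old} = (k_1, k_2 - 1, \ldots, k_p)$ and cell $C_2$,
...
$\mathbf{k}_{old} = (k_1, k_2, \ldots, k_p - 1)$ and cell $C_p$.

For a particular cell $C_j$ and $\mathbf{k}_{old} = \mathbf{k} - \delta_j$, we have by the induction hypothesis:

$$W_{old} = \binom{i}{\mathbf{k} - \delta_j} \, r_{jj}^{\binom{(\mathbf{k})_j - 1}{2}} w_j^{(\mathbf{k})_j - 1} \prod_{i,l \in [p] : i<l,\, i \neq j \neq l} r_{il}^{(\mathbf{k})_i (\mathbf{k})_l} \prod_{i \in [p] : i \neq j} r_{ii}^{\binom{(\mathbf{k})_i}{2}} w_i^{(\mathbf{k})_i} \prod_{i \in [p] : i \neq j} r_{ji}^{((\mathbf{k})_j - 1)(\mathbf{k})_i}.$$

Following the weight update on Line 7 and simplifying afterwards, the value will become

$$W_{new} = \binom{i}{\mathbf{k} - \delta_j} \prod_{i,l \in [p] : i<l} r_{il}^{(\mathbf{k})_i (\mathbf{k})_l} \prod_{i \in [p]} r_{ii}^{\binom{(\mathbf{k})_i}{2}} w_i^{(\mathbf{k})_i}.$$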

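Observe that the product after the multinomial coefficient will be the same for any of the $p$ cases outlined above. Hence, the final new table entry is given by

$$T_{i+1}[\mathbf{k}] = \sum_{j=1}^{p} \binom{i}{\mathbf{k} - \delta_j} \prod_{i,l \in [p] : i<l} r_{il}^{(\mathbf{k})_i (\mathbf{k})_l} \prod_{i \in [p]} r_{ii}^{\binom{(\mathbf{k})_i}{2}} w_i^{(\mathbf{k})_i}$$

$$= \Big( \sum_{j=1}^{p} \binom{i}{\mathbf{k} - \delta_j} \Big) \prod_{i,l \in [p] : i<l} r_{il}^{(\mathbf{k})_i (\mathbf{k})_l} \prod_{i \in [p]} r_{ii}^{\binom{(\mathbf{k})_i}{2}} w_i^{(\mathbf{k})_i}$$

$$= \binom{i + 1}{\mathbf{k}} \prod_{i,l \in [p] : i<l} r_{il}^{(\mathbf{k})_i (\mathbf{k})_l} \prod_{i \in [p]} r_{ii}^{\binom{(\mathbf{k})_i}{2}} w_i^{(\mathbf{k})_i},$$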

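which is consistent with the claim. The last thing to consider is if there are some zero entries in $\mathbf{k}$. Suppose there are $z$ of them and w.l.o.g. assume they are on the positions $(p - z + 1), (p - z + 2), \ldots, p$. Then we obtain a result such that

$$T_{i+1}[\mathbf{k}] = \Big( \sum_{j=1}^{p-z} \binom{i}{\mathbf{k} - \delta_j} \Big) \prod_{i,j \in [p] : i<j} r_{ij}^{(\mathbf{k})_i (\mathbf{k})_j} \prod_{i=1}^{p} r_{ii}^{\binom{(\mathbf{k})_i}{2}} w_i^{(\mathbf{k})_i}.$$

Denote by $\mathbf{u}$ the first $p - z$ components of the vector $\mathbf{k} - \delta_j$. Note that $\binom{i}{\mathbf{k} - \delta_j} = \binom{i}{\mathbf{u}}$, since the last $z$ entries are all zeros. Hence, even now it holds that

$$T_{i+1}[\mathbf{k}] = \binom{i + 1}{\mathbf{k}} \prod_{i,j \in [p] : i<j} r_{ij}^{(\mathbf{k})_i (\mathbf{k})_j} \prod_{i=1}^{p} r_{ii}^{\binom{(\mathbf{k})_i}{2}} w_i^{(\mathbf{k})_i}.$$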

Theorem 1. Algorithm 1 computes $\mathrm{WFOMC}(\phi, n, w, \overline{w})$ of a universally quantified FO2 sentence $\phi$ in prenex normal form. Moreover, it does so in time polynomial in the domain size $n$.

Proof. By Lemma 1, we have

$$T_n[\mathbf{k}] = \binom{n}{\mathbf{k}} \prod_{i,j \in [p] : i<j} r_{ij}^{(\mathbf{k})_i (\mathbf{k})_j} \prod_{i=1}^{p} r_{ii}^{\binom{(\mathbf{k})_i}{2}} w_i^{(\mathbf{k})_i}.$$

On Line 13, all those entries are summed together, which produces a formula identical to the one in Equation 3.

As for the second part of the claim, the first loop on lines 1-3 runs in time $O(1)$ with respect to $n$. The large loop on lines 4-12 runs in $O(n)$. The first nested loop (lines 5-11) is again independent of $n$, and the second (lines 6-10) runs in $O(n^p)$. The final sum on Line 13 also runs in $O(n^p)$. Overall, we can upper bound the algorithm's time complexity by $O(n^{p+1})$. Hence, the running time is polynomial in the domain size $n$.
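Algorithm 1 translates almost line by line into code. The Python sketch below is a hedged illustration, not the authors' Julia implementation: the function name `incremental_wfomc` is hypothetical, the tables are plain dictionaries keyed by p-vectors, and the values of r and w are assumed to be precomputed. The sample r and w are hand-computed for the smokers formula Sm(x) ∧ Fr(x, y) ⇒ Sm(y) under unit weights (an assumption of this example), for which direct enumeration over a domain of size 2 gives 48 models.

```python
from collections import defaultdict


def incremental_wfomc(n, r, w):
    """Incremental WFOMC over a domain of size n >= 1.

    r[j][l] and w[j] are the precomputed cell-pair and cell weights.
    """
    p = len(w)
    # Lines 1-3: a single element placed in cell C_j has weight w_j.
    table = {tuple(int(i == j) for i in range(p)): w[j] for j in range(p)}
    # Lines 4-12: add one domain element per iteration.
    for _ in range(2, n + 1):
        new_table = defaultdict(int)
        for j in range(p):
            for k_old, w_old in table.items():
                w_new = w_old * w[j]
                for l in range(p):
                    w_new *= r[j][l] ** k_old[l]   # Line 7; 0 ** 0 == 1
                k_new = k_old[:j] + (k_old[j] + 1,) + k_old[j + 1:]
                new_table[k_new] += w_new          # Line 9
        table = dict(new_table)
    # Line 13: sum over all p-vectors of order n.
    return sum(table.values())


# Hand-computed for the smokers formula with unit weights (assumption).
r = [[4, 4, 2, 2],
     [4, 4, 2, 2],
     [2, 2, 4, 4],
     [2, 2, 4, 4]]
w = [1, 1, 1, 1]
print(incremental_wfomc(2, r, w))
```

Keeping only the previous table in memory is a small design choice of this sketch; the pseudocode indexes all tables, but each iteration reads just the preceding one.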

Enforcing a Linear Order

When adding the linear order axiom to the input sentence $\psi$, each model of $\psi$ will be with respect to some domain ordering. Assume we find the set $\Omega$ of all models for one fixed ordering. Having a domain permutation $\pi$,

$$\bigcup_{\omega \in \Omega} \{ \pi(\omega) \}$$

will be the set of all models with respect to the new domain ordering defined by $\pi$. Hence, the situation is symmetric for any particular ordering of the domain.

Theorem 2. Let $\phi$ be a formula of the form $\phi = \psi(x, y) \land Linear(\leq)$, where $\psi(x, y)$ is a universally quantified FO2 sentence and $\leq$ is one of its predicates. Let $\Delta$ be a domain over which we want to compute WFOMC. If $\omega \models \phi$ and $\pi$ is a permutation of $\Delta$ such that $\pi(\Delta) \neq \Delta$ (that is, $\pi$ changes the ordering of the elements), then $\pi(\omega) \models \phi$, where application of $\pi$ to a possible world is defined by appropriate substitution of the domain elements in ground atoms. Moreover, $\omega \neq \pi(\omega)$.

Proof. If $\omega$ is a model of $\phi$, we can partition $\omega$ into two disjoint sets: $\omega[\leq]$ holding only atoms with the predicate $\leq$, and $\omega_\psi = \omega \setminus \omega[\leq]$. The set $\omega[\leq]$ defines an ordering of $\Delta$, and $\omega_\psi$ is then a model of $\forall x \forall y : \psi(x, y)$ respecting the ordering defined by $\omega[\leq]$. Applying the permutation $\pi$ to $\omega[\leq]$ will define a different domain ordering. Since there are no constants in $\phi$, $\pi(\omega_\psi)$ will still be a model of $\forall x \forall y : \psi(x, y)$ (we simply apply a different substitution to the variables in $\psi$). Moreover, since $\omega_\psi$ respected the ordering defined by $\omega[\leq]$, $\pi(\omega_\psi)$ will respect the new ordering defined by $\pi(\omega[\leq])$. Hence $\pi(\omega) = \pi(\omega[\leq]) \cup \pi(\omega_\psi)$ is another model of $\phi$ and it must be different from $\omega$, because $\pi(\omega[\leq])$ defines a different ordering than $\omega[\leq]$.

Corollary 1. To compute $\mathrm{WFOMC}(\phi, n, w, \overline{w})$, where $\phi = \psi(x, y) \land Linear(\leq)$, we can compute WFOMC for one ordered domain of size $n$ and then multiply the result by the factorial of $n$, since there are $n!$ different permutations of the domain.
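The factor of n! in Corollary 1 can be observed on the tail formula from Example 3, read here as T(x) ∧ (x ≤ y) ⇒ T(y). The Python sketch below is an illustration with hypothetical function names; as a shortcut, it enumerates linear orders through permutations of the domain (each permutation induces exactly one linear order) instead of filtering all binary relations by the axioms, and it uses unit weights so that WFOMC reduces to model counting.

```python
from itertools import permutations, product
from math import factorial


def count_tail_models(leq, domain):
    """Count assignments of T/1 satisfying T(x) and x <= y implies T(y)."""
    count = 0
    for bits in product([False, True], repeat=len(domain)):
        tail = {x for x, bit in zip(domain, bits) if bit}
        if all(not (x in tail and (x, y) in leq) or y in tail
               for x in domain for y in domain):
            count += 1
    return count


def count_fixed_order(n):
    """Models for the single natural ordering 1 <= 2 <= ... <= n."""
    domain = list(range(1, n + 1))
    leq = {(x, y) for x in domain for y in domain if x <= y}
    return count_tail_models(leq, domain)


def count_all_orders(n):
    """Models summed over every linear order of the domain."""
    domain = list(range(1, n + 1))
    total = 0
    for ranking in permutations(domain):
        position = {element: index for index, element in enumerate(ranking)}
        leq = {(x, y) for x in domain for y in domain
               if position[x] <= position[y]}
        total += count_tail_models(leq, domain)
    return total


n = 3
print(count_fixed_order(n), count_all_orders(n), factorial(n))
```

For a fixed ordering the tail can start at any of the n positions or be empty, so the per-ordering count is n + 1, and the total is that count multiplied by n!.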

Let us now show that we can compute WFOMC of a formula $\phi = \psi \land Linear(\leq)$ for a fixed domain ordering using only a slightly modified Algorithm 1. The modified algorithm will take advantage of the fact that when we are processing the $i$-th domain element, it holds that $i' < i$ for all already processed domain elements $i'$. Hence, when extending the domain by the constant $i$ (and consequently, extending the models by atoms containing $i$), the only difference will be in the models of the subformulas $\psi_{ij}(A, B)$, where $A, B \in \Delta$. One constant must be greater than the other in the sense of the enforced domain ordering. Thus, we only need to redefine $r_{ij}$ to reflect this. Then, we may prove that FO2 with a linear order axiom is domain-liftable in a similar manner to how we proved correctness of Algorithm 1 for FO2 alone. Let us redefine

$$r_{ij} = \mathrm{WMC}(\psi_{ij}(A, B) \land (B \leq A) \land \neg(A \leq B), w', \overline{w}'). \quad (4)$$

Theorem 3. Incremental WFOMC with $r_{ij}$ values from Equation 4 computes $\mathrm{WFOMC}(\phi, n, w, \overline{w})$ of a universally quantified FO2 sentence $\phi$ in prenex normal form on the ordered domain $\Delta = \{1 \leq 2 \leq \ldots \leq n\}$. Moreover, it does so in time polynomial in the domain size $n$.

Proof. Let us prove the claim by induction on the size of the domain. The base step is analogous to the one in the proof of Lemma 1. More generally speaking, for a domain of a constant size $K$ ($K = 1$ in Algorithm 1), we may simply ground the problem and compute its WMC without any lifting. Since $K$ is a constant with respect to $n$, we will not exceed the polynomial running time.

The inductive step differs from the one for Lemma 1, but still builds on the same intuition. Now, assume that our algorithm computes WFOMC with linear order for a domain of size $i$, where the result is stored as the table entries $T_i[\mathbf{k}]$ for all p-vectors $\mathbf{k}$ such that $|\mathbf{k}| = i$ (the final result would be obtained by summing those entries together). Consider processing of the element $(i + 1)$. For a particular cell $C_j$ and a p-vector $\mathbf{k}$, adding the new element will again extend the existing models with new atoms. First, atoms corresponding to the subformula $\psi_j(i + 1)$ will be added, hence the old weight must be multiplied by $w_j$. Second, atoms corresponding to the subformulas $\psi_{jk}(i + 1, i')$ will be added for each cell $C_k$ and each processed element $i'$ ($1 \leq i' < i + 1$). However, only possible worlds that additionally satisfy $i' < i + 1$ will be models of the input sentence with respect to the fixed domain ordering. That is precisely captured by $r_{ij}$ from Equation 4. Other possible worlds will be assigned zero weight. Hence,

$$W_{new} = W_{old} \cdot w_j \cdot \prod_{l=1}^{p} r_{jl}^{(\mathbf{k})_l}.$$

There are more possible p-vectors $\mathbf{u}$ and cells $C_m$ such that $\mathbf{u} + \delta_m = \mathbf{k} + \delta_j = \mathbf{k}_{new}$. Those all correspond to different, mutually independent models whose weights can be added together. Since we are processing all possible p-vectors, those also correspond to the only existing models. Therefore, at the end of the final iteration, we will have summed up weights of all existing models of size $n$. And since we only substituted one value in the original Algorithm 1, the computation still runs in time polynomial in the domain size.

Theorem 4. The language of FO2 extended by a linear order axiom is domain-liftable.

Proof. For an input sentence $\phi = \psi \land Linear(\leq)$, where $\psi$ is an FO2 sentence, start with converting $\psi$ to a prenex normal form with each predicate having arity at most 2 (Grädel, Kolaitis, and Vardi 1997). Then apply the Skolemization for WFOMC (Van den Broeck, Meert, and Darwiche 2014) to obtain a sentence of the form $\phi = \forall x \forall y : \psi(x, y) \land Linear(\leq)$, where $\psi$ is a quantifier-free formula. By Theorem 3, we know that Algorithm 1 computes $\mathrm{WFOMC}(\phi, n, w, \overline{w})$ for one fixed ordering of the domain in time polynomial with respect to the domain size. Once we have that value, we may multiply it by $n!$ to obtain the overall WFOMC, as is stated in Corollary 1. The entire computation thus runs in time polynomial in the domain size.

A Worked Example of Incremental WFOMC

Let us now use another example of splitting a sequence to demonstrate the work of Algorithm 1. Consider the sentence $\phi = \forall x \forall y : \psi(x, y) \land Linear(\leq)$, where $\psi$ is the conjunction of

$$\neg H(x) \lor \neg T(x),$$
$$H(y) \land (x \leq y) \Rightarrow H(x),$$
$$T(x) \land (x \leq y) \Rightarrow T(y).$$

This time, we model a three-way split of a sequence, differentiating its head, tail and middle. We have already seen the third formula, which defines a property of the sequence tail. The second formula does the same for the head. We also require that for each element, at least one of $H/1$, $T/1$ is set to false. If both were set to true, then one element would be part of both the head and the tail, which is clearly not what we want. If they are both set to false, then the element is part of the sequence middle.

Our goal is to compute $\mathrm{WFOMC}(\phi, n, w, \overline{w})$, where $(w, \overline{w})$ are some weight functions. For more clarity in the computations below, we leave the weights as parameters (except for the $\leq$ predicate, whose weights are fixed to one). We will substitute concrete numbers at the end of our example.

First, we construct valid cells of $\psi$. There are 3 in total:

$$C_1(x) = H(x) \land \neg T(x) \land (x \leq x),$$
$$C_2(x) = \neg H(x) \land T(x) \land (x \leq x),$$
$$C_3(x) = \neg H(x) \land \neg T(x) \land (x \leq x).$$

Having valid cells, we need to compute the values $r_{ij}$ and $w_k$. Since we left the input weight functions as parameters, those cannot be specified numerically. Instead, we use their respective symbols. Finally, we can start with the pseudocode. Following the loop on Lines 1-3, we obtain the table $T_1$ as follows:

$$T_1[(1, 0, 0)] = w_1, \quad T_1[(0, 1, 0)] = w_2, \quad T_1[(0, 0, 1)] = w_3.$$

For the main loop on Lines 4-12, we have $i \in \{2, 3\}$ and $j \in \{1, 2, 3\}$.

Set $j = 1$. Now we iterate over entries in $T_1$. First, we have $\mathbf{k}_{old} = (1, 0, 0)$ and $W_{old} = w_1$. We compute the new weight as

$$W_{new} \leftarrow W_{old} \cdot w_1 \cdot r_{11}^1 r_{12}^0 r_{13}^0 = w_1^2 r_{11}.$$

The new p-vector will be $\mathbf{k}_{new} \leftarrow (2, 0, 0)$. The old value is $T_2[(2, 0, 0)] = 0$. Hence, we will set

$$T_2[(2, 0, 0)] \leftarrow 0 + w_1^2 r_{11}.$$

Analogously with the other key-value pairs, we arrive at

$$T_2[(1, 1, 0)] \leftarrow 0 + w_1 w_2 r_{12},$$
$$T_2[(1, 0, 1)] \leftarrow 0 + w_1 w_3 r_{13}.$$

Set $j = 2$. Again, iterate over entries in $T_1$. First, we have $\mathbf{k}_{old} = (1, 0, 0)$ and $W_{old} = w_1$. We compute the new weight as

$$W_{new} \leftarrow W_{old} \cdot w_2 \cdot r_{21}^1 r_{22}^0 r_{23}^0 = w_1 w_2 r_{21}.$$

The new p-vector $\mathbf{k}_{new} \leftarrow (1, 1, 0)$ already has a nonzero value set in $T_2$, i.e., $T_2[(1, 1, 0)] = w_1 w_2 r_{12}$. Hence, we will now assign

$$T_2[(1, 1, 0)] \leftarrow w_1 w_2 (r_{12} + r_{21}).$$

Again, analogously for the other values:

$$T_2[(0, 2, 0)] \leftarrow 0 + w_2^2 r_{22},$$
$$T_2[(0, 1, 1)] \leftarrow 0 + w_2 w_3 r_{23}.$$

After repeating the steps for $j = 3$, we arrive at the complete table $T_2$ with entries:

$$T_2[(2, 0, 0)] = w_1^2 r_{11}$$
$$T_2[(1, 1, 0)] = w_1 w_2 (r_{12} + r_{21})$$
$$T_2[(1, 0, 1)] = w_1 w_3 (r_{13} + r_{31})$$
$$T_2[(0, 2, 0)] = w_2^2 r_{22}$$
$$T_2[(0, 1, 1)] = w_2 w_3 (r_{23} + r_{32})$$
$$T_2[(0, 0, 2)] = w_3^2 r_{33}$$

When performing the computation for $i = 3$, we now iterate over entries in $T_2$. Hence, for each $j$, there will now be six p-vector keys and their respective values to process. Eventually, we arrive at $T_3$ such that

$$T_3[(3, 0, 0)] = w_1^3 r_{11}^3$$
$$T_3[(2, 1, 0)] = w_1^2 w_2 r_{11} [r_{12}(r_{12} + r_{21}) + r_{21}^2]$$
$$T_3[(2, 0, 1)] = w_1^2 w_3 r_{11} [r_{13}(r_{13} + r_{31}) + r_{31}^2]$$
$$T_3[(1, 2, 0)] = w_1 w_2^2 r_{22} [r_{21}(r_{21} + r_{12}) + r_{12}^2]$$
$$T_3[(1, 1, 1)] = w_1 w_2 w_3 [r_{12} r_{13} (r_{23} + r_{32}) + r_{21} r_{23} (r_{13} + r_{31}) + r_{31} r_{32} (r_{12} + r_{21})]$$
$$T_3[(1, 0, 2)] = w_1 w_3^2 r_{33} [r_{31}(r_{31} + r_{13}) + r_{13}^2]$$
$$T_3[(0, 3, 0)] = w_2^3 r_{22}^3$$
$$T_3[(0, 2, 1)] = w_2^2 w_3 r_{22} [r_{23}(r_{23} + r_{32}) + r_{32}^2]$$
$$T_3[(0, 1, 2)] = w_2 w_3^2 r_{33} [r_{32}(r_{32} + r_{23}) + r_{23}^2]$$
$$T_3[(0, 0, 3)] = w_3^3 r_{33}^3$$

Per Line 13, the final result is obtained by summing all the values in $T_3$ that are written above. To find the number of three-way sequence splits, we set all weights to one. For unit weights, we obtain $w_1 = w_2 = w_3 = 1$ and

$$\begin{array}{ccccccccc} r_{11} & r_{12} & r_{13} & r_{21} & r_{22} & r_{23} & r_{31} & r_{32} & r_{33} \\ 1 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 1 \end{array}$$

Plugging those values into $T_3$ and summing produces

$$\sum_{\mathbf{k} \in \mathbb{N}^3 : |\mathbf{k}| = 3} T_3[\mathbf{k}] = 10,$$

which can be checked to be the correct value, e.g., by using the popular stars and bars method.
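The worked example can be replayed mechanically. The Python sketch below is an illustration with the hypothetical function name `incremental_table`; it runs the incremental table update with the unit-weight values $r_{11}, \ldots, r_{33} = 1, 0, 0, 1, 1, 1, 1, 0, 1$ and $w_1 = w_2 = w_3 = 1$, and compares the sum of the final table with the stars and bars count of ways to cut a sequence of length n into a head, a middle and a tail, each possibly empty.

```python
from collections import defaultdict
from math import comb


def incremental_table(n, r, w):
    """Return the final table T_n as a dict from p-vectors to weights."""
    p = len(w)
    table = {tuple(int(i == j) for i in range(p)): w[j] for j in range(p)}
    for _ in range(2, n + 1):
        new_table = defaultdict(int)
        for j in range(p):
            for k_old, w_old in table.items():
                w_new = w_old * w[j]
                for l in range(p):
                    # r[j][l]: the new (greater) element is in cell j,
                    # an older (smaller) element is in cell l.
                    w_new *= r[j][l] ** k_old[l]
                k_new = k_old[:j] + (k_old[j] + 1,) + k_old[j + 1:]
                new_table[k_new] += w_new
        table = dict(new_table)
    return table


# Unit-weight values from the worked example (cells: head, tail, middle).
R = [[1, 0, 0],
     [1, 1, 1],
     [1, 0, 1]]
W = [1, 1, 1]

t3 = incremental_table(3, R, W)
print(sum(t3.values()), comb(3 + 2, 2))
```

Every entry of the final table equals one here, since for each p-vector there is exactly one way to lay out the head, middle and tail along the fixed ordering.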

Domain-Liftability of C2 with Linear Order

WFOMC in C2 may be reduced to WFOMC in FO2 under cardinality constraints. WFOMC under cardinality constraints may then be solved by repeated calls to a WFOMC oracle. As the number of such calls is only polynomial in the domain size, it follows that FO2 with cardinality constraints and also C2 are domain-liftable (Kuželka 2021). Since the C2 domain-liftability proof only relies on a domain-lifted WFOMC oracle, we may use our new algorithm for computing WFOMC with linear order as that oracle, leading to our final result.

Theorem 5. The language of C2 extended by a linear order axiom is domain-liftable.

We omit the proof as it would consist of an almost word-for-word restatement of the already available proof of domain-liftability of C2 (Kuželka 2021) with only cosmetic changes.

Predecessor Relation

Given a domain ordering, an important relation is that of the immediate predecessor. Denoting by Pred(x, y) the predecessor relation, i.e., x is the immediate predecessor of y under the order enforced by $\leq$, we may encode the predecessor relation using the sentences

1. $\forall x: \neg \mathit{Perm}(x, x)$,
2. $\forall x\, \exists^{=1} y: \mathit{Perm}(x, y)$,
3. $\forall y\, \exists^{=1} x: \mathit{Perm}(x, y)$,
4. $\forall x\, \forall y: \mathit{Pred}(x, y) \Rightarrow \mathit{Perm}(x, y)$,
5. $\forall x\, \forall y: \mathit{Pred}(x, y) \Rightarrow (x \leq y)$,
6. $|\mathit{Pred}| = n - 1$.

We use an auxiliary relation Perm/2 for the encoding. Perm/2 is assumed to be a fresh predicate symbol and it captures a specific permutation of elements. Each domain element is mapped to its immediate successor in the ordering, except for the last one (as it has no successor). The last element in the ordering is mapped by Perm/2 to the very first one, which is the only transition in our permutation from a greater element to a smaller one. Finally, with the permutation defined, we copy all its smaller-to-greater transitions over to the predecessor relation. See the online technical report for details as well as a generalization of the predecessor relation.
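The encoding can be sanity-checked by enumeration on a small domain. The sketch below assumes the natural order on $0, \ldots, n-1$ plays the role of the linear order and reads the six sentences as: Perm is a fixed-point-free bijection, Pred is contained in Perm and in the order, and Pred has exactly $n - 1$ pairs. The function name `predecessor_models` is hypothetical.

```python
from itertools import combinations, permutations


def predecessor_models(n):
    """All (Perm, Pred) pairs satisfying sentences 1-6 under the natural order."""
    models = []
    for image in permutations(range(n)):
        # Sentences 2 and 3 make Perm a bijection; sentence 1 forbids fixed points.
        if any(image[x] == x for x in range(n)):
            continue
        perm = {(x, image[x]) for x in range(n)}
        # Sentences 4 and 5: Pred may only use Perm pairs that respect the order.
        allowed = sorted(p for p in perm if p[0] <= p[1])
        # Sentence 6: Pred has exactly n - 1 pairs.
        for pred in combinations(allowed, n - 1):
            models.append((perm, set(pred)))
    return models


models = predecessor_models(4)
perm, pred = models[0]
```

For $n = 4$ the enumeration leaves a single model, in which Perm is the cyclic shift and Pred contains exactly the pairs $(i, i + 1)$.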

Experiments

To check our results empirically, as well as to assess how our approach scales, we implemented the proposed algorithm in the Julia programming language (Bezanson et al. 2017). The implementation follows the algorithmic approach presented in the paper, with one notable exception. Counting quantifiers and cardinality constraints are not handled by repeated calls to a WFOMC oracle and subsequent polynomial interpolation (Kuželka 2021). Instead, they are processed by introducing a symbolic variable³ for each cardinality constraint and computing the polynomial (that would be interpolated) explicitly in a single run of the algorithm. We made use of the Nemo.jl package (Fieker et al. 2017) for polynomial representation and manipulation.
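The idea of carrying a symbolic variable through the computation can be sketched without any computer algebra package. The toy example below is written in Python rather than Julia and does not use Nemo.jl; it represents a polynomial in one symbolic weight $x$ as a plain coefficient list and counts binary relations by size, which is only an assumed stand-in for the actual algorithm. Both function names are hypothetical.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def relation_size_polynomial(n):
    """Weighted count of all binary relations on n elements, symbolic in x."""
    poly = [1]
    for _ in range(n * n):
        # Each atom is either false (weight 1) or true (weight x).
        poly = poly_mul(poly, [1, 1])
    return poly


counts = relation_size_polynomial(2)  # counts[c] = number of relations with c pairs
```

A cardinality constraint such as $|R| = c$ is then answered by reading off a single coefficient, with no interpolation step.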

Inference in Markov Logic Networks

Using Incremental WFOMC, we can perform exact lifted probabilistic inference over Markov Logic Networks that use the language of C2 with the linear order axiom. We propose one such network over a random graph model similar to that of Watts and Strogatz and present inference results for it obtained by our algorithm. We first review the necessary background, then describe our graph model, and finally present the computed results.

Markov Logic Networks

Markov Logic Networks, often abbreviated as MLNs (Richardson and Domingos 2006), are a popular model from the area of statistical relational learning. An MLN $\Phi$ is a set of weighted first-order logic formulas (possibly with free variables) with weights taking on values from the real domain or infinity:

$$\Phi = \{ (w_1, \alpha_1), (w_2, \alpha_2), \ldots, (w_k, \alpha_k) \}.$$

Given a domain $\Delta$, the MLN defines a probability distribution over possible worlds such that

$$\Pr_{\Phi, \Delta}(\omega) = \frac{[\![ \omega \models \Phi_\infty ]\!]}{Z} \exp\left( \sum_{(w_i, \alpha_i) \in \Phi_{\mathbb{R}}} w_i \cdot N(\alpha_i, \omega) \right),$$

where $\Phi_{\mathbb{R}}$ denotes formulas with real-valued weights (soft constraints), $\Phi_\infty$ denotes formulas with infinity-valued weights (hard constraints), $[\![ \cdot ]\!]$ is the indicator function, $Z$ is the normalization constant ensuring valid probability values, and $N(\alpha_i, \omega)$ is the number of substitutions to free variables of $\alpha_i$ that produce a grounding of those free variables that is satisfied in $\omega$. The distribution formula is equivalent to that of a Markov Random Field (Koller and Friedman 2009). Hence, an MLN along with a domain defines a probabilistic graphical model, and inference in the MLN is thus inference over that model.

Inference (and also learning) in MLNs is reducible to WFOMC (Van den Broeck, Meert, and Darwiche 2014). For each $(w_i, \alpha_i(\mathbf{x}_i)) \in \Phi_{\mathbb{R}}$, introduce a new formula $\forall \mathbf{x}_i: \xi_i(\mathbf{x}_i) \Leftrightarrow \alpha_i(\mathbf{x}_i)$, where $\xi_i$ is a fresh predicate, and set $w(\xi_i) = \exp(w_i)$, $\overline{w}(\xi_i) = 1$ and $w(Q) = \overline{w}(Q) = 1$ for all other predicates $Q$. Formulas in $\Phi_\infty$ are added to the theory as additional constraints. Denoting the new theory by $\Gamma$ and a query by $\phi$, we can compute the inference as

$$\Pr_{\Phi, \Delta}(\phi) = \frac{\mathrm{WFOMC}(\Gamma \wedge \phi, |\Delta|, w, \overline{w})}{\mathrm{WFOMC}(\Gamma, |\Delta|, w, \overline{w})}.$$

³ Symbolic weights have also recently been used in a similar way in probabilistic generating circuits (Zhang, Juba, and Van den Broeck 2021).
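The reduction can be checked numerically on a tiny instance. The sketch below assumes a two-element domain, a single soft formula Sm(x) ∧ Fr(x, y) ⇒ Sm(y) with weight ln 3, and no hard constraints; it computes a query probability once from the MLN definition and once as a ratio of brute-force weighted model counts with a fresh predicate Xi. The predicate names, the weight, and all function names are illustrative assumptions.

```python
from itertools import product
from math import exp, isclose, log

DOMAIN = (0, 1)
PAIRS = [(x, y) for x in DOMAIN for y in DOMAIN]
W_SOFT = log(3.0)  # assumed weight of the single soft formula


def alpha(sm, fr, x, y):
    # Soft formula: Sm(x) AND Fr(x, y) => Sm(y)
    return (not (sm[x] and fr[(x, y)])) or sm[y]


def worlds():
    for sm_bits in product([False, True], repeat=len(DOMAIN)):
        for fr_bits in product([False, True], repeat=len(PAIRS)):
            yield dict(zip(DOMAIN, sm_bits)), dict(zip(PAIRS, fr_bits))


def mln_probability(query):
    """Pr(query) from the MLN definition with weight exp(w * N(alpha, world))."""
    z = num = 0.0
    for sm, fr in worlds():
        n_true = sum(alpha(sm, fr, x, y) for x, y in PAIRS)
        weight = exp(W_SOFT * n_true)
        z += weight
        if query(sm, fr):
            num += weight
    return num / z


def wfomc(extra=lambda sm, fr: True):
    """Brute-force WFOMC of (forall x, y: Xi(x, y) <=> alpha(x, y)) AND extra."""
    total = 0.0
    for sm, fr in worlds():
        if not extra(sm, fr):
            continue
        for xi_bits in product([False, True], repeat=len(PAIRS)):
            xi = dict(zip(PAIRS, xi_bits))
            if all(xi[p] == alpha(sm, fr, *p) for p in PAIRS):
                # True Xi atoms weigh exp(W_SOFT); all other literals weigh 1.
                total += exp(W_SOFT) ** sum(xi_bits)
    return total


def query(sm, fr):
    return sm[0]  # "element 0 smokes"


p_mln = mln_probability(query)
p_wfomc = wfomc(query) / wfomc()
```

Both routes enumerate every possible world, so the sketch only serves as a correctness check of the reduction; a lifted algorithm avoids this enumeration.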

Watts-Strogatz Model

The model of Watts and Strogatz (Watts and Strogatz 1998) is a procedure for generating a random graph with specific properties. First, having n ordered nodes, each node is connected to K (assumed to be an even integer) of its closest neighbors by undirected edges (discarding parallel edges). If the end or the beginning of the sequence is reached, we wrap around to the other end. Second, each edge (i, j) for each node i is rewired with probability β. Rewiring of (i, j) means that a node k is chosen at random and the edge is changed to (i, k).
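The two steps can be sketched as follows. This is a minimal illustration rather than a reference implementation; it assumes n > K, and it additionally assumes that rewiring never creates loops or parallel edges, a detail the description above leaves open. The function name `watts_strogatz` is hypothetical.

```python
import random


def watts_strogatz(n, k, beta, seed=0):
    """Return the undirected edge set of a small-world graph on nodes 0..n-1."""
    if k % 2 != 0 or n <= k:
        raise ValueError("this sketch assumes an even k with n > k")
    rng = random.Random(seed)
    # Step 1: ring lattice, each node linked to its k/2 nearest right neighbours.
    lattice = [(i, (i + d) % n) for d in range(1, k // 2 + 1) for i in range(n)]
    edges = {frozenset(e) for e in lattice}
    # Step 2: rewire each lattice edge (i, j) with probability beta.
    for i, j in lattice:
        if rng.random() < beta:
            # Assumption: exclude i itself and nodes already adjacent to i.
            candidates = [t for t in range(n)
                          if t != i and frozenset((i, t)) not in edges]
            if candidates:
                edges.remove(frozenset((i, j)))
                edges.add(frozenset((i, rng.choice(candidates))))
    return edges


graph = watts_strogatz(10, 4, 0.3, seed=1)
ring = watts_strogatz(6, 2, 0.0)
```

With β = 0 the procedure returns the plain ring lattice, and rewiring keeps the number of edges fixed at nK/2.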

Our Model

We start constructing our graph model in the same manner as Watts and Strogatz, with K = 2. Ergo, we obtain one cyclic chain going over all our domain elements:

1 -- 2 -- 3 -- 4 -- ... -- n -- (back to 1)

However, we do not perform the rewiring. Instead, we simply add m additional edges at random. Hence, all nodes will be connected by the chain and, moreover, there will be various shortcuts as well. Finally, we add a weighted formula saying that friends (friendship is represented by the edges) of smokers also smoke. Intuitively, for a large enough weight, our model should prefer those possible worlds where either nobody smokes or everybody does. Let us now formally state the MLN that we work with:

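$$\begin{aligned}
\Phi = \{ & (\infty, \forall x: \neg \mathit{Perm}(x, x)), && (5) \\
& (\infty, \forall x\, \exists y: \mathit{Perm}(x, y)), && (6) \\
& (\infty, \forall y\, \exists x: \mathit{Perm}(x, y)), && (7) \\
& (\infty, \forall x\, \forall y: \mathit{Pred}(x, y) \Rightarrow \mathit{Perm}(x, y)), && (8) \\
& (\infty, \forall x\, \forall y: \mathit{Pred}(x, y) \Rightarrow (x \leq y)), && (9) \\
& (\infty, |\mathit{Perm}| = n), && (10) \\
& (\infty, |\mathit{Pred}| = n - 1), && (11) \\
& (\infty, \forall x\, \forall y: \mathit{Perm}(x, y) \Rightarrow E(x, y)), && (12) \\
& (\infty, \forall x\, \forall y: E(x, y) \Rightarrow E(y, x)), && (13) \\
& (\infty, \forall x: \neg E(x, x)), && (14) \\
& (\infty, |E| = 2n + 2m), && (15) \\
& (\ln w, \mathit{Sm}(x) \wedge E(x, y) \Rightarrow \mathit{Sm}(y)) \} && (16)
\end{aligned}$$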

Sentences 5 through 11 come from the predecessor definition. They define the basic cyclic chain, albeit a directed one. We reduced the counting quantifiers to ordinary existential quantifiers by adding the cardinality constraint in sentence 10 (Kuželka 2021). Formula 12 copies all Perm/2 transitions to E/2 and formula 13 makes the edges undirected. Moreover, sentence 14 prohibits loops. Sentence 15 then requires that there are exactly n + m undirected edges in the graph. As all these are hard constraints, every model must conform to our predefined graph model. The only soft constraint is sentence 16. By manipulating its weight, we may determine how important it is for the formula to be satisfied in an interpretation.
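As an illustration of what the hard constraints admit, the sketch below builds one graph consisting of the cyclic chain plus m random shortcuts and checks it against sentences 12 through 15. It assumes the natural order on nodes 0 to n-1 with n ≥ 3, so that Perm is the cyclic shift, and draws the shortcuts uniformly only for simplicity. The function names are hypothetical.

```python
import random
from itertools import combinations


def sample_graph(n, m, seed=0):
    """Cyclic chain on 0..n-1 plus m random shortcuts, as directed pairs."""
    rng = random.Random(seed)
    perm = {(i, (i + 1) % n) for i in range(n)}  # directed cyclic chain
    chain = {frozenset(p) for p in perm}
    free = [frozenset(p) for p in combinations(range(n), 2)
            if frozenset(p) not in chain]
    undirected = chain | set(rng.sample(free, m))
    # Store every undirected edge in both directions, matching sentence 13.
    e = {(x, y) for edge in undirected for x in edge for y in edge if x != y}
    return perm, e


def satisfies_hard_constraints(n, m, perm, e):
    return (
        perm <= e                                 # sentence 12: Perm is copied to E
        and all((y, x) in e for (x, y) in e)      # sentence 13: E is symmetric
        and all(x != y for (x, y) in e)           # sentence 14: no loops
        and len(e) == 2 * n + 2 * m               # sentence 15: |E| = 2n + 2m
    )


perm, e = sample_graph(n=10, m=5)
ok = satisfies_hard_constraints(10, 5, perm, e)
```

The count in sentence 15 is over ordered pairs, which is why n + m undirected edges correspond to |E| = 2n + 2m.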

Inference

We can use Incremental WFOMC to run exact inference in the MLN described above. We may query the probability that a particular domain member (element) smokes. Obviously, the probability will be the same for any domain member. We will thus combine all of these together and query for the probability of there being exactly k smokers, instead. Denote by $\Gamma$ the theory obtained when we reduce the MLN $\Phi$ to WFOMC. We may answer the query as

$$\Pr(|\mathit{Sm}| = k) = \frac{\mathrm{WFOMC}(\Gamma \wedge (|\mathit{Sm}| = k), n, w, \overline{w})}{\mathrm{WFOMC}(\Gamma, n, w, \overline{w})}.$$
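For very small domains, the same query can be answered by plain enumeration, which gives a feel for the shape of the distribution. The sketch below is a brute-force check rather than the lifted algorithm: it assumes the linear order is fixed to the natural one (taking for granted that every order contributes the same factor, which cancels in the ratio), enumerates all choices of m shortcuts and all smoker assignments, and weighs each world by w raised to the number of satisfied groundings of sentence 16. The function name `smoker_distribution` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def smoker_distribution(n, m, w):
    """Brute-force Pr(|Sm| = k), k = 0..n, for the chain-plus-shortcuts model."""
    chain = {frozenset((i, (i + 1) % n)) for i in range(n)}
    free = [frozenset(p) for p in combinations(range(n), 2)
            if frozenset(p) not in chain]
    totals = [0] * (n + 1)
    for shortcuts in combinations(free, m):
        edges = chain | set(shortcuts)
        directed = [(x, y) for e in edges for x in e for y in e if x != y]
        for sm in product([False, True], repeat=n):
            # Groundings of Sm(x) AND E(x, y) => Sm(y) violated in this world.
            violated = sum(1 for x, y in directed if sm[x] and not sm[y])
            # Each of the remaining n*n - violated groundings contributes w.
            totals[sum(sm)] += w ** (n * n - violated)
    z = sum(totals)
    return [Fraction(t, z) for t in totals]


dist = smoker_distribution(n=5, m=1, w=3)
```

Because the edge relation is symmetric, swapping smokers and non-smokers leaves the number of violated groundings unchanged, so the resulting distribution is symmetric around n/2.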

To relate our model to others that can be modelled without the linear order axiom, we compare the results to inference over a completely random undirected graph with the same number of edges. Intuitively, a completely random graph may form more disconnected components, thus not necessarily preferring the extremes, i.e., either nobody smokes or everybody does. We also keep the parameter m relatively small since, for large m, even the random graph would likely form just one connected component. The MLN over a random graph is defined as follows:

$$\Phi' = \{ (\infty, E(x, y) \Rightarrow E(y, x)),\ (\infty, \neg E(x, x)),\ (\infty, |E| = 2n + 2m),\ (\ln w, \mathit{Sm}(x) \wedge E(x, y) \Rightarrow \mathit{Sm}(y)) \}$$

Figure 1 depicts the inference results for a domain size n = 10 and weight w = 3. The parameter m is set to 5, 8 and 10, respectively. As one can observe, our model prefers the extreme values more, which is consistent with our intuition above.

We showed how to compute WFOMC in C2 with the linear order axiom in time polynomial in the domain size. Hence, we showed that the language of C2 extended by a linear order axiom is domain-liftable. The computation can be performed using our new algorithm, Incremental WFOMC.

Acknowledgements

This work was supported by Czech Science Foundation project "Generative Relational Models" (20-19104Y) and partially by the OP VVV project CZ.02.1.01/0.0/0.0/16_019/0000765 "Research Center for Informatics". JT's work was also supported by a donation from X-Order Lab.

[Figure 1, three panels: x-axis |Sm| from 0 to 10; series: Watts-Strogatz, Random graph.]

Figure 1: Probability of n smokers for w = 3

References

Barvínek, J.; van Bremen, T.; Wang, Y.; Železný, F.; and Kuželka, O. 2021. Automatic Conjecturing of P-Recursions Using Lifted Inference. In Inductive Logic Programming: 30th International Conference, ILP 2021, Virtual Event, October 25-27, 2021, Proceedings, 17-25. Berlin, Heidelberg: Springer-Verlag. ISBN 978-3-030-97453-4.

Beame, P.; Van den Broeck, G.; Gribkoff, E.; and Suciu, D. 2015. Symmetric Weighted First-Order Model Counting. In Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS '15, 313-328. New York, NY, USA: Association for Computing Machinery. ISBN 9781450327572.

Berge, C. 1971. Principles of Combinatorics. ISSN. Elsevier Science. ISBN 9780080955810.

Bezanson, J.; Edelman, A.; Karpinski, S.; and Shah, V. B. 2017. Julia: A fresh approach to numerical computing. SIAM Review, 59(1): 65-98.

Charatonik, W.; and Witkowski, P. 2015. Two-variable Logic with Counting and a Linear Order. In Kreutzer, S., ed., 24th EACSL Annual Conference on Computer Science Logic (CSL 2015), volume 41 of Leibniz International Proceedings in Informatics (LIPIcs), 631-647. Dagstuhl, Germany: Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik. ISBN 978-3-939897-90-3.

Fieker, C.; Hart, W.; Hofmann, T.; and Johansson, F. 2017. Nemo/Hecke: Computer Algebra and Number Theory Packages for the Julia Programming Language. In Proceedings of the 2017 ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC '17, 157-164. New York, NY, USA: ACM.

Getoor, L.; and Taskar, B. 2007. Introduction to Statistical Relational Learning. The MIT Press.

Grädel, E.; Kolaitis, P. G.; and Vardi, M. Y. 1997. On the decision problem for two-variable first-order logic. Bull. Symb. Log., 3(1): 53-69.

Hinrichs, T.; and Genesereth, M. 2006. Herbrand Logic. Technical Report LG-2006-02, Stanford University, Stanford, CA. http://logic.stanford.edu/reports/LG-200602.pdf.

Kazemi, S. M.; Kimmig, A.; Van den Broeck, G.; and Poole, D. 2016. New Liftable Classes for First-Order Probabilistic Inference. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS '16, 3125-3133. Red Hook, NY, USA: Curran Associates Inc. ISBN 9781510838819.

Koller, D.; and Friedman, N. 2009. Probabilistic Graphical Models: Principles and Techniques. Adaptive Computation and Machine Learning. MIT Press. ISBN 9780262013192.

Kuusisto, A.; and Lutz, C. 2018. Weighted model counting beyond two-variable logic. In Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2018, 619-628.

Kuželka, O. 2021. Weighted First-Order Model Counting in the Two-Variable Fragment With Counting Quantifiers. Journal of Artificial Intelligence Research, 70: 1281-1307.

Libkin, L. 2004. Elements of Finite Model Theory, chapter 1.2, 4. Springer. ISBN 3540212027.

Malhotra, S.; and Serafini, L. 2022. Weighted Model Counting in FO2 with Cardinality Constraints and Counting Quantifiers: A Closed Form Formula. In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 5817-5824.

Richardson, M.; and Domingos, P. 2006. Markov Logic Networks. Machine Learning, 62(1-2): 107-136.

van Bremen, T.; and Kuželka, O. 2021a. Faster lifting for two-variable logic using cell graphs. In de Campos, C.; and Maathuis, M. H., eds., Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, volume 161 of Proceedings of Machine Learning Research, 1393-1402. PMLR.

van Bremen, T.; and Kuželka, O. 2021b. Lifted Inference with Tree Axioms. In Proceedings of the 18th International Conference on Principles of Knowledge Representation and Reasoning, 599-608.

Van den Broeck, G. 2011. On the Completeness of First-Order Knowledge Compilation for Lifted Probabilistic Inference. In Proceedings of the 24th International Conference on Neural Information Processing Systems, NIPS '11, 1386-1394. Red Hook, NY, USA: Curran Associates Inc. ISBN 9781618395993.

Van den Broeck, G.; and Davis, J. 2012. Conditioning in First-Order Knowledge Compilation and Lifted Probabilistic Inference. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, AAAI '12, 1961-1967. AAAI Press.

Van den Broeck, G.; Kersting, K.; Natarajan, S.; and Poole, D. 2021. An Introduction to Lifted Probabilistic Inference. MIT Press.

Van den Broeck, G.; Meert, W.; and Darwiche, A. 2014. Skolemization for Weighted First-Order Model Counting. In Proceedings of the Fourteenth International Conference on Principles of Knowledge Representation and Reasoning, KR '14, 111-120. AAAI Press. ISBN 1577356578.

Van Haaren, J.; Van den Broeck, G.; Meert, W.; and Davis, J. 2016. Lifted generative learning of Markov logic networks. Machine Learning, 103: 27-55.

Wang, Y.; van Bremen, T.; Wang, Y.; and Kuželka, O. 2022. Domain-Lifted Sampling for Universal Two-Variable Logic and Extensions. In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 10070-10079.

Watts, D. J.; and Strogatz, S. H. 1998. Collective dynamics of 'small-world' networks. Nature, 393(6684): 440-442.

Zhang, H.; Juba, B.; and Van den Broeck, G. 2021. Probabilistic generating circuits. In International Conference on Machine Learning, 12447-12457. PMLR.