# openuniverse_weighted_model_counting__855aa43a.pdf

Open-Universe Weighted Model Counting

Vaishak Belle University of Edinburgh vaishak@ed.ac.uk

Weighted model counting (WMC) has recently emerged as an eﬀective and general approach to probabilistic inference, oﬀering a computational framework for encoding a variety of formalisms, such as factor graphs and Bayesian networks. The advent of large-scale probabilistic knowledge bases has generated further interest in relational probabilistic representations, obtained by according weights to ﬁrst-order formulas, whose semantics is given in terms of the ground theory, and solved by WMC. A fundamental limitation is that the domain of quantiﬁcation, by construction and design, is assumed to be ﬁnite, which is at odds with areas such as vision and language understanding, where the existence of objects must be inferred from raw data. Dropping the ﬁnite-domain assumption has been known to improve the expressiveness of a ﬁrst-order language for open-universe purposes, but these languages, so far, have eluded WMC approaches. In this paper, we revisit relational probabilistic models over an inﬁnite domain, and establish a number of results that permit eﬀective algorithms. We demonstrate this language on a number of examples, including a parameterized version of Pearl s Burglary-Earthquake-Alarm Bayesian network.

Introduction

Weighted model counting (WMC) has recently emerged as an eﬀective and general approach to probabilistic inference, oﬀering a computational framework for encoding a variety of formalisms, including factor graphs and Bayesian networks (Choi, Kisa, and Darwiche 2013; Chavira and Darwiche 2008). WMC generalizes #SAT, where we are to count the models of a propositional formula, in that models can be additionally accorded numeric weights, whose sum we are to compute. In particular, the encoding of graphical models to a propositional theory allows us to leverage context-speciﬁc dependencies, express hard constraints, and reason about logical equivalence. Exact solvers are based on knowledge compilation (Chavira and Darwiche 2008) or exhaustive DPLL search (Sang, Beame, and Kautz 2005); approximate ones use local search (Wei and Selman 2005) or sampling (Ermon et al. 2013; Chakraborty et al. 2014).

This research was supported in part by the Research Foundation-Flanders (FWO-Vlaanderen). Copyright c 2017, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

The advent of large-scale probabilistic knowledge bases, such as Google s Knowledge Vault (Dong et al. 2014) and Microsoft s Probase (Wu et al. 2012), often containing billions of tuples and structured data extracted from the Web and other text corpora has generated enormous interest in relational probabilistic representations (Getoor and Taskar 2007). In Markov logic networks (Richardson and Domingos 2006), for example, the weighted formula:

1.2 x, y Smoker(x) Friends(x, y) Smoker(y)

is indicative of a fairly involved Markov network obtained by grounding the formula wrt all possible values for {x, y} and assigning a potential of 1.2 to the edges of the network. Such template models can be seen as modest relatives of general-purpose probabilistic logics (Bacchus 1990; Halpern 1990) in succinctly characterizing large ﬁnite graphical models, making them appropriate for reasoning and learning with big uncertain data. In practice, the semantics of template models is given in terms of the ground (propositional) theory, for which WMC and its extensions (Gogate and Domingos 2011; Van den Broeck 2013) suﬃce. By construction, and indeed by design, a fundamental limitation of these template models is its ﬁnite domain closure assumption: the range of the quantiﬁers is assumed to be ﬁnite, typically the set of constants appearing in the logical theory. As argued in (Russell 2015), such a closed-world system is at odds with areas such as vision, language understanding, and Web mining, where the existence of objects must be inferred from raw data. Various syntactic and semantic devices have been proposed to address this, including probabilistic programming and inﬁnite-state Bayesian networks (Milch et al. 2005a; 2005b). While the progress has been signiﬁcant, we lack a complete picture of the expected properties from a logical perspective. In this paper, we revisit open-universe (OU) template models, and develop an account of probabilistic inference by weighted model counting. Formally, our proposal will allow quantiﬁers to range over an inﬁnitary set of rigid designators constants that exist in all possible worlds a common technical device from modal logic, also used elsewhere in OU applications (Srivastava et al. 2014). Such languages have eluded WMC approaches so far. As an extension, it is powerful, and can be used for features such as: unknown atoms: { x (x john Smoker(x))} says that in-

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)

ﬁnitely many individuals other than John are smokers, while also leaving open whether John is a smoker;

unknown values: { x(Canary(x) Color(x, black))} says the color of canaries is anything but black;

varying sets of objects: { x(Thing(x) φ(x))} says that property φ is true of all things, but the set of (possibly inﬁnite) things can vary depending on the world;

closed-world assumption: x((x = john) Smoker(x)) x((x john) Smoker(x)) says that John is the only one in the universe who smokes.

In general, such a ﬁrst-order logical language is undecidable. However, we show that when restricted to universally quantiﬁed clauses, as is usual in the literature on template models, a ﬁnite version of the ground knowledge base sufﬁces for reasoning purposes. When probabilities are further accorded to inﬁnitely many atoms, the natural question is in which sense are conditional probabilities correct? Prior work has focused on various semantics with inﬁnitely many random variables (Singla and Domingos 2007; Doshi et al. 2011; Kersting and Raedt 2000), based on topological orderings and locality properties. In this work, we contribute a deﬁnition of conditional probabilities that is coherent wrt ﬁrst-order logical entailment: this has the beneﬁt that no syntactical constraints on the encoding are necessary. We show its application in a number of examples, including a parametrized version Pearl s (1988) Alarm Bayesian network discussed in (Russell 2015). Overall, the main contributions are: results on satisﬁability and validity for ﬁrst-order clauses over an inﬁnite universe, a coherent account of weighted model counting in an OU setting, the leveraging of existing algorithms for the clausal representation, and ﬁnally, numerous examples that have been veriﬁed using existing WMC solvers. Proof sketches for all the main claims are found in the appendix.

Preliminaries Logical Background

Language: We let L be a ﬁrst-order language with equality, relational symbols {P(x), Q(x, y), R(x, y, z), . . . , P (x), . . .} of every arity, variables {x, y, z, . . .}, and a countably inﬁnite set of rigid designators or names, say, the set of natural numbers N, serving as the domain of discourse for quantiﬁcation. But for presentation purposes, we often use identiﬁers such as Latin alphabets (a, b, . . .) and proper names (john, jane, . . .). Together with equality, names essentially realize an inﬁnitary version of the unique-name assumption.1

The set of (ground) atoms is obtained as:2

VARS = P(a1, . . . , ak) | P is a predicate, ai a name .

1In general, this does not rule out capturing uncertainty about the identity of objects; see (Giacomo, Lesp erance, and Levesque 2011; Srivastava et al. 2014), for example. 2Because equality is treated separately, atoms and clauses do not include equalities.

Semantics: A L-model M is a {0, 1} assignment to the elements of VARS. Using |= to denote satisfaction, the semantics for φ L is deﬁned as usual inductively, but with equality as identity: M |= (a = b) iﬀa and b are the same names, and quantiﬁcation understood substitutionally over all names in N: M |= xφ(x) iﬀM |= φ(a) for all a N. We say that φ is valid iﬀfor every L-model M, M |= φ. We will refer to atoms also as propositions, and a ground formula also as a propositional formula. We will use p, q, e to refer to atoms, α, φ to refer to ground formulas, and Δ to refer to closed formulas with quantiﬁers.

Remark: It is worth noting that the compactness property that holds for classical ﬁrst-order logic does not hold in general when the domain is ﬁxed and inﬁnite (Levesque 1998):

{ x P(x), P(1), P(2), . . .}

is an unsatisﬁable theory whose every ﬁnite subset is indeed satisﬁable.

Propositional Fragments: While L and its models are inﬁnitary, satisﬁability checkers and model counters work with a ﬁnite language, built from a ﬁnite set of symbols. To enable this, if φ is a propositional formula, we use VARS(φ) to refer to the propositions in φ, and L(φ) to refer to the ﬁnite propositional language built from VARS(φ) and connectives { , , } . The understanding here is that a L(φ)-model M is a {0, 1} assignment to the elements of VARS(φ). Example 1 : If φ = P(a) P(b), then VARS(φ) = {P(a), P(b)}, L(φ) = {P(a), P(b), P(a), . . .}, and there are 4 L(φ)-models, 3 of which satisfy φ.

Weighted Model Counting In a nutshell, WMC extends #SAT in summing the weights of the models of a propositional formula φ:

WMC(φ, w) =

where M is an L(φ)-model, and w is a weight function. Usually, the weight function is factorized. For example, in classical WMC (Chavira and Darwiche 2008), suppose v, v : VARS(φ) R+ map atoms to positive reals, then:

where p M are those atoms that are true at M, and p M are those that are false. A relational extension of WMC (Van den Broeck 2013), dubbed WFOMC, assumes v, v map predicate symbols from φ to R+. Then:

p M v(PRED(p))

where PRED maps an atom to its predicate. WMC has emerged as a basic computational framework for encoding many formalisms. For example, Markov logic networks (Richardson and Domingos 2006), typically deﬁned as a set of weighted (ﬁnite domain) ﬁrst-order formulas, e.g., {(w, α(x, . . . , y))} can be cast a WFOMC problem (Van den Broeck 2013) by introducing a new predicate

P(x, . . . , y), letting the theory be the sentence P(x, . . . , y) α(x, . . . , y), and deﬁning v(P) = ew and v(P) = 1, and letting v(Q) = v(Q) = 1 for all predicates Q in α. Finally, given a query Q and evidence E, both of which are propositional formulas, we deﬁne the probability of Q given E wrt (φ, w) as:

Pr(Q | E, φ, w) = WMC(φ Q E, w)/WMC(φ E, w).

Logical Entailment Before addressing model counting, we ﬁrst turn to the basic question of logical entailment.

Representation Unfortunately, reasoning in the full language is undecidable (Levesque 1998), but fortunately, when the logical encoding (of a probabilistic formalism) is given as a set of clauses, as is usual in template models, we will be able to establish some results on logical entailment. We begin as follows:

Deﬁnition 2: An acceptable equality is of the form x = a, where x is any variable and a any name. Let e range over formulas built from acceptable equalities and connectives { , , }. Let c range over quantiﬁer-free disjunctions of (possibly non-ground) atoms. Let φ mean the universal closure of φ. A formula of the form (e c) is called a - clause. A knowledge base (KB) Δ is acceptable if it is a ﬁnite non-empty set of -clauses. The rank of Δ is the maximum number of variables mentioned in any -clause in Δ.

To that end, an encoding is taken to be of the form (Δ, w) where Δ is an acceptable KB, and w is a weight function.

Example 3: For example, ((x john x jane) Smoker(x) Alcoholic(x)) is a -clause with rank 1 and says that everybody other than John and Jane are either smokers or alcoholics. (The ﬁrst section mentioned a few examples of ﬁrst-order formulas that are also deﬁnable as -clauses.)

Note that -clauses are logically equivalent to a possibly inﬁnite set of ground clauses, e.g., (P(x) Q(x)) is equivalent to {P(1) Q(1), P(2) Q(2), . . .}.

Grounding: A ground theory is obtained from Δ by substituting variables with names. Suppose θ denotes a substitution. For any set of names C N, we write θ C to mean substitutions are only allowed wrt the names in C. Finally, we deﬁne the following terms:

GND(Δ) = {cθ | (e c) Δ and |= eθ};

For k 0, GND(Δ, k) = {cθ | (e c) Δ, |= eθ, θ K}, where K is the set of names mentioned in Δ plus k (arbitrary) new ones;

DC(Δ) = GND(Δ, 0);

OU(Δ) = GND(Δ, k) where k is the rank of Δ.

Example 4: Recall that because equality is understood as identity for names, formula john jane is valid, and formulas john = jane and john john are unsatisﬁable. So, given Δ = (x john Smoker(x)), we have GND(φ) = {Smoker(jane), Smoker(bob), . . .}.

Example 5: Suppose Δ = { (P(x)), Q(a)}. Then DC(Δ) = GND(Δ, 0) = {P(a), Q(a)} , and OU(Δ) = GND(φ, 1) = {P(a), P(b), Q(a)}. (Here, b is picked arbitrarily.)

Proposition 6: For any acceptable KB Δ, OU(Δ) is ﬁnite.

The understanding here is that DC(Δ) represents the domain closure assumption built into template models. The domains in Markov logic networks and WFOMC, for example, are deﬁned in terms of the Herbrand universe: basically, the constants mentioned in a (function-free) KB (Richardson and Domingos 2006; Van den Broeck 2013). Going further, OU(Δ) will play an important role in realizing logical properties in an OU setting. It should be clear that it represents a relatively small blowup to DC(Δ) by including a few extra constants (= Δ s rank). In the worst-case, we have:

Proposition 7 : Suppose Δ mentions c names, has n - clauses, each mentioning at most m predicates, and with rank k. Then, the total number of atoms mentioned in DC(Δ) is O(n m ck), and the total in OU(Δ) is O(n m (c + k)k).

Results The ﬁrst hurdle is addressing the compactness property, or the lack thereof. Such a property is useful to draw conclusions about a possibly inﬁnite theory in terms of its ﬁnite subsets. As mentioned, L does not enjoy this property, but fortunately, acceptable KBs do. For our ﬁrst result, we have:

Theorem 8: Suppose Δ is acceptable. Then S = GND(Δ) is satisﬁable iﬀevery ﬁnite subset of S is satisﬁable.

This is because GND(Δ) is a (possibly inﬁnite) propositional theory, and we can lift compactness from propositional logic. Next, for our main result, we establish that entailment can eschew the need to appeal to an inﬁnitary theory:

Theorem 9: Suppose Δ is as above, and α is any propositional formula. Then

|= Δ α iﬀOU(Δ α) is unsatisﬁable.

The proof is long, but can be approached using 2 lemmas:

Lemma 10: Suppose Δ s rank is k. If GND(Δ, k) α is satisﬁable, then so is GND(Δ, j) α for all j k.

Lemma 11: Δ |= α iﬀOU(Δ) |= α.

The ﬁrst argues that grounding Δ wrt additional names does not aﬀect satisﬁability. The crux of the argument lies in showing that names not mentioned in Δ α behave identically (in a suitable formal sense). The second argues that for entailment OU(Δ) suﬃces, which is then used for a refutation-based claim in Theorem 9. Putting Theorem 9 and Proposition 6 together, we have:

Corollary 12: Suppose Δ and α are as above, then Δ |= α can be veriﬁed in classical (ﬁnitary) propositional logic.

Example 13: Let Δ be the union of:

(Smoker(x) Friends(x, y) Smoker(y)) Smoker(john) y(Friends(john, y))

Let the query be α = Smoker(jane). Since everybody is friends with John, a smoker, we have |= Δ α. To see Theorem 9 in action, consider OU(Δ α) = GND(Δ α, 2). On applying all possible substitutions wrt the names mentioned and 2 new ones, OU(Δ α) would include: (Smoker(john) Friends(john, jane)) Smoker(jane), Smoker(john), Friends(john, jane) but also α = Smoker(jane). Indeed OU(Δ α) is unsatisﬁable.

In the above example, when grounding, the new names were not needed. To see why names from outside the KB and the query may be essential, consider:

Example 14: Let α be as above, but Δ be the union of:

(x jane Smoker(x)) (Friends(x, y)) (Smoker(x) Friends(x, y) Smoker(y))

Here |= Δ α. Argument: because someone other than Jane, say John, is declared to be a smoker and since he is Jane s friend, she must also be a smoker. However, DC(Δ α) is satisﬁable: only using the names mentioned in Δ α is insuﬃcient. In contrast, OU(Δ α) is indeed unsatisﬁable.

Weighted Model Counting

We now develop an OU account of WMC. Prior work has considered various semantics with high-dimension probability spaces, appealing to topological orderings and such. We contribute a deﬁnition to this literature that is logically motivated, based on the below (informally stated) properties:

1. When the query is believed, it has probability 1.

2. When the query contradicts what is believed, it has probability 0.

3. When the query mentions unknown individuals, these individuals are interchangeable with other unknowns.

Our deﬁnition satisﬁes these features in a ﬁrst-order setting, and will further leverage Theorem 9 in only using OU(Δ). The beneﬁt is that no assumptions are needed about the encoding (Δ, w) because only a ﬁnite weighted propositional theory is instantiated wrt a given query.3 Formally:

Deﬁnition 15: Suppose Δ is an acceptable KB, w a weight function, and {Q, E} propositional formulas.4 Then, the WMC of (Δ, w) in an OU setting is deﬁned as:

WMCOU(Δ, w) = WMC(OU(Δ), w).

The probability of Q given E wrt (Δ, w) is deﬁned as: Pr(Q | E, Δ, w) = WMCOU(Δ Q E, w)/WMCOU(Δ E, w).

3While the deﬁnition is coherent wrt entailment over an inﬁnite number of atoms, it is conceivable that other notions are possible, such as those based on inﬁnitely many random variables (Singla and Domingos 2007). 4For technical reasons, we assume here that Q and E mention the same set of names. This is without loss of generality: let γ = {p p | atom p is mentioned in Q E}, replace Q with Q = Q γ, and E with E = E γ. Of course, Q Q and E E , and clearly Q and E mention the same set of names.

We often write Pr(Q | Δ, w) to mean E = true. Algorithm 1 provides a pseudocode for Pr in terms of WMC. Its correctness is stated as follows: Theorem 16 : Suppose Δ is any acceptable KB, α any propositional formula, w any predicate-level weight function. Then (a) Pr(α | Δ, w) = 1 if |= Δ α; (b) Pr(α | Δ, w) = 0 if |= Δ α; (c) Pr(α | Δ, w) = Pr(α | Δ, w) for any bijection from N to N such that it maps names from Δ to themselves and otherwise arbitrary. If Δ was propositional, classical WMC would satisfy (a) and (b), e.g., if φ |= α then φ φ α and therefore, φ and φ α have the same model counts. Our deﬁnition not only lifts this property to ﬁrst-order entailment but also enjoys (c) which is relevant with inﬁnitely many unknowns.5

Example 17: To see this in action, let Δ be the union of: (Smoker(x) Alcoholic(x)) (x john Smoker(x)) Suppose v, v map all predicates to 1. Consider α1 = Alcoholic(jane) vs α2 = Smoker(jane). Note that Δ |= α1 but Δ |= α2. Observe OU(Δ α1) is the conjunction of: (Smoker(john) Alcoholic(john)) (Smoker(jane) Alcoholic(jane)) (Smoker(bob) Alcoholic(bob)) Smoker(jane) Smoker(bob) Alcoholic(jane) where bob is chosen arbitrarily. Clearly models can vary in how they interpret Smoker(john) Alcoholic(john), and so we obtain WMCOU(Δ α1) = 3, and also, Pr(α1 | Δ, w) = 1. However, Δ α2 is inconsistent, and indeed WMCOU(Δ α2) = 0 leading to Pr(α2 | Δ, w) = 0. Finally, let β = Alcoholic(bob). We note Pr(β | Δ, w) = 1. Let map john to itself, otherwise arbitrary: say, it maps bob to jane. So β = Alcoholic(jane). Indeed Pr(β | Δ, w) = 1. Example 18: We now consider a parameterized version of Pearl s (1988) Alarm Bayesian network. In the classic version, a burglary or an earthquake can trigger your house alarm, and hearing that your neighbors would probably call you. Russell (2015) sketches a richer variant, which we call Alarm X: imagine a large set of regions, and in each region, there are any number of houses. In region r, the likelihood of an earthquake and the likelihood of a burglary in house h are Bernoulli random variables. Roughly, imagine rules like:

.003 Burglary(r, h). .002 Earthquake(r). .8 Alarm(r, h) Burglary(r, h), Earthquake(r).

Consider the following set-up. Let Δ represent the above parameterized Bayesian network as a set of -clauses, using the usual semantics for encoding Bayesian networks

5Readers may observe that we stipulate a predicate-level weight function. As it turns out, the weight function is irrelevant for (a) and (b). However, (c) would not hold, if, for example, given Δ = { (Smoker(x))}, w maps Smoker(bob) to a higher weight than Smoker(rob) even though both of them are unknown. Thus, factorization at the level of predicates is one way to enable that unknown instances of a predicate are considered equally likely.

Algorithm 1 Pr(Q | E, Δ, w)

1: α1 = Δ Q E, α2 = Δ E 2: C = names(α1), d = rank(Δ) 3: C = C { d new names } 4: α i = ground αi wrt C

5: return WMC(α 1)/WMC(α 2)

as logical theories (Chavira and Darwiche 2008). However, the domain of the quantiﬁers will be unrestricted in Δ: we will not stipulate the list of regions, etc. Suppose we now provide the evidence E = Alarm(region1, house2). Intuitively, we would expect the following properties in an inﬁnite instantiation: (a) the probability of Q1 = Burglary(region1, house2) is the probability of Q2 = Burglary(a, b) for every a, b region1, house2 ; (b) the probability of Q3 = Earthquake(region1) is the probability of Q4 = Earthquake(c) for every c region1. As it turns out, our deﬁnition conﬁrms this intuition:

Theorem 19: Let Δ, E and Qs be as above, with a = c = region2 and b = house3. Let w be a predicate-level (strictly positive) weight function. Then Pr(Q1 | E, Δ, w) Pr(Q2 | E, Δ, w), and Pr(Q3 | E, Δ, w) Pr(Q4 | E, Δ, w).

Lifted Inference The WMC formulation works with the ground propositional theory. But just as ﬁrst-order resolution is equivalent to a large number of propositional resolution steps, there are a number of extensions to WMC that leverage the ﬁrst-order structure of template models by lumping together classes of objects (Poole 2003; de Salvo Braz, Amir, and Roth 2005; Gogate and Domingos 2011; Beame et al. 2015), referred to as lifted inference. A crisp formal claim is this (Van den Broeck 2013): an algorithm is said to be domain-lifted when it runs in time polynomial in the size of the domain. Space precludes a discussion on these algorithms. As an example, observe that on grounding α = (Smoker(x) Alcoholic(x)) wrt the domain {a} the model count is 3, on grounding α wrt {a, b} the model count is 32, and so on. Lifted systems recognize that the model count of α wrt the domain C is 3|C|, without the need for explicit grounding. Unfortunately, diﬀerent from standard template models, in the OU setting, quantiﬁcation ranges over an inﬁnite set, and equality is understood as identity over these inﬁnitely many individuals. So, we will be interested in two questions:

can these KBs be put in a form that is acceptable as input to lifted algorithms in a correctness preserving way, and

can we inherit the polynomial time guarantees of domainlifted algorithms where available?

We answer in the aﬃrmative for both. The key construction will be to normalize acceptable KBs wrt the appropriate ﬁnite set of names required by the OU semantics.

Deﬁnition 20: Suppose Δ is acceptable, k its rank, and B the names mentioned in Δ. Let N C B be any ﬁnite set such that |C| = |B|+k. Then putting Δ in equality-free normal form (ENF) wrt C is achieved by recurring the transformations:

1. If (e c) Δ, then c does not mention any names: that is, convert (e c(a)) to ((e z = a) c(z)) where z is a fresh variable not used in e. 2. If (e c) Δ, then eliminate e by introducing sorts: (a) Transform (((x = a1 . . . x = an) e) c) to [ x S (e c)] where S = {a1, . . . , an}. (b) Transform (((x a1 . . . x an) e) c) to [ x S (e c)] where S = C {a1, . . . , an} . Proposition 21 : When Δ is converted to ENF, then (i) ENF(Δ) does not mention equality, and is a conjunction of formulas of the form x1 S 1, . . . , xn S n c where c is a clause not mentioning names; (ii) the sort S introduced in 2(a) is such that S B; and (iii) the sort S introduced in 2(b) is such that S = C S , where S B. The sentence ENF(Δ) is equality-free, function-free and over ﬁnite sorts, which is precisely the fragment expected by lifted algorithms. For starters, it is correctness preserving: Theorem 22: WMC(ENF(Δ), w) = WMCOU(Δ, w) for any predicate-level w. It is shown in (Van den Broeck 2013) that when ﬁrst-order formulas mention only 2 variables, then there is a domainlifted algorithm for their WFOMC task. We leverage this: Corollary 23: Suppose Δ,C and w are as above. Suppose ENF(Δ) = i ci is such that ci only uses at most 2 logical variables. Then WMCOU(Δ, w) = WMC(ENF(Δ), w) is computable in time polynomial in |C|. Investigations on other classes of ﬁrst-order formulas is an area of ongoing research (Beame et al. 2015), and owing to Theorem 22, we expect these to carry over.

Preliminary Evaluations The simplicity of the OU semantics, that of grounding KBs wrt a few extra constants, makes it straightforward to solve OU problems using existing (unmodiﬁed) WMC software, either by Algorithm 1 or by ENF(Δ) with a grounder. For a range of problems, we have veriﬁed that correct answers wrt the OU semantics are returned using the C2D WMC solver (Darwiche 2004), the WFOMC lifted WMC solver (Van den Broeck 2013), and the Prob Log probabilistic programming language (Fierens et al. 2011).6 In this section, we investigate two empirical questions (for problems in ENF). First, how expensive is handling the OU case? Second, how does a WMC approach to OU problems compare against a sampling-based OU approach, as embedded in BLOG,7 for example (Milch et al. 2005a)? To answer these questions, we chose 5 problems that emphasize an OU environment. The particulars of the problems are discussed below, and the empirical behavior is presented in Figure 1. Experiments were run on OS X, using a 1.3 GHz Intel i5 processor with 4GB RAM. Prob Log was used for computing probabilities (OU with ENF or otherwise). BLOG v0.2 was run with the default settings.

6Recent installments of Prob Log are based on a logical translation of probabilistic rules to weighted CNFs (Fierens et al. 2011). 7BLOG is a strong baseline as it is designed and optimized for OU applications, is well-studied and approachable for modeling relational template models owing to its ﬁrst-order syntax.

semantics Alarm R Smoker1 Smoker2 OU 0.194 0.172 0.182 DC 0.192 0.170 0.175

language Alarm R Alarm X Grades WMCOU 0.194 0.2 0.36 BLOG 0.782 0.85 6.0

Figure 1: DC vs OU (top) and WMCOU vs BLOG (bottom). Time reported is in seconds, and is averaged over 10 runs.

Domain Closure (DC) vs OU Let us remark that for OU problems, DC may not give the correct answers. Nonetheless, we considered Examples 13 and 14 (Smoker1 and Smoker2 in Figure 1 respectively). Recall that DC would return the correct answer for the ﬁrst, but not the second. These are then compared to WMCOU. Next, we reconsider the classical Alarm Bayesian network (Alarm R in Figure 1), now with two neighbors, where we are interested in the probability that there was a burglary given that both persons called. This is a DC setting. For the OU case, we leave the neighbor predicate open, that is, it is not known how many neighbors there are, but of these, two distinct people call after hearing the alarm. The execution times reported in Figure 1 indicate that handling the OU case has only a marginal impact on performance, as is expected (and desired).8

WMCOU vs BLOG We contrasted the two approaches on Alarm R, but also on Alarm X. In both cases, WMCOU is notably faster. A more glaring diﬀerence is observed when we consider the Grades problem from (Heckerman, Meek, and Koller 2004). Brieﬂy, we are to predict the probability of a student s grade, given the courses she takes, the diﬃculty of these courses, the student s intelligence level, and so on. We enable an OU setting here by leaving the set of students open, and their existence is inferred from the evidence. The problem is known to have a non-trivial relational skeleton, and so it seems that a WMC-based approach fares well owing to a SAT solver s ability to handle complex logical dependencies.

Lineage and Conclusions There are several recent instrumental papers focusing on OU models. We mention the ones that are most closely related, and refer readers to (Milch et al. 2005a) and references therein on historical developments. Extensions of template models to inﬁnite domains are, of course, the closest in spirit: Singla and Domingos (2007) handle inﬁnite domains in Markov logic networks using locality constraints over Gibbs measures, Jaeger (1998) proves decidability in inﬁnite-domain relational BNs given independency constraints in query atoms, Laskey and da Costa (2005) provides a semantics for inﬁnite-state BNs assuming stratiﬁ-

8Execution times with C2D and WFOMC, along with scalability numbers for increasing domain sizes, are left to an extended report.

cation conditions, and Milch et al. (2005b) provide a semantics for inﬁnite-state BNs while assuming that only a ﬁnite number of ancestors aﬀect a variable. (Approaches not based on template models include, e.g., (Welling, Porteous, and Bart 2007), that view inﬁnite-state BNs as hierarchical Dirichlet processes.) Such template models can also be enabled by means of programming languages, which include both declarative approaches e.g., (Kersting and Raedt 2000; Fierens et al. 2011) that often support derivations that are inﬁnitely long, as well as procedural ones, e.g., BLOG (Milch et al. 2005a) that also allows identity uncertainty in addition to open domains. There has also been recent work on handling existential quantiﬁers and exchangeable atoms for model counting (Van den Broeck, Meert, and Darwiche 2014; Niepert and Van den Broeck 2014) that are related to some of the ideas in the OU semantics. From the representation side, our language and KB syntax is based on modal logic with rigid designators (Levesque and Lakemeyer 2001; Lakemeyer and Levesque 2002; Belle and Lakemeyer 2011). Our results on logical entailment, in particular, revisit and simplify the developments in (Belle and Lakemeyer 2011). Beyond this, there is a large body of work on handling unknowns for query entailment in knowledge representation and database theory, e.g., see (Van der Meyden 1998; Libkin 2016; Giacomo, Lesp erance, and Levesque 2011) and references therein. Finally, in the context of WMCOU considering new constants, studying the model count of a ﬁrst-order formula as a function of the domain size is of long-standing interest in database theory (Dalvi 2006; Beame et al. 2015; Fagin 1976). The OU semantics of this paper contributes to the above literature, and oﬀers the following salient features: 1. no syntactic stipulations (e.g., stratiﬁcation constraints) on the weighted theory (Δ, w) outside of the usual clausal representation for Δ; 2. if a parameterized representation serves as a proxy for a large but ﬁnite domain (e.g., family trees, language models), the semantics necessitates grounding only wrt the constants in the query at hand (plus KB s rank); 3. conditional probabilities are coherent wrt logical entailment over a full ﬁrst-order language; 4. the ability to leverage existing WMC technology, including lifted inference that can (sometimes) lead to exponential improvements. We think these 4 properties can put OU applications within the reach of weighted model counters. The simplicity of the propositional apparatus and the separation of the weight function from the encoding has led to the popularity of WMC, and in this vein, we hope that WMCOU can serve as an assembly language for OU formalisms. As for future work, it would be worthwhile to better understand the mathematical relationships between WMCOU and other OU semantics.

References Bacchus, F. 1990. Representing and Reasoning with Probabilistic Knowledge. MIT Press.

Beame, P.; Van den Broeck, G.; Gribkoﬀ, E.; and Suciu, D. 2015. Symmetric weighted ﬁrst-order model counting. In PODS, 313 328. ACM. Belle, V., and Lakemeyer, G. 2011. On progression and query evaluation in ﬁrst-order knowledge bases with function symbols. In Proc. IJCAI, 255 260. Chakraborty, S.; Fremont, D. J.; Meel, K. S.; Seshia, S. A.; and Vardi, M. Y. 2014. Distribution-aware sampling and weighted model counting for sat. AAAI. Chavira, M., and Darwiche, A. 2008. On probabilistic inference by weighted model counting. Artif. Intell. 172(67):772 799. Choi, A.; Kisa, D.; and Darwiche, A. 2013. Compiling probabilistic graphical models using sentential decision diagrams. In Symbolic and Quantitative Approaches to Reasoning with Uncertainty. Springer. 121 132. Dalvi, N. 2006. Query evaluation on a database given by a random graph. In Database Theory ICDT 2007. Springer. 149 163. Darwiche, A. 2004. New advances in compiling CNF to decomposable negation normal form. In ECAI, 328 332. de Salvo Braz, R.; Amir, E.; and Roth, D. 2005. Lifted ﬁrstorder probabilistic inference. In Proc. IJCAI, 1319 1325. Dong, X.; Gabrilovich, E.; Heitz, G.; Horn, W.; Lao, N.; Murphy, K.; Strohmann, T.; Sun, S.; and Zhang, W. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In Knowledge discovery and data mining. Doshi, F.; Wingate, D.; Tenenbaum, J. B.; and Roy, N. 2011. Inﬁnite dynamic bayesian networks. In ICML, 913 920. Ermon, S.; Gomes, C. P.; Sabharwal, A.; and Selman, B. 2013. Embed and project: Discrete sampling with universal hashing. In NIPS, 2085 2093. Fagin, R. 1976. Probabilities on ﬁnite models. The Journal of Symbolic Logic 41(01):50 58. Fierens, D.; den Broeck, G. V.; Thon, I.; Gutmann, B.; and Raedt, L. D. 2011. Inference in probabilistic logic programs using weighted CNF s. In UAI, 211 220. Getoor, L., and Taskar, B., eds. 2007. An Introduction to Statistical Relational Learning. MIT Press. Giacomo, G. D.; Lesp erance, Y.; and Levesque, H. J. 2011. Eﬃcient reasoning in proper knowledge bases with unknown individuals. In IJCAI, 827 832. Gogate, V., and Domingos, P. 2011. Probabilistic theorem proving. In Proc. UAI, 256 265. Halpern, J. 1990. An analysis of ﬁrst-order logics of probability. Artiﬁcial Intelligence 46(3):311 350. Heckerman, D.; Meek, C.; and Koller, D. 2004. Probabilistic models for relational data. Technical report, Technical Report MSR-TR-2004-30, Microsoft Research. Jaeger, M. 1998. Reasoning about inﬁnite random structures with relational bayesian networks. In KR, 570 581. Kersting, K., and Raedt, L. D. 2000. Bayesian logic programs. In ILP Work-in-progress reports.

Lakemeyer, G., and Levesque, H. J. 2002. Evaluation-based reasoning with disjunctive information in ﬁrst-order knowledge bases. In Proc. KR, 73 81. Laskey, K. B., and da Costa, P. C. G. 2005. Of starships and klingons: Bayesian logic for the 23rd century. In UAI, 346 353. Levesque, H., and Lakemeyer, G. 2001. The logic of knowledge bases. The MIT Press. Levesque, H. J. 1998. A completeness result for reasoning with incomplete ﬁrst-order knowledge bases. In Proc. KR, 14 23. Libkin, L. 2016. SQL s three-valued logic and certain answers. ACM Trans. Database Syst. 41(1):1:1 1:28. Milch, B.; Marthi, B.; Russell, S. J.; Sontag, D.; Ong, D. L.; and Kolobov, A. 2005a. BLOG: Probabilistic models with unknown objects. In Proc. IJCAI, 1352 1359. Milch, B.; Marthi, B.; Sontag, D.; Russell, S. J.; Ong, D. L.; and Kolobov, A. 2005b. Approximate inference for inﬁnite contingent bayesian networks. In AISTATS. Niepert, M., and Van den Broeck, G. 2014. Tractability through exchangeability: A new perspective on eﬃcient probabilistic inference. In AAAI, 2467 2475. Pearl, J. 1988. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann. Poole, D. 2003. First-order probabilistic inference. In Proc. IJCAI, 985 991. Richardson, M., and Domingos, P. 2006. Markov logic networks. Machine learning 62(1):107 136. Russell, S. J. 2015. Unifying logic and probability. Commun. ACM 58(7):88 97. Sang, T.; Beame, P.; and Kautz, H. A. 2005. Performing bayesian inference by weighted model counting. In AAAI, 475 482. Singla, P., and Domingos, P. M. 2007. Markov logic in inﬁnite domains. In UAI, 368 375. Srivastava, S.; Russell, S. J.; Ruan, P.; and Cheng, X. 2014. First-order open-universe pomdps. In UAI, 742 751. Van den Broeck, G.; Meert, W.; and Darwiche, A. 2014. Skolemization for weighted ﬁrst-order model counting. In KR. Van den Broeck, G. 2013. Lifted Inference and Learning in Statistical Relational Models. Ph.D. Dissertation, KU Leuven. Van der Meyden, R. 1998. Logical approaches to incomplete information: A survey. In Logics for databases and information systems. Springer. 307 356. Wei, W., and Selman, B. 2005. A new approach to model counting. In Theory and Applications of Satisﬁability Testing, 96 97. Springer. Welling, M.; Porteous, I.; and Bart, E. 2007. Inﬁnite state bayes-nets for structured domains. In NIPS, 1601 1608. Wu, W.; Li, H.; Wang, H.; and Zhu, K. Q. 2012. Probase: A probabilistic taxonomy for text understanding. In Proc. Int. Conf. on Management of Data, 481 492.

Appendix Proof of Proposition 6 Proof: Since Δ is ﬁnite, and the number of quantiﬁers in a -clause is ﬁnite, the set of names considered for the grounding is ﬁnite, leading to only ﬁnitely many substitutions for each -clause in Δ.

Proof of Proposition 7 Proof: For each -clause, given a rank of k and c constants, there are ck possible substitutions, each of which result in m atoms. In the OU case, we will consider c + k constants, by deﬁnition.

Proof of Theorem 8 Proof: By construction, cθ GND(Δ) is a ground clause, and so GND(Δ) is essentially a (possibly inﬁnite) propositional theory, over a (possibly inﬁnite) vocabulary. Compactness from propositional logic then applies.

Proof of Theorem 9 The proof will need two lemmas. Before presenting the lemmas, suppose is a bijection from N to N. Then for any formula β, we write β to mean that the names a appearing in β are replaced by a .

We ﬁrst prove Lemma 10.

Proof: Let Γ = GND(Δ, k) α and Γ = GND(Δ, j) α. Without loss of generality, we assume that the names mentioned in GND(Δ, j) is the union of the names mentioned in GND(Δ, k) and j k new ones. Notation: let names(β) be the set of names mentioned in β. Suppose, by assumption, M |= Γ. Construct M :

1. for every atom p mentioned in Γ, let M [p] = M[p];

2. for every atom p not mentioned in Γ , let M [p] = M[p];

3. for every atom p mentioned in Γ Γ, or more precisely, for cθ Γ Γ, do as follows. Note that cθ mentions at most k names not appearing in Δ. By construction, since cθ Γ Γ, cθ mentions names in names(Γ ) names(Γ), say l of them. But because of this, cθ does not mention at least l names from names(Γ) names(Δ). Then let be a bijection from N to N that swaps every name from (names(Γ ) names(Γ)) names(cθ) with l names from names(Γ) names(Δ) but not appearing in names(cθ), and maps the rest to themselves. By construction, cθ Γ Γ because there is a -clause (e c) Δ and |= eθ. A simple induction shows that |= eθ iﬀ|= (eθ) iﬀ(because e does not mention names from Γ Γ) |= eθ . But by construction, then, cθ Γ. (Basically, after the swap, the names mentioned in cθ must be from names(Γ).) In other words, for every atom p mentioned in cθ, there is an atom p mentioned in cθ . To conclude, let M [p] = M[p ].

We have completed the construction of M . It is easy to show (by induction) that M satisﬁes Γ because of construction step (1). Similarly, it is easy to show that M satisﬁes Γ Γ because of construction step (3). Thus, M satisﬁes Γ and we are done.

Next, we prove Lemma 11.

Proof: Suppose Δ |= α but GND(Δ, k) α is satisﬁable. By Lemma 10, GND(Δ, j) α is satisﬁable for every j k and so, by Theorem 8, GND(Δ) α is satisﬁable, that is, Δ α is satisﬁable. Contradiction. Suppose GND(Δ, k) |= α. Since GND(Δ, k) GND(Δ), GND(Δ) |= α and so Δ |= α.

The proof for Theorem 9 is then as follows. Proof: Let γ = {p p | atom p is mentioned in α} . Let Δ = Δ γ. Since Δ Δ, we have Δ |= α iﬀΔ |= α. From Lemma 11, Δ |= α iﬀOU(Δ ) |= α iﬀOU(Δ ) α is unsatisﬁable. It is easy to verify that OU(Δ ) α OU(Δ α) OU(Δ α). Therefore, Δ |= α iﬀOU(Δ α) is unsatisﬁable.

Proof of Theorem 16 Proof: Here, (a) holds because if |= Δ α then Δ Δ α. Next, (b) is by way of Theorem 9 because if Δ α is unsatisﬁable, then it has no models. Finally, suppose β means β with names a replaced by a . For (c), by construction, (Δ α) = Δ α . By induction on α, we can show that for any model M of Δ α, we can construct a model M of Δ α , and vice versa.

Proof of Proposition 21 Proof: Here, (i) is from the two transformation rules; (ii) is because S is obtained from a e of a -clause (e c) Δ and so S B by construction; and (iii) is because S is (also) obtained from a e of (e c) Δ and so S B by construction.

Proof of Theorem 22 Proof: The understanding is that ENF(Δ) is wrt C. Let D be the set of names mentioned in OU(Δ). By construction, D C = B, where B is the set of names in Δ. Let be a bijection from N to N that maps names from B to itself, and names from D C are mapped to those in C D (say, the smallest in D C is mapped to the smallest in C D, and so on). An induction argument can be used to show that WMC(OU(Δ), w) = WMC(OU(Δ) , w). (Recall that predicate-level weight functions will not rank unknown atoms over other unknown instances, as already observed for Theorem 16.) But then OU(Δ) is equivalent to ENF(Δ).