# lpmln_weak_constraints_and_plog__40756732.pdf

LPMLN Weak Constraints, and P-log

Joohyung Lee and Zhun Yang School of Computing, Informatics and Decision Systems Engineering Arizona State University, Tempe, USA {joolee, zyang90}@asu.edu

LPMLN is a recently introduced formalism that extends answer set programs by adopting the log-linear weight scheme of Markov Logic. This paper investigates the relationships between LPMLN and two other extensions of answer set programs: weak constraints to express a quantitative preference among answer sets, and P-log to incorporate probabilistic uncertainty. We present a translation of LPMLN into programs with weak constraints and a translation of P-log into LPMLN, which complement the existing translations in the opposite directions. The ﬁrst translation allows us to compute the most probable stable models (i.e., MAP estimates) of LPMLN programs using standard ASP solvers. This result can be extended to other formalisms, such as Markov Logic, Prob Log, and Pearl s Causal Models, that are shown to be translatable into LPMLN. The second translation tells us how probabilistic nonmonotonicity (the ability of the reasoner to change his probabilistic model as a result of new information) of P-log can be represented in LPMLN, which yields a way to compute P-log using standard ASP solvers and MLN solvers.

Introduction

LPMLN (Lee and Wang 2016) is a recently introduced probabilistic logic programming language that extends answer set programs (Gelfond and Lifschitz 1988) with the concept of weighted rules, whose weight scheme is adopted from that of Markov Logic (Richardson and Domingos 2006). It is shown in (Lee and Wang 2016; Lee, Meng, and Wang 2015) that LPMLN is expressive enough to embed Markov Logic and several other probabilistic logic languages, such as Prob Log (De Raedt, Kimmig, and Toivonen 2007), Pearls Causal Models (Pearl 2000), and a fragment of P-log (Baral, Gelfond, and Rushton 2009). Among several extensions of answer set programs to overcome the deterministic nature of the stable model semantics, LPMLN is one of the few languages that incorporate the concept of weights into the semantics. Another one is weak constraints (Buccafurri, Leone, and Rullo 2000), which are to assign a quantitative preference over the stable models of non-weak constraint rules: weak constraints cannot be used for deriving stable models. It is relatively a simple extension

Copyright c 2017, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

of the stable model semantics but has turned out to be useful in many practical applications. Weak constraints are part of the ASP Core 2 language (Calimeri et al. 2013), and are implemented in standard ASP solvers, such as CLINGO and DLV. P-log is a probabilistic extension of answer set programs. In contrast to weak constraints, it is highly structured and has quite a sophisticated semantics. One of its distinct features is probabilistic nonmonotonicity (the ability of the reasoner to change his probabilistic model as a result of new information) whereas, in most other probabilistic logic languages, new information can only cause the reasoner to abandon some of his possible worlds, making the effect of an update monotonic. This paper reveals interesting relationships between LPMLN and these two extensions of answer set programs. It shows how different weight schemes of LPMLN and weak constraints are related, and how the probabilistic reasoning in P-log can be expressed in LPMLN. The result helps us understand these languages better as well as other related languages, and also provides new, effective computational methods based on the translations. It is shown in (Lee and Wang 2016) that programs with weak constraints can be easily viewed as a special case of LPMLN programs. In the ﬁrst part of this paper, we show that an inverse translation is also possible under certain conditions, i.e., an LPMLN program can be turned into a usual ASP program with weak constraints so that the most probable stable models of the LPMLN program are exactly the optimal stable models of the program with weak constraints. The result allows for using ASP solvers for computing Maximum A Posteriori probability (MAP) estimates of LPMLN

programs. Interestingly, the translation is quite simple so it can be easily applied in practice. Further, the result implies that MAP inference in other probabilistic logic languages, such as Markov Logic, Prob Log, and Pearl s Causal Models, can be computed by standard ASP solvers because they are known to be embeddable in LPMLN, thereby allowing us to apply combinatorial optimization in standard ASP solvers to MAP inference in these languages. In the second part of the paper, we show how P-log can be completely characterized in LPMLN. Unlike the translation in (Lee and Wang 2016), which was limited to a subset of

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)

P-log that does not allow dynamic default probability, our translation applies to full P-log and complements the recent translation from LPMLN into P-log in (Balai and Gelfond 2016). In conjunction with the embedding of LPMLN in answer set programs with weak constraints, our work shows how MAP estimates of P-log can be computed by standard ASP solvers, which provides a highly efﬁcient way to compute P-log.

Preliminaries Review: LPMLN

We review the deﬁnition of LPMLN from (Lee and Wang 2016). In fact, we consider a more general syntax of programs than the one from (Lee and Wang 2016), but this is not an essential extension. We follow the view of (Ferraris, Lee, and Lifschitz 2011) by identifying logic program rules as a special case of ﬁrst-order formulas under the stable model semantics. For example, rule r(x) p(x), not q(x) is identiﬁed with formula x(p(x) q(x) r(x)). An LPMLN

program is a ﬁnite set of weighted ﬁrst-order formulas w : F where w is a real number (in which case the weighted formula is called soft) or α for denoting the inﬁnite weight (in which case it is called hard). An LPMLN program is called ground if its formulas contain no variables. We assume a ﬁnite Herbrand Universe. Any LPMLN program can be turned into a ground program by replacing the quantiﬁers with multiple conjunctions and disjunctions over the Herbrand Universe. Each of the ground instances of a formula with free variables receives the same weight as the original formula. For any ground LPMLN program Π and any interpretation I, Π denotes the unweighted formula obtained from Π, and ΠI denotes the set of w : F in Π such that I |= F, and SM[Π] denotes the set {I | I is a stable model of ΠI} (We refer the reader to the stable model semantics of ﬁrst-order formulas in (Ferraris, Lee, and Lifschitz 2011)). The unnormalized weight of an interpretation I under Π is deﬁned as

if I SM[Π];

0 otherwise.

The normalized weight (a.k.a. probability) of an interpretation I under Π is deﬁned as

PΠ(I) = lim α WΠ(I)

J SM[Π] WΠ(J).

I is called a (probabilistic) stable model of Π if PΠ(I) = 0.

Review: Weak Constraints A weak constraint has the form

: F [Weight @ Level].

where F is a ground formula, Weight is a real number and Level is a nonnegative integer. Note that the syntax is more general than the one from the literature (Buccafurri, Leone,

and Rullo 2000; Calimeri et al. 2013), where F was restricted to conjunctions of literals.1 We will see the generalization is more convenient for stating our result, but will also present translations that conform to the restrictions imposed on the input language of ASP solvers. Let Π be a program Π1 Π2, where Π1 is a set of ground formulas and Π2 is a set of weak constraints. We call I a stable model of Π if it is a stable model of Π1 (in the sense of (Ferraris, Lee, and Lifschitz 2011)). For every stable model I of Π and any nonnegative integer l, the penalty of I at level l, denoted by PenaltyΠ(I, l), is deﬁned as

: F [w@l] Π2, I|=F

For any two stable models I and I of Π, we say I is dominated by I if there is some nonnegative integer l such that PenaltyΠ(I , l) < PenaltyΠ(I, l) and for all integers k > l, PenaltyΠ(I , k) = PenaltyΠ(I, k). A stable model of Π is called optimal if it is not dominated by another stable model of Π.

Turning LPMLN into Programs with Weak Constraints General Translation We deﬁne a translation that turns an LPMLN program into a program with weak constraints. For any ground LPMLN

program Π, the translation lpmln2wc(Π) is simply deﬁned as follows. We turn each weighted formula w : F in Π into {F}ch, where {F}ch is a choice formula, standing for F F (Ferraris, Lee, and Lifschitz 2011). Further, we add

: F [ 1@1] (1)

if w is α, and : F [ w@0] (2) otherwise. Intuitively, choice formula {F}ch allows F to be either included or not in deriving a stable model.2 When F is included, the stable model gets the (negative) penalty 1 at level 1 or w at level 0 depending on the weight of the formula, which corresponds to the (positive) reward eα or ew

that it receives under the LPMLN semantics. The following proposition tells us that choice formulas can be used for generating the members of SM[Π].

Proposition 1 For any LPMLN program Π, the set SM[Π] is exactly the set of the stable models of lpmln2wc(Π). The following theorem follows from Proposition 1. As the probability of a stable model of an LPMLN program Π increases, the penalty of the corresponding stable model of lpmln2wc(Π) decreases, and the distinction between hard rules and soft rules can be simulated by the different levels of weak constraints, and vice versa.

1A literal is either an atom p or its negation not p. 2This view of choice formulas was used in Pr ASP (Nickles and Mileo 2014) in deﬁning spanning programs.

Theorem 1 For any LPMLN program Π, the most probable stable models (i.e., the stable models with the highest probability) of Π are precisely the optimal stable models of the program with weak constraints lpmln2wc(Π).

Example 1 For program Π:

10 : p q 5 : p 1 : p r 20 : r (3)

SM[Π] has 5 elements: , {p}, {p, q}, {p, r}, {p, q, r}. Among them, {p, q} is the most probable stable model, whose weight is e15, while {p, q, r} is a probabilistic stable model whose weight is e 4. The translation yields

{p q}ch : p q [ 10 @ 0] {p r}ch : p r [ 1 @ 0] {p}ch : p [ 5 @ 0] { r }ch : r [20 @ 0]

whose optimal stable model is {p, q} with the penalty at level 0 being 15, while {p, q, r} is a stable model whose penalty at level 0 is 4.

The following example illustrates how the translation accounts for the difference between hard rules and soft rules by assigning different levels.

Example 2 Consider the LPMLN program Π in Example 1 from (Lee and Wang 2016).

α : Bird(Jo) Resident Bird(Jo) (r1) α : Bird(Jo) Migratory Bird(Jo) (r2) α : Resident Bird(Jo), Migratory Bird(Jo) (r3) 2 : Resident Bird(Jo) (r4) 1 : Migratory Bird(Jo) (r5)

The translation lpmln2wc(Π) is 3

{Bird(Jo) Resident Bird(Jo)}ch

{Bird(Jo) Migratory Bird(Jo)}ch

{ Resident Bird(Jo), Migratory Bird(Jo)}ch

{Resident Bird(Jo)}ch

{Migratory Bird(Jo)}ch

: Bird(Jo) Resident Bird(Jo) [ 1@1] : Bird(Jo) Migratory Bird(Jo) [ 1@1] : Resident Bird(Jo), Migratory Bird(Jo)} [ 1@1] : Resident Bird(Jo) [ 2@0] : Migratory Bird(Jo) [ 1@0]

The three probabilistic stable models of Π, , {Bird(Jo), Resident Bird(Jo)}, and {Bird(Jo), Migratory Bird(Jo)}, have the same penalty 3 at level 1. Among them, {Bird(Jo), Resident Bird(Jo)} has the least penalty at level 0, and thus is an optimal stable model of lpmln2wc(Π). In some applications, one may not want any hard rules to be violated assuming that hard rules encode deﬁnite knowledge. For that, lpmln2wc(Π) can be modiﬁed by simply turning hard rules into the usual ASP rules. Then the stable models of lpmln2wc(Π) satisfy all hard rules. For example,

3Recall that we identify the rules with the corresponding ﬁrstorder formulas.

the program in Example 2 can be translated into programs with weak constraints as follows.

Bird(Jo) Resident Bird(Jo) Bird(Jo) Migratory Bird(Jo) Resident Bird(Jo), Migratory Bird(Jo) {Resident Bird(Jo)}ch

{Migratory Bird(Jo)}ch

: Resident Bird(Jo) [ 2@0] : Migratory Bird(Jo) [ 1@0]

Also note that while the most probable stable models of Π and the optimal stable models of lpmln2wc(Π) coincide, their weights and penalties are not proportional to each other. The former is deﬁned by an exponential function whose exponent is the sum of the weights of the satisﬁed formulas, while the latter simply adds up the penalties of the satisﬁed formulas. On the other hand, they are monotonically increasing/decreasing as more formulas are satisﬁed. In view of Theorem 2 from (Lee and Wang 2016), which tells us how to embed Markov Logic into LPMLN, it follows from Theorem 1 that MAP inference in MLN can also be reduced to the optimal stable model ﬁnding of programs with weak constraints. For any Markov Logic Network Π, let mln2wc(Π) be the union of lpmln2wc(Π) plus choice rules {A}ch for all atoms A.

Theorem 2 For any Markov Logic Network Π, the most probable models of Π are precisely the optimal stable models of the program with weak constraints mln2wc(Π).

Similarly, MAP inference in Prob Log and Pearl s Causal Models can be reduced to ﬁnding an optimal stable model of a program with weak constraints in view of the reduction of Prob Log to LPMLN (Theorem 4 from (Lee and Wang 2016)) and the reduction of Causal Models to LPMLN (Theorem 4 from (Lee, Meng, and Wang 2015)) thereby allowing us to apply combinatorial optimization methods in standard ASP solvers to these languages.

Alternative Translations Instead of aggregating the weights of satisﬁed formulas, we may aggregate the weights of formulas that are not satisﬁed. Let lpmln2wcpnt(Π) be a modiﬁcation of lpmln2wc(Π) by replacing (1) with

and (2) with : F [w@0]. Intuitively, when F is not satisﬁed, the stable model gets the penalty 1 at level 1, or w at level 0 depending on whether F is a hard or soft formula.

Corollary 1 Theorem 1 remains true when lpmln2wc(Π) is replaced by lpmln2wcpnt(Π).

This alternative view of assigning weights to stable models, in fact, originates from Probabilistic Soft Logic (PSL) (Bach et al. 2015), where the probability density function of an interpretation is obtained from the sum over the penalty from the formulas that are distant from satisfaction. This

view will lead to a slight advantage when we further turn the translation into the input language of ASP solvers (See Footnote 6). The current ASP solvers do not allow arbitrary formulas to appear in weak constraints. For computation using the ASP solvers, let lpmln2wcpnt,rule(Π) be the translation by turning each weighted formula wi : Fi in Π into

Fi unsat(i) unsat(i) Fi : unsat(i) [wi@l].

where unsat(i) is a new atom, and l = 1 if wi is α and l = 0 otherwise.

Corollary 2 Let Π be an LPMLN program. There is a 1-1 correspondence φ between the most probable stable models of Π and the optimal stable models of lpmln2wcpnt,rule(Π), where φ(I) = I {unsat(i) | wi : Fi Π, I |= Fi}.

The corollary allows us to compute the most probable stable models (MAP estimates) of the LPMLN program using the combination of F2LP 4 and CLINGO 5 (assuming that the weights are approximated to integers). System F2LP turns this program with formulas into the input language of CLINGO, so CLINGO can be used to compute the theory. If the unweighted LPMLN program is already in the rule form Head Body that is allowed in the input languages of CLINGO and DLV, we may avoid the use of F2LP by slightly modifying the translation lpmln2wcpnt,rule by turning each weighted rule wi : Headi Bodyi instead into

unsat(i) Bodyi, not Headi Headi Bodyi, not unsat(i) : unsat(i) [wi@l]

where l = 1 if wi is α and l = 0 otherwise. In the case when Headi is , the translation can be further simpliﬁed: we simply turn wi : Bodyi into : Bodyi [wi@l].6

Example 1 continued: For program (3), the simpliﬁed translation lpmln2wcpnt,rule yields

unsat(1) p, not q q p, not unsat(1) : unsat(1) [10@0] unsat(2) p, not r r p, not unsat(2) : unsat(2) [1@0] unsat(3) not p p not unsat(3) : unsat(3) [5@0] : not r [ 20@0]

Turning P-log into LPMLN

Review: P-log Syntax A sort is a set of symbols. A constant c maps an element in the domain s1 sn to an element in the

4http://reasoning.eas.asu.edu/f2lp/ 5http://potassco.sourceforge.net 6Alternatively, we may turn it into the reward way, i.e., turning it into : not Bodyi[ wi], but the rule may not be in the input language of CLINGO.

range s0 (denoted by Range(c)), where each of s0, . . . , sn is a sort. A sorted propositional signature is a special case of propositional signatures constructed from a set of constants and their associated sorts, consisting of all propositional atoms c( u) = v where c : s1 sn s0, and u s1 sn, and v s0.7 Symbol c( u) is called an attribute and v is called its value. If the range s0 of c is {f, t} then c is called Boolean, and c( u)= t can be abbreviated as c( u) and c( u)=f as c( u). The signature of a P-log program is the union of two propositional signatures σ1 and σ2, where σ1 is a sorted propositional signature, and σ2 is a usual propositional signature consisting of atoms Do(c( u)=v), Obs(c( u)=v) and Obs(c( u) =v) for all atoms c( u)=v in σ1. A P-log program Π of signature σ1 σ2 is a tuple

Π = R, S, P, Obs, Act (4)

where the signature of each of R, S, and P is σ1 and the signature of each of Obs and Act is σ2 such that R is a set of normal rules of the form

A B1, . . . , Bm, not Bm+1, . . . , not Bn

where A, B1, . . . , Bn are atoms (0 m n). S is a set of random selection rules of the form

[r] random(c( u) : {x : p(x)}) Body (5)

where r is a unique identiﬁer, p is a boolean constant with a unary argument, and Body is a set of literals. x is a schematic variable ranging over the argument sort of p. Rule (5) is called a random selection rule for c( u). Intuitively, rule (5) says that if Body is true, the value of c( u) is selected at random from the set Range(c) {x : p(x)} unless this value is ﬁxed by a deliberate action, i.e., Do(c( u)=v) for some value v. P is a set of so-called probability atoms (pr-atoms) of the form prr(c( u)=v | C) = p (6) where r is the identiﬁer of some random selection rule for c( u) in S; c( u) = v σ1; C is a set of literals; and p is a real number in [0, 1]. We say pr-atom (6) is associated with the random selection rule whose identiﬁer is r. Obs is a set of atomic facts for representing observation : Obs(c( u)=v) and Obs(c( u) = v). Act is a set of atomic facts for representing a deliberate action: Do(c( u)=v).

Semantics Let Π be a P-log program (4) of signature σ1 σ2. The possible worlds of Π, denoted by ω(Π), are the stable models of τ(Π), a (standard) ASP program with the propositional signature

σ1 σ2 {Intervene(c( u)) | c( u) is an attribute occurring in S}

that accounts for the logical part of P-log. Due to lack of space we refer the reader to (Baral, Gelfond, and Rushton 2009) for the deﬁnition of τ(Π).

7Note that here = is just a part of the symbol for propositional atoms, and is not equality in ﬁrst-order logic.

An atom c( u) = v is called possible in a possible world W due to a random selection rule (5) if Π contains (5) such that W |= Body p(v) Intervene(c( u)).8 Pr-atom (6) is applied in W if c( u) = v is possible in W due to r and W |= C. As in (Baral, Gelfond, and Rushton 2009), we assume that all P-log programs Π satisfy the following conditions: Condition 1 [Unique random selection rule]: If a P-log program Π contains two random selection rules for c( u):

[r1] random(c( u) : {x : p1(x)}) Body1, [r2] random(c( u) : {x : p2(x)}) Body2,

then no possible world of Π satisﬁes both Body1 and Body2. Condition 2 [Unique probability assignment]: If a Plog program Π contains a random selection rule for c( u):

[r] random(c( u) : {x : p(x)}) Body

along with two different pr-atoms:

prr(c( u)=v | C1) = p1, prr(c( u)=v | C2) = p2,

then no possible world of Π satisﬁes Body, C1, and C2 together.

Given a P-log program Π, a possible world W ω(Π), and an atom c( u) = v possible in W, by Condition 1, it follows that there is exactly one random selection rule (5) such that W |= Body. Let r W,c( u) denote this random selection rule, and let AVW (c( u)) = {v | there exists a pr-atom prr W,c( u)(c( u) = v | C) = p that is applied in W for some C and p}. We then deﬁne the following notations: If v AVW (c( u)), there exists a pr-atom prr W,c( u)(c( u)=v | C) = p in Π for some C and p such that W |= C. By Condition 2, for any other prr W,c( u)(c( u) = v | C ) = p in Π, it follows that W |= C . So there is only one pr-atom that is applied in W for c( u)=v, and we deﬁne

Poss With Ass Pr(W, c( u)=v) = p.

( c( u)=v is possible in W with assigned probability p. ) If v AVW (c( u)), we deﬁne

Poss With Def Pr(W, c( u)=v) = max p, 0 ,

v AVW (c( u)) Poss With Ass Pr(W, c( u)=v )

|{v | c( u)=v is possible in W and v AVW (c( u))}| . (7)

( c( u)=v is possible in W with the default probability. ) The max function is used to ensure that the default probability is nonnegative. 9

8Note that this is slightly different from the original deﬁnition of P-log from (Baral, Gelfond, and Rushton 2009), according to which, if Intervene(c( u)) is true, the probability of c( u) = v is determined by the default probability, which is a bit unintuitive. 9In (Baral, Gelfond, and Rushton 2009), a stronger condition of unitariness is imposed to prevent (7) from being negative.

For each possible world W ω(Π), and each atom c( u)=v possible in W, the probability of c( u)=v to happen in W is deﬁned as: P(W, c( u)=v) = Poss With Ass Pr(W, c( u)=v) if v AVW (c( u)); Poss With Def Pr(W, c( u)=v) otherwise.

The unnormalized probability of a possible world W is deﬁned as ˆμΠ(W) =

c( u)=v W and c( u)=v is possible in W

P(W, c( u)=v),

and, assuming Π has at least one possible world with nonzero unnormalized probability, the normalized probability of W is deﬁned as

μΠ(W) = ˆμΠ(W)

Wi ω(Π) ˆμΠ(Wi).

We say Π is consistent if Π has at least one possible world with a non-zero probability. Example 3 Consider a variant of the Monty Hall Problem encoding in P-log from (Baral, Gelfond, and Rushton 2009) to illustrate the probabilistic nonmonotonicity in the presence of assigned probabilities. There are four doors, behind which are three goats and one car. The guest picks door 1, and Monty, the show host who always opens one of the doors with a goat, opens door 2. Further, while the guest and Monty are unaware, the statistics is that in the past, with 30% chance the prize was behind door 1, and with 20% chance, the prize was behind door 3. Is it still better to switch to another door? This example can be formalized in P-log program Π, using both assigned probability and default probability, as

Can Open(d) Selected=d. (d {1, 2, 3, 4}), Can Open(d) Prize=d. Can Open(d) not Can Open(d). random(Prize). random(Selected). random(Open : {x : Can Open(x)}). pr(Prize=1) = 0.3. pr(Prize=3) = 0.2. Obs(Selected=1). Obs(Open=2). Obs(Prize = 2).

The possible worlds of Π are as follows: W1 = {Obs(Selected = 1), Obs(Open = 2), Obs(Prize = 2), Selected = 1, Open = 2, Prize = 1, Can Open(1) = f, Can Open(2) = t, Can Open(3) = t, Can Open(4) = t} W2 = {Obs(Selected = 1), Obs(Open = 2), Obs(Prize = 2), Selected = 1, Open = 2, Prize = 3, Can Open(1) = f, Can Open(2) = t, Can Open(3) = f, Can Open(4) = t} W3 = {Obs(Selected = 1), Obs(Open = 2), Obs(Prize = 2), Selected = 1, Open = 2, Prize = 4, Can Open(1) = f, Can Open(2) = t, Can Open(3) = t, Can Open(4) = f}. The probability of each atom to happen is

P(Wi, Selected=1) = Poss With Def Pr(W, Selected=1) = 1/4

P(W1, Open=2) = Poss With Def Pr(W1, Open=2) = 1/3 P(W2, Open=2) = Poss With Def Pr(W2, Open=2) = 1/2 P(W3, Open=2) = Poss With Def Pr(W3, Open=2) = 1/2

P(W1, Prize=1) = Poss With Ass Pr(W1, Prize=1) = 0.3 P(W2, Prize=3) = Poss With Ass Pr(W2, Prize=3) = 0.2 P(W3, Prize=4) = Poss With Def Pr(W3, Prize=4) = 0.25 So,

ˆμΠ(W1) = 1/4 1/3 0.3 = 1/40 ˆμΠ(W2) = 1/4 1/2 0.2 = 1/40 ˆμΠ(W3) = 1/4 1/2 0.25 = 1/32.

Thus, in comparison with staying (W1), switching to door 3 (W2) does not affect the chance, but switching to door 4 (W3) increases the chance by 25%.

Turning P-log into LPMLN

We deﬁne translation plog2lpmln(Π) that turns a P-log program Π into an LPMLN program in a modular way. First, every rule R in τ(Π) (that is used in deﬁning the possible worlds in P-log) is turned into a hard rule α : R in plog2lpmln(Π). In addition, plog2lpmln(Π) contains the following rules to associate probability to each possible world of Π. Below x, y denote schematic variables, and W is a possible world of Π.

Possible Atoms: For each random selection rule (5) for c( u) in S and for each v Range(c), plog2lpmln(Π) includes

Possr(c( u) = v) Body, p(v), not Intervene(c( u)) (8)

Rule (8) expresses that c( u) = v is possible in W due to r if W Body p(v) Intervene(c( u)). Assigned Probability: For each pr-atom (6) in P, plog2lpmln(Π) contains the following rules:

α : Poss With Ass Prr,C(c( u)=v) Possr(c( u) = v), C (9)

α : Ass Prr,C(c( u)=v) c( u)=v, Poss With Ass Prr,C(c( u)=v) (10) ln(p) : not Ass Prr,C(c( u)=v) (p > 0) (11) α : Ass Prr,C(c( u)=v) (p = 0) α : Poss With Ass Pr(c( u)=v) Poss With Ass Prr,C(c( u)=v).

Rule (9) expresses the condition under which pr-atom (6) is applied in a possible world W. Further, if c( u) = v is true in W as well, rules (10) and (11) contribute the assigned probability eln(p) = p to the unnormalized probability of W as a factor when p > 0.

Denominator for Default Probability: For each random selection rule (5) for c( u) in S and for each v Range(c), plog2lpmln(Π) includes

α : Poss With Def Pr(c( u)=v) Possr(c( u)=v), not Poss With Ass Pr(c( u)=v) (12)

α : Num Def Pr(c( u), x) c( u)=v, Poss With Def Pr(c( u) = v), x = #count{y : Poss With Def Pr(c( u)=y)} (13)

m) : not Num Def Pr(c( u), m) (m = 2, . . . , |Range(c)|) (14)

Rule (12) asserts that c( u) = v is possible in W with a default probability if it is possible in W and not possible with an assigned probability. Rule (13) expresses, intuitively, that Num Def Pr(c( u), x) is true if there are exactly x different values v such that c( u) = v is possible in W with a default probability, and there is at least one of them that is also true

in W. This value x is the denominator of (7). Then rule (14) contributes the factor 1/x to the unnormalized probability of W as a factor.

Numerator for Default Probability: Consider each random selection rule [r] random(c( u) : {x : p(x)}) Body for c( u) in S along with all pr-atoms associated with it in P: prr(c( u)=v1 | C1) = p1 . . . prr(c( u)=vn | Cn) = pn where n 1, and vi and vj (i = j) may be equal. For each v Range(c), plog2lpmln(Π) contains the following rules:10

α : Rem Pr(c( u), 1 y) Body c( u)=v, Poss With Def Pr(c( u)=v), y = #sum{p1 : Poss With Ass Prr,C1(c( u)=v1); . . . ; pn : Poss With Ass Prr,Cn(c( u)=vn)}. (15)

α : Total Def Pr(c( u), x) Rem Pr(c( u), x), x > 0 (16) ln(x) : not Total Def Pr(c( u), x) (17) α : Rem Pr(c( u), x), x 0. (18) In rule (15), y is the sum of all assigned probabilities. Rules (16) and (17) are to account for the numerator of (7) when n > 0. The variable x stands for the numerator of (7). Rule (18) is to avoid assigning a non-positive default probability to a possible world. Note that most rules in plog2lpmln(Π) are hard rules. The soft rules (11), (14), (17) cannot be simpliﬁed as atomic facts, e.g., ln( 1

m) : Num Def Pr(c( u), m) in place of (14), which is in contrast with the use of probabilistic choice atoms in the distribution semantics based probabilistic logic programming language, such as Prob Log. This is related to the fact that the probability of each atom to happen in a possible word in P-log is derived from assigned and default probabilities, and not from independent probabilistic choices like the other probabilistic logic programming languages. In conjunction with the embedding of Prob Log in LPMLN (Lee and Wang 2016), it is interesting to note that both kinds of probabilities can be captured in LPMLN using different kinds of rules.

Example 3 Continued For the program Π in Example 3, plog2lpmln(Π) consists of the rules α : R for each rule R in τ(Π) and the following rules.

Possible Atoms: α : Poss(Prize = d) not Intervene(Prize) α : Poss(Selected = d) not Intervene(Selected) α : Poss(Open = d) Can Open(d), not Intervene(Open)

10The sum aggregate can be represented by ground ﬁrst-order formulas under the stable model semantics under the assumption that the Herbrand Universe is ﬁnite (Ferraris 2011). In the general case, it can be represented by generalized quantiﬁers (Lee and Meng 2012) or inﬁnitary propositional formulas (Harrison, Lifschitz, and Yang 2014). In the input language of ASP solvers, which does not allow real number arguments, pi can be approximated to integers of some ﬁxed interval.

Assigned Probability:

α : Poss With Ass Pr(Prize = 1) Poss(Prize = 1) α : Ass Pr(Prize = 1) Prize = 1, Poss With Ass Pr(Prize = 1) ln(0.3) : not Ass Pr(Prize = 1)

α : Poss With Ass Pr(Prize = 3) Poss(Prize = 3) α : Ass Pr(Prize = 3) Prize = 3, Poss With Ass Pr(Prize = 3) ln(0.2) : not Ass Pr(Prize = 3)

(We simpliﬁed slightly not to distinguish Poss With Ass Pr( ) and Poss With Ass Prr,C( ) because there is only one random selection rule for Prize and both pr-atoms for Prize has empty conditions. )

Denominator for Default Probability:

α : Poss With Def Pr(Prize = d) Poss(Prize = d), not Poss With Ass Pr(Prize = d) α : Poss With Def Pr(Selected = d) Poss(Selected = d), not Poss With Ass Pr(Selected = d) α : Poss With Def Pr(Open = d) Poss(Open = d), not Poss With Ass Pr(Open = d)

α : Num Def Pr(Prize, x) Prize = d, Poss With Def Pr(Prize = d), x = #count{y : Poss With Def Pr(Prize = y)} α : Num Def Pr(Selected, x) Selected = d, Poss With Def Pr(Selected = d), x = #count{y : Poss With Def Pr(Selected = y)} α : Num Def Pr(Open, x) Open = d, Poss With Def Pr(Open = d), x = #count{y : Poss With Def Pr(Open = y)} ln( 1

m) : not Num Def Pr(c, m) (c {Prize, Selected, Open}, m {2, 3, 4})

Numerator for Default Probability:

α : Rem Pr(Prize, 1 x) Prize = d, Poss With Def Pr(Prize = d), x = #sum{0.3 : Poss With Ass Pr(Prize=1); 0.2 : Poss With Ass Pr(Prize=3)} α : Total Def Pr(Prize, x) Rem Pr(Prize, x), x > 0 ln(x) : not Total Def Pr(Prize, x) α : Rem Def Pr(Prize, x), x 0

Clearly, the signature of plog2lpmln(Π) is a superset of the signature of Π. Further, plog2lpmln(Π) is linear-time constructible. The following theorem tells us that there is a 1-1 correspondence between the set of the possible worlds with non-zero probabilities of Π and the set of the stable models of plog2lpmln(Π) such that each stable model is an extension of the possible world, and the probability of each possible world of Π coincides with the probability of the corresponding stable model of plog2lpmln(Π). Theorem 3 Let Π be a consistent P-log program. There is a 1-1 correspondence φ between the set of the possible worlds of Π with non-zero probabilities and the set of probabilistic stable models of plog2lpmln(Π) such that For every possible world W of Π that has a non-zero probability, φ(W) is a probabilistic stable model of plog2lpmln(Π), and μΠ(W) = Pplog2lpmln(Π)(φ(W)). For every probabilistic stable model I of plog2lpmln(Π), the restriction of I onto the signature of τ(Π), denoted I|σ(τ(Π)), is a possible world of Π and μΠ(I|σ(τ(Π))) > 0.

Proof. (Sketch) We can check that the following mapping φ is the 1-1 correspondence.

1. φ(W) |= Possr(c( u) = v) iff c( u) = v is possible in W due to r.

2. For each pr-atom prr(c( u)=v | C) = p in Π, φ(W) |= Poss With Ass Prr,C(c( u) = v) iff this pr-atom is applied in W.

3. For each pr-atom prr(c( u)=v | C) = p in Π, φ(W) |= Ass Prr,C(c( u) = v) iff this pr-atom is applied in W, and W |= c( u)=v.

4. φ(W) |= Poss With Ass Pr(c( u)=v) iff v AVW (c( u)).

5. φ(W) |= Poss With Def Pr(c( u) = v) iff c( u) = v is possible in W and v AVW (c( u)).

6. φ(W) |= Num Def Pr(c( u), m) iff there exist exactly m different values v such that c( u) = v is possible in W; v AVW (c( u)); and, for one of such v, W |= c( u)=v.

7. φ(W) |= Rem Pr(c( u), k) iff there exists a value v such that W |= c( u) = v; c( u) = v is possible in W; v AVW (c( u)); and

v AVW (c( u)) Poss With Ass Pr(W, c( u)=v).

8. φ(W) |= Total Def Pr(c( u), k) iff φ(W) |= Rem Pr(c( u, k)) and k > 0.

To check that μΠ(W) = Pplog2lpmln(Π)(φ(W)), note ﬁrst that the weight of φ(W) is computed by multiplying e to the power of the weights of rules (11), (14), (17) that are satisﬁed by φ(W). Let s call this product TW. Consider a possible world W with a non-zero probability of Π and c( u)=v that is satisﬁed by W. If c( u) = v is possible in W and pr-atom prr(c( u) = v | C) = p is applied in W (i.e., v AVW (c( u))), then the assigned probability is applied: P(W, c( u)=v) = p. On the other hand, by condition 3, φ(W) |= Ass Prr,C(c( u)=v), so that from (11), the same p is a factor of TW. If c( u) = v is possible in W and v AVW (c( u)), the default probability is applied: P(W, c( u) = v) = p is computed by (7). By Condition 8, φ(W) |= Total Def Pr(c( u), x) where x = 1

v AVW (c( u)) Poss With Ass Pr(W, c( u) = v ).

Since φ(W) |= (17), it is a factor of TW, which is the same as the numerator of (7). Furthermore, by Condition 6, φ(W) |= Num Def Pr(c( u), m), where m is the denominator of (7). Since φ(W) |= (14), 1

m is a factor of TW.

Example 3 Continued For the P-log program Π for the Monty Hall problem, Π = plog2lpmln(Π) has three probabilistic stable models I1, I2, and I3, each of which is an extension of W1, W2, and W3 respectively, and satisﬁes the following atoms: Poss(Prize = i) for i = 1, 2, 3, 4; Poss(Selected=i) for i = 1, 2, 3, 4; Poss With Ass Pr(Prize= i) for i = 1, 3; Poss With Def Pr(Prize = i) for i = 2, 4; Poss With Def Pr(Selected = i) for i = 1, 2, 3, 4; Num Def Pr(Selected, 4). In addition,

I1 |= {Ass Pr(Prize=1), Poss(Open=2), Poss(Open=3), Poss(Open=4), Poss With Def Pr(Open=2), Poss With Def Pr(Open=3), Poss With Def Pr(Open=4), Num Def Pr(Open, 3)} I2 |= {Ass Pr(Prize=3), Poss(Open=2), Poss(Open=4), Poss With Def Pr(Open=2), Poss With Def Pr(Open=4), Num Def Pr(Open, 2)} I3 |= {Poss(Open=2), Poss(Open=3), Poss With Def Pr(Open=2), Poss With Def Pr(Open=3), Num Def Pr(Open, 2), Num Def Pr(Prize, 2), Rem Pr(Prize, 0.5), Total Def Pr(Prize, 0.5)}. The unnormalized weight WΠ (Ii) of each probabilistic stable model Ii is shown below. w(Ass Prr,C(c( u) = v)) denotes the exponentiated weight of rule (11); w(Num Def Pr(c( u), m)) denotes the exponentiated weight of rule (14); w(Total Def Pr(c( u), x)) denotes the exponentiated weight of rule (17). WΠ (I1) = w(Num Def Pr(Selected, 4)) w(Ass Pr(Prize = 1)) w(Num Def Pr(Open, 3)) = 1 4 3

3 = 1 40. WΠ (I2) = w(Num Def Pr(Selected, 4)) w(Ass Pr(Prize = 3)) w(Num Def Pr(Open, 2)) = 1 4 2

2 = 1 40; WΠ (I3) = w(Num Def Pr(Selected, 4)) w(Num Def Pr(Open, 2)) w(Num Def Pr(Prize, 2) w(Total Def Pr(Prize, 0.5) = 1

Combining the translations plog2lpmln and lpmln2wc, one can compute P-log MAP inference using standard ASP solvers.

Conclusion In this paper, we show how LPMLN is related to weak constraints and P-log. Weak constraints are a relatively simple extension to ASP programs, while P-log is highly structured but a more complex extension. LPMLN is shown to be a good middle ground language that clariﬁes the relationships. We expect the relationships will help us to apply the mathematical and computational results developed for one language to another language.

Acknowledgments We are grateful to Evgenii Balai, Yi Wang and anonymous referees for their useful comments on the draft of this paper. This work was partially supported by the National Science Foundation under Grants IIS-1319794 and IIS-1526301.

References Bach, S. H.; Broecheler, M.; Huang, B.; and Getoor, L. 2015. Hinge-loss markov random ﬁelds and probabilistic soft logic. ar Xiv:1505.04406 [cs.LG]. Balai, E., and Gelfond, M. 2016. On the relationship between P-log and LPMLN. In Proceedings of International Joint Conference on Artiﬁcial Intelligence (IJCAI). Baral, C.; Gelfond, M.; and Rushton, J. N. 2009. Probabilistic reasoning with answer sets. TPLP 9(1):57 144.

Buccafurri, F.; Leone, N.; and Rullo, P. 2000. Enhancing disjunctive datalog by constraints. Knowledge and Data Engineering, IEEE Transactions on 12(5):845 860. Calimeri, F.; Faber, W.; Gebser, M.; Ianni, G.; Kaminski, R.; Krennwallner, T.; Leone, N.; Ricca, F.; and Schaub, T. 2013. ASP-Core-2 input language format. De Raedt, L.; Kimmig, A.; and Toivonen, H. 2007. Prob Log: A probabilistic prolog and its application in link discovery. In IJCAI, volume 7, 2462 2467. Ferraris, P.; Lee, J.; and Lifschitz, V. 2011. Stable models and circumscription. Artiﬁcial Intelligence 175:236 263. Ferraris, P. 2011. Logic programs with propositional connectives and aggregates. ACM Transactions on Computational Logic (TOCL) 12(4):25. Gelfond, M., and Lifschitz, V. 1988. The stable model semantics for logic programming. In Kowalski, R., and Bowen, K., eds., Proceedings of International Logic Programming Conference and Symposium, 1070 1080. MIT Press. Harrison, A. J.; Lifschitz, V.; and Yang, F. 2014. The semantics of gringo and inﬁnitary propositional formulas. In Principles of Knowledge Representation and Reasoning: Proceedings of the Fourteenth International Conference, KR 2014. Lee, J., and Meng, Y. 2012. Stable models of formulas with generalized quantiﬁers (preliminary report). In Technical Communications of the 28th International Conference on Logic Programming, 61 71. Lee, J., and Wang, Y. 2016. Weighted rules under the stable model semantics. In Proceedings of International Conference on Principles of Knowledge Representation and Reasoning (KR), 145 154. Lee, J.; Meng, Y.; and Wang, Y. 2015. Markov logic style weighted rules under the stable model semantics. In Technical Communications of the 31st International Conference on Logic Programming. Nickles, M., and Mileo, A. 2014. Probabilistic inductive logic programming based on answer set programming. In 15th International Workshop on Non-Monotonic Reasoning (NMR 2014). Pearl, J. 2000. Causality: models, reasoning and inference, volume 29. Cambridge Univ Press. Richardson, M., and Domingos, P. 2006. Markov logic networks. Machine Learning 62(1-2):107 136.