# On the Conditional Logic of Simulation Models

Duligur Ibeling¹, Thomas Icard²
¹ Department of Computer Science, Stanford University
² Department of Philosophy, Stanford University
duligur@stanford.edu, icard@stanford.edu

**Abstract.** We propose analyzing conditional reasoning by appeal to a notion of intervention on a simulation program, formalizing and subsuming a number of approaches to conditional thinking in the recent AI literature. Our main results include a series of axiomatizations, allowing comparison between this framework and existing frameworks (normality-ordering models, causal structural equation models), and a complexity result establishing NP-completeness of the satisfiability problem. Perhaps surprisingly, some of the basic logical principles common to all existing approaches are invalidated in our causal simulation approach. We suggest that this additional flexibility is important in modeling some intuitive examples.

## 1 Introduction and Motivation

Much of intelligent action and reasoning involves assessing what would occur (or would have occurred) under various non-actual conditions. Such hypothetical and counterfactual (broadly, subjunctive) conditionals are bound up with central topics in artificial intelligence, including prediction, explanation, causal reasoning, and decision making. It is thus for good reason that AI researchers have focused a great deal of attention on conditional reasoning (see, e.g., [Ginsberg, 1986; Delgrande, 1998; Friedman et al., 2000; Pearl, 2009; Bottou et al., 2013], among many others).

Two broad approaches to subjunctive conditionals have been especially salient in the literature. The first, originating in philosophy [Stalnaker, 1968; Lewis, 1973], takes as basic a similarity or normality ordering on possibilities, and evaluates a claim "if ϕ then ψ" by asking whether ψ is true in (e.g., all) the most normal ϕ-possibilities. The second approach, associated with the work of Judea Pearl, takes as basic a causal structural equation model (SEM), and evaluates conditionals according to a defined notion of intervention on the model. These two approaches are in some technical and conceptual respects compatible [Pearl, 2009], though they can also be shown to conflict on some basic logical matters [Halpern, 2013]. Both capture important intuitions about conditional reasoning, and both have enjoyed successful applications in AI research.

In this article we propose a third approach to conditionals, which captures a different intuition, and which can already be seen as implicit in a growing body of work in AI, as well as in cognitive science. This approach takes as basic the notion of a simulation model, that is, a program for simulating the transformation from one state of the world to another, or for building up or generating a world from a partial description of it. Simulation models have been of interest since the earliest days of AI [Newell and Simon, 1961]. A recent tradition, coming out of work on statistical relational models, has proposed building complex generative models using rich and expressive programming languages, typically also incorporating probability (e.g., [Pfeffer and Koller, 2000; Milch et al., 2005; Goodman et al., 2008; de Raedt and Kimmig, 2015]). Such languages have also been used for modeling human reasoning, including with counterfactuals [Goodman et al., 2015].
Simulation models have an obvious causal (and more general dependence) structure, and it is natural to link conditionals with this very structure. We can assess a claim "if ϕ then ψ" by intervening on the program to ensure that ϕ holds true throughout the simulation, and asking whether ψ holds upon termination. This is conceptually different from the role of intervention in structural equation models, where the post-intervention operation is to find solutions to the manipulated system of equations. As we shall see, this conceptual difference has fundamental logical ramifications.

This more procedural way of thinking about subjunctive conditionals enjoys various advantages. First, there is empirical evidence suggesting that human causal and conditional reasoning is closely tied to mental simulation [Sloman, 2005]. Second, there are many independent reasons to build generative models in AI (e.g., minimizing prediction error in classification; see [Liang and Jordan, 2008]), making them a common tool. Thus, opportunistically, we can expect to have such models readily available (perhaps unlike normality orderings or even structural equation models). Related to this second point, many of the generative models currently being built using deep neural networks fit neatly into our approach, even though we can often only use them as black boxes (see, e.g., [Mirza and Osindero, 2014; Kocaoglu et al., 2017]). We know how to intervene on these programs (i.e., by controlling input), and how to read off a result or prediction; that is, we can observe what conditional claims they embody even though we may not understand all the causal details of the learned model. Some authors have recently argued that certain kinds of counterfactual analysis in particular establish an appropriate standard for interpretability for these models [Wachter et al., 2018].

Our contribution in this article is threefold: (1) we propose a general semantic analysis of conditional claims in terms of program executions, subsuming all the aforementioned application areas; (2) we establish completeness theorems for a propositional conditional language with respect to (four different classes of) programs, allowing a comparison with alternative approaches at a fundamental logical level; (3) we establish NP-completeness of the satisfiability problem for these logical systems. Before turning to these details, we explain informally what is distinctive about the resulting logic.

## 2 Conditional Logics

The literature on conditional logic is extensive. We focus here on the most notable differences between the systems below and more familiar systems based on either world-orderings or SEMs. We will be using a notation inspired by dynamic logic (also used by [Halpern, 2000]), whereby [α]β can loosely be read as: if α were true, then β would be true. Understanding the complete logic of a given interpretation can be of both theoretical and practical interest. In the causal setting, for instance, a complete set of axioms may give the exact conditions under which some counterfactual quantity is (not) identifiable from statistical data [Pearl, 2009].

One of the bedrock principles of conditional reasoning is called Cautious Monotonicity [Kraus et al., 1990], or sometimes the Composition rule [Pearl, 2009]. This says that from [A](B ∧ C) we may always infer [A ∧ B]C.
While there are known counterexamples to it in the literature (it fails for some probabilistic and possibilistic interpretations [Dubois and Prade, 1991] and in standard versions of default logic [Makinson, 1994]), the principle is foundational to both world-ordering models and SEMs. By contrast, in our setting, holding B fixed during the simulation may interrupt the sequence of steps leading to C being made true. Here is a simple example (taken from [Icard, 2017]):

Example 1. If Alf were ever in trouble (A), the neighbors Bea and Cam would both like to help (B and C, respectively). But neither wants to help if the other is already helping. Imagine the following scenario: upon finding out that Alf is in trouble, each looks to see if the other is already there to help. If not, then each begins to prepare to help, eventually making their way to Alf but never stopping again to see if the other is doing the same. If instead, e.g., Cam initially sees Bea already going to help, Cam will not go. One might then argue that the following both truly describe the situation: "If Alf were in trouble, Bea and Cam would both go to help" and "If Alf were in trouble and Bea were going to help, Cam would not go to help."

The example trades on a temporal ambiguity about when Bea is going to help, and it can be blocked simply by time-indexing variables. However, following a common stance in the literature [Halpern, 2000; Pearl, 2009], we maintain that requiring temporal information always be made explicit is excessively stringent. Furthermore, in line with our earlier remarks about black box models, we may often be in a situation where we simply do not understand the internal temporal and causal structure of the program. To take a simple example, asking a generative image model to produce a cityscape might result in images with clouds and blue skies, even though a request to produce a cityscape with a blue sky might not result in any clouds. We would like a framework that can accommodate conditional theories embodied in artifacts like these. (A small computational sketch of Example 1 follows below.)
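To make the failure of Composition vivid, here is a minimal executable sketch of Example 1 (ours, in Python; the variable names A, B, C follow the example, while the step function and the clamping mechanics are illustrative assumptions standing in for the formal semantics of Section 4):

```python
def run(state, clamped):
    """Run a tiny simulation of Example 1 to completion.
    `clamped` maps variables to values held fixed throughout (our
    rendering of an intervention); writes to them are ignored."""
    s = {**state, **clamped}

    def write(var, val):
        if var not in clamped:   # interventions absorb writes
            s[var] = val

    # Each agent checks *once* whether the other is already helping,
    # then commits; this is the temporal structure driving the example.
    if s["A"]:
        bea_sees_cam = s["C"]
        cam_sees_bea = s["B"]
        if not bea_sees_cam:
            write("B", True)
        if not cam_sees_bea:
            write("C", True)
    return s

init = {"A": False, "B": False, "C": False}

# [A](B and C): intervene to make A true; both go to help.
out1 = run(init, {"A": True})
assert out1["B"] and out1["C"]

# [A and B]C: also clamp B; Cam sees Bea already helping and stays home.
out2 = run(init, {"A": True, "B": True})
assert not out2["C"]   # Composition fails
```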
Our completeness results below (Thm. 1) show that the logic of conditional simulation is strictly weaker than any logic of structural equation models (as established in [Halpern, 2000]) or of normality orderings (as, e.g., in [Lewis, 1973]). The conditional logic of all programs is very weak indeed. At the same time, some of the axioms in these frameworks can be recovered by restricting the class of programs (e.g., the principle of Conditional Excluded Middle, valid on structural equation models and on some world-ordering models [Stalnaker, 1968], follows from optional axiom F below). We view this additional flexibility as a feature. However, even for a reader who is not convinced of this, we submit that understanding the logic of this increasingly popular way of thinking about conditional information is valuable.

**Prior Work.** The notion of intervention introduced below (Defn. 1) is different from, but inspired by, the corresponding notion in SEMs [Meek and Glymour, 1994; Pearl, 2009]. The logical language we study in this paper, restricting antecedents to conjunctive clauses but closing off under Boolean connectives, follows [Halpern, 2000]. Interestingly, prior to any of this work, [Balkenius and Gärdenfors, 1991] studied conditionals interpreted specifically over certain classes of neural networks, using a definition of "clamping" a node similar to our notion of intervention. They also observed that some of the core principles of non-monotonic logic fail for that setting. (See in addition [Leitgeb, 2004] for further development of related ideas.)

## 3 Syntax

Let X be a set of atoms X1, X2, X3, . . ., and let Lprop be the language of propositional formulas over atoms in X, closed under disjunction, conjunction, and negation. Let Lint ⊆ Lprop be the language of purely conjunctive, ordered formulas of unique literals, i.e., formulas of the form li1 ∧ · · · ∧ lin, where ij < ij+1 and each lij is either Xij or ¬Xij. Each formula in Lint will specify an intervention by giving fixed values for a fixed list of variables. We also include the empty intervention in Lint. Given ϕ ∈ Lprop, ϕ′ ∈ Lint is the Lint-equivalent of ϕ if ϕ is a propositionally consistent, purely conjunctive formula over literals and ϕ′ results from a reordering of literals and deletion of repeated literals in ϕ. For example, the Lint-equivalent of X2 ∧ X1 ∧ X1 is X1 ∧ X2.

Let Lcond be the language of formulas of the form [α]β for α ∈ Lint, β ∈ Lprop. We call such a formula a subjunctive conditional, and call α the antecedent and β the consequent. The overall causal simulation language L is the language of propositional formulas over atoms in X ∪ Lcond, closed under disjunction, conjunction, and negation. For α, β ∈ L, α → β abbreviates ¬α ∨ β, and α ↔ β denotes (α → β) ∧ (β → α). We use ⟨α⟩ for the dual of [α], i.e., ⟨α⟩β abbreviates ¬[α](¬β).

## 4 Semantics

We now define the semantics of L over causal simulation models. A causal simulation model is a pair (T, x) of a Turing machine T and tape contents represented by a state description x = {xn}n∈ℕ, which specifies binary[^1] values for all tape variables, only finitely many of which can be nonzero. Running T on input x yields a new state description x′ as output, provided the execution halts. We say x |= Xi iff xi = 1 in x. Satisfaction x |= ϕ of ϕ ∈ Lprop is then defined in the familiar way by recursion. For X-atoms we define (T, x) |= Xi iff x |= Xi. Toward a definition of satisfaction for subjunctive conditionals, we now define an intervention (in the same way as in [Icard, 2017]):

Definition 1 (Intervention). An intervention I is a computable function mapping a machine T to a new machine I(T) by taking a set of values {xi}i∈I, with I ⊆ ℕ a finite index set, and holding fixed the value of each Xi to xi throughout the execution of T. That is, I(T) first sets each Xi to xi, then runs T while ignoring any write to any Xi.

Any α ∈ Lint uniquely specifies an intervention, which we denote by Iα: each literal in α gives a tape variable to hold fixed, and the literal's polarity tells us to which value it is to be fixed. Now we define (T, x) |= [α]β iff for all halting executions of Iα(T) on x, the resulting tape satisfies β. Note that for deterministic machines, this means either Iα(T) does not halt on x, or the unique resulting tape satisfies β. The definition also implies that (T, x) |= ⟨α⟩β iff there exists a halting execution of Iα(T) on x whose result satisfies β. Having now defined (T, x) |= ϕ for atoms ϕ ∈ X ∪ Lcond, (T, x) |= ϕ for complex ϕ ∈ L is defined by recursion.

[^1]: The present setting can easily be generalized to the arbitrary discrete setting, indeed without changing the logic. See [Icard, 2017].
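To fix ideas, here is a minimal executable sketch (ours, in Python) of Definition 1 and the satisfaction clauses for [α]β and ⟨α⟩β. We deliberately replace Turing machines with a toy representation of programs as finite lists of non-deterministic steps, so everything below is an illustrative assumption rather than the paper's construction:

```python
def intervene(alpha, program):
    """Defn. 1 in miniature: I_alpha(T) first sets each clamped variable,
    then runs the program while ignoring writes to clamped variables.
    A `program` here is a list of steps; each step maps a state (a dict)
    to a list of successor states (several successors = non-determinism,
    an empty list = that execution diverges)."""
    clamp = dict(alpha)

    def clamped(state):
        frontier = [{**state, **clamp}]        # set each X_i := x_i first
        for step in program:
            frontier = [{**t, **clamp}         # ignore writes to clamped vars
                        for s in frontier for t in step(s)]
        return frontier                        # outputs of halting executions

    return clamped

def box(model, alpha, beta):                   # (T, x) |= [alpha]beta
    T, x = model
    return all(beta(s) for s in intervene(alpha, T)(x))

def dia(model, alpha, beta):                   # (T, x) |= <alpha>beta
    T, x = model
    return any(beta(s) for s in intervene(alpha, T)(x))

# A one-step program that non-deterministically sets X2 to 0 or to 1:
T = [lambda s: [{**s, "X2": 0}, {**s, "X2": 1}]]
m = (T, {"X1": 0, "X2": 0})
assert dia(m, {"X1": 1}, lambda s: s["X2"] == 1)       # <X1>X2 holds
assert not box(m, {"X1": 1}, lambda s: s["X2"] == 1)   # [X1]X2 fails
```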
Interestingly, as revealed by Prop. 1, model checking in this setting is difficult, while satisfiability (or validity) for notable classes of machines is decidable (Thm. 2).

Proposition 1. If α ∧ β is propositionally consistent, then it is undecidable whether (T, x) |= ⟨α⟩β.

Proof Sketch. Under a suitable encoding of natural numbers on the variable tape, the class Tα = {Iα(T) : T ∈ 𝒯}, where 𝒯 is the class of all machines, gives an enumerable list of all the partial recursive functions, with T computably recoverable from T′ ∈ Tα. Moreover, Hβ = {T′ ∈ Tα : T′ halts on input x with output x′ |= β} is extensional and ∅ ≠ Hβ ⊊ Tα, so by the Rice-Myhill-Shapiro Theorem it is undecidable. If we could decide whether (T, x) |= ⟨α⟩β, this would allow us to decide whether T′ = Iα(T) ∈ Hβ.

A second limitative result is that we cannot have strong completeness (that is, completeness relative to arbitrary sets of assumptions), since by Prop. 2 we do not have compactness. On the other hand, our axiom systems (Defn. 3) are weakly complete (complete relative to finite assumption sets).

Proposition 2. The language L interpreted over causal simulation models is not compact.

Proof. Let f : ℕ → ℕ be any uncomputable total function such that f(n) ≠ n for all n, and consider

Ω = {⟨Xn⟩⊤ : n ∈ ℕ} ∪ {[Xn]Xf(n) : n ∈ ℕ} ∪ {[Xn]¬Xm : m, n ∈ ℕ with m ≠ n, m ≠ f(n)}.

If (T, x) satisfies every ϕ ∈ Ω, we could compute f(n) by intervening to set Xn to 1, and checking which other variable Xm is set to 1. As f is total and f(n) ≠ n, we could always find such m = f(n). So Ω is unsatisfiable. But it is easily seen that every finite subset of Ω is satisfiable.

## 5 Axiomatic Systems

We will now identify axiomatic systems (Defn. 3) that are sound and complete with respect to salient classes (Defn. 2) of causal simulation models, by which we mean that they prove all (completeness) and only (soundness) the generally valid principles with respect to those classes.

Definition 2. Let M be the class of all causal simulation models (T, x), where T may be non-deterministic. Let Mdet be the class of models with deterministic T, and let M↓ be the class of models with non-deterministic T that halt on all input tapes and interventions. Also let M↓det = Mdet ∩ M↓.

Definition 3. Below are two rules and four axioms.[^2]

PC. Propositional calculus (over the atoms of L)
RW. From β → β′ infer [α]β → [α]β′
R. [α]α
K. [α](β → γ) → ([α]β → [α]γ)
F. ⟨α⟩β → [α]β
D. [α]β → ⟨α⟩β

AX denotes the system containing axioms R and K and closed under PC and RW. AXdet is AX plus axiom F, AX↓ is AX plus axiom D, and AX↓det is the system combining all of these axioms and rules.

[^2]: We use the standard names from modal and non-monotonic logic. The Left Equivalence rule [Kraus et al., 1990], namely, infer [α]β ↔ [α′]β from α ↔ α′, is not needed: since antecedents belong to Lint, equivalent antecedents are syntactically identical.

For the remainder of this article, fix M∗ to be one of the classes M, Mdet, M↓, or M↓det, and let AX∗ be the respective deductive system of Defn. 3.
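The semantic content of the optional axioms F and D is easy to see in miniature. In the sketch below (ours; we represent a model, under a fixed intervention, simply by the set of outputs of its halting executions), F holds whenever that set has at most one element (determinism), and D whenever it is nonempty (guaranteed halting):

```python
def box(outcomes, beta):   # [alpha]beta: beta at every halting output
    return all(beta(s) for s in outcomes)

def dia(outcomes, beta):   # <alpha>beta: beta at some halting output
    return any(beta(s) for s in outcomes)

beta = lambda n: n % 2 == 0

deterministic = {4}        # at most one halting execution
assert box(deterministic, beta) or not dia(deterministic, beta)   # axiom F

halting = {2, 4, 7}        # at least one halting execution
assert dia(halting, beta) or not box(halting, beta)               # axiom D

diverging = set()          # no halting execution: F holds vacuously, D fails
assert box(diverging, beta) and not dia(diverging, beta)
```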
Theorem 1. AX∗ is sound and complete for validities with respect to the class M∗.

Proof. The soundness of PC, RW, R, and K is straightforward. If M∗ is Mdet (or M↓det), any M ∈ M∗ has at most one halting execution, so a property holding of one execution holds of all, and F is sound. If M∗ is M↓ (or M↓det), then any M has at least one halting execution, so a property holding of all executions holds of one, and D is sound.

As for completeness, it suffices to show that any AX∗-consistent ϕ has a canonical model Mϕ ∈ M∗ satisfying it. Working toward the construction of Mϕ, we prove a normal form result (Lem. 1) that elucidates what is required in order to satisfy ϕ (Lem. 3). We then define simple programming languages (Defn. 4), easily seen to be translatable into Turing machine code, that we employ to construct a program for Mϕ that meets exactly these requirements.

Lemma 1. Any ϕ ∈ L is provably-in-AX (and -AX∗) equivalent to a disjunction of conjunctive clauses, where each clause is of the form

π ∧ ⋀_{i∈I} [αi] ⋁_{j∈Ji} βj ∧ ⋀_{k∈K} ⟨αk⟩βk    (1)

where π ∈ Lprop while βj, βk ∈ Lint for all j ∈ Ji, for all i ∈ I, and for all k ∈ K. We may assume without loss of generality that αi ≠ αi′ for distinct i, i′ ∈ I.

Proof. Note that, provably in AX, [α](β ∧ γ) ↔ [α]β ∧ [α]γ and ⟨α⟩(β ∨ γ) ↔ ⟨α⟩β ∨ ⟨α⟩γ. Use these equivalences together with PC and RW to rewrite ϕ and get the result.

Given a clause δ as in (1), let Sδ ⊆ Lint be the set of Lcond-antecedents appearing in δ. Each δ gives rise to a selection function fδ : Sδ → ℘(Lint) (cf. [Stalnaker, 1968]), obtained (not uniquely) as follows. To give the value of fδ(α), suppose that α = αk for some k ∈ K. If α = αi for some i ∈ I, then α ∧ βk ∧ ⋁_{j∈Ji} βj is consistent: otherwise, [α]⋁_{j∈Ji} βj ∧ ⟨α⟩βk implies ⟨α⟩⊥, which is AX- (and AX∗-) inconsistent. Thus for some j ∈ Ji, α ∧ βk ∧ βj is also consistent. In general α may be αk for multiple k ∈ K. For each such k, we find such a βj. We then set fδ(α) to the set of Lint-equivalents of the α ∧ βk ∧ βj; and we set fδ(α) to the set of Lint-equivalents of the α ∧ βk, if α ≠ αi for any i ∈ I. The remaining case is that α ∈ Sδ but α ≠ αk for any k ∈ K; in this case, set fδ(α) = ∅.

Lemma 2. If AX∗ is AXdet or AX↓det we can assume fδ(α) is a singleton (or possibly empty in the case of AXdet). If AX∗ is AX↓ or AX↓det we can assume that ∅ ∉ range(fδ).

Proof. In AXdet, if ⟨α⟩β1 and ⟨α⟩β2, then because, by F, [α]β1 and [α]β2, and thus [α](β1 ∧ β2), we have ⟨α⟩(β1 ∧ β2). In AX↓ it is always possible to assume that for each i ∈ I there is some j ∈ Ji such that ⟨αi⟩βj appears as a conjunct. So no such αi will be sent to ∅.

Lemma 3. Let δ be a disjunct as in (1). Let M ∈ M. Suppose that M |= π; and, for all α ∈ Sδ, that M |= ⟨α⟩β for each β ∈ fδ(α), that M |= [α]⋁_{β∈fδ(α)} β, and that M |= [α]⊥ whenever fδ(α) = ∅. Then M |= δ.

Proof. We show that M satisfies every conjunct in (1); satisfaction of π is given. For conjuncts ⟨αk⟩βk, for k ∈ K, suppose first that αk ≠ αi for any i ∈ I. Then fδ(αk) = {αk ∧ βk′ : k′ ∈ K such that αk = αk′} (up to Lint-equivalence). If M |= ⟨αk⟩(αk ∧ βk′) then M |= ⟨αk⟩βk′ for all such k′. Thus suppose αk = αi for some i ∈ I. Again by the construction of fδ, we have fδ(αk) = {αk ∧ βk′ ∧ βjk′ : k′ ∈ K such that αk = αk′} for some jk′, where each jk′ ∈ Ji. Then M |= ⟨αk⟩(αk ∧ βk′ ∧ βjk′) implies M |= ⟨αk⟩βk′ for each such k′.

To see that M |= [αi]⋁_{j∈Ji} βj for each i such that αi = αk for some k, note that by the assumption we have M |= [αk]⋁_{j∈J′i}(αk ∧ βk′ ∧ βj) for some J′i ⊆ Ji. Generalizing the disjunction to Ji and distributing it through shows that M |= [αk]⋁_{j∈Ji} βj. Finally, for conjuncts [αi]⋁_{j∈Ji} βj where αi ≠ αk for any k ∈ K, we have fδ(αi) = ∅, so that M |= [αi]⊥. But then M |= [αi]β for any β whatsoever, so such conjuncts are satisfied.

Definition 4. Let PL be a programming language whose programs are the instances of prog in the following grammar:

```
const    ::= 0 | 1
var      ::= X1 | X2 | ... | Xn | ...
cond     ::= var = const | var = var | var != var | cond & cond
assign   ::= var := const | var := var | var := !var
branches ::= prog | branches or branches
prog     ::= ε | assign | prog ; prog | loop
           | if cond then prog else prog end
           | choose branches end
```

(Here ε denotes the empty program.) PLdet will denote the same language except that PLdet excludes choose-statements, PL↓ is identical but for excluding loop-statements, and PL↓det is identical but for excluding both choose- and loop-statements. A program in any of these languages may be compiled to the right type of Turing machine in an obvious way (loop represents an unconditional infinite loop). For the remainder of the article, fix PL∗ to be the programming language of Defn. 4 corresponding to the choice of M∗.
With the normal form result and suitable languages in hand, we proceed to construct the canonical model Mϕ = (Tϕ, xϕ) for ϕ. Mϕ need only satisfy a consistent clause δ as in (1). Intuitively, Mϕ will satisfy the Lprop-atoms in δ via a suitable tape state xϕ (which exists since δ, and a fortiori π, is consistent), and will satisfy each Lcond-atom by dint of a branch in Tϕ, conditional on the antecedent, in which the consequent is made to hold. We now write the PL∗-code of such a Tϕ.

Suppose we are given δ, and that for each α ∈ Sδ we have code Holds_From_Intervention(α) defining a condition that is met iff the program is currently being run under an intervention that fixes α to be true. Then consider a PL∗-program Pϕ that contains one if-statement for each α ∈ Sδ, each executing if Holds_From_Intervention(α) is met. In the body of the if-statement for α, Pϕ has a choose-statement with one branch for each β ∈ fδ(α). The branch for each β consists of a sequence of assignment statements guaranteed to make β hold; call this Make_Hold(β), which clearly exists since each β is satisfiable. If fδ(α) is a singleton, this body contains only Make_Hold(β); if fδ(α) = ∅, then this body consists of a single loop-statement.

If Tϕ is the machine corresponding to Pϕ, and xϕ is a tape state satisfying π, then Mϕ |= ⟨α⟩β for each β ∈ fδ(α), as the program has a halting branch with Make_Hold(β); also, Mϕ |= [α]⋁_{β∈fδ(α)} β, as there are no other halting executions. If fδ(α) = ∅, then Mϕ |= [α]⊥, since under an α-fixing intervention the program reaches a loop-statement and has no halting executions. So by Lem. 3, Mϕ satisfies δ, and thus ϕ. To see that Mϕ ∈ M∗, apply Lem. 2: in AX↓, ∅ ∉ range(fδ), so we have no loop-statements in Pϕ and Mϕ ∈ M↓. In AXdet, we have no choose-statements, so Mϕ ∈ Mdet; in AX↓det, we have neither loop- nor choose-statements, and Mϕ ∈ M↓det.

But how do we know it is possible to write code Holds_From_Intervention(α) by which the program can tell whether it is being run under an α-fixing intervention? For any tape variable, we may try to toggle it. If the attempt succeeds, then the variable is not presently fixed by an intervention. If not, then the present execution is under an intervention fixing the variable. Thus, we first try to toggle each relevant variable. Let N be the maximum index i of any atom Xi appearing in ϕ. Listing 1, call it Is_Intervened(Xi), performs the toggle check for Xi and records the result in Xi+N. It uses Xi+2N as a temporary variable and ultimately leaves the value of Xi unchanged.

Listing 1: Is_Intervened(Xi)
```
Xi+N := Xi ;
Xi := !Xi ;
Xi+2N := Xi ;
if Xi+2N = Xi+N then Xi+N := 1 else Xi+N := 0 end ;
Xi := !Xi
```
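The toggle trick is easy to render concretely. Below is a minimal Python sketch (ours; the Tape class and function names are hypothetical) of the same idea: a write to a clamped cell does not stick, and that failure is detectable from inside the program:

```python
class Tape:
    """Binary tape with some cells clamped by an intervention:
    clamped cells are pre-set and silently absorb all writes (Defn. 1)."""
    def __init__(self, values, clamped):
        self.cells = dict(values)
        self.clamped = dict(clamped)
        self.cells.update(clamped)

    def read(self, i):
        return self.cells.get(i, 0)

    def write(self, i, v):
        if i not in self.clamped:
            self.cells[i] = v

def is_intervened(tape, i):
    """Listing 1 in Python: try to toggle cell i; if the toggle does not
    stick, an intervention is holding i fixed. Restores i afterwards."""
    old = tape.read(i)
    tape.write(i, 1 - old)          # attempted toggle
    fixed = tape.read(i) == old     # did the write stick?
    tape.write(i, old)              # restore (a no-op on clamped cells)
    return fixed

t = Tape({1: 0, 2: 1}, clamped={2: 1})
assert not is_intervened(t, 1) and is_intervened(t, 2)
assert t.read(1) == 0 and t.read(2) == 1   # tape left unchanged
```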
If Is_Intervened(Xi) has already been run for all 1 ≤ i ≤ N, Holds_From_Intervention(α) simply checks that exactly those variables appearing in α have been marked as intervened on, and that these have the correct values. If α is the Lint-equivalent of ¬Xi1 ∧ · · · ∧ ¬Xik ∧ Xik+1 ∧ · · · ∧ Xin, code for Holds_From_Intervention(α) is given in Listing 2.

Listing 2: Holds_From_Intervention(α)
```
Xi1 = 0 & ... & Xik = 0 & Xik+1 = 1 & ... & Xin = 1 &
Xi1+N = 1 & ... & Xik+N = 1 & Xik+1+N = 1 & ... & Xin+N = 1
```

Completing the description of the code of Pϕ adumbrated earlier: Pϕ consists of, in order,

1. one copy of Is_Intervened(Xi) for each 1 ≤ i ≤ N;
2. for each α ∈ Sδ, an if-statement with condition Holds_From_Intervention(α), whose body is:
   (a) a choose-statement with a branch for each β ∈ fδ(α), with body Make_Hold(β), if |fδ(α)| ≥ 2;
   (b) a Make_Hold(β)-snippet, if |fδ(α)| = 1;
   (c) or a single loop-statement, if fδ(α) = ∅.

Note that Pϕ never reads or writes a variable Xi for i > 3N, and the relevant PL∗-operations may be implemented with bounded space, so that we have the following corollary:

Corollary 1. Let M∗,fin be the class of finite-state-machine restrictions of M∗, i.e., those (T, x) ∈ M∗ where T uses only boundedly many tape variables, for any input and intervention. Then Thm. 1 holds also for M∗,fin.

## 6 Computational Complexity

In this section we consider the problem SIM-SAT(ϕ) of deciding whether a given ϕ ∈ L is satisfiable in M∗. Although by Prop. 1 it is in general undecidable whether a given particular simulation model satisfies a formula, we show here that it is decidable whether a given formula is satisfied by any model. In fact, reasoning in this framework is no harder than reasoning in propositional logic:

Theorem 2. SIM-SAT(ϕ) is NP-complete in |ϕ| (where |ϕ| is defined standardly).

Proof. We clearly have NP-hardness, as propositional satisfiability can be embedded directly into L-satisfiability. To see that satisfiability is in NP, we guess an M ∈ M∗ and check whether M |= ϕ. But M∗ is infinite, and the checking step is undecidable by Prop. 1. So how could such an algorithm work? The crucial insight is that we may limit our search to a finite class M∗ϕ of models that are similar to the canonical Mϕ (Lem. 5). Moreover, a nice property of the canonical Tϕ is that it wears its causal structure on its sleeve: one can read off the effect of any intervention from the code of Pϕ, and Pϕ has polynomial size in |ϕ| (implied by Lem. 4). Models in M∗ϕ will share this property, guaranteeing that the checking step can be done in polynomial time. We will now make M∗ϕ precise and prove these claims.

Let Sϕ ⊆ Lint denote the set of Lcond-antecedents appearing in ϕ. For C ∈ ℕ, define PL∗ϕ,C ⊆ PL∗ as the fragment of programs whose code consists of:

1. one copy of Is_Intervened(Xi) (Listing 1), for each 1 ≤ i ≤ N, followed by
2. at most one copy of an if-statement with condition Holds_From_Intervention(α) (Listing 2) for each α ∈ Sϕ, whose body is one and only one of the following options, (a)-(c):
   (a) a choose-statement with at most C·|ϕ| branches, each of which has a body consisting of a single sequence of assignments, which may only be to variables Xi for 1 ≤ i ≤ N;
   (b) a single sequence of assignment statements, only to variables Xi for 1 ≤ i ≤ N;
   (c) a single loop-statement.

However, if PL∗ = PLdet, (a) is not allowed; if PL∗ = PL↓, (c) is not allowed; and if PL∗ = PL↓det, neither (a) nor (c) is allowed.
Lemma 4. The maximum length (defined standardly) of a program in PL∗ϕ,C is polynomial in |ϕ|, and there is a C such that for all ϕ, we have Pϕ ∈ PL∗ϕ,C, assuming Pϕ exists.

Proof. N is O(|ϕ|), so part 1 of a program is O(|ϕ|) in length. There are at most |Sϕ| if-statements in part 2; consider the body of each one. In case (a) it has O(|ϕ|) branches, each of which involves assignment to at most N variables, and thus has length O(|ϕ|²). In case (b) its length is O(|ϕ|); in case (c) its length is O(1). Since |Sϕ| is O(|ϕ|), the total length of part 2 is O(|ϕ|³), so that both parts combined are O(|ϕ|³). To show the existence of C, it suffices to prove: any choose-statement in the body of an if-statement in Pϕ has O(|ϕ|) branches. Now, the number of branches in the if-statement for α is |fδ(α)|, for some consistent δ as in (1). But (1) is a clause of the disjunctive normal form of ϕ and contains no more L-literals than does ϕ, which is of course O(|ϕ|). Since each element of fδ(α) arises from the selection of a literal in (1), the number of branches is O(|ϕ|).

Henceforth let PL∗ϕ denote PL∗ϕ,C for some C guaranteed by Lem. 4, and call XN the set of tape states x in which only tape variables Xi with indices 1 ≤ i ≤ N are possibly nonzero. Let M∗ϕ be the class of models (T, x) where T comes from a PL∗ϕ-program and x ∈ XN. M∗ϕ is finite, and the following lemma guarantees that we may restrict the search to M∗ϕ:

Lemma 5. ϕ is satisfiable with respect to M∗ iff it is satisfiable with respect to M∗ϕ.

Proof. If ϕ is satisfiable in M∗, it is AX∗-consistent by soundness, and hence has a canonical model (Tϕ, xϕ). Without loss of generality take xϕ from Thm. 1 to be in XN. Then by Lem. 4, (Tϕ, xϕ) ∈ M∗ϕ, so ϕ is satisfiable in M∗ϕ.

Now with Lem. 5 our algorithm will guess a program P ∈ PL∗ϕ and a tape x ∈ XN, and verify whether the guessed model M ∈ M∗ϕ satisfies ϕ. We just need to show that the verification step is decidable in polynomial time. Suppose that all negations in ϕ appear only before L-atoms, since any formula may be converted to such a form in linear time. Further, rewrite literals of the form ¬[α]β to ⟨α⟩¬β. Then it suffices to show that we can decide in polynomial time whether M satisfies a given literal in ϕ: there are linearly many of these, and the truth-value of ϕ may be evaluated from their values in linear time. For an X-literal Xi or ¬Xi, we simply output whether or not x |= Xi. For Lcond-literals with antecedent α, simulate the execution of Iα(T) on x. Because P ∈ PL∗ϕ, and such programs trigger at most one Holds_From_Intervention(α) if-statement when run under an intervention, we may perform this simulation by checking whether there is any if-statement for α in P. If so, do one of the following, depending on what its body contains:

(a) If a choose-statement, simulate the result of running each branch. Output true iff: either the literal was [α]β and every resulting tape satisfies β, or the literal was ⟨α⟩β and at least one resulting tape satisfies β.
(b) If an assignment sequence, simulate running it on the current tape, and output true iff the resulting tape satisfies β.
(c) If a loop, output true iff the literal is of the [α]β form.

This algorithm is correct since we thereby capture all halting executions, given that PL∗ϕ-programs conform to the fixed structure above. That it runs in polynomial time follows from the polynomial-length bound of Lem. 4.
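To summarize the verification step concretely, here is a small sketch (ours; the dictionary-based program representation informally mirrors the PL∗ϕ fragment, and exhaustive enumeration over the finite candidate space would stand in for the non-deterministic guess):

```python
def eval_literal(program, tape, kind, alpha, beta):
    """Decide one Lcond-literal ([alpha]beta if kind == 'box', <alpha>beta
    if kind == 'dia') against M = (program, tape), following cases (a)-(c).
    `program` maps an antecedent alpha, a tuple of (var, value) pairs,
    to ('choose', [assignment-dicts]) or ('loop',)."""
    clamp = dict(alpha)
    body = program.get(alpha, ('choose', [{}]))   # no matching if-statement
    if body[0] == 'loop':                         # case (c): no halting runs
        return kind == 'box'
    # cases (a)/(b): run each branch's assignments; clamped writes ignored
    outs = [{**tape, **clamp,
             **{v: x for v, x in asg.items() if v not in clamp}}
            for asg in body[1]]
    return (all if kind == 'box' else any)(beta(o) for o in outs)

# Example: under the intervention X1 := 1, choose between X2 := 1 and X2 := 0.
prog = {(('X1', 1),): ('choose', [{'X2': 1}, {'X2': 0}])}
x = {'X1': 0, 'X2': 0}
assert eval_literal(prog, x, 'dia', (('X1', 1),), lambda s: s['X2'] == 1)
assert not eval_literal(prog, x, 'box', (('X1', 1),), lambda s: s['X2'] == 1)
```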
## 7 Conclusion and Future Work

A very natural way to assess a claim, "if α were true, then β would be true," is to run a simulation in which α is assumed to hold and determine whether β would then follow. Simulations can be built using any number of tools: (probabilistic) programming languages designed specifically for generative models, generative neural networks, and many others. Our formulation of intervention on a simulation program is intended to subsume all such applications where conditional reasoning seems especially useful. We have shown that this general way of interpreting conditional claims has its own distinctive, and quite weak, logic. Due to the generality of the approach, we can validate further familiar axioms by restricting attention to smaller classes of programs (deterministic, always-halting). We believe this work represents an important initial step in providing a foundation for conditional reasoning in these increasingly common contexts. To close, we would like to mention several notable future directions.

Perhaps the most obvious next step is to extend our treatment to richer languages, and in particular to the first-order setting. This is pressing for several reasons. First, much of the motivation for many of the generative frameworks mentioned earlier was to go beyond the propositional setting characteristic of traditional graphical models, for example, to be able to handle unknown (numbers of) objects (see [Poole, 2003; Milch et al., 2005]). Second, much of the work in conditional logic in AI has dealt adequately with the first-order setting by using frameworks based on normality orderings [Delgrande, 1998; Friedman et al., 2000]. It is perhaps a strike against the structural equation approach that no one has shown how to extend it adequately to first-order languages (though see [Halpern, 2000] for partial suggestions). In the present setting, just as we have used a tape data structure to encode a propositional valuation, we could also use such data structures to encode first-order models. The difficult question then becomes how to understand complex (i.e., arbitrary first-order) interventions. We have begun exploring this important extension.

Given the centrality of probabilistic reasoning for many of the aforementioned tools, it is important to consider the probabilistic setting. Adding explicit probability operators in the style of [Fagin et al., 1990] results in a very natural extension of the system [Ibeling, 2018]. One could also use probability thresholds (see, e.g., [Hawthorne and Makinson, 2007]): we might say (T, x) |= [α]β just when Iα(T) results in output satisfying β with at least some threshold probability. (A small sampling-based rendering of this idea is sketched at the end of this section.)

Finally, another direction is to consider additional subclasses of programs, even for the basic propositional setting we have studied here. For example, in some contexts it makes sense to assume that variables are time-indexed and that no variable depends on any variable at a later point in time (as in dynamic Bayesian networks [Dean and Kanazawa, 1989]). In this setting there are no cyclic dependencies, which means we do not have programs like that in Example 1. Understanding the logic of such classes would be worthwhile, especially for further comparison with central classes of structural equation models (such as the recursive models of [Pearl, 2009]).
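For concreteness, here is a minimal sketch (ours; the toy rain/sprinkler program, the names, and the threshold semantics are illustrative assumptions rather than a construction from the paper) of such a thresholded conditional, estimated by sampling a probabilistic simulation program:

```python
import random

def sample_run(clamp):
    """One run of a toy probabilistic simulation under an intervention:
    clamped variables are fixed first and writes to them are ignored."""
    s = {"rain": random.random() < 0.3, "sprinkler": False, "wet": False}
    s.update(clamp)
    if "sprinkler" not in clamp and not s["rain"]:
        s["sprinkler"] = random.random() < 0.5
    if "wet" not in clamp:
        s["wet"] = s["rain"] or s["sprinkler"]
    return s

def box_at_least(clamp, beta, theta, n=10_000):
    """Estimate whether [alpha]beta holds at threshold theta, i.e. whether
    I_alpha(T) outputs a beta-satisfying state with probability >= theta."""
    hits = sum(beta(sample_run(clamp)) for _ in range(n))
    return hits / n >= theta

# "If it were raining, the ground would be wet," at threshold 0.95:
print(box_at_least({"rain": True}, lambda s: s["wet"], 0.95))
```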
## Acknowledgments

Duligur Ibeling is supported by the Sudhakar and Sumithra Ravi Family Graduate Fellowship in the School of Engineering at Stanford University.

## References

[Balkenius and Gärdenfors, 1991] Christian Balkenius and Peter Gärdenfors. Nonmonotonic inferences in neural networks. In Proceedings of KR, 1991.

[Bottou et al., 2013] Léon Bottou, Jonas Peters, Joaquin Quiñonero-Candela, Denis X. Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, and Ed Snelson. Counterfactual reasoning and learning systems: The example of computational advertising. Journal of Machine Learning Research, 14:3207–3260, 2013.

[de Raedt and Kimmig, 2015] Luc de Raedt and Angelika Kimmig. Probabilistic (logic) programming concepts. Machine Learning, 100(1):5–47, 2015.

[Dean and Kanazawa, 1989] Thomas Dean and Keiji Kanazawa. A model for reasoning about persistence and causation. Computational Intelligence, 5(2):142–150, 1989.

[Delgrande, 1998] James P. Delgrande. On first-order conditional logics. Artificial Intelligence, 105:105–137, 1998.

[Dubois and Prade, 1991] Didier Dubois and Henri Prade. Fuzzy sets in approximate reasoning, Part 1: Inference with possibility distributions. Fuzzy Sets and Systems, 40(1):182–224, 1991.

[Fagin et al., 1990] Ronald Fagin, Joseph Y. Halpern, and Nimrod Megiddo. A logic for reasoning about probabilities. Information and Computation, 87:78–128, 1990.

[Friedman et al., 2000] Nir Friedman, Joseph Y. Halpern, and Daphne Koller. First-order conditional logic for default reasoning revisited. ACM Transactions on Computational Logic, 1(2):175–207, 2000.

[Ginsberg, 1986] Matthew L. Ginsberg. Counterfactuals. Artificial Intelligence, 30:35–79, 1986.

[Goodman et al., 2008] Noah D. Goodman, Vikash K. Mansinghka, Daniel Roy, Keith Bonawitz, and Joshua B. Tenenbaum. Church: a language for generative models. In Proc. 24th UAI, 2008.

[Goodman et al., 2015] Noah D. Goodman, Joshua B. Tenenbaum, and Tobias Gerstenberg. Concepts in a probabilistic language of thought. In Eric Margolis and Stephan Laurence, editors, The Conceptual Mind: New Directions in the Study of Concepts. MIT Press, 2015.

[Halpern, 2000] Joseph Y. Halpern. Axiomatizing causal reasoning. Journal of AI Research, 12:317–337, 2000.

[Halpern, 2013] Joseph Y. Halpern. From causal models to counterfactual structures. Review of Symbolic Logic, 6(2):305–322, 2013.

[Hawthorne and Makinson, 2007] James Hawthorne and David Makinson. The quantitative/qualitative watershed for rules of uncertain inference. Studia Logica, 86(2):247–297, 2007.

[Ibeling, 2018] Duligur Ibeling. Causal modeling with probabilistic simulation models. Manuscript, 2018.

[Icard, 2017] Thomas F. Icard. From programs to causal models. In Alexandre Cremers, Thom van Gessel, and Floris Roelofsen, editors, Proceedings of the 21st Amsterdam Colloquium, pages 35–44, 2017.

[Kocaoglu et al., 2017] Murat Kocaoglu, Christopher Snyder, Alexandros G. Dimakis, and Sriram Vishwanath. CausalGAN: Learning causal implicit generative models with adversarial training. Unpublished manuscript: https://arxiv.org/abs/1709.02023, 2017.

[Kraus et al., 1990] Sarit Kraus, Daniel Lehmann, and Menachem Magidor. Nonmonotonic reasoning, preferential models and cumulative logics. Artificial Intelligence, 44(2):167–207, 1990.
[Leitgeb, 2004] Hannes Leitgeb. Inference on the Low Level: An Investigation into Deduction, Nonmonotonic Reasoning, and the Philosophy of Cognition. Kluwer, 2004.

[Lewis, 1973] David Lewis. Counterfactuals. Harvard University Press, 1973.

[Liang and Jordan, 2008] Percy Liang and Michael I. Jordan. An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators. In Proc. 25th ICML, 2008.

[Makinson, 1994] David Makinson. General patterns in nonmonotonic reasoning. In D. Gabbay et al., editors, Handbook of Logic in Artificial Intelligence and Logic Programming, volume III, pages 35–110. OUP, 1994.

[Meek and Glymour, 1994] Christopher Meek and Clark Glymour. Conditioning and intervening. The British Journal for the Philosophy of Science, 45:1001–1021, 1994.

[Milch et al., 2005] Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, and Andrey Kolobov. BLOG: Probabilistic models with unknown objects. In Proc. 19th IJCAI, pages 1352–1359, 2005.

[Mirza and Osindero, 2014] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. Manuscript: https://arxiv.org/abs/1411.1784, 2014.

[Newell and Simon, 1961] Allen Newell and Herbert A. Simon. Computer simulation of human thinking. Science, 134(3495):2011–2017, 1961.

[Pearl, 2009] Judea Pearl. Causality. CUP, 2009.

[Pfeffer and Koller, 2000] Avi Pfeffer and Daphne Koller. Semantics and inference for recursive probability models. In Proc. 17th AAAI, pages 538–544, 2000.

[Poole, 2003] David Poole. First-order probabilistic inference. In Proc. 18th IJCAI, 2003.

[Sloman, 2005] Steven A. Sloman. Causal Models: How We Think About the World and its Alternatives. OUP, 2005.

[Stalnaker, 1968] Robert Stalnaker. A theory of conditionals. American Philosophical Quarterly, pages 98–112, 1968.

[Wachter et al., 2018] Sandra Wachter, Brent Mittelstadt, and Chris Russell. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law and Technology, 2018.