# Equivalent Causal Models

Sander Beckers
Munich Center for Mathematical Philosophy
Ludwig Maximilian University, Munich
srekcebrednas@gmail.com

The aim of this paper is to offer the first systematic exploration and definition of equivalent causal models in the context where both models are not made up of the same variables. The idea is that two models are equivalent when they agree on all essential causal information that can be expressed using their common variables. I do so by focussing on the two main features of causal models, namely their structural relations and their functional relations. In particular, I define several relations of causal ancestry and several relations of causal sufficiency, and require that the most general of these relations are preserved across equivalent models.

1 Introduction

Causal models are becoming ever more important and their domains of application are growing rapidly, both within science (most notably AI) and in philosophy. As with all forms of modeling, there usually exist many causal models that can be used to describe a given system of interest. Since different models can lead to widely divergent conclusions, it is of paramount importance to have some method to compare models and group together those that are sufficiently similar. To do so we need to be able to establish when two causal models are equivalent. The aim of this paper is to offer the first systematic exploration and definition of equivalent causal models.

Of course what makes for a sensible notion of equivalence varies across different contexts. For example, there exists a well-known notion of equivalence that is used in the epistemically limited context where one is learning a causal model from data. However, such Markov equivalence (as it is called) does not apply in the context where two causal models are fully specified and do not consist of the same variables. Such contexts are widespread in science, but are of particular concern when trying to capture stable, model-independent, features of the world. Furthermore, exploring equivalence in such an omniscient context offers a firm conceptual starting point for generalizing to contexts that involve approximation and uncertainty in future work.

Before diving into the details, I spell out the underlying intuition of both the notion to be developed and the method by which I shall do so. The aim is to formalize a suitable definition of causal model-equivalence that is relative to a set of variables that the models have in common: the idea is that two models are equivalent when they agree on all essential causal information that can be expressed using these common variables. Obviously this will require motivating what I take to be (in)essential and why I choose to focus merely on common variables.

My method focusses on the observation that causal models are characterized using two types of relations: structural relations, i.e., having to do with the causal graph, and functional relations, i.e., having to do with how particular values of variables are determined by values of other variables. I develop two notions of equivalence, one for each type. Causal equivalence will then be defined as being both structurally and functionally equivalent.

The details are as follows. The most basic structural relation is that of one variable $X$ being an ancestor of another variable $Y$.
I generalize this relation along three dimensions. First, I take into account the specific values $x, x'$ and $y, y'$ of the variables that are responsible for the ancestry. Second, I consider ancestry for sets of variables $\vec{X}$ and $\vec{Y}$. Third, I distinguish between potential ancestors, which are relative to a causal model, and actual ancestors, which are relative to a causal model and a specific context. Structural equivalence is then defined as having identical potential and actual ancestors.

The most basic functional relation is that of one set of variable values $\vec{x}$ being causally sufficient for another set $\vec{y}$. I first look into spelling out an appropriate notion of causal sufficiency, resulting in direct sufficiency. I then generalize this relation by defining sufficiency as its transitive closure. Functional equivalence is defined as having identical sufficiency relations.

The structure of this paper is as follows: the next section introduces the structural equations framework. I then offer a motivation and clarification of the problem that this paper is concerned with in Section 3. Section 4 focusses on the causal structure and develops a notion of structural equivalence. In Section 5 I shift focus to functional equivalence, by considering what makes for a good definition of causal sufficiency. Causal equivalence is then defined as the combination of both forms of equivalence. Section 6 compares functional equivalence to Halpern's notion of conservative extension. Section 7 discusses other related work and reflects on possible future directions.

2 Structural Equation Models

This section reviews the definition of causal models as understood in the structural modeling tradition started by (Pearl 2000), mostly using the notation and terminology from (Halpern 2016a).

Definition 1 A signature $\mathcal{S}$ is a tuple $(\mathcal{U}, \mathcal{V}, \mathcal{R})$, where $\mathcal{U}$ is a set of exogenous variables, $\mathcal{V}$ is a set of endogenous variables, and $\mathcal{R}$ is a function that associates with every variable $Y \in \mathcal{U} \cup \mathcal{V}$ a nonempty set $\mathcal{R}(Y)$ of possible values for $Y$ (i.e., the set of values over which $Y$ ranges). If $\vec{X} = (X_1, \ldots, X_n)$, $\mathcal{R}(\vec{X})$ denotes the cross-product $\mathcal{R}(X_1) \times \cdots \times \mathcal{R}(X_n)$.

Exogenous variables represent factors whose causal origins are outside the scope of the causal model, such as background conditions and noise. The values of the endogenous variables, on the other hand, are causally determined by other variables within the model (both endogenous and exogenous).

Definition 2 A causal model $M$ is a pair $(\mathcal{S}, \mathcal{F})$, where $\mathcal{S}$ is a signature and $\mathcal{F}$ defines a function that associates with each endogenous variable $X$ a structural equation $F_X$ giving the value of $X$ in terms of the values of other endogenous and exogenous variables. Formally, the equation $F_X$ maps $\mathcal{R}(\mathcal{U} \cup \mathcal{V} \setminus \{X\})$ to $\mathcal{R}(X)$, so $F_X$ determines the value of $X$, given the values of all the other variables in $\mathcal{U} \cup \mathcal{V}$.

We call a setting $\vec{u} \in \mathcal{R}(\mathcal{U})$ of values of exogenous variables a context. The value of $Y$ may depend on the values of only a few other variables. $Y$ depends on $X$ if there is some context $\vec{u}$ and some setting of the endogenous variables other than $Y$ and $X$ such that varying the value of $X$ in that context results in a variation in the value of $Y$; that is, there is a setting $\vec{z}$ of the endogenous variables other than $Y$ and $X$ and values $x$ and $x'$ of $X$ such that $F_Y(x, \vec{z}, \vec{u}) \neq F_Y(x', \vec{z}, \vec{u})$. (Or, in case $X \in \mathcal{U}$, $F_Y(x, \vec{z}, \vec{u}_{-X}) \neq F_Y(x', \vec{z}, \vec{u}_{-X})$, where $\vec{u}_{-X}$ is the restriction of $\vec{u}$ to $\mathcal{U} \setminus \{X\}$.) We then say that $X$ is a parent of $Y$.
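To make Definitions 1 and 2 and the parent relation concrete, here is a minimal sketch in Python (my own illustration, not part of the paper). The dictionary encoding, the toy model with equations A = C, B = C, E = A + B, and the helper name `depends_on` are all assumptions made for this sketch; the check simply enumerates contexts and settings exactly as in the definition above.

```python
from itertools import product

# Signature (Definition 1): exogenous U, endogenous V, and ranges R,
# encoded as dicts from variable names to tuples of possible values.
U = {"C": (0, 1)}
V = {"A": (0, 1), "B": (0, 1), "E": (0, 1, 2)}

# Structural equations F (Definition 2): one function per endogenous variable,
# reading the values of all other variables from a dict.
# The illustration model has A = C, B = C and E = A + B.
F = {
    "A": lambda vals: vals["C"],
    "B": lambda vals: vals["C"],
    "E": lambda vals: vals["A"] + vals["B"],
}

def depends_on(Y, X):
    """Y depends on X (X is a parent of Y) iff for some context and some
    setting of the variables other than X and Y, two values of X give
    different values of F_Y."""
    others = [W for W in list(U) + list(V) if W not in (X, Y)]
    for setting in product(*[U.get(W, V.get(W)) for W in others]):
        vals = dict(zip(others, setting))
        outcomes = {F[Y]({**vals, X: x}) for x in U.get(X, V.get(X))}
        if len(outcomes) > 1:
            return True
    return False

print(depends_on("E", "A"))  # True: A is a parent of E
print(depends_on("E", "C"))  # False: C influences E only via A and B
print(depends_on("A", "C"))  # True: C is a parent of A
```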
It will prove useful later on to have some notation to capture the details of the foregoing situation: we write $(x, x') \rightsquigarrow (y, y')$ and say that $(\vec{u}, \vec{z})$ is a witness to this. (In case $X \in \mathcal{U}$, the witness is of the form $(\vec{u}_{-X}, \vec{z})$. For the sake of notational simplicity we leave this nuance implicit.)

In this paper we restrict attention to strongly recursive (or acyclic) models, that is, models where there is a partial order $\preceq$ on variables such that if $Y$ depends on $X$, then $X \prec Y$. (All definitions could easily be generalized to recursive models, i.e., models where the partial order may depend on the context.) In a strongly recursive model, given a context $\vec{u}$, the values of all the endogenous variables are determined (we can just solve for the value of the variables in the order given by $\preceq$). We often write the equation for an endogenous variable as $Y = f(\vec{X})$; this denotes that the value of $Y$ depends only on the values of the variables in $\vec{X}$, and the connection is given by the function $f$. For example, we might have $Y = X + 5$.

An intervention has the form $\vec{X} \leftarrow \vec{x}$, where $\vec{X}$ is a set of endogenous variables. Intuitively, this means that the values of the variables in $\vec{X}$ are set to the values $\vec{x}$. The structural equations define what happens in the presence of interventions. Setting the value of some variables $\vec{X}$ to $\vec{x}$ in a causal model $M = (\mathcal{S}, \mathcal{F})$ results in a new causal model, denoted $M_{\vec{X} \leftarrow \vec{x}}$, which is identical to $M$, except that $\mathcal{F}$ is replaced by $\mathcal{F}^{\vec{X} \leftarrow \vec{x}}$: for each variable $Y \notin \vec{X}$, $F^{\vec{X} \leftarrow \vec{x}}_Y = F_Y$ (i.e., the equation for $Y$ is unchanged), while for each $X'$ in $\vec{X}$, the equation $F_{X'}$ for $X'$ is replaced by $X' = x'$ (where $x'$ is the value in $\vec{x}$ corresponding to $X'$).

Given a signature $\mathcal{S} = (\mathcal{U}, \mathcal{V}, \mathcal{R})$, an atomic formula is a formula of the form $X = x$, for $X \in \mathcal{V}$ and $x \in \mathcal{R}(X)$. A causal formula (over $\mathcal{S}$) is one of the form $[Y_1 \leftarrow y_1, \ldots, Y_k \leftarrow y_k]\varphi$, where $\varphi$ is a Boolean combination of atomic formulas, $Y_1, \ldots, Y_k$ are distinct variables in $\mathcal{V}$, and $y_i \in \mathcal{R}(Y_i)$ for each $1 \leq i \leq k$. Such a formula is abbreviated as $[\vec{Y} \leftarrow \vec{y}]\varphi$. The special case where $k = 0$ is abbreviated as $\varphi$. Intuitively, $[Y_1 \leftarrow y_1, \ldots, Y_k \leftarrow y_k]\varphi$ says that $\varphi$ would hold if $Y_i$ were set to $y_i$, for $i = 1, \ldots, k$.

A causal formula $\psi$ is true or false in a causal setting, which is a causal model given a context. As usual, we write $(M, \vec{u}) \models \psi$ if the causal formula $\psi$ is true in the causal setting $(M, \vec{u})$. The $\models$ relation is defined inductively. $(M, \vec{u}) \models X = x$ if the variable $X$ has value $x$ in the unique (since we are dealing with recursive models) solution to the equations in $M$ in context $\vec{u}$ (i.e., the unique vector of values that simultaneously satisfies all equations in $M$ with the variables in $\mathcal{U}$ set to $\vec{u}$). The truth of conjunctions and negations is defined in the standard way. Finally, $(M, \vec{u}) \models [\vec{Y} \leftarrow \vec{y}]\varphi$ if $(M_{\vec{Y} \leftarrow \vec{y}}, \vec{u}) \models \varphi$.
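Continuing the sketch above, interventions and the satisfaction relation for causal formulas can be mimicked as follows. This is my own illustration, not the paper's code; `solve`, `models`, and the encoding of $\varphi$ as a predicate on the solved values are assumed names and conventions.

```python
U = {"C": (0, 1)}
V = {"A": (0, 1), "B": (0, 1), "E": (0, 1, 2)}
F = {
    "A": lambda vals: vals["C"],
    "B": lambda vals: vals["C"],
    "E": lambda vals: vals["A"] + vals["B"],
}
ORDER = ("A", "B", "E")  # any order compatible with the partial order

def solve(F, context, intervention, order=ORDER):
    """Unique solution of the intervened model M_{X <- x} in context u:
    intervened variables keep their set values, all others follow their equations."""
    vals = {**context, **intervention}
    for X in order:
        if X not in intervention:
            vals[X] = F[X](vals)
    return vals

def models(F, context, intervention, phi):
    """(M, u) |= [X <- x] phi, with phi a predicate on the solved values."""
    return phi(solve(F, context, intervention))

print(models(F, {"C": 1}, {}, lambda w: w["E"] == 2))                # True
print(models(F, {"C": 1}, {"A": 0}, lambda w: w["E"] == 1))          # True
print(models(F, {"C": 1}, {"A": 0, "B": 0}, lambda w: w["E"] == 0))  # True
```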
3 Motivation

There are many different contexts in which one might consider one causal model $M_1$ to be equivalent to another model $M_2$. A crucial dimension along which we can separate these contexts is by looking at the kind of relation they require to hold between the sets of variables of both models.

A first context, that this paper is not directly concerned with, is the one in which both models are required to have identical variables. It is within this context that most causal search algorithms take place. The output of such an algorithm is usually a partially directed graph that represents the Markov Equivalence Class. Markov equivalence means that two models have identical observational probability distributions. Yet two Markov equivalent models only partially agree about the ancestral relations, and need not agree at all about the structural equations.

This paper is concerned with a second context, namely the one in which we are comparing two causal models only with regard to (a subset of) their common variables $\vec{V}$. In this context, the primary interest of the modeler lies in the causal relations between these common variables. We are looking for a notion of equivalence that is strong enough such that when confronted with a choice between two models $M_1$ and $M_2$ that contain (at least) all of these variables, we can state that if $M_1$ and $M_2$ are equivalent with regards to $\vec{V}$ then they agree on all the essential causal relations that hold between the common variables, and thus one is free to choose either model.

This notion of equivalence matters because we want to be able to draw stable conclusions about the causal relations between the variables of interest: no matter what further variables we discover to be causally connected to the variables of interest, and no matter what variables that we are not interested in are being marginalized away, as long as the resulting causal model is equivalent to the original one the conclusions about their common variables remain identical.

As a consequence, the goal of this paper is to find a much stronger notion of equivalence than any notion that could be useful in the first context, one that gets much closer to identity (which is of course the strongest possible notion of equivalence). In particular, any difference between two equivalent causal models in this strong sense should be entirely explained by the fact that the two models do not have identical variable sets. A fortiori, when applying this notion of equivalence to models with identical variable sets, it should reduce to identity.

For example, say we have $M_1$ consisting of the equation $E = C$ and $M_2$ consisting of equations $E = D$ and $D = C$. If we are exclusively interested in the variables $E$ and $C$, then any sensible notion of equivalence that allows for different variable sets should consider both models to be equivalent with regards to $E$ and $C$. Of course this does not mean that it is impossible to state a causal relation between $C$ and $E$ over which both models disagree: one could simply count the number of edges it takes to get from $C$ to $E$, and in that respect the two models clearly are different. Yet I assume that such numerical properties are not essential. If such properties were essential, then it seems impossible to come up with any useful notion of equivalence, for presumably one can always zoom in and inject a hitherto ignored intermediate variable along any edge. Likewise, the number of distinct paths going from one variable to another is also deemed to be inessential for the purposes of this paper. I make do with these intuitive considerations regarding numerical properties and refrain from giving a precise definition.

Lastly, a brief word about a third context that one could consider for the relation between the sets of variables of two models. A natural suggestion is to also compare two models such that we have a mapping $\tau: \mathcal{R}_2(\mathcal{V}_2) \to \mathcal{R}_1(\mathcal{V}_1)$ between the values of the variables of both models. Indeed, that was the context investigated in previous work (Beckers and Halpern 2019; Beckers, Eberhardt, and Halpern 2019). Yet there the particular notion of equivalence that we used was left entirely implicit, and thus other more interesting notions were never considered. So in that respect the current paper is more general. (This will be discussed in Section 6.)

Note also that the third and second context overlap: if $M_2$ is an extension of $M_1$, meaning that $\mathcal{V}_1 \subseteq \mathcal{V}_2$, then one can consider the special case where $\tau$ is the identity-mapping for the common variables and marginalization for the others, i.e., $\tau(\vec{v}_2) = \vec{v}_1$ for all $\vec{v}_2 \in \mathcal{R}_2(\mathcal{V}_2)$, where $\vec{v}_1$ is the restriction of $\vec{v}_2$ to $\mathcal{R}_1(\mathcal{V}_1)$. Therefore an interesting line of future work is to generalize the notion of equivalence here developed to the third context by considering equivalence for any mapping $\tau$.
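As a small illustration of this identity-plus-marginalization mapping (my own sketch, with `marginalize` an assumed helper name, not anything from the paper): removing the intermediate variable D from the extended model E = D, D = C by substituting its equation into its children recovers, on the common variables, the behaviour of the smaller model E = C.

```python
def marginalize(F, Z):
    """Remove the endogenous variable Z by substituting its equation into the
    equations of the remaining variables. A sketch for acyclic models only."""
    g = F[Z]
    marginalized = {}
    for X, f in F.items():
        if X == Z:
            continue
        # the child's equation now recomputes Z's value from Z's own equation
        marginalized[X] = lambda vals, f=f: f({**vals, Z: g(vals)})
    return marginalized

# Extended model M2: D = C, E = D.
M2 = {"D": lambda vals: vals["C"], "E": lambda vals: vals["D"]}
# Marginalizing D yields a model whose only equation behaves like E = C.
M1 = marginalize(M2, "D")
print(M1["E"]({"C": 1}), M1["E"]({"C": 0}))   # 1 0
print(list(M1))                               # ['E']
```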
4 Structural Equivalence

This section focuses on the causal structure of the causal model. Several useful concepts are introduced in this regard, which are then combined to suggest a definition of structural equivalence.

Section 2 introduced the concept of one variable being a parent of another. It is customary to further generalize this genealogical terminology by taking ancestor to be the transitive closure of the parent relation, i.e., $X$ is an ancestor of $Y$ if there are variables $V_1, \ldots, V_n$ such that $X$ is a parent of $V_1$, $V_1$ is a parent of $V_2$, ..., and $V_n$ is a parent of $Y$. By treating variables as nodes and drawing directed edges from parents to their children, each strongly recursive causal model induces a unique DAG (Directed Acyclic Graph) that captures the causal structure of the model. We say that $X$ is an ancestor of $Y$ if $X$ is an ancestor of $Y$ in this DAG.

This graph by itself offers only a rough characterization of all the causal information that the model contains: it tells you which variable can depend on some other variable, but it does not provide details of this dependence. In particular, it does not say which values of some variable depend on what values of some other variable. Such details can be added by refining the standard genealogical relations so that they also take into account the specific values which instantiate the parent/ancestral relations.

(Throughout the rest of the paper we assume that $M_1 = ((\mathcal{U}, \mathcal{V}_1, \mathcal{R}_1), \mathcal{F}_1)$, $M_2 = ((\mathcal{U}, \mathcal{V}_2, \mathcal{R}_2), \mathcal{F}_2)$, and $\vec{V} \subseteq (\mathcal{V}_1 \cap \mathcal{V}_2)$. We further assume that $X$ and $Y$ are distinct members of $\vec{V}$ unless stated otherwise, $x, x'$ are distinct values in $\mathcal{R}(X)$, and likewise for $y, y'$. Similarly, $\vec{X}$ and $\vec{Y}$ are distinct, possibly overlapping, subsets of $\vec{V}$, and $\vec{x}, \vec{x}'$ are non-overlapping sets of values in $\mathcal{R}(\vec{X})$. For brevity, we often leave the variables implicit, writing $x$ instead of $X = x$, for example. All definitions assume that $\vec{X} \subseteq \vec{V}$, but this is merely for notational simplicity. The case where $\vec{X}$ (partly) lies in $\mathcal{U}$ is left implicit.)

Definition 3 We say that $x$ rather than $x'$ is a potential parent of $y$ rather than $y'$ if there exists a witness $(\vec{u}, \vec{z})$ for $(x, x') \rightsquigarrow (y, y')$. (Note that potential parenthood is a symmetric relation for the values of the variables: $(x, x') \rightsquigarrow (y, y')$ iff $(x', x) \rightsquigarrow (y', y)$. This is no longer true for actual parenthood, introduced later on.)

The ancestral generalization is straightforwardly defined as the transitive closure. We use the following notation for potential ancestry: $(x, x') \rightsquigarrow_p (y, y')$, where $p = (X, V_1, \ldots, V_n, Y)$ is called a directed path from $X$ to $Y$.
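The following brute-force sketch (mine, not the paper's; the binary toy model and the helper name `potential_parent` are assumptions made for illustration) makes Definition 3 concrete by searching for a witness. The last call shows that no witness lets a single change in one parent move E between 2 and 0 in this model.

```python
from itertools import product

# Illustration model: A = C, B = C, E = A + B (C exogenous and binary,
# E ranges over {0, 1, 2}).
U = {"C": (0, 1)}
V = {"A": (0, 1), "B": (0, 1), "E": (0, 1, 2)}
F = {"A": lambda v: v["C"], "B": lambda v: v["C"], "E": lambda v: v["A"] + v["B"]}

def potential_parent(X, x, x_alt, Y, y, y_alt):
    """Search for a witness (u, z): a context plus a setting of all endogenous
    variables other than X and Y under which X = x gives Y = y and
    X = x_alt gives Y = y_alt (Definition 3)."""
    others = [W for W in list(U) + list(V) if W not in (X, Y)]
    for setting in product(*[U.get(W, V.get(W)) for W in others]):
        witness = dict(zip(others, setting))
        if (F[Y]({**witness, X: x}) == y and
                F[Y]({**witness, X: x_alt}) == y_alt):
            return witness
    return None

print(potential_parent("A", 1, 0, "E", 2, 1))  # {'C': 0, 'B': 1}: 1_A rather than
                                               # 0_A is a potential parent of 2_E rather than 1_E
print(potential_parent("A", 1, 0, "E", 2, 0))  # None: no single change of A
                                               # moves E between 2 and 0
```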
Should potential ancestral relations be preserved among equivalent causal models? As such this relation contains crucial causal information, namely that there is a stepwise connection between changing some value of $X$ and changing a value of $Y$. But the potential ancestral relation also contains more specific information that is numeric in nature: it states that this connection is given by a single path. As mentioned before, I consider it a mistake to focus on numerical properties. Indeed, consider the following example.

Example 1 Assume the range of $E$ is $\{0, 1, 2\}$ and all other variables are binary. (Throughout all examples, all variables are binary unless otherwise stated.) The original model $M_1$ consists simply of the equation $E = 2 \cdot C$. The extended model $M_2$ consists of $A = C$, $B = C$, and $E = A + B$.

Intuitively, I believe that these models should come out as equivalent regarding $C$ and $E$. Yet although $C$ is an ancestor of $E$ in both models, $(1_C, 0_C) \rightsquigarrow_p (2_E, 0_E)$ for some path $p$ only in $M_1$. (The subscripts indicate to which variables the values correspond.) Still, the essential property that there exists a stepwise connection between $(1_C, 0_C)$ and $(2_E, 0_E)$ is preserved in $M_2$; it is just that this connection now consists of simultaneously considering two paths rather than one: $(1_C, 0_C) \rightsquigarrow (1_A, 1_B; 0_A, 0_B) \rightsquigarrow (2_E, 0_E)$.

In other words, we can generalize the potential ancestral relation to allow for steps between sets of variable values rather than steps between single variable values. Graphical relations are not naturally equipped to deal with sets of values, but functional relations are. So in order to make this precise we first express the single-path potential ancestral relations using functional relations, i.e., relations expressed in terms of which formulas are true under certain interventions.

Fact 1 $(x, x') \rightsquigarrow (y, y')$ with witness $(\vec{u}, \vec{z})$ in $M$ iff $(M, \vec{u}) \models [\vec{Z} \leftarrow \vec{z}, X \leftarrow x]\, Y = y \,\wedge\, [\vec{Z} \leftarrow \vec{z}, X \leftarrow x']\, Y = y'$.

As was pointed out in Section 2, in strongly recursive models the parents of a variable suffice to determine its value. Therefore we can always restrict the witness to only include the context and a variable's endogenous parents. In fact, sometimes we can restrict it even further. For example, if the equation for $Y$ is $Y = (A \wedge B) \vee C$, then $(0_A, 0_C)$ suffices to determine that $Y = 0$ and $(0_A, 1_C)$ suffices to determine that $Y = 1$. This allows us to conclude that $(0_C, 1_C) \rightsquigarrow (0_Y, 1_Y)$ with just using $(0_A)$ as a witness. Generalizing the notion of a witness so that it need not consist of values for all other variables allows for comparing potential parenthood across models with different signatures.

We make this precise by defining what it means for a set of variable values to be directly sufficient for another set of variable values. Informally, a set is sufficient precisely when the outcome is independent of the values of variables not in the set.

Definition 4 We say that $\vec{x}$ is directly sufficient for $\vec{y}$ if for all settings $\vec{z}$ of the other endogenous variables and all contexts $\vec{u}$ we have that $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}, \vec{Z} \leftarrow \vec{z}]\, \vec{Y} = \vec{y}$. (If $\vec{X}_{\mathcal{U}} = \vec{X} \cap \mathcal{U}$ is not empty, the quantification is restricted to contexts $\vec{u}$ whose restriction to $\vec{X}_{\mathcal{U}}$ agrees with $\vec{x}$.)

Now that we have identified the crucial functional relation out of which the potential parent relation is built up, we can generalize this parent relation to sets of variable values.

Definition 5 We say that $\vec{x}$ rather than $\vec{x}'$ are potential joint parents of $\vec{y}$ rather than $\vec{y}'$ if there is a set $\vec{Z} \subseteq \mathcal{V} \cup \mathcal{U}$ and a setting $\vec{z} \in \mathcal{R}(\vec{Z})$ such that $(\vec{z}, \vec{x})$ is directly sufficient for $\vec{y}$, $(\vec{z}, \vec{x}')$ is directly sufficient for $\vec{y}'$, and $\vec{X}$ is minimal (i.e., at least one of the previous statements is false when considering any strict subset of $\vec{X}$). We notate this as $(\vec{x}, \vec{x}') \rightsquigarrow (\vec{y}, \vec{y}')$ and say that $\vec{z}$ is a witness.

It is trivial to see that if restricted to singletons, Definition 5 reduces to Definition 3. We demand $\vec{X}$ to be minimal in order to avoid calling redundant elements parents. (For example, if we have as structural equation $Y = A$, then we do not want to call $(1_A, 1_B)$ joint parents of $Y = 1$: $B = 1$ is entirely redundant here.)
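Here is a brute-force sketch of Definition 4 on the toy model $Y = (A \wedge B) \vee C$ just discussed. This is my own illustration; treating A, B, and C as exogenous and the helper name `directly_sufficient` are assumptions made for the sketch (with a single endogenous variable, Y's equation can be evaluated directly instead of solving the whole model).

```python
from itertools import product

# Toy model: A, B, C exogenous and binary, Y = (A and B) or C endogenous.
U = {"A": (0, 1), "B": (0, 1), "C": (0, 1)}
V = {"Y": (0, 1)}
F = {"Y": lambda v: (v["A"] and v["B"]) or v["C"]}

def directly_sufficient(xs, ys):
    """Definition 4: the values xs (a dict) are directly sufficient for ys iff
    [X <- x, Z <- z] Y = y holds for every setting z of the remaining
    endogenous variables and every context that agrees with the exogenous
    part of xs."""
    others = [W for W in list(U) + list(V) if W not in xs and W not in ys]
    for setting in product(*[U.get(W, V.get(W)) for W in others]):
        vals = {**dict(zip(others, setting)), **xs}
        if any(F[Y](vals) != y for Y, y in ys.items()):
            return False
    return True

print(directly_sufficient({"A": 0, "C": 0}, {"Y": 0}))  # True
print(directly_sufficient({"A": 0, "C": 1}, {"Y": 1}))  # True
print(directly_sufficient({"A": 0}, {"Y": 0}))          # False: C is left free
```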
We demand that $\vec{X}$ and $\vec{Y}$ are non-identical in order to avoid self-parenthood. But it would be too much to demand that their intersection is empty. To see why, we consider a slight variant on Example 1.

Example 2 All variables are as before. The original model $M_1$ consists simply of the equation $E = 2 \cdot C$. The extended model $M_2$ consists of $A = C$ and $E = A + C$.

As before, I believe that these models should come out as equivalent regarding $C$ and $E$. But in order to establish that $(1_C, 0_C) \rightsquigarrow_p (2_E, 0_E)$ for some path $p$ in $M_2$, we need to break up the single step into $(1_C, 0_C) \rightsquigarrow (1_C, 1_A; 0_C, 0_A) \rightsquigarrow (2_E, 0_E)$.

Now that we have generalized the potential parenthood relation along the dimension of sets of variables (which is already built into functional relations), we can take the transitive closure to further generalize this relation along the dimension that is characteristic of the ancestral relation (but is lacking from many functional relations).

Definition 6 We say that $\vec{x}$ rather than $\vec{x}'$ are potential joint ancestors of $\vec{y}$ rather than $\vec{y}'$ if there exist $\vec{V}_1, \ldots, \vec{V}_n$ and settings $\vec{v}_i, \vec{v}_i' \in \mathcal{R}(\vec{V}_i)$ such that $(\vec{v}_i, \vec{v}_i') \rightsquigarrow (\vec{v}_{i+1}, \vec{v}_{i+1}')$ for each $i \in \{1, \ldots, n-1\}$, $(\vec{x}, \vec{x}') \rightsquigarrow (\vec{v}_1, \vec{v}_1')$, and $(\vec{v}_n, \vec{v}_n') \rightsquigarrow (\vec{y}, \vec{y}')$. We call the tuple $n = (\vec{X}, \vec{V}_1, \ldots, \vec{V}_n, \vec{Y})$ a network from $\vec{X}$ to $\vec{Y}$, and say that $\vec{x}$ rather than $\vec{x}'$ are potential joint ancestors of $\vec{y}$ rather than $\vec{y}'$ along $n$. We notate this as $(\vec{x}, \vec{x}') \rightsquigarrow_n (\vec{y}, \vec{y}')$.

Preservation of potential joint ancestors deals with Examples 1 and 2, but it does not suffice for our notion of equivalence, because it entirely ignores that much causal information is context-dependent. The above defined ancestral relations are all general properties of the causal model $M$, in the sense that they do not make any reference to an actual context $\vec{u}$. Yet causal models also capture causal information regarding specific contexts. Indeed, it is such information that is at stake when speaking of actual causation, an important concept that has benefitted greatly from the advent of causal models. A good notion of structural equivalence should take this into account as well. Therefore we also define a context-specific counterpart of the potential ancestral relations.

Definition 7 We say that $x$ rather than $x'$ is an actual parent of $y$ rather than $y'$ w.r.t. $(M, \vec{u})$ if $(x, x') \rightsquigarrow (y, y')$ with a witness $(\vec{u}, \vec{z})$ such that $(M, \vec{u}) \models \vec{Z} = \vec{z} \wedge X = x$. We notate this as $(x, x') \rightsquigarrow^{\vec{u}} (y, y')$.

Both the generalization of actual parenthood to sets of variable values and the analogous ancestral counterpart are defined straightforwardly. I only explicitly give the combination of both generalizations, which is the context-relative counterpart of Definition 6.

Definition 8 We say that $\vec{x}$ rather than $\vec{x}'$ are actual joint ancestors of $\vec{y}$ rather than $\vec{y}'$ w.r.t. $(M, \vec{u})$ if $(\vec{x}, \vec{x}') \rightsquigarrow_n (\vec{y}, \vec{y}')$ for some $n$ such that the exogenous parts of all the witnesses are contained in $\vec{u}$ and all the endogenous parts $\vec{z}$ are such that $(M, \vec{u}) \models \vec{Z} = \vec{z} \wedge \vec{X} = \vec{x}$. We notate this as $(\vec{x}, \vec{x}') \rightsquigarrow^{\vec{u}}_n (\vec{y}, \vec{y}')$.
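A sketch of Definition 7 (my own, not the paper's): for actual parenthood the only admissible witness is the one that agrees with what actually happens in $(M, \vec{u})$, so the check is direct. The overdetermination-style model with lightning L, match M and fire F = L ∨ M, and all helper names, are assumptions for illustration only.

```python
def solve(eqs, vals, order):
    vals = dict(vals)
    for W in order:
        if W not in vals:
            vals[W] = eqs[W](vals)
    return vals

def actual_parent(eqs, order, U, V, context, X, x, x_alt, Y, y, y_alt):
    """Definition 7, sketched: the witness must agree with the actual solution
    of (M, u) on the endogenous variables other than X and Y, and X = x as
    well as Y = y must actually hold; only then is the potential-parent step tested."""
    actual = solve(eqs, context, order)
    if actual[X] != x or actual[Y] != y:
        return False
    witness = {Z: actual[Z] for Z in V if Z not in (X, Y)}
    with_x = solve(eqs, {**context, **witness, X: x}, order)
    with_x_alt = solve(eqs, {**context, **witness, X: x_alt}, order)
    return with_x[Y] == y and with_x_alt[Y] == y_alt

# Illustrative overdetermination model: lightning L and match M each suffice
# for the fire F = L or M (UL, UM exogenous).
U = {"UL": (0, 1), "UM": (0, 1)}
V = {"L": (0, 1), "M": (0, 1), "F": (0, 1)}
EQS = {"L": lambda v: v["UL"], "M": lambda v: v["UM"],
       "F": lambda v: v["L"] or v["M"]}
ORDER = ("L", "M", "F")

# Lightning without a match: L = 1 is an actual parent of F = 1.
print(actual_parent(EQS, ORDER, U, V, {"UL": 1, "UM": 0}, "L", 1, 0, "F", 1, 0))  # True
# With both present, the actual witness M = 1 blocks the change, so here L is
# a potential but not an actual parent of F.
print(actual_parent(EQS, ORDER, U, V, {"UL": 1, "UM": 1}, "L", 1, 0, "F", 1, 0))  # False
```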
The following example illustrates why preservation of actual joint ancestors should be combined with the preservation of potential joint ancestors to get structural equivalence.

Example 3 Model $M_1$ contains $E = B \vee D$, whereas model $M_2$ contains $E = (A \wedge \neg C) \vee B \vee D$. Both models contain $A = C$, $B = \neg C$, and $D = \neg C$. Note that both models have identical signatures.

It is easy to verify that for each context both models will have the same actual joint ancestors. Yet they disagree on the potential ancestors, for $(1_A, 0_A) \rightsquigarrow (1_E, 0_E)$ in $M_2$ (with witness $(0_C, 0_B, 0_D)$) and not so in $M_1$. The two models are indeed different, for they disagree on whether $A$ is an ancestor of $E$, and thus do not even have the same DAG.

We are finally ready to define what it means for the structural causal information to be preserved across causal models. (As a reminder, an equivalence relation is defined as being reflexive, symmetric, and transitive. It is easy to see that Definition 9 satisfies those properties.)

Definition 9 $M_1$ and $M_2$ are structurally equivalent relative to $\vec{V}$ if for all $\vec{X} \subseteq \mathcal{U} \cup \vec{V}$, $\vec{Y} \subseteq \vec{V}$, and $\vec{x}, \vec{x}', \vec{y}, \vec{y}'$ the following two conditions hold:

- $(\vec{x}, \vec{x}') \rightsquigarrow_n (\vec{y}, \vec{y}')$ for some $n$ in $M_1$ iff $(\vec{x}, \vec{x}') \rightsquigarrow_m (\vec{y}, \vec{y}')$ for some $m$ in $M_2$.
- For all $\vec{u} \in \mathcal{R}(\mathcal{U})$ it holds that $(\vec{x}, \vec{x}') \rightsquigarrow^{\vec{u}}_n (\vec{y}, \vec{y}')$ for some $n$ in $M_1$ iff $(\vec{x}, \vec{x}') \rightsquigarrow^{\vec{u}}_m (\vec{y}, \vec{y}')$ for some $m$ in $M_2$.

As mentioned, an interesting special case occurs when we are comparing two models with regards to all of their common variables, i.e., $\vec{V} = \mathcal{V}_1 \cap \mathcal{V}_2$. In that case we simply say that $M_1$ and $M_2$ are structurally equivalent. (Note that contrary to Definition 9, this is not a transitive relation, so here we are using the term equivalence loosely.)

The following variation on Example 3 illustrates that structural equivalence does not suffice to ensure that equivalence of models with identical signatures reduces to identity.

Example 4 In $M_1$ we have $E = (A \wedge \neg C \wedge F) \vee B \vee D \vee (G \wedge F)$, whereas in $M_2$ we have $E = (A \wedge \neg C \wedge G) \vee B \vee D \vee (G \wedge F)$. Both models contain $A = C$, $B = \neg C$, $F = C$, $G = C$, and $D = \neg C$.

The reader can confirm that in addition to agreeing on all actual joint ancestral relations, as was the case with Example 3, they also agree on all potential joint ancestors, and are thus structurally equivalent. (Note that they sometimes disagree about the witnesses for the potential relations: $(1_F, 0_F) \rightsquigarrow (1_E, 0_E)$ with witness $(1_A, 0_C, 0_B, 0_D, 0_G)$ only in $M_1$, and likewise with $F$ and $G$ reversed for $M_2$.) Intuitively the models are not equivalent, because they disagree on whether it is $F$ or $G$ that can contribute to $E = 1$ together with $A$ and $C$.

The next section develops several notions of equivalence which do give the desired outcome for cases with identical signature, and argues for the superiority of one of them.

5 Functional Equivalence

Now we switch attention to the basic functional relations that are captured by causal models, meaning those relations capturing how the values of variables are determined by the values of other variables. Recall that if $\vec{x}$ is directly sufficient for $\vec{y}$ (Def. 4), this means precisely that $\vec{x}$ fully determines the variables $\vec{Y}$ taking on the values $\vec{y}$. So the functional relations between a set of variables are specified completely by stating all relations of direct sufficiency that hold between their values. A naive suggestion might therefore be to define functional equivalence as having identical relations of direct sufficiency for all sets only containing variables that are common to both models.

However, that would give an overly strong notion of equivalence, for it means that the only way of extending a model is by adding children to childless variables. The reason is that the moment one adds a new parent $A$ to a set of existing parents $\vec{X}$ of a variable $Y$ then, by definition of being a parent, at least one setting $\vec{x}$ will cease being directly sufficient for some $y$. To illustrate, even the trivial models we considered as a sanity check in Section 3 would not be considered equivalent. Recall that $M_2$ is made up of $E = D$ and $D = C$, and $M_1$ consists simply of $E = C$.
Given that $C = c$ is directly sufficient for $E = e$ only in the latter, these models would not come out as equivalent for the variables $\{E, C\}$. The problem is that, as the name suggests, direct sufficiency focusses on parent-children relations, and is thus a numerical property. Instead, we generalize the notion of direct sufficiency so that it also captures sufficiency for descendants that are not children.

Recall that we generalized ancestral relations by incorporating a feature of functional relations, namely that it also applies to sets of variable values and not just to single variables. Here we make the symmetrical counterpart of that move: we generalize the functional relation of direct sufficiency by incorporating a feature of ancestral relations, namely its transitivity.

Definition 10 We say that $\vec{x}$ is sufficient for $\vec{y}$ if there exist $\vec{w}_0, \ldots, \vec{w}_n$ such that for each $i$, $\vec{w}_i$ is directly sufficient for $\vec{w}_{i+1}$, where $\vec{x} = \vec{w}_0$ and $\vec{y} = \vec{w}_n$.

This gives rise to our notion of functional equivalence. As with structural equivalence, we say that $M_1$ and $M_2$ are functionally equivalent if they are functionally equivalent relative to $\vec{V} = \mathcal{V}_1 \cap \mathcal{V}_2$.

Definition 11 $M_1$ and $M_2$ are functionally equivalent relative to $\vec{V}$ if for all $\vec{X} \subseteq \vec{V} \cup \mathcal{U}$, $\vec{Y} \subseteq \vec{V}$ and $\vec{x}, \vec{y}$, it holds that $\vec{x}$ is sufficient for $\vec{y}$ in $M_1$ iff $\vec{x}$ is sufficient for $\vec{y}$ in $M_2$.

It is easy to see that functional equivalence deals with Example 4. For example, $(1_A, 0_C, 1_F, 0_B, 0_D, 0_G)$ is sufficient for $E = 1$ in $M_1$ but not in $M_2$. Nonetheless, the following example illustrates why functional equivalence by itself does not suffice to guarantee structural equivalence.

Example 5 $M_1$ consists simply of $Y = D$ and $X = D$. $M_2$ on the other hand consists of $Y = (V \wedge X) \vee D$, $V = D$, and $X = D$.

It is easy to verify that the models are functionally equivalent. Yet both models disagree about whether $X$ is an ancestor of $Y$, and thus are not structurally equivalent. We therefore define causal equivalence as the combination of structural and functional equivalence.

Definition 12 $M_1$ and $M_2$ are causally equivalent relative to $\vec{V}$ if they are both structurally and functionally equivalent relative to $\vec{V}$. $M_1$ and $M_2$ are causally equivalent if they are both structurally and functionally equivalent.

6 Weak Equivalence

We have come across two sensible notions of causal sufficiency already (Defs. 4 and 10). The following definition also offers a very natural notion of causal sufficiency.

Definition 13 We say that $\vec{x}$ is weakly sufficient for $\vec{y}$ w.r.t. $(M, \vec{u})$ if $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}]\, \vec{Y} = \vec{y}$. (If $\vec{X}_{\mathcal{U}} = \vec{X} \cap \mathcal{U}$ is not empty, then the restriction of the context $\vec{u}$ to $\vec{X}_{\mathcal{U}}$ has to agree with $\vec{x}$.)

Direct sufficiency required $\vec{x}$ to guarantee $\vec{y}$ regardless of the context and of the values of the other variables. Weak sufficiency on the other hand merely requires that $\vec{x}$ helps in getting $\vec{y}$ (even if only by not preventing it), but it does not guarantee it: it relies on the other variables to play nice and take on their usual values. Note that direct sufficiency implies sufficiency, and sufficiency implies weak sufficiency. Therefore one might wonder whether defining equivalence using weak sufficiency could offer a plausible alternative to functional equivalence.
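The three notions can be compared on the two-equation model just recalled. The sketch below (mine; the encoding and helper names are assumed, not the paper's) shows that $C = 1$ is not directly sufficient for $E = 1$, but is sufficient via the chain through $D$ (Definition 10), and weakly sufficient in the context $C = 1$ (Definition 13).

```python
from itertools import product

# Chain model: C exogenous, D = C, E = D.
U = {"C": (0, 1)}
V = {"D": (0, 1), "E": (0, 1)}
F = {"D": lambda v: v["C"], "E": lambda v: v["D"]}
ORDER = ("D", "E")

def solve(context, intervention):
    vals = {**context, **intervention}
    for X in ORDER:
        if X not in intervention:
            vals[X] = F[X](vals)
    return vals

def directly_sufficient(xs, ys):
    """Definition 4: quantify over all settings of the other endogenous
    variables and all contexts agreeing with the exogenous part of xs."""
    others = [W for W in list(U) + list(V) if W not in xs and W not in ys]
    exo = {W: xs[W] for W in xs if W in U}
    endo = {W: xs[W] for W in xs if W in V}
    for setting in product(*[U.get(W, V.get(W)) for W in others]):
        s = dict(zip(others, setting))
        context = {**{W: s[W] for W in s if W in U}, **exo}
        intervention = {**{W: s[W] for W in s if W in V}, **endo}
        sol = solve(context, intervention)
        if any(sol[Y] != y for Y, y in ys.items()):
            return False
    return True

def weakly_sufficient(xs, ys, context):
    """Definition 13: evaluate in the given context only, intervening just on
    the endogenous part of xs (the context must agree with its exogenous part)."""
    sol = solve(context, {W: xs[W] for W in xs if W in V})
    return all(sol[Y] == y for Y, y in ys.items())

print(directly_sufficient({"C": 1}, {"E": 1}))   # False: the intermediate D is left free
print(directly_sufficient({"C": 1}, {"D": 1}))   # True
print(directly_sufficient({"D": 1}, {"E": 1}))   # True
# Chaining the two True steps: C = 1 is sufficient for E = 1 (Definition 10).
print(weakly_sufficient({"C": 1}, {"E": 1}, {"C": 1}))   # True (Definition 13)
```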
(Halpern 2016b) considers precisely this definition, be it that he limits attention to the special case where $M_2$ is an extension of $M_1$, i.e., a case where $\mathcal{V}_1 \subseteq \mathcal{V}_2$. Informally, his definition states that both models agree on all formulas consisting solely of variables that they have in common.

Definition 14 $M_2$ is a conservative extension of $M_1$ if for all $\vec{X}, \vec{Y} \subseteq \mathcal{V}_1$, all $\vec{x}, \vec{y}$, and all $\vec{u}$, it holds that $\vec{x}$ is weakly sufficient for $\vec{y}$ w.r.t. $(M_1, \vec{u})$ iff $\vec{x}$ is weakly sufficient for $\vec{y}$ w.r.t. $(M_2, \vec{u})$.

Halpern calls the extended model a conservative extension precisely because he takes it to be intuitive that the larger model preserves all relevant causal information of the smaller model (as far as their mutual variables go, of course). I define weak equivalence as the generalization of Definition 14 to all sets of common variables $\vec{V}$.

Definition 15 $M_1$ and $M_2$ are weakly equivalent relative to $\vec{V}$ if for all $\vec{X} \subseteq \mathcal{U} \cup \vec{V}$, $\vec{Y} \subseteq \vec{V}$, all $\vec{x}, \vec{y}$, and all $\vec{u}$, it holds that $\vec{x}$ is weakly sufficient for $\vec{y}$ w.r.t. $(M_1, \vec{u})$ iff $\vec{x}$ is weakly sufficient for $\vec{y}$ w.r.t. $(M_2, \vec{u})$.

As before, we say that $M_1$ and $M_2$ are weakly equivalent if they are weakly equivalent relative to all of their common variables. Just as functional equivalence, weak equivalence captures important features that are ignored by structural equivalence. In particular, it meets our demand that it reduces to identity when applied to models with identical variable sets (and thus also deals with Example 4). Yet functional equivalence is stronger than weak equivalence.

Theorem 1 Given $M_1$ and $M_2$, it holds that if $M_1$ and $M_2$ are functionally equivalent relative to $\vec{V}$ then they are weakly equivalent relative to $\vec{V}$.

Proof: Assume $M_1$ and $M_2$ are functionally equivalent relative to $\vec{V}$. Say $\vec{x}$ is weakly sufficient for $\vec{y}$ w.r.t. $(M_1, \vec{u})$, where $\vec{X} \subseteq \mathcal{U} \cup \vec{V}$ and $\vec{Y} \subseteq \vec{V}$. (The reasoning for $M_2$ is identical.) We first show that $(\vec{u}, \vec{x})$ is sufficient for $\vec{y}$ in $M_1$. Given strong recursivity, $(\vec{u}, \vec{x})$ is directly sufficient for some setting $\vec{a}_1$, where $\vec{A}_1 \subseteq \mathcal{V}_1$ are those variables that only depend on the exogenous variables $\mathcal{U}$ and on the variables in $\vec{X}$. If $\vec{Y} \subseteq \vec{A}_1$, then either $\vec{y} = \vec{a}_1$ or $\vec{a}_1$ is directly sufficient for $\vec{y}$, and in both cases the result follows. If not, we can continue this reasoning by observing that $(\vec{u}, \vec{x}, \vec{a}_1)$ is directly sufficient for some setting $\vec{a}_2$, where $\vec{A}_2 \subseteq \mathcal{V}_1$ are those variables that only depend on the variables in $\mathcal{U} \cup \vec{X} \cup \vec{A}_1$. Therefore eventually we arrive at some $\vec{a}_n$ so that $\vec{Y} \subseteq \vec{A}_n$ and $(\vec{u}, \vec{x})$ is sufficient for $\vec{a}_n$ in $M_1$. Because $M_1$ and $M_2$ are functionally equivalent relative to $\vec{V}$, we also have that $(\vec{u}, \vec{x})$ is sufficient for $\vec{y}$ in $M_2$, from which it follows that $\vec{x}$ is weakly sufficient for $\vec{y}$ w.r.t. $(M_2, \vec{u})$.

The following interesting result illustrates that the dichotomy between structural and functional relations is not strict: weak equivalence (and thus also functional equivalence) is not entirely independent from structural equivalence. Concretely, the ancestor relation is preserved when moving from a causal model to a conservative extension. (Example 5 illustrates that the reverse is not true.)

Theorem 2 Given models $M_1$ and $M_2$ so that $M_2$ is a conservative extension of $M_1$, for any variables $X, Y \in \mathcal{V}_1$, we have that if $X$ is an ancestor of $Y$ in $M_1$ then $X$ is an ancestor of $Y$ in $M_2$. In other words, the DAG of $M_1$ is a sub-DAG of $M_2$ when ignoring all the variables not in $\mathcal{V}_1$.

Proof: Assume $X$ is an ancestor of $Y$ in $M_1$. Because of the transitivity of the ancestral relation, we may assume that $X$ is a parent of $Y$ in $M_1$, i.e., there exist $x, x', y, y'$ and a witness $(\vec{u}, \vec{z})$ with $\vec{Z} = \mathcal{V}_1 \setminus \{X, Y\}$ so that $(x, x') \rightsquigarrow (y, y')$. By Fact 1 this means that $(M_1, \vec{u}) \models [\vec{Z} \leftarrow \vec{z}, X \leftarrow x]\, Y = y \wedge [\vec{Z} \leftarrow \vec{z}, X \leftarrow x']\, Y = y'$. Because $M_2$ is a conservative extension of $M_1$, we also have $(M_2, \vec{u}) \models [\vec{Z} \leftarrow \vec{z}, X \leftarrow x]\, Y = y \wedge [\vec{Z} \leftarrow \vec{z}, X \leftarrow x']\, Y = y'$. Say $\vec{A}$ are the parents of $Y$ in $M_2$ excluding $X$.
Thus any setting of $(\vec{A}, X)$ combined with the context is directly sufficient for some value of $Y$. If all variables in $\vec{A}$ get the same values under both interventions, i.e., there exists a setting $\vec{a} \in \mathcal{R}(\vec{A})$ such that $(M_2, \vec{u}) \models [\vec{Z} \leftarrow \vec{z}, X \leftarrow x]\, \vec{A} = \vec{a} \wedge [\vec{Z} \leftarrow \vec{z}, X \leftarrow x']\, \vec{A} = \vec{a}$, then $(\vec{u}, \vec{a})$ is a witness for the fact that $X$ is also a parent of $Y$ in $M_2$. If they do not get the same values, then there exists an $A_1 \in \vec{A}$ which gets different values $a_1$ and $a_1'$ under the respective interventions. In other words, $(M_2, \vec{u}) \models [\vec{Z} \leftarrow \vec{z}, X \leftarrow x]\, A_1 = a_1 \wedge [\vec{Z} \leftarrow \vec{z}, X \leftarrow x']\, A_1 = a_1'$. We are thus in the same situation for $A_1$, a parent of $Y$, as we were for $Y$. Therefore we can repeat the reasoning from above by looking at the parents of $A_1$ in $M_2$ excluding $X$. Given strong recursivity, eventually we have to find a variable $A_i$ such that $X$ is a parent of $A_i$, $A_i$ is a parent of $A_{i-1}$, ..., and $A_1$ is a parent of $Y$.

Despite this result and the intuitive appeal of the idea that equivalence comes down to comparing formulas made up solely of common variables, the following example shows that weak equivalence is too weak to preserve all causal information that I would consider as relevant, and therefore is not suitable to replace functional equivalence.

Example 6 The model $M_2$ contains the following equations: $E = B \vee C \vee F$, $B = A \wedge \neg C$, $C = D$, $A = D$, and $F = C$. If we remove variable $B$ we get the model $M_1$ where the equation for $E$ has become $E = A \vee C \vee F$.

The reader can verify that the models are both weakly and structurally equivalent. Yet the role of $A$ in both models is intuitively quite different: if $C = 1$, then $A = 1$ and $C = 1$ fulfill entirely symmetrical roles in bringing about $E = 1$ according to $M_1$, which is clearly not the case according to $M_2$. Functional equivalence formalizes this intuition: $A = 1$ is sufficient for $E = 1$ in $M_1$ and not in $M_2$.

7 Other Related Work and Future Directions

(Bongers et al. 2016) offer a definition of equivalence after marginalization that generalizes Halpern's notion of conservative extension to cyclic and non-deterministic models. Their work is highly complementary to mine: contrary to them, I restrict myself to acyclic deterministic models, and contrary to me, they restrict themselves to using just this one type of equivalence (i.e., just as Halpern, they only consider Definition 14).

The naive suggestion mentioned earlier of defining functional equivalence using direct sufficiency is present implicitly in (Beckers and Halpern 2019). (In the terminology we used there, it comes down to stating that $M_1$ is a constructive abstraction of $M_2$ for the identity-marginalization mapping discussed in Section 3.) (Rubenstein et al. 2017) also give such a definition (see their Theorem 9). However, as mentioned in Section 3, the focus of both articles lies with the third rather than the second context. As a result, neither of them discusses the issue of equivalence, nor do they consider any of the other forms of equivalence here introduced.

In future work I intend to generalize my approach to the third context by combining my notions of equivalence with the various notions of abstraction from (Beckers and Halpern 2019). Simply put, the idea would be to require for some causal relations that they hold between $\vec{x}$ and $\vec{y}$ in $M_2$ iff they hold between $\tau(\vec{x})$ and $\tau(\vec{y})$ in $M_1$, where $\tau$ is an abstraction mapping. (What makes this non-trivial is the fact that $\tau$ is defined as mapping a setting $\vec{v}_2 \in \mathcal{R}_2(\mathcal{V}_2)$ to some $\vec{v}_1 \in \mathcal{R}_1(\mathcal{V}_1)$, and thus one has to come up with a sensible generalization to settings $\vec{x}$ and $\vec{y}$ of arbitrary subsets of the variables.)
As an additional step, I can make use of the work from (Beckers, Eberhardt, and Halpern 2019) to define a notion of approximate equivalence.

Another promising extension concerns the topic of actual causation, i.e., the causal relation that holds between particular events, as opposed to types of events. I intend to use the current approach to analyse existing definitions of actual causation and to propose novel ones, by assuming that a good definition ought to be stable across equivalent models: whether $\vec{x}$ causes $\vec{y}$ in some context $\vec{u}$ should not depend on which of two equivalent models one uses. I take the idea of such an analysis from (Gallow forthcoming), whose work offered the impetus for the current project. He offers a very strong sufficient condition for equivalence, but does not develop a notion of equivalence. (The interested reader can easily verify that his sufficient condition indeed suffices to meet the criteria of my definition of causal equivalence.)

The notion of actual joint ancestry (Definition 8) will be central to this extension. It was inspired by a similar but subtly different one from (Gallow forthcoming), who presents it as the basis for a stable definition of actual causation. (For the interested reader, here is an example for which his definition offers a different verdict. The equations are $E = (A \vee B) \wedge C$ and $B = A \wedge C$, and we consider a context in which $A = 1$ and $C = 1$. Then $(1_A, 1_C; 0_A, 0_C) \rightsquigarrow^{\vec{u}} (1_A, 1_B; 0_A, 0_B) \rightsquigarrow^{\vec{u}} (1_E, 0_E)$, and thus $(A = 1, C = 1)$ are actual joint ancestors of $E = 1$. Gallow's definition involves a minimality condition applied to the entire network, as opposed to applied to each step between parents and children. Since also $(1_C, 0_C) \rightsquigarrow^{\vec{u}} (1_E, 0_E)$, $(A = 1, C = 1)$ is not minimal and therefore he does not consider it an actual cause.) (His full definition adds to this a focus on the default/deviant distinction between values of variables, which is fairly common in work on actual causation. This distinction and his use of it could easily be added to my analysis, and thus stands orthogonal to it.) Therefore the current paper can be seen as offering a partial formal vindication of his approach (modulo the subtle differences between the definitions).

But I believe there is still room for improvement. For example, actual joint ancestry is (by definition) a transitive relation, whereas nowadays there is a broad consensus that actual causation is not transitive. For a more concrete and intuitive objection, consider a model with equation $E = (C \wedge B) \vee (\neg C \wedge A)$, and a context such that $C = 1$, $B = 1$, and $A = 1$. Then $(1_A, 1_C; 0_A, 0_C) \rightsquigarrow^{\vec{u}} (1_E, 0_E)$. However, it seems strange to consider $A = 1$ to be a part of an actual cause of $E = 1$, given that the disjunct it appears in was false.

To sum up, I investigated when two causal models ought to be considered equivalent relative to (a subset of) their common variables. I proposed an answer by looking at the two fundamental features of a causal model, namely its structural and its functional relations. Although the discussion has made clear that this definition satisfies certain intuitive properties, I do not take it to offer the last word. There may very well be further intricacies that the current definition overlooks, which could lead to more refined notions of equivalence.

Acknowledgments

Many thanks to Dmitri Gallow, Joe Halpern, and the AAAI reviewers for comments on an earlier version of this paper. This research was made possible by funding from the Alexander von Humboldt Foundation.

References
Beckers, S.; Eberhardt, F.; and Halpern, J. Y. 2019. Approximate Causal Abstraction. In Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence (UAI 2019).

Beckers, S.; and Halpern, J. Y. 2019. Abstracting Causal Models. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, volume 33, 2678-2685.

Bongers, S.; Peters, J.; Schölkopf, B.; and Mooij, J. M. 2016. Structural Causal Models: Cycles, Marginalizations, Exogenous Reparameterizations and Reductions. URL: https://arxiv.org/abs/1611.06221v1.

Gallow, J. D. Forthcoming. A Model-Invariant Theory of Causation. Philosophical Review. Available on PhilArchive: https://philarchive.org/archive/GALCPA-8 (last accessed: 03/12/2021).

Halpern, J. Y. 2016a. Actual Causality. MIT Press.

Halpern, J. Y. 2016b. Appropriate Causal Models and the Stability of Causation. Review of Symbolic Logic 9(1): 76-102.

Pearl, J. 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press. ISBN 0-521-77362-8.

Rubenstein, P. K.; Weichwald, S.; Bongers, S.; Mooij, J. M.; Janzing, D.; Grosse-Wentrup, M.; and Schölkopf, B. 2017. Causal Consistency of Structural Equation Models. In Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI 2017).