# Causal Identification under Markov Equivalence: Completeness Results

Amin Jaber (1), Jiji Zhang (2), Elias Bareinboim (1)

Abstract. Causal effect identification is the task of determining whether a causal distribution is computable from the combination of an observational distribution and substantive knowledge about the domain under investigation. One of the most studied versions of this problem assumes that the knowledge is articulated in the form of a fully known causal diagram, which is arguably a strong assumption in many settings. In this paper, we relax this requirement and consider that the knowledge is articulated in the form of an equivalence class of causal diagrams, in particular, a partial ancestral graph (PAG). This is attractive because a PAG can be learned directly from data, and the scientist does not need to commit to a particular, unique diagram. There are different sufficient conditions for identification in PAGs, but none is complete. We derive a complete algorithm for identification given a PAG. This implies that whenever the causal effect is identifiable, the algorithm returns a valid identification expression; alternatively, it throws a failure condition, which means that the effect is provably not identifiable. We further provide a graphical characterization of non-identifiability of causal effects in PAGs.

1. Introduction

One cognitive feature that arguably distinguishes humans from other species is our ability to learn, process, and use causal information. Pearl recently highlighted this point eloquently: Some tens of thousands of years ago, humans began to realize that certain things cause other things and that tinkering with the former can change the latter. ... From this discovery came organized societies, then towns and

(1) Department of Computer Science, Purdue University, West Lafayette, USA. (2) Department of Philosophy, Lingnan University, NT, HK.
Correspondence to: Amin Jaber. Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

cities, and eventually the science and technology-based civilization we enjoy today. All because we asked a simple question: Why? (Pearl & Mackenzie, 2018).

At the center of causal reasoning lies the idea of tinkering: what would happen if reality were different? This idea is formally materialized through cause and effect relations. Fisher proposed a procedure to physically manipulate reality such that an outcome variable can be evaluated under different conditions, a procedure now called the randomized controlled experiment (Fisher, 1951). This method is indeed one of the most pervasive techniques used throughout the sciences, and it is often deemed the gold standard for causal inference. For instance, the FDA's drug approval process is conducted following Fisher's method: one can discover, for example, the effect of a drug (X) on survival (Y), which is written in causal language as the experimental distribution P(y | do(x)), or P_x(y) for short.

Causal Identification. The infeasibility of always physically manipulating reality to see what would happen leads to one of the central challenges in causal inference, known as the problem of identification of causal effects (Pearl, 2000; Spirtes et al., 2001; Bareinboim & Pearl, 2016). It is well understood that no causal claim about P_x(y) can be made directly from the observational distribution P(V), where V is the set of measured (observed) variables (Pearl, 2000, Ch. 3). The idea is then to combine a coarse description of the underlying system, usually specified as a causal diagram (D), with the observational data (P(V)) in order to infer the causal distribution P_x(y). A sample causal diagram is shown in Fig.
1a, where the nodes represent random variables, directed edges represent direct causal relations from tails to heads, and bi-directed arcs represent the presence of unobserved (latent) variables that generate spurious associations between the observed variables, also known as confounding bias (Pearl, 1993). One may seek to identify the effect of forcing variable X to take the value x, i.e., do(X=x), on V4=v4, i.e., P_x(v4), given the causal diagram in Fig. 1a and data from the observational distribution P(X, V1, ..., V4). In summary, the causal identifiability problem asks whether the combination of D and P(V) allows the identification of P_x(y); as in any decision problem, the answer is sometimes negative, which happens whenever the causal diagram does not warrant a transformation of P_x(y) into a functional of P(V). Identification has been extensively studied in the literature, and a number of criteria have been established (Pearl, 1993; Galles & Pearl, 1995; Kuroki & Miyakawa, 1999; Tian & Pearl, 2002; Huang & Valtorta, 2006; Shpitser & Pearl, 2006; Bareinboim & Pearl, 2012), including the celebrated back-door criterion and the do-calculus (Pearl, 1995). Despite their power, these techniques require a single, fully specified causal diagram, which is not always available in practical settings. A sensible concern, therefore, is that forcing a single diagram may lead to false modeling assumptions and, consequently, misleading inferences.

Markov Equivalence. Another line of investigation focuses precisely on trying to learn a qualitative description of the system, which in the ideal case could lead to the true causal diagram, the blueprint underlying the phenomenon being investigated. These efforts are usually deemed more "data-driven", and more aligned with the zeitgeist in machine learning.
In practice, however, it is common that only a Markov equivalence class, comprising a collection of causal diagrams, can be consistently inferred from observational data (Verma, 1993; Spirtes et al., 2001; Zhang, 2008b). A distinguished characterization of the Markov equivalence class uses partial ancestral graphs (PAGs). Fig. 1b shows the PAG learnable from observational data that is consistent with the true causal diagram (Fig. 1a). The directed edges in a PAG represent ancestral relations (not necessarily direct), and the circle marks stand for structural uncertainty.

Identification under Markov Equivalence. In this work, we analyze the marriage of these two lines of investigation, where the structural invariances in a Markov equivalence class (learnable from observational data) are used to identify causal effects, whenever possible. However, identification from an equivalence class is considerably more challenging than from a single diagram due to the structural uncertainties. Given that fully specifying a causal diagram is usually infeasible, there is a growing interest in identifiability results under Markov equivalence (Maathuis et al., 2010). For instance, Zhang (2007) extended the do-calculus to PAGs. In practice, however, it is computationally hard to decide whether there exists (and, if so, to find) a sequence of applications of the rules of the generalized calculus that identifies the interventional distribution. Perković et al. (2015) generalized the back-door criterion to PAGs, and provided a sound and complete algorithm to find a back-door admissible set, should such a set exist. However, the back-door criterion is not as powerful as the do-calculus, since no adjustment set exists for many identifiable causal effects. More recent work generalized the c-component approach (Tian & Pearl, 2002) to PAGs and devised an algorithm for identification under Markov equivalence (Jaber et al., 2018b;a).

Figure 1: Causal diagram (left) and the inferred PAG (right).
The algorithm was shown to be strictly more powerful than the generalized back-door criterion, but it is not complete, as we show later.[1] Our strategy here is to combine the state-of-the-art graphical condition introduced in (Jaber et al., 2018a) with the classic c-component decomposition developed in (Tian, 2002) to obtain a PAG-specific decomposition, and to use it to design a complete algorithm for identification in PAGs. It is worth noting that completeness is one of the most important guarantees a theory can offer: it provides a precise boundary between what is identifiable and what is not in the given setting. Specifically, our main contributions are as follows:

1. We introduce a novel graph-decomposition strategy that breaks the targeted causal distribution into an equivalent product of more amenable distributions.

2. We develop an algorithm to compute the effect of an arbitrary set of intervention variables on an arbitrary outcome set from a PAG and an observational distribution. We show that this algorithm is sound and complete.

3. We characterize non-identifiability in a PAG based on certain forbidden graphical structures: whenever such a structure can be found as a subgraph of the PAG, identification is provably impossible.

2. Preliminaries

In this section, we introduce the basic notation and machinery used throughout the paper. Bold capital letters denote sets of variables, while bold lowercase letters stand for particular assignments to those variables.

Structural Causal Models. We use the language of Structural Causal Models (SCMs) (Pearl, 2000, pp. 204-207) as our basic semantical framework. Formally, an SCM M is a 4-tuple ⟨U, V, F, P(U)⟩, where U is a set of exogenous (latent) variables and V is a set of endogenous (measured) variables. F represents a collection of functions F = {f_i} such that each endogenous variable V_i ∈ V is determined by a function f_i ∈ F, where f_i is a mapping from the respective domain of U_i ∪ Pa_i to V_i, with U_i ⊆ U and Pa_i ⊆ V \ {V_i}.
[1] Another approach is based on SAT (Boolean constraint satisfaction) solvers (Hyttinen et al., 2015). Given its somewhat distinct nature, a closer comparison lies outside the scope of this paper.

The uncertainty is encoded through a probability distribution over the exogenous variables, P(U), and the marginal distribution induced over the endogenous variables, P(V), is called observational. Every SCM is associated with one causal diagram where every variable V_i ∈ V is a node, and there exists a directed edge from every node in Pa_i to V_i. Also, for every pair V_i, V_j ∈ V such that U_i ∩ U_j ≠ ∅, there exists a bi-directed edge between V_i and V_j. We restrict our study to recursive systems, which means that the corresponding diagram is acyclic. Within the structural semantics, performing an action X=x is represented through the do-operator, do(X=x), which encodes the operation of replacing the original equations for X by the constants x and induces a submodel M_x. The resulting distribution is denoted by P_x, which is the main target for identification in this paper. For details on structural models, we refer readers to (Pearl, 2000).

Ancestral Graphs. We now introduce a graphical representation of equivalence classes of causal diagrams. A mixed graph can contain directed and bi-directed edges. A is an ancestor of B if there is a directed path from A to B. A is a spouse of B if A ↔ B is present. An almost directed cycle happens when A is both a spouse and an ancestor of B. An inducing path is a path on which every non-endpoint node is a collider on the path (i.e., both edges incident to it are into it) and every collider is an ancestor of an endpoint of the path. A mixed graph is ancestral if it does not contain a directed or almost directed cycle. It is maximal if there is no inducing path (relative to the empty set) between any two non-adjacent nodes.
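Before moving on to ancestral graphs, the structural semantics and the do-operator described above can be illustrated with a minimal sketch. Everything below is hypothetical (a two-variable model with one latent confounder, not from the paper); it only shows how do(X = x) replaces a mechanism while P(U) and the remaining functions stay fixed.

```python
# Minimal SCM sketch (hypothetical mechanisms):
# exogenous U ~ P(U); endogenous X = f_X(U), Y = f_Y(X, U).
# do(X = x) replaces f_X by the constant x, leaving P(U) and f_Y intact.

P_U = {0: 0.6, 1: 0.4}        # distribution over the exogenous variable
f_X = lambda u: u             # X is driven by the latent U (confounding)
f_Y = lambda x, u: x ^ u      # Y depends on both X and U

def observational(y):
    """P(Y = y) induced by the SCM (no intervention)."""
    return sum(p for u, p in P_U.items() if f_Y(f_X(u), u) == y)

def interventional(y, x):
    """P(Y = y | do(X = x)), evaluated in the submodel M_x."""
    return sum(p for u, p in P_U.items() if f_Y(x, u) == y)

# Conditioning and intervening disagree here: observing X=1 forces U=1,
# so P(Y=1 | X=1) = 0, while P(Y=1 | do(X=1)) = P(U=0) = 0.6.
assert observational(1) == 0.0
assert abs(interventional(1, 1) - 0.6) < 1e-12
```

The gap between the two quantities in the final comment is exactly the confounding bias that makes identification non-trivial.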
A Maximal Ancestral Graph (MAG) is a graph that is both ancestral and maximal (Richardson & Spirtes, 2002). In general, a causal MAG represents a set of causal diagrams with the same set of observed variables that entail the same independence and ancestral relations among the observed variables. Different MAGs may be Markov equivalent in that they entail the exact same independence model. A partial ancestral graph (PAG) represents an equivalence class of MAGs [M]; it shares the same adjacencies as every MAG in [M] and displays all and only the invariant edge marks (i.e., edge marks that are shared by all members of [M]). A circle indicates an edge mark that is not invariant. A PAG is learnable from the conditional independence and dependence relations among the observed variables, and the FCI algorithm is a standard method to learn such an object (Zhang, 2008b). In short, a PAG represents a Markov equivalence class of causal diagrams with the same observed variables and independence model.

Graphical Notions. Given a causal diagram, MAG, or PAG, a path between X and Y is potentially directed (causal) from X to Y if there is no arrowhead on the path pointing towards X. Y is called a possible descendant of X and X a possible ancestor of Y, i.e., X ∈ An(Y), if there is a potentially directed path from X to Y. By stipulation, X ∈ An(X). A set A is (descendant) ancestral if no node outside A is a possible (descendant) ancestor of any node in A. Y is called a possible child of X, i.e., Y ∈ Ch(X), and X a possible parent of Y, i.e., X ∈ Pa(Y), if they are adjacent and the edge is not into X. For a set of nodes X, we have Pa(X) = ∪_{X∈X} Pa(X) and Ch(X) = ∪_{X∈X} Ch(X). Given two sets of nodes X and Y, a path between them is called proper if one of the endpoints is in X and the other is in Y, and no other node on the path is in X or Y. For convenience, we use an asterisk (*) as a wildcard to denote any possible mark of a PAG (circle, arrowhead, tail) or a MAG (arrowhead, tail).
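As a small computational aside, the possible-ancestor relation just defined reduces to reachability over edges that are not into the node nearer to X. The sketch below uses an assumed edge-mark encoding; the graph, the encoding, and the helper name are all hypothetical, not part of the paper.

```python
# Sketch: X is in An(Y) iff there is a potentially directed path from X
# to Y, i.e., every edge on the path is not into its X-side endpoint.
# Edge marks are encoded as "tail", "circle", or "arrow".

from collections import deque

def possible_ancestors(y, nodes, edges):
    """edges: {(a, b): (mark_at_a, mark_at_b)}.  Walk backwards from y:
    a can precede b on a potentially directed path iff the mark at a's
    end of the edge between them is not an arrowhead."""
    into = {v: set() for v in nodes}       # into[b] = predecessors of b
    for (a, b), (ma, mb) in edges.items():
        if ma != "arrow":                  # a *- b with no arrow at a
            into[b].add(a)
        if mb != "arrow":                  # symmetric direction
            into[a].add(b)
    an, frontier = {y}, deque([y])
    while frontier:
        v = frontier.popleft()
        for a in into[v] - an:
            an.add(a)
            frontier.append(a)
    return an

nodes = ["X", "Z", "Y", "W"]
edges = {("X", "Z"): ("circle", "arrow"),   # X o-> Z
         ("Z", "Y"): ("tail", "arrow"),     # Z  -> Y
         ("Y", "W"): ("arrow", "arrow")}    # Y <-> W
assert possible_ancestors("Y", nodes, edges) == {"X", "Z", "Y"}
```

Note that W is excluded: the bi-directed edge Y ↔ W is into W, so no potentially directed path out of W reaches Y.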
If the edge marks on a path between X and Y are all circles, we call the path a circle path. We refer to the closure of nodes connected with circle paths as a bucket. Obviously, given a PAG, the nodes are partitioned into a unique set of buckets. A directed edge X → Y in a MAG or PAG is visible if there exists no causal diagram in the corresponding equivalence class with an inducing path between X and Y that is into X. This implies that a visible edge is not confounded (X ↔ Y does not exist). Which edges are visible is easily decidable by a graphical condition (Zhang, 2008a), so we simply mark visible edges by v. For brevity, we refer to any edge that is not a visible directed edge as invisible.

Identification in Causal Diagrams. Pearl (2000, pp. 70) formalizes the notion of uniquely computing an effect from data as follows.

Definition 1. The causal effect of X on a disjoint set Y is said to be identifiable from a causal diagram D if P_x(y) can be computed uniquely from any positive probability of the observed variables P(V); that is, if P_x^{M1}(y) = P_x^{M2}(y) for every pair of models M1 and M2 with P^{M1}(V) = P^{M2}(V) > 0 and D(M1) = D(M2) = D.

Tian & Pearl (2002) presented an identification algorithm based on a decomposition of the causal diagram into a set of so-called c-components (confounded components).

Definition 2 (C-Component). In a causal diagram, two observed variables are said to be in the same c-component if and only if they are connected by a bi-directed path, i.e., a path composed solely of bi-directed edges.

For any set C ⊆ V, the quantity Q[C] is defined to denote the post-intervention distribution of C under an intervention on V \ C, i.e., P_{v\c}(c). The significance of c-components and their decomposition is evident from (Tian, 2002, Lemmas 10, 11), which are the basis of the complete algorithm relative to Definition 1.

Identification in PAGs.
The following definition formalizes the notion of identification given a PAG, which generalizes the diagram-specific notion in Definition 1.

Definition 3. Given a PAG P over V and a query P_x(y) where X, Y ⊆ V, P_x(y) is identifiable given P if and only if P_x(y) is identifiable given every causal diagram D (represented by a MAG) in the Markov equivalence class represented by P, and with the same expression.

Jaber et al. (2018a) introduced the notion of pc-component in MAGs and PAGs (and induced subgraphs thereof). This graphical construction is a necessary condition for two nodes to be in the same c-component in some causal diagram or an induced subgraph thereof. This observation is formalized in the proposition below.

Definition 4 (PC-Component). In a MAG, a PAG, or any induced subgraph thereof, two nodes are in the same possible c-component (pc-component) if there is a path between them such that (1) all non-endpoint nodes along the path are colliders, and (2) none of the edges is visible.

Proposition 1. Let P be a PAG over V, and let D be any diagram in the equivalence class represented by P. For any X, Y ∈ A ⊆ V, if X and Y are in the same c-component in D_A, then X and Y are in the same pc-component in P_A.

As a special case of Def. 4, we have the following notion, which will prove useful later on. Unlike the pc-component relation, the relation of being in the same dc-component is transitive.

Definition 5 (DC-Component). In a MAG, a PAG, or any induced subgraph thereof, two nodes are in the same definite c-component (dc-component) if they are connected with a bi-directed path, i.e., a path of bi-directed edges.

Jaber et al. (2018a) developed an identification criterion for PAGs where the intervention is on a bucket and the input distribution is possibly interventional. We introduce this result below as it will be used in our algorithm.
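Computationally, c-components (Def. 2) and dc-components (Def. 5) are the connected components of the graph restricted to bi-directed edges, and buckets are the analogous components over circle edges. A plain connected-components routine therefore suffices; the graph below is hypothetical, purely for illustration.

```python
# Sketch: connected components of a symmetric edge relation (DFS).
# Feeding in bi-directed edges yields c-/dc-components; feeding in
# circle (o-o) edges yields buckets.

def components(nodes, sym_edges):
    """Partition `nodes` into connected components of `sym_edges`."""
    adj = {v: set() for v in nodes}
    for a, b in sym_edges:
        adj[a].add(b)
        adj[b].add(a)
    parts, seen = [], set()
    for v in nodes:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:
            w = stack.pop()
            if w not in comp:
                comp.add(w)
                stack.extend(adj[w] - comp)
        seen |= comp
        parts.append(frozenset(comp))
    return parts

nodes = ["X1", "Y1", "Y2", "Y3"]
bidirected = [("Y1", "Y2"), ("Y2", "Y3")]      # Y1 <-> Y2 <-> Y3
parts = components(nodes, bidirected)
assert frozenset({"Y1", "Y2", "Y3"}) in parts  # one dc-component
assert frozenset({"X1"}) in parts              # a singleton component
```

This works precisely because the dc-component relation is transitive and thus partitions the nodes; the pc-component relation (Def. 4) is not transitive, so it does not induce such a partition.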
The expression derived in Prop. 2 below depends on a partial topological order (PTO), which is, in short, a topological order over the buckets. A detailed discussion can be found in (Jaber et al., 2018b).

Proposition 2. Let P denote a PAG over V, let T be the union of a subset of the buckets in P, and let X ⊆ T be a bucket. Given P_{v\t} (i.e., Q[T]), and a partial topological order B_1 < ... < B_m with respect to P_T, Q[T\X] is identifiable if and only if, in P_T, there does not exist Z ∈ X such that Z has a possible child C ∉ X that is in the pc-component of Z. If identifiable, then the expression is given by

Q[T \ X] = (P_{v\t} / ∏_{{i | B_i ⊆ S^X}} P_{v\t}(B_i | B^{(i-1)})) · Σ_x ∏_{{i | B_i ⊆ S^X}} P_{v\t}(B_i | B^{(i-1)}),   (1)

where S^X = ∪_{Z∈X} S^Z, with S^Z being the dc-component of Z in P_T, and B^{(i-1)} denoting the set of nodes preceding bucket B_i in the partial order.

3. Query Decomposition

Given a causal diagram D, one of the cornerstone results in (Tian, 2002) is Lemma 11, which allows one to decompose a query distribution Q[H] into a product of sub-queries over the c-components in D_H. Hence, we get the following decomposition, where each H_i is a c-component in D_H:

Q[H] = ∏_i Q[H_i]   (2)

For example, the query Q[{Y1, Y2, Y3, Y4, Y5}], denoted Q[Y], given the causal diagram in Fig. 2b, can be decomposed as follows:

Q[Y] = Q[{Y2, Y3}] · Q[{Y4, Y5}] · Q[{Y1}]   (3)

We note that the decomposition relies heavily on the precise delimitation of the c-components, where each sub-query is associated, respectively, with H_1 = {Y2, Y3}, H_2 = {Y4, Y5}, and H_3 = {Y1}. Generalizing the decomposition to PAGs, therefore, is challenging given the structural uncertainties, most relevantly the possible presence of latent confounders: a decomposition has to be valid in every causal diagram in the equivalence class. For instance, given the query Q[Y] over the PAG in Fig. 2a, the sequence of nodes Y2, Y3, Y4, Y5, Y1 is connected with invisible edges, which are possibly confounded.
Hence, any naive decomposition of Q[Y] into a product of sub-queries over subsets of Y is invalid, since we can construct a diagram in the equivalence class which violates this decomposition. For instance, the decomposition in Eq. 3 is valid for the diagram in Fig. 2b but not for the one in Fig. 2c. Yet, we can still decompose the query using some invariances, such as the fact that Y2 and Y1 are not in the same pc-component in P_Y; hence, they are not in the same c-component in the Y-induced subgraph of any causal diagram in the equivalence class, by Prop. 1.

To develop an invariantly valid decomposition, we start by introducing the notion of a region. In short, a region is the pc-component of a set A appended with the corresponding buckets of the nodes. The pc-component of a set A includes all the nodes which could, in some causal diagram, be in the c-component of some node in A (Prop. 1). We append the pc-component set with the corresponding buckets of the nodes in order to avoid non-identifiability of the sub-queries, since no sufficient causal information is present within a bucket.

Definition 6 (Region R^C_A). Given a PAG or a MAG G over V, and A ⊆ C ⊆ V, let the region of A with respect to C, denoted R^C_A, be the union of the buckets that contain nodes in the pc-component of A in the induced subgraph G_C.

Consider the PAG in Figure 2a, and let C = Y and A = {Y3}. Then, R^C_A = {Y3, Y2, Y4, Y5}, since Y2 and Y4 are in the pc-component of Y3 and Y5 is in the same bucket as Y4. Alternately, if A = {Y4, Y5}, then R^C_A = Y, since Y2 and Y3 are in the pc-component of Y4 and Y1 is in the pc-component of Y5. For simplicity, we often drop C, i.e., we write R_A, whenever it is clear from the context. Using this construction, we derive a useful decomposition as follows.

Figure 2: Sample PAG with two causal diagrams in the equivalence class ((b) 1st causal diagram; (c) 2nd causal diagram).

Theorem 1.
Given a PAG P over V and a set C ⊆ V, Q[C] can be decomposed as follows:

Q[C] = Q[A] · Q[R_{C\A}] / Q[A ∩ R_{C\A}],   (4)

where A ⊆ C and R_(.) = R^C_(.).

Proof. Let D be any causal diagram in the equivalence class of P. We show that Eq. 4 is valid for D. Let D_C be the C-induced subgraph of D. Let S_1 and S_2 partition A ∩ R_{C\A} in D_C, where S_1 contains the nodes that are in the c-component of some node in A \ R_{C\A}, and S_2 contains the rest. By construction of S_1 and S_2, no node in S_1 is in the same c-component as any node in S_2 in D_{A ∩ R_{C\A}}, so Q[A ∩ R_{C\A}] = Q[S_1] · Q[S_2]. Moreover, Q[A] = Q[A \ S_2] · Q[S_2], since none of the nodes in S_2 is in the c-component of A \ S_2 in D_C, and consequently in D_A since A ⊆ C. Also, we claim that Q[R_{C\A}] = Q[R_{C\A} \ S_1] · Q[S_1]. Suppose for contradiction that the claim is not true; then some node S ∈ S_1 is in the c-component of some X ∈ R_{C\A} \ S_1 = (C \ A) ∪ S_2 in D_{R_{C\A}}. But X ∉ S_2, since S_1 and S_2 are in different c-components in D_C, and consequently in D_{R_{C\A}}. So X ∈ C \ A. But then we have the following in D_C: X is in the c-component of S, and S is in the c-component of some node Y ∈ A \ R_{C\A} by definition of S ∈ S_1. This is a contradiction, since Y would then be in the pc-component of X in the induced sub-PAG P_C (Prop. 1), and consequently part of R_{C\A}. Therefore, the claimed decomposition of Q[R_{C\A}] is also valid in D. By the previous observations in D, we can simplify the right-hand side of Eq. 4 as follows:

Q[A] · Q[R_{C\A}] / Q[A ∩ R_{C\A}] = Q[A \ S_2] · Q[S_2] · Q[R_{C\A} \ S_1] · Q[S_1] / (Q[S_1] · Q[S_2]) = Q[A \ S_2] · Q[R_{C\A} \ S_1]   (5)

Note that Eq. 5 is equivalent to Q[A \ S_2] · Q[(C \ A) ∪ S_2]. By the previous derivation, it is easy to verify that no node in A \ S_2 is in the same c-component as any node in (C \ A) ∪ S_2. Hence, the decomposition in Eq. 4 is valid for D, as the right-hand side can be simplified to be consistent with Eq. 2. This concludes the proof.

Back to the query Q[Y] over the PAG in Fig.
2a, we can decompose the query accordingly with A = {Y2, Y3, Y4, Y5}:

Q[Y] = Q[A] · Q[R^Y_{Y\A}] / Q[A ∩ R^Y_{Y\A}] = Q[Y \ {Y1}] · Q[{Y1, Y4, Y5}] / Q[{Y4, Y5}]   (6)

Alternately, we get the following decomposition for A = {Y1}:

Q[Y] = Q[{Y1}] · Q[{Y1, Y2, Y3, Y4, Y5}] / Q[{Y1}] = Q[Y]   (7)

The above two examples show that the decomposition is sensitive to the choice of A, which raises a couple of interesting issues about Theorem 1:

1. Some decompositions are useless, e.g., Eq. 7.

2. Some queries in the decomposition are not identifiable solely due to the choice of A; e.g., Q[{Y1}] in Eq. 7 will turn out to be not identifiable due to the invisible edge between Y5 and Y1.

The following decomposition is a special case of Thm. 1 and allows us to overcome these weaknesses. Note that Eq. 6 follows from this corollary with A = {Y3}.

Corollary 1. Given a PAG P over V and a set C ⊆ V, Q[C] can be decomposed as follows:

Q[C] = Q[R_A] · Q[R_{C\R_A}] / Q[R_A ∩ R_{C\R_A}],

where A ⊆ C and R_(.) = R^C_(.).

Proof. It follows from Thm. 1 with A replaced by R^C_A.

4. A Complete Identification Algorithm

Using the identification criterion in Prop. 2 and the decomposition in Corol. 1, we formulate the procedure IDP, shown in Alg. 1. The main idea of IDP goes as follows. After receiving the sets X, Y, and a PAG P, the algorithm pre-processes the query by computing D, the set of possible ancestors of Y in P_{V\X}. Then, the procedure calls the subroutine Identify over D to compute Q[D] from the observational distribution P(V). The recursive routine basically tests for one of two conditions. First, it checks for the presence of a bucket B in P_T that is a subset of the intervention nodes, i.e., B ⊆ T \ C, and satisfies the conditions of Prop. 2. If found, it computes Q[T \ B] using Eq. 1 and proceeds with a recursive call. Alternatively, if such a bucket does not exist in P_T, then IDP checks for a bucket B in P_C such that the region of B with respect to C, i.e.,
R^C_B, does not span C. If such a bucket exists, then IDP decomposes the query Q[C] according to Corol. 1. Finally, if both tests fail, then IDP throws a failure condition.

Algorithm 1: IDP(x, y) given PAG P
input: two disjoint sets X, Y ⊆ V
output: expression for P_x(y), or FAIL
1: Let D = An(Y)_{P_{V\X}}
2: P_x(y) = Σ_{d\y} Identify(D, V, P)
3: function Identify(C, T, Q = Q[T])
4:   if C = ∅ then return 1
5:   if C = T then return Q
     /* In P_T, let B denote a bucket, and let C^B denote the pc-component of B */
6:   if ∃ B ⊆ T \ C such that C^B ∩ Ch(B) ⊆ B then
7:     compute Q[T \ B] from Q (via Prop. 2)
8:     return Identify(C, T \ B, Q[T \ B])
9:   else if ∃ B ⊆ C such that R_B ≠ C then
10:    return Identify(R_B, T, Q) · Identify(R_{C\R_B}, T, Q) / Identify(R_B ∩ R_{C\R_B}, T, Q)
11:  else
12:    throw FAIL
13:  end if
14: end function

Theorem 2. IDP (Alg. 1) terminates and is sound.

Proof. Starting with termination, every recursive call of Identify strictly decreases the size of the input sets C and T. This is evident in the call at line (8). We are left with the three recursive calls at line (10), due to the decomposition (Corol. 1). R_B is a strict subset of C by construction. Then, R_{C\R_B} is a strict subset of C as well, since the pc-component property is symmetric. Finally, it follows easily that R_B ∩ R_{C\R_B} is a strict subset of C. Given that IDP terminates, we now move to soundness. Let G be any causal diagram in the equivalence class of PAG P over V, and let V′ = V \ X. We have

P_x(y) = Σ_{v′\y} P_x(v′) = Σ_{v′\y} Q[V′] = Σ_{d\y} Σ_{v′\d} Q[V′].

By definition, D is an ancestral set in P_{V′}, and hence it is ancestral in G_{V′} by (Jaber et al., 2018a, Prop. 1). So, we have the following by (Tian, 2002, Lem. 10):

Σ_{v′\d} Q[V′] = Q[D]   (8)

Eq. 8 is equivalent to the expression in line (2) of Alg. 1. Finally, the correctness of the recursive routine Identify follows from that of Proposition 2 and Corollary 1.

4.1. Illustrative example

Consider the query P_x(y) given the PAG P in Fig. 2a, where X = {X1, X2} and Y = {Y1, Y2, Y3, Y4, Y5}. We have D = Y, and the problem reduces to computing Q[D] via the call Identify(D, V, P). Neither of the buckets X1 and X2 satisfies the condition at line (6) of Alg. 1, so IDP tries to decompose the query. Bucket Y3 satisfies the condition at line (9), since R^Y_{Y3} = {Y2, Y3, Y4, Y5} ≠ Y, and we obtain the decomposition derived earlier in Eq. 6. First, for the call Identify({Y2, Y3, Y4, Y5}, V, P), node Y1 satisfies the test at line (6), as it has no children. Hence, we compute Q[V \ {Y1}] using Proposition 2:

Q[V \ {Y1}] = (P(v) / P(y1, y2, y3, x1, x2 | y4, y5)) · Σ_{y1} P(y1, y2, y3, x1, x2 | y4, y5) = P(y2, y3, y4, y5, x1, x2)   (9)

In the next recursive call, let T_1 = V \ {Y1} and let P_{y1} denote the distribution in Eq. 9. Now, X1 satisfies the condition at line (6), and we can compute Q[T_1 \ {X1}] from P_{y1} = Q[T_1]:

Q[T_1 \ {X1}] = (P_{y1} / P_{y1}(y2, y3, x1, x2 | y4, y5)) · Σ_{x1} P_{y1}(y2, y3, x1, x2 | y4, y5) = P(y2, y3, y4, y5, x2)   (10)

Let T_2 = T_1 \ {X1}. Now, X2 satisfies the identification criterion, and we can compute Q[T_2 \ {X2}] from Eq. 10:

Q[T_2 \ {X2}] = (P_{x1,y1} / P_{x1,y1}(x2)) · Σ_{x2} P_{x1,y1}(x2) = P(y2, y3, y4, y5 | x2)   (11)

Similarly, we get the following expressions for Q[{Y1, Y4, Y5}] and Q[{Y4, Y5}], respectively:

Q[{Y1, Y4, Y5}] = P(y1, y4, y5 | x1)   (12)

Q[{Y4, Y5}] = P(y4, y5)   (13)

Hence, the final expression for P_x(y) follows from Eqs. 11, 12, and 13:

P_x(y) = P(y2, y3, y4, y5 | x2) · P(y1, y4, y5 | x1) / P(y4, y5)   (14)

We can simplify Eq. 14 using independence relations to

P_x(y) = P(y3 | x2, y2, y4) P(y2 | x2) P(y1 | x1, y5) P(y4, y5).

Note that the adjustment criterion in (Perković et al., 2015) and the algorithm in (Jaber et al., 2018a) fail to compute the above causal effect, and hence they are not complete.

4.2. Completeness

After introducing the identification algorithm and proving its soundness, we turn to the completeness of the procedure. According to Def.
3, whenever IDP fails, we need to establish one of two conditions for completeness: either there exist two causal diagrams in the equivalence class with different identifications, or the effect is not identifiable in some causal diagram, i.e., a hedge for P_x(y) exists (Shpitser & Pearl, 2006, Def. 7). In this section, we establish completeness by proving that the latter is always the case. The blueprint for the proof goes as follows. First, we show that the recursive routine Identify fails only if an intervention node from X remains in T. Then, we construct a MAG in the equivalence class of PAG P, and we prove that an induced subgraph of it has some special properties (to be defined). Finally, we use the latter MAG to construct a causal diagram in the corresponding equivalence class with a hedge for P_x(y). This concludes the proof.

Lemma 1. Whenever IDP fails, there exists at least one node X ∈ X such that X ∈ T in the failing instance of Identify(C, T, Q).[2]

[2] See (Jaber et al., 2019) for all the proofs.

We first consider an easy case of non-identifiable causal effects in the following theorem.

Theorem 3. Given a PAG P over V and a query P_x(y) where X, Y ⊆ V, if there exists a proper possibly directed path from X to Y that starts with an invisible edge (i.e., an edge that is not a visible directed edge), then P_x(y) is not identifiable.

Proof. By (Perković et al., 2015, Lem. 5.5), we construct a MAG M in the equivalence class of P with a proper directed path from X to Y that starts with an invisible edge X → R. Then, we construct a causal diagram in the equivalence class of M retaining the directed path and where X → R is confounded (Zhang, 2008a, Lem. 10). F = {X, R}, F′ = {R} form a hedge for P_x(y) (Shpitser & Pearl, 2006, Th. 4).

In what follows, we assume the following condition holds; otherwise, the case is covered by Thm. 3.

Condition 1. Assume IDP(x, y) fails; then every proper possibly directed path from X to Y in PAG P starts with a visible edge.
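The condition of Thm. 3 is directly checkable by search: look for a proper possibly directed path out of X whose first edge is invisible. The sketch below is not the paper's procedure, just an illustration of the test; the edge encoding (marks plus a visibility flag, cf. Zhang (2008a)) and the example graphs are hypothetical.

```python
# Sketch of the Thm. 3 test: does a proper possibly directed path from
# X to Y start with an invisible edge?  A step from a to b is possibly
# directed iff the mark at a's end of the edge is not an arrowhead.

def thm3_nonidentifiable(X, Y, edges):
    """edges: {(a, b): (mark_at_a, mark_at_b, visible)}."""
    def steps(v):
        for (a, b), (ma, mb, vis) in edges.items():
            if a == v and ma != "arrow":
                yield b, vis
            elif b == v and mb != "arrow":
                yield a, vis
    for x in X:
        # the first edge must be invisible and must leave X immediately
        frontier = [n for n, vis in steps(x) if not vis and n not in X]
        seen = set(frontier)
        while frontier:
            v = frontier.pop()
            if v in Y:            # a qualifying path reached Y
                return True
            for n, _ in steps(v):
                if n not in seen and n not in X:
                    seen.add(n)
                    frontier.append(n)
    return False

edges1 = {("X", "Z"): ("circle", "arrow", False),  # X o-> Z, invisible
          ("Z", "Y"): ("tail", "arrow", True)}     # Z -> Y, visible
assert thm3_nonidentifiable({"X"}, {"Y"}, edges1)  # not identifiable

edges2 = {("X", "Z"): ("tail", "arrow", True),     # now visible
          ("Z", "Y"): ("tail", "arrow", True)}
assert not thm3_nonidentifiable({"X"}, {"Y"}, edges2)
```

Nodes in Y are never expanded (the search returns upon reaching them), so intermediate nodes stay outside X and Y, matching the properness requirement.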
Condition 1 implies that every bucket in PAG P is a subset of the set D in line (1) of IDP or of V \ D. Since Prop. 2 and Corol. 1 do not split any bucket, every bucket in P_T, local to the failing call of Identify(C, T, Q), is a subset of C or of T \ C. With these observations, we construct a MAG M in the equivalence class of the input PAG P using the procedure in Lemma 2.

Lemma 2. Under the failing conditions of IDP(x, y), i.e., a recursive call Identify(C, T, Q) fails, we can construct a MAG M in the equivalence class of PAG P resulting from the following procedure applied to P:

1. Orient the circle marks on partially directed edges in P as tails;

2. For each bucket B in P_{V\T}, orient its edges into a DAG with no unshielded colliders;

3. For each bucket B in P_{T\C}, orient all edges out of some node B ∈ B such that B is in the same pc-component with a possible child W ∉ B in P_T; and

4. For the buckets in P_C, let B_1 < ... < B_m be a PTO over P_C:

(a) In B_m, orient all edges out of any node B ∈ B_m.

(b) For every B_i, 1 ≤ i < m, orient all edges out of some node B ∈ B_i such that B is in the same pc-component with a possible child W ∉ B_i in P_C.

In the construction of Lem. 2, the intent behind the choice of the nodes and the edge orientations is to maintain the failing conditions of the induced subgraph P_T in an induced subgraph of M. Those properties are established in Lem. 3.

Figure 3: Sample PAGs where P_x(y) is not identifiable.

Lemma 3. Let M denote the MAG constructed in Lem. 2. Then, there is a corresponding induced subgraph M_{T′}, with C′ ⊆ T′ ⊆ V, that maintains the following properties:

1. C′ ⊆ An(Y)_{M_{V\X}};

2. C′ is a single dc-component in M_{C′};

3. T′ \ C′ contains at least one intervention X ∈ X; and

4. every node in T′ \ C′ is in the dc-component with a child.

With the previous observations under Condition 1, we establish the following non-identifiability result, which is complementary to Thm. 3.

Theorem 4.
Assume IDP fails to identify Px(y) in PAG P; then there exists a causal diagram in the equivalence class of P in which Px(y) is not identifiable.

Proof. Lem. 2 constructs a MAG M in the equivalence class of P. By Lem. 3, an induced subgraph of M over T′ maintains the given properties. We construct a causal diagram D from M by simply keeping all directed and bi-directed edges intact. The diagram is trivially in the equivalence class of M. In D_{T′}, C′ is a single c-component, and every node in T′ \ C′ is in the c-component with a child. It follows that T′ = An(C′) and T′ is a single c-component; otherwise, some node in T′ \ C′ would not be in the c-component with a child. Let R be the root set of C′; then R is a root set of T′ as well. We can remove directed edges from D_{T′} so that C′ and T′ form R-rooted C-forests (Shpitser & Pearl, 2006, Def. 6). Finally, T′ \ C′ contains at least one intervention X ∈ X and R ⊆ C′ ⊆ An(Y)_{D_{V\X}}. Hence, T′ and C′ form a hedge for Px(y) in D and the effect is not identifiable. ∎

Corollary 2. IDP (Alg. 1) is complete.

Proof. This follows from Theorems 3 and 4. ∎

5. Non-Identifiable Causal Effects

In this section, we derive a graphical characterization of non-identifiable causal effects in PAGs, building on observations from the completeness proof. The characterization is akin to the ones introduced in (Shpitser & Pearl, 2006; Bareinboim & Pearl, 2012) for a given causal diagram. We start by introducing the following construction, where R is a root set of P over V iff V = An(R)_P, and it is maximal if no subset satisfies the property.

Definition 7 (DC-forest). Let P denote any subgraph of a PAG, where Y is the maximal root set. Then P is a Y-rooted DC-forest if P is a dc-component and all nodes have at most one possible child through a directed (→) or partially directed (◦→) edge.

For instance, the PAG in Fig. 3c constitutes a Y-rooted DC-forest where Y = {Y1, . . . , Y4}. Using the above, we define the notion of a P-hedge as follows.

Definition 8 (P-Hedge).
Let X, Y be disjoint sets of nodes in a PAG P, and let F, F′ be R-rooted DC-forests such that F ∩ X ≠ ∅, F′ ∩ X = ∅, F′ ⊆ F, and R ⊆ An(Y)_{P_{V\X}}. Then F and F′ form a P-hedge for Px(y) in P.

Fig. 3 contains sample PAGs with a non-identifiable effect Px(y). The simplest example is shown in Fig. 3a, in which a proper possibly directed path from X to Y is composed of an invisible edge. Fig. 3c is more complex and contains a P-hedge with F = V, i.e., all nodes, and F′ = {Y1, . . . , Y4}.

Theorem 5 (Non-Identifiability Criterion). Given a PAG P, Px(y) is non-identifiable in P if and only if there exists:

1. a proper possibly directed path from X to Y that starts with an invisible edge; or
2. a pair of DC-forests F, F′ forming a P-hedge for Px(y).

6. Conclusion

We tackled the problem of causal identification in a Markov equivalence class represented by a PAG. First, we introduced a novel decomposition strategy for a given causal query. We then used the decomposition to develop an algorithm for causal identification, and showed it to be complete. A significant implication is that whenever an effect is identifiable in every causal diagram compatible with a PAG, the result is the same for all. We also introduced a graphical characterization for non-identifiable causal effects. As the full causal structure is not learnable in many practical settings, we believe these results will be useful for data-intensive sciences where identifying causal effects is important.

Acknowledgements

Bareinboim and Jaber are supported in part by grants from NSF IIS-1704352 and IIS-1750807 (CAREER), IBM Research, and Adobe Research. Zhang's research was supported in part by the Research Grants Council of Hong Kong under the General Research Fund LU13602818.

References

Bareinboim, E. and Pearl, J. Causal inference by surrogate experiments: z-identifiability. In Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, 2012.

Bareinboim, E. and Pearl, J.
Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences, 113:7345–7352, 2016.

Fisher, R. The Design of Experiments. Oliver and Boyd, Edinburgh, 6th edition, 1951.

Galles, D. and Pearl, J. Testing identifiability of causal effects. In Besnard, P. and Hanks, S. (eds.), Uncertainty in Artificial Intelligence 11, pp. 185–195. Morgan Kaufmann, San Francisco, 1995.

Huang, Y. and Valtorta, M. Pearl's calculus of intervention is complete. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, UAI '06, pp. 217–224. AUAI Press, 2006.

Hyttinen, A., Eberhardt, F., and Järvisalo, M. Do-calculus when the true graph is unknown. In UAI, pp. 395–404, 2015.

Jaber, A., Zhang, J., and Bareinboim, E. Causal identification under Markov equivalence. In Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence, UAI '18, pp. 978–987, 2018a.

Jaber, A., Zhang, J., and Bareinboim, E. A graphical criterion for effect identification in equivalence classes of causal diagrams. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI '18, pp. 5024–5030, 2018b.

Jaber, A., Zhang, J., and Bareinboim, E. Causal identification under Markov equivalence: Completeness results. Technical Report R-42, Purdue AI Lab, Department of Computer Science, Purdue University, 2019.

Kuroki, M. and Miyakawa, M. Identifiability criteria for causal effects of joint interventions. Journal of the Japan Statistical Society, 29(2):105–117, 1999.

Maathuis, M. H., Colombo, D., Kalisch, M., and Bühlmann, P. Predicting causal effects in large-scale systems from observational data. Nature Methods, 7(4):247–248, 2010.

Pearl, J. Aspects of graphical models connected with causality. In Proceedings of the 49th Session of the International Statistical Institute, pp. 391–401, Tome LV, Book 1, Florence, Italy, 1993.

Pearl, J. Causal diagrams for empirical research. Biometrika, 82(4):669–688, 1995.

Pearl, J.
Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2000. 2nd edition, 2009.

Pearl, J. and Mackenzie, D. The Book of Why: The New Science of Cause and Effect. Basic Books, 2018.

Perković, E., Textor, J., Kalisch, M., and Maathuis, M. H. A complete generalized adjustment criterion. In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, pp. 682–691, 2015.

Richardson, T. and Spirtes, P. Ancestral graph Markov models. Annals of Statistics, pp. 962–1030, 2002.

Shpitser, I. and Pearl, J. Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the National Conference on Artificial Intelligence, volume 21, pp. 1219. AAAI Press, 2006.

Spirtes, P., Glymour, C. N., and Scheines, R. Causation, Prediction, and Search, volume 81. MIT Press, 2001.

Tian, J. Studies in Causal Reasoning and Learning. PhD thesis, University of California, Los Angeles, 2002.

Tian, J. and Pearl, J. A general identification condition for causal effects. In AAAI/IAAI, pp. 567–573, 2002.

Verma, T. Graphical aspects of causal models. Technical Report R-191, UCLA, 1993.

Zhang, J. Generalized do-calculus with testable causal assumptions. In International Conference on Artificial Intelligence and Statistics, pp. 667–674, 2007.

Zhang, J. Causal reasoning with ancestral graphs. Journal of Machine Learning Research, 9(Jul):1437–1474, 2008a.

Zhang, J. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence, 172(16):1873–1896, 2008b.