# On Positivity Condition for Causal Inference

Inwoo Hwang *1, Yesong Choe *2, Yeahoon Kwon 2, Sanghack Lee 1 2

Identifying and estimating a causal effect is a fundamental task when researchers want to infer a causal effect using an observational study without experiments. A conventional assumption is the strict positivity of the given distribution, or so-called positivity (or overlap), which, under the unconfoundedness assumption, states that the probabilities of treatments are positive. However, there exist many environments where neither the observational data exhibit strict positivity nor the unconfoundedness assumption holds. Against this background, we examine the graphical counterpart of the conventional positivity condition so as to license the use of an identification formula without strict positivity. In particular, we explore various approaches, including analysis in a post-hoc manner, do-calculus, Q-decomposition, and an algorithmic approach, to yielding a positivity condition for an identification formula, and we relate them, providing a comprehensive view. We further discuss the design of a positivity-aware identification algorithm based on the theoretical characterization of identification formulas.

1. Introduction

Knowing only associations among variables is insufficient to establish the direction of causation, thereby limiting its utility in making informed decisions. On the other hand, understanding cause-effect relationships can help elucidate the effect of an intervention so that precise control of the underlying environment can lead to the desired outcome. Causal effect identification is the task of answering whether eliciting a causal quantity under given information (typically a set of causal assumptions encoded as a causal diagram) is possible and, if so, providing a mapping from an observational distribution to the causal quantity.

*Equal contribution. 1 Artificial Intelligence Institute, Seoul National University, Seoul, South Korea. 2 Graduate School of Data Science, Seoul National University, Seoul, South Korea. Correspondence to: Sanghack Lee. Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

Figure 1: Causal diagrams where $P_x(y)$ is identified via a (a) backdoor and (b) front-door criterion, respectively.

Formally, the causal effect of a set of treatment variables $\mathbf{X}$ on a disjoint set of outcome variables $\mathbf{Y}$ is said to be identifiable from a causal graph $G$ if the quantity $P_{\mathbf{x}}(\mathbf{y}) = P(\mathbf{y} \mid do(\mathbf{x}))$ can be uniquely computed from any positive distribution over the observed variables (Tian & Pearl, 2002; Shpitser & Pearl, 2006a). If an effect is non-identifiable, there exist two models eliciting the same causal structure and agreeing on the provided observational distribution while disagreeing on the effect of the intervention of interest. In the absence of unmeasured confounders, all causal effects are trivially identifiable (Robins, 1986; Pearl, 1993; Spirtes et al., 1993; Galles & Pearl, 1998). However, if there are unmeasured confounders (i.e., a semi-Markovian model), the problem of causal effect identifiability becomes considerably more involved. Tian & Pearl (2003); Huang & Valtorta (2006); Shpitser & Pearl (2006a) developed complete identification algorithms taking an arbitrary causal graph. When an effect is not identifiable, one might rely on partial identification, which provides bounds for the causal effect, or the like (Tian & Pearl, 2000; Balke, 1995; Zhang et al., 2022; Li & Pearl, 2022).
Regardless of the existence of complete graphical identification algorithms, one simple, widely adopted identification condition is the adjustment criterion (i.e., the backdoor criterion (Pearl, 1993); g-computation (Robins, 1986)). It yields the following form of the formula for $P_x(y)$:

$$\sum_z P(y \mid x, z)\, P(z), \qquad (1)$$

where $\mathbf{Z}$ is an admissible set. Here, positivity $P(x \mid z) > 0$ is assumed to license the use of the above formula for the causal effect. In the context of estimating the average treatment effect $E[Y \mid do(X = 1)] - E[Y \mid do(X = 0)]$, Hernán & Robins (2006) stated that, for each value of the covariate in the population, there must be some subjects that received the treatment, i.e., $P(x \mid z) > 0$ for all $z$ with $P(z) \neq 0$. With all individuals receiving the same treatment, it would be impossible to estimate the causal effect from observed data (Hernán & Robins, 2006; 2020).

The same positivity condition might be derived directly from the given formula, although its validity is less clear at this moment. Given the summation in (1), if there exists $z$ such that $P(y \mid x, z)P(z)$ is undefined, then the estimate will be undefined. Although $P(y \mid x, z)$ is undefined if $P(x, z) = 0$, let us ignore the case when $P(z) = 0$ for the sake of explanation. Therefore, we are concerned about $P(x, z) > 0$ only when $P(z) > 0$. That is, we want to ensure $P(x \mid z) > 0$ whenever $P(z) \neq 0$. This leads to the original positivity condition for the adjustment:

$$\forall z\, \big(P(z) = 0 \vee P(x \mid z) > 0\big), \qquad (2)$$

which we will denote by adj(x; Z).
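To make Eqs. (1) and (2) concrete, the following is a minimal sketch, with toy numbers and function names of our own choosing (not from the paper), of checking adj(x; Z) on a finite joint distribution and evaluating the adjustment formula only when the condition licenses it:

```python
# A minimal sketch (toy numbers and names are ours): checking adj(x; Z) of
# Eq. (2) on a finite joint distribution P(X, Y, Z) and evaluating the
# adjustment formula (1) only when the condition licenses it.
def marg(P, keep):
    # marginalize a dict-based distribution onto the index positions in `keep`
    out = {}
    for v, p in P.items():
        k = tuple(v[i] for i in keep)
        out[k] = out.get(k, 0.0) + p
    return out

# Hypothetical joint over (x, y, z); P(X=1, Z=1) = 0 breaks strict positivity.
P = {(0, 0, 0): 0.20, (0, 1, 0): 0.10, (1, 0, 0): 0.05, (1, 1, 0): 0.15,
     (0, 0, 1): 0.25, (0, 1, 1): 0.25}

def adj(x, P):
    """Eq. (2): for every z, P(z) = 0 or P(x | z) > 0, i.e., P(x, z) > 0."""
    Pz, Pxz = marg(P, [2]), marg(P, [0, 2])
    return all(Pz[z] == 0 or Pxz.get((x, z), 0.0) > 0 for z in Pz)

def adjustment(x, y, P):
    """Eq. (1): sum_z P(y | x, z) P(z), skipping z with P(z) = 0."""
    assert adj(x, P), "positivity condition adj(x; Z) is violated"
    Pz, Pxz = marg(P, [2]), marg(P, [0, 2])
    return sum(P.get((x, y, z), 0.0) / Pxz[(x, z)] * Pz[z]
               for z in Pz if Pz[z] > 0)

print(adj(0, P), adj(1, P))   # True, False: only x = 0 is licensed
print(adjustment(0, 1, P))    # well-defined despite non-strict positivity
```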
There is limited literature on the topic of positivity conditions in causal identification. Shpitser & Pearl (2006a) mentioned that most literature only considers $P(\mathbf{V})$ as a positive distribution, where $\mathbf{V}$ is the set of endogenous variables. However, they postulated that $P(\mathbf{A})$ must be positive for every conditional distribution $P(\mathbf{B} \mid \mathbf{A})$ involved during the derivation of a formula. Other than this, positivity assumptions are mentioned only for specific identification tasks. Petersen et al. (2012) examined a positivity condition for assessing model- and parameter-specific identifiability of causal effects within a marginal structural working model, where the result was identical to the condition for the adjustment criterion. Dahabreh et al. (2019) investigated positivity conditions for generalizing causal inferences, under single-world intervention graphs, from a randomized trial to the target population of trial-eligible individuals. Nonetheless, previous studies have not yet explicated positivity conditions when an identifiable causal formula exhibits an intricate form, such as fractions, given a general, arbitrary causal structure. Against this background, this paper seeks a principled way of eliciting a positivity condition for identifying a causal effect given a causal graph.

1.1. Motivating Examples

We motivate our work by illustrating an example in which the backdoor adjustment is inapplicable. We then describe how different identification formulas involve different positivity conditions, illustrating practical implications.

Positivity on a front-door criterion. Another well-known identification criterion is the front-door adjustment (Pearl, 1995), exemplified by the causal diagram in Fig. 1(b). Its identification formula for $P_x(y)$ is as follows:

$$\sum_z P(z \mid x) \sum_{x'} P(y \mid x', z)\, P(x'),$$

which can be derived by applying the rules of do-calculus and probability operations. Without assuming general positivity $P(\mathbf{V}) > 0$, one must check which positivity conditions are sufficient for each application and combine those conditions to derive a condition licensing the use of the given $P(\mathbf{V})$ for $P_x(y)$. It turns out that a sufficient condition is

$$\big(P(x) > 0 \vee \forall z\,(P_z(y) = 0)\big) \wedge \forall z\,\big(P(z \mid x) = 0 \vee \mathrm{adj}(z; X)\big), \qquad (3)$$

where the condition over an interventional distribution, $P_z(y) = 0$, is further expanded to $\forall x'\,(P(y \mid x', z) = 0 \vee P(x') = 0)$ (see Appendix C.2 for the derivation). Furthermore, the condition derived directly from the resulting formula, which avoids its undefinedness, is indeed equivalent to the above condition (3). In this paper, we discuss the relationships between the two conditions derived differently.

Figure 2: Causal diagrams where $P_x(y)$ is identified via (a) backdoor adjustment either using $\{Z_1\}$, $\{Z_2\}$, or both, and (b) both backdoor and front-door criteria.

We note that the term positivity might be a misnomer, since the detailed condition (3) for the front-door criterion is described not only by the inequality $> 0$ but also by equalities, logical connectives, and quantifiers (compare $P(x \mid z) > 0$ for adjustment or $P(\mathbf{V}) > 0$ for a graphical criterion). Yet, we will stick to the conventional term positivity for the condition on an observational distribution under which an identification formula is validly evaluated.

Multiplicity of identification formulas and conditions. An identification algorithm (e.g., ID (Shpitser & Pearl, 2006a) or IDENTIFY (Tian & Pearl, 2003)) typically returns a single identification formula for a causal query if it is identifiable from a causal graph. However, there may exist multiple formulas (Tikka & Karvanen, 2017; 2018), which may be obtained by applying the identification algorithm over different latent projections onto $\mathbf{V}' \subseteq \mathbf{V}$ (Pearl & Verma, 1991) (i.e., causal graphs over a subset of variables while maintaining causal relationships) of a full causal graph over $\mathbf{V}$. Here, we show by examples that different formulas imply different positivity conditions. Consider Fig. 2(a), where there exist three different backdoor admissible sets $\{Z_1\}$, $\{Z_2\}$, and $\{Z_1, Z_2\}$. In this case, it is possible that $\mathrm{adj}(x; Z_2)$ holds true but $\mathrm{adj}(x; Z_1)$ does not (see Appendix D.1). This implies that one can estimate the causal effect with one formula but not with the other, which was not the case under strict positivity. Now consider the example in Fig. 2(b), where both backdoor and front-door criteria can be applied to identify $P_x(y)$. Similarly, it might be the case that the backdoor criterion is not applicable but the front-door criterion is, and vice versa (see Appendix D.2 for details). These examples manifest the importance of knowing the characteristics of the given data and tailoring the formula accordingly, rather than simply calculating the formula assuming $P(\mathbf{V}) > 0$.

Practical implications. Under the violation of strict positivity, the conventional identification formula for a causal quantity may not be properly computed, e.g., due to undefined terms. Furthermore, the main tools for eliciting the formula (i.e., do-calculus and Q-decomposition) are established under the strict positivity assumption. Thus, their validity and mathematical correctness are unclear under relaxed positivity. For practitioners, it is crucial to understand which identification formula is valid for the causal quantity, as the observational distribution is rarely strictly positive in many real-world scenarios.
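The front-door condition (3) can likewise be checked mechanically. Below is a minimal sketch under our own naming and data layout (not the paper's code); it encodes (3) together with the stated expansion of $P_z(y) = 0$:

```python
# A minimal sketch (function and variable names are ours): checking the
# sufficient front-door positivity condition (3) on a finite joint
# distribution P(X, Y, Z) given as a dict over (x, y, z) triples.
def frontdoor_positivity(x, y, P, xs, ys, zs):
    Px  = {v: sum(P.get((v, b, c), 0.0) for b in ys for c in zs) for v in xs}
    Pxz = {(v, c): sum(P.get((v, b, c), 0.0) for b in ys) for v in xs for c in zs}

    def Pz_y_zero(z):
        # expansion of P_z(y) = 0: for all x', P(y | x', z) = 0 or P(x') = 0
        return all(Px[xp] == 0 or
                   (Pxz[(xp, z)] > 0 and P.get((xp, y, z), 0.0) == 0)
                   for xp in xs)

    def adj_z_X(z):
        # adj(z; X): for all x', P(x') = 0 or P(z | x') > 0
        return all(Px[xp] == 0 or Pxz[(xp, z)] > 0 for xp in xs)

    cond1 = Px[x] > 0 or all(Pz_y_zero(z) for z in zs)
    cond2 = all(Pxz[(x, z)] == 0 or adj_z_X(z) for z in zs)
    return cond1 and cond2
```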
1.2. Contributions

Our contributions are as follows. (i) We provide a comprehensive view of eliciting a positivity condition over an observational distribution for an identifiable causal query given an arbitrary causal graph. (ii) In particular, we offer positivity conditions for do-calculus and generalized Q-decomposition, which are the main drivers of sound and complete identification algorithms, establishing rigorous foundations for obtaining sufficient positivity for causal effect identification. (iii) We devise an algorithmic approach to eliciting positivity by incorporating a relaxed version of generalized Q-decomposition into an existing identification algorithm. We establish a connection to the post-hoc analysis of positivity. Further, this principled approach can be modified to design a positivity-aware identification algorithm.

1.3. Organization

We illustrate the structure of our paper in Fig. 3. In Sec. 3, we first explore, without theoretical guarantee, the possibility of eliciting a positivity condition by directly examining a given identification formula, which is typically the output of an identification algorithm such as IDENTIFY (Tian & Pearl, 2003) or ID (Shpitser & Pearl, 2006a). In Sec. 4, we formally re-establish the rules of do-calculus (Lem. 4.1) under relaxed positivity, which are the building blocks of identification algorithms. Consequently, we show that the conjunction of the positivity conditions, invoked during the derivation of an identification formula through sequential applications of do-calculus, corresponds to the positivity condition for the formula (Prop. 4.2 and Corollary E.1 in Appendix). Next, we provide a more systematic, algorithmic approach to acquiring positivity in Secs. 5 and 6. Specifically, we first relax generalized Q-decomposition, where an additional sufficient condition for identification is provided (Lem. 5.1 and Thm. 5.1). We then develop IDENTIFY+ (Algs. 1 and 2), which incorporates the relaxed decomposition into IDENTIFY (Tian & Pearl, 2003) along with an additional component that constructs a positivity condition while simultaneously eliciting an identification formula (Thm. 6.1).

Figure 3: A schematic diagram representing the overall structure of this paper, connecting the inputs $P_x(y)$, $G$, $P(\mathbf{V})$ with the post-hoc analysis (Sec. 3), do-calculus (Sec. 4; Lem. 4.1, Prop. 4.2), relaxed Q-decomposition (Sec. 5), and the IDENTIFY+ algorithm (Sec. 6).

We note that one may skip the post-hoc analysis (Sec. 3), read through to the algorithmic approach, and then come back to the post-hoc analysis. This sequence would be beneficial to readers who want to avoid some unresolved uncertainties regarding the validity of post-hoc analysis. All the omitted proofs are deferred to Appendix E.

2. Preliminaries

Each variable is represented with a capital letter, e.g., $X$, and its realization with the corresponding lowercase $x$. We use bold letters, e.g., $\mathbf{X}$, to denote sets of variables (or values). We follow the conventional definition of probability based on the set of axioms by Kolmogorov (1933). Throughout this paper, we focus only on the discrete case; we discuss possible challenges in the continuous case in Appendix F.

Causal framework. We adopt the structural causal model (SCM (Pearl, 2009)) as our causal framework.

Definition 2.1 (SCM (Pearl, 2009)).
A structural causal model (SCM) $M$ is a 4-tuple $\langle \mathbf{V}, \mathbf{U}, F, P(\mathbf{U}) \rangle$, where $\mathbf{V} = \{V_1, \ldots, V_n\}$ is a set of endogenous variables; $\mathbf{U}$ is a set of exogenous variables; $F = \{f_1, \ldots, f_n\}$ is a set of functions determining $\mathbf{V}$: $v_i \leftarrow f_i(\mathrm{pa}_i, u_i)$, where $\mathrm{Pa}_i \subseteq \mathbf{V} \setminus \{V_i\}$ and $U_i \subseteq \mathbf{U}$; and $P(\mathbf{U})$ is a distribution over $\mathbf{U}$.

Each SCM $M$ induces a distribution $P(\mathbf{V})$ and a causal graph $G$, where directed edges encode functional relationships between observed variables, and bidirected edges encode unobserved confounders. Conducting an intervention is represented through the do-operator, $do(\mathbf{X} = \mathbf{x})$, which corresponds to replacing the original equations of $\mathbf{X}$ with the constants $\mathbf{x}$. Such an intervention induces a submodel $M_{\mathbf{x}}$ and an interventional distribution $P_{\mathbf{x}}(\mathbf{V})$. We adopt the graphical kinship terminology $Pa(\cdot)$, $Ch(\cdot)$, $An(\cdot)$, and $De(\cdot)$ to represent parents, children, ancestors, and descendants in $G$, where ancestors and descendants include their argument. For a set of arguments, $Pa(\mathbf{X}) = \bigcup_{X \in \mathbf{X}} Pa(X)$, and the others are defined similarly. We denote by $G_{\overline{\mathbf{X}}\underline{\mathbf{Z}}}$ the graph obtained from $G$ by deleting the edges onto $\mathbf{X}$ and the edges outgoing from $\mathbf{Z}$. We denote by $\perp_p$ conditional independence and by $\perp_d$ d-separation; we may write $\perp$ for $\perp_d$ if no ambiguity arises. Given $\mathbf{W}$ a subset of the vertices in $G$, we use $G[\mathbf{W}]$ to denote the subgraph induced by $\mathbf{W} \subseteq \mathbf{V}$. See Appendix A for formal definitions of d-separation and latent projection.

Causal identification problem. The causal effect of an action $do(\mathbf{x})$ on a set of variables $\mathbf{Y}$ such that $\mathbf{Y} \cap \mathbf{X} = \emptyset$ is said to be identifiable from $P$ in $G$ if $P_{\mathbf{x}}(\mathbf{y})$ is uniquely computable from $P(\mathbf{V})$ in any causal model which induces $G$. That is, if $P^{M_1}_{\mathbf{x}}(\mathbf{y}) = P^{M_2}_{\mathbf{x}}(\mathbf{y})$ for every pair of causal models $M_1$ and $M_2$ with $P^{M_1}(\mathbf{V}) = P^{M_2}(\mathbf{V}) > 0$ and $G(M_1) = G(M_2) = G$, then $P_{\mathbf{x}}(\mathbf{y})$ is identifiable in $G$. On the other hand, if one can construct two such models with $P^{M_1}_{\mathbf{x}}(\mathbf{y}) \neq P^{M_2}_{\mathbf{x}}(\mathbf{y})$, then $P_{\mathbf{x}}(\mathbf{y})$ is not identifiable in $G$. However, simply sticking to strict positivity in the definition of identifiability forfeits the opportunity to utilize an available observational distribution for identifying causal effects, since one can still obtain the effect once the distribution satisfies a sufficient positivity condition, e.g., Eq. (2). Given that our objective is not devising a complete graphical criterion for identification, we set aside strict positivity in the definition and focus on the positivity condition under which graphical causal effect identification is sound. Note that the causal effect $P_{\mathbf{x}}(\mathbf{y})$ itself is always well-defined regardless of the strict positivity assumption (see Appendix A).

3. Informal Examination of Post-hoc Approach to Positivity

In this section, we informally¹ examine a positivity condition under which the identification formula itself is well-defined². The complete algorithms (Tian & Pearl, 2003; Shpitser & Pearl, 2006a) return formulas which are composed of fractions, summation (i.e., marginalization), or multiplication of the given joint probability or expressions thereof.

¹ We intend this section to be inspirational for the later, more principled sections. Further, a formal treatment of post-hoc analysis requires making sense of 0/0 in formulas, investigated in Thm. 5.1.

² Here, we say a formula is well-defined if it is computable. It should not be conflated with the well-definedness of mathematical terms: a formula can still be computable even if it contains undefined terms. For example, strictly speaking, $\sum_z P(y \mid x, z)P(z)$ should be read as $\sum_{\{z \mid P(x \mid z) > 0\}} P(y \mid x, z)P(z)$ if $\forall z\,(P(z) = 0 \vee P(x \mid z) > 0)$. We will henceforth denote this as $\sum_z P(y \mid x, z)P(z) \geq 0 \Leftarrow \mathrm{adj}(x; Z)$ (reading $g \geq 0$ as "$g$ is well-defined", since any defined probability expression is nonnegative), and say the formula $\sum_z P(y \mid x, z)P(z)$ is well-defined under the positivity condition $\mathrm{adj}(x; Z)$.
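The following is a minimal sketch of Definition 2.1 in code; the encoding (dict-based functions, a fixed topological order) is our own illustrative choice, not the authors' implementation:

```python
# A minimal sketch of Definition 2.1: an SCM induces P(V) by pushing P(U)
# through F, and an intervention do(X = x) replaces f_X by the constant x.
from itertools import product

# Hypothetical binary SCM for the backdoor graph Z -> X -> Y, Z -> Y.
P_U = {(u1, u2, u3): 0.125 for u1, u2, u3 in product([0, 1], repeat=3)}
F = {
    "Z": lambda v, u: u[0],                     # Z <- U1
    "X": lambda v, u: v["Z"] ^ u[1],            # X <- Z xor U2
    "Y": lambda v, u: (v["Z"] & v["X"]) ^ u[2], # Y <- (Z and X) xor U3
}

def induced(F, P_U, do=None):
    """Distribution over V induced by the (sub)model M_x."""
    dist = {}
    for u, p in P_U.items():
        v = dict(do or {})
        for name in ["Z", "X", "Y"]:            # a topological order
            if do and name in do:
                continue                        # intervened: keep constant
            v[name] = F[name](v, u)
        key = (v["Z"], v["X"], v["Y"])
        dist[key] = dist.get(key, 0.0) + p
    return dist

P_V  = induced(F, P_U)                # observational P(V)
P_x1 = induced(F, P_U, do={"X": 1})   # interventional P_{x=1}(V)
```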
Figure 4: A causal diagram called Napkin.

The basic idea to determine whether the formula is well-defined consists of the following rules: (a) the addition of any undefined term yields an undefined value; (b) the multiplication of an undefined term yields an undefined value unless it is multiplied by zero; and (c) a zero denominator makes a fraction undefined (conditional probability is a special case). We will later illustrate the sufficiency of these rules for determining positivity conditions by connecting an identification algorithm and the pattern of the formula. We introduce the concept of well-definedness of a formula through a causal diagram called Napkin (Fig. 4) (Pearl & Mackenzie, 2018), whose formula is as follows:

$$P_x(y) = \frac{\sum_w P(y, x \mid r, w)\, P(w)}{\sum_w P(x \mid r, w)\, P(w)}.$$

Based on the formula, since the causal quantity is a fraction, we consider the (1) numerator and (2) denominator separately. If the denominator is zero, the formula is undefined. We derive the positivity condition as follows:

$$\frac{\sum_w P(y, x \mid r, w)P(w)}{\sum_w P(x \mid r, w)P(w)} \geq 0 \;\Leftarrow\; \exists r\,(① \geq 0 \wedge ② > 0),$$

where ① and ② are the numerator and denominator, respectively. Each condition can be expressed as

$$① \geq 0 \Leftarrow \mathrm{adj}(r; W),$$
$$② > 0 \Leftarrow \mathrm{adj}(r; W) \wedge \exists w\,(P(x \mid r, w)P(w) > 0) \Leftrightarrow \mathrm{adj}(r; W) \wedge \exists w\,(P(x \mid r, w) > 0 \wedge P(w) > 0) \Leftrightarrow \mathrm{adj}(r; W) \wedge \exists w\,(P(x, r, w) > 0) \Leftrightarrow \mathrm{adj}(r; W) \wedge P(x, r) > 0.$$

Hence, a sufficient condition for the well-definedness of the formula is

$$\exists r\,\big(\mathrm{adj}(r; W) \wedge P(x, r) > 0\big).$$

While it is true that the positivity condition derived directly from a formula ensures that the formula is well-defined, its validity is unclear for now, since the formula was derived under strict positivity and there might be conditions that cannot be read off from the formula. We provide a formal treatment of the validity of post-hoc analysis in Sec. 7.1.
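Rules (a)-(c) can be encoded directly as an evaluator over probability expressions; the following is a minimal sketch under our own encoding (NA marking an undefined term), where applying add, mul, and div to the Napkin numerator and denominator reproduces the case analysis above:

```python
# A minimal sketch (our encoding) of the post-hoc rules (a)-(c): NA
# propagates through sums and fractions, but 0 * NA is treated as 0, in
# line with footnote 3 of Sec. 4.
NA = None  # an undefined term

def add(*terms):                  # rule (a): any NA makes the sum NA
    return NA if any(t is NA for t in terms) else sum(terms)

def mul(a, b):                    # rule (b): NA unless multiplied by zero
    if a == 0 or b == 0:
        return 0.0
    return NA if a is NA or b is NA else a * b

def div(num, den):                # rule (c): zero denominator gives NA
    if num is NA or den is NA or den == 0:
        return NA
    return num / den
```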
4. Positivity for Do-calculus

We now consider developing a general and principled approach for deriving a positivity condition by examining the conditions for the application of do-calculus (Pearl, 1995), which is a building block for identification algorithms and is shown to be complete for the task of causal effect identification (Shpitser & Pearl, 2006a). Do-calculus, capable of reducing an expression for $P_x(y)$ to a subscript-free expression, consists of the following three rules:

Rule 1 (addition/deletion of observation): $P_{\mathbf{x}}(\mathbf{y} \mid \mathbf{z}, \mathbf{w}) = P_{\mathbf{x}}(\mathbf{y} \mid \mathbf{w})$ if $(\mathbf{Y} \perp \mathbf{Z} \mid \mathbf{X}, \mathbf{W})_{G_{\overline{\mathbf{X}}}}$;
Rule 2 (exchange of action and observation): $P_{\mathbf{x},\mathbf{z}}(\mathbf{y} \mid \mathbf{w}) = P_{\mathbf{x}}(\mathbf{y} \mid \mathbf{z}, \mathbf{w})$ if $(\mathbf{Y} \perp \mathbf{Z} \mid \mathbf{X}, \mathbf{W})_{G_{\overline{\mathbf{X}}\underline{\mathbf{Z}}}}$;
Rule 3 (addition/deletion of action): $P_{\mathbf{x},\mathbf{z}}(\mathbf{y} \mid \mathbf{w}) = P_{\mathbf{x}}(\mathbf{y} \mid \mathbf{w})$ if $(\mathbf{Y} \perp \mathbf{Z} \mid \mathbf{X}, \mathbf{W})_{G_{\overline{\mathbf{X}}, \overline{Z(\mathbf{W})}}}$, where $Z(\mathbf{W}) = \mathbf{Z} \setminus An(\mathbf{W})_{G_{\overline{\mathbf{X}}}}$.

Pearl (1995) implicitly stated that a causal effect is identifiable from any positive distribution of the observed variables (i.e., $P(\mathbf{v}) > 0$) in a model characterized by a graph $G$ if there exists a finite sequence of transformations, each conforming to one of the three rules. We can relax the positivity conditions for do-calculus by taking advantage of the results from Lauritzen et al. (1990); Verma & Pearl (1990).

Proposition 4.1 (Prop. 2 in Lauritzen et al. (1990)). If $\mathbf{X}$ and $\mathbf{Y}$ are d-separated by $\mathbf{S}$, then $\mathbf{X} \perp_p \mathbf{Y} \mid \mathbf{S}$.

While the result looks familiar, it holds without strict positivity: whenever d-separation $(\mathbf{X} \perp_d \mathbf{Y} \mid \mathbf{S})$ holds true, then $P(\mathbf{x}, \mathbf{y} \mid \mathbf{s}) = P(\mathbf{x} \mid \mathbf{s})P(\mathbf{y} \mid \mathbf{s})$ for $P(\mathbf{s}) > 0$ and, similarly, $P(\mathbf{x} \mid \mathbf{y}, \mathbf{s}) = P(\mathbf{x} \mid \mathbf{s})$ for $P(\mathbf{s}, \mathbf{y}) > 0$, without strict positivity over all the variables (nodes) in the DAG. We generalize the proposition to the space of observational and interventional distributions and work out a sufficient positivity condition for each rule. Before we proceed, we slightly rewrite do-calculus, replacing $G_{\overline{\mathbf{X}}}$ with $G \setminus \mathbf{X}$ and dropping the condition on $\mathbf{X}$ (Lee & Bareinboim, 2020).

Lemma 4.1 (Do-calculus with relaxed positivity). Let $G$ be the directed acyclic graph (DAG) associated with a causal model, and let $P(\cdot)$ be the probability distribution induced by the model. Then,

Rule 1: $P_{\mathbf{x}}(\mathbf{y} \mid \mathbf{z}, \mathbf{w}) = P_{\mathbf{x}}(\mathbf{y} \mid \mathbf{w})$ if $(\mathbf{Y} \perp \mathbf{Z} \mid \mathbf{W})_{(G \setminus \mathbf{X})}$ and $P_{\mathbf{x}}(\mathbf{z}, \mathbf{w}) > 0$;
Rule 2: $P_{\mathbf{x},\mathbf{z}}(\mathbf{y} \mid \mathbf{w}) = P_{\mathbf{x}}(\mathbf{y} \mid \mathbf{z}, \mathbf{w})$ if $(\mathbf{Y} \perp \mathbf{Z} \mid \mathbf{W})_{(G \setminus \mathbf{X})_{\underline{\mathbf{Z}}}}$ and $P_{\mathbf{x}}(\mathbf{z}, \mathbf{w}) > 0$;
Rule 3: $P_{\mathbf{x},\mathbf{z}}(\mathbf{y} \mid \mathbf{w}) = P_{\mathbf{x}}(\mathbf{y} \mid \mathbf{w})$ if $(\mathbf{Y} \perp \mathbf{Z} \mid \mathbf{W})_{(G \setminus \mathbf{X})_{\overline{Z(\mathbf{W})}}}$ and $P_{\mathbf{x}}(\mathbf{w}) > 0$.

Note that, for example, if we were to claim $P_{\mathbf{x}}(\mathbf{y}, \mathbf{z} \mid \mathbf{w}) = P_{\mathbf{x}}(\mathbf{y} \mid \mathbf{w})P_{\mathbf{x}}(\mathbf{z} \mid \mathbf{w})$ as an alternative for Rule 1, the condition could be further relaxed to $P_{\mathbf{x}}(\mathbf{w}) > 0$. This lemma asserts sufficient positivity conditions not exclusively over an observational distribution. However, this does not hinder us from inducing a positivity condition over an observational distribution, since all the constrained interventional probabilities appearing in intermediate expressions are later expressed using only observational probabilities. In the following, we illustrate examples (i.e., front-door and Napkin) where positivity for do-calculus is applied while taking multiplication by zero into consideration.³ Due to the space constraint, the backdoor example is provided in Appendix C.1.

Example 4.1 (Front-door). We demonstrate the lemma by deriving the formula based on probability axioms and do-calculus (Pearl, 1995) on the front-door graph:

$$P_x(y) = \sum_z P_x(z)\,P_x(y \mid z) = \sum_z P_x(z)\,P_{x,z}(y) \qquad (4)$$
$$= \sum_z P_x(z)\,P_z(y) = \sum_z P(z \mid x)\,P_z(y) \qquad (5)$$
$$= \sum_z P(z \mid x) \sum_{x'} P_z(y \mid x')\,P_z(x') = \sum_z P(z \mid x) \sum_{x'} P_z(y \mid x')\,P(x') = \sum_z P(z \mid x) \sum_{x'} P(y \mid x', z)\,P(x'), \qquad (6)$$

where (5) and (6), respectively, are granted if $P(x) > 0 \vee \forall z\,(P_z(y) = 0)$ and $\forall z\,(P(z \mid x) = 0 \vee \mathrm{adj}(z; X))$. The conjunction of both conditions grants the identification of $P_x(y)$ without strict positivity. During the derivation, the other lines do not require any condition. For example, Eq. (4) does not invoke $P_x(z) > 0$ since $P_x(z) = 0$ makes $P_x(y \mid z)$ ignorable. We present the full condition in terms of the observational distribution in Appendix C.2, together with a post-hoc analysis, which yields the same positivity condition.

Example 4.2 (Napkin). We derive the condition for Napkin:

$$P_x(y) = P_{w,r,x}(y) = P_{w,r}(y \mid x) \quad \text{if } P_{w,r}(x) > 0$$
$$= P_{w,r}(y, x)/P_{w,r}(x) \quad \text{if } P_{w,r}(x) > 0$$
$$= P_r(y, x)/P_r(x) = \frac{\sum_{w'} P(y, x \mid r, w')\,P(w')}{\sum_{w'} P(x \mid r, w')\,P(w')} \quad \text{if } \mathrm{adj}(r; W).$$

This derivation requires two positivity conditions. In reverse order, there must exist $r$ such that (1) $\mathrm{adj}(r; W)$ and (2) $P_{w,r}(x) > 0$, which is equivalent to $P_r(x) > 0$ by Rule 3. Condition (2) is implied by $\sum_{w'} P(x \mid r, w')P(w') > 0$ under (1). This leads to $\mathrm{adj}(r; W) \wedge \exists w\,(P(x \mid r, w)P(w) > 0)$. Since $P(x \mid r, w)P(w) > 0 \Leftrightarrow P(x, r, w) > 0$ and $\exists w\,(P(x, r, w) > 0) \Leftrightarrow P(x, r) > 0$, we finally obtain

$$\exists r\,\big(\mathrm{adj}(r; W) \wedge P(x, r) > 0\big). \qquad (7)$$

This turns out to be equivalent to the condition derived by post-hoc analysis (see Appendix C.3).

³ To make sense of an undefined term multiplied by zero, consider the following derivation: $P(y) = \sum_z P(y, z) = \sum_{\{z \mid P(z) > 0\}} P(y, z) = \sum_{\{z \mid P(z) > 0\}} P(y \mid z)P(z)$. Thus, $P(y) = \sum_z P(y \mid z)P(z)$ is valid ignoring $z$ such that $P(z) = 0$.
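As a companion to Example 4.2, the following is a minimal sketch (our construction, not the paper's code) that searches for a value $r$ satisfying condition (7) and evaluates the Napkin formula at that $r$; all guards mirror adj(r; W) and $P(x, r) > 0$:

```python
# A minimal sketch: given a finite P(W, R, X, Y) as a dict over
# (w, r, x, y), find r satisfying condition (7) and evaluate the formula.
def napkin(x, y, P, ws, rs):
    def p(w=None, r=None, xv=None, yv=None):   # marginal/point lookup
        return sum(q for (wq, rq, xq, yq), q in P.items()
                   if (w is None or wq == w) and (r is None or rq == r)
                   and (xv is None or xq == xv) and (yv is None or yq == yv))
    for r in rs:
        adj_r_W = all(p(w=w, r=r) > 0 or p(w=w) == 0 for w in ws)  # adj(r; W)
        if adj_r_W and p(r=r, xv=x) > 0:       # condition (7) holds for this r
            num = sum(p(w=w, r=r, xv=x, yv=y) / p(w=w, r=r) * p(w=w)
                      for w in ws if p(w=w) > 0)
            den = sum(p(w=w, r=r, xv=x) / p(w=w, r=r) * p(w=w)
                      for w in ws if p(w=w) > 0)
            return num / den                   # den > 0 since P(x, r) > 0
    raise ValueError("condition (7) fails for every r")
```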
Proposition 4.2. The conjunction of the positivity conditions required in deriving an identification formula for a causal effect by applying do-calculus and probability operations sequentially is a sufficient positivity condition for the identification of the causal effect.

5. Generalized Q-decomposition and Positivity

Given the theoretical underpinnings of the sufficient positivity conditions for do-calculus, we now investigate the feasibility of creating an identification algorithm capable of simultaneously taking a non-strictly positive observational distribution into account. IDENTIFY is a sound and complete causal effect identification algorithm (Tian & Pearl, 2003) that identifies a causal effect in a semi-Markovian model. In this section, we characterize the key component of the algorithm, called Q-decomposition (Tian & Pearl, 2003), which decomposes a probability into c-factors, a special type of interventional probability. The c-factors derived from the given observational distribution are used to answer the c-factors derived from the query (Tian & Pearl, 2003; Shpitser & Pearl, 2006a). We review Q-decomposition and establish the correctness of the decomposition without relying on strict positivity. We can rewrite a causal effect as

$$P_{\mathbf{x}}(\mathbf{y}) = \sum_{\mathbf{y}^+ \setminus \mathbf{y}} P_{\mathbf{x}^+}(\mathbf{y}^+), \qquad (8)$$

where $\mathbf{Y}^+ = An(\mathbf{Y})_{G \setminus \mathbf{X}}$ and $\mathbf{X}^+ = \mathbf{V} \setminus \mathbf{Y}^+$. It is proved that the identifiability of $P_{\mathbf{x}^+}(\mathbf{y}^+)$ is equivalent to that of $P_{\mathbf{x}}(\mathbf{y})$ (Shpitser & Pearl, 2006a). Since marginalization and Rule 3 do not invoke positivity, we do not impose any positivity condition over the observation at this stage. The modified query $P_{\mathbf{x}^+}(\mathbf{y}^+)$ is then decomposed into c-factors, typically (and implicitly) under $P(\mathbf{V}) > 0$, based on the c-components (connected components via bidirected edges) of $G[\mathbf{Y}^+]$. Next, each c-factor is identified individually, and the results are combined to ultimately solve $P_{\mathbf{x}}(\mathbf{y})$:

$$P_{\mathbf{x}^+}(\mathbf{y}^+) = \prod_{\mathbf{C} \in cc(G[\mathbf{Y}^+])} Q[\mathbf{C}], \qquad (9)$$

where $cc(\cdot)$ denotes the c-components and

$$Q[\mathbf{W}](\mathbf{w}, \mathrm{pa}_{\mathbf{W}}) = \sum_{u(\mathbf{W})} P(u(\mathbf{W})) \prod_{V_i \in \mathbf{W}} \mathbb{1}_{v_i = f_i(\mathrm{pa}_i, u_i)}, \qquad (10)$$

with $U(\mathbf{W}) = \bigcup_{V_i \in \mathbf{W}} U_i$. A c-factor is simply written $Q[\mathbf{W}]$ for readability. The c-factor $Q[\mathbf{W}]$ is equivalent to $P_{\mathbf{V} \setminus \mathbf{W}}(\mathbf{W})$, an interventional distribution over $\mathbf{W}$ where all the other variables (including $\mathbf{W}$'s parents) are intervened on. We show the validity of the decomposition without any positivity condition:

Proposition 5.1. Under a structural causal model and without $P(\mathbf{V})$ being positive, the following holds: $P_{\mathbf{x}^+}(\mathbf{y}^+) = \prod_{\mathbf{C} \in cc(G[\mathbf{Y}^+])} Q[\mathbf{C}]$.

Figure 5: An exemplar causal diagram over $V_1, \ldots, V_7$.

Now we relate the following characterization of c-factors to the positivity condition. We revisit generalized Q-factorization (Tian & Pearl, 2003, Lemma 4), which consists of two parts, (i) and (ii). We present it as two separate lemmas with the assumptions on positivity made explicit, where the first part (i) can be claimed without positivity.

Lemma 5.1. Given $\mathbf{H} \subseteq \mathbf{V}$, let $\mathbf{H}_1, \ldots, \mathbf{H}_k$ be the c-components of $G[\mathbf{H}]$. Then, without $Q[\mathbf{H}]$ being positive, $Q[\mathbf{H}] = \prod_{i=1}^{k} Q[\mathbf{H}_i]$.

The second part (ii) involves positivity, which we will later generalize so that it can take the condition of $P(\mathbf{V})$ into account.

Lemma 5.2 (reproduced from Tian & Pearl 2003 with explicit positivity). Given $\mathbf{H} \subseteq \mathbf{V}$, let $\mathbf{H}_1, \ldots, \mathbf{H}_k$ be the c-components of $G[\mathbf{H}]$. Let $\prec$ be a topological order over the variables in $\mathbf{H}$ according to $G[\mathbf{H}]$ such that $V_{(1)} \prec V_{(2)} \prec \cdots \prec V_{(|\mathbf{H}|)}$. Let $\mathbf{H}^{(i)}$ be the variables in $\mathbf{H}$ that come before $V_{(i)}$, including $V_{(i)}$, and let $\mathbf{H}^{(>i)}$ be the variables in $\mathbf{H}$ that come after $V_{(i)}$. Given $Q[\mathbf{H}] > 0$,

$$Q[\mathbf{H}_j] = \prod_{\{i \mid V_{(i)} \in \mathbf{H}_j\}} \frac{Q[\mathbf{H}^{(i)}]}{Q[\mathbf{H}^{(i-1)}]}, \qquad (11)$$

where $Q[\mathbf{H}^{(i)}] = \sum_{\mathbf{h}^{(>i)}} Q[\mathbf{H}]$.
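Lemma 5.2 can be verified numerically on a small strictly positive model. The following is a minimal sketch (our own toy SCM on the front-door graph, with an arbitrary Bernoulli parameter): under the order $X \prec Z \prec Y$, the c-factor $Q[\{X, Y\}](x, z, y) = P_z(x, y)$ matches the product $\frac{Q[\mathbf{H}^{(1)}]}{Q[\mathbf{H}^{(0)}]} \cdot \frac{Q[\mathbf{H}^{(3)}]}{Q[\mathbf{H}^{(2)}]} = P(x)\,\frac{P(x, z, y)}{P(x, z)}$:

```python
# A minimal sketch verifying Eq. (11) on a strictly positive front-door SCM:
# X <- U1, Z <- X xor U2, Y <- Z xor (U1 and U3); U1 confounds X and Y.
from itertools import product

def bern(p):
    return {0: 1 - p, 1: p}

def model(do_z=None):
    dist = {}
    for (u1, p1), (u2, p2), (u3, p3) in product(
            bern(0.5).items(), bern(0.5).items(), bern(0.3).items()):
        x = u1
        z = do_z if do_z is not None else x ^ u2
        y = z ^ (u1 & u3)
        dist[(x, z, y)] = dist.get((x, z, y), 0.0) + p1 * p2 * p3
    return dist

P = model()
for x, z, y in product([0, 1], repeat=3):
    q_direct = model(do_z=z).get((x, z, y), 0.0)        # Q[{X,Y}] = P_z(x, y)
    px  = sum(P.get((x, a, b), 0.0) for a in (0, 1) for b in (0, 1))
    pxz = sum(P.get((x, z, b), 0.0) for b in (0, 1))
    q_formula = px * P.get((x, z, y), 0.0) / pxz        # product in Eq. (11)
    assert abs(q_direct - q_formula) < 1e-12
```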
One may think that the equality, which plays a key role in IDENTIFY, is well-defined as long as the numerators are well-defined (i.e., $\geq 0$) and the denominators are positive. However, this approach does not fully account for the interplay between positivity and decomposition, since Lem. 5.2 is derived under strict positivity, $Q[\mathbf{H}] > 0$.

5.1. Relaxed Generalized Q-decomposition

This motivates us to generalize Lem. 5.2 for algorithmic identification under relaxed positivity. The intuition behind the generalization is that the product of fractions can often be shortened by cancellation, depending on the topological order. We illustrate an example in Fig. 5. Here, $Q[\mathbf{H}]$ is factorized as $Q[\mathbf{H}] = Q[\mathbf{H}_1]\,Q[\mathbf{H}_2]$, where $\mathbf{H}_1 = \{V_1, V_2, V_4, V_6, V_7\}$ and $\mathbf{H}_2 = \{V_3, V_5\}$. Denoting $Q[\mathbf{H}^{(i)}]$ by $Q_i$ for brevity, if $Q[\mathbf{H}] = Q_7 > 0$, then $Q[\mathbf{H}_1] = \frac{Q_1}{Q_0}\frac{Q_2}{Q_1}\frac{Q_4}{Q_3}\frac{Q_6}{Q_5}\frac{Q_7}{Q_6}$ and $Q[\mathbf{H}_2] = \frac{Q_3}{Q_2}\frac{Q_5}{Q_4}$ by Lem. 5.2. Since $Q_6$ and $Q_1$ can be canceled out, we can write $Q[\mathbf{H}_1] = \frac{Q_2}{Q_0}\frac{Q_4}{Q_3}\frac{Q_7}{Q_5}$. We will show that this expression is valid if $Q_5 > 0$, and further show that it is still possible to identify $Q[\mathbf{H}_1]$ when some of the denominators are $0$, i.e., $Q_5 = 0$ or $Q_3 = 0$, relaxing the strict positivity condition $Q[\mathbf{H}] > 0$ in Lem. 5.2. To begin with, we define some indices, called anchors, useful for characterizing such cancellations.

Definition 5.1 (Anchors). Given $\mathbf{H} \subseteq \mathbf{V}$, let $\prec$ be a topological order over the variables in $\mathbf{H}$ according to $G[\mathbf{H}]$. Let $\mathbf{C} \in cc(G[\mathbf{H}])$, where $\mathbf{C}$ can be partitioned into (maximally) consecutive runs in the topological order such that $1 \leq l_1 \leq r_1 < l_2 \leq r_2 < \cdots < l_T \leq r_T \leq |\mathbf{H}|$ and $\mathbf{C} = \{V_{(i)} \mid i \in \bigcup_{d=1}^{T} [l_d, r_d]\}$. We call $\{(l_d, r_d)\}_{d=1}^{T}$ the anchors for $\mathbf{C}$, denoted $I_{G[\mathbf{H}], \prec}(\mathbf{C})$.

In Fig. 5, $\{(1, 2), (4, 4), (6, 7)\}$ and $\{(3, 3), (5, 5)\}$ are the anchors for $\mathbf{H}_1$ and $\mathbf{H}_2$, respectively, with maximally consecutive segments. Note that $\{(1, 1), (2, 2), (4, 4), (6, 6), (7, 7)\}$ is also a valid set of anchors for $\mathbf{H}_1$, the one employed in Lem. 5.2.

Theorem 5.1. Given $\mathbf{H} \subseteq \mathbf{V}$, let $\mathbf{H}' \in cc(G[\mathbf{H}])$, where $I_{G[\mathbf{H}], \prec}(\mathbf{H}') = \{(l_d, r_d)\}_{d=1}^{T}$. Then, the following holds:

(i) If $Q[\mathbf{H}^{(l_T - 1)}] > 0$,
$$Q[\mathbf{H}'] = \prod_{d=1}^{T} \frac{Q[\mathbf{H}^{(r_d)}]}{Q[\mathbf{H}^{(l_d - 1)}]}. \qquad (12)$$

(ii) If $Q[\mathbf{H}^{(r_m)}] = 0$ and $Q[\mathbf{H}^{(l_m - 1)}] > 0$ for some $m$, then $Q[\mathbf{H}'] = 0$.

This generalizes Lem. 5.2 in that irrelevant c-factors can be canceled out, requiring only the positivity of $Q[\mathbf{H}^{(l_T - 1)}]$. Further, even when this positivity assumption is violated, it still provides a condition under which the c-factor is identified as zero. In Fig. 5, Thm. 5.1 states that $Q[\mathbf{H}_1] = \frac{Q_2}{Q_0}\frac{Q_4}{Q_3}\frac{Q_7}{Q_5}$ if $Q_5 > 0$, and $Q[\mathbf{H}_2] = \frac{Q_3}{Q_2}\frac{Q_5}{Q_4}$ if $Q_4 > 0$. If $Q_4 = 0$ and $Q_3 > 0$, then $Q[\mathbf{H}_1] = 0$. Similarly, if $Q_2 = 0$ then $Q[\mathbf{H}_1] = 0$. On the other hand, if $Q_4 > 0$ and $Q_5 = 0$, we cannot draw a conclusion on $Q[\mathbf{H}_1]$. We demonstrate an example where a c-factor is non-identifiable when Thm. 5.1 is not applicable.

Example 5.1 (Front-door). Consider two SCMs $M_1$, $M_2$ that share the same $F$, i.e., $X \leftarrow U_1$, $Z \leftarrow X \wedge U_2$, $Y \leftarrow Z \wedge (U_1 \vee U_3)$, but differ in the distribution over $\mathbf{U}$: for $M_1$, $U_1, U_2 \sim \mathrm{Ber}(0.5)$ and $U_3 \sim \mathrm{Ber}(0.2)$, while for $M_2$, $U_3 \sim \mathrm{Ber}(0.8)$. Here, $P^{M_1}(\mathbf{V}) = P^{M_2}(\mathbf{V})$. Thm. 5.1-(i) states that if $P(x, z, y) > 0$, then $Q[\{X, Y\}](x, z, y) = \frac{Q[\{X, Z, Y\}]}{Q[\{X, Z\}]} \cdot \frac{Q[\{X\}]}{Q[\emptyset]} = P(y \mid x, z)P(x) > 0$. Thm. 5.1-(ii) states that if ($P(x, z, y) = 0$ and $P(x, z) > 0$) or $P(x) = 0$, then $Q[\{X, Y\}](x, z, y) = 0$. Thm. 5.1 is inapplicable for identifying $Q[\{X, Y\}]$ from $Q[\{X, Z, Y\}] = P(\mathbf{V})$ if $P(x, z) = 0$ and $P(x) > 0$. Consider $Q[\{X, Y\}](x{=}0, z{=}1, y{=}1)$, where $P(x{=}0, z{=}1) = 0$ and $P(x{=}0) = 0.5$. Here, the two models disagree: $Q[\{X, Y\}]^{M_1}(x{=}0, z{=}1, y{=}1) = 0.1 \neq 0.4 = Q[\{X, Y\}]^{M_2}(x{=}0, z{=}1, y{=}1)$; thus, $Q[\{X, Y\}](x{=}0, z{=}1, y{=}1)$ is non-identifiable.
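Example 5.1 can be checked by enumeration. The following is a minimal sketch (the boolean functions are as reconstructed above from the stated numbers) confirming that $M_1$ and $M_2$ agree on $P(\mathbf{V})$ yet disagree on the c-factor:

```python
# A minimal sketch verifying Example 5.1: M1 and M2 induce the same P(V)
# but Q[{X, Y}](x=0, z=1, y=1) = P_{z=1}(x=0, y=1) differs (0.1 vs 0.4).
from itertools import product

def dist(p3, do_z=None):
    # X <- U1, Z <- X and U2, Y <- Z and (U1 or U3); U3 ~ Ber(p3)
    out = {}
    for u1, u2, u3 in product([0, 1], repeat=3):
        p = 0.25 * (p3 if u3 else 1 - p3)
        x = u1
        z = do_z if do_z is not None else x & u2
        y = z & (u1 | u3)
        out[(x, z, y)] = out.get((x, z, y), 0.0) + p
    return out

P1, P2 = dist(0.2), dist(0.8)
assert P1.keys() == P2.keys()
assert all(abs(P1[k] - P2[k]) < 1e-12 for k in P1)   # same observational P(V)
print(dist(0.2, do_z=1).get((0, 1, 1)))              # ~0.1 under M1
print(dist(0.8, do_z=1).get((0, 1, 1)))              # ~0.4 under M2
```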
Unlike Lem. 5.2, where any topological order can be applied safely, Thm. 5.1 depends on the choice of topological order and the arguments of the c-factors, since they determine which c-factors that may be zero appear as denominators.

Example 5.2 (Order dependency without strict positivity). Consider the following functions: $W \leftarrow U_1$, $Z \leftarrow \neg U_1 \wedge U_2$, $X \leftarrow \neg W \wedge U_3$, $Y \leftarrow Z \vee U_3$, where $U_1, U_2, U_3 \sim \mathrm{Ber}(0.5)$. Here, we consider identifying $Q[\{X, Y\}](w{=}1, z{=}1, x{=}1, y{=}1)$. Under the topological order $W \prec Z \prec X \prec Y$, the expression for $Q[\{X, Y\}]$ following Eq. (12), if valid, is $Q[\{X, Y\}] = \frac{Q[\{W, Z, X, Y\}]}{Q[\{W, Z\}]}$. Here, we cannot reach a conclusion under this topological order since $Q[\{W, Z\}](w{=}1, z{=}1) = 0$. However, with a different topological order $W \prec X \prec Z \prec Y$, the expression for $Q[\{X, Y\}]$ from Eq. (12) becomes $Q[\{X, Y\}] = \frac{Q[\{W, X, Z, Y\}]}{Q[\{W, X, Z\}]} \cdot \frac{Q[\{W, X\}]}{Q[\{W\}]}$. Since $Q[\{W, X\}](w{=}1, x{=}1) = 0$ and $Q[\{W\}](w{=}1) = 0.5 > 0$, we have $Q[\{X, Y\}](w{=}1, z{=}1, x{=}1, y{=}1) = 0$ by Thm. 5.1-(ii).

As demonstrated in Examples 5.1 and 5.2, we are not able to draw a conclusion when Thm. 5.1 is inapplicable. Thus, a few research questions remain, such as an efficient way of finding a topological order that admits the theorem, or the necessity of the theorem, i.e., the non-identifiability of a causal quantity when the theorem is not applicable under any topological order. Answering those questions is beyond the scope of this paper, and we leave them for future work.

6. Algorithmic Positivity

Based on the characterization of c-factors with respect to positivity (Prop. 5.1, Lem. 5.1, and Thm. 5.1), we devise an identification algorithm named IDENTIFY+, which simultaneously returns a positivity condition implied by the resulting identification formula. We limit the algorithm to be data-agnostic, where the formula is derived without knowledge of the positivity of the underlying observational distribution (we discuss the design of a positivity-aware identification algorithm in Sec. 7.2). This condition is constructed on-the-fly by repeatedly translating the conditions for a c-factor of interest into conditions on the other c-factors that induce it.

Propagating a positivity condition. We present rules that translate a constraint over a c-factor $Q$ into constraints over the c-factors in the formula for $Q$. This makes use of the fact that every identifiable c-factor can be expressed using operators such as summation, product, and fraction over expressions of, ultimately, the observational distribution $P(\mathbf{V})$.

Proposition 6.1. Consider a c-factor and its expression induced by marginalization, Lem. 5.1, or Thm. 5.1. Then, a constraint over the expression of the c-factor is implied by a statement of first-order logic as follows:

$$\textstyle\sum_z g(z) \geq 0 \Leftarrow \forall z\,(g(z) \geq 0),$$
$$\textstyle\sum_z g(z) > 0 \Leftarrow \big(\forall z\,(g(z) \geq 0)\big) \wedge \big(\exists z\,(g(z) > 0)\big),$$
$$\textstyle\sum_z g(z) = 0 \Leftarrow \forall z\,(g(z) = 0),$$
$$\textstyle\prod_i g_i \geq 0 \Leftarrow \big(\exists i\,(g_i = 0)\big) \vee \big(\forall i\,(g_i \geq 0)\big),$$
$$\textstyle\prod_i g_i > 0 \Leftarrow \forall i\,(g_i > 0),$$
$$\textstyle\prod_i g_i = 0 \Leftarrow \exists i\,(g_i = 0),$$
$$g_1/g_2 \geq 0 \Leftarrow g_1 \geq 0 \wedge g_2 > 0,$$
$$g_1/g_2 > 0 \Leftarrow g_1 > 0 \wedge g_2 > 0,$$
$$g_1/g_2 = 0 \Leftarrow g_1 = 0 \wedge g_2 > 0.$$
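The rewriting of Prop. 6.1 is purely syntactic and can be encoded as a lookup table; the following is a minimal sketch under our own encoding (not the authors' implementation) of a single rewriting step:

```python
# A minimal sketch (our encoding) of the rewriting in Prop. 6.1: a constraint
# on a sum, product, or fraction is implied by a first-order condition on its
# parts. Constraints: 'ge0' (well-defined), 'gt0' (positive), 'eq0' (zero).
RULES = {
    ("sum", "ge0"):  ("forall", "ge0"),
    ("sum", "gt0"):  ("and", ("forall", "ge0"), ("exists", "gt0")),
    ("sum", "eq0"):  ("forall", "eq0"),
    ("prod", "ge0"): ("or", ("exists", "eq0"), ("forall", "ge0")),
    ("prod", "gt0"): ("forall", "gt0"),
    ("prod", "eq0"): ("exists", "eq0"),
    ("frac", "ge0"): ("and", ("num", "ge0"), ("den", "gt0")),
    ("frac", "gt0"): ("and", ("num", "gt0"), ("den", "gt0")),
    ("frac", "eq0"): ("and", ("num", "eq0"), ("den", "gt0")),
}

def expand_step(op, constraint):
    """One rewriting step on an `op`-shaped expression."""
    return RULES[(op, constraint)]

# e.g., Q1 = sum_a Q2 with the constraint Q1 = 0 yields: for all a, Q2 = 0.
assert expand_step("sum", "eq0") == ("forall", "eq0")
```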
We denote by EXPAND an algorithm that rewrites a constraint following Prop. 6.1. It takes a constraint expression $E$, a c-factor $Q$, and its expression in terms of other c-factors. For example, if $Q_1 = \sum_a Q_2$, then EXPAND($Q_1 = 0$, $Q_1$, $\sum_a Q_2$) would return $\forall a\,(Q_2 = 0)$. Further, if we have $Q_2 = \frac{Q_6}{Q_5}\frac{Q_4}{Q_3}$ by Thm. 5.1-(i), then EXPAND($\forall a\,(Q_2 = 0)$, $Q_2$, $\frac{Q_6}{Q_5}\frac{Q_4}{Q_3}$) returns $\forall a\,\big((Q_6 = 0 \wedge Q_5 > 0) \vee (Q_4 = 0 \wedge Q_3 > 0)\big)$. We would like to emphasize that part (ii) of Thm. 5.1 plays a pivotal role in inducing such positivity conditions in conjunction with Prop. 6.1, while deriving the formula itself follows part (i) of Thm. 5.1 and the other machinery.

Algorithm 1 IDENTIFY+ (outer)
1: input: $\mathbf{x}, \mathbf{y} \in \mathfrak{X}_{\{\mathbf{X}, \mathbf{Y}\}}$, $\mathbf{X}, \mathbf{Y} \subseteq \mathbf{V}$, $G$.
2: output: an expression and positivity condition for $P_{\mathbf{x}}(\mathbf{y})$.
3: Initialize constraint $E$ with $\exists\,\mathbf{x}^+ {\setminus} \mathbf{x}\;(P_{\mathbf{x}^+}(\mathbf{y}) \geq 0)$.
4: EXPAND($E$, $P_{\mathbf{x}^+}(\mathbf{y})$, $\sum_{\mathbf{y}^+\setminus\mathbf{y}} \prod_{\mathbf{C}_j \in cc(G[\mathbf{Y}^+])} Q[\mathbf{C}_j]$)
5: for $\mathbf{C}_j \in cc(G[\mathbf{Y}^+])$ do
6:   Let $\mathbf{S}_k \in cc(G[An(\mathbf{Y})_G])$ such that $\mathbf{C}_j \subseteq \mathbf{S}_k$.
7:   $f_{Q[\mathbf{C}_j] \mid Q[\mathbf{S}_k]}$ = IDENTIFY+($\mathbf{C}_j$, $G[\mathbf{S}_k]$, $Q[\mathbf{S}_k]$, $E$) if $\mathbf{C}_j \neq \mathbf{S}_k$
8:   EXPAND($E$, $Q[\mathbf{S}_k]$, $f_{Q[\mathbf{S}_k] \mid P}$)
9:   record $f_{Q[\mathbf{C}_j] \mid P} = f_{Q[\mathbf{C}_j] \mid Q[\mathbf{S}_k]} \circ f_{Q[\mathbf{S}_k] \mid P}$
10: end for
11: return $\sum_{\mathbf{y}^+\setminus\mathbf{y}} \prod_{\mathbf{C}_j \in cc(G[\mathbf{Y}^+])} f_{Q[\mathbf{C}_j] \mid P}$, $E$

Algorithm 2 IDENTIFY+ (inner)
1: input: $\mathbf{C}$, $G' = G[\mathbf{T}]$, $Q = Q[\mathbf{T}]$, constraint $E$.
2: output: an expression for $Q[\mathbf{C}]$ in terms of $Q[\mathbf{T}]$.
3: if ($\mathbf{A} := An(\mathbf{C})_{G[\mathbf{T}]}$) = $\mathbf{C}$ then
4:   EXPAND($E$, $Q[\mathbf{C}]$, $f_{Q[\mathbf{C}] \mid Q[\mathbf{T}]} := \sum_{\mathbf{t}\setminus\mathbf{c}} Q[\mathbf{T}]$)
5:   return $f_{Q[\mathbf{C}] \mid Q[\mathbf{T}]}$
6: else if $\mathbf{A} = \mathbf{T}$ then
7:   raise FAIL.
8: else
9:   Let $\mathbf{T}'$ be the c-component in $G[\mathbf{A}]$ s.t. $\mathbf{C} \subseteq \mathbf{T}'$.
10:  $f_{Q[\mathbf{C}] \mid Q[\mathbf{T}']}$ = IDENTIFY+($\mathbf{C}$, $G[\mathbf{T}']$, $Q[\mathbf{T}']$, $E$) if $\mathbf{C} \neq \mathbf{T}'$
11:  EXPAND($E$, $Q[\mathbf{T}']$, $f_{Q[\mathbf{T}'] \mid Q[\mathbf{A}]} \circ f_{Q[\mathbf{A}] \mid Q[\mathbf{T}]}$)
12:  return $f_{Q[\mathbf{C}] \mid Q[\mathbf{T}']} \circ f_{Q[\mathbf{T}'] \mid Q[\mathbf{A}]} \circ f_{Q[\mathbf{A}] \mid Q[\mathbf{T}]}$
13: end if

IDENTIFY+ algorithm. We incorporate EXPAND into algorithmic identification.⁴ Algorithm IDENTIFY+ transforms and factorizes a given query based on Q-decomposition (Alg. 1) and delegates the identification of each c-factor to its inner part (Alg. 2). To begin with, we introduce a new notation $f_{g \mid h}$, which is a formula for $g$ with respect to $h$ derived based on marginalization, Lem. 5.1, and Thm. 5.1. We denote symbolic substitution by $\circ$ so that $f_{g \mid h} \circ f_{h \mid i}$ is $f_{g \mid h}$ with $h$ replaced by $f_{h \mid i}$, yielding $f_{g \mid i}$. Further, abusing notation, we write $P_{\mathbf{x}}(\mathbf{y}) \geq 0$ to mean a constraint on an expression made of $P(\mathbf{V})$ such that $P_{\mathbf{x}}(\mathbf{y}) = f(P(\mathbf{V})) \geq 0$, where the expression $f$ is the output of the algorithm. Alg. 1 begins with the initialization of $\mathbf{X}^+$, $\mathbf{Y}^+$, and the constraint $E = \exists\,\mathbf{x}^+ {\setminus} \mathbf{x}\;(P_{\mathbf{x}^+}(\mathbf{y}) \geq 0)$ (Line 3), which is then expanded with respect to the formula (Eqs. (8) and (9), Line 4). Each c-factor is then identified, and its condition is examined (Lines 5-10). Line 7 induces $f_{Q[\mathbf{C}_j] \mid Q[\mathbf{S}_k]}$, an expression for $Q[\mathbf{C}_j]$ in terms of $Q[\mathbf{S}_k]$, following Alg. 2 (or the identity if $\mathbf{C}_j = \mathbf{S}_k$), and Line 8 translates the constraint $E$ written in $Q[\mathbf{S}_k]$ into one over $P$ via $f_{Q[\mathbf{S}_k] \mid P}$, which is constructed by Thm. 5.1. Finally, the identification formula and condition are returned (Line 11), with the formulas for the c-factors recorded in Line 9. The inner algorithm (Alg. 2) attempts to identify $Q[\mathbf{C}]$ with $Q[\mathbf{T}]$ (i.e., $Q[\mathbf{C}_j]$ and $Q[\mathbf{S}_k]$, respectively, in the outer part, Line 7) and returns $f_{Q[\mathbf{C}] \mid Q[\mathbf{T}]}$, updating the constraint $E$ with $Q[\mathbf{C}]$ replaced by $Q[\mathbf{T}]$. In a simple case, marginalization is sufficient. Otherwise, a recursion occurs to identify $Q[\mathbf{C}]$ with $Q[\mathbf{T}']$, based on Thm. 5.1-(i), which is acquired as a c-factor from $Q[\mathbf{A}]$, a marginalization of $Q[\mathbf{T}]$. Once the recursion terminates, we propagate the constraint over $Q[\mathbf{C}]$ to a constraint over $Q[\mathbf{T}']$ (Line 11). Once terminated, the outer algorithm takes care of further expressing the constraint using $Q[\mathbf{T}]$ (Line 8 in Alg. 1). We provide running examples in Appendix C.

Theorem 6.1. Whenever IDENTIFY+ returns a positivity condition and an identification formula, they are correct.

We focused on deriving a positivity condition on-the-fly while deriving a formula, relaxed in part by Thm. 5.1. However, this does not take the given observational distribution's positivity into account. We present in Sec. 7.2 the modifications that need to be added for IDENTIFY+ to identify a query while adapting to the condition of the given data.

7. Discussion

We discuss (i) the validity of post-hoc analysis, (ii) the design of a positivity-aware identification algorithm, and (iii) the generalization to conditional causal effects.

⁴ We focus on returning an identification formula rather than evaluating its value (i.e., estimation).
An extended discussion of other topics is provided in Appendix F.

7.1. Validity of Post-hoc Analysis

Now we formally examine the validity of post-hoc analysis. In post-hoc analysis, the sufficient conditions we considered when deriving fractions, summations, and multiplications (Sec. 6) are identical to those derived from the form of the identification formula (Prop. 6.1). With the formula intact (without further reduction of the algorithm's output), the recursive application of marginalization and Q-decomposition leaves imprints on the formula that can be back-traced. Hence, the positivity condition can be reconstructed equally.

Proposition 7.1. Consider an identification formula derived through IDENTIFY or IDENTIFY+. The post-hoc analysis driven by Prop. 6.1 yields a sufficient positivity condition.

We conjecture that the post-hoc analysis is sound for any identification expression made with products, marginalization, and fractions, irrespective of the choice of method, e.g., do-calculus, IDENTIFY, ID, identification criteria, etc. However, proving this is beyond the scope of this paper due to its generality.

7.2. Considerations for Positivity-Aware Identification

For the design of a positivity-aware identification algorithm, we present three key components that need to be considered, namely, the topological order, fixing values, and latent projection. With positivity checked for the given distribution, the new algorithm can further take advantage of part (ii) of Thm. 5.1. To utilize Thm. 5.1, the IDENTIFY+ algorithm needs to take topological orders into account. In particular, within a loop over topological orders, the inner part of IDENTIFY+ needs to be called with the order as an additional argument. The choice of values for Rule 3 (e.g., $r$ in Napkin) can be crucial under non-strict positivity. Thus, the outer algorithm, where the value $\mathbf{x}^+ \setminus \mathbf{x}$ is fixed, needs to enumerate different instantiations for subsets of $\mathbf{X}^+ \setminus \mathbf{X}$, e.g., $\{R\}$ or $\{R, W\}$ for Napkin. However, we do not take into account the values $\mathbf{y}^+ \setminus \mathbf{y}$, since they are already associated with marginalization, which iterates over all possible values.

Under strict positivity, the absence of a hedge for $P_{\mathbf{x}}(\mathbf{y})$ is necessary and sufficient for its identification. If a latent projection does not induce a hedge, IDENTIFY or ID yields a simpler expression with fewer variables, but it does not turn a non-identifiable effect into an identifiable one. However, under non-strict positivity, Q-decomposition under a projection (that does not induce a hedge) may avoid the $\frac{0}{0}$ case in Thm. 5.1, as demonstrated in Appendix D. In particular, the violation of positivity in a specific c-factor can be avoided by projecting out the variables involved, resulting in the consolidation of multiple c-components. This leads to different c-factors for $P_{\mathbf{x}}(\mathbf{y})$. Subsequently, the algorithm can be rerun with the projection to identify the causal effect. Hence, positivity-aware projection is a crucial component.

7.3. Generalization to Conditional Causal Effects

A sound and complete identification criterion for a conditional causal effect under strict positivity is shown in Tian (2004); Shpitser & Pearl (2006b). Given a query $P_{\mathbf{x}}(\mathbf{y} \mid \mathbf{w})$, one can apply Rule 2 of do-calculus to maximally convert conditioning into interventions, e.g., $P_{\mathbf{x}, \mathbf{w}'}(\mathbf{y} \mid \mathbf{w}'')$ with $\mathbf{W} = \mathbf{W}' \cup \mathbf{W}''$. Then, identifying the unconditional causal effect $P_{\mathbf{x}, \mathbf{w}'}(\mathbf{y}, \mathbf{w}'')$ leads to the identification of $P_{\mathbf{x}, \mathbf{w}'}(\mathbf{y} \mid \mathbf{w}'')$ (if $P_{\mathbf{x}, \mathbf{w}'}(\mathbf{w}'') > 0$), which is equivalent to the original query $P_{\mathbf{x}}(\mathbf{y} \mid \mathbf{w})$. Since we have established sufficient conditions for both do-calculus and the identification of marginal effects, our results indeed generalize to conditional causal effects as well.
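As a compact restatement of this reduction (our rendering) with its accompanying positivity requirements:

```latex
\begin{align*}
P_{\mathbf{x}}(\mathbf{y}\mid\mathbf{w})
  &= P_{\mathbf{x},\mathbf{w}'}(\mathbf{y}\mid\mathbf{w}'')
     && \text{Rule 2, with } \mathbf{W}=\mathbf{W}'\cup\mathbf{W}'' \text{ and } \mathbf{W}' \text{ maximal,}\\
  &= \frac{P_{\mathbf{x},\mathbf{w}'}(\mathbf{y},\mathbf{w}'')}
          {P_{\mathbf{x},\mathbf{w}'}(\mathbf{w}'')}
     && \text{if } P_{\mathbf{x},\mathbf{w}'}(\mathbf{w}'') > 0,
\end{align*}
```

so it suffices to identify the marginal effect $P_{\mathbf{x},\mathbf{w}'}(\mathbf{y},\mathbf{w}'')$ under the positivity conditions accumulated for the Rule 2 applications (Lem. 4.1).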
8. Conclusion

Strict positivity is a long-standing critical assumption for causal inference, which is often unrealistic in practical scenarios. We provided a comprehensive treatment of constructing the positivity condition required in graphical causal effect identification. In particular, we first informally explored ways of eliciting the positivity innate in an identification formula in a post-hoc manner. We then provided positivity conditions for the rules of do-calculus so that one can induce a positivity condition together with an identification formula. Given the lack of a principled way to derive a formula using do-calculus, we delved into an algorithmic approach based on Q-decomposition, for which we provided a relaxed positivity condition. Equipped with this characterization, we devised an identification algorithm that simultaneously elicits a positivity condition. We discussed the design of a positivity-aware identification algorithm, which should account for topological orders and latent projections altogether, and the generalization of our results to conditional causal effects. We hope this research sparks further investigation into the development of identification algorithms that adapt to the positivity of the given data.

Impact Statement

This paper presents work whose goal is to advance the field of causal inference and machine learning. In particular, relaxing some of the assumptions underlying the task of causal effect identification will help in understanding the relationship between the available observational distribution and identification, further extending the applicability of causal effect identification to settings previously considered impossible. There are some potential societal consequences of our work, none of which we feel must be specifically highlighted here.

Acknowledgements

We would like to thank Jin Tian for providing helpful and extensive feedback. We also thank the anonymous reviewers for constructive comments to improve the manuscript. This work was partly supported by the IITP (2022-000953-PICA/20%), MFDS (23212MFDS202/20%), and NRF (RS-2023-00222663/20%, RS-2023-00211904/40%) grants funded by the Korean government.

References

Balke, A. Probabilistic Counterfactuals: Semantics, Computation, and Applications. PhD thesis, Computer Science Department, University of California, Los Angeles, CA, 1995.

Bareinboim, E., Correa, J. D., Ibeling, D., and Icard, T. On Pearl's hierarchy and the foundations of causal inference. In Probabilistic and Causal Inference: The Works of Judea Pearl, pp. 507-556. Association for Computing Machinery, 2022.

Dahabreh, I. J., Robins, J. M., Haneuse, S. J., and Hernán, M. A. Generalizing causal inferences from randomized trials: counterfactual and graphical identification. arXiv preprint arXiv:1906.10792, 2019.

Galles, D. and Pearl, J. An axiomatic characterization of causal counterfactuals. Foundations of Science, 3(1):151-182, 1998.

Geiger, D. and Pearl, J. On the logic of causal models. In Proceedings of the 4th Workshop on Uncertainty in Artificial Intelligence, pp. 136-147, St Paul, MN, 1988.

Hernán, M. A. and Robins, J. M.
Estimating causal effects from epidemiological data. Journal of Epidemiology & Community Health, 60(7):578-586, 2006.

Hernán, M. A. and Robins, J. M. Causal Inference: What If. Chapman & Hall/CRC, Boca Raton, 1st edition, 2020.

Huang, Y. and Valtorta, M. Pearl's calculus of intervention is complete. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, pp. 217-224, Corvallis, OR, 2006. AUAI Press.

Kolmogorov, A. N. Grundbegriffe der Wahrscheinlichkeitsrechnung. Ergebnisse der Mathematik, 1933. Published in English in 1950 as Foundations of the Theory of Probability, Chelsea, Bronx, NY.

Lauritzen, S. L., Dawid, A. P., Larsen, B. N., and Leimer, H.-G. Independence properties of directed Markov fields. Networks, 20(5):491-505, 1990.

Lee, S. and Bareinboim, E. Causal effect identifiability under partial-observability. In Proceedings of the 37th International Conference on Machine Learning, volume 119. PMLR, 2020.

Li, A. and Pearl, J. Bounds on causal effects and application to high dimensional data. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 5773-5780, 2022.

Pearl, J. [Bayesian analysis in expert systems]: Comment: Graphical models, causality and intervention. Statistical Science, 8(3):266-269, 1993.

Pearl, J. Causal diagrams for empirical research. Biometrika, 82(4):669-688, 1995.

Pearl, J. Causality. Cambridge University Press, 2009.

Pearl, J. and Mackenzie, D. The Book of Why. Basic Books, New York, 2018.

Pearl, J. and Verma, T. S. A theory of inferred causation. In Allen, J. A., Fikes, R., and Sandewall, E. (eds.), Principles of Knowledge Representation and Reasoning: Proceedings of the Second International Conference, pp. 441-452, San Mateo, CA, 1991. Morgan Kaufmann.

Petersen, M. L., Porter, K. E., Gruber, S., Wang, Y., and van der Laan, M. J. Diagnosing and responding to violations in the positivity assumption. Statistical Methods in Medical Research, 21(1):31-54, 2012.

Robins, J. M. A new approach to causal inference in mortality studies with a sustained exposure period - application to control of the healthy worker survivor effect. Mathematical Modelling, 7:1393-1512, 1986.

Shpitser, I. and Pearl, J. Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First AAAI Conference on Artificial Intelligence, volume 2, pp. 1219-1226, 2006a.

Shpitser, I. and Pearl, J. Identification of conditional interventional distributions. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, pp. 437-444, Corvallis, OR, 2006b. AUAI Press.

Spirtes, P., Glymour, C. N., and Scheines, R. Causation, Prediction, and Search. Springer-Verlag, New York, 1993.

Tian, J. Identifying conditional causal effects. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 561-568, Arlington, VA, 2004. AUAI Press. URL http://dl.acm.org/citation.cfm?id=1036843.1036911.

Tian, J. and Pearl, J. Probabilities of causation: Bounds and identification. Annals of Mathematics and Artificial Intelligence, 28(1-4):287-313, 2000.

Tian, J. and Pearl, J.
A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI 2002), pp. 567-573, Menlo Park, CA, 2002. AAAI Press/The MIT Press.

Tian, J. and Pearl, J. On the identification of causal effects. Technical Report R-290-L, Department of Computer Science, University of California, Los Angeles, CA, 2003.

Tikka, S. and Karvanen, J. Simplifying probabilistic expressions in causal inference. Journal of Machine Learning Research, 18(1):1203-1232, 2017.

Tikka, S. and Karvanen, J. Enhancing identification of causal effects by pruning. Journal of Machine Learning Research, 18(1):7072-7094, 2018.

Verma, T. and Pearl, J. Causal networks: Semantics and expressiveness. In Machine Intelligence and Pattern Recognition, volume 9, pp. 69-76. Elsevier, 1990.

Verma, T. S. Graphical Aspects of Causal Models. Technical Report R-191, Cognitive Systems Laboratory, Computer Science Department, University of California, Los Angeles, CA, 1993.

Zhang, J., Tian, J., and Bareinboim, E. Partial counterfactual identification from observational and experimental data. In International Conference on Machine Learning, pp. 26548-26558. PMLR, 2022.

A. Preliminaries

Here we present the definitions of d-separation, latent projection, and the causal effect $P_{\mathbf{x}}(\mathbf{y})$.

d-separation (excerpt from Geiger & Pearl (1988)). For any three disjoint subsets $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ of nodes in a DAG $G$, $\mathbf{Z}$ is said to d-separate $\mathbf{X}$ from $\mathbf{Y}$ if there is no path from a node in $\mathbf{X}$ to a node in $\mathbf{Y}$ along which (i) every node that delivers an arrow is outside $\mathbf{Z}$, and (ii) every node with converging arrows either is in $\mathbf{Z}$ or has a descendant in $\mathbf{Z}$.

Latent projection (excerpt from Tian & Pearl (2003)). The latent projection of a graph $G$ over $\mathbf{V}$ onto $\mathbf{C} \subseteq \mathbf{V}$, denoted $G\langle\mathbf{C}\rangle$, is a graph over $\mathbf{C}$ such that, in addition to the edges in $G[\mathbf{C}]$, for every pair of vertices $(V_i, V_j)$ in $\mathbf{C}$, (i) a directed edge $V_i \to V_j$ is added in $G\langle\mathbf{C}\rangle$ if there exists a directed path from $V_i$ to $V_j$ in $G$ such that every intermediate vertex on the path is not in $\mathbf{C}$, and (ii) a bidirected edge $V_i \leftrightarrow V_j$ is added in $G\langle\mathbf{C}\rangle$ if there exists a divergent path between $V_i$ and $V_j$ in $G$ such that every intermediate vertex on the path is not in $\mathbf{C}$.

Causal effect $P_{\mathbf{x}}(\mathbf{y})$. Under Pearl's causal hierarchy (Bareinboim et al., 2022, Def. 5), $P(\mathbf{y}_{\mathbf{x}}) := \sum_{\{\mathbf{u} \mid \mathbf{Y}_{\mathbf{x}}(\mathbf{u}) = \mathbf{y}\}} P(\mathbf{u})$, where the potential response $\mathbf{Y}_{\mathbf{x}}(\mathbf{u})$ is defined as the solution for $\mathbf{Y}$ of the set of equations $F_{\mathbf{x}}$ with respect to the SCM $M$. Importantly, this definition does not rely on strict positivity. Similarly, the subsequent definition of the conditional causal effect, i.e., $P_{\mathbf{x}}(\mathbf{y} \mid \mathbf{w}) = P_{\mathbf{x}}(\mathbf{y}, \mathbf{w}) / P_{\mathbf{x}}(\mathbf{w})$ if $P_{\mathbf{x}}(\mathbf{w}) > 0$, is also well-established regardless of the strict positivity assumption.

B. Properties of C-factors

Proposition B.1. Let $\mathbf{C} \subseteq \mathbf{D}$. If $Q[\mathbf{D}](\mathbf{d}, \mathrm{pa}_{\mathbf{D}}) > 0$, then $Q[\mathbf{C}](\mathbf{c}, \mathrm{pa}_{\mathbf{C}}) > 0$ with $\mathbf{c} \cup \mathrm{pa}_{\mathbf{C}}$ consistent with $\mathbf{d} \cup \mathrm{pa}_{\mathbf{D}}$.

Proof. First, $\mathbf{C} \cup Pa(\mathbf{C}) \subseteq \mathbf{D} \cup Pa(\mathbf{D})$. By the definition of a c-factor, there exists a value $u(\mathbf{D})$ with $P(u(\mathbf{D})) > 0$ that yields $\mathbf{d}$ given $Pa(\mathbf{D}) = \mathrm{pa}_{\mathbf{D}}$. Since $u(\mathbf{C}) \subseteq u(\mathbf{D})$, $P(u(\mathbf{C})) > 0$, and $\mathbf{C}$ yields $\mathbf{c}$ consistent with $\mathbf{d}$ when $Pa(\mathbf{C})$ is fixed to the values consistent with $\mathbf{d} \cup \mathrm{pa}_{\mathbf{D}}$.

Corollary B.1. Let $\mathbf{C} \subseteq \mathbf{D}$. If $Q[\mathbf{C}](\mathbf{c}, \mathrm{pa}_{\mathbf{C}}) = 0$, then $Q[\mathbf{D}](\mathbf{d}, \mathrm{pa}_{\mathbf{D}}) = 0$ with $\mathbf{c} \cup \mathrm{pa}_{\mathbf{C}}$ consistent with $\mathbf{d} \cup \mathrm{pa}_{\mathbf{D}}$.

Corollary B.2. Given $\mathbf{H} \subseteq \mathbf{V}$, let $\prec$ be a topological order over the variables in $\mathbf{H}$ according to $G[\mathbf{H}]$. If $i \leq j$ and $Q[\mathbf{H}^{(i)}] = 0$, then $Q[\mathbf{H}^{(j)}] = 0$.

Corollary B.3. Given $\mathbf{H} \subseteq \mathbf{V}$, let $\prec$ be a topological order over the variables in $\mathbf{H}$ according to $G[\mathbf{H}]$. If $i \leq j$ and $Q[\mathbf{H}^{(j)}] > 0$, then $Q[\mathbf{H}^{(i)}] > 0$.

Proposition B.2.
Given $\mathbf{H} \subseteq \mathbf{V}$, let $\prec$ be a topological order over the variables in $\mathbf{H}$ according to $G[\mathbf{H}]$, and let $i \leq j$. Then, the following statements are equivalent. (a) $Q[\mathbf{H}^{(i-1)}] > 0$ and $Q[\mathbf{H}^{(j)}] = 0$. (b) There exists $t \in [i, j]$ such that $Q[\mathbf{H}^{(t-1)}] > 0$ and $Q[\mathbf{H}^{(t)}] = 0$.

Proof. ((a) implies (b)) We give a proof by contradiction. Suppose there does not exist $t \in [i, j]$ such that $Q[\mathbf{H}^{(t-1)}] > 0$ and $Q[\mathbf{H}^{(t)}] = 0$. Then, for all $t \in [i, j]$, $Q[\mathbf{H}^{(t-1)}] = 0$ whenever $Q[\mathbf{H}^{(t)}] = 0$. Since $Q[\mathbf{H}^{(j)}] = 0$, we have $Q[\mathbf{H}^{(j)}] = Q[\mathbf{H}^{(j-1)}] = \cdots = Q[\mathbf{H}^{(i-1)}] = 0$, which contradicts the hypothesis $Q[\mathbf{H}^{(i-1)}] > 0$. ((b) implies (a)) By Corollary B.2, $Q[\mathbf{H}^{(j)}] = 0$ since $t \leq j$ and $Q[\mathbf{H}^{(t)}] = 0$. Also, by Corollary B.3, $Q[\mathbf{H}^{(i-1)}] > 0$ since $i - 1 \leq t - 1$ and $Q[\mathbf{H}^{(t-1)}] > 0$.

C. Derivation of Positivity for Backdoor, Front-door, and Napkin Graphs

We highlight a few facts along the way so that readers are not confused when deriving some of the positivity conditions in this section. We present positivity conditions for the backdoor, front-door, and Napkin graphs derived based on post-hoc analysis, do-calculus, generalized Q-decomposition (manually), and the algorithmic approach (IDENTIFY+).

C.1. Backdoor graph

To begin with, note that the term $P(x, z) > 0$ is equivalent to $P(x \mid z) > 0$.

Post-hoc Analysis

$$\sum_z P(y \mid x, z)P(z) \geq 0 \;\Leftrightarrow\; \sum_z \underbrace{P(y \mid x, z)}_{①}\,\underbrace{P(z)}_{②} \geq 0 \;\Leftarrow\; \forall z\,\big(① = 0 \vee ② = 0 \vee (① \geq 0 \wedge ② \geq 0)\big)$$
$$\Leftarrow \forall z\,\big((P(y, x, z) = 0 \wedge P(x, z) > 0) \vee P(z) = 0 \vee (P(x, z) > 0 \wedge P(z) \geq 0)\big)$$
$$\Leftarrow \forall z\,\big((P(y, x, z) = 0 \wedge P(x, z) > 0) \vee P(z) = 0 \vee P(x, z) > 0\big)$$
$$\Leftarrow \forall z\,\big(P(x, z) > 0 \vee P(z) = 0\big).$$

This is equivalent to $\forall z\,(P(x \mid z) > 0 \vee P(z) = 0)$.

Do-Calculus. We show the do-calculus-based derivation and check the positivity condition for the backdoor graph:

$$P_x(y) = \sum_z P_x(y \mid z)P_x(z) = \sum_z P(y \mid x, z)P_x(z) \quad \text{if } \forall z\,(P(x, z) > 0 \vee P_x(z) = 0)$$
$$= \sum_z P(y \mid x, z)P(z).$$

Plugging in $P_x(z) = P(z)$, this derivation yields adj(x; Z), the same condition as in the post-hoc analysis.

Q-decomposition (manually)

$$P_x(y) = \sum_z Q[\{Y, Z\}] = \sum_z Q[\{Y\}](y, x, z)\,Q[\{Z\}](z),$$
$$P_x(y) \geq 0 \Leftarrow \sum_z Q[\{Y, Z\}] \geq 0 \Leftarrow \forall z\,\big(Q[\{Y\}] = 0 \vee Q[\{Z\}] = 0 \vee (Q[\{Y\}] \geq 0 \wedge Q[\{Z\}] \geq 0)\big).$$

By Thm. 5.1, for all $z$, each of $Q[\{Y\}](y, x, z)$ and $Q[\{Z\}](z)$ is further identified as

$$Q[\{Y\}](y, x, z) = \frac{Q[\{X, Y, Z\}]}{Q[\{X, Z\}]} = \frac{P(x, y, z)}{P(x, z)} \quad \text{if } P(x, z) > 0,$$
$$Q[\{Y\}](y, x, z) = 0 \quad \text{if } P(y \mid x, z) = 0,$$
$$Q[\{Z\}](z) = \sum_{x, y} Q[\{X, Y, Z\}] = P(z).$$

Combining these, we arrive at

$$P_x(y) \geq 0 \Leftarrow \forall z\,(Q[\{Y\}] = 0 \vee Q[\{Z\}] = 0 \vee (Q[\{Y\}] \geq 0 \wedge Q[\{Z\}] \geq 0))$$
$$\Leftarrow \forall z\,(P(y \mid x, z) = 0 \vee P(z) = 0 \vee (P(x, z) > 0 \wedge P(z) \geq 0))$$
$$\Leftarrow \forall z\,(P(y \mid x, z) = 0 \vee P(z) = 0 \vee P(x, z) > 0)$$
$$\Leftarrow \forall z\,(P(z) = 0 \vee P(x, z) > 0)$$
$$\Leftrightarrow \forall z\,(P(z) = 0 \vee P(x \mid z) > 0).$$

We end up with exactly the same positivity condition as the post-hoc analysis and do-calculus above.

Algorithmic Approach

(outer, line 3) $E = (P_x(y) \geq 0)$.
(outer, line 4) With the expression $\sum_z Q[\{Y\}]Q[\{Z\}]$, $E$ is expanded to $\forall z\,(Q[\{Y\}] = 0 \vee Q[\{Z\}] = 0 \vee (Q[\{Y\}] \geq 0 \wedge Q[\{Z\}] \geq 0))$.
(outer, line 5) case $Q[\{Y\}]$:
(outer, line 6) $\mathbf{S}_k = \{Y\}$.
(outer, line 7) $f_{Q[\{Y\}] \mid Q[\{Y\}]}$ is the identity, and the call to the inner part is skipped.
(outer, line 8) computes $Q[\{Y\}] = \frac{P(y, x, z)}{P(x, z)}$ and expands $E$ to $\forall z\,\big((P(y, x, z) = 0 \wedge P(x, z) > 0) \vee Q[\{Z\}] = 0 \vee (P(x, z) > 0 \wedge Q[\{Z\}] \geq 0)\big)$.
(outer, line 9) records $Q[\{Y\}] = \frac{P(y, x, z)}{P(x, z)}$.
(outer, line 5) case $Q[\{Z\}]$:
(outer, line 6) $\mathbf{S}_k = \{Z\}$.
(outer, line 7) is skipped.
(outer, line 8) computes $Q[\{Z\}] = P(z)$ and expands $E$ to $\forall z\,\big((P(y, x, z) = 0 \wedge P(x, z) > 0) \vee P(z) = 0 \vee (P(x, z) > 0 \wedge P(z) \geq 0)\big)$.
(outer, line 9) records $Q[\{Z\}] = P(z)$.
(outer, line 11) returns $\sum_z \frac{P(y, x, z)}{P(x, z)}\,P(z)$ and the up-to-date $E$.
Further reduction leads to the same conclusion as the other approaches:

$$\forall z\,\big((P(y, x, z) = 0 \wedge P(x, z) > 0) \vee P(z) = 0 \vee (P(x, z) > 0 \wedge P(z) \geq 0)\big)$$
$$\Leftarrow \forall z\,\big((P(y, x, z) = 0 \wedge P(x, z) > 0) \vee P(z) = 0 \vee P(x, z) > 0\big)$$
$$\Leftarrow \forall z\,\big(P(z) = 0 \vee P(x, z) > 0\big) \Leftrightarrow \forall z\,\big(P(z) = 0 \vee P(x \mid z) > 0\big).$$

C.2. Front-door graph

We note that $P(z \mid x) \geq 0$ is equivalent to $P(x) > 0$, and that $P(y \mid x', z)P(x') \geq 0$ is implied by $P(x') = 0 \vee P(z \mid x') > 0$ (i.e., adj(z; X) without the universal quantifier).

Post-hoc Analysis

$$\sum_z P(z \mid x) \sum_{x'} P(y \mid x', z)P(x') \geq 0 \;\Leftrightarrow\; \sum_z \underbrace{P(z \mid x)}_{①}\,\underbrace{\textstyle\sum_{x'} P(y \mid x', z)P(x')}_{②} \geq 0$$
$$\Leftarrow \forall z\,\big(① = 0 \vee ② = 0 \vee (① \geq 0 \wedge ② \geq 0)\big).$$

The conditions involving ② are

$$② = 0 \Leftarrow \forall x'\,\big(P(y \mid x', z) = 0 \vee P(x') = 0\big), \qquad ② \geq 0 \Leftarrow \mathrm{adj}(z; X).$$

$$P_x(y) \geq 0 \Leftarrow \forall z\,\Big[P(z \mid x) = 0 \vee \big(\forall x'\,(P(y \mid x', z) = 0 \vee P(x') = 0)\big) \vee \big(P(x) > 0 \wedge \mathrm{adj}(z; X)\big)\Big]. \qquad (13)$$

Figure 6: Equivalence between the post-hoc-analysis-driven condition (13) (panel a) and the do-calculus-driven condition (14) (panel b) for the front-door graph. Rows represent ① being undefined (NA), equal to zero, or greater than zero; columns are defined similarly for ②. Shaded cells mark the combinations satisfied by each condition, and the shaded patterns coincide.

Do-Calculus. We show the do-calculus-based derivation and check the positivity condition for the front-door graph:

$$P_x(y) = \sum_z P_x(y \mid z)P_x(z) = \sum_z P_{x,z}(y)P_x(z) = \sum_z P_z(y)P_x(z)$$
$$= \sum_z P_z(y)P(z \mid x) \quad \text{if } \forall z\,(P(x) > 0 \vee P_z(y) = 0)$$
$$= \sum_z P(z \mid x)\sum_{x'} P_z(y \mid x')P_z(x') = \sum_z P(z \mid x)\sum_{x'} P_z(y \mid x')P(x')$$
$$= \sum_z P(z \mid x)\sum_{x'} P(y \mid x', z)P(x') \quad \text{if } \forall z\,(P(z \mid x) = 0 \vee \mathrm{adj}(z; X)).$$

This derivation requires only two positivity conditions (with the numbering from the post-hoc analysis):

$$\forall z\,(P(x) > 0 \vee P_z(y) = 0), \quad \text{i.e., } \forall z\,(① \geq 0 \vee ② = 0);$$
$$\forall z\,(P(z \mid x) = 0 \vee \mathrm{adj}(z; X)), \quad \text{i.e., } \forall z\,(① = 0 \vee ② \geq 0).$$

Combining these, with $P_z(y)$ replaced by $\sum_{x'} P(y \mid x', z)P(x')$ (which is granted since we take the conjunction over the conditions, including those allowing $P_z(y) = \sum_{x'} P(y \mid x', z)P(x')$, beyond the assumptions encoded in the causal graph $G$), we obtain

$$\forall z\,\big((① \geq 0 \vee ② = 0) \wedge (① = 0 \vee ② \geq 0)\big). \qquad (14)$$

Its equivalence to the post-hoc analysis is illustrated in Fig. 6.

Q-decomposition (manually)

$$P_x(y) = \sum_z Q[\{Y, Z\}] = \sum_z Q[\{Y\}]Q[\{Z\}] = \sum_z Q[\{Z\}](x, z) \sum_{x'} Q[\{X, Y\}](x', y, z).$$

$$P_x(y) \geq 0 \Leftarrow \sum_z Q[\{Y, Z\}] \geq 0 \Leftarrow \forall z\,\big(Q[\{Z\}] = 0 \vee Q[\{Y\}] = 0 \vee (Q[\{Z\}] \geq 0 \wedge Q[\{Y\}] \geq 0)\big)$$
$$\Leftarrow \forall z\,\Big[\underbrace{Q[\{Z\}] = 0}_{① = 0} \vee \underbrace{\forall x'\,(Q[\{X, Y\}] = 0)}_{② = 0} \vee \big(\underbrace{Q[\{Z\}] \geq 0}_{① \geq 0} \wedge \underbrace{\forall x'\,(Q[\{X, Y\}] \geq 0)}_{② \geq 0}\big)\Big].$$

One can see the equivalence of this condition to (13) since, by Thm. 5.1, for all $z$ and $x'$,

$$Q[\{X, Y\}](x', y, z) = \frac{Q[\{X, Z, Y\}]}{Q[\{X, Z\}]} \cdot \frac{Q[\{X\}]}{Q[\emptyset]} = P(y \mid x', z)P(x') \geq 0 \quad \text{if } P(x', z) > 0 \vee P(x') = 0,$$
$$Q[\{X, Y\}](x', y, z) = 0 \quad \text{if } P(x') = 0 \vee P(y \mid x', z) = 0,$$
$$Q[\{Z\}](x, z) = \frac{Q[\{X, Z\}]}{Q[\{X\}]} = P(z \mid x) \quad \text{if } P(x) > 0,$$
$$Q[\{Z\}](x, z) = 0 \quad \text{if } P(z \mid x) = 0.$$

Algorithmic Approach. Some of the expressions are simplified (e.g., a conditional probability for a fraction) for readability.

(outer, line 3) initializes $E$ with $P_x(y) \geq 0$.
(outer, line 4) expands $E$ with $\sum_z Q[\{Z\}]Q[\{Y\}]$, yielding $\forall z\,(Q[\{Z\}] = 0 \vee Q[\{Y\}] = 0 \vee (Q[\{Z\}] \geq 0 \wedge Q[\{Y\}] \geq 0))$.
(outer, line 5) case $Q[\{Y\}]$:
(outer, line 6) $\mathbf{S}_k = \{X, Y\}$.
(outer, line 7) calls the inner part with $\mathbf{C}_j = \{Y\}$ and $G[\mathbf{S}_k]$ ($X \leftrightarrow Y$).
* (inner, line 3) the condition is satisfied: $\mathbf{A} = \mathbf{C} = \{Y\}$.
* (inner, line 4) expands with $Q[\{Y\}] = \sum_{x'} Q[\{X, Y\}]$, yielding $\forall z\,\big(Q[\{Z\}] = 0 \vee \forall x'\,(Q[\{X, Y\}] = 0) \vee (Q[\{Z\}] \geq 0 \wedge \forall x'\,(Q[\{X, Y\}] \geq 0))\big)$.
* (inner, line 5) returns $Q[\{Y\}] = \sum_{x'} Q[\{X, Y\}]$.
(outer, line 8) expands with $Q[\{X, Y\}] = \frac{P(y, z, x')}{P(z, x')}\,P(x')$ (shortened for readability):
$$\forall z\,\Big[Q[\{Z\}] = 0 \vee \forall x'\,\big(P(y \mid z, x') = 0 \vee P(x') = 0\big) \vee \big(Q[\{Z\}] \geq 0 \wedge \forall x'\,(P(x') = 0 \vee P(z \mid x') > 0)\big)\Big].$$
(outer, line 9) records $Q[\{Y\}] = \sum_{x'} P(y \mid z, x')P(x')$.
(outer, line 5) case $Q[\{Z\}]$:
(outer, line 6) $\mathbf{S}_k = \{Z\}$.
(outer, line 7) is skipped since $\mathbf{C}_j = \mathbf{S}_k = \{Z\}$.
(outer, line 8) expands $E$ with $Q[\{Z\}] = \frac{P(x, z)}{P(x)}$:
$$\forall z\,\Big[P(z \mid x) = 0 \vee \forall x'\,\big(P(y \mid z, x') = 0 \vee P(x') = 0\big) \vee \big(P(z \mid x) > 0 \wedge \forall x'\,(P(x') = 0 \vee P(z \mid x') > 0)\big)\Big].$$
(outer, line 9) records $Q[\{Z\}] = \frac{P(x, z)}{P(x)}$.
(outer, line 11) returns $\sum_z P(z \mid x) \sum_{x'} P(y \mid z, x')P(x')$ and the up-to-date $E$.
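The per-cell equivalence depicted in Fig. 6 can be verified exhaustively; the following is a minimal sketch (our encoding) over the nine joint states of ① and ②, using the fact that "= 0" entails "≥ 0" (defined):

```python
# A minimal sketch verifying Fig. 6: for each of the nine (1), (2) states
# (NA, zero, positive), conditions (13) and (14) select the same cells.
from itertools import product

states = ["NA", "eq0", "gt0"]
for s1, s2 in product(states, states):
    defined1, zero1 = s1 != "NA", s1 == "eq0"
    defined2, zero2 = s2 != "NA", s2 == "eq0"
    # (13), per cell: (1)=0 or (2)=0 or ((1)>=0 and (2)>=0)
    c13 = zero1 or zero2 or (defined1 and defined2)
    # (14), per cell: ((1)>=0 or (2)=0) and ((1)=0 or (2)>=0)
    c14 = (defined1 or zero2) and (zero1 or defined2)
    assert c13 == c14, (s1, s2)
```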
C.3. Napkin graph

Post-hoc Analysis

The causal formula for $P_x(y)$ is as follows:

$$\frac{\sum_w P(y, x \mid r, w) P(w)}{\sum_w P(x \mid r, w) P(w)}.$$

Since the causal quantity is a fraction, we should consider the (1) numerator and (2) denominator separately; in particular, when the denominator is zero, the causal quantity is undefined. We derive the positivity condition as follows:

$$\frac{\sum_w P(y, x \mid r, w) P(w)}{\sum_w P(x \mid r, w) P(w)} \succeq 0 \iff \forall r \big( ① \succeq 0 \land ② > 0 \big),$$

where each condition can be expressed as

$$① \succeq 0 \impliedby \mathrm{adj}(r; W),$$
$$② > 0 \iff \mathrm{adj}(r; W) \land \exists w \big( P(x \mid r, w) P(w) > 0 \big) \iff \mathrm{adj}(r; W) \land \exists w \big( P(x \mid r, w) > 0 \land P(w) > 0 \big)$$
$$\iff \mathrm{adj}(r; W) \land \exists w \big( P(x, r, w) > 0 \big) \iff \mathrm{adj}(r; W) \land P(x, r) > 0.$$

Hence,

$$P_x(y) \succeq 0 \iff \forall r \big( \mathrm{adj}(r; W) \land P(x, r) > 0 \big). \tag{15}$$

Do-Calculus

We start from $P_x(y) = P_{w,r,x}(y)$:

$$P_x(y) = P_{w,r,x}(y) = P_{w,r}(y \mid x) \quad \text{if } P_{w,r}(x) > 0$$
$$= \frac{P_{w,r}(y, x)}{P_{w,r}(x)} = \frac{\sum_{w'} P(y, x \mid r, w') P(w')}{\sum_{w'} P(x \mid r, w') P(w')} \quad \text{if } \mathrm{adj}(r; W).$$

By combining both conditions, with $P_{w,r}(x)$ substituted into the denominator, and applying the derivation shown in the post-hoc analysis for $② > 0$, we obtain

$$\forall r \big( \mathrm{adj}(r; W) \land P(x, r) > 0 \big). \tag{16}$$

This corresponds to condition (15), obtained by only examining the resulting identification formula.

Q-decomposition (manually)

We start from $P_x(y) = Q[\{Y\}]$. By Thm. 5.1, for any $r$,

$$Q[\{Y\}](y, x) = \frac{Q[\{X, Y\}](x, y, r)}{Q[\{X\}](x, r)} \quad \text{if } Q[\{X\}](x, r) > 0.$$

Then

$$P_x(y) = \frac{Q[\{X, Y\}](x, y, r)}{Q[\{X\}](x, r)} \succeq 0 \iff \forall r \big( Q[\{X, Y\}](x, y, r) \succeq 0 \land Q[\{X\}](x, r) > 0 \big).$$

Since $Q[\{X\}](x, r) = \sum_{y'} Q[\{X, Y\}](x, y', r)$, we have

$$P_x(y) = \frac{Q[\{X, Y\}](x, y, r)}{\sum_{y'} Q[\{X, Y\}](x, y', r)} \succeq 0$$
$$\iff \forall r \big( Q[\{X, Y\}](x, y, r) \succeq 0 \land \forall y' (Q[\{X, Y\}](x, y', r) \succeq 0) \land \exists y' (Q[\{X, Y\}](x, y', r) > 0) \big)$$
$$\iff \forall r \big( \forall y' (Q[\{X, Y\}] \succeq 0) \land \exists y' (Q[\{X, Y\}] > 0) \big).$$

Since $Q[\{X, Y\}] = \sum_w Q[\{W, X, Y\}]$, we have

$$\frac{\sum_w Q[\{W, X, Y\}](x, y, w, r)}{\sum_{y', w} Q[\{W, X, Y\}](x, y', w, r)} \succeq 0$$
$$\iff \forall r \Big( \forall (y', w) \big( Q[\{W, X, Y\}] \succeq 0 \big) \land \exists y' \big( \forall w (Q[\{W, X, Y\}] \succeq 0) \land \exists w (Q[\{W, X, Y\}] > 0) \big) \Big)$$
$$\iff \forall r \Big( \forall (y', w) \big( Q[\{W, X, Y\}] \succeq 0 \big) \land \exists (y', w) \big( Q[\{W, X, Y\}] > 0 \big) \Big).$$

By Thm. 5.1 and the definition of conditional probability,

$$Q[\{W, X, Y\}] = \frac{Q[\{W, R, X, Y\}]}{Q[\{W, R\}]} \cdot \frac{Q[\{W\}]}{Q[\emptyset]} = P(y, x \mid w, r) P(w) > 0 \ \text{ if } P(y, x, w, r) > 0,$$
$$Q[\{W, X, Y\}] = P(y, x \mid w, r) P(w) \succeq 0 \ \text{ if } P(r \mid w) > 0 \lor P(w) = 0.$$

Hence,

$$\frac{\sum_w P(y, x \mid w, r) P(w)}{\sum_{y', w} P(y', x \mid w, r) P(w)} \succeq 0 \iff \forall r \Big( \underbrace{\forall (y', w) \big( P(r \mid w) > 0 \lor P(w) = 0 \big)}_{①} \land \underbrace{\exists (y', w) \big( P(y', x, w, r) > 0 \big)}_{②} \Big).$$

Here, $y'$ can be removed from ①. Also, ② is equivalent to $P(x, r) > 0$. Therefore,

$$\frac{\sum_w P(y, x \mid w, r) P(w)}{\sum_w P(x \mid w, r) P(w)} \succeq 0 \iff \forall r \big( \forall w (P(r \mid w) > 0 \lor P(w) = 0) \land P(x, r) > 0 \big) \iff \forall r \big( \mathrm{adj}(r; W) \land P(x, r) > 0 \big),$$

which is exactly condition (16). Consequently, the resulting condition is the same as for both the post-hoc analysis and do-calculus.

Algorithmic Approach

(outer, line 3) initializes $E$ with $\forall w, r \big( P_{w,r,x}(y) \succeq 0 \big)$.
(outer, line 4) expands $E$ simply with $Q[\{Y\}]$, resulting in $Q[\{Y\}] \succeq 0$.
(outer, line 5) case $Q[\{Y\}]$:
(outer, line 6) $S_k = \{W, X, Y\}$,
(outer, line 7) calls the inner with $G[S_k]$ (i.e., $X \to Y$ with $W \leftrightarrow X$ and $W \leftrightarrow Y$).
* (inner, line 3) the condition is not satisfied, $A = \{X, Y\} \neq C$.
* (inner, line 6) the condition is not satisfied, $A \neq T$.
* (inner, line 9) $T = \{Y\}$ with $G[A]$ (i.e., $X \to Y$).
* (inner, line 10) is skipped.
* (inner, line 11) $f_{Q[T] \mid Q[A]} = \frac{Q[\{Y, X\}]}{\sum_{y'} Q[\{Y, X\}]}$ and $f_{Q[A] \mid Q[S_k]} = \sum_w Q[\{W, X, Y\}]$. Therefore, $f_{Q[T] \mid Q[A]} \circ f_{Q[A] \mid Q[S_k]} = \frac{\sum_w Q[\{W, X, Y\}]}{\sum_{y', w} Q[\{W, X, Y\}]}$. The expanded $E$ becomes $\frac{\sum_w Q[\{W, X, Y\}]}{\sum_{y', w} Q[\{W, X, Y\}]} \succeq 0$. We can ignore $\forall w$ since $w$ is no longer involved in any term. Then,
$$\forall r \Big( \sum_w Q[\{W, X, Y\}] \succeq 0 \land \sum_{y', w} Q[\{W, X, Y\}] > 0 \Big).$$
* (inner, line 12) returns $\frac{\sum_w Q[\{W, X, Y\}]}{\sum_{y', w} Q[\{W, X, Y\}]}$.
(outer, line 8) expands $Q[\{W, X, Y\}]$ with $\frac{P(y, x, r, w')}{P(r, w')} P(w') = P(y, x \mid r, w') P(w')$:
$$\forall r \Big( \forall w' \big( P(y, x \mid r, w') = 0 \lor P(w') = 0 \lor P(y, x, r, w') > 0 \big) \land \exists (y', w') \big( P(y', x, r, w') > 0 \big) \Big),$$
which can be further reduced to
$$\forall r \Big( \forall w' \big( P(r, w') > 0 \lor P(w') = 0 \big) \land P(x, r) > 0 \Big).$$
(outer, line 9) records $Q[\{Y\}] = \frac{\sum_{w'} P(y, x \mid r, w') P(w')}{\sum_{y', w'} P(y', x \mid r, w') P(w')}$.
(outer, line 11) returns $\frac{\sum_{w'} P(y, x \mid r, w') P(w')}{\sum_{w'} P(x \mid r, w') P(w')}$ with the up-to-date $E$, that is, $\forall r \big( \mathrm{adj}(r; W) \land P(x, r) > 0 \big)$.
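The Napkin condition (16) is likewise easy to check mechanically. Below is a minimal sketch (ours), assuming, purely for illustration, that the joint $P(W, R, X, Y)$ is given as a NumPy array `P[w, r, x, y]`; the function names are hypothetical.

```python
import numpy as np

def napkin_positivity(P, x):
    """Check Eq. (16): for every r, adj(r; W) and P(x, r) > 0."""
    P_w = P.sum(axis=(1, 2, 3))      # P(w)
    P_wr = P.sum(axis=(2, 3))        # P(w, r)
    P_rx = P.sum(axis=(0, 3))        # P(r, x)
    nW, nR = P_wr.shape
    for r in range(nR):
        adj_rW = all(P_w[w] == 0 or P_wr[w, r] > 0 for w in range(nW))
        if not (adj_rW and P_rx[r, x] > 0):
            return False
    return True

def napkin_effect(P, x, y, r=0):
    """Evaluate the Napkin formula at an (arbitrary) value r, guarded by Eq. (16)."""
    assert napkin_positivity(P, x)
    P_w = P.sum(axis=(1, 2, 3))
    P_wr = P.sum(axis=(2, 3))
    num = sum(P[w, r, x, y] / P_wr[w, r] * P_w[w]          # sum_w P(y,x|r,w) P(w)
              for w in range(len(P_w)) if P_w[w] > 0)
    den = sum(P[w, r, x, :].sum() / P_wr[w, r] * P_w[w]    # sum_w P(x|r,w) P(w)
              for w in range(len(P_w)) if P_w[w] > 0)
    return num / den
```

Under $\mathrm{adj}(r; W)$, every $w$ with $P(w) > 0$ has $P(w, r) > 0$, and $P(x, r) > 0$ guarantees a positive denominator, so neither division can fail.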
D. On Different Positivity Conditions for Different Identification Formulae

Causal graphs can be simplified by latent projection (Pearl & Verma, 1991; Verma, 1993), which may in turn simplify identification formulas (Tikka & Karvanen, 2017; 2018). Latent projection induces a causal graph over a subset of variables while maintaining the causal relationships among that subset.

Figure 7: (a) a causal diagram where $P_x(y)$ is identified via backdoor adjustment using either $Z_1$ or $Z_2$, (b) with $Z_1$ projected out, (c) with $Z_2$ projected out, and (d) a causal diagram with an observed mediator and confounder where both back- and front-door criteria are applicable.

D.1. Backdoor Graph with $\{Z_1, Z_2\}$

Through the example illustrated in Fig. 2(a), we argue that there can be multiple sufficient positivity conditions. Consider two identification formulas for $P_x(y)$ in $G$:

$$\sum_{z_1} P(y \mid x, z_1) P(z_1) \quad \text{and} \quad \sum_{z_2} P(y \mid x, z_2) P(z_2),$$

which imply two different positivity conditions, namely $\mathrm{adj}(x; Z_1)$ and $\mathrm{adj}(x; Z_2)$.

$\{Z_2\}$ works as an adjustment set but $\{Z_1\}$ does not. We claim that there exists a distribution where $\mathrm{adj}(x; Z_2)$ holds true but $\mathrm{adj}(x; Z_1)$ does not, via the following setting. Without loss of generality, we present the parametrization with marginal and conditional distributions. (We denote by $\mathcal{X}_Z$ the state space of $Z$.) Let $P(Z_1, Z_2) = P(Z_2 \mid Z_1) P(Z_1) > 0$ with $\mathcal{X}_{Z_2} = \{0, 1\}$ and $\mathcal{X}_{Z_1} = \{0, 1, 2\}$, and let $P(x \mid Z_1 = 2) = 0$ and $P(x \mid Z_1 = z_1) > 0$ for $z_1 \in \{0, 1\}$. First, this satisfies $\mathrm{adj}(x; Z_2)$ since $P(x \mid z_2) = \sum_{z_1} P(x \mid z_1) P(z_1 \mid z_2) > 0$: for every value $z_2$, there exists a value $z_1$ (in $\{0, 1\}$ but not $2$) such that $P(x \mid z_1) > 0$ and $P(z_1 \mid z_2) > 0$. However, $\mathrm{adj}(x; Z_1)$ does not hold since $P(x \mid Z_1 = 2) = 0$ while $P(Z_1 = 2) > 0$.

$\{Z_1\}$ being an adjustment set implies that $\{Z_2\}$ is an adjustment set; the converse, however, does not hold, as the parametrization above demonstrates. Assume for contradiction that $\mathrm{adj}(x; Z_2)$ is false, that is, $P(x \mid z_2^*) = 0 \land P(z_2^*) > 0$ for some $z_2^*$, or simply, $P(x \mid z_2^*) = 0$. Since $P(Z_2) = \sum_{z_1} P(Z_2 \mid z_1) P(z_1)$, for every $z_2$ with $P(z_2) > 0$, including $z_2^*$, there exists a value of $z_1$ such that $P(z_2 \mid z_1) > 0$ and $P(z_1) > 0$. Thus, there exists $z_1^*$ such that $P(z_1^* \mid z_2^*) > 0$ and $P(z_1^*) > 0$. Now, $P(x \mid z_2^*) = \sum_{z_1} P(x \mid z_1) P(z_1 \mid z_2^*)$ being $0$ implies that $P(x \mid z_1) = 0$ for every $z_1$ with $P(z_1 \mid z_2^*) > 0$. Therefore, $P(x \mid z_1^*) = 0$ and $P(z_1^*) > 0$, which violates $\mathrm{adj}(x; Z_1)$.
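The parametrization above can be instantiated numerically; the concrete values below are our own (any choice satisfying the stated constraints would do), and the check uses $P(x, z_2) = \sum_{z_1} P(x \mid z_1) P(z_1, z_2)$ as in the text.

```python
import numpy as np

# Hypothetical values instantiating the parametrization above.
P_z1 = np.array([0.4, 0.4, 0.2])               # P(Z1), all positive
P_z2_given_z1 = np.array([[0.5, 0.5],          # P(Z2 | Z1), all positive
                          [0.3, 0.7],
                          [0.6, 0.4]])
P_x_given_z1 = np.array([1.0, 1.0, 0.0])       # P(x | Z1): zero only at Z1 = 2

P_z1z2 = P_z1[:, None] * P_z2_given_z1         # P(Z1, Z2) > 0 everywhere
P_z2 = P_z1z2.sum(axis=0)                      # P(Z2)

# adj(x; Z2): P(x, z2) > 0 wherever P(z2) > 0
P_x_z2 = (P_x_given_z1[:, None] * P_z1z2).sum(axis=0)
print(all(p == 0 or q > 0 for p, q in zip(P_z2, P_x_z2)))                 # True

# adj(x; Z1) fails: P(Z1 = 2) > 0 but P(x, Z1 = 2) = 0
print(all(p == 0 or q > 0 for p, q in zip(P_z1, P_x_given_z1 * P_z1)))    # False
```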
D.2. Backdoor/Front-door Graph

We now consider a causal graph, illustrated in Fig. 2(b), where both the backdoor and front-door criteria are applicable. Further, the IDENTIFY algorithm returns a formula different from both criteria:

$$\sum_w P(y \mid x, w) P(w) \quad \text{(backdoor)} \tag{17}$$
$$\sum_z P(z \mid x) \sum_{x'} P(y \mid x', z) P(x') \quad \text{(front-door)} \tag{18}$$
$$\sum_z P(z \mid x) \sum_w P(y \mid z, w) P(w) \quad \text{(IDENTIFY)} \tag{19}$$

Even though the formulas differ from each other, they produce the same causal quantity under positivity. One might then consider the backdoor formula more applicable when specific information about $Z$ is unavailable, or, vice versa, the front-door formula when $W$ is unavailable.

Front-door but not backdoor. We claim that there exists a distribution where the front-door positivity condition holds true but the backdoor one does not. Consider the following setting: let $X$ and $Z$ be binary and $W$ be ternary, with $W$ uniformly distributed over $\mathcal{X}_W = \{0, 1, 2\}$. Let $P(x \mid W = 0) = 0$ and $P(x \mid W = w) > 0$ for $w \in \{1, 2\}$, and set $P(Z \mid X)$ to be positive. This satisfies the positivity condition for the front-door criterion since $P(X, Z) > 0$ makes $\forall z (P(z \mid x) > 0 \land \mathrm{adj}(z; X))$ true (the last term in Eq. (13)). On the other hand, the backdoor criterion does not hold because $P(x \mid W = 0) = 0$ and $P(W = 0) > 0$ violate $\mathrm{adj}(x; W)$.

In terms of Q-decomposition,

$$P_x(y) = \sum_{w, z} Q[\{Y, W, Z\}] = \sum_{w, z} Q[\{Y\}] Q[\{W\}] Q[\{Z\}], \tag{20}$$

where, if valid,

$$Q[\{Y\}] = \frac{Q[\{Y, Z, X, W\}]}{Q[\{Z, X, W\}]}, \tag{21}$$

but both the denominator and numerator are zero when $X = x$ and $W = 0$. With $W$ projected out, $X$ and $Y$ are now confounded. As we have seen before,

$$Q[\{X, Y\}] = \frac{Q[\{Y, Z, X\}]}{Q[\{Z, X\}]} \cdot Q[\{X\}],$$

that is, $W$ is removed from both the numerator and denominator, with a fraction for $X$ added.

Backdoor but not front-door. We similarly design the model. Let $P(X, W) > 0$ and $P(Z \mid x) > 0$, but suppose there exist $x' \neq x$ and $z^*$ such that $P(z^* \mid x') = 0$. Since $P(x \mid W) > 0$ satisfies $\mathrm{adj}(x; W)$, the backdoor criterion works. However, for the front-door criterion, $P(z^* \mid x) > 0$ (that is, $① > 0$) while $P(y \mid x', z^*)$ is undefined due to $P(x', z^*) = 0$ with $P(x') > 0$ (i.e., ② is not $\succeq 0$). Thus, the front-door positivity condition does not hold. Again, let us consider the Q-decomposition, focusing on $Q[\{Y\}]$ of Eq. (21) within Eq. (20). Now, due to the zero in $P(Z \mid X)$, both parts of the fraction are zero. With $Z$ projected out,

$$Q[\{Y\}] = \frac{Q[\{Y, X, W\}]}{Q[\{X, W\}]},$$

and the denominator becomes positive.
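The first of the two settings above ("front-door but not backdoor") can also be instantiated numerically; the values below are our own hypothetical choices satisfying the stated constraints.

```python
import numpy as np

# Graph of Fig. 2(b): W -> X -> Z -> Y with W -> Y; values are hypothetical.
P_w = np.full(3, 1 / 3)                     # W uniform over {0, 1, 2}
P_x_given_w = np.array([[1.0, 0.0],         # P(x=1 | W=0) = 0, as in the text
                        [0.5, 0.5],
                        [0.5, 0.5]])
P_z_given_x = np.array([[0.6, 0.4],         # P(Z | X) strictly positive
                        [0.3, 0.7]])
x = 1                                       # the treatment value with P(x | W=0) = 0

# Backdoor condition adj(x; W) fails: P(W=0) > 0 but P(x | W=0) = 0.
print(all(pw == 0 or P_x_given_w[w, x] > 0 for w, pw in enumerate(P_w)))  # False

# Front-door condition holds: P(X, Z) > 0, hence P(z | x) > 0 and adj(z; X).
P_x = P_w @ P_x_given_w                     # P(X)
P_xz = P_x[:, None] * P_z_given_x           # P(X, Z)
print((P_x > 0).all() and (P_xz > 0).all())                               # True
```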
E. Omitted Proofs

Lemma 4.1 (Do-calculus with relaxed positivity). Let $G$ be the directed acyclic graph (DAG) associated with a causal model, and let $P(\cdot)$ be the probability distribution induced by the model. Then,

Rule 1: $P_x(y \mid z, w) = P_x(y \mid w)$ if $(Y \perp Z \mid W)_{G_{\overline{X}}}$ and $P_x(z, w) > 0$;
Rule 2: $P_{x,z}(y \mid w) = P_x(y \mid z, w)$ if $(Y \perp Z \mid W)_{G_{\overline{X}\underline{Z}}}$ and $P_x(z, w) > 0$;
Rule 3: $P_{x,z}(y \mid w) = P_x(y \mid w)$ if $(Y \perp Z \mid W)_{G_{\overline{X}\,\overline{Z(W)}}}$ and $P_x(w) > 0$.

Proof. (Rule 1) By definition, $P_x$ is Markov relative to $G_{\overline{X}}$.

(Rule 2) We employ a lifted model and then relate the results in the lifted model to the underlying model. To cover $P_{x,z}$ and $P_x$ simultaneously, we augment the underlying model with indicators $I = \{I_Z \mid Z \in \mathbf{Z}\}$ with $I_Z \to Z$ for each $Z \in \mathbf{Z}$. Each indicator can be understood as an independent switch that changes the mechanism of $Z$ so as to implement both the original $f_Z$ when $I_Z = 0$ and the interventional mechanism $z$ when $I_Z = 1$. We denote by $Q$ the distribution induced by the augmented model. Then,

$$P_x(y \mid w, z) = Q(y \mid w, z, I = 0), \qquad P_{x,z}(y \mid w) = P_{x,z}(y \mid w, z) = Q(y \mid w, z, I = 1).$$

Let us denote by $H$ the graph $G$ augmented by $I$, and simply set $P(I) = 0.5^{|\mathbf{Z}|}$. Rule 2's antecedent $(Y \perp Z \mid W)_{G_{\overline{X}\underline{Z}}}$ is equivalent to $(I \perp Y \mid Z, W)_H$ (Pearl, 1995), which licenses $Q(Y \mid z, w) = Q(Y \mid z, w, I)$ under $Q(z, w, I) > 0$, that is, $Q(z, w) > 0$ since $I$ is marginally independent of $\{Z, W\}$. Thus, $(Y \perp Z \mid W)_{G_{\overline{X}\underline{Z}}}$ and $P_x(z, w) > 0$ are sufficient.

(Rule 3) We similarly define indicator variables over $\mathbf{Z}$ and an augmented graph $H$, where $(Y \perp Z \mid W)_{G_{\overline{X}\,\overline{Z(W)}}}$ is equivalent to $(I \perp Y \mid W)_H$. Thus, $Q(Y \mid I, W) = Q(Y \mid W)$ under $Q(w, I) > 0$. Hence, a sufficient condition for $P_{x,z}(y \mid w) = P_x(y \mid w)$ is the conjunction of $(Y \perp Z \mid W)_{G_{\overline{X}\,\overline{Z(W)}}}$ and $P_x(w) > 0$.

Proposition 4.2. The conjunction of the positivity conditions required in deriving an identification formula for a causal effect by sequentially applying do-calculus and probability operations is a sufficient positivity condition for the identification of the causal effect.

Proof. Each individual sufficient condition is satisfied whenever the conjunction of all of them holds true; hence, we can arrive at the last expression of the derivation. Additionally, if the derivation does not involve any reductions (e.g., merging multiple terms into one by multiplication), then any positivity condition stated over an interventional probability will also be expressed over the given observational distribution (i.e., recursively identified) by the end. Hence, we obtain a sufficient positivity condition expressed only over the given observational distribution $P(V)$, without needing to separately identify the intermediate interventional probabilities.

Corollary E.1. The conjunction of the positivity conditions required in deriving an identification formula for a causal effect by sequentially applying do-calculus, marginalization, and the definition of conditional probability (to factorize) is a sufficient positivity condition, over an observational distribution, for the identification of the causal effect.

Proposition 5.1. Under a structural causal model and without $P(V)$ being positive, the following holds:

$$P_{x^+}(y^+) = \prod_{C \in cc(G[Y^+])} Q[C].$$

Proof. With Rule 3 and the definition of the c-factor,

$$P_{x^+}(y^+) = P_{x^+,\, v \setminus (x^+ \cup\, y^+)}(y^+) = Q[Y^+] = \sum_{u} P(u) \prod_{V_i \in V \setminus X^+} \mathbb{1}_{v_i = f_i(pa_i, u_i)}.$$

By basic algebra, we can rearrange the terms as

$$\sum_{u} P(u) \prod_{V_i \in V \setminus X^+} \mathbb{1}_{v_i = f_i(pa_i, u_i)} = \prod_{C \in cc(G[Y^+])} \sum_{u(C)} P(u(C)) \prod_{V_i \in C} \mathbb{1}_{v_i = f_i(pa_i, u_i)} = \prod_{C \in cc(G[Y^+])} Q[C],$$

so that the terms related to each c-component are grouped.

Lemma 5.1. Given $H \subseteq V$, let $H_1, \ldots, H_k$ be the c-components of $G[H]$. Then, without $Q[H]$ being positive, $Q[H] = \prod_{j=1}^{k} Q[H_j]$.

Proof. We start from the definition:

$$Q[H](h, pa_H) = \sum_{u(H)} P(u(H)) \prod_{V_i \in H} \mathbb{1}_{v_i = f_i(pa_i, u_i)} = \sum_{u(H)} P(u(H)) \prod_j \prod_{V_i \in H_j} \mathbb{1}_{v_i = f_i(pa_i, u_i)}.$$

Since $U(H_m) \cap U(H_n) = \emptyset$ for any $m \neq n$,

$$Q[H](h, pa_H) = \sum_{u(H)} \prod_j P(u(H_j)) \prod_{V_i \in H_j} \mathbb{1}_{v_i = f_i(pa_i, u_i)} = \prod_j \sum_{u(H_j)} P(u(H_j)) \prod_{V_i \in H_j} \mathbb{1}_{v_i = f_i(pa_i, u_i)} = \prod_j Q[H_j](h_j, pa_{H_j}).$$

Theorem 5.1. Given $H \subseteq V$, let $H' \in cc(G[H])$ where $\mathcal{I}_{G[H], \prec}(H') = \{(l_d, r_d)\}_{d=1}^{T}$. Then, the following holds:

(i) If $Q[H^{l_T - 1}] > 0$, then
$$Q[H'] = \prod_{d=1}^{T} \frac{Q[H^{r_d}]}{Q[H^{l_d - 1}]}. \tag{12}$$

(ii) If $Q[H^{r_m}] = 0$ and $Q[H^{l_m - 1}] > 0$ for some $m$, then $Q[H'] = 0$.

Proof. We first note that Eq. (12) is well-defined under the assumption $Q[H^{l_T - 1}] > 0$ since it implies $Q[H^{i-1}] > 0$ for all $i \leq l_T$, and thus the denominators are positive.

(i) Suppose $Q[H^{l_T - 1}] > 0$. Since (ii) covers the case of $Q[H^{r_T}] = 0$, we now suppose $Q[H^{r_T}] > 0$. We have $H' \in cc(G[H^{r_T}])$ since $H' \subseteq H^{r_T} \subseteq H$ and $H' \in cc(G[H])$. Invoking Lem. 5.2,

$$Q[H'] = \prod_{i:\, V^{(i)} \in H'} \frac{Q[H^{i}]}{Q[H^{i-1}]},$$

where, for each $1 \leq d \leq T$ and $i \in [l_d + 1, r_d]$, we cancel out the terms $\frac{Q[H^{i-1}]}{Q[H^{i-1}]}$ in the telescoping product

$$\prod_{i \in [l_d, r_d]} \frac{Q[H^{i}]}{Q[H^{i-1}]} = \frac{Q[H^{r_d}]}{Q[H^{l_d - 1}]} > 0,$$

since $V^{(i)} \in H'$ for $i \in [l_d, r_d]$.

(ii) Suppose $Q[H^{r_m}] = 0$ and $Q[H^{l_m - 1}] > 0$ for some $m$. Then, there exists $t \in [l_m, r_m]$ such that $Q[H^{t}] = 0$ and $Q[H^{t-1}] > 0$ (Prop. B.2). Note that $V^{(t)} \in H'$. Now, we have $Q[H_j] > 0$ for all $H_j \in cc(G[H^{t-1}])$ since $H_j \subseteq H^{t-1}$ (Prop. B.1). Let $H'' \in cc(G[H^{t}])$ with $V^{(t)} \in H''$. Applying Lem. 5.1 to $Q[H^{t}]$,

$$0 = Q[H^{t}] = Q[H''] \prod_{H''' \in cc(G[H^{t}]) \setminus \{H''\}} Q[H''']. \tag{22}$$

For all $H''' \in cc(G[H^{t}]) \setminus \{H''\}$, $H'''$ is a c-component of $G[H^{t-1}]$ and thus $Q[H'''] > 0$. Therefore, by Eq. (22), we have $Q[H''] = 0$. Given $Q[H''] = 0$, we finally show $Q[H'] = 0$ by invoking Corollary B.1 with $H'' \subseteq H'$, which holds since $V^{(t)} \in H'' \in cc(G[H^{t}])$ and $V^{(t)} \in H' \in cc(G[H])$.

Proposition 6.1. Consider a c-factor and its expression induced by marginalization, Lem. 5.1, or Thm. 5.1 for the c-factor. Then, a constraint over the expression of the c-factor is implied by a statement of first-order logic as follows:

$$\textstyle\sum_z g(z) \succeq 0 \impliedby \forall z (g(z) \succeq 0),$$
$$\textstyle\sum_z g(z) > 0 \impliedby (\forall z (g(z) \succeq 0)) \land (\exists z (g(z) > 0)),$$
$$\textstyle\sum_z g(z) = 0 \impliedby \forall z (g(z) = 0),$$
$$\textstyle\prod_i g_i \succeq 0 \impliedby (\exists i (g_i = 0)) \lor (\forall i (g_i \succeq 0)),$$
$$\textstyle\prod_i g_i > 0 \impliedby \forall i (g_i > 0),$$
$$\textstyle\prod_i g_i = 0 \impliedby \exists i (g_i = 0),$$
$$g_1 / g_2 \succeq 0 \impliedby g_1 \succeq 0 \land g_2 > 0,$$
$$g_1 / g_2 > 0 \impliedby g_1 > 0 \land g_2 > 0,$$
$$g_1 / g_2 = 0 \impliedby g_1 = 0 \land g_2 > 0.$$

Proof. The result follows from basic algebra if there exist no undefined terms. In the case of possibly undefined terms, we only consider the two cases where we treat multiplication by zero as zero, namely, $\prod_i g_i \succeq 0$ and $\prod_i g_i = 0$. By the relaxed generalized Q-decomposition, whenever we have an undefined term (division by zero), it is covered by Thm. 5.1-(ii). Since Thm. 5.1 generalizes Lem. 5.2 (with anchors minimally consecutive, i.e., all separated), this also covers the equations output by IDENTIFY. Note that $g_1 / g_2 > 0$ can simply be implied by $g_1 > 0$ under the generalized Q-decomposition since the denominator is a marginalization of the numerator. Further, for each $i$, $g_i > 0$ and $g_i \succeq 0$ are interchangeable in $(\exists i (g_i = 0)) \lor (\forall i (g_i \succeq 0))$.
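Prop. 6.1 reads naturally as a recursive expansion over the syntax tree of an identification formula. Below is a minimal sketch (our own encoding, not part of IDENTIFY+; node shapes and the constraint names "defined" for $\succeq 0$, "pos" for $> 0$, and "zero" for $= 0$ are invented for illustration).

```python
def expand(expr, c):
    """Return a sufficient first-order condition for constraint c on expr,
    following the implications listed in Prop. 6.1."""
    kind = expr[0]
    if kind == "atom":                        # e.g., ("atom", "P(x,z)")
        return (c, expr[1])
    if kind == "sum":                         # ("sum", bound_var, g)
        _, v, g = expr
        if c == "defined":
            return ("forall", v, expand(g, "defined"))
        if c == "pos":                        # all terms defined, some term positive
            return ("and", ("forall", v, expand(g, "defined")),
                           ("exists", v, expand(g, "pos")))
        return ("forall", v, expand(g, "zero"))
    if kind == "prod":                        # ("prod", [g1, g2, ...])
        gs = expr[1]
        if c == "defined":                    # a zero factor absorbs undefined ones
            return ("or", ("any", [expand(g, "zero") for g in gs]),
                          ("all", [expand(g, "defined") for g in gs]))
        if c == "pos":
            return ("all", [expand(g, "pos") for g in gs])
        return ("any", [expand(g, "zero") for g in gs])
    if kind == "div":                         # ("div", numerator, denominator)
        _, g1, g2 = expr
        return ("and", expand(g1, c), expand(g2, "pos"))
    raise ValueError(kind)

# Backdoor formula sum_z P(y|x,z) P(z), with P(y|x,z) written as a fraction:
backdoor = ("sum", "z",
            ("prod", [("div", ("atom", "P(y,x,z)"), ("atom", "P(x,z)")),
                      ("atom", "P(z)")]))
condition = expand(backdoor, "defined")   # mirrors the expansion of E in C.1
```

Running `expand` on the backdoor expression reproduces the shape of the condition accumulated in $E$ in Appendix C.1, before the final logical simplification to $\mathrm{adj}(x; Z)$.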
Theorem 6.1. Whenever IDENTIFY+ returns a positivity condition and an identification formula, they are correct.

Proof. The soundness of the identification formula follows from the correctness of Thm. 5.1, which generalizes Lem. 5.2, and of the IDENTIFY algorithm, which provides the basis for IDENTIFY+. The soundness of the positivity condition is due to Props. 5.1 and 6.1.

F. Further Discussion

We discuss the generalization to continuous variables. The use of summation $\sum$ in identification formulas is a convention, and such formulas remain valid in the continuous case with integrals $\int$ (Shpitser & Pearl, 2006a; Tian & Pearl, 2003). However, challenges remain for the positivity condition since it involves probability density functions and measure-zero sets, which complicates the analysis, especially when fractions are involved. For the case of the Napkin graph (Fig. 4), ensuring a positive denominator with $\exists w (P(x \mid r, w) P(w) > 0)$ may be translated to $\int_{\mathcal{W}} f(x \mid r, w)\, dF(w) > 0$ where $\mathcal{W}$ is a subset of the domain of $W$. While we believe such a translation is sufficient for imposing constraints over point estimates in the continuous case, we defer its formal treatment and deeper discussion to future work.

Here, we compare our result to the relaxed positivity mentioned by Shpitser & Pearl (2006a). The authors relaxed $P(V) > 0$ and, without a concrete proof, relied on $P(Z) > 0$ whenever $P(W \mid Z)$ is considered in the identification. However, this yields a stricter positivity condition, e.g., $P(x, Z) > 0$ versus $\mathrm{adj}(x; Z)$ for the adjustment criterion. In this work, we explicitly considered canceling out terms, avoiding positivity checks for unnecessary c-factors. Further, we have incorporated the case where such positivity is violated yet identification is still feasible (Thm. 5.1-(ii)). Hence, we further advance the understanding of the formal connection between positivity conditions and causal effect identification.