# sensitivity_analysis_of_linear_structural_causal_models__f77aaa53.pdf

Sensitivity Analysis of Linear Structural Causal Models

Carlos Cinelli 1 Daniel Kumor 2 Bryant Chen 3 Judea Pearl 1 Elias Bareinboim 2

Causal inference requires assumptions about the data generating process, many of which are unveriﬁable from the data. Given that some causal assumptions might be uncertain or disputed, formal methods are needed to quantify how sensitive research conclusions are to violations of those assumptions. Although an extensive literature exists on the topic, most results are limited to speciﬁc model structures, while a general-purpose algorithmic framework for sensitivity analysis is still lacking. In this paper, we develop a formal, systematic approach to sensitivity analysis for arbitrary linear Structural Causal Models (SCMs). We start by formalizing sensitivity analysis as a constrained identiﬁcation problem. We then develop an efﬁcient, graph-based identiﬁcation algorithm that exploits non-zero constraints on both directed and bidirected edges. This allows researchers to systematically derive sensitivity curves for a target causal quantity with an arbitrary set of path coefﬁcients and error covariances as sensitivity parameters. These results can be used to display the degree to which violations of causal assumptions affect the target quantity of interest, and to judge, on scientiﬁc grounds, whether problematic degrees of violations are plausible.

1. Introduction

Randomized controlled trials (RCT) are considered the gold standard for identifying cause-effect relationships in dataintensive sciences (Gifﬁn et al., 2010). In practice, however, direct randomization is often infeasible or unethical, requiring researchers to combine non-experimental observations with assumptions about the data generating process in or-

1Depts. of Statistics and Computer Science, University of California, Los Angeles, California, USA. 2Dept. of Computer Science, Purdue University, West Lafayette, IN, USA. 3Brex, San Francisco, CA, USA. Most of this work was conducted while at IBM Research AI. Correspondence to: Carlos Cinelli <carloscinelli@ucla.edu>.

Proceedings of the 36 th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

der to obtain causal claims. These assumptions are usually encoded as the absence of certain causal relationships, or as the absence of association between certain unobserved factors. Conclusions based on causal models are, therefore, provisional: they depend on the validity of causal assumptions, regardless of the sample size (Pearl, 2000; Spirtes et al., 2000; Bareinboim & Pearl, 2016).

In many real settings, it is not uncommon that these assumptions are subject to uncertainty or dispute. Scientists may posit alternative causal models that are equally compatible with the observed data; or, more mundanely, researchers can make identiﬁcation assumptions for convenience, simply to proceed with estimation.1 Regardless of the motivation, the provisional character of causal inference behooves us to formally assess the extent to which causal conclusions are sensitive to violations of those assumptions.

The importance of such exercises is best illustrated with a real example, which directly impacted public policy. During the late 1950s and early 1960s, there was a ﬁerce debate regarding the causal effect of cigarette smoking on lung cancer. One of its most notable skeptics was the inﬂuential statistician Ronald Fisher, who claimed that, without an experiment, one cannot rule out unobserved common causes (e.g. the individual s genotype) as being responsible for the observed association (Fisher, 1957; 1958). Technically speaking, Fisher s statement was accurate; data alone could not refute his hypothesis. Yet, although no RCT measuring the effect of cigarette smoking on lung cancer was performed, currently there exists a broad consensus around the issue. How could such a consensus emerge?

An important step towards the current state of affairs was a sensitivity analysis performed by Cornﬁeld et al. (1959). Their investigation consisted of the following hypothetical question: if Fisher s hypothesis were true, how strong would the alleged confounder need to be to explain all the observed association between cigarette smoking and lung cancer? The analysis concluded that, since smokers had nine times the risk of nonsmokers for developing lung cancer, the latent confounder would need to be at least nine times more common in smokers than in nonsmokers something

1As noted by Joffe et al. (2010), such assumptions are usually made casually, largely because they justify the use of available statistical methods and not because they are truly believed .

Sensitivity Analysis of Linear Structural Causal Models

deemed implausible by experts at the time.

Cornﬁeld s exercise reveals the fundamental steps of a sensitivity analysis. The analyst introduces a violation of a causal assumption of the current model, such as positing the presence of unobserved confounders that induce a non-zero association between two error terms. Crucially, however, we are willing to tolerate this violation up to a certain plausibility limit dictated by expert judgment (e.g., prior biological understanding, pilot studies). The task is, thus, to systematically quantify how different hypothetical degrees of violation (to be deﬁned) affect the conclusions, and to judge whether expert knowledge can rule out problematic values.

The problem of sensitivity analysis has been studied throughout the sciences, ranging from statistics (Rosenbaum & Rubin, 1983; Small, 2007; Rosenbaum, 2010; Cinelli & Hazlett, 2018; Franks et al., 2019) to epidemiology (Brumback et al., 2004; Vanderweele & Arah, 2011; Ding & Vander Weele, 2016; Arah, 2017), sociology (Frank, 2000), psychology (Mauro, 1990), political science (Imai et al., 2010; Blackwell, 2013), and economics (Leamer, 1983; Imbens, 2003; Oster, 2017; Masten & Poirier, 2018). Notwithstanding all this attention, the current literature is still limited to speciﬁc models and solved on a case-by-case basis. Considering the ubiquity of causal questions in the sciences and artiﬁcial intelligence, a formal, algorithmic framework to deal with violations of causal assumptions is needed.

Causal modeling requires a formal language where the characterization of the data generating process can be encoded explicitly. Structural Causal Models (Pearl, 2000) provide such a language and, in many ﬁelds, including machine learning, the health and social sciences, linearity is a popular modeling choice. In this paper, we focus on the sensitivity analysis of linear acyclic semi-Markovian SCMs. We allow violations of exclusion and independence restrictions, such as (i) the absence or presence of unobserved common causes; and, (ii) the absence, presence or reversal of direct causal effects. Our contributions are the following:

1. We introduce a formal, algorithmic approach for sensitivity analysis in linear SCMs and show it can be reduced to a problem of identiﬁcation with non-zero constraints, i.e, identiﬁcation when certain parameter values are ﬁxed to a known, but non-zero, number.

2. We develop a novel graphical procedure, called PUSHFORWARD, that reduces identiﬁcation with a known error covariance to vanilla identiﬁcation, for which a plethora of algorithms are available.

3. We develop an efﬁcient graph-based constrained identiﬁcation algorithm that takes as input a set of sensitivity parameters and returns a sensitivity curve for the effect estimate. The algorithm is theoretically sound and

experimental results corroborate its generality, showing canonical sensitivity analysis examples are a small subset of the cases solved by our proposal.

This paper is structured as follows. Section 2 reviews basic terminology and deﬁnitions that will be used throughout the text. Section 3 shows how sensitivity analysis in the context of linear SCMs can be reduced to a constrained identiﬁcation problem. In Section 4 we develop a novel approach that allows researchers to systematically incorporate constraints on error covariances of linear SCMs. Section 5 utilizes these results to construct a constrained identiﬁcation algorithm for deriving sensitivity curves. Finally, Section 6 presents experimental results to evaluate our proposals.

2. Preliminaries

In this paper, we use the language of structural causal models as our basic semantic framework (Pearl, 2000). In particular, we consider linear semi-Markovian SCMs, consisting of a set of equations of the form V = ΛV +U, where V represent the endogenous variables, U the exogenous variables, and Λ a matrix containing the structural coefﬁcients representing both the strength of causal relationships and lack of direct causation among variables (when λij = 0). The exogenous variables are usually assumed to be multivariate Gaussian with covariance matrix E, encoding independence between error terms (when ϵij = 0).2 We focus on acyclic models, where Λ can be arranged to be lower triangular.

The covariance matrix Σ of the endogenous variables induced by model M is given by Σ = (I Λ) 1E(I Λ) . Without loss of generality, we assume model variables have been standardized to unit variance. For any three variables x, y and z, we denote σyx to be the covariance of x and y, σyx.z to be the partial covariance of y and x given z, and Ryx.z the regression coefﬁcient of y on x adjusting for z. Causal quantities of interest in a linear SCM are usually entries of Λ (or functions of those entries), and identiﬁability reduces to checking whether they can be uniquely computed from the observed covariance matrix Σ.

Causal graphs provide a parsimonious encoding of some of the substantive assumptions of a linear SCM. The causal graph (or the path diagram) of model M is a graph G = (V, D, B), where V denotes the vertices (endogenous variables), D the set of directed edges (non-zero entries of Λ) and B the set of bidirected edges (non-zero entries of E). Missing directed edges represent exclusion restrictions a variable is not a direct cause of the other. Missing bidirected edges denote independence restrictions, representing the fact that no latent common causes exist between two observed variables. When clear from context, we may

2Gaussianity is not necessary for the results of the paper.

Sensitivity Analysis of Linear Structural Causal Models

treat model coefﬁcients and their corresponding edges on the graph interchangeably. We use standard graph notation, where Pa(y) denotes the parents, Ch(y) the children, Anc(y) the ancestors, and De(y) the descendants of node y.

3. Sensitivity analysis and identiﬁcation

In this section we demonstrate the pervasiveness of identiﬁcation problems in sensitivity analysis in the context of a simple example. Suppose a scientist hypothesizes model GO shown in Fig. 1a with the goal of estimating the direct effect of a treatment x on an outcome y (structural coefﬁcient λxy). By the single-door criterion (Pearl 2000), she veriﬁes λxy is identiﬁable in GO and equal to the regression estimand Ryx.z, licensing her to proceed with estimation.

Another investigator, however, is suspicious of the bold assumption that no common causes (confounders) exist between z and x in GO. She goes on, therefore, and constructs an alternative model GA (Fig. 1b) such that the bidirected edge z x is included to account for that possibility. A question now naturally arises: how wrong could one be using Ryx.z to estimate λxy if the true causal model were given by graph GA? Answering this question requires deﬁning a measure of wrongness of the estimand, and perhaps the simplest such measure is its bias in the additive scale.3

Deﬁnition 1 (Bias of ES with respect to Q). Let Q be a computable quantity given a fully speciﬁed linear structural causal model, and let ES be any estimand (a functional of the covariance matrix Σ). The bias of ES with respect to Q is the difference between the two quantities, B = ES Q.

In our example, the proposed estimand is ES = Ryx.z, the target quantity is Q = λxy, and to compute the bias, B = Ryx.z λxy, one needs to identify λxy. Computing the bias, thus, entails an identiﬁcation problem (Prop. 1).

Proposition 1. The bias of estimand ES with respect to target quantity Q is identiﬁable iff Q is identiﬁable.

In GA, however, the presence of the bidirected edge x z renders λxy unidentiﬁable, and computation of B is not possible. How could one circumvent this impediment?

As in Cornﬁeld et al. (1959), the impossibility of computing the exact bias of Ryx.z with respect to λxy calls for another strategy expressing the bias as a function of the strength of the omitted confounders. In this way, the analyst can predict for any hypothetical strength of the confounders whether it would be enough to change the research conclusions. This allows the analyst to bring new substantive knowledge to bear, by submitting these quantitative results to a judgment of plausibility and ruling out some scenarios.

3Note this refers to the bias of an estimand (not an estimator), and it is the difference between the proposed estimand and the desired (causal) target quantity in the population.

(a) Model GO

(b) Model GA

(c) Model GB

Figure 1: Original model GO and two alternative models, GA and GB. In GA any of the remaining parameters (λzx, ϵzx or ϵzy) can be used as a sensitivity parameter for λxy, whereas GB rules out ϵzx as a sensitivity parameter. Adding a bidirected edge x y in GA does not prevent ϵzy from being a valid sensitivity parameter, whereas in GB it does.

Implementing this idea requires a precise deﬁnition of how to measure the strength of the omitted confounders. In our example, a possible candidate for measuring such strength is the structural parameter ϵzx of the added bidirected edge z x. The task then becomes: (i) to determine whether knowledge of ϵzx allows the identiﬁcation of λxy; and, (ii) if so, to ﬁnd a parameterized estimand for λxy in terms of ϵzx. This 2-step procedure can be seen as an identiﬁcation problem with non-zero constraints (Def. 2).4

Deﬁnition 2 (θ-identiﬁability). Let M be a linear SCM and θ a set of parameters of M with known (non-zero) values. A causal quantity Q is said to be θ-identiﬁable if Q is uniquely computable from Σ and θ.

We call any functional of Σ and θ, which identiﬁes Q, a θ-speciﬁc estimand (or sensitivity curve) for Q with respect to sensitivity parameters θ. These estimands are the workhorse for sensitivity analysis; they allow us to investigate how strong certain relationships must be (as parameterized by θ) in order to induce signiﬁcant bias in our estimates. In other words, identifying a bias function in terms of θ (and the observed data) for sensitivity analysis is equivalent to the constrained identiﬁcation problem of Def. 2 (Prop. 2).

Proposition 2. The bias of ES with respect to Q can be expressed as a function of θ (and Σ) iff Q is θ-identiﬁable.

Going back to GA, it is indeed possible to construct an ϵzx-speciﬁc estimand for λxy (see Sec. 4):

λxy(ϵzx) = σxy (σzx ϵzx)σyz

1 (σzx ϵzx)σzx (1)

Eq. 1 allows one to compute the bias of Ryx.z with respect to the target quantity λxy, for any given hypothetical value of ϵzx, if the true model were given by GA. Similarly, it allows one to determine how strong the unobserved confounder would need to be (as parameterized by ϵzx) such that the as-

4Note the relationship to z-ID (Bareinboim & Pearl, 2012), in which case constraints are imposed on experimental distributions in the non-parametric setting.

Sensitivity Analysis of Linear Structural Causal Models

sociation Ryx.z is completely explained by the unobserved confounder (i.e., the value of ϵzx such that λxy(ϵzx) = 0).

Still, what if the analyst has no knowledge to plausibly bound the strength of ϵzx? Even though the violation introduced in model GA was the addition of the bidirected edge x z, corresponding to ϵzx, there is no reason to limit our attention to that parameter, and any θ-speciﬁc estimand could be used for sensitivity analysis. In fact, the two remaining parameters of the model also yield valid θ-speciﬁc estimands (Sec. 5 provides an algorithmic solution),

λxy(λzx) = σxy λzxσyz

1 λzxσzx (2)

λxy(ϵzy) = σzy ϵzy

Having a diverse option of sensitivity curves is important, because sensitivity analysis relies on plausibility judgments. One could argue, for instance, that assessing the plausibility of ϵzx could be hard because it involves judging the effect of confounders of unknown cardinality, and perhaps, previous studies give plausible bounds on the direct causal effect of z on x (i.e., λzx), making a λzx-speciﬁc estimand more attractive. Regardless of the speciﬁc scenario, it is clear that the choice of sensitivity parameters should be guided by the availability of substantive knowledge.

Remarkably, several subtleties arise when deriving θ-speciﬁc estimands, even in simple models with three variables. For instance, a natural approach for tackling the problem in our example could be the re-expression of Ryx.z in terms of the covariance matrix implied by GA, yielding,

Ryx.z = λxy (σzx λzx)ϵzy

1 σ2zx = λxy ϵzxϵzy

One may surmise upon the examination of such expression that two sensitivity parameters are needed. As shown in Eqs. 1 to 3, this conclusion would be misleading.

These subtleties also appear when solving several variations of a model. Imagine the alternative model is now GB, instead of GA, as shown in Figure 1c. Is ϵzx an admissible sensitivity parameter in this case? Is the ϵzy-speciﬁc estimand derived in GA still valid if the model were GB? If we include another violation in both models, a bidirected arrow x y, would the previously obtained ϵzy-speciﬁc estimands still be valid? Despite the apparent similarity of both models, the answers to these questions reveal their sensitivity curves behave quite differently. The tools developed in this paper not only provide an algorithmic solution to these questions, but also allow researchers to swiftly answer them by simple inspection of the graph.

The above examples demonstrate several of the identiﬁcation problems entailed by a sensitivity analysis. If in small models these tasks are already complex, once we move to

models with more than three or four variables, an informal, case-by-case approach to sensitivity analysis is simply infeasible. Therefore, we need a formal framework and efﬁcient algorithms to incorporate constraints in linear SCMs.

4. Incorporating constraints in linear SCMs

Existing methods for identiﬁcation in linear SCMs, such as the QID algorithm from Chen et al. (2017), are able to incorporate constraints on directed edges and can be used to derive sensitivity curves such as the λzx-speciﬁc estimand of Eq. 2. The QID algorithm exploits a known edge λab by creating an auxiliary variable (AV) b = b λaba (Chen et al., 2016). Subtracting out the direct effect of a on b in this way may help with the identiﬁcation of other coefﬁcients in the model. For instance, the λzx-speciﬁc estimand can be computed using AVs in the following way: (i) create x = x λzxz; (ii) use x as an instrument for λxy, resulting in λxy(λzx) = σyx /σxx = (σxy λzxσyz)/(1 λzxσzx).5

However, neither the ϵzx-speciﬁc nor the ϵzy-speciﬁc estimands can be derived using QID; in fact, there is no current identiﬁcation algorithm that offers a principled and efﬁcient way to exploit knowledge of bidirected edges.6 As this is critical for the derivation of sensitivity curves (see Sec. 6), one of the core contributions of this work is the development of a novel graphical procedure that allows one to systematically incorporate constraints on error covariances.

Conventional linear SCMs already impose one type of constraint on error covariances: a lack of a bidirected edge between two variables a and b encodes the assumption that the structural parameter ϵab is zero. The identiﬁcation problem imposed by sensitivity analysis, nonetheless, sets a different type of constraint the error covariance ϵab is ﬁxed to a known but non-zero number. The essence of our method is to represent this knowledge in the graph.

Considering a graph G, covariance matrix Σ, and a known error covariance ϵab, our strategy consists of performing a manageable transformation of G such that the bidirected edge a b is removed from the graph. By manageable we mean the implied covariance matrix Σ of the transformed graph G can still be derived from Σ and the known value ϵab; otherwise, we would have no connection between G and the data, making inference in G impossible. Once this graphical transformation is applied, we can exploit any

5The QID algorithm extends generalized instrumental sets (Brito & Pearl, 2002) using a bootstrapping procedure whereby complex models can be identiﬁed by iteratively identifying coefﬁcients and using them to generate new auxiliary variables. It takes as inputs a graph G, covariance matrix Σ and known directed edges D, and it returns the new set of identiﬁed directed edges. 6Methods from computer algebra offer a complete solution but are computationally intractable. See Sec. 6 and Supp. Materials.

Sensitivity Analysis of Linear Structural Causal Models

existing graphical identiﬁcation method on the modiﬁed model G , and solutions in G can be transfered back to solutions in the original model G. In short, we manipulate the graph to reduce an identiﬁcation problem with a non-zero constraint to a standard one.

The easiest way to introduce our method, which we call PUSHFORWARD, is via an example. Consider again graph GA in Fig. 1b, and assume ϵzy is known. Path-tracing (Wright, 1921) results in the following covariances, where the known parameter ϵzy is highlighted in red,

σzx = λzx + ϵzx (5)

σzy = λzxλxy + ϵzxλxy + ϵzy (6)

σxy = λxy + λzxϵzy (7)

Ideally, we could create an alternative model G A where the bidirected edge z y is fully removed from the graph. For this to be useful, we need to be able to express the new implied covariance matrix Σ A in terms of the original covariance matrix ΣA and the known error covariance ϵzy. While expressing σ zy in terms of ΣA and ϵzy is straightforward (since, trivially, σ zy = σzy ϵzy), it is not immediately clear how to write σ xy = σxy λzxϵzy = λxy in terms of ΣA and ϵzy, for this requires identifying either λxy or λzx in the original model.

Thus, rather than fully removing z y, we push it forward to the children of z, as shown in graph G A of Fig. 2b. Note the bidirected edge is moved from being between z and y to being between x (a child of z) and y, with new structural parameter ϵ zy = λzxϵzy. Path-tracing of G A shows its implied covariance matrix Σ A is exactly the same as ΣA, except for σ zy, which can be obtained by subtracting ϵzy from σzy,

σ zx := σzx =λzx + ϵzx (8)

σ zy := σzy ϵzy =λzxλxy + λxyϵzx (9)

σ xy := σxy =λxy + λzxϵzy (10)

Since G A has the same structural coefﬁcients as G and we know how to compute the covariance matrix induced by G A from the known values Σ and ϵzy, we can use G A to identify the coefﬁcients in our original model. In this case, z is an instrument for λxy in G A, resulting in the estimand λxy(ϵzy) = σ zy/σ zx = (σzy ϵzy)/σzx of Eq. 3.

Applying the same logic to graph GB in Fig. 1c, assume ϵzy is known. Since z has no other descendants except y, pushing forward ϵzy simply removes the bidirected edge z y. This results in the modiﬁed graph G B of Fig. 2d with the amortized covariance of z and y, σ zy = σzy ϵzy. Note ϵzy enters in no other covariances of the system. The graph G B renders z single-door admissible for the identiﬁcation of λxy, giving us the estimand λxy(ϵzy) = R yx.z = (σyx σxz(σzy ϵzy))/(1 σ2 xz).

λxy ϵ xy = λzxϵzy

Figure 2: Pushing forward ϵzy in GA renders z a valid instrument in G A. Pushing forward ϵzy in GB renders z single-door admissible in G B.

This simple graphical manipulation also makes it clear why adding a bidirected edge x y as a further violation in the original graphs GA and GB has different consequences for the identiﬁcation of λxy. In G A, z still remains a valid instrument even if the original graph had x y; this would only change the value of the structural coefﬁcient ϵ xy, which would now read ϵ xy = ϵxy + λzxϵzy. In G B, however, adding x y renders z inadmissible for single-door identiﬁcation of λxy, since this backdoor path cannot be blocked.

Sometimes it might be necessary to prune variables from G to guarantee Σ is computable. Consider again GA and assume ϵzx is known. Pushing forward ϵzx results in Fig. 3b where, as before, we know σ zx = σzx ϵzx. However, path-tracing of Fig. 3b shows the covariance of z with y would also need adjustment, σ zy = σzy λxyϵzx. Thus, we have two cases: (i) if λxy is known, the adjustment is feasible and we are done; (ii) if λxy is not (yet) known, the adjustment cannot be made; but, since y is a leaf node, it can be pruned from G (Tian & Pearl, 2003), avoiding this problem (Fig. 3c). In this case, note the pruned graph is still helpful now λzx can be identiﬁed. As previously discussed, knowledge of λzx permits identiﬁcation of λxy using AVs, giving us the ϵzx-speciﬁc estimand of Eq. 1.

(b) Push forward ϵzx

(c) Prune y

Figure 3: Pushing forward ϵzx in GA requires adjusting σyz. If the adjustment is possible, y is kept in the graph as in Fig. 3b; if not, y is marginalized (pruned) as in Fig. 3c.

The graphical manipulation of PUSHFORWARD is general, and can be performed whenever we have knowledge of a known error covariance. Theorem 1 formalizes the procedure to arbitrary models. Given any bidirected edge x y with known value ϵxy, we remove it from the graph and register the new amortized covariance σ xy = σxy ϵxy. Next we repair the covariances of the descendants of x with y by,

Sensitivity Analysis of Linear Structural Causal Models

for every c Ch(x), adding (or modifying) the bidirected edge c y with the direct causal effect λxc times ϵxy. Finally, for any descendant z of y, we either (i) amortize its covariance with x, if all edges that compose the total causal effect δyz of y on its descendant z are known, or (ii) marginalize z out by pruning the graph. The ﬁnal output is a modiﬁed model G , Σ where any graphical identiﬁcation method can be applied; and, estimands in terms of Σ

can be converted back to estimands in terms of Σ and ϵxy.

Theorem 1 (PUSHFORWARD). Given a linear SCM with graph G, covariance matrix Σ, a set of known directed edges D, and known bidirected edge ϵxy, let the pair G , Σ be constructed from G and Σ as follows:

1. x y is removed and σ xy = σxy ϵxy;

2. c Ch(x), c = y, the bidirected edges c y are added if they do not exist, and ϵ cy = ϵcy + λcyϵxy;

3. z De(y), z = x, if there is an edge on any directed path from y to z that is not in D, then z is removed from G . For the remaining z, σ xz = σxz ϵxyδyz, where δyz is the sum of all directed paths from y to z;

4. All other parameters and covariances remain the same.

Then, if λab is identiﬁable in G , it is (ϵxy, D)-identiﬁable in G.

We denote by PF(G, Σ, D, ϵxy, x) the function that returns the modiﬁed model G , Σ as per Theorem 1. Pseudocode for PF (which closely follows the steps of the theorem) as well as the proof can be found in the supplementary material.

5. Algorithmic derivation of sensitivity curves

In this section, we construct a graph-based constrained identiﬁcation algorithm for linear SCMs which systematically exploits knowledge of both path coefﬁcients and error covariances efﬁciently. Our algorithm relies on the PUSHFORWARD method to incorporate constraints on bidirected edges, and on the AV technique (via the QID algorithm) to incorporate constraints on directed edges. This allows the algorithmic derivation of sensitivity curves for a target query λxy in arbitrary linear models, with an arbitrary set of directed and bidirected edges as sensitivity parameters.

Although the graphical modiﬁcation of PUSHFORWARD is deﬁned for one bidirected edge, the modiﬁed graph G

is a valid model in which any graphical operation can be performed. We can thus extend PUSHFORWARD to handle multiple bidirected edges by iteratively applying it whenever a bidirected edge of the modiﬁed graph is still known what remains to be decided is the order in which to perform these operations. Note that testing all possible orders of graphical

Figure 4: PF multiple edges in topological order.

manipulations can result in an algorithm with exponential computational complexity, even when initially pushing forward a single bidirected edge ϵxy. This happens because new bidirected edges are created for each c Ch(x) and, if all the λcx are identiﬁable, all subsets of those bidirected edges may be eligible to be pushed forward again. Thus, here we propose an efﬁcient procedure using topological ordering, which performed as well as a brute-force approach in our computational experiments (Sec. 6).

Consider the example given in Fig. 4a. The task is to decide whether θ = (ϵxz, ϵxy, ϵzy) (in red) is an admissible set of sensitivity parameters for the target coefﬁcient λxy (in blue) and, if so, to ﬁnd the corresponding sensitivity curve. Our strategy consists of, for each node v, listing its ancestors a An(v), and, in topological order, iteratively push forward ϵav if it is still known in the modiﬁed graph. By performing operations in this way, we are guaranteed to visit each ancestor of v only once. Starting with node v = z, it has only one ancestor x and a single known bidirected edge to be removed, ϵxz. This can be handled with a onestep PUSHFORWARD operation (pruning y), resulting in the modiﬁed graph G z of Fig. 4b, in which λxz can be trivially identiﬁed. Next, return to the original graph and consider v = y, with ancestors x and z. Following a topological order, we ﬁrst push forward ϵxy, giving us the modiﬁed graph G y of Fig. 4c with new bidirected edge ϵ zy = ϵzy + λxzϵxy. Note all components of ϵ zy are known, we can thus push forward ϵ zy in G y, obtaining the graph G y in Fig. 4d, in which λxy is identiﬁed with sensitivity curve R yx.z.

(a) Original Model

(b) Push forward ϵxw

(c) Push forward ϵyw

Figure 5: Instruments with ancestors and descendants.

In the previous example we demonstrated how to systematically deal with bidirected edges connected to ancestors

Sensitivity Analysis of Linear Structural Causal Models

Algorithm 1 CID(G, Σ, D, B)

1: initialize VB Vertices(B) 2: repeat 3: D D QID(G, Σ, D) 4: for each v VB do 5: G , Σ G, Σ 6: for each a An(v) in topological order do 7: if ϵ av is known then 8: G , Σ PF(G , Σ , D, ϵ av, a) 9: D D QID(G , Σ , D) 10: end if 11: end for 12: end for 13: for each ϵab B do 14: G , Σ PF(G, Σ, D, ϵab, a) 15: D D QID(G , Σ , D) 16: end for 17: until all directed edges have been identiﬁed or no edge has been identiﬁed in the last iteration

of a node v; however, in linear models, descendants of v can also help with the identiﬁcation of direct causal effects λav. Consider, for instance, Fig. 5a. The task is to ﬁnd a sensitivity curve for λxy in terms of θ = (ϵxw, ϵyw). Start with node w and, as before, push forward ϵxw as in Fig. 5b. Here, λzw can be identiﬁed with x as an instrument. Returning to the original graph, now consider node y and push forward the bidirected edge ϵyw with its descendant w, as in Fig. 5c. Since λzw has been identiﬁed, we can create the AV w = w λzwz which is a valid instrument for λxy.

These two cases illustrate our general procedure for handling multiple bidirected edges, which in combination with the QID algorithm forms our constrained identiﬁcation algorithm CID, provided in Algorithm 1. Lines 4 to 12 perform PUSHFORWARD (PF) in topological ordering, each time applying QID in the modiﬁed model to verify if new directed edges can be identiﬁed; lines 13 to 16 perform a single PUSHFORWARD operation on each bidirected edge, which may free descendants to be used as instruments as in Fig. 5. Since new identiﬁed edges can help both PUSHFORWARD as well as QID, this process is repeated until all or no new directed edges are identiﬁed in the last iteration. The complexity of CID is dominated by QID, which is polynomial if the degree of each node is bounded (Chen et al., 2017).

An interesting 4-node example is shown in Fig. 6a, where ϵzw, a parameter neither related to x nor y, is an admissible sensitivity parameter for λxy! Our algorithm derives an ϵzw-speciﬁc estimand for λxy as follows. It ﬁrst pushes forward ϵzw, and runs QID in the modiﬁed graph, resulting in the identiﬁcation of λzw. Next, the algorithm returns to the original graph, and runs QID, which uses λzw to create the auxiliary variable w = w λzwz, enabling

(a) λxy is ϵzw-identiﬁable

(b) Sensitivity of λxy in terms of ϵzw

Figure 6: In Fig 6a note that, although not connected to x nor y, ϵzw is an admissible sensitivity parameter for λxy. Fig. 6b shows the sensitivity curve of λxy in terms of ϵzw for a numerical simulation of the model in Fig. 6a.

the identiﬁcation of λzx. Finally, still within QID, λxy is obtained using the auxiliary variable x = x λzxz.

As discussed in Sec.3, the utility of θ-speciﬁc estimands is to show how sensitive the target quantity of interest is to different hypothetical values of the sensitivity parameters θ. These results can then be submitted to quantitative plausibility judgments, for instance, in the form of θ Θp, where Θp is a plausibility region. To illustrate how one could deploy this in practice, we provide a numerical example of the causal model in Fig. 6a. Our goal is to assess how different hypothetical values for ϵzw affects inference of λxy. In a real context, this needs to be estimated from ﬁnite samples, and here we use a maximum likelihood estimator. Fig. 6b shows the estimates for λxy (blue) for different values of the sensitivity parameter ϵzw, along with the corresponding 95% conﬁdence interval (gray). If, for instance, we can plausibly bound ϵzw to be within 0.1 to 0.3, the plot reveals λxy can be safely judged to be within -0.2 to -0.6.

6. Computational experiments

The identiﬁcation problem in linear systems has not yet been efﬁciently solved. Although there exists a complete solution using computer algebra (García-Puente et al., 2010), these methods are computationally intractable, making it impractical for graphs larger than 4 or 5 nodes. Since we rely on existing identiﬁcation algorithms that are polynomial but not complete (i.e., QID cannot ﬁnd all identiﬁable parameters), we cannot expect the CID algorithm to ﬁnd all sensitivity curves as well. In this section, we report the results of an extensive set of experiments aimed to empirically verify the generality of our approach. We have performed an exhaustive study of all possible queries in 3 and 4-node models, which are essentially the largest instances computer

Sensitivity Analysis of Linear Structural Causal Models

3-NODE MODELS 4-NODE MODELS

ID ALGORITHM Directed Bidirected Both Directed Bidirected Both

QID (AVs only) 19(100%) (0%) 68 (21%) 14,952(95%) (0%) 170,304(29%) CID (AVs + PF) 19(100%) 109(100%) 320(100%) 14,952(95%) 50,708(97%) 555,758(96%)

GROUND TRUTH 19 109 320 15,740 52,016 578,858

Table 1: Number of θ-identiﬁable sensitivity queries (only when θ = ) per type of sensitivity parameters θ.

algebra methods can solve through brute force.7

A query consists of determining whether in model G, a target parameter λxy is θ-identiﬁable given a set of sensitivity parameters θ. For 3-node models, we have 50 connected graphs with 720 possible queries; for 4-node models, we have 3,745 connected graphs and 1,059,156 possible queries.8 For each query, we used algebraic methods to determine ground-truth identiﬁcation and checked it against the results of both QID and CID. Our interest lies in the queries that are θ-identiﬁable only when θ = .

The results are given in Table 1, where columns restrict sensitivity parameters θ to be: (i) subsets of directed edges; (ii) subsets of bidirected edges; and, (iii) subsets of both directed and bidirected edges. The results show that our CID algorithm correctly identiﬁes all possible sensitivity curves for 3-node models. Among 4-node models, our method solves 96% of all identiﬁable sensitivity queries.

These numbers reveal that, in the context of linear SCMs, canonical sensitivity analysis examples which have been addressed on a case-by-case basis in the literature (e.g., Fig.7, target coefﬁcient in blue and sensitivity parameters in red), are only a small subset of all possible sensitivity analyses exercises enabled by our proposal.When comparing CID s results to those of QID only, it is also clear that systematically incorporating constraints on bidirected edges is essential for obtaining sensitivity curves.

A valid concern regarding CID s current implementation is that the proposed topological ordering for processing bidirected edges could be less capable than a general search over all possible valid graphical manipulations. With this in mind, we performed a thorough comparison of our proposal against other ordering methods for all queries in 3 and 4-node models. Topological ordering proved to perform as well as a brute-force search that recursively tests all possible subsets of bidirected edges that can be pushed forward.

Finally, the incompleteness of CID can stem from two sources: limitations of the graphical manipulations per-

7We use Gröbner bases, which has a doubly-exponential comp. complexity (Bardet, 2002). See Supp. Materials for details. 8For 5-node models, these numbers reach 1 million graphs and 11 billion queries. Ground-truth computations in 5-node models using computer algebra can take hours for a single graph.

(a) Backdoor

(c) Mediation

Figure 7: Canonical sensitivity analyses: (a) backdoor violation with unobserved confounders independent of observed confounders (Carnegie et al., 2016); (b) putative instrumental variable, where both the exclusion and independence restriction are suspected to be violated (Wang et al., 2018); (c) randomized trial in which treatment x has side-effect m, and unobserved mediation-outcome confounding cannot be ruled out (Vander Weele, 2010). For linear SCMs, these are special cases of all queries solved by our approach.

formed by PUSHFORWARD and the incompleteness of the identiﬁcation algorithm for directed edges, QID. Separating the two can help guide efforts for future research. To achieve that, we used algebraic methods to simulate how CID would have performed if it had access to a complete identiﬁcation algorithm for directed edges instead of QID. We found that CID would have identiﬁed over 99.99% of 4-node sensitivity queries. This seem to suggest that: (i) the main bottleneck of CID is QID; and (ii) PUSHFORWARD with topological ordering can reap the beneﬁts of improved identiﬁcation algorithms for directed edges.

7. Conclusion

We introduced a general algorithmic framework of sensitivity analysis for linear SCMs. We reduced sensitivity analysis to a constrained identiﬁcation problem and developed a novel graphical procedure to systematically incorporate constraints on bidirected edges. We then devised an efﬁcient graph-based algorithm for deriving sensitivity curves. Exhaustive experiments corroborated the generality of our proposal. Such systematic tools can help analysts better navigate in the model space and understand the trade-off between the plausibility of assumptions and the strength of conclusions. Extensions to other types of violations and to nonlinear models are promising directions for future work.

Sensitivity Analysis of Linear Structural Causal Models

Acknowledgements

The authors thank anonymous reviewers for helpful comments. Cinelli and Pearl are supported in parts by grants from IBM Research, Defense Advanced Research Projects Agency [W911NF-16-057], National Science Foundation [IIS-1302448, IIS-1527490, and IIS-1704932], and Ofﬁce of Naval Research [N00014-17-S-B001]. Bareinboim and Kumor are supported in parts by grants from NSF IIS1704352, and IIS-1750807 (CAREER), IBM Research, and Adobe Research. Most of the work by Chen was conducted while at IBM Research AI.

Arah, O. A. Bias analysis for uncontrolled confounding in the health sciences. Annual review of public health, 38: 23 38, 2017.

Bardet, M. On the complexity of a grobner basis algorithm. pp. 8, 2002.

Bareinboim, E. and Pearl, J. Causal inference by surrogate experiments: z-identiﬁability. ar Xiv preprint ar Xiv:1210.4842, 2012.

Bareinboim, E. and Pearl, J. Causal inference and the datafusion problem. Proceedings of the National Academy of Sciences, 113(27):7345 7352, 2016.

Blackwell, M. A selection bias approach to sensitivity analysis for causal effects. Political Analysis, 22(2):169 182, 2013.

Brito, C. and Pearl, J. Generalized instrumental variables. In Proceedings of the Eighteenth conference on Uncertainty in artiﬁcial intelligence, pp. 85 93. Morgan Kaufmann Publishers Inc., 2002.

Brumback, B. A., Hernán, M. A., Haneuse, S. J., and Robins, J. M. Sensitivity analyses for unmeasured confounding assuming a marginal structural model for repeated measures. Statistics in medicine, 23(5):749 767, 2004.

Carnegie, N. B., Harada, M., and Hill, J. L. Assessing sensitivity to unmeasured confounding using a simulated potential confounder. Journal of Research on Educational Effectiveness, 9(3):395 420, 2016.

Chen, B., Pearl, J., and Bareinboim, E. Incorporating knowledge into structural equation models using auxiliary variables. In Proceedings of the Twenty-Fifth International Joint Conference on Artiﬁcial Intelligence, IJCAI 2016, pp. 3577 3583, 2016.

Chen, B., Kumor, D., and Bareinboim, E. Identiﬁcation and model testing in linear structural equation models using auxiliary variables. In International Conference on Machine Learning, pp. 757 766, 2017.

Cinelli, C. and Hazlett, C. Making sense of sensitivity: Extending omitted variable bias, 2018.

Cornﬁeld, J., Haenszel, W., Hammond, E. C., Lilienfeld, A. M., Shimkin, M. B., and Wynder, E. L. Smoking and lung cancer: recent evidence and a discussion of some questions. journal of National Cancer Institute, (23): 173 203, 1959.

Cox, D., Little, J., and O shea, D. Ideals, Varieties, and Algorithms, volume 3. Springer, 1992.

Ding, P. and Vander Weele, T. J. Sensitivity analysis without assumptions. Epidemiology (Cambridge, Mass.), 27(3): 368, 2016.

Fisher, R. Cigarettes, cancer, and statistics. The Centennial Review of Arts & Science, 2:151 166, 1958.

Fisher, R. A. Dangers of cigarette-smoking. British Medical Journal, 2(5039):297, 1957.

Foygel, R., Draisma, J., and Drton, M. Half-trek criterion for generic identiﬁability of linear structural equation models. The Annals of Statistics, pp. 1682 1713, 2012.

Frank, K. A. Impact of a confounding variable on a regression coefﬁcient. Sociological Methods & Research, 29 (2):147 194, 2000.

Franks, A., D Amour, A., and Feller, A. Flexible sensitivity analysis for observational studies without observable implications. Journal of the American Statistical Association, (just-accepted):1 38, 2019.

García-Puente, L. D., Spielvogel, S., and Sullivant, S. Identifying causal effects with computer algebra. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artiﬁcial Intelligence (UAI). AUAI Press, 2010.

Gifﬁn, R. B., Lebovitz, Y., English, R. A., et al. Transforming clinical research in the United States: challenges and opportunities: workshop summary. National Academies Press, 2010.

Imai, K., Keele, L., and Yamamoto, T. Identiﬁcation, inference and sensitivity analysis for causal mediation effects. Statistical science, pp. 51 71, 2010.

Imbens, G. W. Sensitivity to exogeneity assumptions in program evaluation. The American Economic Review, 93 (2):126 132, 2003.

Joffe, M. M., Yang, W. P., and Feldman, H. I. Selective ignorability assumptions in causal inference. The International Journal of Biostatistics, 6(2), 2010.

Leamer, E. Let s take the con out of econometrics. The American Economic Review, 73(1):31 43, 1983.

Sensitivity Analysis of Linear Structural Causal Models

Masten, M. A. and Poirier, A. Identiﬁcation of treatment effects under conditional partial independence. Econometrica, 86(1):317 351, 2018.

Mauro, R. Understanding love (left out variables error): A method for estimating the effects of omitted variables. Psychological Bulletin, 108(2):314, 1990.

Oster, E. Unobservable selection and coefﬁcient stability: Theory and evidence. Journal of Business & Economic Statistics, pp. 1 18, 2017.

Pearl, J. Causality. Cambridge university press, 2000.

Rosenbaum, P. R. Design of observational studies. Springer Series in Statistics, 2010.

Rosenbaum, P. R. and Rubin, D. B. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society. Series B (Methodological), pp. 212 218, 1983.

Small, D. S. Sensitivity analysis for instrumental variables regression with overidentifying restrictions. Journal of the American Statistical Association, 102(479):1049 1058, 2007.

Spirtes, P., Glymour, C. N., Scheines, R., Heckerman, D., Meek, C., Cooper, G., and Richardson, T. Causation, prediction, and search. MIT press, 2000.

The Sage Developers. Sage Math, the Sage Mathematics Software System (Version 8.5), 2018. https://www.sagemath.org.

Tian, J. and Pearl, J. On the identiﬁcation of causal effects. Technical Report R-290, Cognitive Systems Laboratory, UCLA, 2003.

Vander Weele, T. J. Bias formulas for sensitivity analysis for direct and indirect effects. Epidemiology (Cambridge, Mass.), 21(4):540, 2010.

Vanderweele, T. J. and Arah, O. A. Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders. Epidemiology (Cambridge, Mass.), 22(1):42 52, January 2011. ISSN 1531-5487.

Wang, X., Jiang, Y., Zhang, N. R., and Small, D. S. Sensitivity analysis and power for instrumental variable studies. Biometrics, 2018.

Wright, S. Correlation and causation. Journal of agricultural research, 20(7):557 585, 1921.