# Scalable Computation of Causal Bounds

Madhumitha Shridharan¹, Garud Iyengar¹

¹Department of Industrial Engineering and Operations Research, Columbia University, New York, USA. Correspondence to: Madhumitha Shridharan, Garud Iyengar.

Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022. Copyright 2022 by the author(s).

## Abstract

We consider the problem of computing bounds for causal inference problems with unobserved confounders, where identifiability does not hold. Existing non-parametric approaches for computing such bounds use linear programming (LP) formulations that quickly become intractable for existing solvers, because the size of the LP grows exponentially in the number of edges in the underlying causal graph. We show that this LP can be significantly pruned by carefully considering the structure of the causal query, allowing us to compute bounds for significantly larger causal inference problems than is possible using existing techniques. This pruning procedure also allows us to compute the bounds in closed form for a special class of causal graphs and queries, which includes a well-studied family of problems where multiple confounded treatments influence an outcome. We also propose a very efficient greedy heuristic that produces very high quality bounds, and scales to problems that are several orders of magnitude larger than those for which the pruned LP can be solved.

## 1. Introduction

In most real world applications of causal inference (Imbens & Rubin, 2015; Pearl, 2009) there exist variables that are critical to the identification of causal effects, but are either unknown or unmeasured, i.e. there are unobserved confounders. While it is impossible to precisely identify causal effects in the presence of unobserved confounders, it is possible to obtain bounds on the causal query. There have been multiple attempts to bound causal effects for small, special graphs. Evans (2012) bounds causal effects in the special case where any two observed variables are neither adjacent in the graph, nor share a latent parent. Richardson et al. (2014) bound the causal effect of a treatment on a parameter of interest by invoking additional (untestable) assumptions, and assess how inference about the treatment effect changes as these assumptions are varied. Kilbertus et al. (2020) and Zhang & Bareinboim (2021) develop algorithms to compute causal bounds for extensions of the instrumental variable model in a continuous setting. Geiger & Meek (2013) bound causal effects in a model under specific parametric assumptions. Finkelstein & Shpitser (2020) develop a method for obtaining bounds on causal parameters using rules of probability and restrictions on counterfactuals implied by causal graphs.

While fewer in number, there have also been attempts to bound causal effects in large general graphs. Poderini et al. (2020) propose techniques to compute bounds in special large graphs with multiple instruments and observed variables. Finkelstein et al. (2021) propose a method for partial identification in a class of measurement error models, and Duarte et al. (2021) propose a polynomial programming based approach to solve general causal inference problems, but their procedure is computationally intensive for large graphs. In this work, we extend the class of large graphs for which causal effects can be bounded.
In particular, we focus on a class of causal inference problems where causal bounds can be obtained using linear programming (LP) (Balke & Pearl, 1994; Zhang & Bareinboim, 2017; Pearl, 2009; Sjölander et al., 2014). Recently, Sachs et al. (2020; 2021) identified a large problem class for which LPs can be used to compute causal bounds, and developed an algorithm for formulating the objective function and the constraints of the corresponding LP. This problem class is a generalization of the instrumental variable setting, and is thus widely applicable. However, as we describe later, the size of the LP is exponential in the number of edges in the causal graph, and therefore the straightforward formulation of the LP can be tractably solved only for very small causal graphs. In this work, we show how to use the structure of the causal query and the underlying graph to significantly reduce the size of the LP, and as a consequence, significantly increase the size of the graphs for which the LP method remains tractable. Our main contributions are as follows:

(a) We show that an exponential number of variables in the LP can be aggregated to reduce the size of the problem without impacting the quality of the bound. See Section 3.1 for details. The reduction in size can be very substantial: compare $|R|$ with $|H|$ in Table 1 (details in Section 3.1).

(b) Although we show there exists a much smaller LP that can be solved to compute the bounds, we get a computational advantage only if this pruned LP can be constructed efficiently. Theorems 3.4 and 3.6 establish that one can construct the pruned LP without first constructing the original LP. These results critically leverage the structure of the LP corresponding to a causal inference problem. In particular, they leverage the fact that all possible functions mapping the parents $pa(V)$ to a variable $V$ are allowed. Note that, without this second result, the savings implied by the pruned LP cannot be realized. See Section 3.1 for details.

(c) Next, we show that the structural results that help us construct the pruned LP allow us to compute the bounds in closed form for a special class of causal inference problems. This class includes as a special case the problems considered by Wang & Blei (2021; 2019a), where multiple confounded treatments influence an outcome. Moreover, we are able to compute these bounds even when there are causal relationships between the treatments. See Section 3.2.1 for details.

(d) Finally, we propose a simple greedy heuristic for solving the pruned LPs. We show that this heuristic allows us to compute approximate bounds for much larger scale graphs with very minimal degradation in performance. See Section 4 for details.

Although we work with causal graphs with binary variables in this paper, generalizing our results to categorical variables is straightforward. The organization of the rest of this paper is as follows. In Section 2, we present the intuition behind our contributions using a running example. In Section 3 we introduce the formalism of Sachs et al. (2020; 2021). In Section 3.1 we introduce our main structural results for pruning the LPs. In Section 3.2 we show that the LP bounds can be computed in closed form for a large class of problems, and in Section 3.2.1 we give an example from this class. In Section 4 we introduce our greedy heuristic and benchmark its performance. Section 5 discusses possible extensions.
## 2. A Running Example

Consider the causal graph in Figure 1 with $X, Y, Z \in \{0, 1\}$. Suppose the data given is $p_{xy.z} = P(X = x, Y = y \mid Z = z)$ and the goal is to compute a lower bound for the causal query $Q = P(Y = 1 \mid do(X = 1))$. Here, $U$ is a potentially high dimensional unobserved confounder that is difficult to model directly.

Figure 1: Causal Graph for Query $Q$

Response function variables circumvent this difficulty by modelling the impact of $U$ on the relationships between the observed variables. For each fixed value of the unknown confounder $U$, the variable $X$ is a function of $Z$; thus, the confounder $U$ effectively selects one function from the set $F = \{f : f \text{ is a function from } Z \text{ to } X\}$. Similarly, $U$ selects one function from the set $G = \{g : g \text{ is a function from } X \text{ to } Y\}$. It is easy to see that $|F| = |G| = 4$, and therefore the elements can be indexed by $r = (r_X, r_Y) \in R = \{1, \ldots, 4\}^2$, i.e. $f_{r_X}$ denotes the $r_X$-th function from $F$ and $g_{r_Y}$ denotes the $r_Y$-th function from $G$. Thus, the response function variables $r_X$ and $r_Y$ can be used to model the impact of $U$, and $Z$ and $r \in R$ uniquely determine $X = f_{r_X}(Z)$ and $Y = g_{r_Y}(X) = g_{r_Y}(f_{r_X}(Z))$. Note that $|R|$ is exponential in the number of arcs in the causal graph. The unknown distribution over the high dimensional $U$ can be equivalently modeled via the joint distribution $q_{r_X r_Y} = P(r_X, r_Y)$. Let

$$R_{xy.z} = \{(r_X, r_Y) : f_{r_X}(z) = x,\ g_{r_Y}(x) = y\} \quad (1)$$

denote the set of $r$-values that map $z \mapsto (x, y)$. Hence, $P(X = x, Y = y \mid Z = z) = \sum_{(r_X, r_Y) \in R_{xy.z}} q_{r_X r_Y}$. Furthermore, let $R_Q$ denote the set of $r$-values consistent with the query $Q = P(Y = 1 \mid do(X = 1))$, i.e.

$$R_Q = \{(r_X, r_Y) : g_{r_Y}(1) = 1\}. \quad (2)$$

Hence, $P(Y = 1 \mid do(X = 1)) = \sum_{(r_X, r_Y) \in R_Q} q_{r_X r_Y}$. Then the following LP gives a lower bound on the query (Balke & Pearl, 1994; Sachs et al., 2020):

$$\begin{aligned} \min_q \quad & \textstyle\sum_{(r_X, r_Y) \in R_Q} q_{r_X r_Y} \\ \text{s.t.} \quad & \textstyle\sum_{(r_X, r_Y) \in R_{xy.z}} q_{r_X r_Y} = p_{xy.z}, \quad \forall (x, y, z), \\ & q \geq 0. \end{aligned} \quad (3)$$

Since $r \in R$ uniquely determines the value of $(X, Y)$ for any fixed value of $Z$, the collection $\{R_{xy.z} : (x, y) \in \{0, 1\}^2\}$ is a partition of the set $R$ for any fixed $z$. The constraints in LP (3) imply that $1 = \sum_{(x,y) \in \{0,1\}^2} p_{xy.z} = \sum_{(x,y) \in \{0,1\}^2} \sum_{(r_X, r_Y) \in R_{xy.z}} q_{r_X r_Y} = \sum_{(r_X, r_Y) \in R} q_{r_X r_Y}$. Therefore, we do not include the constraint $\sum_{(r_X, r_Y) \in R} q_{r_X r_Y} = 1$ in the LP.

Next, we show how to reduce the size of the LP (3) by aggregating variables. Let $h : Z \to (X, Y)$ denote any function $Z \mapsto (X, Y)$. We also refer to $h$ as a hyperarc, since it can be interpreted as an arc in a hypergraph. Let

$$R_h = \left\{(r_X, r_Y) \in R : \big(f_{r_X}(0),\ g_{r_Y}(f_{r_X}(0))\big) = h(0),\ \big(f_{r_X}(1),\ g_{r_Y}(f_{r_X}(1))\big) = h(1)\right\} \quad (4)$$

denote the set of $r$-values consistent with the hyperarc $h$. Then all $r \in R_h$ contribute to the same two constraints: $(x, y, z) = (h(0), 0)$ and $(x, y, z) = (h(1), 1)$. Therefore, since we have a minimization objective, we can set $q_r = 0$ for all $r \in R_h$ with objective coefficient $\mathbf{1}\{r \in R_Q\} > \min_{s \in R_h} \mathbf{1}\{s \in R_Q\}$, and aggregate all variables $q_r$ such that $r \in R_h$, i.e. we can reformulate the LP in terms of variables $q_h$ with the objective coefficient $c_h = \min_{r \in R_h} \mathbf{1}\{r \in R_Q\} = \mathbf{1}\{R_h \subseteq R_Q\}$. The causal graph structure implies that $R_h \neq \emptyset$ only for a subset of hyperarcs. Let $H$ denote the set of valid hyperarcs, for which $R_h \neq \emptyset$. Thus, the LP (3) can be reformulated as

$$\begin{aligned} \min_q \quad & \textstyle\sum_{h \in H} c_h q_h \\ \text{s.t.} \quad & \textstyle\sum_{\{h \in H : h(z) = (x, y)\}} q_h = p_{xy.z}, \quad \forall (x, y, z), \\ & q \geq 0, \end{aligned} \quad (5)$$

where $c_h = \mathbf{1}\{R_h \subseteq R_Q\}$. This reformulation has exponentially fewer variables; however, it is useful only if the set of valid hyperarcs $H$ and the corresponding costs $c_h$ can be efficiently computed.
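To make the baseline concrete, the following is a minimal sketch of LP (3) for this example, assuming SciPy is available. The encoding of the four functions as truth tables, the helper names (`FUNCS`, `maps_to`, `q_true`), and the synthetic input distribution are our own conventions, not the paper's; the hidden `q_true` is sampled only so that the input $p_{xy.z}$ is guaranteed to be realizable by the model.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# The 4 binary functions {0,1} -> {0,1}; FUNCS[r] = (f_r(0), f_r(1)).
FUNCS = [(0, 0), (0, 1), (1, 0), (1, 1)]
R = list(itertools.product(range(4), range(4)))   # all (r_X, r_Y) pairs

def maps_to(r, z):
    """The value (x, y) that response-function index r assigns to Z = z."""
    r_X, r_Y = r
    x = FUNCS[r_X][z]
    return x, FUNCS[r_Y][x]

# Objective: q_r has coefficient 1 iff r is in R_Q = {r : g_{r_Y}(1) = 1}.
c = np.array([1.0 if FUNCS[r_Y][1] == 1 else 0.0 for (r_X, r_Y) in R])

# Build a feasible input distribution p_{xy.z} from a hidden q, so the
# LP is guaranteed to be consistent with the response-function model.
rng = np.random.default_rng(0)
q_true = rng.dirichlet(np.ones(len(R)))
p = {z: np.zeros((2, 2)) for z in (0, 1)}
for r, q_r in zip(R, q_true):
    for z in (0, 1):
        x, y = maps_to(r, z)
        p[z][x, y] += q_r

# One equality constraint per (x, y, z): sum over R_{xy.z} of q_r = p_{xy.z}.
A_eq = [[1.0 if maps_to(r, z) == (x, y) else 0.0 for r in R]
        for z in (0, 1) for x in (0, 1) for y in (0, 1)]
b_eq = [p[z][x, y] for z in (0, 1) for x in (0, 1) for y in (0, 1)]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * len(R))
print("lower bound on P(Y=1 | do(X=1)):", res.fun)
```

Replacing the minimization by a maximization (negate `c` and negate the optimal value) gives the corresponding upper bound.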
### 2.1. Characterizing Valid Hyperarcs

The challenge here is to avoid iterating over all values in the set $R$ to produce an $r$-value which validates $h$. Suppose $h(0) = (x_0, y_0)$ and $h(1) = (x_1, y_1)$ for $x_i, y_i \in \{0, 1\}$, $i = 0, 1$. Then the maps (which may not be functions) $f_h$ and $g_h$ implied by $h$ are as follows:

$$f_h(z) = \begin{cases} x_0 & \text{if } z = 0 \\ x_1 & \text{if } z = 1 \end{cases} \qquad g_h(x) = \begin{cases} y_0 & \text{if } x = x_0 \\ y_1 & \text{if } x = x_1 \end{cases}$$

Since $R$ indexes the set of all possible functions in $F$ and $G$, $h$ is a valid hyperarc if, and only if, the map $f_h$ (resp. $g_h$) is consistent with some function $f \in F$ (resp. $g \in G$). The latter is true if, and only if, it is not the case that $x_0 = x_1$ but $y_0 \neq y_1$. Hence, to check the validity of $h$, it is sufficient to check whether $x_0 = x_1$ while $y_0 \neq y_1$: if so, $h$ is invalid, and otherwise it is valid. We extend this notion of consistency to more complex causal graphs in Section 3.

### 2.2. Efficiently Computing $c_h$

We start with the following definitions.

Definition 2.1 (Complete Consistency). A hyperarc $h$ is completely consistent with the query $Q$ if $R_h \subseteq R_Q$. A conditional probability $p_{xy.z}$ is completely consistent with the query $Q$ if $R_{xy.z} \subseteq R_Q$.

Thus, $c_h = 1$ if, and only if, $h$ is completely consistent with the query $Q$. Next, we describe how to check whether a hyperarc $h$ is completely consistent. We begin by characterizing complete consistency for conditional probabilities. We establish that $p_{xy.z}$ is completely consistent with $Q$ if, and only if, $x = 1$ and $y = 1$. Clearly, any $(r_X, r_Y) \in R_{11.z}$ maps $X = 1$ to $Y = 1$, and thus, $(r_X, r_Y) \in R_Q$ defined in (2). Hence, any conditional probability of the form $p_{11.z}$ is completely consistent with $Q$. To see that this is the only form a completely consistent probability can take, we consider two cases:

- Suppose $x \neq 1$, i.e. consider the probability $p_{01.z}$. Since $(r_X, r_Y)$ index the set of all possible functions $F$ and $G$, there exists $(r_X, r_Y)$ such that $\big(f_{r_X}(z), g_{r_Y}(f_{r_X}(z))\big) = (0, 1)$, i.e. $(r_X, r_Y) \in R_{01.z}$, but $g_{r_Y}(1) = 0$, i.e. $(r_X, r_Y) \notin R_Q$. Hence, there exists $(r_X, r_Y)$ such that $(r_X, r_Y) \in R_{01.z}$ but $(r_X, r_Y) \notin R_Q$, i.e. $p_{01.z}$ is not completely consistent.

- Suppose $y \neq 1$, i.e. consider the probability $p_{10.z}$. Then every $(r_X, r_Y) \in R_{10.z}$ maps $X = 1$ to $Y = 0$, and therefore, $(r_X, r_Y) \notin R_Q$. Hence $p_{10.z}$ is not completely consistent.

Next, we show that the hyperarc $h$ is completely consistent with $Q$ if, and only if, there exists $z$ such that $h(z) = (x, y)$ and the conditional probability $p_{xy.z}$ is completely consistent with $Q$. Thus, for the query $Q = P(Y = 1 \mid do(X = 1))$, we have that

$$h \text{ is completely consistent with } Q \iff h(z) = (1, 1) \text{ for some } z \in \{0, 1\}. \quad (6)$$

First suppose $h(z) = (1, 1)$ for some $z \in \{0, 1\}$. For every $(r_X, r_Y) \in R_h$, $\big(f_{r_X}(z), g_{r_Y}(f_{r_X}(z))\big) = (1, 1)$, i.e. $(r_X, r_Y)$ maps $Z = z$ to $(X = 1, Y = 1)$. Hence, $(r_X, r_Y) \in R_{11.z}$. Since $p_{11.z}$ is completely consistent, $(r_X, r_Y) \in R_Q$. Therefore, $R_h \subseteq R_Q$, i.e. $h$ is completely consistent with $Q$. In order to establish the reverse direction, suppose $h(z) \neq (1, 1)$ for all $z \in \{0, 1\}$. Since $R$ is exhaustive, there exists $r \in R$ such that:

(i) $r$ maps $Z = z$ to $(X, Y) = h(z) \neq (1, 1)$ for all $z \in \{0, 1\}$, i.e. $r \in R_h$;

(ii) $r$ maps $X = 1$ to $Y \neq 1$, i.e. $(r_X, r_Y) \notin R_Q$.

Hence $h$ is not completely consistent with $Q$.

To summarize, to compute $c_h$, it is sufficient to check if $h$ is completely consistent with $Q$, which can be easily characterized without enumerating all the elements in $R$. In Section 3 we show how to extend this result to more complex graphs.
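For this two-variable example, both checks reduce to a couple of comparisons. A minimal sketch, assuming a hyperarc is encoded as the pair `(h(0), h(1))` of `(x, y)` tuples; the function names are ours:

```python
def is_valid(h):
    """Section 2.1: h is valid iff the implied map g_h is a function,
    i.e. it is NOT the case that x0 == x1 while y0 != y1."""
    (x0, y0), (x1, y1) = h
    return not (x0 == x1 and y0 != y1)

def c_h(h):
    """Section 2.2: c_h = 1 iff h is completely consistent with
    Q = P(Y=1 | do(X=1)), i.e. h(z) = (1, 1) for some z (display (6))."""
    return 1 if (1, 1) in h else 0
```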
### 2.3. Bounds in Closed Form

Now consider the causal graph in Figure 2.

Figure 2: Causal Graph for Query $Q_1$

Given data $p_{xy.z}$, we show how to compute bounds for the modified query $Q_1 = P(Y = 1 \mid do(Z = 1, X = 1))$ in closed form. More precisely, we show that the optimal value of the pruned LP for this problem is the sum of the completely consistent probabilities. Retracing the steps above, we get that $c_h = 1$, i.e. $h$ is completely consistent with $Q_1$, if and only if $h(1) = (1, 1)$. The objective of the pruned LP for this problem is

$$\sum_{\{h \in H : c_h = 1\}} q_h = \sum_{\{h \in H :\ h \text{ is completely consistent with } Q_1\}} q_h = \sum_{\{h \in H : h(1) = (1, 1)\}} q_h = p_{11.1}, \quad (7)$$

where (7) follows from the constraints of the pruned LP.

As a step towards characterizing graphs and queries for which such closed form bounds are possible, we examine why closed form bounds do not exist for the original causal graph in Figure 1 and the query $Q = P(Y = 1 \mid do(X = 1))$. From (6) we have that $c_h = 1$, i.e. $h$ is completely consistent with $Q$, if and only if there exists $z \in \{0, 1\}$ such that $h(z) = (1, 1)$. The objective of the pruned LP is

$$\sum_{\{h \in H : c_h = 1\}} q_h = \sum_{\{h \in H :\ \exists z \in \{0,1\},\ h(z) = (1,1)\}} q_h \neq \sum_{\{h \in H : h(0) = (1,1)\}} q_h + \sum_{\{h \in H : h(1) = (1,1)\}} q_h,$$

since $\{h \in H : h(0) = (1, 1)\} \cap \{h \in H : h(1) = (1, 1)\} \neq \emptyset$. Hence, we cannot compute closed form bounds for the query.

Perhaps the issue was that the input variable $Z$ was not part of the intervention. So, suppose the query on the graph in Figure 1 is $Q_2 = P(Y = 1 \mid do(X = 1, Z = 1))$ instead. In this case, the steps above establish that $c_h = 1$ if and only if there exists $z \in \{0, 1\}$ such that $h(z) = (1, 1)$. This is identical to our condition for $c_h$ in $Q$. Therefore, we will still not be able to compute closed form bounds. What went wrong? Intuitively, $Z$ is redundant in the intervention $do(X = 1, Z = 1)$, since $Z$'s effects on $Y$ are blocked by intervening on $X$. The characterization of $c_h$ captures this intuition by leaving $Z$ out. Hence, it appears that the input variable must be non-redundant in the intervention in order to compute closed form bounds. Note that in the causal graph in Figure 2 and query $Q_1$, the input variable $Z$ is non-redundant in the intervention. Hence, we could compute closed form bounds. This is the premise of Assumption 3.10, which characterizes an important subclass of general causal inference problems for which the bounds can be computed in closed form.

## 3. General Causal Inference Problems

Let $G$ denote the causal graph. Let $V_1, \ldots, V_n$ denote the variables in $G$ in topologically sorted order. We let $N = \{1, \ldots, n\}$ denote the set of indices for the variables. We use lower case letters for the values of the variables, and the notation $V_i = v_i$ denotes that the variable $V_i$ takes the value $v_i \in \{0, 1\}$. For any subset $S \subseteq N$, we define $V_S := \{V_i : i \in S\}$, and the notation $V_S = v_S$ denotes that $V_i = v_i$ for all $i \in S$, for some $v \in \{0, 1\}^{|N|}$. We consider the following class of partitioned causal graphs (Sachs et al., 2020).

Assumption 3.1 ((Sachs et al., 2020)). The index set $N$ is partitioned into two sets $N = A \cup B$, where $V_B$ topologically follow $V_A$. $V_A$ can have a common unobserved confounder $U_A$, and $V_B$ can have a common unobserved confounder $U_B$; however, no pair of variables $(V_i, V_j)$, where $i \in A$ and $j \in B$, can share an unobserved confounder.

In the simple causal graphs discussed in Section 2, $V_A = \{Z\}$ and $V_B = \{X, Y\}$. The conditional probability distribution $p_{v_B.v_A} = P(V_B = v_B \mid V_A = v_A)$ is known, and our goal is to compute bounds for the causal query $Q = P(V_O = q_O \mid do(V_I = q_I))$, i.e. the probability of the output event $V_O = q_O$ given an intervention $do(V_I = q_I)$.
Assumption 3.2 (Query). A query $Q$ is valid only if $O \subseteq B$, $I \subseteq A \cup B$, with $O \cap I = \emptyset$, and every variable in $V_A$ is either intervened upon, or redundant for the intervention. More precisely, let $G_I$ denote the mutilated graph after the intervention $do(V_I = q_I)$, i.e. the variables $V_I$ no longer have any incoming arcs. Then, for $i \in A$, either $V_i \in V_I$, i.e. it is intervened upon, or there is no directed path from $V_i$ to any variable in $V_B$ in $G_I$.

In the example discussed in Section 2, $V_I = \{X\}$ and $V_O = \{Y\}$.

The presence of unobserved confounders implies that the values of the variables in $V_B$ are not completely defined when $V_A = v_A$. However, as we noted in Section 2, the unobserved confounder $U_B$ effectively selects a particular function from all possible functions mapping $V_A$ to $V_B$, subject to the constraints imposed by the graph structure. For $j \in B$, let $pa(V_j)$ denote the parents of $V_j$ in the causal graph. Analogous to the sets $F$ and $G$ in Section 2, define $F_j$ to be the set of all possible functions $pa(V_j) \mapsto V_j$. Since each variable $V_k \in pa(V_j)$ takes values in $\{0, 1\}$ and $V_j \in \{0, 1\}$, the cardinality of the set is $|F_j| = 2^{2^{|pa(V_j)|}}$. Then the elements of $F_j$ can be indexed by $r_{V_j} \in R_{V_j} = \{1, \ldots, |F_j|\}$. Let the set $R = \prod_{j \in B} R_{V_j}$ index all possible mappings $pa(V_j) \mapsto V_j$ for all $j \in B$. Note that the cardinality $|R| = \prod_{j \in B} 2^{2^{|pa(V_j)|}}$.

For $r \in R$, let $F_O(V_S = v_S, r)$ denote the value of $V_O \subseteq V_B$ when $V_S = v_S$, provided it is well defined. Since $V_A$ topologically precede $V_B$, setting $V_A = v_A$ and choosing $r \in R$ completely defines the values of $V_B$, i.e. $F_B(V_A = v_A, r)$ is well defined. Let $R_{v_B.v_A} = \{r : F_B(V_A = v_A, r) = v_B\}$. From the definition of a valid query, it follows that setting $V_I = q_I$ and selecting $r \in R$ uniquely defines the value of $V_O$. Let $R_Q = \{r \in R : F_O(V_I = q_I, r) = q_O\}$ denote the set of $r$-values which are consistent with the query. Then bounds for the causal query can be obtained by solving the following pair of linear programs (Balke & Pearl, 1994):

$$\begin{aligned} \min_q / \max_q \quad & \textstyle\sum_{r \in R_Q} q_r \\ \text{s.t.} \quad & \textstyle\sum_{r \in R_{v_B.v_A}} q_r = p_{v_B.v_A}, \quad \forall v_A, v_B, \\ & q \geq 0. \end{aligned} \quad (8)$$

Recall that for $V_A = v_A$, $r \in R$ uniquely determines the value of $V_B$. Hence, for fixed $v_A$, the collection $\{R_{v_B.v_A} : v_B \in \{0,1\}^{|B|}\}$ is a partition of $R$. Thus, the constraint $\sum_{v_B} \sum_{r \in R_{v_B.v_A}} q_r = \sum_{v_B} p_{v_B.v_A} = 1$ is implied by the other constraints in the LP, and therefore, is not explicitly added to the LP.
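To fix ideas, here is a minimal sketch of this response-function machinery, assuming variables $0, \ldots, n-1$ are topologically ordered with $A$ first. The representation of functions as truth tables and all helper names are our own conventions:

```python
import itertools

def all_functions(n_parents):
    """All binary functions of n_parents binary inputs, as truth tables
    of length 2**n_parents (so |F_j| = 2**(2**|pa(V_j)|))."""
    return list(itertools.product((0, 1), repeat=2 ** n_parents))

def evaluate_F_B(v_A, r, parents, nA, n):
    """F_B(V_A = v_A, r): fill in V_B in topological order, where r is a
    tuple of truth tables, one per j in B, and parents[j] lists pa(V_j)."""
    v = list(v_A) + [None] * (n - nA)
    for j in range(nA, n):
        row = 0                               # truth-table row index for the
        for k in parents[j]:                  # current parent assignment
            row = 2 * row + v[k]
        v[j] = r[j - nA][row]
    return tuple(v[nA:])

# The set R is the product of the function sets, one factor per j in B:
# R = itertools.product(*[all_functions(len(parents[j])) for j in range(nA, n)])
```

Enumerating $R$ this way is exactly what the pruning below avoids; the sketch is only meant to make the objects $F_j$, $R$, and $F_B$ concrete.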
### 3.1. Pruning the LP

Let $h : V_A \mapsto V_B$ denote any function from $V_A$ to $V_B$. We call these functions hyperarcs, both because they correspond to hyperarcs in appropriately defined hypergraphs, and as a shorthand for these special functions. Let $R_h = \{r : F_B(V_A = v_A, r) = h(v_A),\ \forall v_A \in \{0, 1\}^{|A|}\}$ denote the set of $r$-values which are consistent with the hyperarc $h$. The hyperarc $h$ is valid only if $R_h \neq \emptyset$. Let $H$ denote the set of valid hyperarcs and let $\{q_h : h \in H\}$ denote a probability measure over valid hyperarcs. Then the LP to compute the lower bound $\alpha_L$ in terms of the variables $\{q_h\}$ is given by

$$\begin{aligned} \min_q \quad & \textstyle\sum_{h \in H} c^L_h q_h \\ \text{s.t.} \quad & \textstyle\sum_{\{h \in H : h(v_A) = v_B\}} q_h = p_{v_B.v_A}, \quad \forall v_A, v_B, \\ & q \geq 0, \end{aligned} \quad (9)$$

where $c^L_h = \mathbf{1}\{R_h \subseteq R_Q\}$. Similarly, the LP for the upper bound $\alpha_U$ can be reformulated as

$$\begin{aligned} \max_q \quad & \textstyle\sum_{h \in H} c^U_h q_h \\ \text{s.t.} \quad & \textstyle\sum_{\{h \in H : h(v_A) = v_B\}} q_h = p_{v_B.v_A}, \quad \forall v_A, v_B, \\ & q \geq 0, \end{aligned} \quad (10)$$

where $c^U_h = \max_{r \in R_h} \mathbf{1}\{r \in R_Q\} = \mathbf{1}\{R_h \cap R_Q \neq \emptyset\}$. Both reformulations have exponentially fewer variables; however, they are useful only if the set of valid hyperarcs $H$ and the corresponding costs $c^L_h = \mathbf{1}\{R_h \subseteq R_Q\}$ and $c^U_h = \mathbf{1}\{R_h \cap R_Q \neq \emptyset\}$ can be efficiently computed, in particular, without iterating over $R$.

#### 3.1.1. Characterizing Valid Hyperarcs

We now show how to efficiently check the validity of a hyperarc $h$. We begin by defining functional consistency.

Definition 3.3 (Functional Consistency). For $j \in B$, let $P_j \subseteq N$ denote the indices of $pa(V_j)$. A hyperarc $h$ is functionally consistent if for all $a, b \in \{0, 1\}^{|N|}$ such that $h(a_A) = a_B$ and $h(b_A) = b_B$, and for all $j \in B$,

$$a_{P_j} = b_{P_j} \implies a_j = b_j.$$

A hyperarc $h$ partially specifies a function mapping $pa(V_j)$ to $V_j$. Functional consistency ensures this partial specification is consistent with some binary function $pa(V_j) \mapsto V_j$. The following result characterizes valid hyperarcs without producing an $r$-value which validates them.

Theorem 3.4 (Validity of $h$). A hyperarc $h$ is valid if, and only if, $h$ is functionally consistent.

Proof. It is clear that if a hyperarc is valid, then it is functionally consistent. Suppose a hyperarc is functionally consistent. Then, for each $V_j \in V_B$, the partial specification of a mapping from $pa(V_j)$ to $V_j$ implied by the hyperarc $h$ is consistent with some binary function $pa(V_j) \mapsto V_j$. Since, for each $V_j \in V_B$, $r_{V_j}$ indexes the set of all possible functions $pa(V_j) \mapsto V_j$, it follows that the set of $r$-values which are consistent with the mapping $h$, i.e. $R_h$, is non-empty. Equivalently, $h$ is valid.
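A sketch of the test of Theorem 3.4, reusing the conventions of the previous sketch (a hyperarc as a dict from $v_A$ tuples to $v_B$ tuples, `parents[j]` listing global indices); it never touches $R$:

```python
def is_valid_hyperarc(h, parents, nA):
    """Theorem 3.4: h is valid iff it is functionally consistent, i.e.
    equal values of pa(V_j) never force two different values of V_j."""
    forced = {}                                   # (j, pa-assignment) -> v_j
    for v_A, v_B in h.items():
        v = tuple(v_A) + tuple(v_B)               # full assignment, A first
        for j in range(nA, len(v)):               # j ranges over B
            key = (j, tuple(v[k] for k in parents[j]))
            if forced.setdefault(key, v[j]) != v[j]:
                return False                      # inconsistent specification
    return True
```

The check runs in $O(2^{|A|} \cdot |B|)$ time per hyperarc, versus the $|R| = \prod_{j \in B} 2^{2^{|pa(V_j)|}}$ enumeration it replaces.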
The first two columns of Table 1 compare $|R|$ with the maximum possible number of hyperarcs $2^{|B| \cdot 2^{|A|}}$ for five different causal inference problems (details in the Appendix). Note that the reduction in size can be several orders of magnitude, and it increases with the complexity of the causal graph; see e.g. Examples D and E. Thus, there is a very significant reduction in size even if all hyperarcs are valid. The last column in Table 1 lists $|H|$. Considering only the valid hyperarcs further decreases the size of the LP by at least one order of magnitude, and sometimes more.

| Graph | $|R|$ | $2^{|B| \cdot 2^{|A|}}$ | $|H|$ |
| --- | --- | --- | --- |
| Ex A | $1.3 \times 10^8$ | $1.0 \times 10^6$ | $2.3 \times 10^3$ |
| Ex B | $4.2 \times 10^6$ | $1.0 \times 10^6$ | $7.1 \times 10^4$ |
| Ex C | $4.2 \times 10^6$ | $1.0 \times 10^6$ | $4.4 \times 10^4$ |
| Ex D | $6.3 \times 10^{57}$ | $1.7 \times 10^7$ | $9.4 \times 10^6$ |
| Ex E | $3.2 \times 10^{32}$ | $1.7 \times 10^7$ | $9.4 \times 10^6$ |

Table 1: LP pruning

The LPs corresponding to Examples B and C can be solved without pruning; however, the LP corresponding to Example A can only be solved after pruning the problem, and the LPs for Examples D and E are too large even after pruning. In Section 4 we propose a greedy heuristic to compute bounds for these problems. Next, we show how to efficiently compute $c^L_h = \mathbf{1}\{R_h \subseteq R_Q\}$ and $c^U_h = \mathbf{1}\{R_h \cap R_Q \neq \emptyset\}$.

#### 3.1.2. Efficiently Computing $c^L_h$

Recall that in Section 2.2 we established that $c^L_h$ for the simple example can be efficiently computed by checking whether $h$ is completely consistent with $Q$. Here, we extend that result to general causal graphs. Let $I_C \subseteq I$ denote the indices of the variables that are critical in the intervention, i.e. for each $V \in V_{I_C}$, there is a directed path from $V$ to a variable in $V_O$ in $G_I$. Then the following result characterizes complete consistency of conditional probabilities.

Lemma 3.5 (Complete Consistency of Probability). For graphs satisfying Assumption 3.1 and queries satisfying Assumption 3.2, the conditional probability $p_{v_B.v_A}$ is completely consistent with $Q$ if, and only if, $v_{I_C \cap A} = q_{I_C \cap A}$, $v_{I_C \cap B} = q_{I_C \cap B}$, and $v_O = q_O$.

This result is analogous to the one for the simple example: the conditional probability $p_{v_B.v_A}$ is completely consistent with $Q$ if, and only if, the variable assignments $V_A = v_A$ and $V_B = v_B$ are consistent with the query. Next, we show that a hyperarc $h$ is completely consistent with the query $Q$ if, and only if, there exists $v \in \{0, 1\}^{|N|}$ such that $h(v_A) = v_B$ and the conditional probability $p_{v_B.v_A}$ is completely consistent with $Q$.

Theorem 3.6 (Complete Consistency of Hyperarc). For graphs satisfying Assumption 3.1 and queries satisfying Assumption 3.2, a hyperarc $h$ is completely consistent with $Q$ if, and only if, there exists $v \in \{0, 1\}^{|N|}$ such that $h(v_A) = v_B$, and the conditional probability $p_{v_B.v_A}$ is completely consistent with $Q$.

To summarize, to compute $c^L_h$, it is sufficient to check if $h$ is completely consistent with $Q$, which can be easily characterized without enumerating all the elements in $R$.

#### 3.1.3. Efficiently Computing $c^U_h$

We now show how to compute $c^U_h = \mathbf{1}\{R_h \cap R_Q \neq \emptyset\}$ using an efficient algorithm for checking $R_h \cap R_Q = \emptyset$. This condition motivates the following definition of strict inconsistency.

Definition 3.7 (Strict Inconsistency). A hyperarc $h$ is said to be strictly inconsistent with the query $Q$ if $R_h \cap R_Q = \emptyset$. The conditional probability $p_{v_B.v_A}$ is strictly inconsistent with the query $Q$ if $R_{v_B.v_A} \cap R_Q = \emptyset$.

Hence, $c^U_h = 0$ if, and only if, $h$ is strictly inconsistent with $Q$. The following result characterizes strict inconsistency for conditional probabilities.

Theorem 3.8 (Strict Inconsistency of Probability). For graphs satisfying Assumption 3.1 and queries satisfying Assumption 3.2, the conditional probability $p_{v_B.v_A}$ is strictly inconsistent with the query $Q$ if, and only if, $v_{I_C \cap A} = q_{I_C \cap A}$, $v_{I_C \cap B} = q_{I_C \cap B}$, and $v_O \neq q_O$.

Thus, the conditional probability $p_{v_B.v_A}$ is strictly inconsistent with $Q$ if, and only if, the variable assignment $V_A = v_A$, $V_B = v_B$ is inconsistent with the query. Next, we show that a hyperarc $h$ is strictly inconsistent with the query $Q$ if, and only if, there exists $v \in \{0, 1\}^{|N|}$ such that $h(v_A) = v_B$, and the conditional probability $p_{v_B.v_A}$ is strictly inconsistent with $Q$.

Theorem 3.9 (Strict Inconsistency of Hyperarc). For graphs satisfying Assumption 3.1 and queries satisfying Assumption 3.2, the hyperarc $h$ is strictly inconsistent with $Q$ if, and only if, there exists $v \in \{0, 1\}^{|N|}$ such that $h(v_A) = v_B$ and the probability $p_{v_B.v_A}$ is strictly inconsistent with $Q$.

To summarize, to compute $c^U_h$, it is sufficient to check if $h$ is strictly inconsistent with $Q$, and that can be easily characterized without enumerating all the elements in $R$.
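Putting Lemma 3.5 and Theorems 3.6, 3.8 and 3.9 together, both costs reduce to scans over the $2^{|A|}$ branches of the hyperarc. A sketch, assuming a query is bundled as a namedtuple whose fields are lists of `(position, required value)` pairs into the $v_A$ and $v_B$ tuples; these are our conventions, not the paper's notation:

```python
from collections import namedtuple

# I_C within A, I_C within B, and the output set O (which lies inside B).
Query = namedtuple("Query", ["IC_A", "IC_B", "O"])

def completely_consistent(v_A, v_B, query):
    """Lemma 3.5: (v_A, v_B) agrees with q on I_C and on O."""
    on_IC = (all(v_A[i] == q for i, q in query.IC_A) and
             all(v_B[i] == q for i, q in query.IC_B))
    return on_IC and all(v_B[i] == q for i, q in query.O)

def strictly_inconsistent(v_A, v_B, query):
    """Theorem 3.8: (v_A, v_B) agrees with q on I_C but not on O."""
    on_IC = (all(v_A[i] == q for i, q in query.IC_A) and
             all(v_B[i] == q for i, q in query.IC_B))
    return on_IC and not all(v_B[i] == q for i, q in query.O)

def cost_L(h, query):
    """Theorem 3.6: c^L_h = 1 iff some branch of h is completely consistent."""
    return int(any(completely_consistent(vA, vB, query)
                   for vA, vB in h.items()))

def cost_U(h, query):
    """Theorem 3.9: c^U_h = 0 iff some branch of h is strictly inconsistent."""
    return int(not any(strictly_inconsistent(vA, vB, query)
                       for vA, vB in h.items()))
```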
### 3.2. Bounds in Closed Form

Recall that for the class of problems identified by Sachs et al. (2020), each variable in $V_A$ is either in the intervention, or redundant for the intervention. We now show that for the subclass of problems which satisfy Assumption 3.10, i.e. where the entire set $A$ is involved in the intervention and all intervention variables are critical, the bounds can be computed in closed form.

Assumption 3.10. The query $Q = P(V_O = q_O \mid do(V_I = q_I))$ satisfies the following:

(S1) $A \subseteq I$, i.e. the entire set $A$ is involved in the intervention;

(S2) there is a directed path from every $V \in V_I$ to some variable in $V_O$ in $G_I$, i.e. all intervention variables are critical, or equivalently $I_C = I$.

Theorem 3.11 (Lower Bound for Special Class of Problems). Suppose the query $Q$ satisfies S1 and S2. Then the lower bound $\alpha_L$ is equal to the sum of the conditional probabilities that are completely consistent with $Q$.

Proof. Theorem 3.6 implies a hyperarc $h$ is completely consistent with $Q$ if, and only if, there exists $v = (v_A, v_B)$ such that $h(v_A) = v_B$, $v_{A \cap I_C} = q_{A \cap I_C}$, $v_{B \cap I_C} = q_{B \cap I_C}$, and $v_O = q_O$. Therefore, for any $Q$ satisfying S1 and S2, it follows that a hyperarc $h$ is completely consistent with $Q$ if, and only if, there exists $v_B$ such that $h(q_A) = v_B$, $v_{B \cap I} = q_{B \cap I}$, and $v_O = q_O$. Thus, it follows that

$$\sum_{\{h \in H : c^L_h = 1\}} q_h = \sum_{\{h \in H :\ h \text{ is completely consistent with } Q\}} q_h \quad (11)$$
$$= \sum_{\{v_B : v_{B \cap I} = q_{B \cap I},\ v_O = q_O\}} \ \sum_{\{h \in H : h(q_A) = v_B\}} q_h \quad (12)$$
$$= \sum_{\{v_B : v_{B \cap I} = q_{B \cap I},\ v_O = q_O\}} p_{v_B.q_A}, \quad (13)$$

where (11) follows from Definition 2.1, (12) from the discussion above, and (13) from the constraints of the pruned LP. By Lemma 3.5, (13) is the sum of the probabilities completely consistent with $Q$.

Theorem 3.12 (Upper Bound for Special Class of Problems). Suppose the query $Q$ satisfies S1 and S2. Then the upper bound $\alpha_U$ is given by $1$ minus the sum of the conditional probabilities which are strictly inconsistent with $Q$.

Proof. Theorem 3.9 implies that a hyperarc $h$ is strictly inconsistent with $Q$ if, and only if, there exists $v = (v_A, v_B)$ such that $h(v_A) = v_B$, $v_{A \cap I_C} = q_{A \cap I_C}$, $v_{B \cap I_C} = q_{B \cap I_C}$, and $v_O \neq q_O$. Therefore, for any $Q$ satisfying S1 and S2, it follows that a hyperarc $h$ is strictly inconsistent with $Q$ if, and only if, there exists $v_B$ such that $h(q_A) = v_B$, $v_{B \cap I} = q_{B \cap I}$, and $v_O \neq q_O$. Thus, it follows that

$$\sum_{\{h \in H : c^U_h = 0\}} q_h = \sum_{\{h \in H :\ h \text{ is strictly inconsistent with } Q\}} q_h \quad (14)$$
$$= \sum_{\{v_B : v_{B \cap I} = q_{B \cap I},\ v_O \neq q_O\}} \ \sum_{\{h \in H : h(q_A) = v_B\}} q_h \quad (15)$$
$$= \sum_{\{v_B : v_{B \cap I} = q_{B \cap I},\ v_O \neq q_O\}} p_{v_B.q_A}, \quad (16)$$

where (14) follows from Definition 3.7, (15) from the discussion above, and (16) from the constraints of the pruned LP. The result follows from the fact that $\alpha_U = \sum_{\{h \in H : c^U_h = 1\}} q_h = 1 - \sum_{\{h \in H : c^U_h = 0\}} q_h$.
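Under S1 and S2, the bounds therefore require no LP at all: they are sums over the single column $V_A = q_A$ of the input distribution. A minimal sketch, reusing the `Query` convention and the two consistency checks sketched above (for S1/S2 queries, $I_C = I$, so `IC_A` pins down all of $v_A$):

```python
def closed_form_bounds(p, q_A, query):
    """Theorems 3.11 / 3.12: alpha_L = sum of completely consistent
    probabilities; alpha_U = 1 - sum of strictly inconsistent ones.
    p maps (v_B, v_A) tuples to P(V_B = v_B | V_A = v_A)."""
    lo, excluded = 0.0, 0.0
    for (v_B, v_A), prob in p.items():
        if v_A != q_A:
            continue                  # S1: only the column V_A = q_A enters
        if completely_consistent(v_A, v_B, query):
            lo += prob
        elif strictly_inconsistent(v_A, v_B, query):
            excluded += prob
    return lo, 1.0 - excluded
```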
#### 3.2.1. Example

An important example of the class of problems that satisfy S1 and S2 is the causal inference problem where multiple confounded treatments influence an outcome (Ranganath & Perotte, 2019; Janzing & Schölkopf, 2018; D'Amour, 2019; Tran & Blei, 2017), for example the setting where the treatments are medications and procedures, and the outcome is the progression of the disease in the patient. In this case, there are many confounders which influence both the prescribed treatments and the outcome. Some of these confounders can be measured, e.g. the pre-existing conditions of the patient, and others are unobserved, e.g. the treatment preferences of the attending doctor (Wang & Blei, 2019a). See Figure 3 for the causal graph of the patient response, where

- $C_i$ indicates the presence of pre-existing condition $i$ in the patient,
- $U_A$ is an unobserved confounder (e.g. a patient characteristic) which influences the presence of pre-existing conditions,
- $T_i$, $i = 1, \ldots, 5$, indicates whether the patient was prescribed treatment $i$,
- $U_B$ is an unobserved confounder which influences both the prescribed treatments and the outcome (e.g. doctor biases, treatment preferences),
- $Y$ indicates the progression of the disease in the patient; $Y = 0$ indicates that the disease is mild, and $Y = 1$ that the disease is lethal.

Figure 3: Example of $G$ where $V_A = \{C_1, C_2\}$ and $V_B = \{T_1, T_2, T_3, T_4, T_5, Y\}$

Wang & Blei (2019a; 2021) introduced the deconfounder as a method to predict the expected value of the outcome variable under treatment interventions when multiple confounded treatments influence an outcome. One of the limitations of the deconfounder approach is that it cannot be applied in a setting where the treatment variables have causal relationships between them (Ogburn et al., 2019; Wang & Blei, 2019b; Imai & Jiang, 2019). For example, in our context, the side effects of one treatment can influence the prescription of another treatment (Ogburn et al., 2019), as implied by the arrows $T_1 \to T_3$, $T_2 \to T_3$, $T_3 \to T_4$ and $T_3 \to T_5$ in Figure 3. The deconfounder cannot be used for inference in this setting. However, since the entire set $V_A = \{C_1, C_2\}$ is involved in the intervention, and every node intervened upon is a parent of $Y$, Theorems 3.11 and 3.12 can be used to compute bounds for the query $E[Y \mid do(T = t, C_1 = c_1, C_2 = c_2)]$ in closed form. In particular,

$$\alpha_L = P(T = t, Y = 1 \mid C_1 = c_1, C_2 = c_2),$$
$$\alpha_U = 1 - P(T = t, Y = 0 \mid C_1 = c_1, C_2 = c_2).$$

## 4. Greedy Heuristic

For problems where even the pruned LPs are intractable, and the conditions in Section 3.2 are not satisfied, we propose Algorithm 1 as a greedy heuristic to compute the bounds $\alpha_L$ and $\alpha_U$. This heuristic is motivated by the duals of the LPs (9) and (10), which are given by

$$\alpha_L = \max_\lambda \Big\{ \textstyle\sum_{(v_A, v_B) \in \{0,1\}^{|A|} \times \{0,1\}^{|B|}} p_{v_B.v_A} \lambda_{v_B.v_A} : \textstyle\sum_{v_A \in \{0,1\}^{|A|}} \lambda_{h(v_A).v_A} \leq c^L_h,\ \forall h \in H \Big\} \quad (17)$$

$$\alpha_U = \min_\lambda \Big\{ \textstyle\sum_{(v_A, v_B) \in \{0,1\}^{|A|} \times \{0,1\}^{|B|}} p_{v_B.v_A} \lambda_{v_B.v_A} : \textstyle\sum_{v_A \in \{0,1\}^{|A|}} \lambda_{h(v_A).v_A} \geq c^U_h,\ \forall h \in H \Big\} \quad (18)$$

We utilize two observations: in our numerical experiments, for both dual LPs there was always an optimal solution that only took values in the set $\{-1, 0, 1\}$; and in the symbolic bounds introduced by Balke & Pearl (1994) (see also Zhang & Bareinboim (2017); Pearl (2009); Sjölander et al. (2014); Sachs et al. (2020)) the constraints were combined using coefficients taking values in $\{-1, 0, 1\}$. In fact, we believe that the following conjecture is true.

Conjecture 4.1 (Dual Integrality). The dual LPs in (17) and (18) have optimal solutions which only take values in $\{-1, 0, 1\}$.

Algorithm 1 Greedy Heuristic

Let the permutation which sorts the conditional probabilities in descending order be $i_1, \ldots, i_{2^{|N|}}$, and let $e_{i_j}$ denote the unit vector in coordinate $i_j$.

Function Greedy Lower Bound():
  Initialize $\lambda = -1$.
  for $j = 1, \ldots, 2^{|N|}$ do
    while $\lambda + e_{i_j}$ is feasible do
      $\lambda_{i_j} \leftarrow \lambda_{i_j} + 1$

Function Greedy Upper Bound():
  Initialize $\lambda = 1$.
  for $j = 1, \ldots, 2^{|N|}$ do
    while $\lambda - e_{i_j}$ is feasible do
      $\lambda_{i_j} \leftarrow \lambda_{i_j} - 1$

We benchmark the greedy heuristic by computing bounds for 100 instances of each of the examples in Appendix A which do not satisfy the conditions in Section 3.2. Table 2 reports the results for Examples A, B and C, for which the LP can be solved. We see that the bounds from the greedy heuristic match the LP bounds in most instances for these problems. Recall that one can compute the optimal bounds for Example A only after pruning the LP.

| Graph | % with $\alpha^G_L = \alpha_L$ | % with $\alpha^G_U = \alpha_U$ | % with $\epsilon_L \leq 0.1$ | % with $\epsilon_U \leq 0.1$ |
| --- | --- | --- | --- | --- |
| Ex A | 100 | 100 | 100 | 100 |
| Ex B | 99 | 86 | 99 | 94 |
| Ex C | 100 | 84 | 100 | 86 |

Table 2: Greedy Solution vs Optimal Solution

In Table 2, $\epsilon_L = 1 - \alpha^G_L / \alpha_L$ and $\epsilon_U = \alpha^G_U / \alpha_U - 1$ denote the relative errors of the lower and upper bounds, respectively. We see that the lower bound is always within 10% of the true value, whereas the upper bound is within 10% for at least 86% of the cases. See the Appendix for the empirical distribution function of the errors. Furthermore, the greedy heuristic yields non-trivial bounds for Examples D and E, where the corresponding pruned LP is too large to be solved to optimality (see Table 3).

| Graph | $\alpha^G_L$ | $\alpha^G_U$ |
| --- | --- | --- |
| Ex D | 0.92 | 0.93 |
| Ex E | 0.87 | 0.98 |

Table 3: Greedy Solution for Large Problems
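A sketch of Algorithm 1 for the lower-bound dual (17), assuming `H` is a list of valid hyperarcs (dicts $v_A \mapsto v_B$) with costs `cL`, and `p` maps every $(v_B, v_A)$ key to its probability; we additionally cap each coordinate at $+1$, consistent with the $\{-1, 0, 1\}$ observation above, so the loop always terminates. All helper names are ours:

```python
def greedy_lower_bound(p, H, cL):
    """Greedy heuristic for dual (17): start at lambda = -1 and raise the
    coordinates with the largest probabilities while staying feasible."""
    keys = sorted(p, key=p.get, reverse=True)     # probabilities, descending
    lam = {k: -1.0 for k in keys}

    def feasible():
        # Dual feasibility: sum over v_A of lambda_{h(v_A).v_A} <= c^L_h.
        return all(sum(lam[(v_B, v_A)] for v_A, v_B in h.items()) <= c
                   for h, c in zip(H, cL))

    for k in keys:
        while lam[k] < 1.0:                       # cap at +1 (Conjecture 4.1)
            lam[k] += 1.0
            if not feasible():
                lam[k] -= 1.0                     # undo the overshooting step
                break
    return sum(p[k] * lam[k] for k in keys)       # a valid lower bound
```

The upper-bound version is symmetric: initialize $\lambda = 1$ and decrease coordinates while $\sum_{v_A} \lambda_{h(v_A).v_A} \geq c^U_h$ holds. Any feasible dual point yields a valid bound, so the heuristic is conservative even when it is not optimal.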
## 5. Conclusion

In this work, we show how to leverage structural properties of the LPs corresponding to causal inference problems to significantly reduce their size. We also show how to construct these LPs efficiently. As a direct consequence of our results, bounds for causal queries can be computed for graphs of much larger size. We show that there are examples of causal inference problems for which bounds could be computed only after the pruning we introduce. Our structural results also allow us to characterize a set of causal inference problems for which the bounds can be computed in closed form. This class includes as a special case extensions of problems considered in the multiple causes literature (Wang & Blei, 2021; Ranganath & Perotte, 2019; Janzing & Schölkopf, 2018; D'Amour, 2019; Tran & Blei, 2017).

We are currently considering two extensions. First, the constraints in the dual LPs are all packing constraints (for $\alpha_L$) or covering constraints (for $\alpha_U$); however, the variables are free. If Conjecture 4.1 is true, the feasible set of the LP can be assumed to be bounded. This could potentially be used to develop fast approximation algorithms (Bienstock & Iyengar, 2006). Second, note that here we do not allow the query to contain observations about the unit under consideration. We can show that bounds for queries containing observations can be computed by solving fractional LPs (Bitran & Novaes, 1973). These fractional LPs are special because the denominator is restricted to be non-negative. This allows us to homogenize the problem into an LP with one additional constraint. Therefore, we expect that the structural results presented here will extend to the setting where the query contains an observation.

## References

Balke, A. and Pearl, J. Counterfactual probabilities: Computational methods, bounds and applications. In Uncertainty Proceedings 1994, pp. 46-54. Elsevier, 1994.

Bienstock, D. and Iyengar, G. Approximating fractional packings and coverings in O(1/epsilon) iterations. SIAM Journal on Computing, 35(4):825-854, 2006.

Bitran, G. R. and Novaes, A. G. Linear programming with a fractional objective function. Operations Research, 21(1):22-29, 1973.

D'Amour, A. On multi-cause approaches to causal inference with unobserved counfounding: Two cautionary failure cases and a promising alternative. In Chaudhuri, K. and Sugiyama, M. (eds.), Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pp. 3478-3486. PMLR, 2019. URL https://proceedings.mlr.press/v89/d-amour19a.html.

Duarte, G., Finkelstein, N., Knox, D., Mummolo, J., and Shpitser, I. An automated approach to causal inference in discrete settings. arXiv preprint arXiv:2109.13471, 2021.

Evans, R. J. Graphical methods for inequality constraints in marginalized DAGs. In 2012 IEEE International Workshop on Machine Learning for Signal Processing, pp. 1-6, 2012. doi: 10.1109/MLSP.2012.6349796.

Finkelstein, N. and Shpitser, I. Deriving bounds and inequality constraints using logical relations among counterfactuals. In Conference on Uncertainty in Artificial Intelligence, pp. 1348-1357. PMLR, 2020.

Finkelstein, N., Adams, R., Saria, S., and Shpitser, I. Partial identifiability in discrete data with measurement error. In de Campos, C. and Maathuis, M. H. (eds.), Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, volume 161 of Proceedings of Machine Learning Research, pp. 1798-1808. PMLR, 2021. URL https://proceedings.mlr.press/v161/finkelstein21b.html.

Geiger, D. and Meek, C. Quantifier elimination for statistical problems, 2013. URL https://arxiv.org/abs/1301.6698.

Imai, K. and Jiang, Z. Discussion of "The Blessings of Multiple Causes" by Wang and Blei, 2019.

Imbens, G. W. and Rubin, D. B. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press, 2015.

Janzing, D. and Schölkopf, B.
Detecting confounding in multivariate linear models via spectral analysis. Journal of Causal Inference, 6(1):20170013, 2018. doi: 10.1515/jci-2017-0013. URL https://doi.org/10.1515/jci-2017-0013.

Kilbertus, N., Kusner, M. J., and Silva, R. A class of algorithms for general instrumental variable models. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 20108-20119. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper/2020/file/e8b1cbd05f6e6a358a81dee52493dd06-Paper.pdf.

Ogburn, E. L., Shpitser, I., and Tchetgen, E. J. T. Comment on "Blessings of Multiple Causes". Journal of the American Statistical Association, 114(528):1611-1615, 2019. doi: 10.1080/01621459.2019.1689139. URL https://doi.org/10.1080/01621459.2019.1689139.

Pearl, J. Causality: Models, Reasoning and Inference. Cambridge University Press, USA, 2nd edition, 2009. ISBN 052189560X.

Poderini, D., Chaves, R., Agresti, I., Carvacho, G., and Sciarrino, F. Exclusivity graph approach to instrumental inequalities. In Adams, R. P. and Gogate, V. (eds.), Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, volume 115 of Proceedings of Machine Learning Research, pp. 1274-1283. PMLR, 2020. URL https://proceedings.mlr.press/v115/poderini20a.html.

Ranganath, R. and Perotte, A. Multiple causal inference with latent confounding, 2019.

Richardson, A., Hudgens, M. G., Gilbert, P., and Fine, J. Nonparametric bounds and sensitivity analysis of treatment effects. Statistical Science, 29(4):596-618, 2014. doi: 10.1214/14-STS499.

Sachs, M. C., Gabriel, E. E., and Sjölander, A. Symbolic computation of tight causal bounds. Biometrika, 103(1):1-19, 2020.

Sachs, M. C., Jonzon, G., Sjölander, A., and Gabriel, E. E. A general method for deriving tight symbolic bounds on causal effects, 2021.

Sjölander, A., Lee, W., Källberg, H., and Pawitan, Y. Bounds on causal interactions for binary outcomes. Biometrics, 70(3):500-505, 2014. ISSN 0006341X, 15410420. URL http://www.jstor.org/stable/24538083.

Tran, D. and Blei, D. M. Implicit causal models for genomewide association studies, 2017.

Wang, Y. and Blei, D. A proxy variable view of shared confounding. In Meila, M. and Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 10697-10707. PMLR, 2021. URL https://proceedings.mlr.press/v139/wang21c.html.

Wang, Y. and Blei, D. M. The blessings of multiple causes. Journal of the American Statistical Association, 114(528):1574-1596, 2019a.

Wang, Y. and Blei, D. M. The blessings of multiple causes: Rejoinder. Journal of the American Statistical Association, 114(528):1616-1619, 2019b. doi: 10.1080/01621459.2019.1690841. URL https://doi.org/10.1080/01621459.2019.1690841.

Zhang, J. and Bareinboim, E. Transfer learning in multi-armed bandits: A causal approach. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 1340-1346, 2017. doi: 10.24963/ijcai.2017/186. URL https://doi.org/10.24963/ijcai.2017/186.

Zhang, J. and Bareinboim, E. Bounding causal effects on continuous outcome. Proceedings of the AAAI Conference on Artificial Intelligence, 35(13):12207-12215, 2021. URL https://ojs.aaai.org/index.php/AAAI/article/view/17449.
## A. Examples of Causal Inference Problems

In this section, we report the causal graph structure and the data generation process for the 5 examples in Table 1. The causal graphs are displayed in Figure 4.

Figure 4: Examples of Causal Inference Problems. (a) Example A, (b) Example B, (c) Example C, (d) Example D, (e) Example E.

Example A. The causal graph for this example is displayed in Figure 4a. The query is $P(Y = 1 \mid do(X_2 = 1, Z_1 = 1, Z_2 = 1))$, and the data generating process used to generate the input data is given by

$$Z_1 \sim \text{Bernoulli}(\text{logit}^{-1}(U_A)), \quad Z_2 \sim \text{Bernoulli}(\text{logit}^{-1}(U_A)),$$
$$S_1 \sim \text{Bernoulli}(\text{logit}^{-1}(U_B)), \quad X_1 \sim \text{Bernoulli}(\text{logit}^{-1}(U_B + S_1)),$$
$$S_2 \sim \text{Bernoulli}(\text{logit}^{-1}(S_1 + U_B + X_1 + Z_2)), \quad X_2 \sim \text{Bernoulli}(\text{logit}^{-1}(S_2 + U_B + Z_1 + Z_2)),$$
$$Y \sim \text{Bernoulli}(\text{logit}^{-1}(U_B + S_2 + X_2 + Z_2)).$$

After sampling $U_A, U_B, Z_1, Z_2$ we compute

$$P(Y, X_2, S_2, X_1, S_1 \mid Z_2, Z_1) = P(Y \mid U_B, S_2, X_2, Z_2)\, P(X_2 \mid S_2, Z_1, Z_2, U_B)\, P(S_2 \mid Z_2, U_B, S_1, X_1)\, P(X_1 \mid S_1, U_B)\, P(S_1 \mid U_B),$$

which gives the input distribution.

Example B. The causal graph for this example is in Figure 4b and the query is $P(Y = 1 \mid do(A = 1, B = 1, C = 1, F = 1))$. The data generating process used to generate the input information is as follows:

$$C \sim \text{Bernoulli}(\text{logit}^{-1}(U_A)), \quad F \sim \text{Bernoulli}(\text{logit}^{-1}(U_A)),$$
$$A \sim \text{Bernoulli}(\text{logit}^{-1}(C + F + U_B)), \quad B \sim \text{Bernoulli}(\text{logit}^{-1}(C + F + U_B)),$$
$$D \sim \text{Bernoulli}(\text{logit}^{-1}(A + U_B)), \quad E \sim \text{Bernoulli}(\text{logit}^{-1}(A + B + U_B)),$$
$$Y \sim \text{Bernoulli}(\text{logit}^{-1}(U_B + D + C + E)).$$

After sampling $U_A, U_B, C, F$ we compute

$$P(A, B, D, E, Y \mid C, F) = P(Y \mid U_B, D, C, E)\, P(E \mid A, B, U_B)\, P(D \mid A, U_B)\, P(B \mid C, F, U_B)\, P(A \mid C, F, U_B),$$

which gives the input distribution.

Example C. The causal graph for this example is in Figure 4c and the query is $P(Y = 1 \mid do(M_1 = 1, W_1 = 1, W_3 = 1))$. The data generating process used to generate the input information is as follows:

$$W_1 \sim \text{Bernoulli}(\text{logit}^{-1}(U_A)), \quad W_3 \sim \text{Bernoulli}(\text{logit}^{-1}(U_A)),$$
$$T \sim \text{Bernoulli}(\text{logit}^{-1}(W_1 + W_3 + U_B)), \quad M_1 \sim \text{Bernoulli}(\text{logit}^{-1}(T + W_3 + U_B)),$$
$$M_2 \sim \text{Bernoulli}(\text{logit}^{-1}(M_1 + W_1 + U_B)), \quad Y \sim \text{Bernoulli}(\text{logit}^{-1}(M_2 + U_B)),$$
$$X_3 \sim \text{Bernoulli}(\text{logit}^{-1}(U_B + Y + M_1 + T)).$$

After sampling $U_A, U_B, W_1, W_3$ we compute

$$P(T, M_1, M_2, X_3, Y \mid W_1, W_3) = P(X_3 \mid U_B, Y, M_1, T)\, P(Y \mid M_2, U_B)\, P(M_2 \mid M_1, W_1, U_B)\, P(M_1 \mid T, W_3, U_B)\, P(T \mid W_1, W_3, U_B),$$

which gives the input distribution.

Example D. The causal graph for this example is in Figure 4d and the query is $P(Y = 1 \mid do(D = 1, E = 1, F = 1))$. The data generating process used to generate the input information is as follows:

$$F \sim \text{Bernoulli}(\text{logit}^{-1}(U_A)), \quad E \sim \text{Bernoulli}(\text{logit}^{-1}(U_A)),$$
$$A \sim \text{Bernoulli}(\text{logit}^{-1}(F + E + U_B)), \quad B \sim \text{Bernoulli}(\text{logit}^{-1}(E + F + A + U_B)),$$
$$C \sim \text{Bernoulli}(\text{logit}^{-1}(B + E + F + A + U_B)), \quad D \sim \text{Bernoulli}(\text{logit}^{-1}(B + E + F + A + C + U_B)),$$
$$Y \sim \text{Bernoulli}(\text{logit}^{-1}(E + D + U_B)), \quad G \sim \text{Bernoulli}(\text{logit}^{-1}(A + B + C + D + Y + E + F + U_B)).$$

After sampling $U_A, U_B, E, F$ we compute

$$P(G, Y, D, C, B, A \mid E, F) = P(G \mid U_B, A, B, C, D, Y, E, F)\, P(Y \mid E, D, U_B)\, P(D \mid B, E, F, A, C, U_B)\, P(C \mid B, E, F, A, U_B)\, P(B \mid E, F, A, U_B)\, P(A \mid E, F, U_B),$$

which gives the input distribution.

Example E. The causal graph for this example is in Figure 4e and the query is $P(Y = 1 \mid do(A = 1, B = 1, C = 1, F = 1))$. The data generating process used to generate the input information is as follows:

$$A \sim \text{Bernoulli}(\text{logit}^{-1}(U_A)), \quad B \sim \text{Bernoulli}(\text{logit}^{-1}(U_A)),$$
$$C \sim \text{Bernoulli}(\text{logit}^{-1}(A + B + U_B)), \quad D \sim \text{Bernoulli}(\text{logit}^{-1}(A + C + B + U_B)),$$
$$E \sim \text{Bernoulli}(\text{logit}^{-1}(A + B + U_B)), \quad G \sim \text{Bernoulli}(\text{logit}^{-1}(U_B + A + B + C + D)),$$
$$F \sim \text{Bernoulli}(\text{logit}^{-1}(A + C + B + D + E + G + U_B)), \quad Y \sim \text{Bernoulli}(\text{logit}^{-1}(U_B + A + E + B + F)).$$

After sampling $U_A, U_B, A, B$ we compute

$$P(C, D, E, F, G, Y \mid A, B) = P(C \mid A, B, U_B)\, P(D \mid A, C, B, U_B)\, P(E \mid A, B, U_B)\, P(F \mid A, C, B, D, E, G, U_B)\, P(G \mid A, B, C, D, U_B)\, P(Y \mid U_B, A, E, B, F),$$

which gives the input distribution.
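For concreteness, a sketch of Example A's input-distribution construction, assuming the confounders $U_A, U_B$ are drawn from a standard normal (their distribution is not specified above). The exact conditional table is the product of the factors listed, and $U_A$ drops out once we condition on $Z_1, Z_2$:

```python
import itertools
import numpy as np

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))      # logit^{-1}

def bern(p, v):
    """P(V = v) for V ~ Bernoulli(p)."""
    return p if v == 1 else 1.0 - p

rng = np.random.default_rng(0)
u_A, u_B = rng.normal(), rng.normal()             # assumed confounder draws
# u_A only enters the marginals of Z1, Z2, which are conditioned on below.

# Exact table P(S1, X1, S2, X2, Y | Z1, Z2) for this u_B draw,
# following the factorization in the text for Example A.
table = {}
for z1, z2, s1, x1, s2, x2, y in itertools.product((0, 1), repeat=7):
    table[(s1, x1, s2, x2, y, z1, z2)] = (
        bern(sigmoid(u_B), s1)
        * bern(sigmoid(u_B + s1), x1)
        * bern(sigmoid(s1 + u_B + x1 + z2), s2)
        * bern(sigmoid(s2 + u_B + z1 + z2), x2)
        * bern(sigmoid(u_B + s2 + x2 + z2), y)
    )
```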
## B. Proof of Results

Lemma B.1 (Complete Consistency of Probability). For graphs satisfying Assumption 3.1 and queries satisfying Assumption 3.2, the conditional probability $p_{v_B.v_A}$ is completely consistent with $Q$ if, and only if, $v_{I_C \cap A} = q_{I_C \cap A}$, $v_{I_C \cap B} = q_{I_C \cap B}$, and $v_O = q_O$.

Proof. Suppose $p_{v_B.v_A}$ satisfies $v_{I_C \cap A} = q_{I_C \cap A}$, $v_{I_C \cap B} = q_{I_C \cap B}$ and $v_O = q_O$. Then, every $r \in R_{v_B.v_A}$ maps $V_{I_C} = q_{I_C}$ to $V_O = q_O$, and thus, $r \in R_Q$, i.e. $R_{v_B.v_A} \subseteq R_Q$. Hence, $p_{v_B.v_A}$ is completely consistent with $Q$. To see that this is the only form a completely consistent probability can take, consider two cases:

- Suppose either $v_{I_C \cap A} \neq q_{I_C \cap A}$ or $v_{I_C \cap B} \neq q_{I_C \cap B}$, i.e. the value of $V_{I_C}$ in $p_{v_B.v_A}$ is not $q_{I_C}$. Since $R$ is exhaustive, there exists $r$ such that: $r$ maps $V_A = v_A$ to $V_B = v_B$, i.e. $r \in R_{v_B.v_A}$; and $r$ maps $V_{I_C} = q_{I_C}$ to $V_O \neq q_O$, i.e. $r \notin R_Q$. Hence, there exists $r$ such that $r \in R_{v_B.v_A}$ but $r \notin R_Q$, i.e. $p_{v_B.v_A}$ is not completely consistent.

- Suppose $v_O \neq q_O$. Then every $r \in R_{v_B.v_A}$ maps $V_{I_C} = q_{I_C}$ to $V_O = v_O \neq q_O$, and therefore, $r \notin R_Q$. Hence $p_{v_B.v_A}$ is not completely consistent.

Theorem 3.6 (Complete Consistency of Hyperarc). For graphs satisfying Assumption 3.1 and queries satisfying Assumption 3.2, a hyperarc $h$ is completely consistent with $Q$ if, and only if, there exists $v \in \{0, 1\}^{|N|}$ such that $h(v_A) = v_B$, and the conditional probability $p_{v_B.v_A}$ is completely consistent with $Q$.

Proof. Suppose there exists $v \in \{0, 1\}^{|N|}$ such that $h(v_A) = v_B$, and $p_{v_B.v_A}$ is completely consistent with $Q$. Since $R_h \subseteq R_{v_B.v_A}$, and $p_{v_B.v_A}$ is completely consistent with $Q$, $R_h \subseteq R_Q$, i.e. $h$ is completely consistent with $Q$.

Suppose $h$ is completely consistent with $Q$, but there does not exist $v \in \{0, 1\}^{|N|}$ which satisfies Theorem 3.6. That is, for all $v \in \{0, 1\}^{|N|}$ such that $h(v_A) = v_B$ and $v_{I_C \cap A} = q_{I_C \cap A}$, we have either $v_{I_C \cap B} \neq q_{I_C \cap B}$ or $v_O \neq q_O$. Since $R$ is exhaustive, there exists $r \in R$ such that:

(i) $r$ maps $V_A = v_A$ to $V_B = v_B$ for all $(v_A, v_B)$ such that $h(v_A) = v_B$, i.e. $r \in R_h$;

(ii) $r$ maps $V_{I_C \cap A} = q_{I_C \cap A}$, $V_{I_C \cap B} = q_{I_C \cap B}$ to $V_O \neq q_O$, i.e. $r \notin R_Q$.

Hence $h$ is not completely consistent with $Q$, a contradiction.

Theorem 3.8 (Strict Inconsistency of Probability). For graphs satisfying Assumption 3.1 and queries satisfying Assumption 3.2, the conditional probability $p_{v_B.v_A}$ is strictly inconsistent with the query $Q$ if, and only if, $v_{I_C \cap A} = q_{I_C \cap A}$, $v_{I_C \cap B} = q_{I_C \cap B}$, and $v_O \neq q_O$.

Proof. Suppose $p_{v_B.v_A}$ satisfies $v_{I_C \cap A} = q_{I_C \cap A}$, $v_{I_C \cap B} = q_{I_C \cap B}$ and $v_O \neq q_O$. Then, any $r \in R_{v_B.v_A}$ maps $V_{I_C} = q_{I_C}$ to $V_O = v_O \neq q_O$, and thus, $r \notin R_Q$, i.e. $R_{v_B.v_A} \cap R_Q = \emptyset$. Hence, $p_{v_B.v_A}$ is strictly inconsistent with $Q$. To see that this is the only form a strictly inconsistent probability can take, consider two cases:

- Suppose either $v_{I_C \cap A} \neq q_{I_C \cap A}$ or $v_{I_C \cap B} \neq q_{I_C \cap B}$, i.e. the value of $V_{I_C}$ in $p_{v_B.v_A}$ is not $q_{I_C}$. Since $R$ is exhaustive, there exists $r$ such that: $r$ maps $V_A = v_A$ to $V_B = v_B$, i.e. $r \in R_{v_B.v_A}$; and $r$ maps $V_{I_C} = q_{I_C}$ to $V_O = q_O$, i.e. $r \in R_Q$. Hence, there exists $r$ such that $r \in R_{v_B.v_A}$ and $r \in R_Q$, i.e. $p_{v_B.v_A}$ is not strictly inconsistent.

- Suppose $v_O = q_O$. Then every $r \in R_{v_B.v_A}$ maps $V_{I_C} = q_{I_C}$ to $V_O = q_O$, and therefore, $r \in R_Q$. Hence $p_{v_B.v_A}$ is not strictly inconsistent.

Theorem 3.9 (Strict Inconsistency of Hyperarc). For graphs satisfying Assumption 3.1 and queries satisfying Assumption 3.2, the hyperarc $h$ is strictly inconsistent with $Q$ if, and only if, there exists $v \in \{0, 1\}^{|N|}$ such that $h(v_A) = v_B$ and the probability $p_{v_B.v_A}$ is strictly inconsistent with $Q$.
Proof. Suppose there exists $v \in \{0, 1\}^{|N|}$ such that $h(v_A) = v_B$, and $p_{v_B.v_A}$ is strictly inconsistent with $Q$. Since $R_h \subseteq R_{v_B.v_A}$, and $p_{v_B.v_A}$ is strictly inconsistent with $Q$, $R_h \cap R_Q = \emptyset$, i.e. $h$ is strictly inconsistent with $Q$.

Suppose $h$ is strictly inconsistent with $Q$, but there does not exist $v \in \{0, 1\}^{|N|}$ which satisfies Theorem 3.9. That is, for all $v \in \{0, 1\}^{|N|}$ such that $h(v_A) = v_B$ and $v_{I_C \cap A} = q_{I_C \cap A}$, we have either $v_{I_C \cap B} \neq q_{I_C \cap B}$ or $v_O = q_O$. Since $R$ is exhaustive, there exists $r \in R$ such that:

(i) $r$ maps $V_A = v_A$ to $V_B = v_B$ for all $(v_A, v_B)$ such that $h(v_A) = v_B$, i.e. $r \in R_h$;

(ii) $r$ maps $V_{I_C \cap A} = q_{I_C \cap A}$, $V_{I_C \cap B} = q_{I_C \cap B}$ to $V_O = q_O$, i.e. $r \in R_Q$.

Hence $h$ is not strictly inconsistent with $Q$, a contradiction.

## C. Empirical CDF for Error of Greedy Heuristic

Figure 5: Empirical Distribution Functions of Errors for Examples. (a) Empirical CDF of the relative error of $\alpha_U$ for Example B; (b) Empirical CDF of the relative error of $\alpha_L$ for Example B; (c) Empirical CDF of the relative error of $\alpha_U$ for Example C.