# testing_unfaithful_gaussian_graphical_models__9b8c5b95.pdf

Testing Unfaithful Gaussian Graphical Models

De Wen Soh Department of Electrical Engineering Yale University 17 Hillhouse Ave, New Haven, CT 06511 dewen.soh@yale.edu

Sekhar Tatikonda Department of Electrical Engineering Yale University 17 Hillhouse Ave, New Haven, CT 06511 sekhar.tatikonda@yale.edu

The global Markov property for Gaussian graphical models ensures graph separation implies conditional independence. Speciﬁcally if a node set S graph separates nodes u and v then Xu is conditionally independent of Xv given XS. The opposite direction need not be true, that is, Xu Xv | XS need not imply S is a node separator of u and v. When it does, the relation Xu Xv | XS is called faithful. In this paper we provide a characterization of faithful relations and then provide an algorithm to test faithfulness based only on knowledge of other conditional relations of the form Xi Xj | XS.

1 Introduction

Graphical models [1, 2, 3] are a popular and important means of representing certain conditional independence relations between random variables. In a Gaussian graphical model, each variable is associated with a node in a graph, and any two nodes are connected by an undirected edge if and only if their two corresponding variables are independent conditioned on the rest of the variables. An edge between two nodes therefore corresponds directly to the non-zero entries of the precision matrix Ω= Σ 1, where Σ is the covariance matrix of the multivariate Gaussian distribution in question. With the graphical model deﬁned in this way, the Gaussian distribution satisﬁes the global Markov property: for any pair of nodes i and j, if all paths between the two pass through a set of nodes S, then the variables associated with i and j are conditionally independent given the variables associated with S.

The converse of the global Markov property does not always hold. When it does hold for a conditional independence relation, that relation is called faithful. If it holds for all relations in a model, that model is faithful. Faithfulness is important in structural estimation of graphical models, that is, identifying the zeros of Ω. It can be challenging to simply invert Σ. With faithfulness, to determine an edge between nodes i and j, one could run through all possible separator sets S and test for conditional independence. If S is small, the computation becomes more accurate. In the work of [4, 5, 6, 7], different assumptions are used to bound S to this end.

The main problem of faithfulness in graphical models is one of identiﬁability. Can we distinguish between a faithful graphical model and an unfaithful one? The idea of faithfulness was ﬁrst explored for conditional independence relations that were satisﬁed in a family of graphs, using the notion of θ-Markov perfectness [8, 9]. For Gaussian graphical models with a tree topology the the distribution has been shown to be faithful [10, 11]. In directed graphical models, the class of unfaithful distributions has been studied in [12, 13]. In [14, 15], a notion of strong-faithfulness as a means of relaxing the conditions of faithfulness is deﬁned.

In this paper, we study the identiﬁability of a conditional independence relation. In [6], the authors restrict their study of Gaussians to walk-summable ones. In [7], the authors restrict their class of distributions to loosely connected Markov random ﬁelds. These restrictions are such that the

local conditional independence relations imply something about the global structure of the graphical model. In our discussion, we assume no such restrictions. We provide a testable condition for the faithfulness of a conditional independence relation in a Gaussian undirected graphical model. Checking this condition requires only using other conditional independence relations in the graph. We can think of these conditional independence relations as local patches of the covariance matrix Σ. To check if a local patch reﬂects the global graph (that is, a local path is faithful) we have to make use of other local patches. Our algorithm is the ﬁrst algorithm, to the best of our knowledge, that is able to distinguish between faithful and unfaithful conditional independence relations without any restrictions on the topology or assumptions on spatial mixing of the Gaussian graphical model.

This paper is structured as follows: In Section 2, we discuss some preliminaries. In Section 3, we state our main theorem and proofs, as well as key lemmas used in the proofs. In Section 4, we lay out an algorithm that detects unfaithful conditional independence relations in Gaussian graphical models using only local patches of the covariance matrix. We also describe a graph learning algorithm for unfaithful graphical models. In Section 5, we discuss possible future directions of research.

2 Preliminaries

We ﬁrst deﬁne some linear algebra and graph notation. For a matrix M, let M T denote its transpose and let |M| denote its determinant. If I is a subset of its row indices and J a subset of its column indices, then we deﬁne the submatrix M IJ as the |I| |J| matrix with elements with both row and column indices from I and J respectively. If I = J, we use the notation M I for convenience. Let M( i, j) be the submatrix of M with the i-th row and j-th column deleted. Let M( I, J) be the submatrix with rows with indices from I and columns with indices from J removed. In the same way, for a vector v, we deﬁne v I to be the subvector of v with indices from I. Similarly, we deﬁne v( I) to be the subvector of v with indices not from I. For two vectors v and w, we denote the usual dot product by v w.

Let G = (W, E) be an undirected graph, where W = {1, . . . , n} is the set of nodes and E is the set of edges, namely, a subset of the set of all unordered pairs {(u, v) | u, v W}. In our paper we are dealing with graphs that have no self-loops and no multiple edges between the same pair of nodes. For I W, we denote the induced subgraph on nodes I by GI. For any two distinct nodes u and v, we say that the node set S W \ {u, v} is a node separator of u and v if all paths from u to v must pass through some node in S.

Let X = (X1, . . . , Xn) be a multivariate Gaussian distribution with mean µ and covariance matrix Σ. Let Ω= Σ 1 be the precision or concentration matrix of the graph. For any set S W, we deﬁne XS = {Xi | i S}. We note here that Σuv = 0 if and only if Xu is independent of Xv, which we denote by Xu Xv. If Xu is independent of Xv conditioned on some random variable Z, we denote this independence relation by Xu Xv | Z. Note that Ωuv = 0 if and only if Xu Xv | XW\{u,v}.

For any set S W, the conditional distribution of XW\S given XS = x S follows a multivariate Gaussian distribution with conditional mean µW\S Σ(W\S)SΣ 1 S (x S µS) and conditional covariance matrix ΣW\S Σ(W\S)SΣ 1 S ΣS(W\S). For distinct nodes u, v W and any set S W \ {u, v}, the following property easily follows.

Proposition 1 Xu Xv | XS if and only if Σuv = Σu SΣ 1 S ΣSv.

The concentration graph GΣ = (W, E) of a multivariate Gaussian distribution X is deﬁned as follows: We have node set W = {1, . . . , n}, with random variable Xu associated with node u, and edge set E where unordered pair (u, v) is in E if and only if Ωuv = 0. The multivariate Gaussian distribution, along with its concentration graph, is also known as a Gaussian graphical model. Any Gaussian graphical model satisﬁes the global Markov property, that is, if S is a node separator of nodes u and v in GΣ, then Xu Xv | XS. The converse is not necessarily true, and therefore, this motivates us to deﬁne faithfulness in a graphical model.

Deﬁnition 1 The conditional independence relation Xu Xv | XS is said to be faithful if S is a node separator of u and v in the concentration graph GΣ. Otherwise, it is unfaithful. A multivari-

Figure 1: Even though ΣS {u,v} is a submatrix of Σ, GΣS {u,v} need not be a subgraph of GΣ. Edge properties do not translate as well. That means the local patch ΣS {u,v} need not reﬂect the edge properties of the global graph structure of Σ.

ate Gaussian distribution is faithful if all its conditional independence relations are faithful. The distribution is unfaithful if it is not faithful.

Example 1 (Example of an unfaithful Gaussian distribution) Consider the multivariate Gaussian distribution X = (X1, X2, X3, X4) with zero mean and positive deﬁnite covariance matrix

3 2 1 2 2 4 2 1 1 2 7 1 2 1 1 6

By Proposition 1, we have X1 X3 | X2 since Σ13 = Σ12Σ 1 22 Σ23. However, the precision matrix Ω= Σ 1 has no zero entries, so the concentration graph is a complete graph. This means that node 2 is not a node separator of nodes 1 and 3. The independence relation X1 X3 | X2 is thus not faithful and the distribution X is not faithful as well.

We can think of the submatrix ΣS {u,v} as a local patch of the covariance matrix Σ. When Xu Xv | XS, nodes u and v are not connected by an edge in the concentration graph of the local patch ΣS {u,v}, that is, we have (Σ 1 S {u,v})uv = 0. This does not imply that u and v are not connected in the concentration graph GΣ. If Xu Xv | XS is faithful, then the implication follows. If Xu Xv | XS is unfaithful, then u and v may be connected in GΣ (See Figure 1).

Faithfulness is important in structural estimation, especially in high-dimensional settings. If we assume faithfulness, then ﬁnding a node set S such that Xu Xv | XS would imply that there is no edge between u and v in the concentration graph. When we have access only to the sample covariance instead of the population covariance matrix, if the size of S is small compared to n, the error of computing Xu Xv | XS is much less than the error of inverting the entire covariance matrix. This method of searching through all possible node separator sets of a certain size is employed in [6, 7]. As mention before, these authors impose other restrictions on their models to overcome the problem of unfaithfulness. We do not place any restriction on the Gaussian models. However, we do not provide probabilistic bounds when dealing with samples, which they do.

3 Main Result

In this section, we will state our main theoretical result. This result is the backbone for our algorithm that differentiates a faithful conditional independence relation from an unfaithful one. Our main goal is to decide if a conditional independence relation Xu Xv | XS is faithful or not. For convenience, we will denote GΣ simply by G = (W, E) for the rest of this paper. Now let us suppose that it is faithful; S is a node separator for u and v in G. Then we should not be able to ﬁnd a path from u to v in the induced subgraph GW\S. The main idea therefore is to search for a path between u and v in GW\S. If this fails, then we know that the conditional independence relation is faithful.

By the global Markov property, for any two distinct nodes i, j W \ S, if Xi Xj | XS, then we know that there is a path between i and j in GW\S. Thus, if we ﬁnd some w W \(S {i, j}) such that Xu Xw | XS and Xv Xw | XS, then a path exists from u to w and another exists from v to w, so u and v are connected in GW\S. This would imply that Xu Xv | XS is unfaithful.

However, testing for paths this way does not necessarily rule out all possible paths in GW\S. The problem is that some paths may be obscured by other unfaithful conditional independence relations. There may be some w whereby Xu Xw | XS and Xv Xw | XS, but the latter relation is unfaithful. This path from u to v through w is thus not detected by these two independence relations.

We will show however, that if there is no path from u to v in GW\S, then we cannot ﬁnd a series of distinct nodes w1, . . . , wt W \(S {u, v}) for some natural number t > 0 such that Xu Xw1 | XS, Xw1 Xw2 | XS, . . ., Xwt 1 Xwt | XS, Xwk Xv | XS. This is to be expected because of the global Markov property. What is more surprising about our result is that the converse is true. If we cannot ﬁnd such nodes w1, . . . , wt, then u and v are not connected by a path in GW\S. This means that if there is a path from u to v in GW\S, even though it may be hidden by some unfaithful conditional independence relations, ultimately there are enough conditional dependence relations to reveal that u and v are connected by a path in GW\S. This gives us an equivalent condition for faithfulness that is in terms of conditional independence relations.

Not being able to ﬁnd a series of nodes w1, . . . , wt that form a string of conditional dependencies from u to v as described in the previous paragraph is equivalent to the following: we can ﬁnd a partition (U, V ) of W \ S with u U and v V such that for all i U and j V , we have Xi Xj | XS. Our main result uses the existence of this partition as a test for faithfulness.

Theorem 1 Let X = (X1, . . . , Xn) be a Gaussian distribution with mean zero, covariance matrix Σ and concentration matrix Ω. Let u, v be two distinct elements of W and S W \{i, j} such that Xu Xv | XS. Then Xu Xv | XS is faithful if and only if there exists a partition of W \ S into two disjoint sets U and V such that u U, v V , and Xi Xj | XS for any i U and j V .

Proof of Theorem 1 . One direction is easy. Suppose Xu Xv | XS is faithful and S separates u and v in G. Let U be the set of all nodes reachable from u in GW\S including u. Let V = {W \ S U}. Then v V since S separates u and v in G. Also, for any i U and j V , S separates i and j in G, and by the global Markov property, Xi Xj | XS.

Next, we prove the opposite direction. Suppose that there exists a partition of W \ S into two sets U and V such that u U, v V , and Xi Xj | XS. for any i U and j V . Our goal is to show that S separates u and v in the concentration graph G of X. Let ΩW\S = Ω where the latter is the submatrix of the precision matrix Ω. Let the h-th column vector of Ω be ω(h), for h = 1, . . . , |W \ S|.

Step 1: We ﬁrst solve the trivial case where |U| = |V | = 1. If |U| = |V | = 1, then S = W \ {u, v}, and trivially, Xu Xv | XW\{u,v} implies S separates u and v, and we are done. Thus, we assume for the rest of the proof that U and V cannot both be size one.

Step 2: We deal with a second trivial case in our proof, which is the case where ω(i)( i) is identically zero for any i U. In the case where i = u, we have Ωuj = 0 for all j W \ (S {u}). This implies that u is an isolated node in GW\S, and so trivially, S must separate u and v, and we are done. In the case where i = u, we can manipulate the sets U and V so that ω(i)( i) is not identically zero for any i U, i = u. If there is some i U, i = u, such that X i Xh | XS for all h U, h = i , then we can simply move i from U into V to form a new partition (U , V ) of W \ S. This new partition still satisﬁes u U , v V , and Xi Xj | XS for all i U and j V . We can therefore shift nodes one by one over from U to V until either |U| = 1, or for any i U, i = u, there exists an h U such that Xi Xh | XS. By the global Markov property, this assumption implies that every node i U, i = u is connected by a path to some node in U, which means it must connected to some node in W \ (S {i}) by an edge. Thus, for all i U, i = u, the vector ω(i)( i) is non-zero.

Step 3: We can express the conditional independence relations in terms of elements in the precision matrix Ω, since the topology of G can be read off the non-zero entries of Ω. The proof of the following Lemma 1 uses the matrix block inversion formula and we omit the proof due to space.

Lemma 1 Xi Xj | XS if and only if |Ω ( i, j)| = 0.

From Lemma 1, observe that the conditional independence relations Xi Xj | XS are all statements about the cofactors of the matrix Ω . It follows immediately from Lemma 1 that the vector

sets {ω(h)( i) : h W \ S, h = j} are linearly dependent for all i U and j V . Each of these vector sets consists of the i-th entry truncated column vectors of Ω , with the j-th column vector excluded. Assume that the matrix Ω is partitioned as follows,

Ω = ΩUU ΩUV ΩV U ΩV V

The strategy of this proof is to use these linear dependencies to show that the submatrix ΩV U has to be zero. This would imply that for any node in U, it is not connected to any node in V by an edge. Therefore, S is a node separator of u and v in G, which is our goal.

Step 4: Let us ﬁx i U. Consider the vector sets of the form {ω(h)( i) : h W \ S, h = j}, j V . There are |V | such sets. The intersection of these sets is the vector set {ω(h)( i) : h U}. We want to use the |V | linearly dependent vector sets to say something about the linear dependency of {ω(h)( i) : h U}. With that in mind, we have the following lemmas.

Lemma 2 The vector set {ω(h)( i) : h U} is linearly dependent for any i U.

Step 5: Our ﬁnal step is to show that these linear dependencies imply that ΩUV = 0. We now have |U| vector sets {ω(h)( i) : h U} that are linearly dependent. These sets are truncated versions of the vector set {ω(h) : h U}, and they are speciﬁcally truncated by taking out entries only in U and not in V . The set {ω(h) : h U} must be linearly independent since Ω is invertible. Observe that the entries of ΩV U are contained in {ω(h)( i) : h U} for all i U. We can now use these vector sets to say something about the entries of ΩV U.

Lemma 3 The vector components ω(i) j = Ωij are zero for all i U and j V .

This implies that any node in U is not connected to any node in V by an edge. Therefore, S separates u and v in G and the relation Xu Xv | XS is faithful.

4 Algorithm for Testing Unfaithfulness

In this section, we will describe a novel algorithm for testing faithfulness of a conditional independence relation Xu Xv | XS. The algorithm tests the necessary and sufﬁcient conditions for faithfulness, namely, that we can ﬁnd a partition (U, V ) of W \ S such that u U, v V , and Xi Xj | XS for all i U and j V .

Algorithm 1 (Testing Faithfulness) Input covariance matrix Σ.

1. Deﬁne new graph G = { W, E}, where W = W \ S and E = {(i, j) : i, j W \ S, Xi Xj | XS, i = j}.

2. Generate set U to be the set of all nodes in W that are connected to u by a path in G, including u. (A breadth-ﬁrst search could be used.)

3. If v U, there exists a path from u to v in G, output Xu Xv | XS as unfaithful.

4. If v / U, let V = W \ U. Output Xu Xv | XS as faithful.

If we consider each test of whether two nodes are conditionally independent given XS as one step, the running time of the algorithm is the that of the algorithm used to determine set U. If a breadthﬁrst search is used, the running time is O(|W \ S|2|).

Theorem 2 Suppose Xu Xv | XS. If S is a node separator of u and v in the concentration graph, then Algorithm 1 will classify Xu Xv | XS as faithful. Otherwise, Algorithm 1 will classify Xu Xv | XS as unfaithful.

Proof. If Algorithm 1 determines that Xu Xv | XS is faithful, that means that it has found a partition (U, V ) of W \ S such that u U, v V , and Xi Xj | XS for any i U and

Figure 2: The concentration graph of the distribution in Example 4.

j V . By Theorem 1, this implies that Xu Xv | XS is faithful and so Algorithm 1 is correct. If Algorithm 1 decides that Xu Xv | XS is unfaithful, it does so by ﬁnding a series of nodes wℓ1, . . . , wℓt W \ (S {u, v}) for some natural number t > 0 such that Xu Xwℓ1 | XS, Xwℓ1 Xwℓ2 | XS, . . ., Xwℓt 1 Xwℓt | XS, Xwk Xv | XS, where ℓ1, . . . , ℓt are t distinct indices from R. By the global Markov property, this means that u is connected to v by a path in G, so this implies that Xu Xv | XS is unfaithful and Algorithm 1 is correct.

Example 2 (Testing an Unfaithful Distribution (1)) Let us take a look again at the 4-dimensional Gaussian distribution in Example 1. Suppose we want to test if X1 X3 | X2 is faithful or not. From its covariance matrix, we have Σ14 Σ12Σ 1 2 Σ24 = 2 2 1/4 = 3/2 = 0, so this implies that X1 X4 | X2. Similarly, X3 X4 | X2. So there exists a path from X1 to X3 in G{1,3,4} (it is trivially the edge (1, 3)), so the relation X1 X3 | X2 is unfaithful.

Example 3 (Testing an Unfaithful Distribution (2)) Consider a 6-dimensional Gaussian distribution X = (X1, . . . , X6) that has the covariance matrix

7 1 2 2 3 4 1 8 2 1 2.25 3 2 2 10 4 3 8 2 1 4 9 1 6 3 2.25 3 1 11 9 4 3 8 6 9 12

We want to test if the relation X1 X2 | X6 is faithful or unfaithful. Working out the necessary conditional independence relations to obtain G with S = {6}, we observed that (1, 3), (3, 5), (5, 4), (4, 2) E This means that 2 is reachable from 1 in G, so the relation is unfaithful. In fact, the concentration graph is the complete graph K6, and 6 is not a node separator of 1 and 2.

Example 4 (Testing a Faithful Distribution) We consider a 6-dimensional Gaussian distribution X = (X1, . . . , X6) that has a covariance matrix which is similar to the distribution in Example 3,

7 1 2 2 3 4 1 8 2 1 2.25 3 2 2 10 4 6 8 2 1 4 9 1 6 3 2.25 6 1 11 9 4 3 8 6 9 12

Observe that only Σ35 is changed. We again test the relation X1 X2 | X6. Running the algorithm produces a viable partition with U = {1, 3} and V = {2, 4, 5}. This agrees with the concentration graph, as shown in Figure 2.

We include now an algorithm that learns the topology of a class of (possibly) unfaithful Gaussian graphical models using local patches. Let us ﬁx a natural number K < n 2. We consider graphical models that satisfy the following assumption: for any nodes i and j that are not connected by an edge in G, there exists a vertex set S with |S| K such that S is a vertex separator of i and j. Certain graphs have this property, including graphs with bounded degree and some random graphs with high probability, like the Erd os-Renyi graph. The following algorithm learns the edges of a graphical model that satisﬁes the above assumptions.

Algorithm 2 (Edge Learning) Input covariance matrix Σ. For each node pair (i, j),

1. Let F = {S W \ {i, j} : |S| = K, Xi Xj | XS, and it is faithful}.

2. If F = φ, output (i, j) / E. If F = φ, output (i, j) E.

3. Output E.

Again, considering a computation of a conditional independence relation as one step, the running time of the algorithm is O(n K+4). This comes from exhaustively checking through all n 2 K possible separation sets S for each of the n 2 (i, j) pairs. Each time there is a conditional independence relation, we have to check for faithfulness using Algorithm 1, and the running time for that is O(n2). The novelty of the algorithm is in its ability to learn graphical models that are unfaithful.

Theorem 3 Algorithm 2 recovers the concentration graph G.

Proof. If F = φ, F is non-empty so there exists an S such that Xi Xj | XS is faithful. Therefore, S separates i and j in G and (i, j) / E. If F = φ, then for any S W, |S| K, we have either Xi Xj | XS or Xi Xj | XS but it is unfaithful. In both cases, S does not separate i and j in G, for any S W, |S| K. By the assumption on the graphical model, (i, j) must be in E. This shows that Algorithm 2 will correctly output the edges of G.

5 Conclusion

We have presented an equivalence condition for faithfulness in Gaussian graphical models and an algorithm to test whether a conditional independence relation is faithful or not. Gaussian distributions are special because its conditional independence relations depend on its covariance matrix, whose inverse, the precision matrix, provides us with a graph structure. The question of faithfulness in other Markov random ﬁelds, like Ising models, is an area of study that has much to be explored. The same questions can be asked, such as when unfaithful conditional independence relations occur, and whether they can be identiﬁed. In the future, we plan to extend some of these results to other Markov random ﬁelds. Determining statistical guarantees is another important direction to explore.

6.1 Proof of Lemma 2

Case 1: |V | = 1. In this case, |U| > 1 since |U| and |V | cannot both be one. the vector set {ω(h)( i) : h W \ S, h = j} is the vector set {ω(h)( i) : h U}.

Case 2: |V | > 1. Let us ﬁx i U. Note that ω(i)( j) = 0 for all j W \ (S {i}), since the diagonal entries of a positive deﬁnite matrix are non-zero, that is, ω(i) i = 0. Also, ω(i)( i) = 0 for all i U as well by Step 2 of the proof of Theorem 1. As such, the linear dependency of {ω(h)( i) : h W \ S, h = j} for any i U and j V implies that there exists scalars c(i,j) 1 , . . ., c(i,j) j 1 , c(i,j) j+1 , . . ., c(i,j) |W\S| such that X

1 h |W\S|,h =j c(i,j) h ω(h)( i) = 0. (5)

If c(i,j) i = 0, the vector set {ω(h)( i) : 1 h |W \ S|, h = u, j} is linearly dependent. This implies that the principal submatrix Ω ( i, i) has zero determinant, which contradicts Ω being positive deﬁnite. Thus, we have c(i,j) i = 0 for all i U and j V . For each i U and j V , this allows us to manipulate (5) such that w(i)( i) is expressed in terms of the other vectors in (5).

More precisely, let c(i,j) = [c(i,j) i ] 1(c(i,j) 1 , . . . , c(i,j) i 1 , c(i,j) i+1 , . . . , c(i,j) j 1 , c(i,j) j+1 , . . . , c(i,j) |W\S|), for i

U and j V . Note that Ω ( j, {i, j}) has the form [ω(1)( i), . . ., ω(i 1)( i), ω(i+1)( i), . . ., ω(j 1)( i), ω(j+1)( i), . . ., ω(|W\S|)( i)], where the vectors in the notation described above are column vectors. From (5), for any distinct j1, j2 V , we can generate equations

ω(i)( i) = Ω ( j1, {i.j1}) c(i,j1) = Ω ( j2, {i, j2}) c(i,j2), (6)

or effectively, Ω ( j1, {i.j1}) c(i,j1) Ω ( j2, {i, j2}) c(i,j2) = 0. (7)

This is a linear equation in terms of the column vectors {ω(h)( i) : h = i, h W}. These vectors must be linear independent, otherwise |Ω ( i, i)| = 0. Therefore, the coefﬁcient of each of the vectors must be zero. Speciﬁcally, the coefﬁcient of ω(j2)( i) in 7 is c(i,j1) j2 /c(i,j1) i is zero, which

implies that c(i,j1) j2 is zero, as required. Similarly, c(i,j2) j1 is zero as well. Since this holds for any

j1, j2 V , this implies that for any j V , c(i,j) h = 0 for all h V, h = j.

There are now two cases to consider. The ﬁrst is where |U| = 1. Here, i = u. Then, by (5), c(u,j) h = 0 for all distinct j, h V implies that ωu( u) = 0, which is a contradiction. Therefore |U| = 1, so |U| must be greater than 1. We then substitute c(i,j) h = 0, for all distinct j, h V , into (5) to deduce that {ω(h)( i) : h U} is indeed linearly dependent for any i U.

6.2 Proof of Lemma 3

Let |U| = k > 1 We arrange the indices of the column vectors of Ω so that U = {1, . . . , k}. For each i U, since {ω(h)( i) : h U} is linearly dependent and {ω(h) : h U} is linearly independent, there exists a non-zero vector d(i) = (d(i) 1 , . . . , d(i) k ) Rk such that Pk h=1 d(i) i ω(h)( i) = 0.

Let y(i) = (ω(1) i , . . . , ω(k) i ) Rk. Note that y(i) = ω(i) U , since Ω is symmetric, and so is a non-zero vector for all i = 1, . . . , k. Because ω(1), . . . , ω(k) are linearly independent, for each i = 1, . . . , k, we have d(i) y(h) = 0 for all h = i, h U and d(i) y(i) = 0.

We next show that vectors d(1), . . . , d(k) are linearly independent. Suppose that they are not. Then there exists some index i U and scalars a1, . . . , ai 1, ai+1, . . . , ak not all zeros, such that d(i) = P

1 j k,j =i ajd(j). We then have 0 = d(i) y(i) = P

1 h k,j =i ahd(j) y(i) = 0, a contradiction.

Therefore, d(1), . . . , d(k) are linearly independent.

For each j such that k+1 j |W\S| (that is, j V ), let us deﬁne yj = (ω(1) j , . . . , ω(k) j ). Let us ﬁx j. Observe that d(h) yj = 0 for all h = 1, . . . , k. Since d(1), . . . , d(k) are linearly independent, this implies that yj is the zero vector. Since this holds for all j such that k + 1 j |W \ S|,

therefore, ω(i) j = 0 for all 1 i k and k + 1 j |W \ S|.

[1] J. Pearl, Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.

[2] S. L. Lauritzen, Graphical models. New York: Oxford University Press, 1996.

[3] J. Whittaker, Graphical Models in Applied Multivariate Statistics. Wiley, 1990.

[4] N. Meinshausen and P. B uhlmann, High dimensional graphs and variable selection with the lasso, Annals of Statistics, vol. 34, no. 3, pp. 1436 1462, 2006.

[5] P. Ravikumar, M. J. Wainwright, G. Raskutti, and B. Yu, High dimensional covariance estimation by minimizing ℓ-1 penalized log-determinant divergence, Electronic Journal in Statistics, vol. 4, pp. 935 980, 2011.

[6] A. Anandkumar, V. Tan, F. Huang, and A. Willsky, High-dimensional gaussian graphical model selection: walk-summability and local separation criterion, J. Machine Learning Research, vol. 13, pp. 2293 2337, Aug 2012.

[7] R. Wu, R. Srikant, and J. Ni, Learning loosely connected markov random ﬁelds, Stochastic Systems, vol. 3, 2013.

[8] M. Frydenberg, Marginalisation and collapsibility in graphical interaction models, Annals of Statistics, vol. 18, pp. 790 805, 1990.

[9] G. Kauermann, On a dualization of graphical gaussian models, Scandinavian Journal of Statistics, vol. 23, no. 1, pp. 105 116, 1996.

[10] A. Becker, D. Geiger, and C. Meek, Perfect tree-like markovian distributions, Probability and Mathematical Statistics, vol. 25, no. 2, pp. 231 239, 2005.

[11] D. Malouche and B. Rajaratnam, Gaussian covariance faithful markov trees, Technical report, Department of Statistics, Stanford University, 2009.

[12] P. Spirites, C. Glymore, and R. Scheines, Causation, prediction and search. New York: Springer Verlag, 1993.

[13] C. Meek, Strong completeness and faithfulness in bayesian networks, in Proceedings of the eleventh international conference on uncertainty in artiﬁcial intelligence, 1995.

[14] C. Uhler, G. Raskutti, P. B uhlmann, and B. Yu, Geometry of faithfulness assumption in causal inference, Annals of Statistics, vol. 41, pp. 436 463, 2013.

[15] S. Lin, C. Uhler, B. Sturmfels, and P. B uhlmann, Hypersurfaces and their singularities in partial correlation testing, Preprint.