# faithfulness_of_probability_distributions_and_graphs__48e2bc0e.pdf

Journal of Machine Learning Research 18 (2017) 1-29 Submitted 5/17; Revised 11/17; Published 12/17

Faithfulness of Probability Distributions and Graphs

Kayvan Sadeghi k.sadeghi@statslab.cam.ac.uk Statistical Laboratory University of Cambridge Centre for Mathematical Sciences, Wilberforce Road Cambridge, CB3 0WB, United Kingdom

Editor: Peter Spirtes

A main question in graphical models and causal inference is whether, given a probability distribution P (which is usually an underlying distribution of data), there is a graph (or graphs) to which P is faithful. The main goal of this paper is to provide a theoretical answer to this problem. We work with general independence models, which contain probabilistic independence models as a special case. We exploit a generalization of ordering, called preordering, of the nodes of (mixed) graphs. This allows us to provide suﬃcient conditions for a given independence model to be Markov to a graph with the minimum possible number of edges, and more importantly, necessary and suﬃcient conditions for a given probability distribution to be faithful to a graph. We present our results for the general case of mixed graphs, but specialize the deﬁnitions and results to the better-known subclasses of undirected (concentration) and bidirected (covariance) graphs as well as directed acyclic graphs. Keywords: causal discovery, compositional graphoid, directed acyclic graph, faithfulness, graphical model selection, independence model, Markov property, mixed graph, structural learning

1. Introduction

Graphs have been used in graphical models in order to capture the conditional independence structure of probability distributions. Generally speaking, nodes of the graph correspond to random variables and missing edges to conditional independencies (Lauritzen, 1996). The connection between graphs and probability distributions is usually established in the literature by the concept of Markov property (Cliﬀord, 1990; Pearl, 1988; Studen y, 1989), which ensures that if there is a speciﬁc type of separation between node subsets A and B of the graph given the node subset C then random vectors XA and XB are conditionally independent given the random vector XC in the probability distribution. However, the ultimate connection between probability distributions and graphs requires the other direction to hold, namely for every conditional independence in the probability distribution to correspond to a separation in the graph. This connection has been called faithfulness of the probability distribution and the graph in Spirtes et al. (2000), and the graph has been called the perfect map of such a distribution in Pearl (1988). However, given a probability distribution P whether there is a graph (or graphs) to which P is faithful is an open problem, and consequently so is the problem of ﬁnding

c 2017 Kayvan Sadeghi.

License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v18/17-275.html.

these graphs. This problem can be raised for any type of graph existing in the literature of graphical models ranging from diﬀerent types of mixed graphs with three types of edges (Richardson and Spirtes, 2002; Wermuth, 2011; Sadeghi, 2016) and diﬀerent types of chain graphs (Lauritzen and Wermuth, 1989; Frydenberg, 1990; Andersson et al., 2001; Cox and Wermuth, 1993; Wermuth, 2011) to better-known classes of undirected (concentration) (Darroch et al., 1980) and bidirected (covariance) (Kauermann, 1996) graphs as well as directed acyclic graphs (Kiiveri et al., 1984; Pearl, 1988). Our goal is to provide an answer to this problem. A similar problem of given a graph whether there is a family of distributions faithful to it has been answered for very speciﬁc types of graphs and speciﬁc types of distributions; for example, for Gaussian distributions and undirected graphs in Lnˇeniˇcka and Mat uˇs (2007), DAGs in Pearl (1988); Geiger et al. (1990); Spirtes et al. (2000), ancestral graphs in Richardson and Spirtes (2002), and LWF chain graphs in Pe na (2011); and discrete distributions and DAGs in Geiger et al. (1990); Meek (1995) and LWF chain graphs in Pe na (2009).

The concept of faithfulness was originally deﬁned for the purpose of causal inference (Spirtes et al., 2000; Pearl, 1988), and the theory developed in this paper can be interpreted in causal language. A main approach to causal inference is based on graphical representations of causal structures, usually represented by causal graphs that are directed acyclic with nodes as random variables (that is a Bayesian network). Causal graphs are assumed to capture the true causal structure; see, for example, Neapolitan (2004). A main assumption made is called the causal faithfulness condition (CFC) stating that the true probability distribution is faithful to the true causal graph. Although the distribution being faithful to the true underlying causal graph is a much stronger assumption, it necessarily means that it must be faithful to some graph in order for one to be able to use graphical methods for causal inference. For an extensive discussion on the CFC, see Zhang and Spirtes (2008), and for related philosophical discussions, see, for example, Woodward (1998); Steel (2006). Although the results in this paper cover those of causal Bayesian networks, we have developed our theory to a much more general case of simultaneous representation of direct eﬀects , confounding , and non-causal symmetric dependence structures ; see, for example, Pearl (2009).

In this paper, we work with the general independence model J . Independence models J (P) induced by probability distributions P are a special case. We provide necessary and suﬃcient conditions for J to be graphical, that is to be faithful to a graph. In short, as proved in Theorem 17, J is graphical if and only if it satisﬁes the so-called compositional graphoid axioms as well as singleton-transitivity, and what we call ordered upwardand downward-stability.

As apparent from their names, ordered upwardand downward-stability depend on a generalization of ordering of variables, and consequently the nodes of the graph (called preordering). We provide our results for the most general case of mixed graphs, which contains almost all classes of graphs used in graphical models as subclasses with the exception of the chain graphs with the AMP Markov property (Andersson et al., 2001) and its generalizations (Pe na, 2014). However, based on the preordering w.r.t. which ordered upwardand downward-stability are satisﬁed, one can deduce to what type (or types) of graph P is faithful.

Faithfulness of Probability Distributions and Graphs

Since many of the subclasses of mixed graphs are not as well-known or used as some simpler classes of graphs, we provide a specialization of deﬁnitions and the main results for undirected and bidirected graphs as well as directed acyclic graphs at the end of the paper. We advise the readers that are only familiar with or interested in these simpler subclasses to skip the general deﬁnitions and results for mixed graphs, and focus on the specialization. The structure of the paper is as follows: In the next section, we provide the basic deﬁnitions for graphs as well as diﬀerent classes of graphs in graphical models, and independence models. In Section 3, we deﬁne and provide basic results for Markovness and faithfulness of independence models and graphs. In Section 4, we deﬁne and exploit the preordering for (socalled anterial) graphs, and then we deﬁne the concept of upwardand downward-stability. In Section 5, we provide suﬃcient conditions for an independence model to be so-called minimally Markov to a graph, and then we provide necessary and suﬃcient conditions for an independence model to be faithful to a graph. In Section 6, we specialize the deﬁnitions and results to the classes of undirected and bidirected graphs as well as directed acyclic graphs. In Section 7, we show how other results in the literature concerning faithfulness of certain probability distributions and certain classes of graphs are corollaries of our results. This also provides a nice set of examples for the theory presented in the paper. In Section 8, we end the paper with a short summary, a discussion on the statistical implications of the results, and an outline of future work.

2. Deﬁnitions and Concepts

In this section, we provide the basic deﬁnitions and concepts needed for the paper.

2.1 Graph Theoretical Deﬁnitions and Concepts

A graph G is a triple consisting of a node set or vertex set V , an edge set E, and a relation that with each edge associates two nodes (not necessarily distinct), called its endpoints. When nodes i and j are the endpoints of an edge, they are adjacent. We call a node adjacent to the node i, a neighbour of i, and denote the set of all neighbours of i by ne(i). We say that an edge is between its two endpoints. We usually refer to a graph as an ordered pair G = (V, E). Graphs G1 = (V1, E1) and G2 = (V2, E2) are called equal if (V1, E1) = (V2, E2). In this case we write G1 = G2. Notice that our graphs are labeled, that is every node is considered a diﬀerent object. Hence, for example, graph i j k is not equal to j i k. A loop is an edge with the same endpoints. Here we only discuss graphs without loops. Multiple edges are edges with the same pair of endpoints. A simple graph has neither loops nor multiple edges. Graphs in this paper are so-called mixed graphs, which contain three types of edges: lines (i j), arrows (i j), and arcs (i j), where in the two latter cases, we say that there is an arrowhead at j. If we remove all arrowheads from edges of the graph G, the obtained undirected graph is called the skeleton of G. The simple graph whose edge ij indicates whether there is an edge (or multiple edges) between i and j in G is called the adjacency graph of G. It is clear that if a graph is simple then its adjacency graph is the same as its skeleton. In this paper, only skeleton of anterial graphs or its subclasses are used, which, as will be deﬁned later, are simple graphs; hence, we denote the simple

skeleton by sk(G). We say that we direct the edges of a skeleton by putting arrowheads at the edges in order to obtain mixed graphs. A subgraph of a graph G1 is graph G2 such that V (G2) V (G1) and E(G2) E(G1) and the assignment of endpoints to edges in G2 is the same as in G1. We deﬁne a subgraph induced by edges A E of G = (V, E) to be a subgraph that contains V as the node set and all and only edges in A as the edge set. A walk is a list v0, e1, v1, . . . , ek, vk of nodes and edges such that for 1 i k, the edge ei has endpoints vi 1 and vi. A path is a walk with no repeated node or edge. A maximal set of nodes in a graph whose members are connected by some paths constitutes a connected component of the graph. A cycle is a walk with no repeated nodes or edges except for v0 = vk. We say a walk is between the ﬁrst and the last nodes of the list in G. We call the ﬁrst and the last nodes endpoints of the walk and all other nodes inner nodes. A subwalk of a walk ω = i0, e1, i1, . . . , en, in is a walk ir, er+1, ir+1, . . . , ep, ip that is a subsequence of ω between two occurrences of nodes (ir, ip, 0 r p n). If a subwalk forms a path then it is called a subpath of ω. A walk ω = i = i0, i1, . . . , in = j is directed from i to j if all edges ikik+1, 0 k n 1, are arrows pointing from ik to ik+1. If there is a directed walk from i to j then i is an ancestor of j and j is a descendant of i. We denote the set of ancestors of j by an(j). A walk ω = i = i0, i1, . . . , in = j is semi-directed from i to j if it has at least one arrow, no arcs, and every arrow ikik+1 is pointing from ik to ik+1. A walk between i and j is anterior from i to j if it is semi-directed from i to j or if it is undirected. If there is an anterior walk from i to j then we also say that i is an anterior of j. We use the notation ant(j) for the set of all anteriors of j. For a set A, we deﬁne ant(A) = S j A ant(j) \ A. Notice that, unlike in most places in the literature (for example Richardson and Spirtes 2002), we use walks instead of paths to deﬁne ancestors and anteriors. Because of this and the fact that ancestral graphs have no arrowheads pointing to lines, our deﬁnition of anterior extends the notion of anterior for ancestral graphs in Richardson and Spirtes (2002) with the modiﬁcation that in this paper, a node is not an anterior of itself. Using walks instead of paths is immaterial, as shown in Lauritzen and Sadeghi (2017). A section ρ of a walk is a maximal subwalk consisting only of lines, meaning that there is no other subwalk that only consists of lines and includes ρ. Thus, any walk decomposes uniquely into sections; these are not necessarily edge-disjoint and sections may also be single nodes. A section ρ on a walk ω is called a collider section if one of the following walks is a subwalk of ω: i ρ j, i ρ j, i ρ j. All other sections on ω are called non-collider sections. Notice that a section may be a collider on one walk and a non-collider on another, but we may speak of collider or non-collider sections without mentioning the relevant walk when this is apparent from the context.

2.2 Diﬀerent Classes of Graphs

All the subclasses of graphs included here are acyclic in the sense that they do not contain semi-directed cycles. The most general class of graphs discussed here is the class of chain mixed graphs (CMGs) (Sadeghi, 2016) that contains all mixed graphs without semi-directed cycles. They may have multiple edges containing an arc and a line or an arc and an

Faithfulness of Probability Distributions and Graphs

arrow but not a combination of an arrow and a line or arrows in opposite directions, as such combinations would constitute semi-directed cycles. In this paper, in using the term graph we mean a CMG unless otherwise stated. A general class of graphs that plays an important role in this paper is the class of anterial graphs (An Gs) (Sadeghi, 2016). An Gs are CMGs in which an endpoint of an arc cannot be an anterior of the other endpoint. CMGs include summary graphs (SGs) (Wermuth, 2011) and acyclic directed mixed graphs (ADMGs) (Richardson, 2003) as a subclass, but not include AMP chain graphs (Andersson et al., 2001), and An Gs include ancestral graphs (AGs) (Richardson and Spirtes, 2002) but not SGs or ADMGs. These have typically been introduced to describe independence structures obtained by marginalization and conditioning in DAG independence models; see for example Sadeghi (2013). An Gs also contain undirected and bidirected chain graphs (CGs). A chain graph is an acyclic graph so that if we remove all arrows, all connected components of the resulting graph called chain components contain one type of edge only. If all chain components contain lines, the chain graph is an undirected chain graph (UCG) (known as LWF chain graphs); if all chain components contain arcs it is a bidirected chain graph (BCG) (known as multivariate regression chain graphs). An Gs also contain Regression graphs (Wermuth and Sadeghi, 2012), which are chain graphs consisting of lines and arcs (although dashed undirected edges have mostly been used instead of arcs in the literature), where there is no arrowhead pointing to lines. These also contain graphs with only one type of edge; namely undirected graphs (UGs), containing only lines; bidirected graphs (BGs), containing only bidirected edges; and directed acyclic graphs (DAGs), containing only arrows and being acyclic. Clearly, a graph without arrows has no semi-directed cycles, and a semi-directed cycle in a graph with only arrows is a directed cycle. Cox and Wermuth (1993); Kauermann (1996); Wermuth and Cox (1998); Drton and Richardson (2008) used the terms concentration graphs and covariance graphs for UGs and BGs, referring to their independence interpretation associated with covariance and concentration matrices for Gaussian graphical models. DAGs have been particularly useful to describe causal Markov relations; see for example Kiiveri et al. (1984); Pearl (1988); Lauritzen and Spiegelhalter (1988); Geiger et al. (1990); Spirtes et al. (2000). For an extensive discussion on the subclasses of acyclic graphs and their relationships and hierarchy, see Lauritzen and Sadeghi (2017).

2.3 Independence Models and Their Properties

An independence model J over a ﬁnite set V is a set of triples X, Y | Z (called independence statements), where X, Y , and Z are disjoint subsets of V ; Z may be empty, but , Y | Z and X, | Z are always included in J . The independence statement X, Y | Z is read as X is independent of Y given Z . Independence models may in general have a probabilistic interpretation, but not necessarily. Similarly, not all independence models can be easily represented by graphs. For further discussion on general independence models, see Studen y (2005). In order to deﬁne probabilistic independence models, consider a set V and a collection of random variables {Xα}α V with state spaces Xα, α V and joint distribution P. We let

XA = {Xv}v A etc. for each subset A of V . For disjoint subsets A, B, and C of V we use the short notation A B | C to denote that XA is conditionally independent of XB given XC (Dawid, 1979; Lauritzen, 1996), that is that for any measurable Ω XA and P-almost all x B and x C,

P(XA Ω| XB = x B, XC = x C) = P(XA Ω| XC = x C).

We can now induce an independence model J (P) by letting

A, B | C J (P) if and only if A B | C w.r.t. P.

Similarly we use the notation A B | C for A, B | C / J (P). In order to use graphs to represent independence models, the notion of separation in a graph is fundamental. For three disjoint subsets A, B, and C, we use the notation A B | C if A and B are separated given C, and A B | C for A and B not separated given C. The separation can be uniﬁed under one deﬁnition for all graphs by using walks instead of paths: A walk π is connecting given C if every collider section of π has a node in C and all non-collider sections are disjoint from C. For pairwise disjoint subsets (A, B, C), A B | C if there are no connecting walks between A and B given C. This is in wording the same as the deﬁnition in Studen y and Bouckaert (1998) for undirected (LWF) chain graphs (although it is in fact a generalization as collider sections are generalized), and a generalization of the m-separation used with diﬀerent wordings in Richardson and Spirtes (2002); Wermuth (2011); Wermuth and Sadeghi (2012) for AGs and SGs, and of the dseparation of Pearl (1988). For UGs, the notion has a direct intuitive meaning, so that A B | C if all paths from A to B pass through C. If A, B, or C has only one member {i}, {j}, or {k}, for better readability, we write i, j | k J instead of {i}, {j} | {k} J ; and similarly for i j | k and i j | k. We also write A B when C = ; and similarly A B. A graph G induces an independence model J (G) by separation, letting

A, B | C J (G) A B | C in G.

An independence model J over a set V is a semi-graphoid if it satisﬁes the four following properties for disjoint subsets A, B, C, and D of V :

1. A, B | C J if and only if B, A | C J (symmetry);

2. if A, B D | C J then A, B | C J and A, D | C J (decomposition);

3. if A, B D | C J then A, B | C D J and A, D | C B J (weak union);

4. if A, B | C D J and A, D | C J then A, B D | C J (contraction).

Notice that the reverse implication of contraction clearly holds by decomposition and weak union. A semi-graphoid for which the reverse implication of the weak union property holds is said to be a graphoid; that is, it also satisﬁes

5. if A, B | C D J and A, D | C B J then A, B D | C J (intersection).

Faithfulness of Probability Distributions and Graphs

Furthermore, a graphoid or semi-graphoid for which the reverse implication of the decomposition property holds is said to be compositional, that is, it also satisﬁes

6. if A, B | C J and A, D | C J then A, B D | C J (composition).

Separation in graphs satisﬁes all these properties; see Theorem 1 in Lauritzen and Sadeghi (2017):

Proposition 1 For any graph G, the independence model J (G) is a compositional graphoid.

On the other hand, probabilistic independence models are always semi-graphoids (Pearl, 1988), whereas the converse is not necessarily true; see Studen y (1989). If, for example, P has strictly positive density, the induced independence model is always a graphoid; see, for example, Proposition 3.1 in Lauritzen (1996). See also Peters (2015) for a necessary and suﬃcient condition for the intersection property to hold. If the distribution P is a regular multivariate Gaussian distribution, J (P) is a compositional graphoid; for example see Studen y (2005). Probabilistic independence models with positive densities are not in general compositional; this only holds for special types of multivariate distributions such as, for example, Gaussian distributions and the symmetric binary distributions used in Wermuth et al. (2009). Another important property that is satisﬁed by separation in all graphs, but not necessarily for probabilistic independence models, is singleton-transitivity (also called weak transitivity in Pearl (1988), where it is shown that for Gaussian and binary distributions P, J (P) always satisﬁes it). For i, j, and k, single elements in V ,

7. if i, j | C J and i, j | C {k} J then i, k | C J or j, k | C J (singletontransitivity).

A singleton-transitive compositional graphoid is equivalent to what is called a Gaussoid in Lnˇeniˇcka and Mat uˇs (2007) with a rather diﬀerent axiomatization. The name reminds one that these are the axioms satisﬁed by the independence model of a regular Gaussian distribution (Pearl, 1988).

Proposition 2 For a graph G, J (G) satisﬁes singleton-transitivity.

Proof If there is a walk between i and k given C and a walk between j and k given C then by connecting these two walks, we obtain a walk π between i and j. Except the section ρ that contains k, all collider sections on π have a node in C and all non-collider sections are outside C. Hence depending on ρ being a collider or non-collider, π is connecting given C {k} or C respectively.

3. Markov and Faithful Independence Models

In this section, we deﬁne and discuss the concepts of Markovness and faithfulness for probability distributions, independence models, and graphs.

3.1 Markov Properties

For a graph G = (V, E), an independence model J deﬁned over V satisﬁes the global Markov property w.r.t. G, or is simply Markov to G, if for disjoint subsets A, B, and C of V it holds that A B | C = A, B | C J ,

or equivalently J (G) J . In particular, the graphical independence J (G) is trivially Markov to its graph G. For a probability distribution P, we simply say P is Markov to a graph G if J (P) is Markov to G, that is if J (G) J (P). Notice that every independence model over V is Markov to the complete graph with the node set V . For a graph G = (V, E), an independence model J deﬁned over V satisﬁes a pairwise Markov property w.r.t. a graph G, or is simply pairwise Markov to G, if for every nonadjacent pair of nodes i, j, it holds that i, j | C(i, j) J , for some C(i, j), where C(i, j) is the conditioning set of the pairwise Markov property and does not include i and j. The independence model J (G) only satisﬁes pairwise Markov properties w.r.t. what is called maximal graphs, which are graphs where the lack of an edge between i and j corresponds to a conditional separation statement for i and j. For a probability distribution P, we say P is pairwise Markov to a graph G if J (P) is pairwise Markov to G. For maximal graphs, a pairwise Markov property is deﬁned by letting C(i, j) = ant(i) ant(j) \ {i, j} (Lauritzen and Sadeghi, 2017), which we henceforth use as the pairwise Markov property. The conditioning set of the pairwise Markov property simpliﬁes for DAGs, C(i, j) = an(i) an(j)\{i, j} (Richardson and Spirtes, 2002; Sadeghi and Lauritzen, 2014); for BGs, C(i, j) = (as deﬁned in Wermuth and Cox 1998); and for connected UGs, C(i, j) = V \ {i, j} (as deﬁned in Lauritzen 1996). This pairwise Markov property, is in fact equivalent to the global Markov property under compositional graphoids; see Theorem 4 of Lauritzen and Sadeghi (2017):

Proposition 3 Let G be a maximal graph. If the independence model J is a compositional graphoid, then J is pairwise Markov to G if and only if J is Markov to G.

In particular, for UGs, the equivalence of the global and pairwise Markov properties holds under graphoids, and for BGs under compositional semi-graphoids. Two undirected or two bidirected graphs G and H, where G = H, induce diﬀerent independence models; that is J (G) = J (H). This is not necessarily true for larger subclasses. We call two graphs G and H such that J (G) = J (H) Markov equivalent. Conditions for Markov equivalence for most subclasses of graphs are known; see Verma and Pearl (1990); Ali et al. (2009); Wermuth and Sadeghi (2012). Notice that two Markov equivalent maximal graphs have the same skeleton.

3.2 Faithfulness and Minimal Markovness

We say that an independence model J and a probability distribution P are faithful if J = J (P). Similarly, we say that J and a graph G are faithful if J = J (G). We also say that P and G are faithful if J (P) = J (G). If P and G are faithful then we may sometimes also say that P is faithful to G or vice versa, although in principle faithfulness is a symmetric relation; the same holds for saying that J is faithful to G or P. Notice that

Faithfulness of Probability Distributions and Graphs

we are extending the deﬁnition of faithfulness to include the relation between independence models and graphs as well as independence models and probability distributions rather than only that between graphs and probability distributions as originally used in Spirtes et al. (2000). Thus, if J and G are faithful then J is Markov to G, but it also requires every independence statement to correspond to a separation in G. We say that J or G is probabilistic if there is a distribution P that is faithful to J or G respectively. If there is a graph that is faithful to J or P then we say that J or P is graphical. Our main goal is to characterize graphical probability distributions, and in addition, if existent, to provide graphs that are faithful to a given P. We solve this problem for the general case of independence models J . For a given independence model J , we deﬁne the skeleton of J , denoted by sk(J ), to be the undirected graph that is obtained from J as follows: we deﬁne the node set of sk(J ) to be V , and for every pair of nodes i, j, we check whether i, j | C J holds for some C V \{i, j}; if it does not then we draw an edge between i and j. One can similarly deﬁne sk(P) for a probability distribution P by checking whether i j | C. This is the same as the graph obtained by the ﬁrst step of the SGS algorithm (Glymour et al., 1987). Notice that this is not necessarily the same as the undirected graph obtained by the pairwise Markov property for UGs; see Section 6.1. For UGs and BGs, the skeleton of J uniquely determines the graph, whereas for other subclasses, there are several graphs with the same skeleton sk(J ). As will be seen, the preordering, deﬁned in the next section, enables us to direct the edges of the skeleton.

Proposition 4 If an independence model J is Markov to a graph G then sk(J ) is a subgraph of sk(G).

Proof Suppose that there is no edge between i and j in sk(G). Since the global Markov property w.r.t. G implies the pairwise Markov property, it holds that i, j | C(i, j) J , which implies i is not adjacent to j in sk(J ).

Hence, if J is Markov to a graph G such that sk(G) = sk(J ) then G has the fewest number of edges among those to which J is Markov. We say that J is minimally Markov to a graph G if J is Markov to G and sk(G) = sk(J ). The same can be deﬁned for probability distributions, and has been used in the literature under the name of (causal) minimality assumption (Pearl, 2009; Spirtes et al., 2000; Zhang and Spirtes, 2008; Neapolitan, 2004). Minimally Markov independence models to a graph are important since only these can also be faithful to the graph:

Proposition 5 If an independence model J and a graph G are faithful then J is minimally Markov to G.

Proof Since J is Markov to G, we need to prove that sk(G) = sk(J ). By Proposition 4, sk(J ) is a subgraph of sk(G). Now, suppose that there is no edge between i and j in sk(J ). By the construction of sk(J ), it holds that i, j | C J for some C. Since J and G are faithful, i j | C in G. This implies that i is not adjacent to j in G.

Hence, we need to discuss conditions under which J is minimally Markov to a graph as well as conditions for faithfulness of minimally Markov independence models and graphs. These will be presented in Section 5.

4. Preordering in Graphs and Ordered Stabilities

In this section, we ﬁrst deﬁne the preordering for sets and its validity for nodes of the graphs, and then use preordering to deﬁne some new properties of conditional independence.

4.1 Preordering of the Nodes of a Graph

Over a set V , a partial order is a binary relation that satisﬁes the following properties for all members a, b, c V :

a a (reﬂexivity);

if a b and b a then a = b (antisymmetry);

if a b and b c then a c (transitivity).

If a b or b a then a and b are comparable; otherwise they are incomparable. A partial order under which every pair of elements is comparable is called a total order or a linear order. A preorder over V , on the other hand, is a binary relation that is only reﬂexive and transitive. Given a preorder over V , one may deﬁne an equivalence relation on V such that a b if and only if a b and b a. This explains the use of the notation . It then holds that

if a b and b c then a c; and

if a b and a c then c b.

We also use the notation a < b indicating that a b and a b. We ﬁrst provide the following result; see Section 5.2.1 of Schr oder (2003):

Proposition 6 Let be a preorder over the set V , and the equivalence relation on V as deﬁned above. Let also Q be the set of all equivalence classes of V w.r.t. . Then the relation deﬁned over Q by [a] [b] a b is a partial order over Q.

We say that a graph G = (V, E) admits a valid preorder if, for nodes i and j of G, the following holds:

if i j then i j;

if i j then j < i;

if i j then i and j are incomparable.

The global interpretation of a valid preorder on graphs is as follows:

Proposition 7 Let be a valid preorder for a graph G. It then holds for nodes i and j of G that if i ant(j) then j i; in particular,

1. if there is a semi-directed path from i to j then j < i; and

2. if i and j are connected by a path consisting only of lines then i j.

Faithfulness of Probability Distributions and Graphs

Proof The proof follows from transitivity of both preorders and anterior paths.

Corollary 8 Let be a valid preorder for a graph G and G its subgraph induced by lines. The equivalence classes of G based on are the connected components of G . In addition, deﬁnes a partial order over the connected components of G by [if i j, i τ and j δ then τ > δ], for τ and δ two connected components of G .

Proof The results follows from Propositions 7 and 6.

For example, consider the graph in Fig. 1. The graph admits a valid preorder with the preorder provided over the nodes, or equivalently over the connected components of G (with the abuse of the notation for numbering). The notation implies that two nodes with label 2 are in the same equivalence class, 2 > 1 and 2 > 1 , but a, b , and c , a, b, c {1, 2}, are not comparable.

Figure 1: A preordered graph.

There may be many diﬀerent preorders that are valid for a graph. If there is a valid preordering and we expect the other direction of Proposition 7 to hold then we obtain a unique preordering for the graph: Given a graph G, if i / ant(j) and j / ant(i) then set i and j to be incomparable. Otherwise, let i j when i j, and i > j when i j. It is easy to see that this, in fact, is a preorder for the nodes of G. We call this preordering the minimal preorder for G since it gives the fewest possible comparable pairs of nodes. For example, the preordering in the graph of Fig. 1 is minimal. In this paper, we mostly deal with this type of preorderings for graphs. It is easy to observe the following:

Proposition 9 If G is anterial then the minimal preorder for G is a valid preorder for G.

In fact, in general, we have the following:

Proposition 10 A graph admits a valid preorder if and only if it is anterial.

Proof If a graph G admits a valid preorder then by Proposition 7, there cannot be a directed cycle, nor can there be an arc with one endpoint anterior of the other. The converse is Proposition 9.

Therefore, in addition to CMGs, SGs (and ADMGs) do not admit a valid preorder, but AGs do. However, notice that for every CMG, there exists a Markov equivalent An G (see Sadeghi 2016); and the same for SGs and AG. Hence, the question of whether there is a CMG that is faithful to J is the same as whether there is an An G that is faithful to J ; and the same holds for SG and AG.

As discussed above, there is a one-to-one correspondence between An Gs and the minimal preorder for the An Gs. When the skeleton of the concerned graph is known, we have the following trivial result:

Proposition 11 Given the skeleton of a graph G and a preorder of its nodes, G can be constructed uniquely by directing the edges of sk(G). In addition, is a valid preorder for G.

Let J be deﬁned over V . We introduce preordering in this paper in order to stress that, in principle, a preordering of members of V (which are possibly random variables) is deﬁned irrespective of graphs to which they are Markov or faithful. However, for our purposes, knowing the skeleton of such graphs, one can directly discover the directions of the edges of the graph, which correspond to the minimal preordering for the graph, rather than working directly on the set V . Suppose that there exists a preorder over V . Proposition 11 implies that we can direct the edges of sk(J ) based on in order to deﬁne a dependence graph G(J , ) induced by J and . Notice that, by Proposition 10, G(J , ) is anterial. Similarly, for P, one can deﬁne G(P, ) = G(J (P), ). As mentioned, we are only interested in preorderings that are minimal preorders of graphs. Given J , we deﬁne a preordering to be J -compatible if is the minimal preorder for G(J , ). Similarly, one can deﬁne a P-compatible preorder.

4.2 Ordered Upwardand Downward-stabilities

We now exploit the preodering for independence models in order to deﬁne two other properties in addition to the seven properties deﬁned in Section 2.3 (namely singleton-transitive compositional graphoid axioms). We say that an independence model J over the set V respectively satisﬁes ordered upwardand downward-stability w.r.t. a preorder of V if the following hold:

8. if i, j | C J then i, j | C {k} J for every k V \ {i, j} such that l k for some l {i, j} or l k for some l C (ordered upward-stability);

9. if i, j | C J then i, j | C \ {k} J for every k V \ {i, j} such that l k for every l {i, j} and l < k for every l C \ {k} (ordered downward-stability).

Ordered upward-stability is a generalization of a modiﬁcation of upward stability, deﬁned in Fallat et al. (2017), and strong union, deﬁned in Pearl and Paz (1985), for undirected graphs, with singletons instead of node subsets.

Proposition 12 Let G be an An G and the minimal preorder for G. Then J (G) satisﬁes ordered upwardand downward-stability w.r.t. .

Proof First we prove that J (G) satisﬁes ordered upward-stability w.r.t. the minimal preorder: If k C, the result is obvious, thus suppose that k / C. Suppose that there is a connecting walk π between i and j given C {k}. If k is not on π then we are done; otherwise, k is on a collider section with no other node in C on π. Notice that l k means here that k ant(l), and l k that k and l are connected by a path consisting of lines.

Faithfulness of Probability Distributions and Graphs

If l C then denote the path of lines between k and l by ω. The walk consisting of the subwalk of π from i to k, ω, ω in the reverse direction (from l to k), and the subwalk of π from k to j is connecting given C. This is because on this walk k and l are in the same collider section with l in C. Now denote an anterior path from k to l {i, j} by ϖ. Without loss of generality we can assume that l = j. We now consider two cases: 1) If ϖ has no nodes in C then the walk containing the subwalk of π between i and k and ϖ is connecting given C between i and j since k and all nodes on ϖ are on non-collider sections on this walk. 2) If ϖ has a node in C then consider the one closest to k and call it h. Then the walk consisting of the subwalk of π from i to k, the subpath of ϖ from k to h, the reverse of the subpath from h to k, and the subwalk of π from k to j is connecting given C as proven as follows: 1) if there is an arrow on ϖ then k is on a non-collider section in both instances in which it appears on this walk, and h is on a collider section and in C. 2) if ϖ only consists of lines then k and h are in the same collider section with a node (h) in C. Now we prove that J (G) satisﬁes ordered downward-stability: If k / C then the result is obvious, thus suppose that k C. Suppose that there is a connecting walk π between i and j given C \ {k}. If k is not on π then we are done. Otherwise, ﬁrst it is clear that since π is connecting given C \ {k}, k cannot be the only node on a collider section that is in C. If k is on the same section as another member of C then π is clearly connecting given C. The only case that is left is when k is on a non-collider section on π; but this is impossible: there is no arrowhead at the section containing k from one side on π, say from the j side. By moving towards j from k, it can be implied that k is either an anterior of a node in a collider section that is in C, or an anterior of j, but both are impossible since the former implies l < k for l C, and the latter implies j k.

5. Characterization of Graphical Independence Models

In this section, we ﬁrst provide suﬃcient conditions for minimal Markovness, and then necessary and suﬃcient conditions for faithfulness.

5.1 Suﬃcient Conditions for Minimal Markovness

Here we provide suﬃcient conditions for an independence model to be minimally Markov to a graph. First, we have the following trivial result:

Proposition 13 If J is minimally Markov to an anterial graph G then G = G(J , ) for the minimal preorder for G.

By Proposition 5, the above statement also holds when J and G are faithful.

Proposition 14 Suppose that there exists a J -compatible preorder over V w.r.t. which J satisﬁes ordered downwardand upward-stability. It then holds that J is pairwise Markov to G(J , ).

Proof Suppose that there are non-adjacent nodes i and j in G(J , ). These are nonadjacent in sk(J ) too. By deﬁnition of sk(J ), it holds that i, j | C J for some C V \ {i, j}.

Since J satisﬁes ordered downward-stability and since the preorder is J -compatible, as long as a node k C is not an anterior of {i, j} or there is no semi-directed path from it to C \ {k}, it can be removed from the conditioning set, that is we have that i, j | C \ k J . Since G(J , ) is acyclic, as long as ant({i, j}) C, such a k exists. Now consider a node k C \ {k} that is not an anterior of {i, j} or there is no semi-directed path from it to C \ {k, k } and apply downward-stability again. By an inductive argument we imply that i, j | C ant({i, j}) J . Now since P satisﬁes ordered upward-stability, nodes outside C that are in ant({i, j}) can be added to the conditioning set. Hence, it holds that i, j | ant({i, j}) \ {i, j} J . This completes the proof.

In principle, a compatible preorder w.r.t. which J satisﬁes ordered downwardand upwardstability can be found by going through all such preorderings. Finding such a preorder eﬃciently is a structural learning task. We believe we have an eﬃcient way to do this when the skeleton of J has been learned, but this is beyond the scope of this paper.

Theorem 15 Suppose, for an independence model J over V , that

1. J is a compositional graphoid; and

2. there exists a J -compatible preorder over V w.r.t. which J satisﬁes ordered downwardand upward-stability.

It then holds that J is minimally Markov to G(J , ).

Proof The proof follows from Proposition 14, the fact that sk(J ) = sk(G(J , )), and Proposition 3.

5.2 Conditions for Faithfulness

In this section, we present the main result of this paper, which provides necessary and suﬃcient conditions for faithfulness of independence models (and probability distributions) and a (not necessarily known) graph in the general case. In Section 6, we specialize the result to the more well-known subclasses as corollaries.

Proposition 16 Let J be an independence model over V . Suppose that J is minimally Markov to an anterial graph G. It then holds that J and G are faithful if and only if

1. J satisﬁes singleton-transitivity; and

2. J satisﬁes ordered downwardand upward-stability w.r.t. the minimal preorder for G.

Proof Suppose that J and G are faithful. This is equivalent to J = J (G). By Proposition 2, J satisﬁes singleton-transitivity, and by Proposition 12, the result follows. Conversely, suppose that J satisﬁes singleton-transitivity as well as ordered downwardand upward-stability w.r.t. the minimal preorder for G. By Proposition 13, G = G(J , ). We show that J and G are faithful. We need to show that if A, B | C J then A B | C in G. Consider i A and j B. By decomposition, we have that i, j | C J , which implies that i and j are not adjacent in G.

Faithfulness of Probability Distributions and Graphs

If, for contradiction, C does not separate nodes i and j then there exists a connecting walk π = i = i1, i2, . . . , ir = j , on which all collider sections have a node in C and all non-collider sections are outside C. In addition, for every node iq, 2 q r 1, on π deﬁne Ciq as follows: If iq / C then Ciq = ; if iq C then Ciq is the set of all l C \ π such that there is a semi-directed path from iq to l but not from any ip C to l, p {q+1, . . . , r 1}. Let Cπ = Sr 1 q=2 Ciq. We prove by induction along nodes of π that for every q, 2 q r, it holds that i1, iq | ( q 1 p=2Cip) [C \ ({iq, . . . , ir} Cπ)] / J . The base for q = 2 is trivial since by the construction of G(J , ), adjacent nodes are dependent given anything. Suppose now that the result holds for q and we prove it for q + 1. First, again by the construction of G(J , ), it holds that iq, iq+1 | ( q 1 p=2Cip) [C \ ({iq, . . . , ir} Cπ)] / J . By singleton-transitivity, this and the induction hypothesis imply (a): i1, iq+1 | ( q 1 p=2Cip) [C \ ({iq, . . . , ir} Cπ)] / J ; or (b): i1, iq+1 | {iq} ( q 1 p=2Cip) [C \ ({iq, . . . , ir} Cπ)] / J . We now have two cases: 1) If iq is on a collider section: The section containing iq has a node in C. We again consider two subcases: 1.i) If iq is in C: If (a) holds then ﬁrst notice that iq is not an anterior of {i1, iq+1} nor is there a semi-directed path from iq to ( q 1 p=2Cip) [C \ ({iq, . . . , ir} Cπ)]. We apply downward-stability to (a) to obtain (b). If Ciq = then consider a highest order node l in Ciq. Again by downward-stability, we obtain i1, iq+1 | {l} ( q 1 p=2Cip) [C \ ({iq+1, . . . , ir} Cπ)] / J . By an inductive argument along the members of Ciq (by, at each step, choosing a highest order node in Ciq that has not yet been chosen), we eventually obtain i1, iq+1 | ( q p=2Cip) [C \ ({iq+1, . . . , ir} Cπ)] / J . Notice that iq is in the conditioning set of this dependency. 1.ii) If iq is not in C: If (b) holds then observe that iq is in the same equivalence class as that of either iq+1 or a node is C, s < q. Hence we can apply upward-stability to obtain (a). Since Ci2 = , (a) is clearly the same as i1, iq+1 | ( q p=2Cip) [C \ ({iq+1, . . . , ir} Cπ)] / J . Notice that iq is not in the conditioning set of this dependency. 2) If iq is on a non-collider section: We have that iq / C, and iq ant({i1, iq+1}). Hence, by upward-stability, from (b) we obtain (a). Again, since Ciq = , we obtain the result. Therefore, we established that i1, iq | ( q 1 p=2Cip) [C \ ({iq, . . . , ir} Cπ)] / J , for all q, where depending on whether iq C or not, the conditioning set contains or does not contain iq. By letting q = r, we obtain i1, ir | C / J , a contradiction. Therefore, it holds that i j | C in G. By the composition property for separation for graphs (Proposition 1), we obtain the result.

We follow the arguments in the proof for the following example.

Example 1 Consider the undirected (LWF) chain graph G in Fig. 2. Suppose that a probability distribution P is minimally Markov to G, and J (P) satisﬁes singleton-transitivity and ordered downwardand upward-stability w.r.t. the minimal preorder for G. In order to prove faithfulness, one needs to show that if two sets of nodes are not separated given a third set then they are not independent either. For example, we have that l j | l2. Notice that this is a case where there is no path that connects l and j. In order to show that l j | l2, (as in the proof of the above theorem) we

start from l = l1 and move towards j = l7 via the connecting walk l1, l2, l3, l4, l5, l6, l7 given l2.

Figure 2: An undirected (LWF) chain graph.

We abbreviate singleton-transitivity, ordered upward-stability, and ordered downward stability by st , ous , and ods respectively. We write the arguments in a condensed from. For example, the ﬁrst implication (on the left hand side) says that l1 l2 and l2 l3, using singleton-transitivity, imply that either l1 l3 or l1 l3 | l2 ; but now ordered downward-stability implies that we must have l1 l3 | l2.

l1 l2 l1 l3 l4 l5 | l2 l1 l5 | {l2, l4} l6 l7 | l2 l1 l7 | {l2, l6} st or ods st or ous st or ous l2 l3 l1 l3 | l2 l1 l4 | l2 l1 l5 | l2 l1 l6 | l2 l1 l7 | l2. st or ous st or ous l3 l4 | l2 l1 l3 | {l2, l3} l5 l6 | l2 l1 l6 | {l2, l5}

The following example shows how singleton-transitivity and ordered stabilities are necessary for faithfulness.

Example 2 Consider the DAG G in Fig. 3. If an independence model J is minimally Markov to G then we must have h, k | {j, l} J and j, l | k J . Now, in violation of faithfulness, if only h, k | J in addition to these holds then upward-stability is violated since for j > h, h, k | j J must hold. If, in violation of faithfulness, only h, k | j J in addition to the original statements holds then singleton-transitivity is violated since h, k | j J and h, k | {j, l} J must imply either h, l | j J or k, l | j J , which are both impossible due to minimality since h, l and k, l are adjacent in G.

Figure 3: A DAG.

Without assuming the Markov assumption we have the following:

Faithfulness of Probability Distributions and Graphs

Theorem 17 Let J be an independence model deﬁned over V . It then holds that J is graphical if and only if

1. J is a singleton-transitive compositional graphoid; and

2. there exists a J -compatible preorder over V w.r.t. which J satisﬁes ordered downwardand upward-stability.

In addition, if 1 and 2 hold then J is faithful to G(J , ).

Proof ( ) follows from Propositions 1, 2, and 12, and the fact that there is always an An G that is Markov equivalent to a given CMG. To prove ( ) and the last statement of the theorem, consider G(J , ). Theorem 15 implies that J is minimally Markov to G(J , ). This together with the J -compatibility of the preorder, using Proposition 16, implies that J and G(J , ) are faithful.

Hence, for probability distributions, we have the following characterization:

Corollary 18 Let P be a probability distribution deﬁned over {Xα}α V . It then holds that P is graphical if and only if

1. J (P) satisﬁes intersection, composition, and singleton-transitivity; and

2. there exists a P-compatible preorder over V w.r.t. which J (P) satisﬁes ordered downwardand upward-stability.

In addition, if 1 and 2 hold then P is faithful to G(P, ).

In fact, we have shown in Theorem 17 that if J is graphical then for every J -compatible preorder w.r.t. which J satisﬁes ordered downwardand upward-stability, G(J , ) is a graph that is faithful to J . A consequence of Proposition 13 implies that such graphs (that is G = G(J , ) for some preorder w.r.t. which J satisﬁes ordered downwardand upwardstability) constitute the set of all graphs that are faithful to J , that is a Markov equivalence class of graphs whose members are faithful to J . Every member of the equivalence class corresponds to a diﬀerent preorder w.r.t. which J satisﬁes ordered downwardand upwardstability. This equivalence set can be represented by one member (a graph with the least number of arrowheads) similar to the idea of a CPDAG for the Markov equivalence class of DAGs (see Spirtes et al. 2000), but we do not go through the details of this here. Of course, if the goal is to verify whether J and a given G are faithful then we can use the following corollary:

Corollary 19 Let J be an independence model deﬁned over V , and G an An G with node set V . It then holds that if

1. J is a singleton-transitive compositional graphoid; and

2. J satisﬁes ordered downwardand upward-stability w.r.t. the minimal preorder for G,

then J and G are faithful.

It is also important to note that singleton-transitivity and ordered upward or downwardstability are necessary in the sense that they are not implied by one another, although there are diﬀerent ways to axiomatize singleton-transitive compositional graphoids that satisfy ordered upwardand downward-stability. If P is Gaussian then we have the following:

Corollary 20 Let P be regular Gaussian distribution. It then holds that P is graphical if and only if there exists a P-compatible preorder w.r.t. which J (P) satisﬁes ordered downwardand upward-stability.

Proof The proof follows from the fact that regular Gaussian distributions satisfy intersection, composition, and singleton-transitivity.

6. Specialization to Subclasses

The deﬁnitions and results presented in this paper can be specialized to any subclass of CMGs including ancestral graphs and LWF chain graphs. However, here we only focus on the three of the most used classes of undirected (concentration), bidirected (covariance), and directed acyclic graphs.

6.1 Specialization to Undirected and Bidirected Graphs

For connected UGs, every two nodes are in the same equivalence class, and for BGs every two nodes are incomparable; hence, the minimal (pre)ordering is trivial and uninteresting in these cases. More precisely, ordered upwardand downward-stabilities can be specialized for these two types of trivial preordering:

8a. if i, j | C J then i, j | C {k} J for every k V \ {i, j} (upward-stability);

9a. if i, j | C J then i, j | C \ {k} J for every k V \ {i, j} (downward-stability).

Proposition 21 An independence model J satisﬁes ordered upwardand downward-stability (8) and (9) w.r.t. the minimal preorder if and only if,

1. for connected undirected graphs, it satisﬁes upward-stability (8a);

2. for bidirected graphs, it satisﬁes downward-stability (9a).

In addition, if, for undirected graphs, J satisﬁes (8a) then it satisﬁes (8).

Proof The proof of 1 and 2 are straightforward since in 1 all nodes are in the same equivalence class, and in 2 all nodes are incomparable. The second part is trivial.

Analogous to Proposition 12, it is also straightforward to show the following:

Proposition 22 It holds that

1. for an undirected graph G, J (G) satisﬁes upward-stability;

Faithfulness of Probability Distributions and Graphs

2. for a bidirected graph G, J (G) satisﬁes downward-stability.

For UGs and BGs, denote the unique and trivial valid preorders by u and b respectively. For UGs, G u(J ) := G(J , u) = sk(J ), whereas for BGs, G b(J ) := G(J , b) is sk(J ) with all edges directed to be bidirected. Another way to construct induced UGs and BGs by J is to construct them based on their corresponding pairwise Markov property, that is to connect every pair of nodes i and j if i, j | V \ {i, j} / J for UGs, and if i, j | / J for BGs. Denote these graphs by Gu(J ) and Gb(J ) respectively. We then have the following:

Proposition 23 It holds that,

1. for undirected graphs, G u(J ) = Gu(J ) if J satisﬁes upward-stability;

2. for bidirected graphs, G b(J ) = Gb(J ) if J satisﬁes downward-stability.

For upwardand downward-stability, we also have the following observation:

Lemma 24 For an independence model J the following holds:

1. If J is a semi-graphoid and satisﬁes upward-stability then J satisﬁes composition.

2. If J is a semi-graphoid and satisﬁes downward-stability then J satisﬁes intersection.

Proof 1. Suppose that A, B | C J and A, D | C J . By decomposition, it holds, for i A, j B, and k D, that i, j | C J and i, k | C J . By upward-stability, we obtain i, j | C {k} J . Contraction implies i, {j, k} | C J . By an inductive argument, we obtain the result. 2. Suppose that A, B | C D J and A, D | C B J . By decomposition, it holds, for i A, j B, and k D, that i, j | C {k} J and i, k | C {j} J . By downward-stability, we obtain i, k | C J . Contraction implies i, {j, k} | C J . By an inductive argument, we obtain the result.

Lemma 25 Let J be an independence model, and suppose that J is a graphoid. It then holds that J satisﬁes ordered downward-stability (9) w.r.t. the minimal preorder for Gu(J ).

Proof By deﬁnition, we know that J is pairwise Markov to Gu(J ). It is known that under graphoid, this implies that J is Markov to Gu(J ); see Pearl (1988). Now suppose that i, j | C J . The only applicable k in the deﬁnition of (9) is not in the same connected component as that of i or j. Hence, since i j | C \ {k}, by the global Markov property, it holds that i, j | C \ {k} J .

As a consequence of Theorem 15, we have the following for UGs and BGs:

Corollary 26 It holds that

1. if J is a graphoid and satisﬁes upward-stability then J is minimally Markov to the undirected graph Gu(J );

2. if J is a compositional semi-graphoid and satisﬁes downward-stability then J is minimally Markov to the bidirected graph Gb(J ).

Proof The proof follows from Propositions 23 and 21, and Lemmas 24 and 25.

For other classes of graphs, the preordering (or ordering) of nodes (and consequently random variables) is necessary since the conditioning set C(i, j) for the pairwise Markov property depends on the preordering of the variables. This implies that in general one cannot construct more than the skeleton of graphs based on the pairwise Markov property if there is no given preordering. Proposition 16 specializes to the following two corollaries:

Corollary 27 Suppose that an independence model J is Markov to Gu(J ). Then J and Gu(J ) are faithful if and only if J satisﬁes singleton-transitivity and upward-stability.

Proof The proof follows from Propositions 21, 22, and 23, and Lemma 25.

Similarly, for BGs, we have the following:

Corollary 28 Suppose that an independence model J is Markov to Gb(J ). Then J and Gb(J ) are faithful if and only if J satisﬁes singleton-transitivity and downward-stability.

The main conditions for an independence model to be faithful to a UG or BG (consequence of Theorem 17) are as follows; see also Theorem 3 of Pearl and Paz (1985) for a similar result for UGs:

Corollary 29 Let J be an independence model deﬁned over V . It then holds that J is faithful to an undirected graph if and only if it is an upward-stable singleton-transitive graphoid. In addition, if the conditions hold then J is faithful to Gu(J ).

Proof The proof follows from Propositions 21, 22, and 23, and Lemmas 24 and 25.

Corollary 30 Let J be an independence model deﬁned over V . It then holds that J is faithful to a bidirected graph if and only if it is a downward-stable singleton-transitive compositional semi-graphoid. In addition, if the conditions hold then J is faithful to Gb(J ).

Therefore, a probability distribution is faithful to an undirected graph if and only if it satisﬁes intersection, singleton-transitivity, and upward-stability; and to a bidirected graph if and only if it satisﬁes composition, singleton-transitivity, and downward-stability. Finally, as a consequence of Corollary 20, we have the following:

Corollary 31 Let P be a regular Gaussian distribution. It then holds that P being faithful to an undirected graph is equivalent to it satisfying upward-stability; and P being faithful to a bidirected graph is equivalent to it satisfying downward-stability.

6.2 Specialization to Directed Acyclic Graphs

For DAGs, the valid preorder specializes to a valid (partial) order since there are no lines in the graph: If i j then i > j. In the minimal order for a DAG, two nodes are incomparable if and only if neither is an ancestor of the other. Therefore, we deﬁne ordered stabilities w.r.t. ordering for DAGs:

Faithfulness of Probability Distributions and Graphs

8b. if i, j | C J then i, j | C {k} J for every k V \ {i, j} such that l < k for some l {i, j} (ordered upward-stability);

9b. if i, j | C J then i, j | C \ {k} J for every k V \ {i, j} such that l < k for every l {i, j} C \ {k} (ordered downward-stability).

As a consequence of Proposition 12, we have the following:

Corollary 32 For a directed acyclic graph G, the induced independence model by d-separation J (G) satisﬁes ordered upwardand downward-stability (8b) and (9b) w.r.t. the minimal order for G.

As a consequence of Theorem 15, and for G(J , ), the DAG obtained by directing the edges of sk(J ) based on the order , we have the following:

Corollary 33 Suppose, for an independence model J over V , that

1. J is a compositional graphoid; and

2. there exists a J -compatible order over V w.r.t. which J satisﬁes ordered downwardand upward-stability, (8b) and (9b).

Then J is minimally Markov to the directed acyclic graph G(J , ).

The main conditions for an independence model to be faithful to a DAG (consequence of Theorem 17) is as follows:

Corollary 34 Let J be an independence model deﬁned over V . It then holds that J is faithful to a directed acyclic graph G if and only if

1. J is a singleton-transitive compositional graphoid; and

2. there exists a J -compatible order over V w.r.t. which J satisﬁes ordered downwardand upward-stability, (8b) and (9b).

In addition, if the conditions hold then J is faithful to G(J , ).

For a probability distribution P, the ﬁrst condition of the above corollary simpliﬁes to that P satisﬁes intersection, composition, and singleton-transitivity.

7. Comparison to the Results in the Literature

All results in the literature concerning faithfulness of certain probability distributions and certain types of graphs can be considered corollaries (or examples) of Theorem 17. Here we verify this for some of the more well-known or interesting results. For brevity, we leave out some interesting results such as faithfulness of Gaussian and discrete distributions and LWF chain graphs; see Pe na (2011) and Pe na (2009); Studen y and Bouckaert (1998) respectively.

7.1 Faithfulness of MTP2 Distributions and Undirected Graphs

Multivariate totally positive of order 2 (MTP2) distributions (Karlin and Rinott, 1980) are distributions whose density f is MTP2, that is f(x)f(y) f(x y)f(x y), for all vectors x, y, where x y = (min(x1, y1), . . . , min(xm, ym)) and x y = (max(x1, y1), . . . , max(xm, ym)). In Fallat et al. (2017), it was shown that when an MTP2 distribution P is a graphoid, it holds that P and what we denote here by Gu(P) are faithful. It is known that MTP2 distributions satisfy both singleton-transitivity and upward-stability, but not necessarily the intersection property; see Fallat et al. (2017). Hence, this result is a direct implication of Corollary 29.

7.2 Faithfulness of Gaussian Distributions and Undirected Graphs

Lnˇeniˇcka and Mat uˇs (2007) provides regular Gaussian distributions that are faithful to any undirected graph; see Corollary 3 of the mentioned paper. In order to do so, given an undirected graph G = (V, E), they provide the following matrix AG,ϵ, which acts as the covariance matrix of a Gaussian distribution that is faithful to G:

1 if i = j, ϵ if i = j and i and j are adjacent in G, 0 otherwise,

where i, j V . It is known that, for suﬃciently small ϵ > 0, the corresponding Gaussian distribution is regular, and by Corollary 31, we only need to show that it satisﬁes upward-stability. It is straightforward to show that for suﬃciently small ϵ > 0, the inverse of AG,ϵ, which is the concentration matrix of the Gaussian distribution, is an M-matrix, that is all its on-diagonal elements are positive and all its oﬀ-diagonal elements are non-positive. This is equivalent to the Gaussian distribution being MTP2; see Karlin and Rinott (1983). Hence, upward-stability is satisﬁed as discussed above.

7.3 Unfaithfulness of Certain Gaussian Distributions and Undirected Graphs and DAGs

There are many examples in the literature for, and in fact it is easy to construct examples of, probability distributions that are not faithful to undirected graphs; for one of the ﬁrst observations of this behaviour in a data set, see Example 4 of Cox and Wermuth (1993). As a constructed example, let us discuss Example 1 of Soh and Tatikonda (2014), which provided the Gaussian distribution for X = (X1, X2, X3, X4) with zero mean and positive deﬁnite covariance matrix

3 2 1 2 2 4 2 1 1 2 7 1 2 1 1 6

It can be seen that the concentration matrix Σ 1 has no zero entries, but X1 X3 | X2 since Σ13 = Σ12Σ 1 22 Σ23. Hence, the distribution and its complete skeleton are not faithful. This is clearly a consequence of the violation of upward-stability.

Faithfulness of Probability Distributions and Graphs

For DAGs too, it is easy to construct or observe examples where faithfulness is violated. One of the simplest of such examples, is what is related to the so-called transitivity of causation in philosophical causality; see, for example, Ramsey et al. (2006), where the goal is to show how the PC algorithm (Spirtes et al., 2000) errs. Consider the graph a b c, where a c and a c | b. This, in our language, is a clear violation of singleton-transitivity since the graph implies that a, b and b, c are always dependent.

7.4 Faithfulness of Gaussian and Discrete Distributions and DAGs

It was shown in Meek (1995) that, in certain measure-theoretic senses, almost all the regular Gaussian and discrete distributions that are Markov to a given DAG are faithful to it. It is known, for DAGs, that the global Markov property is equivalent to the factorization property (see Lauritzen 1996), which holds if the joint density can factorize as follows:

f V (x) = Y

v V f V (xv | xpa(v)).

Since Gaussian and discrete distributions satisfy singleton-transitivity, in order to prove the result, one needs to show that ordered upwardand downward-stabilities, (8b) and (9b), are satisﬁed for almost all distributions that factorize w.r.t. the DAG. We refrain from showing this explicitly here due to technicality of the proof, but the general idea is similar to the proofs in Appendix B of Meek (1995): For (9b), we have that the probabilistic conditional independence holds for the marginal probability over (i, j, C, k), and we need to exploit the factorization to show that the same conditional independence formula holds for the marginal probability over (i, j, C) when k / an({i, j} C. In this case the corresponding resulting polynomials are much simpler than those of Meek (1995) since one needs to consider extra marginalization over only one variable k. For (8b), we will need to prove the other direction in a similar fashion.

7.5 Faithfulness of Gaussian Distributions and Ancestral Graphs

The main goal of Richardson and Spirtes (2002) is to capture the conditional independence structure of a DAG model after marginalization and conditioning; see also Sadeghi (2013). This was done by introducing the class of AGs, and it was shown that every AG is probabilistic. In fact, there is a Gaussian distribution faithful to any AG, which is indeed the Gaussian distribution faithful to a DAG after marginalization and conditioning; see Theorem 7.5 of this paper. However, by using our results, without going through the standard theory of ancestral graphs, it is possible to show that distributions that are faithful to a DAG are graphical after marginalization and conditioning: In order to do so, we use the notation in Sadeghi (2013), for an independence model J , to deﬁne the independence model after marginalization over M and conditioning on C, denoted by α(J ; M, C), for disjoint subsets M and C of V :

α(J ; M, C) = { A, B | D : A, B | D C J and (A B D) (M C) = }.

First, we show that α(J ; M, C) satisﬁes condition 1 of Theorem 17.

Proposition 35 If J satisﬁes singleton-transitivity, intersection, and composition then so does α(J ; M, C).

Proof For composition, assume that A, B | C α(J ; M, C) and A, D | C α(J ; M, C). By deﬁnition, we have A, B | C C J and A, D | C C J . Composition for J implies A, B D | C C J . Therefore, A, B D | C α(J ; M, C). Proof of the others is similar to this one and as straightforward.

Denote by α(G; M, C), the generated AG from the DAG G and M and C two disjoint subsets of its node set that are marginalized over and conditioned on respectively. Here we do not go through the generating process of AGs from DAGs; for more details, see Richardson and Spirtes (2002); Sadeghi (2013). Without explicitly using the theory of stability under marginalization and conditioning for AGs, we show the following.

Proposition 36 If J and a DAG G are faithful then α(J ; M, C) and the ancestral graph α(G; M, C) are faithful.

Proof By Proposition 35, we only need to show that w.r.t. the ordering associated with α(G; M, C), condition 2 of Theorem 17 is also satisﬁed by α(J ; M, C). To prove ordered upward-stability, assume that i, j | C {k} / α(J ; M, C), where k is an anterior of a node l, l {i, j} or connected by lines to l, l C in H = α(G; M, C). We have that i, j | C C {k} / J . Since J and G are faithful, there is a connecting walk π between i and j given C C {k}. If k is not on π, π is connecting given C C . Otherwise, it holds that k must be a collider on π. By Lemma 2 in Sadeghi (2013), k is in ancestor of a node l {i, j} C C . By replacing the subwalk of π from l to k by the directed path from k to l if l {i, j}, or by adding the directed path from k to l plus the reverse from l to k to π if l C C , we conclude that i, j | C C / J . Hence, i, j | C / α(J ; M, C). To prove ordered downward-stability, assume that i, j | C \ {k} / α(J ; M, C), where k C is not an anterior of any node l {i, j} C in H. We have that i, j | C C \{k} / J . Since J and G are faithful, there is a connecting walk π between i and j given C C \{k}. Node k cannot be on π since k must be a non-collider and consequently an ancestor of a node in {i, j} C C in G, which by Lemma 3 in Sadeghi (2013), implies that k an(l) in H. Hence, k is not on π, and consequently π is connecting given C C . We conclude that i, j | C C / J . Hence, i, j | C / α(J ; M, C).

This shows that, for every AG, there is a distribution faithful to it since, for every AG, there is a DAG that, after marginalization and conditioning, results in the given AG.

8. Summary and Future Work

We provided suﬃcient and necessary conditions for an independence model, and consequently a probability distribution, to be faithful to a chain mixed graph. All the deﬁnitions and results concerning independence models are true for probabilistic independence models. The conditions can be divided into two main categories: 1) Those that ensure that the distribution is Markov to a graph whose skeleton is the same as the skeleton of the distribution these properties are namely intersection, composition, and ordered upwardand downward-stabilities. If the distribution is already pairwise Markov to the graph then intersection and composition suﬃce. Ordered upwardand downward-stabilities direct the edges of the skeleton of the distribution so that the pairwise Markov property is satisﬁed.

Faithfulness of Probability Distributions and Graphs

2) Those that ensure that a Markov distribution is also faithful to the graph. These are singleton-transitivity and ordered upwardand downward-stabilities. The type of preordering or preorderings w.r.t. which the distribution satisﬁes ordered upwardand downward-stabilities implies the type or types of graphs that are faithful to the distribution. In order to only deal with simpler classes of graphs, it is enough to search through a more reﬁned set of preorderings determined by the conditions that graphs of a speciﬁc class must satisfy. It is indeed true that when the skeleton of the graph or distribution is known, the preordering is equivalent to diﬀerent ways the edges of the skeleton can be directed, but in principle the preordering of the variables is unrelated to the skeleton of the distribution. The conditions of faithfulness provided in this paper for probability distributions are suﬃcient and necessary. Thus, based on these conditions many families of distributions can be shown not to be suitable for graphical modeling. This also shows the importance of Gaussian distributions in graphical models as the provided conditions (other than ordered upwardand downward-stabilities) are clearly satisﬁed by the regular Gaussian distributions. In addition, these conditions help those who devise new parametric models to ensure that their models satisfy the provided conditions for faithfulness. It is clearly important to study the characteristics of distributions that satisfy these conditions. The intersection property is well-studied and, in addition to positivity of the density that has been known to be a suﬃcient condition, necessary and suﬃcient conditions for a probabilistic independence to satisfy this property are known; see Peters (2015). For composition and singleton-transitivity, more studies are needed; in particular, it would be worthwhile to ﬁnd simple suﬃcient conditions for probability distributions in order to satisfy these properties. Another approach is to devise algorithms that test whether these conditions are satisﬁed given data or an independence model. Indeed one can argue for an alternative approach to determine whether a distribution is faithful to some graph: one ﬁrst applies the structure learning algorithm (taking as input the distribution and outputting a graph) and then one tests whether the separation relationships in the found graph correspond to the conditional independencies in the distribution. However, testing our conditions are much more eﬃcient. As an example, in the case of undirected graphs and Gaussian distributions, by our results, one only needs to test upward-stability for non-adjacent single elements i and j (starting with a minimal node cut-set as the conditioning set) rather than testing all global conditional independencies implied by the graph. As another algorithmic consequence that the results in this paper may prompt, they lead to a constraint-based structural learning algorithm (as opposed to score-based, for example, Chickering 2003) most similar to the Fast Causal Inference (FCI) algorithm (Spirtes et al., 2000) it ﬁnds the skeleton of the graph and then explores directions of the edges of the graph on the skeleton. Directing the edges of the skeleton is performed based on an exploitation of the preordering of the variables (and consequently the nodes of the graph) so that w.r.t. such a preordering ordered upwardand downward-stabilities are satisﬁed. This algorithm would have two advantages over the known constraint-based algorithms in the literature: Firstly, it is devised for a larger class of An Gs, which can be thought of as learning the LWF chain graphs with hidden and selection variables; see Sadeghi (2016). Secondly, the exploitation of ordered upwardand downward-stability for structural learn-

ing can be combined with actually testing whether these conditions are satisﬁed, and the theory in the paper ensures the faithfulness of the generated graphs. We plan on providing the details of this algorithm elsewhere. When dealing with data, another issue is that obtaining a faithful graph is very sensitive to hypothesis testing errors for inferring the independence statements from data; see Uhler et al. (2013). The concept of strong faithfulness (Zhang and Spirtes, 2003) has been suggested for the Gaussian case in order to deal with this issue. Although strong faithfulness, by nature, does not seem to be characterizable in this way, it is an interesting problem to study which of the equivalent conditions to faithfulness are more prone to such sensitivity.

Acknowledgments

The original idea of this paper was inspired by discussions at the American Institute of Mathematics workshop Positivity, Graphical Models, and the Modeling of Complex Multivariate Dependencies in October 2014. The author thanks the AIM and NSF for their support of the workshop and is indebted to other participants of the discussion group, especially to Shaun Fallat, Steﬀen Lauritzen, Nanny Wermuth, and Piotr Zwiernik for followup discussions. The author is also grateful to Rasmus Petersen for valuable discussions. This work was partly supported by grant #FA9550-12-1-0392 from the U.S. Air Force Ofﬁce of Scientiﬁc Research (AFOSR) and the Defense Advanced Research Projects Agency (DARPA).

R. Ayesha Ali, Thomas S. Richardson, and Peter Spirtes. Markov equivalence for ancestral graphs. Annals of Statistics, 37(5B):2808 2837, 2009.

Steen A. Andersson, David Madigan, and Michael D. Perlman. Alternative Markov properties for chain graphs. Scandinavian Journal of Statistics, 28:33 85, 2001.

David Maxwell Chickering. Optimal structure identiﬁcation with greedy search. Journal of Machine Learning Research, 3:507 554, 2003.

Peter Cliﬀord. Markov random ﬁelds in statistics. In G. R. Grimmett and D. J. A. Welsh, editors, Disorder in Physical Systems: A Volume in Honour of John M. Hammersley, pages 19 32. Oxford University Press, 1990.

David R. Cox and Nanny Wermuth. Linear dependencies represented by chain graphs (with discussion). Statistical Science, 8(3):204 218; 247 277, 1993.

John N. Darroch, Steﬀen L. Lauritzen, and Terry P. Speed. Markov ﬁelds and log-linear interaction models for contingency tables. Annals of Statistics, 8:522 539, 1980.

A. Philip Dawid. Conditional independence in statistical theory (with discussion). Journal of the Royal Statistical Society Series B, 41(1):1 31, 1979.

Mathias Drton and Thomas S. Richardson. Binary models for maginal independence. Journal of the Royal Statistical Society Series B, 41(2):287 309, 2008.

Faithfulness of Probability Distributions and Graphs

Shaun Fallat, Steﬀen Lauritzen, Kayvan Sadeghi, Caroline Uhler, Nanny Wermuth, and Piotr Zwiernik. Total positivity in Markov structures. Annals of Statistics., 45(3), 2017.

Morten Frydenberg. The chain graph Markov property. Scandinavian Journal of Statistics, 17:333 353, 1990.

Dan Geiger, Thomas S. Verma, and Judea Pearl. Identifying independence in Bayesian networks. Networks, 20:507 534, 1990.

Clark Glymour, Richard Scheines, Peter Spirtes, and Kevin Kelly. Discovering Causal Structure. Academic Press, 1987.

Samuel Karlin and Yosef Rinott. Classes of orderings of measures and related correlation inequalities. I. Multivariate totally positive distributions. Journal of Multivariate Analysis, 10(4):467 498, 1980.

Samuel Karlin and Yosef Rinott. M-matrices as covariance matrices of multinormal distributions. Linear Algebra and its Applications, 52:419 438, 1983.

G oran Kauermann. On a dualization of graphical Gaussian models. Scandinavian Journal of Statistics, 23(1):105 116, 1996.

Harri Kiiveri, Terry P. Speed, and J. B. Carlin. Recursive causal models. Journal of the Australian Mathematical Society, Series A, 36:30 52, 1984.

Steﬀen L. Lauritzen. Graphical Models. Clarendon Press, Oxford, United Kingdom, 1996.

Steﬀen L. Lauritzen and Kayvan Sadeghi. Unifying Markov properties for graphical models. Annals of Statistics, To appear, 2017. URL https://arxiv.org/abs/1608.05810.

Steﬀen L. Lauritzen and David J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B, 50:157 224, 1988.

Steﬀen L. Lauritzen and Nanny Wermuth. Graphical models for association between variables, some of which are qualitative and some quantitative. Annals of Statistics, 17:31 57, 1989.

Radim Lnˇeniˇcka and Frantiˇsek Mat uˇs. On Gaussian conditional independence structures. Kybernetika, 43(3):327 342, 2007.

Christopher Meek. Strong completeness and faithfulness in Bayesian networks. In Proceedings of the Eleventh Conference on Uncertainty in Artiﬁcial Intelligence, UAI 95, pages 411 418, San Francisco, CA, USA, 1995.

Richard E. Neapolitan. Learning Bayesian Networks. Prentice Hall, Upper Saddle River, NJ, USA, 2004.

Jose M. Pe na. Faithfulness in chain graphs: The discrete case. International Journal of Approximate Reasoning, 50:1306 1313, 2009.

Jose M. Pe na. Faithfulness in chain graphs: The Gaussian case. In Proceedings of the 14th International Conference on Artiﬁcial Intelligence and Statistics (AISTATS 2011), volume 15, pages 588 599. JMLR.org, 2011.

Jose M. Pe na. Marginal AMP chain graphs. International Journal of Approximate Reasoning, 55:1185 1206, 2014.

Judea Pearl. Probabilistic Reasoning in Intelligent Systems : networks of plausible inference. Morgan Kaufmann Publishers, San Mateo, CA, USA, 1988.

Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, New York, NY, USA, 2nd edition, 2009.

Judea Pearl and Azaria Paz. Graphoids: A Graph-based Logic for Reasoning about Relevance Relations. University of California (Los Angeles). Computer Science Department, 1985.

Jonas Peters. On the intersection property of conditional independence and its application to causal discovery. Journal of Causal Inference, 3:97 108, 2015.

Joseph Ramsey, Jiji Zhang, and Peter Spirtes. Adjacency-faithfulness and conservative causal inference. In UAI. AUAI Press, 2006.

Thomas Richardson. Markov properties for acyclic directed mixed graphs. Scandinavian Journal of Statistics, 30(1):145 157, 2003.

Thomas S. Richardson and Peter Spirtes. Ancestral graph Markov models. Annals of Statistics, 30(4):962 1030, 2002.

Kayvan Sadeghi. Stable mixed graphs. Bernoulli, 19(5(B)):2330 2358, 2013.

Kayvan Sadeghi. Marginalization and conditioning for LWF chain graphs. Annals of Statistics, 44(4):1792 1816, 2016.

Kayvan Sadeghi and Steﬀen L. Lauritzen. Markov properties for mixed graphs. Bernoulli,, 20(2):676 696, 2014.

Bernd S.W. Schr oder. Ordered Sets: An Introduction. Birkh auser, Boston, 2003.

De W. Soh and Sekhar C. Tatikonda. Testing unfaithful Gaussian graphical models. In Z. Ghahramani, M. Welling, C. Cortes, N.d. Lawrence, and K.q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2681 2689. Curran Associates, Inc., 2014.

Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. MIT press, 2nd edition, 2000.

Daniel Steel. Homogeneity, selection, and the faithfulness condition. Minds and Machines, 16(3):303 317, 2006.

Milan Studen y. Multiinformation and the problem of characterization of conditional independence relations. Problems of Control and Information Theory, 18:3 16, 1989.

Faithfulness of Probability Distributions and Graphs

Milan Studen y. Probabilistic Conditional Independence Structures. Springer-Verlag, London, United Kingdom, 2005.

Milan Studen y and Remco R. Bouckaert. On chain graph models for description of conditional independence structures. Annals of Statistics, 26(4):1434 1495, 1998.

Caroline Uhler, Garvesh Raskutti, Peter B uhlmann, and Bin Yu. Geometry of the faithfulness assumption in causal inference. Annals of Statistics, 41(2):436 463, 2013.

Thomas Verma and Judea Pearl. On the equivalence of causal models. In Proceedings of the Sixth Conference Annual Conference on Uncertainty in Artiﬁcial Intelligence (UAI-90), pages 220 227, New York, NY, 1990.

Nanny Wermuth. Probability distributions with summary graph structure. Bernoulli, 17 (3):845 879, 2011.

Nanny Wermuth and David R. Cox. On association models deﬁned over independence graphs. Bernoulli, 4:477 495, 1998.

Nanny Wermuth and Kayvan Sadeghi. Sequences of regressions and their independences. TEST, 21(2):215 252 and 274 279, 2012.

Nanny Wermuth, Giovanni M. Marchetti, and David R. Cox. Triangular systems for symmetric binary variables. Electronic Journal of Statistics, 3:932 955, 2009.

James Woodward. Causal independence and faithfulness. Multivariate Behavioral Research, 33(1):129 148, 1998.

Jiji Zhang and Peter Spirtes. Strong faithfulness and uniform consistency in causal inference. In Proceedings of the Nineteenth Conference on Uncertainty in Artiﬁcial Intelligence, UAI 03, 2003.

Jiji Zhang and Peter Spirtes. Detection of unfaithfulness and robust causal inference. Minds and Machines, 18(2):239 271, 2008.