# title__1d7b9317.pdf

The Thirty-Fourth AAAI Conference on Artiﬁcial Intelligence (AAAI-20)

An Efﬁcient Algorithm for Counting Markov Equivalent DAGs

Robert Ganian,1 Thekla Hamm,1 Topi Talvitie2

1Vienna University of Technology, Austria, Email: rganian@gmail.com, thamm@ac.tuwien.ac.at 2University of Helsinki, Finland, Email: topi.talvitie@helsinki.ﬁ

We consider the problem of counting the number of DAGs which are Markov-equivalent, i.e., which encode the same conditional independencies between random variables. The problem has been studied, among others, in the context of causal discovery, and it is known that it reduces to counting the number of so-called moral acyclic orientations of certain undirected graphs, notably chordal graphs. Our main empirical contribution is a new algorithm which outperforms previously known exact algorithms for the considered problem by a signiﬁcant margin. On the theoretical side, we show that our algorithm is guaranteed to run in polynomial time on a broad class of chordal graphs, including interval graphs.

1 Introduction A key task in causal discovery concerns the learning of directed acyclic graphs (DAGs) that encode the direct causal relationships within a set of random variables of interest. This task is complicated by the fact that different DAGs may be Markov equivalent, which means that they can represent exactly the same set of conditional independencies between the random variables. Every Markov equivalence class of DAGs can be uniquely represented by an essential graph, which is a partially directed graph in which edges whose directions are not ﬁxed by the class are represented as undirected edges (Andersson, Madigan, and Perlman 1997). If we have only observational data from the joint distribution of the random variables of interest, we can identify the essential graph, but not necessarily the correct DAG within the equivalence class represented by the essential graph. There has been extensive research devoted to exploring the DAGs within a given equivalence class: for instance, Maathuis, Kalisch and B uhlmann (2009) estimated causal effects between pairs of variables when only the essential graph is known, while Ghassami, Salehkaleybar, Kiyavash and Bareinboim (2018) considered the problem of discovering the directions of as many undirected edges in the essential graph as possible using given number of interventional experiments. The problems of counting and sampling DAGs in a Markov equivalence class are particularly central in the

Copyright c 2020, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

area of these exploration problems, because approaches to the other problems typically reduce to them (Radhakrishnan, Solus, and Uhler 2017; 2018). For this reason, various algorithms for sampling and counting DAGs in a Markov equivalence class have been proposed (Talvitie and Koivisto 2019; He, Jia, and Yu 2015; Ghassami et al. 2019; He and Yu 2016; Bernstein and Tetali 2017). The problem of counting and sampling DAGs in an equivalence class efﬁciently and directly reduces to the restricted case where the essential graph is an undirected chordal graph1 (Gillispie and Perlman 2002). In this case, the DAGs in the Markov equivalence class are exactly the acyclic orientations of the edges of the chordal graph which do not contain an immorality, that is, a situation in which a vertex has two parents that are not connected by an edge. For this reason, the DAGs are known as moral acyclic orientations (MAOs). We denote the problem of computing the number of MAOs in chordal graphs as #MAO .

Related Work. In their previous work, He, Jia and Yu (2015) formulated the so-called Root Picking algorithm for #MAO: a recurrent procedure which branches over all choices of the unique source vertex of a MAO, ﬁxes the orientations of edges which are forced by the given choice of the source, and proceeds on each connected component. Root Picking lies at the heart of the following algorithmic approaches for #MAO: He and Yu (2016) enhanced Root Picking algorithm by identifying and extracting so-called core graphs to handle dense subinstances better. Ghassami et al. (2019) showed that the root-picking algorithm admits a runtime upper bound of nΔ+O(1) on n-vertex graphs of maximum degree Δ. Using dynamic programming on a structural decomposition of the graph, namely its clique-tree, rather than building on Root Picking, Talvitie and Koivisto (2019) showed that if the size of the largest clique in a chordal graph is k, one can count its MAOs in O(2kk!k2n) time. These counting methods can also be adapted to solve the sampling problem such that after running the counting algorithm, we can obtain uniform samples from the set of MAOs in polynomial time per sample by retracing the computation that gave the number of MAOs. In particular, we can sam-

1A graph is chordal if it admits a special tree-like representation called a clique-tree.

ple a MAO of H by ﬁrst sampling its root with probability proportional to the number of MAOs of H with that root, then inferring the orientations arising from this, and ﬁnally recursively sampling a MAO for each of the remaining undirected components. To determine the appropriate probabilities, we can invoke Get Part Sol, in an analogous manner as in previous related works (see, e.g., the work of Ghassami et al. (2019)).

Contribution. Our main contribution is a new algorithm, Anon MAO, which solves #MAO. Anon MAO enhances Root Picking with a vertex anonymisation technique that reduces the number of generated subinstances which need to be solved throughout the recursion. We remark that, like the algorithms mentioned as Related Work, Anon MAO can be adapted to solve the sampling problem. Speciﬁcally, if we use the alias method (Vose 1991; Walker 1977) we can uniformly sample MAOs in linear time; see also the work of Ghassami et al. (2019). On the empirical side, we show that Anon MAO vastly outperforms previously known exact algorithms for #MAO. As our main theoretical contribution, we show that Anon MAO can solve all instances of #MAO where the cliquetree of the instance has a polynomially bounded number of subtrees in polynomial time. The techniques used here are based on an in-depth and non-trivial analysis of the structure of subinstances generated by Root Picking, notably with respect to a ﬁxed clique-tree. In particular, for the most part this analysis is not speciﬁc to the formulation of Anon MAO but may also prove useful when considering the complexity of future algorithms for #MAO based on Root Picking. The property of having a clique-tree with polynomially many subtrees is satisﬁed by a wide range of chordal graphs. We also show that the polynomial bound on the number of subtrees can be equivalently expressed as a more simple condition on the degree of the nodes in the clique-tree. One special and very well-studied subclass of chordal graphs that admit clique-trees with polynomially many subtrees is the class of interval graphs (Brandst adt, Le, and Spinrad 1999); consequently, as a corollary we obtain that #MAO is polynomial-time solvable on interval graphs.

2 Preliminaries

Graphs and Orientations. A graph G is deﬁned by a ﬁnite set of vertices V (G) and a binary relation on it, i.e., a subset of V (G) V (G), describing a set of edges E(G). We distinguish between undirected edges (v, w) E(G), for which also (w, v) E(G); and directed edges (v, w) E(G), for which (w, v) / E(G). Graphs all of whose edges are undirected are called undirected themselves. Similarly graphs all of whose edges are directed are called directed. For ease of presentation we denote an edge in an undirected graph containing two vertices u and v by u v and an edge in a directed graph containing two vertices u and v in that order by u v or v u. (Note that with this notation u v = v u, but u v = v u.) A directed graph G is an orientation of a graph G if for every edge u v of G, G contains either edge u v or u v (but not both), and for every edge u v of G , G contains the edge u v or

the edge u v. A (graph) isomorphism is a bijection between the vertex sets of two graphs under which the edge relation is invariant. Graphs between which an isomorphism exists are isomorphic. An induced subgraph G[X] of a graph G is a graph deﬁned by a subset X V (G) and the edge relation E(G) restricted to X. A subgraph of G is a graph that is given by a subset X V (G) and a subset F E(G[X]). If H is isomorphic to a subgraph of G, we say G contains H. An (undirected) path is a graph v1 v2 . . . vℓ where the vi are pairwise different. More speciﬁcally the given path is also called v1-vℓ-path or path from v1 to vℓ or path between v1 and vℓ. We extend this notion to paths between a vertex and a vertex subset and paths between two vertex subsets: For X, Y a v-X-path is a v-x-path with x X; an X-Y -path is a x-Y -path with x X. A subgraph of a path that is itself a path is called subpath. Given a graph G and vertices v, w V (G), the distance dist G(v, w) between v and w in G is deﬁned as min{ℓ 1 | G contains a v-w-path with ℓvertices}. For X, Y V (G), dist G(v, X) and dist G(X, Y ) are deﬁned analogously. We deﬁne the set of neighbours of a vertex (set) in G as NG(v) = {w V (G) | dist G(v, w) = 1} (and NG(X) = {w V (G) | dist G(w, X) = 1}). A vertex is dominating if it is adjacent to all other vertices in the graph. A graph G is connected if for all v, w V (G), G contains a v-w-path. A maximal (in terms of inclusion of the vertex sets) connected induced subgraph of a graph is also called a connected component of it. An (undirected) cycle is a graph v1 v2 . . . vℓ v1 where the vi are pairwise different. An undirected graph is a tree if it is connected and contains no cycle. An (induced) subgraph of a tree that is itself a tree is called (induced) subtree. An undirected graph is chordal if, for any cycle with more than three vertices it contains, the subgraph induced by the vertices of this cycle contains a cycle with at most three vertices. A directed cycle is a graph v1 v2 . . . vℓ v1 where the vi are pairwise different. An orientation is acyclic if it contains no directed cycle.

Problem Statement. An immorality is a graph that is isomorphic to a v b. An orientation is moral if it contains no immorality as an induced subgraph. A Markov equivalence class can be set into one-to-one correspondence with a partially directed graph (its so-called essential graph (Verma and Pearl 1990)) in such a way that the DAGs in the Markov equivalence class are exactly the acyclic orientations of this graph which do not create new immoralities. What is more, such a graph is efﬁciently computable from any DAG in the Markov equivalence class (Meek 1995). It is known that removing all the directed edges of the graph results in an undirected chordal graph (Andersson, Madigan, and Perlman 1997), and each component of this graph may be oriented independently (Gillispie and Perlman 2002). Thus the problem of determining the size of a Markov equivalence class reduces to the following problem (see also the recent work of Talvitie and Koivisto (2019)):

#MAO: Given an undirected connected chordal graph (UCCG) G, count the number of MAOs of G. Note that #MAO is invariant under graph isomorphism.

The Root Picking Algorithm. He, Jia and Yu (2015) proposed a recursive algorithm for #MAO which has been used as a foundation in almost all approaches in the literature thus far. It uses the fact that any MAO of a UCCG contains a unique vertex v (called root) such that there is no edge u v in the MAO. By this observation the number of MAOs is equal to the sum, over all vertices v of the UCCG, of the number of MAOs that have v as root. Now the key is that the number of MAOs of the UCCG with v as root can be computed from the number of MAOs of certain UCCGs that are subgraphs of the original UCCG. An exact description of the procedure (without additional optimisation) which we refer to simply as Root Picking is given in Algorithm 1.

Algorithm 1: Root Picking

Input: An UCCG G Output: The number of MAOs of G

1 if |V (G)| = 1 then

4 for r V (G) do

5 Let G be a copy of G

6 for a b an edge in G with dist G(r, a) < dist G(r, b) do

7 Replace a b by a b in G

8 while G contains an induced subgraph a b c do

9 Replace b c by b c in G

10 Remove all directed edges of G

C connected component of G Root Picking(C)

12 return Sol

It is easy to see and has already been remarked by He, Jia and Yu (2015) that, except for the edges incident to the root r, all edges replaced in Line 7 of Root Picking, would also be replaced in Line 9 of Root Picking. Moreover, edges that contain the root would not be replaced in Line 9. This means it is equivalent to replace all edges containing the root r x by r x and then successively replace b c by b c for all induced a b c. In this vein we will distinguish root orientations, meaning orientations of edges containing the root, and morally propagating orientations, meaning orientations arising from Lines 8 and 9 of the algorithm. We will refer to the subgraphs C of G that are recursed on in Line 11 together with the full history which led to this recursive call. Each subinstance C is recursed on after a speciﬁc sequence of choices of roots in the nested executions of Line 4 leading to the recursive call on C in Line 11. We denote the set of these chosen roots by RC.

Optimisations and Extensions of Root Picking. Talvitie and Koivisto (2019) observed that each UCCG considered in the Root Picking algorithm is an induced subgraph of the

original UCCG. By memoizing the results, they reduced the time complexity to O(2n) and also achieved speedups in practice (Talvitie and Koivisto 2019). He and Yu (2016) extended the Root Picking algorithm to better handle dense graphs. They gave a recursive algorithm that, given a core graph (an UCCG without dominating vertices), computes a polynomial P(m) such that by adding m dominating vertices, the number of MAOs is P(m)m!.

Clique-Trees. We will use the clique-tree characterisation of UCCGs to argue about the performance of our new algorithm. A clique is a graph C in which for any two vertices v = w V (C), u v is an edge in C. A clique-tree of a graph G is a pair (T , χ) of a tree T (whose vertices we call nodes) and a map χ from V (T ) to subsets of V (G) such that for each node t V (T ), G[χ(t)] is a clique; for each vertex v V (G), T [{t V (T ) | v χ(t)}] is a connected subgraph of T with at least one vertex; and for every maximal X V (G) such that G[X] is a clique, there is exactly one t V (T ) such that χ(t) = X. A clique-tree of a chordal graph can be constructed in linear time (Blair and Peyton 1993).

3 The Algorithm: Anon MAO

Algorithm 2: Anon MAO

Input: A UCCG G Output: The number of MAOs of G

1 Let Sol[ , ] be a storage indexed by 2V (G) {0, . . . , |V (G)|}

2 Initialise all Sol[ , ] = NULL

3 return Get Part Sol(G)

Subroutine: Get Part Sol

Input: A connected induced subgraph H of G Output: The number of MAOs of H

1 if |V (H)| = 1 then

3 Let D be the set of dominating vertices in H

4 Let S = V (H) \ D and d = |D|

5 if Sol[S, d] = NULL then

6 Sol[S, d] = 0

7 for r V (H) do

8 Let H be a copy of H

9 for a b an edge in H with dist H(r, a) < dist H(r, b) do

10 Replace a b by a b in H

11 while H contains a b c as induced subgraph do

12 Replace b c by b c in H

13 Remove all directed edges of H

14 Sol[S, d] +=

C conn. comp. of H Get Part Sol(C)

15 return Sol[S, d]

Our main contribution is a new algorithm which extends Root Picking by combining the notion of cores (He and Yu 2016) with dynamic records. In this section, we present the algorithm and prove correctness. In the next section, we will then prove that the algorithm in fact runs in polynomial time on a broad subclass of chordal graphs, among others implying that #MAO is polynomial-time tractable on all interval graphs. We begin by recalling the deﬁnition of cores.

Deﬁnition 1. The core (or core graph) of a graph G is obtained from G by removing all dominating vertices, i.e., is given by G[V (G) \ {v V (G) | v is dominating in G}].

From this deﬁnition it is obvious that the core of a graph can be computed in polynomial time from the graph. Conversely, G can be reconstructed (up to isomorphism) from its core graph and the number of dominating vertices. This easy observation lies at the heart of the presented dynamic programming algorithm (Algorithm 2 along with its Subroutine Get Part Sol).

Theorem 2. Anon MAO correctly solves #MAO.

Proof. The algorithm proceeds just as Root Picking whenever Sol[S, d] = NULL in Line 5 of Get Part Sol. By initialisation of Sol this is the case for all entries of Sol at the beginning of the algorithm. Because of this and Lines 3 and 4 of Get Part Sol we can assume that an entry Sol[S, d] = NULL contains the number of MAOs of some UUCG H that is an induced subgraph of G, has d dominating vertices D and V (H)\D = S. This same number is returned when encountering an induced subgraph H of G which has d dominating vertices D and V (H ) \ D = S. H and H can be easily seen to be isomorphic by mapping D to D arbitrarily and S to S with the identity mapping. This means Get Part Sol returns the number of MAOs of H which is by isomorphism the same as the number of MAOs of H .

4 Polynomially Tractable Instances of #MAO

In this section, we will prove that Anon MAO solves #MAO in polynomial time whenever there is a polynomial bound on the number of subtrees of the clique-tree. For the following considerations, we ﬁx an UCCG G together with an associated clique-tree (T , χ). We also introduce the following notation: for a subtree T of T and v V (G), we let T (v) = {t V (T) | v χ(t)}.

Sub-blocks. A crucial notion that will help us understand the behavior of Anon MAO is that of sub-blocks. The deﬁnition of sub-blocks is motivated by the following insight into the behavior of the Root Picking algorithm.

Lemma 3. Let r = u v be an edge of G that is oriented during the Root Picking algorithm when considering some root r V (G). Then dist T (T (u), T (r)) < dist T (T (v), T (r)). Moreover, there is a node s V (T ) on the path from T (r) to T (v) such that u χ(s) and v / χ(s).

Proof. Because u = r, the orientation u v results from a non-empty sequence of morally propagating orientations following an orientation of an edge incident to the root. This

describes a path in G that starts in r and ends in u. We proceed by induction along the length of a shortest such path, i.e., a shortest path of orientations from the root to u. In base case, when the sequence consists of just one morally propagating orientation, i.e., r u is an edge in G, there has to be s V (T ) such that r, u χ(s). Similarly because u v is an edge in G there is some t V (T ) such that u, v χ(t). As u v must be replaced due to a morally propagating orientation, r v is not an edge in G and thus v / χ(s). By the properties of the clique-tree, s lies on the path from T (r) to T (v) which proves the base case. For the inductive step, assume that the claim holds whenever we have an induced subgraph w u v of G where the orientation of w u can be explained by a sequence of propagating orientations of length i 1. Consider an induced subgraph w u v of G where the orientation of u v follows from a shortest path of orientations from r to u (through w) such that this path has length i1. Now, consider the unique bag s in T (u) that is closest to T (r). By our inductive assumption about the orientation of the edge w u, w χ(s). But then s cannot contain v, since v is not adjacent to w. Since T (v) does intersect T (u) and s is the bag in T (u) closest to T (r), s must lie on the path from T (v) to T (r) in particular, it satisﬁes the conditions of the lemma.

Informally, Lemma 3 shows that every morally propagating orientation (i.e., all orientations of edges not incident to the current root) u v implies the existence of a distinguishing bag that appears in the root-direction inside the clique-tree. This motivates the following deﬁnition, which identiﬁes sub-blocks in a clique-tree which are not distinguished by such a bag (i.e., where all the bags have the same intersection with the neighbouring bag in the root direction).

Deﬁnition 4. A clique-tree sub-block is a tuple (T, Out) where T is a subtree of T and Out NT (V (T)). The body BT,Out of a clique-tree sub-block (T, Out) is the vertex set of G given by

t V (T ) χ(t) \

t Out χ(t). A clique-tree sub-block is called propagation identical with respect to some r V (G) if either t V (T) r χ(t), or t Out s, s V (T) χ(s) χ(t) = χ(s ) χ(t).

A clique-tree sub-block is propagation identical w.r.t. a set R of roots if it is a propagation identical w.r.t. each vertex in R. Intuitively, by Lemma 3, a propagation identical sub-block (or PIS, in short) (T, Out) provides a template for specifying a vertex set of G, namely its body, between whose vertices no morally propagating orientations take place. We will use propagation identicality to show that each subinstance considered by Root Picking is the body of some clique-tree sub-block, where each root picked up to that point is either a dominating vertex in the subinstance or lies in the direction of some node t Out. In combination with the following observation, this will provide us with an upper bound on the number of subinstances considered by Anon MAO.

Observation 5. The number of clique-tree sub-blocks is at most ρ2, where ρ is the number of subtrees of T .

To formalise and prove the desired claim, we ﬁrst consider subinstances containing the latest root picked separately in Lemma 6. We use this as a base for proving the general case in Lemma 7 and Lemma 8. Lemma 6. Consider a subinstance C that is recursed on during the Root Picking algorithm which arises from another subinstance C after picking a root r V (C ) such that r NG(V (C)). Then C is a connected component of G[{u V (C ) \ {r} | t V (T ) : u χ(t) r χ(t)].

Proof Sketch. Since Z = {u V (C ) \ {r} | t V (T ) u χ(t) r χ(t)} is the set of all vertices at distance 1 from r in C , it obviously contains V (C). The proof follows by showing that no edge in G[Z] can be inﬂuenced by a morally propagating orientation.

For the following lemma, it will be useful to recall that RC is the set of roots chosen by Root Picking. Lemma 7. Consider a subinstance C that is recursed on during the Root Picking algorithm applied to G. There is a PIS (T, Out) with respect to RC such that C is a connected component of G[BT,Out \ RC].

Proof Sketch. We prove the lemma by induction on the recursion depth needed to reach C as a subinstance. For the base case, notice that V (G) = BT , and G = G[BT , \ ]. Now assume that the lemma holds for all instances arising at recursion depth at most i. Let C be an arbitrary subinstance at recursion depth i, and using the induction hypothesis let (T , Out ) be a PIS with respect to RC such that G[BT ,Out ] \ RC has C as a connected component and let r V (C ) be an arbitrary vertex of C , i.e., a possible choice for a root when recursing on C . We show that for every C which occurs as subinstance at recursion depth i + 1 arising from C after picking r as the new root, C is a connected component of G[BT,Out \ RC] for some PIS (T, Out) w.r.t. RC = RC {r}. Note that any clique-tree sub-block whose tree is contained in T is trivially propagation identical w.r.t. RC

t V (T ) χ(t). For any C such that r NG(V (C)), the claim follows from Lemma 6 by setting T = T (r) and Out = Out NT (T (r)). Now, consider a subinstance C arising from the case where r NG(V (C)). We now proceed in two steps. 1. We show that that V (C) is contained in some BT,Out\RC for some PIS (T, Out). To this end, let TC be an inclusionminimal connected subtree TC of T such that V (C)

t V (TC) χ(t). Let t NT (TC) be a node with minimum dist T (t, T (r)). If s, s V (TC) χ(s) χ(t) = χ(s ) χ(t) and V (C) χ(t) = , we set (T, Out) to (TC, (Out NT (TC)) {t}). Obviously with this choice V (C) BT,Out and it remains to show that (T, Out) indeed is propagation identical. This is the case, because for s, s V (TC) V (T ) it holds that for any t Out that χ(s) χ(t ) = χ(s ) χ(t ), and by assumption on t, s, s V (TC) χ(s) χ(t) = χ(s ) χ(t). Otherwise, we proceed iteratively by extending TC by t, considering the next t NT (TC) minimising dist T (t, T (r)) for the updated TC, and checking the condition again. It can be shown that this results in a node t

and subtree TC satisfying s, s V (TC) χ(s) χ(t) = χ(s ) χ(t), and we can proceed as per the previous case. 2. We conclude the proof by showing that no edge in G[BT,Out \ RC] for these BT,Out is directed during Root Picking in the branch of root choices that led to C. This sufﬁces as C is, as a subinstance of Root Picking, a connected component that arises after removing the edges directed due to previous root choices.

The next lemma provides the core statement we need to argue the desired runtime bounds for Anon MAO. In essence, it shows that every subinstance called by Anon MAO will occur as the body of a clique-tree sub-block.

Lemma 8. Consider a subinstance C that is recursed on during the Root Picking algorithm applied to G. There is a clique-tree sub-block (T, Out) such that C = G[BT,Out \ RC].

Proof Sketch. Due to Lemma 7, there is a PIS (T , Out ) w.r.t. RC such that C is a connected component of G[BT ,Out \RC]. As C is connected in G and a subgraph of G[BT ,Out ] we can ﬁnd an inclusion-maximal subtree TC of T such that V (C)

t V (TC) χ(t) and for all t V (TC), χ(t) V (C) = . By using the propagation identicality of (T , Out ) (which ensures that vertices in TC intersect with Out in the same way as T ), it can be shown that setting Out = (Out NT (V (TC))) NT (V (TC)) yields C = BTC,Out, concluding the proof.

Picked Roots. As we have seen in the preceding subsection, we can identify subinstances of the Root Picking algorithm by the bodies of clique-tree sub-blocks, with the caveat that we lose information about the precise identities of the picked roots. In this subsection we take care of this issue by ﬁxing choices for the roots that can be seen as locally isomorphic to whichever roots were actually chosen by Root Picking. Assume that the vertices of G are ordered in an arbitrary way whenever we speak about vertices being smaller or minimal , we understand these terms in regard to this ordering. A clique-tree sub-block (T, Out) together with a number d {0, . . . , |BT,Out|} induces a subgraph G(T, Out, d) of G in the following way. First, we let D(T, Out, d) = {v1, . . . , vd BT,Out | vi is minimal dominating for BT,Out \ {v1, . . . , vi 1} (D(T, Out, d) will serve as our locally isomorphic choice for the previously picked roots). Then we set V (T, Out, d) = BT,Out \ D(T, Out, d) and G(T, Out, d) = G[V (T, Out, d)].

Lemma 9. Consider a subinstance C that is recursed on during the Root Picking algorithm applied to G. There is a clique-tree sub-block (T, Out) and a number d such that C is isomorphic to G(T, Out, d).

Proof Sketch. By Lemma 8, there is a clique-tree sub-block (T, Out) such that C = G[BT,Out \RC]. Let ℓ= |BT,Out RC|. It is not difﬁcult to show that G[BT,Out \ RC] is isomporphic to G(T, Out, ℓ), and the lemma follows.

Anon MAO Root Picking Dynamic Programming

He and Yu 2016 Dynamic Programming in Tree Decomposition

Running time

n = 32 n = 64

Running time

n = 128 n = 256

2 3 4 6 8 12 16 24 32 Density r

Running time

2 3 4 6 8 12 16 24 32 Density r

Figure 1: The running times of the algorithms for different numbers of vertices n as functions of the density parameter r. Both axes are logarithmic. The results from 50 repetitions with different inputs for each algorithm, n and r are summarised using a box plot: the box shows the range between the ﬁrst and third quartile, and the whiskers show the minimum and maximum running times.

Relating G(T, Out, d) and Anon MAO. With all the components in place, now it just remains to formalise the link between the records G(T, Out, d) introduced in this section and the instances considered by Anon MAO.

Lemma 10. Let (S, d) be such that there is a subinstance C of Root Picking applied to G with d dominating vertices and G[S] as core. Then there is a clique-tree sub-block (T, Out) and some d {0, . . . , |BT,Out|} such that G(T, Out, d ) has d dominating vertices and G[S] as core.

Proof. By Lemma 9, there is a clique-tree sub-block (T, Out) and some d {0, . . . , |BT,Out|} such that C is isomorphic to G(T, Out, d ). It is easy to see that, because G(T, Out, d) and C are isomorphic, they have the same number of dominating vertices. This means G(T, Out, d ) has d dominating vertices. By choice of (T, Out) in the proof of Lemma 9, we can also assume that C = G[BT,Out \ RC]. Hence V (C) = BT,Out \ RC. By deﬁnition V (G(T, Out, d )) = V (T, Out, d ) = BT,Out \D(T, Out, d ). We show that any v V (C) \ V (G(T, Out, d )) is dominating in C, and conversely any v V (G(T, Out, d )) \ V (C) is dominating in V (G(T, Out, d )) which implies the lemma: Let v V (C) \ V (G(T, Out, d )) = (BT,Out \ RC) D(T, Out, d ). By deﬁnition of D(T, Out, d ), v w is an edge in V (C) for every w V (C) (BT,Out \ D(T, Out, d )) = V (C) \ D(T, Out, d ). For w V (C) D(T, Out, d ), either v was included into D(T, Out, d ) with a smaller index than w, in which case v dominates w, or vice versa. In any case v is dominating in C. Conversely let v V (G(T, Out, d ))\V (C) = (BT,Out\ D(T, Out, d )) RC. By propagation identicity with respect to RC

t V (T ) χ(t), for s, s V (T), RC χ(s) = RC χ(s ). Thus for all t V (T), v χ(t) which implies v is dominating in G(T, Out, d ).

We can now present our main theoretical contribution.

Theorem 11. On graphs accompanied by a clique-tree such that the tree has polynomially many, say p(|V (G)|), subtrees, Anon MAO runs in polynomial time.

Proof. Let G be such a graph and (T , χ) be such a cliquetree of G. Lemma 10 and its proof provide a surjection from the set of clique-tree sub-blocks and natural numbers bounded by the cardinality of their bodies to the set of core-dominating vertex pairs for subinstances considered by Anon MAO. Thus Anon Mao distinguishes at most p(|V (G)|)2 |V (G)| subinstances using Observation 5.

Since interval graphs can be characterised as chordal graphs whose clique-tree is a path (Ibarra 2009; Fulkerson and Gross 1965), we obtain the following corollary.

Corollary 12. Anon MAO runs in polynomial time on interval graphs.

5 On Clique-Trees with Few Subtrees We consider now brieﬂy the graph class for which we have shown #MAO to be polynomial time solvable. One can reformulate the condition of a clique-tree having a polynomially bounded number of subtrees as a more local condition on the clique-tree. Among others, this allows us to check in linear time whether a clique-tree has a desired polynomial bound on the number of subtrees or not. For a tree T, let h T denote the number of vertices with degree at least three, which we refer to as high-degree vertices, ΔT denote the maximum degree of a vertex in T, and s T denote the number of subtrees of T. Observation 13. For any tree T, s T |V (T)|h T ΔT . Lemma 14. For any sequence of (hn)n N and (Δn)n N such that there is a sequence of trees (Tn)N such that |V (Tn)| = n, h Tn = hn and ΔTn = Δn, there is a sequence of such (Tn)n N and s Tn

max 2Δn, 2hn, n hn 2 hn (Δn 2) + 1 hn (Δn 2) .

From this, we can show: Corollary 15. For a general class of trees T, each s T can be polynomially (in |V (T)|) upper-bounded for all T T, if and only if h T , ΔT < c for a constant c for all T T.

Proof Sketch. The condition is clearly sufﬁcient by Observation 13. It is also necessary: By Lemma 14, T may contain a series of trees T with arbitrarily large |V (T)| such that

s T max 2ΔT , 2h T , |V (T )| h T 2

h T (ΔT 2) + 1 h T (ΔT 2) .

This statement shows that we can express our requirement for polynomial-time solvability (i.e., that the clique-trees have at most polynomially many subtrees) equivalently by requiring that the clique-trees have at most a constant number of high-degree vertices and maximum vertex degree. We remark that one can also show a similar relationship to the number of leaves in the clique-tree.

6 Experimental Results We compared the practical performance of the Anon MAO algorithm to the state-of-the-art algorithms, notably: Root Picking Dynamic Programming: The root picking algorithm due to He, Jia, and Yu (2015) with the dynamic programming speedup (Talvitie and Koivisto 2019). He and Yu 2016: The algorithm by He and Yu (2016) based on recursively computing polynomials P(m) for core graphs such that P(m)m! is the number of MAOs when we add m dominating vertices. Dynamic Programming in Tree Decomposition: The algorithm by Talvitie and Koivisto (2019) based on dynamic programming on the clique-tree with time-complexity in O(2kk!k2n) parameterised by the treewidth k. We implemented2 Anon MAO using C++, and for the other algorithms, we used the C++ implementations of Talvitie

2github.com/ttalvitie/efﬁcient-markov-equivalent-dag-counting

and Koivisto (2019). All the implementations use only one thread of execution and compute the result exactly. We use the experimental setup formulated for this problem by He, Jia, and Yu (2015), in which we run the algorithms on randomly generated UCCGs instances with given number of vertices n and given density parameter r. The instances are generated by ﬁrst generating a tree of n vertices by successively adding each vertex as a neighbour to a randomly chosen previously added vertex, and then, as long as the graph has less than rn edges, repeatedly choosing a random pair of elements and adding an edge between them if the resulting graph is chordal. For each number of vertices 32 n 1024 and density parameter 2 r 32, we generated 50 UCCG instances, and ran each algorithm for each instance. The time limit was set to 10 minutes and the memory limit to 32 gigabytes. The results are shown in Figure 1. From the results we see that Anon MAO can handle much denser instances than the other algorithms, and it typically is the fastest algorithm except in very sparse cases, where the tree decomposition-based dynamic programming algorithm is faster.

7 Conclusion We presented Anon MAO as a new exact algorithm for #MAO. Anon MAO is based on a simple dynamic programming enhancement of Root Picking in which dominating vertices are anonymised. Our empirical analysis signiﬁes the superiority of Anon MAO compared to previously implemented exact approaches in terms of time performance on most instances. Our theoretical analysis even shows polynomial-time complexity on a large class of graphs. However one can construct a series of instances (Gn)n N, for which Anon MAO recurses on an exponential number of subinstances: V (Gn) = {v1, . . . , vn}; for 1 i < j n/2, vi vj is an edge in G; and for n/2 + 1 i n and j {1, . . . , n/2} \ {i n/2}, vi vj is an edge in G. The (for these examples unique) clique-tree of each Gn is given by Tn, where each Tn consists of n/2 + 1 nodes, t, t1, . . . tn/2 such that t ti for 1 i n/2 is an edge in Tn. Note that each Tn has O(2 n 2 ) subtrees. One can show that for Gn Anon MAO considers O(2 n 2 ) instances. The above example shows that the complexity of #MAO on general UCCGs remains an interesting open problem. We believe our analysis of Anon MAO may also provide useful structural insights for investigating the problem on general UCCGs, primarily in the direction of tractability.

Acknowledgments. Robert Ganian and Thekla Hamm acknowledge support from the Austrian Science Fund (FWF, Project P31336: NFPC). Thekla Hamm is co-funded by FWF project W1255-N23.

References Andersson, S. A.; Madigan, D.; and Perlman, M. D. 1997. A characterization of markov equivalence classes for acyclic digraphs. Ann. Statist. 25(2):505 541.

Bernstein, M., and Tetali, P. 2017. On sampling graphical Markov models. Ar Xiv e-prints 1705.09717. Blair, J. R. S., and Peyton, B. 1993. An introduction to chordal graphs and clique trees. IMA Volumes in Mathematics and its Applications 56:1 29. Brandst adt, A.; Le, V. B.; and Spinrad, J. P. 1999. Graph Classes: A Survey. Society for Industrial and Applied Mathematics. Fulkerson, D. R., and Gross, O. A. 1965. Incidence matrices and interval graphs. Paciﬁc J. Math. 15(3):835 855. Ghassami, A.; Salehkaleybar, S.; Kiyavash, N.; and Bareinboim, E. 2018. Budgeted experiment design for causal structure learning. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsm assan, Stockholm, Sweden, July 10-15, 2018, 1719 1728. Ghassami, A.; Salehkaleybar, S.; Kiyavash, N.; and Zhang, K. 2019. Counting and sampling from markov equivalent dags using clique trees. In The Thirty-Third AAAI Conference on Artiﬁcial Intelligence, AAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019., 3664 3671. Gillispie, S. B., and Perlman, M. D. 2002. The size distribution for markov equivalence classes of acyclic digraph models. Artif. Intell. 141(1/2):137 155. He, Y., and Yu, B. 2016. Formulas for counting the sizes of Markov equivalence classes of directed acyclic graphs. Ar Xiv eprints 1610.07921. He, Y.; Jia, J.; and Yu, B. 2015. Counting and exploring sizes of Markov equivalence classes of directed acyclic graphs. Journal of Machine Learning Research 16:2589 2609. Ibarra, L. 2009. The clique-separator graph for chordal graphs. Discrete Applied Mathematics 157(8):1737 1749. Maathuis, M. H.; Kalisch, M.; and B uhlmann, P. 2009. Estimating high-dimensional intervention effects from observational data. Ann. Statist. 37(6A):3133 3164. Meek, C. 1995. Causal inference and causal explanation with background knowledge. In Proceedings of the 11th Conference on Uncertainty in Artiﬁcial Intelligence (UAI), 403 410. Radhakrishnan, A.; Solus, L.; and Uhler, C. 2017. Counting markov equivalence classes by number of immoralities. In Proceedings of the Thirty-Third Conference on Uncertainty in Artiﬁcial Intelligence, UAI 2017, Sydney, Australia, August 11-15, 2017. Radhakrishnan, A.; Solus, L.; and Uhler, C. 2018. Counting markov equivalence classes for DAG models on trees. Discrete Applied Mathematics 244:170 185. Talvitie, T., and Koivisto, M. 2019. Counting and sampling markov equivalent directed acyclic graphs. In The Thirty-Third AAAI Conference on Artiﬁcial Intelligence, AAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019., 7984 7991. Verma, T., and Pearl, J. 1990. Equivalence and synthesis of causal models. In Proceedings of the 6th Annual Conference on Uncertainty in Artiﬁcial Intelligence (UAI), 255 270. Vose, M. D. 1991. A linear algorithm for generating random numbers with a given distribution. IEEE Trans. Software Eng. 17:972 975. Walker, A. J. 1977. An efﬁcient method for generating discrete random variables with general distributions. ACM Trans. Math. Softw. 3(3):253 256.