# generating_cpnets_uniformly_at_random__4ce27019.pdf Generating CP-Nets Uniformly at Random Thomas E. Allen University of Kentucky Lexington, Kentucky, USA teal223@g.uky.edu Judy Goldsmith University of Kentucky Lexington, Kentucky, USA goldsmit@cs.uky.edu Hayden E. Justice The Gatton Academy, WKU Bowling Green, Kentucky, USA hayden.justice259@topper.wku.edu Nicholas Mattei Data61 and UNSW Sydney, Australia nicholas.mattei@nicta.com.au Kayla Raines University of Kentucky Lexington, Kentucky, USA kayla.raines@live.com Conditional preference networks (CP-nets) are a commonly studied compact formalism for modeling preferences. To study the properties of CP-nets or the performance of CP-net algorithms on average, one needs to generate CP-nets in an equiprobable manner. We discuss common problems with na ıve generation, including sampling bias, which invalidates the base assumptions of many statistical tests and can undermine the results of an experimental study. We provide a novel algorithm for provably generating acyclic CP-nets uniformly at random. Our method is computationally efficient and allows for multi-valued domains and arbitrary bounds on the indegree in the dependency graph. 1 Introduction Modeling, capturing, and reasoning with preferences is a fundamental topic that spans artificial intelligence, including constraint programming (Rossi, Venable, and Walsh 2011), social choice (Chevaleyre et al. 2008), recommendation systems (Ricci et al. 2011), machine learning (F urnkranz and H ullermeier 2010), multi-agent systems (Goldsmith and Junker 2009), and other fundamental areas. One of the most commonly studied preference models is the conditional preference network (CP-net) (Boutilier et al. 2004). CP-nets are a factored, compact, and qualitative representation used to model, elicit, and reason about preferences. CP-nets have garnered considerable attention, particularly within the preference handling community (Domshlak et al. 2011). CP-nets have many potential and important applications automated negotiation (Aydo gan et al. 2013), interest-matching in social networks (Wicker and Doyle 2007), cybersecurity (Bistarelli, Fioravanti, and Peretti 2007), and as aggregation primitives for making group decisions (Lang and Xia 2009; Mattei et al. 2013; Xia, Conitzer, and Lang 2011), to name a few. One explanation for the popularity of CP-nets is their seemingly intuitive and visual representation of the language many of us use to describe what we want. For example, the CP-net be- Copyright c 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. low displays the statement, If the weather is fair, I prefer to go cycling, but if it is raining, I d rather play table tennis. Weather Activity Friend fair rain fair : cycling table tennis rain : table tennis cycling cycling : emily henry table tennis : henry emily Methods for generating random data have long been of interest to computer scientists Alan Turing advocated for a random number generator in the 1951 Ferranti Mark I computer (Knuth 1997) and continue to be an active topic of research. Random generation not only of numbers, but of combinatorial objects such as spanning trees and paths in directed graphs have been studied across both mathematics and computer science (Kulkarni 1990). To our knowledge, methods for generating complex preference models such as CP-nets in a uniform manner have not yet received attention. There is considerable value in being able to generate CPnets uniformly at random, including: enabling experimental analysis of CP-net reasoning algorithms, unbiased blackbox testing, effective Monte Carlo algorithms, analysis of all CPnets to better understand their properties, and simulations for decision making or social choice experiments. Complementing theoretical results with empirical experiment, whether from real data or from data generated according to a distribution, may provide a window into feasible algorithms that provide good results in practice; biased generation may heavily skew these results. Experimental research in preference handling requires the use of real-world or simulated data. Real-world data are often messy, not openly available, notoriously difficult to collect reliably, hard to interpret, and nonexistent for CP-nets (Allen et al. 2015; Mattei and Walsh 2013). Principled methods exist to generate simulated data in social choice and preference handling using generative cultures (Berg 1985; Walsh 2011; Mattei, Forshee, and Goldsmith 2012). Such cultures have their drawbacks and limitations (Regenwetter et al. 2006; Popova, Regenwetter, and Mattei 2013), but provide a first step in experimentation for fields where data are hard to gather. While generative cultures over strict, linear Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) orders are well defined in social choice, there is not an analog for preferences over more complex structures such as CP-nets. To generalize any statistical cultures used in social choice, we need to be able to generate samples uniformly at random from a specified set of CP-nets. A key idea of our method is that the structure of a CPnet is equivalent to a tuple of sets representing the parents of nodes in the network. We show how to enumerate all such dagcodes, as these tuples are known, and how to calculate the number of CP-nets the possible graphs and conditional preference tables (CPTs) that extend a partial dagcode. The resulting novel recurrence allows us to generate the graph and CPTs, node by node, such that all acyclic CPnets with a given domain size and bound on indegree are equiprobable. We first formalize definitions that we need to discuss CPnets and random generation, highlighting two problems bias and degeneracy that result from commonly used na ıve generation methods. We show how to encode and count the dependency graphs. We next show how to avoid degeneracy in the CPTs and extend the recurrence to count all combinations of CPTs that a partially specified CP-net can have. Finally, we bring these results together to create an algorithm that samples the space of CP-nets uniformly. 2 Preliminaries A preference relation is a partial order (a reflexive, antisymmetric, transitive binary relation) on a set of outcomes O, where o o means o is preferred to o . We assume O is finite and can be factored into variables V = {X1, . . . , Xn} with associated domains Dom(Xi) = {xi 1, . . . , xi d} such that O = Dom(X1) Dom(Xn).1 We assume domain sizes are homogeneous; i.e., |Dom (Xi)| = d for all Xi V. When a variable is constrained to exactly one value of its domain, we say the value has been assigned to it. We designate by Asst(U) the set of all assignments to U V. An assignment to all variables U = V designates a unique outcome o O. We denote by uxi k the combination of u Asst(U) and xi k Dom(Xi), where U {Xi} = . The symbols and \ denote set complementation and subtraction; e.g., U V \ U. For d-ary variables the total outcome space is |O| = dn; i.e., exponential space is required to store . However, since O is factored, a conditional preference network potentially provides a compact model of . Definition 1. A CP-net is a directed acyclic graph (DAG) in which each node Xi V is labeled with a conditional preference table for its variable. An edge (Xh, Xi) indicates that the preferences over Xi in depend on the value of Xh. We thus call Xh a parent of Xi. We denote by Pa(Xi) the set of all such parents. Definition 2. A conditional preference table CPT(Xi) specifies the preferences over node Xi given an assignment to its parents. Each CPT consists of rules of the form u : i specifying a linear order on Xi for all u Asst(Pa(Xi)). We use the term dependency graph to refer to the graph of a CP-net apart from its CPTs. The term DAG always refers 1The notation follows that of Boutilier et al. (2004). to a labeled directed acyclic graph. CPT(Xi|Xh = xh k) denotes all rules of CPT(Xi) of the form uxh k : i where xh k Dom(Xh), Xh Pa(Xi), u Asst(Pa(Xi)\{Xh}). We assume here that CPTs are complete, i.e., have rules for all dm assignments to parents, where m = |Pa (Xi)| is the indegree of Xi. Since the number of rules is exponential in m, we make the customary assumption that indegree is bounded by a small constant, i.e., |Pa (Xi)| c for all Xi. 3 Na ıve Generation, Bias, and Degeneracy If one wants to generate CP-nets without regard for the resulting distribution, many simple random methods exist. For example, initialize a CP-net with n nodes, no edges, and empty CPTs; choose a random subset of pairs (Xh, Xi), h < i, inserting an edge from each Xh to Xi; generate a CPT for each Xi with d|Pa(Xi)| rules, each a random permutation of the d values of Xi; and randomly permute the n labels. We suspect that something along these lines is meant when we read in papers, We generated 1000 CP-nets at random. However, the resulting distribution is biased statistically, and this bias calls into question the validity of experiments and the ensuing analysis of algorithms and methods. To understand this bias, consider the following dependency graphs and their associated CP-net count (for d = 2). 1,033,504 CP-nets Observe that for the chain-shaped graph on the left, there are just two ways to choose each of the n CPTs such that they are consistent with the dependency graph. The CPT of root E could be [e1 e2] or [e2 e1]. The other nodes, each of which has only one parent, also have two possibilities for their CPT; e.g., CPT(A) could be [b1 : a1 a2, b2 : a2 a1] or [b1 : a2 a1, b2 : a1 a2]. However, in the case of the star-shaped graph on the right, CPT(A ) has d4 = 16 rules, each with d! = 2 possible orderings. In all, over one million CP-nets have the graph on the right, while only 32 have the graph on the left. Further observe that the ratio of this imbalance increases with the domain size d. Thus, if the algorithm above in fact generated the two graphs with equal likelihood, it would grossly oversample CP-nets with the first graph, while correspondingly undersampling those with the second. However, the na ıve algorithm does not even generate the two DAGs with equal likelihood. Since there are 5! = 120 ways to permute the labels of the first DAG, but only 5 ways to permute those of the second, the star-shaped DAG on the right would be generated 24 times as often as the chainshaped DAG on the left. Despite this, the CP-nets in the starshaped case would still be greatly undersampled. A separate problem, degeneracy, can arise in assigning the CPTs, regardless of how the DAGs are generated. Example 3. Consider the following CP-net. c1d1 : a2 a1 c1d2 : a2 a1 c2d1 : a1 a2 c2d2 : a1 a2 a1d1 : b1 b2 a1d2 : b1 b2 a2d1 : b1 b2 a2d2 : b2 b1 c1 c2 c1 : d1 d2 c2 : d2 d1 The edge from D to A indicates that the preference over the values of A depends on the value of D. However, in examining the CPT of A closely, one can observe that the preference over A does not in fact depend on D. It can thus be represented by the following, simpler CP-net. c1 : a2 a1 c2 : a1 a2 a1d1 : b1 b2 a1d2 : b1 b2 a2d1 : b1 b2 a2d2 : b2 b1 c1 c2 c1 : d1 d2 c2 : d2 d1 Understanding when a CPT is degenerate is crucial to generating CP-nets uniformly at random. 4 Encoding and Counting Graphs To facilitate unbiased generation, we model the dependency graphs of CP-nets as dagcodes (Steinsky 2003), inspired by Pr ufer codes for labeled trees (Kreher and Stinson 1999). We first treat the dagcode as an abstraction and then show how it relates to the dependency graph. Definition 4. A dagcode A = A1, . . . , An 1 is a tuple of n 1 subsets Aj {1, . . . , n} that satisfy the cardinality constraint | k j Ak| j for all j, 1 j < n. Observe from Def. 4 that tuples {1}, {1, 3} and {3}, are valid dagcodes (n = 3), but {1, 2}, and , {1, 2, 3} are not, since each violates the cardinality constraint. Steinsky (2003) proved that dagcodes correspond one-to-one with DAGs and described efficient algorithms for converting dagcodes to DAGs (see Alg. 1) and vice versa. Applied to CP-nets, each subset Aj {1, . . . , n} in the dagcode corresponds to the parents of some node Xi in the dependency graph: i.e., h Aj = Xh Pa(Xi). Note that the root node with the smallest label is implicit; informally, it is helpful to consider every dagcode as having an implicit element A0 . The order in which the remaining n 1 parent sets Pa(Xi) occur in the dagcode depends on the order of the child node Xi with respect to other nodes in the graph and the relative size of node label i, as follows: 1. If Xh is an ancestor of Xi in the DAG, the encoded parent set Pa(Xh) is ordered before Pa(Xi) in the dagcode. 2. If h < i and Xh is neither an ancestor nor a descendant of Xi, then Pa(Xh) is ordered before Pa(Xi). Example 5. The dagcode {1}, {1, 3} corresponds to a DAG with n = 3 nodes depicted below. The sets {1} and {1, 3} indicate that one node has parent X1 and another has parents X1 and X3; the third (implicit) node is a root. The mapping from parent sets to their children can be recovered from DAGCODE-TO-DAG (Alg. 1) (Steinsky 2003, adapted) working right to left as follows: A2 = {1, 3} corresponds to the parents of X2 since 2 is the largest unassigned label not in {1} {1, 3}. A1 = {1} corresponds to the parents of X3 since 3 is the largest unassigned label not in {1}. The remaining root node is X1. Observe that a DAG has bounded indegree c iff |Aj| c for all Aj in the corresponding dagcode: every node Xi in the DAG corresponds to the parent set of an element Aj in the dagcode, with the exception of a root with indegree 0. Our generation method depends on counting the number of extensions to a partially specified graph. Consider a partial encoding A<3 = {1}, {2}, of a graph with n = 4 nodes and bound c = 1 on indegree. Here the could be any subset of {1, 2, 3, 4} of cardinality 0 or 1 such that the resulting dagcode is valid, viz., , {1}, {2}, {3}, or {4}. Generalizing, let A j and all q . We will show that the resulting count for an,c(j, q) is also correct. Observe that, whatever the size of set U V , the loop at Line 4 iterates over the q s ways to choose s elements from U. Similarly, the loop at Line 5 iterates over the n q t ways to choose t elements from U. Note that the number of DAGs generated in the body of the outermost loop depends on s and t, which differ on each iteration. Thus, for all (s, t) as defined in Line 3, we take the sum of the DAGs generated in the loop body, obtaining the result given in Eq. 1. Finally, observe that all dagcodes parameterized by n, c extend the fully unspecified dagcode A<1 = , . . . , , for which j = 1 and q = 0. Thus, an,c = an,c(1, 0). 5 Counting and Generating the CPTs We can generalize the notion of a degenerate CPT introduced in Section 3 with the help of a bijection with discrete multi-valued functions. We model each CPT(Xi) as a function Fj : {0, . . . , d 1}m {0, . . . , d! 1}. The inputs correspond to the values of the m parents of Xi. The output corresponds to one of the d! orderings on the domain of Xi. ALL-DAGS( n, c, j, q, U, A 1. Theorem 8. φd(m) = d!dm Proof. Each rule of CPT(Xi) specifies one of d! linear orders of Dom(Xi). The number of rules is |Asst (Pa(Xi))| = dm, where m = |Pa (Xi)|. Since each rule can be assigned independently, φd(m) = d!dm. BUILD-CP-NET( A, F ) Input: A = A1, . . . , An 1 dagcode defining graph F = F0, . . . , Fn 1 cpt-code defining CPTs Output: the corresponding CP-net N 1: n length(A) + 1; Q {1, . . . , n} 2: initialize CP-net N with n nodes, no edges, empty CPTs 3: for j n 1 downto 1 do 4: i max(Q \ j k=1 Ak) 5: for all h Aj do 6: insert edge to Xi from its parent Xh 7: construct CPT(Xi) from Aj, Fj 8: Q Q \ {i} 9: i the only remaining element in Q 10: construct CPT(Xi) from F0 11: output CP-net N Algorithm 3: Construct CP-net from its encoding Theorem 9. ψd(m) = m k=0( 1)m k m k d!dk. Theorem 10. limm ψd(m)/φd(m) = 1. (The proofs of Thms. 9 and 10, omitted here, follow those of Hu (1968) [ 2, 10] for Boolean functions.) Theorem 11. Determining whether a CPT (resp. its corresponding function Fj) is degenerate can be conducted in time polynomial in the size of the CPT (resp. domain of Fj). Proof. (Sketch.) Recall that a CPT of Xi V is degenerate if it is vacuous in any parent Xh. It is vacuous in Xh if CPT(Xi| Xh = xh k) = CPT(Xi| Xh = xh ℓ) for all xh k, xh ℓ Dom(Xh). Note that we can check the CPT of Xi for degeneracy using an algorithm with 4 nested loops: 1. Iterate over each Xh Pa(Xi) to check whether the CPT is vacuous in Xh. 2. For each Xh, iterate over the d values of xh k Dom(Xh). 3. For each xh k, 1 < k d, iterate over the |Asst (Pa(Xi))| = d|Pa(Xi)| rules to determine whether CPT(Xi| Xh = xh k) = CPT(Xi| Xh = xh 1) in each case. 4. For each rule uxh k : i, u Asst(Pa(Xi) \ {Xh}), iterate over the d values that specify the linear order i on Xi. Let m denote the number of parents of Xi. Note that the input is dm+1, since the CPT has dm rules of length d. The nested loops require O(mdm+2) time. Thus, we can check for degeneracy in time polynomial in the size of the input. Moreover, since CPTs of d-ary nodes with m parents correspond one-to-one with functions Fj : {0, . . . , d 1}m {0, . . . , d! 1}, the proof applies also to the latter. We can leverage these results to generate non-degenerate CPTs efficiently and uniformly. For tiny values of d and m, we can choose uniformly from a modest-sized table of nondegenerate functions (e.g., ψ2(4) = 64594). For larger values, we use rejection sampling, generating a random permutation i for each assignment to parents and repeating this process in the unlikely event (e.g., < 0.0001 for m > 4 and very rapidly converging to 0 as m increases) that the result is ALL-CP-NETS( n, c, d, j, q, U, A 0 then 7: Aj S T; include Aj with A