# The Complexity of Learning Acyclic CP-Nets

Eisa Alanazi, Malek Mouhoub and Sandra Zilles
Department of Computer Science, University of Regina, Regina, SK, Canada
{alanazie,mouhoubm,zilles}@cs.uregina.ca

Abstract. Learning of user preferences has become a core issue in AI research. For example, recent studies investigate learning of Conditional Preference Networks (CP-nets) from partial information. To assess the optimality of learning algorithms, as well as to better understand the combinatorial structure of CP-net classes, it is helpful to calculate certain learning-theoretic information complexity parameters. This paper provides theoretical justification for exact values (or in some cases bounds) of some of the most central information complexity parameters, namely the VC dimension, the (recursive) teaching dimension, the self-directed learning complexity, and the optimal mistake bound, for classes of acyclic CP-nets. We further provide an algorithm that learns tree-structured CP-nets from membership queries. Using our results on complexity parameters, we can assess the optimality of our algorithm as well as that of another query learning algorithm for acyclic CP-nets presented in the literature.

## 1 Introduction

Since preference learning is important in many AI applications, there is a need for a strong theoretical underpinning of research on this topic. In recent years, substantial advances have been made in this field, for example in the design of Conditional Preference Networks (CP-nets) [Boutilier et al., 2004] and the study of their learnability. A CP-net is a compact preference representation for multi-attribute domains in which the preference over one attribute may depend on the values of other attributes. Koriche and Zanuttini (2010) investigated query learning of bounded acyclic CP-nets (i.e., with a bound on the number of attributes on which the preferences for any attribute may depend). Their successful algorithms used both membership and equivalence queries, cf. [Angluin, 1988], while they proved that equivalence queries alone are not sufficient for efficient learnability. CP-nets have also been studied in models of passive learning from examples, both for batch learning [Dimopoulos et al., 2009; Lang and Mengin, 2009; Liu et al., 2014] and for online learning [Guerin et al., 2013].

A fundamental question in assessing the proposed algorithms is how many queries/examples would be needed by the best possible learning algorithm in the given learning model. For several models, lower bounds can be derived from the Vapnik-Chervonenkis dimension (VCD, [Vapnik and Chervonenkis, 1971]). This central parameter is one of several that, in addition to yielding bounds on the performance of learning algorithms, provide deep insights into the combinatorial structure of the studied concept class. Such insights can in turn help to design new learning algorithms.

Our main contributions are the following: (a) We provide the first study that exactly calculates the VCD for the class of unbounded acyclic CP-nets, and give a lower bound for any bound k. So far, the only existing studies present a lower bound [Koriche and Zanuttini, 2010], which we prove incorrect for large values of k, and asymptotic complexities [Chevaleyre et al., 2010]. The latter show that VCD = Θ(2^n) for k = n-1 and Θ(n·2^k) when k ∈ o(n), in agreement with our result that the VCD equals 2^n - 1 for k = n-1 and is at least 2^k - 1 + (n-k)·2^k for general values of k.
It should be noted that both previous studies assume that CP-nets can be incomplete, i.e., for some variables no preference relations may be given. In our study, we make the (not uncommon) assumption that CP-nets are complete, but our result on the VCD also applies to the more general case that includes incomplete CP-nets. Further, our results are more general than existing ones in that they also apply to CP-nets with multi-valued variables (as opposed to binary variables). As a byproduct of our study, we obtain that the VCD of the class of all consistent CP-nets (whether acyclic or cyclic)¹ equals that of the class of all acyclic CP-nets. Hence, the class of acyclic CP-nets is less expressive than that of all consistent CP-nets, but may (at least in some models) be as hard to learn.

(b) We further provide exact values (or, in some cases, non-trivial bounds) for other important information complexity parameters, namely the teaching dimension [Goldman and Kearns, 1995], the recursive teaching dimension [Zilles et al., 2011], the self-directed learning complexity [Goldman et al., 1993], and the optimal mistake bound [Littlestone, 1988].

(c) We present a new algorithm that learns tree-structured CP-nets from membership queries and use our results on the teaching dimension to show that our algorithm is close to optimal.

(d) We re-assess the degree of optimality of Koriche and Zanuttini's algorithm for learning bounded acyclic CP-nets, using our result on the VCD.

¹ A consistent CP-net is one that does not prefer an outcome o over another outcome ô while at the same time preferring ô over o. Acyclic CP-nets are always consistent, but cyclic ones are not necessarily so.

## 2 Background

A concept is any subset of a given universe X (called the instance space), and a concept class is a set of concepts. In this paper, we assume that X (and hence every concept class) is finite. In concept learning, the learner receives information on a target concept c contained in a fixed concept class and is requested to identify or approximate c from the given information. Information consists of labelled examples (x, ℓ) where x ∈ X, ℓ ∈ {0, 1}, and ℓ = 1 iff x is contained in the target concept. We also write c(x) = 1 if x ∈ c, and c(x) = 0 otherwise. The concept classes we study below are subclasses of the class of so-called conditional preference networks (CP-nets).

### 2.1 Conditional Preference Networks

We largely follow the notation introduced by Boutilier et al. (2004) in their seminal work on CP-nets. Let V = {v_1, ..., v_n} be a set of variables. Each variable v_i has a set of possible values (its domain) D_{v_i} = {v^i_1, ..., v^i_m}. We assume that every domain D_{v_i} is of size m, independent of i. An assignment x to a set of variables X ⊆ V maps every variable v_i ∈ X to a value from D_{v_i}. We denote the set of all assignments to X ⊆ V by O_X and drop the subscript when X = V. A preference is an irreflexive, transitive binary relation ≻. For any o, ô ∈ O, we write o ≻ ô to denote the fact that o is strictly preferred to ô. CP-nets provide a compact representation of preferences over O. For every v_i ∈ V, the decision maker chooses a set Pa(v_i) ⊆ V \ {v_i} of parent variables that influence the preference order over v_i. For every u ∈ O_{Pa(v_i)}, the decision maker specifies a total order ≻^{v_i}_u over D_{v_i}. We refer to ≻^{v_i}_u as the conditional preference statement of v_i in the context of u.
The set of preference statements {≻^{v_i}_{u_1}, ..., ≻^{v_i}_{u_k}}, where the u_j are all contexts that yield non-empty conditional preference statements, is the conditional preference table CPT(v_i). Statements expressed in a CPT are interpreted ceteris paribus, i.e., all else being equal.

Definition 1. [Boutilier et al., 2004] Given V, Pa(v), and CPT(v) for every v ∈ V, a CP-net is a directed graph (V, E) where, for any v_i, v_j ∈ V, (v_i, v_j) ∈ E iff v_i ∈ Pa(v_j).

Example 1. Figure 1a shows a CP-net over V = {A, B, C} with D_A = {a, ā}, D_B = {b, b̄}, D_C = {c, c̄}. Each variable is annotated with its CPT. For variable A, the user prefers a to ā unconditionally. For C, the preference depends on the value of B, i.e., Pa(C) = {B}. For instance, in the context of b, c is preferred over c̄.

[Figure 1: An acyclic CP-net and its induced graph. (a) The CP-net, with CPT(C) = {b : c ≻ c̄; b̄ : c̄ ≻ c}. (b) The induced graph over the eight outcomes.]

Two outcomes o, ô ∈ O are swap outcomes ("swaps" for short) if they differ in the value of exactly one variable v_i; then v_i is called the swapped variable [Boutilier et al., 2004]. We use o[X] to denote the projection of o onto X ⊆ V and write o[v_i] instead of o[{v_i}]. The size of a preference table for a variable v_i, denoted size(CPT(v_i)), is the number of preference statements it holds. The size of a CP-net is the sum of the sizes of its tables.

Example 2. In Figure 1a, abc and ābc are swaps with swapped variable A. The CP-net size is 1 + 1 + 2 = 4.

The induced graph of a CP-net N is defined as follows. Each vertex in the induced graph represents an outcome o ∈ O. An edge from ô to o exists iff (o, ô) ∈ O × O is a swap w.r.t. some v_i ∈ V and o[v_i] precedes ô[v_i] in ≻^{v_i}_{o[Pa(v_i)]} [Boutilier et al., 2004]. N defines a partial order ≻ over O that is given by the transitive closure of its induced graph. If o ≻ ô, we say N entails (o, ô). N is consistent if there is no o ∈ O with o ≻ o, i.e., if its induced graph is acyclic. Since acyclic CP-nets are always consistent [Boutilier et al., 2004], their class is particularly interesting for learnability studies, as is its subclass of separable CP-nets, which have no edges.

Example 3. In the network of Figure 1a, abc ≻ āb̄c, as witnessed by the path āb̄c → ābc → abc in the induced graph shown in Figure 1b.

As CP-net semantics are completely determined by the preference relation over swaps, one may consider the set X_swap = {(o_1, o_2) ∈ O × O | (o_1, o_2) is a swap and o_1 is lexicographically smaller than o_2} as the instance space; a quick calculation (omitted due to space constraints) shows that its size is |X_swap| = n · m^{n-1} · m(m-1)/2. (We assume a fixed lexicographic order.) For x = (o_1, o_2) ∈ X_swap, let V(x) denote the swapped variable of x. We refer to the first and second outcomes of an example x as x.1 and x.2, respectively. We use x[Γ] to denote the assignments (in both x.1 and x.2) of Γ ⊆ V \ {V(x)}.

In what follows, we fix k ∈ {0, ..., n-1} and consider the class C_ac of all acyclic CP-nets whose nodes have indegree at most k. We use C_unb, C_tree, and C_sep for the classes of acyclic CP-nets with indegree at most n-1, 1, and 0, respectively. A concept c contains a swap pair x iff the corresponding CP-net entails (x.1, x.2). By size(c), we refer to the size of the CP-net represented by c. Lastly, M = max{size(c) | c ∈ C_ac} is the maximum number of statements in any concept in C_ac; it can be verified that M = (n-k)·m^k + ∑_{i=0}^{k-1} m^i.
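To make these definitions concrete, the following minimal Python sketch (our illustration, not part of the paper; the class layout and all identifiers are ours) stores a CP-net as parent lists plus CPTs and decides entailment for swap pairs. For a swap pair, entailment reduces to a single CPT lookup: the preferred outcome is the one whose value of the swapped variable comes first in the order for the shared parent context. Figure 1a fixes a ≻ ā and the two conditional rows for C; we additionally assume b ≻ b̄ for B, which is consistent with Examples 2 and 3.

```python
from typing import Dict, List, Tuple

Outcome = Dict[str, str]

class CPNet:
    """A CP-net: per-variable parent lists and conditional preference tables.
    cpts[v] maps a tuple of parent values to a most-to-least-preferred list."""

    def __init__(self, parents: Dict[str, List[str]],
                 cpts: Dict[str, Dict[Tuple[str, ...], List[str]]]):
        self.parents = parents
        self.cpts = cpts

    def entails_swap(self, o1: Outcome, o2: Outcome) -> bool:
        """For a swap pair (o1, o2), N entails o1 > o2 iff o1's value of the
        swapped variable precedes o2's in the order for the parent context.
        (General, non-swap pairs would require the transitive closure.)"""
        diff = [v for v in o1 if o1[v] != o2[v]]
        if len(diff) != 1:
            raise ValueError("not a swap pair")
        v = diff[0]
        ctx = tuple(o1[p] for p in self.parents[v])  # o1, o2 agree on parents
        order = self.cpts[v][ctx]
        return order.index(o1[v]) < order.index(o2[v])

# The net of Figure 1a ("na" stands for the negated value of A, etc.).
net = CPNet(
    parents={"A": [], "B": [], "C": ["B"]},
    cpts={"A": {(): ["a", "na"]},
          "B": {(): ["b", "nb"]},
          "C": {("b",): ["c", "nc"], ("nb",): ["nc", "c"]}})
# Example 2's swap: abc is preferred over (na)bc, swapped variable A.
assert net.entails_swap({"A": "a", "B": "b", "C": "c"},
                        {"A": "na", "B": "b", "C": "c"})
```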
### 2.2 Information Complexity Parameters

The combinatorial structure of a concept class C has implications for the complexity of learning C, in particular for the sample complexity (sometimes called information complexity), which refers to the number of labelled examples the learner needs in order to identify any target concept in the class under the constraints of a given learning model. One of the most important complexity parameters studied in machine learning is the Vapnik-Chervonenkis dimension (VCD). In what follows, let C be a concept class over the (finite) instance space X.

Definition 2. [Vapnik and Chervonenkis, 1971] A subset Y ⊆ X is shattered by C if the projection of C onto Y has 2^{|Y|} concepts. The VC dimension of C, denoted VCD(C), is the size of the largest subset of X that is shattered by C.

The number of randomly chosen examples needed to identify concepts from C in the PAC-learning model is linear in VCD(C) [Blumer et al., 1989]. By contrast to learning from random examples, in teaching models the learner is provided with well-chosen labelled examples.

Definition 3. [Goldman and Kearns, 1995; Shinohara and Miyano, 1991] A teaching set for a concept c ∈ C with respect to C is a set S = {(x_1, ℓ_1), ..., (x_z, ℓ_z)} of labelled examples such that c is the only concept c′ ∈ C that satisfies c′(x_i) = ℓ_i for all i ∈ {1, ..., z}. The teaching dimension of c with respect to C, denoted TD(c, C), is the size of the smallest teaching set for c with respect to C. The teaching dimension of C, denoted TD(C), is given by TD(C) = max{TD(c, C) | c ∈ C}. TD_min(C) = min{TD(c, C) | c ∈ C} denotes the smallest TD of any c ∈ C.

A well-studied variation of teaching is called recursive teaching. Its complexity parameter, the recursive teaching dimension, is defined by recursively removing from C all concepts with the smallest TD and then taking the maximum over the smallest TDs encountered in that process. For the corresponding definition of teachers, see [Zilles et al., 2011].

Definition 4. [Zilles et al., 2011] Let C_0 = C and, for all i such that C_i ≠ ∅, define C_{i+1} = C_i \ {c ∈ C_i | TD(c, C_i) = TD_min(C_i)}. The recursive teaching dimension of C, denoted RTD(C), is defined by RTD(C) = max{TD_min(C_i) | i ≥ 0}.

As opposed to the TD, the RTD exhibits interesting relationships to the VCD. For example, every maximum class, i.e., a class C whose size |C| meets Sauer's upper bound ∑_{i=0}^{VCD(C)} C(|X|, i) [Sauer, 1972], fulfills RTD(C) = VCD(C); the same is true for classes of VCD 1 and for intersection-closed classes [Doliwa et al., 2014].

We will further determine complexity parameters for online prediction, namely the self-directed learning complexity and the optimal mistake bound. A self-directed learner passes a prediction (x, ℓ) ∈ X × {0, 1} to an oracle, which responds with the information whether or not the target concept c* fulfills c*(x) = ℓ. In case c*(x) ≠ ℓ, the learner has made a mistake. The self-directed learning complexity SDC(C) is the smallest number z for which some self-directed learner exists that makes no more than z mistakes on any concept in C [Goldman et al., 1993]. In classical online learning [Littlestone, 1988], the sequence of instances x for which the learner makes label predictions is determined by an adversary. The best worst-case number of mistakes achievable in this model, where again the worst case is taken over all concepts in C, is called the optimal mistake bound of C, denoted OPT(C).
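As a concrete reading of Definition 2, the following brute-force test (our illustration; exponential in |X| and intended only for toy classes) computes the VCD of a class given explicitly as a list of concepts, each a set of instances.

```python
from itertools import combinations

def is_shattered(Y, concepts):
    """Y is shattered iff projecting the class onto Y yields all 2^|Y| labelings."""
    patterns = {tuple(x in c for x in Y) for c in concepts}
    return len(patterns) == 2 ** len(Y)

def vcd(X, concepts):
    """VC dimension by exhaustive search over subsets of the instance space X."""
    for d in range(len(X), 0, -1):
        if any(is_shattered(Y, concepts) for Y in combinations(X, d)):
            return d
    return 0

# Toy check: the class of all singletons (plus the empty set) over three
# instances shatters any single instance but no pair, so its VCD is 1.
X = ["x1", "x2", "x3"]
assert vcd(X, [set(), {"x1"}, {"x2"}, {"x3"}]) == 1
```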
## 3 Complexity Results

Table 1 summarizes our complexity results for acyclic CP-nets whose nodes have indegree bounded by k. The two extreme cases are unbounded acyclic CP-nets (k = n-1, C_unb) and separable CP-nets (k = 0, C_sep).

| class | VCD | TD | RTD | SDC | OPT |
| --- | --- | --- | --- | --- | --- |
| C_ac | ≥ (m-1)M | n(m-1)U | (m-1)M | (m-1)M | ≥ ⌈log(m!)⌉·M |
| C_unb | m^n - 1 | n(m-1)m^{n-1} | m^n - 1 | m^n - 1 | ≥ ⌈log(m!)⌉·(m^n - 1)/(m - 1) |
| C_sep | (m-1)n | (m-1)n | (m-1)n | (m-1)n | ≥ ⌈log(m!)⌉·n |

Table 1: Summary of complexity results. M = (n-k)·m^k + ∑_{i=0}^{k-1} m^i; U is defined after Theorem 2.

The most striking observation from our results is that VCD, RTD, and SDC are equal for all values of m in C_unb. Further, when m = 2 (the most studied case in the literature), the TD equals the instance space size n·2^{n-1}. A close inspection of the case m = 2 shows that X_swap has only n instances that are relevant for C_sep, and C_sep corresponds to the class of all concepts over these n instances. Thus the values of VCD, TD, RTD, SDC, and OPT are trivially equal to n in this special case. As mentioned earlier, maximum classes and intersection-closed classes fulfill VCD = RTD, but the class of binary separable CP-nets is the only class to which this result applies, since C_ac is neither maximum nor intersection-closed as soon as k > 0 or m > 2 (we omit the proofs due to space constraints). The classes C_unb are thus the first natural ones in the literature for which VCD = RTD is known to hold without the fulfillment of any of Doliwa et al.'s sufficient conditions like being maximum or intersection-closed. This suggests that the combinatorial structure of CP-nets is interesting from a computational learning theory point of view.

In the case of online prediction, for m ≤ 11, ⌈log(m!)⌉ is known to be the minimum number of comparisons needed to sort m elements [Sloane, 2016]. However, for most practical applications, m ≤ 11 is sufficient, and thus our results are still useful for judging the optimality of online learning algorithms. The fact that SDC is asymptotically strictly smaller than OPT shows that actively selecting examples strictly decreases the number of mistakes when m is large.

The remainder of this section is dedicated to proving the statements from Table 1. Due to space constraints, some parts of proofs are only sketched. Our first theorem substantially improves on (and corrects) a result by Koriche and Zanuttini (2010), who present a lower bound on VCD(C_ac); their bound is in fact incorrect unless k ≪ n.

Theorem 1. VCD(C_unb) = m^n - 1, VCD(C_sep) = (m-1)n, and VCD(C_ac) ≥ (m-1)M.

As any consistent CP-net (whether acyclic or cyclic) defines an irreflexive, transitive relation, a result from [Booth et al., 2010] implies that the VCD of the class of all consistent unbounded CP-nets is at most m^n - 1. By our Theorem 1, this VCD equals m^n - 1. Sauer's Lemma then bounds the number of consistent CP-nets from above by ∑_{i=0}^{m^n - 1} C(|X_swap|, i). This also means, though, that acyclic CP-nets, while less expressive, are in some sense as hard to learn as all consistent CP-nets.

The proof of Theorem 1 relies on decomposing C_ac as a direct product of concept classes over subsets of X_swap.

Definition 5. Let C_i ⊆ 2^{X_i} and C_j ⊆ 2^{X_j} be concept classes with X_i ∩ X_j = ∅. The concept class C_i ⊗ C_j ⊆ 2^{X_i ∪ X_j} is defined by C_i ⊗ C_j = {c_i ∪ c_j | c_i ∈ C_i and c_j ∈ C_j}. For concept classes C_1, ..., C_r, we define ∏_{i=1}^{r} C_i = C_1 ⊗ ⋯ ⊗ C_r = (⋯((C_1 ⊗ C_2) ⊗ C_3) ⊗ ⋯) ⊗ C_r. It is well known that VCD(∏_{i=1}^{r} C_i) = ∑_{i=1}^{r} VCD(C_i).
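This additivity can be sanity-checked on a toy example (our illustration; the brute-force vcd helper repeats the sketch from Section 2.2 to keep the snippet self-contained):

```python
from itertools import combinations

def vcd(X, concepts):
    """Brute-force VC dimension of a finite class (see Section 2.2)."""
    for d in range(len(X), 0, -1):
        for Y in combinations(X, d):
            if len({tuple(x in c for x in Y) for c in concepts}) == 2 ** d:
                return d
    return 0

# Definition 5: the direct product of classes over disjoint instance spaces.
C1 = [set(), {"x1"}]                          # VCD 1 over {"x1"}
C2 = [set(), {"x2"}, {"x3"}, {"x2", "x3"}]    # VCD 2 (power set of two points)
product = [c1 | c2 for c1 in C1 for c2 in C2]
assert vcd(["x1", "x2", "x3"], product) == vcd(["x1"], C1) + vcd(["x2", "x3"], C2)
```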
For any v_i ∈ V and any Γ ⊆ V \ {v_i}, we define C^Γ_{CPT(v_i)} to be the concept class consisting of all preference relations corresponding to some CPT(v_i) where Pa(v_i) = Γ and |Γ| ≤ k; here the instance space is the set of all swap pairs x with V(x) = v_i. Now, if we fix the context of v_i by fixing an assignment γ ∈ O_Γ to all variables in Γ, we obtain a concept class C^Γ_{v_i,γ}, which corresponds to the set of all preference statements concerning the variable v_i conditioned on the context γ. Its instance space is the set of all swaps x with V(x) = v_i and x[Γ] = γ. Recall that V = {v_1, ..., v_n}. By S_n we denote the set of all permutations of {1, ..., n}.

Proof of Theorem 1. Lemma 1 (below) states that

C_ac = ⋃_{σ∈S_n} ∏_{i=1}^{n} ⋃_{Γ ⊆ {v_{σ(1)},...,v_{σ(i-1)}}, |Γ| ≤ k} C^Γ_{CPT(v_{σ(i)})}.

Since the VCD of a union is at least the VCD of each of its members, and the VCD of a direct product is the sum of the VCDs of its factors, this yields the bound

VCD(C_ac) ≥ max_{σ∈S_n} ∑_{i=1}^{n} max_{Γ ⊆ {v_{σ(1)},...,v_{σ(i-1)}}, |Γ| ≤ k} VCD(C^Γ_{CPT(v_{σ(i)})}).

Using VCD(C^Γ_{v_{σ(i)},γ}) = m-1 (see Lemma 2), independent of Γ and γ, together with C^Γ_{CPT(v_{σ(i)})} = ∏_{γ∈O_Γ} C^Γ_{v_{σ(i)},γ}, one obtains, for any σ ∈ S_n,

VCD(C_ac) ≥ (m-1) ∑_{i=1}^{n} max_{Γ ⊆ {v_{σ(1)},...,v_{σ(i-1)}}, |Γ| ≤ k} |O_Γ| = (m-1) ∑_{i=1}^{n} m^{min(i-1,k)} = (m-1)M.

For k ∈ {0, n-1}, to verify VCD(C_ac) ≤ (m-1)M, consider any shattered set Y of size (m-1)M. One can show that there is some labeling of Y that agrees with exactly one concept in C_ac, so that no set Y′ ⊋ Y is shattered (and thus no set of size greater than (m-1)M is shattered) by C_ac.

Lemma 1. C_ac = ⋃_{σ∈S_n} ∏_{i=1}^{n} ⋃_{Γ ⊆ {v_{σ(1)},...,v_{σ(i-1)}}, |Γ| ≤ k} C^Γ_{CPT(v_{σ(i)})}.

Proof. By definition, for v ∈ V and Γ ⊆ V \ {v}, the class C^Γ_{CPT(v)} equals ∏_{γ∈O_Γ} C^Γ_{v,γ}. (Any concept representing a preference table for v with Pa(v) = Γ corresponds to a union of concepts each of which represents a preference statement over D_v conditioned on some context γ ∈ O_Γ.) Any concept corresponds to choosing a set Γ_v of parent variables of size at most k for each variable v, which means C_ac ⊆ ∏_{i=1}^{n} ⋃_{Γ ⊆ V\{v_i}, |Γ| ≤ k} C^Γ_{CPT(v_i)}. By acyclicity, v_j ∈ Pa(v_i) implies v_i ∉ Pa(v_j), so that for each concept c ∈ C_ac some σ ∈ S_n fulfills c ∈ ∏_{i=1}^{n} ⋃_{Γ ⊆ {v_{σ(1)},...,v_{σ(i-1)}}, |Γ| ≤ k} C^Γ_{CPT(v_{σ(i)})}. Thus, C_ac ⊆ ⋃_{σ∈S_n} ∏_{i=1}^{n} ⋃_{Γ ⊆ {v_{σ(1)},...,v_{σ(i-1)}}, |Γ| ≤ k} C^Γ_{CPT(v_{σ(i)})}. Similarly, one can argue that every concept in the class on the right-hand side represents an acyclic CP-net with parent sets of size at most k. With C^Γ_{CPT(v_i)} = ∏_{γ∈O_Γ} C^Γ_{v_i,γ}, the statement of the lemma follows.

Lemma 2. VCD(C^Γ_{v_i,γ}) = m - 1 for any v_i ∈ V, Γ ⊆ V \ {v_i}, and γ ∈ O_Γ.

Proof. Let v_i ∈ V, Γ ⊆ V \ {v_i}, and γ ∈ O_Γ. We show that C^Γ_{v_i,γ} shatters some set of size m-1, but no set of size m. Note first that, by definition, C^Γ_{v_i,γ} is simply the class of all total orders over the domain D_{v_i} of v_i. To show VCD(C^Γ_{v_i,γ}) ≥ m-1, choose any set of m-1 swaps over Γ ∪ {v_i} with fixed context γ in which the pairs of swapped values of v_i are (v^i_1, v^i_2), (v^i_2, v^i_3), ..., (v^i_{m-1}, v^i_m); these pairs form a path through all m values, every orientation of a path is acyclic and thus extends to a total order, and hence all 2^{m-1} labelings are realized. Now fix any set S ⊆ X_swap of m swaps over Γ ∪ {v_i} with fixed context γ. To show that S is not shattered, consider the undirected graph G with vertex set D_{v_i} in which an edge between v^i_r and v^i_s exists iff S contains a swap pair flipping v^i_r to v^i_s or vice versa. G has m vertices and m edges and thus contains a cycle. The directed versions of G correspond to the labelings of S; therefore some labeling of S corresponds to a cyclic directed version of G, which does not induce a total order over D_{v_i}. Hence that labeling is not realized by C^Γ_{v_i,γ}, so that S is not shattered by C^Γ_{v_i,γ}.
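Both directions of Lemma 2 can be checked by brute force for small m. The snippet below (our illustration; domain values are coded 0, ..., m-1 and a swap is identified with its pair of swapped values) confirms that the m-1 path swaps are shattered, while adding a swap that closes a cycle destroys shattering:

```python
from itertools import permutations

def realized_labelings(m, swaps):
    """Labelings of the given value-pair swaps realized by total orders over
    {0, ..., m-1}: an order labels swap (r, s) with 1 iff r precedes s."""
    pats = set()
    for order in permutations(range(m)):
        pos = {v: i for i, v in enumerate(order)}
        pats.add(tuple(int(pos[r] < pos[s]) for r, s in swaps))
    return pats

m = 4
# The m-1 "path" swaps (0,1), (1,2), ..., (m-2,m-1) are shattered: every
# orientation of a path is acyclic, hence extends to a total order.
path = [(j, j + 1) for j in range(m - 1)]
assert len(realized_labelings(m, path)) == 2 ** (m - 1)
# A further swap closes a cycle in the value graph, so at least one labeling
# (a cyclic orientation) is not realized by any total order.
cycle = path + [(m - 1, 0)]
assert len(realized_labelings(m, cycle)) < 2 ** m
```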
For studying teaching complexity, it is useful to identify concepts that are easy to teach. To this end, we use the notion of subsumption [Koriche and Zanuttini, 2010]: given CP-nets N, N′, we say N subsumes N′ if for all v_i ∈ V the following holds: if y_1 ≻ y_2 is specified in CPT(v_i) in N′ for some context γ′, then y_1 ≻ y_2 is specified in CPT(v_i) in N for some context containing γ′. If in addition N ≠ N′, we say that N strictly subsumes N′. Now let C ⊆ C_ac. A concept c ∈ C is maximal in C if no c′ ∈ C strictly subsumes c. Note that maximal concepts in C_ac are of size M. The following lemma formalizes the intuition that maximal concepts are easy to teach.

Lemma 3. For any maximal concept c in a concept class C ⊆ C_ac, we have TD(c, C) ≤ (m-1)·size(c).

Proof. Every statement in the CP-net N represented by c orders the m values of some variable v_i under a fixed context γ. For every such order y_1 ≻^{v_i}_γ ⋯ ≻^{v_i}_γ y_m, we include m-1 positively labelled swap examples in a set T. For 1 ≤ j ≤ m-1, the j-th such example labels a pair x = (x.1, x.2) of swap outcomes with V(x) = v_i, where the projection of x onto {v_i} is (y_j, y_{j+1}) and the projection of x onto the remaining variables contains γ. The set T then has cardinality (m-1)·size(c) and is obviously consistent with N. It remains to show that no other CP-net in C is consistent with T. Suppose some c′ ≠ c in C is consistent with T. Since c′ ≠ c, there are some v_i ∈ V, γ ∈ O_{V\{v_i}}, and y, y′ ∈ D_{v_i} such that y ≻^{v_i}_γ y′ holds in c′ while y′ ≻^{v_i}_γ y holds in c. Thus (i) c′ disagrees with some statement in a preference table of c, or (ii) c′ has a statement in one of its preference tables that is not contained in c. Now (i) is impossible since c′ is consistent with T, and (ii) is impossible since c is maximal in C.

Lemma 4. Each non-maximal c′ ∈ C_ac is strictly subsumed by some c ∈ C_ac such that TD(c′, C_ac) ≥ TD(c, C_ac).

Proof. From the graph G′ of c′, we build a graph G by adding the maximum possible number of edges into a single variable v. As c′ is not maximal, it is possible to add at least one edge. The CP-nets corresponding to G and G′ differ only in CPT(v). Let c be the concept representing G and z the size of its CPT for v. A smallest teaching set T′ for c′ can be modified into a teaching set for c by replacing only those examples that refer to the swapped variable v; (m-1)·z examples suffice. To distinguish c′ from c, T′ must contain at least (m-1)·z examples referring to the swapped variable v (m-1 for each context in CPT(v) in c). Hence TD(c′, C_ac) ≥ TD(c, C_ac).

Using these lemmas, one can show that TD(C_ac) equals the TD of separable CP-nets within C_ac, and that RTD(C_ac) is the TD of maximal concepts within C_ac. The latter is at most (m-1)M by Lemma 3, and can be verified to be at least (m-1)M by arguing that a teaching set for a maximal concept must contain m-1 examples for each statement in its CPTs, so as to determine the preferences for each context. We thus obtain the following theorem.

Theorem 2. RTD(C_ac) = (m-1)M.

For the TD of separable concepts in C_ac, consider any unconditional CPT(v_i) = {y_1 ≻ ⋯ ≻ y_m}. For every R ⊆ V \ {v_i} with |R| = k, we create the dummy CPT(v_i) in which Pa(v_i) = R and the same statement y_1 ≻ ⋯ ≻ y_m holds in every context of O_R. Any teaching set must show that we have the same statement y_1 ≻ ⋯ ≻ y_m under every context. Thus, a minimal teaching set restricted to CPT(v_i) is a smallest set U_i of examples such that, when projected onto any subset R of size k, U_i contains all m^k contexts. Each statement of the form y_1 ≻ ⋯ ≻ y_m can be taught with m-1 labelled examples, and one does so for each element of U_i.
For each variable v_i, a respective set of examples is included in the teaching set. One can show that fewer examples are not sufficient for teaching a separable CP-net. We denote the cardinality of U_i, which is independent of i, by U. In the binary case, U_i is known as an (n-1, k)-universal set of minimum size [Damaschke, 2000]. In combination with some obvious bounds, we obtain the following theorem.

Theorem 3. For 0 ≤ k ≤ n-1, we have n(m-1)·m^k ≤ TD(C_ac) = n(m-1)·U. If k = 0, then U = 1, so that TD(C_sep) = (m-1)n. If k = 1, then U = m, so that TD(C_tree) = (m-1)mn. If k = n-1, then U = m^{n-1}, so that TD(C_unb) = (m-1)n·m^{n-1}.

Theorem 3 implies that, for C_unb, the ratio of the TD over the instance space size |X_swap| is 2/m. In particular, in the case of binary CP-nets (i.e., when m = 2), which is the focus of most of the literature on learning CP-nets, the TD equals the instance space size. However, maximal concepts have a TD far below the worst-case TD.

Theorem 4. SDC(C_ac) = (m-1)M.

Proof. From [Doliwa et al., 2014] and Theorem 2 we get SDC(C_ac) ≥ RTD(C_ac) = (m-1)M. For the upper bound, note that any concept in C_ac, when fixing a variable v and a context γ ∈ O_{V\{v}}, induces a total order on D_v. Goldman et al. (1993) discussed a prediction strategy (basically an insertion sort) that learns a total order over m items while making at most m-1 mistakes. Separately for each variable v, a self-directed learner fixes an arbitrary context γ ∈ O_{V\{v}} and learns a preference order over D_v with Goldman et al.'s strategy. For other contexts on v, the learner assumes the same preference relation unless it makes a mistake (which causes it to learn a new preference order over D_v). For each preference order to be learned (at most M in total), the learner makes at most m-1 mistakes, for a total of (m-1)M mistakes.

Theorem 5. OPT(C_ac) ≥ ⌈log(m!)⌉·M.

Proof. Any learner must identify up to M preference statements (each for a fixed variable and context) separately. Each such statement is a permutation of m elements, and its identification requires at least ⌈log(m!)⌉ comparisons in the worst case. The adversary can force the learner to make as many mistakes as comparisons are needed, yielding the lower bound.
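To put the factor ⌈log(m!)⌉ in context: the minimum numbers of comparisons needed to sort m elements (OEIS sequence A036604, [Sloane, 2016]) coincide with ⌈log₂(m!)⌉ for all m ≤ 11 and first exceed it at m = 12. A quick check (our illustration; the values are hard-coded from the OEIS):

```python
import math

# Minimum comparisons to sort m elements (OEIS A036604) vs. ceil(log2(m!)).
min_comparisons = {2: 1, 3: 3, 4: 5, 5: 7, 6: 10,
                   7: 13, 8: 16, 9: 19, 10: 22, 11: 26}
for m, s in min_comparisons.items():
    assert s == math.ceil(math.log2(math.factorial(m)))
# For m = 12, the information-theoretic value ceil(log2(12!)) = 29 is no
# longer attained: 30 comparisons are required, so the two quantities diverge.
assert math.ceil(math.log2(math.factorial(12))) == 29
```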
## 4 Near-Optimal Query Learning

In what follows, we demonstrate in two scenarios how our complexity results can be used to assess the optimality of learning algorithms, in particular of algorithms learning a target CP-net N from membership or equivalence queries. A membership (equivalence) query is a swap x (a CP-net N′, resp.) that the learner passes to an oracle; the oracle responds "yes" or "no", depending on whether N entails x (whether N equals N′, resp.). If an equivalence query for N′ is answered "no", the oracle also presents a swap that is entailed by either N or N′, but not by both. An algorithm learns a class C from a given type of queries if it identifies any target concept c ∈ C from a number of queries that is polynomial in the size of the underlying representation of c [Angluin, 1988].

### 4.1 Learning Tree-Structured CP-nets from Membership Queries

Koriche and Zanuttini (2010) present an algorithm for learning a binary tree-structured CP-net N that may be incomplete in that it may have empty CPTs (i.e., it learns a superclass of C_tree for m = 2). Their learner uses at most n_N + 1 equivalence queries and 4n_N + e_N·log(n) membership queries, where n_N is the number of relevant variables and e_N is the number of edges in N.

We present a method learning any CP-net in C_tree (i.e., complete tree CP-nets) for any m, using only membership queries. For a CP-net N, a conflict pair w.r.t. v_i is a pair (x, x′) of swaps such that (i) V(x) = V(x′) = v_i, (ii) x.1 and x′.1 agree on v_i, (iii) x.2 and x′.2 agree on v_i, and (iv) N entails one of the swaps x, x′, but not the other. If v_i has a conflict pair, then v_i has a parent variable v_j whose values in x and x′ differ. Such a variable v_j can be found with log(n) membership queries by binary search (each query halves the number of candidate variables with different values in x and x′) [Damaschke, 2000].

We use this binary search to learn tree-structured CP-nets from membership queries, exploiting the following fact: if a variable v_i in a tree CP-net has a parent, then a conflict pair w.r.t. v_i exists and can be detected by asking membership queries to sort m test sets for v_i. Let (v^i_1, ..., v^i_m) be an arbitrary but fixed permutation of D_{v_i}, for every i. Then, for all j ∈ {1, ..., m}, a test set I_{i,j} for v_i is defined by I_{i,j} = {(v^1_j, ..., v^{i-1}_j, v^i_r, v^{i+1}_j, ..., v^n_j) | 1 ≤ r ≤ m}, i.e., every variable other than v_i is fixed to its j-th value while v_i ranges over its whole domain. Since v_i has no more than one parent, determining preference orders over m such test sets of size m is sufficient for revealing conflict pairs, rather than having to test all possible contexts in O_{V\{v_i}}: across the m test sets, any potential parent takes each of its m values once, so every parent context is probed.

Now we have a simple strategy for learning tree CP-nets with membership queries (a sketch in code is given after this list):
1. For every variable v_i, ask O(m·log(m)) membership queries on the swaps over I_{i,j} to obtain the preference order over I_{i,j}, for all j.
2. If at least two of the orders differ, i.e., there is a conflict pair, find the unique parent of v_i with log(n) further queries; otherwise Pa(v_i) = ∅.
3. CPT(v_i) is determined by the queries above.

It is not hard to see that this algorithm learns a target CP-net N ∈ C_tree with at most n·m·O(m·log(m)) + e_N·log(n) membership queries, where e_N is the number of edges in N. In particular, for the binary case we need at most 2n + e_N·log(n) queries, i.e., compared to Koriche and Zanuttini's method, when focusing only on tree CP-nets with non-empty CPTs, we reduce the number of membership queries by a factor of 2 and drop the requirement for equivalence queries. As the TD lower-bounds the required number of membership queries, our method uses at most a factor of log(m) more queries, plus e_N·log(n) additional ones, compared to an optimal method; see Theorem 3.
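The following Python sketch spells out the three steps (our illustration, not the authors' implementation; the membership oracle `member`, the data layout, and all identifiers are assumptions, and domains are uniformly of size m as in the paper):

```python
from functools import cmp_to_key

def learn_tree_cpnet(variables, domains, member):
    """Membership-query learner for complete tree CP-nets (steps 1-3 above).
    member(o1, o2) answers whether the target entails o1 > o2 for a swap pair;
    outcomes are dicts mapping variables to values."""
    m = len(domains[variables[0]])
    parents, cpts = {}, {}
    for vi in variables:
        # Step 1: for each j, sort D_vi within test set I_{i,j} (every other
        # variable fixed to its j-th value), answering comparisons by queries.
        orders = []
        for j in range(m):
            ctx = {vl: domains[vl][j] for vl in variables if vl != vi}
            def cmp(y1, y2, ctx=ctx, vi=vi):
                o1, o2 = dict(ctx), dict(ctx)
                o1[vi], o2[vi] = y1, y2
                return -1 if member(o1, o2) else 1
            orders.append(sorted(domains[vi], key=cmp_to_key(cmp)))
        # Step 2: identical orders everywhere -> treat v_i as parentless (a
        # parent whose value never changes the order is not observable).
        if all(o == orders[0] for o in orders):
            parents[vi], cpts[vi] = [], {(): orders[0]}
            continue
        # A conflict pair: contexts 0 and j rank some pair (y1, y2) oppositely.
        j = next(t for t in range(m) if orders[t] != orders[0])
        p0 = {y: r for r, y in enumerate(orders[0])}
        pj = {y: r for r, y in enumerate(orders[j])}
        y1, y2 = next((a, b) for a in domains[vi] for b in domains[vi]
                      if p0[a] < p0[b] and pj[a] > pj[b])
        # Binary search for the unique parent (log n queries): flip half of
        # the remaining candidates from their 0-th to their j-th value and
        # test whether the preference between y1 and y2 flips with them.
        cands = [vl for vl in variables if vl != vi]
        while len(cands) > 1:
            half = cands[:len(cands) // 2]
            ctx = {vl: domains[vl][0] for vl in variables if vl != vi}
            ctx.update({vl: domains[vl][j] for vl in half})
            o1, o2 = dict(ctx), dict(ctx)
            o1[vi], o2[vi] = y1, y2
            # y1 still preferred -> parent kept its 0-th value -> not in half.
            cands = cands[len(cands) // 2:] if member(o1, o2) else half
        parents[vi] = [cands[0]]
        # Step 3: test set I_{i,j} already probed the parent's j-th value.
        cpts[vi] = {(domains[cands[0]][j],): orders[j] for j in range(m)}
    return parents, cpts
```

The m sorting passes per variable issue n·m·O(m·log(m)) queries in total, and each detected edge adds about log(n) binary-search queries, matching the count stated above.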
### 4.2 Learning Incomplete and Complete Acyclic CP-nets from Queries

Koriche and Zanuttini (2010) provide an algorithm that learns the class C̄_ac of complete and incomplete (binary) acyclic CP-nets with nodes of bounded indegree from equivalence and membership queries. To evaluate their algorithm, they compare its query complexity to log(4/3)·VCD(C̄_ac), which lower-bounds the required number of membership and equivalence queries [Auer and Long, 1999]. They calculate a lower bound on VCD(C̄_ac), in lieu of the exact value, cf. their Theorem 6. Now C̄_ac ⊇ C_ac implies VCD(C̄_ac) ≥ VCD(C_ac). It is not hard to see that in fact VCD(C̄_ac) = VCD(C_ac) ≥ M (for m = 2), since the largest shattering capacity is obtained when specifying CPTs for all variables, i.e., when using complete CP-nets. However, Koriche and Zanuttini's lower bound on VCD(C̄_ac) exceeds VCD(C̄_unb) = 2^n - 1 in some cases. In particular, in case k = n-1, their lower bound (on VCD(C̄_unb)) exceeds the exact value of 2^n - 1, cf. our Theorem 1. It appears as though their bound is correct only for k ≪ n, as it evaluates to an expression that exceeds VCD(C̄_unb) for small values of c, where k = n - c.

For any k, their algorithm learns a superclass C̄_ac of C_ac for m = 2 (where the additional concepts are the incomplete ones), using at most r_N + e_N·log(n) + e_N + 1 queries in total for a target CP-net N with r_N statements and e_N edges. In the worst case, r_N = M ≤ VCD(C̄_ac) and e_N = k(k-1)/2 + (n-k)·k (i.e., N is maximal w.r.t. C̄_ac). This yields M + e_N·(log(n) + 1) queries for their algorithm, which exceeds the lower bound log(4/3)·VCD(C̄_ac) by at most log(3/2)·VCD(C̄_ac) + e_N·log(n). This is a more refined assessment than the reported term e_N·log(n), and it holds for any value of k.

## 5 Conclusions

We determined exact values or non-trivial bounds for the parameters VCD, TD, RTD, SDC, and OPT on the classes of complete k-bounded acyclic CP-nets, for any k. Our VCD values still apply to the class of potentially incomplete k-bounded acyclic CP-nets, and thus correct a mistake in [Koriche and Zanuttini, 2010]. Our TD value shows that our proposed algorithm for learning complete tree CP-nets is close to optimal.

To the best of our knowledge, C_unb is the first known non-maximum (and not intersection-closed) class that is interesting from an application point of view and satisfies RTD = VCD. Thus, further studies on the structure of CP-nets may be helpful toward the solution of an open problem concerning the general relationship between RTD and VCD [Simon and Zilles, 2015]. Our results may also have implications for the study of consistent CP-nets. Since the class of acyclic CP-nets is less expressive than that of all consistent CP-nets, while having the same information complexity in terms of VCD, it would be interesting to find out whether learning algorithms for acyclic CP-nets can be easily adapted to consistent CP-nets in general.

## Acknowledgements

Eisa Alanazi's research was supported by the Saudi Cultural Bureau in Canada. Malek Mouhoub and Sandra Zilles were supported by the Discovery Grant program of the Natural Sciences and Engineering Research Council of Canada (NSERC). Sandra Zilles was also supported by the NSERC Canada Research Chairs program.

## References

[Angluin, 1988] Dana Angluin. Queries and concept learning. Machine Learning, 2(4):319-342, 1988.
[Auer and Long, 1999] Peter Auer and Phil Long. Structural results about on-line learning models with and without queries. Machine Learning, 36(3):147-181, 1999.
[Blumer et al., 1989] Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929-965, 1989.
[Booth et al., 2010] Richard Booth, Yann Chevaleyre, Jérôme Lang, Jérôme Mengin, and Chattrakul Sombattheera. Learning conditionally lexicographic preference relations. In ECAI, pages 269-274, 2010.
[Boutilier et al., 2004] Craig Boutilier, Ronen I. Brafman, Carmel Domshlak, Holger H. Hoos, and David Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. Journal of Artificial Intelligence Research, 21:135-191, 2004.
[Chevaleyre et al., 2010] Yann Chevaleyre, Frédéric Koriche, Jérôme Lang, Jérôme Mengin, and Bruno Zanuttini. Learning ordinal preferences on multiattribute domains: the case of CP-nets. In Preference Learning, pages 273-296. Springer-Verlag, 2010.
[Damaschke, 2000] Peter Damaschke. Adaptive versus nonadaptive attribute-efficient learning. Machine Learning, 41(2):197-215, 2000.
[Dimopoulos et al., 2009] Yannis Dimopoulos, Loizos Michael, and Fani Athienitou. Ceteris paribus preference elicitation with predictive guarantees. In IJCAI, pages 1890-1895, 2009.
[Doliwa et al., 2014] Thorsten Doliwa, Gaojian Fan, Hans Ulrich Simon, and Sandra Zilles. Recursive teaching dimension, VC-dimension and sample compression. Journal of Machine Learning Research, 15:3107-3131, 2014.
[Goldman and Kearns, 1995] Sally A. Goldman and Michael J. Kearns. On the complexity of teaching. Journal of Computer and System Sciences, 50:20-31, 1995.
[Goldman et al., 1993] Sally A. Goldman, Ronald L. Rivest, and Robert E. Schapire. Learning binary relations and total orders. SIAM Journal on Computing, 22(5):1006-1034, 1993.
[Guerin et al., 2013] Joshua T. Guerin, Thomas E. Allen, and Judy Goldsmith. Learning CP-net preferences online from user queries. In ADT, pages 208-220, 2013.
[Koriche and Zanuttini, 2010] Frédéric Koriche and Bruno Zanuttini. Learning conditional preference networks. Artificial Intelligence, 174(11):685-703, 2010.
[Lang and Mengin, 2009] Jérôme Lang and Jérôme Mengin. The complexity of learning separable ceteris paribus preferences. In IJCAI, pages 848-853, 2009.
[Littlestone, 1988] Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4):285-318, 1988.
[Liu et al., 2014] Juntao Liu, Yi Xiong, Caihua Wu, Zhijun Yao, and Wenyu Liu. Learning conditional preference networks from inconsistent examples. IEEE Transactions on Knowledge and Data Engineering, 26(2):376-390, 2014.
[Sauer, 1972] Norbert Sauer. On the density of families of sets. Journal of Combinatorial Theory, Series A, 13(1):145-147, 1972.
[Shinohara and Miyano, 1991] Ayumi Shinohara and Satoru Miyano. Teachability in computational learning. New Generation Computing, 8(4):337-347, 1991.
[Simon and Zilles, 2015] Hans Ulrich Simon and Sandra Zilles. Open problem: Recursive teaching dimension versus VC dimension. In COLT, pages 1770-1772, 2015.
[Sloane, 2016] Neil J. A. Sloane. The On-Line Encyclopedia of Integer Sequences. Sequence A036604, 2016. Accessed: 10 April 2016.
[Vapnik and Chervonenkis, 1971] Vladimir N. Vapnik and Alexey Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264-280, 1971.
[Zilles et al., 2011] Sandra Zilles, Steffen Lange, Robert Holte, and Martin Zinkevich. Models of cooperative teaching and learning. Journal of Machine Learning Research, 12:349-384, 2011.