# knowledgebased_sequence_mining_with_asp__17d7ba9f.pdf Knowledge-Based Sequence Mining with ASP Martin Gebser,3 Thomas Guyet,1 Ren e Quiniou,2 Javier Romero,3 Torsten Schaub2,3 1AGROCAMPUS-OUEST/IRISA, France 2Inria Centre de Rennes Bretagne Atlantique, France 3University of Potsdam, Germany We introduce a framework for knowledge-based sequence mining, based on Answer Set Programming (ASP). We begin by modeling the basic task and refine it in the sequel in several ways. First, we show how easily condensed patterns can be extracted by modular extensions of the basic approach. Second, we illustrate how ASP s preference handling capacities can be exploited for mining patterns of interest. In doing so, we demonstrate the ease of incorporating knowledge into the ASP-based mining process. To assess the trade-off in effectiveness, we provide an empirical study comparing our approach with a related sequence mining mechanism. 1 Introduction Sequential pattern mining is about identifying frequent subsequences in sequence databases [Shen et al., 2014]. Modern miners exhibit a compelling effectiveness and allow for highthroughput extraction from big data sources. But this often brings about a profusion of patterns, which becomes a problem to analysts, who, although no longer swamped by data, are now swamped by patterns. The emerging challenge is thus to trade quantity for quality and to extract fewer patterns of greater relevance to the analyst. To this end, the analyst must be empowered to express her criteria of interest. However, this turns out to be rather cumbersome with existing mining technology. First, it is difficult to customize highly optimized miners, and second, modifications need knowledge about their inner workings. Unlike this, our goal is to conceive an approach to sequence mining that allows for an easy integration of expert knowledge, without presupposing deep insights into the mining mechanism. We address this by means of Answer Set Programming (ASP; [Lifschitz, 2008]), a prime tool for knowledge representation and reasoning. Technically, the idea is to represent a sequence mining task as a logic program such that each of its answer sets comprises a pattern of interest, similar to the seminal work on ASP-based itemset mining in [J arvisalo, 2011]. Refining the considered patterns then corresponds to the addition of constraints or preferences, respectively, to an ASP encoding. Even without the deluge of patterns, mining interesting patterns by incorporating expert knowledge is a long standing is- sue. For one, [Muggleton and De Raedt, 1994] used Inductive Logic Programming with expert knowledge to improve the learning process. The data mining community addresses the issue algorithmically by defining search spaces corresponding to expert expectations, while being efficiently traversable. A common approach is to use constraints on patterns (eg. regular expressions) or on their occurrences (eg. max-gaps, max-duration). Also, interestingness measures have been defined [De Bie, 2011]. In particular, skypatterns [Ugarte et al., ] were introduced to manage complex preferences on patterns. Other approaches use data [Padmanabhan and Tuzhilin, 1998] or process [Flouvat et al., 2014] models to integrate expert knowledge into the mining process, for instance, in order to extract unexpected patterns. Recent works on pattern mining with Constraint Programming (CP) [Guns et al., 2011; Coquery et al., 2012; Negrevergne and Guns, 2015; Kemmar et al., 2015] step up the ladder by providing an effective and unified mining framework.1 However, expressing knowledge in terms of such constraints still remains difficult for domain experts. To the best of our knowledge, no existing framework addresses all aforementioned dimensions at once. In most cases, the addition of knowledge to the mining process remains a specific and tedious task that often results in dedicated algorithms. We address this shortcoming by proposing ASP as a uniform framework for intensive knowledge-based sequence mining. To this end, we develop concise yet versatile encodings of frequent (closed or maximal) sequential pattern mining as well as encodings for managing complex preferences on patterns. While the primary objective of our approach is not to tackle vast amounts of data, in the first place, we aim at a unified framework in which domain knowledge can easily be incorporated to process mid-size databases more effectively. In this way, we provide means enabling experts to extract small sets of patterns that they are really interested in. 2 Background Our terminology on sequence mining follows the one in [Negrevergne and Guns, 2015]. A database D is a multi-set of sequences over some given set of items.2 A sequence 1We return to these approaches in Section 6 in more detail. 2The generalization to sequences of itemsets, ie., sequences of sets of items, is straightforward and not considered here. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16) s = hsii1 i m with si 2 is included in a sequence t = htji1 j n with m n, written s v t, if si = tei for 1 i m and an increasing sequence heii1 i m of positive integers 1 ei n, called an embedding of s in t. For example, we have ha, ci v ha, b, ci relative to embedding h1, 3i. We write s vb t, if s is a prefix of t. Given a database D, the cover of a sequence s is the set of sequences in D that include s: cover(s, D) = {t 2 D | s v t}. The number of sequences in D including s is called its support, that is, support(s, D) = |cover(s, D)|. For an integer k, frequent sequence mining is about discovering all sequences s such that support(s, D) k. We often call s a (sequential) pattern, and k is also referred to as the (minimum) frequency threshold. Condensed representations of frequent patterns are defined as follows. A pattern s is maximal, if there is no other pattern t such that s v t and support(t, D) k; s is closed, if there is no other pattern t such that s v t and support(s, D) = support(t, D). Analogous properties are obtained for the prefix relation vb. Additional constraints over patterns (item constraints and regular expressions) or over embeddings (maximum gaps and maximal span) are discussed in [Negrevergne and Guns, 2015] but beyond the focus of this paper. A logic program is a set of rules of the form a0 :- a1, . . . , am, not am+1, . . . , not an. (1) where each ai is a propositional atom for 0 i n and not stands for default negation. If n = 0, rule (1) is called a fact. If a0 is omitted, (1) represents an integrity constraint. Semantically, a logic program induces a collection of so-called answer sets, which are distinguished models of the program determined by answer sets semantics; see [Gelfond and Lifschitz, 1991] for details. To facilitate the use of ASP in practice, several extensions have been developed. First of all, rules with variables are viewed as shorthands for the set of their ground instances. Further language constructs include conditional literals and cardinality constraints [Simons et al., 2002]. The former are of the form a : b1, . . . , bm, the latter can be written as s {c1, . . . , cn} t, where a and bi are possibly default negated literals and each cj is a conditional literal; s and t provide lower and upper bounds on the number of satisfied literals in a cardinality constraint. The practical value of both constructs becomes more apparent when used in conjunction with variables. For instance, a conditional literal like a(X) : b(X) in a rule s antecedent expands to the conjunction of all instances of a(X) for which the corresponding instance of b(X) holds. Similarly, 2 {a(X) : b(X)} 4 holds whenever between two and four instances of a(X) (subject to b(X)) are true. Specifically, we rely in the sequel on the input language of the ASP system clingo [Gebser et al., 2014]. Moreover, we draw upon asprin [Brewka et al., 2015], an ASP-based system for expressing combinations of qualitative and quantitative preferences among answer sets. A preference relation is defined via a declaration of the form #preference(p,t){c1, . . . , cn}. where p and t are the name and type of the preference relation, respectively, and each cj is a conditional literal.3 The 3See [Brewka et al., 2015] for more general preference elements. seq(1,1,d). seq(1,2,a). seq(1,3,b).seq(1,4,c). seq(2,1,a). seq(2,2,c). seq(2,3,b).seq(2,4,c). seq(3,1,a). seq(3,2,b). seq(3,3,c). seq(4,1,a). seq(4,2,b). seq(4,3,c). seq(5,1,a). seq(5,2,c). seq(6,1,b). seq(7,1,c). Listing 1: Facts specifying a database of sequences directive #optimize(p) instructs asprin to search for poptimal answer sets, being maximal wrt the strict preference relation associated with p. While asprin already comes with a library of predefined primitives and aggregate preference types, like subset or pareto, respectively, it also allows for adding customized preferences. To this end, users provide rules defining an atom better(p) that indicates whether an answer set X is preferred to another Y wrt the preference relation associated with p. Both sets X and Y are reified by asprin via unary predicates holds and holds ,4 whose instances are drawn upon in the definition of better(p). 3 ASP-based Sequence Mining 3.1 Fact format We represent a database D in terms of facts seq(t,p,e), saying that item e occurs at position p in a sequence t, denoted by a common term in the facts providing its elements. For instance, Listing 1 specifies a database of seven sequences: Term 1 2 3 4 5 6 7 Seq. hd, a, b, ci ha, c, b, ci ha, b, ci ha, b, ci ha, ci hbi hci Note that different terms can refer to the same elements for distinguishing multiple occurrences of a sequence, such as the identifiers 3 and 4 for ha, b, ci. Considering the frequency threshold k = 2, one can check that hai, hbi, hci, ha, bi, ha, ci, hb, ci, and ha, b, ci are the frequent patterns of the database. Since all of them are included in ha, b, ci, the latter is the unique maximal pattern. Moreover, the patterns hbi, hci, ha, ci, and ha, b, ci are closed, while hai, ha, bi, and hb, ci are not because they have the same support as respective extensions to ha, ci or ha, b, ci. 3.2 Mining frequent patterns: basic encoding The encoding principle for frequent sequence mining follows the one of [J arvisalo, 2011], that is, each answer set comprises a single pattern of interest. (This allows for a compact representation of exponentially many candidate patterns.) In contrast to itemset mining, however, we need to take the order of items in a pattern into account to determine its support and check its frequency. Listing 2 provides our basic encoding of frequent sequence mining. It relies on two parameters: max determines a maximum length for patterns of interest, and k specifies the frequency threshold. While longest sequences in a given database yield a natural maximum length for patterns that 4That is, holds(a)(or holds (a)) is true iff a2X (or a2Y ). Table 1: Atoms representing a pattern s = hsii1 i m, where 1 m max, and a given multiset D of sequences t = htji1 j n Atom Meaning slot(x) 1 x max may refer to the position m+1 x of an item sm+1 x in s, which exists whenever x m pos(x) 1 x m refers to the position m+1 x of an item sm+1 x in s pat(x,e) sm+1 x = e is the item at position m+1 x in s, where 1 x m item(e) item e belongs to some t 2 D cover(t) hsii1 i m v htji1 j n, that is, s v t inc(t,p,x) hsii1 i m x v htji1 j
n + 1, eg. inc(2,5, 1..max), can be omitted, but are kept here for ease of presentation. 17 tail(T,P,X) :- seq(T,P,E), slot(X), 18 { seq(T,P+1,F) } 0. 19 tail(T,P-1,X) :- tail(T,P,X-1), 1 < P, 20 slot(X). 21 tail(T,P-1,X) :- tail(T,P,X), 1 < P, 22 seq(T,P,E), not pat(X,E). 24 ins(T,X,E) :- seq(T,P,E), inc(T,P,X), 25 not tail(T,P,X). 26 :- item(E), X = 0..max, k { ins(T,X,E) }. Listing 3: Encoding part for selecting maximal patterns the (m+1 x)-th (or at the end, if x = 0) item into an extended pattern included in t are pulled together by the rule in Line 24 and 25, providing atoms ins(t,x,e). Finally, the integrity constraint in Line 26 denies candidate patterns to which an item can be added without falling below the frequency threshold, so that only maximal patterns remain. For the database in Listing 1, ha, b, ci is the single maximal pattern. Potential additions of items from the (longer) sequences hd, a, b, ci and ha, c, b, ci, denoted by 1 and 2, are indicated by the atoms ins(1,3,d) and ins(2,2,c). That is, d can be inserted before a or c after a to obtain an extended pattern including hd, a, b, ci or ha, c, b, ci, respectively. However, in both cases, the support drops to 1 and thus below the threshold k = 2, which shows that ha, b, ci is maximal. Listing 4 provides a modification of the encoding part from Listing 3 for selecting closed instead of maximal patterns. The difference is that the redundancy of a pattern is witnessed by an extended pattern with same support. Concerning sequences t that do not include a candidate pattern at hand, the additional rule in Line 26 and 27 yields ins(t,x,e) for any item e at any position x. Hence, the integrity constraint in Line 28 and 29, which replaces its counterpart from Listing 3, denies any candidate pattern such that all sequences including it agree on the insertion of some item. For instance, the frequent pattern hb, ci of the example database in Listing 1 is denied because atoms ins(t,2,a) for t 2 {1, 2, 3, 4, 5, 6, 7} indicate that the sequences including hb, ci, denoted by 1 to 4, comply with the insertion of a before b, while the remaining three sequences do not include hb, ci, so that they tolerate arbitrary additions of items. Similarly, the frequent patterns hai and ha, bi are discarded, thus leaving the closed patterns hbi, hci, ha, ci, and ha, b, ci. Relaxed notions of maximal or closed frequent patterns based on the prefix relation vb are easily obtained by dropping the rules for tail(t,p,x) from Line 17 to 22 in Listing 3 and substituting variable X with 0 in the remaining parts of Listing 3 and 4. This restricts the consideration of extended patterns to the addition of some item at the end. For instance, the frequent pattern hb, ci of the example database in Listing 1, which is neither maximal nor closed, becomes infrequent when extended at the end, so that it remains as maximal and closed under the relaxed notions. On the other hand, the closed pattern hbi is not maximal, even under the relaxation, because adding c yields the extended frequent pattern hb, ci. While the four notions of maximal or closed frequent patterns are distinct, either of them can be imposed by augmenting the 26 ins(T,X,E) :- item(E), X = 0..max, 27 seq(T,1,F), not cover(T). 28 :- item(E), X = 0..max, 29 ins(T,X,E) : seq(T,1,F). Listing 4: Modifications for selecting closed patterns basic encoding in Listing 2 with respective rules. In the next section, we further show how preferences can be utilized to find best patterns among those satisfying hard requirements. 4 Preference-based Mining We have demonstrated above how easily a basic encoding of frequent sequence mining can be extended in order to refine the patterns of interest. This can be taken one step further by exploiting ASP s preference handling capacities to extract preferred patterns [Cho et al., 2005]. The idea is to map a preference on patterns to one among answer sets (cf. Section 2), which is accomplished in two steps with the asprin system. At first, we have to provide implementations of pattern-specific preference types that are plugged into asprin. Once done, the defined preference types can be combined freely with those available in asprin s library to form complex preferences among the patterns given by answer sets. Let us illustrate our approach by modeling the extraction of skypatterns [Ugarte et al., ], which follow the Pareto principle in combining several quality measures. To illustrate skypatterns, we consider the exemplary pattern-oriented preference types area and aconf, measuring the product of a pattern s support and length or the ratio of support and the highest frequency of its contained items, respectively. More precisely, given a database D and a (frequent) pattern s = hsii1 i m, the associated area is m support(s, D), and the aconf measure is determined by support(s, D) max1 i m |{t 2 D | hsii v t}| , (2) where greater values are preferred in both cases. We further consider Pareto efficiency to aggregate the two measures.6 While the area preference can be directly mapped to asprin s built-in cardinality type (see below), a dedicated implementation of the aconf measure is given in Listing 5. To begin with, the auxiliary rules from Line 1 to 3 provide atoms of the form cont(e,f) and freq(f). The former express that f sequences in D contain the item e, and the latter indicate that some item belongs to f or more sequences. Moreover, the rules in Line 5 and 6 derive atoms hasfreq(1..f) for items e in a candidate pattern, so that the number of such atoms corresponds to the highest frequency. As described in Section 2, it remains to define an atom better(p) to indicate that an answer set X, reified in terms of holds(a) atoms, is preferred over (a previous) one, Y , given by holds (a), according to a preference p of type aconf. This is accomplished by the rule from Line 8 to 12, whose precondition of the form preference(p,aconf) 6That is, a pattern is preferred over another one if it is at least as good as the other in all criteria, and strictly better in some criterion. 1 cont(E,F) :- item(E), 2 F= #count{ T : seq(T,P,E) }. 3 freq(1..M) :- M = #max{ F : cont(E,F) }. 5 hasfreq(F) :- cont(E,F), pat(X,E). 6 hasfreq(F-1) :- hasfreq(F), 1 < F. 8 better(P) :- preference(P,aconf), 9 #sum{ 1,F,T : holds (hasfreq(F)), 10 holds (cover(T)); 11 -1,F,T : holds (hasfreq(F)), 12 holds (cover(T)) } > 0. Listing 5: Preference type implementation checks the type of preference p. The other literal is a #sum aggregate that casts ratios (2) to corresponding products via multiplication of the denominators. That is, each sequence t in D including the pattern comprised in answer set X contributes f 0 times the addend 1, where f 0 is the highest frequency of items in the pattern given by Y . In turn, each sequence t including the pattern in Y amounts to f instances of the addend -1 when f is the highest frequency for X. As a consequence, the #sum aggregate evaluates to a positive integer iff the ratio (2) obtained with the pattern in X is greater than the one for Y , which is then signaled by better(p). With the rules in Listing 5 at hand, the preference type denoted by aconf is ready for use in preference declarations.7 Listing 6 shows an asprin declaration of the preference relations specified above. In Line 1 and 2, the area preference is mapped to the built-in cardinality type, used to count conjunctions (&) of item positions in a pattern and sequences including it. The preference denoted by p1 thus aims at maximizing m support(s, D). The second preference p2 in Line 3 and 4 refers to the aconf type, whose implementation as above addresses the maximization of the ratio (2). In Line 5, both preference relations, referenced via dedicated atoms name(p1) and name(p2), are aggregated by means of the built-in preference type pareto. This yields the nested preference p3, made subject to optimization by asprin in Line 7. For the database in Listing 1, answer sets comprising the (frequent) patterns hci, ha, ci, and ha, b, ci turn out to be optimal in terms of Pareto efficiency. Their associated area and aconf measures are 6, 10, and 12 or 1, 5/6, and 2/3, respectively. While neither of the three patterns is preferred over another, they dominate the remaining patterns hai, hbi, ha, bi, and hb, ci. Of course, other customized preference types and different combinations thereof can be conceived as well, eg. using built-in types like lexico, and, and or (with their literal meanings). 5 Experiments Given its exhibited ease of modeling, it is now interesting to examine the computational behavior of our ASP-based approach. To this end, we conducted experiments on simulated 7For brevity, Listing 5 does not show an analogous rule replacing > 0 by >= 0 for defining bettereq(p), used additionally for aggregation via Pareto efficiency. 1 #preference(p1,more(cardinality)){ 2 pos(X) & cover(T): slot(X), seq(T,1,E) }. 3 #preference(p2,aconf){ hasfreq(F) : freq(F); 4 cover(T) :seq(T,1,E) }. 5 #preference(p3,pareto){ name(p1); name(p2) }. 7 #optimize(p3). Listing 6: Preference declaration databases. First, we present time efficiency results comparing our approach with the CP-based one of CPSM [Negrevergne and Guns, 2015]. Then, we illustrate the effectiveness of preference handling to reduce the size of output pattern sets. 5.1 Computing time The usage of simulated databases allows us to control and analyze the most important characteristics of data wrt the resulting time and memory demands. To be more precise, we generated databases8 using a retro-engineering process: 1) a set of random patterns is generated, 2) occurrences of patterns are assigned to 10% of 500 database sequences, and 3) each sequence of items is randomly generated according to the patterns it must contain and a mean length. In our experiments, we vary the mean length from 10 to 40, and contained items are randomly generated according to a Gaussian law (some items are more frequent than others) over a vocabulary of 50 items. For each data point, we then give average computing times over six similar databases. Figure 1 compares computing times of ASP-based sequence mining (using the ASP system clingo) and CPSM (based on the CP solver gecode). For fairness, we ran a CPSM version using constraints similar to those in our encodings.9 The timeout was set to 20 minutes. We do not compare our approach with dedicated algorithms, eg. [Zaki, 2001], which are known to be more efficient than declarative mining approaches (see [Negrevergne and Guns, 2015]). The results show that the computing times with ASP are comparable to CPSM for short sequences. When sequences become larger (mean length 40), the efficiency of our encoding somewhat decreases. This is due to our purely declarative approach. In fact, the check for embeddings is a heavy task for the solver, and the larger the sequences, the more complex it becomes (also in view of an increasing ground instantiation size). Unlike this, CPSM uses dedicated propagators for space-efficient embedding computation. Although it is beyond the focus of this paper, we mention that computing times can be greatly improved by using maxgap constraints on embeddings, a popular means of analysis since distant events in a sequence often have less significance. 5.2 Pattern set reduction This experiment aims at comparing different pattern mining tasks: skypatterns (with area and aconf measures) as well as 8The databases used in our experiments are available at https: //sites.google.com/site/aspseqmining. 9We are grateful to the developers of CPSM for this suggestion. 10 20 30 40 0% 10% 20% 30% Threshold Figure 1: Mean times for computing all frequent patterns wrt different thresholds and mean sequence lengths (10, 20, 30, and 40), using ASP (solid) and CPSM (dashed). Dashed horizontal lines indicate the timeout limit (1200 seconds). (BW/vb) closed and maximal frequent patterns. The objectives are to evaluate the effectiveness of our encodings on real databases and to illustrate how focusing on skypatterns drastically reduces the number of extracted patterns. We ran our respective encodings on two classical databases of the UCI collection [Lichman, 2013]: jmlr (a natural language processing database; each transaction is an abstract of a paper from the Journal of Machine Learning Research) and Unix user (each transaction is a series of shell commands executed by a user during one session). Figure 2 displays numbers of frequent patterns obtained wrt different thresholds. For the basic as well as condensed frequent pattern mining tasks, we see that the lower the threshold, the greater the number of patterns. Despite the exponential growth (or decrease, respectively) of candidate pattern sets, the number of skypatterns remains very low (at most five). In addition, we observed that the times for computing skypatterns are comparable to those for computing all frequent patterns. In fact, the number of models to compare remained quite low although the search space is the same. 6 Discussion We presented the first uniform approach dealing with several dimensions of sequence analysis. Our framework relies on off-the-shelf ASP technology and high-level specifications expressed in ASP. In turn, we have shown how effortlessly the basic approach can be extended to condensed pattern extraction and illustrated how ASP s preference handling capacities can be used for extracting patterns of interest. In doing so, we demonstrated the ease of incorporating knowledge into 50 100 150 200 Threshold log(#patterns) Preferences 100 200 300 Threshold log(#patterns) Preferences Figure 2: Numbers of frequent patterns wrt different thresholds. Top: Unix user database; bottom: jmlr database. the ASP-based mining process. Although our declarative encodings cannot fully compete with the low-level efficiency of dedicated data mining systems on basic sequence mining tasks, our approach has an edge over them as regards expressiveness and extensibility. Nonetheless, our empirical analysis revealed that its performance matches that of a closely related CP-based approach [Negrevergne and Guns, 2015]. Beyond that, we have seen that the incorporation of preferences can lead to a significant reduction of pattern sets. All in all, our approach thus aims at processing mid-size databases and the enablement of analysts to extract patterns that they are really interested in. Our approach follows the encoding principle of [J arvisalo, 2011], who used ASP for itemset mining. Unlike itemset mining, however, sequence mining is more demanding since we need to take the order of items in a pattern into account. The approach in [Guyet et al., 2014] tackles another sequential pattern mining task, viz. episode mining, enumerating multiple embeddings within a single sequence. This task has also been addressed by means of Boolean Satisfiability (SAT) [Coquery et al., 2012]. However, these approaches model sequential patterns with non-contiguous items by wildcards. In such a case, detecting embeddings is closer to string matching than subsequence finding. Unlike that, our encodings leave particular embeddings of a pattern implicit and focus on sequences for which some embedding exists. More generally, SAT lacks a modeling language, which makes it hard for any SAT-based approach to achieve a similar level of extensibility. The CPSM system [Negrevergne and Guns, 2015] pursues a CP-based approach and tackles sequential pattern mining tasks similar to ours. While we rely on fully declarative encodings, CPSM uses dedicated propagators for embedding computation. Although such propagators can be implemented by space-efficient special-purpose algorithms, they cannot easily be extended to accommodate further constraints. On the other hand, it will be interesting future work to equip ASP systems with similar constructs and to find a trade-off between declarativeness and effectiveness. An original aspect of our approach is the incorporation of preferences by using the asprin system, as illustrated by the simple implementation of skypatterns on top of any underlying encoding of frequent patterns. This highlights another advantage of ASP for declarative pattern mining: It is not only effortless to add new knowledge, but also to incorporate new forms of reasoning into the main mining task, and thus, to limit the effort of implementing new tasks. Unlike this, [Ugarte et al., ] use dynamic constraint satisfaction problems to address skypatterns. The idea is to add constraints for avoiding solutions dominated by already extracted ones, and to post-process patterns for filtering optimal ones. The asprin system works in a similar yet more transparent way, as a user only needs to specify her preferences and nothing else. Generally speaking, we have demonstrated that the modeling capacities of ASP are appropriate to express complex sequence mining tasks, where choice rules define the search space, integrity constraints eliminate invalid candidate patterns, and preferences distinguish optimal outcomes. In view of the data-intense domain at hand, some efforts had to be taken to develop economic encodings of generic sequence mining tasks. Experts can now benefit from our work and take advantage of it by stating further constraints or preferences, respectively, for extracting the patterns of their interest. Acknowledgments This work was partially funded by DFG grant SCHA 550/9. References [Brewka et al., 2015] G. Brewka, J. Delgrande, J. Romero, and T. Schaub. asprin: Customizing answer set preferences without a headache. In B. Bonet and S. Koenig, editors, Proceedings of the Twenty-Ninth National Conference on Artificial Intelligence (AAAI 15), pages 1467 1474. AAAI Press, 2015. [Cho et al., 2005] M. Cho, J. Pei, H. Wang, and W. Wang. Preference-based frequent pattern mining. International Journal of Data Warehousing and Mining, 1(4):56 77, 2005. [Coquery et al., 2012] E. Coquery, S. Jabbour, L. Sa ıs, and Y. Salhi. A SAT-based approach for discovering frequent, closed and maximal patterns in a sequence. In L. De Raedt, C. Bessiere, D. Dubois, P. Doherty, P. Frasconi, F. Heintz, and P. Lucas, editors, Proceedings of the Twentieth European Conference on Artificial Intelligence (ECAI 12), pages 258 263. IOS Press, 2012. [De Bie, 2011] T. De Bie. Maximum entropy models and subjective interestingness: An application to tiles in binary databases. Data Mining and Knowledge Discovery, 23(3):407 446, 2011. [Flouvat et al., 2014] F. Flouvat, J. Sanhes, C. Pasquier, N. Selmaoui-Folcher, and J. Boulicaut. Improving pattern discovery relevancy by deriving constraints from expert models. In T. Schaub, G. Friedrich, and B. O Sullivan, editors, Proceedings of the Twenty-first European Conference on Artificial Intelligence (ECAI 14), pages 327 332. IOS Press, 2014. [Gebser et al., 2014] M. Gebser, R. Kaminski, B. Kaufmann, and T. Schaub. Clingo = ASP + control: Preliminary report. In M. Leuschel and T. Schrijvers, editors, Technical Communications of the Thirtieth International Conference on Logic Programming (ICLP 14), Theory and Practice of Logic Programming, 14(4-5):Online Supplement, 2014. Available at http://arxiv.org/abs/1405.3694. [Gelfond and Lifschitz, 1991] M. Gelfond and V. Lifschitz. Classical negation in logic programs and disjunctive databases. New Generation Computing, 9:365 385, 1991. [Guns et al., 2011] T. Guns, S. Nijssen, and L. De Raedt. Itemset mining: A constraint programming perspective. Artificial Intelligence, 175(12-13):1951 1983, 2011. [Guyet et al., 2014] T. Guyet, Y. Moinard, and R. Quiniou. Using answer set programming for pattern mining. In Actes des Huiti emes Journ ees d Intelligence Artificielle Fondamentale (JIAF 14), 2014. Available at http://arxiv. org/abs/1409.7777. [J arvisalo, 2011] M. J arvisalo. Itemset mining as a challenge application for answer set enumeration. In J. Delgrande and W. Faber, editors, Proceedings of the Eleventh International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR 11), volume 6645 of Lecture Notes in Artificial Intelligence, pages 304 310. Springer-Verlag, 2011. [Kemmar et al., 2015] A. Kemmar, S. Loudni, Y. Lebbah, P. Boizumault, and T. Charnois. PREFIX-PROJECTION global constraint for sequential pattern mining. In G. Pesant, editor, Proceedings of the Twenty-first International Conference on Principles and Practice of Constraint Programming (CP 15), volume 9255 of Lecture Notes in Computer Science, pages 226 243. Springer-Verlag, 2015. [Lichman, 2013] M. Lichman. UCI machine learning repos- itory. http://archive.ics.uci.edu/ml, 2013. [Lifschitz, 2008] V. Lifschitz. What is answer set program- ming? In D. Fox and C. Gomes, editors, Proceedings of the Twenty-third National Conference on Artificial Intelligence (AAAI 08), pages 1594 1597. AAAI Press, 2008. [Muggleton and De Raedt, 1994] S. Muggleton and L. De Raedt. Inductive logic programming: Theory and methods. Journal of Logic Programming, 19/20:629 679, 1994. [Negrevergne and Guns, 2015] B. Negrevergne and T. Guns. Constraint-based sequence mining using constraint programming. In L. Michel, editor, Proceedings of the Twelfth International Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems (CPAIOR 15), volume 9075 of Lecture Notes in Computer Science, pages 288 305. Springer-Verlag, 2015. [Padmanabhan and Tuzhilin, 1998] B. Padmanabhan and A. Tuzhilin. A belief-driven method for discovering unexpected patterns. In R. Agrawal, P. Stolorz, and G. Piatetsky-Shapiro, editors, Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD 98), pages 94 100. AAAI Press, 1998. [Shen et al., 2014] W. Shen, J. Wang, and J. Han. Sequential pattern mining. In C. Aggarwal and J. Han, editors, Frequent Pattern Mining, pages 261 282. Springer-Verlag, 2014. [Simons et al., 2002] P. Simons, I. Niemel a, and T. Soininen. Extending and implementing the stable model semantics. Artificial Intelligence, 138(1-2):181 234, 2002. [Ugarte et al., ] W. Ugarte, P. Boizumault, B. Cr emilleux, A. Lepailleur, S. Loudni, M. Plantevit, C. Ra ıssi, and A. Soulet. Skypattern mining: From pattern condensed representations to dynamic constraint satisfaction problems. Artificial Intelligence. To appear. [Zaki, 2001] M. Zaki. SPADE: An efficient algorithm for mining frequent sequences. Machine Learning, 42(12):31 60, 2001.