# ASP-Based Declarative Process Mining

Francesco Chiariello¹, Fabrizio Maria Maggi², Fabio Patrizi¹

¹ DIAG - Sapienza University of Rome, Italy
² KRDB - Free University of Bozen-Bolzano, Italy

chiariello@diag.uniroma1.it, maggi@inf.unibz.it, patrizi@diag.uniroma1.it

We put forward Answer Set Programming (ASP) as a solution approach for three classical problems in Declarative Process Mining: Log Generation, Query Checking, and Conformance Checking. These problems correspond to different ways of analyzing business processes under execution, starting from sequences of recorded events, a.k.a. event logs. We tackle them in their data-aware variant, i.e., by considering events that carry a payload (a set of attribute-value pairs), in addition to the performed activity, specifying processes declaratively with an extension of linear-time temporal logic over finite traces (LTLf). The data-aware setting is significantly more challenging than the control-flow one: Query Checking is still open, while the existing approaches for the other two problems do not scale well. The contributions of the work include an ASP encoding schema for the three problems, their solution, and experiments showing the feasibility of the approach.

## Introduction

Process Mining (PM) for Business Process Management (BPM) is a research area aimed at discovering common patterns in Business Processes (BP) (van der Aalst 2016). The analysis starts from event logs, i.e., sets of traces that record the events associated with process instance executions, and typically assumes a process model, which may be taken as input, manipulated as an intermediate structure, or produced in output. Events describe process activities at different levels of detail.
In the simplest perspective, here referred to as control-flow-only, events model atomic activities performed by a process instance at some time point; in the most complex scenario, typically referred to as multi-perspective, events also carry a payload including a timestamp and activity data. Processes can be specified prescriptively, i.e., as models, such as Petri Nets, that generate traces, or declaratively, i.e., through logical formulas representing the constraints that traces must satisfy in order to comply with the process. This is the approach we adopt here. Specifically, we take an (untimed) data-aware perspective where events include the activity and a set of attribute-value pairs, the payload.

Copyright 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

In Declarative PM, the de-facto standard for expressing process properties is DECLARE (van der Aalst, Pesic, and Schonenberg 2009), a temporal logic consisting of a set of template formulas of the Linear-time Temporal Logic over finite traces (LTLf) (De Giacomo and Vardi 2013); here, we use a strictly more expressive extension, which we call local LTLf, i.e., L-LTLf. This logic features a simple automata-based machinery that facilitates its manipulation, while retaining the ability to express virtually all the properties of interest in declarative PM. Specifically, L-LTLf subsumes DECLARE and even its multi-perspective variant MP-DECLARE (Burattin, Maggi, and Sperduti 2016) without timestamps and correlation conditions (i.e., conditions that relate the attributes of some event to those of other events), but does not subsume full MP-DECLARE. Observe that, since MP-DECLARE does not subsume L-LTLf either, the two logics are incomparable.

Our goal is to devise techniques for three classical problems in Declarative PM: Event Log Generation (Skydanienko et al. 2018), i.e., generating an event log consistent with a declarative model; Query Checking (Räim et al.
2014), i.e., discovering hidden temporal properties in an event log; and Conformance Checking (Burattin, Maggi, and Sperduti 2016), i.e., checking whether the traces of an event log conform to a process model. The main challenge is dealing with data: in the data-aware framework, Query Checking is still open, while existing tools for the other two problems do not scale well.

We put forward Answer Set Programming (ASP) (Niemelä 1999) as an effective solution approach. ASP natively provides data-manipulation functionalities which allow for formalizing data-aware constraints, and has experienced over the last years a dramatic improvement in solution performance, thus resulting in a natural and promising candidate approach for addressing the problems of our interest. We show how such problems can be conveniently modeled as ASP programs, thus solved with any solver. Using the state-of-the-art solver Clingo¹ (Gebser et al. 2019), we experimentally compare our approach against existing ones for Log Generation and Conformance Checking, and show the effectiveness of the approach for Query Checking in a data-aware setting. Besides providing an actual solution technique, ASP facilitates reuse of specifications: the ASP encodings we propose here, indeed, differ in very few, although important, details.

¹ potassco.org

*The Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)*

Previous related work includes (Wieczorek, Jastrzab, and Unold 2020), where ASP is used to infer a finite-state automaton that accepts (resp. rejects) traces from a positive (resp. negative) input set. This can be seen as a form of Declarative Process Discovery, i.e., the problem of obtaining a (declarative) process specification, which is complementary to the problems we address here. Our approach is similar, in that we use automata to model temporal properties. However, we propose a different automata encoding and show the effectiveness of the approach on three different problems.
Another related paper is (Heljanko and Niemelä 2003), which shows how ASP can be used to check a Petri Net against an LTL specification, up to a bounded time horizon. Differently from our work, it: (i) deals with LTL over infinite, as opposed to finite, runs; (ii) adopts a prescriptive, as opposed to declarative, approach; and (iii) does not deal with data in events. From a broader perspective, we finally observe that while we deal with a set of specific problems, the work paves the way for ASP to become a general effective approach to Declarative PM.

## The Framework

An activity (signature) is an expression of the form $A(a_1, \ldots, a_{n_A})$, where $A$ is the activity name and each $a_i$ is an attribute name. We call $n_A$ the arity of $A$. The attribute names of an activity are all distinct, but different activities may contain attributes with matching names. We assume a finite set $Act$ of activities, all with distinct names; thus, activities can be identified by their name, instead of by the whole tuple. Every attribute (name) $a$ of an activity $A$ is associated with a type $D_A(a)$, i.e., the set of values that can be assigned to $a$ when the activity is executed. For simplicity, we assume that all domains are equipped with the standard relations $<, \leq, =, \geq, >$. All results can be immediately adapted if some relations are absent in some domain.

An event is the execution of an activity (at some time) and is formally captured by an expression of the form $e = A(v_1, \ldots, v_{n_A})$, where $A \in Act$ is an activity name and $v_i \in D_A(a_i)$. The properties of interest in this work concern (log) traces, formally defined as finite sequences of events $\tau = e_1 \cdots e_n$, with $e_i = A_i(v_1, \ldots, v_{n_{A_i}})$. Traces model process executions, i.e., the sequences of activities performed by a process instance. A finite collection of executions, gathered into a set $L$ of traces, is called an event log.
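To make the framework concrete, the following Python sketch (our illustration only; the activity names, attributes, and domains are invented, not taken from the paper) renders activity signatures, events, and traces as data structures:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Event:
    """An event: an activity name from Act plus a payload v_i in D_A(a_i)."""
    activity: str
    payload: Dict[str, int]

# Activity signatures: attribute names and their (finite) value domains D_A.
# These example activities are hypothetical.
SIGNATURES: Dict[str, Dict[str, range]] = {
    "a": {"n": range(0, 10)},   # activity a(n), with n ranging over 0..9
    "b": {},                    # activity b with no attributes
}

def well_formed(e: Event) -> bool:
    """Check that an event's payload matches its activity signature."""
    sig = SIGNATURES.get(e.activity)
    if sig is None or set(e.payload) != set(sig):
        return False
    return all(v in sig[name] for name, v in e.payload.items())

# A trace models one process execution; an event log is a list of traces.
trace: List[Event] = [Event("a", {"n": 3}), Event("b", {})]
```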
## L-LTLf

We adopt a declarative approach to process modeling, meaning that processes are specified through a set of constraints over their executions, i.e., over the traces they produce. The formal language we use to express properties of traces is a variant of the Linear-time Logic over finite traces, LTLf (De Giacomo and Vardi 2013), adapted to deal with the data attributes of activities. We call such variant Linear-time Logic with local conditions over finite traces, or local LTLf for short, and denote it as L-LTLf.

Given a finite set of activities $Act$, the formulas $\varphi$ of L-LTLf over $Act$ are inductively defined as follows:

$$\varphi ::= true \mid A \mid a \odot a' \mid a \odot v \mid \neg\varphi \mid \varphi \wedge \varphi \mid X\varphi \mid \varphi U \varphi,$$

where: $a$ and $a'$ are attribute names from some activity in $Act$; $v \in D_A(a)$, for some $A \in Act$; $\odot$ is an operator from $\{<, \leq, =, \geq, >\}$; and $A$ is an activity name from $Act$. Formulas of the form $true$, $A$, $a \odot a'$, and $a \odot v$ are called atomic; formulas not containing the operators $X$ and $U$ are called event formulas.

The logic is interpreted over positions of finite traces. Formula $true$ holds at every position; $A$ checks whether activity $A$ occurs in the trace at the given position; $a \odot a'$ (resp. $a \odot v$) compares the value assigned to attribute $a$ with that of attribute $a'$ (resp. with value $v$), at the given position; boolean operators combine formulas as usual; the next operator $X\varphi$ checks whether $\varphi$ holds in the suffix starting at the next position; finally, the until operator $\varphi U \varphi'$ checks whether $\varphi'$ is satisfied at some position $k$, and whether $\varphi$ holds in all positions that precede $k$, starting from the given position.

Formally, we define by induction when a trace $\tau = e_1 \cdots e_n$ satisfies an L-LTLf formula $\varphi$ at position $1 \leq i \leq n$, written $\tau, i \models \varphi$, as follows:

- $\tau, i \models true$;
- $\tau, i \models A$ iff $e_i = A(v_1, \ldots, v_{n_A})$;
- $\tau, i \models a \odot a'$ iff, for $e_i = A_i(v_1, \ldots, v_{n_{A_i}})$ and $A_i(a_1, \ldots, a_{n_{A_i}})$ the signature of $A_i$, it is the case that, for some $j$ and $k$, $a = a_j$, $a' = a_k$, and $v_j \odot v_k$;²
- $\tau, i \models a \odot v$ iff, for $e_i = A_i(v_1, \ldots, v_{n_{A_i}})$ and $A_i(a_1, \ldots, a_{n_{A_i}})$ the signature of $A_i$, it is the case that, for some $j$, $a = a_j$ and $v_j \odot v$;
- $\tau, i \models \varphi_1 \wedge \varphi_2$ iff $\tau, i \models \varphi_1$ and $\tau, i \models \varphi_2$;
- $\tau, i \models \neg\varphi$ iff $\tau, i \not\models \varphi$;
- $\tau, i \models X\varphi$ iff $i < n$ and $\tau, i + 1 \models \varphi$;
- $\tau, i \models \varphi_1 U \varphi_2$ iff there exists $i \leq j \leq n$ s.t. $\tau, j \models \varphi_2$ and, for every $i \leq k \leq j - 1$, it is the case that $\tau, k \models \varphi_1$.

Notice that while, in general, the satisfaction of an L-LTLf formula $\varphi$ at some position $i$ of $\tau$ depends on the whole trace, and precisely on the suffix of $\tau$ starting at position $i$, event formulas depend only on the event at $i$. As in standard LTLf, $X$ denotes the strong next operator (which requires the existence of a next event where the inner formula is evaluated), while $U$ denotes the strong until operator (which requires the right-hand formula to eventually hold, forcing the left-hand formula to hold in all intermediate events). The following are standard abbreviations: $\varphi_1 \vee \varphi_2 \doteq \neg(\neg\varphi_1 \wedge \neg\varphi_2)$; $\varphi_1 \rightarrow \varphi_2 \doteq \neg\varphi_1 \vee \varphi_2$; $F\varphi \doteq true\,U\,\varphi$ (eventually $\varphi$); and $G\varphi \doteq \neg F \neg\varphi$ (globally, or always, $\varphi$). Through L-LTLf one can express properties of process traces that involve not only the process control-flow but also the manipulated data.

² Notice that this requires compatibility between the domains $D_{A_i}(a_j)$ and $D_{A_i}(a_k)$ wrt relation $\odot$.

*Figure 1: Automaton for the Response constraint.*

**Example 1.** The L-LTLf formula $\varphi = G(a \rightarrow Fb)$, a so-called Response constraint, says that whenever activity $a$ occurs, it must be eventually followed by activity $b$. A possible data-aware variant of $\varphi$ is the formula $\varphi' = G((a \wedge n < 5) \rightarrow Fb)$, which says that whenever activity $a$ occurs with attribute $n$ less than 5, it must be followed by activity $b$.

Formulas of LTLf, thus of L-LTLf, have the useful property of being fully characterized by finite-state, possibly nondeterministic, automata. Specifically, for every L-LTLf formula $\varphi$ there exists a finite-state automaton (FSA) $\mathcal{A}_\varphi$ that accepts all and only the traces that satisfy $\varphi$ (De Giacomo and Vardi 2013). Such automata are standard FSAs with transitions labelled by event formulas.
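The inductive satisfaction relation translates directly into a recursive evaluator. The following Python sketch is our own illustration (the tuple-based formula representation and all names are ours, not the paper's); it covers activity tests, $a \odot v$ atoms, and the strong $X$ and $U$ operators, and derives $F$, $G$, and implication as abbreviations, as in Example 1:

```python
import operator

# Comparison operators available in the attribute domains.
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}

def holds(trace, i, phi):
    """True iff trace, i |= phi; positions i are 1-based, as in the paper.
    Events are (activity, payload) pairs; formulas are nested tuples."""
    act, payload = trace[i - 1]
    kind = phi[0]
    if kind == "true":
        return True
    if kind == "act":                       # activity test: A
        return act == phi[1]
    if kind == "cmp":                       # attribute vs constant: a (.) v
        _, a, op, v = phi
        return a in payload and OPS[op](payload[a], v)
    if kind == "not":
        return not holds(trace, i, phi[1])
    if kind == "and":
        return holds(trace, i, phi[1]) and holds(trace, i, phi[2])
    if kind == "X":                         # strong next: needs a next event
        return i < len(trace) and holds(trace, i + 1, phi[1])
    if kind == "U":                         # strong until
        return any(holds(trace, j, phi[2]) and
                   all(holds(trace, k, phi[1]) for k in range(i, j))
                   for j in range(i, len(trace) + 1))
    raise ValueError(kind)

# Standard abbreviations, plus the data-aware Response constraint
# G((a and n < 5) -> F b) of Example 1.
def F(phi): return ("U", ("true",), phi)
def G(phi): return ("not", F(("not", phi)))
def implies(p, q): return ("not", ("and", p, ("not", q)))

response = G(implies(("and", ("act", "a"), ("cmp", "n", "<", 5)),
                     F(("act", "b"))))
```

On the trace $a(3)\,b$ the constraint is satisfied; on $a(3)$ alone it is violated, while $a(7)$ alone satisfies it because the data condition $n < 5$ does not trigger the obligation.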
For a fixed set of activities $Act$, let $\Gamma_{Act}$ be the set of event formulas over $Act$. An FSA over a set of activities $Act$ is a tuple $\mathcal{A} = \langle Q, q_0, \delta, F \rangle$, where: $Q$ is a finite set of states; $q_0 \in Q$ is the automaton initial state; $\delta \subseteq Q \times \Gamma_{Act} \times Q$ is the automaton transition relation; $F \subseteq Q$ is the set of automaton final states. Without loss of generality, we assume that the formulas labeling transitions are conjunctions of literals. It is immediate to show that every FSA can be rewritten in this way.

A run of an FSA $\mathcal{A}$ on a trace $\tau = e_1 \cdots e_n$ (over $Act$) is a sequence of states $\rho = q_0 \cdots q_n$ s.t., for all $0 \leq i \leq n - 1$, there exists a transition $\langle q_i, \gamma, q_{i+1} \rangle \in \delta$ s.t. $\tau, i + 1 \models \gamma$. A trace $\tau$ over $Act$ is accepted by $\mathcal{A}$ iff it induces a run of $\mathcal{A}$ that ends in a final state. Notice that satisfaction of $\gamma$, this being an event formula, can be established by looking at each event $e_{i+1}$ at a time, while disregarding the rest of the trace; thus, in order to construct the induced run $\rho$, one can proceed in an online fashion, as the next event arrives, by simply triggering, at every step, a transition outgoing from $q_i$ whose label is satisfied by the event.

**Example 2.** Consider again the formulas $\varphi$ and $\varphi'$ shown above, and the (parametric) automaton depicted in Fig. 1. It is easy to see that, for $\varphi_1 = a$, $\varphi_2 = b$, $\varphi_3 = \neg a$, $\varphi_4 = \neg b$, the resulting automaton accepts all and only the traces that satisfy $\varphi$, as well as that, for $\varphi_1 = a \wedge n < 5$, $\varphi_2 = b$, $\varphi_3 = \neg(a \wedge n < 5)$, $\varphi_4 = \neg b$, the obtained automaton accepts all and only the traces that satisfy $\varphi'$.

The details about the construction of $\mathcal{A}_\varphi$ from $\varphi$ are not in the scope of this work, and we refer the interested reader to (De Giacomo and Vardi 2013) for more information; we rely on the results therein.
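The online run construction just described can be sketched as follows (our illustration, not the paper's implementation; transition guards are arbitrary Python predicates over a single event, standing in for event formulas, and the nondeterminism is handled by tracking the set of reachable states):

```python
def accepts(states0, delta, finals, trace):
    """delta: list of (source, guard, target) with guard(event) -> bool.
    Advance the nondeterministic automaton one event at a time."""
    current = set(states0)
    for event in trace:
        current = {t for (s, guard, t) in delta
                   if s in current and guard(event)}
        if not current:          # no run can be extended any further
            return False
    return bool(current & finals)

# The Response automaton of Fig. 1 for G(a -> F b), with
# phi1 = a, phi2 = b, phi3 = not a, phi4 = not b.
delta = [
    ("s0", lambda e: e[0] == "a", "s1"),
    ("s1", lambda e: e[0] == "b", "s0"),
    ("s0", lambda e: e[0] != "a", "s0"),
    ("s1", lambda e: e[0] != "b", "s1"),
]
```

For instance, the trace $ab$ ends in the accepting state $s_0$, while $a$ alone is stuck in $s_1$ with a pending obligation and is rejected.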
We observe that, while the automaton construction is time-exponential in the worst case wrt the size of the input formula $\varphi$, tools exist, such as Lydia³ (De Giacomo and Favorito 2021), which exhibit efficient performance in practice; this, combined with the fact that the specifications of practical interest are typically small, makes the approaches based on automata construction usually feasible in practice.

We can now formalize the problems addressed in this paper.

**Event Log Generation.** Given a set $\Phi$ of L-LTLf formulas over a set of activities $Act$ and a positive integer $t$, return a trace $\tau$ over $Act$ of length $t$ s.t., for every formula $\varphi \in \Phi$, it is the case that $\tau \models \varphi$. In words, the problem amounts to producing a trace of length $t$ over $Act$ that satisfies all the input constraints in $\Phi$. A more general version of the problem requires generating a log $L$ of $n$ traces of fixed length $t$ satisfying the constraints. For simplicity, we consider the first formulation.

**Query Checking.** Query Checking takes as input formulas from the extension of L-LTLf with activity variables, defined as follows: $\varphi ::= true \mid\ ?V \mid A \mid a \odot a' \mid a \odot v \mid \neg\varphi \mid \varphi \wedge \varphi \mid X\varphi \mid \varphi U \varphi$, where symbols starting with ? are activity variables and the other symbols are as in L-LTLf (given $Act$). Given an L-LTLf formula with activity variables, by assigning an activity (from $Act$) to every variable, we obtain a regular L-LTLf formula. Formally, for an L-LTLf formula $\varphi$ (over $Act$), containing a (possibly empty) set of activity variables $Vars_\varphi$, an assignment to $Vars_\varphi$ is a total function $\nu : Vars_\varphi \to Act$. Given $\varphi$ and an assignment $\nu$ to its activity variables, $\varphi[\nu]$ denotes the (regular) L-LTLf formula obtained by replacing, in $\varphi$, every variable symbol $?V$ with an activity name from $Act$. Observe that, if $Vars_\varphi = \emptyset$, there exists only one assignment $\nu$ and $\varphi = \varphi[\nu]$. Given a trace $\tau$, since $\varphi[\nu]$ is a regular L-LTLf formula, we can check whether $\tau \models \varphi[\nu]$.
An instance of Query Checking consists of a log $L$ and an L-LTLf formula $\varphi$ with activity variables $Vars_\varphi$; a solution is a set $\Lambda$ of assignments to $Vars_\varphi$ s.t., for every assignment $\nu \in \Lambda$ and every trace $\tau \in L$, it holds that $\tau \models \varphi[\nu]$. In words, Query Checking requires finding a set $\Lambda$ of assignments $\nu$, each transforming the input formula $\varphi$ into an L-LTLf formula $\varphi[\nu]$ satisfied by all the traces of the input log $L$. Observe that the variables of $\varphi$ can only range over activities.

**Conformance Checking.** Given a trace $\tau$ and a set $\Phi$ of L-LTLf formulas, both over the same set of activities $Act$, check whether, for all formulas $\varphi \in \Phi$, $\tau \models \varphi$. The problem can also be defined in a more general form, where $\tau$ is replaced by a log $L$ of traces over $Act$ and the task requires checking whether, for all the traces $\tau$ of $L$ and all $\varphi \in \Phi$, it holds that $\tau \models \varphi$.

³ github.com/whitemech/lydia/releases/tag/v0.1.1

## Answer Set Programming (ASP)

An ASP program consists of a set of rules which define predicates and impose relationships among them. The task of an ASP solver is that of finding a finite model of the program, i.e., an interpretation of the predicates that satisfies the program rules. ASP rules are written in a fragment of (function-free) First-order Logic (FOL) extended with a special negation-as-failure (NAF) operator (in addition to classical negation) which allows for distinguishing facts that are false from facts that are unknown. The presence of this operator, combined with the classical FOL negation, has a huge impact on the programs one can write and the way models are found. Here, we do not discuss these details, referring the interested reader to (Gelfond and Lifschitz 1988; Niemelä 1999). For our purposes, it will be sufficient to restrict to the class of rules with the NAF operator as the only available negation operator (that is, disallowing classical negation).

### Syntax

The basic constructs of ASP programs are: 1. constants, identified by strings starting with a lower-case letter; 2.
variables, identified by strings starting with an upper-case letter; 3. terms, i.e., constants or variables; 4. atoms, i.e., expressions of the form $p(t_1, \ldots, t_n)$, where $p$ is a predicate, identified by a string, and each $t_i$ is a term. A predicate $p$ is said to have arity $n$ if it occurs in an expression of the form $p(t_1, \ldots, t_n)$. An atom containing only constant terms is said to be ground.

ASP rules are obtained by combining the basic elements through boolean operators and the NAF operator. In this work, we use rules of the following form:

$$a \leftarrow l_1, \ldots, l_n, \mathit{not}\ l_{n+1}, \ldots, \mathit{not}\ l_m$$

where $a$ and each $l_i$ are atoms $p(t_1, \ldots, t_n)$, $\mathit{not}$ denotes the NAF operator, and every variable occurring in the rule also occurs in some atom $l_1, \ldots, l_n$. The left-hand side is called the rule's head and is optional. When the head is absent, the rule is called an integrity constraint. The right-hand side is called the body and can be left empty, in which case the symbol $\leftarrow$ is omitted and the rule is called a fact.

### Semantics

Intuitively, a model of an ASP program $P$ is a set of ground atoms that satisfies all program rules. In general, many models exist. Among these, only those that are minimal wrt set inclusion and that contain a ground atom only if "needed", i.e., if it occurs as the head of a ground rule, are taken as solutions, called the answer sets of $P$ in the ASP terminology. The task of an ASP solver is to compute such sets.

Given an ASP program $P$ and a rule $r \in P$, the set of $r$'s ground instantiations is the set $G(r)$ of rules obtained by replacing all the variables in $r$ with all the constants mentioned in $P$ (the so-called Herbrand universe of $P$), in all possible ways, so that all rules in $G(r)$ contain only ground atoms. Then, the ground instantiation $G(P)$ of a program $P$ is the union of all the ground instantiations of its rules, i.e., $G(P) = \bigcup_{r \in P} G(r)$. An interpretation $I$ of a program $P$ is a set of ground atoms $p(c_1, \ldots, c_n)$, where $p$ is a predicate of arity $n$ occurring in $P$ and $c_1, \ldots, c_n$ are constants from the Herbrand universe of $P$. Given a positive (i.e., without occurrences of $\mathit{not}$) program $P$, an interpretation $I$ is a model of $P$ if, for every ground rule $a \leftarrow l_1, \ldots, l_n$ in $G(P)$, whenever $l_i \in I$ for $i = 1, \ldots, n$, it holds that $a \in I$. An answer set of $P$ is a model $I$ that is minimal wrt set inclusion.

The semantics of general programs is obtained as a reduction to positive programs. Namely, the reduct of a ground program $G(P)$ wrt an interpretation $I$ is the positive ground program $G(P)^I$ obtained by:

- deleting all the rules $a \leftarrow l_1, \ldots, l_n, \mathit{not}\ l_{n+1}, \ldots, \mathit{not}\ l_m$ of $G(P)$ s.t. $l_i \in I$ for some $i \in \{n+1, \ldots, m\}$;
- replacing all the remaining rules $a \leftarrow l_1, \ldots, l_n, \mathit{not}\ l_{n+1}, \ldots, \mathit{not}\ l_m$ with $a \leftarrow l_1, \ldots, l_n$.

Intuitively, the first transformation removes a rule, as it is already satisfied by $I$; the second transformation removes the so-called negative body of the rule, because it is satisfied. As it can be easily seen, the resulting program $G(P)^I$ does not mention the $\mathit{not}$ operator. The interpretation $I$ is an answer set of $P$ if it is an answer set of $G(P)^I$.

In this work, we do not discuss the algorithms to compute the answer sets of a program, but focus on how the problems of our interest can be encoded in ASP and then solved by an ASP solver, in such a way that the returned answer sets represent the solutions to our problems. This is the focus of the next section. For the experiments, we use the state-of-the-art solver Clingo.

## ASP for Declarative Process Mining

We encode Log Generation, Conformance Checking, and Query Checking into ASP programs. For every L-LTLf formula $\varphi$ we deal with, we assume available the corresponding automaton $\mathcal{A}_\varphi$. The three programs share some common parts, such as the automata and the traces, which are modeled through suitable predicates and ASP rules.
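Before detailing the encodings, the reduct-based semantics recalled above can be illustrated with a small, purely didactic Python sketch of ours (not part of the paper's approach): ground rules are triples of head, positive body, and negative body, the reduct strips negation, and answer sets are checked via the least model of the resulting positive program.

```python
def least_model(positive_rules):
    """Least model of a positive ground program, by fixpoint iteration."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos, _neg in positive_rules:
            if set(pos) <= model and head not in model:
                model.add(head)
                changed = True
    return model

def reduct(rules, interpretation):
    """G(P)^I: drop rules whose negative body intersects I, strip 'not'."""
    return [(h, pos, ()) for (h, pos, neg) in rules
            if not (set(neg) & interpretation)]

def is_answer_set(rules, interpretation):
    """I is an answer set of P iff I is the least model of G(P)^I."""
    return least_model(reduct(rules, interpretation)) == interpretation

# Classic example: p <- not q.  q <- not p.
# This program has exactly two answer sets, {p} and {q}.
program = [("p", (), ("q",)), ("q", (), ("p",))]
```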
Each encoding re-uses some of these parts, possibly customized, together with additional fragments used to model problem-specific features.

Activities are captured by the unary predicate act(A), where A is the activity name. In the presence of data, activity signatures are modeled by the binary predicate has_attr(A, N), where A is the activity name and N is the attribute name. Attributes may be typed by stating the set of values they can take, through the predicate val(N, V), where N is the attribute name and V one of its possible values.

A trace is modeled by the binary predicate trace(A, T), where A is the activity and T the time point where it occurs. Time points come from the predicate time(T), which contains the values {0, …, ℓ}, for ℓ the trace length. The trace is defined on time points from 0 to ℓ − 1. In the presence of data, activity attributes are paired with values through the predicate has_val(N, V, T), where N is the attribute name, V the assigned value, and T the time point. Notice that the association is based on the time point (exactly one activity is performed at each time point). Simple integrity constraints are used to ensure that the mentioned attributes belong in fact to the activity and that the assigned value comes from the corresponding type.

Automata are encoded with the predicates init(S), acc(S), trans(S, F, S′), and hold(F, T). The first and the second one model the initial state and the accepting states of the automaton, the third one models the existence of a transition from S to S′ under the event formula represented by the integer F, and the last one models satisfaction of (event) formula F at time point T. In the presence of multiple L-LTLf formulas, each automaton is identified by a unique integer value and an additional parameter is added to the above predicates to refer to the various automata.

**Example 3.** The ASP encoding of the automaton for the LTLf formula G(a → Fb), shown in Fig. 1, for ϕ1 = a, ϕ2 = b, ϕ3 = ¬a, ϕ4 = ¬b, is as follows:

    init(1, s0).
    acc(1, s0).
    trans(1, s0, 1, s1).    hold(1, 1, T) ← trace(a, T).
    trans(1, s1, 2, s0).    hold(1, 2, T) ← trace(b, T).
    trans(1, s0, 3, s0).    hold(1, 3, T) ← not hold(1, 1, T), time(T).
    trans(1, s1, 4, s1).    hold(1, 4, T) ← trace(A, T), activity(A), A ≠ b.

where a and b are activities and each formula ϕi (i = 1, …, 4) is identified by the index i in the encoding.

In a data-aware setting, conditions on data can simply be added to the rules for hold. For example, the following rule:

    hold(1, 1, T) ← trace(a, T), has_val(number, V, T), V < 5.

expresses the fact that the event formula ϕ1 holds at time T if activity a occurs at time T in the trace, with a value less than 5 assigned to its attribute number.

To capture satisfaction of an L-LTLf formula ϕ by a trace τ, we model the execution of the automaton Aϕ on τ. To this end, we introduce the predicate state(I, S, T), which expresses the fact that automaton (with index) I is in state S at time T. Since the automaton is in general nondeterministic, it can be in many states at time point T (except for the initial one). The rules defining state are the following:

    state(I, S, 0) ← init(I, S).
    state(I, S′, T) ← state(I, S, T − 1), trans(I, S, F, S′), hold(I, F, T − 1).

The first one says that at time point 0 every automaton I is in its respective initial state. The second one says that the current state of automaton I at time point T is S′ whenever the automaton is in state S at the previous time point T − 1, the automaton contains a transition from S to S′ under some event formula with index F, and the formula holds at time T − 1 in the trace.

Finally, the fact that a trace is accepted by all automata, i.e., that the trace satisfies the corresponding formulas, is stated by requiring that, for each automaton, at least one of the final states be accepting (tlength(T) denotes the length T of the trace):

    {state(I, S, T) : not acc(I, S), tlength(T)} = 0.
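Read operationally, the two rules defining state describe a forward computation over time points. The following Python sketch (ours, not part of the paper's encoding) mimics that derivation from the init, trans, and hold facts of the Response automaton, run on the trace a b:

```python
def run_states(init, trans, hold, horizon):
    """Derive state(I, S, T) facts bottom-up.
    init: {(I, S)}; trans: {(I, S, F, S2)}; hold: {(I, F, T)}."""
    state = {(i, s, 0) for (i, s) in init}       # state(I,S,0) <- init(I,S).
    for t in range(1, horizon + 1):
        for (i, s, f, s2) in trans:              # state(I,S2,T) <- state(I,S,T-1),
            if (i, s, t - 1) in state and (i, f, t - 1) in hold:
                state.add((i, s2, t))            #   trans(I,S,F,S2), hold(I,F,T-1).
    return state

# Response automaton (index 1) of Example 3 on the trace a b.
init = {(1, "s0")}
trans = {(1, "s0", 1, "s1"), (1, "s1", 2, "s0"),
         (1, "s0", 3, "s0"), (1, "s1", 4, "s1")}
hold = {(1, 1, 0), (1, 4, 0),   # at time 0 (event a): phi1 = a, phi4 = not b
        (1, 2, 1), (1, 3, 1)}   # at time 1 (event b): phi2 = b, phi3 = not a
```

Running it over two time points derives state(1, s1, 1) and then state(1, s0, 2), i.e., the automaton ends in the accepting state s0, so the trace a b satisfies G(a → Fb).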
For lack of space, we discuss only the main rules.

### Event Log Generation

The encoding schema of Event Log Generation is as follows:

1. Activities, attributes, attribute types, and trace length are provided as input and formalized as discussed above.
2. For each input L-LTLf constraint ϕ, the corresponding automaton Aϕ is generated and modeled as discussed, using a unique integer value to identify it.
3. Suitable integrity constraints are defined to ensure that: each time point in the trace has exactly one activity; every attribute is assigned exactly one value; and the attributes assigned at a given time point actually belong to the activity occurring at that time point.
4. Finally, the predicate state is defined as above and it is required that every automaton ends up in at least one final state at the last time point.
5. The predicates trace and has_val contain the solution, i.e., they model a sequence of activities, whose attributes have an assigned value, which satisfies all the input constraints.

### Query Checking

The ASP specification of Query Checking is analogous to that of Log Generation, except for the following. Firstly, the problem takes as input a set of fully specified traces. This is dealt with in a simple way, by adding a parameter to the predicate trace representing the (unique) identifier of the trace and, consequently, by adding such a parameter to all the predicates that depend on the trace (e.g., has_val, hold, state). Secondly, the input L-LTLf formulas contain activity variables. To deal with them, additional predicates var(V) and assgnmt(V, W) are introduced to account for, respectively, variables V and assignments of value W to variable V.
Besides this, the automata Aϕi associated with the formulas are obtained by treating activity variables as if they were activity symbols (without affecting the construction, which does not consider the semantics of such objects), thus obtaining automata whose transitions are labelled by event formulas, possibly containing activity variables instead of activity symbols. Such formulas become regular event formulas once values are assigned to the variables, and can thus be evaluated on the (events of the) input trace. Formally, this requires a slightly different definition of the predicate hold, which must now take assgnmt into account.

To see how this is done, consider the formula ϕ = G((?A1 ∧ number < 5) → F?A2). The corresponding automaton is the same as that of Fig. 1, where ϕ1 = ?A1 ∧ number < 5, ϕ2 = ?A2, ϕ3 = ¬ϕ1, and ϕ4 = ¬ϕ2. For formula ϕ1, we have the following definition of the predicate hold:

    hold(1, 1, J, T) ← trace(J, A, T), assgnmt(var_A1, A), has_val(J, number, V, T), V < 5.

The parameter J stands for the trace identifier, as discussed above. The above rule generalizes the corresponding one in Log Generation in the presence of the activity variable ?A1. As it can be seen, in order to evaluate formula ϕ1 (second parameter in hold) of automaton 1 (first parameter), such a variable (modeled as var_A1) must be instantiated first, through the predicate assgnmt. Observe that, once all variables are assigned a value, the whole formula ϕ becomes variable-free, and the corresponding automaton is a regular automaton. The returned extensions of assgnmt and has_val represent, together, the problem solution.

### Conformance Checking

Conformance Checking can be seen as a special case of Query Checking, with a single input trace and where all input formulas are variable-free.
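Conceptually, a solution to Query Checking can be obtained by enumerating assignments, which is what the ASP guess on assgnmt makes declarative. The brute-force Python sketch below is our illustration only (the check callback stands in for automaton-based evaluation of ϕ[ν] on a trace, and the toy formula and log are invented):

```python
from itertools import product

def query_check(log, variables, activities, check):
    """Return all assignments nu : variables -> activities such that
    every trace in the log satisfies the instantiated formula,
    as decided by check(trace, nu)."""
    solutions = []
    for values in product(activities, repeat=len(variables)):
        nu = dict(zip(variables, values))
        if all(check(trace, nu) for trace in log):
            solutions.append(nu)
    return solutions

# Toy instance: query formula F ?A ("eventually ?A occurs"), Act = {a, b, c}.
def f_eventually(trace, nu):
    return any(act == nu["?A"] for (act, _payload) in trace)

log = [[("a", {}), ("b", {})],
       [("b", {}), ("c", {})]]
```

On this toy log only the assignment ?A = b satisfies the query in every trace. The ASP encoding replaces this explicit enumeration with the solver's search over the guessed assgnmt facts.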
In this case, the problem amounts to simply checking whether the whole specification is consistent, which is the case if and only if the input trace, together with the assignments to the respective activity attributes, satisfies the input formulas.

We close the section by observing how these problems provide a clear example of how the declarative approach allows for specification reuse. All the specifications, indeed, share the main rules (for trace, automaton, etc.) and are easily obtained as slight variants of each other, possibly varying the (guessed) predicates representing the solution.

## Experiments

In this section, we provide both a comparison with state-of-the-art tools for Log Generation and Conformance Checking, based on multi-perspective declarative models, and an estimate of scalability for our Query Checking tool, for which, instead, no competitors exist. The state-of-the-art tool used for Log Generation is the one presented in (Skydanienko et al. 2018), which is based on Alloy⁴ and tailored for MP-Declare; our results show that our ASP implementation for Log Generation scales much better than that and, at the same time, supports a more expressive data-aware rule language. As to Conformance Checking, we considered the state-of-the-art tool Declare Analyzer (Burattin, Maggi, and Sperduti 2016); we obtained comparable execution times, but Declare Analyzer is specifically tailored for Declare and optimized to check conformance wrt Declare rules only, while our tool is more general in this respect.

The experiments have been carried out on a standard laptop Dell XPS 15 with an Intel i7 processor and 16GB of RAM. All execution times have been averaged over 3 runs. Source code, declarative models, and event logs used in the experiments are available at https://github.com/fracchiariello/process-mining-ASP.

### Log Generation

For testing the Log Generation tools, we have used 8 synthetic models and 8 models derived from real life logs.
The experiments with synthetic models allowed us to test the scalability of the tools in a controlled environment and over models with specific characteristics. The experiments with real models have been used to test the tools in real environments.

⁴ https://alloytools.org/

For the experiments with synthetic models, we built 8 reference models containing 3, 5, 7, and 10 constraints, with and without data conditions. Each model was obtained from the previous one by adding new constraints and preserving those already present. The first and second blocks of Table 1 show the execution times for the ASP-based log generator, respectively with and without data conditions; the third and fourth blocks show the results obtained with the Alloy log generator, with and without data. Times refer to the generation of logs with 10000 traces (of length from 10 to 30). Consistent results are obtained also in additional experiments for logs of size between 100 and 5000, not reported here for space reasons.

*Table 1: Log Generation (times in ms)*

| Tool | Trace len | 3 constr. | 5 constr. | 7 constr. | 10 constr. |
|---|---|---|---|---|---|
| ASP (data) | 10 | 595 | 614 | 622 | 654 |
| | 15 | 876 | 894 | 904 | 956 |
| | 20 | 1132 | 1155 | 1178 | 1250 |
| | 25 | 1364 | 1413 | 1444 | 1543 |
| | 30 | 1642 | 1701 | 1746 | 1874 |
| ASP (no data) | 10 | 249 | 270 | 289 | 340 |
| | 15 | 349 | 390 | 408 | 457 |
| | 20 | 436 | 496 | 538 | 601 |
| | 25 | 519 | 568 | 611 | 712 |
| | 30 | 622 | 666 | 726 | 837 |
| Alloy (data) | 10 | 35975 | 35786 | 36464 | 37688 |
| | 15 | 50649 | 51534 | 54402 | 54749 |
| | 20 | 69608 | 70342 | 73122 | 73222 |
| | 25 | 85127 | 85598 | 87065 | 89210 |
| | 30 | 101518 | 101882 | 106062 | 107520 |
| Alloy (no data) | 10 | 18733 | 18947 | 19539 | 20007 |
| | 15 | 25700 | 25723 | 27344 | 26897 |
| | 20 | 32047 | 33837 | 33107 | 33615 |
| | 25 | 39114 | 38666 | 40556 | 41055 |
| | 30 | 46207 | 46706 | 47613 | 49410 |

The results obtained with models containing data conditions show that the ASP-based tool scales very well, requiring less than 2 sec in the worst case. This occurs when a model with 10 constraints is used to generate 10000 traces of length 30. As expected, the execution time increases linearly when the length of the traces in the generated logs increases.
The number of constraints in the declarative model also affects the tool performance, but with a lower impact. Without data conditions the results are similar but, as expected, the execution time is lower and increases less quickly with the complexity of the model and of the generated log. In the worst case (a model with 10 constraints used to generate 10000 traces of length 30), the execution time is lower than 1 s. The results obtained with the Alloy-based tool show similar trends, but with execution times almost 60 times higher than those obtained with the ASP-based tool.

The real-life logs used in the experiments are taken from the collection available at https://data.4tu.nl/. We used the declarative process model discovery tool presented in (Maggi et al. 2018) to extract a process model from each log using the default settings. The models in the real cases are much more complex and include between 10 and 49 constraints for a minimum support of 80. The execution times needed for the Log Generation task with the ASP-based log generator and with the Alloy-based tool are shown, respectively, in the first and second block of Table 2.

Table 2: Log Generation, real life (times in ms); models discovered with minimum support 80

| Tool  | Trace len | BPI2012 | DD     | ID      | PL     | PTC    | RP     | RT    | Sepsis |
|-------|-----------|---------|--------|---------|--------|--------|--------|-------|--------|
| ASP   | 10        | 656     | 100*   | 726*    | 3901   | 1183   | 119*   | 319   | 460    |
| ASP   | 15        | 817     | 887    | 2865    | 4538   | 1820   | 1069   | 353   | 564    |
| ASP   | 20        | 846     | 832    | 3160    | 4102   | 2194   | 813    | 860   | 640    |
| ASP   | 25        | 1061    | 930    | 4129    | 6169   | 2889   | 1063   | 483   | 780    |
| ASP   | 30        | 1433    | 1026   | 5226    | 9231   | 2370   | 1220   | 630   | 923    |
| Alloy | 10        | 31935   | 2364*  | 30762*  | 59468  | 65783  | 2703*  | 24909 | 38241  |
| Alloy | 15        | 43337   | 58572  | 152188  | 85942  | 97098  | 66641  | 34408 | 57178  |
| Alloy | 20        | 57596   | 80665  | 237777  | 122511 | 146420 | 95005  | 44608 | 85808  |
| Alloy | 25        | 72383   | 118975 | 359665  | 174596 | 221434 | 134851 | 54808 | 120110 |
| Alloy | 30        | 86910   | 181027 | 563794  | 236697 | 330753 | 187972 | 63379 | 174838 |

Table 3: Conformance Checking (times in ms)

| Trace len | ASP (data) | ASP (no data) | Declare Analyzer (data) | Declare Analyzer (no data) |
|-----------|------------|---------------|-------------------------|----------------------------|
| 10        | 665        | 635           | 598                     | 110                        |
| 15        | 1100       | 1035          | 805                     | 145                        |
| 20        | 1456       | 1354          | 1092                    | 155                        |
| 25        | 2071       | 1896          | 1273                    | 177                        |
| 30        | 2407       | 2219          | 1337                    | 215                        |
An asterisk indicates that, for the specific model, it was not possible to generate 10000 unique traces. The complexity of real-life models makes the significant advantage of the ASP-based tool over the Alloy-based one even more evident. In particular, in the worst case, the ASP-based tool requires around 9 s (to generate 10000 traces of length 30 for log PL), while the Alloy-based generator requires almost 4 min.

Conformance Checking

Also for Conformance Checking we used synthetic and real-life datasets. The former include the same declarative models as those used for Log Generation, plus synthetic logs of 1000 traces of lengths from 10 to 30. Table 3 shows the execution times for the ASP-based tool, with and without data conditions, and for the Declare Analyzer on the synthetic datasets. The results show that, in all cases, the execution times increase when the model becomes larger and the traces in the log become longer. The execution times obtained with the ASP-based tool and the Declare Analyzer are comparable for data-aware constraints, while, when the model constraints do not contain data conditions, the Declare Analyzer is around 5 times faster. This might be due to the use of the #max aggregate to compute a trace's length, which causes a performance degradation. A possible solution could be computing the trace length in advance and then providing it to the ASP encoding as a fact. In the real-life experiments, we tested the Conformance Checking tools using models obtained with the discovery tool by varying the minimum support between 60 and 90. The minimum support indicates the minimum percentage of traces in which a constraint must be fulfilled for it to be added to the discovered model. Clearly, a higher minimum support implies that the discovered models contain fewer constraints. As expected (see Table 4), the execution times decrease when the minimum support used to discover the reference models increases.
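The aggregate-based length computation mentioned above, and the cheaper alternative of supplying the length as a ground fact, can be sketched in clingo-style ASP (the trace/2 and length/1 predicates are illustrative assumptions, not the paper's exact schema):

```
% Length derived with the #max aggregate over the event
% timestamps; as noted above, this can degrade performance.
length(N) :- N = #max{ T : trace(T, _) }.

% Cheaper alternative: compute the length outside ASP and
% assert it directly as a fact in the encoding, e.g.
% length(30).
```

With the second option, the aggregate disappears from the program and the grounder only sees a single ground fact, which is the kind of precomputation the text suggests.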
Also in this case, the Declare Analyzer (second block in Table 4) is faster. However, the ASP-based tool also scales well (first block in Table 4), requiring around 1 min in the worst case.

Table 4: Conformance Checking, real life (times in ms)

| Tool             | Min. supp. | BPI2012 | DD    | ID    | PL    | PTC  | RP   | RT    | Sepsis |
|------------------|------------|---------|-------|-------|-------|------|------|-------|--------|
| ASP              | 60         | 33426   | 13084 | 49969 | 78625 | 8412 | 9354 | 49501 | 7116   |
| ASP              | 70         | 33242   | 13245 | 46388 | 55475 | 5596 | 9359 | 35537 | 3796   |
| ASP              | 80         | 24482   | 10176 | 29969 | 33775 | 4699 | 6836 | 35483 | 1778   |
| ASP              | 90         | 8445    | 4568  | 17576 | 26590 | 2787 | 5608 | 35483 | 731    |
| Declare Analyzer | 60         | 2882    | 2771  | 9800  | 8521  | 1549 | 2122 | 15262 | 1194   |
| Declare Analyzer | 70         | 2852    | 3249  | 7416  | 5358  | 959  | 2102 | 10351 | 705    |
| Declare Analyzer | 80         | 2291    | 2103  | 3993  | 2677  | 755  | 1532 | 11285 | 318    |
| Declare Analyzer | 90         | 1691    | 1525  | 1946  | 1595  | 404  | 1091 | 10628 | 250    |

Query Checking

Since no competitor for Query Checking exists in the PM literature, we ran a set of controlled experiments to check how execution times vary under different conditions. We used the same synthetic logs as for Conformance Checking and tested 8 queries corresponding to 8 standard DECLARE templates, with and without data conditions. The results are shown in Table 5 (with and without data in the first and second block, respectively). The execution times are comparable for the different types of queries, and the presence of data does not affect performance. In addition, as expected, the execution times increase when the traces in the log become longer.

Table 5: ASP Query Checking (times in ms)

| Data    | Trace len | Existence | Responded Existence | Response | Chain Response | Absence | Not Resp. Existence | Not Response | Not Chain Response |
|---------|-----------|-----------|---------------------|----------|----------------|---------|---------------------|--------------|--------------------|
| data    | 10        | 521       | 736                 | 534      | 503            | 566     | 783                 | 602          | 385                |
| data    | 15        | 704       | 1113                | 801      | 788            | 784     | 1180                | 879          | 606                |
| data    | 20        | 1321      | 1675                | 1143     | 1128           | 1373    | 1821                | 1304         | 865                |
| data    | 25        | 1397      | 3218                | 1528     | 1561           | 1562    | 2823                | 1807         | 1104               |
| data    | 30        | 1674      | 2878                | 1824     | 1906           | 1905    | 2784                | 2028         | 1301               |
| no data | 10        | 399       | 658                 | 541      | 632            | 441     | 799                 | 806          | 772                |
| no data | 15        | 616       | 1183                | 824      | 1057           | 595     | 1319                | 1121         | 1182               |
| no data | 20        | 903       | 1778                | 1339     | 1550           | 874     | 1887                | 2127         | 2062               |
| no data | 25        | 1188      | 2381                | 1724     | 2036           | 1101    | 3246                | 3200         | 2486               |
| no data | 30        | 1461      | 3278                | 2066     | 2632           | 1333    | 3391                | 2766         | 2846               |

Conclusions

We have devised an ASP-based approach to solve three classical problems from Declarative PM, namely Log Generation, Query Checking, and Conformance Checking, in a data-aware setting. Our results include correct ASP-encoding schemata and an experimental evaluation against other approaches. The experimental results show that, for Log Generation, our approach drastically outperforms the state-of-the-art tool from PM. Time performance is slightly worse with respect to the existing ad-hoc conformance checker Declare Analyzer, which is optimized for DECLARE. As to Query Checking, our approach provides the first solution in a data-aware setting, a problem still open so far. We believe that, by showing how the selected problems can be encoded and solved in ASP, we are not only offering a solution technique but, more generally, putting forward ASP as an effective modeling paradigm for Declarative PM in a data-aware setting. For future work, we plan to extend the approach to deal with actual, non-integer timestamps in events, and to go beyond local LTLf by investigating the introduction of across-state quantification to relate the values assigned to attributes at a given time point to those assigned at a different one.

Acknowledgments

Work partly supported by the ERC Advanced Grant WhiteMech (No. 834228), the EU ICT-48 2020 project TAILOR (No. 952215), the Sapienza project DRAPE, and the UNIBZ project CAT.

References

Burattin, A.; Maggi, F. M.; and Sperduti, A. 2016. Conformance checking based on multi-perspective declarative process models. Expert Syst. Appl., 65: 194-211.

De Giacomo, G.; and Favorito, M. 2021. Compositional Approach to Translate LTLf/LDLf into Deterministic Finite Automata. In Biundo, S.; Do, M.; Goldman, R.; Katz, M.; Yang, Q.; and Zhuo, H. H., eds., Proceedings of the Thirty-First International Conference on Automated Planning and Scheduling, ICAPS 2021, Guangzhou, China (virtual), August 2-13, 2021, 122-130. AAAI Press.

De Giacomo, G.; and Vardi, M. Y. 2013. Linear Temporal Logic and Linear Dynamic Logic on Finite Traces.
In Rossi, F., ed., IJCAI 2013, Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing, China, August 3-9, 2013, 854-860. IJCAI/AAAI.

Gebser, M.; Kaminski, R.; Kaufmann, B.; and Schaub, T. 2019. Multi-shot ASP solving with clingo. Theory Pract. Log. Program., 19(1): 27-82.

Gelfond, M.; and Lifschitz, V. 1988. The Stable Model Semantics for Logic Programming. In Kowalski, R. A.; and Bowen, K. A., eds., Proceedings of the International Logic Programming Conference and Symposium, 1070-1080. MIT Press.

Heljanko, K.; and Niemelä, I. 2003. Bounded LTL model checking with stable models. Theory Pract. Log. Program., 3(4-5): 519-550.

Maggi, F. M.; Di Ciccio, C.; Di Francescomarino, C.; and Kala, T. 2018. Parallel algorithms for the automated discovery of declarative process models. Inf. Syst., 74(Part): 136-152.

Niemelä, I. 1999. Logic Programs with Stable Model Semantics as a Constraint Programming Paradigm. Ann. Math. Artif. Intell., 25(3-4): 241-273.

Räim, M.; Di Ciccio, C.; Maggi, F. M.; Mecella, M.; and Mendling, J. 2014. Log-Based Understanding of Business Processes through Temporal Logic Query Checking. In On the Move to Meaningful Internet Systems: OTM 2014 Conferences - Confederated International Conferences: CoopIS and ODBASE 2014, Amantea, Italy, October 27-31, 2014, Proceedings, 75-92.

Skydanienko, V.; Di Francescomarino, C.; Ghidini, C.; and Maggi, F. M. 2018. A Tool for Generating Event Logs from Multi-Perspective Declare Models. In van der Aalst, W. M. P.; Casati, F.; Kumar, A.; Mendling, J.; Nepal, S.; Pentland, B. T.; and Weber, B., eds., Proceedings of the Dissertation Award, Demonstration, and Industrial Track at BPM 2018, co-located with the 16th International Conference on Business Process Management (BPM 2018), Sydney, Australia, September 9-14, 2018, volume 2196 of CEUR Workshop Proceedings, 111-115. CEUR-WS.org.

van der Aalst, W. M. P. 2016.
Process Mining - Data Science in Action, Second Edition. Springer. ISBN 978-3-662-49850-7.

van der Aalst, W. M. P.; Pesic, M.; and Schonenberg, H. 2009. Declarative workflows: Balancing between flexibility and support. Comput. Sci. Res. Dev., 23(2): 99-113.

Wieczorek, W.; Jastrzab, T.; and Unold, O. 2020. Answer Set Programming for Regular Inference. Applied Sciences, 10(21): 7700.