# Generating Counterfactual Explanations Under Temporal Constraints

Andrei Buliga1,2*, Chiara Di Francescomarino3*, Chiara Ghidini2*, Marco Montali2*, Massimiliano Ronzani1*

1Fondazione Bruno Kessler, Via Sommarive, 18, POVO - 38123, Trento, Italy
2Free University of Bozen-Bolzano, via Bruno Buozzi, 1 - 39100, Bozen-Bolzano, Italy
3University of Trento, Via Sommarive, 9, 38123 Trento, Italy
{abuliga, mronzani}@fbk.eu, c.difrancescomarino@unitn.it, {chiara.ghidini, marco.montali}@unibz.it

Abstract

Counterfactual explanations are one of the prominent eXplainable Artificial Intelligence (XAI) techniques: they suggest changes to input data that could alter predictions, leading to more favourable outcomes. Existing counterfactual methods do not readily apply to temporal domains, such as that of process mining, where data take the form of traces of activities that must comply with temporal background knowledge expressing which dynamics are possible and which are not. Specifically, counterfactuals generated off-the-shelf may violate the background knowledge, leading to inconsistent explanations. This work tackles this challenge by introducing a novel approach for generating temporally constrained counterfactuals, guaranteed to comply by design with background knowledge expressed in Linear Temporal Logic on process traces (LTLp). We do so by infusing automata-theoretic techniques for LTLp inside a genetic algorithm for counterfactual generation. The empirical evaluation shows that the generated counterfactuals are temporally meaningful and more interpretable for applications involving temporal dependencies.

Code: https://github.com/abuliga/AAAI2025-temporalconstrained-counterfactuals

1 Introduction

State-of-the-art Machine Learning efforts prioritise accuracy using ensemble and deep learning techniques, but their complexity makes their input-output mappings difficult to interpret.
To address this, eXplainable Artificial Intelligence (XAI) techniques have emerged, which aid in interpreting predictions and promote the adoption of advanced models (Verma, Dickerson, and Hines 2020; Guidotti 2022; Dandl et al. 2020; Beckh et al. 2023). Counterfactual explanations (Verma, Dickerson, and Hines 2020; Guidotti 2022) are a key XAI technique. They provide insights into which changes should be applied to an input instance to alter the outcome of a prediction. Such explanations are hence particularly valuable for users who need to understand how different attributes or actions might influence an outcome of interest. State-of-the-art counterfactual generation methods often rely on optimisation techniques to find minimal changes to inputs that alter the predicted outcome. Existing methods do not readily apply to temporal domains, such as that of process mining (van der Aalst 2016), where the data of interest consist of traces of activities generated by executing a business/work process or a plan. Such so-called process traces (Fionda and Greco 2018) are increasingly used in key domains like healthcare, business, and industrial processes, where the sequencing of activities is central. The main issue in these settings is that not all sequencings of activities make sense: traces are typically subject to temporal constraints, that is, they must comply with temporal background knowledge expressing which dynamics are possible and which are not. E.g., in a healthcare process a patient may enter triage only upon giving privacy consent. Systematic solutions for generating counterfactual explanations that comply with temporal background knowledge are still lacking (Buliga et al. 2023).

*These authors contributed equally. Copyright 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
This is a problem in areas such as Predictive Process Monitoring (PPM) (Di Francescomarino and Ghidini 2022), a widely established framework in process mining. Here, black-box predictive models are typically used to predict the outcome of executions of business and work processes, often making interpretability a challenge. Integrating counterfactual explanations into such monitoring frameworks can enhance understanding by providing alternative trace executions to reach a more favourable outcome, only upon guaranteeing that such explanations make sense, that is, comply with temporal background knowledge. To tackle this open problem, we propose a novel optimisation-based approach for generating temporally constrained counterfactual traces, that is, counterfactual process traces that are guaranteed to comply with temporal background knowledge. To express such knowledge, we take the natural choice of adopting LTLp, a temporal logic specifically designed for process traces (Fionda and Greco 2018), building on the well-established formalism of linear temporal logic on finite traces (De Giacomo and Vardi 2013). The backbone of our approach consists in infusing automata-theoretic techniques for LTLp/LTLf (De Giacomo et al. 2022) within a Genetic Algorithm (GA) body. In particular, given an LTLp formula φ and a process trace τ satisfying φ, we introduce different strategies that inspect the automaton of φ to suitably constrain the mutations introduced by the GA to alter τ towards counterfactual generation, so as to guarantee that the mutated version continues to satisfy φ. An empirical evaluation on real-world and synthetic datasets demonstrates the effectiveness of our approach.

The Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25)
Our results indicate that incorporating temporal background knowledge significantly improves the quality of counterfactual explanations, ensuring they satisfy LTLp formulae of interest while meeting general counterfactual desiderata.

2 Running Example

Consider a scenario in which an estate agency is using a PPM system to forecast the renting of flats to its customers during the application and negotiation process. The system takes as input an ongoing trace and forecasts Accept in case of a successful application and negotiation, and Fail otherwise. For example, consider the ongoing trace τ1, for which the predictive system forecasts a negotiation failure:

τ1 = ⟨APPLY, AUT-CHK, MAN-CHK, PHONE, OK, OFFER, PHONE, BOOK⟩

The trace starts with the customer submitting the application (APPLY) and the agency running an automatic check (AUT-CHK). Following a failure, a manual check (MAN-CHK) is done, and the customer is informed by phone (PHONE) of the application status, which is in this case accepted as valid (OK). An offer is created (OFFER) and communicated to the customer by phone (PHONE). The customer then asks for an appointment (BOOK) to discuss the offer. To understand the reasons behind the prediction of Fail for τ1, the estate agency intends to obtain counterfactual explanations for τ1, that is, ongoing traces that suggest how to modify τ1 so that a successful negotiation is predicted. Examples are reported below, where SEND-DOC indicates that the customer sends the required documents to the estate agency, and SMS (resp., EMAIL) captures that the agency informs the customer via sms (resp., e-mail):

τc1 = ⟨APPLY, MAN-CHK, AUT-CHK, PHONE, OK, OFFER, PHONE, SEND-DOC⟩
τc2 = ⟨APPLY, AUT-CHK, MAN-CHK, PHONE, OK, OFFER, PHONE, SEND-DOC⟩
τc3 = ⟨APPLY, AUT-CHK, PHONE, OK, OFFER, SMS, SEND-DOC⟩
τc4 = ⟨APPLY, AUT-CHK, PHONE, PHONE, OK, OFFER, PHONE, SEND-DOC, BOOK⟩
τc5 = ⟨APPLY, AUT-CHK, MAN-CHK, PHONE, OK, OFFER, EMAIL, SEND-DOC⟩

We assume that counterfactual explanations τc1, . . .
, τc5 are properly built using available activities in the application and negotiation process. Nonetheless, some of them must be ruled out when considering how the agency operates. In our example, the agency has operational rules that state that: (i) within a negotiation, an automated check must eventually be conducted, and in case also a manual check is done, it can only be done after the automated one; and (ii) the agency always informs the applicant with the same communication mode (i.e., it is not possible to have, in the same negotiation, communications done with distinct modes). These two rules can be expressed using LTLp (recalled next) as follows:

φchk = (¬MAN-CHK) U AUT-CHK
φcomm = ¬((F PHONE) ∧ (F SMS)) ∧ ¬((F PHONE) ∧ (F EMAIL)) ∧ ¬((F SMS) ∧ (F EMAIL))

It turns out that, among τc1, . . . , τc5, τc1 violates φchk, while τc3 and τc5 violate φcomm. The activities causing the violations are underlined in τc1, τc3 and τc5. All in all, dealing with the presented scenario challenges state-of-the-art counterfactual generation approaches. In fact, when these approaches are applied in an out-of-the-box way, they cannot incorporate the background knowledge captured in φchk and φcomm. What we need is a technique that only considers, as counterfactual traces, those that are guaranteed to respect the temporal background knowledge.

3 Background

We overview here the essential background for the paper.

3.1 Linear Temporal Logic on Process Traces

Linear temporal logics, that is, temporal logics predicating on traces, are the most natural choice to express background knowledge in our setting. Traditionally, traces are assumed to have an infinite length, as witnessed by the main representative of this family of logics, namely LTL (Pnueli 1977). In several application domains, such as those mentioned in Sections 1 and 2, the dynamics of the system are more naturally captured using unbounded, but finite, traces (De Giacomo, De Masellis, and Montali 2014).
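For concreteness, the two operational rules can also be checked directly over activity sequences. The following is a minimal Python sketch (a hand-rolled stand-in for the automata-based machinery introduced later, not the paper's implementation; it uses the activity names of the running example):

```python
# Sketch: direct checks of the two operational rules over traces,
# encoded as Python lists of activity names from the running example.

def satisfies_chk(trace):
    """phi_chk: AUT-CHK must eventually occur, and MAN-CHK may not
    occur before the first AUT-CHK."""
    for act in trace:
        if act == "AUT-CHK":
            return True
        if act == "MAN-CHK":
            return False
    return False  # AUT-CHK never occurred

def satisfies_comm(trace):
    """phi_comm: at most one communication mode is used in the trace."""
    modes = {"PHONE", "SMS", "EMAIL"}
    return len(modes & set(trace)) <= 1

tc1 = ["APPLY", "MAN-CHK", "AUT-CHK", "PHONE", "OK", "OFFER", "PHONE", "SEND-DOC"]
tc2 = ["APPLY", "AUT-CHK", "MAN-CHK", "PHONE", "OK", "OFFER", "PHONE", "SEND-DOC"]
tc3 = ["APPLY", "AUT-CHK", "PHONE", "OK", "OFFER", "SMS", "SEND-DOC"]
```

Consistently with the discussion above, τc1 fails the check rule while τc3 fails the communication rule.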
This led to LTL over finite traces (LTLf) (De Giacomo and Vardi 2013), which adopts the syntax of LTL but interprets formulae over finite traces. In our work, we are specifically interested in a variant of LTLf where propositions denote atomic activities constituting the basic building blocks of a process, and where each state indicates which atomic activity has been executed therein. This logic has been termed LTL on process traces (LTLp) in (Fionda and Greco 2018), which we follow next.

Fix a finite set Σ of activities. A (process) trace τ over Σ is a finite, non-empty sequence a1, . . . , an over Σ, indicating which activity from Σ is executed in every instant i ∈ {1, . . . , n} of the trace. The length n of τ is denoted len(τ). For i ≤ len(τ), τ(i) denotes the activity ai executed at instant i of τ, and τ(:i) the prefix a1, . . . , ai of τ. An LTLp formula φ is defined according to the grammar φ ::= a | ¬φ | φ1 ∨ φ2 | Xφ | φ1 U φ2, where a ∈ Σ. Intuitively, X is the (strong) next operator: Xφ indicates that the next instant must be within the trace, and φ is true therein. U is the until operator: φ1 U φ2 indicates that φ2 is true now or in a later instant j of the trace, and in every instant between the current one and j excluded, φ1 is true. Formally, given an LTLp formula φ, a process trace τ, and an instant i ∈ {1, . . . , len(τ)}, we inductively define that φ is true in instant i of τ, written τ, i |= φ, as:

τ, i |= a iff τ(i) = a
τ, i |= ¬φ iff τ, i ⊭ φ
τ, i |= φ1 ∨ φ2 iff τ, i |= φ1 or τ, i |= φ2
τ, i |= Xφ iff i + 1 ≤ len(τ) and τ, i + 1 |= φ
τ, i |= φ1 U φ2 iff τ, j |= φ2 for some j s.t. i ≤ j ≤ len(τ), and τ, k |= φ1 for every k s.t. i ≤ k < j

We say that τ satisfies φ, written τ |= φ, if τ, 1 |= φ.

Figure 1: Graphical representation of the DFA for the LTLp formula φchk = (¬MAN-CHK) U AUT-CHK.
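The inductive semantics above can be transcribed almost verbatim into a small recursive evaluator. The sketch below is illustrative (the nested-tuple encoding of formulae is our own choice, not the paper's implementation); instants are 1-based as in the text:

```python
# Minimal recursive evaluator for the LTLp semantics; formulas are
# nested tuples ("atom", a), ("not", f), ("or", f1, f2), ("next", f),
# ("until", f1, f2); traces are lists of activity names.

def holds(phi, trace, i=1):
    """True iff trace, i |= phi (instants are 1-based)."""
    op = phi[0]
    if op == "atom":
        return trace[i - 1] == phi[1]
    if op == "not":
        return not holds(phi[1], trace, i)
    if op == "or":
        return holds(phi[1], trace, i) or holds(phi[2], trace, i)
    if op == "next":  # strong next: instant i+1 must exist in the trace
        return i + 1 <= len(trace) and holds(phi[1], trace, i + 1)
    if op == "until":  # phi2 at some j >= i, phi1 at every k in [i, j)
        for j in range(i, len(trace) + 1):
            if holds(phi[2], trace, j):
                return all(holds(phi[1], trace, k) for k in range(i, j))
        return False
    raise ValueError("unknown operator: %r" % op)

# phi_chk = (not MAN-CHK) U AUT-CHK, from the running example
phi_chk = ("until", ("not", ("atom", "MAN-CHK")), ("atom", "AUT-CHK"))
```

Derived operators such as F follow by definition, e.g. Fφ as ("until", true, φ) once true is encoded.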
Each transition labelled Σ̄φchk is a placeholder for the set of transitions connecting the same pair of states, one per activity in the set Σ̄φchk (i.e., different from MAN-CHK and AUT-CHK).

The other boolean connectives true, false, ∧, → are derived as usual, noting true = ⋁a∈Σ a. Further key temporal operators are derived from X and U. E.g., Fφ = true U φ states that φ is eventually true in some future instant. Since every LTLp formula is an LTLf formula, the automata-theoretic machinery defined for LTLf (De Giacomo and Vardi 2013; De Giacomo et al. 2022) applies to LTLp as well. Specifically, we recall the definition of a deterministic finite-state automaton (DFA) over process traces: a standard DFA with the only difference that, due to how process traces are defined, it employs Σ instead of 2^Σ in labelling its transitions. A DFA over process traces from Σ is a tuple A = ⟨Σ, Q, q0, δ, F⟩, where: (i) Q is a finite set of states; (ii) q0 ∈ Q is the initial state; (iii) δ : Q × Σ → Q is the (Σ-labelled) transition function; (iv) F ⊆ Q is the set of final states. A process trace τ = a1, . . . , an is accepted by A if there is a sequence of n + 1 states q0, . . . , qn such that: (i) the sequence starts from the initial state q0 of A; (ii) the sequence culminates in a final state, that is, qn ∈ F; (iii) for every i ∈ {1, . . . , n}, we have δ(qi−1, ai) = qi. Notably, every LTLp formula φ can be encoded into a DFA over process traces Aφ that recognises all and only those traces that satisfy φ: for every process trace τ over Σ, we have that τ is accepted by Aφ if and only if τ |= φ. Given an LTLp formula φ, we denote by Σφ the set of activities mentioned in φ, and by Σ̄φ the set of other activities, that is, activities not mentioned in φ: Σ̄φ = Σ \ Σφ. Figure 1 shows the DFA of the LTLp formula φchk from Section 2, which states that during negotiations, an AUT-CHK must eventually be done, and MAN-CHK, if done, can only follow the automated one.
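As an illustration, the DFA of Figure 1 and the acceptance condition above can be sketched as follows (the state names q0, q1, q2 follow the figure; the transition-function encoding is our own assumption for illustration):

```python
# Sketch of the Figure 1 DFA for phi_chk as a transition function over
# process traces: q0 initial, q1 final and absorbing, q2 a trap state
# reached when MAN-CHK precedes AUT-CHK. delta is total over Sigma.

def delta_chk(q, a):
    if q == "q1":          # accepting sink: loops on every activity
        return "q1"
    if q == "q2":          # trap state: no escape
        return "q2"
    if a == "AUT-CHK":
        return "q1"
    if a == "MAN-CHK":
        return "q2"
    return "q0"            # activities outside Sigma_phi loop on q0

def accepts(delta, trace, q0="q0", finals=frozenset({"q1"})):
    """DFA acceptance: fold delta over the trace, check the last state."""
    q = q0
    for a in trace:
        q = delta(q, a)
    return q in finals
```

Running `accepts` on τ1 returns True, mirroring τ1 |= φchk, while any trace executing MAN-CHK before AUT-CHK is rejected.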
3.2 Counterfactual Explanations

Differently from other XAI methods, counterfactual explanations do not attempt to explain the inner logic of a predictive model, but rather offer alternatives to the user to obtain a desired prediction (Wachter, Mittelstadt, and Russell 2017). Given a black-box classifier hθ, a counterfactual τc of τ is a sample for which the prediction of the model is different from the one of τ (i.e., hθ(τc) ≠ hθ(τ)), such that the difference between τ and τc is minimal (Guidotti 2022). A counterfactual explainer is a function Ft, where t is the number of requested counterfactuals, such that, for a given sample τ, a black-box model hθ, and the set T of samples used to train the black-box model, it returns a set C = {τc1, . . . , τch} (with h ≤ t). For instance, from the running example, with t = 5, y = Fail, y′ = Accept, and τ = τ1, running Ft(τ, hθ, T) yields the counterfactuals τc1, . . . , τc5.

The XAI literature outlines several desiderata for counterfactual explanations (Verma, Dickerson, and Hines 2020):
(i) Validity: counterfactuals should flip the original prediction, aligning with the desired class.
(ii) Input Closeness: counterfactuals should minimise changes for clearer explanations.
(iii) Sparsity: counterfactuals should alter as few attributes as possible for conciseness.
(iv) Plausibility: counterfactuals must adhere to observed feature correlations, ensuring feasibility and realism.
(v) Diversity: a set of counterfactuals should provide diverse alternatives for the user.

The validity of a counterfactual τc (desideratum (i)) is measured by the function val, which evaluates the difference between the predicted value hθ(τc) and the desired class y′:

val(hθ(τc), y′) = I[hθ(τc) ≠ y′]  (1)

where I is the indicator function.¹ Input closeness of τc to τ (desideratum (ii)), measured by the dist function, assesses their dissimilarity:

dist(τ, τc) = (1/len(τ)) Σ_{i=1}^{len(τ)} d(τ(i), τc(i))  (2)

where d is a properly defined distance in the feature space.
In this work, for the distance d between trace elements, we use the indicator function d(x, y) = I[x ≠ y]. The sparsity of τc with respect to τ (desideratum (iii)) is measured through the spars function, which counts the number of changes in the counterfactual:

spars(τ, τc) = ||τ − τc||₀ = Σ_{i=1}^{len(τ)} I[τ(i) ≠ τc(i)]  (3)

Implausibility (desideratum (iv)) of τc is measured by the implaus function, calculating the distance between τc and the closest sample τz in the reference population T:

implaus(τc, T) = min_{τz∈T} (1/len(τ)) Σ_{i=1}^{len(τ)} d(τz(i), τc(i))  (4)

where d is the same distance used in (2). The fifth desideratum (v), diversity, measures the pairwise distances between the counterfactuals in C, using the dive function:

dive(C) = (1/(|C|(|C|−1))) Σ_{{τc,τc′}} dist(τc, τc′)  (5)

where dist is defined in (2) and the sum is over all possible unordered pairs {τc, τc′} of elements τc, τc′ of C.

3.3 Genetic Algorithms

Genetic Algorithms (GAs) are powerful optimisation techniques, inspired by the natural processes of evolution, widely used for complex problems due to their effectiveness (Mitchell 1998). GAs work with a population of solutions P, evaluating their quality through a fitness function. We will present the main components of GAs through the prism of traces and counterfactual explanations. Each candidate solution in the search space, such as each possible trace, is described by a set of genes, forming its chromosome or genotype. Below, we outline the main components of a GA. The chromosome τc of a candidate solution is a sequence of genes, where τc(i) refers to the i-th gene, with i ∈ {1, . . . , n} and n the length of the chromosome. In the case of traces, τc(i) represents the i-th executed activity. The fitness function f(τc) evaluates the quality of each candidate solution τc ∈ P, providing a measure that is used as the objective for the GA optimisation.

¹The indicator function I[x ≠ y] is 1 when x ≠ y and 0 otherwise.
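Equations (2)–(5) translate directly into code. A sketch, assuming equal-length traces encoded as lists of activity names and using the indicator distance d from this paper (function names are ours):

```python
# Sketch implementations of Eqs. (2)-(5); traces are equal-length
# lists of activity names and d(x, y) is the indicator distance.

def dist(t, tc):
    """Eq. (2): normalised Hamming distance between two traces."""
    return sum(a != b for a, b in zip(t, tc)) / len(t)

def spars(t, tc):
    """Eq. (3): number of positions changed in the counterfactual."""
    return sum(a != b for a, b in zip(t, tc))

def implaus(tc, population):
    """Eq. (4): distance from tc to the closest trace in population T."""
    return min(dist(tz, tc) for tz in population)

def dive(C):
    """Eq. (5): sum of pairwise distances over unordered pairs of C,
    normalised by |C|(|C| - 1) as in the text."""
    pairs = [(C[i], C[j]) for i in range(len(C)) for j in range(i + 1, len(C))]
    return sum(dist(a, b) for a, b in pairs) / (len(C) * (len(C) - 1))
```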
For example, in counterfactual generation, the fitness function instantiates the different objectives introduced in Sect. 3.2. The initial population P is a set of candidate traces generated at the beginning of the GA process. It may consist of randomly generated or predefined candidates: P = {τc1, τc2, . . . , τcp}, where τck denotes an individual candidate. Afterwards, a subset of the population P is selected based on the fitness score. Selected chromosomes (parents) undergo crossover to produce offspring. This operation combines parts of two parents' chromosomes to create new ones, promoting genetic diversity. If τp1 and τp2 are two parents, then each component of the offspring τo is selected from either of the two, i.e., τo(i) = τp1(i) or τo(i) = τp2(i). Offspring chromosomes are subject to mutation, which involves randomly altering one or more genes. For example, a gene τo(i) might change with some small probability pmut, introducing new genetic material and helping to prevent premature convergence. Specifically, τ′o(i) is mutated to a random gene with probability pmut and remains equal to τo(i) otherwise. Once the offspring undergoes mutation (τ′o), it becomes part of the new population as τck. In our example, P consists of counterfactual candidates P = {τc1, . . . , τc5}. Each candidate's genotype, e.g., τc1, is an activity sequence, with each activity as a gene (e.g., τc1(1) = APPLY). The fitness function evaluates candidates based on factors like the ability to flip the outcome, where the fittest are selected for crossover and mutation. Ensuring compliance with constraints, like maintaining the correct sequence of activities (AUT-CHK before MAN-CHK), is crucial to generating compliant counterfactuals, such as τc2 and τc4.

4 Approach

We are now ready to introduce our framework for generating counterfactual explanations that comply with background knowledge described through LTLp formulae. The first step is to define a new compliance desideratum (desideratum (vi)).
This is done by using a compliance function measuring whether the counterfactual τc satisfies an LTLp formula φ representing the temporal background knowledge. A counterfactual τc is deemed a temporally constrained counterfactual if it satisfies φ. Formally:

compliance(τc, φ) = 1 if τc |= φ, 0 otherwise.  (6)

Recalling φ = φchk ∧ φcomm and the counterfactuals τc1, . . . , τc5 from Section 2, we have compliance(τc, φ) = 1 for c ∈ {c2, c4} and compliance(τc, φ) = 0 for c ∈ {c1, c3, c5}. Next we formulate the fitness function and its afferent objectives (Sect. 4.1), and we introduce the modified crossover and mutation operators for the generation of counterfactuals that guarantee compliance with LTLp formulae (Sect. 4.2).

4.1 Optimisation Problem Formulations

We follow GA-based methods like (Schleich et al. 2021; Dandl et al. 2020) and instantiate the first four desiderata from Sect. 3.2 into corresponding optimisation objectives for the fitness function, together with the compliance desideratum.² Hence, the objectives to optimise are: the validity of the counterfactual τc (1); the distance of τc to the original trace τ (2); sparsity, quantifying the number of changes in τc with respect to τ (3); implausibility, that is, the distance of τc from the reference population T (4); and compliance, measuring whether τc is compliant with φ or not (6). The resulting fitness function f is thus defined as:

f = val(hθ(τc), y′) + α dist(τ, τc) + β spars(τ, τc) + γ implaus(τc, T) + δ compliance(τc, φ)  (7)

where α, β, γ, δ are weighting factors controlling the influence of each term on the overall fitness.

4.2 Temporal Knowledge-Aware Operators

To guarantee the satisfaction of the background knowledge in the form of φ, we modify the GA, specifically the crossover and mutation operators introduced in Sect. 3.3.
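The compliance term (6) and the fitness function (7) can be sketched as follows; the satisfaction checker `sat`, the classifier `h_theta`, and the default unit weights are illustrative placeholders, not the paper's implementation:

```python
# Sketch of Eq. (6) and Eq. (7). `sat(trace, phi)` stands for any
# LTLp satisfaction checker; `h_theta` is the black-box classifier.

def compliance(tc, phi, sat):
    """Eq. (6): 1 iff the counterfactual satisfies phi."""
    return 1 if sat(tc, phi) else 0

def fitness(tc, tau, y_star, h_theta, population, phi, sat,
            alpha=1.0, beta=1.0, gamma=1.0, delta=1.0):
    """Eq. (7): weighted sum of the five objective terms."""
    val = 1 if h_theta(tc) != y_star else 0                    # Eq. (1)
    d = sum(a != b for a, b in zip(tau, tc)) / len(tau)        # Eq. (2)
    sp = sum(a != b for a, b in zip(tau, tc))                  # Eq. (3)
    impl = min(sum(a != b for a, b in zip(tz, tc)) / len(tau)
               for tz in population)                           # Eq. (4)
    return (val + alpha * d + beta * sp + gamma * impl
            + delta * compliance(tc, phi, sat))
```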
Temporal Knowledge-aware Crossover. Given an original query instance (process trace) τ satisfying the LTLp formula φ, and two parent traces τp1 and τp2 in the current population P, the Temporal Knowledge-aware Crossover operator presented in Algorithm 1 generates an offspring individual τo that satisfies φ. It takes as input also the crossover probability pc and the alphabet Σφ of the activities mentioned in φ. The Temporal Knowledge-aware Crossover operator initiates an offspring individual τo by retaining from τ the phenotype that actively interacts with φ, guaranteeing its satisfaction. This is formed by those activities in Σφ (lines 3–7). The empty genes in the offspring individual's chromosome are then filled with one of the two parents' genetic material, but only if the corresponding parent's gene does not interact with φ, i.e., contains an activity from Σ̄φ (lines 8–16). In detail, a random probability p is sampled (line 9) for every empty gene (line 10). The genetic material is then chosen, as in classical crossover operators, from either parent τp1 or τp2 (lines 11–12), according to the given crossover probability pc, if the parent's activity does not belong to Σφ. Otherwise, if both parents' activities at that gene belong to Σφ, the crossover operator uses the gene from the original query instance τ (line 13). As we prove later, this ensures that τo alters τ in a way that does not affect the satisfaction of φ.

Temporal Knowledge-aware Mutation. We constrain the mutation operator, designed to increase the diversity of the population, with two strategies that maintain the diversity in the generated offspring while ensuring that φ is satisfied.

²In GAs, diversity is managed through selection, crossover, and mutation operators, rather than the fitness function.
Algorithm 1: Temporal Knowledge-aware Crossover operation
1: Input: parent individuals τp1 and τp2, crossover probability pc, original query instance τ, activities Σφ
2: Output: offspring trace τo
3: for i from 1 to |τ| do
4:   if τ(i) ∈ Σφ then τo(i) ← τ(i)
5:   else τo(i) ← null
6:   end if
7: end for
8: for i from 1 to |τo| do
9:   p ← U(0, 1)
10:  if τo(i) is null then
11:    if p < pc ∧ τp1(i) ∉ Σφ then τo(i) ← τp1(i)
12:    else if p ≥ pc ∧ τp2(i) ∉ Σφ then τo(i) ← τp2(i)
13:    else τo(i) ← τ(i)
14:    end if
15:  end if
16: end for
17: return τo

The first strategy, called a Priori, computes all the possible mutations for each gene τo(i) at the beginning of the mutation phase. The second strategy, called Online, exploits the DFA Aφ to compute the possible mutations for the current gene τo(i) in the construction of τo, taking into account the already constructed partial trace τo(: i − 1). We indicate with Di the set of all possible activities that can occur at the i-th gene in any generated counterfactual, and define it as the set of all activities occurring in position i in all historical traces T. Formally, Di = {τ(i) | τ ∈ T }. Given an offspring τo, the a Priori strategy produces an offspring τ′o by mutating only genes whose activities are not in Σφ with values that are not in Σφ, thus mapping other activities into other activities, which are interchangeable with respect to φ. The Online strategy instead produces an offspring τ′o by computing, for every partial trace τo(: i) with 1 ≤ i ≤ |τo|, which activities could be used in place of the last activity τo(i) to alter such position i without changing the satisfaction of φ. This is realised through the SAFEACT function defined in Algorithm 2, exploiting the DFA Aφ as follows. First, Aφ is traversed using the sequence of activities in τo(: i − 1), leading (deterministically) to a state q of the DFA. Then it is checked which next state q′ is obtained by applying transition δ(q, τo(i)).
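Algorithm 1 admits a direct runnable transcription, assuming traces are Python lists of equal length and Σφ is a set of activity names (function and parameter names are ours):

```python
# Runnable sketch of Algorithm 1 (Temporal Knowledge-aware Crossover):
# positions of tau carrying activities in Sigma_phi are copied verbatim;
# the remaining positions are filled from a parent only when the parent's
# gene lies outside Sigma_phi, falling back to tau otherwise.

import random

def tk_crossover(p1, p2, pc, tau, sigma_phi, rng=random):
    # Lines 3-7: retain from tau the genes that interact with phi.
    offspring = [a if a in sigma_phi else None for a in tau]
    # Lines 8-16: fill the empty genes from the parents, if safe.
    for i, gene in enumerate(offspring):
        if gene is not None:
            continue
        p = rng.random()
        if p < pc and p1[i] not in sigma_phi:
            offspring[i] = p1[i]
        elif p >= pc and p2[i] not in sigma_phi:
            offspring[i] = p2[i]
        else:
            offspring[i] = tau[i]  # line 13: fall back to tau
    return offspring
```

By construction the offspring agrees with τ on every position in Σφ, which is exactly the invariant exploited by Theorem 1 below.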
The safe activities that can be used in place of τo(i) in state q are then those that lead to the same next state q′. In a sense, this generalises a Priori, as in some states also activities from Σφ may be interchangeable. The Temporal Knowledge-aware Mutation operator, presented in Algorithm 3, focuses on mutating an offspring individual τo while preserving the satisfaction of the formula φ. The operator takes as input the offspring τo, the mutation probability pmut, the set of the domains of the genes D = {Di | i ∈ {1, . . . , |τo|}}, and the chosen mutation strategy S, returning the mutated offspring τ′o as output. It also takes as input the LTLp formula φ capturing background knowledge, for which we assume that the DFA Aφ has been pre-computed (and hence is passed as an implicit parameter to the algorithm).

Algorithm 2: Compute Safe Activities
1: function SAFEACT(τo, i, A)
2:   Σsafe ← {}
3:   q ← initial state of A
4:   for j from 1 to i − 1 do q ← δ(q, τo(j))
5:   end for
6:   q′ ← δ(q, τo(i))
7:   for a ∈ Σ do
8:     if δ(q, a) = q′ then Σsafe ← Σsafe ∪ {a}
9:     end if
10:  end for
11:  return Σsafe
12: end function

Algorithm 3: Temporal Knowledge-aware Mutation operator
1: Input: offspring τo, mutation probability pmut, LTLp formula φ, domains of each gene D, mutation strategy S
2: Output: mutated offspring τ′o
3: for i from 1 to |τo| do
4:   p ← U(0, 1)
5:   if p < pmut then
6:     if S is a Priori and τo(i) ∉ Σφ then
7:       τo(i) ← U(Di \ Σφ)
8:     else if S is Online then
9:       Σsafe ← SAFEACT(τo, i, Aφ)
10:      τo(i) ← U(Di ∩ Σsafe)
11:    end if
12:  else τo(i) ← τo(i)
13:  end if
14: end for
15: τ′o ← τo
16: return τ′o

The algorithm iterates through each gene from 1 to |τo| (lines 3–14), sampling a random probability p for each (line 4). The mutation is carried out if the sampled probability p is below the set mutation probability threshold pmut (line 5); otherwise, the value of τo(i) is kept (line 12).
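Algorithm 2 and the Online branch of Algorithm 3 can be transcribed as follows; the `delta_chk` transition function re-encodes the Figure 1 DFA for φchk, and all names are illustrative:

```python
# Sketch of Algorithm 2 (SAFEACT) and the Online branch of Algorithm 3.
# The DFA is given as a transition function delta(q, a) -> q' with
# initial state q0; delta_chk re-encodes the Figure 1 DFA for phi_chk.

import random

def delta_chk(q, a):
    if q in ("q1", "q2"):  # q1 accepting sink, q2 trap state
        return q
    return {"AUT-CHK": "q1", "MAN-CHK": "q2"}.get(a, "q0")

def safe_act(trace, i, delta, q0, sigma):
    """Algorithm 2: activities that can replace trace(i) (1-based i)
    without changing the DFA state reached after instant i."""
    q = q0
    for j in range(i - 1):          # traverse the prefix trace(: i-1)
        q = delta(q, trace[j])
    q_next = delta(q, trace[i - 1])
    return {a for a in sigma if delta(q, a) == q_next}

def tk_mutation_online(trace, pmut, delta, q0, sigma, domains, rng=random):
    """Online branch of Algorithm 3: with probability pmut, resample
    gene i uniformly within D_i intersected with the safe activities."""
    out = list(trace)
    for i in range(1, len(out) + 1):
        if rng.random() < pmut:
            choices = sorted(domains[i - 1] & safe_act(out, i, delta, q0, sigma))
            if choices:
                out[i - 1] = rng.choice(choices)
    return out
```

Note that every mutation preserves the witnessing state sequence of the DFA run, which is the argument behind Theorem 2 below.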
In the case of mutation, the value of τo(i) is uniformly sampled according to the selected mutation strategy S (lines 6–11), mutating each gene accordingly, and the mutated offspring τ′o is returned.

Correctness of the approach. All in all, the application of the crossover operator from Algorithm 1, as well as that of the mutation operator with the two illustrated strategies from Algorithm 3, are correct in terms of how they interact with temporal background knowledge, in the following sense.

Theorem 1. Let φ be an LTLp formula, and τp1, τp2, and τ be process traces over Σ, with τ |= φ. Let pc ∈ [0, 1]. Assume that Algorithm 1 is invoked by passing τp1, τp2, pc, τ, and Σφ as input, and that it returns τo. Then τo |= φ.

Proof. Let n = len(τ). Upon inspection of Algorithm 1, one can see that every output τo produced by the algorithm relates to the input trace τ as follows:
1. len(τo) = len(τ) = n;
2. for every i ∈ {1, . . . , n}: (a) if τ(i) ∈ Σφ then τo(i) = τ(i); (b) if instead τ(i) ∉ Σφ, that is, τ(i) ∈ Σ̄φ, then τo(i) ∈ Σ̄φ as well; equivalently, τo(i) ∈ Σφ if and only if τ(i) ∈ Σφ.

By absurdum, imagine that τo ⊭ φ. Since, by property (1) above, len(τo) = len(τ), this means that the violation must occur due to a mismatch in the evaluation of an atomic formula in some instant. Technically, there must exist i ∈ {1, . . . , n} and an atomic sub-formula a ∈ Σ of φ (which, by definition, requires a ∈ Σφ) such that either: (A) τ, i |= a and τo, i ⊭ a, or (B) τ, i ⊭ a and τo, i |= a.

Case (A). By the LTLp semantics, we have τ(i) = a and τo(i) ≠ a. However, this is impossible: since a belongs to Σφ, then by property (2a) above, we have τ(i) = τo(i).

Case (B). By the LTLp semantics, τ, i ⊭ a if and only if τ(i) = b for some b ∈ Σ \ {a}. There are two sub-cases: either b ∈ Σφ \ {a}, or b ∈ Σ̄φ. In the first sub-case, impossibility follows again from the fact that, by property (2a) above, since b ∈ Σφ, then τo(i) = τ(i) = b, which implies τo, i ⊭ a.
In the second sub-case, impossibility follows from the fact that τo(i) cannot be a: since τ(i) ∈ Σ̄φ, property (2b) implies that also τo(i) ∈ Σ̄φ, while a ∈ Σφ.

Theorem 2. Let φ be an LTLp formula, τo a process trace over Σ s.t. τo |= φ, D = {Di | i ∈ {1, . . . , |τo|}} with Di ⊆ Σ the domains of each gene, and pmut ∈ [0, 1]. Assume that Algorithm 3 is invoked by passing τo, φ, pmut, D, S as input (with S ∈ {a Priori, Online}), and that it returns τ′o. Then τ′o |= φ.

Proof. Correctness of a Priori is proven analogously to Theorem 1. Correctness of Online derives directly from the correspondence between the traces that satisfy φ and the traces accepted by the DFA of φ, Aφ. From the definition of DFA acceptance, we have that since trace τo satisfies φ, there is a sequence q0, . . . , qn of states of Aφ such that: (i) the sequence starts from the initial state q0 of Aφ; (ii) the sequence culminates in a final state, that is, qn ∈ F; (iii) for every i ∈ {1, . . . , n}, we have δ(qi−1, τo(i)) = qi. From Algorithm 3, we have that trace τo is mutated through Online by selecting an instant i, and allowing for replacing τo(i) with an activity a returned from Algorithm 2. From Algorithm 2, we know that activity a is returned if the following property holds: δ(qi−1, a) = qi. This means that the mutated trace τ′o that replaces τo(i) with a, and keeps the rest identical, is accepted by Aφ, with the same witnessing sequence of states q0, . . . , qn. From the correspondence between Aφ and the LTLp semantics of φ, we thus have τ′o |= φ.

We illustrate the two proposed strategies with the formula φchk of the running example of Section 2. Based on the DFA of the formula (Fig. 1), we know that, from the initial state q0, if AUT-CHK happens, we reach the state q1, which is a final state, where we can repeat any activity in Σ. In q0 we can perform any other activity in Σ̄φchk, i.e., different from AUT-CHK and MAN-CHK, remaining in q0. However, if from q0 we perform MAN-CHK, we reach q2, which is a dead-end state: this is why τc1 violates φchk.
Given the trace τ1 from our running example, consider the mutation of its fourth component τ1(4) = PHONE. The two strategies for the mutation operator give the following mutation possibilities: in the a Priori strategy we exclude activities in Σφ from the mutation, so we can mutate τ1(4) with activities in D4 \ Σφ; in the Online strategy, the transition in the DFA associated with τ1(4) is δ(q1, τ1(4)) = q1. Since from q1 all activities give the same transition, that is, δ(q1, a) = q1 for every a ∈ Σ, we can mutate τ1(4) with the entire D4.

5 Evaluation

To evaluate the approaches for generating counterfactual explanations incorporating temporal background knowledge, we focus on answering the following questions:

RQ1 How do the proposed methods compare with a standard genetic algorithm?
RQ2 How do the proposed methods compare with a baseline strategy of "generation and check", which ensures the satisfaction of the temporal background knowledge?

Both questions are assessed in terms of generation time and quality of the generated counterfactuals (see below). The goal of RQ1 is to assess whether the enforcement of temporal background knowledge has an impact on the quality of the generated counterfactuals when compared to traditional GA approaches. The goal of RQ2 is to evaluate the proposed strategy, which integrates GAs and DFAs, against an iterative combination of GAs and a checker function that generates the counterfactual and then checks compliance in a trial-and-error fashion. While a comparison with a standard GA in terms of compliance is out of the scope of this paper, as these algorithms are not built to ensure this property, RQ1 also enables us to discuss this aspect.

Baselines, Datasets and Evaluation Metrics. We introduce two baseline methods. To answer the first research question, we use a standard GA, Geneticφ (Genφ), which uses the fitness function of Eq. (7) and the standard crossover and mutation operators from (Schleich et al. 2021).³
To answer the second research question, we employ Mutate-And-Retry (MAR), which pairs the GA method with compliance through a trial-and-error mechanism: the generated trace is checked against φ, and the mutation is repeated until compliance is reached.

Experiments are conducted using three datasets commonly used in Process Mining, with details reported in Table 1: Claim Management (Rizzi, Di Francescomarino, and Maggi 2020) is a synthetic dataset pertaining to a claim management process, where accepted claims are labelled as true and rejected claims as false; BPIC2012 (van Dongen 2012) and BPIC2017 (van Dongen 2012) are two real-life datasets about a loan application process, where traces with accepted loan offers are labelled as true, and declined offers as false.

3 For compatibility reasons, the same fitness function is also used in the proposed methods, aPriori and Online, even though for them the value of compliance is always one, since the generated offspring τo always satisfies φ.

Dataset          | Traces | Avg. Len. | |Σ| | Used Prefixes
Claim Management |   4800 |        11 |  16 | 7, 10, 13, 16
BPIC2012         |   4685 |        35 |  36 | 20, 25, 30, 35
BPIC2017         | 31,413 |        35 |  26 | 20, 25, 30, 35

Table 1: Summary of the dataset characteristics.

We conduct experiments with different datasets, trace lengths, and LTLp formulae with different sizes of Σφ. The experiments are performed over traces with variable prefix lengths, reported in Table 1, testing how the techniques perform on average with varying amounts of information (Teinemaa et al. 2019). To assess the impact of the number of generated counterfactuals, we test different settings by generating 5, 10, 15, and 20 counterfactuals (Guidotti 2022). To evaluate the counterfactual approaches, we use five metrics from (Buliga et al. 2023): Distance (Eq. (2); lower is better), Sparsity (Eq. (3); lower is better), Implausibility (Eq. (4); lower is better), Diversity (Eq. (5); higher is better), and Runtime in seconds (lower is better).
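For concreteness, the four quality metrics can be sketched in Python. Since Eqs. (2)–(5) are not reproduced in this excerpt, the simple trace-level definitions below (Hamming-style comparisons over equal-length traces) are illustrative stand-ins rather than the paper's exact formulas:

```python
def distance(trace, cf):
    """Closeness of the counterfactual to the original trace (cf. Eq. (2),
    lower is better); here a normalised Hamming distance, an assumed stand-in."""
    return sum(a != b for a, b in zip(trace, cf)) / len(trace)

def sparsity(trace, cf):
    """Number of edited positions (cf. Eq. (3), lower is better)."""
    return sum(a != b for a, b in zip(trace, cf))

def implausibility(cf, population):
    """Distance to the nearest real trace in the reference population
    (cf. Eq. (4), lower is better)."""
    return min(distance(cf, ref) for ref in population)

def diversity(cfs):
    """Mean pairwise distance within the counterfactual set
    (cf. Eq. (5), higher is better)."""
    pairs = [(x, y) for i, x in enumerate(cfs) for y in cfs[i + 1:]]
    return sum(distance(x, y) for x, y in pairs) / len(pairs)
```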
These metrics refer to a single τ and are averaged across the set of generated counterfactuals. In our experiments we omit the hit rate of the counterfactual set (i.e., whether |C| = t), as in all our experiments the hit rate was always 100%.

Experimental procedure

For each dataset, LTLp formula φ, and prefix length, we split the data chronologically into 70%/10%/20% for training, validation, and testing. An XGBoost model is trained and tuned via hyperparameter optimisation to identify the best model configuration for each dataset, prefix length, and encoding combination. The set T used to train the XGBoost model is used as input for the counterfactual generation methods. We tested different LTLp formulae φ, with different coverage percentages |Σφ|/|Σ| of the possible activities.4 The specific φ formulae for each dataset are presented in the code repository linked at the beginning of the paper.

Regarding the coefficients in Eq. (7), after testing multiple configurations, the final configuration was set to α = 0.5, β = 0.5, γ = 0.5, δ = 0.5, giving all objectives the same weight. Next, 15 instances are sampled from the test set and used for the counterfactual generation, one trace τ at a time, while the counterfactuals are evaluated using the evaluation framework. Experiments were run on an M1 machine with 16GB of RAM. For the GA setting, we initialise the population through a hybrid approach: selecting close points from the reference population or, if unavailable, randomly generating traces. We set the number of generations to 100, pc = 0.5, and pmut = 0.2. In population selection, the top 50% of the population, w.r.t. the fitness function, moves to the next generation. Termination occurs at the maximum generation number or if no significant performance improvement occurs. We assess differences using statistical tests: we perform Wilcoxon signed-rank tests (Wilcoxon 1945) for pairwise comparisons, with p-values adjusted via the Bonferroni correction (Dunn 1961). Methods are then ranked by performance on each metric, allowing for a clear comparison. Table 2 shows the average values of each metric for each method.

4 Concerning the impact of the complexity of a formula on our work, there are two distinct aspects to consider. First, formula complexity impacts the construction of the DFA, because the size of the DFA is, in the worst case, doubly exponential in the length of the formula (De Giacomo and Vardi 2013). However, this is orthogonal to our approach, as we assume that the DFA has already been constructed (De Giacomo, De Masellis, and Montali 2014; Fuggitti 2019). In addition, it is well known that, despite this worst-case complexity, the size of the DFA is often polynomial in the length of the formula, and that off-the-shelf techniques incorporate several optimisations (De Giacomo and Favorito 2021). Second, it is less meaningful to relate metrics on the syntactic complexity of a formula with performance, as their interaction occurs at the semantic level. In this respect, focusing on coverage reflects the intuition that when a modeller explicitly mentions an activity in a formula, they do so to express constraints on when such an activity must/can be executed (i.e., a modeller would not express the formula true, which has an empty formula signature, with the equivalent formula ∨_{ai ∈ Σ} ai, which has the whole alphabet as its formula signature). In this sense, using more activities (i.e., having a larger formula signature) correlates with constraining more, which in turn impacts performance. This motivates the choice of coverage, and in turn why in our experiments coverage influences performance.
In parentheses, we indicate the rank of each value, with different rankings assigned only to statistically significant differences. The best performing strategy is highlighted in bold. If multiple values are in bold, there is no statistically significant difference between those methods in terms of the respective metric (in other words, no statistically significant difference between the absolute best of the row and the other bold values); this also translates directly into multiple methods sharing the same rank. We show the evolution of the results with three levels of coverage |Σφ|/|Σ|: 10%, 25%, and 50%.

Answering RQ1. We start by considering the time performance. Looking at the runtime of the three methods, we observe that Geneticφ and aPriori perform similarly, with Geneticφ being among the best performers 7 out of 9 times, and aPriori 5 out of 9 times. Online instead shows an increase in the time required for generating the counterfactuals, especially for large coverage rates. Regarding quality, a different story emerges: aPriori and Online demonstrate good performance, with aPriori showing a consistent ability to generate closer, sparser, and more plausible counterfactuals w.r.t. Genφ. This advantage is more pronounced in the real-life datasets (BPIC2012, and especially BPIC2017), where the complexity and length of the traces challenge the capability of Genφ. On the contrary, in the simpler Claim Management dataset, the superiority of aPriori becomes apparent only for higher coverage levels, while for lower coverages Genφ exhibits better counterfactual quality. Concerning Online, it excels at balancing counterfactual quality with diversity, particularly in BPIC2012, making it a viable candidate in scenarios where both aspects are critical.

Cover. Metric  | Claim Management: Genφ / MAR / aPriori / Online | BPIC2012: Genφ / MAR / aPriori / Online | BPIC2017: Genφ / MAR / aPriori / Online
10%  Dist.     | 0.48 (1) 0.50 (1) 0.50 (1) 0.50 (1) | 0.54 (4) 0.50 (2) 0.47 (1) 0.50 (2) | 0.59 (4) 0.50 (2) 0.44 (1) 0.50 (2)
     Spars.    | 2.50 (1) 2.58 (1) 2.63 (1) 2.58 (1) | 7.69 (4) 6.95 (1) 6.79 (1) 6.97 (1) | 6.95 (4) 5.60 (1) 5.20 (1) 5.85 (3)
     Impl.     | 8.49 (1) 8.88 (2) 8.84 (2) 8.92 (2) | 7.49 (1) 7.37 (1) 7.26 (1) 7.38 (1) | 7.51 (1) 6.81 (1) 6.59 (1) 6.86 (1)
     Div.      | 0.48 (4) 0.54 (1) 0.54 (1) 0.54 (1) | 0.35 (4) 0.40 (1) 0.41 (1) 0.40 (1) | 0.58 (1) 0.49 (2) 0.44 (4) 0.49 (2)
     Runtime   | 2.53 (1) 4.65 (4) 4.38 (2) 4.09 (2) | 25 (3) 90 (4) 12 (1) 17 (2)         | 29 (1) 78 (2) 57 (2) 65 (2)
25%  Dist.     | 0.48 (1) 0.47 (1) 0.50 (1) 0.47 (1) | 0.47 (2) 0.38 (4) 0.30 (1) 0.36 (2) | 0.58 (4) 0.33 (2) 0.22 (1) 0.31 (2)
     Spars.    | 2.50 (1) 2.45 (1) 2.58 (4) 2.42 (1) | 7.01 (3) 5.24 (3) 4.38 (1) 4.92 (2) | 6.77 (4) 4.02 (2) 2.72 (1) 3.82 (2)
     Impl.     | 8.46 (1) 8.80 (4) 8.83 (1) 8.70 (1) | 6.38 (1) 9.73 (2) 9.67 (2) 9.70 (2) | 7.39 (4) 5.52 (1) 4.94 (1) 5.40 (1)
     Div.      | 0.47 (3) 0.50 (2) 0.52 (1) 0.49 (3) | 0.36 (2) 0.38 (1) 0.32 (4) 0.35 (2) | 0.60 (1) 0.34 (2) 0.26 (4) 0.33 (2)
     Runtime   | 2.76 (1) 4.41 (4) 3.10 (1) 3.23 (1) | 154 (1) 211 (1) 133 (1) 191 (1)     | 29.1 (1) 81.6 (3) 54.4 (2) 75.6 (3)
50%  Dist.     | 0.41 (4) 0.27 (2) 0.22 (1) 0.27 (2) | 0.55 (4) 0.20 (3) 0.24 (1) 0.30 (2) | 0.60 (4) 0.17 (2) 0.09 (1) 0.18 (2)
     Spars.    | 2.19 (4) 1.41 (2) 1.17 (1) 1.40 (2) | 7.83 (4) 5.55 (3) 3.52 (1) 4.11 (2) | 7.11 (4) 2.06 (2) 1.06 (1) 2.16 (2)
     Impl.     | 7.17 (2) 7.17 (2) 7.01 (1) 7.16 (2) | 7.58 (1) 8.51 (4) 7.28 (1) 7.13 (1) | 7.95 (4) 4.86 (2) 4.38 (1) 5.14 (2)
     Div.      | 0.43 (1) 0.30 (2) 0.25 (4) 0.30 (2) | 0.36 (1) 0.19 (4) 0.25 (2) 0.27 (2) | 0.61 (1) 0.18 (2) 0.11 (4) 0.19 (2)
     Runtime   | 3.39 (1) 9.78 (4) 4.80 (2) 5.18 (2) | 535 (2) 1500 (4) 451 (1) 780 (2)    | 288 (1) 566 (3) 261 (1) 463 (3)

Table 2: Performance metrics across the different datasets, for coverage levels of 10%, 25%, and 50% (top to bottom). The ranking position of each method is indicated in parentheses.

A final remark concerns the compliance of the counterfactuals generated with Geneticφ.
Assessing Geneticφ's compliance is methodologically complex and beyond the scope of this paper, as the method is not designed to ensure compliance, which may depend on several factors, including the LTLp formulae used. Nonetheless, it is worth noting that in our experiments Geneticφ reached a compliance ranging on average from 80% to 99%, depending on the coverage used. This hints that Geneticφ can achieve good overall compliance but cannot always guarantee the satisfaction of φ.

Answering RQ2. Overall, the runtime analysis reveals that aPriori and Online achieve significantly lower runtimes across all datasets and coverage levels, enabling quick counterfactual generation. This efficiency is particularly pronounced at higher coverage levels, where aPriori and Online maintain fast processing times thanks to their optimised mutation processes and reduced need for extensive checks. In contrast, Mutate-And-Retry exhibits substantially higher runtimes, especially notable at 50% coverage, due to the cost of performing trace validation after each mutation. In terms of quality of the generated counterfactuals, MAR performs similarly to our methods, particularly when the task coverage is lower. As complexity increases, particularly at 50% task coverage, aPriori and Online outperform MAR. The counterfactuals generated by MAR tend to have higher sparsity, requiring more modifications to the original trace; in contrast, aPriori and Online are better at generating sparser counterfactuals. MAR shows good performance in implausibility, often ranking close to or at the top. However, aPriori and Online still maintain a good level of implausibility performance, indicating their ability to generate plausible counterfactuals as well, with added efficiency and sparsity. Finally, MAR does not show a significant improvement in diversity compared to our two strategies.
Thus, aPriori and Online offer a good trade-off between runtime and quality of the generated counterfactuals, which remains overall comparable to that of MAR.

7 Related Work

In this section, we review related work on counterfactual explanations, focusing on their use in Predictive Process Monitoring (PPM) and on temporal data. We begin by examining general techniques for generating counterfactuals, distinguishing between case-based and generative approaches. We then explore how Genetic Algorithms (GAs) are employed in counterfactual generation, noting their advantages and limitations. Lastly, we discuss recent developments in counterfactual explanations for temporal process data.

Counterfactual explanations in XAI. Counterfactual explanations identify minimal changes that alter a model's prediction (Wachter, Mittelstadt, and Russell 2017). Techniques for generating counterfactuals fall into two categories: case-based techniques, which find counterfactuals within the sample population, and generative ones, which create them through optimisation-based techniques (e.g., hill-climbing algorithms) (Verma, Dickerson, and Hines 2020). Genetic Algorithms (GAs) are widely used for generating counterfactuals by optimising a population of potential candidates through a fitness function (Dandl et al. 2020; Schleich et al. 2021). Both single-objective and multi-objective GA solutions are available, with single-objective solutions converging faster due to lower complexity (Schleich et al. 2021), and multi-objective GAs providing multiple optimal solutions via a Pareto front (Dandl et al. 2020). One key benefit of GAs is their ability to maximise population diversity, although their stochastic nature often leads to inconsistent results. Moreover, unlike gradient-based optimisation techniques for counterfactual generation (Mothilal, Sharma, and Tan 2020), which require differentiable models to compute counterfactuals, GAs do not require access to the model's parameters or gradient computation.
As such, they are not limited to differentiable models. Despite advancements in the literature that incorporate plausibility and causality constraints (Schleich et al. 2021), current methods have limitations in ensuring the feasibility of counterfactuals. Feasibility is crucial for generating valid counterfactuals and is typically enforced by restricting the data manifold, specifying constraints, or minimizing distances to training set points (Maragno et al. 2024). As noted by (Beckh et al. 2023), no background knowledge injection has been explored so far for the generation of counterfactual explanations; this remains a challenge also for counterfactual explanations generated with GA approaches (Zhou and Hu 2024).

Counterfactual explanations for temporal data. Four works so far have tackled the counterfactual explanation problem in the PPM domain (Huang, Metzger, and Pohl 2022; Hsieh, Moreira, and Ouyang 2021; Hundogan et al. 2023; Buliga et al. 2023). The first introduces LORELEY (Huang, Metzger, and Pohl 2022), an adaptation of the Local Rule-Based Explanations (LORE) framework (Guidotti et al. 2019), which generates counterfactual explanations by training a surrogate decision tree model on a genetically generated neighbourhood of artificial data instances. The prediction task the authors address is multi-class outcome prediction. To ensure the generation of feasible counterfactuals, LORELEY imposes process constraints in the counterfactual generation process by using the whole prefix of activities as a single feature, encoding the whole control-flow execution as a variant of the process. The second work presents DiCE for Event Logs (DiCE4EL) (Hsieh, Moreira, and Ouyang 2021).
DiCE4EL extends one of the methods within DiCE (Mothilal, Sharma, and Tan 2020), specifically the gradient-based optimisation method, by adding a feasibility term that ensures the generated counterfactuals maximise the likelihood of belonging to the training set. To do so, DiCE4EL leverages a Long Short-Term Memory (LSTM)-based predictive model, as the counterfactual explanation search requires gradients. The prediction task addressed in the paper is next activity prediction with a milestone-aware focus. The third and most recent approach for generating counterfactual explanations for PPM, CREATED (Hundogan et al. 2023), leverages a genetic algorithm to generate candidate counterfactual sequences. To ensure the feasibility of the data, the authors build a Markov chain in which each event is a state; using the transition probabilities between states, they determine how likely a counterfactual is, as the product over its states. The fourth and final work proposes an evaluation framework for measuring counterfactual explanations in PPM, including a novel metric measuring the conformance of counterfactual generation techniques (Buliga et al. 2023). As noted by the authors, no previous approach makes explicit use of temporal background knowledge when generating counterfactual explanations. However, background knowledge can play an important role in ensuring the feasibility of the generated counterfactuals, especially from the perspective of sequences of activities, where different constraints may have a different impact on the outcome of a trace execution. The present work aims to fill this specific gap identified in the literature.

8 Conclusions

We have introduced a novel framework for generating counterfactual traces in temporal domains, guaranteeing that they respect background knowledge captured in a suitable temporal logic.
Our approach blends automata-theoretic techniques for this logic with genetic algorithms. The results of the evaluation show that the strategies we propose ensure that background formulae remain satisfied by the generated counterfactual traces, while these traces also maintain or improve general counterfactual explanation desiderata compared to state-of-the-art methods. In the future, we aim to develop more efficient genetic operator strategies. We also plan to extend our approach to richer temporal logics dealing not only with activities but also with numerical data, as in (Felli et al. 2023). This appears viable given the basis provided here, in light of the automata-theoretic characterisation of such logics (Felli et al. 2023), as well as the fact that counterfactual desiderata can be seamlessly redefined over numerical data.

Acknowledgments

This work is partially funded by the Next Generation EU FAIR PE0000013 project MAIPM (CUP C63C22000770006), by the PRIN MIUR project PINPOINT Prot. 2020FNEB27, and by the PNRR project FAIR - Future AI Research (PE00000013), under the NRRP MUR program funded by the Next Generation EU.

References

Beckh, K.; Müller, S.; Jakobs, M.; Toborek, V.; Tan, H.; Fischer, R.; Welke, P.; Houben, S.; and von Rueden, L. 2023. Harnessing Prior Knowledge for Explainable Machine Learning: An Overview. In 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 450–463.
Buliga, A.; Di Francescomarino, C.; Ghidini, C.; and Maggi, F. M. 2023. Counterfactuals and Ways to Build Them: Evaluating Approaches in Predictive Process Monitoring. In International Conference on Adv. Inf. Sys. Eng., 558–574. Springer.
Dandl, S.; Molnar, C.; Binder, M.; and Bischl, B. 2020. Multi-Objective Counterfactual Explanations. In Bäck, T.; Preuss, M.; Deutz, A.; Wang, H.; Doerr, C.; Emmerich, M.; and Trautmann, H., eds., Parallel Problem Solving from Nature PPSN XVI, 448–469. Springer.
De Giacomo, G.; De Masellis, R.; Maggi, F. M.; and Montali, M. 2022. Monitoring Constraints and Metaconstraints with Temporal Logics on Finite Traces. ACM Trans. Softw. Eng. Methodol., 31(4).
De Giacomo, G.; De Masellis, R.; and Montali, M. 2014. Reasoning on LTL on Finite Traces: Insensitivity to Infiniteness. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI), 1027–1033. AAAI Press.
De Giacomo, G.; and Favorito, M. 2021. Compositional Approach to Translate LTLf/LDLf into Deterministic Finite Automata. Proceedings of the International Conference on Automated Planning and Scheduling, 31(1): 122–130.
De Giacomo, G.; and Vardi, M. Y. 2013. Linear Temporal Logic and Linear Dynamic Logic on Finite Traces. In Rossi, F., ed., Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), 854–860. IJCAI/AAAI.
Di Francescomarino, C.; and Ghidini, C. 2022. Predictive Process Monitoring. In Process Mining Handbook, volume 448 of LNBIP, 320–346. Springer.
Dunn, O. J. 1961. Multiple Comparisons Among Means. Journal of the American Statistical Association, 56(293): 52–64.
Felli, P.; Montali, M.; Patrizi, F.; and Winkler, S. 2023. Monitoring Arithmetic Temporal Properties on Finite Traces. In Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI), 6346–6354. AAAI Press.
Fionda, V.; and Greco, G. 2018. LTL on Finite and Process Traces: Complexity Results and a Practical Reasoner. J. Artif. Intell. Res., 63: 557–623.
Fuggitti, F. 2019. LTLf2DFA (Version 1.0.3).
Guidotti, R. 2022. Counterfactual explanations and how to find them: literature review and benchmarking. Data Mining and Knowledge Discovery.
Guidotti, R.; Monreale, A.; Giannotti, F.; Pedreschi, D.; Ruggieri, S.; and Turini, F. 2019. Factual and Counterfactual Explanations for Black Box Decision Making. IEEE Intelligent Systems, 34: 14–23.
Hsieh, C.; Moreira, C.; and Ouyang, C. 2021. DiCE4EL: Interpreting Process Predictions using a Milestone-Aware Counterfactual Approach. In ICPM, 88–95.
Huang, T.-H.; Metzger, A.; and Pohl, K. 2022. Counterfactual Explanations for Predictive Business Process Monitoring. In Themistocleous, M.; and Papadaki, M., eds., Information Systems, 399–413. Cham: Springer International Publishing.
Hundogan, O.; Lu, X.; Du, Y.; and Reijers, H. A. 2023. CREATED: Generating Viable Counterfactual Sequences for Predictive Process Analytics. In Indulska, M.; Reinhartz-Berger, I.; Cetina, C.; and Pastor, O., eds., International Conference on Adv. Inf. Sys. Eng., 541–557. Cham: Springer Nature Switzerland.
Maragno, D.; Kurtz, J.; Röber, T. E.; Goedhart, R.; Birbil, S. I.; and den Hertog, D. 2024. Finding Regions of Counterfactual Explanations via Robust Optimization. INFORMS J. Comput., 36(5): 1316–1334.
Mitchell, M. 1998. An Introduction to Genetic Algorithms. Cambridge, MA, USA: MIT Press.
Mothilal, R. K.; Sharma, A.; and Tan, C. 2020. Explaining machine learning classifiers through diverse counterfactual explanations. In Hildebrandt, M.; Castillo, C.; Celis, L. E.; Ruggieri, S.; Taylor, L.; and Zanfir-Fortuna, G., eds., FAT* '20: Conference on Fairness, Accountability, and Transparency, Barcelona, Spain, January 27-30, 2020, 607–617. ACM.
Pnueli, A. 1977. The temporal logic of programs. In Proceedings of the 18th Annual Symposium on Foundations of Computer Science (FOCS), 46–57. IEEE.
Rizzi, W.; Di Francescomarino, C.; and Maggi, F. M. 2020. Explainability in predictive process monitoring: When understanding helps improving. In International Conference on Business Process Management, 141–158. Springer.
Schleich, M.; Geng, Z.; Zhang, Y.; and Suciu, D. 2021. GeCo: Quality Counterfactual Explanations in Real Time. Proc. VLDB Endow., 14(9): 1681–1693.
Teinemaa, I.; Dumas, M.; La Rosa, M.; and Maggi, F. M. 2019. Outcome-Oriented Predictive Process Monitoring: Review and Benchmark. ACM Trans. Knowl. Discov. Data, 13(2).
van der Aalst, W. M. P. 2016. Process Mining - Data Science in Action, Second Edition. Springer.
van Dongen, B. 2012. BPI Challenge 2012.
Verma, S.; Dickerson, J. P.; and Hines, K. 2020. Counterfactual Explanations for Machine Learning: A Review. CoRR, abs/2010.10596.
Wachter, S.; Mittelstadt, B. D.; and Russell, C. 2017. Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR. CoRR, abs/1711.00399.
Wilcoxon, F. 1945. Individual Comparisons by Ranking Methods. Biometrics Bulletin, 1(6): 80–83.
Zhou, R.; and Hu, T. 2024. Evolutionary Approaches to Explainable Machine Learning. In Banzhaf, W.; Machado, P.; and Zhang, M., eds., Handbook of Evolutionary Machine Learning, 487–506. Singapore: Springer Nature Singapore.