# smt_safety_verification_of_ontologybased_processes__edd2c601.pdf

SMT Safety Veriﬁcation of Ontology-Based Processes

Diego Calvanese1,2, Alessandro Gianola1, Andrea Mazzullo1, Marco Montali1

1Faculty of Engineering, Free University of Bozen-Bolzano, Italy, 2Computing Science Department, Ume a University, Sweden {calvanese, gianola, mazzullo, montali}@inf.unibz.it

In the context of veriﬁcation of data-aware processes, a formal approach based on satisﬁability modulo theories (SMT) has been considered to verify parameterised safety properties. This approach requires a combination of model-theoretic notions and algorithmic techniques based on backward reachability. We introduce here Ontology-Based Processes, which are a variant of one of the most investigated models in this spectrum, namely simple artifact systems (SASs), where, instead of managing a database, we operate over a description logic (DL) ontology. We prove that when the DL is expressed in (a slight extension of) RDFS, it enjoys suitable modeltheoretic properties, and that by relying on such DL we can deﬁne Ontology-Based Processes to which backward reachability can still be applied. Relying on these results we are able to show that in this novel setting, veriﬁcation of safety properties is decidable in PSPACE.

Introduction Verifying and reasoning about dynamic systems that integrate processes and data is a long-standing challenge that attracted considerable attention, and that led to a ﬂourishing series of results, within business process management (Reichert 2012; Calvanese et al. 2019a; Ghilardi et al. 2020) and data management (Vianu 2009; Calvanese, De Giacomo, and Montali 2013; Bagheri Hariri et al. 2013a; Deutsch, Hull, and Vianu 2014; Deutsch, Li, and Vianu 2019). Among the several conceptual models studied in this area, datacentric systems and in particular artifact-centric systems have been brought forward as a principled approach where relevant (business) objects are elicited, and actions evolving them through their lifecycle are deﬁned (Hull 2008). Different formal models have been proposed to capture artifact systems and study their veriﬁcation (Calvanese, De Giacomo, and Montali 2013). One of the most studied settings considers artifact systems as being composed of: (i) a readonly database storing background information about artifacts that does not change during the system evolution; (ii) a working memory, used to store data that can be modiﬁed in the course of the evolution; and (iii) transitions (also called actions or services) that query the read-only database and the working memory and use the retrieved data to update

Copyright 2023, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

the working memory. Veriﬁcation of such systems is challenging, not only because the working memory in general evolves through inﬁnitely many different conﬁgurations, but also because the desired veriﬁcation properties should hold regardless of the content of the read-only database, thus calling for a particular form of parameterised veriﬁcation (Damaggio, Deutsch, and Vianu 2012; Deutsch, Li, and Vianu 2019; Calvanese et al. 2019b, 2020). In this paper, we study for the ﬁrst time ontology-based processes, i.e., semantically-enriched artifact systems where the read-only database is substituted by a description logic (DL) ontology (Baader et al. 2003), which stores background, incomplete information about the artifacts. In this setting, two possible notions of parameterisation may be studied: one where the evolution of the system is veriﬁed against all possible choices for the ABox (i.e., the extensional component of the ontology), another where veriﬁcation is against all possible models of a ﬁxed ABox. In this work, we adopt the latter, and thus verify whether the system enjoys desired properties irrespectively of how the information explicitly provided by the ABox is completed through the assertions in the TBox (i.e., the intensional component of the ontology). More in detail, we consider an extensively studied model of such artifact systems, called simple artifact system (SAS) by Calvanese et al. (2020), where the artifact working memory consists of a ﬁxed set of artifact variables (Deutsch et al. 2009; Damaggio, Deutsch, and Vianu 2012; Calvanese et al. 2020). On top of this basis, we study the veriﬁcation of safety properties in the case where the ontology is speciﬁed in (a slight extension of) RDFS (Brickley and Guha 2014), a schema/ontology language for the Semantic Web formalized by the W3C, and where the transitions that update the working memory are expressed over the ontology signature. For this setting, we provide an SMT-based backward reachability procedure to decide safety, showing that the problem is in PSPACE. To apply this machinery, the underlying DL must enjoy suitable model-theoretic properties. In particular, to this end, we prove for the ﬁrst time that this DL admits a model completion (Chang and Keisler 1990). Showing the existence of model completions is not trivial: it requires a careful algebraic analysis of the class of all models. At the same time, we give indications on which DL constructs break our veriﬁcation machinery: indeed, model

The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23)

completions may not exist in general, and we show that simple examples such as RL-ontologies do not admit a model completion. This gives also a technical justiﬁcation for the choice of RDFS. Detailed proofs are provided in an extended version of this article (Calvanese et al. 2021).

Related work. In spirit, our approach is reminiscent of previous works studying the veriﬁcation of dynamic systems, like Golog programs, operating over a DL ontology, such as those by Claßen et al. (2014), and Zarrieß and Claßen (2016). In fact, both in their settings and ours, the dynamic system evolves each model of the ontology, and properties are veriﬁed over all the resulting evolutions. This is radically different from approaches where the ABox itself is evolved by the process, with an execution semantics following Levesque s functional approach, in which query entailment over the current state is used to compute the successor states (Bagheri Hariri et al. 2013b). However, we differ from the work by Claßen et al. (2014) and Zarrieß and Claßen (2016) in that our goal is not only to derive foundational results, but also to transfer them into practical algorithms and thus obtain a model that is readily implementable by relying on a state-of-the-art SMT-based model checker such as MCMT (Ghilardi and Ranise 2010). As customary for artifact systems, our approach is based on actions that manipulate the artifact variables, coupled with conditionaction rules that declaratively deﬁne which actions are currently executable, and with which parameters. Alternative choices could be seamlessly taken, by adapting approaches that rely on an explicit description of the control-ﬂow, e.g., based on state machines (de Leoni, Felli, and Montali 2020) or Petri nets interpreted with interleaving semantics (Ghilardi et al. 2020). There exist very few approaches that have been proved to be successful for assessing parameterized safety of processes enriched with data capabilities, and SMT solvers are well-known to represent the state-of-the-art in this area (Calvanese et al. 2019a; Ghilardi et al. 2021; Calvanese et al. 2020): in this work we build in particular on top of the SMTbased veriﬁcation framework from (Calvanese et al. 2020). A key element of novelty in our work is to lift, in conceptual and modelling terms, this SMT-based veriﬁcation approach from to handle data stored in a DL ontology with incomplete information, as opposed to a standard relational database adopted in all the prior works on parameterized veriﬁcation of artifact systems (Deutsch, Li, and Vianu 2019, 2016; Calvanese et al. 2020). It is well known that lifting results from settings with complete to ones with incomplete information is often a non-trivial task that requires novel insights, even more when dealing with data/knowledge dynamics and formal veriﬁcation (where overall results are sparse and fragmented). This lies at the core of KR in AI and is a key contribution of our work. Moreover, to develop our novel results we need to extend the technical machinery from (Calvanese et al. 2020) in a non-trivial way. Indeed, the central technical result is then to show that model completion holds in this novel setting (Theorem 2), which in turn stands at the core of the veriﬁcation machinery we employ. In this light, Theorem

2 is a crucial, non-trivial result, as proving the existence of a model completion calls every time for a sophisticated semantical analysis of FO models of the theory of interest, that is signiﬁcantly different when a different theory is studied.

Preliminaries

In this section, we recall the syntax and semantics of ﬁrstorder logic (FO). We then deﬁne the syntax of the DL RDFS+, which is a slight extension of RDFS (Brickley and Guha 2014). Its semantics is given by the standard translation, mapping RDFS+ ontologies into equivalent sets of FO formulas.

First-Order Logic Preliminaries

The alphabet of ﬁrst-order logic (FO) consists of: countably inﬁnite and pairwise disjoint sets NP of predicate symbols (with ar(P) N being the arity of P NP), NF of function symbols (with ar(f) N being the arity of f NF), NI of individual symbols (or individual names), and Var of variables; the equality symbol = ; the Boolean operators and ; and the existential quantiﬁer . The deﬁnitions of a (FO) term t (with t denoting a possibly empty tuple of terms), formula ϕ, atom, and literal are given as usual. Moreover, we adopt the standard abbreviations and conventions for the other Boolean operators and the universal quantiﬁer . We write ϕ(x) to indicate that the free variables (deﬁned as usual) of ϕ are included in x, and we write ϕ(a) for the formula obtained from ϕ(x) by substituting a to x. Similar notions are adopted for terms. A sentence is deﬁned as a formula without free variables, while we call quantiﬁer-free a formula without any existential or universal quantiﬁers. A formula is existential if it has the form xϕ(x), and universal if it has the form xϕ(x), where ϕ is quantiﬁer-free. A (FO) theory T is a set of FO sentences, and T is said to be universal if every ϕ T is universal. A signature Σ is a subset of NP NF NI. For a set Γ of formulas (respectively, terms), the signature of Γ, denoted ΣΓ, is the set of predicate, function, and individual symbols occurring in Γ. Given Σ, we say that Γ is a set of Σ-formulas (resp., Σ-terms) if ΣΓ = Σ. Given a signature Σ, an (FO) Σ-interpretation is a pair I = ( I, I), where I is a non-empty set, called domain of I, and I is an interpretation function such that: P I ( I)ar(P ), for every P NP Σ; f I : ( I)ar(f) I, for every f NF Σ; and a I I, for every a NI Σ. For a set Γ of terms or formulas, when no confusion can arise, we often write interpretation in place of ΣΓ-interpretation . An assignment in I is a function a: Var I. We deﬁne the value of a Σ-term t in I under a as follows: a(t) = a(x), if t = x; a(t) = a I, if t = a NI Σ; and a(t) = f I(a(t)), if t = f(t), where f NF Σ and, for an m-tuple t = (t1, . . . , tm) of terms, we set a(t) = (a(t1), . . . , a(tm)). The notion of a formula ϕ being satisﬁed in an interpretation I under an assignment a, or of I being a model of ϕ under a, written I |=a ϕ, is

inductively deﬁned as:

I |=a P(t) iff a(t) P I, I |=a s = t iff a(s) = a(t), I |=a ψ iff not I |=a ψ, I |=a ψ χ iff I |=a ψ and I |=a χ, I |=a xψ iff I |=a ψ, for some a that can differ from a only on x.

For a formula ϕ(x), we write I |= ϕ[d] in place of I |=a ϕ(x), with a(x) = d. We say that a set Γ of formulas is satisﬁed in an interpretation I under an assignment a, or that I is a model of Γ under a, written I |=a Γ, if I |=a ϕ, for every ϕ Γ (we refer to a singleton Γ = {ϕ} simply as ϕ). For a sentence ϕ, the satisfaction of ϕ in I under a does not depend on a, thus we write I |= ϕ in place of I |=a ϕ, and we say that ϕ is satisﬁed in I. For a theory T, we say that T is satisﬁed in an interpretation I (or that I is a model of T), written I |= T, if every sentence of T is satisﬁed in I. A formula ϕ is satisﬁable w.r.t. T (or T-satisﬁable) if there exist an interpretation I and an assignment a in I such that I |= T and I |=a ϕ. Moreover, we say that T logically implies a formula ϕ, or that ϕ is a logical consequence of T, written T |= ϕ, if, for every interpretation I and every assignment a in I, I |= T implies that I |=a ϕ. Finally, formulas ϕ, ψ are equivalent w.r.t. T (or T-equivalent) if T |= ϕ ψ.

Description Logics Preliminaries The DL we consider here is an extension of RDFS (Brickley and Guha 2014) with disjointness between concepts and roles, conjunction and (one-level) qualiﬁed existential quantiﬁcation on the left-hand side of inclusions, and inclusion of direct and inverse roles. We denote this DL RDFS+. Formally, let NC, NR, and NI be countably inﬁnite and pairwise disjoint sets of concept, role, and individual names, respectively, where concept names are 1-ary and role names are 2-ary predicate symbols, hence NC NR NP. In RDFS+, concepts C and roles R, respectively denoting 1ary and 2-ary predicates, are deﬁned as follows:

R ::= P | P , C ::= A1 An | R. | R.A,

where P NR, n 1, and A, A1, . . . , An NC. A concept inclusion (CI) has the form C A or C A, and a role inclusion (RI) has the form R R or R R , where C is an RDFS+ concept, A NC, and R, R are roles. An RDFS+ TBox T is a ﬁnite set of CIs and RIs. An assertion has the form A(a), A(a), P(a, b), P(a, b), (a = b), or (a = b), where A NC, P NR, and a, b NI. An ABox A is a ﬁnite set of assertions. (We point out that in an ABox we allow for negated assertions, which is a feature that is not always supported in DLs.) An RDFS+ ontology O is a pair (T , A), where T is a TBox and A is an ABox. We observe that RDFS+ is incomparable in expressive power with the DLs of the popular DL-Lite family (Calvanese et al. 2007; Artale et al. 2009). Indeed, while DLLite allows for the use of existential quantiﬁcation R. on the right-hand side of CIs, this is ruled out in RDFS+. On

Acad Pos Job Pos Acad Pos Admin Pos Admin Pos Job Pos User Job Pos appl For. User appl For . Job Pos suit For. User suit For . Job Pos suit For. Good Eval Elig User User User Grad Elig User Elig User Grad Acad Pos(professor123), Acad Pos(researcher123), Admin Pos(secretary123), Admin Pos(secretary456).

Figure 1: University personnel ontology for job hiring processes

the other hand, in RDFS+ one can locally type the second component of a role through the use of qualiﬁed existential quantiﬁcation R.A on the left-hand side of CIs, while this is not possible in DL-Lite. As we will see later, differently from DL-Lite, the FO translation of an RDFS+ ontology is a universal theory. Example To represent part of the domain knowledge on job hiring processes for university personnel, we deﬁne the RDFS+ ontology O = (T , A), where T and A contain the CIs and assertions shown in Figure 1. Moreover, we assume that A, which stores data on available job positions, contains all the assertions of the form A(u), P(u, a) and P(a, u), for a distinguished individual name u NI and every A, P, a ΣO, so that u can be used to represent an undeﬁned value. The CIs of T formalise the following facts: there are both academic and administrative job positions and these are disjoint; users and job positions are disjoint; appl For relates users to job positions; to be suitable for something one has to be a user that is positively evaluated; the range of suit For is included in the extension of Job Pos; an eligible user is deﬁned as a graduate user. We deﬁne now the standard translation from RDFS+ expressions to FO formulas, which maps concepts to FO formulas with one free variable, and roles to FO formulas with two free variables. Speciﬁcally, the translation T generates formulas that contain just two variables x, y Var:

T(A1 An) = A1(x) An(x), T(P) = P(x, y), T(P ) = P(y, x), T( R. ) = y T(R), T( R.A) = y(T(R) A(y)), T( A) = T(A), T( R) = T(R),

where A, A1, . . . , An are unary predicates and P is a binary predicate. Moreover, we map CIs and RIs into universal FO sentences in the following way:

T(C D) = x(T(C) T(D)), T(R S) = x y(T(R) T(S)),

where D stands for either A or A, and S stands for either R or R . We also set T(T ) = S β T {T(β)}. Assertions α are (identically) mapped into FO literals without free variables (i.e., ground), as T(α) = α, and we set T(A) = S α A{T(α)} = A. Finally, T(O) = T(T ) T(A). It is easy to see that the set of FO sentences obtained as the translation T(O) of an RDFS+ ontology O, can be equivalently rewritten into a universal Horn theory (Kontchakov

and Zakharyaschev 2014; Hodges 1993). Such a theory, which we identify with T(O), can be obtained from T(O) by simply putting formulas into prenex normal form. The semantics for RDFS+ expressions can be given in terms of their FO translation (Kontchakov and Zakharyaschev 2014). For a concept C and an interpretation I = ( I, I), the extension of C in I is CI = {d I | I |= T(C)[d]}. Similarly, for a role R, its extension in I is RI = {(d, e) I I | I |= T(R)[d, e]}. We say that C and R are satisﬁed in I if CI = and RI = , respectively. Moreover, given a CI, RI, assertion, TBox, ABox, or ontology Γ, we say that Γ is satisﬁed in I (or that I is a model of Γ), written I |= Γ, if I |= T(Γ). Given an ontology O and (a concept, role, CI, RI, or assertion mapped, via its FO translation, into) an FO formula ϕ, we say that ϕ is satisﬁable w.r.t. O (or O-satisﬁable) if there exists a model I of O that satisﬁes ϕ under some assignment in I. Finally, we say that O logically implies an FO formula ϕ, or that ϕ is a logical consequence of O, written O |= ϕ, if, for every model I of O and every assignment a in I, I satisﬁes ϕ under a.

Basic Model-Theoretic Properties In this section, we prove the model-theoretic properties that will be used later on to develop our veriﬁcation machinery. Speciﬁcally, we show here that the standard translation of the RDFS+ ontologies introduced in the previous section admits model completion, and has the constraint satisﬁability problem decidable. These properties will then allow us to verify suitably deﬁned ontology-based processes by employing a variant of the SMT-based backward reachability procedure by Calvanese et al. (2020). For this, we require some preliminary notions. A formula that is a conjunction of Σ-literals is called a Σ-constraint. Given a Σ-theory T, we deﬁne the constraint satisﬁability problem for T as follows: given a formula yϕ(x, y), where ϕ(x, y) is a Σ-constraint, decide whether yϕ(x, y) is satisﬁable w.r.t. T. A theory T has quantiﬁer elimination (QE) iff, for every ΣT -formula ϕ(x), there exists a quantiﬁer-free formula ψ(x) such that T |= ϕ(x) ψ(x). Finally, we will use the following deﬁnition of model completion, which is restricted to cover the case of universal theories (the ones considered in this work) and that is nonetheless known to be equivalent (for universal theories) to the usual one from model theory (Chang and Keisler 1990; Ghilardi 2004). Let T be a universal Σ-theory and let T T be a further Σ-theory. We say that T is a model completion of T iff: (i) every Σ-constraint satisﬁed in a model of T is also satisﬁed in a model of T ; (ii) T has QE. A model completion T of a theory T, when it exists, is unique: this justiﬁes the use of the notation T for denoting the unique model completion of T. We now state the main technical result of the section. Theorem 2 Given an RDFS+ ontology O, T(O) is a ﬁnite universal FO theory that (i) has a decidable constraint satisﬁability problem, and (ii) admits a model completion.

Proof (Sketch) For Point (i), we reduce to RDFS+ ontology satisﬁability, which is decidable (since RDFS+ is a fragment of SROIQ (Baader et al. 2017)). For Point (ii),

we require the following deﬁnitions. A theory T has the amalgamation property if, for every pair of embeddings µ1 : I0 I1, µ2 : I0 I2 between models I0 and I1, I2 of T, there exist a model I of T and embeddings ν1 : I1 I, ν2 : I2 I, such that ν1 µ1 = ν2 µ2 (we adopt here the usual notion of embedding (Chang and Keisler 1990)). The triple (I, ν1, ν2) (or, abusing notation, just I) is called a T-amalgam of I1, I2 over I0. One can assume w.l.o.g. that I0 is a substructure of I1 and I2 and that µ1 and µ2 are inclusion embeddings (mapping an element of I0 in the same element of I1 and I2, resp.). Since there is no function symbol in ΣT (O), it sufﬁces to show that T(O) enjoys the amalgamation property (Lipparini 1982). For this purpose, for every pair of models I1, I2 of T(O) sharing a submodel I0, we deﬁne I = ( I, I) as follows: (i) I = I1 I2; (ii) for every individual symbol a ΣT (O), a I = a I1; (iii) for every (1or 2-ary) predicate symbol P ΣT (O), P I = P I1 P I2. Observe that a I = a I1 = a I0 = a I2. Moreover, if d P I, where P is n-ary, for n {1, 2}, then d ( I1)n and d P I1, or d ( I2)n and d P I2: this follows from the definition of P I := P I1 P I2. Clearly, given embeddings µ1 : I0 I1, µ2 : I0 I2, the (inclusion) embeddings i1 : I1 I, i2 : I2 I are such that i1 µ1 = i2 µ2. We show that I is a model of T(O). A formula ϕ of T(O) has one of the following forms:

(1) x(A1(x) An(x) D(x)); (2) x y(R1(x, y) D(x)); (3) x y(R1(x, y) A(y) D(x)); (4) x y(R1(x, y) R2(x, y)); (5) x y(R1(x, y) R2(x, y));

where: Ak NC, for k {1, . . . , n}; D {B, B}, with B NC; Ri(x, y) = Pi(x, y), if Ri = Pi, and Ri(x, y) = Pi(y, x), if Ri = P i , with Pi NR and i {1, 2}. Reasoning by cases, one can show that for every j {1, . . . , 5} and every formula ϕ T(O) of the form (j), I is a model of ϕ. For example, we prove here Case (1). Given d I, suppose that I |= Ak[d], i.e., d AI k, for all k {1, . . . , n}. We have that d Ii and d AIi k , for i = 1 or i = 2, and thus Ii |= Ak[d]. Since Ii is a model of T(O), and hence of ϕ, we have Ii |= D[d]. Given that D(x) is a literal and Ii is embedded in I, we obtain that I |= D[d], and thus I |= ϕ. This completes the proof that T(O) has a model completion.

Remark For every RDFS+ ontology O, the model completion T(O) of T(O) admits quantiﬁer elimination. The algorithm for quantiﬁer elimination in T(O) follows from the proof of Theorem 2: to eliminate x from a ΣT (O)-formula xϕ(x, y), we take the conjunction of the clauses χ(y) implied by ϕ(x, y), which are ﬁnitely many for T(O), up to T(O)-equivalence. This procedure is used in Algorithm 1 and is crucial to get the decidability results of Theorem 9.

Properties (i) and (ii) of Theorem 2 are in line with the foundational framework by Calvanese et al. (2020), where a third property is additionally assumed: the ﬁnite model property for constraint satisﬁability (see the references for

the deﬁnition). However, an important difference with the work by Calvanese et al. (2020) is that this property here is not needed because we do not require ﬁnite structures (i.e., databases). Finally, we observe that ontologies in the DL RL (Kontchakov and Zakharyaschev 2014) do not enjoy Property (ii) of Theorem 2. To see this, consider the RL ontology O1 = (T1, ), with T1 = { P. P. }. It can be shown (see ﬁgure below) that T(O1) does not enjoy amalgamation, hence it does not admit a model completion (Chang and Keisler 1990).

I0 I1 I2 e0 e1 e2 P P

Ontology-Based Processes To study ontology-based processes under RDFS+ ontologies, we introduce now our model, called RDFS+-based processes, which are a variant of the artifact systems studied by Calvanese et al. (2019b). We also introduce the parameterised safety problems for our model, then studied in the following section. Ontology-based processes read data from a given RDFS+ ontology, used to store background information of the system, and manipulate individual variables, called artifact variables, which represent the current state of the process. To formalise such model, we ﬁrst introduce case-deﬁned functions. For an RDFS+ ontology O, an O-partition is a ﬁnite set P = {κ1(x), . . . , κn(x)} of ΣO-literals such that O |= x Wn i=1 κi(x) x V i =j (κi(x) κj(x)). Given an O-partition P = {κ1(x), . . . , κn(x)}, and ΣO-terms t(x) = (t1(x), . . . , tn(x)), (the value of) a case-deﬁned function F on P and t, for a fresh function symbol F NF, is as follows: for every model I of O, every assignment a in I, and every tuple x of variables, a(F(x)) = a(ti(x)), if I |=a κi(x). We call a case-deﬁned function trivial when i = 1, that is, when it has only one case In order to introduce veriﬁcation problems in a symbolic setting, one ﬁrst has to specify which formulas are used to represent (i) the sets of states, (ii) the system initialisations, and (iii) the system evolution. To capture these aspects, we provide the following deﬁnitions. An RDFS+-based process (RDFS+-BP) is a tuple S = (O, x, ι(x), Sm j=1{τj(x, x )}), where m N, and

O = (T , A) is an RDFS+ ontology; x = (x1, . . . , xn) is a tuple of variables, called artifact variables, and x is a tuple of variables that are renamed copies of variables in x; ι(x) = Vn i=1 xi = ai, with ai NI, is an initial state formula; τj(x, x ) = y(γj(x, y) Vn i=1 x i = F j i (x, y)), for 1 j m, is a transition formula, where γj(x, y) is a conjunction of ΣO-literals, called guard of τj, and each x i = F j i (x, y), where F j i is a case-deﬁned function on some O-partition and list of ΣO-terms, is called an update of τj.

Notice that, when the case-deﬁned function F j i appearing in an update is trivial, the corresponding update x i = F j i (x, y) stands for a value assignment of the variable x i to a single term.

Example We develop a job hiring process for university personnel based on the domain knowledge from Example 1. Each application is created using a dedicated website portal, where potentially interested users need to register in advance. When a registered user decides to apply, the data created do not have to be stored persistently and thus can be maintained just by using artifact variables (described below) that can interact with the knowledge base. All these variables are initialised with an undeﬁned value u. In the ﬁrst transition, an application is created by a registered user, which falls into the extension of the concept User: the information about the user is then stored in the artifact variable xappl. At this point, the website asks the user whether they hold a university degree: if this is the case, the website accepts the user as eligible, their information is stored using xappl and the process can progress. Then, the user picks up a job position (assigned to xjob) and applies for it. The following steps of the process involve the evaluation of the application: for both academic and administrative positions, if the eligible candidate is suitable for the position, they are declared winner (assigned to xwinr), otherwise loser (assigned to xlosr). To formalise this process, we deﬁne the RDFS+-BP S = (O, x, ι(x), S7 j=1{τj(x, x )}) so that O is the RDFS+ ontology of Example 1, and x = (xappl, xjob, xelig, xwinr, xlosr), ι = V xi x xi = u, τ1 = y1(User(y1) x appl = y1), τ2 = Elig User(xapplicant) x elig = xappl, τ3 = z1(Job Pos(z1) appl For(xelig, z1) x job = z1); τ4 = Acad Pos(xjob) suit For(xelig, xjob) x winr = xelig, τ5 = Admin Pos(xjob) suit For(xelig, xjob) x winr = xelig, τ6 = Acad Pos(xjob) suit For(xelig, xjob) x losr = xelig, τ7 = Admin Pos(xjob) suit For(xelig, xjob) x losr = xelig.

We deﬁne now parametric safety. Given an RDFS+ ontology O, we call state (ΣO-)formula a quantiﬁer-free ΣOformula ϕ(x). A state formula constrains the content of the artifact variables characterising the states of the systems. Notice that a state formula can represent a (possibly inﬁnite) set of states, because of the presence of (possibly inﬁnitely many) different elements in a model of the ontology O. A safety formula ν(x) for S is a state ΣO-formula describing the undesired states of the system. We say that S is safe w.r.t. ν(x) if there is no k 0 and no formula

ι(x0) τj0(x0, x1) τjk 1(xk 1, xk) ν(xk), ( )

that is satisﬁable w.r.t. O, where 1 j0, . . . , jk 1 m and each xh, with 0 h k, is a tuple of variables that are renamed copies of variables in x. The safety problem for S is the following decision problem: given a safety formula ν(x) for S, decide whether S is safe w.r.t. ν(x). This veriﬁcation problem is parametric on the models of a ﬁxed RDFS+ ontology, since safety is assessed irrespectively of the choice of such a model. This

Algorithm 1: SMT-based backward reachability procedure

Function BReach(ν)

2 while φ B is T(O)-satisﬁable do

3 if ι φ is T(O)-satisﬁable then

return (unsafe, unsafe trace of form ( ));

5 φ Pre(τ, φ);

6 φ QE(T(O) , φ); return safe;

implies that, when the system is safe, it is so for every execution of the process under every model in principle, inﬁnitely many of the given ontology. Example Referring to Example 4, an undesired situation is the one where an applicant registered user is declared winner even if they were not eligible. This situation is formally described by the safety formula:

ν = User(xwinr) Elig User(xwinr).

Verifying Safety Properties for RDFS+-BPs We study now the parameterised safety problems for RDFS+-BPs by adopting a symbolic version (Calvanese et al. 2020) of the well-known backward reachability procedure (Abdulla et al. 1996). The requirements that are needed for employing our machinery are (cf. Theorem 2): (i) the existence of the model completion T(O) for T(O) (with an available quantiﬁer elimination procedure in T(O) ), and (ii) the decidability of the constraint satisﬁability problem for the ontology O. Let us consider the safety problem for an RDFS+-BP S. First of all, we need to preprocess S in order to eliminate all the occurrences of case-deﬁned functions. This can be done in polynomial time. Lemma 6 The safety problem for an RDFS+-BP S can be reduced to the safety problem for an RDFS+-BP S (the size of which is polynomial in the size of S) with only trivial casedeﬁned functions.

Indeed, similarly to what shown in (Calvanese et al. 2020), the previous lemma shows that also in this case casedeﬁned functions can be eliminated w.l.o.g.From now on, we assume that RDFS+-BPs S is without any case-deﬁned function. We are ready to describe the main procedure for detecting safety of RDFS+-BPs: this procedure will handle only the pre-processed RDFS+-BPs not containing casedeﬁned functions The SMT-based backward reachability procedure (or backward search) for handling the safety problem for an RDFS+-BP S is shown in Algorithm 1. An integral part of the algorithm is to compute symbolic preimages (Line 5). The intuition behind the algorithm is to execute a loop where, starting from the undesired states of the system (described by the safety formula ν(x)), the state space of the system is explored backward: in every iteration of the while loop

(Line 2), the current set of states is regressed through transitions thanks to the preimage computation. For that purpose, for any τ(z, z ) and φ(z) (where z are renamed copies of z), we deﬁne τ := Wm h=1 τh and Pre(τ, φ) as the formula z (τ(z, z ) φ(z )). Let φ(x) be a state formula, describing the state of the artifact variables x. The preimage of the set of states described by the formula φ(x) is the set of states described by Pre(τ, φ) (notice that, when τ = W ˆτ, then Pre(τ, φ) = W Pre(ˆτ, φ)). We recall that a state formula is a quantiﬁer-free ΣO-formula. Unfortunately, because of the presence of the existentially quantiﬁed variables y in τ, Pre(τ, φ) is not a state formula, in general. As stated by Calvanese et al. (2020), if the quantiﬁed variables were not eliminated, we would break the regressability of the procedure: indeed, the states reached by computing preimages, intuitively described by Pre(τ, φ), need to be represented by a state formula φ in the new iteration of the while loop. In addition, the increase in the number of variables due to the iteration of the preimage computation would affect the performance of the satisﬁability tests described below, in case the loop is executed many times. In order to solve these issues, it is essential to introduce the subprocedure QE(T(O) , φ) in Line 6. QE(T(O) , φ) implements the quantiﬁer elimination algorithm of T(O) and that converts the preimage Pre(τ, φ) of a state formula φ into a state formula (equivalent to it modulo the axioms of T(O) ), so as to guarantee the regressability of the procedure: this conversion is possible since T(O) eliminates from τh the existentially quantiﬁed variables y. Backward search computes iterated preimages of the safety formula ν, until a ﬁxpoint is reached (in that case, S is safe w.r.t. ν) or until a set intersecting the initial states (i.e., satisfying ι) is found (in that case, S is unsafe w.r.t. ν). Inclusion (Line 2) and disjointness (Line 3) tests can be discharged via proof obligations to be handled by SMT solvers. The ﬁxpoint is reached when the test in Line 2 returns unsat: the preimage of the set of the current states is included in the set of states reached by the backward search so far (represented as the iterated application of preimages to the safety formula ν). The test at Line 3 is satisﬁable when the states visited so far by the backward search include a possible initial state (i.e., a state satisfying ι). If this is the case, then S is unsafe w.r.t. ν. Together with the unsafe outcome, the algorithm returns an unsafe trace of the form ( ), explicitly witnessing the sequence of transitions τh that, starting from the initial conﬁgurations, lead the system to a set of states satisfying the undesired conditions described by ν(x). Theorem 7 Backward search (Algorithm 1) is correct for detecting whether an RDFS+-BP S is safe w.r.t. ν(x). Proof (Sketch) We ﬁrst require the following claim, which comes immediately from the deﬁnitions. Claim 8 For every safety formula ν(x) for S and every k 0, a formula ϑ of the form ( ) is satisﬁable w.r.t. O iff ϑ is satisﬁable w.r.t. T(O). We then need to show that, instead of satisﬁability of formulas of the form ( ) in models of T(O), we can concentrate on satisﬁability w.r.t. T(O) (which exists thanks to Property (ii) of Theorem 2). Then, by exploiting the algorithm

for quantiﬁer elimination in T(O) described in Remark 3, formulas of the form ( ) can be represented via backward search by using quantiﬁer-free formulas. We conclude by noticing that safety/unsafety of S w.r.t. ν(x) can be detected invoking the satisﬁability tests on those quantiﬁer-free formulas: these tests are effective thanks to Property (i) of Theorem 2, because the decidability of the constraint satisﬁability problem implies the decidability of the satisﬁability of arbitrary quantiﬁer-free formulae.

Backward search for generic artifact systems is not guaranteed to terminate (Calvanese et al. 2020). However, in case S is unsafe w.r.t. ν(x), an unsafe trace which is ﬁnite is found after ﬁnitely many iterations of the while loop: hence, in the unsafe case, backward search must terminate. Together with the theorem above, this means that the backward reachability procedure is at least a semi-decision procedure for detecting unsafety of RDFS+-BPs. Nevertheless, we show in the following theorem that, in case of RDFS+-BPs, backward search always terminates: thus, it is a full decision procedure, for which we also provide a PSPACE upper bound. Theorem 9 The safety problem for RDFS+-BPs S = (O, x, ι(x), Sm j=1{τj(x, x )}) is decidable in PSPACE in the combined size of x, ι, and Sm j=1 τj.

Proof There are only ﬁnitely many quantiﬁer-free ΣT (O)- formulas, up to T(O)-equivalence, that could be built out of a ﬁnite set of variables x: this holds for every RDFS+ ontology O. Thanks to the quantiﬁer elimination procedure in Line 6, the overall number of variables in φ is never increased: notice that, without quantiﬁer elimination, computing preimages Pre(τ, φj) would introduce in φj+1 new quantiﬁed variables, because of the presence of existentially quantiﬁed variables y in τ. This implies that globally there are only ﬁnitely many quantiﬁer-free ΣT (O)-formulas that Algorithm 1 needs to analyse. Hence, Algorithm 1 must terminate: by construction of B, the unsatisﬁability test of Line 2 must eventually succeed, if the unsatisﬁability test of Line 3 never does so. Concerning complexity, we need to modify Algorithm 1. We ﬁrst notice that, thanks to Lemma 6, the preprocessing that converts an RDFS+-BP with occurrences of case-deﬁned functions into its equivalent RDFS+-BP without any occurrence of case-deﬁned functions does not increase the overall complexity of the problem. Moreover, the translation of an RDFS+-ontology O into T(O) requires polynomial time. Since T(O) is universal and without function symbols, also the satisﬁability tests in Lines 2-3 can be performed in polynomial time (cf. (Calvanese et al. 2020), Proposition 3.1). Then, we adopt a nondeterministic procedure, analogous to the one by Calvanese et al. (2019b), Theorem 6.1, that makes the complexity NPSPACE: the main difference from (Calvanese et al. 2019b), Theorem 6.1, is that in our signatures, instead of unary functions, we have unary and binary relational symbols, but the argument works similarly. By Savitch s Theorem (PSPACE = NPSPACE), we conclude the proof.

This upper bound is tight. In fact, propositional STRIPS planning (Bylander 1994), a well-known PSPACE-hard

problem, can be directly encoded in our setting (without making use of an ontology, and only employing a propositional working memory). We observe that Algorithm 1 is not yet implemented in the state-of-the-art model checker MCMT (Ghilardi and Ranise 2010), which is based on SMT solving. Such an implementation, however, can be obtained by extending MCMT with the quantiﬁer elimination algorithm for T(O) (cf. Remark 3), required in Line 6, together with a procedure for RDFS+ ontology satisﬁability, required in Lines 2 3.

Conclusions

We have studied the problem of veriﬁcation of data-aware processes under RDFS+ ontologies, where the process component can interact with a knowledge base speciﬁed by means of the DL RDFS+, underpinning the RDFS constructs. We addressed this problem by introducing a suitable model of ontology-based processes, called RDFS+-BPs, and by leveraging the SMT-based version of the backward reachability procedure, which is a well-known technique to employ for verifying systems of this kind. Speciﬁcally, we have shown that this procedure is a full decision procedure for detecting safety of RDFS+-BPs, and we also provided a PSPACE complexity upper bound. This work opens several directions for future work. First, we notice that the choice of RDFS+ ontologies is not intrinsic to our approach. Indeed, motivated by conceptual modelling and data integration issues in Ontology-Based Data Access (Xiao et al. 2018) applications, we are currently working on the DL-Lite family of DLs, to deﬁne suitable DLLite-based processes with analogous decidability and complexity results. The main difference we have to account for is that, for a DL-Lite ontology O, we have an equisatisﬁable (but not equivalent) translation into a universal one-variable FO sentence T(O), and Claim 8 in the proof of Theorem 7 has to be modiﬁed to show that a trace ϑ is satisﬁable w.r.t. O iff a suitably translated trace ˆϑ is satisﬁable w.r.t. T(O). In general, nonetheless, we point out that any DL satisfying the two conditions stated in Theorem 2 can be chosen for our purposes, and that the same theoretical guarantees can be obtained over the SMT-based backward reachability procedure. As future work, we thus intend to introduce a more general framework for DL-based processes that is able to account for different DLs. We also intend to extend the results obtained here to more sophisticated artifact-centric models, such as the relational artifact systems (RASs) studied by Calvanese et al. (2020). Moreover, it is worth investigating in this setting also properties that go beyond safety, such as liveness. In this respect, we intend to exploit the framework proposed in (Geatti, Gianola, and Gigante 2022) for solving satisﬁability of Linear Temporal Logic Modulo Theories over Finite Traces (LTLf MT), which is a ﬁrst-order extension of LTLf, so as to symbolically represent DL-based processes and express temporal properties over them. This approach is particularly promising because it relies on the use of the efﬁcient BLACK solver (Geatti, Gigante, and Montanari 2019, 2021), which leverages SMT solvers as backend tools.

Acknowledgments

This research has been partially supported by the Italian Ministry of University and Research (MUR) under PRIN project PINPOINT Prot. 2020FNEB27, and by the Free University of Bozen-Bolzano with the ADAPTERS project.

Abdulla, P. A.; Cerans, K.; Jonsson, B.; and Tsay, Y.-K. 1996. General Decidability Theorems for Inﬁnite-State Systems. In Proc. of LICS, 313 321. Artale, A.; Calvanese, D.; Kontchakov, R.; and Zakharyaschev, M. 2009. The DL-Lite Family and Relations. J. of Artiﬁcial Intelligence Research, 36: 1 69. Baader, F.; Calvanese, D.; Mc Guinness, D. L.; Nardi, D.; and Patel-Schneider, P. F., eds. 2003. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge Univ. Press. ISBN 0-521-78176-0. Baader, F.; Horrocks, I.; Lutz, C.; and Sattler, U. 2017. An Introduction to Description Logic. Cambridge Univ. Press. ISBN 978-0-521-69542-8. Bagheri Hariri, B.; Calvanese, D.; De Giacomo, G.; Deutsch, A.; and Montali, M. 2013a. Veriﬁcation of Relational Data-Centric Dynamic Systems with External Services. In Proc. of PODS, 163 174. Bagheri Hariri, B.; Calvanese, D.; Montali, M.; De Giacomo, G.; De Masellis, R.; and Felli, P. 2013b. Description Logic Knowledge and Action Bases. J. of Artiﬁcial Intelligence Research, 46: 651 686. Brickley, D.; and Guha, R. 2014. RDF Schema 1.1, W3C Recommendation, World Wide Web Consortium. https: //www.w3.org/TR/rdf-schema/. Accessed: 2022-08-01. Bylander, T. 1994. The Computational Complexity of Propositional STRIPS Planning. Artif. Intell., 69(1-2): 165 204. Calvanese, D.; De Giacomo, G.; Lembo, D.; Lenzerini, M.; and Rosati, R. 2007. Tractable Reasoning and Efﬁcient Query Answering in Description Logics: The DL-Lite Family. J. Automat. Reason., 39(3): 385 429. Calvanese, D.; De Giacomo, G.; and Montali, M. 2013. Foundations of Data Aware Process Analysis: A Database Theory Perspective. In Proc. of PODS, 1 12. Calvanese, D.; Ghilardi, S.; Gianola, A.; Montali, M.; and Rivkin, A. 2019a. Formal Modeling and SMT-Based Parameterized Veriﬁcation of Data-Aware BPMN. In Proc. of BPM, volume 11675 of LNCS, 157 175. Springer. Calvanese, D.; Ghilardi, S.; Gianola, A.; Montali, M.; and Rivkin, A. 2019b. From Model Completeness to Veriﬁcation of Data Aware Processes. In Description Logic, Theory Combination, and All That, volume 11560 of LNCS, 212 239. Springer. Calvanese, D.; Ghilardi, S.; Gianola, A.; Montali, M.; and Rivkin, A. 2020. SMT-based Veriﬁcation of Data-Aware Processes: A Model-Theoretic Approach. Math. Structures Comput. Sci., 30(3): 271 313.

Calvanese, D.; Gianola, A.; Mazzullo, A.; and Montali, M. 2021. SMT-Based Safety Veriﬁcation of Data-Aware Processes under Ontologies (Extended Version). Co RR, abs/2108.12330. Chang, C. C.; and Keisler, H. J. 1990. Model Theory. North Holland Publ. Co. Claßen, J.; Liebenberg, M.; Lakemeyer, G.; and Zarrieß, B. 2014. Exploring the Boundaries of Decidable Veriﬁcation of Non-Terminating Golog Programs. In Proc. of AAAI, 1012 1019. Damaggio, E.; Deutsch, A.; and Vianu, V. 2012. Artifact Systems with Data Dependencies and Arithmetic. ACM Trans. Database Syst., 37(3): 22. de Leoni, M.; Felli, P.; and Montali, M. 2020. Strategy Synthesis for Data-Aware Dynamic Systems with Multiple Actors. In Proc. of KR, 315 325. Deutsch, A.; Hull, R.; Patrizi, F.; and Vianu, V. 2009. Automatic Veriﬁcation of Data-Centric Business Processes. In Proc. of ICDT, 252 267. Deutsch, A.; Hull, R.; and Vianu, V. 2014. Automatic Veriﬁcation of Database-Centric Systems. SIGMOD Record, 43(3): 5 17. Deutsch, A.; Li, Y.; and Vianu, V. 2016. Veriﬁcation of Hierarchical Artifact Systems. In Proceedings of PODS 2016, 179 194. ACM. Deutsch, A.; Li, Y.; and Vianu, V. 2019. Veriﬁcation of Hierarchical Artifact Systems. ACM Trans. Database Syst., 44(3): 12:1 12:68. Geatti, L.; Gianola, A.; and Gigante, N. 2022. Linear Temporal Logic Modulo Theories over Finite Traces. In Proc. of IJCAI 2022, 2641 2647. ijcai.org. Geatti, L.; Gigante, N.; and Montanari, A. 2019. A SATBased Encoding of the One-Pass and Tree-Shaped Tableau System for LTL. In Proc. of TABLEAUX 2019, volume 11714 of LNCS, 3 20. Springer. Geatti, L.; Gigante, N.; and Montanari, A. 2021. BLACK: A Fast, Flexible and Reliable LTL Satisﬁability Checker. In Proc. of OVERLAY 2021, co-located with Gand ALF 2021, volume 2987 of CEUR Workshop Proceedings, 7 12. CEUR-WS.org. Ghilardi, S. 2004. Model-Theoretic Methods in Combined Constraint Satisﬁability. J. Automat. Reason., 33(3 4): 221 249. Ghilardi, S.; Gianola, A.; Montali, M.; and Rivkin, A. 2020. Petri Nets with Parameterised Data - Modelling and Veriﬁcation. In Proc. of BPM, volume 12168 of LNCS, 55 74. Springer. Ghilardi, S.; Gianola, A.; Montali, M.; and Rivkin, A. 2021. Delta-BPMN: A Concrete Language and Veriﬁer for Data Aware BPMN. In Proceedings of BPM 2021, volume 12875 of LNCS, 179 196. Springer. Ghilardi, S.; and Ranise, S. 2010. MCMT: A Model Checker Modulo Theories. In Proc. of IJCAR, volume 6173 of LNCS, 22 29. Springer. Hodges, W. 1993. Model Theory. Cambridge Univ. Press.

Hull, R. 2008. Artifact-Centric Business Process Models: Brief Survey of Research Results and Challenges. In Proc. of ODBASE, volume 5332 of LNCS, 1152 1163. Springer. Kontchakov, R.; and Zakharyaschev, M. 2014. An Introduction to Description Logics and Query Rewriting. In Reasoning Web, volume 8714 of LNCS, 195 244. Springer. Lipparini, P. 1982. Locally Finite Theories with Model Companion. In Atti Accad. Naz. Lincei, volume 72. Reichert, M. 2012. Process and Data: Two Sides of the Same Coin? In Proc. of OTM, volume 7565 of LNCS, 2 19. Springer. Vianu, V. 2009. Automatic Veriﬁcation of Database-Driven Systems: a New Frontier. In Proc. of ICDT, 1 13. Xiao, G.; Calvanese, D.; Kontchakov, R.; Lembo, D.; Poggi, A.; Rosati, R.; and Zakharyaschev, M. 2018. Ontology Based Data Access: A Survey. In Lang, J., ed., Proc. of IJCAI 2018, 5511 5519. ijcai.org. Zarrieß, B.; and Claßen, J. 2016. Decidable Veriﬁcation of Golog Programs over Non-Local Effect Actions. In Proc. of AAAI, 1109 1115.