# smt_safety_verification_of_ontologybased_processes__edd2c601.pdf SMT Safety Verification of Ontology-Based Processes Diego Calvanese1,2, Alessandro Gianola1, Andrea Mazzullo1, Marco Montali1 1Faculty of Engineering, Free University of Bozen-Bolzano, Italy, 2Computing Science Department, Ume a University, Sweden {calvanese, gianola, mazzullo, montali}@inf.unibz.it In the context of verification of data-aware processes, a formal approach based on satisfiability modulo theories (SMT) has been considered to verify parameterised safety properties. This approach requires a combination of model-theoretic notions and algorithmic techniques based on backward reachability. We introduce here Ontology-Based Processes, which are a variant of one of the most investigated models in this spectrum, namely simple artifact systems (SASs), where, instead of managing a database, we operate over a description logic (DL) ontology. We prove that when the DL is expressed in (a slight extension of) RDFS, it enjoys suitable modeltheoretic properties, and that by relying on such DL we can define Ontology-Based Processes to which backward reachability can still be applied. Relying on these results we are able to show that in this novel setting, verification of safety properties is decidable in PSPACE. Introduction Verifying and reasoning about dynamic systems that integrate processes and data is a long-standing challenge that attracted considerable attention, and that led to a flourishing series of results, within business process management (Reichert 2012; Calvanese et al. 2019a; Ghilardi et al. 2020) and data management (Vianu 2009; Calvanese, De Giacomo, and Montali 2013; Bagheri Hariri et al. 2013a; Deutsch, Hull, and Vianu 2014; Deutsch, Li, and Vianu 2019). Among the several conceptual models studied in this area, datacentric systems and in particular artifact-centric systems have been brought forward as a principled approach where relevant (business) objects are elicited, and actions evolving them through their lifecycle are defined (Hull 2008). Different formal models have been proposed to capture artifact systems and study their verification (Calvanese, De Giacomo, and Montali 2013). One of the most studied settings considers artifact systems as being composed of: (i) a readonly database storing background information about artifacts that does not change during the system evolution; (ii) a working memory, used to store data that can be modified in the course of the evolution; and (iii) transitions (also called actions or services) that query the read-only database and the working memory and use the retrieved data to update Copyright 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. the working memory. Verification of such systems is challenging, not only because the working memory in general evolves through infinitely many different configurations, but also because the desired verification properties should hold regardless of the content of the read-only database, thus calling for a particular form of parameterised verification (Damaggio, Deutsch, and Vianu 2012; Deutsch, Li, and Vianu 2019; Calvanese et al. 2019b, 2020). In this paper, we study for the first time ontology-based processes, i.e., semantically-enriched artifact systems where the read-only database is substituted by a description logic (DL) ontology (Baader et al. 2003), which stores background, incomplete information about the artifacts. In this setting, two possible notions of parameterisation may be studied: one where the evolution of the system is verified against all possible choices for the ABox (i.e., the extensional component of the ontology), another where verification is against all possible models of a fixed ABox. In this work, we adopt the latter, and thus verify whether the system enjoys desired properties irrespectively of how the information explicitly provided by the ABox is completed through the assertions in the TBox (i.e., the intensional component of the ontology). More in detail, we consider an extensively studied model of such artifact systems, called simple artifact system (SAS) by Calvanese et al. (2020), where the artifact working memory consists of a fixed set of artifact variables (Deutsch et al. 2009; Damaggio, Deutsch, and Vianu 2012; Calvanese et al. 2020). On top of this basis, we study the verification of safety properties in the case where the ontology is specified in (a slight extension of) RDFS (Brickley and Guha 2014), a schema/ontology language for the Semantic Web formalized by the W3C, and where the transitions that update the working memory are expressed over the ontology signature. For this setting, we provide an SMT-based backward reachability procedure to decide safety, showing that the problem is in PSPACE. To apply this machinery, the underlying DL must enjoy suitable model-theoretic properties. In particular, to this end, we prove for the first time that this DL admits a model completion (Chang and Keisler 1990). Showing the existence of model completions is not trivial: it requires a careful algebraic analysis of the class of all models. At the same time, we give indications on which DL constructs break our verification machinery: indeed, model The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23) completions may not exist in general, and we show that simple examples such as RL-ontologies do not admit a model completion. This gives also a technical justification for the choice of RDFS. Detailed proofs are provided in an extended version of this article (Calvanese et al. 2021). Related work. In spirit, our approach is reminiscent of previous works studying the verification of dynamic systems, like Golog programs, operating over a DL ontology, such as those by Claßen et al. (2014), and Zarrieß and Claßen (2016). In fact, both in their settings and ours, the dynamic system evolves each model of the ontology, and properties are verified over all the resulting evolutions. This is radically different from approaches where the ABox itself is evolved by the process, with an execution semantics following Levesque s functional approach, in which query entailment over the current state is used to compute the successor states (Bagheri Hariri et al. 2013b). However, we differ from the work by Claßen et al. (2014) and Zarrieß and Claßen (2016) in that our goal is not only to derive foundational results, but also to transfer them into practical algorithms and thus obtain a model that is readily implementable by relying on a state-of-the-art SMT-based model checker such as MCMT (Ghilardi and Ranise 2010). As customary for artifact systems, our approach is based on actions that manipulate the artifact variables, coupled with conditionaction rules that declaratively define which actions are currently executable, and with which parameters. Alternative choices could be seamlessly taken, by adapting approaches that rely on an explicit description of the control-flow, e.g., based on state machines (de Leoni, Felli, and Montali 2020) or Petri nets interpreted with interleaving semantics (Ghilardi et al. 2020). There exist very few approaches that have been proved to be successful for assessing parameterized safety of processes enriched with data capabilities, and SMT solvers are well-known to represent the state-of-the-art in this area (Calvanese et al. 2019a; Ghilardi et al. 2021; Calvanese et al. 2020): in this work we build in particular on top of the SMTbased verification framework from (Calvanese et al. 2020). A key element of novelty in our work is to lift, in conceptual and modelling terms, this SMT-based verification approach from to handle data stored in a DL ontology with incomplete information, as opposed to a standard relational database adopted in all the prior works on parameterized verification of artifact systems (Deutsch, Li, and Vianu 2019, 2016; Calvanese et al. 2020). It is well known that lifting results from settings with complete to ones with incomplete information is often a non-trivial task that requires novel insights, even more when dealing with data/knowledge dynamics and formal verification (where overall results are sparse and fragmented). This lies at the core of KR in AI and is a key contribution of our work. Moreover, to develop our novel results we need to extend the technical machinery from (Calvanese et al. 2020) in a non-trivial way. Indeed, the central technical result is then to show that model completion holds in this novel setting (Theorem 2), which in turn stands at the core of the verification machinery we employ. In this light, Theorem 2 is a crucial, non-trivial result, as proving the existence of a model completion calls every time for a sophisticated semantical analysis of FO models of the theory of interest, that is significantly different when a different theory is studied. Preliminaries In this section, we recall the syntax and semantics of firstorder logic (FO). We then define the syntax of the DL RDFS+, which is a slight extension of RDFS (Brickley and Guha 2014). Its semantics is given by the standard translation, mapping RDFS+ ontologies into equivalent sets of FO formulas. First-Order Logic Preliminaries The alphabet of first-order logic (FO) consists of: countably infinite and pairwise disjoint sets NP of predicate symbols (with ar(P) N being the arity of P NP), NF of function symbols (with ar(f) N being the arity of f NF), NI of individual symbols (or individual names), and Var of variables; the equality symbol = ; the Boolean operators and ; and the existential quantifier . The definitions of a (FO) term t (with t denoting a possibly empty tuple of terms), formula ϕ, atom, and literal are given as usual. Moreover, we adopt the standard abbreviations and conventions for the other Boolean operators and the universal quantifier . We write ϕ(x) to indicate that the free variables (defined as usual) of ϕ are included in x, and we write ϕ(a) for the formula obtained from ϕ(x) by substituting a to x. Similar notions are adopted for terms. A sentence is defined as a formula without free variables, while we call quantifier-free a formula without any existential or universal quantifiers. A formula is existential if it has the form xϕ(x), and universal if it has the form xϕ(x), where ϕ is quantifier-free. A (FO) theory T is a set of FO sentences, and T is said to be universal if every ϕ T is universal. A signature Σ is a subset of NP NF NI. For a set Γ of formulas (respectively, terms), the signature of Γ, denoted ΣΓ, is the set of predicate, function, and individual symbols occurring in Γ. Given Σ, we say that Γ is a set of Σ-formulas (resp., Σ-terms) if ΣΓ = Σ. Given a signature Σ, an (FO) Σ-interpretation is a pair I = ( I, I), where I is a non-empty set, called domain of I, and I is an interpretation function such that: P I ( I)ar(P ), for every P NP Σ; f I : ( I)ar(f) I, for every f NF Σ; and a I I, for every a NI Σ. For a set Γ of terms or formulas, when no confusion can arise, we often write interpretation in place of ΣΓ-interpretation . An assignment in I is a function a: Var I. We define the value of a Σ-term t in I under a as follows: a(t) = a(x), if t = x; a(t) = a I, if t = a NI Σ; and a(t) = f I(a(t)), if t = f(t), where f NF Σ and, for an m-tuple t = (t1, . . . , tm) of terms, we set a(t) = (a(t1), . . . , a(tm)). The notion of a formula ϕ being satisfied in an interpretation I under an assignment a, or of I being a model of ϕ under a, written I |=a ϕ, is inductively defined as: I |=a P(t) iff a(t) P I, I |=a s = t iff a(s) = a(t), I |=a ψ iff not I |=a ψ, I |=a ψ χ iff I |=a ψ and I |=a χ, I |=a xψ iff I |=a ψ, for some a that can differ from a only on x. For a formula ϕ(x), we write I |= ϕ[d] in place of I |=a ϕ(x), with a(x) = d. We say that a set Γ of formulas is satisfied in an interpretation I under an assignment a, or that I is a model of Γ under a, written I |=a Γ, if I |=a ϕ, for every ϕ Γ (we refer to a singleton Γ = {ϕ} simply as ϕ). For a sentence ϕ, the satisfaction of ϕ in I under a does not depend on a, thus we write I |= ϕ in place of I |=a ϕ, and we say that ϕ is satisfied in I. For a theory T, we say that T is satisfied in an interpretation I (or that I is a model of T), written I |= T, if every sentence of T is satisfied in I. A formula ϕ is satisfiable w.r.t. T (or T-satisfiable) if there exist an interpretation I and an assignment a in I such that I |= T and I |=a ϕ. Moreover, we say that T logically implies a formula ϕ, or that ϕ is a logical consequence of T, written T |= ϕ, if, for every interpretation I and every assignment a in I, I |= T implies that I |=a ϕ. Finally, formulas ϕ, ψ are equivalent w.r.t. T (or T-equivalent) if T |= ϕ ψ. Description Logics Preliminaries The DL we consider here is an extension of RDFS (Brickley and Guha 2014) with disjointness between concepts and roles, conjunction and (one-level) qualified existential quantification on the left-hand side of inclusions, and inclusion of direct and inverse roles. We denote this DL RDFS+. Formally, let NC, NR, and NI be countably infinite and pairwise disjoint sets of concept, role, and individual names, respectively, where concept names are 1-ary and role names are 2-ary predicate symbols, hence NC NR NP. In RDFS+, concepts C and roles R, respectively denoting 1ary and 2-ary predicates, are defined as follows: R ::= P | P , C ::= A1 An | R. | R.A, where P NR, n 1, and A, A1, . . . , An NC. A concept inclusion (CI) has the form C A or C A, and a role inclusion (RI) has the form R R or R R , where C is an RDFS+ concept, A NC, and R, R are roles. An RDFS+ TBox T is a finite set of CIs and RIs. An assertion has the form A(a), A(a), P(a, b), P(a, b), (a = b), or (a = b), where A NC, P NR, and a, b NI. An ABox A is a finite set of assertions. (We point out that in an ABox we allow for negated assertions, which is a feature that is not always supported in DLs.) An RDFS+ ontology O is a pair (T , A), where T is a TBox and A is an ABox. We observe that RDFS+ is incomparable in expressive power with the DLs of the popular DL-Lite family (Calvanese et al. 2007; Artale et al. 2009). Indeed, while DLLite allows for the use of existential quantification R. on the right-hand side of CIs, this is ruled out in RDFS+. On Acad Pos Job Pos Acad Pos Admin Pos Admin Pos Job Pos User Job Pos appl For. User appl For . Job Pos suit For. User suit For . Job Pos suit For. Good Eval Elig User User User Grad Elig User Elig User Grad Acad Pos(professor123), Acad Pos(researcher123), Admin Pos(secretary123), Admin Pos(secretary456). Figure 1: University personnel ontology for job hiring processes the other hand, in RDFS+ one can locally type the second component of a role through the use of qualified existential quantification R.A on the left-hand side of CIs, while this is not possible in DL-Lite. As we will see later, differently from DL-Lite, the FO translation of an RDFS+ ontology is a universal theory. Example To represent part of the domain knowledge on job hiring processes for university personnel, we define the RDFS+ ontology O = (T , A), where T and A contain the CIs and assertions shown in Figure 1. Moreover, we assume that A, which stores data on available job positions, contains all the assertions of the form A(u), P(u, a) and P(a, u), for a distinguished individual name u NI and every A, P, a ΣO, so that u can be used to represent an undefined value. The CIs of T formalise the following facts: there are both academic and administrative job positions and these are disjoint; users and job positions are disjoint; appl For relates users to job positions; to be suitable for something one has to be a user that is positively evaluated; the range of suit For is included in the extension of Job Pos; an eligible user is defined as a graduate user. We define now the standard translation from RDFS+ expressions to FO formulas, which maps concepts to FO formulas with one free variable, and roles to FO formulas with two free variables. Specifically, the translation T generates formulas that contain just two variables x, y Var: T(A1 An) = A1(x) An(x), T(P) = P(x, y), T(P ) = P(y, x), T( R. ) = y T(R), T( R.A) = y(T(R) A(y)), T( A) = T(A), T( R) = T(R), where A, A1, . . . , An are unary predicates and P is a binary predicate. Moreover, we map CIs and RIs into universal FO sentences in the following way: T(C D) = x(T(C) T(D)), T(R S) = x y(T(R) T(S)), where D stands for either A or A, and S stands for either R or R . We also set T(T ) = S β T {T(β)}. Assertions α are (identically) mapped into FO literals without free variables (i.e., ground), as T(α) = α, and we set T(A) = S α A{T(α)} = A. Finally, T(O) = T(T ) T(A). It is easy to see that the set of FO sentences obtained as the translation T(O) of an RDFS+ ontology O, can be equivalently rewritten into a universal Horn theory (Kontchakov and Zakharyaschev 2014; Hodges 1993). Such a theory, which we identify with T(O), can be obtained from T(O) by simply putting formulas into prenex normal form. The semantics for RDFS+ expressions can be given in terms of their FO translation (Kontchakov and Zakharyaschev 2014). For a concept C and an interpretation I = ( I, I), the extension of C in I is CI = {d I | I |= T(C)[d]}. Similarly, for a role R, its extension in I is RI = {(d, e) I I | I |= T(R)[d, e]}. We say that C and R are satisfied in I if CI = and RI = , respectively. Moreover, given a CI, RI, assertion, TBox, ABox, or ontology Γ, we say that Γ is satisfied in I (or that I is a model of Γ), written I |= Γ, if I |= T(Γ). Given an ontology O and (a concept, role, CI, RI, or assertion mapped, via its FO translation, into) an FO formula ϕ, we say that ϕ is satisfiable w.r.t. O (or O-satisfiable) if there exists a model I of O that satisfies ϕ under some assignment in I. Finally, we say that O logically implies an FO formula ϕ, or that ϕ is a logical consequence of O, written O |= ϕ, if, for every model I of O and every assignment a in I, I satisfies ϕ under a. Basic Model-Theoretic Properties In this section, we prove the model-theoretic properties that will be used later on to develop our verification machinery. Specifically, we show here that the standard translation of the RDFS+ ontologies introduced in the previous section admits model completion, and has the constraint satisfiability problem decidable. These properties will then allow us to verify suitably defined ontology-based processes by employing a variant of the SMT-based backward reachability procedure by Calvanese et al. (2020). For this, we require some preliminary notions. A formula that is a conjunction of Σ-literals is called a Σ-constraint. Given a Σ-theory T, we define the constraint satisfiability problem for T as follows: given a formula yϕ(x, y), where ϕ(x, y) is a Σ-constraint, decide whether yϕ(x, y) is satisfiable w.r.t. T. A theory T has quantifier elimination (QE) iff, for every ΣT -formula ϕ(x), there exists a quantifier-free formula ψ(x) such that T |= ϕ(x) ψ(x). Finally, we will use the following definition of model completion, which is restricted to cover the case of universal theories (the ones considered in this work) and that is nonetheless known to be equivalent (for universal theories) to the usual one from model theory (Chang and Keisler 1990; Ghilardi 2004). Let T be a universal Σ-theory and let T T be a further Σ-theory. We say that T is a model completion of T iff: (i) every Σ-constraint satisfied in a model of T is also satisfied in a model of T ; (ii) T has QE. A model completion T of a theory T, when it exists, is unique: this justifies the use of the notation T for denoting the unique model completion of T. We now state the main technical result of the section. Theorem 2 Given an RDFS+ ontology O, T(O) is a finite universal FO theory that (i) has a decidable constraint satisfiability problem, and (ii) admits a model completion. Proof (Sketch) For Point (i), we reduce to RDFS+ ontology satisfiability, which is decidable (since RDFS+ is a fragment of SROIQ (Baader et al. 2017)). For Point (ii), we require the following definitions. A theory T has the amalgamation property if, for every pair of embeddings µ1 : I0 I1, µ2 : I0 I2 between models I0 and I1, I2 of T, there exist a model I of T and embeddings ν1 : I1 I, ν2 : I2 I, such that ν1 µ1 = ν2 µ2 (we adopt here the usual notion of embedding (Chang and Keisler 1990)). The triple (I, ν1, ν2) (or, abusing notation, just I) is called a T-amalgam of I1, I2 over I0. One can assume w.l.o.g. that I0 is a substructure of I1 and I2 and that µ1 and µ2 are inclusion embeddings (mapping an element of I0 in the same element of I1 and I2, resp.). Since there is no function symbol in ΣT (O), it suffices to show that T(O) enjoys the amalgamation property (Lipparini 1982). For this purpose, for every pair of models I1, I2 of T(O) sharing a submodel I0, we define I = ( I, I) as follows: (i) I = I1 I2; (ii) for every individual symbol a ΣT (O), a I = a I1; (iii) for every (1or 2-ary) predicate symbol P ΣT (O), P I = P I1 P I2. Observe that a I = a I1 = a I0 = a I2. Moreover, if d P I, where P is n-ary, for n {1, 2}, then d ( I1)n and d P I1, or d ( I2)n and d P I2: this follows from the definition of P I := P I1 P I2. Clearly, given embeddings µ1 : I0 I1, µ2 : I0 I2, the (inclusion) embeddings i1 : I1 I, i2 : I2 I are such that i1 µ1 = i2 µ2. We show that I is a model of T(O). A formula ϕ of T(O) has one of the following forms: (1) x(A1(x) An(x) D(x)); (2) x y(R1(x, y) D(x)); (3) x y(R1(x, y) A(y) D(x)); (4) x y(R1(x, y) R2(x, y)); (5) x y(R1(x, y) R2(x, y)); where: Ak NC, for k {1, . . . , n}; D {B, B}, with B NC; Ri(x, y) = Pi(x, y), if Ri = Pi, and Ri(x, y) = Pi(y, x), if Ri = P i , with Pi NR and i {1, 2}. Reasoning by cases, one can show that for every j {1, . . . , 5} and every formula ϕ T(O) of the form (j), I is a model of ϕ. For example, we prove here Case (1). Given d I, suppose that I |= Ak[d], i.e., d AI k, for all k {1, . . . , n}. We have that d Ii and d AIi k , for i = 1 or i = 2, and thus Ii |= Ak[d]. Since Ii is a model of T(O), and hence of ϕ, we have Ii |= D[d]. Given that D(x) is a literal and Ii is embedded in I, we obtain that I |= D[d], and thus I |= ϕ. This completes the proof that T(O) has a model completion. Remark For every RDFS+ ontology O, the model completion T(O) of T(O) admits quantifier elimination. The algorithm for quantifier elimination in T(O) follows from the proof of Theorem 2: to eliminate x from a ΣT (O)-formula xϕ(x, y), we take the conjunction of the clauses χ(y) implied by ϕ(x, y), which are finitely many for T(O), up to T(O)-equivalence. This procedure is used in Algorithm 1 and is crucial to get the decidability results of Theorem 9. Properties (i) and (ii) of Theorem 2 are in line with the foundational framework by Calvanese et al. (2020), where a third property is additionally assumed: the finite model property for constraint satisfiability (see the references for the definition). However, an important difference with the work by Calvanese et al. (2020) is that this property here is not needed because we do not require finite structures (i.e., databases). Finally, we observe that ontologies in the DL RL (Kontchakov and Zakharyaschev 2014) do not enjoy Property (ii) of Theorem 2. To see this, consider the RL ontology O1 = (T1, ), with T1 = { P. P. }. It can be shown (see figure below) that T(O1) does not enjoy amalgamation, hence it does not admit a model completion (Chang and Keisler 1990). I0 I1 I2 e0 e1 e2 P P Ontology-Based Processes To study ontology-based processes under RDFS+ ontologies, we introduce now our model, called RDFS+-based processes, which are a variant of the artifact systems studied by Calvanese et al. (2019b). We also introduce the parameterised safety problems for our model, then studied in the following section. Ontology-based processes read data from a given RDFS+ ontology, used to store background information of the system, and manipulate individual variables, called artifact variables, which represent the current state of the process. To formalise such model, we first introduce case-defined functions. For an RDFS+ ontology O, an O-partition is a finite set P = {κ1(x), . . . , κn(x)} of ΣO-literals such that O |= x Wn i=1 κi(x) x V i =j (κi(x) κj(x)). Given an O-partition P = {κ1(x), . . . , κn(x)}, and ΣO-terms t(x) = (t1(x), . . . , tn(x)), (the value of) a case-defined function F on P and t, for a fresh function symbol F NF, is as follows: for every model I of O, every assignment a in I, and every tuple x of variables, a(F(x)) = a(ti(x)), if I |=a κi(x). We call a case-defined function trivial when i = 1, that is, when it has only one case In order to introduce verification problems in a symbolic setting, one first has to specify which formulas are used to represent (i) the sets of states, (ii) the system initialisations, and (iii) the system evolution. To capture these aspects, we provide the following definitions. An RDFS+-based process (RDFS+-BP) is a tuple S = (O, x, ι(x), Sm j=1{τj(x, x )}), where m N, and O = (T , A) is an RDFS+ ontology; x = (x1, . . . , xn) is a tuple of variables, called artifact variables, and x is a tuple of variables that are renamed copies of variables in x; ι(x) = Vn i=1 xi = ai, with ai NI, is an initial state formula; τj(x, x ) = y(γj(x, y) Vn i=1 x i = F j i (x, y)), for 1 j m, is a transition formula, where γj(x, y) is a conjunction of ΣO-literals, called guard of τj, and each x i = F j i (x, y), where F j i is a case-defined function on some O-partition and list of ΣO-terms, is called an update of τj. Notice that, when the case-defined function F j i appearing in an update is trivial, the corresponding update x i = F j i (x, y) stands for a value assignment of the variable x i to a single term. Example We develop a job hiring process for university personnel based on the domain knowledge from Example 1. Each application is created using a dedicated website portal, where potentially interested users need to register in advance. When a registered user decides to apply, the data created do not have to be stored persistently and thus can be maintained just by using artifact variables (described below) that can interact with the knowledge base. All these variables are initialised with an undefined value u. In the first transition, an application is created by a registered user, which falls into the extension of the concept User: the information about the user is then stored in the artifact variable xappl. At this point, the website asks the user whether they hold a university degree: if this is the case, the website accepts the user as eligible, their information is stored using xappl and the process can progress. Then, the user picks up a job position (assigned to xjob) and applies for it. The following steps of the process involve the evaluation of the application: for both academic and administrative positions, if the eligible candidate is suitable for the position, they are declared winner (assigned to xwinr), otherwise loser (assigned to xlosr). To formalise this process, we define the RDFS+-BP S = (O, x, ι(x), S7 j=1{τj(x, x )}) so that O is the RDFS+ ontology of Example 1, and x = (xappl, xjob, xelig, xwinr, xlosr), ι = V xi x xi = u, τ1 = y1(User(y1) x appl = y1), τ2 = Elig User(xapplicant) x elig = xappl, τ3 = z1(Job Pos(z1) appl For(xelig, z1) x job = z1); τ4 = Acad Pos(xjob) suit For(xelig, xjob) x winr = xelig, τ5 = Admin Pos(xjob) suit For(xelig, xjob) x winr = xelig, τ6 = Acad Pos(xjob) suit For(xelig, xjob) x losr = xelig, τ7 = Admin Pos(xjob) suit For(xelig, xjob) x losr = xelig. We define now parametric safety. Given an RDFS+ ontology O, we call state (ΣO-)formula a quantifier-free ΣOformula ϕ(x). A state formula constrains the content of the artifact variables characterising the states of the systems. Notice that a state formula can represent a (possibly infinite) set of states, because of the presence of (possibly infinitely many) different elements in a model of the ontology O. A safety formula ν(x) for S is a state ΣO-formula describing the undesired states of the system. We say that S is safe w.r.t. ν(x) if there is no k 0 and no formula ι(x0) τj0(x0, x1) τjk 1(xk 1, xk) ν(xk), ( ) that is satisfiable w.r.t. O, where 1 j0, . . . , jk 1 m and each xh, with 0 h k, is a tuple of variables that are renamed copies of variables in x. The safety problem for S is the following decision problem: given a safety formula ν(x) for S, decide whether S is safe w.r.t. ν(x). This verification problem is parametric on the models of a fixed RDFS+ ontology, since safety is assessed irrespectively of the choice of such a model. This Algorithm 1: SMT-based backward reachability procedure Function BReach(ν) 2 while φ B is T(O)-satisfiable do 3 if ι φ is T(O)-satisfiable then return (unsafe, unsafe trace of form ( )); 5 φ Pre(τ, φ); 6 φ QE(T(O) , φ); return safe; implies that, when the system is safe, it is so for every execution of the process under every model in principle, infinitely many of the given ontology. Example Referring to Example 4, an undesired situation is the one where an applicant registered user is declared winner even if they were not eligible. This situation is formally described by the safety formula: ν = User(xwinr) Elig User(xwinr). Verifying Safety Properties for RDFS+-BPs We study now the parameterised safety problems for RDFS+-BPs by adopting a symbolic version (Calvanese et al. 2020) of the well-known backward reachability procedure (Abdulla et al. 1996). The requirements that are needed for employing our machinery are (cf. Theorem 2): (i) the existence of the model completion T(O) for T(O) (with an available quantifier elimination procedure in T(O) ), and (ii) the decidability of the constraint satisfiability problem for the ontology O. Let us consider the safety problem for an RDFS+-BP S. First of all, we need to preprocess S in order to eliminate all the occurrences of case-defined functions. This can be done in polynomial time. Lemma 6 The safety problem for an RDFS+-BP S can be reduced to the safety problem for an RDFS+-BP S (the size of which is polynomial in the size of S) with only trivial casedefined functions. Indeed, similarly to what shown in (Calvanese et al. 2020), the previous lemma shows that also in this case casedefined functions can be eliminated w.l.o.g.From now on, we assume that RDFS+-BPs S is without any case-defined function. We are ready to describe the main procedure for detecting safety of RDFS+-BPs: this procedure will handle only the pre-processed RDFS+-BPs not containing casedefined functions The SMT-based backward reachability procedure (or backward search) for handling the safety problem for an RDFS+-BP S is shown in Algorithm 1. An integral part of the algorithm is to compute symbolic preimages (Line 5). The intuition behind the algorithm is to execute a loop where, starting from the undesired states of the system (described by the safety formula ν(x)), the state space of the system is explored backward: in every iteration of the while loop (Line 2), the current set of states is regressed through transitions thanks to the preimage computation. For that purpose, for any τ(z, z ) and φ(z) (where z are renamed copies of z), we define τ := Wm h=1 τh and Pre(τ, φ) as the formula z (τ(z, z ) φ(z )). Let φ(x) be a state formula, describing the state of the artifact variables x. The preimage of the set of states described by the formula φ(x) is the set of states described by Pre(τ, φ) (notice that, when τ = W ˆτ, then Pre(τ, φ) = W Pre(ˆτ, φ)). We recall that a state formula is a quantifier-free ΣO-formula. Unfortunately, because of the presence of the existentially quantified variables y in τ, Pre(τ, φ) is not a state formula, in general. As stated by Calvanese et al. (2020), if the quantified variables were not eliminated, we would break the regressability of the procedure: indeed, the states reached by computing preimages, intuitively described by Pre(τ, φ), need to be represented by a state formula φ in the new iteration of the while loop. In addition, the increase in the number of variables due to the iteration of the preimage computation would affect the performance of the satisfiability tests described below, in case the loop is executed many times. In order to solve these issues, it is essential to introduce the subprocedure QE(T(O) , φ) in Line 6. QE(T(O) , φ) implements the quantifier elimination algorithm of T(O) and that converts the preimage Pre(τ, φ) of a state formula φ into a state formula (equivalent to it modulo the axioms of T(O) ), so as to guarantee the regressability of the procedure: this conversion is possible since T(O) eliminates from τh the existentially quantified variables y. Backward search computes iterated preimages of the safety formula ν, until a fixpoint is reached (in that case, S is safe w.r.t. ν) or until a set intersecting the initial states (i.e., satisfying ι) is found (in that case, S is unsafe w.r.t. ν). Inclusion (Line 2) and disjointness (Line 3) tests can be discharged via proof obligations to be handled by SMT solvers. The fixpoint is reached when the test in Line 2 returns unsat: the preimage of the set of the current states is included in the set of states reached by the backward search so far (represented as the iterated application of preimages to the safety formula ν). The test at Line 3 is satisfiable when the states visited so far by the backward search include a possible initial state (i.e., a state satisfying ι). If this is the case, then S is unsafe w.r.t. ν. Together with the unsafe outcome, the algorithm returns an unsafe trace of the form ( ), explicitly witnessing the sequence of transitions τh that, starting from the initial configurations, lead the system to a set of states satisfying the undesired conditions described by ν(x). Theorem 7 Backward search (Algorithm 1) is correct for detecting whether an RDFS+-BP S is safe w.r.t. ν(x). Proof (Sketch) We first require the following claim, which comes immediately from the definitions. Claim 8 For every safety formula ν(x) for S and every k 0, a formula ϑ of the form ( ) is satisfiable w.r.t. O iff ϑ is satisfiable w.r.t. T(O). We then need to show that, instead of satisfiability of formulas of the form ( ) in models of T(O), we can concentrate on satisfiability w.r.t. T(O) (which exists thanks to Property (ii) of Theorem 2). Then, by exploiting the algorithm for quantifier elimination in T(O) described in Remark 3, formulas of the form ( ) can be represented via backward search by using quantifier-free formulas. We conclude by noticing that safety/unsafety of S w.r.t. ν(x) can be detected invoking the satisfiability tests on those quantifier-free formulas: these tests are effective thanks to Property (i) of Theorem 2, because the decidability of the constraint satisfiability problem implies the decidability of the satisfiability of arbitrary quantifier-free formulae. Backward search for generic artifact systems is not guaranteed to terminate (Calvanese et al. 2020). However, in case S is unsafe w.r.t. ν(x), an unsafe trace which is finite is found after finitely many iterations of the while loop: hence, in the unsafe case, backward search must terminate. Together with the theorem above, this means that the backward reachability procedure is at least a semi-decision procedure for detecting unsafety of RDFS+-BPs. Nevertheless, we show in the following theorem that, in case of RDFS+-BPs, backward search always terminates: thus, it is a full decision procedure, for which we also provide a PSPACE upper bound. Theorem 9 The safety problem for RDFS+-BPs S = (O, x, ι(x), Sm j=1{τj(x, x )}) is decidable in PSPACE in the combined size of x, ι, and Sm j=1 τj. Proof There are only finitely many quantifier-free ΣT (O)- formulas, up to T(O)-equivalence, that could be built out of a finite set of variables x: this holds for every RDFS+ ontology O. Thanks to the quantifier elimination procedure in Line 6, the overall number of variables in φ is never increased: notice that, without quantifier elimination, computing preimages Pre(τ, φj) would introduce in φj+1 new quantified variables, because of the presence of existentially quantified variables y in τ. This implies that globally there are only finitely many quantifier-free ΣT (O)-formulas that Algorithm 1 needs to analyse. Hence, Algorithm 1 must terminate: by construction of B, the unsatisfiability test of Line 2 must eventually succeed, if the unsatisfiability test of Line 3 never does so. Concerning complexity, we need to modify Algorithm 1. We first notice that, thanks to Lemma 6, the preprocessing that converts an RDFS+-BP with occurrences of case-defined functions into its equivalent RDFS+-BP without any occurrence of case-defined functions does not increase the overall complexity of the problem. Moreover, the translation of an RDFS+-ontology O into T(O) requires polynomial time. Since T(O) is universal and without function symbols, also the satisfiability tests in Lines 2-3 can be performed in polynomial time (cf. (Calvanese et al. 2020), Proposition 3.1). Then, we adopt a nondeterministic procedure, analogous to the one by Calvanese et al. (2019b), Theorem 6.1, that makes the complexity NPSPACE: the main difference from (Calvanese et al. 2019b), Theorem 6.1, is that in our signatures, instead of unary functions, we have unary and binary relational symbols, but the argument works similarly. By Savitch s Theorem (PSPACE = NPSPACE), we conclude the proof. This upper bound is tight. In fact, propositional STRIPS planning (Bylander 1994), a well-known PSPACE-hard problem, can be directly encoded in our setting (without making use of an ontology, and only employing a propositional working memory). We observe that Algorithm 1 is not yet implemented in the state-of-the-art model checker MCMT (Ghilardi and Ranise 2010), which is based on SMT solving. Such an implementation, however, can be obtained by extending MCMT with the quantifier elimination algorithm for T(O) (cf. Remark 3), required in Line 6, together with a procedure for RDFS+ ontology satisfiability, required in Lines 2 3. Conclusions We have studied the problem of verification of data-aware processes under RDFS+ ontologies, where the process component can interact with a knowledge base specified by means of the DL RDFS+, underpinning the RDFS constructs. We addressed this problem by introducing a suitable model of ontology-based processes, called RDFS+-BPs, and by leveraging the SMT-based version of the backward reachability procedure, which is a well-known technique to employ for verifying systems of this kind. Specifically, we have shown that this procedure is a full decision procedure for detecting safety of RDFS+-BPs, and we also provided a PSPACE complexity upper bound. This work opens several directions for future work. First, we notice that the choice of RDFS+ ontologies is not intrinsic to our approach. Indeed, motivated by conceptual modelling and data integration issues in Ontology-Based Data Access (Xiao et al. 2018) applications, we are currently working on the DL-Lite family of DLs, to define suitable DLLite-based processes with analogous decidability and complexity results. The main difference we have to account for is that, for a DL-Lite ontology O, we have an equisatisfiable (but not equivalent) translation into a universal one-variable FO sentence T(O), and Claim 8 in the proof of Theorem 7 has to be modified to show that a trace ϑ is satisfiable w.r.t. O iff a suitably translated trace ˆϑ is satisfiable w.r.t. T(O). In general, nonetheless, we point out that any DL satisfying the two conditions stated in Theorem 2 can be chosen for our purposes, and that the same theoretical guarantees can be obtained over the SMT-based backward reachability procedure. As future work, we thus intend to introduce a more general framework for DL-based processes that is able to account for different DLs. We also intend to extend the results obtained here to more sophisticated artifact-centric models, such as the relational artifact systems (RASs) studied by Calvanese et al. (2020). Moreover, it is worth investigating in this setting also properties that go beyond safety, such as liveness. In this respect, we intend to exploit the framework proposed in (Geatti, Gianola, and Gigante 2022) for solving satisfiability of Linear Temporal Logic Modulo Theories over Finite Traces (LTLf MT), which is a first-order extension of LTLf, so as to symbolically represent DL-based processes and express temporal properties over them. This approach is particularly promising because it relies on the use of the efficient BLACK solver (Geatti, Gigante, and Montanari 2019, 2021), which leverages SMT solvers as backend tools. Acknowledgments This research has been partially supported by the Italian Ministry of University and Research (MUR) under PRIN project PINPOINT Prot. 2020FNEB27, and by the Free University of Bozen-Bolzano with the ADAPTERS project. Abdulla, P. A.; Cerans, K.; Jonsson, B.; and Tsay, Y.-K. 1996. General Decidability Theorems for Infinite-State Systems. In Proc. of LICS, 313 321. Artale, A.; Calvanese, D.; Kontchakov, R.; and Zakharyaschev, M. 2009. The DL-Lite Family and Relations. J. of Artificial Intelligence Research, 36: 1 69. Baader, F.; Calvanese, D.; Mc Guinness, D. L.; Nardi, D.; and Patel-Schneider, P. F., eds. 2003. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge Univ. Press. ISBN 0-521-78176-0. Baader, F.; Horrocks, I.; Lutz, C.; and Sattler, U. 2017. An Introduction to Description Logic. Cambridge Univ. Press. ISBN 978-0-521-69542-8. Bagheri Hariri, B.; Calvanese, D.; De Giacomo, G.; Deutsch, A.; and Montali, M. 2013a. Verification of Relational Data-Centric Dynamic Systems with External Services. In Proc. of PODS, 163 174. Bagheri Hariri, B.; Calvanese, D.; Montali, M.; De Giacomo, G.; De Masellis, R.; and Felli, P. 2013b. Description Logic Knowledge and Action Bases. J. of Artificial Intelligence Research, 46: 651 686. Brickley, D.; and Guha, R. 2014. RDF Schema 1.1, W3C Recommendation, World Wide Web Consortium. https: //www.w3.org/TR/rdf-schema/. Accessed: 2022-08-01. Bylander, T. 1994. The Computational Complexity of Propositional STRIPS Planning. Artif. Intell., 69(1-2): 165 204. Calvanese, D.; De Giacomo, G.; Lembo, D.; Lenzerini, M.; and Rosati, R. 2007. Tractable Reasoning and Efficient Query Answering in Description Logics: The DL-Lite Family. J. Automat. Reason., 39(3): 385 429. Calvanese, D.; De Giacomo, G.; and Montali, M. 2013. Foundations of Data Aware Process Analysis: A Database Theory Perspective. In Proc. of PODS, 1 12. Calvanese, D.; Ghilardi, S.; Gianola, A.; Montali, M.; and Rivkin, A. 2019a. Formal Modeling and SMT-Based Parameterized Verification of Data-Aware BPMN. In Proc. of BPM, volume 11675 of LNCS, 157 175. Springer. Calvanese, D.; Ghilardi, S.; Gianola, A.; Montali, M.; and Rivkin, A. 2019b. From Model Completeness to Verification of Data Aware Processes. In Description Logic, Theory Combination, and All That, volume 11560 of LNCS, 212 239. Springer. Calvanese, D.; Ghilardi, S.; Gianola, A.; Montali, M.; and Rivkin, A. 2020. SMT-based Verification of Data-Aware Processes: A Model-Theoretic Approach. Math. Structures Comput. Sci., 30(3): 271 313. Calvanese, D.; Gianola, A.; Mazzullo, A.; and Montali, M. 2021. SMT-Based Safety Verification of Data-Aware Processes under Ontologies (Extended Version). Co RR, abs/2108.12330. Chang, C. C.; and Keisler, H. J. 1990. Model Theory. North Holland Publ. Co. Claßen, J.; Liebenberg, M.; Lakemeyer, G.; and Zarrieß, B. 2014. Exploring the Boundaries of Decidable Verification of Non-Terminating Golog Programs. In Proc. of AAAI, 1012 1019. Damaggio, E.; Deutsch, A.; and Vianu, V. 2012. Artifact Systems with Data Dependencies and Arithmetic. ACM Trans. Database Syst., 37(3): 22. de Leoni, M.; Felli, P.; and Montali, M. 2020. Strategy Synthesis for Data-Aware Dynamic Systems with Multiple Actors. In Proc. of KR, 315 325. Deutsch, A.; Hull, R.; Patrizi, F.; and Vianu, V. 2009. Automatic Verification of Data-Centric Business Processes. In Proc. of ICDT, 252 267. Deutsch, A.; Hull, R.; and Vianu, V. 2014. Automatic Verification of Database-Centric Systems. SIGMOD Record, 43(3): 5 17. Deutsch, A.; Li, Y.; and Vianu, V. 2016. Verification of Hierarchical Artifact Systems. In Proceedings of PODS 2016, 179 194. ACM. Deutsch, A.; Li, Y.; and Vianu, V. 2019. Verification of Hierarchical Artifact Systems. ACM Trans. Database Syst., 44(3): 12:1 12:68. Geatti, L.; Gianola, A.; and Gigante, N. 2022. Linear Temporal Logic Modulo Theories over Finite Traces. In Proc. of IJCAI 2022, 2641 2647. ijcai.org. Geatti, L.; Gigante, N.; and Montanari, A. 2019. A SATBased Encoding of the One-Pass and Tree-Shaped Tableau System for LTL. In Proc. of TABLEAUX 2019, volume 11714 of LNCS, 3 20. Springer. Geatti, L.; Gigante, N.; and Montanari, A. 2021. BLACK: A Fast, Flexible and Reliable LTL Satisfiability Checker. In Proc. of OVERLAY 2021, co-located with Gand ALF 2021, volume 2987 of CEUR Workshop Proceedings, 7 12. CEUR-WS.org. Ghilardi, S. 2004. Model-Theoretic Methods in Combined Constraint Satisfiability. J. Automat. Reason., 33(3 4): 221 249. Ghilardi, S.; Gianola, A.; Montali, M.; and Rivkin, A. 2020. Petri Nets with Parameterised Data - Modelling and Verification. In Proc. of BPM, volume 12168 of LNCS, 55 74. Springer. Ghilardi, S.; Gianola, A.; Montali, M.; and Rivkin, A. 2021. Delta-BPMN: A Concrete Language and Verifier for Data Aware BPMN. In Proceedings of BPM 2021, volume 12875 of LNCS, 179 196. Springer. Ghilardi, S.; and Ranise, S. 2010. MCMT: A Model Checker Modulo Theories. In Proc. of IJCAR, volume 6173 of LNCS, 22 29. Springer. Hodges, W. 1993. Model Theory. Cambridge Univ. Press. Hull, R. 2008. Artifact-Centric Business Process Models: Brief Survey of Research Results and Challenges. In Proc. of ODBASE, volume 5332 of LNCS, 1152 1163. Springer. Kontchakov, R.; and Zakharyaschev, M. 2014. An Introduction to Description Logics and Query Rewriting. In Reasoning Web, volume 8714 of LNCS, 195 244. Springer. Lipparini, P. 1982. Locally Finite Theories with Model Companion. In Atti Accad. Naz. Lincei, volume 72. Reichert, M. 2012. Process and Data: Two Sides of the Same Coin? In Proc. of OTM, volume 7565 of LNCS, 2 19. Springer. Vianu, V. 2009. Automatic Verification of Database-Driven Systems: a New Frontier. In Proc. of ICDT, 1 13. Xiao, G.; Calvanese, D.; Kontchakov, R.; Lembo, D.; Poggi, A.; Rosati, R.; and Zakharyaschev, M. 2018. Ontology Based Data Access: A Survey. In Lang, J., ed., Proc. of IJCAI 2018, 5511 5519. ijcai.org. Zarrieß, B.; and Claßen, J. 2016. Decidable Verification of Golog Programs over Non-Local Effect Actions. In Proc. of AAAI, 1109 1115.