# algebra_of_modular_systems_containment_and_equivalence__2527b133.pdf

Algebra of Modular Systems: Containment and Equivalence

Andrei Bulatov, Eugenia Ternovska Simon Fraser University abulatov@sfu.ca, ter@sfu.ca

The Algebra of Modular System is a KR formalism that allows for combinations of modules written in multiple languages. Informally, a module represents a piece of knowledge. It can be given by a knowledge base, be an agent, an ASP, ILP, CP program, etc. Formally, a module is a class of structures over the same vocabulary. Modules are combined declaratively, using, essentially, operations of Codd s relational algebra. In this paper, we address the problem of checking modular system containment, which we relate to a homomorphism problem. We prove that, for a large class of modular systems, the containment problem (and thus equivalence) is in the complexity class NP, and therefore is solvable by, e.g., SAT solvers. We discuss conditions under which the problem is polynomial time solvable.

Introduction Programming from reusable components is one of the main attributes of good software engineering practice. Such a practice reduces the effort, minimizes bugs, saves the cost and time for more high-level development tasks. The theory of combining conventional imperative programs and digital circuits is relatively well developed. However, in knowledge-intensive computing, characterized by using socalled declarative programming, research on combining heterogeneous components is not as advanced. It would be very desirable to be able to write new programs by taking, e.g., a program written in Answer Set Programming (ASP), combining it with a speciﬁcation of a Constraint Satisfaction Problem (CSP), and then also with an Integer Linear Program (ILP), in a compound speciﬁcation for solving a more complex task. Moreover, the components should be substitutable, so that parts can be replaced with an alternative design without breaking the intended functionality. A central algorithmic problem in the modular setting is that of modular system equivalence. Being able to solve the equivalence problem is usually the ﬁrst step toward developing optimization techniques. Moreover, this task is crucial, for example, in system development and rapid prototyping. A related notion is that of modular systems containment. It asks whether one modular system is contained in the other.

Copyright c 2021, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

For example, the ﬁrst expression may represent the system and the second one the property to be satisﬁed. Containment then implies that the design is good, that is, it satisﬁes the speciﬁcation. Moreover, containment property is in the core of the notions of abstraction-reﬁnement and tighteningrelaxation often used in reasoning about computational processes and operations research. Algorithms for containment also solve equivalence, i.e., containment in both directions. While several declarative formalisms provide their own solutions for combining homogeneous modules, our goal is distinctly different. We need combinations of components speciﬁed in different languages (even legacy languages), and those that rely on different solving technologies. For instance, consider a company that provides logistics services. It decides how to pack goods and deliver them. The system has to solve two NP-complete tasks interactively Multiple Knapsack and Travelling Salesman Problem. It takes items from customers to deliver, and considers their profits, weights, and the capacity of trucks available. It has to decide how to pack the items in the trucks, and for each truck, to solve a TSP problem. To save time and other resources, the company utilizes reusable components, that we call concrete modules. The Knapsack part can be solved, by, e.g., a concrete module in Integer Linear Programming (ILP), and the TSP part by a concrete module in Answer Set Programming (ASP). The system is speciﬁed by an algebraic expression that combines these and possibly other components. The expression can be optimized using modular system equivalence and veriﬁed against a speciﬁcation of a desired behaviour. Clearly, our goal of utilizing modules as is implies the need for a semantic, languageindependent approach. Moreover, a natural compositionality principle should hold: a combination of modules is, again, a module that can be further reused. Early work on combining heterogeneous modules, with a range of operators, is (J arvisalo et al. 2009) and (Tasharroﬁ and Ternovska 2011). The latter paper proposed a semantic approach to combining modules, where modules correspond to classes of structures. However, when a module is a class of structures over a speciﬁc vocabulary, what is the notion of substitutability of a module by a new one, that uses a different but compatible vocabulary? The authors have not formalized this notion. Thus, that early work, while making an important transition to considering classes of structures,

The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

does not fully support programming from reusable components. Moreover, it was not clear to us how to even approach the problem of modular system containment in that formalism. The algebra of (Ternovska 2019) is similar to ours, but it also does not differentiate between abstract and concrete modules. The main focus of that work is on a transition from a static algebra to a dynamic one, and to a related modal dynamic logic. Other heterogeneous formalisms have been proposed, but they either use embeddings of one speciﬁc logic into another, e.g., (Sebastiani 2007; de Bruijn et al. 2007; Eiter et al. 2008), or limit to conjunctions of modules (Lierler and Truszczynski 2014). The inspiration of our Algebra of Modular Systems comes from Database theory. In 1970 Edgar (Ted) F. Codd introduced a relational data model and two query languages: relational algebra and relational calculus (Codd 1970). Relational algebra contains ﬁve basic operations on relational tables (sets of tuples). Relational calculus, a ﬁrst-order logic counterpart, was proven to be essentially equivalent to the algebra (Codd 1972). The formalism is in the foundation of thousands of relational database management systems (RDBMS), a software industry generating tens of billions of dollars annually. While database queries, expressed using Codd s relational algebra, can be viewed as relations deﬁnable with respect to a structure (a database), declarative problem speciﬁcations can be understood as axiomatizations of classes of structures that constrain allowable solutions by some rules. The two notions (in italic) are deﬁned in two consecutive chapters in the classic Enderton s textbook on mathematical logic (Enderton 1972). The main idea of the Algebra of Modular Systems is to lift Codd s algebra from operations on relational tables to operations on classes of structures. This approach inherits compositionality of classical logic. However, by itself, the notion of a module as a class of structures, as in the previous work, is not sufﬁcient to formalize substitutability of various modules into a modular system, as required by a good software engineering practice.

Contributions Unlike the previous work, we introduce a version of the Algebra of Modular Systems with variables that are distinct from constant symbols of a relational vocabulary. Applying compound modules then means binding the variables with relational symbols of the (reusable) components. Such binding (or, equivalently, substitutions of constant relational symbols for relational variables with matching arities) is performed in accordance with what combinations of modules are needed. Moreover, unlike Codd s algebra, unconstrained variables are implicitly cylindriﬁed, i.e., interpreted arbitrarily.1

We introduce a novel distinction between abstract and concrete modules. Formally, a (concrete) module is a class of structures, however, such understanding is too restrictive in the context of reusable components. Therefore we introduce abstract modules that can be instantiated by a concrete

1The term comes from Tarski s Cylindric Algebras. These algebras were introduced by Tarski and others as a tool in the algebraization of the ﬁrst-order predicate calculus. See (Van den Bussche 2001) for a historic context in applications to Database theory.

one by ﬁxing a ﬁrst-order vocabulary. A concrete module can be given by a knowledge base in some logic, be a speciﬁcation of a robotic agent, be, e.g., an ASP, CSP, ILP, CP program, etc. It can even be a human making decisions. In practice, any decision procedure, of arbitrary complexity, could be used. In contract, abstract modules are used in compound speciﬁcations. Such an algebraic speciﬁcation can be applied to any matching concrete modules (i.e., equivalently, boolean queries, classes of structures, decision procedures). We introduce a general notion of representability of a module in a formalism, and apply it to some examples of ASP and CSP concrete modules. Our main technical contribution is a study of the formal problem of Modular System Containment. In this problem, we are given two expressions, and ask whether the structures that satisfy the ﬁrst one, also satisfy the second one. A crucial related notion is that of homomorphism. Our study of the Containment problem is inspired by the results of (Chandra and Merlin 1977) on a similar containment problem for database queries. In particular, we show that Modular System Equivalence problem (and thus, the Containment problem) is undecidable in general. Then, similar to (Chandra and Merlin 1977), we consider a restricted class of modular systems, namely Conjunctive Compound Modules (CCM). We prove that CCMs exhibit a similar connection to homomorphisms between relational structures in ﬁnite model theory. This implies that for CCMs the modular system containment is in the complexity class NP. This kind of modular systems are analogous to conjunctive queries in database theory that constitute by far the largest class of practical queries in databases. The hope is that CCMs will represent an equally common pattern among modular systems. Unlike database queries, CCMs exhibit a signiﬁcantly more general level of abstraction. Being abstract compound modules, they are even more general than classes of structures. Finally, we discuss conditions under which the CCM Containment problem becomes polynomial time solvable.

Algebra: Syntax and Semantics

We need a few preliminary notions. A ﬁrst-order vocabulary (denoted, e.g. τ,ε,ω) is a ﬁnite sequence of non-logical (predicate and function) symbols, each with an associated arity. A τ-structure, e.g. A = (A;SA 1 ,...,SA n , f A 1 ,..., f A m ,c A 1 ,...,c A l ) is a domain A together with interpretations of predicate symbols, function and constants (0-ary functions) in τ. To simplify presentation, we view functions as particular kinds of relations and consider relational structures only. We use B|σ to mean structure B restricted to vocabulary σ. Symbol := means denotes or is by deﬁnition .

Syntax of the Algebra. An atomic module symbol (or simply an atom) is an expression M(X1,...,Xk), where the Xi s are called relational variables. Each Xi has an associated arity ai. The set {X1,...,Xk} is called the vocabulary of M and is denoted vocab(M). The vocabulary of M, vocab(M), which consists of relational variables, is not to be confused with a ﬁrst-order vocabulary deﬁned above for structures, e.g., τ, which consists of predicate symbols. In the former,

we have variables, in the latter, we have constants. For a ﬁxed (ﬁrst order) vocabulary τ an expression M(Si1,...,Sik), where {Si1,...,Sik} τ is called a concrete module or instantiation of M in τ, provided the arity of each Sij equals aj. We write X and S to denote tuples X1,...,Xk and Si1,...,Sik , respectively. A basic compound module is built by the grammar (we use the usual rules for such expressions)

α ::= | Mi | (α α) | (α α) | (α α) | πν(α) | σΘ(α). (1) Symbol represents a tautological module, and Mi are atomic module symbols. If α is a compound module constructed this way, then vocab(α) is the union of vocab(Mi) for all atomic modules involved in α. The operations (except ) are like in Codd s relational algebra, but are of a higher order, and are deﬁned on classes of structures rather than on relational tables. In particular, projection is onto relational variables of vocab(α) rather than onto object variables. The three set-theoretic operations are union ( ), intersection ( ), set difference ( ). Projection (πν(α)) is a family of unary operations, one for each ν, where ν vocab(α). The condition Θ in selection σΘ(E) is an expression that is built up using , , , from equivalence operators , , applied to variables in α. Selection can be used, in particular, to connect modules by equating relational symbols of equal arity. Notice that the framework works for inﬁnite structures.

Semantics of Atomic Modules. The semantics of an atom M(X1,...,Xk) is given in two steps. First, for a FO (ﬁrstorder) vocabulary τ = {S1,S2,...} we instantiate vocabulary symbols X1,...,Xk as predicate symbols Si1,...,Sik from τ so that the arities of Xj and Sij match. The semantics of the resulting concrete module M(Si1,...,Sik) indicates whether M(S) is true (notation B |= M(S)) or false on τstructure B (notation B |= M(S)). For any two τ-structures B1, B2 which coincide on {Si1,...,Sik}, we have B1 |= M(S) iff B2 |= M(S). Thus, for the vocabulary τ and an instantiation Si1,...,Sik the concrete module M(Si1,...,Sik) deﬁnes a class of structures with vocabulary τ. As a simple example consider the module M3Col that (intuitively) decides whether or not a given graph, that is, a set of nodes and a set of edges is 3-colourable. Such a module has a very natural semantics on the class of graphs, that is, τstructures with τ = {E} where E is a binary predicate symbol: For a graph G = (V;EG) it holds G |= M3Col(E) if and only if G is 3-colourable. However, the freedom of interpretations of vocabulary variables allows for more ﬂexibility. Let τ = {B,R}, where both B and R are binary symbols. In other words τ is the vocabulary of bigraphs, structures with edges of two types, say, blue and red. Then for a bigraph G = (V;BG,RG) the expression G |= M3Col(B) that the blueedged part of G is 3-colourable, while G |= M3Col(R) means that the red-edged part of G is 3-colourable. Modules can also be thought of in a more straightforward way. In this way the variables of a module M(X1,...,Xk) are instantiated as predicates over some domain and the module evaluates them. To deﬁne what it means that a τstructure A satisﬁes module M, we need to ﬁrst select pred-

icate symbols Si1,...,Sik from τ, whose arities match those of X1,...,Xk, and then apply the module to SA i1 ,...,SA ik , as we would apply a decision procedure. Semantics of Five Basic Operations. Satisfaction relation |= for compound algebraic expressions is built inductively. 1. Product (α1(X) α2(Y)).2 For a vocabulary τ, instantiations R and S of X and Y, respectively, and a structure B with vocabulary τ, we have B |= α1(R) α2(S) iff both B |= α1(R) and B |= α2(S). 2. Union (α1(X) α2(Y)). For a vocabulary τ, instantiations R and S of X and Y, respectively, and a structure B with vocabulary τ, B |= α1(R) α2(S) iff at least one of B |= α1(R) and B |= α2(S) holds.3

3. Set Difference (α1(X) α2(Y)). For a vocabulary τ, instantiations R and S of X and Y, respectively, and a structure B with vocabulary τ, we have B |= α1(R) α2(S) iff B |= α1(R) and B |= α2(S). Thus, α1 α2 represents a class of structures over the joint vocabulary that are in α1(R), but not in α2(S). 4. Projection (πν(α(X1,...,Xk))), ν {X1,...,Xk}. For a vocabulary τ, instantiation Si of Xi ν, and a structure B with vocabulary τ, the semantics is given by: B |= πν(α)(Si : Xi ν) if there are instantiations Si in τ of Xi {X1,...,Xk} ν, and a τ-structure B such that B |= α(S) and B and B agree on Si, Xi ν. The operation restricts the structures of α to ν vocab(α) leaving the instantiations of other symbols open. Thus, it is more relaxed than the original expression and increases the number of models. 5. Selection is a family of unary operations of the form σΘ(X)(α(X)), where Θ is a condition that can be applied as a test to each structure from instantiations of M. The semantics is given by B |= σΘ(S)(α(S)) iff B |= α(S) and B |=FO Θ(S), where |=FO is the satisfaction relation in the sense of classical ﬁrst-order logic. Notice that as selection imposes extra restrictions on a potential model, it decreases the number of models.

Example 1 Let MHC(X,Y) and M2Col(X,Z,T) be modules computing a Hamiltonian Circuit, and a 2-colouring. In other words, X,Y are variables of arity 2, and the ﬁrst module decides if Y forms a Hamiltonian Circuit (represented as a set of edges) in the graph given by edge set X. Variable X of the second module has arity 2, and variables Z,T are unary; the module decides if unary relations Z,T specify a proper 2-colouring of the graph with edge set X. The following algebraic expression determines a combination of 2-Colouring and Hamiltonian Circuit, that is whether or not there is a 2-colourable Hamiltonian Circuit.

M2Col-HC(X,Z,T) := πX,Z,T(MHC(X,Y) M2Col(Y,Z,T)). (2)

2Note that variables X and Y do not have to be all different. In this case equal variables must have equal instantiations. 3Note that in relational algebra, both arguments to the union and the difference must be relations of the same arity. Here, tuples S, R can be of different length because instantiations are over τstructures, where τ includes the vocabularies of all modules.

Projection hides the instantiation of Y in MHC, since it is the same as Y s in M2Col.

Example 2 The following algebraic expression speciﬁes a new module that veriﬁes if a given structure with a certain binary relation is a graph that is a cycle MCycle(X) := πX(σ(X Y)(MHC(X,Y)). An alternative way to express the same module is MCycle(X) := MHC(X,X).

Representing Modules in CSP and ASP We illustrate how abstract modules can be concretely represented in two broadly used formalisms, Answer Set Programming (ASP) and Constraint Satisfaction Problem (CSP). We use well-known combinatorial problems as examples.4 In computer science, combinatorial decision problems are encoded as sets of binary strings. In ﬁnite model theory (Libkin 2004), they correspond to classes of structures. E.g., 3-Colouring corresponds to all graphs that are 3-colourable. We say module M represents a problem P if each structure in M consists of an instance and a certiﬁcate of P, e.g., a graph and its colouring.

Representing Modules in CSP CSP in the traditional AI form:

Instance: (V,D,C) where V is a ﬁnite set of variables, D is a set of values (also called domain), C is a ﬁnite set of constraints {C1,...,Cn}. Each constraint is a pair (xi,Ti), where xi is the scope, a list of variables of length mi, and Ti is a mi-ary relation over D. Question: Is there f : V D such that f(xi) Ti for all i? One can also search for such f.

Let A and B be τ-structures, where τ is a relational vocabulary τ := {R1,...,Rl}. A homomorphism is a function h : A B such that i {1,...,l} (a1,...,ani) RA i (h(a1),...,h(ani)) RB i . CSP in homomorphism form:

Instance: Two τ-structures, C1 and C2. Question: Is there a homomorphism h : C1 C2?

To relate to the AI form of CSP, think of domain elements in C1 as of variables. Tuples in relations in C1 correspond to constraint scopes, and elements in C2 to values. Relations in C2 are constraint relations. Structure C is a homomorphic pair if it has the form

C = (D;Rdom(C1),RC1 1 ,...RC1 l | {z } C1

,Rdom(C2),RC2 1 ,...RC2 l | {z } C2

where dom(C1),dom(C2) D, Rdom(C1) and Rdom(C2) are unary relations that represent the domains of τ-structures C1 and C2, respectively, and HC is a relation that represents the homomorphism function h. Module M(X1,...,Xk,Y1,...,Yk,Z) is representable in CSP (in homomorphism form) if: (a) the arity of X1,Y1

4We give examples in ASP and CSP, but any other formalism, of arbitrary expressiveness, e.g. Essence (Frisch et al. 2008), could also be used.

is 1, the arity of Xi equals that of Yi, and the arity of Z is 2, (b) for any instantiation S1,...,Sk,R1,...,Rk,T of X1,...,Xk,Y1,...,Yk,Z in a vocabulary τ, for a τstructure A , it holds that A |= M(S1,...,Sk,R1,...,Rk,T) iff T A encodes a homomorphism from (SA 1 ;SA 2 ,...,SA k ) to (RA 1 ;RA 2 ,...,RA k ). Such a representation is clearly not possible when the complexity of checking the membership in the module exceeds NP. The CSP representation of modules can be sometimes simpliﬁed using the non-uniform CSP, when the ﬁrst or the second structure in a pair is ﬁxed. In this case a CSP representation of a module is closer to the standard representation of, e.g., 3-Colouring, and intuitively corresponds to the class of structures homomorphic to a ﬁxed structure (or admitting a homomorphism from a ﬁxed structure). More precisely, let B = (B;S1,...,Sk), B = {b1,...,b|B|} be a structure. Then a module M(X1,...,Xk,Z1,...,Z|B|) is representable in the right non-uniform CSP(B) if following conditions hold: (a) the arity of Z1,...,Z|B| is 1, and the arity of Xi equals that of Si; (b) for any vocabulary τ, and any instantiation R, ...,Rk and T1,...,T|B| of X1,...,Xk and Z1,...,Z|B|, respectively, for a τ-structure A it holds A |= M(R1,...,Rk,T1,...,T|B|) iff T1,...,T|B| is a partition of the domain A of A , and the mapping ϕ : A B given by ϕ(a) = bi where a Ti for a A is a homomorphism from (A;RA 1 ,...,RA ) to B. Representations by the left non-uniform CSP are similar.

Example 3 (Hamiltonian Circuit in CSP (hom. form)) An abstract atomic module representing Hamiltonian Circuit problem (denoted by MHC(X,Y), where X and Y are binary), is such that for any vocabulary τ and any instantiation S,R of X,Y, for a τ-structure B with domain V it holds that B |= M(S,R) iff G = (V;SB) is a graph, RB SB is a set of edges of G that form a Hamiltonian Circuit. It is representable by a concrete CSP instance (C ,D) in homomorphism form where C = (V;EC ,( =V)C ), D = (V;ED,( =V)D) and there is a homomorphism H from C to D. This CSP instance represents a class of structures of the form E = (V;EC ,( =V)C ,ED,( =V)D,HE ). Notice that =V and H are auxiliary, not a part of MHC(E,CV), and are speciﬁc to the CSP representation.

Example 4 (Betweenness in CSP (AI form)) This problem was originally posed in (Opatrny 1979). An abstract atomic module representing Betweenness problem, MB(X,Y), for any instantiation N,F of X,Y, is a class of structures of the form B = (V Q;NB,FB) where V is a ﬁnite set, NB V 3, mapping FB : V Q is such that it generates a linear ordering of V such that, for each triple (r,b,g) MB, we have r < b < g or r > b > g, and Rb = {(a,b,c) Q3 | a < b < c or a > b > c}. It is representable by concrete CSP instance (V,Q,C) in the traditional AI form, where V is a set variables, Q is the domain for those variables, and C is a set of constraints, C = {Cm | m N}, Cm = ((u,v,w),Rb), and Rb = {(a,b,c) Q3 | a < b < c or a > b > c}. Observe that Betweenness admits representation by the right non-uniform CSP, as every concrete module of this form is the class of all struc-

tures that admit homomorphism to (Q,N), where N is the betweenness relation on Q.

Example 5 (k-Colouring in CSP (homomorphism form)) An abstract atomic module representing k-Colouring problem, Mk Col(X,Y), where both X,Y are binary, is such that for any vocabulary τ and any instantiation S,R of X,Y, for a τ-structure B with domain V it holds that B |= M(S,R) iff G = (V;SB) is a graph, RB SB is an equivalence relation on V with at most k equivalence classes such that each equivalence class is an independent set of G. Let Col refers to the relation that represents the homomorphism function h : G1 G2. Consider structures of the form: C = (V;V1G1,EG1 1 | {z } G1

,V2G2,EG2 2 | {z } G2

concrete 3-Colouring CSP module, M3Col(E,Col), consists of all the structures C as above, restricted to the vocabulary of this module.

In general, any class of structures representable by CSP is a (concrete) module. The converse is not true. This is due to the well known property: if there is a homomorphism from A to B and from B to C , there is also a homomorphism from A to C . Any concrete module that does not satisfy this condition cannot be represented by CSP. For example: pairs of structures A , B such that |A|+|B| is odd. There are A , B, C such that A B and B C , and |A|+|B|, |B|+|C| are odd, so (A ,B), (B,C ) belong to the module. Then if this module is representable by CSP, then, since A C , we must have (A ,C ) in the module. But we do not because |A|+|C| is even.

Representing Modules in ASP We assume familiarity of the reader with Answer Set Programming (Niemel a 1999; Marek and Truszczynski 1999). We separate IDB and EDB predicates, as is common in Datalog. Intuitively, EDB predicates are given by a database, and IDB predicates are deﬁnable in terms of those. For a program Π, we use Π to denote an ASP program obtained by augmenting Π with ground atoms representing the database, the interpretations of the EDB predicates. We say a module M is representable in ASP if there is an ASP program Π such that the stable models of Π , for each interpretation of the EDB predicates, when limited to the vocabulary of M, are precisely the structures of M.

Example 6 (Hamiltonian Circuit in ASP) The program is from page 89 of (H olldobler and Schweizer 2014).

1 {cycle(X,Y) : edge(X,Y)} 1 : node(X). 1 {cycle(X,Y) : edge(X,Y)} 1 : node(Y). reachable(Y) : cycle(s,Y). reachable(Y) : cycle(X,Y),reachable(X). : node(X),not reachable(X).

A node in the cycle has exactly one incoming and one outgoing edge, according to the ﬁrst two rules. Since for every node we nondeterministically pick one outgoing and incoming edge, we might have generated a cycle or not. To rule out

model candidates not representing cycles, we check whether every node can be reached by every other node and exclude models where there is an unreachable node. This concrete ASP module is a particular instantiation MHC(edge,cycle) of the abstract module from Example 3, where node, reachable and s are auxiliary and are speciﬁc to this concrete representation.

We now give a general translation. From ASP Programs to Atomic Modules An atomic module representing general ASP program, MASP, where τ = {R1,...,Rl,R1,...,Rk}, is a class of structures of the form

A = (D;RA 1 ,...,RA l | {z } EDB

,RA 1 ,...,RA k | {z } IDB

where D is the active domain, i.e., the constants used in the ground atoms representing the database.

Containment and Equivalence Equivalence We say that two atomic or compound modules α,α are equivalent,5 denoted α = α , if B |= α(S) iff B |= α (S) for any τ-structure B and any instantiation S of variables by relational symbols from τ. E.g., (M1 M2) M3 = (M1 M3) (M2 M3), for any choice of atoms M1,M2,M3. Modular System Equivalence problem is given two compound modules α1,α2, decide whether they are equivalent. Containment We say that a M1 is contained in M2, denoted M1 M2, if B |= M2(S) is true whenever B |= M1(S) is true, for any structure B, for any instantiation of variables S by relational symbols from τ. For compound modules α1,α2 with atoms M1,...,Ms, we say that α1 is contained in α2, denoted α1 α2, if B |= α2(S) is true whenever B |= α1(S) is true, for any instantiations S and any choice of atoms M1,...,Ms. Thus, this is essentially the problem of logical implication for our algebraic expressions. For example, M1 M2 M1 M1 M2 for any choice of atoms M1, M2. We have M1 = M2 iff M1 M2 and M2 M1.

Theorem 1 The Modular System Equivalence problem is undecidable in general, even in the ﬁnite case.

Proof: (outline) The Finite Satisﬁability problem asks whether a given formula in ﬁrst-order logic is satisﬁable by a structure with a ﬁnite domain. By Trakhtenbrot s (Trakhtenbrot 1950) theorem, this problem is undecidable. There is a straightforward reduction of the Finite Satisﬁability problem to the Modular System Equivalence problem. Observe that ﬁnite satisﬁability of ﬁrst-order logic is essentially the same problem as ﬁnite satisﬁability of relational algebra, so ﬁnite satisﬁability of modular systems is undecidable. To ﬁnish the argument, let Q be an unsatisﬁable modular system. Then F equivalent Q is the same problem as checking if F is ﬁnitely satisﬁable, which is undecidable.

5We use equality symbol (=) as a meta-symbol between algebraic expressions, and equivalence symbol ( ) as a logical connective (biconditional) in formulas.

Thus, the problem of deciding containment of compound modules is also undecidable. However, as we show below, for certain broad classes of modules it is decidable and even NP-complete, or sometimes solvable in polynomial time. It will be convenient in this section to represent tuples or sets of vocabulary variables and their instantiations as indexed sets, e.g., {Xi : i I} for a certain (ﬁnite) set I. We will mostly assume that I is clear from the context. A conjunctive compound module (CCM) is a compound module expressible by a relational algebra expression:

π{Xi:i J}(σΘ(M1 Mn)(Xi : i I)),

where J I are some ﬁnite sets, Θ is a conjunction of equivalence ( ) atomic formulas. CCMs are also expressible in logic form by formulas in prenex normal form built from atomic modules Mj(Xi1,...,Xin), and and (applied to vocabulary variables) only:

(Xi : i J) Φ(Xi : i I),

where Φ(Xi : i I) is a conjunction of atomic modules of the form Mj(Xi1,...,Xin). We study the complexity of the following problem: CCM Containment Problem

Input: CCMs α1,α2 over the same atomic module symbols M1,...,Ms. Decide: if α1 α2, for any choice of M1,...,Ms.

Note that the CCM Containment problem makes sense only for abstract modular systems (represented by algebraic expressions), i.e., modular systems where atomic module symbols have relational variables as arguments (rather than elements of a ﬁxed relational vocabulary τ). Let us observe ﬁrst that the selection operation in a CCM can be eliminated. Indeed, since the only Θ allows is a conjunction of expressions of the form X Y, Θ deﬁnes an equivalence relation on the set of variables. Replacing every occurrence of every variable from each equivalence class with a ﬁxed representative of that class, we obtain an equivalent CCM containing no selection operation. A CCM α is said to be projection-free if it has the form M1 Mn for some atoms M1,...,Mn. Canonical Structure For a projection-free CCM α(X) = Φ(M1,...,Ms), the canonical structure Aα is deﬁned as: the domain of Aα is X = {Xi : i I} for some ﬁnite set I; the vocabulary is δ = {Ti | i {1,...,s}}, where the arity of Tj equals that of module Mj; the instantiation of every symbol Tj is given by (Xi1,...,Xik) T Aα j iff Mj(Xi1,...,Xik) is an atom in Φ.

Notice that, in a canonical structure, for a speciﬁc CCM α( X), the variables of α become domain elements. For a CCM α = π{Xi:i J}α that is not projection-free and α is projection-free, Aα is deﬁned to be Aα . As an example of a canonical structure consider compound module α(X1,X2,X3,x4) = M1(X1,X2) M2(X2,X2,X3) M1(X3,X4). Then the domain of Aα is the set {X1,X2,X3,X4), its vocabulary consists of two relational symbols T1 and T2 of arity 2 and 3,

respectively. These symbols are then instantiated as follows: T Aα 1 = {(X1,X2),(X3,X4)} and T Aα 2 = {(X2,X2,X3)}.

Proposition 1 Let α1,α2 be projection-free CCMs. Then α1 α2 iff the identity mapping is a homomorphism from Aα2 to Aα1.

Proof: Indeed, if the identity is a homomorphism, it simply means that every atom of α2 is also an atom of α1. On the other hand, if α1 α2 every atom of α2 must be present in α1, as otherwise there are atomic modules, an instantiation, and a structure that contradicts α1 α2. This immediately implies that the identity mapping is a homomorphism between the two structures.

Next we extend Proposition 1 to general CCMs, toward obtaining Theorem 2. It will be convenient for us to represent a CCM as α = π(Xj:j J)Φ(M1,...,Ms), where J I and Φ(M1,...,Ms) is a projection free CCM. Let M = Φ(M1,...,Ms). In addition to canonical structure, we will need the notion of a power structure.

Power Structure. Let A be a structure with domain A and vocabulary τ, and let M1,...,Ms be modules. The power structure K = KM1,...,Ms(A ) is deﬁned as follows: let a1,...,ar be all the different arities of symbols from τ; the domain of K is

K = P(Aa1) P(Aar),

where P(B) denotes the power set of B; the vocabulary of KM1,...,Ms(A ) is R1,...,Rs, and the arity of Rj is that of the module Mj; every relational symbol Rj of arity k is interpreted as fol-

lows: tuple (c1,...,ck) Kk belongs to R KM1,...,Ms(A ) j iff ct P(Aait ), where ait is the arity of t th argument of Mj, for t [k], and B |= Mj(Si1,...,Sik) for any τ-structure B such that SB it = ct for t [k]. Note that as deﬁned, KM1,...,Ms(A ) depends only on the modules M1,...,Ms, the domain, and the vocabulary of A , but not on A s relations. The following is an example of a simple power-structure. Let M(X1,X2) be a module whose arguments have arities 2 and 3, respectively. Let also A be a relational structure with domain A = {0,1} and vocabulary τ = {T1,T2} of arities 2 and 3. Then K = P({0,1}2) P({0,1}3) consists of sets of pairs and sets of triples over {0,1} that correspond to all possible binary and ternary relations. Suppose that module M contains all the structures B with a binary and a ternary symbols such that M(S1,S2) is true whenever

S1 {(a,b): (a,b,c) S2} {(b,c): (a,b,c) S2} {(a,c): (a,b,c) S2}. (3)

Then KM(A ) has only one relational symbol R interpreted as the set of all pairs (S1,S2), where S1 {0,1}2, S2 {0,1}3 satisfying (3). These includes pairs (/0, /0), ({(0,0),(1,1)},{(0,0,0), (1,1,1)}), ({(a,b): a b},{(a,b,c): a b c}) among others.

The following lemma is a generalization of the Magic Lemma from (Chandra and Merlin 1977), as it is sometimes referred to, to a much higher level of abstraction. In this lemma, we relate the satisfaction of a concrete compound module, that is, a module where variables are instantiated with speciﬁc symbols of a relational vocabulary, to the question of existence of a homomorphism that maps variables to the elements of the domain of the Power Structure. The homomorphism takes into account how variables are instantiated by speciﬁc symbols of a relational vocabulary. The difference with the original setting (Chandra and Merlin 1977) is substantial. Instead of queries that are formulae that use database relations directly, we have algebraic combinations of modules that are, essentially, decision procedures, as in the previous section, that accept or reject structures. These structures provide interpretations to second-order variables, given their instantiations in τ. Because of this additional level, and because the proof has to work for all instantiations of variables by relational symbols, our construction of the Power Structure is rather involved. Notice that, in the results below, modules are treated abstractly as classes of structures (sets if a domain is given). As a consequence, the lemma and the subsequent theorem apply to all modules.

Lemma 1 (Magic Lemma II) Let M be a CCM with the set of variables {Xi : i I} and the set of variables J I in the projection operation, M1,...,Ms be atoms of M. Let A be a structure with vocabulary τ, and {Si : i J} an instantiation of the variables of M in τ. Then

A |= M(Si : i J) iff there is a homomorphism ϕ from AM to KM1,...,Ms(A )

such that ϕ(Xi) = SA i for i J.

Proof: ( ) Suppose A |= M(Si : i J), and A has a domain A. This means that there is an instantiation {Si : i I J}, a structure A with the same domain such that A |= M (Si : i I) and SA i = SA i for i J. Deﬁne ϕ : {Xi : i I} K by ϕ(Xi) = SA i . Obviously, this mapping satisﬁes the condition ϕ(Xi) = SA i for i J. Since for every atom Mj(Xi1,...,Xik) of Φ, A |= Mj(Si1,...,Sik), the tuple

(SA i1 ,...,SA ik ) belongs to R KM1,...,Ms(A ) j . Thus, ϕ is a homomorphism. ( ) Suppose there is a homomorphism ϕ from AM to KM1,...,Ms(A ) such that ϕ(Xi) = SA i for i J. Deﬁne A on the same domain A as that of A by setting, for every Si, i I, SA i = ϕ(Xi). Note that if i J then by assumption SA i = SA i . Also, since for every atom Mj(Xi1,...,Xik) of Φ, we have

(SA i1 ,...,SA ik ) R KM1,...,Ms(A ) j , A |= Mj(Si1,...,Sik). Thus, A |= M (Si : i I), and so A |= M(Si : i J).

Theorem 2 (Homomorphism Theorem) Let α1 = π(Xi:i J) Φ1(M1,...,Ms) and α2 = π(Xi:i J) Φ2(M1,...,Ms) be two CCMs for J I. Then α1 α2 iff there is a homomorphism ϕ : Aα2 Aα1 such that ϕ(Xi) = Xi for all i J.

Proof: ( ) Suppose there is a homomorphism ϕ : Aα2 Aα1 such that ϕ(Xi) = Xi for i J. Fix atomic modules M1,...,Ms. Now if A |= α1(Si : i J) for some instantiation {Si : i J}, by the Magic Lemma II there is a homomorphism from Aα1 to KM1,...,Ms(A ). Composing it with ϕ we obtain a homomorphism from Aα2 to KM1,...,Ms(A ). Therefore, A |= α2(Si : i J). ( ) Since α1 α2, we can choose any kind of modules and structures to guarantee that a required homomorphism exists, and we will use this freedom to the fullest. Given a vocabulary τ we deﬁne a specialized structure that only depends on τ. Fix some τ = {Si : i I} so that its symbols correspond to the variables of α1,α2 and have the same arities. Let Aτ be a structure with domain A = τ and vocabulary τ. For every Si τ, its interpretation is given by SAτ i = {(Si,...,Si)}, where the length of the tuple is the arity of Si. It is valid because Si also belongs to the domain of Aτ. Second, we deﬁne modules M1,...,Ms in such a way that the structure KM1,...,Ms(Aτ) contains a substructure isomorphic to Aα for some CCM α. Let Mj be such that under any instantiation {Si : i I} it satisﬁes the following condition:

(*) if B |= Mj(Si1,...,Sik) where B is a τ-structure with domain τ, Si1,...,Sik τ, then SB it = (Sit,...,Sit), t [k].

It does not matter how Mj behaves with other vocabularies and domains. Let α be a CCM with variables Xi, i I, and atomic modules M1,...,Ms. Later modules α1 and α2 will be used as α here. For α we deﬁne modules M1,...,Ms with property (*) and such that for some instantiation {Si : i I} it holds Aτ |= Mj(Si1,...,Sik) iff Mj(Xi1,...,Xik) is an atom in M. We denote the structure KM1,...,Ms(Aτ) by Kα(Aτ). Recall that a1,...,ar are all the different arities of symbols from τ. We consider a substructure of Kα(Aτ) induced by the set K (Aτ) K = P(Aa1) P(Aar), the domain of Kα(Aτ) consisting of all singleton sets of the form {(S,...,S)} P(Aaj), S τ and aj is the arty of S. Note that the domain of this structure does not depend on speciﬁc modules M1,...,Ms). It is not difﬁcult to see that Kα(Aτ) restricted to K (Aτ) is isomorphic to AM. By Magic Lemma II, Aτ |= M(Si : i J) for any instantiation {Si : i J}. Now we are ready to deﬁne a homomorphism from Aα2 to Aα1. By the premise of Theorem 2, for any choice of modules M1,...,Ms, vocabulary τ, instantiation {Si : i J}, and τ-structure A , if A |= α1(Si : i J) then also A |= α2(Si : i J). Choose M1,...,Ms as above for module α1. Then Aτ |= α1(Si : i J), and so Aτ |= α2(Si : i J). By Magic Lemma II there is a homomorphism ψ of Aα2 to Kα1(Aτ). As is easily seen, we may assume that the range of ψ is a subset of K (Aτ). Indeed, if ψ(Xi) K (Aτ) for some i I, then by condition (*) no tuple in no relation of Aα2 contains Xi, and its image can be chosen arbitrarily. Since Kα1(Aτ) restricted to K (Aτ) is isomorphic to Aα1, we get the result.

Theorem 2 reduces the CCM Containment problem to the Homomorphism problem of two relational structures. As the latter belongs to NP, we obtain the following corollary.

Corollary 1 (Membership in NP) The CCM Containment problem belongs to NP.

The complexity of the Homomorphism problem can be reduced if we restrict the class of allowed structures. The list of useful restrictions is long and includes bounded tree and hypertree width, various forms of acyclicity (see e.g. (Gottlob, Leone, and Scarcello 1999b,a; Gyssens, Jeavons, and Cohen 1994)), and fractional hyper tree width (Grohe and Marx 2006; Marx 2013). By Theorem 2 if we consider CCMs satisfying any of these conditions, the respective CCM Containment problem becomes polynomial time solvable.

Conclusion and Future Work In this paper, we addressed the problem of checking modular system equivalence and containment. This problem is important, for example, in hierarchical and component-based development, in rapid prototyping and system veriﬁcation. We proved that, for a large class of modular systems, namely for Conjunctive Compound Modules, system containment (and thus equivalence) problem is in the complexity class NP. This is a typical class of modular systems, conveniently described using an expressive subset of the operations. We discussed cases where the containment problem is solvable in polynomial time. We also introduced the notion of representability of modules in speciﬁc formalisms and provided examples of such representations. An important future extension of our work would be to the Unions of Conjunctive Compound Modules, using Sagiv-Yannakakis theorem (Sagiv and Yannakakis 1980). We would also like to prove a Containment property for an efﬁcient restriction of a dynamic version of the algebra (Ternovska 2019), where inputs and outputs of modules are speciﬁed, similarly to (Aamer et al. 2020a) and (Aamer et al. 2020b).

Acknowledgements The authors are grateful to the anonymous referees for their helpful comments. This research is supported by Natural Sciences and Engineering Research Council of Canada (NSERC).

References Aamer, H.; Bogaerts, B.; Surinx, D.; Ternovska, E.; and Van den Bussche, J. 2020a. Executable ﬁrst-order queries in the logic of information ﬂows. In Proc. ICDT 20, 4:1 4:14.

Aamer, H.; Bogaerts, B.; Surinx, D.; Ternovska, E.; and Van den Bussche, J. 2020b. Inputs, Outputs, and Composition in the Logic of Information Flows. In Proc. KR 20, 2 11.

Chandra, A. K.; and Merlin, P. M. 1977. Optimal Implementation of Conjunctive Queries in Relational Data Bases. In STOC, 77 90. ACM.

Codd, E. F. 1970. A Relational Model of Data for Large Shared Data Banks. Commun. ACM 13(6): 377 387.

Codd, E. F. 1972. Relational Completeness of Data Base Sublanguages. Research Report / RJ / IBM / San Jose, California RJ987.

de Bruijn, J.; Eiter, T.; Polleres, A.; and Tompits, H. 2007. Embedding Non-Ground Logic Programs into Autoepistemic Logic for Knowledge-Base Combination. In Proc. IJCAI 07). Hyderabad, India: AAAI Press.

Eiter, T.; Ianni, G.; Lukasiewicz, T.; Schindlauer, R.; and Tompits, H. 2008. Combining answer set programming with description logics for the Semantic Web. Artif. Intell. 172(12-13): 1495 1539.

Enderton, H. B. 1972. A mathematical introduction to logic. Academic Press. ISBN 978-0-12-238450-9.

Frisch, A. M.; Harvey, W.; Jefferson, C.; Mart ınez Hern andez, B.; and Miguel, I. 2008. Essence: A constraint language for specifying combinatorial problems. Constraints 13: 268 306.

Gottlob, G.; Leone, N.; and Scarcello, F. 1999a. A Comparison of Structural CSP Decomposition Methods. In IJCAI 99, Stockholm, Sweden, July 31 - August 6, 1999, 394 399.

Gottlob, G.; Leone, N.; and Scarcello, F. 1999b. On Tractable Queries and Constraints. In Database and Expert Systems Applications, 10th Int. Conference, DEXA 99, Florence, Italy, August 30 - September 3, 1999, 1 15.

Grohe, M.; and Marx, D. 2006. Constraint solving via fractional edge covers. In SODA, 289 298. ACM Press.

Gyssens, M.; Jeavons, P.; and Cohen, D. A. 1994. Decomposing Constraint Satisfaction Problems Using Database Techniques. Artif. Intell. 66(1): 57 89.

H olldobler, S.; and Schweizer, L. 2014. Answer Set Programming and Clasp, A Tutorial. In Proc. of the Young Scientists Int. Workshop on Trends in Information Processing (YSIP), 77 95. CEUR Workshop Proceedings.

J arvisalo, M.; Oikarinen, E.; Janhunen, T.; and Niemel a, I. 2009. A Module-Based Framework for Multi-language Constraint Modeling. In LPNMR, volume 5753 of Lecture Notes in Computer Science, 155 168. Springer.

Libkin, L. 2004. Elements of Finite Model Theory. Springer Verlag.

Lierler, Y.; and Truszczynski, M. 2014. Abstract Modular Inference Systems and Solvers. In Proc. PADL 2014).

Marek, V. W.; and Truszczynski, M. 1999. Stable Models and an Alternative Logic Programming Paradigm. In The Logic Programming Paradigm, Artiﬁcial Intelligence, 375 398. Springer.

Marx, D. 2013. Tractable Hypergraph Properties for Constraint Satisfaction and Conjunctive Queries. J. ACM 60(6): 42.

Niemel a, I. 1999. Logic Programs with Stable Model Semantics as a Constraint Programming Paradigm. Ann. Math. Artif. Intell. 25(3-4): 241 273.

Opatrny, J. 1979. Total ordering problem. SIAM J. Comput 8(1): 111 114.

Sagiv, Y.; and Yannakakis, M. 1980. Equivalences Among Relational Expressions with the Union and Difference Operators. J. ACM 27(4): 633 655. Sebastiani, R. 2007. Lazy Satisﬁability Modulo Theories. Journal of Satisﬁability, Boolean Modeling and Computation (JSAT) 3: 141 224. Tasharroﬁ, S.; and Ternovska, E. 2011. A Semantic Account for Modularity in Multi-language Modelling of Search Problems. In Fro Co S 2011, 259 274. Ternovska, E. 2019. An Algebra of Modular Systems: Static and Dynamic Perspectives. In Fro Co S, 2019. Trakhtenbrot, B. 1950. The Impossibility of an Algorithm for the Decidability Problem on Finite Classes. Proceedings of the USSR Academy of Sciences (in Russian) 70(4): 569 572. Van den Bussche, J. 2001. Applications of Alfred Tarski s Ideas in Database Theory. In CSL, volume 2142 of Lecture Notes in Computer Science, 20 37. Springer.