# complete_symmetry_breaking_for_finite_models__32ddbecd.pdf Complete Symmetry Breaking for Finite Models Marek Danˇco1, Mikol aˇs Janota1, Michael Codish2, Jo ao Jorge Ara ujo3 1Czech Technical University in Prague, Czechia 2Ben-Gurion University of the Negev, Beer-Sheva, Israel 3Department of Mathematics, NOVA FCT, Portugal dancomar@cvut.cz This paper introduces a SAT-based technique that calculates a compact and complete symmetry-break for finite model finding, with the focus on structures with a single binary operation (magmas). Classes of algebraic structures are typically described as first-order logic formulas and the concrete algebras are models of these formulas. Such models include an enormous number of isomorphic, i.e. symmetric, algebras. A complete symmetry-break is a formula that has as models, exactly one canonical representative from each equivalence class of algebras. Thus, we enable answering questions about properties of the models so that computation and search are restricted to the set of canonical representations. For instance, we can answer the question: How many nonisomorphic semigroups are there of size n? Such questions can be answered by counting the satisfying assignments of a SAT formula, which already filters out non-isomorphic models. The introduced technique enables us calculating numbers of algebraic structures not present in the literature and going beyond the possibilities of pure enumeration approaches. Introduction Finite model finding has a longstanding tradition in automated reasoning: often a user is interested in a model rather than proving a theorem (Mc Cune 1994). Models serve as counterexamples to invalid conjectures (Blanchette 2010) but are also interesting on their own. Indeed, Forsythe (1955) enumerates semigroups of order 4 on a computer as early as 1955. A large body of research exists that tackles finite model finding, enumeration, and counting, using constraint programming (CP) and SAT techniques (Audemard and Benhamou 2002; Distler and Kelsey 2014; Distler, Shah, and Sorge 2011; Audemard and Henocque 2001; Zhang 1996; Zhang and Zhang 1995; Mc Cune 2003; Claessen and S orensson 2003; Ara ujo, Chow, and Janota 2023; Reger, Riener, and Suda 2019). This paper computes compact and complete symmetry breaking constraints for a wide range of algebraic structures with a single binary operation, known as magmas. Classes of algebraic structures, such as semigroups, are typically described as first order logic formulas and the concrete algebras are models of these formulas. These models include an Copyright 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. enormous number of isomorphic, i.e. symmetric, algebras. For a finite algebra over a domain of size n, each permutation of the n domain elements introduces a symmetry. So there are a super-exponential number of symmetries. A complete symmetry breaking constraint for an algebraic structure is satisfied exactly for the canonical representations of the structure. Complete symmetry breaks for algebraic structures, as computed in this paper, are not found in the literature. Applying these enables to answer, via computation, questions about properties of the models so that computation and search are restricted to the set of canonical representations. For instance, we can answer the question: How many nonisomorphic semigroups of size n are there? As long as we can calculate the complete symmetry-break. Questions of this type can be answered by counting the satisfying assignments of a SAT formula encoding both the property of the algebraic class and the complete symmetry break. Like this, it already filters out non-isomorphic models. This enables us to use out-of-the-box model counters (Gomes, Sabharwal, and Selman 2021) to calculate the non-isomorphic structures without enumerating them. Our approach can be seen as constrain and generate , whereas pure enumeration is generate and prune . Applying our approach enables us to calculate numbers of algebraic structures not present in the literature and that go beyond the possibilities of pure enumeration approaches. We mention also partial symmetry breaks. These are constraints which are satisfied by at least one element from each equivalence class of objects. Partial symmetry breaks are typically smaller in size than complete breaks but admit redundant (isomorphic) solutions. The symmetry breaks we compute are lex-leader symmetry breaks. Namely, from each class of isomorphic (or symmetric) objects, we select a canonical representative which is the minimal object, with respect to a lexicographic ordering, in its class. So, in theory, a complete lex-leader symmetry break is obtained by introducing a constraint that imposes that an algebraic structure must not be larger than any of its permutations. A symmetry break defined in this way involves a super-exponential number of constraints and is too large to be of practical use. In this paper we show that symmetry breaks can be often more compact in practice. However, we also show that un- The Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25) less P=NP, we do not expect to find a symmetry break for unrestricted algebraic structures of polynomial size. This paper adapts the approach described in Itzhakov and Codish (2016) applied to compute compact and complete symmetry breaks for graph search problems. In fact, as graphs can be posed as an algebraic structure, our approach is a generalization of this previous work. In the case of graphs, Itzhakov and Codish (2016) compute a complete symmetry breaking constraint for order 10 graph search problems consisting of only 7,853 lexicographic order constraints instead of all 10! = 3,628,800 constraints. This is our motivation in the context of the current paper. In our context, for example, we compute a complete symmetry break for AG-groupoids of size 7 that involves 240 lexicographic order constraints instead of 7! = 5,040. Applying this symmetry break enables counting of all AG-groupoids of size 7, a number which is not found in current literature. Reasoning about algebraic structures is significantly harder than about graphs, algebras correspond to matrices where the contents and row-column indices interact (one may for example have an axiom x x = x). For binary operations, the search-space is bounded by nn2, which gets unwieldy as quickly as for n = 5 ( 3e+17), which is why n = 6 is already a challenge for many algebra classes. To summarize the main contributions of this paper: 1. We generalize the approach described by Itzhakov and Codish (2016) to apply to arbitrary classes of algebraic structures described as first order logic formulas. 2. We compute compact complete symmetry breaks for a variety of algebraic structures. 3. We apply the compact complete symmetry breaks that we compute to count non-isomorphic structures of several representative classes of algebras obtaining new results. 4. We give novel theoretical insights in the complexity of complete and partial symmetry breaks for finite models. Preliminaries In this paper we study finite mathematical structures with a single binary operation, known as magmas (aka groupoids). For example, finite groups or semigroups are magmas. Magmas are denoted by a pair (D, ) where D is the domain and a binary operation on D. We rely on the well-established term of isomorphism and isomorphic copy. Definition 1 (isomorphism). A bijection f : D1 D2 is an isomorphism from a magma (D1, ) to (D2, ) if f(a b) = f(a) f(b), for all a, b D1. Two magmas are isomorphic iff there exists at least one isomorphism between them. Definition 2 (isomorphic copy). Consider a magma (D1, ) and a bijection f : D1 D2 then the isomorphic copy (D2, )f is defined as a b = f(f 1(a) f 1(b)). For a magma A we write f(A) for its isomorphic copy. Throughout the paper, we consider magmas on a finite domain D = {0, . . . , n 1} for n N+, where n is denoted as the size of the magma. Then, we are only concerned with isomorphisms between magmas on the same domain D, which means that any such isomorphism is a permutation π on D, i.e., an element of the relevant symmetric group, which we will denote as Sn. A finite magma can be naturally represented as a twodimensional multiplication table, which enables us to order them lexicographically, comparing their vectorization. The table may be vectorized in an arbitrary fashion, as long as this vectorization is fixed. Definition 3 (magma vectorization). Let A be a magma. Then vec(A) is a fixed vectorization of the two-dimensional multiplication table corresponding to A. Throughout the paper we consider three alternative vectorizations of magmas. Let vecr(A) denote the row-byrow concatenation of the rows of A left-to-right, top-down, i.e., the way people are used to read and write in Western civilization. Let vecd(A) denote the diagonal first vectorization in which the elements of the diagonal occur first, followed by the row-by-row vectorization, skipping the diagonal elements. Let vecc(A) denote the concentric vectorization where cells are ordered by the sum of row and column indices and ties are broken by row-by-row ordering. So for example, xor, i.e. 01 10 , is vectorized as 0, 1, 1, 0 in vecr( ),vecc( ) and as 0, 0, 1, 1 in vecd( ). For x+y mod 3, i.e. 012 120 201 , vecr( ) gives 0, 1, 2, 1, 2, 0, 2, 0, 1, vecc( ) gives 0, 1, 1, 2, 2, 0, 2, 0, 1, and vecd( ) gives 0, 2, 1, 1, 2, 1, 0, 2, 0. These vectorizations are mutually incompatible but each individually impose a total order on algebras via the lexicographic ordering of the corresponding vectors. Definition 4 ( ). Assume a vectorization vec( ) for magmas of size n. Define a total order vec( ) by defining A vec( ) B true iff vec(A) is lexicographically smaller or equal than vec(B). Note that vec(A) is a vector of length n2. We usually omit the subscript and write when vec( ) is clear from the context. Definition 5 (lex-leader). We say that a magma A = (D, ) is a lex-leader with respect to a given vectorization vec( ), iff for all magmas B isomorphic to A, it holds that A vec( ) B. Canonizing Sets Preliminaries Let Mn denote the set of all order n algebraic structures of a selected type (e.g. quasigroups). The following is adapted from (Itzhakov and Codish 2016) where the definition and algorithm are stated for simple graphs. Definition 6 (canonicity). Let A Mn, Π Sn, and denote the predicate minΠ(A) = V π Π A π(A). We say that A is canonical if min Sn(A). We say that the set Π is canonizing if A Mn. minΠ(A) min Sn(A). In a nutshell, a canonizing set Π is a means of quantifier elimination (Bradley and Manna 2007) by replacing π Sn by V π Π with the aim of obtaining Π smaller than n!. Algorithm 1 gradually augments Π through counterexamples until it becomes canonizing. It starts with some initial set of permutations Π (for simplicity, assume that Π = ). Then, incrementally apply the step specified in lines 2 3 of Algorithm 1, as long as the stated condition holds. Algorithm 1 Compute Canonizing Set 1: Init: Π = 2: while A Mn π Sn s.t. minΠ(A) and π(A) A do 3: Π = Π {π} 4: end while 5: return Π Lemma 7 (Itzhakov and Codish (2016)). Algorithm 1 terminates and returns a canonizing set Π. We say that a canonizing set Π of permutations is redundant if for some π Π the set Π \ {π} is also canonizing. Algorithm 1 may compute a redundant set. For example, if a permutation added at some point becomes redundant in view of permutations added later. An algorithm Reduce Algorithm is straightforward to implement and we denote it as Algorithm 2. It iterates on the elements of a canonizing set to remove redundant permutations (similarly to the iterative algorithm for minimally unsatisfiable set or monotone predicates in general (N ohrer, Biere, and Egyed 2012; Marques Silva, Janota, and Belov 2013)). Canonizing Sets for Algebras To apply Algorithm 1 it is needed to encode into SAT that we are looking for an interpretation A that is a model of some given first order logic axioms with domain size n. This is done by standard means, cf. (Mc Cune 1994; Claessen and S orensson 2003). Notably, for each triple of domain elements (d1, d2, d3) a propositional variable xd1,d2,d3 is introduced, representing the fact that d1 d2 = d3 in the model A; for ease of notation, we write A[d1, d2] d3 to refer to this variable. These variables encode the value of x y in one-hot encoding, expressed as a cardinality constraint, encoded into CNF by standard means (Roussel and Manquinho 2021), i.e. 1 = Σd DA[d1, d2] d, for d1, d2 D. First, we present the encoding of the while loop condition at line 2 in Algorithm 1. The condition has four parts: A Mn, π Sn, minΠ(A), and π(A) A. The encoding is a conjunction: φ1 φ2 φ3 φ4 of four corresponding propositional formulas. The overall complexity of the encoding is O(n7), however, simplifications are used for fixed permutations see Danˇco (2024, Sec 3.2.2). The first conjunct comprises the constraints imposed by the FOL specification which specifies Mn. The second conjunct models that π is an (unknown) permutation. To this end, we introduce propositional variables π[d1] d2 for d1, d2 D, to represent the permutation π, under the constraints that they indeed behave like a permutation: 1 = Σd Dπ[d] d = Σd Dπ[d ] d, for d D. The fourth conjunct is the encoding of the constraint π(A) A where both A and π are unknown (existentially quantified). This is the crux of the condition (we will come back to the third conjunct later). Itzhakov and Codish (2016) show how to encode this constraint for graphs but for algebras, this encoding is much more involved because the matrix contains values and permutations apply also to these values. To reason about π(A), it is useful to realize that if π(A) has the value d in cell r, c, the multiplication table A will have the value π 1(d) in cell π 1(r), π 1(c) (π is effectively a renaming). Since the SAT encoding does not enable us to treat the permutation π as a first-order citizen, we will have to go through all possible combinations of pre-images. To simplify the presentation, we introduce an auxiliary subformula pre, expressing that r , c , d are pre-images of r, c, d under π, respectively, and the value of A is d at the cell r , c . pre(r , r, c , c, d , d) π[r ] r π[c ] c π[d ] d A[r , c ] d Note that if pre(r , r, c , c, d , d) is true then π(A) has the value d at position r, c because, effectively, π replaces the primed values with their non-primed version. We consider some fixed vectorization of the multiplication table, which is represented as a sequence of row-column index pairs, (r1, c1), . . . , (rn2, cn2). Now we can define auxiliary propositional variables that express that the value of A in cell ri, ci is greater/equal than π(A) in the same cell by going through all combinations of pre-images and values. gtd i A[ri, ci] d _ r ,c ,m ,m D m