# Learning Symmetric Rules with SATNet

Sangho Lim* (School of Computing, KAIST, Daejeon, South Korea; lim.sang@kaist.ac.kr)
Eun-Gyeol Oh* (Graduate School of Information Security, KAIST, Daejeon, South Korea; eun-gyeol.oh@kaist.ac.kr)
Hongseok Yang (School of Computing and Kim Jaechul Graduate School of AI, KAIST; Discrete Mathematics Group, Institute for Basic Science (IBS), Daejeon, South Korea; hongseok.yang@kaist.ac.kr)

\* These authors contributed equally to this work.

36th Conference on Neural Information Processing Systems (NeurIPS 2022).

## Abstract

SATNet is a differentiable constraint solver with a custom backpropagation algorithm, which can be used as a layer in a deep-learning system. It is a promising proposal for bridging deep learning and logical reasoning. In fact, SATNet has been successfully applied to learn, among others, the rules of a complex logical puzzle, such as Sudoku, just from input-output pairs where the inputs are given as images. In this paper, we show how to improve the learning of SATNet by exploiting symmetries in the target rules of a given but unknown logical puzzle or, more generally, a logical formula. We present SymSATNet, a variant of SATNet that translates the given symmetries of the target rules into a condition on the parameters of SATNet, and requires that the parameters have a particular parametric form guaranteeing the condition. The requirement dramatically reduces the number of parameters to learn for rules with enough symmetries, and makes the parameter learning of SymSATNet much easier than that of SATNet. We also describe a technique for automatically discovering symmetries of the target rules from examples. Our experiments with Sudoku and Rubik's cube show the substantial improvement of SymSATNet over the baseline SATNet.

## 1 Introduction

Bringing the ability of reasoning to deep-learning systems has been the aim of a large body of recent research [28, 14, 5, 26, 23]. One notable outcome of these endeavours is SATNet [26], a differentiable constraint solver with an efficient custom backpropagation algorithm. SATNet can be used as a component of a deep-learning system, making the system capable of learning and reasoning about sophisticated logical rules. Its potential has been demonstrated successfully on the task of learning the rules of complex logical puzzles, such as Sudoku, just from input-output examples where the inputs are given as images.

We show how to improve the rule (or constraint) learning of SATNet when the target rules have permutation symmetries. By having symmetries, we mean that the solutions of the rules are closed under the permutations of those symmetries. For example, in Sudoku, if a completed 9×9 Sudoku board is a solution, permuting the numbers 1 to 9 in the board, the first three rows, or the last three columns always gives rise to another solution. Thus, these permutations are symmetries of Sudoku.

Our improvement is a variant of SATNet, called SymSATNet, which abbreviates symmetry-aware SATNet. SymSATNet assumes that some symmetries of the target rules are given a priori, although the rules themselves are unknown. It then translates these symmetries into a condition on the parameter matrix $C \in \mathbb{R}^{n \times n}$ of SATNet (or our minor generalisation of it), and requires that the parameters have a particular parametric form that guarantees the condition.
Concretely, the translated condition says that the matrix $C$, regarded as a linear map, should be equivariant with respect to the group $G$ determined by the given symmetries, and the requirement is that $C$ should be a linear combination of elements in a basis for the space of $G$-equivariant symmetric matrices. The coefficients of this linear combination are the parameters of our SymSATNet, and their number is often substantially smaller than that of the parameters of SATNet.¹ For Sudoku, the former is 18, while the latter is $729^2$, or $k \cdot 729$ for some $k \in \mathbb{N}$ at best. The reduced number of parameters implies that SymSATNet has to tackle an easier learning problem than SATNet, and has the potential to learn faster and generalise better than SATNet.

¹ SATNet assumes that $C$ is of the form $S^T S$ for some $S \in \mathbb{R}^{m \times n}$ with $m < n$, so that the number of parameters is $mn < n^2$. But this is often still substantially larger than the number of parameters of SymSATNet.

Who provides symmetries for SymSATNet? The default answer is domain experts, but a better alternative is possible. We present an automatic algorithm to discover symmetries. Our algorithm is based on the empirical observation that symmetries emerge in the parameter matrix $C$ of SATNet in the early phase of training, as clusters of similar entries. Our algorithm takes a snapshot of $C$ at some training epoch of SATNet, and finds a group $G$ such that (i) specific entries of $C$ share similar values by the $G$-equivariance condition, and (ii) the number of SymSATNet parameters under $G$ is minimised.

We empirically evaluate SymSATNet and our symmetry-discovering algorithm with Sudoku and a problem related to Rubik's cube. For both problems, our algorithm discovered nontrivial symmetries, and SymSATNet with manually specified or automatically found symmetries outperformed the baseline SATNet in learning the rules, in terms of both efficiency and generalisation.

**Related work.** There have been multiple studies on discovering symmetries present in conjunctive normal form (CNF) formulas in order to reduce the search space of satisfiability (SAT) solvers. Crawford [7] proved that the symmetry-detection problem is equivalent to the graph isomorphism problem, and showed how to reduce the complexity of pigeonhole problems using symmetries. Crawford et al. [8] proposed symmetry-breaking predicates (SBPs), and Aloul et al. [1, 2] developed SBPs with more efficient constructions. For automatic symmetry detection, Darga et al. [9] presented a method that improves the partition refinement procedure introduced by McKay [19, 20], and Darga et al. [10] proposed an algorithm that achieves efficiency by exploiting the sparsity of symmetries. In contrast to these global and static methods, Benhamou et al. [4] and Devriendt et al. [12] handled local symmetries that dynamically arise during search. The use of symmetries also appears in NeuroSAT [22], which learns how to solve SAT problems from examples. NeuroSAT solves a given SAT formula by message passing over a graph constructed from the formula, and in so doing, its learnable solver can exploit symmetries in the formulas. All of these techniques use symmetries to help solve given formulas, whereas our approach uses symmetries to help learn such formulas. Another difference is that those techniques find hard symmetries of given formulas, whereas our approach discovers soft or approximate symmetries in a given SATNet parameter matrix. In the context of deep learning, Basu et al. [3] and Dehmamy et al. [11] described algorithms that find and exploit symmetries via group decompositions and Lie algebra convolutions. But these techniques are not designed to find symmetries in logical formulas.
Our work is related to the studies on learning logical rules from examples using gradients. Yang et al. [28] proposed neural logic programming, an end-to-end differentiable system which learns first-order logical rules; Evans and Grefenstette [14] proposed a differentiable inductive logic programming system which is robust to noise in training data; and Cingillioglu and Russo [5] introduced an RNN-based model to learn logical reasoning tasks end-to-end. Wang et al. [25, 26] presented SATNet using the mixing method, and Topan et al. [23] further improved SATNet by solving the symbol grounding problem, a key challenge for SATNet. Our work extends these lines of work by proposing how to discover and exploit symmetries from examples when learning logical rules with SATNet.

## 2 Background

We review SATNet, the formalisation of symmetries using groups, and equivariant maps. For a natural number $n$, let $[n] = \{1, \ldots, n\}$, and for a matrix $M$, let $M_{i,j}$ be the $(i,j)$-th entry of $M$.

### 2.1 SATNet

A good starting point for learning about SATNet is its origin, the mixing method [25], which is an efficient algorithm for solving semidefinite programming problems with diagonal constraints. Let $n, k \in \mathbb{N}$ and let $C$ be a real-valued symmetric matrix in $\mathbb{R}^{n \times n}$. The mixing method aims at solving the following optimisation problem:

$$\operatorname*{argmin}_{V \in \mathbb{R}^{k \times n}} \; \langle C, V^T V \rangle \quad \text{subject to } \|v_i\| = 1 \text{ for } i \in [n], \tag{1}$$

where $v_i$ is the $i$-th column of the matrix $V$, and $\|v_i\|$ is the $L_2$ norm of $v_i$. The mixing method solves (1) by coordinate descent, where each column $v_i$ of $V$ is repeatedly updated as follows:

$$g_i \leftarrow \sum_{j \in [n],\, j \neq i} C_{i,j} v_j \qquad \text{and} \qquad v_i \leftarrow -\frac{g_i}{\|g_i\|}.$$

This always finds a fixed point of the equations. In fact, it is shown that almost surely this fixed point attains a global optimum of the optimisation problem.

An example of the above optimisation problem most relevant to us is a continuous relaxation of MAXSAT. MAXSAT is the problem of finding truth assignments to $n$ boolean variables $b_1, \ldots, b_n$. It assumes $m$ clauses over those variables, $F_1, \ldots, F_m$, where $F_\ell$ is the disjunction of some variables with or without negation:

$$F_\ell = b_{i_1} \lor \cdots \lor b_{i_p} \lor \neg b_{i_{p+1}} \lor \cdots \lor \neg b_{i_{p+q}}.$$

Then, MAXSAT asks for a truth assignment on the variables that maximises the number of clauses $F_\ell$ that are true under the assignment. The rules of many problems, including Sudoku, can be expressed as instances of MAXSAT. To apply the mixing method to MAXSAT, we introduce relaxed vectors $v_1, \ldots, v_n \in \mathbb{R}^k$ that encode the boolean variables, and construct the matrix $S \in \mathbb{R}^{m \times n}$ that encodes the $m$ clauses of MAXSAT: the $(\ell, j)$-th entry of $S$ is $1$ if $F_\ell$ contains $b_j$, $-1$ if $F_\ell$ contains $\neg b_j$, and $0$ if neither of these cases holds. Then, the problem in (1) is formed with $C = S^T S$, and solved by the mixing method.

SATNet is a variant of the mixing method where some of the columns of $V$ are fixed and the optimisation is over the rest of the columns.² Concretely, it assumes that the column indices in $[n]$ are split into two disjoint sets, $I$ and $O$ (i.e., $I \cup O = [n]$ and $I \cap O = \emptyset$). The inputs of SATNet are the columns $v_i$ of $V$ with $i \in I$, and the outputs are the rest of the columns (i.e., the $v_o$'s with $o \in O$). The symmetric matrix $C$ is the parameter of SATNet. Given the input vectors, SATNet repeatedly executes the coordinate descent updates on each output column until it converges.

² The original SATNet assumes that $C$ has the form $S^T S$ for some $m \times n$ matrix $S$. We drop this assumption and adjust the forward and backward computations of SATNet accordingly. The main steps of the derivations of the formulas for the forward and backward computations are from the work on SATNet [26].
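To make the relaxation concrete, here is a minimal NumPy sketch (ours, not from the paper) of the clause matrix $S$, the objective matrix $C = S^T S$, and the mixing-method updates. The clause set and dimensions are illustrative assumptions, and details of SATNet's actual MAXSAT layer (e.g., its extra truth-direction vector) are omitted.

```python
import numpy as np

# Three boolean variables b1, b2, b3 and three illustrative clauses,
# encoded as {variable index: sign}; e.g. (b1 OR NOT b2) is {0: 1, 1: -1}.
clauses = [{0: 1, 1: -1}, {1: 1, 2: 1}, {0: -1, 2: -1}]
n, m, k = 3, len(clauses), 4   # k: dimension of the relaxed vectors

# Clause matrix S: S[l, j] = 1 if F_l contains b_j, -1 if it contains NOT b_j.
S = np.zeros((m, n))
for l, clause in enumerate(clauses):
    for j, sign in clause.items():
        S[l, j] = sign
C = S.T @ S                     # objective matrix in <C, V^T V>

# Mixing method: coordinate descent over the unit-norm columns of V.
rng = np.random.default_rng(0)
V = rng.standard_normal((k, n))
V /= np.linalg.norm(V, axis=0)
for _ in range(100):
    for i in range(n):
        g = V @ C[:, i] - C[i, i] * V[:, i]   # g_i = sum_{j != i} C_{i,j} v_j
        V[:, i] = -g / np.linalg.norm(g)      # v_i <- -g_i / ||g_i||

print(np.trace(C @ V.T @ V))    # relaxed objective at the fixed point
```

Each sweep updates one column in closed form, so the method reaches a fixed point without any step-size tuning; SATNet's forward pass runs the same sweeps, but only over the output columns.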
One important feature of SATNet is that it has a custom algorithm for backpropagation. Let $V_I$ be the matrix of the input columns to SATNet, and $V_O$ that of the output columns computed by SATNet on the input $V_I$ under the parameter $C$. Assume that $l$ is a loss on the output $V_O$. In this context, SATNet provides formulas and algorithms for computing the derivatives $\partial l/\partial V_I$ and $\partial l/\partial C$.

We recall the formulas for the derivatives. Let $o_1 < o_2 < \ldots < o_{|O|}$ be the sorting of the indices in $O$. Assume that SATNet was run until convergence, so that the output columns in $V_O$ are a fixed point of the coordinate descent updates: for all $o \in O$,

$$g_o = \sum_{j \in [n],\, j \neq o} C_{o,j} v_j \qquad \text{and} \qquad v_o = -\frac{g_o}{\|g_o\|}.$$

The formulas for $\partial l/\partial V_I$ and $\partial l/\partial C$ at $(V_I, V_O, C)$ are defined in terms of the following quantities: $C', D \in \mathbb{R}^{|O| \times |O|}$, $P \in \mathbb{R}^{|O|k \times |O|k}$, $U \in \mathbb{R}^{|O|k \times 1}$, and $W \in \mathbb{R}^{|O|k \times n^2}$. They have the following definitions: for $i, j \in [|O|]$, $p, q \in [k]$, and $r, s \in [n]$,

$$(C')_{i,j} = \begin{cases} 0 & \text{if } i = j, \\ C_{o_i, o_j} & \text{if } i \neq j, \end{cases} \qquad\qquad (D)_{i,j} = \begin{cases} \|g_{o_i}\| & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases}$$

$$P_{(i-1)k+p,\,(j-1)k+q} = \begin{cases} (I_k - v_{o_i} v_{o_i}^T)_{p,q} & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases} \qquad\qquad U = \big(P\,((D + C') \otimes I_k)\big)^\dagger \, \frac{\partial l}{\partial\,\mathrm{vec}(V_O)},$$

$$W_{(i-1)k+p,\,(r-1)n+s} = \begin{cases} 0 & \text{if } r \neq o_i \text{ and } s \neq o_i, \\ V_{p,s} & \text{if } r = o_i \text{ and } s \neq o_i, \\ V_{p,r} & \text{if } r \neq o_i \text{ and } s = o_i, \\ 0 & \text{if } r = o_i \text{ and } s = o_i. \end{cases}$$

Here $\otimes$ is the Kronecker product, $(\cdot)^\dagger$ is the Moore-Penrose inverse (also known as the pseudoinverse), and $\mathrm{vec}(V_O)$ is the vector obtained by stacking the columns of $V_O$. Let $C_{O,I} \in \mathbb{R}^{|O| \times |I|}$ be obtained by restricting $C$ to the indices $(o, i)$ with $o \in O$ and $i \in I$. Then,

$$\frac{\partial l}{\partial\,\mathrm{vec}(C)} = U^T W, \qquad \frac{\partial l}{\partial\,\mathrm{vec}(V_I)} = U^T (C_{O,I} \otimes I_k). \tag{2}$$

SATNet computes the above derivative formulas efficiently by iterative algorithms.

### 2.2 Symmetries and equivariant maps

By symmetries on a set $X$, we mean a group $G$ that acts on $X$. The acting here refers to a function $\cdot : G \times X \to X$, called a group action, such that (i) $e \cdot x = x$ for the unit $e \in G$ and any $x \in X$, and (ii) $(g \circ g') \cdot x = g \cdot (g' \cdot x)$ for all $g, g' \in G$ and $x \in X$, where $\circ$ is the group operator of $G$.

We use symmetries of permutations on a finite set. The set $X$ in our case is $\mathbb{R}^{k \times n}$, the space of the matrix $V$ in (1), and $G$ is a subgroup of the group $S_n$ of all permutations on $[n]$. The group action $g \cdot V$ is then defined by permuting the columns of $V$ by $g$: for all $i \in [k]$ and $j \in [n]$, $(g \cdot V)_{i,j} = V_{i, g^{-1}(j)}$. This group action can be expressed compactly with the $n \times n$ permutation matrix $P_{g^{-1}}$, where $(P_{g^{-1}})_{i,j} = \mathbb{1}\{i = g^{-1}(j)\}$ and $g \cdot V = V P_{g^{-1}}$, for the indicator function $\mathbb{1}$. Throughout the paper, we often equate each element $g$ of $G$ with its permutation matrix $P_g$, and view $G$ itself as the group of permutation matrices $P_g$ for $g \in G$, with the standard matrix multiplication.
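The identity $g \cdot V = V P_{g^{-1}}$ is easy to check numerically. The sketch below (ours, with small illustrative dimensions) builds $P_{g^{-1}}$ directly from its definition:

```python
import numpy as np

n, k = 5, 3
rng = np.random.default_rng(0)
V = rng.standard_normal((k, n))

g = np.array([2, 0, 1, 4, 3])        # a permutation g on [n] (0-indexed)
g_inv = np.argsort(g)                # its inverse permutation

# Permutation matrix of g^{-1}: (P_{g^{-1}})[i, j] = 1 iff i = g^{-1}(j).
P_g_inv = np.zeros((n, n))
P_g_inv[g_inv, np.arange(n)] = 1.0

gV = V[:, g_inv]                     # (g . V)[:, j] = V[:, g^{-1}(j)]
assert np.allclose(gV, V @ P_g_inv)  # g . V = V P_{g^{-1}}
```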
One important reason for considering symmetries is to study maps that preserve these symmetries, called equivariant maps. Let $G$ be a group that acts on sets $X$ and $Y$.

**Definition 2.1.** A function $f : X \to Y$ is *$G$-equivariant* or *equivariant* if $f(g \cdot x) = g \cdot (f(x))$ for all $g \in G$ and $x \in X$. It is *$G$-invariant* or *invariant* if $f(g \cdot x) = f(x)$ for all $g \in G$ and $x \in X$.

The forms of equivariant maps have been studied extensively in the work on equivariant neural networks and group representation theory [6, 18, 27]. In particular, when $f$ is linear, various representation theorems for different $G$'s describe the matrix form of $f$. We use permutation groups defined inductively by the following three operations.

**Definition 2.2.** Let $G$ and $H$ be permutation groups on $[p]$ and $[q]$, with each group element viewed as a $p \times p$ or $q \times q$ permutation matrix. The direct sum $G \oplus H$, the direct product $G \otimes H$, and the wreath product $H \wr G$ are the following groups of $(p+q) \times (p+q)$ or $pq \times pq$ permutation matrices, with matrix multiplication as their composition:

$$G \oplus H = \{g \oplus h : g \in G, h \in H\}; \qquad G \otimes H = \{g \otimes h : g \in G, h \in H\}; \qquad H \wr G = \{\mathrm{wr}(\bar{h}, g) : g \in G, \bar{h} \in H^p\}.$$

Here $g \oplus h$ is the block diagonal matrix with the $p \times p$ matrix $g$ as its upper-left corner and the $q \times q$ matrix $h$ as its lower-right corner, $g \otimes h$ is the Kronecker product of the two matrices $g$ and $h$, and $\mathrm{wr}(\bar{h}, g)$ is the $pq \times pq$ permutation matrix defined by $\mathrm{wr}(\bar{h}, g)_{(i-1)q+j,\,(i'-1)q+j'} = \mathbb{1}\{g_{i,i'} = (h_i)_{j,j'} = 1\}$ for all $i, i' \in [p]$ and $j, j' \in [q]$.

The next theorem specifies the representation of $G$-equivariant linear maps for an inductively-constructed $G$, by describing a basis of those linear maps. For a permutation group $G$ on $[m]$, let $E(G) = \{M \in \mathbb{R}^{m \times m} : Mg = gM \text{ for all } g \in G\}$, the vector space of $G$-equivariant linear maps, where each $g \in G$ is regarded as a permutation matrix. See Appendix B for the proof of Theorem 2.3.

**Theorem 2.3.** Let $G, H$ be permutation groups on $[p]$ and $[q]$, and $B(G), B(H)$ be some bases of $E(G)$ and $E(H)$, respectively. Then, the following sets form bases for $E(G \oplus H)$, $E(G \otimes H)$, and $E(H \wr G)$:

$$B(G \oplus H) = \{A \oplus 0_q,\; 0_p \oplus B : A \in B(G), B \in B(H)\} \cup \{1_{O \times (p+O')},\; 1_{(p+O') \times O} : O \in O(G), O' \in O(H)\};$$
$$B(G \otimes H) = \{A \otimes B : A \in B(G), B \in B(H)\};$$
$$B(H \wr G) = \{A \otimes 1_{O \times O'} : A \in B(G), A_{i,i} = 0 \text{ for } i \in [p], \text{ and } O, O' \in O(H)\} \cup \{I_O \otimes B : B \in B(H), O \in O(G)\}.$$

Here $0_m$ is the everywhere-zero matrix in $\mathbb{R}^{m \times m}$, and $1_{R \times S}$, $I_R$ are the matrices defined by $(1_{R \times S})_{i,j} = \mathbb{1}\{i \in R, j \in S\}$ and $(I_R)_{i,j} = \mathbb{1}\{i = j, i \in R\}$, whose shapes are determined by the context in which they are used: $1_{O \times (p+O')}, 1_{(p+O') \times O} \in \mathbb{R}^{(p+q) \times (p+q)}$, $1_{O \times O'} \in \mathbb{R}^{q \times q}$, and $I_O \in \mathbb{R}^{p \times p}$. Also, $O(G) = \{\{g(i) : g \in G\} : i \in [p]\}$ (i.e., the set of $G$-orbits), and $p + O' = \{p + i : i \in O'\}$.

## 3 Symmetry-aware SATNet

In this section, we present SymSATNet, which abbreviates symmetry-aware SATNet. This variant is designed to operate when symmetries of a learning task are known a priori (via an algorithm or a domain expert). The proofs of the theorem and the lemma in this section are in Appendix C.

SymSATNet solves the optimisation problem of SATNet, but under the following assumption:

**Assumption 3.1.** The optimisation objective $\langle C, V^T V \rangle$ in (1), as a map on $V \in \mathbb{R}^{k \times n}$, is invariant under a permutation group $G$ whose action is of the type in Section 2.2 (i.e., each $g \in G$ acts as a permutation on the columns of $V$).

Continuing our convention, we denote $P_g$ by $g$. One immediate consequence of Assumption 3.1 is

$$\langle C, V^T V \rangle = \langle C, (g \cdot V)^T (g \cdot V) \rangle = \langle C, (V g^{-1})^T (V g^{-1}) \rangle$$

for all $g \in G$ and $V \in \mathbb{R}^{k \times n}$. The next theorem re-phrases this property of the optimisation objective as equivariance of $C$:

**Theorem 3.2.** Let $C$ be a symmetric $n \times n$ matrix. Then,

$$\langle C, V^T V \rangle = \langle C, (V g^{-1})^T (V g^{-1}) \rangle \tag{3}$$

for all $V \in \mathbb{R}^{k \times n}$ and $g \in G$ if $C$ as a linear map on $\mathbb{R}^n$ is $G$-equivariant, that is, $Cg = gC$ for all $g \in G$. Furthermore, if $k = n$, the converse also holds.

This theorem lets us incorporate the symmetries into the objective of SATNet, leaving only the handling of the diagonal constraints of SATNet. The next lemma says that those constraints require no special treatment, though, since they are already preserved by the action of any $g \in G$.

**Lemma 3.3.** Let $V \in \mathbb{R}^{k \times n}$ and $g \in G$. Every column of $V$ has $L_2$-norm $1$ if and only if every column of $V g^{-1}$ has $L_2$-norm $1$.

Recall that $E(G) = \{M \in \mathbb{R}^{n \times n} : Mg = gM \text{ for all } g \in G\}$ is the vector space of $G$-equivariant matrices. Let $E(G)_s$ be the subset of $E(G)$ containing only symmetric matrices.
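To illustrate Theorem 2.3 in its simplest case, the sketch below (ours, with the illustrative choice $G = S_2$, $H = S_3$) builds $B(G \otimes H)$ by Kronecker products and checks the equivariance condition $Mg = gM$ on group generators. That $\{I_m,\, 1_{m \times m} - I_m\}$ spans $E(S_m)$ is a standard fact assumed here; symmetrising with $B + B^T$ and Gram-Schmidt, as described next, would then yield $B(G)_s$.

```python
import numpy as np

def perm_matrix(p):
    """Permutation matrix P with P[p[i], i] = 1."""
    P = np.zeros((len(p), len(p)))
    P[p, np.arange(len(p))] = 1.0
    return P

# E(S_m) is spanned by the identity and the off-diagonal all-ones matrix.
def basis_sym_group(m):
    return [np.eye(m), np.ones((m, m)) - np.eye(m)]

p, q = 2, 3
B_G, B_H = basis_sym_group(p), basis_sym_group(q)

# Theorem 2.3, product case: B(G x H) = {A kron B : A in B(G), B in B(H)}.
B_GH = [np.kron(A, B) for A in B_G for B in B_H]

# Generators of S_2 x S_3, as pq x pq permutation matrices.
gens = [np.kron(perm_matrix(g), perm_matrix(h))
        for g in ([1, 0], [0, 1])       # swap and identity on [2]
        for h in ([1, 0, 2], [1, 2, 0])]  # transposition and 3-cycle on [3]

# Every basis element M must satisfy the equivariance condition M g = g M.
for M in B_GH:
    for g in gens:
        assert np.allclose(M @ g, g @ M)
print(len(B_GH), "basis elements, all equivariant")
```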
When $G$ is a permutation group constructed by direct sum, direct product, and wreath product, we can generate a basis $B(G)$ of $E(G)$ automatically using Theorem 2.3. Then, we can convert $B(G)$ to an orthogonal basis of $E(G)_s$ by applying Gram-Schmidt orthogonalisation to $\{B + B^T : B \in B(G)\}$. Let $B(G)_s = \{B_1, \ldots, B_d\}$ be such an orthogonal basis of $E(G)_s$.

SymSATNet is SATNet where the matrix $C$ in the optimisation objective has the form

$$C = \sum_{\alpha=1}^{d} \theta_\alpha B_\alpha \tag{4}$$

for some scalars $\theta_1, \ldots, \theta_d \in \mathbb{R}$. Note that by this condition on the form of $C$, SymSATNet has only $d$ parameters $\theta_1, \ldots, \theta_d$, instead of $nm$ for some $m$ in the original formulation of SATNet. When the learning target has enough symmetries, $d$ is usually far smaller than $n^2$ or even $n$, and this reduction brings speed-up and improved generalisation.

The forward computation of SymSATNet is precisely that of SATNet, the repeated coordinate-wise updates until convergence, and the backward computation is that of SATNet extended (by the chain rule) with a step backpropagating the derivatives $\partial l/\partial C$ to each $\partial l/\partial \theta_\alpha$ for $\alpha \in [d]$.³

³ SymSATNet is implemented based on the SATNet code [26], available under the MIT License.

We summarise SymSATNet below using the usual notation of SATNet ($I$, $O$, $V_I$, $V_O$, and $V$):

- The input is $V_I$, the matrix of the input columns of $V$.
- The parameters are $(\theta_1, \ldots, \theta_d) \in \mathbb{R}^d$. They define the matrix $C$ by (4).
- The forward computation solves the following optimisation problem using coordinate descent, and returns $V_O$, the matrix of the output columns of $V$:
  $$\operatorname*{argmin}_{V_O \in \mathbb{R}^{k \times |O|}} \; \langle C, V^T V \rangle \quad \text{subject to } \|v_o\| = 1 \text{ for } o \in O.$$
- The backward computation computes $\partial l/\partial V_I$ and $\partial l/\partial \theta_\alpha$ by (2) and the chain rule:
  $$\partial l/\partial \theta_\alpha = (\partial l/\partial\,\mathrm{vec}(C)) \cdot \mathrm{vec}(B_\alpha) = U^T W\, \mathrm{vec}(B_\alpha). \tag{5}$$
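A minimal sketch of the parameterisation (4) and the chain-rule step (5), in our own rendering: `basis`, `theta`, and `dl_dC` are assumed stand-ins for the orthogonal basis $B(G)_s$, the parameters, and the upstream gradient $U^T W$ reshaped to $n \times n$.

```python
import numpy as np

def make_C(theta, basis):
    """Eq. (4): C = sum_a theta_a * B_a."""
    return sum(t * B for t, B in zip(theta, basis))

def grad_theta(dl_dC, basis):
    """Eq. (5): dl/dtheta_a = vec(dl/dC) . vec(B_a), i.e. <dl/dC, B_a>."""
    return np.array([np.sum(dl_dC * B) for B in basis])

# Toy usage with a 2-element basis on n = 4 variables (illustrative only).
n = 4
basis = [np.eye(n), np.ones((n, n)) - np.eye(n)]
theta = np.array([0.5, -0.1])
C = make_C(theta, basis)

dl_dC = np.random.default_rng(0).standard_normal((n, n))  # stand-in gradient
print(grad_theta(dl_dC, basis))
```

Because the basis is fixed, only the $d$ scalars in `theta` are trained; everything else in SATNet's forward and backward passes stays unchanged.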
## 4 Discovery of Symmetries

One obstacle to using SymSATNet is that the user has to specify symmetries. We now discuss how to alleviate this issue by presenting an algorithm for discovering candidate symmetries automatically.

The goal of our algorithm, denoted SYMFIND, is to find a permutation group $G$ that captures the symmetries of an unknown learning target and is expressible by the following grammar:

$$G ::= I_m \mid Z_m \mid S_m \mid G \oplus G \mid G \otimes G \mid G \wr G \qquad \text{for } m \in \mathbb{N}.$$

Here $I_m$ denotes the trivial group containing only the identity permutation on $[m]$, and $Z_m$ denotes the group of cyclic permutations on $[m]$, each of which maps $i \in [m]$ to $(i + n) \bmod m$ for some $n$. The $S_m$ is the group of all permutations on $[m]$. The last three cases are direct sum, direct product, and wreath product (see Definition 2.2). They describe three ways of decomposing a group $G$ into smaller parts. Having such a decomposition of $G$ brings the benefit of recursively and efficiently computing a basis of $G$-equivariant linear maps.

The design of SYMFIND is based on our empirical observation that a softened version of symmetries often emerges in the parameter matrix $C$ of the original SATNet during training. Even in the early part of training, many entries of $C$ share similar values, and there is a large-enough group $G$ with $Cg \approx gC$ for all $g \in G$, which intuitively means that $G$ captures symmetries of $C$. Furthermore, we observed that such a $G$ often consists of symmetries of the learning target. This observation suggests an algorithm that takes $C$ as input and finds such a $G$ expressible in our grammar or its slight extension.

**Algorithm 1** SYMFIND with a threshold $\lambda > 0$

1. **Input:** $M \in \mathbb{R}^{m \times m}$. **Output:** $(G, \sigma)$.
2. **if** $\|\mathrm{prj}(\mathrm{grp}(S_m, \mathrm{id}_m), M) - M\|_F \leq \lambda$ **then**
3. **return** $(S_m, \mathrm{id}_m)$
4. **end if**
5. $A \leftarrow \{(I_m, \mathrm{id}_m)\}$
6. **if** $\|\mathrm{prj}(\mathrm{grp}(Z_m, \mathrm{id}_m), M) - M\|_F \leq \lambda$ **then**
7. $A \leftarrow A \cup \{(Z_m, \mathrm{id}_m)\}$
8. **end if**
9. $(G', \sigma') \leftarrow \mathrm{SUMFIND}(M)$
10. $A \leftarrow A \cup \{(G', \sigma')\}$
11. **for** every divisor $p$ of $m$ **do**
12. $(G', \sigma') \leftarrow \mathrm{PRODFIND}(M, p)$
13. $A \leftarrow A \cup \{(G', \sigma')\}$
14. **end for**
15. $(G, \sigma) \leftarrow \operatorname*{argmin}_{(G, \sigma) \in A} \dim(E(\mathrm{grp}(G, \sigma)))$
16. **return** $(G, \sigma)$

The input of SYMFIND is a matrix $M \in \mathbb{R}^{m \times m}$. As previously explained, when SYMFIND is called at the top level, it receives as input the parameter $C$ of SATNet learnt by a fixed number of training steps; subsequent recursive calls to SYMFIND may receive an input $M$ different from $C$. SYMFIND returns a group $G$ in our grammar and a permutation $\sigma$ on $[m]$, together defining a permutation group on $[m]$:

$$\mathrm{SYMFIND}(M) = (G, \sigma); \qquad \mathrm{grp}(G, \sigma) = \{\sigma \circ g \circ \sigma^{-1} : g \in G\},$$

where $\circ$ is the composition of permutations. When $G$ is decomposed into, say, $G_1 \oplus G_2$, the $\sigma$ specifies which indices in $[m]$ get permuted by $G_1$ and which by $G_2$. Once the top-level SYMFIND returns $(G, \sigma)$, we construct $B(\mathrm{grp}(G, \sigma))_s$ as in Section 3, with a minor adjustment with $\sigma$.⁴

⁴ We construct $B(\mathrm{grp}(G, \sigma)) = \{\sigma B \sigma^T : B \in B(G)\}$, which is an orthogonal basis for $E(\mathrm{grp}(G, \sigma))$.

Algorithm 1 describes SYMFIND, where $\mathrm{id}_m$ is the identity permutation on $[m]$, $\|\cdot\|_F$ is the Frobenius norm, $\dim(\mathcal{V})$ is the dimension of a vector space $\mathcal{V}$, and the Reynolds operator $\mathrm{prj}$ projects a matrix $M \in \mathbb{R}^{m \times m}$ orthogonally onto the subspace of $G$-equivariant $m \times m$ matrices:

$$\mathrm{prj}(G, M) = \frac{1}{|G|} \sum_{g \in G} g M g^T,$$

so that $\|\mathrm{prj}(G, M) - M\|_F$ computes the $L_2$ distance between the matrix $M$ and the space $E(G)$.

In lines 2-4, the algorithm first checks whether $S_m$ models the symmetries of the input $M$ accurately. If so, the algorithm returns $(S_m, \mathrm{id}_m)$. Otherwise, it assumes that an appropriate group for $M$'s symmetries is one of the remaining cases in the grammar, and constructs a list $A$ of candidates, initially containing the trivial group $(I_m, \mathrm{id}_m)$. In lines 6-8, the algorithm adds the pair $(Z_m, \mathrm{id}_m)$ to $A$ if it approximates $M$'s symmetries well. In lines 9-10, the algorithm calls the subroutine SUMFIND, which finds $\mathrm{grp}(G', \sigma')$ with $G' = \bigoplus_i G'_i$ that approximates $M$'s symmetries well. In lines 11-14, the algorithm calls the other subroutine PRODFIND for every divisor $p$ of $m$. For each $p$, PRODFIND finds $\mathrm{grp}(G', \sigma')$ with $G' = G'_1 \otimes G'_2$ (or $G'_2 \wr G'_1$) that approximates $M$'s symmetries well, where $G'_1$ and $G'_2$ are permutation groups on $[p]$ and $[m/p]$. Finally, in line 15, SYMFIND picks a pair $(G, \sigma)$ from the candidates $A$ with the strongest level of symmetries, in the sense that the basis of $\mathrm{grp}(G, \sigma)$-equivariant matrices has the fewest elements.

The subroutine SUMFIND clusters entries of $M$ into blocks, since block-shaped clusters commonly arise in matrices equivariant with respect to a direct sum of groups. The other subroutine PRODFIND uses a technique [24] that exploits a typical pattern of Kronecker products of matrices, and detects the presence of the pattern in $M$ by applying SVD to a reshaped version of $M$. Each subroutine may call SYMFIND recursively. See Appendix D for the details.
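The Reynolds projection in lines 2 and 6 is direct to realise when the group can be enumerated explicitly. Below is our own sketch, with $Z_m$ as the example group; a random $M$ is far from $E(Z_m)$, so its distance is large, whereas a matrix near the subspace would pass the $\lambda$ test.

```python
import numpy as np

def perm_matrix(p):
    """Permutation matrix P with P[p[i], i] = 1."""
    P = np.zeros((len(p), len(p)))
    P[p, np.arange(len(p))] = 1.0
    return P

def cyclic_group(m):
    """All m rotations of Z_m as permutation matrices."""
    return [perm_matrix(np.roll(np.arange(m), s)) for s in range(m)]

def reynolds(group, M):
    """prj(G, M) = (1/|G|) sum_g g M g^T."""
    return sum(g @ M @ g.T for g in group) / len(group)

m = 6
rng = np.random.default_rng(0)
M = rng.standard_normal((m, m))

G = cyclic_group(m)
P = reynolds(G, M)
print(np.linalg.norm(P - M))    # Frobenius distance from M to E(Z_m)
assert all(np.allclose(P @ g, g @ P) for g in G)  # projection is equivariant
```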
## 5 Experimental Results

We experimentally evaluated SymSATNet and the SYMFIND algorithm on the tasks of learning the rules of two problems, Sudoku and the completion problem of Rubik's cube. The original SATNet was used as a baseline, and both ground-truth and automatically-discovered symmetries were used for SymSATNet. For SYMFIND, we also tested its ability to recover known symmetries given randomly generated equivariant matrices. We observed significant improvement of SymSATNet over SATNet in various learning tasks, and also promising results and a limitation of SYMFIND.

Figure 1: Test accuracies over training epochs. (a) Sudoku; (b) Rubik's cube. (The plots compare SATNet-Plain, SATNet-300aux, SymSATNet, and SymSATNet-Auto.)

**Sudoku problem.** In Sudoku, we are asked to fill in the empty cells of a 9×9 board such that every row, every column, and each of the nine 3×3 blocks have all numbers 1-9. Let $A \in \{0,1\}^{9 \times 9 \times 9}$ be the encoding of a full number assignment for the board, where the $(i,j,k)$-th entry of $A$ is $1$ if the $(i,j)$-th cell of the board contains $k$. In SATNet, we flatten $A$ to an assignment on the $n = 9^3$ boolean variables, and relax each variable into $\mathbb{R}^k$, resulting in the $V \in \mathbb{R}^{k \times n}$ in the objective of SATNet.

The rules of Sudoku have symmetries formalised by $G = (S_3 \wr S_3) \otimes (S_3 \wr S_3) \otimes S_9$. Each of the two $S_3 \wr S_3$'s refers to solution-preserving permutations of the rows and columns in Sudoku. The last $S_9$ refers to permutations of the assigned numbers 1-9 in each cell. See Appendix F for more information about the symmetry group for Sudoku. To learn the rules of Sudoku using SymSATNet, we constructed a basis $B(G)_s$ as explained in Section 3. It has 18 elements, which means that SymSATNet has 18 parameters to learn.

Table 1: Best test accuracies during 100 epochs and average training times (10² sec). Additional times for automatic symmetry detection are reported after "+".

| Model | Sudoku acc. | Sudoku time | Cube acc. | Cube time |
|---|---|---|---|---|
| SATNet-Plain | 88.1% ± 1.8% | 48.0 ± 0.17 | 55.7% ± 0.7% | 1.8 ± 0.01 |
| SATNet-300aux | 97.9% ± 0.3% | 90.3 ± 0.68 | 56.5% ± 0.9% | 14.0 ± 0.12 |
| SymSATNet | 99.2% ± 0.2% | 25.6 ± 0.14 | 66.9% ± 1.2% | 1.1 ± 0.00 |
| SymSATNet-Auto | 99.5% ± 0.2% | 22.7 ± 0.35 (+0.14) | 68.1% ± 2.8% | 3.4 ± 0.19 (+0.66) |

We used 9K training and 1K test examples generated by the Sudoku generator [21]. Each example is a pair $(V_I, V_O)$, where the input $V_I$ assigns 31-42 cells (out of 81 cells) and the output $V_O$ specifies the remaining cells. SymSATNet was compared with SATNet-Plain, without auxiliary variables, and SATNet-300aux, with 300 auxiliary variables. We used the binary cross entropy loss and the Adam optimizer [16], with learning rate $\eta = 2 \times 10^{-3}$ for SATNet-Plain and SATNet-300aux as in the original work, and $\eta = 4 \times 10^{-2}$ for SymSATNet. We measured test accuracy, the rate of Sudoku instances correctly solved by the forward computation. We report the average over 10 runs with 95% confidence intervals.

The results are shown in Figure 1 and Table 1. (SymSATNet-Auto refers to a variant of SymSATNet that uses SYMFIND as a subroutine to find symmetries automatically, and will be described later in this section.) Our SymSATNet outperformed SATNet-Plain and SATNet-300aux. On average, its 100 epochs finished 2-4× faster in wall-clock time than the two alternatives, due to the reduced number of iterations and the avoidance of large matrix operations. See Appendix G for the efficiency of SymSATNet. Despite the speed-up, the best test accuracy of SymSATNet (99.2%) was significantly better than that of SATNet-Plain (88.1%) and slightly better than that of SATNet-300aux (97.9%).
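As a concrete check of these symmetries (our own illustration, independent of SATNet), permuting the nine digits of a valid Sudoku solution, or the rows within a band, preserves validity:

```python
import numpy as np

def is_valid(board):
    """board: 9x9 array with entries 1..9; check rows, columns, 3x3 blocks."""
    full = set(range(1, 10))
    for i in range(9):
        if set(board[i, :]) != full or set(board[:, i]) != full:
            return False
    for r in range(0, 9, 3):
        for c in range(0, 9, 3):
            if set(board[r:r+3, c:c+3].ravel()) != full:
                return False
    return True

# A valid solution built from the pattern (3*(i % 3) + i // 3 + j) % 9 + 1.
board = np.array([[(3 * (i % 3) + i // 3 + j) % 9 + 1 for j in range(9)]
                  for i in range(9)])
assert is_valid(board)

digit_perm = np.array([4, 8, 1, 3, 0, 7, 2, 6, 5])   # S_9 on the digits
assert is_valid(digit_perm[board - 1] + 1)

assert is_valid(board[[1, 2, 0] + list(range(3, 9)), :])  # rows in first band
assert is_valid(board[list(range(6)) + [7, 8, 6], :])     # rows in last band
```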
Let A {0, 1}6 9 6 be a colour assignment of Rubik s cube where the (i, j, k)-th entry has 1 if and only if the j-th facelet of the i-th face has colour k. We formulate the optimisation objective of SATNet for Rubik s cube using the relaxation of A to V Rk n for n = 6 9 6. This problem has symmetries formalised by G = R54 R6 on [n]. Here R54 and R6 are permutation groups on [54] and [6], each of which captures the allowed moves of facelets, and the rotations of the whole cube. If a colour assignment A is solvable, so is the transformation of A by any permutations in G. See Appendix F for more information about the symmetries of this problem. We generated a basis B(G)s in three steps. We first created B(R54) and B(R6) using the generators of each group [15]. Next, we combined them using Theorem 2.3 to get B(G), which was converted to a symmetric orthogonal basis B(G)s via Gram-Schmidt. The final result has 48 basis elements. We used a dataset of 9K training and 1K test examples generated by randomly applying moves to the solution of the cube. Each example is a pair (VI, VO) where VI assigns colours to facelets except for two corner facelets, two edge facelets, and one center facelet, and VO specifies the colours of those five missing facelets. In the test examples, only VI is used. We trained Sym SATNet, SATNet-Plain, and SATNet-300aux for 100 epochs, under the same configuration as in the Sudoku case. The results appear in Figure 1 and Table 1. On average, the 100-epoch training of Sym SATNet completed faster in the wall-clock time than those of SATNet-Plain and SATNet-300aux. Also, it achieved better test accuracies (66.9%) than these alternatives (55.7% and 56.5%). Note that unlike Sudoku, the test accuracy of SATNet-300aux was only marginally better than that of SATNet-Plain, which indicates that both suffered from the overfitting issue. Note also the sharp increase in the training time of SATNet-300aux. These two indicate that adding auxiliary variables is not so effective for the completion problem for Rubik s cube, while exploiting symmetries is still useful. Automatic discovery of symmetries To test the effectiveness of SYMFIND, we tested whether SYMFIND could find proper symmetries in Sudoku and Rubik s cube. We applied SYMFIND to the parameter C of SATNet-Plain in T-th training epoch, where T = 10 for Sudoku and T = 20 for Rubik s cube. For Sudoku, SYMFIND always recovered the full symmetries with G = (S3 S3) (S3 S3) S9 in our 10 trials. For Rubik s cube, the group of full symmetries is grp(G, σ) for G = ((S2 S3) (S3 S8) (S2 S12)) (S2 S3). SYMFIND recovered all the parts except S2 S12. Instead of this, the algorithm found S12 S2 or S3 S8, or S4 S6 in our 10 trials. We manually observed that the entries of C in the corresponding part were difficult to be clustered, violating the assumption of SYMFIND. This illustrates the fundamental limitation of SYMFIND. To account for the limitation of SYMFIND, we refined the group G of detected symmetries to a subgroup in an additional validation step, before training Sym SATNet with those symmetries. In the validation step, we checked the usefulness of each part Gi of the expression of G in our grammar. Concretely, we rewrote G only with the part Gi in concern, where all the other parts of G were masked by the trivial groups Ik. After projecting C with the masked groups using Reynolds operator, we measured the improvement of accuracy of SATNet over validation examples. Finally, we assembled only the parts Gi that led to sufficient improvement. 
We used 8K training, 1K validation, and 1K test examples to train SymSATNet with the symmetries found by SYMFIND and the validation step. We denote these runs by SymSATNet-Auto. We took the group $G$ discovered by SYMFIND at the $T$-th training epoch (with the same $T$ as above) and constructed its subgroup $G'$ via the validation step. SymSATNet was then trained after being initialised with the projection of $C$ by $G'$. The other configurations are the same as before.

As shown in Figure 1 and Table 1, SymSATNet-Auto performed the best for Sudoku (99.5%) and Rubik's cube (68.1%), better even than SymSATNet. During the 10 trials with Sudoku, SymSATNet-Auto was always given the full symmetries of Sudoku. For Rubik's cube, when SymSATNet-Auto was given correct subgroups (e.g., $((S_2 \otimes S_3) \oplus (S_3 \wr S_8) \oplus (I_4 \otimes I_6)) \otimes (S_2 \otimes S_3)$ or $((S_2 \otimes S_3) \oplus (S_3 \wr (S_2 \otimes S_4)) \oplus (I_8 \otimes I_3)) \otimes (S_2 \otimes S_3)$), it performed even better than SymSATNet. In two of the 10 trials, slightly incorrect symmetries were exploited, but it still outperformed SATNet-Plain and SATNet-300aux. These results show that the partial symmetries of subgroups derived by the validation step are still useful, even when they are slightly inaccurate.

Figure 2: Best test accuracies for noisy Sudoku and Rubik's cube datasets (x-axis: number of corruptions, 0-3). (a) Sudoku; (b) Rubik's cube. (The plots compare SATNet-Plain, SATNet-300aux, SymSATNet, and SymSATNet-Auto.)

**Robustness to noise.** We tested the robustness of SymSATNet and SymSATNet-Auto using noise-corrupted datasets. We generated noisy Sudoku and Rubik's cube datasets where each training example is corrupted with noise: the noise alters the value of a random cell, or the colour of a random facelet, to a random value other than the original. We used noisy datasets with 0-3 corruptions to measure the test accuracy, and tried 10 runs for each dataset to report the average and 95% confidence interval. All other setups are the same as before. Figure 2 shows the results. In both problems, SymSATNet was the most robust, showing remarkably consistent accuracies. SymSATNet-Auto showed robustness comparable to SATNet-300aux on noisy Sudoku, but outperformed the two baselines on noisy Rubik's cube.

Next, to test the robustness of SYMFIND itself, we applied it to restore permutation groups $G$ from noise-corrupted $G$-equivariant symmetric matrices $M$. We picked $(G_i, \sigma_i)$ for $i \in [4]$, where $\sigma_1, \sigma_2, \sigma_3$ are random permutations on $[15]$, $[30]$, $[12]$, and $\sigma_4$ is the identity permutation on $[8]$, and $G_1 = Z_3 \oplus Z_3 \oplus Z_3 \oplus Z_3 \oplus Z_3$, $G_2 = S_3 \otimes S_{10}$, $G_3 = (S_3 \otimes S_3) \oplus Z_3$, and $G_4 = S_2 \otimes S_2 \otimes S_2$. Then, we generated $\mathrm{grp}(G_i, \sigma_i)$-equivariant symmetric matrices $M_i$ by projecting random matrices with standard normal entries onto the space $E(\mathrm{grp}(G_i, \sigma_i))_s$. Gaussian noise from $N(0, \omega^2)$ with $\omega = 5 \times 10^{-3}$ was then added to the entries of $M_i$, and the resulting matrix $M'_i$ was given to SYMFIND.
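For concreteness, here is our own sketch of this test-matrix generation for $G_1$; the noise is added entrywise, as in the setup above.

```python
import itertools
import numpy as np
from scipy.linalg import block_diag

def perm_matrix(p):
    P = np.zeros((len(p), len(p)))
    P[p, np.arange(len(p))] = 1.0
    return P

def cyclic_group(m):
    return [perm_matrix(np.roll(np.arange(m), s)) for s in range(m)]

rng = np.random.default_rng(0)

# G_1 = Z_3 (+) Z_3 (+) Z_3 (+) Z_3 (+) Z_3 on [15], conjugated by a random sigma_1.
Z3 = cyclic_group(3)
G1 = [block_diag(*gs) for gs in itertools.product(Z3, repeat=5)]  # 3^5 elements
sigma = perm_matrix(rng.permutation(15))
G1 = [sigma @ g @ sigma.T for g in G1]

# grp(G_1, sigma_1)-equivariant symmetric matrix via the Reynolds projection.
M = rng.standard_normal((15, 15))
M = sum(g @ ((M + M.T) / 2) @ g.T for g in G1) / len(G1)

M_noisy = M + rng.normal(0.0, 5e-3, size=M.shape)   # omega = 5 * 10^-3
```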
Table 2: Full accuracies and partial accuracies of SYMFIND for the given groups over 1K runs.

| Group | Full acc. | Partial acc. |
|---|---|---|
| $\bigoplus_{i=1}^{5} Z_3$ | 76.6% | 79.2% |
| $S_3 \otimes S_{10}$ | 60.3% | 79.9% |
| $(S_3 \otimes S_3) \oplus Z_3$ | 77.5% | 87.0% |
| $S_2 \otimes S_2 \otimes S_2$ | 93.5% | 94.3% |

For each $(G_i, \sigma_i)$, we repeatedly generated $M'_i$ and ran SYMFIND on $M'_i$ 1K times, and measured the portion of runs where SYMFIND recovered $(G_i, \sigma_i)$ exactly (full accuracy), and also the portion of runs where SYMFIND returned a subgroup of $(G_i, \sigma_i)$ other than the trivial group $I_m$ (partial accuracy). As Table 2 shows, the measured full accuracies were in the range of 60.3-93.5%, and the partial accuracies in the range of 79.2-94.3%. These results show the ability of SYMFIND to recover meaningful, and sometimes full, symmetries.

**Transfer learning.** To test the transferability of SymSATNet, we generated Sudoku and Rubik's cube datasets of varying difficulty, where each dataset consisted of 9K training and 1K test examples. For SymSATNet-Auto, we split the 9K training examples into 8K training and 1K validation examples. We used three levels of difficulty (easy, normal, hard) for Sudoku and Rubik's cube, based on the number of missing cells for Sudoku or missing facelets for Rubik's cube. The input part of the Sudoku examples was generated with 21 (easy), 31 (normal), or 41 (hard) masked cells, and the input part of the Rubik's cube examples was generated with 3 (easy), 4 (normal), or 5 (hard) missing facelets. For both problems, we used the training examples of the easy or normal datasets for training, and the test examples of the hard datasets for testing. We repeated every task in this experiment five times, and report the average test accuracies and 95% confidence intervals.

Figure 3: Transfer learning with various difficulties of training and test examples; normal and easy examples were used to train, and hard examples to test, each model. (a) Normal Sudoku to hard Sudoku; (b) normal Rubik's cube to hard Rubik's cube; (c) easy Sudoku to hard Sudoku; (d) easy Rubik's cube to hard Rubik's cube.

Figure 3 shows the test accuracies throughout 100 epochs in the four types of transfer learning tasks. SymSATNet achieved the best results in all the tasks; it succeeded in solving hard problems after learning from easier examples. As Figures 3a and 3c indicate, SymSATNet-Auto exploited the full group symmetries in Sudoku. For Rubik's cube, Figure 3b shows that SymSATNet-Auto achieved better performance than the baselines by finding partial symmetries. These results show the promise of SymSATNet and SymSATNet-Auto for learning transferable rules, even from easier examples.

Note that on the easy Rubik's cube dataset, SymSATNet-Auto showed poor performance (Figure 3d). The poor performance comes from a violation of the assumption of SYMFIND: the group symmetries sometimes did not emerge in SATNet in this case. In three out of five trials with the easy Rubik's cube dataset, SATNet learnt nothing, producing 0% test accuracy, and SYMFIND returned the trivial group, which made SymSATNet-Auto equivalent to SATNet-Plain. In the remaining two trials, SATNet learnt correct rules, and SYMFIND and the validation step found correct partial symmetries, which led to the improved performance.
These results exhibit a fundamental limitation of SymSATNet-Auto: its performance strongly depends on that of the original SATNet.

## 6 Conclusion

We presented SymSATNet, a variant of SATNet that is capable of exploiting symmetries of the rules or constraints to be learnt by SATNet. We also described the SYMFIND algorithm for automatically discovering symmetries from the parameter $C$ of the original SATNet at a fixed training epoch, based on our empirical observation that symmetries emerge during training as duplicated or similar entries of $C$. Our experimental evaluations with two rule-learning problems related to Sudoku and Rubik's cube show the benefit of SymSATNet and the promise and limitation of SYMFIND. Although components of SYMFIND are motivated by the theoretical analysis of the space of equivariant matrices, such as Theorem 2.3, SYMFIND lacks a theoretical justification of its overall performance. One interesting future direction is to fill this gap by identifying when symmetries emerge during the training of SATNet and proving probabilistic guarantees on when SYMFIND returns correct group symmetries.

## Acknowledgements

This work was supported by the Engineering Research Center Program through the National Research Foundation of Korea (NRF), funded by the Korean Government MSIT (NRF-2018R1A5A1059921), and also by the Institute for Basic Science (IBS-R029-C1).

## References

[1] Fadi A. Aloul, Arathi Ramani, Igor L. Markov, and Karem A. Sakallah. Solving difficult instances of boolean satisfiability in the presence of symmetry. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 22(9):1117-1137, 2003.

[2] Fadi A. Aloul, Karem A. Sakallah, and Igor L. Markov. Efficient symmetry breaking for boolean satisfiability. IEEE Transactions on Computers, 55(5):549-558, 2006.

[3] Sourya Basu, Akshayaa Magesh, Harshit Yadav, and Lav R. Varshney. Group equivariant neural architecture search via group decomposition and reinforcement learning. arXiv e-prints, 2021.

[4] Belaïd Benhamou, Tarek Nabhani, Richard Ostrowski, and Mohamed Réda Saïdi. Dynamic symmetry breaking in the satisfiability problem. In Proceedings of the 16th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning (LPAR-16), Dakar, Senegal, 2010.

[5] Nuri Cingillioglu and Alessandra Russo. DeepLogic: Towards end-to-end differentiable logical reasoning. In Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019), 2019.

[6] Taco S. Cohen and Max Welling. Group equivariant convolutional networks. In Proceedings of ICML'16, pages 2990-2999, 2016.

[7] James Crawford. A theoretical analysis of reasoning by symmetry in first-order logic. In AAAI Workshop on Tractable Reasoning, pages 17-22, 1992.

[8] James M. Crawford, Matthew L. Ginsberg, Eugene M. Luks, and Amitabha Roy. Symmetry-breaking predicates for search problems. In Proceedings of the Fifth International Conference on Principles of Knowledge Representation and Reasoning, pages 148-159. Morgan Kaufmann Publishers Inc., 1996.

[9] Paul T. Darga, Mark H. Liffiton, Karem A. Sakallah, and Igor L. Markov. Exploiting structure in symmetry detection for CNF. In Proceedings of the 41st Annual Design Automation Conference, pages 530-534. Association for Computing Machinery, 2004.

[10] Paul T. Darga, Karem A. Sakallah, and Igor L. Markov. Faster symmetry discovery using sparsity of symmetries. In 2008 45th ACM/IEEE Design Automation Conference, pages 149-154. IEEE, 2008.
[11] Nima Dehmamy, Robin Walters, Yanchen Liu, Dashun Wang, and Rose Yu. Automatic symmetry discovery with Lie algebra convolutional network. In Advances in Neural Information Processing Systems, volume 34, 2021.

[12] Jo Devriendt, Bart Bogaerts, and Maurice Bruynooghe. Symmetric explanation learning: Effective dynamic symmetry handling for SAT. In International Conference on Theory and Applications of Satisfiability Testing, pages 83-100. Springer, 2017.

[13] David S. Dummit and Richard M. Foote. Abstract Algebra. Prentice Hall, Englewood Cliffs, NJ, 1991.

[14] Richard Evans and Edward Grefenstette. Learning explanatory rules from noisy data. Journal of Artificial Intelligence Research, 61:1-64, 2018.

[15] Marc Finzi, Max Welling, and Andrew Gordon Wilson. A practical method for constructing equivariant multilayer perceptrons for arbitrary matrix groups. In Proceedings of ICML'21, pages 3318-3328, 2021.

[16] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2015.

[17] Subrata Majumdar, Kalyan Kumar Dey, and Mohd Altab Hossain. Direct product and wreath product of transformation semigroups. GANIT: Journal of Bangladesh Mathematical Society, 31:1-7, 2012.

[18] Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. Invariant and equivariant graph networks. In Proceedings of ICLR'19, 2019.

[19] Brendan D. McKay. Backtrack programming and the graph isomorphism problem. 1976.

[20] Brendan D. McKay. Practical graph isomorphism. Congressus Numerantium, 30:45-87, 1981.

[21] Kyubyong Park. Can convolutional neural networks crack Sudoku puzzles? https://github.com/Kyubyong/sudoku, 2018.

[22] Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, and David L. Dill. Learning a SAT solver from single-bit supervision. In International Conference on Learning Representations, 2019.

[23] Sever Topan, David Rolnick, and Xujie Si. Techniques for symbol grounding with SATNet. In Advances in Neural Information Processing Systems, volume 34, pages 20733-20744. Curran Associates, Inc., 2021.

[24] Charles F. Van Loan and Nikos Pitsianis. Approximation with Kronecker products. In Linear Algebra for Large Scale and Real-Time Applications, pages 293-314. Springer, 1993.

[25] Po-Wei Wang, Wei-Cheng Chang, and J. Zico Kolter. The mixing method: low-rank coordinate descent for semidefinite programming with diagonal constraints. CoRR, abs/1706.00476, 2018.

[26] Po-Wei Wang, Priya L. Donti, Bryan Wilder, and J. Zico Kolter. SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. In Proceedings of ICML'19, pages 6545-6554, 2019.

[27] Renhao Wang, Marjan Albooyeh, and Siamak Ravanbakhsh. Equivariant networks for hierarchical structures. In Proceedings of NeurIPS'20, pages 13806-13817, 2020.

[28] Fan Yang, Zhilin Yang, and William W. Cohen. Differentiable learning of logical rules for knowledge base reasoning. In Proceedings of NeurIPS'17, pages 2319-2328, 2017.