# Structuring Representations Using Group Invariants

Mehran Shakerinava, Arnab Kumar Mondal, Siamak Ravanbakhsh
McGill University and Mila, Montréal, Canada
{mehran.shakerinava, arnab.mondal, siamak.ravanbakhsh}@mila.quebec

A finite set of invariants can identify many interesting transformation groups. For example, distances, inner products, and angles are preserved by Euclidean, Orthogonal, and Conformal transformations, respectively. In an equivariant representation, the group invariants should remain constant on the embedding as we transform the input. This gives a procedure for learning equivariant representations without knowing the possibly nonlinear action of the group in the input space. Rather than enforcing such hard invariance constraints on the latent space, we show how to use invariants for symmetry regularization of the latent space while guaranteeing equivariance through other means. We also show the feasibility of learning disentangled representations using this approach and provide favorable qualitative and quantitative results on downstream tasks, including world modeling and reinforcement learning.

## 1 Introduction

Sample-efficient representation learning is a critical open challenge in deep learning for AI. When we have prior information about transformations that are relevant to a particular domain, building representations that are aware of these transformations can lead to better sample efficiency and generalization. One way to use such symmetry priors is to make the network invariant to the given transformations. A generalization of this idea is called equivariance, where transforming the input transforms the output in a specific way. An equivariant network that makes good predictions for a particular input also generalizes to all transformations of that input, making symmetry a useful prior.

While recent years have witnessed a range of exciting equivariant deep models, there are several limitations. First, most equivariant networks constrain the network architecture, often requiring specialized implementations. Moreover, the transformations considered by existing methods are often assumed to be linear in both the input and representation space. This is the case for architectures designed for finite permutation groups and continuous Lie groups. Approaches that go beyond linear transformations in the input space often assume access to group information, i.e., the group member that transforms one input to another is known. This paper introduces a simple approach that addresses all of these limitations.

Our approach uses the invariants of a given linear representation of a transformation group. Invariants were previously used to connect different geometries and group theory in Klein's Erlangen program [32]. According to this view, geometries are concerned with quantities that are invariant under certain transformations. For example, Euclidean geometry is concerned with the length, angle, and parallelism of lines, among others, because Euclidean transformations preserve these. However, moving to the more general and less structured affine geometry, notions of distance and angle are no longer relevant, while parallelism remains an invariant of the geometry. The corresponding symmetry groups are examples of Lie groups that have a subgroup relation, $E(n) < \mathrm{Aff}(n)$, thereby enabling the groups to characterize a hierarchy (or lattice) of different geometries.

*These authors contributed equally to this work.*
From this geometric perspective, our proposal in this work is to induce a geometry on the embedding and make it equivariant to a given group by enforcing the invariants of the group's defining action. For example, distance is the invariant of Euclidean geometry, which means all distance-preserving transformations are Euclidean. Therefore, to enforce equivariance to the Euclidean group, it is sufficient to ensure that the embeddings of any two data points have the same distance before and after the same transformation of the inputs; see Figure 1. While this approach uses the defining action of different groups in the embedding space, the same group can have a non-linear and unknown action on the input space. In the pendulum example of Figure 1, the group $E(3)$ acts on the value of each input image pixel using an unknown and non-linear action. Moreover, this approach does not require the pairing of group members with transformations, a piece of information that is often unavailable.

Figure 1: $E(3)$-equivariant embedding for the pendulum. The input $x$ consists of a pair of images that identify both the angle and the angular velocity of a pendulum. The equivariant embedding learns to encode both: the true angle is shown by a change of color and the angular velocity by a change of brightness. The two circular ends (black and white) correspond to states of maximum angular velocity in opposite directions. The SymReg objective for the Euclidean group learns this embedding by preserving the pairwise distance between the codes before, $(f(x), f(x'))$, and after, $(f(t_X(g, x)), f(t_X(g, x')))$, transformations of the input by $t_X$. Therefore, the dashed lines have equal lengths. For the pendulum, the transformations take the form of applying positive or negative torque in some range.

In the rest of the paper, we arrive at the idea above from a different path: after reviewing related works in Section 2 and providing background in Section 3, Section 4 observes that equivariance, in its general form, can be a weak inductive bias. This is because having an injective code is sufficient for equivariance to any transformation group. However, in this manifestation of equivariance, the group action on the embedding can be highly non-linear. Since the simplicity of the action on the embedding seems essential for equivariance to become a useful learning bias, Section 5 proposes to regularize the group action on the code to make it "simple". This symmetry regularization (SymReg) objective is group-dependent and the essence of our approach. Enforcing geometric invariants in the latent space is proposed as a symmetry regularization. While we focus on equivariant representation learning through self-supervision, in principle, supervised tasks can also benefit from the proposed SymReg. An important benefit of a symmetry-based representation is its ability to produce disentangled representations through group decomposition [27]. Section 6 studies disentanglement using SymReg. Section 7 presents a range of experiments to understand its behavior and puts it in the context of comparable baselines.

## 2 Related Works

Finding effective priors and objectives for deep representation learning is an integral part of the quest for AI [3]. Among these priors, learning equivariant deep representations has been the subject of many works over the past decade.
Many recent efforts in this direction have focused on the design of equivariant maps [57, 13, 47, 34, 15, 23, 54, 19, 6] where the linear action of the group on the data is known. A particularly relevant example here is Villar et al. [54], which uses group invariants to construct equivariant maps where the group acts through its linear defining action in the input space. Due to this constraint, the application of these models has been focused on fixed geometric data such as images [36], sets [60, 45], graphs [39, 33], spherical data and the (special) orthogonal group [14, 1, 50, 22], the Euclidean group [52, 55, 24], or other physically motivated groups such as the Lorentz [4] or Poincaré group [54], among others. In the present work, the group action is unknown and possibly non-linear.

Our setup is closer to the body of work on generative representation learning [7, 11, 40], in which the (linear) transformation is applied to the latent space [46, 58, 35, 37, 16, 21]. Among these generative coding methods, the transforming autoencoder [29] is a closely related early work, which, in addition to equivariance, seeks to represent the part-whole hierarchy in the data. What additionally contrasts our work with the follow-up works on capsule networks [48, 38] is that SymReg is agnostic to the choice of architecture and training; we rely only on our objective function to enforce equivariance.

Since we consider learning equivariant representations through self-supervision, exciting recent progress in this area is also quite relevant [25, 42, 9, 53, 26, 61, 20, 41]. While the use of transformations is prominent in these works, in many settings the objective encourages invariance to certain transformations, making such models useful for invariant downstream tasks such as classification. Similar to many of these methods, we also use transformed pairs to learn a representation, with the distinction of learning an equivariant representation. An exception is the recent work of Dangovski et al. [17], which learns an equivariant representation by separating the invariant embedding from the pose, where the relative pose is learned through supervision. Therefore, in that work, in contrast to ours, one needs to know the transformation that maps one input to another.

When considering the Euclidean group, SymReg preserves distances in the embedding space under non-linear transformations of the input. This embedding should not be confused with an isometric embedding [51], where the objective is to maintain the pairwise distances between points across the input and the embedding space.

## 3 Background on Symmetry Transformations

We can think of transformations as a set of bijective maps on a domain $X$. Since these maps are composable, we can identify their compositional structure using an abstract group $G$. For this reason, such transformations are called group actions. To formally define transformation groups, we first define an abstract group. A group $G$ is a set equipped with a binary operation such that: the set is closed under the operation, $g g' \in G\ \forall g, g' \in G$; every $g \in G$ has a unique inverse such that $g g^{-1} = e$, where $e$ is the identity element of the group; and the group operation is associative, $(g g') g'' = g (g' g'')$. A $G$-action on a set $X$ is defined by a function $t: G \times X \to X$, which can be thought of as a bijective transformation parameterized by $g \in G$.
In order to maintain the group structure, the action should satisfy the following two properties: (1) the action of the identity is the identity transformation, $t(e, x) = x$; (2) the composition of two actions is equal to the action of the composition of group elements, $t(g, t(g', x)) = t(g g', x)$. The action $t$ is faithful to $G$ if the transformations of $X$ using each $g \in G$ are unique, i.e., $\forall g \neq g'\ \exists x \in X$ s.t. $t(g, x) \neq t(g', x)$. If a $G$-action is defined on a set $X$, we call $X$ a $G$-set. Many groups are defined through their defining action; for example, $SO(3)$ is the group of rotations in 3D space. While this defining action is a linear transformation, the same group can act non-linearly on $\mathbb{R}^n$ through an action $t: SO(3) \times \mathbb{R}^n \to \mathbb{R}^n$.

## 4 Equivariance is Cheap, Actions Matter

A symmetry-based representation or embedding is a function $f: X \to Z$ such that both $X$ and $Z$ are $G$-sets, and furthermore, $f$ knows about the $G$-actions, in the sense that transformations of the input using $t_X$ have the same effect as transformations of the output using some action $t_Z$:

$$f(t_X(g, x)) = t_Z(g, f(x)) \quad \forall g \in G,\ x \in X \qquad (1)$$

The following claim shows that, despite many efforts in designing equivariant networks, simply asking for the representation to be equivariant is not a strong inductive bias, and we argue that the action matters. Put another way, the strong performance of existing equivariant networks should be attributed to the fact that the group action on the embedding space is simple (linear).

**Proposition 4.1.** Given a transformation group $t_X: G \times X \to X$, the function $f: X \to Z$ is an equivariant representation if

$$\forall g \in G,\ x, x' \in X: \quad f(x) = f(x') \iff f(t_X(g, x)) = f(t_X(g, x')). \qquad (2)$$

That is, two embeddings are identical iff they are identical for all transformations. The proof is in the appendix. The condition above is satisfied by all injective functions, indicating that many functions are equivariant to any group.

**Corollary 4.2.** Any injective function $f: X \to Z$ is equivariant to any transformation group $t_X: G \times X \to X$, if we define the $G$-action on the embedding space as

$$t_Z(g, z) \doteq f(t_X(g, f^{-1}(z))) \quad \forall g \in G,\ z \in Z \qquad (3)$$

The ramifications of the above results for what follows are two-fold:

1. While injectivity ensures equivariance, the group action on the embedding, as shown in Equation (3), can become highly non-linear. Intuitively, this action recovers $x = f^{-1}(z)$, applies the group action $x' = t_X(g, x)$ in the input domain, and maps back to the embedding space with $f(x')$ to ensure equivariance. In the following, we push $t_Z$ towards a simple linear $G$-action through optimization of $f$. This objective can be interpreted as a symmetry regularization or a symmetry prior (SymReg).

2. Although Corollary 4.2 uses injectivity of $f$ on the entire $X$, we only need this on the data manifold. In practice, one could enforce injectivity on the training dataset $D$ using a decoder, architectural choices such as a momentum encoder [26], or loss functions defined on the training data, such as a hinge loss [25], $L_{\mathrm{hinge}}(f, D) = \sum_{x, x' \neq x \in D} \max(\epsilon - \|f(x) - f(x')\|, 0)$, or other losses that monotonically decrease with distance, such as $\frac{1}{\|f(x) - f(x')\|}$, or its logarithm, $-\log(\|f(x) - f(x')\|)$. In experiments, we use the logarithmic barrier function; a sketch of these losses is given below.
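To make the injectivity-enforcing losses above concrete, here is a minimal sketch in PyTorch (our choice of framework, which the paper does not prescribe), operating on a batch of codes $f(x)$. The margin `eps`, the numerical floor `min_dist`, and averaging rather than summing over pairs are illustrative assumptions.

```python
import torch

def pairwise_distances(z: torch.Tensor) -> torch.Tensor:
    """Euclidean distances between all pairs of rows in a batch of codes z: (B, d)."""
    return torch.cdist(z, z, p=2)

def hinge_injectivity_loss(z: torch.Tensor, eps: float = 1.0) -> torch.Tensor:
    """L_hinge: max(eps - ||f(x) - f(x')||, 0), averaged over distinct pairs."""
    dist = pairwise_distances(z)
    off_diag = ~torch.eye(z.shape[0], dtype=torch.bool, device=z.device)
    return torch.clamp(eps - dist[off_diag], min=0.0).mean()

def log_barrier_injectivity_loss(z: torch.Tensor, min_dist: float = 1e-6) -> torch.Tensor:
    """Logarithmic barrier -log ||f(x) - f(x')||, the variant used in the experiments."""
    dist = pairwise_distances(z)
    off_diag = ~torch.eye(z.shape[0], dtype=torch.bool, device=z.device)
    return -torch.log(dist[off_diag].clamp(min=min_dist)).mean()
```

In practice, one of these terms would be added, with a weighting coefficient, to the SymReg objectives of the next section.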
## 5 Symmetry Regularization Objectives

In learning equivariant representations, we often do not know the abstract group $G$ and how it transforms our data, $t_X$. We assume that one can pick a reasonable abstract group $G$ that contains the ground-truth abstract group acting on the data, i.e., the $G$-action on the input may not be faithful. Our goal is to learn an $f: X \to Z$ that is equivariant w.r.t. the actions $t_X, t_Z$, where $t_X: G \times X \to X$ is unknown and $t_Z$ is some (simple) $G$-action on $Z$ of our choosing.

**More Informed but Less Practical Setting.** In the most informed case, the dataset also contains information about which group member $g \in G$ can be used to transform $x$ to $x_t$; that is, the dataset consists of triples $(x, g, x_t = t_X(g, x))$. By having access to this information, we can regularize the embedding using the following loss function:

$$L^{\mathrm{informed}}_G(f, D) = \sum_{(x, g, x_t) \in D} \ell\big(f(x_t) - t_Z(g, f(x))\big),$$

where $\ell$ is an appropriate loss function, such as the square loss. At its minimum, we have $f(x_t) = t_Z(g, f(x))$, or $f(t_X(g, x)) = t_Z(g, f(x))$, enforcing the equivariance condition of Equation (1). However, even if the optimal value is not reached, due to its injectivity, $f$ is still $G$-equivariant, and the objective above regularizes the $G$-action on the code. This informed setup is used in the equivariant contrastive learning of [17]. The assumption of having access to $g$ is realistic when we know the action $t_X$, so that we can generate $(x, g, x_t)$ triplets. Fortunately, using group invariants, we may still learn an equivariant embedding even if we do not have the group information tied to the dataset. Here, we first introduce our method for several well-known groups and then elaborate on the more general treatment.

**Example 1 (Euclidean Group).** The defining action of the Euclidean group $E(n)$ is the set of transformations that preserve the Euclidean distance between any two points in $\mathbb{R}^n$, a.k.a. isometries. These transformations are compositions of translations, rotations, and reflections. Since, in the real domain, all Euclidean isometries are linear and belong to $E(n)$, we can enforce the group structure on the embedding by ensuring that distances between the embeddings before and after any transformation match. For this, we need the dataset $D$ to be a set of pairs of pairs $((x, x_t = t_X(g, x)), (x', x'_t = t_X(g, x')))$, where $x, x'$ are transformed using the same unknown group member $g$. The distance-preservation loss below, combined with an injectivity-enforcing loss, is sufficient to produce an $E(n)$-regularized embedding:

$$L_{E(n)}(f, D) = \sum_{((x, x_t), (x', x'_t)) \in D} \ell\big(\underbrace{\|f(x) - f(x')\|}_{\text{distance before the transformation}} - \underbrace{\|f(x_t) - f(x'_t)\|}_{\text{distance after the transformation}}\big) \qquad (4)$$

For example, in the standard RL setup, where we have access to triplets $(s, a, s')$, we can easily form $D$ by unrolling an episode and collecting two different state transitions corresponding to a particular action. In practice, with a finite number of actions, we can efficiently generate this dataset by keeping a separate buffer for each action, where we store state transitions for that action and sample from that buffer to train the embedding function $f$. We provide the algorithm in Appendix C.

**Example 2 (Orthogonal and Unitary Groups).** The defining action of the orthogonal group $O(n)$ preserves the inner product between two vectors. The analogous group in the complex domain is the unitary group, which preserves the complex inner product. Our symmetry-regularization objective enforces this invariant:

$$L_{O(n)}(f, D) = \sum_{((x, x_t), (x', x'_t)) \in D} \ell\big(\langle f(x), f(x')\rangle - \langle f(x_t), f(x'_t)\rangle\big).$$

For the unitary group, one additionally needs to embed into the complex domain $Z = \mathbb{C}^n$, where the only difference is in the definition of the inner product.
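For concreteness, a minimal sketch of the informed objective $L^{\mathrm{informed}}_G$ introduced at the start of this section, with the square loss and assuming the chosen linear latent action $t_Z$ is supplied as a batch of matrices; the encoder `f` and all names here are our illustrative assumptions, not fixed by the paper.

```python
import torch

def informed_symreg_loss(f, x, g_mats, x_t):
    """L^informed_G: square loss between f(x_t) and t_Z(g, f(x)), where the
    chosen latent action t_Z(g, z) = g_mats @ z is linear.
    x, x_t: input batches with x_t = t_X(g, x); g_mats: (B, d, d) matrices."""
    z = f(x)                                                 # (B, d)
    z_pred = torch.bmm(g_mats, z.unsqueeze(-1)).squeeze(-1)  # t_Z(g, f(x))
    return ((f(x_t) - z_pred) ** 2).sum(dim=-1).mean()       # square loss
```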
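Likewise, a sketch of the uninformed invariant-matching objectives of Examples 1 and 2, again with the square loss; each batch element holds two input pairs transformed by the same unknown group element, matching the dataset layout of Equation (4).

```python
import torch

def euclidean_symreg_loss(f, x, x2, x_t, x2_t):
    """L_E(n) (Equation 4): match pairwise code distances before/after t_X."""
    d_before = (f(x) - f(x2)).norm(dim=-1)
    d_after = (f(x_t) - f(x2_t)).norm(dim=-1)
    return ((d_before - d_after) ** 2).mean()

def orthogonal_symreg_loss(f, x, x2, x_t, x2_t):
    """L_O(n): match code inner products before/after t_X."""
    ip_before = (f(x) * f(x2)).sum(dim=-1)
    ip_after = (f(x_t) * f(x2_t)).sum(dim=-1)
    return ((ip_before - ip_after) ** 2).mean()
```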
**Example 3 (Conformal Group).** The invariant of conformal geometry is the angle. In a Euclidean embedding, conformal transformations comprise combinations of translation, rotation, dilation, and inversion with respect to an $(n-1)$-sphere. To enforce this group structure, we need triplets of inputs before and after a transformation, $((x, x_t), (x', x'_t), (x'', x''_t))$, so that we can calculate an angle in the embedding. The conformal SymReg objective, which preserves angles, imposes a weaker constraint on the embedding than the distance preservation of the Euclidean group, since the latter implies the former. Moreover, it has the additional benefit that, compared to $L_{E(n)}$, the loss cannot be minimized by simply shrinking the embedding. Therefore, in practice, the injectivity-enforcing losses of Section 4 are not as crucial when using conformal symmetry regularization.

### 5.1 General Setting

Given a group $G$ acting linearly on a vector space $Z$, the invariant polynomials associated with this action are those polynomials satisfying $P(t_Z(g, z)) = P(z)\ \forall g \in G$. These polynomials form an algebra studied in the field of invariant theory [56, 44]. In particular, a relevant problem is the question of whether there exists a finite set of bases for the invariant polynomials of a given group representation. This question was one of Hilbert's 23 problems, and it was answered affirmatively by Hilbert himself for linear reductive groups, which include the classical Lie groups [28]. Our proposal, in its most general form, is to ensure the invariance of polynomial bases within the orbits of the latent space before and after transformation of the input.

Some examples of classical Lie groups and their invariants are: volume and orientation preservation by the special linear group, where the corresponding invariant polynomial is the determinant; the Lorentz and Poincaré groups, which are the analogs of the orthogonal and Euclidean groups in Minkowski space, respectively, and are therefore equipped with similar invariants; and the symplectic group, which preserves another bilinear form. Finite groups also possess invariants. We show this use of invariants for SymReg through the important example of the symmetric group.

**Example 4 (Symmetric Group).** Symmetric polynomials $P(z_1, \ldots, z_n)$ that are invariant under all permutations of the variables have a finite set of elementary bases: $e_1(z) = \sum_{1 \leq j \leq n} z_j$, $e_2(z) = \sum_{1 \leq j < k \leq n} z_j z_k, \ldots$
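As a closing illustration of the general recipe, here is a hedged sketch of a few of the invariants discussed in this section as differentiable functions of codes, together with a generic loss that matches any invariant before and after the (unknown) input transformation. The `eps` constants are our numerical-stability choices, and the functions are illustrative rather than the paper's implementation.

```python
import torch

def angle_invariant(z, z2, z3, eps: float = 1e-7):
    """Conformal invariant: the angle at z formed by the directions to z2 and z3."""
    u, v = z2 - z, z3 - z
    cos = (u * v).sum(-1) / (u.norm(dim=-1) * v.norm(dim=-1) + eps)
    return torch.acos(cos.clamp(-1.0 + eps, 1.0 - eps))

def determinant_invariant(z_stack):
    """SL(n) invariant: determinant of n codes stacked as (B, n, n) matrices."""
    return torch.linalg.det(z_stack)

def elementary_symmetric_invariants(z):
    """Symmetric-group invariants of Example 4 for codes z: (B, n):
    e1 = sum_j z_j and e2 = sum_{j<k} z_j z_k, via the Newton-Girard identity
    e2 = (e1^2 - sum_j z_j^2) / 2."""
    e1 = z.sum(dim=-1)
    e2 = 0.5 * (e1 ** 2 - (z ** 2).sum(dim=-1))
    return torch.stack((e1, e2), dim=-1)

def invariant_matching_loss(invariant, codes_before, codes_after):
    """Generic SymReg term: square loss between an invariant evaluated on codes
    before and after the same unknown transformation of the inputs."""
    inv_b, inv_a = invariant(*codes_before), invariant(*codes_after)
    return ((inv_b - inv_a) ** 2).mean()
```

For the conformal case, `codes_before` would be the triple $(f(x), f(x'), f(x''))$ and `codes_after` its transformed counterpart, mirroring the triplet dataset of Example 3.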