# Equivariance Through Parameter-Sharing

Siamak Ravanbakhsh, Jeff Schneider, Barnabás Póczos

School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA, USA 15217. Correspondence to: Siamak Ravanbakhsh. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017.

Abstract

We propose to study equivariance in deep neural networks through parameter symmetries. In particular, given a group $G$ that acts discretely on the input and output of a standard neural network layer $\phi_W: \mathbb{R}^M \to \mathbb{R}^N$, we show that $\phi_W$ is equivariant with respect to the $G$-action iff $G$ explains the symmetries of the network parameters $W$. Inspired by this observation, we then propose two parameter-sharing schemes to induce the desirable symmetry on $W$. Our procedure for tying the parameters achieves $G$-equivariance and, under some conditions on the action of $G$, it guarantees sensitivity to all other permutation groups outside $G$.

Given enough training data, a multi-layer perceptron would eventually learn the domain invariances in a classification task. Nevertheless, the success of convolutional and recurrent networks suggests that encoding domain symmetries through shared parameters can significantly boost the generalization of deep neural networks. The same observation can be made in deep learning for semi-supervised and unsupervised learning in structured domains. This raises an important question that is addressed in this paper: what kind of prior on the input/output structure can be encoded through parameter-sharing? This work is an attempt at answering this question when our priors take the form of discrete domain symmetries.

To formalize this type of prior, a family of transformations of the input and output of a neural layer is expressed as a group action on the input and output. The resulting neural network is invariant to this action if transformations of the input within that particular family do not change the output (e.g., rotation-invariance). However, if the output is transformed in a predictable way as we transform the input, the neural layer is equivariant to the action of the group. Our goal is to show that parameter-sharing can be used to achieve equivariance to any discrete group action.

Application of group theory in machine learning has been the topic of various works in the past (e.g., Kondor, 2008; Bartók et al., 2010). In particular, many probabilistic inference techniques have been extended to graphical models with known symmetry groups (Raedt et al., 2016; Kersting et al., 2009; Bui et al., 2012; Niepert, 2012). Deep and hierarchical models have used a variety of techniques to study or obtain representations that isolate transformations from the content (e.g., Hinton et al., 2011; Jayaraman & Grauman, 2015; Lenc & Vedaldi, 2015; Agrawal et al., 2015). The simplest method of achieving equivariance is data-augmentation (Krizhevsky et al., 2012; Dieleman et al., 2015). Going beyond augmentation, several methods directly apply the group action, in one way or another, by transforming the data or its encodings using the group members (Jaderberg et al., 2015; Anselmi et al., 2013; Dieleman et al., 2016). An alternative path to invariance is via harmonic analysis; in particular, cascades of wavelet transforms are investigated in (Bruna & Mallat, 2013; Oyallon & Mallat, 2015; Sifre & Mallat, 2013).
More recently, Cohen & Welling (2016b) study steerable filters (e.g., Freeman et al., 1991; Hel-Or & Teo, 1998) as a general means for achieving equivariance in deep networks. Invariance and equivariance through parameter-sharing is also discussed in several prior works (Cohen & Welling, 2016a; Gens & Domingos, 2014). The desirability of using parameter-sharing for this purpose is mainly due to its simplicity and computational efficiency. However, it also suggests possible directions for discovering domain symmetries through regularization schemes.

Following the previous work on the study of symmetry in deep networks, we rely on group theory and group actions to formulate the invariances and equivariances of a function. Due to the discrete nature of parameter-sharing, our treatment here is limited to permutation groups. The action of a permutation group $G$ can model discrete transformations of a set of variables, such as translation and $90^\circ$ rotation of pixels around any center in an image. If the output of a function transforms with a $G$-action as we transform its input with a different $G$-action, the function is equivariant with respect to the action of $G$. For example, in a convolution layer, as we translate the input, the feature-maps are also translated. If the output does not transform at all, the function is invariant to the action of $G$; therefore, invariance is a special equivariance. In this example, different translations correspond to the action of different members of $G$.

The novelty of this work is its focus on model symmetry as a gateway to equivariance. This gives us new theoretical guarantees for a strict notion of equivariance in neural networks. The core idea is simple: consider a colored bipartite graph $\Omega$ representing a neural network layer, where edges of the same color represent tied parameters. This neural network layer, as a function, is equivariant to the actions of a given group $G$ (and nothing more) iff the action of $G$ is the symmetry group of $\Omega$; i.e., there is a simple bijection between parameter symmetries and equivariances of the corresponding neural network. The problem then boils down to designing colored bipartite graphs with given symmetries, which constitutes a major part of this paper. Fig. 1 demonstrates this idea.¹ For the necessary background on group theory see the Appendix.

Figure 1. Summary: given a group action on the input and output of a neural network layer, define a parameter-sharing for this layer that is equivariant to these actions. (left) $G = D_5$ is a dihedral group, acting on a $4 \times 5$ input image and an output vector of size 5. $N$ and $M$ denote the index sets of the input and output variables respectively. Here $G$ is represented using its Cayley diagram. (middle-left) The $G$-action for $g \in G$ is shown for an example input. The $G$-action on the input is a combination of circular shifts (blue arrows) and vertical flips (red arrows) of the 2D image. $G$ acts on the output indices $M$ only through circular shifts. A permutation group $G_{N,M}$ encodes the simultaneous action of $G$ on the input and output indices. (middle-right) The structure $\Omega$ designed using our procedure, such that its symmetries $\mathrm{Aut}(\Omega)$ subsume the permutation group $G_{N,M}$. (right) The same structure $\Omega$ unfolded into a bipartite form to better show the resulting parameter-sharing in the neural layer. The layer is equivariant to the $G$-action: shifting the input will shift the output of the resulting neural network function, while flipping the input does not change the output.

¹ Throughout this paper, since we deal with finite sets, we use circular shift and circular convolution instead of shift and convolution. The two can be made identical with zero-padding of the input.
In the following, Section 1 formalizes equivariance with respect to a discrete group action. Section 2 relates the model symmetries of a neural layer to its equivariance. Section 3 then builds on this observation to introduce two procedures for parameter-sharing that achieve the desired equivariance. Here, we also see how group and graph convolution as well as deep sets become special instances of our parameter-sharing procedure, which provides new insight and improved design in the case of group convolution. Where the input and output of the layer have a one-to-one mapping, we see that the design problem reduces to a well-known problem in combinatorics.

1. Group Action and Equivariance

Let $x = [x_1, \dots, x_N] \in \mathcal{X}^N$ denote a set of variables and $G = \{g\}$ a finite group. The discrete action of $G$ on $x$ is in the form of permutations of the indices $N = \{1, \dots, N\}$. This group is a subgroup of the symmetric group $S_N$, the group of all $N!$ permutations of $N$ objects. We use $\vec{N} = [1, \dots, N]$ to denote the ordered counterpart of $N$; the $G$-action on this vector, $g\vec{N} \doteq [g1, \dots, gN]$, is a simple permutation. Using $x_{\vec{N}}$ to denote $x$, the discrete action of $g \in G$ on $x \in \mathcal{X}^N$ is given by $g\, x_{\vec{N}} \doteq x_{g\vec{N}}$.

The $G$-action on $N$ is a permutation group that is not necessarily isomorphic to $G$ itself. $G_N$ captures the structure of $G$ when it acts on $N$; we use $g^N$ to denote the image of $g \in G$ in $G_N$. The $G$-action is faithful iff the two groups are isomorphic, $G \cong G_N$, that is, the $G$-action preserves its structure. In this case, each $g \in G$ maps to a distinct permutation, $g\vec{N} \neq g'\vec{N}$ for all $g \neq g' \in G$. Given any $G$-action on $N$ we can efficiently obtain $G_N$; see the Appendix.

Example 1.1 (Cyclic Group). Consider the cyclic group $G = \mathbb{Z}_6$ and define its action on $x \in \mathbb{R}^3$ by defining it on the index set $N = \{1, 2, 3\}$ as $g\,n \doteq g + n \bmod 3$ for all $g \in \mathbb{Z}_6$. This action is not faithful. For example, the actions of $g = 1$ and $g = 4$ result in the same permutation of the variables in $x$, i.e., a single step of circular shift to the right. With the above action, the resulting permutation group $G_N$ is isomorphic to $\mathbb{Z}_3 < \mathbb{Z}_6$. Now consider the same group $G = \mathbb{Z}_6$ with a different action on $N$: $g\,n \doteq g - n \bmod 3$ for all $g \in \mathbb{Z}_6$, where we replaced $(+)$ with $(-)$. Let $G'_N$ be the resulting permutation group. Here again $G'_N \cong \mathbb{Z}_3$. Although isomorphic, $G_N \neq G'_N$, as they are different permutation groups of $N$.

Consider the function $\phi: \mathcal{X}^N \to \mathcal{Y}^M$ and let $G_N$ and $G_M$ be the actions of $G$ on the input and output index sets $N$ and $M$.

Definition 1.1. The joint permutation group $G_{N,M}$ is a subdirect product (or pairing) of $G_N$ and $G_M$:
$$G_{N,M} = G_N \ast G_M \doteq \{(g^N, g^M) \mid g \in G\}.$$

We are now ready to define equivariance and invariance. $\phi(\cdot)$ is $G_{N,M}$-equivariant iff
$$g^M \phi(x) = \phi(g^N x) \quad \forall x \in \mathcal{X}^N,\ (g^N, g^M) \in G_{N,M}. \tag{1}$$
Moreover, if $G_M = \{e\}$ is trivial, we have $\phi(g^N x) = \phi(x)$ for all $x \in \mathcal{X}^N$, $g^N \in G_N$, and $\phi(\cdot)$ is $G_N$-invariant. $g^N$ and $g^M$ can also be represented using permutation matrices $\mathbf{G}^N \in \{0,1\}^{N \times N}$ and $\mathbf{G}^M \in \{0,1\}^{M \times M}$. The equivariance relation of (1) then becomes
$$\mathbf{G}^M \phi(x) = \phi(\mathbf{G}^N x) \quad \forall x \in \mathcal{X}^N,\ (\mathbf{G}^N, \mathbf{G}^M) \in G_{N,M}. \tag{2}$$

The following observation shows that the subgroup relationship affects equivariance and invariance.

Observation 1.1. If the function $\phi: \mathcal{X}^N \to \mathcal{Y}^M$ is $G_{N,M}$-equivariant, then it is also $H_{N,M}$-equivariant for any permutation group $H_{N,M} < G_{N,M}$.
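As a concrete illustration of the difference between $G$ and its image $G_N$, the following minimal NumPy sketch (our own code, not part of the original paper; it uses 0-based indices, and the helpers `perm_matrix` and `action_image` are ours) enumerates the permutations induced by the first action of Example 1.1 and confirms that only three distinct permutations arise, i.e., $G_N \cong \mathbb{Z}_3$.

```python
import numpy as np

def perm_matrix(perm):
    """Permutation matrix P with (P x)[i] = x[perm[i]]."""
    P = np.zeros((len(perm), len(perm)), dtype=int)
    P[np.arange(len(perm)), perm] = 1
    return P

def action_image(group, n_points, act):
    """The distinct permutations of {0,...,n_points-1} induced by the action."""
    return sorted({tuple(act(g, n) for n in range(n_points)) for g in group})

Z6 = range(6)
act = lambda g, n: (g + n) % 3      # the (non-faithful) action of Example 1.1
G_N = action_image(Z6, 3, act)
print(len(G_N))                      # 3 -> G_N is isomorphic to Z_3, not Z_6
print(perm_matrix(G_N[1]))           # a single-step circular shift
```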
Example 1.2 (Reverse Convolution). Consider the cyclic group $G = \mathbb{Z}_6$ and for $g \in G$, define the action on $N = \{1, 2, 3\}$ to be $g\,n \doteq g + n \bmod 3$. Also let its action on $M = \{1, \dots, 6\}$ be $g\,m \doteq m - g \bmod 6$. In other words, the $G$-action on $N$ performs a circular shift to the right and its action on $M$ shifts variables to the left. The permutation matrix representations of the images of $2 \in \mathbb{Z}_6$ in $G_N$ and $G_M$ are
$$\mathbf{2}^N = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \qquad \mathbf{2}^M = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \end{pmatrix},$$
corresponding to right and left shifts on vectors of different lengths. Now consider the function $\phi: \mathbb{R}^N \to \mathbb{R}^M$, $\phi_W(x) = Wx$, with
$$W^T = \begin{pmatrix} 0 & a & b & 0 & a & b \\ a & b & 0 & a & b & 0 \\ b & 0 & a & b & 0 & a \end{pmatrix}, \qquad a, b \in \mathbb{R}.$$
Using permutation matrices one can check the equivariance condition (2) for this function and show that $\phi$ is equivariant to $G_{N,M}$. Consider $2 \in \mathbb{Z}_6$ and its images $\mathbf{2}^N \in G_N$ and $\mathbf{2}^M \in G_M$. The l.h.s. of (2) is
$$\mathbf{2}^M W x = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & a & b \\ a & b & 0 \\ b & 0 & a \\ 0 & a & b \\ a & b & 0 \\ b & 0 & a \end{pmatrix} x,$$
which is equal to its r.h.s.
$$W \mathbf{2}^N x = \begin{pmatrix} 0 & a & b \\ a & b & 0 \\ b & 0 & a \\ 0 & a & b \\ a & b & 0 \\ b & 0 & a \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} x = \begin{pmatrix} b & 0 & a \\ 0 & a & b \\ a & b & 0 \\ b & 0 & a \\ 0 & a & b \\ a & b & 0 \end{pmatrix} x$$
for any $x$. One can verify this equality for all $g \in \mathbb{Z}_6$.

Now consider the group $H_{N,M} < G_{N,M}$, where $H_N = G_N$ and the members of $H_M$, the images of $\{0, 2, 4\} \subset \mathbb{Z}_6$, perform left circular shifts of length 0, 2 and 4. It is easy to see that $H_{N,M} \cong \mathbb{Z}_3$. Moreover, since $H_{N,M} < G_{N,M}$, $\phi(\cdot)$ above is $H_{N,M}$-equivariant as well. However, one prefers to characterize the equivariance properties of $\phi$ using $G_{N,M}$ rather than $H_{N,M}$.

The observation above suggests that $G_{N,M}$-equivariance is not restrictive enough. As an extreme case, a constant function $\phi(x) = 1$ is equivariant to any permutation group $G_{N,M} \leq S_N \times S_M$. In this case the equivariance of $\phi$ with respect to a particular $G_{N,M}$ is not very informative to us. To remedy this, we define a more strict notion of equivariance.

Definition 1.2. We say a function $\phi: \mathcal{X}^N \to \mathcal{Y}^M$ is uniquely $G$-equivariant iff it is $G$-equivariant and it is not $H$-equivariant for any $H > G$.
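The matrix identities above can also be verified numerically. The following NumPy sketch (our own, with 0-based indices; `action_matrix` is a helper we introduce here) builds the tied weight matrix of Example 1.2 and checks the equivariance condition (2), $\mathbf{G}^M W = W \mathbf{G}^N$, for every $g \in \mathbb{Z}_6$.

```python
import numpy as np

def action_matrix(act, size):
    """Permutation matrix of the action: moves index n to act(n)."""
    P = np.zeros((size, size), dtype=int)
    for n in range(size):
        P[act(n), n] = 1
    return P

a, b = 0.3, 1.7
W = np.stack([np.roll([0.0, a, b], -m) for m in range(6)])  # tied weights of Example 1.2

for g in range(6):
    G_N = action_matrix(lambda n: (n + g) % 3, 3)   # shift the input right by g
    G_M = action_matrix(lambda m: (m - g) % 6, 6)   # shift the output left by g
    assert np.allclose(G_M @ W, W @ G_N)            # the equivariance condition (2)
print("G^M W = W G^N holds for every g in Z_6")
```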
2. Symmetry Groups of a Network

Given a group $G$ and its discrete action through $G_{N,M}$, we are interested in defining parameter-sharing schemes for a parametric class of functions that guarantee their unique $G_{N,M}$-equivariance. We start by looking at a single neural layer and relate its unique $G_{N,M}$-equivariance to the symmetries of a colored multi-edged bipartite graph that defines the parameter-sharing. We then show that the idea extends to multiple layers.

Definition 2.1. A colored multi-edged bipartite graph $\Omega = (N, M, \alpha)$ is a triple, where $N$ and $M$ are its two sets of nodes, and $\alpha: N \times M \to 2^{\{1,\dots,C\}}$ is the edge function that assigns multiple edge-colors from the set $\{1, \dots, C\}$ to each edge. Non-existing edges receive no color.

We are interested in the symmetries of this structure. The set of permutations $(\pi_N, \pi_M) \in S_N \times S_M$ of nodes (within each part of the bipartite graph) that preserve all edge-colors defines the automorphism group $\mathrm{Aut}(\Omega) \leq S_N \times S_M$, that is
$$(\pi_N, \pi_M) \in \mathrm{Aut}(\Omega) \iff \alpha(n, m) = \alpha(\pi_N n, \pi_M m) \quad \forall (n, m) \in N \times M. \tag{3}$$
Alternatively, to facilitate the notation, we define the same structure (a colored multi-edged bipartite graph) as a set of binary relations between $N$ and $M$, that is $\Omega = (N, M, \{R_c\}_{1 \leq c \leq C})$, where each relation is associated with one color, $R_c = \{(n, m) \mid c \in \alpha(n, m),\ (n, m) \in N \times M\}$. This definition of the structure gives an alternative expression for $\mathrm{Aut}(\Omega)$:
$$(\pi_N, \pi_M) \in \mathrm{Aut}(\Omega) \iff \big((n, m) \in R_c \Leftrightarrow (\pi_N n, \pi_M m) \in R_c\big) \quad \forall c, n, m. \tag{4}$$
The significance of this structure is that it defines a parameter-sharing scheme in a neural layer, where the same edge-colors correspond to the same parameters. Consider the function $\phi = [\phi_1, \dots, \phi_M]: \mathbb{R}^N \to \mathbb{R}^M$,
$$\phi_m(x; w, \Omega) \doteq \sigma\Big(\sum_{n} \sum_{c \in \alpha(n, m)} w_c\, x_n\Big) \quad \forall m, \tag{5}$$
where $\sigma: \mathbb{R} \to \mathbb{R}$ is a strictly monotonic nonlinearity and $w = [w_1, \dots, w_c, \dots, w_C]$ is the parameter vector for this layer. The following key theorem relates the equivariances of $\phi(\cdot; w, \Omega)$ to the symmetries of $\Omega$.

Theorem 2.1. For any $w \in \mathbb{R}^C$ s.t. $w_c \neq w_{c'}$ for all $c \neq c'$, the function $\phi(\cdot; w, \Omega)$ is uniquely $\mathrm{Aut}(\Omega)$-equivariant.

Corollary 2.2. For any $H_{N,M} \leq \mathrm{Aut}(\Omega)$, the function $\phi(\cdot; w, \Omega)$ is $H_{N,M}$-equivariant.

The implication is that to achieve unique equivariance for a given group action, we need to define the parameter-sharing using a structure $\Omega$ with symmetry group $G_{N,M}$.

Example 2.1 (Reverse Convolution). Revisiting Example 1.2, we can show that the condition of Theorem 2.1 holds. In this case $\sigma(x) = x$ and the parameter-sharing of the matrix $W$ is visualized in the accompanying figure, where we used two different line styles for $a, b \in \mathbb{R}$. In this figure, circularly shifting the variables at the output and input level to the left and right respectively does not change the edge-colors; for example, in both cases node 1's connections to nodes 3 and 6 using dashed lines are preserved. Six repetitions of this action produce the different permutations corresponding to the six members of $G_{N,M}$. Therefore $G_{N,M} \leq \mathrm{Aut}(\Omega)$ and, according to Corollary 2.2, $\phi(\cdot)$ is $G_{N,M}$-equivariant. Moreover, using Theorem 3.3 of the next section, we can prove that these six permutations are the only edge-color-preserving ones for this structure, resulting in unique equivariance.

Matrix Form. To write (5) in matrix form, if there are multiple edges between two nodes $n, m$, we need to merge them. In general, by assigning one distinct color to each set in the range of $\alpha: N \times M \to 2^{\{1,\dots,C\}}$ we can w.l.o.g. reduce multiple edges to a single edge. In other words, we can rewrite $\phi$ using $W \in \mathbb{R}^{M \times N}$:
$$\phi(x; w, \Omega) = \sigma(W x), \qquad W_{m,n} = \sum_{c \in \alpha(n, m)} w_c. \tag{6}$$
Using this notation, and due to the strict monotonicity of the nonlinearity $\sigma(\cdot)$, Theorem 2.1 simply states that for all $(g^N, g^M) \in \mathrm{Aut}(\Omega)$, $x \in \mathbb{R}^N$ and $W$ given by (6),
$$\mathbf{G}^M W x = W \mathbf{G}^N x. \tag{7}$$

Example 2.2 (Permutation-Equivariant Layer). Consider all permutations of the indices $N$ and $M = N$. We want to define a neural layer such that any permutation of the input $g^N \in G_N = S_N$ results in the same permutation of the output, $g^M = g^N$. Consider the following colored bipartite graph, for the special case $N = M = 4$: one color (dashed edges) connects each input node to its counterpart at the output, and a second color connects every input node to every output node. It is easy to show that the color-preserving permutations of this structure are $\mathrm{Aut}(\Omega) = S_N \ast S_N \doteq \{(g, g) \mid g \in S_N\} \cong S_N$: on one hand, for $(\pi_N, \pi_M) \in S_N \times S_M$, having $\pi_N = \pi_M$ clearly preserves the colors. On the other hand, if $\pi_N \neq \pi_M$, there exists $u \in N$ (also in $M$) such that $\pi_N u \neq \pi_M u$; therefore $(\pi_N, \pi_M)$ does not preserve the relation $R = \{(n, n) \mid n \in N\}$ corresponding to the dashed edges, and so $(\pi_N, \pi_M) \notin \mathrm{Aut}(\Omega)$. This proves $\mathrm{Aut}(\Omega) = S_N \ast S_N$. The function (5) for this $\Omega$ is $\phi(x; w = [w_1, w_2], \Omega) = \sigma(w_1 I x + w_2 \mathbf{1}\mathbf{1}^T x)$. Ravanbakhsh et al. (2016) and Zaheer et al. (2017) derive the same permutation-equivariant layer by proving the commutativity in (7), while here it follows from Corollary 2.2.

Multiple Layers. For deep networks, the equivariance of the composition $\phi_2 \circ \phi_1$ to the $G$-action follows from that of the individual layers $\phi_1: \mathcal{X}^N \to \mathcal{Y}^M$ and $\phi_2: \mathcal{Y}^M \to \mathcal{Z}^O$. Assuming $\phi_1$ is $G_{N,M}$-equivariant and $\phi_2$ is $G_{M,O}$-equivariant, where the $G$-action on $M$ is shared between the two layers, it follows that $\phi_2 \circ \phi_1$ is $G_{N,O}$-equivariant, where $G_{N,O} = G_N \ast G_O$. This is because for all $g \in G$ and $x \in \mathcal{X}^N$,
$$\phi_2(\phi_1(g^N x)) = \phi_2(g^M \phi_1(x)) = g^O \phi_2(\phi_1(x)). \tag{8}$$
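As a small executable companion to Example 2.2 and Eq. (8) (our own sketch; the function names are not from the paper), the following builds the permutation-equivariant layer $\sigma(w_1 I x + w_2 \mathbf{1}\mathbf{1}^T x)$ and checks that a stack of two such layers remains equivariant to an arbitrary permutation of the input.

```python
import numpy as np

def perm_equivariant_layer(x, w1, w2, sigma=np.tanh):
    """sigma(w1 * I x + w2 * 1 1^T x): the layer of Example 2.2."""
    return sigma(w1 * x + w2 * x.sum())

def network(x):
    """Two stacked permutation-equivariant layers; equivariant by Eq. (8)."""
    return perm_equivariant_layer(perm_equivariant_layer(x, 0.5, -1.2), 2.0, 0.3)

rng = np.random.default_rng(0)
x = rng.normal(size=5)
pi = rng.permutation(5)                                # an arbitrary g in S_N (here g_M = g_N)
assert np.allclose(network(x)[pi], network(x[pi]))     # g phi(x) = phi(g x)
```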
3. Structure Design

Consider the definition of the neural layer (5) that employs parameter-sharing according to $\Omega$. Given the $G$-action on $N$ and $M$, we are interested in designing structures $\Omega$ such that $\mathrm{Aut}(\Omega) = G_{N,M}$. According to Theorem 2.1, it then follows that $\phi$ is uniquely $G_{N,M}$-equivariant. Here, we give sufficient conditions and the design recipe to achieve this. For this we briefly review some group properties that are used in later developments.

Transitivity. We say that the $G$-action on $N$ is transitive iff for all $n_1, n_2 \in N$ there exists at least one $g \in G$ (or $g^N \in G_N$) such that $g n_1 = n_2$.

Regularity. The group action is free or semi-regular iff for all $n_1, n_2 \in N$ there is at most one $g \in G$ such that $g n_1 = n_2$, and the action is regular iff it is both transitive and free, i.e., for any pair $n_1, n_2 \in N$ there is exactly one $g \in G$ such that $g n_1 = n_2$. Any free action is also faithful. We use similar terminology for $G_N$: we call $G_N$ semi-regular iff for all $n_1, n_2 \in N$ at most one $g^N \in G_N$ moves $n_1$ to $n_2$, and $G_N$ is regular if this number is exactly one.

Orbit. The orbit of $n \in N$ is the set of all members to which it can be moved, $Gn = \{gn \mid g \in G\}$. The orbits define an equivalence relation.² This equivalence relation partitions $N$ into orbits $N = \bigcup_{1 \leq p \leq P} G n_p$, where $n_p$ is an arbitrary representative of the partition $G n_p \subseteq N$. Note that the $G$-action on $N$ is always transitive on its orbits; that is, for any $n, n' \in G n_p$, there is at least one $g \in G$ such that $n' = g n$. Therefore, for a semi-regular $G$-action, the action of $G$ on each orbit $G n_p$, $1 \leq p \leq P$, is regular.

² $n \sim n' \iff \exists g \in G \text{ s.t. } n' = g n \iff n' \in Gn \iff n \in Gn'$.

Example 3.1 (Mirror Symmetry). Consider $G = \mathbb{Z}_2 = \{e = 0, 1\}$ (with $1 + 1 = 0$) acting on $N$, where the only non-trivial action is defined as flipping the input: $1 \cdot [1, \dots, N] = [N, N-1, \dots, 1]$. $G$ is faithful in its action on $N$; however, $G_N$ is not transitive, e.g., $N$ cannot be moved to $N - 1$. If $N$ is even, then the $G$-action is semi-regular; otherwise the element in the middle, $n = \lceil N/2 \rceil$, is moved to itself by two different actions $e, 1 \in G$. Furthermore, if $N$ is even, the $G$-action has $N/2$ orbits and $G$ acts on these orbits regularly. If $N$ is odd, the $G$-action has $\lceil N/2 \rceil$ orbits; however, its action on the orbit of the middle element, $G\lceil N/2 \rceil$, is not regular.

In the following, Section 3.1 proposes a procedure for parameter-sharing in a fully connected layer. Although simple, this design is dense and does not guarantee unique equivariance. Section 3.2 proposes an alternative design with sparse connections that, in some settings, ensures unique equivariance. Section 3.3 investigates the effect of having multiple input and output channels in the neural layer, and Section 3.4 studies the special case of $G_N = G_M$, where input and output indices have a one-to-one mapping.

3.1. Dense Design

Consider a complete bipartite graph with $N$ and $M$ as its two parts and edges $(n, m) \in N \times M$. The action of $G_{N,M}$ partitions the edges into orbits $\{G_{N,M}(n_p, m_q)\}_{n_p, m_q}$, where $(n_p, m_q)$ is a representative edge from an orbit. Painting each orbit with a different color gives
$$\Omega = (N, M, \{R_{p,q} = G_{N,M}(n_p, m_q)\}_{p,q}). \tag{9}$$
Therefore two edges $(n, m)$ and $(n', m')$ have the same color iff an action in $G_{N,M}$ moves one edge to the other.

Proposition 3.1. $G_{N,M} \leq \mathrm{Aut}(\Omega)$ for $\Omega$ of (9).

Corollary 3.2. $\phi(\cdot; w, \Omega)$, for the structure (9), is equivariant to $G_{N,M}$.
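The dense design can be carried out mechanically from the two actions. The sketch below (our own code and conventions: 0-based indices, and `dense_coloring` is a hypothetical helper) colors every edge $(n, m)$ of the complete bipartite graph by the orbit of $G_{N,M}$ containing it, as in (9); edges with equal color share a parameter, $W_{m,n} = w_{\mathrm{color}(n,m)}$. As an illustration it prints the coloring for the mirror symmetry of Example 3.1 acting on both $N$ and $M$, with $N = M = 4$.

```python
import numpy as np

def dense_coloring(group, act_N, act_M, N, M):
    """Color edges (n, m) by their orbit under the joint action, as in Eq. (9)."""
    color = -np.ones((N, M), dtype=int)
    c = 0
    for n in range(N):
        for m in range(M):
            if color[n, m] < 0:                        # new orbit representative
                for g in group:                        # paint the whole orbit
                    color[act_N(g, n), act_M(g, m)] = c
                c += 1
    return color                                       # W[m, n] = w[color[n, m]]

# Mirror symmetry of Example 3.1 on N = M = {0,...,3}: flip the indices.
Z2 = [0, 1]
flip = lambda g, n: n if g == 0 else 3 - n
print(dense_coloring(Z2, flip, flip, 4, 4))
```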
Example 3.2 (Nested Subsets and Wreath Product). The permutation-equivariant layer that we saw in Example 2.2 is useful for defining neural layers for set-structured data. If our data structure is in the form of nested subsets, then we require equivariance to the permutation of variables within each set as well as to the permutation of the subsets. Here, we show how to use our dense design for this purpose. We use a special indexing for the input to better identify the exchangeability of variables: we assume $D$ subsets, each of which has $d$ variables, $x = [x_{1,1}, \dots, x_{1,d}, x_{2,1}, \dots, x_{D,d}]$. The group of interest is the wreath product $S_d \wr S_D$. This type of group product can be used to build hierarchical and nested structures with different types of symmetry at each level; nesting subsets corresponds to the most basic form of such hierarchical constructions. We use $(n, n')$ to index input variables and $(m, m')$ for output variables. The accompanying figure shows the resulting parameter-sharing for an example with $D = 2$, $d = 3$. How did we arrive at this structure $\Omega$? Recall that our objective is to define parameter-sharing so that $\phi_W: \mathbb{R}^{d \cdot D} \to \mathbb{R}^{d \cdot D}$ is equivariant to the action of $G = S_d \wr S_D$, i.e., to permutations within sets at two levels. This group action identifies three partitions of the edges (seen in the figure): I) $\{((n, n'), (n, n'))\}_{n, n'}$ connects each variable to its counterpart (dashed orange); II) $\{((n, n'), (n, m'))\}_{n,\, n' \neq m'}$ connects each variable to the other variables within the same subset; III) $\{((n, n'), (m, m'))\}_{n \neq m}$ is the set of edges from one subset to another. According to Corollary 3.2, this parameter-sharing guarantees equivariance.

This fully connected design is useful when the group $G_{N,M}$ is large, for example when dealing with $S_N$. However, for smaller groups it can be very inefficient in practice, as we can sometimes achieve equivariance through a sparse structure $\Omega$. As an example, consider the 2D circular convolution layer: it is easy to show that, according to this design, the convolution filter will be the same size as the input image. While this achieves the desired equivariance, it is inefficient and does not generalize as well as a convolution layer with small filters. Moreover, the dense design does not guarantee unique equivariance. We next show that under some conditions on $G_{N,M}$ the sparse design can produce this stronger guarantee.

3.2. Sparse Design

Our sparse construction uses orbits and symmetric generating sets. Let us denote the orbits of the $G$-action on $N$ and $M$ by $\{G n_p\}_{1 \leq p \leq P}$ and $\{G m_q\}_{1 \leq q \leq Q}$ respectively, where $P$ and $Q$ are the total numbers of orbits and $n_p, m_q$ are (arbitrary) representative members of the orbits $G n_p$ and $G m_q$. Note that, in contrast to the previous section, here we are considering the orbits of variables rather than of edges. The set $A \subseteq G$ is called a generating set of $G$ ($\langle A \rangle = G$) iff every member of $G$ can be expressed as a combination of members of $A$. If the generating set is closed under inverses, $a \in A \Rightarrow a^{-1} \in A$, we call it a symmetric generating set. Define the structure $\Omega$ as
$$\Omega = (N, M, \{R_{p,q,a}\}_{1 \leq p \leq P,\ 1 \leq q \leq Q,\ a \in A}), \qquad R_{p,q,a} = \{(g^N a\, n_p,\ g^M m_q) \mid (g^N, g^M) \in G_{N,M}\}. \tag{10}$$
In words, we have one color for each combination of orbits $(p, q)$ and members of the generating set $a \in A$. The following theorem relates the symmetry group of this structure to $G$.

Theorem 3.3. $G_{N,M} \leq \mathrm{Aut}(\Omega)$ for $\Omega$ of (10). Moreover, if $G_N$ and $G_M$ are both semi-regular, then $G_{N,M} = \mathrm{Aut}(\Omega)$.

Note that this result holds for any choice of a symmetric generating set $A$ in defining $\Omega$. Therefore, in designing sparse layers, one seeks a minimal $A$.

Corollary 3.4. The function $\phi(\cdot; w, \Omega)$, using the structure (10), is $G_{N,M}$-equivariant. If $G_N$ and $G_M$ are semi-regular, this function is uniquely $G_{N,M}$-equivariant.
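The sparse design (10) admits a similarly direct implementation. The sketch below (again our own code; `sparse_coloring` is a hypothetical helper, and for simplicity it records a single color per edge) assigns the color $(p, q, a)$ to the edges $\{(g^N a\, n_p, g^M m_q)\}_{g}$, and prints the resulting coloring for the mirror-symmetry action on $N = M = \{0, \dots, 3\}$ with orbit representatives $\{0, 1\}$ and generating set $A = \{1\}$.

```python
def sparse_coloring(group, act_N, act_M, reps_N, reps_M, gen_set):
    """Map edge (n, m) -> color (p, q, a), following Eq. (10)."""
    color = {}
    for p, n_p in enumerate(reps_N):
        for q, m_q in enumerate(reps_M):
            for a in gen_set:
                for g in group:
                    # the edge (g a n_p, g m_q) receives the color (p, q, a)
                    color[act_N(g, act_N(a, n_p)), act_M(g, m_q)] = (p, q, a)
    return color

# Mirror symmetry on N = M = {0,...,3}: orbits {0,3} and {1,2}, A = {1} (the flip).
Z2 = [0, 1]
flip = lambda g, n: n if g == 0 else 3 - n
print(sparse_coloring(Z2, flip, flip, reps_N=[0, 1], reps_M=[0, 1], gen_set=[1]))
```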
Now, assuming the $G$-action is semi-regular on both $N$ and $M$, using the (arbitrarily chosen) representatives $\{n_p\}_{1 \leq p \leq P}$ and $\{m_q\}_{1 \leq q \leq Q}$ for the orbits in $N$ and $M$, we can rewrite the expression (5) of the structured neural layer for the structure above. Here, the components of $\phi = [\phi_1, \dots, \phi_M]$ are enumerated for $1 \leq q \leq Q$, $g^M \in G_M$:
$$\phi_{g^M m_q}(x; w) = \sigma\Big(\sum_{1 \leq p \leq P} \sum_{a \in A} w_{q,p,a}\, x_{g^N a\, n_p}\Big), \tag{11}$$
where $w \in \mathbb{R}^{P \cdot Q \cdot |A|}$ is the set of unique parameters, and each element $\phi_{g^M m_q}$ depends on the subset of parameters $\{w_{q,p,a}\}_{p,a}$ identified by $q$, and on the subset of inputs $\{x_{g^N a\, n_p}\}_{p,a}$ identified by $g^N$.

Example 3.3 (Dihedral Group of Fig. 1). In the example of Fig. 1, the number of orbits of the $G$-action on $N$ is $P = 2$, and for $M$ it is $Q = 1$. The symmetric generating set is the generating set used in the Cayley diagram, with the addition of the inverse shift (the inverse of the blue arrow). We then used (10) to build the structure of Fig. 1 (right).

Example 3.4 (Reverse Convolution). The parameter-sharing structure of the reverse convolution in Examples 1.2 and 2.1 is produced using our sparse design. In these examples, both $G_N$ and $G_M$ are regular; therefore the proposed parameter-sharing provides unique equivariance.

3.3. Multiple Channels

In this section, we extend our results to multiple input and output channels. Up to this point, we considered a neural network layer $\phi: \mathbb{R}^N \to \mathbb{R}^M$. Here, we want to see how to achieve $G_{N,M}$-equivariance for $\phi: \mathbb{R}^{N \times K} \to \mathbb{R}^{M \times K'}$, where $K$ and $K'$ are the numbers of input and output channels. First, we extend the action of $G$ on $N$ and $M$ to $N^K = [\underbrace{N, \dots, N}_{K \text{ times}}]$ as well as $M^{K'}$, to accommodate multiple channels. For this, simply repeat the $G$-action on each component. The $G$-action on multiple input channels is equivalent to the subdirect product $\underbrace{G_N \ast \dots \ast G_N}_{K \text{ times}} \cong G_N$; the same applies to the output. This repetition multiplies the orbits of $G_N$, one copy for each channel, so that instead of having $P$ and $Q$ orbits on the input set $N$ and output set $M$, we have $K \cdot P$ and $K' \cdot Q$ orbits on the input $N^K$ and output $M^{K'}$. This increases the number of parameters by a factor of $K \cdot K'$. The important implication is that orbits and multiple channels are treated identically by both the dense and the sparse design.

Example 3.5 (Group Convolution). The idea of group convolution is studied by Cohen & Welling (2016a); see also (Olah, 2014). The following claim relates the function of this type of layer to our sparse design.

Claim 3.5. Under the following conditions the neural layer (5) using our sparse design (10) performs group convolution: I) there is a bijection between the output and $G$ (i.e., $M \cong G$); and II) $G_N$ is transitive.

This also identifies the limitations of group convolution even in the setting where $M \cong G$: when $G_N$ is semi-regular and not transitive ($P > 1$), group convolution is not guaranteed to be uniquely equivariant, while the sparse parameter-sharing of (10) provides this guarantee. For demonstration, consider the following example of equivariance to mirror symmetry. The accompanying figure shows the bipartite structure for $G = \mathbb{Z}_2 = \{0, 1\}$ and $A = \{1\}$; the $G$-action is a horizontal flip of the input and the output. On the right, $M \cong G$, while on the left the $G_N$-action has two orbits. Orbits are identified by line-style and the color of the circles, and the representative of each orbit on $N$ and $M$ is identified with a star. In a neural layer with this parameter-sharing, when we flip the input variables (around the mirror line) the output is also flipped. Note that each combination of orbits $p$ and $q$ has a parameter of its own, identified with different edge-styles. While this construction guarantees unique $G$-equivariance, if we instead use the same parameters across orbits (as suggested by the original group convolution) we get the parameter-sharing of the middle figure. In this case, the resulting neural layer has the desired equivariance (right); however, it is equivariant to the action of a larger group $G_{N,M} \cong \mathbb{Z}_2 \times \mathbb{Z}_2 > \mathbb{Z}_2$, in which the element 1 of the second $\mathbb{Z}_2$ exchanges variables across the orbits on $N$ (left in the figure).
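The contrast between per-orbit parameters and parameters tied across orbits can be checked numerically. The sketch below is our own small construction for mirror symmetry on $N = M = \{0, \dots, 3\}$, in the spirit of the discussion above but not a reproduction of the figure: it builds the weight matrix of the sparse design (10) and a group-convolution-style variant with parameters tied across orbits, and shows that the tied variant is additionally equivariant to the cross-orbit swap.

```python
import numpy as np

def perm_matrix(perm):
    """Permutation matrix moving index n to perm[n]."""
    P = np.zeros((len(perm), len(perm)), dtype=int)
    for n, pn in enumerate(perm):
        P[pn, n] = 1
    return P

def layer_matrix(w00, w01, w10, w11):
    """Weights of the sparse design for the flip on N = M = {0,...,3}, A = {1}."""
    W = np.zeros((4, 4))
    W[[0, 3], [3, 0]] = w00          # orbit {0,3} -> orbit {0,3}
    W[[1, 2], [3, 0]] = w01          # orbit {0,3} -> orbit {1,2}
    W[[0, 3], [2, 1]] = w10          # orbit {1,2} -> orbit {0,3}
    W[[1, 2], [2, 1]] = w11          # orbit {1,2} -> orbit {1,2}
    return W

flip = perm_matrix([3, 2, 1, 0])     # the Z_2 action on both N and M
swap = perm_matrix([1, 0, 3, 2])     # exchanges variables across the two orbits

W_sparse = layer_matrix(.1, .2, .3, .4)   # one distinct parameter per orbit pair
W_tied   = layer_matrix(.1, .2, .2, .1)   # group-convolution-style tying across orbits

print(np.allclose(flip @ W_sparse, W_sparse @ flip))  # True : Z_2-equivariant
print(np.allclose(swap @ W_sparse, W_sparse @ swap))  # False: sensitive to the swap
print(np.allclose(flip @ W_tied, W_tied @ flip))      # True
print(np.allclose(swap @ W_tied, W_tied @ swap))      # True : equivariant to a larger group
```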
3.4. $G_N = G_M$

In semi-supervised and unsupervised applications, we often need to produce a single output $y_n$ for each input $x_n$, $n \in N$; that is, $N = M$. We can ensure this by having a relation $R_{c^*} = \{(n, n) \mid n \in N\}$ in $\Omega$ that guarantees any $(\pi_N, \pi_M) \in \mathrm{Aut}(\Omega)$ applies the same permutation to $N$ and $M = N$, i.e., $\pi_N = \pi_M$. The resulting structure $\Omega = (N, N, \{R_c\}_{1 \leq c \leq C} \cup \{R_{c^*}\})$ can also be interpreted as a colored multi-edged directed graph (digraph), because we can collapse the two parts by identifying $n \in N$ with $n \in M$. Therefore, the symmetry group of the original bipartite structure is isomorphic to the symmetry group of a colored multi-edged digraph on $N$. Achieving unique $G$-equivariance then reduces to answering the following question: when can we express a permutation group $G \leq S_N$ as the symmetry group $\mathrm{Aut}(\Omega)$ of a colored multi-edged digraph with $N$ nodes? This problem is well studied under the class of concrete representation problems (Babai, 1994). Permutation groups $G$ that can be expressed in this way are called 2-closed groups (Wielandt, 1969). The recipe for achieving $G_N = \mathrm{Aut}(\Omega)$ is similar to our dense construction of Section 3.1.³ The 2-closure $\bar{G}_N$ of a group $G_N$ is the largest permutation group $\bar{G}_N \leq S_N$ with the same orbits on $N \times N$ as $G_N$. It is known, for example, that semi-regular permutation groups are 2-closed, $\bar{G}_N = G_N$. This result also follows as a corollary of our Theorem 3.3 for the sparse design of (10).

³ In a fully connected digraph, the edges that belong to the same orbit of the $G$-action on $N \times N$ receive the same color.

Example 3.6 (Equivariance to 90° Rotations). The accompanying figure compares the digraph representation of $\Omega$ produced using (left) our sparse design and (right) our dense design. Multiples of 90° rotation are produced as the action of the cyclic group $\mathbb{Z}_4$ on eight input/output variables, that is, $N = M = \{1, \dots, 8\}$. The $\mathbb{Z}_4$-action is semi-regular with two orbits; these orbits are the inner and outer sets of four nodes. The representatives of each orbit in our sparse design are indicated using filled circles. The generating set consists of $A = \{1, 3\}$: rotation by 90° and its inverse, rotation by 270°. Each edge in each of these figures has a corresponding edge in the opposite direction, within a different relation; to avoid overcrowding the figure, we have dropped this edge from the drawing unless both edges belong to the same relation.

Example 3.7 (Graph Convolution). Consider the setting where we use the (normalized) adjacency matrix $B \in \{0,1\}^{N \times N}$ (or Laplacian) of a graph $\Lambda$ to identify the parameter-sharing in a neural network layer. For a single input/output channel, this is often in the form of $A x$, where $x \in \mathbb{R}^N$ and $A = w_1 B + w_2 I$ has different parameters for the diagonal and off-diagonal values (e.g., Kipf & Welling, 2016; Bruna et al., 2013; Henaff et al., 2015); for multiple channels see Section 3.3. The following corollary of Theorem 2.1 identifies the equivariance of $A x$.

Corollary 3.6. Given the digraph $\Lambda$ and its binary adjacency matrix $B \in \{0,1\}^{N \times N}$, $(w_1 B + w_2 I)x$ is uniquely equivariant to the symmetry group of $\Lambda$.

Since two different graphs on $N$ nodes can have identical symmetries, one implication of this corollary is that graph convolution has identical equivariances for graphs with the same symmetry group.
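A quick numerical check of Corollary 3.6 on a toy graph (our own example, not from the paper): for the 4-cycle with adjacency matrix $B$, the graph-convolution layer $(w_1 B + w_2 I)x$ commutes with any automorphism of the graph, for instance the rotation of its nodes.

```python
import numpy as np

B = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])                       # 4-cycle: 0-1-2-3-0
w1, w2 = 0.7, -0.3
A = w1 * B + w2 * np.eye(4)                        # graph-convolution weights

P = np.roll(np.eye(4, dtype=int), 1, axis=0)       # rotation 0->1->2->3->0
assert np.allclose(P @ B @ P.T, B)                 # P is an automorphism of the graph
assert np.allclose(P @ A, A @ P)                   # hence equivariance: P A x = A P x
```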
4. Conclusion

This work is a step towards designing neural network layers with given equivariance and invariance properties. Our approach was to relate the equivariance properties of the neural layer to the symmetries of the parameter matrix. We then proposed two parameter-sharing schemes that achieve equivariance with respect to any discrete group action. Moreover, under some conditions, we guarantee sensitivity with respect to other group actions. This is important because even a trivial constant function is invariant to all transformations; it is therefore essential to be able to draw the line between equivariance/invariance and sensitivity in a function. To our knowledge, our work presents the first results of its kind on guarantees regarding both variance (sensitivity) and equivariance with respect to group actions.

Acknowledgment

This research is supported in part by DOE grant DE-SC0011114 and NSF grant IIS-1563887.

References

Agrawal, Pulkit, Carreira, Joao, and Malik, Jitendra. Learning to see by moving. In Proceedings of the IEEE International Conference on Computer Vision, pp. 37–45, 2015.

Anselmi, Fabio, Leibo, Joel Z, Rosasco, Lorenzo, Mutch, Jim, Tacchetti, Andrea, and Poggio, Tomaso. Unsupervised learning of invariant representations in hierarchical architectures. arXiv preprint arXiv:1311.4158, 2013.

Babai, László. Automorphism groups, isomorphism, reconstruction (chapter 27 of the Handbook of Combinatorics). University of Chicago, 1994.

Bartók, Gábor, Szepesvári, Csaba, and Zilles, Sandra. Models of active learning in group-structured state spaces. Information and Computation, 208(4):364–384, 2010.

Bruna, Joan and Mallat, Stéphane. Invariant scattering convolution networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1872–1886, 2013.

Bruna, Joan, Zaremba, Wojciech, Szlam, Arthur, and LeCun, Yann. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.

Bui, Hung Hai, Huynh, Tuyen N, and Riedel, Sebastian. Automorphism groups of graphical models and lifted variational inference. arXiv preprint arXiv:1207.4814, 2012.

Cohen, Taco S and Welling, Max. Group equivariant convolutional networks. arXiv preprint arXiv:1602.07576, 2016a.

Cohen, Taco S and Welling, Max. Steerable CNNs. arXiv preprint arXiv:1612.08498, 2016b.

Dieleman, Sander, Willett, Kyle W, and Dambre, Joni. Rotation-invariant convolutional neural networks for galaxy morphology prediction. Monthly Notices of the Royal Astronomical Society, 450(2):1441–1459, 2015.

Dieleman, Sander, De Fauw, Jeffrey, and Kavukcuoglu, Koray. Exploiting cyclic symmetry in convolutional neural networks. arXiv preprint arXiv:1602.02660, 2016.

Freeman, William T, Adelson, Edward H, et al. The design and use of steerable filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(9):891–906, 1991.

Gens, Robert and Domingos, Pedro M. Deep symmetry networks. In Advances in Neural Information Processing Systems, pp. 2537–2545, 2014.

Hel-Or, Yacov and Teo, Patrick C. A common framework for steerability, motion estimation, and invariant feature detection. In Proceedings of the 1998 IEEE International Symposium on Circuits and Systems (ISCAS 98), volume 5, pp. 337–340. IEEE, 1998.
Henaff, Mikael, Bruna, Joan, and LeCun, Yann. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163, 2015.

Hinton, Geoffrey E, Krizhevsky, Alex, and Wang, Sida D. Transforming auto-encoders. In International Conference on Artificial Neural Networks, pp. 44–51. Springer, 2011.

Jaderberg, Max, Simonyan, Karen, Zisserman, Andrew, et al. Spatial transformer networks. In Advances in Neural Information Processing Systems, pp. 2017–2025, 2015.

Jayaraman, Dinesh and Grauman, Kristen. Learning image representations equivariant to ego-motion. In Proc. ICCV, 2015.

Kersting, Kristian, Ahmadi, Babak, and Natarajan, Sriraam. Counting belief propagation. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pp. 277–284. AUAI Press, 2009.

Kipf, Thomas N and Welling, Max. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.

Kondor, Risi. Group theoretical methods in machine learning. Columbia University, 2008.

Krizhevsky, Alex, Sutskever, Ilya, and Hinton, Geoffrey E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.

Lenc, Karel and Vedaldi, Andrea. Understanding image representations by measuring their equivariance and equivalence. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 991–999, 2015.

Niepert, Mathias. Markov chains on orbits of permutation groups. arXiv preprint arXiv:1206.5396, 2012.

Olah, Christopher. Groups and group convolutions. http://colah.github.io/posts/2014-12-Groups-Convolution/, 2014.

Oyallon, Edouard and Mallat, Stéphane. Deep roto-translation scattering for object classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2865–2873, 2015.

Raedt, Luc De, Kersting, Kristian, Natarajan, Sriraam, and Poole, David. Statistical relational artificial intelligence: Logic, probability, and computation. Synthesis Lectures on Artificial Intelligence and Machine Learning, 10(2):1–189, 2016.

Ravanbakhsh, Siamak, Schneider, Jeff, and Poczos, Barnabas. Deep learning with sets and point clouds. arXiv preprint arXiv:1611.04500, 2016.

Sifre, Laurent and Mallat, Stéphane. Rotation, scaling and deformation invariant scattering for texture discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1233–1240, 2013.

Wielandt, H. Permutation groups through invariant relations and invariant functions. Dept. of Mathematics, Ohio State University, 1969.

Zaheer, Manzil, Kottur, Satwik, Ravanbakhsh, Siamak, Poczos, Barnabas, Salakhutdinov, Ruslan, and Smola, Alexander J. Deep sets. CoRR, abs/1703.06114, 2017. URL http://arxiv.org/abs/1703.06114.