# group_downsampling_with_equivariant_antialiasing__d01f765d.pdf

Published as a conference paper at ICLR 2025

GROUP DOWNSAMPLING WITH EQUIVARIANT ANTIALIASING

Md Ashiqur Rahman Department of Computer Science Purdue University rahman79@purdue.edu

Raymond A. Yeh Department of Computer Science Purdue University rayyeh@purdue.edu

Downsampling layers are crucial building blocks in CNN architectures, which help to increase the receptive field for learning high-level features and reduce the amount of memory/computation in the model. In this work, we study the generalization of the uniform downsampling layer for group equivariant architectures, e.g., G-CNNs. That is, we aim to downsample signals (feature maps) on general finite groups with anti-aliasing. This involves the following: (a) Given a finite group and a downsampling rate, we present an algorithm to form a suitable choice of subgroup. (b) Given a group and a subgroup, we study the notion of bandlimited-ness and propose how to perform anti-aliasing. Notably, our method generalizes the notion of downsampling based on classical sampling theory. When the signal is on a cyclic group, i.e., periodic, our method recovers the standard downsampling of an ideal low-pass filter followed by a subsampling operation. Finally, we conducted experiments on image classification tasks demonstrating that the proposed downsampling operation improves accuracy, better preserves equivariance, and reduces model size when incorporated into G-equivariant networks.

1 INTRODUCTION

Computer vision models, e.g., ConvNets or ViTs, consist of striding and pooling layers used for downsampling a feature map (He et al., 2016; Liu et al., 2022; Dosovitskiy et al., 2021; Liu et al., 2021b; Wu et al., 2021; Yang et al., 2024). These subsampling layers play a crucial role in learning the spatial hierarchy of features, building in translation invariance, and reducing computation (Zhang et al., 2023). Concepts from signal processing (Vetterli et al., 2014), e.g., bandlimited-ness and anti-aliasing, have also been introduced to design better downsampling (anti-aliasing followed by subsampling) operations (Zhang, 2019; Zou et al., 2020; Vasconcelos et al., 2021).

Given additional prior knowledge, group equivariant ConvNets and Transformers have been proposed to incorporate additional structure into the models (Cohen & Welling, 2016; Tai et al., 2019; Romero & Cordonnier, 2021; Rojas-Gomez et al., 2022; 2024; Xu et al., 2023). These models have guarantees that the output is transformed predictably when the input is transformed. A canonical example is shift-equivariance in image segmentation, where the output mask is shifted accordingly when the input image is shifted.

Interestingly, subsampling layers are not as common in group equivariant architectures. Most models only subsample over the translation group. One limitation is that existing subsampling layers (Cohen & Welling, 2016; Xu et al., 2021) over groups require knowing the subgroup to downsample to. That is, there is no notion of "subsampled by a factor of two". From a practitioner's point of view, it is often unclear how to choose such a subgroup (see Appendix A1.4 for an example). Furthermore, these subsampling layers are not designed with proper anti-aliasing, which hurts the equivariance guarantees (Gruver et al., 2023).

In this work, we propose a generalization of uniform downsampling of signals (feature maps) on general finite groups with anti-aliasing. We present an algorithm to form a suitable choice of subgroup given a finite group and an integer downsampling factor. Next, we define the sampling theorem and bandlimited-ness for subgroup subsampling of signals on groups. To ensure the signal is bandlimited, we propose an anti-aliasing operation following the introduced bandlimited definition while maintaining equivariance. We point out that our proposed algorithm and definitions intuitively generalize the notion of downsampling based on classical sampling theory.

Beyond the theoretical aspects, we conduct experiments to test the proposed downsampling operation. First, we numerically validate the proposed claims. Second, we conduct experiments on the MNIST and CIFAR-10 datasets to evaluate the performance of the proposed downsampling layer on image classification tasks over different symmetries. We show that our proposed subsampling layer selects suitable subgroups for the task of image classification, and the proposed anti-aliasing operation further improves both the model's task performance and its equivariance. Our contributions:

• We generalize the uniform subsampling operation to signals on finite groups, allowing subsampling at a desired rate, yielding signals on subgroups.
• We introduce the Subgroup Sampling Theorem and the concept of bandlimited-ness for subgroup subsampling. It guarantees the perfect reconstruction of the signal on the whole group from the subsampled signal on the subgroup.
• We propose an equivariant anti-aliasing operation to ensure the signals are bandlimited before subgroup subsampling.
• Empirically, we demonstrate the efficacy of the downsampling operation.

2 RELATED WORKS

Downsampling layers (subsampling & anti-aliasing). The idea of subsampling has been rooted in striding and pooling since the seminal works on CNNs (Fukushima, 1980; LeCun et al., 1999). To downsample a high-resolution feature map to a low-resolution one, e.g., by a factor of two, one can simply discard every other element in a feature map. More recently, anti-aliasing has been incorporated into deep nets, inspired by signal processing, where the feature map is blurred with a low-pass filter before subsampling (Zhang, 2021; Karras et al., 2021; Rahman & Yeh, 2024). Later, subsampling has also been extended to groups (Cohen & Welling, 2016; Xu et al., 2021). However, the term "every other" is ambiguous here, resulting in a definition that assigns groups to specific subgroups without adequately addressing the subsampling rate. Additionally, the anti-aliasing operation is not tailored for groups. In contrast, this work addresses these limitations by creating a theoretical foundation for subsampling by a specified factor within groups and proposing an effective anti-aliasing method that extends the sampling theorem, which we discuss next.

The sampling theorem is the basis of digital signal processing, which studies how to sample, interpolate, and manipulate signals sampled at different rates (Vetterli et al., 2014). The sampling theorem guarantees that bandlimited signals can be perfectly reconstructed given a high enough sampling rate. This idea has been extended to graph signals and the field of graph signal processing (Chen et al., 2015b;a). The sampling theorems for the cyclic and abelian groups have also been studied (Dodson, 2007; Faridani, 1994; McEwen et al., 2015; Napolitano & Spooner, 2001; Vaidyanathan & Kirac, 1999). These works generalize the discrete Fourier transform to discrete groups but do not consider a generalization for all finite discrete groups. Different from these works, we present a generalization of the sampling theorem for any finite group, propose a downsampling layer, and show how the layer can be incorporated into group equivariant deep-nets.

Equivariant deep-nets. Incorporating equivariance into deep-nets has been found to be an effective approach to designing deep nets (Cohen & Welling, 2017; Bekkers et al., 2018; Worrall & Welling, 2019; Yeh et al., 2022) across many applications in multiple domains, e.g., sets (Ravanbakhsh et al., 2017; Zaheer et al., 2017; Hartford et al., 2018; Yeh et al., 2019a; Liu et al., 2021a; Rahman et al., 2024), graphs (Maron et al., 2019; Liu et al., 2020; Yeh et al., 2019b; Morris et al., 2022; Liao & Smidt, 2023; Du et al., 2023), etc. We foresee that our proposed downsampling layer can be incorporated into this rich literature of group equivariant architectures to build more effective and efficient models.

3 PRELIMINARIES

We now review the necessary background and definitions. For further details, please refer to A1.

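Downsampling of sequences. Given a subsampling factor $R$ and a signal $x \in \mathbb{R}^N$, the subsampling operation is defined as

$$\mathrm{Sub}_R : \mathbb{R}^N \to \mathbb{R}^{\lfloor N/R \rfloor} \quad \forall N \in \mathbb{Z}^+, \quad \text{where } \mathrm{Sub}_R(x)[n] \triangleq x[Rn]. \tag{1}$$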

When we subsample a signal following Eq. (1), it can often result in a distorted signal due to aliasing. To avoid this, an anti-aliasing filter is used to remove high-frequency content, i.e., to obtain a bandlimited signal, before subsampling. An ideal anti-aliasing filter, denoted as $h$, is used to remove all frequency content above the Nyquist frequency (Shannon, 1949). In summary, the ideal downsampling can be expressed as an anti-aliasing filter followed by the subsampling operation as:

$$\mathrm{Dwn}_R(x)[n] = (x \ast h)[Rn], \quad \text{where } \mathcal{F}(h)[i] = \begin{cases} 1 & \text{if } i \le f_{\text{Nyquist}} \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$

Here, $\mathcal{F} : \mathbb{R}^n \to \mathbb{C}^n$ denotes the discrete Fourier transform.
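As an illustration of Eqs. (1) and (2), the following minimal Python sketch subsamples a periodic sequence with and without an ideal low-pass filter. The function names and the exact cutoff convention (keep a bin when its frequency is at most $N/(2R)$) are assumptions made for this example, not the authors' implementation.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform, kept simple for clarity."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of `dft`."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def subsample(x, rate):
    """Eq. (1): keep x[0], x[R], x[2R], ... (floor(N / R) samples)."""
    return [x[rate * i] for i in range(len(x) // rate)]

def downsample(x, rate):
    """Eq. (2): ideal low-pass filter, then subsample.

    Assumption: a bin is kept when its frequency |k| is at most N / (2R).
    """
    n = len(x)
    cutoff = n / (2 * rate)
    spectrum = dft(x)
    for k in range(n):
        if min(k, n - k) > cutoff:  # fold bin index k to the frequency |k|
            spectrum[k] = 0
    filtered = [value.real for value in idft(spectrum)]
    return subsample(filtered, rate)

N, R = 8, 2
low = [math.cos(2 * math.pi * 1 * t / N) for t in range(N)]   # frequency 1
high = [math.cos(2 * math.pi * 3 * t / N) for t in range(N)]  # frequency 3

print(subsample(high, R))   # aliased: resembles a lower-frequency signal
print(downsample(high, R))  # approximately zero after anti-aliasing
print(downsample(low, R))   # approximately subsample(low, R)
```

The high-frequency input shows why the filter matters: plain subsampling returns a nonzero sequence that is indistinguishable from a lower frequency, while the filtered version removes it before the samples are discarded.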

Remarks: Subsampling a finite sequence involves retaining every $R$-th element of the signal. At a glance, it is not obvious how to generalize this subsampling strategy to a finite group $G$. As $G$ is a set, there is no notion of "every $R$-th element". Naively sorting the elements and applying the subsampling for sequences would also not work, e.g., the subsampled set may not be a subgroup.

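G-Equivariance. In deep learning, imposing equivariance on the layers is often desirable. We say a linear map (layer) $\mathbf{W} \in \mathbb{R}^{n' \times n}$ is equivariant with respect to a group $G$ with representations $\rho_U : G \to GL(U)$ and $\rho_{U'} : G \to GL(U')$, with $U \subseteq \mathbb{R}^n$ and $U' \subseteq \mathbb{R}^{n'}$, if

$$\mathbf{W} \rho_U(g) u = \rho_{U'}(g) \mathbf{W} u \quad \forall g \in G,\ u \in U. \tag{3}$$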
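To make Eq. (3) concrete, the sketch below checks it numerically for the cyclic group $\mathbb{Z}_n$ acting by circular shifts. A circular convolution satisfies the condition, whereas a position-dependent scaling does not. The helper names are illustrative assumptions, and the check only tests one input vector rather than proving equivariance.

```python
def shift(u, g):
    """Regular representation of Z_n on sequences: (rho(g) u)[i] = u[i - g]."""
    n = len(u)
    return [u[(i - g) % n] for i in range(n)]

def circ_conv(h, u):
    """Circular convolution of filter h with signal u (both of length n)."""
    n = len(u)
    return [sum(h[(i - j) % n] * u[j] for j in range(n)) for i in range(n)]

def is_equivariant(layer, u, tol=1e-12):
    """Check layer(rho(g) u) == rho(g) layer(u) for every shift g on the input u."""
    n = len(u)
    return all(
        max(abs(a - b) for a, b in zip(layer(shift(u, g)), shift(layer(u), g))) <= tol
        for g in range(n)
    )

h = [0.5, 0.25, 0.0, 0.25]
u = [1.0, -2.0, 3.0, 0.5]

print(is_equivariant(lambda v: circ_conv(h, v), u))                   # convolution
print(is_equivariant(lambda v: [i * a for i, a in enumerate(v)], u))  # position-dependent scaling
```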

Generating set. A subset $S$ of a group $G$ is said to be a generating set if any element $g \in G$ can be expressed as a product of the elements of $S$. We use the notation $G = \langle S \rangle$ to denote that $G$ is generated by $S$ and assume the identity element $e \notin S$. The set $S$ is called a minimal generating set when $\langle S \setminus \{s\} \rangle \neq G\ \ \forall s \in S$, i.e., every element of $S$ is necessary to generate the group $G$. Note that a group can have more than one minimal generating set. We call the $k$-th power of an element $s \in S$ non-redundant if $s^k$ cannot be expressed as a product of the rest of the generating elements $S \setminus \{s\}$ when $s^k \neq e$.

Cayley graph. To better understand the abstract structure of a group, one approach is to represent it as a graph, namely, the Cayley graph. Given a group $G$ and its generating set $S$, a Cayley graph $\Gamma(G, S)$ consists of vertices $V$ and edges $E$. The vertices correspond to the elements $g \in G$, and there exists an edge $(a, b) \in E$ if there exists an $s \in S$ such that $b = a \cdot s$. In the directed Cayley graph, edges are directed from $a$ to $b$. Any element $g \in G$ can be represented as a path on $\Gamma(G, S)$ starting from the identity node $e$.

Fourier transform for finite groups. The notion of the Fourier transform has also been studied on groups (Folland, 2016; Stankovic et al., 2005). For a finite group $G$, let $\hat{G}$ be the set of complex unitary irreducible representations (complex irreps). We denote the dimensionality of an irrep $\varphi \in \hat{G}$ as $d_\varphi$ such that $\varphi(g) \in \mathbb{C}^{d_\varphi \times d_\varphi}\ \ \forall g \in G$. The Fourier transform of a square-integrable function $f \in L^2(G)$ is

$$\hat{f}(\varphi_i^{mn}) = \frac{1}{|G|} \sum_{g \in G} \sqrt{d_{\varphi_i}}\, f(g)\, \varphi_i^{mn}(g) \quad \forall \varphi_i \in \hat{G} \text{ and } 1 \le m, n \le d_{\varphi_i}, \tag{4}$$

where $\varphi_i^{mn}(g)$ denotes the entry at the $m$-th row and $n$-th column of the matrix $\varphi_i(g)$. Next, $\hat{f}(\varphi_i^{mn})$ denotes the Fourier coefficient corresponding to the irrep component $\varphi_i^{mn}$. Similarly, the inverse Fourier transform on a group can be expressed as

$$f(g) = \sum_{\varphi_i \in \hat{G}} \ \sum_{m, n \le d_{\varphi_i}} \hat{f}(\varphi_i^{mn}) \sqrt{d_{\varphi_i}}\, \varphi_i^{mn}(g), \tag{5}$$

where we denote the set of orthonormal Fourier basis as $\{\sqrt{d_{\varphi_i}}\, \varphi_i^{mn} \mid \varphi_i \in \hat{G} \text{ and } m, n \le d_{\varphi_i}\}$ following the Peter-Weyl theorem (Peter & Weyl, 1927). For real irreps, the orthonormal basis set is constructed by only taking non-redundant columns of $\varphi_i$ (see Supp C of Cesa et al. (2021)).

In this work, we consider signals $x \in \mathcal{X}_G \triangleq \{x : G \to \mathbb{R}^d\}$ to be unconstrained real-valued functions over a finite group $G$. For readability, we describe the content with $d = 1$, which can be easily generalized. A group element $g$ acts on the space $\mathcal{X}_G$ via the regular representation $\rho_{\mathcal{X}_G}$, i.e., $(\rho_{\mathcal{X}_G}(g)x)(u) = x(g^{-1}u)\ \ \forall u \in G$. Our goal is to design a downsampling operator $\mathrm{Dwn}^G_R : \mathcal{X}_G \to \mathcal{X}_{G'}$, which resamples the signal on a group $G$ to be on a subgroup $G' \subset G$.

Algorithm 1 Uniform group subsampling

Input: group $G$, generators $S$, subsampling rate $R$, generator $s_d$
Output: subsampled group $G'$

    // Get directed Cayley graph
    V, E ← DiCay(G, S)
    E' ← E.copy()
    for each v ∈ V do
        // remove generator s_d
        E'.remove((v, v · s_d))
        // add generator s_d^R
        E'.add((v, v · s_d^R))
    end for
    // BFS traversal from e
    Q ← ∅
    G' ← ∅
    Q.enqueue(e)
    while Q ≠ ∅ do
        n ← Q.dequeue()
        G'.add(n)
        for each (n, m) ∈ E' do
            // skip nodes that were already visited or are already queued
            if m ∉ G' and m ∉ Q then
                Q.enqueue(m)
            end if
        end for
    end while
    return G'

This involves addressing the following: (a) Given a group $G$ and a subsampling rate $R$, what is an appropriate subgroup $G'$? (b) Given a group and a subgroup, what is the notion of bandlimited-ness to guide the design of anti-aliasing? We answer these questions by proposing a downsampling operation that generalizes the existing notion of subsampling (§4.1) and sampling theorem (§4.2) from sequences to finite groups.

4.1 UNIFORM GROUP SUBSAMPLING

A natural generalization from subsampling of sequences in Eq. (1) to signals on a group is to keep the signal on a subgroup $G'$ and discard the rest:

$$x' = \mathrm{Sub}^G_R(x) \quad \text{with } x'[g'] = x[g'] \quad \forall g' \in G', \tag{6}$$

where the downsampled signal is denoted by $x' : G' \to \mathbb{R}$. However, it is not obvious how to obtain such a $G'$ and how to relate it to the rate $R$.

Given a group $G$ and a subsampling rate $R$, we propose the uniform group subsampling, which returns a subgroup $G'$. Our subsampling algorithm intuitively generalizes from the traditional subsampling and is guaranteed to return a subgroup under mild conditions (details in Clm. 1). Our approach breaks subsampling into two parts: subsampling on a group for a specific generator (Alg. 1), and how to choose the generator.

The key idea behind Alg. 1 is to leverage the structure of the Cayley graph to perform the subsampling. Consider a generating set $S = \{s_1, s_2, \dots, s_n\}$ for a group $G = \langle S \rangle$. Let each generator $s_i \in S$ have order $o_i$, i.e., $s_i^{o_i} = e$; such an order always exists (Isaacs, 2009). We view the uniform subsampling of $G$ by a factor of $R$ for a generator $s_d$ as uniformly discarding elements along the path $(e, s_d^1, s_d^2, \dots, s_d^{o_d - 1})$ on the Cayley graph of $G$. This can also be viewed as adding the generator $s_d^R$ to the generating set $S$ while removing the generator $s_d$. In Example 1, we illustrate the proposed Alg. 1 applied to a sequence.

Example 1. Discrete-time periodic signal of period 4. The domain corresponds to the translation group on a periodic 1D grid of size 4, with the generator 1 representing discrete time translation ($\Delta t$). The group action is addition modulo 4, indicating a periodic time shift. Its Cayley graph is shown in Fig. 1(a). When downsampling by a factor of 2, the generator $\Delta t$ combines to $2\Delta t$. Observe that this is equivalent to the subsampling in Eq. (1) by discarding every other element.

Figure 1: (a) Cayley graph of the group with generator $\Delta t$. (b) Edges corresponding to the generator $\Delta t = 1$ are removed (dotted edges) and new edges corresponding to the element $2\Delta t$ are added. (c) The resultant cyclic subgroup of size 2 obtained by traversing the new graph from node 0.
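The procedure of Alg. 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the group is passed as a list of hashable elements together with a product function `mul`, and the helper names (`power`, `uniform_group_subsample`, `dihedral_mul`) as well as the encoding of dihedral elements as pairs `(k, f)` for $r^k s^f$ are hypothetical choices for this example.

```python
from collections import deque

def power(mul, identity, s, k):
    """Return s^k under the group product `mul`."""
    out = identity
    for _ in range(k):
        out = mul(out, s)
    return out

def uniform_group_subsample(elements, mul, identity, generators, s_d, rate):
    """Sketch of Alg. 1: swap generator s_d for s_d^rate, then BFS from the identity."""
    s_new = power(mul, identity, s_d, rate)
    new_gens = [s for s in generators if s != s_d] + [s_new]
    # Edges of the modified directed Cayley graph: v -> v * s for each kept generator.
    edges = {v: [mul(v, s) for s in new_gens] for v in elements}
    subgroup, queue = {identity}, deque([identity])
    while queue:
        v = queue.popleft()
        for m in edges[v]:
            if m not in subgroup:  # visited check keeps the traversal finite
                subgroup.add(m)
                queue.append(m)
    return subgroup

def dihedral_mul(n):
    """Product on the dihedral group with n rotations; (k, f) encodes r^k s^f."""
    def mul(a, b):
        (ka, fa), (kb, fb) = a, b
        return ((ka + (kb if fa == 0 else -kb)) % n, (fa + fb) % 2)
    return mul

# Example 1: translations on a periodic grid of size 4, generator 1, rate 2.
z4 = uniform_group_subsample(range(4), lambda a, b: (a + b) % 4, 0, [1], 1, 2)
print(sorted(z4))

# The group <s, r | s^2 = r^4 = e, sr = r^3 s>, subsampled along r and along s.
mul = dihedral_mul(4)
elems = [(k, f) for k in range(4) for f in range(2)]
r, s = (1, 0), (0, 1)
print(sorted(uniform_group_subsample(elems, mul, (0, 0), [r, s], r, 2)))
print(sorted(uniform_group_subsample(elems, mul, (0, 0), [r, s], s, 2)))
```

In the dihedral case, subsampling along `r` keeps the reflection in the result, while subsampling along `s` leaves only rotations, since $s^2 = e$ contributes no new edges.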

Equipped with the intuition, we now introduce two lemmas before going into the conditions and show why Alg. 1 returns a $G'$ that is a subgroup of $G$.

Lemma 1. For the set $G'$ returned by Alg. 1, $v \in G'$ if and only if $v$ can be expressed as a product of the elements of the set $S' = S \setminus \{s_d\} \cup \{s_d^R\}$.

Lemma 2. For the set $S'$ in Lemma 1, each element $s_i \in S' \implies s_i^{-1} \in G'$.

Please see A3.1 and A3.2 for the complete proofs of the lemmas. With some mild assumptions, using the above lemmas, we can show that the set $G'$ returned by Alg. 1 is a subgroup of $G$ (see A3.3). To guarantee that we are indeed downsampling, additional conditions are required such that $G'$ is a proper subset of $G$, i.e., the size of $G'$ is smaller. Specifically, we need conditions to ensure that the discarded group elements cannot be regenerated from the remaining ones.

Claim 1. If $S_d^k = \{s_d^k : k \in \mathbb{Z}^+ \text{ and } k \bmod R \neq 0\}$ are non-redundant powers of $s_d$, $o_d \bmod R = 0$, and the elements of $S_d^k$ cannot be represented as a product of the elements of the left cosets of the subgroup $G_{\text{sub}} = \langle S \setminus \{s_d\} \rangle$ generated by the set $\{s_d^{nR} : n \in \mathbb{Z}^+_0\}$, then Alg. 1 returns a proper subgroup $G' \subset G$.

Proof. We show that $G'$ forms a group by verifying closure (using Lem. 1), the existence of inverses (using Lem. 2), and that associativity and identity hold by construction. We then prove that $G'$ is a proper subset of $G$ by showing that, under the assumptions, elements can only be discarded in Alg. 1. The formal proof is provided in A3.3.

Clm. 1 imposes conditions that restrict the regeneration of discarded elements by Alg. 1, ensuring a proper subgroup. For a better understanding of the implications of this claim, we provide a visual illustration in A4. With Alg. 1, we can subsample a group given a specific generator. If there are multiple generators, then different subgroups can be formed. We now discuss how to choose among these subgroups.

Choice of subgroups. The choice of subgroup matters. Choosing a generator sd with a small order to subsample may lead to the complete exclusion of transformations associated with it; see Example 2.

Example 2. Subsampling of dihedral group $D_8$. Here, we illustrate the effect of the choice of generators while subsampling the group $D_8 = \langle s, r \mid s^2 = r^4 = e,\ sr = r^3 s \rangle$. While subsampling by a factor of 2, we can subsample along the generator $s$, resulting in the cyclic subgroup of rotations $C_4$. Or, according to our proposed algorithm, we can subsample along $r$, resulting in the subgroup $D_4$.

Figure 2: Subsampling group D8 along the generator s (on left) and r (on right). The edges corresponding to the subsampling generators are dotted in the Cayley graph.

Based on this intuition, we propose a heuristic for selecting a set of generators $D_s$ to subsample $G$ with sampling factors $R_s$ along each generator in $D_s$. Given the subsampling rate $R$, we decompose it into prime factors, i.e., $R = R_1 \cdot R_2 \cdot R_3 \cdots$, sorted in descending order. For each $R_i$, starting from $i = 1$, we select the generator with the maximum order satisfying the constraint outlined in Clm. 1. Subsampling by the factor $R$ can be conceptualized as a sequential subsampling, each by $R_i$. The algorithms for a generalized approach to subsampling and their time complexity analysis are provided in A5. With subsampling defined, we will next generalize the notion of bandlimited-ness and propose an equivariant anti-aliasing operator for signals on a group.

Remarks: The proposed algorithm and heuristic offer a general framework for uniformly subsampling subgroups from any finite group, extending the concept of sampling rate to groups. The heuristic seeks to maximize the number of generators in the subgroup. In practice, choosing the subgroup is a key hyperparameter influenced by the application and may require domain expertise.
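The generator-selection heuristic can be sketched as follows. This is a rough illustration, not the procedure from A5: the constraint of Clm. 1 is replaced here by a simpler proxy (the generator's order must be divisible by the prime factor, and the group must shrink by exactly that factor), and all function names and the pair encoding `(k, f)` of dihedral elements $r^k s^f$ are assumptions for the example.

```python
def generate(mul, identity, gens):
    """All elements reachable from the identity by right-multiplying generators."""
    seen, stack = {identity}, [identity]
    while stack:
        v = stack.pop()
        for s in gens:
            m = mul(v, s)
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen

def order(mul, identity, s):
    """Smallest k >= 1 with s^k = e."""
    k, p = 1, s
    while p != identity:
        p, k = mul(p, s), k + 1
    return k

def power(mul, identity, s, k):
    out = identity
    for _ in range(k):
        out = mul(out, s)
    return out

def prime_factors_desc(rate):
    factors, p = [], 2
    while rate > 1:
        while rate % p == 0:
            factors.append(p)
            rate //= p
        p += 1
    return sorted(factors, reverse=True)

def select_generators(mul, identity, gens, rate):
    """Sequentially subsample by each prime factor along the highest-order generator."""
    gens = list(gens)
    size = len(generate(mul, identity, gens))
    for p in prime_factors_desc(rate):
        by_order = sorted(gens, key=lambda s: order(mul, identity, s), reverse=True)
        for s_d in by_order:
            if order(mul, identity, s_d) % p != 0:
                continue
            s_new = power(mul, identity, s_d, p)
            candidate = [s for s in gens if s != s_d]
            if s_new != identity:
                candidate.append(s_new)
            # Proxy for the conditions of Clm. 1: the group must shrink by exactly p.
            if len(generate(mul, identity, candidate)) * p == size:
                gens, size = candidate, size // p
                break
        else:
            raise ValueError(f"no generator admits subsampling by {p}")
    return gens

def dihedral_mul(n):
    """Product on the dihedral group with n rotations; (k, f) encodes r^k s^f."""
    def mul(a, b):
        (ka, fa), (kb, fb) = a, b
        return ((ka + (kb if fa == 0 else -kb)) % n, (fa + fb) % 2)
    return mul

mul = dihedral_mul(12)  # dihedral group of order 24
gens = select_generators(mul, (0, 0), [(1, 0), (0, 1)], 4)
subgroup = generate(mul, (0, 0), gens)
print(len(subgroup), (0, 1) in subgroup)
```

With rate 4 on the dihedral group of order 24, the sketch subsamples twice along the rotation generator and keeps the reflection, so the result has order 6 and is still dihedral.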

4.2 THE SUBGROUP SAMPLING THEOREM FOR SIGNALS ON GROUPS

In multi-rate signal processing, the sampling theorem states a sufficient condition (bandlimited-ness) on a signal such that perfect reconstruction can be achieved given the signal sampled at a lower rate (Vetterli et al., 2014), i.e., how to sample and interpolate between finite-dimensional vectors. In this section, we propose a sampling theory for signals on finite groups, i.e., a condition that allows for perfect reconstruction from subgroups, and an anti-aliasing filter to ensure that the signal satisfies the condition. We now establish a vectorized notation of the signal to aid the discussion.

Recall that we are considering a signal $x \in \mathcal{X}_G \triangleq \{x : G \to \mathbb{R}\}$, where $G$ is a finite set with size $N$; then $x$ can be equivalently expressed by a finite-dimensional vector $\mathbf{x} \in \mathbb{R}^N$ such that $\mathbf{x}[i] \triangleq x[g_i]$, where $g_i$ denotes the $i$-th element of the group $G$ in an arbitrary fixed order.

Using this notation, the Fourier transform for a finite group $G$ in Eq. (4) can be expressed as a matrix multiplication $\hat{\mathbf{x}} = \mathbf{F}_G \mathbf{x}$, where $\hat{\mathbf{x}} \in \mathbb{C}^N$ denotes the Fourier coefficients. Similarly, the inverse Fourier transform can be expressed as $\mathbf{x} = \mathbf{F}_G^{-1} \hat{\mathbf{x}}$. Note that $\mathbf{F}_G^{-1}$ and $\mathbf{F}_G$ are orthonormal bases.

Next, the sampling operation in Eq. (6) and the interpolation operation can be expressed as matrix multiplications:

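$$\text{Sampling: } \mathbf{x}' = \mathbf{S}\mathbf{x}, \qquad \text{Sampling followed by interpolation: } \tilde{\mathbf{x}} = \mathbf{I}\mathbf{x}' = \mathbf{I}\mathbf{S}\mathbf{x}, \tag{7}$$

where $\mathbf{S} \in \mathbb{R}^{M \times N}$ (with $M < N$) is the sampling matrix and $\mathbf{I} \in \mathbb{R}^{N \times M}$ denotes the interpolation matrix. A perfect reconstruction is achieved when $\tilde{\mathbf{x}} = \mathbf{x}$, which is not true in general. Eq. (7) describes the standard setup used for deriving sampling theory for signals on different domains (Vetterli et al., 2014; Chen et al., 2015a).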

We now define the sufficient condition, i.e., "bandlimited-ness", for signals on groups where perfect reconstruction is possible from signals on the corresponding subgroups.

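Bandlimited functions for subgroup subsampling. Our main insight is based on the observation that for any bandlimited function $\mathbf{x}$ we need to establish a map $\mathbf{M} \in \mathbb{C}^{N \times M}$ from the Fourier coefficients of the subsampled signal $\hat{\mathbf{x}}' \triangleq \mathbf{F}_{G'}\mathbf{x}'$ to the Fourier coefficients $\hat{\mathbf{x}}$, which results in the following dependencies:

$$\hat{\mathbf{x}} = \mathbf{M}\hat{\mathbf{x}}' \implies \mathbf{F}_G^{-1}\hat{\mathbf{x}} = \mathbf{F}_G^{-1}\mathbf{M}\hat{\mathbf{x}}' \implies \mathbf{x}' = \mathbf{S}\mathbf{F}_G^{-1}\mathbf{M}\hat{\mathbf{x}}'. \tag{8}$$

Combining Eq. (8) and the fact that $\mathbf{x}' = \mathbf{F}_{G'}^{-1}\hat{\mathbf{x}}'$, we establish the following relationship between $\mathbf{M}$, $\mathbf{S}$ and the Fourier bases:

$$\mathbf{F}_{G'}^{-1} = \mathbf{S}(\mathbf{F}_G^{-1}\mathbf{M}) = \mathbf{S}\mathbf{B}. \tag{9}$$

Eq. (9) can be informally viewed as choosing a set of vectors $\mathbf{B} \triangleq \mathbf{F}_G^{-1}\mathbf{M}$ defined on $G$ such that when subsampled to the subgroup $G'$, they generate the Fourier basis for the subgroup $G'$.¹ Consequently, we define the interpolation matrix as $\mathbf{I} = \mathbf{B}\mathbf{F}_{G'}$.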

We now state our proposed definition of bandlimited signals in the context of subgroup subsampling.

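Claim 2. Subgroup Sampling Theorem. For any signal $\mathbf{x}$ on $G$, if the Fourier coefficients $\hat{\mathbf{x}}$ are in the 1-eigenspace of $\mathbf{M}(\mathbf{M}^\dagger\mathbf{M})^{-1}\mathbf{M}^\dagger$, then it can be reconstructed perfectly from the subsampled signal $\mathbf{x}'$ on $G'$. The superscript $\dagger$ denotes the conjugate transpose.

Proof. To prove the claim, we show that

$$\hat{\mathbf{x}} = \mathbf{M}(\mathbf{M}^\dagger\mathbf{M})^{-1}\mathbf{M}^\dagger\hat{\mathbf{x}} \implies \mathbf{x} = \mathbf{B}(\mathbf{B}^\dagger\mathbf{B})^{-1}\mathbf{B}^\dagger\mathbf{x} \implies \mathbf{x} = \mathbf{P}_{\mathbf{M}}\mathbf{x}. \tag{10}$$

Here, $\mathbf{P}_{\mathbf{M}} \triangleq \mathbf{B}(\mathbf{B}^\dagger\mathbf{B})^{-1}\mathbf{B}^\dagger$ denotes the projection matrix onto the column space of $\mathbf{B} \triangleq \mathbf{F}_G^{-1}\mathbf{M}$. This means that $\mathbf{x}$ is in $\mathrm{Span}(\mathbf{B})$, i.e., we can express $\mathbf{x} = \mathbf{B}\hat{\mathbf{x}}_c$ for some coefficient vector $\hat{\mathbf{x}}_c$. Perfect reconstruction from the subsampled signal $\mathbf{x}'$ is now possible, i.e.,

$$\mathbf{I}\mathbf{x}' = (\mathbf{B}\mathbf{F}_{G'})(\mathbf{S}\mathbf{x}) = (\mathbf{B}\mathbf{F}_{G'}\mathbf{S})(\mathbf{B}\hat{\mathbf{x}}_c) = \mathbf{B}\mathbf{F}_{G'}\mathbf{F}_{G'}^{-1}\hat{\mathbf{x}}_c = \mathbf{B}\hat{\mathbf{x}}_c = \mathbf{x}. \tag{11}$$

The complete proof is provided in A3.4.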
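The chain of Eqs. (7) to (11) can be checked numerically in the simplest setting. The sketch below uses the cyclic group $\mathbb{Z}_{10}$ and its index-2 subgroup, with an explicitly constructed low-pass $\mathbf{B}$. The normalization of the basis vectors, the variable names, and the shortcut $\mathbf{B}^\top\mathbf{B} = (N/M)\,\mathbf{I}$ are assumptions that hold for this particular construction only.

```python
import math
import random

N, R = 10, 2
M_SIZE = N // R                  # |G'|, odd here so G' has no Nyquist term
SUBGROUP = list(range(0, N, R))  # G' = {0, 2, 4, 6, 8} inside G = Z_10

def basis_entry(j, n, period):
    """j-th real Fourier function of C_{M_SIZE}, evaluated at angle 2*pi*k*n/period."""
    if j == 0:
        return 1 / math.sqrt(M_SIZE)
    k = (j + 1) // 2
    angle = 2 * math.pi * k * n / period
    trig = math.cos(angle) if j % 2 == 1 else math.sin(angle)
    return math.sqrt(2 / M_SIZE) * trig

# B = F_G^{-1} M: low-frequency vectors on G (N x M_SIZE).
B = [[basis_entry(j, n, N) for j in range(M_SIZE)] for n in range(N)]
# F_{G'}^{-1}: orthonormal real Fourier basis of the subgroup (M_SIZE x M_SIZE).
F_SUB_INV = [[basis_entry(j, m, M_SIZE) for j in range(M_SIZE)] for m in range(M_SIZE)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sample(x):
    """S x: restrict the signal to the subgroup."""
    return [x[g] for g in SUBGROUP]

def anti_alias(x):
    """P_M x = B (B^T B)^{-1} B^T x, using B^T B = (N / M_SIZE) * I for this B."""
    coeffs = [c * M_SIZE / N for c in matvec(transpose(B), x)]
    return matvec(B, coeffs)

def interpolate(x_sub):
    """I x' = B F_{G'} x', where F_{G'} is the transpose of F_SUB_INV."""
    return matvec(B, matvec(transpose(F_SUB_INV), x_sub))

def sq_error(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

random.seed(0)
x = [random.gauss(0, 1) for _ in range(N)]
x_bl = anti_alias(x)
print(sq_error(interpolate(sample(x_bl)), x_bl))  # close to zero
print(sq_error(interpolate(sample(x)), x))        # clearly nonzero without anti-aliasing
```

The first printed value corresponds to reconstructing a bandlimited signal and the second to reconstructing the raw signal, which is the same with/without comparison described for the numerical validation of Clm. 2.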

To provide some intuition, let's study how this definition applies to cyclic groups.

¹ The construction of such an $\mathbf{M}$ for an arbitrary group $G$ and its subgroup $G'$ is nontrivial, as the irreps of the group and the subgroup that constitute the corresponding Fourier bases often differ in dimensions.

Example 3. Bandlimited-ness for Cyclic Groups. For real-valued functions over the finite cyclic group $C_N$, the real Fourier bases consist of the constant function together with the normalized sinusoids $\cos(2\pi k n / N)$ and $\sin(2\pi k n / N)$ for $1 \le k < N/2$, where $k$ represents the frequency and $n \in \mathbb{Z}/N\mathbb{Z}$ represents the elements of $C_N$. If $N$ is even, there is an additional basis function at frequency $k = N/2$. Assuming the Fourier coefficients are arranged in ascending frequency and uniform downsampling by a factor of 2 (with $N \bmod 2 = 0$), Eq. (12) holds, where $\mathbf{0}_{N/2}$ is a zero matrix of size $N/2$, as the Fourier bases of $C_{N/2}$ are formed by sinusoids of lower frequencies. The corresponding $\mathbf{M} \in \mathbb{C}^{N \times N}$ is:

$$\mathbf{M}_{ij} = \begin{cases} 1 & \text{if } i = j \text{ and } i \le \frac{N}{2} \\ 0 & \text{otherwise.} \end{cases} \tag{13}$$

The vector $\hat{\mathbf{x}}$ lies in the 1-eigenspace if $\hat{\mathbf{x}}[i] = 0$ for $i > \frac{N}{2}$, aligning precisely with the conventional concept of bandlimited-ness.

Remarks: We have now defined what it means for signals on a finite group to be bandlimited with respect to a given $\mathbf{M}$ that satisfies Eq. (9). To ensure that the signal is bandlimited before subsampling, we can use the projection matrix $\mathbf{P}_{\mathbf{M}}$ to ensure that the signal satisfies the condition in Clm. 2, i.e., perform an ideal anti-aliasing. However, it is easy to observe that $\mathbf{M}$ is not unique. While many choices of $\mathbf{M}$ achieve perfect reconstruction, they may not be suitable for feature learning. Specifically, the anti-aliasing operation should be equivariant to group actions and preserve some notion of smoothness. We now discuss how to find such an $\mathbf{M}$.

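Equivariant anti-aliasing operator. We denote the ideal anti-aliasing operator $\mathbf{P}_{\mathbf{M}}$ in the Fourier space as $\hat{\mathbf{P}}_{\mathbf{M}} \triangleq \mathbf{F}_G\mathbf{P}_{\mathbf{M}}\mathbf{F}_G^{-1}$. Our goal is to find an $\mathbf{M}^*$ that achieves perfect reconstruction, performs an equivariant anti-aliasing operation, and extracts smooth features. We formulate this goal as an optimization problem:

$$\mathbf{M}^* = \arg\min_{\mathbf{M}} \underbrace{\left\| \mathrm{vec}(\hat{\mathbf{P}}_{\mathbf{M}}) - \mathbf{T}\,\mathrm{vec}(\hat{\mathbf{P}}_{\mathbf{M}}) \right\|_2^2}_{\text{Equivariance Objective}} + \lambda \underbrace{\left\| \mathrm{Diag}\!\left( (\mathbf{F}_G^{-1})^\dagger \mathbf{L}\, \mathbf{F}_G^{-1} \right) \mathbf{M}^{|\cdot|} \right\|_1}_{\text{Smooth Selection Objective}} \tag{14}$$

$$\text{subject to } \mathbf{F}_{G'}^{-1} = \mathbf{S}\mathbf{F}_G^{-1}\mathbf{M} \quad \text{(Perfect Reconstruction Constraint).}$$

Here, $\lambda > 0$ is a hyperparameter balancing equivariance and smoothness, the superscript $|\cdot|$ denotes the elementwise absolute value, $\mathrm{Diag}$ returns the diagonal elements as a row vector, and the details of $\mathbf{T}$ and $\mathbf{L}$ are described below.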

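To be a $G$-equivariant anti-aliasing operator, $\mathbf{P}_{\mathbf{M}}$ needs to satisfy the following equivariance constraint:

$$\hat{\mathbf{P}}_{\mathbf{M}}\, \hat{\rho}_{\mathcal{X}_G}(g)\, \hat{\mathbf{x}} = \hat{\rho}_{\mathcal{X}_G}(g)\, \hat{\mathbf{P}}_{\mathbf{M}}\, \hat{\mathbf{x}}, \quad \forall g \in G,\ \hat{\mathbf{x}} \in \mathbb{C}^N. \tag{15}$$

Here, we describe the equivariance constraint in the Fourier domain, where $\hat{\rho}_{\mathcal{X}_G}(g)$ corresponds to the action of the group $G$ on the Fourier coefficients formed by the direct sum of the corresponding irreps (see A1.2 for details).

Next, Mouli & Ribeiro (2021) show that linear operators that are contained within the 1-eigenspace of the Reynolds operator $\mathbf{T}$ corresponding to the tensor product representation

$$\hat{\rho}_{\mathcal{X}_G \otimes \mathcal{X}_G}(g) = \hat{\rho}_{\mathcal{X}_G}(g) \otimes \hat{\rho}_{\mathcal{X}_G}(g^{-1})^\top \tag{16}$$

satisfy Eq. (15), i.e., are equivariant, where

$$\mathbf{T} = \frac{1}{|G|} \sum_{g \in G} \hat{\rho}_{\mathcal{X}_G}(g) \otimes \hat{\rho}_{\mathcal{X}_G}(g^{-1})^\top. \tag{17}$$

Hence, the equivariance constraint on $\hat{\mathbf{P}}_{\mathbf{M}}$ can be written as $\mathrm{vec}(\hat{\mathbf{P}}_{\mathbf{M}}) = \mathbf{T}\,\mathrm{vec}(\hat{\mathbf{P}}_{\mathbf{M}})$ (see A1.3). Finally, we relax this equality condition as a penalty term to form the equivariance objective in Eq. (14).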
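The role of the Reynolds operator can be illustrated directly in the group domain. The sketch below is an assumption-laden simplification: it uses the regular representation of the cyclic group $\mathbb{Z}_n$ (circular shifts) instead of the Fourier-domain tensor-product form of Eq. (17), and averages $\rho(g)\,\mathbf{P}\,\rho(g)^{-1}$ over all $g$. The function names are illustrative.

```python
def reynolds(P):
    """Average rho(g) P rho(g)^{-1} over all shifts g of Z_n.

    For the shift representation, (rho(g) P rho(g)^{-1})[i][j] = P[i - g][j - g].
    """
    n = len(P)
    return [[sum(P[(i - g) % n][(j - g) % n] for g in range(n)) / n
             for j in range(n)] for i in range(n)]

def commutes_with_shifts(P, tol=1e-12):
    """Check P rho(g) = rho(g) P for every g, i.e. P[i - g][j - g] == P[i][j]."""
    n = len(P)
    return all(abs(P[(i - g) % n][(j - g) % n] - P[i][j]) <= tol
               for g in range(n) for i in range(n) for j in range(n))

P = [[float(4 * i + j) for j in range(4)] for i in range(4)]  # arbitrary operator
Q = reynolds(P)

print(commutes_with_shifts(P))  # the arbitrary operator is not equivariant
print(commutes_with_shifts(Q))  # its Reynolds average is
print(reynolds(Q) == Q)         # equivariant operators are fixed points
```

Operators left unchanged by the averaging are exactly those in its 1-eigenspace, which is the property the equivariance objective penalizes deviations from.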

Next, the smooth selection objective is designed to prefer smoother basis functions in constructing the bandlimited subspace. To quantify the smoothness of signals over groups, we view them as functions over their corresponding Cayley graphs. We adopt the notion of smoothness from graph signal processing, namely, the Laplacian quadratic form (Dong et al., 2016; Shuman et al., 2013) as the smoothness measure. The Laplacian quadratic form for a function $f$ on $G$ can be defined as $f^\top \mathbf{L} f$, where $\mathbf{L}$ is the Laplacian of the Cayley graph $\Gamma(G, S)$. A smaller value indicates a smoother function. Intuitively, the smooth selection objective can be viewed as penalizing the Fourier bases by their Laplacian quadratic form weighted by their corresponding elements in $\mathbf{M}^{|\cdot|}$.
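As a small worked example of the smoothness measure, the sketch below evaluates the Laplacian quadratic form on the undirected Cayley graph of $\mathbb{Z}_8$ with generator 1 (a cycle), where it reduces to a sum of squared differences across edges. The choice of group, the unit-norm cosine basis functions, and the function names are assumptions for illustration.

```python
import math

def laplacian_quadratic_form(f):
    """f^T L f on the cycle graph: sum of squared differences over edges (i, i + 1)."""
    n = len(f)
    return sum((f[i] - f[(i + 1) % n]) ** 2 for i in range(n))

def cosine_basis(k, n):
    """Unit-norm cosine of frequency k on Z_n (valid for 0 < k < n / 2)."""
    return [math.sqrt(2 / n) * math.cos(2 * math.pi * k * i / n) for i in range(n)]

n = 8
constant = [1 / math.sqrt(n)] * n
for name, f in [("constant", constant),
                ("k = 1", cosine_basis(1, n)),
                ("k = 3", cosine_basis(3, n))]:
    print(name, laplacian_quadratic_form(f))
```

For unit-norm cosines on the cycle, the value equals $2 - 2\cos(2\pi k / n)$, so it grows with frequency and higher-frequency bases receive a larger penalty.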

Finally, we solve the constrained optimization problem in Eq. (14) via Sequential Least Squares Programming (Kraft, 1988) to obtain $\mathbf{M}^*$, which defines the bandlimited-ness and a corresponding anti-aliasing operator $\mathbf{P}_{\mathbf{M}^*}$.

Anti-aliased G-CNN. In group equivariant CNN (Cohen & Welling, 2016), the input is first transformed to functions/features over the desired group. When performing subgroup subsampling, our designed subsampling and anti-aliasing operator are applied to these functions in the group. We discuss this operation in detail in Appendix A6.

5 EXPERIMENTS AND EVALUATIONS

5.1 EMPIRICAL VALIDATION FOR CLAIM 2

Table 1: Empirical Validation of Claim 2. We report the reconstruction error with / without the anti-aliasing operation. Anti-aliasing achieves zero reconstruction error up to numerical precision.

| Group | Subgroup | Sub. R. | Recon. Err. (with / without) |
|---|---|---|---|
| D28 | D14 | 2 | 1.72e-13 / 3.8 |
| D28 | C14 | 2 | 6.54e-13 / 4.0 |
| D28 | C7 | 4 | 9.48e-14 / 5.2 |
| D20 | D10 | 2 | 4.10e-11 / 3.3 |
| D20 | C10 | 2 | 3.03e-11 / 3.4 |
| D20 | D4 | 5 | 2.78e-14 / 4.7 |
| C30 | C15 | 2 | 5.18e-13 / 4.2 |
| C30 | C5 | 6 | 9.54e-14 / 5.9 |

We validate our theoretical findings in Clm. 2 by numerically checking the recovery of bandlimited functions after subsampling. We generate random signals $\mathbf{x}$ defined on the dihedral group $D_{2n} = \langle s, r \mid s^2 = r^n = (sr)^2 = e \rangle$ and the cyclic rotation group $C_n = \langle r \mid r^n = e \rangle$, sampling each value from the standard Gaussian $\mathcal{N}(0, 1)$. We consider subgroups $G'$, then apply the proposed downsampling technique: project $\mathbf{x}$ onto a bandlimited subspace by $\bar{\mathbf{x}} = \mathbf{P}_{\mathbf{M}}\mathbf{x}$ (anti-aliasing) and obtain $\bar{\mathbf{x}}'$ restricted to $G'$ using $\mathbf{S}$ (subsampling). Lastly, we interpolate the downsampled signal to the original group using $\tilde{\mathbf{x}} = \mathbf{I}\bar{\mathbf{x}}'$.

In Tab. 1, we report the reconstruction error, defined as the squared norm difference $\|\bar{\mathbf{x}} - \tilde{\mathbf{x}}\|_2^2$ between the bandlimited signal ($\bar{\mathbf{x}}$) and the interpolated signal ($\tilde{\mathbf{x}}$). We observe that the interpolation operator successfully reconstructs the bandlimited signal. To further study the proposed anti-aliasing operator, we visualize its response to the unit sample function $\delta_G[g]$, where $\delta_G[g] = 1$ if $g = e$ and 0 otherwise. This response to $\delta_G$ represents the smoothing filter used in anti-aliasing. In Fig. 3, we illustrate such filters. We observe that for the downsampling of the cyclic group (C16 to C8), the filter is reminiscent of the sinc function (Fig. 3), which is used in an ideal low-pass filter for sequences. This further illustrates the relation of our anti-aliasing to the classic anti-aliasing on sequences as explained in Example 3.

Remarks: In practice, ideal anti-aliasing operators are often approximated. For instance, the Gaussian blur filter, which approximates the sinc function, is commonly used to smooth signals and offers empirical advantages. Building on our theorem, there is potential for developing a more efficient smoothing filter directly in the group ("time") domain.

5.2 IMAGE CLASSIFICATION

We apply the proposed subgroup selection and anti-aliasing operator to equivariant CNN architectures. Note that, to use the proposed anti-aliasing filter $\mathbf{P}_{\mathbf{M}}$ in deep nets, we need to perform the optimization in Eq. (14) only once before training a model.

Experiment setup. As in prior works (Cesa et al., 2021; Cohen & Welling, 2017), we study the effects of subgroup subsampling and anti-aliasing on group equivariant classification models using the rotated MNIST (Deng, 2012) and CIFAR10 (Krizhevsky et al., 2009).

Figure 3: Visualization of the smoothing filter ($\mathbf{P}_{\mathbf{M}}\delta_G$) used in the anti-aliasing operation for subgroup subsampling. Panels: C16 to C8, D16 to C4, and D16 to D8, with nodes labeled by group elements ($e, r, r^2, \dots$ and $sr^k$). The vertical bar corresponds to the value of the filter at each node, with the downward bars indicating negative values.

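Table 2: Performance of G-equivariant models on Rotated MNIST and CIFAR-10 at different subsampling rates $R$ and with/without the anti-aliasing filter $\mathbf{P}_{\mathbf{M}}$ under the continuous rotation and roto-reflection symmetry (SO(2)/O(2)). Subgroup subsampling with anti-aliasing improves both equivariance and accuracy.

| Dataset | R | # Param. (×10³) | P_M | SO(2) Acc_no aug | SO(2) Acc_loc | SO(2) Acc_orbit | SO(2) L_equi | O(2) Acc_no aug | O(2) Acc_loc | O(2) Acc_orbit | O(2) L_equi |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Rotated MNIST | - | 323.11 | - | 0.9767 | 0.8234 | 0.8346 | 0.058 | 0.9752 | 0.8253 | 0.8496 | 0.039 |
| Rotated MNIST | 2 | 194.09 | ✗ | 0.9743 | 0.8007 | 0.8106 | 0.056 | 0.9774 | 0.6878 | 0.5660 | 0.092 |
| Rotated MNIST | 2 | 194.09 | ✓ | 0.9773 | 0.8301 | 0.8358 | 0.049 | 0.9807 | 0.6976 | 0.5749 | 0.091 |
| Rotated MNIST | 3 | 151.08 | ✗ | 0.9674 | 0.7762 | 0.7907 | 0.057 | 0.9731 | 0.8044 | 0.8316 | 0.046 |
| Rotated MNIST | 3 | 151.08 | ✓ | 0.9731 | 0.8057 | 0.8173 | 0.047 | 0.9724 | 0.8251 | 0.8451 | 0.037 |
| Rotated MNIST | 4 | 129.57 | ✗ | 0.9831 | 0.6283 | 0.5052 | 0.109 | 0.9810 | 0.6614 | 0.4816 | 0.109 |
| Rotated MNIST | 4 | 129.57 | ✓ | 0.9827 | 0.6547 | 0.5219 | 0.093 | 0.9806 | 0.6978 | 0.5006 | 0.098 |
| CIFAR-10 | - | 549.33 | - | 0.6934 | 0.4253 | 0.3708 | 0.322 | 0.7251 | 0.4463 | 0.3867 | 0.265 |
| CIFAR-10 | 2 | 291.29 | ✗ | 0.7060 | 0.4659 | 0.4096 | 0.398 | 0.7448 | 0.4757 | 0.3310 | 0.555 |
| CIFAR-10 | 2 | 291.29 | ✓ | 0.7088 | 0.4868 | 0.4279 | 0.336 | 0.7418 | 0.4720 | 0.3274 | 0.460 |
| CIFAR-10 | 3 | 205.27 | ✗ | 0.7006 | 0.4337 | 0.3766 | 0.549 | 0.7249 | 0.4210 | 0.3674 | 0.478 |
| CIFAR-10 | 3 | 205.27 | ✓ | 0.6945 | 0.4472 | 0.3876 | 0.379 | 0.7117 | 0.4794 | 0.4197 | 0.411 |
| CIFAR-10 | 4 | 162.26 | ✗ | 0.7075 | 0.4275 | 0.2866 | 0.625 | 0.7590 | 0.5205 | 0.2921 | 0.607 |
| CIFAR-10 | 4 | 162.26 | ✓ | 0.7000 | 0.4536 | 0.3091 | 0.439 | 0.7525 | 0.5425 | 0.3017 | 0.550 |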

To rigorously examine the impact of subsampling on the preservation of rotation and roto-reflection symmetry, we remove the digits "9", "2", and "4" from MNIST following Wang et al. (2023). These digits disrupt the symmetry assumption on the dataset that the labels remain unchanged under the group action. For instance, the digit "6" overlaps with "9" under a 180° rotation. The same is true for "2"/"5" and "4"/"7" under the roto-reflection group.

While we consider the symmetry of the image data to be continuous rotation/roto-reflection (SO(2)/O(2)), we use the discrete C24- and D24-equivariant CNNs for computational feasibility, which matches our theoretical assumptions. For MNIST and CIFAR-10, we train on 5k and 60k training images, respectively, and test on images at different levels of transformation (see A7 for details).

Evaluation metrics. We propose evaluation metrics to measure the equivariance and classification performance. Given a deep-net H, we use the features of image x to measure equivariance error

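$$L_{\mathrm{Equi.}}(x) = \frac{1}{|G|} \sum_{g \in G} \frac{\left\| H(\rho_{\mathrm{In}}(g)x) - \rho_{\mathrm{Out}}(g)H(x) \right\|_2^2}{\left\| H(x) \right\|_2^2}. \tag{18}$$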

The representations $\rho_{\mathrm{In}}$ and $\rho_{\mathrm{Out}}$ correspond to the group action on the input and output space of the deep-net H. In the context of image classification, $\rho_{\mathrm{In}}$ represents the action of 2D rotation (and flipping), while $\rho_{\mathrm{Out}}$ is the identity, i.e., invariance. Specifically, we use the pooled features from the final equivariant convolution layer as the invariant features (Weiler et al., 2018).
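As a rough illustration of the equivariance error in Eq. (18), the following minimal sketch evaluates it for a toy feature extractor. It assumes the group C4 acting on images by 90° rotations and an identity action on the output (invariance); the function name `equivariance_error` and the toy feature maps are hypothetical and are not the models used in the experiments.

```python
import numpy as np

def equivariance_error(H, x, rho_in, rho_out):
    """Mean over g of ||H(rho_in(g) x) - rho_out(g) H(x)||^2 / ||H(x)||^2."""
    hx = H(x)
    denom = np.sum(hx ** 2)
    errs = [np.sum((H(a(x)) - b(hx)) ** 2) / denom
            for a, b in zip(rho_in, rho_out)]
    return float(np.mean(errs))

# Assumption: C4 acts on the input by 90-degree rotations,
# and acts trivially on the output features (invariance).
rho_in = [lambda img, k=k: np.rot90(img, k) for k in range(4)]
rho_out = [lambda feat: feat] * 4

# Toy feature maps: a rotation-invariant one and a non-invariant one.
invariant_H = lambda img: np.array([img.sum(), (img ** 2).sum()])
non_invariant_H = lambda img: img[0].copy()  # first row depends on orientation

x = np.random.default_rng(0).normal(size=(5, 5))
err_inv = equivariance_error(invariant_H, x, rho_in, rho_out)
err_non = equivariance_error(non_invariant_H, x, rho_in, rho_out)
```

Here `err_inv` is zero up to floating-point rounding, while `err_non` is strictly positive, which is the behavior the metric is meant to capture.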

For the classification performance, we consider three accuracy metrics for evaluating the model performance under different degrees of equivariance:

Table 3: Impact of subgroup selection in subgroup sampling on a 3-layer equivariant CNN. "*" indicates selection based on our method. Our algorithm improves performance for various sampling rates.

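| Group | Sub. R. | Subgroups | Acc_no aug | Acc_loc | Acc_orbit |
|---|---|---|---|---|---|
| D24 | 1, 2, 2 | D24 → C12 → C6 | 0.9703 | 0.6215 | 0.6128 |
| D24 | 1, 2, 2 | D24 → D12 → D6* | 0.9726 | 0.6539 | 0.5489 |
| D24 | 1, 4, 1 | D24 → C6 → C6 | 0.9766 | 0.5244 | 0.4596 |
| D24 | 1, 4, 1 | D24 → D6 → D6* | 0.9767 | 0.6272 | 0.4860 |
| D28 | 1, 2, 1 | D28 → C14 → C14 | 0.9742 | 0.5852 | 0.5191 |
| D28 | 1, 2, 1 | D28 → D14 → D14* | 0.9786 | 0.7085 | 0.5792 |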

1. Accno aug: The accuracy of the model on the original (un-augmented) dataset.

2. Accloc: The accuracy of the model on the locally augmented dataset.

3. Accorbit: The accuracy of the model on the full (SO(2)/O(2)) orbit of the dataset. The orbit is constructed by taking all 10° rotations, and for local augmentations, we report on random rotations within 60° of the test set.

Results. In Tab. 2, we report the results for MNIST and CIFAR-10 datasets under SO(2) and O(2) symmetry. We report the average over 3 runs. For all models, the standard deviations are: Accno aug < 0.001, Accorbit and Accloc < 0.01, and Lequi < 0.004.

Overall, we observe that subgroup subsampling significantly reduces the parameter count of the equivariant models. However, increasing the sampling rate (e.g., R = 4) disrupts the strict equivariance constraint, leading to a higher equivariance error (Lequi). This manifests as a decrease in both Accorbit and Accloc, while the accuracy on the original test set, Accno aug, increases. Next, incorporating our anti-aliasing operation mitigates the invariance error and achieves higher Accorbit and Accloc. Notably, a lower sampling rate combined with appropriate anti-aliasing significantly reduces parameter usage while matching or even surpassing the accuracy and equivariance achieved with the full equivariant models.

We provide additional results of our model on the STL-10 dataset (Coates et al., 2011), where we observe similar performance gains (see Appendix A2.2).

Ablations. In Tab. 3, we provide an ablation of the proposed subgroup selection heuristic. For a 3-layer equivariant CNN with different sampling rates at different layers, we report the accuracy metrics for different choices of subgroups. We observe that, across symmetry groups and sampling rates, our proposed subgroup selection improves performance in most cases.

Furthermore, in A2.1, we demonstrate index selection by Xu et al. (2021) can be used with our technique. The results show further performance improvements and confirm that our method can easily be incorporated with existing techniques.

Limitations. As this is a theoretical paper, we fully acknowledge that our experiments are limited to small-scale datasets and models. These experiments are meant to study and demonstrate the potential of the proposed framework. Our proposed downsampling layer currently operates on finite groups rather than continuous ones. The time complexity of the subgroup selection algorithm scales quadratically, in the worst case, with the number of edges, |E|, in the Cayley graph (see A5).

6 CONCLUSION

We propose uniform subgroup downsampling for signals on finite groups with an equivariant antialiasing operation. We generalize the uniform subsampling operation to groups and propose a subgroup selection method based on maximizing the number of generators. We then extend the sampling theorem to subgroup subsampling, generalizing the notion of bandlimited-ness and antialiasing to groups. We apply these theories to equivariant CNN and empirically show that models with subgroup subsampling can achieve comparable or even better performance compared to full equivariant models. In summary, we believe our developed theory would serve as the foundation for future research in equivariant deep nets and signal processing on groups. We are particularly excited about how to find an optimal subgroup for a given task and how to design more effective anti-aliasing for signals on groups that would build on top of our framework.

ACKNOWLEDGMENT

The authors would like to thank Renan A. Rojas-Gomez for providing feedback on the draft version of this work, which helped to improve clarity.

REFERENCES

Erik J Bekkers, Maxime W Lafarge, Mitko Veta, Koen AJ Eppenhof, Josien PW Pluim, and Remco Duits. Roto-translation covariant convolutional networks for medical image analysis. In Proc. MICCAI, 2018. 2, 28

Gabriele Cesa, Leon Lang, and Maurice Weiler. A program to build E(N)-equivariant steerable CNNs. In Proc. ICLR, 2021. 3, 8, 16, 29

Anadi Chaman and Ivan Dokmanic. Truly shift-invariant convolutional neural networks. In Proc. CVPR, 2021. 17

Siheng Chen, Aliaksei Sandryhaila, and Jelena Kovačević. Sampling theory for graph signals. In Proc. ICASSP, 2015a. 2, 6

Siheng Chen, Rohan Varma, Aliaksei Sandryhaila, and Jelena Kovačević. Discrete signal processing on graphs: Sampling theory. IEEE TSP, 2015b. 2

Adam Coates, Andrew Ng, and Honglak Lee. An analysis of single-layer networks in unsupervised feature learning. In Proceedings AISTATS. JMLR Workshop and Conference Proceedings, 2011. 10, 18

Taco Cohen and Max Welling. Group equivariant convolutional networks. In Proc. ICML, 2016. 1, 2, 8, 18, 28, 29

Taco S Cohen and Max Welling. Steerable CNNs. In Proc. ICLR, 2017. 2, 8, 28

Taco S Cohen, Mario Geiger, and Maurice Weiler. A general theory of equivariant CNNs on homogeneous spaces. In Proc. NeurIPS, 2019. 28

Li Deng. The MNIST database of handwritten digit images for machine learning research. IEEE SPM, 2012. 8

MM Dodson. Groups and the sampling theorem. Sampling Theory in Signal and Image Processing, 2007. 2

Xiaowen Dong, Dorina Thanou, Pascal Frossard, and Pierre Vandergheynst. Learning Laplacian matrix in smooth graph signal representations. IEEE TSP, 2016. 8

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In Proc. ICLR, 2021. 1

Yuanqi Du, Limei Wang, Dieqiao Feng, Guifeng Wang, Shuiwang Ji, Carla P Gomes, Zhi-Ming Ma, et al. A new perspective on building efficient and expressive 3D equivariant graph neural networks. In Proc. NeurIPS, 2023. 2

Adel Faridani. A generalized sampling theorem for locally compact abelian groups. Mathematics of Computation, 1994. 2

Gerald B Folland. A course in abstract harmonic analysis. CRC press, 2016. 3

Kunihiko Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 1980. 2

Nate Gruver, Marc Anton Finzi, Micah Goldblum, and Andrew Gordon Wilson. The lie derivative for measuring learned equivariance. In Proc. ICLR, 2023. 1

Jason Hartford, Devon Graham, Kevin Leyton-Brown, and Siamak Ravanbakhsh. Deep models of interactions across sets. In Proc. ICML, 2018. 2

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. CVPR, 2016. 1

I Martin Isaacs. Algebra: a graduate course, volume 100. American Mathematical Soc., 2009. 4

Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Alias-free generative adversarial networks. In Proc. NeurIPS, 2021. 2

Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution in neural networks to the action of compact groups. In Proc. ICML, 2018. 28

Dieter Kraft. A software package for sequential quadratic programming. Tech. Rep. DFVLR-FB 88-28, 1988. 8

Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.

Yann LeCun, Patrick Haffner, Léon Bottou, and Yoshua Bengio. Object recognition with gradient-based learning. In Shape, contour and grouping in computer vision, 1999. 2

Yi-Lun Liao and Tess Smidt. Equiformer: Equivariant graph attention transformer for 3d atomistic graphs. In Proc. ICLR, 2023. 2

Iou-Jen Liu, Raymond A Yeh, and Alexander G Schwing. PIC: permutation invariant critic for multi-agent deep reinforcement learning. In Proc. CORL, 2020. 2

Iou-Jen Liu, Zhongzheng Ren, Raymond A Yeh, and Alexander G Schwing. Semantic tracklets: An object-centric representation for visual multi-agent reinforcement learning. In Proc. IROS, 2021a. 2

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proc. ICCV, 2021b. 1

Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A ConvNet for the 2020s. In Proc. CVPR, 2022. 1

Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. Invariant and equivariant graph networks. In Proc. ICLR, 2019. 2

Jason D McEwen, Martin Büttner, Boris Leistedt, Hiranya V Peiris, and Yves Wiaux. A novel sampling theorem on the rotation group. IEEE SPL, 2015. 2

Christopher Morris, Gaurav Rattan, Sandra Kiefer, and Siamak Ravanbakhsh. SpeqNets: Sparsity-aware permutation-equivariant graph networks. In Proc. ICML, 2022. 2

S Chandra Mouli and Bruno Ribeiro. Neural networks for learning counterfactual G-invariances from single environments. In Proc. ICLR, 2021. 7, 17

David Mumford, John Fogarty, and Frances Kirwan. Geometric invariant theory. Springer Science & Business Media, 1994. 17

Antonio Napolitano and Chad M Spooner. Cyclic spectral analysis of continuous-phase modulated signals. IEEE Transactions on Signal Processing, 2001. 2

Fritz Peter and Hermann Weyl. The completeness of the primitive representations of a closed continuous group. Mathematical Annals, 1927. 3

Md Ashiqur Rahman and Raymond A Yeh. Truly scale-equivariant deep nets with Fourier layers. In Proc. NeurIPS, 2024. 2

Md Ashiqur Rahman, Robert Joseph George, Mogab Elleithy, Daniel Leibovici, Zongyi Li, Boris Bonev, Colin White, Julius Berner, Raymond A. Yeh, Jean Kossaifi, Kamyar Azizzadenesheli, and Anima Anandkumar. Pretraining codomain attention neural operators for solving multiphysics PDEs. In Proc. NeurIPS, 2024. 2

Siamak Ravanbakhsh, Jeff Schneider, and Barnabas Poczos. Deep learning with sets and point clouds. In Proc. ICLR workshop, 2017. 2

Renan A Rojas-Gomez, Teck-Yian Lim, Alex Schwing, Minh Do, and Raymond A Yeh. Learnable polyphase sampling for shift invariant and equivariant convolutional networks. In Proc. NeurIPS, 2022. 1

Renan A Rojas-Gomez, Teck-Yian Lim, Minh N Do, and Raymond A Yeh. Making vision transformers truly shift-equivariant. In Proc. CVPR, 2024. 1

David W. Romero and Jean-Baptiste Cordonnier. Group equivariant stand-alone self-attention for vision. In Proc. ICLR, 2021. 1

Claude Elwood Shannon. Communication in the presence of noise. Proc. IRE, 1949. 3

David I Shuman, Sunil K Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 2013. 8

Radomir S Stankovic, Claudio Moraga, and Jaakko Astola. Fourier analysis on finite groups with applications in signal processing and system design. John Wiley & Sons, 2005. 3

Kai Sheng Tai, Peter Bailis, and Gregory Valiant. Equivariant transformer networks. In Proc. ICML, 2019. 1

PP Vaidyanathan and Ahmet Kirac. Cyclic LTI systems in digital signal processing. IEEE transactions on signal processing, 1999. 2

Cristina Vasconcelos, Hugo Larochelle, Vincent Dumoulin, Rob Romijnders, Nicolas Le Roux, and Ross Goroshin. Impact of aliasing on generalization in deep convolutional networks. In Proc. ICCV, 2021. 1

Martin Vetterli, Jelena Kovačević, and Vivek K Goyal. Foundations of signal processing. Cambridge University Press, 2014. 1, 2, 6

Dian Wang, Xupeng Zhu, Jung Yeon Park, Mingxi Jia, Guanang Su, Robert Platt, and Robin Walters. A general theory of correct, incorrect, and extrinsic equivariance. In Proc. NeurIPS, 2023. 9

Maurice Weiler and Gabriele Cesa. General E(2)-equivariant steerable CNNs. In Proc. NeurIPS, 2019. 28, 29

Maurice Weiler, Fred A Hamprecht, and Martin Storath. Learning steerable filters for rotation equivariant CNNs. In Proc. CVPR, 2018. 9, 30

Daniel Worrall and Max Welling. Deep scale-spaces: Equivariance over scale. In Proc. NeurIPS, 2019. 2

Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, and Lei Zhang. CvT: Introducing convolutions to vision transformers. In Proc. CVPR, 2021. 1

Jin Xu, Hyunjik Kim, Thomas Rainforth, and Yee Teh. Group equivariant subsampling. In Proc. NeurIPS, 2021. 1, 2, 10, 17, 30

Renjun Xu, Kaifan Yang, Ke Liu, and Fengxiang He. E(2)-equivariant vision transformer. In Proc. UAI, 2023. 1

Chiao-An Yang, Ziwei Liu, and Raymond A Yeh. Deep nets with subsampling layers unwittingly discard useful activations at test-time. In Proc. ECCV, 2024. 1

Raymond A Yeh, Yuan-Ting Hu, and Alexander Schwing. Chirality nets for human pose regression. In Proc. NeurIPS, 2019a. 2

Raymond A Yeh, Alexander G Schwing, Jonathan Huang, and Kevin Murphy. Diverse generation for multi-agent sports games. In Proc. CVPR, 2019b. 2

Raymond A Yeh, Yuan-Ting Hu, Mark Hasegawa-Johnson, and Alexander Schwing. Equivariance discovery by learned parameter-sharing. In Proc. AISTATS, 2022. 2

Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan R Salakhutdinov, and Alexander J Smola. Deep sets. In Proc. NeurIPS, 2017. 2

Aston Zhang, Zachary C Lipton, Mu Li, and Alexander J Smola. Dive into deep learning. Cambridge University Press, 2023. 1

Richard Zhang. Making convolutional networks shift-invariant again. In Proc. ICML, 2019. 1, 30

Zhendong Zhang. Frequency pooling: Shift-equivalent and anti-aliasing downsampling. arXiv preprint arXiv:2109.11839, 2021. 2

Xueyan Zou, Fanyi Xiao, Zhiding Yu, and Yong Jae Lee. Delving deeper into anti-aliasing in ConvNets. In Proc. BMVC, 2020. 1

The appendix is organized as follows:

In A1, we present a review on group theory.

In A2, we present the results for incorporating the equivariant index selection operation with our proposed downsampling technique.

In A3, we provide the complete proofs of the Lemmas and Claims in the main paper.

In A4, we provide the illustration of the implications of Claim 1.

In A5, we provide a further generalization of the approach and how to check whether a given group satisfies our theoretical assumptions.

In A7, we document additional implementation details. Code is also provided in the supplemental materials.

A1 GROUP THEORY PRELIMINARIES

A group is a set $G$ equipped with a binary operation $\cdot$ that satisfies the following properties:

Closure: $\forall a, b \in G$, $a \cdot b \in G$.

Associativity: $\forall a, b, c \in G$, $a \cdot (b \cdot c) = (a \cdot b) \cdot c$.

Existence of Identity: $\exists e \in G : \forall a \in G$, $e \cdot a = a$.

Existence of Inverse: $\forall a \in G$, $\exists a^{-1} \in G : a \cdot a^{-1} = e$.

$H$ is a subgroup of $G$ if $H \subseteq G$ and $H$ satisfies all the group properties. The cardinality of the set $G$ is known as the order of the group, and groups of finite order are called finite groups. For any subgroup $H \le G$, the left coset generated by an element $g \in G$ is denoted as $gH = \{gh : h \in H\}$, and the right coset is denoted as $Hg = \{hg : h \in H\}$.

Discrete Rotation Group ($C_n$). The discrete rotation group $C_n = \langle r \mid r^n = e \rangle$ is a cyclic group representing rotations by integer multiples of $360^\circ/n$.

Dihedral Group. The dihedral group $D_{2n} = \langle s, r \mid s^2 = r^n = (sr)^2 = e \rangle$ is the group of symmetries of a regular $n$-sided polygon, with rotations by integer multiples of $360^\circ/n$ and horizontal reflection.

General Linear Group. The general linear group $GL(n, F)$ is the group of all invertible $n \times n$ matrices with entries from the field $F$. $GL(V)$ denotes the general linear group on the vector space $V$.

Minimal Generation Set. In general a group G can have multiple minimal generating sets. For example, for any cyclic group CN generated by the minimal generating set S = {r}, S = {rb} is also a minimal generating set when b and N are relatively prime.
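The statement about generators of a cyclic group can be checked numerically. The sketch below represents elements of the cyclic group of order N by their exponents modulo N and tests which powers of r generate the whole group; the helper names `generated_subgroup` and `is_generator` are hypothetical.

```python
from math import gcd

def generated_subgroup(b, n):
    """Exponents k such that r^k lies in the subgroup of C_n generated by r^b."""
    seen, x = set(), 0
    while x not in seen:
        seen.add(x)
        x = (x + b) % n  # multiply by r^b, i.e. add b to the exponent
    return seen

def is_generator(b, n):
    """True if {r^b} generates all of C_n."""
    return len(generated_subgroup(b, n)) == n

# For C_12, r^b is a generator exactly when gcd(b, 12) == 1.
generators_of_c12 = [b for b in range(1, 12) if is_generator(b, 12)]
coprime_to_12 = [b for b in range(1, 12) if gcd(b, 12) == 1]
```

Both lists coincide for this choice of N, in line with the relatively-prime condition stated above.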

A1.2 GROUP REPRESENTATION

Linear Group Representation. A linear group representation of a group $G$ on a vector space $U$ is a homomorphism $\rho$ from $G$ to the general linear group $GL(U)$. This can be written as:

$$\rho : G \to GL(U), \tag{A19}$$

where for each $g \in G$, $\rho(g)$ is an invertible linear transformation on $U$.

The map $\rho$ must satisfy the following properties:

$\rho(gh) = \rho(g)\rho(h)$ for all $g, h \in G$.

$\rho(e) = I_U$, where $I_U$ denotes the identity transformation on $U$.

$\rho(g)$ is an invertible linear transformation.

The dimensionality of a representation $\rho$ is equal to the dimensionality of $U$ and is written as $d_\rho$.

The trivial representation of a group $G$ on a vector space $U$ is a representation $\rho$ such that $\rho(g) = I_U$ for all $g \in G$. In other words, every group element acts as the identity transformation on the vector space $U$.

The regular representation of a finite group $G$ acts on the vector space $F^{|G|}$, whose basis is indexed by the elements of $G$, by permuting these basis elements according to the group operation.
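To make the regular representation concrete, here is a small sketch for the cyclic group of order n, where group elements are identified with integers modulo n. The function name `regular_rep_cyclic` is hypothetical; the check confirms that the permutation matrices satisfy the homomorphism property of a representation.

```python
import numpy as np

def regular_rep_cyclic(n):
    """Permutation matrices rho(g) with rho(g) e_h = e_{(g + h) mod n}."""
    eye = np.eye(n)
    # Rolling the rows of the identity down by g sends basis vector e_h to e_{g+h}.
    return [np.roll(eye, g, axis=0) for g in range(n)]

rho = regular_rep_cyclic(6)

# Homomorphism property: rho(g) rho(h) = rho(g h) for all g, h.
is_homomorphism = all(
    np.array_equal(rho[g] @ rho[h], rho[(g + h) % 6])
    for g in range(6) for h in range(6)
)
```

The identity element maps to the identity matrix, and every `rho[g]` is a permutation matrix, hence invertible.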

Equivalent Representation. Two representations $\rho_1$ and $\rho_2$ are equivalent iff $\rho_1 = T \rho_2 T^{-1}$ for some change of basis $T \in GL(U)$.

Direct Sum of Representations. Let $\rho_1 : G \to GL(U_1)$ and $\rho_2 : G \to GL(U_2)$ be two representations of a group $G$ on vector spaces $U_1$ and $U_2$ over the field $F$. The direct sum of $\rho_1$ and $\rho_2$ is a representation $\rho_1 \oplus \rho_2 : G \to GL(U_1 \oplus U_2)$ defined by:

$$(\rho_1 \oplus \rho_2)(g) = \rho_1(g) \oplus \rho_2(g) \quad \text{for all } g \in G,$$

where $U_1 \oplus U_2$ is the direct sum of $U_1$ and $U_2$, and the action on $U_1 \oplus U_2$ is given by:

$$[\rho_1 \oplus \rho_2](g)[u_1, u_2] = [\rho_1(g)u_1, \rho_2(g)u_2]$$

for all $u_1 \in U_1$ and $u_2 \in U_2$.

Irreducible Representation. A representation $\rho : G \to GL(U)$ of a group $G$ on a vector space $U$ over a field $F$ is called irreducible if the only $G$-invariant subspaces of $U$ are the trivial subspace $\{0\}$ and $U$.

The set of irreducible representations (irreps) of $G$ is denoted as $\hat{G}$. All the irreps of abelian groups are 1-dimensional. The irreps of dihedral groups are 1- and 2-dimensional, and the irreps of the symmetric group $S_4$ are 1-, 2-, and 3-dimensional.

Orthogonality Relation and Fourier Transform. Let $\varphi$ be an irreducible unitary representation of $G$ of degree $d_\varphi$. Then the $d_\varphi^2$ functions $\{\sqrt{d_\varphi}\,\varphi^{mn} \mid 1 \le m, n \le d_\varphi\}$ form an orthonormal set.

Let $G$ be a finite group and let $\hat{G} = \{\varphi_1, \ldots, \varphi_s\}$ be a complete set of irreducible representations of $G$. Then the functions

$$\left\{\sqrt{d_{\varphi_i}}\,\varphi_i^{mn} \mid \varphi_i \in \hat{G},\ 1 \le m, n \le d_{\varphi_i}\right\}$$

form an orthonormal set in the space of complex-valued functions over the group, $L(G)$, and $d_{\varphi_1}^2 + \cdots + d_{\varphi_s}^2 = |G|$. In fact, this orthonormal set defines the Fourier basis for functions on the group $G$. The Fourier transform of a square-integrable function $f \in L^2(G)$ is

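$$\hat{f}(\varphi_i^{mn}) = \frac{1}{|G|} \sum_{g \in G} \sqrt{d_{\varphi_i}}\, f(g)\, \varphi_i^{mn}(g) \quad \forall \varphi_i \in \hat{G} \text{ and } 1 \le m, n \le d_{\varphi_i}, \tag{A20}$$

where $\varphi_i^{mn}(g)$ denotes the entry at the $m$th row and $n$th column of the matrix $\varphi_i(g)$, and $\hat{f}(\varphi_i^{mn})$ denotes the Fourier coefficient corresponding to the irrep component $\varphi_i^{mn}$. Similarly, the inverse Fourier transform on a group can be expressed as

$$f(g) = \sum_{\varphi_i \in \hat{G}} \sum_{m,n=1}^{d_{\varphi_i}} \hat{f}(\varphi_i^{mn}) \sqrt{d_{\varphi_i}}\, \varphi_i^{mn}(g). \tag{A21}$$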

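If, for any $g \in G$, the action on $f \in L^2(G)$ is defined as $[g \cdot f](u) = f(g^{-1}u)$, then the action can be represented in Fourier space as

$$\widehat{g \cdot f}(\varphi_i) = \varphi_i(g)\, \hat{f}(\varphi_i), \tag{A22}$$

where $\varphi_i(g), \hat{f}(\varphi_i), \widehat{g \cdot f}(\varphi_i) \in \mathbb{C}^{d_{\varphi_i} \times d_{\varphi_i}}$.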
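For a cyclic group, every irrep is one-dimensional and the group Fourier transform reduces to a scaled discrete Fourier transform. The sketch below is an illustration under assumptions chosen here, not the paper's implementation: elements of the cyclic group of order N are integers modulo N, the characters are labeled as exp(-2πi k g / N) so that the forward transform agrees with NumPy's FFT sign convention, and the names `cyclic_fourier` and `act` are hypothetical. It checks numerically that the group action becomes a multiplication by the character value in Fourier space, as in Eq. (A22).

```python
import numpy as np

def cyclic_fourier(f):
    """Fourier coefficients (1/N) * sum_g f(g) phi_k(g) with phi_k(g) = exp(-2j*pi*k*g/N)."""
    f = np.asarray(f, dtype=complex)
    n = f.shape[0]
    idx = np.arange(n)
    phi = np.exp(-2j * np.pi * np.outer(idx, idx) / n)  # phi[k, g] = phi_k(g)
    return phi @ f / n

def act(g, f):
    """Group action [g . f](u) = f(g^{-1} u), i.e. f((u - g) mod N) on a cyclic group."""
    return np.roll(f, g)

N = 8
f = np.random.default_rng(0).normal(size=N)
f_hat = cyclic_fourier(f)

# Shifting the signal by g multiplies the k-th coefficient by phi_k(g).
k = np.arange(N)
g = 3
lhs = cyclic_fourier(act(g, f))
rhs = np.exp(-2j * np.pi * k * g / N) * f_hat
```

With these conventions `lhs` and `rhs` agree up to floating-point error, and `f_hat` equals `np.fft.fft(f) / N`.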

In the real case, irreps can have redundant columns. To eliminate redundancy, an endomorphism basis Cψi is constructed to span the non-redundant columns of an irrep ψi. The full irrep can be recovered by multiplying these columns with elements of Cψi. The reverse of this process can also be constructed, giving us the non-redundant columns (see Cesa et al. (2021) for details).

A1.3 INVARIANT AND EQUIVARIANT MAPS

A linear map $W \in \mathbb{R}^{n' \times n}$ is equivariant with respect to the group actions $\rho_U : G \to GL(U)$ and $\rho_{U'} : G \to GL(U')$ with $U \subseteq \mathbb{R}^{n}$ and $U' \subseteq \mathbb{R}^{n'}$ if

$$W \rho_U(g) v = \rho_{U'}(g) W v \quad \forall g \in G,\ v \in U. \tag{A23}$$

This imposes the following restriction on the linear map:

$$\left(\rho_{U'}(g) \otimes \rho_U(g^{-1})^{\top}\right) \mathrm{vec}(W) = \mathrm{vec}(W) \quad \forall g \in G, \tag{A24}$$

where vec denotes the vectorization operation converting a matrix to a vector. The condition in Eq. (A24) states that vec(W) should be invariant to the action of the tensor product representation $\rho_{U' \otimes U}(g) = \rho_{U'}(g) \otimes \rho_U(g^{-1})^{\top}$ on the left. For a finite group $G$, the Reynolds operator (Mouli & Ribeiro, 2021; Mumford et al., 1994) is defined as

$$\frac{1}{|G|} \sum_{g \in G} \rho(g), \tag{A25}$$

which is a $G$-invariant linear map with respect to the representation $\rho : G \to GL(X)$ on the vector space $X$. To satisfy the condition in Eq. (A24), vec(W) must belong to the 1-eigenspace of the Reynolds operator with respect to the tensor product representation $\rho_{U' \otimes U}$ acting on the vector space $U' \otimes U$ (Mouli & Ribeiro, 2021).
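The role of the Reynolds operator can be illustrated with a short numerical sketch. It assumes the cyclic group C4 acting on both the input and output spaces by its regular (cyclic-shift) representation, and a row-major vectorization of W, for which the constraint takes the Kronecker form used below. The names `shift_rep` and `reynolds_project` are hypothetical.

```python
import numpy as np

def shift_rep(n):
    """Regular representation of C_n as cyclic-shift permutation matrices."""
    base = np.roll(np.eye(n), 1, axis=0)  # matrix of the generator r
    return [np.linalg.matrix_power(base, g) for g in range(n)]

def reynolds_project(W, rho_in, rho_out):
    """Apply the group average of rho_out(g) kron rho_in(g^-1)^T to the row-major vec(W)."""
    n_out, n_in = W.shape
    op = np.zeros((n_out * n_in, n_out * n_in))
    for r_in, r_out in zip(rho_in, rho_out):
        op += np.kron(r_out, np.linalg.inv(r_in).T)
    op /= len(rho_in)
    # NumPy flattens in row-major order, matching the Kronecker form above.
    return (op @ W.reshape(-1)).reshape(n_out, n_in)

rho = shift_rep(4)
W = np.random.default_rng(0).normal(size=(4, 4))
W_eq = reynolds_project(W, rho, rho)

# The projected map commutes with the group action; a generic W does not.
commutes = all(np.allclose(W_eq @ r, r @ W_eq) for r in rho)
```

Projecting a second time leaves `W_eq` unchanged, which reflects that the averaged operator is a projection onto its 1-eigenspace.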

A1.4 ILLUSTRATION OF CHALLENGES IN SUBGROUP SUBSAMPLING

To illustrate the challenges in subgroup subsampling, we present an example by subsampling the group

$$D_6 = \{e, r, r^2, s, sr, sr^2\}, \tag{A26}$$

which is a relatively small group of size 6, by a factor of 2, i.e., we aim to discard every other element.

Discarding every other element of the set generates the subset $H = \{e, r^2, sr\}$. We can see that $H$ is not a subgroup, as the inverse of $r^2$ is missing from the subset; thus, it violates the properties of a group.

Additionally, in the above example, we first assumed an ordering of the elements of $D_6$. However, we can also choose another ordering of the elements, such as

$$D_6 = \{r, e, r^2, s, sr, sr^2\}, \tag{A27}$$

resulting in a different subset after subsampling. In this example, even with a small group, the set can be arranged in $6! = 720$ different ways, creating ambiguity in the process. So, the traditional subsampling operation of naively discarding elements does not apply to groups.
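The failure of naive subsampling in this example can be verified directly. The sketch below encodes an element s^a r^b of the dihedral group as a pair (a, b) and multiplies using the relation r s = s r^{-1}, which follows from (sr)^2 = e. The encoding and the helper names `dihedral_mul` and `is_subgroup` are choices made for this illustration.

```python
def dihedral_mul(x, y, n):
    """Multiply s^a r^b by s^c r^d in the dihedral group of order 2n, using r s = s r^{-1}."""
    (a, b), (c, d) = x, y
    return ((a + c) % 2, ((-b if c else b) + d) % n)

def is_subgroup(subset, n):
    """A non-empty finite subset of a group is a subgroup iff it is closed under multiplication."""
    subset = set(subset)
    return bool(subset) and all(
        dihedral_mul(x, y, n) in subset for x in subset for y in subset
    )

# D6 (order 6, n = 3) in the ordering {e, r, r^2, s, sr, sr^2}.
e, r, r2, s, sr, sr2 = (0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)
ordered = [e, r, r2, s, sr, sr2]

naive = ordered[::2]                       # keep every other element: {e, r^2, sr}
naive_is_subgroup = is_subgroup(naive, 3)  # False: r^2 . r^2 = r is missing
rotations_are_subgroup = is_subgroup([e, r, r2], 3)
```

The naive subset fails the closure test, while the rotation subset {e, r, r^2} passes it.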

A2 ADDITIONAL EXPERIMENTS

A2.1 EQUIVARIANT SUBSAMPLING

Xu et al. (2021) propose to select indexes in a consistent manner that respects a specialized equivariance (see Xu et al. (2021), Lemma 2.1) that can work between groups and subgroups, which is equivalent to Chaman & Dokmanic (2021) in traditional subsampling. Their work assumes that the subgroup is already provided and does not perform any anti-aliasing. However, the equivariant index selection scheme can be incorporated with our proposed subgroup selection and anti-aliasing operator. In Tab. A1, we provide the results on rotated MNIST (SO(2) symmetry), where we incorporate equivariant index selection with our proposed subgroup selection and anti-aliasing operator. We observe that the proposed anti-aliasing operator consistently reduces the equivariance error and improves accuracy. This also demonstrates the wide applicability of our proposed method.

Table A1: Performance of G-equivariant models on Rotated MNIST at different subsampling rates and with/without anti-aliasing filter PM with the equivariant index selection. We can observe that our proposed technique improves performance and equivariance error, showing wide adaptability.

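| Sub. R. | # Param. (×10^3) | P_M | Acc_no aug | Acc_loc | Acc_orbit | L_equi |
|---|---|---|---|---|---|---|
| 2 | 194.09 | | 0.9733 | 0.8147 | 0.8225 | 0.0545 |
| 2 | 194.09 | | 0.9782 | 0.8288 | 0.8297 | 0.0473 |
| 3 | 151.08 | | 0.9692 | 0.7717 | 0.7865 | 0.1061 |
| 3 | 151.08 | | 0.9650 | 0.7737 | 0.7833 | 0.0594 |
| 4 | 129.57 | | 0.9656 | 0.6606 | 0.5602 | 0.0881 |
| 4 | 129.57 | | 0.9703 | 0.6928 | 0.5759 | 0.0761 |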

A2.2 RESULTS ON STL-10

We conduct experiments on the STL-10 dataset (Coates et al., 2011). This dataset comprises images with a resolution of 96 × 96 from 10 different object classes. We train on 5000 training images without any data augmentation. In Tab. A2 and Tab. A3, we present the results on the rotation and roto-reflection symmetry groups, respectively, at different sampling rates. We observe that our proposed method achieves higher accuracy and lower equivariance error compared to naive subsampling. We also observe that, since the images in STL-10 have a much higher resolution than those in MNIST and CIFAR-10 and contain more high-frequency details, our anti-aliasing operator provides a larger performance improvement.

Table A2: Performance of G-equivariant models on the STL-10 dataset at different sampling rates R and with/without anti-aliasing filter PM under the rotation (SO(2)) symmetry. Sub-group subsampling with anti-aliasing improves both equivariance and accuracy.

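| Initial Group | Sub. R. | P_M | # Params | Acc_no aug | Acc_loc | Acc_orbit | L_equi |
|---|---|---|---|---|---|---|---|
| C24 | - | - | 1.3M | 0.54 | 0.34 | 0.30 | 0.16 |
| C24 | 2 | | 962K | 0.60 | 0.42 | 0.37 | 0.16 |
| C24 | 2 | | 962K | 0.60 | 0.40 | 0.35 | 0.17 |
| C24 | 3 | | 831K | 0.62 | 0.42 | 0.37 | 0.16 |
| C24 | 3 | | 831K | 0.60 | 0.38 | 0.34 | 0.18 |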

Table A3: Performance of G-equivariant models on STL-10 dataset at different sampling rates R and with/without anti-aliasing filter PM under the roto-reflection (O(2)) symmetry. Sub-group subsampling with anti-aliasing improves both equivariance and accuracy.

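| Initial Group | Sub. R. | P_M | # Params | Acc_no aug | Acc_loc | Acc_orbit | L_equi |
|---|---|---|---|---|---|---|---|
| D24 | - | - | 1.3M | 0.57 | 0.37 | 0.32 | 0.12 |
| D24 | 2 | | 962K | 0.64 | 0.40 | 0.27 | 0.19 |
| D24 | 2 | | 962K | 0.61 | 0.40 | 0.26 | 0.20 |
| D24 | 3 | | 831K | 0.64 | 0.44 | 0.39 | 0.17 |
| D24 | 3 | | 831K | 0.60 | 0.33 | 0.33 | 0.17 |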

A2.3 EQUIVARIANCE ERROR PROPAGATION

To further study the effect of the anti-aliasing operator, we visualize the propagation of the equivariance error through the model on the STL-10 dataset. The latent functions in a G-CNN are defined on $\mathbb{Z}^2 \rtimes G$, where $G$ is the rotation or dihedral group. To visualize the equivariance error, we pool the latent function over the group $G$ following the method in Cohen & Welling (2016), ensuring equivariance with respect to the group action on the input, $\rho_g$. For an input image $I$, layer $k$, and equivariant G-CNN $\mathcal{G}$, we define the equivariance error at each spatial location $(i, j)$ as

$$E^k_{eq}[i, j] = \frac{\left\| \mathrm{Pool}_g\, \mathcal{G}^k(I)[i, j] - \left(\rho_g^{-1}\, \mathrm{Pool}_g\, \mathcal{G}^k(\rho_g I)\right)[i, j] \right\|^2}{\left\| \mathrm{Pool}_g\, \mathcal{G}^k(I) \right\|_2^2}, \tag{A28}$$

where $\mathcal{G}^k(I)$ denotes the output of the $k$th hidden layer given the input $I$, and $g \in G$. In other words, $E^k_{eq}[i, j]$ is the equivariance error at pixel $(i, j)$. As can be observed in Figs. A1-A8, our approach with anti-aliasing shows lower equivariance error throughout the features at all three layers. Recall that our experiment uses an architecture with a downsampling operation between each pair of layers. We also observe that the equivariance error indeed propagates and worsens deeper into the models. Finally, we note that perfect equivariance, i.e., zero error, is not achieved because boundary pixels go out of scope in the rotated images.

[Figure A1 panels: input image and rotated image (θ = 15°); equivariance error at Layers 1, 2, and 3, without anti-aliasing and with anti-aliasing (ours).]

Figure A1: Visualization of the equivariance error at each layer.

[Figure A2 panels: input image and rotated image (θ = 30°); equivariance error at Layers 1, 2, and 3, without anti-aliasing and with anti-aliasing (ours).]

Figure A2: Visualization of the equivariance error at each layer.

[Figure A3 panels: input image and rotated image (θ = 15°); equivariance error at Layers 1, 2, and 3, without anti-aliasing and with anti-aliasing (ours).]

Figure A3: Visualization of the equivariance error at each layer.

[Figure A4 panels: input image and rotated image (θ = 30°); equivariance error at Layers 1, 2, and 3, without anti-aliasing and with anti-aliasing (ours).]

Figure A4: Visualization of the equivariance error at each layer.

[Figure A5 panels: input image and rotated image (θ = 15°); equivariance error at Layers 1, 2, and 3, without anti-aliasing and with anti-aliasing (ours).]

Figure A5: Visualization of the equivariance error at each layer.

[Figure A6 panels: input image and rotated image (θ = 30°); equivariance error at Layers 1, 2, and 3, without anti-aliasing and with anti-aliasing (ours).]

Figure A6: Visualization of the equivariance error at each layer.

[Figure A7 panels: input image and rotated image (θ = 15°); equivariance error at Layers 1, 2, and 3, without anti-aliasing and with anti-aliasing (ours).]

Figure A7: Visualization of the equivariance error at each layer.

[Figure A8 panels: input image and rotated image (θ = 30°); equivariance error at Layers 1, 2, and 3, without anti-aliasing and with anti-aliasing.]

Figure A8: Visualization of the equivariance error at each layer.

A3 COMPLETE PROOFS OF LEMMAS AND CLAIMS

A3.1 PROOF OF LEMMA 1

First, we provide the proof of the lemmas.

Lemma 1. For the set $G'$ returned by Alg. 1, $v \in G'$ if and only if $v$ can be expressed as a product of the elements of the set $S' = S \setminus \{s_d\} \cup \{s_d^R\}$.

Proof. In the graph $(V, E')$, there exists a path between $e$ and some node $v \in V$ iff $v \in G'$ (guaranteed by the BFS traversal algorithm). So it is sufficient to prove that, in the graph $(V, E')$, there exists a path between node $e$ and some node $v$ if and only if $v$ can be expressed as a product of elements of $S'$.

By construction, each node in a directed Cayley graph $(V, E) = \mathrm{DiCay}(G, S)$ has an out-degree of $|S|$, with each outgoing edge corresponding to an element of the set $S$. Removing all outgoing edges corresponding to the element $s_d$ and adding a new outgoing edge to each node corresponding to the new element $s_d^R$, i.e., $E' = E \setminus \{(a, a \cdot s_d) : a \in V\} \cup \{(a, a \cdot s_d^R) : a \in V\}$, maintains this property with respect to $S'$. That is, each node in the graph $(V, E')$ has an outgoing edge corresponding to each element of the set $S'$.

Let us assume there exists a path from $e$ to a node $a \in V$. We denote the path as a list of vertices $\{e, (e \cdot s_{a_1}), \ldots, (e \cdot s_{a_1} \cdots s_{a_{m-1}} \cdot s_{a_m})\}$, which is constructed by taking $m$ hops from $e$ along the edges corresponding to the elements $\{s_{a_1}, \ldots, s_{a_{m-1}}, s_{a_m}\}$ in order, where $\forall j,\ s_{a_j} \in S'$. This implies $a = \prod_{j=1}^{m} s_{a_j}$, i.e., $a$ is generated by products of the elements of the set $S'$.

Conversely, let $b = \prod_{i=1}^{n} s_{b_i}$ such that $\forall i,\ s_{b_i} \in S'$. The existence of a path from $e$ to $b$ requires a series of hops from $e$ along the edges corresponding to the elements $s_{b_1}, \ldots, s_{b_{n-1}}, s_{b_n}$. Such a series of hops always exists in the graph $(V, E')$, as every node has $|S'|$ outgoing edges, one for each element of $S'$.
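The construction used in this proof can be sketched in code. Alg. 1 itself is not reproduced in this section, so the following is only an illustration of the described steps: replace the generator s_d by its R-th power and run a BFS from the identity over the resulting Cayley graph. The function names (`bfs_subsample`, `group_power`, `dihedral_mul`) and the pair encoding (a, b) for s^a r^b are hypothetical.

```python
from collections import deque

def dihedral_mul(x, y, n):
    """Multiply s^a r^b by s^c r^d in the dihedral group of order 2n, using r s = s r^{-1}."""
    (a, b), (c, d) = x, y
    return ((a + c) % 2, ((-b if c else b) + d) % n)

def group_power(x, k, mul, identity):
    """Compute x^k by repeated multiplication."""
    out = identity
    for _ in range(k):
        out = mul(out, x)
    return out

def bfs_subsample(identity, generators, d, rate, mul):
    """Nodes reachable from the identity when generator d is replaced by its rate-th power."""
    new_gens = list(generators)
    new_gens[d] = group_power(generators[d], rate, mul, identity)
    reached, queue = {identity}, deque([identity])
    while queue:
        node = queue.popleft()
        for gen in new_gens:
            nxt = mul(node, gen)  # follow the edge (a, a . gen)
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached

# Dihedral group of order 16 (n = 8) with generators s and r; subsample r at rate 2.
mul16 = lambda x, y: dihedral_mul(x, y, 8)
sub = bfs_subsample((0, 0), [(1, 0), (0, 1)], d=1, rate=2, mul=mul16)
```

For this input the reached set has 8 elements, namely all s^a r^b with even b, and it is closed under multiplication, consistent with Lemma 1.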

A3.2 PROOF OF LEMMA 2

Lemma 2. For the set $S'$ in Lemma 1, $s_i \in S' \implies s_i^{-1} \in G'$.

Proof. Let $s_k \in S' \setminus \{s_d^R\}$. Then $s_k^{-1} = s_k^{o_k - 1}$ (as $o_k$ is the order of $s_k$), i.e., $s_k^{-1}$ can be expressed as a product of the elements of $S'$, so by Lemma 1, $s_k^{-1} \in G'$.

Now $(s_d^R)^{-1} = s_d^{-R}$. Let $w = (o_d - 1)$. Then $Rw \bmod o_d \equiv (R o_d - R) \bmod o_d \equiv -R \bmod o_d$, so $s_d^{-R} = s_d^{wR} = (s_d^R)^w$. Following Lemma 1, $(s_d^R)^{-1} \in G'$.
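As a quick sanity check of the modular argument in this proof, consider a worked instance with illustrative values chosen here (they do not come from the paper): order $o_d = 8$ and rate $R = 2$.

```latex
\begin{align*}
  w &= o_d - 1 = 7, \\
  Rw &= 14 \equiv 6 \equiv -2 \pmod{8}, \\
  (s_d^{2})^{7} &= s_d^{14} = s_d^{6} = s_d^{-2} = (s_d^{2})^{-1}, \\
  s_d^{2} \cdot s_d^{6} &= s_d^{8} = e .
\end{align*}
```

So the inverse of the replaced generator is itself a power of that generator, which is why it is reachable by the traversal.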

A3.3 PROOF OF CLAIM 1

Claim 1. If $S_d^k = \{s_d^k : k \in \mathbb{Z}^+ \text{ and } k \bmod R \neq 0\}$ are non-redundant powers of $s_d$, $o_d \bmod R \equiv 0$, and the elements of $S_d^k$ cannot be represented as a product of the elements of the left cosets of the subgroup $G_{sub} = \langle S \setminus \{s_d\} \rangle$ generated by the set $\{s_d^{nR} : n \in \mathbb{Z}_0^+\}$, then Alg. 1 returns a proper subgroup $G' \subset G$.

Proof. We first prove that $G'$ is a group.

Existence of identity. By construction, $e$ is always a member of the set $G'$, as we start traversing the graph from node $e$.

Closure. Let $a, b \in G'$. Therefore, by Lemma 1, $a = \prod_{i=1}^{m} s_{a_i}$ and $b = \prod_{j=1}^{n} s_{b_j}$ with $s_{a_i}, s_{b_j} \in S'$. Now $a \cdot b = (\prod_{i=1}^{m} s_{a_i}) \cdot (\prod_{j=1}^{n} s_{b_j})$, i.e., $a \cdot b$ can also be expressed as a product of elements of $S'$. So, by Lemma 1, $a \cdot b \in G'$.

Associativity. As $G' \subseteq G$, the elements of $G'$ follow the multiplication table of the group $G$. So, associativity of the operation holds trivially for the elements of $G'$.

Existence of inverse element. Let $v \in G'$ and $v = \prod_{i=1}^{n} s_{v_i} = s_{v_1} \cdot s_{v_2} \cdots s_{v_n}$. Now we construct a group element $u$ as $u = s_{v_n}^{-1} \cdot s_{v_{n-1}}^{-1} \cdots s_{v_1}^{-1}$. We can see that $v \cdot u = u \cdot v = e$. So, $u = v^{-1}$. By Lemma 2, $\forall i\; s_{v_i}^{-1} \in G'$, and following the closure property, $u \in G'$, i.e., $G'$ is a group.

Now, we prove that $G' \subset G$ by contradiction. We assume that $\exists\, s_d^{k_i} \in S_d^k$ such that $s_d^{k_i} \in G'$.

As the elements of $S_d^k$ are non-redundant, $s_d^{k_i}$ cannot be generated only by the generators $S'' = S' \setminus \{s_d^R\}$. Additionally, as $k_i \bmod R \neq 0$ and $o_d = wR$ for some $w \in \mathbb{Z}$ (since $R$ divides $o_d$), $\nexists\, l \in \mathbb{Z} : lR \bmod o_d \equiv k_i$. So, $\nexists\, l \in \mathbb{Z}$ such that $(s_d^R)^l = s_d^{k_i}$. Therefore, the generators for the element $s_d^{k_i}$ must include $s_d^R$ and elements from the set $S''$.

Without loss of generality, assume that the path from $e$ to $s_d^{k_i}$ is the shortest among the elements of $S_d^k \cap G'$. Let $s_d^{k_i} = s_{k_1} \cdot s_{k_2} \cdots s_{k_{n-1}} \cdot s_{k_n}$ such that $\forall i\; s_{k_i} \in S'$. Now, $s_{k_1}$ cannot be $s_d^R$, as

$$s_d^{k_i} = s_d^R \cdot s_{k_2} \cdots s_{k_{n-1}} \cdot s_{k_n} \implies s_d^{k_i - R} = s_{k_2} \cdots s_{k_{n-1}} \cdot s_{k_n},$$

where $(k_i - R) \bmod R \neq 0$ as $k_i \bmod R \neq 0$. But then $s_d^{k_i - R} \in S_d^k$ requires one less generator, contradicting our assumption that the path from $e$ to $s_d^{k_i}$ is the shortest among the elements of $S_d^k \cap G'$.

A similar restriction also applies to $s_{k_n}$. So, the path from $e$ to $s_d^{k_i}$ must start and end with generators from the set $S''$. Therefore, we can express $s_d^{k_i}$ as

$$s_d^{k_i} = q_1 \cdot (s_d^{R n_2} \cdot q_2) \cdot (s_d^{R n_3} \cdot q_3) \cdots (s_d^{R n_l} \cdot q_l) \tag{A29}$$

where $\forall j\; q_j \in G_{sub}$, with $G_{sub}$ the subgroup generated by $S''$, and $\forall i\; n_i \in \mathbb{Z}$.

Next, $s_d^{R n_m} \cdot q_m$ for $2 \le m \le l$ is an element of the left coset of the subgroup $G_{sub}$ generated by the element $s_d^{R n_m}$, i.e., $s_d^{R n_m} \cdot q_m \in \{s_d^{R n_m} \cdot g : g \in G_{sub}\}$, and $q_1$ is an element of the trivial left coset of $G_{sub}$ generated by $e$. Therefore, $s_d^{k_i}$ is expressed as a product of elements of the left cosets of $G_{sub}$ generated by the set $\{s_d^{nR} : n \in \mathbb{Z}_0^+\}$, which contradicts our assumption.

This means that $s_d^{k_i} \notin G'\ \forall s_d^{k_i} \in S_d^k$, which implies that $G' \subset G$.
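The group axioms established in the proof can be verified by brute force on a small example. The sketch below uses the dihedral group of order 8 with elements $r^k s^f$ encoded as pairs `(k, f)`; this encoding and the helpers `generated` and `is_group` are assumptions made for illustration. It contrasts a rate that divides the order of the generator ($R = 2$) with one that does not ($R = 3$).

```python
from itertools import product

N = 4  # order of the rotation r, so the dihedral group has 8 elements
E = (0, 0)  # identity element


def mul(a, b):
    """Multiply r^k1 s^f1 by r^k2 s^f2 using s r = r^-1 s."""
    (k1, f1), (k2, f2) = a, b
    return ((k1 + (-k2 if f1 else k2)) % N, f1 ^ f2)


def generated(generators):
    """All finite products of the generators (the set reachable from e)."""
    seen, frontier = {E}, [E]
    while frontier:
        a = frontier.pop()
        for g in generators:
            b = mul(a, g)
            if b not in seen:
                seen.add(b)
                frontier.append(b)
    return seen


def is_group(H):
    """Brute-force check of identity, closure, and inverses within H."""
    has_identity = E in H
    closed = all(mul(a, b) in H for a, b in product(H, repeat=2))
    has_inverses = all(any(mul(a, b) == E for b in H) for a in H)
    return has_identity and closed and has_inverses


r, s = (1, 0), (0, 1)
G = generated([r, s])
G_rate2 = generated([s, mul(r, r)])          # R = 2, o_d mod R = 0
G_rate3 = generated([s, mul(mul(r, r), r)])  # R = 3, o_d mod R != 0
print(len(G), len(G_rate2), len(G_rate3))
```

In this example the set obtained with $R = 2$ is a proper subgroup with four elements, whereas $R = 3$ regenerates the whole group, which is consistent with the requirement that $R$ divides $o_d$.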

A3.4 PROOF OF CLAIM 2

Claim 2. Subgroup Sampling Theorem. For any signal $x$ on $G$, if the Fourier coefficients $\hat{x}$ are in the 1-eigenspace of $M (M^\dagger M)^{-1} M^\dagger$, then it can be reconstructed perfectly from the subsampled signal $x^\downarrow$ on $G'$. The superscript $\dagger$ denotes the conjugate transpose.

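Proof. First, we show that if $\hat{x}$ is in the 1-eigenspace of $M (M^\dagger M)^{-1} M^\dagger$, then $x \in \mathrm{Span}(B)$ with $B \triangleq F_G^{-1} M$. If $\hat{x}$ is in the 1-eigenspace, then

$$\hat{x} = M (M^\dagger M)^{-1} M^\dagger \hat{x} \tag{A30}$$

$$\implies F_G x = F_G F_G^{-1} M (M^\dagger M)^{-1} M^\dagger F_G x \quad (\text{as } F_G F_G^{-1} = I) \tag{A31}$$

$$\implies x = F_G^{-1} M (M^\dagger M)^{-1} M^\dagger F_G x \tag{A32}$$

$$\implies x = F_G^{-1} M \big(M^\dagger (F_G^{-1})^\dagger F_G^{-1} M\big)^{-1} M^\dagger (F_G^{-1})^\dagger x \quad (\text{as } (F_G^{-1})^\dagger = F_G) \tag{A33}$$

$$\implies x = F_G^{-1} M \big((F_G^{-1} M)^\dagger F_G^{-1} M\big)^{-1} (F_G^{-1} M)^\dagger x \tag{A34}$$

$$\implies x = B (B^\dagger B)^{-1} B^\dagger x \tag{A35}$$

$$\implies x = P_M x \tag{A36}$$

Here, $P_M \triangleq B (B^\dagger B)^{-1} B^\dagger$ denotes the projection matrix onto the column space of $B$. Note that the columns of $B$ are linearly independent: as $F_{G'}^{-1} = SB$ and $\mathrm{rank}(F_{G'}^{-1}) = |G'|$, the rank of $B$ is at least $|G'|$. As $B$ has $|G'|$ columns, they are independent and $\mathrm{rank}(B) = |G'|$, so $P_M$ is a valid projection matrix.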

This means that $x$ is in $\mathrm{Span}(B)$, i.e., we can express $x = B \hat{x}_c$ for some coefficient vector $\hat{x}_c$. Perfect reconstruction from the subsampled signal $x^\downarrow = Sx$ is now possible, i.e.,

$$\mathcal{I} x^\downarrow = (B F_{G'})(S x) = (B F_{G'} S)(B \hat{x}_c) = B F_{G'} F_{G'}^{-1} \hat{x}_c = B \hat{x}_c = x. \tag{A37}$$

In conclusion, perfect reconstruction of $x$ is possible from $x^\downarrow$ when $\hat{x}$ is in the 1-eigenspace of $M (M^\dagger M)^{-1} M^\dagger$.
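The reconstruction identity in Eq. (A37) can be checked numerically in the simplest setting of a cyclic group. The sketch below assumes $G = C_8$, $G' = C_4$ (every second element), unitary DFT matrices for $F_G$ and $F_{G'}$, and a hand-picked matrix $M$ that keeps the frequencies $\{0, 1, 2, 7\}$ with a scaling of $\sqrt{R}$ so that $SB = F_{G'}^{-1}$ holds. This $M$ is an assumption for illustration and is not the matrix obtained from the optimization in the main text.

```python
import numpy as np

n, R = 8, 2          # |G| = 8, keep every R-th element
m = n // R           # |G'| = 4
F_G = np.fft.fft(np.eye(n)) / np.sqrt(n)    # unitary DFT on C_8
F_sub = np.fft.fft(np.eye(m)) / np.sqrt(m)  # unitary DFT on C_4
S = np.eye(n)[::R]                          # subsampling matrix, shape (4, 8)

# Assumed M: column j keeps frequency kept[j], where kept[j] = j (mod 4).
kept = np.array([0, 1, 2, 7])
M = np.zeros((n, m))
M[kept, np.arange(m)] = np.sqrt(R)

B = F_G.conj().T @ M                                     # B = F_G^{-1} M
P = M @ np.linalg.inv(M.conj().T @ M) @ M.conj().T       # projector in Fourier space

rng = np.random.default_rng(0)

# A signal whose Fourier coefficients lie in the 1-eigenspace of P.
x_hat = P @ (rng.standard_normal(n) + 1j * rng.standard_normal(n))
x = F_G.conj().T @ x_hat
x_rec = B @ F_sub @ (S @ x)      # interpolation applied to the subsampled signal

# A generic signal, which is not bandlimited in this sense.
y = rng.standard_normal(n)
y_rec = B @ F_sub @ (S @ y)

print(np.allclose(S @ B, F_sub.conj().T), np.allclose(x_rec, x), np.allclose(y_rec, y))
```

In this setup the bandlimited signal is recovered up to floating-point error, while the generic signal is not, which matches the role of the 1-eigenspace condition in the claim.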

A4 ILLUSTRATION OF CLAIM 1

Here is an illustration of Claim 1. The claim states that if $S_d^k = \{s_d^k : k \in \mathbb{Z}^+ \text{ and } k \bmod R \neq 0\}$ are non-redundant powers of $s_d$, $o_d \bmod R \equiv 0$, and the elements of $S_d^k$ cannot be represented as a product of elements of the left cosets of the subgroup $G_{sub} = \langle S \setminus \{s_d\} \rangle$ generated by the set $\{s_d^{nR} : n \in \mathbb{Z}_0^+\}$, then Alg. 1 returns a proper subgroup $G' \subset G$. In Fig. A9, we illustrate the claim with the group $D_8$.

Figure A9: Illustration of Claim 1 for subsampling $D_8$ by a factor $R = 2$ along the generator $s_d = r$. The red-colored nodes denote the set $S_d^k = \{r, r^3\}$. The green highlighted nodes $\{e, s\}$ form $G_{sub} = \langle S \setminus \{r\} \rangle$. We can see that $S_d^k$ is non-redundant and that the order of $s_d$ is divisible by 2. The last part of the claim implies that the colored nodes must not be reachable from the nodes of $G_{sub}$ with a hop of $r^2$, denoted by a dotted blue line, which is indeed satisfied in the example shown.

A5 GENERALIZATION OF SAMPLING ALGORITHM

In this section, we provide an algorithm (see Alg. 2) to check whether a generator complies with the condition in Claim 1. We also provide a general sampling algorithm (Alg. 3) that maximizes the number of generators in the subgroup following the heuristics from Sec. 4.1.

Alg. 2 takes $O(|V| + |E|)$ time, where $V$ is the set of nodes and $E$ is the set of edges in the Cayley graph. To choose the generator with the highest order, we need to check compliance for each of the generators, making the time complexity of downsampling along a chosen generator $O(|S| \cdot (|V| + |E|))$. The computational cost can be high for complex groups, depending on the choice of the generating set $S$. However, the sampling algorithm runs only once before training to generate the sampling matrix, so this is a one-time cost. Furthermore, for large complex groups, such as the symmetric groups $S_n$, the subgroups can be selected based on prior domain knowledge, followed by our proposed anti-aliasing operation.

Algorithm 2 Check-Compliance

Input: Group G, Generators S, Generator s, Order of the generator o, subsampling rate r
Output: True or False

if o mod r ≠ 0 then
    Return False
end if
V, E ← DiCay(G, S)
for each v ∈ V do
    E.remove((v, v · s))
    E.add((v, v · s^r))
end for
// graph traversal from e
Q ← ∅
G_cosets ← ∅
Q.enqueue(e)
while Q ≠ ∅ do
    n ← Q.dequeue()
    G_cosets.add(n)
    for each (n, m) ∈ E do
        if m ∉ Q and m ∉ G_cosets then
            Q.enqueue(m)
        end if
    end for
end while
if ∃ s^k ∈ G_cosets such that k mod r ≠ 0 then
    Return False
end if
Return True
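A minimal Python sketch of the compliance check is given below. It assumes that the group is supplied as a multiplication function together with its identity element, and it enumerates the edges implicitly instead of materializing the Cayley graph. The function names (`dihedral_mul`, `power`, `check_compliance`) and the pair encoding `(k, f)` of a dihedral element $r^k s^f$ are assumptions for this example, not part of the original algorithm.

```python
from collections import deque


def dihedral_mul(n):
    """Multiplication in the dihedral group with rotation order n."""
    def mul(a, b):
        (k1, f1), (k2, f2) = a, b
        return ((k1 + (-k2 if f1 else k2)) % n, f1 ^ f2)
    return mul


def power(mul, e, g, k):
    """Return g multiplied with itself k times (the identity for k = 0)."""
    out = e
    for _ in range(k):
        out = mul(out, g)
    return out


def check_compliance(mul, e, generators, s_d, order, rate):
    # The rate must divide the order of the chosen generator.
    if order % rate != 0:
        return False
    # Replace the edges of s_d by edges of s_d^rate.
    edges = [g for g in generators if g != s_d] + [power(mul, e, s_d, rate)]
    # Graph traversal from e.
    visited, queue = {e}, deque([e])
    while queue:
        a = queue.popleft()
        for g in edges:
            b = mul(a, g)
            if b not in visited:
                visited.add(b)
                queue.append(b)
    # No power s_d^k with k mod rate != 0 may be reachable.
    return not any(
        power(mul, e, s_d, k) in visited
        for k in range(1, order)
        if k % rate != 0
    )


mul, e = dihedral_mul(4), (0, 0)
r, s = (1, 0), (0, 1)
rs = mul(r, s)
ok_minimal = check_compliance(mul, e, [r, s], r, 4, 2)
ok_bad_rate = check_compliance(mul, e, [r, s], r, 4, 3)
ok_redundant = check_compliance(mul, e, [r, s, rs], r, 4, 2)
print(ok_minimal, ok_bad_rate, ok_redundant)
```

With the generating set $\{r, s\}$ the generator $r$ passes the check for rate 2 and fails for rate 3. With the redundant generating set $\{r, s, rs\}$ the check fails for rate 2, because $s \cdot rs = r^3$ becomes reachable.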

Algorithm 3 General-Subsample

Input: Group G, Generators S, Orders of the generators O, subsampling rate r
Output: Subsampled group G′

V, E ← DiCay(G, S)
G′ ← G.copy()
R ← factorize(r)
for i = 1 to R.length() do
    index ← NULL
    for j = 1 to S.length() do
        // check the compliance of S[j] using Alg. 2
        if Check-Compliance(G, S, S[j], O[j], R[i]) then
            if index = NULL or O[j] > O[index] then
                index ← j
            end if
        end if
    end for
    if index = NULL then
        Return NULL
    end if
    // downsampling using Alg. 1
    G′ ← Downsample(G, S, R[i], S[index])
    // updating the generating set and order
    S[index] ← S[index]^R[i]
    O[index] ← O[index] / R[i]
end for
Return G′

A6 FUNCTION ON GROUPS IN EQUIVARIANT CNN FOR IMAGES

Group Convolution: In the group equivariant convolutional neural network, the input image $f : \mathbb{Z}^2 \to \mathbb{R}^k$ ($k = 1$ or $3$ depending on whether the image is grayscale or color) is first lifted to the space of the roto-translation or dihedral-translation group ($\mathbb{Z}^2 \rtimes C_N$ or $\mathbb{Z}^2 \rtimes D_N$) by the lifting operation (Cohen & Welling, 2016)

$$[f \star \psi](g) = \sum_{z \in \mathbb{Z}^2} \sum_{k} f_k(z)\, \psi_k(g^{-1} z), \quad g \in \mathbb{Z}^2 \rtimes G, \tag{A38}$$

where $k$ is the channel index, i.e., $f_k$ represents the $k$-th channel of the image, $\psi_k : \mathbb{Z}^2 \to \mathbb{R}$ is a 2D kernel, and $G$ is either the cyclic (rotation) group or the dihedral group ($C_N$ or $D_N$) for most computer vision tasks. This transformation lifts the image to the desired group by repeatedly applying the transformed (by the action of the group) filter on the image $f$.

The filter $\psi_k$ is a regular convolution filter, i.e., it is a real-valued function defined on the 2D grid $\mathbb{Z}^2$. The action of $G$ on the filter $\psi$ is defined as

$$[\rho_g \psi](z) = \psi(g^{-1} z) \quad \forall z \in \mathbb{Z}^2. \tag{A39}$$

In other words, the transformed filter $[\rho_g \psi]$ is defined through the action of the group element $g$ on $z \in \mathbb{Z}^2$, which we directly use in Eq. (A38). In the case of the rotation group, the group element $g$ corresponds to an angle $\theta \in [0, 2\pi]$, and the action of the group element $\theta$ on $z = [u, v]^\top \in \mathbb{Z}^2$ is defined as

$$\theta z \triangleq \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}. \tag{A40}$$

This is the underlying mechanism for rotating a real-valued function on a 2D grid (for details, please see Cohen & Welling (2016; 2017)).
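The filter action in Eq. (A39) can be sketched for the four-fold rotation group acting on a small square filter. The example assumes a filter stored as an odd-sized square array indexed by centered coordinates $(u, v)$ and the standard counterclockwise rotation matrix for $\theta = \pi/2$; the helper name `act` is hypothetical.

```python
import numpy as np

ROT90 = np.array([[0, -1], [1, 0]])  # rotation matrix for theta = pi / 2


def act(psi, g):
    """Return rho_g psi, where [rho_g psi](z) = psi(g^{-1} z) on a centered grid."""
    k = psi.shape[0] // 2
    g_inv = np.rint(np.linalg.inv(g)).astype(int)
    out = np.empty_like(psi)
    for u in range(-k, k + 1):
        for v in range(-k, k + 1):
            u0, v0 = g_inv @ np.array([u, v])
            out[u + k, v + k] = psi[u0 + k, v0 + k]
    return out


psi = np.arange(25).reshape(5, 5)   # arbitrary 5 x 5 filter
rotated = act(psi, ROT90)

# Applying the generator four times returns the original filter.
four_times = act(act(act(rotated, ROT90), ROT90), ROT90)
print(np.array_equal(four_times, psi))
```

Because the action is defined through $g^{-1}$, composing two actions agrees with acting by the product, so `act(act(psi, g), g)` equals `act(psi, g @ g)` in this sketch.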

Next, in the group convolution network, the function $[f \star \psi] : \mathbb{Z}^2 \rtimes G \to \mathbb{R}$ is a function over $\mathbb{Z}^2 \rtimes G$ and is passed on to the following group convolution layers. For ease of notation, let us denote a real-valued function on the group $\mathbb{Z}^2 \rtimes G$ as $E$, i.e., $E : \mathbb{Z}^2 \rtimes G \to \mathbb{R}$. For any $E$, the group convolution (Cohen & Welling, 2016) is defined as

$$[E \star \kappa](g) = \sum_{h \in \mathbb{Z}^2 \rtimes G} E(h)\, \kappa(g^{-1} h), \quad g \in \mathbb{Z}^2 \rtimes G, \tag{A41}$$

where $\kappa : \mathbb{Z}^2 \rtimes G \to \mathbb{R}$ is the group convolution kernel. The output is then passed through a point-wise non-linearity, followed by more group convolution layers.

We can see that group convolution is defined by the action of $\mathbb{Z}^2 \rtimes G$ on the function $\kappa$. This is analogous to regular convolution, where the function $\kappa$ is also shifted (transformed) by the action of group elements. For example, in the specific case of the roto-translation group $\mathbb{Z}^2 \rtimes C_N$, group convolution is analogous to 3D convolution where the action of the roto-translation group guides the shift on the filter. Please see Sec. 7 of Cohen & Welling (2016) and Bekkers et al. (2018) for details.
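The equivariance of the group convolution in Eq. (A41) can be illustrated on a much simpler group than $\mathbb{Z}^2 \rtimes G$. The sketch below assumes the finite cyclic group $C_n$ written additively, so that $g^{-1}h = (h - g) \bmod n$. The function names are hypothetical and the example is a simplification of the setting used in the paper.

```python
import numpy as np


def group_conv_cyclic(E, kappa):
    """[E * kappa](g) = sum_h E(h) kappa(g^{-1} h) on C_n, with g^{-1} h = h - g."""
    n = len(E)
    return np.array(
        [sum(E[h] * kappa[(h - g) % n] for h in range(n)) for g in range(n)]
    )


def left_translate(E, u):
    """(L_u E)(h) = E(u^{-1} h) = E((h - u) mod n)."""
    n = len(E)
    return np.array([E[(h - u) % n] for h in range(n)])


rng = np.random.default_rng(0)
n = 6
E = rng.standard_normal(n)
kappa = rng.standard_normal(n)

# Translating the input and then convolving equals convolving and then translating.
equivariant = all(
    np.allclose(
        group_conv_cyclic(left_translate(E, u), kappa),
        left_translate(group_conv_cyclic(E, kappa), u),
    )
    for u in range(n)
)
print(equivariant)
```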

When performing group convolution, we follow the techniques introduced by Cohen & Welling (2016; 2017) and do not propose any modification of the group convolution operations (Eq. (A38) and Eq. (A41)) introduced in the earlier works. For a detailed explanation and construction of the group equivariant architecture, we refer the readers to Cohen et al. (2019); Weiler & Cesa (2019); Kondor & Trivedi (2018); Cohen & Welling (2016); Bekkers et al. (2018).

Anti-aliasing: The function $E$ is represented as a tensor of size $H \times W \times |G|$, where $H \times W$ is the resolution of the input image, which corresponds to the size of the translation group, and $|G|$ is the number of elements in the group $G$.

Now, at a fixed translation group element (spatial location) $(i, j) \in \{0, \ldots, H\} \times \{0, \ldots, W\}$ of the tensor, we have a function over $G$, i.e., $\forall (i, j) \in \{0, \ldots, H\} \times \{0, \ldots, W\}$, we have

$$E(h = i, w = j, d) = E_{i,j}(d) \in \mathbb{R} \quad \forall d \in G, \tag{A42}$$

with $E_{i,j} : G \to \mathbb{R}$. This function $E_{i,j}$ is transformed according to the regular representation of $G$. We represent $E_{i,j}$ as a vector of size $|G|$, i.e., $E_{i,j} \in \mathbb{R}^{|G|}$.

Figure A10: Visualization of a function on the group in a $C_4 = \{e, r, r^2, r^3\}$ equivariant CNN. The input image $f : \mathbb{Z}^2 \to \mathbb{R}$ is transformed into a function over a group following Eq. (A38) with some learnable filter $\psi$. The resultant function $[f \star \psi] : \mathbb{Z}^2 \rtimes C_4 \to \mathbb{R}$ is a function over $\mathbb{Z}^2 \rtimes C_4$. At every fixed spatial location $(i, j) \in \mathbb{Z}_x \times \mathbb{Z}_y$, we have a function over $C_4$. The elements of one such function are marked with dotted circles, along with the corresponding elements of $C_4$.

In our work, we perform subsampling and anti-aliasing on the functions $E_{i,j}\ \forall (i, j) \in \{0, \ldots, H\} \times \{0, \ldots, W\}$. The anti-aliasing operator $P_M$ can also be represented as a matrix of size $|G| \times |G|$, i.e., $P_M \in \mathbb{R}^{|G| \times |G|}$ and $P_M(g, g') \in \mathbb{R}\ \forall g, g' \in G$. Specifically, the anti-aliasing operation on the function $E$ is defined as

$$E'(h = i, w = j, d) = \sum_{l \in G} E(h = i, w = j, l)\, P_M(d, l) \quad \forall (i, j) \tag{A43}$$

with $E'(h = i, w = j, d) = E'_{i,j}(d)$.

Finally, Eq. (A43) can be implemented as a matrix-vector multiplication as

$$E'_{i,j} = P_M E_{i,j}. \tag{A44}$$

In other words, the anti-aliasing is a matrix multiplication along the group dimension of the tensor representation of E.
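A minimal NumPy sketch of this operation is shown below. It assumes a feature tensor of shape `(H, W, |G|)` and uses a placeholder projection matrix (averaging over the group) in place of $P_M$, since the actual operator depends on the chosen group and subgroup. The variable names are illustrative only.

```python
import numpy as np

H, W, n = 3, 4, 4                      # spatial size and |G|
rng = np.random.default_rng(0)
E = rng.standard_normal((H, W, n))     # feature tensor, group axis last

# Placeholder projection (assumption): averaging over the group, not the paper's P_M.
P = np.full((n, n), 1.0 / n)

# E'(i, j, d) = sum_l E(i, j, l) P(d, l), applied at every spatial location at once.
E_aa = np.einsum("hwl,dl->hwd", E, P)

# The same result, one spatial location at a time, as a matrix-vector product.
E_loop = np.stack(
    [np.stack([P @ E[i, j] for j in range(W)]) for i in range(H)]
)
print(np.allclose(E_aa, E_loop))
```

Writing the operation as a single contraction over the group axis avoids an explicit loop over spatial locations and is equivalent to `E @ P.T`.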

A7 ADDITIONAL IMPLEMENTATION DETAILS

It is essential to note that our experimental setup differs from that of Weiler & Cesa (2019) and is carefully designed to highlight the robustness of the group equivariant model in a limited-data setting. The evaluation metrics are designed to explicitly measure the consistency of the models under all group actions. Unlike Weiler & Cesa (2019), we do not train on randomly rotated datasets, thereby revealing the actual equivariance property that the model obtains from its architecture design. Also, our accuracy metrics, $\mathrm{ACC}_{\text{orbit}}$, $\mathrm{ACC}_{\text{loc}}$, and $\mathrm{ACC}_{\text{no aug}}$, report the performance of the model at different granularities under group actions, which cannot be obtained by testing the model on a randomly rotated test set (Weiler & Cesa, 2019).

For MNIST, we train on 5,000 training images without any data augmentation and test on 10,000 images under different levels of transformations. For CIFAR-10, we train on 60K images without any data augmentation and evaluate on 10K images. All models consist of 3 group equivariant convolution layers (Cesa et al., 2021; Cohen & Welling, 2016) followed by a linear layer mapping to the final logits. The filter size at each layer is 5. When subgroup subsampling is performed, the convolution layer following the subsampling layer is equivariant only to the subgroup. The output of the final convolution layer undergoes a global pooling operation (Weiler et al., 2018) to obtain invariant features. For subsampling the roto(dihedral)-translation group, we subsample the rotation (dihedral) group and the translation group independently. Subsampling along the translation group is equivalent to spatial subsampling and is performed using BlurPool (Zhang, 2019). We set $\lambda = 5$ in Eq. (14) for obtaining $M$.

Models are optimized using the Adam optimizer and trained for 15 and 50 epochs with batch sizes of 128 and 256 for the MNIST and CIFAR-10 datasets, respectively. All the experiments are run on a single NVIDIA RTX 6000 GPU.

A8 LATENT FEATURE RECONSTRUCTION

To further investigate the effect of our anti-aliasing operator, we reconstruct the feature on the whole group from the downsampled features on the subgroup. It is crucial to note that the naive subsampling operation in previous work (Xu et al., 2021) lacks a suitable interpolation operation. Hence, we use our proposed interpolation operator to reconstruct the feature from the first group convolution layer, both with and without anti-aliasing. We visualize the squared error at each pixel. As shown in Fig. A11, our anti-aliasing operation enables us to accurately reconstruct the original feature across the entire group.

[Figure A11 panels. Columns: Input Image, Downsampled features, Reconstruction, Reconstruction Error. Rows, repeated for three examples: Without Anti-aliasing, With Anti-aliasing (Ours).]

Figure A11: Visualization of the reconstruction quality of the latent feature after the subgroup subsampling operation.