# Spin-Weighted Spherical CNNs

Carlos Esteves, GRASP Laboratory, University of Pennsylvania, machc@seas.upenn.edu
Ameesh Makadia, Google Research, makadia@google.com
Kostas Daniilidis, GRASP Laboratory, University of Pennsylvania, kostas@cis.upenn.edu

## Abstract

Learning equivariant representations is a promising way to reduce sample and model complexity and improve the generalization performance of deep neural networks. The spherical CNNs are successful examples, producing SO(3)-equivariant representations of spherical inputs. There are two main types of spherical CNNs. The first type lifts the inputs to functions on the rotation group SO(3) and applies convolutions on the group, which are computationally expensive since SO(3) has one extra dimension. The second type applies convolutions directly on the sphere, but is limited to zonal (isotropic) filters and thus has limited expressivity. In this paper, we present a new type of spherical CNN that allows anisotropic filters in an efficient way, without ever leaving the spherical domain. The key idea is to consider spin-weighted spherical functions, which were introduced in physics in the study of gravitational waves. These are complex-valued functions on the sphere whose phases change upon rotation. We define a convolution between spin-weighted functions and build a CNN based on it. The spin-weighted functions can also be interpreted as spherical vector fields, allowing applications to tasks where the inputs or outputs are vector fields. Experiments show that our method outperforms previous methods on tasks such as classification of spherical images, classification of 3D shapes, and semantic segmentation of spherical panoramas.

## 1 Introduction

Learning representations from data enables a variety of applications that are not possible with other methods. Convolutional neural networks (CNNs) are powerful tools in representation learning, in great part due to their translation equivariance property, which allows weight-sharing and exploits the natural structure of audio, image, or video inputs. Recently, there has been significant work extending equivariance to other groups of transformations [20, 9, 13, 44, 33, 17, 45, 40, 43, 18, 4] and designing equivariant CNNs on non-Euclidean domains [11, 16, 26, 37, 35, 8, 27, 48]. Successful applications have been demonstrated in tasks such as 3D shape analysis [16, 18], medical imaging [42, 3], satellite/aerial imaging [13, 21], cosmology [13, 35], and physics/chemistry [11, 26, 1]. Favorable results were also shown on popular upright natural image datasets such as CIFAR10/100 [39].

Rotation-equivariant CNNs are the natural way to learn feature representations on spherical data. There are two prevailing designs: (a) convolution between spherical functions and zonal (isotropic; constant per latitude) filters [16], and (b) convolutions on SO(3) after lifting spherical functions to the rotation group [11]. There is a clear distinction between these two designs: (a) is more efficient, allowing representational capacity to be built through deeper networks, while (b) has more expressive filters but is computationally expensive and thus constrained to shallower networks. The question we consider in this paper is: how can we achieve the expressivity and representational capacity of SO(3) convolutions with the efficiency and scalability of spherical convolutions?
In this paper, we propose to leverage spin-weighted spherical functions (SWSFs), introduced by Newman and Penrose [34] in the study of gravitational waves. These are complex-valued functions on the sphere that, upon rotation, undergo a phase change in addition to the usual spherical translation.

Figure 1: Colors represent a scalar field, and the green vectors represent a vector field. Upon rotation, scalar fields transform by simply moving values to another position, while vector fields move and also rotate. Treating vector fields as multi-channel scalars (bottom-right) results in incorrect behavior. The spin-weighted spherical CNNs equivariantly handle vector fields as inputs or outputs.

Our key observation is that a combination of SWSFs allows more expressive representations than scalar spherical functions, avoiding the need to lift features to the higher-dimensional SO(3). It also enables anisotropic filters, removing the filter constraint of purely spherical CNNs. We define convolutions and cross-correlations of SWSFs. For bandlimited inputs, the operations can be computed exactly in the spectral domain, and are equivariant to the continuous group SO(3). We build a CNN where filters and features are sets of SWSFs, and adapt nonlinearities, batch normalization, and pooling layers as necessary.

Besides more expressive and efficient representations, we can interpret the spin-weighted features as equivariant vector fields on the sphere, enabling applications where the inputs or outputs are vector fields. Current spherical CNNs [11, 16, 26, 35] cannot achieve equivariance in this sense, as illustrated in Fig. 1. To evaluate vector field equivariance, we introduce a variation of MNIST where the images and their gradients are projected to the sphere. We propose three tasks on this dataset: 1) vector field classification, 2) vector field prediction from scalar fields, and 3) scalar field prediction from vector fields. We also evaluate our model on spherical image classification, 3D shape classification, and semantic segmentation of spherical panoramas. To summarize our contributions:

1. We define the convolution and cross-correlation between sets of spin-weighted spherical functions. These are SO(3)-equivariant operations that respect the properties of SWSFs.
2. We build a CNN based on these operations and adapt the usual CNN components to sets of SWSFs as features and filters. This is, to the best of our knowledge, the first spherical CNN that operates on vector fields.
3. We demonstrate the efficacy of the spin-weighted spherical CNNs (SWSCNNs) on a variety of tasks including spherical image and vector field classification, predicting vector fields from images and conversely, 3D shape classification, and spherical image segmentation.
4. We will make our code and datasets publicly available at https://github.com/daniilidis-group/swscnn.

## 2 Related work

Equivariant CNNs. The first equivariant CNNs were applied to images on the plane [20, 13]. Cohen and Welling [9] formalized these models and named them group equivariant convolutional neural networks (G-CNNs). While initial methods were constrained to small discrete groups of rotations on the plane, they were later extended to larger groups [41], continuous rotations [44], rotations and scale [17], 3D rotations of voxel grids [43, 40], and point clouds [37].

Spherical CNNs. G-CNNs can be extended to homogeneous spaces of groups of symmetries [28]; the quintessential example is the sphere S^2 as a homogeneous space of the group SO(3), the setting of spherical CNNs.
There are two main branches. The first branch, introduced by Cohen et al. [11], lifts the spherical inputs to functions on SO(3); its filters and features are functions on the group SO(3), which is higher dimensional and thus more computationally expensive to process. Kondor et al. [26] is another example. The second branch, introduced by Esteves et al. [16], is purely spherical: filters and features live on S^2, and the main operation is the spherical convolution. In this case, the filters are constrained to be zonal (isotropic), which limits the representational power. Perraudin et al. [35] also use isotropic filters, but with graph convolutions instead of spherical convolutions. Our approach lies between these two branches: it is not restricted to isotropic filters, but it does not have to lift features to SO(3); we employ sets of SWSFs as filters and features. A separate line of work developed spherical CNNs that are not rotation-equivariant [24, 47], which rely on the strong assumption that the inputs are aligned.

Equivariant vector fields. Our approach can equivariantly handle spherical vector fields as inputs or outputs. Marcos et al. [33] introduced a planar CNN whose features are vector fields obtained from rotated filters. Cohen and Welling [12] formalized the concept of feature types that are vectors in a group representation space. This was extended to 3D Euclidean space by Weiler et al. [40]. Worrall et al. [44] introduced complex-valued features on R^2 whose phases change upon rotation; this is similar in spirit to our method, but our features live on the sphere, requiring different machinery. Cohen et al. [8] introduced a framework that produces vector field features on general manifolds; it was specialized to the sphere by Kicanaoglu et al. [25]. The major differences are that our implementation is fully spectral and that we demonstrate it on tasks requiring vector field equivariance. Cohen et al. [10] alluded to the possibility of building spherical CNNs that can process vector fields; we materialize these networks.

## 3 Background

In this section, we provide the mathematical background that guides our contributions. We first introduce the more commonly encountered spherical harmonics, then the generalization to the spin-weighted spherical harmonics (SWSHs). We also describe convolutions between spherical functions, which we will later generalize to convolutions between spin-weighted functions.

Spherical Harmonics. The spherical harmonics $Y^\ell_m \colon S^2 \to \mathbb{C}$ form an orthonormal basis for the space $L^2(S^2)$ of square-integrable functions on the sphere. Any function $f \colon S^2 \to \mathbb{C}$ in $L^2(S^2)$ can be decomposed in this basis via the spherical Fourier transform (SFT) (Eq. (1)), and synthesized back exactly via its inverse (Eq. (2)),

$$\hat{f}^\ell_m = \int_{S^2} f(x)\, \overline{Y^\ell_m(x)}\, dx, \quad (1)$$

$$f(x) = \sum_{\ell \ge 0} \sum_{m=-\ell}^{\ell} \hat{f}^\ell_m\, Y^\ell_m(x). \quad (2)$$

We interchangeably use latitudes and longitudes (θ, φ) or points $x \in \mathbb{R}^3$, $\|x\| = 1$, to index the sphere, and we use the hat to denote Fourier coefficients. A function has bandwidth B when only components of order $\ell \le B$ appear in the expansion. The spherical harmonics are related to irreducible representations of the group SO(3) as follows,

$$D^\ell_{m,0}(\alpha, \beta, \gamma) = \sqrt{\frac{4\pi}{2\ell+1}}\, Y^\ell_m(\beta, \alpha), \quad (3)$$

where α, β, and γ are ZYZ Euler angles and $D^\ell$ is a Wigner-D matrix (the subscripts m, n refer to rows and columns of the matrix, respectively). Since $D^\ell$ is a group representation and hence a group homomorphism, we obtain a rotation formula,

$$Y^\ell_m(gx) = \sum_{n=-\ell}^{\ell} D^\ell_{m,n}(g)\, Y^\ell_n(x), \quad (4)$$

where we interchangeably use an element $g \in SO(3)$ or Euler angles α, β, and γ to refer to rotations.

Consider the rotation of a function represented by its coefficients, obtained by combining Eqs. (2) and (4),

$$f(gx) = \sum_{\ell} \sum_{n=-\ell}^{\ell} \left( \sum_{m=-\ell}^{\ell} \hat{f}^\ell_m\, D^\ell_{m,n}(g) \right) Y^\ell_n(x). \quad (5)$$

This shows that when $f(x) \mapsto f(gx)$, its Fourier coefficients transform as

$$\hat{f}^\ell_n \mapsto \sum_{m=-\ell}^{\ell} D^\ell_{m,n}(g)\, \hat{f}^\ell_m. \quad (6)$$
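To make Eq. (6) concrete, the following minimal numerical sketch (ours, not from the paper, using SciPy's spherical harmonics) checks the z-axis special case: for a rotation about the polar axis the Wigner-D matrix is diagonal, so shifting a bandlimited function in longitude multiplies each coefficient by a per-order phase (the signs of the phases depend on conventions).

```python
# Check the z-axis special case of Eq. (6): shifting a bandlimited function
# in longitude by alpha multiplies each coefficient f^l_m by e^{-i m alpha},
# since D^l(alpha, 0, 0) is diagonal.
import numpy as np
from scipy.special import sph_harm

rng = np.random.default_rng(0)
L, alpha = 4, 0.7    # bandwidth of the test signal; rotation angle

# random bandlimited coefficients f^l_m, stored as {(l, m): value}
coeffs = {(l, m): rng.normal() + 1j * rng.normal()
          for l in range(L) for m in range(-l, l + 1)}

def synth(coeffs, theta, phi):
    """Inverse SFT (Eq. 2). scipy's sph_harm(m, l, azimuth, colatitude)
    is proportional to e^{i m azimuth}."""
    return sum(c * sph_harm(m, l, phi, theta) for (l, m), c in coeffs.items())

theta, phi = 1.1, 2.3                    # colatitude, longitude of a test point
lhs = synth(coeffs, theta, phi - alpha)  # f sampled at the shifted longitude
rotated = {(l, m): np.exp(-1j * m * alpha) * c for (l, m), c in coeffs.items()}
rhs = synth(rotated, theta, phi)         # f with phase-rotated coefficients
assert np.isclose(lhs, rhs)              # the two agree to machine precision
```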
Finally, we recall how convolutions and cross-correlations of spherical functions are computed in the spectral domain. Esteves et al. [16] define the convolution between two spherical functions f and k as Eq. (7), while Makadia and Daniilidis [32] and Cohen et al. [11] define the spherical cross-correlation as Eq. (8),

$$(\widehat{k * f})^\ell_m = 2\pi \sqrt{\frac{4\pi}{2\ell+1}}\, \hat{f}^\ell_m\, \hat{k}^\ell_0, \quad (7)$$

$$(\widehat{k \star f})^\ell_{m,n} = \hat{f}^\ell_m\, \overline{\hat{k}^\ell_n}. \quad (8)$$

Both are shown to be equivariant through Eq. (6). The left-hand side of Eq. (7) corresponds to the Fourier coefficients of a spherical function, while the left-hand side of Eq. (8) corresponds to the Fourier coefficients of a function on SO(3). This section laid the foundation for the spin-weighted generalization. We refer to Esteves [15] for a longer exposition on this topic, and to Vilenkin and Klimyk [38] and Folland [19] for the full details.

Spin-Weighted Spherical Harmonics. The spin-weighted spherical functions (SWSFs) are complex-valued functions on the sphere whose phases change upon rotation. They have different types determined by the spin weight. Let ${}_sf \colon S^2 \to \mathbb{C}$ be a SWSF with spin weight s, $\lambda_\alpha$ a rotation by α around the polar axis, and ν the north pole. For a conventional spherical function, ν is fixed by the rotation, so $(\lambda_\alpha(f))(\nu) = f(\nu)$. For a spin-weighted function, however, the rotation results in a phase change,

$$(\lambda_\alpha({}_sf))(\nu) = {}_sf(\nu)\, e^{-is\alpha}. \quad (9)$$

If the spin weight is s = 0, this is equivalent to a conventional spherical function. The spin-weighted spherical harmonics (SWSHs) form a basis of the space of square-integrable spin-weighted spherical functions; for any square-integrable ${}_sf$, we can write

$${}_sf(\theta, \phi) = \sum_{\ell \ge |s|} \sum_{m=-\ell}^{\ell} {}_sY^\ell_m(\theta, \phi)\, {}_s\hat{f}^\ell_m, \quad (10)$$

where ${}_s\hat{f}^\ell_m$ are the expansion coefficients, and the decomposition is defined similarly to Eq. (1). For s = 0, the SWSHs are exactly the spherical harmonics; we have ${}_0Y^\ell_m = Y^\ell_m$. The SWSHs are related to the matrix elements $D^\ell_{m,n}$ of SO(3) representations as follows,

$$D^\ell_{m,-s}(\alpha, \beta, \gamma) = (-1)^s \sqrt{\frac{4\pi}{2\ell+1}}\, {}_sY^\ell_m(\beta, \alpha)\, e^{-is\gamma}. \quad (11)$$

Note how different spin weights are related to different columns of $D^\ell$, while the standard spherical harmonics are related to a single column, as in Eq. (3). This shows that the SWSHs can be seen as functions on SO(3) with sparse spectrum, a point of view advocated by Boyle [6].

The SWSHs do not transform among themselves upon rotation as the spherical harmonics do (Eq. (4)), because of the extra phase change. Fortunately, the coefficients of the expansion of a SWSF into the SWSHs do transform among themselves according to Eq. (6): when ${}_sf(x) \mapsto {}_sf(gx)$,

$${}_s\hat{f}^\ell_n \mapsto \sum_{m=-\ell}^{\ell} D^\ell_{m,n}(g)\, {}_s\hat{f}^\ell_m. \quad (12)$$

This is crucial for defining equivariant convolutions between combinations of SWSFs, as we will do in Section 4.1. We refer to Castillo [7] and Boyle [5, 6] for more details about SWSFs.
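Before moving to the spin-weighted generalization, here is a minimal sketch (ours, not the paper's code) of the scalar spectral operations of Eqs. (7) and (8); the container layout, with `f_hat[l]` a length-(2ℓ+1) array indexed by the order m, is our choice.

```python
# Minimal sketches of the spectral spherical convolution (Eq. 7) and
# cross-correlation (Eq. 8), acting on per-degree coefficient arrays.
import numpy as np

def spherical_conv(f_hat, k_hat):
    # Eq. (7): pointwise in (l, m); only the zonal coefficients k^l_0 enter,
    # which is why purely spherical CNNs are limited to isotropic filters.
    return [2 * np.pi * np.sqrt(4 * np.pi / (2 * l + 1)) * f * k[l]
            for l, (f, k) in enumerate(zip(f_hat, k_hat))]  # k[l] is m = 0

def spherical_corr(f_hat, k_hat):
    # Eq. (8): an outer product per degree; the (m, n)-indexed outputs are
    # Fourier coefficients of a function on SO(3).
    return [np.outer(f, k.conj()) for f, k in zip(f_hat, k_hat)]

L = 4
rng = np.random.default_rng(1)
rand = lambda l: rng.normal(size=2 * l + 1) + 1j * rng.normal(size=2 * l + 1)
f_hat = [rand(l) for l in range(L)]
k_hat = [rand(l) for l in range(L)]
conv = spherical_conv(f_hat, k_hat)   # spherical function: (2l+1,) per degree
corr = spherical_corr(f_hat, k_hat)   # SO(3) function: (2l+1, 2l+1) per degree
```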
## 4 Method

We introduce a fully convolutional network, the spin-weighted spherical CNN (SWSCNN), where layers are based on spin-weighted convolutions, and filters and features are combinations of SWSFs. We define spin-weighted convolutions and cross-correlations, show how to implement them efficiently, and adapt common neural network layers to work with combinations of SWSFs.

### 4.1 Spin-Weighted Convolutions and Cross-Correlations

We define and evaluate the convolutions and cross-correlations in the spectral domain. Consider sets of spin weights $W_F$, $W_K$, and sets of functions $F = \{{}_sf \colon S^2 \to \mathbb{C} \mid s \in W_F\}$ and filters $K = \{{}_sk \colon S^2 \to \mathbb{C} \mid s \in W_K\}$ to be convolved.

Spin-weighted convolution. We define the convolution between F and K as follows,

$${}_s(\widehat{F * K})^\ell_m = \sum_{i \in W_F} {}_i\hat{f}^\ell_m\, {}_s\hat{k}^\ell_i, \quad (13)$$

where $s \in W_K$ and $-\ell \le m \le \ell$. Only coefficients ${}_s\hat{k}^\ell_i$ with $i \in W_F$ influence the output, imposing sparsity on the spectra of K. The convolution F * K is also a set of SWSFs with $s \in W_K$, the same spin weights as K; we leverage this to specify the desired sets of spins at each layer. We show that this operation is SO(3)-equivariant by applying the rotation formula from Eq. (12). Let $\lambda_g$ denote a rotation of each ${}_sf(x) \in F$ by $g \in SO(3)$. We have

$${}_s(\widehat{\lambda_g F * K})^\ell_n = \sum_{i \in W_F} \left( \sum_{m=-\ell}^{\ell} D^\ell_{m,n}(g)\, {}_i\hat{f}^\ell_m \right) {}_s\hat{k}^\ell_i = \sum_{m=-\ell}^{\ell} D^\ell_{m,n}(g)\, {}_s(\widehat{F * K})^\ell_m = {}_s(\widehat{\lambda_g(F * K)})^\ell_n. \quad (14)$$

Now consider the spherical convolution defined in Eq. (7). It follows immediately that it is, up to a constant, a special case of the spin-weighted convolution where F and K have only one element with s = 0, and only the filter coefficients of the form ${}_0\hat{k}^\ell_0$ are used.

Spin-weighted cross-correlation. We define the cross-correlation between F and K as follows,

$${}_s(\widehat{F \star K})^\ell_m = \sum_{i \in W_F \cap W_K} {}_i\hat{f}^\ell_m\, \overline{{}_i\hat{k}^\ell_s}. \quad (15)$$

In this case, only the spins that are common to F and K are used, but all spins may appear in the output, so the result can be seen as a function on SO(3) with dense spectrum. To ensure a desired set of spins in F ⋆ K, we can sparsify the spectra of K by eliminating some orders. A procedure similar to Eq. (14) proves the SO(3) equivariance of this operation. The spin-weighted cross-correlation generalizes the spherical cross-correlation: when F and K contain only a single spin weight s = 0, the summation in Eq. (15) contains only one term and we recover the spherical cross-correlation defined in Eq. (8).

Examples. To visualize the convolutions and cross-correlations, we use the phase of the complex numbers and define local frames to obtain a vector field. We visualize combinations of SWSFs by associating pixel intensities with the spin weight s = 0 and plotting vector fields for each s > 0. Consider an input $F = \{{}_0f, {}_1f\}$ and filter $K = \{{}_0k, {}_1k\}$, both with spin weights 0 and 1. Their convolution also has spins 0 and 1, as shown on the left side of Fig. 2. Now consider a scalar-valued (spin s = 0) input $F = \{{}_0f\}$ and filter $K = \{{}_0k\}$. The cross-correlation has components of every spin, but we take only spin weights 0 and 1 for visualization (this is equivalent to eliminating all orders larger than 1 in the spectrum of k); Fig. 2 shows the results.

Figure 2: Left block (2 × 3): convolution between sets of functions of spins 0 and 1. The operation is equivariant as a vector field, and the outputs carry the same spins. Right block (2 × 3): spin-weighted cross-correlation between scalar spherical functions. The operation is also equivariant, and we show the outputs corresponding to spins 0 and 1. The second row shows the effect of rotating the input F.

### 4.2 Spin-weighted spherical CNNs

Our main operation is the convolution defined in Section 4.1. Since components with the same spin can be added, the generalization to multiple channels is immediate. The convolution combines features of different spins, so we enforce the same number of channels per spin per layer. Each feature map then consists of a set of SWSFs of different spins, $F = \{{}_sf \colon S^2 \to \mathbb{C}^k \mid s \in W_F\}$, where k is the number of channels and $W_F$ the set of spin weights.
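For reference, here is a minimal single-channel sketch (ours, not the released implementation) of the spectral spin-weighted convolution of Eq. (13), with coefficients stored per spin and per degree.

```python
# Spectral spin-weighted convolution (Eq. 13). Coefficients are stored as
# {spin: list over degrees l of complex arrays of shape (2l+1,)},
# indexed by the order m = -l..l; this layout is our choice.
import numpy as np

def swsf_conv(F_hat, K_hat):
    """s(F*K)^l_m = sum over input spins i in W_F of iF^l_m * sK^l_i."""
    out = {}
    for s, k_list in K_hat.items():            # output spins are W_K
        out[s] = []
        for l in range(len(k_list)):
            acc = np.zeros(2 * l + 1, dtype=complex)
            for i, f_list in F_hat.items():    # input spins are W_F
                if abs(i) <= l:                # sK^l_i needs -l <= i <= l
                    # the input spin i indexes an *order* of the filter:
                    # entry l + i of sK^l is its coefficient sK^l_i
                    acc += f_list[l] * k_list[l][l + i]
            # a full implementation would also zero out degrees l < |s|
            out[s].append(acc)
    return out

L = 4
rng = np.random.default_rng(2)
def rand(l, s):  # valid SWSF coefficients vanish for degrees l < |s|
    if l < abs(s):
        return np.zeros(2 * l + 1, dtype=complex)
    return rng.normal(size=2 * l + 1) + 1j * rng.normal(size=2 * l + 1)

F_hat = {s: [rand(l, s) for l in range(L)] for s in (0, 1)}  # W_F = {0, 1}
K_hat = {s: [rand(l, s) for l in range(L)] for s in (0, 1)}  # W_K = {0, 1}
out = swsf_conv(F_hat, K_hat)  # output carries the filter spins W_K = {0, 1}
```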
Filter localization. We compute the convolutions in the spectral domain but apply nonlinearities, batch normalization, and pooling in the spatial domain. This requires expanding the feature maps into the SWSH basis and back at every layer, but the filters themselves are parameterized by their spectra. We follow the idea of Esteves et al. [16] to enforce filter localization through spectral smoothness. Their filters are of the form ${}_0\hat{k}^\ell_0$, so the spectrum is 1D and can be interpolated from a few anchor points, smoothing it out and reducing the number of parameters. In our case, the filters take the general form ${}_s\hat{k}^\ell_m$, where $s \in W_{F*K}$ are the output spin weights and $m \in W_F$ are the input spin weights. We then interpolate the spectrum of each component along the degrees ℓ, resulting in a factor of $|W_{F*K}|\,|W_F|$ more parameters per layer.

Batch normalization and nonlinearity. We force features with spin weight s = 0 to be real by taking their real part after every convolution. Then we can use the common rectified linear unit (ReLU) as the nonlinearity and the standard batch normalization from Ioffe and Szegedy [23]. For s > 0, we have complex-valued feature maps. Since values move and change phase upon rotation, equivariant operations must commute with this behavior. Pointwise operations on magnitudes satisfy this requirement. Similarly to Worrall et al. [44], we apply a variation of the ReLU to the complex values $z = ae^{i\theta}$ as follows,

$$z \mapsto \max(a + b, 0)\, e^{i\theta}, \quad (16)$$

where $a \in \mathbb{R}_+$ and $b \in \mathbb{R}$ is a learnable scalar.

Batch normalization is also applied pointwise, but it does not commute with spin-weighted rotations because of the mean subtraction and offset addition steps. We adapt it by removing these steps,

$$z \mapsto \frac{z}{\sqrt{\sigma^2 + \epsilon}}\, \gamma, \quad (17)$$

where $\sigma^2$ is the channel variance, $\gamma \in \mathbb{C}$ is a learnable factor, and $\epsilon \in \mathbb{R}_+$ is a constant added for stability. As usual, the variance is computed along the batch during training and over the whole dataset during inference. The variance of a set of complex numbers is real and depends only on their magnitudes; we use a spherical quadrature rule to compute it.

Complexity analysis. We follow Huffenberger and Wandelt [22] for the spin-weighted spherical Fourier transform (SWSFT) implementation (see the appendix for details), whose complexity for bandwidth B is O(B³). While this is asymptotically slower than the O(B² log² B) of the standard SFT from Driscoll and Healy [14], the difference is small for the bandwidths typically needed in practice [11, 16, 26]. The rotation group Fourier transform (SOFT) implementation from Kostelec and Rockmore [29] is O(B⁴). Our final model requires |W| transforms per layer, so it is asymptotically a factor of |W|B/log² B slower than using the SFT as in Esteves et al. [16], and a factor of B/|W| faster than using the SOFT as in Cohen et al. [11]. Typical values in our experiments are B = 32 and |W| = 2.
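Concretely, the spin-aware layers of Eqs. (16) and (17) can be sketched as follows. This is a minimal NumPy rendering of ours, not the paper's code; in particular, the uniform averaging used for the variance is a stand-in for the spherical quadrature rule mentioned above.

```python
# Spin-aware nonlinearity (Eq. 16) and mean-free batch normalization
# (Eq. 17) for s > 0 feature maps of shape (batch, height, width, channels).
import numpy as np

def complex_relu(z, b):
    """Eq. (16): z = a e^{i theta} -> max(a + b, 0) e^{i theta}.
    Acting only on magnitudes commutes with the phase change that
    spin-weighted features undergo upon rotation."""
    a = np.abs(z)
    phase = z / np.maximum(a, 1e-12)       # unit phase, guarded at z = 0
    return np.maximum(a + b, 0.0) * phase

def spin_batch_norm(z, gamma, eps=1e-5):
    """Eq. (17): z -> z * gamma / sqrt(var + eps), with no mean subtraction
    or offset. Here the per-channel variance is the mean squared magnitude
    over uniform samples, a stand-in for the paper's spherical quadrature."""
    var = np.mean(np.abs(z) ** 2, axis=(0, 1, 2))   # one value per channel
    return z * gamma / np.sqrt(var + eps)

z = np.random.randn(8, 16, 16, 4) + 1j * np.random.randn(8, 16, 16, 4)
y = complex_relu(z, b=-0.1)                         # b is learnable
y = spin_batch_norm(y, gamma=np.ones(4, dtype=complex))
```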
## 5 Experiments

We start with experiments on image and vector field classification, image prediction from a vector field, and vector field prediction from an image, where all images and vector fields are on the sphere. Next, we show applications to 3D shape classification and semantic segmentation of spherical panoramas. All experiments use spin weights 0 and 1. When the inputs do not have both spins, the first layer is designed such that its outputs do. All following layers and filters also have spins 0 and 1. Every model is trained five times with different random seeds, and averages and standard deviations (in parentheses) are reported. See the appendix for training procedure details.

### 5.1 Spherical Image Classification

Our first experiment is on the Spherical MNIST dataset introduced by Cohen et al. [11]. This is an image classification task where the handwritten digits from MNIST are projected on the sphere. Three modes are evaluated depending on whether the training/test sets are rotated (R) or not (NR). We simplify the architecture in Esteves et al. [16] to have a single branch, switch from spherical to spin-weighted convolutions, and adapt the numbers of channels and parameters per filter to match the parameter counts. Table 1 shows the results; we outperform previous spherical CNNs in every mode.

Table 1: Spherical MNIST results. Our model is more expressive than the isotropic and more efficient than the previous anisotropic spherical CNNs, allowing deeper models and improved performance.

|                     | NR/NR    | R/R       | NR/R      | params |
|---------------------|----------|-----------|-----------|--------|
| Planar CNN          | 99.07(4) | 81.07(63) | 17.23(71) | 59k    |
| Cohen et al. [11]   | 95.59    | 94.62     | 93.40     | 58k    |
| Kondor et al. [26]  | 96.40    | 96.60     | 96.00     | -      |
| Esteves et al. [16] | 98.75(8) | 98.71(5)  | 98.08(24) | 57k    |
| Ours                | 99.37(5) | 99.37(1)  | 99.08(12) | 58k    |

### 5.2 Spherical Vector Field Classification

One crucial advantage of the SWSCNNs is that they are equivariant as vector fields. To demonstrate this, we introduce a spherical vector field dataset. We start from MNIST [31], compute the image gradients with Sobel kernels, and project the vectors to the sphere. To increase the challenge, we follow Larochelle et al. [30] and swap the train and test sets, so there are 10k images for training and 50k for testing. We call this dataset the spherical vector field MNIST (SVFMNIST). The vector field is converted to a spin weight s = 1 complex-valued function using a predefined local tangent frame per point on the sphere (a code sketch of this conversion appears below, in Section 5.3). The inverse procedure converts s = 1 features to output vector fields.

The first task we consider is classification. We use the same architecture as in the previous experiment; the only difference is that now the first layer maps from spin 1 to spins 0 and 1. Table 2 shows the results. The planar and spherical CNN models take the vector field as a 2-channel input. The NR/R column clearly shows the advantage of vector field equivariance; the baselines cannot generalize to unseen vector field rotations, even when they are equivariant in the scalar sense, as [16] is.

Table 2: Spherical vector field MNIST classification results. When vector field equivariance is required, the gap between our method and the spherical and planar baselines is even larger.

|        | NR/NR   | R/R     | NR/R    |
|--------|---------|---------|---------|
| Planar | 97.7(2) | 50.0(8) | 14.6(9) |
| [16]   | 98.4(1) | 94.5(5) | 24.8(8) |
| Ours   | 98.2(1) | 97.8(2) | 98.2(7) |

### 5.3 Spherical Vector Field Prediction

The SWSCNNs can also be used for dense prediction. We introduce two new tasks on SVFMNIST: 1) predicting a vector field from an image, and 2) predicting an image from a vector field. For these tasks, we implement a fully convolutional U-Net architecture [36] with spin-weighted convolutions. When the image is a grayscale digit and the vector field comes from its gradients, both tasks can be easily solved via discrete integration and differentiation. We call this case "easy" and show it on the left side of Table 3. It highlights a limitation of isotropic spherical CNNs; the results show that the constrained filters cannot approximate a simple image gradient operator.
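These tasks rely on the vector-field-to-spin-1 encoding described in Section 5.2. A minimal sketch of that conversion, assuming the local tangent frame is spanned by the unit colatitude and longitude directions (our choice; the paper only states that a predefined frame per point is used):

```python
# Vector field <-> spin-1 encoding for SVFMNIST-style inputs.
import numpy as np

def vectors_to_spin1(v_theta, v_phi):
    # tangent components -> complex spin-1 field: 1f = v_theta + i * v_phi
    return v_theta + 1j * v_phi

def spin1_to_vectors(f1):
    # inverse conversion, used to turn s = 1 outputs back into vectors
    return f1.real, f1.imag

# e.g. gradients of a spherical image on an equiangular (theta, phi) grid;
# np.gradient stands in for the Sobel kernels used to build SVFMNIST
image = np.random.rand(64, 64)
g_theta, g_phi = np.gradient(image)
f1 = vectors_to_spin1(g_theta, g_phi)   # a single-channel spin-1 input
```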
We also experiment with a more challenging scenario, where the digits are colored and the vector fields are rotated based on the digit category. These are semantic tasks that require the network to implicitly classify the input in order to correctly predict the output color and vector directions. Table 3 shows the results. While the planar baseline does well on the easy tasks, which can be solved with simple linear operators, our model still outperforms it when generalization to unseen rotations is demanded (NR/R). On the hard tasks, the SWSCNNs are clearly superior, by large margins. We show sample inputs and outputs in Fig. 3; see the appendix for more.

Table 3: Vector field to image and image to vector field results on SVFMNIST. The SWSCNNs show superior performance, especially on the more challenging tasks. The metric is the mean squared error × 10³ (lower is better). All models have around 112k parameters.

| Image to Vector Field | Easy NR/NR | Easy R/R | Easy NR/R | Hard NR/NR | Hard R/R | Hard NR/R |
|-----------------------|------------|----------|-----------|------------|----------|-----------|
| Planar                | 0.3(1)     | 5.0(1)   | 9.3(1)    | 16.9(5)    | 26.0(1)  | 32.9(2)   |
| Esteves et al. [16]   | 9.7(3)     | 31.0(2)  | 45.6(7)   | 13.3(6)    | 28.5(4)  | 41.6(4)   |
| Ours                  | 2.9(2)     | 3.4(1)   | 4.3(1)    | 11.6(6)    | 9.2(4)   | 10.2(6)   |

| Vector Field to Image | Easy NR/NR | Easy R/R | Easy NR/R | Hard NR/NR | Hard R/R | Hard NR/R |
|-----------------------|------------|----------|-----------|------------|----------|-----------|
| Planar                | 1.4(1)     | 3.2(1)   | 6.9(4)    | 3.3(2)     | 13.4(2)  | 21.1(3)   |
| Esteves et al. [16]   | 3.8(1)     | 4.9(2)   | 15(2)     | 2.6(1)     | 6.4(2)   | 20.3(9)   |
| Ours                  | 3.5(1)     | 3.8(1)   | 4.0(1)    | 2.6(1)     | 2.7(1)   | 2.9(1)    |

Figure 3: Input, output, and ground truth for dense prediction tasks on rotated train and test sets (R/R). Top: vector field to image. Conventional and spherical CNNs [16] predict the incorrect color, in contrast with our SWSCNNs. Bottom: image to vector field. Our method predicts the position and orientation of each vector correctly, while the alternatives cannot.

### 5.4 Classification of 3D shapes

We tackle 3D object classification on ModelNet40 [46], following the protocol from Esteves et al. [16], which considers azimuthally and arbitrarily rotated shapes. Besides having more expressive filters, our method also represents the shapes more faithfully on the sphere. Esteves et al. [16] and Cohen et al. [11] cast rays from the shape's center and assign the intersection distance and the angle between normal and ray to points on the sphere. Normals are not uniquely determined by a single angle, but this limitation was necessary to preserve equivariance as a scalar field. With SWSCNNs, we can represent any normal direction uniquely, without breaking equivariance. We split the vector into radial and tangent components, where the radial part is represented with spin s = 0 and the tangent part with s = 1 (a code sketch appears at the end of this subsection). Since the intersection distance is also a function with s = 0, our 3D shape representation has two spherical channels with s = 0 and one with s = 1. Following Cohen et al. [11], we also use the convex hull for extra channels.

When inputs have limited orientations, a globally equivariant model can be undesirable, even though equivariance in the local sense is still useful. We can keep the benefits while still having access to the global pose by breaking equivariance in the final layers, which we do by simply replacing them with regular 2D convolutions. We call this model "Ours + BE"; it results in better performance on "upright" but worse on "rotated", as expected.

Table 4 compares with previous spherical CNNs. The "upright" mode has only azimuthal rotations, while "rotated" is unrestricted. EMVN [18] is the state of the art on this task, with 94.4% accuracy on "upright" and 91.1% on "rotated", but it requires 60 images as input and a much larger model.

Table 4: ModelNet40 shape classification accuracy [%]. Our model outperforms previous spherical CNNs while requiring a small input size and a low parameter count.

|             | upright | rotated |
|-------------|---------|---------|
| UGSCNN [24] | 87.3(3) | 81.9(9) |
| SphCNN [16] | 89.3(5) | 88.4(3) |
| Ours        | 89.6(3) | 88.8(1) |
| Ours + BE   | 90.1(3) | 88.2(2) |
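A minimal sketch of the normal-splitting described above; the frame variables and function name are ours, and the tangent frame choice is an assumption, as in the Section 5.3 sketch.

```python
# Split a surface normal at a ray/shape intersection into a radial part
# (a spin-0 channel) and a tangential part encoded as one complex spin-1
# value in the local tangent frame.
import numpy as np

def split_normal(n, r_hat, e_theta, e_phi):
    """n: unit surface normal; r_hat: unit ray direction; (e_theta, e_phi):
    unit vectors spanning the local tangent frame at the spherical point."""
    radial = float(np.dot(n, r_hat))                      # spin-0 channel
    tangent = np.dot(n, e_theta) + 1j * np.dot(n, e_phi)  # spin-1 channel
    return radial, tangent

# e.g. at the north pole with the standard frame:
r_hat, e_theta, e_phi = np.eye(3)[2], np.eye(3)[0], np.eye(3)[1]
n = np.array([0.0, 0.6, 0.8])
radial, tangent = split_normal(n, r_hat, e_theta, e_phi)  # 0.8, 0.6j
```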
### 5.5 Semantic segmentation of spherical panoramas

We evaluate our method on the Stanford 2D3DS dataset [2], following the usual protocol of reporting the average performance over the three official folds. As in Section 5.4, our model is able to uniquely represent surface normals. In this task, representing the normals with respect to local tangent frames is also more realistic, as they could be estimated from a depth sensor without knowledge of the global orientation. Note that competing methods do not usually leverage the normals, so we also show results without them for comparison. Table 5 shows the results. The inputs are upright, so global SO(3) equivariance is not required; nevertheless, our method matches the state-of-the-art performance, which demonstrates the expressivity of the SWSCNNs.

Table 5: Semantic segmentation on Stanford 2D3DS. Our model clearly outperforms previous equivariant models and matches the state-of-the-art non-equivariant model.

|                | acc [%] | mIoU    |
|----------------|---------|---------|
| UGSCNN [24]    | 54.7    | 38.3    |
| GaugeCNN [8]   | 55.9    | 39.4    |
| HexRUNet [47]  | 58.6    | 43.3    |
| SphCNN [16]    | 52.8(6) | 40.2(3) |
| Ours           | 55.6(5) | 41.9(5) |
| + normals      | 57.5(6) | 43.4(4) |
| + normals + BE | 58.7(5) | 43.4(4) |

## 6 Conclusion

In this paper, we introduced the spin-weighted spherical CNNs, which use sets of spin-weighted spherical functions as features and filters, and employ layers of a newly introduced spin-weighted spherical convolution to process spherical images or spherical vector fields. Our model achieves superior performance on the tasks attempted, at a reasonable computational cost. We foresee further applications of the SWSCNNs to 3D shape analysis, climate/atmospheric data analysis, and other tasks where inputs or outputs can be represented as spherical images or vector fields.

## Broader Impact

This paper presents advances in learning representations from spherical data. It has potential beneficial applications to climate and atmospheric modeling, for example. The method is in the broad category of equivariant CNNs, whose goal is to reduce model and sample complexity and improve generalization performance. This potentially translates to models that are more energy efficient and more accessible to individuals without access to large computational resources. On the flip side, most technology can also be applied for harmful purposes, and by making it more accessible we also risk enabling bad actors to make use of it.

## Acknowledgments and Disclosure of Funding

Research was sponsored by the Army Research Office and was accomplished under Grant Number W911NF-20-1-0080, as well as the NSF TRIPODS 1934960 and ONR N00014-17-1-2093 grants. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of ARO, ONR, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

## References

[1] Brandon M. Anderson, Truong-Son Hy, and Risi Kondor. "Cormorant: Covariant Molecular Neural Networks". In: Advances in Neural Information Processing Systems 32 (NeurIPS 2019). 2019, pp. 14510-14519.

[2] Iro Armeni, Sasha Sax, Amir Roshan Zamir, and Silvio Savarese. "Joint 2D-3D-Semantic Data for Indoor Scene Understanding". In: CoRR abs/1702.01105 (2017). arXiv: 1702.01105. URL: http://arxiv.org/abs/1702.01105.
[3] Erik J. Bekkers, Maxime W. Lafarge, Mitko Veta, Koen A. J. Eppenhof, Josien P. W. Pluim, and Remco Duits. "Roto-translation covariant convolutional networks for medical image analysis". In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 440-448.

[4] Erik J. Bekkers. "B-Spline CNNs on Lie groups". In: 8th International Conference on Learning Representations (ICLR 2020). 2020.

[5] Michael Boyle. "Angular velocity of gravitational radiation from precessing binaries and the corotating frame". In: Physical Review D 87.10 (2013), p. 104006.

[6] Michael Boyle. "How should spin-weighted spherical functions be defined?" In: Journal of Mathematical Physics 57.9 (Sept. 2016), p. 092504. DOI: 10.1063/1.4962723. URL: http://dx.doi.org/10.1063/1.4962723.

[7] Gerardo F. Torres del Castillo. 3-D Spinors, Spin-Weighted Functions and their Applications. Vol. 32. Springer Science & Business Media, 2012.

[8] Taco Cohen, Maurice Weiler, Berkay Kicanaoglu, and Max Welling. "Gauge Equivariant Convolutional Networks and the Icosahedral CNN". In: Proceedings of the 36th International Conference on Machine Learning (ICML 2019). 2019.

[9] Taco Cohen and Max Welling. "Group equivariant convolutional networks". In: International Conference on Machine Learning. 2016, pp. 2990-2999.

[10] Taco S. Cohen, Mario Geiger, and Maurice Weiler. "A General Theory of Equivariant CNNs on Homogeneous Spaces". In: Advances in Neural Information Processing Systems. 2019, pp. 9142-9153.

[11] Taco S. Cohen, Mario Geiger, Jonas Köhler, and Max Welling. "Spherical CNNs". In: International Conference on Learning Representations. 2018. URL: https://openreview.net/forum?id=Hkbd5xZRb.

[12] Taco S. Cohen and Max Welling. "Steerable CNNs". In: 5th International Conference on Learning Representations (ICLR 2017). 2017.

[13] Sander Dieleman, Jeffrey De Fauw, and Koray Kavukcuoglu. "Exploiting Cyclic Symmetry in Convolutional Neural Networks". In: Proceedings of the 33rd International Conference on Machine Learning (ICML 2016). 2016, pp. 1889-1898.

[14] James R. Driscoll and Dennis M. Healy. "Computing Fourier transforms and convolutions on the 2-sphere". In: Advances in Applied Mathematics 15.2 (1994), pp. 202-250.

[15] Carlos Esteves. "Theoretical Aspects of Group Equivariant Neural Networks". In: CoRR abs/2004.05154 (2020). arXiv: 2004.05154.

[16] Carlos Esteves, Christine Allen-Blanchette, Ameesh Makadia, and Kostas Daniilidis. "Learning SO(3) Equivariant Representations with Spherical CNNs". In: The European Conference on Computer Vision (ECCV). Sept. 2018.

[17] Carlos Esteves, Christine Allen-Blanchette, Xiaowei Zhou, and Kostas Daniilidis. "Polar Transformer Networks". In: 6th International Conference on Learning Representations (ICLR 2018). 2018.

[18] Carlos Esteves, Yinshuang Xu, Christine Allen-Blanchette, and Kostas Daniilidis. "Equivariant Multi-View Networks". In: The IEEE International Conference on Computer Vision (ICCV). Oct. 2019.

[19] Gerald B. Folland. A Course in Abstract Harmonic Analysis. Chapman and Hall/CRC, 2016.

[20] Robert Gens and Pedro M. Domingos. "Deep symmetry networks". In: Advances in Neural Information Processing Systems. 2014, pp. 2537-2545.

[21] Joao F. Henriques and Andrea Vedaldi. "Warped convolutions: Efficient invariance to spatial transformations". In: Proceedings of the 34th International Conference on Machine Learning. JMLR.org, 2017, pp. 1461-1469.
[22] Kevin M. Huffenberger and Benjamin D. Wandelt. "Fast and Exact Spin-s Spherical Harmonic Transforms". In: The Astrophysical Journal Supplement Series 189.2 (July 2010), pp. 255-260. DOI: 10.1088/0067-0049/189/2/255.

[23] Sergey Ioffe and Christian Szegedy. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". In: Proceedings of the 32nd International Conference on Machine Learning. 2015, pp. 448-456.

[24] Chiyu Max Jiang, Jingwei Huang, Karthik Kashinath, Prabhat, Philip Marcus, and Matthias Nießner. "Spherical CNNs on Unstructured Grids". In: 7th International Conference on Learning Representations (ICLR 2019). 2019. URL: https://openreview.net/forum?id=Bkl-43C9FQ.

[25] Berkay Kicanaoglu, Pim de Haan, and Taco Cohen. "Gauge Equivariant Spherical CNNs". 2020. URL: https://openreview.net/forum?id=HJeYSxHFDS.

[26] Risi Kondor, Zhen Lin, and Shubhendu Trivedi. "Clebsch-Gordan nets: a fully Fourier space spherical convolutional neural network". In: Advances in Neural Information Processing Systems. 2018, pp. 10138-10147.

[27] Risi Kondor, Hy Truong Son, Horace Pan, Brandon M. Anderson, and Shubhendu Trivedi. "Covariant Compositional Networks For Learning Graphs". In: 6th International Conference on Learning Representations (ICLR 2018), Workshop Track Proceedings. 2018.

[28] Risi Kondor and Shubhendu Trivedi. "On the Generalization of Equivariance and Convolution in Neural Networks to the Action of Compact Groups". In: International Conference on Machine Learning (ICML). 2018.

[29] Peter J. Kostelec and Daniel N. Rockmore. "FFTs on the rotation group". In: Journal of Fourier Analysis and Applications 14.2 (2008), pp. 145-179.

[30] Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio. "An empirical evaluation of deep architectures on problems with many factors of variation". In: Proceedings of the 24th International Conference on Machine Learning. ACM, 2007, pp. 473-480.

[31] Yann LeCun, Corinna Cortes, and C. J. Burges. "MNIST handwritten digit database". 2010.

[32] A. Makadia and K. Daniilidis. "Rotation recovery from spherical images without correspondences". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 28.7 (July 2006), pp. 1170-1175. DOI: 10.1109/TPAMI.2006.150.

[33] Diego Marcos, Michele Volpi, Nikos Komodakis, and Devis Tuia. "Rotation Equivariant Vector Field Networks". In: IEEE International Conference on Computer Vision (ICCV 2017). 2017, pp. 5058-5067.

[34] Ezra T. Newman and Roger Penrose. "Note on the Bondi-Metzner-Sachs Group". In: Journal of Mathematical Physics 7.5 (1966), pp. 863-870.

[35] Nathanaël Perraudin, Michaël Defferrard, Tomasz Kacprzak, and Raphael Sgier. "DeepSphere: Efficient spherical convolutional neural network with HEALPix sampling for cosmological applications". In: Astronomy and Computing 27 (2019), pp. 130-146.

[36] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation". In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). 2015.

[37] Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley. "Tensor Field Networks: Rotation- and Translation-Equivariant Neural Networks for 3D Point Clouds". In: arXiv preprint arXiv:1802.08219 (2018).

[38] N. Ja. Vilenkin and A. U. Klimyk. Representation of Lie Groups and Special Functions. Springer Netherlands, 1991.
[39] Maurice Weiler and Gabriele Cesa. "General E(2)-Equivariant Steerable CNNs". In: Advances in Neural Information Processing Systems. 2019, pp. 14334-14345.

[40] Maurice Weiler, Mario Geiger, Max Welling, Wouter Boomsma, and Taco Cohen. "3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data". In: Advances in Neural Information Processing Systems. 2018.

[41] Maurice Weiler, Fred A. Hamprecht, and Martin Storath. "Learning Steerable Filters for Rotation Equivariant CNNs". In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018). 2018, pp. 849-858.

[42] Marysia Winkels and Taco S. Cohen. "3D G-CNNs for Pulmonary Nodule Detection". In: arXiv preprint arXiv:1804.04656 (2018).

[43] Daniel Worrall and Gabriel Brostow. "CubeNet: Equivariance to 3D rotation and translation". In: Proceedings of the European Conference on Computer Vision (ECCV). 2018, pp. 567-584.

[44] Daniel E. Worrall, Stephan J. Garbin, Daniyar Turmukhambetov, and Gabriel J. Brostow. "Harmonic networks: Deep translation and rotation equivariance". In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.

[45] Daniel E. Worrall and Max Welling. "Deep Scale-spaces: Equivariance Over Scale". In: Advances in Neural Information Processing Systems 32 (NeurIPS 2019). 2019, pp. 7364-7376.

[46] Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. "3D ShapeNets: A Deep Representation for Volumetric Shapes". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015, pp. 1912-1920.

[47] Chao Zhang, Stephan Liwicki, William Smith, and Roberto Cipolla. "Orientation-Aware Semantic Segmentation on Icosahedron Spheres". In: Proceedings of the IEEE International Conference on Computer Vision. 2019, pp. 3533-3541.

[48] Yongheng Zhao, Tolga Birdal, Jan Eric Lenssen, Emanuele Menegatti, Leonidas Guibas, and Federico Tombari. "Quaternion Equivariant Capsule Networks for 3D Point Clouds". In: CoRR (2019). arXiv: 1912.12098 [cs.LG].