# cliffordsteerable_convolutional_neural_networks__776df887.pdf Clifford-Steerable Convolutional Neural Networks Maksim Zhdanov 1 David Ruhe* 1 2 3 Maurice Weiler* 1 Ana Lucic 4 Johannes Brandstetter 5 6 Patrick Forr e 1 2 We present Clifford-Steerable Convolutional Neural Networks (CS-CNNs), a novel class of E(p, q)- equivariant CNNs. CS-CNNs process multivector fields on pseudo-Euclidean spaces Rp,q. They cover, for instance, E(3)-equivariance on R3 and Poincar e-equivariance on Minkowski spacetime R1,3. Our approach is based on an implicit parametrization of O(p, q)-steerable kernels via Clifford group equivariant neural networks. We significantly and consistently outperform baseline methods on fluid dynamics as well as relativistic electrodynamics forecasting tasks. 1. Introduction Physical systems are often described by fields on (pseudo)- Euclidean spaces. Their equations of motion obey various symmetries, such as isometries E(3) of Euclidean space R3 or relativistic Poincar e transformations E(1, 3) of Minkowski spacetime R1,3. PDE solvers should respect these symmetries. In the case of deep learning based surrogates, this property is ensured by making the neural networks equivariant (commutative) w.r.t. the transformations of interest. A fairly general class of equivariant CNNs covering arbitrary spaces and field types is described by the theory of steerable CNNs (Weiler et al., 2023). The central result there is that equivariance requires a G-steerability constraint on convolution kernels, where G = O(n) or O(p, q) for E(n)- or E(p, q)-equivariant CNNs, respectively. This constraint was solved and implemented for O(n) (Lang & Weiler, 2021; Cesa et al., 2022), however, O(p, q)-steerable kernels are so far still missing. *Equal contribution 1AMLab, Informatics Institute, University of Amsterdam 2AI4Science Lab, Informatics Institute, University of Amsterdam 3Anton Pannekoek Institute for Astronomy, University of Amsterdam 4AI4Science, Microsoft Research 5ELLIS Unit Linz, Institute for Machine Learning, JKU Linz, Austria 6NXAI Gmb H. Correspondence to: Maksim Zhdanov . Proceedings of the 41 st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s). Figure 1. CS-CNNs process multivector fields while respecting E(p, q)-equivariance. Shown here is a Lorentz-boost O(1, 1) of electromagnetic data on 1+1-dimensional spacetime R1,1. This work proposes Clifford-steerable CNNs (CS-CNNs), which process multivector fields on pseudo-Euclidean spaces Rp,q, and are equivariant to the pseudo-Euclidean group E(p, q): the isometries of Rp,q. Multivectors are elements of the Clifford (or geometric) algebra Cl(Rp,q) of Rp,q. Neural networks based on Clifford algebras have seen a recent surge in popularity in the field of deep learning and were used to build both non-equivariant (Brandstetter et al., 2023; Ruhe et al., 2023b) and equivariant (Ruhe et al., 2023a; Brehmer et al., 2023) models. While multivectors do not cover all possible field types, e.g. general tensor fields, they include those most relevant in physics. For instance, the Maxwell or Dirac equation and General Relativity can be formulated using the spacetime algebra Cl(R1,3). The steerability constraint on convolution kernels is usually either solved analytically or numerically, however, such solutions are not yet known for O(p, q). Observing that the G-steerability constraint is just a G-equivariance constraint, Zhdanov et al. 
(2023) propose to implement G-steerable kernels implicitly via G-equivariant MLPs. Our CS-CNNs follow this approach, implementing implicit O(p, q)-steerable kernels via the O(p, q)-equivariant neural networks for multivectors developed by Ruhe et al. (2023a). We demonstrate the efficacy of our approach by predicting the evolution of several physical systems. In particular, we consider a fluid dynamics forecasting task on R2, as well as relativistic electrodynamics simulations on both R3 Clifford-Steerable Convolutional Neural Networks and R1,2. CS-CNNs are the first models respecting the full spacetime symmetries of these problems. They significantly outperform competitive baselines, including conventional steerable CNNs and non-equivariant Clifford CNNs. This result remains consistent over dataset sizes. When evaluating the empirical equivariance error of our approach for E(2) symmetries, we find that we perform on par with the analytical solutions of Weiler & Cesa (2019). The main contributions of this work are the following: While prior work considered only individual multivectors, CS-CNNs process full multivector fields on pseudo-Euclidean spaces or manifolds. We investigate the representation theory of O(p, q)- steerable kernels for multivector fields and develop an implicit implementation via O(p, q)-equivariant MLPs. The resulting E(p, q)-equivariant CNNs are evaluated on various PDE simulation tasks, where they consistently outperform strong baselines. This paper is organized as follows: Section 2 introduces the theoretical background underlying our method. CS-CNNs are then developed in Section 3, and empirically evaluated in Section 4. A generalization from flat spaces to general pseudo-Riemannian manifolds is presented in Appendix G. 2. Theoretical Background The core contribution of this work is to provide a framework for the construction of steerable CNNs for processing multivector fields on general pseudo-Euclidean spaces. We provide background on pseudo-Euclidean spaces and their symmetries in Section 2.1, on equivariant (steerable) CNNs in Section 2.2, and on multivectors and the Clifford algebra formed by them in Section 2.3. 2.1. Pseudo-Euclidean spaces and groups Conventional Euclidean spaces are metric spaces, i.e. they are equipped with a metric that assigns positive distances to any pair of distinct points. Pseudo-Euclidean spaces allow for more general indefinite metrics, which relax the positivity requirement on distances. Pseudo-Euclidean spaces appear in our theory in two distinct settings: First, the (affine) base spaces on which feature vector fields are supported, e.g. Minkowski spacetime, are pseudo-Euclidean. Second, the feature vectors attached to each point of spacetime are themselves elements of pseudo-Euclidean vector spaces. We introduce these spaces and their symmetries in the following. 2.1.1. PSEUDO-EUCLIDEAN VECTOR SPACES Definition 2.1 (Pseudo-Euclidean vector space). A pseudo Euclidean vector space (inner product space) (V, η) of signature (p, q) is a p+q-dimensional vector space V over R equipped with an inner product η, which we define as a Figure 2. Examples of pseudo-Euclidean spaces R2,0 and R1,1. Colors depict O(p, q)-orbits, given by sets of all points v Rp,q with the same squared distance ηp,q(v, v) from the origin. non-degenerate1 symmetric bilinear form η : V V R, (v1, v2) 7 η(v1, v2) (1) with p and q positive and negative eigenvalues, respectively. 
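To make this concrete, here is a minimal NumPy sketch (helper names are ours, not from the paper's code) of the standard inner product η_{p,q} made precise in Definition 2.2 below, evaluated on a few vectors; its level sets η(v, v) = const are exactly the O(p, q)-orbits shown in Figure 2.

```python
import numpy as np

def eta(p: int, q: int) -> np.ndarray:
    """Matrix of the standard inner product eta_{p,q} = diag(+1,...,+1, -1,...,-1)."""
    return np.diag(np.concatenate([np.ones(p), -np.ones(q)]))

def inner(v1: np.ndarray, v2: np.ndarray, p: int, q: int) -> float:
    """eta_{p,q}(v1, v2) = v1^T diag(1,...,1,-1,...,-1) v2."""
    return float(v1 @ eta(p, q) @ v2)

# Euclidean R^{2,0}: every nonzero vector has positive squared norm.
print(inner(np.array([1.0, 2.0]), np.array([1.0, 2.0]), p=2, q=0))   # 5.0

# Minkowski R^{1,1}: squared norms can be positive, negative, or zero,
# distinguishing the hyperbolic O(1,1)-orbits of Figure 2.
for v in [np.array([2.0, 1.0]), np.array([1.0, 2.0]), np.array([1.0, 1.0])]:
    print(v, inner(v, v, p=1, q=1))   # 3.0 (timelike), -3.0 (spacelike), 0.0 (lightlike)
```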
If q = 0, η becomes positive-definite, and (V, η) is a conventional Euclidean inner product space. For q ≥ 1, η(v, v) can be negative, rendering (V, η) pseudo-Euclidean. Since every inner product space (V, η) of signature (p, q) has an orthonormal basis, we can always find a linear isometry with the standard pseudo-Euclidean space R^{p,q} ≅ (V, η), to which we will mostly restrict our attention in this paper.

Definition 2.2 (Standard pseudo-Euclidean vector spaces). Let e_1, ..., e_{p+q} be the standard basis of R^{p+q}. Define an inner product of signature (p, q),

    η_{p,q}(v_1, v_2) := v_1^T η_{p,q} v_2,    (2)

in this basis via its matrix representation

    η_{p,q} := diag(1, ..., 1, -1, ..., -1),    (3)

with p entries equal to +1 followed by q entries equal to -1. We call the inner product space R^{p,q} := (R^{p+q}, η_{p,q}) the standard pseudo-Euclidean vector space of signature (p, q).

Example 2.3. R^{3,0} ≅ R^3 recovers the 3-dimensional Euclidean vector space with its standard positive-definite inner product η_{3,0} = diag(1, 1, 1). The signature (p, q) = (1, 3) corresponds, instead, to Minkowski spacetime R^{1,3} with Minkowski inner product η_{1,3} = diag(1, -1, -1, -1).^2

2.1.2. PSEUDO-EUCLIDEAN GROUPS

We are interested in neural networks that respect (i.e., commute with, or are equivariant to) the symmetries of pseudo-Euclidean spaces, which we define here. For concreteness, we give these definitions for the standard pseudo-Euclidean vector spaces R^{p,q}. Let us start with the two cornerstone groups that define such symmetries:

Definition 2.4 (Translation groups). The translation group (R^{p,q}, +) associated with R^{p,q} is formed by its set of vectors and its (canonical) vector addition.

^1 Note that we explicitly refrain from imposing positive-definiteness onto the definition of inner product, in order to include typical Minkowski spacetime inner products, etc.
^2 There exist different conventions regarding whether time or space components are assigned the negative sign.

Definition 2.5 (Pseudo-orthogonal groups). The pseudo-orthogonal group O(p, q) associated to R^{p,q} is formed by all invertible linear maps that preserve its inner product,

    O(p, q) := { g ∈ GL(R^{p,q}) | g^T η_{p,q} g = η_{p,q} },    (4)

together with matrix multiplication. O(p, q) is compact for p = 0 or q = 0, and non-compact for mixed signatures.

Example 2.6. For (p, q) = (3, 0), we obtain the usual orthogonal group O(3), i.e. rotations and reflections, while (p, q) = (1, 3) corresponds to the relativistic Lorentz group O(1, 3), which also includes boosts between inertial frames.

Taken together, translations and pseudo-orthogonal transformations of R^{p,q} form its pseudo-Euclidean group, which is the group of all metric preserving symmetries (isometries).^3

Definition 2.7 (Pseudo-Euclidean groups). The pseudo-Euclidean group for R^{p,q} is defined as the semidirect product

    E(p, q) := (R^{p,q}, +) ⋊ O(p, q)    (5)

with group multiplication (t̃, g̃)(t, g) = (t̃ + g̃t, g̃g). Its canonical action on R^{p,q} is given by

    E(p, q) × R^{p,q} → R^{p,q},    ((t, g), x) ↦ gx + t.    (6)

Example 2.8. The usual Euclidean group E(3) is reproduced for (p, q) = (3, 0). For Minkowski spacetime, (p, q) = (1, 3), we obtain the Poincaré group E(1, 3).

2.2. Feature vector fields & Steerable CNNs

Convolutional neural networks operate on spatial signals, formalized as fields of feature vectors on a base space R^{p,q}. Transformations of the base space imply corresponding transformations of the feature vector fields defined on them, see Fig. 1 (left column).
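As a quick illustration of Definitions 2.5 and 2.7, the following hedged sketch (our own code, not the paper's) constructs a Lorentz boost in O(1, 1), verifies that it preserves η_{1,1}, and applies the affine E(1, 1)-action of Eq. (6).

```python
import numpy as np

eta_11 = np.diag([1.0, -1.0])            # Minkowski metric on R^{1,1}

def boost(rapidity: float) -> np.ndarray:
    """A Lorentz boost in O(1,1) (Example 2.6): a hyperbolic 'rotation' mixing time and space."""
    ch, sh = np.cosh(rapidity), np.sinh(rapidity)
    return np.array([[ch, sh], [sh, ch]])

g = boost(0.7)
# Definition 2.5: g preserves the inner product, i.e. g^T eta g = eta.
assert np.allclose(g.T @ eta_11 @ g, eta_11)

def act(t: np.ndarray, g: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Canonical E(1,1)-action of Eq. (6): (t, g) . x = g x + t."""
    return g @ x + t

x = np.array([0.5, 1.5])
t = np.array([1.0, 0.0])
print(act(t, g, x))
```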
The specific transformation laws depend thereby on their geometric field type (e.g., scalar, vector, or tensor fields). Equivariant CNNs commute with such transformations of feature fields. The theory of steerable CNNs shows that this requires a G-equivariance constraint on convolution kernels (Weiler et al., 2023). We briefly review the definitions and basic results of feature fields and steerable CNNs in Sections 2.2.1 and 2.2.2 below. For generality, this section considers topologically closed matrix groups G GL(Rp,q) and affine groups Aff(G) = (Rp,q, +) G, and allows for any field type. Section 3 will more specifically focus on pseudo-orthogonal groups G = O(p, q), pseudo-Euclidean groups Aff(O(p, q)) = E(p, q), and multivector fields. For a detailed review of Euclidean steerable CNNs and their generalization to Riemannian manifolds we refer to Weiler et al. (2023). 3As the translations contained in E(p, q) move the origin of Rp,q, they do not preserve the vector space structure of Rp,q, but only its structure as affine space. 2.2.1. FEATURE VECTOR FIELDS Feature vector fields are functions f : Rp,q W that assign to each point x Rp,q a feature f(x) in some feature vector space W. They are additionally equipped with an Aff(G)- action determined by a G-representation ρ on W. The specific choice of (W, ρ) fixes the geometric type of feature vectors. For instance, W = R and trivial ρ(g) = 1 corresponds to scalars, W = Rp,q and ρ(g) = g describes tangent vectors. Higher order tensor spaces and representations give rise to tensor fields. Later on, W = Cl(Rp,q) will be the Clifford algebra and feature vectors will be multivectors with a natural O(p, q)-representation ρCl. Definition 2.9 (Feature vector field). Consider a pseudo Euclidean base space Rp,q. Fix any G GL(Rp,q) and consider a G-representation (W, ρ), called field type . Let Γ(Rp,q, W) := {f : Rp,q W} denote the vector space of W-feature fields. Define an Aff(G)-action ρ : Aff(G) Γ(Rp,q, W) Γ(Rp,q, W) (7) by setting (t,g) Aff(G), f Γ(Rp,q, W), x Rp,q: (t,g) ρf (x) := ρ(g)f (t,g) 1x = ρ(g)f g 1(x t) . Since Γ(Rp,q, W) is a vector space and ρ is linear, the tuple Γ(Rp,q, W), ρ forms the Aff(G)-representation of feature vector fields of type (W, ρ).4 Remark 2.10. Intuitively, (t,g) acts on f by 1. moving feature vectors across the base space, from points g 1(x t) to new locations x, and 2. G-transforming individual feature vectors f(x) W themselves by means of the G-representation ρ(g). Besides the field types mentioned above, equivariant neural networks often rely on irreducible, regular or quotient representations. More choices of field types are discussed and benchmarked in Weiler & Cesa (2019). 2.2.2. STEERABLE CNNS Steerable convolutional neural networks are composed of layers that are Aff(G)-equivariant, that is, which commute with affine group actions on feature fields: Definition 2.11 (Aff(G)-equivariance). Consider any two G-representations (Win, ρin) and (Wout, ρout). Let L : Γ(Rp,q, Win) Γ(Rp,q, Wout) be a function ( layer ) between the corresponding spaces of feature fields. This layer is said to be Aff(G)-equivariant iff it satisfies L (t,g) ρin f = (t,g) ρout L(f) (8) 4 Γ(Rp,q, W), ρ is called induced representation Ind Aff(G) G ρ (Cohen et al., 2019b). From a differential geometry perspective, it can be viewed as the space of bundle sections of a G-associated feature vector bundle; see Defs. G.6, G.7 and (Weiler et al., 2023). Clifford-Steerable Convolutional Neural Networks for any (t,g) Aff(G) and any f Γ(Rp,q, Win). 
Equivalently, the following diagram should commute: Γ(Rp,q, Win) Γ(Rp,q, Wout) Γ(Rp,q, Win) Γ(Rp,q, Wout) (t,g) ρin (t,g) ρout The most basic operations used in neural networks are parameterized linear layers. If one demands translation equivariance, these layers are necessarily convolutions (see Theorem 3.2.1 in (Weiler et al., 2023)). Similarly, linearity and Aff(G)-equivariance requires steerable convolutions, that is, convolutions with G-steerable kernels: Theorem 2.12 (Steerable convolution). Consider a layer L : Γ(Rp,q, Win) Γ(Rp,q, Wout) mapping between feature fields of types (Win, ρin) and (Wout, ρout), respectively. If L is demanded to be linear and Aff(G)-equivariant, then: 1. L needs to be a convolution integral 5 L fin (u) = K fin (u) := Z Rp,q K(v) fin(u v) dv, parameterized by a convolution kernel K : Rp,q Hom Vec(Win, Wout) . (10) The kernel is operator-valued since it aggregates input features in Win linearly into output features in Wout.67 2. The kernel is required to be G-steerable, that is, it needs to satisfy the G-equivariance constraint8 K(gx) = 1 | det(g)|ρout(g)K(x)ρin(g) 1 (11) =: ρHom(g)(K(x)) for any g G and x Rp,q. This constraint is diagrammatically visualized by the commutativity of: Rp,q Hom Vec(Win, Wout) Rp,q Hom Vec(Win, Wout) Proof. See Theorem 4.3.1 in (Weiler et al., 2023). 5dv is the usual Lebesgue measure on Rp+q. For the integral to exist, we assume f to be bounded and have compact support. 6Hom Vec(Win,Wout), the space of vector space homomorphisms, consists of all linear maps Win Wout. When putting Win = RCin and Wout = RCout, this space can be identified with the space RCout Cin of Cout Cin matrices. 7K : Rp,q Hom Vec(Win, Wout) itself need not be linear. 8This is in particular not demanding K(v) to be (equivariant) homomorphisms of G-representations in Hom G(Win, Wout), despite (Win, ρin) and (Wout, ρout) being G-representations. Only K itself is G-equivariant as map Rp,q Hom Vec(Win,Wout). Remark 2.13 (Discretized kernels). In practice, kernels are often discretized as arrays of shape X1, . . . , Xp+q, Cout, Cin with Cout = dim(Wout) and Cin = dim(Win). The first p+q axes are indexing a pixel grid on the domain Rp,q, while the last two axes represent the linear operators in the codomain by Cout Cin matrices. The main takeaway of this section is that one needs to implement G-steerable kernels in order to implement Aff(G)- equivariant CNNs. This is a notoriously difficult problem, requiring specialized approaches for different categories of groups G and field types (W, ρ). Unfortunately, the usual approaches do not immediately apply to our goal of implementing O(p, q)-steerable kernels for multivector fields. These include the following cases: Analytical: Most commonly, steerable kernels are parameterized in analytically derived steerable kernel bases.9 Solutions are known for SO(3) (Weiler et al., 2018a), O(3) (Geiger et al., 2020) and any G O(2) (Weiler & Cesa, 2019). Lang & Weiler (2021) and Cesa et al. (2022) generalized this to any compact groups G U(d). However, their solutions still require knowledge of irreps, Clebsch Gordan coefficients and harmonic basis functions, which need to be derived and implemented for each single group individually. Furthermore, these solutions do not cover pseudo-orthogonal groups O(p, q) of mixed signature, since these are non-compact. Regular: For regular and quotient representations, steerable kernels can be implemented via channel permutations in the matrix dimensions. 
This is, for instance, done in regular group convolutions (Cohen & Welling, 2016; Weiler et al., 2018b; Bekkers et al., 2018; Cohen et al., 2019a; Finzi et al., 2020). However, these approaches require finite G or rely on sampling compact G, again ruling out general (non-compact) O(p, q). Numerical: Cohen & Welling (2017) solved the kernel constraint for finite G numerically. For SO(2), Haan et al. (2021) derived numerical solutions based on Lie-algebra representation theory. The numerical routine by Shutty & Wierzynski (2022) solves for Lie-algebra irreps given their structure constants. Corresponding Lie group irreps follow via the matrix exponential, however, only on connected groups like the subgroups SO+(p, q) of O(p, q). Implicit: Steerable kernels are merely G-equivariant maps between vector spaces Rp,q and Hom Vec(Win, Wout). Based on this insight, Zhdanov et al. (2023) parameterize them implicitly via G-equivariant MLPs. However, to 9Unconstrained kernels, Eq. (10), can be linearly combined, and therefore form a vector space. The steerability constraint, Eq. (11) is linear. Steerable kernels span hence a linear subspace and can be parameterized in terms of a basis of steerable kernels. Clifford-Steerable Convolutional Neural Networks implement these MLPs, one usually requires irreps, irrep endomorphisms and Clebsch-Gordan coefficients for each G of interest. Our approach presented in Section 3 is based on the implicit kernel parametrization via neural networks by Zhdanov et al. (2023), which requires us to implement O(p, q)-equivariant neural networks. Fortunately, the Clifford group equivariant neural networks by Ruhe et al. (2023a) establish O(p, q)- equivariance for the practically relevant case of Cliffordalgebra representations ρCl, i.e., O(p, q)-actions on multivectors. The Clifford algebra, and Clifford group equivariant neural networks, are introduced in the next section. 2.3. The Clifford Algebra & Clifford Group Equivariant Neural Networks This section introduces multivector features, a specific type of geometric feature vectors with O(p, q)-action. Multivectors are the elements of a Clifford algebra Cl(V, η) corresponding to a pseudo-Euclidean R-vector space (V, η). The most relevant properties of Clifford algebras in relation to applications in geometric deep learning are the following: Cl(V, η) is, in itself, an R-vector space of dimension 2d with d := dim(V ) = p + q. This allows to use multivectors as feature vectors of neural networks (Brandstetter et al., 2023; Ruhe et al., 2023b; Brehmer et al., 2023). As an algebra, Cl(V,η) comes with an R-bilinear operation : Cl(V, η) Cl(V, η) Cl(V, η), called geometric product.10 We can therefore multiply multivectors with each other, which will be a key aspect in various neural network operations. Cl(V, η) is furthermore a representation space of the pseudo-orthogonal group O(V, η) via ρCl, defined in Eq (19) below. This allows to use multivectors as features of O(V, η)-equivariant networks (Ruhe et al., 2023a). A formal definition of Clifford algebras can be found in Appendix E. Section 2.3.1 offers a less technical introduction, highlighting basic constructions and results. Sections 2.3.2 and 2.3.3 focus on the natural O(p, q)-action on multivectors, and on Clifford group equivariant neural networks. While we will later mostly be interested in (V, η) = Rp,q and O(V, η) = O(p, q), we keep the discussion here general. 2.3.1. INTRODUCTION TO THE CLIFFORD ALGEBRA Multivectors are constructed by multiplying and summing vectors. 
Specifically, l vectors v_1, ..., v_l ∈ V multiply to v_1 ⋯ v_l ∈ Cl(V, η). A general multivector arises as a linear combination of such products,

    Σ_{i∈I} c_i · v_{i,1} ⋯ v_{i,l_i},    (13)

with some finite index set I and v_{i,k} ∈ V and c_i ∈ R.^10

The main algebraic property of the Clifford algebra is that it relates the geometric product of vectors v ∈ V to the inner product η on V by requiring:

    v v = η(v, v) 1_{Cl(V,η)}    ∀ v ∈ V ⊆ Cl(V, η)    (14)

Intuitively, this means that the product of a vector with itself collapses to a scalar value η(v, v) ∈ R ⊆ Cl(V, η), from which all other properties of the algebra follow by bilinearity. This leads in particular to the fundamental relation^11

    v_2 v_1 = -v_1 v_2 + 2η(v_1, v_2) 1_{Cl(V,η)}    ∀ v_1, v_2 ∈ V.

For the standard orthonormal basis [e_1, ..., e_{p+q}] of R^{p,q} this reduces to the following simple rules:

    e_i e_j = -e_j e_i                    for i ≠ j        (15a)
    e_i e_j = η(e_i, e_i) = +1    for i = j ≤ p    (15b)
    e_i e_j = η(e_i, e_i) = -1    for i = j > p    (15c)

An (orthonormal) basis of Cl(V, η) is constructed by repeatedly taking geometric products of any basis vectors e_i ∈ V. Note that, up to sign flip, (1) the ordering of elements in any product is irrelevant due to Eq. (15a), and (2) any elements occurring twice cancel out due to Eqs. (15b, 15c). The basis elements constructed this way can be identified with (and labeled by) subsets A ⊆ [d] := {1, ..., d}, where the presence or absence of an index i ∈ A signifies whether the corresponding e_i appears in the product. Agreeing furthermore on an ordering to disambiguate signs, we define e_A := e_{i_1} e_{i_2} ⋯ e_{i_k} for A = {i_1 < ... < i_k} ≠ ∅ and e_∅ := 1_{Cl(V,η)}. From this, it is clear that dim Cl(V, η) = 2^d. Table 1 gives a specific example for (V, η) = R^{1,2}.

Table 1. Orthonormal basis for Cl(R^{p,q}) with (p, q) = (1, 2). Norm refers to η(e_A, e_A) = η_A; see Eq. (18).

| name         | grade k | dim (d choose k) | basis k-vectors | norm   |
|--------------|---------|------------------|-----------------|--------|
| scalar       | 0       | 1                | 1               | +1     |
| vector       | 1       | 3                | e1; e2, e3      | +1; -1 |
| pseudovector | 2       | 3                | e12, e13; e23   | -1; +1 |
| pseudoscalar | 3       | 1                | e123            | +1     |

Any multivector x ∈ Cl(V, η) can be uniquely expanded in this basis,

    x = Σ_{A⊆[d]} x_A e_A,    (16)

where x_A ∈ R are coefficients.

^10 The geometric product is unital, associative, non-commutative, and O(V, η)-equivariant. Its main defining property is highlighted in Eq. (14). A proper definition is given in Definition E.2, Eq. (73).
^11 To see this, use v := v_1 + v_2 in Eq. (14) and expand.

Note that there are (d choose k) basis elements e_A of grade |A| = k, i.e., which are composed from k out of the d distinct e_i ∈ V. These span d+1 linear subspaces Cl^(k)(V, η), the elements of which are called k-vectors. They include scalars (k = 0), vectors (k = 1), bivectors (k = 2), etc. The full Clifford algebra thus decomposes into a direct sum over grades:

    Cl(V, η) = ⊕_{k=0}^{d} Cl^(k)(V, η),    dim Cl^(k)(V, η) = (d choose k)

Given any multivector x, expanded as in Eq. (16), we can define its k-th grade projection on Cl^(k)(V, η) as:

    x^(k) := Σ_{A⊆[d], |A|=k} x_A e_A.    (17)

Finally, the inner product η on V is naturally extended to Cl(V, η) by defining η : Cl(V, η) × Cl(V, η) → R as

    η(x, y) := Σ_{A⊆[d]} η_A x_A y_A,    (18)

where η_A := Π_{i∈A} η(e_i, e_i) ∈ {±1} are sign factors. The tuple (e_A)_{A⊆[d]} is an orthonormal basis of Cl(V, η) w.r.t. η. All of these constructions and statements are more formally defined and proven in the appendix of (Ruhe et al., 2023b).

2.3.2. CLIFFORD GRADES AS O(p,q)-REPRESENTATIONS

The individual grades Cl^(k)(V, η) turn out to be representation spaces of the (abstract) pseudo-orthogonal group

    O(V, η) := { g ∈ GL(V) | ∀ v ∈ V : η(gv, gv) = η(v, v) },    (19)

which coincides for (V, η) = R^{p,q} with O(p, q) from Def. 2.5.
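Before turning to the group action, the basis-blade multiplication rules of Eq. (15) can be made concrete with a small helper (our own illustrative code, not the paper's implementation); it reproduces the signs of Table 1.

```python
def basis_product(A, B, p, q):
    """Geometric product e_A * e_B of two basis blades of Cl(R^{p,q}), labelled by
    sorted index tuples A, B (1-based, as in Table 1). Returns (sign, C) with
    e_A e_B = sign * e_C, using the rules of Eq. (15)."""
    indices = list(A) + list(B)
    sign = 1
    # Bubble-sort the concatenated index list; each adjacent swap of two distinct
    # indices contributes a factor -1 (anticommutation, Eq. 15a).
    changed = True
    while changed:
        changed = False
        for k in range(len(indices) - 1):
            if indices[k] > indices[k + 1]:
                indices[k], indices[k + 1] = indices[k + 1], indices[k]
                sign = -sign
                changed = True
    # Remove repeated indices pairwise; each pair e_i e_i collapses to eta(e_i, e_i).
    C, k = [], 0
    while k < len(indices):
        if k + 1 < len(indices) and indices[k] == indices[k + 1]:
            sign *= 1 if indices[k] <= p else -1   # Eqs. (15b), (15c)
            k += 2
        else:
            C.append(indices[k])
            k += 1
    return sign, tuple(C)

# Cl(R^{1,2}) as in Table 1: e1^2 = +1, e2^2 = e3^2 = -1.
print(basis_product((1,), (1,), p=1, q=2))      # (1, ())      e1 e1 = +1
print(basis_product((2,), (3,), p=1, q=2))      # (1, (2, 3))  e2 e3 = e23
print(basis_product((1, 2), (2, 3), p=1, q=2))  # (-1, (1, 3)) e12 e23 = -e13
```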
O(V, η) acts thereby on multivectors by individually multiplying each 1-vector from which they are constructed with g. Definition/Theorem 2.14 (O(V, η)-action on Cl(V, η)). Let (V, η) be a pseudo-Euclidean space, g, gi O(V, η), ci R, vi,j V , x, xi Cl(V, η), and I a finite index set. Define the orthogonal algebra representation ρCl : O(V, η) OAlg (Cl(V, η), η) 12 (20) of O(V, η) via the canonical O(V, η)-action on each of the contained 1-vectors: i I ci vi1 . . . viji (21) i I ci (gvi1) . . . (gviji). ρCl is well-defined as an orthogonal representation: linear: ρCl(g)(c1 x1 + c2 x2) = c1 ρCl(g)(x1) + c2 ρCl(g)(x2) composing: ρCl(g2) (ρCl(g1)(x)) = ρCl(g2g1)(x) 12OAlg Cl(V, η), η is the group of all linear orthogonal transformations of Cl(V, η) that are also multiplicative w.r.t. . invertible: ρCl(g) 1(x) = ρCl(g 1)(x), orthogonal: η(ρCl(g)(x1), ρCl(g)(x2)) = η(x1, x2) Moreover, the geometric product is O(V, η)-equivariant, making ρCl an (orthogonal) algebra representation: ρCl(g)(x1) ρCl(g)(x2) = ρCl(g)(x1 x2). (22) Cl(V, η) Cl(V, η) Cl(V, η) Cl(V, η) Cl(V, η) Cl(V, η) ρCl(g) ρCl(g) ρCl(g) This representation ρCl reduces furthermore to independent sub-representations on individual k-vectors. Theorem 2.15 (O(V, η)-action on grades Cl(k)(V, η)). Let g O(V, η), x Cl(V, η) and k 0, . . . , d a grade. The grade projection ( )(k) is O(V, η)-equivariant: ρCl(g) x (k) = ρCl(g) x(k) (24) Cl(V, η) Cl(k)(V, η) Cl(V, η) Cl(k)(V, η) ρCl(g) ρCl(g) This implies in particular that Cl(V, η) is reducible to subrepresentations Cl(k)(V, η), i.e. ρCl(g) does not mix grades. Proof. Both theorems are proven in (Ruhe et al., 2023a). 2.3.3. O(p,q)-EQUIVARIANT CLIFFORD NEURAL NETS Based on those properties, Ruhe et al. (2023a) proposed Clifford group equivariant neural networks (CGENNs). Due to a group isomorphism, this is equivalent to the network s O(V, η)-equivariance. Definition/Theorem 2.16 (Clifford Group Equivariant NN). Consider a grade k = 0,..., d and weights wk mn R. A Clifford group equivariant neural network (CGENN) is constructed from the following functions, operating on one or more multivectors xi Cl(V, η). Linear layers: mix k-vectors. For each 1 m cout: L(k) m (x1, . . . , xcin) := Xcin n=1 wk mn x(k) n (26) Such weighted linear mixing within sub-representations Cl(k)(V, η) is common in equivariant MLPs. Geometric product layers: compute weighted geometric products with grade-dependent weights: (27) P (k)(x1, x2) := Xd n=0 wk mn x(m) 1 x(n) 2 (k) Clifford-Steerable Convolutional Neural Networks This is similar to the irrep-feature tensor products in MACE (Batatia et al., 2022). Nonlinearity: As activations, we use A(x) := x Φ x(0) where Φ is the CDF of the Gaussian distribution. This is inspired by Gated GELU from Brehmer et al. (2023). All of these operations are by Theorems 2.14 and 2.15 O(V, η)-equivariant. 3. Clifford-Steerable CNNs This section presents Clifford-Steerable Convolutional Neural Networks (CS-CNNs), which operate on multivector fields on Rp,q, and are equivariant to the isometry group E(p, q) of Rp,q. To achieve E(p, q)-equivariance, we need to find a way to implement O(p, q)-steerable kernels (Section 2.2), which we do by leveraging the connection between Cl(Rp,q) and O(p, q) presented in Section 2.3. CS-CNNs process (multi-channel) multivector fields f : Rp,q Cl(Rp,q)c (28) of type (W, ρ) = (Cl(Rp,q)c, ρc Cl) with c 1 channels. 
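For concreteness, the following sketch implements two of the CGENN operations of Definition/Theorem 2.16, the per-grade linear layer of Eq. (26) and the gated nonlinearity, for multivectors stored as per-grade coefficient arrays. It is our own simplified illustration, not the reference implementation; the geometric product layer of Eq. (27) would additionally require the Cayley table discussed in Appendix A.5.

```python
import numpy as np
from math import comb
from scipy.stats import norm                     # Phi: standard normal CDF

d = 3                                            # e.g. Cl(R^{1,2}), dim 2^3 = 8
grade_dims = [comb(d, k) for k in range(d + 1)]  # (1, 3, 3, 1), cf. Table 1

def linear_layer(x, w):
    """Eq. (26): per-grade linear mixing of channels.
    x[k]: array (c_in, grade_dims[k]); w[k]: array (c_out, c_in)."""
    return [w[k] @ x[k] for k in range(d + 1)]

def gated_activation(x):
    """A(x) = x * Phi(x^(0)): every grade is gated by the CDF of the scalar part."""
    gate = norm.cdf(x[0])                        # shape (c, 1)
    return [x[k] * gate for k in range(d + 1)]

rng = np.random.default_rng(0)
c_in, c_out = 4, 6
x = [rng.normal(size=(c_in, dk)) for dk in grade_dims]
w = [rng.normal(size=(c_out, c_in)) / np.sqrt(c_in) for _ in range(d + 1)]
y = gated_activation(linear_layer(x, w))
print([yk.shape for yk in y])                    # [(6, 1), (6, 3), (6, 3), (6, 1)]
```

Both operations act grade-wise, and the gate depends only on the (invariant) scalar part, so by Theorem 2.15 they commute with ρ_Cl.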
The representation

    ρ_Cl^c := ⊕_{i=1}^{c} ρ_Cl : O(p, q) → GL(Cl(R^{p,q})^c)    (29)

is given by the action ρ_Cl from Definition/Theorem 2.14, however, applied to each of the c components individually. Following Theorem 2.12, our main goal is the construction of a convolution operator

    L : Γ(R^{p,q}, Cl(R^{p,q})^{c_in}) → Γ(R^{p,q}, Cl(R^{p,q})^{c_out}),    L(f_in)(u) := ∫_{R^{p,q}} K(v) f_in(u - v) dv,    (30)

parameterized by a convolution kernel

    K : R^{p,q} → Hom_Vec(Cl(R^{p,q})^{c_in}, Cl(R^{p,q})^{c_out})    (31)

that satisfies the following O(p, q)-steerability (equivariance) constraint for every g ∈ O(p, q) and v ∈ R^{p,q}:^13

    K(gv) = ρ_Cl^{c_out}(g) K(v) ρ_Cl^{c_in}(g^{-1}) =: ρ_Hom(g)(K(v)).    (32)

As mentioned in Section 2.2.2, constructing such O(p, q)-steerable kernels is typically difficult. To overcome this challenge, we follow Zhdanov et al. (2023) and implement the kernels implicitly. Specifically, they are based on O(p, q)-equivariant kernel networks^14

    K : R^{p,q} → Cl(R^{p,q})^{c_out · c_in},    (33)

implemented as CGENNs (Section 2.3.3). Unfortunately, the codomain of K is Cl(R^{p,q})^{c_out · c_in} instead of Hom_Vec(Cl(R^{p,q})^{c_in}, Cl(R^{p,q})^{c_out}), as required by steerable kernels, Eq. (31).

^13 The volume factor |det g| = 1 drops out for g ∈ O(p, q).
^14 The kernel network's output Cl(R^{p,q})^{c_out · c_in} is here reshaped to matrix form Cl(R^{p,q})^{c_out × c_in}.

To bridge the gap between these spaces, we introduce an O(p,q)-equivariant linear layer, called kernel head H. Its purpose is to transform the kernel network's output k := K(v) ∈ Cl(R^{p,q})^{c_out × c_in} into the desired R-linear map between multivector channels H(k) ∈ Hom_Vec(Cl(R^{p,q})^{c_in}, Cl(R^{p,q})^{c_out}). The relation between kernel network K, kernel head H, and the resulting steerable kernel K := H ∘ K is visualized in Fig. 3 (right). To achieve O(p,q)-equivariance (steerability) of K = H ∘ K, we have to make the kernel head H of a specific form:

Definition 3.1 (Kernel head). A kernel head is a map

    H : Cl(R^{p,q})^{c_out × c_in} → Hom_Vec(Cl(R^{p,q})^{c_in}, Cl(R^{p,q})^{c_out}),    k ↦ H(k),    (34)

where the R-linear operator H(k) : Cl(R^{p,q})^{c_in} → Cl(R^{p,q})^{c_out}, f ↦ H(k)[f], is defined on each output channel i ∈ [c_out] and grade component k = 0, ..., d, by:

    H(k)[f]_i^(k) := Σ_{j∈[c_in]} Σ_{m,n=0,...,d} w^k_{mn,ij} ( k_{ij}^(m) f_j^(n) )^(k).    (35)

Here m, n = 0, ..., d label grades and j ∈ [c_in] input channels. The w^k_{mn,ij} ∈ R are parameters that allow for weighted mixing between grades and channels. Our implementation of the kernel head is discussed in Appendix A.5.

Note that the kernel head H can be seen as a linear combination of partially evaluated geometric product layers P^(k)(k_ij, ·) from (27), which mixes input channels to get the output channels. The specific form of the kernel head H comes from the following, most important property:

Proposition 3.2 (Equivariance of the kernel head). The kernel head H is O(p, q)-equivariant w.r.t. ρ_Cl^{c_out · c_in} and ρ_Hom, i.e. for g ∈ O(p, q) and k ∈ Cl(R^{p,q})^{c_out × c_in} we have:

    H(ρ_Cl^{c_out · c_in}(g)(k)) = ρ_Hom(g)(H(k)).    (36)

Proof. The proof relies on the O(p, q)-equivariance of the geometric product and of linear combinations within grades. It can be found in the Appendix in Proof F.1.

With these obstructions out of the way, we can now give the core definition of this paper:

Definition 3.3 (Clifford-steerable kernel). A Clifford-steerable kernel K is a map as in Eq. (31) that factorizes as K = H ∘ K, with a kernel head H from Eq. (35) and a kernel network K given by a Clifford group equivariant neural network (CGENN)^15 from Definition/Theorem 2.16:

    K = [K_ij]_{i∈[c_out], j∈[c_in]} : R^{p,q} → Cl(R^{p,q})^{c_out × c_in}.    (37)

^15 More generally we could employ any O(p, q)-equivariant neural network K w.r.t.
the standard action ρ(g) = g and ρcout cin Cl . Clifford-Steerable Convolutional Neural Networks Rp,q Clcout cin Hom Vec(Clcin, Clcout) Rp,q Clcout cin Hom Vec(Clcin, Clcout) ρcout cin Cl (g) ρHom(g) Figure 3. Left: Multi-vector valued output of the kernel-network K for cin = cout = 1, (p,q) = (1,1), and its expansion to a full O(1,1)- steerable kernel via the kernel head H. Right: Commutative diagram of the construction and O(p,q)-equivariance of implicit steerable kernels K = H K, composed from a kernel network K with cout cin multivector outputs and the kernel head H. The two inner squares show the individual equivariance of K and H, from which the kernels overall equivariance follows. We abbreviate Cl(Rp,q) by Cl. The main theoretical result of this paper is that Cliffordsteerable kernels are always O(p, q)-steerable: Theorem 3.4 (Equivariance of Clifford-steerable kernels). Every Clifford-steerable kernel K = H K is O(p, q)- steerable w.r.t. the standard action ρ(g) = g and ρHom: K(gv) = ρHom(g)(K(v)) g O(p,q), v Rp,q Proof. K and H are O(p, q)-equivariant by Definition/Theorem 2.16 and Proposition 3.2, respectively. The O(p, q)- equivariance of the composition K = H K then follows from Fig. 3 or by direct calculation: K(gv) = H K(gv) (38) = H ρcout cin Cl (g)(K(v)) = ρHom(g) H K(v) = ρHom(g) K(v) . A direct Corollary of Theorem 3.4 and Theorem 2.12 is: Corollary 3.5. Let K =H K be a Clifford-steerable kernel. The corresponding convolution operator L (Eq. (30)) is then E(p, q)-equivariant, i.e. fin Γ Rp,q, Cl(Rp,q)cin : (t, g) L(fin) = L (t, g) fin (t,g) E(p, q) Definition 3.6 (Clifford-steerable CNN). We call a convolutional network (that operates on multivector fields and is) based on Clifford-steerable kernels a Clifford-Steerable Convolutional Neural Network (CS-CNN). Remark 3.7. Brandstetter et al. (2023) use a similar kernel head H as ours, Eq. (35). However, their kernel network K is not O(p,q)-equivariant, making their overall architecture merely translationbut not E(p,q)-equivariant. Remark 3.8. The vast majority of parameters of CS-CNNs reside in their kernel networks K. Further parameters are found in the kernel heads weighted geometric product operation and summation of steerable biases to scalar grades. Remark 3.9. While CS-CNNs are formalized in continuous space, they are in practice typically applied to discretized fields. Our implementation allows for any sampling points, thus covering both pixel grids and point clouds. Appendix G generalizes CS-CNNs from flat spacetimes to general curved pseudo-Riemannian manifolds. Appendix A provides details on our implementation of CS-CNNs, available at https://github.com/maxxxzdn/cliffo rd-group-equivariant-cnns. 4. Experimental Results To assess CS-CNNs, we investigate how well they can learn to simulate dynamical systems by testing their ability to predict future states given a history of recent states (Gupta & Brandstetter, 2022). We consider three tasks: (1) Fluid dynamics on R2 (incompressible Navier-Stokes) (2) Electrodynamics on R3 (Maxwell s Eqs.) (3) Electrodynamics on R1,2 (Maxwell s Eqs., relativistic) Only the last setting is properly incorporating time into 1+2dimensional spacetime, while the former two are treating time steps improperly as feature channels. The improper setting allows us to compare our method with prior work, which was not able to incorporate the full spacetime symmetries E(1, n), but only the spatial subgroup E(n) (which is also covered by CS-CNNs). 
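Before the experiments, note that the steerability guaranteed by Theorem 3.4 (Eq. (32)) can be probed numerically, in the same spirit as the equivariance errors reported below. The following is a hedged sketch on a toy O(2)-steerable kernel with ρ_in = ρ_out = g; the kernel and the exact error definition are illustrative, not the paper's code.

```python
import numpy as np

def rho(g):
    """Standard (tangent-vector) representation: rho(g) = g."""
    return g

def toy_kernel(v):
    """A simple O(2)-steerable kernel R^2 -> Hom(R^2, R^2): K(v) = exp(-|v|^2) v v^T."""
    return np.exp(-v @ v) * np.outer(v, v)

def steerability_error(kernel, g, vs):
    """Relative violation of K(g v) = rho_out(g) K(v) rho_in(g)^{-1}
    (Eq. (32) with |det g| = 1), averaged over sample points vs."""
    errs = []
    for v in vs:
        lhs = kernel(g @ v)
        rhs = rho(g) @ kernel(v) @ np.linalg.inv(rho(g))
        errs.append(np.linalg.norm(lhs - rhs) /
                    (np.linalg.norm(lhs) + np.linalg.norm(rhs) + 1e-12))
    return float(np.mean(errs))

theta = 0.3
g = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
vs = np.random.default_rng(0).normal(size=(100, 2))
print(steerability_error(toy_kernel, g, vs))   # ~1e-16: steerable up to float precision
```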
Data & Tasks: For both tasks (1) and (2), the goal is to predict the next state given the previous 4 time steps. In (1), the inputs are scalar pressure and vector velocity fields. In (2), the inputs are vector electric and bivector magnetic fields. For task (3), the goal is to predict 16 future states given the previous 16 time steps. In this case, the entire electromagnetic field forms a bivector (Orbán & Mira, 2021). More details on the datasets are found in Appendix D.3.

Architectures: We evaluate six network architectures:

| architecture               | matrix group G | isometry group |
|----------------------------|----------------|----------------|
| Conventional ResNet        | {e}            | translations   |
| Clifford ResNet            | {e}            | translations   |
| Fourier Neural Operators   | {e}            | translations   |
| G-Fourier Neural Operators | D4 < O(2)      | E(2)           |
| Steerable ResNet           | O(n)           | E(n)           |
| Clifford-Steerable ResNet  | O(p, q)        | E(p, q)        |

Figure 4. Plots 1 & 2: Mean squared errors (MSEs) on the Navier-Stokes 2D and Maxwell 3D forecasting tasks (one-step loss) as a function of the number of training simulations. Plot 3: Convergence (test loss) of our model vs. a basic ResNet on the relativistic Maxwell task. Plot 4: Relative O(2)-equivariance errors of different models. G-FNOs fail as they cannot correctly ingest multivector data.

Figure 5. Visual comparison of target and predicted fields. Left: Our CS-ResNet clearly produces better results than the basic ResNet on Navier-Stokes, despite only being trained on 64 instead of 5120 simulations. Right: On Maxwell 2D+1, CS-ResNets capture crisp details like wavefronts more accurately.

The basic ResNet model is described in Apx. D. Clifford, Steerable, and our CS-ResNets are variations of it that substitute vanilla convolutions with their Clifford (Brandstetter et al., 2023), O(n)-steerable (Weiler & Cesa, 2019; Cesa et al., 2022), and Clifford-Steerable counterparts, respectively. We also test Fourier Neural Operators (FNO) (Li et al., 2021) and G-FNO (Helwig et al., 2023). The latter add equivariance to the dihedral group D4 < O(2). Assuming scalar or regular representations, they are incapable of digesting multivector-valued data. We address this by replacing the initial lifting and final projection with unconstrained operations that are able to learn a geometrically correct mapping from/to multivectors. All models scale their number of channels to match the parameter count of the basic ResNet.

Results: To evaluate the models, we report mean-squared error losses (MSE) on test sets. As shown in Fig. 4, our CS-ResNets outperform all baselines on all tasks, especially when modeling Maxwell's equations. CS-ResNets are extremely sample-efficient: for the Navier-Stokes experiment, they require only 64 training samples to outperform the basic ResNet and FNOs trained on 80x more data. Plot 1 shows CS-CNNs to be a good alternative to classical O(2)-steerable CNNs in the non-relativistic case. We did not run O(3)-steerable CNNs on Maxwell 3D due to resource constraints, and not on 2D+1 since they are not Lorentz-equivariant. G-FNO does not support either of these symmetries. The Maxwell data on spacetime R^{1,2} is naturally modeled by the spacetime algebra Cl(R^{1,2}) (Hestenes, 2015).
Contrary to tasks (1) and (2), time appears here as a proper grid dimension, not as a feature channel. The light cone structure of CS-CNN kernels (Fig. 3) ensures the models consistency across different inertial frames of reference. This is relevant as the simulated electromagnetic fields are induced by particles moving at relativistic velocities. We see in Plot 3 that CS-CNNs converge significantly faster and are more sample efficient than basic Res Nets. Equivariance error: To assess the models E(2)-equivariance, we measure the relative error |f(g.x) g.f(x)| |f(g.x)+g.f(x)| between (1) the output computed from a transformed input; and (2) the transformed output, given the original input. As shown in Fig. 4 (right), both steerable models are equivariant up to numerical artefacts. Despite training, the other models did not become equivariant at all. This holds in particular for G-FNO, which covers only a subgroup of discrete rotations. 5. Conclusions We presented Clifford-Steerable CNNs, a new theoretical framework for E(p,q)-equivariant convolutions on pseudo Euclidean spaces such as Minkowski-spacetime. CS-CNNs process fields of multivectors geometric features which naturally occur in many areas of physics. The required O(p,q)-steerable convolution kernels are implemented implicitly via Clifford group equivariant neural networks. This makes so far unknown analytic solutions for the steerability constraint unnecessary. CS-CNNs significantly outperform baselines on a variety of physical dynamics tasks. Some limitations of CS-CNNs are discussed in Appendix B. CS-CNNs are, to the best of our knowledge, the first convolutional networks respecting the full symmetries E(p,q) of pseudo-Euclidean spaces. They are readily extended to general pseudo-Riemannian manifolds; see Apx. G and (Weiler et al., 2023). They could furthermore be adapted to steerable partial differential operators (Jenner & Weiler, 2022), connecting them to multivector calculus (Hestenes, 1968; Hitzer, 2002; Lasenby et al., 1993). Clifford-Steerable Convolutional Neural Networks Impact Statement The broader implications of our work are primarily in the improved modeling of PDEs, other physical systems, or multi-vector based applications in computational geometry. Being able to model such systems more accurately can lead to better understanding about the physical systems governing our world, while being able to model such systems more efficiently could greatly improve the ecological footprint of training ML models for modeling physical systems. Acknowledgements This research was supported by Microsoft Research AI4Science. All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers/sponsors. Batatia, I., Kov acs, D. P., Simm, G. N. C., Ortner, C., and Cs anyi, G. Mace: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields. In Conference on Neural Information Processing Systems (Neur IPS), 2022. Bekkers, E. B-spline CNNs on Lie groups. International Conference on Learning Representations (ICLR), 2020. Bekkers, E. J., Lafarge, M. W., Veta, M., Eppenhof, K. A. J., Pluim, J. P. W., and Duits, R. Roto-Translation Covariant Convolutional Networks for Medical Image Analysis. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2018. Brandstetter, J., Berg, R. v. d., Welling, M., and Gupta, J. K. Clifford Neural Layers for PDE Modeling. 
In International Conference on Learning Representations (ICLR), 2023. Brehmer, J., Haan, P. d., Behrends, S., and Cohen, T. S. Geometric Algebra Transformer. In Conference on Neural Information Processing Systems (Neur IPS), 2023. Cesa, G., Lang, L., and Weiler, M. A Program to Build E(N)- Equivariant Steerable CNNs. In International Conference on Learning Representations (ICLR), 2022. Cohen, T. and Welling, M. Group Equivariant Convolutional Networks. In International Conference on Machine Learning (ICML), pp. 2990 2999, 2016. Cohen, T., Weiler, M., Kicanaoglu, B., and Welling, M. Gauge Equivariant Convolutional Networks and the Icosahedral CNN. In International Conference on Machine Learning (ICML), pp. 1321 1330, 2019a. Cohen, T. S. and Welling, M. Steerable CNNs. In International Conference on Learning Representations (ICLR), 2017. Cohen, T. S., Geiger, M., and Weiler, M. A General Theory of Equivariant CNNs on Homogeneous Spaces. In Conference on Neural Information Processing Systems (Neur IPS), 2019b. Filipovich, M. J. and Hughes, S. Pycharge: an open-source python package for self-consistent electrodynamics simulations of lorentz oscillators and moving point charges. Computer Physics Communications, 274:108291, 2022. Finzi, M., Stanton, S., Izmailov, P., and Wilson, A. G. Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data. In International Conference on Machine Learning (ICML), pp. 3165 3176, 2020. Finzi, M., Welling, M., and Wilson, A. G. A Practical Method for Constructing Equivariant Multilayer Perceptrons for Arbitrary Matrix Groups. In International Conference on Machine Learning (ICML), 2021. Geiger, M., Smidt, T., Alby, M., Miller, B. K., Boomsma, W., Dice, B., Lapchevskyi, K., Weiler, M., Tyszkiewicz, M., Batzner, S., et al. Euclidean neural networks: e3nn. Zenodo. https://doi. org/10.5281/zenodo, 2020. Ghosh, R. and Gupta, A. Scale Steerable Filters for Locally Scale-Invariant Convolutional Neural Networks. Ar Xiv, abs/1906.03861, 2019. Gupta, J. K. and Brandstetter, J. Towards Multispatiotemporal-scale Generalized PDE Modeling. Ar Xiv, abs/2209.15616, 2022. Haan, P. d., Weiler, M., Cohen, T., and Welling, M. Gauge Equivariant Mesh CNNs: Anisotropic convolutions on geometric graphs. In International Conference on Learning Representations (ICLR), 2021. Helwig, J., Zhang, X., Fu, C., Kurtin, J., Wojtowytsch, S., and Ji, S. Group Equivariant Fourier Neural Operators for Partial Differential Equations. In International Conference on Machine Learning (ICML), 2023. Hendrycks, D. and Gimpel, K. Gaussian Error Linear Units (GELUs). ar Xiv: Learning, 2016. Hestenes, D. Multivector calculus. J. Math. Anal. Appl, 24 (2):313 325, 1968. Hestenes, D. Space-time algebra. Springer, 2015. Hitzer, E. M. Multivector differential calculus. Advances in Applied Clifford Algebras, 12:135 182, 2002. Clifford-Steerable Convolutional Neural Networks Holl, P., Thuerey, N., and Koltun, V. Learning to Control PDEs with Differentiable Physics. In International Conference on Learning Representations (ICLR), 2020. Jenner, E. and Weiler, M. Steerable Partial Differential Operators for Equivariant Neural Networks. In International Conference on Learning Representations (ICLR), 2022. Kingma, D. P. and Ba, J. Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations (ICLR), volume abs/1412.6980, 2015. Lang, L. and Weiler, M. A Wigner-Eckart Theorem for Group Equivariant Convolution Kernels. 
In International Conference on Learning Representations (ICLR), 2021. Lasenby, A., Doran, C., and Gull, S. A multivector derivative approach to lagrangian field theory. Foundations of Physics, 23(10):1295 1327, 1993. Li, Z., Kovachki, N. B., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A. M., and Anandkumar, A. Fourier Neural Operator for Parametric Partial Differential Equations. In International Conference on Learning Representations (ICLR), 2021. Lindeberg, T. Scale-space. 2009. Loshchilov, I. and Hutter, F. Sgdr: Stochastic Gradient Descent with Warm Restarts. In International Conference on Learning Representations (ICLR), 2017. Marcos, D., Kellenberger, B., Lobry, S., and Tuia, D. Scale equivariance in CNNs with vector fields. ar Xiv preprint ar Xiv:1807.11783, 2018. Orb an, X. P. and Mira, J. Dimensional scaffolding of electromagnetism using geometric algebra. European Journal of Physics, 42(1):015204, 2021. Romero, D. W., Bekkers, E., Tomczak, J. M., and Hoogendoorn, M. Wavelet Networks: Scale-Translation Equivariant Learning From Raw Time-Series. Transactions on Machine Learning Research, 2024. Ruhe, D., Brandstetter, J., and Forr e, P. Clifford Group Equivariant Neural Networks. In Conference on Neural Information Processing Systems (Neur IPS), volume abs/2305.11141, 2023a. Ruhe, D., Gupta, J. K., Keninck, S. D., Welling, M., and Brandstetter, J. Geometric Clifford Algebra Networks. In International Conference on Machine Learning (ICML), pp. 29306 29337, 2023b. Shutty, N. and Wierzynski, C. Computing Representations for Lie Algebraic Networks. Neur IPS 2022 Workshop on Symmetry and Geometry in Neural Representations, 2022. Sosnovik, I., Szmaja, M., and Smeulders, A. W. M. Scale Equivariant Steerable Networks. In International Conference on Learning Representations (ICLR), 2020. Wang, R., Walters, R., and Yu, R. Incorporating Symmetry into Deep Dynamics Models for Improved Generalization. In International Conference on Learning Representations (ICLR), 2021. Wang, S. Extensions to the navier stokes equations. Physics of Fluids, 34(5), 2022. Weiler, M. and Cesa, G. General E(2)-Equivariant Steerable CNNs. In Conference on Neural Information Processing Systems (Neur IPS), pp. 14334 14345, 2019. Weiler, M., Geiger, M., Welling, M., Boomsma, W., and Cohen, T. 3d Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data. In Conference on Neural Information Processing Systems (Neur IPS), pp. 10402 10413, 2018a. Weiler, M., Hamprecht, F. A., and Storath, M. Learning Steerable Filters for Rotation Equivariant CNNs. In Computer Vision and Pattern Recognition (CVPR), 2018b. Weiler, M., Forr e, P., Verlinde, E., and Welling, M. Coordinate Independent Convolutional Networks Isometry and Gauge Equivariant Convolutions on Riemannian Manifolds. ar Xiv preprint ar Xiv:2106.06020, 2021. Weiler, M., Forr e, P., Verlinde, E., and Welling, M. Equivariant and Coordinate Independent Convolutional Networks. 2023. URL https://maurice-weiler.gitlab .io/cnn_book/Equivariant And Coordinat e Independent CNNs.pdf. Worrall, D. E. and Welling, M. Deep Scale-spaces: Equivariance Over Scale. In Conference on Neural Information Processing Systems (Neur IPS), pp. 7364 7376, 2019. Wu, Y. and He, K. Group Normalization. In European Conference on Computer Vision (ECCV), pp. 3 19, 2018. Zhang, X. and Williams, L. R. Similarity equivariant linear transformation of joint orientation-scale space representations. ar Xiv preprint ar Xiv:2203.06786, 2022. Zhdanov, M., Hoffmann, N., and Cesa, G. 
Implicit Convolutional Kernels for Steerable CNNs. In Conference on Neural Information Processing Systems (Neur IPS), 2023. Zhu, W., Qiu, Q., Calderbank, A. R., Sapiro, G., and Cheng, X. Scaling-Translation-Equivariant Networks with Decomposed Convolutional Filters. Journal of Machine Learning Research (JMLR), 23:68:1 68:45, 2022. Clifford-Steerable Convolutional Neural Networks A. Implementation details This appendix provides details on the implementation of CS-CNNs.16 Before detailing the Clifford-steerable kernels and convolutions, we first define the following kernel shell operation, which is used twice in the final kernel computation. Recall that given the base space Rp,q equipped with the inner product ηp,q, we have a Clifford algebra Cl(Rp,q). We want to compute a kernel that maps from cin multivector input channels to cout multivector output channels, i.e., K : Rp,q Hom Vec Cl(Rp,q)cin, Cl(Rp,q)cout . (39) K is defined on any v Rp,q, which allows to model point clouds. In this work, however, we sample it on a grid of shape X1, . . . , Xp+q, analogously to typical CNNs. A.1. Clifford Embedding We briefly discuss how one is able to embed scalars and vectors into the Clifford algebra. This extends to other grades such as bivectors. Let s R and v Rp,q. Using the natural isomorphisms E(0) : R Cl(Rp,q)(0) and E(1) : Rp,q Cl(Rp,q)(1), we embed the scalar and vector components into a multivector as m := E(0)(s) + E(1)(v) Cl(Rp,q) . (40) This is a standard operation in Clifford algebra computations, where we leave the other components of the multivector zero. We denote such embeddings in the algorithms provided below jointly as CL EMBED([s, v]) . A.2. Scalar Orbital Parameterizations Note that the O(p, q)-steerability constraint K(gv) != ρcout Cl (g) K(v) ρcin Cl (g 1) =: ρHom(g)(K(v)) v Rp,q, g O(p, q) couples kernel values within but not across different O(p, q)- orbits O(p, q).v := {gv | g O(p, q)} (41) = {w | η(w, w) = η(v, v)} . The first line here is the usual definition of group orbits, while the second line makes use of the Def. 2.5 of pseudoorthogonal groups as metric-preserving linear maps. 16https://github.com/maxxxzdn/clifford-gro up-equivariant-cnns In the positive-definite case of O(n), this means that the only degree of freedom is the radial distance from the origin, resulting in (hyper)spherical orbits. Examples of such kernels can be seen in Fig. 7. Other radial kernels are obtained typically through e.g. Gaussian shells, Bessel functions, etc. In the nondefinite case of O(p, q), the orbits are hyperboloids, resulting in hyperboloid shells for e.g. the Lorentz group O(1, 3) as in Fig. 3 (left). In this case, we extend the input to the kernel with a scalar component that now relates to the hyperbolic (squared) distance from the origin. Specifically, we define an exponentially decaying ηp,qinduced (parameterized) scalar orbital shell (analogous to the radial shell of typical Steerable CNNs) in the following way. We parameterize a kernel width σ and compute the shell value as sσ(v) = sgn (ηp,q(v, v)) exp |ηp,q(v, v)| The width σ U(0.4, 0.6) is, inspired by (Cesa et al., 2022), initialized with a uniform distribution. Since ηp,q(v, v) can be negative in the nondefinite case, we take the absolute value and multiply the result by the sign of ηp,q(v, v). Computation of the kernel shell (SCALARSHELL) is outlined in Function 1. Intuitively, we obtain exponential decay for points far from the origin. 
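A minimal sketch of this scalar orbital shell is given below. The exact σ-dependence of the exponent is our assumption (a Gaussian-like normalization); the garbled Eq. (42) only fixes the sgn(·) exp(−|·|) structure, and the helper names are ours.

```python
import numpy as np

def eta_quadratic_form(v, p, q):
    """eta_{p,q}(v, v) for an array of points v of shape (..., p+q)."""
    signs = np.concatenate([np.ones(p), -np.ones(q)])
    return np.sum(signs * v**2, axis=-1)

def scalar_shell(v, p, q, sigma):
    """Scalar orbital shell s_sigma(v) = sgn(eta(v,v)) * exp(-|eta(v,v)| / (2 sigma^2)).
    NOTE: the precise sigma-scaling inside the exponential is an assumption on our side."""
    quad = eta_quadratic_form(v, p, q)
    return np.sign(quad) * np.exp(-np.abs(quad) / (2.0 * sigma**2))

# A 7x7 kernel grid on R^{1,1}: the shell is constant on the hyperbolic O(1,1)-orbits
# and flips sign between time-like and space-like points, as described above.
coords = np.stack(np.meshgrid(np.linspace(-1, 1, 7), np.linspace(-1, 1, 7),
                              indexing="ij"), axis=-1)
shell = scalar_shell(coords, p=1, q=1,
                     sigma=np.random.default_rng(0).uniform(0.4, 0.6))
print(shell.shape)   # (7, 7)
```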
However, the sign of the inner product ensures that we clearly disambiguate between light-like and space-like points. I.e., they are close in Euclidean distance but far in the ηp,q-induced distance. Note that this choice of parameterizing scalar parts of the kernel is not unique and can be experimented with. A.3. Kernel Network Recall from Section 3 that the kernel K is parameterized by a kernel network, which is a map K : Rp,q Cl(Rp,q)cout cin (43) implemented as an O(p, q)-equivariant CGENN. It consists of (linearly weighted) geometric product layers followed by multivector activations. Let {vn}N n=1 be a set of sampling points, where N := X1 . . . Xp+q. In the remainder, we leave iteration over n implicit and assume that the operations are performed for each n. We obtain a sequence of scalars using the kernel shell sn := sσ(vn) . (44) The input to the kernel network is a batch of multivectors xn := CL EMBED([sn, vn]) . (45) Clifford-Steerable Convolutional Neural Networks Function 1 SCALARSHELL input ηp,q, v Rp,q, σ. s sgn (ηp,q(v, v)) exp |ηp,q(v,v)| Function 2 CLIFFORDSTEERABLEKERNEL input p, q Λ, cin, cout, (vn)N n=1 Rp,q, CGENN output k R(cout 2d) (cin 2d) X1 Xp+q # Weighted Cayley. for i = 1 . . . cin, o = 1 . . . cout, a, b, c = 1 . . . p + q do wc oiab N(0, 1 cin N ) # Weight init. W c oiab Λc ab wc oiab end for σ U(0.4, 0.6) # Init if needed. # Compute scalars. sn SCALARSHELL(ηp,q, vn, σ) # Embed s and v into a multivector. xn CL EMBED ([sn, vn]) # Evaluate kernel network. kio n := CGENN (xn) # Reshape to kernel matrix. k RESHAPE (k, (N, cout, cin)) # Compute kernel mask. for i = 1 . . . cin, o = 1 . . . cout, k = 0 . . . p + q do σkio U(0.4, 0.6) # Init if needed. sk noi SCALARSHELL(ηp,q, vn, σkio) end for k(k) noi k(k) noi sk noi # Mask kernel. # Kernel head. kc noib P2d a=1 ka noi W c oiab # Partial weighted geometric product. # Reshape to final kernel. k RESHAPE k, cout 2d, cin 2d, X1, . . . , Xp+q Function 3 CLIFFORDSTEERABLECONVOLUTION input Fin, (vn)N n=1, ARGS output Fout Fin RESHAPE(Fin, (B, cin 2d, Y1, . . . , Yp+q)) k CLIFFORDSTEERABLEKERNEL((vn)N n=1 , ARGS) Fout CONV(Fin, k) Fout RESHAPE(Fout, (B, cout, Y1, . . . , Yp+q, 2d)) return Fout I.e., taking s and v together, they form the scalar and vector components of the CEGNN s input multivector. We found including the scalar component crucial for the correct scaling of the kernel to the range of the grid. Let i = 1, . . . , cin and o = 1, . . . , cout be a sequence of input and output channels. We then have the kernel network output knoi := K(vn)oi := CGENN(xn)oi , (46) where knoi Cl(Rp,q) is the output of the kernel network for the input multivector xn (embedded from the scalar sn and vector vn). Once the output stack of multivectors is computed, we reshape it from shape (N, cout cin) to shape (N, cout, cin), resulting in the kernel matrix k RESHAPE (k, (N, cout, cin)) , (47) where now k Cl(Rp,q)N cout cin. Note that kn Cl(Rp,q)cout cin is a matrix of multivectors, as desired. A.4. Masking We compute a second set of scalars which will act as a mask for the kernel. This is inspired by Steerable CNNs to ensure that the (e.g., radial) orbits of compact groups are fully represented in the kernel, as shown in Figure 7. However, note that for O(p, q)-steerable kernels with both p, q = 0 this is never fully possible since O(p, q) is in general not compact, and all orbits except for the origin extend to infinity. This can e.g. be seen in the hyperbolicshaped kernels in Figure 3. 
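Before continuing with the masking, the kernel-network pipeline just described (Appendix A.1-A.3, Eqs. (44)-(47)) can be summarized at the level of array shapes. The multivector component ordering and the stand-in for the CGENN below are our own assumptions, purely for illustration.

```python
import numpy as np

def cl_embed(scalar, vector, d):
    """CL_EMBED of Appendix A.1: place s in the grade-0 slot and v in the grade-1
    slots of a 2^d-dimensional multivector; all other components stay zero.
    (Assumes grade-1 blades occupy components 1..d of the coefficient vector.)"""
    mv = np.zeros(scalar.shape[:-1] + (2**d,))
    mv[..., 0] = scalar[..., 0]
    mv[..., 1:1 + d] = vector
    return mv

# Sampling points of a 2D kernel grid (N = X1 * X2) on R^{1,1}, d = p + q = 2.
d, N = 2, 5 * 5
v = np.stack(np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5), indexing="ij"),
             axis=-1).reshape(N, d)
quad = v[:, :1]**2 - v[:, 1:]**2                       # eta_{1,1}(v, v)
s = np.sign(quad) * np.exp(-np.abs(quad))              # simplified shell, cf. Appendix A.2
x = cl_embed(s, v, d)                                  # (N, 2^d) inputs of Eq. (45)

# Stand-in for the CGENN kernel network of Eq. (46); the real network is equivariant,
# this random linear map is only a placeholder to show the shapes involved.
c_in, c_out = 3, 4
fake_cgenn = np.random.default_rng(0).normal(size=(c_out * c_in * 2**d, 2**d))
k = (x @ fake_cgenn.T).reshape(N, c_out, c_in, 2**d)   # Eq. (47): matrix of multivectors
print(k.shape)   # (25, 4, 3, 4)
```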
For equivariance to hold in practice, whole orbits would need to be present in the kernel, which is not possible if the kernel is sampled on a grid with finite support. This is not specific to our architecture, but is a consequence of the orbits non-compactness. The same issue arises e.g. in scale-equivariant CNNs (Romero et al., 2024; Worrall & Welling, 2019; Ghosh & Gupta, 2019; Sosnovik et al., 2020; Bekkers, 2020; Zhu et al., 2022; Marcos et al., 2018; Zhang & Williams, 2022). Further experimenting is needed to understand the impact of truncating the kernel on the final performance of the model. We invoke the kernel shell function again to compute a mask for each k = 0, . . . , p + q, i = 1, . . . , cin, o = 1, . . . , cout. That is, we have a weight array σkio, initialized identically as earlier, which is reused for each position in the grid. sk nio := sσkio(vn) . (48) We then mask the kernel by scalar multiplication with the shell, i.e., k(k) kio k(k) nio sk nio . (49) Clifford-Steerable Convolutional Neural Networks A.5. Kernel Head Finally, the kernel head turns the multivector matrices into a kernel that can be used by, for example, torch.nn.Conv Nd or jax.lax.conv. This is done by a partial evaluation of a (weighted) geometric product. Let µ, ν Cl(Rp,q) be two multivectors. Recall that dim Cl(Rp,q) = 2p+q = 2d. B µA νB ΛC AB , (50) where A, B, C [d] are multi-indices running over the 2d basis elements of Cl(Rp,q). Here, Λ R2d 2d 2d is the Clifford multiplication table of Cl(Rp,q), also sometimes called a Cayley table. It is defined as ( 0 if A B = C sgn A,B η(e A B, e A B) if A B = C . Here, denotes the symmetric difference of sets, i.e., A B = (A \ B) (B \ A). Further, sgn A,B := ( 1)n A,B, (52) where n A,B is the number of adjacent swaps one needs to fully sort the tuple (i1, . . . , is, j1, . . . , jt), where A = {i1, . . . , is} and B = {j1, . . . , jt}. In the following, we identify the multi-indices A, B, and C with a relabeling a, b, and c that run from 1 to 2d. Altogether, Λ defines a multivector-valued bilinear form which represents the geometric product relative to the chosen multivector basis. We can weight its entries with parameters wc oiab R, initialized as wc oiab N(0, 1 cin N ). These weightings can be redone for each input channel and output channel, as such we have a weighted Cayley table W R2d 2d 2d cin cout with entries W c oiab := Λc abwc oiab . (53) An ablation study in appendix D.4 demonstrates the great relevance of the weighting parameters empirically. Given the kernel matrix k, we compute the kernel by partial (weighted) geometric product evaluation, i.e., kc noib X2d a=1 ka noi W c oiab . (54) Finally, we reshape and permute kc noib from shape (N, cout, cin, 2d, 2d) to its final shape, i.e., k RESHAPE k, cout 2d, cin 2d, X1, . . . , Xp+q . This is the final kernel that can be used in a convolutional layer, and can be interpreted (at each sample coordinate) as an element of Hom Vec Cl(Rp,q)cin, Cl(Rp,q)cout . The pseudocode for the Clifford-steerable kernel (CLIFFORDSTEERABLEKERNEL) is given in Function 2. A.6. Clifford-steerable convolution: As defined in Section 3, Clifford-steerable convolutions can be efficiently implemented with conventional convolutional machinery such as torch.nn.Conv Nd or jax.lax.conv (see Function 3 (CLIFFORDSTEERABLECONVOLUTION) for pseudocode). We now have a kernel k R(cout 2d) (cin 2d) X1 Xp+q that can be used in a convolutional layer. Given batch size B, we now reshape the input stack of multivector fields (B, cin, Y1, . . 
A.6. Clifford-steerable convolution

As defined in Section 3, Clifford-steerable convolutions can be efficiently implemented with conventional convolutional machinery such as torch.nn.ConvNd or jax.lax.conv (see Function 3 (CLIFFORDSTEERABLECONVOLUTION) for pseudocode). We now have a kernel $k \in \mathbb{R}^{(c_{out} \cdot 2^d) \times (c_{in} \cdot 2^d) \times X_1 \times \cdots \times X_{p+q}}$ that can be used in a convolutional layer. Given batch size $B$, we reshape the input stack of multivector fields of shape $(B, c_{in}, Y_1, \ldots, Y_{p+q}, 2^d)$ into $(B, c_{in} \cdot 2^d, Y_1, \ldots, Y_{p+q})$. The output array of shape $(B, c_{out} \cdot 2^d, Y_1, \ldots, Y_{p+q})$ is obtained by convolving the input with the kernel, and is then reshaped to $(B, c_{out}, Y_1, \ldots, Y_{p+q}, 2^d)$, which can again be interpreted as a stack of multivector fields.
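The reshape-and-convolve step can be sketched as follows in PyTorch for $p + q = 2$; the tensor shapes and the use of torch.nn.functional.conv2d are illustrative assumptions rather than the released implementation, and the channel-blade flattening order must match how the kernel was reshaped in Appendix A.5.

```python
# Hypothetical PyTorch sketch of the reshape-and-convolve step for p + q = 2.
import torch
import torch.nn.functional as F

B, c_in, c_out, d = 4, 3, 5, 2           # assumed sizes; 2^d multivector components
Y, X = 32, 7                              # spatial resolution and kernel size

f = torch.randn(B, c_in, Y, Y, 2**d)                      # stack of multivector fields
k = torch.randn(c_out * 2**d, c_in * 2**d, X, X)          # Clifford-steerable kernel (Appendix A.5)

# (B, c_in, Y, Y, 2^d) -> (B, c_in * 2^d, Y, Y)
f_flat = f.permute(0, 1, 4, 2, 3).reshape(B, c_in * 2**d, Y, Y)

out = F.conv2d(f_flat, k, padding=X // 2)                 # (B, c_out * 2^d, Y, Y)

# back to a stack of multivector fields: (B, c_out, Y, Y, 2^d)
out = out.reshape(B, c_out, 2**d, Y, Y).permute(0, 1, 3, 4, 2)
```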
B. Limitations

From the viewpoint of general steerable CNNs, there are some limitations:

- There exist more general field types ($O(p,q)$-representations) than multivectors, for which CS-CNNs do not provide steerable kernels. For connected Lie groups, such as the subgroups $SO^+(p,q)$, these types can in principle be computed numerically (Shutty & Wierzynski, 2022).
- CGENNs and CS-CNNs rely on equivariant operations that treat multivector grades $\operatorname{Cl}^{(k)}(V, \eta)$ as atomic features. However, it is not clear whether grades are always irreducible representations, that is, there might be further equivariant degrees of freedom which would treat irreducible sub-representations independently.
- We observed that the steerable kernel spaces of CS-CNNs are not necessarily complete, that is, certain degrees of freedom might be missing. However, we show in Appendix C how they are recovered by composing multiple convolutions.
- $O(p, q)$ and its group orbits on $\mathbb{R}^{p,q}$ are non-compact for $p, q \neq 0$; for instance, the hyperbolas in spacetimes $\mathbb{R}^{1,q}$ extend to infinity. In practice, we sample convolution kernels on a finite-sized grid as shown in Fig. 3 (left). This introduces a cutoff, breaking equivariance for large transformations. Note that this issue is not specific to CS-CNNs, but applies e.g. to scale-equivariant CNNs as well (Bekkers, 2020; Romero et al., 2024).

Despite these limitations, CS-CNNs excel in our experiments. A major advantage of CGENNs and CS-CNNs is that they allow for a simple, unified implementation for arbitrary signatures $(p,q)$. This is remarkable, since steerable kernels usually need to be derived for each symmetry group individually. Furthermore, our implementation applies both to multivector fields sampled on pixel grids and to point clouds.

C. Completeness of kernel spaces

In order to not over-constrain the model, it is essential to parameterize a complete basis of $O(p,q)$-steerable kernels. Comparing our implicit $O(2,0) = O(2)$-steerable kernels with the analytical solution by Weiler & Cesa (2019), we find that certain degrees of freedom are missing; see Fig. 7. However, while these degrees of freedom are missing in a single convolution operation, they can be fully recovered by applying two consecutive convolutions. This suggests that the overall expressiveness of CS-CNNs is (at least for $O(2)$) not diminished. Moreover, two convolutions with kernels $\hat{K}$ and $K$ can always be expressed as a single convolution with a composed kernel $\hat{K} \star K$. This composed kernel recovers the full degrees of freedom reported in Weiler & Cesa (2019); see also Fig. 6.

The following two sections discuss the initial differences in kernel parametrizations and how they are resolved by adding a second linear or convolution operation. Unless stated otherwise, we focus here on $c_{in} = c_{out} = 1$ channels to reduce clutter.

C.1. Coupled radial dependencies in CS-CNN kernels

The first issue is that the CS-CNN parametrization implies a coupling of radial degrees of freedom. To make this precise, note that the $O(2)$-steerability constraint
$$K(gv) \overset{!}{=} \rho^{c_{out}}_{\operatorname{Cl}}(g)\, K(v)\, \rho^{c_{in}}_{\operatorname{Cl}}(g^{-1}) \qquad \forall\, v \in \mathbb{R}^2,\ g \in O(2)$$
decouples into independent constraints on individual $O(2)$-orbits on $\mathbb{R}^2$, which are rings at different radii (and the origin); visualized in Fig. 2 (left). Weiler et al. (2018a); Weiler & Cesa (2019) therefore parameterize the kernel in (hyper)spherical coordinates. In our case these are polar coordinates of $\mathbb{R}^2$, i.e. a radius $r \in \mathbb{R}_{\geq 0}$ and angle $\varphi \in S^1$:
$$K(r, \varphi) := R(r)\,\kappa(\varphi) \quad (55)$$
The $O(2)$-steerability constraint affects only the angular part and leaves the radial part entirely free, such that it can be parameterized in an arbitrary basis or via an MLP.

e2cnn: Weiler & Cesa (2019) solved analytically for complete bases of the angular parts. Specifically, they derive solutions
$$K^k_n(r, \varphi) = R^k_n(r)\,\kappa^k_n(\varphi) \quad (56)$$
for any pair of input and output field types (irreps or grades) $n$ and $k$, respectively. This complete basis of $O(2)$-steerable kernels is shown in the bottom table of Fig. 7.

CS-CNNs: CS-CNNs parameterize the kernel in terms of a kernel network $\mathcal{K} : \mathbb{R}^{p,q} \to \operatorname{Cl}(\mathbb{R}^{p,q})^{c_{out} \times c_{in}}$, visualized in Fig. 7 (top). Expressed in polar coordinates, assuming $c_{in} = c_{out} = 1$, and considering the independence of $\mathcal{K}$ on different orbits due to its $O(2)$-equivariance, we get the factorization
$$\mathcal{K}(r, \varphi)^{(m)} = R_m(r)\,\kappa_m(\varphi), \quad (57)$$
where $m$ is the grade of the multivector-valued output. As described in Appendix A.5 (Eq. (53)), the kernel head operation $H$ expands this output by multiplying it with weights $W^k_{mn} = \Lambda^k_{mn} w^k_{mn}$, where $w^k_{mn} \in \mathbb{R}$ are parameters and $\Lambda^k_{mn} \in \{-1, 0, 1\}$ represents the geometric product relative to the standard basis of $\mathbb{R}^{p,q}$. Note that we do not consider multiple input or output channels here. The final expanded kernel for CS-CNNs is hence given by
$$K^k_n(r, \varphi) = \sum_m W^k_{mn}\, \mathcal{K}(r, \varphi)^{(m)} = \sum_m \Lambda^k_{mn}\, w^k_{mn}\, R_m(r)\,\kappa_m(\varphi). \quad (58)$$
These solutions are listed in the top table in Fig. 7, and visualized in the graphics above.[17]

Comparison: Note that the complete solutions by Weiler & Cesa (2019) allow for a different radial part $R^k_n$ for each pair of input and output type (grade/irrep). In contrast, the CS-CNN parametrization expands coupled radial parts $R_m$, additionally multiplying them with weights $w^k_{mn}$ (highlighted in the table in blue and green). The CS-CNN parametrization is therefore clearly less general (incomplete).

Solutions: One idea to resolve this shortcoming is to make the weighted geometric product parameters themselves radially dependent,
$$w^k_{mn} : \mathbb{R}_{\geq 0} \to \mathbb{R}, \quad r \mapsto w^k_{mn}(r), \quad (59)$$
for instance by parameterizing the weights with a neural network. This would fully resolve the under-parametrization and would preserve equivariance, since $O(2)$-steerability depends only on the angular variable. However, doing this is actually not necessary, since the missing flexibility of the radial parts can always be resolved by running a convolution followed by a linear layer (or a second convolution) when $c_{out} > 1$. The reason for this is that different channels $i = 1, \ldots, c_{out}$ of a kernel network $\mathcal{K} : \mathbb{R}^{p,q} \to \operatorname{Cl}(\mathbb{R}^{p,q})^{c_{out} \times c_{in}}$ do have independent radial parts. The convolution responses in the different channels can then be mixed by a subsequent linear layer with grade-dependent weights. By linearity, this is equivalent to immediately mixing the channels' radial parts with grade-dependent weights, resulting in effectively decoupled radial parts; a sketch of this argument in formulas is given below.

[17] The parameter $\Lambda^k_{mn}$ appears in the table as selecting to which entry $(k, n)$ of the table the grade $\mathcal{K}(r, \varphi)^{(m)}$ is added (optionally with a minus sign).

[Figure 7 consists of two tables, a CS-CNN parametrization table (top) and the complete e2cnn parametrization of Weiler & Cesa (2019) (bottom), listing the $O(2)$-steerable kernels between scalar, vector, and pseudoscalar fields; the flattened table entries are not reproduced here.] Figure 7. Comparison of the parametrization of $O(2)$-steerable kernels in CS-CNNs (top and middle) and e2cnn (bottom). While the e2cnn solutions are proven to be complete, CS-CNN seems to miss certain degrees of freedom: (1) Their radial parts are coupled in the components highlighted in blue and green, while escnn allows for independent radial parts. By coupled we mean that they are merely scaled relative to each other with weights $w^k_{mn}$ from the weighted geometric product operation in the kernel head $H$, where $m$ labels the grade $\mathcal{K}^{(m)}$ of the kernel network output while $n, k$ label input and output grades of the expanded kernel in $\operatorname{Hom}_{\operatorname{Vec}}\big(\operatorname{Cl}(\mathbb{R}^{p,q}), \operatorname{Cl}(\mathbb{R}^{p,q})\big)$; (2) CS-CNN is missing kernels of angular frequency 2 that are admissible for mapping between vector fields; highlighted in red. As explained in Appendix C, these missing degrees of freedom are recovered when composing two convolution layers. A kernel corresponding to the composition of two convolutions into a single one is visualized in Fig. 6.
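The decoupling argument can be sketched in formulas as follows (this display is an added illustration, not part of the original text). Writing $K^k_{n,i}$, $w^k_{mn,i}$ and $R_{m,i}$ for the expanded kernel, weights and radial parts of the $i$-th output channel, and assuming the subsequent linear layer mixes channels with grade-dependent weights $a^k_i$,
$$\sum_{i=1}^{c_{out}} a^k_i\, K^k_{n,i}(r, \varphi)
\;=\; \sum_m \Lambda^k_{mn}\,\kappa_m(\varphi)\, \underbrace{\sum_{i=1}^{c_{out}} a^k_i\, w^k_{mn,i}\, R_{m,i}(r)}_{=:\ \tilde{R}^k_{mn}(r)},$$
so with sufficiently many channels the effective radial profiles $\tilde{R}^k_{mn}$ can be chosen (approximately) independently for each pair $(k, n)$, recovering the flexibility of Eq. (56).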
C.2. Circular harmonics order 2 kernels

A second issue is that the CS-CNN parametrization is missing a basis kernel of angular frequency 2 that maps between vector fields; highlighted in red in the bottom table of Fig. 7. However, it turns out that this degree of freedom is reproduced as the difference of two consecutive convolutions, one mapping vectors to pseudoscalars and back to vectors, the other one mapping vectors to scalars and back to vectors, as suggested in the (non-commutative!) computation flow
$$\big(\text{vector} \to \text{pseudoscalar} \to \text{vector}\big) \;\;-\;\; \big(\text{vector} \to \text{scalar} \to \text{vector}\big).$$

As background on the angular frequency 2 kernel, note that $O(2)$-steerable kernels between irreducible field types of angular frequencies $j$ and $l$ contain angular frequencies $|j - l|$ and $j + l$; this is a consequence of the Clebsch-Gordan decomposition of $O(2)$-irrep tensor products (Lang & Weiler, 2021). We identify multivector grades $\operatorname{Cl}(\mathbb{R}^{2,0})^{(k)}$ with the following $O(2)$-irreps:[18][19]

- scalars $\operatorname{Cl}(\mathbb{R}^{2,0})^{(0)}$: trivial irrep ($j = 0$)
- vectors $\operatorname{Cl}(\mathbb{R}^{2,0})^{(1)}$: defining irrep ($j = 1$)
- pseudoscalars $\operatorname{Cl}(\mathbb{R}^{2,0})^{(2)}$: sign-flip irrep ($j = 0$)

Kernels that map vector fields ($j = 1$) to vector fields ($l = 1$) should hence contain angular frequencies $|j - l| = 0$ and $j + l = 2$. The latter is missing since $O(2)$-irreps of order 2 are not represented by any grade of $\operatorname{Cl}(\mathbb{R}^{2,0})$. To solve this issue, it seems like one would have to replace the CGENNs underlying the kernel network $\mathcal{K}$ with a more general $O(2)$-equivariant MLP, e.g. (Finzi et al., 2021). However, it can as well be implemented as a succession of two convolution operations.

[18] As mentioned earlier, multivector grades may in general not be irreducible; however, for $(p, q) = (2, 0)$ they are.
[19] There are two different $O(2)$-irreps corresponding to $j = 0$ (trivial and sign-flip); see Weiler et al. (2023), Section 5.3.4.
To make this claim plausible, observe first that convolutions are associative, that is, two consecutive convolutions with kernels $K$ and $\hat{K}$ are equivalent to a single convolution with kernel $\hat{K} \star K$:
$$\hat{K} \star (K \star f) = (\hat{K} \star K) \star f \quad (60)$$
Secondly, convolutions are linear, such that
$$\alpha\,(\hat{K} \star f) + \beta\,(K \star f) = (\alpha \hat{K} + \beta K) \star f \quad (61)$$
for any $\alpha, \beta \in \mathbb{R}$. Using associativity, we can express two consecutive convolutions, first going from vector to scalar fields via
$$K_{sv}(r, \varphi) = R_{sv}(r) \begin{pmatrix} \sin\varphi & \cos\varphi \end{pmatrix}, \quad (62)$$
then going back from scalars to vectors via
$$K_{vs}(r, \varphi) = R_{vs}(r) \begin{pmatrix} \sin\varphi \\ \cos\varphi \end{pmatrix}, \quad (63)$$
as a single convolution between vector fields, where the combined kernel is given by:
$$\Sigma_{vv} := K_{vs} \star K_{sv} \quad (64)$$
We can similarly define a convolution going from vector to pseudoscalar fields via
$$K_{pv}(r, \varphi) = R_{pv}(r) \begin{pmatrix} \cos\varphi & \sin\varphi \end{pmatrix}, \quad (65)$$
and back to vector fields via
$$K_{vp}(r, \varphi) = R_{vp}(r) \begin{pmatrix} \cos\varphi \\ \sin\varphi \end{pmatrix}, \quad (66)$$
as a single convolution with combined kernel:
$$\Pi_{vv} := K_{vp} \star K_{pv} \quad (67)$$
By linearity, we can define yet another convolution between vector fields by taking the difference of these kernels, which results in the combined kernel
$$\Pi_{vv} - \Sigma_{vv}. \quad (68)$$
Such kernels parameterize exactly the missing $O(2)$-steerable kernels of angular frequency 2, highlighted in red in the bottom table in Fig. 7. This shows that the missing kernels can be recovered by two convolutions, if required. The visual proof by convolving kernels is clearly only suggestive. To make it precise, one would have to compute the convolutions of the two kernels analytically. This is easily done by identifying circular harmonics with derivatives of Gaussian kernels, a relation that is well known in classical computer vision (Lindeberg, 2009).
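As an added sanity check of this composition argument (not part of the original appendix), the following NumPy/SciPy sketch discretizes frequency-1 kernels with an assumed Gaussian radial profile and assumed sign conventions, composes them, and checks that the difference $\Pi_{vv} - \Sigma_{vv}$ behaves like a pure frequency-2 kernel: rotating it by 90° flips its sign, whereas the individual compositions retain a frequency-0 part.

```python
# Hypothetical numerical check that Pi_vv - Sigma_vv carries angular frequency 2.
import numpy as np
from scipy.signal import convolve2d

n = 65
xs = np.linspace(-4.0, 4.0, n)
X, Y = np.meshgrid(xs, xs)
r, phi = np.hypot(X, Y), np.arctan2(Y, X)
R = np.exp(-r**2)                                   # assumed Gaussian radial profile
sin1, cos1 = R * np.sin(phi), R * np.cos(phi)       # frequency-1 angular parts


def compose(col, row):
    """Compose a (2x1) kernel `col` after a (1x2) kernel `row` into a 2x2 matrix kernel."""
    return [[convolve2d(col[a], row[b], mode="same") for b in range(2)] for a in range(2)]


Sigma = compose([sin1, cos1], [sin1, cos1])         # vector -> scalar -> vector, Eqs. (62)-(64)
Pi = compose([cos1, sin1], [cos1, sin1])            # vector -> pseudoscalar -> vector, Eqs. (65)-(67)
D = [[Pi[a][b] - Sigma[a][b] for b in range(2)] for a in range(2)]


def freq2_like(F):
    """Pure frequency-2 functions flip their sign under rotation by 90 degrees."""
    return np.allclose(np.rot90(F), -F)


print(freq2_like(D[0][0]), freq2_like(D[1][1]))      # expected: True True
print(freq2_like(Sigma[0][0]), freq2_like(Pi[0][0]))  # expected: False False (frequency-0 part remains)
```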
D. Experimental details

D.1. Model details

For ResNets, we follow the setup of Wang et al. (2021); Brandstetter et al. (2023); Gupta & Brandstetter (2022): the ResNet baselines consist of 8 residual blocks, each comprising two convolution layers with 7×7 (or 7×7×7 for 3D) kernels, shortcut connections, group normalization (Wu & He, 2018), and GELU activation functions (Hendrycks & Gimpel, 2016). We use two embedding and two output layers, i.e., the overall architectures could be classified as ResNet-20 networks. Following Gupta & Brandstetter (2022); Brandstetter et al. (2023), we abstain from employing downprojection techniques and instead maintain a consistent spatial resolution throughout the networks. The best models have approx. 7M parameters for Navier-Stokes and 1.5M parameters for Maxwell's equations, in both 2D and 3D.

D.2. Optimization

For each experiment and each model, we tuned the learning rate to find the optimal value. Each model was trained until convergence. For optimization, we used the Adam optimizer (Kingma & Ba, 2015) with a cosine learning rate scheduler (Loshchilov & Hutter, 2017), without additional decay, reducing the initial learning rate by a factor of 0.01. Training was done on a single node with 4 NVIDIA GeForce RTX 2080 Ti GPUs.

D.3. Datasets

Navier-Stokes: We use the Navier-Stokes data from Gupta & Brandstetter (2022), which is based on ΦFlow (Holl et al., 2020). It is simulated on a grid with a spatial resolution of 128×128 pixels of size Δx = Δy = 0.25 m and a temporal resolution of Δt = 1.5 s. For validation and testing, we randomly selected 1024 trajectories from the corresponding partitions.

Maxwell 3D: Simulations of the 3D Maxwell equations are taken from Brandstetter et al. (2023). This data is discretized on a grid with a spatial resolution of 32×32×32 voxels with Δx = Δy = Δz = 5·10⁻⁷ m and was reported to have a temporal resolution of Δt = 50 s. In the non-relativistically modeled setting $\operatorname{Cl}(\mathbb{R}^{3,0})$, E is treated as a vector field and B as a bivector field. Validation and test sets comprise 128 simulations each.

Maxwell 2D: We simulate data for Maxwell's equations on spacetime $\mathbb{R}^{2,1}$ using PyCharge (Filipovich & Hughes, 2022). Electromagnetic fields are emitted by point sources that move, orbit, and oscillate at relativistic speeds. The spacetime grid has a resolution of 128 points in both the spatial and the temporal dimensions. Its spatial extent is 50 nm and its temporal extent is 3.77·10⁻¹⁴ s. Sampled simulations contain between 2 and 4 oscillating charges and 1 to 2 orbiting charges. The sources have charges sampled uniformly as integer values between −3e and 3e. Their positions are sampled uniformly on the grid, with a predefined minimum initial distance between them. Each charge has a random linear velocity and either oscillates in a random direction or orbits with a random radius. Oscillation and rotation frequencies, as well as velocities, are sampled such that the overall particle velocity does not exceed 0.85c, which is necessary since the PyCharge simulation becomes unstable beyond this limit. As the field strengths span many orders of magnitude, we normalize the generated fields by dividing bivectors by their Minkowski norm and multiplying them by the logarithm of this norm. This step is non-trivial since Minkowski norms can be zero or negative; however, we found that they are always positive in the generated data. We filter out numerical artifacts by removing outliers with a standard deviation greater than 20. The final dataset comprises 2048 training, 256 validation, and 256 test simulations.
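The normalization step described above can be sketched as follows; the bivector component layout, the signature of the bivector quadratic form, and the clipping threshold eps are assumptions made for illustration.

```python
# Hypothetical sketch of the field normalization for the Maxwell 2D data.
import numpy as np


def normalize_bivectors(F, signature=(1.0, 1.0, -1.0), eps=1e-12):
    """F: (..., 3) bivector components of the electromagnetic field on R^{2,1}.
    Divides each bivector by its Minkowski norm and rescales by log(norm)."""
    eta = np.asarray(signature)                          # assumed squared norms of the basis bivectors
    sq = np.einsum("...i,i,...i->...", F, eta, F)        # squared Minkowski norm (positive in our data)
    norm = np.sqrt(np.clip(sq, eps, None))
    return F / norm[..., None] * np.log(norm)[..., None]
```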
Dataset symmetries: The classical Navier-Stokes equations are Galilean invariant (Wang, 2022). Our CS-CNN for $\operatorname{Cl}(\mathbb{R}^{2})$ is E(2)-equivariant, capturing the subgroup of isometries without boosts. Maxwell's equations are Poincaré invariant. Similar to the case of Navier-Stokes, our model for $\operatorname{Cl}(\mathbb{R}^{3})$ is E(3)-equivariant. The relativistic spacetime model for $\operatorname{Cl}(\mathbb{R}^{1,2})$ is fully equivariant w.r.t. the Poincaré group E(1, 2). The invariance of a system's equations of motion implies an equivariant system dynamics. This statement assumes that the system is transformed as a whole, i.e. together with boundary conditions or background fields. It obviously does not hold when fixed symmetry-breaking boundary conditions or background fields are given. However, implicit kernels may in this case be informed about the symmetry-breaking geometric structure by providing it in the form of additional inputs to the kernel network, as described in (Zhdanov et al., 2023).

Figure 8. Performance of CS-CNNs with freely learned weights in the kernel head versus an ablation with fixed weights $w^k_{mn,ij} = 1$.

D.4. Kernel head weight ablation

As discussed in Def. 3.1 and Appendix A.5, the kernel head is essentially a partially evaluated geometric product operation with additional weighting parameters that are learned during training. To check how relevant this weighting is in practice, we ran an ablation study that fixed all kernel head weights to $w^k_{mn,ij} = 1$. It turns out that the weighting is quite relevant: our fully weighted CS-CNN achieved a test MSE of $2.53 \cdot 10^{-3}$ on the Navier-Stokes forecasting task, while the MSE for the fixed-weight CS-CNN increased to $4.30 \cdot 10^{-3}$; see Fig. 8. This drastic loss in performance is explained by the fact that these weights allow scaling different kernel channels relative to each other as visualized in Fig. 7, which is essential to parameterize the complete space of steerable kernels.

E. The Clifford Algebra

To complement Section 2.3, we give in this section a short, formal definition of the Clifford algebra. For this, we first need to introduce the tensor algebra of a vector space.

Definition E.1 (The tensor algebra). Let $V$ be a finite-dimensional $\mathbb{R}$-vector space of dimension $d$. Then the tensor algebra of $V$ is defined as follows:
$$\operatorname{Tens}(V) := \bigoplus_{m=0}^{\infty} V^{\otimes m} = \operatorname{span}\left\{ v_1 \otimes \cdots \otimes v_m \;\middle|\; m \geq 0,\ v_i \in V \right\}, \quad (69)$$
where we used the following abbreviations for the $m$-fold tensor product of $V$ for $m \geq 0$:
$$V^{\otimes m} := \underbrace{V \otimes \cdots \otimes V}_{m\text{-times}}, \qquad V^{\otimes 0} := \mathbb{R}. \quad (70)$$
Note that the above definition turns $(\operatorname{Tens}(V), \otimes)$ into a (non-commutative, infinite-dimensional, unital, associative) algebra over $\mathbb{R}$. In fact, the tensor algebra $(\operatorname{Tens}(V), \otimes)$ is, in some sense, the biggest algebra generated by $V$.

We now have the tools to give a proper definition of the Clifford algebra:

Definition E.2 (The Clifford algebra). Let $(V, \eta)$ be a finite-dimensional inner product space over $\mathbb{R}$ of dimension $d$. The Clifford algebra of $(V, \eta)$ is then defined as the following quotient algebra:
$$\operatorname{Cl}(V, \eta) := \operatorname{Tens}(V)/\mathcal{I}(\eta), \quad (71)$$
$$\mathcal{I}(\eta) := \big\langle v \otimes v - \eta(v, v)\, 1_{\operatorname{Tens}(V)} \;\big|\; v \in V \big\rangle \quad (72)$$
$$\phantom{\mathcal{I}(\eta)} := \operatorname{span}\left\{ x \otimes \big( v \otimes v - \eta(v, v)\, 1_{\operatorname{Tens}(V)} \big) \otimes y \;\middle|\; v \in V,\ x, y \in \operatorname{Tens}(V) \right\},$$
where $\mathcal{I}(\eta)$ denotes the two-sided ideal of $\operatorname{Tens}(V)$ generated by the relations $v \otimes v - \eta(v, v)\, 1_{\operatorname{Tens}(V)}$ for all $v \in V$. The product on $\operatorname{Cl}(V, \eta)$ that is induced by the tensor product is called the geometric product and will be denoted as follows:
$$x_1 x_2 := [z_1 \otimes z_2], \quad (73)$$
with equivalence classes $x_i = [z_i] \in \operatorname{Cl}(V, \eta)$, $i = 1, 2$. Note that, since $\mathcal{I}(\eta)$ is a two-sided ideal, the geometric product is well-defined. The above construction turns $(\operatorname{Cl}(V, \eta), \cdot)$ into a (non-commutative, unital, associative) algebra over $\mathbb{R}$. In some sense, $(\operatorname{Cl}(V, \eta), \cdot)$ is the biggest (non-commutative, unital, associative) algebra $(A, \cdot)$ over $\mathbb{R}$ that is generated by $V$ and satisfies the relations $v \cdot v = \eta(v, v)\, 1_A$ for all $v \in V$. It turns out that $(\operatorname{Cl}(V, \eta), \cdot)$ has finite dimension $2^d$ and carries a parity grading of algebras and a multivector grading of vector spaces, see Ruhe et al. (2023b), Appendix D. More properties are also explained in Section 2.3.

From an abstract, theoretical point of view, the most important property of the Clifford algebra is its universal property, which fully characterizes it:

Theorem E.3 (The universal property of the Clifford algebra). Let $(V, \eta)$ be a finite-dimensional inner product space over $\mathbb{R}$ of dimension $d$. For every (non-commutative, unital, associative) algebra $(A, \cdot)$ over $\mathbb{R}$ and every $\mathbb{R}$-linear map $f : V \to A$ such that for all $v \in V$ we have
$$f(v) \cdot f(v) = \eta(v, v)\, 1_A, \quad (74)$$
there exists a unique algebra homomorphism (over $\mathbb{R}$)
$$\bar{f} : (\operatorname{Cl}(V, \eta), \cdot) \to (A, \cdot), \quad (75)$$
such that $\bar{f}([v]) = f(v)$ for all $v \in V$.

Proof. The map $f : V \to A$ uniquely extends to an algebra homomorphism on the tensor algebra:
$$f^{\otimes} : \operatorname{Tens}(V) \to A, \quad (76)$$
$$\sum_{i \in I} c_i\, v_{i,1} \otimes \cdots \otimes v_{i,l_i} \;\mapsto\; \sum_{i \in I} c_i\, f(v_{i,1}) \cdots f(v_{i,l_i}). \quad (77)$$
Because of Equation (74) we have for every $v \in V$:
$$f^{\otimes}\big( v \otimes v - \eta(v, v)\, 1_{\operatorname{Tens}(V)} \big) = f(v) \cdot f(v) - \eta(v, v)\, 1_A \quad (78)$$
$$= 0, \quad (79)$$
and hence
$$f^{\otimes}(\mathcal{I}(\eta)) = 0. \quad (80)$$
This shows that $f^{\otimes}$ then factors through the thus well-defined induced quotient map of algebras:
$$\bar{f} : \operatorname{Cl}(V, \eta) = \operatorname{Tens}(V)/\mathcal{I}(\eta) \to A, \quad (81)$$
$$\bar{f}([z]) := f^{\otimes}(z). \quad (82)$$
This shows the claim.

Remark E.4 (The universal property of the Clifford algebra). The universal property of the Clifford algebra can be stated more explicitly as follows: If $f$ satisfies Equation (74) and $x \in \operatorname{Cl}(V, \eta)$, then we can take any representation of $x$ of the following form:
$$x = \sum_{i \in I} c_i\, v_{i,1} \cdots v_{i,l_i}, \quad (83)$$
with any finite index set $I$, any $l_i \in \mathbb{N}$, any coefficients $c_i \in \mathbb{R}$, and any vectors $v_{i,j} \in V$, $j = 1, \ldots, l_i$, $i \in I$; then we can compute $\bar{f}(x)$ by the formula
$$\bar{f}(x) = \sum_{i \in I} c_i\, f(v_{i,1}) \cdots f(v_{i,l_i}), \quad (84)$$
and no ambiguity can occur for $\bar{f}(x)$ if one uses a different such representation for $x$.

Example E.5. The universal property of the Clifford algebra can, for instance, be used to show that the action of the (pseudo-)orthogonal group
$$O(V, \eta) \times \operatorname{Cl}(V, \eta) \to \operatorname{Cl}(V, \eta), \quad (85)$$
$$(g, x) \mapsto \rho_{\operatorname{Cl}}(g)(x), \quad (86)$$
$$\sum_{i \in I} c_i\, v_{i,1} \cdots v_{i,l_i} \;\mapsto\; \sum_{i \in I} c_i\, (g v_{i,1}) \cdots (g v_{i,l_i}), \quad (87)$$
is well-defined. For this one only needs to check Equation (74) for $v \in V$:
$$(g v)(g v) = \eta(g v, g v)\, 1_{\operatorname{Cl}(V, \eta)} \quad (88)$$
$$= \eta(v, v)\, 1_{\operatorname{Cl}(V, \eta)}, \quad (89)$$
where the first equality holds by the fundamental relation of the Clifford algebra and where the last equality holds by the definition of $O(V, \eta) \ni g$. So the linear map $g : V \to \operatorname{Cl}(V, \eta)$, by the universal property of the Clifford algebra, uniquely extends to the algebra homomorphism
$$\rho_{\operatorname{Cl}}(g) : \operatorname{Cl}(V, \eta) \to \operatorname{Cl}(V, \eta), \quad (90)$$
as defined in Equation (87). One can then check the remaining rules for a group action in a straightforward way. More details can be found in Ruhe et al. (2023b), Appendices D and E.

F. Proofs

Proof F.1 for Proposition 3.2 (Equivariance of the kernel head). Recall the definition of the kernel head:
$$H : \operatorname{Cl}(\mathbb{R}^{p,q})^{c_{out} \times c_{in}} \to \operatorname{Hom}_{\operatorname{Vec}}\big( \operatorname{Cl}(\mathbb{R}^{p,q})^{c_{in}}, \operatorname{Cl}(\mathbb{R}^{p,q})^{c_{out}} \big), \quad k \mapsto H(k) = \big( f \mapsto H(k)[f] \big), \quad (91)$$
which, on each output channel $i \in [c_{out}]$ and grade component $k = 0, \ldots, d$, was given by
$$H(k)[f]^{(k)}_i := \sum_{j \in [c_{in}]} \sum_{m,n = 0, \ldots, d} w^k_{mn,ij}\, \big( k^{(m)}_{ij}\, f^{(n)}_j \big)^{(k)},$$
with $w^k_{mn,ij} \in \mathbb{R}$, $k = [k_{ij}]_{i \in [c_{out}],\, j \in [c_{in}]} \in \operatorname{Cl}(\mathbb{R}^{p,q})^{c_{out} \times c_{in}}$ and $f = [f_1, \ldots, f_{c_{in}}] \in \operatorname{Cl}(\mathbb{R}^{p,q})^{c_{in}}$. Clearly, $H(k)$ is an $\mathbb{R}$-linear map (in $f$). Now let $g \in O(p, q)$. We are left to check the following equivariance formula:
$$H\big( \rho^{c_{out} \times c_{in}}_{\operatorname{Cl}}(g)(k) \big) \overset{?}{=} \rho_{\operatorname{Hom}}(g)\big( H(k) \big) := \rho^{c_{out}}_{\operatorname{Cl}}(g) \circ H(k) \circ \rho^{c_{in}}_{\operatorname{Cl}}(g^{-1}). \quad (92)$$
We abbreviate $s := \rho^{c_{in}}_{\operatorname{Cl}}(g^{-1})(f) \in \operatorname{Cl}(\mathbb{R}^{p,q})^{c_{in}}$ and $Q := \rho^{c_{out} \times c_{in}}_{\operatorname{Cl}}(g)(k) \in \operatorname{Cl}(\mathbb{R}^{p,q})^{c_{out} \times c_{in}}$. First note that we have for $j \in [c_{in}]$:
$$\rho_{\operatorname{Cl}}(g)(s_j) = f_j. \quad (93)$$
We then get:
$$\big[ \rho_{\operatorname{Hom}}(g)\big( H(k) \big)[f] \big]^{(k)}_i
= \big[ \rho^{c_{out}}_{\operatorname{Cl}}(g)\big( H(k)\big[ \rho^{c_{in}}_{\operatorname{Cl}}(g^{-1})(f) \big] \big) \big]^{(k)}_i
= \big[ \rho^{c_{out}}_{\operatorname{Cl}}(g)\big( H(k)[s] \big) \big]^{(k)}_i
= \Big( \rho_{\operatorname{Cl}}(g)\big( H(k)[s]_i \big) \Big)^{(k)}$$
$$= \rho_{\operatorname{Cl}}(g)\Big( \sum_{j \in [c_{in}]} \sum_{m,n = 0, \ldots, d} w^k_{mn,ij}\, \big( k^{(m)}_{ij}\, s^{(n)}_j \big)^{(k)} \Big)
= \sum_{j \in [c_{in}]} \sum_{m,n = 0, \ldots, d} w^k_{mn,ij}\, \Big( \big( \rho_{\operatorname{Cl}}(g)(k_{ij}) \big)^{(m)} \big( \rho_{\operatorname{Cl}}(g)(s_j) \big)^{(n)} \Big)^{(k)}$$
$$= \sum_{j \in [c_{in}]} \sum_{m,n = 0, \ldots, d} w^k_{mn,ij}\, \big( Q^{(m)}_{ij}\, f^{(n)}_j \big)^{(k)}
= \big[ H(Q)[f] \big]^{(k)}_i
= \big[ H\big( \rho^{c_{out} \times c_{in}}_{\operatorname{Cl}}(g)(k) \big)[f] \big]^{(k)}_i.$$
Note that we repeatedly made use of the rules in Definition/Theorem 2.14 and Theorem 2.15, i.e. the linearity, composition, multiplicativity, and grade preservation of $\rho_{\operatorname{Cl}}(g)$. As this holds for all $i$, $k$, and $f$, we get the desired equation,
$$\rho_{\operatorname{Hom}}(g)\big( H(k) \big) = H\big( \rho^{c_{out} \times c_{in}}_{\operatorname{Cl}}(g)(k) \big), \quad (94)$$
which shows the claim.
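To make the properties used in this proof concrete, here is a small worked example (added for illustration) of $\rho_{\operatorname{Cl}}(g)$ in $\operatorname{Cl}(\mathbb{R}^{2,0})$ for the reflection $g = \operatorname{diag}(1, -1) \in O(2)$:
$$\rho_{\operatorname{Cl}}(g)(e_1) = e_1, \qquad \rho_{\operatorname{Cl}}(g)(e_2) = -e_2, \qquad \rho_{\operatorname{Cl}}(g)(e_1 e_2) = (g e_1)(g e_2) = -e_1 e_2,$$
so each grade is mapped to itself (grade preservation), scalars transform trivially, and the pseudoscalar $e_1 e_2$ picks up the sign $\det g = -1$, matching the sign-flip irrep identification used in Appendix C.2.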
G. Clifford-steerable CNNs on pseudo-Riemannian manifolds

In this section we will assume that the reader is already familiar with the general definitions of differential geometry, which can also be found in Weiler et al. (2021; 2023). We will state the most important results for deep neural networks that process feature fields on G-structured pseudo-Riemannian manifolds. These results are direct generalizations of those in Weiler et al. (2023), where they were stated for (G-structured) Riemannian manifolds, but which generalize verbatim to (G-structured) pseudo-Riemannian manifolds if one replaces $O(d)$ with $O(p, q)$ everywhere.

Recall that in this geometric setting a signal $f$ on the manifold $M$ is typically represented by a feature field $f : M \to A$ of a certain type, like a scalar field, vector field, tensor field, multivector field, etc. Here $f$ assigns to each point $z$ an $n$-dimensional feature $f(z) \in A_z \cong \mathbb{R}^n$. Formally, $f$ is a global section of a G-associated vector bundle $A$ with typical fibre $\mathbb{R}^n$, i.e. $f \in \Gamma(A)$; see Weiler et al. (2023) for details. We can consider $\Gamma(A)$ as the vector space of all feature fields of type $A$. A deep neural network $F$ on $M$ with $N$ layers can then, as before, be considered as a composition
$$F : \Gamma(A_0) \xrightarrow{\;L_1\;} \Gamma(A_1) \xrightarrow{\;L_2\;} \Gamma(A_2) \xrightarrow{\;L_3\;} \cdots \xrightarrow{\;L_N\;} \Gamma(A_N), \quad (95)$$
where $L_1, \ldots, L_N$ are maps between the vector spaces of feature fields $\Gamma(A_\ell)$, which are typically linear maps or simple fixed non-linearities. For the sake of analysis we can focus on one such linear layer: $L : \Gamma(A_{in}) \to \Gamma(A_{out})$. Our goal is to describe the case where $L$ is an integral operator with a convolution kernel[20] such that:

i.) it is well-defined, i.e. independent of the choice of (allowed) local coordinate systems (covariance),
ii.) we can use the same kernel $K$ (not just corresponding ones) in any (allowed) local coordinate system (gauge equivariance),
iii.) it can do weight sharing between different locations, meaning that the same kernel $K$ is applied at every location,
iv.) input and output transform correspondingly under global transformations (isometry equivariance).

The isometry equivariance is the most important property here. Our main result in this appendix will be that isometry equivariance in fact follows from the first three points; see Theorem G.27 and Theorem G.33.

[20] Note that an integral operator $L(f)(u) = \int K(u, v)\, f(v)\, dv$ can be seen as a continuous analogue of a matrix multiplication. In our theory, $K$ will need to depend on only one argument, corresponding to a circulant matrix.

Before we introduce our Clifford-steerable CNNs on general pseudo-Riemannian manifolds with multivector feature fields in Appendix G.2, we first recall the general theory of G-steerable CNNs on G-structured pseudo-Riemannian manifolds, in total analogy to Weiler et al. (2023), in the next section, Appendix G.1.

G.1. General G-steerable CNNs on G-structured pseudo-Riemannian manifolds

For the convenience of the reader, we will now recall the most important concepts from pseudo-Riemannian geometry in some more generality, but refer to Weiler et al. (2023) for further details and proofs. We will assume that the curved space $M$ carries a (non-degenerate, possibly indefinite) metric tensor $\eta$ of signature $(p, q)$, $d = p + q$, and also comes with internal symmetries encoded by a closed subgroup $G \leq GL(d)$.

Definition G.1 (G-structure). Let $(M, \eta)$ be a pseudo-Riemannian manifold of signature $(p, q)$, $d = p + q$, and $G \leq GL(d)$ a closed subgroup. A G-structure on $(M, \eta)$ is a principal G-subbundle $\iota : GM \hookrightarrow FM$ of the frame bundle $FM$ over $M$. Note that $GM$ is supposed to carry the right G-action induced from $FM$:
$$\lhd : GM \times G \to GM, \qquad [e_i]_{i \in [d]} \lhd g := \Big[ \sum_{j \in [d]} e_j\, g_{j,i} \Big]_{i \in [d]},$$
which thus makes the embedding $\iota$ a G-equivariant embedding.

Definition G.2 (G-structured pseudo-Riemannian manifold). Let $G \leq GL(d)$ be a closed subgroup.
A G-structured pseudo-Riemannian manifold $(M, G, \eta)$ of signature $(p, q)$ consists, per definition, of a pseudo-Riemannian manifold $(M, \eta)$ of dimension $d = p + q$ with a metric tensor $\eta$ of signature $(p, q)$, and a fixed choice of a G-structure $\iota : GM \hookrightarrow FM$ on $M$. We will denote the G-structured pseudo-Riemannian manifold by the triple $(M, G, \eta)$ and keep the fixed G-structure $\iota : GM \hookrightarrow FM$ implicit in the notation, as well as the corresponding G-atlas of local tangent bundle trivializations
$$\mathcal{A}^G = \Big\{ \Psi^A : \pi_{TM}^{-1}(U^A) \xrightarrow{\;\sim\;} U^A \times \mathbb{R}^d \Big\}_{A \in I},$$
where $I$ is an index set and $U^A \subseteq M$ are certain open subsets of $M$.

Remark G.3. Note that for a given $G \leq GL(d)$ there might not exist a corresponding G-structure $GM$ on $(M, \eta)$ in general. Furthermore, even if it exists, it might not be unique. So, when we talk about such a G-structure in the following, we always make the implicit assumption of its existence and also fix a specific choice.

Definition G.4 (Isometry group of a G-structured pseudo-Riemannian manifold). Let $(M, G, \eta)$ be a G-structured pseudo-Riemannian manifold. Its (G-structure preserving) isometry group is defined to be:
$$\operatorname{Isom}(M, G, \eta) := \Big\{ \phi : M \to M \text{ diffeo} \;\Big|\; \forall z \in M,\ v \in T_z M.\ \eta_{\phi(z)}\big( \phi_{*,TM}(v), \phi_{*,TM}(v) \big) = \eta_z(v, v),\ \ \phi_{*,FM}(G_z M) = G_{\phi(z)} M \Big\}. \quad (98)$$
The intuition here is that the first condition constrains $\phi$ to be an isometry w.r.t. the metric $\eta$. The second condition constrains $\phi$ to be a symmetry of the G-structure, i.e. it maps G-frames to G-frames.

Remark G.5 (Isometry group). Recall that the (usual/full) isometry group of a pseudo-Riemannian manifold $(M, \eta)$ is defined as:
$$\operatorname{Isom}(M, \eta) := \Big\{ \phi : M \to M \text{ diffeo} \;\Big|\; \forall z \in M,\ v \in T_z M.\ \eta_{\phi(z)}\big( \phi_{*,TM}(v), \phi_{*,TM}(v) \big) = \eta_z(v, v) \Big\}. \quad (99)$$
Also note that for a G-structured pseudo-Riemannian manifold $(M, G, \eta)$ of signature $(p, q)$ such that $O(p, q) \subseteq G$ we have:
$$\operatorname{Isom}(M, G, \eta) = \operatorname{Isom}(M, \eta). \quad (100)$$

Definition G.6 (G-associated vector bundle). Let $(M, G, \eta)$ be a G-structured pseudo-Riemannian manifold and let $\rho : G \to GL(n)$ be a left linear representation of $G$. A vector bundle $A$ over $M$ is called a G-associated vector bundle (with typical fibre $(\mathbb{R}^n, \rho)$) if there exists a vector bundle isomorphism over $M$ of the form
$$A \cong (GM \times \mathbb{R}^n) / \sim_\rho \; =: \; GM \times_\rho \mathbb{R}^n, \quad (101)$$
where the equivalence relation is given as follows:
$$(e', v') \sim_\rho (e, v) \;:\Longleftrightarrow\; \exists g \in G.\ (e', v') = (e \lhd g,\ \rho(g^{-1}) v). \quad (102)$$

Definition G.7 (Global sections of a fibre bundle). Let $\pi_A : A \to M$ be a fibre bundle over $M$. We denote the set of global sections of $A$ as
$$\Gamma(A) := \{ f : M \to A \mid \forall z \in M.\ f(z) \in A_z \}, \quad (103)$$
where $A_z := \pi_A^{-1}(z)$ denotes the fibre of $A$ over $z \in M$.

Remark G.8 (Isometry action). For a G-associated vector bundle $A = GM \times_\rho \mathbb{R}^n$ and $\phi \in \operatorname{Isom}(M, G, \eta)$ we can define the induced G-associated vector bundle automorphism $\phi_{*,A}$ on $A$ as follows:
$$\phi_{*,A} : A \to A, \quad (104)$$
$$\phi_{*,A}(e, v) := \big( \phi_{*,GM}(e), v \big). \quad (105)$$
With this we can define a left action of the group $\operatorname{Isom}(M, G, \eta)$ on the corresponding space of feature fields $\Gamma(A)$ as follows:
$$\rhd : \operatorname{Isom}(M, G, \eta) \times \Gamma(A) \to \Gamma(A), \quad (106)$$
$$\phi \rhd f := \phi_{*,A} \circ f \circ \phi^{-1} : M \to A. \quad (107)$$

To construct a well-behaved convolution operator on $M$ we first need to introduce the idea of a transporter of feature fields along a curve $\gamma : I \to M$.

Remark G.9 (Transporter). A transporter $T_A$ on the vector bundle $A$ over $M$ takes any (sufficiently smooth) curve $\gamma : I \to M$, with $I \subseteq \mathbb{R}$ some interval, and two points $s, t \in I$, $s \leq t$, and provides an invertible linear map
$$T^{s,t}_{A,\gamma} : A_{\gamma(s)} \to A_{\gamma(t)}, \quad v \mapsto T^{s,t}_{A,\gamma}(v). \quad (108)$$
$T_A$ is thought to transport the vector $v \in A_{\gamma(s)}$ at location $\gamma(s) \in M$ along the curve $\gamma$ to the location $\gamma(t) \in M$, outputting a vector $v' = T^{s,t}_{A,\gamma}(v)$ in $A_{\gamma(t)}$.
For consistency we require that $T_A$ satisfies the following points for such $\gamma$:

1. For $s \in I$ we get: $T^{s,s}_{A,\gamma} \overset{!}{=} \operatorname{id}_{A_{\gamma(s)}} : A_{\gamma(s)} \to A_{\gamma(s)}$,
2. For $s \leq t \leq u$ we have:
$$T^{t,u}_{A,\gamma} \circ T^{s,t}_{A,\gamma} \overset{!}{=} T^{s,u}_{A,\gamma} : A_{\gamma(s)} \to A_{\gamma(u)}. \quad (109)$$

Furthermore, the dependence on $s$, $t$ and $\gamma$ shall be sufficiently smooth in a certain sense. We call a transporter $T_{TM}$ on the tangent bundle $TM$ a metric transporter if the map
$$T^{s,t}_{TM,\gamma} : (T_{\gamma(s)} M, \eta_{\gamma(s)}) \to (T_{\gamma(t)} M, \eta_{\gamma(t)}) \quad (110)$$
is always an isometry.

To construct transporters we need to introduce the notion of a connection on a vector bundle, which formalizes how vector fields change when moving from one point to the next.

Definition G.10 (Connection). A connection on a vector bundle $A$ over $M$ is an $\mathbb{R}$-linear map
$$\nabla : \Gamma(A) \to \Gamma(T^*M \otimes A), \quad (111)$$
such that for all $c : M \to \mathbb{R}$ and $f \in \Gamma(A)$ we have
$$\nabla(c \cdot f) = dc \otimes f + c \cdot \nabla(f), \quad (112)$$
where $dc \in \Gamma(T^*M)$ is the differential of $c$.

A special form of a connection are affine connections, which live on the tangent bundle.

Definition G.11 (Affine connection). An affine connection on $M$ (or, more precisely, on $TM$) is an $\mathbb{R}$-bilinear map
$$\nabla : \Gamma(TM) \times \Gamma(TM) \to \Gamma(TM), \quad (113)$$
$$(X, Y) \mapsto \nabla_X Y, \quad (114)$$
such that for all $c : M \to \mathbb{R}$ and $X, Y \in \Gamma(TM)$ we have:

1. $\nabla_{c X} Y = c\, \nabla_X Y$,
2. $\nabla_X (c\, Y) = (\partial_X c)\, Y + c\, \nabla_X Y$,

where $\partial_X c$ denotes the directional derivative of $c$ along $X$.

Remark G.12. Certainly, an affine connection can also be re-written in the usual connection form:
$$\nabla : \Gamma(TM) \to \Gamma(T^*M \otimes TM). \quad (115)$$

Every connection defines a (parallel) transporter $T_A$.

Definition/Lemma G.13 (Parallel transporter of a connection). Let $\nabla$ be a connection on the vector bundle $A$ over $M$. Then $\nabla$ defines a (parallel) transporter $T_A$ for $\gamma : I = [s, t] \to M$ as follows:
$$T^{s,t}_{A,\gamma} : A_{\gamma(s)} \to A_{\gamma(t)}, \quad v \mapsto f(t), \quad (116)$$
where $f$ is the unique vector field $f \in \Gamma(\gamma^* A)$ with 1. $(\gamma^* \nabla)(f) = 0$ and 2. $f(s) = v$, which always exists. Here $\gamma^*$ denotes the corresponding pullback from $M$ to $I$.

For pseudo-Riemannian manifolds there is a canonical choice of a metric connection, the Levi-Civita connection, which always exists and is uniquely characterized by its two main properties.

Definition/Theorem G.14 (Fundamental theorem of pseudo-Riemannian geometry: the Levi-Civita connection). Let $(M, \eta)$ be a pseudo-Riemannian manifold. Then there exists a unique affine connection $\nabla$ on $(M, \eta)$ such that the following two conditions hold for all $X, Y, Z \in \Gamma(TM)$:

1. metric preservation:
$$\partial_Z \big( \eta(X, Y) \big) = \eta( \nabla_Z X, Y ) + \eta( X, \nabla_Z Y ). \quad (117)$$
2. torsion-freeness:
$$\nabla_X Y - \nabla_Y X = [X, Y], \quad (118)$$
where $[X, Y]$ is the Lie bracket of vector fields.

This affine connection is called the Levi-Civita connection of $(M, \eta)$ and is denoted $\nabla^{LC}$.

Remark G.15 (Levi-Civita transporter). Let $(M, G, \eta)$ be a pseudo-Riemannian manifold with Levi-Civita connection $\nabla^{LC}$.

1. The corresponding Levi-Civita transporter $T_{TM}$ on $TM$ is always a metric transporter, i.e. it always induces (linear) isometries of vector spaces:
$$T^{s,t}_{TM,\gamma} : (T_{\gamma(s)} M, \eta_{\gamma(s)}) \to (T_{\gamma(t)} M, \eta_{\gamma(t)}). \quad (119)$$
2. Furthermore, the Levi-Civita transporter extends to every G-associated vector bundle $A$ as $T_A$.
3. For every G-associated vector bundle $A$, every curve $\gamma : I \to M$ and every $\phi \in \operatorname{Isom}(M, G, \eta)$, the Levi-Civita transporter $T_{A,\gamma}$ always satisfies:
$$\phi_{*,A} \circ T_{A,\gamma} = T_{A,\phi \circ \gamma} \circ \phi_{*,A}. \quad (120)$$

Definition G.16 (Geodesics). Let $M$ be a manifold with affine connection $\nabla$ and $\gamma : I \to M$ a curve. We call $\gamma$ a geodesic of $(M, \nabla)$ if for all $t \in I$ we have
$$\nabla_{\dot\gamma(t)}\, \dot\gamma(t) = 0, \quad (121)$$
i.e. if $\dot\gamma$ runs parallel to itself. For pseudo-Riemannian manifolds $(M, \eta)$ we will typically use the Levi-Civita connection $\nabla^{LC}$ to define geodesics.

Definition/Lemma G.17 (Pseudo-Riemannian exponential map).
For a manifold $M$ with affine connection $\nabla$, $z \in M$ and $v \in T_z M$ there exists a unique geodesic $\gamma_{z,v} : I = (-s, s) \to M$ of $(M, \nabla)$ with maximal domain $I$ such that:
$$\gamma_{z,v}(0) = z, \qquad \dot\gamma_{z,v}(0) = v. \quad (122)$$
The $\nabla$-exponential map at $z \in M$ is then the map
$$\exp_z : \tilde{T}_z M \to M, \quad \exp_z(v) := \gamma_{z,v}(1), \quad (123)$$
with domain
$$\tilde{T}_z M := \{ v \in T_z M \mid \gamma_{z,v}(1) \text{ is defined} \}. \quad (124)$$
For pseudo-Riemannian manifolds $(M, \eta)$ we will call the exponential map $\exp_z$ defined via the Levi-Civita connection $\nabla^{LC}$ the pseudo-Riemannian exponential map of $(M, \eta)$ at $z \in M$.

Remark G.18. For a pseudo-Riemannian manifold $(M, \eta)$ the differential $d \exp_z|_v : T_v T_z M \to T_{\exp_z(v)} M$ is the identity map on $T_z M$ at $v = 0 \in T_z M$:
$$d \exp_z|_{v=0} = \operatorname{id}_{T_z M} : T_z M \cong T_0 T_z M \to T_{\exp_z(0)} M = T_z M.$$
Furthermore, there exists an open subset $U_z \subseteq T_z M$ with $0 \in U_z$ such that $\exp_z : U_z \to \exp_z(U_z) \subseteq M$ is a diffeomorphism and $\exp_z(U_z) \subseteq M$ is an open subset.

Notation G.19. For a transporter $T_A$ for a vector bundle on $(M, \nabla)$ we abbreviate for $z \in M$ and $v \in \tilde{T}_z M$:
$$T_{z,v} := T_{A, \bar\gamma_{z,v}} : A_{\exp_z(v)} \to A_z, \quad (125)$$
where $\bar\gamma_{z,v} : [0, 1] \to M$ is given by $\bar\gamma_{z,v}(t) := \exp_z((1 - t)\, v)$.

Definition G.20 (Transporter pullback, see Weiler et al. (2023), Def. 12.2.4). Let $(M, \eta)$ be a pseudo-Riemannian manifold and $A$ a vector bundle over $M$. Furthermore, let $\exp_z$ denote the pseudo-Riemannian exponential map (based on the Levi-Civita connection) and $T_A$ any transporter on $A$. We then define the transporter pullback:
$$\operatorname{Exp}^*_z : \Gamma(A) \to C(\tilde{T}_z M, A_z), \quad (126)$$
$$\operatorname{Exp}^*_z(f)(v) := T_{z,v}\big( \underbrace{f(\exp_z(v))}_{\in\, A_{\exp_z(v)}} \big). \quad (127)$$

Lemma G.21 (See Weiler et al. (2023), Thm. 13.1.4). For a G-structured pseudo-Riemannian manifold $(M, G, \eta)$, a G-associated vector bundle $A$, $z \in M$, $\phi \in \operatorname{Isom}(M, G, \eta)$ and $f \in \Gamma(A)$ we have
$$\operatorname{Exp}^*_z(\phi \rhd f) = \phi_{*,A} \circ \big[ \operatorname{Exp}^*_{\phi^{-1}(z)}(f) \big] \circ \phi^{-1}_{*,TM}, \quad (128)$$
provided the transporter $T_A$ satisfies Equation (120).

Weight sharing for the convolution operator $L$ boils down to the use of a template convolution kernel $K$, which is then applied/re-used at every location $z \in M$.

Definition G.22 (Template convolution kernel). Let $M$ be a manifold of dimension $d$ and $A_{in}$ and $A_{out}$ two vector bundles over $M$ with typical fibres $W_{in}$ and $W_{out}$, respectively. A template convolution kernel for $(M, A_{in}, A_{out})$ is then a (sufficiently smooth, non-linear) map
$$K : \mathbb{R}^d \to \operatorname{Hom}_{\operatorname{Vec}}(W_{in}, W_{out}) \quad (129)$$
that is sufficiently decaying when moving away from the origin $0 \in \mathbb{R}^d$ (to make all later constructions, like convolution operations, etc., well-defined).

The G-gauge equivariance of a convolution operator $L$ is encoded by the following G-steerability of the template convolution kernel.

Definition G.23 (G-steerability convolution kernel constraints). Let $G \leq GL(d)$ be a closed subgroup, $(M, G, \eta)$ a G-structured pseudo-Riemannian manifold of signature $(p, q)$, $d = p + q$, and $A_{in}$ and $A_{out}$ two G-associated vector bundles with typical fibres $(W_{in}, \rho_{in})$ and $(W_{out}, \rho_{out})$, respectively. A template convolution kernel $K$ for $(M, A_{in}, A_{out})$,
$$K : \mathbb{R}^d \to \operatorname{Hom}_{\operatorname{Vec}}(W_{in}, W_{out}), \quad (130)$$
will be called G-steerable if for all $g \in G$ and $v \in \mathbb{R}^d$ we have:
$$K(g v) = \frac{1}{|\det g|}\, \rho_{out}(g)\, K(v)\, \rho_{in}(g)^{-1} \quad (131)$$
$$=: \rho_{\operatorname{Hom}}(g)\big(K(v)\big). \quad (132)$$

Remark G.24. Note that the G-steerability of $K$ is expressed through Equation (131), while the G-gauge equivariance of $K$ will, more closely, be expressed through the re-interpretation in Equation (132).

Definition G.25 (Convolution operator, see Weiler et al. (2023), Thm. 12.2.9). Let $(M, G, \eta)$ be a G-structured pseudo-Riemannian manifold, $A_{in}$ and $A_{out}$ two G-associated vector bundles over $M$ with typical fibres $(W_{in}, \rho_{in})$ and $(W_{out}, \rho_{out})$, and $K$ a G-steerable template convolution kernel, see Equation (131).
Let $f_{in} \in \Gamma(A_{in})$ and consider a local trivialization $(\Psi^C, U^C) \in \mathcal{A}^G$ around $z \in U^C \subseteq M$ (which locally trivializes $A_{in}$ and $A_{out}$). Then we have a well-defined convolution operator
$$L : \Gamma(A_{in}) \to \Gamma(A_{out}), \quad f_{in} \mapsto L(f_{in}) := f_{out}, \quad (133)$$
given by the local formula
$$f^C_{out}(z) := \int_{\mathbb{R}^d} K(v^C)\, [\operatorname{Exp}^*_z f_{in}]^C(v^C)\, dv^C, \quad (134)$$
where $\operatorname{Exp}^*_z$ is the transporter pullback from Definition G.20, where $\exp_z$ denotes the pseudo-Riemannian exponential map (based on the Levi-Civita connection $\nabla^{LC}$) and $T_{A_{in}}$ is any transporter satisfying Equation (120) (e.g. parallel transport based on $\nabla^{LC}$).

Remark G.26 (Coordinate independence of the convolution operator). The coordinate independence of the convolution operator $L : \Gamma(A_{in}) \to \Gamma(A_{out})$ comes from the following covariance relations and Equation (131). If we use a different local trivialization $(\Psi^B, U^B) \in \mathcal{A}^G$ in Equation (134) with $z \in U^B \cap U^C$, then there exists a $g \in G$ such that:
$$v^C = g\, v^B \in \mathbb{R}^d, \quad (135)$$
$$dv^C = |\det g|\, dv^B, \quad (136)$$
$$[\operatorname{Exp}^*_z f_{in}]^C(v^C) = \rho_{in}(g)\, [\operatorname{Exp}^*_z f_{in}]^B(v^B) \in W_{in}, \quad (137)$$
$$f^C_{out}(z) = \rho_{out}(g)\, f^B_{out}(z) \in W_{out}. \quad (138)$$
So $f_{out} : M \to A_{out}$ is a well-defined global section in $\Gamma(A_{out})$.

We are finally in the place to state the main theorem of this section, stating that every G-steerable template convolution kernel leads to an isometry equivariant convolution operator.

Theorem G.27 (Isometry equivariance of the convolution operator, see Weiler et al. (2023), Thm. 13.2.6). Let $G \leq GL(d)$ be a closed subgroup and $(M, G, \eta)$ a G-structured pseudo-Riemannian manifold of signature $(p, q)$ with $d = p + q$. Let $A_{in}$ and $A_{out}$ be two G-associated vector bundles with typical fibres $(W_{in}, \rho_{in})$ and $(W_{out}, \rho_{out})$. Let $K$ be a G-steerable template convolution kernel, see Equation (131). Consider the corresponding convolution operator $L : \Gamma(A_{in}) \to \Gamma(A_{out})$ given by Equation (134), where $\exp_z$ denotes the pseudo-Riemannian exponential map (based on the Levi-Civita connection $\nabla^{LC}$) and $T_{A_{in}}$ is any transporter satisfying Equation (120) (e.g. parallel transport based on $\nabla^{LC}$). Then the convolution operator $L : \Gamma(A_{in}) \to \Gamma(A_{out})$ is equivariant w.r.t. the G-structure preserving isometry group $\operatorname{Isom}(M, G, \eta)$: for every $\phi \in \operatorname{Isom}(M, G, \eta)$ and $f_{in} \in \Gamma(A_{in})$ we have
$$L(\phi \rhd f_{in}) = \phi \rhd L(f_{in}). \quad (139)$$

The main obstruction to constructing a well-behaved convolution operator $L$ is thus the kernel constraint of Equation (131), which is generally notoriously difficult to solve, especially for continuous non-compact groups $G$ like $O(p, q)$.
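As a brief illustration (added here, not part of the original text), consider the flat case $M = \mathbb{R}^{p,q}$ with the trivial $O(p,q)$-structure and a global trivialization. Then geodesics are straight lines, $\exp_z(v) = z + v$, the transporter is the identity, and the local formula of Equation (134) reduces to the ordinary convolution used in the main text,
$$f_{out}(z) = \int_{\mathbb{R}^d} K(v)\, f_{in}(z + v)\, dv,$$
so that Theorem G.27 specializes to $E(p, q)$-equivariance of this convolution for $O(p, q)$-steerable kernels $K$.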
G.2. Clifford-steerable CNNs on pseudo-Riemannian manifolds

Let $(M, \eta)$ be a pseudo-Riemannian manifold of signature $(p, q)$ and dimension $d = p + q$. Then $(M, \eta)$ carries a unique $O(p, q)$-structure $OM$ induced by $\eta$. The intuition is that $OM$ consists of all orthonormal frames w.r.t. $\eta$. In fact, the choice of an $O(p, q)$-structure on $M$ is equivalent to the choice of a metric $\eta$ of signature $(p, q)$ on $M$. That said, we will now restrict to the structure group $G = O(p, q)$ everywhere in the following. We will further restrict to multivector feature fields $A_{in} := \operatorname{Cl}(TM, \eta)^{c_{in}}$ and $A_{out} := \operatorname{Cl}(TM, \eta)^{c_{out}}$, which we first need to formalize properly.

Definition G.28 (Clifford algebra bundle). Let $(M, \eta)$ be a pseudo-Riemannian manifold. Then the Clifford algebra bundle over $M$ is defined (as a set) as the disjoint union of the Clifford algebras of the corresponding tangent spaces:
$$\operatorname{Cl}(TM, \eta) := \bigsqcup_{z \in M} \operatorname{Cl}(T_z M, \eta_z). \quad (140)$$
$\operatorname{Cl}(TM, \eta)$ becomes an algebra bundle over $M$ with the standard constructions of local trivializations and bundle projections.

Definition G.29 (Orthonormal frame bundle of signature $(p, q)$). Let $(M, \eta)$ be a pseudo-Riemannian manifold of signature $(p, q)$ and dimension $d = p + q$. Abbreviate for indices $i, j \in [d]$:
$$\delta^{p,q}_{i,j} := \begin{cases} 0 & \text{if } i \neq j, \\ +1 & \text{if } i = j \in [1, p], \\ -1 & \text{if } i = j \in [p + 1, d]. \end{cases} \quad (141)$$
Then the orthonormal frame bundle of signature $(p, q)$ is defined as
$$OM := \bigsqcup_{z \in M} O_z M, \quad (142)$$
where we put
$$O_z M := \Big\{ [e_1, \ldots, e_d] \;\Big|\; \forall j \in [d].\ e_j \in T_z M, \quad (143)$$
$$\forall i, j \in [d].\ \eta_z(e_i, e_j) = \delta^{p,q}_{i,j} \Big\}. \quad (144)$$
Then $OM$ becomes an $O(p, q)$-structure for $(M, \eta)$ together with the standard constructions of local trivializations, bundle projection and right group action:
$$\lhd : OM \times O(p, q) \to OM, \quad (145)$$
$$[e_i]_{i \in [d]} \lhd g := \Big[ \sum_{j \in [d]} e_j\, g_{j,i} \Big]_{i \in [d]}. \quad (146)$$

Lemma G.30. Let $(M, \eta)$ be a pseudo-Riemannian manifold of signature $(p, q)$ and dimension $d = p + q$. We have an algebra bundle isomorphism over $M$:
$$\operatorname{Cl}(TM, \eta) \cong OM \times_{\rho_{\operatorname{Cl}}} \operatorname{Cl}(\mathbb{R}^{p,q}), \quad (147)$$
where $\rho_{\operatorname{Cl}} : O(p, q) \to O_{\operatorname{Alg}}(\operatorname{Cl}(\mathbb{R}^{p,q}), \eta^{p,q})$ is the usual action of the orthogonal group $O(p, q)$ on $\operatorname{Cl}(\mathbb{R}^{p,q})$ given by rotating all vector components individually. In particular, the Clifford algebra bundle $\operatorname{Cl}(TM, \eta)$ is an $O(p, q)$-associated algebra bundle over $M$ with typical fibre $\operatorname{Cl}(\mathbb{R}^{p,q})$.

Definition G.31 (Multivector fields). A multivector field on $M$ is a global section $f \in \Gamma(\operatorname{Cl}(TM, \eta)^c)$ for some $c \in \mathbb{N}$, i.e. a map $f : M \to \operatorname{Cl}(TM, \eta)^c$ that assigns to every point $z \in M$ a tuple of multivectors $f(z) = [f_1(z), \ldots, f_c(z)] \in \operatorname{Cl}(T_z M, \eta_z)^c$.

Remark G.32 (The action of the isometry group on multivector fields). Let $\phi \in \operatorname{Isom}(M, \eta)$; then $\phi$ is a diffeomorphic map $\phi : M \to M$ such that for every $z \in M$ the differential map is an isometry:
$$\phi_{*,TM,z} : (T_z M, \eta_z) \to (T_{\phi(z)} M, \eta_{\phi(z)}). \quad (148)$$
We can now describe the induced map $\phi_{*,\operatorname{Cl}(TM,\eta)}$ via the general construction on associated vector bundles, see Remark G.8, with help of the identification of Equation (147):
$$\phi_{*,\operatorname{Cl}(TM,\eta)} : OM \times_{\rho_{\operatorname{Cl}}} \operatorname{Cl}(\mathbb{R}^{p,q}) \to OM \times_{\rho_{\operatorname{Cl}}} \operatorname{Cl}(\mathbb{R}^{p,q}), \quad \phi_{*,\operatorname{Cl}(TM,\eta)}(e, x) = \big( \phi_{*,FM}(e), x \big), \quad (149)$$
or we can look at the fibres directly, $z \in M$:
$$\phi_{*,\operatorname{Cl}(TM,\eta),z} : \operatorname{Cl}(T_z M, \eta_z) \to \operatorname{Cl}(T_{\phi(z)} M, \eta_{\phi(z)}),$$
$$\phi_{*,\operatorname{Cl}(TM,\eta),z}\Big( \sum_{i \in I} c_i\, v_{i,1} \cdots v_{i,k_i} \Big) = \sum_{i \in I} c_i\, \phi_{*,TM,z}(v_{i,1}) \cdots \phi_{*,TM,z}(v_{i,k_i}). \quad (150)$$
With this we can define a left action of the isometry group $\operatorname{Isom}(M, \eta)$ on the corresponding space of multivector fields $\Gamma(\operatorname{Cl}(TM, \eta)^c)$ as follows:
$$\rhd : \operatorname{Isom}(M, \eta) \times \Gamma(\operatorname{Cl}(TM, \eta)^c) \to \Gamma(\operatorname{Cl}(TM, \eta)^c), \quad (151)$$
$$\phi \rhd f := \phi_{*,\operatorname{Cl}(TM,\eta)^c} \circ f \circ \phi^{-1} : M \to \operatorname{Cl}(TM, \eta)^c. \quad (152)$$

We are now in the position to state the main theorem of this section.

Theorem G.33 (Clifford-steerable CNNs on pseudo-Riemannian manifolds are gauge and isometry equivariant). Let $(M, \eta)$ be a pseudo-Riemannian manifold of signature $(p, q)$ and dimension $d = p + q$. We consider $(M, \eta)$ to be endowed with the structure group $G = O(p, q)$. Consider multivector feature fields $A_{in} = \operatorname{Cl}(TM, \eta)^{c_{in}}$ and $A_{out} = \operatorname{Cl}(TM, \eta)^{c_{out}}$ over $M$. Let $K = H \circ \mathcal{K}$ be a Clifford-steerable kernel, the same template convolution kernel as presented in the main paper in Section 3,
$$K : \mathbb{R}^{p,q} \to \operatorname{Hom}_{\operatorname{Vec}}\big( \operatorname{Cl}(\mathbb{R}^{p,q})^{c_{in}}, \operatorname{Cl}(\mathbb{R}^{p,q})^{c_{out}} \big), \quad (153)$$
where $\mathcal{K} : \mathbb{R}^{p,q} \to \operatorname{Cl}(\mathbb{R}^{p,q})^{c_{out} \times c_{in}}$ is the kernel network, a Clifford group equivariant neural network with $c_{in} \cdot c_{out}$ Clifford algebra outputs, and where $H$ is the $O(p, q)$-equivariant kernel head:
$$H : \operatorname{Cl}(\mathbb{R}^{p,q})^{c_{out} \times c_{in}} \to \operatorname{Hom}_{\operatorname{Vec}}\big( \operatorname{Cl}(\mathbb{R}^{p,q})^{c_{in}}, \operatorname{Cl}(\mathbb{R}^{p,q})^{c_{out}} \big). \quad (154)$$
Then $K$ is automatically $O(p, q)$-steerable, i.e. for $g \in O(p, q)$, $v \in \mathbb{R}^{p,q}$ we have[21]
$$K(g v) = \rho^{c_{out}}_{\operatorname{Cl}}(g)\, K(v)\, \rho^{c_{in}}_{\operatorname{Cl}}(g)^{-1}. \quad (155)$$
Furthermore, the corresponding convolution operator $L : \Gamma(A_{in}) \to \Gamma(A_{out})$, given by Equation (134), is equivariant w.r.t. the full isometry group $\operatorname{Isom}(M, \eta)$: for every $\phi \in \operatorname{Isom}(M, \eta)$ and $f_{in} \in \Gamma(A_{in})$ we have
$$L(\phi \rhd f_{in}) = \phi \rhd L(f_{in}). \quad (156)$$
Remark G.34. A theorem similar to Theorem G.33 can be stated for orientable pseudo-Riemannian manifolds $(M, \eta)$ and structure group $G = SO(p, q)$, if one reduces the Clifford group equivariant neural network parameterizing the kernel network $\mathcal{K}$ to be (only) $SO(p, q)$-equivariant.

[21] Note that the factor $|\det g|^{-1}$ does not appear here, in contrast to the general formula in Equation (131), because $|\det g| = 1$ anyway for all $g \in O(p, q)$.