# Equivariant Manifold Flows

Isay Katsman*, Aaron Lou*, Derek Lim*, Qingxuan Jiang* (Cornell University), {isk22, al968, dl772, qj46}@cornell.edu
Ser-Nam Lim (Facebook AI), sernam@gmail.com
Christopher De Sa (Cornell University), cdesa@cs.cornell.edu

*Equal contribution. 35th Conference on Neural Information Processing Systems (NeurIPS 2021).

## Abstract

Tractably modelling distributions over manifolds has long been an important goal in the natural sciences. Recent work has focused on developing general machine learning models to learn such distributions. However, for many applications these distributions must respect manifold symmetries, a trait which most previous models disregard. In this paper, we lay the theoretical foundations for learning symmetry-invariant distributions on arbitrary manifolds via equivariant manifold flows. We demonstrate the utility of our approach by learning quantum field theory-motivated invariant SU(n) densities and by correcting meteor impact dataset bias.

## 1 Introduction

Figure 1: An example of a density on SU(3) that is invariant to conjugation by SU(3). The x-axis and y-axis are the angles $\theta_1$ and $\theta_2$ for eigenvalues $e^{i\theta_1}$ and $e^{i\theta_2}$ of a matrix in SU(3). The axis range is $-\pi$ to $\pi$.

Learning probabilistic models for data has long been the focus of many problems in machine learning and statistics. Though much effort has gone into learning models over Euclidean space [6, 20, 21], less attention has been allocated to learning models over non-Euclidean spaces, despite the fact that many problems require a manifold structure. Density learning over non-Euclidean spaces has applications ranging from quantum field theory in physics [44] to motion estimation in robotics [16] to protein-structure prediction in computational biology [22].

Continuous normalizing flows (CNFs) [6, 21] are powerful generative models for learning structure in complex data due to their tractability and theoretical guarantees. Recent work [29, 30] has extended the framework of continuous normalizing flows to the setting of density learning on Riemannian manifolds. However, for many applications in the natural sciences, this construction is insufficient as it cannot properly model necessary symmetries. For example, such symmetry requirements arise when sampling coupled particle systems in physical chemistry [26] or sampling for use in SU(n)¹ lattice gauge theories in theoretical physics [3]. More precisely, these symmetries are invariances with respect to action by an isometry subgroup of the underlying manifold.

For example, consider the task of learning a density on the sphere that is invariant to rotation around an axis; this is an example of learning an isometry subgroup invariant² density. For a less trivial example, note that when learning a flow-based sampler for SU(n) in the context of lattice QFT [3], the learned density must be invariant to conjugation by SU(n) (see Figure 1 for a density on SU(3) that exhibits the requisite symmetry).

One might naturally attempt to work with the quotient of the manifold by the relevant isometry subgroup in order to model the invariance. First, note that this structure is not always a manifold, and additional restrictions on the action are needed to ensure that the quotient has a manifold structure.³

¹ SU(n) denotes the special unitary group $\mathrm{SU}(n) = \{X \in \mathbb{C}^{n \times n} \mid X^\dagger X = I, \det(X) = 1\}$.
Assuming the quotient is in fact a manifold, one might then ask whether an invariant density can be modelled by learning over this quotient with a general manifold density learning method such as NMODE [29]. Though this seems plausible, it is a problematic approach for several reasons:

1. First, it is often difficult to realize the necessary constructs (charts, exponential maps, tangent spaces) on the quotient manifold (e.g. this is the case for $\mathbb{RP}^n$, a quotient of $S^n$ [28]).
2. Second, even if the above constructs can be realized, the quotient manifold often has a boundary, which precludes the use of a manifold CNF. To illustrate this point, consider the simple case of the sphere invariant to rotation about an axis; the quotient manifold is a closed interval, and a CNF would "flow out" at the boundary.
3. Third, even if the quotient is a manifold without boundary for which we have a clear characterization, it may have a discrete structure that induces artifacts in the learned distribution. This is the case for Boyda et al. [3]: the flow construction over the quotient induces abnormalities in the density.

Motivated by the above drawbacks, we design a manifold continuous normalizing flow on the original manifold that maintains the requisite symmetry invariance. Since vanilla manifold CNFs do not maintain said symmetries, we instead construct equivariant manifold flows and show they induce the desired invariance. To construct these flows, we present the first general way of designing equivariant vector fields on manifolds. A summary of our paper's contributions is as follows:

- We present a general framework and the requisite theory for learning equivariant manifold flows: in our setup, the flows can be learned over arbitrary Riemannian manifolds while explicitly incorporating symmetries inherent to the problem. Moreover, we prove that the equivariant flows we construct can universally approximate distributions on closed manifolds.
- We demonstrate the efficacy of our approach by learning gauge invariant densities over SU(n) in the context of quantum field theory. In particular, when applied to the densities in Boyda et al. [3], we adhere more naturally to the target geometry and avoid the unnatural artifacts of the quotient construction.
- We highlight the benefit of incorporating symmetries into manifold flow models by comparing directly against previous general manifold density learning approaches. We show that when a general manifold learning model is not aware of symmetries inherent to the problem, the learned density is of considerably worse quality and violates said symmetries.

Prior to our work, no literature demonstrated the benefits of incorporating isometry group symmetries for learning flows on manifolds; we achieve these benefits through a novel equivariant vector field construction.

## 2 Related Work

Our work builds directly on pre-existing manifold normalizing flow models and enables them to leverage inherent symmetries through equivariance. In this section we cover important developments from the relevant fields: manifold normalizing flows and equivariant machine learning.

² This specific isometry subgroup is known as the isotropy group at a point of the sphere intersecting the axis.
³ In particular, the isometry subgroup action needs to be smooth, free, and proper to ensure the quotient will be a manifold, by the Quotient Manifold Theorem [28].
**Normalizing Flows on Manifolds.** Normalizing flows on Euclidean space have long been touted as powerful generative models [6, 10, 21]. Similar to GANs [20] and VAEs [24], normalizing flows learn to map samples from a tractable prior density to a target density. However, unlike the aforementioned models, normalizing flows account for changes in volume, enabling exact evaluation of the output probability density. In a rather concrete sense, this makes them theoretically principled. As such, they are ideal candidates for generalization beyond the Euclidean setting, where a careful, theoretically principled modelling approach is necessary. Motivated by recent developments in geometric deep learning [4], many methods have extended normalizing flows to Riemannian manifolds. Rezende et al. [38] introduced constructions specific to tori and spheres, while Bose et al. [2] introduced constructions for hyperbolic space. Following this work, Falorsi and Forré [15], Lou et al. [29], and Mathieu and Nickel [30] concurrently introduced a general construction by extending Neural ODEs [6] to the setting of Riemannian manifolds. Our work takes inspiration from the methods of Lou et al. [29] and Mathieu and Nickel [30] and generalizes them further to enable learning that takes into account symmetries of the target density.

**Equivariant Machine Learning.** Motivated by the observation that many classic neural network architectures incorporate symmetry as an inductive bias, recent work has leveraged symmetries inherent in data through the concept of equivariance [7-9, 18, 27, 37]. Köhler et al. [26], in particular, used equivariant normalizing flows to learn symmetric densities over Euclidean space. The authors note their approach is better suited than general-purpose normalizing flows to density learning in some physical chemistry settings, since it takes into account the symmetries of the problem. Symmetries also appear naturally in the context of learning densities over manifolds. While in many cases symmetry can be a good inductive bias for learning⁴, for certain tasks it is a strict requirement. For example, Boyda et al. [3] introduced equivariant flows on SU(n) for use in lattice gauge theories, where the modelled distribution must be conjugation invariant. However, beyond conjugation invariant learning on SU(n) [3], little other work exists on learning invariant distributions over manifolds. Our work bridges this gap by introducing the first general equivariant manifold normalizing flow model for arbitrary manifolds and symmetries.

⁴ For example, asteroid impacts on the sphere can be modelled as being approximately invariant to rotation about the Earth's axis.

## 3 Background

In this section, we provide a terse overview of the concepts necessary for understanding our paper. In particular, we address fundamental notions from Riemannian geometry as well as the basic set-up of normalizing flows on manifolds. For a more detailed treatment, we refer the reader to Lee [28] for Riemannian geometry and to Kobyzev et al. [25] for normalizing flows.

### 3.1 Riemannian Geometry

A Riemannian manifold $(M, h)$ is an $n$-dimensional manifold with a smooth collection of inner products $(h_x)_{x \in M}$, one for every tangent space $T_xM$. The Riemannian metric $h$ induces a distance $d_h$ on the manifold. A diffeomorphism $f : M \to M$ is a differentiable bijection with differentiable inverse. A diffeomorphism $f : M \to M$ is called an isometry if $h(D_xf(u), D_xf(v)) = h(u, v)$ for all tangent vectors $u, v \in T_xM$, where $D_xf$ is the differential of $f$. Note that isometries preserve the manifold distance function.
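As a quick numerical illustration of the isometry condition, the following minimal sketch (assuming PyTorch; the rotation angle and tolerance are arbitrary choices) checks that a rotation about the z-axis, viewed as a map of $S^2$ embedded in $\mathbb{R}^3$, preserves the induced inner product between tangent vectors.

```python
import math
import torch

# A rotation about the z-axis maps S^2 to itself; since it is linear, its
# differential D_x f is the rotation matrix R itself, so the isometry condition
# h(D_x f(u), D_x f(v)) = h(u, v) reduces to R preserving Euclidean inner products.
c, s = math.cos(0.9), math.sin(0.9)
R = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

x = torch.nn.functional.normalize(torch.randn(3), dim=0)      # a point on S^2
u, v = (w - (w @ x) * x for w in torch.randn(2, 3))            # tangent vectors at x

print(torch.allclose((R @ u) @ (R @ v), u @ v, atol=1e-5))     # True: metric preserved
```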
The collection of all isometries forms a group $G$, which we call the isometry group of the manifold $M$. Riemannian metrics also allow for a natural analogue of gradients on $\mathbb{R}^n$. For a function $f : M \to \mathbb{R}$, we define the Riemannian gradient $\nabla_x f$ to be the vector in $T_xM$ such that $h(\nabla_x f, v) = D_xf(v)$ for all $v \in T_xM$.

### 3.2 Normalizing Flows on Manifolds

**Manifold Normalizing Flow.** Let $(M, h)$ be a Riemannian manifold. A normalizing flow on $M$ is a diffeomorphism $f_\theta : M \to M$ (parametrized by $\theta$) that transforms a prior density $\rho$ to the model density $f_{\theta*}\rho$. The model distribution can be computed via the Riemannian change of variables⁵:

$$ f_{\theta*}\rho(x) = \rho\left(f_\theta^{-1}(x)\right) \left| \det{}_h D_x f_\theta^{-1} \right|. $$

**Manifold Continuous Normalizing Flow.** A manifold continuous normalizing flow with base point $z$ is a function $\gamma : [0, \infty) \to M$ that satisfies the manifold ODE

$$ \frac{d\gamma(t)}{dt} = X(\gamma(t), t), \qquad \gamma(0) = z. $$

We define $F_{X,T} : M \to M$, $z \mapsto F_{X,T}(z)$, to map any base point $z \in M$ to the value of the CNF starting at $z$, evaluated at time $T$. This function is known as the (vector field) flow of $X$.

### 3.3 Equivariance and Invariance

Let $G$ be an isometry subgroup of $M$. We notate the action of an element $g \in G$ on $M$ by the map $L_g : M \to M$.

**Equivariant and Invariant Functions.** We say that a function $f : M \to N$ is equivariant if, for all isometries $g_x : M \to M$ and $g_y : N \to N$, we have $f \circ g_x = g_y \circ f$. We say a function $f : M \to N$ is invariant if $f \circ g_x = f$.

**Equivariant Vector Fields.** Let $X : M \times [0, \infty) \to TM$, with $X(m, t) \in T_mM$, be a time-dependent vector field on a manifold $M$ with base point $x_0 \in M$. $X$ is a $G$-equivariant vector field if for all $(m, t) \in M \times [0, \infty)$, $X(L_g m, t) = (D_m L_g)\,X(m, t)$.

**Equivariant Flows.** A flow $f : M \to M$ is $G$-equivariant if it commutes with actions from $G$, i.e. we have $L_g \circ f = f \circ L_g$.

**Invariance of Density.** A density $\rho$ on a manifold $M$ is $G$-invariant if, for all $g \in G$ and $x \in M$, $\rho(L_g x) = \rho(x)$, where $L_g$ is the action of $g$ on $x$.

## 4 Invariant Densities from Equivariant Flows

Our goal in this section is to describe a tractable way to learn a density over a manifold that obeys a symmetry given by an isometry subgroup $G$. Since this cannot be done directly, and it is not clear how a manifold continuous normalizing flow can be altered to preserve symmetry, we derive the following implications to yield a tractable solution:

1. **$G$-invariant potential $\Rightarrow$ $G$-equivariant vector field (Theorem 1).** We show that given a $G$-invariant potential function $\Phi : M \to \mathbb{R}$, the vector field $\nabla\Phi$ is $G$-equivariant.
2. **$G$-equivariant vector field $\Rightarrow$ $G$-equivariant flow (Theorem 2).** We show that a $G$-equivariant vector field on $M$ uniquely induces a $G$-equivariant flow.
3. **$G$-equivariant flow $\Rightarrow$ $G$-invariant density (Theorem 3).** We show that given a $G$-invariant prior $\rho$ and a $G$-equivariant flow $f$, the pushforward density $f_*\rho$ is $G$-invariant.

These are constructed in the same spirit as the theorems in Köhler et al. [26] (which also appeared in Papamakarios et al. [34]), although we note that our results are significantly more general. In addition to extending the domain to Riemannian manifolds, we consider arbitrary symmetry groups, while Köhler et al. [26] only consider the linear Lie group SO(n). As a result, our proof techniques are based on heavy geometric machinery instead of straightforward linear algebra techniques.

⁵ Here, $\det_h$ is the determinant function with respect to the volume induced by the Riemannian metric $h$.
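As a computational aside before detailing the individual theorems, the flow map $F_{X,T}$ defined in Section 3.2 can be approximated by simple step-wise integration that stays on the manifold. The following minimal sketch (assuming PyTorch; the vector field, step count, and renormalization retraction are illustrative choices, not the Riemannian CNF machinery of [30] used by the actual model) flows a batch of points on $S^2$ along a rotational vector field. Only the state integration is shown; the model additionally tracks the change-in-density term.

```python
import torch

def flow_map(X, z, T, n_steps=100):
    # Approximate F_{X,T}(z): integrate dx/dt = X(x, t) with explicit Euler steps,
    # re-normalizing after each step so the iterate stays on S^2 (a simple retraction).
    x, dt = z, T / n_steps
    for k in range(n_steps):
        v = X(x, torch.tensor(k * dt))                          # tangent vector at x
        x = torch.nn.functional.normalize(x + dt * v, dim=-1)   # retract back onto S^2
    return x

# Illustrative vector field: infinitesimal rotation about the z-axis, which is
# tangent to the sphere (x . v = 0) and equivariant under such rotations.
X = lambda x, t: torch.stack([-x[..., 1], x[..., 0], torch.zeros_like(x[..., 0])], dim=-1)

z = torch.nn.functional.normalize(torch.randn(4, 3), dim=-1)    # batch of base points
out = flow_map(X, z, T=1.0)
print(out.shape, torch.linalg.norm(out, dim=-1))                # (4, 3), norms close to 1
```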
If we have a prior distribution on the manifold that obeys the requisite invariance, then the above implications show that we can use a $G$-invariant potential to produce a flow that, in tandem with the CNF framework, learns an output density with the desired invariance. We claim that constructing a $G$-invariant potential function on a manifold is far simpler than directly parameterizing a $G$-invariant density or a $G$-equivariant flow. We shall give explicit examples of $G$-invariant potential constructions in Section 5.2 that induce a desired density invariance. Moreover, we show in Theorem 4 that considering equivariant flows generated from invariant potential functions suffices to learn any smooth distribution over a closed manifold, as measured by Kullback-Leibler divergence. We defer the proofs of all theorems to the appendix.

### 4.1 Equivariant Gradient of Potential Function

We start by showing how to construct $G$-equivariant vector fields from $G$-invariant potential functions. To design an equivariant vector field $X$, it is sufficient to set the vector field dynamics of $X$ to be the gradient of some $G$-invariant potential function $\Phi : M \to \mathbb{R}$. This is formalized in the following theorem.

**Theorem 1.** Let $(M, h)$ be a Riemannian manifold and $G$ be its group of isometries (or an isometry subgroup). If $\Phi : M \to \mathbb{R}$ is a smooth $G$-invariant function, then for any $g \in G$ and $u \in M$,

$$ \nabla_{L_g u}\Phi = D_u L_g\left(\nabla_u \Phi\right), $$

i.e. the gradient commutes with the group action. Hence $\nabla\Phi$ is a $G$-equivariant vector field. This condition is also tight in the sense that it holds only if $G$ is an isometry subgroup.

Hence, as long as one can construct a $G$-invariant potential function, one can obtain the desired equivariant vector field. By this construction, a parameterization of $G$-invariant potential functions yields a parameterization of (some) $G$-equivariant vector fields.

### 4.2 Constructing Equivariant Manifold Flows from Equivariant Vector Fields

To construct equivariant manifold flows, we use tools from the theory of manifold ODEs. In particular, there exists a natural correspondence between equivariant flows and equivariant vector fields. We formalize this in the following theorem:

**Theorem 2.** Let $(M, h)$ be a Riemannian manifold, and $G$ be its isometry group (or one of its subgroups). Let $X$ be any time-dependent vector field on $M$, and let $F_{X,T}$ be the flow of $X$. Then $X$ is a $G$-equivariant vector field if and only if $F_{X,T}$ is a $G$-equivariant flow.

Hence we can obtain an equivariant flow from an equivariant vector field, and vice versa.

### 4.3 Invariant Manifold Densities from Equivariant Flows

We now show that $G$-equivariant flows induce $G$-invariant densities. Note that we require the group $G$ to be an isometry subgroup in order to control the pushforward density $f_*\rho$; the following theorem does not hold for general diffeomorphism subgroups.

**Theorem 3.** Let $(M, h)$ be a Riemannian manifold, and $G$ be its isometry group (or one of its subgroups). If $\rho$ is a $G$-invariant density on $M$, and $f$ is a $G$-equivariant diffeomorphism, then $f_*\rho$ is also $G$-invariant.

In the context of manifold normalizing flows, Theorem 3 implies that if the prior density on $M$ is $G$-invariant and the flow is $G$-equivariant, the resulting output density will be $G$-invariant. In the context of the overall set-up, this reduces the problem of constructing a $G$-invariant density to the problem of constructing a $G$-invariant potential function.
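Concretely, Theorem 1 can be exercised with automatic differentiation. The sketch below (assuming PyTorch; the hand-written potential stands in for a learned neural network and is purely illustrative) builds an isotropy-invariant potential on $S^2$, obtains its Riemannian gradient by differentiating and projecting onto the tangent space, and checks numerically that the resulting vector field commutes with a rotation about the z-axis.

```python
import math
import torch

def potential(x, t):
    # Hypothetical isotropy-invariant potential on S^2: it depends only on the
    # z-coordinate (the free parameter under rotations about the z-axis) and time.
    return torch.sin(3.0 * x[..., 2]) + t * x[..., 2] ** 2

def riemannian_grad(x, t):
    # Euclidean gradient via autograd, projected onto T_x S^2; for the embedded
    # sphere the Riemannian gradient is exactly this tangential projection.
    x = x.detach().requires_grad_(True)
    (g,) = torch.autograd.grad(potential(x, t).sum(), x)
    return g - (g * x).sum(-1, keepdim=True) * x

# Equivariance check (Theorem 1): the gradient at rotated points equals the rotated gradient.
c, s = math.cos(0.7), math.sin(0.7)
R = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
x = torch.nn.functional.normalize(torch.randn(5, 3), dim=-1)
t = 0.3
lhs = riemannian_grad(x @ R.T, t)      # grad of potential evaluated at L_g x
rhs = riemannian_grad(x, t) @ R.T      # (D L_g) applied to grad of potential at x
print(torch.allclose(lhs, rhs, atol=1e-5))   # True up to floating point error
```

Replacing `potential` with a small neural network over the z-coordinate and time recovers the parameterization described in Section 5.2.1; the resulting gradient field can then be integrated step-wise, for instance with the flow-map sketch shown earlier, mirroring the model description in Section 5.1.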
### 4.4 Sufficiency of Flows Generated via Invariant Potentials

A priori, it is unclear whether equivariant flows induced by invariant potentials can learn arbitrary invariant distributions over manifolds. In particular, it is reasonable to have concerns about limited expressivity, since it is not obvious that every equivariant flow can be generated in this way. We alleviate these concerns for our use cases by proving that equivariant flows obtained from invariant potential functions suffice to learn any smooth invariant distribution over a closed manifold, as measured by Kullback-Leibler (KL) divergence.

**Theorem 4.** Let $(M, h)$ be a closed Riemannian manifold. Let $\pi$ be a smooth, non-vanishing distribution over $M$, which will act as our target distribution. Let $\rho_t$ be a distribution over said manifold parameterized by a real time variable $t$, with $\rho_0$ acting as the initial distribution. Let $D_{\mathrm{KL}}(\rho_t \,\|\, \pi)$ denote the Kullback-Leibler divergence between the distributions $\rho_t$ and $\pi$. If we choose a $g : M \to \mathbb{R}$ such that

$$ \pi = \rho_t \exp(g), $$

and if $\rho_t$ evolves with $t$ as the distribution of a flow according to $\nabla g$, it follows that

$$ \frac{\partial}{\partial t} D_{\mathrm{KL}}(\rho_t \,\|\, \pi) = -\int_M \rho_t \exp(g)\, \|\nabla g\|^2 \, dx \le 0, $$

implying convergence of $\rho_t$ to $\pi$ in KL. Moreover, the exact diffeomorphism that takes us from $\rho_0$ to $\pi$ is as follows. Given some initial point $x \in M$, let $u(t)$ be the solution to the initial value problem

$$ \frac{du(t)}{dt} = \nabla g(u(t)), \qquad u(0) = x. $$

The desired diffeomorphism maps $x$ to $\lim_{t \to \infty} u(t)$.

Hence if the target distribution is $\pi$, the current distribution is $\rho_0$, and $g$ as defined above is the potential from which the flow controlling the evolution of $\rho_t$ is obtained, then $\rho_t$ converges to $\pi$ in KL. This means that considering flows generated by invariant potential functions is sufficient to learn any smooth invariant target distribution on a closed manifold (as measured by KL divergence).

## 5 Learning Invariant Densities with Equivariant Flows

In this section, we discuss implementation details of the methodology given in Section 4. In particular, we describe the equivariant manifold flow model, provide two examples of invariant potential constructions on different manifolds, and discuss how training is performed depending on the target task.

### 5.1 Equivariant Manifold Flow Model

For our equivariant flow model, we first construct a $G$-invariant potential function $\Phi : M \to \mathbb{R}$ (we show how to construct these potentials in Section 5.2). The equivariant flow model works by using automatic differentiation [35] on $\Phi$ to obtain $\nabla\Phi$, using this $\nabla\Phi$ as the vector field, and integrating in a step-wise fashion over the manifold. Specifically, forward integration and change-in-density (divergence) computations utilize the Riemannian Continuous Normalizing Flows [30] framework. This flow model is used in tandem with a specific training procedure (described in Section 5.3) to obtain a $G$-invariant model density that approximates some target.

### 5.2 Constructing G-invariant Potential Functions

In this subsection, we present two constructions of invariant potentials on manifolds. Note that a symmetry of a manifold (i.e. action by an isometry subgroup) will leave part of the manifold free. The core idea of our invariant potential construction is to parameterize a neural network on the free portion of the manifold. While the two constructions we give below are certainly not exhaustive, they illustrate the versatility of our method, which is applicable to general manifolds and symmetries.

#### 5.2.1 Isotropy Invariance on S²

Consider the sphere $S^2$, which is the Riemannian manifold $\{v \in \mathbb{R}^3 : \|v\| = 1\}$ with the induced pullback metric.
The isotropy group for a point $v$ is defined as the subgroup of the isometry group which fixes $v$, i.e. the set of rotations around an axis that passes through $v$. In practice, we let $v = (0, 0, 1)$, so the isotropy group is the group of rotations in the $xy$-plane. An isotropy invariant density would be invariant to such rotations, and hence would look like a horizontally-striped density on the sphere (see Figure 4a).

**Invariant Potential Parameterization.** We design an invariant potential by applying a neural network to the free parameter. In the case of the specific isotropy group listed above, the free parameter is the $z$-coordinate. The invariant potential is simply a 2-input neural network with the spatial input being the $z$-coordinate and the time input being the time during integration. As a result of this design, the only variance in the learned distribution that uses this potential will be along the $z$-axis, as desired.

**Prior Distributions.** For proper learning with a normalizing flow, we need a prior distribution on the sphere that respects the isotropy invariance. There are many isotropy invariant densities on the sphere. Natural choices include the uniform density (which is invariant to all rotations) and the wrapped distribution centered at $v$ [33, 40]. For our experiments, we use the uniform density.

#### 5.2.2 Conjugation Invariance on SU(n)

For many applications in physics (specifically gauge theory and lattice quantum field theory), one works with the Lie group SU(n), the group of $n \times n$ unitary matrices with determinant 1. In particular, when modelling probability distributions on SU(n) for lattice QFT, the desired distribution must be invariant under conjugation by SU(n) [3]. Conjugation is an isometry on SU(n) (see Appendix A.5), so we can model probability distributions invariant under this action with our developed theory.

**Invariant Potential Parameterization.** We want to construct a conjugation invariant potential function $\Phi : \mathrm{SU}(n) \to \mathbb{R}$. Note that matrix conjugation preserves eigenvalues. Thus, for a function $\Phi : \mathrm{SU}(n) \to \mathbb{R}$ to be invariant to matrix conjugation, it has to act on the eigenvalues of $x \in \mathrm{SU}(n)$ as a multiset. We can parameterize such potential functions $\Phi$ with the Deep Set network from [45]. Deep Set is a permutation invariant neural network that acts on the eigenvalues, so the mapping of $x \in \mathrm{SU}(n)$ is $\Phi(x) = \hat\Phi(\{\lambda_1(x), \ldots, \lambda_n(x)\})$ for some set function $\hat\Phi$. We append the integration time to the input of the standard neural network layers in the Deep Set network. As a result of this design, the only variance in the learned distribution will be amongst non-similar matrices, while all similar matrices will be assigned the same density value.

**Prior Distributions.** For the prior distribution of the flow, we need a distribution that respects the matrix conjugation invariance. We use the Haar measure on SU(n), which is the uniform density over this manifold and is symmetric under gauge symmetry [3]. The volume element of the Haar measure is given for $x \in \mathrm{SU}(n)$ by

$$ \rho_{\mathrm{Haar}}(x) \propto \prod_{i < j} \left| \lambda_i(x) - \lambda_j(x) \right|^2, $$

where $\lambda_1(x), \ldots, \lambda_n(x)$ are the eigenvalues of $x$.
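A minimal sketch of such a conjugation-invariant potential is given below (assuming PyTorch; the class name, layer sizes, and the cosine/sine featurization of the eigenvalue angles are illustrative choices rather than the exact architecture used in the experiments). Because the eigenvalue angles are processed as a multiset, with a shared per-element network followed by sum-pooling, conjugating the input leaves the output unchanged.

```python
import torch
import torch.nn as nn

class ConjugationInvariantPotential(nn.Module):
    # Deep Set-style potential Phi: SU(n) x time -> R acting on eigenvalue angles.
    def __init__(self, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(3, hidden), nn.SiLU(),
                                 nn.Linear(hidden, hidden))       # per-eigenvalue map
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.SiLU(),
                                 nn.Linear(hidden, 1))            # applied after pooling

    def forward(self, x, t):
        # x: (batch, n, n) complex matrices in SU(n); t: scalar integration time.
        angles = torch.angle(torch.linalg.eigvals(x))             # (batch, n) angles of e^{i*theta}
        feats = torch.stack([torch.cos(angles), torch.sin(angles),
                             torch.full_like(angles, float(t))], dim=-1)
        pooled = self.phi(feats).sum(dim=-2)                      # permutation-invariant sum pool
        return self.rho(pooled).squeeze(-1)                       # one potential value per matrix

# Usage sketch: a batch of identity matrices (trivially in SU(n)).
Phi = ConjugationInvariantPotential()
x = torch.eye(3, dtype=torch.cfloat).repeat(4, 1, 1)
print(Phi(x, t=0.5).shape)  # torch.Size([4])
```

In the full model this potential would be differentiated with respect to $x$ on the SU(n) manifold to produce the equivariant vector field of Section 5.1; that step requires the Riemannian CNF machinery of [30] and is omitted from this sketch.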