# Sliced-Wasserstein Distances and Flows on Cartan-Hadamard Manifolds

Journal of Machine Learning Research 26 (2025) 1-76. Submitted 3/24; Revised 12/24; Published 1/25.

Clément Bonet (clement.bonet@ensae.fr), ENSAE, CREST, Institut Polytechnique de Paris

Lucas Drumetz (lucas.drumetz@imt-atlantique.fr), IMT Atlantique, Lab-STICC

Nicolas Courty (nicolas.courty@irisa.fr), Université Bretagne Sud, IRISA

Editor: Shiqian Ma

Abstract

While many Machine Learning methods have been developed or transposed to Riemannian manifolds to tackle data with known non-Euclidean geometry, Optimal Transport (OT) methods on such spaces have not received much attention. The main OT tool on these spaces is the Wasserstein distance, which suffers from a heavy computational burden. On Euclidean spaces, a popular alternative is the Sliced-Wasserstein distance, which leverages the closed-form solution of the Wasserstein distance in one dimension, but which is not readily available on manifolds. In this work, we derive general constructions of Sliced-Wasserstein distances on Cartan-Hadamard manifolds, Riemannian manifolds with non-positive curvature, which include, among others, Hyperbolic spaces and the space of Symmetric Positive Definite matrices. We then propose different applications, such as the classification of documents with a suitably learned ground cost on a manifold, and data set comparison on a product manifold. Additionally, we derive non-parametric schemes to minimize these new distances by approximating their Wasserstein gradient flows.

Keywords: Optimal Transport, Sliced-Wasserstein, Riemannian Manifolds, Cartan-Hadamard Manifolds, Wasserstein Gradient Flows

1. Introduction

It is widely accepted that data have an underlying structure on a low-dimensional manifold (Bengio et al., 2013). However, it can be challenging to work directly on such data manifolds because of the absence of an analytical model. Therefore, most works only focus on Euclidean space and do not take advantage of this representation. In some cases though, the data naturally and explicitly lie on a manifold, or can be embedded in some known manifold, allowing leveraging their intrinsic structure. In such cases, it has been shown to be beneficial to exploit this structure by using the metric of the manifold rather than relying on a Euclidean embedding. To name a few examples, directional or geophysical data, for which only the direction provides information, naturally lie on the sphere (Mardia et al., 2000), and hence their structure can be exploited by using methods suited to the sphere. Another popular example is given by data with a known hierarchical structure, which benefit from being embedded in hyperbolic spaces (Nickel and Kiela, 2017).

Motivated by these examples, many works have proposed new tools to handle data lying on Riemannian manifolds. To cite a few, Fletcher et al. (2004) and Huckemann and Ziezold (2006) developed PCA to perform dimension reduction on manifolds, while Le Brigant and Puechmorel (2019) studied density approximation, Feragen et al. (2015); Jayasumana et al. (2015); Fang et al. (2021) studied kernel methods, and Azangulov et al. (2024a,b) developed Gaussian processes on (homogeneous) manifolds.
More recently, there has been much interest in developing new neural networks with architectures that take into account the geometry of the ambient manifold (Bronstein et al., 2017), such as Residual Neural Networks (Katsman et al., 2024), discrete Normalizing Flows (Bose et al., 2020; Rezende et al., 2020; Rezende and Racanière, 2021) or Continuous Normalizing Flows (Mathieu and Nickel, 2020; Lou et al., 2020; Rozen et al., 2021; Yataka et al., 2023). In the generative model literature, we can also mention the recent work by Chen and Lipman (2023), which extended the flow matching training of Continuous Normalizing Flows to Riemannian manifolds, as well as the works by Bortoli et al. (2022) and Huang et al. (2022), who performed score-based generative modeling, and Thornton et al. (2022), who studied Schrödinger bridges on manifolds.

To compare probability distributions or perform generative modeling tasks, one usually needs suitable discrepancies or distances. In Machine Learning, classical divergences include, for example, the Kullback-Leibler divergence and the Maximum Mean Discrepancy (MMD). While these divergences are well defined for distributions lying on Riemannian manifolds, generally by crafting dedicated kernels for the MMD (Feragen et al., 2015), other choices that better take into account the geometry of the underlying space are Optimal Transport based distances, whose most prominent example is the Wasserstein distance. While the Wasserstein distance is well defined on Riemannian manifolds and has been studied theoretically in many works, see e.g. (McCann, 2001; Villani et al., 2009), it suffers from a significant computational burden. In the Euclidean case, various approaches have been proposed to alleviate this computational cost, such as adding an entropic regularization and leveraging the Sinkhorn algorithm (Cuturi, 2013), approximating the Wasserstein distance using minibatches (Fatras et al., 2020), using low-rank couplings (Scetbon and Cuturi, 2022), or tree metrics (Le et al., 2019). These approximations can easily be extended to Riemannian manifolds by using the right ground costs. For example, Alvarez-Melis et al. (2020) and Hoyos-Idrobo (2020) used the entropic regularized formulation on Hyperbolic spaces.

Another popular alternative to the Wasserstein distance is the Sliced-Wasserstein distance (SW). While on Euclidean spaces the Sliced-Wasserstein distance is a tractable alternative that allows working in large-scale settings, it cannot be directly extended to Riemannian manifolds, since it relies on orthogonal projections of the measures onto straight lines. Hence, deriving new SW-based distances on manifolds could be of much interest. This question has led to several works in this direction, first on compact manifolds in (Rustamov and Majumdar, 2023) and then on the sphere (Bonet et al., 2023b; Quellmalz et al., 2023). Here, we focus on the particular case of Cartan-Hadamard manifolds, which encompass, in particular, Euclidean spaces, Hyperbolic spaces and Symmetric Positive Definite matrices endowed with appropriate metrics. More precisely, we develop a theoretically grounded way to define Sliced-Wasserstein distances on any Cartan-Hadamard manifold. We discuss their implementation in various specific cases, including Pullback-Euclidean manifolds, Hyperbolic spaces, Symmetric Positive Definite matrices, and product manifolds.
Furthermore, we propose applications to different machine learning tasks, such as document classification and data set comparison, and we discuss the minimization of these discrepancies using the framework of Wasserstein gradient flows.

Contributions. In this article, we start in Section 2 by providing some background on Optimal Transport and on Riemannian manifolds. Then, in Section 3, we introduce different ways to construct intrinsic Sliced-Wasserstein discrepancies on geodesically complete Riemannian manifolds with non-positive curvature (Cartan-Hadamard manifolds), using either geodesic projections or horospherical projections. In Section 4, we specialize the framework to different Cartan-Hadamard manifolds, including manifolds endowed with a pullback Euclidean metric, Hyperbolic spaces, Symmetric Positive Definite matrices with specific metrics, and products of Cartan-Hadamard manifolds. Then, in Section 5, we derive some theoretical properties common to any sliced discrepancy on these Riemannian manifolds, as well as properties specific to the pullback Euclidean case. We also propose in Section 6 applications of the Sliced-Wasserstein distance on the Euclidean space endowed with the Mahalanobis distance to a document classification task, and of the Sliced-Wasserstein distance on product manifolds to comparing data sets represented on the product space of the samples and of the labels. Finally, we propose in Section 7 non-parametric schemes to minimize these different distances using Wasserstein gradient flows, hence allowing us to derive new sampling algorithms on manifolds.¹

1. Code available at https://github.com/clbonet/Sliced-Wasserstein_Distances_and_Flows_on_Cartan-Hadamard_Manifolds

2. Background

In this section, we first introduce background on Optimal Transport through the Wasserstein distance and the Sliced-Wasserstein distance on Euclidean spaces. Then, we introduce general Riemannian manifolds. For more details about Optimal Transport, we refer to (Villani et al., 2009; Santambrogio, 2015; Peyré et al., 2019). For more details about Riemannian manifolds, we refer to (Gallot et al., 1990; Lee, 2006, 2012).

2.1 Optimal Transport on Euclidean Spaces

Wasserstein Distance. Optimal transport provides a principled way to compare probability distributions through the Wasserstein distance. Let $p \ge 1$ and let $\mu, \nu \in \mathcal{P}_p(\mathbb{R}^d) = \{\mu \in \mathcal{P}(\mathbb{R}^d),\ \int \|x\|_2^p\,\mathrm{d}\mu(x) < \infty\}$ be two probability distributions with finite moments of order $p$. Then, the Wasserstein distance is defined as

$$W_p^p(\mu,\nu) = \inf_{\gamma \in \Pi(\mu,\nu)} \int \|x - y\|_2^p\, \mathrm{d}\gamma(x,y),$$

where $\Pi(\mu,\nu) = \{\gamma \in \mathcal{P}(\mathbb{R}^d\times\mathbb{R}^d),\ \pi^1_\#\gamma = \mu,\ \pi^2_\#\gamma = \nu\}$ denotes the set of couplings between $\mu$ and $\nu$, $\pi^1(x,y) = x$, $\pi^2(x,y) = y$, and $\#$ is the push-forward operator, defined as $T_\#\mu(A) = \mu(T^{-1}(A))$ for any Borel set $A\subset\mathbb{R}^d$ and map $T:\mathbb{R}^d\to\mathbb{R}^d$. For discrete probability distributions with $n$ samples, e.g., for $\mu = \frac1n\sum_{i=1}^n\delta_{x_i}$ and $\nu = \frac1n\sum_{j=1}^n\delta_{y_j}$ with $x_1,\dots,x_n,y_1,\dots,y_n\in\mathbb{R}^d$, computing $W_p^p$ requires solving a linear program, which has an $O(n^3\log n)$ worst-case complexity (Pele and Werman, 2009). Thus, it becomes intractable in large-scale settings.

For unknown probability distributions $\mu$ and $\nu$, from which we have access to samples $x_1,\dots,x_n\sim\mu$ and $y_1,\dots,y_n\sim\nu$, a common practice to estimate $W_p^p(\mu,\nu)$ is to compute the plug-in estimator $\widehat{W}_p^p(\mu,\nu) = W_p^p(\hat\mu_n,\hat\nu_n)$, where $\hat\mu_n = \frac1n\sum_{i=1}^n\delta_{x_i}$ and $\hat\nu_n = \frac1n\sum_{i=1}^n\delta_{y_i}$ denote the empirical measures. However, the approximation error, known as the sample complexity, is quantified in $O(n^{-1/d})$ (Boissard and Le Gouic, 2014).
Thus, estimating the Wasserstein distance becomes less accurate in higher dimensions with the same sample size, or computationally expensive if larger samples are used to maintain accuracy. To alleviate the computational burden and the curse of dimensionality, different variants have been proposed. We focus in this work on the Sliced-Wasserstein distance.

Sliced-Wasserstein Distance. For $\mu,\nu\in\mathcal{P}_p(\mathbb{R})$, it is well known that the Wasserstein distance can be computed in closed form (Peyré et al., 2019, Remark 2.30). More precisely, let $\mu,\nu\in\mathcal{P}_p(\mathbb{R})$, then

$$W_p^p(\mu,\nu) = \int_0^1 \big|F_\mu^{-1}(u) - F_\nu^{-1}(u)\big|^p\,\mathrm{d}u, \tag{1}$$

where $F_\mu^{-1}$ and $F_\nu^{-1}$ denote the quantile functions of $\mu$ and $\nu$. For discrete distributions with $n$ samples, quantiles can be computed in $O(n\log n)$, since they only require sorting the samples. Thus, for $x_1 < \dots < x_n$, $y_1 < \dots < y_n$, $\mu_n = \frac1n\sum_{i=1}^n\delta_{x_i}$ and $\nu_n = \frac1n\sum_{i=1}^n\delta_{y_i}$,

$$W_p^p(\mu_n,\nu_n) = \frac1n\sum_{i=1}^n |x_i - y_i|^p.$$

Motivated by this closed form, Rabin et al. (2012) introduced the Sliced-Wasserstein distance, which is defined by first projecting the distributions linearly on every possible direction, and then taking the average of the one-dimensional Wasserstein distances on each line. More precisely, for a direction $\theta\in S^{d-1}$, the coordinate of the orthogonal projection of $x\in\mathbb{R}^d$ on the line $\mathrm{span}(\theta)$ is given by $P^\theta(x) = \langle x,\theta\rangle$. Then, denoting by $\lambda$ the uniform measure on the sphere $S^{d-1} = \{\theta\in\mathbb{R}^d,\ \|\theta\|_2 = 1\}$, the $p$-Sliced-Wasserstein distance between $\mu,\nu\in\mathcal{P}_p(\mathbb{R}^d)$ is defined as

$$\mathrm{SW}_p^p(\mu,\nu) = \int_{S^{d-1}} W_p^p(P^\theta_\#\mu, P^\theta_\#\nu)\,\mathrm{d}\lambda(\theta).$$

The projection process is illustrated in Figure 1. Since the outer integral is intractable, a common practice to estimate it is to rely on a Monte-Carlo approximation, by first sampling $L$ directions $\theta_1,\dots,\theta_L$ and then taking the average of the $L$ Wasserstein distances:

$$\widehat{\mathrm{SW}}_p^p(\mu,\nu) = \frac1L\sum_{\ell=1}^L W_p^p\big(P^{\theta_\ell}_\#\mu, P^{\theta_\ell}_\#\nu\big).$$

Thus, approximating SW requires computing $L$ Wasserstein distances and $Ln$ projections, resulting in a computational complexity of $O\big(Ln(\log n + d)\big)$.

Figure 1: (Left) Orthogonal projection of points on a line passing through the origin 0. (Middle and Right) Illustration of the projection of 2d source and target distributions on 3 different lines (θ = 40°, 100°, 160°).

Note that other integral approximations have been recently proposed. For example, Nguyen et al. (2024) proposed to use quasi-Monte-Carlo samples, and Leluc et al. (2023, 2024); Nguyen and Ho (2024a) used control variates to reduce the variance of the approximation.
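To make the estimator $\widehat{\mathrm{SW}}_p^p$ concrete, the following is a minimal NumPy sketch of the Monte-Carlo approximation (the function and variable names are ours, not those of the paper's code base): sample directions on the sphere, project, sort, and average.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projs=50, p=2, rng=None):
    """Monte-Carlo estimate of SW_p^p between two empirical measures with
    the same number of samples and uniform weights (X, Y of shape (n, d))."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Sample L directions uniformly on the sphere S^{d-1}.
    theta = rng.standard_normal((n_projs, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project on each line span(theta): coordinates <x, theta>.
    X_proj, Y_proj = X @ theta.T, Y @ theta.T            # shape (n, n_projs)
    # 1D Wasserstein distances in closed form (Equation (1)): sort and average.
    X_sorted, Y_sorted = np.sort(X_proj, axis=0), np.sort(Y_proj, axis=0)
    return np.mean(np.abs(X_sorted - Y_sorted) ** p)

# Example: two Gaussian point clouds in R^10.
rng = np.random.default_rng(0)
X, Y = rng.standard_normal((500, 10)), rng.standard_normal((500, 10)) + 1.0
print(sliced_wasserstein(X, Y, n_projs=200))
```

The cost of this sketch matches the $O(Ln(\log n + d))$ complexity stated above, with the sort dominating for large $n$.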
We are now interested in transposing this method to Riemannian manifolds, for which we give a short introduction in the following section.

2.2 Riemannian Manifolds

Definition. A Riemannian manifold $(M,g)$ of dimension $d$ is a space that behaves locally as a linear space diffeomorphic to $\mathbb{R}^d$, called a tangent space. To any $x\in M$, one can associate a tangent space $T_xM$ endowed with an inner product $\langle\cdot,\cdot\rangle_x: T_xM\times T_xM\to\mathbb{R}$ that varies smoothly with $x$. This inner product is defined by the metric $g_x$ associated to the Riemannian manifold, as $g_x(u,v) = \langle u,v\rangle_x$ for any $x\in M$, $u,v\in T_xM$. We denote $G(x)$ the matrix representation of $g_x$, defined such that

$$\forall u,v\in T_xM,\quad \langle u,v\rangle_x = g_x(u,v) = u^T G(x) v.$$

In some spaces, different metrics can give very different geometries. We call tangent bundle the disjoint union of all tangent spaces, $TM = \{(x,v),\ x\in M \text{ and } v\in T_xM\}$, and we call a vector field a map $V: M\to TM$ such that $V(x)\in T_xM$ for all $x\in M$.

Geodesics. A generalization of straight lines in Euclidean spaces to Riemannian manifolds is given by geodesics, which are smooth curves connecting two points $x,y\in M$ with minimal length, i.e., continuous mappings $\gamma:[0,1]\to M$ such that $\gamma(0)=x$, $\gamma(1)=y$, and which minimize the length $L$ defined as

$$L(\gamma) = \int_0^1 \|\gamma'(t)\|_{\gamma(t)}\,\mathrm{d}t, \quad\text{where } \|\gamma'(t)\|_{\gamma(t)} = \sqrt{\langle\gamma'(t),\gamma'(t)\rangle_{\gamma(t)}}.$$

In this work, we will focus on geodesically complete Riemannian manifolds, in which case there is always a geodesic between two points $x,y\in M$. Furthermore, in this specific case, all geodesics are actually geodesic lines, i.e., they can be extended to $\mathbb{R}$. Let $x,y\in M$ and $\gamma:[0,1]\to M$ a geodesic between $x$ and $y$ such that $\gamma(0)=x$ and $\gamma(1)=y$; then the value of the length actually defines a distance between $x$ and $y$, which we call the geodesic distance:

$$d(x,y) = \inf_{\gamma,\ \gamma(0)=x,\ \gamma(1)=y} L(\gamma).$$

Figure 2: Illustration of geodesics, of the tangent space and of the exponential map on a Riemannian manifold.

Exponential Map. Let $x\in M$; then for any $v\in T_xM$, there exists a unique geodesic $\gamma_{(x,v)}$ starting at $x$ with velocity $v$, i.e., such that $\gamma_{(x,v)}(0) = x$ and $\gamma'_{(x,v)}(0) = v$ (Sommer et al., 2020). We can now define the exponential map $\exp: TM\to M$ which, for any $x\in M$, maps tangent vectors $v\in T_xM$ back to the manifold, at the point reached by the geodesic $\gamma_{(x,v)}$ at time $t=1$:

$$\forall (x,v)\in TM,\quad \exp_x(v) = \gamma_{(x,v)}(1).$$

On geodesically complete manifolds, the exponential map is defined on the entire tangent space, but it is not necessarily a bijection. When it is, we denote $\log_x$ the inverse of $\exp_x$, which allows mapping elements from the manifold to the tangent space. We illustrate these different notions in Figure 2.

Let $f: M\to\mathbb{R}$ be a differentiable map. We now define its Riemannian gradient, which is notably very important in order to generalize first-order optimization algorithms to Riemannian manifolds (Bonnabel, 2013; Boumal, 2023).

Definition 1 (Gradient) We define the Riemannian gradient of $f$ as the unique vector field $\operatorname{grad}_M f: M\to TM$ satisfying

$$\forall (x,v)\in TM,\quad \frac{\mathrm{d}}{\mathrm{d}t} f\big(\exp_x(tv)\big)\Big|_{t=0} = \big\langle v, \operatorname{grad}_M f(x)\big\rangle_x.$$

Sectional Curvature. A notion that allows studying the geometry as well as the topology of a given Riemannian manifold is the sectional curvature. Consider $x\in M$ and two linearly independent vectors $u,v\in T_xM$. The sectional curvature $\kappa_x(u,v)$ is defined geometrically as the Gaussian curvature of the plane $E = \mathrm{span}(u,v)$ (Zhang et al., 2016), i.e.,

$$\kappa_x(u,v) = \frac{\langle R(u,v)u, v\rangle_x}{\langle u,u\rangle_x\langle v,v\rangle_x - \langle u,v\rangle_x^2},$$

where $R$ denotes the Riemannian curvature tensor. We refer to (Lee, 2006) for more details. The behavior of geodesics changes with the curvature of the manifold. For instance, they usually diverge on manifolds of negative sectional curvature and converge on manifolds of positive sectional curvature (Hu et al., 2023). Important examples of Riemannian manifolds include Euclidean spaces, which have constant null curvature, the sphere, which has constant positive curvature, and Hyperbolic spaces, which have constant negative curvature (i.e., the curvature has the same value at any point $x\in M$ and for any 2-plane $E$) with their standard metrics. Another example is the torus endowed with the ambient metric, which has some points of positive curvature, some points of negative curvature and some points of null curvature (de Ocáriz Borde et al., 2023b). In this paper, we will mostly focus on Cartan-Hadamard manifolds, which are complete connected Riemannian manifolds of non-positive sectional curvature.
2.3 Probability Distributions on Riemannian Manifolds

Probability Distributions. Let $(M,g)$ be a Riemannian manifold. For $x\in M$, $G(x)$ induces an infinitesimal change of volume on the tangent space $T_xM$, resulting in a measure on the manifold,

$$\mathrm{d}\mathrm{Vol}(x) = \sqrt{\det G(x)}\,\mathrm{d}x.$$

Here, we denote by $\mathrm{d}x$ the Lebesgue measure. We refer to (Pennec, 2006) for more details on distributions on manifolds. Particularly interesting examples of probability distributions are wrapped distributions (Chevallier and Guigui, 2020; Chevallier et al., 2022; Galaz-Garcia et al., 2022), which are defined as the push-forward of a distribution $\mu\in\mathcal{P}(T_xM)$ onto $\mathcal{P}(M)$ using, e.g., the exponential map when it is invertible over the whole tangent space. Since this provides a very convenient way to sample on manifolds, it has received much attention, notably on hyperbolic spaces with the wrapped normal distribution (Nagano et al., 2019; Cho et al., 2022b), for which the distribution in the tangent space is a Gaussian and all transformations are differentiable, so that it can be used e.g. in variational autoencoders, since it is amenable to the reparametrization trick.

Optimal Transport. Optimal Transport is also well defined on Riemannian manifolds, using appropriate ground costs in the Kantorovich problem. Using the geodesic distance at the power $p\ge1$, we recover the $p$-Wasserstein distance (McCann, 2001; Villani et al., 2009)

$$W_p^p(\mu,\nu) = \inf_{\gamma\in\Pi(\mu,\nu)} \int_{M\times M} d(x,y)^p\,\mathrm{d}\gamma(x,y),$$

where $\mu,\nu\in\mathcal{P}_p(M) = \{\mu\in\mathcal{P}(M),\ \int_M d(x,o)^p\,\mathrm{d}\mu(x) < \infty\}$, with $o\in M$ some origin which can be arbitrarily chosen (by the triangle inequality).
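As an illustration of the wrapped-distribution construction above, the following is a minimal sketch of sampling from a wrapped normal on the Lorentz model of curvature $-1$ (introduced in Section 4.2): a Gaussian is drawn in the tangent space at the origin $x^0 = (1,0,\dots,0)$ and pushed forward through $\exp_{x^0}$. The function names are ours, and the exponential map used is the hyperboloid formula recalled in Section 4.2.

```python
import numpy as np

def sample_wrapped_normal(n, Sigma, rng=None):
    """Sample n points from a wrapped normal on the Lorentz model L^d (K = -1):
    push-forward of N(0, Sigma) on T_{x0} L^d through the exponential map."""
    rng = np.random.default_rng(rng)
    d = Sigma.shape[0]
    x0 = np.zeros(d + 1); x0[0] = 1.0                      # origin of L^d
    v_t = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    v = np.concatenate([np.zeros((n, 1)), v_t], axis=1)    # v = (0, v_t) in T_{x0}
    nrm = np.linalg.norm(v_t, axis=1, keepdims=True)       # ||v||_L = ||v_t||_2 at x0
    direction = np.divide(v, nrm, out=np.zeros_like(v), where=nrm > 0)
    # exp_{x0}(v) = cosh(||v||) x0 + sinh(||v||) v / ||v||
    return np.cosh(nrm) * x0 + np.sinh(nrm) * direction

samples = sample_wrapped_normal(1000, 0.1 * np.eye(2), rng=0)
# Sanity check: all samples satisfy <x, x>_L = -1, i.e., lie on L^2.
print(np.allclose(-samples[:, 0]**2 + np.sum(samples[:, 1:]**2, axis=1), -1.0))
```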
3. Riemannian Sliced-Wasserstein

In this section, we introduce natural generalizations of the Sliced-Wasserstein distance for probability distributions supported on Riemannian manifolds, leveraging tools that are intrinsically defined on these spaces. To achieve this, we begin by examining the Euclidean space from a Riemannian manifold perspective. Doing so, we naturally extend the Sliced-Wasserstein distance to Riemannian manifolds of non-positive curvature. The proofs of this section are postponed to Appendix B.

3.1 Euclidean Sliced-Wasserstein as a Riemannian Sliced-Wasserstein Distance

It is well known that the Euclidean space can be viewed as a Riemannian manifold of constant null curvature (Lee, 2006). From this perspective, we can translate the elements used to build the Sliced-Wasserstein distance into Riemannian terms, and identify how to generalize it to more general Riemannian manifolds. First, let us recall that the $p$-Sliced-Wasserstein distance for $p\ge1$ between $\mu,\nu\in\mathcal{P}_p(\mathbb{R}^d)$ is defined as

$$\mathrm{SW}_p^p(\mu,\nu) = \int_{S^{d-1}} W_p^p(P^\theta_\#\mu, P^\theta_\#\nu)\,\mathrm{d}\lambda(\theta),$$

where $P^\theta(x) = \langle x,\theta\rangle$ and $\lambda$ is the uniform distribution on $S^{d-1}$. Geometrically, it amounts to projecting the distributions on every possible line passing through the origin 0. Hence, we first need to generalize lines passing through the origin. Next, we need to find suitable projections onto these subsets. Finally, we need to ensure that we can still compute the Wasserstein distance efficiently between distributions supported on these subsets, to maintain a computational advantage over solving the linear program.

Lines. From a Riemannian manifold point of view, straight lines can be seen as geodesics, which are, as we saw in Section 2.2, curves minimizing the distance between any two points on them. For any direction $\theta\in S^{d-1}$, the geodesic passing through 0 in direction $\theta$ is described by the curve $\gamma_\theta:\mathbb{R}\to\mathbb{R}^d$ defined as $\gamma_\theta(t) = t\theta = \exp_0(t\theta)$ for any $t\in\mathbb{R}$, and the corresponding geodesic is $G_\theta = \mathrm{span}(\theta)$. Hence, when it makes sense, a natural generalization of projections onto straight lines is given by projections onto geodesics passing through an origin.

Projections. The projection $P^\theta(x)$ of $x\in\mathbb{R}^d$ can be seen as the coordinate of the orthogonal projection on the geodesic $G_\theta$. Indeed, the orthogonal projection $\tilde P^\theta$ is formally defined as

$$\tilde P^\theta(x) = \operatorname{argmin}_{y\in G_\theta} \|x - y\|_2 = \langle x,\theta\rangle\theta.$$

From this formulation, we see that $\tilde P^\theta$ is a metric projection, which can also be called a geodesic projection on Riemannian manifolds, as the metric is a geodesic distance. Then, its coordinate on $G_\theta$ is $t = \langle x,\theta\rangle = P^\theta(x)$, which can also be obtained by first giving a direction to the geodesic, and then computing the distance between $\tilde P^\theta(x)$ and the origin 0:

$$P^\theta(x) = \operatorname{sign}\big(\langle x,\theta\rangle\big)\,\big\|\langle x,\theta\rangle\theta - 0\big\|_2 = \langle x,\theta\rangle.$$

Note that this coordinate can also be recovered by solving

$$P^\theta(x) = \operatorname{argmin}_{t\in\mathbb{R}} \big\|\exp_0(t\theta) - x\big\|_2.$$

This formulation will be useful to generalize the projection to more general manifolds, by replacing the Euclidean distance with the right geodesic distance. Note also that the geodesic projection can be seen as a projection along hyperplanes, i.e., the level sets of the projection function $g(x,\theta) = \langle x,\theta\rangle$ are (affine) hyperplanes. This observation will come in useful when generalizing SW to manifolds of non-positive curvature.

Wasserstein Distance. The Wasserstein distance between measures lying on the real line has a closed form which can be computed very efficiently (see Section 2.1). On more general Riemannian manifolds, as the geodesics will not necessarily be lines, we will need to check how to compute the Wasserstein distance between the projected measures.

3.2 On Manifolds of Non-Positive Curvature

In this section, we focus on complete connected Riemannian manifolds of non-positive curvature, also known as Hadamard manifolds or Cartan-Hadamard manifolds (Lee, 2006; Robbin and Salamon, 2011; Lang, 2012). These spaces include Euclidean spaces, but also spaces with constant negative curvature, such as Hyperbolic spaces, and spaces with variable non-positive curvature, such as the space of Symmetric Positive Definite matrices and products of manifolds with constant negative curvature (Gu et al., 2019, Lemma 1). We refer to (Ballmann et al., 2006) or (Bridson and Haefliger, 2013) for more details. These spaces share many properties with Euclidean spaces (Bertrand and Kloeckner, 2012), which make it possible to extend the Sliced-Wasserstein distance to them. In the following, we denote by $(M,g)$ a Hadamard manifold. Particular cases, such as Hyperbolic spaces and the space of Symmetric Positive Definite matrices, among others, will be further studied in Section 4.

Properties of Hadamard Manifolds. First, since a Hadamard manifold is a complete connected Riemannian manifold, by the Hopf-Rinow theorem (Lee, 2006, Theorem 6.13), it is also geodesically complete. Therefore, any geodesic curve $\gamma:[0,1]\to M$ connecting $x\in M$ to $y\in M$ can be extended to $\mathbb{R}$ as a geodesic line. Furthermore, by the Cartan-Hadamard theorem (Lee, 2006, Theorem 11.5), Hadamard manifolds are diffeomorphic to the Euclidean space $\mathbb{R}^d$, and the exponential map at any $x\in M$ is bijective from $T_xM$ to $M$, with the logarithm map as its inverse.
Moreover, their injectivity radius is infinite, and thus their geodesics are aperiodic and can be mapped to the real line, allowing us to find coordinates on the real line, and hence to compute the Wasserstein distance between the projected measures efficiently. The SW discrepancy on such spaces is therefore analogous to the Euclidean case. Note that Hadamard manifolds belong to the more general class of CAT(0) metric spaces, and hence inherit their properties, described in (Bridson and Haefliger, 2013). We now discuss two different possible projections, which both generalize the Euclidean orthogonal projection.

Geodesic Projections. As we saw in Section 3.1, a natural projection onto geodesics is the geodesic projection. Let $G$ be a geodesic passing through an origin point $o\in M$. This origin is often naturally chosen on the space and corresponds to the analog of 0 in $\mathbb{R}^d$. Then, the geodesic projection onto $G$ is obtained naturally as

$$\forall x\in M,\quad \tilde P^G(x) = \operatorname{argmin}_{y\in G}\ d(x,y).$$

From the projection, we can obtain a coordinate on the geodesic by first assigning it a direction and then computing the distance to the origin. Denoting $v\in T_oM$ a vector in the tangent space at the origin such that $G = G_v = \{\exp_o(tv),\ t\in\mathbb{R}\}$, we can give a direction to the geodesic by computing the sign of the inner product, in the tangent space at $o$, between $v$ and the log of the projection. Analogously to the Euclidean case, we can restrict $v$ to be of unit norm, i.e., $\|v\|_o = 1$. We now denote the projection and the coordinate projection on $G_v$ by $\tilde P^v$ and $P^v$, respectively. Hence, we obtain the coordinates as

$$P^v(x) = \operatorname{sign}\Big(\big\langle \log_o \tilde P^v(x), v\big\rangle_o\Big)\, d\big(\tilde P^v(x), o\big).$$

We show in Proposition 2 that the map $t^v: G_v\to\mathbb{R}$ defined as

$$\forall x\in G_v,\quad t^v(x) = \operatorname{sign}\big(\langle\log_o(x), v\rangle_o\big)\, d(x,o), \tag{2}$$

is an isometry, i.e., it satisfies $|t^v(x) - t^v(y)| = d(x,y)$ for all $x,y\in G_v$.

Proposition 2 Let $(M,g)$ be a Hadamard manifold with origin $o$, and let $v\in T_oM$. The map $t^v$ defined in Equation (2) is an isometry from $G_v = \{\exp_o(tv),\ t\in\mathbb{R}\}$ to $\mathbb{R}$.

Note that to obtain the coordinate directly from $x\in M$, we can also solve the following problem:

$$P^v(x) = \operatorname{argmin}_{t\in\mathbb{R}}\ d\big(\exp_o(tv), x\big). \tag{3}$$

Since Hadamard manifolds belong to the more general class of CAT(0) metric spaces, by (Bridson and Haefliger, 2013, II. Proposition 2.2), the geodesic distance is geodesically convex. Hence, $t\mapsto d\big(\exp_o(tv), x\big)^2$ is a coercive, strictly convex function, and thus it admits a unique minimizer. Therefore, Equation (3) is well defined. Moreover, we have the following characterization of the optimum:

Proposition 3 Let $(M,g)$ be a Hadamard manifold with origin $o$. Let $v\in T_oM$, and denote $\gamma(t) = \exp_o(tv)$ for all $t\in\mathbb{R}$. Then, for any $x\in M$,

$$P^v(x) = \operatorname{argmin}_{t\in\mathbb{R}}\ d\big(\gamma(t), x\big)^2 \iff \Big\langle \gamma'\big(P^v(x)\big),\ \log_{\gamma(P^v(x))}(x)\Big\rangle_{\gamma(P^v(x))} = 0.$$

In the Euclidean case $\mathbb{R}^d$, geodesics are of the form $\gamma(t) = t\theta$ for any $t\in\mathbb{R}$ and direction $\theta\in S^{d-1}$. Since $\log_x(y) = y - x$ for $x,y\in\mathbb{R}^d$, we recover the projection formula:

$$\big\langle\theta,\ x - P^\theta(x)\theta\big\rangle = 0 \iff P^\theta(x) = \langle\theta, x\rangle.$$

Busemann Projections. The level sets of the geodesic projections are geodesic subspaces. It has been shown that projecting along geodesics is not always the best solution, as it might not preserve distances between the original points (Chami et al., 2021). Indeed, on Euclidean spaces, as mentioned earlier, the projections are done along hyperplanes, which preserve the distances between points belonging to another geodesic with the same direction (see Figure 3).
On Hadamard manifolds, hyperplane analogs can be obtained through the level sets of the Busemann function, which we now introduce. Let $\gamma:\mathbb{R}\to M$ be a geodesic line; then the Busemann function associated to $\gamma$ is defined as (Bridson and Haefliger, 2013, II. Definition 8.17)

$$\forall x\in M,\quad B_\gamma(x) = \lim_{t\to\infty}\big(d(x,\gamma(t)) - t\big).$$

Figure 3: (Left) On Euclidean spaces, the distance between the projections of two points belonging to a geodesic with the same direction is conserved. (Middle) On Hyperbolic spaces, this is also the case using the horospherical projection, as demonstrated in (Chami et al., 2021, Proposition 3.4), but not for geodesic projections (Right).

On Hadamard manifolds, and more generally on CAT(0) spaces, the limit exists (Bridson and Haefliger, 2013, II. Lemma 8.18). This function returns a coordinate on the geodesic $\gamma$, which can be understood as a normalized distance to infinity towards the direction given by $\gamma$ (Chami et al., 2021). The level sets of this function are called horospheres. On spaces of constant curvature (i.e., Euclidean or Hyperbolic spaces), horospheres are of constant null curvature and hence very similar to hyperplanes. We illustrate horospheres in Hyperbolic spaces in the middle of Figure 3 and in Figure 5. For example, in the Euclidean case, we can show that the Busemann function associated with $G_\theta = \mathrm{span}(\theta)$ for $\theta\in S^{d-1}$ is given by

$$\forall x\in\mathbb{R}^d,\quad B^\theta(x) = -\langle x,\theta\rangle.$$

It actually coincides, up to a sign, with the inner product, which can be seen as a coordinate on the geodesic $G_\theta$. Moreover, its level sets in this case are (affine) hyperplanes orthogonal to $\theta$. Hence, the Busemann function offers a principled way to project elements $x\in M$ of a Hadamard manifold onto $\mathbb{R}$, provided its closed form can be computed. To find the projection onto the geodesic $\gamma$, we can solve in $s\in\mathbb{R}$ the equation $B_\gamma(x) = B_\gamma(\gamma(s)) = -s$, and we find that the projection onto the geodesic $\gamma$, characterized by $v\in T_oM$ with $\|v\|_o = 1$ and $\gamma(t) = \exp_o(tv)$ for all $t\in\mathbb{R}$, is

$$\tilde B^v(x) = \exp_o\big(-B^v(x)\,v\big),$$

where we write $B^v = B_\gamma$ for the Busemann function associated with $G_v$.

Wasserstein Distance on Geodesics. In Proposition 4, we verify that the Wasserstein distance between the coordinates (on $\mathcal{P}_p(\mathbb{R})$) is equal to the Wasserstein distance between the measures projected onto geodesics (on $\mathcal{P}_p(M)$). This relies on the isometry property of $t^v$ derived in Proposition 2.

Proposition 4 Let $(M,g)$ be a Hadamard manifold, $p\ge1$ and $\mu,\nu\in\mathcal{P}_p(M)$. Let $v\in T_oM$ such that $\|v\|_o = 1$, and $G_v = \{\exp_o(tv),\ t\in\mathbb{R}\}$ the geodesic on which the measures are projected. Then,

$$W_p^p(\tilde P^v_\#\mu, \tilde P^v_\#\nu) = W_p^p(P^v_\#\mu, P^v_\#\nu), \qquad W_p^p(\tilde B^v_\#\mu, \tilde B^v_\#\nu) = W_p^p(B^v_\#\mu, B^v_\#\nu),$$

where the Wasserstein distances are defined with the geodesic distance corresponding to the space, i.e., with $d(x,y)$ the geodesic distance on $M$ for the $W_p$ on the left-hand sides, and $|t - s|$ for the $W_p$ on the right-hand sides.

From these properties, we can work equivalently in $\mathbb{R}$ or on the geodesics when using the Busemann projection (also called horospherical projection) or the geodesic projection of measures. In practice, analogously to the Euclidean case, we use the projections on $\mathbb{R}$ and the closed form of the Wasserstein distance in $\mathcal{P}_p(\mathbb{R})$ given by Equation (1).

Sliced-Wasserstein on Hadamard Manifolds. We are now ready to define the Sliced-Wasserstein distance on Hadamard manifolds. For directions, we will sample from the uniform measure on $S_o = \{v\in T_oM,\ \|v\|_o = 1\}$.
Note that other distributions could be used, such as a Dirac in the maximum direction, similarly to max-SW (Deshpande et al., 2019), or any variant using different slicing distributions, as in (Nguyen et al., 2021a,b; Ohana et al., 2023; Nguyen and Ho, 2024b). However, to define a strict generalization of SW, we choose to focus on the uniform one in this work.

Definition 5 (Cartan-Hadamard Sliced-Wasserstein) Let $(M,g)$ be a Hadamard manifold with $o$ as its origin. Denote by $\lambda_o$ the uniform distribution on $S_o = \{v\in T_oM,\ \|v\|_o = 1\}$. Let $p\ge1$; then we define the $p$-Geodesic Cartan-Hadamard Sliced-Wasserstein distance between $\mu,\nu\in\mathcal{P}_p(M)$ as

$$\mathrm{GCHSW}_p^p(\mu,\nu) = \int_{S_o} W_p^p(P^v_\#\mu, P^v_\#\nu)\,\mathrm{d}\lambda_o(v).$$

Likewise, we define the $p$-Horospherical Cartan-Hadamard Sliced-Wasserstein distance between $\mu,\nu\in\mathcal{P}_p(M)$ as

$$\mathrm{HCHSW}_p^p(\mu,\nu) = \int_{S_o} W_p^p(B^v_\#\mu, B^v_\#\nu)\,\mathrm{d}\lambda_o(v).$$

In the following, when we want to refer to both GCHSW and HCHSW, for example for properties satisfied by both, we use the term Cartan-Hadamard Sliced-Wasserstein, abbreviated as CHSW. Then, without loss of generality, we write

$$\mathrm{CHSW}_p^p(\mu,\nu) = \int_{S_o} W_p^p(P^v_\#\mu, P^v_\#\nu)\,\mathrm{d}\lambda_o(v),$$

with $P^v$ denoting either the geodesic or the horospherical projection. We illustrate the projection process in Figure 4.

Figure 4: Illustration of the projection process of measures on geodesics $t\mapsto\exp_o(tv_1)$ and $t\mapsto\exp_o(tv_2)$.

Guidelines between Geodesic and Horospherical CHSW. A natural question is which projection one should choose. As we will see in Section 4, both projections coincide for any pullback Euclidean metric, which includes many manifolds of interest. For negatively curved spaces, they do not coincide. Nonetheless, there are cases where only the horospherical projection can be computed in closed form. For the few cases where we can compute both (e.g., the hyperbolic case), we refer to (Bonet et al., 2023a, Figure 3) for the behavior of the two distances between distributions. We observe that the horospherical variant more closely aligns with the behavior of the Wasserstein distance. However, in applications, both variants perform well.
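In practice, CHSW is estimated exactly as Euclidean SW: sample directions $v\sim\lambda_o$, compute the (geodesic or horospherical) coordinates of the samples, and average the one-dimensional Wasserstein distances. The following is a minimal sketch parameterized by user-supplied projection and direction samplers; the names are ours, and concrete instances of `proj_coord` are given in Section 4. It assumes uniform empirical measures with equal sample sizes.

```python
import numpy as np

def chsw(X, Y, proj_coord, sample_direction, n_projs=50, p=2, rng=None):
    """Monte-Carlo estimate of CHSW_p^p between uniform empirical measures.
    X, Y: arrays of n samples on the manifold M (one per row).
    proj_coord(X, v): coordinates in R of the projections of X on G_v.
    sample_direction(rng): a draw v from lambda_o, uniform on S_o."""
    rng = np.random.default_rng(rng)
    res = 0.0
    for _ in range(n_projs):
        v = sample_direction(rng)
        x_t = np.sort(proj_coord(X, v))          # coordinates of P^v_# mu, sorted
        y_t = np.sort(proj_coord(Y, v))          # coordinates of P^v_# nu, sorted
        res += np.mean(np.abs(x_t - y_t) ** p)   # 1D Wasserstein, Equation (1)
    return res / n_projs
```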
3.3 Related Works

Intrinsic Sliced-Wasserstein. To the best of our knowledge, the first attempt to define a generalization of the Sliced-Wasserstein distance on Riemannian manifolds was made by Rustamov and Majumdar (2023). In this work, they restricted their analysis to compact spaces and proposed to use the eigendecomposition of the Laplace-Beltrami operator (see (Gallot et al., 1990, Definition 4.7)). Let $(M,g)$ be a compact Riemannian manifold. For $\ell\in\mathbb{N}$, denote $\lambda_\ell$ the eigenvalues and $\phi_\ell$ the eigenfunctions of the Laplace-Beltrami operator, sorted by increasing eigenvalues. Then, we can define spectral distances as

$$\forall x,y\in M,\quad d_\alpha(x,y)^2 = \sum_{\ell\ge0} \alpha(\lambda_\ell)\big(\phi_\ell(x) - \phi_\ell(y)\big)^2,$$

where $\alpha:\mathbb{R}_+\to\mathbb{R}_+$ is a monotonically decreasing function. Then, they define the Intrinsic Sliced-Wasserstein (ISW) distance between $\mu,\nu\in\mathcal{P}_2(M)$ as

$$\mathrm{ISW}_2^2(\mu,\nu) = \sum_{\ell\ge0} \alpha(\lambda_\ell)\, W_2^2\big((\phi_\ell)_\#\mu, (\phi_\ell)_\#\nu\big).$$

The eigenfunctions are used to map the measures to the real line, which makes ISW very efficient to compute in practice. The eigenvalues are sorted in increasing order, and the series is often truncated by keeping only the $L$ smallest eigenvalues. This distance cannot be applied to Hadamard manifolds, as these spaces are not compact.

Sliced-Wasserstein on the Sphere. Bonet et al. (2023b) then proposed a Spherical Sliced-Wasserstein distance by integrating and projecting over all geodesics using the geodesic projection, in an attempt to generalize the Sliced-Wasserstein distance intrinsically to the sphere $S^{d-1}$. We note that ISW is more in the spirit of a max-$K$ Sliced-Wasserstein distance (Dai and Seljak, 2021), which projects over the $K$ maximal directions, rather than of the Sliced-Wasserstein distance. More recently, Quellmalz et al. (2023, 2024) studied different Sliced-Wasserstein distances on $S^2$ using spherical Radon transforms, while Tran et al. (2024) proposed to use the stereographic projection along with the Generalized Sliced-Wasserstein distance (Kolouri et al., 2019), and Garrett et al. (2024) proposed Sliced-Wasserstein distances over the space of functions on the sphere, using a convolution slicer w.r.t. a kernel for the projection. Moreover, Genest et al. (2024) leveraged the Sliced-Wasserstein distance on manifolds to sample noise on non-Euclidean spaces such as meshes.

Generalized Sliced-Wasserstein. A somewhat related distance is the Generalized Sliced-Wasserstein distance (GSW) introduced by Kolouri et al. (2019), which uses nonlinear projections onto real lines. The main difference is that GSW focuses on probability distributions lying in Euclidean space, projecting the measures along nonlinear hypersurfaces. That said, by adapting the definition of GSW to handle probability measures on Riemannian manifolds, and relaxing the properties that need to be satisfied by the defining function $g$, such as homogeneity, we can write CHSW in the framework of GSW using $g: (x,v)\mapsto P^v(x)$.

4. Examples of Cartan-Hadamard Sliced-Wasserstein

In this section, we specialize the framework derived in full generality in Section 3 to particular Hadamard manifolds. More precisely, we first focus on manifolds endowed with a pullback Euclidean metric, which are Hadamard manifolds with null curvature. Then, we look at Hyperbolic spaces, which are manifolds of constant negative curvature. We also study the space of Symmetric Positive Definite (SPD) matrices endowed with metrics for which it is a Hadamard manifold. Finally, we discuss the case of products of Hadamard manifolds, which are themselves Hadamard manifolds, as products of manifolds of non-positive curvature are still of non-positive curvature (Gu et al., 2019, Lemma 1). We defer the proofs of this section to Appendix C.

4.1 Pullback Euclidean Manifold

Cartan-Hadamard manifolds include, among others, spaces of null curvature. As the curvature is preserved by the pullback operator, manifolds endowed with pullback Euclidean metrics are such spaces. We formally recall the definition of a pullback Euclidean metric, along with its geodesic distance and exponential map, following (Chen et al., 2024b, Theorem 3.3).

Theorem 6 (Pullback Euclidean Metric) Let $N$ be a Euclidean space, and denote by $\langle\cdot,\cdot\rangle$ its inner product and by $\|\cdot\|$ the associated norm. Let $M$ be some space and $\phi: M\to N$ a diffeomorphism. Then, defining for any $x\in M$ and $u,v\in T_xM$ the metric

$$g^\phi_x(u,v) = \big\langle \phi_{*,x}(u), \phi_{*,x}(v)\big\rangle,$$

where $\phi_{*,x}: T_xM\to T_{\phi(x)}N$ is the differential of $\phi$ at $x$, $(M, g^\phi)$ is a Riemannian manifold with geodesic distance $d_M(x,y) = \|\phi(x) - \phi(y)\|$. Moreover, the exponential map is

$$\forall x\in M,\ \forall v\in T_xM,\quad \exp_x(v) = \phi^{-1}\big(\phi(x) + \phi_{*,x}(v)\big).$$

Let $(M, g^\phi)$ be such a space, and denote $o$ the origin of $M$. Geodesics passing through $o$ in direction $v\in T_oM$ are of the form

$$\forall t\in\mathbb{R},\quad \gamma_v(t) = \phi^{-1}\big(\phi(o) + t\phi_{*,o}(v)\big).$$
Moreover, tangent vectors $v\in T_oM$ belong to the sphere $S_o$ if and only if $\|v\|_o^2 = \|\phi_{*,o}(v)\|^2 = 1$. Using this formula, we can obtain both the geodesic and the horospherical coordinates, which actually coincide (up to a sign), as in the Euclidean case.

Proposition 7 Let $v\in S_o$; then the projection coordinate on $G_v = \{\gamma_v(t),\ t\in\mathbb{R}\}$ is

$$\forall x\in M,\quad P^v(x) = B^v(x) = \big\langle \phi(x) - \phi(o),\ \phi_{*,o}(v)\big\rangle.$$

For instance, the Euclidean space endowed with the Mahalanobis distance fits this framework with $\phi(x) = A^{\frac12}x$, where $A\in S_d^{++}(\mathbb{R})$ is a positive definite matrix, since in this case, for any $x,y\in\mathbb{R}^d$,

$$d(x,y)^2 = (x-y)^T A (x-y) = \big\|A^{\frac12}x - A^{\frac12}y\big\|_2^2.$$

In this case, we have $\phi(0) = 0$ and $\phi_{*,0}(v) = A^{\frac12}v$. Thus, the projection is obtained as $P^v(x) = \langle A^{\frac12}x, A^{\frac12}v\rangle = x^TAv$ for $v\in S_0$, i.e., satisfying $\|v\|_0^2 = \|A^{\frac12}v\|_2^2 = 1$. In this situation, as expected, the directions and the data points are first mapped by the linear map $x\mapsto A^{\frac12}x$, and then the usual orthogonal projections are performed, as for the Euclidean Sliced-Wasserstein distance.

Definition 8 (Mahalanobis Sliced-Wasserstein) Let $p\ge1$ and $A\in S_d^{++}(\mathbb{R})$. The $p$-Mahalanobis Sliced-Wasserstein distance between $\mu,\nu\in\mathcal{P}_p(\mathbb{R}^d)$ is defined as

$$\mathrm{SW}_{p,A}^p(\mu,\nu) = \int_{S_0} W_p^p(P^v_\#\mu, P^v_\#\nu)\,\mathrm{d}\lambda_0(v),$$

with $P^v(x) = x^TAv$ for $v\in S_0 = \{v\in\mathbb{R}^d,\ v^TAv = 1\}$, $x\in\mathbb{R}^d$, and $\lambda_0$ the uniform distribution on $S_0$.

The Mahalanobis distance is often learned in metric learning, which has been used for different applications in, e.g., computer vision, information retrieval or bioinformatics (Bellet et al., 2013). In Section 6.1, we use the Mahalanobis Sliced-Wasserstein distance for a document classification task (Kusner et al., 2015), where the underlying metric $A$ is previously learned using metric learning methods (Huang et al., 2016). More generally, this pullback Euclidean framework includes any squared geodesic distance for which the metric is of the form $\langle u,v\rangle_x = u^TA(x)v$, with $A(x)\in S_d^{++}(\mathbb{R})$ for any $x\in\mathbb{R}^d$ (Scarvelis and Solomon, 2023; Pooladian et al., 2023). For such a metric, we have $\phi_{*,x}(v) = A(x)^{\frac12}v$, although computing $\phi(x)$ in closed form may not be straightforward. It also includes many useful metrics used on the space of SPD matrices, which we describe more thoroughly in Section 4.3.2.
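Since the Mahalanobis case is a pullback Euclidean metric with $\phi(x) = A^{1/2}x$, $\mathrm{SW}_{p,A}$ can be computed by mapping all samples through $A^{1/2}$ and then calling the Euclidean estimator: taking $\lambda_0$ as the image of the uniform sphere measure under $A^{-1/2}$ (the natural choice for the metric $\langle u,v\rangle_0 = u^TAv$), projecting on $v\in S_0$ via $x^TAv$ is exactly an orthogonal projection in the mapped space. A minimal sketch, reusing the `sliced_wasserstein` function from Section 2.1:

```python
import numpy as np
from scipy.linalg import sqrtm

def mahalanobis_sw(X, Y, A, n_projs=50, p=2, rng=None):
    """SW_{p,A}^p: map the samples through x -> A^{1/2} x, then use the
    Euclidean Sliced-Wasserstein estimator on the transformed point clouds."""
    A_half = np.real(sqrtm(A))  # A^{1/2}, symmetric since A is SPD
    return sliced_wasserstein(X @ A_half, Y @ A_half, n_projs=n_projs, p=p, rng=rng)
```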
4.2 Hyperbolic Spaces

Hyperbolic spaces are Riemannian manifolds of constant negative curvature $K<0$ (Lee, 2006), and are thus particular cases of Hadamard manifolds. They have recently received a surge of interest in machine learning, as they allow embedding data with a hierarchical structure efficiently (Nickel and Kiela, 2017, 2018). A thorough review of the recent use of hyperbolic spaces in machine learning can be found in (Peng et al., 2021; Mettes et al., 2024).

There are five usual parameterizations of a hyperbolic manifold (Peng et al., 2021). They are equivalent (isometric), and one can easily switch from one formulation to the other. Hence, in practice, we use the one that is the most convenient, given either the formulae to derive or the numerical properties. In machine learning, the two most commonly used models are the Poincaré ball and the Lorentz model (also known as the hyperboloid model). Each of these models has its own advantages over the other. For example, the Lorentz model has a distance that behaves better w.r.t. numerical issues than the distance of the Poincaré ball. However, the Lorentz model is unbounded, unlike the Poincaré ball. We introduce these two models in the following.

Lorentz Model. The Lorentz model of curvature $K<0$ is defined as

$$\mathbb{L}^d_K = \Big\{x = (x_0,\dots,x_d)\in\mathbb{R}^{d+1},\ \langle x,x\rangle_{\mathbb{L}} = \frac{1}{K},\ x_0 > 0\Big\},$$

where, for $x,y\in\mathbb{R}^{d+1}$, $\langle x,y\rangle_{\mathbb{L}} = -x_0y_0 + \sum_{i=1}^d x_iy_i$ is the Minkowski pseudo inner-product. The Lorentz model can be seen as the upper sheet of a two-sheet hyperboloid. In the following, we denote $x^0 = (\frac{1}{\sqrt{-K}}, 0,\dots,0)\in\mathbb{L}^d_K$ the origin of the hyperboloid. The geodesic distance in this manifold is defined as

$$\forall x,y\in\mathbb{L}^d_K,\quad d_{\mathbb{L}}(x,y) = \frac{1}{\sqrt{-K}}\operatorname{arccosh}\big(K\langle x,y\rangle_{\mathbb{L}}\big).$$

At any $x\in\mathbb{L}^d_K$, the tangent space is $T_x\mathbb{L}^d_K = \{v\in\mathbb{R}^{d+1},\ \langle x,v\rangle_{\mathbb{L}} = 0\}$. Note that on $T_{x^0}\mathbb{L}^d_K$, the Minkowski inner product coincides with the usual Euclidean inner product. Moreover, geodesics passing through $x$ in direction $v\in T_x\mathbb{L}^d_K$ are obtained as the intersection between the plane $\mathrm{span}(x,v)$ and the hyperboloid $\mathbb{L}^d_K$, and are of the form

$$\forall t\in\mathbb{R},\quad \exp_x(tv) = \cosh\big(\sqrt{-K}\,t\,\|v\|_{\mathbb{L}}\big)\,x + \sinh\big(\sqrt{-K}\,t\,\|v\|_{\mathbb{L}}\big)\,\frac{v}{\sqrt{-K}\,\|v\|_{\mathbb{L}}}.$$

In particular, geodesics passing through the origin $x^0$ in direction $v\in S_{x^0}$ are

$$\forall t\in\mathbb{R},\quad \gamma_v(t) = \exp_{x^0}(tv) = \cosh\big(\sqrt{-K}\,t\big)\,x^0 + \sinh\big(\sqrt{-K}\,t\big)\,\frac{v}{\sqrt{-K}}.$$

Poincaré Ball. The Poincaré ball of curvature $K<0$ is defined as

$$B^d_K = \Big\{x\in\mathbb{R}^d,\ \|x\|_2^2 < -\frac{1}{K}\Big\}.$$

It can be seen as the stereographic projection of each point of $\mathbb{L}^d_K$ on the hyperplane $\{x\in\mathbb{R}^{d+1},\ x_0 = 0\}$. The origin of $B^d_K$ is 0, and the geodesic distance is defined as

$$\forall x,y\in B^d_K,\quad d_B(x,y) = \frac{1}{\sqrt{-K}}\operatorname{arccosh}\left(1 - \frac{2K\|x-y\|_2^2}{(1+K\|x\|_2^2)(1+K\|y\|_2^2)}\right).$$

Figure 5: Projection of (red) points on a geodesic (black line) in the Poincaré ball (panels (a) Euclidean, (b) Geodesics, (c) Horospheres) and in the Lorentz model (panels (d) Euclidean, (e) Geodesics, (f) Horospheres), along Euclidean lines, geodesics or horospheres (in blue). Projected points on the geodesic are shown in green.

The tangent space is $\mathbb{R}^d$ and, for any $\bar v\in S^{d-1}$, the geodesic passing through the origin is defined as

$$\forall t\in\mathbb{R},\quad \gamma_{\bar v}(t) = \exp_0(t\bar v) = \frac{1}{\sqrt{-K}}\tanh\Big(\frac{\sqrt{-K}\,t}{2}\Big)\,\bar v.$$

Hyperbolic Sliced-Wasserstein. To define Hyperbolic Sliced-Wasserstein distances, we first need to sample geodesics, which can be done in both models by simply sampling from a uniform measure on the sphere. Indeed, let $\bar v\in S^{d-1}$; then the direction of the geodesic in $\mathbb{L}^d_K$ is obtained as $v = (0,\bar v)\in T_{x^0}\mathbb{L}^d_K\cap S^d = S_{x^0}$, by concatenating 0 to $\bar v$. On the Poincaré ball, $\bar v$ directly gives the direction of the geodesic, and is called an ideal point. Thus, we only need to compute the projection coordinates on the geodesics in order to build the corresponding Geodesic and Horospherical Sliced-Wasserstein distances. We provide the closed form of the geodesic projection and of the Busemann function for both models in the following proposition. Additionally, we illustrate the projection process in Figure 5.

Proposition 9 (Coordinate projections on Hyperbolic spaces)

1. Let $v\in S_{x^0} = T_{x^0}\mathbb{L}^d_K\cap S^d$. The geodesic and horospherical projection coordinates on $G_v = \mathrm{span}(x^0,v)\cap\mathbb{L}^d_K$ are, for all $x\in\mathbb{L}^d_K$,

$$P^v(x) = \frac{1}{\sqrt{-K}}\operatorname{arctanh}\left(-\frac{1}{\sqrt{-K}}\,\frac{\langle x,v\rangle_{\mathbb{L}}}{\langle x,x^0\rangle_{\mathbb{L}}}\right), \qquad B^v(x) = \frac{1}{\sqrt{-K}}\log\Big(-\sqrt{-K}\,\big\langle x,\ \sqrt{-K}\,x^0 + v\big\rangle_{\mathbb{L}}\Big).$$

2. Let $\bar v\in S^{d-1}$ be an ideal point. The geodesic and horospherical projection coordinates on $G_{\bar v} = \{\gamma_{\bar v}(t),\ t\in\mathbb{R}\}$ are, for all $x\in B^d_K$,

$$P^{\bar v}(x) = \frac{2}{\sqrt{-K}}\operatorname{arctanh}\big(\sqrt{-K}\,s\big), \qquad B^{\bar v}(x) = \frac{1}{\sqrt{-K}}\log\left(\frac{-K\,\big\|x - \frac{\bar v}{\sqrt{-K}}\big\|_2^2}{1 + K\|x\|_2^2}\right),$$

where $s$ is defined as

$$s = \begin{cases}\dfrac{1 - K\|x\|_2^2 - \sqrt{(1 - K\|x\|_2^2)^2 + 4K\langle x,\bar v\rangle^2}}{-2K\langle x,\bar v\rangle} & \text{if } \langle x,\bar v\rangle \neq 0,\\[4pt] 0 & \text{if } \langle x,\bar v\rangle = 0.\end{cases}$$

This proposition allows us to define hyperbolic Sliced-Wasserstein distances by specifying CHSW with the right formulas.
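For the Lorentz model, both coordinate projections of Proposition 9 are cheap, vectorized operations. A minimal sketch for curvature $K = -1$ (names ours):

```python
import numpy as np

def minkowski(x, y):
    """Minkowski pseudo inner-product <x, y>_L on R^{d+1}."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)

def geodesic_coord(X, v, x0):
    """Geodesic projection coordinates P^v on G_v in L^d (K = -1)."""
    return np.arctanh(-minkowski(X, v) / minkowski(X, x0))

def busemann_coord(X, v, x0):
    """Horospherical (Busemann) coordinates B^v on G_v (K = -1)."""
    return np.log(-minkowski(X, x0 + v))

# Directions: v = (0, v_bar) with v_bar uniform on S^{d-1}.
rng = np.random.default_rng(0)
x0 = np.array([1.0, 0.0, 0.0])
v_bar = rng.standard_normal(2); v_bar /= np.linalg.norm(v_bar)
v = np.concatenate([[0.0], v_bar])
X = sample_wrapped_normal(5, 0.1 * np.eye(2), rng=1)  # sampler from Section 2.3
print(geodesic_coord(X, v, x0), busemann_coord(X, v, x0))
```

Plugged into the generic `chsw` sketch of Section 3.2, these two functions give Monte-Carlo estimators of GHSW and HHSW, respectively.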
Definition 10 (Hyperbolic Sliced-Wasserstein)

1. Let $p\ge1$ and $\mu,\nu\in\mathcal{P}_p(\mathbb{L}^d_K)$. Then, the $p$-Geodesic Hyperbolic Sliced-Wasserstein and the $p$-Horospherical Hyperbolic Sliced-Wasserstein distances on the Lorentz model $\mathbb{L}^d_K$ are defined as

$$\mathrm{GHSW}_p^p(\mu,\nu) = \int_{T_{x^0}\mathbb{L}^d_K\cap S^d} W_p^p(P^v_\#\mu, P^v_\#\nu)\,\mathrm{d}\lambda(v), \qquad \mathrm{HHSW}_p^p(\mu,\nu) = \int_{T_{x^0}\mathbb{L}^d_K\cap S^d} W_p^p(B^v_\#\mu, B^v_\#\nu)\,\mathrm{d}\lambda(v).$$

2. Let $p\ge1$ and $\bar\mu,\bar\nu\in\mathcal{P}_p(B^d_K)$. Then, the $p$-Geodesic Hyperbolic Sliced-Wasserstein and the $p$-Horospherical Hyperbolic Sliced-Wasserstein distances on the Poincaré ball $B^d_K$ are defined as

$$\mathrm{GHSW}_p^p(\bar\mu,\bar\nu) = \int_{S^{d-1}} W_p^p(P^{\bar v}_\#\bar\mu, P^{\bar v}_\#\bar\nu)\,\mathrm{d}\bar\lambda(\bar v), \qquad \mathrm{HHSW}_p^p(\bar\mu,\bar\nu) = \int_{S^{d-1}} W_p^p(B^{\bar v}_\#\bar\mu, B^{\bar v}_\#\bar\nu)\,\mathrm{d}\bar\lambda(\bar v),$$

where $\bar\lambda$ denotes the uniform distribution on $S^{d-1}$.

Note that we could also work in other models, such as the Klein model, the Poincaré half-plane model or the hemisphere model (see e.g. (Cannon et al., 1997; Loustau, 2020)), and derive the corresponding projections in order to define the Hyperbolic Sliced-Wasserstein distances in these models. Note also that these different Sliced-Wasserstein distances are actually equal from one model to the other when using the isometry mappings, which is a particular case of Proposition 11.

Proposition 11 Let $(M, g^M)$ and $(N, g^N)$ be two isometric Cartan-Hadamard manifolds, $\phi: M\to N$ an isometry, and assume that $\lambda_{\phi(o)} = (\phi_{*,o})_\#\lambda_o$.² Let $p\ge1$, $\mu,\nu\in\mathcal{P}_p(M)$ and $\bar\mu = \phi_\#\mu$, $\bar\nu = \phi_\#\nu$. Then,

$$\mathrm{CHSW}_p^p(\mu,\nu;\lambda_o) = \mathrm{CHSW}_p^p(\bar\mu,\bar\nu;\lambda_{\phi(o)}),$$

where we denote $\mathrm{CHSW}_p^p(\mu,\nu;\lambda)$ the Cartan-Hadamard Sliced-Wasserstein distance with slicing distribution $\lambda$.

2. We expect this to be true in general, since $\phi$ is an isometry, but we did not find a formal proof in the literature. In practice, this fact was verified for each tested case.

Proposition 11 includes the Hyperbolic Sliced-Wasserstein distances as a particular case (and, in particular, is more general than (Bonet et al., 2023a, Proposition 3.4)). This demonstrates that the Hyperbolic Sliced-Wasserstein distances are independent of the chosen model. Thus, we can work in the model which is the most convenient for us. Moreover, if we work in a model for which we do not necessarily have a closed form, we can project the distributions on a model where we have the closed forms, such as the Lorentz model or the Poincaré ball. Bonet et al. (2023a) compared GHSW and HHSW on different tasks, such as gradient flows or as regularizers for deep classification with prototypes. Moreover, they also verified empirically that GHSW and HHSW are independent of the model by comparing the evolution of the distances between Wrapped Normal distributions. In particular, they observed that HHSW has values closer to the Wasserstein distance than GHSW.

4.3 Symmetric Positive Definite Matrices

Let $S_d(\mathbb{R})$ be the set of symmetric matrices of $\mathbb{R}^{d\times d}$, and let $S_d^{++}(\mathbb{R})$ be the set of SPD matrices of $\mathbb{R}^{d\times d}$, i.e., matrices $M\in S_d(\mathbb{R})$ satisfying $x^TMx > 0$ for all $x\in\mathbb{R}^d\setminus\{0\}$. $S_d^{++}(\mathbb{R})$ is a Riemannian manifold (Bhatia, 2009) which can be endowed with different metrics. To each $M\in S_d^{++}(\mathbb{R})$, we can associate a tangent space $T_MS_d^{++}(\mathbb{R})$ which can be identified with the space of symmetric matrices $S_d(\mathbb{R})$.

SPD matrices have received a lot of attention in Machine Learning. On one hand, this is the natural space for invertible covariance matrices, which are often used to represent M/EEG data (Blankertz et al., 2007; Sabbagh et al., 2019) or images (Tuzel et al., 2006; Pennec, 2020). Moreover, this space is more expressive than Euclidean spaces and, endowed with specific metrics such as the Affine-Invariant metric, it enjoys a non-constant non-positive curvature.
This property was leveraged to embed different types of data (Harandi et al., 2014; Brooks et al., 2019b), and motivated the development of different machine learning algorithms (Chevallier et al., 2017; Yair et al., 2019; Zhuang et al., 2020; Lei et al., 2021; Ju and Guan, 2022) and neural network architectures (Huang and Van Gool, 2017; Brooks et al., 2019a). We now introduce the Sliced-Wasserstein distance on the space of SPD matrices, first endowed with the Affine-Invariant metric, and then endowed with different pullback Euclidean metrics.

4.3.1 Symmetric Positive Definite Matrices with Affine-Invariant Metric.

A classical metric widely used with SPD matrices is the geometric Affine-Invariant metric (Pennec et al., 2006), for which the inner product is defined as

$$\forall M\in S_d^{++}(\mathbb{R}),\ \forall A,B\in T_MS_d^{++}(\mathbb{R}),\quad \langle A,B\rangle_M = \operatorname{Tr}(M^{-1}AM^{-1}B),$$

where $\operatorname{Tr}$ denotes the trace operator. The corresponding geodesic distance $d_{AI}(\cdot,\cdot)$ is given by

$$\forall X,Y\in S_d^{++}(\mathbb{R}),\quad d_{AI}(X,Y) = \sqrt{\operatorname{Tr}\big(\log(X^{-1}Y)^2\big)}.$$

An interesting property justifying the use of the Affine-Invariant metric is that $d_{AI}$ satisfies the affine-invariance property: for any $g\in GL_d(\mathbb{R})$, where $GL_d(\mathbb{R})$ denotes the set of invertible matrices in $\mathbb{R}^{d\times d}$,

$$\forall X,Y\in S_d^{++}(\mathbb{R}),\quad d_{AI}(g\cdot X, g\cdot Y) = d_{AI}(X,Y), \quad\text{where } g\cdot X = gXg^T.$$

With this metric, $S_d^{++}(\mathbb{R})$ is of (non-constant) non-positive curvature, and hence a Hadamard manifold. The natural origin is the identity matrix $I_d$, and geodesics passing through $I_d$ in direction $A\in S_d(\mathbb{R})$ are of the form (Pennec, 2020, Section 3.6.1)

$$\forall t\in\mathbb{R},\quad \gamma_A(t) = \exp_{I_d}(tA) = \exp(tA),$$

where $\exp$ denotes the matrix exponential.

In the Affine-Invariant case, to the best of our knowledge, there is no closed form for the geodesic projection on $G_A$, the difficulty being that the matrices do not necessarily commute. Hence, we discuss here the horospherical projection, which can be obtained with the Busemann function. For $A\in S_d(\mathbb{R})$ such that $\|A\|_F = 1$, denoting $\gamma_A: t\mapsto\exp(tA)$ the geodesic line passing through $I_d$ with direction $A$, the Busemann function $B_A$ associated to $\gamma_A$ writes as

$$\forall M\in S_d^{++}(\mathbb{R}),\quad B_A(M) = \lim_{t\to\infty}\big(d_{AI}(\exp(tA), M) - t\big).$$

We cannot directly compute this quantity by expanding the distance, since $\exp(-tA)$ and $M$ do not necessarily commute. The main idea to solve this issue is to first find a group $G\subset GL_d(\mathbb{R})$ that leaves the Busemann function invariant. Then, we can find an element of this group which projects $M$ on the space of matrices commuting with $\exp(A)$. This part of the space is of null curvature, i.e., it is isometric to a Euclidean space. In this case, we can compute the Busemann function, since the matrices commute. Hence, the Busemann function takes the form

$$B_A(M) = -\big\langle A,\ \log \pi_A(M)\big\rangle_F,$$

where $\pi_A$ is a projection on the space of commuting matrices, which can be obtained in practice through a UDU or LDL decomposition. We detail more precisely in Appendix F how to obtain $\pi_A$. For more details about the Busemann function on the Affine-Invariant space, we refer to Bridson and Haefliger (2013, Section II.10) and Fletcher et al. (2009, 2011). We note that computing the Busemann function on this space induces a heavy computational cost. Thus, we advocate using in practice Sliced-Wasserstein distances obtained from pullback Euclidean metrics on SPD matrices, as described in the next section.
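Numerically, $d_{AI}$ itself is straightforward to evaluate through an eigendecomposition, and the affine-invariance property is easy to check. A minimal sketch (names ours):

```python
import numpy as np
from scipy.linalg import sqrtm

def dist_affine_invariant(X, Y):
    """d_AI(X, Y) = sqrt(Tr(log(X^{-1} Y)^2)) = ||log(X^{-1/2} Y X^{-1/2})||_F,
    the two expressions being equal since the matrices involved are similar."""
    X_isqrt = np.linalg.inv(np.real(sqrtm(X)))
    eigvals = np.linalg.eigvalsh(X_isqrt @ Y @ X_isqrt)  # SPD, safe eigendecomposition
    return np.sqrt(np.sum(np.log(eigvals) ** 2))

# Check the affine-invariance: d_AI(g X g^T, g Y g^T) = d_AI(X, Y).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)); X = A @ A.T + np.eye(3)
B = rng.standard_normal((3, 3)); Y = B @ B.T + np.eye(3)
g = rng.standard_normal((3, 3))  # almost surely invertible
print(np.isclose(dist_affine_invariant(g @ X @ g.T, g @ Y @ g.T),
                 dist_affine_invariant(X, Y)))
```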
4.3.2 Symmetric Positive Definite Matrices with Pullback Euclidean Metrics.

We study here metrics on the space of SPD matrices which are pullback Euclidean metrics (Chen et al., 2024a,b), i.e., metrics obtained through a diffeomorphism from $S_d^{++}(\mathbb{R})$ to $(S_d(\mathbb{R}), \langle\cdot,\cdot\rangle_F)$. Pullback Euclidean metrics, and more generally pullback metrics, allow inheriting properties from the mapped space (Chen et al., 2024b). The pullback Euclidean metrics studied here belong to the framework presented in Section 4.1, with $M = S_d^{++}(\mathbb{R})$ and $N = S_d(\mathbb{R})$. This framework includes many interesting metrics, such as the Log-Euclidean metric with $\phi = \log$ (Arsigny et al., 2005, 2006), which is a good first-order approximation of the Affine-Invariant metric (Arsigny et al., 2005; Pennec, 2020), the Log-Cholesky metric (Lin, 2019), as well as the recently proposed O(n)-invariant Log-Euclidean metric (Thanwerdas and Pennec, 2023; Chen et al., 2024a) and Adaptive Riemannian metric (Chen et al., 2024b).

Log-Euclidean Metric. We first focus on the Log-Euclidean metric, for which $\phi = \log$. To apply Proposition 7, we first need to compute its differential at the origin $I_d$. For completeness, we recall here the differential of the matrix logarithm, derived e.g. in (Pennec, 2020).

Lemma 12 (Section 3.2.2 in (Pennec, 2020)) Let $\phi: X\mapsto\log(X)$ and $X = UDU^T\in S_d^{++}(\mathbb{R})$, where $D = \operatorname{diag}(\lambda_1,\dots,\lambda_d)$. The differential operator of $\phi$ at $X$ is given by

$$\forall V\in T_XS_d^{++}(\mathbb{R}),\quad \phi_{*,X}(V) = U\,\Sigma(V)\,U^T, \quad\text{where } \Sigma(V) = (U^TVU)\odot\Gamma$$

and $\Gamma$ is the Loewner matrix, defined for all $i,j\in\{1,\dots,d\}$ as

$$\Gamma_{ij} = \begin{cases}\dfrac{\log\lambda_i - \log\lambda_j}{\lambda_i - \lambda_j} & \text{if } \lambda_i \neq \lambda_j,\\[4pt] \dfrac{1}{\lambda_i} & \text{if } \lambda_i = \lambda_j.\end{cases}$$

Proof Apply the Daleckii-Krein formula, see e.g. (Noferini, 2017, Theorem 2.11).

We note that, for close eigenvalues (Pennec, 2020),

$$\frac{\log\lambda_i - \log\lambda_j}{\lambda_i - \lambda_j} = \frac{1}{\lambda_j}\left(1 - \frac{\lambda_i - \lambda_j}{2\lambda_j} + \frac{(\lambda_i - \lambda_j)^2}{3\lambda_j^2} + O\big((\lambda_i - \lambda_j)^3\big)\right).$$

Furthermore, for $X = D = U = I_d$, since $[U^TVU]_{ij} = V_{ij}$, we find $\phi_{*,I_d}(V) = V$ for any $V$. Thus, as $\log(I_d) = 0$, we obtain the following projections.

Proposition 13 Let $\phi = \log$. Then, for any $A\in S_d(\mathbb{R})$ such that $\|A\|_F = 1$, the coordinate projection is

$$\forall X\in S_d^{++}(\mathbb{R}),\quad P^A(X) = B^A(X) = \langle\log(X), A\rangle_F.$$

Proof Apply Proposition 7 with $\phi(X) = \log(X)$, observing that $\phi(I_d) = 0$ and $\phi_{*,I_d} = \mathrm{Id}$.

Figure 6: (Left) Random geodesics drawn in $S_2^{++}(\mathbb{R})$. (Right) Projections (green points) of covariance matrices (depicted as red points) over one geodesic (in black) passing through $I_2$, along the Log-Euclidean geodesics (blue lines).

We illustrate in Figure 6 the projection of matrices $M\in S_2^{++}(\mathbb{R})$, embedded as vectors $(m_{11}, m_{22}, m_{12})\in\mathbb{R}^3$, on geodesics passing through $I_2$. This projection was first derived by Bonet et al. (2023c), who introduced the Sliced-Wasserstein distance on the space of SPD matrices endowed with the Log-Euclidean metric, named SPDSW, and applied it to M/EEG data to perform brain-age prediction and domain adaptation for brain computer interfaces.
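Proposition 13 makes the Log-Euclidean CHSW (SPDSW) particularly simple to implement: map each matrix to $\log(X)\in S_d(\mathbb{R})$ and slice with symmetric directions of unit Frobenius norm. A minimal sketch (names ours; symmetrizing a Gaussian matrix as below yields an isotropic Gaussian on $S_d(\mathbb{R})$ for the Frobenius inner product, hence uniform directions after normalization):

```python
import numpy as np

def spd_log(X):
    """Matrix logarithm of a batch of SPD matrices, X of shape (n, d, d)."""
    w, U = np.linalg.eigh(X)
    return (U * np.log(w)[..., None, :]) @ np.swapaxes(U, -1, -2)

def sample_sym_direction(d, rng):
    """Direction A, uniform on {A in S_d(R) : ||A||_F = 1}."""
    M = rng.standard_normal((d, d))
    A = (M + M.T) / 2
    return A / np.linalg.norm(A)

def spdsw(X, Y, n_projs=50, p=2, rng=None):
    """Monte-Carlo estimate of the Log-Euclidean CHSW (SPDSW), using the
    coordinates P^A(X) = <log(X), A>_F of Proposition 13."""
    rng = np.random.default_rng(rng)
    logX, logY = spd_log(X), spd_log(Y)
    res = 0.0
    for _ in range(n_projs):
        A = sample_sym_direction(X.shape[-1], rng)
        x_t = np.sort(np.sum(logX * A, axis=(-2, -1)))   # <log X_i, A>_F
        y_t = np.sort(np.sum(logY * A, axis=(-2, -1)))
        res += np.mean(np.abs(x_t - y_t) ** p)
    return res / n_projs
```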
O(n)-Invariant Log-Euclidean Metric. The O(n)-invariant Log-Euclidean metric was introduced by Thanwerdas and Pennec (2023) and further studied by Chen et al. (2024a). It is a pullback Euclidean metric with, for $X\in S_d^{++}(\mathbb{R})$ and $p,q>0$, $\phi^{p,q}(X) = F^{p,q}\big(\log(X)\big)$, where

$$F^{p,q}(A) = q\,A + \frac{p-q}{d}\operatorname{Tr}(A)\,I_d \quad\text{for } A\in S_d(\mathbb{R}).$$

It can be seen as a generalization of the Log-Euclidean metric since, for $p = q = 1$, $F^{1,1}(A) = A$. Since $F^{p,q}$ is a linear function, the differential of $\phi^{p,q}$ at $X\in S_d^{++}(\mathbb{R})$ reads $\phi^{p,q}_{*,X}(V) = F^{p,q}\big(\log_{*,X}(V)\big)$ for any $V\in S_d(\mathbb{R})$. Thus, we have $\phi^{p,q}(I_d) = 0$ and $\phi^{p,q}_{*,I_d} = F^{p,q}$, and we can apply Proposition 7.

Proposition 14 Let $p,q>0$ and $\phi^{p,q} = F^{p,q}\circ\log$, with $F^{p,q}(A) = qA + \frac{p-q}{d}\operatorname{Tr}(A)I_d$ for $A\in S_d(\mathbb{R})$. Then, for any $A\in S_d(\mathbb{R})$ such that $\|A\|_{I_d}^2 = \langle F^{p,q}(A), F^{p,q}(A)\rangle_F = 1$, the coordinate projection is

$$\forall X\in S_d^{++}(\mathbb{R}),\quad P^A(X) = \big\langle F^{p,q}\big(\log(X)\big),\ F^{p,q}(A)\big\rangle_F.$$

Proof Apply Proposition 7 with $\phi(X) = F^{p,q}(\log(X))$, observing that $\phi(I_d) = 0$ and $\phi_{*,I_d} = F^{p,q}$.

Log-Cholesky Metric. Lin (2019) introduced the Log-Cholesky metric, which is obtained as a pullback Euclidean metric with respect to $L_d(\mathbb{R})$, the space of lower triangular matrices, endowed with the Frobenius inner product. The diffeomorphism between $S_d^{++}(\mathbb{R})$ and $L_d(\mathbb{R})$ is of the form $\phi: X\mapsto\varphi\big(\mathcal{L}(X)\big)$, with $\mathcal{L}: S_d^{++}(\mathbb{R})\to L_d^{++}(\mathbb{R})$ returning the lower triangular matrix obtained by the Cholesky decomposition, i.e., for $X = LL^T\in S_d^{++}(\mathbb{R})$, $\mathcal{L}(X) = L$, and $\varphi: L_d^{++}(\mathbb{R})\to L_d(\mathbb{R})$ defined as $\varphi(L) = \lfloor L\rfloor + \log\big(\operatorname{diag}(L)\big)$, with $\lfloor\cdot\rfloor$ the strictly lower triangular part of the matrix and $\operatorname{diag}$ its diagonal part. It is easy to see that $\phi(I_d) = 0$. We compute the differential of $\phi$ in Lemma 48, using the chain rule and (Lin, 2019, Proposition 4), which gives the differential of $\mathcal{L}$. Then, applying Proposition 7, we can compute the projection.

Proposition 15 Let $\phi = \varphi\circ\mathcal{L}$. Then, for any $A\in S_d(\mathbb{R})$ such that $\|A\|_{I_d}^2 = 1$, the coordinate projection is

$$\forall X = LL^T\in S_d^{++}(\mathbb{R}),\quad P^A(X) = \big\langle\lfloor L\rfloor, \lfloor A\rfloor\big\rangle_F + \Big\langle\log\big(\operatorname{diag}(L)\big),\ \tfrac12\operatorname{diag}(A)\Big\rangle_F.$$
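Under our reading of Proposition 15 (with the differential of $\phi$ at $I_d$ being $A\mapsto\lfloor A\rfloor + \frac12\operatorname{diag}(A)$, following Lin (2019, Proposition 4)), the Log-Cholesky coordinate requires a single Cholesky factorization per sample. A hypothetical sketch:

```python
import numpy as np

def log_cholesky_coord(X, A):
    """Coordinates <floor(L), floor(A)>_F + <log diag(L), diag(A)/2> for a
    batch X of SPD matrices (n, d, d), with X = L L^T and A symmetric."""
    L = np.linalg.cholesky(X)
    i, j = np.tril_indices(X.shape[-1], k=-1)     # strict lower-triangular indices
    lower = np.sum(L[..., i, j] * A[i, j], axis=-1)
    diag = np.sum(np.log(np.diagonal(L, axis1=-2, axis2=-1)) * np.diag(A) / 2,
                  axis=-1)
    return lower + diag
```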
Nonetheless, the Busemann function on a product of Hadamard manifolds is simply the weighted sum of the Busemann functions on each geodesic line, and is thus easy to compute provided we know in closed form the Busemann function on each manifold $M_i$. This was first observed in (Bridson and Haefliger, 2013, Section II.8.24) in the case of two manifolds, and we generalize the result to an arbitrary number of manifolds.

Proposition 16 (Busemann function on a product of Hadamard manifolds) Let $(M_i)_{i=1}^n$ be $n$ Hadamard manifolds and let $M = M_1\times\dots\times M_n$ be the product manifold. Let $\lambda_1,\dots,\lambda_n$ be such that $\sum_{i=1}^n\lambda_i^2 = 1$. For any $i\in\{1,\dots,n\}$, let $\gamma_i$ be a geodesic line on $M_i$, and define $\gamma : t\mapsto\big(\gamma_1(\lambda_1 t),\dots,\gamma_n(\lambda_n t)\big)$ a geodesic line on $M$. Then,
$$\forall x = (x_1,\dots,x_n)\in M,\quad B^\gamma(x) = \sum_{i=1}^n \lambda_i\, B^{\gamma_i}(x_i).$$

In Section 6.2, we leverage this projection and the corresponding Sliced-Wasserstein distance to compare data sets viewed as distributions on $\mathbb{R}^{d_x}\times\mathbb{H}^{d_y}$.

5. Properties

In this section, we derive theoretical properties of the Cartan-Hadamard Sliced-Wasserstein distance. First, we study its topology and the conditions required for CHSW to be a true distance. In particular, we first focus on the general case, and then on the specific case of pullback Euclidean metrics. Then, we study some of its statistical properties. The proofs of this section are postponed to Appendix D.

5.1 Topology

Distance Property. First, we are interested in the distance properties of CHSW. From the properties of the Wasserstein distance and of the slicing process, we can show that it is a pseudo-distance, i.e., that it satisfies positivity, symmetry and the triangle inequality.

Proposition 17 Let $p\geq1$, then $\mathrm{CHSW}_p$ is a finite pseudo-distance on $\mathcal{P}_p(M)$.

For now, the missing property is indiscernibility, i.e., that $\mathrm{CHSW}_p(\mu,\nu) = 0$ implies $\mu = \nu$. We conjecture that it holds, but we have not been able to prove it yet in full generality. In the following, we derive a sufficient condition on a related Radon transform for this property to hold. These derivations are inspired by (Boman and Lindskog, 2009; Bonneel et al., 2015).

Let $f\in L^1(M)$, and let us define, analogously to the Euclidean Radon transform, the Cartan-Hadamard Radon transform $\mathrm{CHR} : L^1(M)\to L^1(\mathbb{R}\times S_o)$, which integrates the function $f$ over a level set of the projection $P^v$:
$$\forall t\in\mathbb{R},\ \forall v\in S_o,\quad \mathrm{CHR}f(t,v) = \int_M f(x)\,\mathbb{1}_{\{t = P^v(x)\}}\,\mathrm{dVol}(x).$$
Then, we can also define its dual operator $\mathrm{CHR}^* : C_0(\mathbb{R}\times S_o)\to C_b(M)$, where $C_0(\mathbb{R}\times S_o)$ is the space of continuous functions from $\mathbb{R}\times S_o$ to $\mathbb{R}$ that vanish at infinity and $C_b(M)$ is the space of continuous bounded functions from $M$ to $\mathbb{R}$, as
$$\forall g\in C_0(\mathbb{R}\times S_o),\ \forall x\in M,\quad \mathrm{CHR}^*g(x) = \int_{S_o} g\big(P^v(x), v\big)\,\mathrm{d}\lambda_o(v).$$

Proposition 18 $\mathrm{CHR}^*$ is the dual operator of $\mathrm{CHR}$, i.e., for all $f\in L^1(M)$ and $g\in C_0(\mathbb{R}\times S_o)$, $\langle\mathrm{CHR}f, g\rangle_{\mathbb{R}\times S_o} = \langle f, \mathrm{CHR}^*g\rangle_M$.

$\mathrm{CHR}^*$ maps $C_0(\mathbb{R}\times S_o)$ to $C_b(M)$ because $g$ is necessarily bounded, as a continuous function which vanishes at infinity. Note that $\mathrm{CHR}^*$ actually maps $C_0(\mathbb{R}\times S_o)$ to $C_0(M)$.

Proposition 19 Let $g\in C_0(\mathbb{R}\times S_o)$, then $\mathrm{CHR}^*g\in C_0(M)$.

Let us now recall the disintegration theorem.

Definition 20 (Disintegration of a measure) Let $(Y,\mathcal{Y})$ and $(Z,\mathcal{Z})$ be measurable spaces, and $(X,\mathcal{X}) = (Y\times Z, \mathcal{Y}\otimes\mathcal{Z})$ the product measurable space. Then, for $\mu\in\mathcal{P}(X)$, we denote the marginals as $\mu_Y = (\pi_Y)_\#\mu$ and $\mu_Z = (\pi_Z)_\#\mu$, where $\pi_Y$ (respectively $\pi_Z$) is the projection on $Y$ (respectively $Z$).
Then, a family $\big(K(y,\cdot)\big)_{y\in Y}$ is a disintegration of $\mu$ if for all $y\in Y$, $K(y,\cdot)$ is a measure on $Z$, for all $A\in\mathcal{Z}$, $K(\cdot, A)$ is measurable, and
$$\forall g\in C(X),\quad \int_{Y\times Z} g(y,z)\,\mathrm{d}\mu(y,z) = \int_Y\int_Z g(y,z)\,K(y,\mathrm{d}z)\,\mathrm{d}\mu_Y(y),$$
where $C(X)$ is the set of continuous functions on $X$. We denote $\mu = \mu_Y\otimes K$. $K$ is a probability kernel if, for all $y\in Y$, $K(y, Z) = 1$.

The disintegration of a measure actually corresponds to conditional laws in the context of probabilities. In the case where $X = \mathbb{R}^d$, we have existence and uniqueness of the disintegration (see (Santambrogio, 2015, Box 2.2), or (Ambrosio et al., 2008, Chapter 5) for the more general case).

Using the dual operator, we can define the Radon transform of a measure $\mu$ on $M$ as the measure $\mathrm{CHR}\mu$ satisfying
$$\forall g\in C_0(\mathbb{R}\times S_o),\quad \int_{\mathbb{R}\times S_o} g(t,v)\,\mathrm{d}(\mathrm{CHR}\mu)(t,v) = \int_M \mathrm{CHR}^*g(x)\,\mathrm{d}\mu(x).$$
$\mathrm{CHR}\mu$ being a measure on $\mathbb{R}\times S_o$, we can disintegrate it w.r.t. the uniform measure on $S_o$ as $\mathrm{CHR}\mu = \lambda_o\otimes K^\mu$, where $K^\mu$ is a probability kernel on $S_o\times\mathcal{B}(\mathbb{R})$. In the following proposition, we show that for $\lambda_o$-almost every $v\in S_o$, $K^\mu(v,\cdot)$ coincides with $P^v_\#\mu$.

Proposition 21 Let $\mu$ be a measure on $M$, and $K^\mu$ a probability kernel on $S_o\times\mathcal{B}(\mathbb{R})$ such that $\mathrm{CHR}\mu = \lambda_o\otimes K^\mu$. Then, for $\lambda_o$-almost every $v\in S_o$, $K^\mu(v,\cdot) = P^v_\#\mu$.

All these derivations allow linking the Cartan-Hadamard Sliced-Wasserstein distance with the Radon transform defined with the corresponding projection (geodesic or horospherical). Then, $\mathrm{CHSW}_p(\mu,\nu) = 0$ implies that for $\lambda_o$-almost every $v\in S_o$, $P^v_\#\mu = P^v_\#\nu$. Showing that the Radon transform is injective over the set of measures would allow us to conclude that $\mu = \nu$.

Actually, we derived here two different Cartan-Hadamard Radon transforms. Using $P^v$ as the geodesic projection, the Radon transform integrates over geodesic subspaces of dimension $\dim(M)-1$. Such spaces are totally geodesic subspaces and are related to the more general geodesic Radon transform (Rubin, 2003). In the case where the geodesic subspace is of dimension one, i.e., when it integrates only over geodesics, this coincides with the X-ray transform, which has been studied, e.g., in (Lehtonen et al., 2018). Here, we are interested in the case of dimension $\dim(M)-1$, which, to the best of our knowledge, has only been studied in (Lehtonen, 2016) in the case where $\dim(M) = 2$, and hence when the geodesic Radon transform and the X-ray transform coincide. However, no result on the injectivity over sets of measures is yet available. In the case where $P^v$ is the Busemann projection, the set of integration is a horosphere. To the best of our knowledge, general horospherical Radon transforms on Cartan-Hadamard manifolds have not yet been studied.

Link with the Wasserstein Distance. An important property of the Sliced-Wasserstein distance on Euclidean spaces is that it is topologically equivalent to the Wasserstein distance, i.e., it metrizes the weak convergence. Such results rely on properties of the Fourier transform which do not translate straightforwardly to manifolds. Hence, deriving such results will require further investigation. We note that a possible lead for the horospherical case is the connection between the Busemann function and the Fourier-Helgason transform (Biswas, 2018; Sonoda et al., 2022). Using that the projections are Lipschitz functions, we can still show that CHSW is a lower bound of the geodesic Wasserstein distance.

Proposition 22 Let $\mu,\nu\in\mathcal{P}_p(M)$, then $\mathrm{CHSW}_p^p(\mu,\nu)\leq W_p^p(\mu,\nu)$.
This property means that CHSW induces a weaker topology than the Wasserstein distance, which can be computationally beneficial but also comes with less discriminative power (Nadjahi et al., 2020).

Hilbert Embedding. CHSW also comes with the interesting property that it can be embedded in Hilbert spaces. This is in contrast with the Wasserstein distance, which is known not to be Hilbertian (Peyré et al., 2019, Section 8.3), except in one dimension where it coincides with its sliced counterpart.

Proposition 23 Let $p\geq1$ and $\mathcal{H} = L^p\big([0,1]\times S_o, \mathrm{Leb}\otimes\lambda_o\big)$. We define $\Phi$ as
$$\Phi : \mathcal{P}_p(M)\to\mathcal{H},\quad \mu\mapsto\Big((q,v)\mapsto F^{-1}_{P^v_\#\mu}(q)\Big),$$
where $F^{-1}_{P^v_\#\mu}$ is the quantile function of $P^v_\#\mu$. Then $\mathrm{CHSW}_p$ is Hilbertian, and for all $\mu,\nu\in\mathcal{P}_p(M)$,
$$\mathrm{CHSW}_p^p(\mu,\nu) = \|\Phi(\mu)-\Phi(\nu)\|_{\mathcal{H}}^p.$$

This is a nice property which allows us to define a valid positive definite kernel for measures, such as the Gaussian kernel (Jayasumana et al., 2015, Theorem 6.1), and hence to use kernel methods (Hofmann et al., 2008). This allows, for example, performing distribution clustering, classification (Kolouri et al., 2016; Carriere et al., 2017) or regression (Meunier et al., 2022).

Proposition 24 Define the kernel $K : \mathcal{P}_2(M)\times\mathcal{P}_2(M)\to\mathbb{R}$ as $K(\mu,\nu) = \exp\big(-\gamma\,\mathrm{CHSW}_2^2(\mu,\nu)\big)$ for $\gamma > 0$. Then $K$ is a positive definite kernel.

Proof Apply (Jayasumana et al., 2015, Theorem 6.1).

Bonet et al. (2023c) notably used this property to perform brain-age regression, by first representing M/EEG data as probability distributions of SPD matrices, and then by plugging the Gaussian kernel constructed with the Cartan-Hadamard Sliced-Wasserstein distance on the space of SPD matrices endowed with the Log-Euclidean metric, presented in Section 4.3.2, into the kernel Ridge regression method.

Note that to show that the Gaussian kernel is universal, i.e., that the resulting Reproducing Kernel Hilbert Space (RKHS) is powerful enough to approximate any continuous function (Meunier et al., 2022), we would need additional results, such as that CHSW metrizes the weak convergence and that $\mathrm{CHSW}_2$ is a distance, as shown in (Meunier et al., 2022, Proposition 7).

5.2 Topology for Pullback Euclidean Manifolds

In this section, we focus on particular Hadamard manifolds for which the metric is a pullback Euclidean metric, which allows inheriting the properties of Euclidean spaces and deriving additional properties of the corresponding Sliced-Wasserstein distance. This covers, for example, the space of SPD matrices with pullback Euclidean metrics studied in Section 4.3.2, as well as the Mahalanobis manifold introduced in Section 4.1.

Let $N$ be a Euclidean space with $\langle\cdot,\cdot\rangle$ its inner product and $\|\cdot\|$ the associated norm. Let $\varphi : M\to N$ be a diffeomorphism and denote $(M, g^\varphi)$ the resulting Riemannian manifold (see Theorem 6 for more details). We recall that, by Proposition 7, the projection of $x\in M$ on the geodesic characterized by the direction $v\in S_o$ is of the form $P^v(x) = \langle\varphi(x)-\varphi(o), \varphi_{*,o}(v)\rangle$. In this case, given the formula, we can link CHSW with the Euclidean SW, with the integration made on $S^{\varphi(o)} = \{v\in T_{\varphi(o)}N,\ \|v\|_{\varphi(o)} = 1\}$ with respect to the measure $\lambda^{\varphi(o)}$.

Lemma 25 Let $(M, g^\varphi)$ be a pullback Euclidean Riemannian manifold and assume that $\lambda^{\varphi(o)} = (\varphi_{*,o})_\#\lambda_o$. Let $p\geq1$ and $\mu,\nu\in\mathcal{P}_p(M)$. Then,
$$\mathrm{CHSW}_p^p(\mu,\nu) = \int_{S^{\varphi(o)}} W_p^p\big(Q^v_\#\varphi_\#\mu,\ Q^v_\#\varphi_\#\nu\big)\,\mathrm{d}\lambda^{\varphi(o)}(v) = \mathrm{SW}_p^p(\varphi_\#\mu, \varphi_\#\nu),$$
with $Q^v(x) = \langle x, v\rangle$ and $\mathrm{SW}_p$ the Euclidean Sliced-Wasserstein distance.

Using this simple lemma, we can transfer results known for the Euclidean Sliced-Wasserstein distance to CHSW on these particular spaces; we illustrate the identity below in the Log-Euclidean SPD case.
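As an illustration of Lemma 25, the following minimal NumPy sketch computes $\mathrm{CHSW}_2^2$ for the Log-Euclidean metric on SPD matrices (the SPDSW setting of Proposition 13) by pushing samples through $\varphi = \log$ and running a Euclidean sliced computation with symmetric directions of unit Frobenius norm. It assumes equal sample sizes and uniform weights; the helper names are ours.

```python
import numpy as np

def matrix_log_spd(X):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, U = np.linalg.eigh(X)
    return (U * np.log(w)) @ U.T

def random_symmetric_direction(d, rng):
    """Uniform direction on the unit Frobenius sphere of S_d(R), sampled
    through the orthonormal basis E_ii and (E_ij + E_ji)/sqrt(2)."""
    iu = np.triu_indices(d, k=1)
    z = rng.standard_normal(d + iu[0].size)
    z /= np.linalg.norm(z)
    A = np.zeros((d, d))
    A[np.diag_indices(d)] = z[:d]
    A[iu] = z[d:] / np.sqrt(2.0)
    return A + A.T - np.diag(np.diag(A))

def spdsw2(Xs, Ys, n_proj=200, seed=0):
    """CHSW_2^2 between empirical measures of SPD matrices under the
    Log-Euclidean metric: by Lemma 25 it equals SW_2^2 between the
    pushforwards log(X), with projections P^A(X) = <log X, A>_F."""
    rng = np.random.default_rng(seed)
    d = Xs[0].shape[0]
    lx = np.stack([matrix_log_spd(X).ravel() for X in Xs])
    ly = np.stack([matrix_log_spd(Y).ravel() for Y in Ys])
    vals = []
    for _ in range(n_proj):
        a = random_symmetric_direction(d, rng).ravel()
        # equal sample sizes: the 1d W_2 pairs sorted projections
        px, py = np.sort(lx @ a), np.sort(ly @ a)
        vals.append(np.mean((px - py) ** 2))
    return float(np.mean(vals))
```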
First, we show that we recover the distance property, by additionally showing the indiscernibility property.

Proposition 26 Let $(M, g^\varphi)$ be a pullback Euclidean Riemannian manifold. Let $p\geq1$, then $\mathrm{CHSW}_p$ is a finite distance on $\mathcal{P}_p(M)$.

We can also obtain the important property that CHSW metrizes the weak convergence, as does the Wasserstein distance (Villani et al., 2009). This property was first shown for arbitrary measures in (Nadjahi et al., 2019) for the regular Euclidean SW.

Proposition 27 Let $(M, g^\varphi)$ be a pullback Euclidean Riemannian manifold of dimension $d$. Let $p\geq1$, $(\mu_n)_n$ a sequence in $\mathcal{P}_p(M)$ and $\mu\in\mathcal{P}_p(M)$. Then, $\lim_{n\to\infty}\mathrm{CHSW}_p(\mu_n,\mu) = 0$ if and only if $(\mu_n)_n$ converges weakly towards $\mu$.

With these additional properties, we can also show that the corresponding Gaussian kernel is universal by applying (Meunier et al., 2022, Theorem 4). In addition to Proposition 22, we show that we can lower bound CHSW with the Wasserstein distance when the measures are compactly supported.

Proposition 28 Let $(M, g^\varphi)$ be a pullback Euclidean Riemannian manifold of dimension $d$. Let $p\geq1$, $r > 0$ a radius and $B(o,r) = \{x\in M,\ d_M(x,o)\leq r\}$ a closed ball. Then there exists a constant $C_{d,p,r}$ such that for all $\mu,\nu\in\mathcal{P}_p\big(B(o,r)\big)$,
$$W_p^p(\mu,\nu)\leq C_{d,p,r}\,\mathrm{CHSW}_p(\mu,\nu)^{\frac{1}{d+1}}.$$

5.3 Statistical Properties

Sample Complexity. In practical settings, we usually cannot directly compute the closed form between $\mu,\nu\in\mathcal{P}_p(M)$, but we have access to samples $x_1,\dots,x_n\sim\mu$ and $y_1,\dots,y_n\sim\nu$. Then, it is common practice to estimate the discrepancy with the plug-in estimator $\mathrm{CHSW}_p^p(\hat\mu_n,\hat\nu_n)$ (Manole et al., 2021, 2022; Niles-Weed and Rigollet, 2022), where $\hat\mu_n = \frac1n\sum_{i=1}^n\delta_{x_i}$ and $\hat\nu_n = \frac1n\sum_{i=1}^n\delta_{y_i}$ are empirical estimations of the measures. We are interested in characterizing the speed of convergence of the plug-in estimator towards the true distance. Relying on the proof of Nadjahi et al. (2020, Corollary 2), we derive in Proposition 29 the sample complexity of CHSW. As in the Euclidean case, we find that the sample complexity does not depend on the dimension, which is an important and appealing property of sliced divergences (Nadjahi et al., 2020) compared to the Wasserstein distance, which has a sample complexity in $O(n^{-1/d})$ (Niles-Weed and Rigollet, 2022).

Proposition 29 Let $p\geq1$, $q > p$ and $\mu,\nu\in\mathcal{P}_p(M)$. Denote $\hat\mu_n$ and $\hat\nu_n$ their counterpart empirical measures and $M_q(\mu) = \int_M d(x,o)^q\,\mathrm{d}\mu(x)$ their moments of order $q$. Then, there exists a constant $C_{p,q}$ depending only on $p$ and $q$ such that
$$\mathbb{E}\big[|\mathrm{CHSW}_p(\hat\mu_n,\hat\nu_n) - \mathrm{CHSW}_p(\mu,\nu)|\big]\leq 2\,\alpha_{n,p,q}\,C_{p,q}^{1/p}\big(M_q(\mu)^{1/q} + M_q(\nu)^{1/q}\big),$$
where
$$\alpha_{n,p,q} = \begin{cases} n^{-1/(2p)} & \text{if } q > 2p,\\ n^{-1/(2p)}\log(n)^{1/p} & \text{if } q = 2p,\\ n^{-(q-p)/(pq)} & \text{if } q\in(p,2p).\end{cases}$$

This property is very appealing in practical settings, as it allows using the same number of samples while keeping the same convergence rate in any dimension. In practice though, we cannot compute $\mathrm{CHSW}_p(\hat\mu_n,\hat\nu_n)$ exactly, as the integral on $S_o$ w.r.t. the uniform measure $\lambda_o$ is intractable.

Projection Complexity. Thus, to compute it in practice, we usually rely on a Monte Carlo approximation, drawing $L\geq1$ directions $v_1,\dots,v_L$ and approximating the distance by $\widehat{\mathrm{CHSW}}_{p,L}$, defined between $\mu,\nu\in\mathcal{P}_p(M)$ as
$$\widehat{\mathrm{CHSW}}_{p,L}^p(\mu,\nu) = \frac1L\sum_{\ell=1}^L W_p^p\big(P^{v_\ell}_\#\mu,\ P^{v_\ell}_\#\nu\big).$$
In the following proposition, we derive the Monte Carlo error of this approximation, and we show that we recover the classical rate of $O(1/\sqrt{L})$.

Proposition 30 Let $p\geq1$ and $\mu,\nu\in\mathcal{P}_p(M)$. Then, the error made by the Monte Carlo estimate of $\mathrm{CHSW}_p$ with $L$ projections can be bounded as follows:
$$\mathbb{E}_v\Big[\big|\widehat{\mathrm{CHSW}}_{p,L}^p(\mu,\nu) - \mathrm{CHSW}_p^p(\mu,\nu)\big|\Big]^2 \leq \frac1L\,\mathrm{Var}_v\Big(W_p^p\big(P^v_\#\mu, P^v_\#\nu\big)\Big).$$
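For illustration, here is a minimal sketch of the estimator $\widehat{\mathrm{CHSW}}_{p,L}$, with the space-specific ingredients (the coordinate projection $P^v$ and the sampler for $v\sim\lambda_o$) passed as callables, e.g. Proposition 13's $\langle\log X, A\rangle_F$ for SPDSW. Equal sample sizes are assumed, so the one-dimensional Wasserstein distance reduces to comparing order statistics; the names are ours.

```python
import numpy as np

def chsw_hat(x, y, project, sample_direction, L=500, p=2, seed=0):
    """Monte Carlo estimator of CHSW_{p,L}^p: average over L directions
    of the 1d p-Wasserstein distance between projected samples. With n
    samples on each side, each 1d problem costs O(n log n) (sorting)."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(L):
        v = sample_direction(rng)             # v ~ lambda_o (space-specific)
        px = np.sort(project(x, v))           # projected source samples
        py = np.sort(project(y, v))           # projected target samples
        vals.append(np.mean(np.abs(px - py) ** p))
    return float(np.mean(vals))
```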
We note that here the dimension actually intervenes in the variance term.

Computational Complexity. As we project onto the real line, the complexity of computing the Wasserstein distances between the projected distributions is in $O(Ln\log n)$. To this, we add the complexity of computing the projections, which depends on the specific space and on whether or not we have access to a closed form, and the complexity of sampling the directions. For instance, in the hyperbolic case, using the closed forms derived in Proposition 9, the projection procedure has a complexity of $O(nd)$, and thus the full complexity is in $O\big(Ln(\log n + d)\big)$. For SPD matrices with the Log-Euclidean metric, the closed form derived in Proposition 13 requires computing $n$ matrix logarithms, each with a complexity of $O(d^3)$, and $Ln$ inner products with complexity $O(d^2)$. Moreover, sampling the directions can be done in $O(d^3)$, as detailed in (Bonet et al., 2023c, Section 2.4). Thus, the full complexity is in $O\big(Ln(\log n + d^2) + (L+n)d^3\big)$. We refer to (Bonet et al., 2023a, Figure 2) and (Bonet et al., 2023c, Figure 2) for runtime comparisons with different numbers of samples against the Sinkhorn algorithm and the Wasserstein distance.

6. Application of Cartan-Hadamard Sliced-Wasserstein Distances

In this section, we provide some illustrations of Cartan-Hadamard Sliced-Wasserstein distances on manifolds which were not studied in previous works. We note that Bonet et al. (2023a) used HSW to perform deep classification with prototypes on Hyperbolic spaces, while Bonet et al. (2023c) used SPDSW to perform domain adaptation for Brain-Computer Interfaces and Brain-Age prediction, leveraging the Gaussian kernel from Proposition 24 plugged into kernel Ridge regression. Here, we first provide an experiment using the Mahalanobis Sliced-Wasserstein distance to classify documents, and then an experiment on a product of Cartan-Hadamard manifolds to compare data sets.

6.1 Document Classification with Mahalanobis Sliced-Wasserstein

We propose here to perform a document classification experiment. Suppose that we have $N$ documents $D^1,\dots,D^N$. Following the work of Kusner et al. (2015), we represent each document $D^k$ as a distribution over words. More precisely, denote $x_1,\dots,x_n\in\mathbb{R}^d$ the set of words, embedded using word2vec (Mikolov et al., 2013) in dimension $d = 300$. Then, $D^k$ is represented by the probability distribution $D^k = \sum_{i=1}^n w_i^k\delta_{x_i}$, where $w_i^k$ represents the frequency of the word $x_i$ in $D^k$, normalized such that $\sum_{i=1}^n w_i^k = 1$. Then, following (Huang et al., 2016), we learn a matrix $A\in S_d^{++}(\mathbb{R})$ using the Neighborhood Component Analysis (NCA) method (Goldberger et al., 2004) combined with the Word Centroid Distance (WCD), defined as $\mathrm{WCD}_A(D^k, D^\ell)^2 = (Xw^k - Xw^\ell)^T A (Xw^k - Xw^\ell)$, with $X = (x_1,\dots,x_n)\in\mathbb{R}^{d\times n}$. We use the pytorch-metric-learning library (Musgrave et al., 2020) to learn $A$. Once $A$ is learned, we compute the distance between documents using the Wasserstein distance or the Sliced-Wasserstein distance with the Mahalanobis ground cost, i.e., $d_A(x,y)^2 = (x-y)^T A(x-y)$. Once the distances $\big(d(D^k, D^\ell)\big)_{k,\ell}$ between all pairs of documents are computed, we use a $k$-nearest neighbor classifier; a minimal sketch of the sliced distance with this ground cost is given below.
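Since the Mahalanobis metric is pullback Euclidean via $\varphi(x) = A^{1/2}x$ (Section 4.1), $\mathrm{SW}_{2,A}$ can be computed by transforming the samples and calling a standard Euclidean sliced routine, here the one from POT (Flamary et al., 2021), which also accepts the word-frequency weights. A minimal sketch, with names of our choosing:

```python
import numpy as np
import ot  # Python Optimal Transport (Flamary et al., 2021)

def mahalanobis_sliced_w2(xs, xt, ws, wt, A, n_proj=500, seed=0):
    """SW_2 with Mahalanobis ground cost d_A(x,y)^2 = (x-y)^T A (x-y),
    computed by pushing samples through phi(x) = A^{1/2} x and applying
    the Euclidean sliced distance (pullback-Euclidean reduction).
    xs, xt: (n, d) word embeddings; ws, wt: normalized frequencies."""
    w, U = np.linalg.eigh(A)                 # A = U diag(w) U^T, SPD
    sqrtA = (U * np.sqrt(w)) @ U.T
    return ot.sliced_wasserstein_distance(xs @ sqrtA, xt @ sqrtA,
                                          ws, wt,
                                          n_projections=n_proj, seed=seed)
```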
We report in Table 1 the results for the BBCSport data set (Kusner et al., 2015), the Movies reviews data set (Pang et al., 2002) and the Goodreads data set (Maharjan et al., 2017).

Table 1: Accuracy for the document classification task.

            BBCSport       Movies         Goodreads genre  Goodreads like
  W2        94.55          74.44          56.18            71.00
  WA        98.36          76.04          56.81            68.37
  SW2       89.42 ± 0.89   67.27 ± 0.69   50.01 ± 1.21     65.90 ± 0.17
  SW2,A     97.58 ± 0.04   76.55 ± 0.11   57.03 ± 0.68     67.54 ± 0.14

All the data sets are split into 5 different train/test sets. The number of neighbors is found using cross-validation. We compare the results when using the regular Wasserstein and Sliced-Wasserstein distances, i.e., with $A = I_d$, and when learning $A$ using NCA with the WCD metric. The Wasserstein distance is computed using the Python Optimal Transport (POT) library (Flamary et al., 2021). The results for SW are averaged over 3 runs, and SW is approximated with $L = 500$ projections. With this simple initialization, we observe that the results obtained with the Mahalanobis Sliced-Wasserstein distance become very competitive with those obtained using the Wasserstein distance with the Mahalanobis ground cost. We note that the results might be further improved by then performing NCA with $W_A$ or $\mathrm{SW}_{2,A}$ as distances, in the same spirit as (Huang et al., 2016). Here, we just use an initialization through WCD as a proof of concept, to demonstrate how much a carefully chosen ground-cost distance can already improve the results when using SW.

We showcase the computational benefits of using the Sliced-Wasserstein distance compared to the Wasserstein distance in Figure 7, by plotting the runtime for comparing each pair of documents, and in Table 2 with the full runtimes.

Figure 7: Runtime between each pair of documents.

Table 2: Runtimes for the document classification task (averaged between the pairs of documents, and for the full computation of all the pairs).

                                 BBCSport       Movies         Goodreads
  WA      Average (×10⁻³ s)      3.29 ± 1.61    6.78 ± 2.74    440.30 ± 259
          Full (s)               891            13544          221252
  SW2,A   Average (×10⁻³ s)      2.45 ± 0.008   2.47 ± 0.04    2.5 ± 0.12
          Full (s)               665            4931           1256

We note that the Wasserstein distance is computed on CPU while the Sliced-Wasserstein distance is implemented in PyTorch and uses a GPU. We used as CPU an Intel Xeon 4214 and as GPU a Titan RTX. We observe a computational gain even on small-scale data sets where the documents contain few words, and therefore for which the underlying representative distributions contain few samples. For data sets whose distributions have a larger number of samples, such as Goodreads, the computational benefits are substantial. We sum up the statistics of the different data sets in Table 3.

6.2 Data Sets Comparisons with Sliced-Wasserstein on a Product Manifold

Assume we have data sets defined as sets of feature-label pairs $(x,y)\in\mathcal{X}\times\mathcal{Y}$ (Alvarez-Melis and Fusi, 2020), where the samples are in $\mathbb{R}^{d_x}$ and the labels are embedded in a Hyperbolic space $\mathbb{H}^{d_y}$. Then, a data set $D_i$ can be seen as a probability distribution on $\mathbb{R}^{d_x}\times\mathbb{H}^{d_y}$, which we can compare using CHSW on product manifolds. We assume that the data sets are already embedded in such spaces. In practice, such an embedding could come up, for instance, when we are given image-text pairs, which can both be embedded in Hyperbolic spaces, e.g. using (Desai et al., 2023), or, for more classical data sets, using label embedding methods (Akata et al., 2015). Here, to get a data set represented in $\mathcal{P}(\mathbb{R}^{d_x}\times\mathbb{H}^{d_y})$, we follow (Liu et al., 2025) and use a multidimensional scaling (MDS) method in hyperbolic spaces (Walter, 2004; Cvetkovski and Crovella, 2011) to get an embedding $\psi : \mathcal{P}(\mathbb{R}^{d_x})\to\mathbb{H}^{d_y}$ into the hyperbolic space such that, for $\nu_y$ denoting the conditional probability distribution of samples in $\mathbb{R}^{d_x}$ with labels $y\in\mathcal{Y}$,
$$W_2^2(\nu_y, \nu_{y'}) \approx \alpha\, d_{\mathbb{H}}\big(\psi(\nu_y), \psi(\nu_{y'})\big)^2,$$
with $\alpha$ some scaling parameter. To find this embedding, we minimize the absolute difference squared loss (Cvetkovski and Crovella, 2011), defined, for an original distance matrix $\Delta = (\delta_{ij})_{i,j}$, a scaling factor $\alpha$ and $z_1,\dots,z_n\in\mathbb{L}^{d_y}$, as
$$\mathcal{L}(z) = \sum_{i,j}\big(d_{\mathbb{L}}(z_i, z_j) - \alpha\delta_{ij}\big)^2.$$
To improve the numerical stability, we perform the optimization in the tangent space following (Mishne et al., 2023), using the parametrization
$$z_i = \exp_{x_0}\big((0,\tilde z_i)\big) = \Big(\cosh(\|\tilde z_i\|),\ \sinh(\|\tilde z_i\|)\frac{\tilde z_i}{\|\tilde z_i\|}\Big)\quad\text{for }\tilde z_i\in\mathbb{R}^{d_y},$$
and then performing the optimization in the Euclidean space.

We focus here on the *NIST data sets, which include MNIST (LeCun and Cortes, 2010), EMNIST (Cohen et al., 2017), FashionMNIST (Xiao et al., 2017), KMNIST (Clanuwat et al., 2018), and USPS (Hull, 1994). We plot in Figure 8 the distance matrix obtained between the *NIST data sets, either using SW between the data sets seen only through their features, i.e., with $D_i\in\mathcal{P}(\mathbb{R}^{d_x})$, or using HCHSW on the space $\mathcal{P}(\mathbb{R}^{d_x}\times\mathbb{L}^{d_y})$, where the labels were embedded in $\mathbb{L}^{d_y}$ using the method described in the previous paragraph, with a scaling of $\alpha = 0.1$ and $d_y = 10$.

Figure 8: Comparison between (a) SW between the data sets and (b) product HCHSW between the data sets embedded on $\mathbb{R}^{d_x}\times\mathbb{L}^{d_y}$. Results are averaged over 100 draws of projections.

We observe that when the labels are not taken into account, the USPS and MNIST data sets have a huge discrepancy between each other. However, when taking the labels into account, we recover that these two data sets are in fact more similar, as they both represent digits. Thus, we argue that using the sliced distance on the product space to take the labels into account provides better comparisons of the data sets. Furthermore, from a computational point of view, CHSW on the product manifold is much cheaper than computing, e.g., the Wasserstein distance. In our experiments, computing the full distance matrix with CHSW took on average 0.05 s, against 120 s for the Wasserstein distance, where we used only 10000 samples of each data set. A sketch of the product Busemann projection underlying HCHSW is given below.
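For reference, here is a minimal sketch of the horospherical (Busemann) projection on the product space $\mathbb{R}^{d_x}\times\mathbb{L}^{d_y}$ used in this experiment, following Proposition 16. We assume curvature $-1$ and the standard Lorentz-model Busemann closed form $B^v(x) = \log(-\langle x, x_0 + v\rangle_{\mathbb{L}})$; the function names are ours.

```python
import numpy as np

def minkowski(u, v):
    """Lorentz inner product <u, v>_L = -u_0 v_0 + sum_i u_i v_i."""
    return -u[0] * v[0] + u[1:] @ v[1:]

def busemann_euclidean(x, v):
    """Busemann coordinate on R^d for a unit direction v: B^v(x) = -<x, v>."""
    return -x @ v

def busemann_lorentz(x, v, x0):
    """Busemann coordinate on the hyperboloid L^d (curvature -1) for a
    unit direction v in T_{x0} L^d: B^v(x) = log(-<x, x0 + v>_L)."""
    return np.log(-minkowski(x, x0 + v))

def busemann_product(x_eucl, x_hyp, v_eucl, v_hyp, lam, x0):
    """Busemann coordinate on R^{dx} x L^{dy} (Proposition 16): a weighted
    sum of per-factor coordinates, with lam = (l1, l2), l1^2 + l2^2 = 1."""
    return (lam[0] * busemann_euclidean(x_eucl, v_eucl)
            + lam[1] * busemann_lorentz(x_hyp, v_hyp, x0))
```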
7. Cartan-Hadamard Sliced-Wasserstein Flows

We now propose to derive the Wasserstein gradient flows of the CHSW distances, along with non-parametric particle schemes that approximate these flows. We first provide the results on general Hadamard manifolds, and then we specify them to Mahalanobis manifolds, Hyperbolic spaces and SPD matrices endowed with the Log-Euclidean metric. The proofs of this section are postponed to Appendix E.

7.1 Wasserstein Gradient Flows

First Variations. On Hadamard manifolds, CHSW discrepancies can be used to learn parametric or empirical distributions by minimizing them. One possible solution is to leverage Wasserstein gradient flows (Ambrosio et al., 2008; Santambrogio, 2017) of $\mathcal{F}(\mu) = \frac12\mathrm{CHSW}_2^2(\mu,\nu)$, where $\nu$ is some target distribution. Approximating this flow would then allow providing new samples from $\nu$. Computing such a flow first requires computing the first variations of the given functional. As a first step towards computing Wasserstein gradient flows of CHSW on Hadamard spaces, and analyzing them, we derive in Proposition 31 the differential of $\mathcal{F}$ in the Wasserstein space.

Proposition 31 Let $K$ be a compact subset of $M$, and $\mu,\nu\in\mathcal{P}_2(K)$ with $\mu\ll\mathrm{Vol}$. For $v\in S_o$, denote $\psi_v$ the Kantorovich potential between $P^v_\#\mu$ and $P^v_\#\nu$ for the cost $c(x,y) = \frac12|x-y|^2$, $x,y\in\mathbb{R}$. Let $\xi$ be a diffeomorphic vector field on $K$, and denote, for all $\epsilon\geq0$, $T_\epsilon : K\to M$ defined as $T_\epsilon(x) = \exp_x\big(\epsilon\xi(x)\big)$ for all $x\in K$. Then,
$$\lim_{\epsilon\to0^+}\frac{\mathrm{CHSW}_2^2\big((T_\epsilon)_\#\mu,\nu\big) - \mathrm{CHSW}_2^2(\mu,\nu)}{2\epsilon} = \int_{S_o}\int_M \psi_v'\big(P^v(x)\big)\,\big\langle\mathrm{grad}_M P^v(x),\ \xi(x)\big\rangle_x\,\mathrm{d}\mu(x)\,\mathrm{d}\lambda_o(v).$$

In the Euclidean case, we recover the formula of the differential of SW first derived in (Bonnotte, 2013, Proposition 5.1.7). Indeed, for $x\in\mathbb{R}^d$, $T_\epsilon(x) = x + \epsilon\xi(x)$, and for $\theta\in S^{d-1}$, $P^\theta(x) = \langle x,\theta\rangle$. Thus $\mathrm{grad}\,P^\theta(x) = \nabla P^\theta(x) = \theta$, and we recover
$$\lim_{\epsilon\to0^+}\frac{\mathrm{SW}_2^2\big((\mathrm{Id}+\epsilon\xi)_\#\mu,\nu\big) - \mathrm{SW}_2^2(\mu,\nu)}{2\epsilon} = \int_{S^{d-1}}\int_{\mathbb{R}^d}\psi_\theta'\big(P^\theta(x)\big)\,\langle\theta,\xi(x)\rangle\,\mathrm{d}\mu(x)\,\mathrm{d}\lambda(\theta).$$

Cartan-Hadamard Sliced-Wasserstein Flow. Given the differential, we can derive the Wasserstein gradient flow of $\mathcal{F}(\mu) = \frac12\mathrm{CHSW}_2^2(\mu,\nu)$ as the continuity equation governed by the vector field $v_t$ obtained through the Wasserstein gradient:
$$\forall x\in M,\quad v_t(x) = -\nabla_{W_2}\mathcal{F}(\mu_t)(x) = -\int_{S_o}\psi_{t,v}'\big(P^v(x)\big)\,\mathrm{grad}_M P^v(x)\,\mathrm{d}\lambda_o(v),$$
with $\psi_{t,v}$ the Kantorovich potential between $P^v_\#\mu_t$ and $P^v_\#\nu$, such that $\psi_{t,v}'(x) = x - F^{-1}_{P^v_\#\nu}\big(F_{P^v_\#\mu_t}(x)\big)$, i.e., the Wasserstein gradient flow $(\mu_t)_{t\geq0}$ of $\mathcal{F}$ is a solution (in the distributional sense) of
$$\partial_t\mu_t + \mathrm{div}(\mu_t v_t) = 0.$$

Forward Euler Scheme. To provide an algorithm to sample from $\nu$ by minimizing $\mathcal{F}(\mu) = \frac12\mathrm{CHSW}_2^2(\mu,\nu)$ while following its Wasserstein gradient flow, several discretization strategies are possible. For instance, a solution could be to compute the backward Euler scheme, also known as the Jordan-Kinderlehrer-Otto (JKO) scheme, from the seminal work of Jordan et al. (1998). This strategy has for example been used to minimize the Sliced-Wasserstein distance in (Bonet et al., 2022). Here, we propose instead to use the forward Euler scheme, which allows defining a particle scheme approximating the trajectory of the Wasserstein gradient flow. Such a strategy has been used to minimize different functionals, such as the MMD (Arbel et al., 2019), the Kernel Stein Discrepancy (Korba et al., 2021) or the KL divergence (Fang et al., 2021; Wang et al., 2022). For SW, Liutkus et al. (2019) proposed to minimize SW with an entropy term, which required using a McKean-Vlasov SDE.

Let $\mu_0\in\mathcal{P}_p(M)$ and $\tau > 0$. On a Riemannian manifold, analogously to the Riemannian gradient descent (Bonnabel, 2013), the forward Euler scheme becomes
$$\forall k\geq0,\quad \mu_{k+1} = \Big(x\mapsto\exp_x\big(-\tau\,\nabla_{W_2}\mathcal{F}(\mu_k)(x)\big)\Big)_\#\mu_k,$$
where the Wasserstein gradient is given by $\nabla_{W_2}\mathcal{F}(\mu_k)(x) = \int_{S_o}\psi_{k,v}'\big(P^v(x)\big)\,\mathrm{grad}_M P^v(x)\,\mathrm{d}\lambda_o(v)$ for $x\in M$. In the Euclidean case, we recover the usual forward Euler scheme $\mu_{k+1} = \big(\mathrm{Id} - \tau\nabla_{W_2}\mathcal{F}(\mu_k)\big)_\#\mu_k$.

Algorithm 1 Wasserstein gradient flow of CHSW
Input: $(y_j)_{j=1}^n\sim\nu$, $\mu_0$, $L$ the number of projections, $N$ the number of steps, $\tau$ the step size
Sample $(x_i^0)_{i=1}^n\sim\mu_0$
for $k = 0$ to $N-1$ do
    Draw $v_1,\dots,v_L\sim\lambda_o$
    Compute $\hat x_{i,\ell}^k = P^{v_\ell}(x_i^k)$ and $\hat y_{j,\ell} = P^{v_\ell}(y_j)$ for all $\ell\in\{1,\dots,L\}$
    Define $P^{v_\ell}_\#\hat\nu = \frac1n\sum_{j=1}^n\delta_{\hat y_{j,\ell}}$ and $P^{v_\ell}_\#\hat\mu_k = \frac1n\sum_{i=1}^n\delta_{\hat x_{i,\ell}^k}$
    Compute $\hat z_{i,\ell}^k = F^{-1}_{P^{v_\ell}_\#\hat\nu}\big(F_{P^{v_\ell}_\#\hat\mu_k}(\hat x_{i,\ell}^k)\big)$
    Compute $g_\ell(x_i^k) = \mathrm{grad}_M P^{v_\ell}(x_i^k)$
    Compute $\hat v_k(x_i^k) = \frac1L\sum_{\ell=1}^L(\hat x_{i,\ell}^k - \hat z_{i,\ell}^k)\,g_\ell(x_i^k)$
    For all $i\in\{1,\dots,n\}$, $x_i^{k+1} = \exp_{x_i^k}\big(-\tau\,\hat v_k(x_i^k)\big)$

In practice, we approximate the Wasserstein gradient by first sampling $v_1,\dots,v_L\sim\lambda_o$ and then using
$$\forall x\in M,\quad \hat v_k(x) = \frac1L\sum_{\ell=1}^L\psi_{v_\ell,k}'\big(P^{v_\ell}(x)\big)\,\mathrm{grad}_M P^{v_\ell}(x),\qquad (4)$$
where $\psi_{v,k}'\big(P^v(x)\big) = P^v(x) - F^{-1}_{P^v_\#\nu}\big(F_{P^v_\#\mu_k}(P^v(x))\big)$. Following (Liutkus et al., 2019), the cumulative distribution functions and the quantile functions are approximated using linear interpolations between the true points.³ Finally, the particle scheme is given by
$$\forall k\geq0,\ \forall i\in\{1,\dots,n\},\quad x_i^{k+1} = \exp_{x_i^k}\big(-\tau\,\hat v_k(x_i^k)\big).$$
We sum up the procedure in Algorithm 1.

³ Using https://github.com/aliutkus/torchinterp1d

7.2 Application to the Mahalanobis Manifold

For pullback Euclidean metrics, the Riemannian gradient can be obtained by using the inverse of the differential operator, as stated in the following lemma.

Lemma 32 (Lemma 4 in (Chen et al., 2024a)) Let $(M, g^\varphi)$ be a pullback Euclidean Riemannian manifold. For $f : M\to\mathbb{R}$ a smooth map, the gradient is of the form
$$\forall x\in M,\quad \mathrm{grad}_M f(x) = \varphi_{*,x}^{-1}\Big(\big(\varphi_{*,x}^*\big)^{-1}\big(\nabla f(x)\big)\Big),$$
where $\varphi_{*,x}^*$ is the adjoint of $\varphi_{*,x}$ and $\nabla f$ the Euclidean gradient of a smooth extension of $f$. In particular, when $\varphi_{*,x}$ is self-adjoint, as for the maps considered here, $\mathrm{grad}_M f(x) = \varphi_{*,x}^{-1}\big(\varphi_{*,x}^{-1}(\nabla f(x))\big)$.

For the Mahalanobis distance, i.e., for $\varphi(x) = A^{1/2}x$ for any $x\in\mathbb{R}^d$ with $A\in S_d^{++}(\mathbb{R})$, the inverse of the differential is simply $\varphi_{*,x}^{-1}(v) = A^{-1/2}v$, and we recall that the projection is $P^v(x) = x^T A v$ for $v\in S_o$. Thus, the Riemannian gradient of the projection $P^v$ for $v\in S_o$ is
$$\mathrm{grad}_M P^v(x) = A^{-1/2}\big(A^{-1/2}(Av)\big) = v.$$

Figure 9: Trajectories of Mahalanobis Sliced-Wasserstein flows using four SPD matrices $A_t$ along the geodesic between $I_2$ and a randomly chosen $A\in S_d^{++}(\mathbb{R})$. Ellipses represent the matrices $A_t$.

We recover the same gradient as in the Euclidean case. But the matrix $A$ is still involved in the formula of the projection, which can change the trajectory of the particles. Choosing the matrix $A$ well can help improving the convergence of the flow for ill-conditioned problems, see e.g. (Duchi et al., 2011; Dong et al., 2023). We illustrate in Figure 9 the effect on the trajectories of using a randomly sampled SPD matrix $A$ to specify the Mahalanobis distance, compared to the classical Euclidean metric. We plot the trajectories for different SPD matrices obtained on the geodesic between $I_2$ and $A$, which is of the form $A_t = \exp\big(t\log(A)\big)$ for $t\in[0,1]$ when using the Affine-Invariant metric.

7.3 Application to Hyperbolic Spaces

Here, we propose to minimize the Hyperbolic Sliced-Wasserstein distances in order to derive new non-parametric schemes allowing to learn a distribution given its samples. We first recall how to compute the gradient on the Lorentz model.

Proposition 33 Let $f : \mathbb{L}^d_K\to\mathbb{R}$ and denote $\bar f : \mathbb{R}^{d+1}\to\mathbb{R}$ a smooth extension of $f$ on $\mathbb{R}^{d+1}$. Then, the gradient of $f$ at $x\in\mathbb{L}^d_K$ is
$$\mathrm{grad}_{\mathbb{L}^d_K}f(x) = \mathrm{Proj}_x^K\big(J\nabla\bar f(x)\big),$$
where $J = \mathrm{diag}(-1,1,\dots,1)$ and $\mathrm{Proj}_x^K(z) = z - K\langle x,z\rangle_{\mathbb{L}}\,x$.

Proof We extend (Boumal, 2023, Proposition 7.7) to $\mathbb{L}^d_K$.

Then, leveraging Proposition 33, we derive the closed forms of the gradients of the geodesic and horospherical projections, which allows deriving the forward Euler scheme of this functional by plugging the different formulas into Equation (4); a generic sketch of the resulting update is given below.
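Here is a minimal generic sketch of this update (one step of Algorithm 1), where the space-specific ingredients, namely the projection $P^v$, its Riemannian gradient, the exponential map and the sampler for $\lambda_o$, are passed as callables. Equal sample sizes and uniform weights are assumed, and the names are ours.

```python
import numpy as np

def chsw_flow_step(x_particles, y_targets, project, grad_project,
                   exp_map, sample_direction, tau, L, rng):
    """One forward-Euler step of Algorithm 1 / Equation (4). With equal
    sample sizes, the composition F^{-1} o F amounts to pairing the order
    statistics of the projected particles and targets."""
    n = len(x_particles)
    drift = [np.zeros_like(x) for x in x_particles]
    for _ in range(L):
        v = sample_direction(rng)                       # v ~ lambda_o
        px = np.array([project(x, v) for x in x_particles])
        py = np.array([project(y, v) for y in y_targets])
        order = np.argsort(px)
        psi = np.empty(n)
        psi[order] = np.sort(px) - np.sort(py)          # psi'_{v,k}(P^v(x_i))
        for i, x in enumerate(x_particles):
            drift[i] += psi[i] * grad_project(x, v)
    # Riemannian step: x <- exp_x(-tau * average drift)
    return [exp_map(x, -tau * d / L) for x, d in zip(x_particles, drift)]
```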
Figure 10: Log 2-Wasserstein between the target distribution and particles obtained from HSWFs, for GHSW, HHSW and SW on the Poincaré ball, along iterations (averaged over 5 runs).

Proposition 34 Let $v\in T_{x_0}\mathbb{L}^d_K\cap S^d$ and $x\in\mathbb{L}^d_K$. Then,
$$\mathrm{grad}_{\mathbb{L}^d_K}B^v(x) = \frac{\sqrt{-K}\,x_0 + v}{\sqrt{-K}\,\big\langle x, \sqrt{-K}\,x_0 + v\big\rangle_{\mathbb{L}}} + \sqrt{-K}\,x,$$
and
$$\mathrm{grad}_{\mathbb{L}^d_K}P^v(x) = \frac{\langle x, x_0\rangle_{\mathbb{L}}\,v - \langle x, v\rangle_{\mathbb{L}}\,x_0}{\langle x, v\rangle_{\mathbb{L}}^2 + K\,\langle x, x_0\rangle_{\mathbb{L}}^2}.$$

On $\mathbb{B}^d_K$, the gradient can be obtained by rescaling the Euclidean gradient by the inverse of the Poincaré ball metric (Nickel and Kiela, 2017), which is $\big(\frac{1+K\|x\|_2^2}{2}\big)^2$ (Park et al., 2021). Thus, we can also derive the corresponding formulas on the Poincaré ball. For example, for the Busemann function with ideal point $v$, we have, for $K = -1$,
$$\nabla\bar B^v(x) = 2\Big(\frac{x}{1-\|x\|_2^2} - \frac{v-x}{\|v-x\|_2^2}\Big),$$
and therefore its Riemannian gradient is
$$\mathrm{grad}_{\mathbb{B}^d_K}\bar B^v(x) = \Big(\frac{1+K\|x\|_2^2}{2}\Big)^2\,\nabla\bar B^v(x).$$

In Figure 10, we plot the 2-Wasserstein distance between the target distribution and samples from the Hyperbolic Sliced-Wasserstein Flows on the Hyperbolic space of curvature $K = -1$. We compare the evolution of GHSW, HHSW and SW (computed on the Poincaré ball for SW) across 4 different scenarios. The first two involve a target distribution which is a Wrapped Normal Distribution (WND), located either close to the center or close to the border of the disk. The last two involve a mixture of WNDs, with modes either close to the border or close to the center. HHSW and GHSW can be computed both on the Lorentz model and on the Poincaré ball; using either model gives similar results. As hyperparameters, we chose $n = 500$ particles, a learning rate of $\tau = 0.1$ with $N = 200$ epochs for centered targets, and $\tau = 0.5$ with $N = 300$ epochs for bordered targets. We note that the three gradient flows perform similarly, with an advantage in speed for SW. This might be due to the fact that the minimization is done in the space of Euclidean probabilities, and thus does not take into account that the modes are actually on the border.

Figure 11: Trajectories of 50 particles for SW, HHSW and GHSW, when the target is (a) the WND on the border or (b) the mixture of WNDs on the border.

We show in Figures 11a and 11b trajectories for the border scenarios. When minimizing GHSW, particles tend to go to the modes by following the shortest path, while when minimizing HHSW, they tend to first go towards the border before converging to the modes. As distances near the border of the Poincaré disk are larger than near the center, this may explain the slower convergence of HHSW observed in Figure 10.

7.4 Application to SPD Matrices with the Log-Euclidean Metric

For SPDSW with the Log-Euclidean metric, the formula of the gradient can be derived using the inverse of the differential, as stated in Lemma 32. We report the inverse of the differential of the log in Lemma 35.

Lemma 35 Let $\varphi : X\mapsto\log(X)$ and $X = UDU^T\in S_d^{++}(\mathbb{R})$, where $D = \mathrm{diag}(\lambda_1,\dots,\lambda_d)$. Then, we have
$$\forall W\in S_d(\mathbb{R}),\quad \varphi_{*,X}^{-1}(W) = U\tilde\Sigma(W)U^T,\quad\text{where }\tilde\Sigma(W) = (U^T W U)\oslash\Gamma,$$
with $\oslash$ the elementwise division and $\Gamma$ defined as in Lemma 12.

Finally, in Lemma 36, we report the Euclidean gradient of the projection obtained with the Log-Euclidean metric, which can be obtained using that the differential of the matrix log is self-adjoint, i.e., $\langle A, \log_{*,X}(V)\rangle_F = \langle\log_{*,X}(A), V\rangle_F$ for any $A, V\in S_d(\mathbb{R})$.

Lemma 36 Let $A\in S_d(\mathbb{R})$ and $X = UDU^T\in S_d^{++}(\mathbb{R})$ with $D = \mathrm{diag}(\lambda_1,\dots,\lambda_d)$. Then,
$$\nabla P^A(X) = U\Sigma(A)U^T,$$
with $\Sigma(A) = (U^T A U)\odot\Gamma$ as in Lemma 12.

We now have all the tools to apply Algorithm 1 in the particular case of SPDSW; a minimal instance is sketched below.
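As an illustration, here is a minimal sketch of the resulting SPDSW particle scheme. Since the Log-Euclidean metric is pullback Euclidean, we run the equivalent Euclidean SW flow on the matrix logarithms and map the particles back with the matrix exponential. Equal sample sizes and uniform weights are assumed; the helper names are ours.

```python
import numpy as np

def _logm(X):
    w, U = np.linalg.eigh(X)
    return (U * np.log(w)) @ U.T

def _expm(S):
    w, U = np.linalg.eigh(S)
    return (U * np.exp(w)) @ U.T

def _sym_direction(d, rng):
    """Uniform unit-Frobenius-norm symmetric direction (orthonormal basis)."""
    iu = np.triu_indices(d, k=1)
    z = rng.standard_normal(d + iu[0].size)
    z /= np.linalg.norm(z)
    A = np.zeros((d, d))
    A[np.diag_indices(d)] = z[:d]
    A[iu] = z[d:] / np.sqrt(2.0)
    return A + A.T - np.diag(np.diag(A))

def spdsw_flow(Xs, Ys, tau=0.1, n_steps=200, L=50, seed=0):
    """Particle scheme for the SPDSW gradient flow, run in log coordinates
    where the Log-Euclidean geometry is the flat Euclidean one."""
    rng = np.random.default_rng(seed)
    d = Xs[0].shape[0]
    x = np.stack([_logm(X).ravel() for X in Xs])   # particles, log coords
    y = np.stack([_logm(Y).ravel() for Y in Ys])   # target samples
    n = x.shape[0]
    for _ in range(n_steps):
        drift = np.zeros_like(x)
        for _ in range(L):
            a = _sym_direction(d, rng).ravel()     # P^A(X) = <log X, A>_F
            px, py = x @ a, y @ a
            order = np.argsort(px)
            psi = np.empty(n)
            psi[order] = np.sort(px) - np.sort(py)
            drift += psi[:, None] * a[None, :]
        x -= tau * drift / L                        # flat forward Euler step
    return [_expm(s.reshape(d, d)) for s in x]      # back to SPD matrices
```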
In Figure 12, trajectories plotted inside the cone of $S_2^{++}(\mathbb{R})$ depict the evolution of the matrices along the gradient flow. The noisy behavior of some of them can mostly be explained by numerical instabilities arising from the different matrix operators used in the process, which require small step sizes.

Figure 12: Trajectories of particles following the Wasserstein gradient flow of SPDSW.

8. Future Works and Discussions

In this article, we formally introduced a way to generalize the Sliced-Wasserstein distance to Riemannian manifolds of non-positive curvature, and we specified this construction for different particular cases: pullback Euclidean metrics, Hyperbolic spaces, the space of Symmetric Positive Definite matrices, and products of Hadamard manifolds. These new discrepancies can be computed very efficiently and scale to distributions composed of a large number of samples, in contrast to the Wasserstein distance. We also analyzed these constructions theoretically, while providing new applications and non-parametric schemes to minimize them using Wasserstein gradient flows.

Further works might include studying other Hadamard manifolds for which we do not necessarily have a closed form for the projections, such as Siegel spaces (Cabanes, 2022), or extending this construction to more general manifolds, such as Riemannian manifolds of non-negative curvature, Finsler manifolds (Shen, 2001), which have recently received attention in Machine Learning (López et al., 2021; Pouplin et al., 2023; Lin et al., 2023), or, more generally, metric spaces. For projections, we studied two natural generalizations of the projections used in Euclidean spaces. We could also study other projections which do not follow geodesic subspaces or horospheres but are well suited to Riemannian manifolds, in the same spirit as the Generalized Sliced-Wasserstein. Other subspaces could also be used, such as Hilbert curves (Bernton et al., 2019; Li et al., 2024) adapted to manifolds, or higher-dimensional subspaces (Paty and Cuturi, 2019; Chami et al., 2021). Finally, we could also define other variants of CHSW, such as max-CHSW for instance, and more generally adapt to Riemannian manifolds many of the variants which have been proposed for SW. Note also that we could plug these constructions into the framework introduced by Bonet et al. (2024) to compare positive measures on Hadamard manifolds.

On the theoretical side, it remains to show that these Sliced-Wasserstein discrepancies are proper distances by proving the indiscernibility property. It might also be interesting to study whether statistical properties of the Euclidean SW distance, derived e.g. in (Nietert et al., 2022; Manole et al., 2022; Goldfeld et al., 2022; Xu and Huang, 2022; Xi and Niles-Weed, 2022), still hold more generally for CHSW on any Cartan-Hadamard manifold, or to study the properties of the space of probability measures endowed with these distances, such as its geodesics or the gradient flows in this space, as was recently done in (Candau-Tilh, 2020; Bonet et al., 2022; Park and Slepčev, 2023; Kitagawa and Takatsu, 2023) for the Euclidean Sliced-Wasserstein distance.

Acknowledgments

This research was funded by project DynaLearn from Labex CominLabs and Région Bretagne ARED DLearnMe, and by the project OTTOPIA ANR-20-CHIA-0030 of the French National Research Agency (ANR).
Clément Bonet's research was partially supported by the center Hi! PARIS.

Appendix A. Useful Lemmas

We derive here some lemmas which will be useful for the proofs.

Lemma 37 (Lemma 6 in (Paty and Cuturi, 2019)) Let $M, N$ be two Riemannian manifolds. Let $f : M\to N$ be a measurable map and $\mu,\nu\in\mathcal{P}(M)$. Then, $\Pi(f_\#\mu, f_\#\nu) = \{(f\otimes f)_\#\gamma,\ \gamma\in\Pi(\mu,\nu)\}$.

Proof This is a straightforward extension of (Paty and Cuturi, 2019, Lemma 6).

Lemma 38 Let $(M, g)$ be a Hadamard manifold with origin $o$. Let $v\in T_oM$, then
1. the geodesic projection $P^v$ is 1-Lipschitz;
2. the Busemann function $B^v$ is 1-Lipschitz.

Proof
1. By Proposition 2, we know that $\forall x,y\in M$, $|P^v(x) - P^v(y)| = d\big(\bar P^v(x), \bar P^v(y)\big)$. Moreover, by (Ballmann et al., 2006, page 9), $\bar P^v$ is 1-Lipschitz, and so is $P^v$.
2. The Busemann function is 1-Lipschitz, see e.g. (Bridson and Haefliger, 2013, II. Proposition 8.22).

Lemma 39 Let $d$ be a metric on $M$. Then, for any $p\geq1$,
$$\forall x,y\in M,\quad d(x,y)^p\leq 2^{p-1}\big(d(x,o)^p + d(o,y)^p\big).$$

Lemma 40 (Lemma 1 in (Rakotomamonjy et al., 2021)) Let $p\geq1$ and $\eta\in\mathcal{P}_p(\mathbb{R})$. Denote $M_q(\eta) = \int|x|^q\,\mathrm{d}\eta(x)$ the moment of order $q$, and assume that $M_q(\eta) < \infty$ for some $q > p$. Then, there exists a constant $C_{p,q}$ depending only on $p, q$ such that for all $n\geq1$,
$$\mathbb{E}\big[W_p^p(\hat\eta_n, \eta)\big]\leq C_{p,q}\,M_q(\eta)^{p/q}\Big(n^{-1/2}\mathbb{1}_{\{q>2p\}} + n^{-1/2}\log(n)\mathbb{1}_{\{q=2p\}} + n^{-(q-p)/q}\mathbb{1}_{\{q\in(p,2p)\}}\Big).$$

Lemma 41 Let $y\in M$ and denote, for all $x\in M$, $f(x) = d(x,y)^2$. Then, $\mathrm{grad}_M f(x) = -2\log_x(y)$.

For references about Lemma 41, see e.g. (Chewi et al., 2020, Appendix A) or (Goto and Sato, 2021).

Appendix B. Proofs of Section 3

B.1 Proof of Proposition 2

Proof of Proposition 2 Let $x,y\in G_v$. Then, there exist $s,t\in\mathbb{R}$ such that $x = \exp_o(sv)$ and $y = \exp_o(tv)$. By a simple computation, we have on one hand that
$$\mathrm{sign}\big(\langle\log_o(x), v\rangle_o\big) = \mathrm{sign}\big(\langle\log_o(\exp_o(sv)), v\rangle_o\big) = \mathrm{sign}\big(s\|v\|_o^2\big) = \mathrm{sign}(s),$$
using that $\log_o\circ\exp_o = \mathrm{Id}$. Similarly, $\mathrm{sign}\big(\langle\log_o(y), v\rangle_o\big) = \mathrm{sign}(t)$. Then, noting that $o = \exp_o(0)$, and recalling that $d(x,y) = d\big(\exp_o(sv), \exp_o(tv)\big) = |t-s|\,\|v\|_o$ with $\|v\|_o = 1$, we obtain
$$\begin{aligned}
|t^v(x) - t^v(y)| &= \big|\mathrm{sign}(\langle\log_o(x), v\rangle_o)\,d(x,o) - \mathrm{sign}(\langle\log_o(y), v\rangle_o)\,d(y,o)\big|\\
&= \big|\mathrm{sign}(s)\,d(\exp_o(sv), \exp_o(0)) - \mathrm{sign}(t)\,d(\exp_o(tv), \exp_o(0))\big|\\
&= \big|\mathrm{sign}(s)|s| - \mathrm{sign}(t)|t|\big|\,\|v\|_o = |s-t|\,\|v\|_o = d(x,y).
\end{aligned}$$

B.2 Proof of Proposition 3

Proof of Proposition 3 We want to solve
$$\bar P^v(x) = \operatorname*{argmin}_{t\in\mathbb{R}}\ d\big(\gamma(t), x\big)^2,$$
where $\gamma(t) = \exp_o(tv)$. For $t\in\mathbb{R}$, let $g(t) = d\big(\gamma(t), x\big)^2 = f\big(\gamma(t)\big)$, where $f(y) = d(y,x)^2$ for $y\in M$. Then, by Lemma 41, we have for any $t\in\mathbb{R}$,
$$g'(t) = 0 \iff \big\langle\gamma'(t),\ \mathrm{grad}_M f\big(\gamma(t)\big)\big\rangle_{\gamma(t)} = 0 \iff \big\langle\gamma'(t),\ -2\log_{\gamma(t)}(x)\big\rangle_{\gamma(t)} = 0.$$

B.3 Proof of Proposition 4

Proof of Proposition 4 First, we note that $P^v = t^v\circ\bar P^v$. Then, by using Lemma 37, which states that $\Pi(f_\#\mu, f_\#\nu) = \{(f\otimes f)_\#\gamma,\ \gamma\in\Pi(\mu,\nu)\}$ for any measurable $f$, as well as the fact that, by Proposition 2, $|t^v(x) - t^v(y)| = d(x,y)$ for $x,y\in G_v$, we have:
$$\begin{aligned}
W_p^p(P^v_\#\mu, P^v_\#\nu) &= \inf_{\gamma\in\Pi(P^v_\#\mu, P^v_\#\nu)}\int_{\mathbb{R}\times\mathbb{R}}|x-y|^p\,\mathrm{d}\gamma(x,y)\\
&= \inf_{\gamma\in\Pi(\mu,\nu)}\int|x-y|^p\,\mathrm{d}(P^v\otimes P^v)_\#\gamma(x,y)\\
&= \inf_{\gamma\in\Pi(\mu,\nu)}\int_{M\times M}|P^v(x) - P^v(y)|^p\,\mathrm{d}\gamma(x,y)\\
&= \inf_{\gamma\in\Pi(\mu,\nu)}\int_{M\times M}\big|t^v\big(\bar P^v(x)\big) - t^v\big(\bar P^v(y)\big)\big|^p\,\mathrm{d}\gamma(x,y)\\
&= \inf_{\gamma\in\Pi(\mu,\nu)}\int_{M\times M}d\big(\bar P^v(x), \bar P^v(y)\big)^p\,\mathrm{d}\gamma(x,y)\\
&= \inf_{\gamma\in\Pi(\mu,\nu)}\int d(x,y)^p\,\mathrm{d}(\bar P^v\otimes\bar P^v)_\#\gamma(x,y)\\
&= \inf_{\gamma\in\Pi(\bar P^v_\#\mu, \bar P^v_\#\nu)}\int_{G_v\times G_v}d(x,y)^p\,\mathrm{d}\gamma(x,y) = W_p^p(\bar P^v_\#\mu, \bar P^v_\#\nu).
\end{aligned}$$
Now, let us show the result when using the Busemann projection. Let $v\in T_oM$ such that $\|v\|_o = 1$, and recall that $\bar B^v(x) = \exp_o\big(B^v(x)\,v\big)$.
First, let us compute tv Bv: x M, tv Bv(x) = sign( logo( Bv(x)), v o) d( Bv(x), o) = sign logo(expo( Bv(x)v)), v o d expo( Bv(x)v), expo(0) = sign( Bv(x) v 2 o) d(expo( Bv(x)v), expo(0)) = sign( Bv(x))| Bv(x)| Then, using the same computation as before, we get W p p (Bv #µ, Bv #ν) = W p p ( Bv #µ, Bv #ν). Appendix C. Proofs of Section 4 C.1 Proof of Proposition 7 Proof of Proposition 7 Geodesic projection. Let x M. Denote f : R R such that f(t) = d M γ(t), x 2 = d M φ 1(φ(o) + tφ ,o(v)), x 2 = φ φ 1(φ(o) + tφ ,o(v)) φ(x) 2 = t2 φ ,o(v) 2 2t φ(x) φ(o), φ ,o(v) + φ(o) φ(x) 2 = t2 2t φ(x) φ(o), φ ,o(v) + φ(o) φ(x) 2, Sliced-Wasserstein Distances and Flows on Cartan-Hadamard Manifold using in the last line that φ ,o(v) 2 = 1 since v So. Then, f (t) = 0 t = φ(x) φ(o), φ ,o(v) . Therefore, P v(x) = argmin t R f(t) = φ(x) φ(o), φ ,o(v) . Busemann function. First, following (Bridson and Haefliger, 2013), we have for all x M, Bv(x) = lim t d M(γv(t), x) t = lim t d M(γv(t), x)2 t2 denoting γv : t 7 φ 1 φ(o) + tφ ,o(v) the geodesic line associated to Gv. Then, we get d M(γv(t), x)2 t2 2t φ(γv(t))2 φ(x) 2 t2 2t φ(o) + tφ ,o(v) φ(x) 2 t2 2t t2 φ ,o(v) 2 2t φ ,o(v), φ(x) φ(o) + φ(x) φ(o) 2 t2 = φ ,o(v), φ(x) φ(o) + 1 2t φ(x) φ(o) 2, using that v o = φ ,o(v) = 1. Then, by passing to the limit t , we find Bv(x) = φ ,o(v), φ(x) φ(o) . C.2 Proof of Proposition 9 We start by giving the proof of the coordinate geodesic projection which we recall in Proposition 42. Proposition 42 (Coordinate of the geodesic projection on Hyperbolic space) 1. Let Gv = span(x0, v) Ld K where v Tx0Ld K Sd. Then, the coordinate P v(x) of the geodesic projection on Gv of x Ld K is K arctanh 1 K x, v L x, x0 L 2. Let v Sd 1 be an ideal point. Then, the coordinate P v(x) of the geodesic projection on the geodesic characterized by v of x Bd K is where s is defined as ( 1 K x 2 2 (1 K x 2 2)2+4K x, v 2 2K x, v if x, v = 0 0 if x, v = 0. Bonet, Drumetz, Courty First, we will compute in Proposition 43 the geodesic projections. Proposition 43 (Geodesic projection) 1. Let Gv = span(x0, v) Ld K where v Tx0Ld K Sd. Then, the geodesic projection P v on Gv of x Ld K is P v(x) = 1 q K x, x0 2 L x, v 2 L K x, x0 Lx0 + x, v Lv . 2. Let v Sd 1 be an in ideal point. Then, the geodesic projection P v on the geodesic characterized by v of x Bd K is P v(x) = s(x) v, ( 1 K x 2 2 (1 K x 2 2)2+4K x, v 2 2K x, v if x, v = 0 0 if x, v = 0. Proof of Proposition 43 1. Lorentz model. Any point y on the geodesic obtained by the intersection between E = span(x0, v) and Ld K can be written as Kt)x0 + sinh( where t R. Moreover, as arccosh is an increasing function, we have P v(x) = argmin y E Ld K d L(x, y) = argmin y E Ld K arccosh(K x, y L) = argmin y E Ld K K x, y L. This problem is equivalent with solving argmin t R K cosh( Kt) x, x0 L + K sinh( Let g(t) = cosh( Kt) x, x0 L + sinh( Kt) K x, v L, then g (t) = 0 tanh( K x, v L x, x0 L . (5) Sliced-Wasserstein Distances and Flows on Cartan-Hadamard Manifold Finally, using that 1 tanh2(t) = 1 cosh2(t) and cosh2(t) sinh2(t) = 1, and observing that necessarily, x, x0 L 0, we obtain 1 1 K x,v L x,x0 L K x, x0 L q K x, x0 2 L x, v 2 L , Kt) = x, v L q K x, x0 2 L x, v 2 L . 2. Poincar e ball. A geodesic passing through the origin on the Poincar e ball is of the form γ(t) = tp for an ideal point p Sd 1 and t ] 1 K , 1 K [. 
Using that arccosh is an increasing function, we find P p(x) = argmin y span(γ) d B(x, y) = argmin tp 1 K arccosh 1 2K x γ(t) 2 2 (1 + K x 2 2)(1 + K γ(t) 2 2) = argmin tp log x γ(t) 2 2 log 1 + K x 2 2 log 1 + K γ(t) 2 2 = argmin tp log x tp 2 2 log 1 + Kt2 . Let g(t) = log x tp 2 2 log 1 + Kt2 . Then, ( t2 + 1 K x 2 2 K x,p t 1 K = 0 if p, x = 0, t = 0 if p, x = 0. Finally, if x, p = 0, the solution is t = 1 K x 2 2 2K x, p s 1 K x 2 2 2K x, p Now, let us suppose that x, p > 0. Then, 1 K x 2 2 2K x, p + s 1 K x 2 2 2K x, p K 1 K x 2 2 2K x, p Kx p 2 2 0 implies that 1 K x 2 2 2 K x,p 1 which implies that 1 K x 2 2 2K x,p 1 K , and therefore the solution is t = 1 K x 2 2 2K x, p s 1 K x 2 2 2K x, p Bonet, Drumetz, Courty Similarly, if x, p < 0, then 1 K x 2 2 2K x, p s 1 K x 2 2 2K x, p K 1 K x 2 2 2K x, p Kx + p 2 2 0 implies 1 K x 2 2 2 K x,p 1, which implies that 1 K x 2 2 2K x,p 1 K and the solution is 1 K x 2 2 2K x, p + s 1 K x 2 2 2K x, p 1 K x 2 2 2K x,p r 1 K x 2 2 2K x,p 2 + 1 K if x, p > 0 1 K x 2 2 2K x,p + r 1 K x 2 2 2K x,p 2 + 1 K if x, p < 0. = 1 K x 2 2 2K x, p sign( x, p ) s 1 K x 2 2 2K x, p = 1 K x 2 2 2K x, p sign( x, p ) 2Ksign( x, p ) x, p (1 K x 2 2)2 + 4K x, p 2 = 1 K x 2 2 p (1 K x 2 2)2 + 4K x, p 2 Proof of Proposition 42 1. Lorentz model. The coordinate on the geodesic can be obtained as P v(x) = argmin t R d L expx0(tv), x . Hence, by using Equation (5), we obtain that the optimal t satisfies K x, v L x, x0 L t = 1 K arctanh 1 K x, v L x, x0 L 2. Poincar e ball. As a geodesic is of the form γ(t) = tanh Kt 2 p K for all t R, we deduce from Proposition 43 that Sliced-Wasserstein Distances and Flows on Cartan-Hadamard Manifold We now derive the closed forms of the horospherical projections which we recall in Proposition 44. Proposition 44 (Busemann function on Hyperbolic space) 1. On Ld K, for any direction v Tx0Ld K Sd, x Ld K, Bv(x) = 1 2. On Bd K, for any ideal point v Sd 1, x Bd K, B v(x) = 1 Kx 2 2 1 + K x 2 2 Proof of Proposition 44 1. Lorentz model. The geodesic in direction v can be characterized by t R, γv(t) = cosh( Kt)x0 + sinh( Hence, we have for all x Ld K, d L(γv(t), x) K arccosh(K γv(t), x L) K arccosh(K cosh( Kt) x, x0 L + K Kt) x, v L) K e Kt + e Kt 2 x, x0 L + K K e Kt e Kt 2 (1 + e 2 Kt) x, x0 L + 1 K (1 e 2 Kt) x, v L ! K arccosh x(t) . Then, on one hand, we have x(t) t , and using that arccosh(x) = log x + x2 1 , we have d L(γv(t), x) t = 1 log x(t) + p K log x(t) + p x(t)2 1 e Kt e Ktx(t) + e Ktx(t) K log e Ktx(t) + e Ktx(t) 1 1 2x(t)2 + o 1 x(t)2 Bonet, Drumetz, Courty e Ktx(t) = K 2 (1 + e 2 Kt) x, x0 L + K 2 K (1 e 2 Kt) x, v L x, x0 L + x, v L K log K x, x0 L + x, v L 2. Poincar e ball. Let p Sd 1, then the geodesic from 0 to p is of the form γp(t) = exp0(tp) = tanh( Kt 2 ) p K . Moreover, recall that arccosh(x) = log(x + d B(γp(t), x) = 1 1 2K tanh( Kt 2 ) p K x 2 2 (1 tanh2( Kt 2 ))(1 + K x 2 2) K arccosh(1 + x(t)), x(t) = 2K tanh( Kt 2 ) p K x 2 2 (1 tanh2( Kt 2 ))(1 + K x 2 2) . Now, on one hand, we have Bp(x) = lim t (d B(γp(t), x) t) log 1 + x(t) + p x(t)2 + 2x(t) K log e Kt(1 + x(t) + p x(t)2 + 2x(t)) . On the other hand, using that tanh( t e Ktx(t) = 2Ke Kt e Kt 1 e Kt+1 p K x 2 2 (1 (e Kt 1 e Kt+1)2)(1 + K x 2 2) = 2e Kt e Ktp p Kx 2 2 4e Kt(1 + K x 2 2) Ke Ktx 2 2 1 + K x 2 2 Kx 2 2 1 + K x 2 2 . Sliced-Wasserstein Distances and Flows on Cartan-Hadamard Manifold Bp(x) = lim t 1 e Kt + e Ktx(t) + e Ktx(t) Kx 2 2 1 + K x 2 2 using that q 1 + 2 x(t) = 1 + 1 x(t) + o( 1 x(t)) and 1 x(t) t 0. 
C.3 Proof of Proposition 11 First, we recall and show two lemmas. Lemma 45 (Proposition 5.6.c in (Lee, 2006)) Suppose φ : (M, g) ( M, g) is an isometry. Then, φ takes geodesics to geodesics, i.e. if γ is the geodesic in M with γ(0) = p and γ (0) = v, then φ γ is the geodesic in M with φ(γ(0)) = φ(p) and (φ γ) (0) = φ ,p(v). Lemma 46 Let φ : (M, g) ( M, g) an isometry and v To M such that v o = 1. Then for all x M, Bv(x) = Bφ ,o(v) φ(x) , (6) P v(x) = P φ ,o(v) φ(x) . (7) Proof of Lemma 46 Let v To M such that v o = 1, x M. By Lemma 45, we have φ expo(tv) = expφ(o) tφ ,o(v) . Proof of Equation (6). Let us show that Bv(x) = Bφ ,o(v) φ(x) . By definition of the Busemann function, we have Bv(x) = lim t d M x, expo(tv) t = lim t d M φ(x), φ(expo(tv)) t since φ is an isometry = lim t d M φ(x), expφ(o)(tφ ,o(v)) t = Bφ ,o(v) φ(x) . Proof of Equation (7). Let us now show that P v(x) = P φ ,o(v) φ(x) . Then, P v(x) = argmin t R d M x, expo(tv) = argmin t R d M φ(x), φ(expo(tv)) since φ is an isometry = argmin t R d M φ(x), expφ(o)(tφ ,o(v)) by Lemma 45 = P φ ,o(v) φ(x) . Bonet, Drumetz, Courty Proof of Proposition 11 First, let us show that for λo-almost all v So, W p p (Bv #µ, Bv #ν) = W p p (Bφ ,o(v) # µ, Bφ ,o(v) # ν) and W p p (P v #µ, P v #ν) = W p p (P φ ,o(v) # µ, P φ ,o(v) # ν). Using Lemma 46, we have W p p (Bv #µ, Bv #ν) = W p p (Bφ ,o(v) # φ#µ, Bφ ,o(v) # φ#ν) = W p p (Bφ ,o(v) # µ, Bφ ,o(v) # ν), W p p (P v #µ, P v #ν) = W p p (P φ ,o(v) # φ#µ, P φ ,o(v) # φ#ν) = W p p (P φ ,o(v) # µ, P φ ,o(v) # ν). These results are true for all v So, and therefore for λo-almost all v So. Thus, by integrating with respect to λo, and performing the change of v 7 φ ,o(v) on the right side, we find HCHSWp p(µ, ν; λo) = HCHSWp p( µ, ν; (φ ,o)#λo), GCHSWp p(µ, ν; λo) = GCHSWp p( µ, ν; (φ ,o)#λo). Finally, we can conclude by using that φ ,o is an isometry between the tangent spaces and hence (φ ,o)#λo = λφ(o). C.4 Proof of Proposition 15 First, let us compute the differential of φ = ϕ L. In that purpose, we first recall the differential of L : X = LLT 7 L derived in (Lin, 2019, Proposition 4). Lemma 47 (Proposition 4 in (Lin, 2019)) Let X S++ d (R) and V Sd(R). The differential operator L ,X : TXS++ d (R) TL(X)L++ d (R) of L at X is given by L ,X(V ) = L(X) L(X) 1V L(X) T + 1 2diag L(X) 1V L(X) T . Lemma 48 Let φ : X 7 ϕ L(X) and X = LLT S++ d (R) with L L++ d (R) obtained by the Cholesky decomposition. The differential operator of φ at X is given by V TXS++ d (R), φ ,X(V ) = L ,X(V ) + diag L(X) 1diag L ,X(V ) , where L ,X(V ) = L(X) L(X) 1V L(X) T + 1 2diag L(X) 1V L(X) T . Proof of Lemma 48 Using the chain rule, we have, for X S++ d (R) and V Sd(R), φ ,X(V ) = ϕ ,X L ,X(V ) = L ,X(V ) + log ,diag(L) diag(L ,X(V )) = L ,X(V ) + Σ diag(L ,X(V )) , using Lemma 12 for the differential of the log with Σ diag(L ,X(V )) = diag(L ,X(V )) Γ = diag(L ,X(V )) diag(L(X)) = diag(L(X)) 1diag(L ,X(V )). Sliced-Wasserstein Distances and Flows on Cartan-Hadamard Manifold Thus, we conclude that φ ,X(V ) = L ,X(V ) + diag L(X) 1diag L ,X(V ) . Proof of Proposition 15 On one hand we have φ(Id) = 0, L ,Id(V ) = V + 1 2diag(V ) and thus φ ,Id(V ) = V + 1 2diag(V ) since L(Id) = Id. Thus, using Proposition 7, the projection is given, for A Sd(R) such that A 2 Id = φ ,Id(A), φ ,Id(A) F = 1, by X = LLT S++ d (R), P A(X) = φ(X), φ ,Id(A) F = L + log(diag(L)), A + 1 = L , A F + log(diag(L)), 1 2diag(A) F . C.5 Proof of Proposition 16 Proof of Proposition 16 We use that Bγ(x) = limt d(x,γ(t))2 t2 2t (see e.g. 
(Bridson and Haefliger, 2013, II. 8.24)). Thus, Bγ(x) = lim t d(x, γ(t)) t = lim t d(x, γ(t))2 t2 i=1 λi di(xi, γi(λit))2 λ2 i t2 i=1 λi Bγi(xi). Appendix D. Proofs of Section 5 D.1 Proof of Proposition 17 Proof of Proposition 17 First, we will show that for any µ, ν Pp(M), CHSWp(µ, ν) < . Let µ, ν Pp(M), and let γ Π(µ, ν) be an arbitrary coupling between them. Then by using first Lemma 37 followed by the 1-Lipschitzness of the projections Lemma 38 and Bonet, Drumetz, Courty Lemma 39, we obtain W p p (P v #µ, P v #ν) = inf γ Π(µ,ν) Z |P v(x) P v(y)|p dγ(x, y) Z |P v(x) P v(y)|p dγ(x, y) Z d(x, y)p dγ(x, y) 2p 1 Z d(x, o)p dµ(x) + Z d(o, y)p dν(y) Hence, we can conclude that CHSWp p(µ, ν) < . Now, let us show that it is a pseudo-distance. First, it is straightforward to see that CHSWp(µ, ν) 0, that it is symmetric, i.e. CHSWp(µ, ν) = CHSWp(ν, µ), and that µ = ν implies that CHSWp(µ, ν) = 0 using that Wp is well a distance. For the triangular inequality, we can derive it using the triangular inequality for Wp and the Minkowski inequality. Let µ, ν, α Pp(M), CHSWp(µ, ν) = Z So W p p (P v #µ, P v #ν) dλo(v) 1 Wp(P v #µ, P v #α) + Wp(P v #α, P v #ν) p dλo(v) 1 So W p p (P v #µ, P v #α) dλo(v) 1 So W p p (P v #α, P v #ν) dλo(v) 1 = CHSWp(µ, α) + CHSWp(α, ν). D.2 Proof of Proposition 18 Proof of Proposition 18 Let f L1(M), g C0(R So), then by Fubini s theorem, CHRf, g R So = Z R CHRf(t, v)g(t, v) dtdλo(v) M f(x)1{t=P v(x)}g(t, v) d Vol(x)dtdλo(v) R g(t, v)1{t=P v(x)} dtdλo(v)d Vol(x) So g P v(x), v dλo(v)d Vol(x) M f(x)CHR g(x) d Vol(x) = f, CHR g M. Sliced-Wasserstein Distances and Flows on Cartan-Hadamard Manifold D.3 Proof of Proposition 19 Proof of Proposition 19 We follow the proof of (Boman and Lindskog, 2009, Lemma 1). On one hand, g C0(R So), thus for all ϵ > 0, there exists M > 0 such that |t| M implies |g(t, v)| ϵ for all v So. Let ϵ > 0 and M > 0 which satisfies the previous property. Denote E(x, M) = {v So, |P v(x)| < M}. Then, as d(x, o) > 0, we have E(x, M) = {v So, |P v(x)| < M} = v So, P v(x) d(x, o) < M d(x, o) Thus, λo E(x, M) d(x,o) 0. Choose M such that d(x, o) > M implies that λo E(x, M) < ϵ. Then, for x M such that |P v(x)| max(M, M ) (and thus d(x, o) M since |P v(x)| d(x, o) as P v is Lipschitz, E(x,M) g(P v(x), v) dλo(v) E(x,M)c g(P v(x), v) dλo(v) g λo E(x, M) + ϵλo E(x, M)c Thus, we showed that CHR g(x) d(x,o) 0, and thus CHR g C0(M). D.4 Proof of Proposition 21 Proof of Proposition 21 Let g C0(R So), as CHRµ = λo Kµ, we have by definition Z R g(t, v) Kµ(v, dt) dλo(v) = Z R So g(t, v) d(CHRµ)(t, v). Hence, using the property of the dual, we have for all g Co(R So), Z R g(t, v) Kµ(v, dt) dλo(v) = Z R So g(t, v) d(CHRµ)(t, v) M CHR g(x) dµ(x) So g(P v(x), v) dλo(v)dµ(x) M g(P v(x), v) dµ(x)dλo(v) R g(t, v) d(P v #µ)(t)dλo(v). Bonet, Drumetz, Courty Hence, for λo-almost every v So, Kµ(v, ) = P v #µ. D.5 Proof of Proposition 22 Proof of Proposition 22 Using Lemma 37 and that the projections are 1-Lipschitz (Lemma 38), we can show that, for any µ, ν Pp(M), CHSWp p(µ, ν) = inf γ Π(µ,ν) Z |P v(x) P v(y)|p dγ(x, y). Let γ Π(µ, ν) being an optimal coupling for the Wasserstein distance with ground cost d, then, CHSWp p(µ, ν) Z |P v(x) P v(y)|p dγ (x, y) Z d(x, y)p dγ (x, y) = W p p (µ, ν). D.6 Proof of Proposition 23 Proof of Proposition 23 Let µ, ν Pp(M), then CHSWp p(µ, ν) = Z So W p p (P v #µ, P v #ν) dλo(v) So F 1 P v #µ F 1 P v #ν p Lp([0,1]) dλo(v) F 1 P v #µ(q) F 1 P v #ν(q) p dq dλo(v) = Φ(µ) Φ(ν) p H. Thus, CHSWp is Hilbertian. 
D.7 Proof of Lemma 25 Proof of Lemma 25 Since for any v So and x M, P v(x) = φ(x) φ(o), φ ,o(x) , by using Lemma 37 we have W p p (P v #µ, P v #ν) = inf γ Π(P v #µ,P v #ν) Z |x y|p dγ(x, y) = inf γ Π(µ,ν) Z |P v(x) P v(y)|p dγ(x, y) = inf γ Π(µ,ν) Z | φ(x) φ(y), φ ,o(v) |p dγ(x, y). Sliced-Wasserstein Distances and Flows on Cartan-Hadamard Manifold Let s note Qv(x) = x, v . Then, we obtain W p p (P v #µ, P v #ν) = inf γ Π(µ,ν) Z |Qφ ,o(v)(φ(x)) Qφ ,o(v)(φ(y))|p dγ(x, y) = W p p (Qφ ,o(v) # φ#µ, Qφ ,o(v) # φ#ν). Therefore, we obtain CHSWp p(µ, ν) = Z So W p p (Qφ ,o(v) # φ#µ, Qφ ,o(v) # φ#ν) dλo(v) Sφ(o) W p p (Qv #φ#µ, Qv #φ#ν) d (φ ,o)#λo (v). Finally, since φ ,o is an isometry between the tangent spaces by definition of the metric, we have (φ ,o)#λo = λφ(o). D.8 Proof of Proposition 26 Proof of Proposition 26 We know by Proposition 17 that CHSWp is a finite pseudo-distance. For the indiscernible property, using Lemma 25 and the distance property of SWp, we have that CHSWp(µ, ν) = SWp p(φ#µ, φ#ν) = 0 implies that φ#µ = φ#ν by applying the same proof of (Bonnotte, 2013, Proposition 5.1.2). Indeed, we have that SWp p(φ#µ, φ#ν) = 0 implies W p p (Qv #φ#µ, Qv #φ#ν) = 0 for λφ(o)-almost every v Sφ(o), and thus that Qv #φ#µ = Qv #φ#ν since Wp is a distance. Hence, using the Fourier transform and that λφ(o) is absolutely continuous with respect to the Lebesgue measure, we obtain that φ#µ = φ#ν. Then, as φ is a bijection from M to N, we have for all Borelian C M, M 1C(x) dµ(x) N 1C φ 1(y) d(φ#µ)(y) N 1C φ 1(y) d(φ#ν)(y) M 1C(x) dν(x) D.9 Proof of Proposition 27 To prove Proposition 27, we will adapt the proof of Nadjahi et al. (2020) to our projection. First, we start to adapt Nadjahi et al. (2020, Lemma S1): Bonet, Drumetz, Courty Lemma 49 (Lemma S1 in Nadjahi et al. (2020)) Let (µk)k Pp(M) and µ Pp(M) such that limk CHSW1(µk, µ) = 0. Then, there exists ϕ : N N non decreasing such that µϕ(k) L k µ. Proof of Theorem 49 Using Lemma 25, we know that CHSW1(µ, ν) = SW1(φ#µ, φ#ν). Let s note αk = φ#µk Pp(N) and α = φ#µ Pp(N) and Qv(x) = v, x . Then, by Bogachev and Ruas (2007, Theorem 2.2.5), Sφ(o) W1(Qv #αk, Qv #α) dλφ(o)(v) = 0 implies that there exists a subsequence (µϕ(k))k such that for λφ(o)-almost every v, W1(Qv #αϕ(k), Qv #α) k 0. As the Wasserstein distance metrizes the weak convergence, this is equivalent to Qv #µϕ(k) L k Qv #µ. Then, by Levy s characterization theorem, this is equivalent with the pointwise convergence of the characterization function, i.e. for all t R, ΦQv #αϕ(k)(t) k ΦQv #µ(t). Then, working in Tφ(o)N with the Euclidean norm, we can use the same proof of Nadjahi et al. (2020) by using a convolution with a gaussian kernel and show that it implies that αϕ(k) L k α, i.e. φ#µϕ(k) L k φ#µ. Finally, let s show that it implies the weak convergence of (µϕ(k))k towards µ. Let f Cb(M), then Z M f dµϕ(k) = Z N f φ 1 d(φ#µϕ(k)) k N f φ 1 d(φ#µ) = Z Hence, we an conclude that µϕ(k) L k µ. Proof of Proposition 27 First, we suppose that µk L k µ in Pp(M). Then, by con- tinuity, we have that for λo almost every v To M, P v #µk k P v #µ. Moreover, as the Wasserstein distance on R metrizes the weak convergence, Wp(P v #µk, P v #µ) k 0. Fi- nally, as Wp is bounded and it converges for λo-almost every v, we have by the Lebesgue convergence dominated theorem that CHSWp p(µk, µ) k 0. For the opposite side, suppose that CHSWp(µk, µ) k 0. Then, since we generalized (Nadjahi et al., 2020, Lemma S1) to our setting in Theorem 49, we can use the same contradiction argument as Nadjahi et al. 
D.10 Proof of Proposition 28

Proof of Proposition 28 By using Lemma 37, let us first observe that

$$W_1(\mu,\nu) = \inf_{\gamma\in\Pi(\mu,\nu)} \int_{M\times M} d_M(x,y)\,\mathrm{d}\gamma(x,y) = \inf_{\gamma\in\Pi(\mu,\nu)} \int_{M\times M} \|\varphi(x)-\varphi(y)\|\,\mathrm{d}\gamma(x,y) = \inf_{\gamma\in\Pi(\mu,\nu)} \int_{N\times N} \|x-y\|\,\mathrm{d}(\varphi\otimes\varphi)_\#\gamma(x,y) = \inf_{\gamma\in\Pi(\varphi_\#\mu,\, \varphi_\#\nu)} \int_{N\times N} \|x-y\|\,\mathrm{d}\gamma(x,y) = W_1(\varphi_\#\mu, \varphi_\#\nu).$$

Here, we note that $W_1$ must be understood with respect to the ground metric that makes sense on each space, i.e. $d_M$ on $M$ and the Euclidean norm on $N$. Then, using Lemma 25, we have $\mathrm{CHSW}_1(\mu,\nu) = \mathrm{SW}_1(\varphi_\#\mu, \varphi_\#\nu)$. Since $N$ is a Euclidean inner product space of dimension $d$, we can apply (Bonnotte, 2013, Lemma 5.14), and we obtain

$$W_1(\mu,\nu) = W_1(\varphi_\#\mu,\varphi_\#\nu) \le C_{d,p,r}\, \mathrm{SW}_1(\varphi_\#\mu,\varphi_\#\nu)^{\frac{1}{d+1}} = C_{d,p,r}\, \mathrm{CHSW}_1(\mu,\nu)^{\frac{1}{d+1}}.$$

Then, using that $W_p^p(\mu,\nu) \le (2r)^{p-1} W_1(\mu,\nu)$ and that, by the Hölder inequality, $\mathrm{CHSW}_1(\mu,\nu) \le \mathrm{CHSW}_p(\mu,\nu)$, we obtain (with a different constant $C_{d,r,p}$)

$$W_p^p(\mu,\nu) \le C_{d,r,p}\, \mathrm{CHSW}_p(\mu,\nu)^{\frac{1}{d+1}}.$$

D.11 Proof of Proposition 29

Proof of Proposition 29 First, using the triangle inequality, the reverse triangle inequality, and the Jensen inequality for $x\mapsto x^{1/p}$ (which is concave since $p\ge 1$), we have the following inequality:

$$\mathbb{E}\big[|\mathrm{CHSW}_p(\hat\mu_n, \hat\nu_n) - \mathrm{CHSW}_p(\mu,\nu)|\big] = \mathbb{E}\big[|\mathrm{CHSW}_p(\hat\mu_n,\hat\nu_n) - \mathrm{CHSW}_p(\hat\mu_n,\nu) + \mathrm{CHSW}_p(\hat\mu_n,\nu) - \mathrm{CHSW}_p(\mu,\nu)|\big] \le \mathbb{E}\big[|\mathrm{CHSW}_p(\hat\mu_n,\hat\nu_n) - \mathrm{CHSW}_p(\hat\mu_n,\nu)|\big] + \mathbb{E}\big[|\mathrm{CHSW}_p(\hat\mu_n,\nu) - \mathrm{CHSW}_p(\mu,\nu)|\big] \le \mathbb{E}\big[\mathrm{CHSW}_p(\nu,\hat\nu_n)\big] + \mathbb{E}\big[\mathrm{CHSW}_p(\mu,\hat\mu_n)\big] \le \mathbb{E}\big[\mathrm{CHSW}_p^p(\nu,\hat\nu_n)\big]^{1/p} + \mathbb{E}\big[\mathrm{CHSW}_p^p(\mu,\hat\mu_n)\big]^{1/p}.$$

Moreover, by Fubini–Tonelli,

$$\mathbb{E}\big[\mathrm{CHSW}_p^p(\hat\mu_n,\mu)\big] = \mathbb{E}\Big[\int_{S_o} W_p^p(P^v_\#\hat\mu_n, P^v_\#\mu)\,\mathrm{d}\lambda_o(v)\Big] = \int_{S_o} \mathbb{E}\big[W_p^p(P^v_\#\hat\mu_n, P^v_\#\mu)\big]\,\mathrm{d}\lambda_o(v).$$

Then, by applying Theorem 40, we get that for $q>p$, there exists a constant $C_{p,q}$ such that

$$\mathbb{E}\big[W_p^p(P^v_\#\hat\mu_n, P^v_\#\mu)\big] \le C_{p,q}\, M_q(P^v_\#\mu)^{p/q}\, \big(n^{-1/2}\mathbf{1}_{\{q>2p\}} + n^{-1/2}\log(n)\mathbf{1}_{\{q=2p\}} + n^{-(q-p)/q}\mathbf{1}_{\{q\in(p,2p)\}}\big).$$

Then, noting that necessarily $P^v(o)=0$ (for both the horospherical and the geodesic projection, since the geodesic is of the form $\exp_o(tv)$), and using that $P^v$ is 1-Lipschitz (Lemma 38), we can bound the moments as

$$M_q(P^v_\#\mu) = \int_{\mathbb{R}} |x|^q\,\mathrm{d}(P^v_\#\mu)(x) = \int_M |P^v(x)|^q\,\mathrm{d}\mu(x) = \int_M |P^v(x)-P^v(o)|^q\,\mathrm{d}\mu(x) \le \int_M d(x,o)^q\,\mathrm{d}\mu(x) = M_q(\mu).$$

Therefore, we have

$$\mathbb{E}\big[\mathrm{CHSW}_p^p(\hat\mu_n,\mu)\big] \le C_{p,q}\, M_q(\mu)^{p/q}\big(n^{-1/2}\mathbf{1}_{\{q>2p\}} + n^{-1/2}\log(n)\mathbf{1}_{\{q=2p\}} + n^{-(q-p)/q}\mathbf{1}_{\{q\in(p,2p)\}}\big),$$

and similarly,

$$\mathbb{E}\big[\mathrm{CHSW}_p^p(\hat\nu_n,\nu)\big] \le C_{p,q}\, M_q(\nu)^{p/q}\big(n^{-1/2}\mathbf{1}_{\{q>2p\}} + n^{-1/2}\log(n)\mathbf{1}_{\{q=2p\}} + n^{-(q-p)/q}\mathbf{1}_{\{q\in(p,2p)\}}\big).$$

Hence, we conclude that

$$\mathbb{E}\big[|\mathrm{CHSW}_p(\hat\mu_n,\hat\nu_n) - \mathrm{CHSW}_p(\mu,\nu)|\big] \le 2\, C_{p,q}^{1/p}\, M_q(\nu)^{1/q} \begin{cases} n^{-1/(2p)} & \text{if } q>2p,\\ n^{-1/(2p)}\log(n)^{1/p} & \text{if } q=2p,\\ n^{-(q-p)/(pq)} & \text{if } q\in(p,2p).\end{cases}$$

D.12 Proof of Proposition 30

Proof of Proposition 30 Let $(v_\ell)_{\ell=1}^L$ be iid samples from $\lambda_o$. Then, by first using the Jensen inequality and then remembering that $\mathbb{E}_v\big[W_p^p(P^v_\#\mu, P^v_\#\nu)\big] = \mathrm{CHSW}_p^p(\mu,\nu)$, we have

$$\mathbb{E}_v\Big[\big|\widehat{\mathrm{CHSW}}{}^p_{p,L}(\mu,\nu) - \mathrm{CHSW}_p^p(\mu,\nu)\big|\Big]^2 \le \mathbb{E}_v\Big[\big(\widehat{\mathrm{CHSW}}{}^p_{p,L}(\mu,\nu) - \mathrm{CHSW}_p^p(\mu,\nu)\big)^2\Big] = \mathbb{E}_v\Big[\Big(\frac{1}{L}\sum_{\ell=1}^L W_p^p(P^{v_\ell}_\#\mu, P^{v_\ell}_\#\nu) - \mathrm{CHSW}_p^p(\mu,\nu)\Big)^2\Big] = \frac{1}{L^2}\,\mathrm{Var}_v\Big(\sum_{\ell=1}^L W_p^p(P^{v_\ell}_\#\mu, P^{v_\ell}_\#\nu)\Big) = \frac{1}{L}\,\mathrm{Var}_v\big(W_p^p(P^v_\#\mu, P^v_\#\nu)\big) = \frac{1}{L}\int_{S_o} \big(W_p^p(P^v_\#\mu, P^v_\#\nu) - \mathrm{CHSW}_p^p(\mu,\nu)\big)^2\,\mathrm{d}\lambda_o(v).$$
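The $\mathrm{Var}/L$ bound of Proposition 30 is easy to check numerically. Below is a small self-contained experiment (ours) in the Euclidean special case, where $P^v$ is the linear projection $\langle\cdot, v\rangle$; on a manifold, only the projection step would change. The reference value and the helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 5
x = rng.normal(size=(n, d))
y = rng.normal(size=(n, d)) + 1.0

def w22_1d(a, b):
    """Closed-form 1D W_2^2 between equal-size empirical measures."""
    return np.mean((np.sort(a) - np.sort(b)) ** 2)

def chsw2_hat(L):
    """Monte Carlo estimator with L sampled directions."""
    v = rng.normal(size=(L, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return np.mean([w22_1d(x @ vi, y @ vi) for vi in v])

ref = chsw2_hat(20000)  # high-precision stand-in for the true value
for L in (10, 100, 1000):
    errs = [chsw2_hat(L) - ref for _ in range(50)]
    rmse = np.sqrt(np.mean(np.square(errs)))
    print(L, rmse)  # the RMSE decays approximately like 1/sqrt(L)
```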
Appendix E. Proofs of Section 7

E.1 Proof of Proposition 31

Proof of Proposition 31 This proof follows the proof in the Euclidean case derived in (Bonnotte, 2013, Proposition 5.1.7) or in (Candau-Tilh, 2020, Proposition 1.33). As $\mu$ is absolutely continuous, $P^v_\#\mu$ is also absolutely continuous, and there is a Kantorovich potential $\psi_v$ between $P^v_\#\mu$ and $P^v_\#\nu$. Moreover, as the support is restricted to a compact set, $\psi_v$ is Lipschitz and thus differentiable almost everywhere.

First, using the duality formula, we obtain the following lower bound, for all $\epsilon>0$:

$$\frac{\mathrm{CHSW}_2^2\big((T_\epsilon)_\#\mu, \nu\big) - \mathrm{CHSW}_2^2(\mu,\nu)}{2\epsilon} \ge \int_{S_o}\int_M \frac{\psi_v\big(P^v(T_\epsilon(x))\big) - \psi_v\big(P^v(x)\big)}{\epsilon}\,\mathrm{d}\mu(x)\,\mathrm{d}\lambda_o(v).$$

Then, we know that the exponential map satisfies $\exp_x(0) = x$ and $\frac{\mathrm{d}}{\mathrm{d}t}\exp_x(tv)\big|_{t=0} = v$. Taking the limit $\epsilon\to 0$, the integrand on the right is equal to $\frac{\mathrm{d}}{\mathrm{d}t}g(t)\big|_{t=0}$ with $g(t) = \psi_v\big(P^v(T_t(x))\big)$ and $T_t(x) = \exp_x(t\,\xi(x))$, and

$$\frac{\mathrm{d}}{\mathrm{d}t}g(t)\Big|_{t=0} = \psi_v'\big(P^v(T_0(x))\big)\,\Big\langle \mathrm{grad}_M P^v(T_0(x)),\ \frac{\mathrm{d}}{\mathrm{d}t}T_t(x)\Big|_{t=0}\Big\rangle_x = \psi_v'\big(P^v(x)\big)\,\big\langle \mathrm{grad}_M P^v(x), \xi(x)\big\rangle_x.$$

Therefore, by the Lebesgue dominated convergence theorem (we have the convergence $\lambda_o$-almost surely, and $\big|\psi_v(P^v(T_\epsilon(x))) - \psi_v(P^v(x))\big| \le C\epsilon$ using that $\psi_v$ and $P^v$ are Lipschitz and that $d\big(\exp_x(\epsilon\xi(x)), \exp_x(0)\big) \le C\epsilon$),

$$\liminf_{\epsilon\to 0^+} \frac{\mathrm{CHSW}_2^2((T_\epsilon)_\#\mu,\nu) - \mathrm{CHSW}_2^2(\mu,\nu)}{2\epsilon} \ge \int_{S_o}\int_M \psi_v'\big(P^v(x)\big)\,\langle \mathrm{grad}_M P^v(x), \xi(x)\rangle_x\,\mathrm{d}\mu(x)\,\mathrm{d}\lambda_o(v).$$

For the upper bound, first let $\pi_v\in\Pi(\mu,\nu)$ be a coupling such that $\bar\pi_v = (P^v\otimes P^v)_\#\pi_v \in \Pi(P^v_\#\mu, P^v_\#\nu)$ is an optimal coupling for the regular quadratic cost. For $\bar\pi_v$-almost every $(x,y)$, $y = x - \psi_v'(x)$, and thus, for $\pi_v$-almost every $(x,y)$, $P^v(y) = P^v(x) - \psi_v'\big(P^v(x)\big)$. Therefore,

$$\mathrm{CHSW}_2^2(\mu,\nu) = \int_{S_o} W_2^2(P^v_\#\mu, P^v_\#\nu)\,\mathrm{d}\lambda_o(v) = \int_{S_o}\int_{\mathbb{R}\times\mathbb{R}} |x-y|^2\,\mathrm{d}\bar\pi_v(x,y)\,\mathrm{d}\lambda_o(v) = \int_{S_o}\int_{M\times M} |P^v(x) - P^v(y)|^2\,\mathrm{d}\pi_v(x,y)\,\mathrm{d}\lambda_o(v).$$

On the other hand, $\big((P^v\circ T_\epsilon)\otimes P^v\big)_\#\pi_v \in \Pi\big(P^v_\#(T_\epsilon)_\#\mu,\ P^v_\#\nu\big)$, and hence

$$\mathrm{CHSW}_2^2\big((T_\epsilon)_\#\mu, \nu\big) = \int_{S_o} W_2^2\big(P^v_\#(T_\epsilon)_\#\mu, P^v_\#\nu\big)\,\mathrm{d}\lambda_o(v) \le \int_{S_o}\int_{M\times M} \big|P^v(T_\epsilon(x)) - P^v(y)\big|^2\,\mathrm{d}\pi_v(x,y)\,\mathrm{d}\lambda_o(v).$$

Thus,

$$\frac{\mathrm{CHSW}_2^2((T_\epsilon)_\#\mu,\nu) - \mathrm{CHSW}_2^2(\mu,\nu)}{2\epsilon} \le \int_{S_o}\int_{M\times M} \frac{|P^v(T_\epsilon(x)) - P^v(y)|^2 - |P^v(x)-P^v(y)|^2}{2\epsilon}\,\mathrm{d}\pi_v(x,y)\,\mathrm{d}\lambda_o(v).$$

Denote $g(\epsilon) = \big|P^v(T_\epsilon(x)) - P^v(y)\big|^2$. Then,

$$\frac{\mathrm{d}}{\mathrm{d}\epsilon}g(\epsilon)\Big|_{\epsilon=0} = 2\big(P^v(x) - P^v(y)\big)\,\langle \mathrm{grad}_M P^v(x), \xi(x)\rangle_x.$$

But, as for $\pi_v$-almost every $(x,y)$, $P^v(y) = P^v(x) - \psi_v'\big(P^v(x)\big)$, we have

$$\frac{\mathrm{d}}{\mathrm{d}\epsilon}g(\epsilon)\Big|_{\epsilon=0} = 2\,\psi_v'\big(P^v(x)\big)\,\langle \mathrm{grad}_M P^v(x), \xi(x)\rangle_x.$$

Finally, by the Lebesgue dominated convergence theorem, we obtain

$$\limsup_{\epsilon\to 0^+} \frac{\mathrm{CHSW}_2^2((T_\epsilon)_\#\mu,\nu) - \mathrm{CHSW}_2^2(\mu,\nu)}{2\epsilon} \le \int_{S_o}\int_M \psi_v'\big(P^v(x)\big)\,\langle \mathrm{grad}_M P^v(x), \xi(x)\rangle_x\,\mathrm{d}\mu(x)\,\mathrm{d}\lambda_o(v).$$

E.2 Proof of Proposition 34

Proof of Proposition 34 We apply Proposition 33. First, using that for $f : x\mapsto \langle x, y\rangle_{\mathbb{L}}$, $\nabla f(x) = Jy$ with $J = \mathrm{diag}(-1, 1, \ldots, 1)$, we get, for all $x\in\mathbb{L}^d_K$,

$$\nabla B^v(x) = J\,\frac{Kx^0 + v}{\langle x, Kx^0 + v\rangle_{\mathbb{L}}}.$$

Thus, noticing that $J^2 = I_{d+1}$ and denoting by $\mathrm{Proj}^K_x(z) = z + K\langle x, z\rangle_{\mathbb{L}}\,x$ the projection on the tangent space $T_x\mathbb{L}^d_K$,

$$\mathrm{grad}_{\mathbb{L}^d_K} B^v(x) = \mathrm{Proj}^K_x\big(KJ\,\nabla B^v(x)\big) = K\,\frac{Kx^0+v}{\langle x, Kx^0+v\rangle_{\mathbb{L}}} + K\,\Big\langle x,\ K\,\frac{Kx^0+v}{\langle x, Kx^0+v\rangle_{\mathbb{L}}}\Big\rangle_{\mathbb{L}}\,x = K\left(\frac{Kx^0+v}{\langle x, Kx^0+v\rangle_{\mathbb{L}}} + Kx\right).$$

Similarly, we have

$$\nabla P^v(x) = KJ\,\frac{\langle x, x^0\rangle_{\mathbb{L}}\,v - \langle x, v\rangle_{\mathbb{L}}\,x^0}{\langle x, v\rangle_{\mathbb{L}}^2 + K\langle x, x^0\rangle_{\mathbb{L}}^2}.$$

Thus, observing that $\langle x, KJ\nabla P^v(x)\rangle_{\mathbb{L}} = 0$, we have

$$\mathrm{grad}_{\mathbb{L}^d_K} P^v(x) = \mathrm{Proj}^K_x\big(KJ\nabla P^v(x)\big) = KJ\nabla P^v(x) + K\langle x, KJ\nabla P^v(x)\rangle_{\mathbb{L}}\,x = KJ\nabla P^v(x) = K^2\,\frac{\langle x, x^0\rangle_{\mathbb{L}}\,v - \langle x, v\rangle_{\mathbb{L}}\,x^0}{\langle x, v\rangle_{\mathbb{L}}^2 + K\langle x, x^0\rangle_{\mathbb{L}}^2}.$$

E.3 Proof of Lemma 35

Proof of Lemma 35 By Lemma 12, we have $\varphi_{*,X}(V) = U\Sigma(V)U^T$ with $\Sigma(V) = (U^T V U)\odot\Gamma$. Thus,

$$U\Sigma(V)U^T = W \iff \Sigma(V) = U^T W U \iff (U^T V U)\odot\Gamma = U^T W U \iff U^T V U = (U^T W U)\oslash\Gamma \iff V = U\big((U^T W U)\oslash\Gamma\big)U^T,$$

where $\odot$ and $\oslash$ denote the entrywise product and division.

E.4 Proof of Lemma 36

Proof of Lemma 36 By (Pennec, 2020, Equation 3.8), we know that $\langle \log_{*,X}(V), Y\rangle = \langle \log_{*,X}(Y), V\rangle$. Thus, by linearity, we have that, for all $V\in T_X S^{++}_d(\mathbb{R})$,

$$P^A_{*,X}(V) = \langle A, \log_{*,X}(V)\rangle_F = \langle \log_{*,X}(A), V\rangle_F.$$

Then, applying Lemma 12, we have the result.
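Numerically, $\log_{*,X}$ and the inverse map appearing in the proof of Lemma 35 can both be evaluated in the eigenbasis of $X$. The sketch below (ours) assumes that the matrix $\Gamma$ of Lemma 12, which is not restated here, is the standard first-divided-difference (Daleckii–Krein) matrix of the scalar logarithm; the inversion then amounts to replacing the entrywise product by an entrywise division, exactly as in the proof.

```python
import numpy as np

def log_gamma(w):
    """Assumed Gamma of Lemma 12 for the matrix log: first divided
    differences, Gamma_ij = (log w_i - log w_j)/(w_i - w_j), Gamma_ii = 1/w_i."""
    lw, dw = np.log(w), w[:, None] - w[None, :]
    same = np.isclose(dw, 0.0)
    return np.where(same, 1.0 / w[None, :],
                    (lw[:, None] - lw[None, :]) / np.where(same, 1.0, dw))

def dlog(x, v):
    """log_{*,X}(V) = U ((U^T V U) . Gamma) U^T (entrywise product)."""
    w, u = np.linalg.eigh(x)
    return u @ ((u.T @ v @ u) * log_gamma(w)) @ u.T

def dlog_inv(x, w_mat):
    """Solve log_{*,X}(V) = W as in Lemma 35: entrywise division by Gamma."""
    w, u = np.linalg.eigh(x)
    return u @ ((u.T @ w_mat @ u) / log_gamma(w)) @ u.T

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4))
x = a @ a.T + 4 * np.eye(4)              # an SPD matrix
v = rng.normal(size=(4, 4)); v = (v + v.T) / 2
assert np.allclose(dlog_inv(x, dlog(x, v)), v)  # inverse check
```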
Appendix F. Busemann Function on SPDs Endowed with the Affine-Invariant Metric

Let $A\in S_d(\mathbb{R})$ and $M\in S^{++}_d(\mathbb{R})$. We recall from Section 4.3.1 that the Busemann function can be computed as

$$B^A(M) = \lim_{t\to\infty}\big(d_{AI}(\exp(tA), M) - t\big) = -\big\langle A, \log\big(\pi_A(M)\big)\big\rangle_F,$$

where $\pi_A$ is a projection onto the space of matrices commuting with $\exp(A)$, performed along the orbit of a group $G\subset GL_d(\mathbb{R})$ that leaves the Busemann function invariant. In the next paragraph, we detail how we can proceed to obtain $\pi_A$.

When $A$ is diagonal with sorted entries such that $A_{11} > \cdots > A_{dd}$, the group leaving the Busemann function invariant is the set of upper triangular matrices with ones on the diagonal (Bridson and Haefliger, 2013, II. Proposition 10.66), i.e. for any matrix $g$ in that group, $B^A(M) = B^A(gMg^T)$. If the diagonal entries are sorted in increasing order, then the group is instead the set of lower triangular matrices with ones on the diagonal. Let us denote by $G_U$ the set of upper triangular matrices with ones on the diagonal.

For a general $A\in S_d(\mathbb{R})$, we can first find an appropriate diagonalization $A = P\bar A P^T$, where $\bar A$ is diagonal with sorted entries, and apply the change of basis $\bar M = P^T M P$ (Fletcher et al., 2009). We suppose that all the eigenvalues of $A$ have multiplicity one. By the affine-invariance property, the distances do not change, i.e. $d_{AI}(\exp(t\bar A), \bar M) = d_{AI}(\exp(tA), M)$, and hence, using the definition of the Busemann function, we have $B^A(M) = B^{\bar A}(\bar M)$. Then, we need to project $\bar M$ onto the space of matrices commuting with $\exp(\bar A)$, which we denote $F(\bar A)$. By Bridson and Haefliger (2013, II. Proposition 10.67), this space corresponds to the diagonal matrices. Moreover, by Bridson and Haefliger (2013, II. Proposition 10.69), there is a unique pair $(g, D)\in G_U\times F(\bar A)$ such that $\bar M = gDg^T$, and therefore we can set $\pi_{\bar A}(\bar M) = D$. This decomposition actually corresponds to a UDU decomposition. If the eigenvalues of $A$ are sorted in increasing order, it would correspond instead to an LDL decomposition.
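To make the procedure concrete, the UDU decomposition of $\bar M$ can be obtained from a Cholesky factorization of the "flipped" matrix $J\bar M J$, where $J$ is the exchange matrix reversing the order of the coordinates: if $\bar M = gDg^T$ with $g\in G_U$, then $J\bar M J = (JgJ)(JDJ)(JgJ)^T$ with $JgJ$ unit lower triangular, so $D$ is recovered, up to reversal, from the squared diagonal of the Cholesky factor. The sketch below (ours, not the paper's implementation) assumes $\|A\|_F = 1$ and eigenvalues of multiplicity one, as above; sign conventions for $B^A$ follow the formula given in this appendix.

```python
import numpy as np

def busemann_spd_ai(A, M):
    """Busemann function B^A(M) for the affine-invariant metric on SPD
    matrices, following the Appendix F procedure (a sketch)."""
    # Diagonalize A with eigenvalues sorted in decreasing order.
    w, P = np.linalg.eigh(A)                 # ascending order
    idx = np.argsort(w)[::-1]
    w, P = w[idx], P[:, idx]
    M_bar = P.T @ M @ P                      # change of basis

    # UDU decomposition of M_bar through the flipped Cholesky factorization.
    F = M_bar[::-1, ::-1]                    # J M_bar J
    L = np.linalg.cholesky(F)                # F = L L^T, L lower triangular
    D = (np.diag(L) ** 2)[::-1]              # diagonal of pi_{A_bar}(M_bar)

    # B^A(M) = -<A_bar, log(pi_{A_bar}(M_bar))>_F with A_bar = diag(w).
    return -np.sum(w * np.log(D))

# Sanity check along the geodesic t -> exp(tA): B^A(exp(tA)) = -t.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)); A = (A + A.T) / 2
A /= np.linalg.norm(A)                       # unit Frobenius norm
w, P = np.linalg.eigh(A)
M = P @ np.diag(np.exp(2.0 * w)) @ P.T       # exp(2A)
print(busemann_spd_ai(A, M))                 # approximately -2.0
```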
Appendix G. Additional Details on Experiments

We sum up the statistics of the different datasets in Table 3.

Table 3: Dataset characteristics.

                      BBCSport    Movies      Goodreads genre   Goodreads like
Doc                   737         2000        1003              1003
Train                 517         1500        752               752
Test                  220         500         251               251
Classes               5           2           8                 2
Mean words by doc     116 ± 54    182 ± 65    1491 ± 538        1491 ± 538
Median words by doc   104         175         1518              1518
Max words by doc      469         577         3499              3499

References

Zeynep Akata, Florent Perronnin, Zaid Harchaoui, and Cordelia Schmid. Label-embedding for image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(7):1425–1438, 2015.
David Alvarez-Melis and Nicolo Fusi. Geometric dataset distances via optimal transport. Advances in Neural Information Processing Systems, 33:21428–21439, 2020.
David Alvarez-Melis, Youssef Mroueh, and Tommi Jaakkola. Unsupervised hierarchy matching with optimal transport over hyperbolic spaces. In International Conference on Artificial Intelligence and Statistics, pages 1606–1617. PMLR, 2020.
Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient Flows: in Metric Spaces and in the Space of Probability Measures. Springer Science & Business Media, 2008.
Michael Arbel, Anna Korba, Adil Salim, and Arthur Gretton. Maximum mean discrepancy gradient flow. Advances in Neural Information Processing Systems, 32, 2019.
Vincent Arsigny, Pierre Fillard, Xavier Pennec, and Nicholas Ayache. Fast and Simple Computations on Tensors with Log-Euclidean Metrics. PhD thesis, INRIA, 2005.
Vincent Arsigny, Pierre Fillard, Xavier Pennec, and Nicholas Ayache. Log-Euclidean metrics for fast and simple calculus on diffusion tensors. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine, 56(2):411–421, 2006.
Iskander Azangulov, Andrei Smolensky, Alexander Terenin, and Viacheslav Borovitskiy. Stationary kernels and Gaussian processes on Lie groups and their homogeneous spaces I: the compact case. Journal of Machine Learning Research, 25(280):1–52, 2024a.
Iskander Azangulov, Andrei Smolensky, Alexander Terenin, and Viacheslav Borovitskiy. Stationary kernels and Gaussian processes on Lie groups and their homogeneous spaces II: non-compact symmetric spaces. Journal of Machine Learning Research, 25(281):1–51, 2024b.
Werner Ballmann, Mikhael Gromov, and Viktor Schroeder. Manifolds of nonpositive curvature. In Arbeitstagung Bonn 1984: Proceedings of the meeting held by the Max-Planck-Institut für Mathematik, Bonn, June 15–22, 1984, pages 261–268. Springer, 2006.
Aurélien Bellet, Amaury Habrard, and Marc Sebban. A survey on metric learning for feature vectors and structured data. arXiv preprint arXiv:1306.6709, 2013.
Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.
Espen Bernton, Pierre E Jacob, Mathieu Gerber, and Christian P Robert. Approximate Bayesian computation with the Wasserstein distance. Journal of the Royal Statistical Society Series B: Statistical Methodology, 81(2):235–269, 2019.
Jérôme Bertrand and Benoît Kloeckner. A geometric study of Wasserstein spaces: Hadamard spaces. Journal of Topology and Analysis, 4(04):515–542, 2012.
Rajendra Bhatia. Positive Definite Matrices. Princeton University Press, 2009.
Kingshook Biswas. The Fourier transform on negatively curved harmonic manifolds. arXiv preprint arXiv:1802.07236, 2018.
Benjamin Blankertz, Ryota Tomioka, Steven Lemm, Motoaki Kawanabe, and Klaus-Robert Müller. Optimizing spatial filters for robust EEG single-trial analysis. IEEE Signal Processing Magazine, 25(1):41–56, 2007.
Vladimir Igorevich Bogachev and Maria Aparecida Soares Ruas. Measure Theory, volume 1. Springer, 2007.
Emmanuel Boissard and Thibaut Le Gouic. On the mean speed of convergence of empirical and occupation measures in Wasserstein distance. In Annales de l'IHP Probabilités et Statistiques, volume 50, pages 539–563, 2014.
Jan Boman and Filip Lindskog. Support theorems for the Radon transform and Cramér–Wold theorems. Journal of Theoretical Probability, 22(3):683–710, 2009.
Clément Bonet, Nicolas Courty, François Septier, and Lucas Drumetz. Efficient Gradient Flows in Sliced-Wasserstein Space. Transactions on Machine Learning Research, 2022. ISSN 2835-8856.
Clément Bonet, Laetitia Chapel, Lucas Drumetz, and Nicolas Courty. Hyperbolic Sliced-Wasserstein via geodesic and horospherical projections. In Topological, Algebraic and Geometric Learning Workshops 2023, pages 334–370. PMLR, 2023a.
Clément Bonet, Paul Berg, Nicolas Courty, François Septier, Lucas Drumetz, and Minh-Tan Pham. Spherical Sliced-Wasserstein. In The Eleventh International Conference on Learning Representations, 2023b.
Clément Bonet, Benoît Malézieux, Alain Rakotomamonjy, Lucas Drumetz, Thomas Moreau, Matthieu Kowalski, and Nicolas Courty. Sliced-Wasserstein on symmetric positive definite matrices for M/EEG signals. In International Conference on Machine Learning, pages 2777–2805. PMLR, 2023c.
Clément Bonet, Kimia Nadjahi, Thibault Séjourné, Kilian Fatras, and Nicolas Courty. Slicing Unbalanced Optimal Transport. Transactions on Machine Learning Research, 2024.
Silvere Bonnabel. Stochastic gradient descent on Riemannian manifolds. IEEE Transactions on Automatic Control, 58(9):2217–2229, 2013.
Nicolas Bonneel, Julien Rabin, Gabriel Peyré, and Hanspeter Pfister. Sliced and Radon Wasserstein barycenters of measures. Journal of Mathematical Imaging and Vision, 51:22–45, 2015.
Nicolas Bonnotte. Unidimensional and evolution methods for optimal transportation. PhD thesis, Université Paris Sud–Paris XI; Scuola Normale Superiore (Pisa, Italy), 2013.
Valentin De Bortoli, Emile Mathieu, Michael John Hutchinson, James Thornton, Yee Whye Teh, and Arnaud Doucet. Riemannian score-based generative modelling. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022.
Joey Bose, Ariella Smofsky, Renjie Liao, Prakash Panangaden, and Will Hamilton. Latent variable modelling with hyperbolic normalizing flows. In International Conference on Machine Learning, pages 1045–1055. PMLR, 2020.
Nicolas Boumal. An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, 2023.
Martin R Bridson and André Haefliger. Metric Spaces of Non-Positive Curvature, volume 319. Springer Science & Business Media, 2013.
Michael M Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
Daniel Brooks, Olivier Schwander, Frederic Barbaresco, Jean-Yves Schneider, and Matthieu Cord. Riemannian batch normalization for SPD neural networks. Advances in Neural Information Processing Systems, 32, 2019a.
Daniel A Brooks, Olivier Schwander, Frédéric Barbaresco, Jean-Yves Schneider, and Matthieu Cord. Exploring complex time-series representations for Riemannian machine learning of radar data. In ICASSP 2019 – 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3672–3676. IEEE, 2019b.
Yann Cabanes. Apprentissage dans les disques de Poincaré et de Siegel de séries temporelles multidimensionnelles complexes suivant un modèle autorégressif gaussien stationnaire centré: application à la classification de données audio et de fouillis radar. PhD thesis, Bordeaux, 2022.
Jules Candau-Tilh. Wasserstein and Sliced-Wasserstein Distances. Master's thesis, Université Pierre et Marie Curie, 2020.
James W Cannon, William J Floyd, Richard Kenyon, Walter R Parry, et al. Hyperbolic geometry. Flavors of Geometry, 31(59–115):2, 1997.
Mathieu Carriere, Marco Cuturi, and Steve Oudot. Sliced Wasserstein kernel for persistence diagrams. In International Conference on Machine Learning, pages 664–673. PMLR, 2017.
Ines Chami, Albert Gu, Dat P Nguyen, and Christopher Ré. HoroPCA: Hyperbolic dimensionality reduction via horospherical projections. In International Conference on Machine Learning, pages 1419–1429. PMLR, 2021.
Ricky TQ Chen and Yaron Lipman. Riemannian flow matching on general geometries. In International Conference on Machine Learning, 2023.
Ziheng Chen, Yue Song, Gaowen Liu, Ramana Rao Kompella, Xiaojun Wu, and Nicu Sebe. Riemannian multinomial logistics regression for SPD neural networks. In Conference on Computer Vision and Pattern Recognition 2024, 2024a.
Ziheng Chen, Yue Song, Tianyang Xu, Zhiwu Huang, Xiao-Jun Wu, and Nicu Sebe. Adaptive Log-Euclidean metrics for SPD matrix learning. IEEE Transactions on Image Processing, 2024b.
Emmanuel Chevallier and Nicolas Guigui. Wrapped statistical models on manifolds: motivations, the case SE(n), and generalization to symmetric spaces. In Workshop on Joint Structures and Common Foundations of Statistical Physics, Information Geometry and Inference for Learning, pages 96–106. Springer, 2020.
Emmanuel Chevallier, Emmanuel Kalunga, and Jesús Angulo. Kernel density estimation on spaces of Gaussian distributions and symmetric positive definite matrices. SIAM Journal on Imaging Sciences, 10(1):191–215, 2017.
Emmanuel Chevallier, Didong Li, Yulong Lu, and David Dunson. Exponential-wrapped distributions on symmetric spaces. SIAM Journal on Mathematics of Data Science, 4(4):1347–1368, 2022.
Sinho Chewi, Tyler Maunu, Philippe Rigollet, and Austin J Stromme. Gradient descent algorithms for Bures–Wasserstein barycenters. In Conference on Learning Theory, pages 1276–1304. PMLR, 2020.
Seunghyuk Cho, Juyong Lee, and Dongwoo Kim. GM-VAE: Representation learning with VAE on Gaussian manifold. arXiv preprint arXiv:2209.15217, 2022a.
Seunghyuk Cho, Juyong Lee, Jaesik Park, and Dongwoo Kim. A rotated hyperbolic wrapped normal distribution for hierarchical representation learning. Advances in Neural Information Processing Systems, 35:17831–17843, 2022b.
Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, and David Ha. Deep learning for classical Japanese literature. arXiv preprint arXiv:1812.01718, 2018.
Gregory Cohen, Saeed Afshar, Jonathan Tapson, and Andre Van Schaik. EMNIST: Extending MNIST to handwritten letters. In 2017 International Joint Conference on Neural Networks (IJCNN), pages 2921–2926. IEEE, 2017.
Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. Advances in Neural Information Processing Systems, 26, 2013.
Andrej Cvetkovski and Mark Crovella. Multidimensional scaling in the Poincaré disk. arXiv preprint arXiv:1105.5332, 2011.
Biwei Dai and Uros Seljak. Sliced iterative normalizing flows. In ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models, 2021.
Haitz Sáez de Ocáriz Borde, Alvaro Arroyo, Ismael Morales López, Ingmar Posner, and Xiaowen Dong. Neural latent geometry search: Product manifold inference via Gromov–Hausdorff-informed Bayesian optimization. In Advances in Neural Information Processing Systems, 2023a.
Haitz Sáez de Ocáriz Borde, Anees Kazi, Federico Barbero, and Pietro Lio. Latent graph inference using product manifolds. In The Eleventh International Conference on Learning Representations, 2023b.
Karan Desai, Maximilian Nickel, Tanmay Rajpurohit, Justin Johnson, and Shanmukha Ramakrishna Vedantam. Hyperbolic image-text representations. In International Conference on Machine Learning, pages 7694–7731. PMLR, 2023.
Ishan Deshpande, Yuan-Ting Hu, Ruoyu Sun, Ayis Pyrros, Nasir Siddiqui, Sanmi Koyejo, Zhizhen Zhao, David Forsyth, and Alexander G Schwing. Max-sliced Wasserstein distance and its use for GANs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10648–10656, 2019.
Hanze Dong, Xi Wang, Lin Yong, and Tong Zhang. Particle-based variational inference with preconditioned functional gradient flow. In International Conference on Learning Representations, 2023.
John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(7), 2011.
Pengfei Fang, Mehrtash Harandi, and Lars Petersson. Kernel methods in hyperbolic spaces. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10665–10674, 2021.
Kilian Fatras, Younes Zine, Rémi Flamary, Remi Gribonval, and Nicolas Courty. Learning with minibatch Wasserstein: asymptotic and gradient properties. In Silvia Chiappa and Roberto Calandra, editors, International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 2131–2141. PMLR, 26–28 Aug 2020.
Aasa Feragen, Francois Lauze, and Soren Hauberg. Geodesic exponential kernels: When curvature and linearity conflict. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3032–3042, 2015.
Rémi Flamary, Nicolas Courty, Alexandre Gramfort, Mokhtar Z Alaya, Aurélie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, et al. POT: Python optimal transport. The Journal of Machine Learning Research, 22(1):3571–3578, 2021.
P Thomas Fletcher, Conglin Lu, Stephen M Pizer, and Sarang Joshi. Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Transactions on Medical Imaging, 23(8):995–1005, 2004.
P Thomas Fletcher, John Moeller, Jeff M. Phillips, and Suresh Venkatasubramanian. Computing hulls and centerpoints in positive definite space. arXiv preprint arXiv:0912.1580, 2009.
P Thomas Fletcher, John Moeller, Jeff M. Phillips, and Suresh Venkatasubramanian. Horoball hulls and extents in positive definite space. In Workshop on Algorithms and Data Structures, pages 386–398. Springer, 2011.
Fernando Galaz-Garcia, Marios Papamichalis, Kathryn Turnbull, Simon Lunagomez, and Edoardo Airoldi. Wrapped distributions on homogeneous Riemannian manifolds. arXiv preprint arXiv:2204.09790, 2022.
Sylvestre Gallot, Dominique Hulin, and Jacques Lafontaine. Riemannian Geometry, volume 2. Springer, 1990.
Robert C. Garrett, Trevor Harris, Zhuo Wang, and Bo Li. Validating climate models with spherical convolutional Wasserstein distance. In Advances in Neural Information Processing Systems, 2024.
Baptiste Genest, Nicolas Courty, and David Coeurjolly. Non-Euclidean sliced optimal transport sampling. In Computer Graphics Forum, page e15020. Wiley Online Library, 2024.
Jacob Goldberger, Geoffrey E Hinton, Sam Roweis, and Russ R Salakhutdinov. Neighbourhood components analysis. Advances in Neural Information Processing Systems, 17, 2004.
Ziv Goldfeld, Kengo Kato, Gabriel Rioux, and Ritwik Sadhu. Statistical inference with regularized optimal transport. arXiv preprint arXiv:2205.04283, 2022.
Jumpei Goto and Hiroyuki Sato. Approximated logarithmic maps on Riemannian manifolds and their applications. JSIAM Letters, 13:17–20, 2021.
Albert Gu, Frederic Sala, Beliz Gunel, and Christopher Ré. Learning mixed-curvature representations in product spaces. In International Conference on Learning Representations, 2019.
Mehrtash T Harandi, Mathieu Salzmann, and Richard Hartley. From manifold to manifold: Geometry-aware dimensionality reduction for SPD matrices. In Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part II 13, pages 17–32. Springer, 2014.
Thomas Hofmann, Bernhard Schölkopf, and Alexander J. Smola. Kernel methods in machine learning. The Annals of Statistics, 36(3):1171–1220, 2008.
Andrés Hoyos-Idrobo. Aligning hyperbolic representations: an optimal transport-based approach. arXiv preprint arXiv:2012.01089, 2020.
Zihao Hu, Guanghui Wang, and Jacob Abernethy. Riemannian projection-free online learning. In Advances in Neural Information Processing Systems, 2023.
Chin-Wei Huang, Milad Aghajohari, Joey Bose, Prakash Panangaden, and Aaron C Courville. Riemannian diffusion models. Advances in Neural Information Processing Systems, 35:2750–2761, 2022.
Gao Huang, Chuan Guo, Matt J Kusner, Yu Sun, Fei Sha, and Kilian Q Weinberger. Supervised word mover's distance. Advances in Neural Information Processing Systems, 29, 2016.
Zhiwu Huang and Luc Van Gool. A Riemannian network for SPD matrix learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
Stephan Huckemann and Herbert Ziezold. Principal component analysis for Riemannian manifolds, with an application to triangular shape spaces. Advances in Applied Probability, 38(2):299–319, 2006.
Jonathan J. Hull. A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5):550–554, 1994.
Sadeep Jayasumana, Richard Hartley, Mathieu Salzmann, Hongdong Li, and Mehrtash Harandi. Kernel methods on Riemannian manifolds with Gaussian RBF kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(12):2464–2477, 2015.
Richard Jordan, David Kinderlehrer, and Felix Otto. The variational formulation of the Fokker–Planck equation. SIAM Journal on Mathematical Analysis, 29(1):1–17, 1998.
Ce Ju and Cuntai Guan. Deep optimal transport on SPD manifolds for domain adaptation. arXiv preprint arXiv:2201.05745, 2022.
Isay Katsman, Eric Chen, Sidhanth Holalkere, Anna Asch, Aaron Lou, Ser Nam Lim, and Christopher M De Sa. Riemannian residual neural networks. Advances in Neural Information Processing Systems, 36, 2024.
Jun Kitagawa and Asuka Takatsu. Two new families of metrics via optimal transport and barycenter problems. arXiv preprint arXiv:2311.15874, 2023.
Soheil Kolouri, Yang Zou, and Gustavo K Rohde. Sliced Wasserstein kernels for probability distributions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5258–5267, 2016.
Soheil Kolouri, Kimia Nadjahi, Umut Simsekli, Roland Badeau, and Gustavo Rohde. Generalized sliced Wasserstein distances. Advances in Neural Information Processing Systems, 32, 2019.
Anna Korba, Pierre-Cyril Aubin-Frankowski, Szymon Majewski, and Pierre Ablin. Kernel Stein discrepancy descent. In International Conference on Machine Learning, pages 5719–5730, 2021.
Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. From word embeddings to document distances. In International Conference on Machine Learning, pages 957–966. PMLR, 2015.
Serge Lang. Fundamentals of Differential Geometry, volume 191. Springer Science & Business Media, 2012.
Tam Le, Makoto Yamada, Kenji Fukumizu, and Marco Cuturi. Tree-sliced variants of Wasserstein distances. Advances in Neural Information Processing Systems, 32, 2019.
Alice Le Brigant. Probability on the spaces of curves and the associated metric spaces via information geometry; radar applications. PhD thesis, Université de Bordeaux, 2017.
Alice Le Brigant and Stéphane Puechmorel. Approximation of densities on Riemannian manifolds. Entropy, 21(1):43, 2019.
Yann LeCun and Corinna Cortes. MNIST handwritten digit database. 2010. URL http://yann.lecun.com/exdb/mnist/.
John M Lee. Riemannian Manifolds: an Introduction to Curvature, volume 176. Springer Science & Business Media, 2006.
John M Lee. Smooth Manifolds. Springer, 2012.
Jere Lehtonen. The geodesic ray transform on two-dimensional Cartan–Hadamard manifolds. arXiv preprint arXiv:1612.04800, 2016.
Jere Lehtonen, Jesse Railo, and Mikko Salo. Tensor tomography on Cartan–Hadamard manifolds. Inverse Problems, 34(4):044004, 2018.
Wenjie Lei, Zhengming Ma, Shuyu Liu, and Yuanping Lin. EEG mental recognition based on RKHS learning and source dictionary regularized RKHS subspace learning. IEEE Access, 9:150545–150559, 2021.
Rémi Leluc, François Portier, Johan Segers, and Aigerim Zhuman. Speeding up Monte Carlo integration: Control neighbors for optimal convergence. arXiv preprint arXiv:2305.06151, 2023.
Rémi Leluc, Aymeric Dieuleveut, François Portier, Johan Segers, and Aigerim Zhuman. Sliced-Wasserstein estimation with spherical harmonics as control variates. In International Conference on Machine Learning, 2024.
Tao Li, Cheng Meng, Hongteng Xu, and Jun Yu. Hilbert curve projection distance for distribution comparison. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
Ya-Wei Eileen Lin, Ronald R Coifman, Gal Mishne, and Ronen Talmon. Hyperbolic diffusion embedding and distance for hierarchical representation learning. In International Conference on Machine Learning, pages 21003–21025. PMLR, 2023.
Zhenhua Lin. Riemannian geometry of symmetric positive definite matrices via Cholesky decomposition. SIAM Journal on Matrix Analysis and Applications, 40(4):1353–1370, 2019.
Xinran Liu, Yikun Bai, Yuzhe Lu, Andrea Soltoggio, and Soheil Kolouri. Wasserstein task embedding for measuring task similarities. Neural Networks, 181:106796, 2025.
Antoine Liutkus, Umut Simsekli, Szymon Majewski, Alain Durmus, and Fabian-Robert Stöter. Sliced-Wasserstein flows: Nonparametric generative modeling via optimal transport and diffusions. In International Conference on Machine Learning, pages 4104–4113. PMLR, 2019.
Federico López, Beatrice Pozzetti, Steve Trettel, Michael Strube, and Anna Wienhard. Symmetric spaces for graph embeddings: A Finsler–Riemannian approach. In International Conference on Machine Learning, pages 7090–7101. PMLR, 2021.
Aaron Lou, Derek Lim, Isay Katsman, Leo Huang, Qingxuan Jiang, Ser Nam Lim, and Christopher M De Sa. Neural manifold ordinary differential equations. Advances in Neural Information Processing Systems, 33:17548–17558, 2020.
Brice Loustau. Hyperbolic geometry. arXiv e-prints, pages arXiv–2003, 2020.
Suraj Maharjan, John Arevalo, Manuel Montes, Fabio A González, and Thamar Solorio. A multi-task approach to predict likability of books. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 1217–1227, 2017.
Tudor Manole, Sivaraman Balakrishnan, Jonathan Niles-Weed, and Larry Wasserman. Plugin estimation of smooth optimal transport maps. arXiv preprint arXiv:2107.12364, 2021.
Tudor Manole, Sivaraman Balakrishnan, and Larry Wasserman. Minimax confidence intervals for the sliced Wasserstein distance. Electronic Journal of Statistics, 16(1):2252–2345, 2022.
Kanti V Mardia, Peter E Jupp, and KV Mardia. Directional Statistics, volume 2. Wiley Online Library, 2000.
Emile Mathieu and Maximilian Nickel. Riemannian continuous normalizing flows. Advances in Neural Information Processing Systems, 33:2503–2515, 2020.
Robert J McCann. Polar factorization of maps on Riemannian manifolds. Geometric & Functional Analysis GAFA, 11(3):589–608, 2001.
Pascal Mettes, Mina Ghadimi Atigh, Martin Keller-Ressel, Jeffrey Gu, and Serena Yeung. Hyperbolic deep learning in computer vision: A survey. International Journal of Computer Vision, pages 1–25, 2024.
Dimitri Meunier, Massimiliano Pontil, and Carlo Ciliberto. Distribution regression with sliced Wasserstein kernels. In International Conference on Machine Learning, pages 15501–15523. PMLR, 2022.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26, 2013.
Gal Mishne, Zhengchao Wan, Yusu Wang, and Sheng Yang. The numerical stability of hyperbolic representation learning. In International Conference on Machine Learning, pages 24925–24949. PMLR, 2023.
Kevin Musgrave, Serge Belongie, and Ser-Nam Lim. PyTorch metric learning. arXiv preprint arXiv:2008.09164, 2020.
Kimia Nadjahi, Alain Durmus, Umut Simsekli, and Roland Badeau. Asymptotic guarantees for learning generative models with the sliced-Wasserstein distance. In Advances in Neural Information Processing Systems, volume 32, 2019.
Kimia Nadjahi, Alain Durmus, Lénaïc Chizat, Soheil Kolouri, Shahin Shahrampour, and Umut Simsekli. Statistical and topological properties of sliced probability divergences. In Advances in Neural Information Processing Systems, volume 33, pages 20802–20812, 2020.
Yoshihiro Nagano, Shoichiro Yamaguchi, Yasuhiro Fujita, and Masanori Koyama. A wrapped normal distribution on hyperbolic space for gradient-based learning. In International Conference on Machine Learning, pages 4693–4702. PMLR, 2019.
Khai Nguyen and Nhat Ho. Sliced Wasserstein estimation with control variates. In International Conference on Learning Representations, 2024a.
Khai Nguyen and Nhat Ho. Energy-based sliced Wasserstein distance. Advances in Neural Information Processing Systems, 36, 2024b.
Khai Nguyen, Nhat Ho, Tung Pham, and Hung Bui. Distributional sliced-Wasserstein and applications to generative modeling. In International Conference on Learning Representations, 2021a.
Khai Nguyen, Son Nguyen, Nhat Ho, Tung Pham, and Hung Bui. Improving relational regularized autoencoders with spherical sliced fused Gromov–Wasserstein. In International Conference on Learning Representations, 2021b.
Khai Nguyen, Nicola Bariletto, and Nhat Ho. Quasi-Monte Carlo for 3D sliced Wasserstein. In International Conference on Learning Representations, 2024.
Maximillian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical representations. Advances in Neural Information Processing Systems, 30, 2017.
Maximillian Nickel and Douwe Kiela. Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In International Conference on Machine Learning, pages 3779–3788. PMLR, 2018.
Sloan Nietert, Ziv Goldfeld, Ritwik Sadhu, and Kengo Kato. Statistical, robustness, and computational guarantees for sliced Wasserstein distances. Advances in Neural Information Processing Systems, 35:28179–28193, 2022.
Jonathan Niles-Weed and Philippe Rigollet. Estimation of Wasserstein distances in the spiked transport model. Bernoulli, 28(4):2663–2688, 2022.
Vanni Noferini. A formula for the Fréchet derivative of a generalized matrix function. SIAM Journal on Matrix Analysis and Applications, 38(2):434–457, 2017.
Ruben Ohana, Kimia Nadjahi, Alain Rakotomamonjy, and Liva Ralaivola. Shedding a PAC-Bayesian light on adaptive sliced-Wasserstein distances. In International Conference on Machine Learning, pages 26451–26473. PMLR, 2023.
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. arXiv preprint cs/0205070, 2002.
Jiwoong Park, Junho Cho, Hyung Jin Chang, and Jin Young Choi. Unsupervised hyperbolic representation learning via message passing auto-encoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5516–5526, 2021.
Sangmin Park and Dejan Slepčev. Geometry and analytic properties of the sliced Wasserstein space. arXiv preprint arXiv:2311.05134, 2023.
François-Pierre Paty and Marco Cuturi. Subspace robust Wasserstein distances. In International Conference on Machine Learning, pages 5072–5081. PMLR, 2019.
Ofir Pele and Michael Werman. Fast and robust earth mover's distances. In 2009 IEEE 12th International Conference on Computer Vision, pages 460–467. IEEE, 2009.
Wei Peng, Tuomas Varanka, Abdelrahman Mostafa, Henglin Shi, and Guoying Zhao. Hyperbolic deep neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(12):10023–10044, 2021.
Xavier Pennec. Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. Journal of Mathematical Imaging and Vision, 25:127–154, 2006.
Xavier Pennec. Manifold-valued image processing with SPD matrices. In Riemannian Geometric Statistics in Medical Image Analysis, pages 75–134. Elsevier, 2020.
Xavier Pennec, Pierre Fillard, and Nicholas Ayache. A Riemannian framework for tensor computing. International Journal of Computer Vision, 66(1):41–66, 2006.
Gabriel Peyré, Marco Cuturi, et al. Computational Optimal Transport: With Applications to Data Science. Foundations and Trends in Machine Learning, 11(5-6):355–607, 2019.
Aram-Alexandre Pooladian, Carles Domingo-Enrich, Ricky TQ Chen, and Brandon Amos. Neural optimal transport with Lagrangian costs. In ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems, 2023.
Alison Pouplin, David Eklund, Carl Henrik Ek, and Søren Hauberg. Identifying latent distances with Finslerian geometry. Transactions on Machine Learning Research, 2023. ISSN 2835-8856.
Michael Quellmalz, Robert Beinert, and Gabriele Steidl. Sliced optimal transport on the sphere. Inverse Problems, 39(10):105005, 2023.
Michael Quellmalz, Léo Buecher, and Gabriele Steidl. Parallelly sliced optimal transport on spheres and on the rotation group. Journal of Mathematical Imaging and Vision, pages 1–26, 2024.
Julien Rabin, Gabriel Peyré, Julie Delon, and Marc Bernot. Wasserstein barycenter and its application to texture mixing. In Scale Space and Variational Methods in Computer Vision: Third International Conference, SSVM 2011, Ein-Gedi, Israel, May 29 – June 2, 2011, Revised Selected Papers 3, pages 435–446. Springer, 2012.
Alain Rakotomamonjy, Mokhtar Z Alaya, Maxime Berar, and Gilles Gasso. Statistical and topological properties of Gaussian smoothed sliced probability divergences. arXiv preprint arXiv:2110.10524, 2021.
Danilo J Rezende and Sébastien Racanière. Implicit Riemannian concave potential maps. arXiv preprint arXiv:2110.01288, 2021.
Danilo Jimenez Rezende, George Papamakarios, Sébastien Racanière, Michael Albergo, Gurtej Kanwar, Phiala Shanahan, and Kyle Cranmer. Normalizing flows on tori and spheres. In International Conference on Machine Learning, pages 8083–8092. PMLR, 2020.
Joel W Robbin and Dietmar A Salamon. Introduction to Differential Geometry. ETH, Lecture Notes, preliminary version, 18, 2011.
Noam Rozen, Aditya Grover, Maximilian Nickel, and Yaron Lipman. Moser flow: Divergence-based generative modeling on manifolds. Advances in Neural Information Processing Systems, 34:17669–17680, 2021.
Boris Rubin. Notes on Radon transforms in integral geometry. Fractional Calculus and Applied Analysis, 6(1):25–72, 2003.
Raif M Rustamov and Subhabrata Majumdar. Intrinsic sliced Wasserstein distances for comparing collections of probability distributions on manifolds and graphs. In International Conference on Machine Learning, pages 29388–29415. PMLR, 2023.
David Sabbagh, Pierre Ablin, Gaël Varoquaux, Alexandre Gramfort, and Denis A Engemann. Manifold-regression to predict from MEG/EEG brain signals without source modeling. Advances in Neural Information Processing Systems, 32, 2019.
Filippo Santambrogio. Optimal Transport for Applied Mathematicians. Birkhäuser, NY, 55(58-63):94, 2015.
Filippo Santambrogio. {Euclidean, metric, and Wasserstein} gradient flows: an overview. Bulletin of Mathematical Sciences, 7:87–154, 2017.
Christopher Scarvelis and Justin Solomon. Riemannian metric learning via optimal transport. In International Conference on Learning Representations, 2023.
Meyer Scetbon and Marco Cuturi. Low-rank optimal transport: Approximation, statistics and debiasing. Advances in Neural Information Processing Systems, 35:6802–6814, 2022.
Zhongmin Shen. Lectures on Finsler Geometry. World Scientific, 2001.
Ondrej Skopek, Octavian-Eugen Ganea, and Gary Bécigneul. Mixed-curvature variational autoencoders. In International Conference on Learning Representations, 2020.
Stefan Sommer, Tom Fletcher, and Xavier Pennec. Introduction to differential and Riemannian geometry. In Riemannian Geometric Statistics in Medical Image Analysis, pages 3–37. Elsevier, 2020.
Sho Sonoda, Isao Ishikawa, and Masahiro Ikeda. Fully-connected network on noncompact symmetric space and ridgelet transform based on Helgason–Fourier analysis. In International Conference on Machine Learning, pages 20405–20422. PMLR, 2022.
Yann Thanwerdas and Xavier Pennec. O(n)-invariant Riemannian metrics on SPD matrices. Linear Algebra and its Applications, 661:163–201, 2023.
James Thornton, Michael Hutchinson, Emile Mathieu, Valentin De Bortoli, Yee Whye Teh, and Arnaud Doucet. Riemannian diffusion Schrödinger bridge. arXiv preprint arXiv:2207.03024, 2022.
Huy Tran, Yikun Bai, Abihith Kothapalli, Ashkan Shahbazi, Xinran Liu, Rocio P Diaz Martin, and Soheil Kolouri. Stereographic spherical sliced Wasserstein distances. In International Conference on Machine Learning, 2024.
Oncel Tuzel, Fatih Porikli, and Peter Meer. Region covariance: A fast descriptor for detection and classification. In Computer Vision – ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, May 7–13, 2006, Proceedings, Part II 9, pages 589–600. Springer, 2006.
Cédric Villani et al. Optimal Transport: Old and New, volume 338. Springer, 2009.
Jörg A Walter. H-MDS: a new approach for interactive visualization with multidimensional scaling in the hyperbolic space. Information Systems, 29(4):273–292, 2004.
Yifei Wang, Peng Chen, and Wuchen Li. Projected Wasserstein gradient descent for high-dimensional Bayesian inference. SIAM/ASA Journal on Uncertainty Quantification, 10(4):1513–1532, 2022.
Jiaqi Xi and Jonathan Niles-Weed. Distributional convergence of the sliced Wasserstein process. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022.
Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
Xianliang Xu and Zhongyi Huang. Central limit theorem for the sliced 1-Wasserstein distance and the max-sliced 1-Wasserstein distance. arXiv preprint arXiv:2205.14624, 2022.
Or Yair, Felix Dietrich, Ronen Talmon, and Ioannis G Kevrekidis. Domain adaptation with optimal transport on the manifold of SPD matrices. arXiv preprint arXiv:1906.00616, 2019.
Ryoma Yataka, Kazuki Hirashima, and Masashi Shiraishi. Grassmann manifold flows for stable shape generation. Advances in Neural Information Processing Systems, 36:72377–72411, 2023.
Hongyi Zhang, Sashank J Reddi, and Suvrit Sra. Riemannian SVRG: Fast stochastic optimization on Riemannian manifolds. Advances in Neural Information Processing Systems, 29, 2016.
Rixin Zhuang, Zhengming Ma, Weijia Feng, and Yuanping Lin. SPD data dictionary learning based on kernel learning and Riemannian metric. IEEE Access, 8:61956–61972, 2020.