# Generalized Dimension Reduction Using Semi-Relaxed Gromov-Wasserstein Distance

Ranthony A. Clark¹, Tom Needham², Thomas Weighill³
¹ Department of Mathematics, Duke University, Durham NC
² Department of Mathematics, Florida State University, Tallahassee FL
³ Department of Mathematics & Statistics, University of North Carolina at Greensboro, Greensboro NC
ranthony.clark@duke.edu, tneedham@fsu.edu, t_weighill@uncg.edu

Dimension reduction techniques typically seek an embedding of a high-dimensional point cloud into a low-dimensional Euclidean space which optimally preserves the geometry of the input data. Based on expert knowledge, one may instead wish to embed the data into some other manifold or metric space in order to better reflect the geometry or topology of the point cloud. We propose a general method for manifold-valued multidimensional scaling based on concepts from optimal transport. In particular, we establish theoretical connections between the recently introduced semi-relaxed Gromov-Wasserstein (srGW) framework and multidimensional scaling by solving the Monge problem in this setting. We also derive novel connections between srGW distance and Gromov-Hausdorff distance. We apply our computational framework to analyze ensembles of political redistricting plans for states with two Congressional districts, achieving an effective visualization of the ensemble as a distribution on a circle which can be used to characterize typical neutral plans, and to flag outliers.

Code: github.com/thomasweighill/srgw-embedding
Extended version: arxiv.org/abs/2405.15959

## 1 Introduction

Dimension reduction is a fundamental task in unsupervised learning and is frequently a first step in the data exploration pipeline.
Typically, dimension reduction is framed as the process of determining an embedding of a finite metric space (X, d) into a low-dimensional Euclidean space ℝ^m which optimally preserves the geometry of the input data. As a concrete example, for X = {x₁, …, xₙ}, the metric multidimensional scaling (MDS) problem seeks a point cloud MDS_m(X) ⊂ ℝ^m satisfying

$$\mathrm{MDS}_m(X) \in \operatorname*{argmin}_{y_1,\dots,y_n \in \mathbb{R}^m} \sum_{i,j=1}^n \left(d(x_i, x_j) - \|y_i - y_j\|\right)^2. \tag{1}$$

Based on prior knowledge, it may be natural to theorize that a dataset X is noisily sampled from a distribution on a specific low-dimensional manifold in the ambient data space, and the practitioner may wish for this structure to be reflected in the dimension reduction process; indeed, this was the case for the geospatial data studied in Section 5, which catalyzed the ideas in this paper. A simple observation is that the MDS problem (1) still makes sense if ℝ^m is replaced with some other low-dimensional Riemannian manifold, and the main goal of this paper is to develop a theoretical and computational framework for this manifold-valued variant of MDS. Another observation is that the objective of the MDS problem is very similar to that of the Gromov-Wasserstein (GW) distance from optimal transport theory (Mémoli 2011), and our main theoretical result makes this connection precise.

Copyright 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

**Contributions.** The contributions of this paper are centered on theoretical results which tie together several ideas from classical and more recent literature. The impetus for the paper was the problem of visualizing complex redistricting datasets; our solution provides an extended computational example that illustrates how our framework can be used to gain important insights into non-Euclidean data. More specifically, our main contributions are: We extend the formulation of semi-relaxed Gromov-Wasserstein distance introduced in (Vincent-Cuaz et al.
2022) to a 1-parameter family of L^p-type distances defined for general metric measure spaces (including continuous spaces). We then show in Corollary 6 that the semi-relaxed GW problem generalizes the MDS problem (1) in several ways. This is based on Theorem 2, which shows that the semi-relaxed GW distance is realized by a Monge map in very general situations, thus adding to the growing recent literature on the existence of Monge maps in the GW framework (see Remark 4). We develop and unify the theory of generalized MDS and semi-relaxed GW distances by exhibiting connections to variants of the Gromov-Hausdorff (GH) distance that have appeared previously in the literature. Theorem 8 and Theorem 9 together show that a symmetrized version of the p = ∞ semi-relaxed GW distance is equal to the modified GH distance of (Mémoli 2012), which, in turn, appeared in classical work on generalized MDS (Bronstein, Bronstein, and Kimmel 2006). By drawing the connection between the MDS and srGW problems, we are able to design an efficient algorithm (SRGW+GD) for computing MDS embeddings into manifolds, consisting of an initial discretized optimal transport computation followed by a gradient descent stage. This allows general target spaces, and produces significantly better embeddings (quantitatively and qualitatively) than a naive gradient descent algorithm (Table 2 and Figure 1). Using experiments on the MNIST dataset, we show that SRGW+GD matches or outperforms SMACOF MDS for Euclidean target spaces. We demonstrate its effectiveness for embeddings into spheres on a dataset of rotated MNIST images, and a set of GPS coordinates of cities. In the final section, we apply SRGW+GD to ensembles of political districting plans for low-population states, achieving a natural and effective visualization of each ensemble as a distribution on a circle.
These visualizations lead to pertinent insights into the distribution of possible districting plans for each state.

## 2 Semi-Relaxed Gromov-Wasserstein Distance and Multidimensional Scaling

**Gromov-Wasserstein Distances.** Let (X, d_X, µ_X) and (Y, d_Y, µ_Y) be metric measure spaces (mm-spaces); that is, (X, d_X) is a metric space, which we assume to be complete and separable, and µ_X is a Borel probability measure on X. We will sometimes abuse notation and simply write X for (X, d_X, µ_X) when it is clear that there is an associated choice of metric and measure. A coupling of µ_X and µ_Y is a Borel probability measure γ on X × Y with marginals equal to µ_X and µ_Y, respectively. Writing this symbolically, (π₁)_#γ = µ_X and (π₂)_#γ = µ_Y, where π₁ : X × Y → X and π₂ : X × Y → Y are the coordinate projection maps and (π₁)_#γ denotes the pushforward measure. We denote the set of all measure couplings between µ_X and µ_Y as C(µ_X, µ_Y). For p ∈ [1, ∞), the Gromov-Wasserstein (GW) p-distance (Mémoli 2007; Mémoli 2011) is

$$d_{GW,p}(X, Y) = \inf_{\gamma \in C(\mu_X, \mu_Y)} \mathrm{dis}_p(\gamma),$$

where the p-distortion of γ, dis_p(γ), is given by

$$\mathrm{dis}_p(\gamma) = \frac{1}{2} \|\Gamma_{X,Y}\|_{L^p(\gamma \otimes \gamma)} = \frac{1}{2}\left(\iint |\Gamma_{X,Y}(x, y, x', y')|^p \, d\gamma(x, y) \, d\gamma(x', y')\right)^{1/p} \tag{2}$$

with Γ_{X,Y}(x, y, x', y') = d_X(x, x') − d_Y(y, y'). This extends to the p = ∞ case, where the distortion is equal to

$$\mathrm{dis}_\infty(\gamma) = \frac{1}{2} \|\Gamma_{X,Y}\|_{L^\infty(\mathrm{supp}(\gamma) \times \mathrm{supp}(\gamma))} = \frac{1}{2} \sup_{(x,y),(x',y') \in \mathrm{supp}(\gamma)} |\Gamma_{X,Y}(x, y, x', y')|,$$

with supp(γ) denoting the support of the measure γ. The GW distance was introduced in (Mémoli 2007), where it was shown that d_{GW,p} defines a metric on the space of measure-preserving isometry classes of fully supported compact mm-spaces.

**Semi-Relaxed GW Distances.** GW distances have become a popular tool in the machine learning community, due to their ability to compare distinct data types (Peyré, Cuturi, and Solomon 2016; Chowdhury and Needham 2021; Xu, Luo, and Carin 2019).
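For finite mm-spaces the p-distortion in (2) is a finite sum and can be evaluated directly. The following minimal numpy sketch (function name ours) computes dis_p(γ) from distance matrices and a coupling matrix:

```python
import numpy as np

def gw_distortion_p(DX, DY, gamma, p=2):
    """p-distortion dis_p(gamma) of a coupling between finite mm-spaces.

    DX: (n, n) distance matrix of X; DY: (m, m) distance matrix of Y;
    gamma: (n, m) coupling matrix (nonnegative entries summing to 1).
    Returns (1/2) * ( sum_{i,j,k,l} |DX[i,k] - DY[j,l]|^p
                      * gamma[i,j] * gamma[k,l] )^(1/p).
    """
    # diff[i, j, k, l] = |DX[i, k] - DY[j, l]|^p, built by broadcasting
    diff = np.abs(DX[:, None, :, None] - DY[None, :, None, :]) ** p
    # mass[i, j, k, l] = gamma[i, j] * gamma[k, l]
    mass = gamma[:, :, None, None] * gamma[None, None, :, :]
    return 0.5 * (diff * mass).sum() ** (1.0 / p)

# Example: two 2-point spaces at different scales, product coupling
DX = np.array([[0.0, 1.0], [1.0, 0.0]])
DY = np.array([[0.0, 3.0], [3.0, 0.0]])
gamma = np.full((2, 2), 0.25)
d = gw_distortion_p(DX, DY, gamma, p=2)
```

This brute-force tensor evaluation is O(n²m²) and is only meant to make the definition concrete; practical GW solvers avoid forming the full tensor.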
Many variants of GW distances have been introduced in recent years (Scetbon, Peyré, and Cuturi 2022; Chowdhury, Miller, and Needham 2021; Chowdhury et al. 2023; Redko et al. 2020; Vayer et al. 2020). Of particular interest here is the semi-relaxed Gromov-Wasserstein (srGW) distance of (Vincent-Cuaz et al. 2022). There, the p = 2 version was defined for finite spaces, and we give a natural generalization here.

**Definition 1** (Semi-Relaxed Gromov-Wasserstein Distance). Let X = (X, d_X, µ_X) and Y = (Y, d_Y, µ_Y) be mm-spaces and let p ∈ [1, ∞]. A semi-coupling of µ_X and Y is a Borel probability measure γ on X × Y such that (π₁)_#γ = µ_X (note that this definition doesn't depend on the measure on Y). The set of semi-couplings will be denoted SC(µ_X, Y). The semi-relaxed Gromov-Wasserstein (srGW) p-distance is

$$d_{srGW,p}(X, Y) = \inf_{\gamma \in SC(\mu_X, Y)} \mathrm{dis}_p(\gamma), \tag{3}$$

where the p-distortion is as defined in (2). We call d_{srGW,p} a distance in an informal sense. It is clearly asymmetric and, in fact, we will show in Section 3 that it satisfies the triangle inequality if and only if p = ∞. For now, we consider (3) as an interesting optimization problem. We show below that it is closely related to MDS (1).

**Monge Maps and Generalized MDS.** Let (X, d_X, µ_X) be an mm-space and Y a Polish space. Given a measurable function f : X → Y, we define the semi-coupling induced by f to be the measure µ_f on X × Y given by µ_f := (id_X × f)_#µ_X, where id_X × f : X → X × Y is the function x ↦ (x, f(x)). In the case that X is finite, this measure is given explicitly by ∑_{x∈X} µ_X(x) δ_{(x,f(x))}. Here, and throughout the rest of the paper, we use δ_z to denote the Dirac mass at a point z ∈ Z. We recall that a metric space is called proper if its closed and bounded sets are compact. An action of a group G on a metric space X is said to be cocompact if there exists a compact K ⊂ X such that X is covered by translates of K under the G-action. The following is our main result.

**Theorem 2** (Existence of Monge Maps).
Let (X, d_X, µ_X) be a metric measure space with X finite and µ_X fully supported and let (Y, d_Y) be a proper metric space with a cocompact action by isometries by some group G. Then for any p ∈ [1, ∞], there exists a function f : X → Y such that d_{srGW,p}(X, Y) = dis_p(µ_f). Moreover, if p < ∞, then any semi-coupling with the same distortion as µ_f is induced by a function.

The proof is included in the arXiv version. The main idea is that, from any initial coupling, one can construct a new one of the form µ_f with lower distortion via disintegration of the initial coupling.

**Remark 3.** The theorem applies when Y = ℝⁿ, endowed with Euclidean distance (a main motivating example), but also to a large class of spaces including compact metric spaces (e.g. closed Riemannian manifolds) or infinite binary trees. Regarding the source space X, our proof requires X to be finite; removing this assumption, even in the Euclidean case, seems difficult; see e.g. (Murray and Pickarski 2024).

**Remark 4** (The Monge Problem). We will refer to the map f in Theorem 2 as a Monge map, in reference to the original formulation of optimal transport, due to Monge, where optimization was performed over measure-preserving maps, rather than over couplings (see (Villani 2021)). The Monge problem in OT theory is to determine conditions under which the Wasserstein distance is realized by a measure-preserving map; the problem is now well-understood, with several general results, e.g., (Brenier 1987) and (Villani et al. 2009, Theorem 10.41). The Monge problem in the GW setting is an active area of current research. The state-of-the-art results in this direction appear in (Dumont, Lacombe, and Vialard 2024), where the problem is solved for variants of the p = 2 GW distance between measures on Euclidean spaces with density.
Other recent results for (variants of) the Monge problem for GW distance between more restrictive subclasses of mm-spaces appear in (Vayer 2020; Sturm 2023; Beinert, Heiss, and Steidl 2023; Salmona, Delon, and Desolneux 2022; Mémoli and Needham 2024).

**Example 5.** Without the assumptions on Y made in Theorem 2, there may be no Monge map from X to Y. Indeed, let X = {0, 1} ⊂ ℝ and let Y = {(0, 2n) | n ∈ ℕ} ∪ {(1 + 2⁻ⁿ, 2n) | n ∈ ℕ}. Since X does not isometrically embed into Y, there is no zero-distortion map from X to Y. However, the maps f_n : X → Y given by f_n(0) = (0, 2n), f_n(1) = (1 + 2⁻ⁿ, 2n) have arbitrarily low distortion.

We have the following corollary, showing that the srGW problem generalizes the MDS problem.

**Corollary 6.** Let (X, d) be a finite metric space with X = {x₁, …, xₙ} and let µ be uniform measure on X. Let f : X → ℝ^m be a function which realizes d_{srGW,2}(X, ℝ^m). Then the point cloud f(x₁), …, f(xₙ) ∈ ℝ^m is a solution of MDS_m(X).

**Related Work.** The semi-relaxed GW problem was first studied in (Vincent-Cuaz et al. 2022), where it was applied to graph machine learning problems such as dictionary learning and graph completion. The followup paper (Van Assel et al. 2024) considers theoretical aspects of srGW and, in particular, connections to dimension reduction. Their result (Van Assel et al. 2024, Theorem 3.2) is analogous to Corollary 6 and says that spectral methods of dimension reduction can be realized as solutions to semi-relaxed GW problems. This covers, for example, classical multidimensional scaling (as opposed to metric MDS, studied here). These spectral methods are only applicable to Euclidean (or perhaps hyperbolic) embedding spaces, so that the target applications of (Van Assel et al. 2024) and this paper are fairly disjoint. Our theoretical results are related to and complementary with those of (Van Assel et al. 2024), but neither paper generalizes the other.
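The mechanism behind Corollary 6 can be checked numerically: when µ is uniform on n points, the distortion of the semi-coupling µ_f induced by f satisfies dis₂(µ_f) = √(stress)/(2n), where stress is the MDS objective in (1) evaluated at the image points, so the two objectives are monotone reparametrizations of each other and share minimizers. A small sketch (function names ours) verifying this identity:

```python
import numpy as np

def mds_stress(DX, Yemb):
    """MDS objective (1): sum of squared pairwise metric distortions."""
    DYemb = np.linalg.norm(Yemb[:, None, :] - Yemb[None, :, :], axis=-1)
    return ((DX - DYemb) ** 2).sum()

def dis2_induced(DX, Yemb):
    """dis_2 of the semi-coupling mu_f induced by f(x_i) = Yemb[i],
    with uniform measure (mass 1/n) on the n source points."""
    n = len(DX)
    DYemb = np.linalg.norm(Yemb[:, None, :] - Yemb[None, :, :], axis=-1)
    return 0.5 * np.sqrt(((DX - DYemb) ** 2).sum() / n**2)

# For any candidate embedding, dis_2(mu_f) = sqrt(stress) / (2n),
# so the argmins of (1) and (3) coincide for uniform measure.
rng = np.random.default_rng(0)
pts = rng.normal(size=(5, 3))
DX = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
Yemb = rng.normal(size=(5, 2))
assert np.isclose(dis2_induced(DX, Yemb),
                  np.sqrt(mds_stress(DX, Yemb)) / (2 * 5))
```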
The problem of dimension reduction into a non-Euclidean space has been well-studied in the topological data analysis literature. These approaches are fundamentally different from the one used here; they rely on specific constructions of algebraic topology and are therefore only suited to embedding in specific classes of spaces such as circles (De Silva and Vejdemo-Johansson 2009; Paik and Park 2023), projective spaces (Perea 2018), or lens spaces (Polanco and Perea 2019). More importantly, the results of these algorithms are qualitatively different from ours, as they are based on persistent (co)homology rather than on geometry preservation. We compare our results to one of these methods, circular coordinates, in Section 4.

The Gromov-Hausdorff distance is another useful tool for shape comparison and analysis, where shapes are modeled as (compact) metric spaces, versus the Gromov-Wasserstein setting, which considers the distributional properties of a sample via metric measure spaces. It is used in areas such as manifold learning, computer vision, computational geometry, and topological data analysis, which emphasize the geometric and topological properties of data. Our work is closely related to the influential paper (Bronstein, Bronstein, and Kimmel 2006), which approaches the problem of partial surface matching through a certain semi-relaxed version of Gromov-Hausdorff distance. This connection inspired the work in the next section, which derives a precise relationship between the version of Gromov-Hausdorff distance considered in (Bronstein, Bronstein, and Kimmel 2006) and the srGW distance introduced in (Vincent-Cuaz et al. 2022).

## 3 Connections to Gromov-Hausdorff Distance

**Semi-Relaxed Gromov-Hausdorff Distance.** Let X = (X, d_X) and Y = (Y, d_Y) be metric spaces.
Recall that the Gromov-Hausdorff (GH) distance between X and Y is defined by

$$d_{GH}(X, Y) = \inf_{R \in \mathcal{R}(X,Y)} \mathrm{dis}(R), \tag{4}$$

where 𝓡(X, Y) is the set of correspondences between X and Y (that is, the set of relations R ⊂ X × Y such that the coordinate projection maps take R surjectively onto each component) and dis(R) is the metric distortion of the correspondence R, defined by

$$\mathrm{dis}(R) = \frac{1}{2} \sup_{(x,y),(x',y') \in R} |d_X(x, x') - d_Y(y, y')|. \tag{5}$$

Inspired by the semi-relaxed version of Gromov-Wasserstein distance considered above, we now define a semi-relaxed version of Gromov-Hausdorff distance.

**Definition 7** (Semi-Relaxed Gromov-Hausdorff Distance). Let X and Y be metric spaces. A relation R ⊂ X × Y is called a semi-correspondence if the coordinate projection to X takes R surjectively onto X (but we put no such condition on the projection map to Y). Let SR(X, Y) denote the set of semi-correspondences. The semi-relaxed Gromov-Hausdorff (srGH) distance is

$$d_{srGH}(X, Y) = \inf_{R \in SR(X,Y)} \mathrm{dis}(R),$$

with dis(R) defined as in (5). This is symmetrized as

$$\hat{d}_{srGH}(X, Y) = \max\{d_{srGH}(X, Y), d_{srGH}(Y, X)\}.$$

**Equivalence to Modified Gromov-Hausdorff Distance.** We will now show that the symmetrized srGH distance is a reformulation of a distance which has already appeared in the literature. Recall (see, e.g., (Burago, Burago, and Ivanov 2022)) that the GH distance can be expressed as

$$d_{GH}(X, Y) = \inf_{f,g} \max\{\mathrm{dis}(f), \mathrm{dis}(g), \mathrm{codis}(f, g)\}, \tag{6}$$

where the infimum is over (not necessarily continuous) functions f : X → Y and g : Y → X, the function distortion dis(f) is defined by

$$\mathrm{dis}(f) = \frac{1}{2} \sup_{x,x' \in X} |d_X(x, x') - d_Y(f(x), f(x'))|,$$

with dis(g) defined similarly, and where the codistortion codis(f, g) is defined by

$$\mathrm{codis}(f, g) = \frac{1}{2} \sup_{x \in X,\, y \in Y} |d_X(x, g(y)) - d_Y(y, f(x))|.$$

This formulation leads to a natural modification, where the maps f and g are decoupled by dropping the codistortion term in (6). The modified Gromov-Hausdorff distance is

$$d_{mGH}(X, Y) = \inf_{f,g} \max\{\mathrm{dis}(f), \mathrm{dis}(g)\} = \max\left\{\inf_{f:X \to Y} \mathrm{dis}(f),\ \inf_{g:Y \to X} \mathrm{dis}(g)\right\}.$$
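For tiny finite metric spaces, all of these quantities can be computed by brute force, which makes the decoupling concrete. The sketch below (names ours) uses the fact that, for finite spaces, any semi-correspondence contains a selection function of no larger distortion, so the infimum over semi-correspondences equals the infimum over functions X → Y:

```python
from itertools import product

def dis_relation(R, dX, dY):
    """Metric distortion (5) of a relation R, given as a list of (x, y) pairs."""
    return 0.5 * max(abs(dX[x][xx] - dY[y][yy])
                     for (x, y) in R for (xx, yy) in R)

def srgh(X, Y, dX, dY):
    """d_srGH for finite spaces, by enumerating functions f : X -> Y.
    This suffices: every semi-correspondence contains the graph of a
    selection function, whose distortion is no larger."""
    return min(dis_relation(list(zip(X, f)), dX, dY)
               for f in product(Y, repeat=len(X)))

# Two 3-point spaces, with distances stored as nested dictionaries
X, Y = [0, 1, 2], ['a', 'b', 'c']
dX = {i: {j: abs(i - j) for j in X} for i in X}          # 0-1-2 on a line
coord = {'a': 0.0, 'b': 2.0, 'c': 3.0}                   # a-b-c on a line
dY = {u: {v: abs(coord[u] - coord[v]) for v in Y} for u in Y}

# Symmetrized srGH = max of the two one-sided infima = modified GH distance
d_hat = max(srgh(X, Y, dX, dY), srgh(Y, X, dY, dX))
```

The enumeration costs |Y|^|X| evaluations, so this is purely illustrative; the point is that the symmetrized semi-relaxed quantity is exactly the decoupled max-of-infima form of d_mGH.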
The modified GH distance was introduced in (Mémoli 2012), where it was shown to be a metric on the space of isometry classes of compact metric spaces (Mémoli 2012, Theorem 4.1). Our next main result shows that it is the same as the symmetrized semi-relaxed GH distance.

**Theorem 8** (Equivalence of Gromov-Hausdorff Distances). The symmetrized semi-relaxed Gromov-Hausdorff distance d̂_srGH is equal to the modified Gromov-Hausdorff distance d_mGH.

The proof is given in the arXiv version. We also provide an additional characterization of semi-relaxed GH distance in terms of isometric embeddings in the arXiv version.

**Connection to Semi-Relaxed Gromov-Wasserstein Distance.** There is an apparent relationship between the p = ∞ version of (semi-relaxed) Gromov-Wasserstein distance and (semi-relaxed) Gromov-Hausdorff distance. Indeed, given mm-spaces (X, d_X, µ_X) and (Y, d_Y, µ_Y) with fully-supported measures, any coupling γ ∈ C(µ_X, µ_Y) induces a correspondence supp(γ), and it follows that

$$d_{GH}(X, Y) \leq d_{GW,\infty}(X, Y), \tag{7}$$

where the quantity on the left is understood as the Gromov-Hausdorff distance between the underlying metric spaces. However, the inequality (7) is not an equality, in general; see (Mémoli 2011, Theorem 5.1). The following result shows that equality does hold in the semi-relaxed setting. We define the symmetrized srGW distance by

$$\hat{d}_{srGW,p}(X, Y) = \max\{d_{srGW,p}(X, Y), d_{srGW,p}(Y, X)\}.$$

**Theorem 9** (Equivalence of srGW and srGH). Let X and Y be mm-spaces such that µ_X has full support. Then

$$\hat{d}_{srGW,\infty}(X, Y) = \hat{d}_{srGH}(X, Y) = d_{mGH}(X, Y).$$

Moreover, d̂_srGW,p defines a metric on the space of isometry classes of compact metric spaces which is topologically equivalent to Gromov-Hausdorff distance on any GH-precompact family of compact metric spaces when p = ∞, but does not define a pseudometric for p < ∞.

The theorem is proved in the arXiv version.
The proof is based on the observation that any semi-correspondence can be approximated by a measurable semi-correspondence with an arbitrarily small change in distortion.

**Remark 10.** The version of generalized MDS used in (Bronstein, Bronstein, and Kimmel 2006) that we mentioned in the related work section is the asymmetric modified GH problem, inf_{f:X→Y} dis(f), and its ℓ^p relaxation, which the authors of that paper apply to find embeddings of subsets of geodesic spaces to aid in partial surface matching. Our results Corollary 6 and Theorem 9 give a cohesive connection between various ideas: the embedding problem considered in (Bronstein, Bronstein, and Kimmel 2006) is an application of the modified GH distance of (Mémoli 2012), which is equal to the p = ∞ version of a semi-relaxed GW distance, whereas the p = 2 version of srGW first introduced in (Vincent-Cuaz et al. 2022) generalizes the standard MDS problem.

## 4 Numerical Implementation and Experiments

**Implementation.** In this section we describe how to use the semi-relaxed Gromov-Wasserstein distance to find an embedding of a finite metric space (X, d) into a smooth Riemannian manifold Y. The algorithm is straightforward: in short, we solve a srGW problem to embed X into a predefined finite subset of Y, then use this as initialization for gradient descent of the MDS functional (1), with Y as the target metric space, which could be more general than ℝ^m. We now provide some details. Given a finite metric space (X, d), we construct an embedding f̂ : X → Y as follows. We first pick a discrete finite subset S ⊂ Y to map into. In practice, we often use a grid in some coordinate system for Y, and perturb the points slightly to make it easier to solve the resulting optimization problem. We then solve the semi-relaxed Gromov-Wasserstein problem (3), d_{srGW,2}(X, S), which, by Theorem 2, yields an optimal Monge map f : X → S.
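Since Theorem 2 guarantees a Monge-map optimizer for this discrete problem, the first stage can, for a toy-sized X, be solved by exhaustive search over maps f : X → S. The sketch below (sizes and names ours; it is an illustration, not the solver used in the experiments) embeds four points with a circular metric into a perturbed grid on the circle:

```python
import numpy as np
from itertools import product

def dis2_map(DX, DS, assignment, weights):
    """dis_2 of the semi-coupling induced by the map x_i -> S[assignment[i]],
    for source weights mu_X (here uniform)."""
    DYf = DS[np.ix_(assignment, assignment)]
    w = np.outer(weights, weights)
    return 0.5 * np.sqrt((w * (DX - DYf) ** 2).sum())

def circ_dist(a, b, r=1.0):
    """Geodesic distance between angles on a circle of radius r."""
    d = np.abs(a[:, None] - b[None, :]) % (2 * np.pi)
    return r * np.minimum(d, 2 * np.pi - d)

# Source X: 4 points evenly spaced on a unit circle (geodesic metric)
theta_X = np.array([0.0, 0.5 * np.pi, np.pi, 1.5 * np.pi])
DX = circ_dist(theta_X, theta_X)

# Target grid S: 8 slightly perturbed points on the same circle
rng = np.random.default_rng(1)
theta_S = np.linspace(0, 2 * np.pi, 8, endpoint=False) + rng.normal(0, 0.01, 8)
DS = circ_dist(theta_S, theta_S)

# Brute-force the Monge map minimizing dis_2 (8^4 candidate maps)
w = np.full(4, 0.25)
best = min(product(range(8), repeat=4),
           key=lambda a: dis2_map(DX, DS, list(a), w))
```

Here the optimal map lands (up to symmetry) on the grid points nearest the four source angles, with distortion close to zero; for realistic sizes, exhaustive search is replaced by the srGW solver discussed next.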
The computation of d_{srGW,2}(X, S) is approximated via the srGW implementation in the Python Optimal Transport package (Flamary et al. 2021); although this approximation is not guaranteed to yield a Monge map, we found that it does so in practice. To compute the required embedding f̂ : X → Y, we run a gradient descent, initialized with the embedding f (sometimes with a small perturbation); as we will see below, this drastically improves the likelihood of finding a good local minimum by gradient descent. Let X = {x₁, …, xₙ} and initialize a set of n points (y_i)_{1≤i≤n} in Y via y_i = f(x_i). We then consider the distortion function Yⁿ → ℝ defined by

$$(y_1, \dots, y_n) \mapsto \sum_{i,j=1}^n (d_X(x_i, x_j) - d_Y(y_i, y_j))^2 \tag{8}$$

(cf. the MDS functional (1)) and use a gradient-based method on Y to find a local minimum (ŷ₁, ŷ₂, …, ŷₙ). Our embedding is then given by f̂(x_i) = ŷ_i. We refer to this method as SRGW+GD. In our examples below, we use the Adam optimizer (Kingma and Ba 2014). In practice, the desired embedding space Y may only be known up to certain hyperparameters, such as scale. Our main example below will be when Y is a circle of unknown radius. If Y depends on a scale factor (or multiple scale factors), we add the scale factor as an additional variable in (8).

**MNIST.** We first benchmark SRGW+GD against other dimension reduction methods in the case when the target space is Euclidean. We use the MNIST dataset consisting of 70,000 28×28 grayscale images of handwritten digits, which we view as vectors in ℝ^784, separated into ten smaller datasets (MNIST0 to MNIST9), one for each digit. We embed each dataset into ℝ² using PCA, SMACOF MDS, and SRGW+GD. Table 1 reports the distortion (dis₂) values for each embedding. We see that SRGW+GD and SMACOF achieve similar distortion in all cases, with SRGW+GD performing slightly better. In terms of compute time, we find that SRGW+GD is between 3 and 14 times faster than SMACOF MDS (using scikit-learn).
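The gradient refinement stage for a circle target can be sketched in a few lines. The version below (names ours) parametrizes points by angles and uses plain finite-difference gradient descent as a minimal stand-in for the Adam-based manifold optimization described above; the radius is held fixed here rather than learned:

```python
import numpy as np

def circle_stress(theta, DX, r):
    """MDS functional (8) on a circle of radius r, points given by angles."""
    d = np.abs(theta[:, None] - theta[None, :]) % (2 * np.pi)
    DY = r * np.minimum(d, 2 * np.pi - d)
    return ((DX - DY) ** 2).sum()

def refine(theta0, DX, r=1.0, lr=1e-3, steps=2000, eps=1e-6):
    """Refine an initial circular embedding by gradient descent on the
    angles, using central finite differences for the gradient."""
    theta = theta0.astype(float).copy()
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for i in range(len(theta)):
            tp, tm = theta.copy(), theta.copy()
            tp[i] += eps
            tm[i] -= eps
            grad[i] = (circle_stress(tp, DX, r)
                       - circle_stress(tm, DX, r)) / (2 * eps)
        theta -= lr * grad
    return theta
```

Starting this descent from the srGW (Monge map) initialization, rather than from random angles, is exactly the SRGW+GD pipeline; in the paper's experiments the random-start variant (GD) often stalls in poor local minima.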
| | PCA | SMACOF MDS | SRGW+GD |
|---|---|---|---|
| MNIST0 | 2.580 | 1.556 | 1.554 |
| MNIST1 | 1.290 | 0.794 | 0.777 |
| MNIST2 | 3.070 | 1.761 | 1.751 |
| MNIST3 | 2.792 | 1.594 | 1.586 |
| MNIST4 | 2.668 | 1.523 | 1.518 |
| MNIST5 | 2.702 | 1.602 | 1.594 |
| MNIST6 | 2.549 | 1.495 | 1.477 |
| MNIST7 | 2.346 | 1.350 | 1.342 |
| MNIST8 | 2.901 | 1.646 | 1.623 |
| MNIST9 | 2.421 | 1.405 | 1.388 |

Table 1: Distortion for embeddings of MNIST data into ℝ².

**Rotated MNIST.** In order to demonstrate the performance of SRGW+GD when the target space is non-Euclidean, we artificially introduce non-linear structure into the MNIST datasets. We apply a random rotation between 0° and 360° to each image in each MNIST dataset (filling gaps with black pixels) to create new datasets R-MNIST0 to R-MNIST9. We embed these datasets into ℝ² using t-SNE, PCA, SMACOF MDS and SRGW+GD. We also embed each dataset into a circle of unknown radius using SRGW+GD and the following comparison methods:

- GD denotes minimizing (8) with a random initialization and the Adam optimizer. We use 10 random initializations and report the min and max distortion over all trials.
- CC is the circular coordinate method introduced in (De Silva and Vejdemo-Johansson 2009), which constructs a map from a finite metric space X to S¹ using persistent cohomology. We used the density-robust version introduced in (Paik and Park 2023). Since the method only produces circular coordinates and not a radius, we estimate the radius as max_{x,y∈X} d(x, y)/π.

Distortion values are contained in Table 2.¹ For embeddings into ℝ², we again find that SRGW+GD and SMACOF MDS have similar distortion values, all significantly lower than PCA. SRGW+GD achieves higher distortion on S¹ than on ℝ² (since the target space has one less dimension and thus captures less variation), but still achieves a lower distortion on S¹ than PCA achieves on ℝ². SRGW+GD achieves a significantly lower distortion on S¹ than CC does, and lower distortion than GD in all cases except for some trials on R-MNIST1. We can also gain some insight into how SRGW+GD differs qualitatively from other methods.
In Figure 1, we show the image of the embedding for the dataset R-MNIST9, colored by the true angle of rotation. We also plot the true angle against the inferred angular coordinate (for ℝ² this is taken to be the angle of the point from the x-axis). We note that CC maps true rotation angles to angular coordinates in a roughly injective way, thereby capturing the rotation process accurately. t-SNE maps the rotation angle roughly injectively onto a thickened curve in the plane, but inferring an angular coordinate with our naive method does not recover the rotation angle correctly. PCA, MDS and both versions of SRGW+GD map true rotation angles to angular coordinates via a roughly degree-two map. This better captures the global geometry, since an image of a 9 is often closer to its 180° rotation than its 90° rotation. Finally, we see that GD recovers none of the rotation structure in the dataset. In general, it can be hard to infer an angular coordinate from a planar embedding (see t-SNE in Figure 1). Even our naive method above requires finding an appropriate center for the data, which might not be the mean if the data is distributed very unevenly (e.g. in the redistricting application below). The advantage of choosing a circle as the target space is that it produces a well-defined angular coordinate. Our experiments demonstrate that regardless of which target space is preferred, SRGW+GD effectively preserves global geometry. They also demonstrate the necessity of srGW embeddings as an initialization point for gradient descent.

**Cities.** To demonstrate a non-Euclidean embedding where approximate isometric embedding is possible, we use a list of the 20 largest cities², with the geodesic distance on the Earth between every pair of cities as ground truth (this distance does not assume the Earth is a perfect sphere, and instead uses the WGS-84 ellipsoid). Using SRGW+GD, we embed this dataset into a sphere of radius 6371 km (the average radius of the Earth).
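The metric of the embedding target here is the great-circle distance on a sphere of radius 6371 km (the ground-truth distances above use the WGS-84 ellipsoid instead, which differs slightly). A standard spherical sketch via the haversine formula (function name ours):

```python
import numpy as np

def sphere_dist(latlon1, latlon2, r=6371.0):
    """Great-circle distance in km between two (lat, lon) pairs in degrees,
    on a sphere of radius r. This is the geodesic metric of the sphere
    target; the paper's ground truth uses the WGS-84 ellipsoid instead."""
    lat1, lon1 = np.radians(latlon1)
    lat2, lon2 = np.radians(latlon2)
    # haversine formula
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * np.arcsin(np.sqrt(a))
```

For example, two antipodal points on the equator are half a circumference apart, π · 6371 ≈ 20015 km.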
Figure 2 shows the embedding. SRGW+GD achieves an embedding that is almost isometric; the pairwise distances between the embedded points never differ by more than 14 kilometers from the true distance. By contrast, an MDS embedding into ℝ³ achieves a distortion of 264.724, indicating the benefit of choosing a target space with the appropriate metric (a geodesic sphere).

¹ While t-SNE does not aim to reduce distortion and thus is not a fair comparison, it is a helpful contrast for the qualitative behavior of SRGW+GD; we include distortion values for completeness.
² https://simplemaps.com/data/world-cities, CC-BY 4.0 license.

| | t-SNE (ℝ²) | PCA (ℝ²) | SMACOF MDS (ℝ²) | SRGW+GD (ℝ²) | CC (S¹) | GD (S¹, min, max) | SRGW+GD (S¹) |
|---|---|---|---|---|---|---|---|
| R-MNIST0 | 28.196 | 3.508 | 1.989 | 1.987 | 5.467 | (2.810, 2.812) | 2.534 |
| R-MNIST1 | 33.631 | 2.032 | 1.200 | 1.202 | 1.818 | (1.709, 2.247) | 1.702 |
| R-MNIST2 | 30.199 | 3.615 | 2.020 | 2.019 | 5.586 | (2.854, 2.856) | 2.591 |
| R-MNIST3 | 30.410 | 3.261 | 1.873 | 1.873 | 5.459 | (2.787, 2.790) | 2.439 |
| R-MNIST4 | 29.236 | 3.593 | 1.880 | 1.828 | 4.846 | (2.488, 2.490) | 2.329 |
| R-MNIST5 | 28.459 | 3.159 | 1.812 | 1.812 | 5.325 | (2.734, 2.736) | 2.401 |
| R-MNIST6 | 32.086 | 3.598 | 1.942 | 1.900 | 2.758 | (2.621, 2.623) | 2.437 |
| R-MNIST7 | 32.798 | 3.347 | 1.828 | 1.796 | 2.594 | (2.537, 2.539) | 2.324 |
| R-MNIST8 | 28.310 | 3.296 | 1.870 | 1.869 | 5.368 | (2.735, 2.737) | 2.418 |
| R-MNIST9 | 32.468 | 3.307 | 1.783 | 1.775 | 2.638 | (2.464, 2.466) | 2.280 |

Table 2: Distortion (dis₂) for embeddings of randomly rotated MNIST data using various methods. The first four columns are ℝ² embeddings; the last three are S¹ embeddings.

Figure 1: Circle and planar embeddings for the R-MNIST9 dataset (above), and plots comparing the true angle of rotation vs the inferred angular coordinate from the embedding (below). Color indicates the true angle of rotation.

Figure 2: Embedding onto a geodesic sphere of 20 world cities.

## 5 Application to Redistricting

We now demonstrate how an embedding into a natural non-Euclidean target can enable visualization of a complex data set, resulting in important insights, using computational redistricting as our area of application.
**Background and Data.** Redistricting is the process of dividing a region into contiguous, equal-population districts for the purposes of electing representatives. There has been a lot of recent attention on generating redistricting ensembles, i.e., large samples from the space of valid redistricting plans for a given U.S. state (Chen and Rodden 2015; Chikina, Frieze, and Pegden 2017; Herschlag et al. 2020; DeFord, Duchin, and Solomon 2021; Duchin, Needham, and Weighill 2022). When analyzed, ensembles can uncover baseline expectations for a typical plan, or be used to flag outliers (some of which might be so-called gerrymanders, i.e. unfair maps). We will use our SRGW+GD method to visualize ensembles of two-district plans in order to achieve both these goals, similar to the approach in (Abrishami et al. 2020). There are currently six states in the contiguous United States with two Congressional districts: Idaho (ID), Maine (ME), Montana (MT), New Hampshire (NH), Rhode Island (RI) and West Virginia (WV). For each of these states we obtained Census blockgroups from (Manson et al. 2023) and generated an ensemble of 1,000 redistricting plans using the ReCom algorithm (DeFord, Duchin, and Solomon 2021). As our distance between plans we chose a Hamming distance, where the distance between two redistricting plans is defined as the minimum number of Census blockgroups that must be reassigned to change the first plan into the second. Treating the ensemble as a 1,000-point metric space with this distance, we then embed the ensemble into a circle with SRGW+GD. For each embedded ensemble, we plot the image of the embedding as a set of points on the circle, as shown in Figure 3. We also display the average division for each part of the circle (see the arXiv version for details), and histograms showing the distribution of circular coordinates in each ensemble.
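Because district labels are arbitrary, the Hamming distance described above must be minimized over relabelings; for two districts this is just the minimum over the identity and the swap. A sketch on toy 0/1 assignment vectors (function name ours):

```python
import numpy as np

def plan_distance(plan_a, plan_b):
    """Minimum number of units (e.g. Census blockgroups) that must be
    reassigned to turn plan_a into plan_b, minimized over relabelings
    of the two districts (labels are 0/1)."""
    plan_a, plan_b = np.asarray(plan_a), np.asarray(plan_b)
    direct = int((plan_a != plan_b).sum())        # keep labels as-is
    swapped = int((plan_a != 1 - plan_b).sum())   # swap the two labels
    return min(direct, swapped)

# Two toy plans over 6 blockgroups; identical up to relabeling -> distance 0
assert plan_distance([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0]) == 0
# One blockgroup reassigned -> distance 1
assert plan_distance([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]) == 1
```

Computing this distance for all pairs in a 1,000-plan ensemble yields the 1,000-point metric space that is then embedded into the circle.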
In the arXiv version, we try other nonlinear planar embeddings and find that none of them reveal the circle structure within the data across all states.

**Results.** In general, we see that circular coordinates roughly parameterize the angle of the boundary: from a north-south division, around to an east-west division, and then back to a north-south division. This is most easily visible for West Virginia and Montana. This is strong evidence that the circle is a good choice of target space for embedding these ensembles. The boundary does not always look linear for states with a very uneven population distribution such as Maine (where most of the population is in the south of the state), though we still see a smooth rotation. A general trend we can observe is a preference for divisions of the state with short (internal) boundary length. Boundary length is one possible measure of "compactness", a redistricting criterion often written into legislation around redistricting. The ReCom algorithm is known to favor low boundary lengths, where the boundary is measured by the number of Census blockgroups (or other geographic units) on the boundary of the districts (DeFord, Duchin, and Solomon 2021; Procaccia and Tucker-Foltz 2022). Preference for short boundaries can be observed in West Virginia, where the northwest-southeast division requires a long boundary and is thus not likely to be drawn by the ReCom algorithm. In Idaho, we see a distribution with at least two modes: a northwest-southeast division and a northeast-southwest division. We should note that in Idaho, a straight, vertical boundary is so unlikely that it doesn't even show up on the heat maps; this is likely because of the large boundary length this would require. In Maine, most plans are concentrated around a northeast-southwest split. In New Hampshire, most plans are concentrated around a roughly north-south split, though there is more variance than for Maine.
These results for Maine and New Hampshire align with the analysis in (Asgari et al. 2020) of redistricting in these states. In particular, the authors of (Asgari et al. 2020) conclude that the enacted west-east redistricting plan in New Hampshire was an outlier compared to their ensemble, and propose that incumbent protection may have played a role in the drawing of that map. Our analysis likewise suggests that an east-west division of the state would be an outlier when compared to the ensemble.

Figure 3: Circle embeddings of 1,000-plan ensembles using SRGW+GD for (a) Idaho (ID), (b) Maine (ME), (c) Montana (MT), (d) New Hampshire (NH), (e) Rhode Island (RI), and (f) West Virginia (WV). Heat maps indicate the average district location for plans in each part of the circle.

Acknowledgments

This material is based upon work supported by the National Science Foundation under the following grants. T. Needham was supported by NSF grants DMS 2107808 and DMS 2324962. R. A. Clark was supported by the NSF MPS-Ascending Postdoc Award 2138110. We would like to thank the anonymous reviewers for their helpful suggestions during the review process.

References

Abrishami, T.; Guillen, N.; Rule, P.; Schutzman, Z.; Solomon, J.; Weighill, T.; and Wu, S. 2020. Geometry of graph partitions via optimal transport. SIAM Journal on Scientific Computing, 42(5): A3340–A3366.
Asgari, S.; Basewitz, Q.; Bergmann, E.; Brogsol, J.; Cox, N.; Davis, D.; Kampel, M.; Keating, B.; Knox, K.; Lam, A.; et al. 2020. Assessing congressional districting in Maine and New Hampshire. arXiv preprint arXiv:2011.06555.
Beinert, R.; Heiss, C.; and Steidl, G. 2023. On assignment problems related to Gromov-Wasserstein distances on the real line. SIAM Journal on Imaging Sciences, 16(2): 1028–1032.
Brenier, Y. 1987. Décomposition polaire et réarrangement monotone des champs de vecteurs. C. R. Acad. Sci. Paris Sér. I Math., 305: 805–808.
Bronstein, A. M.; Bronstein, M. M.; and Kimmel, R. 2006. Generalized multidimensional scaling: a framework for isometry-invariant partial surface matching. Proceedings of the National Academy of Sciences, 103(5): 1168–1172.
Burago, D.; Burago, Y.; and Ivanov, S. 2022. A Course in Metric Geometry, volume 33. American Mathematical Society.
Chen, J.; and Rodden, J. 2015. Cutting through the thicket: Redistricting simulations and the detection of partisan gerrymanders. Election Law Journal, 14(4): 331–345.
Chikina, M.; Frieze, A.; and Pegden, W. 2017. Assessing significance in a Markov chain without mixing. Proceedings of the National Academy of Sciences, 114(11): 2860–2864.
Chowdhury, S.; Miller, D.; and Needham, T. 2021. Quantized Gromov-Wasserstein. In Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD 2021, Bilbao, Spain, September 13–17, 2021, Proceedings, Part III 21, 811–827. Springer.
Chowdhury, S.; and Needham, T. 2021. Generalized spectral clustering via Gromov-Wasserstein learning. In International Conference on Artificial Intelligence and Statistics, 712–720. PMLR.
Chowdhury, S.; Needham, T.; Semrad, E.; Wang, B.; and Zhou, Y. 2023. Hypergraph co-optimal transport: Metric and categorical properties. Journal of Applied and Computational Topology, 1–60.
De Silva, V.; and Vejdemo-Johansson, M. 2009. Persistent cohomology and circular coordinates. In Proceedings of the Twenty-Fifth Annual Symposium on Computational Geometry, 227–236.
DeFord, D.; Duchin, M.; and Solomon, J. 2021. Recombination: A family of Markov chains for redistricting. Harvard Data Science Review, 3(1): 3.
Duchin, M.; Needham, T.; and Weighill, T. 2022. The (homological) persistence of gerrymandering. Foundations of Data Science, 4(4): 581–622.
Dumont, T.; Lacombe, T.; and Vialard, F.-X. 2024. On the existence of Monge maps for the Gromov-Wasserstein problem. Foundations of Computational Mathematics, 1–48.
Flamary, R.; Courty, N.; Gramfort, A.; Alaya, M. Z.; Boisbunon, A.; Chambon, S.; Chapel, L.; Corenflos, A.; Fatras, K.; Fournier, N.; et al. 2021. POT: Python Optimal Transport. Journal of Machine Learning Research, 22(78): 1–8.
Herschlag, G.; Kang, H. S.; Luo, J.; Graves, C. V.; Bangia, S.; Ravier, R.; and Mattingly, J. C. 2020. Quantifying gerrymandering in North Carolina. Statistics and Public Policy, 7(1): 30–38.
Kingma, D. P.; and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Manson, S.; Schroeder, J.; Riper, D. V.; Knowles, K.; Kugler, T.; Roberts, F.; and Ruggles, S. 2023. IPUMS National Historical Geographic Information System: Version 18.0 [dataset]. http://doi.org/10.18128/D050.V18.0.
Mémoli, F. 2007. On the use of Gromov-Hausdorff Distances for Shape Comparison. In Botsch, M.; Pajarola, R.; Chen, B.; and Zwicker, M., eds., Eurographics Symposium on Point-Based Graphics. The Eurographics Association. ISBN 978-3-905673-51-7.
Mémoli, F. 2011. Gromov-Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics, 11: 417–487.
Mémoli, F. 2012. Some properties of Gromov-Hausdorff distances. Discrete & Computational Geometry, 48: 416–440.
Mémoli, F.; and Needham, T. 2024. Comparison results for Gromov-Wasserstein and Gromov-Monge distances. ESAIM: Control, Optimisation and Calculus of Variations, 30: 78.
Murray, R.; and Pickarski, A. 2024. On Probabilistic Embeddings in Optimal Dimension Reduction. arXiv preprint arXiv:2408.02433.
Paik, T.; and Park, J. 2023. Circular Coordinates for Density-Robust Analysis. arXiv preprint arXiv:2301.12742.
Perea, J. A. 2018. Multiscale projective coordinates via persistent cohomology of sparse filtrations. Discrete & Computational Geometry, 59: 175–225.
Peyré, G.; Cuturi, M.; and Solomon, J. 2016. Gromov-Wasserstein averaging of kernel and distance matrices. In International Conference on Machine Learning, 2664–2672. PMLR.
Polanco, L.; and Perea, J. A. 2019. Coordinatizing Data With Lens Spaces and Persistent Cohomology. In Proceedings of the 31st Canadian Conference on Computational Geometry (CCCG), 49–57.
Procaccia, A. D.; and Tucker-Foltz, J. 2022. Compact redistricting plans have many spanning trees. In Proceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 3754–3771. SIAM.
Redko, I.; Vayer, T.; Flamary, R.; and Courty, N. 2020. Co-optimal transport. Advances in Neural Information Processing Systems, 33: 17559–17570.
Salmona, A.; Delon, J.; and Desolneux, A. 2022. Gromov-Wasserstein Distances between Gaussian Distributions. Journal of Applied Probability, 59(4).
Scetbon, M.; Peyré, G.; and Cuturi, M. 2022. Linear-time Gromov-Wasserstein distances using low rank couplings and costs. In International Conference on Machine Learning, 19347–19365. PMLR.
Sturm, K.-T. 2023. The Space of Spaces: Curvature Bounds and Gradient Flows on the Space of Metric Measure Spaces, volume 290. American Mathematical Society.
Van Assel, H.; Vincent-Cuaz, C.; Courty, N.; Flamary, R.; Frossard, P.; and Vayer, T. 2024. Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein Projection. arXiv preprint arXiv:2402.02239.
Vayer, T. 2020. A contribution to optimal transport on incomparable spaces. arXiv preprint arXiv:2011.04447.
Vayer, T.; Chapel, L.; Flamary, R.; Tavenard, R.; and Courty, N. 2020. Fused Gromov-Wasserstein distance for structured objects. Algorithms, 13(9): 212.
Villani, C. 2021. Topics in Optimal Transportation, volume 58. American Mathematical Society.
Villani, C.; et al. 2009. Optimal Transport: Old and New, volume 338. Springer.
Vincent-Cuaz, C.; Flamary, R.; Corneli, M.; Vayer, T.; and Courty, N. 2022. Semi-relaxed Gromov-Wasserstein divergence with applications on graphs. In International Conference on Learning Representations.
Xu, H.; Luo, D.; and Carin, L. 2019. Scalable Gromov-Wasserstein learning for graph partitioning and matching. Advances in Neural Information Processing Systems, 32.