# optimal_transport_with_cyclic_symmetry__884812cd.pdf

Optimal Transport with Cyclic Symmetry

Shoichiro Takeda, Yasunori Akagi, Naoki Marumo, Kenta Niwa

NTT Corporation, 1-1 Hikari-no-oka, Yokosuka-Shi, Kanagawa, 239-0847, Japan {shoichiro.takeda, yasunori.akagi, naoki.marumo, kenta.niwa}@ntt.com

We propose novel fast algorithms for optimal transport (OT) utilizing a cyclic symmetry structure of input data. Such OT with cyclic symmetry appears universally in various real-world examples: image processing, urban planning, and graph processing. Our main idea is to reduce OT to a small optimization problem that has significantly fewer variables by utilizing cyclic symmetry and various optimization techniques. On the basis of this reduction, our algorithms solve the small optimization problem instead of the original OT. As a result, our algorithms obtain the optimal solution and the objective function value of the original OT faster than solving the original OT directly. In this paper, our focus is on two crucial OT formulations: the linear programming OT (LOT) and the strongly convex-regularized OT, which includes the wellknown entropy-regularized OT (EROT). Experiments show the effectiveness of our algorithms for LOT and EROT in synthetic/real-world data that has a strict/approximate cyclic symmetry structure. Through theoretical and experimental results, this paper successfully introduces the concept of symmetry into the OT research field for the first time.

Introduction Given two probability vectors and a cost matrix, the discrete optimal transport (OT) problem seeks an optimal solution to minimize the cost of transporting the probability vector toward another one. Its total transportation cost is an effective tool that compares two probability vectors. Therefore, OT has been studied in various research areas, e.g., text embedding (Kusner et al. 2015), image matching (Liu et al. 2020), domain adaptation (Courty et al. 2017), graph comparison (Nikolentzos, Meladianos, and Vazirgiannis 2017), and interpolation (Solomon et al. 2015). There are many formulations for OT. Kantorovich (1942) was the first to formulate OT as the linear programming problem, and the linear OT (LOT) made great progress toward solving OT. Recently, the strongly convex-regularized OT (SROT) has attracted much attention, especially, the entropy-regularized OT (EROT) (Cuturi 2013; Blondel, Seguy, and Rolet 2018; Peyr e and Cuturi 2019; Guo, Ho, and Jordan 2020). SROT is superior to LOT in terms of guaranteeing a unique solution and computational stability.

Copyright 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Figure 1: A simple counter-example to the intuitive approach for OT with cyclic symmetry. Given the above two gray images with 90 cyclic symmetry, it seems intuitively sufficient to consider OT for only one of the symmetric areas (i.e., the red rectangular areas). However, when the cost matrix is given by the pixel-wise Euclidean distance matrix, the (0, 1)-th entry in the left image is optimally transported toward the (0, 2)-th entry beyond the red rectangular area in the right image. Thus, such an intuitive approach gives an incorrect solution, so we must develop novel algorithms that utilize cyclic symmetry appropriately in OT.

Many algorithms have been studied to solve OT. The network simplex algorithm (Ahuja, Magnanti, and Orlin 1993) is a well-known classical algorithm for LOT and has been widely used. The Sinkhorn algorithm (Cuturi 2013) and primal-dual descent algorithms (Dvurechensky, Gasnikov, and Kroshnin 2018; Guo, Ho, and Jordan 2020) have been proposed to solve EROT faster. Recently, algorithms utilizing special structures of input data have been in the spotlight for solving OT faster, e.g., algorithms that utilize the lowrankness of the input data (Tenetov, Wolansky, and Kimmel 2018; Altschuler et al. 2019). Besides, several algorithms utilize the Gibbs kernel structure of the input cost matrix in the Sinkhorn algorithm, such as separability (Solomon et al. 2015; Bonneel, Peyr e, and Cuturi 2016) and translation invariance (Getreuer 2013; Peyr e and Cuturi 2019). In this paper, we propose novel fast algorithms for OT utilizing a new special structure, cyclic symmetry, of input data. Specifically, we assume n-order cyclic symmetry for the input data; the input d-dimensional probability vector is a concatenation of n copies of an m(:= d/n)-dimensional vector, and the input d d cost matrix is a block-circulant matrix consisting of n matrices with size m m (see Assumption 1). Such OT with cyclic symmetry appears universally in various real-world examples: image processing,

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

Figure 2: Overview of our algorithm for LOT with cyclic symmetry (C-LOT). This algorithm reduces C-LOT to a small LOT that has significantly fewer variables and solves the small LOT instead, resulting in fast computation. Note that the small cost matrix is not just a part of the original one; it aggregates the original cost matrix on the basis of cyclic symmetry, see (11).

urban planning, and graph processing (for details, see Examples 1 to 3). Intuitively, we can obtain an optimal solution to such a problem faster by solving OT for only one of the symmetric components of the input data and concatenating n copies of the obtained solution. However, this approach cannot work due to ignoring interactions between the symmetric components (see Figure 1). Unlike such an intuitive approach, we propose novel fast algorithms utilizing cyclic symmetry for two crucial OT formulations: LOT and SROT. First, we propose a fast algorithm for LOT with cyclic symmetry (C-LOT). Figure 2 shows an overview of this algorithm. Our main idea is to reduce C-LOT, which has d2 variables, to a small LOT, which has only m2 variables, by utilizing cyclic symmetry. To achieve this reduction, we introduce auxiliary variables considering cyclic symmetry and rewrite C-LOT as a min-min optimization problem. Surprisingly, the inner min problem can be solved analytically, and the min-min problem becomes a small LOT. Therefore, this algorithm solves C-LOT faster by solving the small LOT instead. Using the network simplex algorithm to solve the small LOT, its time complexity bound becomes O(m3 log m log(m C )+d2) where C is the cost matrix and C := maxi,j|Cij|. This is greatly improved from O(d3 log d log(d C )) when solving C-LOT directly. Next, we propose a fast algorithm for SROT with cyclic symmetry (C-SROT). Unlike C-LOT, we cannot reduce CSROT to a small SROT due to the regularizer. To overcome this issue, we consider the Fenchel dual of C-SROT. By utilizing cyclic symmetry, we show that the Fenchel dual problem has only 2m variables, which is significantly fewer than the 2d variables in the naive dual of C-SROT. Therefore, this algorithm solves the small Fenchel dual problem by the alternating minimization algorithm (Beck 2017, Chapter 14). Since the number of variables is very few, its time complexity for one iteration will be reduced, resulting in fast com-

putation as a whole. Especially, this algorithm for EROT with cyclic symmetry (C-EROT), which is a subclass of CSROT, becomes a Sinkhorn-like algorithm. We call it cyclic Sinkhorn algorithm. The interesting point is that the Gibbs kernel in the cyclic Sinkhorn algorithm differs from that in the original Sinkhorn algorithm and is designed by considering cyclic symmetry. Its time complexity bound of each iteration is O(m2), which is significantly improved from O(d2) when solving C-EROT by the original Sinkhorn algorithm. Finally, we propose a two-stage Sinkhorn algorithm for CEROT with approximate cyclic symmetry. In the real world, there are many cases where the input data exhibit only approximate cyclic symmetry due to slight noise and displacement. The cyclic Sinkhorn algorithm cannot be applied to such cases because strict cyclic symmetry of the input data is assumed. To overcome this issue, the two-stage Sinkhorn algorithm first runs the cyclic Sinkhorn algorithm to quickly obtain a strict symmetric solution. It then runs the original Sinkhorn algorithm to modify the solution. As a result, this algorithm obtains the optimal solution to C-EROT with approximate cyclic symmetry faster by utilizing cyclic symmetry at the first stage. In the Experiments section, we experimentally confirmed the fast computation of this algorithm when input data have approximate cyclic symmetry. In summary, this paper introduces the concept of symmetry into the OT research field for the first time and proposes fast cyclic symmetry-aware algorithms that solve small optimization problems instead of the original OT. We validated the effectiveness of our algorithms in synthetic/real-world data with a strict/approximate cyclic symmetry structure.

Related Work OT was initially formulated by (Monge 1781). Later (Kantorovich 1942) relaxed it as the linear programming problem, which permits splitting a mass from a single source to

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

multiple targets. The linear OT (LOT) is easier to solve than Monge s form and has made great progress toward solving OT. To solve OT, many algorithms have been proposed. For example, the network simplex algorithm (Ahuja, Magnanti, and Orlin 1993) is one of the classical algorithms for LOT and has been widely used. Recently, algorithms have been proposed to solve OT faster by adding the entropy regularizer (Cuturi 2013; Altschuler, Niles-Weed, and Rigollet 2017; Lin, Ho, and Jordan 2019b; Alaya et al. 2019). The dual form of the entropy-regularized OT can be solved faster by the Sinkhorn algorithm that updates dual variables via matrix-vector products (Sinkhorn 1967). For further acceleration, many improvements to the Sinkhorn algorithm have been proposed. For example, (Altschuler, Niles Weed, and Rigollet 2017), (Lin, Ho, and Jordan 2019b), and (Alaya et al. 2019) proposed using greedy, randomized, and safe-screening strategies, respectively, to efficiently update the dual variables. Primal-dual algorithms have received much attention (Dvurechensky, Gasnikov, and Kroshnin 2018; Lin, Ho, and Jordan 2019a; Guo, Ho, and Jordan 2020) because they report faster computation than the Sinkhorn algorithm and its variants but are rarely used in practice due to the difficulty of implementation (Pham et al. 2020). This paper focuses on the network simplex algorithm and Sinkhorn algorithm because they are widely used. As another line of work to solve OT faster, utilizing special structures of input data has been well studied (Solomon et al. 2015; Bonneel, Peyr e, and Cuturi 2016; Peyr e and Cuturi 2019; Getreuer 2013; Tenetov, Wolansky, and Kimmel 2018; Altschuler et al. 2019). Inspired by the fact that geodesic distance matrices can be low-rank approximated (Shamai et al. 2015), a low-rank approximation for the cost matrix in OT was introduced to reduce the time complexity of the Sinkhorn algorithm (Tenetov, Wolansky, and Kimmel 2018; Altschuler et al. 2019). Several approaches have utilized the Gibbs kernel structures of the cost matrix appearing in the Sinkhorn algorithms. The key to these approaches is to approximate the calculation involving the Gibbs kernel; by utilizing separability (Solomon et al. 2015; Bonneel, Peyr e, and Cuturi 2016) or translation invariant (Peyr e and Cuturi 2019; Getreuer 2013) of the Gibbs kernel on a fixed uniform grid, the matrix-vector product in the Sinkhorn algorithm can be replaced with convolutions. Thus, it can be computed faster by, e.g., a fast Fourier transform. This paper introduces the utilization of a new special but ubiquitous structure, cyclic symmetry, in OT.

Preliminary Notations

R 0 denotes the set of non-negative real numbers. , denotes the inner product; that is, for vectors x, y Rd, x, y = Pd 1 i=0 xiyi, and for matrices X, Y Rd d, X, Y = Pd 1 i,j=0 Xij Yij. The probability simplex is de-

noted as d := {x Rd | Pd 1 i=0 xi = 1, xi 0}. 1d denotes the all-ones vector in Rd. 0d denotes the all-zeros vector in Rd. Id denotes the d d identity matrix. denotes the Kronecker product.

Regularized Optimal Transport (ROT) We define the regularized OT (ROT) that adds a convex regularizer to the linear OT (LOT) introduced by (Kantorovich 1942). Given two probability vectors a, b d and a cost matrix C Rd d 0 , ROT can be defined as

min T Rd d C, T +

i,j=0 ϕ(Tij)

s.t. T1d = a, T 1d = b,

where T is called a transportation matrix and ϕ : R R {+ } is a convex function, called a regularizer. We assume ϕ(x) = + if x < 0; this assumption imposes the nonnegative constraint on T. ROT (1) is a generalization of various OT formulations. For example, (1) leads to LOT when ϕ is given by

ϕ(x) = 0 if x 0, + otherwise. (2)

Also, (1) leads to the strongly convex-regularized OT (SROT) when ϕ is a strongly convex function; a function ϕ is called strongly convex if ϕ µ

2 is convex for some µ > 0. As an important subclass of SROT, (1) leads to the entropyregularized OT (EROT) introduced by (Cuturi 2013) when ϕ is given by

ϕ(x) = λx(log x 1) if x 0, + otherwise, (3)

where λ > 0. Note that, unlike the standard OT formulation, (1) imposes the non-negative constraint on T via the regularizer ϕ(x) for two purposes: (i) to unify the description of LOT and SROT, and (ii) to describe the Fenchel dual of (8) concisely and intuitively (see later in Theorem 2).

C-ROT: ROT with Cyclic Symmetry This section explains our assumption of cyclic symmetry for ROT (1) and real-world examples of this problem. We assume that a, b, C in (1) have the following n-order cyclic symmetry. Assumption 1. There exists a divisor n of d, and the probability vectors a, b in (1) have a periodic structure:

where α, β Rm 0 and m := d

n is an integer. Also, the cost matrix C in (1) has a block-circulant structure:

Cn 1 C0 ... ... ... ... ... C1 C1 Cn 1 C0

where C0, . . . , Cn 1 Rm m 0 .

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

In this paper, we call ROT (1) with Assumption 1 Cyclic ROT (C-ROT). This problem appears universally in various real-world examples given below. Example 1 (Image with Cyclic Symmetry). Cyclic symmetry in images has been a central image research topic. Especially, because image data are represented in a rectangle form, mirror or 90 rotational symmetry has been utilized for various tasks; mirror symmetry has been utilized for the face recognition (Zhao et al. 2003) and rendering (Wu, Rupprecht, and Vedaldi 2023), and 90 rotational symmetry in medical and galaxy images has been utilized for the image segmentation (Pang et al. 2022) and morphology prediction (Dieleman, Willett, and Dambre 2015). Thus, we here consider ROT between images with cyclic symmetry, A and B Rh w 0 . For images with mirror symmetry, we assume mirror symmetry along the vertical axis; Aij = Ai,w j 1, Bij = Bi,w j 1, for 0 i < h and 0 j < w. We vectorize these images by appropriately ordering pixels as follows:

a = Aπ(0), Aπ(1), . . . , Aπ(hw 1) , (6)

b = Bπ(0), Aπ(1), . . . , Bπ(hw 1) ,

π(k) = (k mod h, k/h ) 0 k < hw

2 k mod h, 3w

By defining C as the Manhattan, Euclidean, or Chebyshev distance matrix between pixel positions, a, b, C satisfy Assumption 1; thus, C-ROT for n = 2 will appear. Similarly, by appropriately ordering pixels for a, b in the case of 90 rotational symmetry, C-ROT for n = 4 will appear. Example 2 (Urban Planning with Cyclic Symmetry). ROT has straightforward applications in logistics and economy (Kantorovich 1942; Guillaume 2012). Imagine a situation where planners design the structure of a city, this structure is simply given by two probability distributions: the distributions of residents a and services b. In this context, the objective function value of ROT enables us to measure how close residents and services are and evaluate the city s efficiency. Several city structures, such as Howard s garden city (Howard 1965), assume that residents and services are equally located along cyclic symmetry to improve quality of life. In such structures, a, b and C, where Cij is given by the Euclidean distance between each resident ai and service bj, satisfy Assumption 1; thus, C-ROT will appear. Example 3 (Graph with Cyclic Symmetry). Graphs are commonly used to model real-world data. For example, chemical molecules and crystal structures can be modeled using graphs (Bonchev 1991; Xie and Grossman 2018), and their graph structures often exhibit cyclic symmetry (Jaff e and Orchin 2002; Ladd 2014). To compare two graphs, computing their distance has been well-studied and OT-based approaches have been proposed (Nikolentzos, Meladianos, and Vazirgiannis 2017; Petric Maretic et al. 2019; Togninalli et al. 2019). We here consider ROT for computing a distance between two graphs, G1 and G2, whose structures exhibit cyclic symmetry. Following (Togninalli et al. 2019), we define ai = bj = 1

d, where d is the number of nodes, to ensure

the same amount of outgoing/incoming flow from/to a node and Cij as the Hamming distance between the Weisfeiler Lehman hash features of each node pair in G1 and G2, namely Cij = hamming(f1,i, f2,j) where fk,l is the l-th node hash feature in Gk. In this situation, a, b, and C satisfy Assumption 1, and thus C-ROT for two graphs will appear.

Fast Algorithms for C-ROT In this section, we propose fast algorithms for C-ROT.

Block-Cyclic Structure of Optimal Solution We first show the following lemma. Lemma 1. Under Assumption 1, there exists an optimal solution to (1) that has the following structure:

Tn 1 T0 ... ... ... ... ... T1 T1 Tn 1 T0

where T0, . . . , Tn 1 Rm m 0 .

Proof. Let T be an optimal solution to (1). We define T as follows:

k=0 Pk T Pk ,

Pk := 0 n 1 1 In 1 0n 1

is the block-circulant permutation matrix. First, we will show that T is a feasible solution to (1). T satisfies the constraints of row summation because

k=0 Pk T Pk 1d = 1

k=0 Pk T 1d

k=0 Pka = 1

Similarly, we can show that T satisfies the constraints of column summation (i.e., (T ) 1d = b). Thus, T is a feasible solution to (1). Next, we will show that T is an optimal solution to (1). For this purpose, we check the objective function value of (1) when T = T . Let f(T) be the objective function of (1), we get

k=0 Pk T Pk !

k=0 f Pk T Pk

k=0 f(T ) = f(T ).

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

Note that, we use the convexity of f and Jensen s inequality in the inequality relationship of the above equation. Thus, T is an optimal solution to (1). Finally, for l = 0, . . . , n 1, we get

Pl T (Pl) = 1

k=0 Pk+l T (Pk+l)

k=0 Pk T (Pk) = T .

Therefore, T has a block-circulant structure of (7).

From Assumption 1 and Lemma 1, C and T have the same block-circulant structure. Plugging (5) and (7) into CROT (1) yields the following optimization problem:

min T0,...,Tn 1 Rm m

k=0 Ck, Tk +

i,j=0 ϕ(Tijk)

k=0 Tk1m = α,

k=0 T k 1m = β,

where Tijk is the (i, j)-th entry of Tk. Note that the optimal objective function value of (8) is exactly 1

n of that of (1).

Fast Algorithm for C-LOT We here propose a fast algorithm for cyclic LOT (C-LOT), which is the special case of C-ROT (1) where ϕ is given by (2). From (8), C-LOT (1) can be rewritten as

min T0,...,Tn 1 Rm m 0

k=0 Tk1m = α,

k=0 T k 1m = β.

By introducing auxiliary variables S := Pn 1 k=0 Tk and rewriting (9) for S, we can show the following theorem. Theorem 1. We consider a small LOT

min S Rm m 0 G, S s.t. S1m = α, S 1m = β, (10)

where Gij := min 0 k n 1 Cijk. (11)

Let S be an optimal solution to (10). Then, (T k)k=0,...,n 1 defined by

S ij if k = min argmin 0 k n 1 Cijk

0 otherwise (12)

is an optimal solution to (9). Also, the optimal objective function value of (9) is the same as that of (10). Note that argmin0 k n 1Cijk in (12) will return a set of indices if the same minimum value exists in several indices, and we can choose any one but the smallest index by min.

Algorithm 1: Fast Algorithm for C-LOT

Require: a, b d and C Rd d 0 under Assumption 1. 1: Compute G whose entry is given by (11). 2: Compute the optimal solution S to (10). 3: for i, j, k do 4: Compute Tijk by the relationship (12) 5: end for 6: Compute T by Lemma 1 with (Tijk) 7: return T

Proof. We fix S := Pn 1 k=0 Tk in (9). The matrix S satisfies S1m = α, S 1m = β and we can rewrite (9) as

min S Rm m 0 ,

S1m=α, S 1m=β

min T0,...,Tn 1 Rm m 0 , Pn 1 k=0 Tk=S

The inner problem is equivalent to individual problems for each (i, j)-th entry of T0, . . . , Tn 1, that is we can consider

min Tij0,...,Tij(n 1) R 0

k=0 Cijk Tijk s.t.

k=0 Tijk = Sij,

for each (i, j) independently. This problem can be solved analytically; the optimal solution is given by (12), and the optimal objective function value is G, S . Next, we solve the outer optimization problem for S. Because S Rm m 0 , S1m = α, S 1m = β and the objective function is G, S , this optimization problem is identical with (10).

Theorem 1 indicates that C-LOT (1) can be reduced to the small LOT (10), which has significantly fewer m2 variables than d2 = m2n2 variables of the original C-LOT (1). Therefore, we will obtain the optimal solution to C-LOT (1) by solving the small LOT (10) instead. The proposed algorithm is summarized in Algorithm 1. Also, Figure 2 shows the overview of this algorithm. We evaluate the time complexity of Algorithm 1. The time complexity depends on the algorithm to solve the small LOT (10). We here use the network simplex algorithm, the most popular algorithm to solve LOT, to evaluate the time complexity. Tarjan (1997) showed that the time complexity of the network simplex algorithm to solve LOT (1) with the regularizer (2) is O(d3 log d log(d C )), where C := maxi,j|Cij|. This enables the time complexity of line 2 in Algorithm 1 to be bounded by O(m3 log m log(m C )). Because line 1 and lines 3 7 can be conducted in O(d2) time for inputting C and outputting T respectively, the total time complexity of Algorithm 1 is O(m3 log m log(m C ) + d2). This is significantly improved from O(d3 log d log(d C )) when solving C-LOT (1) directly. Note that we can consider that inputting only C0, . . . , Cn 1 (in line 1) and outputting only T0, . . . , Tn 1 (in lines 3 7) are sufficient because T consists of T0, . . . , Tn 1. In this case, those lines can be conducted in O(m2n) time, and thus the total time complexity of Algorithm 1 will be O(m3 log m log(m C ) + m2n).

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

Fast Algorithm for C-SROT

We propose a fast algorithm for cyclic SROT (C-SROT) which is the special case of C-ROT (1) where ϕ is a strongly convex regularizer. Note that because ϕ defined by (2) is not strongly convex, we cannot apply this algorithm to C-LOT. The following theorem follows from Fenchel s duality theorem and optimality conditions in convex analysis (see, e.g., (Rockafellar 1970, Section 31)). Theorem 2. The Fenchel dual of the problem (8) is

max w,z Rm w, α + z, β

i,j=0 ϕ (wi + zj Cijk), (13)

where ϕ : R R {+ } is the Fenchel conjugate of ϕ defined by ϕ (y) := sup{yx ϕ(x) | x R}. Also, the optimal solutions to the problem (8), T k, and to the problem (13), w and z , have the following relationship:

T ijk = (ϕ ) (w i + z j Cijk). (14)

Proof. We rewrite (8) with Lagrange multipliers w and z for the two equality constraints as follows:

min T0,...,Tn 1 Rm m max w,z Rm L(T0, . . . , Tn 1, w, z) (15)

L(T0, . . . , Tn 1, w, z)

k=0 Ck, Tk +

i,j=0 ϕ(Tijk)

= w, α + z, β

Ck w1 1z , Tk +

i,j=0 ϕ(Tijk).

Note that Problem (8) is convex and the constraints are linear and that Slater s constraint qualification holds. Hence, the strong duality holds (see, e.g., (Boyd and Vandenberghe 2004, Section 5.2.3)), and we can swap the minand maxoperations in (15):

max w,z Rm min T0,...,Tn 1 Rm m L(T0, . . . , Tn 1, w, z)

= max w,z Rm w, α + z, β

i,j=0 ϕ (wi + zj Cijk).

Thus, the Fenchel dual of the problem (8) is given by (13), and one of the optimality conditions is

T ijk = (ϕ ) (wi + zj Cijk).

Note that ϕ is a smooth and differentiable convex function because ϕ is strongly convex. Theorem 2 indicates that we will obtain the optimal solution to C-SROT (1) by solving the dual problem (13) instead because we can reconstruct it by the relationship (14) and Lemma 1. We here propose to apply the alternating minimization algorithm (Beck 2017, Chapter 14) to (13); we iteratively optimize the objective function of (13) with respect to w while fixing z, and vice versa. When we fix z, the partial derivative of the objective function with respect to wi is

j=0 (ϕ ) (wi + zj Cijk), (16)

and wi is optimal if (16) equals to 0. Because (16) monotonically decreases with respect to wi, we can find such wi easily by, e.g., the well-known Newton s method. This also applies to the optimization with respect to z while fixing w. The alternating minimization algorithm for a smooth convex function is guaranteed to attain fast convergence (see (Beck and Tetruashvili 2013) for more details). The distinguishing feature of this algorithm is treating a few dual variables. If the alternating minimization algorithm is used for the dual problem of (1) without considering cyclic symmetry, the number of dual variables is 2d = 2mn. In contrast, our algorithm treats only 2m dual variables, which is significantly reduced to 1

n. Therefore, the computational time required for one iteration in the alternating minimization will be considerably reduced.

Fast Algorithm for C-EROT We here propose a fast algorithm for cyclic EROT (CEROT), which is the crucial special case of C-ROT (1) where ϕ is given by (3). Because (3) is strongly convex, we can apply the cyclic-aware alternating minimization algorithm introduced in the previous subsection to C-EROT. Because ϕ (y) = λ exp( y

λ), (16) can be written as

j=0 Kij exp zj

k=0 exp Cijk

From (17), we can get optimal wi in closed form:

j=0 Kij exp zj

We can rewrite (19) and describe the optimal qj as follows:

pi = αi Pm 1 j=0 Kijqj , qj = βj Pm 1 i=0 Kijpi ,

where pi := exp wi

λ , qj := exp zj

λ . This algorithm resembles the Sinkhorn algorithm (Cuturi 2013); we call it

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

Algorithm 2: Cyclic Sinkhorn Algorithm for C-EROT

Require: a, b d, C Rd d 0 under Assumption 1 and λ > 0. 1: Compute K whose entry is given by (18). 2: Initialize q 1m. 3: repeat 4: p α (Kq) denotes elementwise division 5: q β (K p) 6: until convergence 7: for i, j, k do

8: Tijk piqj exp Cijk

9: end for 10: Compute T by Lemma 1 with (Tijk) 11: return T

cyclic Sinkhorn algorithm. Note that the optimal solution T to C-EROT (1) can be easily reconstructed from the optimal w and z by (14) and Lemma 1. The proposed algorithm is summarized in Algorithm 2. We evaluate the time complexity of Algorithm 2. The time complexity depends on the matrix-vector product iterations in lines 4 and 5 in Algorithm 2 to solve the Fenchel dual problem (13). In the original Sinkhorn algorithm, the time complexity of each iteration is O(d2) = O(m2n2) (Cuturi 2013). In contrast, in our cyclic Sinkhorn algorithm, the time complexity of each iteration is O(m2); thus, our algorithm solves C-EROT significantly faster than the original Sinkhorn algorithm.

Two-Stage Algorithm for C-EROT with Approximate Cyclic Symmetry There are many real-world cases in which input data show only approximate cyclic symmetry. In Example 1, C satisfies Assumption 1 strictly when using the pixel-wise Euclidean distance, but input distributions a, b (namely, images) often satisfy Assumption 1 only approximately due to slight noise and displacement. Thus, the above-proposed algorithms cannot be applied to such cases because they assume to satisfy Assumption 1 strictly. To overcome this issue, we here propose a fast two-stage Sinkhorn algorithm for C-EROT with approximate cyclic symmetry. Because EROT is commonly used thanks to its differentiability and computational efficiency (Peyr e and Cuturi 2019; Guo, Ho, and Jordan 2020), we focused on C-EROT here. The two-stage Sinkhorn algorithm first runs the cyclic Sinkhorn algorithm (Algorithm 2) to quickly obtain a strict symmetric solution. It then runs the original Sinkhorn algorithm (Cuturi 2013) to modify the solution. Therefore, this algorithm obtains the optimal solution to C-EROT with approximate cyclic symmetry faster by utilizing cyclic symmetry at the first stage. The proposed algorithm is described in Algorithm 3. If satisfying Assumption 1 strictly, the time complexity of this algorithm is the same as that of the cyclic Sinkhorn algorithm. If not, it will be complex due to mixing the two Sinkhorn algorithms at Stages 1 and 2. This analysis is for future research, but we experimentally confirmed that this

Algorithm 3: Two-Stage Sinkhorn Algorithm for C-EROT with Approximate Cyclic Symmetry

Require: a, b d, C Rd d 0 and λ > 0. // Stage1: Cyclic Sinkhorn algorithm 1: for i = 0, . . . , m 1 do 2: αi = 1

n Pn 1 k=0 ai+mk the average of n-divided a

n Pn 1 k=0 bi+mk the average of n-divided b 4: end for 5: Compute K whose entry is given by (18). 6: Initialize bq 1m. 7: repeat 8: bp α (Kbq) denotes elementwise division 9: bq β (K bp) 10: until convergence // Stage2: Sinkhorn algorithm (Cuturi 2013) 11: Initialize p, q as the n concatenated bp, bq, respectively.

12: Compute Kij = exp Cij

13: repeat 14: p a (Kq) 15: q b (K p) 16: until convergence 17: return T diag(p)Kdiag(q)

algorithm shows fast computation when input data have approximate cyclic symmetry in the next section.

Experiments

To validate the effectiveness of our algorithms, we conducted experiments on synthetic/real-world data that satisfy Assumption 1 strictly/approximately. In all experiments, we evaluated whether our algorithms, which solve small optimization problems instead of the original OT, show the same results as the original OT but with faster computation; for details, we checked the average and standard deviation of the objective function values, marginal constraint errors defined by ||T 1d b||2, and the computation time when using different algorithms. The computation time was recorded between inputting the data and outputting the optimal solution. These experiments were performed on a Windows laptop with Intel Core i7-10750H CPU, 32 GB memory. All the codes were implemented in Python.

Synthetic Data with Strict Cyclic Symmetry

We created 20 synthetic random data for each of the two dimensions, d {5000, 10000}, that satisfy Assumption 1 strictly in n = 50; for details, we created a and b by concatenating n copies of m-dimensional uniform distribution with the half-open interval [0.0, 1.0) like (4) and normalized a and b so that the sum of each is 1. For C, we first sampled C0, . . . , Cn 1 by m m-dimensional Gaussian distribution with the mean 3.0 and the standard deviation 5.0. We then add the absolute minimum value, namely |mink=0,...,n 1 (mini,j=0,...,m 1 Cijk)|, to the all entries to ensure their non-negativity. After that, we created C by concatenating n copies of C0, . . . , Cn 1 like (5). For valida-

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

Algorithm n d = 5000 d = 10000

Obj. value Marginal error Time (sec.) Obj. value Marginal error Time (sec.)

Network Simplex 6.034 0.824 0.000 0.000 6.523 1.013 6.526 0.917 0.000 0.000 33.660 3.238

Cyclic Network Simplex

2 6.034 0.824 0.000 0.000 1.477 0.235 6.526 0.917 0.000 0.000 7.084 0.728 5 6.034 0.824 0.000 0.000 0.300 0.030 6.526 0.917 0.000 0.000 1.391 0.155 10 6.034 0.824 0.000 0.000 0.136 0.026 6.526 0.917 0.000 0.000 0.618 0.073 25 6.034 0.824 0.000 0.000 0.080 0.019 6.526 0.917 0.000 0.000 0.381 0.044 50 6.034 0.824 0.000 0.000 0.056 0.015 6.526 0.917 0.000 0.000 0.329 0.034

Sinkhorn 6.233 0.821 0.000 0.000 3.271 1.445 6.745 0.916 0.000 0.000 14.589 4.213

Cyclic Sinkhorn

2 6.233 0.821 0.000 0.000 0.918 0.463 6.745 0.916 0.000 0.000 3.973 0.922 5 6.233 0.821 0.000 0.000 0.207 0.170 6.745 0.916 0.000 0.000 1.262 0.324 10 6.233 0.821 0.000 0.000 0.116 0.036 6.745 0.916 0.000 0.000 0.636 0.259 25 6.233 0.821 0.000 0.000 0.093 0.036 6.745 0.916 0.000 0.000 0.381 0.127 50 6.233 0.821 0.000 0.000 0.067 0.034 6.745 0.916 0.000 0.000 0.320 0.053

Table 1: Experimental results in the synthetic data. Obj. value indicates objective function value.

Figure 3: Real-world image and graph samples with cyclic symmetry (n = 2) that are used in the experiments of real-world cases. The above-shown chemical graph images are obtained by Pub Chem (Kim et al. 2022).

tion, we compared the network simplex algorithm (Ahuja, Magnanti, and Orlin 1993), Algorithm 1 using the network simplex algorithm in line 2 (we call it cyclic network simplex algorithm), the Sinkhorn algorithm (Cuturi 2013), and the cyclic Sinkhorn algorithm (Algorithm 2). We set λ = 0.5 for the regularizer (3). Because these synthetic data also satisfy Assumption 1 for all n that are divisors of 50, namely n {2, 5, 10, 25, 50}, we conducted experiments for each n; larger n leads to smaller problems that output the same result. The network simplex algorithm was implemented using LEMON (Dezs o, J uttner, and Kov acs 2011). Table 1 lists the results. The network simplex algorithm and cyclic one had the same optimal objective function values, but the latter showed faster computation times as n becomes larger. This was also the case when using the Sinkhorn algorithm and the cyclic one. Moreover, we experimentally confirmed that the marginal errors of the cyclic algorithms are 0. These results support our theoretically exact reformulations in Theorems 1 and 2 and the effectiveness of our proposed algorithms; higher cyclic symmetry (i.e., larger n) results in faster computation time.

Real-World Image Data with Approximate Cyclic Symmetry We tested our algorithms on the real-world case of mirror symmetry (n = 2) in Example 1 with the NYU Symmetry Database (Cicconet et al. 2017). In this database, we selected 20 images with mirror symmetry along the vertical axis (see Figure 3). These images were converted to gray-scale, resized to be 64 64 or 96 96 pixels, and normalized so that the sum of the intensity is 1. We then

obtained a, b by (6) and C by the pixel-wise Euclidean distance. Because EROT is commonly used in real applications (Peyr e and Cuturi 2019), we focused on C-EROT here and compared the Sinkhorn algorithm (Cuturi 2013), the cyclic one (Algorithm 2), and the two-stage one (Algorithm 3) over 190(= 20C2) image pairs, for validation. Note that, in the two-stage Sinkhorn algorithm, we stopped Stage 1 before the end of convergence to prevent the solution far from the optimal one for real-world images; we first run the cyclic Sinkhorn algorithm until the marginal error || (diag(bp)Kdiag(bq)) 1m β||2 is below 1.0 10 3 and then run the Sinkhorn algorithm until the difference between its objective function value and the value obtained by directly solving C-EROT with the Sinkhorn algorithm is below 1.0 10 4. We set λ = 0.5 for the regularizer (3).

Table 2 lists the results. The cyclic Sinkhorn algorithm showed the fastest computation time. However, because this algorithm assumes to satisfy Assumption 1 strictly, its objective function value differed from that of the original Sinkhorn algorithm, and marginal error occurred. In contrast, the two-stage Sinkhorn algorithm showed the same objective function value as that of the original one and no marginal error but with faster computation time than using the original one. These results indicate that the cyclic Sinkhorn algorithm can be a good choice for real-world data because of its fastest computation time if users tolerate the objective function value difference and the marginal error. If not, the two-stage Sinkhorn algorithm is promising for realworld data, which solves C-EROT with approximate cyclic symmetry faster than the original Sinkhorn algorithm.

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

Algorithm (h, w) = (64, 64), d = 4096 (h, w) = (96, 96), d = 9216

Obj. value Marginal error Time (sec.) Obj. value Marginal error Time (sec.)

Sinkhorn 4.320 2.056 0.000 0.000 16.610 6.502 6.296 3.100 0.000 0.000 117.152 53.442

Cyclic Sinkhorn 4.289 2.048 0.001 0.001 3.837 1.286 6.250 3.089 0.001 0.001 26.087 11.985

Two-Stage Sinkhorn 4.320 2.056 0.000 0.000 13.877 6.244 6.296 3.100 0.000 0.000 91.790 43.000

Table 2: Experimental results in the real-world image data.

Algorithm C20H42 dataset, d = 62 C40H82 dataset, d = 122

Obj. value Marginal error Time (msec.) Obj. value Marginal error Time (msec.)

Sinkhorn 0.598 0.186 0.000 0.000 1.534 0.840 0.640 0.220 0.000 0.000 4.204 4.331

Cyclic Sinkhorn 0.598 0.186 0.000 0.000 1.338 0.619 0.640 0.220 0.000 0.000 1.712 1.297

Table 3: Experimental results in the real-world C20H42 and C40H82 alkanes graph data.

Real-World Graph Data with Strict Cyclic Symmetry We also tested our algorithms on another real-world case of 2-order cyclic symmetry in Example 3. No well-maintained graph dataset with symmetry exists, so we made two new chemical graph datasets about C20H42 and C40H82 alkanes, which consist of 15 and 10 chemical graphs of different structures with 2-order cyclic symmetry, respectively (see Figure 3). In this test, we set d as the number of atoms and n = 2. Following (Togninalli et al. 2019), we defined a, b and C as those in Example 3. Because these graph data are real-world but with strict 2-order cyclic symmetry, we focused on C-EROT and compared only the Sinkhorn algorithm and the cyclic one (Algorithm 2), excluding the twostage one (Algorithm 3), for validation. This validation was performed over 105(= 15C2) graph pairs in C20H42 dataset and 45(= 10C2) graph pairs in C40H82 dataset, respectively. We set λ = 0.1 for the regularizer (3). Table 3 lists the graph comparison results. Our cyclic Sinkhorn algorithm had the same objective function values and marginal errors as the original one but showed faster computation times. These results support the effectiveness of our algorithm on real-world graph data.

Discussions and Limitations Through this paper, we confirmed that our algorithms can solve C-ROT faster. For further progress, we discuss the following future issues. (I) In Assumption 1, we assume knowing the cyclic order n in advance. Because cyclic symmetry arises naturally from the physical structure of input data, this assumption is reasonable in some real-world cases. However, we must improve our algorithms for unknown-order cyclic symmetry. (II) It is unknown whether our algorithms can be generalized for other symmetries, e.g., dihedral symmetry (Gatermann and Parrilo 2004). Further development of our algorithms for general symmetries remains as future work. (III) The main contribution of this paper is showing the utilization of cyclic symmetry in OT with theoretical proofs, but we must test our algorithms in various real-

world data for further development. (IV) Our study is orthogonal to the existing OT methods; our algorithms can be combined with many existing OT methods, including the state-of-the-art methods like the primal-dual descent methods (Dvurechensky, Gasnikov, and Kroshnin 2018; Guo, Ho, and Jordan 2020). As the first step, this paper combined the most widely adopted OT methods (the network simplex and Sinkhorn methods) with our algorithms to evaluate the computational improvements but a detailed analysis of the combination of our algorithms with the state-of-the-art methods remains as future work. (V) Approaches that combine special structures, such as low rankness or separability, with symmetry have excellent potential. However, it is not so trivial to combine them well because our symmetry approach may break such special structures. We must explore this potential in the future.

Conclusion We proposed novel fast algorithms for OT with cyclic symmetry. We showed that such OT can be reduced to a smaller optimization problem that has significantly fewer variables as higher cyclic symmetry exists in the input data. Our algorithms solve the small problem instead of the original OT and achieve fast computation. Through experiments, we confirmed the effectiveness of our algorithms in synthetic/real-world data with strict/approximate cyclic symmetry. This paper cultivates a new research direction, OT with symmetry, and paves the way for future research.

References Ahuja, R. K.; Magnanti, T. L.; and Orlin, J. B. 1993. Network Flows: Theory, Algorithms, and Applications. Prentice-Hall, Inc. Alaya, M. Z.; Berar, M.; Gasso, G.; and Rakotomamonjy, A. 2019. Screening Sinkhorn Algorithm for Regularized Optimal Transport. In Advances in Neural Information Processing Systems. Altschuler, J.; Bach, F.; Rudi, A.; and Niles-Weed, J. 2019. Massively scalable Sinkhorn distances via the Nystr om

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

method. In Advances in Neural Information Processing Systems. Altschuler, J.; Niles-Weed, J.; and Rigollet, P. 2017. Nearlinear time approximation algorithms for optimal transport via Sinkhorn iteration. In Advances in Neural Information Processing Systems. Beck, A. 2017. First-Order Methods in Optimization. Society for Industrial and Applied Mathematics. Beck, A.; and Tetruashvili, L. 2013. On the convergence of block coordinate descent type methods. SIAM journal on Optimization, 23(4): 2037 2060. Blondel, M.; Seguy, V.; and Rolet, A. 2018. Smooth and Sparse Optimal Transport. In International Conference on Artificial Intelligence and Statistics. Bonchev, D. 1991. Chemical graph theory: introduction and fundamentals, volume 1. CRC Press. Bonneel, N.; Peyr e, G.; and Cuturi, M. 2016. Wasserstein Barycentric Coordinates: Histogram Regression Using Optimal Transport. ACM Transactions on Graphics, 35(4). Boyd, S.; and Vandenberghe, L. 2004. Convex Optimization. Cambridge University Press. Cicconet, M.; Birodkar, V.; Lund, M.; Werman, M.; and Geiger, D. 2017. A convolutional approach to reflection symmetry. Pattern Recognition Letters, 95: 44 50. Courty, N.; Flamary, R.; Tuia, D.; and Rakotomamonjy, A. 2017. Optimal Transport for Domain Adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(9): 1853 1865. Cuturi, M. 2013. Sinkhorn Distances: Lightspeed Computation of Optimal Transport. In Advances in Neural Information Processing Systems. Dezs o, B.; J uttner, A.; and Kov acs, P. 2011. LEMON an Open Source C++ Graph Template Library. Electronic Notes in Theoretical Computer Science, 264(5): 23 45. Second Workshop on Generative Technologies. Dieleman, S.; Willett, K. W.; and Dambre, J. 2015. Rotationinvariant convolutional neural networks for galaxy morphology prediction. Monthly Notices of the Royal Astronomical Society, 450(2): 1441 1459. Dvurechensky, P.; Gasnikov, A.; and Kroshnin, A. 2018. Computational Optimal Transport: Complexity by Accelerated Gradient Descent Is Better Than by Sinkhorn s Algorithm. In International Conference on Machine Learning. Gatermann, K.; and Parrilo, P. A. 2004. Symmetry groups, semidefinite programs, and sums of squares. Journal of Pure and Applied Algebra, 192(1-3): 95 128. Getreuer, P. 2013. A Survey of Gaussian Convolution Algorithms. Image Processing On Line, 3: 286 310. Guillaume, C. 2012. Optimal transportation and economic applications. Lecture Notes. Guo, W.; Ho, N.; and Jordan, M. 2020. Fast Algorithms for Computational Optimal Transport and Wasserstein Barycenter. In International Conference on Artificial Intelligence and Statistics.

Howard, E. 1965. Garden Cities of To-Morrow, volume 23. Mit Press. Jaff e, H. H.; and Orchin, M. 2002. Symmetry in chemistry. Courier Corporation. Kantorovich, L. 1942. On the transfer of masses (in Russian). In Doklady Akademii Nauk, volume 37, 227. Kim, S.; Chen, J.; Cheng, T.; Gindulyte, A.; He, J.; He, S.; Li, Q.; Shoemaker, B. A.; Thiessen, P. A.; Yu, B.; Zaslavsky, L.; Zhang, J.; and Bolton, E. E. 2022. Pub Chem 2023 update. Nucleic Acids Research, 51(D1): D1373 D1380. Kusner, M.; Sun, Y.; Kolkin, N.; and Weinberger, K. 2015. From Word Embeddings To Document Distances. In International Conference on Machine Learning. Ladd, M. F. C. 2014. Symmetry of crystals and molecules. Oxford University Press, USA. Lin, T.; Ho, N.; and Jordan, M. 2019a. On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms. In International Conference on Machine Learning. Lin, T.; Ho, N.; and Jordan, M. I. 2019b. On the Acceleration of the Sinkhorn and Greenkhorn Algorithms for Optimal Transport. In Ar Xiv Preprint: 1906.01437v1. Liu, Y.; Zhu, L.; Yamada, M.; and Yang, Y. 2020. Semantic Correspondence as an Optimal Transport Problem. In Computer Vision and Pattern Recognition. Monge, G. 1781. M emoire sur la th eorie des d eblais et des remblais. De l Imprimerie Royale. Nikolentzos, G.; Meladianos, P.; and Vazirgiannis, M. 2017. Matching Node Embeddings for Graph Similarity. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. Pang, S.; Du, A.; Orgun, M. A.; Wang, Y.; Sheng, Q. Z.; Wang, S.; Huang, X.; and Yu, Z. 2022. Beyond CNNs: Exploiting Further Inherent Symmetries in Medical Image Segmentation. IEEE Transactions on Cybernetics, 1 12. Petric Maretic, H.; El Gheche, M.; Chierchia, G.; and Frossard, P. 2019. GOT: An Optimal Transport framework for Graph comparison. In Advances in Neural Information Processing Systems. Peyr e, G.; and Cuturi, M. 2019. Computational Optimal Transport: With Applications to Data Science. Now Publishers. Pham, K.; Le, K.; Ho, N.; Pham, T.; and Bui, H. 2020. On Unbalanced Optimal Transport: An Analysis of Sinkhorn Algorithm. In International Conference on Machine Learning. Rockafellar, R. T. 1970. Convex Analysis, volume 18 of Princeton Mathematical Series. New Jersey: Princeton University Press. Shamai, G.; Aflalo, Y.; Zibulevsky, M.; and Kimmel, R. 2015. Classical Scaling Revisited. In International Conference on Computer Vision. Sinkhorn, R. 1967. Diagonal equivalence to matrices with prescribed row and column sums. In The American Mathematical Monthly, 402 405.

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)

Solomon, J.; de Goes, F.; Peyr e, G.; Cuturi, M.; Butscher, A.; Nguyen, A.; Du, T.; and Guibas, L. 2015. Convolutional Wasserstein Distances: Efficient Optimal Transportation on Geometric Domains. ACM Transactions on Graphics, 34(4). Tarjan, R. E. 1997. Dynamic trees as search trees via euler tours, applied to the network simplex algorithm. Mathematical Programming, 78(2): 169 177. Tenetov, E.; Wolansky, G.; and Kimmel, R. 2018. Fast Entropic Regularized Optimal Transport Using Semidiscrete Cost Approximation. SIAM Journal on Scientific Computing, 40(5): A3400 A3422. Togninalli, M.; Ghisu, E.; Llinares-L opez, F.; Rieck, B.; and Borgwardt, K. 2019. Wasserstein Weisfeiler Lehman Graph Kernels. In Neur IPS, 6436 6446. Wu, S.; Rupprecht, C.; and Vedaldi, A. 2023. Unsupervised Learning of Probably Symmetric Deformable 3D Objects From Images in the Wild (Invited Paper). IEEE Trans. Pattern Anal. Mach. Intell., 45(4): 5268 5281. Xie, T.; and Grossman, J. C. 2018. Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. Phys. Rev. Lett., 120: 145301. Zhao, W.; Chellappa, R.; Phillips, P. J.; and Rosenfeld, A. 2003. Face Recognition: A Literature Survey. ACM Comput. Surv., 35(4): 399 458.

The Thirty-Eighth AAAI Conference on Artiﬁcial Intelligence (AAAI-24)