# curvaturebased_clustering_on_graphs__5567a9f6.pdf

Journal of Machine Learning Research 26 (2025) 1-67 Submitted 5/24; Published 1/25

Curvature-based Clustering on Graphs

Yu Tian yu.tian.research@gmail.com Center for Systems Biology Dresden Dresden, 01307, Germany

Zachary Lubberts zlubberts@virginia.edu Department of Statistics University of Virginia Charlottesville, VA 22904, USA

Melanie Weber mweber@seas.harvard.edu Harvard University Cambridge, MA 02138, USA

Editor: Michael Mahoney

Unsupervised node clustering (or community detection) is a classical graph learning task. In this paper, we study algorithms that exploit the geometry of the graph to identify densely connected substructures, which form clusters or communities. Our method implements discrete Ricci curvatures and their associated geometric ﬂows, under which the edge weights of the graph evolve to reveal its community structure. We consider several discrete curvature notions and analyze the utility of the resulting algorithms. In contrast to prior literature, we study not only single-membership community detection, where each node belongs to exactly one community, but also mixed-membership community detection, where communities may overlap. For the latter, we argue that it is beneﬁcial to perform community detection on the line graph, i.e., the graph s dual. We provide both theoretical and empirical evidence for the utility of our curvature-based clustering algorithms. In addition, we give several results on the relationship between the curvature of a graph and that of its dual, which enable the eﬃcient implementation of our proposed mixed-membership community detection approach and which may be of independent interest for curvature-based network analysis. Keywords: Community Detection, Node Clustering, Discrete Curvature, Line Graph, Graph-based Learning

1. Introduction

Relational data, such as graphs or networks, is ubiquitous in machine learning and data science. Consequently, a large body of literature devoted to studying the structure of such data has been developed. Clustering on graphs, also known as community detection or (unsupervised) node clustering, is of central importance to the study of relational data. It seeks

. Equal contribution . A preliminary version of part of this work was presented at the Neur IPS Workshop on Symmetry and Geometry in Neural Representations 2022 (Tian et al., 2023).

c 2025 Yu Tian and Zachary Lubberts and Melanie Weber.

License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v26/24-0781.html.

Tian and Lubberts and Weber

(a) Single-Membership

(b) Mixed-Membership

Figure 1: Clustering on Graphs.

to identify densely interconnected substructures (clusters) in a given graph. Such structure is ubiquitous in relational data: We may think of friend circles in social networks, metabolic pathways in biochemical networks or article categories in Wikipedia. A standard mathematical model for analyzing community structure in graphs is the Stochastic Block Model (SBM), which has led to fundamental insights into the detectability of communities (Abbe, 2017). Classically, communities are identiﬁed by clustering the nodes of the graph. Popular methods include the Louvain algorithm (Blondel et al., 2008), the Girvan-Newman algorithm (Girvan and Newman, 2002) and Spectral clustering (Cheeger, 1969; Fiedler, 1973; Spielman and Teng, 1996). Recently, there has been growing interest in another class of algorithms, which applies a geometric lens to community detection. Graph curvatures provide a means to understanding the structure of networks through a characterization of their geometry. More generally, curvature is a classical tool in diﬀerential geometry, which is used to characterize the local and global properties of geodesic spaces. While originally deﬁned in continuous spaces, discrete notions of curvature have recently seen a surge of interest. Of particular interest are discretizations of Ricci curvature, which is a local notion of curvature that relates to the volume growth rate of the unit ball in the space of interest (geodesic dispersion). Curvature-based analysis of relational data (see, e.g., (Weber et al., 2017a,b)) has been applied in many domains, including to biological (Elumalai et al., 2022; Weber et al., 2017c; Tannenbaum et al., 2015), chemical (Leal et al., 2021; Saucan et al., 2018), social (Painter et al., 2019) and ﬁnancial networks (Sandhu et al., 2016). Community detection via graph curvatures is motivated by the observation that edges between communities (so called bridges, see Figure 1b(a)) have low Ricci curvature. By identifying and removing such bridges, one can learn a partition of the graph into its communities. This simple idea has given rise to a series of curvature-based approaches for community detection (Ni et al., 2019; Sia et al., 2019; Weber et al., 2018; Tian et al., 2023). However, the picture is far from complete. The proposed algorithms vary signiﬁcantly in their implementation, in particular in the choice of the curvature notion that they utilize. In this paper, we will elucidate commonalities and diﬀerences among variants of the two most common curvature notions in terms of their utility for community detection. To this end, we propose a unifying framework for curvature-based community detection algorithms and perform a systematic analysis of the advantages and disadvantages of each curvature notion. The insights gained from this analysis give rise to several improvements, which mitigate the identiﬁed shortcomings in accuracy and scalability.

Curvature-based Clustering on Graphs

The present paper also addresses a second important gap in the literature. Existing work on partition-based community detection, speciﬁcally on curvature-based approaches, focuses almost exclusively on detecting communities with unique membership, i.e., each node can belong to only one community (single-membership community detection). However, in many complex systems that generate relational data, nodes may belong to more than one community: A member of a social network might belong to a circle of high school friends and a circle of college friends, a protein may have diﬀerent functional roles in a biochemical network etc. This creates a need for algorithms that can cluster nodes with mixed membership (Airoldi et al., 2008; Yang and Leskovec, 2013; Zhang et al., 2020), so-called mixed-membership community detection. Observe that the mixed-membership structure precludes the existence of bridges between communities, because the communities overlap (see Figure 1b(b)). This renders community detection methods that rely on graph partitions, such as the curvature-based methods discussed above, inapplicable. In this paper, we propose a principled curvature-based approach for detecting mixed-membership community structure. We show that while partition-based approaches do not recover the underlying community structure when applied to the original graph, they perform well on its dual, the line graph. The line graph encodes the connectivity between the edges of the original graph (see Figure 2). This reformulation of the relational information in the graph allows for disentangling the overlapping community structure (see Figure 3). When applied to the line graph, our curvature-based community detection approach correctly identiﬁes edges in the line graph, which can be cut to partition the graph into its communities and assign node labels that reﬂect mixed-membership community structure. Utilizing fundamental relationships between the curvature of a graph and its dual, we demonstrate that curvature-based mixed-membership community detection results in scalable and accurate methods.

1.1 Overview and Summary of Contributions

Our main contributions are as follows:

1. We propose a unifying framework for curvature-based community detection (unsupervised node clustering) algorithms (Section 2). Our framework encompasses several previously studied algorithms for curvature-based single-membership community detection (sec. 2.3, 6.1.1 and 6.2.1). Utilizing this framework, we systematically investigate the strengths and weaknesses of diﬀerent discrete curvature notions (speciﬁcally, variants of Forman s and Ollivier s Ricci curvature, Sections 7 and 8).

2. We adapt our proposed framework to mixed-membership community detection (Section 6), providing the ﬁrst curvature-based algorithms for this setting.

3. We elucidate signiﬁcant scalability issues in community detection approaches that utilize Ollivier s notion of discrete curvature. To overcome these issues, we propose an eﬀective approximation, which may be of independent interest in the broader area of curvature-based network analysis (Sections 4.1.3 and 5.2.2).

4. We demonstrate the utility of our framework through theoretical analysis (Section 6.1.2) and through a series of systematic experiments with synthetic and real network data. We further benchmark curvature-based methods against classical community detection approaches (Section 7).

Tian and Lubberts and Weber

5. As a byproduct, we derive several fundamental relations between discrete curvatures in a graph and its dual. Those relations may be of independent interest (Section 5).

2. Clustering on Graphs

Community structure is a hallmark feature of networks, characterized by clusters of nodes that have more internal connections than connections to nodes in other clusters. In networks with mixed-membership structure, which quantiﬁes the idea of overlapping communities, nodes may belong to more than one cluster. In the following, let G = (V, E, W) denote a graph with vertex set V and edge set E. We may associate weights with both vertices and edges, deﬁned as functions ω(v) : V R+ and ω(e) : E R+. The distance measure on G is the standard path distance d, i.e.,

d G(u, v) := inf P

i=1 w({zi, zi+1}), (2.1)

where the inﬁmum is taken over all u v paths P = {zi}l i=1 V , with {zi, zi+1} E for each i, and z1 = u, zl = v. We further deﬁne the degree du of a node u as the weighted sum of its adjacent edges, i.e., du := X

u e w(e) . (2.2)

We remark that the terms network and graph are often used interchangeably. Following the convention in the graph learning literature, we use graph , if we refer to the mathematical object and network when referring to a data set or speciﬁc instance of a graph. We will use the terms node and vertex interchangeably.

2.1 Single-Membership Node Clustering

2.1.1 Single-Membership Community Structure

In classical community detection, each node can only belong to one community, which characterizes the community structure as being single-membership. Various algorithms exist by virtue of interdisciplinary expertise (Blondel et al., 2008; Girvan and Newman, 2002; von Luxburg, 2007), generally aiming to optimize a quality function with respect to diﬀerent partitions of the network. We denote a quality function as Q which increases as the resulting community structure becomes clearer from some perspective, such as the modularity (Girvan and Newman, 2002), and denote a partition of the vertices as {B1, . . . , Bk} that are mutually exclusive and exhaustive, i.e., Bh Bl = when h = l and k h=1Bh = V , with k being the number of communities. The classical community detection problem can then be formulated as the combinatorial optimization problem

max k max {B1,...,Bk} Q({B1, . . . , Bk}).

In this generality, the community detection problem is NP-hard, so heuristics-based algorithms are typically applied in order to provide a suﬃciently good solution. The Stochastic Block Model (short: SBM ) (Holland et al., 1983) is a random graph model, which emulates the community structure found in many real networks. It is a

Curvature-based Clustering on Graphs

popular model for studying complex networks and, in particular, clustering on graphs. In the SBM, vertices are partitioned into subgroups called blocks, and the distribution of the edges between vertices is dependent on the blocks to which the vertices belong. Formally:

Deﬁnition 1 (Stochastic Block Model) Let {B1, . . . , Bk} be a partition of a graph s vertices into k blocks, and let B = (Bhl) denote the characteristic matrix between the blocks, where Bhl shows the probability of an edge existing between a vertex in block Bh and one in Bl. Then, if we denote the random adjacency matrix of the network by A = (Aij) where Aij = 1 if there is an edge between vertices vi and vj,

P (Aij = 1) = Bσ(i)σ(j) ,

where σ : V {1, . . . , k} indicates the block membership. A planted SBM has the additional structure that Bhh = pin, h, and Bhl = pout, h = l, which we denote by SBM(pin, pout).

2.1.2 Partition-based clustering methods

Of particular importance for classical community detection are edges between clusters, often called bridges. Bridges connect two nodes in distinct communities (see Figure 1a). In graphs with single-membership community structure, i.e., where each node belongs to one community only, such edges always exist (otherwise each community is a connected component). This observation has motivated a range of partition-based community detection approaches (Blondel et al., 2008; Girvan and Newman, 2002; von Luxburg, 2007), which rely on identifying and cutting such bridges to partition the graph into communities. Among the most popular community detection methods is the Louvain algorithm (Blondel et al., 2008), which is a heuristic method proposed to maximize modularity hierarchically, starting from each node as a community and then successively merging with its neighboring communities whenever it improves modularity. Spectral clustering (von Luxburg, 2007), another widely used method, utilizes the spectrum of the graph Laplacian to ﬁnd a partition of the graph with a minimal number of edge cuts (mincut problem). This optimization problem too is NP-hard, but it can be solved eﬃciently after relaxing some constraints. We give a brief overview of other classical partition-based community detection methods in Section 3. In this paper, we discuss methods that utilize discrete curvature to identify bridges: Below, we introduce a blueprint for partition-based clustering via discrete Ricci curvature.

2.2 Mixed-Membership Node Clustering

2.2.1 Mixed-membership Community Structure

In graphs with mixed-membership communities, nodes may belong to more than one community. Such community structure can be formalized via an extension of the SBM: The Mixed-Membership Stochastic Block Model (short: MMB) (Airoldi et al., 2008). In both models, each element of the adjacency matrix A above the diagonal is an independent Bernoulli random variable whose expectation only depends on the block memberships of the corresponding nodes. In the MMB, it is possible that nodes belong to more than one block, with various aﬃliation strengths. Formally:

Tian and Lubberts and Weber

Deﬁnition 2 (Mixed-Membership Block Model) We assume that the expectation E[A] has the form E[A] = XBXT , where X [0, 1]n k is the community membership matrix with Xil indicating the aﬃliation of node i with community l and P l Xil = 1, and n denotes the size of the network. The matrix B [0, 1]k k encodes the block connection probabilities. If Xil = 1 for some l, we call vertex i a pure node; otherwise i is a mixed node.

When all nodes are pure, we recover the ordinary SBM model. In our experiments, we consider a planted version where Bhh = pin, h, and Bhl = pout, h = l, which we denote by MMB(pin, pout).

2.2.2 Line Graphs

Figure 2: Graphs G with their corresponding line graphs L(G). Note that vertices in G with degree at least 3 result in cliques in L(G), and that cycles are preserved.

We denote the dual of a graph as its line graph:

Deﬁnition 3 Given an unweighted, undirected graph G = (V, E), its line graph is deﬁned as L(G) = (E, E), where each edge {{u, v}, {r, s}} E appears when |{u, v} {r, s}| = 1. In other words, e, e are adjacent in L(G) when they are both incident on the same vertex.

The line graph can be seen as a re-parametrization of the input graph, which encodes higher-order information about its edges connectivity. Examples of graphs and their line graphs can be found in Figure 2. Importantly, cycles are preserved under this construction (e.g., Figure 2 (bottom)). Cliques and cycles in the line graph may also arise from nodes of degree three or higher (see, e.g., Figure 2 (top)). Notice that the line graph is typically larger in size than the original graph. In a graph with n vertices and average degree nαn, we expect to have n2αn edges. In the line graph, this means n2αn vertices and n3α2 n edges, though this number increases with greater variation of the vertex degrees in the original graph. For the applications to community detection that we will discuss below, this necessitates a careful consideration of the scalability of diﬀerent community detection approaches. The classical line graph construction gives a graph without edge weights, since edge weights in the original graph correspond to node weights in the line graph. We will discuss below (Section 5.2.2) how weights can be imposed on line graph edges that encode meaningful structural information in the original graph.

2.2.3 Partition-based Methods on the Line Graph.

We have seen above that edges between clusters, often called bridges, are of particular importance for single-membership community detection. However, overlaps between communities preclude the existence of these bridges (see Figure 1b), which limits the applicability of partition-based approaches. In this paper we argue that the line graph provides a natural input for partition-based mixed-membership community detection. While nodes may not have a unique label in this model, the adjacent edges may still be internal, connecting

Curvature-based Clustering on Graphs

two nodes that are in the same community (at least partially). In this case, each edge is associated with a single community (see Figure 3 (left)). Consequently, by representing the relationships among edges in a line graph, we disentangle the overlapping communities. Each edge in the line graph appears at a vertex in the original graph. When the vertex has mixed membership, a bridge between the communities arises in the line graph (Figure 3 (right)).

2.3 Curvature-based Clustering Algorithm

Algorithm 1 Curvature-based Clustering via Ricci Flow

1: Input: Graph G = (V, E, W0), hyperparameters ν (step size, ﬂow), ϵ (tolerance), ϵd (drop threshold), T (number of iterations)

2: for t = 1, . . . , T do

3: for {u, v} E do

4: Compute curvature κ({u, v}).

5: Evolve weight under Ricci ﬂow: wt({u, v}) (1 ν κ({u, v}))d G(u, v).

7: Renormalize edge weights W t: wt({u, v}) |E|d G(u,v) P {u ,v } E d G(u ,v ) for all {u, v} E.

9: Construct cut-oﬀpoints {x0, x1, . . . , xnf }.

10: Initialize l R|V | (list of node labels), Q 1, Qbest := ϵ.

11: for i = 0, . . . nf do

12: Construct Gi = (V, Ei, W T i ) with Ei = {{u, v} : w T ({u, v}) > xi} and W T i = W T Ei.

13: Determine connected components of Gi. Assign node labels by components.

14: Compute the modularity Qi of the resulting community assignment.

15: if Qi > Qbest, Qi Qi 1

Qi > ϵd then

16: Store label assignments in l. Qbest = Qi.

18: end for

19: If Qbest > ϵ, return l.

Figure 3: Community structure in a graph G and its line graph L(G) with node labels (top) and edge labels (bottom).

In this section, we will describe the blueprint of our curvature-based clustering algorithm, which we will then specialize to the singleand mixedmembership problem described above. In the community detection literature, approaches based on discrete Ricci curvature and an associated Ricci ﬂow have been studied for unique-membership communities (Ni et al., 2019; Sia et al., 2019; Weber et al., 2018). We discuss the details of those approaches in Sections 6.1 and 6.2. To the best of our knowledge, extensions of curvature-based methods to mixed-membership community detection have not been studied before.

Tian and Lubberts and Weber

Like previous approaches that utilize Ricci curvature (e.g., Ni et al. (2019)), we build on a notion of discrete Ricci ﬂow ﬁrst proposed by Ollivier (2009):

d dtd G(u, v)(t) = κ({u, v})(t) d G(u, v)(t) ({u, v} E) , (2.3)

where d G(u, v) denotes the shortest path distance between adjacent nodes u, v E and κ({u, v}) the Ricci curvature along that edge. In this work, κ({u, v}) may denote either Forman s or Ollivier s Ricci curvature. We will describe the two resulting methods in detail below. Curvature-based clustering implements discrete Ricci ﬂow via a combinatorial evolution equation, which evolves edge weights according to the local geometry of the graph. Consider a family of weighted graphs Gt = {V, E, W t}, which is constructed from an input graph G by evolving its edge weights as

wt({u, v}) (1 κ({u, v}))d G(u, v) ({u, v} E) , (2.4)

where the curvature κ({u, v}) and shortest-path distance d G(u, v) is computed on the graph Gt 1. Equation (2.4) can be viewed as a discrete analog of Ricci ﬂow on Riemannian manifolds; it was ﬁrst introduced by Ollivier (2010) with κ denoting ORC. Since then, several variants have been studied in the context of clustering, utilizing ORC (Ni et al., 2019; Sia et al., 2019; Tian et al., 2023), as well as FRC (Weber et al., 2018). Each of these algorithms is a special case of the general framework that we describe below. The clustering procedure is initialized with the (possibly unweighted) input graph, i.e., G0 := G. We then evolve the edge weights for T iterations under Ricci ﬂow (Equation (2.4)). In each iteration, the edge weights are renormalized using

wt({u, v}) |E|d G(u, v) P {u ,v } E d G(u , v ) . (2.5)

Over time, the negativity of the curvature of edges that bridge communities intensiﬁes, as edges with lower Ricci curvature contract more slowly under Ricci ﬂow. On the other hand, edges with higher Ricci curvature contract faster. This results in a decrease of the weight of internal edges over time, while the weight of the bridges increases (see Figure 4). With that, the discrete Ricci ﬂow reinforces the meso-scale structure of the network. In order to recover single-membership labels, we cut the edges in the graph with high weights (or, equivalently, with very negative curvature) after evolving edge weights for T iterations. The resulting partition delivers a node clustering. In the mixed-membership case, we cut the edges in the line graph with the highest weight (or, equivalently, the most negative curvature) after evolving edge weights for T iterations. The resulting partition of the graph delivers an edge clustering from which we infer the community labels: We obtain a mixed-membership label vector y for each node v by computing

yl(v) = 1 |Ev|

e Ev χl(e) , (2.6)

where χl is the indicator for the edge cluster Cl. Intuitively, each edge belonging to cluster l that is incident on a node v adds more evidence of aﬃnity between the node v and the cluster Cl.

Curvature-based Clustering on Graphs

0 5 10 15 t

{1, 5} {1, 9} {1, 22} {1, 23} {1, 24}

0 5 10 15 t

{7, 87} {8, 57} {11, 62} {12, 83} {13, 96}

Figure 4: To illustrate the eﬀect of the discrete Ricci ﬂow in Algorithm 1, we consider a planted SBM of size n = 100, two equally-sized blocks, pin = 0.3 and pout = 0.02, visualized in the left panel. We select representative edges within a community, shown in orange on the graph, and plot the evolution of their weights in the middle panel. In the right panel, we show this evolution for selected edges between communities (i.e., bridges), shown in blue on the graph.

In both cases, the success of the method depends crucially on identifying a good weight threshold for cutting edges in the original graph (single-membership case) or the line graph (mixed-membership case). We propose to perform a hyperparameter search over cut-oﬀ points {x0, x1, . . . , xnf }. The construction of the cut-oﬀpoints is an important design choice and depends on the curvature notion used in the approach. We then compute the modularity, a classical quality metric in the community detection literature, for the label assignments corresponding to each cut-oﬀpoint xi. Speciﬁcally, modularity (Girvan and Newman, 2002; G omez et al., 2009) is deﬁned as

w({u, v}) dudv

δ(σ(u), σ(v)) , (2.7)

where 2w := P ij w({i, j}). The larger the modularity, the better we expect the clustering to reﬂect the underlying community structure induced by the graph s connectivity. Our approach is schematically shown in Algorithm 1. Note that in Algorithm 1, the number of iterations are selected a priori, while the choice can also be integrated with the modularity check. This is because the choice of T = 10 generally provides suﬃciently good results, which is veriﬁed with the comparison to the variant with intermediate modularity check and simple stopping criteria in Section 7.3.

3. Related Work

Community detection. Mixed-membership community detection is widely studied in the network science and data mining communities. Notable approaches include Bayesian methods (Airoldi et al., 2008; Hopkins and Steurer, 2017), matrix factorization (Yang and Leskovec, 2013), spectral clustering (Zhang et al., 2020), and vertex hunting (Jin et al., 2017), among others. In addition to the mixed-membership model that we consider here (Airoldi et al., 2008), there is a signiﬁcant body of literature on closely related overlapping community models (Lancichinetti et al., 2009; Xie et al., 2013), which also study the problem of learning non-unique node labels. Curvature-based community detection methods for non-overlapping communities have recently received growing interest (Ni et al., 2019;

Tian and Lubberts and Weber

Sia et al., 2019; Gosztolai and Arnaudon, 2021; Weber et al., 2018). Such approaches utilize notions of discrete Ricci curvature (Sia et al., 2019; Gosztolai and Arnaudon, 2021; Weber et al., 2018), as well as an associated Ricci ﬂow (Ni et al., 2019) to partition networks into communities, based on the observation that edges between communities have low curvature. The absence of such bridges in overlapping communities renders these approaches inapplicable to the setting studied in this paper. To the best of our knowledge, our algorithm is the ﬁrst to study mixed-membership community structure with curvature-based methods.

Higher-order structure in Networks. Historically, much of the network analysis literature has focused on the structural information encoded in nodes and edges. Recently, the analysis of higher-order structures has received increasing attention. A plethora of methods for analyzing higher-order structure in relational data has been developed (Benson et al., 2016; Battiston et al., 2020). Our work follows this line of thought, in that it focuses on the relations between edges and the structural information encoded therein. Curvature of higher-order structure has previously been studied in (Weber et al., 2017b; Saucan and Weber, 2018; Leal et al., 2021). Here, networks have been studied as polyhedral complexes (Weber et al., 2017b), or interactions between groups of nodes have been encoded in hypergraphs (Saucan and Weber, 2018; Leal et al., 2021). To the best of our knowledge, this is the ﬁrst work that studies the curvature of line graphs.

Line Graphs. The notion of the line graph goes back at least to (Whitney, 1932), where it is shown that whenever |V | 5, two graphs are isomorphic if and only if their line graphs are. An eﬀective version of this result, which reports whether a given graph is a line graph, and if it is, returns the base graph, appears in Lehot (1974). Community detection and more generally network analysis via the line graph has been recently studied in (Chen et al., 2017; Krzakala et al., 2013; Lubberts et al., 2021; Evans and Lambiotte, 2010).

Discrete Curvature. There exists a large body of literature on discrete notions of curvature. Notable examples include Gromov s δ-hyperbolicity (Gromov, 1987), Bakry-Emre curvature (Erbar and Maas, 2012), as well as Forman s and Ollivier s Ricci curvatures (Forman, 2003; Ollivier, 2010). While there is some work on studying those curvature notions on higher-order networks (Bloch, 2014; Weber et al., 2017b) and hypernetworks (Saucan and Weber, 2018; Leal et al., 2021), they have not been studied on line graphs. Curvaturebased analysis of relational data (see, e.g., (Weber et al., 2017a,b)) has been applied in many domains, including to biological (Elumalai et al., 2022; Weber et al., 2017c; Tannenbaum et al., 2015), chemical (Leal et al., 2021; Saucan et al., 2018), social (Painter et al., 2019), information (Ni et al., 2015) and ﬁnancial networks (Sandhu et al., 2016). Discrete curvature has also found applications in Representation Learning, in particular for identifying representation spaces in graph embeddings (Lubold et al., 2023; Weber, 2020; Weber and Nickel, 2018).

4. Discrete Graph Curvature

Curvature is a classical tool in Diﬀerential Geometry, which is used to characterize the local and global properties of geodesic spaces. In this paper, we investigate discrete notions of curvature that are deﬁned on graphs. Speciﬁcally, we focus on discretizations of Ricci

Curvature-based Clustering on Graphs

Figure 5: Computation of discrete Ricci curvature.

curvature, which is a local notion of curvature that relates to the volume growth rate of the unit ball in the space of interest (geodesic dispersion). In the following, we only consider undirected networks. We introduce two classical notions of discrete Ricci curvature, which were originally introduced by Ollivier (2010) and Forman (2003), respectively.

4.1 Ollivier s Ricci Curvature

Our ﬁrst notion of discrete curvature relates geodesic dispersion to optimal mass transport on the graph.

4.1.1 Formal Definition

Consider the transportation cost between two distance balls (i.e., vertex neighborhoods) along an edge in the network. In an unweighted graph, we endow the neighborhoods of vertices u, v adjacent to an edge e = {u, v} with a uniform measure, i.e.,

du z, s.t. u z, (4.1)

and analogously for mv(z). Here, u z indicates that u, z are neighbors. In a weighted graph, for α [0, 1] and p 0, we set

mα,p u (z) :=

α if z = u,

Cu exp( d G(u, z)p) if z u, 0 else.

where the normalizing constant Cu = P z u exp( d G(u, z)p). Notice that for neighboring vertices u, z we have d G(u, z) = w({u, z}) := wu,z. When α = 0, and p = 0 or we have an unweighted graph, this reduces to the uniform measure. We deﬁne Ollivier s Ricci curvature (Ollivier, 2010) (short: ORC) with respect to the Wasserstein-1 distance W1 between those measures, i.e.,

Ric O(e) := 1 W1(mu, mv)

d G(u, v) . (4.3)

Tian and Lubberts and Weber

Figure 6: Scatter plot of the ORC on G obtained from the optimization versus its approximation, where the optimization is done by solving the earth mover s distance exactly (middle) and approximately with Sinkhorn (right), and G is obtained from the planted SBM of size n = 100, two equally-sized blocks, pout = 0.02 and pin = 0.4, with weights being 1 (top), and the 2-d RGG of size n = 100 and radius r = 0.4 with weights being 1 (middle), and proportional to the distance (bottom). G is visualised in the left column, with the color indicating the ORC obtained by solving the earth mover s distance exactly.

The computation of ORC is illustrated in Figure 5a. We can further deﬁne ORC for vertices with respect to the curvature of its adjacent edges. Formally, let Ev := {e E : v e} denote the set of edges adjacent to a vertex v. Then its curvature is given by

Ric O(v) = X

e Ev Ric O(e) . (4.4)

4.1.2 Computational Considerations

Computing ORC on a graph is quite expensive. The high cost arises mainly from the computation of the W1-distance, which, in the discrete setting, is also known as earth mover s distance. Computing W1(mu, mv) between the neighborhoods of two vertices u, v

Curvature-based Clustering on Graphs

that are connected by an edge, corresponds to solving the linear program

min W1(mu, mv) = inf Λ

j y λijd G(i, j) (4.5)

i x λij = mj y j y; X

j y λij = mi x i x;

λij 0 i, j .

Classically, W1(mu, mv) is computed using the Hungarian algorithm (Kuhn, 1955), the fastest variant of which has a cubic complexity. With growing neighborhood size, the computational cost of the Hungarian algorithm becomes prohibitively large. Alternatively, W1(mu, mv) can be approximated with the faster Sinkhorn algorithm (Sinkhorn and Knopp, 1967), whose complexity is only quadratic. However, even a quadratic cost introduces a signiﬁcant computational bottleneck on the large-scale network data that we encounter in machine learning and data science applications. Here, we propose a diﬀerent approach. Instead of approximating the W1-distance, we propose an approximation of ORC. Our approximation can be written as a simple combinatorial quantity, which can be computed in linear time. We begin by recalling a few classical bounds on ORC, ﬁrst proven by Jost and Liu (2014). Let #(u, v) be the number of triangles that include the edge (x, y), a b := min{a, b} and a b := max{a, b}. With respect to these quantities, upper and lower bounds on ORC are given as follows:

Theorem 4 (Unweighted case (Jost and Liu, 2014))

1. Lower bound on ORC:

Ric O({u, v}) 1 1

+ + #(u, v)

Note that a simpler, but less tight lower bound with respect to node degrees only is given by

Ric O({u, v}) 2 1 1

2. Upper bound on ORC:

Ric O({u, v}) #(u, v)

du dv . (4.7)

In the following section, we will derive an extension of these results to weighted graphs.

Now, let Ricup O and Riclow O denote the upper and lower bounds, respectively. We propose to approximate Ric O({u, v}) as the arithmetic mean of the upper and lower bounds, i.e.,

[ Ric O({u, v}) := 1

Ricup O ({u, v}) + Riclow O ({u, v}) . (4.8)

Tian and Lubberts and Weber

Like the bounds themselves, [ Ric O has a simple combinatorial form and can be computed in linear time. Since the computation relies only on local structural information, the procedure can be parallelized easily. Figure 6 demonstrates the utility of this approximation in practise (in both the SBM and Random Geometric Graphs (short: RGG); see Appendix B for more explanation).

4.1.3 Bounds on Ollivier s curvature in weighted graphs

Since our curvature-based algorithms work by modifying the edge weights in a graph according to their curvature values, we require an analogue of Theorem 4 for weighted graphs, in order to use a simple combinatorial approximation to the curvature as in Equation (4.8). To do so, we make use of the dual description of the Wasserstein distance between two measures m1, m2:

W1(m1, m2) = inf π Π(m1,m2) E(x,y) πd(x, y) = sup h L 1 [Ex m1[h(x)] Ey m2[h(y)]] ,

where Π(m1, m2) is the set of couplings of m1, m2, and the supremum is over all functions h : V R which are Lipschitz with constant at most 1 (with respect to shortest path distance). We may obtain an upper bound on W1(m1, m2) by ﬁnding any coupling π of m1, m2, and a lower bound by ﬁnding any 1-Lipschitz function h.

Upper bound on W1(m1, m2). By constructing a transport plan taking mx to my, we obtain the following upper bound:

Lemma 5 Let mx, my denote the measures on the node neighborhoods of x and y given by taking α = 0 and p = 1 in Equation (4.2). Denote the vertices in N(x) \ N(y) with ℓ, N(y) \ N(x) with r, and N(x) N(y) with c. Introduce the shorthands Lx := P ℓmx(ℓ), Ly := P ℓmy(ℓ), as well as Xx := mx(x), Xy := my(x). Then

W1(mx, my) X

ℓ wℓ,xmx(ℓ) + X

r wr,ymy(r)

c wc,y(mx(c) my(c))+ + wc,x(my(c) mx(c))+

Lx + Xx Xy X

c (my(c) mx(c))+

Remark 6 This upper bound is exact for weighted trees.

As a computational tool, this bound is a step in the right direction: Consider the usual procedure for calculating ORC. We need to compute mx, my, and d G(u, v) for each pair of vertices u, v VH. Then we need to carry out the optimization, which can be realized as a linear program as in Equation (4.5). This lower bound still requires the ﬁrst step, but avoids the need for the optimization over couplings, which is the major bottleneck in the curvature computation.

Curvature-based Clustering on Graphs

Lower bound on W1(m1, m2). We now provide a lower bound on W1(mx, my):

Lemma 7 Let mx, my be as deﬁned above, and let P = {v N(x) N(y) : mx(v) my(v) > 0}, N = {v N(x) N(y) : mx(v) my(v) < 0}. For S V and v V , let d G(v, S) = min{d G(v, u) : u S}. Then

W1(mx, my) max

v P d G(v, N)(mx(v) my(v)), X

v N d G(v, P)(my(v) mx(v))

Remark 8 A lower bound for W1(mx, my) was previously given by Jost and Liu (2014). However, the ORC notion considered therein diﬀered from ours in the choice of the mass distribution imposed on node neighborhoods. In addition, we believe that the result in Jost and Liu (2014) has a slight inaccuracy; for details, see the discussion in Remark 28.

Combinatorial bounds on ORC in the weighted case. We now utilize the combinatorial upper and lower bounds on W1(mx, my) to obtain combinatorial bounds on ORC itself.

Theorem 9 (ORC bounds (weighted case)) For x, y V, x y, we have the following bounds:

1. Upper bound on ORC:

Ric O({x, y}) 1 max

wxy (mx(v) my(v))+, X

wxy (my(v) mx(v))+

2. Lower bound on ORC:

Ric O({x, y}) 1 X

wℓ,x wx,y mx(ℓ) X

wr,y wx,y my(r)

wx,y (mx(c) my(c))+ + wc,x

wx,y (my(c) mx(c))+

Lx + Xx Xy X

c (my(c) mx(c))+

While technical, both bounds are purely combinatorial and can be computed in linear time without the need to solve a linear program.

4.2 Forman s Ricci Curvature

Our second notion of discrete curvature utilizes an analogy between spectral properties of manifolds and CW complexes to deﬁne a discrete Ricci curvature.

Tian and Lubberts and Weber

4.2.1 Formal Definition

Forman s discrete Ricci curvature (short: FRC) was originally deﬁned as a measure of geodesic dispersion on CW complexes (Forman, 2003). In its most general form, FRC is deﬁned via a discrete analogue of the Bochner-Weitzenb ock identity

d = Bd + Fd ,

which establishes a connection between the (discrete) Riemann-Laplace operator d, the Bochner Laplacian Fd and Forman s curvature tensor Fd. Networks can be viewed as polyhedral complexes, which is one instance of a CW complex. In particular, by viewing a network s edges as 1-cells, we can deﬁne FRC for edges e = (v1, v2) as

Ric F (e) = ωe

ωv1 ωeωev1 X

To arrive at this expression, we have specialized Forman s general notion for Fd to the case of 1-cells (see also (Weber et al., 2017a,b)). Here, ev denotes an edge that shares the vertex v with e. Note that if G is unweighted, then this reduces to

Ric F (e) = 4 dv1 dv2 .

We can deﬁne FRC for vertices (i.e., 0-cells) with respect to the curvature of its adjacent edges Ev (i.e., Ev := {e E : e = ( , v) or e = (v, )):

Ric F (v) = X

e Ev Ric F (e) . (4.10)

It is well-known that higher-order structures impact the community structure in a graph, as well as the ability of many well-known methods to detect community structure. For example, the clustering coeﬃcient, a well-known graph characteristic, can be deﬁned via triangle counts (Watts and Strogatz (1998), see also Section 8.2.1). Therefore, we additionally deﬁne an FRC notion, which takes contributions of higher-order structures into account. To characterize such structure, we introduce the following notion: We say that two edges e, ˆe are parallel, denoted as e ˆe, if they are adjacent to either the same vertex (v e, ˆe) or the same face (f e, ˆe), but not both. In the following, we consider the 2-complex version of FRC (Weber et al., 2017b):

Ric F (e) := ωe

Note that for each edge ˆe parallel to e, only one of the two sums inside the absolute value will have any terms. Here, f denotes a 2-face of order k, such as a triangle (k = 3), a quadrangle (k = 4), a pentagon (k = 5), etc.

Curvature-based Clustering on Graphs

4.2.2 Computational Considerations

Notice that Equation (4.9) and Equation (4.10) give a simple, combinatorial curvature notion, which can be eﬃciently computed even on large-scale graphs. To compute the 2-complex FRC of an edge e exactly, one needs to identify all higher order faces in the neighborhood of an edge e and compute their respective contributions in Equation (4.11). This limits the scalability of this notion signiﬁcantly. However, notice that the likelihood of ﬁnding a face of order k in the neighborhood of an edge decreases rapidly as k increases. Figure 7 illustrates this observation with summary statistics for simulations on two popular graph models (the SBM and RGG). We propose to approximate the 2-complex FRC by only considering triangular faces and ignoring structural information involving k-faces with k > 3. This choice reduces the computational burden of the 2-complex FRC signiﬁcantly. We demonstrate below that experimentally, the performance of our FRC-based clustering algorithm does not improve if higher-order faces are taken into account for the curvature contribution (see Table 5). Speciﬁcally, we will utilize the following augmented FRC:

Ric F (e) := ωe

ev1 e ev1 e

ωv1 ωeωev1 + X

ev2 e ev2 e

To be clear, the sum ev1 e, ev1 e considers all edges ev1 which are incident on v1, hence share a vertex in common with e, but do not form a triangle with e. We consider the following weighting scheme for faces, which utilizes Heron s formula to determine face weights via edge weights: Let f = (ei, ej, ek) denote a triangle in G, i.e., ei ej, ej ek and ei ek. Then we set

s(s ωei)(s ωej)(s ωek) (4.13)

s = ωei + ωej + ωek

This weighting scheme was previously used in (Weber et al., 2017b).

5. Discrete Curvature on the Line Graph

In this section, we investigate the computation of discrete Ricci curvature on the line graph. Again, we consider two notions of curvature, Ollivier s (ORC) and Forman s (FRC). Specifically, we want to describe the relationship between a graph s curvature and the curvature of its line graph. We have seen in Section 4 that curvature can be deﬁned on the node and edge levels. Since nodes in the line graph correspond to edges in the underlying graph, it is natural to study the relationship between node-level curvature in the line graph and edge-level curvature in the original graph. In the context of the clustering applications considered in this paper, we are further interested in the insight that curvature can provide on diﬀerences between node clustering (based on connectivity in the original graph) and edge clustering (based on connectivity in the line graph). In this context, it is instructive to study the relationship between edge-level curvature in the line graph and edge-level curvature in the underlying graph.

Tian and Lubberts and Weber

10 20 Face order

Averaged count

pin =0.2 pin =0.4

5 10 15 Face order

Averaged count

r=0.3 r=0.4

Figure 7: Number of k-faces in graphs obtained from the planted SBM of size n = 100, two equally-sized blocks, pout = 0.02 (left), and the 2-d RGG of size n = 100 (right), where the results are averaged over ns = 50 realisations of each set of parameters.

Recall that, given an unweighted, undirected graph G = (V, E), its line graph is deﬁned as L(G) = (E, E), where an edge {{u, v}, {r, s}} E appears in the line graph whenever |{u, v} {r, s}| = 1. We will make use of the following simple observation throughout the section:

Lemma 10 Node degrees in the line graph are given as

d{u,v} = du + dv 2 ,

where the left hand side denotes node degrees in the line graph L(G), and the right hand side the degrees of the vertices u and v in G.

This formula follows from the fact that the edges connecting to {u, v} in L(G) are precisely those edges of the form {u, z} E or {v, z} E, where z = u, v.

5.1 Numerical observations

We explore the relationship of (1) edge-level curvature in the original graph G and nodelevel curvature in the corresponding line graph L numerically, as well as (2) edge-level curvatures in the original graph G and its line graph L. Our empirical results consider the two notions of curvature introduced above: Forman s Ricci curvature (FRC) and Ollivier s Ricci curvature (ORC). Our data sets are described in Appendix B. To compute the classical FRC and ORC notions in the original graph G and its line graph L numerically, we use the built-in ORC computation both for edges and for nodes in Graph Ricci Curvature (Ni et al., 2019). Our implementations for augmented FRC and our ORC approximation build on the code in this library. Results from each random graph model are obtained from ns = 50 realizations.

5.2 Ollivier s Ricci Curvature

In this section, we establish theoretical evidence for the observed relationship between Ollivier s curvature (ORC) of a graph and its line graph.

Curvature-based Clustering on Graphs

1.0 0.5 0.0 0.5 Ricci curvature of edges in G

Ricci curvature of nodes in L

Counts in bins

0.75 0.50 0.25 0.00 0.25 Ricci curvature of edges in G

Ricci curvature of nodes in L

Counts in bins

60 50 40 30 20 Ricci curvature of edges in G

Ricci curvature of nodes in L

Counts in bins

20 0 20 40 60 Ricci curvature of edges in G

Ricci curvature of nodes in L

Counts in bins

1.0 0.5 0.0 0.5 Ricci curvature of edges in G

Ricci curvature of nodes in L

Counts in bins

0.5 0.0 0.5 Ricci curvature of edges in G

Ricci curvature of nodes in L

Counts in bins

80 60 40 20 Ricci curvature of edges in G

Ricci curvature of nodes in L

Counts in bins

20 0 20 40 60 80 Ricci curvature of edges in G

Ricci curvature of nodes in L

Counts in bins

1.0 0.5 0.0 0.5 Ricci curvature of edges in G

Ricci curvature of nodes in L

Counts in bins

0.5 0.0 0.5 Ricci curvature of edges in G

Ricci curvature of nodes in L

Counts in bins

150 100 50 Ricci curvature of edges in G

Ricci curvature of nodes in L

Counts in bins

50 0 50 100 150 Ricci curvature of edges in G

Ricci curvature of nodes in L

Counts in bins

Figure 8: Correlation of the distributions of the classical ORC (left), the approximation of ORC from the average of the upper and lower bounds (middle left), the classical FRC (middle right), and the augmented FRC (right) for edges in G versus that for vertices in L where the networks are obtained from (top) the planted SBM of size n = 100, two equally-sized blocks, pout = 0.02 and pin = 0.4; (middle) the 2-d RGG of size n = 100, and radius r = 0.3; (bottom) the Worm data set. Notice that the range of curvature values varies between curvature notions, but the shape of the distributions has a close resemblence.

5.2.1 Unweighted case

Consider an unweighted graph G, and denote ORC in G and L(G) by Ric G O, Ric L(G) O , respectively. We prove combinatorial upper and lower bounds for ORC in L(G), which can be computed solely from structural information in the underlying graph G. These results are the analogues of Equations (4.6), (4.7), but make use of the relationships between the line graph L(G) and its base graph to avoid computing curvatures in the line graph.

Lemma 11 (Bounds on line graph ORC) Let du denote the degree of the vertex u V in the graph G, and d{u,v} the degree of the vertex {u, v} E in the line graph L(G). Then we have the following bounds on the ORC of edges in L(G):

1. Upper bound on ORC:

Ric L(G) O ({{u, v}, {v, w}}) dv 2 + 1E({u, w})

(du dw) + dv 2 ; (5.1)

Tian and Lubberts and Weber

2 1 0 1 Ricci curvature of related edges in G

Ricci curvature of edges in L

Counts in bins

1.5 1.0 0.5 0.0 0.5 Ricci curvature of related edges in G

Ricci curvature of edges in L

Counts in bins

120 100 80 60 Ricci curvature of related edges in G

Ricci curvature of edges in L

Counts in bins

50 0 50 100 Ricci curvature of related edges in G

Ricci curvature of edges in L

Counts in bins

1 0 1 Ricci curvature of related edges in G

Ricci curvature of edges in L

Counts in bins

1 0 1 Ricci curvature of related edges in G

Ricci curvature of edges in L

Counts in bins

150 125 100 75 50 25 Ricci curvature of related edges in G

Ricci curvature of edges in L

Counts in bins

0 50 100 150 Ricci curvature of related edges in G

Ricci curvature of edges in L

Counts in bins

2 1 0 1 Ricci curvature of related edges in G

Ricci curvature of edges in L

Counts in bins

1 0 1 Ricci curvature of related edges in G

Ricci curvature of edges in L

Counts in bins

250 200 150 100 50 Ricci curvature of related edges in G

Ricci curvature of edges in L

Counts in bins

100 0 100 200 300 Ricci curvature of related edges in G

Ricci curvature of edges in L

Counts in bins

Figure 9: Correlation of the distributions of the classical ORC (left), the approximation of ORC from the average of the upper and lower bounds (middle left), the classical FRC (middle right), and the augmented FRC (right) for edges in G versus that for edges in L where the networks are obtained from (top) the planted SBM of size n = 100, two equally-sized blocks pout = 0.02 and pin = 0.4; (middle) the 2-d RGG of size n = 100, and radius r = 0.3; (bottom) the Worm data set. Notice that the range of curvature values varies between curvature notions, but the shape of the distributions has a close resemblance.

2. Lower bound on ORC:

Ric L(G) O ({{u, v}, {v, w}}) 1 1 du + dv 2 1 dv + dw 2 dv 2 + 1E({u, w})

(du dw) + dv 2

1 1 du + dv 2 1 dv + dw 2 dv 2 + 1E({u, w})

(du dw) + dv 2

+ dv 2 + 1E({u, w})

(du dw) + dv 2 . (5.2)

Here, 1E({u, w}) = 1 if u, w are connected by an edge in G, and 1E({u, w}) = 0 otherwise.

A crucial feature of this lemma is that it does not require the computation of triangle counts in L(G): It only requires evaluating the degrees of the constituent vertices, and the existence of the edge {u, w} E. The latter results in an additional triangle in the base graph G and, since L(K3) is isomorphic to K3, an additional triangle in the line graph.

Curvature-based Clustering on Graphs

5.2.2 Weighted case

Importantly, we can show a version of these bounds in the weighted case, too. Let ω({u, v}, {v, w}) denote the weight of the edge {{u, v}, {v, w}} E. Then for an edge {e1, e2} E, Theorem 9 gives us the following result, which makes use of the notation from Section 4.1.3.

1. Upper bound on ORC:

Ric O({e1, e2}) 1 max

d L(G)(e, N)

ω(e1, e2) (m1(e) m2(e))+, X

d L(G)(e, P)

ω(e1, e2) (m2(e) m1(e))+

2. Lower bound on ORC:

Ric O({e1, e2}) 1 X

ω(ℓ, e1) ω(e1, e2)m1(ℓ) X

ω(r, e2) ω(e1, e2)m2(r)

ω(e1, e2)(m1(c) m2(c))+ + ω(c, e1)

ω(e1, e2)(m2(c) m1(c))+

L1 + X1 X2 X

c (m2(c) m1(c))+

Note that the upper bound for ORC in the line graph still requires one to compute the line graph. However, we can always use the trivial upper bound Ric O({e1, e2}) 1 if an approximation will suﬃce. See Section 7.4 for experimental results utilizing this approach.

Summary. In the applications discussed in the next section, we again approximate ORC as the arithmetic mean of the upper and lower bounds for the sake of computational eﬃciency:

Deﬁnition 12 (Approximate ORC in the Line Graph)

[ Ric O L(G)({u, v}) := 1

Ric L(G) O up({u, v}) + Ric L(G) O low({u, v}) . (5.3)

In Figure 10, we compare this approximation to the original ORC for some simulated graphs. Note that the approximations in the right two columns can be computed only from information contained in the graph G, so that L does not need to be constructed.

5.3 Forman s Ricci Curvature

In this section, we establish a series of relationships between Forman s curvature of a graph and its line graph. We further discuss how higher order structures in the line graph may be incorporated in the curvature computation.

5.3.1 Unweighted case

Consider an unweighted graph G, and denote FRC in G and L(G) by Ric G F , Ric L(G) F , respectively. At ﬁrst, we only consider the 1-complex curvature notion (Equations (4.9), (4.10)), which does not consider contributions of higher-order structures, such as triangles or quadrangles. We begin by investigating the relationship of edge-level Forman curvature in a graph G and the node and edge-level curvatures in its line graph L(G).

Tian and Lubberts and Weber

Figure 10: Scatter plot of the ORC on L obtained from optimization versus its approximation with L constructed (left two columns) and without (right two columns), where the optimization is done by solving the earth mover s distance exactly (left most and middle right) and the approximately with Sinkhorn (middle left and right most), and the original graph G is obtained from the planted SBM of size n = 100, two equally-sized blocks, pout = 0.02 and pin = 0.4 (top) and the 2-d RGG of size n = 100 and radius r = 0.4 (bottom).

Lemma 13 Edge-level FRC in G relates to edge-level FRC in L as

Ric L(G) F ({{u, v}, {v, w}}) = Ric G F ({u, v}) + Ric G F ({v, w}) . (5.4)

Edge-level FRC in G relates to node-level FRC in L as

Ric L(G) F ({u, v}) = Ric G F (u) + Ric G F (v) Ric G F ({u, v})2 . (5.5)

Importantly, this lemma allows us to compute node-level FRC in the line graph with respect to quantities in the original graph only. Speciﬁcally, the right hand side can be computed from edge-level FRC only, utilizing Equation (4.10). See the third columns in Figures 8 and 9 for an illustration of these relationships in a few simulated and real-data graphs.

5.3.2 Weighted case

When the vertex weights ωu = 1 for all u V , and using the line graph edge weights ω{e1,e2} = ωe1ωe2, we get the following formula for the relationship between the edge-level FRC in G versus the edge-level FRC in L(G):

Curvature-based Clustering on Graphs

Ric L(G) F ({{u, v}, {v, w}}) = ω{u,v}

ω{u,v} (2 Ric G F ({u, v}))

ω{u,v} ω{v,w} (2 Ric G F ({v, w}))

5.3.3 Incorporating higher-order structures

Finally, we discuss how to account for contributions of higher-order structures, such as triangles or quadrangles, in the computation of curvature in the line graph. As discussed in Section 4, the curvature contributions of such structures are crucial in curvature-based clustering. We have demonstrated that the successful application of the FRC-based approach to single-membership community detection relies on the consideration of triangular faces, such as in our 2-complex FRC notion (Equation (4.11)). In the following, we will illustrate the necessity of including contributions of quadrangular faces in the curvature contribution on the line graph. We ﬁrst deﬁne a 2-complex FRC notion on the line graph, which accounts for curvature contributions of triangular faces. Let Γ(v) = {v : {v, v } E} denote the set of neighbors for a vertex v V in G. Suppose {u, v}, {v, w} E denote neighboring edges in G and ({u, v}, {v, w}) E the corresponding edge in the line graph. We deﬁne the set of neighbors for an edge {u, v} as follows:

Γ {u, v} := n {u, u } : u Γ(u)\{v} o n {v, v } : v Γ(v)\{u} o

The set of triangles in the line graph containing {u, v}, {v, w} is then given by

F := n {u, v}, {v, w}, e : e Γ({u, v}) Γ({v, w}) o

= n {u, v}, {v, w}, e : e {v, v } : v Γ(v)\{u, w} {{u, w}} E o .

Hence, there are only two possible structures in G, which give rise to triangles in its line graph L: (i) edges incident to the common node of the two edges under consideration, or (ii) the other endpoints of the two edges are connected with each other. The 2-complex FRC with triangle contributions can then be written as follows:

Deﬁnition 15 (2-complex FRC in the Line Graph (k 3))

Ric F {{u, v}, {v, w}}

= ω{u,v},{v,w}

ω{u,v},{v,w}

ωf + ω{u,v} ω{u,v},{v,w} + ω{v,w} ω{u,v},{v,w}

x Γ(u)\{v,w}

ω{u,v} ω{u,v},{v,w} ω{u,v},{u,x} + X

x Γ(w)\{u,v}

ω{v,w} ω{u,v},{v,w} ω{v,w},{w,x}

Tian and Lubberts and Weber

In the special case of an unweighted network (i.e., ωe = 1 for all e V and ω{e,e } = 1 for all {e, e } E), we have

Ric F {{u, v}, {v, w}} = 2 + X

1 ωf n e Γ {u, v} : {u, v}, {v, w}, e / F o

n e Γ({v, w}) : {{u, v}, {v, w}, e} / F o . (5.6)

If we assign each triangle the same weight ω , this further simpliﬁes the above equation as follows:

Lemma 16 (2-complex FRC in the Line Graph (triangles only))

Ric F ({{u, v}, {v, w}) = dv

ω du dw + 4 2

ω + 2 1E({u, w}) . (5.7)

The lemma implies that the triangles-only 2-complex FRC in the line graph can still be largely determined by the degree of the nodes in the original graph G; the local triangle count enters via the terms 1E({u, w}) (which is 1 if {u, w} E and 0 otherwise). A simple calculation shows that ω =

3/4, i.e., 1/ω +2 = (4

3+6)/3 3.309, while node degree could be as high as O(|V |). As such, a triangle in G increases the curvature value in L by 1/ω + 2. However, our experimental results (see Table 11) demonstrate that clustering methods based on the triangles-only 2-complex FRC can have low accuracy. While considering triangles was suﬃcient for FRC-based community detection on the original graph, we need to incorporate additional higher-order information to perform community detection on the line graph with high accuracy. Our experiments suggest that accounting for curvature contributions of quadrangles leads to a signiﬁcant improvement in the performance of FRC-based mixed-membership community detection (see Table 11). We found that incorporating curvature contributions of k-faces with k 5 does not improve performance, while signiﬁcantly increasing the computational burden. This is expected, given the decreasing frequency of such structures (see Figure 7). These design choices lead to the following 2complex FRC notion, which we will utilize in our FRC-based clustering method below: The set of quadrangles in the line graph, which contains {u, v}, {v, w} is given by

F := n {u, v}, {v, w}, {w, x}, {x, u} : x (Γ(w) Γ(u))\{v} o .

As such, there are two ways for quadrangles to form in the line graph: First, any quadrangle in the original graph G gives rise to one in the line graph L. Second, a quadrangle with a single chord ({u, v, w, x}, {{u, v}, {v, w}, {w, x}, {x, u}, {u, w}}) still results in a quadrangle in L from ({u, v}, {v, w}, {w, x}, {x, u}), since there are no edges from {u, v} to {w, x} or {v, w} to {x, u} in L.

Curvature-based Clustering on Graphs

Deﬁnition 17 (2-complex FRC in the Line Graph (k 4))

Ric F {{u, v}, {v, w}}

= ω{u,v},{v,w}

ω{u,v},{v,w}

ω{u,v},{v,w}

ωf + ω{u,v} ω{u,v},{v,w} + ω{v,w} ω{u,v},{v,w}

{u,v},{v,w},e1,e2 F

ω{u,v},{v,w} ωe1,e2

x Γ(u)\Γ0(w)

ω{u,v} ω{u,v},{v,w} ω{u,v},{u,x} X

x Γ(w)\Γ0(u)

ω{v,w} ω{u,v},{v,w} ω{v,w},{w,x}

where Γ0(x) = Γ(x) {x}.

Our choice of weights for quadrangles is also based on geometric intuition utilizing Heron s Formula. Full details for this are given in Appendix A.4.

6. Algorithms

6.1 ORC-based approach

The curvature-based community detection algorithms (Algorithm 1) are built on the fact that bridges between communities are more negatively curved than edges within communities. We have discussed earlier that this can be observed in both graphs (in the singlemembership case) and their line graphs (in the case mixed-membership communities). In this section, we provide a detailed discussion of Algorithm 1 with κ denoting Ollivier s Ricci curvature (ORC). We begin by giving some intuition for the diﬀerences in curvature values of bridges and internal edges, before formalizing the argument below. By construction, ORC is closely linked to the behavior of two random walks starting at neighboring vertices (Ollivier, 2009; Jost and Liu, 2014). Informally, they are more likely to draw apart if the edge between them has negative ORC, and to draw closer together otherwise. Random walks that start at nodes adjacent to a bridge (bridge nodes), typically move into the communities to which the respective nodes belong. As a consequence, they draw apart quickly. Random walks that start at non-bridge nodes, i.e., nodes adjacent to internal edges, are more likely to stay close to each other, as they remain within the same community. Hence, we expect bridges to have much lower ORC than internal edges, a fact that is easily conﬁrmed empirically (see, e.g., Figure 4).

Remark 18 We note that an alternative notion of Ollivier s curvature (Equation (4.3)) was given by Lin et al. (2011), where node neighborhoods are endowed with the measure

1 α |Nu|, u z

Tian and Lubberts and Weber

instead of the uniform measure. A weighted version of this curvature was considered in (Ni et al., 2019). Notice that for α > 0, this curvature notion connects to lazy random walks starting at neighboring nodes. Here, the parameter α could be seen as controlling whether the random walk is likely to revisit a node, which in turn relates to a distinction between exploration of node neighborhoods in the style of breadth-ﬁrst search vs. depths-ﬁrst search . A suitable choice of α is highly dependent on the topology of the graph; hence, replacing ORC with Equation (6.1) provides an additional means for incorporating side information on the structure of the underlying graph. However, in this work we restrict ourselves to the classical ORC notion.

6.1.1 Algorithm

Single-Membership Community Detection. We implement Algorithm 1 via ORC by setting κ( ) = [ Ric O( ), i.e., the curvature is chosen to be the combinatorial ORC approximation proposed in Section 4.1. Notice that as the edge weights evolve under the Ricci ﬂow, the graph is weighted, even if the input graph was unweighted. Consequently, we use weighted ORC curvature, constructing the approximation from the bounds in Theorem 9. We further choose equally-spaced cut-oﬀpoints {xi}nf i=0, where x0 = max{u,v} E w T u,v, x1 = x0 δ, . . . , xnf = ((x0 1) mod δ + 1), and the step size for the cut-oﬀpoints is set to be δ = 0.025. Other hyperparameter choices include a constant step size ν = 1, ϵ = 10 4, drop threshold ϵd = 0.1, and stopping time T = 10, i.e., we evolve edge weights under Ricci ﬂow for 10 iterations. Instances of Algorithm 1 via ORC were previously considered in (Ni et al., 2019; Sia et al., 2019; Gosztolai and Arnaudon, 2021). Both approaches computed ORC either exactly (via the earth mover s distance) or approximately (via Sinkhorn s algorithm). However, due to the higher computational cost of either variant of the ORC computation, the approaches were limited in their scalability (see discussion Section 6.1.3 below).

Mixed-Membership Community Detection. As in the single-membership case, we implement Algorithm 1 via our combinatorial ORC approximation (Equation (5.3)). The input graph G is the line graph of the underlying graph, which is constructed before the clustering procedure is started. All edge weights are initially set to 1, i.e., the input graph is unweighted. Hyperparameter choices are analogous to the single-membership case. Instance of Algorithm 1 via ORC were ﬁrst considered in (Tian et al., 2023). Again, due to the high computational cost of the ORC computation, the approach was limited in its scalability.

6.1.2 Theoretical results.

Single-Membership case. To give theoretical intuition for curvature-based community detection algorithms in the single-membership case, we considered the following model class of graphs with community structure:

Deﬁnition 19 Let Ga,b (a > b 2) be a graph constructed as follows:

1. Construct a complete graph with b vertices {v1, . . . , vb}.

2. For each i = 1, . . . , b, introduce vertices uij, j = 1, . . . , a, and make all possible connections between vertices in {vi} {uij}a j=1, resulting in b copies of complete graphs Ka+1.

Curvature-based Clustering on Graphs

We show examples of graphs of the form Ga,b in Figure 11. Notice that in a graph Ga,b we have three types of edges:

1. bridges between communities, we denote vertices adjacent to bridges as bridge nodes and all others as internal nodes;

2. internal edges that connect a bridge node and an internal node;

3. internal edges that connect two internal nodes.

As before, we consider the following mass distribution on the node neighborhoods:

( 1 C e wx,y, x y 0, else . (6.2)

Here, C denotes the normalizing function. This choice follows from the more general scheme in Equation (6.1) by setting α = 0, p = 1. We assume that at initialization, all edges are assigned weight 1.

Theorem 20 Let the edge weights in Ga,b evolve under the Ricci ﬂow wt+1 i (1 κt i)wt i (where κt i denotes ORC for the edge type i using the weights at time t).

1. The weights of internal edges (type (3)) contract asymptotically, i.e., wt 3 0 as t .

2. The weights of bridges are larger than those of internal edges of types (2) and (3), i.e., wt 1 > wt 2 and wt 1 > wt 3 for all t > 0.

Remark 21 We note that the class of graphs Ga,b was previously considered in (Ni et al., 2019), which also give asymptotic guarantees for the evolution of edge weights under the Ricci ﬂow. However, they assume a diﬀerent mass distribution, hence, their asymptotic results are not directly applicable to our setting.

We will present the full proof of this theorem in Appendix A.5, but we give a sketch of the proof here. The argument proceeds by induction, ﬁrst calculating the exact ORC curvatures of the three edge types at iteration 1, and then calculating the exact ORC curvatures of the three edge types at iteration t, given the weights wt i. In each of these steps, we construct transport plans, as well as 1-Lipschitz functions with equal objective function values to show their optimality. This proves that we have the exact values of the Wasserstein distances. We use these exact formulas to show that the evolution of the weights has the properties given in the statement.

Mixed-Membership case. We modify the model class introduced in Deﬁnition 19 to account for mixed-membership community structure:

Deﬁnition 22 Let La,b (a > b 2) be a graph constructed as follows:

1. Construct a star-shaped graph, consisting of a center node of degree b and b leaf nodes of degree 1.

Tian and Lubberts and Weber

2. Replace each leaf node with a star-shaped graph with a + 1 vertices.

Notice that La,b consists of b blocks (communities) of size a+2 with one common node, i.e., the center node is a member of each of the b communities.

It is easy to see that the dual of a graph La,b is a graph of the form Ga,b. Consequently, we recover guarantees for curvature-based clustering on the line graph from Theorem 20.

6.1.3 Computational Considerations

Figure 11: Examples of graphs of the model classes Ga,b and La,b (top: a = b = 2, bottom: a = b = 3). For both pairs, Ga,b is the line graph of La,b.

The two key bottlenecks in ORC instances of Algorithm 1 are (1) the computation of the W1distance in the ORC computation (both singleand mixed-membership case), and (2) the construction of the line graph (mixed-membership case). Following our discussion in Section 4.1, we can circumvent bottleneck (1) by approximating ORC with the arithmetic mean of its upper and lower bounds, which are given by Equation (4.8) in the original graph (use in singlemembership case) and Equation (5.3) in the line graph (use in mixed-membership case). This reduces the complexity of the ORC computation to O(|E|dmax), in comparison with previous approaches (Ni et al., 2019; Sia et al., 2019), which achieved at best O(|E|d2 max) via Sinkhorn s approximation of the Wasserstein distance. The second bottleneck may also be circumvented in special cases. We discuss the details in Section 7.4 below.

6.2 FRC-based approach

Finally, we discuss the second instance of Algorithm 1, where κ denotes Forman s Ricci curvature (FRC). Again, we observe that, typically, bridges have lower FRC than internal edges in communities, which can be exploited for curvature-based community detection. As in the previous section, we ﬁrst provide some informal intuition on this observation, before formally introducing the algorithm. Notice that bridge nodes have typically a high node degree, as they connect not only to other nodes within the same community, but also to nodes in other communities. It is easy to see from the deﬁnition (recall, Ric F (e) = 4 dv1 dv2) that, as a result, bridges have usually lower curvature than internal edges in unweighted networks. This imbalance in curvature values is iteratively reinforced as edge weights evolve under the Ricci ﬂow. In particular, we see that, in the case of bridges, the 2-complex FRC (Equation (4.11))

Ric F (e) := ωe

Curvature-based Clustering on Graphs

is increasingly dominated by the second term, due to the high degree of the adjacent vertices and, consequently, the large number of parallel edges. This is reinforced as the edge weight increases under Ricci ﬂow, which decreases the ﬁrst term, resulting in negative curvature with high absolute value. In contrast, for internal edges, the second term has a smaller magnitude due to the smaller degrees of the adjacent vertices. This is again reinforced as the edge weight decreases under Ricci ﬂow.

6.2.1 Algorithm

Single-Membership Community Detection. We implement Algorithm 1 via FRC by setting κ( ) = Ric F ( ), chosen to be the 2-complex FRC with triangle contributions (Equation (4.12)). In the absence of given edge weights, we initialize edge weights to 1 as before. Adaptive step sizes are chosen proportional to the inverse of the highest absolute curvature of any edge in a given iteration, i.e., νt = 1/(1.1 max{u,v} E |κt u,v|). In our experiments below, we evolve edge-weights for T = 10 iterations under the Ricci ﬂow. To infer communities, cut-oﬀpoints are chosen as {xi}nf i=0, where x0 = max{u,v} E w T u,v, x1 = max{u,v} E,w Tu,v<x0 w T u,v, . . . , xn f = w T q , xn f+1 = xn f δ, . . . , xnf = ((xn f 1.1wmin)

mod δ + 1.1wmin). Here, w T q is the qth quantile of all weights; we set q = 0.999, step size δ = 0.25, and wmin = min{u,v} E w T u,v. Note that this diﬀers from the uniformly spaced cut-oﬀpoints in the ORC case. Under FRC-based Ricci ﬂow, the magnitude of some edge weights grows rapidly, resulting in a fat-tailed edge weight distribution. Hence, adjusting the spacing between cut-oﬀpoints accordingly reduces the number of cut-oﬀpoints needed to achieve high accuracy in cluster assignments. We remark that a special case of Algorithm 1 was previously proposed in (Weber et al., 2018), although considering the classical FRC notion (Equation (4.9)) instead of the 2-complex FRC and utilizing a simpler scheme to identify bridges between communities using FRC.

Mixed-Membership Community Detection. We implement Algorithm 1 via FRC on the line graph, by setting κ( , ) = Ric F ( , ), chosen to be the 2-complex FRC with triangle and quadrangle contributions (Deﬁnition 17). In the mixed-membership case, bridges between communities appear only in the line graph. Hence, we ﬁrst construct the line graph before the actual clustering procedure is started. The input to Algorithm 1 is the pre-computed line graph. All edge weights are initially set to 1, i.e., the input graph is unweighted. The adaptive step sizes, stopping time and cut-oﬀpoints are chosen analogously to the single-membership case. To the best of our knowledge, FRC-based approaches have not been considered previously in this mixed-membership setting.

6.2.2 Computational Considerations

Forman s Ricci curvature for edges is given by a simple combinatorial formula, which can be computed in linear time. The variants considered in this work (2-complex Forman curvature with triangle and quadrangle contributions) require an additional subroutine, which computes the number of triangles that include the respective edge. Counting the number of triangles in the 2-hop neighborhood is linear in the node degree, hence the overall complexity is O(|E|dmax), where dmax denotes the maximum node degree. Similarly, computing the number of quadrangles in the same 2-hop neighborhood is quadratic in the

Tian and Lubberts and Weber

degree, hence the overall complexity is O(|E|d2 max). Since this subroutine requires only local information, it is easily parallelizable.

7. Experiments

7.1 Network Data

Synthetic Data. In order to explore the interaction between discrete Ricci curvature and the structure of complex networks, we have considered both Stochastic Block Model and Random Geometric Graphs in our previous analysis. In this section, we systematically examine the performance of the algorithms on singleand mixed-membership Stochastic Block Models, where the ground-truth community membership is available. We have introduced the two variants of the Stochastic Block Model above in Section 2.

Real network data. To test our methods on real data, we consider two data sets, (i) collaboration networks from DBLP (Yang and Leskovec, 2015), and (ii) ego-networks from Facebook (Leskovec and Mcauley, 2012), for which ground truth labels are available through SNAP (Leskovec and Krevl, 2014). Speciﬁcally, the collaboration network is constructed by a comprehensive list of research papers in computer science provided by the DBLP computer science bibliography. Here, an edge from one author to another indicates that they have published at least one joint paper. There are intrinsic communities deﬁned by the publication venue, e.g., journals or conferences. Here, we maintain the choice of two communities, and randomly select two venues which have at least one node of mixed membership. We report the results for two representative networks: DBLP-1 from publication venues no. 12 and 1654, and DBLP-2 from publication venues no. 12 and 2592. While in the second data set (ego-networks), all the nodes are friends of one central user, and the friendship circles set by this user can be used as ground truth communities. We carried out the preprocessing in a similar manner as (Zhang et al., 2020), and then select two networks for which the modularity of the ground-truth communities is greater than 0.5: FB-1 for no. 414, and FB-2 for no. 1684. To better understand the characteristics of the diﬀerent real networks, we provide the following summary statistics for each network (see Table 1): (i) average node degree d, (ii) degree heterogeneity measured by the standard deviation of node degrees σd over d, (iii) the actual proportion of nodes with overlapping membership ˆπo, and (iv) the modularity of the ground-truth community. We note that the Facebook networks tend to be denser, with more nodes of mixed membership, while the collaboration networks tend to have heterogeneous degrees and communities that are not well described by modularity.

n |E| |E| k d σd/d ˆπo Modularity DBLP-1 82 173 867 2 4.22 0.65 0.012 0.15 DBLP-2 54 142 956 2 5.26 0.69 0.019 0.18 FB-1 128 1593 45635 3 24.89 0.44 0.055 0.53 FB-2 621 12399 718267 5 39.93 0.69 0.004 0.52

Table 1: Summary statistics of the real networks.

Curvature-based Clustering on Graphs

7.2 Evaluating clustering accuracy

To evaluate the accuracy of a clustering in the singleand mixed-membership settings, we introduce the following two notions of Normalized Mutual Information (NMI). When each node can only belong to one community, the classic NMI is deﬁned by considering each clustering result as a random variable and then comparing the information contained in them (Strehl and Ghosh, 2002). Speciﬁcally, let X, Y be the random variables described by two diﬀerent cluster labels. Let I(X, Y ) denote the mutual information between X and Y , and let H(X), H(Y ) denote the entropy of X, Y , respectively. Then NMI is deﬁned as

NMI(X, Y ) = I(X, Y ) mean(H(X), H(Y )), (7.1)

where mean(H(X), H(Y )) denote the mean of the two, and we consider the arithmetic mean in our experiments, mean(H(X), H(Y )) = (H(X) + H(Y ))/2. The NMI ranges from 0 to 1, where value 1 corresponds to perfect matching, or exact recovery if one of the clustering result is ground truth. We note that the classic NMI is not well-deﬁned for mixed-membership community detection, thus we later consider the following extended NMI (Lancichinetti et al., 2009). Speciﬁcally, for communities C1, C2, . . . , Ck, we now express the community membership of each node i as a binary vector of length k. (zi)l = 1 if node i belongs to Cl; (zi)l = 0 otherwise. The l-th entry of this vector can then be viewed as a random variable Zl, whose probability distribution is given by P(Zl = 1) = nl/n and P(Zl = 0) = 1 P(Zl = 1), where nl = |Cl| and n = |V |. The same holds for the random variable Yh associated with another set of communities C 1, C 2, . . . , C k . Both the empirical marginal probability distribution PZl and the joint probability distribution P(Zl, Yh) are used to further deﬁne entropy H(Z) and H(Zl, Yh). The conditional entropy of Zl given Yh is deﬁned as H(Zl|Yh) = H(Zl, Yh) H(Yh). The entropy of Zl with respect to the entire vector Y is based on the best matching between Zl and the components of Y given by

H(Zl|Y) = min h {1,2,...,k } H(Zl|Yh).

The normalized conditional entropy of Z with respect to Y is

In the same way, we can deﬁne H(Y|Z). Finally, the extended NMI for two sets of communities C1, C2, . . . , Ck and C 1, C 2, . . . , C k is given by

NMI(Z|Y) = 1 [H(Z|Y) + H(Y|Z)]/2 .

Hence to use the extended NMI, we convert the label vector y to a binary assignment by ﬁrst normalizing it by its 2-norm and then thresholding each of its elements by 0.8/k.

7.3 Single-Membership Community Detection

Benchmarking. For the ﬁrst set of experiments, we explore the performance of curvaturebased methods on planted SBMs with two blocks and various choices of parameters. We

Tian and Lubberts and Weber

start with graphs of two equally sized blocks, ﬁxed size n = 1000 and probability pout = 0.01, while changing the probability pin from 0.05 to 0.2. We then explore the critical region when pin is close to pout and the two blocks have diﬀerent sizes, n1, n2. For each set of parameters, we generate ns = 10 graphs and run our algorithms for both ORC and FRC, together with other popular community detection methods, the Louvain algorithm ( Louvain ) and variants of Spectral clustering ( Spectra ). Speciﬁcally, for ORC, the optimal transport can be obtained by solving the earth mover s distance exactly ( ORC-E ) or approximately with Sinkhorn ( ORC-S ), while we can also approximate it by the mean of the upper and lower bounds we have developed ( ORC-A ). For FRC, we can consider the 1-d version as in Equation (4.9) where no face information is included ( FRC-1 ), the augmented version with triangular faces as in Equation (4.11) ( FRC-2 ) and also the one with further quadrangular faces as in Deﬁnition 17 ( FRC-3 ). For the Louvain algorithm, we run it for 100 times, construct the co-occurrence matrix of the clustering results, and then run the algorithm one more time on the graph constructed by the co-occurrence matrix, in order to obtain consistent communities. For Spectral Clustering, we present the results from the following variants: (i) the highly optimized version in scikit-learn (Pedregosa et al., 2011) together with a grid search for the number of clusters, from 2 up to 10 ( Spectra-S ), and (ii) automatic embedding dimension selection (Zhu and Ghodsi, 2006), followed by kmeans clustering ( Spectra-A ). In Appendix C.1, we provide further details for the singlemembership community detection, where we also vary the network size, compare diﬀerent optimization methods for ORC, and explore the relationship between ORC and FRC. We also compare the curvature-based methods with other variants of Spectral Clustering in Appendix C.3, where instead of directly using the adjacency matrix, we can construct an aﬃnity by (iii) radius basis functions or (iv) nearest-neighbor techniques. We report the mean NMI, runtime, and their standard deviations (SD) for all methods. To maintain the detectability of the communities, we require the generated graphs to have modularity greater than 0.4 with the ground-truth communities if possible, or choose those with the highest modularity within 50 realizations. Our experimental results (see Table 2) demonstrate that the ORC approach outperforms most reference methods, and successfully recovers single-membership community structure, with an NMI greater than 0.8, as we vary the density via pin. We note that, unlike the curvature-based methods, Spectra-S requires the number of communities as an input parameter. This makes the problem of detecting communities considerably easier. For a fairer comparison, we perform a grid search over this parameter, however, since this includes the true number of communities, this can still be considered an unfair advantage. In addition, the implementation that is used in our experiments is highly optimized. Despite these caveats, we observe that curvature-based methods achieve comparable performance on larger graphs with more heterogeneous community structures. On synthetic data (SBM), as pin, pout become closer to each other (i.e., close to the community detection threshold), we observe that the drop in performance is less signiﬁcant in the ORC approach than in the reference methods (see Table 4). This indicates that curvature-based methods could be advantageous in regimes in which community detection is challenging with existing approaches. We only show the results from ORC-E, since ORC-S has almost the same performance as ORC-E. Furthermore, our experiments show that leveraging our proposed combinatorial ORC approximation is a viable alternative that can achieve performance comparable to the Lou-

Curvature-based Clustering on Graphs

vain algorithm and the Spectral Clustering. This is especially pronounced in denser graphs, where computing exact ORC can be prohibitively expensive. Being computationally very eﬃcient overall, the FRC approach can also retrieve the single-membership community structure when the graph is suﬃciently dense. Here, only the results from FRC-2 are included for the reasons that we will discuss later. See Appendix C.1 for more results.

pin Louvain Spectra-S Spectra-A ORC-E ORC-A FRC-2 0.05 0.998 (0.004) 0.997 (0.007) 0.997 (0.007) 0.997 (0.008) 0.225 (0.053) 0.176 (0.018) 0.1 1 (0) 1 (0) 1 (0) 1 (0) 0.993 (0.005) 0.275 (0.182) 0.15 1 (0) 1 (0) 1 (0) 1 (0) 1 (0) 0.990 (0.031) 0.2 1 (0) 1 (0) 1 (0) 1 (0) 1 (0) 1 (0)

Table 2: Mean (SD) of NMI from diﬀerent methods on planted two-block SBMs with n1 = 500, n2 = 500, pout = 0.01, and varying probabilities pin.

pin Louvain Spectra-S Spectra-A ORC-E ORC-A FRC-2 0.05 38.16 (0.892) 1.203 (0.098) 0.415 (0.046) 7.537 (0.518) 6.084 (0.507) 4.100 (0.320) 0.1 33.85 (0.451) 1.115 (0.032) 0.385 (0.013) 13.57 (0.333) 7.496 (0.293) 4.932 (0.192) 0.15 37.70 (0.533) 1.111 (0.027) 0.387 (0.010) 24.85 (1.326) 11.35 (0.644) 7.526 (0.668) 0.2 42.97 (0.794) 1.124 (0.037) 0.395 (0.014) 41.78 (1.285) 19.32 (1.331) 14.85 (1.239)

Table 3: Mean (SD) of runtime (in seconds) from diﬀerent methods on planted two-block SBMs with n1 = 500, n2 = 500, pout = 0.01, and varying probabilities pin.

pin Louvain Spectra-S Spectra-A ORC-E ORC-A FRC-2 0.02 0.019 (0.010) 0.160 (0.047) 0.056 (0.042) 0.213 (0.011) 0.126 (0.006) 0.128 (0.023) 0.04 0.008 (0.004) 0.018 (0.007) 0.007 (0.003) 0.170 (0.007) 0.132 (0.006) 0.114 (0.023) 0.06 0.003 (0.002) 0.010 (0.003) 0.003 (0.002) 0.151 (0.010) 0.128 (0.005) 0.108 (0.050) 0.08 0.004 (0.002) 0.012 (0.003) 0.004 (0.002) 0.092 (0.049) 0.130 (0.009) 0.078 (0.045)

Table 4: Mean (SD) of NMI from diﬀerent methods on planted two-block SBMs with n1 = 250, n2 = 750, and varying probabilities pin, pout where pin pout = 0.01.

Rationale for 2-complex FRC. For FRC, one could consider including contributions of higher-order faces in order to further improve the performance of community detection. Our experimental results (see Table 5) suggest that, while including triangle information is crucial, it is unclear whether encoding quadrangle leads to an improvement in accuracy. In dense graphs, including such information may even decrease performance. An exception is bipartite graphs, where quadrangle counts encode salient information. In addition, the computation of quadrangle counts is expensive, resulting in a signiﬁcant computational overhead, while the drop in eﬃciency is signiﬁcant. We further observe that FRC-2 (which includes only triangle information) has the strongest correlation with the clustering coeﬃcients, compared with FRC-1 and FRC-3 (see Figure 16 in Appendix C.1).

Towards an algorithm with integrated stopping criteria. In Algorithm 1, instead of choosing T a priori, we can integrate the choice with intermediate modularity check. One

Tian and Lubberts and Weber

pin FRC-1 FRC-2 FRC-3 NMI runtime NMI runtime NMI runtime 0.05 0.107 (0.087) 10.95 (1.386) 0.177 (0.013) 11.14 (1.447) 0.220 (0.100) 18.28 (15.99) 0.1 0.106 (0.087) 16.73 (1.103) 0.229 (0.118) 17.48 (1.075) 0.077 (0.155) 202.7 (5.996) 0.15 0.073 (0.089) 23.35 (0.353) 0.990 (0.013) 27.39 (0.992) 0.019 (0.056) 101.4 (133.7) 0.2 0.054 (0.083) 29.91 (0.571) 1 (0) 39.79 (1.702) 0.092 (0.092) 111.9 (1.680)

Table 5: Mean (SD) of NMI and runtime (in seconds) from diﬀerent FRC-based methods on planted two-block SBMs with n1 = 500, n2 = 500, pout = 0.01 and varying probabilities pin.

approach is to perform the modularity check at each iteration t, and record the best modularity value, Qt best. Then we stop the process when the best modularity value converges, where (Qt best Qt 1 best)/Qt 1 best < ϵd. With such integrated stopping criteria with intermediate modularity check, our experiments indicate that the algorithm usually stops in 2 or 3 steps, with comparable performance with the original version of Algorithm 1; see Table 6. Hence, on the one hand, the choice of T = 10 in our experiments is reasonable to provide suﬃciently good results, while not being far from the critical value for such good results. On the other hand, ﬁne-tuning the stopping criteria in the integrated version is a promising direction to pursue in the future to further improve the scalability of the algorithm.

n ORC-E ORC-int iterations 100 0.888 (0.106) 0.875 (0.108) 3.9 (1.136) 500 1 (0) 1 (0) 2.0 (1.244) 1000 1 (0) 1 (0) 2.0 (1.110)

Table 6: Mean (SD) of results from ORC-based methods on planted SBMs with two equallysized blocks, pin = 0.1, pout = 0.01 and varying sizes n, where the ﬁrst column corresponds to the NMI from Algorithm 1, the second column corresponds to the NMI from Algorithm 1 with intermediate modularity check and integrated stopping criteria, and the third column corresponds to the number of iterations in the integrated version.

7.4 Mixed-Membership Community Detection

Benchmarking. To demonstrate the performance of our algorithms, we generate graphs from planted MMBs with two equally-sized blocks n1 = n2 and diﬀerent conﬁgurations. Here, we ﬁx the network size of each block to be n1 = n2 = 150, no = 1 node of mixed membership and pout = 0, while varying the probability pin from 0.05 up to 0.2, and defer the results of varying the network size to Appendix C.2. For each set of parameters, we construct ns = 10 graphs, and run our algorithms for both ORC and FRC, together with the Louvain algorithm and Spectral clustering, in the same way as in the single-membership setting, but on the line graphs L(G). We report the mean NMI, runtime, and their standard deviations (SD) of the methods. For the purposes of detectability of communities, we consider graphs that have modularity greater than 0.4 with the ground-truth communities if possible or the ones with the highest modularity within 50 realizations. Our experimental results (see

Curvature-based Clustering on Graphs

Table 7 and Appendix C.2) demonstrate that the ORC-based approaches outperform most reference methods, where the ORC approach successfully recovers the mixed-membership community structure in almost all cases as we vary the probability pin, while the FRC approach can also retrieve the mixed membership when the network is suﬃciently dense. Since again the performance of ORC-S is almost the same as ORC-E, we only include the results from ORC-E. For the reasons that we have demonstrated in Section 5.3.3 and will discuss later, we only include the results from FRC-3.

pin Louvain Spectra-S Spectra-A ORC-E ORC-A FRC-3 0.05 0.957 (0.085) 0.997 (0.006) 0.442 (0.197) 0.996 (0.007) 0.590 (0.163) 0.657 (0.213) 0.1 0.968 (0.064) 0.999 (0.002) 0.940 (0.099) 0.998 (0.003) 0.683 (0.267) 0.730 (0.262) 0.15 0.984 (0.047) 0.999 (0.002) 0.971 (0.090) 0.999 (0.002) 0.709 (0.213) 0.800 (0.165) 0.2 1 (0) 1 (0) 0.982 (0.053) 1 (0) 0.714 (0.140) 0.814 (0.174)

Table 7: Mean (SD) of the extended NMI from diﬀerent methods on planted two-block MMBs with n1 = 150, n2 = 150, no = 1 node of mixed membership, pout = 0, and varying probabilities pin.

pin Louvain Spectra-S Spectra-A ORC-E ORC-A FRC-3 0.05 73.83 (4.757) 1.267 (0.068) 0.578 (0.373) 10.96 (1.730) 9.484 (0.585) 18.56 (2.449) 0.1 381.9 (44.93) 5.309 (0.601) 2.244 (0.305) 44.82 (5.303) 32.88 (6.112) 49.43 (14.67) 0.15 909.1 (69.85) 13.17 (0.802) 5.847 (0.452) 126.4 (21.195) 73.18 (6.555) 82.58 (9.904) 0.2 1465 (63.71) 26.78 (1.587) 12.05 (0.621) 241.6 (36.52) 123.3 (7.045) 125.2 (18.35)

Table 8: Mean (SD) of runtime (in seconds) from diﬀerent methods on planted two-block MMBs with n1 = 150, n2 = 150, no = 1 node of mixed membership, pout = 0, and varying probabilities pin.

ORC has comparable performance to spectral clustering on most datasets, and can outperform when the network has asymmetric communities, as shown in Table 4. Furthermore, we demonstrate the performance of our algorithms on real data via the two data sets described in Section 7.1. As in the synthetic networks, the ORC-based method outperforms the reference methods in most real networks, and the only case where the ORC-based method is ranked second is FB-1 , where the community structure is better captured by the Modularity, as implied by Table 1. Meanwhile, the FRC-based method can also outperform the reference methods, such as in DBLP-2 , indicating that the FRC can better characterize the underlying community structure in some cases. The results from the proposed approximation of ORC have more variance, which is expected since the real data has more variance in the structure of the networks and some may lie in the region where the approximation is relatively far from the exact value. However, the approximated version can also have comparable performance in most cases. We also observe an improvement of runtime from computing ORC by Sinkhorn optimization in FB-2 dataset (ORC-E: 5523s; ORC-S: 4941s; ORC-A: 2530s), since we allocate much larger memory for the computation (250G here instead of 25G in Appendix C.1).

Towards avoiding line graph construction. We note that only the upper bound requires global information of the line graph, while local information is suﬃcient to compute

Tian and Lubberts and Weber

Louvain Spectra-S Spectra-A ORC-E ORC-A FRC-3 DBLP-1 0.535 0.579 0.447 0.689 0.484 0.244 DBLP-2 0.652 0.708 0.358 0.969 0.463 0.737 FB-1 0.887 0.872 0.581 0.872 0.680 0.680 FB-2 0.516 0.652 0.510 0.652 0.242 0.287

Table 9: Extended NMI from diﬀerent methods on the real data.

the lower bound. Hence, one step towards avoiding the construction of line graph is to consider an upper bound that can be obtained without the information of the whole graph while performing suﬃciently well in the clustering tasks. Speciﬁcally, here we consider 1 as the upper bound, and denote the method using the average of 1 and the lower bound by ORC-A1 . Our numerical results indicate that ORC-A1 can further improve the eﬃciency from ORC-A while not signiﬁcantly aﬀecting the accuracy (see Table 10, where the NMI information is ignored because their values are close and all three methods can ﬁnd the ground-truth communities when pin = 0.2).

pin ORC-E ORC-A ORC-A1 0.1 426.8 182.2 144.6 0.2 2878 732.7 530.0

Table 10: Runtime (in seconds, including line graph construction) from diﬀerent ORC-based methods on one sample of planted two-block MMBs with n1 = n2 = 250, no = 1 node of mixed membership, pout = 0, and varying probabilities pin.

Rationale for 2-complex FRC in line graph. We have shown in Section 5.3.3 that FRC-2 in line graph can largely depend on the degree of nodes in the original graph, while the performance of FRC-1 for the task of community detection is limited, hence the corresponding algorithm may not lead to satisfactory results. Our experimental results (see Table 11) verify the above statement, where FRC-2 can have comparable performance to FRC-3 when node degree is moderate, but the performance drops signiﬁcantly and approaches FRC-1 as the network becomes denser.

pin FRC-1 FRC-2 FRC-3 0.1 0.315 (0.266) 0.816 (0.231) 0.854 (0.184) 0.2 0.269 (0.254) 0.856 (0.162) 0.895 (0.148) 0.3 0.328 (0.337) 0.859 (0.183) 0.982 (0.049) 0.4 0.560 (0.296) 0.673 (0.136) 0.978 (0.048)

Table 11: Mean (SD) of the extended NMI from diﬀerent FRC-based methods on planted two-block MMBs with n1 = n2 = 50, no = 1 node of mixed membership, pout = 0, and varying probabilities pin.

Rationale for detecting mixed-membership communities on the line graph. In the mixed-membership setting, it is still possible to apply algorithms that are designed

Curvature-based Clustering on Graphs

for the single-membership case, but the performance varies. Our experimental results (see Table 12) indicate that the improvement from applying the algorithm on the line graph is more signiﬁcant when the graph is sparser, even while there is only a few nodes of mixed membership, and also when there are more nodes of mixed membership. We should note that applying algorithms directly on the original graph cannot detect mixed-membership nodes, thus there is an upper bound on the accuracy that these methods can achieve and how far this upper bound is from 1 depends on the number of mixed-membership nodes.

pin no ORC-E (G) / (L) ORC-A (G) / (L) FRC-2 (G) / FRC-3 (L) 0.05 1 0.993 / 0.999 0.470 / 0.648 0.314 / 0.679 0.15 1 0.993 / 1 0.993 / 0.714 0.784 / 0.865 0.15 30 0.773 / 0.969 0.493 / 0.497 0.176 / 0.385

Table 12: Mean of the extended NMI from diﬀerent methods on the original graph G versus its line graph L, where the graphs are generated from planted two-block MMBs with n1 = n2 = 150, pout = 0.01, and varying number of mixed-membership node(s) and probability pin.

8. Comparison of Curvature Notions

In this ﬁnal section, we discuss diﬀerences and commonality between the two curvature notions that we considered. Speciﬁcally, we address geometric considerations, i.e., which structural features are best captured by which curvature notion and how this is reﬂected in their performance in clustering tasks.

8.1 Relationship between 2-complex FRC and ORC.

There is already strong numerical (Pouryahya et al., 2017; Samal et al., 2018) and theoretical evidence (Jost and M unch, 2021) of a close relationship between the 2-complex FRC and ORC in the prior literature. Speciﬁcally, Jost and M unch (2021) proved the following relationship between the two notions:

Theorem 23 (Jost and M unch (2021)) Ollivier s and Forman s curvature for the edges of a graph G are related as Ric O(e) = max K Ric F (e) , (8.1)

where the maximum is taken over all complexes K that have G as a 1-skeleton.

Remark 24 The Forman curvature considered by Jost and M unch (2021) deviates from Forman s orginal notion. However, one can recover the original notion as a special case via a speciﬁc choice of the cell weights. For details, see Jost and M unch (2021, sec. 7.4).

Theorem 23 implies that FRC and ORC coincide on the edges, when maximizing FRC over the choice of 2-cells whose contributions are included in curvature computation. The proof of this result proposes a means to translate between the contributions of 2-cells and the cost of transport plans.

Tian and Lubberts and Weber

Our numerical results (Figure 16) conﬁrm that the correlation between FRC and ORC becomes stronger, if FRC is augmented , i.e., if the contributions of higher-order structures (triangles, quadrangles) are taking into account for the curvature computation. In this paper, we considered a 2-complex notion of FRC in the original graph G (previously considered in (Weber et al., 2017b)) and approximate the contribution of k-faces, by including curvature contribution of faces up to order K (see Figures 17, 18 and 19 in Appendix C.4). Since the frequency of k-faces decreases sharply as k increases (illustrated in Figure 7), it is enough to consider small values of K. In our application to mixed-membership community detection, we considered K = 4. We found that this parameter choice balances fast computation (complexity increases with K) and performance (accuracy increases with K). Our numerical experiments conﬁrmed that the performance of the FRC-based community detection algorithm closer resembles that of the ORC-based community detection algorithm as K increases. We have previously noticed in Figures 8 and 9 that the range of curvature values varies between ORC and classical FRC, while the shapes of the distributions resemble each other. Notice that the range of curvature values between the 2-complex FRC and ORC is much better aligned, which is in agreement with their similar performance in downstream tasks.

8.2 Diﬀerences in structural features captured by FRC and ORC.

8.2.1 Relationship with clustering coefficient

The clustering coeﬃcient, initially introduced by Watts and Strogatz (1998), is a networktheoretic measure, which is correlated with the ability of clustering algorithms to detect communities. Formally, it measures how close a node neighborhood is to a clique (i.e., a fully connected subgraph). It is deﬁned as the ratio of the number of edges between neighbors of a node u V and the number of edges in the clique, if the neighborhood of u were fully connected:

Deﬁnition 25 (Clustering coeﬃcient) Let A = (aij)1 i,j, n denote the adjacency of a graph G and u V the node whose connectivity is characterized by the ith row in the adjacency. Then

C(u) := P v u #(u, v) du(du 1) =

P j,k aijajkaki du(du 1) . (8.2)

It has been previously observed empirically (Pouryahya et al., 2017; Samal et al., 2018) that node-level ORC is highly correlated with the clustering coeﬃcient. FRC is also correlated with the clustering coeﬃcient, although the degree of correlation depends signiﬁcantly on the type of higher-order structures that is included (see Figure 16). We see that FRC correlates best with the clustering coeﬃcient, if triangle contributions are included in the curvature computation, i.e., for our notion of 2-complex FRC. This is also the FRC notion that resembles ORC most closely. Importantly, the close relationship between ORC and the clustering coeﬃcient can also be established theoretically. In particular, Jost and Liu (2014) showed the following result:

Theorem 26 (Jost and Liu (2014), Cor. 1)

du C(u) Ric O(u) 2 + du 1 du maxv u dv C(u) .

Curvature-based Clustering on Graphs

To the best of our knowledge, no theoretical connection between FRC and the clustering coeﬃcient is known.

8.2.2 Node clustering vs. edge clustering.

Notice that Algorithm 1 when applied to the underlying graph G performs a node clustering, whereas it performs an edge clustering when applied to the line graph L(G). Speciﬁcally, if Algorithm 1 is applied to the line graph, then we obtain a mixed-membership label vector y for each vertex v by computing yl(v) = 1 |Ev| P e Ev χCl(e), where χCl is the indicator for the cluster Cl. This assignment is based on an edge clustering in the line graph, i.e., in Equation (2.6) above we assume that edge clusters correspond to node clusters. Hence, our approach solves single-membership community detection via node clustering and mixedmembership community detection via edge clustering. Both clustering approaches utilize the implicit assumption that the underlying graph is homophilic, i.e., similar nodes are more densely connected than dissimilar nodes. Node clustering identiﬁes clusters of nodes by ﬁnding densely connected subgraphs. We have seen above that dense connectivity increases curvature; consequently, our curvature-based algorithm ﬁnds clusters by identifying regions of low curvature. In contrast, edge clustering identiﬁes groups of edges that connect a set of similar nodes. Due to the higher connectivity among sets of similar nodes, those edges have a higher number of neighboring edges (i.e., edges that are adjacent to the same node). Consequently, those edges form a densely connected subgraph in the line graph. Applying curvature-based clustering on the line graph allows us to identify these densely connected subgraphs, due to their higher curvature. Throughout the paper, we have provided empirical and theoretical evidence that all curvature notions considered here have higher value in densely connected subgraphs of any graph. However, our study has demonstrated that the diﬀerence of curvature values of edges between communities, which is directly linked to the performance of curvature-based clustering, is more pronounced for some notions. In particular, our analysis shows that ORC highlights this eﬀect best. Among the diﬀerent FRC variants studied here, we showed that incorporating higher-order faces (triangles in the original graph, triangles and quadrangles in the line graph) leads to the best results. The eﬀect of Algorithm 1 may also be interpreted as a graph coarsening scheme. When applied to a graph G, our approach uncovers meso-scale structure, such as communities. The interpretation of Algorithm 1 as coarsening the line graph is perhaps less intuitive. Nevertheless, following again the homophily perspective above, we see that curvature-based edge clustering groups edges that connect similar nodes, uncovering crucial meso-scale structure in the original graph.

8.2.3 Computational Considerations.

An often cited argument in the previous literature is that variants of 2-complex FRC are preferable over ORC in applications, due to their faster computation. In particular, the known links between the spectrum of the graph Laplacian and clustering coeﬃcients (Jost and Liu, 2014) mentioned above make ORC a natural choice for curvature-based clustering. Previous approaches have either computed ORC exactly via the Hungarian method (O(|E|d3 max)) or approximated it via Sinkhorn s method (O(|E|d2 max)), both of which are signiﬁcantly more expensive than computing Forman s curvature with triangle augmentions

Tian and Lubberts and Weber

(O(|E|dmax)). We propose above an approximation via the arithmetic mean of upper and lower curvature bounds and illustrate that this approximation is on par with, or in certain instances, even superior to the Sinkhorn approximation. Importantly, our proposed approximation can be computed in O(|E|dmax) operations, so it has complexity that is comparable to that of FRC. Hence, computational considerations do not provide a clear preference for our approximate ORC approach over FRC. In addition, we propose above a variant of Algorithm 1 via ORC, which does not require the computationally expensive construction of the line graph.

9. Discussion

In this paper, we have proposed a unifying framework for curvature-based node clustering algorithms and systematically investigated the strength and weakness of diﬀerent curvature notions, speciﬁcally variants of Forman s and Ollivier s Ricci curvature. In addition to single-membership community structure, which is the focus of most existing work, we have also considered the mixed-membership setting and extended curvature-based clustering approaches to this case. To this end, we have further studied discrete curvature on the line graph. Furthermore, we have proposed an eﬀective approximation of Ollivier s notion of discrete curvature to overcome its scalability issues. This construction may be of independent interest. We emphasize that the results derived in this paper provide insights into the relationship of discrete curvatures on a graph and that of its dual, which may be of independent interest. Throughout the experiments, we observe consistently convincing performance of the curvaturebased methods. Speciﬁcally, the ORC-based approach can outperform classical community detection methods, the Louvain algorithm and Spectral Clustering, in dense synthetic and real networks, in both singleand mixed-memberships settings. In regimes where community detection is challenging (e.g., close to the detectability threshold in SBMs), curvaturebased methods outperform reference methods. Although the experimental results indicate quadratic complexity of the ORC-based method, the proposed approximation s performance is comparable to classical methods and, in relatively large or dense graphs, even computationally superior. The results agree with the linear complexity of the approximated ORCbased approach, as demonstrated in Section 6.1.3. The FRC-based approach, which has a lower complexity, can outperform or has comparable performance to the Louvain algorithm, and to spectral clustering in dense graphs. Note that we consider diﬀerent variants of Forman s Ricci curvature for singleand mixed-membership node clustering problems, where we only consider triangular faces in the former while incorporating the quadrangular faces in the latter. We also observe that FRC with triangular faces has a stronger relationship with the clustering coeﬃcients in SBMs, while FRC with further quadrangular faces has a slightly stronger relationship with the ORC in MMBs. As byproduct, the results provide some experimental evidence to relate Forman s to Ollivier s Ricci curvature. Several extensions of this work could be avenues for future investigation. In the variant of Algorithm 1 presented here, several crucial hyperparameters are chosen heuristically. Specifically, we believe that the choice of cut-oﬀpoints could be improved, potentially utilizing geometric information, such as curvature gaps (Gosztolai and Arnaudon, 2021). The choice of the stopping criterion (T) could be guided by established heuristics in the community

Curvature-based Clustering on Graphs

detection literature. For instance, the Girvan-Newman algorithm constructs a dendrogram to inform the stopping criterion and edge threshold , when the number of communities is not known a priori. Our mixed-membership approaches require the construction of the line graph, which adds signiﬁcant computational cost when applied to large-scale graphs. Approaches that avoid the construction of the line graph, perhaps utilizing the relationship between the Ricci curvature of a graph and its dual, could remove this bottleneck. We have also seen that using 1 as the upper bound for ORC could already improve the eﬃciency in our experiments. These approximations may prove computationally beneﬁcial in both the mixed-membership and single-membership settings. While Algorithm 1 allows for clustering in both unweighted and weighted networks, it cannot be applied to directed networks in its current form. An extension to directed networks would signiﬁcantly widen the range of possible applications. Further study of the behavior of discrete curvatures in various random graph models could serve to improve both theoretical understanding of these graphs, as well as to inform the development of improved algorithms for certain graph distributions. Finally, while our experiments indicate that curvature-based methods compare favorably to community detection baselines, the question of theoretical optimality remains open and presents an important avenue for future work.

Acknowledgments

Part of the computations in this paper were run on the FASRC Cannon cluster supported by the FAS Division of Science Research Computing Group at Harvard University. Part of the computations were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at the PDC Center for High Performance Computing, KTH Royal Institute of Technology, partially funded by the Swedish Research Council through grant agreement no. 2018-05973. We would also like to acknowledge the use of the University of Oxford Mathematical Institute compute facility in carrying out preliminary work. Most of the work was done when YT was at Nordita, Stockholm University and KTH Royal Institute of Technology, where she was funded by the Wallenberg Initiative on Networks and Quantum Information (WINQ). Part of the work was done when YT was at Mathematical Institute, University of Oxford, where she was funded by the EPSRC Centre for Doctoral Training in Industrially Focused Mathematical Modelling (EP/L015803/1) in collaboration with Tesco PLC. Part of this work was done while MW was at the University of Oxford, supported by a Hooke Research Fellowship. MW further acknowledges partial support from NSF CBET-2112085 and a Sloan Fellowship in Mathematics. Over the course of this work, ZL has been funded under NIH grant MH-19-147, NSF HDR TRIPODS grant 1934979, and NSF DMS grant 2113099. He would also like to acknowledge support from the Naval Engineering Education Consortium and Microsoft Research.

Tian and Lubberts and Weber

Appendix A. ORC on weighted graphs

A.1 Proofs of Results in Section 4.1

Proof of Lemma 5: Consider the subgraph of G given represented schematically in Figure 12.

Figure 12: Schematic of the subgraph H of G, focusing on x, y and their neighbors. Note that typically there are several nodes ℓ1, . . . , ℓj1, c1, . . . , cj2, and r1, . . . , rj3, satisfying deg(x) = j1 + j2 + 1, deg(y) = j2 + j3 + 1. The support of the mx measure is shown in blue, my in red.

Now deﬁne the following quantities:

ℓ mx(ℓ) Ly = X

ℓ my(ℓ) (= 0)

Xx = mx(x) Xy = my(x)

c mx(c) Cy = X

Yx = mx(y) Yy = my(y)

r mx(r) (= 0) Ry = X

If we consider transport plans restricted to H, then all of the Lx mass must be transported to x, and all of the Ry mass must be transported from y. As such, we may consider a modiﬁed diagram as in Figure 13.

Figure 13: Schematic of the subgraph H of H, focusing on x, y and their common neighbors. The support of the mx measure is shown in blue, my in red.

Curvature-based Clustering on Graphs

Here we deﬁne mx, my as follows:

Lx + Xx if v = x mx(c) if v = c Yx + Rx (= Yx) if v = y,

Ly + Xy (= Xy) if v = x my(c) if v = c Yy + Ry if v = y.

Then we have

W1(mx, my) X

ℓ ω(ℓ, x)mx(ℓ) + X

r ω(r, y)my(r) + W1( mx, my),

and this is tight when x, y have no unshared neighbors. Indeed, if we only consider the subgraph H of G, this upper bound is the exact value of W1(mx, my). Deﬁne Xx, Cx, Yx, Xy, Cy, Yy analogously for these new measures. Now for each vertex of type c, if mx(c) my(c) > 0, we will send this amount to y along (c, y). Otherwise, we will send my(c) mx(c) from x along (c, x). This results in the updated (signed) measure

Xx P c( my(c) mx(c))+ if v = x my(c) if v = c Yx + P c( mx(c) my(c))+ if v = y.

Finally, if ˆmx(x) Xy > 0, send this amount to y along the edge (x, y). Otherwise, send ˆmx(y) Yy = ( ˆmx(x) Xy) from y to x along the (x, y) edge. This gives the upper bound

W1( mx, my) X

c ( mx(c) my(c))+ω(c, y) + ( my(c) mx(c))+ω(c, x)

c ( my(c) mx(c))+

Finally, we have

W1(mx, my) X

ℓ ω(ℓ, x)mx(ℓ) + X

r ω(r, y)my(r)

c ω(c, y)(mx(c) my(c))+ + ω(c, x)(my(c) mx(c))+

Lx + Xx Xy X

c (my(c) mx(c))+

ω(x, y) = Ux,y.

For completeness, we state and prove the following lower bound on the Wasserstein distance:

Lemma 27 For any 1-Lipschitz function h(v) on G, we have

W1(mx, my) Ev mx[h(v)] Ev my[h(v)] = X

v N(x) N(y) h(v)(mx(v) my(v)) .

Tian and Lubberts and Weber

Proof Given a coupling π for mx and my, and a 1-Lipschitz function h : V R, we have

E(u,w) π[d G(u, w)] = X

u N(x) w N(y)

d G(u, w)π(u, w)

u N(x) w N(y)

(h(u) h(w))π(u, w)

v N(x) N(y) h(v)(mx(v) my(v)).

Since the choice of coupling was arbitrary, we conclude that W1(mx, my) satisﬁes the same lower bound.

Now we may prove Lemma 7.

Proof of Lemma 7: Let VH = N(x) N(y) be the set of vertices from the subgraph H of G that we considered in the upper bound for W1(mx, my). A diﬃculty that arises in deﬁning a 1-Lipschitz function h on G is that there are more edges and paths in G than just those in H. These short-circuits in the larger graph mean that a 1-Lipschitz function deﬁned on H does not necessarily extend to a 1-Lipschitz function on G. We deﬁne a partition of VH into three subsets P, Z, N based on whether mx(v) my(v) is Positive, Zero, or Negative. Observe that for a vertex v P, the excess mass mx(v) my(v) must be transported away from v in order to have a valid transport plan. In fact, it must be transported at least enough to reach N, since

v V mx(v) my(v) = X

v P (mx(v) my(v)) + X

v N (mx(v) my(v))

implies that mx(P) my(P) = my(N) mx(N). In other words, all excess mass from mx in P must be transported to vertices in N, or we cannot have a valid transport plan. Motivated by this observation, we deﬁne the distance from a vertex v V to a set of vertices S V as d G(v, S) = minu S d G(v, u), where d G is the shortest path distance in G. Thus, d G(v, S) gives the distance in G from the vertex v to the nearest element in the set S. Consider the following 1-Lipschitz functions parameterized by λ [0, 1]:

λd G(v, N) if v P (1 λ)d G(v, P) if v N λd G(v, N) (1 λ)d G(v, P) otherwise.

Note that the third equation works for all v V , since d G(v, P) = 0 for v P and similarly for N, but we split this into cases to show the behavior on the diﬀerent sets more clearly. These functions are all 1-Lipschitz since they are convex combinations of the two 1Lipschitz functions d G( , N) and d G( , P). Let s show that these are 1-Lipschitz: Let

Curvature-based Clustering on Graphs

S V , and let u, v V . Let wu, wv S be the nearest vertices in S to u, v respectively, so that d G(u, wu) = d G(u, S) and similarly for wv. Then

d G(v, S) d G(v, wu) d G(v, u) + d G(u, wu)

= d G(u, v) + d G(u, S)

d G(u, S) d G(u, wv) d G(u, v) + d G(v, wv)

= d G(u, v) + d G(v, S)

= |d G(v, S) d G(u, S)| d G(u, v).

But since the objective function Ev mx[h(v)] Ev my[h(v)] is linear in h, we see that the maximizer of this quantity over {hλ : λ [0, 1]} must be achieved at h0 or h1. This gives the following lower bound for W1(mx, my) :

W1(mx, my) max λ {0,1}

v VH hλ(v)(mx(v) my(v))

v P hλ(v) (mx(v) my(v)) | {z } >0

v Z hλ(v) (mx(v) my(v)) | {z } =0

v N hλ(v) (mx(v) my(v)) | {z } <0

v P d G(v, N)(mx(v) my(v)), X

v N d G(v, P)(my(v) mx(v))

We note that vertices of type ℓwill always belong to P, and vertices of type r will always belong to N. Vertices of type c may fall into either set, and there will typically be some of these vertices in both sets. When we set α = 0, then we always have x N and y P, though if we choose α 1/2, x cannot be in N, and y cannot be in P.

A.2 Note on combinatorial ORC bound by Jost-Liu

Remark 28 We note that the lower bound for W1(mx, my) given in Jost and Liu (2014) for weighted graphs using the measure mx(v) = wvx/dx is actually incorrect. Consider the weighted graph G shown in Figure 14, which also summarizes the measures mx, my.

x y ℓ c mx 0 1/6 1/2 1/3 my 1/3 0 0 2/3 h -6/5 -1/5 1 4/5

Figure 14: The weighted graph G, the measures mx, my from Jost and Liu (2014), and our 1-Lipschitz function h.

Tian and Lubberts and Weber

An optimal transport plan sends 1/6 mass from y to x and from c to x, and 1/2 mass from ℓto c, for a cost of 3/5. We can verify the optimality of this transport plan with the 1-Lipschitz function h given in the table in Figure 14, since this also achieves the value 3/5. As such, the exact ORC of the edge {x, y} in G is 1 3/5 = 2/5. However, the upper bound on this ORC from Jost and Liu (2014) is

A.3 Proof of Results in Section 5.2

Here is the proof of Lemma 11. Proof Let (u, v) denote the number of triangles in G which include the edge {u, v}, and ({u, v}, {v, w}) the number of triangles in L(G), which include {{u, v}, {v, w}}. Both bounds follow from a simple combinatorial computation: For the upper bound, we have

Ric L(G) O ({{u, v}, {v, w}}) ({u, v}, {v, w})

d{u,v} d{v,w} = dv 2 + 1E({u, w})

(du dw) + dv 2 .

The lower bound follows from

Ric L(G) O ({{u, v}, {v, w}}) 1 1 d{u,v} 1 d{v,w} ({u, v}, {v, w})

d{u,v} d{v,w}

1 1 d{u,v} 1 d{v,w} ({u, v}, {v, w})

d{u,v} d{v,w}

+ + ({u, v}, {v, w})

d{u,v} d{v,w}

= 1 1 du + dv 2 1 dv + dw 2 dv 2 + 1E({u, w})

(du dw) + dv 2

1 1 du + dv 2 1 dv + dw 2 dv 2 + 1E({u, w})

(du dw) + dv 2

+ dv 2 + 1E({u, w})

(du dw) + dv 2 .

A.4 Proofs and Remarks from Section 5.3.1

We prove Lemma 13. Proof The relationship arises from simple combinatorial arguments. Recalling Lemma 10, we have

Ric L(G) F ({{u, v}, {v, w}}) = 4 d{u,v} d{v,w} = 4 (du dv 2) (dv + dw 2)

= 8 du dv dv dw

= Ric G F ({u, v}) + Ric G F ({v, w}),

Curvature-based Clustering on Graphs

which gives the claim. Similarly,

Ric L(G) F ({u, v}) = X

{{u,v},{v,w}} E Ric L(G) F ({{u, v}, {v, w}}) + X

{{u,v},{u,w}} E Ric L(G) F ({{u, v}, {u, w}})

{v,w} E w =u

Ric G F ({u, v}) + Ric G F ({v, w}) + X

{u,w} E w =v

Ric G F ({u, v}) + Ric G F ({u, w})

= d{u,v}Ric G F ({u, v}) + X

{v,w} E w =u

Ric G F ({v, w}) + X

{u,w} E w =v

Ric G F ({u, w})

= (du + dv 2)Ric G F ({u, v}) + Ric G F (v) Ric G F ({u, v}) + Ric G F (u) Ric G F ({u, v})

= Ric G F (u) + Ric G F (v) Ric G F ({u, v})2.

We now prove Lemma 16. Proof

Ric F ({{u, v}, {v, w}})

ω + 2 n e Γ {u, v} : {u, v}, {v, w}, e / F o

n e Γ({v, w}) : {{u, v}, {v, w}, e} / F o

= dv 2 + 1E({u, w})

ω + 2 du 1 1E({u, w}) + dw 1 1E({u, w})

ω du dw + 4 2

ω + 2 1E({u, w}) .

Weights for quadrangles in augmented FRC To determine the weights of quadrangles, we again utilize Heron s formula: If we treat the two edges with the largest weights to be parallel, we can compute the weight of a quadrangular face via the area of the two triangles that form the trapezoid. Speciﬁcally, let f = (ei, ej, ek, el) denote a quadrangle in G, i.e., ei ej, ej ek, ek el, el ei while ei ek and ej el, where ωei ωej ωek ωel, and we consider three diﬀerent cases here: (i) ωei > ωej, i.e., when the two largest weights (or lengths) are not the same, (ii) ωei = ωej and ωek > ωel, i.e., when the two largest weights are the same while the other two are diﬀerent, and (iii) ωei = ωej and ωek = ωel, i.e., when the two largest are the same while the others are also the same. In case (i), we assume that ei, ej are parallel to each other, and then the area of the quadrangle can be obtained via

Tian and Lubberts and Weber

Heron s formula:

s(s (ωei ωej))(s ωek)(s ωel) 1 + 2 ωej ωei ωej

s = (ωei ωej) + ωek + ωel

In case (ii), we assume that ek, el are parallel to each other, and similarly

s(s (ωek ωel))(s ωei)(s ωej) 1 + 2 ωel ωek ωel

s = (ωek ωel) + ωei + ωej

Finally, in case (iii), we assume that f is a rectangle, thus

ωf := ωeiωek . (A.6)

A.5 Proof of Theorem 20

Proof Due to the homogeneity of the graphs Ga,b, which is preserved under the Ricci ﬂow, it is suﬃcient to analyze the evolution of one edge of types (1), (2), (3). In the following, di, Di, wt i, κt i denote the shortest-path distance, Wasserstein-1 distance, weight in iteration t, and ORC curvature of an edge of type (i) in iteration t, respectively. Notice that here

wt+1 i = (1 κt i)wt i = wt i 1 Di

wt i = Di , (A.7)

since di = wt i. Throughout the proof, x, y denote the vertices adjacent to the edge under consideration.

Iteration 1. In the ﬁrst iteration, all edges have the same weight (i.e., 1), hence, Equation (6.2) reduces to the uniform distribution.

Edges of type (3) have symmetric neighborhoods, with uniform mass distributions of mv = 1

a on all neighbors of x, y. Thus, the total transportation cost of moving mass from x s neighbors to y s neighbors is D3 = 1

ad3 and therefore w1 3 = D3 = 1

a, since d3 = 1.

For edges of type (1), note that dx = dy = a + (b 1), hence mv = 1 a+b 1 for all neighbors of x, y. By construction, x, y have b 2 common neighbors; due to the symmetry of the neighborhoods, no mass has to be moved for those neighbors. The optimal transportation plan for the remaining mass is as follows: Move the remaining mass a a+b 1 from the neighbors of x (which are not neighbors of y) to x. This incurs a cost of a a+b 1d2, since this mass is transported along edges of type (2). Leave a mass of 1 a+b 1 on x and transport the remainder to y, which incurs a cost of a 1 a+b 1d1, since it is transported along an edge of type (1). Lastly, we distribute mass a a+b 1 from y uniformly to its neighbors, which incurs a cost of a a+b 1d2, since it is transported along edges of type (2). In total, the transportation cost amounts to

D1 = 2a a + b 1d2 + a 1 a + b 1d1 ,

Curvature-based Clustering on Graphs

To prove that this is an optimal ﬂow, consider the following 1-Lipschitz function h:

2 if v x, v y 1 if v = x 0 if v = y or v x, y 1 if v y, v x.

Ev mx[h(v)] Ev my[h(v)] = 2 a a + b 1 + 1 ( 1) a + b 1 1 ( a) a + b 1 = 3a 1 a + b 1.

This implies w1 3 = D3 = 3a 1 a+b 1, since d1 = d2 = 1.

For edges of type (2), let x denote the internal node and y the bridge node. By construction, x, y have a 1 common neighbors (internal nodes). We want to leave a mass of 1 a+b 1 on each of those nodes (corresponding to the mass distribution in the neighborhood of y), and transport the excess mass to x and the unshared neighbors of y in an optimal way. Observe that 1

a > 1 a+b 1, so we have a total of (a 1) 1 a 1 a+b 1

excess mass to move. If this exceeds 1 a+b 1, we send that amount to x, and the remaining amount to y; otherwise we send all of this mass to x. We can simplify this condition as follows:

a 1 a + b 1

1 a + b 1 = (a 1)(b 1) a

a(a + b 1) 0

if and only if (a 1)(b 1) a = a(b 2) (b 1) 0. When b = 2, this never holds, and when b 3, this always holds, since a(b 2) (b 1) a b + 1 0, given that a b. Now let us consider these cases in turn.

Case b = 2: We send all of the excess mass from the common vertices to x, amounting to a 1 a(a+1) < 1 a+1, meaning that we must send an additional 1 a(a+1) mass from y to x, and the remaining 1 a+1 is sent to y s single unshared neighbor. The total cost is

D2 = a 1 a(a + 1)d3 + 1 a(a + 1)d2 + 1 a + 1d1.

To see that this is an optimal ﬂow, consider the following 1-Lipschitz function h:

( 1 if v x, y or v = y 0 otherwise.

This achieves a dual objective function value of 2 a+1, so w1 2 = D2 = 2 a+1, since d1 = d2 = d3 = 1.

Case b 3: We send 1 a+b 1 of the excess mass from the common vertices to x, and

the remainder (a 1)(b 1) a

a(a+b 1) to y. Since there is already a mass of 1

a at y, this gives a

Tian and Lubberts and Weber

total mass of (a 1)(b 1) a+(a+b 1)

a(a+b 1) = b 1 a+b 1 to be sent to y s unshared neighbors, as required. The total cost is

D2 = 1 a + b 1d3 + (a 1)(b 1) a

a(a + b 1) d2 + b 1 a + b 1d1.

To see that this is an optimal ﬂow, consider the following 1-Lipschitz function h:

1 if v x, y 0 if v = x, y 1 if v y, v x.

This achieves a dual objective function value

Ev mx[h(v)] Ev my[h(v)] = 1(a 1)(b 1)

a(a + b 1) 1 (b 1)

a + b 1 = (2a 1)(b 1)

a(a + b 1) ,

which gives w1 2 = D2 = (2a 1)(b 1)

a(a+b 1) = 2a 1 a+b 1 b 1

a , since d1 = d2 = d3 = 1.

We summarize these calculations in the following table:

Type 1 2 3 b = 2 3a 1

1 a b 3 3a 1 a+b 1

(2a 1)(b 1)

In both cases, it is easy to check that w1 1 > w1 2 > w1 3, using a b 2.

Iteration t+1. In round t 1, the mass distributions on the neighborhood of a bridge node are given by

( 1 Cb e wt 1, y is bridge node

1 Cb e wt 2, else ,

and for an internal node by

( 1 Cin e wt 2, y is bridge node

1 Cin e wt 3, else .

Here the normalizing constants for bridge nodes is given by

Cb = ae wt 2 + (b 1)e wt 1 ,

and for internal nodes as Cin = e wt 2 + (a 1)e wt 3 .

By way of induction, we suppose that wt 1 > wt 2 and wt 1 > wt 3, where we proved this in the case t = 1 above. We again consider the three edge types separately. Given wt i, edge weights are updated as follows in iteration t:

Curvature-based Clustering on Graphs

For edges of type (3), due to the symmetry of the neighborhoods, we again only move mass from x to y at a total cost of D3 = wt 3 Cin e wt 3. Hence, the updated weights are

wt+1 3 = wt 3 Cin e wt 3 by Equation (A.7).

For edges of type (1), the transport plan is analogous to the one in iteration 1: We move a mass of a Cb e wt 2 from the neighbors of x to x, leave a mass of 1 Cb e wt 1 there,

and transport the remaining mass a Cb e wt 2 1

Cb e wt 1 to y. Picking up a mass of 1 Cb e wt 1 previously on y, we distribute the mass among y s neighbors. The total transportation cost is D1 = a

Cb e wt 2d2 + a

Cb e wt 2 1

Cb e wt 1 d1 + a

Cb e wt 2d2 .

To prove that this is an optimal ﬂow, consider the following 1-Lipschitz function h:

wt 2 + wt 1 if v x, v y wt 1 if v = x 0 if v = y or v x, y wt 2 if v y, v x.

Ev mx[h(v)] Ev my[h(v)] = (wt 2 + wt 1) a

Cb e wt 2 + wt 1 ( 1)

Cb e wt 1 wt 2 ( a)

Consequently, the updated weights are wt+1 1 = 2awt 2+awt 1 Cb e wt 2 wt 1 Cb e wt 1.

For edges of type (2), let x denote the internal node and y the bridge node. At any of the a 1 common neighbors of x, y, we have

mx(v) = e wt 3 Cin , my(v) = e wt 2 Cb .

We deﬁne the following quantity

ct = mx(v) my(v) = e wt 3Cb e wt 2Cin Cb Cin

= ae wt 2 wt 3 + (b 1)e wt 1 wt 3 e 2wt 2 (a 1)e wt 2 wt 3 Cb Cin

= (b 1)e wt 1 wt 3 + e wt 2(e wt 3 e wt 2) Cb Cin .

When wt 2 > wt 3, this is always positive, but if wt 2 < wt 3, this could be negative.

If ct 0, then as before, we want to transport the excess mass from these common neighbors to x and the unshared neighbors of y in an optimal way. If the total excess mass exceeds my(x), we send that amount to x, and the remaining amount to y; otherwise we send all of this mass to x. This condition may be written as

rt := (a 1)e wt 3 Cin ae wt 2 Cb 0.

Tian and Lubberts and Weber

Clearly, rt = (a 1)ct e wt 2 Cb < (a 1)ct.

Case ct, rt 0: We send my(x) mass from the common neighbors to x, and the remaining portion to y. Then we send all of the mass at y to its unshared neighbors, for a cost of

D2 = e wt 2 Cb d3 + rtd2 + (b 1)e wt 1 Cb d1.

Consider the 1-Lipschitz function h given by

wt 2 if v x, y wt 2 wt 3 if v = x 0 if v = y wt 1 if v y, v x.

This gives a dual objective function value

wt 2(a 1)ct + (wt 2 wt 3) e wt 2 Cb wt 1 (b 1)e wt 1 Cb ,

which matches the formula for D2 above when we take into consideration that dj = wt j at iteration t.

Case ct 0, rt 0: We send all of the excess mass from the common neighbors to x, and get the remaining portion from y. We then send the rest of the mass at y to its unshared neighbors, for a total cost of

D2 = (a 1)ctd3 + ( rt)d2 + (b 1)e wt 1 Cb d1.

Consider the 1-Lipschitz function h given by

wt 3 if v x, y 0 if v = x wt 2 if v = y wt 2 wt 1 if v y, v x.

To show that this function is 1-Lipschitz, note that if v is a common neighbor of x, y, then |h(v) h(y)| = |wt 2 wt 3| wt 2: either wt 2 > wt 3, in which case |wt 2 wt 3| = wt 2 wt 3 wt 2, or else wt 2 < wt 3, and wt 3 wt 2 wt 2 follows by the triangle inequality since wt 3 = W1(mv, mx) W1(mv, my) + W1(my, mx) = 2wt 2, where W1 is the Wasserstein1 distance, and mx, mv, my are the measures at the vertices x, v, y with the weights from iteration t 1 (or when t = 1, w0 3 = 1 < 2 = w0 2).

This gives a dual objective function value

wt 3(a 1)ct + wt 2 e wt 2 Cin + (wt 2 wt 1) (b 1)e wt 1 Cb .

Curvature-based Clustering on Graphs

This gives the equality when we note that

rt = ae wt 2 Cb (a 1)e wt 3 Cin = 1 (b 1)e wt 1 Cb

1 e wt 2 Cin

= e wt 2 Cin (b 1)e wt 1 Cb ,

where the second equality follows from the deﬁnitions of Cb and Cin.

Case ct, rt 0: If ct 0, then rt < (a 1)ct 0. Since my places more mass at the common vertices than mx, the only vertex on which mx places more mass than my must be y. Thus, we send all of the needed mass from y to x, the common neighbors, and the unshared neighbors of y, for a cost of

D2 = e wt 2 Cb d2 + (a 1)( ct)d2 + (b 1)e wt 1 Cb d1.

Consider the 1-Lipschitz function h given by

wt 2 if v x, y wt 2 if v = x 0 if v = y wt 1 if v y, v x.

This gives a dual objective function value

wt 2(a 1)ct + ( wt 2)

+ (b 1)( wt 1)

which matches the formula for D2 above.

We summarize these calculations in the following table:

Type i wt+1 i 1 wt 1 ae wt 2 e wt 1 Cb + 2wt 2 ae wt 2 Cb 2 (ct, rt 0) wt 1 (b 1)e wt 1 Cb + wt 2rt + wt 3 e wt 2 Cb 2 (ct 0, rt 0) wt 1 (b 1)e wt 1 Cb + wt 2( rt) + wt 3(a 1)ct

2 (ct, rt 0) wt 1 (b 1)e wt 1 Cb + wt 2( rt)

3 wt 3 e wt 3 Cin

We are now ready to prove the main two theorem statements. First, we show that wt 3 is decaying to 0. Since a > b 2, we have that a 1 2. Then since Cin (a 1)e wt 3, we get

wt+1 3 = wt 3 e wt 3 Cin wt 3 1 a 1.

This gives wt+1 3 [a(a 1)t] 1, which tends to 0 exponentially quickly as t .

Tian and Lubberts and Weber

Now we wish to show that wt 1 > wt 2 and wt 1 > wt 3 for all t 1. From the proof above, we know that wt 3 < 2wt 2. We will also need the inequalities

ae wt 2 be wt 1 > 0 (A.8)

which holds from a > b and the induction hypothesis wt 1 > wt 2; and

ae wt 2 Cb > rt (A.9)

Cinae wt 2 Cin Cbrt > 0

2ae 2wt 2 + a(a 1)e wt 2 wt 3 (a 1)(b 1)e wt 1 wt 3 > 0,

which holds since e wt 2 > e wt 1 and a > b 1. Note that this implies

ae wt 2 Cb a 1

2 e wt 3 Cin e wt 3 Cin .

We ﬁrst show wt+1 1 > wt+1 3 .

wt+1 1 wt+1 3 = wt 1 ae wt 2 e wt 1 Cb + 2wt 2 ae wt 2 Cb wt 3 e wt 3 Cin

wt 1 (b 1)e wt 1 Cb > 0,

using Equation (A.8), wt 3 < 2wt 2, and the implication of Equation (A.9) just above. Since Equation (A.8) clearly shows that the coeﬃcient of wt 1 in wt+1 1 wt+1 2 will be nonnegative in any of the three cases (it is the same in all of them), we argue as follows:

wt+1 1 wt+1 2

2wt 2 ae wt 2 Cb wt 2rt wt 3 e wt 2 Cb ct, rt 0

2wt 2 ae wt 2 Cb + wt 2rt wt 3(a 1)ct ct 0, rt 0

2wt 2 ae wt 2 Cb + wt 2rt ct, rt 0.

The quantity in the ﬁrst subcase may be bounded below as follows, using wt 3 < 2wt 2:

2ae wt 2 Cb rt

wt 3 e wt 2 Cb wt 2

(a 2)e wt 2 Cb + ae wt 2 Cb rt

This lower bound is always positive by Equation (A.9).

For the second subcase, observe that (a 1)ct = rt + e wt 2 Cb e wt 2 Cb , so we may lower bound the quantity there (again using wt 3 < 2wt 2) by

ae wt 2 Cb + (a 1)e wt 3 Cin

wt 3 e wt 2 Cb wt 2

(a 2)e wt 2 Cb + (a 1)e wt 3 Cin

Curvature-based Clustering on Graphs

The third subcase is simplest of all, since the quantity above simpliﬁes to

ae wt 2 Cb + (a 1)e wt 3 Cin

This completes the proof.

Appendix B. Data sets

Real-world data. We consider both synthetic networks generated from the random graph models (see Section 5), and real networks1, which characterize the connectomes of a set of diﬀerent animal species. A connectome is a speciﬁc, cell-to-cell mapping of axonal tracts between neurons, created from cellular data obtained, for instance, via electron microscopy. Here specﬁcally, we consider networks constructed from mixed species of cats ((de Reus and van den Heuvel, 2013), Cat ), male C. elegans ((Jarrell et al., 2012), Worm ), and rhesus ((Harriger et al., 2012), Macaque ). Table 13 provides summary statistics for the real networks in our study.

n |E| |E| d σd/d Modularity Cat 65 730 18859 22.46 0.44 0.27 Worm 269 2902 93893 21.58 0.74 0.36 Macaque 242 3054 114991 25.24 0.73 0.36

Table 13: Summary statistics of the real networks.

Figure 15: The real networks, Cat (left), Worm (middle) and Macaque (right), where we apply the Louvain algorithm to detect communities and vertices in diﬀerent communities are shown in diﬀerent colors.

Random Geometric Graphs. In our experiments, we consider Random Geometric Graphs (RGGs) which incorporate the spatial perspective (Penrose, 2003). For completeness, we deﬁne the RGG below.

1. Obtained from Neuro Data s open source data base https://neurodata.io/project/connectomes/, accessed on 2 May 2022. The database hosts animal connectomes produced using data from a multitude of labs, across model species, using diﬀerent modalities.

Tian and Lubberts and Weber

Deﬁnition 29 (Random Geometric Graphs) In this model, vertices are placed in the unit cube uniformly at random, and two vertices will be connected by an edge if the distance between the vertices is less than a previously speciﬁed value called radius. Speciﬁcally, let the unit cube be [0, 1)nd where nd deﬁnes its dimension, the radius be r (0, 1), and the distance function be d(x, y), x, y [0, 1)nd. Then there will be an edge between vertices vi and vj if d(xi, xj) < r ,

where xi, xj are the positions of the vertices vi, vj, respectively, in the unit cube [0, 1)nd.

Speciﬁcally, in our experiments, we start from the RGGs corresponding to 2-d unit cubes.

Appendix C. Experiments

C.1 Single-Membership Community Detection

Benchmarking. We report the mean NMI, runtime and their standard deviations (SD) of the methods on SBMs with two equally-sized blocks and varying network sizes. Consistent with the results in Section 7.3, the ORC-based approach outperforms most reference methods, and successfully recovers single-membership community structure, with an NMI greater than 0.8 in almost all cases, as we vary the network size n.

n Louvain Spectra-S Spectra-A ORC-E ORC-A FRC-2 100 0.705 (0.121) 0.897 (0.081) 0.599 (0.078) 0.860 (0.117) 0.363 (0.070) 0.355 (0.112) 200 0.981 (0.025) 0.980 (0.020) 0.805 (0.285) 0.986 (0.018) 0.395 (0.107) 0.263 (0.035) 1000 1 (0) 1 (0) 1 (0) 1 (0) 0.990 (0.015) 0.437 (0.263) 3000 1 (0) 1 (0) 1 (0) 1 (0) 1 (0) 0.740 (0.392)

Table 14: Mean (SD) of NMI from diﬀerent methods on planted SBMs with two equallysized blocks, pin = 0.1, pout = 0.01, and varying sizes n.

n Louvain Spectra-S Spectra-A ORC-E ORC-A FRC-2 100 0.828 (0.021) 0.045 (0.004) 0.132 (0.323) 0.482 (0.018) 0.419 (0.013) 0.373 (0.009) 200 2.767 (0.098) 0.076 (0.004) 0.037 (0.004) 1.194 (0.076) 0.991 (0.016) 0.889 (0.015) 1000 33.47 (0.584) 1.087 (0.043) 0.382 (0.016) 12.66 (0.168) 6.721 (0.103) 4.244 (0.086) 3000 265.4 (3.120) 13.44 (0.477) 4.802 (0.110) 165.8 (5.805) 85.63 (2.751) 44.27 (1.564)

Table 15: Mean (SD) of runtime (in seconds) from diﬀerent methods on planted SBMs with two equally-sized blocks, pin = 0.1, pout = 0.01, and varying sizes n.

Rationale for ORC approximation. ORC-S has been considered in the literature in order to improve the eﬃciency of ORC computation, but we notice that the improvement is not clear when the graph is not suﬃciently large or dense (see Table 16). However, the approximation we have proposed almost always signiﬁcantly improve the computational eﬃciency (see also Tables 3 and 15). We also mention here that the current computations are only allocated with moderate level of memory, while we also observed that ORC-S can improve the eﬃciency when large amount of memory is available, as in the analysis of real data in Section 7.4.

Curvature-based Clustering on Graphs

pin ORC-E ORC-S ORC-A 0.1 2785 5859 1013 0.2 16876 68658 2933

Table 16: Runtime (in seconds) from diﬀerent ORC-based methods on one sample of planted SBMs with two equally-sized blocks, size n = 5000, pout = 0.01, and varying probabilities pin.

Rationale for 2-complex FRC. FRC-2 is the one of the strongest correlation with the clustering coeﬃcients, among the three versions of FRC (see Figure 16).

0.06 0.08 0.10 0.12 Clustering coefficient

Counts in bins

0.06 0.08 0.10 0.12 Clustering coefficient

Counts in bins

0.06 0.08 0.10 0.12 Clustering coefficient

Counts in bins

0.06 0.08 0.10 0.12 Clustering coefficient

Counts in bins

Figure 16: 2-d histogram of FRC-1 (left), FRC-2 (middle left), FRC-3 (middle right) and ORC-E (right) versus the clustering coeﬃcients of nodes in G, where ns = 10 networks are generated from the planted SBM with two equally-sized blocks, size n = 1000, pin = 0.1 and pout = 0.01.

C.2 Mixed-Membership Community Detection

Benchmarking. We report the mean NMI,, runtime and their standard deviations (SD) of the methods on MMBs with two equally-sized blocks and varying network sizes. Similar to the results in Section 7.4, the ORC-based method outperforms most reference methods, and successfully recovers mixed-membership community structure in almost all cases, with an NMI greater than 0.8, as we vary the network size n.

n Louvain Spectra-S Spectra-A ORC-E ORC-A FRC-3 50 0.401 (0.068) 0.965 (0.038) 0.463 (0.151) 0.925 (0.081) 0.791 (0.155) 0.712 (0.264) 100 0.469 (0.107) 0.996 (0.011) 0.453 (0.164) 0.993 (0.015) 0.664 (0.167) 0.750 (0.233) 300 0.999 (0.002) 1 (0) 0.834 (0.264) 1 (0) 0.724 (0.123) 0.815 (0.173) 500 1 (0) 1 (0) 1 (0) 1 (0) 0.769 (0.213) 0.838 (0.205)

Table 17: Mean (SD) of the extended NMI from diﬀerent methods on planted two-block MMBs with two equally-sized blocks, no = 1 node of mixed membership, pin = 0.1, pout = 0, and varying sizes n.

C.3 Comparison with spectral clustering methods

Apart from the two versions of Spectral Clustering algorithm as presented in Section 7, we have also considered other versions of the Spectral Clustering, where instead of using

Tian and Lubberts and Weber

n Louvain Spectra-S Spectra-A ORC-E ORC-A FRC-3 50 0.699 (0.135) 0.067 (0.016) 0.118 (0.295) 0.741 (0.121) 0.626 (0.106) 0.730 (0.121) 100 6.405 (0.922) 0.139 (0.005) 0.051 (0.005) 2.330 (0.378) 2.051 (0.336) 3.346 (0.408) 300 391.5 (28.44) 5.388 (0.391) 2.242 (0.168) 48.88 (4.877) 33.35 (3.868) 46.03 (8.584) 500 2386 (107.4) 60.76 (3.543) 28.05 (1.454) 269.6 (25.12) 183.0 (11.95) 200.2 (44.79)

Table 18: Mean (SD) of runtime (in seconds) from diﬀerent methods on planted two-block MMBs with two equally-sized blocks, no = 1 node of mixed membership, pin = 0.1, pout = 0, and varying sizes n.

the adjacency matrix directly, we can construct an aﬃnity matrix by (i) radius basis functions ( SC-R ) or (iv) nk nearest neighbors ( SC-N ) where nk is chosen to be 10 in the experiments. Our experimental results (see Tables 19 and 20) demonstrate that the ORC approach can also outperform other variants of Spectral Clustering, in both singleand mixed-membership community detection.

n ORC-E SC-R SC-N NMI runtime NMI runtime NMI runtime 100 0.860 (0.117) 0.482 (0.018) 0.417 (0.103) 0.042 (0.006) 0.619 (0.143) 0.063 (0.002) 500 0.986 (0.018) 1.194 (0.076) 0.615 (0.082) 0.061 (0.004) 0.889 (0.082) 0.099 (0.003) 1000 1 (0) 12.66 (0.168) 0.834 (0.276) 0.833 (0.063) 1 (0) 1.203 (0.102) 3000 1 (0) 165.8 (5.805) 1 (0) 17.28 (6.174) 1 (0) 20.13 (0.692)

Table 19: Mean (SD) of NMI and runtime (in seconds) from the ORC method and two diﬀerent Spectral Clustering methods on planted SBMs with two equally-sized blocks, pin = 0.1, pout = 0.01, and varying sizes n.

n ORC-E SC-R SC-N NMI runtime NMI runtime NMI runtime 50 0.925 (0.081) 0.741 (0.121) 0.304 (0.050) 0.077 (0.011) 0.623 (0.192) 0.086 (0.003) 100 0.993 (0.015) 2.330 (0.378) 0.128 (0.021) 0.183 (0.030) 0.683 (0.133) 0.159 (0.012) 300 1 (0) 48.88 (4.877) 0.526 (0.333) 1984 (886.8) 0.951 (0.078) 7.360 (0.387) 500 1 (0) 269.6 (25.12) 0.033 (0.029) 25332 (697.7) 0.920 (0.079) 109.6 (4.948)

Table 20: Mean (SD) of NMI and runtime (in seconds) from the ORC method and two diﬀerent Spectral Clustering methods on planted MMBs with two equally-sized blocks, pin = 0.1, pout = 0, no = 1 node of mixed membership, and varying sizes n. The number of clusters is chosen to be the true one (2), instead of a grid search, because the runtime is expected to exceed a day for each graph.

Furthermore, we have also performed a detailed comparison with other spectral methods, e.g., (Zhang, 2023). However, since the appearance of nodes of signiﬁcantly larger degree than the others is relatively rare in our datasets, the results do not change signiﬁcantly in our experiments. Therefore, we have omitted the detailed results.

Curvature-based Clustering on Graphs

C.4 Synthetic data

In Section 7.4, we observe that the performance of the algorithm via augmented FRC is not comparable with the performance of the algorithm via ORC, which raises the important problem of relating FRC to ORC. We ﬁnd that FRC-3 has a stronger linear relationship with ORC as the graphs become denser; see Figures 17, 18 and 19. This is potentially because there are more and more triangular and quadrangular faces in the graphs when more edges are present, and these characteristics become more dominant for the ORC. Meanwhile, we ﬁnd that, when pin is relatively small, all three variants of FRC cannot really separate the edges in L between the nodes in the same community versus the others initially, which might be the reason of a relatively worse performance of the algorithm via FRC in the experiments. However, FRC-3 can separate these two types of edges when pin is relatively large, which indicates that the performance is expected to be better in the dense regime, where ORC might have computational issues.

C.5 Node neighborhood measures

We give some brief computational evidence that the choice of α in the node neighborhood measure as in Equation (4.2) has little impact on the performance of our curvature-based algorithms. Over a range of diﬀerent choices of α, and for two real-data networks, we see that the results are highly conserved.

Data\α 0 0.1 0.2 0.3 0.4 0.5 0.6 DBLP-1 0.689 0.689 0.689 0.689 0.689 0.689 0.689 DBLP-2 0.969 0.969 0.969 0.969 0.969 0.969 0.969

Table 21: Extended NMI of ORC-E via line graph curvature on the real data, where we choose diﬀerent values of α in Equation (4.2).

Tian and Lubberts and Weber

0.6 0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 1)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 1)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 1)

Counts in bins

0.6 0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 2)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 2)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 2)

Counts in bins

0.6 0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (between communities)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (between communities)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (between communities)

Counts in bins

Figure 17: 2-d histograms of the ORC-E versus the FRC-1 (left), FRC-2 (middle) and FRC3 (right) for edges in L, where the networks are obtained from the planted two-block MMB of size n = 300, pin = 0.1, pout = 0, no = 1 node of mixed membership, and we generate ns = 10 networks for the results. (Row 1 and 2: edges between nodes within communities 1, in blue and 2, in green, respectively, in L; Row 3: edges between nodes in the diﬀerent communities, in red).

Curvature-based Clustering on Graphs

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 1)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 1)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 1)

Counts in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 2)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 2)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 2)

Counts in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (between communities)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (between communities)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (between communities)

Counts in bins

Figure 18: 2-d histograms of the ORC-E versus the FRC-1 (left), FRC-2 (middle) and FRC-3 (right) for edges in L, where the networks are obtained from the planted two-block MMB of size n = 300, pin = 0.2, pout = 0, no = 1 node of mixed membership, and ns = 10.

Tian and Lubberts and Weber

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 1)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 1)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 1)

Counts in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 2)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 2)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (within community 2)

Counts in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (between communities)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (between communities)

Count in bins

0.4 0.2 0.0 0.2 ORC

Ricci curvatures of edges in L (between communities)

Counts in bins

Figure 19: 2-d histograms of the ORC-E versus the FRC-1 (left), FRC-2 (middle) and FRC-3 (right) for edges in L, where the networks are obtained from the planted two-block MMB of size n = 300, pin = 0.3, pout = 0, no = 1 node of mixed membership, and ns = 10.

Curvature-based Clustering on Graphs

E. Abbe. Community detection and stochastic block models: recent developments. Journal of Machine Learning Research, 18(1):6446 6531, 2017.

E. Airoldi, D. Blei, S. Fienberg, and E. Xing. Mixed membership stochastic blockmodels. Advances in neural information processing systems, 21, 2008.

F. Battiston, G. Cencetti, I. Iacopini, V. Latora, M. Lucas, A. Patania, J. Young, and G. Petri. Networks beyond pairwise interactions: structure and dynamics. Physics Reports, 874:1 92, 2020.

A. Benson, D. Gleich, and J. Leskovec. Higher-order organization of complex networks. Science, 353(6295):163 166, 2016.

E. Bloch. Combinatorial Ricci curvature for polyhedral surfaces and posets. ar Xiv:1406.4598, 2014.

V. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), oct 2008.

J. Cheeger. A lower bound for the smallest eigenvalue of the laplacian. In Proceedings of the Princeton conference in honor of Professor S. Bochner, pages 195 199, 1969.

Z. Chen, X. Li, and J. Bruna. Supervised community detection with line graph neural networks. ar Xiv:1705.08415, 2017.

M. de Reus and M. van den Heuvel. Rich club organization and intermodule communication in the cat connectome. Journal of Neuroscience, 33(32):12929 12939, 2013.

P. Elumalai, Y. Yadav, N. Williams, E. Saucan, J. Jost, and A. Samal. Graph Ricci curvatures reveal atypical functional connectivity in autism spectrum disorder. Scientiﬁc reports, 12(1):1 19, 2022.

M. Erbar and J. Maas. Ricci curvature of ﬁnite markov chains via convexity of the entropy. Archive for Rational Mechanics and Analysis, 206:997 1038, 2012.

T. Evans and R. Lambiotte. Line graphs of weighted networks for overlapping communities. The European Physical Journal B, 77(2):265 272, September 2010.

M. Fiedler. Algebraic connectivity of graphs. Czechoslovak mathematical journal, 23(2): 298 305, 1973.

R. Forman. Bochner s method for cell complexes and combinatorial Ricci curvature. Discrete and Computational Geometry, 29(3):323 374, 2003.

M. Girvan and M. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821 7826, 2002.

Tian and Lubberts and Weber

S. G omez, P. Jensen, and A. Arenas. Analysis of community structure in networks of correlated data. Physical Review E, 80(1):016114, 2009.

A. Gosztolai and A. Arnaudon. Unfolding the multiscale structure of networks with dynamical Ollivier-Ricci curvature. Nature Communications, 12(1):4561, 2021.

M. Gromov. Hyperbolic groups. In Essays in Group Theory, volume 8. Mathematical Sciences Research Institute Publications. Springer, NY, 1987.

L. Harriger, M. van den Heuvel, and O. Sporns. Rich club organization of macaque cerebral cortex and its role in network communication. PLOS ONE, 7(9):1 13, 09 2012.

P. Holland, K. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109 137, 1983.

S. Hopkins and D. Steurer. Bayesian estimation from few samples: community detection and related problems, September 2017.

T. Jarrell, Y. Wang, A. Bloniarz, C. Brittin, M. Xu, J. Thomson, D. Albertson, D. Hall, and S. Emmons. The connectome of a decision-making neural network. Science, 337 (6093):437 444, 2012.

J. Jin, Z. Ke, and S. Luo. Estimating network memberships by simplex vertex hunting. ar Xiv:1708.07852, 2017.

J. Jost and S. Liu. Ollivier s Ricci curvature, local clustering and curvature-dimension inequalities on graphs. Discrete & Computational Geometry, 51(2):300 322, 2014.

J. Jost and F. M unch. Characterizations of forman curvature. ar Xiv:2110.04554, 2021.

F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborovm a, and P. Zhang. Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences, 110(52):20935 20940, 2013.

H. Kuhn. The hungarian method for the assignment problem. Naval research logistics quarterly, 2(1-2):83 97, 1955.

A. Lancichinetti, S. Fortunato, and J. Kert esz. Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 11(3):033015, 2009.

W. Leal, G. Restrepo, P. F. Stadler, and J. Jost. Forman Ricci curvature for hypergraphs. Advances in Complex Systems, 24(01):2150003, 2021.

P. Lehot. An optimal algorithm to detect a line graph and output its root graph. Journal of the ACM (JACM), 21(4):569 575, 1974.

J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.

J. Leskovec and J. Mcauley. Learning to discover social circles in ego networks. In F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012.

Curvature-based Clustering on Graphs

Y. Lin, L. Lu, and S. Yau. Ricci curvature of graphs. Tohoku Mathematical Journal, 63(4): 605 627, 2011.

Z. Lubberts, A. Athreya, Y. Park, and C. E. Priebe. Beyond the adjacency matrix: random line graphs and inference for networks with edge attributes. ar Xiv:2103.14726, 2021.

S. Lubold, A. Chandrasekhar, and T. Mc Cormick. Identifying the latent space geometry of network models through analysis of curvature. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(2):240 292, 2023.

C. Ni, Y. Lin, J. Gao, X. Gu, and E. Saucan. Ricci curvature of the internet topology. In 2015 IEEE conference on computer communications (INFOCOM), pages 2758 2766. IEEE, 2015.

C. Ni, Y. Lin, F. Luo, and J. Gao. Community detection on networks with ricci ﬂow. Scientiﬁc reports, 9(1):1 12, 2019.

Y. Ollivier. Ricci curvature of markov chains on metric spaces. Journal of Functional Analysis, 256(3):810 864, 2009.

Y. Ollivier. A survey of Ricci curvature for metric spaces and Markov chains. In Probabilistic Approach to Geometry, pages 343 381. Mathematical Society of Japan, 2010.

D. T. Painter, B. C. Daniels, and J. Jost. Network analysis for the digital humanities: Principles, problems, extensions. ISIS, 110(3):538 554, 2019.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825 2830, 2011.

M. Penrose. Random Geometric Graphs. Oxford studies in probability. Oxford University Press, 2003.

M. Pouryahya, J. Mathews, and A. Tannenbaum. Comparing three notions of discrete Ricci curvature on biological networks. ar Xiv:1712.02943, 2017.

A. Samal, R. Sreejith, J. Gu, S. Liu, E. Saucan, and J. Jost. Comparative analysis of two discretizations of Ricci curvature for complex networks. Scientiﬁc reports, 8(1):1 16, 2018.

R. Sandhu, T. Georgiou, and A. Tannenbaum. Ricci curvature: An economic indicator for market fragility and systemic risk. Science advances, 2(5):e1501495, 2016.

E. Saucan and M. Weber. Forman s Ricci curvature-from networks to hypernetworks. In International conference on complex networks and their applications, pages 706 717. Springer, 2018.

E. Saucan, A. Samal, M. Weber, and J. Jost. Discrete curvatures and network analysis. MATCH, 80(3):605 622, 2018.

Tian and Lubberts and Weber

J. Sia, E. Jonckheere, and P. Bogdan. Ollivier-Ricci curvature-based method to community detection in complex networks. Scientiﬁc reports, 9(1):1 12, 2019.

R. Sinkhorn and P. Knopp. Concerning nonnegative matrices and doubly stochastic matrices. Paciﬁc Journal of Mathematics, 21(2):343 348, 1967.

D. Spielman and S. Teng. Spectral partitioning works: Planar graphs and ﬁnite element meshes. In Proceedings of 37th conference on foundations of computer science, pages 96 105. IEEE, 1996.

A. Strehl and J. Ghosh. Cluster ensembles a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583 617, 2002.

A. Tannenbaum, C. Sander, L. Zhu, R. Sandhu, I. Kolesov, E. Reznik, Y. Senbabaoglu, and T. Georgiou. Ricci curvature and robustness of cancer networks. ar Xiv:1502.04512, 2015.

Y. Tian, Z. Lubberts, and M. Weber. Mixed-membership community detection via line graph curvature. In Proceedings of the 1st Neur IPS Workshop on Symmetry and Geometry in Neural Representations, volume 197 of Proceedings of Machine Learning Research, pages 219 233. PMLR, 03 Dec 2023.

U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(1):395 416, 2007.

D. Watts and S. Strogatz. Collective dynamics of small-world networks. nature, 393(6684): 440 442, 1998.

M. Weber. Neighborhood growth determines geometric priors for relational representation learning. In Proceedings of the Twenty Third International Conference on Artiﬁcial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 266 276. PMLR, 26 28 Aug 2020.

M. Weber and M. Nickel. Curvature and representation learning: Identifying embedding spaces for relational data. In Neur IPS Relational Representation Learning, 2018.

M. Weber, E. Saucan, and J. Jost. Characterizing complex networks with Forman-Ricci curvature and associated geometric ﬂows. Journal of Complex Networks, 5(4):527 550, 2017a.

M. Weber, E. Saucan, and J. Jost. Coarse geometry of evolving networks. Journal of Complex Networks, 6(5):706 732, 2017b.

M. Weber, J. Stelzer, E. Saucan, A. Naitsat, G. Lohmann, and J. Jost. Curvature-based methods for brain network analysis. ar Xiv:1707.00180, 2017c.

M. Weber, J. Jost, and E. Saucan. Detecting the coarse geometry of networks. In Neur IPS Relational Representation Learning, 2018.

H. Whitney. Congruent graphs and the connectivity of graphs. American Journal of Mathematics, 54(1):150 168, 1932.

Curvature-based Clustering on Graphs

J. Xie, S. Kelley, and B. K. Szymanski. Overlapping community detection in networks: The state-of-the-art and comparative study. ACM Computing Surveys (csur), 45(4): 1 35, 2013.

J. Yang and J. Leskovec. Overlapping community detection at scale: a nonnegative matrix factorization approach. In Proceedings of the sixth ACM international conference on Web search and data mining, pages 587 596, 2013.

J. Yang and J. Leskovec. Deﬁning and evaluating network communities based on groundtruth. Knowledge and Information Systems, 42:181 213, 2015.

Anderson Ye Zhang. Fundamental limits of spectral clustering in stochastic block models. ar Xiv:2301.09289, 2023.

Y. Zhang, E. Levina, and J. Zhu. Detecting Overlapping Communities in Networks Using Spectral Methods. SIAM Journal on Mathematics of Data Science, 2(2):265 283, January 2020. Publisher: Society for Industrial and Applied Mathematics.

M. Zhu and A. Ghodsi. Automatic dimensionality selection from the scree plot via the use of proﬁle likelihood. Comput. Stat. Data Anal., (2):918 930, 2006.