Journal of Machine Learning Research 26 (2025) 1-67. Submitted 5/24; Published 1/25.

Curvature-based Clustering on Graphs

Yu Tian (yu.tian.research@gmail.com), Center for Systems Biology Dresden, Dresden, 01307, Germany
Zachary Lubberts (zlubberts@virginia.edu), Department of Statistics, University of Virginia, Charlottesville, VA 22904, USA
Melanie Weber (mweber@seas.harvard.edu), Harvard University, Cambridge, MA 02138, USA

Editor: Michael Mahoney

Abstract

Unsupervised node clustering (or community detection) is a classical graph learning task. In this paper, we study algorithms that exploit the geometry of the graph to identify densely connected substructures, which form clusters or communities. Our method implements discrete Ricci curvatures and their associated geometric flows, under which the edge weights of the graph evolve to reveal its community structure. We consider several discrete curvature notions and analyze the utility of the resulting algorithms. In contrast to prior literature, we study not only single-membership community detection, where each node belongs to exactly one community, but also mixed-membership community detection, where communities may overlap. For the latter, we argue that it is beneficial to perform community detection on the line graph, i.e., the graph's dual. We provide both theoretical and empirical evidence for the utility of our curvature-based clustering algorithms. In addition, we give several results on the relationship between the curvature of a graph and that of its dual, which enable the efficient implementation of our proposed mixed-membership community detection approach and which may be of independent interest for curvature-based network analysis.

Keywords: Community Detection, Node Clustering, Discrete Curvature, Line Graph, Graph-based Learning

1. Introduction

Relational data, such as graphs or networks, is ubiquitous in machine learning and data science.
Consequently, a large body of literature devoted to studying the structure of such data has been developed. Clustering on graphs, also known as community detection or (unsupervised) node clustering, is of central importance to the study of relational data. It seeks to identify densely interconnected substructures (clusters) in a given graph. Such structure is ubiquitous in relational data: We may think of friend circles in social networks, metabolic pathways in biochemical networks or article categories in Wikipedia. A standard mathematical model for analyzing community structure in graphs is the Stochastic Block Model (SBM), which has led to fundamental insights into the detectability of communities (Abbe, 2017). Classically, communities are identified by clustering the nodes of the graph. Popular methods include the Louvain algorithm (Blondel et al., 2008), the Girvan-Newman algorithm (Girvan and Newman, 2002) and spectral clustering (Cheeger, 1969; Fiedler, 1973; Spielman and Teng, 1996). Recently, there has been growing interest in another class of algorithms, which applies a geometric lens to community detection. Graph curvatures provide a means to understanding the structure of networks through a characterization of their geometry. More generally, curvature is a classical tool in differential geometry, which is used to characterize the local and global properties of geodesic spaces.

[Footnotes: Equal contribution. A preliminary version of part of this work was presented at the NeurIPS Workshop on Symmetry and Geometry in Neural Representations 2022 (Tian et al., 2023).]

(c) 2025 Yu Tian, Zachary Lubberts and Melanie Weber. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v26/24-0781.html.

[Figure 1: Clustering on Graphs. (a) Single-Membership. (b) Mixed-Membership.]
While originally defined in continuous spaces, discrete notions of curvature have recently seen a surge of interest. Of particular interest are discretizations of Ricci curvature, which is a local notion of curvature that relates to the volume growth rate of the unit ball in the space of interest (geodesic dispersion). Curvature-based analysis of relational data (see, e.g., (Weber et al., 2017a,b)) has been applied in many domains, including to biological (Elumalai et al., 2022; Weber et al., 2017c; Tannenbaum et al., 2015), chemical (Leal et al., 2021; Saucan et al., 2018), social (Painter et al., 2019) and financial networks (Sandhu et al., 2016). Community detection via graph curvatures is motivated by the observation that edges between communities (so-called bridges, see Figure 1(a)) have low Ricci curvature. By identifying and removing such bridges, one can learn a partition of the graph into its communities. This simple idea has given rise to a series of curvature-based approaches for community detection (Ni et al., 2019; Sia et al., 2019; Weber et al., 2018; Tian et al., 2023). However, the picture is far from complete. The proposed algorithms vary significantly in their implementation, in particular in the choice of the curvature notion that they utilize. In this paper, we will elucidate commonalities and differences among variants of the two most common curvature notions in terms of their utility for community detection. To this end, we propose a unifying framework for curvature-based community detection algorithms and perform a systematic analysis of the advantages and disadvantages of each curvature notion. The insights gained from this analysis give rise to several improvements, which mitigate the identified shortcomings in accuracy and scalability. The present paper also addresses a second important gap in the literature.
Existing work on partition-based community detection, specifically on curvature-based approaches, focuses almost exclusively on detecting communities with unique membership, i.e., each node can belong to only one community (single-membership community detection). However, in many complex systems that generate relational data, nodes may belong to more than one community: A member of a social network might belong to a circle of high school friends and a circle of college friends, a protein may have different functional roles in a biochemical network, etc. This creates a need for algorithms that can cluster nodes with mixed membership (Airoldi et al., 2008; Yang and Leskovec, 2013; Zhang et al., 2020), so-called mixed-membership community detection. Observe that the mixed-membership structure precludes the existence of bridges between communities, because the communities overlap (see Figure 1(b)). This renders community detection methods that rely on graph partitions, such as the curvature-based methods discussed above, inapplicable. In this paper, we propose a principled curvature-based approach for detecting mixed-membership community structure. We show that while partition-based approaches do not recover the underlying community structure when applied to the original graph, they perform well on its dual, the line graph. The line graph encodes the connectivity between the edges of the original graph (see Figure 2). This reformulation of the relational information in the graph allows for disentangling the overlapping community structure (see Figure 3). When applied to the line graph, our curvature-based community detection approach correctly identifies edges in the line graph, which can be cut to partition the graph into its communities and assign node labels that reflect mixed-membership community structure.
Utilizing fundamental relationships between the curvature of a graph and its dual, we demonstrate that curvature-based mixed-membership community detection results in scalable and accurate methods.

1.1 Overview and Summary of Contributions

Our main contributions are as follows:

1. We propose a unifying framework for curvature-based community detection (unsupervised node clustering) algorithms (Section 2). Our framework encompasses several previously studied algorithms for curvature-based single-membership community detection (Sections 2.3, 6.1.1 and 6.2.1). Utilizing this framework, we systematically investigate the strengths and weaknesses of different discrete curvature notions (specifically, variants of Forman's and Ollivier's Ricci curvature, Sections 7 and 8).

2. We adapt our proposed framework to mixed-membership community detection (Section 6), providing the first curvature-based algorithms for this setting.

3. We elucidate significant scalability issues in community detection approaches that utilize Ollivier's notion of discrete curvature. To overcome these issues, we propose an effective approximation, which may be of independent interest in the broader area of curvature-based network analysis (Sections 4.1.3 and 5.2.2).

4. We demonstrate the utility of our framework through theoretical analysis (Section 6.1.2) and through a series of systematic experiments with synthetic and real network data. We further benchmark curvature-based methods against classical community detection approaches (Section 7).

5. As a byproduct, we derive several fundamental relations between discrete curvatures in a graph and its dual. Those relations may be of independent interest (Section 5).

2. Clustering on Graphs

Community structure is a hallmark feature of networks, characterized by clusters of nodes that have more internal connections than connections to nodes in other clusters.
In networks with mixed-membership structure, which formalizes the idea of overlapping communities, nodes may belong to more than one cluster. In the following, let G = (V, E, W) denote a graph with vertex set V and edge set E. We may associate weights with both vertices and edges, defined as functions $\omega : V \to \mathbb{R}_+$ and $w : E \to \mathbb{R}_+$. The distance measure on G is the standard path distance d, i.e.,

$$d_G(u, v) := \inf_P \sum_{i=1}^{l-1} w(\{z_i, z_{i+1}\}) , \qquad (2.1)$$

where the infimum is taken over all u-v paths $P = \{z_i\}_{i=1}^{l} \subseteq V$ with $\{z_i, z_{i+1}\} \in E$ for each i, and $z_1 = u$, $z_l = v$. We further define the degree $d_u$ of a node u as the weighted sum of its adjacent edges, i.e.,

$$d_u := \sum_{e : u \in e} w(e) . \qquad (2.2)$$

We remark that the terms network and graph are often used interchangeably. Following the convention in the graph learning literature, we use "graph" if we refer to the mathematical object and "network" when referring to a data set or specific instance of a graph. We will use the terms node and vertex interchangeably.

2.1 Single-Membership Node Clustering

2.1.1 Single-Membership Community Structure

In classical community detection, each node can belong to only one community, which characterizes the community structure as single-membership. Various algorithms exist by virtue of interdisciplinary expertise (Blondel et al., 2008; Girvan and Newman, 2002; von Luxburg, 2007), generally aiming to optimize a quality function with respect to different partitions of the network. We denote by Q a quality function that increases as the resulting community structure becomes clearer from some perspective, such as the modularity (Girvan and Newman, 2002), and denote a partition of the vertices as $\{B_1, \ldots, B_k\}$, mutually exclusive and exhaustive, i.e., $B_h \cap B_l = \emptyset$ when $h \neq l$ and $\cup_{h=1}^{k} B_h = V$, with k being the number of communities. The classical community detection problem can then be formulated as the combinatorial optimization problem

$$\max_k \; \max_{\{B_1, \ldots, B_k\}} Q(\{B_1, \ldots, B_k\}) .$$
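The basic quantities of this section, the path distance (2.1) and the weighted degree (2.2), are straightforward to compute with standard graph libraries. The following minimal sketch uses networkx on a toy weighted graph of our own choosing:

```python
import networkx as nx

# Toy weighted graph; the "weight" attribute plays the role of w({u, v}).
G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 1.0), (1, 2, 0.5), (0, 2, 2.0), (2, 3, 1.0)])

# Path distance d_G(u, v) (Eq. 2.1): infimum of summed edge weights over u-v
# paths. Here the path 0-1-2-3 (cost 2.5) beats the direct path 0-2-3 (cost 3.0).
d_03 = nx.shortest_path_length(G, 0, 3, weight="weight")  # 2.5

# Weighted degree d_u (Eq. 2.2): sum of the weights of edges incident on u.
d_2 = sum(data["weight"] for _, _, data in G.edges(2, data=True))  # 3.5
```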
In this generality, the community detection problem is NP-hard, so heuristic algorithms are typically applied in order to provide a sufficiently good solution. The Stochastic Block Model (short: SBM) (Holland et al., 1983) is a random graph model, which emulates the community structure found in many real networks. It is a popular model for studying complex networks and, in particular, clustering on graphs. In the SBM, vertices are partitioned into subgroups called blocks, and the distribution of the edges between vertices depends on the blocks to which the vertices belong. Formally:

Definition 1 (Stochastic Block Model) Let $\{B_1, \ldots, B_k\}$ be a partition of a graph's vertices into k blocks, and let $B = (B_{hl})$ denote the characteristic matrix between the blocks, where $B_{hl}$ gives the probability of an edge existing between a vertex in block $B_h$ and one in $B_l$. Then, if we denote the random adjacency matrix of the network by $A = (A_{ij})$, where $A_{ij} = 1$ if there is an edge between vertices $v_i$ and $v_j$,

$$P(A_{ij} = 1) = B_{\sigma(i)\sigma(j)} ,$$

where $\sigma : V \to \{1, \ldots, k\}$ indicates the block membership. A planted SBM has the additional structure that $B_{hh} = p_{in}$ for all h, and $B_{hl} = p_{out}$ for $h \neq l$, which we denote by SBM($p_{in}$, $p_{out}$).

2.1.2 Partition-based Clustering Methods

Of particular importance for classical community detection are edges between clusters, often called bridges. Bridges connect two nodes in distinct communities (see Figure 1a). In graphs with single-membership community structure, i.e., where each node belongs to one community only, such edges always exist (otherwise each community is a connected component). This observation has motivated a range of partition-based community detection approaches (Blondel et al., 2008; Girvan and Newman, 2002; von Luxburg, 2007), which rely on identifying and cutting such bridges to partition the graph into communities.
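For experiments with such methods, a planted SBM($p_{in}$, $p_{out}$) as in Definition 1 is easy to sample; the sketch below (function name and block sizes are our own choices) draws each upper-triangular entry independently with probability $B_{\sigma(i)\sigma(j)}$:

```python
import numpy as np

def planted_sbm(block_sizes, p_in, p_out, rng=None):
    """Sample a symmetric 0/1 adjacency matrix from SBM(p_in, p_out)."""
    rng = np.random.default_rng(rng)
    n = sum(block_sizes)
    # Block membership sigma: V -> {0, ..., k-1}.
    sigma = np.repeat(np.arange(len(block_sizes)), block_sizes)
    # Edge probability B_{sigma(i) sigma(j)}: p_in within blocks, p_out across.
    P = np.where(sigma[:, None] == sigma[None, :], p_in, p_out)
    A = (rng.random((n, n)) < P).astype(int)
    A = np.triu(A, 1)          # keep independent upper-triangular draws only
    return A + A.T             # symmetrize; diagonal stays zero

A = planted_sbm([50, 50], p_in=0.3, p_out=0.02, rng=0)
```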
Among the most popular community detection methods is the Louvain algorithm (Blondel et al., 2008), a heuristic method proposed to maximize modularity hierarchically, starting from each node as its own community and successively merging neighboring communities whenever doing so improves modularity. Spectral clustering (von Luxburg, 2007), another widely used method, utilizes the spectrum of the graph Laplacian to find a partition of the graph with a minimal number of edge cuts (mincut problem). This optimization problem, too, is NP-hard, but it can be solved efficiently after relaxing some constraints. We give a brief overview of other classical partition-based community detection methods in Section 3. In this paper, we discuss methods that utilize discrete curvature to identify bridges: Below, we introduce a blueprint for partition-based clustering via discrete Ricci curvature.

2.2 Mixed-Membership Node Clustering

2.2.1 Mixed-Membership Community Structure

In graphs with mixed-membership communities, nodes may belong to more than one community. Such community structure can be formalized via an extension of the SBM: the Mixed-Membership Stochastic Block Model (short: MMB) (Airoldi et al., 2008). In both models, each element of the adjacency matrix A above the diagonal is an independent Bernoulli random variable whose expectation only depends on the block memberships of the corresponding nodes. In the MMB, nodes may belong to more than one block, with varying affiliation strengths. Formally:

Definition 2 (Mixed-Membership Block Model) We assume that the expectation E[A] has the form $E[A] = XBX^T$, where $X \in [0, 1]^{n \times k}$ is the community membership matrix with $X_{il}$ indicating the affiliation of node i with community l and $\sum_l X_{il} = 1$, and n denotes the size of the network. The matrix $B \in [0, 1]^{k \times k}$ encodes the block connection probabilities. If $X_{il} = 1$ for some l, we call vertex i a pure node; otherwise i is a mixed node.
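To make Definition 2 concrete, here is a small numeric example (the membership values are our own): with one mixed node, the expected adjacency $E[A] = XBX^T$ interpolates between the within-block and between-block probabilities.

```python
import numpy as np

# Community membership matrix X (rows sum to 1): nodes 0 and 1 are pure,
# node 2 is mixed equally between the two communities.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])

# Planted block connection probabilities: B_hh = p_in, B_hl = p_out.
B = np.array([[0.40, 0.02],
              [0.02, 0.40]])

EA = X @ B @ X.T  # expected adjacency, E[A] = X B X^T
# Pure-pure across blocks: EA[0, 1] = p_out = 0.02.
# Mixed-pure: EA[2, 0] = 0.5 * 0.40 + 0.5 * 0.02 = 0.21.
```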
When all nodes are pure, we recover the ordinary SBM. In our experiments, we consider a planted version where $B_{hh} = p_{in}$ for all h, and $B_{hl} = p_{out}$ for $h \neq l$, which we denote by MMB($p_{in}$, $p_{out}$).

2.2.2 Line Graphs

[Figure 2: Graphs G with their corresponding line graphs L(G). Note that vertices in G with degree at least 3 result in cliques in L(G), and that cycles are preserved.]

We denote the dual of a graph as its line graph:

Definition 3 Given an unweighted, undirected graph G = (V, E), its line graph is defined as $L(G) = (E, \mathcal{E})$, where each edge $\{\{u, v\}, \{r, s\}\} \in \mathcal{E}$ appears when $|\{u, v\} \cap \{r, s\}| = 1$. In other words, $e, e'$ are adjacent in L(G) when they are both incident on the same vertex.

The line graph can be seen as a re-parametrization of the input graph, which encodes higher-order information about its edges' connectivity. Examples of graphs and their line graphs can be found in Figure 2. Importantly, cycles are preserved under this construction (e.g., Figure 2 (bottom)). Cliques and cycles in the line graph may also arise from nodes of degree three or higher (see, e.g., Figure 2 (top)). Notice that the line graph is typically larger than the original graph: in a graph with n vertices and average degree $n\alpha_n$, we expect on the order of $n^2\alpha_n$ edges; the line graph then has on the order of $n^2\alpha_n$ vertices and $n^3\alpha_n^2$ edges, and this number increases with greater variation of the vertex degrees in the original graph. For the applications to community detection that we will discuss below, this necessitates a careful consideration of the scalability of different community detection approaches. The classical line graph construction gives a graph without edge weights, since edge weights in the original graph correspond to node weights in the line graph. We will discuss below (Section 5.2.2) how weights can be imposed on line graph edges that encode meaningful structural information in the original graph.

2.2.3 Partition-based Methods on the Line Graph
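The construction in Definition 3 is available directly as `networkx.line_graph`; the small sanity check below (our own toy graph) verifies the two structural facts noted above: |V(L(G))| = |E(G)|, and a vertex of degree d contributes a d-clique to L(G).

```python
import networkx as nx

# A triangle {0,1,2} with a pendant edge {0,3}: vertex 0 has degree 3.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (0, 3)])
L = nx.line_graph(G)

# Vertices of L(G) are exactly the edges of G.
assert {frozenset(e) for e in L.nodes()} == {frozenset(e) for e in G.edges()}

# Each vertex of degree d contributes d*(d-1)/2 line-graph edges:
# 3 (from vertex 0) + 1 (vertex 1) + 1 (vertex 2) + 0 (vertex 3) = 5.
assert L.number_of_edges() == 5
```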
We have seen above that edges between clusters, often called bridges, are of particular importance for single-membership community detection. However, overlaps between communities preclude the existence of these bridges (see Figure 1b), which limits the applicability of partition-based approaches. In this paper we argue that the line graph provides a natural input for partition-based mixed-membership community detection. While nodes may not have a unique label in this model, the adjacent edges may still be internal, connecting two nodes that are in the same community (at least partially). In this case, each edge is associated with a single community (see Figure 3 (left)). Consequently, by representing the relationships among edges in a line graph, we disentangle the overlapping communities. Each edge in the line graph corresponds to a vertex in the original graph. When the vertex has mixed membership, a bridge between the communities arises in the line graph (Figure 3 (right)).

2.3 Curvature-based Clustering Algorithm

Algorithm 1: Curvature-based Clustering via Ricci Flow
 1: Input: Graph G = (V, E, W^0), hyperparameters ν (step size, flow), ϵ (tolerance), ϵ_d (drop threshold), T (number of iterations)
 2: for t = 1, ..., T do
 3:   for {u, v} ∈ E do
 4:     Compute curvature κ({u, v}).
 5:     Evolve weight under Ricci flow: w^t({u, v}) ← (1 − ν κ({u, v})) d_G(u, v).
 6:   end for
 7:   Renormalize edge weights W^t: w^t({u, v}) ← |E| d_G(u, v) / Σ_{{u', v'} ∈ E} d_G(u', v') for all {u, v} ∈ E.
 8: end for
 9: Construct cut-off points {x_0, x_1, ..., x_{n_f}}.
10: Initialize l ∈ R^{|V|} (list of node labels), Q ← 1, Q_best := ϵ.
11: for i = 0, ..., n_f do
12:   Construct G_i = (V, E_i, W^T_i) with E_i = {{u, v} : w^T({u, v}) ≤ x_i} and W^T_i = W^T|_{E_i}.
13:   Determine connected components of G_i. Assign node labels by components.
14:   Compute the modularity Q_i of the resulting community assignment.
15:   if Q_i > Q_best and (Q_i − Q_{i−1})/Q_i > ϵ_d then
16:     Store label assignments in l. Q_best ← Q_i.
17:   end if
18: end for
19: If Q_best > ϵ, return l.

[Figure 3: Community structure in a graph G and its line graph L(G) with node labels (top) and edge labels (bottom).]

In this section, we will describe the blueprint of our curvature-based clustering algorithm, which we will then specialize to the single- and mixed-membership problems described above. In the community detection literature, approaches based on discrete Ricci curvature and an associated Ricci flow have been studied for unique-membership communities (Ni et al., 2019; Sia et al., 2019; Weber et al., 2018). We discuss the details of those approaches in Sections 6.1 and 6.2. To the best of our knowledge, extensions of curvature-based methods to mixed-membership community detection have not been studied before. Like previous approaches that utilize Ricci curvature (e.g., Ni et al. (2019)), we build on a notion of discrete Ricci flow first proposed by Ollivier (2009):

$$\frac{d}{dt} d_G(u, v)(t) = -\kappa(\{u, v\})(t) \, d_G(u, v)(t) \qquad (\{u, v\} \in E) , \qquad (2.3)$$

where $d_G(u, v)$ denotes the shortest path distance between adjacent nodes $u, v \in V$ and $\kappa(\{u, v\})$ the Ricci curvature along that edge. In this work, $\kappa(\{u, v\})$ may denote either Forman's or Ollivier's Ricci curvature. We will describe the two resulting methods in detail below. Curvature-based clustering implements discrete Ricci flow via a combinatorial evolution equation, which evolves edge weights according to the local geometry of the graph. Consider a family of weighted graphs $G_t = (V, E, W^t)$, which is constructed from an input graph G by evolving its edge weights as

$$w^t(\{u, v\}) \leftarrow (1 - \kappa(\{u, v\})) \, d_G(u, v) \qquad (\{u, v\} \in E) , \qquad (2.4)$$

where the curvature $\kappa(\{u, v\})$ and shortest-path distance $d_G(u, v)$ are computed on the graph $G_{t-1}$. Equation (2.4) can be viewed as a discrete analog of Ricci flow on Riemannian manifolds; it was first introduced by Ollivier (2010) with κ denoting ORC.
Since then, several variants have been studied in the context of clustering, utilizing ORC (Ni et al., 2019; Sia et al., 2019; Tian et al., 2023), as well as FRC (Weber et al., 2018). Each of these algorithms is a special case of the general framework that we describe below. The clustering procedure is initialized with the (possibly unweighted) input graph, i.e., $G_0 := G$. We then evolve the edge weights for T iterations under Ricci flow (Equation (2.4)). In each iteration, the edge weights are renormalized using

$$w^t(\{u, v\}) \leftarrow \frac{|E| \, d_G(u, v)}{\sum_{\{u', v'\} \in E} d_G(u', v')} . \qquad (2.5)$$

Over time, the negative curvature of edges that bridge communities intensifies: edges with lower Ricci curvature contract more slowly under Ricci flow, while edges with higher Ricci curvature contract faster. This results in a decrease of the weight of internal edges over time, while the weight of the bridges increases (see Figure 4). With that, the discrete Ricci flow reinforces the meso-scale structure of the network. In order to recover single-membership labels, we cut the edges in the graph with high weights (or, equivalently, with very negative curvature) after evolving edge weights for T iterations. The resulting partition delivers a node clustering. In the mixed-membership case, we cut the edges in the line graph with the highest weight (or, equivalently, the most negative curvature) after evolving edge weights for T iterations. The resulting partition of the graph delivers an edge clustering from which we infer the community labels: We obtain a mixed-membership label vector y for each node v by computing

$$y_l(v) = \frac{1}{|E_v|} \sum_{e \in E_v} \chi_l(e) , \qquad (2.6)$$

where $\chi_l$ is the indicator for the edge cluster $C_l$. Intuitively, each edge belonging to cluster l that is incident on a node v adds more evidence of affinity between the node v and the cluster $C_l$.
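The two phases of Algorithm 1 can be sketched in a few lines: evolving edge weights under the discrete flow (Eqs. 2.4-2.5), then sweeping cut-off points and keeping the partition with the highest modularity. This is our own simplification, not the paper's implementation: the curvature callback is left abstract so that ORC, FRC, or an approximation can be plugged in, and the renormalization simply rescales the updated weights to total |E|.

```python
import networkx as nx
from networkx.algorithms.community import modularity

def ricci_flow_step(G, curvature):
    """One flow iteration: evolve weights (Eq. 2.4), then renormalize so the
    total edge weight stays |E| (cf. Eq. 2.5). `curvature` maps an edge
    (u, v) of the current graph to kappa({u, v})."""
    # Curvature and distances are computed on the *current* graph G_{t-1}.
    dist = dict(nx.all_pairs_dijkstra_path_length(G, weight="weight"))
    new_w = {(u, v): (1.0 - curvature(u, v)) * dist[u][v] for u, v in G.edges()}
    scale = G.number_of_edges() / sum(new_w.values())
    for (u, v), w in new_w.items():
        G[u][v]["weight"] = scale * w
    return G

def best_cut(G, cutoffs):
    """Sweep cut-off points x_i: cut every edge heavier than x_i (the putative
    bridges), label nodes by connected component, and keep the partition with
    the highest modularity."""
    best_q, best_parts = float("-inf"), None
    for x in cutoffs:
        H = nx.Graph()
        H.add_nodes_from(G)
        H.add_edges_from((u, v, d) for u, v, d in G.edges(data=True)
                         if d["weight"] <= x)
        parts = list(nx.connected_components(H))
        q = modularity(G, parts, weight="weight")
        if q > best_q:
            best_q, best_parts = q, parts
    return best_q, best_parts
```

On a toy graph of two unit-weight triangles joined by one heavy edge, the sweep cuts exactly the heavy edge and recovers the two triangles.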
[Figure 4: To illustrate the effect of the discrete Ricci flow in Algorithm 1, we consider a planted SBM of size n = 100, two equally-sized blocks, p_in = 0.3 and p_out = 0.02, visualized in the left panel. We select representative edges within a community, shown in orange on the graph, and plot the evolution of their weights in the middle panel. In the right panel, we show this evolution for selected edges between communities (i.e., bridges), shown in blue on the graph.]

In both cases, the success of the method depends crucially on identifying a good weight threshold for cutting edges in the original graph (single-membership case) or the line graph (mixed-membership case). We propose to perform a hyperparameter search over cut-off points $\{x_0, x_1, \ldots, x_{n_f}\}$. The construction of the cut-off points is an important design choice and depends on the curvature notion used in the approach. We then compute the modularity, a classical quality metric in the community detection literature, for the label assignments corresponding to each cut-off point $x_i$. Specifically, modularity (Girvan and Newman, 2002; Gómez et al., 2009) is defined as

$$Q = \frac{1}{2w} \sum_{u,v} \left( w(\{u, v\}) - \frac{d_u d_v}{2w} \right) \delta(\sigma(u), \sigma(v)) , \qquad (2.7)$$

where $2w := \sum_{i,j} w(\{i, j\})$. The larger the modularity, the better we expect the clustering to reflect the underlying community structure induced by the graph's connectivity. Our approach is shown schematically in Algorithm 1. Note that in Algorithm 1, the number of iterations is selected a priori, though this choice can also be integrated with the modularity check. We use T = 10, which generally provides sufficiently good results; this is verified by comparison with a variant using an intermediate modularity check and simple stopping criteria in Section 7.3.

3. Related Work

Community detection.
Mixed-membership community detection is widely studied in the network science and data mining communities. Notable approaches include Bayesian methods (Airoldi et al., 2008; Hopkins and Steurer, 2017), matrix factorization (Yang and Leskovec, 2013), spectral clustering (Zhang et al., 2020), and vertex hunting (Jin et al., 2017), among others. In addition to the mixed-membership model that we consider here (Airoldi et al., 2008), there is a significant body of literature on closely related overlapping community models (Lancichinetti et al., 2009; Xie et al., 2013), which also study the problem of learning non-unique node labels. Curvature-based community detection methods for non-overlapping communities have recently received growing interest (Ni et al., 2019; Sia et al., 2019; Gosztolai and Arnaudon, 2021; Weber et al., 2018). Such approaches utilize notions of discrete Ricci curvature (Sia et al., 2019; Gosztolai and Arnaudon, 2021; Weber et al., 2018), as well as an associated Ricci flow (Ni et al., 2019) to partition networks into communities, based on the observation that edges between communities have low curvature. The absence of such bridges in overlapping communities renders these approaches inapplicable to the setting studied in this paper. To the best of our knowledge, our algorithm is the first to study mixed-membership community structure with curvature-based methods.

Higher-order Structure in Networks. Historically, much of the network analysis literature has focused on the structural information encoded in nodes and edges. Recently, the analysis of higher-order structures has received increasing attention. A plethora of methods for analyzing higher-order structure in relational data has been developed (Benson et al., 2016; Battiston et al., 2020). Our work follows this line of thought, in that it focuses on the relations between edges and the structural information encoded therein.
Curvature of higher-order structure has previously been studied in (Weber et al., 2017b; Saucan and Weber, 2018; Leal et al., 2021). Here, networks have been studied as polyhedral complexes (Weber et al., 2017b), or interactions between groups of nodes have been encoded in hypergraphs (Saucan and Weber, 2018; Leal et al., 2021). To the best of our knowledge, this is the first work that studies the curvature of line graphs.

Line Graphs. The notion of the line graph goes back at least to (Whitney, 1932), where it is shown that whenever |V| ≥ 5, two graphs are isomorphic if and only if their line graphs are. An effective version of this result, which reports whether a given graph is a line graph and, if so, returns the base graph, appears in Lehot (1974). Community detection, and more generally network analysis, via the line graph has recently been studied in (Chen et al., 2017; Krzakala et al., 2013; Lubberts et al., 2021; Evans and Lambiotte, 2010).

Discrete Curvature. There exists a large body of literature on discrete notions of curvature. Notable examples include Gromov's δ-hyperbolicity (Gromov, 1987), Bakry-Émery curvature (Erbar and Maas, 2012), as well as Forman's and Ollivier's Ricci curvatures (Forman, 2003; Ollivier, 2010). While there is some work studying those curvature notions on higher-order networks (Bloch, 2014; Weber et al., 2017b) and hypernetworks (Saucan and Weber, 2018; Leal et al., 2021), they have not been studied on line graphs. Curvature-based analysis of relational data (see, e.g., (Weber et al., 2017a,b)) has been applied in many domains, including to biological (Elumalai et al., 2022; Weber et al., 2017c; Tannenbaum et al., 2015), chemical (Leal et al., 2021; Saucan et al., 2018), social (Painter et al., 2019), information (Ni et al., 2015) and financial networks (Sandhu et al., 2016).
Discrete curvature has also found applications in representation learning, in particular for identifying representation spaces in graph embeddings (Lubold et al., 2023; Weber, 2020; Weber and Nickel, 2018).

4. Discrete Graph Curvature

Curvature is a classical tool in differential geometry, which is used to characterize the local and global properties of geodesic spaces. In this paper, we investigate discrete notions of curvature that are defined on graphs. Specifically, we focus on discretizations of Ricci curvature, which is a local notion of curvature that relates to the volume growth rate of the unit ball in the space of interest (geodesic dispersion). In the following, we only consider undirected networks. We introduce two classical notions of discrete Ricci curvature, which were originally introduced by Ollivier (2010) and Forman (2003), respectively.

[Figure 5: Computation of discrete Ricci curvature.]

4.1 Ollivier's Ricci Curvature

Our first notion of discrete curvature relates geodesic dispersion to optimal mass transport on the graph.

4.1.1 Formal Definition

Consider the transportation cost between two distance balls (i.e., vertex neighborhoods) along an edge in the network. In an unweighted graph, we endow the neighborhoods of vertices u, v adjacent to an edge e = {u, v} with a uniform measure, i.e.,

$$m_u(z) := \frac{1}{d_u} \quad \text{s.t. } u \sim z , \qquad (4.1)$$

and analogously for $m_v(z)$. Here, $u \sim z$ indicates that u, z are neighbors. In a weighted graph, for $\alpha \in [0, 1]$ and $p \geq 0$, we set

$$m^{\alpha,p}_u(z) := \begin{cases} \alpha & \text{if } z = u, \\ \frac{1-\alpha}{C_u} \exp(-d_G(u, z)^p) & \text{if } z \sim u, \\ 0 & \text{else,} \end{cases} \qquad (4.2)$$

where the normalizing constant $C_u = \sum_{z \sim u} \exp(-d_G(u, z)^p)$. Notice that for neighboring vertices u, z we have $d_G(u, z) = w(\{u, z\}) =: w_{u,z}$. When α = 0, and p = 0 or the graph is unweighted, this reduces to the uniform measure. We define Ollivier's Ricci curvature (Ollivier, 2010) (short: ORC) with respect to the Wasserstein-1 distance $W_1$ between those measures, i.e.,

$$\mathrm{Ric}_O(e) := 1 - \frac{W_1(m_u, m_v)}{d_G(u, v)} . \qquad (4.3)$$
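For the unweighted case with the uniform measures of Eq. (4.1), the curvature in Eq. (4.3) can be computed directly by solving the transport problem as a linear program. The following is our own sketch using scipy, not the paper's implementation:

```python
import networkx as nx
import numpy as np
from scipy.optimize import linprog

def orc_uniform(G, u, v):
    """ORC of edge {u, v} (Eq. 4.3) for an unweighted graph, with uniform
    neighborhood measures (Eq. 4.1) and W1 solved as a linear program."""
    Nu, Nv = list(G[u]), list(G[v])
    dist = dict(nx.all_pairs_shortest_path_length(G))
    # Cost matrix: shortest-path distances from N(u) to N(v).
    C = np.array([[dist[a][b] for b in Nv] for a in Nu], dtype=float)
    n, m = C.shape
    # Equality constraints: the coupling lambda has marginals m_u and m_v.
    A_eq, b_eq = [], []
    for i in range(n):                      # mass leaving each a in N(u)
        row = np.zeros(n * m); row[i * m:(i + 1) * m] = 1.0
        A_eq.append(row); b_eq.append(1.0 / n)
    for j in range(m):                      # mass arriving at each b in N(v)
        col = np.zeros(n * m); col[j::m] = 1.0
        A_eq.append(col); b_eq.append(1.0 / m)
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq))
    return 1.0 - res.fun / dist[u][v]
```

For example, every edge of the complete graph $K_3$ has curvature 1/2, while the end edge of a 3-vertex path has curvature 0.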
[Figure 6: Scatter plots of the ORC on G obtained from the optimization versus its approximation, where the optimization solves the earth mover's distance exactly and approximately with Sinkhorn. G is obtained from the planted SBM of size n = 100 with two equally-sized blocks, p_out = 0.02 and p_in = 0.4, with unit weights (top), and from the 2-d RGG of size n = 100 and radius r = 0.4, with unit weights (middle) and weights proportional to the distance (bottom). G is visualised in the left column, with the color indicating the ORC obtained by solving the earth mover's distance exactly.]

The computation of ORC is illustrated in Figure 5a. We can further define the ORC of a vertex with respect to the curvature of its adjacent edges. Formally, let $E_v := \{e \in E : v \in e\}$ denote the set of edges adjacent to a vertex v. Then its curvature is given by

$$\mathrm{Ric}_O(v) = \sum_{e \in E_v} \mathrm{Ric}_O(e) . \qquad (4.4)$$

4.1.2 Computational Considerations

Computing ORC on a graph is quite expensive. The high cost arises mainly from the computation of the $W_1$-distance, which, in the discrete setting, is also known as the earth mover's distance. Computing $W_1(m_u, m_v)$ between the neighborhoods of two vertices u, v that are connected by an edge corresponds to solving the linear program

$$W_1(m_u, m_v) = \min_{\lambda} \sum_{i,j} \lambda_{ij} \, d_G(i, j) \qquad (4.5)$$
$$\text{s.t.} \quad \sum_i \lambda_{ij} = m_v(j) \;\; \forall j; \qquad \sum_j \lambda_{ij} = m_u(i) \;\; \forall i; \qquad \lambda_{ij} \geq 0 \;\; \forall i, j .$$

Classically, $W_1(m_u, m_v)$ is computed using the Hungarian algorithm (Kuhn, 1955), the fastest variant of which has cubic complexity. With growing neighborhood size, the computational cost of the Hungarian algorithm becomes prohibitively large. Alternatively, $W_1(m_u, m_v)$ can be approximated with the faster Sinkhorn algorithm (Sinkhorn and Knopp, 1967), whose complexity is only quadratic.
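A minimal Sinkhorn iteration, our own sketch of the entropic-regularization approach rather than any particular library's implementation, approximates the transport cost with O(n^2) matrix-vector products per iteration:

```python
import numpy as np

def sinkhorn_cost(C, a, b, reg=0.05, iters=1000):
    """Approximate the optimal transport cost <P, C> between marginals a, b
    by Sinkhorn-Knopp scaling of K = exp(-C/reg). Each iteration costs
    O(n^2), versus the cubic cost of exact solvers like the Hungarian
    algorithm; smaller `reg` means a tighter (but slower-converging)
    approximation of the unregularized W1."""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)   # match column marginals
        u = a / (K @ v)     # match row marginals
    P = u[:, None] * K * v[None, :]  # approximately optimal coupling
    return float((P * C).sum())
```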
However, even a quadratic cost introduces a significant computational bottleneck on the large-scale network data that we encounter in machine learning and data science applications. Here, we propose a different approach: Instead of approximating the W1-distance, we propose an approximation of ORC itself. Our approximation can be written as a simple combinatorial quantity, which can be computed in linear time. We begin by recalling a few classical bounds on ORC, first proven by Jost and Liu (2014). Let #(u, v) be the number of triangles that include the edge {u, v}, and let a ∧ b := min{a, b} and a ∨ b := max{a, b}. With respect to these quantities, upper and lower bounds on ORC are given as follows:

Theorem 4 (Unweighted case (Jost and Liu, 2014))
1. Lower bound on ORC:

Ric_O({u, v}) ≥ −( 1 − 1/d_u − 1/d_v − #(u, v)/(d_u ∧ d_v) )_+ − ( 1 − 1/d_u − 1/d_v − #(u, v)/(d_u ∨ d_v) )_+ + #(u, v)/(d_u ∨ d_v) .  (4.6)

Note that a simpler, but less tight lower bound with respect to node degrees only is given by Ric_O({u, v}) ≥ −2(1 − 1/d_u − 1/d_v).
2. Upper bound on ORC:

Ric_O({u, v}) ≤ #(u, v)/(d_u ∨ d_v) .  (4.7)

In the following section, we will derive an extension of these results to weighted graphs. Now, let Ric_O^up and Ric_O^low denote the upper and lower bounds, respectively. We propose to approximate Ric_O({u, v}) as the arithmetic mean of the upper and lower bounds, i.e.,

R̂ic_O({u, v}) := (1/2) ( Ric_O^up({u, v}) + Ric_O^low({u, v}) ) .  (4.8)

Like the bounds themselves, R̂ic_O has a simple combinatorial form and can be computed in linear time. Since the computation relies only on local structural information, the procedure can be parallelized easily. Figure 6 demonstrates the utility of this approximation in practice (on both the SBM and Random Geometric Graphs (short: RGG); see Appendix B for more explanation).
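As a concrete sketch (ours, assuming the unweighted Jost-Liu bounds stated above), the approximation in Equation (4.8) needs only node degrees and a common-neighbor count per edge:

```python
# Combinatorial ORC approximation: mean of the Jost-Liu bounds (Eqs. 4.6-4.8),
# computed from degrees and triangle counts only -- no linear program.
import networkx as nx

def orc_bounds(G, u, v):
    du, dv = G.degree(u), G.degree(v)
    tri = len(set(G.neighbors(u)) & set(G.neighbors(v)))  # #(u, v)
    upper = tri / max(du, dv)
    lower = (-max(0.0, 1 - 1/du - 1/dv - tri/min(du, dv))
             - max(0.0, 1 - 1/du - 1/dv - tri/max(du, dv))
             + tri/max(du, dv))
    return lower, upper

def orc_approx(G, u, v):
    lo, hi = orc_bounds(G, u, v)
    return 0.5 * (lo + hi)
```

On K4 the bounds coincide, so the approximation equals the exact value 2/3; on a bridge joining two triangles the approximation is negative, as expected for an inter-community edge.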
4.1.3 Bounds on Ollivier's curvature in weighted graphs

Since our curvature-based algorithms work by modifying the edge weights in a graph according to their curvature values, we require an analogue of Theorem 4 for weighted graphs, in order to use a simple combinatorial approximation to the curvature as in Equation (4.8). To do so, we make use of the dual description of the Wasserstein distance between two measures m1, m2:

W1(m1, m2) = inf_{π ∈ Π(m1, m2)} E_{(x,y) ~ π} d(x, y) = sup_{h : Lip(h) ≤ 1} [ E_{x ~ m1}[h(x)] − E_{y ~ m2}[h(y)] ] ,

where Π(m1, m2) is the set of couplings of m1, m2, and the supremum is over all functions h : V → R which are Lipschitz with constant at most 1 (with respect to the shortest path distance). We may obtain an upper bound on W1(m1, m2) by finding any coupling π of m1, m2, and a lower bound by finding any 1-Lipschitz function h.

Upper bound on W1(m1, m2). By constructing a transport plan taking m_x to m_y, we obtain the following upper bound:

Lemma 5 Let m_x, m_y denote the measures on the node neighborhoods of x and y given by taking α = 0 and p = 1 in Equation (4.2). Denote the vertices in N(x) \ N(y) by ℓ, those in N(y) \ N(x) by r, and those in N(x) ∩ N(y) by c. Introduce the shorthands L_x := Σ_ℓ m_x(ℓ), L_y := Σ_ℓ m_y(ℓ), as well as X_x := m_x(x), X_y := m_y(x). Then

W1(m_x, m_y) ≤ Σ_ℓ w_{ℓ,x} m_x(ℓ) + Σ_r w_{r,y} m_y(r) + Σ_c [ w_{c,y} (m_x(c) − m_y(c))_+ + w_{c,x} (m_y(c) − m_x(c))_+ ] + ( L_x + X_x − X_y − Σ_c (m_y(c) − m_x(c))_+ ) w_{x,y} .

Remark 6 This upper bound is exact for weighted trees.

As a computational tool, this bound is a step in the right direction: Consider the usual procedure for calculating ORC. We need to compute m_x, m_y, and d_G(u, v) for each pair of vertices u, v in the two neighborhoods. Then we need to carry out the optimization, which can be realized as a linear program as in Equation (4.5). This upper bound still requires the first step, but avoids the need for the optimization over couplings, which is the major bottleneck in the curvature computation.

Lower bound on W1(m1, m2).
We now provide a lower bound on W1(m_x, m_y):

Lemma 7 Let m_x, m_y be as defined above, and let P = {v ∈ N(x) ∪ N(y) : m_x(v) − m_y(v) > 0} and N = {v ∈ N(x) ∪ N(y) : m_x(v) − m_y(v) < 0}. For S ⊆ V and v ∈ V , let d_G(v, S) = min{d_G(v, u) : u ∈ S}. Then

W1(m_x, m_y) ≥ max { Σ_{v ∈ P} d_G(v, N) (m_x(v) − m_y(v)), Σ_{v ∈ N} d_G(v, P) (m_y(v) − m_x(v)) } .

Remark 8 A lower bound for W1(m_x, m_y) was previously given by Jost and Liu (2014). However, the ORC notion considered therein differed from ours in the choice of the mass distribution imposed on node neighborhoods. In addition, we believe that the result in Jost and Liu (2014) has a slight inaccuracy; for details, see the discussion in Remark 28.

Combinatorial bounds on ORC in the weighted case. We now utilize the combinatorial upper and lower bounds on W1(m_x, m_y) to obtain combinatorial bounds on ORC itself.

Theorem 9 (ORC bounds (weighted case)) For x, y ∈ V, x ~ y, we have the following bounds:
1. Upper bound on ORC:

Ric_O({x, y}) ≤ 1 − max { Σ_{v ∈ P} (d_G(v, N)/w_{x,y}) (m_x(v) − m_y(v))_+, Σ_{v ∈ N} (d_G(v, P)/w_{x,y}) (m_y(v) − m_x(v))_+ } .

2. Lower bound on ORC:

Ric_O({x, y}) ≥ 1 − Σ_ℓ (w_{ℓ,x}/w_{x,y}) m_x(ℓ) − Σ_r (w_{r,y}/w_{x,y}) m_y(r) − Σ_c [ (w_{c,y}/w_{x,y}) (m_x(c) − m_y(c))_+ + (w_{c,x}/w_{x,y}) (m_y(c) − m_x(c))_+ ] − ( L_x + X_x − X_y − Σ_c (m_y(c) − m_x(c))_+ ) .

While technical, both bounds are purely combinatorial and can be computed in linear time without the need to solve a linear program.

4.2 Forman's Ricci Curvature

Our second notion of discrete curvature utilizes an analogy between spectral properties of manifolds and CW complexes to define a discrete Ricci curvature.

4.2.1 Formal Definition

Forman's discrete Ricci curvature (short: FRC) was originally defined as a measure of geodesic dispersion on CW complexes (Forman, 2003). In its most general form, FRC is defined via a discrete analogue of the Bochner-Weitzenböck identity □_d = B_d + F_d, which establishes a connection between the (discrete) Riemann-Laplace operator □_d, the Bochner Laplacian B_d and Forman's curvature tensor F_d. Networks can be viewed as polyhedral complexes, which are one instance of CW complexes.
In particular, by viewing a network's edges as 1-cells, we can define FRC for an edge e = (v1, v2) as

Ric_F(e) = ω_e ( ω_{v1}/ω_e + ω_{v2}/ω_e − Σ_{e_{v1} ~ e} ω_{v1}/√(ω_e ω_{e_{v1}}) − Σ_{e_{v2} ~ e} ω_{v2}/√(ω_e ω_{e_{v2}}) ) .  (4.9)

To arrive at this expression, we have specialized Forman's general notion of F_d to the case of 1-cells (see also (Weber et al., 2017a,b)). Here, e_v denotes an edge that shares the vertex v with e. Note that if G is unweighted, then this reduces to Ric_F(e) = 4 − d_{v1} − d_{v2}. We can define FRC for vertices (i.e., 0-cells) with respect to the curvature of their adjacent edges E_v (i.e., E_v := {e ∈ E : e = (·, v) or e = (v, ·)}):

Ric_F(v) = Σ_{e ∈ E_v} Ric_F(e) .  (4.10)

It is well known that higher-order structures impact the community structure in a graph, as well as the ability of many well-known methods to detect community structure. For example, the clustering coefficient, a well-known graph characteristic, can be defined via triangle counts (Watts and Strogatz (1998), see also Section 8.2.1). Therefore, we additionally define an FRC notion which takes contributions of higher-order structures into account. To characterize such structure, we introduce the following notion: We say that two edges e, ê are parallel, denoted e ∥ ê, if they are adjacent to either the same vertex (v ∈ e, ê) or the same face (f ⊇ e, ê), but not both. In the following, we consider the 2-complex version of FRC (Weber et al., 2017b):

Ric_F(e) := ω_e ( Σ_{f : e ⊂ f} ω_e/ω_f + Σ_{v ∈ e} ω_v/ω_e − Σ_{ê ∥ e} | Σ_{f : e, ê ⊂ f} √(ω_e ω_ê)/ω_f − Σ_{v ∈ e ∩ ê} ω_v/√(ω_e ω_ê) | ) .  (4.11)

Note that for each edge ê parallel to e, only one of the two sums inside the absolute value will have any terms. Here, f denotes a 2-face of order k, such as a triangle (k = 3), a quadrangle (k = 4), a pentagon (k = 5), etc.

4.2.2 Computational Considerations

Notice that Equation (4.9) and Equation (4.10) give a simple, combinatorial curvature notion, which can be efficiently computed even on large-scale graphs.
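For the unweighted special case, Ric_F(e) = 4 − d_{v1} − d_{v2} and the node-level sum in Equation (4.10) are a few lines; a sketch of ours (function names are our own):

```python
# 1-complex Forman-Ricci curvature for unweighted graphs (Eqs. 4.9, 4.10).
import networkx as nx

def frc_edge(G, u, v):
    """Edge-level FRC: 4 - d_u - d_v in the unweighted case."""
    return 4 - G.degree(u) - G.degree(v)

def frc_node(G, v):
    """Node-level FRC: sum of the curvatures of incident edges."""
    return sum(frc_edge(G, v, z) for z in G.neighbors(v))
```

For instance, every edge of a cycle has FRC 0, while the end edge of a path on three nodes has FRC 1.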
To compute the 2-complex FRC of an edge e exactly, one needs to identify all higher-order faces in the neighborhood of e and compute their respective contributions in Equation (4.11). This limits the scalability of this notion significantly. However, notice that the likelihood of finding a face of order k in the neighborhood of an edge decreases rapidly as k increases. Figure 7 illustrates this observation with summary statistics for simulations on two popular graph models (the SBM and RGG). We propose to approximate the 2-complex FRC by only considering triangular faces and ignoring structural information involving k-faces with k > 3. This choice reduces the computational burden of the 2-complex FRC significantly. We demonstrate below that, experimentally, the performance of our FRC-based clustering algorithm does not improve if higher-order faces are taken into account for the curvature contribution (see Table 5). Specifically, we will utilize the following augmented FRC:

Ric_F(e) := ω_e ( Σ_{f ⊃ e, f triangle} ω_e/ω_f + ω_{v1}/ω_e + ω_{v2}/ω_e − Σ'_{e_{v1} ~ e} ω_{v1}/√(ω_e ω_{e_{v1}}) − Σ'_{e_{v2} ~ e} ω_{v2}/√(ω_e ω_{e_{v2}}) ) ,  (4.12)

where the primed sums run only over edges that do not form a triangle with e. To be clear, the first primed sum considers all edges e_{v1} which are incident on v1, hence share a vertex in common with e, but do not form a triangle with e. We consider the following weighting scheme for faces, which utilizes Heron's formula to determine face weights via edge weights: Let f = (e_i, e_j, e_k) denote a triangle in G, i.e., e_i ~ e_j, e_j ~ e_k and e_i ~ e_k. Then we set

ω_f = √( s (s − ω_{e_i}) (s − ω_{e_j}) (s − ω_{e_k}) ) ,  (4.13)
s = (ω_{e_i} + ω_{e_j} + ω_{e_k}) / 2 .

This weighting scheme was previously used in (Weber et al., 2017b).

5. Discrete Curvature on the Line Graph

In this section, we investigate the computation of discrete Ricci curvature on the line graph. Again, we consider two notions of curvature, Ollivier's (ORC) and Forman's (FRC). Specifically, we want to describe the relationship between a graph's curvature and the curvature of its line graph. We have seen in Section 4 that curvature can be defined on the node and edge levels.
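To illustrate: for an unweighted graph, a short count shows that the augmented FRC in Equation (4.12) with Heron triangle weights (Equation (4.13)) reduces to 4 − d_{v1} − d_{v2} + (2 + 1/ω_Δ)·#(v1, v2), where ω_Δ = √3/4 is the Heron weight of a unit triangle. A sketch of ours under that assumption (function names are our own):

```python
# Augmented (triangles-only) FRC of Eq. (4.12) for an unweighted graph,
# with every triangle weighted by Heron's formula (Eq. 4.13).
import math
import networkx as nx

def heron(wi, wj, wk):
    """Face weight via Heron's formula (Eq. 4.13)."""
    s = (wi + wj + wk) / 2
    return math.sqrt(s * (s - wi) * (s - wj) * (s - wk))

def frc_augmented(G, u, v):
    tri = len(set(G.neighbors(u)) & set(G.neighbors(v)))  # triangles on {u, v}
    w_tri = heron(1.0, 1.0, 1.0)  # = sqrt(3)/4 for unit edge weights
    # each triangle adds 1/w_tri (face term) and removes two parallel edges
    return 4 - G.degree(u) - G.degree(v) + tri * (2 + 1 / w_tri)
```

On a single triangle this gives 2 + 4/√3 ≈ 4.31 per edge; on a triangle-free edge it coincides with the 1-complex value 4 − d_u − d_v.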
Since nodes in the line graph correspond to edges in the underlying graph, it is natural to study the relationship between node-level curvature in the line graph and edge-level curvature in the original graph. In the context of the clustering applications considered in this paper, we are further interested in the insight that curvature can provide on differences between node clustering (based on connectivity in the original graph) and edge clustering (based on connectivity in the line graph). In this context, it is instructive to study the relationship between edge-level curvature in the line graph and edge-level curvature in the underlying graph.

Figure 7: Number of k-faces in graphs obtained from the planted SBM of size n = 100, two equally-sized blocks, pout = 0.02 (left; pin = 0.2, 0.4), and the 2-d RGG of size n = 100 (right; r = 0.3, 0.4), where the results are averaged over ns = 50 realisations of each set of parameters. (Axes: face order vs. averaged count.)

Recall that, given an unweighted, undirected graph G = (V, E), its line graph is defined as L(G) = (E, E'), where an edge {{u, v}, {r, s}} ∈ E' appears in the line graph whenever |{u, v} ∩ {r, s}| = 1. We will make use of the following simple observation throughout the section:

Lemma 10 Node degrees in the line graph are given as d_{u,v} = d_u + d_v − 2, where the left-hand side denotes the node degree in the line graph L(G), and the right-hand side involves the degrees of the vertices u and v in G. This formula follows from the fact that the edges connecting to {u, v} in L(G) are precisely those edges of the form {u, z} ∈ E or {v, z} ∈ E, where z ≠ u, v.

5.1 Numerical observations

We explore numerically the relationship of (1) edge-level curvature in the original graph G and node-level curvature in the corresponding line graph L, as well as (2) edge-level curvatures in the original graph G and its line graph L.
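Lemma 10 is easy to check mechanically with networkx (a quick sanity check of ours, not part of the paper's pipeline):

```python
# Verify Lemma 10: the degree of node {u, v} in L(G) equals d_u + d_v - 2.
import networkx as nx

G = nx.karate_club_graph()
L = nx.line_graph(G)   # nodes of L are the edges of G, as tuples (u, v)

ok = all(L.degree((u, v)) == G.degree(u) + G.degree(v) - 2
         for u, v in L.nodes())
print(ok)  # True
```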
Our empirical results consider the two notions of curvature introduced above: Forman's Ricci curvature (FRC) and Ollivier's Ricci curvature (ORC). Our data sets are described in Appendix B. To compute the classical FRC and ORC notions in the original graph G and its line graph L numerically, we use the built-in curvature computation, both for edges and for nodes, in the GraphRicciCurvature library (Ni et al., 2019). Our implementations of the augmented FRC and our ORC approximation build on the code in this library. Results for each random graph model are obtained from ns = 50 realizations.

5.2 Ollivier's Ricci Curvature

In this section, we establish theoretical evidence for the observed relationship between Ollivier's curvature (ORC) of a graph and its line graph.

Figure 8: Correlation of the distributions of the classical ORC (left), the approximation of ORC from the average of the
upper and lower bounds (middle left), the classical FRC (middle right), and the augmented FRC (right) for edges in G versus that for vertices in L, where the networks are obtained from (top) the planted SBM of size n = 100, two equally-sized blocks, pout = 0.02 and pin = 0.4; (middle) the 2-d RGG of size n = 100 and radius r = 0.3; (bottom) the Worm data set. Notice that the range of curvature values varies between curvature notions, but the shape of the distributions has a close resemblance.

5.2.1 Unweighted case

Consider an unweighted graph G, and denote ORC in G and L(G) by Ric_O^G and Ric_O^{L(G)}, respectively. We prove combinatorial upper and lower bounds for ORC in L(G), which can be computed solely from structural information in the underlying graph G. These results are the analogues of Equations (4.6), (4.7), but make use of the relationships between the line graph L(G) and its base graph to avoid computing curvatures in the line graph.

Lemma 11 (Bounds on line graph ORC) Let d_u denote the degree of the vertex u ∈ V in the graph G, and d_{u,v} the degree of the vertex {u, v} ∈ E in the line graph L(G). Then we have the following bounds on the ORC of edges in L(G):

1.
Upper bound on ORC:

Ric_O^{L(G)}({{u, v}, {v, w}}) ≤ ( d_v − 2 + 1_E({u, w}) ) / ( (d_u ∨ d_w) + d_v − 2 ) ;  (5.1)

Figure 9: Correlation of the distributions of the classical ORC (left), the approximation of ORC from the average of the upper and lower bounds (middle left), the classical FRC (middle right), and the augmented FRC (right) for edges in G versus that for edges in L, where the networks are obtained from (top) the planted SBM of size n = 100, two equally-sized blocks, pout = 0.02 and pin = 0.4; (middle) the 2-d RGG of size n = 100 and radius r = 0.3; (bottom) the Worm data set. Notice that the range of curvature values varies between curvature notions, but the shape of the distributions has a close resemblance.

2.
Lower bound on ORC:

Ric_O^{L(G)}({{u, v}, {v, w}}) ≥ −( 1 − 1/(d_u + d_v − 2) − 1/(d_v + d_w − 2) − (d_v − 2 + 1_E({u, w}))/((d_u ∧ d_w) + d_v − 2) )_+ − ( 1 − 1/(d_u + d_v − 2) − 1/(d_v + d_w − 2) − (d_v − 2 + 1_E({u, w}))/((d_u ∨ d_w) + d_v − 2) )_+ + (d_v − 2 + 1_E({u, w}))/((d_u ∨ d_w) + d_v − 2) .  (5.2)

Here, 1_E({u, w}) = 1 if u, w are connected by an edge in G, and 1_E({u, w}) = 0 otherwise.

A crucial feature of this lemma is that it does not require the computation of triangle counts in L(G): It only requires evaluating the degrees of the constituent vertices, and the existence of the edge {u, w} ∈ E. The latter results in an additional triangle in the base graph G and, since L(K3) is isomorphic to K3, an additional triangle in the line graph.

5.2.2 Weighted case

Importantly, we can show a version of these bounds in the weighted case, too. Let ω({u, v}, {v, w}) denote the weight of the edge {{u, v}, {v, w}} ∈ E'. Then for an edge {e1, e2} ∈ E', Theorem 9 gives us the following result, which makes use of the notation from Section 4.1.3.

1. Upper bound on ORC:

Ric_O({e1, e2}) ≤ 1 − max { Σ_{e ∈ P} (d_{L(G)}(e, N)/ω(e1, e2)) (m_1(e) − m_2(e))_+, Σ_{e ∈ N} (d_{L(G)}(e, P)/ω(e1, e2)) (m_2(e) − m_1(e))_+ } .

2. Lower bound on ORC:

Ric_O({e1, e2}) ≥ 1 − Σ_ℓ (ω(ℓ, e1)/ω(e1, e2)) m_1(ℓ) − Σ_r (ω(r, e2)/ω(e1, e2)) m_2(r) − Σ_c [ (ω(c, e2)/ω(e1, e2)) (m_1(c) − m_2(c))_+ + (ω(c, e1)/ω(e1, e2)) (m_2(c) − m_1(c))_+ ] − ( L_1 + X_1 − X_2 − Σ_c (m_2(c) − m_1(c))_+ ) .

Note that the upper bound for ORC in the line graph still requires one to compute the line graph. However, we can always use the trivial upper bound Ric_O({e1, e2}) ≤ 1 if an approximation will suffice. See Section 7.4 for experimental results utilizing this approach.

Summary. In the applications discussed in the next section, we again approximate ORC as the arithmetic mean of the upper and lower bounds for the sake of computational efficiency:

Definition 12 (Approximate ORC in the Line Graph)

R̂ic_O^{L(G)}({u, v}) := (1/2) ( Ric_{O,up}^{L(G)}({u, v}) + Ric_{O,low}^{L(G)}({u, v}) ) .  (5.3)

In Figure 10, we compare this approximation to the original ORC for some simulated graphs.
Note that the approximations in the right two columns can be computed only from information contained in the graph G, so that L does not need to be constructed.

5.3 Forman's Ricci Curvature

In this section, we establish a series of relationships between Forman's curvature of a graph and its line graph. We further discuss how higher-order structures in the line graph may be incorporated in the curvature computation.

5.3.1 Unweighted case

Consider an unweighted graph G, and denote FRC in G and L(G) by Ric_F^G and Ric_F^{L(G)}, respectively. At first, we only consider the 1-complex curvature notion (Equations (4.9), (4.10)), which does not consider contributions of higher-order structures, such as triangles or quadrangles. We begin by investigating the relationship of edge-level Forman curvature in a graph G and the node- and edge-level curvatures in its line graph L(G).

Figure 10: Scatter plot of the ORC on L obtained from optimization versus its approximation with L constructed (left two columns) and without (right two columns), where the optimization is done by solving the earth mover's distance exactly (leftmost and middle right) and approximately with Sinkhorn (middle left and rightmost), and the original graph G is obtained from the planted SBM of size n = 100, two equally-sized blocks, pout = 0.02 and pin = 0.4 (top) and the 2-d RGG of size n = 100 and radius r = 0.4 (bottom).

Lemma 13 Edge-level FRC in G relates to edge-level FRC in L as

Ric_F^{L(G)}({{u, v}, {v, w}}) = Ric_F^G({u, v}) + Ric_F^G({v, w}) .  (5.4)

Edge-level FRC in G relates to node-level FRC in L as

Ric_F^{L(G)}({u, v}) = Ric_F^G(u) + Ric_F^G(v) − ( Ric_F^G({u, v}) )² .  (5.5)

Importantly, this lemma allows us to compute node-level FRC in the line graph with respect to quantities in the original graph only. Specifically, the right-hand side can be computed from edge-level FRC only, utilizing Equation (4.10).
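Both identities in Lemma 13 can be verified numerically for an unweighted graph; a sanity-check sketch of ours, using the unweighted reduction Ric_F(e) = 4 − d_u − d_v:

```python
# Numerical check of Lemma 13 (Eqs. 5.4 and 5.5) on an unweighted graph.
import networkx as nx

def frc_edge(H, u, v):
    return 4 - H.degree(u) - H.degree(v)

def frc_node(H, v):
    return sum(frc_edge(H, v, z) for z in H.neighbors(v))

G = nx.karate_club_graph()
L = nx.line_graph(G)

# Eq. (5.4): edge-level FRC in L(G) is the sum of the two edge curvatures in G
ok_edge = all(frc_edge(L, e1, e2) == frc_edge(G, *e1) + frc_edge(G, *e2)
              for e1, e2 in L.edges())
# Eq. (5.5): node-level FRC in L(G) from node- and edge-level FRC in G
ok_node = all(frc_node(L, (u, v))
              == frc_node(G, u) + frc_node(G, v) - frc_edge(G, u, v) ** 2
              for u, v in L.nodes())
```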
See the third columns in Figures 8 and 9 for an illustration of these relationships in a few simulated and real-data graphs.

5.3.2 Weighted case

When the vertex weights satisfy ω_u = 1 for all u ∈ V , and using the line graph edge weights ω_{e1,e2} = ω_{e1} ω_{e2}, we get the following formula for the relationship between the edge-level FRC in G versus the edge-level FRC in L(G):

Ric_F^{L(G)}({{u, v}, {v, w}}) = 4 − √(ω_{v,w}/ω_{u,v}) (2 − Ric_F^G({u, v})) − √(ω_{u,v}/ω_{v,w}) (2 − Ric_F^G({v, w})) .

5.3.3 Incorporating higher-order structures

Finally, we discuss how to account for contributions of higher-order structures, such as triangles or quadrangles, in the computation of curvature in the line graph. As discussed in Section 4, the curvature contributions of such structures are crucial in curvature-based clustering. We have demonstrated that the successful application of the FRC-based approach to single-membership community detection relies on the consideration of triangular faces, such as in our 2-complex FRC notion (Equation (4.11)). In the following, we will illustrate the necessity of including contributions of quadrangular faces in the curvature computation on the line graph.

We first define a 2-complex FRC notion on the line graph, which accounts for curvature contributions of triangular faces. Let Γ(v) = {v' : {v, v'} ∈ E} denote the set of neighbors of a vertex v ∈ V in G. Suppose {u, v}, {v, w} ∈ E denote neighboring edges in G and ({u, v}, {v, w}) ∈ E' the corresponding edge in the line graph. We define the set of neighbors of an edge {u, v} as follows:

Γ({u, v}) := { {u, u'} : u' ∈ Γ(u)\{v} } ∪ { {v, v'} : v' ∈ Γ(v)\{u} } .

The set of triangles in the line graph containing {u, v}, {v, w} is then given by

F := { ({u, v}, {v, w}, e) : e ∈ Γ({u, v}) ∩ Γ({v, w}) } = { ({u, v}, {v, w}, e) : e ∈ { {v, v'} : v' ∈ Γ(v)\{u, w} } ∪ ( {{u, w}} ∩ E ) } .
Hence, there are only two possible structures in G which give rise to triangles in its line graph L: (i) edges incident to the common node of the two edges under consideration, or (ii) an edge connecting the other endpoints of the two edges. The 2-complex FRC with triangle contributions can then be written as follows:

Definition 15 (2-complex FRC in the Line Graph (k ≤ 3))

Ric_F({{u, v}, {v, w}}) = ω_{{u,v},{v,w}} ( Σ_{f ∈ F} ω_{{u,v},{v,w}}/ω_f + ω_{u,v}/ω_{{u,v},{v,w}} + ω_{v,w}/ω_{{u,v},{v,w}} − Σ_{x ∈ Γ(u)\{v,w}} ω_{u,v}/√(ω_{{u,v},{v,w}} ω_{{u,v},{u,x}}) − Σ_{x ∈ Γ(w)\{u,v}} ω_{v,w}/√(ω_{{u,v},{v,w}} ω_{{v,w},{w,x}}) ) .

In the special case of an unweighted network (i.e., ω_e = 1 for all e ∈ E and ω_{{e,e'}} = 1 for all {e, e'} ∈ E'), we have

Ric_F({{u, v}, {v, w}}) = 2 + Σ_{f ∈ F} 1/ω_f − |{ e ∈ Γ({u, v})\{{v, w}} : ({u, v}, {v, w}, e) ∉ F }| − |{ e ∈ Γ({v, w})\{{u, v}} : ({u, v}, {v, w}, e) ∉ F }| .  (5.6)

If we assign each triangle the same weight ω_Δ, this further simplifies the above equation as follows:

Lemma 16 (2-complex FRC in the Line Graph (triangles only))

Ric_F({{u, v}, {v, w}}) = (d_v − 2)/ω_Δ − d_u − d_w + 4 + (1/ω_Δ + 2) 1_E({u, w}) .  (5.7)

The lemma implies that the triangles-only 2-complex FRC in the line graph is still largely determined by the degrees of the nodes in the original graph G; the local triangle count enters via the term 1_E({u, w}) (which is 1 if {u, w} ∈ E and 0 otherwise). A simple calculation shows that ω_Δ = √3/4, i.e., 1/ω_Δ + 2 = (4√3 + 6)/3 ≈ 4.309, while node degrees could be as high as O(|V |). As such, a triangle in G increases the curvature value in L by 1/ω_Δ + 2. However, our experimental results (see Table 11) demonstrate that clustering methods based on the triangles-only 2-complex FRC can have low accuracy. While considering triangles was sufficient for FRC-based community detection on the original graph, we need to incorporate additional higher-order information to perform community detection on the line graph with high accuracy.
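The triangle count underlying Lemmas 11 and 16, namely that each edge {{u, v}, {v, w}} of L(G) lies in exactly d_v − 2 + 1_E({u, w}) triangles, can be checked directly (a sketch of ours):

```python
# Verify the line-graph triangle count d_v - 2 + 1_E({u, w}) per L-edge.
import networkx as nx

def line_triangles_match(G, L):
    for e1, e2 in L.edges():
        (v,) = set(e1) & set(e2)              # the shared endpoint of e1, e2
        u = e1[0] if e1[1] == v else e1[1]
        w = e2[0] if e2[1] == v else e2[1]
        # triangles on the L-edge {e1, e2} = common neighbors of e1, e2 in L
        tri = len(set(L.neighbors(e1)) & set(L.neighbors(e2)))
        if tri != G.degree(v) - 2 + int(G.has_edge(u, w)):
            return False
    return True

G = nx.karate_club_graph()
L = nx.line_graph(G)
print(line_triangles_match(G, L))  # True
```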
Our experiments suggest that accounting for curvature contributions of quadrangles leads to a significant improvement in the performance of FRC-based mixed-membership community detection (see Table 11). We found that incorporating curvature contributions of k-faces with k ≥ 5 does not improve performance, while significantly increasing the computational burden. This is expected, given the decreasing frequency of such structures (see Figure 7). These design choices lead to the following 2-complex FRC notion, which we will utilize in our FRC-based clustering method below: The set of quadrangles in the line graph which contain {u, v}, {v, w} is given by

F' := { ({u, v}, {v, w}, {w, x}, {x, u}) : x ∈ (Γ(w) ∩ Γ(u))\{v} } .

As such, there are two ways for quadrangles to form in the line graph: First, any quadrangle in the original graph G gives rise to one in the line graph L. Second, a quadrangle with a single chord ({u, v, w, x}, {{u, v}, {v, w}, {w, x}, {x, u}, {u, w}}) still results in a quadrangle in L from ({u, v}, {v, w}, {w, x}, {x, u}), since there are no edges from {u, v} to {w, x} or from {v, w} to {x, u} in L.

Definition 17 (2-complex FRC in the Line Graph (k ≤ 4))

Ric_F({{u, v}, {v, w}}) = ω_{{u,v},{v,w}} ( Σ_{f ∈ F} ω_{{u,v},{v,w}}/ω_f + Σ_{({u,v},{v,w},e1,e2) ∈ F'} ω_{{u,v},{v,w}}/ω_{({u,v},{v,w},e1,e2)} + ω_{u,v}/ω_{{u,v},{v,w}} + ω_{v,w}/ω_{{u,v},{v,w}} − Σ_{x ∈ Γ(u)\Γ0(w)} ω_{u,v}/√(ω_{{u,v},{v,w}} ω_{{u,v},{u,x}}) − Σ_{x ∈ Γ(w)\Γ0(u)} ω_{v,w}/√(ω_{{u,v},{v,w}} ω_{{v,w},{w,x}}) ) ,

where Γ0(x) = Γ(x) ∪ {x}. Our choice of weights for quadrangles is also based on geometric intuition utilizing Heron's formula. Full details for this are given in Appendix A.4.

6. Algorithms

6.1 ORC-based approach

The curvature-based community detection algorithms (Algorithm 1) are built on the fact that bridges between communities are more negatively curved than edges within communities.
We have discussed earlier that this can be observed both in graphs (in the single-membership case) and in their line graphs (in the case of mixed-membership communities). In this section, we provide a detailed discussion of Algorithm 1 with κ denoting Ollivier's Ricci curvature (ORC). We begin by giving some intuition for the differences in curvature values of bridges and internal edges, before formalizing the argument below. By construction, ORC is closely linked to the behavior of two random walks starting at neighboring vertices (Ollivier, 2009; Jost and Liu, 2014). Informally, they are more likely to draw apart if the edge between them has negative ORC, and to draw closer together otherwise. Random walks that start at nodes adjacent to a bridge (bridge nodes) typically move into the communities to which the respective nodes belong. As a consequence, they draw apart quickly. Random walks that start at non-bridge nodes, i.e., nodes adjacent to internal edges, are more likely to stay close to each other, as they remain within the same community. Hence, we expect bridges to have much lower ORC than internal edges, a fact that is easily confirmed empirically (see, e.g., Figure 4).

Remark 18 We note that an alternative notion of Ollivier's curvature (Equation (4.3)) was given by Lin et al. (2011), where node neighborhoods are endowed with the measure

m_u^α(z) := α if z = u; (1 − α)/|N_u| if u ~ z; 0 else,  (6.1)

instead of the uniform measure. A weighted version of this curvature was considered in (Ni et al., 2019). Notice that for α > 0, this curvature notion connects to lazy random walks starting at neighboring nodes. Here, the parameter α could be seen as controlling whether the random walk is likely to revisit a node, which in turn relates to a distinction between exploration of node neighborhoods in the style of breadth-first search vs. depth-first search.
A suitable choice of α is highly dependent on the topology of the graph; hence, replacing ORC with Equation (6.1) provides an additional means of incorporating side information on the structure of the underlying graph. However, in this work we restrict ourselves to the classical ORC notion.

6.1.1 Algorithm

Single-Membership Community Detection. We implement Algorithm 1 via ORC by setting κ(·) = R̂ic_O(·), i.e., the curvature is chosen to be the combinatorial ORC approximation proposed in Section 4.1. Notice that as the edge weights evolve under the Ricci flow, the graph is weighted, even if the input graph was unweighted. Consequently, we use the weighted ORC curvature, constructing the approximation from the bounds in Theorem 9. We further choose equally-spaced cut-off points {x_i}_{i=0}^{n_f}, where x_0 = max_{{u,v} ∈ E} w^T_{u,v}, x_1 = x_0 − δ, . . . , x_{n_f} = ((x_0 − 1) mod δ + 1), and the step size for the cut-off points is set to δ = 0.025. Other hyperparameter choices include a constant step size ν = 1, ϵ = 10^{−4}, drop threshold ϵ_d = 0.1, and stopping time T = 10, i.e., we evolve edge weights under Ricci flow for 10 iterations. Instances of Algorithm 1 via ORC were previously considered in (Ni et al., 2019; Sia et al., 2019; Gosztolai and Arnaudon, 2021). These approaches computed ORC either exactly (via the earth mover's distance) or approximately (via Sinkhorn's algorithm). However, due to the higher computational cost of either variant of the ORC computation, the approaches were limited in their scalability (see the discussion in Section 6.1.3 below).

Mixed-Membership Community Detection. As in the single-membership case, we implement Algorithm 1 via our combinatorial ORC approximation (Equation (5.3)). The input graph G is the line graph of the underlying graph, which is constructed before the clustering procedure is started. All edge weights are initially set to 1, i.e., the input graph is unweighted. Hyperparameter choices are analogous to the single-membership case.
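Schematically, the flow-and-cut procedure described above looks as follows. This is our paraphrase of Algorithm 1, not the authors' code: for brevity it uses the unweighted combinatorial ORC approximation as a stand-in curvature (ignoring the evolving weights) and a single weight cut-off instead of the sequence {x_i}.

```python
# Toy sketch of curvature-based clustering: evolve edge weights under a
# discrete Ricci flow w <- (1 - nu * kappa) * w, cut high-weight edges
# (the bridges), and read off connected components.
import networkx as nx

def kappa_approx(H, u, v):
    """Unweighted combinatorial ORC approximation (mean of Jost-Liu bounds)."""
    du, dv = H.degree(u), H.degree(v)
    tri = len(set(H.neighbors(u)) & set(H.neighbors(v)))
    upper = tri / max(du, dv)
    lower = (-max(0.0, 1 - 1/du - 1/dv - tri/min(du, dv))
             - max(0.0, 1 - 1/du - 1/dv - tri/max(du, dv)) + upper)
    return 0.5 * (lower + upper)

def ricci_flow_clusters(G, curvature=kappa_approx, T=10, nu=1.0, cutoff=1.0):
    H = G.copy()
    nx.set_edge_attributes(H, 1.0, "weight")
    for _ in range(T):
        kappa = {(u, v): curvature(H, u, v) for u, v in H.edges()}
        for (u, v), k in kappa.items():
            H[u][v]["weight"] *= max(1.0 - nu * k, 1e-4)  # keep weights positive
    # negatively curved bridges accumulate large weights: remove them
    H.remove_edges_from([(u, v) for u, v in H.edges()
                         if H[u][v]["weight"] > cutoff])
    return list(nx.connected_components(H))
```

On two copies of K5 joined by a single bridge, the bridge weight grows under the flow while internal weights shrink, so cutting at the initial weight 1 recovers the two communities.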
Instances of Algorithm 1 via ORC were first considered in (Tian et al., 2023). Again, due to the high computational cost of the ORC computation, the approach was limited in its scalability.

6.1.2 Theoretical results

Single-Membership case. To give theoretical intuition for curvature-based community detection algorithms in the single-membership case, we consider the following model class of graphs with community structure:

Definition 19 Let G_{a,b} (a > b ≥ 2) be a graph constructed as follows:
1. Construct a complete graph with b vertices {v_1, . . . , v_b}.
2. For each i = 1, . . . , b, introduce vertices u_{ij}, j = 1, . . . , a, and make all possible connections between vertices in {v_i} ∪ {u_{ij}}_{j=1}^a, resulting in b copies of complete graphs K_{a+1}.

We show examples of graphs of the form G_{a,b} in Figure 11. Notice that in a graph G_{a,b} we have three types of edges:
1. bridges between communities; we denote vertices adjacent to bridges as bridge nodes and all others as internal nodes;
2. internal edges that connect a bridge node and an internal node;
3. internal edges that connect two internal nodes.

As before, we consider the following mass distribution on the node neighborhoods:

m_x(y) = (1/C) e^{−w_{x,y}} if x ~ y; 0 else .  (6.2)

Here, C denotes the normalizing constant. This choice follows from the more general scheme in Equation (4.2) by setting α = 0, p = 1. We assume that at initialization, all edges are assigned weight 1.

Theorem 20 Let the edge weights in G_{a,b} evolve under the Ricci flow w^{t+1}_i ← (1 − κ^t_i) w^t_i (where κ^t_i denotes the ORC of an edge of type i using the weights at time t).
1. The weights of internal edges (type (3)) contract asymptotically, i.e., w^t_3 → 0 as t → ∞.
2. The weights of bridges are larger than those of internal edges of types (2) and (3), i.e., w^t_1 > w^t_2 and w^t_1 > w^t_3 for all t > 0.
Remark 21 We note that the class of graphs G_{a,b} was previously considered in (Ni et al., 2019), which also gives asymptotic guarantees for the evolution of edge weights under the Ricci flow. However, they assume a different mass distribution; hence, their asymptotic results are not directly applicable to our setting.

We will present the full proof of this theorem in Appendix A.5, but we give a sketch of the proof here. The argument proceeds by induction, first calculating the exact ORC curvatures of the three edge types at iteration 1, and then calculating the exact ORC curvatures of the three edge types at iteration t, given the weights w^t_i. In each of these steps, we construct transport plans, as well as 1-Lipschitz functions with equal objective function values, to show their optimality. This proves that we have the exact values of the Wasserstein distances. We use these exact formulas to show that the evolution of the weights has the properties given in the statement.

Mixed-Membership case. We modify the model class introduced in Definition 19 to account for mixed-membership community structure:

Definition 22 Let L_{a,b} (a > b ≥ 2) be a graph constructed as follows:
1. Construct a star-shaped graph, consisting of a center node of degree b and b leaf nodes of degree 1.
2. Replace each leaf node with a star-shaped graph with a + 1 vertices.

Notice that L_{a,b} consists of b blocks (communities) of size a + 2 with one common node, i.e., the center node is a member of each of the b communities. It is easy to see that the dual (line graph) of a graph L_{a,b} is a graph of the form G_{a,b}. Consequently, we recover guarantees for curvature-based clustering on the line graph from Theorem 20.

6.1.3 Computational Considerations

Figure 11: Examples of graphs of the model classes G_{a,b} and L_{a,b} (top: a = b = 2, bottom: a = b = 3). For both pairs, G_{a,b} is the line graph of L_{a,b}.
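The constructions in Definitions 19 and 22, and the duality claim that the line graph of L_{a,b} is G_{a,b}, can be reproduced with networkx (a sketch of ours; node labels are our own):

```python
# Build G_{a,b} (Def. 19) and L_{a,b} (Def. 22) and check L(L_{a,b}) = G_{a,b}.
import networkx as nx

def G_ab(a, b):
    """b copies of K_{a+1} whose hub vertices v_1..v_b form a complete graph."""
    G = nx.complete_graph(b)                     # hubs 0..b-1
    for i in range(b):
        block = [i] + [f"u{i}_{j}" for j in range(a)]
        G.add_edges_from((x, y) for k, x in enumerate(block)
                         for y in block[k + 1:])
    return G

def L_ab(a, b):
    """A central star whose b leaves are centers of (a+1)-vertex stars."""
    G = nx.Graph()
    for i in range(b):
        G.add_edge("center", f"hub{i}")
        for j in range(a):
            G.add_edge(f"hub{i}", f"leaf{i}_{j}")
    return G

print(nx.is_isomorphic(nx.line_graph(L_ab(4, 3)), G_ab(4, 3)))  # True
```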
The two key bottlenecks in ORC instances of Algorithm 1 are (1) the computation of the $W_1$ distance in the ORC computation (both single- and mixed-membership cases), and (2) the construction of the line graph (mixed-membership case). Following our discussion in Section 4.1, we can circumvent bottleneck (1) by approximating ORC with the arithmetic mean of its upper and lower bounds, which are given by Equation (4.8) in the original graph (used in the single-membership case) and Equation (5.3) in the line graph (used in the mixed-membership case). This reduces the complexity of the ORC computation to $O(|E|\, d_{\max})$, in comparison with previous approaches (Ni et al., 2019; Sia et al., 2019), which achieved at best $O(|E|\, d_{\max}^2)$ via Sinkhorn's approximation of the Wasserstein distance. The second bottleneck may also be circumvented in special cases. We discuss the details in Section 7.4 below.

6.2 FRC-based Approach

Finally, we discuss the second instance of Algorithm 1, where $\kappa$ denotes Forman's Ricci curvature (FRC). Again, we observe that, typically, bridges have lower FRC than internal edges within communities, which can be exploited for curvature-based community detection. As in the previous section, we first provide some informal intuition for this observation before formally introducing the algorithm. Notice that bridge nodes typically have a high node degree, as they connect not only to other nodes within the same community, but also to nodes in other communities. It is easy to see from the definition (recall, $\mathrm{Ric}_F(e) = 4 - d_{v_1} - d_{v_2}$) that, as a result, bridges usually have lower curvature than internal edges in unweighted networks. This imbalance in curvature values is iteratively reinforced as edge weights evolve under the Ricci flow.
In particular, we see that, in the case of bridges, the 2-complex FRC (Equation (4.11)) is increasingly dominated by the second term, due to the high degree of the adjacent vertices and, consequently, the large number of parallel edges. This is reinforced as the edge weight increases under the Ricci flow, which decreases the first term, resulting in negative curvature of high absolute value. In contrast, for internal edges, the second term has a smaller magnitude due to the smaller degrees of the adjacent vertices. This is again reinforced as the edge weight decreases under the Ricci flow.

6.2.1 Algorithm

Single-Membership Community Detection. We implement Algorithm 1 via FRC by setting $\kappa(\cdot) = \mathrm{Ric}_F(\cdot)$, chosen to be the 2-complex FRC with triangle contributions (Equation (4.12)). In the absence of given edge weights, we initialize edge weights to 1 as before. Adaptive step sizes are chosen proportional to the inverse of the highest absolute curvature of any edge in a given iteration, i.e., $\nu_t = 1/(1.1 \max_{\{u,v\} \in E} |\kappa^t_{u,v}|)$. In our experiments below, we evolve edge weights for $T = 10$ iterations under the Ricci flow. To infer communities, cut-off points are chosen as $\{x_i\}_{i=0}^{n_f}$, where $x_0 = \max_{\{u,v\} \in E} w^T_{u,v}$, $x_1 = \max_{\{u,v\} \in E,\ w^T_{u,v} < x_0} w^T_{u,v}$, and so on.

If $m_x(c) - m_y(c) > 0$, we will send this amount to $y$ along $(c, y)$. Otherwise, we will send $m_y(c) - m_x(c)$ from $x$ along $(c, x)$. This results in the updated (signed) measure
$$\hat m_x(v) = \begin{cases} X_x - \sum_c (m_y(c) - m_x(c))_+ & \text{if } v = x \\ m_y(c) & \text{if } v = c \\ Y_x + \sum_c (m_x(c) - m_y(c))_+ & \text{if } v = y. \end{cases}$$
Finally, if $\hat m_x(x) - X_y > 0$, send this amount to $y$ along the edge $(x, y)$. Otherwise, send $\hat m_x(y) - Y_y = -(\hat m_x(x) - X_y)$ from $y$ to $x$ along the $(x, y)$ edge. This gives the upper bound
$$W_1(\hat m_x, m_y) \leq \sum_c \big[(m_x(c) - m_y(c))_+\, \omega(c, y) + (m_y(c) - m_x(c))_+\, \omega(c, x)\big] + \Big| X_x - X_y - \sum_c (m_y(c) - m_x(c))_+ \Big|\, \omega(x, y).$$
Finally, we have
$$W_1(m_x, m_y) \leq \sum_\ell \omega(\ell, x)\, m_x(\ell) + \sum_r \omega(r, y)\, m_y(r) + \sum_c \big[\omega(c, y)(m_x(c) - m_y(c))_+ + \omega(c, x)(m_y(c) - m_x(c))_+\big] + \Big(L_x + X_x - X_y - \sum_c (m_y(c) - m_x(c))_+\Big)\, \omega(x, y) = U_{x,y}.$$
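Transport plans and 1-Lipschitz certificates like the ones used in these bounds can be sanity-checked on small graphs against an exact $W_1$ obtained by solving the transport linear program directly. The following brute-force sketch (our illustration, not the paper's implementation; networkx and scipy are assumed to be available) computes $W_1$ with shortest-path costs:

```python
import numpy as np
import networkx as nx
from scipy.optimize import linprog

def w1_exact(G, mx, my):
    """Exact Wasserstein-1 distance between two measures mx, my (dicts
    node -> mass, equal total mass) on the nodes of G, with shortest-path
    costs, solved as a small transport LP.  Only suitable for small graphs."""
    nodes = list(G.nodes())
    n = len(nodes)
    dist = dict(nx.all_pairs_shortest_path_length(G))
    # cost vector: pi[i*n + j] is the mass moved from nodes[i] to nodes[j]
    c = np.array([dist[u][v] for u in nodes for v in nodes], dtype=float)
    A_eq, b_eq = [], []
    for i in range(n):                     # row sums must equal mx
        row = np.zeros(n * n); row[i * n:(i + 1) * n] = 1.0
        A_eq.append(row); b_eq.append(mx.get(nodes[i], 0.0))
    for j in range(n):                     # column sums must equal my
        col = np.zeros(n * n); col[j::n] = 1.0
        A_eq.append(col); b_eq.append(my.get(nodes[j], 0.0))
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
    return res.fun
```

On a path graph with a unit mass moved two hops, this returns 2, matching the cost of the obvious transport plan.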
For completeness, we state and prove the following lower bound on the Wasserstein distance:

Lemma 27 For any 1-Lipschitz function $h(v)$ on $G$, we have
$$W_1(m_x, m_y) \geq \mathbb{E}_{v \sim m_x}[h(v)] - \mathbb{E}_{v \sim m_y}[h(v)] = \sum_{v \in N(x) \cup N(y)} h(v)\, (m_x(v) - m_y(v)).$$

Proof Given a coupling $\pi$ for $m_x$ and $m_y$, and a 1-Lipschitz function $h : V \to \mathbb{R}$, we have
$$\mathbb{E}_{(u,w) \sim \pi}[d_G(u, w)] = \sum_{u \in N(x),\, w \in N(y)} d_G(u, w)\, \pi(u, w) \geq \sum_{u \in N(x),\, w \in N(y)} (h(u) - h(w))\, \pi(u, w) = \sum_{v \in N(x) \cup N(y)} h(v)\, (m_x(v) - m_y(v)).$$
Since the choice of coupling was arbitrary, we conclude that $W_1(m_x, m_y)$ satisfies the same lower bound.

Now we may prove Lemma 7.

Proof of Lemma 7: Let $V_H = N(x) \cup N(y)$ be the set of vertices from the subgraph $H$ of $G$ that we considered in the upper bound for $W_1(m_x, m_y)$. A difficulty that arises in defining a 1-Lipschitz function $h$ on $G$ is that there are more edges and paths in $G$ than just those in $H$. These short-circuits in the larger graph mean that a 1-Lipschitz function defined on $H$ does not necessarily extend to a 1-Lipschitz function on $G$. We define a partition of $V_H$ into three subsets $P, Z, N$ based on whether $m_x(v) - m_y(v)$ is positive, zero, or negative. Observe that for a vertex $v \in P$, the excess mass $m_x(v) - m_y(v)$ must be transported away from $v$ in order to have a valid transport plan. In fact, it must be transported at least far enough to reach $N$, since
$$0 = \sum_{v \in V} (m_x(v) - m_y(v)) = \sum_{v \in P} (m_x(v) - m_y(v)) + \sum_{v \in N} (m_x(v) - m_y(v))$$
implies that $m_x(P) - m_y(P) = m_y(N) - m_x(N)$. In other words, all excess mass from $m_x$ in $P$ must be transported to vertices in $N$, or we cannot have a valid transport plan. Motivated by this observation, we define the distance from a vertex $v \in V$ to a set of vertices $S \subseteq V$ as $d_G(v, S) = \min_{u \in S} d_G(v, u)$, where $d_G$ is the shortest-path distance in $G$. Thus, $d_G(v, S)$ gives the distance in $G$ from the vertex $v$ to the nearest element of the set $S$. Consider the following 1-Lipschitz functions parameterized by $\lambda \in [0, 1]$:
$$h_\lambda(v) = \begin{cases} \lambda\, d_G(v, N) & \text{if } v \in P \\ -(1 - \lambda)\, d_G(v, P) & \text{if } v \in N \\ \lambda\, d_G(v, N) - (1 - \lambda)\, d_G(v, P) & \text{otherwise.} \end{cases}$$
Note that the third case covers all $v \in V$, since $d_G(v, P) = 0$ for $v \in P$ and similarly for $N$; we split it into cases to show the behavior on the different sets more clearly. These functions are all 1-Lipschitz, since they are convex combinations of the two 1-Lipschitz functions $d_G(\cdot, N)$ and $-d_G(\cdot, P)$. Let us show that these are 1-Lipschitz: Let $S \subseteq V$, and let $u, v \in V$. Let $w_u, w_v \in S$ be the nearest vertices in $S$ to $u, v$ respectively, so that $d_G(u, w_u) = d_G(u, S)$ and similarly for $w_v$. Then
$$d_G(v, S) \leq d_G(v, w_u) \leq d_G(v, u) + d_G(u, w_u) = d_G(u, v) + d_G(u, S),$$
$$d_G(u, S) \leq d_G(u, w_v) \leq d_G(u, v) + d_G(v, w_v) = d_G(u, v) + d_G(v, S),$$
so $|d_G(v, S) - d_G(u, S)| \leq d_G(u, v)$. Since the objective function $\mathbb{E}_{v \sim m_x}[h(v)] - \mathbb{E}_{v \sim m_y}[h(v)]$ is linear in $h$, the maximizer of this quantity over $\{h_\lambda : \lambda \in [0, 1]\}$ must be achieved at $h_0$ or $h_1$. Splitting the sum over $v \in P$ (where $m_x(v) - m_y(v) > 0$), $v \in Z$ (where it vanishes), and $v \in N$ (where it is negative), this gives the following lower bound for $W_1(m_x, m_y)$:
$$W_1(m_x, m_y) \geq \max_{\lambda \in \{0, 1\}} \sum_{v \in V_H} h_\lambda(v)\, (m_x(v) - m_y(v)) = \max\left\{ \sum_{v \in P} d_G(v, N)\, (m_x(v) - m_y(v)),\ \sum_{v \in N} d_G(v, P)\, (m_y(v) - m_x(v)) \right\}.$$
We note that vertices of type $\ell$ will always belong to $P$, and vertices of type $r$ will always belong to $N$. Vertices of type $c$ may fall into either set, and there will typically be some of these vertices in both sets. When we set $\alpha = 0$, we always have $x \in N$ and $y \in P$, though if we choose $\alpha \geq 1/2$, $x$ cannot be in $N$, and $y$ cannot be in $P$.

A.2 Note on the Combinatorial ORC Bound by Jost-Liu

Remark 28 We note that the lower bound for $W_1(m_x, m_y)$ given in Jost and Liu (2014) for weighted graphs using the measure $m_x(v) = w_{vx}/d_x$ is actually incorrect. Consider the weighted graph $G$ shown in Figure 14, which also summarizes the measures $m_x, m_y$.

     |   x   |   y   |  ℓ  |  c
m_x  |   0   |  1/6  | 1/2 | 1/3
m_y  |  1/3  |   0   |  0  | 2/3
h    | -6/5  | -1/5  |  1  | 4/5

Figure 14: The weighted graph $G$, the measures $m_x, m_y$ from Jost and Liu (2014), and our 1-Lipschitz function $h$.
An optimal transport plan sends 1/6 mass from $y$ to $x$ and from $c$ to $x$, and 1/2 mass from $\ell$ to $c$, for a cost of 3/5. We can verify the optimality of this transport plan with the 1-Lipschitz function $h$ given in the table in Figure 14, since this also achieves the value 3/5. As such, the exact ORC of the edge $\{x, y\}$ in $G$ is $1 - 3/5 = 2/5$. However, the upper bound on this ORC from Jost and Liu (2014) is

A.3 Proof of Results in Section 5.2

Here is the proof of Lemma 11.
Proof Let $\#(u, v)$ denote the number of triangles in $G$ that include the edge $\{u, v\}$, and $\#(\{u,v\}, \{v,w\})$ the number of triangles in $L(G)$ that include $\{\{u,v\}, \{v,w\}\}$. Both bounds follow from a simple combinatorial computation. For the upper bound, we have
$$\mathrm{Ric}_O^{L(G)}(\{\{u,v\},\{v,w\}\}) \leq \frac{\#(\{u,v\},\{v,w\})}{d_{\{u,v\}} \vee d_{\{v,w\}}} = \frac{d_v - 2 + \mathbb{1}_E(\{u,w\})}{(d_u \vee d_w) + d_v - 2}.$$
The lower bound follows from
$$\mathrm{Ric}_O^{L(G)}(\{\{u,v\},\{v,w\}\}) \geq -\left(1 - \frac{1}{d_{\{u,v\}}} - \frac{1}{d_{\{v,w\}}} - \frac{\#(\{u,v\},\{v,w\})}{d_{\{u,v\}} \wedge d_{\{v,w\}}}\right)_+ - \left(1 - \frac{1}{d_{\{u,v\}}} - \frac{1}{d_{\{v,w\}}} - \frac{\#(\{u,v\},\{v,w\})}{d_{\{u,v\}} \vee d_{\{v,w\}}}\right)_+ + \frac{\#(\{u,v\},\{v,w\})}{d_{\{u,v\}} \vee d_{\{v,w\}}}$$
$$= -\left(1 - \frac{1}{d_u + d_v - 2} - \frac{1}{d_v + d_w - 2} - \frac{d_v - 2 + \mathbb{1}_E(\{u,w\})}{(d_u \wedge d_w) + d_v - 2}\right)_+ - \left(1 - \frac{1}{d_u + d_v - 2} - \frac{1}{d_v + d_w - 2} - \frac{d_v - 2 + \mathbb{1}_E(\{u,w\})}{(d_u \vee d_w) + d_v - 2}\right)_+ + \frac{d_v - 2 + \mathbb{1}_E(\{u,w\})}{(d_u \vee d_w) + d_v - 2}.$$

A.4 Proofs and Remarks from Section 5.3.1

We prove Lemma 13.
Proof The relationship arises from simple combinatorial arguments. Recalling Lemma 10, we have
$$\mathrm{Ric}_F^{L(G)}(\{\{u,v\},\{v,w\}\}) = 4 - d_{\{u,v\}} - d_{\{v,w\}} = 4 - (d_u + d_v - 2) - (d_v + d_w - 2) = 8 - d_u - d_v - d_v - d_w = \mathrm{Ric}_F^G(\{u,v\}) + \mathrm{Ric}_F^G(\{v,w\}),$$
which gives the claim.
Similarly,
$$\mathrm{Ric}_F^{L(G)}(\{u,v\}) = \sum_{\{\{u,v\},\{v,w\}\} \in E_{L(G)}} \mathrm{Ric}_F^{L(G)}(\{\{u,v\},\{v,w\}\}) + \sum_{\{\{u,v\},\{u,w\}\} \in E_{L(G)}} \mathrm{Ric}_F^{L(G)}(\{\{u,v\},\{u,w\}\})$$
$$= \sum_{\{v,w\} \in E,\ w \neq u} \big(\mathrm{Ric}_F^G(\{u,v\}) + \mathrm{Ric}_F^G(\{v,w\})\big) + \sum_{\{u,w\} \in E,\ w \neq v} \big(\mathrm{Ric}_F^G(\{u,v\}) + \mathrm{Ric}_F^G(\{u,w\})\big)$$
$$= d_{\{u,v\}}\, \mathrm{Ric}_F^G(\{u,v\}) + \sum_{\{v,w\} \in E,\ w \neq u} \mathrm{Ric}_F^G(\{v,w\}) + \sum_{\{u,w\} \in E,\ w \neq v} \mathrm{Ric}_F^G(\{u,w\})$$
$$= (d_u + d_v - 2)\, \mathrm{Ric}_F^G(\{u,v\}) + \big(\mathrm{Ric}_F^G(v) - \mathrm{Ric}_F^G(\{u,v\})\big) + \big(\mathrm{Ric}_F^G(u) - \mathrm{Ric}_F^G(\{u,v\})\big)$$
$$= \mathrm{Ric}_F^G(u) + \mathrm{Ric}_F^G(v) + (d_u + d_v - 4)\, \mathrm{Ric}_F^G(\{u,v\}).$$

We now prove Lemma 16.
Proof
$$\mathrm{Ric}_F(\{\{u,v\},\{v,w\}\}) = \#\{f \in F : e \subset f\}\, \omega + 2 - \big|\{e' \in \Gamma(\{u,v\}) : \{\{u,v\},\{v,w\},e'\} \notin F\}\big| - \big|\{e' \in \Gamma(\{v,w\}) : \{\{u,v\},\{v,w\},e'\} \notin F\}\big|$$
$$= \big(d_v - 2 + \mathbb{1}_E(\{u,w\})\big)\, \omega + 2 - \big(d_u - 1 - \mathbb{1}_E(\{u,w\})\big) - \big(d_w - 1 - \mathbb{1}_E(\{u,w\})\big)$$
$$= \big(d_v - 2 + \mathbb{1}_E(\{u,w\})\big)\, \omega - d_u - d_w + 4 + 2\, \mathbb{1}_E(\{u,w\}).$$

Weights for quadrangles in augmented FRC. To determine the weights of quadrangles, we again utilize Heron's formula: if we treat the two edges with the largest weights as parallel, we can compute the weight of a quadrangular face via the area of the two triangles that form the trapezoid. Specifically, let $f = (e_i, e_j, e_k, e_l)$ denote a quadrangle in $G$, i.e., $e_i \sim e_j$, $e_j \sim e_k$, $e_k \sim e_l$, $e_l \sim e_i$, while $e_i \not\sim e_k$ and $e_j \not\sim e_l$, where $\omega_{e_i} \geq \omega_{e_j} \geq \omega_{e_k} \geq \omega_{e_l}$. We consider three different cases: (i) $\omega_{e_i} > \omega_{e_j}$, i.e., the two largest weights (or lengths) are not the same; (ii) $\omega_{e_i} = \omega_{e_j}$ and $\omega_{e_k} > \omega_{e_l}$, i.e., the two largest weights are the same while the other two differ; and (iii) $\omega_{e_i} = \omega_{e_j}$ and $\omega_{e_k} = \omega_{e_l}$, i.e., the two largest are the same and the other two are also the same. In case (i), we assume that $e_i, e_j$ are parallel to each other, and the area of the quadrangle can then be obtained via Heron's formula:
$$\omega_f := \sqrt{s\, \big(s - (\omega_{e_i} - \omega_{e_j})\big)\, \big(s - \omega_{e_k}\big)\, \big(s - \omega_{e_l}\big)}\, \left(1 + \frac{2\, \omega_{e_j}}{\omega_{e_i} - \omega_{e_j}}\right), \qquad s = \frac{(\omega_{e_i} - \omega_{e_j}) + \omega_{e_k} + \omega_{e_l}}{2}.$$
In case (ii), we assume that $e_k, e_l$ are parallel to each other, and similarly
$$\omega_f := \sqrt{s\, \big(s - (\omega_{e_k} - \omega_{e_l})\big)\, \big(s - \omega_{e_i}\big)\, \big(s - \omega_{e_j}\big)}\, \left(1 + \frac{2\, \omega_{e_l}}{\omega_{e_k} - \omega_{e_l}}\right), \qquad s = \frac{(\omega_{e_k} - \omega_{e_l}) + \omega_{e_i} + \omega_{e_j}}{2}.$$
Finally, in case (iii), we assume that $f$ is a rectangle, and thus $\omega_f := \omega_{e_i}\, \omega_{e_k}$.
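The edge identity of Lemma 13 is easy to verify numerically for the plain FRC $\mathrm{Ric}_F(e) = 4 - d_u - d_v$, using networkx's line-graph construction (a sketch for illustration, not the authors' code):

```python
import networkx as nx

def frc(G, u, v):
    """Plain (unweighted) Forman-Ricci curvature of the edge {u, v}."""
    return 4 - G.degree(u) - G.degree(v)

def check_line_graph_frc(G):
    """Check Lemma 13 numerically: for every edge {{u,v},{v,w}} of the line
    graph L(G), the plain FRC computed in L(G) equals
    Ric_F^G({u,v}) + Ric_F^G({v,w})."""
    L = nx.line_graph(G)
    for e1, e2 in L.edges():
        lhs = 4 - L.degree(e1) - L.degree(e2)     # FRC in the line graph
        rhs = frc(G, *e1) + frc(G, *e2)           # sum of FRCs in G
        assert lhs == rhs
    return True
```

The check passes on any simple graph, since a line-graph node $\{u,v\}$ has degree $d_u + d_v - 2$, exactly as used in the proof.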
A.5 Proof of Theorem 20

Proof Due to the homogeneity of the graphs $G_{a,b}$, which is preserved under the Ricci flow, it is sufficient to analyze the evolution of one edge of each of the types (1), (2), (3). In the following, $d_i$, $D_i$, $w_i^t$, $\kappa_i^t$ denote the shortest-path distance, Wasserstein-1 distance, weight in iteration $t$, and ORC of an edge of type $(i)$ in iteration $t$, respectively. Notice that here
$$w_i^{t+1} = (1 - \kappa_i^t)\, w_i^t = w_i^t \left(1 - \left(1 - \frac{D_i}{w_i^t}\right)\right) = D_i, \qquad (A.7)$$
since $d_i = w_i^t$. Throughout the proof, $x, y$ denote the vertices adjacent to the edge under consideration.

Iteration 1. In the first iteration, all edges have the same weight (i.e., 1); hence, Equation (6.2) reduces to the uniform distribution. Edges of type (3) have symmetric neighborhoods, with uniform mass distributions of $m_v = \frac{1}{a}$ on all neighbors of $x, y$. Thus, the total transportation cost of moving mass from $x$'s neighbors to $y$'s neighbors is $D_3 = \frac{1}{a}\, d_3$, and therefore $w_3^1 = D_3 = \frac{1}{a}$, since $d_3 = 1$. For edges of type (1), note that $d_x = d_y = a + (b - 1)$; hence $m_v = \frac{1}{a+b-1}$ for all neighbors of $x, y$. By construction, $x, y$ have $b - 2$ common neighbors; due to the symmetry of the neighborhoods, no mass has to be moved for those neighbors. The optimal transport plan for the remaining mass is as follows: Move the remaining mass $\frac{a}{a+b-1}$ from the neighbors of $x$ (which are not neighbors of $y$) to $x$. This incurs a cost of $\frac{a}{a+b-1}\, d_2$, since this mass is transported along edges of type (2). Leave a mass of $\frac{1}{a+b-1}$ on $x$ and transport the remainder to $y$, which incurs a cost of $\frac{a-1}{a+b-1}\, d_1$, since it is transported along an edge of type (1). Lastly, we distribute mass $\frac{a}{a+b-1}$ from $y$ uniformly to its neighbors, which incurs a cost of $\frac{a}{a+b-1}\, d_2$, since it is transported along edges of type (2).
In total, the transportation cost amounts to
$$D_1 = \frac{2a}{a+b-1}\, d_2 + \frac{a-1}{a+b-1}\, d_1.$$
To prove that this is an optimal flow, consider the following 1-Lipschitz function $h$:
$$h(v) = \begin{cases} 2 & \text{if } v \sim x,\ v \not\sim y \\ 1 & \text{if } v = x \\ 0 & \text{if } v = y \text{ or } v \sim x, y \\ -1 & \text{if } v \sim y,\ v \not\sim x. \end{cases}$$
Then
$$\mathbb{E}_{v \sim m_x}[h(v)] - \mathbb{E}_{v \sim m_y}[h(v)] = \frac{2a}{a+b-1} - \frac{1 + (-1)\cdot a}{a+b-1} = \frac{3a-1}{a+b-1}.$$
This implies $w_1^1 = D_1 = \frac{3a-1}{a+b-1}$, since $d_1 = d_2 = 1$.

For edges of type (2), let $x$ denote the internal node and $y$ the bridge node. By construction, $x, y$ have $a - 1$ common neighbors (internal nodes). We want to leave a mass of $\frac{1}{a+b-1}$ on each of those nodes (corresponding to the mass distribution in the neighborhood of $y$), and transport the excess mass to $x$ and the unshared neighbors of $y$ in an optimal way. Observe that $\frac{1}{a} > \frac{1}{a+b-1}$, so we have a total of $(a-1)\left(\frac{1}{a} - \frac{1}{a+b-1}\right)$ excess mass to move. If this exceeds $\frac{1}{a+b-1}$, we send that amount to $x$ and the remaining amount to $y$; otherwise we send all of this mass to $x$. We can simplify this condition as follows:
$$(a-1)\left(\frac{1}{a} - \frac{1}{a+b-1}\right) - \frac{1}{a+b-1} = \frac{(a-1)(b-1) - a}{a(a+b-1)} \geq 0$$
if and only if $(a-1)(b-1) - a = a(b-2) - (b-1) \geq 0$. When $b = 2$, this never holds, and when $b \geq 3$, it always holds, since $a(b-2) - (b-1) \geq a - b + 1 > 0$, given that $a > b$. Now let us consider these cases in turn.

Case $b = 2$: We send all of the excess mass from the common vertices to $x$, amounting to $\frac{a-1}{a(a+1)} < \frac{1}{a+1}$, meaning that we must send an additional $\frac{1}{a(a+1)}$ mass from $y$ to $x$, while the remaining $\frac{1}{a+1}$ is sent to $y$'s single unshared neighbor. The total cost is
$$D_2 = \frac{a-1}{a(a+1)}\, d_3 + \frac{1}{a(a+1)}\, d_2 + \frac{1}{a+1}\, d_1.$$
To see that this is an optimal flow, consider the following 1-Lipschitz function $h$:
$$h(v) = \begin{cases} 1 & \text{if } v \sim x, y \text{ or } v = y \\ 0 & \text{otherwise.} \end{cases}$$
This achieves a dual objective function value of $\frac{2}{a+1}$, so $w_2^1 = D_2 = \frac{2}{a+1}$, since $d_1 = d_2 = d_3 = 1$.

Case $b \geq 3$: We send $\frac{1}{a+b-1}$ of the excess mass from the common vertices to $x$, and the remainder $\frac{(a-1)(b-1) - a}{a(a+b-1)}$ to $y$.
Since there is already a mass of $\frac{1}{a}$ at $y$, this gives a total mass of $\frac{(a-1)(b-1) - a + (a+b-1)}{a(a+b-1)} = \frac{b-1}{a+b-1}$ to be sent to $y$'s unshared neighbors, as required. The total cost is
$$D_2 = \frac{1}{a+b-1}\, d_3 + \frac{(a-1)(b-1) - a}{a(a+b-1)}\, d_2 + \frac{b-1}{a+b-1}\, d_1.$$
To see that this is an optimal flow, consider the following 1-Lipschitz function $h$:
$$h(v) = \begin{cases} 1 & \text{if } v \sim x, y \\ 0 & \text{if } v = x, y \\ -1 & \text{if } v \sim y,\ v \not\sim x. \end{cases}$$
This achieves a dual objective function value
$$\mathbb{E}_{v \sim m_x}[h(v)] - \mathbb{E}_{v \sim m_y}[h(v)] = \frac{(a-1)(b-1)}{a(a+b-1)} + \frac{b-1}{a+b-1} = \frac{(2a-1)(b-1)}{a(a+b-1)},$$
which gives $w_2^1 = D_2 = \frac{(2a-1)(b-1)}{a(a+b-1)} = \frac{2a-1}{a+b-1} \cdot \frac{b-1}{a}$, since $d_1 = d_2 = d_3 = 1$. We summarize these calculations in the following table:

Type | 1 | 2 | 3
$b = 2$ | $\frac{3a-1}{a+1}$ | $\frac{2}{a+1}$ | $\frac{1}{a}$
$b \geq 3$ | $\frac{3a-1}{a+b-1}$ | $\frac{(2a-1)(b-1)}{a(a+b-1)}$ | $\frac{1}{a}$

In both cases, it is easy to check that $w_1^1 > w_2^1 > w_3^1$, using $a > b \geq 2$.

Iteration t+1. In round $t \geq 1$, the mass distribution on the neighborhood of a bridge node $x$ is given by
$$m_x(y) = \begin{cases} \frac{1}{C_b}\, e^{-w_1^t}, & y \text{ is a bridge node} \\ \frac{1}{C_b}\, e^{-w_2^t}, & \text{else}, \end{cases}$$
and on the neighborhood of an internal node $x$ by
$$m_x(y) = \begin{cases} \frac{1}{C_{in}}\, e^{-w_2^t}, & y \text{ is a bridge node} \\ \frac{1}{C_{in}}\, e^{-w_3^t}, & \text{else}. \end{cases}$$
Here, the normalizing constant for bridge nodes is given by $C_b = a e^{-w_2^t} + (b-1) e^{-w_1^t}$, and for internal nodes by $C_{in} = e^{-w_2^t} + (a-1) e^{-w_3^t}$. By way of induction, we suppose that $w_1^t > w_2^t$ and $w_1^t > w_3^t$, which we proved above in the case $t = 1$. We again consider the three edge types separately. Given $w_i^t$, edge weights are updated as follows in iteration $t + 1$:

For edges of type (3), due to the symmetry of the neighborhoods, we again only move mass from $x$ to $y$, at a total cost of $D_3 = w_3^t\, \frac{e^{-w_3^t}}{C_{in}}$. Hence, the updated weights are $w_3^{t+1} = w_3^t\, \frac{e^{-w_3^t}}{C_{in}}$ by Equation (A.7).

For edges of type (1), the transport plan is analogous to the one in iteration 1: We move a mass of $\frac{a}{C_b} e^{-w_2^t}$ from the neighbors of $x$ to $x$, leave a mass of $\frac{1}{C_b} e^{-w_1^t}$ there, and transport the remaining mass $\frac{a}{C_b} e^{-w_2^t} - \frac{1}{C_b} e^{-w_1^t}$ to $y$.
Picking up the mass of $\frac{1}{C_b} e^{-w_1^t}$ previously on $y$, we then distribute the mass among $y$'s neighbors. The total transportation cost is
$$D_1 = \frac{a}{C_b} e^{-w_2^t}\, d_2 + \left(\frac{a}{C_b} e^{-w_2^t} - \frac{1}{C_b} e^{-w_1^t}\right) d_1 + \frac{a}{C_b} e^{-w_2^t}\, d_2.$$
To prove that this is an optimal flow, consider the following 1-Lipschitz function $h$:
$$h(v) = \begin{cases} w_2^t + w_1^t & \text{if } v \sim x,\ v \not\sim y \\ w_1^t & \text{if } v = x \\ 0 & \text{if } v = y \text{ or } v \sim x, y \\ -w_2^t & \text{if } v \sim y,\ v \not\sim x. \end{cases}$$
Then
$$\mathbb{E}_{v \sim m_x}[h(v)] - \mathbb{E}_{v \sim m_y}[h(v)] = (w_2^t + w_1^t)\, \frac{a\, e^{-w_2^t}}{C_b} - w_1^t\, \frac{e^{-w_1^t}}{C_b} + w_2^t\, \frac{a\, e^{-w_2^t}}{C_b} = D_1.$$
Consequently, the updated weights are $w_1^{t+1} = \frac{(2 w_2^t + w_1^t)\, a}{C_b}\, e^{-w_2^t} - \frac{w_1^t}{C_b}\, e^{-w_1^t}$.

For edges of type (2), let $x$ denote the internal node and $y$ the bridge node. At any of the $a - 1$ common neighbors $v$ of $x, y$, we have $m_x(v) = \frac{e^{-w_3^t}}{C_{in}}$ and $m_y(v) = \frac{e^{-w_2^t}}{C_b}$. We define the quantity
$$c_t = m_x(v) - m_y(v) = \frac{e^{-w_3^t} C_b - e^{-w_2^t} C_{in}}{C_b C_{in}} = \frac{a e^{-w_2^t - w_3^t} + (b-1) e^{-w_1^t - w_3^t} - e^{-2 w_2^t} - (a-1) e^{-w_2^t - w_3^t}}{C_b C_{in}} = \frac{(b-1) e^{-w_1^t - w_3^t} + e^{-w_2^t}\left(e^{-w_3^t} - e^{-w_2^t}\right)}{C_b C_{in}}.$$
When $w_2^t > w_3^t$, this is always positive, but if $w_2^t < w_3^t$, it could be negative. If $c_t \geq 0$, then, as before, we want to transport the excess mass from these common neighbors to $x$ and the unshared neighbors of $y$ in an optimal way. If the total excess mass exceeds $m_y(x)$, we send that amount to $x$ and the remaining amount to $y$; otherwise we send all of this mass to $x$. This condition may be written as
$$r_t := (a-1)\, \frac{e^{-w_3^t}}{C_{in}} - a\, \frac{e^{-w_2^t}}{C_b} \geq 0.$$
Clearly, $r_t = (a-1)\, c_t - \frac{e^{-w_2^t}}{C_b} < (a-1)\, c_t$.

Case $c_t, r_t \geq 0$: We send $m_y(x)$ mass from the common neighbors to $x$, and the remaining portion to $y$. Then we send all of the mass at $y$ to its unshared neighbors, for a cost of
$$D_2 = \frac{e^{-w_2^t}}{C_b}\, d_3 + r_t\, d_2 + (b-1)\, \frac{e^{-w_1^t}}{C_b}\, d_1.$$
Consider the 1-Lipschitz function $h$ given by
$$h(v) = \begin{cases} w_2^t & \text{if } v \sim x, y \\ w_2^t - w_3^t & \text{if } v = x \\ 0 & \text{if } v = y \\ -w_1^t & \text{if } v \sim y,\ v \not\sim x. \end{cases}$$
This gives a dual objective function value
$$w_2^t\, (a-1)\, c_t - (w_2^t - w_3^t)\, \frac{e^{-w_2^t}}{C_b} + w_1^t\, (b-1)\, \frac{e^{-w_1^t}}{C_b},$$
which matches the formula for $D_2$ above when we take into consideration that $d_j = w_j^t$ at iteration $t$.
Case $c_t \geq 0$, $r_t \leq 0$: We send all of the excess mass from the common neighbors to $x$, and obtain the remaining portion from $y$. We then send the rest of the mass at $y$ to its unshared neighbors, for a total cost of
$$D_2 = (a-1)\, c_t\, d_3 + (-r_t)\, d_2 + (b-1)\, \frac{e^{-w_1^t}}{C_b}\, d_1.$$
Consider the 1-Lipschitz function $h$ given by
$$h(v) = \begin{cases} w_3^t & \text{if } v \sim x, y \\ 0 & \text{if } v = x \\ w_2^t & \text{if } v = y \\ w_2^t - w_1^t & \text{if } v \sim y,\ v \not\sim x. \end{cases}$$
To show that this function is 1-Lipschitz, note that if $v$ is a common neighbor of $x, y$, then $|h(v) - h(y)| = |w_2^t - w_3^t| \leq w_2^t$: either $w_2^t > w_3^t$, in which case $|w_2^t - w_3^t| = w_2^t - w_3^t \leq w_2^t$, or else $w_2^t < w_3^t$, and $w_3^t - w_2^t \leq w_2^t$ follows by the triangle inequality, since $w_3^t = W_1(m_v, m_x) \leq W_1(m_v, m_y) + W_1(m_y, m_x) = 2 w_2^t$, where $W_1$ is the Wasserstein-1 distance and $m_x, m_v, m_y$ are the measures at the vertices $x, v, y$ with the weights from iteration $t - 1$ (or, when $t = 1$, $w_3^0 = 1 < 2 = 2 w_2^0$). This gives a dual objective function value
$$w_3^t\, (a-1)\, c_t + w_2^t\, \frac{e^{-w_2^t}}{C_{in}} - (w_2^t - w_1^t)\, (b-1)\, \frac{e^{-w_1^t}}{C_b}.$$
This gives the equality when we note that
$$-r_t = a\, \frac{e^{-w_2^t}}{C_b} - (a-1)\, \frac{e^{-w_3^t}}{C_{in}} = \left(1 - (b-1)\, \frac{e^{-w_1^t}}{C_b}\right) - \left(1 - \frac{e^{-w_2^t}}{C_{in}}\right) = \frac{e^{-w_2^t}}{C_{in}} - (b-1)\, \frac{e^{-w_1^t}}{C_b},$$
where the second equality follows from the definitions of $C_b$ and $C_{in}$.

Case $c_t, r_t \leq 0$: If $c_t \leq 0$, then $r_t < (a-1)\, c_t \leq 0$. Since $m_y$ places more mass on the common vertices than $m_x$, the only vertex on which $m_x$ places more mass than $m_y$ must be $y$. Thus, we send all of the needed mass from $y$ to $x$, the common neighbors, and the unshared neighbors of $y$, for a cost of
$$D_2 = \frac{e^{-w_2^t}}{C_b}\, d_2 + (a-1)(-c_t)\, d_2 + (b-1)\, \frac{e^{-w_1^t}}{C_b}\, d_1.$$
Consider the 1-Lipschitz function $h$ given by
$$h(v) = \begin{cases} -w_2^t & \text{if } v \sim x, y \\ -w_2^t & \text{if } v = x \\ 0 & \text{if } v = y \\ -w_1^t & \text{if } v \sim y,\ v \not\sim x. \end{cases}$$
This gives a dual objective function value
$$-w_2^t\, (a-1)\, c_t + w_2^t\, \frac{e^{-w_2^t}}{C_b} + w_1^t\, (b-1)\, \frac{e^{-w_1^t}}{C_b},$$
which matches the formula for $D_2$ above.
We summarize these calculations in the following table:

Type $i$ | $w_i^{t+1}$
1 | $w_1^t\, \frac{a e^{-w_2^t} - e^{-w_1^t}}{C_b} + 2 w_2^t\, \frac{a e^{-w_2^t}}{C_b}$
2 ($c_t, r_t \geq 0$) | $w_1^t\, \frac{(b-1) e^{-w_1^t}}{C_b} + w_2^t\, r_t + w_3^t\, \frac{e^{-w_2^t}}{C_b}$
2 ($c_t \geq 0,\ r_t \leq 0$) | $w_1^t\, \frac{(b-1) e^{-w_1^t}}{C_b} + w_2^t\, (-r_t) + w_3^t\, (a-1)\, c_t$
2 ($c_t, r_t \leq 0$) | $w_1^t\, \frac{(b-1) e^{-w_1^t}}{C_b} + w_2^t\, (-r_t)$
3 | $w_3^t\, \frac{e^{-w_3^t}}{C_{in}}$

We are now ready to prove the two main statements of the theorem. First, we show that $w_3^t$ decays to 0. Since $a > b \geq 2$, we have $a - 1 \geq 2$. Then, since $C_{in} \geq (a-1)\, e^{-w_3^t}$, we get
$$w_3^{t+1} = w_3^t\, \frac{e^{-w_3^t}}{C_{in}} \leq w_3^t\, \frac{1}{a-1}.$$
This gives $w_3^{t+1} \leq \left[a\, (a-1)^t\right]^{-1}$, which tends to 0 exponentially quickly as $t \to \infty$.

Now we wish to show that $w_1^t > w_2^t$ and $w_1^t > w_3^t$ for all $t \geq 1$. From the proof above, we know that $w_3^t < 2 w_2^t$. We will also need the inequalities
$$a e^{-w_2^t} - b e^{-w_1^t} > 0, \qquad (A.8)$$
which holds by $a > b$ and the induction hypothesis $w_1^t > w_2^t$; and
$$a\, \frac{e^{-w_2^t}}{C_b} > r_t, \qquad (A.9)$$
which, after multiplying through by $C_b C_{in}$, is equivalent to
$$2 a e^{-2 w_2^t} + a (a-1) e^{-w_2^t - w_3^t} - (a-1)(b-1) e^{-w_1^t - w_3^t} > 0,$$
and this holds since $e^{-w_2^t} > e^{-w_1^t}$ and $a > b - 1$. Note that Equation (A.9) implies
$$a\, \frac{e^{-w_2^t}}{C_b} > \frac{a-1}{2}\, \frac{e^{-w_3^t}}{C_{in}} \geq \frac{e^{-w_3^t}}{C_{in}}.$$
We first show $w_1^{t+1} > w_3^{t+1}$:
$$w_1^{t+1} - w_3^{t+1} = w_1^t\, \frac{a e^{-w_2^t} - e^{-w_1^t}}{C_b} + 2 w_2^t\, \frac{a e^{-w_2^t}}{C_b} - w_3^t\, \frac{e^{-w_3^t}}{C_{in}} > w_1^t\, \frac{(b-1) e^{-w_1^t}}{C_b} > 0,$$
using Equation (A.8), $w_3^t < 2 w_2^t$, and the implication of Equation (A.9) just above. Since Equation (A.8) clearly shows that the coefficient of $w_1^t$ in $w_1^{t+1} - w_2^{t+1}$ is nonnegative in any of the three cases (it is the same in all of them), we argue as follows:
$$w_1^{t+1} - w_2^{t+1} \geq \begin{cases} 2 w_2^t\, \frac{a e^{-w_2^t}}{C_b} - w_2^t\, r_t - w_3^t\, \frac{e^{-w_2^t}}{C_b} & c_t, r_t \geq 0 \\ 2 w_2^t\, \frac{a e^{-w_2^t}}{C_b} + w_2^t\, r_t - w_3^t\, (a-1)\, c_t & c_t \geq 0,\ r_t \leq 0 \\ 2 w_2^t\, \frac{a e^{-w_2^t}}{C_b} + w_2^t\, r_t & c_t, r_t \leq 0. \end{cases}$$
The quantity in the first subcase may be bounded below as follows, using $w_3^t < 2 w_2^t$:
$$2 w_2^t\, \frac{a e^{-w_2^t}}{C_b} - w_2^t\, r_t - w_3^t\, \frac{e^{-w_2^t}}{C_b} > w_2^t \left(\frac{(a-2)\, e^{-w_2^t}}{C_b} + \frac{a\, e^{-w_2^t}}{C_b} - r_t\right).$$
This lower bound is always positive by Equation (A.9).
For the second subcase, observe that $(a-1)\, c_t = r_t + \frac{e^{-w_2^t}}{C_b} \leq \frac{e^{-w_2^t}}{C_b}$, so we may lower bound the quantity there (again using $w_3^t < 2 w_2^t$) by
$$w_2^t\, \frac{a\, e^{-w_2^t}}{C_b} + w_2^t\, \frac{(a-1)\, e^{-w_3^t}}{C_{in}} - w_3^t\, \frac{e^{-w_2^t}}{C_b} > w_2^t \left(\frac{(a-2)\, e^{-w_2^t}}{C_b} + \frac{(a-1)\, e^{-w_3^t}}{C_{in}}\right) > 0.$$
The third subcase is simplest of all, since the quantity there simplifies to
$$w_2^t \left(\frac{a\, e^{-w_2^t}}{C_b} + \frac{(a-1)\, e^{-w_3^t}}{C_{in}}\right) > 0.$$
This completes the proof.

Appendix B. Data Sets

Real-world data. We consider both synthetic networks generated from the random graph models (see Section 5) and real networks¹, which characterize the connectomes of a set of different animal species. A connectome is a specific, cell-to-cell mapping of axonal tracts between neurons, created from cellular data obtained, for instance, via electron microscopy. Here, specifically, we consider networks constructed from mixed species of cats (de Reus and van den Heuvel, 2013; "Cat"), male C. elegans (Jarrell et al., 2012; "Worm"), and rhesus macaques (Harriger et al., 2012; "Macaque"). Table 13 provides summary statistics for the real networks in our study.

 | n | |E| | |E| | d | σd/d | Modularity
Cat | 65 | 730 | 18859 | 22.46 | 0.44 | 0.27
Worm | 269 | 2902 | 93893 | 21.58 | 0.74 | 0.36
Macaque | 242 | 3054 | 114991 | 25.24 | 0.73 | 0.36

Table 13: Summary statistics of the real networks.

Figure 15: The real networks, Cat (left), Worm (middle) and Macaque (right), where we apply the Louvain algorithm to detect communities; vertices in different communities are shown in different colors.

Random Geometric Graphs. In our experiments, we consider Random Geometric Graphs (RGGs), which incorporate a spatial perspective (Penrose, 2003). For completeness, we define the RGG below.

1. Obtained from NeuroData's open-source database https://neurodata.io/project/connectomes/, accessed on 2 May 2022. The database hosts animal connectomes produced using data from a multitude of labs, across model species, using different modalities.
Definition 29 (Random Geometric Graphs) In this model, vertices are placed in the unit cube uniformly at random, and two vertices are connected by an edge if the distance between them is less than a previously specified value, called the radius. Specifically, let the unit cube be $[0, 1)^{n_d}$, where $n_d$ denotes its dimension, let the radius be $r \in (0, 1)$, and let the distance function be $d(x, y)$, $x, y \in [0, 1)^{n_d}$. Then there is an edge between vertices $v_i$ and $v_j$ if $d(x_i, x_j) < r$, where $x_i, x_j$ are the positions of the vertices $v_i, v_j$, respectively, in the unit cube $[0, 1)^{n_d}$. In our experiments, we start from RGGs corresponding to 2-d unit cubes.

Appendix C. Experiments

C.1 Single-Membership Community Detection

Benchmarking. We report the mean NMI, runtime, and their standard deviations (SD) of the methods on SBMs with two equally-sized blocks and varying network sizes. Consistent with the results in Section 7.3, the ORC-based approach outperforms most reference methods and successfully recovers single-membership community structure, with an NMI greater than 0.8 in almost all cases, as we vary the network size n.

n | Louvain | Spectra-S | Spectra-A | ORC-E | ORC-A | FRC-2
100 | 0.705 (0.121) | 0.897 (0.081) | 0.599 (0.078) | 0.860 (0.117) | 0.363 (0.070) | 0.355 (0.112)
200 | 0.981 (0.025) | 0.980 (0.020) | 0.805 (0.285) | 0.986 (0.018) | 0.395 (0.107) | 0.263 (0.035)
1000 | 1 (0) | 1 (0) | 1 (0) | 1 (0) | 0.990 (0.015) | 0.437 (0.263)
3000 | 1 (0) | 1 (0) | 1 (0) | 1 (0) | 1 (0) | 0.740 (0.392)

Table 14: Mean (SD) of NMI of different methods on planted SBMs with two equally-sized blocks, pin = 0.1, pout = 0.01, and varying sizes n.
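The RGG model of Definition 29 can be sampled directly; below is a minimal sketch for illustration (networkx's `random_geometric_graph` offers equivalent functionality):

```python
import random

def random_geometric_graph(n, r, nd=2, seed=0):
    """Sample an RGG per Definition 29: n points uniform in [0,1)^nd,
    with an edge between each pair at Euclidean distance less than r.
    Returns the positions and the edge list."""
    rng = random.Random(seed)
    pos = {i: [rng.random() for _ in range(nd)] for i in range(n)}
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if sum((a - b) ** 2 for a, b in zip(pos[i], pos[j])) < r * r]
    return pos, edges
```

For $n_d = 2$ (the setting used in the experiments), any radius $r > \sqrt{2}$ yields the complete graph, and $r \to 0$ yields an empty edge set.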
n | Louvain | Spectra-S | Spectra-A | ORC-E | ORC-A | FRC-2
100 | 0.828 (0.021) | 0.045 (0.004) | 0.132 (0.323) | 0.482 (0.018) | 0.419 (0.013) | 0.373 (0.009)
200 | 2.767 (0.098) | 0.076 (0.004) | 0.037 (0.004) | 1.194 (0.076) | 0.991 (0.016) | 0.889 (0.015)
1000 | 33.47 (0.584) | 1.087 (0.043) | 0.382 (0.016) | 12.66 (0.168) | 6.721 (0.103) | 4.244 (0.086)
3000 | 265.4 (3.120) | 13.44 (0.477) | 4.802 (0.110) | 165.8 (5.805) | 85.63 (2.751) | 44.27 (1.564)

Table 15: Mean (SD) of runtime (in seconds) of different methods on planted SBMs with two equally-sized blocks, pin = 0.1, pout = 0.01, and varying sizes n.

Rationale for the ORC approximation. ORC-S has been considered in the literature to improve the efficiency of the ORC computation, but we notice that the improvement is not clear when the graph is not sufficiently large or dense (see Table 16). In contrast, the approximation we propose almost always improves computational efficiency significantly (see also Tables 3 and 15). We also note that the present computations were allocated only a moderate amount of memory; we observed that ORC-S can improve efficiency when a large amount of memory is available, as in the analysis of real data in Section 7.4.

pin | ORC-E | ORC-S | ORC-A
0.1 | 2785 | 5859 | 1013
0.2 | 16876 | 68658 | 2933

Table 16: Runtime (in seconds) of different ORC-based methods on one sample of planted SBMs with two equally-sized blocks, size n = 5000, pout = 0.01, and varying probabilities pin.

Rationale for the 2-complex FRC. Among the three versions of FRC, FRC-2 has one of the strongest correlations with the clustering coefficients (see Figure 16).
Figure 16: 2-d histograms of FRC-1 (left), FRC-2 (middle left), FRC-3 (middle right) and ORC-E (right) versus the clustering coefficients of nodes in G, where ns = 10 networks are generated from the planted SBM with two equally-sized blocks, size n = 1000, pin = 0.1 and pout = 0.01.

C.2 Mixed-Membership Community Detection

Benchmarking. We report the mean NMI, runtime, and their standard deviations (SD) of the methods on MMBs with two equally-sized blocks and varying network sizes. Similar to the results in Section 7.4, the ORC-based method outperforms most reference methods and successfully recovers mixed-membership community structure in almost all cases, with an NMI greater than 0.8, as we vary the network size n.

n | Louvain | Spectra-S | Spectra-A | ORC-E | ORC-A | FRC-3
50 | 0.401 (0.068) | 0.965 (0.038) | 0.463 (0.151) | 0.925 (0.081) | 0.791 (0.155) | 0.712 (0.264)
100 | 0.469 (0.107) | 0.996 (0.011) | 0.453 (0.164) | 0.993 (0.015) | 0.664 (0.167) | 0.750 (0.233)
300 | 0.999 (0.002) | 1 (0) | 0.834 (0.264) | 1 (0) | 0.724 (0.123) | 0.815 (0.173)
500 | 1 (0) | 1 (0) | 1 (0) | 1 (0) | 0.769 (0.213) | 0.838 (0.205)

Table 17: Mean (SD) of the extended NMI of different methods on planted two-block MMBs with two equally-sized blocks, no = 1 node of mixed membership, pin = 0.1, pout = 0, and varying sizes n.
n | Louvain | Spectra-S | Spectra-A | ORC-E | ORC-A | FRC-3
50 | 0.699 (0.135) | 0.067 (0.016) | 0.118 (0.295) | 0.741 (0.121) | 0.626 (0.106) | 0.730 (0.121)
100 | 6.405 (0.922) | 0.139 (0.005) | 0.051 (0.005) | 2.330 (0.378) | 2.051 (0.336) | 3.346 (0.408)
300 | 391.5 (28.44) | 5.388 (0.391) | 2.242 (0.168) | 48.88 (4.877) | 33.35 (3.868) | 46.03 (8.584)
500 | 2386 (107.4) | 60.76 (3.543) | 28.05 (1.454) | 269.6 (25.12) | 183.0 (11.95) | 200.2 (44.79)

Table 18: Mean (SD) of runtime (in seconds) of different methods on planted two-block MMBs with two equally-sized blocks, no = 1 node of mixed membership, pin = 0.1, pout = 0, and varying sizes n.

C.3 Comparison with Spectral Clustering Methods

Apart from the two versions of Spectral Clustering presented in Section 7, we have also considered further variants in which, instead of using the adjacency matrix directly, we construct an affinity matrix via (i) radial basis functions ("SC-R") or (ii) nk-nearest neighbors ("SC-N"), where nk is chosen to be 10 in the experiments. Our experimental results (see Tables 19 and 20) demonstrate that the ORC approach can also outperform these variants of Spectral Clustering, in both single- and mixed-membership community detection.

n | ORC-E NMI | ORC-E runtime | SC-R NMI | SC-R runtime | SC-N NMI | SC-N runtime
100 | 0.860 (0.117) | 0.482 (0.018) | 0.417 (0.103) | 0.042 (0.006) | 0.619 (0.143) | 0.063 (0.002)
500 | 0.986 (0.018) | 1.194 (0.076) | 0.615 (0.082) | 0.061 (0.004) | 0.889 (0.082) | 0.099 (0.003)
1000 | 1 (0) | 12.66 (0.168) | 0.834 (0.276) | 0.833 (0.063) | 1 (0) | 1.203 (0.102)
3000 | 1 (0) | 165.8 (5.805) | 1 (0) | 17.28 (6.174) | 1 (0) | 20.13 (0.692)

Table 19: Mean (SD) of NMI and runtime (in seconds) of the ORC method and two further Spectral Clustering variants on planted SBMs with two equally-sized blocks, pin = 0.1, pout = 0.01, and varying sizes n.
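The SC-R and SC-N variants can be reproduced with scikit-learn's `SpectralClustering` by switching its `affinity` parameter. The sketch below (our illustration, not the authors' code) treats the rows of the adjacency matrix as feature vectors, which is an assumption; the text does not specify the feature representation used.

```python
import networkx as nx
from sklearn.cluster import SpectralClustering

def sc_variants(G, n_clusters=2, nk=10, seed=0):
    """Run the SC-R (RBF affinity) and SC-N (nk-nearest-neighbors affinity)
    spectral clustering variants on a graph G, using the rows of the
    adjacency matrix as feature vectors (an assumption for illustration)."""
    X = nx.to_numpy_array(G)
    labels = {}
    for name, aff in [("SC-R", "rbf"), ("SC-N", "nearest_neighbors")]:
        sc = SpectralClustering(n_clusters=n_clusters, affinity=aff,
                                n_neighbors=nk, random_state=seed,
                                assign_labels="kmeans")
        labels[name] = sc.fit_predict(X)
    return labels
```

On a planted-partition graph with strong community structure, both variants return one label per node; `n_neighbors` is ignored by scikit-learn when `affinity="rbf"`.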
n | ORC-E NMI | ORC-E runtime | SC-R NMI | SC-R runtime | SC-N NMI | SC-N runtime
50 | 0.925 (0.081) | 0.741 (0.121) | 0.304 (0.050) | 0.077 (0.011) | 0.623 (0.192) | 0.086 (0.003)
100 | 0.993 (0.015) | 2.330 (0.378) | 0.128 (0.021) | 0.183 (0.030) | 0.683 (0.133) | 0.159 (0.012)
300 | 1 (0) | 48.88 (4.877) | 0.526 (0.333) | 1984 (886.8) | 0.951 (0.078) | 7.360 (0.387)
500 | 1 (0) | 269.6 (25.12) | 0.033 (0.029) | 25332 (697.7) | 0.920 (0.079) | 109.6 (4.948)

Table 20: Mean (SD) of NMI and runtime (in seconds) of the ORC method and two further Spectral Clustering variants on planted MMBs with two equally-sized blocks, pin = 0.1, pout = 0, no = 1 node of mixed membership, and varying sizes n. The number of clusters is set to the true value (2) instead of being selected by a grid search, because the runtime would otherwise be expected to exceed a day per graph.

Furthermore, we have also performed a detailed comparison with other spectral methods, e.g., that of Zhang (2023). However, since nodes of significantly larger degree than the rest are relatively rare in our data sets, the results do not change significantly in our experiments; we therefore omit the detailed results.

C.4 Synthetic Data

In Section 7.4, we observe that the performance of the algorithm via augmented FRC is not comparable to that of the algorithm via ORC, which raises the important problem of relating FRC to ORC. We find that FRC-3 has a stronger linear relationship with ORC as the graphs become denser; see Figures 17, 18 and 19. This is potentially because more triangular and quadrangular faces appear in the graphs as more edges are present, and these characteristics become more dominant for the ORC. Meanwhile, we find that, when pin is relatively small, none of the three variants of FRC can initially separate the edges in L between nodes in the same community from the remaining edges, which might explain the relatively worse performance of the algorithm via FRC in the experiments.
However, FRC-3 can separate these two types of edges when pin is relatively large, which suggests that its performance should improve in the dense regime, where ORC may face computational issues.

C.5 Node neighborhood measures

We give brief computational evidence that the choice of α in the node neighborhood measure of Equation (4.2) has little impact on the performance of our curvature-based algorithms. Over a range of choices of α, and for two real-data networks, the results are highly conserved.

Data\α | 0     | 0.1   | 0.2   | 0.3   | 0.4   | 0.5   | 0.6
DBLP-1 | 0.689 | 0.689 | 0.689 | 0.689 | 0.689 | 0.689 | 0.689
DBLP-2 | 0.969 | 0.969 | 0.969 | 0.969 | 0.969 | 0.969 | 0.969

Table 21: Extended NMI of ORC-E via line graph curvature on the real data, for different values of α in Equation (4.2).

Figure 17: 2-d histograms of the ORC-E versus the FRC-1 (left), FRC-2 (middle) and FRC-3 (right) for edges in L, where the networks are obtained from the planted two-block MMB of size n = 300, pin = 0.1, pout = 0, no = 1 node of mixed membership, and we generate ns = 10 networks for the results. (Rows 1 and 2: edges between nodes within communities 1, in blue, and 2, in green, respectively, in L; Row 3: edges between nodes in different communities, in red.)

Figure 18: 2-d histograms of the ORC-E versus the FRC-1 (left), FRC-2 (middle) and FRC-3 (right) for edges in L, where the networks are obtained from the planted two-block MMB of size n = 300, pin = 0.2, pout = 0, no = 1 node of mixed membership, and ns = 10.
Figure 19: 2-d histograms of the ORC-E versus the FRC-1 (left), FRC-2 (middle) and FRC-3 (right) for edges in L, where the networks are obtained from the planted two-block MMB of size n = 300, pin = 0.3, pout = 0, no = 1 node of mixed membership, and ns = 10.

References

E. Abbe. Community detection and stochastic block models: recent developments. Journal of Machine Learning Research, 18(1):6446–6531, 2017.
E. Airoldi, D. Blei, S. Fienberg, and E. Xing. Mixed membership stochastic blockmodels. Advances in Neural Information Processing Systems, 21, 2008.
F. Battiston, G. Cencetti, I. Iacopini, V. Latora, M. Lucas, A. Patania, J. Young, and G. Petri. Networks beyond pairwise interactions: structure and dynamics. Physics Reports, 874:1–92, 2020.
A. Benson, D. Gleich, and J. Leskovec. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016.
E. Bloch. Combinatorial Ricci curvature for polyhedral surfaces and posets. arXiv:1406.4598, 2014.
V. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), October 2008.
J. Cheeger. A lower bound for the smallest eigenvalue of the Laplacian. In Proceedings of the Princeton Conference in Honor of Professor S. Bochner, pages 195–199, 1969.
Z. Chen, X. Li, and J. Bruna. Supervised community detection with line graph neural networks. arXiv:1705.08415, 2017.
M. de Reus and M. van den Heuvel. Rich club organization and intermodule communication in the cat connectome. Journal of Neuroscience, 33(32):12929–12939, 2013.
P. Elumalai, Y. Yadav, N. Williams, E. Saucan, J. Jost, and A. Samal. Graph Ricci curvatures reveal atypical functional connectivity in autism spectrum disorder. Scientific Reports, 12(1):1–19, 2022.
M. Erbar and J. Maas. Ricci curvature of finite Markov chains via convexity of the entropy. Archive for Rational Mechanics and Analysis, 206:997–1038, 2012.
T. Evans and R. Lambiotte. Line graphs of weighted networks for overlapping communities. The European Physical Journal B, 77(2):265–272, September 2010.
M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23(2):298–305, 1973.
R. Forman. Bochner's method for cell complexes and combinatorial Ricci curvature. Discrete and Computational Geometry, 29(3):323–374, 2003.
M. Girvan and M. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002.
S. Gómez, P. Jensen, and A. Arenas. Analysis of community structure in networks of correlated data. Physical Review E, 80(1):016114, 2009.
A. Gosztolai and A. Arnaudon. Unfolding the multiscale structure of networks with dynamical Ollivier-Ricci curvature. Nature Communications, 12(1):4561, 2021.
M. Gromov. Hyperbolic groups. In Essays in Group Theory, volume 8 of Mathematical Sciences Research Institute Publications. Springer, NY, 1987.
L. Harriger, M. van den Heuvel, and O. Sporns. Rich club organization of macaque cerebral cortex and its role in network communication. PLOS ONE, 7(9):1–13, September 2012.
P. Holland, K. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.
S. Hopkins and D. Steurer. Bayesian estimation from few samples: community detection and related problems, September 2017.
T. Jarrell, Y. Wang, A. Bloniarz, C. Brittin, M. Xu, J. Thomson, D. Albertson, D. Hall, and S. Emmons. The connectome of a decision-making neural network. Science, 337(6093):437–444, 2012.
J. Jin, Z. Ke, and S. Luo. Estimating network memberships by simplex vertex hunting. arXiv:1708.07852, 2017.
J. Jost and S. Liu. Ollivier's Ricci curvature, local clustering and curvature-dimension inequalities on graphs. Discrete & Computational Geometry, 51(2):300–322, 2014.
J. Jost and F. Münch. Characterizations of Forman curvature. arXiv:2110.04554, 2021.
F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang. Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences, 110(52):20935–20940, 2013.
H. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2):83–97, 1955.
A. Lancichinetti, S. Fortunato, and J. Kertész. Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 11(3):033015, 2009.
W. Leal, G. Restrepo, P. F. Stadler, and J. Jost. Forman-Ricci curvature for hypergraphs. Advances in Complex Systems, 24(01):2150003, 2021.
P. Lehot. An optimal algorithm to detect a line graph and output its root graph. Journal of the ACM (JACM), 21(4):569–575, 1974.
J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
J. Leskovec and J. McAuley. Learning to discover social circles in ego networks. In F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012.
Y. Lin, L. Lu, and S. Yau. Ricci curvature of graphs. Tohoku Mathematical Journal, 63(4):605–627, 2011.
Z. Lubberts, A. Athreya, Y. Park, and C. E. Priebe. Beyond the adjacency matrix: random line graphs and inference for networks with edge attributes. arXiv:2103.14726, 2021.
S. Lubold, A. Chandrasekhar, and T. McCormick. Identifying the latent space geometry of network models through analysis of curvature. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(2):240–292, 2023.
C. Ni, Y. Lin, J. Gao, X. Gu, and E. Saucan. Ricci curvature of the internet topology. In 2015 IEEE Conference on Computer Communications (INFOCOM), pages 2758–2766. IEEE, 2015.
C. Ni, Y. Lin, F. Luo, and J. Gao. Community detection on networks with Ricci flow. Scientific Reports, 9(1):1–12, 2019.
Y. Ollivier. Ricci curvature of Markov chains on metric spaces. Journal of Functional Analysis, 256(3):810–864, 2009.
Y. Ollivier. A survey of Ricci curvature for metric spaces and Markov chains. In Probabilistic Approach to Geometry, pages 343–381. Mathematical Society of Japan, 2010.
D. T. Painter, B. C. Daniels, and J. Jost. Network analysis for the digital humanities: Principles, problems, extensions. ISIS, 110(3):538–554, 2019.
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
M. Penrose. Random Geometric Graphs. Oxford Studies in Probability. Oxford University Press, 2003.
M. Pouryahya, J. Mathews, and A. Tannenbaum. Comparing three notions of discrete Ricci curvature on biological networks. arXiv:1712.02943, 2017.
A. Samal, R. Sreejith, J. Gu, S. Liu, E. Saucan, and J. Jost. Comparative analysis of two discretizations of Ricci curvature for complex networks. Scientific Reports, 8(1):1–16, 2018.
R. Sandhu, T. Georgiou, and A. Tannenbaum. Ricci curvature: An economic indicator for market fragility and systemic risk. Science Advances, 2(5):e1501495, 2016.
E. Saucan and M. Weber. Forman's Ricci curvature: from networks to hypernetworks. In International Conference on Complex Networks and Their Applications, pages 706–717. Springer, 2018.
E. Saucan, A. Samal, M. Weber, and J. Jost. Discrete curvatures and network analysis. MATCH, 80(3):605–622, 2018.
J. Sia, E. Jonckheere, and P. Bogdan. Ollivier-Ricci curvature-based method to community detection in complex networks. Scientific Reports, 9(1):1–12, 2019.
R. Sinkhorn and P. Knopp. Concerning nonnegative matrices and doubly stochastic matrices. Pacific Journal of Mathematics, 21(2):343–348, 1967.
D. Spielman and S. Teng. Spectral partitioning works: Planar graphs and finite element meshes. In Proceedings of 37th Conference on Foundations of Computer Science, pages 96–105. IEEE, 1996.
A. Strehl and J. Ghosh. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583–617, 2002.
A. Tannenbaum, C. Sander, L. Zhu, R. Sandhu, I. Kolesov, E. Reznik, Y. Senbabaoglu, and T. Georgiou. Ricci curvature and robustness of cancer networks. arXiv:1502.04512, 2015.
Y. Tian, Z. Lubberts, and M. Weber. Mixed-membership community detection via line graph curvature. In Proceedings of the 1st NeurIPS Workshop on Symmetry and Geometry in Neural Representations, volume 197 of Proceedings of Machine Learning Research, pages 219–233. PMLR, 03 December 2023.
U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(1):395–416, 2007.
D. Watts and S. Strogatz. Collective dynamics of small-world networks. Nature, 393(6684):440–442, 1998.
M. Weber. Neighborhood growth determines geometric priors for relational representation learning. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 266–276. PMLR, 26–28 August 2020.
M. Weber and M. Nickel. Curvature and representation learning: Identifying embedding spaces for relational data. In NeurIPS Relational Representation Learning, 2018.
M. Weber, E. Saucan, and J. Jost. Characterizing complex networks with Forman-Ricci curvature and associated geometric flows. Journal of Complex Networks, 5(4):527–550, 2017a.
M. Weber, E. Saucan, and J. Jost. Coarse geometry of evolving networks. Journal of Complex Networks, 6(5):706–732, 2017b.
M. Weber, J. Stelzer, E. Saucan, A. Naitsat, G. Lohmann, and J. Jost. Curvature-based methods for brain network analysis. arXiv:1707.00180, 2017c.
M. Weber, J. Jost, and E. Saucan. Detecting the coarse geometry of networks. In NeurIPS Relational Representation Learning, 2018.
H. Whitney. Congruent graphs and the connectivity of graphs. American Journal of Mathematics, 54(1):150–168, 1932.
J. Xie, S. Kelley, and B. K. Szymanski. Overlapping community detection in networks: The state-of-the-art and comparative study. ACM Computing Surveys (CSUR), 45(4):1–35, 2013.
J. Yang and J. Leskovec. Overlapping community detection at scale: a nonnegative matrix factorization approach. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pages 587–596, 2013.
J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42:181–213, 2015.
Anderson Ye Zhang. Fundamental limits of spectral clustering in stochastic block models. arXiv:2301.09289, 2023.
Y. Zhang, E. Levina, and J. Zhu. Detecting overlapping communities in networks using spectral methods. SIAM Journal on Mathematics of Data Science, 2(2):265–283, January 2020.
M. Zhu and A. Ghodsi. Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2):918–930, 2006.