# Local Hyper-Flow Diffusion

Kimon Fountoulakis (School of Computer Science, University of Waterloo, Waterloo, ON, Canada; kimon.fountoulakis@uwaterloo.ca); Pan Li (Department of Computer Science, Purdue University, West Lafayette, IN, United States; panli@purdue.edu); Shenghao Yang (School of Computer Science, University of Waterloo, Waterloo, ON, Canada; shenghao.yang@uwaterloo.ca)

Recently, hypergraphs have attracted a lot of attention due to their ability to capture complex relations among entities. The insurgence of hypergraphs has resulted in data of increasing size and complexity that exhibit interesting small-scale and local structure, e.g., small-scale communities and localized node-ranking around a given set of seed nodes. Popular and principled ways to capture the local structure are the local hypergraph clustering problem and the related seed set expansion problem. In this work, we propose the first local diffusion method that achieves an edge-size-independent Cheeger-type guarantee for the problem of local hypergraph clustering while applying to a rich class of higher-order relations that covers a number of previously studied special cases. Our method is based on a primal-dual optimization formulation where the primal problem has a natural network flow interpretation, and the dual problem has a cut-based interpretation using the ℓ2-norm penalty on associated cut-costs. We demonstrate that the new technique is significantly better than state-of-the-art methods on both synthetic and real-world data.

1 Introduction

Hypergraphs [1] generalize graphs by allowing a hyperedge to consist of multiple nodes, which captures higher-order relationships in complex systems and datasets [2]. Hypergraphs have been used for music recommendation on Last.fm data [3], news recommendation [4], sets of product reviews on Amazon [5], and sets of co-purchased products at Walmart [6]. Beyond the internet, hypergraphs are used for analyzing higher-order structure in neuronal, air-traffic and food networks [7,8].

In order to explore and understand higher-order relationships in hypergraphs, recent work has made use of cut-cost functions that are defined by associating each hyperedge with a specific set function. These functions assign specific penalties to separating the nodes within individual hyperedges. They generalize the notion of hypergraph cuts and are crucial for determining small-scale community structure [8,9]. The most popular cut-cost functions, in order of increasing capability to model complex multiway relationships, are the unit cut-cost [10–12], the cardinality-based cut-cost [13,14] and the general submodular cut-cost [15,16]. An illustration of a hyperedge and the associated cut-cost function is given in Figure 1. In the simplest setting, where all cut-cost functions take value either 0 or 1 (e.g., the case when γ1 = γ2 = 1 in Figure 1b), we obtain a unit cut-cost hypergraph. In a slightly more general setting, where the cut-costs are determined solely by the number of nodes on either side of the hyperedge cut (e.g., the case when γ1 = 1/2 and γ2 = 1 in Figure 1b), we obtain a cardinality-based hypergraph. We refer to hypergraphs associated with arbitrary submodular cut-cost functions (e.g., the case when γ1 = 1/2 and 0 ≤ γ2 ≤ 1 in Figure 1b) as general submodular hypergraphs.

Hypergraphs that arise from data science applications consist of interesting small-scale local structure such as local communities [9,17].
Exploiting this structure is central to the above-mentioned applications on hypergraphs and to related applications in machine learning and applied mathematics [18]. We consider local hypergraph clustering as the task of finding a community-like cluster around a given set of seed nodes, where nodes in the cluster are densely connected to each other while relatively isolated from the rest of the graph. One of the most powerful primitives for the local hypergraph clustering task is graph diffusion. Diffusion on a graph is the process of spreading a given initial mass from some seed node(s) to neighbor nodes using the edges of the graph. Graph diffusions have been successfully employed in industry; for example, both Pinterest and Twitter use diffusion methods for their recommendation systems [19–21]. Google uses diffusion methods to cluster query refinements [22]. Let us not forget PageRank [23,24], Google's model for their search engine.

Figure 1: (a) Network motif. (b) Hyperedge and cut-cost w_e, with annotated values w_e({v1, v3}) = 1, w_e({v2}) = γ1, w_e({v1, v2}) = γ2. A food network can be mapped into a hypergraph by taking each network pattern in (a) as a hyperedge [8]. This network pattern captures carbon flow from two preys (v1, v2) to two predators (v3, v4). (b) is a hyperedge associated with a cut-cost w_e that models their relations: w_e is a set function defined over the node set e s.t. w_e({vi}) = γ1 for i = 1, 2, 3, 4, w_e({v1, v2}) = γ2, w_e({v1, v3}) = w_e({v1, v4}) = 1 and w_e(S) = w_e(e \ S) for S ⊆ e. w_e becomes the unit cut-cost when γ1 = γ2 = 1; w_e is cardinality-based if γ1 = 1/2 and γ2 = 1; more generally, w_e is submodular if γ1 = 1/2 and 0 ≤ γ2 ≤ 1. The specific choices depend on the application.

Empirical and theoretical performance of local diffusion methods is often measured on the problem of local hypergraph clustering [9,25,26]. Existing local diffusion methods only directly apply to hypergraphs with the unit cut-cost [17,27]. For the slightly more general cardinality-based cut-cost, they rely on graph reduction techniques which result in a rather pessimistic edge-size-dependent approximation error [9,13,27,28]. Moreover, none of the existing methods is capable of processing general submodular cut-costs. In this work, we are interested in designing a diffusion framework that (i) achieves stronger theoretical guarantees for the problem of local hypergraph clustering, (ii) is flexible enough to work with general submodular hypergraphs, and (iii) permits computationally efficient algorithms. We propose the first local diffusion method that simultaneously accomplishes these goals. In what follows we describe our main contributions and previous work. In Section 2 we provide preliminaries and notations. In Section 3 we introduce our diffusion model from a combinatorial flow perspective. In Section 4 we discuss the local hypergraph clustering problem and the Cheeger-type quadratic approximation error. In Section 5 we present an efficient optimization algorithm. In Section 6 we perform experiments using both synthetic and real datasets.

1.1 Our main contributions

In this work we propose a generic local diffusion model that applies to hypergraphs characterized by a rich class of cut-cost functions that covers many previously studied special cases, e.g., unit, cardinality-based and submodular cut-costs. We provide the first edge-size-independent Cheeger-type approximation error for the problem of local hypergraph clustering using any of these cut-costs.
In particular, assume that there exists a cluster C with conductance Φ(C), and that we are given a set of seed nodes that reasonably overlaps with C; then the proposed diffusion model can be used to find a cluster Ĉ with conductance at most O(√Φ(C)) (in the appendix we show that an ℓp-norm version of the proposed model can achieve O(Φ(C)) asymptotically). Our hypergraph diffusion model is formulated as a convex optimization problem. It has a natural combinatorial flow interpretation that generalizes the notion of network flows over hyperedges. We show that the optimization problem can be solved efficiently by an alternating minimization method. In addition, we prove that the number of nonzero nodes in the optimal solution is independent of the size of the hypergraph, and that it only depends on the size of the initial mass. This key property ensures that our algorithm scales well in practice for large datasets. We evaluate our method using both synthetic and real-world data. We show that our method improves accuracy significantly for local clustering on hypergraphs with unit, cardinality-based and general submodular cut-costs.

1.2 Previous work

Recently, clustering methods on hypergraphs have received renewed interest. Different methods require different assumptions about the hyperedge cut-cost, which can be roughly categorized into unit cut-cost, cardinality-based (and submodular) cut-cost and general submodular cut-cost. Moreover, existing methods can be either global, where the output is not localized around a given set of seed nodes, or local, where the output is a small cluster around a set of seed nodes. Local algorithms are the only scalable ones for large hypergraphs, which is our main focus. Many works propose global methods and thus are not scalable to large hypergraphs [7,8,15–17,29–38]. Local diffusion-based methods are more relevant to our work [9,13,27]. In particular, iterative hypergraph min-cut methods can be adopted for the local hypergraph clustering problem [13]. However, these methods require in theory and in practice a large seed set, i.e., they are not expansive and thus cannot work with one seed node. The expansive combinatorial diffusion [25] has been generalized to hypergraphs [27], and can detect a target cluster using only one seed node. However, combinatorial methods have a large bias towards low-conductance clusters as opposed to finding the target cluster [39]. The most relevant paper to our work is [9]. However, the methods proposed in [9] depend on a reduction from hypergraphs to directed graphs. This results in an approximation error for clustering that is proportional to the size of hyperedges and induces performance degradation when the hyperedges are large. In fact, none of the above approaches (global or local) has an edge-size-independent approximation error bound even for simple cardinality-based hypergraphs. Moreover, existing local approaches do not work for general submodular hypergraphs.

2 Preliminaries and Notations

Submodular function. Given a set S, we denote by 2^S the power set of S and by |S| the cardinality of S. A submodular function F : 2^S → ℝ is a set function such that F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B) for any A, B ⊆ S.
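As a quick illustration of this definition, the submodularity inequality can be checked by brute force for small ground sets. The following Python sketch is our own illustration (not part of the paper); the encoding of the hyperedge e = {v1, v2, v3, v4} and of its cut-cost follows Figure 1b, with preys {v1, v2} and predators {v3, v4} as an assumed labeling.

```python
from itertools import combinations

def subsets(ground):
    """All subsets of a ground set, as frozensets."""
    return [frozenset(c) for r in range(len(ground) + 1)
            for c in combinations(ground, r)]

def is_submodular(F, ground, tol=1e-12):
    """Brute-force check of F(A) + F(B) >= F(A | B) + F(A & B)
    over all pairs of subsets A, B of the ground set."""
    subs = subsets(ground)
    return all(F(A) + F(B) >= F(A | B) + F(A & B) - tol
               for A in subs for B in subs)

# Assumed encoding of Figure 1b: e = {1, 2, 3, 4}, preys {1, 2}, predators {3, 4}.
e = frozenset([1, 2, 3, 4])
def w_e(S, g1=0.5, g2=0.3):
    S = frozenset(S) & e
    if len(S) in (0, 4):
        return 0.0                     # w_e(empty) = w_e(e) = 0
    if len(S) in (1, 3):
        return g1                      # singletons and complements (symmetry)
    if S in (frozenset([1, 2]), frozenset([3, 4])):
        return g2                      # preys-vs-predators split
    return 1.0                         # mixed pairs, e.g. {v1, v3}

print(is_submodular(w_e, e))           # True for g1 = 1/2 and 0 <= g2 <= 1
```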
Submodular hypergraph. A hypergraph H = (V, E) is defined by a set of nodes V and a set of hyperedges E ⊆ 2^V, i.e., each hyperedge e ∈ E is a subset of V. A hypergraph is termed submodular if every e ∈ E is associated with a submodular function w_e : 2^e → ℝ≥0 [15]. The weight w_e(S) indicates the cut-cost of splitting the hyperedge e into two subsets, S and e \ S. This general form allows us to describe the potentially complex higher-order relation among multiple nodes (Figure 1). A proper hyperedge weight w_e should satisfy w_e(∅) = w_e(e) = 0. To ease notation we extend the domain of w_e to 2^V by setting w_e(S) := w_e(S ∩ e) for any S ⊆ V. We assume without loss of generality that w_e is normalized by ϑ_e := max_{S⊆e} w_e(S), so that w_e(S) ∈ [0, 1] for any S ⊆ V. For the sake of simplicity in presentation, we assume that ϑ_e = 1 for all e (this is without loss of generality; in the appendix we show that our method works with arbitrary ϑ_e > 0). A submodular hypergraph is written as H = (V, E, W) where W := {w_e, ϑ_e}_{e∈E}. Note that when w_e(S) = 1 for any S ∈ 2^e \ {∅, e}, the definition reduces to unit cut-cost hypergraphs. When w_e(S) only depends on |S|, it reduces to cardinality-based cut-cost hypergraphs.

Vector/Function on V or E. For a set of nodes S ⊆ V, we denote by 1_S the indicator vector of S, i.e., [1_S]_v = 1 if v ∈ S and 0 otherwise. For a vector x ∈ ℝ^|V|, we write x(S) := Σ_{v∈S} x_v, where x_v is the entry in x that corresponds to v ∈ V. We define the support of x as supp(x) := {v ∈ V | x_v ≠ 0}. The support of a vector in ℝ^|E| is defined analogously. We refer to a function over nodes x : V → ℝ and its explicit representation as a |V|-dimensional vector interchangeably.

Volume, cut, conductance. Given a submodular hypergraph H = (V, E, W), the degree of a node v is defined as d_v := |{e ∈ E : v ∈ e}|. We reserve d for the vector of node degrees and D = diag(d). We refer to vol(S) := d(S) as the volume of S ⊆ V. A cut is treated as a proper subset C ⊂ V, or a partition (C, C̄) where C̄ := V \ C. The cut-set of C is defined as ∂C := {e ∈ E | e ∩ C ≠ ∅, e ∩ C̄ ≠ ∅}; the cut-size of C is defined as vol(∂C) := Σ_{e∈∂C} ϑ_e w_e(C) = Σ_{e∈E} ϑ_e w_e(C). The conductance of a cut C in H is Φ(C) := vol(∂C) / min{vol(C), vol(V \ C)}.

Flow. A flow routing over a hyperedge e is a function r_e : e → ℝ, where r_e(v) specifies the amount of mass that flows from {v} to e \ {v} over e. To ease notation we extend the domain of r_e to V by setting r_e(v) = 0 for v ∉ e, so r_e is treated as a function r_e : V → ℝ or equivalently as a |V|-dimensional vector. The net (out)flow at a node v is given by Σ_{e∈E} r_e(v). Given a routing function r_e and a set of nodes S ⊆ V, a directional routing on e with direction S → e \ S is represented by r_e(S), which specifies the net amount of mass that flows from S to e \ S. A routing r_e is called proper if it obeys flow conservation, i.e., r_e^T 1_e = 0. Our flow definition generalizes the notion of network flows to hypergraphs. We provide concrete illustrations in Figure 2.

Figure 2: Illustration of proper flow routings, with panels (a) flows on a graph, (b) a hyperedge routing, and (c) flows on a hypergraph. The numbers next to each node correspond to entries in the flow routing r_e over a (hyper)edge e. We assign the same color to a (hyper)edge and its associated flow values. Our flow definition is a natural generalization of graph edge flow, where r_e(v) = ±f if and only if v ∈ e, i.e., v is incident to e, and where f and the sign determine the amplitude and direction of the flow over e. In Figure 2a, the net (out)flow at node v3 is Σ_{e∈E} r_e(v3) = 1 − 2 − 3 = −4. In Figure 2b, the directional flow from {v1} to {v2, v3, v4, v5} over this hyperedge equals 1; similarly, the directional flow from {v1, v2, v4} to {v3, v5} equals 3 + 2 − 1 = 4. In Figure 2c, the net (out)flow at node v3 is Σ_{e∈E} r_e(v3) = 3 − 2 = 1.
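To ground these definitions, the following self-contained Python sketch (our own illustration; the toy hypergraph, the function names and the chosen cut C are assumptions, not data from the paper) computes node degrees, volumes and the conductance Φ(C) under the unit and cardinality-based cut-costs, with ϑ_e = 1 for all e.

```python
from math import floor

# Toy hypergraph: node ids 0..5, hyperedges as frozensets.
E = [frozenset([0, 1, 2]), frozenset([2, 3, 4]),
     frozenset([3, 4, 5]), frozenset([0, 1])]
V = set().union(*E)

def w_unit(e, S):
    """Unit cut-cost: 1 whenever the hyperedge is split, else 0."""
    inside = e & S
    return 1.0 if inside and inside != e else 0.0

def w_card(e, S):
    """Cardinality-based cut-cost: min(|S & e|, |e - S|) / floor(|e|/2)."""
    k = len(e & S)
    return min(k, len(e) - k) / floor(len(e) / 2)

def degree(v):
    return sum(1 for e in E if v in e)

def vol(S):
    return sum(degree(v) for v in S)

def conductance(C, w=w_unit, theta=1.0):
    """Phi(C) = vol(dC) / min(vol(C), vol(V \\ C)), with theta_e = theta."""
    cut = sum(theta * w(e, C) for e in E)
    return cut / min(vol(C), vol(V - C))

C = {0, 1, 2}
print(conductance(C, w_unit), conductance(C, w_card))  # 0.2 for both here
```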
3 Diffusion as an Optimization Problem

In this section we provide details of the proposed local diffusion method. We consider a diffusion process as the task of spreading mass from a small set of seed nodes to a larger set of nodes. More precisely, given a hypergraph H = (V, E, W), we assign each node a sink capacity specified by a sink function T, i.e., node v is allowed to hold at most T(v) amount of mass. In this work we focus on the setting where T(v) = d_v, so that a high-degree node that is part of many hyperedges can hold more mass than a low-degree node that is part of few hyperedges. Moreover, we assign each node some initial mass specified by a source function Δ, i.e., node v holds Δ(v) amount of mass at the start of the diffusion. In order to encourage the spread of mass in the hypergraph, the initial mass on the seed nodes is larger than their capacity. This forces the seed nodes to diffuse mass to neighbor nodes to remove their excess mass. In Section 4 we will discuss the choice of Δ that yields good theoretical guarantees for the problem of local hypergraph clustering. Details about the local hypergraph clustering problem are provided in Section 4.

Given a set of proper flow routings r_e for e ∈ E, recall that Σ_{e∈E} r_e(v) specifies the amount of net (out)flow at node v. Therefore, the vector m = Δ − Σ_{e∈E} r_e gives the amount of net mass at each node after routing. The excess mass at a node v is ex(v) := max{m_v − d_v, 0}. In order to force the diffusion of initial mass we could simply require that ex(v) = 0 for all v ∈ V, or equivalently, Δ − Σ_{e∈E} r_e ≤ d. But to provide additional flexibility in the diffusion dynamics, we introduce a hyper-parameter σ ≥ 0 and impose the softer constraint Δ − Σ_{e∈E} r_e ≤ d + σDz, where z ∈ ℝ^|V| is an optimization variable that controls how much excess mass is allowed on each node. In the context of numerical optimization, we show in Section 5 that σ allows a reformulation which makes the optimization problem amenable to efficient alternating minimization schemes.

Note that so far we have not yet discussed how specific higher-order relations among the nodes within a hyperedge affect the flow routings over it. Simply requiring that the r_e's obey flow conservation (i.e., r_e^T 1_e = 0), as in the standard graph setting, is not enough for hypergraphs: additional constraints on r_e are needed. To this end, we consider r_e = φ_e ρ_e for some φ_e ∈ ℝ_+ and ρ_e ∈ B_e, where B_e := {ρ_e ∈ ℝ^|V| | ρ_e(S) ≤ w_e(S) for all S ⊆ V, and ρ_e(V) = w_e(V)} is the base polytope [40] of the submodular cut-cost w_e associated with hyperedge e. It is straightforward to see that r_e(v) = 0 for every v ∉ e and r_e^T 1_e = 0, so r_e defines a proper flow routing over e. Moreover, for any S ⊆ V, recall that r_e(S) represents the net amount of mass that moves from S to e \ S over hyperedge e. Therefore, the constraints ρ_e(S) ≤ w_e(S) for S ⊆ e mean that the directional flows r_e(S) are upper bounded by the submodular function φ_e w_e(S). Intuitively, one may think of φ_e and ρ_e as the scale and the shape of r_e, respectively.

The goal of our diffusion problem is to find low-cost routings r_e ∈ φ_e B_e for e ∈ E such that the capacity constraint Δ − Σ_{e∈E} r_e ≤ d + σDz is satisfied. We consider the (weighted) ℓ2-norm of φ and z as the cost of diffusion. In the appendix we show that one readily extends the ℓ2-norm to the ℓp-norm for any p ≥ 2.
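Before stating the formal program, here is a small sketch of what the constraint ρ_e ∈ B_e means computationally (our own illustration, not the paper's code). For a tiny hyperedge one can check membership by enumerating all subsets of e (since ρ_e is zero off e and w_e(S) = w_e(S ∩ e), this suffices); the example shows that, under the unit cut-cost, a routing shape pushing one unit from v1 to v3 is feasible, while two units exceed the cut-cost cap.

```python
from itertools import combinations

def in_base_polytope(rho, e, w, tol=1e-12):
    """Check whether rho (a dict node -> value, zero off e) lies in the
    base polytope B_e = {rho : rho(S) <= w_e(S) for all S, rho(e) = w_e(e)}.
    Enumerates all 2^|e| subsets, so only feasible for small |e|."""
    nodes = sorted(e)
    for r in range(len(nodes) + 1):
        for S in combinations(nodes, r):
            if sum(rho.get(v, 0.0) for v in S) > w(e, set(S)) + tol:
                return False
    total = sum(rho.get(v, 0.0) for v in e)
    return abs(total - w(e, set(e))) <= tol   # rho(e) = w_e(e) = 0

w_unit = lambda e, S: 1.0 if S and S != set(e) else 0.0
print(in_base_polytope({1: 1.0, 3: -1.0}, {1, 2, 3}, w_unit))  # True: one unit v1 -> v3
print(in_base_polytope({1: 2.0, 3: -2.0}, {1, 2, 3}, w_unit))  # False: exceeds the cap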
Formally, we arrive at the following convex optimization formulation (input: the source function Δ, the hypergraph H = (V, E, W), and a hyper-parameter σ):

    min_{φ ∈ ℝ_+^|E|, r, z ∈ ℝ_+^|V|}  Σ_{e∈E} φ_e² + σ Σ_{v∈V} d_v z_v²,
    s.t.  Δ − Σ_{e∈E} r_e ≤ d + σDz,  r_e ∈ φ_e B_e for all e ∈ E.     (1)

We name problem (1) Hyper-Flow Diffusion (HFD) for the combinatorial flow interpretation discussed above. The dual problem of (1) is:

    max_{x ∈ ℝ_+^|V|}  (Δ − d)^T x − (1/4) Σ_{e∈E} f_e(x)² − (σ/4) Σ_{v∈V} d_v x_v²,     (2)

where f_e in (2) is the support function of the base polytope B_e, given by f_e(x) := max_{ρ_e ∈ B_e} ρ_e^T x. f_e is also known as the Lovász extension of the submodular function w_e. We provide a combinatorial interpretation for (2) and leave the algebraic derivations to the appendix. For the dual problem, one can view the solution x as assigning heights to nodes, where the goal is to separate/cut the nodes with source mass from the rest of the hypergraph. Observe that the linear term in the dual objective function encourages raising x higher on the seed nodes and setting it lower on others. The cost f_e(x) captures the discrepancy in node heights over a hyperedge e and encourages smooth height transitions over adjacent nodes. The dual solution embeds nodes into the nonnegative real line, and this embedding is what we actually use for local clustering and node ranking.
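To make f_e concrete, the following sketch (ours, using the standard greedy/Choquet characterization of the Lovász extension [40]) evaluates f_e(x) for any submodular cut-cost w_e by sorting the nodes of e by decreasing height and accumulating marginal gains. For the unit cut-cost this reduces to max_v x_v − min_v x_v over e, exactly the height discrepancy mentioned above.

```python
def lovasz_extension(x, e, w):
    """Evaluate f_e(x) = max over rho in B_e of rho^T x, via the greedy
    formula: sort nodes of e by decreasing x-value and accumulate the
    marginal gains w_e(S_j) - w_e(S_{j-1})."""
    order = sorted(e, key=lambda v: -x.get(v, 0.0))
    f, S = 0.0, set()
    for v in order:
        gain = w(e, S | {v}) - w(e, S)   # marginal gain of adding v
        f += x.get(v, 0.0) * gain
        S.add(v)
    return f

w_unit = lambda e, S: 1.0 if S and S != set(e) else 0.0
x = {1: 1.0, 2: 0.4, 3: 0.0}
print(lovasz_extension(x, {1, 2, 3}, w_unit))  # 1.0 = max - min over e
```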
4 Local Hypergraph Clustering

In this section we discuss the performance of the primal-dual pair (1)-(2) in the context of local hypergraph clustering. We consider a generic hypergraph H = (V, E, W) with submodular hyperedge weights W = {w_e, ϑ_e}_{e∈E}. Given a set of seed nodes S ⊆ V, the goal of local hypergraph clustering is to identify a target cluster C ⊆ V that contains or overlaps well with S. This generalizes the definition of local clustering over graphs [41]. To the best of our knowledge, we are the first to consider this problem for general submodular hypergraphs. We consider a subset of nodes having low conductance to be a good cluster, i.e., these nodes are well-connected internally and well-separated from the rest of the hypergraph. Following prior work on local hypergraph clustering, we assume the existence of an unknown target cluster C with conductance Φ(C). We prove that applying a sweep cut to an optimal solution x̂ of (2) returns a cluster Ĉ whose conductance is at most quadratically worse than Φ(C). Note that this result resembles the Cheeger-type approximation guarantees of spectral clustering in the graph setting [42], and it is the first such result that is independent of hyperedge size for general hypergraphs. We keep the discussion at a high level and defer details to the appendix, where we prove a more general and stronger result, i.e., a constant approximation error, when the primal problem (1) is penalized by the ℓp-norm for any p ≥ 2.

In order to start a diffusion process we need to provide the source mass Δ. Similar to the p-norm flow diffusion in the graph setting [26], we let

    Δ(v) = δ d_v if v ∈ S, and Δ(v) = 0 otherwise,     (3)

where S is a set of seed nodes and δ ≥ 1. Below, we assume that the seed set S and the target cluster C have some overlap, that a constant fraction of vol(C) amount of mass is trapped in C initially, and that the hyper-parameter σ is not too large. Note that Assumption 2 is without loss of generality: if the right value of δ is not known a priori, we can always employ binary search to find a good choice. Assumption 3 is very weak as it allows σ to reside in an interval containing 0.

Assumption 1. vol(S ∩ C) ≥ α·vol(C) and vol(S ∩ C) ≥ β·vol(S) for some α, β ∈ (0, 1].

Assumption 2. The source mass Δ as specified in (3) satisfies δ = 3/α, so Δ(C) ≥ 3vol(C).

Assumption 3. σ satisfies 0 ≤ σ ≤ βΦ(C)/3.

Let x̂ be an optimal solution for the dual problem (2). For h > 0 define the sweep sets S_h := {v ∈ V | x̂_v ≥ h}. We state the approximation property in Theorem 1.

Theorem 1. Under Assumptions 1, 2, 3, there exists h > 0 such that Φ(S_h) ≤ O(√(Φ(C)/α)).

Some previous works on local hypergraph clustering provide an approximation guarantee by focusing on a specific setting where the seed set S contains precisely one seed node, e.g., [9,17]. However, they rely on different assumptions about the seed node which may not hold in general. For example, [17] assumes that the seed node is not a neighbor of any node outside the target cluster. Even in the classical graph setting, establishing a local clustering guarantee based on diffusing mass from a single node requires assuming that a good seed node has been selected from a proper subset of the target cluster [42]. On the other hand, our theoretical framework is more natural in the sense that Theorem 1 applies to any set of seed node(s) as long as it has nonempty intersection with the target cluster. We view the α in the denominator of the bound as the price we pay for the certainty that the approximation guarantee holds for an arbitrary seed set. The empirical experiments in Section 6 show that our method has superior performance in practice even when only one seed node is used.

One of the challenges we face in establishing the result in Theorem 1 is making sure that our diffusion model enjoys not only good clustering guarantees but also practical algorithmic advantages. This is achieved by introducing the hyper-parameter σ to our diffusion problem. We will demonstrate how σ helps with algorithmic development in Section 5; from a clustering perspective, however, the additional flexibility given by σ > 0 complicates the underlying diffusion dynamics, making it more difficult to analyze. Another challenge is connecting the Lovász extension f_e(x) in (2) with the conductance of a cluster. We resolve all these problems by combining a generalized Rayleigh quotient result for submodular hypergraphs [15], primal-dual convex conjugate relations between (1) and (2), and a classical property of the Choquet integral/Lovász extension.

Let (φ̂, r̂, ẑ) be an optimal solution for the primal problem (1). We state the following lemma on the locality (i.e., sparsity) of the optimal solutions, which justifies why HFD is a local diffusion method.

Lemma 2. |supp(φ̂)| ≤ vol(supp(x̂)) ≤ ‖Δ‖₁; moreover, vol(supp(ẑ)) = vol(supp(x̂)) if σ > 0.

5 Optimization algorithm for HFD

We use a simple Alternating Minimization (AM) [43] method that efficiently solves the primal diffusion problem (1). For e ∈ E, we define a diagonal matrix A_e ∈ ℝ^{|V|×|V|} such that [A_e]_{v,v} = 1 if v ∈ e and 0 otherwise. Denote C := {(φ, r) : r_e ∈ φ_e B_e for all e ∈ E}. The following Lemma 3 allows us to cast problem (1) into an equivalent separable formulation amenable to the AM method.

Lemma 3. The following problem is equivalent to (1) for any σ > 0, in the sense that (φ̂, r̂, ẑ) is optimal in (1) for some ẑ ∈ ℝ^|V| if and only if (φ̂, r̂, ŝ) is optimal in (4) for some ŝ:

    min_{φ,r,s}  ‖φ‖₂² + (1/σ) Σ_{e∈E} ‖s_e − r_e‖₂²,
    s.t.  (φ, r) ∈ C,  Σ_{e∈E} s_e ≥ Δ − d,  s_{e,v} = 0 for all v ∉ e.     (4)

The AM method for problem (4) is given in Algorithm 1.

Algorithm 1: Alternating Minimization for (4)
Initialization: φ^(0) := 0, r^(0) := 0, s_e^(0) := D^{−1} A_e [Δ − d]_+ for all e ∈ E.
For k = 0, 1, 2, . . . do:
    (φ^(k+1), r^(k+1)) := argmin_{(φ,r) ∈ C} Σ_{e∈E} ( φ_e² + (1/σ) ‖s_e^(k) − r_e‖₂² )
    s^(k+1) := argmin_s Σ_{e∈E} ‖s_e − r_e^(k+1)‖₂², s.t. Σ_{e∈E} s_e ≥ Δ − d, s_{e,v} = 0 for all v ∉ e.
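To make the structure of Algorithm 1 concrete, below is a schematic Python sketch (our own, not the paper's implementation), specialized to the simplest case where every hyperedge is an ordinary graph edge with unit cut-cost. There B_e = {(t, −t) : |t| ≤ 1}, so the conic projection in the first sub-problem has a simple closed form, and the second sub-problem decouples across nodes into a halfspace projection. For general hyperedges the closed-form r-update would be replaced by the projection oracle of [36].

```python
import numpy as np

def am_hfd_unit_edges(edges, Delta, d, sigma, iters=200):
    """Schematic AM loop for problem (4), restricted to 2-node hyperedges
    (graph edges) with unit cut-cost. For edge e = (u, v), r_e = (a, -a)
    with |a| <= phi_e, and minimizing phi_e^2 + (1/sigma)*||s_e - r_e||^2
    gives the closed form a = (s_{e,u} - s_{e,v}) / (sigma + 2).
    Our own sketch under these assumptions, not the paper's code."""
    m, n = len(edges), len(d)
    a = np.zeros(m)                          # r_e = (a_e, -a_e), oriented u -> v
    s = np.zeros((m, 2))
    lam = np.zeros(n)
    excess = np.maximum(Delta - d, 0.0)
    for j, (u, v) in enumerate(edges):       # s_e^(0) = D^{-1} A_e [Delta - d]_+
        s[j] = excess[u] / d[u], excess[v] / d[v]
    for _ in range(iters):
        # Sub-problem 1: per-edge conic projection (closed form here).
        for j in range(m):
            a[j] = (s[j, 0] - s[j, 1]) / (sigma + 2.0)
        # Sub-problem 2: decouples per node; project onto sum_e s_e >= Delta - d.
        rsum = np.zeros(n)                   # net outflow at each node
        for j, (u, v) in enumerate(edges):
            rsum[u] += a[j]
            rsum[v] -= a[j]
        lam = np.maximum(Delta - d - rsum, 0.0) / d
        for j, (u, v) in enumerate(edges):
            s[j] = a[j] + lam[u], -a[j] + lam[v]
    return a, s, lam

# Toy run: a 4-node path with all source mass on node 0.
edges = [(0, 1), (1, 2), (2, 3)]
d = np.array([1.0, 2.0, 2.0, 1.0])
Delta = np.array([6.0, 0.0, 0.0, 0.0])
a, s, lam = am_hfd_unit_edges(edges, Delta, d, sigma=0.01)
```

The per-node multiplier lam plays a role analogous to σz in (1): nodes with remaining excess lift their s-values, which in turn increases their outgoing flow in the next r-update.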
The first sub-problem corresponds to computing projections onto a group of cones, where all the projections can be computed in parallel. The computation of each projection depends on the choice of base polytope B_e. If the hyperedge weight w_e is the unit cut-cost, B_e has special structure and the projection can be computed in O(|e| log |e|) time [36]. For general B_e, a conic Fujishige-Wolfe minimum-norm algorithm can be adopted to obtain the projection [36]. The second sub-problem in Algorithm 1 can be easily computed in closed form. We provide more information about Algorithm 1 and its convergence properties in the appendix.

We remark that the reformulation (4) for σ > 0 is crucial from an algorithmic point of view. If σ = 0, then the primal problem (1) has complicated coupling constraints that are hard to deal with. In this case, one has to resort to the dual problem (2). However, problem (2) has a nonsmooth objective function, which prohibits the applicability of optimization methods for smooth objective functions. Even though a subgradient method may be applied, we have observed empirically that its convergence rate is extremely slow for our problem, and early stopping results in a bad-quality output.

Lastly, as noted in Lemma 2, the number of nonzeros in the optimal solution is upper bounded by ‖Δ‖₁. In Figure 3 we plot the number of nodes having positive excess (which equals the number of nonzeros in the dual solution x̂) at every iteration of Algorithm 1. Figure 3 indicates that Algorithm 1 is strongly local, meaning that it works only on a small fraction of nodes (and their incident hyperedges) as opposed to producing dense iterates. This key empirical observation has enabled our algorithm to scale to large datasets by simply keeping track of all active nodes and hyperedges. Proving that the worst-case running time of AM depends only on the number of nonzero nodes at optimality, as opposed to the size of the whole hypergraph, is an open problem which we leave for future work.

Figure 3: Number of nonzeros in the dual solution x over 200 iterations of Algorithm 1 (blue solid line), applied to solve HFD on the Amazon-reviews hypergraph for local clustering; panels: (a) Cluster 12 - Gift Cards, (b) Cluster 18 - Magazine Subs., (c) Cluster 24 - Prime Pantry. See Section 6.2 for details about the dataset. The error bars show standard deviation over 10 trials. In each trial we pick a different seed node and set the same amount of initial mass. The black dashed line shows the average number of nonzeros at optimality. The algorithm touches only a small fraction of nodes out of the total 2,268,264 nodes in the Amazon-reviews dataset.

6 Empirical Results

In this section we evaluate the performance of HFD for local clustering. First, we carry out experiments on synthetic hypergraphs with varying target-cluster conductances and varying hyperedge sizes. For the unit cut-cost setting, we show that HFD is more robust and performs better when the target cluster is noisy; for a cardinality-based cut-cost setting, we show that the edge-size-independent approximation guarantee is important for obtaining good recovery results. Second, we carry out experiments using real-world data.
We show that HFD significantly outperforms existing state-of-the-art diffusion methods for both unit and cardinality-based cut-costs. Moreover, we provide a compelling example where a specialized submodular cut-cost is necessary for obtaining good results. Code that reproduces all results is available at https://github.com/s-h-yang/HFD.

6.1 Synthetic experiments using hypergraph stochastic block model (HSBM)

The generative model. We generalize the standard k-uniform hypergraph stochastic block model (k-HSBM) [44] to allow different types of inter-cluster hyperedges to appear with possibly different probabilities according to the cardinality of the hyperedge cut. Let V = {1, 2, . . . , n} be a set of nodes and let k ≥ 2 be the required constant hyperedge size. We consider a k-HSBM with parameters k, n, p, and q_j for j = 1, 2, . . . , ⌊k/2⌋. The model samples a k-uniform hypergraph according to the following rules: (i) the community label σ_i ∈ {0, 1} is chosen uniformly at random for each i ∈ V (we consider two blocks for simplicity; in general the model applies to any number of blocks); (ii) each size-k subset e = {v1, v2, . . . , vk} of V appears independently as a hyperedge with probability

    P(e ∈ E) = p, if σ_{v1} = σ_{v2} = · · · = σ_{vk};
    P(e ∈ E) = q_j, if min{ k − Σ_{i=1}^{k} σ_{vi}, Σ_{i=1}^{k} σ_{vi} } = j.

If k = 3 or all q_j's are the same, then we obtain the standard two-block k-HSBM. We use this setting to evaluate HFD for the unit cut-cost. If the q_j's differ, then we obtain a cardinality-based k-HSBM. In particular, when q_1 ≥ q_2 ≥ · · · ≥ q_{⌊k/2⌋}, the model captures the scenario where hyperedges containing similar numbers of nodes from each block are rare, while small noises (e.g., hyperedges that have one or two nodes in one block and all the rest in the other block) are more frequent. We use q_1 > q_j, j ≥ 2, to evaluate HFD for the cardinality-based cut-cost. There are other random hypergraph models, for example the Poisson degree-corrected HSBM [45] that deals with degree heterogeneity and edge-size heterogeneity. In our experiments we focus on the k-HSBM because it allows stronger control over hyperedge sizes. We provide details on data generation in the appendix; a minimal sampler sketch follows below.
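The sketch below is our own illustration of the sampler described above (not the paper's data-generation code); it enumerates all C(n, k) candidate hyperedges, so it only serves to make the model concrete for small n.

```python
import itertools
import random

def sample_k_hsbm(n, k, p, q, seed=0):
    """Sample a two-block k-uniform HSBM. q[j-1] is the probability for a
    candidate hyperedge whose smaller side of the induced label split has
    size j, for j = 1, ..., floor(k/2); j = 0 (intra-block) uses p."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n)]
    hyperedges = []
    for e in itertools.combinations(range(n), k):
        ones = sum(labels[v] for v in e)
        j = min(ones, k - ones)          # cardinality of the hyperedge cut
        prob = p if j == 0 else q[j - 1]
        if rng.random() < prob:
            hyperedges.append(frozenset(e))
    return labels, hyperedges

# e.g., a cardinality-based 4-HSBM where balanced splits are rare (q1 > q2):
labels, E = sample_k_hsbm(n=40, k=4, p=0.01, q=[0.002, 0.0002])
```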
Task and methods. We consider the local hypergraph clustering problem. We assume that we are given a single labelled node, and the goal is to recover all nodes having the same label. Using a single seed node is the most common (and sought-after) practice for local graph clustering tasks. We compare the performance of HFD with two other methods: (i) Localized Quadratic Hypergraph Diffusions (LH) [9], which can be seen as a hypergraph analogue of Approximate Personalized PageRank (APPR); (ii) ACL [42], which is used to compute APPR vectors on a standard graph obtained by reducing the hypergraph through star expansion [46]. (There are other heuristic methods which first reduce a hypergraph to a graph by clique expansion [7] and then apply diffusion methods for standard graphs. We did not compare with this approach because clique expansion often results in a dense graph and consequently makes the computation slow. Moreover, it has been shown in [9] that clique expansion does not offer significant performance improvement over star expansion.)

Cut-costs and parameters. We consider both the unit cut-cost, i.e., w_e(S) = 1 if S ∩ e ≠ ∅ and e \ S ≠ ∅, and the cardinality cut-cost w_e(S) = min{|S ∩ e|, |e \ S|}/⌊|e|/2⌋. HFD with unit and cardinality cut-costs is denoted by U-HFD and C-HFD, respectively. LH also works with both unit and cardinality cut-costs, and we denote these variants by U-LH and C-LH, respectively. For HFD, we initialize the seed mass so that ‖Δ‖₁ is a constant factor times the volume of the target cluster. We set σ = 0.01. We highly tune LH by performing binary search over its parameters κ and δ and pick the output cluster having the lowest conductance. For ACL we use the same parameter choices as in [9]. Details on parameter setting are provided in the appendix.

Results. For each hypergraph, we randomly pick a block as the target cluster. We run the methods 50 times. Each time we choose a different node from the target cluster as the single seed node.

Unit cut-cost results. Figure 4 shows local clustering results when we fix k = 3 but vary the conductance of the target cluster (i.e., constant p but varying q1). Observe that the performance of all methods becomes worse as the target cluster becomes noisier, but U-HFD performs significantly better than both U-LH and ACL when the conductance of the target cluster is between 0.2 and 0.4. The reason that U-HFD performs better is in part that it requires much weaker conditions for its theoretical conductance guarantee to hold. On the contrary, LH assumes an upper bound on the conductance of the target cluster [9]. This upper bound is dataset-dependent and can become very small in many cases, leading to poor practical performance. We provide more details on this perspective in the appendix. ACL with star expansion is a heuristic method that has no performance guarantee.

Figure 4: Output conductance and F1 score against ground-truth conductance, for U-HFD, U-LH and ACL.

Cardinality cut-cost results. Figure 5 shows the median (markers) and 25-75 percentiles (lower-upper bars) of conductance ratios (i.e., the ratio between output conductance and ground-truth conductance; lower is better) and F1 scores for different methods for k ∈ {3, 4, 5, 6}. The target cluster for each k has conductance around 0.3 (see the appendix for similar results when we fix the target cluster conductance around 0.2 and 0.25, respectively; these cover a reasonably wide range of scenarios in terms of the target conductance and illustrate the performance of the algorithms for different levels of noise). For k = 3, unit and cardinality cut-costs are equivalent, therefore all methods have similar performance. As k increases, the cardinality cut-cost provides better performance than the unit cut-cost in both conductance and F1. However, since the theoretical approximation guarantee of C-LH depends on hyperedge size [9], there is a noticeable performance degradation for C-LH when we increase k = 3 to k = 4. On the other hand, the performance of C-HFD appears to be independent of k, which aligns with our conductance bound in Theorem 1.

Figure 5: Conductance ratio Φ(Ĉ)/Φ(C) and F1 score on k-uniform hypergraphs for k ∈ {3, 4, 5, 6}, for U-HFD, C-HFD, U-LH, C-LH and ACL.

6.2 Experiments using real-world data

We conduct extensive experiments using real-world data. First, we show that HFD has superior local clustering performance compared to existing methods for both unit and cardinality-based cut-costs. Then, we show that a general submodular cut-cost (recall that HFD is the only method that applies to this setting) can be necessary for capturing complex higher-order relations in the data, improving F1 scores by up to 20% for local clustering and providing the only meaningful result for node ranking. Because of limited space, we show in the appendix additional local clustering experiments on two additional datasets, where our method improves F1 scores by 8% on average over 13 different target clusters.

Datasets. We provide basic information about the datasets used in our experiments.
Complete descriptions are provided in the appendix.

Amazon-reviews (|V| = 2,268,264, |E| = 4,285,363) [13,47]. In this hypergraph each node represents a product. A set of products are connected by a hyperedge if they are reviewed by the same person. We use product category labels as ground-truth cluster identities. We consider all clusters of fewer than 10,000 nodes.

Trivago-clicks (|V| = 172,738, |E| = 233,202) [45]. The nodes in this hypergraph are accommodations/hotels. A set of nodes are connected by a hyperedge if a user performed a click-out action on them during the same browsing session. We use geographical locations as ground-truth cluster identities. There are 160 such clusters, and we filter them by cluster size and conductance.

Florida Bay food network (|V| = 128, |E| = 141,233) [8]. Nodes in this hypergraph correspond to different species or organisms that live in the Bay, and hyperedges correspond to transformed motifs (Figure 1) of the original dataset. Each species is labelled according to its role in the food chain: producers, low-level consumers, high-level consumers.

Methods and parameters. We compare HFD with LH and ACL. (We also tried a flow-improve method for hypergraphs [13], but the method was very slow in our experiments, so we only used it for small datasets; see the appendix for results. The flow-improve method did not improve the performance of existing methods, and therefore we omitted it from comparisons on larger datasets.) There is a heuristic nonlinear variant of LH which is shown to outperform linear LH in some cases [9]. Therefore we also compare with the same nonlinear variant considered in [9]. We denote the linear and nonlinear versions by LH-2.0 and LH-1.4, respectively. We set σ = 0.0001 for HFD, and we set the parameters for LH-2.0, LH-1.4 and ACL as suggested by the authors [9]. More details on parameter choices appear in the appendix. We prefix methods that use unit and cardinality-based cut-costs by U- and C-, respectively.

Experiments for unit and cardinality cut-costs. For each target cluster in Amazon-reviews and Trivago-clicks, we run the methods multiple times; each time we use a different node as the single seed node (we show additional results using seed sets of more than one node in the appendix). We report the median F1 scores of the output clusters in Table 1 and Table 2. For Amazon-reviews, we only compare the unit cut-cost because it is both shown in [9] and verified by our experiments that the unit cut-cost is more suitable for this dataset. Observe that U-HFD obtains the highest F1 scores for nearly all clusters. In particular, U-HFD significantly outperforms other methods for clusters 12, 18 and 24, where we see an increase in F1 score by up to 52%. For Trivago-clicks, C-HFD has the best performance for all but one cluster. Among the other methods, U-HFD has the second highest F1 scores for nearly all clusters. Moreover, observe that for each method (i.e., HFD, LH-2.0, LH-1.4), the cardinality cut-cost leads to higher F1 than its unit cut-cost counterpart.
Table 1: F1 results for the Amazon-reviews network.

| Method | 1 | 2 | 3 | 12 | 15 | 17 | 18 | 24 | 25 |
|---|---|---|---|---|---|---|---|---|---|
| U-HFD | 0.45 | 0.09 | 0.65 | 0.92 | 0.04 | 0.10 | 0.80 | 0.81 | 0.09 |
| U-LH-2.0 | 0.23 | 0.07 | 0.23 | 0.29 | 0.05 | 0.06 | 0.21 | 0.28 | 0.05 |
| U-LH-1.4 | 0.23 | 0.09 | 0.35 | 0.40 | 0.00 | 0.07 | 0.31 | 0.35 | 0.06 |
| ACL | 0.23 | 0.07 | 0.22 | 0.25 | 0.04 | 0.05 | 0.17 | 0.20 | 0.04 |

Table 2: F1 results for the Trivago-clicks network.

| Method | KOR | ISL | PRI | UA-43 | VNM | HKG | MLT | GTM | UKR | EST |
|---|---|---|---|---|---|---|---|---|---|---|
| U-HFD | 0.75 | 0.99 | 0.89 | 0.85 | 0.28 | 0.82 | 0.98 | 0.94 | 0.60 | 0.94 |
| C-HFD | 0.76 | 0.99 | 0.95 | 0.94 | 0.32 | 0.80 | 0.98 | 0.97 | 0.68 | 0.94 |
| U-LH-2.0 | 0.70 | 0.86 | 0.79 | 0.70 | 0.24 | 0.92 | 0.88 | 0.82 | 0.50 | 0.90 |
| C-LH-2.0 | 0.73 | 0.90 | 0.84 | 0.78 | 0.27 | 0.94 | 0.96 | 0.88 | 0.51 | 0.83 |
| U-LH-1.4 | 0.69 | 0.84 | 0.80 | 0.75 | 0.28 | 0.87 | 0.92 | 0.83 | 0.47 | 0.90 |
| C-LH-1.4 | 0.71 | 0.88 | 0.84 | 0.78 | 0.27 | 0.88 | 0.93 | 0.85 | 0.50 | 0.85 |
| ACL | 0.65 | 0.84 | 0.75 | 0.68 | 0.23 | 0.90 | 0.83 | 0.69 | 0.50 | 0.88 |

Experiments for general submodular cut-cost. In order to understand the importance of specialized general submodular hypergraphs, we study the node-ranking problem for the Florida Bay food network using the hypergraph modelling shown in Figure 1. We compare HFD using unit (U-HFD, γ1 = γ2 = 1), cardinality-based (C-HFD, γ1 = 1/2 and γ2 = 1) and submodular (S-HFD, γ1 = 1/2 and γ2 = 0) cut-costs. Our goal is to find the most similar species to a queried species based on the food-network structure. Table 3 shows that S-HFD provides the only meaningful node-ranking results. Intuitively, when γ2 = 0, separating the preys v1, v2 from the predators v3, v4 incurs 0 cost. This encourages S-HFD to diffuse mass among preys or among predators only, and not to cross from a predator to a prey or vice versa. As a result, similar species receive similar amounts of mass and thus are ranked similarly. In the local clustering setting, Table 3 compares HFD using different cut-costs. By exploiting specialized higher-order relations, S-HFD further improves F1 scores by up to 20% over U-HFD and C-HFD. This is not surprising, given the poor node-ranking results of the other cut-costs. In the appendix we show another application of submodular cut-costs for node ranking in an international oil trade network.

Table 3: Node-ranking and local clustering in the Florida Bay food network using different cut-costs.

| Method | Top-2 for query: Raptors | Top-2 for query: Gray Snapper | F1 (Prod.) | F1 (Low) | F1 (High) |
|---|---|---|---|---|---|
| U-HFD | Epiphytic Gastropods, Detriti. Gastropods | Meiofauna, Epiphytic Gastropods | 0.69 | 0.47 | 0.64 |
| C-HFD | Epiphytic Gastropods, Detriti. Gastropods | Meiofauna, Epiphytic Gastropods | 0.67 | 0.47 | 0.64 |
| S-HFD | Gruiformes, Small Shorebirds | Snook, Mackerel | 0.69 | 0.62 | 0.84 |

Acknowledgments and Disclosure of Funding

K.F. would like to acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), grants RGPIN-2019-04067 and DGECR-2019-00147. P.L. is partially supported by the 2021 JP Morgan Faculty Award and the National Science Foundation (NSF) award HDR-2117997. S.Y. is partially supported by the Borealis AI 2021 Fellowship.

References

[1] C. Berge, Hypergraphs: Combinatorics of Finite Sets, vol. 45. Elsevier, 1984.

[2] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, Network motifs: simple building blocks of complex networks, Science, vol. 298, no. 5594, pp. 824–827, 2002.

[3] J. Bu, S. Tan, C. Chen, C. Wang, H. Wu, L. Zhang, and X. He, Music recommendation by unified hypergraph: combining social media information and music content, in MM '10: Proceedings of the 18th ACM International Conference on Multimedia, 2010.
[4] L. Li and T. Li, News recommendation via hypergraph learning: encapsulation of user behavior and news content, in WSDM '13: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, 2013.

[5] J. Ni, J. Li, and J. McAuley, Justifying recommendations using distantly-labeled reviews and fine-grained aspects, in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 188–197, 2019.

[6] I. Amburg, N. Veldt, and A. R. Benson, Clustering in graphs and hypergraphs with categorical edge labels, in Proceedings of the Web Conference, 2020.

[7] A. R. Benson, D. F. Gleich, and J. Leskovec, Higher-order organization of complex networks, Science, vol. 353, no. 6295, pp. 163–166, 2016.

[8] P. Li and O. Milenkovic, Inhomogeneous hypergraph clustering with applications, in Advances in Neural Information Processing Systems, 2017.

[9] M. Liu, N. Veldt, H. Song, P. Li, and D. F. Gleich, Strongly local hypergraph diffusions for clustering and semi-supervised learning, in The Web Conference 2021, 2021.

[10] E. L. Lawler, Cutsets and partitions of hypergraphs, Networks, vol. 3, no. 3, pp. 275–285, 1973.

[11] E. Ihler, D. Wagner, and F. Wagner, Modeling hypergraphs by graphs with the same mincut properties, Information Processing Letters, vol. 45, no. 4, pp. 171–175, 1993.

[12] S. W. Hadley, Approximation techniques for hypergraph partitioning problems, Discrete Applied Mathematics, vol. 59, pp. 115–127, May 1995.

[13] N. Veldt, A. R. Benson, and J. Kleinberg, Minimizing localized ratio cut objectives in hypergraphs, in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2020.

[14] N. Veldt, A. R. Benson, and J. Kleinberg, Hypergraph cuts with general splitting functions, 2020.

[15] P. Li and O. Milenkovic, Submodular hypergraphs: p-Laplacians, Cheeger inequalities and spectral clustering, in Proceedings of the 35th International Conference on Machine Learning, 2018.

[16] Y. Yoshida, Cheeger inequalities for submodular transformations, in Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 2582–2601, SIAM, 2019.

[17] Y. Takai, A. Miyauchi, M. Ikeda, and Y. Yoshida, Hypergraph clustering based on PageRank, in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1970–1978, 2020.

[18] A. Benson, D. Gleich, and D. Higham, Higher-order network analysis takes off, fueled by classical ideas and new data, SIAM News (online), 2021.

[19] C. Eksombatchai, P. Jindal, J. Z. Liu, Y. Liu, R. Sharma, C. Sugnet, M. Ulrich, and J. Leskovec, Pixie: A system for recommending 3+ billion items to 200+ million users in real-time, in WWW '18: Proceedings of the 2018 World Wide Web Conference, pp. 1775–1784, 2018.

[20] C. Eksombatchai, J. Leskovec, R. Sharma, C. Sugnet, and M. Ulrich, Node graph traversal methods, U.S. Patent 10,762,134 B1, Sep. 2020.

[21] P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and R. Zadeh, WTF: the who to follow service at Twitter, in WWW '13: Proceedings of the 22nd International Conference on World Wide Web, pp. 505–514, 2013.

[22] E. Sadikov, J. Madhavan, L. Wang, and A. Halevy, Clustering query refinements by user intent, in WWW '10: Proceedings of the 19th International Conference on World Wide Web, pp. 841–850, 2010.

[23] S. Brin and L. Page, The anatomy of a large-scale hypertextual web search engine, Computer Networks and ISDN Systems, vol. 30, no. 1–7, pp. 107–117, 1998.
[24] L. Page, S. Brin, R. Motwani, and T. Winograd, The PageRank citation ranking: Bringing order to the web, Technical Report 1999-66, Stanford InfoLab, 1999.

[25] D. Wang, K. Fountoulakis, M. Henzinger, M. W. Mahoney, and S. Rao, Capacity releasing diffusion for speed and locality, in Proceedings of the 34th International Conference on Machine Learning, vol. 70, 2017.

[26] K. Fountoulakis, D. Wang, and S. Yang, p-Norm flow diffusion for local graph clustering, in Proceedings of the 37th International Conference on Machine Learning, 2020.

[27] R. Ibrahim and D. F. Gleich, Local hypergraph clustering using capacity releasing diffusion, PLOS ONE, vol. 15, pp. 1–20, Dec. 2020.

[28] H. Yin, A. R. Benson, J. Leskovec, and D. F. Gleich, Local higher-order graph clustering, in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 555–564, 2017.

[29] D. Zhou, J. Huang, and B. Schölkopf, Learning with hypergraphs: Clustering, classification, and embedding, in Advances in Neural Information Processing Systems, vol. 19, pp. 1601–1608, 2006.

[30] S. Arora, S. Rao, and U. Vazirani, Expander flows, geometric embeddings and graph partitioning, Journal of the ACM, vol. 56, Apr. 2009.

[31] M. Hein, S. Setzer, L. Jost, and S. S. Rangapuram, The total variation on hypergraphs: learning on hypergraphs revisited, in Proceedings of the 26th International Conference on Neural Information Processing Systems, Volume 2, pp. 2427–2435, 2013.

[32] A. Louis, Hypergraph Markov operators, eigenvalues and approximation algorithms, in STOC, pp. 713–722, Association for Computing Machinery, New York, NY, USA, 2015.

[33] C. Zhang, S. Hu, Z. G. Tang, and T. H. Chan, Re-revisiting learning on hypergraphs: confidence interval and subgradient method, in International Conference on Machine Learning, pp. 4026–4034, PMLR, 2017.

[34] T.-H. H. Chan, A. Louis, Z. G. Tang, and C. Zhang, Spectral properties of hypergraph Laplacian and approximation algorithms, Journal of the ACM, vol. 65, Mar. 2018.

[35] E. Chien, P. Li, and O. Milenkovic, Landing probabilities of random walks for seed-set expansion in hypergraphs, 2019.

[36] P. Li, N. He, and O. Milenkovic, Quadratic decomposable submodular function minimization: Theory and practice, Journal of Machine Learning Research, vol. 21, no. 106, pp. 1–49, 2020.

[37] U. Chitra and B. Raphael, Random walks on hypergraphs with edge-dependent vertex weights, in Proceedings of the 36th International Conference on Machine Learning (K. Chaudhuri and R. Salakhutdinov, eds.), vol. 97 of Proceedings of Machine Learning Research, pp. 1172–1181, PMLR, 09–15 Jun 2019.

[38] K. Hayashi, S. G. Aksoy, C. H. Park, and H. Park, Hypergraph random walks, Laplacians, and clustering, in Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM '20), pp. 495–504, Association for Computing Machinery, New York, NY, USA, 2020.

[39] K. Fountoulakis, M. Liu, D. F. Gleich, and M. W. Mahoney, Flow-based algorithms for improving clusters: A unifying framework, software, and performance, arXiv:2004.09608, 2020.

[40] F. Bach, Learning with submodular functions: A convex optimization perspective, Foundations and Trends in Machine Learning, vol. 6, no. 2-3, pp. 145–373, 2013.

[41] K. Fountoulakis, F. Roosta-Khorasani, J. Shun, X. Cheng, and M. W. Mahoney, Variational perspective on local graph clustering, Mathematical Programming B, pp. 1–21, 2017.
[42] R. Andersen, F. Chung, and K. Lang, Local graph partitioning using PageRank vectors, in FOCS '06: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pp. 475–486, 2006.

[43] A. Beck, On the convergence of alternating minimization for convex programming with applications to iteratively reweighted least squares and decomposition schemes, SIAM Journal on Optimization, vol. 25, no. 1, pp. 185–209, 2015.

[44] D. Ghoshdastidar and A. Dukkipati, Consistency of spectral partitioning of uniform hypergraphs under planted partition model, in Advances in Neural Information Processing Systems (Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Q. Weinberger, eds.), vol. 27, Curran Associates, Inc., 2014.

[45] P. S. Chodrow, N. Veldt, and A. R. Benson, Generative hypergraph clustering: from blockmodels to modularity, 2021.

[46] J. Y. Zien, M. D. F. Schlag, and P. K. Chan, Multilevel spectral hypergraph partitioning with arbitrary vertex sizes, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 9, pp. 1389–1399, 1999.

[47] J. Ni, J. Li, and J. McAuley, Justifying recommendations using distantly-labeled reviews and fine-grained aspects, in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 188–197, 2019.