# Finding Dense Subgraphs via Low-Rank Bilinear Optimization

Dimitris S. Papailiopoulos (DIMITRIS@UTEXAS.EDU), Ioannis Mitliagkas (IOANNIS@UTEXAS.EDU), Alexandros G. Dimakis (DIMAKIS@AUSTIN.UTEXAS.EDU), Constantine Caramanis (CONSTANTINE@UTEXAS.EDU). The University of Texas at Austin.

Given a graph, the Densest k-Subgraph (DkS) problem asks for the subgraph on k vertices that contains the largest number of edges. In this work, we develop a new algorithm for DkS that searches a low-dimensional space for provably dense subgraphs. Our algorithm comes with novel performance bounds that depend on the graph spectrum. These graph-dependent bounds are surprisingly tight for real-world graphs, where we find subgraphs with density provably within 70% of the optimum. The guarantees are significantly tighter than the best available worst-case a priori bounds. Our algorithm runs in nearly linear time, under spectral assumptions satisfied by most graphs found in applications. Moreover, it is highly scalable and parallelizable. We demonstrate this by implementing it in MapReduce and executing numerous experiments on massive real-world graphs with up to billions of edges. We empirically show that our algorithm can find subgraphs of significantly higher density than the previous state of the art.

## 1. Introduction

Given a graph G on n vertices with m edges and a parameter k, we are interested in finding an induced subgraph on k vertices with the largest average degree, also known as the maximum density. This is the Densest k-Subgraph (DkS) problem, a fundamental problem in combinatorial optimization with applications in numerous fields including social sciences, communication networks, and biology (see, e.g., (Hu et al., 2005; Gibson et al., 2005; Dourisboure et al., 2007; Saha et al., 2010; Miller et al., 2010; Bahmani et al., 2012)). DkS is a notoriously hard problem.
(Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).)

It is NP-hard by reduction from MAXCLIQUE. Moreover, Khot showed in (Khot, 2004) that, under widely believed complexity-theoretic assumptions, DkS cannot be approximated within an arbitrary constant factor.¹ The best known approximation ratio was n^{1/3+ϵ} (for some small ϵ), due to (Feige et al., 2001). Recently, (Bhaskara et al., 2010) introduced an algorithm with approximation ratio n^{1/4+ϵ} that runs in time n^{O(1/ϵ)}. Such results, where the approximation factor scales as a polynomial in the number of vertices, are too pessimistic for real-world applications. This resistance to better approximations, despite the long history of the problem, suggests that DkS is probably very hard in the worst case.

Our Contributions. In this work we move beyond the worst-case framework. We present a novel DkS algorithm that has two key features: i) it comes with approximation guarantees that are surprisingly tight on real-world graphs, and ii) it is fully parallelizable and can scale up to graphs with billions of edges. Our algorithm combines spectral and combinatorial techniques; it relies on examining candidate subgraphs obtained from vectors lying in a low-dimensional subspace of the adjacency matrix of the graph. This is accomplished through a framework called the Spannogram, which we define below. Our approximation guarantees are graph-dependent: they are related to the spectrum of the adjacency matrix of the graph. Let opt denote the average degree (i.e., the density) of the densest k-subgraph, where 0 ≤ opt ≤ k − 1. Our algorithm takes as input the graph, the subgraph size k, and an accuracy parameter d ∈ {1, . . . , n}. The output is a subgraph on k vertices with density opt_d, for which we obtain the following approximation result:

Theorem 1.
For any unweighted graph, our algorithm outputs in time O(n^{d+2} log(n/δ)) a k-subgraph that has density

opt_d ≥ 0.5 (1 − δ) opt − 2 |λ_{d+1}|,

with probability 1 − 1/n, where λ_i is the i-th largest, in magnitude, eigenvalue of the adjacency matrix of the graph. If the graph is bipartite, or if the largest d eigenvalues of the graph are positive, then our algorithm runs in time O(n^{d+1} + T_d) and outputs a k-subgraph with density opt_d ≥ opt − 2 |λ_{d+1}|, where T_d is the time to compute the d leading eigenvectors of the adjacency matrix of the graph.

¹An approximation ratio ρ means that there exists an algorithm that produces in polynomial time a number A such that 1 ≤ opt/A ≤ ρ, where opt is the optimal density.

Our bounds come close to 2 + ϵ and 1 + ϵ factor approximations when λ_{d+1} is significantly smaller than the density of the densest k-subgraph. In the following theorem, we give such an example. However, we note that in the worst case our bounds might not yield anything meaningful.

Theorem 2. If the densest k-subgraph contains a constant fraction of all the edges, and k = Θ(√|E|), then we can approximate DkS within a factor of 2 + ϵ, in time n^{O(1/ϵ²)}. If additionally the graph is bipartite, we can approximate DkS within a factor of 1 + ϵ.

The above result is similar to the 1 + ϵ approximation ratio of (Arora et al., 1995) for dense graphs, where the densest k-subgraph contains a constant fraction of the Ω(n²) edges and k = Ω(n). The innovation here is that our ratio also applies to sparse graphs with a sublinear number of edges.

Computable upper bounds. In addition to these theoretical guarantees, our analysis allows us to obtain a graph-dependent upper bound on the optimal subgraph density. This is shown in Fig. 3 in our experimental section, where for many graphs our algorithm is provably within 70% of the upper bound on opt. These are far stronger guarantees than the best available a priori bounds.
This illustrates the potential power of graph-dependent guarantees, which, however, require the execution of an algorithm.

Nearly-linear time approximation. Our algorithm has a worst-case running time of O(n^{d+2} log(n/δ)). Under some mild spectral assumptions, a randomized version of our algorithm runs in nearly-linear time.

Theorem 3. Let the d largest eigenvalues of the graph be positive, and let the ratio of the d-th to the (d + 1)-st largest eigenvalue be a constant C. Then, we can modify our algorithm to output, with probability 1 − δ, a k-subgraph with density (1 − ϵ)² opt_d, in time O(m log n + n · ϵ^{−d} log(1/(ϵ δ))), where m is the number of edges.

We found that the above spectral condition holds for all d ≤ 5 in many real-world graphs that we tested.

Scalability. We develop two key scalability features that allow us to scale up efficiently on massive graphs.

Vertex sparsification: We introduce a pre-processing step that eliminates vertices that are unlikely to be part of the densest k-subgraph. The elimination is based on the vertices' weighted leverage scores (Mahoney & Drineas, 2009; Boutsidis et al., 2009) and admits a provable bound on the introduced error. We empirically found that, even with a negligible additional error, the elimination dramatically reduced problem sizes in all tested datasets.

MapReduce implementation: We show that our algorithm is fully parallelizable and tailor it for the MapReduce framework. We use our MapReduce implementation to run experiments on Amazon's Elastic MapReduce (EMR). In our large-scale experiments, we were able to scale out to thousands of mappers and reducers in parallel over 800 cores, and find large dense subgraphs in graphs with billions of edges.

### 1.1. Related work

DkS algorithms: One of the few positive results for DkS is a 1 + ϵ approximation for dense graphs where m = Ω(n²), in the linear subgraph setting k = Ω(n) (Arora et al., 1995). For some values of m = o(n²), a 2 + ϵ approximation was established by (Suzuki & Tokuyama, 2005).
Moreover, for any k = Ω(n), a constant-factor approximation is possible via a greedy approach by (Asahiro et al., 2000), or via semidefinite relaxations by (Srivastav & Wolf, 1998) and (Feige & Langberg, 2001). Recently, (Alon et al., 2013) established new approximation results for graphs with small ϵ-rank, using an approximate solver for low-rank perturbed versions of the adjacency matrix. There is a vast literature on algorithms for detecting communities and well-connected subgraphs: greedy schemes (Ravi et al., 1994), optimization approaches (Jethava et al., 2012; d'Aspremont et al., 2010; Ames, 2011), and the truncated power method (Yuan & Zhang, 2011). We compare with several of these algorithms in our evaluation section.

The Spannogram framework: We present an exact solver for bilinear optimization problems on matrices of constant rank, under {0, 1} and sparsity constraints on the variables. Our theory is a generalization of the Spannogram framework, originally introduced in the foundational work of (Karystinos & Liavas, 2010) and further developed in (Asteris et al., 2014; Papailiopoulos et al., 2013), that obtains exact solvers for low-rank quadratic optimization problems with combinatorial constraints, such as sparse PCA.

MapReduce algorithms for graphs: The design of MapReduce algorithms for massive graphs is an active research area as Hadoop becomes one of the standards for storing large data sets. The related work by Bahmani et al. (Bahmani et al., 2012) designs a novel MapReduce algorithm for the densest subgraph problem. This problem requires finding a subgraph of highest normalized density without enforcing a specific subgraph size k. Surprisingly, without a subgraph size restriction, the densest subgraph problem becomes polynomially solvable, and is therefore fundamentally different from what we consider in this paper.
## 2. Proposed Algorithm

The density of a subgraph indexed by a vertex set S ⊆ {1, . . . , n} is equal to the average degree of the vertices within S:

den(S) = (1/|S|) 1_S^T A 1_S,

where A is the adjacency matrix (A_{i,j} = 1 if (i, j) is an edge, else A_{i,j} = 0) and the indicator vector 1_S has 1s in the entries indexed by S and 0s otherwise. Observe that 1_S^T A 1_S = Σ_{i,j∈S} A_{i,j} is twice the number of edges in the subgraph with vertices in S. For a fixed subgraph size |S| = k, we can express DkS as a quadratic optimization:

DkS: opt = (1/k) max_{|S|=k} 1_S^T A 1_S,

where |S| = k denotes that the optimization variable is a k-vertex subset of {1, . . . , n}.

The bilinear relaxation of DkS. We approximate DkS via its bipartite version, which can be expressed as a bilinear maximization:

DBkS: opt^B = (1/k) max_{|X|=k} max_{|Y|=k} 1_X^T A 1_Y.

As the following lemma shows, the two problems are fundamentally related: a good solution for the bipartite version maps to a half as good solution for DkS. The proof is given in the Supplemental Material.

Lemma 1. A ρ-approximation algorithm for DBkS implies a 2ρ-approximation algorithm for DkS.

### 2.1. DkS through low-rank approximations

At the core of our approximation lies a constant-rank solver: we show that DBkS can be solved in polynomial time on constant-rank matrices. We solve constant-rank instances of DBkS instead of DkS due to an important implication: DkS is NP-hard even for rank-1 matrices with one negative eigenvalue, as we show in the Supplemental Material. The exact steps of our algorithm are given in the pseudocode tables referred to as Algorithms 1-3.² The output of our algorithm is a k-subgraph Z_d that has density opt_d and comes with provable guarantees. We present our theoretical guarantees in the next subsection.

²In the pseudocode of Algorithm 2, topk(v) denotes the indices of the k largest signed elements of v.
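To make the objective concrete, the following small sketch (our own illustration, not the paper's algorithm; function names are hypothetical) evaluates den(S) and solves DkS by brute force on a toy graph:

```python
from itertools import combinations

def density(A, S):
    # den(S) = (1/|S|) * 1_S^T A 1_S: twice the number of internal edges
    # divided by |S|, i.e., the average degree inside S
    return sum(A[i][j] for i in S for j in S) / len(S)

def densest_k_subgraph_bruteforce(A, k):
    # Exhaustive DkS: check all (n choose k) vertex subsets (exponential cost)
    n = len(A)
    return max(combinations(range(n), k), key=lambda S: density(A, S))

# Toy graph: a triangle {0, 1, 2} plus a pendant vertex 3 attached to vertex 0
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
S = densest_k_subgraph_bruteforce(A, 3)  # the triangle, with density 2
```

The brute-force search is only feasible for tiny graphs; the point of the paper is to avoid exactly this (n choose k) enumeration.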
Our main algorithmic innovation, the constant-rank solver for DBkS (Algorithms 2-3), is called many times: in lines 5, 8, and 15 of our general DkS approximation, shown as Algorithm 1. We describe its steps subsequently.

Algorithm 1 Low-rank approximations for DkS
 1: [V_d, Λ_d] = EVD(A, d)
 2: if G is bipartite then
 3:   B = bi-adjacency matrix of G
 4:   [V_d, Σ_d, U_d] = SVD(B, d)
 5:   {X_d, Y_d} = arg max_{|X|+|Y|=k} 1_X^T V_d Σ_d U_d^T 1_Y
 6:   Z_d = X_d ∪ Y_d
 7: else if the first d eigenvalues of A are positive then
 8:   {X_d, Y_d} = arg max_{|X|=|Y|=k} 1_X^T V_d Λ_d V_d^T 1_Y
 9:   Z_d = X_d
10: else
11:   for i = 1 : log(n/δ) do
12:     draw n fair coins and assign them to vertices
13:     L = vertices with heads; R = {1, . . . , n} \ L
14:     B_d^i = [V_d Λ_d V_d^T]_{L,R}
15:     {X_i, Y_i} = arg max_{|X|+|Y|=k} 1_X^T B_d^i 1_Y
16:   end for
17:   {X_d, Y_d} = arg max_i 1_{X_i}^T B_d^i 1_{Y_i}
18:   Z_d = X_d ∪ Y_d
19: end if
20: Output: Z_d

Constant-rank solver for DBkS. In the following, we present an exact solver for DBkS on constant-rank approximations of A. Our DkS algorithm makes a number of calls to the DBkS low-rank solver on slightly different (sometimes rectangular) matrices. The details of the general low-rank solver are in the Supplemental Material.

Step 1: Obtain A_d = Σ_{i=1}^d λ_i v_i v_i^T, the rank-d approximation of A. Here, λ_i is the i-th largest-in-magnitude eigenvalue and v_i the corresponding eigenvector.

Step 2: Use A_d to obtain O(n^d) candidate subgraphs. For any matrix A, we can solve DBkS by exhaustively checking all (n choose k)² pairs (X, Y) of k-subsets of vertices. Surprisingly, if we want to find the X, Y pairs that maximize 1_X^T A_d 1_Y, i.e., the bilinear problem on the rank-d matrix A_d, then we show that only O(n^d) candidate pairs need to be examined.

Step 3: Check all k-set pairs {X, Y} obtained in Step 2, and output the one with the largest density on the low-rank weighted adjacency matrix A_d.

In the next section, we derive the constant-rank solver using two key facts.
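Step 1 only needs the top few eigenpairs. As a minimal pure-Python illustration of this step (our own sketch; a real implementation would use a sparse eigensolver such as scipy.sparse.linalg.eigsh), power iteration recovers the leading eigenpair, from which the rank-1 term λ_1 v_1 v_1^T of A_d is formed:

```python
def leading_eigenpair(A, iters=200):
    # Power iteration: repeatedly apply A and normalize; for generic starting
    # vectors this converges to the largest-magnitude eigenvalue's eigenvector
    n = len(A)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v gives the eigenvalue estimate
    lam = sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))
    return lam, v

# Adjacency matrix of the complete graph K4: leading eigenvalue is n - 1 = 3
A = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
lam, v1 = leading_eigenpair(A)
# Rank-1 approximation term: A_1[i][j] = lam * v1[i] * v1[j]
A1 = [[lam * v1[i] * v1[j] for j in range(4)] for i in range(4)]
```

For d > 1 one would deflate (subtract λ_1 v_1 v_1^T) and repeat, or use a proper eigensolver; the sketch only shows where A_d comes from.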
First, for each fixed vertex set Y, we show that it is easy to find the optimal set X that maximizes 1_X^T A_d 1_Y for that Y. Since this turns out to be easy, the challenge is to bound the number of different vertex sets Y that we need to check. Do we need to exhaustively check all (n choose k) k-sets Y? We show that this question is equivalent to searching the span of the first d eigenvectors of A, and collecting in a set S_d the top-k coordinates of all vectors in that d-dimensional space. By modifying the Spannogram theory of (Karystinos & Liavas, 2010; Asteris et al., 2014), we show that this set has size O(n^d) and can be constructed in time O(n^{d+1}). This implies that DBkS can be solved in time O(n^{d+1}) on A_d.

Computational complexity. The worst-case time complexity of the constant-rank DBkS solver on A_d is O(T_d + n^{d+1}), where T_d is the time to compute the first d eigenvectors of A. Under conditions satisfied by many real-world graphs, we show that we can modify our algorithm to obtain a randomized one that succeeds with probability 1 − δ and is ϵ-far from the optimal rank-d solution, while its complexity reduces to nearly linear in the number of edges m of the graph G: O(m log n + n · ϵ^{−d} log(1/(ϵ δ))).

Algorithm 2 lowrank_DBkS(k, d, A)
1: [V_d, Λ_d] = EVD(A, d)
2: S_d = Spannogram(k, V_d)
3: {X_d, Y_d} = arg max_{|X|=k, Y∈S_d} 1_X^T V_d Λ_d V_d^T 1_Y
4: Output: {X_d, Y_d}

Spannogram(k, V_d)
1: S_d = {topk(v) : v ∈ span(v_1, . . . , v_d)}
2: Output: S_d

### 2.2. Approximation Guarantees

We approximate DBkS by finding a solution to the constant-rank problem

max_{|X|=k} max_{|Y|=k} 1_X^T A_d 1_Y.

We output a pair of vertex sets X_d, Y_d, which we refer to as the rank-d optimal solution, with density opt_d^B = (1/k) 1_{X_d}^T A 1_{Y_d}. Our approximation guarantees measure how far opt_d^B is from opt^B, the optimal density for DBkS.
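The first fact can be seen directly: for a fixed Y, the objective 1_X^T A_d 1_Y = Σ_{i∈X} (A_d 1_Y)_i, so the optimal X simply collects the k largest entries of A_d 1_Y. A small sketch (our own naming; A_d given as a dense matrix):

```python
def topk(v, k):
    # Indices of the k largest signed entries of v
    return sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:k]

def best_X_for_Y(Ad, Y, k):
    # For fixed Y, the inner maximization over X is solved by
    # X = topk(Ad 1_Y): pick the k rows with the most mass toward Y
    w = [sum(row[j] for j in Y) for row in Ad]
    return topk(w, k)

Ad = [[0, 2, 1],
      [2, 0, 0],
      [1, 0, 0]]
X = best_X_for_Y(Ad, Y=[1, 2], k=2)  # row 0 has weight 3 toward Y, so 0 is in X
```

This is why only the set of candidate Y supports, S_d, needs to be enumerated: the matching X is free.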
Our bounds capture a simple core idea: the loss in our approximation comes from solving the problem on A_d instead of on the full-rank matrix A. This loss is quantified in the next lemma. The detailed proofs of the following results are in the supplemental material.

Lemma 2. For any matrix A: opt_d^B ≥ opt^B − 2 |λ_{d+1}|, where λ_i is the i-th largest eigenvalue of A.

Using an appropriate pre-processing step and then running Algorithm 2 as a subroutine on a sub-sampled and low-rank version of A, we output a k-subgraph Z_d that has density opt_d. By essentially combining Lemmata 1 and 2 we obtain the following bounds.

Theorem 1. Algorithm 1 outputs in time O(n^{d+2} log(n/δ)) a k-subgraph with density opt_d = den(Z_d) ≥ 0.5 (1 − δ) opt − 2 |λ_{d+1}|, with probability 1 − 1/n, where λ_i is the i-th largest, in magnitude, eigenvalue of the adjacency matrix of the graph. If the graph is bipartite, or if the largest d eigenvalues of the graph are positive, then our algorithm runs in time O(n^{d+1} + T_d), and outputs a k-subgraph with density opt_d ≥ opt − 2 |λ_{d+1}|, where T_d is the time to compute the d leading eigenvectors of the adjacency matrix of the graph.

Using bounds on eigenvalues of graphs, Theorem 1 translates to the following approximation guarantees.

Theorem 2. If the densest k-subgraph contains a constant fraction of all the edges, and k = Θ(√|E|), then we can approximate DkS within a factor of 2 + ϵ, in time n^{O(1/ϵ²)}. If additionally the graph is bipartite, then we can approximate DkS within a factor of 1 + ϵ.

Remark 1. The above results are similar to the 1 + ϵ ratio of (Arora et al., 1995), which holds for graphs where the densest k-subgraph contains Ω(n²) edges.

Graph-dependent bounds. For any given graph, after running our constant-rank solver on A_d, we can compute an upper bound on the optimal density opt via bounds on opt^B, since it is easy to see that opt^B ≥ opt. Our graph-dependent bound is the minimum of three upper bounds on the unknown optimal density:

Lemma 3.
The optimal density of DkS can be bounded as

opt ≤ min { (1/k) 1_{X_d}^T A_d 1_{Y_d} + |λ_{d+1}|, k − 1, λ_1 }.

In our experimental section, we plot the above upper bounds and show that for most tested graphs our algorithm performs provably within 70% of the upper bound on the optimal density. These are far stronger guarantees than the best available a priori bounds.

## 3. The Spannogram Framework

In this section, we describe how our constant-rank solver operates by examining candidate vectors in a low-dimensional span of A. Here, we work on a rank-d matrix

A_d = v_1 u_1^T + . . . + v_d u_d^T,

where u_i = λ_i v_i, and we wish to solve:

max_{|X|=|Y|=k} 1_X^T (v_1 u_1^T + . . . + v_d u_d^T) 1_Y.   (1)

Observe that we can rewrite (1) in the following way:

max_{|X|=|Y|=k} 1_X^T (v_1 (u_1^T 1_Y) + . . . + v_d (u_d^T 1_Y)) = max_{|Y|=k} max_{|X|=k} 1_X^T v_Y,   (2)

where v_Y = c_1 v_1 + . . . + c_d v_d, with c_i = u_i^T 1_Y, is an n-dimensional vector generated by the d-dimensional subspace spanned by v_1, . . . , v_d.

We now make a key observation: for every fixed vector v_Y in (2), the index set X that maximizes 1_X^T v_Y can be easily computed. It is not hard to see that, for any fixed vector v_Y, the k-subset X that maximizes 1_X^T v_Y = Σ_{i∈X} [v_Y]_i corresponds to the set of the k largest signed coordinates of v_Y. That is, the locally optimal k-set is topk(v_Y).

We now wish to find all possible locally optimal sets X. If we could check all vectors v_Y, then we could find all locally optimal index sets topk(v_Y). Let us denote by S_d the set of all k-subsets X that are optimal solutions of the inner maximization of (2) for some vector v in the span of v_1, . . . , v_d:

S_d = {topk(c_1 v_1 + . . . + c_d v_d) : c_1, . . . , c_d ∈ R}.

Clearly, this set contains all possible locally optimal X sets of the form topk(v_Y). Therefore, we can rewrite DBkS on A_d as

max_{|Y|=k} max_{X∈S_d} 1_X^T A_d 1_Y.   (3)
The above problem can now be solved in the following way: for every set X ∈ S_d, find the locally optimal set Y that maximizes 1_X^T A_d 1_Y, which is topk(A_d 1_X). Then, we simply test all such (X, Y) pairs on A_d and keep the maximizer. Hence, solving DBkS on A_d is equivalent to constructing the set of k-supports S_d and then finding the optimal solution in that set. How large can S_d be, and can we construct it in polynomial time? Initially, one might expect that S_d could have size as large as (n choose k). Instead, we show that S_d is tremendously smaller, as in (Karystinos & Liavas, 2010) and (Asteris et al., 2014).

Lemma 4. The set S_d has size at most O(n^d) and can be built in time O(n^{d+1}) using Algorithm 2.

### 3.1. Constructing the set S_d

We build up to the general rank-d algorithm by explaining special cases that are easier to understand.

Rank-1 case. We start with the d = 1 case, where we have S_1 = {topk(c_1 v_1) : c_1 ∈ R}. It is not hard to see that there are only two supports to include in S_1: topk(v_1) and topk(−v_1). These two sets can be constructed in time O(n) via a partial sorting and selection algorithm (Cormen et al., 2001). Hence, S_1 has size 2 and can be constructed in time O(n).

Rank-2 case. This is the first non-trivial d, and it exhibits the details of the Spannogram algorithm. Let an auxiliary angle φ ∈ Φ = [0, π) and let c = [c_1, c_2]^T = [sin φ, cos φ]^T.³ Then, we re-express c_1 v_1 + c_2 v_2 in terms of φ as

v(φ) = sin φ · v_1 + cos φ · v_2.   (4)

This means that we can rewrite the set S_2 as

S_2 = {topk(±v(φ)) : φ ∈ [0, π)}.

Observe that each element of v(φ) is a continuous spectral curve in φ: [v(φ)]_i = [v_1]_i sin φ + [v_2]_i cos φ. Consequently, the top/bottom-k supports of v(φ) (i.e., topk(±v(φ))) are themselves a function of φ. How can we find all possible supports?

Figure 1. A rank d = 2 spannogram for n = 5 and two random vectors v_1, v_2: the five curves [v(φ)]_1, . . . , [v(φ)]_5 plotted over φ.
Observe that every two curves intersect in exactly one point. These intersection points define intervals in which a top-k set is invariant.

The Spannogram. In Fig. 1, we draw an example plot of five curves [v(φ)]_i, i = 1, . . . , 5, which we call a spannogram. From the spannogram in Fig. 1, we can see that the continuity of these sinusoidal curves implies a local invariance property of the top/bottom-k supports topk(±v(φ)) in a small neighborhood around a fixed φ. So, when does a top/bottom-k support change? The index sets topk(±v(φ)) change if and only if two curves cross, i.e., when the ordering of two elements [v(φ)]_i, [v(φ)]_j changes.

Finding all supports: There are n curves and each pair intersects at exactly one point in the Φ domain.⁴ Therefore, there are exactly (n choose 2) intersection points, and these define (n choose 2) + 1 intervals. Within an interval, the top/bottom-k supports topk(±v(φ)) remain the same. Hence, it is now clear that |S_2| ≤ 2 ((n choose 2) + 1) = O(n²).

A way to find all supports in S_2 is to compute the vectors v(φ_{i,j}) at the intersection points of every pair of curves i, j, and then the supports in the two intervals adjacent to each intersection point. The vector v(φ_{i,j}) at the intersection of curves i and j can be computed by first solving the linear system

[v(φ_{i,j})]_i = [v(φ_{i,j})]_j  ⇔  (e_i − e_j)^T [v_1 v_2] c_{i,j} = 0

for the unknown vector c_{i,j}, where e_i is the i-th column of the n × n identity matrix, i.e., c_{i,j} = nullspace((e_i − e_j)^T [v_1 v_2]). Then, we compute v(φ_{i,j}) = [v_1 v_2] c_{i,j}. Further details on breaking ties in topk(v(φ_{i,j})) can be found in the supplemental material.

³Observe that as we scan φ, the vectors ±c express all possible unit-norm vectors on the circle.

⁴Here we assume that the curves are in general position. This can always be ensured by infinitesimally perturbing the curves, as in (Papailiopoulos et al., 2013).
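A compact sketch of this rank-2 construction (our own Python rendering; tie-breaking at crossings and the interval-adjacent supports are simplified, which suffices for generic inputs):

```python
def topk(v, k):
    # Support of the k largest signed entries, as a hashable tuple
    return tuple(sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:k])

def spannogram_rank2(v1, v2, k):
    # Enumerate S_2: for each pair of curves (i, j), find a direction c on
    # which [v(phi)]_i = [v(phi)]_j, and collect top-k supports of +/- v there
    n = len(v1)
    S = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Solve (e_i - e_j)^T [v1 v2] c = 0: c is orthogonal to
            # the row (v1[i] - v1[j], v2[i] - v2[j])
            a, b = v1[i] - v1[j], v2[i] - v2[j]
            c1, c2 = -b, a  # nullspace direction of the 1x2 system
            v = [c1 * v1[t] + c2 * v2[t] for t in range(n)]
            S.add(topk(v, k))
            S.add(topk([-x for x in v], k))
    return S

S2 = spannogram_rank2([1.0, 0.5, -0.2], [0.1, 0.9, 0.3], k=1)
```

Each pair contributes O(1) supports, giving the O(n²) bound; a full implementation would also handle the ties at each crossing explicitly.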
Computational cost: We have (n choose 2) intersection points, and at each we calculate the top/bottom-k supports of v(φ_{i,j}). The top/bottom-k elements of each v(φ_{i,j}) can be computed in time O(n) using a partial sorting and selection algorithm (Cormen et al., 2001). Since we perform this routine O(n²) times, the total complexity of our rank-2 algorithm is O(n³).

General rank-d case. The algorithm generalizes to arbitrary dimension d, as we show in the supplemental material; its pseudocode is given as Algorithm 3.

Remark 2. Observe that each iteration of the loop under line 2 of Algorithm 3 can be computed in parallel. This allows us to parallelize the Spannogram.

Algorithm 3 Spannogram(k, V_d)
 1: S_d = ∅
 2: for all (i_1, . . . , i_d) ∈ {1, . . . , n}^d and s ∈ {−1, 1} do
 3:   c = s · nullspace([[V_d]_{i_1,:} − [V_d]_{i_2,:}; . . . ; [V_d]_{i_1,:} − [V_d]_{i_d,:}])
 4:   v = V_d c
 5:   S = topk(v)
 6:   T = S \ {i_1, . . . , i_d}
 7:   for all (k − |T|)-subsets J of {i_1, . . . , i_d} do
 8:     S_d = S_d ∪ {T ∪ J}
 9:   end for
10: end for
11: Output: S_d

### 3.2. An approximate S_d in nearly-linear time

In our exact solver, we solve DBkS on A_d in time O(n^{d+1}). Surprisingly, when A_d has only positive eigenvalues, we can tightly approximate DBkS on A_d in nearly linear time.

Theorem 3. Let the d largest eigenvalues of the graph be positive, and let the ratio of the d-th to the (d + 1)-st largest eigenvalue be a constant C. Then, we can output, with probability 1 − δ, a k-subgraph with density (1 − ϵ)² opt_d, in time O(m log n + n · ϵ^{−d} log(1/(ϵ δ))).

The main idea is that, instead of checking all O(n^d) possible k-sets in S_d, we can approximately solve the problem by randomly sampling M = O(ϵ^{−d} log(1/(ϵ δ))) vectors in the span of v_1, . . . , v_d. Our proof is based on the fact that we can approximate the surface of the d-dimensional sphere with M randomly sampled vectors from the span of v_1, . . . , v_d. This allows us to identify, with probability 1 − δ, near-optimal candidates in S_d.
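The sampling idea can be sketched as follows (our own Python rendering; `num_samples` stands in for the M = O(ϵ^{−d} log(1/(ϵδ))) bound, and Gaussian coefficients are drawn because their direction is uniform on the sphere):

```python
import random

def topk(v, k):
    return tuple(sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:k])

def approx_spannogram(V, k, num_samples, seed=0):
    # V: list of d eigenvectors (each of length n), assumed pre-scaled by the
    # square roots of their eigenvalues, as in the Lambda^{1/2} weighting.
    # Sample random directions c and keep the supports topk(+/- V c).
    rng = random.Random(seed)
    n, d = len(V[0]), len(V)
    S = set()
    for _ in range(num_samples):
        c = [rng.gauss(0.0, 1.0) for _ in range(d)]
        v = [sum(c[j] * V[j][t] for j in range(d)) for t in range(n)]
        S.add(topk(v, k))
        S.add(topk([-x for x in v], k))
    return S

candidates = approx_spannogram([[1.0, 0.0, 0.0, 0.0],
                                [0.0, 1.0, 0.0, 0.0]], k=2, num_samples=50)
```

Each sample costs O(n d) plus an O(n) selection, which is where the nearly-linear total running time comes from.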
The modified algorithm is very simple and is given below; its analysis can be found in the supplemental material.

Algorithm 4 Spannogram_approx(k, V_d, Λ_d)
1: for i = 1 : O(ϵ^{−d} log(1/(ϵ δ))) do
2:   v = (Λ_d^{1/2} V_d^T)^T · randn(d, 1)
3:   S_d = S_d ∪ topk(v) ∪ topk(−v)
4: end for
5: Output: S_d

## 4. Scaling up

In this section, we present the two key scalability features that allow us to scale up to graphs with billions of edges.

### 4.1. Vertex Sparsification

We introduce a very simple and efficient pre-processing step for discarding vertices that are unlikely to appear in a top-k set in S_d. This step runs after we compute A_d and uses the weighted leverage score of the i-th vertex,

ℓ_i = Σ_{j=1}^d [V_d]_{i,j}² |λ_j|,

to decide whether to discard it. We show in the supplemental material that, by appropriately setting a threshold, we can guarantee a provable bound on the error introduced. In our experimental results, the above elimination is able to reduce n to approximately n̂ ≈ 10 k for a provably small additive error, even for data sets where n = 10^8.

### 4.2. MapReduce Implementation

A MapReduce implementation allows scaling out to a large number of compute nodes that can work in parallel. The reader can refer to (Meng & Mahoney, 2013; Bahmani et al., 2012) for a comprehensive treatment of the MapReduce paradigm. In short, the Hadoop/MapReduce infrastructure stores the input graph as a distributed file spread across multiple machines; it provides a tuple-streaming abstraction, where each map and reduce function receives and emits tuples as (key, value) pairs. The role of the keys is to ensure information aggregation: all tuples with the same key are processed by the same reducer. For the spectral decomposition step of our scheme, we design a simple implementation of the power method in MapReduce. The details are beyond the scope of this work; high-performance implementations are already available in the literature, e.g., (Lin & Schatz, 2010).
We instead focus on the novel implementation of the Spannogram. Our MapReduce implementation of the rank-2 Spannogram is outlined in Algorithm 5. The Mapper is responsible for the duplication and dissemination of the eigenvectors, V_2 and U_2 = V_2 Λ_2, to all reducers. Line 3 emits the j-th row of V_2 and U_2 once for every node i. Since i is used as the key, this ensures that every reducer receives V_2 and U_2 in their entirety. From the breakdown of the Spannogram in Section 3, it is understood that, for the rank-2 case, it suffices to solve a simple system of equations for every pair of nodes. The Reducer for node i receives the full eigenvectors V_2, U_2 and is responsible for solving the problem for every pair (i, j) with j > i. Then, Line 6 emits the best candidate computed at Reducer i. A trivial final step, not outlined here, collects all n² candidate sets and keeps the best one as the final solution.

The basic outline in Algorithm 5 comes with heavy communication needs and was chosen here for ease of exposition. The more efficient version that we implement does not replicate V_2, U_2 n times. Instead, the number of reducers, say R = n^α, is fine-tuned to the capabilities of the cluster. The mappers emit V_2, U_2 R times, once for every reducer. Then, reducer r is responsible for solving for node pairs (i, j), where i ≡ r (mod R) and j > i. Depending on the performance bottleneck, different choices of α are more appropriate. We divide the construction of the O(n²) candidate sets in S_2 over O(n^α) reducers, and each of them computes O(n^{2−α}) candidate subgraphs. The total communication cost for this parallelization scheme is O(n^{1+α}): n^α reducers need access to the entire V_2, U_2, which has 2 · 2 · n entries. Moreover, the total computation cost for each reducer is O(n^{3−α}).
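The pair-partitioning scheme can be sketched as follows (our own illustration; `R` plays the role of n^α reducers, with reducer r owning the pairs whose smaller index is congruent to r mod R):

```python
def partition_pairs(n, R):
    # Assign pair (i, j), j > i, to reducer r = i mod R, mirroring the
    # i = r (mod R) scheme: every pair lands on exactly one reducer,
    # and each reducer handles O(n^2 / R) pairs
    buckets = {r: [] for r in range(R)}
    for i in range(n):
        for j in range(i + 1, n):
            buckets[i % R].append((i, j))
    return buckets

buckets = partition_pairs(6, 3)
total = sum(len(p) for p in buckets.values())  # all 15 pairs, each exactly once
```

Because the mapping depends only on i, a reducer can stream through its pairs without coordination, which is what makes the Spannogram embarrassingly parallel at this level.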
Algorithm 5 Spannogram_MR(V_2, U_2)
1: Map({[V_2]_{j,:}, [U_2]_{j,:}, j}):
2: for i = 1 : n do
3:   emit: (i, {[V_2]_{j,1}, [V_2]_{j,2}, [U_2]_{j,1}, [U_2]_{j,2}, j})
4: end for

1: Reduce_i((i, {[V_2]_{j,1}, [V_2]_{j,2}, [U_2]_{j,1}, [U_2]_{j,2}, j}), for all j):
2: for each j ≥ i + 1 do
3:   c = nullspace([V_2]_{i,:} − [V_2]_{j,:})
4:   [den_j, {X_j, Y_j}] = max_{|Y|=k, X∈topk(±V_2 c)} 1_X^T V_2 U_2^T 1_Y
5: end for
6: emit: (i, {X_i, Y_i}) = arg max_j 1_{X_j}^T V_2 U_2^T 1_{Y_j}

## 5. Experimental Evaluation

We experimentally evaluate the performance of our algorithm and compare it to the truncated power method (TPower) of (Yuan & Zhang, 2011), a greedy algorithm by (Feige et al., 2001) (GFeige), and another greedy algorithm by (Ravi et al., 1994) (GRavi). We performed experiments on synthetic dense subgraphs and also on massive real graphs from multiple sources. In all experiments, we compare the density of the subgraph obtained by the Spannogram to the density of the output subgraphs of the other algorithms. Our experiments illustrate three key points: (1) for all tested graphs, our method outperforms, sometimes significantly, all other algorithms compared; (2) our method is highly scalable, allowing us to solve far larger problem instances; (3) our data-dependent upper bound in many cases provides a certificate of near-optimality, far more accurate and useful than what a priori bounds are able to do.

Figure 2. Planted clique experiments for random graphs. (a) Densities of the recovered subgraph vs. the expected number of edges, for GFeige, GRavi, TPower, and the Spannogram. (b) Running times of the Spannogram and power iteration (two top eigenvectors) on 2.5 billion edges, as a function of the number of cores.

Planted clique. We first consider the so-called (and now much studied) Planted Clique problem: we seek to find a clique of size k that has been planted in a graph where all other edges are drawn independently with probability 1/2. We scale our randomized experiments from n = 100 up to n = 10^5.
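For reference, planted-clique instances of this kind can be generated as in the following sketch (our own code: a G(n, 1/2) background graph with a clique forced on k random vertices):

```python
import random

def planted_clique_graph(n, k, seed=0):
    # Background edges appear independently with probability 1/2;
    # all edges inside the planted clique are present
    rng = random.Random(seed)
    clique = set(rng.sample(range(n), k))
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if (i in clique and j in clique) or rng.random() < 0.5:
                A[i][j] = A[j][i] = 1
    return A, clique

A, clique = planted_clique_graph(60, 12)
```

Recovery is then judged by whether an algorithm's output k-set matches the hidden `clique`.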
In all cases we set the size of the clique to k = 3√n, close to what is believed to be the critical computability threshold. In all our experiments, GRavi, TPower, and the Spannogram successfully recovered the hidden clique. However, as can be seen in Fig. 2, the Spannogram algorithm is the only one able to scale up to n = 10^5, a massive dense graph with about 2.5 billion edges. The reason is that this graph does not fit in the main memory of one machine, which caused all centralized algorithms to crash after several hours. Our MapReduce implementation scales out smoothly, since it splits the problem into multiple smaller problems solved in parallel. Specifically, we used the Amazon Web Services Elastic MapReduce framework (aws). We implemented our map and reduce functions in Python using the MRJob class (mrj). For our biggest experiments we used a cluster of 100 m1.xlarge AWS instances (a total of 800 cores). The running times of our experiments over MapReduce are shown in Fig. 2(b). The main bottleneck is the computation of the first two eigenvectors, which is performed by repeating the power iteration for a few (typically 4) iterations. This step is not the emphasis of this work and has not been optimized. The Spannogram algorithm itself is significantly faster, and since it is CPU intensive the benefits of parallelization are clear. In principle, the other algorithms could also be implemented over MapReduce, but that requires non-trivial distributed algorithm design. As is well known, e.g., (Meng & Mahoney, 2013), implementing iterative machine learning algorithms over MapReduce can be a significant task, and schemes that perform worse in standard metrics can be highly preferable for this parallel framework. Careful MapReduce algorithmic design is needed especially for dense graphs like the one in the hidden clique problem.

Figure 3. Subgraph density vs. subgraph size (k). We compare our DkS Spannogram algorithm with the algorithms from (Feige et al., 2001) (GFeige), (Ravi et al., 1994) (GRavi), and (Yuan & Zhang, 2011) (TPower). Across all subgraph sizes k, we obtain higher subgraph densities using Spannograms of rank d = 2 or 5. We also obtain a provable data-dependent upper bound (solid black line) on the objective. This proves that for these data sets our algorithm is typically within 80% of optimality, for all sizes up to k = 250; indeed, for small subgraph sizes we find a clique, which is clearly optimal. Further experiments on multiple other data sets are shown in the supplemental material.

Real Datasets. Next, we demonstrate our method's performance on real datasets and also illustrate the power of our data-dependent bounds. We run experiments on large graphs from different applications; our findings are presented in Fig. 3. The figure compares the density achieved by the Spannogram algorithm for ranks 1, 2, and 5 to the performance of GFeige, GRavi, and TPower. It shows that the rank-2 and rank-5 versions of our algorithm improve, sometimes significantly, over the other techniques. Our novel data-dependent upper bound shows that our results on these data sets are provably near-optimal. The experiments are performed on two community graphs (com-LiveJournal and com-DBLP), a web graph (web-NotreDame), and a subset of the Facebook graph. A larger set of experiments is included in the supplemental material. Note that the largest graph in Figure 3 contains no more than 35 million edges; these cases fit in the main memory of a single machine, and the running times, all measured on a standard MacBook Pro laptop using Matlab, are presented in the supplemental material. In summary, rank-2 took less than one second on all these graphs, while prior methods took approximately the same time, up to a few seconds.
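The eigenvector computation discussed above, the main bottleneck of the pipeline, amounts to a few steps of block power (orthogonal) iteration for the two leading eigenvectors. The following is a minimal centralized sketch, assuming a dense symmetric adjacency matrix; the name `top2_eigvecs` is hypothetical and the small iteration budget mirrors the few (typically 4) rounds used over MapReduce.

```python
import numpy as np

def top2_eigvecs(A, iters=4, seed=0):
    """Block power iteration sketch (hypothetical name) for the two
    leading eigenvectors of a symmetric matrix A."""
    rng = np.random.default_rng(seed)
    Q = np.linalg.qr(rng.standard_normal((A.shape[0], 2)))[0]
    for _ in range(iters):
        Q = np.linalg.qr(A @ Q)[0]      # multiply, then re-orthonormalize
    lam = np.diag(Q.T @ A @ Q)          # Rayleigh-quotient eigenvalue estimates
    return lam, Q
```

In the distributed setting, the matrix-vector products A @ Q are the part that is sharded across machines; the 2-column QR step is cheap and can run on a single node.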
Rank-1 was significantly faster than all other methods on all tested graphs and took fractions of a second. Rank-5 took up to 1000 seconds for the largest graph (LiveJournal). We conclude that our algorithm is an efficient option for finding dense subgraphs. Different rank choices give a tradeoff between accuracy and running time, while the parallel nature allows scalability when needed. Further, our theoretical upper bound can be useful for practitioners investigating dense structures in large graphs.

6. Acknowledgments

The authors would like to acknowledge support from NSF grants CCF 1344364, CCF 1344179, DARPA XDATA, and research gifts by Google, Docomo and Microsoft.

References

Amazon Web Services, Elastic MapReduce. URL http://aws.amazon.com/elasticmapreduce/.

MRJob. URL http://pythonhosted.org/mrjob/.

Alon, Noga, Lee, Troy, Shraibman, Adi, and Vempala, Santosh. The approximate rank of a matrix and its algorithmic applications. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing, pp. 675-684. ACM, 2013.

Ames, Brendan P. W. Convex relaxation for the planted clique, biclique, and clustering problems. PhD thesis, University of Waterloo, 2011.

Arora, Sanjeev, Karger, David, and Karpinski, Marek. Polynomial time approximation schemes for dense instances of NP-hard problems. In STOC, 1995.

Asahiro, Yuichi, Iwama, Kazuo, Tamaki, Hisao, and Tokuyama, Takeshi. Greedily finding a dense subgraph. Journal of Algorithms, 34(2):203-221, 2000.

Asteris, Megasthenis, Papailiopoulos, Dimitris S., and Karystinos, George N. The sparse principal component of a constant-rank matrix. IEEE Trans. IT, 60(4):2281-2290, 2014.

Bahmani, Bahman, Kumar, Ravi, and Vassilvitskii, Sergei. Densest subgraph in streaming and MapReduce. Proceedings of the VLDB Endowment, 5(5):454-465, 2012.

Bhaskara, Aditya, Charikar, Moses, Chlamtac, Eden, Feige, Uriel, and Vijayaraghavan, Aravindan.
Detecting high log-densities: an O(n^{1/4}) approximation for densest k-subgraph. In STOC, 2010.

Boutsidis, Christos, Mahoney, Michael W., and Drineas, Petros. An improved approximation algorithm for the column subset selection problem. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 968-977. Society for Industrial and Applied Mathematics, 2009.

Cormen, Thomas H., Leiserson, Charles E., Rivest, Ronald L., and Stein, Clifford. Introduction to Algorithms. MIT Press, 2001.

d'Aspremont, Alexandre et al. Weak recovery conditions using graph partitioning bounds. 2010.

Dourisboure, Yon, Geraci, Filippo, and Pellegrini, Marco. Extraction and classification of dense communities in the web. In WWW, 2007.

Feige, Uriel and Langberg, Michael. Approximation algorithms for maximization problems arising in graph partitioning. Journal of Algorithms, 41(2):174-211, 2001.

Feige, Uriel, Peleg, David, and Kortsarz, Guy. The dense k-subgraph problem. Algorithmica, 29(3):410-421, 2001.

Gibson, David, Kumar, Ravi, and Tomkins, Andrew. Discovering large dense subgraphs in massive graphs. In PVLDB, 2005.

Hu, Haiyan, Yan, Xifeng, Huang, Yu, Han, Jiawei, and Zhou, Xianghong Jasmine. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics, 21(suppl 1):i213-i221, 2005.

Jethava, Vinay, Martinsson, Anders, Bhattacharyya, Chiranjib, and Dubhashi, Devdatt. The Lovász theta function, SVMs and finding large dense subgraphs. In NIPS, 2012.

Karystinos, George N. and Liavas, Athanasios P. Efficient computation of the binary vector that maximizes a rank-deficient quadratic form. IEEE Trans. IT, 56(7):3581-3593, 2010.

Khot, Subhash. Ruling out PTAS for graph min-bisection, densest subgraph and bipartite clique. In FOCS, 2004.

Lin, Jimmy and Schatz, Michael. Design patterns for efficient graph algorithms in MapReduce. In Proceedings of the Eighth Workshop on Mining and Learning with Graphs, pp. 78-85. ACM, 2010.
Mahoney, Michael W. and Drineas, Petros. CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences, 106(3):697-702, 2009.

Meng, Xiangrui and Mahoney, Michael. Robust regression on MapReduce. In Proceedings of The 30th International Conference on Machine Learning, pp. 888-896, 2013.

Miller, B., Bliss, N., and Wolfe, P. Subgraph detection using eigenvector L1 norms. In NIPS, 2010.

Papailiopoulos, Dimitris, Dimakis, Alexandros, and Korokythakis, Stavros. Sparse PCA through low-rank approximations. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 747-755, 2013.

Ravi, Sekharipuram S., Rosenkrantz, Daniel J., and Tayi, Giri K. Heuristic and special case algorithms for dispersion problems. Operations Research, 42(2):299-310, 1994.

Saha, Barna, Hoch, Allison, Khuller, Samir, Raschid, Louiqa, and Zhang, Xiao-Ning. Dense subgraphs with restrictions and applications to gene annotation graphs. In Research in Computational Molecular Biology, pp. 456-472. Springer, 2010.

Srivastav, Anand and Wolf, Katja. Finding dense subgraphs with semidefinite programming. Springer, 1998.

Suzuki, Akiko and Tokuyama, Takeshi. Dense subgraph problems with output-density conditions. In Algorithms and Computation, pp. 266-276. Springer, 2005.

Yuan, Xiao-Tong and Zhang, Tong. Truncated power method for sparse eigenvalue problems. arXiv preprint arXiv:1112.2679, 2011.