Community Detection Using Time-Dependent Personalized Page Rank

Haim Avron (HAIMAV@US.IBM.COM), IBM T. J. Watson Research Center, Yorktown Heights, NY 10598
Lior Horesh (LHORESH@US.IBM.COM), IBM T. J. Watson Research Center, Yorktown Heights, NY 10598

Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP volume 37. Copyright 2015 by the author(s).

Abstract

Local graph diffusions have proven to be valuable tools for solving various graph clustering problems. As such, there has been much interest recently in efficient local algorithms for computing them. We present an efficient local algorithm for approximating a graph diffusion that generalizes both the celebrated personalized Page Rank and its recent competitor/companion, the heat kernel. Our algorithm is based on writing the diffusion vector as the solution of an initial value problem, and then using a waveform relaxation approach to approximate the solution. Our experimental results suggest that it produces rankings that are distinct from and competitive with the ones produced by high quality implementations of personalized Page Rank and localized heat kernel, and that our algorithm is a useful addition to the toolset of localized graph diffusions.

1. Introduction

In community detection problems (i.e., graph clustering problems), one seeks to identify sets of nodes in a graph that are both internally cohesive and well separated from the rest of the graph. Such sets are then referred to as communities or clusters. In one important variant, the goal is to build a community around a given seed node or set of seed nodes. That is, the algorithm is given, as an input, a node (or nodes) in the graph, and the goal is to find a cluster in which it is a member.

One popular technique for identifying communities using seed nodes is to use local graph diffusions (Andersen et al., 2006; Chung, 2007). The general framework is as follows. Using the seed, a diffusion vector is computed. The diffusion vector is reweighted based on the degrees, and the nodes are sorted according to their magnitude in the reweighted diffusion vector. The community is found by making a sweep over the nodes according to their rank, selecting the prefix that minimizes (or maximizes) some scoring function. If the input vector is sparse, most of the entries in the diffusion vector tend to be tiny, in which case a sparse approximation is viable (the exact diffusion vector is generally dense), and this allows algorithms that work even on massive graphs. Algorithms that find sparse approximations to diffusion vectors are called local graph diffusion algorithms.

One widely used scoring function is conductance. The conductance of a subset of nodes $S$ is given by
$$\phi(S) \equiv \frac{|\partial S|}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}\,,$$
where $\bar{S}$ is the complementary set of nodes, $|\partial S|$ is the number of edges with one end-point in $S$ and the other in $\bar{S}$, and $\mathrm{vol}(S)$ is the sum of degrees of nodes in $S$. Selecting sets with low conductance tends to produce high quality clusters (Yang & Leskovec, 2015), and the performance of the aforementioned algorithm with respect to minimizing the conductance can be rigorously analyzed for certain diffusions (Andersen et al., 2006; Chung, 2009). However, we stress that other scoring functions have been applied successfully (Yang & Leskovec, 2015).
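To make the framework concrete, the following is a minimal Python sketch of the conductance computation and the sweep procedure (the function names and the use of SciPy sparse matrices are ours; production implementations update the cut and volumes incrementally rather than recomputing $\phi$ for every prefix, and sweep only over nodes with nonzero diffusion values):

```python
import numpy as np
import scipy.sparse as sp

def conductance(adj, S):
    """phi(S) = cut(S) / min(vol(S), vol(S-bar)) for a set S of node indices.
    adj: symmetric scipy.sparse adjacency matrix (unweighted, no self-loops)."""
    S = np.asarray(sorted(S))
    degrees = np.asarray(adj.sum(axis=1)).ravel()
    vol_S = degrees[S].sum()
    vol_rest = degrees.sum() - vol_S
    internal = adj[S][:, S].sum()      # counts each internal edge twice
    cut = vol_S - internal             # edges leaving S
    return cut / min(vol_S, vol_rest)

def sweep_cut(adj, diffusion):
    """Rank nodes by degree-reweighted diffusion value and return the
    prefix with minimum conductance (the sweep described above)."""
    degrees = np.asarray(adj.sum(axis=1)).ravel()
    order = np.argsort(-diffusion / degrees)
    best_set, best_phi = None, np.inf
    for k in range(1, len(order)):
        phi = conductance(adj, order[:k])
        if phi < best_phi:
            best_set, best_phi = order[:k], phi
    return best_set, best_phi
```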
The classical graph diffusion vector is the Page Rank vector (Brin & Page, 1998). Assume, without loss of generality, that the set of nodes is $\{1, \dots, n\}$. Let $A \in \mathbb{R}^{n \times n}$ be the adjacency matrix of the graph, and let $D \in \mathbb{R}^{n \times n}$ be a diagonal matrix with the degrees on the diagonal. Let $P \equiv AD^{-1}$ be the random walk transition matrix. The Page Rank vector is equal to
$$p = (1 - \alpha)(I_n - \alpha P)^{-1} s$$
where $\alpha \in (0, 1)$ and $s \in \mathbb{R}^n$ are parameters. The vector $s$ is typically referred to as the teleportation vector. The Page Rank vector $p$ can be interpreted as the stationary distribution of a random walk with restart, and there are also other equivalent formulations. When $s$ is sparse this vector is often referred to as a personalized Page Rank vector. Andersen et al. (2006) described an efficient local algorithm for personalized Page Rank.

Recently, the heat kernel diffusion vector (Chung, 2007) has received attention in the literature. The heat kernel vector is equal to
$$h = \exp\{-\gamma(I_n - P)\} s$$
where $\gamma > 0$ is some parameter. Kloster & Gleich (2014) recently described an efficient local algorithm for the heat kernel, and experimentally showed that the heat kernel tends to produce smaller and more realistic communities than the ones produced using the Page Rank diffusion, with only a modest increase in conductance.

In this paper we consider a special case of the dynamical Page Rank diffusion recently introduced by Gleich & Rossi (2014). Fix parameters $\gamma > 0$ and $\alpha \in [0, 1]$. The diffusion vector $x$, which we refer to as the time-dependent Page Rank vector, is equal to $x(\gamma)$ where $x(\cdot)$ is the solution to the initial value problem
$$x'(t) = (1 - \alpha)s - (I_n - \alpha P)x(t), \quad x(0) = s, \quad t \in [0, \gamma]\,. \tag{1}$$
It can be shown that (Gleich & Rossi, 2014)
$$x = (1 - \alpha)(I_n - \alpha P)^{-1}s + \exp\{-\gamma(I_n - \alpha P)\}\left(s - (1 - \alpha)(I_n - \alpha P)^{-1}s\right)\,. \tag{2}$$

Time-dependent Page Rank is a generalization of both Page Rank and heat kernel. If we fix $\alpha$ and let $\gamma$ go to infinity, the time-dependent Page Rank vector $x$ converges to the Page Rank vector $p$. If we set $\alpha = 1.0$, the time-dependent Page Rank vector $x$ is exactly equal to the heat kernel vector $h$. Consequently, one can hope that certain non-degenerate combinations of $\alpha$ and $\gamma$ (perhaps ones that are dependent on the application and/or the graph) will produce better diffusion vectors for downstream use.
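For small graphs, formula (2) can be evaluated directly, which is useful as a reference when validating localized approximations. A minimal sketch (the function name is ours; this dense evaluation is exactly what the local algorithm of Section 3 avoids):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import expm_multiply, spsolve

def tppr_dense(A, s, alpha, gamma):
    """Reference (non-local) evaluation of the time-dependent Page Rank
    vector x(gamma) via formula (2). Suitable only for small graphs."""
    n = A.shape[0]
    d = np.asarray(A.sum(axis=1)).ravel()
    P = A @ sp.diags(1.0 / d)              # random walk matrix P = A D^{-1}
    M = sp.eye(n) - alpha * P              # I_n - alpha P
    if alpha == 1.0:
        p = np.zeros(n)                    # the Page Rank term vanishes
    else:
        p = (1 - alpha) * spsolve(sp.csc_matrix(M), s)
    return p + expm_multiply(-gamma * M, s - p)

# Toy usage: alpha = 1 reproduces the heat kernel vector, and large gamma
# approaches the personalized Page Rank vector.
A = sp.csr_matrix(np.array([[0, 1, 1, 0], [1, 0, 1, 0],
                            [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float))
s = np.array([1.0, 0.0, 0.0, 0.0])
print(tppr_dense(A, s, alpha=0.85, gamma=5.0))
```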
The main contribution of this paper is an efficient local algorithm for approximating the time-dependent personalized Page Rank vector $x$. As such, our algorithm can also be viewed as a new algorithm for both the heat kernel diffusion and the personalized Page Rank diffusion. The proposed algorithm is deterministic and simple (we give a detailed pseudo-code in the supplementary material). Our algorithm tends to access the out-links much less than existing algorithms for personalized heat kernel and Page Rank, and as such is more suitable than those algorithms when out-link access is somewhat expensive.

We experimentally compare communities produced using our algorithm with communities produced using the heat kernel and using Page Rank. Our algorithm tends to generate smaller communities than both heat kernel and Page Rank, with roughly the same conductance. In our experimental setup, it had similar performance to heat kernel in detecting ground-truth communities. An open-source implementation of the algorithm is available through the libSkylark library (http://xdata-skylark.github.io/libskylark/).

2. Preliminaries

2.1. Notation

We denote scalars using Greek letters or using $t, s, \dots$. Vectors are denoted by $x, y, \dots$ and matrices by $A, B, \dots$. We use the convention that vectors are column-vectors. We denote vector-valued functions from $[0, \gamma]$ (or any other interval implied by the text) to $\mathbb{R}^n$ by $x(\cdot), y(\cdot), \dots$, while $x(t), y(t)$ denote their evaluation (which is a vector in $\mathbb{R}^n$) at a specific point $t$. Accordingly, $x(\cdot), y(\cdot), \dots$ also denote scalar-valued functions, and $x(t), y(t), \dots$ their evaluation at $t$. For a vector (or vector-valued function), a lower case letter with an index $i$ denotes the value of coordinate $i$ of the vector (resp. vector-valued function). For example, $x_i$ (resp. $x_i(\cdot)$) denotes the $i$th entry of $x$ (resp. $x(\cdot)$). Note that $x_i$ is different from $\mathbf{x}_i$; the former is a scalar, while the latter is a vector indexed by $i$.

2.2. Diffusion Coefficients

Since both Page Rank and heat kernel are special cases of time-dependent Page Rank, any rigorous theoretical guarantee on either one can be mimicked using the time-dependent Page Rank. Consequently, one can hope that some non-degenerate combinations of $\alpha$ and $\gamma$ will offer better results. While there is no rigorous analysis supporting this assessment, inspection of diffusion coefficients can provide insight into possibly favorable behaviors of time-dependent Page Rank.

Graph diffusion vectors, like Page Rank and heat kernel, can be written as an infinite series
$$\sum_{k=0}^{\infty} \alpha_k P^k s$$
where $\sum_{k=0}^{\infty} \alpha_k = 1$. The terms $\alpha_k$ are the diffusion coefficients, which serve as weights to the distribution of random walks of the corresponding lengths. For Page Rank, the diffusion expansion is
$$p = (1 - \alpha)\sum_{k=0}^{\infty} \alpha^k P^k s\,,$$
and for the heat kernel it is
$$h = e^{-\gamma}\sum_{k=0}^{\infty} \frac{\gamma^k}{k!} P^k s\,.$$
Based on (2), it is easy to show that for time-dependent Page Rank
$$\alpha_k = (1 - \alpha)\alpha^k \Big(1 - e^{-\gamma}\sum_{j=0}^{k} \frac{\gamma^j}{j!}\Big) + e^{-\gamma}\frac{(\gamma\alpha)^k}{k!}\,.$$

The diffusion coefficients of Page Rank decay at a fixed rate. Thus, if the decay for low indices is slow (i.e., short paths have roughly the same coefficient), the decay is very slow overall, and excessively long paths will have rather high coefficients, thereby encouraging large communities. Slow decay also poses an algorithmic challenge, since the algorithm needs to allocate significant weights to very long paths. If we reduce $\alpha$ so that long paths have very small coefficients, we run the risk of having a strong decay for the initial coefficients, perhaps resulting in unduly small communities. As for the heat kernel, while the coefficients eventually decay to zero very quickly (since $k!$ grows much faster than $\gamma^k$), there is an initial phase in which they actually grow (this is because $k!$ is independent of $\gamma$, and thus $\gamma^k$ initially grows much faster than $k!$). Thus, the coefficients of the heat kernel display a hump at the lower indices, which might cause short paths to get very small weight.

Inspecting the diffusion coefficients of time-dependent Page Rank, we see that for small indices the behavior is quite similar to that of Page Rank (as long as $\alpha$ is small enough), but for large $k$ the decay is similar to that of the heat kernel. Thus, by carefully choosing $\alpha$ and $\gamma$, we can entertain a slow initial decay, followed by very rapid decay beyond some point. This is illustrated in Figure 1 (left). Figure 1 (right) reports the cumulative sum of the diffusion coefficients as the index increases. Once the cumulative sum gets very close to 1.0, the weights of the remaining long paths are too small for them to affect the final diffusion vector. Thus, when the cumulative sum quickly approaches 1.0 there is more hope for very fast algorithms.

Figure 1. Diffusion coefficients ($\alpha_k$'s) for different diffusions: Page Rank ($\alpha = 0.99$), Page Rank ($\alpha = 0.85$), heat kernel ($\gamma = 5.0$), and time-dependent Page Rank ($\alpha = 0.85$, $\gamma = 5.0$), for indices 0 to 30. The left graph plots the coefficients themselves, while the right graph shows a cumulative sum of the coefficients.
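The three families of coefficients are easy to tabulate. The following sketch (function names are ours; the time-dependent formula is the expansion of (2) given above) reproduces the qualitative picture of Figure 1:

```python
import numpy as np
from math import exp, factorial

def ppr_coeffs(alpha, K):
    """Page Rank coefficients: a_k = (1 - alpha) alpha^k."""
    return np.array([(1 - alpha) * alpha**k for k in range(K)])

def hk_coeffs(gamma, K):
    """Heat kernel coefficients: a_k = e^{-gamma} gamma^k / k!."""
    return np.array([exp(-gamma) * gamma**k / factorial(k) for k in range(K)])

def tppr_coeffs(alpha, gamma, K):
    """Time-dependent Page Rank coefficients derived from (2) (see text)."""
    out, partial = np.empty(K), 0.0   # partial = e^{-gamma} sum_{j<=k} gamma^j / j!
    for k in range(K):
        partial += exp(-gamma) * gamma**k / factorial(k)
        out[k] = ((1 - alpha) * alpha**k * (1 - partial)
                  + exp(-gamma) * (gamma * alpha)**k / factorial(k))
    return out

# Each family sums to 1 as K grows; the cumulative sums mimic Figure 1 (right).
for name, c in [("ppr", ppr_coeffs(0.85, 31)), ("hk", hk_coeffs(5.0, 31)),
                ("tppr", tppr_coeffs(0.85, 5.0, 31))]:
    print(name, c.cumsum()[-1])
```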
2.3. Relaxation Methods and Waveform Relaxation

Relaxation is a general technique for solving linear equations of the form $Ax = b$. In the most general form, the matrix is split as $A = B_k - C_k$ (here $k$ is an iteration index), and the iteration is written as
$$x^{(k+1)} = B_k^{-1}(C_k x^{(k)} + b)\,.$$
By carefully choosing $B_k$ and $C_k$, one can ensure that $x^{(k+1)}$ is identical to $x^{(k)}$ in all but one coordinate. Such methods are called coordinate relaxation methods, and they are closely related to the Gauss-Seidel and Gauss-Southwell iterations. Several competitive local algorithms for personalized Page Rank and heat kernel employ a coordinate relaxation approach (Kloster & Gleich, 2014).

Waveform relaxation, originally developed at the beginning of the 1980s for the simulation of electronic circuits, is a generalization of relaxation to solving initial value problems of the form $x'(t) = F(t, x(t))$. The basic idea is to introduce new functions $G_k(\cdot, \cdot, \cdot)$ such that $F(t, x) = G_k(t, x, x)$. To obtain $x^{(k+1)}(\cdot)$ from $x^{(k)}(\cdot)$ we solve the initial value problem
$$(x^{(k+1)})'(t) = G_k(t, x^{(k+1)}(t), x^{(k)}(t))\,.$$
By choosing $G_k(\cdot, \cdot, \cdot)$ carefully we can, again, ascertain that $x^{(k+1)}(\cdot)$ is identical to $x^{(k)}(\cdot)$ in all but one coordinate. This form of waveform coordinate relaxation is the basis for our method.
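For intuition, here is a minimal sketch of coordinate relaxation in the linear-system setting (plain Gauss-Seidel ordering; Gauss-Southwell would instead pick the coordinate with the largest residual, which is closer in spirit to the queue-driven method of Section 3):

```python
import numpy as np

def gauss_seidel_coordinate(A, b, sweeps=25):
    """Coordinate relaxation for Ax = b: each step changes one coordinate so
    that the corresponding residual entry becomes zero. A minimal sketch;
    assumes a nonzero diagonal and a convergent splitting."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(sweeps):
        for i in range(n):
            r_i = b[i] - A[i] @ x        # residual of equation i
            x[i] += r_i / A[i, i]        # zero it by perturbing x_i only
    return x
```

The waveform variant below follows exactly the same pattern, except that the unknowns are functions rather than scalars, and zeroing a residual entry means solving a scalar ODE.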
3. Algorithm

As explained earlier, diffusion-based community detection algorithms start by computing a diffusion vector, which is used to rank the vertices and perform a sweep. Our goal is to integrate the time-dependent personalized Page Rank vector (2) into this framework, instead of the personalized Page Rank vector or the heat kernel vector. This section describes our algorithm for computing (2).

The exact time-dependent personalized Page Rank vector is completely dense for connected graphs. Computing it exactly is too expensive for the purpose of detecting local communities. It is common practice to find approximate vectors that are localized (i.e., have only a small number of non-zeros). Since the main purpose of the diffusion vector is to form a ranking of the nodes by sorting their degree-reweighted diffusion values, a natural objective is to seek an approximate diffusion vector $y$ such that
$$\|D^{-1}(x - y)\|_\infty < \epsilon\,.$$
Our algorithm constructs a vector-valued function that approximates the diffusion vector for all $t \in [0, \gamma]$. That is, the algorithm constructs a vector function $y(\cdot)$ such that
$$\|D^{-1}(x(t) - y(t))\|_\infty < \epsilon \tag{3}$$
for all $t \in [0, \gamma]$. In the above, $x(\cdot)$ is the solution to (1).

Remark. Some of the algorithms for approximate diffusion also guarantee that the error is one sided, i.e., $y \le x$. That can be achieved by our algorithm by solving to $\epsilon/2$ approximation error, and then applying a non-uniform shift to $y$. This shift does not affect the ranking of the nodes.

Let $y(\cdot)$ be some candidate solution. Following Botchev et al. (2013) we define the residual $r(\cdot)$ relative to $y(\cdot)$ to be
$$r(t) \equiv (1 - \alpha)s - (I_n - \alpha P)y(t) - y'(t)\,.$$
The residual is a powerful tool for analyzing approximate solutions to initial value problems (Botchev et al., 2013). In particular, it can be used to certify that (3) is met by $y(\cdot)$, as the following proposition shows (the proof, like all proofs, appears in the supplementary material):

Proposition 1. Let $d_i$ denote the $i$th diagonal entry of $D$ (which is the degree of vertex $i$). Suppose that $y(0) = s$. If
$$\|r_i(\cdot)\|_\infty < \frac{(1 - \alpha) d_i \epsilon}{1 - \exp((\alpha - 1)\gamma)} \tag{4}$$
(if $\alpha = 1$ then substitute $(1 - \exp((\alpha - 1)\gamma))/(1 - \alpha)$ with $\gamma$) holds for all $i = 1, \dots, n$, then (3) holds for all $t \in [0, \gamma]$.

Algorithm Outline. The key observation is that if we initialize $y(t) = s$ then most coordinates are not in violation of (4). Thus, if we use a simple waveform relaxation approach, in which we iteratively target coordinates in which (4) is violated, the iterates will tend to stay localized. In particular, our algorithm maintains a queue of violating coordinates. In each iteration, a coordinate is extracted from the queue and the corresponding coordinate of $y(\cdot)$ is perturbed so that the corresponding residual is small enough. This, in turn, can cause new coordinates to violate (4). Newly violating coordinates (if any) are then inserted into the queue. The algorithm terminates when the queue is empty. To allow these various steps to be implemented in standard computational models (e.g., the RAM model), we restrict the coordinates of $y(\cdot)$ to be polynomials of some fixed degree. We remark that our algorithm is similar in structure and spirit to push algorithms for personalized Page Rank and heat kernel (Andersen et al., 2006; Kloster & Gleich, 2014).

3.1. Waveform Relaxation Approach

Proposition 1 suggests the following waveform relaxation approach: identify an index $i$ for which $\|r_i(\cdot)\|_\infty$ is too large, and modify $y_i(\cdot)$ so as to force $r_i(\cdot)$ to be zero, while keeping $y_j(\cdot)$ for $j \ne i$ as-is. This leads to the following proto-algorithm:

1. Initialize $y(t) = s$.
2. While there exists an $i$ such that $\|r_i(\cdot)\|_\infty \ge \frac{(1 - \alpha) d_i \epsilon}{1 - \exp((\alpha - 1)\gamma)}$:
   (a) Select such an $i$ arbitrarily.
   (b) Set $y_i(\cdot)$ to the solution of the ODE
   $$y'(t) = -y(t) + \alpha \sum_{j=1}^{n} P_{ij} y_j(t) + (1 - \alpha)s_i, \quad y(0) = s_i\,. \tag{5}$$

The reason we use the term proto-algorithm is that in the above form, the proto-algorithm cannot be implemented in usual computational models (e.g., the RAM model), as it abstractly operates on functions. We address this issue in the next subsection.

3.2. Restricting the Function Space

To turn the proto-algorithm into an actual algorithm, we restrict the functions to be polynomials of degree equal to or smaller than some fixed degree $N$. That is, each coordinate of $y(\cdot)$ is a polynomial of degree at most $N$. As a consequence, each coordinate of $r(\cdot)$ is also a polynomial of degree at most $N$. How $N$ is chosen is explained in subsection 3.3. For now, it suffices to keep in mind that $N$ depends on $\gamma$ and $\epsilon$, and that it tends to grow slowly. For example, for $\gamma = 5.0$ and $\epsilon = 10^{-4}$, $N = 14$ suffices.

Let $\mathcal{P}_N$ be the space of polynomials of degree at most $N$. We use a pseudo-spectral approach of representing polynomials in $\mathcal{P}_N$ by their values at predefined grid points.¹ Let $t_0, \dots, t_N$ be the result of scaling and shifting the Chebyshev points of the second kind from $[-1, 1]$ to $[0, \gamma]$, i.e.,
$$t_j = (\cos(j\pi/N) + 1)\gamma/2, \quad j = 0, \dots, N\,.$$
Given a set of values $y_0, \dots, y_N$ there is a unique polynomial $p(\cdot) \in \mathcal{P}_N$ which interpolates them, $y_j = p(t_j)$, which means we can use the vector $[p(t_0) \cdots p(t_N)]^T$ to represent $p(\cdot) \in \mathcal{P}_N$. We denote by $S_N$ the operator mapping a scalar function to an $(N+1)$-dimensional vector of its point-samples at $t_0, \dots, t_N$, i.e.,
$$S_N[f(\cdot)] \equiv [f(t_0) \cdots f(t_N)]^T\,.$$
Our algorithm uses $\mathbf{y}_i \equiv S_N[y_i(\cdot)]$ to represent $y_i(\cdot)$. The derivative of a degree $N$ polynomial is a degree $N-1$ polynomial, so $r_i(\cdot) \in \mathcal{P}_N$. Thus, we can use $\mathbf{r}_i \equiv S_N[r_i(\cdot)]$ to represent $r_i(\cdot)$.²

¹ Our approach is inspired by the CHEBFUN library (Driscoll et al., 2014; Battles & Trefethen, 2004). In fact, early prototypes simply implemented the proto-algorithm using CHEBFUN.
² The reason we do not use the coefficients in the monomial basis is that doing so would not result in a stable algorithm. The reason we use non-equidistant grid points is that equidistant grid points would also result in a non-stable algorithm, due to the Runge phenomenon. See Trefethen (2000) and Boyd (1989).
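The following sketch (names are ours) illustrates the representation: build the shifted Chebyshev grid, represent a function by its samples $S_N[f(\cdot)]$, and evaluate the interpolant elsewhere with the barycentric formula of Berrut & Trefethen (2004), which is also how $y(t)$ is evaluated off-grid in subsection 3.4:

```python
import numpy as np

def cheb_points(N, gamma):
    """Chebyshev points of the second kind shifted from [-1, 1] to [0, gamma]:
    t_j = (cos(j pi / N) + 1) gamma / 2, j = 0, ..., N."""
    return (np.cos(np.arange(N + 1) * np.pi / N) + 1) * gamma / 2

def barycentric_eval(t_grid, samples, t):
    """Evaluate the degree-N interpolant through (t_j, samples_j) at t. For
    second-kind Chebyshev points the barycentric weights are w_j = (-1)^j
    delta_j with delta_0 = delta_N = 1/2 (Berrut & Trefethen, 2004); a shift
    of the interval rescales all weights uniformly, which cancels out."""
    N = len(t_grid) - 1
    w = (-1.0) ** np.arange(N + 1)
    w[0] *= 0.5
    w[-1] *= 0.5
    diff = t - t_grid
    if np.any(diff == 0):                 # t coincides with a grid point
        return samples[diff == 0][0]
    tmp = w / diff
    return (tmp @ samples) / tmp.sum()

gamma, N = 5.0, 14
t = cheb_points(N, gamma)
y = np.exp(-t)                            # S_N[f] for a smooth test function
print(barycentric_eval(t, y, 1.7), np.exp(-1.7))   # nearly identical
```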
We now show how the various operations in the proto-algorithm are implemented in our algorithm. These operations are:

- taking derivatives in order to compute residuals;
- computing the infinity norm of residual entries for testing convergence;
- solving the ODE (5).

3.2.1. DERIVATIVE

Differentiation is a linear operation on $\mathcal{P}_N$, so there exists a matrix $\Xi \in \mathbb{R}^{(N+1) \times (N+1)}$ such that for every $p(\cdot) \in \mathcal{P}_N$
$$S_N[p'(\cdot)] = \Xi S_N[p(\cdot)]\,.$$
An explicit formula for $\Xi$ is based on known formulas for the $[-1, 1]$ domain (Boyd, 1989), scaled to $[0, \gamma]$:
$$\Xi = \frac{2}{\gamma} D^{(N)}, \qquad D^{(N)}_{ij} = \begin{cases} (1 + 2N^2)/6 & i = j = 0 \\ -(1 + 2N^2)/6 & i = j = N \\ -x_j/(2(1 - x_j^2)) & i = j,\ 0 < j < N \\ \dfrac{p_i}{p_j}\dfrac{(-1)^{i+j}}{x_i - x_j} & i \ne j \end{cases}$$
where $x_i = \cos(\pi i/N)$, $p_0 = p_N = 2$, and $p_j = 1$ otherwise; the $2/\gamma$ factor accounts for the scaling from $[-1, 1]$ to $[0, \gamma]$.

3.2.2. BOUNDING $\|r_i(\cdot)\|_\infty$

Convergence is tested by inspecting the value of $\|r_i(\cdot)\|_\infty$. This raises the question of whether we can compute $\|r_i(\cdot)\|_\infty$ using $\mathbf{r}_i$, or more generally, given $S_N[p(\cdot)]$, can $\|p(\cdot)\|_\infty$ be computed for $p(\cdot) \in \mathcal{P}_N$? The answer is yes, and in a stable manner (Battles & Trefethen, 2004). However, this computation is rather expensive: $O(N^3)$ operations. Fortunately, we can bound $\|p(\cdot)\|_\infty$ using $\|S_N[p(\cdot)]\|_\infty$ in only $O(N)$ operations, as the following proposition shows.

Proposition 2. Let $p(\cdot) \in \mathcal{P}_N$. The following holds:
$$\|p(\cdot)\|_\infty \le \left(1 + \tfrac{2}{\pi}\log N\right)\|S_N[p(\cdot)]\|_\infty\,.$$

Thus, we use the following termination test:
$$\|\mathbf{r}_i\|_\infty < \frac{(1 - \alpha) d_i \epsilon}{(1 - \exp((\alpha - 1)\gamma))\left(1 + \tfrac{2}{\pi}\log N\right)} \tag{6}$$
(if $\alpha = 1$ then substitute $(1 - \exp((\alpha - 1)\gamma))/(1 - \alpha)$ with $\gamma$) for all $i = 1, \dots, n$.

3.2.3. SOLVING THE ODE

The solution to the ODE (5) is generally not a polynomial, so step 2(b) cannot be implemented exactly if we restrict the solution space to $\mathcal{P}_N$. However, as $N$ grows, better and better approximations to the solution can be found in $\mathcal{P}_N$. How to set $N$ large enough so as to find a sufficiently accurate solution to the ODE is the subject of the next subsection. In this subsection, we explain how our algorithm finds an approximate solution to (5) within $\mathcal{P}_N$.

Writing $y(t) = y_i(t) + d(t)$ and rearranging (5), we can rewrite the ODE and the update step as
$$d'(t) + d(t) - r_i(t) = 0, \quad d(0) = 0 \tag{7}$$
$$y_i(\cdot) \leftarrow y_i(\cdot) + d(\cdot)\,.$$
If we abandon solving (7) exactly so that we can require $d(\cdot) \in \mathcal{P}_N$, the update step (in the values-on-grid-points representation) then becomes
$$\mathbf{y}_i \leftarrow \mathbf{y}_i + \mathbf{d} \tag{8}$$
where $\mathbf{d} \equiv S_N[d(\cdot)]$. After this step, the residual can be updated as well:
$$\mathbf{r}_i \leftarrow \mathbf{r}_i - (\Xi + I_{N+1})\mathbf{d}\,.$$
The convergence criterion requires $\|\mathbf{r}_i\|_\infty$ to be small enough for all $i$, so it seems natural to attempt to find a $\mathbf{d}$ that will minimize $\|\mathbf{r}_i\|_\infty$ after the update. However, infinity norm minimization is rather expensive. Noting that $\|\mathbf{r}_i\|_\infty \le \|\mathbf{r}_i\|_2$, we instead opt to minimize $\|\mathbf{r}_i\|_2$ after the update, which yields the following equality-constrained linear least-squares problem:
$$\min_{\mathbf{d}} \|\mathbf{r}_i - (\Xi + I_{N+1})\mathbf{d}\|_2 \quad \text{s.t.} \quad \mathbf{d}_{N+1} = 0\,.$$
(The constraint encodes $d(0) = 0$, since $t_N = 0$ is the last grid point.) Writing $\Xi + I_{N+1} = [\Xi_1 \;\; c]$, where $\Xi_1$ consists of the first $N$ columns and $c$ is the last column, we get the following formula for $\mathbf{d}$:
$$\mathbf{d} = \begin{bmatrix} \Xi_1^+ \mathbf{r}_i \\ 0 \end{bmatrix}\,.$$
This leads to the following update formula for $\mathbf{r}_i$:
$$\mathbf{r}_i \leftarrow (I_{N+1} - \Xi_1\Xi_1^+)\mathbf{r}_i\,. \tag{9}$$
Let $R \equiv I_{N+1} - \Xi_1\Xi_1^+$. Since $\Xi_1$ has full column rank, $R$ is exactly a rank one matrix. In fact, $R = uu^T$ where $u$ is the unique unit norm null vector of $\Xi_1^T$. This allows us to compute the update formula (9) in $O(N)$ time.

Note that $\Xi_1^+ u = 0_N$ and $Ru = u$, so setting $\mathbf{r}_i$ to 0 instead of using (9) will not change the result of any subsequent application of (8). However, it might affect the detection of convergence.
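A sketch of $\Xi$ and of the rank-one residual projector $R$ (we build $\Xi$ from the standard $[-1, 1]$ differentiation matrix, using the negative-sum trick of Trefethen (2000) for the diagonal rather than the closed-form diagonal entries; the checks at the end verify the two properties used above):

```python
import numpy as np

def cheb_diff_matrix(N, gamma):
    """Differentiation matrix Xi on the shifted grid: S_N[p'] = Xi S_N[p]
    for polynomials of degree <= N, built from the standard [-1, 1]
    Chebyshev matrix and scaled by 2/gamma."""
    x = np.cos(np.arange(N + 1) * np.pi / N)
    p = np.ones(N + 1)
    p[0] = p[N] = 2.0
    D = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        for j in range(N + 1):
            if i != j:
                D[i, j] = (p[i] / p[j]) * (-1.0) ** (i + j) / (x[i] - x[j])
    D -= np.diag(D.sum(axis=1))        # negative-sum trick for the diagonal
    return (2.0 / gamma) * D

gamma, N = 5.0, 14
Xi = cheb_diff_matrix(N, gamma)
t = (np.cos(np.arange(N + 1) * np.pi / N) + 1) * gamma / 2
print(np.allclose(Xi @ t**3, 3 * t**2))        # exact differentiation on P_N

# Rank-one residual projector R = I - Xi_1 Xi_1^+ from subsection 3.2.3:
Xi1 = (Xi + np.eye(N + 1))[:, :N]              # drop last column (d_{N+1} = 0)
R = np.eye(N + 1) - Xi1 @ np.linalg.pinv(Xi1)
print(np.linalg.matrix_rank(R, tol=1e-8))      # prints 1
```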
3.3. Choosing N

Recall that the update formula for the residual is $\mathbf{r}_i \leftarrow (I_{N+1} - \Xi_1\Xi_1^+)\mathbf{r}_i$, and that to declare convergence $\|\mathbf{r}_i\|_\infty$ has to be small enough. It is crucial that $N$ is set large enough so that $(I_{N+1} - \Xi_1\Xi_1^+)\mathbf{r}_i$ is small enough for all $\mathbf{r}_i$'s encountered by the algorithm. Observing that at the first iteration all residuals are multiples of the all-ones vector, we can bound the infinity norm of the error after the first iteration using the following lemma.

Lemma 3. Let $\mathbf{1}_{N+1}$ be the $(N+1)$-dimensional all-ones vector. Provided that $\gamma \ge 1$ and $N \ge 10$ we have
$$\|(I_{N+1} - \Xi_1\Xi_1^+)\mathbf{1}_{N+1}\|_\infty \le 20 \exp(-\gamma/2)\, I_{N+1}(\gamma)\, (4/5)^{N+1}$$
where $I_0(\cdot), I_1(\cdot), \dots$ are the modified Bessel functions of the first kind.

While we are unable to prove it rigorously, numerical experiments revealed that the values of $\|(I_{N+1} - \Xi_1\Xi_1^+)\mathbf{r}_i\|_\infty$ produced by the algorithm are of the same order of magnitude as that of the first iteration. Therefore, we base our criterion for choosing $N$ on Lemma 3, and set $N$ to be the minimum $N \ge 10$ that guarantees
$$20 \exp(-\gamma/2)\, I_{N+1}(\gamma)\, (4/5)^{N+1} \le \frac{\epsilon}{\gamma\left(1 + \tfrac{2}{\pi}\log N\right)}\,.$$
Our experience showed that this strategy for choosing $N$ is extremely robust, never failing to allow the algorithm to converge in all our experiments (we stress that once the algorithm has detected convergence, the result is accurate enough regardless of how $N$ is chosen). An inequality due to Luke (1972) implies that
$$I_{N+1}(\gamma) \le \frac{\gamma^{N+1} e^{\gamma}}{2^{N+1}(N+1)!}$$
(we credit the answer to question 415834 of math.stackexchange for this observation), which implies that $N = O(\gamma + \log(1/\epsilon))$.
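A sketch of this selection rule follows. The right-hand side of the inequality above is reconstructed here from the termination test (6), so the constants may differ from the ones used in our implementation; the rule is shown only to illustrate how slowly $N$ grows with $\gamma$ and $1/\epsilon$:

```python
import numpy as np
from scipy.special import iv   # modified Bessel function of the first kind

def choose_N(gamma, eps, N_max=200):
    """Smallest N >= 10 satisfying a Lemma 3-style criterion (a sketch;
    the exact right-hand side is reconstructed from the text). The text
    reports that N = 14 suffices for gamma = 5.0, eps = 1e-4."""
    for N in range(10, N_max):
        lhs = 20 * np.exp(-gamma / 2) * iv(N + 1, gamma) * (4 / 5) ** (N + 1)
        rhs = eps / (gamma * (1 + (2 / np.pi) * np.log(N)))
        if lhs <= rhs:
            return N
    raise ValueError("no N found below N_max")

print(choose_N(5.0, 1e-4))
```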
3.4. Putting It All Together

The algorithm keeps a queue of indices which violate (6). The queue is initialized using the seed vector. In each iteration, the algorithm pops an index from the queue and operates on it. Suppose $u$ is the popped index. The algorithm first updates $\mathbf{y}_u$ and $\mathbf{r}_u$ using the formulas in subsection 3.2.3. It then updates $\mathbf{r}_v$ for all neighbors $v$ of $u$. If $\mathbf{r}_v$ violates (6), and it is not already in the queue, it is added to the queue. The algorithm terminates when the queue is empty.

During the execution of the algorithm, most of the $\mathbf{y}_i$'s and $\mathbf{r}_i$'s are zero. Our algorithm does not explicitly retain them in memory, and initializes $\mathbf{y}_i$ and $\mathbf{r}_i$ only when they are first encountered (that happens in three cases: as a seed vector, as a neighbor of a seed vector, and as a neighbor of an index that was popped from the queue). We use a hash table for compact storage with fast access. A detailed pseudo-code appears in the supplementary material.

The only access our algorithm needs to $G = (V, E)$ is to query a vertex degree (i.e., given $v \in V$, return its degree G.deg(v)) and to query the set of vertex neighbors (i.e., given $v \in V$, return its neighbors G.neighbors(v)).

Once the algorithm terminates, the result $y(\cdot)$ appears as a collection of $(N+1)$-dimensional vectors, which hold the values of $y(\cdot)$ at $N+1$ time points in the range $[0, \gamma]$. These vectors can be used for ranking. The value of $y(t)$ for other values of $t \in [0, \gamma]$ can be computed efficiently using the barycentric formula (Berrut & Trefethen, 2004).

In terms of computational cost, we first note that while finding $N$ and $\Xi_1^+$ is expensive, this is a one-time operation given $\gamma$ and $\epsilon$, so their values can be cached for future uses. Thus, we do not include them in the analysis. Assuming hash operations take $O(1)$ time, the cost of a single iteration of the main loop is $O(N^2) = O((\gamma + \log(1/\epsilon))^2)$. While we do not currently have a bound on the number of iterations, our experience shows that it is rather small.

4. Experimental Results

We experimentally evaluated our algorithm in the context of community detection using a seed vector (that is, we use the common sweep procedure after applying the diffusion, selecting the prefix with minimum conductance). It is not the purpose of this section to show that our algorithm always, or in most cases, produces better communities when compared to the baselines. Evaluating communities is very tricky, and it is unclear how to quantify which of two communities is better. For example, just comparing the conductance is questionable, since Page Rank based sweeps tend to produce lower conductance but much larger (and less realistic) clusters than heat kernel based sweeps (Kloster & Gleich, 2014). Other metrics exist, and it is unclear which is best (Yang & Leskovec, 2015). It is not unreasonable to expect that the best metric is application dependent and might in fact be an ensemble of metrics, and thus for a given application the best strategy might be to work with an ensemble of diffusion vectors from which a collection of candidate clusters are generated.

Instead, we demonstrate that our algorithm produces diffusion vectors that are useful in the sense that they produce rankings that are distinct from and competitive with the ones produced by high quality implementations of personalized Page Rank and localized heat kernel. By competitive we mean that selecting the better community between the two is not a trivial task of selecting the smallest conductance, and in fact that task might be application dependent. As such, we aim to establish that our algorithm is a useful addition to the toolset of local graph diffusions.

We use two other codes as baselines. The first is hkgrow (Kloster & Gleich, 2014) (the paper uses the name hk-relax, but the code uses hkgrow). The second is pprgrow, an implementation of the personalized Page Rank push algorithm (Andersen et al., 2006) by the authors of hkgrow. Similarly, we call our algorithm tpprgrow.

Table 1. Datasets

Graph               |V|        |E|
email-Enron         36,692     183,831
ca-AstroPh          18,771     198,050
soc-sign-epinions   131,580    711,210
soc-LiveJournal1    4,846,609  42,851,237
com-amazon          334,863    925,872
com-dblp            317,080    1,049,866
com-youtube         1,134,890  2,987,624
com-lj              3,997,962  34,681,189
com-orkut           3,072,441  117,185,083

Note that we are comparing algorithms and not diffusions. The reason is that even when the underlying diffusion is the same (for example, time-dependent Page Rank with $\alpha = 1.0$ is identical to heat kernel), different algorithms often produce different approximate diffusion vectors. Unless the accuracy parameter $\epsilon$ is set to be extremely small and well below common values, these different vectors result in different rankings between nodes. Indeed, it is common practice to use $\epsilon$ as another parameter, and to run the algorithm with different $\epsilon$ values and select the best cluster found, instead of driving $\epsilon$ low enough for the ranking to be fully determined (we remark that setting $\epsilon$ so that the exact ranking between all nodes is fully determined necessitates a dense vector, which is counter to the goal of having a local algorithm).
4.1. Conductance, Cluster Size and Runtime

In the first set of experiments we compare hkgrow with $\gamma = 5.0$ to tpprgrow with $\alpha = 1.0, \gamma = 5.0$ and tpprgrow with $\alpha = 0.85, \gamma = 5.0$. Note that the underlying diffusion for tpprgrow with $\alpha = 1.0, \gamma = 5.0$ and hkgrow with $\gamma = 5.0$ is exactly the same, so any difference in the observed results is due to finding a different approximate diffusion vector within the allowed error budget. We use the four widely used datasets at the top of Table 1. For each dataset, we randomly choose a seed node, and detect a community around it using the various algorithms. We repeat this 1000 times for every dataset. We use $\epsilon = 0.001$. The results are summarized in Figure 2, Table 2 and Table 3.

Figure 2 shows scatter plots of the conductance versus cluster size of the clusters found by the two variants of tpprgrow using randomly selected seed nodes. The values are normalized according to the cluster found using hkgrow. We see that tpprgrow tends to produce clusters with slightly higher (but comparable) conductance, but often of smaller, and perhaps more realistic, size. It also occasionally produces clusters with lower conductance.

Figure 2. Scatter plots of relative conductance vs. relative community size of tpprgrow relative to hkgrow, on email-Enron, ca-AstroPh, soc-sign-epinions, and soc-LiveJournal1. Each point represents a single run from a random seed, with blue x's for $\alpha = 1.0$ and $\gamma = 5.0$, and red +'s for $\alpha = 0.85$ and $\gamma = 5.0$. The conductance and community size are relative to the values obtained by the cluster found by hkgrow with the same seed node.

Table 2 quantifies this further. In the table we compare the size and conductance of the clusters found by the different algorithms. For each combination of algorithms, we report what percent of the runs resulted in clusters of one algorithm being smaller than, equal to, or larger than those of the other. We repeat this with regard to conductance. We see that tpprgrow with $\alpha = 1.0, \gamma = 5.0$ tends to produce not only smaller sized clusters, but also ones with higher conductance. tpprgrow with $\alpha = 0.85, \gamma = 5.0$ tends to produce even smaller clusters and even higher conductance.

Table 3 compares the performance in terms of running time. The total time column aggregates the running time of all 1000 experiments. While tpprgrow tends to be somewhat slower than hkgrow when used with $\alpha = 1.0$ and $\gamma = 5.0$, it is faster than hkgrow when $\alpha = 0.85$ is used. This is probably due to the fact that with $\alpha = 0.85$ the sum of diffusion coefficients converges to 1.0 faster (recall Figure 1). We also report the average number of iterations in the main loop. This corresponds to the average number of accesses to the edge list of a node in the graph. In some cases, this might be the dominant cost (e.g., if the graph is too big to be kept in memory and is instead kept on disk, so each access to the neighbor list of a vertex is an access to external storage). We see that tpprgrow tends to do fewer iterations.
The results make it clear that the clusters that can be found using tpprgrow are often distinct from and competitive with the ones found by hkgrow.

Table 2. Comparison of cluster profiles produced using different diffusions. tpprgrow-1 refers to tpprgrow with $\alpha = 1.0, \gamma = 5.0$; tpprgrow-2 uses $\alpha = 0.85, \gamma = 5.0$; for hkgrow we use $\gamma = 5.0$. A triple X%/Y%/Z% means that, between the algorithms compared, the first algorithm had a value smaller than the second algorithm's value in X% of the cases, an equal value in Y% of the cases, and a larger value in Z% of the cases. For example, the 36%/49%/15% under Size for tpprgrow-1 vs. hkgrow means that in 36% of the cases tpprgrow produced a cluster which was smaller than the one produced by hkgrow, in 49% of the cases the cluster size was the same, and in 15% of the cases the cluster was larger.

                    tpprgrow-1 vs. hkgrow        tpprgrow-2 vs. hkgrow        tpprgrow-2 vs. tpprgrow-1
Dataset             Size         Conductance     Size         Conductance     Size         Conductance
Enron               36%/49%/15%  9%/48%/43%      52%/29%/19%  1%/29%/70%      49%/32%/19%  1%/31%/68%
ca-AstroPh          40%/43%/17%  10%/42%/48%     59%/23%/18%  0%/22%/78%      58%/23%/19%  1%/22%/77%
soc-sign-epinions   27%/60%/12%  5%/60%/35%      38%/38%/23%  0%/39%/61%      39%/41%/20%  1%/41%/58%
soc-LiveJournal1    42%/42%/16%  12%/41%/47%     61%/21%/18%  0%/20%/80%      58%/22%/20%  1%/21%/78%

Table 3. Comparison of the total time and average number of iterations of the main loop for finding clusters around the same seeds, for hkgrow and two tpprgrow configurations.

                    tpprgrow (α = 1.0, γ = 5.0)    tpprgrow (α = 0.85, γ = 5.0)   hkgrow (γ = 5.0)
Dataset             Avg. #Its  Total Time (sec)    Avg. #Its  Total Time (sec)    Avg. #Its  Total Time (sec)
Enron               358.3      2.26                163.8      0.70                594.0      0.93
ca-AstroPh          124.4      1.47                69.8       0.58                272.7      0.93
soc-sign-epinions   111.9      1.16                64.4       0.56                259.3      0.80
soc-LiveJournal1    134.1      1.86                72.7       0.51                304.8      1.00

Table 4. Comparison between tpprgrow, hkgrow and pprgrow on finding real-world communities (datasets with ground-truth communities). In the Statistics columns, the first item is the average F1-score, the second item is the average conductance, and the third is the average size.

          tpprgrow                      hkgrow                        pprgrow
Dataset   Statistics       Time (sec)   Statistics       Time (sec)   Statistics       Time (sec)
amazon    0.940/0.026/28   131          0.943/0.027/28   26           0.822/0.019/46   362
dblp      0.591/0.165/33   343          0.592/0.166/34   138          0.527/0.139/53   429
youtube   0.185/0.527/61   292          0.173/0.490/84   126          0.183/0.522/89   108
lj        0.730/0.170/45   505          0.719/0.153/51   197          0.655/0.143/63   555
orkut     0.357/0.778/58   6058         0.363/0.762/65   2814         0.360/0.755/61   1673

4.2. Datasets with Ground Truth

Next, we evaluate the different algorithms using datasets for which ground-truth communities exist (Yang & Leskovec, 2015). These are the datasets at the bottom of Table 1. Our setup is quite similar to the one used by Kloster & Gleich (2014): for each dataset, we use only the list of the 5000 top communities, and experiment on all communities with 10 or more members. Given a seed, we run tpprgrow with $\epsilon = 0.001$ and three different values of $\alpha$: 0.85, 0.99, 1.0. We use $\gamma = 5.0$, but examine the diffusion vector at three additional time points. Among the clusters found, we pick the one with minimum conductance. We repeat this for every member of the ground-truth community, using it as a seed, and choose the community with maximum F1-score. Similarly, we run hkgrow with the same four values of $\gamma$, and pprgrow with $\alpha = 0.85, 0.99$ ($\alpha = 1.0$ does not make sense for pprgrow).
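For reference, the F1-score between a detected cluster and a ground-truth community is the harmonic mean of set-overlap precision and recall; a minimal sketch (the function name is ours):

```python
def f1_score(detected, truth):
    """Set-overlap F1 between a detected cluster and a ground-truth community."""
    detected, truth = set(detected), set(truth)
    inter = len(detected & truth)
    if inter == 0:
        return 0.0
    precision = inter / len(detected)   # fraction of the cluster that is correct
    recall = inter / len(truth)         # fraction of the community recovered
    return 2 * precision * recall / (precision + recall)
```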
Table 4 reports, for each dataset and algorithm, the average F1-score, average cluster conductance, average cluster size, and total running time. On average, tpprgrow and hkgrow find clusters of about the same quality. hkgrow is faster than our algorithm by a factor of 2-5x (however, our algorithm explores a much wider parameter space). pprgrow tends to produce larger clusters, with lower conductance, but they are less faithful to the ground truth. It also tends to be slower than hkgrow and tpprgrow.

5. Conclusions

We have described an efficient local algorithm for yet another graph diffusion. Our algorithm is also yet another local algorithm for Page Rank and heat kernel. Our experiments suggest that the algorithm produces rankings that are distinct from and competitive with the ones produced by high quality implementations of personalized Page Rank and local heat kernel, and so is a useful addition to the toolset of local graph diffusions. From a theoretical perspective, by selecting parameters so as to behave like Page Rank or heat kernel, the time-dependent Page Rank diffusion can recover the same rigorous bounds that have been proven for those graph diffusions. However, it is unclear whether a careful choice of parameters can, in fact, produce stronger theoretical results. We propose that as an open question for future research.

Acknowledgments

We thank David Gleich and Kyle Kloster for useful discussions. Haim Avron acknowledges the support from the XDATA program of the Defense Advanced Research Projects Agency (DARPA), administered through Air Force Research Laboratory contract FA8750-12-C-0323.

References

Abramowitz, M. and Stegun, I. A. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York, ninth Dover printing, tenth GPO printing edition, 1964.

Andersen, R., Chung, F., and Lang, K. Local graph partitioning using pagerank vectors. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS '06, pp. 475-486, Washington, DC, USA, 2006. IEEE Computer Society. ISBN 0-7695-2720-5. doi: 10.1109/FOCS.2006.44. URL http://dx.doi.org/10.1109/FOCS.2006.44.

Battles, Z. and Trefethen, L. An extension of MATLAB to continuous functions and operators. SIAM Journal on Scientific Computing, 25(5):1743-1770, 2004. doi: 10.1137/S1064827503430126. URL http://dx.doi.org/10.1137/S1064827503430126.

Berrut, J. and Trefethen, L. Barycentric Lagrange interpolation. SIAM Review, 46(3):501-517, 2004. doi: 10.1137/S0036144502417715. URL http://dx.doi.org/10.1137/S0036144502417715.

Botchev, M. A., Grimm, V., and Hochbruck, M. Residual, restarting and Richardson iteration for the matrix exponential. SIAM Journal on Scientific Computing, 35(3):A1376-A1397, 2013. URL http://doc.utwente.nl/87651/.

Boyd, J. P. Chebyshev and Fourier Spectral Methods. Springer-Verlag, New York, 1989. 792 pp.

Brin, S. and Page, L. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7):107-117, April 1998. ISSN 0169-7552. doi: 10.1016/S0169-7552(98)00110-X. URL http://dx.doi.org/10.1016/S0169-7552(98)00110-X.

Chung, F. The heat kernel as the pagerank of a graph. Proceedings of the National Academy of Sciences, 104(50):19735-19740, December 2007. ISSN 1091-6490. doi: 10.1073/pnas.0708838104. URL http://dx.doi.org/10.1073/pnas.0708838104.

Chung, F. A local graph partitioning algorithm using heat kernel pagerank.
In Avrachenkov, K., Donato, D., and Litvak, N. (eds.), Algorithms and Models for the Web-Graph, volume 5427 of Lecture Notes in Computer Science, pp. 62-75. Springer Berlin Heidelberg, 2009. ISBN 978-3-540-95994-6. doi: 10.1007/978-3-540-95995-3_6. URL http://dx.doi.org/10.1007/978-3-540-95995-3_6.

Driscoll, T. A., Hale, N., and Trefethen, L. N. (eds.). Chebfun Guide. Pafnuty Publications, Oxford, UK, 1st edition, 2014.

Gleich, D. F. and Rossi, R. A. A dynamical system for Page Rank with time-dependent teleportation. Internet Mathematics, 10(1-2):188-217, June 2014. doi: 10.1080/15427951.2013.814092.

Kloster, K. and Gleich, D. F. Heat kernel based community detection. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, pp. 1386-1395, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2956-9. doi: 10.1145/2623330.2623706. URL http://doi.acm.org/10.1145/2623330.2623706.

Laforgia, A. Bounds for modified Bessel functions. Journal of Computational and Applied Mathematics, 34(3):263-267, 1991. ISSN 0377-0427. doi: 10.1016/0377-0427(91)90087-Z. URL http://www.sciencedirect.com/science/article/pii/037704279190087Z.

Luke, Y. L. Inequalities for generalized hypergeometric functions. Journal of Approximation Theory, 5(1):41-65, 1972. ISSN 0021-9045. doi: 10.1016/0021-9045(72)90028-7. URL http://www.sciencedirect.com/science/article/pii/0021904572900287.

Paris, R. An inequality for the Bessel function $J_\nu(\nu x)$. SIAM Journal on Mathematical Analysis, 15(1):203-205, 1984. doi: 10.1137/0515016. URL http://dx.doi.org/10.1137/0515016.

Trefethen, L. N. Spectral Methods in MATLAB. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000. ISBN 0-89871-465-6.

Yang, J. and Leskovec, J. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181-213, 2015. ISSN 0219-1377. doi: 10.1007/s10115-013-0693-z. URL http://dx.doi.org/10.1007/s10115-013-0693-z.