Dynamic Spectral Clustering with Provable Approximation Guarantee

Steinar Laenen, He Sun
School of Informatics, University of Edinburgh, United Kingdom. Correspondence to: Steinar Laenen, He Sun.
Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

Abstract

This paper studies clustering algorithms for dynamically evolving graphs $\{G_t\}$, in which new edges (and potentially new vertices) are added into a graph, and the underlying cluster structure of the graph can gradually change. The paper proves that, under some mild condition on the cluster structure, the clusters of the final graph $G_T$ of $n_T$ vertices at time $T$ can be well approximated by a dynamic variant of the spectral clustering algorithm. The algorithm runs in amortised update time $O(1)$ and query time $o(n_T)$. Experimental studies on both synthetic and real-world datasets further confirm the practicality of our designed algorithm.

1. Introduction

For any graph $G = (V, E)$ and parameter $k \in \mathbb{N}$ as input, the objective of graph clustering is to partition the vertex set of $G$ into $k$ clusters such that vertices within each cluster are better connected than to the rest of the graph. Since large-scale graphs are commonly used to model practical datasets, designing efficient graph clustering algorithms is an important problem in machine learning and related fields. In practice, these large-scale graphs usually evolve over time: not only are new vertices and edges added into a graph, but the graph's clusters could also change gradually, resulting in a new cluster structure in the long term. Instead of periodically running a clustering algorithm from scratch, it is important to design algorithms that can quickly identify and return the new clusters in dynamically evolving graphs.

In this paper we study clustering for dynamically evolving graphs, and obtain the following results. As the first and conceptual contribution, we propose a model for dynamic graph clustering. In contrast to the classical model for dynamic graph algorithms (Thorup, 2007; Beimel et al., 2022), our proposed model considers not only edge insertions but also vertex insertions; as such the underlying graph can gradually form a new cluster structure with a different number of clusters from the initial graph.

As the second and algorithmic result, we design a randomised graph clustering algorithm that works in the above-mentioned model, and our result is as follows:

Theorem 1.1 (Informal statement of Theorem 4.6). Let $G_1 = (V_1, E_1)$ be a graph of $n_1$ vertices and $k = \widetilde{O}(1)$ clusters.[1] Assume that new edges, which could be adjacent to new vertices, are added to $G_t$ at each time $t$ to obtain $G_{t+1}$, and there are $O(\mathrm{poly}(n_1))$ added edges in total at time $T = O(\mathrm{poly}(n_1))$ to form $G_T$ of $n_T$ vertices and $k'$ clusters. Then, there is a randomised algorithm such that the following hold with high probability:
- The initial $k$ clusters of $G_1$ can be approximately computed in $\widetilde{O}(|E_1|)$ time.
- The new $k'$ clusters of $G_T$ can be approximately computed with amortised update time $O(1)$ and query time $o(n_T)$.

To examine the result, we notice that, although the number of clusters $k$ in $G_1$ can be identified with the classical eigen-gap heuristic (Ng et al., 2001; von Luxburg, 2007), computing an eigen-gap is expensive and cannot be directly applied to determine the change of $k$ in dynamically evolving graphs.
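For concreteness, the eigen-gap heuristic referred to above can be sketched in a few lines. The following is a minimal illustration (ours, using numpy/scipy; the function name and parameters are not from the paper), and it is exactly this repeated eigenvalue computation on the full graph that a dynamic algorithm needs to avoid:

```python
# Minimal sketch of the classical eigen-gap heuristic (Ng et al., 2001;
# von Luxburg, 2007) for choosing k on a *static* graph; names are ours.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def eigen_gap_k(adj: sp.csr_matrix, k_max: int = 20) -> int:
    """Return the k maximising the gap lambda_{k+1} - lambda_k of the
    normalised Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = sp.identity(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt
    # smallest k_max + 1 eigenvalues of the symmetric normalised Laplacian
    vals = spla.eigsh(lap, k=k_max + 1, which="SM", return_eigenvectors=False)
    vals = np.sort(vals)
    gaps = np.diff(vals)                 # gaps[i] = lambda_{i+2} - lambda_{i+1}
    return int(np.argmax(gaps[1:]) + 2)  # skip the trivial gap after lambda_1
```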
Our result shows that the new number of clusters $k'$ can be computed by a dynamic clustering algorithm with sublinear query time. Secondly, as the running time of a clustering algorithm is at least linear in the number of edges in $G_T$ and it takes $\Omega(n_T)$ time to output the cluster membership of all the vertices, obtaining an $o(n_T)$ amortised query time[2] is significant. To the best of our knowledge, our work presents the first such result with respect to both theoretical guarantees on the output clusters and time complexity.

Our algorithm not only achieves strong theoretical guarantees, but also works very well in practice. For instance, for input graphs with 300,000 vertices and up to 490,000,000 edges generated from the stochastic block model, our algorithm runs more than 100 times faster than repeated execution of spectral clustering on the updated graphs, while obtaining a comparable clustering result.

[1] We use $\widetilde{O}(n)$ to represent $O(n \log^c(n))$ for constant $c$.
[2] Throughout the paper we use $T$ to denote query time, and $t$ as an arbitrary time throughout the sequence of graphs $\{G_t\}$.

1.1. Overview of the Algorithm

For any input graph $G_1$ with a well-defined cluster structure, we first construct a cluster-preserving sparsifier $H_1$ of $G_1$, which is a sparse subgraph of $G_1$ that maintains its cluster structure, and employ spectral clustering on $H_1$ to obtain the initial $k$ clusters of $G_1$. After this, with a new edge arriving at every time $t$, our designed algorithm applies two components to track the cluster structure of $G_t$.

The first component is a dynamic algorithm that maintains a cluster-preserving sparsifier $H_t$ for $G_t$. Our designed algorithm is based on sampling edges with probability proportional to the degrees of their endpoints, and these edges get resampled if their degrees have significantly changed. We show that $H_t$ always preserves the cluster structure of $G_t$, and the algorithm's amortised update time complexity is $O(1)$.

Our second component is an algorithm that dynamically maintains a contracted graph $\widetilde{G}_t$ of $G_t$, and this contracted graph is used to sketch the cluster structure of $G_t$. For the first input graph $G_1$ and the output of spectral clustering on $H_1$, our initial contracted graph $\widetilde{G}_1$ consists of $k$ super vertices with self-loops: these super vertices correspond to the $k$ clusters in $G_1$, and are connected by edges with weight equal to the cut values of the corresponding clusters in $H_1$. After that, when new edges (and potentially new vertices) arrive over time, our algorithm updates $\widetilde{G}_t$ such that (new) clusters are represented by either the same super vertices, newly added vertices, or a combination of both. The algorithm further updates the edge weights between the super vertices. With a slight increase in the number of vertices of $\widetilde{G}_t$ over time, we prove that the cluster structure in $G_t$ is approximately preserved in $\widetilde{G}_t$. In particular, when new clusters are formed in $G_t$, this new cluster structure of $G_t$ can be identified by the eigen-gap of $\widetilde{G}_t$'s Laplacian matrix. See Figure 1 for an illustration of our approach.

1.2. Related Work

Our work directly relates to a number of works on incremental spectral clustering algorithms (e.g., Dhanjal et al., 2014; Martin et al., 2018; Ning et al., 2007). These works usually rely on analysing the change of approximate eigenvectors, and don't show an approximation guarantee for the returned clusters.
Many works along this direction further employ matrix perturbation theory in their analysis, requiring that the total number of vertices in a graph is fixed. Our work is also linked to related dynamic graph algorithm problems (e.g., Bernstein et al., 2022; Saranurak & Wang, 2019). However, most works in dynamic graph algorithms focus on the design of dynamic algorithms for a general graph, while for dynamic clustering one needs to assume the presence of cluster structures in the initial and final graphs, such that the algorithm's performance can be rigorously analysed. Nevertheless, some of our presented techniques, like the adaptive sampling, are inspired by the dynamic graph algorithms literature.

2. Preliminaries

2.1. Notation

Let $G = (V, E, w)$ be an undirected graph with $|V| = n$ vertices, $|E| = m$ edges, and weight function $w: V \times V \to \mathbb{R}_{\geq 0}$. For any edge $e = \{u, v\} \in E$, we write $w_G(u, v)$ or $w_G(e)$ to express the weight of $e$. For a vertex $u \in V$, we denote its degree by $\deg_G(u) \triangleq \sum_{v \in V} w_G(u, v)$, and the volume of any $S \subseteq V$ is defined as $\mathrm{vol}_G(S) \triangleq \sum_{u \in S} \deg_G(u)$. For any $S, T \subseteq V$, we define the cut value between $S$ and $T$ by $w_G(S, T) \triangleq \sum_{e \in E_G(S, T)} w_G(e)$, where $E_G(S, T)$ is the set of edges between $S$ and $T$. Moreover, for any $S \subseteq V$, the conductance of $S$ is defined as
$$\Phi_G(S) \triangleq \frac{w_G(S, V \setminus S)}{\min\{\mathrm{vol}_G(S), \mathrm{vol}_G(V \setminus S)\}}$$
if $S \neq \emptyset$, and $\Phi_G(S) = 1$ if $S = \emptyset$. For any integer $k \geq 2$, we call subsets of vertices $A_1, \ldots, A_k$ a $k$-way partition of $G$ if $\bigcup_{i=1}^{k} A_i = V$ and $A_i \cap A_j = \emptyset$ for different $i$ and $j$. We define the $k$-way expansion of $G$ by
$$\rho_G(k) \triangleq \min_{\substack{\text{partitions}\\ A_1, \ldots, A_k}} \ \max_{1 \leq i \leq k} \Phi_G(A_i).$$

Our analysis is based on the spectral properties of graphs, and we list the basics of spectral graph theory. For a graph $G = (V, E, w)$, let $D_G \in \mathbb{R}^{n \times n}$ be the diagonal matrix defined by $D_G(u, u) = \deg_G(u)$ for all $u \in V$. We denote by $A_G \in \mathbb{R}^{n \times n}$ the adjacency matrix of $G$, where $A_G(u, v) = w_G(u, v)$ for all $u, v \in V$. The normalised Laplacian matrix of $G$ is defined as $\mathcal{L}_G \triangleq I - D_G^{-1/2} A_G D_G^{-1/2}$, where $I$ is the $n \times n$ identity matrix. The normalised Laplacian $\mathcal{L}_G$ is symmetric and real-valued, and has $n$ real eigenvalues which we write as $0 = \lambda_1(\mathcal{L}_G) \leq \ldots \leq \lambda_n(\mathcal{L}_G) \leq 2$; we use $f_i \in \mathbb{R}^n$ ($1 \leq i \leq n$) to denote the eigenvector of $\mathcal{L}_G$ corresponding to $\lambda_i$.

Lemma 2.1 (Higher-order Cheeger inequality, (Lee et al., 2014)). There is an absolute constant $C_{2.1}$ such that it holds for any graph $G$ and $k \geq 2$ that
$$\frac{\lambda_k(\mathcal{L}_G)}{2} \leq \rho_G(k) \leq C_{2.1}\, k^3 \sqrt{\lambda_k(\mathcal{L}_G)}. \tag{1}$$

[Figure 1 (three panels: (a) graph update, (b) dynamic sparsifier, (c) dynamic contracted graph; operations: cluster + contract). Illustration of our technique. The black and red edges in panel (a) are the edges in $G_t$ and the added ones in $G_{t'}$; the dashed black and red edges in panel (b) are the ones added in $H_t$ and $H_{t'}$; the black and red edges in panel (c) are the ones in $\widetilde{G}_t$ and $\widetilde{G}_{t'}$.]

2.2. Spectral Clustering

Spectral clustering is a popular clustering algorithm used in practice (Ng et al., 2001), and it can be described with a few lines of code (Algorithm 1).

Algorithm 1 Spectral Clustering(G, k)
1: Input: graph $G = (V, E, w)$, number of clusters $k \in \mathbb{N}$
2: Output: partition $P_1, \ldots, P_k$
3: Compute the bottom $k$ eigenvectors $f_1, \ldots, f_k$ of $\mathcal{L}_G$
4: for $u \in V$ do
5:   $F(u) \leftarrow \frac{1}{\sqrt{\deg_G(u)}} \cdot (f_1(u), \ldots, f_k(u))$
6: end for
7: $(P_1, \ldots, P_k) \leftarrow$ k-means$(\{F(u)\}_{u \in V}, k)$
8: Return $P_1, \ldots, P_k$
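Algorithm 1 translates almost line for line into code. The following is a minimal sketch (ours) using scipy and scikit-learn; it is not the authors' implementation, and numerical details such as the choice of eigensolver are our assumptions:

```python
# Minimal sketch of Algorithm 1 with scipy/scikit-learn; names are ours.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla
from sklearn.cluster import KMeans

def spectral_clustering(adj: sp.csr_matrix, k: int) -> np.ndarray:
    """Partition the vertices of a weighted graph (given by its adjacency
    matrix) into k clusters, following Algorithm 1."""
    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = sp.identity(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt
    # bottom k eigenvectors of the normalised Laplacian (line 3)
    _, eigvecs = spla.eigsh(lap, k=k, which="SM")
    # degree-normalised spectral embedding F(u) (lines 4-6)
    embedding = d_inv_sqrt @ eigvecs
    # k-means on the embedded points (line 7)
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
```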
To analyse the performance of spectral clustering, we examine the scenario in which there is a large gap between $\lambda_{k+1}(\mathcal{L}_G)$ and $\rho_G(k)$. By the higher-order Cheeger inequality, a low value of $\rho_G(k)$ ensures that $V$ can be partitioned into $k$ clusters, each of which has conductance at most $\rho_G(k)$; on the other hand, a large value of $\lambda_{k+1}(\mathcal{L}_G)$ implies that any $(k+1)$-way partition of $V$ would introduce some $A \subseteq V$ with $\Phi_G(A) \geq \rho_G(k+1) \geq \lambda_{k+1}(\mathcal{L}_G)/2$. Based on this, Peng et al. (2017) introduced the parameter
$$\Upsilon_G(k) \triangleq \frac{\lambda_{k+1}(\mathcal{L}_G)}{\rho_G(k)}, \tag{2}$$
and showed that a large value of $\Upsilon_G(k)$ is sufficient to guarantee a good performance of spectral clustering. They further showed that, for a graph $G$ with $m$ edges, spectral clustering runs in $O(m \log^{\beta} m)$ time for constant $\beta \in \mathbb{R}_+$. For convenience of notation, we always order the output of spectral clustering by $P_1, \ldots, P_k$ such that $\mathrm{vol}_G(P_1) \geq \ldots \geq \mathrm{vol}_G(P_k)$.

2.3. Model for Dynamic Graph Clustering

We assume that the initial graph $G_1 = (V_1, E_1)$ with $n_1$ vertices satisfies $\lambda_{k+1}(\mathcal{L}_{G_1}) = \Omega(1)$ and $\rho_{G_1}(k) = O(k^{-8} \log^{-2\gamma}(n_1))$ for some constant $\gamma \in \mathbb{R}_+$. This condition is similar to lower bounding $\Upsilon_{G_1}(k)$, and ensures that the initial input graph $G_1$ has $k$ well-defined clusters. After this, the underlying graph is updated through an edge insertion at each time, and we let $G_t = (V_t, E_t)$ be the graph constructed at time $t$. We assume that every edge insertion introduces at most one new vertex; as such the underlying graph is always connected, and the number of vertices $n_t \triangleq |V_t|$ could increase over time. We further assume that, after every $\Theta(\log^{\gamma}(n_t))$ steps, there is a time $t'$ such that $G_{t'} = (V_{t'}, E_{t'})$ presents a well-defined structure of $k'$ clusters, which is characterised by $\lambda_{k'+1}(\mathcal{L}_{G_{t'}}) = \Omega(1)$ and $\rho_{G_{t'}}(k') = O(k'^{-8} \log^{-2\gamma}(n_{t'}))$ for some $k' \in \mathbb{N}$.

Notice that, since both the number of vertices $n_t$ at time $t$ and the number of clusters could change, our above-defined dynamic gap assumption allows the underlying graph to gradually form a new cluster structure; e.g., $O(\log^{\gamma}(n_1))$ newly added vertices and their adjacent edges could initially form a small new cluster which gradually grows into a large one. On the other side, our assumption prevents the disappearance of the underlying graph's cluster structure throughout the edge updates, which would make the objective function of a clustering algorithm ill-defined.

3. Dynamic Cluster-Preserving Sparsifiers

A graph sparsifier is a sparse representation of an input graph that inherits certain properties of the original dense graph, and their efficient construction plays a key role in designing a number of nearly-linear time graph algorithms. However, typical constructions of graph sparsifiers are based on fast Laplacian solvers, making them difficult to implement in practice. To overcome this, Sun & Zanetti (2019) studied a variant of graph sparsifiers for graph clustering, and introduced the notion of a cluster-preserving sparsifier:

Definition 3.1 (Cluster-preserving sparsifier). Let $G = (V, E)$ be any graph with $k$ clusters, and $\{S_i\}_{i=1}^{k}$ a $k$-way partition of $G$ corresponding to $\rho_G(k)$. We call a re-weighted subgraph $H = (V, F \subseteq E, w_H)$ a cluster-preserving sparsifier of $G$ if (i) $\Phi_H(S_i) = O(k \cdot \Phi_G(S_i))$ for $1 \leq i \leq k$, and (ii) $\lambda_{k+1}(\mathcal{L}_H) = \Omega(\lambda_{k+1}(\mathcal{L}_G))$.

To examine the two conditions of Definition 3.1, notice that a graph $G = (V, E)$ has exactly $k$ clusters if (i) $G$ has $k$ disjoint subsets $S_1, \ldots, S_k$ of low conductance, and (ii) any $(k+1)$-way partition of $G$ would include some $A \subseteq V$ of high conductance, which would be implied by a lower bound on $\lambda_{k+1}(\mathcal{L}_G)$ due to (1).
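Since Definition 3.1 and the assumptions above are phrased in terms of conductance, a direct computation of $\Phi_G(S)$ from a weighted adjacency matrix may be a useful reference. The following sketch (ours) follows the definitions of Section 2.1:

```python
# Reference sketch (ours) of the conductance Phi_G(S) from Section 2.1.
import numpy as np
import scipy.sparse as sp

def conductance(adj: sp.csr_matrix, S: np.ndarray) -> float:
    """Phi_G(S) = w(S, V\\S) / min(vol(S), vol(V\\S)); returns 1.0 for an
    empty S, following the convention of Section 2.1. Assumes S is a
    proper subset of V."""
    if S.size == 0:
        return 1.0
    mask = np.zeros(adj.shape[0], dtype=bool)
    mask[S] = True
    deg = np.asarray(adj.sum(axis=1)).ravel()
    vol_s, vol_rest = deg[mask].sum(), deg[~mask].sum()
    idx, rest = np.where(mask)[0], np.where(~mask)[0]
    cut = adj[idx][:, rest].sum()        # total weight crossing the cut
    return cut / min(vol_s, vol_rest)
```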
With the well-known eigen-gap heuristic and the theoretical analysis of spectral clustering (Peng et al., 2017), these two conditions ensure that the $k$ optimal clusters in $G$ have low conductance in $H$ as well.

3.1. The SZ Algorithm

We first present the algorithm in (Sun & Zanetti, 2019) for constructing a cluster-preserving sparsifier; we call it the SZ algorithm for simplicity. Given any input graph $G = (V, E)$, the algorithm computes
$$p_u(v) \triangleq \min\left\{ \frac{C}{\lambda_{k+1}(\mathcal{L}_G)} \cdot \frac{\log n}{\deg_G(u)},\ 1 \right\} \quad\text{and}\quad p_v(u) \triangleq \min\left\{ \frac{C}{\lambda_{k+1}(\mathcal{L}_G)} \cdot \frac{\log n}{\deg_G(v)},\ 1 \right\}$$
for every $e = \{u, v\}$, where $C \in \mathbb{R}_+$ is some constant. Then, the algorithm samples every $e = \{u, v\}$ with probability $p_e \triangleq p_u(v) + p_v(u) - p_u(v) \cdot p_v(u)$, and sets the weight of every sampled $e = \{u, v\}$ in $H$ as $w_H(u, v) \triangleq 1/p_e$. By setting $F$ as the set of the sampled edges, the algorithm returns $H = (V, F, w_H)$. Sun & Zanetti (2019) proved that, with high probability, $H$ has $\widetilde{O}(n)$ edges and is a cluster-preserving sparsifier of $G$.

On the other side, while Definition 3.1 shows that the optimal clusters $S_i$ ($1 \leq i \leq k$) of $G$ have low conductance in $H$, it doesn't build the connection from the vertex sets of low conductance in $H$ to the ones in $G$. In this paper, we prove that such a connection holds as well; this allows us to apply spectral clustering on $H$, and reason about the conductance of its returned clusters in $G$.

Lemma 3.2. Let $G$ be a graph with $\Upsilon_G(k) = \Omega(k)$ for some $k \in \mathbb{N}$ with optimal clusters $\{S_i\}_{i=1}^{k}$, and $H$ its cluster-preserving sparsifier. Let $\{P_i\}_{i=1}^{k}$ be the output of spectral clustering on $H$, and without loss of generality let the optimal correspondence of $P_i$ be $S_i$ for any $1 \leq i \leq k$. Then, it holds with high probability for any $1 \leq i \leq k$ that
$$\mathrm{vol}_G(P_i \triangle S_i) = O\!\left(\frac{k^2}{\Upsilon_G(k)} \cdot \mathrm{vol}_G(S_i)\right) \quad\text{and}\quad \Phi_G(P_i) = O\!\left(\Phi_G(S_i) + \frac{k^2}{\Upsilon_G(k)}\right),$$
where $A \triangle B \triangleq (A \setminus B) \cup (B \setminus A)$.

3.2. Construction of Dynamic Cluster-Preserving Sparsifiers

Now we design an algorithm that constructs a cluster-preserving sparsifier under edge and vertex insertions, and our algorithm works as follows. Initially, for the input $G_1$ with $n_1$ vertices, a well-defined structure of $k$ clusters and
$$\tau \triangleq \frac{C}{\lambda_{k+1}(\mathcal{L}_{G_1})} \tag{3}$$
for some constant $C \in \mathbb{R}_+$, we run the SZ algorithm and obtain a cluster-preserving sparsifier of $G_1$. In addition to storing the sparsifier $H_1$ of $G_1$, the algorithm employs the vector $sp_1$ to store the values $\log(n_1)/\deg_{G_1}(u)$ for every vertex $u$, which are used to sample the adjacent edges of vertex $u$. See Algorithms 2 and 3 for a formal description.

Algorithm 2 Sample Edge(e, G, τ)
1: Input: edge $e = \{u, v\}$, graph $G = (V, E)$ of $n$ vertices, parameter $\tau \in \mathbb{R}_+$
2: Output: edge $e'$ with weight $w(e')$
3: $p(u, v) \leftarrow p_u(v) + p_v(u) - p_u(v) \cdot p_v(u)$
4: Sample $e$ with probability $p(u, v)$
5: if $e$ is sampled then
6:   $e' \leftarrow e$, $w(e') \leftarrow 1/p(u, v)$
7: else
8:   $e' \leftarrow \emptyset$, $w(e') \leftarrow 0$
9: end if
10: Return $e'$, $w(e')$
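The sampling step of the SZ algorithm (Algorithm 2) can be sketched as follows; this is a minimal illustration of ours, with $\tau = C/\lambda_{k+1}(\mathcal{L}_G)$ supplied by the caller as in Algorithm 2:

```python
# Minimal sketch (ours) of Sample Edge (Algorithm 2): an edge {u, v} is kept
# with probability p_e = p_u(v) + p_v(u) - p_u(v) * p_v(u), and re-weighted
# by 1/p_e so that cut weights are preserved in expectation.
import math
import random

def sample_edge(u: int, v: int, deg: dict, n: int, tau: float):
    """Return (kept, weight) for edge {u, v}; deg maps vertex -> degree."""
    p_u = min(tau * math.log(n) / deg[u], 1.0)
    p_v = min(tau * math.log(n) / deg[v], 1.0)
    p_e = p_u + p_v - p_u * p_v      # P[edge sampled from either endpoint]
    if random.random() < p_e:
        return True, 1.0 / p_e       # inverse-probability weight
    return False, 0.0
```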
Next, given the graph $G_t$ constructed at time $t$, its sparsifier $H_t$, and an edge insertion $e = \{u, v\}$, the algorithm compares, for every vertex $w$, the parameter $\log(n_{t+1})/\deg_{G_{t+1}}(w)$ with $sp_t(w)$, the quantity used to sample the adjacent edges of $w$ the last time, and checks whether the two values have changed significantly. If this is the case, then the sampling probability last used is too far from the correct one when running the static SZ algorithm on $G_{t+1}$, and hence we resample all the edges adjacent to $w$ with the right sampling probability. Otherwise, we simply use the values stored in $sp_t$ to sample the incoming edge $e$, and include it in $H_{t+1}$ if $e$ is sampled. See Algorithm 4 for a formal description[3], and Theorem 3.3 for its performance.

[3] Notice that, since $\lambda_{k+1}(\mathcal{L}_{G_t}) = \Omega(1)$ for any graph $G_t$ exhibiting a well-defined structure of $k$ clusters, and it holds for $G_T$ at time $T = O(\mathrm{poly}(n_1))$ that $n_T = O(\mathrm{poly}(n_1))$, i.e., $\log n_T = O(\log n_1)$, by setting $C$ to be a sufficiently large constant, $\tau \cdot \log n_1$ is the right parameter for defining the sampling probability at time $T = O(\mathrm{poly}(n_1))$.

Algorithm 3 Static SZ Sparsifier(G, τ)
1: Input: $G = (V, E)$ of $n$ vertices, parameter $\tau \in \mathbb{R}_+$
2: Output: cluster-preserving sparsifier $H = (V, F, w_H)$, degree list $sp$
3: $F \leftarrow \emptyset$
4: for $e \in E$ do
5:   $(e', w(e')) \leftarrow$ Sample Edge$(e, G, \tau)$
6:   $F \leftarrow F \cup \{e'\}$, $w_H(e) \leftarrow w(e')$
7: end for
8: $sp \leftarrow \{ \log(n)/\deg_G(u) \mid u \in V \}$
9: Return $H$, $sp$

Algorithm 4 Update Sparsifier($G_t$, $H_t$, $sp_t$, $e$, $\tau$)
1: Input: $G_t = (V_t, E_t)$, $H_t = (V_t, F_t, w_{H_t})$, $sp_t$, incoming edge $e = \{u, v\}$, parameter $\tau$
2: Output: $H_{t+1} = (V_{t+1}, F_{t+1}, w_{H_{t+1}})$, $sp_{t+1}$
3: $V_{\mathrm{new}} \leftarrow \{u, v\} \setminus V_t$
4: $G_{t+1} \leftarrow (V_t \cup V_{\mathrm{new}}, E_t \cup \{e\})$
5: $H_{t+1} \leftarrow (V_t \cup V_{\mathrm{new}}, F_t, w_{H_t})$
6: $sp_{t+1} \leftarrow sp_t$
7: if $V_{\mathrm{new}} \neq \emptyset$ then
8:   $(e', w(e')) \leftarrow$ Sample Edge$(e, G_{t+1}, \tau)$
9:   $F_{t+1} \leftarrow F_{t+1} \cup \{e'\}$, $w_{H_{t+1}}(e) \leftarrow w(e')$
10:  if $u \in V_{\mathrm{new}}$ then
11:    $sp_{t+1}(u) \leftarrow \log(n_{t+1})/\deg_{G_{t+1}}(u)$
12:  end if
13:  if $v \in V_{\mathrm{new}}$ then
14:    $sp_{t+1}(v) \leftarrow \log(n_{t+1})/\deg_{G_{t+1}}(v)$
15:  end if
16: end if
17: $V_{\mathrm{doubled}} \leftarrow \left\{ \hat v \in V_{t+1} \setminus V_{\mathrm{new}} \,\middle|\, \frac{\log(n_{t+1})}{\deg_{G_{t+1}}(\hat v)} > 2 \cdot sp_t(\hat v) \text{ or } \frac{\log(n_{t+1})}{\deg_{G_{t+1}}(\hat v)} < \frac{sp_t(\hat v)}{2} \right\}$
18: if $|V_{\mathrm{doubled}}| > 0$ then
19:   for $\hat u \in V_{\mathrm{doubled}}$ do
20:     $F_{t+1} \leftarrow F_{t+1} \setminus E_{H_{t+1}}(\hat u)$
21:     for $\hat e \in E_{G_{t+1}}$ adjacent to $\hat u$ do
22:       $(\hat e', w(\hat e')) \leftarrow$ Sample Edge$(\hat e, G_{t+1}, \tau)$
23:       $F_{t+1} \leftarrow F_{t+1} \cup \{\hat e'\}$, $w_{H_{t+1}}(\hat e) \leftarrow w(\hat e')$
24:     end for
25:     $sp_{t+1}(\hat u) \leftarrow \log(n_{t+1})/\deg_{G_{t+1}}(\hat u)$
26:   end for
27: else if $V_{\mathrm{new}} = \emptyset$ then
28:   $(e', w(e')) \leftarrow$ Sample Edge$(e, G_{t+1}, \tau)$
29:   $F_{t+1} \leftarrow F_{t+1} \cup \{e'\}$, $w_{H_{t+1}}(e) \leftarrow w(e')$
30: end if
31: Return $H_{t+1}$, $sp_{t+1}$

Theorem 3.3. Let $G_1 = (V_1, E_1)$ be a graph with $n_1$ vertices and a well-defined structure of $k = \widetilde{O}(1)$ clusters, and $\{G_t\}$ the sequence of graphs of $\{n_t\}$ vertices constructed sequentially through an edge insertion at each time. Assuming graph $G_T$ at time $T = O(\mathrm{poly}(n_1))$ has a well-defined structure of $\widetilde{O}(1)$ clusters and $n_T = O(\mathrm{poly}(n_1))$, Algorithm 4 returns a cluster-preserving sparsifier $H_T = (V_T, F_T, w_{H_T})$ of $G_T$ with high probability, and $|F_T| = \widetilde{O}(n_T)$. The algorithm's amortised running time is $O(1)$ per edge update.
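The resampling logic of Algorithm 4 may be easier to see in code. The following sketch (ours) assumes an unweighted graph stored as adjacency dictionaries and reuses the sample_edge sketch from above; it illustrates only the factor-2 drift test and the resampling loop, not every bookkeeping detail of Algorithm 4:

```python
# Sketch (ours) of the resampling test at the heart of Algorithm 4: a vertex's
# incident edges are resampled once its stored sampling parameter
# sp(u) = log(n)/deg(u) drifts outside a factor-2 window of the current value.
import math

def update_sparsifier(G, H, sp, u, v, tau, n):
    """G, H: dicts vertex -> {neighbour: weight}; sp: dict vertex -> float.
    Processes the insertion of edge {u, v}, both assumed already in G and H.
    For simplicity this sketch recomputes sampling probabilities from current
    degrees rather than reusing the stored sp values when sampling."""
    doubled = []
    for w in (u, v):
        current = math.log(n) / len(G[w])
        if current > 2 * sp[w] or current < sp[w] / 2:
            doubled.append(w)
    if doubled:
        for w in doubled:                      # resample all edges at w
            for x in list(H[w]):               # drop old sampled copies
                del H[w][x]; del H[x][w]
            for x in G[w]:
                kept, wt = sample_edge(w, x, {y: len(G[y]) for y in (w, x)}, n, tau)
                if kept:
                    H[w][x] = H[x][w] = wt
            sp[w] = math.log(n) / len(G[w])
    else:                                      # just sample the new edge
        kept, wt = sample_edge(u, v, {y: len(G[y]) for y in (u, v)}, n, tau)
        if kept:
            H[u][v] = H[v][u] = wt
```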
4. Dynamic Spectral Clustering Algorithm

This section presents our main dynamic spectral clustering algorithm, and is organised as follows: in Section 4.1, we present the construction and update procedure of a contracted graph, which is the data structure that summarises the cluster structure of an underlying input graph and allows for quick updates to the clusters; the properties of dynamic contracted graphs are analysed in Section 4.2; we present the main algorithm and analyse its performance in Section 4.3.

4.1. Construction and Update of Contracted Graphs

For any input graph $G_t = (V_t, E_t)$ of $n_t$ vertices, its dynamic cluster-preserving sparsifier $H_t = (V_t, F_t, w_{H_t})$, and its $k$ clusters $P_1, \ldots, P_k$ returned from running spectral clustering on $H_t$, we apply Algorithm 5 to construct a contracted graph $\widetilde{G}_t = (\widetilde{V}_t, \widetilde{E}_t, w_{\widetilde{G}_t})$ of $G_t$. Notice that we introduce the set of non-contracted vertices $\widetilde{V}^{nc}_t = \emptyset$, which will be used later.

Algorithm 5 Contract Graph($H_t$, $P$)
1: Input: cluster-preserving sparsifier $H_t = (V_t, F_t, w_{H_t})$, partition $P = \{P_1, \ldots, P_k\}$
2: Output: contracted graph $\widetilde{G}_t = (\widetilde{V}_t, \widetilde{E}_t, w_{\widetilde{G}_t})$
3: Let $p_i$ be a representative super vertex for each cluster $P_i \in P$
4: $\widetilde{V}^{c}_t \leftarrow \{p_i \mid P_i \in P\}$, $\widetilde{V}^{nc}_t \leftarrow \emptyset$
5: $\widetilde{V}_t \leftarrow \widetilde{V}^{nc}_t \cup \widetilde{V}^{c}_t$
6: $\widetilde{E}_t \leftarrow \emptyset$
7: for $\{p_i, p_j\} \in \widetilde{V}^{c}_t \times \widetilde{V}^{c}_t$ do
8:   $\widetilde{E}_t \leftarrow \widetilde{E}_t \cup \{p_i, p_j\}$
9:   $w_{\widetilde{G}_t}(p_i, p_j) \leftarrow w_{H_t}(P_i, P_j)$
10: end for
11: Return $\widetilde{G}_t = (\widetilde{V}_t, \widetilde{E}_t, w_{\widetilde{G}_t})$

Lemma 4.1. The algorithm Contract Graph($H_t$, $P$) returns $\widetilde{G}_t = (\widetilde{V}_t, \widetilde{E}_t, w_{\widetilde{G}_t})$ in $O(|F_t|)$ time.

Next we discuss how the contracted graph is updated under edge and vertex insertions. Given the graph $G_t = (V_t, E_t)$ with $n_t$ vertices that satisfies $\lambda_{k+1}(\mathcal{L}_{G_t}) = \Omega(1)$ and $\rho_{G_t}(k) = O(k^{-8} \log^{-2\gamma}(n_t))$ for some constant $\gamma \in \mathbb{R}_+$, its cluster-preserving sparsifier $H_t = (V_t, F_t, w_{H_t})$, the corresponding contracted graph $\widetilde{G}_t = (\widetilde{V}_t, \widetilde{E}_t, w_{\widetilde{G}_t})$, and the upcoming edge insertion $e = \{u, v\}$, we construct $\widetilde{G}_{t+1}$ from $\widetilde{G}_t$ as follows:

- If either $u$ or $v$ is a new vertex, the algorithm adds the vertex to $\widetilde{G}_t$ as a non-contracted vertex. The algorithm sets $V_{\mathrm{new}} = \{u, v\} \setminus V_t$, and $V_{t+1} = V_t \cup V_{\mathrm{new}}$.
- For every existing vertex $w \in \{u, v\} \setminus V_{\mathrm{new}}$ that belongs to some $P_i$, the algorithm checks whether $\deg_{G_{t+1}}(w) > 2 \cdot \deg_{G_r}(w)$, where $\deg_{G_r}(w)$ for $r \leq t$ is the degree of $w$ when the contracted graph was constructed. If it is the case, the algorithm pulls $w$ out of $p_i$, and adds it to $\widetilde{V}_{t+1}$, i.e., it uses a single vertex in $\widetilde{G}_{t+1}$ to represent $w$.
- The algorithm adjusts the edge weights in the contracted graph based on the types of the vertices. For instance, the algorithm sets $w_{\widetilde{G}_{t+1}}(u, v) = 1$ if both $u$ and $v$ are non-contracted vertices, and decreases the self-loop weight $w_{\widetilde{G}_{t+1}}(p_u, p_u)$ if vertex $u$ is pulled out of its cluster $P_u \in P$.

See Algorithm 7 in the appendix for the formal description of the algorithm Update Contracted Graph($G_t$, $\widetilde{G}_t$, $e$).

Lemma 4.2. The amortised time complexity of Update Contracted Graph($G_t$, $\widetilde{G}_t$, $e$) is $O(1)$.
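A compact way to implement Contract Graph (Algorithm 5) is via a cluster-membership matrix; the following sketch (ours) computes all pairwise cut weights $w_{H_t}(P_i, P_j)$ in one sparse triple product:

```python
# Sketch (ours) of Contract Graph (Algorithm 5): every cluster collapses to a
# super vertex, and the weight between two super vertices is the total
# sparsifier weight crossing between the corresponding clusters (the diagonal
# records intra-cluster weight, i.e. the self-loops).
import numpy as np
import scipy.sparse as sp

def contract_graph(adj_H: sp.csr_matrix, labels: np.ndarray) -> sp.csr_matrix:
    """adj_H: weighted adjacency of the sparsifier; labels[u] = cluster of u.
    Returns the k x k weighted adjacency of the contracted graph."""
    k = labels.max() + 1
    n = adj_H.shape[0]
    # 0/1 membership matrix M with M[u, i] = 1 iff u is in cluster i
    member = sp.csr_matrix(
        (np.ones(n), (np.arange(n), labels)), shape=(n, k))
    return (member.T @ adj_H @ member).tocsr()  # entry (i, j) = w_H(P_i, P_j)
```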
4.2. Properties of the Contracted Graph

Now we analyse the properties of the contracted graph. Since the amortised time complexity for every edge update (Theorem 3.3 and Lemma 4.2) remains valid when we consider a sequence of edge updates at every time, without loss of generality let $G_{t'} = (V_t \cup V_{\mathrm{new}}, E_t \cup E_{\mathrm{new}})$ be the graph after a sequence of edge updates from $G_t = (V_t, E_t)$ with $n_t$ vertices, and $\widetilde{G}_{t'}$ the contracted graph of $G_{t'}$ constructed by sequentially running Update Contracted Graph for each $e \in E_{\mathrm{new}}$. We assume that $|E_{\mathrm{new}}| \leq \log^{\gamma}(n_t)$ for some $\gamma \in \mathbb{R}_+$.

We first prove that the clusters returned by spectral clustering on $H_t$ also have low conductance in $G_t$. Notice that, as the underlying graph $G_t$ could be dense over time, running a clustering algorithm on its sparsifier $H_t$ with $\widetilde{O}(n_t)$ edges is crucial to achieve the algorithm's quick update time.

Lemma 4.3. It holds with high probability that $\Phi_{H_t}(P_i) = O(k^2 \cdot \rho_{G_t}(k))$ and $\Phi_{G_t}(P_i) = O(k^2 \cdot \rho_{G_t}(k))$ for all $P_i \in P$.

Next, we define the event $\mathcal{E}_1$ that $\Phi_{H_t}(P_i) = O(k^{-6} \log^{-2\gamma}(n_t))$ and $\Phi_{G_t}(P_i) = O(k^{-6} \log^{-2\gamma}(n_t))$ hold for all $P_i \in P$. By the fact that $\lambda_{k+1}(\mathcal{L}_{G_t}) = \Omega(1)$, $\rho_{G_t}(k) = O(k^{-8} \log^{-2\gamma}(n_t))$ and Lemma 4.3, $\mathcal{E}_1$ holds with high probability. We further define the event $\mathcal{E}_2$ that $(1/2) \deg_{G_t}(u) \leq \deg_{H_t}(u) \leq (3/2) \deg_{G_t}(u)$ holds for all $u \in V_t$, and know from the proof of Theorem 3.3 that $\mathcal{E}_2$ holds with high probability. In the following we assume that both $\mathcal{E}_1$ and $\mathcal{E}_2$ happen.

Next, we study the relationship between the cluster structure in $G_{t'}$ and the one in $\widetilde{G}_{t'}$. Recall that the number of vertices in $\widetilde{G}_{t'}$ is much smaller than the one in $G_{t'}$. We first prove that there are $\ell$ disjoint vertex sets of low conductance in $G_{t'}$ if and only if there are $\ell$ such vertex sets in $\widetilde{G}_{t'}$.

Lemma 4.4. The following statements hold:
- If $\rho_{G_{t'}}(\ell) \leq \log^{-\alpha}(n_{t'})$ holds for some $\ell \in \mathbb{N}$ and $\alpha > 0$, then $\rho_{\widetilde{G}_{t'}}(\ell) = \max\left\{ O\!\left(\log^{-0.9\alpha}(n_{t'})\right),\ O\!\left(k^{-6} \log^{-\gamma}(n_{t'})\right) \right\}$.
- If $\rho_{\widetilde{G}_{t'}}(\ell) \leq \log^{-\delta}(n_{t'})$ holds for some $\ell \in \mathbb{N}$ and $\delta > 0$, then $\rho_{G_{t'}}(\ell) = \max\left\{ O\!\left(\log^{-\delta}(n_{t'})\right),\ O\!\left(k^{-6} \log^{-\gamma}(n_{t'})\right) \right\}$.

Secondly, we show that there is a close connection between $\lambda_{\ell+1}(\mathcal{L}_{G_{t'}})$ and $\lambda_{\ell+1}(\mathcal{L}_{\widetilde{G}_{t'}})$ for any $\ell \in \mathbb{N}$.

Lemma 4.5. The following statements hold:
- If $\lambda_{\ell+1}(\mathcal{L}_{\widetilde{G}_{t'}}) = \Omega(1)$ for some $\ell \in \mathbb{N}$, then $\lambda_{\ell+1}(\mathcal{L}_{G_{t'}}) = \Omega\!\left(\log^{-\alpha}(n_{t'})/\ell^{6}\right)$ for constant $\alpha > 0$.
- If $\lambda_{\ell+1}(\mathcal{L}_{G_{t'}}) = \Omega(1)$ holds for some $\ell \in \mathbb{N}$, then $\lambda_{\ell+1}(\mathcal{L}_{\widetilde{G}_{t'}}) = \Omega(1)$.

Lemmas 4.4 and 4.5 imply that the cluster structures in $G_{t'}$ and $\widetilde{G}_{t'}$ are approximately preserved.

4.3. Main Algorithm

Our main algorithm consists of the preprocessing stage, update stage, and query stage. They are described as follows:

Preprocessing Stage. For the initial input graph $G_1 = (V_1, E_1)$, we apply (i) Static SZ Sparsifier($G_1$, $\tau$) to obtain $H_1 = (V_1, F_1, w_{H_1})$, (ii) Spectral Clustering($H_1$, $k$) to obtain the initial partition $P = \{P_1, \ldots, P_k\}$, and (iii) Contract Graph($H_1$, $P$) to obtain $\widetilde{G}_1 = (\widetilde{V}_1, \widetilde{E}_1)$.

Update Stage. When a new edge arrives at time $t$, we apply Algorithm 4 and the update procedure of the contracted graph (Section 4.1) to dynamically maintain $H_t$ and $\widetilde{G}_t$.

Query Stage. When a query for a new clustering starts at time $T$, the algorithm performs the following operations, where $\gamma$ is the constant satisfying $\gamma > \beta$ and $\gamma > 0.9\alpha$: for $r$ being the last time at which $\widetilde{G}_t$ was recomputed, the algorithm checks if $T - r \leq \log^{\gamma}(n_r)$, i.e., whether the number of added edges after the last reconstruction of the contracted graph is less than $\log^{\gamma}(n_r)$. If it is the case, then the algorithm runs spectral clustering on the contracted graph $\widetilde{G}_T$. Otherwise, the algorithm runs spectral clustering on $H_T$. It also recomputes $\widetilde{G}_T$, by first computing $\widetilde{G}_{r'}$, where $r'$ is the last time at which the dynamic gap assumption holds, and then updating $\widetilde{G}_{r'}$ to $\widetilde{G}_T$ with the edge updates between time $r'$ and $T$. See Algorithm 6 for a formal description.

Algorithm 6 Query Spec Clustering($G_T$, $H_T$, $\widetilde{G}_T$, $\gamma$, $\ell$)
1: Input: graphs $G_T$, $H_T$, and $\widetilde{G}_T$; $\gamma \in \mathbb{R}_+$; $\ell \in \mathbb{N}$
2: Output: partition $P = \{P_1, \ldots, P_\ell\}$
3: Let $r$ be the last time at which $\widetilde{G}_T$ was recomputed
4: if $T - r \leq \log^{\gamma}(n_r)$ then
5:   $(P_1, \ldots, P_\ell) \leftarrow$ Spectral Clustering($\widetilde{G}_T$, $\ell$)
6:   Return $\{P_1, \ldots, P_\ell\}$
7: else
8:   $(P_1, \ldots, P_\ell) \leftarrow$ Spectral Clustering($H_T$, $\ell$)
9:   Recompute $\widetilde{G}_{r'}$, where $r'$ is the last time at which the dynamic gap assumption holds
10:  Update $\widetilde{G}_{r'}$ to $\widetilde{G}_T$ with the edge updates between time $r'$ and $T$
11:  Return $\{P_1, \ldots, P_\ell\}$
12: end if

Theorem 4.6. Let $G_1 = (V_1, E_1)$ be a graph with $n_1$ vertices and $k = \widetilde{O}(1)$ clusters, and $\{G_t\}$ the sequence of graphs of $\{n_t\}$ vertices constructed through an edge insertion at each time satisfying the dynamic gap assumption. Assume that $G_T$ at query time $T$ has $\ell$ clusters, i.e., $\lambda_{\ell+1}(\mathcal{L}_{G_T}) = \Omega(1)$ and $\rho_{G_T}(\ell) = O(\ell^{-1} \log^{-\alpha}(n_T))$ for $\alpha \in \mathbb{R}_+$. Then, with high probability Algorithm 6 returns $P_1, \ldots, P_\ell$ with $\Phi_{G_T}(P_i) = O\!\left(\ell \cdot \log^{-0.9\alpha}(n_T)\right)$ for every $1 \leq i \leq \ell$. The algorithm's running time for returning the clusters of $G_1$ is $\widetilde{O}(|E_1|)$. Afterwards, the algorithm's amortised update time is $O(1)$, and its amortised query time is $o(n_T)$.
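The case split of Algorithm 6 can be sketched as follows (ours; `state`, `expand`, and `rebuild_contracted_graph` are hypothetical helpers bundling the quantities maintained by the update stage):

```python
# Sketch (ours) of the query stage (Algorithm 6): cluster the small contracted
# graph while few edges arrived since its last rebuild, and fall back to the
# sparsifier (plus a rebuild) otherwise. `spectral_clustering` is the sketch
# from Section 2.2; `contract_graph` is the sketch from Section 4.1.
import math

def query_clusters(state, T, ell, gamma):
    """state bundles the current sparsifier H_T, contracted graph Gc_T, the
    last rebuild time r, and vertex counts; names are ours."""
    if T - state.r <= math.log(state.n[state.r]) ** gamma:
        # cheap path: the contracted graph has only k + polylog(n) vertices
        labels = spectral_clustering(state.Gc_T, ell)
        return state.expand(labels)     # map super-vertex labels back to V
    # expensive path, amortised over log^gamma(n) updates
    labels = spectral_clustering(state.H_T, ell)
    state.rebuild_contracted_graph(labels)
    return labels
```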
Proof. The algorithm's running time and approximation guarantee on $G_1$ follow from (Macgregor & Sun, 2022), so we only need to analyse the dynamic update stage. We first analyse the conductance of every output $P_i$. Notice that, if Lines 4-6 of Algorithm 6 are executed, then by Lemmas 3.2, 4.4 and 4.5 the approximation guarantee holds. Otherwise, Lines 7-12 are executed, and by the dynamic gap condition and Lemma 3.2 the approximation guarantee holds as well.

Next, we prove the running time guarantee. The $O(1)$ amortised update time of $H_t$ and $\widetilde{G}_t$ follows by Theorem 3.3 and Lemma 4.2. For the query at time $T$, notice that if Lines 4-6 are executed, then the query time is at most $O(|\widetilde{V}_T|^3) = O((k + \log^{\gamma}(n_T))^3) = \widetilde{O}(1)$; here the super vertices are used as sketches to quickly update the cluster assignment of each vertex. Otherwise, Lines 7-12 are executed, and the query time is dominated by spectral clustering's time complexity of $O(n_T \log^{\beta}(n_T))$. Since this only happens every $\log^{\gamma}(n_r) = O(\log^{\gamma}(n_T))$ edge updates, the amortised query time is $O(n_T \log^{\beta - \gamma}(n_T)) = o(n_T)$.

Finally, we show that the number of clusters $\ell$ can be identified within our claimed time complexity. Notice that, if Lines 4-6 of the algorithm are executed, then by Lemmas 4.4 and 4.5 we can detect the spectral gap in $G_T$ using $\widetilde{G}_T$; hence we can choose $\ell$ in $o(n_T)$ time. Otherwise, Lines 7-12 are executed. In this case, we run spectral clustering with different values of $\ell$ and find the correct value of $\ell$ (the same procedure is used to recompute $\widetilde{G}_{r'}$). Since there are $\widetilde{O}(1)$ clusters in total, we achieve the same query time guarantee.

5. Experiments

We experimentally evaluate the performance of our algorithm on synthetic and real-world datasets. We report the clustering accuracy of all tested algorithms using the Adjusted Rand Index (ARI) (Rand, 1971), and compute the average and standard deviation over 10 independent runs. Algorithms were implemented in Python 3.12.1, and experiments were performed on a Lenovo ThinkPad T15g with an Intel(R) Xeon(R) W-10855M CPU @ 2.80GHz and 126 GB RAM. Our code can be downloaded from https://github.com/steinarlaenen/Dynamic-Spectral-Clustering-With-Provable-Approximation-Guarantee.

[Figure 2 (four panels; legend: SC on $G_T$, SC on $H_T$, SC on $\widetilde{G}_T$). Results on the two versions of our dynamic SBM. Panels (a) and (b) report the average ARI score at each time $T$ for the clustering results on $G_T$, $H_T$, and $\widetilde{G}_T$; panels (c) and (d) report the running time in seconds at each time $T$. Shaded regions indicate the standard deviation.]

5.1. Results on Synthetic Data

We study graphs generated from the stochastic block model (SBM), and introduce two dynamic extensions to generate new clusters and merge existing clusters.

SBM with increasing number of clusters. We generate the first graph $G_1$ based on the standard SBM, and set $k = 10$ and the number of vertices in each cluster $\{S_i\}_{i=1}^{k}$ as $n_k = 10{,}000$. For every pair of $u \in S_i$ and $v \in S_j$ we include edge $\{u, v\}$ with probability $p$ if $i = j$, and with probability $q$ if $i \neq j$.
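The initial SBM graph $G_1$ described above can be generated as follows; this sketch (ours) is a direct Bernoulli sampler, and since it draws one random number per vertex pair it is only intended for moderate $n$:

```python
# Sketch (ours) of generating the initial SBM graph G_1 from Section 5.1:
# k clusters of equal size, intra-cluster edge probability p and
# inter-cluster probability q.
import numpy as np
import scipy.sparse as sp

def sample_sbm(k: int, cluster_size: int, p: float, q: float, seed: int = 0):
    rng = np.random.default_rng(seed)
    n = k * cluster_size
    labels = np.repeat(np.arange(k), cluster_size)
    rows, cols = [], []
    for i in range(n):
        # candidate neighbours j > i, each kept with probability p or q
        same = labels[i + 1:] == labels[i]
        probs = np.where(same, p, q)
        picked = np.flatnonzero(rng.random(n - i - 1) < probs) + i + 1
        rows.extend([i] * len(picked))
        cols.extend(picked)
    adj = sp.coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    return (adj + adj.T).tocsr(), labels   # symmetrise: undirected graph
```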
To update the graph, we generate a batch of edge updates in two steps: first, we randomly select a subset $Q \subseteq V(G_1)$ such that $|Q| = n_{\mathrm{new}} = 400$, and for any $u, v \in Q$ we include edge $e = \{u, v\}$ in the graph with probability $r_1$; setting $r_1$ sufficiently large ensures that the set $Q$ forms a new cluster in the graph. Second, for any $u, v \in V(G_1)$ we include edge $e = \{u, v\}$ with probability $s$. The edges sampled from these two processes form one edge update batch. We sample 10 such batches (ensuring no new clusters overlap), each inducing a new cluster and additional noise.

To cluster each $G_T$, we run spectral clustering (SC) on three graphs:
1. We run spectral clustering on the full graph $G_T$.
2. We construct the contracted graph $\widetilde{G}_1$ at time $T = 1$, and incrementally update $\widetilde{G}_1$ using the procedure described in Section 4.1. Then, we run spectral clustering on each $\widetilde{G}_T$.
3. We construct a cluster-preserving sparsifier $H_1$ using Algorithm 3, which we dynamically update using Algorithm 4 with sampling parameter $\tau = 3$, and cluster each subsequent $H_T$.

At each time $T$, we run spectral clustering with $k = 10 + T - 1$ on all three graphs, and report the running times and ARI scores. We set $p = 0.1$, $q = 0.01$, $r_1 = 0.95$, and $s = 0.00001$, and plot the results in the left plots of Figure 2. We can see that at every time $T$, spectral clustering on $G_T$ returns the perfect clustering, and spectral clustering on $\widetilde{G}_T$ and $H_T$ returns marginally worse clustering results. On the running time, we see that running spectral clustering on $G_T$, $H_T$ and $\widetilde{G}_T$ takes around 100 seconds, 50 seconds, and less than 1 second respectively. This highlights that our algorithm returns nearly-optimal clusters with much faster running time than running spectral clustering on $G_T$ or $H_T$.

Next, we compare the spectral gaps of $\mathcal{L}_{G_T}$ and $\mathcal{L}_{\widetilde{G}_T}$ for every $T$, and Table 1 reports that these gaps are well preserved. This demonstrates that, as we proved earlier, the new cluster structure of $G_T$ can indeed be identified from $\widetilde{G}_T$.

Table 1. Spectral gaps in $\mathcal{L}_{G_T}$ and $\mathcal{L}_{\widetilde{G}_T}$ for the SBM with increasing number of clusters. We report $\lambda_{k+T}(\mathcal{L}_{G_T})/\lambda_{k+T-1}(\mathcal{L}_{G_T})$ and $\lambda_{k+T}(\mathcal{L}_{\widetilde{G}_T})/\lambda_{k+T-1}(\mathcal{L}_{\widetilde{G}_T})$ at each time $T$.

T                  | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11
$G_T$              | 6.3 | 5.8 | 5.8 | 5.7 | 5.7 | 5.7 | 5.6 | 5.6 | 5.6 | 5.5
$\widetilde{G}_T$  | 9.3 | 9.2 | 9.0 | 8.7 | 8.5 | 8.1 | 7.8 | 7.5 | 7.1 | 6.8

[Figure 3 (four panels; legend: SC on $G_T$, SC on $H_T$, SC on $\widetilde{G}_T$; y-axes: ARI and cumulative time in seconds). Results on MNIST and EMNIST. Panels (a) and (b) report the average ARI scores at each time $T$ for the clustering results on $G_T$, $H_T$, and $\widetilde{G}_T$; panels (c) and (d) report the average cumulative running time in seconds at each time $T$. Shaded regions indicate the standard deviation.]

SBM with decreasing number of clusters. We set $k = 25$, and the first graph $G_1$ is generated based on the standard SBM with parameters $p$ and $q$. For clusters $\{S_i\}_{i=1}^{5}$ we set $|S_i| = 20{,}000$, and for $\{S_i\}_{i=6}^{25}$ we set $|S_i| = 500$; hence there are 5 large and 20 small clusters. To update the graph, we generate a batch of edge updates as follows: we randomly choose two clusters $S_i$ and $S_j$ such that $|S_i| = |S_j| = 500$, and for any $u \in S_i$ and $v \in S_j$ we include edge $e = \{u, v\}$ in the graph with probability $r_2$. Setting $r_2$ sufficiently large ensures that clusters $S_i$ and $S_j$ merge. Similarly as before, for any $u, v \in V(G_1)$ we also include edge $e = \{u, v\}$ with probability $s$. All the edges sampled by these two processes form a single batch update.
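One batch of the merge process just described can be sketched as follows (ours; the noise pairs are sampled with replacement, which is an approximation of the description above):

```python
# Sketch (ours) of one update batch from the decreasing-clusters experiment:
# densely connect two chosen small clusters (probability r2) so that they
# merge, plus uniform noise edges (probability s) across the whole graph.
import numpy as np

def merge_batch(members_i, members_j, n, r2, s, rng):
    """members_i, members_j: vertex arrays of the two clusters to merge."""
    batch = []
    for u in members_i:                       # dense bipartite block: merge
        mask = rng.random(len(members_j)) < r2
        batch.extend((u, v) for v in members_j[mask])
    n_noise = rng.binomial(n * (n - 1) // 2, s)   # number of noise pairs
    for _ in range(n_noise):
        u, v = rng.integers(0, n, size=2)
        if u != v:
            batch.append((min(u, v), max(u, v)))
    return batch
```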
We sample 10 such batches, and there are $k = 15$ clusters at the final time $T = 11$. At each time $T$, we run spectral clustering with $k = 25 - T + 1$ on all three graphs. We set $p = 0.1$, $q = 0.001$, $r_2 = 0.95$, and $s = 0.00001$, and plot the results in the right plots of Figure 2. Similar to the SBM with increasing number of clusters, at every time $T$, spectral clustering on all three graphs returns similar results. We further see that spectral clustering on $\widetilde{G}_T$ has lower running time than the one on $G_T$ and $H_T$. The spectral gaps in $G_T$ and $\widetilde{G}_T$ are reported in Table 2.

Table 2. Spectral gaps in $\mathcal{L}_{G_T}$ and $\mathcal{L}_{\widetilde{G}_T}$ for the SBM with decreasing number of clusters. We report $\lambda_{k-T+2}(\mathcal{L}_{G_T})/\lambda_{k-T+1}(\mathcal{L}_{G_T})$ and $\lambda_{k-T+2}(\mathcal{L}_{\widetilde{G}_T})/\lambda_{k-T+1}(\mathcal{L}_{\widetilde{G}_T})$ at each time $T$.

T                  | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11
$G_T$              | 4.5 | 4.4 | 4.2 | 4.0 | 4.0 | 3.9 | 3.8 | 3.8 | 3.8 | 3.6
$\widetilde{G}_T$  | 8.3 | 8.0 | 7.5 | 7.4 | 7.4 | 7.0 | 6.8 | 6.3 | 5.8 | 5.4

5.2. Results on Real-World Data

We further evaluate our algorithm on the MNIST dataset (Lecun et al., 1998), which consists of 10 classes of handwritten digits and has 70,000 images, and the letter subset of the EMNIST dataset (Cohen et al., 2017), which consists of 26 classes of handwritten letters and has 145,600 images. We construct a $k$-nearest neighbour graph for each dataset, and set $k = 100$ (resp. $k = 200$) for MNIST (resp. EMNIST). We select four classes (clusters) at random; the chosen vertices and adjacent edges in the $k$-nearest neighbour graph form $G_1$. To construct the sequence of updates, we select one new cluster (resp. two) at random for MNIST (resp. EMNIST), and add the edges inside the new cluster as well as the ones between the new and existing clusters. We randomly partition these new edges into 10 batches of equal size, and add them to the graph sequentially. We recompute $\widetilde{G}_T$ after one class (resp. two) is streamed for MNIST (resp. EMNIST), and report the results in Figure 3. The update/reconstruction time is included in the running time. Our experiments on real-world data further confirm that, as the size of the underlying graph and its number of clusters increase over time, our designed algorithm has much lower running time compared with repeated execution of spectral clustering, while producing comparable clustering results.

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

Acknowledgements

This work is supported by an EPSRC Early Career Fellowship (EP/T00729X/1). Part of this work was done when He Sun was visiting the Simons Institute for the Theory of Computing in Fall 2023.

References

Beimel, A., Kaplan, H., Mansour, Y., Nissim, K., Saranurak, T., and Stemmer, U. Dynamic algorithms against an adaptive adversary: generic constructions and lower bounds. In 54th Annual ACM Symposium on Theory of Computing (STOC'22), pp. 1671-1684, 2022.

Bernstein, A., van den Brand, J., Gutenberg, M. P., Nanongkai, D., Saranurak, T., Sidford, A., and Sun, H. Fully-dynamic graph sparsifiers against an adaptive adversary. In 49th International Colloquium on Automata, Languages, and Programming (ICALP'22), pp. 20:1-20:20, 2022.

Chung, F. and Lu, L. Concentration inequalities and martingale inequalities: a survey. Internet Mathematics, 3(1):79-127, 2006.

Cohen, G., Afshar, S., Tapson, J., and van Schaik, A. EMNIST: Extending MNIST to handwritten letters. In 2017 International Joint Conference on Neural Networks (IJCNN), pp. 2921-2926, 2017.
Dhanjal, C., Gaudel, R., and Clémençon, S. Efficient eigen-updating for spectral graph clustering. Neurocomputing, 131:440-452, 2014.

Lecun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.

Lee, J. R., Gharan, S. O., and Trevisan, L. Multiway spectral partitioning and higher-order Cheeger inequalities. Journal of the ACM, 61(6):1-30, 2014.

Macgregor, P. and Sun, H. A tighter analysis of spectral clustering, and beyond. In 39th International Conference on Machine Learning (ICML'22), pp. 14717-14742, 2022.

Martin, L., Loukas, A., and Vandergheynst, P. Fast approximate spectral clustering for dynamic networks. In 35th International Conference on Machine Learning (ICML'18), pp. 3420-3429, 2018.

Ng, A. Y., Jordan, M. I., and Weiss, Y. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 15 (NeurIPS'01), pp. 849-856, 2001.

Ning, H., Xu, W., Chi, Y., Gong, Y., and Huang, T. Incremental spectral clustering with application to monitoring of evolving blog communities. In Proceedings of the 2007 SIAM International Conference on Data Mining (SDM), pp. 261-272, 2007.

Peng, R., Sun, H., and Zanetti, L. Partitioning well-clustered graphs: Spectral clustering works! SIAM Journal on Computing, 46(2):710-743, 2017.

Rand, W. M. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846-850, 1971.

Saranurak, T. and Wang, D. Expander decomposition and pruning: Faster, stronger, and simpler. In 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'19), pp. 2616-2635, 2019.

Sun, H. and Zanetti, L. Distributed graph clustering and sparsification. ACM Transactions on Parallel Computing, 6(3):17:1-17:23, 2019.

Thorup, M. Fully-dynamic min-cut. Combinatorica, 27(1):91-127, 2007.

Tropp, J. A. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389-434, 2012.

von Luxburg, U. A tutorial on spectral clustering. Statistics and Computing, 17(4):395-416, 2007.

A. Omitted Details from Section 3

This section presents the details omitted from Section 3, and is organised as follows: we prove Lemma 3.2 in Section A.1, and Theorem 3.3 in Section A.2.

A.1. Proof of Lemma 3.2

We first prove a structure theorem. We define the vectors $\chi_1, \ldots, \chi_k$ to be the indicator vectors of the optimal clusters $S_1, \ldots, S_k$ in $G$, where $\chi_i(u) = 1$ if $u \in S_i$, and $\chi_i(u) = 0$ otherwise. We further use $g_1, \ldots, g_k$ to denote the indicator vectors of the optimal clusters $S_1, \ldots, S_k$ in $G$, normalised by the degrees in $H$, i.e.,
$$g_i \triangleq \frac{D_H^{1/2} \chi_i}{\|D_H^{1/2} \chi_i\|}. \tag{4}$$

Theorem A.1. Let $S_1, \ldots, S_k$ be a $k$-way partition of $G$ achieving $\rho_G(k)$, and $\Upsilon_G(k) = \Omega(k)$; let $\{f_i\}_{i=1}^{k}$ be the first $k$ eigenvectors of $\mathcal{L}_H$, and let $\{g_i\}_{i=1}^{k}$ be defined as in (4) above. Then, the following statements hold:
1. For any $i \in [k]$, there is $\hat f_i \in \mathbb{R}^n$, which is a linear combination of $f_1, \ldots, f_k$, such that $\|g_i - \hat f_i\|^2 = O(k/\Upsilon_G(k))$.
2. There are vectors $\hat g_1, \ldots, \hat g_k$, each of which is a linear combination of $g_1, \ldots, g_k$, such that $\sum_{i=1}^{k} \|f_i - \hat g_i\|^2 = O(k^2/\Upsilon_G(k))$.

Proof. Let $\hat f_i \triangleq \sum_{j=1}^{k} \langle g_i, f_j \rangle f_j$, and write $g_i$ as a linear combination of the vectors $f_1, \ldots, f_n$ by $g_i = \sum_{j=1}^{n} \langle g_i, f_j \rangle f_j$. Since $\hat f_i$ is a projection of $g_i$, the vector $g_i - \hat f_i$ is perpendicular to $\hat f_i$ and
$$\|g_i - \hat f_i\|^2 = \|g_i\|^2 - \|\hat f_i\|^2 = \sum_{j=1}^{n} \langle g_i, f_j \rangle^2 - \sum_{j=1}^{k} \langle g_i, f_j \rangle^2 = \sum_{j=k+1}^{n} \langle g_i, f_j \rangle^2.$$
Now, let us consider the quadratic form
$$g_i^\top \mathcal{L}_H g_i = \left(\sum_{j=1}^{n} \langle g_i, f_j \rangle f_j\right)^{\!\top} \mathcal{L}_H \left(\sum_{j=1}^{n} \langle g_i, f_j \rangle f_j\right) = \sum_{j=1}^{n} \langle g_i, f_j \rangle^2 \lambda_j(\mathcal{L}_H) \geq \lambda_{k+1}(\mathcal{L}_H)\, \|g_i - \hat f_i\|^2 = \Omega(\lambda_{k+1}(\mathcal{L}_G))\, \|g_i - \hat f_i\|^2, \tag{5}$$
where the second-to-last inequality follows by the fact that $\lambda_j(\mathcal{L}_H) \geq 0$ holds for any $1 \leq j \leq n$, and the last step follows because $H$ is a cluster-preserving sparsifier of $G$. This gives us that
$$g_i^\top \mathcal{L}_H g_i = \sum_{\{u,v\} \in E_H} w_H(u, v) \left(\frac{g_i(u)}{\sqrt{\deg_H(u)}} - \frac{g_i(v)}{\sqrt{\deg_H(v)}}\right)^{\!2} = \sum_{\{u,v\} \in E_H} \frac{w_H(u, v)}{\mathrm{vol}_H(S_i)} \left(\chi_i(u) - \chi_i(v)\right)^2 = \frac{w_H(S_i, V \setminus S_i)}{\mathrm{vol}_H(S_i)} = O(k \cdot \rho_G(k)), \tag{6}$$
where the last line holds because $H$ is a cluster-preserving sparsifier of $G$. Combining (5) with (6), we have that
$$\|g_i - \hat f_i\|^2 \leq \frac{g_i^\top \mathcal{L}_H g_i}{\Omega(\lambda_{k+1}(\mathcal{L}_G))} \leq \frac{O(k \cdot \rho_G(k))}{\Omega(\lambda_{k+1}(\mathcal{L}_G))} = O\!\left(\frac{k}{\Upsilon_G(k)}\right),$$
which proves the first statement of the theorem.

Now we prove the second statement. We define for any $1 \leq i \leq k$ that $\hat g_i \triangleq \sum_{j=1}^{k} \langle f_i, g_j \rangle g_j$, and have that
$$\sum_{i=1}^{k} \|f_i - \hat g_i\|^2 = \sum_{j=1}^{k} \left( \sum_{i=1}^{n} \langle g_j, f_i \rangle^2 - \sum_{i=1}^{k} \langle g_j, f_i \rangle^2 \right) \leq \sum_{j=1}^{k} O\!\left(\frac{k}{\Upsilon_G(k)}\right) = O\!\left(\frac{k^2}{\Upsilon_G(k)}\right),$$
where the last inequality follows by the first statement of Theorem A.1.

Proof Sketch of Lemma 3.2. The proof follows Theorem 1.2 of (Peng et al., 2017) and Theorem 2 of (Macgregor & Sun, 2022), which imply that every returned cluster $P_i$ ($1 \leq i \leq k$) from spectral clustering on $G$ satisfies
$$\mathrm{vol}_G(P_i \triangle S_i) = O\!\left(\frac{k}{\Upsilon_G(k)} \cdot \mathrm{vol}_G(S_i)\right) \quad\text{and}\quad \Phi_G(P_i) = O\!\left(\Phi_G(S_i) + \frac{k}{\Upsilon_G(k)}\right),$$
where $S_i$ is the optimal correspondence of $P_i$ in $G$. Since $H$ is a cluster-preserving sparsifier of $G$, we know that $\rho_H(k) = O(k \cdot \rho_G(k))$ and $\lambda_{k+1}(\mathcal{L}_H) = \Omega(\lambda_{k+1}(\mathcal{L}_G))$, which implies that
$$\Upsilon_H(k) = \frac{\lambda_{k+1}(\mathcal{L}_H)}{\rho_H(k)} = \frac{\Omega(\lambda_{k+1}(\mathcal{L}_G))}{O(k \cdot \rho_G(k))} = \Omega\!\left(\frac{\Upsilon_G(k)}{k}\right). \tag{7}$$
On the other side, compared with their work, we need to apply the bottom $k$ eigenvectors of $\mathcal{L}_H$ instead of $\mathcal{L}_G$ to run spectral clustering. As such, combining (7) with the adjusted structure theorem (Theorem A.1), one can prove Lemma 3.2 using the proof techniques from (Macgregor & Sun, 2022) and (Peng et al., 2017).

A.2. Proof of Theorem 3.3

Let
$$E_{\mathrm{resampled}} \triangleq \{e\} \cup \left\{ \{u, v\} \in E_{G_{t+1}} \mid u \in V_{\mathrm{doubled}} \text{ or } v \in V_{\mathrm{doubled}} \right\}$$
be the set of all the edges that have been (re-)sampled by Algorithm 4, and $E_{\mathrm{old}} \triangleq E_{t+1} \setminus E_{\mathrm{resampled}}$. Moreover, let
$$p_u^{(t+1)}(v) \triangleq \min\left\{ \frac{\tau \log(n_{t+1})}{\deg_{G_{t+1}}(u)},\ 1 \right\}$$
be the ideal sampling probability of an edge $\{u, v\}$ if one runs the SZ algorithm from scratch on $G_{t+1}$, and let
$$q^{(t+1)}(u, v) \triangleq p_u^{(t+1)}(v) + p_v^{(t+1)}(u) - p_u^{(t+1)}(v) \cdot p_v^{(t+1)}(u)$$
be the probability that edge $e$ is sampled if one runs the SZ algorithm from scratch at time $t+1$. For any edge $\{u, v\}$, we use $\tilde q(u, v) \triangleq q^{(r)}(u, v)$, $\tilde p_u(v) \triangleq p_u^{(r)}(v)$ and $\tilde p_v(u) \triangleq p_v^{(r)}(u)$ for some $1 \leq r \leq t+1$ to denote the sampling probability last used for edge $\{u, v\}$ throughout the sequence of edge updates. Hence, we have $\tilde q(u, v) = q^{(t+1)}(u, v)$ if $\{u, v\} \in E_{\mathrm{resampled}}$, and $\tilde q(u, v) = q^{(r)}(u, v)$ for some $1 \leq r \leq t+1$ if $\{u, v\} \in E_{\mathrm{old}}$. By the algorithm description (the definition of $V_{\mathrm{doubled}}$, Line 17 of Algorithm 4), we know that
$$\frac{\tau \log(n_{t+1})}{2 \deg_{G_{t+1}}(u)} \leq \frac{\tau \log(n_r)}{\deg_{G_r}(u)} \leq \frac{2 \tau \log(n_{t+1})}{\deg_{G_{t+1}}(u)}. \tag{8}$$

The following two concentration inequalities will be used in our analysis.

Lemma A.2 (Bernstein's Inequality (Chung & Lu, 2006)). Let $X_1, \ldots, X_n$ be independent random variables such that $|X_i| \leq M$ for any $i \in \{1, \ldots, n\}$. Let $X = \sum_{i=1}^{n} X_i$, and $R = \sum_{i=1}^{n} \mathbb{E}[X_i^2]$. Then, it holds that
$$\mathbb{P}\left[ |X - \mathbb{E}[X]| \geq t \right] \leq 2 \exp\left( -\frac{t^2}{2(R + Mt/3)} \right).$$

Lemma A.3 (Matrix Chernoff Bound (Tropp, 2012)). Consider a finite sequence $\{X_i\}$ of independent, random, PSD matrices of dimension $d$ that satisfy $\|X_i\| \leq R$. Let $\mu_{\min} \triangleq \lambda_{\min}(\mathbb{E}[\sum_i X_i])$ and $\mu_{\max} \triangleq \lambda_{\max}(\mathbb{E}[\sum_i X_i])$. Then, it holds for $\delta \in [0, 1]$ that
$$\mathbb{P}\left[ \lambda_{\min}\left(\sum_i X_i\right) \leq (1-\delta)\mu_{\min} \right] \leq d \left( \frac{\mathrm{e}^{-\delta}}{(1-\delta)^{1-\delta}} \right)^{\mu_{\min}/R} \quad\text{and}\quad \mathbb{P}\left[ \lambda_{\max}\left(\sum_i X_i\right) \geq (1+\delta)\mu_{\max} \right] \leq d \left( \frac{\mathrm{e}^{\delta}}{(1+\delta)^{1+\delta}} \right)^{\mu_{\max}/R}.$$
We first prove the following result on the relationship of cut values between $G_{t+1}$ and $H_{t+1}$.

Lemma A.4. Let $G_{t+1}$ be a graph, and $H_{t+1}$ the sparsifier returned by Algorithm 4. Suppose for every $\{u, v\} \in E_{t+1}$ that $\tilde p_u(v) < 1$; then it holds for any non-empty subset $A \subseteq V_{t+1}$ that
$$\mathbb{P}\left[ \left| w_{H_{t+1}}(A, V_{t+1} \setminus A) - w_{G_{t+1}}(A, V_{t+1} \setminus A) \right| \geq \frac{1}{2}\, w_{G_{t+1}}(A, V_{t+1} \setminus A) \right] \leq 2 \exp\left( -\frac{\tau \log(n_{t+1})\, w_{G_{t+1}}(A, V_{t+1} \setminus A)}{10\, \mathrm{vol}_{G_{t+1}}(A)} \right).$$

Proof. For any edge $e = \{u, v\}$, we define the random variable $Y_e$ by
$$Y_e \triangleq \begin{cases} \frac{1}{\tilde q(u,v)} & \text{with probability } \tilde q(u, v), \\ 0 & \text{otherwise.} \end{cases}$$
We also define $Z \triangleq \sum_{e \in E_{G_{t+1}}(A, V_{t+1} \setminus A)} Y_e$, and have that
$$\mathbb{E}[Z] = \sum_{e = \{u,v\} \in E_{G_{t+1}}(A, V_{t+1} \setminus A)} \mathbb{E}[Y_e] = \sum_{e = \{u,v\} \in E_{G_{t+1}}(A, V_{t+1} \setminus A)} \tilde q(u, v) \cdot \frac{1}{\tilde q(u, v)} = w_{G_{t+1}}(A, V_{t+1} \setminus A).$$
To prove a concentration bound on this estimate, we apply Bernstein's inequality (Lemma A.2), for which we need to bound the second moment $R \triangleq \sum_{e} \mathbb{E}[Y_e^2]$. We get that
$$R = \sum_{e = \{u,v\} \in E_{G_{t+1}}(A, V_{t+1} \setminus A)} \tilde q(u, v) \cdot \frac{1}{\tilde q(u, v)^2} = \sum_{e = \{u,v\}} \frac{1}{\tilde q(u, v)} \leq \sum_{e = \{u,v\}} \frac{1}{\tilde p_u(v)} \tag{9}$$
$$\leq \sum_{e = \{u,v\}} \frac{2 \deg_{G_{t+1}}(u)}{\tau \log(n_{t+1})} \tag{10}$$
$$\leq \frac{2\, \Delta_{G_{t+1}}(A)}{\tau \log(n_{t+1})} \sum_{e = \{u,v\}} 1 = \frac{2\, \Delta_{G_{t+1}}(A)\, w_{G_{t+1}}(A, V_{t+1} \setminus A)}{\tau \log(n_{t+1})},$$
where $\Delta_{G_{t+1}}(A) \triangleq \max_{u \in A} \deg_{G_{t+1}}(u)$, (9) holds since $\tilde q(u, v) = \tilde p_u(v) + \tilde p_v(u) - \tilde p_u(v)\, \tilde p_v(u) \geq \tilde p_u(v)$, and (10) holds because of (8). Note that, by (8), for any edge $e = \{u, v\} \in E_{G_{t+1}}(A, V_{t+1} \setminus A)$ we have that
$$0 \leq Y_e = \frac{1}{\tilde q(u, v)} \leq \frac{1}{\tilde p_u(v)} \leq \frac{2\, \Delta_{G_{t+1}}(A)}{\tau \log(n_{t+1})}.$$
Then, by applying Bernstein's inequality with deviation $\mathbb{E}[Z]/2$, we have that
$$\mathbb{P}\left[ |Z - \mathbb{E}[Z]| \geq \frac{1}{2}\, \mathbb{E}[Z] \right] \leq 2 \exp\left( -\frac{w_{G_{t+1}}(A, V_{t+1} \setminus A)^2 / 4}{2\left( \frac{2 \Delta_{G_{t+1}}(A)\, w_{G_{t+1}}(A, V_{t+1} \setminus A)}{\tau \log(n_{t+1})} + \frac{\Delta_{G_{t+1}}(A)\, w_{G_{t+1}}(A, V_{t+1} \setminus A)}{3\, \tau \log(n_{t+1})} \right)} \right) \leq 2 \exp\left( -\frac{\tau \log(n_{t+1})\, w_{G_{t+1}}(A, V_{t+1} \setminus A)}{10\, \mathrm{vol}_{G_{t+1}}(A)} \right),$$
which proves the lemma.

Proof of Theorem 3.3. We first analyse the number of edges in $H_{t+1}$, i.e., the size of $F_{t+1}$. We have that
$$\mathbb{E}[|F_{t+1}|] = \sum_{e = \{u,v\} \in E_{G_{t+1}}} \tilde q(u, v) \leq \sum_{e = \{u,v\} \in E_{G_{t+1}}} \left( \frac{2 \tau \log n_{t+1}}{\deg_{G_{t+1}}(u)} + \frac{2 \tau \log n_{t+1}}{\deg_{G_{t+1}}(v)} \right) = 2\, \tau\, n_{t+1} \log n_{t+1},$$
where the first inequality holds by (8). Therefore, it holds by the Markov inequality that the number of edges $\{u, v\}$ with $\tilde p_u(v) = 1$ is $O(\tau\, n_{t+1} \log n_{t+1})$. Without loss of generality, we assume that these edges are included in $F_{t+1}$, and we assume for the remaining part of the proof that $\tilde p_u(v) < 1$.

We now show that the degrees of the vertices in $G_{t+1}$ are approximately preserved in $H_{t+1}$. Let $u$ be an arbitrary vertex of $G_{t+1}$. Observing that $\mathrm{vol}_{G_{t+1}}(\{u\}) = w_{G_{t+1}}(\{u\}, V \setminus \{u\}) = \deg_{G_{t+1}}(u)$ and $w_{H_{t+1}}(\{u\}, V_{t+1} \setminus \{u\}) = \deg_{H_{t+1}}(u)$, by Lemma A.4 it holds that
$$\mathbb{P}\left[ |\deg_{H_{t+1}}(u) - \deg_{G_{t+1}}(u)| \geq \frac{1}{2} \deg_{G_{t+1}}(u) \right] \leq 2 \exp\left( -\frac{\tau \log n_{t+1}}{10} \right) = 2 \exp\left( -\frac{C \log n_{t+1}}{10\, \lambda_{k+1}(\mathcal{L}_{G_{t+1}})} \right) = o(1/n_{t+1}^2).$$
Hence, by taking $C$ to be sufficiently large and applying the union bound, it holds with high probability that the degrees of all the vertices in $G_{t+1}$ are preserved in $H_{t+1}$ up to a constant factor. Throughout the rest of the proof, we assume this is the case. This implies for any subset $A \subseteq V_{t+1}$ that $\mathrm{vol}_{H_{t+1}}(A) = \Theta(\mathrm{vol}_{G_{t+1}}(A))$.

Secondly, we prove that $\Phi_{H_{t+1}}(S_i) = O(k \cdot \Phi_{G_{t+1}}(S_i))$ for any $1 \leq i \leq k$, where $S_1, \ldots, S_k$ are the optimal clusters corresponding to $\rho_{G_{t+1}}(k)$. For any $1 \leq i \leq k$, it holds that
$$\mathbb{E}\left[ w_{H_{t+1}}(S_i, V_{t+1} \setminus S_i) \right] = \sum_{\substack{e = \{u,v\} \in E_{t+1} \\ u \in S_i,\ v \notin S_i}} \tilde q(u, v) \cdot \frac{1}{\tilde q(u, v)} = w_{G_{t+1}}(S_i, V_{t+1} \setminus S_i).$$
Hence, by Markov's inequality and the union bound, it holds with constant probability that $w_{H_{t+1}}(S_i, V_{t+1} \setminus S_i) = O(k \cdot w_{G_{t+1}}(S_i, V_{t+1} \setminus S_i))$.
Therefore, it holds with constant probability that
$$\rho_{H_{t+1}}(k) \leq \max_{1 \leq i \leq k} \Phi_{H_{t+1}}(S_i) = \max_{1 \leq i \leq k} O(k \cdot \Phi_{G_{t+1}}(S_i)) = O(k \cdot \rho_{G_{t+1}}(k)).$$

Next, we prove that $\lambda_{k+1}(\mathcal{L}_{H_{t+1}}) = \Omega(\lambda_{k+1}(\mathcal{L}_{G_{t+1}}))$. Let $\overline{\mathcal{L}}_{G_{t+1}}$ be the projection of $\mathcal{L}_{G_{t+1}}$ on its top $n_{t+1} - k$ eigenspaces, and notice that $\overline{\mathcal{L}}_{G_{t+1}}$ can be written as
$$\overline{\mathcal{L}}_{G_{t+1}} = \sum_{i = k+1}^{n_{t+1}} \lambda_i(\mathcal{L}_{G_{t+1}})\, f_i f_i^\top,$$
where $f_1, \ldots, f_{n_{t+1}}$ are the eigenvectors of $\mathcal{L}_{G_{t+1}}$. Let $\overline{\mathcal{L}}^{-1/2}_{G_{t+1}}$ be the square root of the pseudoinverse of $\overline{\mathcal{L}}_{G_{t+1}}$. We prove that the top $n_{t+1} - k$ eigenvalues of $\mathcal{L}_{G_{t+1}}$ are preserved, which implies that $\lambda_{k+1}(\mathcal{L}_{H_{t+1}}) = \Theta(\lambda_{k+1}(\mathcal{L}_{G_{t+1}}))$. To prove this, for each edge $e = \{u, v\} \in E_{G_{t+1}}$ we define a random matrix $X_e \in \mathbb{R}^{n_{t+1} \times n_{t+1}}$ by
$$X_e \triangleq \begin{cases} w_{H_{t+1}}(u, v)\, \overline{\mathcal{L}}^{-1/2}_{G_{t+1}} b_e b_e^\top \overline{\mathcal{L}}^{-1/2}_{G_{t+1}} & \text{if } e = \{u, v\} \text{ is sampled by the algorithm,} \\ 0 & \text{otherwise,} \end{cases}$$
where $b_e \triangleq \overline{\chi}_u - \overline{\chi}_v$ is the edge indicator vector and $\overline{\chi}_v \in \mathbb{R}^{n_{t+1}}$ is defined by
$$\overline{\chi}_v(a) \triangleq \begin{cases} \frac{1}{\sqrt{\deg_{G_{t+1}}(v)}} & \text{if } a = v, \\ 0 & \text{otherwise.} \end{cases}$$
Notice that
$$\sum_{e \in E_{G_{t+1}}} X_e = \sum_{e = \{u,v\} \in E_{G_{t+1}}} w_{H_{t+1}}(u, v)\, \overline{\mathcal{L}}^{-1/2}_{G_{t+1}} b_e b_e^\top \overline{\mathcal{L}}^{-1/2}_{G_{t+1}} = \overline{\mathcal{L}}^{-1/2}_{G_{t+1}} \mathcal{L}'_{H_{t+1}} \overline{\mathcal{L}}^{-1/2}_{G_{t+1}},$$
where $\mathcal{L}'_{H_{t+1}} \triangleq \sum_{e \in E_{G_{t+1}}} w_{H_{t+1}}(u, v)\, b_e b_e^\top$ is $\mathcal{L}_{H_{t+1}}$ normalised with respect to the degrees of the vertices in $G_{t+1}$. We prove that, with high probability, the top $n_{t+1} - k$ eigenvalues of $\mathcal{L}'_{H_{t+1}}$ and $\mathcal{L}_{G_{t+1}}$ are approximately the same. Then, to finish the proof, we also show that this is the case for the top $n_{t+1} - k$ eigenvalues of $\mathcal{L}_{H_{t+1}}$ and $\mathcal{L}'_{H_{t+1}}$, from which we get that $\lambda_{k+1}(\mathcal{L}_{H_{t+1}}) = \Omega(\lambda_{k+1}(\mathcal{L}_{G_{t+1}}))$.

First, from (8) we get that for any edge $e$ it holds that
$$\tilde q(u, v) \leq \tilde p_u(v) + \tilde p_v(u) \leq \frac{2 \tau \log(n_{t+1})}{\deg_{G_{t+1}}(u)} + \frac{2 \tau \log(n_{t+1})}{\deg_{G_{t+1}}(v)} \quad\text{and}\quad \tilde q(u, v) \geq \frac{1}{2} \left( \frac{\tau \log(n_{t+1})}{\deg_{G_{t+1}}(u)} + \frac{\tau \log(n_{t+1})}{\deg_{G_{t+1}}(v)} \right). \tag{15}$$
We start by calculating the first moment of $\sum_{e \in E_{G_{t+1}}} X_e$, and have that
$$\mathbb{E}\left[ \sum_{e \in E_{G_{t+1}}} X_e \right] = \sum_{e = \{u,v\} \in E_{G_{t+1}}} \tilde q(u, v) \cdot \frac{1}{\tilde q(u, v)}\, \overline{\mathcal{L}}^{-1/2}_{G_{t+1}} b_e b_e^\top \overline{\mathcal{L}}^{-1/2}_{G_{t+1}} = \overline{\mathcal{L}}^{-1/2}_{G_{t+1}} \mathcal{L}_{G_{t+1}} \overline{\mathcal{L}}^{-1/2}_{G_{t+1}}.$$
Moreover, for any sampled $e = \{u, v\}$ we have that
$$\|X_e\| \leq w_{H_{t+1}}(u, v)\, b_e^\top \overline{\mathcal{L}}^{-1}_{G_{t+1}} b_e = \frac{1}{\tilde q(u, v)}\, b_e^\top \overline{\mathcal{L}}^{-1}_{G_{t+1}} b_e \leq \frac{1}{\tilde q(u, v)} \cdot \frac{\|b_e\|^2}{\lambda_{k+1}(\mathcal{L}_{G_{t+1}})} \leq \frac{2}{C \log n_{t+1}} \leq \frac{4}{C \log n_{t+1}}, \tag{16}$$
where the second inequality follows by the min-max theorem of eigenvalues, and (16) holds by (15) together with $\|b_e\|^2 = \frac{1}{\deg_{G_{t+1}}(u)} + \frac{1}{\deg_{G_{t+1}}(v)}$ and $\tau = C/\lambda_{k+1}(\mathcal{L}_{G_{t+1}})$.

Now we apply the matrix Chernoff bound (Lemma A.3) to analyse the eigenvalues of $\sum_{e \in E_{G_{t+1}}} X_e$. We set $\mu_{\max} \triangleq \lambda_{\max}\left(\mathbb{E}\left[\sum_{e} X_e\right]\right) = \left\|\overline{\mathcal{L}}^{-1/2}_{G_{t+1}} \mathcal{L}_{G_{t+1}} \overline{\mathcal{L}}^{-1/2}_{G_{t+1}}\right\| = 1$, $R = 4/(C \log n_{t+1})$ and $\delta = 1/2$, and have that
$$\mathbb{P}\left[ \lambda_{\max}\left(\sum_{e} X_e\right) \geq \frac{3}{2} \right] \leq n_{t+1} \left( \frac{\mathrm{e}^{1/2}}{(1 + 1/2)^{3/2}} \right)^{C \log n_{t+1}/4} = O(1/n_{t+1}^c)$$
for some constant $c$. Therefore we get that
$$\mathbb{P}\left[ \lambda_{\max}\left(\sum_{e} X_e\right) \leq \frac{3}{2} \right] = 1 - O(1/n_{t+1}^c). \tag{17}$$
Similarly, since $\mu_{\min} \triangleq \lambda_{\min}\left(\mathbb{E}\left[\sum_{e} X_e\right]\right) = \lambda_{\min}\left(\overline{\mathcal{L}}^{-1/2}_{G_{t+1}} \mathcal{L}_{G_{t+1}} \overline{\mathcal{L}}^{-1/2}_{G_{t+1}}\right) = 1$, the other side of the matrix Chernoff bound gives us that
$$\mathbb{P}\left[ \lambda_{\min}\left(\sum_{e} X_e\right) \geq \frac{1}{2} \right] = 1 - O(1/n_{t+1}^c). \tag{18}$$
Combining (17) and (18), it holds with probability $1 - O(1/n_{t+1}^c)$ for any non-zero $x \in \mathbb{R}^{n_{t+1}}$ in the space spanned by $f_{k+1}, \ldots, f_{n_{t+1}}$ that
$$\frac{x^\top \overline{\mathcal{L}}^{-1/2}_{G_{t+1}} \mathcal{L}'_{H_{t+1}} \overline{\mathcal{L}}^{-1/2}_{G_{t+1}} x}{x^\top x} \in (1/2, 3/2).$$
Since $\dim(\mathrm{span}\{f_{k+1}, \ldots, f_{n_{t+1}}\}) = n_{t+1} - k$, there exist $n_{t+1} - k$ orthogonal vectors whose Rayleigh quotient with respect to $\mathcal{L}'_{H_{t+1}}$ is $\Omega(\lambda_{k+1}(\mathcal{L}_{G_{t+1}}))$. The Courant-Fischer Theorem implies that $\lambda_{k+1}(\mathcal{L}'_{H_{t+1}}) = \Omega(\lambda_{k+1}(\mathcal{L}_{G_{t+1}}))$.

It only remains to show that $\lambda_{k+1}(\mathcal{L}_{H_{t+1}}) = \Omega(\lambda_{k+1}(\mathcal{L}'_{H_{t+1}}))$, which implies that $\lambda_{k+1}(\mathcal{L}_{H_{t+1}}) = \Omega(\lambda_{k+1}(\mathcal{L}_{G_{t+1}}))$. By the definition of $\mathcal{L}'_{H_{t+1}}$, we have that
$$\mathcal{L}_{H_{t+1}} = D_{H_{t+1}}^{-1/2} D_{G_{t+1}}^{1/2}\, \mathcal{L}'_{H_{t+1}}\, D_{G_{t+1}}^{1/2} D_{H_{t+1}}^{-1/2}.$$
Therefore, for any $x \in \mathbb{R}^{n_{t+1}}$ and $y \triangleq D_{G_{t+1}}^{1/2} D_{H_{t+1}}^{-1/2} x$, it holds that
$$\frac{x^\top \mathcal{L}_{H_{t+1}} x}{x^\top x} = \frac{y^\top \mathcal{L}'_{H_{t+1}} y}{x^\top x} = \Omega\!\left( \frac{y^\top \mathcal{L}'_{H_{t+1}} y}{y^\top y} \right),$$
where the final guarantee follows from the fact that the degrees in $H_{t+1}$ are preserved up to a constant factor. The conclusion of the theorem follows from the Courant-Fischer Theorem.

Finally, it remains to analyse the amortised update time of the algorithm. Notice that, if one only needs to sample the incoming edge at time $t+1$, then the update time is $O(1)$. Otherwise, all the edges adjacent to some vertex $w$ need to be resampled, and the running time for this step is $O(\deg_{G_{t+1}}(w))$. However, this means that either $\deg_{G_{t+1}}(w) > 2 \deg_{G_t}(w)$ or $\log(n_{t+1}) > 2 \log(n_t)$. In the first case, this only occurs at most every $\deg_{G_t}(w)$ edge updates, which results in an amortised update time of $O(1)$. The second case only happens after every $n_t^2$ vertex additions, and in the worst case we only have to resample all the edges present in $G_t$ every $n_t^2$ edge updates, which again leads to an amortised update time of $O(1)$.

B. Omitted Details from Section 4

This section contains the omitted details from Section 4, and is organised as follows. In Section B.1 we introduce additional notation to analyse our constructed contracted graphs. In Section B.2 we present the omitted proofs for Lemmas 4.1, 4.2, 4.3, 4.4, and 4.5, and we formally describe the Update Contracted Graph procedure.

B.1. Notation

For any subset $A \subseteq V_{t'}$, let $\widetilde A \triangleq A \cap \widetilde V^{nc}_{t'}$ be the representation of $A$ among the non-contracted vertices of $\widetilde G_{t'}$. Recall that for any subset of vertices $A \subseteq V_{t'}$, we use $A^{(t)} \triangleq A \cap V_t$ to denote the set of vertices present at time $t$. Let
$$E_{\mathrm{added}} \triangleq E_{\mathrm{new}} \cup \left\{ \{u, v\} \in E_t \mid \deg_{G_{t'}}(u) > 2 \deg_{G_r}(u) \text{ or } \deg_{G_{t'}}(v) > 2 \deg_{G_r}(v) \right\}$$
be the set of edges that have been directly added into $\widetilde G_{t'}$, where $\deg_{G_r}(w)$ for $r \leq t$ is the degree of $w$ used to construct the contracted graph. These edges are the ones directly added as new edges, or the ones whose endpoints are pulled out from clusters in $\widetilde G_t$. For a subset $B \subseteq \widetilde V_{t'}$, let $\hat B$ be the representation of the set $B$ in $G_{t'}$, i.e.,
$$\hat B \triangleq \left( \bigcup_{p_i \in B^c} P^{(t')}_i \right) \cup B^{nc}, \quad\text{where}\quad P^{(t')}_i \triangleq P_i \setminus (P_i \cap \widetilde V^{nc}_{t'}),\ B^{nc} \triangleq B \cap \widetilde V^{nc}_{t'},\ \text{and}\ B^{c} \triangleq B \cap \widetilde V^{c}_{t'}.$$
One can see $P^{(t')}_i$ as the vertices in $P_i$ that are still represented by the respective super vertex in $\widetilde G$.

B.2. Omitted Proofs

Our analysis is based on the approximation guarantee of spectral clustering. The following result, which can be shown easily by combining the proof technique of (Peng et al., 2017) and the one of (Macgregor & Sun, 2022), will be used in our analysis.

Lemma B.1. There is an absolute constant $C_{B.1} \in \mathbb{R}_{>0}$ such that the following holds: let $G$ be a graph with $k$ optimal clusters $\{S_i\}_{i=1}^{k}$, and $\Upsilon_G(k) \geq C_{B.1}\, k$. Let $\{P_i\}_{i=1}^{k}$ be the output of spectral clustering and, without loss of generality, let the optimal correspondence of $P_i$ be $S_i$ for any $1 \leq i \leq k$. Then, it holds for any $1 \leq i \leq k$ that
$$\mathrm{vol}_G(P_i \triangle S_i) \leq \frac{C_{B.1}\, k}{3\, \Upsilon_G(k)} \cdot \mathrm{vol}_G(S_i),$$
where $A \triangle B$ for any sets $A$ and $B$ is defined by $A \triangle B \triangleq (A \setminus B) \cup (B \setminus A)$. It also holds that
$$\Phi_G(P_i) = O\!\left( \Phi_G(S_i) + \frac{k}{\Upsilon_G(k)} \right).$$
Moreover, these $P_1, \ldots, P_k$ can be computed in nearly-linear time.

Proof of Lemma 4.1. The running time of the algorithm is dominated by computing the total weight $w_{H_t}(P_i, P_j)$ between every $P_i, P_j \in P$ (Lines 7-10), which takes $O(|F_t|)$ time as there are $|F_t|$ edges in $H_t$.

Proof of Lemma 4.2. The running time of the update operation is dominated by the case in which a vertex is pulled out from a contracted vertex (Lines 7-22).
It is easy to see that, if this does not happen, then the running time is $O(1)$, as the edge is just added into $\widetilde G_{t+1}$. Let $\{u, v\}$ be the added edge, and assume Lines 7-22 are triggered. The running time for this case is $O(\deg_{G_{t+1}}(u) + \deg_{G_{t+1}}(v))$, since at least one of $u$ and $v$ is pulled out from their respective contracted vertices and all the adjacent edges are placed into the contracted graph. Notice that this only happens if $\deg_{G_{t+1}}(u) > 2 \deg_{G_t}(u)$ or $\deg_{G_{t+1}}(v) > 2 \deg_{G_t}(v)$. Since at least $\deg_{G_t}(u)$ or $\deg_{G_t}(v)$ edge insertions are needed before running Lines 7-22, the amortised per-edge update time is $O(1)$.

Proof of Lemma 4.3. Notice that by Lemma B.1 it holds with high probability for all $1 \leq i \leq k$ that $\Phi_{H_t}(P_i) = O(k \cdot \rho_{H_t}(k))$. By applying Theorem 3.3, it holds with high probability that $\Phi_{H_t}(P_i) = O(k^2 \cdot \rho_{G_t}(k))$. By Lemma 3.2, we also have with high probability that $\Phi_{G_t}(P_i) = O(k^2 \cdot \rho_{G_t}(k))$. This proves the statement.

The next lemma shows that, starting from $G_t$ and $H_t$, one can easily construct a cluster-preserving sparsifier of $G_{t'}$.

Lemma B.2. Let $H'_{t'} \triangleq (V_{t'}, F_t \cup E_{\mathrm{added}}, w_{H'_{t'}})$ be a graph, where
$$w_{H'_{t'}}(e) \triangleq \begin{cases} 1 & e \in E_{\mathrm{added}}, \\ w_{H_t}(e) & e \in F_t \setminus E_{\mathrm{added}}, \\ 0 & \text{otherwise.} \end{cases}$$
Then, it holds with high probability that $H'_{t'}$ is a cluster-preserving sparsifier of $G_{t'}$.

Proof. First, for any $e \in E_{\mathrm{added}}$ we know that it is included in $H'_{t'}$ with probability 1. For any other edge $e = \{u, v\} \in F_t \setminus E_{\mathrm{added}}$, we know by the construction of $H_t$ using the dynamic cluster-preserving sparsifier that the parameter used to sample $e$ from the perspective of $u$ satisfies
$$\frac{\tau \log(n_t)}{2 \deg_{G_t}(u)} \leq \frac{\tau \log(n_r)}{\deg_{G_r}(u)} \leq \frac{2 \tau \log(n_t)}{\deg_{G_t}(u)}$$
for some $1 \leq r \leq t$. We also know by construction that for any $e = \{u, v\} \in F_t \setminus E_{\mathrm{added}}$ it holds that $\deg_{G_{t'}}(u) \leq 2 \deg_{G_t}(u)$.

Algorithm 7 Update Contracted Graph($G_t$, $\widetilde{G}_t$, $e$)
1: Input: graph $G_t = (V_t, E_t)$, contracted graph $\widetilde{G}_t = (\widetilde{V}_t, \widetilde{E}_t, w_{\widetilde{G}_t})$, incoming edge $e = \{u, v\}$
2: Output: contracted graph $\widetilde{G}_{t+1} = (\widetilde{V}_{t+1}, \widetilde{E}_{t+1}, w_{\widetilde{G}_{t+1}})$
3: $V_{\mathrm{new}} \leftarrow \{u, v\} \setminus V_t$
4: $G_{t+1} \leftarrow (V_t \cup V_{\mathrm{new}}, E_t \cup \{e\})$
5: $\widetilde{G}_{t+1} \leftarrow (\widetilde{V}_t \cup V_{\mathrm{new}}, \widetilde{E}_t, w_{\widetilde{G}_t}) = (\widetilde{V}_{t+1}, \widetilde{E}_{t+1}, w_{\widetilde{G}_{t+1}})$
6: $\widetilde{V}^{nc}_{t+1} \leftarrow \widetilde{V}^{nc}_{t+1} \cup V_{\mathrm{new}}$
7: for $w \in \{u, v\} \setminus V_{\mathrm{new}}$ do
8:   Let $G_r$ be the graph at time $r$ when the contracted graph was constructed, and $H_r = (V_r, F_r, w_{H_r})$ the cluster-preserving sparsifier at time $r$
9:   if $w \notin \widetilde{V}^{nc}_{t+1}$ and $\deg_{G_{t+1}}(w) > 2 \deg_{G_r}(w)$ then
10:    Let $p_j$ be the super vertex such that $w \in P_j$
11:    $\widetilde{V}^{nc}_{t+1} \leftarrow \widetilde{V}^{nc}_{t+1} \cup \{w\}$
12:    $\widetilde{E}_{t+1} \leftarrow \widetilde{E}_{t+1} \cup E_{G_{t+1}}(w, \widetilde{V}^{nc}_{t+1})$
13:    for $\hat v \in \widetilde{V}^{nc}_{t+1}$ adjacent to $w$ do
14:      $w_{\widetilde{G}_{t+1}}(p_j, \hat v) \leftarrow w_{\widetilde{G}_{t+1}}(p_j, \hat v) - 1$
15:    end for
16:    for $\{w, p_i\} \in \{w\} \times \widetilde{V}^{c}_{t+1}$ do
17:      $\widetilde{E}_{t+1} \leftarrow \widetilde{E}_{t+1} \cup \{w, p_i\}$
18:      $w_{\widetilde{G}_{t+1}}(w, p_i) \leftarrow w_{G_{t+1}}(w, P^{(t+1)}_i)$
19:      $w_{\widetilde{G}_{t+1}}(p_i, p_j) \leftarrow w_{\widetilde{G}_{t+1}}(p_i, p_j) - w_{H_r}(w, P^{(t+1)}_i)$
20:    end for
21:  end if
22: end for
23: if $u \in \widetilde{V}^{nc}_{t+1}$ and $v \in \widetilde{V}^{nc}_{t+1}$ then
24:   $\widetilde{E}_{t+1} \leftarrow \widetilde{E}_{t+1} \cup \{u, v\}$
25: else if $u \in \widetilde{V}^{nc}_{t+1}$ or $v \in \widetilde{V}^{nc}_{t+1}$ then
26:   Without loss of generality, let $u \in \widetilde{V}^{nc}_{t+1}$ and $v \notin \widetilde{V}^{nc}_{t+1}$; let $p_j$ be the super vertex such that $v \in P_j$
27:   $\widetilde{E}_{t+1} \leftarrow \widetilde{E}_{t+1} \cup \{u, p_j\}$
28:   $w_{\widetilde{G}_{t+1}}(u, p_j) \leftarrow w_{\widetilde{G}_{t+1}}(u, p_j) + 1$
29: else
30:   Let $p_i$ and $p_j$ be the super vertices such that $u \in P_i$ and $v \in P_j$
31:   $w_{\widetilde{G}_{t+1}}(p_i, p_j) \leftarrow w_{\widetilde{G}_{t+1}}(p_i, p_j) + 1$
32: end if
33: Return $\widetilde{G}_{t+1} = (\widetilde{V}_{t+1}, \widetilde{E}_{t+1}, w_{\widetilde{G}_{t+1}})$

Finally, it holds that $\log(n_t) \leq \log(n_{t'}) \leq 2 \log(n_t)$. From this we get that $e$ is sampled from vertex $u$ with a parameter satisfying
$$\frac{\tau \log(n_{t'})}{4 \deg_{G_{t'}}(u)} \leq \frac{\tau \log(n_r)}{\deg_{G_r}(u)} \leq \frac{4 \tau \log(n_{t'})}{\deg_{G_{t'}}(u)}.$$
Our next lemma proves several useful properties of the contracted graph as it is updated.

Lemma B.3. The following statements hold:

(C1) For any subset $B \subseteq V_t \setminus \widetilde{V}^{\mathrm{nc}}_{t'}$, it holds that $\mathrm{vol}_{G_{t'}}(B) \leq 2\,\mathrm{vol}_{G_t}(B)$.

(C2) Suppose that, for a subset $A \subseteq V_{t'}$ with $\mathrm{vol}_{G_{t'}}(A) \leq \mathrm{vol}(G_{t'})/2$, we have $\Phi_{G_t}(A^{(t)}) \geq 1/c_1$ and $\Phi_{G_{t'}}(A) \leq \log^{-\varepsilon}(n_{t'})$ for positive $c_1, \varepsilon$ such that $4c_1 \leq \log^{\varepsilon}(n_{t'})$. Then it holds that $\Phi_{\widetilde{G}_{t'}}(\widetilde{A}) \leq 21\, c_1 \log^{-\varepsilon}(n_{t'})$.

(C3) For any super vertex $p_i \in \widetilde{V}^{\mathrm{c}}_{t'}$, it holds that $\Phi_{\widetilde{G}_t}(p_i) = O\!\left(k^{-6} \log^{-2\gamma}(n_t)\right)$ and $\Phi_{\widetilde{G}_{t'}}(p_i) = O\!\left(k^{-6} \log^{-\gamma}(n_{t'})\right)$.

Informally speaking, Property (C1) of Lemma B.3 shows that the volume of any vertex set $B$ that is not directly represented in $\widetilde{G}_{t'}$ remains approximately the same in $G_t$ and $G_{t'}$; Property (C2) states that, if the conductance of a set $A \subseteq V_{t'}$ in $G_{t'}$ becomes much lower than it was in $G_t$, then its representative set $\widetilde{A} \subseteq \widetilde{V}_{t'}$ has low conductance; Property (C3) further shows that the conductance of the contracted vertices does not change significantly over time.

Proof of Lemma B.3. For (C1), by construction we have for any $u \in V_t \setminus \widetilde{V}^{\mathrm{nc}}_{t'}$ that $\deg_{G_{t'}}(u) \leq 2\deg_{G_t}(u)$, from which the statement follows.

Next, we prove (C2). The following two claims will be used in our analysis.

Claim B.3.1. It holds that $\mathrm{vol}_{E_{\mathrm{new}}}(A) \geq \frac{\log^{\varepsilon}(n_{t'})}{2 c_1}\cdot \mathrm{vol}_{G_t}(A^{(t)})$.

Proof. Assume by contradiction that $\mathrm{vol}_{E_{\mathrm{new}}}(A) < \frac{\log^{\varepsilon}(n_{t'})}{2 c_1}\, \mathrm{vol}_{G_t}(A^{(t)})$. We have that
$$\Phi_{G_{t'}}(A) = \frac{w_{G_t}(A^{(t)}, V_t \setminus A^{(t)}) + w_{E_{\mathrm{new}}}(A, V_{t'} \setminus A)}{\mathrm{vol}_{G_t}(A^{(t)}) + \mathrm{vol}_{E_{\mathrm{new}}}(A)} \geq \frac{w_{G_t}(A^{(t)}, V_t \setminus A^{(t)})}{\mathrm{vol}_{G_t}(A^{(t)}) + \mathrm{vol}_{E_{\mathrm{new}}}(A)}$$
$$\geq \frac{1}{2}\min\left\{ \frac{w_{G_t}(A^{(t)}, V_t\setminus A^{(t)})}{\mathrm{vol}_{G_t}(A^{(t)})},\ \frac{w_{G_t}(A^{(t)}, V_t\setminus A^{(t)})}{\mathrm{vol}_{E_{\mathrm{new}}}(A)} \right\} \geq \frac{1}{2}\min\left\{ \Phi_{G_t}(A^{(t)}),\ \frac{\mathrm{vol}_{G_t}(A^{(t)})}{c_1\, \mathrm{vol}_{E_{\mathrm{new}}}(A)} \right\} > \log^{-\varepsilon}(n_{t'}),$$
where on the last line we used the contradictory assumption together with $\Phi_{G_t}(A^{(t)}) \geq 1/c_1$ and $4c_1 \leq \log^{\varepsilon}(n_{t'})$. This contradicts the condition $\Phi_{G_{t'}}(A) \leq \log^{-\varepsilon}(n_{t'})$, and hence the claim holds.

Notice that this claim implies that
$$\mathrm{vol}_{G_{t'}}(A) = \mathrm{vol}_{G_t}(A^{(t)}) + \mathrm{vol}_{E_{\mathrm{new}}}(A) \geq \left(1 + \frac{\log^{\varepsilon}(n_{t'})}{2c_1}\right)\mathrm{vol}_{G_t}(A^{(t)}) = \left(1 + \frac{\log^{\varepsilon}(n_{t'})}{4c_1} + \frac{\log^{\varepsilon}(n_{t'})}{4c_1}\right)\mathrm{vol}_{G_t}(A^{(t)}) \geq \left(2 + \frac{\log^{\varepsilon}(n_{t'})}{4c_1}\right)\mathrm{vol}_{G_t}(A^{(t)}), \tag{19}$$
where the last inequality follows from the fact that $4c_1 \leq \log^{\varepsilon}(n_{t'})$.

Claim B.3.2. It holds that $\mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{A}) \geq \frac{\log^{\varepsilon}(n_{t'})}{4 c_1}\cdot \mathrm{vol}_{G_t}(A^{(t)})$.

Proof. Assume by contradiction that $\mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{A}) < \frac{\log^{\varepsilon}(n_{t'})}{4c_1}\, \mathrm{vol}_{G_t}(A^{(t)})$. Then, it holds that
$$\mathrm{vol}_{G_{t'}}(A) = \mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{A}) + \mathrm{vol}_{G_{t'}}(A \setminus \widetilde{A}) \leq \mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{A}) + 2\,\mathrm{vol}_{G_t}(A \setminus \widetilde{A}) < \frac{\log^{\varepsilon}(n_{t'})}{4c_1}\, \mathrm{vol}_{G_t}(A^{(t)}) + 2\,\mathrm{vol}_{G_t}(A^{(t)}),$$
where the first inequality holds by statement (C1). Hence, we reach a contradiction with (19), and the claim holds.

Now we are ready to prove statement (C2). Every edge leaving $\widetilde{A}$ in $\widetilde{G}_{t'}$ either crosses the boundary of $A$ in $G_{t'}$, or connects $\widetilde{A}$ to super vertices representing vertices of $A \setminus \widetilde{A}$. We therefore have that
$$\Phi_{\widetilde{G}_{t'}}(\widetilde{A}) = \frac{w_{\widetilde{G}_{t'}}(\widetilde{A}, \widetilde{V}_{t'}\setminus\widetilde{A})}{\mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{A})} \leq \frac{w_{G_{t'}}(A, V_{t'}\setminus A) + w_{G_{t'}}(A\setminus\widetilde{A}, \widetilde{A})}{\mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{A})} \leq \frac{\log^{-\varepsilon}(n_{t'})\,\mathrm{vol}_{G_{t'}}(A) + \mathrm{vol}_{G_{t'}}(A\setminus\widetilde{A})}{\mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{A})}$$
$$\leq \frac{\log^{-\varepsilon}(n_{t'})\,\mathrm{vol}_{G_{t'}}(A)}{\mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{A})} + \frac{8 c_1\, \mathrm{vol}_{G_t}(A^{(t)})}{\mathrm{vol}_{G_t}(A^{(t)})\,\log^{\varepsilon}(n_{t'})} \tag{20}$$
$$= \frac{\left(\mathrm{vol}_{G_t}(A^{(t)}) + \mathrm{vol}_{E_{\mathrm{new}}}(A)\right)\log^{-\varepsilon}(n_{t'})}{\mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{A})} + \frac{8c_1}{\log^{\varepsilon}(n_{t'})} \leq \frac{\left(3\,\mathrm{vol}_{G_t}(A^{(t)}) + \mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{A})\right)\log^{-\varepsilon}(n_{t'})}{\mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{A})} + \frac{8c_1}{\log^{\varepsilon}(n_{t'})} \tag{21}$$
$$\leq \frac{3\,\mathrm{vol}_{G_t}(A^{(t)})\cdot 4c_1}{\log^{2\varepsilon}(n_{t'})\,\mathrm{vol}_{G_t}(A^{(t)})} + \frac{1}{\log^{\varepsilon}(n_{t'})} + \frac{8c_1}{\log^{\varepsilon}(n_{t'})} = \frac{12 c_1}{\log^{2\varepsilon}(n_{t'})} + \frac{1}{\log^{\varepsilon}(n_{t'})} + \frac{8 c_1}{\log^{\varepsilon}(n_{t'})} \leq \frac{21\, c_1}{\log^{\varepsilon}(n_{t'})}, \tag{22}$$
where (20) holds by Claim B.3.2 and the fact that $\mathrm{vol}_{G_{t'}}(A\setminus\widetilde{A}) \leq 2\,\mathrm{vol}_{G_t}(A\setminus\widetilde{A}) \leq 2\,\mathrm{vol}_{G_t}(A^{(t)})$ by statement (C1); (21) holds because, by construction, $\mathrm{vol}_{E_{\mathrm{new}}}(A) \leq \mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{A}) + \mathrm{vol}_{G_{t'}}(A\setminus\widetilde{A}) \leq \mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{A}) + 2\,\mathrm{vol}_{G_t}(A^{(t)})$; and (22) holds because of Claim B.3.2 and $c_1 \geq 1$.
Finally, we prove statement (C3). For any $P_i \in P$, we have by construction that
$$\Phi_{\widetilde{G}_t}(p_i) = \Phi_{H_t}(P_i) = O\!\left(k^{-6}\log^{-2\gamma}(n_t)\right), \tag{23}$$
where the last equality holds by Lemma 4.3. This proves the first part of the statement. Next, notice that for any $p_i \in \widetilde{V}^{\mathrm{c}}_t$, because $G_t$ is connected and each $P_i$ has almost identical volume to the corresponding optimal $S_i$ in $G_t$ (Lemma 3.2), by construction it holds that
$$\mathrm{vol}_{\widetilde{G}_t}(p_i) = \Omega\!\left(k^6 \log^{2\gamma}(n_t)\right) \tag{24}$$
and
$$\mathrm{vol}_{\widetilde{G}_t}(\widetilde{V}_t \setminus p_i) = \Omega\!\left(k^6 \log^{2\gamma}(n_t)\right). \tag{25}$$
Taking this into account, we get that
$$w_{\widetilde{G}_{t'}}(p_i, \widetilde{V}_{t'}\setminus p_i) \leq w_{\widetilde{G}_t}(p_i, \widetilde{V}_t\setminus p_i) + |E_{\mathrm{new}}| + w_{G_{t'}}\!\left(P_i \cap \widetilde{V}^{\mathrm{nc}}_{t'},\ P_i \setminus (P_i \cap \widetilde{V}^{\mathrm{nc}}_{t'})\right)$$
$$\leq w_{\widetilde{G}_t}(p_i, \widetilde{V}_t\setminus p_i) + \log^{\gamma}(n_t) + \mathrm{vol}_{G_{t'}}\!\left(P_i \cap \widetilde{V}^{\mathrm{nc}}_{t'}\right)$$
$$\leq \Phi_{\widetilde{G}_t}(p_i)\cdot\min\!\left\{\mathrm{vol}_{\widetilde{G}_t}(p_i), \mathrm{vol}_{\widetilde{G}_t}(\widetilde{V}_t\setminus p_i)\right\} + \log^{\gamma}(n_t) + 2\log^{\gamma}(n_t) \tag{26}$$
$$= O\!\left(k^{-6}\log^{-2\gamma}(n_t)\right)\cdot \min\!\left\{\mathrm{vol}_{\widetilde{G}_t}(p_i), \mathrm{vol}_{\widetilde{G}_t}(\widetilde{V}_t\setminus p_i)\right\} + 3\log^{\gamma}(n_t), \tag{27}$$
where (26) holds because $\mathrm{vol}_{G_{t'}}(P_i \cap \widetilde{V}^{\mathrm{nc}}_{t'}) \leq 2\log^{\gamma}(n_t)$: every vertex pulled out of $p_i$ needs to have at least doubled its degree, so adding $|E_{\mathrm{new}}|$ edges ensures that at most $2|E_{\mathrm{new}}|$ volume can be pulled out of $p_i$; and (27) holds because of (23). Moreover, we also have that
$$\min\!\left\{\mathrm{vol}_{\widetilde{G}_{t'}}(p_i), \mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{V}_{t'}\setminus p_i)\right\} \geq \min\!\left\{\mathrm{vol}_{\widetilde{G}_t}(p_i) - 2\log^{\gamma}(n_t),\ \mathrm{vol}_{\widetilde{G}_t}(\widetilde{V}_t\setminus p_i)\right\} \tag{28}$$
$$= \Omega\!\left(\min\!\left\{\mathrm{vol}_{\widetilde{G}_t}(p_i), \mathrm{vol}_{\widetilde{G}_t}(\widetilde{V}_t\setminus p_i)\right\}\right), \tag{29}$$
where (28) holds because $\mathrm{vol}_{\widetilde{G}_{t'}}(p_i) \geq \mathrm{vol}_{\widetilde{G}_t}(p_i) - \mathrm{vol}_{G_{t'}}(P_i\cap\widetilde{V}^{\mathrm{nc}}_{t'}) \geq \mathrm{vol}_{\widetilde{G}_t}(p_i) - 2\log^{\gamma}(n_t)$, and (29) holds because of (24) and (25). Combining (27) and (29), we have for any $p_i \in \widetilde{V}^{\mathrm{c}}_{t'}$ that
$$\Phi_{\widetilde{G}_{t'}}(p_i) = \frac{w_{\widetilde{G}_{t'}}(p_i, \widetilde{V}_{t'}\setminus p_i)}{\min\!\left\{\mathrm{vol}_{\widetilde{G}_{t'}}(p_i), \mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{V}_{t'}\setminus p_i)\right\}} = O\!\left(k^{-6}\log^{-\gamma}(n_{t'})\right),$$
which proves the second part of statement (C3).

Corollary B.4. Suppose that, for a subset $A \subseteq V_{t'}$ with $\mathrm{vol}_{G_{t'}}(A) \leq \mathrm{vol}(G_{t'})/2$, it holds that $\Phi_{\widetilde{G}_{t'}}(\widetilde{A}) > 21\, c_1 \log^{-\varepsilon}(n_{t'})$ and $\Phi_{G_{t'}}(A) \leq \log^{-\varepsilon}(n_{t'})$ for positive $c_1, \varepsilon$ satisfying $4c_1 \leq \log^{\varepsilon}(n_{t'})$. Then, it holds that $\Phi_{G_t}(A^{(t)}) < 1/c_1$.

Proof of Corollary B.4. Assume by contradiction that $\Phi_{G_t}(A^{(t)}) \geq 1/c_1$. Then, by statement (C2) of Lemma B.3 and the fact that $\Phi_{G_{t'}}(A) \leq \log^{-\varepsilon}(n_{t'})$, it holds that $\Phi_{\widetilde{G}_{t'}}(\widetilde{A}) \leq 21\, c_1 \log^{-\varepsilon}(n_{t'})$, which is a contradiction. Hence, it holds that $\Phi_{G_t}(A^{(t)}) < 1/c_1$.
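Since the statements above are phrased entirely in terms of conductances and volumes, it may help to fix those quantities computationally. The following is a small sketch for a weighted adjacency matrix, with the convention (matching the contracted graph) that self-loop weight on the diagonal counts towards a vertex's degree; the function names are illustrative.

```python
import numpy as np

def volume(W, S):
    """vol(S): total degree of the vertices in S, self-loops included."""
    return W.sum(axis=1)[list(S)].sum()

def conductance(W, S):
    """Phi(S) = w(S, V \\ S) / min(vol(S), vol(V \\ S)) for 0 < |S| < n."""
    mask = np.zeros(W.shape[0], dtype=bool)
    mask[list(S)] = True
    cut = W[mask][:, ~mask].sum()
    return cut / min(volume(W, np.where(mask)[0]),
                     volume(W, np.where(~mask)[0]))
```

In these terms, Corollary B.4 reads as a detection rule: if the conductance of $\widetilde{A}$ in the contracted graph stays above the $21\,c_1\log^{-\varepsilon}(n_{t'})$ threshold while the conductance of $A$ in the full graph has dropped below $\log^{-\varepsilon}(n_{t'})$, then $A$ cannot have been a low-conductance set at time $t$, i.e., a genuinely new cluster has formed.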
Before analysing the spectral gap of the contracted graph $\widetilde{G}_{t'}$ with respect to the spectral gap of the full graph $G_{t'}$, we show that, for any small subset of vertices $A \subseteq V_{t'}$ with a low value of $\Phi_{G_{t'}}(A)$, the conductance $\Phi_{\widetilde{G}_{t'}}(\widetilde{A})$ of its corresponding set in the contracted graph is low as well.

Lemma B.5. Let $C \subseteq V_{t'}$ be a subset of vertices such that $\mathrm{vol}_{G_{t'}}(C) \leq k^6\log^{2\gamma}(n_t)$ and $\Phi_{G_{t'}}(C) \leq \log^{-\varepsilon}(n_{t'})$ for some constant $\varepsilon > 0$. Then, it holds that $\Phi_{\widetilde{G}_{t'}}(\widetilde{C}) = O\!\left(\log^{-0.9\varepsilon}(n_{t'})\right)$.

Proof. We prove this by contradiction. Assume that
$$\Phi_{\widetilde{G}_{t'}}(\widetilde{C}) > \frac{21}{4}\log^{0.1\varepsilon}(n_{t'})\cdot\log^{-\varepsilon}(n_{t'}) = \frac{21}{4}\log^{-0.9\varepsilon}(n_{t'}).$$
Setting $c_1 \triangleq (1/4)\log^{0.1\varepsilon}(n_{t'})$, it holds by Corollary B.4 that
$$\Phi_{G_t}(C^{(t)}) < 4\log^{-0.1\varepsilon}(n_{t'}). \tag{30}$$
We will show that $C^{(t)}$ can be used to create a $(k+1)$-partition of $G_t$ with low outer conductance, contradicting the fact that $\lambda_{k+1}(G_t) = \Omega(1)$. Let $S_1, \ldots, S_k$ be the optimal clusters in $G_t$ corresponding to $\rho_{G_t}(k)$. Given that $G_t$ is a connected graph and $\rho_{G_t}(k) = O\!\left(k^{-8}\log^{-2\gamma}(n_t)\right)$, it holds that $\mathrm{vol}_{G_t}(S_i) = \Omega\!\left(k^8\log^{2\gamma}(n_t)\right)$ for all $1 \leq i \leq k$. We then create the following $(k+1)$-partition:
$$\mathcal{A} \triangleq \left\{C^{(t)}\right\} \cup \left\{S_1\setminus C^{(t)},\, \ldots,\, S_k\setminus C^{(t)}\right\},$$
which is a valid partition as we know that $\mathrm{vol}_{G_t}(C^{(t)}) \leq \mathrm{vol}_{G_{t'}}(C) \leq k^6\log^{2\gamma}(n_t)$ by the conditions of the lemma. Now we compute the conductance of each cluster in $\mathcal{A}$. First of all, we have from (30) that
$$\Phi_{G_t}(C^{(t)}) < 4\log^{-0.1\varepsilon}(n_{t'}). \tag{31}$$
Second, for any cluster $S_j \setminus C^{(t)}$ we have that
$$\Phi_{G_t}(S_j\setminus C^{(t)}) = \frac{w_{G_t}\!\left(S_j\setminus C^{(t)},\ V_t\setminus(S_j\setminus C^{(t)})\right)}{\min\!\left\{\mathrm{vol}_{G_t}(S_j\setminus C^{(t)}),\ \mathrm{vol}_{G_t}(V_t\setminus(S_j\setminus C^{(t)}))\right\}}.$$
Our proof proceeds by the following case distinction.

Case 1: $\min\{\cdot\} = \mathrm{vol}_{G_t}(V_t\setminus(S_j\setminus C^{(t)}))$. Then
$$\Phi_{G_t}(S_j\setminus C^{(t)}) = \frac{w_{G_t}\!\left(S_j\setminus C^{(t)}, V_t\setminus(S_j\setminus C^{(t)})\right)}{\mathrm{vol}_{G_t}(V_t\setminus(S_j\setminus C^{(t)}))} \leq \frac{w_{G_t}(S_j, V_t\setminus S_j) + w_{G_t}(C^{(t)}, V_t\setminus C^{(t)})}{\mathrm{vol}_{G_t}(V_t\setminus S_j)}$$
$$\leq 2\max\!\left\{\frac{w_{G_t}(S_j, V_t\setminus S_j)}{\mathrm{vol}_{G_t}(V_t\setminus S_j)},\ \frac{w_{G_t}(C^{(t)}, V_t\setminus C^{(t)})}{\mathrm{vol}_{G_t}(V_t\setminus S_j)}\right\} \leq 2\max\!\left\{\Phi_{G_t}(S_j),\ \Phi_{G_t}(C^{(t)})\right\} \tag{32}$$
$$\leq 2\max\!\left\{O\!\left(k^{-8}\log^{-2\gamma}(n_t)\right),\ 4\log^{-0.1\varepsilon}(n_{t'})\right\},$$
where (32) holds because $\min\{\mathrm{vol}_{G_t}(S_j), \mathrm{vol}_{G_t}(V_t\setminus S_j)\} = \mathrm{vol}_{G_t}(V_t\setminus S_j)$: we know that $\mathrm{vol}(G_t)/2 \geq \mathrm{vol}_{G_t}(V_t\setminus(S_j\setminus C^{(t)})) \geq \mathrm{vol}_{G_t}(V_t\setminus S_j)$, and we also know that $\mathrm{vol}_{G_t}(V_t\setminus S_j) \geq \mathrm{vol}_{G_t}(C^{(t)})$.

Case 2: $\min\{\cdot\} = \mathrm{vol}_{G_t}(S_j\setminus C^{(t)})$. Then
$$\Phi_{G_t}(S_j\setminus C^{(t)}) = \frac{w_{G_t}\!\left(S_j\setminus C^{(t)}, V_t\setminus(S_j\setminus C^{(t)})\right)}{\mathrm{vol}_{G_t}(S_j\setminus C^{(t)})} \leq \frac{w_{G_t}(S_j, V_t\setminus S_j) + w_{G_t}(C^{(t)}, V_t\setminus C^{(t)})}{\mathrm{vol}_{G_t}(S_j) - \mathrm{vol}_{G_t}(C^{(t)})} = \frac{w_{G_t}(S_j, V_t\setminus S_j) + w_{G_t}(C^{(t)}, V_t\setminus C^{(t)})}{\Omega(\mathrm{vol}_{G_t}(S_j))} \tag{33}$$
$$= O\!\left(\Phi_{G_t}(S_j)\right) + O\!\left(\Phi_{G_t}(C^{(t)})\right) \tag{34}$$
$$= 2\max\!\left\{O\!\left(k^{-8}\log^{-2\gamma}(n_t)\right),\ O\!\left(\log^{-0.1\varepsilon}(n_{t'})\right)\right\}, \tag{35}$$
where (33) holds because $\mathrm{vol}_{G_t}(S_j) = \Omega\!\left(k^8\log^{2\gamma}(n_t)\right)$ and $\mathrm{vol}_{G_t}(C^{(t)}) = O\!\left(k^6\log^{2\gamma}(n_t)\right)$, and (34) holds because $w_{G_t}(S_j, V_t\setminus S_j) \leq \Phi_{G_t}(S_j)\,\mathrm{vol}_{G_t}(S_j)$ and $w_{G_t}(C^{(t)}, V_t\setminus C^{(t)}) \leq \Phi_{G_t}(C^{(t)})\,\mathrm{vol}_{G_t}(C^{(t)})$.

Combining both cases, we have for every $1 \leq j \leq k$ that
$$\Phi_{G_t}(S_j\setminus C^{(t)}) = 2\max\!\left\{O\!\left(k^{-8}\log^{-2\gamma}(n_t)\right),\ O\!\left(\log^{-0.1\varepsilon}(n_{t'})\right)\right\}. \tag{36}$$
Therefore, by combining (31) and (36), we have shown that
$$\rho_{G_t}(k+1) \leq \max_{A_j \in \mathcal{A}} \Phi_{G_t}(A_j) = 2\max\!\left\{O\!\left(k^{-8}\log^{-2\gamma}(n_t)\right),\ O\!\left(\log^{-0.1\varepsilon}(n_{t'})\right)\right\},$$
which contradicts the fact that $\rho_{G_t}(k+1) \geq \lambda_{k+1}(\mathcal{L}_{G_t})/2 = \Omega(1)$. Hence, the statement of the lemma follows.
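Lemma B.5 and the two lemmas that follow justify reading the number of clusters off the bottom of the contracted graph's spectrum. The short numerical sketch below does exactly this; the threshold `eps` is an illustrative stand-in for the $\log^{-\alpha}(n_{t'})$-type bounds of Lemmas 4.4 and 4.5, not a value prescribed by the paper.

```python
import numpy as np

def normalised_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2} for a weighted adjacency matrix W whose
    diagonal carries the self-loop weights of the contracted graph."""
    d = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(W.shape[0]) - d[:, None] * W * d[None, :]

def estimate_num_clusters(W_contracted, eps=0.25):
    """Count the eigenvalues of the contracted Laplacian below eps; this
    plays the role of the eigen-gap heuristic on the small graph G-tilde."""
    lam = np.linalg.eigvalsh(normalised_laplacian(W_contracted))
    return int(np.sum(lam < eps))
```

Because $\widetilde{G}_{t'}$ has only $o(n_{t'})$ vertices, the eigendecomposition here is what makes the sublinear query time possible: the expensive eigen-gap computation never touches the full graph.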
Proof of Lemma 4.4. We first prove the first statement. Let $S = \{S_1, \ldots, S_\ell\}$ be a set of clusters that achieves $\rho_{G_{t'}}(\ell)$. For ease of notation, we set $S_{\mathrm{small}} \triangleq \{S_i \in S \mid \mathrm{vol}_{G_{t'}}(S_i) \leq k^6\log^{2\gamma}(n_t)\}$ to be the clusters in $S$ with volume at most $k^6\log^{2\gamma}(n_t)$, and similarly $S_{\mathrm{large}} \triangleq \{S_i \in S \mid \mathrm{vol}_{G_{t'}}(S_i) > k^6\log^{2\gamma}(n_t)\}$. We will use the partition $S$, which has low outer conductance in $G_{t'}$, to create an $r$-way partition of $\widetilde{G}_{t'}$ with low $r$-way expansion. We construct this $r$-way partition, denoted by $R$, as
$$R \triangleq \left\{\widetilde{S}_1, \ldots, \widetilde{S}_{\ell_1},\ p_1, \ldots, p_{k-1},\ p'_k\right\},$$
where $\ell_1 \triangleq |S_{\mathrm{small}}|$, and we define
$$p'_k \triangleq p_k \cup \left(\widetilde{V}^{\mathrm{nc}}_{t'} \setminus \bigcup_{j=1}^{\ell_1}\widetilde{S}_j\right)$$
to be the union of the super vertex $p_k$ with the leftover non-contracted vertices that do not belong to any $\widetilde{S}_j$. We start by showing that $R$ has low $r$-way expansion. By Lemma B.5, we know that for every $S_j \in S_{\mathrm{small}}$ it holds that $\Phi_{\widetilde{G}_{t'}}(\widetilde{S}_j) = O\!\left(\log^{-0.9\alpha}(n_{t'})\right)$. By Property (C3) of Lemma B.3, we know that for every super vertex $p_i \in \{p_1, \ldots, p_{k-1}\}$ it holds that $\Phi_{\widetilde{G}_{t'}}(p_i) = O\!\left(k^{-6}\log^{-\gamma}(n_{t'})\right)$. Finally, for $p'_k$ we know that
$$\Phi_{\widetilde{G}_{t'}}(p'_k) = \frac{w_{\widetilde{G}_{t'}}(p'_k, \widetilde{V}_{t'}\setminus p'_k)}{\min\!\left\{\mathrm{vol}_{\widetilde{G}_{t'}}(p'_k),\ \mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{V}_{t'}\setminus p'_k)\right\}}. \tag{37}$$
We split the computation of this conductance into two cases.

Case 1: suppose $\min\{\cdot\} = \mathrm{vol}_{\widetilde{G}_{t'}}(p'_k)$. Then, we have that
$$\Phi_{\widetilde{G}_{t'}}(p'_k) = \frac{w_{\widetilde{G}_{t'}}(p'_k, \widetilde{V}_{t'}\setminus p'_k)}{\mathrm{vol}_{\widetilde{G}_{t'}}(p'_k)} \leq \frac{w_{\widetilde{G}_t}(p_k, \widetilde{V}_t\setminus p_k) + \log^{\gamma}(n_{t'}) + \mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{V}^{\mathrm{nc}}_{t'})}{\mathrm{vol}_{\widetilde{G}_t}(p_k) - \mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{V}^{\mathrm{nc}}_{t'})} \tag{38}$$
$$\leq \frac{\Phi_{\widetilde{G}_t}(p_k)\,\mathrm{vol}_{\widetilde{G}_t}(p_k) + 3\log^{\gamma}(n_{t'})}{\Omega\!\left(\mathrm{vol}_{\widetilde{G}_t}(p_k)\right)} \tag{39}$$
$$= \frac{O\!\left(k^{-6}\log^{-2\gamma}(n_t)\right)\mathrm{vol}_{\widetilde{G}_t}(p_k) + 3\log^{\gamma}(n_{t'})}{\Omega\!\left(\mathrm{vol}_{\widetilde{G}_t}(p_k)\right)} \tag{40}$$
$$= O\!\left(k^{-6}\log^{-2\gamma}(n_t)\right), \tag{41}$$
where (38) holds because $|E_{\mathrm{new}}| \leq \log^{\gamma}(n_t)$ is the maximum amount of weight that can be added between $p_k$ and its complement, (39) holds because $\mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{V}^{\mathrm{nc}}_{t'}) \leq 2|E_{\mathrm{new}}|$ is the maximum volume of non-contracted vertices that can be added to $\widetilde{G}_{t'}$, (40) holds because of statement (C3) of Lemma B.3, and (41) holds since $\mathrm{vol}_{\widetilde{G}_t}(p_k) = \Omega(\mathrm{vol}(G_t)/k) = \Omega(n_t/k)$.

Case 2: suppose $\min\{\cdot\} = \mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{V}_{t'}\setminus p'_k)$. Then the conductance of $p'_k$ is upper bounded by the maximum conductance of every other cluster in $R$, i.e.,
$$\Phi_{\widetilde{G}_{t'}}(p'_k) = \frac{w_{\widetilde{G}_{t'}}(p'_k, \widetilde{V}_{t'}\setminus p'_k)}{\mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{V}_{t'}\setminus p'_k)} \leq \frac{\sum_{S_j\in S_{\mathrm{small}}} w_{\widetilde{G}_{t'}}(\widetilde{S}_j, \widetilde{V}_{t'}\setminus \widetilde{S}_j) + \sum_{j=1}^{k-1} w_{\widetilde{G}_{t'}}(p_j, \widetilde{V}_{t'}\setminus p_j)}{\sum_{S_j\in S_{\mathrm{small}}} \mathrm{vol}_{\widetilde{G}_{t'}}(\widetilde{S}_j) + \sum_{j=1}^{k-1}\mathrm{vol}_{\widetilde{G}_{t'}}(p_j)}$$
$$\leq \max\!\left\{\max_{S_j\in S_{\mathrm{small}}}\Phi_{\widetilde{G}_{t'}}(\widetilde{S}_j),\ \max_{p_j\in\{p_1,\ldots,p_{k-1}\}}\Phi_{\widetilde{G}_{t'}}(p_j)\right\} = \max\!\left\{O\!\left(\log^{-0.9\alpha}(n_{t'})\right),\ O\!\left(k^{-6}\log^{-\gamma}(n_{t'})\right)\right\},$$
where the second inequality follows from the mediant inequality. Combining the two cases, we have that
$$\Phi_{\widetilde{G}_{t'}}(p'_k) = \max\!\left\{O\!\left(\log^{-0.9\alpha}(n_{t'})\right),\ O\!\left(k^{-6}\log^{-\gamma}(n_{t'})\right)\right\}.$$
We have so far analysed the conductance of each of the clusters in the partition $R$, and have shown that
$$\rho_{\widetilde{G}_{t'}}(r) = \max\!\left\{O\!\left(\log^{-0.9\alpha}(n_{t'})\right),\ O\!\left(k^{-6}\log^{-\gamma}(n_{t'})\right)\right\}. \tag{42}$$
Before reaching the final contradiction, we prove the following claim.

Claim B.5.1. It holds that $r \geq \ell$.

Proof. Assume by contradiction that $r < \ell$. In this case, we know that $r = |S_{\mathrm{small}}| + |P|$ and $\ell = |S|$. Therefore, the condition $r < \ell$ gives us $|S_{\mathrm{small}}| + |P| < |S|$, which implies that $|P| < |S| - |S_{\mathrm{small}}| = |S_{\mathrm{large}}|$. This means that the number of large clusters in $S$ is greater than the number of clusters in $P$, and it therefore holds that $|S_{\mathrm{large}}| > |P| = k$. Furthermore, since every $S_j \in S_{\mathrm{large}}$ satisfies $\mathrm{vol}_{G_{t'}}(S_j) > k^6\log^{2\gamma}(n_t)$, and the number of new edges is $|E_{\mathrm{new}}| \leq \log^{\gamma}(n_t)$ so removing them changes the conductance of such a high-volume set by at most $O\!\left(\log^{\gamma}(n_t)/(k^6\log^{2\gamma}(n_t))\right)$, it also holds that
$$\Phi_{G_t}(S_j) = O\!\left(\Phi_{G_{t'}}(S_j)\right) = \max\!\left\{O\!\left(\log^{-\alpha}(n_{t'})\right),\ O\!\left(k^{-6}\log^{-\gamma}(n_t)\right)\right\}.$$
This means that $S_{\mathrm{large}}$ is a collection of $|S_{\mathrm{large}}| \geq k+1$ disjoint subsets of $G_t$ with low conductance, which contradicts the higher-order Cheeger inequality and proves the claim.

Combining (42) with Claim B.5.1 gives us that
$$\rho_{\widetilde{G}_{t'}}(\ell) = \max\!\left\{O\!\left(\log^{-0.9\alpha}(n_{t'})\right),\ O\!\left(k^{-6}\log^{-\gamma}(n_{t'})\right)\right\},$$
and this proves the first statement.
Next we prove the second statement. Let $A_1, \ldots, A_\ell$ be the partition such that $\Phi_{\widetilde{G}_{t'}}(A_i) = O\!\left(\log^{-\delta}(n_{t'})\right)$ for every $1 \leq i \leq \ell$. Recall that $\widehat{A}_i$ is the representation of the set $A_i$ in the full graph $G_{t'}$, i.e.,
$$\widehat{A}_i \triangleq A^{\mathrm{nc}}_i \cup \bigcup_{p_j \in A^{\mathrm{c}}_i} P_j^{(t')},$$
where $P_j^{(t')} = P_j \setminus (P_j \cap \widetilde{V}^{\mathrm{nc}}_{t'})$, $A^{\mathrm{nc}}_i \triangleq A_i \cap \widetilde{V}^{\mathrm{nc}}_{t'}$, and $A^{\mathrm{c}}_i \triangleq A_i \cap \widetilde{V}^{\mathrm{c}}_{t'}$. One can view $P_j^{(t')}$ as the vertices in $P_j$ that have not yet been pulled out into the contracted graph. Notice that, when $A^{\mathrm{c}}_i = \emptyset$, it holds by construction that $\Phi_{G_{t'}}(\widehat{A}_i) = \Phi_{\widetilde{G}_{t'}}(A_i) \leq \log^{-\delta}(n_{t'})$, so we only need to examine the case $A^{\mathrm{c}}_i \neq \emptyset$. Without loss of generality, we assume that $A^{\mathrm{c}}_i$ does not contain all the contracted vertices $p_1, \ldots, p_k$; if it did, then $\Phi_{G_{t'}}(\widehat{A}_i) \leq \Phi_{\widetilde{G}_{t'}}(A_i) \leq \log^{-\delta}(n_{t'})$. Therefore, given that $\mathrm{vol}_{G_{t'}}(A^{\mathrm{nc}}_i) \leq 2|E_{\mathrm{new}}| \leq 2\log^{\gamma}(n_t)$ holds for any $1 \leq i \leq \ell$, and $\mathrm{vol}_{G_{t'}}(P_j^{(t')}) = \Omega\!\left(k^6\log^{2\gamma}(n_t)\right)$ holds for any $1 \leq j \leq k$, we get that
$$\Phi_{G_{t'}}(\widehat{A}_i) = O\!\left(\Phi_{G_{t'}}\!\left(\bigcup_{p_j\in A^{\mathrm{c}}_i} P_j^{(t')}\right)\right) = O\!\left(k^{-6}\log^{-\gamma}(n_{t'})\right),$$
where the last equality holds because of Property (C3) of Lemma B.3. This proves the second statement.

Proof of Lemma 4.5. We first prove the first statement, by contradiction. Assume that
$$\lambda_{\ell+1}(\mathcal{L}_{G_{t'}}) < \frac{C\log^{-\alpha}(n_{t'})}{(\ell+1)^6}$$
for some constant $C$. Then, by the higher-order Cheeger inequality, there exists an optimal $(\ell+1)$-way partition $S = \{S_1, \ldots, S_{\ell+1}\}$ such that, for all $1 \leq i \leq \ell+1$,
$$\Phi_{G_{t'}}(S_i) \leq \rho_{G_{t'}}(\ell+1) \leq C_{2.1}\,(\ell+1)^3\sqrt{\lambda_{\ell+1}(G_{t'})} = O\!\left(\log^{-0.5\alpha}(n_{t'})\right).$$
By Lemma 4.4, it then holds that
$$\rho_{\widetilde{G}_{t'}}(\ell+1) = \max\!\left\{O\!\left(\log^{-0.45\alpha}(n_{t'})\right),\ O\!\left(k^{-6}\log^{-\gamma}(n_{t'})\right)\right\},$$
which contradicts the fact that $\rho_{\widetilde{G}_{t'}}(\ell+1) \geq \lambda_{\ell+1}(\mathcal{L}_{\widetilde{G}_{t'}})/2$, from which the first statement of the lemma follows.

Next we prove the second statement. We prove this by analysing the spectrum of $\mathcal{L}_{\widetilde{G}_{t'}}$ with respect to $\mathcal{L}_{G_{t'}}$ through $\mathcal{L}_{H_{t'}}$. As proven in Lemma B.2, $H_{t'}$ is a cluster-preserving sparsifier of $G_{t'}$, and therefore we know that
$$\lambda_{\ell+1}(\mathcal{L}_{H_{t'}}) = \Omega\!\left(\lambda_{\ell+1}(\mathcal{L}_{G_{t'}})\right). \tag{43}$$
Our next analysis is inspired by the work on meta graphs of Macgregor and Sun (Macgregor & Sun, 2022). We analyse the spectrum of $\mathcal{L}_{H_{t'}}$ with respect to the spectrum of $\mathcal{L}_{\widetilde{G}_{t'}}$, and for simplicity we write $H \triangleq H_{t'}$, $\widetilde{G} \triangleq \widetilde{G}_{t'}$, and $n \triangleq n_{t'}$. For every vertex $u_j \in V(\widetilde{G})$ in the contracted graph, we associate it with a non-empty group of vertices $A_j \subseteq V(H)$ as follows: for every $u_j \in \widetilde{V}^{\mathrm{nc}}_{t'}$, we associate $u_j$ with its unique corresponding vertex $v \in V(H)$, and for every $u_j = p_r \in \widetilde{V}^{\mathrm{c}}_{t'}$, we associate it with its corresponding vertices in the cluster $P_r^{(t')} \subseteq V(H)$. Then, let $\chi_j \in \mathbb{R}^n$ be the indicator vector of the vertices $A_j \subseteq V(H)$ corresponding to the vertex $u_j \in V(\widetilde{G})$. We define $\widetilde{n} \triangleq |V(\widetilde{G})|$, and let the eigenvalues of $\mathcal{L}_{\widetilde{G}}$ be $\gamma_1 \leq \gamma_2 \leq \ldots \leq \gamma_{\widetilde{n}}$ with corresponding eigenvectors $g_1, g_2, \ldots, g_{\widetilde{n}} \in \mathbb{R}^{\widetilde{n}}$. We further define vectors $\overline{g}_1, \ldots, \overline{g}_{\widetilde{n}} \in \mathbb{R}^n$, which represent the eigenvectors $g_1, \ldots, g_{\widetilde{n}}$ of the normalised Laplacian $\mathcal{L}_{\widetilde{G}}$ blown up to dimension $n$. Formally, we define
$$\overline{g}_i \triangleq \sum_{j=1}^{\widetilde{n}} \frac{D_H^{1/2}\chi_j}{\sqrt{\mathrm{vol}_H(A_j)}}\; g_i(j).$$
We can readily check that these vectors form an orthonormal set. First,
$$\overline{g}_i^\top \overline{g}_i = \sum_{j=1}^{\widetilde{n}} \sum_{u\in A_j} \frac{d_H(u)}{\mathrm{vol}_H(A_j)}\, g_i(j)^2 = \sum_{j=1}^{\widetilde{n}} g_i(j)^2 = 1,$$
and similarly, for any $i_1 \neq i_2$,
$$\overline{g}_{i_1}^\top \overline{g}_{i_2} = \sum_{j=1}^{\widetilde{n}} \sum_{u\in A_j} \frac{d_H(u)}{\mathrm{vol}_H(A_j)}\, g_{i_1}(j)\, g_{i_2}(j) = \sum_{j=1}^{\widetilde{n}} g_{i_1}(j)\, g_{i_2}(j) = 0.$$
We also get the useful property that, for the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $\mathcal{L}_H$ and $\gamma_1, \ldots, \gamma_{\widetilde{n}}$ of the contracted Laplacian $\mathcal{L}_{\widetilde{G}}$, it holds that $\lambda_i \leq 2\gamma_i$. In particular,
$$\overline{g}_i^\top \mathcal{L}_H \overline{g}_i = \sum_{\{u,v\}\in E(H)} w_H(u,v)\left(\frac{\overline{g}_i(u)}{\sqrt{d_H(u)}} - \frac{\overline{g}_i(v)}{\sqrt{d_H(v)}}\right)^2 = \sum_{x<y} w_H(A_x, A_y)\left(\frac{g_i(x)}{\sqrt{\mathrm{vol}_H(A_x)}} - \frac{g_i(y)}{\sqrt{\mathrm{vol}_H(A_y)}}\right)^2 \leq 2\, g_i^\top \mathcal{L}_{\widetilde{G}}\, g_i,$$
where the second equality uses that edges inside each $A_j$ contribute nothing to the sum, and the final inequality holds since, by construction of $\widetilde{G}$, the weights $w_{\widetilde{G}}(x,y)$ and degrees $d_{\widetilde{G}}(x)$ approximate $w_H(A_x, A_y)$ and $\mathrm{vol}_H(A_x)$ up to a constant factor. Therefore, we have an $i$-dimensional subspace $X_i \triangleq \mathrm{span}\{\overline{g}_1, \ldots, \overline{g}_i\}$ such that
$$\max_{x\in X_i} \frac{x^\top \mathcal{L}_H x}{x^\top x} \leq 2\gamma_i,$$
from which it follows by the Courant–Fischer theorem that $\lambda_i \leq 2\gamma_i$. Combining this with (43), we get that
$$\lambda_{\ell+1}(\mathcal{L}_{\widetilde{G}_{t'}}) \geq \frac{1}{2}\,\lambda_{\ell+1}(\mathcal{L}_{H_{t'}}) = \Omega\!\left(\lambda_{\ell+1}(\mathcal{L}_{G_{t'}})\right) = \Omega(1),$$
which proves the lemma.
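The lifting argument above can also be checked numerically. The following toy example, assuming an exact contraction (two planted blocks joined by a single edge, which is an illustrative setup and not the paper's construction), verifies $\lambda_i(\mathcal{L}_H) \leq 2\gamma_i$ at the bottom of the spectrum; internal edges become self-loop weight so that super-vertex degrees equal $\mathrm{vol}_H(A_j)$.

```python
import numpy as np

def normalised_laplacian(W):
    d = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(W.shape[0]) - d[:, None] * W * d[None, :]

def contract(W, groups):
    """Contract each vertex group into one super vertex; internal edge weight
    becomes self-loop weight, so vol_H(A_j) equals the super vertex degree."""
    C = np.zeros((W.shape[0], len(groups)))
    for j, group in enumerate(groups):
        C[list(group), j] = 1.0
    return C.T @ W @ C

# Two 10-cliques joined by a single edge, contracted to two super vertices.
n = 20
W = np.zeros((n, n))
for block in (range(0, 10), range(10, 20)):
    for a in block:
        for b in block:
            if a != b:
                W[a, b] = 1.0
W[0, 10] = W[10, 0] = 1.0

lam = np.linalg.eigvalsh(normalised_laplacian(W))   # spectrum of L_H
gam = np.linalg.eigvalsh(normalised_laplacian(
    contract(W, [range(0, 10), range(10, 20)])))    # spectrum of L_Gtilde
assert all(lam[i] <= 2 * gam[i] + 1e-9 for i in range(len(gam)))
```

Here the two smallest eigenvalues of the contracted Laplacian already reveal the two-cluster structure, while the bound $\lambda_i \leq 2\gamma_i$ guarantees that the contraction never hides an eigen-gap that exists in the full graph.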