Improved Dynamic Graph Learning through Fault-Tolerant Sparsification

Chun Jiang Zhu¹  Sabine Storandt²  Kam-Yiu Lam³  Song Han¹  Jinbo Bi¹

¹University of Connecticut  ²University of Konstanz  ³City University of Hong Kong. Correspondence to: Jinbo Bi.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

Graph sparsification has been used to improve the computational cost of learning over graphs, e.g., Laplacian-regularized estimation, graph semi-supervised learning (SSL) and spectral clustering (SC). However, when graphs vary over time, repeated sparsification requires polynomial-order computational cost per update. We propose a new type of graph sparsification, namely fault-tolerant (FT) sparsification, to significantly reduce the cost to only a constant. The computational cost of subsequent graph learning tasks can then be significantly improved with limited loss in their accuracy. In particular, we give a theoretical analysis to upper bound the loss in accuracy of the subsequent Laplacian-regularized estimation, graph SSL and SC due to the FT sparsification. In addition, FT spectral sparsification can be generalized to FT cut sparsification, for cut-based graph learning. Extensive experiments have confirmed the computational efficiency and accuracy of the proposed methods for learning on dynamic graphs.

1. Introduction

For statistical estimation problems over graphs, an effective regularization term can be based on the underlying graph structure, specifically, the graph Laplacian (Calandriello et al., 2018; Sadhanala et al., 2016). Consider a graph G(V, E, W) with n vertices and m edges; its Laplacian matrix is L_G = D_G − A_G, where A_G and D_G are the adjacency and degree matrices of G, respectively. (A_G and D_G are defined as (A_G)_{u,v} = W(u, v) if (u, v) ∈ E and (A_G)_{u,v} = 0 otherwise, and (D_G)_{u,u} = Σ_{w∈V} W(u, w) with (D_G)_{u,v} = 0 for u ≠ v.) Suppose y = (y_1, ..., y_n) are observations over the vertices of G. They are independently drawn from a model parameter β* = (β*_1, ..., β*_n) with Gaussian noise, i.e., for every i ∈ [1, n], E(y_i) = β*_i. Laplacian-regularized estimation then solves

    min_{β∈R^n} ||y − β||² + λ β L_G β^T,    (1)

where λ is a regularization parameter. Note that the regularization term β L_G β^T = Σ_{(i,j)∈E} W(i, j)(β_i − β_j)². So the intuition is to find a vector β̂ whose components vary smoothly over adjacent vertices in G, while λ controls the degree of smoothness. (Here we implicitly assume that the underlying signal β* is smooth over G.) Other spectral methods (those with a loss function based on the Laplacian) include graph Semi-Supervised Learning (SSL) (Zhu et al., 2003), Logistic Smoothing (Sadhanala et al., 2016), graph-regularized least squares (Belkin et al., 2005) and spectral clustering (SC) (Ng et al., 2001). There also exists another type of graph-based learning that is based on graph cuts and uses cut-based algorithms directly, instead of spectral methods, e.g., Min-Cut for SSL (Blum & Chawla, 2001), Max-Cut for SSL (Wang et al., 2013), Sparsest-Cut for hierarchical learning (Moses & Vaggos, 2017) and Max-Flow for SSL (Rustamov & Klosowski, 2018). However, all these methods suffer from a scalability problem. E.g., Equation (1) has the closed-form solution β̂ = (I + λL_G)^{−1} y.
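To make Equation (1) and its closed form concrete, the following is a minimal sketch that solves the estimation problem with a sparse direct solve; the random geometric graph, the signal, the noise level and λ are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla
import networkx as nx

# Illustrative graph and noisy vertex observations (assumed setup).
G = nx.random_geometric_graph(200, 0.15, seed=0)
n = G.number_of_nodes()
L = nx.laplacian_matrix(G).astype(float)        # L_G = D_G - A_G, sparse

rng = np.random.default_rng(0)
beta_star = rng.standard_normal(n)              # stand-in for the true signal
y = beta_star + 0.1 * rng.standard_normal(n)    # observations with Gaussian noise

lam = 0.1                                       # regularization parameter (assumed)
A = sp.identity(n, format="csr") + lam * L      # I + lam * L_G
beta_hat = spla.spsolve(A.tocsc(), y)           # closed-form solution of Eq. (1)
```

Since I + λL_G is symmetric positive definite, an iterative solver such as spla.cg would also apply, which is where the fast solvers discussed next come in.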
As the matrix I + λL_G is Symmetric Diagonally Dominant (SDD), one can use an optimal solver for SDD matrices, which requires Õ(m) time (the notation Õ hides a polylogarithmic factor) (Spielman & Teng, 2004). Nonetheless, even for mildly large, dense graphs with millions of edges, e.g., social networks, protein interaction networks and communication networks, the above methods are not feasible. Recently, (Calandriello et al., 2018; Sadhanala et al., 2016) proposed to preprocess G into a sparse yet spectrally similar graph, called a sparsifier H, of size Õ(n), for subsequent graph learning. By using H instead of G in the Laplacian-regularized term as below,

    min_{β∈R^n} ||y − β||² + λ' β L_H β^T,    (2)

one can solve β̃ = (I + λ' L_H)^{−1} y in only Õ(n) time, and it has been shown in these papers that the accuracy of the estimation is not significantly affected by using H.

Nonetheless, in practice graphs vary over time and can have updates in the form of vertex/edge insertions/deletions. Reducing the computational cost of learning on dynamically changing graphs is more challenging. E.g., solving Equation (1) at every time point is obviously not feasible: recall that the computational cost for a static graph at a single time point is already impractical. Then, similar to the static case, one may turn to incrementally maintaining, at each time point t, a sparse graph H_t spectrally similar to the current graph G_t, and then efficiently solving Equation (2) with L_{H_t}. To achieve incremental maintenance of H_t over time, there exist dynamic or streaming spectral sparsification algorithms, e.g., (Abraham et al., 2016; Kelner & Levin, 2013; Kapralov et al., 2014). However, these methods have two major problems: (1) the per-update computational cost is polylogarithmic w.r.t. n, which can still be high, especially when the number of updates is large; (2) they are quite complicated and purely theoretical, with no experimental studies, and thus may not be practical. Designing a simple and practical solution for learning on dynamic graphs is of great interest.

In this work, we propose a new type of graph sparsification to resolve the problems above for learning on dynamic graphs: the proposed fault-tolerant subgraphs have constant per-update computational cost, work for both vertex and edge insertions/deletions, and will be shown to have both theoretical and experimental guarantees on the accuracy of subsequent Laplacian-regularized estimation and graph SSL. For notational convenience, it is assumed that we start from a graph G = G_0 such that, for every time point t ≥ 1, the vertex set V_t and edge set E_t of G_t are subsets of V_0 and E_0, respectively. When this does not hold, we can add an additional graph G as the start graph, by including in G all the vertices and edges of G_0, G_1, ..., as long as they are pre-defined or known in advance. This is not uncommon in applications. E.g., in a topological network, the vertices and edges, which represent peers and their communication infrastructure respectively, are pre-defined, so that the actual communications between peers later take place and form communication networks over time. Moreover, the original assumption has its root in fault-tolerant subgraph studies (Chechik et al., 2010; Dinitz & Krauthgamer, 2011; Bodwin et al., 2018) and thus has many applications.
E.g., in a social network, users may prefer to temporarily unfriend some friends (for a post) or even disable their own accounts and recover them later. That is, these edges and vertices are marked as faulty temporarily. In a communication network, there are likewise frequently failing and later recovering computing nodes or communication links.

Specifically, for the spectral methods, e.g., Laplacian-regularized estimation and graph SSL (Zhu et al., 2003), we propose fault-tolerant spectral sparsifiers, while for cut-based methods, e.g., Min-Cut for SSL (Blum & Chawla, 2001) and Max-Cut for SSL (Wang et al., 2013), we propose fault-tolerant cut sparsifiers.

Figure 1: 1-FT cut sparsifiers of G: H1 and H2. (a) G with 36 edges of weight 1. (b) H1 with 18 edges of weight 2. (c) H2 with 12 edges of weight 3. Without loss of generality, consider that v is faulty. The Min-Cut of G − {v} is 5, while the Min-Cuts of H1 − {v} and H2 − {v} are 4 and 3, respectively. Then H1 and H2 are a 1-FT (1 ± 0.2)-cut sparsifier and a 1-FT (1 ± 0.4)-cut sparsifier of G, respectively.

Given a graph G, a (possibly re-weighted) subgraph H of G is called an f-vertex-fault-tolerant (FT) spectral (cut) sparsifier if, for all possible faulty vertex sets F of size |F| ≤ f, the subgraph H − F is guaranteed to be a spectral (cut) sparsifier of G − F, i.e., the Laplacian quadratic forms (all graph cuts) of H − F and G − F agree within a factor of 1 ± ϵ. The definition naturally extends to the edge-FT setting. We use Fig. 1 to illustrate the definitions, where two FT cut sparsifiers H1 and H2 of G with different ϵ are shown. We will discuss how the proposed FT sparsifiers can be used to reduce the computational cost.

Our Contribution. The high computational cost of graph-based learning, e.g., Laplacian-regularized estimation and graph SSL, has been alleviated by graph sparsification. However, when graphs change over time, repeated sparsification requires polynomial-order computational cost per update. We design computation-efficient methods to significantly reduce the cost to only a constant. To improve the computational cost of spectral methods, we propose efficient algorithms for constructing FT spectral sparsifiers, including a core algorithm, a parallel algorithm and a distributed algorithm. Furthermore, for cut-based methods, we propose a new algorithm for constructing FT cut sparsifiers. The computational cost of subsequent graph learning tasks can then be significantly improved with limited loss in their accuracy. In particular, we give a theoretical analysis to upper bound the loss in accuracy of the subsequent Laplacian-regularized estimation, graph SSL and SC due to the FT sparsification. Finally, we have performed extensive experiments to confirm the computational efficiency and accuracy of the proposed methods for learning on dynamic graphs.

2. Background

Graph SSL. In this problem, we are only given observations y_S of the signal β* for a subset of vertices S ⊆ V in G, and the task is to predict the signals for the remaining vertices V − S. A typical formulation is the harmonic function solution (HFS) (Zhu et al., 2003), which solves the following optimization problem:

    β̂_HFS = argmin_{β∈R^n} (1/|S|)(β − y)^T I_S (β − y) + λ β L_G β^T = (λ|S| L_G + I_S)^+ I_S y,    (3)

where I_S is the identity matrix with zeros at the vertices not in S.
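To make Equation (3) concrete, here is a minimal sketch of HFS on a toy graph, using a dense pseudo-inverse for clarity (a practical implementation would use a sparse SDD solver); the graph, the labeled set S, and λ are illustrative assumptions.

```python
import numpy as np
import networkx as nx

# Toy graph: two cliques joined by a bridge (illustrative assumption).
G = nx.barbell_graph(5, 1)
n = G.number_of_nodes()
L = nx.laplacian_matrix(G).toarray().astype(float)

# Labels observed only on the subset S of vertices; zeros elsewhere.
S = [0, n - 1]
y = np.zeros(n)
y[0], y[n - 1] = 1.0, -1.0

I_S = np.zeros((n, n))      # identity with zeros at unlabeled vertices
I_S[S, S] = 1.0

lam = 1e-2                  # regularization parameter (assumed value)

# Harmonic function solution, Eq. (3): (lam*|S|*L_G + I_S)^+ I_S y.
beta_hfs = np.linalg.pinv(lam * len(S) * L + I_S) @ (I_S @ y)
print(np.sign(beta_hfs))    # predicted labels for all vertices
```

Graph Sparsification.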
A re-weighted subgraph H of G is called a cut sparsifier of G if, for every vertex set S, the weight of the cut between S and V − S in H is within 1 ± ϵ times the weight of the cut in G (Benczur & Karger, 1996). (Fung et al., 2011) proposed to construct cut sparsifiers by sampling edges according to edge connectivities. In a seminal work, Spielman and Teng (Spielman & Teng, 2011) generalized cut sparsifiers to spectral sparsifiers. (Spielman & Srivastava, 2011) proposed to construct spectral sparsifiers by sampling edges with probability proportional to their effective resistances: G can be considered as an electrical resistive network, and the effective resistance between any two vertices in G is defined as the potential difference that has to be applied between them in order to drive one unit of current through the network G. A subgraph H of G(V, E) is called a k-spanner of G if for every u, v ∈ V, the distance between u and v in H is at most k times their distance in G (Peleg & Schaffer, 1989). (Kapralov & Panigrahy, 2012) showed that a spectral sparsifier can be obtained by repeated constructions of spanners. (Koutis & Xu, 2016) proposed an algorithmic framework based on spanners and sampling. (Lee & Sun, 2017) proposed the state-of-the-art algorithm for constructing Õ(n)-sized spectral sparsifiers in Õ(m) time. Graph sparsification has been widely used in linear algebra and algorithm design (Spielman & Teng, 2011), particularly in improving the computational cost of learning over graphs (Calandriello et al., 2018; Sadhanala et al., 2016).

Dynamic Graph Sparsification. There have been theoretical algorithms focusing on dynamic graph sparsification. (Kelner & Levin, 2013; Cohen et al., 2016) designed dynamic sparsification for insertion-only streams, which can support edge insertions, but their algorithms cannot process edge deletions. (Ahn et al., 2012) proposed a streaming algorithm for constructing cut sparsifiers. Later, (Kapralov et al., 2014) proposed a single-pass streaming algorithm for constructing spectral sparsifiers based on iterative sparsification (Li et al., 2013). (Abraham et al., 2016) proposed a fully dynamic sparsification by thoroughly modifying (Koutis & Xu, 2016) to guarantee that updates do not propagate in the recursive spanner constructions. However, all these algorithms require Ω(polylog(n)) computational cost per update. We reduce the cost to a constant under the assumption that the graph being maintained at every time point differs from the original graph by a bounded amount.

Fault-Tolerant Subgraphs. This topic has been extensively studied in the theory community. For short, we will use VFT and EFT to refer to Vertex-Fault-Tolerant and Edge-Fault-Tolerant, respectively. An f-VFT (f-EFT) k-spanner is a subgraph such that, for all vertex (edge) fault sets of size at most f, the remaining part of the subgraph is always a k-spanner of the remaining part of the original graph. They were first studied by (Levcopoulos et al., 1998) in the context of geometric graphs. (Chechik et al., 2010) generalized them to general graphs and proposed algorithms for constructing both VFT and EFT spanners. Recently, (Bodwin et al., 2018) proved that for a fixed k, every n-vertex graph has an f-VFT (f-EFT) spanner of stretch 2k − 1 and size O_k(f^{1−1/k} n^{1+1/k}) (the O_k notation suppresses a 2^{O(k)} factor). Due to the limit of space, further FT studies on graphs are discussed in the Appendix. Unfortunately, the important topics of FT spectral sparsifiers and FT cut sparsifiers have been largely ignored.
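Since effective-resistance sampling recurs throughout this paper, a small sketch may help make it concrete. The snippet below computes effective resistances through the Laplacian pseudo-inverse and samples edges with probability proportional to W(e) · R_G(e), in the spirit of (Spielman & Srivastava, 2011); the graph, the number of samples, and the dense pseudo-inverse are illustrative simplifications.

```python
import numpy as np
import networkx as nx

G = nx.karate_club_graph()                 # illustrative unweighted graph
n, edges = G.number_of_nodes(), list(G.edges())
L = nx.laplacian_matrix(G).toarray().astype(float)
L_pinv = np.linalg.pinv(L)                 # fine for small graphs

# Effective resistance R(u, v) = (e_u - e_v)^T L^+ (e_u - e_v).
def eff_res(u, v):
    chi = np.zeros(n)
    chi[u], chi[v] = 1.0, -1.0
    return chi @ L_pinv @ chi

# Sampling probabilities proportional to W(e) * R(e) (unit weights here).
r = np.array([eff_res(u, v) for u, v in edges])
p = r / r.sum()
rng = np.random.default_rng(0)
q = 60                                     # number of samples (assumed)
idx = rng.choice(len(edges), size=q, p=p)

# Re-weight each sample by 1/(q * p_e) so the Laplacian is preserved in expectation.
H = nx.Graph()
H.add_nodes_from(G)
for i in idx:
    u, v = edges[i]
    old = H.get_edge_data(u, v, {"weight": 0.0})["weight"]
    H.add_edge(u, v, weight=old + 1.0 / (q * p[i]))
```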
Notation and Definition. A weighted undirected graph G(V, E, W) consists of a vertex set V, an edge set E and a weight function W that assigns a weight W(e) to each edge e ∈ E. W can be omitted from the presentation when it is clear from the context. A simple path, or simply a path, P between u and v in G is a sequence of edges (u = v_1, v_2), ..., (v_l, v_{l+1} = v). Its distance is equal to Σ_{i=1}^{l} W(v_i, v_{i+1}). Each edge e in G has resistance R(e) = 1/W(e), and the effective resistance between any two vertices u and v in G is denoted by R_G(u, v). Suppose that P is a path connecting the two endpoints of an edge e; the stretch of e over P is α_P(e) = W(e) Σ_{e'∈P} (1/W(e')). For an arbitrary vertex set V_F, we use G − V_F to denote the remaining graph of G after removing the vertices in V_F and their incident edges. For an arbitrary edge set E_F, G − E_F denotes the remaining graph of G after removing the edges in E_F. For an arbitrary graph H, G − H denotes the remaining graph of G after removing the edges of H. The notation L_A ⪯ L_B means that for every vector x ∈ R^n, x^T L_A x ≤ x^T L_B x, while L_A ⪯_{{0,1}} L_B means that for every vector x ∈ {0, 1}^n, x^T L_A x ≤ x^T L_B x. For an event Z, we use Pr[Z] to denote the probability that the event Z happens. We say an event Z happens with high probability (w.h.p.) if Pr[Z] ≥ 1 − 1/n^c for some constant c > 1. The Laplacian matrix L^e_G of an edge e in G is the Laplacian matrix of the subgraph of G containing only the edge e; it is zero everywhere except in a 2 × 2 submatrix.

3. FT Spectral Sparsifiers

Before we provide algorithms for constructing FT spectral sparsifiers, we formally define them as follows. We first describe the core algorithm in Section 3.1, and then present the parallel and distributed algorithms in Section 3.2.

Definition 1. For a graph G(V, E), a positive integer f and a parameter ϵ ∈ (0, 1), a re-weighted subgraph H(V, E' ⊆ E) is an f-VFT (f-EFT) (1 ± ϵ)-spectral sparsifier if, for all vertex (edge) sets F ⊆ V (F ⊆ E) of size |F| ≤ f,

    (1 − ϵ) L_{G−F} ⪯ L_{H−F} ⪯ (1 + ϵ) L_{G−F}.

3.1. Proposed Algorithms

In this section, we propose algorithms for constructing FT spectral sparsifiers of bounded size. The results are summarized in the following theorem.

Theorem 1. For an n-vertex m-edge graph G, a positive integer f, a parameter ϵ ∈ (0, 1) and ρ > 1, an f-VFT (f-EFT) (1 ± ϵ)-spectral sparsifier for G of expected size O(fn log ρ + n log² n log³ ρ/ϵ² + m/ρ) can be constructed w.h.p.

We will use FT spanners, formally defined as follows.

Definition 2. For positive integers f and α, a subgraph H of a graph G is an f-VFT (f-EFT) α-spanner of G if, for all vertex (edge) sets F of size |F| ≤ f, for every u, v ∈ V the distance between u and v in H − F is at most α times their distance in G − F.

Our algorithm is inspired by the sparsification algorithm of (Koutis & Xu, 2016). However, we need to handle the cases in which some vertices or edges become faulty. The idea of the algorithm is to first construct an (f + t)-VFT/EFT spanner for the input graph G, by any VFT/EFT graph spanner algorithm, to determine a set of edges with small effective resistances in G.
The (f + t)-VFT/EFT spanner guarantees that, even in the presence of at most f faults, each non-spanner edge (edge not in the spanner) has t edge-disjoint paths between its endpoints in the spanner (and thus in the input graph G), serving as a certificate that the effective resistance of the edge is upper bounded. It then uniformly samples each non-spanner edge with a fixed constant probability, and scales the edge weight of each sampled edge proportionally to preserve the edge's expectation. This sampling step is inherently VFT/EFT because the sampling of each non-spanner edge is independent. By matrix concentration bounds, we can prove that the spanner together with the sampled non-spanner edges is a moderately sparse VFT/EFT spectral sparsifier, in which the number of edges has been reduced by a constant factor. The desired VFT/EFT spectral sparsifier can be obtained by repeating the process until we reach sufficient sparsity, which happens after a logarithmic number of iterations. This algorithm is simple, yet works for both the VFT and EFT settings. It is interesting to discover that FT spectral sparsifiers can be constructed via FT graph spanners, which extends the connection between spectral sparsifiers and spanners from the standard setting (Kapralov & Panigrahy, 2012) to the new FT setting.

Algorithm 1 Light-FTSS
Require: G(V, E), f > 0, ϵ ∈ (0, 1)
Ensure: H
  Construct an (f + 24 log² n/ϵ²)-FT (log n)-spanner J for G;
  H ← J;
  for each edge e ∈ G − J do
    With probability 0.25, add e to H with a new weight 4W(e);
  end for

Algorithm 2 FTSS
Require: G(V, E), f > 0, ϵ ∈ (0, 1), ρ > 1
Ensure: H
  G_0 ← G;
  for i ∈ [1, ⌈log ρ⌉] do
    G_i ← Light-FTSS(G_{i−1}, f, ϵ/⌈log ρ⌉);
  end for
  H ← G_{⌈log ρ⌉};

Specifically, our algorithm is summarized in Algorithms 1 and 2. For a graph G, a positive integer f and an ϵ ∈ (0, 1), Algorithm 1 (Light-FTSS) first constructs an (f + 24 log² n/ϵ²)-FT (log n)-spanner J for G and adds the edges of J to an edge set H. It then samples each non-spanner edge e ∈ G − J with a fixed probability 0.25 and adds e to H with weight 4W(e) if it is sampled. The probability 0.25 is chosen in order to make the expected size bound hold w.h.p. Finally, it returns H as a moderately sparse FT spectral sparsifier. Given a graph G, a positive integer f, an ϵ ∈ (0, 1) and ρ > 1, for every i ∈ [1, ⌈log ρ⌉] Algorithm 2 calls Light-FTSS with parameters G_{i−1}, f and ϵ/⌈log ρ⌉, where G_0 = G and G_j is the output of Light-FTSS in the iteration i = j. After the iterations terminate, G_{⌈log ρ⌉} is returned as the final FT spectral sparsifier.

We first prove the following key lemma.

Lemma 1. Let G be an n-vertex graph and J be an f-VFT (f-EFT) (log n)-spanner of G. For every edge e ∈ G − J, and every vertex (edge) set F of size |F| ≤ f̂ such that e ∈ G − F, we have W(e) R_{G−F}(e) ≤ log n/(f − f̂), which implies that W(e) L^e_{G−F} ⪯ (log n/(f − f̂)) L_{G−F}.

Proof. The VFT setting. We first show that for every edge e ∈ G − J, there are at least f vertex-disjoint paths P_1, ..., P_g between the two endpoints of e in J, such that for every i ∈ [1, g], α_{P_i}(e) ≤ log n. Suppose for contradiction that for an edge e ∈ G − J, there are only g < f such paths in J. Then, by suitably choosing a faulty vertex set F, we can invalidate all these g paths in the graph J − F.

Figure 2: A faulty vertex set F = {w_1, ..., w_f̂} of size f̂ can invalidate at most f̂ paths out of f vertex-disjoint paths between the endpoints u and v of an edge e(u, v). Here f = 5 and f̂ = 3.
Specifically, we select an arbitrary vertex, excluding the two endpoints of e, from each of P_1, ..., P_g to form F. By construction, |F| = g < f. For the edge e ∈ G − F, every path between its two endpoints in J − F has stretch greater than log n (possibly there is no such path at all). This contradicts the fact that J is an f-VFT (log n)-spanner of G.

We then show that for every edge e ∈ G − J, and every fault set F of size f̂ such that e ∈ G − F, there are at least f − f̂ vertex-disjoint paths P_1, ..., P_h (h ≥ f − f̂) between the two endpoints of e in J − F, where for every i ∈ [1, h], α_{P_i}(e) ≤ log n. This is because a fault set F of size f̂ can invalidate at most f̂ of the paths, as illustrated in Fig. 2. By definition, for every i ∈ [1, h], we have

    α_{P_i}(e) = W(e) Σ_{e'∈P_i} (1/W(e')) ≤ log n.    (4)

According to the formula for resistors connected in series, for every path P_i with i ∈ [1, h], the effective resistance between the two endpoints of e in P_i is

    R_{P_i}(e) = Σ_{e'∈P_i} R(e') = Σ_{e'∈P_i} (1/W(e')).    (5)

Combining Equations (4) and (5), we have that for every i ∈ [1, h], R_{P_i}(e) ≤ log n/W(e). According to the formula for resistors connected in parallel, for a set of edge-disjoint paths {P_1, ..., P_h} between e's two endpoints (vertex-disjointness subsumes edge-disjointness), letting P = ∪_{i=1}^{h} P_i be the union of these paths, the effective resistance between e's two endpoints in P is

    R_P(e) = (Σ_{i=1}^{h} (R_{P_i}(e))^{−1})^{−1} ≤ log n/(h · W(e)) ≤ log n/((f − f̂) · W(e)).

According to Rayleigh's monotonicity law (Doyle & Snell, 2000), for any subgraph H of G and any edge e ∈ G, R_G(e) ≤ R_H(e) holds. Therefore,

    R_{G−F}(e) ≤ R_P(e) ≤ log n/((f − f̂) · W(e)).    (6)

By (Spielman & Srivastava, 2011), we have L^e_G ⪯ R_G(e) L_G. Applying this to the graph G − F, we have

    L^e_{G−F} ⪯ R_{G−F}(e) L_{G−F}.    (7)

Combining Equation (7) with Equation (6), we have W(e) L^e_{G−F} ⪯ (log n/(f − f̂)) L_{G−F}.

The EFT setting. We show that for every edge e ∈ G − J, there are at least f edge-disjoint paths P_1, ..., P_g between the endpoints of e in J such that for every i ∈ [1, g], α_{P_i}(e) ≤ log n. Suppose for contradiction that for an edge e ∈ G − J, there are only g < f such paths. Then we can invalidate all g paths in J − F by selecting an arbitrary edge from each of P_1, ..., P_g and adding them to F. The size of F is g < f. For the edge e ∈ G − F, every path between its endpoints in J − F has stretch greater than log n (possibly the endpoints are not even connected in J − F), and thus α_{J−F}(e) > log n. This contradicts the fact that J is an f-EFT (log n)-spanner of G. The rest of the proof for the EFT setting is similar to that for the VFT setting, and is thus omitted here.

The proposed algorithm is a framework into which any VFT (EFT) graph spanner algorithm can be plugged, in Line 1 of Algorithm 1. We employ the state-of-the-art algorithms of (Bodwin et al., 2018) for constructing optimal VFT and suboptimal EFT spanners (for a fixed stretch), in order to derive the size bound in Theorem 1. We can use any other VFT (EFT) spanner algorithm here, but the resulting bounds can be slightly worse. The algorithm is a natural generalization of the greedy spanner algorithm of (Althofer et al., 1993) to the FT setting. Given a graph G(V, E), it examines the edges of E in non-decreasing order of weight (ties are broken arbitrarily), and adds an edge e to the current spanner H if and only if there exists a fault set F such that α_{H−F}(e) > α, where the stretch factor is fixed to the integer α = log n. We will use the following theorem.

Theorem 2. (Bodwin et al., 2018) Let G be an n-vertex graph without negative-weight cycles, and let α ≥ log n be a fixed integer.
For any positive (possibly non-constant) integer f, G has an f-VFT (f-EFT) α-spanner of size O(fn).

For the EFT setting, we can also use an alternative EFT (log n)-spanner algorithm (Chechik et al., 2010), combined with the greedy graph spanner algorithm of (Althofer et al., 1993), to get EFT spectral sparsifiers with the same size bound as in Theorem 1. For a graph G, the algorithm of (Chechik et al., 2010) constructs an f-EFT (log n)-spanner (also known as an f-bundle spanner) as the union of a sequence of graphs H = ∪_{i=1}^{f} H_i, where for every i ∈ [1, f], H_i is a (log n)-spanner of the graph G − ∪_{j=1}^{i−1} H_j. However, it cannot be extended to the VFT setting. We will also use the following variant (Harvey, 2012) of a matrix concentration bound by Tropp (Tropp, 2012).

Theorem 3. (Harvey, 2012) Let Y_1, ..., Y_k be independent positive semi-definite matrices of size n × n. Let Y = Σ_{i=1}^{k} Y_i and Z = E[Y]. Suppose that for every i ∈ [1, k], Y_i ⪯ S·Z, where S is a scalar. Then for all ϵ ∈ [0, 1], Pr[Σ_{i=1}^{k} Y_i ⪯ (1 − ϵ)Z] ≤ n exp(−ϵ²/(2S)) and Pr[Σ_{i=1}^{k} Y_i ⪰ (1 + ϵ)Z] ≤ n exp(−ϵ²/(3S)).

We are now ready to prove the following theorem summarizing the properties of Algorithm 1 (Light-FTSS).

Theorem 4. Given an n-vertex m-edge graph G, a positive integer f and ϵ ∈ (0, 1), Light-FTSS constructs an f-VFT (f-EFT) (1 ± ϵ)-spectral sparsifier for G of expected size O(fn + n log² n/ϵ² + m/2), with probability at least 1 − 1/n².

Proof. We only consider the VFT setting, as the proof for the EFT setting follows a similar principle. We first prove that the output H of Light-FTSS is an f-VFT (1 ± ϵ)-spectral sparsifier. Let F ⊆ V be any vertex fault set of size at most f. For every edge e ∈ G − J, let X_e be the random variable defined as

    X_e = 4W(e) L^e_{G−F} with probability 0.25, and X_e = 0 otherwise.

For every i ∈ [1, ⌈(ϵ²/(6 log n))^{−1}⌉], let J_i be the graph J − F with all edge weights scaled by ϵ²/(6 log n), which implies L_{J_i} = (ϵ²/(6 log n)) L_{J−F}. We then apply Theorem 3 to the random matrix Y = Σ_{e∈G−J} X_e + Σ_i L_{J_i}, where the ⌈(ϵ²/(6 log n))^{−1}⌉ deterministic summands L_{J_i} together sum to L_{J−F}. We have

    E(Y) = E(Σ_{e∈G−J} X_e + L_{J−F}) = Σ_{e∈G−J} E(X_e) + L_{J−F} = Σ_{e∈G−J} W(e) L^e_{G−F} + L_{J−F} = L_{G−F}.

By the definition of X_e and Lemma 1, for every e ∈ G − J we have

    X_e ⪯ 4W(e) L^e_{G−F} ⪯ (ϵ²/(6 log n)) L_{G−F}.

Furthermore, by the definition of J_i and the fact that L_{J−F} ⪯ L_{G−F}, we have for every i ∈ [1, ⌈(ϵ²/(6 log n))^{−1}⌉],

    L_{J_i} = (ϵ²/(6 log n)) L_{J−F} ⪯ (ϵ²/(6 log n)) L_{G−F}.

Now the condition of Theorem 3 is satisfied with S = ϵ²/(6 log n). Therefore, the inequality

    (1 − ϵ) L_{G−F} ⪯ L_{H−F} ⪯ (1 + ϵ) L_{G−F}    (8)

holds with probability at least 1 − (1/2) n exp(−3 log n) = 1 − 1/(2n²). This completes the proof of the spectral bound. We then prove the size bound. The number of edges in J is O(fn + n log² n/ϵ²), according to Theorem 2. By construction, the expected number of edges in H − J is m/4. Applying Chernoff's inequality gives that, with probability at least 1 − 1/(2n²), the number of edges in H − J is at most m/2, which implies that the total expected number of edges is O(fn + n log² n/ϵ² + m/2). By a union bound, the event that both inequality (8) and the size bound hold happens with probability at least 1 − 1/n².

Given Theorem 4, the proof of Theorem 1 is deferred to the Appendix due to the limit of space. Note that introducing tolerance to f vertex or edge faults costs us an extra term fn log ρ in the size of the spectral sparsifier, compared to the size of a standard combinatorial spectral sparsifier (Koutis & Xu, 2016). It is interesting to investigate how to further reduce this extra term, as (Bodwin et al., 2018) has shown that only a sub-linear term (w.r.t. f) is required when introducing the fault-tolerant property into graph spanners.
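To summarize Section 3.1 in code, below is a minimal, illustrative sketch of Light-FTSS and FTSS for the EFT setting. It uses the f-bundle spanner of (Chechik et al., 2010) built from a greedy spanner, rather than the optimal spanners of (Bodwin et al., 2018) behind the bound in Theorem 1; edge lengths are taken as 1/W(e), matching the stretch definition, and every edge is assumed to carry a positive 'weight' attribute. Readability is favored over the stated running time.

```python
import math
import networkx as nx

def greedy_spanner(G, alpha):
    """Althofer-style greedy alpha-spanner, with edge length 1/weight."""
    H = nx.Graph()
    H.add_nodes_from(G)
    # Process edges in non-decreasing order of length (assumes positive weights).
    for u, v, d in sorted(G.edges(data=True), key=lambda e: 1.0 / e[2]["weight"]):
        length = 1.0 / d["weight"]
        try:
            dist = nx.dijkstra_path_length(
                H, u, v, weight=lambda a, b, t: 1.0 / t["weight"])
        except nx.NetworkXNoPath:
            dist = math.inf
        if dist > alpha * length:   # no short alternative path yet: keep the edge
            H.add_edge(u, v, **d)
    return H

def bundle_eft_spanner(G, f, alpha):
    """f-EFT alpha-spanner as the union of f successive greedy spanners
    (the f-bundle construction of Chechik et al., 2010)."""
    rest, J = G.copy(), nx.Graph()
    J.add_nodes_from(G)
    for _ in range(f):
        S = greedy_spanner(rest, alpha)
        J.add_edges_from(S.edges(data=True))
        rest.remove_edges_from(S.edges())
    return J

def light_ftss(G, f, eps, rng):
    """Algorithm 1: FT spanner plus uniform 1/4-sampling of non-spanner edges."""
    alpha = math.log2(G.number_of_nodes())
    J = bundle_eft_spanner(G, f + math.ceil(24 * alpha**2 / eps**2), alpha)
    H = J.copy()
    for u, v, d in G.edges(data=True):
        if not H.has_edge(u, v) and rng.random() < 0.25:
            H.add_edge(u, v, weight=4.0 * d["weight"])  # rescale to keep expectation
    return H

def ftss(G, f, eps, rho, rng):
    """Algorithm 2: repeat Light-FTSS ceil(log rho) times with split epsilon."""
    H, rounds = G, math.ceil(math.log2(rho))
    for _ in range(rounds):
        H = light_ftss(H, f, eps / rounds, rng)
    return H
```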
3.2. Parallel and Distributed Algorithms

In this section, we present parallel and distributed EFT spectral sparsification algorithms. The models of computation used by our parallel and distributed algorithms are the CRCW PRAM model (Reif, 1993) and the synchronized distributed model (Coulouris & Dollimore, 1988), respectively. The results are summarized in the following theorems; their proofs have been moved to the Appendix due to the limit of space.

Theorem 5. For an n-vertex m-edge graph G, a positive integer f, a parameter ϵ ∈ (0, 1) and ρ > 1, an f-EFT (1 ± ϵ)-spectral sparsifier of size O(fn log n log ρ + n log³ n log³ ρ/ϵ² + m/ρ) can be constructed in the CRCW PRAM model using O(fm log n log ρ + m log³ n log³ ρ/ϵ²) work in O(f log n log ρ + log³ n log³ ρ/ϵ²) time, w.h.p.

Theorem 6. For an n-vertex m-edge graph G, a positive integer f, a parameter ϵ ∈ (0, 1) and ρ > 1, an f-EFT (1 ± ϵ)-spectral sparsifier of size O(fn log n log ρ + n log³ n log³ ρ/ϵ² + m/ρ) can be constructed in the synchronized distributed model in O(f log n log ρ + log⁴ n log³ ρ/ϵ²) rounds with O(fm log n log ρ + m log³ n log³ ρ/ϵ²) communication complexity, using messages of O(log n) size.

3.3. Using the FT Spectral Sparsifiers in Subsequent Graph Learning

The proposed FT sparsifiers have the following desirable properties by definition. At a time point t > 0, for each vertex v (edge e) inserted into G_{t−1}, if v (e) is in H_0, add v and its associated edges in H_0 (the edge e itself) to H_{t−1}. For each vertex v (edge e) deleted from G_{t−1}, if v (e) is in H_{t−1}, remove v and its associated edges (the edge e) from H_{t−1}. These operations incur only a constant computational cost per update. More importantly, the resulting subgraph is guaranteed to be a spectral sparsifier of the graph G_t at time point t, under the assumption that G_t differs from G_0 by a bounded amount. It can then be used in subsequent graph learning tasks without a significant loss in their accuracy.

4. FT Cut Sparsifiers

We first formally define FT cut sparsifiers, and then generalize the algorithm for FT spectral sparsifiers to an algorithm for constructing FT cut sparsifiers.

Definition 3. For a graph G(V, E), a positive integer f and a parameter ϵ ∈ (0, 1), a re-weighted subgraph H(V, E' ⊆ E) is an f-VFT (f-EFT) (1 ± ϵ)-cut sparsifier if, for all vertex (edge) sets F ⊆ V (F ⊆ E) of size |F| ≤ f,

    (1 − ϵ) L_{G−F} ⪯_{{0,1}} L_{H−F} ⪯_{{0,1}} (1 + ϵ) L_{G−F}.

As an important building block, we define a variant of maximum spanning trees (an MST is a spanning tree of a weighted graph having maximum weight), namely the FT α-MST.

Definition 4. For positive integers f and α, a subgraph H of a graph G is an f-VFT (f-EFT) α-MST of G if, for all vertex (edge) sets F of size |F| ≤ f, for every edge (u, v) in G − F there is a path P from u to v in H − F such that for every edge e' on P, W(u, v) ≤ α W(e').

Note that we relax the requirement that H be a tree: any subgraph with this property qualifies. We also relax the requirement of an exact MST, which corresponds to the case α = 1. We will explain the reason shortly.
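For intuition before that, here is one plausible way to obtain an f-EFT 1-MST (and hence an α-MST for any α ≥ 1), mirroring the bundle construction of Section 3: take the union of f + 1 successive maximum spanning forests, so that any f edge faults still leave, for each surviving edge, a heavy path in some forest the faults did not fully hit. This is an illustrative sketch under that assumption, our own extrapolation rather than the construction used in Algorithm 3.

```python
import networkx as nx

def eft_one_mst(G, f):
    """Candidate f-EFT 1-MST in the sense of Definition 4 (illustrative):
    the union of f+1 maximum spanning forests taken from disjoint edge sets."""
    rest, H = G.copy(), nx.Graph()
    H.add_nodes_from(G)
    for _ in range(f + 1):
        msf = list(nx.maximum_spanning_edges(rest, data=True))
        if not msf:
            break
        H.add_edges_from(msf)
        rest.remove_edges_from((u, v) for u, v, _ in msf)
    return H
```

The cycle property of maximum spanning forests gives the key certificate: for every non-forest edge (u, v), the forest path between u and v uses only edges of weight at least W(u, v), which is exactly the α = 1 requirement of Definition 4.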
We observe that the algorithmic framework for constructing FT spectral sparsifiers in Section 3.1 can be generalized to construct FT cut sparsifiers, if we replace FT spanners by FT α-MSTs and then carefully set the parameters. This is inspired by sampling according to edge connectivities. An FT α-MST can guarantee that, even in the presence of at most f faults, each edge in the MST has a lower bound on its edge connectivity. To preserve edge connectivities, there is no requirement on the weight of the edges in the cut defining the edge connectivities; therefore, we define and use an approximate version of the FT MST. After replacing the first line of Algorithm 1 by the construction of an (f + C_ϵ c log w log³ n/ϵ²)-FT (log n)-MST J for G, where w is the maximum ratio between the weights of two edges in G and C_ϵ > 0 and c > 1 are two new parameters, and making corresponding changes in Algorithm 2, we obtain Algorithms 3 and 4 for constructing FT cut sparsifiers. Due to the limit of space, we have moved the algorithms, and also the following theorem summarizing the main results for FT cut sparsifiers, to the Appendix.

Theorem 7. For an n-vertex m-edge graph G, a positive integer f, a parameter ϵ ∈ (0, 1), ρ > 1, a constant C_ϵ > 0 and a parameter c > 1, Algorithm 4 constructs an f-VFT (f-EFT) (1 ± ϵ)-cut sparsifier for G of expected size O(fn log ρ + n log² n log³ ρ/ϵ² + m/ρ), with probability at least 1 − n^{−c}.

5. Stability Bounds for Subsequent Tasks

In this section, we give stability bounds to quantify the impact of the FT sparsification on the accuracy of subsequent graph learning tasks. The FT spectral sparsifiers have the important property that, for every time point t, the subgraph H_t resulting from applying all incoming updates at t to H_{t−1} is a spectral sparsifier of the current graph G_t. We then immediately obtain, for all time points, upper bounds on the loss of accuracy of subsequent graph learning tasks due to the FT sparsification, by applying existing results for static graphs to every graph G_t and its spectral sparsifier H_t. We describe the details as follows.

Laplacian-Regularized Estimation. Recall that β̂ and β̃ are the estimates obtained using the exact graph G_t and the approximate subgraph H_t, as in Equations (1) and (2), respectively. The loss of accuracy is upper bounded by ||β̃ − β̂||₂² ≤ 2(1 + 2ϵ)λ β̂^T L_G β̂ if λ' = 2λ (Sadhanala et al., 2016). A slightly better bound can be obtained if λ' = λ. As can be seen from the bound, we can smoothly trade off accuracy and computational cost of the estimation by setting the value of the parameter ϵ of an FT spectral sparsifier. A small value of ϵ results in a small accuracy loss but a large number of edges in the FT sparsifier, which implies a high computational cost, while a large value of ϵ results in a small FT sparsifier but a large accuracy loss.

Graph SSL. Because the original HFS in Equation (3) is not stable, we use a stable version, Stable-HFS, for the theoretical analysis. With an added regularization term

    µ = ((γ|S| L_G + I_S)^+ I_S y)^T 1 / (1^T (γ|S| L_G + I_S)^+ 1),

Stable-HFS has the new solution β̂_STA = (γ|S| L_G + I_S)^+ (I_S y + µ1). Denote by R̂(β) = (1/|S|) Σ_{i=1}^{|S|} (β(x_i) − y(x_i))² the empirical error and by R(β) = (1/n) Σ_{i=1}^{n} (β(x_i) − y(x_i))² the generalization error on all labeled and unlabeled vertices. Theorem 2 in (Calandriello et al., 2018) upper bounds the generalization error R(β̃) of the sparsified solution by the empirical error R̂(β̂) plus additional terms; the last term contains ϵ, and increasing ϵ results in only a constant increase in the bound. Then, similar to Laplacian-regularized estimation, we can trade off accuracy and computational cost by appropriately setting the value of ϵ.

There exist strong stability bounds for other spectral methods, e.g., Logistic Smoothing (Sadhanala et al., 2016), graph-regularized least squares and Laplacian SVM (Belkin et al., 2005), and SC (Ng et al., 2001). We do not review each of them here, but give more discussion in the Appendix.
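As a sanity check on the estimation bound above, the following sketch compares the exact solution β̂ of Equation (1) with the approximate solution β̃ of Equation (2) computed on a (non-FT) effective-resistance sparsifier, and prints both sides of ||β̃ − β̂||₂² ≤ 2(1 + 2ϵ)λ β̂^T L_G β̂ for λ' = 2λ; the graph, the signal, ϵ and the sparsifier construction are illustrative stand-ins for the FT sparsifiers analyzed here.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(1)
G = nx.connected_watts_strogatz_graph(300, 8, 0.2, seed=1)
n = G.number_of_nodes()
L = nx.laplacian_matrix(G).toarray().astype(float)
y = rng.standard_normal(n)
lam, eps = 0.1, 0.5

# Effective-resistance sparsifier H (stand-in for an FT sparsifier).
edges = list(G.edges())
Lp = np.linalg.pinv(L)
r = np.array([Lp[u, u] + Lp[v, v] - 2 * Lp[u, v] for u, v in edges])
p = r / r.sum()
q = int(8 * n * np.log(n) / eps**2)          # number of samples (assumed)
L_H = np.zeros_like(L)
for i in rng.choice(len(edges), size=q, p=p):
    u, v = edges[i]
    w = 1.0 / (q * p[i])                     # re-weight to preserve expectation
    L_H[u, u] += w; L_H[v, v] += w
    L_H[u, v] -= w; L_H[v, u] -= w

beta_hat = np.linalg.solve(np.eye(n) + lam * L, y)        # Eq. (1) on G
beta_tld = np.linalg.solve(np.eye(n) + 2 * lam * L_H, y)  # Eq. (2) on H, lam' = 2*lam
lhs = np.sum((beta_tld - beta_hat) ** 2)
rhs = 2 * (1 + 2 * eps) * lam * beta_hat @ L @ beta_hat
print(f"||beta_tld - beta_hat||^2 = {lhs:.4f} <= bound {rhs:.4f}")
```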
Then similar to Laplacian-regularized estimation, we can tradeoff accuracy and computational cost by correctly setting the value of ϵ. There exists strong stability bounds for other spectral methods, e.g. Logistic Smoothing (Sadhanala et al., 2016), graphregularized least squares, Laplacian SVM (Belkin et al., 2005) and SC (Ng et al., 2001). We do not review each of Improved Dynamic Graph Learning through Fault-Tolerant Sparsification 1 2 3 4 5 6 7 8 9 10 Time point EXACT SPA FTSPA (a) Estimation, σ = 0.1 1 2 3 4 5 6 7 8 9 10 Time point EXACT SPA FTSPA (b) Estimation, σ = 0.01 1 2 3 4 5 6 7 8 9 10 Time point EXACT SPA FTSPA (c) SSL, σ = 0.1 1 2 3 4 5 6 7 8 9 10 Time point EXACT SPA FTSPA (d) SSL, σ = 0.01 Figure 3: Accuracy of Laplacian-regularized estimation and graph SSL for signals with Gaussian noise of σ = 0.1 and 0.01 them here, but give more discussions in the Appendix. 6. Experiments We empirically show that the computational cost for maintaining spectral sparsifiers can be significantly improved for dynamic graphs, while the accuracy of subsequent Laplacian-regularized estimation and graph SSL are not significantly affected. Dataset. We used the Facebook social network data from the SNAP 3. The numbers of vertices and edges in the egocombined graph are 4309 and 88234, respectively. A smooth signal β over the vertices was simulated following the approach in (Sadhanala et al., 2016). Then Gaussian noises from N(0, σ) were added to form observations y. For every time point in [1, 10], a random number of at most 200 insertions/deletions of randomly selected edges were generated. We can only simulate ten time points, because the baseline SPA (to be defined) incurs a high computational cost. Methods. We compared our algorithm FTSPA with a baseline SPA, which constructs a spectral sparsifier from scratch at every time point. We did not include the dynamic algorithms (Kyng et al., 2017; Kapralov et al., 2014) because none of these algorithms has been implemented before. The performance measures include: (1) the update time representing the computational cost, and (2) the error || β ˆβ||2 2 for Laplacian-regularized estimation, and the generalization error R(β) for graph SSL. All the results were obtained by taking the average values from five runs of sparsification. Results. We tested the errors and update times for four signals with different noises, σ {10 3, 10 2, 10 1, 100}. For Laplacian-regularized estimation, the parameter λ {10 3, 10 2, 10 1, 100}, while for graph SSL, λ {10 6, 10 4, 10 2, 100}. We will only report the smallest error achieved by different λ. In graph SSL, the labels were the sign of the signal β , and the number of revealed labeled vertices l = 1000 with 500 ones and 500 zeros, following (Calandriello et al., 2018). For our algorithm, the parameters were set to f {1, 3, 5, 7}, ϵ = 0.2 and 3https://snap.stanford.edu/data/ego-Facebook.html Methods Update Time Speedup # Edges SPA 34.2 s 1 12978 30 FTSPA 0.3 ms > 105 16502 41 Table 1: Update time and # edges of SPA and FTSPA ρ = 20. Due to limit of space, we only report the results for σ = {0.1, 0.01} and f = 1. The other results are similar and can be found in the Appendix. As shown in Figure 3, for both tasks on both signals, the errors of SPA and FTSPA are not significantly affected by the sparfication. They were compared with the exact method EXACT, which computes the solution using the current graph Gt at a time point t. 
However, the average update time of FTSPA over all time points is only 0.3 milliseconds, significantly smaller than that of SPA, 34.2 seconds; the speedup is over 10⁵. We observed that a small value of f, e.g., f = 1, is already enough to tolerate up to 200 edge updates on a graph with nearly 10⁵ edges. Moreover, the numbers of edges in SPA and FTSPA are only about 13K and 16.5K respectively, significantly smaller than the number of edges in the original graph, about 88K, as shown in Table 1. The computational cost of the subsequent Laplacian-regularized estimation and graph SSL is therefore significantly improved.

7. Conclusion and Future Work

Graph sparsification has been widely used in large-scale graph learning, e.g., Laplacian-regularized estimation, graph SSL and SC. However, when graphs vary over time, existing methods require polynomial-order computational cost per update. We reduce the cost to only a constant by proposing FT spectral sparsification and cut sparsification. We then analyze the stability bounds for subsequent learning tasks, e.g., Laplacian-regularized estimation and graph SSL. Extensive experiments have validated the computational efficiency and accuracy of the proposed methods. As future work, we will study the existence of FT sparsifiers for digraphs (Cohen et al., 2017; Zhu & Lam, 2017; 2018). Proving size lower bounds for FT spectral sparsification and cut sparsification is also an interesting direction.

Acknowledgements

We thank the four reviewers for their insightful comments. This work was supported by NSF grants DBI-1356655, CCF-1514357 and IIS-1718738. Jinbo Bi was also supported by NIH grants 5R01DA037349-04 and 5K02DA043063-03.

References

Abraham, I., Durfee, D., Koutis, I., Krinninger, S., and Peng, R. On fully dynamic graph sparsifiers. In Proceedings of IEEE FOCS Conference, pp. 335-344, 2016.

Ahn, K. J., Guha, S., and McGregor, A. Graph sketches: sparsification, spanners, and subgraphs. In Proceedings of PODS Conference, pp. 5-14, 2012.

Althofer, I., Das, G., Dobkin, D., Joseph, D., and Soares, J. On sparse spanners of weighted graphs. Discrete & Computational Geometry, 9:81-100, 1993.

Belkin, M., Niyogi, P., and Sindhwani, V. On manifold regularization. In Proceedings of AISTATS Conference, pp. 1, 2005.

Benczur, A. and Karger, D. Approximating s-t minimum cuts in Õ(n²) time. In Proceedings of ACM STOC Conference, pp. 47-55, 1996.

Blum, A. and Chawla, S. Learning from labeled and unlabeled data using graph mincuts. In Proceedings of ICML Conference, pp. 19-26, 2001.

Bodwin, G., Dinitz, M., Parter, M., and Williams, V. Optimal vertex fault tolerant spanners (for fixed stretch). In Proceedings of SIAM SODA Conference, pp. 1884-1900, 2018.

Calandriello, D., Koutis, I., Lazaric, A., and Valko, M. Improved large-scale graph learning through ridge spectral sparsification. In Proceedings of ICML Conference, pp. 688-697, 2018.

Chechik, S., Langberg, M., Peleg, D., and Roditty, L. Fault-tolerant spanners for general graphs. SIAM Journal on Computing, 39(7):3403-3423, 2010.

Cohen, M., Musco, C., and Pachocki, J. Online row sampling. In Proceedings of APPROX-RANDOM Conference, pp. 7:1-7:18, 2016.

Cohen, M., Kelner, J., Peebles, J., Peng, R., Rao, A., Sidford, A., and Vladu, A. Almost-linear-time algorithms for Markov chains and new spectral primitives for directed graphs. In Proceedings of STOC Conference, pp. 410-419, 2017.

Coulouris, G. F. and Dollimore, J. Distributed systems: concepts and design.
Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1988.

Dinitz, M. and Krauthgamer, R. Fault-tolerant spanners: better and simpler. In Proceedings of ACM PODC Conference, pp. 169-178, 2011.

Doyle, P. and Snell, J. Random walks and electric networks. https://arxiv.org/abs/math/0001057, 2000.

Fung, W., Hariharan, R., Harvey, N. J., and Panigrahi, D. A general framework for graph sparsification. In Proceedings of STOC Conference, pp. 71-80, 2011.

Harvey, N. Matrix concentration and sparsification. In Workshop on Randomized Numerical Linear Algebra: Theory and Practice, 2012.

Kapralov, M. and Panigrahy, R. Spectral sparsification via random spanners. In Proceedings of ITCS Conference, pp. 393-398, 2012.

Kapralov, M., Lee, Y., Musco, C., Musco, C., and Sidford, A. Single pass spectral sparsification in dynamic streams. In Proceedings of IEEE FOCS Conference, pp. 561-570, 2014.

Kelner, J. and Levin, A. Spectral sparsification in the semi-streaming setting. Theory of Computing Systems, 53(2):243-262, 2013.

Koutis, I. and Xu, S. Simple parallel and distributed algorithms for spectral graph sparsification. ACM Transactions on Parallel Computing, 3(2):14, 2016.

Kyng, R., Pachocki, J., Peng, R., and Sachdeva, S. A framework for analyzing resparsification algorithms. In Proceedings of SIAM SODA Conference, pp. 2032-2043, 2017.

Lee, Y. and Sun, H. An SDP-based algorithm for linear-sized spectral sparsification. In Proceedings of ACM STOC Conference, pp. 678-687, 2017.

Levcopoulos, C., Narasimhan, G., and Smid, M. Efficient algorithms for constructing fault-tolerant geometric spanners. In Proceedings of ACM STOC Conference, pp. 186-195, 1998.

Li, M., Miller, G., and Peng, R. Iterative row sampling. In Proceedings of IEEE FOCS Conference, pp. 127-136, 2013.

Moses, C. and Vaggos, C. Approximate hierarchical clustering via sparsest cut and spreading metrics. In Proceedings of SODA Conference, pp. 841-854, 2017.

Ng, A., Jordan, M., and Weiss, Y. On spectral clustering: analysis and an algorithm. In Proceedings of NIPS Conference, pp. 849-856, 2001.

Peleg, D. and Schaffer, A. Graph spanners. Journal of Graph Theory, 13(1):99-116, 1989.

Reif, J. H. Synthesis of parallel algorithms. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993.

Rustamov, R. and Klosowski, J. Interpretable graph-based semi-supervised learning via flows. In Proceedings of AAAI Conference, pp. 3976-3983, 2018.

Sadhanala, V., Wang, Y.-X., and Tibshirani, R. J. Graph sparsification approaches for Laplacian smoothing. In Proceedings of AISTATS Conference, pp. 1250-1259, 2016.

Spielman, D. and Srivastava, N. Graph sparsification by effective resistances. SIAM Journal on Computing, 40(6):1913-1926, 2011.

Spielman, D. and Teng, S.-H. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proceedings of STOC Conference, pp. 81-90, 2004.

Spielman, D. and Teng, S.-H. Spectral sparsification of graphs. SIAM Journal on Computing, 40(4):981-1025, 2011.

Tropp, J. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389-434, 2012.

Wang, J., Jebara, T., and Chang, S.-F. Semi-supervised learning using greedy max-cut. Journal of Machine Learning Research, 14:771-800, 2013.

Zhu, C. and Lam, K.-Y. Source-wise round-trip spanners. Information Processing Letters, 124(C):42-45, 2017.

Zhu, C. and Lam, K.-Y. Deterministic improved round-trip spanners.
Information Processing Letters, 129:57-60, 2018.

Zhu, X., Ghahramani, Z., and Lafferty, J. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of ICML Conference, pp. 912-919, 2003.