# differentially_private_densest_subgraph_detection__2b9ca0dd.pdf

Differentially Private Densest Subgraph Detection

Dung Nguyen 1 2 Anil Vullikanti 1 2

Densest subgraph detection is a fundamental graph mining problem, with a large number of applications. There has been a lot of work on efﬁcient algorithms for ﬁnding the densest subgraph in massive networks. However, in many domains, the network is private, and returning a densest subgraph can reveal information about the network. Differential privacy is a powerful framework to handle such settings. We study the densest subgraph problem in the edge privacy model, in which the edges of the graph are private. We present the ﬁrst sequential and parallel differentially private algorithms for this problem. We show that our algorithms have an additive approximation guarantee. We evaluate our algorithms on a large number of real-world networks, and observe a good privacy-accuracy tradeoff when the network has high density.

1. Introduction

Data privacy is a fundamental challenge in many real world applications, e.g., healthcare, social networks, and ﬁnance, where there is a risk of revealing private information through adversarial queries. Differential privacy (deﬁned in Section 2), developed through the work of a number of researchers, e.g., (Dwork, 2011; Blum et al., 2005; Dwork et al., 2014; Nissim et al., 2007), has proven to be a very powerful approach to support diverse kinds of computations with rigorous privacy guarantees (see (Zhu et al., 2017; Vadhan, 2017) for extensive surveys on this topic). This framework allows database owners to support queries with a very controlled and rigorous loss of privacy. Differentially private algorithms have now been designed for a number of problems, including supervised and unsupervised machine learning, social network analysis and deep learning (Kasiviswanathan et al., 2013; Blocki et al., 2013; Abadi et al., 2016; Abowd,

1Department of Computer Science, University of Virginia, Virginia, USA 2Biocomplexity Institute and Inintiative, University of Virginia, Virginia, USA. Correspondence to: Dung Nguyen <dungn@virginia.edu>.

Proceedings of the 38 th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).

2018). However, network problems have been proven to be much harder, and private algorithms for many problems (e.g., in graph mining) remain poorly understood. For some of the problems which have been studied, e.g., (Nguyen et al., 2016), good accuracy bounds are not known. Further, most private algorithms do not scale very well, especially for network problems, e.g., (Kasiviswanathan et al., 2013; Blocki et al., 2013), and there has been limited work on parallel algorithms.

Here, we study the problem of ﬁnding the densest subgraph in a graph G(V, E), which is a very basic subroutine in graph mining, and has been used in diverse domains, including bioinformatics, network science, fraud detection, and social network analysis, e.g., (Cadena et al., 2018; Hooi et al., 2016; Khuller & Saha, 2009; Cadena et al., 2016; Tsourakakis et al., 2013; Rozenshtein et al., 2014). There are many notions of density (Cadena et al., 2018); here, we focus on the notion of average density ρ(S) of S V , deﬁned as ρ(S) = (#edges with both end points in S)/|S|. The goal is to ﬁnd a densest subgraph S = argmax S V ρ(S); we use ρ(G) = ρ(S ) to indicate the density of the the densest subgraph in G. This notion of density is one of the most common studied ones, e.g., for anomaly detection in networked data (Cadena et al., 2018; Hooi et al., 2016), primarily because it can be computed very efﬁciently (Charikar, 2000; Asahiro et al., 2002; Khuller & Saha, 2009; Goldberg, 1984). In particular, the densest subgraph can be computed optimally using linear programming (Goldberg, 1984; Charikar, 2000), and a simple iterative greedy algorithm gives a 1

2-approximation (Charikar, 2000; Asahiro et al., 2002). Highly scalable parallel algorithms have also been developed, e.g., (Bahmani et al., 2012; Ghaffari et al., 2019). However, private algorithms are not known for the dense subgraph problem, which is the focus of this paper.

There are two standard approaches for privacy in networks edge privacy (in which the nodes are public and the edges are private) and node privacy (in which the nodes are also private). As we observe later, the maximum density value has low sensitivity can be computed easily using the Laplace mechanism. However, in graph mining settings, analysts are interested in ﬁnding the actual densest subgraph (i.e., the nodes in the subgraph), which is the focus of our paper, and is a much harder problem. It only makes sense in the edge privacy model. Our contributions are summarized below.

Differentially Private Densest Subgraph Detection

We ﬁrst present a sequential (ϵ, δ)-differentially private algorithm (SEQDENSEDP) for the densest subgraph problem in the edge privacy model. We adapt Charikar s algorithm (Charikar, 2000), using the exponential mechanism (Mc Sherry & Talwar, 2007; Gupta et al.) iteratively, and prove that our mechanism achieves the same multiplicative approximation factor as (Charikar, 2000), with an additive logarithmic factor, i.e., the solution S computed by SEQDENSEDP satisﬁes ρ(S) ρ(S )/2 O(log n), with high probability.

We alter the sampling process in SEQDENSEDP, and obtain a parallel algorithm (PARDENSEDP), which involves sampling multiple nodes in parallel, in each iteration. This works well in practice, in some regimes, but can have exponential time in the worst case. Our ﬁnal algorithm (PHASEDENSEDP) groups multiple iterations into a single sampling phase (basically guessing when a node would be sampled). We prove that PHASEDENSEDP takes O(log n) phases, with high probability. Both PARDENSEDP and PHASEDENSEDP have similar accuracy as SEQDENSEDP, though the constant factors get worse. We also design a distributed version of PHASEDENSEDP in the Map-Reduce model (Dean & Ghemawat, 2008; Karloff et al., 2010; Beame et al., 2013). We show it has similar accuracy and runtime guarantees as PHASEDENSEDP.

We evaluate our algorithms on a large number of diverse networks, in order to understand the privacyaccuracy tradeoffs. We ﬁnd that in networks with high densities, our private algorithms have good accuracy, especially for larger values of ϵ. Our algorithms also have reasonable recall, i.e., the private solution contains a large fraction of the optimal subgraph. SEQDENSEDP has better accuracy than PARDENSEDP, which, in turn, is better than PHASEDENSEDP. In some regimes of ϵ, PARDENSEDP has signiﬁcantly better accuracy than PHASEDENSEDP, and is comparable to SEQDENSEDP.

We show an additive lower bound of Ω( log n) for any differentially private algorithm for DENSEDP.

Due to the limited space, we have omitted many proofs and other discussion; these are presented in the Appendix. The full version including the Appendix is available at (Nguyen & Vullikanti, 2021). We note that our analysis is not very tight, in order to simplify the discussion; we expect the constant factors can be tightened with a more careful analysis.

We use the exponential mechanism (Mc Sherry & Talwar, 2007) in an iterative manner, which involves choosing elements from a set with probability that is proportional to

a score associated with the element. This approach has been used in private versions of other combinatorial optimization problems, e.g., vertex cover, global minimum cut and set cover, e.g., (Gupta et al.). However, this is inherently sequential, and we develop a parallel version, in which multiple elements can be chosen simultaneously. Consequentially, the analysis of the performance becomes much more challenging, and is one of our contributions.

2. Preliminaries

2.1. Problem statement

Densest subgraph. Let G = (V, E) denote an undirected graph. For a subset of nodes S V , let E[S] denote the set of edges with both end points in S. The density of a set S in G is deﬁned as ρ(S) = |E[S]|

|S| (Charikar, 2000; Khuller & Saha, 2009). The objective is to ﬁnd a subset S (G) = argmax S V ρ(S), which achieves the maximum density; we refer to it as S when the graph is clear from the context. Let ρ(G) = ρ(S ). We say that ρ(S) is an α-approximate solution if ρ(S) αρ(S ). An optimal densest subgraph (i.e., α = 1) can be computed optimally using a linear programming based algorithm, and an iterative algorithm gives a α = 1/2-approximation (Charikar, 2000; Khuller & Saha, 2009). However, this turns out to be difﬁcult to achieve under privacy, and we consider a slightly relaxed version: we say S is an (α, β)-approximation if ρ(S) αρ(S (G)) β. Let N G(v) denote the set of neighbors of v in G, and let deg G(v) = |N G(v)| denote the degree of v. For a subset S V , let deg G S (v) = |N(v) S| denote the number of neighbors of v in S in graph G; when G is clear from the context, we denote it by deg S(v).

Differential privacy on graphs. Let G denote a set of graphs on a ﬁxed set V of nodes. For a graph G G, we use V (G) and E(G) to denote the set of nodes and edges of G, respectively. In this paper, we will focus on the notion of Edge privacy (Blocki et al., 2013), where all graphs G G have a ﬁxed set of nodes V (G) = V , and two graphs G, G G are considered neighbors, i.e., G G , if they differ in exactly one edge, i.e., |E(G) E(G )| = 1. We note that another notion that has been considered is Node privacy (Kasiviswanathan et al., 2013): two graphs G, G G are considered neighbors, i.e., G G , if they differ in exactly one node, i.e., |V (G) V (G )| = 1. Note that they might differ in many edges.

Deﬁnition 2.1. A (randomized) algorithm M : G R is (ϵ, δ)-differentially private if for all subsets S R of its output space, and for all G, G G, with G G , we have Pr[M(G) S] eϵPr[M(G ) S] + δ (Dwork et al., 2014; Dwork, 2011; Vadhan, 2017; Blocki et al., 2013).

Problem statement: densest subgraph detection with edge differential privacy (DENSEDP) Given a family of

Differentially Private Densest Subgraph Detection

graphs G on a set V of vertices, and parameters ϵ, δ, construct an (ϵ, δ)-differentially private mechanism M : G 2V , such that: (1) for any graphs G G , and S 2V , Pr[M(G) S] eϵ Pr[M(G ) S]+δ, and (2) ρ(M(G)) is maximized.

We evaluate the accuracy of the mechanism M (in the second condition above) in terms of the (α, β)-approximation to ρ(S (G)) for any graph G G; therefore, we would like α to be as large as possible, and β to be as small as possible.

Computing density value vs ﬁnding a subgraph. The DENSEDP problem involves ﬁnding a subset M(G) V , where V is not private, but only the edges are private. It can be veriﬁed easily that the sensitivity (Deﬁnition 2.2) ρ = max G G |ρ(S (G)) ρ(S (G ))| 1, as the addition or removal of an edge can be shown to alter the density by at most 1. As a result, private computation of ρ(S (G)) can be done easily with the standard Laplacian mechanism Lap(1) (Dwork et al., 2014). However, returning a subgraph with high density, and with privacy is a harder problem.

2.2. Additional background on differential privacy

Let A1 : D1 R1 and A2 : D2 R2 be (ϵ1, δ1) and (ϵ2, δ2)-differentially private algorithms, respectively. Let f : R1 R 1 be an arbitrary randomized mapping. We will make extensive use of the following basic results shown in (Dwork et al., 2014), that differential privacy is preserved under: (1) post-processing (Proposition 2.1 of (Dwork et al., 2014)), i.e., f A1 : D R 1 is also (ϵ, δ)-differentially private, and (2) composition (Theorem 3.16 of (Dwork et al., 2014)), i.e., A : (D1, D2) (R1, R2) deﬁned as A(x) = (A1(x), A2(x)) is (ϵ1 + ϵ2, δ1 + δ2)-differentially private.

Deﬁnition 2.2. Sensitivity of utility function (Dwork et al., 2014). Given a dataset space D and an arbitrary range R, a function u : D R R has a global sensitivity u deﬁned as: u = max r R max x x |u(x, r) u(x , r)|

Deﬁnition 2.3. The Exponential Mechanism. (Deﬁnition 3.4 of (Dwork et al., 2014)) The exponential mechanism M(x, u, R) that outputs an element r R with probability proportional to exp( ϵu(x,r)

u ) is ϵ-differentially private, if the addition of an element to the data does not decrease the value of the utility function. Theorem 2.4. Utility of the Exponential Mechanism. (Theorem 3.11 and Corollary 3.12 of (Dwork et al., 2014)) For a given dataset x, let OPT = maxr Ru(x, r). For the exponential mechanism M( ), we have:

Pr u(x, M(x, u, R)) OPT 2 u

ϵ (ln |R| + r) e r

Adversarial Probabilistic Process. Following (Gupta et al.), consider an adversarial probabilisic process with

n iterations. In each iteration i, the adversary tosses a coin, which gives heads with probability pi [0, 1], adaptively by observing previous i-1 iterations. We utilize this process to prove the privacy bounds of our algorithms.

Lemma 2.5. Lemma B.1 of (Gupta et al.). Let Zi be a random variable, with Zi = 1 indicating that no head appears in the ﬁrst i iterations, and Zi = 0 otherwise. Let Y = Pn i=1 pi Zi. For any q, Pr[Y > q] exp( q).

Corollary 2.5.1. Let T be the ﬁrst iteration in which a head appears in the process above. For any δ (0, 1], PT 1 i=1 pi ln δ 1, with probability at least 1 δ.

3. Related Works

Differential privacy has been a very active area of research since its introduction. We brieﬂy summarize the main results and challenges; we focus only on works on privacy for graph that are directly relevant to our paper. Due to space constraints, we refer to (Dwork, 2011; Blum et al., 2005; Dwork et al., 2014; Nissim et al., 2007; Zhu et al., 2017; Vadhan, 2017) for surveys and details of key results in this area, and provide additional discussion in the Appendix.

Much of the initial work on graph privacy involved computing different kinds of statistics, e.g., number of edges, counts of triangles and other subgraphs, and the cost of the minimum spanning tree (Nissim et al., 2007; Karwa et al., 2014a;b; Mir & Wright, 2012; Hay et al., 2009; Kasiviswanathan et al., 2013; Blocki et al., 2013). Graph statistics are challenging due to their high sensitivity. (Nissim et al., 2007) introduce a concept of smooth sensitivity and apply it for triangle counting and minimum spanning tree problems in the edge privacy model. (Karwa et al., 2014a) generalize the method for other subgraph counting problems, such as k-star or k-triangle. Many other graph statistics in the edge-privacy model have been studied in (Karwa et al., 2014b; Mir & Wright, 2012; Hay et al., 2009). On the other hand, graph statistics in the node-privacy model are generally more difﬁcult. (Kasiviswanathan et al., 2013; Blocki et al., 2013) are two of the ﬁrst studies that utilize Lipschitz Extensions to develop node differentially private mechanisms. They transform subgraph counting problems into linear programming optimizations of which the sensitivities of the solutions are restricted. Algorithms for node differentially private graph queries such as degree distributions or random graph model estimations have been established by (Chen & Zhou, 2013; Ding et al., 2018; Day et al., 2016).

In contrast to graph statistics, releasing a subset of nodes in the input graph with differential privacy (as in the st-mincut, k-median, or the vertex cover problems), or a collection of subsets in other combinatorial optimization problems (as in the set cover problem) is much more challenging, e.g., (Mitrovic et al., 2017; Gupta et al.). One of the ﬁrst

Differentially Private Densest Subgraph Detection

papers on this was by (Gupta et al.), who design private algorithms for different combinatorial optimization problems, including computing global mincuts, vertex cover and set cover. Our algorithms are inspired by the techniques of (Gupta et al.).

Another direction of work is on generation of differentially private synthetic graphs (Xiao et al., 2014; Nguyen et al., 2015; Chen et al., 2014; Gupta et al., 2012), which preserve some speciﬁc graph properties, such as cut, shortest path length and degree distribution queries. Such synthetic graphs are useful since post-processing does not cause any privacy violation. Finally, we note that private graph algorithms have also been considered for other notions of privacy, e.g. (Imola et al., 2020).

4. Private Densest Subgraph

Here we design and analyze SEQDENSEDP for the DENSEDP problem. This is an adaptation of the (non private) algorithm Charikar (Charikar, 2000), which gives a 1/2-approximation for the densest subgraph problem this algorithm constructs a sequence of subgraphs S1, . . . , Sn (by removing a minimum degree node each time), and then chooses argmax Siρ(Si). SEQDENSEDP adapts this algorithm, and involves two parts. First, it generates a sequence of candidate subgraphs by removing one node at a time using an Exponential Mechanism, with probability proportional to an exponential function of its current degree times a constant. Therefore, lower-degree nodes have exponentially more chances to be removed than higher-degree nodes. Intuitively, we try to remove low-degree nodes and retain highdegree nodes to construct dense candidates. The second part of the algorithm applies the Exponential Mechanism to randomly select one of the candidates by their densities.

Algorithm 1 SEQDENSEDP(G, ϵ, δ) Input: G G, privacy parameters ϵ, δ. Output: Subset S V (G)

1: Let ϵ = ϵ 4 ln (e/δ), S0 = V (G) 2: for t := 1 to n do 3: Pick a node πt {v : v St 1} with probability propor-

tional to e ϵ deg St 1 (v)

4: St = St 1 πt 5: end for 6: Return St from {St : t = 0, . . . , n 1} with probability proportional to eϵρ(St)/2

Lemma 4.1. The computation of the sequence πG = π1, . . . , πn, in the for loop in Algorithm 1 (lines 3 5) is (ϵ/2, δ)-differentially private.

Proof. (Short, Full proof in Theorem A.1) Let G G be two graphs that differ in exactly one edge e = (u, v). Let πG and πG denote the permutations computed in the for

loop in Algorithm 1. Let T is the ﬁrst iteration that one of the endpoints of e (i.e., u or v) is removed. Let adj(e) deﬁne the set of nodes adjacent to edge e. Recall that deg G St(j) is the degree of node j in subgraph St of graph G. Fix a permutation π. Consider the ratio:

P = Pr[πG = π]

exp( ϵ deg G St(πt))/ P

j exp( ϵ deg G St(j))

exp( ϵ deg G St (πt))/ P

j exp( ϵ deg G St (j))

= exp( ϵ deg G ST (πT ))

exp( ϵ deg G ST (πT ))

j exp( ϵ deg G St (j)) P

j exp( ϵ deg G St(j))

The last equality follows from the deﬁnition of T , because: (i) for t = T + 1..n, deg G St(j) = deg G St (j) for all nodes j, and, (ii) for t = 1..T 1, deg G St(πt) = deg G St (πt). We have two cases for the analysis.

First, suppose G+e = G . In this case, exp( ϵ deg G ST (πT ))

exp( ϵ deg G ST (πT )) =

exp(ϵ ) because deg G ST (πT )+1 = deg G ST (πT ). Further, for each t, j, we have deg G St (j) deg G St(j) for all j, which im-

plies the product QT t=1

j exp( ϵ deg G St (j)) P

j exp( ϵ deg G St(j)) 1. Therefore,

Second, suppose G e = G . Similar to the ﬁrst case, exp( ϵ deg G ST (πT ))

exp( ϵ deg G ST (πT )) = exp( ϵ ) 1. For j adj(e), t T ,

we have deg G St(j) = deg G St (j) + 1. For j / adj(e), and for all t, deg G St(j) = deg G St (j). Further, for t > T , we have deg G St(j) = deg G St (j) for all j. Note that whenever we use deg G St(j), we only consider j St 1, i.e., j would not have been deleted from the graph before step t. Therefore, for t T : we can expand the term P

j exp( ϵ deg G St (j)) as P

j exp( ϵ deg G St (j)) = (eϵ 1) P

j adj(e) exp( ϵ deg G St(j)) + P

j exp( ϵ deg G St(j)). Substituting it in the expression for P. Since deg G St(j) = deg G St (j) for all j and t > T , we have

t=1 (1 + (exp(ϵ ) 1)

j adj(e) exp( ϵ deg G St(j)) P

j exp( ϵ deg G St(j)) )

t=1 (1 + (exp(ϵ ) 1)pt(G))

t=1 exp((exp(ϵ ) 1)pt(G)), using 1 + x ex for x 0,

where pt(G) is the probability that an endpoint of e is picked in iteration t. We will show in Lemma 4.2, by using the adversarial process (Section 2.2), that PT 1 t=1 pt(G) ln δ 1

with prob. at least 1 δ and use it to derive a bound on P. We say that π is ln δ 1-good if PT 1 t=1 pt(G) ln δ 1 and ln δ 1-bad otherwise (good and bad for short). Therefore,

Differentially Private Densest Subgraph Detection

with probability at least 1 δ, π is good and

t=1 exp((exp(ϵ ) 1)pt(G))

t=1 pt(G)), using ex 1 + 2x for x 1

exp(2ϵ (ln δ 1 + p T (G))) exp(2ϵ (ln δ 1 + 1))

= exp(2ϵ (ln(e/δ)) = exp(ϵ/2), since ϵ = ϵ 4 ln (e/δ),

which implies Pr[πG = π] exp(ϵ/2) Pr[πG = π].

Let P be the set of output orderings that might be generated by the for loop in SEQDENSEDP. Then, we have

Pr[πG P] = X

π P Pr[πG = π]

π P:πis good Pr[πG = π] + X

π P:π is bad Pr[πG = π]

exp(ϵ/2) Pr[πG P] + δ

The lemma then follows.

Lemma 4.2. Let pt(G) is the probability that an endpoint of edge e is pick in iteration t and T is the ﬁrst iteration that an endpoint of edge e is picked (as stated in Lemma 4.1), PT 1 t=1 pt(G) ln δ 1 with probability at least 1 δ.

Proof. We map our algorithm to the adversarial probabilistic process in Lemma 2.5 and Corollary 2.5.1. In each iteration t, the adversary chooses heads with probability equal to pt(G). Since pt(G) is calculated based on the outcomes of t 1 previous iterations, it satisﬁes the conditions of the adversarial model. In any iteration t, if no head has appeared in the previous t 1 iterations, we

have pt(G) =

j adj(e) exp( ϵ deg G St(j)) P

j exp( ϵ deg G St(j)) , since each node j is

picked with probability proportional to exp( ϵ deg G St(j)). Therefore, the two processes (i.e., our algorithm and the adversarial probabilistic process) are now equivalent. By Corollary 2.5.1, PT 1 t=1 pt(G) ln δ 1 with probability at least 1 δ. The lemma then follows.

Theorem 4.3. Algorithm 1 is (ϵ, δ)-differentially private.

Proof. By Lemma 4.1, the sequence {S1, S2, ...} is (ϵ/2, δ)-differentially private. For each St, adding or removing 1 edge to/from the graph changes ρ(St) by at most 1, hence ρ = 1. Applying the Exponential Mechanism, the last command releases an output with ϵ/2-differential privacy. Using the Composition Theorem, Algorithm 1 is (ϵ, δ)-differentially private.

Lemma 4.4. In each iteration t, the node vt picked by the exponential mechanism satisﬁes deg St(v) 2ρ(St) + 24

ϵ ln(e/δ) ln(n), with probability at least 1 1/n2.

Proof. Since 2ρ(St) = 2(#edges in St)

v St deg St(v)

|St| , it follows that minv St deg St(v) 2ρ(St). Let vmin be a node achieving the minimum degree in St. Then deg St(vmin) 2ρ(St). In the exponential mechanism run in iteration t, the sensitivity = 1, as |deg G St(v) deg G St (v)| 1. Applying Theorem 2.4 (with u(St, v) = deg G St(v) and r = 2 ln n), it follows that Pr[deg St(vt) deg St(vmin) + 6

ϵ ln(n)] e 2 ln n 1/n2. Substituting ϵ = ϵ 4 ln(e/δ), the Lemma follows.

Theorem 4.5. Let S be an optimal solution. Assuming n > 3, δ < 1/e, the set S output by Algorithm 1 satisﬁes ρ(S) ρ(S )

ϵ ln(1/δ) ln(n), with probability at least 1 2/n.

Proof. Applying a union bound with Lemma 4.4, it follows that for each iteration t, deg St(vt) 2ρ(St) + 24

ϵ ln(e/δ) ln(n), with probability 1 1/n. Let T denote the ﬁrst iteration that a node v T S is picked. As observed in (Bahmani et al., 2012), deg S (v) ρ(S ), therefore

ρ(S ) deg S (v) deg ST (v)

2ρ(ST ) + 24

ϵ ln(e/δ) ln(n),

with probability at least 1 1/n. Therefore, ρ(ST ) ρ(S )/2 12

ϵ ln(e/δ) ln(n), with prob. at least 1 1/n.

Next, applying Theorem 2.4 to the exponential mechanism in the last line of the algorithm, we have Pr[ρ(S) ρ(ST ) 4 log n+r

ϵ ] e r. Choosing r = ln n, we have ρ(S) ρ(ST ) 8 ln n/ϵ, with probability at least 1 1/n. Combining this with the bound on ρ(ST ), we have ρ(S) ρ(S )/2 12

ϵ ln(e/δ) ln(n) 8 ln n/ϵ.

Finally, we derive a coarse but simpler bound. For δ 1/e, we have 1 ln(e/δ) 2 ln(1/δ). Using these in the lower bound for ρ(S) gives ρ(S) ρ(S )

8 ln(1/δ)) 1

ϵ ln(1/δ) ln(n) ρ(S )

ϵ ln(1/δ) ln(n).

5. Parallel Private Densest Subgraph

5.1. Algorithm PARDENSEDP

Instead of picking a single node in each iteration, as in Algorithm SEQDENSEDP, the main idea in PARDENSEDP is to independently sample each node with the right probability. As a result, multiple nodes might be selected simultaneously, especially when the degrees are low. This change makes the privacy analysis more challenging because (1) we have to consider different scenarios of picking the endpoints of the extra edge e and (2) without the normalization terms, the probability P in the proof of Lemma 4.1 loses its symmetry.

Theorem 5.1. Algorithm 2 is (ϵ, δ)-differentially private.

Differentially Private Densest Subgraph Detection

Algorithm 2 PARDENSEDP(G, ϵ, δ) Input: G G, privacy parameters ϵ, δ. Output: A subset S V (G)

1: Let ϵ = 1 1/e 8 ln (e/δ)ϵ, and c = 1/ϵ + 1 2: S0 = V (G), t = 0 3: while |St| > 0 do 4: j St, assign a random variable ptj {0, 1} such that Pr[ptj = 1] = Ptj = exp( ϵ (deg St(j) + c)) 5: t := t + 1 6: πt = {j St : ptj = 1} 7: St = St 1\πt 8: end while 9: Return St from distinct set {St} with probability proportional to eϵρ(St)/2

Proof. (Sketch, full proof in Theorem A.4) Using the same notations in Lemma 4.1, we prove the sequence {S1, S2, ...} output by the while loop is (ϵ/2, δ)-differentially private. The remaining part is the same to Theorem 4.3. We construct P = Pr[πG=π]

Pr[πG =π], and deﬁne:

P G tj P G tj , Bt = Y

1 P G tj 1 P G tj ,

P G T j P G T j , D = Y

1 P G T j 1 P G T j

then expand P = QT 1 t=1 (At Bt) C D.

We consider 2 cases (1) G + e = G and (2) G e = G

and 2 subcases in each case: (a) both u, v are picked at step T and (b) exact one of them is picked at T . In Case 1, it is straightforward to prove P exp(2ϵ ), where Case 2 is more difﬁcult to analyze.

In Case 2, we prove At = 1 t < T . We reduce C and D, and depends on the subcases, they have slightly different forms. We then reduce P to (2a) P QT 1 t=1 Bt and (2b) P QT 1 t=1 Bt 1 P G T u 1 P G T u . Hence, the remain-

ing thing is to bound QT 1 t=1 Bt. Since for all j adj(e), deg G St(j) = deg G St (j), the terms corresponding to such nodes cancel out in Bt, which leads to QT 1 t=1 Bt exp( 4ϵ 1 e 1 PT 1 t=1 puv t (G)), in which puv t (G) is the probability of picking at least one of u, v in iteration t. Mapping the Algorithm s sampling process to the Adversarial Probabilistic Process in Lemma 2.5, we ﬁnd the upper bound of PT 1 t=1 puv t (G) is ln δ 1 with probability at least 1 δ. It follows that P exp( 4ϵ ln(1/δ)

1 1/e ) exp(ϵ/2) in Case 2a

and P exp( 4ϵ ln(e/δ)

1 1/e ) = exp(ϵ/2) in Case 2b. Using the same argument as in the last part of Lemma 4.1, the proof follows.

Theorem 5.2. (Proof in Theorem A.5) Let S be an optimal solution. If δ 1/e, the set S output by Algorithm 2

satisﬁes ρ(S) ρ(S )

ϵ ln(1/δ) ln n, with probability at least 1 2/n.

Running time of PARDENSEDP. Each node is removed in an iteration with probability e ϵ deg St 1(v). This could be as small as e Θ(n), e.g., if St 1 has a dense subgraph in which each node has very high degree. Therefore, the number of iterations could be eΘ(n) in the worst case (this is borne out of our experiments on some networks).

5.2. Algorithm PHASEDENSEDP: Logarithmic Bounded Runtime Algorithm

Algorithm 2 can execute in parallel but it has exponential runtime. This section presents a modiﬁed version of PARDENSEDP that has logarithmic bounded runtime, named PHASEDENSEDP. We group iterations into phases and aim to remove at least a constant fraction of nodes in each phase (Lemma 5.3). Also, we only update node s new degrees at the end of each phase, allowing the sampling process per each node to run independently as its degree remains unchanged in the current phase. The privacy analysis of PHASEDENSEDP differs from PARDENSEDP s in two key ways: (1) the terms corresponding to the endpoints of the extra edge e are not updated until the end of the current phase and (2) it requires two instances of the Adversarial Probabilistic Process to mimic the sampling process.

Algorithm 3 PHASEDENSEDP(G, ϵ, δ) Input: G G, privacy parameters ϵ, δ. Output: A subset S V (G)

1: Let ϵ = 1 1/e 24 ln (4/δ)ϵ, and c = 1/ϵ + 1 2: Let π0 = V, S0 = V , T0 = 0, i = 1 3: while Si = do 4: if |Si| ln n then 5: Ti = Ti 1 + 1, πTi = Si Si+1 = , i = i + 1 6: else 7: For each v Si, let d(v) = deg Si(v) 8: Let Pv = exp( ϵ (d(v) + c)) 9: Let Tv be chosen from a geometric random process with probability Pv 10: Let [ ρ(Si) = ρ(Si) + 16

ϵ ln n + Lap( 4 ln n

11: Let Ti = exp(ϵ (4 [ ρ(Si) + c)) 4 ln n 12: For t Ti, let πt = {v : Tv = t} 13: Si+1 = Si \ Ti t =Ti 1+1πt 14: i = i + 1 15: end if 16: end while 17: Return S distinct set {S0, S1, . . . , } with probability proportional to eϵρ(Si)/2

Lemma 5.3. The number of phases is at most log n, with probability at least 1 1 n2 .

Proof. Consider any phase i, except the last one, and let

Differentially Private Densest Subgraph Detection

v Si be a node with d Si(v) 4ρ(Si). Since Tv is sampled from a geometric distribution, Pr[Tv > 4 ln n/Pv] (1 Pv)4 ln n/Pv 1/n4. Therefore, with probability at least 1 1/n3, for all nodes v Si, we have Tv exp(ϵ (d Si(v) + c))4 ln n exp(ϵ (4ρ(Si) + c))4 ln n exp(ϵ (4 [ ρ(Si)+c))4 ln n = Ti, where the second inequality is because d Si(v) 4ρ(Si) and the last inequality follows Lemma A.6. Let A(Si) = Ti t =Ti 1+1πt denote the set of nodes removed in this phase.

Therefore, with probability at least 1 1/n3, every node v Si+1 = Si A(Si) has d Si(v) > 4ρ(Si). We have 2|E(Si)| = P

v A(Si) d Si(v) + P

v Si+1 d Si(v)

|Si+1|4ρ(Si) = 4|Si+1| |E(Si)|

|Si| , which implies |Si+1| |Si|/2. Thus, with probability at least 1 1/n3, the number of nodes in each phase reduces by factor of 2. Therefore, with probability at least 1 log n

n3 1 1 n2 , the Lemma follows.

Theorem 5.4. Algorithm 3 preserves (ϵ, δ)-differential privacy, with ϵ (0, 1], δ (2n 2, 1).

Proof. (Sketch; full proof in Theorem A.10) Similar to Theorem 5.1, we prove the sequence {S1, S2, ...} output by the while loop is (ϵ/2, δ)-differentially private.

First, prove the sequence {π0, π1, . . .} is (ϵ/4, δ/2)- differentially private. Using the same notations and expansion, we re-analyze the same cases in Theorem 5.1. Analysis in Case 1a and 2a (where u and v are removed at the same iteration) remains unchanged. We re-evaluate Case 1b and 2b. Assume u is removed ﬁrst at Tu and v is removed later at T = Tv > Tu.

In Case 1b, we prove that At = 1 t = Tu and ATu = exp(ϵ ), C and D are the same as in Theorem 5.1. We split the product of Bt into 2 ranges Bt = (1 P G tu)(1 P G tv) (1 P G tu )(1 P G tv ) for

t [1, Tu 1] and Bt = 1 P G tv 1 P G tv for t [Tu, Tv 1]. In

either case, Bt 1. We have P = QTv 1 t=1 (At Bt) C D exp(2ϵ ).

In Case 2b, similar to Theorem 5.1, we prove At 1, C 1, D = 1, and split the product of Bt into two parts, where the latter arises since we do not update the degree of v after removing u: QTv 1 t=1 Bt = QTu 1 t=1 (1 P G tu)(1 P G tv) (1 P G tu )(1 P G tv ) QTv 1 t=Tu 1 P G tv 1 P G tv QTu 1 t=1 exp( 4ϵ pt(G)

1 e 1 ) QTv Tu 1 t =1 exp( 2ϵ qv t (G) 1 e 1 ). We use two instances of the Adversarial Probabilistic Process to calculate the upper bound on the product of Bt as QTv 1 t=1 Bt exp( 4ϵ ln (4/δ)

1 e 1 ) exp( 2ϵ ln (4/δ)

1 e 1 ) = exp( 6ϵ ln (4/δ)

1 e 1 ), with probability at least 1 δ/2. Second, we prove the sequence {\ ρ(S0), \ ρ(S0), . . .} is (ϵ/4, δ/2)-differentially private, using the Laplace mech-

anism on each [ ρ(Si) and composition over ln n elements of the sequence. Finally, Because {S0, S1, . . .} is composed from {π0, π1, . . .} and the the information computed from {\ ρ(S0), \ ρ(S1), . . .}, it is (ϵ/2, δ)-differentially private.

Theorem 5.5. (Proof in Theorem A.11) Let S be an optimal solution. With probability at least 1 2/n, the set S output by Algorithm 3 satisﬁes ρ(S) ρ(S )

ϵ ln(1/δ) ln(n) if δ 1/e.

6. Map-Reduce Implementation

We brieﬂy describe MRDENSEDP (Algorithm 4) a Map Reduce implementation of PHASEDENSEDP. We prove in Lemma A.13 that MRDENSEDP preserves the same privacy guarantee and runtime as of PHASEDENSEDP. Each iteration i of MRDENSEDP contains 3 Reduce subphases and 1 Map subphase and is equivalent to phase i of PHASEDENSEDP. The ﬁrst Reduce subphase takes input as pairs of nodes that constitute an edge and group them by endpoints. It creates a list of neighbors of each node to keep track of current nodes and calculate their degrees. The second Reduce subphase groups the current nodes together to form a candidate set Si while the third one gathers the nodes degrees to calculate the density Si. The Map subphase performs the sampling process. It samples the removal time of each node and compares to the cut-off time of the current phase to decide if the node is picked or not. If the node is picked, it emits the node s self-edge to signal the next iteration s ﬁrst Reduce subphase that the node is removed. Otherwise, it emits the edges adjacent to the node for the next iteration to rebuild the graph.

7. Lower bound

Theorem 7.1. Any (ϵ, δ)-differentially private algorithm for DENSEDP must incur an additive lower bound of Ω( p

log n/ϵ) for δ 1/n.

Proof. Let G = (V, E) denote a graph with |V | = n and E = . Let M(G) be an (ϵ, δ)-differentially private mechanism. Let V be partitioned into sets V1, . . . , VN, for N = n/a, a 2 = 1 3ϵ log n, with each |Vi| = a. For i {1, . . . , N}, deﬁne Si = {S V : S Vi = , |S| a2}. We prove in Lemma A.12 that there exists an i {1, . . . , N} such that Pr[M(G) Si] 2a3

Next, let G = (V, E ) be a graph such that Vi is a clique in E , and there are no additional edges. Then, |E | = a 2 = 1 3ϵ log n. By the group privacy property of (ϵ, δ)-differential privacy (Lemma 2.2 of (Vadhan, 2017)), it follows that Pr[M(G ) Si] Pr[M(G) Si] exp( 1

3 log n) + 1 3ϵ log n exp( 1

3 log n)δ 2a3

n n1/3 + log n

n2/3 + log n 3ϵn2/3 3a3

Differentially Private Densest Subgraph Detection

For any S Si, we have ρ(S) 1/2, since either (1)

|S| > a2, which implies ρ(S) ( a 2) a2 1

2, since S may have up to a 2 edges. or (2) |S| a2, S Vi = , which implies ρ(S) = 0. If M(G ) Si, we have ρ(M(G )) a/2. Hence, E[ρ(M(G ))] a 2 Pr[M(G ) Si] + 1

2 (1 Pr[M(G ) Si]) 3a4

2 1, and the theorem follows since ρ(G ) = a 1

8. Experiments

Our experiments address the following questions. Accuracy of our proposed methods. How does the accuracy of the solution computed using our private algorithms vary with ϵ and δ in different graphs? Recall. To what extent can our private algorithms ﬁnd the densest subgraph, as quantiﬁed by the recall? Efﬁciency of the parallel algorithms. How does the number of iterations vary, relative to SEQDENSEDP?

8.1. Baselines and measurements

We use the algorithm of (Charikar, 2000) as a baseline to compute a non-private densest solution for performance evaluation in our experiments. Let Sb denote the subgraph output by the baseline algorithm, and let M(G) denote the solution computed by our private algorithm M on graph G (for ﬁxed ϵ, δ). We consider the following metrics for evaluating our algorithms. For all these metrics, the closer they are to 1, the better our algorithms perform, in comparison to the baseline.

Relative density: this is deﬁned as ρ(M(G))/ρ(Sb). Jaccard index: |M(G) Sb|/|M(G) Sb|, which captures similarity between M(G) and Sb. Recall: |M(G) Sb|/|Sb|, which quantiﬁes what fraction of nodes in Sb are selected by M(G).

Table 1 lists 6 different networks from SNAP database we use to evaluate our results these are chosen to be of various sizes, in order to understand the impact of network structure on the results (Leskovec & Krevl, 2014). Due to the space limit, we present the setup and experiments on an extended list of 20 networks (Table 2) in Section B.1. Also, we only show results for SEQDENSEDP and PARDENSEDP here; see the Appendix for a detailed comparison with PHASEDENSEDP.

8.2. Experimental Results

Utility analysis of private Algorithms. Figure 1 shows the densities of private subgraphs in relative to the densities of the baselines. In most networks which have average degree larger than 4, we observe that SEQDENSEDP has relative

Network Nodes Edges Description ca-Gr Qc 5242 14496 Collab. net. Arxiv General Rel. musae DE 9498 153138 Social net. of Twitch (DE) musae ENGB 7126 35324 Social net. of Twitch (ENGB) ca-Astro Ph 18772 198110 Collab. net. Arxiv Astro Phy. musae squirrel 5201 198493 Wiki page-page net. (squirrel) facebook 4039 88234 Social circles from Facebook

Table 1. Summary of 6 of the 20 networks used for the experiments (Leskovec & Krevl, 2014); the remaining networks are summarized in Table 2 of the Appendix.

Figure 1. Accuracy in term of relative density of our private algorithms. The number in each graph indicates the density of the graph as whole (which equals the average degree).

density 75% for ϵ = 2 or 4. For ϵ = 2, SEQDENSEDP has the same density as the non-private baseline in four of six networks and has at least 50% of the density of the baseline in all except one. Figure 5 shows the relationship between the relative density of the solution computed by our private mechanism and the network density (i.e., the average degree) for the twenty networks we study. It conﬁrms that, in general, the accuracy is higher in higher density graphs. In general, we observe the trade-off between accuracy and privacy in all measurements. Higher ϵ, which means less privacy guarantee, yields better accuracy. In contrast, Figures 1, 2, 3 show that δ does not have signiﬁcant impacts on the quality of solutions. We show in the Appendix that, in general, PARDENSEDP outperforms PHASEDENSEDP at ϵ 2. For small ϵ, neither algorithm has good accuracy, though PHASEDENSEDP is slightly better.

Jaccard index and recall. Figure 2 shows the Jaccard similarity between the private subgraphs and Sb. The coefﬁcients vary greatly across networks. Four out of the six networks have Jaccard similarity coefﬁcients at least 0.5 for ϵ 2 with SEQDENSEDP. As with the relative density, networks with higher density tend to have better Jaccard similarity coefﬁcients. Figure 3 shows the recall of our algorithms, i.e., the fraction of nodes in Sb which are selected by our private algorithms. The recall for SEQDENSEDP is at least 75% in all networks for ϵ 1, which indicates that the algorithm successfully ﬁnds most nodes in Sb.

Differentially Private Densest Subgraph Detection

Figure 2. Jaccard similarity coefﬁcient of private subgraphs and non-private baselines.

Figure 3. Recall: Fraction of nodes of the baseline s subgraphs are included in our algorithm s outputs.

Performance of parallel algorithms. Figure 4 shows the ratio of the number of iterations taken by PARDENSEDP and SEQDENSEDP(recall that the number of iterations for SEQDENSEDP is n). Due to the nature of the sampling probabilities in PARDENSEDP, the probability of removing a node reduces as ϵ increases. Consequently, PARDENSEDP takes more iterations to remove all nodes with larger ϵ and smaller δ. In all but the musae squirrel network (Figure 4), the number of iterations for PARDENSEDP is about 1% of that for SEQDENSEDP, when ϵ 8. Hence in most cases, PARDENSEDP is much more efﬁcient.

9. Conclusions

In this paper, we design the ﬁrst sequential and parallel differentially private algorithms for the densest subgraph problem in the edge-privacy model. All of them give (1/2, O(log n))-approximate solutions, with high probability. In other words, they match the 1/2 approximation of (Charikar, 2000), with an additive approximation of O(log n); we also prove a lower bound on the additive term of Ω( log n). Our main technical contributions include adaptation of the exponential mechanism to be applied in parallel, and the analysis of privacy and accuracy. Our experiments on 20 networks show that our algorithms have good accuracy in high-density networks and reasonable recall,

Figure 4. Number of iterations taken by PARDENSEDP, relative to SEQDENSEDP.

Figure 5. The relationship between accuracy (relative density) and network density. We use ϵ {1, 2}, δ = 10 6. Each point corresponds to one network. The solid lines show linear models for the relative density and the network density. The plot conﬁrms that our algorithms have better accuracy on higher density networks.

overall, when ϵ 2. SEQDENSEDP has better accuracy while the parallel variants reduce the number of iterations signiﬁcantly in most settings. Our experiments suggest that PARDENSEDP is good enough for practical uses among two parallel variants. Our paper leads to many open questions. Can we improve the runtime, e.g., by a combination of PARDENSEDP and PHASEDENSEDP? The most intriguing problem is to close the gap between the upper and lower bounds in accuracy. In particular, is an O( log n) additive approximation to (Charikar, 2000) possible to achieve? Is it possible to obtain a (1, O(log n))-approximate solution, i.e., a purely additive approximation?

Acknowledgement. We thank anonymous ICML reviewers for helpful suggestions. A. Vullikanti is supported by NSF grants IIS-1931628, CCF-1918656, NSF IIS-1955797, NIH grant R01GM109718.

Differentially Private Densest Subgraph Detection

Abadi, M., Chu, A., Goodfellow, I., Mc Mahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS 16, pp. 308 318, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450341394. doi: 10.1145/ 2976749.2978318. URL https://doi.org/10. 1145/2976749.2978318.

Abowd, J. M. The u.s. census bureau adopts differential privacy. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 18, pp. 2867, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 9781450355520. doi: 10.1145/ 3219819.3226070. URL https://doi.org/10. 1145/3219819.3226070.

Asahiro, Y., Hassin, R., and Iwama, K. Complexity of ﬁnding dense subgraphs. Discrete Applied Mathematics, 121(1):15 26, 2002.

Bahmani, B., Kumar, R., and Vassilvitskii, S. Densest subgraph in streaming and mapreduce. Proc. VLDB Endow., 5(5):454 465, January 2012. ISSN 2150-8097. doi: 10.14778/2140436.2140442. URL https://doi. org/10.14778/2140436.2140442.

Beame, P., Koutris, P., and Suciu, D. Communication steps for parallel query processing. In Hull, R. and Fan, W. (eds.), Proceedings of the 32nd ACM SIGMODSIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013, New York, NY, USA - June 22 - 27, 2013, pp. 273 284. ACM, 2013. doi: 10.1145/ 2463664.2465224. URL https://doi.org/10. 1145/2463664.2465224.

Blocki, J., Blum, A., Datta, A., and Sheffet, O. Differentially private data analysis of social networks via restricted sensitivity. In Proceedings of the 4th Conference on Innovations in Theoretical Computer Science, ITCS 13, pp. 87 96, New York, NY, USA, 2013. Association for Computing Machinery. ISBN 9781450318594. doi: 10. 1145/2422436.2422449. URL https://doi.org/ 10.1145/2422436.2422449.

Blum, A., Dwork, C., Mc Sherry, F., and Nissim, K. Practical privacy: The sulq framework. In Proceedings of the Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 05, pp. 128 138, New York, NY, USA, 2005. Association for Computing Machinery. ISBN 1595930620. doi: 10.1145/1065167.1065184. URL https://doi. org/10.1145/1065167.1065184.

Cadena, J., Vullikanti, A., and Aggarwal, C. On dense subgraphs in signed network streams. In IEEE International Conference on Data Mining (ICDM), 2016.

Cadena, J., Chen, F., and Vullikanti, A. Graph anomaly detection based on steiner connectivity and density. Proceedings of the IEEE, 106(5):829 845, 2018.

Charikar, M. Greedy approximation algorithms for ﬁnding dense components in a graph. volume 1913, pp. 84 95, 09 2000. doi: 10.1007/3-540-44436-X 10.

Chen, R., Fung, B. C., Philip, S. Y., and Desai, B. C. Correlated network data publication via differential privacy. The VLDB Journal, 23(4):653 676, 2014.

Chen, S. and Zhou, S. Recursive mechanism: towards node differential privacy and unrestricted joins. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 653 664, 2013.

Day, W.-Y., Li, N., and Lyu, M. Publishing graph degree distribution with node differential privacy. In Proceedings of the 2016 International Conference on Management of Data, pp. 123 138, 2016.

Dean, J. and Ghemawat, S. Mapreduce: simpliﬁed data processing on large clusters. Communications of the ACM, 51(1):107 113, 2008.

Ding, X., Zhang, X., Bao, Z., and Jin, H. Privacypreserving triangle counting in large graphs. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 18, pp. 1283 1292, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 9781450360142. doi: 10.1145/3269206.3271736. URL https://doi. org/10.1145/3269206.3271736.

Dwork, C. Differential privacy. Encyclopedia of Cryptography and Security, pp. 338 340, 2011.

Dwork, C., Roth, A., et al. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211 407, 2014.

Ghaffari, M., Lattanzi, S., and Mitrovi c, S. Improved parallel algorithms for density-based network clustering. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 2201 2210, Long Beach, California, USA, 09 15 Jun 2019. PMLR. URL http://proceedings. mlr.press/v97/ghaffari19a.html.

Goldberg, A. V. Finding a maximum density subgraph. University of California Berkeley, CA, 1984.

Differentially Private Densest Subgraph Detection

Gupta, A., Ligett, K., Mc Sherry, F., Roth, A., and Talwar, K. Differentially Private Combinatorial Optimization, pp. 1106 1125. doi: 10.1137/1.9781611973075.90. URL https://epubs.siam.org/doi/abs/10. 1137/1.9781611973075.90.

Gupta, A., Roth, A., and Ullman, J. Iterative constructions and private data release. In Theory of cryptography conference, pp. 339 356. Springer, 2012.

Hay, M., Li, C., Miklau, G., and Jensen, D. Accurate estimation of the degree distribution of private networks. In 2009 Ninth IEEE International Conference on Data Mining, pp. 169 178. IEEE, 2009.

Hooi, B., Song, H. A., Beutel, A., Shah, N., Shin, K., and Faloutsos, C. Fraudar: Bounding graph fraud in the face of camouﬂage. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), KDD 16, pp. 895 904, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4232-2. doi: 10.1145/2939672.2939747. URL http://doi.acm. org/10.1145/2939672.2939747.

Imola, J., Murakami, T., and Chaudhuri, K. Locally differentially private analysis of graph statistics, 2020.

Karloff, H. J., Suri, S., and Vassilvitskii, S. A model of computation for mapreduce. In Charikar, M. (ed.), Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17-19, 2010, pp. 938 948. SIAM, 2010. doi: 10.1137/1.9781611973075.76. URL https: //doi.org/10.1137/1.9781611973075.76.

Karwa, V., Raskhodnikova, S., Smith, A., and Yaroslavtsev, G. Private analysis of graph structure. ACM Trans. Database Syst., 39(3), October 2014a. ISSN 0362-5915. doi: 10.1145/2611523. URL https://doi.org/10. 1145/2611523.

Karwa, V., Slavkovi c, A. B., and Krivitsky, P. Differentially private exponential random graphs. In International Conference on Privacy in Statistical Databases, pp. 143 155. Springer, 2014b.

Kasiviswanathan, S. P., Nissim, K., Raskhodnikova, S., and Smith, A. Analyzing graphs with node differential privacy. In Proceedings of the 10th Theory of Cryptography Conference on Theory of Cryptography, TCC 13, pp. 457 476, Berlin, Heidelberg, 2013. Springer-Verlag. ISBN 978-3-642-36593-5. doi: 10. 1007/978-3-642-36594-2 26. URL http://dx.doi. org/10.1007/978-3-642-36594-2_26.

Khuller, S. and Saha, B. On ﬁnding dense subgraphs. In Albers, S., Marchetti-Spaccamela, A., Matias, Y.,

Nikoletseas, S., and Thomas, W. (eds.), Automata, Languages and Programming, pp. 597 608, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.

Leskovec, J. and Krevl, A. SNAP Datasets: Stanford large network dataset collection. http://snap. stanford.edu/data, June 2014.

Mc Sherry, F. and Talwar, K. Mechanism design via differential privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS 07, pp. 94 103, USA, 2007. IEEE Computer Society. ISBN 0769530109. doi: 10.1109/FOCS.2007.41. URL https://doi.org/10.1109/FOCS.2007.41.

Mir, D. and Wright, R. N. A differentially private estimator for the stochastic kronecker graph model. In Proceedings of the 2012 Joint EDBT/ICDT Workshops, pp. 167 176, 2012.

Mitrovic, M., Bun, M., Krause, A., and Karbasi, A. Differentially private submodular maximization: Data summarization in disguise. In Precup, D. and Teh, Y. W. (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 2478 2487, International Convention Centre, Sydney, Australia, 06 11 Aug 2017. PMLR. URL http://proceedings.mlr. press/v70/mitrovic17a.html.

Nguyen, D. and Vullikanti, A. Differentially private densest subgraph detection. Co RR, abs/2105.13287, 2021. URL https://arxiv.org/abs/2105.13287.

Nguyen, H. H., Imine, A., and Rusinowitch, M. Differentially private publication of social graphs at linear cost. In 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 596 599. IEEE, 2015.

Nguyen, H. H., Imine, A., and Rusinowitch, M. Detecting communities under differential privacy. In Proceedings of the 2016 ACM on Workshop on Privacy in the Electronic Society, WPES 16, pp. 83 93, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450345699. doi: 10.1145/ 2994620.2994624. URL https://doi.org/10. 1145/2994620.2994624.

Nissim, K., Raskhodnikova, S., and Smith, A. Smooth sensitivity and sampling in private data analysis. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, STOC 07, pp. 75 84, New York, NY, USA, 2007. Association for Computing Machinery. ISBN 9781595936318. doi: 10. 1145/1250790.1250803. URL https://doi.org/ 10.1145/1250790.1250803.

Differentially Private Densest Subgraph Detection

Rozenshtein, P., Anagnostopoulos, A., Gionis, A., and Tatti, N. Event detection in activity networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2014.

Tsourakakis, C., Bonchi, F., Gionis, A., Gullo, F., and Tsiarli, M. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 104 112, 2013.

Vadhan, S. The Complexity of Differential Privacy, pp. 347 450. Springer International Publishing, Cham, 2017. ISBN 978-3-319-57048-8. doi: 10.1007/ 978-3-319-57048-8 7. URL https://doi.org/10. 1007/978-3-319-57048-8_7.

Xiao, Q., Chen, R., and Tan, K.-L. Differentially private network data release via structural inference. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 911 920, 2014.

Zhu, T., Li, G., Zhou, W., and Yu, P. S. Differentially private data publishing and analysis: A survey. IEEE Transactions on Knowledge and Data Engineering, 29(8): 1619 1638, Aug 2017. ISSN 2326-3865. doi: 10.1109/ TKDE.2017.2697856.