Faster approximate subgraph counts with privacy

Dung Nguyen, Department of Computer Science and UVA Biocomplexity Institute and Initiative, University of Virginia, USA. dungn@virginia.edu
Mahantesh Halappanavar, Data Sciences and Machine Intelligence Group, Pacific Northwest National Laboratory, USA. hala@pnnl.gov
Venkatesh Srinivasan, Department of Mathematics and Computer Science, Santa Clara University, USA. vsrinivasan4@scu.edu
Anil Vullikanti, Department of Computer Science and UVA Biocomplexity Institute and Initiative, University of Virginia, USA. vsakumar@virginia.edu

One of the most common problems studied in the context of differential privacy for graph data is counting the number of non-induced embeddings of a subgraph in a given graph. These counts have very high global sensitivity. Adding noise based on more powerful alternative techniques, such as smooth sensitivity and higher-order local sensitivity, has therefore been shown to give significantly better accuracy. However, all these alternatives to global sensitivity are computationally very expensive, and to date efficient polynomial time algorithms are known only for a few selected subgraphs, such as triangles, k-triangles, and k-stars. In this paper, we show that good approximations to these sensitivity metrics can still be used to obtain private algorithms. Using this approach, we obtain much faster algorithms for privately counting the number of triangles in real-world social networks, which can also be easily parallelized. We also give a private polynomial time algorithm for counting any constant size subgraph using less noise than the global sensitivity; we show this can be improved significantly for counting paths in special classes of graphs.

1 Introduction

The notion of Differential Privacy (DP) [12] has emerged as the de facto standard for supporting queries on private and sensitive data. DP ensures that changes in the private data have limited statistical influence (measured by the privacy budget ϵ, δ) on the output of queries, without needing any attack model, which is one of the reasons for its popularity. Networked abstractions are commonly used in a number of applications, such as public health, social networks, and finance. Such datasets are usually sensitive, and maintaining privacy is an important concern. Within the context of network/graph data, two privacy models have been considered in most prior work, namely edge and node privacy.

There has been a lot of interest in developing private algorithms for a number of network problems, such as community detection [29, 9, 8, 22] and counting small subgraphs (e.g., stars and triangles [25, 36, 23, 31]). However, efficient private algorithms with good accuracy are not known for many graph problems, including counting small subgraphs. One of the main challenges is that graph problems often have very high global sensitivity (the maximum change in the output function due to the change in one edge/vertex; see Section 3), in contrast to many statistical and machine learning queries. As a result, DP algorithms based on global sensitivity (which can usually be computed quite easily for many problems) do not give good accuracy bounds in general. For instance, in the edge-privacy model, the global sensitivity of #triangles in a graph G = (V, E) can be as high as n − 2, where n = |V|, making the added noise too large in many instances.
A number of novel alternatives to global sensitivity have been proposed, e.g., smooth, restricted, and multi-level local sensitivity, and ladder functions [36, 23, 3, 31], which add significantly lower levels of noise than mechanisms based on global sensitivity. However, all these alternative notions of sensitivity are computationally much more challenging, and none of the current graph DP algorithms for subgraph counting scale to even moderate size networks. Even the simplest problem of counting #triangles privately takes min{m·max_v d(v), M(n)} time [23], where m = |E|, d(v) is the degree of node v ∈ V, and M(n) is the time for multiplying two n × n matrices, which is super-quadratic in general. This is in sharp contrast to the significant advances in non-private graph mining, where complex network properties such as clique counts can be computed using provably-efficient and practical algorithms [35, 20, 19]. For a few other subgraphs, such as k-cliques and k-triangles (see [36, 23] for definitions), private algorithms using noise lower than the global sensitivity are known, but the worst case running time can still be O(n^k) [36]. However, for most subgraph counting problems, no private algorithms are known which use alternatives to global sensitivity and have running time comparable to the non-private algorithms. A main tool for subgraph counting and other graph mining tasks in massive networks is sampling based approximation algorithms, e.g., [2]; however, these haven't been used in the context of private graph algorithms. In this paper, we take the first steps in this direction. Our main contributions are:

- We show that for a query f(G), we can get a private algorithm by adding noise that scales with an approximation to the smooth sensitivity, to an approximation to f(G) (Theorem 1). This result opens the door to improving the efficiency of private algorithms using approximation techniques that have been very useful in non-private graph analytics.
- As an illustration, we use this result to obtain a quasilinear time private algorithm for counting #triangles under edge privacy (Theorem 2) for graphs which satisfy a stronger transitivity property (see Definition 5). Our algorithm adapts the diamond sampling technique of [2] for approximate triangle counting, and can also be parallelized easily. No parallel algorithms have been developed for private subgraph counting so far.
- Extending the approach of [23], we design a private higher-order local sensitivity approach for subgraph counting, which gives privacy even when the higher-order local sensitivities are estimated approximately (Theorem 7). This yields the first private algorithm to count the number of embeddings of any subgraph with ℓ edges in time O(n^{2ℓ}), using less noise than the global sensitivity.
- For the specific case of paths with ℓ edges, we develop an algorithm that gives a very close approximation to the higher-order sensitivity in time O(h(d, t, ℓ)·poly(n)), where d and t are the degeneracy and treewidth of G, respectively (defined later). Thus, for paths, the running time does not grow as O(n^ℓ) for graphs with bounded degeneracy and treewidth. These properties have been used extensively in developing efficient algorithms for subgraph counting and other graph mining tasks, e.g., [5, 13, 6], but they haven't been used for private algorithms.
- We experimentally evaluate our algorithm for counting #triangles (Section 6).
We show that our algorithm for estimating the smooth sensitivity has a very good approximation factor, and a significantly better running time than the exact algorithm. Using approximate smooth sensitivity for counting #triangles gives good accuracy even for fairly low ϵ values, suggesting that using approximate smooth sensitivity does not compromise performance. Our algorithm also scales well, and is the first private algorithm for counting #triangles which has been run on networks with over 2 million edges.

We note that many details, including proofs, are presented in the Supplementary Material (SM).

2 Related Work

The area of DP graph algorithms is quite large and active. There has been a lot of work on DP algorithms for many basic graph problems, such as community detection, subgraph counting, finding small cuts, and releasing synthetic graphs [25, 28, 30, 33, 16, 3, 15, 21, 17]. There is some work on graph algorithms under local DP, e.g., [16], but most of the prior work has been for global DP (as defined in Section 3), which is our focus in this work. We refer to [27] for a recent survey on private graph algorithms. For brevity, we only discuss prior work on private subgraph counting that is directly related to our paper, and in the edge-DP model. We omit the discussion on private estimation of edge density and degree distribution, which has been studied more extensively, e.g., [14], and results for node-DP [25], which is less studied.

| Method | Privacy model | Runtime |
|---|---|---|
| Global Sensitivity | ϵ | O(1) |
| Ladder function [10] | (ϵ, δ) | O(matrix mul. of size n) |
| Recursive mechanism [7] | (ϵ, δ) | O(mn) |
| Restricted Sensitivity [3] | (ϵ, δ) | O(mn) |
| Blackbox Transformation [4] | (ϵ, δ) | O(non-private #triangle alg.) |
| (Exact) Smooth Sensitivity [23] | (ϵ, δ) | O(matrix mul. of size n) |
| Our method | (ϵ, δ) | O(m log² n) for (C, Λ)-graphs with constant C, Λ |

Table 1: Summary of the characteristics of differentially private #triangle counting algorithms in the edge-privacy model (the other works mentioned here consider other subgraphs also, but we only focus on the runtime of counting #triangles). The runtime of Blackbox Transformation [4] includes the calculation of an approximate triangle count, while all others assume the true counts are given. The runtime of our method is reported with δ = Ω(n^{−2}) and constant γ (see Corollary 1).

As mentioned earlier, the global sensitivity GS(f) of a graph metric f(G) (defined in Section 3) is high for many subgraph counting problems. Adding noise based on GS(f) leads to low accuracy, while noise based on just the local sensitivity LS_f need not be private [31]. Nissim et al. [31] develop the notion of smooth sensitivity, S*_{f,β}(G) = max_{G'} LS_f(G')·e^{−β·d(G,G')}, where d(G, G') is the swap distance between graphs G and G', and show that adding noise based on S*_{f,β}(G) is private. However, smooth sensitivity is computationally much more challenging, since its definition involves considering the local sensitivity at distance t for all t, which does not seem to be a polynomial time computation. Nissim et al. [31] develop polynomial time algorithms for computing the smooth sensitivity exactly for counting #triangles, which was improved slightly to min{m·d_max, M(n)} [23], where M(n) is the matrix multiplication time. Polynomial time smooth sensitivity bounds were also shown for a small number of other subgraphs, but no polynomial time algorithms are known beyond that.
Interestingly, no hardness bounds are known for smooth sensitivity; the existing hardness bounds, e.g., [23, 36], only give hardness for computing the local sensitivity at distance t. In order to handle other subgraph counts, Karwa et al. [23] develop a different technique involving the local sensitivity of the local sensitivity (motivated by the propose-test-release technique [12]), and use it for private counts of k-cliques. Zhang et al. [36] develop the technique of ladder functions to handle other kinds of subgraphs, and use it to privately count #k-cliques in the graph in time O(n·T(n)), where T(n) is the time needed to count #k-cliques non-privately. A few other techniques, such as inverse sensitivity [1] and propose-test-release [12], are known. However, private algorithms for counting most subgraphs, including paths and trees, are not known. The recent work of [4] is related, but only considers approximation in the queries, not in the sensitivity; as a result, our methods give significantly higher efficiency. We also note that Blocki et al. [4] develop a black-box approach to make certain approximation algorithms differentially private. However, their work requires the function being computed to have small global sensitivity.

The difficulty of subgraph counting with DP has motivated work on slight variations of the DP model. Rastogi et al. [34] consider the problem of releasing more general subgraph counts. However, they consider a relaxed version of edge-DP, called (edge) adversarial privacy, that uses a Bayesian attacker. Chen et al. [7] design a different approach that gives lower bounds for general subgraph counts through a linear program to form a recursive strategy. But, as mentioned in [36], their method suffers from a bias between the true query answer and the lower bound, in exchange for less noise. [32] presents a different approach based on iterative refinement that estimates counts by degree, and can be implemented in time O((max_v deg(v))³·m). There have been studies on subgraph counting (including triangle counting) under other privacy models: the node-DP model [25, 10] and the shuffle model [18]. We note that our analysis of approximate smooth sensitivity holds for other privacy models as well, and we expect this could be used for improved private queries for other problems. However, the specific technique using diamond sampling for faster triangle counting only holds in the edge-DP model; fundamentally new ideas are needed for extending this technique to node-DP and others. For a more comprehensive review of recent developments on subgraph counting with privacy, we refer readers to [27].

3 Preliminaries

A non-induced embedding of a graph H = (V_H, E_H) into a graph G = (V, E) is a mapping ϕ : V_H → V such that (ϕ(u), ϕ(v)) ∈ E whenever (u, v) ∈ E_H. We use f_H(G) to denote the number of non-induced embeddings of H in G, and drop the subscript H when it is clear from the context. Let ℓ = |V_H| and n = |V|. A denotes the adjacency matrix of G. Let N(i) and N̄(i) denote the set of neighbors and non-neighbors of node i ∈ V, respectively; let d(i) = |N(i)| and d̄(i) = n − 1 − d(i). Let d_max = max_{v∈V} d(v) denote the maximum degree.

Differential privacy on graphs. Let G denote a set of graphs on a fixed set V of nodes. For a graph G ∈ G, we use V(G) and E(G) to denote the set of nodes and edges of G, respectively.
In this paper, we focus on the notion of edge privacy [3], where all graphs G ∈ G have a fixed set of nodes V(G) = V, and two graphs G, G' ∈ G are considered neighbors, denoted G ∼ G', if they differ in exactly one edge, i.e., |E(G) △ E(G')| = 1.

Definition 1. A (randomized) algorithm M : G → R is (ϵ, δ)-differentially private if for all subsets S ⊆ R of its output space, and for all G, G' ∈ G with G ∼ G', we have Pr[M(G) ∈ S] ≤ e^ϵ Pr[M(G') ∈ S] + δ [12, 3].

Problem statement: subgraph counting with edge differential privacy. Given a family of graphs G on a set V of vertices, a subgraph H, and parameters ϵ, δ, construct an (ϵ, δ)-differentially private mechanism M_{f_H} : G → R such that |M_{f_H}(G) − f_H(G)| is minimized.

We discuss the notions of sensitivity mostly using the notation from [23, 31], with slight changes. Let LS_f(G) = max_{G' ∼ G} |f(G) − f(G')| denote the local sensitivity of f; we also use LS(f(G)) to denote this. GS(f) = max_G LS(f(G)) denotes the global sensitivity. The local sensitivity of f on a graph G at distance t is defined as LS^{(t)}_f(G) = max_{d(G,G') ≤ t} LS_f(G').

Definition 2 (Smooth bound on LS [31]). For β > 0, a function S : D^n → R⁺ is a β-smooth upper bound on LS_f if it satisfies the following conditions: (1) for all x ∈ D^n: S(x) ≥ LS_f(x); and (2) for all neighbors x ∼ y ∈ D^n: S(x) ≤ e^β S(y). S*_{f,β}(G) = max_{G'} (LS(f(G'))·e^{−β·d(G,G')}) is the smallest function satisfying Definition 2, and is referred to as the β-smooth sensitivity of f at G. Using the local sensitivity at distance t, the smooth sensitivity can be written as S*_{f,β}(G) = max_{t=0,1,...,n(n−1)/2} e^{−tβ} LS^{(t)}_f(G).

Definition 3 (Admissible noise distribution [31]). A probability distribution on R^d given by a density function h is (α, β)-admissible if, for α = α(ϵ, δ), β = β(ϵ, δ), the following conditions hold for all Δ ∈ R^d and λ ∈ R satisfying ‖Δ‖₁ ≤ α and |λ| ≤ β, and for all measurable S ⊆ R^d: (1) Sliding property: Pr_{Z∼h}[Z ∈ S] ≤ e^{ϵ/2} Pr_{Z∼h}[Z ∈ S + Δ] + δ/2; and (2) Dilation property: Pr_{Z∼h}[Z ∈ S] ≤ e^{ϵ/2} Pr_{Z∼h}[Z ∈ e^λ·S] + δ/2.

An important example of an admissible noise distribution is the Laplace distribution Lap(λ), whose probability density function is h(z) = (1/(2λ))·e^{−|z|/λ}.

Lemma 1 (Calibrating noise to smooth sensitivity [31]). Let Z be a random variable sampled from an (α, β)-admissible noise distribution, and let S_{f,β} be a β-smooth upper bound on the local sensitivity of f. Then the algorithm A(x) = f(x) + (S_{f,β}(x)/α)·Z is (ϵ, δ)-differentially private.

Definition 4 (An (α, δ)-approximation to a function f). A function f̃ is said to be an (α, δ)-approximation to f if for any input x, with probability at least 1 − δ, we have (1 − α)f(x) ≤ f̃(x) ≤ (1 + α)f(x).

Social networks generally have the property that nodes have a high clustering coefficient, i.e., a large fraction of the pairs of neighbors of a node are connected. We consider a more restricted notion here. We refer to a path with r edges as an r-path, and say a 2-path i, j, k is closed if (i, k) ∈ E, i.e., i, j, k form a triangle in G.

Definition 5 ((C, Λ)-transitive graph). A graph is said to be (C, Λ)-transitive if for any vertex j ∈ V and edge (i, j) ∈ E, we have: if deg(j) > Λ, then at least a C fraction of all the 2-paths starting with i, j are closed.
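To make the calibration of Lemma 1 concrete, here is a minimal sketch (ours, in Python; the paper's implementation is in C++, and the function names are illustrative). It assumes the values LS^{(t)}_f(G) are already available in `ls_at_dist`, and instantiates the admissible noise with the Laplace distribution as in Lemma 12 of the SM.

```python
import numpy as np

def smooth_sensitivity(ls_at_dist, beta):
    # S*_{f,beta}(G) = max_t e^{-beta t} LS^{(t)}_f(G); ls_at_dist[t] holds the
    # local sensitivity of f at swap distance t, with ls_at_dist[0] = LS_f(G).
    return max(np.exp(-beta * t) * ls for t, ls in enumerate(ls_at_dist))

def release(f_value, s_beta, eps, rng=np.random.default_rng()):
    # Laplace instantiation of Lemma 1 (see Lemma 12 in the SM):
    # A(G) = f(G) + Lap(2 S_{f,beta}(G) / eps).  beta must be chosen as a
    # function of (eps, delta) for Laplace noise to be admissible.
    return f_value + rng.laplace(scale=2.0 * s_beta / eps)
```

A truncation of t at roughly log(n)/β, as done in Algorithm 2 below, is safe once LS^{(t)} reaches its global cap, since the e^{−tβ} factor only shrinks later terms.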
4 Approximate smooth sensitivity and fast private triangle counting

We first show that we can ensure privacy even if the smooth sensitivity S*_{f,β} is estimated approximately; we then extend this to show that it also works when both f(G) and S*_{f,β} are estimated approximately.

Definition 6. S̃_{f,β} is said to be a γ-upper approximation of the smooth sensitivity S*_{f,β} of a function f if S*_{f,β}(D) ≤ S̃_{f,β}(D) ≤ e^γ S*_{f,β}(D) for any dataset D. S̃_{f,β} is said to be a (γ, δ')-upper approximation of S*_{f,β} if it is a γ-upper approximation for any dataset D with probability at least 1 − δ'.

We observe below that calibrating noise using a (γ, δ')-upper approximation of the smooth sensitivity gives us privacy. We assume f is a real valued function, since we are focused on graph statistics.

Lemma 2 (Lemma 12). Let S̃_{f,β} be a (γ, δ')-upper approximation to S*_{f,β}, for γ = β = ϵ/(16 ln(2/δ)). Then, A(D) = f(D) + Lap(2S̃_{f,β}(D)/ϵ) is (ϵ, (e^{ϵ/2}+1)/2·δ + 2δ')-differentially private.

Approximate smooth sensitivity for approximate queries. We now show that approximate smooth sensitivity can be used even when f(G) is computed approximately, which is a bit more challenging. Let A_f be an (α₁, δ₁)-approximation of a function (query) f, for a small constant α₁ < 1/2, and let S̃_{f,β} be a (γ, δ₂)-upper approximation to S*_{f,β}. We will show that we can use A_f and S̃_{f,β} to compute a differentially private version of f. For the purpose of the privacy analysis, we define the functions g_f, s_f, and S_{g_f} below. These functions may not be efficiently computable; however, they are only used in the analysis, and the actual algorithm does not compute them. The actual computation (Algorithm 3) only uses A_f and S̃_{f,β} to output the differentially private version of f.

We follow some of the analysis of [4] to prove our privacy guarantees below. We share with [4] the same process of proving that S_{g_f}(D) is a smooth upper bound on the local sensitivity of g_f. The main difference is that our analysis has to take into account the newly defined function s_f, which is a bounded variant of the approximate smooth sensitivity S̃_{f,β} (which approximates S*_{f,β}), while [4] uses the global sensitivity GS_f of f. As we can see in Lemma 13, the second condition requires S_{g_f}(D) ≤ e^{β'} S_{g_f}(D') for any pair of neighboring datasets D ∼ D'. While GS_f remains unchanged on both D and D', S̃_{f,β} may take different values on D and D', which makes S_{g_f} more difficult to analyze. We also have to account for the small probability δ₂ that S̃_{f,β}(D) ∉ [S*_{f,β}(D), e^γ S*_{f,β}(D)], which adds to the δ-part of the (ϵ, δ)-DP guarantee of the final output.

Definition 7. Let A_f be an (α₁, δ₁)-approximation of f. We define functions g_f and s_f as:

g_f(D) =
- A_f(D), if (1 − α₁)f(D) ≤ A_f(D) ≤ (1 + α₁)f(D);
- (1 − α₁)f(D), if A_f(D) < (1 − α₁)f(D);
- (1 + α₁)f(D), if A_f(D) > (1 + α₁)f(D). (1)

s_f(D) =
- S̃_{f,β}(D), if S*_{f,β}(D) ≤ S̃_{f,β}(D) ≤ e^γ S*_{f,β}(D);
- S*_{f,β}(D), if S̃_{f,β}(D) < S*_{f,β}(D);
- e^γ S*_{f,β}(D), if S̃_{f,β}(D) > e^γ S*_{f,β}(D). (2)

Let S_{g_f}(D) = 4α₁ g_f(D) + 2s_f(D).

Lemma 3 (Lemma 13). Given g_f, s_f, and S_{g_f} as defined above, S_{g_f} is a β'-smooth upper bound on the local sensitivity of g_f, where β' = 4α₁ + γ + β and α₁ < 1/2.

In order to prove that S_{g_f} is a β'-smooth upper bound on the local sensitivity of g_f, we have to prove two conditions: LS_{g_f}(D) ≤ S_{g_f}(D), and S_{g_f}(D) ≤ e^{β'} S_{g_f}(D') for any pair of neighboring datasets D ∼ D'. The full proof is in Lemma 13 in the SM. Theorem 1 below summarizes our result.

Theorem 1 (Theorem 4). Let A_f be an (α₁, δ₁)-approximation to f, and S̃_{f,β} be a (γ, δ₂)-upper approximation to S*_{f,β}, for γ = β = ϵ/(8 ln(2/δ)) and α₁ = ϵ/(16 ln(2/δ)). Then, A(D) = A_f(D) + Lap((8α₁A_f(D) + 4S̃_{f,β}(D))/ϵ) is (ϵ, (e^{ϵ/2}+1)/2·δ + 2δ₁ + 2δ₂)-differentially private.
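The release step of Theorem 1 is simple once A_f(D) and S̃_{f,β}(D) are available. The following sketch is ours (the function name and argument handling are illustrative, not the paper's API); it mirrors Algorithm 3 in the SM, with the caller responsible for choosing α₁, γ, β as functions of (ϵ, δ) as in the theorem statement.

```python
import numpy as np

def private_release(af_value, s_tilde, alpha1, eps, rng=np.random.default_rng()):
    # Theorem 1: A(D) = A_f(D) + Lap((8 alpha1 A_f(D) + 4 S~_{f,beta}(D)) / eps),
    # where af_value is an (alpha1, delta1)-approximation of f(D) and s_tilde
    # is a (gamma, delta2)-upper approximation of the beta-smooth sensitivity.
    scale = (8.0 * alpha1 * af_value + 4.0 * s_tilde) / eps
    return af_value + rng.laplace(scale=scale)
```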
4.1 Application: fast private triangle counting

In this section, we study the problem of computing f_△(G), the number of triangles in a graph G = (V, E), by fast approximation of its smooth sensitivity. Recall the definition of the local sensitivity at distance t, LS^{(t)}_f(G), from Section 3. We denote the local and smooth sensitivity of f_△(G) by LS^{(t)}_△(G) and S*_{△,β}(G).

Figure 1: Paths (i, k, j) and (i, o, j) represent wedges formed by nodes i, j [2].

Let a_ij = Σ_{k∈[n]} A_ik·A_jk denote the number of common neighbors of nodes i and j in the graph, and let b_ij denote the number of nodes that are neighbors of i or j but not both. Then it has been shown that LS^{(t)}_△(G) = max_{i,j} c_ij(t), where c_ij(t) = min(a_ij + (t + min(t, b_ij))/2, n − 2) (Claim 3.13 of [31]). We can rewrite this as

LS^{(t)}_△(G) = min( max_{i,j}(a_ij + b_ij/2), max_{i,j} a_ij + t, n − 2 ) = min( max_{i,j}(deg(i) + deg(j))/2, max_{i,j} a_ij + t, n − 2 ).

The bottleneck in computing LS^{(t)}_△(G) (and S*_{△,β}(G)) is estimating â = max_{i,j} a_ij. We adapt the diamond sampling technique of [2] for this task, which lets us quickly calculate

\widehat{LS}^{(t)}_△(G) = min( max_{i,j}(deg(i) + deg(j))/2, â + t, n − 2 ).

The core idea in diamond sampling is to find a diamond (a 4-cycle) of the form (i, o, j, k), obtained by the intersection of two wedges (2-paths) (i, o, j) and (i, k, j), as shown in Figure 1. Since any pair (i, j) is part of a²_ij diamonds, the probability of selecting such a diamond is proportional to a²_ij. This idea is formalized in Algorithm 4. Let W be the matrix with W_ki = deg(k)·deg(i) for all (k, i) ∈ E, and 0 for all other entries.

We first note that if â = 0, it must be the case that G has no paths of length 2, for if there were a 2-path i, k, j, then a_ij ≥ 1. Therefore, if â = 0, we can determine LS^{(t)}_△(G) exactly in time O(m). We therefore assume that â ≥ 1 in the rest of this section.
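The sampling step itself is straightforward to sketch. The following is our illustrative Python rendering of Algorithm 4 (the SM gives the pseudocode; the paper's actual implementation is parallel C++, and the adjacency representation here is an assumption): it draws an edge (k, i) with probability proportional to deg(k)·deg(i), extends it to two wedges, and counts how often they close into a diamond.

```python
import numpy as np

def diamond_sample(adj, s, rng=np.random.default_rng()):
    # One pass of Algorithm 4.  adj maps each node to a list of its neighbors.
    # Returns (x, w_norm), where E[x[(i, j)] * w_norm / s] = a_ij^2, the squared
    # number of common neighbors of i and j.
    nodes = list(adj)
    deg = {v: len(adj[v]) for v in nodes}
    nbr_sets = {v: set(adj[v]) for v in nodes}
    edges = [(k, i) for k in nodes for i in adj[k]]          # directed copies
    weights = np.array([deg[k] * deg[i] for k, i in edges], dtype=float)
    w_norm = weights.sum()                                   # ||W||_1
    x = {}
    for e in rng.choice(len(edges), size=s, p=weights / w_norm):
        k, i = edges[e]
        j = adj[k][rng.integers(deg[k])]                     # wedge (i, k, j)
        o = adj[i][rng.integers(deg[i])]                     # wedge (i, o, j)?
        if j in nbr_sets[o]:                                 # A_oj = 1: diamond
            x[i, j] = x.get((i, j), 0) + 1
    return x, w_norm
```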
Even though Algorithm 4 can quickly estimate the quantity max_{ij} a²_ij within a specific bound, it can only do so when the number of iterations is sufficiently large, and the exact threshold is unknown without access to the dataset and to the value of interest (max_{ij} a²_ij) itself (see Lemma 15). We overcome this issue with the following idea. First, we propose an estimate τ of max_{ij} a²_ij and run an instance of Algorithm 4 with the number of iterations computed from τ. Second, we compare the estimate of max_{ij} a²_ij returned by Algorithm 4 with τ to determine whether we over- or under-estimated τ. Third, we adjust τ accordingly and repeat the process until we find a good value of τ.

Algorithm 1: τ estimation
Input: G = (V, E)
Output: τ such that τ < max_{ij} a²_ij
1: k := 0, τ_k := n², θ := 1/2
2: while TRUE do
3:   Calculate s := 3c·log n·‖W‖/(θ²τ_k)
4:   x := Algorithm 4(G, s)
5:   if max_{ij} x_ij·‖W‖/s < (3/2)τ_k then
6:     τ_{k+1} := (3/4)τ_k
7:     k := k + 1
8:   else
9:     Return τ_k
10:  end if
11: end while

Algorithm 2: Algorithm to compute S̃_{△,β}(G)
Input: G = (V, E), β, δ, γ
Output: S̃_{△,β}(G)
1: τ_k := Algorithm 1(G)
2: τ := τ_k/3
3: θ := (e^{2γ} − 1)/(e^{2γ} + 1)
4: c := max(log_n(1/(2δ)) + 2, 12θ²)
5: Calculate s := 3c·log n·‖W‖/(θ²τ)
6: x := Algorithm 4(G, s)
7: â² := max_{ij} x_ij·‖W‖/s
8: Calculate \widehat{LS}^{(t)}_△(G), for t = 1, ..., log(n)/β, using â
9: Return max_{t=1,...,log(n)/β} e^{−tβ}·\widehat{LS}^{(t)}_△(G)

We implement Algorithm 1 based on this idea. In the algorithm, we first set τ to its largest possible value (n²), knowing that we are over-estimating τ (τ > max_{ij} a²_ij). We note that by over-estimating τ, Algorithm 4 may not have enough iterations to guarantee the convergence of max_{ij} x_ij to max_{ij} a²_ij.

Note that in the output of Algorithm 4, for any i, j, x_ij (after scaling) is an approximation of a²_ij. We determine whether we are truly over-estimating τ by applying Corollary 3 (in the SM), comparing the estimate max_{ij} x_ij·‖W‖/s returned by Algorithm 4 to (3/2)τ. The proof uses the idea that even though max_{ij} x_ij may not converge to max_{ij} a²_ij, and we cannot test this directly, max_{ij} x_ij does converge once we under-estimate τ (i.e., τ < max_{ij} a²_ij), and this convergence is easy to check. Corollary 3 states that when max_{ij} x_ij·‖W‖/s < (3/2)τ, we have max_{ij} a²_ij < 3τ, which means we should lower τ. We keep running Algorithm 4, reducing τ by a factor of 3/4 in each iteration, until the estimate rises above (3/2)τ. By Corollary 2, when this happens, τ_k ≤ max_{ij} a²_ij, which guarantees the convergence of max_{ij} x_ij to max_{ij} a²_ij. Lemma 4 shows that when the algorithm outputs τ_k, w.h.p. the true value of interest max_{ij} a²_ij lies between τ_k and 4τ_k.

Lemma 4 (Lemma 18 in SM). Let τ_k be the output of Algorithm 1. With probability at least 1 − 1/n^{c−3}, τ_k < max_{ij} a²_ij < 4τ_k.

Algorithm 2 calculates an approximation of max_{ij} a²_ij for the approximate smooth sensitivity with any γ (which is required for differential privacy with an arbitrary ϵ). Since τ_k is guaranteed to be close to the true value of max_{ij} a²_ij (within a constant factor of 4), we run the last instance of Algorithm 4 with a constant θ derived from the target approximation factor γ. Lemma 5 shows that the estimate output by Algorithm 4 falls within a (1 ± θ) factor of the true value. Following that, Theorem 2 guarantees that the quantity output by Algorithm 2 is a (γ, δ)-upper approximation to the smooth sensitivity S*_{△,β}(G) of the triangle count of G.

Lemma 5 (Lemma 19 in SM). With probability at least 1 − δ, in Algorithm 2, max_{ij} x_ij·‖W‖/s ∈ ((1 − θ)·max_{ij} a²_ij, (1 + θ)·max_{ij} a²_ij).

Theorem 2 (Theorems 5, 6 in SM). Suppose max_{ij} a_ij > 0. Algorithm 2 outputs S̃_{△,β}(G), which is a (γ, δ)-upper approximation to S*_{△,β}(G), in time

O( log_n(1/(2δ))·‖W‖₁·log m·log n / ( ((e^{2γ} − 1)/(e^{2γ} + 1))²·max_{ij} a²_ij ) + m + log n ).

Proof (sketch). Algorithm 1 has a runtime of Σ_k O(s_k·log m), where s_k = O(‖W‖₁·c·log n/(θ²τ_k)), since it executes Algorithm 4 with a varying number of samples, and k is the number of search steps. Since at the end of the search τ_k > max_{ij} a²_ij/4 w.h.p., we know that k = O(log(4n²/max_{ij} a²_ij)). We start with τ₀ = n² and reduce τ by a factor of 3/4 after each step, so the total running time is

O( ‖W‖₁·log m·(c·log n/θ²)·(1/n²)·(1 + (4/3) + ... + (4/3)^k) ) = O( ‖W‖₁·c·log m·log n/(θ²·max_{ij} a²_ij) ).

Algorithm 2 has a runtime of O( log_n(1/(2δ))·‖W‖₁·log m·log n / ( ((e^{2γ} − 1)/(e^{2γ} + 1))²·max_{ij} a²_ij ) + m + log n ), taking into account the time required to compute W (O(m)) and S̃_{△,β}(G) (O(log(n)/β + n)), and substituting the values of c and θ as described in the algorithm.
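Putting the two algorithms together, the search loop looks as follows. This is our sketch, reusing `diamond_sample` from the previous block and assuming â ≥ 1 as in the text; the constant c = 4 in the search phase is a stand-in for the c > 3 required by Corollaries 2 and 3, and the final-pass constants mirror the pseudocode of Algorithm 2.

```python
import numpy as np

def approx_max_a2(adj, gamma, delta, rng=np.random.default_rng()):
    n = len(adj)
    deg = {v: len(adj[v]) for v in adj}
    w_norm = float(sum(deg[k] * deg[i] for k in adj for i in adj[k]))
    # Algorithm 1: shrink tau geometrically from n^2 until the scaled estimate
    # max_ij x_ij ||W||/s rises above (3/2) tau; then tau < max a_ij^2 w.h.p.
    tau, theta, c = float(n) ** 2, 0.5, 4.0
    while True:
        s = int(3 * c * np.log(n) * w_norm / (theta ** 2 * tau)) + 1
        x, _ = diamond_sample(adj, s, rng)
        if max(x.values(), default=0) * w_norm / s >= 1.5 * tau:
            break
        tau *= 0.75
    # Algorithm 2: one more pass with theta tied to the target factor gamma.
    tau /= 3.0
    theta = (np.exp(2 * gamma) - 1) / (np.exp(2 * gamma) + 1)
    c = max(np.log(1 / (2 * delta)) / np.log(n) + 2, 12 * theta ** 2)
    s = int(3 * c * np.log(n) * w_norm / (theta ** 2 * tau)) + 1
    x, _ = diamond_sample(adj, s, rng)
    return max(x.values(), default=0) * w_norm / s  # within (1 +/- theta) of max a_ij^2
```

From this estimate, lines 8–9 of Algorithm 2 compute \widehat{LS}^{(t)}_△ and return max_t e^{−tβ}·\widehat{LS}^{(t)}_△(G).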
Corollary 1. Suppose max_{ij} a_ij > 0, G is (C, Λ)-transitive (as in Definition 5), and δ ≥ n^{−2}. Then, for any parameter γ > 0, Algorithm 2 outputs S̃_{△,β}(G), which is a (γ, δ)-upper approximation to S*_{△,β}(G), in time O( m·max(1/C², Λ²)·log m·log n / min(γ⁴, 1/4) + n ).

Proof. Since ((e^{2γ} − 1)/(e^{2γ} + 1))² > γ⁴ for 0 < γ < 1/√2, we have ((e^{2γ} − 1)/(e^{2γ} + 1))² > min(γ⁴, 1/4). Next, for any node i and j ∈ N_i, we have deg(j)/max_{i'j'} a_{i'j'} ≤ max{1/C, Λ}: if deg(j) > Λ, it must be the case that at least a C-fraction of the paths i, j, k for k ∈ N_j are closed, which means a_ij ≥ C·deg(j), and so deg(j)/max_{i'j'} a_{i'j'} ≤ 1/C; if deg(j) ≤ Λ, we have deg(j)/max_{i'j'} a_{i'j'} ≤ Λ. Therefore,

‖W‖/max_{i'j'} a²_{i'j'} = Σ_{i∈V(G)} Σ_{j∈N_i} deg(i)·deg(j)/max_{i'j'} a²_{i'j'} = Σ_{i∈V(G)} Σ_{j∈N_i} (deg(i)/max_{i'j'} a_{i'j'})·(deg(j)/max_{i'j'} a_{i'j'}) ≤ Σ_{i∈V(G)} Σ_{j∈N_i} max(1/C², Λ²) = 2m·max(1/C², Λ²),

and the corollary follows.

5 Higher-order local sensitivity and improved bounds for path counting

As mentioned earlier in Section 2, Karwa et al. [23] develop a different technique involving the local sensitivity of the local sensitivity, and use it for private counts of k-cliques, since it seems hard to estimate smooth sensitivity for such subgraphs. We generalize it, and develop a multi-level local sensitivity approach, extending [23], for privately counting the number of copies of any subgraph H; as in the case of smooth sensitivity, we show that this also works with approximate queries. Let ℓ denote the number of edges in H.

Figure 2: Examples of compact subgraphs and anchor edges in the algorithm for computing higher-order sensitivity of paths. (Left) D = {r₁, r₂, r₃, r₄}; edges (r₁, r′₁), (r₂, r′₂), (r₃, r′₃), and (r₄, r′₄) are external anchor edges, while (r₅, r′₅) is an internal anchor edge. Nodes s and t are the starting and ending nodes of the path. (Middle) D = {r₁, r₂, r₃}, and (r₁, r′₁), (r₂, r′₂), (r₃, r′₃) are external anchor edges, while (r₅, r′₅) is an internal anchor edge. Node s is the starting node of the path, and the path ends at some other node, not in this compact subgraph. (Right) D = {r₁, r₂, r₃, r₄}; edges (r₁, r′₁), (r₂, r′₂), (r₃, r′₃), and (r₄, r′₄) are external anchor edges, while (r₅, r′₅) and (r₆, r′₆) are internal anchor edges. Paths pass through this compact subgraph, without starting or ending in it.

Our main idea involves finding private bounds for the higher-order local sensitivity. Let L = {{i, j} : i, j ∈ V} denote all possible pairs of nodes in V. For a subgraph H and a subset S ⊆ L of pairs of nodes, let f_H(G, S) denote the number of embeddings of H in the graph G(V, E ∪ S) which contain all the edges corresponding to pairs of nodes in S. For S = ∅, f_H(G, ∅) is the number of embeddings of H in G. We drop the subscript H when it is clear from the context. Let f^{(k)}(G) = max_{|S|=k} f(G, S); note that f^{(0)}(G) = f(G). It can be seen that LS(f^{(k)}) ≤ f^{(k+1)}(G), which is the basis for considering higher-order local sensitivity for subgraph counting. However, a careful analysis of the privacy bounds becomes involved. Our approach is described in Algorithm 5 in the supplementary material, and the main result is summarized below.

Theorem 3. Let g^{(k)}(G) = f^{(k)}(G) + g^{(k+1)}(G)·ln(1/δ₂)/ϵ₂ + Lap(g^{(k+1)}(G)/ϵ₂), for k = ℓ−1, ..., 0, as computed in Algorithm 5. Then, g^{(0)}(G) is an ((km + 1)ϵ₂, δ₂ + (km + 1)e^{ϵ₂}δ₂)-DP estimate of f(G).

The above result, together with the definition of f^{(k)}(G), implies that for any subgraph with ℓ edges, the higher-order local sensitivity approach can be run in time O(n^{2ℓ}). This can be improved further for paths and trees, as summarized below.

Lemma 6. Private counting for a subgraph H with a constant number ℓ of edges using higher-order local sensitivity (Algorithm 5) has running time O(n^{2ℓ}). When H is a path or tree with ℓ edges, Algorithm 5 has running time O(n^{ℓ+1}).

Higher-order local sensitivity with approximate queries. We show that our method works even if we only have a very good approximation f̂(x) to f(x), instead of the exact value (Theorem 7 in the supplementary material). This allows us to use f̂(x) instead of f(x) in Algorithm 5.
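To make the quantities above concrete, the following brute-force sketch (ours; only viable for very small graphs, mirroring the O(n^{2ℓ}) bound rather than Algorithm 5's actual implementation) computes f^{(k)}(G) by enumeration and then runs the recursion of Theorem 3. Seeding the recursion at level ℓ is done here under the assumption that H has no isolated vertices, in which case every embedding counted by f^{(ℓ)} must map E_H onto S exactly, so f^{(ℓ)} does not depend on the private edges; the true base case is defined by Algorithm 5 in the SM.

```python
import itertools
import numpy as np

def f_with_forced_edges(edges, h_edges, nodes, S):
    # f_H(G, S): non-induced embeddings of H into (V, E ∪ S) whose image
    # contains every pair in S as an edge (brute force over injective maps).
    E = {frozenset(e) for e in edges} | {frozenset(p) for p in S}
    forced = {frozenset(p) for p in S}
    h_nodes = sorted({v for e in h_edges for v in e})
    count = 0
    for phi in itertools.permutations(nodes, len(h_nodes)):
        m = dict(zip(h_nodes, phi))
        img = {frozenset((m[u], m[v])) for u, v in h_edges}
        if img <= E and forced <= img:
            count += 1
    return count

def f_k(edges, h_edges, nodes, k):
    # f^{(k)}(G) = max_{|S|=k} f_H(G, S), over all k-subsets of node pairs.
    pairs = list(itertools.combinations(nodes, 2))
    return max(f_with_forced_edges(edges, h_edges, nodes, S)
               for S in itertools.combinations(pairs, k))

def private_count(edges, h_edges, nodes, ell, eps2, delta2,
                  rng=np.random.default_rng()):
    # Theorem 3 recursion: g^{(k)} = f^{(k)} + g^{(k+1)} ln(1/delta2)/eps2
    #                      + Lap(g^{(k+1)}/eps2), for k = ell-1, ..., 0.
    g = float(f_k(edges, h_edges, nodes, ell))  # level-ell value; see lead-in
    for k in range(ell - 1, -1, -1):
        g = (f_k(edges, h_edges, nodes, k)
             + g * np.log(1 / delta2) / eps2
             # noise scale floored at 1 in this sketch so it stays positive
             + rng.laplace(scale=max(g, 1.0) / eps2))
    return g
```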
Improved bounds for path counting in graphs with bounded degeneracy and treewidth. We say that G has degeneracy d if the nodes can be ordered so that each node has at most d neighbors with higher index [13]. Informally, G has treewidth t if it has recursive balanced separators of size t; see [11] for details. There has been a lot of work showing that non-private subgraph counting can be done efficiently when G has either of these parameters bounded, e.g., [5, 13, 6, 11]. We show that if G has degeneracy d and treewidth t, then the higher-order local sensitivity of paths of length k can be computed in time O(h(d, t, k)·poly(n)), where h(d, t, k) is independent of n; thus the running time does not scale as n^k. Our algorithm and analysis are summarized in Section E in the supplementary material (Theorem 8).

Recall that the goal is to compute f^{(k)}(G) = max_{S : |S| ≤ k} f(G, S). We summarize the main ideas here. We show that a feasible solution, which involves fixing a subset S of pairs of nodes, can be viewed as a tuple of at most k disjoint subgraphs (H_i, D_i), referred to as k-compact subgraphs. A k-compact subgraph is defined to be a subgraph which satisfies the following conditions (Figure 6): (1) H = G_σ(v, k, F^{(i)}) for some node v and subset F^{(i)} = {e₁, ..., e_i}, where G_σ is the k-hop graph starting at node v, restricted to nodes of higher degeneracy order than v; (2) the k-hop neighborhood is expanded as edges in F^{(i)} (referred to as internal anchor edges) are added, one at a time; and (3) D is a subset of vertices in H, referred to as the set of connectors. The compact subgraphs are connected via edges between connector nodes. We show that paths using edges from S (which contribute to the count f(G, S)) can be viewed as consisting of segments within these compact subgraphs, passing through internal anchor edges, as well as edges between connectors (which are also from S). We guess the number of path segments between connectors within a factor of (1 + 1/(log n)^c) for a constant c, which allows us to search for a tuple of compact subgraphs with the corresponding approximate counts. We show that the number of such tuples is O(h(d, k)·poly(n)), and the existence of a given tuple can be determined in similar time.

6 Experimental Results

We evaluate our algorithm for privately computing #triangles using approximate smooth sensitivity (Algorithm 2), compared with the exact version of smooth sensitivity and other baselines.

Datasets. We consider different real-world networks from [26] as inputs. The networks have sizes ranging from 10K to 300K nodes, with one of the largest having over 2 million edges.

| Network | Description | #nodes | #edges | #triangles |
|---|---|---|---|---|
| Oregon-1 | AS peering information, Oregon route-views | 10,670 | 22,002 | 17,145 |
| ca-HepTh | Collaboration network, Arxiv High Energy Physics | 9,877 | 51,971 | 28,339 |
| ca-GrQc | Collaboration network, Arxiv General Relativity | 5,242 | 14,496 | 48,620 |
| Oregon-2 | AS peering information, Oregon route-views | 10,900 | 31,180 | 82,857 |
| ca-CondMat | Collaboration network, Arxiv Condensed Matter | 23,133 | 186,936 | 173,361 |
| loc-Brightkite | Brightkite location-based online social network | 58,228 | 428,156 | 494,728 |
| com-Amazon | Amazon product network | 334,863 | 925,872 | 667,129 |
| email-Enron | Email communication network, Enron | 36,692 | 367,662 | 727,044 |
| ca-AstroPh | Collaboration network, Arxiv Astro Physics | 18,772 | 198,110 | 1.35 × 10⁶ |
| com-DBLP | DBLP collaboration network | 317,080 | 2,099,732 | 2.22 × 10⁶ |
| loc-Gowalla | Gowalla location-based online social network | 196,591 | 950,327 | 2.27 × 10⁶ |
| ca-HepPh | Collaboration network, Arxiv High Energy Physics | 12,008 | 237,010 | 3.36 × 10⁶ |

Table 2: Statistics of tested networks.
Statistics of the networks are presented in Table 2. Due to space limits, we only present the results for selected networks here; the full results are presented in the SM.

Infrastructure. All algorithms are implemented in C++ with the OpenMP framework for parallelization. We ran our experiments on a system with a 48-core Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz and 1.5TB RAM, and limit the number of parallel threads in all experiments to 40.

Baselines. We implement two baseline methods: the exact smooth sensitivity of the triangle count used to calibrate Laplace noise (denoted in our experiments as Karwa et al.) [23], and the ladder function method (Zhang et al.) [36].

Metrics for evaluating performance. For each input network G, we calculate the following metrics:

- Accuracy of the private triangle counts, defined as TRIANGLE COUNT RELATIVE ERROR = |M_{f_△}(G) − f_△(G)| / f_△(G), where M_{f_△} is the tested private mechanism for outputting f_△(G).
- Speedup of Algorithm 2, which compares its running time with the time required by [36, 23]: SPEEDUP = (runtime of baseline algorithm on G) / (runtime of our algorithm on G) (Figure 5).
- Approximation error of Algorithm 2 for estimating the smooth sensitivity, defined as SENSITIVITY APPROXIMATION ERROR = (S̃_β(G) − S*_β(G)) / S*_β(G) (Figure 4).

Parameters. We test our algorithms at privacy budgets ϵ ∈ {2⁻³, 2⁻², 2⁻¹, 2⁰, 2¹} and a fixed δ = 10⁻⁶. We report the average accuracy and runtime over five (5) repeats of the private triangle count via approximate smooth sensitivity, due to the randomness used in the algorithm. For the exact calculation, we calculate the smooth sensitivity once and re-sample the noise in each run, since the exact calculation of the sensitivity is deterministic and does not change for each setting. We use the true triangle counts of the networks as reported by [26].

Figure 3: Triangle Count Relative Error, showing the accuracy of the private triangle count with noise calibrated by (1) our approximate smooth sensitivity (Algorithm 2), (2) the exact smooth sensitivity (Karwa et al.), and (3) the ladder function (Zhang et al.).

Figure 4: Error factor of the approximate smooth sensitivity output by Algorithm 2 (relative to the exact smooth sensitivity) on selected networks. Dotted lines indicate the theoretical upper bound factor e^γ (guaranteed by Theorem 2).

Figure 5: Speedup of Algorithm 2 over the exact smooth sensitivity (Karwa et al.) and the ladder function (Zhang et al.) on selected networks.

Experimental results. The results show that our algorithm can achieve levels of accuracy similar to the exact calculation while being several orders of magnitude faster. Figure 3 shows that when using approximate smooth sensitivity, the private triangle counts reach the accuracy of the exact calculation of the smooth sensitivity across different privacy budgets in all but two networks (Oregon-1 and Oregon-2). In these two networks, the smooth sensitivities are relatively large in comparison with the triangle counts, which makes the noise fluctuate more for the approximate estimate (see Figure 8). Figure 4 shows that all approximate smooth sensitivities are within a small factor of the exact smooth sensitivities. Note that the approximate smooth sensitivities are always larger than the true smooth sensitivity. This is important, since a lower value could expose privacy due to an inadequate noise magnitude. In general, a smaller value of ϵ (higher privacy guarantee) requires a more accurate estimate of the sensitivity.
This is illustrated in Figure 4: the approximate smooth sensitivity is close to the exact smooth sensitivity in all networks at ϵ = 0.125 or ϵ = 0.25. Figure 5 shows that our approximation algorithm is orders of magnitude faster than the exact algorithm. On large graphs (Amazon, DBLP, Gowalla), the speedup may reach 1,000-fold. Generally, a lower ϵ requires a smaller approximation factor γ, which in turn requires a larger number of iterations to reduce the approximation error. The majority of tested networks have speedup factors between 10- and 100-fold across all privacy budgets ϵ.

7 Conclusions and future work

We give significant improvements in the running time for privately counting the number of embeddings of constant size subgraphs in a graph, without using noise based on global sensitivity. Despite a lot of work on private subgraph counting, efficient algorithms were not known for many subgraphs. Our results for triangles show significant benefits of our approach, improving on the runtime of all prior private algorithms. Our approach of using approximations to sensitivity metrics opens the possibility of using other techniques from graph sampling and sketching to obtain more efficient algorithms. For general subgraphs, our focus here has been on theoretical bounds using multi-level local sensitivity combined with approximate queries. Developing efficient and practical algorithms for these problems is a good direction for future work.

Acknowledgments and Disclosure of Funding

This work was partially supported by the following grants: National Science Foundation Grants CCF-1918656 (Expeditions), OAC-1916805 (CINES), IIS-1955797, and CNS-2317193; VDH Grant PV-BII VDH COVID-19 Modeling Program VDH-21-501-0135; DTRA subcontract/ARA S-D0018915-TO-01-UVA; NIH 2R01GM109718-07; CDC MIND cooperative agreement U01CK000589; and the U.S. DOE Exascale Computing Project (ECP) (17-SC-20-SC) ExaGraph codesign center at Pacific Northwest National Laboratory.

References

[1] Hilal Asi and John C. Duchi. Instance-optimality in differential privacy via approximate inverse sensitivity mechanisms. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS '20, Red Hook, NY, USA, 2020. Curran Associates Inc.

[2] Grey Ballard, Tamara G. Kolda, Ali Pinar, and C. Seshadhri. Diamond sampling for approximate maximum all-pairs dot-product (MAD) search. In 2015 IEEE International Conference on Data Mining, pages 11–20. IEEE, 2015.

[3] Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. Differentially private data analysis of social networks via restricted sensitivity. In Proceedings of the 4th Conference on Innovations in Theoretical Computer Science, ITCS '13, pages 87–96, New York, NY, USA, 2013. Association for Computing Machinery.

[4] Jeremiah Blocki, Elena Grigorescu, Tamalika Mukherjee, and Samson Zhou. How to make your approximation algorithm private: A black-box differentially-private transformation for tunable approximation algorithms of functions with low sensitivity. arXiv preprint arXiv:2210.03831, 2022.

[5] Marco Bressan. Faster subgraph counting in sparse graphs. In 14th International Symposium on Parameterized and Exact Computation (IPEC 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.

[6] Marco Bressan and Marc Roth. Exact and approximate pattern counting in degenerate graphs: New algorithms, hardness results, and complexity dichotomies. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 276–285.
IEEE, 2022.

[7] Shixi Chen and Shuigeng Zhou. Recursive mechanism: Towards node differential privacy and unrestricted joins. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD '13, pages 653–664, New York, NY, USA, 2013. Association for Computing Machinery.

[8] Vincent Cohen-Addad, Chenglin Fan, Silvio Lattanzi, Slobodan Mitrovic, Ashkan Norouzi-Fard, Nikos Parotsidis, and Jakub M. Tarnawski. Near-optimal correlation clustering with privacy. Advances in Neural Information Processing Systems, 35:33702–33715, 2022.

[9] Laxman Dhulipala, Quanquan C. Liu, Sofya Raskhodnikova, Jessica Shi, Julian Shun, and Shangdi Yu. Differential privacy from locally adjustable graph algorithms: k-core decomposition, low out-degree ordering, and densest subgraphs. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 754–765. IEEE, 2022.

[10] Xiaofeng Ding, Xiaodong Zhang, Zhifeng Bao, and Hai Jin. Privacy-preserving triangle counting in large graphs. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 1283–1292, 2018.

[11] Rodney G. Downey and Michael R. Fellows. Fundamentals of Parameterized Complexity, volume 4. Springer, 2013.

[12] Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.

[13] David Eppstein, Maarten Löffler, and Darren Strash. Listing all maximal cliques in sparse graphs in near-optimal time. arXiv preprint arXiv:1006.5440, 2010.

[14] Michael Hay, Chao Li, Gerome Miklau, and David Jensen. Accurate estimation of the degree distribution of private networks. In 2009 Ninth IEEE International Conference on Data Mining, pages 169–178, 2009.

[15] Jonathan Hehir, Aleksandra Slavkovic, and Xiaoyue Niu. Consistency of privacy-preserving spectral clustering under the stochastic block model. arXiv preprint arXiv:2105.12615, 2021.

[16] Jacob Imola, Takao Murakami, and Kamalika Chaudhuri. Locally differentially private analysis of graph statistics. In 30th USENIX Security Symposium, 2021.

[17] Jacob Imola, Takao Murakami, and Kamalika Chaudhuri. Communication-efficient triangle counting under local differential privacy. In 31st USENIX Security Symposium (USENIX Security 22), pages 537–554, 2022.

[18] Jacob Imola, Takao Murakami, and Kamalika Chaudhuri. Differentially private subgraph counting in the shuffle model. arXiv preprint arXiv:2205.01429, 2022.

[19] Shweta Jain and C. Seshadhri. A fast and provable method for estimating clique counts using Turán's theorem. In Proceedings of the 26th International Conference on World Wide Web, pages 441–449, 2017.

[20] Madhav Jha, C. Seshadhri, and Ali Pinar. Path sampling: A fast and provable method for estimating 4-vertex subgraph counts. In Proceedings of the 24th International Conference on World Wide Web, pages 495–505, 2015.

[21] Tianxi Ji, Changqing Luo, Yifan Guo, Jinlong Ji, Weixian Liao, and Pan Li. Differentially private community detection in attributed social networks. In Asian Conference on Machine Learning, pages 16–31. PMLR, 2019.

[22] Tianxi Ji, Changqing Luo, Yifan Guo, Qianlong Wang, Lixing Yu, and Pan Li. Community detection in online social networks: A differentially private and parsimonious approach. IEEE Transactions on Computational Social Systems, 7(1):151–163, 2020.

[23] Vishesh Karwa, Sofya Raskhodnikova, Adam Smith, and Grigory Yaroslavtsev. Private analysis of graph structure.
ACM Transactions on Database Systems (TODS), 39(3):1–33, 2014.

[24] Shiva P. Kasiviswanathan and Adam Smith. On the semantics of differential privacy: A Bayesian formulation. Journal of Privacy and Confidentiality, 6(1), 2014.

[25] Shiva Prasad Kasiviswanathan, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Analyzing graphs with node differential privacy. In Proceedings of the 10th Theory of Cryptography Conference, TCC '13, pages 457–476, Berlin, Heidelberg, 2013. Springer-Verlag.

[26] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.

[27] Yang Li, Michael Purcell, Thierry Rakotoarivelo, David Smith, Thilina Ranbaduge, and Kee Siong Ng. Private graph data release: A survey. ACM Computing Surveys, January 2023. Just Accepted.

[28] Yvonne Mülle, Chris Clifton, and Klemens Böhm. Privacy-integrated graph clustering through differential privacy. In EDBT/ICDT Workshops, volume 157, 2015.

[29] Dung Nguyen and Anil Vullikanti. Differentially private densest subgraph detection. In 38th International Conference on Machine Learning (ICML). PMLR, July 2021.

[30] Hiep H. Nguyen, Abdessamad Imine, and Michaël Rusinowitch. Detecting communities under differential privacy. In Proceedings of the 2016 ACM Workshop on Privacy in the Electronic Society, pages 83–93, 2016.

[31] Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in private data analysis. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, pages 75–84, 2007.

[32] Davide Proserpio, Sharon Goldberg, and Frank McSherry. Calibrating data to sensitivity in private data analysis. Proceedings of the VLDB Endowment, 7(8), 2014.

[33] Zhan Qin, Ting Yu, Yin Yang, Issa Khalil, Xiaokui Xiao, and Kui Ren. Generating synthetic decentralized social graphs with local differential privacy. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 425–438, 2017.

[34] Vibhor Rastogi, Michael Hay, Gerome Miklau, and Dan Suciu. Relationship privacy: Output perturbation for queries with joins. In Proceedings of the Twenty-Eighth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '09, pages 107–116, New York, NY, USA, 2009. Association for Computing Machinery.

[35] Jessica Shi, Laxman Dhulipala, and Julian Shun. Parallel clique counting and peeling algorithms. In SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21), pages 135–146. SIAM, 2021.

[36] Jun Zhang, Graham Cormode, Cecilia M. Procopiuc, Divesh Srivastava, and Xiaokui Xiao. Private release of graph statistics using ladder functions. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pages 731–745, New York, NY, USA, 2015. Association for Computing Machinery.

A Differential Privacy with high probability

Following [24], we define indistinguishability and point-wise indistinguishability as follows:

Definition 8. Two random variables X, Y having the same support are (ϵ, δ)-indistinguishable if for all sets O ⊆ Supp(X), Pr[X ∈ O] ≤ e^ϵ Pr[Y ∈ O] + δ and Pr[Y ∈ O] ≤ e^ϵ Pr[X ∈ O] + δ.

Definition 9. Two random variables X, Y are point-wise (ϵ, δ)-indistinguishable if, with probability at least 1 − δ over a drawn from either X or Y, we have e^{−ϵ} Pr[Y = a] ≤ Pr[X = a] ≤ e^ϵ Pr[Y = a].

The following lemma allows us to interpret a mechanism that is ϵ-differentially private with probability at least 1 − δ as (ϵ, δ)-differentially private.

Lemma 7.
(Lemma 3.3 of [24]) If X, Y are point-wise (ϵ, δ)-indistinguishable, then they are (ϵ, δ)-indistinguishable.

We extend the definition of (ϵ, δ)-indistinguishability by allowing an additional small error probability δ':

Definition 10. Two random variables X, Y having the same support are (ϵ, δ, δ')-indistinguishable if, with probability at least 1 − δ' over a drawn from either X or Y, we have for all sets O ⊆ Supp(X): Pr[X ∈ O] ≤ e^ϵ Pr[Y ∈ O] + δ and Pr[Y ∈ O] ≤ e^ϵ Pr[X ∈ O] + δ.

The following lemma allows us to interpret a mechanism that is (ϵ, δ)-differentially private with probability at least 1 − δ' as (ϵ, δ + δ')-differentially private.

Lemma 8. If X, Y are (ϵ, δ, δ')-indistinguishable, then they are (ϵ, δ + δ')-indistinguishable.

Proof. Our proof is inspired by [24]. Let Bad denote the set of outcomes on which the above inequalities fail. Since X, Y are (ϵ, δ, δ')-indistinguishable, Pr[X ∈ Bad] < δ'. We have Pr[X ∈ O] ≤ Pr[X ∈ O \ Bad] + Pr[X ∈ Bad] ≤ e^ϵ Pr[Y ∈ O \ Bad] + δ + δ' ≤ e^ϵ Pr[Y ∈ O] + δ + δ'. Similarly for Y, we have Pr[Y ∈ O] ≤ e^ϵ Pr[X ∈ O] + δ + δ', and the lemma follows.

B Missing proofs of Section 4

Approximate smooth sensitivity for exact queries. When we can calculate a γ-upper approximation of the smooth sensitivity S*_{f,β}, i.e., S̃_{f,β} is a bounded approximation of S*_{f,β} with probability 1, we can substitute S̃_{f,β} for S*_{f,β} to calibrate the noise directly, although with a somewhat larger effective constant β' = β + γ, which in turn makes the noise somewhat larger. Lemma 9 formalizes the equivalence of S*_{f,β} and S̃_{f,β} in this case.

Lemma 9. If S̃_{f,β} is a γ-upper approximation of the smooth sensitivity S*_{f,β}, then S̃_{f,β} is a β'-smooth upper bound on the local sensitivity of f with β' = β + γ.

Proof. By the definitions of S̃_{f,β} and S*_{f,β}, it follows that S̃_{f,β}(D) ≥ S*_{f,β}(D) ≥ LS_f(D) for all D. Second, for all neighbors D ∼ D' ∈ D^n, we have S̃_{f,β}(D) ≤ e^γ S*_{f,β}(D) ≤ e^γ e^β S*_{f,β}(D') ≤ e^{γ+β} S̃_{f,β}(D') = e^{β'} S̃_{f,β}(D'), and the lemma follows from Definition 2.

In the case when we can only calculate a (γ, δ')-upper approximation of the smooth sensitivity, i.e., S̃_{f,β} is a γ-upper approximation of S*_{f,β} with probability at least 1 − δ', Lemma 9 cannot be applied directly. Similar to [31], we use the formalism of admissible noise to prove that we can use S̃_{f,β} to calibrate an (α, β)-admissible noise distribution, with appropriate values of α and β, to deliver the privacy guarantee. In Lemma 10, we analyze the output of such a mechanism on an arbitrary pair of neighbors D and D'. We condition our analysis on the events that S̃_{f,β} succeeds on both D and D' (which happens with probability at least 1 − 2δ'). The failure probability (at most 2δ') is then added to the δ-part of the final (ϵ, δ)-DP guarantee.

Lemma 10. Let h be an (α, β')-admissible noise distribution, and let S̃_{f,β} be a (γ, δ')-upper approximation to S*_{f,β}, where β = β' − γ. Then, A(D) = f(D) + (S̃_{f,β}(D)/α)·Z is (ϵ, (e^{ϵ/2}+1)/2·δ + 2δ')-differentially private, where Z ∼ h.

Proof. We prove that with probability at least 1 − 2δ', A(D) is (ϵ, (e^{ϵ/2}+1)/2·δ)-differentially private; applying Lemma 8, the lemma then follows. Let E be the event that S*(D) ≤ S̃_{f,β}(D) ≤ e^γ S*(D); we have Pr[E] ≥ 1 − δ'. Similarly, let E' be the event that S*(D') ≤ S̃_{f,β}(D') ≤ e^γ S*(D') for a fixed neighbor D' of D; we have Pr[E'] ≥ 1 − δ', and Pr[E ∩ E'] ≥ 1 − 2δ'. We condition on E ∩ E'. Let S ⊆ R. Let N(D) = S̃_{f,β}(D)/α, and define

S₁ = (S − f(D))/N(D), S₂ = S₁ + (f(D) − f(D'))/N(D) = (S − f(D'))/N(D), S₃ = S₂·N(D)/N(D') = (S − f(D'))/N(D').

We observe that A(D) ∈ S if and only if Z ∈ S₁; similarly, A(D') ∈ S if and only if Z ∈ S₃.
We have

Pr[A(D) ∈ S] = Pr[Z ∈ S₁]
≤ e^{ϵ/2} Pr[Z ∈ S₁ + (f(D) − f(D'))/N(D)] + δ/2
= e^{ϵ/2} Pr[Z ∈ S₂] + δ/2
≤ e^{ϵ/2} ( e^{ϵ/2} Pr[Z ∈ S₂·N(D)/N(D')] + δ/2 ) + δ/2
= e^ϵ Pr[Z ∈ S₃] + e^{ϵ/2}δ/2 + δ/2
= e^ϵ Pr[A(D') ∈ S] + (e^{ϵ/2} + 1)δ/2,

where the first inequality follows from the sliding property, since |f(D) − f(D')|/N(D) = α·|f(D) − f(D')|/S̃_{f,β}(D) ≤ α (as |f(D) − f(D')| ≤ LS_f(D) ≤ S̃_{f,β}(D)), and the second inequality follows from the dilation property, since |ln(N(D)/N(D'))| = |ln(S̃_{f,β}(D)/S̃_{f,β}(D'))| ≤ |ln(e^γ S*(D)/S*(D'))| ≤ γ + β = β', which satisfies the condition for the dilation property.

Lemma 11 specializes Lemma 10 by using Laplace noise. We note that this is a generalization of the analysis of Laplace noise in [31]: we introduce a parameter k to control the magnitude of the Laplace noise, while the analysis of [31] fixes it. This design gives us the flexibility to choose the values of α and β given fixed targets ϵ and δ. When k = 1/2, our design yields the same results as [31].

Lemma 11. Let α = kϵ and β = ϵ/(2 log(2/δ)), for any constant k > 0. Then Lap(2k) is an (α, β)-admissible noise distribution.

Proof. For some real-valued random variable Y, let ρ_δ(Y) be the least solution to Pr[Y ≤ ρ_δ] ≥ 1 − δ. Let δ' = δ/2, and let Z ∼ Lap(2k), so that |Z| is exponentially distributed with scale 2k. Solving 1 − e^{−ρ_{δ'}(|Z|)/(2k)} = 1 − δ' gives ρ_{δ'}(|Z|) = 2k·log(2/δ). Substituting ρ_{δ'}(|Z|) into the formula for β, we have β = ϵ/(2 log(2/δ)) = kϵ/ρ_{δ'}(|Z|).

We first prove the sliding property of Lap(2k). It is equivalent to proving that ln(h(z + Δ)/h(z)) ≤ ϵ/2 for any |Δ| ≤ α, where h(z) = (1/(4k))·e^{−|z|/(2k)} is the PDF of Lap(2k). We have ln(h(z + Δ)/h(z)) = ln( exp(−|z + Δ|/(2k)) / exp(−|z|/(2k)) ) = (|z| − |z + Δ|)/(2k) ≤ |Δ|/(2k) ≤ α/(2k) = ϵ/2.

We next prove the dilation property of Lap(2k), i.e., that ln(e^λ h(e^λ z)/h(z)) ≤ ϵ/2 with probability at least 1 − δ/2. In the first case, λ > 0: we have h(e^λ z) ≤ h(z), and so ln(e^λ h(e^λ z)/h(z)) ≤ λ ≤ β = kϵ/ρ_{δ'}(|Z|) = ϵ/(2 log(2/δ)) ≤ ϵ/2. In the second case, λ < 0: we consider the ratio h(e^λ z)/h(z) = exp(−|ze^λ|/(2k))/exp(−|z|/(2k)), and we then have ln(e^λ h(e^λ z)/h(z)) ≤ λ + |z||λ|/(2k) ≤ |z||λ|/(2k). Consider the event G = {z : |z| ≤ ρ_{δ'}(|Z|)}. Under G, we have |z||λ|/(2k) ≤ ρ_{δ'}(|Z|)·(kϵ/ρ_{δ'}(|Z|))·(1/(2k))·2 ≤ ϵ/2. Under h, Pr[G] ≥ 1 − δ/2, which completes the proof of the dilation property of Lap(2k).

Finally, we apply Laplace noise with appropriate parameters to obtain a concrete mechanism using the approximate smooth sensitivity S̃_{f,β} that guarantees (ϵ, δ)-DP.

Lemma 12 (Lemma 2). Let S̃_{f,β} be a (γ, δ')-upper approximation to S*_{f,β}, for γ = β = ϵ/(16 ln(2/δ)). Then, A(D) = f(D) + Lap(2S̃_{f,β}(D)/ϵ) is (ϵ, (e^{ϵ/2}+1)/2·δ + 2δ')-differentially private.

Proof. Set k = 4. Using Lemma 11 with α = kϵ = 4ϵ and β' = β + γ = ϵ/(8 ln(2/δ)) ≤ ϵ/(2 ln(2/δ)), the Lap(2k) = Lap(8) distribution is (α, β')-admissible. By Lemma 10, A(D) = f(D) + (S̃_{f,β}(D)/α)·Z = f(D) + (S̃_{f,β}(D)/(4ϵ))·Lap(8) = f(D) + Lap(2S̃_{f,β}(D)/ϵ) is (ϵ, (e^{ϵ/2}+1)/2·δ + 2δ')-differentially private, and the lemma follows.

Approximate smooth sensitivity for approximate queries.

Definition 11 (Definition 7). We define functions g_f and s_f as:

g_f(D) =
- A_f(D), if (1 − α₁)f(D) ≤ A_f(D) ≤ (1 + α₁)f(D);
- (1 − α₁)f(D), if A_f(D) < (1 − α₁)f(D);
- (1 + α₁)f(D), if A_f(D) > (1 + α₁)f(D). (3)

s_f(D) =
- S̃_{f,β}(D), if S*_{f,β}(D) ≤ S̃_{f,β}(D) ≤ e^γ S*_{f,β}(D);
- S*_{f,β}(D), if S̃_{f,β}(D) < S*_{f,β}(D);
- e^γ S*_{f,β}(D), if S̃_{f,β}(D) > e^γ S*_{f,β}(D). (4)
Let S_{g_f}(D) = 4α₁ g_f(D) + 2s_f(D).

Algorithm 3: Fast algorithm using approximate smooth sensitivity for an approximate query A_f
Input: G, f, ϵ, δ
Output: an (ϵ, (e^{ϵ/2} + 2)δ)-differentially private estimate of the query f
1: β := γ := 4α₁ := ϵ/(6 log(2/δ))
2: δ₁ := δ₂ := δ/2
3: Calculate the (γ, δ₂)-upper approximate smooth sensitivity S̃_{f,β}(G)
4: Calculate the (α₁, δ₁)-approximation A_f(G) of f
5: Return A_f(G) + Lap((8α₁A_f(G) + 4S̃_{f,β}(G))/ϵ)

Lemma 13 proves that S_{g_f} is a β'-smooth upper bound on the local sensitivity of g_f; i.e., we have to prove the two conditions LS_{g_f}(D) ≤ S_{g_f}(D) and S_{g_f}(D) ≤ e^{β'}S_{g_f}(D') for any pair of neighboring datasets D ∼ D', where LS_{g_f}(D) = max_{D' ∼ D}|g_f(D) − g_f(D')| is the local sensitivity of g_f at D.

Lemma 13 (Lemma 3). Given g_f, s_f, and S_{g_f} as defined above, S_{g_f} is a β'-smooth upper bound on the local sensitivity of g_f, where β' = 4α₁ + γ + β and α₁ < 1/2.

Proof. We now prove the two conditions. We first show that S_{g_f} is an upper bound on the local sensitivity LS_{g_f} of g_f:

LS_{g_f}(D) = max_{D' ∼ D} |g_f(D) − g_f(D')|
≤ max_{D' ∼ D} [ (1 + α₁)f(D) − (1 − α₁)f(D') ]
≤ max_{D' ∼ D} [ α₁(f(D) + f(D')) + (f(D) − f(D')) ]
≤ α₁(f(D) + f(D) + LS_f(D)) + LS_f(D)
≤ 2α₁f(D) + 2LS_f(D)
≤ 4α₁g_f(D) + 2S*_{f,β}(D)
≤ 4α₁g_f(D) + 2s_f(D) = S_{g_f}(D).

The first inequality uses Definition 7, and the fifth uses the facts f(D) ≤ g_f(D)/(1 − α₁) ≤ 2g_f(D) and LS_f(D) ≤ S*_{f,β}(D) (as S*_{f,β}(D) is an upper bound on LS_f(D)).

For the second condition, we need to prove that for any pair D ∼ D', the values of S_{g_f} on them do not differ by more than a multiplicative factor of e^{β'} with β' = 4α₁ + γ + β:

S_{g_f}(D) = 4α₁g_f(D) + 2s_f(D)
≤ 4α₁(1 + α₁)f(D) + 2s_f(D)
≤ 4α₁(1 + α₁)(f(D') + s_f(D)) + 2s_f(D)
= 4α₁(1 + α₁)f(D') + 2(2α₁(1 + α₁) + 1)s_f(D)
≤ (4α₁(1 + α₁)/(1 − α₁))g_f(D') + 2(2α₁(1 + α₁) + 1)e^{γ+β}s_f(D')
≤ 4α₁(1 + α₁)(1 + 2α₁)g_f(D') + 2(1 + 4α₁)e^{γ+β}s_f(D')
≤ 4α₁(1 + 4α₁)g_f(D') + 2(1 + 4α₁)e^{γ+β}s_f(D')
≤ e^{4α₁+γ+β}(4α₁g_f(D') + 2s_f(D')) = e^{β'}S_{g_f}(D').

We note that s_f, g_f, and S_{g_f} are all hypothetical functions which we may not be able to calculate directly. We need to connect the statement above (S_{g_f} is a smooth upper bound on the local sensitivity of g_f) to the functions we can actually compute (A_f and S̃_{f,β}). Again, we define a hypothetical mechanism A' that takes D as its input. By the definition of A' and the properties of admissible noise, A' is (ϵ, δ)-DP. Instead of A', we can only compute A, obtained by replacing g_f and s_f with A_f and S̃_{f,β}, respectively. Next, we argue that A and A' are, in fact, not too different. Lemma 14 conditions on the event that A and A' are identical, and proves that the privacy guarantee of A is the same as that of A' under this conditioning. The failure probabilities (of A_f and S̃_{f,β} in approximating f and S*_{f,β}) add to the δ-part of the final (ϵ, δ)-DP guarantee.

Lemma 14. Let h be an (α, β')-admissible noise distribution with β' = 4α₁ + γ + β. With A_f and S̃_{f,β} defined as above, the algorithm A(D) = A_f(D) + ((4α₁A_f(D) + 2S̃_{f,β}(D))/α)·Z is (ϵ, (e^{ϵ/2}+1)/2·δ + 2δ₁ + 2δ₂)-differentially private, where Z ∼ h.

Proof. We prove that with probability at least 1 − 2δ₁ − 2δ₂, A(D) is (ϵ, (e^{ϵ/2}+1)/2·δ)-differentially private; applying Lemma 8, the lemma then follows. Let A'(D) = g_f(D) + (S_{g_f}(D)/α)·Z = g_f(D) + ((4α₁g_f(D) + 2s_f(D))/α)·Z, obtained by replacing A_f and S̃_{f,β} with g_f and s_f, respectively. We note that the use of g_f and s_f is only for the purpose of the privacy analysis. With β' ≥ 4α₁ + γ + β, Lemma 3 shows that S_{g_f} is a β'-smooth upper bound on the local sensitivity of g_f. By Lemma 10, A'(D) = g_f(D) + (S_{g_f}(D)/α)·Z is (ϵ, (e^{ϵ/2}+1)/2·δ)-differentially private.
Theorem 4. (Theorem 1) Let $A_f$ be an $(\alpha_1, \delta_1)$-approximation to f and $\tilde{S}_{f,\beta}$ a $(\gamma, \delta_2)$-upper approximation to $S^*_{f,\beta}$, for $\gamma = \beta = \frac{\epsilon}{8\ln(2/\delta)}$ and $\alpha_1 = \frac{\epsilon}{16\ln(2/\delta)}$. Then $A(D) = A_f(D) + \mathrm{Lap}\big(\frac{8\alpha_1 A_f(D) + 4\tilde{S}_{f,\beta}(D)}{\epsilon}\big)$ is $\big(\epsilon, \frac{e^{\epsilon/2}+1}{2}\delta + 2\delta_1 + 2\delta_2\big)$-differentially private.
Proof. Set $k = 4$. Using Lemma 11 with $\alpha = k\epsilon = 4\epsilon$ and $\beta' = \beta + \gamma + 4\alpha_1 = \frac{\epsilon}{2\ln(2/\delta)}$, the $\mathrm{Lap}(2k) = \mathrm{Lap}(8)$ distribution is $(\alpha, \beta')$-admissible. By Lemma 14, $A(D) = A_f(D) + \frac{4\alpha_1 A_f(D) + 2\tilde{S}_{f,\beta}(D)}{\alpha}Z = A_f(D) + \frac{4\alpha_1 A_f(D) + 2\tilde{S}_{f,\beta}(D)}{4\epsilon}\mathrm{Lap}(8) = A_f(D) + \mathrm{Lap}\big(\frac{8\alpha_1 A_f(D) + 4\tilde{S}_{f,\beta}(D)}{\epsilon}\big)$ is $\big(\epsilon, \frac{e^{\epsilon/2}+1}{2}\delta + 2\delta_1 + 2\delta_2\big)$-differentially private, and the theorem follows.
C Missing proofs of Section 4.1
The main goal of this section is to compute an approximation of $\max_{ij} a_{ij}^2$, from which we can obtain a $\gamma$-approximation of $S^*_{\triangle,\beta}$ for any $\gamma$ and $\beta$. There are three key components of the analysis:
- Algorithm 4 uses diamond sampling to quickly compute values $x_{ij}$ that approximate $a_{ij}^2$ for every i, j. We use this algorithm as an important subroutine in Algorithms 1 and 2. Lemma 15 shows the utility of the algorithm. We note that, due to the lemma, the quality of each estimate $x_{ij}$ depends on the true value $a_{ij}^2$ itself, which is unknown and is exactly the quantity we are trying to estimate. Hence, we cannot directly apply Algorithm 4 to obtain the approximation of $\max_{ij} a_{ij}^2$ that we need.
- Algorithm 1 uses multiple rounds of Algorithm 4 with increasing numbers of iterations. The main idea behind this process is that, with a small number of iterations, Algorithm 4 executes very quickly and gives us a rough estimate $x_{ij}$ of $a_{ij}^2$ for every i, j. We then check whether the estimate is good enough: the search stops at the first step k at which $\max_{ij} x_{ij}\|W\|/s \ge \frac{3}{2}\tau_k$ (up to constants); Corollaries 2 and 3 explain why. Finally, Lemma 18 establishes the utility of the output of the algorithm: after the last iteration k, we have $\tau_k < \max_{ij} a_{ij}^2 < 4\tau_k$.
- Algorithm 2 further refines the output of Algorithm 1. At that point we know roughly how large $\max_{ij} a_{ij}^2$ is, but only within a factor of 4. We run a final instance of Algorithm 4 with a special constant $\theta$, which controls the accuracy of the final estimate. $\theta$ is set by $\gamma$, the approximation level required for the $\gamma$-approximation of the smooth sensitivity described in the previous section (which, in turn, is set by $(\epsilon, \delta)$). Lemma 19 establishes the final utility of our estimate.
We define the norms of a matrix A as follows: $\|A\| = \|A\|_1 = \sum_{ij} A_{ij}$ and $\|A_i\| = \sum_k A_{ik}$. Note that the degree satisfies $d(i) = \|A_i\|$. The following concentration bound is shown in [2] for the output of Algorithm 4.
Lemma 15. (Lemma 3 of [2]) Fix $\theta > 0$ and error probability $\bar{\beta}$. For any $a_{ij} > 0$, if the number of samples satisfies $s \ge 3\|W\|\log(2/\bar{\beta})/(\theta^2 a_{ij}^2)$, then $(1-\theta)a_{ij}^2 \le x_{ij}\|W\|/s \le (1+\theta)a_{ij}^2$ with probability at least $1 - \bar{\beta}$.
The following two lemmas and corollaries relate upper and lower bounds on the random variable $\max_{ij} x_{ij}$ output by Algorithm 4 to the quantity of interest, $\max_{ij} a_{ij}^2$.
Algorithm 4 Fast estimation of $(a_{ij})^2$
Input: $G(V, E)$, number of iterations s
Output: $x_{ij}$ that approximates $a_{ij}^2$, for all i, j
1: Let A be the adjacency matrix of G
2: for $(k, i) \in E$ do
3:   $W_{ki} = d(k)d(i)$
4: end for
5: $x \leftarrow 0$
6: for $l = 1, \ldots, s$ do
7:   Sample $(k, i)$ with probability $W_{ki}/\|W\|$
8:   Sample j from $N(k)$
9:   Sample o from $N(i)$
10:  $x_{ij} \mathrel{+}= A_{oj}$
11: end for
12: Return x
Lemma 16 states that if we choose a constant $\tau$ that is close to but larger than $\max_{ij} a_{ij}^2$, then after running Algorithm 4 for s iterations, $\max_{ij} x_{ij}\|W\|/s$ is not much larger than $\tau$ (up to constants). Similarly, Lemma 17 states that if we choose a constant $\tau$ that is close to but smaller than $\max_{ij} a_{ij}^2$, then after running Algorithm 4 for s iterations, $\max_{ij} x_{ij}\|W\|/s$ is not much smaller than $\tau$ (up to constants). With appropriate choices of the constants, Corollaries 2 and 3 summarize the two lemmas and describe the conditions under which we can find a good value of $\tau$. They allow us to observe the output of an instance of Algorithm 4 and compare the outcome with a pre-determined value of $\tau$. Depending on the sign of the comparison between $\max_{ij} x_{ij}\|W\|/s$ and $\frac{3}{2}\tau$, we can decide whether we are over-estimating $\tau$ (by Corollary 3) or under-estimating $\tau$ (by Corollary 2). The two lemmas and their corollaries below form the basis for Algorithm 1, which finds an appropriate value of $\tau$ by search.
Lemma 16. For any $\tau$ such that $\max_{ij} a_{ij}^2 < \lambda\tau$ for some constant $\lambda > 0$, there exists a constant $\lambda' > \lambda$ such that, with probability at least $1 - 1/\mathrm{poly}(n)$, $\max_{ij} x_{ij}\|W\|/s < \lambda'\tau$.
Proof. Consider any i, j. By Chernoff's bound, $\Pr[x_{ij} > (1+\theta')E[x_{ij}]] \le \exp\big(-\frac{\theta'^2 E[x_{ij}]}{3}\big)$. As $E[x_{ij}] = \frac{a_{ij}^2 s}{\|W\|}$, we have $\Pr\big[x_{ij}\frac{\|W\|}{s} > (1+\theta')a_{ij}^2\big] \le \exp\big(-\frac{\theta'^2 s a_{ij}^2}{3\|W\|}\big)$. Setting $\lambda'\tau = (1+\theta')a_{ij}^2$, we have $1 + \theta' = \frac{\lambda'\tau}{a_{ij}^2}$. Therefore, we can lower bound $\theta'$ as follows: $\theta' = \frac{\lambda'\tau - a_{ij}^2}{a_{ij}^2} \ge \frac{(\lambda'-\lambda)\tau}{a_{ij}^2}$, since $a_{ij}^2 < \lambda\tau$. Now, substituting $(1+\theta')a_{ij}^2$ by $\lambda'\tau$ and $s \ge \frac{3\|W\| c\log n}{\theta^2\tau}$:
$$\Pr\Big[x_{ij}\frac{\|W\|}{s} > \lambda'\tau\Big] \le \exp\Big(-\frac{\theta'^2 s a_{ij}^2}{3\|W\|}\Big) \le \exp\Big(-\frac{\theta'^2 a_{ij}^2 c\log n}{\theta^2\tau}\Big) \le \exp\Big(-\frac{(\lambda'-\lambda)^2\tau^2}{a_{ij}^4}\cdot\frac{a_{ij}^2 c\log n}{\theta^2\tau}\Big) = \exp\Big(-\frac{(\lambda'-\lambda)^2\tau c\log n}{a_{ij}^2\theta^2}\Big) \le \exp\Big(-\frac{(\lambda'-\lambda)^2 c\log n}{\lambda\theta^2}\Big) = n^{-\frac{c(\lambda'-\lambda)^2}{\lambda\theta^2}}.$$
Selecting $\lambda' > \lambda$ such that $\frac{c(\lambda'-\lambda)^2}{\lambda\theta^2} \ge 3$ and taking the union bound over all pairs ij, the lemma follows.
Corollary 2. With $c > 3$, $\theta = \frac{1}{2}$, $\lambda = 1$, $\lambda' = \frac{3}{2}$: with probability at least $1 - 1/n^{c-2}$, if $\max_{ij} x_{ij}\|W\|/s \ge \frac{3}{2}\tau$ then $\max_{ij} a_{ij}^2 > \tau$.
Lemma 17. For any $\tau$ such that $\max_{ij} a_{ij}^2 > \lambda\tau$ for some constant $\lambda > 0$, and for $\lambda' = (1-\theta)\lambda$, with probability at least $1 - 1/\mathrm{poly}(n)$, $\max_{ij} x_{ij}\|W\|/s \ge \lambda'\tau$.
Proof. Since $\max_{ij} a_{ij}^2 > \lambda\tau$, there exists a pair of nodes $i^*j^*$ such that $a_{i^*j^*}^2 > \lambda\tau$. For such a pair $i^*j^*$, we have
$$\Pr\Big[x_{i^*j^*}\frac{\|W\|}{s} < (1-\theta)a_{i^*j^*}^2\Big] \le \exp\Big(-\frac{\theta^2 s a_{i^*j^*}^2}{3\|W\|}\Big) \le \exp\Big(-\frac{3\theta^2\|W\| a_{i^*j^*}^2 c\log n}{3\|W\|\theta^2\tau}\Big) = \exp\Big(-\frac{a_{i^*j^*}^2 c\log n}{\tau}\Big) \le \exp(-\lambda c\log n) = n^{-c\lambda}.$$
Setting $\lambda' \le (1-\theta)\lambda$, we have $\lambda'\tau \le (1-\theta)a_{i^*j^*}^2$. Therefore, $\Pr\big[x_{i^*j^*}\frac{\|W\|}{s} < \lambda'\tau\big] \le \Pr\big[x_{i^*j^*}\frac{\|W\|}{s} < (1-\theta)a_{i^*j^*}^2\big] \le n^{-c\lambda}$. Choosing c such that $c\lambda \ge 3$ and taking the union bound over all pairs $i^*j^*$, the lemma follows.
Corollary 3. With $c > 3$, $\theta = \frac{1}{2}$, $\lambda = 3$, $\lambda' = \frac{3}{2}$: with probability at least $1 - 1/n^{c-2}$, if $\max_{ij} x_{ij}\|W\|/s < \frac{3}{2}\tau$ then $\max_{ij} a_{ij}^2 < 3\tau$.
Lemma 18. (Lemma 4) Let $\tau_k$ be the output of Algorithm 1. With probability at least $1 - 1/n^{c-3}$, $\tau_k < \max_{ij} a_{ij}^2 < 4\tau_k$.
Proof. Assume Algorithm 1 stops at step k. At step $k-1$, $\max_{ij} x_{ij}\|W\|/s < \frac{3}{2}\tau_{k-1}$; then, by Corollary 3, we have $\max_{ij} a_{ij}^2 < 3\tau_{k-1} = 3\cdot\frac{4}{3}\tau_k = 4\tau_k$. At step k, $\max_{ij} x_{ij}\|W\|/s \ge \frac{3}{2}\tau_k$; then, by Corollary 2, we have $\max_{ij} a_{ij}^2 > \tau_k$. Applying the union bound over the k steps, with probability at least $1 - 1/n^{c-3}$ we have $\tau_k < \max_{ij} a_{ij}^2 < 4\tau_k$, and the lemma follows.
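The following Python sketch mirrors Algorithm 4 under the assumption that the graph is given as a dense symmetric 0/1 numpy adjacency matrix (an illustrative interface; the paper's implementation details may differ). After the loop, `x[(i, j)] * W_norm / s` approximates $a_{ij}^2$ in the sense of Lemma 15.

```python
import numpy as np

def fast_estimate(A, s, rng=None):
    """Sketch of Algorithm 4 (diamond sampling) on a 0/1 adjacency matrix A.

    Returns (x, W_norm), where x[(i, j)] * W_norm / s approximates a_{ij}^2."""
    rng = rng or np.random.default_rng()
    A = np.asarray(A)
    deg = A.sum(axis=1)
    ks, is_ = np.nonzero(A)                     # directed edge list (k, i)
    w = deg[ks] * deg[is_]                      # W_{ki} = d(k) d(i)
    W_norm = w.sum()
    p = w / W_norm
    x = {}
    for _ in range(s):
        e = rng.choice(len(w), p=p)             # sample (k, i) ~ W_{ki}/||W||
        k, i = ks[e], is_[e]
        j = rng.choice(np.nonzero(A[k])[0])     # j uniform in N(k)
        o = rng.choice(np.nonzero(A[i])[0])     # o uniform in N(i)
        x[(i, j)] = x.get((i, j), 0) + A[o, j]  # close the diamond
    return x, W_norm
```

Algorithm 1's search then amounts to calling this routine repeatedly with $s = O(\|W\| c\log n/(\theta^2\tau))$, shrinking $\tau$ by a factor of $3/4$ per round until $\max_{ij} x_{ij}\|W\|/s \ge \frac{3}{2}\tau$, exactly the stopping rule analyzed in Lemma 18.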
Finally, in Algorithm 2, we post-process the output of Algorithm 1 to obtain an accurate estimate of $\max_{ij} a_{ij}^2$ and hence of the smooth sensitivity $S^*_{\triangle,\beta}(G)$.
Lemma 19. (Lemma 5) With probability at least $1 - \delta$, in Algorithm 2, $\max_{ij} x_{ij}\|W\|/s \in \big((1-\theta)\max_{ij} a_{ij}^2,\ (1+\theta)\max_{ij} a_{ij}^2\big)$.
Proof. We split all pairs ij into two sets: the pairs $i'j'$ with $a_{i'j'}^2 < \tau$, and the remaining pairs $i^*j^*$ with $a_{i^*j^*}^2 \ge \tau$. We first prove that the algorithm will not output $x_{i'j'}$ for any pair $i'j'$. For every $i'j'$ such that $a_{i'j'}^2 < \tau$: by Lemma 16 with $\lambda = 1$, $\lambda' = \frac{3}{2}$, since $c \ge 12\theta^2$ (i.e., $\frac{c(\lambda'-\lambda)^2}{\lambda\theta^2} \ge 3$), we have $\max_{i'j'} x_{i'j'}\|W\|/s < \frac{3}{2}\tau$ with probability at least $1 - 1/n^{c-2}$. Because $\max_{ij} a_{ij}^2 > \tau_k = 3\tau$: by Lemma 17 with $\lambda = 3$, $\lambda' = \frac{3}{2}$, since $c \ge 2$ (so that $c\lambda \ge 3$), we have $\max_{ij} x_{ij}\|W\|/s \ge \frac{3}{2}\tau$ with probability at least $1 - 1/n^{c-2}$. Then, with probability at least $1 - 2/n^{c-2}$, Algorithm 2 will not output $x_{i'j'}$ for any pair $i'j'$. For the remaining pairs $i^*j^*$ with $a_{i^*j^*}^2 \ge \tau$: by Lemma 15, $x_{i^*j^*}\|W\|/s \in \big((1-\theta)a_{i^*j^*}^2, (1+\theta)a_{i^*j^*}^2\big)$. Applying the union bound over all pairs $i^*j^*$ and setting $c \ge \log_n(1/(2\delta)) + 2$, the lemma follows.
Theorem 5. (Theorem 2) $\tilde{S}_{\triangle,\beta}(G)$ output by Algorithm 2 is a $(\gamma, \delta)$-upper approximation to $S^*_{\triangle,\beta}(G)$.
Proof. From the result of Lemma 19, we have
$$(1-\theta)\max_{ij} a_{ij}^2 \le \max_{ij} x_{ij}\frac{\|W\|}{s} \le (1+\theta)\max_{ij} a_{ij}^2.$$
Multiplying all terms by $1/(1-\theta)$, we have
$$\max_{ij} a_{ij}^2 \le \max_{ij} x_{ij}\frac{\|W\|}{s(1-\theta)} \le \frac{1+\theta}{1-\theta}\max_{ij} a_{ij}^2, \quad\text{i.e.,}\quad \max_{ij} a_{ij}^2 \le \hat{a}^2 \le \frac{1+\theta}{1-\theta}\max_{ij} a_{ij}^2.$$
Since $\theta$ is chosen so that $\frac{1+\theta}{1-\theta} \le e^{2\gamma}$, taking square roots gives
$$\max_{ij} a_{ij} \le \hat{a} \le e^{\gamma}\max_{ij} a_{ij}.$$
Therefore, with probability at least $1 - \delta$, we have $LS^{(t)}_{\triangle}(G) \le \widehat{LS}^{(t)}_{\triangle}(G) \le e^{\gamma}LS^{(t)}_{\triangle}(G)$, and hence $S^*_{\triangle,\beta}(G) \le \tilde{S}_{\triangle,\beta}(G) \le e^{\gamma}S^*_{\triangle,\beta}(G)$, with probability at least $1 - \delta$. It follows that $\tilde{S}_{\triangle,\beta}(G)$ is a $(\gamma, \delta)$-upper approximation to $S^*_{\triangle,\beta}(G)$.
Theorem 6. (Theorem 2) Suppose $\max_{ij} a_{ij} > 0$. Algorithm 2 has a runtime of
$$O\Bigg(\frac{\log_n(1/(2\delta))\,\|W\|\log m\log n}{\big(\frac{e^{2\gamma}-1}{e^{2\gamma}+1}\big)^2\max_{ij} a_{ij}^2} + m + \frac{\log n}{\beta} + n\Bigg).$$
Proof. Let m denote the number of edges in G. Computing W and $\|W\|$ takes O(m) time, and each sampling step requires $O(\log m)$ time; we perform the computation of W and $\|W\|$ once and reuse them across Algorithms 4, 1, and 2. Each sampling step in Algorithm 4 takes $O(\log m)$ time and there are s iterations, resulting in a total cost of $O(s\log m)$. Algorithm 1 has a runtime of $\sum_k O(s_k\log m)$, where $s_k = O(\|W\| c\log n/(\theta^2\tau_k))$, as it executes Algorithm 4 with varying s over k search steps. Since at the end of the search $\tau_k > \max_{ij} a_{ij}^2/4$ w.h.p., we know that $k = O\big(\log\frac{4n^2}{\max_{ij} a_{ij}^2}\big)$. In the first step, we start with $\tau_0 = n^2$ and reduce $\tau$ by a factor of $3/4$ after each step. The total running time is therefore
$$O\Big(\|W\|\log m\,\frac{c\log n}{\theta^2}\cdot\frac{1}{n^2}\big(1 + (3/4)^{-1} + \cdots + (3/4)^{-k}\big)\Big) = O\Big(\frac{\|W\| c\log m\log n}{\theta^2\max_{ij} a_{ij}^2}\Big).$$
Finally, Algorithm 2 has a runtime of $O\Big(\frac{\log_n(1/(2\delta))\|W\|\log m\log n}{((e^{2\gamma}-1)/(e^{2\gamma}+1))^2\max_{ij} a_{ij}^2} + m + \frac{\log n}{\beta} + n\Big)$, taking into account the time required to compute W (which is O(m)) and $S_{\triangle,\beta}(G)$ (which is $O(\log(n)/\beta + n)$), and substituting the values of c and $\theta$ as described in the algorithm.
D Details from Section 5
As mentioned earlier, Karwa et al. [23] introduced the notion of the local sensitivity of the local sensitivity for private counting of k-cliques. In this work, we generalize this notion to multi-level local sensitivity. Recall that f(G, S) denotes the number of embeddings of a fixed subgraph H in G that contain all pairs in S, and $f^{(k)}(G) = \max_{|S|=k} f(G, S)$. The following lemma is at the core of our approach, as it relates local sensitivity to subgraph counting.
Lemma 20. For all $k \ge 0$, we have $LS(f^{(k)}) \le f^{(k+1)}(G)$.
Proof.
When $k = 0$, we have $LS(f^{(0)}) = LS(f) = \max_{\{i,j\}\in L}|f(G\cup(i,j)) - f(G\setminus(i,j))| = \max_{\{i,j\}\in L} f(G, \{i,j\}) = f^{(1)}(G)$.
When $k \ge 1$, we have $LS(f^{(k)}) = \max_{\{i,j\}}|f^{(k)}(G\cup(i,j)) - f^{(k)}(G\setminus(i,j))|$. We argue below that for each $\{i,j\}\in L$, we have $|f^{(k)}(G\cup(i,j)) - f^{(k)}(G\setminus(i,j))| \le f^{(k+1)}(G)$, and the statement follows. We have two cases.
Case 1: $(i,j)\notin G$. Let $S\subseteq L$, $|S| = k$, be a subset such that $f^{(k)}(G\cup(i,j)) = f(G\cup(i,j), S)$. Let $T\subseteq L$, $|T| = k$, be a subset such that $f^{(k)}(G) = f(G, T)$. By definition of T, we have $f(G, S) \le f(G, T)$. We have two sub-cases, depending on whether $\{i,j\}\in S$ or $\{i,j\}\notin S$.
First, suppose $\{i,j\}\notin S$. This implies $f(G\cup(i,j), S) = f(G, S) + f(G, S\cup\{i,j\})$, and
$$|f^{(k)}(G\cup(i,j)) - f^{(k)}(G)| = |f(G, S) + f(G, S\cup\{i,j\}) - f(G, T)| \le f(G, S\cup\{i,j\}) \le f^{(k+1)}(G).$$
Next, suppose $\{i,j\}\in S$. Then $f(G, S) = f(G\cup(i,j), S) \ge f(G\cup(i,j), T)$, from the definition of S. Since $f(\cdot, T)$ is monotone with respect to the set of edges in the graph, we have $f(G\cup(i,j), T) \ge f(G, T)$, which implies $f(G, S) \ge f(G, T)$. Combined with the earlier observation that $f(G, T) \ge f(G, S)$, we have $f(G, T) = f(G, S)$. This implies
$$|f^{(k)}(G\cup(i,j)) - f^{(k)}(G)| = |f(G\cup(i,j), S) - f(G, T)| = |f(G, S) - f(G, T)| = 0 \le f^{(k+1)}(G).$$
Thus, when $(i,j)\notin G$, in either sub-case, we have $|f^{(k)}(G\cup(i,j)) - f^{(k)}(G)| \le f^{(k+1)}(G)$.
Case 2: $(i,j)\in G$. We have $|f^{(k)}(G\cup(i,j)) - f^{(k)}(G\setminus(i,j))| = |f^{(k)}(G) - f^{(k)}(G\setminus(i,j))| = |f^{(k)}(G'\cup(i,j)) - f^{(k)}(G')|$, where $G' = G\setminus(i,j)$. Repeating the argument of Case 1 for the graph $G'$, we have $|f^{(k)}(G'\cup(i,j)) - f^{(k)}(G')| \le f^{(k+1)}(G') \le f^{(k+1)}(G)$.
Lemma 21. (Lemma 4.4 of [23]) Let B be an $(\epsilon_1, \delta_1)$-differentially private algorithm such that $\Pr[B(x) \ge LS_f(x)] > 1 - \delta_2$ for all x. Consider the algorithm A that runs B(x) to obtain an estimate $\widetilde{LS}$ of the local sensitivity and releases both $\widetilde{LS}$ and a noisy estimate of f: $A(x) = \big(\widetilde{LS},\ f(x) + \mathrm{Lap}(\widetilde{LS}/\epsilon_2)\big)$, where $\widetilde{LS} = B(x)$, and where $\mathrm{Lap}(\lambda)$ is a Laplace random variable with mean 0 and scale parameter $\lambda$. Then A is $(\epsilon_1 + \epsilon_2, \delta_1 + e^{\epsilon_1}\delta_2)$-DP.
Algorithm 5 Estimating higher-order private local sensitivity.
Input: G, $\epsilon$, $\delta$, subgraph H with $\ell$ edges
Output: Private estimate $g^{(0)}$ for f(G)
1: Let $k_m = \ell$; set $g^{(k_m)}(G) := f^{(k_m)}(G)$ (note that $f^{(k_m)}(G) \le 1$)
2: for $k = k_m - 1$ down to 0 do
3:   $g^{(k)}(G) = f^{(k)}(G) + g^{(k+1)}(G)\frac{\ln(1/\delta_2)}{\epsilon_2} + \mathrm{Lap}\big(g^{(k+1)}(G)/\epsilon_2\big)$
4: end for
5: return $g^{(0)}(G)$
We now give the proof of correctness for Algorithm 5 using Theorem 3.
Proof of Theorem 3
Proof. We prove by induction on $k \ge 1$ that $g^{(k_m-k)}(G)$ computed by Algorithm 5 is a $\big((k+1)\epsilon_2, \delta_2 + (k+1)e^{\epsilon_2}\delta_2\big)$-DP estimate, and $\Pr[g^{(k_m-k)}(G) \ge f^{(k_m-k)}(G)] \ge 1 - \delta_2$. The base case is $k = 1$. Note that for $k_m = \ell$, $f^{(k_m)} \le 1$, since once $\ell$ edges are fixed, either we get a fixed subgraph or it does not result in an embedding. Since $LS(f^{(k_m-1)}) \le f^{(k_m)}$, it follows that $g^{(k_m-1)}$ is an $(\epsilon_2, 0)$-DP estimate. Further, $\Pr[g^{(k_m-1)} < f^{(k_m-1)}] = \Pr\big[\mathrm{Lap}(f^{(k_m)}/\epsilon_2) < -f^{(k_m)}\ln(1/\delta_2)/\epsilon_2\big] < \delta_2$, which proves the base case.
Next, consider $k > 1$. From Lemma 20, $LS(f^{(k_m-k)}) \le f^{(k_m-k+1)}$. Since $g^{(k_m-k+1)}$ is a $(k\epsilon_2, \delta_2 + ke^{\epsilon_2}\delta_2)$-DP estimate, and $\Pr[g^{(k_m-k+1)} \ge f^{(k_m-k+1)}] > 1 - \delta_2$ (by induction), it follows from Lemma 21 that $g^{(k_m-k)}$ is a $(k\epsilon_2 + \epsilon_2, \delta_2 + ke^{\epsilon_2}\delta_2 + e^{\epsilon_2}\delta_2)$-DP estimate. Also, $\Pr[g^{(k_m-k)} \ge f^{(k_m-k)}] = \Pr\big[\mathrm{Lap}(g^{(k_m-k+1)}/\epsilon_2) \ge -g^{(k_m-k+1)}\ln(1/\delta_2)/\epsilon_2\big] > 1 - \delta_2$, and the inductive step follows.
Finally, we analyze the running time of Algorithm 5 using Lemma 6.
Proof of Lemma 6
Proof. Observe that f(G, S) can be computed in time $O(n^{2(\ell-|S|)})$, so that $f^{(k)}$ can be computed in time $O((n^2)^{\ell})$. Therefore, the total running time for Algorithm 5 is $O(n^{2\ell})$.
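To make the cascade in Algorithm 5 concrete, here is a minimal Python sketch. The oracle `f_order(k)`, returning $f^{(k)}(G)$, is a hypothetical interface assumed for illustration (computing it is the $O(n^{2\ell})$ step analyzed above).

```python
import math
import numpy as np

def algorithm5(f_order, ell, eps2, delta2, rng=None):
    """Sketch of Algorithm 5: cascade of noisy, upward-shifted estimates.

    f_order(k) -- oracle returning f^(k)(G) = max_{|S|=k} f(G, S)
    ell        -- number of edges of the subgraph H
    """
    rng = rng or np.random.default_rng()
    g = f_order(ell)                     # f^(km)(G) <= 1 once all ell edges are fixed
    for k in range(ell - 1, -1, -1):
        # Shift upward by g^(k+1) * ln(1/delta2)/eps2 so that, except with
        # probability delta2, the new estimate upper-bounds f^(k)(G).
        g = (f_order(k)
             + g * math.log(1.0 / delta2) / eps2
             + rng.laplace(scale=g / eps2))
        g = max(g, 1e-12)                # guard: keep the Laplace scale positive
                                         # in the (low-probability) lower-tail event
    return g                             # private estimate g^(0) of f(G)
```

Each level uses the previous (private) level as the noise scale, which is exactly why the upward shift is needed: it ensures $g^{(k+1)} \ge LS(f^{(k)})$ with high probability, the precondition of Lemma 21.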
Improved time for paths of length $\ell$. We show that for paths with $\ell$ edges, the running time can actually be improved from $O(n^{2\ell})$ to $O(n^{\ell+1})$ by keeping track of slightly more complicated information. For a subset $S\subseteq L$ of pairs of nodes, let h(S) denote the set of unique nodes in S. Let $\pi$ be a function such that, for each $u\in h(S)$, $\pi(u)$ specifies the position node u has in the embedding of the path. Let $f(G, S, \pi)$ be the number of paths containing the edges in S, with the additional constraint that the ordering with respect to $\pi$ is satisfied. Then we have $f^{(k)}(G) = \max_{|S|=k,\pi} f(G, S, \pi)$.
Proof. (Sketch) Observe that $f(G, S, \pi)$ can be computed in time $O(n^{\ell+1-|h(S)|})$, since this involves considering the nodes in the remaining positions of the path, other than those specified by $\pi$. Further, $f^{(k)}(G)$ can be computed in $O(n^{\ell+1})$ time, since we can first guess the set h(S) and the ordering $\pi$ (instead of directly guessing $|S|$ pairs of nodes), and then guess the remaining nodes.
Theorem 7. Let B be an $(\epsilon_1, \delta_1)$-differentially private algorithm such that $\Pr[B(x) \ge LS(f, x)] > 1 - \delta_2$ for all x. Let $\hat{f}(x)\in[(1-\epsilon_3)f(x), (1+\epsilon_3)f(x)]$ be an approximation to f(x), with $\epsilon_3$ small enough that $\epsilon_3 f(x) \le LS(f, x)$ for all x. Consider the algorithm A that runs B(x) and releases $A(x) = \big(\widetilde{LS},\ \hat{f}(x) + \mathrm{Lap}(\widetilde{LS}/\epsilon_2)\big)$, where $\widetilde{LS} = B(x)$, and where $\mathrm{Lap}(\lambda)$ is a Laplace random variable with mean 0 and scale parameter $\lambda$. Then A is $\big(\epsilon_1 + (3+\epsilon_3)\epsilon_2,\ e^{\epsilon_1}(\delta_1+\delta_2) + \delta_2\big)$-DP.
Proof. The proof is a modification of Lemma 4.4 of [23]. Define the following random tuples:
$$A(x) = \big(B(x),\ \hat{f}(x) + \mathrm{Lap}(\widetilde{LS}(f,x)/\epsilon_2)\big), \quad A(y) = \big(B(y),\ \hat{f}(y) + \mathrm{Lap}(\widetilde{LS}(f,y)/\epsilon_2)\big), \quad A_{mix} = \big(B(x),\ \hat{f}(y) + \mathrm{Lap}(\widetilde{LS}(f,x)/\epsilon_2)\big).$$
Let $P_x$, $P_y$, and $P_{mix}$ denote the corresponding distributions. We first compare $P_y(S)$ and $P_{mix}(S)$ for any subset S; the two differ only in the first component. Since B is $(\epsilon_1, \delta_1)$-DP, we have $\Pr[B(y)\in S] \le e^{\epsilon_1}\Pr[B(x)\in S] + \delta_1$, which implies $P_y(S) \le e^{\epsilon_1}P_{mix}(S) + \delta_1$.
Next, we compare $P_{mix}(S)$ and $P_x(S)$ for any subset S. Let F denote the event $B(x) = \widetilde{LS}(f,x) \ge LS(f,x)$. Then
$$\frac{P_x(S\,|\,F)}{P_{mix}(S\,|\,F)} = \frac{\Pr[\mathrm{Lap}(z/\epsilon_2)\in S - \hat{f}(x)]}{\Pr[\mathrm{Lap}(z/\epsilon_2)\in S - \hat{f}(y)]} \ge \exp\Big(-\frac{|\hat{f}(y) - \hat{f}(x)|}{z}\epsilon_2\Big), \quad\text{where } z = \widetilde{LS}(f,x).$$
Under event F, we can bound the fraction
$$\frac{|\hat{f}(y) - \hat{f}(x)|}{z} \le \frac{1}{z}\max\big\{|(1+\epsilon_3)f(y) - (1-\epsilon_3)f(x)|,\ |(1+\epsilon_3)f(x) - (1-\epsilon_3)f(y)|\big\}$$
$$= \frac{1}{z}\max\big\{|(1+\epsilon_3)(f(y)-f(x)) + 2\epsilon_3 f(x)|,\ |(1-\epsilon_3)(f(x)-f(y)) + 2\epsilon_3 f(x)|\big\} \le \frac{(1+\epsilon_3)z + 2\epsilon_3 f(x)}{z} \le 3 + \epsilon_3,$$
where the second inequality holds because $|f(x) - f(y)| \le LS(f,x) \le \widetilde{LS}(f,x) = z$ under F, and the last because $\epsilon_3 f(x) \le LS(f,x) \le z$. This implies $\frac{P_x(S|F)}{P_{mix}(S|F)} \ge e^{-(3+\epsilon_3)\epsilon_2}$, or equivalently, $P_{mix}(S|F) \le e^{(3+\epsilon_3)\epsilon_2}P_x(S|F)$. Therefore,
$$P_{mix}(S) = P_{mix}(S\cap F) + P_{mix}(S\cap\bar{F}) \le e^{(3+\epsilon_3)\epsilon_2}P_x(S) + \delta_1 + \delta_2.$$
Putting everything together, $P_y(S) \le e^{\epsilon_1}P_{mix}(S) + \delta_1 \le e^{\epsilon_1 + (3+\epsilon_3)\epsilon_2}P_x(S) + e^{\epsilon_1}(\delta_1 + \delta_2) + \delta_2$.
E Estimating higher-order local sensitivity for path counts
Let H denote the path of length $\ell$. We assume the graph G has degeneracy d and treewidth t. We show that $f^{(k)}(G) = \max_{|S|=k} f_H(G, S)$ can be computed in time $O(\mathrm{poly}(n)(dt)^k)$ under these assumptions on G (i.e., in time smaller than $O(n^{2\ell})$). Let $\sigma$ denote a degeneracy ordering of the nodes in V, so that node v has index $\sigma(v)$. Let $N_{\sigma}(v) = \{v' : v'\in N(v),\ \sigma(v') > \sigma(v)\}$ be the set of neighbors of v with index larger than $\sigma(v)$; by definition of degeneracy, it follows that $|N_{\sigma}(v)| \le d$ for all v. Let v, u be two nodes such that $\sigma(v) < \sigma(u)$. Then $\mathrm{dist}(v, u, \sigma, K)$ denotes the length of the shortest path from v to u in a graph K, when restricted to nodes $v'$ with $\sigma(v) < \sigma(v') < \sigma(u)$.
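For reference, a degeneracy ordering $\sigma$ (used throughout this section) can be computed by repeatedly removing a minimum-degree vertex. The following is a minimal Python sketch, assuming the graph is given as an adjacency-list dictionary; it uses a lazy min-heap, giving an $O(m\log n)$ implementation of the standard linear-time core-decomposition idea.

```python
import heapq

def degeneracy_ordering(adj):
    """Compute a degeneracy ordering of a graph given as {v: set(neighbors)}.

    The returned list sigma satisfies |N_sigma(v)| <= d for every v,
    where d is the degeneracy of the graph."""
    deg = {v: len(ns) for v, ns in adj.items()}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    removed, sigma = set(), []
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue                      # stale heap entry; skip it
        sigma.append(v)
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
    return sigma
```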
Let $G(\hat{E})$ denote the graph induced by $E\cup\hat{E}$. Let $N_{\sigma}(v, k, \hat{E}) = \{u : \mathrm{dist}(v, u, \sigma, G(\hat{E})) \le k\}$ be the set of nodes within k hops of v in the graph $G(\hat{E})$, restricted to nodes of higher index than v. Let $G_{\sigma}(v, k, \hat{E})$ denote the graph induced by the subset $N_{\sigma}(v, k, \hat{E})$ of nodes. We say a pair $e = (u', v')$ is an internal anchor pair in $G_{\sigma}(v, k, \hat{E})$ if $u', v'\in N_{\sigma}(v, k, \hat{E})$ and $(u', v')\notin E(G_{\sigma}(v, k, \hat{E}))$ (i.e., $(u', v')$ is a newly added edge); otherwise, we say $(u', v')$ is an external anchor pair with respect to $G_{\sigma}(v, k, \hat{E})$. We add internal anchor pairs sequentially: let $e_i$ be the edge added in the i-th step, and let $F^{(i)} = \{e_1, \ldots, e_i\}$.
k-compact subgraph. We define (H, D) to be a k-compact subgraph if $H = G_{\sigma}(v, k, F^{(i)})$ for some node v and subset $F^{(i)} = \{e_1, \ldots, e_i\}$, $i \le k$, such that the pair $e_1 = (u_1, v_1)$ is an internal anchor pair in $G_{\sigma}(v, k, \emptyset)$ and, for each $j = 2, \ldots, i$, the pair $e_j = (u_j, v_j)$ is an internal anchor pair in the graph $G_{\sigma}(v, k, F^{(j-1)} = \{e_1, \ldots, e_{j-1}\})$. D is a subset of vertices in H, referred to as the set of connectors.
Figure 6: (Left) $D = \{r_1, r_2, r_3, r_4\}$; edges $(r_1, r'_1), (r_2, r'_2), (r_3, r'_3)$, and $(r_4, r'_4)$ are external anchor edges, while $(r_5, r'_5)$ is an internal anchor edge. Nodes s and t are the starting and ending nodes of the path. (Middle) $D = \{r_1, r_2, r_3\}$, and $(r_1, r'_1), (r_2, r'_2), (r_3, r'_3)$ are external anchor edges, while $(r_5, r'_5)$ is an internal anchor edge. Node s is the starting node of the path, and the path ends at some other node, not in this compact subgraph. (Right) $D = \{r_1, r_2, r_3, r_4\}$; edges $(r_1, r'_1), (r_2, r'_2), (r_3, r'_3)$, and $(r_4, r'_4)$ are external anchor edges, while $(r_5, r'_5)$ and $(r_6, r'_6)$ are internal anchor edges. Paths pass through this compact subgraph without starting or ending in it.
Lemma 22. Let (H, D) be a k-compact subgraph. Then H has $O(kd^k)$ nodes and edges, and the tuple (H, D) can be described using $O(k^2 d^{k+1}\log d)$ bits.
Proof. (Sketch) We prove by induction on i that $G_{\sigma}(v, k, F^{(i)})$ has at most $(i+1)d^k$ nodes and edges. First observe that $N_{\sigma}(v, k, \emptyset)$ has at most $d^k$ nodes and edges, since they all have index greater than $\sigma(v)$. We have $i = 1$ as the base case: let $\sigma(u_1) < \sigma(v_1)$; then $G_{\sigma}(v, k, \{e_1\})$ has at most $d^k$ additional nodes reachable through $v_1$, i.e., at most $2d^k$ nodes in total. In the inductive step, $G_{\sigma}(v, k, F^{(i)} = \{e_1, \ldots, e_i\})$ has at most $d^k$ nodes (reachable through $v_i$, assuming $\sigma(u_i) < \sigma(v_i)$) in addition to those in $G_{\sigma}(v, k, F^{(i-1)})$. The neighborhood of each node $u\in N_{\sigma}(v, k, F^{(i)})$ can be represented in $O(d\log(kd^k))$ bits; therefore, the total number of bits needed to specify H is $O(kd^k\cdot d\log(kd^k)) = O(kd^{k+1}\log(kd^k))$. D consists of at most k nodes from $N_{\sigma}(v, k, F^{(i)})$; therefore, D can be specified in $O(k\log(kd^k))$ bits. Hence (H, D) can be specified in $O(kd^{k+1}\log(kd^k))$ bits, which implies the number of distinct (H, D) pairs is at most $\exp(kd^{k+1}\log(kd^k))$.
Pairing of connector set. The connector set $D = \{r_1, \ldots, r_{\ell}\}$, with $\ell \le k$, represents nodes which have incident edges outside the compact subgraph H. We might have to consider three kinds of paths, corresponding to Figure 6:
1. As in Figure 6 (Left), these start at a node s in H, then pass through multiple anchor edges, and end at a node t in H. We partition D into two singletons and multiple pairs, $\pi_2(D) = \{(r_{\pi(1)}), (r_{\pi(2)}, r_{\pi(3)}), \ldots, (r_{\pi(\ell)})\}$. This corresponds to paths which leave H at node $r_{\pi(1)}$ and finally return at node $r_{\pi(\ell)}$, with disjoint segments from $r_{\pi(2j)}$ to $r_{\pi(2j+1)}$ lying within H, for each j.
We refer to the set of such pairings as $\Pi_2(D)$.
2. This corresponds to paths which start at a node s in H, then pass through multiple anchor edges, and do not eventually return to a node in H (as in Figure 6 (Middle)). We partition D into one singleton and multiple pairs, $\pi_1(D) = \{(r_{\pi(1)}), (r_{\pi(2)}, r_{\pi(3)}), \ldots, (r_{\pi(\ell-1)}, r_{\pi(\ell)})\}$. We refer to the set of such pairings as $\Pi_1(D)$.
3. This corresponds to paths which only pass through nodes in H (as in Figure 6 (Right)). We partition D into multiple pairs, $\pi_0(D) = \{(r_{\pi(1)}, r_{\pi(2)}), \ldots, (r_{\pi(\ell-1)}, r_{\pi(\ell)})\}$. We refer to the set of such pairings as $\Pi_0(D)$.
Approximate counts corresponding to pairings. Note that for a given compact subgraph (H, D), there can be multiple pairings in $\Pi_0(D), \Pi_1(D), \Pi_2(D)$. For each such pairing, e.g., $\pi\in\Pi_0(D) = \{(r_{\pi(1)}, r_{\pi(2)}), \ldots, (r_{\pi(\ell-1)}, r_{\pi(\ell)})\}$, we can guess a tuple of numbers $\mathrm{Num}(\pi, \hat{E}) = \big((\mathrm{Num}(r_{\pi(2j+1)}, r_{\pi(2j+2)}))_{j=0,1,\ldots}\big)$, where $\mathrm{Num}(r_{\pi(2j+1)}, r_{\pi(2j+2)})$ corresponds to the number of segments from $r_{\pi(2j+1)}$ to $r_{\pi(2j+2)}$ which lie within H and use some edges from $\hat{E}$ as internal anchor edges. The path segments counted in $\mathrm{Num}(r_{\pi(2j+1)}, r_{\pi(2j+2)})$ and those in $\mathrm{Num}(r_{\pi(2j'+1)}, r_{\pi(2j'+2)})$ are disjoint; in particular, this means that the internal anchor edges used in these counts are disjoint. The paths within a tuple $\big((\mathrm{Num}(r_{\pi(2j+1)}, r_{\pi(2j+2)}))_{j=0,1,\ldots}\big)$ contain all the internal anchor edges in $\hat{E}$. In order to reduce the information complexity, we only keep track of $\mathrm{Num}(r_{\pi(2j+1)}, r_{\pi(2j+2)})$ approximately, as powers of $(1+\mu)$ for a small $\mu\in(0,1)$; we denote these approximate counts by $\widehat{\mathrm{Num}}(r_{\pi(2j+1)}, r_{\pi(2j+2)})$, and the corresponding tuple of counts by $(\widehat{\mathrm{Num}}(r_{\pi(1)}, r_{\pi(2)}), \ldots, \widehat{\mathrm{Num}}(r_{\pi(\ell-1)}, r_{\pi(\ell)}))$. $\widehat{\mathrm{Num}}(\pi, \hat{E})$ denotes the set of approximate path counts which use the internal anchor edges $\hat{E}$. Similarly, approximate counts $\widehat{\mathrm{Num}}(\pi, \hat{E})$ can be defined for $\pi\in\Pi_1(D)$ and $\pi\in\Pi_2(D)$. Let $\Pi(D) = \Pi_0(D)\cup\Pi_1(D)\cup\Pi_2(D)$. Let $IA(H)$ denote the set of internal anchor edges in a compact subgraph H. We refer to a tuple $\big(H, D, \Pi(D), \bigcup_{\pi\in\Pi(D)}\widehat{\mathrm{Num}}(\pi, IA(H))\big)$ as an augmented compact subgraph with approximate counts.
Feasible solutions as a tuple of augmented compact subgraphs with approximate counts. We now observe that any feasible solution involving anchoring a subset $S = \{(i_r, j_r) : r = 1, \ldots, s\}$ of pairs of nodes can be viewed as a set of at most k disjoint augmented compact subgraphs with approximate counts.
Lemma 23. Any feasible solution $S = \{(i_r, j_r) : r = 1, \ldots, s\}$, with $|S| \le k$, corresponds to a set of augmented compact subgraphs with approximate counts $\big(H_1, D_1, \Pi(D_1), \bigcup_{\pi\in\Pi(D_1)}\widehat{\mathrm{Num}}(\pi, IA(H_1))\big), \ldots, \big(H_{\ell}, D_{\ell}, \Pi(D_{\ell}), \bigcup_{\pi\in\Pi(D_{\ell})}\widehat{\mathrm{Num}}(\pi, IA(H_{\ell}))\big)$, for some $\ell \le k$, such that for every $j < j'$, $(H_j, D_j)$ consists of nodes with lower index than those in $(H_{j'}, D_{j'})$.
Proof. (Sketch) Consider the set $\mathcal{P}$ of all paths which are considered, i.e., those which use the edges from S. Let v be the node with the minimum index among those in $\mathcal{P}$. Consider the graph $G_{\sigma}(v, k, \emptyset)$. If any pair $(i_s, j_s)\in S$ is an internal anchor pair for $G_{\sigma}(v, k, \emptyset)$, define $e_1 = (i_s, j_s)$ (this is added to $IA(H_1)$); this is repeated if there is another pair $(i_{s'}, j_{s'})$ which is an internal anchor pair for $G_{\sigma}(v, k, \{e_1\})$. We stop when we have a subgraph $H_1$ such that no other pair in S is an internal anchor pair for $H_1$. If there is a path $P\in\mathcal{P}$ with one endpoint $u\in V(H_1)$ and the other endpoint outside $H_1$, u is added to $D_1$. Finally, we construct pairings and associated approximate counts.
If a path P has a segment $P'$ from $r_1\in D_1$ to $r_2\in D_1$ which lies within $H_1$, we would have $(r_1, r_2)$ as part of a pairing, and the number of such segments contributes to the associated counts. Thus a compact subgraph $\big(H_1, D_1, \Pi(D_1), \bigcup_{\pi\in\Pi(D_1)}\widehat{\mathrm{Num}}(\pi, IA(H_1))\big)$ can be constructed corresponding to the set $\mathcal{P}$. Next, consider the path segments we get once we delete all the segments in $H_1$; the next compact subgraph is constructed in a similar manner.
Let $S(G, \sigma, k)$ denote the set of possible augmented compact subgraphs with approximate counts. From Lemma 23, it follows that any feasible solution is an element of $\bigcup_{k'\le k} S(G, \sigma, k)^{k'}$.
Lemma 24. The number of feasible solutions is $O\big(\exp(k^2 d^{k+1}\log(kd^k))\,(k\log n)^{k^2}\big)$.
Proof. (Sketch) We first bound $|S(G, \sigma, k)|$. For each compact subgraph (H, D), the number of pairings $|\Pi(D)|$ is at most $k^k$. For each pairing $\pi\in\Pi(D)$, there are $O(\log n)^k$ possible approximate counts. Therefore, $|S(G, \sigma, k)| = O\big(\exp(kd^{k+1}\log(kd^k))(k\log n)^k\big)$, using Lemma 22. The lemma follows since the number of feasible solutions is at most $k|S(G, \sigma, k)|^k$.
Finding a tuple of disjoint augmented compact subgraphs with approximate counts in G. We now show that we can check in polynomial time whether an augmented compact subgraph exists. Thus, in time $O(|S(G, \sigma, k)|\,\mathrm{poly}(n))$, we can check which augmented compact subgraphs exist. For a given augmented compact subgraph, the total number of paths can be approximated, which allows us to find the one with the maximum number.
Lemma 25. Suppose we are given a tuple $M_1, \ldots, M_{\ell}$, where $M_j = \big(H_j, D_j, \Pi(D_j), \{\widehat{\mathrm{Num}}(\pi), \pi\in\Pi(D_j)\}\big)$, for some $\ell \le k$, such that for every $j < j'$, $(H_j, D_j)$ consists of nodes with lower index than those in $(H_{j'}, D_{j'})$. It is possible to find a set of nodes $v_1, \ldots, v_{\ell}$, a sequence of edge sets $F^{(1)}, \ldots, F^{(\ell)}$, connector sets $\hat{D}_1, \ldots, \hat{D}_{\ell}$, and pairings $\Pi(\hat{D}_1), \ldots, \Pi(\hat{D}_{\ell})$, such that for each j, $H_j = G_{\sigma}(v_j, k, F^{(j)})$, $\hat{D}_j\subseteq V(G_{\sigma}(v_j, k, F^{(j)}))$, and $\bigcup_{\pi\in\Pi(\hat{D}_j)}\widehat{\mathrm{Num}}(\pi, IA(H_j)) = \{\widehat{\mathrm{Num}}(\pi), \pi\in\Pi(D_j)\}$, in time $O\big(\mathrm{poly}(n)\exp(k^2 d^{k+1}\log(kd^k))(k\log n)^{k^2}(kd^k)^t\big)$, if G has degeneracy d and treewidth t.
Proof. (Sketch) We use $c_j$ to denote a color corresponding to $M_j$. For each node v, it is possible to decide in time $O(kd^k)$ whether v can be colored by $c_j$, i.e., whether there exist $F^{(j)}, \hat{D}_j$ such that $\big(G_{\sigma}(v, k, F^{(j)}), \hat{D}_j, \bigcup_{\pi\in\Pi(\hat{D}_j)}\widehat{\mathrm{Num}}(\pi, IA(H_j))\big) = M_j$. If v has color $c_j$, it has conflicts with $C(v, c_j) = V(G_{\sigma}(v, k, F^{(j)}))$, i.e., we cannot have a node $u\in C(v, c_j)$ assigned color $c_i$. Our goal is to find nodes $v_1, \ldots, v_{\ell}$ such that $v_j$ can be assigned color $c_j$, and $v_{j'}\notin C(v_j, c_j)$ for any $j'\ne j$. If G has treewidth t, we can determine whether such a coloring exists in time $O(\mathrm{poly}(n)(kd^k)^t)$. The main idea of the dynamic program is the following. Let $\mathcal{T} = (\mathcal{V}, \mathcal{E})$ denote a tree decomposition of G; each node $U\in\mathcal{V}$ is a subset of V. We construct an information set I(U) for each $U\in\mathcal{V}$, which will allow us to determine possible colors for each node $v\in U$, in the following manner. I(U) consists of a set of possible colorings x of U. We consider only colorings x in which each color appears at most once; x can be viewed as a coloring of the subgraph induced by all the nodes in the subtree of $\mathcal{T}$ rooted at U. A coloring x specifies, for each node $u\in U$ and each j, one of the following: (1) node u is colored by $c_j$; (2) node u is not colored by $c_j$, but color $c_j$ can be assigned to it; (3) node u is not colored by $c_j$ and, moreover, $c_j$ cannot be assigned to u. The last case arises because there exists some other node $u'$ such that $u\in C(u', c_j)$.
We do not keep track of the node $u'$ itself (otherwise we would end up needing $n^k$ information states); instead, we record the set F and the position of $u'$ in the compact graph $G_{\sigma}(u', k, F)$. The information set I(U) consists of all possible such colorings x; note that $|I(U)| = O((kd^k)^t)$. We construct the information sets of all $U\in\mathcal{V}$ in a bottom-up manner. Suppose U has children $U_1, U_2\in\mathcal{V}$. We construct I(U) from $I(U_1)$ and $I(U_2)$ as described below. For each $x'\in I(U_1)$, $x''\in I(U_2)$, we construct a coloring x, if feasible, in the following manner. For each node $u\in U$, and for each j:
1. If $u\in U_1\cap U_2$, x is infeasible if the indicators for u in $x'$ and $x''$ are inconsistent.
2. If u is contained only in $U_1$, the indicator for u in x is kept the same as that in $x'$; similarly if u is contained only in $U_2$.
3. If $u\notin U_1\cup U_2$, we check whether $N(u)\cap(U_1\cup U_2)\ne\emptyset$. In this case, we choose an indicator for u in x that is consistent with the indicators for its neighbors in $x'$ and $x''$. It is also possible that we have inconsistencies, in which case x will not be feasible. If a node $u_1\in N(u)\cap U_1$ is in the conflict set of some node with color $c_j$, then using the position of $u_1$ in the associated compact subgraph, we can determine whether u would also have a conflict, and if so, at what position; similarly if there exists a node $u_2\in N(u)\cap U_2$. If there simultaneously exist $u_1\in N(u)\cap U_1$ and $u_2\in N(u)\cap U_2$, both of which are in conflict sets with respect to $c_j$, and they induce inconsistent conflicts for node u with respect to $c_j$, we consider x infeasible; if the conflicts induced by $u_1, u_2$ are consistent, we keep that indication for node u with respect to $c_j$.
Once the information sets I(U) for all $U\in\mathcal{V}$ are known, we can find a valid coloring in a top-down manner.
Lemma 26. Suppose we are given a tuple $M_1, \ldots, M_{\ell}$, where $M_j = \big(H_j, D_j, \Pi(D_j), \{\widehat{\mathrm{Num}}(\pi), \pi\in\Pi(D_j)\}\big)$, for some $\ell \le k$, such that for every $j < j'$, $(H_j, D_j)$ consists of nodes with lower index than those in $(H_{j'}, D_{j'})$. It is possible to determine in $O(k^k)$ time whether the $M_j$'s are consistent, i.e., whether it is possible to find external anchor edges that are consistent for the tuples.
Theorem 8. If G has degeneracy d and treewidth t, we can compute a $(1 + 1/\mathrm{polylog}(n))$-factor upper bound on f(G, S) in time $O\big(\mathrm{poly}(n)\exp(k^2 d^{k+1}\log(kd^k))(k\log n)^{k^2}(kd^k)^t\big)$.
F Additional Experiments
In this section, we present the complete set of experimental results for private triangle counting (Algorithm 2) on all the tested networks (the 12 networks given in Table 2).
Figure 7: Triangle Count Relative Error, showing the accuracy of the private triangle count with noise calibrated by (1) approximate smooth sensitivity (Algorithm 2), and (2) exact smooth sensitivity.
Figure 8: The ratio of the smooth sensitivity over the actual count of triangles for all networks.
Figure 9: Speedup of Algorithm 2 (compared to the exact calculation) on all networks.
Figure 10: Sensitivity Approx. Error, showing the error of the approximate smooth sensitivity output by Algorithm 2 (relative to the exact smooth sensitivity) on all networks.
Figure 7 shows that when approximate smooth sensitivity is used instead of exact smooth sensitivity, the accuracy of the private triangle counts, measured by the Triangle Count Relative Error metric, is essentially the same across different privacy budgets in all but two networks, Oregon 1 and Oregon 2 (see Figure 8). In these two networks, the smooth sensitivities are relatively large compared with the triangle counts, making the noise fluctuate much more for the approximate estimate. For the ca-HepPh dataset, the two outputs coincide.
Figure 8 shows the ratio of smooth sensitivity to actual triangle count for all datasets. Note that this ratio is large for the two datasets Oregon 1 and Oregon 2, but small for all the other datasets. Figure 9 shows the speedup achieved using Algorithm 2. As seen in the figure, it is orders of magnitude faster than the exact algorithm on all 12 datasets; in fact, on large graphs (Amazon, DBLP, Gowalla), the speedup is as high as 1,000-fold. Generally, a lower $\epsilon$ requires a smaller approximation factor $\gamma$, which in turn requires a larger number of iterations to reduce the approximation error. To summarize, the majority of tested networks show speedup factors between 10- and 100-fold across all privacy budgets $\epsilon$. Figure 10 shows that all approximate smooth sensitivities computed by Algorithm 2 are close to the exact smooth sensitivities; we measure this using the Sensitivity Approx. Error metric. The approximate smooth sensitivities are always larger than the true smooth sensitivity. This is necessary, since a lower value could compromise privacy due to an inadequate noise magnitude. In general, a smaller value of $\epsilon$ (a stronger privacy guarantee) requires a more accurate estimate of the sensitivity; this is illustrated in Figure 10, where the approximate smooth sensitivity is closest to the exact smooth sensitivity in all networks at $\epsilon = 0.125$ or $\epsilon = 0.25$.