# Faster Privacy Accounting via Evolving Discretization

Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi (Google Research, USA)

Abstract. We introduce a new algorithm for numerical composition of privacy random variables, useful for computing accurate differential privacy parameters for the composition of mechanisms. Our algorithm achieves a running time and memory usage of polylog(k) for the task of self-composing a mechanism, from a broad class of mechanisms, k times; this class, e.g., includes the sub-sampled Gaussian mechanism, which appears in the analysis of differentially private stochastic gradient descent. By comparison, recent work by Gopi et al. (2021) obtained a running time of Õ(√k) for the same task. Our approach extends to the case of composing k different mechanisms in the same class, improving upon their running time and memory usage from Õ(k^{1.5}) to Õ(k).

## 1. Introduction

Differential privacy (DP) (Dwork et al., 2006b;a) has become the preferred notion of privacy in both academia and industry. Fueled by the increased awareness and demand for privacy, several systems that use DP mechanisms to guard users' privacy have been deployed in industry (Erlingsson et al., 2014; Shankland, 2014; Greenberg, 2016; Apple Differential Privacy Team, 2017; Ding et al., 2017; Kenthapadi & Tran, 2018; Rogers et al., 2021) and by the US Census Bureau (Abowd, 2018). Besides the large volume of data, many of these systems offer real-time private data analytics and inference capabilities, which entail strict computational efficiency requirements on the underlying DP operations.

We recall the definition of DP (Dwork et al., 2006b;a):

Definition 1.1 (DP). Let ε > 0 and δ ∈ [0, 1]. A randomized algorithm M : X^n → Y is (ε, δ)-DP if, for all x, x′ ∈ X^n differing on a single index and all outputs S ⊆ Y, we have Pr[M(x) ∈ S] ≤ e^ε · Pr[M(x′) ∈ S] + δ.

The DP guarantees of a mechanism are captured by the privacy parameters ε and δ; the smaller these parameters, the more private the mechanism. Often a mechanism is simultaneously DP for multiple privacy parameters; this is captured by studying the privacy loss of a mechanism M, namely the curve δ_M(·) for which the mechanism is (ε, δ_M(ε))-DP.

A fundamental mathematical property satisfied by DP is composition, which prescribes the privacy guarantees of the results of executing multiple DP mechanisms. In basic composition (Dwork et al., 2006a), a mechanism that returns the results of executing an (ε1, δ1)-DP mechanism and an (ε2, δ2)-DP mechanism is (ε1 + ε2, δ1 + δ2)-DP. Unfortunately, this bound becomes weak when composing a large number of mechanisms. Advanced composition (Dwork et al., 2010) offers stronger bounds: roughly speaking, composing k mechanisms that are each (ε, δ)-DP yields a mechanism whose privacy parameters are of the order of √k·ε and kδ, clearly more desirable than the basic composition. In fact, obtaining tight composition bounds has been an active research topic. Kairouz et al. (2015) showed how to obtain the exact privacy parameters of self-composition (a mechanism composed with itself), while Murtagh & Vadhan (2016) showed that the corresponding problem for the more general case is #P-complete.
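As a quick illustration of the gap between the basic and advanced composition bounds discussed above, the following sketch (standard textbook formulas only, not part of the algorithms developed in this paper; the slack parameter `delta_prime` and the example values are arbitrary) compares the two for k identical (ε, δ)-DP mechanisms.

```python
import math

def basic_composition(k, eps, delta):
    # Basic composition: the privacy parameters simply add up.
    return k * eps, k * delta

def advanced_composition(k, eps, delta, delta_prime=1e-9):
    # One standard form of advanced composition (Dwork et al., 2010):
    #   eps' = eps * sqrt(2k * ln(1/delta')) + k * eps * (e^eps - 1),
    #   delta' is extra slack added to the composed delta.
    eps_prime = eps * math.sqrt(2 * k * math.log(1 / delta_prime)) \
                + k * eps * (math.exp(eps) - 1)
    return eps_prime, k * delta + delta_prime

k, eps, delta = 1000, 0.01, 1e-8
print(basic_composition(k, eps, delta))      # ~ (10.0, 1e-05)
print(advanced_composition(k, eps, delta))   # eps' ~ 2.1: roughly sqrt(k)*eps scaling
```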
Privacy Loss Distribution (PLD). Tighter bounds for privacy parameters that go beyond advanced composition are possible if the privacy loss of the mechanism is taken into account. Meiser & Mohammadi (2018) and Sommer et al. (2019) initiated the study of numerical methods for accurately estimating the privacy parameters, using the privacy loss distribution (PLD) of a mechanism. The PLD is the probability mass function of the so-called privacy loss random variable (PRV) for discrete mechanisms, and its density function for continuous mechanisms (see Section 2 for formal definitions). PLDs have two nice properties: (i) tight privacy parameters can be computed from the PLD of a mechanism, and (ii) the PLD of a composed mechanism is the convolution of the individual PLDs. Property (ii) makes PLDs amenable to FFT-based methods; Koskela et al. (2020) exploited this property to speed up the computation of the PLD of the composition. An important step in retaining efficiency when using PLDs is approximating the distribution so that it has finite support; this is especially needed when the PLD is continuous or has a large support. PLD implementations have been at the heart of many DP systems and open-source libraries, including (Lukas Prediger, 2020; Google, 2020; Microsoft, 2021). To enable scale and support latency considerations, PLD composition has to be as efficient as possible. This is the primary focus of our paper.

Our starting point is the work of Koskela et al. (2020; 2021) and Koskela & Honkela (2021), who derived explicit error bounds for the approximation obtained by the FFT-based algorithm. The running time of this algorithm for k-fold self-composition of a mechanism M that is (ε0, 0)-DP is Õ(k²·ε0), where, for any positive T, we denote by Õ(T) any quantity of the form O(T·polylog(T)). When ε0 = Θ(√(log(1/δerr)/k)), this running time is Õ(k^{1.5}), where δerr is the additive error in δ. Gopi et al. (2021) improved this bound to obtain an algorithm with running time Õ(√(k·log(1/δerr)) / εerr), where εerr is the additive error in ε. When composing k different mechanisms, their running time is Õ(k^{1.5}·√(log(1/δerr)) / εerr).

Our Contributions. We design and study new algorithms for k-fold numerical composition of PLDs. Specifically, for a reasonable choice of mechanisms, we

- obtain (Section 3) a two-stage algorithm for self-composing PLDs, with running time Õ(k^{0.25}·√(log(k/δerr)) / εerr);
- provide (Section 4) an experimental evaluation of the two-stage algorithm, comparing its runtime to that of the algorithm of Gopi et al. (2021); we find that the speedup obtained by our algorithm improves with k;
- extend (Section 5) the two-stage algorithm to a recursive multi-stage algorithm with a running time of Õ(polylog(k)·√(log(1/δerr)) / εerr).

Both the two-stage and multi-stage algorithms extend to the case of composing k different mechanisms. In each case, the running time increases by a multiplicative O(k) factor. Note that this factor is inevitable since the input size is k: indeed, the algorithm needs to read the k input PLDs. We defer the details of this extension to Appendix B.

Algorithm Overview. The main technique underlying our algorithms is the evolution of the discretization and truncation intervals of the approximations of the PRV with the number of compositions. To describe our approach, we first present a high-level description of the algorithm of Gopi et al. (2021). Their algorithm discretizes an O(1)-length truncation interval into bucket intervals each with mesh size ≈ 1/k^{0.5}, thus leading to a total of O(k^{0.5}) buckets and a running time of Õ(k^{0.5}) for the FFT convolution algorithm.
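To see why the number of discretization buckets drives the cost, the following numpy sketch (illustrative only; it ignores truncation, grid selection, and error accounting, and the toy PMF is a placeholder) carries out property (ii) directly: a k-fold self-composition of a discretized PLD is a k-th power in the Fourier domain, so the FFT computes it in time quasilinear in the size of the (padded) output grid.

```python
import numpy as np

def self_compose_pld(pmf, k):
    """k-fold self-composition of a PLD given as a PMF on a uniform grid.

    Composition of PLDs is convolution, computed here in the Fourier domain.
    The FFT length is padded so the linear (non-circular) convolution is
    recovered exactly.
    """
    n = len(pmf) * k - (k - 1)          # support size of the k-fold convolution
    f = np.fft.rfft(pmf, n)             # forward FFT with zero-padding
    out = np.fft.irfft(f ** k, n)       # k-fold convolution = k-th power in Fourier domain
    out = np.clip(out, 0.0, None)       # remove tiny negative FFT noise
    return out / out.sum()

# Example: a toy PLD on privacy-loss values {-h, 0, +h} with some mesh h.
pld_k = self_compose_pld(np.array([0.25, 0.5, 0.25]), k=500)
```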
Both these aspects of their algorithm are in some sense necessary: a truncation interval much shorter than O(1) would lose significant information about the k-fold composed PRV, whereas a mesh size much coarser than 1/k^{0.5} would lose significant information about the original PRV; so relaxing either one on its own would lead to a large approximation error. The key insight in our work is that it is possible to avoid having both these aspects simultaneously. In particular, in our two-stage algorithm, the first stage performs a k^{0.5}-fold composition using an interval of length ≈ 1/k^{0.25} discretized into bucket intervals with mesh size ≈ 1/k^{0.5}, followed by another k^{0.5}-fold composition using an interval of length O(1) discretized into bucket intervals with mesh size ≈ 1/k^{0.25}. Thus, in each stage the number of discretization buckets is Õ(k^{0.25}), leading to a final running time of Õ(k^{0.25}) for performing the two FFT convolutions. The recursive multi-stage algorithm extends this idea to O(log k) stages, each performed with an increasing discretization interval and truncation interval, ensuring that the number of discretization buckets at each stage is O(polylog(k)).

Experimental Evaluation. We implement our two-stage algorithm and compare it to baselines from the literature. We consider the sub-sampled Gaussian mechanism, which is a fundamental primitive in private machine learning and constitutes a core primitive of training algorithms that use differentially private stochastic gradient descent (DP-SGD) (see, e.g., Abadi et al., 2016). For 2^16 compositions, with a standard deviation of 226.86 and a subsampling probability of 0.2, we obtain a speedup of 2.66x compared to the state-of-the-art. We also consider compositions of the widely used Laplace mechanism. For 2^16 compositions and a scale parameter of 1133.84 for the Laplace distribution, we achieve a speedup of 2.3x. The parameters were chosen such that the composed mechanism satisfies (ε = 1.0, δ = 10^{-6})-DP. See Section 4 for more details.

Related Work. In addition to the Moments Accountant (Abadi et al., 2016) and Rényi DP (Mironov, 2017) (which were originally used to bound the privacy loss in DP deep learning), several other tools can also be used to upper-bound the privacy parameters of composed mechanisms, including concentrated DP (Dwork & Rothblum, 2016; Bun & Steinke, 2016), its truncated variant (Bun et al., 2018), and Gaussian DP (Dong et al., 2019; Bu et al., 2020). However, none of these methods is tight; furthermore, none of them allows a high-accuracy estimation of the privacy parameters. In fact, some of them require that the moments of the PRV are bounded; the PLD composition approach does not have such restrictions and hence is applicable to mechanisms such as DP-SGD-JL (Bu et al., 2021). In a recent work, Doroshenko et al. (2022) proposed a different discretization procedure based on whether we want the discretized PLD to be "optimistic" or "pessimistic". They do not analyze the error bounds explicitly, but it can be seen that their running time is O(k), which is slower than both our algorithm and that of Gopi et al. (2021). Another recent work (Zhu et al., 2022) developed a rigorous notion of worst-case PLDs for mechanisms, under the name "dominating PLDs". Our algorithms can be used for compositions of dominating PLDs; indeed, our experimental results for the Laplace and (subsampled) Gaussian mechanisms are already doing this implicitly. Furthermore, Zhu et al. (2022) propose a different method for computing tight DP composition bounds.
However, their algorithm requires an analytical expression for the characteristic function of the PLDs. This may not exist; e.g., we are unaware of such an analytical expression for subsampled Gaussian mechanisms.

## 2. Preliminaries

Let ℤ_{>0} denote the set of all positive integers, ℝ_{≥0} the set of all non-negative real numbers, and let R̄ = ℝ ∪ {−∞, +∞}. For any two random variables X and Y, we denote by d_TV(X, Y) their total variation distance.

### 2.1. Privacy Loss Random Variables

We will use the following privacy definitions and tools that appeared in previous works on PLDs (Sommer et al., 2019; Koskela et al., 2020; Gopi et al., 2021). For any mechanism M and any ε ∈ ℝ_{≥0}, we denote by δ_M(ε) the smallest value δ such that M is (ε, δ)-DP. The resulting function δ_M(·) is said to be the privacy curve of the mechanism M. A closely related notion is the privacy curve δ(X‖Y) : ℝ_{≥0} → [0, 1] between two random variables X, Y, defined for any ε ∈ ℝ_{≥0} as

  δ(X‖Y)(ε) = sup_{S ⊆ Ω} ( Pr[Y ∈ S] − e^ε · Pr[X ∈ S] ),

where Ω is the support of X and Y.

The privacy loss random variables associated with a privacy curve δ_M are random variables (X, Y) such that δ_M is the same curve as δ(X‖Y), and that satisfy certain additional properties (which make them unique). While PRVs have been defined earlier in Dwork & Rothblum (2016) and Bun & Steinke (2016), we use the definition of Gopi et al. (2021):

Definition 2.1 (PRV). Given a privacy curve δ_M : ℝ_{≥0} → [0, 1], we say that random variables (X, Y) are privacy loss random variables (PRVs) for δ_M if (i) X and Y are supported on R̄, (ii) δ(X‖Y) = δ_M, (iii) Y(t) = e^t · X(t) for every t ∈ ℝ, and (iv) Y(−∞) = X(+∞) = 0, where X(t) and Y(t) denote the PDFs of X and Y, respectively.

Theorem 2.2 (Gopi et al. (2021)). Let δ be a privacy curve that is identical to δ(P‖Q) for some random variables P and Q. Then, the PRVs (X, Y) for δ are given by X = log(Q(ω)/P(ω)) where ω ∼ P, and Y = log(Q(ω)/P(ω)) where ω ∼ Q.

Moreover, we define δ_Y(ε) := E_Y[(1 − e^{ε−Y})_+] for every ε ∈ ℝ, and define ε_Y(δ) as the smallest ε such that δ_Y(ε) ≤ δ. Note that δ_Y(ε) is well defined even for a Y that is not a privacy loss random variable.

If δ1 is a privacy curve identical to δ(X1‖Y1) and δ2 is a privacy curve identical to δ(X2‖Y2), then the composition δ1 ⊗ δ2 is defined as the privacy curve δ((X1, X2)‖(Y1, Y2)), where X1 is independent of X2, and Y1 is independent of Y2. A crucial property is that composition of privacy curves corresponds to addition of PRVs:

Theorem 2.3 (Dwork & Rothblum (2016)). Let δ1 and δ2 be privacy curves with PRVs (X1, Y1) and (X2, Y2), respectively. Then, the PRVs for the privacy curve δ1 ⊗ δ2 are given by (X1 + X2, Y1 + Y2).

We interchangeably use the same letter to denote both a random variable and its corresponding probability distribution. For any two distributions Y1, Y2, we use Y1 ⊛ Y2 to denote their convolution (the distribution of the random variable Y1 + Y2). We use Y^{⊛k} to denote the k-fold convolution of Y with itself. Finally, we use the following tail bound for PRVs.

Lemma 2.4 (Gopi et al. (2021)). For every PRV Y, ε ≥ 0, and α > 0, it holds that

  Pr[|Y| ≥ ε + α] ≤ δ_Y(ε)·(1 + e^{−ε})·(1 + 1/α) ≤ (4/α)·δ_Y(ε),

where the last inequality holds if α < 1.
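As a concrete instance of Theorem 2.2 and of the functional δ_Y, consider the Gaussian mechanism with sensitivity 1 and noise scale σ, i.e., P = N(0, σ²) and Q = N(1, σ²): the PRV Y = log(Q(ω)/P(ω)) with ω ∼ Q works out to Y ∼ N(µ, 2µ) with µ = 1/(2σ²), so δ_Y(ε) = E[(1 − e^{ε−Y})_+] can be estimated by Monte Carlo and checked against the well-known closed-form curve of this mechanism. The snippet below is an illustration only (the sample size and σ = ε = 1 are arbitrary choices), not part of the algorithms in this paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def delta_from_prv_samples(y, eps):
    # delta_Y(eps) = E[(1 - e^{eps - Y})_+], estimated from samples of the PRV Y.
    return float(np.mean(np.clip(1.0 - np.exp(eps - y), 0.0, None)))

sigma, eps = 1.0, 1.0
mu = 1.0 / (2.0 * sigma**2)
y = rng.normal(loc=mu, scale=np.sqrt(2.0 * mu), size=2_000_000)  # PRV samples: Y ~ N(mu, 2*mu)

print("Monte Carlo:", delta_from_prv_samples(y, eps))
# Closed form for the Gaussian mechanism (sensitivity 1, noise sigma):
#   delta(eps) = Phi(1/(2 sigma) - eps*sigma) - e^eps * Phi(-1/(2 sigma) - eps*sigma)
print("closed form:", norm.cdf(1 / (2 * sigma) - eps * sigma)
      - np.exp(eps) * norm.cdf(-1 / (2 * sigma) - eps * sigma))
# Both should be roughly 0.127 for sigma = eps = 1.
```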
### 2.2. Coupling Approximation

To describe and analyze our algorithm, we use the coupling approximation tool used by Gopi et al. (2021). They showed that, in order to provide an approximation guarantee on the privacy loss curve, it suffices to approximate a PRV according to the following coupling notion of approximation:

Definition 2.5 (Coupling Approximation). For two random variables Y1, Y2, we write |Y1 − Y2| ≤_η h if there exists a coupling between them such that Pr[|Y1 − Y2| > h] ≤ η.

We use the following properties of coupling approximation shown by Gopi et al. (2021). We provide the proofs in Appendix A for completeness.

Lemma 2.6 (Properties of Coupling Approximation). Let X, Y, Z be random variables.

1. If |X − Y| ≤_{δerr} εerr, then for all ε ∈ ℝ_{≥0}, δ_Y(ε + εerr) − δerr ≤ δ_X(ε) ≤ δ_Y(ε − εerr) + δerr.
2. If |X − Y| ≤_{η1} h1 and |Y − Z| ≤_{η2} h2, then |X − Z| ≤_{η1+η2} h1 + h2 ("triangle inequality").
3. If d_TV(X, Y) ≤ η, then |X − Y| ≤_η 0; furthermore, for all k ∈ ℤ_{>0}, it holds that |X^{⊛k} − Y^{⊛k}| ≤_{kη} 0.
4. If |X − Y − µ| ≤_0 h (for any µ) and E[X] = E[Y], then for all η > 0 and k ∈ ℤ_{>0}, |X^{⊛k} − Y^{⊛k}| ≤_η h·√(2k·log(2/η)).

### 2.3. Discretization Procedure

We adapt the discretization procedure from Gopi et al. (2021). The only difference is that we assume that the support of the input distribution already lies in the specified range, as opposed to being truncated as part of the algorithm. A complete description of the procedure is given in Algorithm 1.

Algorithm 1: Discretize_RV (adapted from Gopi et al., 2021)
  Input: CDF_Y(·) of a random variable Y, mesh size h, truncation parameter L ∈ h·(1/2 + ℤ_{>0}).
  Constraint: the support of Y is contained in (−L, L].
  Output: PDF of an approximation Ỹ supported on µ + (h·ℤ ∩ [−L, L]) for some µ ∈ [−h/2, h/2].
  n ← L/h − 1/2
  for i = −n to n do: q_i ← CDF_Y(i·h + h/2) − CDF_Y(i·h − h/2)
  q ← q / (Σ_{i=−n}^{n} q_i)    (normalize q)
  µ ← E[Y] − Σ_{i=−n}^{n} i·h·q_i    (mean-preserving shift)
  Ỹ ← i·h + µ with probability q_i, for −n ≤ i ≤ n
  return Ỹ

The procedure can be shown to satisfy the following key property.

Proposition 2.7. For any random variable Y supported in (−L, L], the output Ỹ of Discretize_RV with mesh size h and truncation parameter L satisfies E[Y] = E[Ỹ] and |Y − Ỹ − µ| ≤_0 h/2, for some µ with |µ| ≤ h/2.

## 3. Two-Stage Composition Algorithm

Our two-stage algorithm for the case of k-fold self-composition is given in Algorithm 2. We assume k = k1·k2 + r, where k1, k2 ∈ ℤ_{>0}, r < k1, and k1·k2 ≈ k, which for instance can be achieved by taking k1 = ⌈√k⌉, k2 = ⌊k/k1⌋, and r = k − k1·k2.

Algorithm 2: Two_Stage_Self_Compose_PRV
  Input: PRV Y, number of compositions k = k1·k2 + r (for r < k1), mesh sizes h1 ≤ h2, truncation parameters L1 ≤ L2, where each Li ∈ hi·(1/2 + ℤ_{>0}).
  Output: PDF of an approximation Y2 for Y^{⊛k}; Y2 will be supported on µ + (h2·ℤ ∩ [−L2, L2]) for some µ ∈ [−h2/2, h2/2].
  Y0 ← Y | |Y| ≤ L1    (Y conditioned on |Y| ≤ L1)
  Ỹ0 ← Discretize_RV(Y0, h1, L1)
  Y1 ← Ỹ0^{⊛_{L1} k1}    (k1-fold FFT convolution)
  Ỹ1 ← Discretize_RV(Y1, h2, L2)
  Y2 ← Ỹ1^{⊛_{L2} k2}    (k2-fold FFT convolution)
  Yr ← Ỹ0^{⊛_{L1} r}    (r-fold FFT convolution)
  Ỹr ← Discretize_RV(Yr, h2, L2)
  return Y2 ⊛_{L2} Ỹr    (FFT convolution)

The algorithm implements the circular convolution ⊛_L using the Fast Fourier Transform (FFT). For any L and x ∈ ℝ, we define x (mod 2L) := x − 2Ln, where n ∈ ℤ is such that x − 2Ln ∈ (−L, L]. Given x, y ∈ ℝ, the circular addition is defined as x ⊕_L y := x + y (mod 2L). Similarly, for random variables X, Y, we define X ⊛_L Y to be their convolution modulo 2L, and Y^{⊛_L k} to be the k-fold convolution of Y modulo 2L.

Observe that Discretize_RV with mesh size h and truncation parameter L runs in time O(L/h), assuming O(1)-time access to CDF_Y(·). The FFT convolution steps run in time O((Li/hi)·log(Li/hi)); thus the overall running time of Two_Stage_Self_Compose_PRV is O((L1/h1)·log(L1/h1) + (L2/h2)·log(L2/h2)).
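To make the ⊛_L operation concrete, here is a minimal numpy sketch (an illustration, not the implementation used in our experiments) of a k-fold circular convolution of a PMF supported on a uniform grid covering (−L, L]: summing random variables modulo 2L corresponds exactly to ordinary circular convolution of the length-m probability vector, which the FFT performs in O(m log m) time per stage. The placeholder PMF and the values of L, h, k1 below are arbitrary.

```python
import numpy as np

def circular_self_convolve(pmf, k):
    """k-fold circular convolution (mod the grid length) of a PMF vector.

    pmf[i] is the mass on the i-th grid point of a uniform grid over (-L, L];
    the wrap-around of the grid implements addition modulo 2L, so the k-fold
    circular convolution is a k-th power in the Fourier domain.
    """
    f = np.fft.rfft(pmf)                      # no padding: wrap-around is intended
    out = np.clip(np.fft.irfft(f ** k, len(pmf)), 0.0, None)
    return out / out.sum()                    # renormalize away numerical error

# Example: grid of m = 2L/h points; composing k1 copies in the first stage.
L, h, k1 = 0.05, 0.001, 256
m = int(round(2 * L / h))
pmf = np.ones(m) / m                          # placeholder PMF on the grid
stage1 = circular_self_convolve(pmf, k1)
```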
The approximation guarantees provided by our two-stage algorithm are captured in the following theorem. For convenience, we assume that k is a perfect square (we set k1 = k2 = √k). The complete proof is in Section 3.1.

Theorem 3.1. For any PRV Y, the approximation Y2 returned by Two_Stage_Self_Compose_PRV satisfies |Y^{⊛k} − Y2| ≤_{δerr} εerr when invoked with k1 = k2 = k^{0.5} (assumed to be an integer) and the parameters given below (using η := δerr/(8·k^{0.5} + 16)):

  h1 := εerr / (k^{0.5}·√(2·log(2/η))),    h2 := εerr / (k^{0.25}·√(2·log(2/η))),

  L1 ≥ max{ ε_Y(εerr·δerr/(16·k^{1.25})), ε_{Y^{⊛k^{0.5}}}(εerr·δerr/(64·k^{0.75})) } + εerr/k^{0.25},

  L2 ≥ max{ ε_{Y^{⊛k}}(εerr·δerr/16) + 2·εerr, L1 }.

In terms of a concrete running time bound, Theorem 3.1 implies:

Corollary 3.2. For any DP algorithm M, the privacy curve δ_{M^{⊗k}}(ε) of the k-fold (adaptive) composition of M can be approximated in time Õ(εup·k^{0.25}·√(log(k/δerr)) / εerr), where εerr is the additive error in ε, δerr is the additive error in δ, and εup is an upper bound on ε_{Y^{⊛k}}(εerr·δerr/16), on k^{0.25}·ε_{Y^{⊛k^{0.5}}}(εerr·δerr/(64·k^{0.75})), and on k^{0.25}·ε_Y(εerr·δerr/(16·k^{1.25})).

In many practical regimes of interest, εup/εerr is a constant. For ease of exposition in the following, we assume that εerr is a small constant, e.g., 0.1, and suppress the dependence on εerr. Suppose the original mechanism M underlying Y satisfies (ε = √(log(1/δerr)/k), δ = o_k(1)/k^{1.25})-DP. Then, by advanced composition (Dwork et al., 2010), we have that M^{⊗t} satisfies (ε·√(2t·log(1/δ′)) + t·ε·(e^ε − 1), t·δ + δ′)-DP. If t·δ + δ′ ≤ t·δerr/k^{1.25}, then we have that ε_{Y^{⊛t}}(t·δerr/k^{1.25}) ≲ √((t/k)·log(1/δerr)·log(k/(t·δerr))). Instantiating this with t = 1, √k, and k gives us that εup is at most a constant.

Note that, to set the values of L1 and L2, we do not need the exact value of ε_{Y^{⊛k}} (or ε_{Y^{⊛√k}} or ε_Y). We only need an upper bound on ε_{Y^{⊛k}}, which can often be obtained by using the RDP accountant or some other method.

For the case when k is not a perfect square, using a similar analysis, it is easy to see that the approximation error would be no worse than the error in k2·(k1 + 1) self-compositions. The running time increases by a constant factor because of the additional step of the r-fold convolution to obtain Yr and the final convolution step to obtain Y2 ⊛_{L2} Ỹr; however, this does not affect the asymptotic time complexity of the algorithm. Moreover, as seen in Section 4, even with this additional cost, Two_Stage_Self_Compose_PRV is still faster than the algorithm of Gopi et al. (2021).

### 3.1. Analysis

In this section we establish Theorem 3.1. The proof proceeds as follows. We establish coupling approximations between consecutive random variables in the sequence Y^{⊛k1k2}, Y0^{⊛k1k2}, Ỹ0^{⊛k1k2}, Y1^{⊛k2}, Ỹ1^{⊛k2}, Y2, and then apply Lemma 2.6(2). To establish each coupling approximation, we use Lemmas 2.6(3) and 2.6(4).

Coupling Y^{⊛k1k2} and Y0^{⊛k1k2}. Since d_TV(Y, Y0) = Pr[|Y| > L1] =: δ0, we have from Lemma 2.6(3) that

  |Y^{⊛k1k2} − Y0^{⊛k1k2}| ≤_{k1k2·δ0} 0.  (1)

Coupling Y0^{⊛k1k2}, Ỹ0^{⊛k1k2} and Y1^{⊛k2}, Ỹ1^{⊛k2}. We have from Proposition 2.7 that E[Y0] = E[Ỹ0] and that |Y0 − Ỹ0 − µ| ≤_0 h1/2. Thus, by applying Lemma 2.6(4), we have that (for any η, to be chosen later)

  |Y0^{⊛k1} − Ỹ0^{⊛k1}| ≤_η h1·√((k1/2)·log(2/η)),  (2)

  |Y0^{⊛k1k2} − Ỹ0^{⊛k1k2}| ≤_η h1·√((k1k2/2)·log(2/η)).  (3)

Similarly, we have that

  |Y1^{⊛k2} − Ỹ1^{⊛k2}| ≤_η h2·√((k2/2)·log(2/η)).  (4)

Coupling Ỹ0^{⊛k1k2}, Y1^{⊛k2} and Ỹ1^{⊛k2}, Y2. Since d_TV(Ỹ0^{⊛k1}, Ỹ0^{⊛_{L1}k1}) ≤ Pr[|Ỹ0^{⊛k1}| > L1] =: δ1, and Y1 = Ỹ0^{⊛_{L1}k1}, it holds via Lemma 2.6(3) that

  |Ỹ0^{⊛k1k2} − Y1^{⊛k2}| ≤_{k2·δ1} 0.  (5)

Similarly, for δ2 := Pr[|Ỹ1^{⊛k2}| > L2], we have that

  |Ỹ1^{⊛k2} − Y2| ≤_{δ2} 0.  (6)

Towards combining the bounds. Combining Equations (1) and (3) to (6) using Lemma 2.6(2), we have that |Y^{⊛k1k2} − Y2| ≤_{δerr} εerr, where

  δerr = 2η + k1k2·δ0 + k2·δ1 + δ2,  (7)

  εerr = h1·√((k1k2/2)·log(2/η)) + h2·√((k2/2)·log(2/η)).  (8)

We set h1 := εerr/√(2·k1k2·log(2/η)) and h2 := εerr/√(2·k2·log(2/η)). The key step remaining is to bound δ0, δ1, and δ2 in terms of the hi's, the Li's, and η. To do so, we use Lemma 2.4 with parameters αi to be chosen later.

Bounding δ0.
We have δ0 := Pr[|Y| > L1] and hence δ0 ≤ (4/α0)·δ_Y(L1 − α0).

Bounding δ1. For h̃1 := h1·√((k1/2)·log(2/η)) = εerr/(2·√k2), we have

  δ1 := Pr[|Ỹ0^{⊛k1}| > L1]
      ≤ Pr[|Ỹ0^{⊛k1} − Y0^{⊛k1}| > h̃1] + Pr[|Y0^{⊛k1}| > L1 − h̃1]
      ≤ η + Pr[|Y^{⊛k1}| > L1 − h̃1]
      ≤ η + (4/α1)·δ_{Y^{⊛k1}}(L1 − εerr/(2·√k2) − α1),

where the second inequality uses Equation (2), the third uses the fact that the tails of Y^{⊛k1} are only heavier than the tails of Y0^{⊛k1} since Y0 is a truncation of Y, and the last uses Lemma 2.4.

Bounding δ2. First, we combine Equations (3) to (5) to get (recall εerr from Equation (8))

  |Y0^{⊛k1k2} − Ỹ1^{⊛k2}| ≤_{2η+k2·δ1} εerr.

Using this, we get

  δ2 := Pr[|Ỹ1^{⊛k2}| > L2]
      ≤ Pr[|Ỹ1^{⊛k2} − Y0^{⊛k1k2}| > εerr] + Pr[|Y0^{⊛k1k2}| > L2 − εerr]
      ≤ 2η + k2·δ1 + Pr[|Y^{⊛k1k2}| > L2 − εerr]
      ≤ 2η + k2·δ1 + (4/α2)·δ_{Y^{⊛k1k2}}(L2 − εerr − α2)
      ≤ (k2 + 2)·η + k2·(4/α1)·δ_{Y^{⊛k1}}(L1 − εerr/(2·√k2) − α1) + (4/α2)·δ_{Y^{⊛k1k2}}(L2 − εerr − α2),

where we use in the third step that the tails of Y^{⊛k1k2} are only heavier than the tails of Y0^{⊛k1k2} since Y0 is a truncation of Y.

Putting it all together. Plugging these bounds for δ0, δ1, and δ2 into Equation (7), we get that |Y^{⊛k1k2} − Y2| ≤_{δerr} εerr, where

  δerr ≤ (2k2 + 4)·η + k1k2·(4/α0)·δ_Y(L1 − α0) + 2k2·(4/α1)·δ_{Y^{⊛k1}}(L1 − εerr/(2·√k2) − α1) + (4/α2)·δ_{Y^{⊛k1k2}}(L2 − εerr − α2),  (9)

and εerr = h1·√((k1k2/2)·log(2/η)) + h2·√((k2/2)·log(2/η)).

Thus, we can set the parameters η, L1, and L2 as follows, so that each of the four terms in Equation (9) is at most δerr/4:

  η = δerr/(8·k2 + 16),
  L1 ≥ max{ ε_Y(α0·δerr/(16·k1k2)) + α0, ε_{Y^{⊛k1}}(α1·δerr/(32·k2)) + εerr/(2·√k2) + α1 },
  L2 ≥ max{ ε_{Y^{⊛k1k2}}(α2·δerr/16) + εerr + α2, L1 }.

Setting k1 = k2 = √k (assumed to be integers) and α0 = εerr/√k2, α1 = εerr/(2·√k2), and α2 = εerr completes the proof of Theorem 3.1.

Runtime analysis. As argued earlier, the total running time of Two_Stage_Self_Compose_PRV is O((L1/h1)·log(L1/h1) + (L2/h2)·log(L2/h2)). Substituting in the bounds for L1, L2, h1, and h2, we get a final running time of Õ(εup·k^{0.25}·√(log(k/δerr)) / εerr), where εup is as in Corollary 3.2.

### 3.2. Heterogeneous Compositions

Algorithm 2 can be easily generalized to handle heterogeneous composition of k different mechanisms, with a running-time blow-up of k over the homogeneous case. We defer the details to Appendix B.

## 4. Experimental Evaluation of Two_Stage_Self_Compose_PRV

We empirically evaluate Two_Stage_Self_Compose_PRV on the task of self-composing two kinds of mechanisms acting on a dataset of real values x1, ..., xn:

- Laplace mechanism: returns Σ_i xi plus noise drawn from Lap(0, b), the distribution with PDF P(x) = e^{−|x|/b}/(2b).
- Poisson subsampled Gaussian mechanism with probability γ: subsamples a random subset S of indices by including each index independently with probability γ, and returns Σ_{i∈S} xi plus noise drawn from the Gaussian distribution N(0, σ²).

Both these mechanisms are heavily used in practice. For each mechanism, we compare against the implementation by Gopi et al. (2021) (referred to as GLW; github.com/microsoft/prv_accountant) on three fronts: (i) the running time of the algorithm, (ii) the number of discretized buckets used, and (iii) the final estimates of δ_{Y^{⊛k}}(ε), which includes comparing the lower bounds δ_{Y2}(ε + εerr) − δerr, the estimates δ_{Y2}(ε), and the upper bounds δ_{Y2}(ε − εerr) + δerr. We use εerr = 0.1 and δerr = 10^{−10} in all the experiments.

Figure 1. Compositions of the Laplace mechanism: (a) running times (average over 20 runs; shaded region indicates 20th-80th percentiles), (b) number of discretization buckets, (c) delta estimates.

Figure 2. Compositions of the Poisson subsampled (with probability γ = 0.2) Gaussian mechanism: (a) running times (average over 20 runs; shaded region indicates 20th-80th percentiles), (b) number of discretization buckets, (c) delta estimates.
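Before turning to the results, the following self-contained sketch mimics the two-stage pipeline (CDF-difference bucketing as in Algorithm 1, followed by two circular FFT stages as in Algorithm 2) for the plain, non-subsampled Gaussian mechanism, chosen because its composed curve has a closed form to compare against. It is an illustration only: the per-step noise σ = 1000, the mesh sizes, and the truncation radii are picked by hand rather than by Theorem 3.1, the mean-preserving shift of Discretize_RV is omitted, and this is not the implementation used in our experiments.

```python
import numpy as np
from scipy.stats import norm

def discretize(cdf, h, L):
    # Algorithm-1-style bucketing: mass of each width-h bucket via CDF differences
    # on a uniform grid over (-L, L]; the mean-preserving shift is omitted here.
    n = int(round(L / h - 0.5))
    grid = np.arange(-n, n + 1) * h
    pmf = np.clip(cdf(grid + h / 2) - cdf(grid - h / 2), 0.0, None)
    return grid, pmf / pmf.sum()

def circular_self_convolve(pmf, k):
    # k-fold circular convolution (addition modulo the grid length) via FFT.
    f = np.fft.rfft(pmf)
    out = np.clip(np.fft.irfft(f ** k, len(pmf)), 0.0, None)
    return out / out.sum()

def delta_of(grid, pmf, eps):
    # delta(eps) = E[(1 - e^{eps - Y})_+] evaluated on the discretized PRV.
    return float(np.sum(pmf * np.clip(1.0 - np.exp(eps - grid), 0.0, None)))

# Gaussian mechanism, sensitivity 1, per-step noise sigma: PRV Y ~ N(mu, 2*mu), mu = 1/(2 sigma^2).
sigma, k1, k2, eps = 1000.0, 256, 256, 1.0        # k = k1 * k2 = 2**16 compositions
mu = 1.0 / (2.0 * sigma**2)
cdf0 = lambda t: norm.cdf(t, loc=mu, scale=np.sqrt(2.0 * mu))

g1, p1 = discretize(cdf0, h=5e-6, L=0.15)         # stage 1: fine mesh, small radius
p1 = circular_self_convolve(p1, k1)
cdf1 = lambda t: np.interp(t, g1, np.cumsum(p1))  # CDF of the stage-1 approximation
g2, p2 = discretize(cdf1, h=5e-4, L=3.0)          # stage 2: coarser mesh, O(1) radius
p2 = circular_self_convolve(p2, k2)

print("two-stage estimate:", delta_of(g2, p2, eps))
# Closed form for comparison: k Gaussian mechanisms with noise sigma compose to one
# with noise s = sigma / sqrt(k), whose curve is Phi(1/(2s) - eps*s) - e^eps * Phi(-1/(2s) - eps*s).
s = sigma / np.sqrt(k1 * k2)
print("closed form:       ",
      norm.cdf(1 / (2 * s) - eps * s) - np.exp(eps) * norm.cdf(-1 / (2 * s) - eps * s))
```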
We run each algorithm for a varying number k of self-compositions of a basic mechanism. The noise parameter of the basic mechanism is chosen such that the final δ(ε) value after k-fold composition is equal to 10^{−6} for each value of k (these values were computed using the Google DP accountant, github.com/google/differential-privacy/tree/main/python). The exact choices of noise parameters used are shown in Figure 3. The comparison for the Laplace mechanism is shown in Figure 1, and for the subsampled Gaussian mechanism in Figure 2.

Figure 3. Noise parameters used for the experiments: (a) for the Laplace mechanism (Figure 1); (b) for the Poisson subsampled Gaussian mechanism (Figure 2).

In terms of accuracy, we find that, for the same choice of εerr and δerr, the estimates returned by Two_Stage_Self_Compose_PRV are nearly identical to the estimates returned by GLW for the subsampled Gaussian mechanism. The estimates for the Laplace mechanism returned by the two algorithms are similar and consistent with each other but, strictly speaking, incomparable with each other.

## 5. Multi-Stage Recursive Composition

We extend the approach in Two_Stage_Self_Compose_PRV to obtain a multi-stage algorithm (Algorithm 3), presented only for the case when k is a power of 2 for ease of notation.

Algorithm 3: Recursive_Self_Compose_PRV
  Input: PRV Y, number of compositions k = 2^t, mesh sizes h1 ≤ ... ≤ ht, truncation parameters L1 ≤ ... ≤ Lt, where Li ∈ hi·(1/2 + ℤ_{>0}) for all i.
  Output: PDF of an approximation Yt for Y^{⊛k}; Yt will be supported on µ + (ht·ℤ ∩ [−Lt, Lt]) for some µ ∈ [−ht/2, ht/2].
  Y0 ← Y | |Y| ≤ L1    (Y conditioned on |Y| ≤ L1)
  for i = 0 to t − 1 do
    Ỹi ← Discretize_RV(Yi, h_{i+1}, L_{i+1})
    Y_{i+1} ← Ỹi ⊛_{L_{i+1}} Ỹi    (FFT convolution)
  end for
  return Yt

Similar to the running time analysis of Two_Stage_Self_Compose_PRV, the running time of Recursive_Self_Compose_PRV is O(Σ_{i=1}^{t} (Li/hi)·log(Li/hi)), assuming O(1)-time access to CDF_Y(·).

Theorem 5.1. For every PRV Y and k = 2^t, the approximation Yt returned by Recursive_Self_Compose_PRV satisfies |Y^{⊛k} − Yt| ≤_{δerr} εerr, using a choice of parameters satisfying, for all i ≤ t,

  h_i ≤ εerr/(t·√(2^{t−i}·log(1/δerr)))  and  L_i ≥ ε_{Y^{⊛2^i}}(εerr·δerr/2^{O(t)}) + h_i·(3 + 2i·√(log(1/δerr))),

as well as L_i ≥ L_{i−1}.

Proof Outline. We establish coupling approximations between consecutive random variables in the sequence Y^{⊛2^t}, Y0^{⊛2^t}, Ỹ0^{⊛2^t}, Y1^{⊛2^{t−1}}, ..., Y_{t−1}^{⊛2}, Ỹ_{t−1}^{⊛2}, Yt, using a similar approach as in the proof of Theorem 3.1.

Running time analysis. The overall running time is at most O(Σ_i (Li/hi)·log(Li/hi)), which can be upper bounded by Õ(εup·polylog(k)·√(log(1/δerr)) / εerr), where εup := max_i 2^{(t−i)/2}·ε_{Y^{⊛2^i}}(εerr·δerr/2^{O(t)}). In many practical regimes of interest, εup/εerr is at most polylog(t) = polyloglog(k). For ease of exposition in the following, we assume that εerr is a small constant, e.g., 0.1, and suppress the dependence on εerr. Suppose the original mechanism M underlying Y satisfies (ε = √(log(1/δerr)/k), δ = o_k(1)/k^{O(1)})-DP. Then, by advanced composition (Dwork et al., 2010), we have that M^{⊗2^i} satisfies (ε·√(2^{i+1}·log(1/δ′)) + 2^i·ε·(e^ε − 1), 2^i·δ + δ′)-DP. If 2^i·δ + δ′ ≤ δerr/2^{O(t)}, then we have that ε_{Y^{⊛2^i}}(δerr/2^{O(t)}) ≲ 2^{−(t−i)/2}·√(log(1/δerr)·log(2^{O(t)}/δerr)). Instantiating this with i = 1, ..., t gives us that εup is at most polylog(k).
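As a companion to the two-stage sketch in Section 4, here is a toy rendering of the Recursive_Self_Compose_PRV loop for the same plain Gaussian mechanism. The per-stage mesh and truncation radius are heuristic (proportional to the standard deviation of Y^{⊛2^i}) rather than the choices of Theorem 5.1, and σ = 1000, t = 16 are arbitrary illustrative values; the point it illustrates is that the number of grid buckets per stage stays constant (about 1,200 here) no matter how large k grows.

```python
import numpy as np
from scipy.stats import norm

def discretize(cdf, h, L):
    # CDF-difference bucketing onto a uniform grid over (-L, L] (cf. Algorithm 1).
    n = int(round(L / h - 0.5))
    grid = np.arange(-n, n + 1) * h
    pmf = np.clip(cdf(grid + h / 2) - cdf(grid - h / 2), 0.0, None)
    return grid, pmf / pmf.sum()

def circular_square(pmf):
    # One doubling step Y_{i+1} = Y_i (+)_L Y_i: a 2-fold circular convolution via FFT.
    f = np.fft.rfft(pmf)
    out = np.clip(np.fft.irfft(f * f, len(pmf)), 0.0, None)
    return out / out.sum()

sigma, t, eps = 1000.0, 16, 1.0                   # k = 2**t = 2**16 compositions
mu = 1.0 / (2.0 * sigma**2)                       # Gaussian-mechanism PRV: N(mu, 2*mu)
cdf = lambda x: norm.cdf(x, loc=mu, scale=np.sqrt(2.0 * mu))

for i in range(t):
    scale = np.sqrt(2.0 * mu * 2**i)              # rough std of Y composed 2**i times
    grid, pmf = discretize(cdf, h=scale / 50, L=12 * scale)   # ~1,200 buckets per stage
    pmf = circular_square(pmf)                    # now approximates Y composed 2**(i+1) times
    cdf = lambda x, g=grid, p=pmf: np.interp(x, g, np.cumsum(p))

# Approximate delta(eps) after 2**t compositions; for sigma = 1000 this should be
# close to the closed-form value printed by the two-stage sketch in Section 4.
print(np.sum(pmf * np.clip(1.0 - np.exp(eps - grid), 0.0, None)))
```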
## 6. Conclusions and Discussion

In this work, we presented an algorithm with a running time and memory usage of polylog(k) for the task of self-composing a broad class of DP mechanisms k times. We also extended our algorithm to the case of composing k different mechanisms in the same class, resulting in a running time and memory usage of Õ(k); both of these improve the state-of-the-art by roughly a factor of √k.

We also demonstrated the practical benefits of our framework compared to the state-of-the-art by evaluating it on the sub-sampled Gaussian mechanism and the Laplace mechanism, both of which are widely used in the literature and in practice.

For future work, it would be interesting to tighten the log factors in our bounds. A related future direction is to make the Recursive_Self_Compose_PRV algorithm more practical, since the current recursive analysis is quite loose. Note that Recursive_Self_Compose_PRV could also be performed with an arity larger than 2; e.g., with an arity of 100, one would perform 100³ compositions as a three-stage composition. For any constant arity, our algorithm gives an asymptotic runtime of O(polylog(k)) as k → ∞; however, for practical considerations, one may also consider adapting the arity with k to tighten the log factors. We avoided doing so for simplicity, since our focus in obtaining an O(polylog(k)) running time was primarily theoretical.

## Acknowledgments

We would like to thank the anonymous reviewers for their thoughtful comments that have improved the quality of the paper.

## References

Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In CCS, pp. 308-318, 2016.

Abowd, J. M. The US Census Bureau adopts differential privacy. In KDD, pp. 2867, 2018.

Apple Differential Privacy Team. Learning with privacy at scale. Apple Machine Learning Journal, 2017.

Bu, Z., Dong, J., Long, Q., and Su, W. J. Deep learning with Gaussian differential privacy. Harvard Data Science Review, 2020(23), 2020.

Bu, Z., Gopi, S., Kulkarni, J., Lee, Y. T., Shen, J. H., and Tantipongpipat, U. Fast and memory efficient differentially private-SGD via JL projections. In NeurIPS, 2021.

Bun, M. and Steinke, T. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In TCC, pp. 635-658, 2016.

Bun, M., Dwork, C., Rothblum, G. N., and Steinke, T. Composable and versatile privacy via truncated CDP. In STOC, pp. 74-86, 2018.

Ding, B., Kulkarni, J., and Yekhanin, S. Collecting telemetry data privately. In NeurIPS, pp. 3571-3580, 2017.

Dong, J., Roth, A., and Su, W. J. Gaussian differential privacy. arXiv:1905.02383, 2019.

Doroshenko, V., Ghazi, B., Kamath, P., Kumar, R., and Manurangsi, P. Connect the dots: Tighter discrete approximations of privacy loss distributions. In PETS (to appear), 2022.

Dwork, C. and Rothblum, G. N. Concentrated differential privacy. arXiv:1603.01887, 2016.

Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., and Naor, M. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pp. 486-503, 2006a.

Dwork, C., McSherry, F., Nissim, K., and Smith, A. D. Calibrating noise to sensitivity in private data analysis. In TCC, pp. 265-284, 2006b.

Dwork, C., Rothblum, G. N., and Vadhan, S. Boosting and differential privacy. In FOCS, pp. 51-60, 2010.

Erlingsson, U., Pihur, V., and Korolova, A. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In CCS, pp. 1054-1067, 2014.

Google. DP Accounting Library. https://github.com/google/differential-privacy/tree/main/python/dp_accounting, 2020.

Gopi, S., Lee, Y. T., and Wutschitz, L. Numerical composition of differential privacy. In NeurIPS, 2021.

Greenberg, A. Apple's differential privacy is about collecting your data, but not your data. Wired, June 13, 2016.

Kairouz, P., Oh, S., and Viswanath, P. The composition theorem for differential privacy. In ICML, pp. 1376-1385, 2015.

Kenthapadi, K. and Tran, T. T. L. PriPeARL: A framework for privacy-preserving analytics and reporting at LinkedIn. In CIKM, pp. 2183-2191, 2018.
Koskela, A. and Honkela, A. Computing differential privacy guarantees for heterogeneous compositions using FFT. arXiv:2102.12412, 2021.

Koskela, A., Jälkö, J., and Honkela, A. Computing tight differential privacy guarantees using FFT. In AISTATS, pp. 2560-2569, 2020.

Koskela, A., Jälkö, J., Prediger, L., and Honkela, A. Tight differential privacy for discrete-valued mechanisms and for the subsampled Gaussian mechanism using FFT. In AISTATS, pp. 3358-3366, 2021.

Lukas Prediger, A. K. Code for computing tight guarantees for differential privacy. https://github.com/DPBayes/PLD-Accountant, 2020.

Meiser, S. and Mohammadi, E. Tight on budget? Tight bounds for r-fold approximate differential privacy. In CCS, pp. 247-264, 2018.

Microsoft. A fast algorithm to optimally compose privacy guarantees of differentially private (DP) mechanisms to arbitrary accuracy. https://github.com/microsoft/prv_accountant, 2021.

Mironov, I. Rényi differential privacy. In CSF, pp. 263-275, 2017.

Murtagh, J. and Vadhan, S. The complexity of computing the optimal composition of differential privacy. In TCC, pp. 157-175, 2016.

Rogers, R., Subramaniam, S., Peng, S., Durfee, D., Lee, S., Kancha, S. K., Sahay, S., and Ahammad, P. LinkedIn's audience engagements API: A privacy preserving data analytics system at scale. Journal of Privacy and Confidentiality, 11(3), 2021.

Shankland, S. How Google tricks itself to protect Chrome user privacy. CNET, October 2014.

Sommer, D. M., Meiser, S., and Mohammadi, E. Privacy loss classes: The central limit theorem in differential privacy. PoPETS, 2019(2):245-269, 2019.

Zhu, Y., Dong, J., and Wang, Y. Optimal accounting of differential privacy via characteristic function. In AISTATS, pp. 4782-4817, 2022.

## A. Proofs of Coupling Approximation Properties

For the sake of completeness, we include proofs of the lemmas we use from Gopi et al. (2021).

Proof of Lemma 2.6(2). There exist couplings (X, Y) and (Y, Z) such that Pr[|X − Y| > h1] ≤ η1 and Pr[|Y − Z| > h2] ≤ η2. From these two couplings, we can construct a coupling of (X, Z): sample X, sample Y from Y|X (given by the coupling (X, Y)), and finally sample Z from Z|Y (given by the coupling (Y, Z)). For this coupling, we have

  Pr[|X − Z| > h1 + h2] ≤ Pr[|(X − Y) + (Y − Z)| > h1 + h2] ≤ Pr[|X − Y| + |Y − Z| > h1 + h2] ≤ Pr[|X − Y| > h1] + Pr[|Y − Z| > h2] ≤ η1 + η2.

Proof of Lemma 2.6(3). The first part follows from the fact that there exists a coupling between X and Y such that Pr[X ≠ Y] = d_TV(X, Y). The second part follows from the first part and the fact that d_TV(X^{⊛k}, Y^{⊛k}) ≤ k·d_TV(X, Y).

Proof of Lemma 2.6(4). Let W = X − Y, where (X, Y) are coupled such that |X − Y − µ| ≤ h with probability 1. Then E[W] = 0 and W ∈ [µ − h, µ + h] with probability 1; hence, letting W1, ..., Wk be i.i.d. copies of W, Hoeffding's inequality gives

  Pr[|W1 + ... + Wk| ≥ t] ≤ 2·exp(−2t² / (k·(2h)²)) = η,  for t = h·√(2k·log(2/η)),

which yields the claimed coupling between X^{⊛k} and Y^{⊛k}.

## B. Two-Stage Heterogeneous Composition

We can handle the composition of k different PRVs Y^1, ..., Y^k with a slight modification to Two_Stage_Self_Compose_PRV, given as Algorithm 4.

Algorithm 4: Two_Stage_Compose_PRV (for heterogeneous compositions)
  Input: PRVs Y^1, ..., Y^k, number of compositions k = k1·k2 + r with r < k1, mesh sizes h1 ≤ h2, truncation parameters L1 ≤ L2, where each Li ∈ hi·(1/2 + ℤ_{>0}).
  Output: PDF of an approximation Ŷ2 for Y^1 ⊛ ... ⊛ Y^k; Ŷ2 will be supported on µ + (h2·ℤ ∩ [−L2, L2]) for some µ ∈ [−h2/2, h2/2].
  for i = 1 to k1·k2 do
    Y^i_0 ← Y^i | |Y^i| ≤ L1    (Y^i conditioned on |Y^i| ≤ L1)
    Ỹ^i_0 ← Discretize_RV(Y^i_0, h1, L1)
  end for
  for i = k1·k2 + 1 to k do
    Y^i_0 ← Y^i | |Y^i| ≤ L2    (Y^i conditioned on |Y^i| ≤ L2)
    Ỹ^i_0 ← Discretize_RV(Y^i_0, h2, L2)
  end for
  for t = 0 to k2 − 1 do
    Y^t_1 ← Ỹ^{t·k1+1}_0 ⊛_{L1} ... ⊛_{L1} Ỹ^{(t+1)·k1}_0    (k1-fold FFT convolution)
    Ỹ^t_1 ← Discretize_RV(Y^t_1, h2, L2)
  end for
  Y2 ← Ỹ^0_1 ⊛_{L2} ... ⊛_{L2} Ỹ^{k2−1}_1    (k2-fold FFT convolution)
  return Y2 ⊛_{L2} Ỹ^{k1·k2+1}_0 ⊛_{L2} ... ⊛_{L2} Ỹ^k_0

The approximation analysis remains similar to before. The main difference is that L1 and L2 are to be chosen as

  L1 ≥ O( max{ max_i ε_{Y^i}(εerr·δerr/(16·k^{1.25})), max_t ε_{Y^{tk1+1:(t+1)k1}}(εerr·δerr/(64·k^{0.75})) } ),
  L2 ≥ O( max{ ε_{Y^{1:k}}(εerr·δerr/16), L1 } ),

where we denote Y^{i:j} := Y^i ⊛ ... ⊛ Y^j. In the case of k1 = k2 = √k (assumed to be an integer) and r = 0, this leads to a final running time of Õ(εup·k^{1.25}·√(log(k/δerr)) / εerr), where εup can be bounded as

  εup ≤ max{ ε_{Y^{1:k}}(εerr·δerr/16), k^{0.25}·max_t ε_{Y^{tk1+1:(t+1)k1}}(εerr·δerr/(64·k^{0.75})), k^{0.25}·max_i ε_{Y^i}(εerr·δerr/(16·k^{1.25})) }.
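To illustrate where the extra factor of k comes from in the heterogeneous case, here is a minimal numpy sketch of the corresponding FFT step on a single common grid (the two-stage grid evolution and per-PRV truncation of Algorithm 4 are omitted, and the inputs are assumed to already be discretized PLDs): each of the k distinct PLDs must be transformed separately before the pointwise product is inverted.

```python
import numpy as np

def compose_heterogeneous(pmfs):
    """Compose k different discretized PLDs given on a common uniform grid.

    Multiplying the k Fourier transforms performs the k-fold (circular)
    convolution in a single inverse FFT; with k distinct inputs, all k forward
    transforms must be computed, which is where the factor of k in the running
    time comes from.
    """
    m = len(pmfs[0])
    acc = np.ones(m // 2 + 1, dtype=complex)   # running product in the Fourier domain
    for pmf in pmfs:
        acc *= np.fft.rfft(pmf)
    out = np.clip(np.fft.irfft(acc, m), 0.0, None)
    return out / out.sum()
```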
## C. Analysis of Recursive_Self_Compose_PRV

We establish coupling approximations between consecutive random variables in the sequence Y^{⊛2^t}, Y0^{⊛2^t}, Ỹ0^{⊛2^t}, Y1^{⊛2^{t−1}}, Ỹ1^{⊛2^{t−1}}, Y2^{⊛2^{t−2}}, ..., Y_{t−1}^{⊛2}, Ỹ_{t−1}^{⊛2}, Yt.

Coupling Y^{⊛2^t} and Y0^{⊛2^t}. Since d_TV(Y, Y0) = Pr[|Y| > L1] =: δ0, we have from Lemma 2.6(3) that

  |Y^{⊛2^t} − Y0^{⊛2^t}| ≤_{2^t·δ0} 0.  (10)

Coupling Y_i^{⊛2^{t−i}} and Ỹ_i^{⊛2^{t−i}}. We have from Proposition 2.7 that E[Yi] = E[Ỹi] and that |Yi − Ỹi − µi| ≤_0 h_{i+1}/2 for some µi satisfying |µi| ≤ h_{i+1}/2. Thus, applying Lemma 2.6(4), we have (for an η to be chosen later) that

  |Y_i^{⊛2^{t−i}} − Ỹ_i^{⊛2^{t−i}}| ≤_η h_{i+1}·√(2^{t−i−1}·log(2/η)) =: h̃_{i+1}.  (11)

Coupling Ỹ_i^{⊛2^{t−i}} and Y_{i+1}^{⊛2^{t−i−1}}. Since d_TV(Ỹi ⊛ Ỹi, Ỹi ⊛_{L_{i+1}} Ỹi) ≤ Pr[|Ỹi ⊛ Ỹi| > L_{i+1}] =: δ_{i+1}, it holds via Lemma 2.6(3) that

  |Ỹ_i^{⊛2^{t−i}} − Y_{i+1}^{⊛2^{t−i−1}}| ≤_{2^{t−i−1}·δ_{i+1}} 0.  (12)

Putting things together. Combining Equations (11) and (12) for i ∈ {0, ..., t−1} using Lemma 2.6(2), we have that

  |Y0^{⊛2^t} − Yt| ≤_{δ′} ε′,  (13)

where

  δ′ := δ′_t := t·η + Σ_{j=1}^{t} 2^{t−j}·δ_j,  (14)

and ε′ := ε′_t := Σ_{j=1}^{t} h_j·√(2^{t−j}·log(2/η)). More generally, the same analysis shows that for any 1 ≤ i ≤ t, |Y0^{⊛2^i} − Yi| ≤_{δ′_i} ε′_i, where

  δ′_i := i·η + Σ_{j=1}^{i} 2^{i−j}·δ_j,  (15)

and ε′_i := Σ_{j=1}^{i} h_j·√(2^{i−j}·log(2/η)).

To simplify the analysis going forward, we fix the choice of the h_i's that we will use, namely h_i := εerr/(t·√(2^{t−i}·log(2/η))) (for an η that will be chosen later). This implies that

  ε′_i = (i/t)·2^{(i−t)/2}·εerr ≤ i·h_{i+1}·√(log(2/η))  and  ε′_i ≤ εerr,

where the last step uses that i ≤ t.

In order to get our final bound, we need to bound the δ_i's in terms of η, the mesh sizes h_i, and the truncation parameters L_i. For ease of notation, we let δ′_0 := 0. We have, for 0 ≤ i < t, that

  δ_{i+1} = Pr[|Ỹi ⊛ Ỹi| > L_{i+1}]
          ≤ Pr[|Yi ⊛ Yi| > L_{i+1} − 2h_{i+1}]    (since |Yi − Ỹi| ≤ h_{i+1} with probability 1)
          ≤ 2·Pr[|Yi − Y0^{⊛2^i}| > ε′_i] + Pr[|Y0^{⊛2^{i+1}}| > L_{i+1} − 2h_{i+1} − 2ε′_i]
          ≤ 2·δ′_i + Pr[|Y^{⊛2^{i+1}}| > L_{i+1} − 2h_{i+1} − 2ε′_i]
          ≤ 2·δ′_i + (4/α_{i+1})·δ_{Y^{⊛2^{i+1}}}(L̃_{i+1}),  (16)

where in the penultimate step we use that the tails of Y0^{⊛2^{i+1}} are no larger than the tails of Y^{⊛2^{i+1}} since Y0 is a truncation of Y, and in the last step we use Lemma 2.4 with L̃_{i+1} := L_{i+1} − 2h_{i+1}·(1 + i·√(log(2/η))) − α_{i+1} (eventually we set α_{i+1} = h_{i+1}).

We show, using an inductive argument, that for C_i := 8^i,

  δ_i ≤ 2·C_i·( η + Σ_{j=1}^{i} (4/α_j)·δ_{Y^{⊛2^j}}(L̃_j) ),  (17)

and

  δ′_i ≤ C_{i+1}·( η + Σ_{j=1}^{i} (4/α_j)·δ_{Y^{⊛2^j}}(L̃_j) ).  (18)

The base case holds since δ_1 ≤ (4/α_1)·δ_{Y^{⊛2}}(L̃_1); note that C_1 > 1. From (15), we have

  δ′_i ≤ i·η + Σ_{j=1}^{i} 2^{i−j}·δ_j
       ≤ i·η + Σ_{j=1}^{i} 2^{i−j}·2·C_j·( η + Σ_{ℓ=1}^{j} (4/α_ℓ)·δ_{Y^{⊛2^ℓ}}(L̃_ℓ) )
       ≤ i·η + Σ_{j=1}^{i} 2^{i−j}·2·C_j·( η + Σ_{j=1}^{i} (4/α_j)·δ_{Y^{⊛2^j}}(L̃_j) )
       ≤ i·η + 4·C_i·( η + Σ_{j=1}^{i} (4/α_j)·δ_{Y^{⊛2^j}}(L̃_j) )
       ≤ C_{i+1}·( η + Σ_{j=1}^{i} (4/α_j)·δ_{Y^{⊛2^j}}(L̃_j) ).

This completes the inductive step for (18). Finally, (16) immediately implies the inductive step for (17).
Putting this together in (14), and setting α_i = h_i, we get

  δ′_t ≤ (2^{O(t)}/εerr)·( η + Σ_{i=1}^{t} δ_{Y^{⊛2^i}}(L̃_i) ).

Finally, combining (13) with (10) using Lemma 2.6(2), we get |Y^{⊛2^t} − Yt| ≤_{δerr} εerr, where

  δerr := (2^{O(t)}/εerr)·( η + Σ_{i=1}^{t} δ_{Y^{⊛2^i}}(L̃_i) ) + 2^{O(t)}·δ_Y(L̃_1),

and εerr := ε′. Thus, we get the desired approximation result for the following choice of parameters:

  η = δerr/2^{O(t)},
  h_i = εerr/(t·√(2^{t−i}·log(1/δerr))),
  L_i ≥ max{ ε_{Y^{⊛2^i}}(εerr·δerr/2^{O(t)}) + h_i·(3 + 2i·√(log(2/η))), L_{i−1} }.

Thus, the overall running time is at most

  O( Σ_i (L_i/h_i)·log(L_i/h_i) ) ≤ Õ( εup·polylog(k)·√(log(1/δerr)) / εerr ),

where εup := max_i 2^{(t−i)/2}·ε_{Y^{⊛2^i}}(εerr·δerr/2^{O(t)}).

Extensions of Recursive_Self_Compose_PRV. Similar to Appendix B, it is possible to extend Recursive_Self_Compose_PRV to handle the case where the number of compositions k is not a power of 2, with a similar asymptotic runtime. It can then be extended to handle heterogeneous compositions of k different mechanisms.