# poission_subsampled_rényi_differential_privacy__06cb76db.pdf Poisson Subsampled Renyi Differential Privacy Yuqing Zhu 1 Yu-Xiang Wang 1 We consider the problem of "privacyamplification by subsampling under the Renyi Differential Privacy (RDP) framework (Mironov, 2017). This is the main workhorse underlying the moments accountant approach for differentially private deep learning (Abadi et al., 2016). Complementing a recent result on this problem that deals with Sampling without Replacement (Wang et al., 2019), we address the Poisson subsampling scheme which selects each data point independently with probability γ. The seemingly minor change allows us to more precisely characterize the RDP of M Poisson Sample. In particular, we prove an exact analytical formula for the case when M is the Gaussian mechanism or the Laplace mechanism. For general M, we prove an upper bound that is optimal up to an additive constant of log(3)/(α 1) and a multiplicative factor of 1 + O(γ). Our result is the first of its kind that makes the moments accountant technique (Abadi et al., 2016) efficient and generally applicable for all Poisson-subsampled mechanisms. An open source implementation is available at https: //github.com/yuxiangw/autodp. 1. Introduction Privacy-amplification by Subsampling and the Renyi Dif- ferential Privacy are the two fundamental techniques that have been driving many exciting recent advances in differentially private learning (Abadi et al., 2016; Park et al., 2016; Papernot et al., 2018; Mc Mahan et al., 2018). One prominent use case of both techniques is the Noisy SGD algorithm (Song et al., 2013; Bassily et al., 2014; Wang et al., 2015; Foulds et al., 2016; Abadi et al., 1UC Santa Barbara, Department of Computer Science. Correspondence to: Yuqing Zhu , Yu-Xiang Wang . Proceedings of the 36 th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s). 2016) for differentially private deep learning. Noisy SGD iteratively updates the model parameters as follows: fi(θt) + Zt where θt is the model parameter at tth step, ηt is the learning rate, fi is the loss function of data point i, is the standard gradient operator, I [n] is a randomly subsampled index set and Zt N(0, σ2I). When fi(θt) is bounded (or clipped) in ℓ2-norm, the Gaussian noise-adding procedure is known to ensure (ϵ, δ)-DP for this iteration. ϵ, δ are nonnegative numbers that quantifies the privacy loss incurred from running the algorithm (the smaller the better). But this is clearly not good enough as it takes many iterations to learn the model, and the privacy guarantee deteriorates as the algorithm continues. This is where the privacyamplification and RDP become useful. The principle of privacy-amplification by subsampling works seamlessly with Noisy SGD as it allows us to exploit the randomness in choosing the minibatch I for the interest of a stronger privacy guarantee. Roughly speaking, if the minibatch I is obtained by selecting each data point with probability γ, then we can amplify the privacy guarantee to a stronger (O(γϵ), γδ)-DP. The RDP framework provides a complementary set of benefits that reduce the overall privacy loss over the multiple iterations we run Noisy SGD. Notice that the vanilla strong-composition is stated for any (ϵ, δ)-DP algorithm. By using the moments accountant techniques (Abadi et al., 2016) that keep track of the RDP of a specific algorithm subsampled-Gaussian mechanism, one can hope to more efficiently use the privacy budget than what an optimal algorithm would be able to using only (ϵ, δ)-DP (Kairouz et al., 2015). In general, however, calculating the RDP for the procedure that first subsamples the data set then apply a randomized mechanism M is highly non-trivial. An exact analytical formula is not known even for the widely-used subsampled Gaussian mechanism. Existing asymptotic bounds are typically off by a constant, and only apply to a restricted subset of the parameter regimes. To get the most mileage out of the moments accountant, practitioners often resort to numerical integration which calculates and keep track of a Poission Subsampled RDP fixed list of RDP values (Abadi et al., 2016; Park et al., 2016). Wang et al. (2019) took a first stab at this problem and provided a general RDP-amplification bound that applies to any M. Their result, however, is still a constant factor away from being optimal. A more subtle difference is that Wang et al. (2019) considered Subsampling without Replacements finding a random subset of size m at random rather than the Poisson subsampling that was used by Abadi et al. (2016), which includes each data points independently at random with probability γ. The difference is substantial enough that it introduces several new technical hurdles. In this paper, we provide the first general result of privacyamplification of RDP via Poisson subsampling. Our main contributions are the following. 1. First, we prove a nearly optimal upper bound on the RDP of M Poisson Sample as a function of the sampling probability γ, RDP order α, and the RDP of M up to α. The bound matches a lower bound up to an additive factor of log(3)/(α 1), where α is the order of RDP. When α is small relative to 1/γ with γ being the sampling probability, our upper bound is optimal up to a multiplicative factor of 1 + O(γαeϵ(α)). The result tightens and generalizes Lemma 3 of (Abadi et al., 2016), which addresses only the case when M is Gaussian mechanism and applies only to the cases when γ is very small. 2. Second, we identify a novel condition on the odd order Pearson-Vajda χα-Divergences under which we can exactly attain the lower bound. We show that Gaussian mechanism and Laplace mechanism fall under this category, but there exists M that samples from an exponential family distribution where the condition is false and the lower bound is not attainable. Practically, our analytical characterization simplifies the moments accountant approach for differentially private deep learning by avoiding numerical integration and pre-specifying a list of moments. On the theory front, our result corroborates the observation of Wang et al. (2019) that the Pearson-Vajda Divergences are natural quantities for understanding the subsampling in differential privacy. 3. Lastly, knowing that exactly evaluating the analytical subsampled RDP bound of αth order takes α calls of the RDP subroutine ϵM( ), we propose an efficiently τ-term approximation scheme that uses only τ call of ϵM( ). We conduct numerical experiments to compare our general bounds, tight bound, and τ-term approximations for a variety of problem setup and show- casing the use of these bounds in moments accountantbased strong composition. 2. Background and Problem Setup In this section, we provide some background on differential privacy, privacy-amplification by subsampling, RDP and the moments accountant technique so as to formally set up the problem. We will also introduce symbols and notations as we proceed. Differential Privacy. Let X be the space of all data sets. One representation of such a data set is to take X = {0, 1}N where N is the size of the population and each X X is an indicator vector that describes each individual s participation in the data set. We say X, X X are neighbors if X can be constructed by adding or removing one individual from X, or equivalently, X X 1 = 1. Definition 1 (Differential Privacy (Dwork et al., 2006)). A randomized algorithm M : X Θ is (ϵ, δ)-DP (differentially private) if for every pair of neighboring datasets X, X X, and every possible (measurable) output set E Θ the following inequality holds: Pr[M(X) E] eϵ Pr[M(X ) E] + δ. The definition places an information-theoretic limit on an adversary s ability to infer whether the input dataset is X or X , and as a result, guarantees a degree of plausible deniability to any individual in the population. ϵ, δ are privacy loss parameters that quantify the strength of privacy protection. In practice, we consider the privacy guarantee marginally meaningful if ϵ 1 and δ = o(1/n)1, where n denotes the size of data set and o( ) is the standard little-o notation. When δ = 0, we say that M obeys ϵ-(pure) DP. One important property of DP relevant to this paper is that it composes gracefully over multiple access. Roughly speaking, if we run k sequentially chosen (ϵ, δ)-DP algorithm on a dataset, the overall composed privacy loss is ( O( kϵ), kδ + δ )-DP where the O notation hides logarithmic terms in k, 1/δ and 1/δ . Part of the reason for writing this paper is to enable sharper algorithm-dependent composition for a popular class of algorithms that subsamples the data first. Before we get there, let us describe the RDP framework and the moments accountant that the make these algorithm-dependent composition possible. Renyi Differential Privacy and Moments Accountant. Renyi differential privacy (RDP) is a refinement of DP that uses Renyi-divergence as a distance metric in the place of the sup-divergence. Definition 2 (Rényi Differential Privacy (Mironov, 2017)). 1It is traditionally required that δ to be cryptographically small, e.g., o(poly(1/n)), but in practice, with a big data set, δ = 1/n2 is typically considered acceptable. Poission Subsampled RDP Figure 1. Illustration of the subsampled-mechanism and the key underlying idea that enables privacy-amplification . The diagram on the left illustrate the two parts of randomization. Part (1): Poisson Sample: Each person toss a random coin to select whether they are included in the data set; Part (2): The subsampled data set is analyzed by a randomized algorithm M. The figure on the right illustrates the fact that the distribution of output is a mixture distribution indexed by the different potential subset selected by the subsampling, and that when we change the original data set by adding or removing one person, only a small fraction of the mixture components that happen to be affected by that change will be different, hus opening up the possibility of privacy amplifying . We say that a mechanism M is (α, ϵ)-RDP with order α (1, ) if for all neighboring datasets X, X Dα(M(X) M(X )) := 1 α 1 log Eθ M(X ) $% p M(X)(θ) In this paper, we do not treat each α in isolation but instead take a functional view of RDP where we use ϵM(α) to denote that randomized algorithm M obeys (α, ϵM(α))-RDP. The function ϵM( ) can be viewed as a more elaborate description of the privacy loss incurred by running M. It subsumes pure-DP as an RDP algorithm is ϵ(+ )-DP. The moments accountant technique (Abadi et al., 2016) can be thought of as a data structure that keeps track of the RDP (function) for the sequence of data accesses. Composition is trivial in RDP as ϵM1 M2( ) = [ϵM1 + ϵM2]( ). At any given time, let the composition of all algorithms being M, the moments accountant can be used to produce an (ϵ, δ)-DP certificate using δ ϵ : ϵ(δ) = min α 1 + ϵM(α 1), (2) ϵ δ : δ(ϵ) = min α>1 e(α 1)(ϵM(α 1) ϵ). (3) This approach is simpler and often produces more favorable composed privacy parameters than the advanced composition approach for (ϵ, δ)-DP. As the moments accountant gain popularity, many classes of randomized algorithms with exact analytical RDP are becoming available, e.g., the exponential family mechanisms (Geumlek et al., 2017). As a side note, the initial moments accountant (Abadi et al., 2016) keeps track of a vector of log-moment (equivalent to RDP up to a rescaling) associated with a pre-defined list of order αs. Wang et al. (2019) observes that these optimization problems are unimodal and proposes an analytical moments accountant that solves (2) and (3) using bisections can be solved using bisection with a doubling trick. This avoids the need to pre-define the list of moments to track. Wang et al. (2019) also observes that (α 1)ϵ(α) is a convex function in α and any such discretization scheme (e.g., all integer α) can be extended into a continuous function in α by simply doing linear interpolation. Privacy amplification by subsampling. As we discussed in the introduction, privacy amplification by subsampling is the other workhorse (besides RDP / moments accountant) that drove much of the recent advances in differentially private deep learning. We would like to add that, it was also used as a key technical hammer for analyzing DP algorithms for empirical risk minimization (Bassily et al., 2014) and Bayesian learning (Wang et al., 2015), as well as for studying learning-theoretic questions with differential privacy constraints (Kasiviswanathan et al., 2011; Beimel et al., 2013; Bun et al., 2015; Wang et al., 2016). We now furnish a bit more details on this central property and highlight some subtleties in the types. The privacy amplification lemma was derived in (Kasiviswanathan et al., 2011; Beimel et al., 2013; Li et al., 2012), where all three authors adopted what Balle et al. (2018) calls Poisson subsampling: Definition 3 (Poisson Sample). Given a dataset X, the procedure Poisson Sample outputs a subset of the data {xi|σi = 1, i [n]} by sampling σi Ber(γ) independently for i = 1, ..., n. Poission Subsampled RDP The procedure is equivalent to the sampling without replacement scheme with m Binomial(γ, n). At the limit of n , γ 0 while γn λ, the Binomial distribution converges to a Poisson distribution with parameter λ. This is probably the reason why it is called Poisson sampling to begin with2. Here we cite the tight privacy amplification bound for Poisson Sample as it first appears. Lemma 4 ((Li et al., 2012, Theorem 1) ). If M is (ϵ, δ)-DP, then M that applies M Poisson Sample obeys (ϵ , δ )-DP with ϵ = log 1 + γ(eϵ 1) and δ = γδ. The lemma implies that if the base privacy loss ϵ 1, then the amplified privacy loss obeys that ϵ 2γϵ. Poisson subsampling is different from the sampling without replacement scheme that outputs a subset with size γn uniformly at random. Interestingly, it was shown that the latter also enjoys the same bound with respect to the replace-one version of the DP definition. In general, we find that the add/remove version of the DP definition works more naturally with Poisson sampling, while the replace-one version works well with sampling without replacement . We defer a more comprehensive account of the subsampling lemma for (ϵ, δ)-DP to (Balle & Wang, 2018) and the references therein. Subsampled RDP and friends. A small body of recent work focuses on deriving algorithm-specific subsampling Lemma so that this classical wisdom can be combined with more modern techniques such as RDP and Concentrated DIfferential privacy (CDP) (Bun & Steinke, 2016) (also (Dwork & Rothblum, 2016)). Abadi et al. (2016) obtains the first such results for subsampled-Gaussian mechanism under Poisson subsampling. Wang et al. (2019) provides a general subsampled RDP bound that supports any M but under the sampling without replacement scheme. The objective of this paper is to come up with results of a similar flavor for the Poisson sampling scheme. The main differences in our setting include: (a) Poisson sampling goes naturally with add/remove ver- sion of the DP definition, which is independent to the size of the data. (b) The size of the random subset m itself is a Binomial random variable. (c) It is asymmetric, the Renyi divergence of P against Q is different from the Renyi divergence of Q against P. 2We noticed that the original definition of Poisson sampling in the survey sampling theory is slightly more general. It allows a different probability of sampling each person (Särndal et al., 2003). Our results apply trivially to that setting as well with a personalized RDP bound for individual i that depends on γi. As we will see in the our results, the third difference brings about some major technical challenges. Finally, Bun et al. (2018) studies subsampling in CDP with a conclusion that subsampling does not amplify the CDP parameters in general. A truncated version of CDP was then proposed, called t CDP, which does get amplified up to a threshold. CDP and t CDP are closely related to RDP in that they are linear upper bounds of ϵ(α) on (1, ] and on (1, τ] for some threshold τ respectively. RDP captures finer information about the underlying mechanism. The experimental results in (Wang et al., 2019) suggest that unlike the case for the Gaussian mechanism (in which case CDP is tight), there isn t a good linear approximation of ϵ(α) for the subsampled-Gaussian mechanism due to the phase transition. Our results on the Poisson-sampling model echoes the same phenomenon. More symbols and notations. We end the section with a quick summary of the notations that we introduced. X, X denotes two neighboring datasets. M is a randomized algorithm and ϵM( ) is the RDP function of M (the subscript may be dropped when it s clear from the context). n, m are reserved for the size of the original and subsampled data. We note that neither is public and m is random. Greek letters α, γ, ϵ, δ are reserved for the order of RDP, the sampling probability as well as the two privacy loss parameters. M Poisson Sample(X) is used to mean the composition function M(Poisson Sample(X)). Let us also define a few shorthands. We will denote p to be the density function of M Poisson Sample(X), and q to be the density from data set M Poisson Sample(X ). Similarly, we will define µ0 and µ1 as two generic density functions of M(X) and M(X ). 3. Main results Before we present our main result, we would like to warn the readers that the presented bounds might not be as interpretable. We argue that this is a feature rather than an artifact of our proof because we need the messiness to state the bound exactly. These bounds are meant to be implemented to achieve the tightest possible privacy composition numerically in the Moments Accountant, rather than being made easily interpretable. After all, constant matters in differential privacy! For the interest of interpretability, we provide figures that demonstrate the behaviors of the bound for prototypical mechanisms in practice. Theorem 5 (General upper bound). Let M be any randomized algorithm that obeys (α, ϵ(α))-RDP. Let γ be the sub- Poission Subsampled RDP sampling probability and then we have for integer α 2, ϵM Poisson Sample(α) 1 α 1 log (1 γ)α 1(αγ γ + 1) γ2(1 γ)α 2eϵ(2) + 3 (1 γ)α ℓγℓe(ℓ 1)ϵ(ℓ) The proof is revealing but technically involved. One main difference from Wang et al. (2019) is that in Poisson sampling we need to bound both Dα(p q) and Dα(q p). Existing arguments via the quasi-convexity of Renyi divergence allows us to easily bound Dα(p q) tightly using RDP for the case when p has one more data points than q, but Dα(q p) turns out to be very tricky. A big part of our novelty in the proof is about analyzing Dα(q p). We defer more details of the proof to Appendix A. Theorem 6 (Lower bound). M and pairs of adjacent data sets such that ϵM Poisson Sample(α) 1 α 1 log (1 γ)α 1(αγ γ + 1) (1 γ)α ℓγℓe(ℓ 1)ϵ(ℓ) Proof. The construction effectively follows Proposition 11 of (Wang et al., 2019), while adjusting for the details. Let M be Laplace noise adding of a counting query f(X ) = , x X 1[x > 0]. Let everyone in the data set X obeys that x < 0. In the adjacent dataset X = X {xn+1} with xn+1 > 0. Let µ0 be the Laplace distribution centered at 0, µ1 be the one that is centered at 1. Then we know that M(X ) µ0 = q and M(X) (1 γ)µ0 + γµ1 = p. It follows that Eq[(p/q)α] = Eµ0[((1 γ) + γµ1/µ0)α] (1 γ)α ℓγℓEµ0[(µ1/µ0)ℓ]. By definition Eµ0[(µ1/µ0)ℓ] = e(ℓ 1)Dℓ(µ0 µ1). which the RDP bonud ϵ(ℓ) is attained by µ0, µ1, then we have constructed one pair of p, q, which implies a lower bound for RDP of M Poisson Sample. Note that the only difference between the upper and lower bounds are a factor of 3 on the third summand in side the logarithm. In the regime when γαeϵ(α) 1 (in which case the third summand is much smaller than the second), the upper and lower bound match up to a multiplicative factor of 1 + O(γαeϵ(α)). In all other regimes, the upper and lower bounds match up to an additive factor of log(3) α 1 . The results suggest that we can construct a nearly optimal moment accountant. Remark 7 (Nearly optimal Moment Accountant). This implies that if any algorithm with the help of an oracle that calculates the exact RDP for M is able to prove an (ϵ, δ)- DP for the Poisson subsampled RDP mechanism, then the RDP upper bound we construct using Theorem 5 will lead to an (ϵ, 3δ)-DP bound for the same mechanism. Moreover, we show that for many randomized algorithms (including the popular Gaussian mechanism and Laplace mechanism) that satisfy an additional assumption, we can strengthen the upper bound further and exactly match the lower bound for all α. Theorem 8 (Tight upper bound). Let M be a randomized algorithm with up to αth order RDP ϵ(α) < . If for all adjacent data sets X X , and all odd 3 ℓ α, Dχℓ(M(X) M(X )) := EM(X ) (4) then the lower bound in Theorem 6 is also an upper bound. The proof of this theorem is presented in Appendix A In the theorem, M(X) M(X ) 1 is a linearized ver- sion of the privacy random variable log M(X) M(X ) and Dχℓ(M(X) M(X )) is the Pearson-Vajda χℓpseudodivergence (Vajda, 1973), which has more recently been used to approximate any f-divergence in (Nielsen & Nock, 2014). The related |χℓ| version of this divergence is identified as the key quantity natural for studying subsampling without replacement (Wang et al., 2019). The non-negativity condition requires, roughly speaking, the distribution of the linearized privacy loss random variable M(X) M(X ) 1 to be skewed to the right. The following Lemma provides one way to think about it. Lemma 9. Let π, µ be two measures that are absolute continuous w.r.t. each other and let α 1. Eµ[(π/µ 1)α] = Eπ[(π/µ 1)α 1] Eµ[(π/µ 1)α 1]. Proof. Eµ[(π/µ 1)α] = Eµ[(π/µ 1)(π/µ 1)α 1] = Eπ[(π/µ 1)α 1] Eµ[(π/µ 1)α 1] The lemma implies that (4) holds if an only if for all even 2 ℓ α for all pairs of X, X . This should intuitively be true for most mechanisms because we know from nonnegativity that M(X) M(X ) 1 1, which poses a hard limit to which you can be skewed to the Poission Subsampled RDP Figure 2. Negative χα divergence in Poisson distribution. left. Indeed, we can show that the condition is true for the two most widely used DP procedure. Proposition 10. Condition (4) is true when M is the Gaussian mechanism and Laplace mechanism. The proof, given in Appendix B, is interesting and can be used as recipes to qualify other mechanisms for the tight bound. The main difficulty of checking this condition is in searching all pairs of neighboring data sets and identify one pair that minimizes the odd order moment. The convenient property of noise-adding procedure is that typically the search reduces to a univariate optimization problem of the sensitivity parameter. One may ask, whether the condition is true in general for any randomized algorithm M? The answer is unfortunately no. For example, Nielsen & Nock (2014) constructed an example of two Poisson distributions with negative χ3-divergence (see Figure 2 for an illustration.) This also implies that for some M, we can derive a lower bound that is greater than that in Theorem 6 by simply toggling the order of X and X . As a result, if one needs to work out the tight bound, the condition needs to be checked for each M separately. Finally, we address the computational issue of implementing our bounds in moments accountant. Naive implementation of Theorem 5 will easily suffer from overflow or underflow and it takes α calls to the RDP oracle ϵM( ) before we can evaluate one RDP of the subsampled mechanism at α. This is highly undesirable. The following theorem provides a fast approximation bound that can be evaluated with just 2τ calls to the RDP oracle of M. The idea is that since either it is the first few terms that dominates the sum or the last few terms that dominates the sum, we can just compute them exactly and calculate the remainder terms with a more easily computable upper bound. Theorem 11 (τ-term approximation). The expression in Theorem 6 (therefore Theorem 8) can be bounded by ϵM Poisson Sample(α) 1 α 1 log (1 γ)α(1 e ϵ(α τ)) + e ϵ(α τ)(1 γ + γeϵ(α τ))α (1 γ)α ℓγℓ(e(ℓ 1)ϵ(α τ) e(ℓ 1)ϵ(ℓ)) (1 γ)α ℓγℓ(e(ℓ 1)ϵ(ℓ) e(ℓ 1)ϵ(α τ)) A similar bound can be stated for the general upper bound in Theorem 5, which we defer to Appendix C. Remark 12 (Numerical stability). The bounds in Theorem 5 and 8 can be written as the log sum exp form, i.e., softmax. The numerically stable way of evaluating log sum exp is well-known. The bound in Theorem 11, though, have both positive terms and negative terms. We choose to represent the summands in the log term by a sign, and the logarithm of its magnitude. This makes it possible for us to use log diff exp and compute the bound in a numerically stable way. 4. Experiments and Discussion In this section, we conduct various numerical experiments to illustrate the behaviors of the RDP for subsampled mechanisms and showcasing its usage in moments accountant for composition. We will have three set of experiments. (1) We will just plot our RDP bounds (Theorem 5, Theorem 6) as a function of α. (2) We will compare how close the τterm approximations approximate the actual bound.(5) (3) We will build our moments accountant and illustrate the stronger composition that we get out of our tight bound. Specifically, for each of the experiments above, we replicate the experimental setup of which takes the base mechanism M to be Gaussian mechanism, Laplace mechanism and Randomized Response mechanism. Their RDP formula are worked out analytically (Mironov, 2017) below: ϵGaussian(α) = α 2σ2 , ϵLaplace(α) = 1 α 1 log(( α 2α 1 )e b for α > 1, ϵRand Resp(α) = 1 α 1 log(pα(1 p)1 α + (1 p)αp1 α) for α > 1, where parameter σ, b, p are the standard parameters for Gaussian, Laplace and Bernoulli distributions. Following Wang et al. (2019), we will have two sets of experiments with high noise, high privacy setting σ = 5, b = 2, and p = 0.6 and low noise, low privacy setting using σ = 1, b = 0.5, p = 0.9. These parameters are chosen such that the ϵ-DP or (ϵ, δ)-DP of the base mechanisms are roughly ϵ 0.5 in the high privacy setting or ϵ 2 in the low privacy setting. Poission Subsampled RDP (a) Subsampled Gaussian with σ = 5 (b) Subsampled Laplace with b = 2 (c) Subsampled Random Response with p = 0.6 (d) Subsampled Gaussian with σ = 1 (e) Subsampled Laplace with b = 0.5 (f) Subsampled Random Response with p = 0.9 Figure 3. The RDP parameter (ϵ(α)) of three subsampled mechanisms as a function of order α.We compare the general upper bound with other methods under high and low privcy regime, general upper bound is obtained through Theorem 5. the corresponding tight upper bound in possion subsample case is represented as the black curve. We will include benchmarks when appropriate. For example, we will compare to Lemma 3 of Abadi et al. (2016) whenever we work with Gaussian mechanisms. Also, we will compare to the upper bound of Wang et al. (2019) for subsample without replacements. Finally, we will include the more traditional approaches of tracking and composing privacy losses using simply (ϵ, δ)-DP. We will see that while the moments accountant approach does not dominate the traditional approach, it does substantially reduces the aggregate privacy loss for almost all experiments when we compose over a large number of rounds. Comparing RDP bounds. The results on the RDP bounds are shown in the Figure 3. First of all, the RDP of subsampled Gaussian mechanism behaves very differently from that of the Laplace mechanism, There is a phase transition about the subsampled-Gaussian mechanism that happens around αγeϵ(α) γ 1. Before the phase transition the RDP is roughly O(γ2α2(eϵ(2) 1)), the RDP quickly converges to ϵ(α), which implies that subsampling has no effects. This kind of behaviors cannot be captured through CDP. On the other hand, for ϵ-DP mechanisms, the RDP increases linearly with α before being capped what the standard privacy amplification by Lemma 4. Relative to existing bounds, our tight bound closes the constant gap, while our general bound is also nearly optimal as we predict. It is worth noting that the bound in Abadi et al. (2016) only applies to up to a threshold of α. τ-term approximation. Figure 5 illustrates the quality of approximation as we increase τ. With τ = 50, the results nearly matches the RDP bound everywhere, except that in the Gaussian case, the phase-transition happened a little bit earlier. Usage in moments accountant. The experiments on moments accountant are shown in Figure 4. Our result are compared to the optimal strong composition (Kairouz et al., 2015) with parameters optimally tuned according to Wang et al. (2019). As we can see, all bounds based on the moments accountants eventually scales proportional to k. Moments accountant techniques with the tight bound end up winning by a constant factor. It is worth noting that in the Gaussian case, moments accountant only starts to perform better than traditional approaches after composing for 1000 times. Also, the version of moments accountant using the theoretical bounds from Abadi et al. (2016) gave substantially worse results3. Finally the general RDP bound 3We implemented the bound from the proof of Abadi et al. (2016) s Lemma 3 for fair comparison. According to Section 3.2 of Abadi et al. (2016), their experiments use numerical integration to approximate the moments. See more discussion on this in Appendix D. Poission Subsampled RDP (a) Gaussian mechanism (σ = 5) (b) Laplace mechanism (b = 2) (c) Randomized response (p = 0.6) (d) Gaussian mechanism (σ = 1) (e) Laplace mechanism (b = 0.5) (f) Randomized response (p = 0.9) Figure 4. Illustration of the use of our bounds in moments accountant. We plot the the privacy loss ϵ for δ = 1e 8 (using (2)) after k rounds of composition. The x-axis is the number of composition k,and the y-axis is the privacy loss after k s composition. The green curve is based on general upper bound for all parametrized random mechanism obtained through Theorem 5. Short hand MA refer to moment accountant . The upper three figures are in high privacy regime with parameter σ = 5, b = 2, p = 0.6, the lower three are in low privacy regime with σ = 1, b = 0.5, p = 0.9. perform as well as the tight bound when k is large (thanks to the tightness for small α). 5. Conclusion In this paper, we study the problem of privacyamplification by poisson subsampling, which involves "add/remove" scheme instead of replacement strategy. Specifically, we derive a tight upper bound for M Poisson Sample for any mechanism satisfying that their odd order Pearson-Vajda χα-Divergences are nonnegative. We showed that Gaussian mechanism and Laplace mechanism have this property, as a result, finding the exact analytical expression for the Poisson subsampled Gaussian mechanism that has seen significant application in differentially private deep learning. Our results imply that we can completely avoid numerical integration in moments accounts and track the entire range of α without paying unbounded memory. In addition, we propose an efficiently τ-term approximation scheme which only calculates the first and last τ terms in the Binomial expansion when evaluating the RDP of subsampled mechanisms. This greatly simplifies the computation for computing ϵ given δ as is used in the moments accountant. The experiment result of τ-term approximate part reveals that approximate bound matches up the lower bound quickly even for a relative small τ. Future work includes making use of the exact subsampled RDP bounds to tighten the existing results that made use of subsampled-mechanisms, coming up with more general recipe to automatically check the nonnegativity condition on the odd-order Pearson-Vajda χα-Divergences and design differentially private learning algorithms with more complex and hetereogenous building blocks. Acknowledgements YZ and YW were supported by a start-up grant of UCSB Computer Science Department and a Machine Learning Research Award from Amazon Web Services. The authors thank Borja Balle, Shiva Kasiviswanathan, Alex Smola, Mu Li and Kunal Talwar for helpful discussions. Abadi, M., Chu, A., Goodfellow, I., Mc Mahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In ACM SIGSAC Conference on Computer and Communications Security (CCS-16), pp. 308 318. ACM, 2016. Balle, B. and Wang, Y.-X. Improving gaussian mechanism for differential privacy: Analytical calibration and op- Poission Subsampled RDP timal denoising. International Conference in Machine Learning (ICML), 2018. Balle, B., Barthe, G., and Gaboardi, M. Privacy amplifi- cation by subsampling: Tight analyses via couplings and divergences. In Advances in Neural Information Processing Systems (NIPS-18), 2018. Bassily, R., Smith, A., and Thakurta, A. Private empirical risk minimization: Efficient algorithms and tight error bounds. In Foundations of Computer Science (FOCS14), pp. 464 473. IEEE, 2014. Beimel, A., Nissim, K., and Stemmer, U. Characterizing the sample complexity of private learners. In Conference on Innovations in Theoretical Computer Science (ITCS13), pp. 97 110. ACM, 2013. Bun, M. and Steinke, T. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference, pp. 635 658. Springer, 2016. Bun, M., Nissim, K., Stemmer, U., and Vadhan, S. Differ- entially private release and learning of threshold functions. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pp. 634 649. IEEE, 2015. Bun, M., Dwork, C., Rothblum, G. N., and Steinke, T. Composable and versatile privacy via truncated cdp. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pp. 74 86. ACM, 2018. Dwork, C. and Rothblum, G. N. Concentrated differential privacy. ar Xiv preprint ar Xiv:1603.01887, 2016. Dwork, C., Mc Sherry, F., Nissim, K., and Smith, A. Cal- ibrating noise to sensitivity in private data analysis. In Theory of cryptography, pp. 265 284. Springer, 2006. Foulds, J., Geumlek, J., Welling, M., and Chaudhuri, K. On the theory and practice of privacy-preserving bayesian data analysis. In Conference on Uncertainty in Artificial Intelligence (UAI-16), pp. 192 201. AUAI Press, 2016. Geumlek, J., Song, S., and Chaudhuri, K. Renyi differential privacy mechanisms for posterior sampling. In Advances in Neural Information Processing Systems (NIPS-17), pp. 5295 5304, 2017. Kairouz, P., Oh, S., and Viswanath, P. The composition theorem for differential privacy. In International Conference on Machine Learning (ICML-15), 2015. Kasiviswanathan, S. P., Lee, H. K., Nissim, K., Raskhod- nikova, S., and Smith, A. What can we learn privately? SIAM Journal on Computing, 40(3):793 826, 2011. Li, N., Qardaji, W., and Su, D. On sampling, anonymiza- tion, and differential privacy or, k-anonymization meets differential privacy. In The 7th ACM Symposium on Information, Computer and Communications Security, pp. 32 33. ACM, 2012. Mc Mahan, H. B., Ramage, D., Talwar, K., and Zhang, L. Learning differentially private recurrent language models. In International Conference on Learning Representations (ICLR-18), 2018. Mironov, I. Rényi differential privacy. In Computer Secu- rity Foundations Symposium (CSF), 2017 IEEE 30th, pp. 263 275. IEEE, 2017. Nielsen, F. and Nock, R. On the chi square and higher-order chi distances for approximating f-divergences. IEEE Signal Processing Letters, 21(1):10 13, 2014. Papernot, N., Song, S., Mironov, I., Raghunathan, A., Tal- war, K., and Úlfar Erlingsson. Scalable private learning with pate. In International Conference on Learning Representations (ICLR-18), 2018. Park, M., Foulds, J., Chaudhuri, K., and Welling, M. Vari- ational bayes in private settings (vips). ar Xiv preprint ar Xiv:1611.00340, 2016. Särndal, C.-E., Swensson, B., and Wretman, J. Model as- sisted survey sampling. Springer Science & Business Media, 2003. Song, S., Chaudhuri, K., and Sarwate, A. D. Stochastic gradient descent with differentially private updates. In Conference on Signal and Information Processing, 2013. Vajda, I. χα-divergence and generalized fisher information. In Prague Conference on Information Theory, Statistical Decision Functions and Random Processes, pp. 223. Academia, 1973. Wang, Y.-X., Fienberg, S., and Smola, A. Privacy for free: Posterior sampling and stochastic gradient monte carlo. In International Conference on Machine Learning (ICML-15), pp. 2493 2502, 2015. Wang, Y.-X., Lei, J., and Fienberg, S. E. Learning with differential privacy: Stability, learnability and the sufficiency and necessity of erm principle. Journal of Machine Learning Research, 17(183):1 40, 2016. Wang, Y.-X., Balle, B., and Kasiviswanathan, S. Subsam- pled rényi differential privacy and analytical moments accountant. In International Conference on Artificial Intelligence and Statistics (AISTATS-19), 2019.