# The Saddle-Point Method in Differential Privacy

Wael Alghamdi 1, Juan Felipe Gomez 1, Shahab Asoodeh 2, Flavio P. Calmon 1, Oliver Kosut 3, Lalitha Sankar 3

1 School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, USA. 2 Department of Computing and Software, McMaster University, Hamilton, Ontario, Canada. 3 School of Electrical, Computer, and Energy Engineering, Arizona State University, Tempe, Arizona, USA. Correspondence to: Wael Alghamdi. Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).

We characterize the differential privacy guarantees of privacy mechanisms in the large-composition regime, i.e., when a privacy mechanism is sequentially applied a large number of times to sensitive data. Via exponentially tilting the privacy loss random variable, we derive a new formula for the privacy curve, expressing it as a contour integral over an integration path that runs parallel to the imaginary axis with a free real-axis intercept. Then, using the method of steepest descent from mathematical physics, we demonstrate that choosing the saddle-point as the real-axis intercept yields accurate closed-form approximations of the desired contour integral. This procedure, dubbed the saddle-point accountant (SPA), yields a constant-time accurate approximation of the privacy curve. Theoretically, our results can be viewed as a refinement of both Gaussian Differential Privacy and the moments accountant method found in Rényi Differential Privacy. In practice, we demonstrate through numerical experiments that the SPA provides a precise approximation of privacy guarantees competitive with purely numerical methods (such as FFT-based accountants), while enjoying closed-form mathematical expressions.

1. Introduction

Differential privacy (DP) is a widely adopted standard for privacy-preserving machine learning (ML). Differentially private mechanisms used in ML tasks typically operate in the large-composition regime, where mechanisms are sequentially applied many times to sensitive data. For example, when training neural networks using stochastic gradient descent, DP can be ensured by clipping and adding Gaussian noise to each gradient update (Abadi et al., 2016). Here, a DP mechanism (gradient clipping plus noise) is applied hundreds or thousands of times to the training data. Quantifying the privacy loss after a large number of compositions of DP mechanisms is a central challenge in privacy-preserving ML. A key result by Murtagh & Vadhan (2016, Theorem 1.5) states that computing exact privacy parameters under composition is in general #P-complete, hence infeasible. This challenge has spurred several follow-up works on privacy accounting, e.g., (Dong et al., 2022; Koskela et al., 2020; Koskela & Honkela, 2021; Koskela et al., 2021; Gopi et al., 2021; Ghazi et al., 2022; Doroshenko et al., 2022), which compute upper bounds on the privacy budget parameters (ε, δ) in DP (see (6) for a formal definition). The currently available accountants have several limitations. The accountants that have closed-form formulas, thereby attaining constant (in the number of compositions) runtimes, such as the moments accountant (Abadi et al., 2016; Mironov, 2017) and the CLT-based Gaussian-DP accountant (Bu et al., 2020), suffer from either overestimating or underestimating, respectively, the privacy parameters.
On the other hand, convolution-based accountants, such as FFT-based approaches (Koskela et al., 2020; Gopi et al., 2021), while working well in practice, do not have constant runtimes, cannot generate the full privacy curve, and are limited by machine precision due to their purely numerical nature.¹ For example, existing implementations of the FFT-based approaches fail to estimate values of δ below $10^{-10}$ (Gopi et al., 2021, Appendix B) or $10^{-12}$ (Doroshenko et al., 2022, Appendix C). We overcome these challenges by introducing a new approach for estimating DP parameters using complex analysis. Our approach is based on the method of steepest descent for integral approximation, a well-known method in mathematical physics (Jeffreys & Jeffreys, 1999). We derive the saddle-point accountant (SPA),² which: 1) has a computable closed-form formula, hence enjoys constant runtime complexity in the number of compositions; 2) estimates the privacy parameters accurately and with provable error bounds; and 3) works for any value of δ, however small, thus describing the full range of (ε, δ) guarantees.

¹ Of course, this limitation can be alleviated by using custom implementations and arbitrary floating-point precision libraries. Our point is that closed-form formulas do not have this limitation.

² We provide a Python implementation of the proposed SPA at https://github.com/Felipe-Gomez/saddlepointaccountant

[Figure 1. Accounting for the composition of 3000 subsampled Gaussian mechanisms, with noise scale σ = 2 and subsampling rate λ = 0.01. The remaining FFT discretization parameters are set⁴ to $\varepsilon_{\text{error}} = 0.07$, $\delta_{\text{error}} = 10^{-10}$ for the PRV Accountant (Gopi et al., 2021), and a discretization interval length of $2 \cdot 10^{-4}$ for Connect the Dots (Doroshenko et al., 2022). The plot compares the Moments Accountant, the PRV Accountant, Connect the Dots, and the SPA-CLT bounds over $0 \le \varepsilon \le 3$.]

⁴ Decreasing these error parameters makes the PRV accountant more accurate, and we emphasize that the reason we include the PRV accountant in this plot is that it serves as a proxy for the ground truth.

We illustrate the above properties of the SPA in Figure 1, which shows a comparison between the SPA and the state-of-the-art (SOTA) DP accountants when computing the (ε, δ) curve of a composition of 3000 subsampled Gaussian mechanisms. Only the moments accountant and the SPA are able to trace the whole privacy curve (see, for example, the region δ < $10^{-15}$). Further, the SPA upper and lower bounds have a narrow gap between them. The SPA combines large-deviation and central-limit approaches for bounding expectations of sums of independent random variables, thereby attaining the best of both worlds. The large-deviation approach uses the moment-generating function to approximate the probability of very unlikely events. The central limit theorem (CLT) approximates a random variable by a Gaussian with the same mean and variance. For DP accounting, the large-deviation approach led to the moments accountant (Abadi et al., 2016); the CLT approach led to Gaussian-DP (Sommer et al., 2019; Dong et al., 2022). Both of these accountant methods can be computed in constant time, but their accuracy is far less than that of the SOTA FFT accountant (Gopi et al., 2021). The saddle-point method can be viewed as a combination of these two basic approaches: it maintains from large deviations the ability to handle very small values of δ, as well as the precise guarantees of the CLT.
The resulting SPA achieves better accuracy than either approach on its own, while maintaining the optimality of the runtime complexity.

A brief overview of the SPA. Suppose that a DP mechanism has a privacy loss random variable whose cumulant-generating function K(t) is finite for positive values of t (see Section 2 for precise definitions). Note that K(t) is a familiar quantity used in DP accounting; for instance, it can be verified that the mechanism satisfies exactly $(t+1, K(t)/t)$-Rényi-DP for each t > 0 (Mironov, 2017). The SPA performs the following steps to estimate δ given ε: 1) set $F(t) \triangleq K(t) - \varepsilon t - \log t - \log(t+1)$; 2) solve $F'(t) = 0$ over t > 0; 3) return $\delta(\varepsilon) \approx e^{F(t)} / \sqrt{2\pi F''(t)}$. From this general workflow, it is clear that the SPA runs in constant time for n-fold self-composition; indeed, the cumulant-generating function for the composition is $nK$. Moreover, the root-finding in step 2 is similar to the one performed in the moments accountant (Abadi et al., 2016), which solves $K'(t) - \varepsilon = 0$ instead. We refer to the approximation returned by this simple procedure as SPA-MSD.⁵

⁵ MSD stands for method of steepest descent.

The reason SPA-MSD approximates the privacy curve well lies in the following three steps. First, we express the privacy curve as the following contour integral:

$$\delta(\varepsilon) = \frac{1}{2\pi i} \int_{t - i\infty}^{t + i\infty} e^{F(z)} \, dz, \qquad (1)$$

which holds independently of the choice of t > 0. Second, we apply the method of steepest descent, which uses a judicious choice of the integration path in the complex plane: the line parallel to the imaginary axis whose real part is the saddle-point of the integrand, i.e., the unique point t > 0 for which $F'(t) = 0$. This approach leads to a new series expansion for δ given a fixed ε (see (26)), where the first term of this series corresponds to the approximation in step 3 above. This new expression for the privacy curve is our first main contribution. Our experiments demonstrate that the SPA-MSD approximation is very accurate and can consistently achieve relative errors below 0.1% in ε for a fixed δ (see Figure 3). However, this approach provides only an approximation of the privacy curve, not a provable upper bound. Consequently, we introduce another SPA, named SPA-CLT, where we first expand the K term in the integrand in (1) as an Edgeworth series (Hall, 2013), then apply the Berry-Esseen theorem to prove upper and lower bounds on the privacy curve. This procedure is equivalent to applying the CLT to a tilted version of the privacy loss random variable. The SPA-CLT amounts to replacing step 3 above by a slightly different approximation given in Proposition 5.3. This second approximation also enjoys constant runtime, yields provable and accurate upper bounds on the privacy curve even for very small values of δ, and is our second main contribution. Finally, our third main contribution is an asymptotically tight DP composition theorem (see Theorem 4.1), which is useful in the error rate analysis for the SPA and is of independent interest.

The rest of the paper is organized as follows. Preliminaries on DP and the method of steepest descent are recalled in Section 2. We derive a contour-integral formula and an asymptotic expansion for the privacy curve in Section 3. This asymptotic expansion gives rise to heuristics for approximating the privacy curve, which lead to the SPA-MSD method in Section 3.3. Then, we derive a tight composition theorem and the decay rate of the saddle-point in Section 4. In Section 5, we introduce the SPA-CLT (the second version of the SPA) and apply the results from Section 4 to derive rigorous bounds on the privacy curve. All proofs can be found in the Appendices.
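To make the three-step workflow above concrete, the following is a minimal sketch (ours, not the authors' released implementation) for the n-fold self-composition of a non-subsampled Gaussian mechanism, whose composed CGF is available in closed form as $K(t) = n\eta^2(t + t^2)/2$ with $\eta = 1/\sigma$; the function and variable names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import brentq

def spa_msd_gauss(eps, n, sigma):
    """First-order SPA-MSD estimate of delta(eps) for the n-fold
    composition of a Gaussian mechanism with noise scale sigma.
    The PLRV of one composition is N(eta^2/2, eta^2), eta = 1/sigma,
    so K(t) = n * eta^2 * (t + t^2) / 2 in closed form."""
    eta2 = 1.0 / sigma**2
    K = lambda t: 0.5 * n * eta2 * (t + t**2)    # CGF of the composed PLRV
    dK = lambda t: 0.5 * n * eta2 * (1 + 2 * t)  # K'(t)
    d2K = n * eta2                               # K''(t)
    # Steps 1-2: solve F'(t) = K'(t) - eps - 1/t - 1/(1+t) = 0 over t > 0.
    dF = lambda t: dK(t) - eps - 1.0 / t - 1.0 / (1.0 + t)
    t0 = brentq(dF, 1e-12, 1e6)                  # the saddle-point
    # Step 3: return exp(F(t0)) / sqrt(2*pi*F''(t0)).
    F = K(t0) - eps * t0 - np.log(t0) - np.log(1.0 + t0)
    d2F = d2K + 1.0 / t0**2 + 1.0 / (1.0 + t0)**2
    return np.exp(F) / np.sqrt(2 * np.pi * d2F)

# Example: 1000 compositions with sigma = 20 at eps = 2.
print(spa_msd_gauss(eps=2.0, n=1000, sigma=20.0))  # ~0.16
```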
2. Preliminaries

We collect in this section some of the required background on differential privacy, the method of steepest descent, and exponential tilting. We also prove a useful inequality on subsampling in Lemma 2.2, and we clarify our notation and assumptions.

2.1. Notation

For a random variable L, the moment-generating function (MGF) is denoted by $M_L(t) \triangleq E[e^{tL}]$, and the cumulant-generating function (CGF) by $K_L(t) \triangleq \log M_L(t)$. The hockey-stick divergence (with parameter γ ≥ 1) of a probability measure P from another Q is defined as

$$E_\gamma(P \| Q) \triangleq \sup_{B \text{ Borel}} P(B) - \gamma Q(B). \qquad (2)$$

If $X \sim P$ and $Y \sim Q$ are random variables, we also write $E_\gamma(X \| Y) \triangleq E_\gamma(P \| Q)$. The standard-normal cumulative distribution function is denoted by Φ. The Q-function is defined by $Q(x) \triangleq 1 - \Phi(x)$. We also denote the function $q : \mathbb{R} \to (0, \infty)$ by

$$q(z) \triangleq \sqrt{2\pi} \, e^{z^2/2} \, Q(z). \qquad (3)$$

The (m, k)-th partial Bell polynomial is denoted by (with $x = (x_1, \ldots, x_m)$)

$$B_{m,k}(x) \triangleq \sum_{\substack{k_1 + \cdots + k_m = k \\ k_1 + 2k_2 + \cdots + m k_m = m}} \frac{m!}{k_1! \cdots k_m!} \prod_{j=1}^m \left( \frac{x_j}{j!} \right)^{k_j}, \qquad (4)$$

where the sum runs over nonnegative integers $k_j$, and the m-th complete Bell polynomial by

$$B_m(x) \triangleq \sum_{k=1}^m B_{m,k}(x). \qquad (5)$$

The variance of a random variable X is denoted by $\sigma_X^2$. We will use the standard Bachmann-Landau notations O, Ω, Θ, o, ω, and we will let ∼ indicate equivalence of order, i.e., $a_n \sim b_n$ if and only if $a_n / b_n \to 1$ as $n \to \infty$. We will also write $f_n \approx \sum_{k \in \mathbb{N}} a_{n,k}$ to indicate an asymptotic expansion, i.e., the series might not converge, but the first few partial sums approximate $f_n$ well.

2.2. Differential Privacy

We review the basics of differential privacy and derive a useful inequality for subsampling.

Definition 2.1 ((ε, δ)-DP (Dwork et al., 2006a;b; Zhu et al., 2022)). A mechanism (i.e., randomized algorithm) M is (ε, δ)-differentially private (DP) if, for every pair of neighboring datasets, denoted $D \sim D'$, and every event E,

$$P[M(D) \in E] \le e^\varepsilon \, P[M(D') \in E] + \delta, \qquad (6)$$

i.e., if

$$\sup_{D \sim D'} E_{e^\varepsilon}\!\left( M(D) \,\|\, M(D') \right) \le \delta. \qquad (7)$$

A pair of probability measures (P, Q) is called a dominating pair for M if, for every ε ≥ 0, event E, and neighboring datasets $D \sim D'$, the following inequality holds:

$$P[M(D) \in E] - e^\varepsilon \, P[M(D') \in E] \le P(E) - e^\varepsilon Q(E). \qquad (8)$$

If (8) is tight, i.e., if

$$\sup_{D \sim D'} P[M(D) \in E] - e^\varepsilon \, P[M(D') \in E] = P(E) - e^\varepsilon Q(E) \qquad (9)$$

for each fixed ε ≥ 0, then (P, Q) is said to be a tightly dominating pair. For any dominating pair (P, Q) consisting of equivalent measures, we associate a privacy loss random variable (PLRV) that is defined as

$$L \triangleq \log \frac{dP}{dQ}(X), \quad X \sim P. \qquad (10)$$

It is not hard to see that a mechanism M having PLRV L will satisfy $(\varepsilon, \delta_L(\varepsilon))$-DP for every ε ≥ 0, where we define the privacy curve (with $a_+ \triangleq \max(0, a)$)

$$\delta_L(\varepsilon) \triangleq E\left[ \left( 1 - e^{\varepsilon - L} \right)_+ \right]. \qquad (11)$$

Composition of DP Mechanisms. The use of PLRVs facilitates DP accounting under composition. The adaptive composition of two mechanisms $M_1$ and $M_2$ is given by the mechanism

$$(M_1 \circ M_2)(D) \triangleq \left( M_1(D), \, M_2(D, M_1(D)) \right), \qquad (12)$$

that is, $M_2$ can look at both the dataset and the output of $M_1$. Let $M^{(n)} = M_1 \circ \cdots \circ M_n$ denote the adaptive composition of n, possibly distinct, mechanisms. We can form a PLRV for the composed mechanism that splits additively. In other words, $L^{(n)} \triangleq L_1 + \cdots + L_n$, where $L_1, \ldots, L_n$ are independent PLRVs for $M_1, \ldots, M_n$, respectively, is a PLRV for the composition $M^{(n)}$ (Dong et al., 2022, Theorem 3.2). The ensuing privacy curve $\delta_{L^{(n)}}$ (as defined by (11)) gives a privacy guarantee for $M^{(n)}$. Like all accounting methods cited herein, we focus on computing or approximating the curve $\delta_{L^{(n)}}$, which we refer to henceforth as the composition curve. We also denote the n-fold self-composition of a mechanism M by $M^{\otimes n}$, and in this case we may choose the $L_j$ to be i.i.d.
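As a quick illustration of the composition curve (our example, not from the paper), the expectation in (11) can be estimated by Monte Carlo whenever the composed PLRV can be sampled. For the Gaussian mechanism, the PLRV of one composition is $N(\mu, 2\mu)$ with $\mu = 1/(2\sigma^2)$, so the n-fold PLRV is $N(n\mu, 2n\mu)$. This brute-force check is usable only for moderate δ (sampling cannot resolve δ near $10^{-15}$), which is one motivation for the analytical accountants developed below.

```python
import numpy as np

rng = np.random.default_rng(0)

def delta_mc(eps, n, sigma, samples=10**6):
    """Monte Carlo estimate of the composition curve
    delta_{L^(n)}(eps) = E[(1 - e^{eps - L})_+], where the PLRV of a
    single Gaussian mechanism is N(mu, 2*mu) with mu = 1/(2*sigma^2),
    and the composed PLRV is the sum of n i.i.d. copies."""
    mu = 1.0 / (2.0 * sigma**2)
    # The sum of n i.i.d. N(mu, 2*mu) variables is N(n*mu, 2*n*mu).
    L = rng.normal(n * mu, np.sqrt(2 * n * mu), size=samples)
    return np.maximum(0.0, 1.0 - np.exp(eps - L)).mean()

print(delta_mc(eps=2.0, n=1000, sigma=20.0))  # ~0.16, cf. the SPA sketch above
```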
Subsampling and DP-SGD. In the context of differentially private stochastic gradient descent (DP-SGD), one applies a DP mechanism to a subset of the dataset. The ratio of the batch size to the size of the dataset is called the subsampling rate, denoted by λ. Subsampling is known to amplify privacy guarantees (Balle et al., 2018). In this setting, with $M_\lambda$ denoting the subsampled mechanism, one should bound both orders $E_{e^\varepsilon}(M(D) \| M_\lambda(D'))$ and $E_{e^\varepsilon}(M_\lambda(D) \| M(D'))$ to obtain the value of δ. In the following lemma, we show that in fact one order dominates. See Appendix A for the proof and further details on subsampling.

Lemma 2.2. Fix a Borel probability measure P over $\mathbb{R}^n$ that is symmetric around the origin (i.e., $P(A) = P(-A)$ for every Borel $A \subseteq \mathbb{R}^n$), and fix constants $(s, \lambda, \gamma) \in \mathbb{R}^n \times [0,1] \times [1,\infty)$. Let $T_s P$ be the probability measure given by $(T_s P)(A) \triangleq P(A - s)$, and let $Q = (1-\lambda)P + \lambda T_s P$. We have the inequality $E_\gamma(P \| Q) \le E_\gamma(Q \| P)$, with equality if and only if $(\gamma - 1) \cdot \lambda \cdot \|s\| \cdot E_\gamma(Q \| P) = 0$.

Proof. See Appendix A.

2.3. The Method of Steepest Descent

We give a brief overview of the method of steepest descent (see Appendix B for details). We need to compute

$$I_n \triangleq \frac{1}{2\pi i} \int_{t - i\infty}^{t + i\infty} e^{F_n(z)} \, dz \qquad (13)$$

for a given $F_n$, provided that $I_n$ is independent of the value of $t \in \mathbb{R}$. In a nutshell, the method of steepest descent is a powerful tool for choosing the best parameter t that renders the computation of $I_n$ easiest. Namely, t is chosen to be the saddle-point of $F_n$, defined as the unique solution $t_0$ to $F_n'(t_0) = 0$. Then, one obtains the asymptotic expansion

$$I_n \approx \frac{e^{F_n(t_0)}}{\sqrt{2\pi F_n''(t_0)}} \left( 1 + \sum_{m \ge 1} \beta_{n,m} \right), \qquad (14)$$

where we define the constants

$$\beta_{n,m} \triangleq \frac{(-1)^m \, B_{2m}\!\left(0, 0, F_n^{(3)}(t_0), \ldots, F_n^{(2m)}(t_0)\right)}{2^m \, m! \, F_n''(t_0)^m}. \qquad (15)$$

Recall that this does not mean that the above equation holds for $I_n$ with equality for any particular n. Rather, it is a heuristic indicating the potential of the truncated expansion to give close approximations of the intended integral $I_n$. In our application of the method of steepest descent to DP, we show in Theorem 3.1 that the privacy curve can be represented as the contour integral (13) for the choice of function

$$F_n(z) = K_{L^{(n)}}(z) - z\varepsilon - \log z - \log(1+z). \qquad (16)$$

2.4. Exponential Tilting

An essential tool that we use in our theoretical analysis is the exponential tilting of random variables, defined as follows.

Definition 2.3. The exponential tilting with parameter $t \in \mathbb{R}$ of a random variable L having a finite MGF at t is the random variable $\tilde{L}$ whose probability measure is given by

$$P_{\tilde{L}}(B) \triangleq \frac{1}{M_L(t)} \int_B e^{tx} \, dP_L(x) \qquad (17)$$

for any Borel set B. If L has PDF $p_L$, then $\tilde{L}$ is given by its PDF

$$p_{\tilde{L}}(x) = \frac{e^{tx} \, p_L(x)}{M_L(t)}. \qquad (18)$$

A simple key feature of exponential tilting, stated here without proof, is that it respects addition and independence.

Lemma 2.4. For independent $L_j$, the exponential tilting of $L = L_1 + \cdots + L_n$ with parameter t is $\tilde{L} = \tilde{L}_1 + \cdots + \tilde{L}_n$, where $\tilde{L}_j$ is the exponential tilting of $L_j$ with parameter t for each j. Further, $\tilde{L}_1, \ldots, \tilde{L}_n$ are independent too.
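A standard fact that makes Definition 2.3 tangible (our example, not stated in the paper): tilting $N(\mu, \sigma^2)$ with parameter t yields $N(\mu + t\sigma^2, \sigma^2)$, which can be checked by completing the square in (18). The snippet below verifies this by importance weighting.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tilting N(mu, s2) with parameter t yields N(mu + t*s2, s2): the density
# e^{tx} p_L(x) / M_L(t) completes the square to a shifted Gaussian.
mu, s2, t = 0.5, 2.0, 0.7
x = rng.normal(mu, np.sqrt(s2), size=10**6)

# Weight each sample by e^{tx} / M_L(t), with M_L(t) = exp(mu*t + s2*t^2/2).
w = np.exp(t * x - (mu * t + 0.5 * s2 * t**2))
tilted_mean = np.average(x, weights=w)
tilted_var = np.average((x - tilted_mean)**2, weights=w)
print(tilted_mean, mu + t * s2)  # tilted mean ~ mu + t*s2 = 1.9
print(tilted_var, s2)            # tilted variance stays ~ s2 = 2.0
```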
2.5. Assumptions

We will require the PLRVs to have finite MGFs.

Assumption 2.5. The MGF $M_L(t)$ of the PLRV L is finite for every t > 0.

Under Assumption 2.5, both the MGF and the CGF can be extended to holomorphic functions over the half-plane $(0, \infty) + i\mathbb{R} \subseteq \mathbb{C}$. We impose the following technical assumption on the distribution of the PLRV so that Parseval's identity applies.

Assumption 2.6. The probability measure $P_L$ induced by the PLRV L decomposes as a sum $P_L = Q_L + R_L$ for $Q_L$ absolutely continuous with respect to the Lebesgue measure and $R_L$ discrete. Further, with $q_L$ denoting the PDF of $Q_L$, we assume that $x \mapsto e^{tx} q_L(x)^2$ is integrable for each t > 0.

For our error analysis, we will assume the following on the growth of the first three moments of a PLRV.

Assumption 2.7. With $\tilde{L} = \tilde{L}_1 + \cdots + \tilde{L}_n$ being the exponential tilting with parameter t > 0, and denoting

$$P_t \triangleq \sum_{j=1}^n E\left| \tilde{L}_j - E[\tilde{L}_j] \right|^3, \qquad (19)$$

we assume that there are constants $K_L, V > 0$ and P such that $t = o(n^{-1/3})$ yields the limit (as $n \to \infty$)

$$\frac{1}{n} \left( E[\tilde{L}], \, \sigma_{\tilde{L}}^2, \, P_t \right) \to (K_L, V, P). \qquad (20)$$

A few remarks on the satisfiability of the above assumptions are in order.

Remark 2.8. Assumption 2.7 is automatically satisfied under Assumption 2.5 for self-composition.

Remark 2.9. It is worth noting that all of the above assumptions are satisfied by the usual continuous DP mechanisms, including both the subsampled Gaussian mechanism (because its PLRV is continuous with a PDF that decays super-exponentially) and the subsampled Laplace mechanism (because its PLRV's continuous part is bounded). See Appendix C for more details.

Remark 2.10. Although finiteness of the MGF rules out DP mechanisms whose PLRVs are infinite with nonzero probability (e.g., discrete mechanisms or compactly-supported mechanisms), our approach may be extended to encapsulate this case too. Specifically, the mass at infinity should be added to the value of δ directly. Indeed, we may rewrite the privacy curve $\delta_L$ via conditioning on the event $\{L < \infty\}$ as

$$\delta_L(\varepsilon) = P[L = \infty] + E\left[ \left(1 - e^{\varepsilon - L}\right)_+ \,\middle|\, L < \infty \right] P[L < \infty] = P[L = \infty] + \delta_Z(\varepsilon) \, P[L < \infty], \qquad (21)$$

where Z is a random variable obtained from L via conditioning on the event $\{L < \infty\}$. Then, we may apply our methods to $\delta_Z$ and obtain results for the original curve $\delta_L$ in view of the relation (21).

Remark 2.11. It is not hard to see that the MGF $M_L$ of the PLRV L is finite for any additive continuous mechanism with PDF of the form $e^{-g(x)}$ for a continuous g such that $g(x) \ge \beta |x|^\alpha$ for some α, β > 0. For such DP mechanisms, the MGF of the ensuing PLRV is finite at any t > 0 and for any subsampling rate $\lambda \in [0, 1]$.
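When no closed form for $M_L$ is available, the quantities the SPA consumes can be obtained by numerical quadrature. The following sketch (ours; parameter names are illustrative) computes $M_L(t)$ for the subsampled Gaussian PLRV (see (78) in Appendix C); for n-fold self-composition, the composed CGF is simply $n K_L$, and derivatives can be taken by finite differences or automatic differentiation.

```python
import numpy as np
from scipy.integrate import quad

def mgf_subsampled_gauss(t, lam=0.01, sigma=2.0, s=1.0):
    """M_L(t) for the subsampled Gaussian PLRV
    L = log(1 - lam + lam * exp((2*s*X - s^2)/(2*sigma^2))),
    with X ~ (1-lam) N(0, sigma^2) + lam N(s, sigma^2), by quadrature."""
    def integrand(x):
        ratio = 1 - lam + lam * np.exp((2 * s * x - s**2) / (2 * sigma**2))
        mix = ((1 - lam) * np.exp(-x**2 / (2 * sigma**2))
               + lam * np.exp(-(x - s)**2 / (2 * sigma**2))) / np.sqrt(2 * np.pi * sigma**2)
        return ratio**t * mix
    val, _ = quad(integrand, -np.inf, np.inf)
    return val

t = 2.0
print(np.log(mgf_subsampled_gauss(t)))  # K_L(t) for a single composition
```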
3. New Representations of the Privacy Curve

In Theorem 3.1, we derive two new formulas for the privacy curve. Then, we apply the method of steepest descent to the contour-integral formula (23). This yields the asymptotic expansion (26) of the privacy curve, which is the basis for the SPA-MSD as given by Definition 3.6. Later, in Section 5, we derive rigorous bounds on a CLT-based approximation that is inspired by the approximations in the present section. We assume that we have access to a PLRV L for a mechanism M (see Definition 2.1). In most cases, the relevant variable is $L^{(n)} = L_1 + \cdots + L_n$, such that, as discussed in Section 2.2, $\delta_{L^{(n)}}$ is the composition curve for the adaptive composition $M^{(n)} = M_1 \circ \cdots \circ M_n$ (and $L_1, \ldots, L_n$ are independent PLRVs for $M_1, \ldots, M_n$). However, in this section we derive formulas for the privacy curve $\delta_L$ for any variable L. We note that for these formulas to be numerically computable, it suffices that the distribution of L be known to an extent that the derivatives $M_L^{(k)}(t)$ of the MGF can be computed.

3.1. The Privacy Curve as a Contour Integral

The privacy curve is defined in (11) as the expectation $\delta_L(\varepsilon) = E[f(L)]$, where $f(x) = (1 - e^{\varepsilon - x})_+$. We want to transform this integral via Parseval's identity into the frequency domain. However, as $f \notin L^1(\mathbb{R})$, we cannot directly apply Parseval's identity. Nevertheless, by exponentially tilting L, we may replace f(x) with $e^{-tx} f(x)$, which decays fast. We carry out the details of this idea in Appendix E to obtain the following new formulas for $\delta_L$.

Theorem 3.1. If the PLRV L satisfies Assumption 2.5, then, for every t > 0, we may write the privacy curve as

$$\delta_L(\varepsilon) = M_L(t) \, E\left[ e^{-t\tilde{L}} \left( 1 - e^{\varepsilon - \tilde{L}} \right)_+ \right] \qquad (22)$$

for all ε ≥ 0, where $\tilde{L}$ is the exponential tilting of L with parameter t (see Definition 2.3). If, in addition, L satisfies Assumption 2.6, then we also have the formula⁶

$$\delta_L(\varepsilon) = \frac{1}{2\pi i} \int_{t - i\infty}^{t + i\infty} e^{F_\varepsilon(z)} \, dz \qquad (23)$$

for all ε ≥ 0, where we define the exponent by⁷

$$F_\varepsilon(z) \triangleq K_L(z) - z\varepsilon - \log z - \log(1+z). \qquad (24)$$

Proof. See Appendix E.

⁶ The independence of formula (23) from t is not surprising, given Cauchy's integration theorem. More importantly, the theorem states that an integration path with real part t is actually equivalent to exponential tilting with parameter t.

⁷ We use the principal branch of the complex logarithm, so $F_\varepsilon$ is well defined and analytic over the half-plane $z \in (0, \infty) + i\mathbb{R}$.

The two formulas (22)-(23) lead to two paths for approximating $\delta_L$. The first is a direct application of the method of steepest descent, where $F_\varepsilon$ is expanded around the saddle-point (see Section 2.3). The second simply approximates the expectation formula in (22) via the CLT, by replacing $\tilde{L}$ with a Gaussian. The first path (described next) leads to better approximations numerically, but the second path is more amenable to an error analysis (see Section 5).

3.2. The Privacy Curve in Terms of Bell Polynomials

As we have proved in (23) a formula representing the privacy curve $\delta_L$ as a contour integral like (13), we can now apply the method of steepest descent to approximate it. Recall from Section 2.3 that the best choice for the real-axis intercept in (23) is the saddle-point.

Definition 3.2. The saddle-point associated with a PLRV L satisfying Assumption 2.5 and a privacy parameter ε satisfying ε < ess sup L is the unique $t_0 > 0$ such that $F_\varepsilon'(t_0) = 0$, or equivalently

$$K_L'(t_0) = \varepsilon + \frac{1}{t_0} + \frac{1}{t_0 + 1}. \qquad (25)$$

Remark 3.3. The original moments accountant aims to solve $K_L'(t) = \varepsilon$, indicating the connection between the moments accountant and the SPA, introduced formally in Section 3.3.

Remark 3.4. We show in Appendix D that the saddle-point, as given by Definition 3.2, is indeed well-defined.

[Figure 2. Privacy budget ε of the subsampled Gaussian mechanism after 1500 ≤ n ≤ 4500 compositions, using the proposed SPA-MSD (29) of orders k = 1, 2, 3 and the other closed-form accountants (including the moments accountant). We use the subsampling rate λ = 0.01, noise scale σ = 2, and δ = $10^{-15}$.]

Applying the method of steepest descent to the contour integral in (23), with t chosen to be the saddle-point, we obtain the following asymptotic expansion for the privacy curve in terms of the derivatives of the MGF, connected via Bell polynomials (see Section 2.3).

Heuristic 3.5. Let L be a PLRV satisfying Assumption 2.5. Then, for any $\varepsilon \in [0, \operatorname{ess\,sup} L)$, and with $t_0$ denoting the associated saddle-point, we have the asymptotic expansion

$$\delta_L(\varepsilon) \approx \frac{e^{F_\varepsilon(t_0)}}{\sqrt{2\pi F_\varepsilon''(t_0)}} \left( 1 + \sum_{m \ge 1} \beta_{\varepsilon,m} \right), \qquad (26)$$

where, with $B_k(x_1, \ldots, x_k)$ denoting the k-th Bell polynomial and $F_\varepsilon^{(k)}$ the k-th derivative, we denote the constants

$$\beta_{\varepsilon,m} \triangleq \frac{(-1)^m \, B_{2m}\!\left(0, 0, F_\varepsilon^{(3)}(t_0), \ldots, F_\varepsilon^{(2m)}(t_0)\right)}{2^m \, m! \, F_\varepsilon''(t_0)^m}. \qquad (27)$$

Further, with $B_{k,j}(x_1, \ldots, x_k)$ denoting the (k, j)-th partial Bell polynomial, the derivatives of $F_\varepsilon$ are⁸ (for k ≥ 2)

$$F_\varepsilon^{(k)}(t_0) = (-1)^k (k-1)! \left( \frac{1}{t_0^k} + \frac{1}{(t_0+1)^k} \right) + \sum_{j=1}^k \frac{(-1)^{j-1} (j-1)!}{M_L(t_0)^j} \, B_{k,j}\!\left( M_L'(t_0), \ldots, M_L^{(k)}(t_0) \right). \qquad (28)$$

⁸ The formula for $F_\varepsilon^{(k)}$ follows immediately from Faà di Bruno's formula for the derivatives of a composition of functions.
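The Bell polynomials in (27) can be generated programmatically; the following sketch (ours, with our own helper name and argument layout) uses SymPy's partial Bell polynomials to reproduce the coefficients that appear in the closed forms (32)-(33) below.

```python
from sympy import bell, factorial, symbols

def beta(m, F):
    """beta_{eps,m} from (27). F is a list [F''(t0), F'''(t0), ...],
    i.e., F[k-2] holds the k-th derivative of F_eps at the saddle-point."""
    x = [0, 0] + [F[k - 2] for k in range(3, 2 * m + 1)]  # (0, 0, F^(3), ..., F^(2m))
    # Complete Bell polynomial B_{2m} as a sum of partial Bell polynomials.
    B2m = sum(bell(2 * m, k, x[: 2 * m - k + 1]) for k in range(1, 2 * m + 1))
    return (-1) ** m * B2m / (2 ** m * factorial(m) * F[0] ** m)

# Sanity check against (32)-(33): beta_1 = 0, beta_2 = F4/(8 F2^2),
# beta_3 = -(5/24) F3^2/F2^3 - (1/48) F6/F2^3.
F2, F3, F4, F5, F6 = symbols("F2 F3 F4 F5 F6")
print(beta(1, [F2]))                  # 0
print(beta(2, [F2, F3, F4]))          # F4/(8*F2**2)
print(beta(3, [F2, F3, F4, F5, F6]))  # -5*F3**2/(24*F2**3) - F6/(48*F2**3)
```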
3.3. Application: The Saddle-Point Accountant

Based on the asymptotic expansion (26), we can derive various approximations of $\delta_L$ depending on how many terms we keep. This leads to the following versions of the saddle-point accountant (SPA).

Definition 3.6. The order-k method-of-steepest-descent saddle-point accountant (SPA-MSD) for the mechanism M with PLRV L satisfying Assumption 2.5 is defined by

$$\delta_{L,\text{SP-MSD}}^{(k)}(\varepsilon) \triangleq \frac{e^{F_\varepsilon(t_0)}}{\sqrt{2\pi F_\varepsilon''(t_0)}} \left( 1 + \sum_{m=1}^{k} \beta_{\varepsilon,m} \right) \qquad (29)$$

when ε < ess sup L, where $t_0 > 0$ is the saddle-point (i.e., $F_\varepsilon'(t_0) = 0$), and we set $\delta_{L,\text{SP-MSD}}^{(k)}(\varepsilon) = 0$ if ε ≥ ess sup L. Here, the $\beta_{\varepsilon,m}$ are as defined in (27).

The first SPA-MSD is

$$\delta_{L,\text{SP-MSD}}^{(1)}(\varepsilon) = \frac{e^{F_\varepsilon(t_0)}}{\sqrt{2\pi F_\varepsilon''(t_0)}}, \qquad (30)$$

which can be expanded using the definition of $F_\varepsilon$ as

$$\delta_{L,\text{SP-MSD}}^{(1)}(\varepsilon) = \frac{e^{K_L(t_0) - \varepsilon t_0}}{\sqrt{2\pi \left( t_0^2 (t_0+1)^2 K_L''(t_0) + t_0^2 + (1+t_0)^2 \right)}}. \qquad (31)$$

The order-2 SPA-MSD is given by

$$\delta_{L,\text{SP-MSD}}^{(2)}(\varepsilon) = \frac{e^{F_\varepsilon(t_0)}}{\sqrt{2\pi F_\varepsilon''(t_0)}} \left( 1 + \frac{1}{8} \frac{F_\varepsilon^{(4)}(t_0)}{F_\varepsilon''(t_0)^2} \right), \qquad (32)$$

and the order-3 SPA-MSD is given by

$$\delta_{L,\text{SP-MSD}}^{(3)}(\varepsilon) = \frac{e^{F_\varepsilon(t_0)}}{\sqrt{2\pi F_\varepsilon''(t_0)}} \left( 1 + \frac{1}{8} \frac{F_\varepsilon^{(4)}(t_0)}{F_\varepsilon''(t_0)^2} - \frac{5}{24} \frac{F_\varepsilon^{(3)}(t_0)^2}{F_\varepsilon''(t_0)^3} - \frac{1}{48} \frac{F_\varepsilon^{(6)}(t_0)}{F_\varepsilon''(t_0)^3} \right). \qquad (33)$$

Empirical Accuracy of SPA-MSD. The expressions for the SPA-MSD displayed in (31)-(33) can traverse privacy curves that are virtually indistinguishable from the ground truth. We illustrate this in Figure 2 for the subsampled Gaussian, where we estimate ε (for fixed δ = $10^{-15}$) under a varying number of compositions. In this experiment, SPA-MSD improves on the other closed-form accountants (which run in constant time). Hence, SPA-MSD can be seen as a correction to both the large-deviation method and the CLT-based method found in the moments accountant and Gaussian-DP, respectively. A zoomed-in version of this figure is given in Figure 3, which shows the relative errors. See Appendix L for the SPA-MSD pseudocode, Appendix M for the computation of the ground truth in Figure 2, and Appendix N for more experiments.

4. Asymptotically Tight Composition Theorem

We show next that the lowest ε under composition cannot deviate from the mean of the PLRV by a large multiple of the standard deviation of the PLRV. This result is used afterwards to derive the asymptotic behavior of the saddle-point. The asymptotic behavior of the saddle-point, in turn, will be helpful in the next section for deriving rigorous bounds on the SPA approximation error. We prove the following asymptotically tight DP composition theorem.

Theorem 4.1. Let $M = M_1 \circ \cdots \circ M_n$ have a PLRV $L = L_1 + \cdots + L_n$, where the $L_j$ are independent PLRVs for the $M_j$. Assume that the $L_j$ have finite absolute third moments, and $P_0 = o(\sigma_L^3)$ as $n \to \infty$ (here $P_0$ is the quantity in (19) with t = 0, i.e., without tilting). Let $\delta \in (0, 1/2)$ be such that $\limsup \delta < 1/2$ (so δ is allowed to vary with n). If $\sigma_L / (-\Phi^{-1}(\delta)) \to \infty$ as $n \to \infty$, then M is $\left( E[L] - \Phi^{-1}(\delta)\sigma_L, \, \delta(1 + o(1)) \right)$-DP. Conversely, this result is tight in the following sense. If $\delta_0 \in (0, 1/2)$ is fixed, $\sigma_L \to \infty$, and M is $\left( E[L] + b\sigma_L, \, \delta_0 + o(1) \right)$-DP, then we must have $\liminf b \ge -\Phi^{-1}(\delta_0)$.

Proof. See Appendix F.

A more compact way to state the constant-δ claim in the theorem is that, for any fixed $\delta \in (0, 1/2)$, we have

$$\delta_L\!\left( E[L] - \Phi^{-1}(\delta)\sigma_L \right) \sim \delta. \qquad (34)$$

For example, Theorem 4.1 implies that $\delta_L(\varepsilon)$ is close to $10^{-10}$ if and only if ε is around $E[L] + 6.4\sigma_L$ for all large n, since $\Phi^{-1}(10^{-10}) \approx -6.4$. Thus, if one hopes to have a small value of δ, the only interesting values of ε, in the regime of high n, are those that are above E[L] by the derived multiple of $\sigma_L$.
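The constant 6.4 quoted above is easy to reproduce (our check; the non-subsampled Gaussian parameters below are illustrative):

```python
from scipy.stats import norm

# Check the Z-score quoted after Theorem 4.1: Phi^{-1}(1e-10) ~ -6.4.
print(norm.ppf(1e-10))  # -6.3613...

# For the n-fold (non-subsampled) Gaussian mechanism, whose composed PLRV
# is N(n*mu, 2*n*mu) with mu = 1/(2*sigma^2), the epsilon at which
# delta ~ 1e-10 is then approximately E[L] - Phi^{-1}(delta) * sigma_L:
n, sigma = 3000, 2.0
mu = 1.0 / (2.0 * sigma**2)
mean, std = n * mu, (2 * n * mu) ** 0.5
print(mean - norm.ppf(1e-10) * std)
```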
4.1. Asymptotic Formula for the Saddle-Point

We re-parameterize $\varepsilon = E[L] + b\sigma_L$, so b can be seen as the Z-score of ε, a parameterization justified by Theorem 4.1. For this regime of values of ε, we prove the following asymptotic characterization of the saddle-point.

Theorem 4.2. Let $L = L_1 + \cdots + L_n$ for independent $L_j$ satisfying Assumption 2.5, and suppose that $\frac{1}{n}\left( E[L], \sigma_L^2 \right) \to (K_L, V)$ for some constants $K_L, V > 0$. Let $\varepsilon = E[L] + b\sigma_L$, where b > 0 satisfies $b = o(n^{1/6})$, and assume that ε < ess sup L. Then, the value of the saddle-point (as given by Definition 3.2) satisfies the asymptotic relation

$$t_0 \sim \frac{b + \sqrt{b^2 + 4}}{2\sigma_L}. \qquad (35)$$

Proof. See Appendix G.

This asymptotic formula for the saddle-point will be useful in deriving the asymptotic rate of the approximation error of the SPA in the next section.
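The asymptotic relation (35) can be checked against the exact saddle-point obtained from (25). The sketch below (ours) does this for the non-subsampled Gaussian mechanism, where $K(t) = n\eta^2(t + t^2)/2$ with $\eta = 1/\sigma$.

```python
import numpy as np
from scipy.optimize import brentq

# Compare the asymptotic formula (35) with the exact saddle-point from (25).
n, sigma, b = 3000, 2.0, 6.4
eta2 = 1.0 / sigma**2
mean, var = 0.5 * n * eta2, n * eta2          # E[L], sigma_L^2
eps = mean + b * np.sqrt(var)

dK = lambda t: 0.5 * n * eta2 * (1 + 2 * t)   # K'(t)
t_exact = brentq(lambda t: dK(t) - eps - 1 / t - 1 / (1 + t), 1e-9, 1e3)
t_asym = (b + np.sqrt(b**2 + 4)) / (2 * np.sqrt(var))
print(t_exact, t_asym)                        # close for large n
```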
5. CLT Error Bound Analysis

While the approximations of Section 3 are often very precise (see Figure 2), they are merely approximations and do not provide any hard guarantees on the (ε, δ)-DP of a given mechanism. In this section, we derive the alternative form of the SPA by applying the Berry-Esseen theorem to the saddle-point exponentially tilted PLRV, thereby obtaining upper and lower bounds on the achieved privacy parameters.

5.1. CLT-Based Version of the SPA

We return to the expectation-based formula for $\delta_L$ shown in Theorem 3.1, which can be rewritten as

$$\delta_L(\varepsilon) = e^{K_L(t) - \varepsilon t} \, E\left[ f\!\left( \tilde{L} - \varepsilon, t \right) \right], \qquad (36)$$

where

$$f(x, t) \triangleq e^{-xt} \left( 1 - e^{-x} \right)_+, \qquad (37)$$

with t > 0 varying freely and $\tilde{L}$ being the exponential tilting of L with parameter t. Here, $L = L_1 + \cdots + L_n$ for independent $L_j$ satisfying Assumption 2.5. We will simply replace $\tilde{L}$ by a Gaussian with the same first two moments,⁹ and choose t to be the saddle-point of L as per Definition 3.2. Thus, we introduce the following version of the SPA.

⁹ It is not hard to see that the mean and variance of $\tilde{L}$ are given by $E[\tilde{L}] = K_L'(t)$ and $\sigma_{\tilde{L}}^2 = K_L''(t)$.

Definition 5.1. Under Assumption 2.5, the CLT version of the saddle-point accountant (SPA-CLT) is defined by

$$\delta_{L,\text{SP-CLT}}(\varepsilon) \triangleq e^{K_L(t_0) - \varepsilon t_0} \, E\left[ f(Z - \varepsilon, t_0) \right] \qquad (38)$$

if ε < ess sup L, where $Z \sim N(K_L'(t_0), K_L''(t_0))$ and $t_0$ is the saddle-point for L as given by Definition 3.2. We define $\delta_{L,\text{SP-CLT}}(\varepsilon) = 0$ for ε ≥ ess sup L.

Remark 5.2. The approach giving rise to $\delta_{L,\text{SP-CLT}}$ can be seen as a series expansion of the $e^{K_L(z)}$ part of the integrand in Theorem 3.1, or equivalently as an (order-0) Edgeworth expansion (Hall, 2013) of the distribution of $\tilde{L}$. However, the Edgeworth-expansion approach delineated herein is different from what can be found in the DP literature (Wang et al., 2022). Specifically, we apply the Edgeworth expansion to the tilted random variable $\tilde{L}$, whereas the approach of Wang et al. (2022) uses the Edgeworth expansion of the non-tilted version L. This distinction can yield very different approximations. We include a comparison between our approach and the standard CLT in Appendix H.

The following result expresses the CLT-based SPA in terms of easily computable functions. In what follows, we let $\delta_{L,\text{SP-CLT}}(\varepsilon; t)$ denote the same expression as in (38) but with $t_0$ replaced by a free t > 0, i.e.,

$$\delta_{L,\text{SP-CLT}}(\varepsilon; t) \triangleq e^{K_L(t) - \varepsilon t} \, E\left[ f(Z - \varepsilon, t) \right] \mathbb{1}_{[0, \operatorname{ess\,sup} L)}(\varepsilon), \qquad (39)$$

where $Z \sim N(K_L'(t), K_L''(t))$. In particular, $\delta_{L,\text{SP-CLT}}(\varepsilon) = \delta_{L,\text{SP-CLT}}(\varepsilon; t_0)$ for $t_0$ the saddle-point.

Proposition 5.3. Suppose Assumption 2.5 holds. Fix any t > 0 and $\varepsilon \in [0, \operatorname{ess\,sup} L)$, and denote

$$\gamma \triangleq \frac{K_L'(t) - \varepsilon}{\sqrt{K_L''(t)}}, \quad \alpha \triangleq \sqrt{K_L''(t)} \, t - \gamma, \quad \beta \triangleq \sqrt{K_L''(t)} \, (t+1) - \gamma. \qquad (40)$$

Then, we have that (with q as defined in (3))

$$\delta_{L,\text{SP-CLT}}(\varepsilon; t) = \frac{q(\alpha) - q(\beta)}{\sqrt{2\pi}} \, e^{K_L(t) - \varepsilon t - \gamma^2/2}. \qquad (41)$$

Proof. See Appendix I.1.

Remark 5.4. It holds that $0 < q(z) < \min(1/z, \sqrt{\pi/2})$ for all z > 0, and $q(z) \sim 1/z$ as $z \to \infty$ (NIST DLMF, Section 7.8).

While the two methods of approximation (the steepest descent of Section 3.3 and the CLT approach of this section) lead to different approximations, these approximations are closely related, as described by the following simple inequality.

Proposition 5.5. Under Assumption 2.5, for any t > 0,

$$\delta_{L,\text{SP-CLT}}(\varepsilon; t) \le \frac{e^{F_\varepsilon(t)}}{\sqrt{2\pi K_L''(t)}}. \qquad (42)$$

Proof. See Appendix I.2.

Note that the only difference between the right-hand side of (42) and $\delta_{L,\text{SP-MSD}}^{(1)}(\varepsilon)$ is that the denominator involves $K_L''$ instead of $F_\varepsilon''$.

5.2. Finite-Composition Error Bound

Using the Berry-Esseen theorem, we prove the following bound on the error of the approximation $\delta_{L,\text{SP-CLT}}$ for arbitrary tilts.

Theorem 5.6. Suppose Assumption 2.5 holds. For any t > 0 and ε ≥ 0, there is a $\zeta \in [-1, 1]$ such that

$$\delta_L(\varepsilon) = e^{K_L(t) - \varepsilon t} \, E\left[ e^{-t(Z - \varepsilon)} \left( 1 - e^{-(Z - \varepsilon)} \right)_+ \right] + \zeta \cdot \operatorname{err}_{\text{SP}}(\varepsilon; t), \qquad (43)$$

where $Z \sim N(K_L'(t), K_L''(t))$ and the error is defined by

$$\operatorname{err}_{\text{SP}}(\varepsilon; t) \triangleq e^{K_L(t) - \varepsilon t} \cdot \frac{t^t}{(1+t)^{1+t}} \cdot \frac{1.12 \, P_t}{K_L''(t)^{3/2}}. \qquad (44)$$

Proof. See Appendix J.

Note that omitting the ζ term on the right-hand side of (43) gives exactly $\delta_{L,\text{SP-CLT}}(\varepsilon; t)$ as per Definition 5.1. Thus, Theorem 5.6 can be equivalently restated as the following error bound for SPA-CLT: for each t > 0 and ε ≥ 0,

$$\left| \delta_L(\varepsilon) - \delta_{L,\text{SP-CLT}}(\varepsilon; t) \right| \le \operatorname{err}_{\text{SP}}(\varepsilon; t). \qquad (45)$$

5.3. Asymptotic Error Rate

While Theorem 5.6 holds for any positive value of t around which the MGF is finite, a natural choice of t is the saddle-point $t_0$ itself, defined as the solution to (25). We analyze next the ensuing error rate for this particular choice of tilt. Specifically, we show that the error rate in approximating $\delta_L$ by $\delta_{L,\text{SP-CLT}}$ decays roughly at least as fast as $1/(\sqrt{n} \, e^{b^2/2})$ for the choice $\varepsilon = E[L] + b\sigma_L$, and we characterize the constant term too.

Theorem 5.7. Let $L = L_1 + \cdots + L_n$ for independent PLRVs $L_1, \ldots, L_n$ that satisfy Assumption 2.5. Suppose that Assumption 2.7 holds too. Let $\varepsilon = E[L] + b\sigma_L$ for b > 0 satisfying $b = o(n^{1/6})$, and let $t_0$ be the saddle-point of L (see Definition 3.2). Then, as $n \to \infty$, we have

$$\operatorname{err}_{\text{SP}}(\varepsilon; t_0) \sim \frac{1.12 \sqrt{e} \, P}{V^{3/2}} \cdot \frac{C(b)^{-\tau}}{\sqrt{n}}, \qquad (46)$$

where $\tau \le 1$ satisfies $\tau \to 1$, and we define the term $C(b) \triangleq \exp\!\left( \left( b^2 + b\sqrt{b^2 + 4} \right)/4 \right)$. Furthermore, writing $t_0 = \tau_0 \cdot \frac{b + \sqrt{b^2+4}}{2\sigma_L}$, we may take $\tau = (2 - \tau_0)\tau_0$ in (46).

Proof. See Appendix K.

Remark 5.8. In Appendix H, we illustrate the benefit of tilting the PLRV by comparing the error term in (46) with the corresponding standard CLT error (i.e., without tilting). For example, for a limiting value of $\delta = 10^{-10}$, the error term incurred by our tilting-based approach is roughly 9 orders of magnitude smaller than that of the standard approach without tilting.

5.4. Relative-Error Comparisons

The SPA-CLT approximation (41) and its error bound (44) can approximate the privacy parameters accurately. In Figure 3, we plot the relative error¹⁰ in estimating ε given $\delta = 10^{-15}$ incurred by SPA-CLT (both for the approximation in (41) and for the approximation including the error term (44)), by SPA-MSD (for comparison), and by the other closed-form accountants. The setting is the subsampled Gaussian mechanism, with the same parameters as in Figure 2. Here, the SPA improves on both the moments accountant and Gaussian-DP.

¹⁰ We take the relative error of a privacy curve estimate $\hat{\varepsilon}(\delta)$, with a ground truth of $\varepsilon(\delta)$, to be $|1 - \hat{\varepsilon}(\delta)/\varepsilon(\delta)|$.
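Putting Proposition 5.3 and Theorem 5.6 together gives a complete constant-time accountant with a certified error bar. The following sketch (ours; helper names are illustrative) implements (41) and (44) for the n-fold non-subsampled Gaussian mechanism, where the tilted summands remain Gaussian, so $P_t = n \cdot 2\sqrt{2/\pi} \, \eta^3$ exactly. For this mechanism, the tilted PLRV is exactly Gaussian, so (41) is in fact exact here, making this a convenient sanity check. Note that the direct formula for q(z) below can overflow for large arguments; a scaled-erfc implementation would be preferable in practice.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def q(z):
    """q(z) = sqrt(2*pi) * exp(z^2/2) * Q(z); fine for moderate z."""
    return np.sqrt(2 * np.pi) * np.exp(z**2 / 2) * norm.sf(z)

def spa_clt_gauss(eps, n, sigma):
    """SPA-CLT estimate (41) and Berry-Esseen error bound (44) for the
    n-fold Gaussian mechanism: K(t) = n*eta^2*(t + t^2)/2, eta = 1/sigma.
    Tilted summands stay Gaussian, so P_t = n * 2*sqrt(2/pi) * eta^3."""
    eta2 = 1.0 / sigma**2
    K = lambda t: 0.5 * n * eta2 * (t + t**2)
    dK = lambda t: 0.5 * n * eta2 * (1 + 2 * t)
    d2K = n * eta2
    t0 = brentq(lambda t: dK(t) - eps - 1 / t - 1 / (1 + t), 1e-12, 1e6)
    g = (dK(t0) - eps) / np.sqrt(d2K)                 # gamma in (40)
    a = np.sqrt(d2K) * t0 - g                         # alpha in (40)
    b = np.sqrt(d2K) * (t0 + 1) - g                   # beta in (40)
    delta = (q(a) - q(b)) / np.sqrt(2 * np.pi) * np.exp(K(t0) - eps * t0 - g**2 / 2)
    P_t = n * 2 * np.sqrt(2 / np.pi) * eta2**1.5
    err = np.exp(K(t0) - eps * t0) * t0**t0 / (1 + t0)**(1 + t0) * 1.12 * P_t / d2K**1.5
    return delta, err

print(spa_clt_gauss(eps=2.0, n=1000, sigma=20.0))  # (delta, certified error bar)
```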
6. Conclusion and Open Problems

We introduce a novel application of the method of steepest descent in DP. First, using the exponentially tilted version of the PLRV, we derive new formulas for the privacy curve (Theorem 3.1). Inspired by the method of steepest descent, we fix the exponential tilt to be the saddle-point of the integrand's exponent. This amounts to solving the scalar equation

$$K_L'(t) = \varepsilon + \frac{1}{t} + \frac{1}{t+1}. \qquad (47)$$

The ensuing closed-form formulas provide constant-runtime accurate approximations that can traverse the full privacy curve (e.g., for δ < $10^{-10}$). Our approach can be seen as a correction to both large-deviation methods (e.g., the moments accountant, via the additional $1/t + 1/(t+1)$ term) and CLT-based methods (e.g., Gaussian-DP, via preprocessing the PLRV with exponential tilting). This way, we retain the constant runtime of closed-form accountants without sacrificing accuracy, as demonstrated by our experiments.

[Figure 3. Accounting for the privacy budget ε, given δ = $10^{-15}$, for the subsampled Gaussian mechanism, with subsampling rate λ = 0.01 and noise scale σ = 2. We plot the relative error in estimating ε (i.e., $|1 - \hat{\varepsilon}(\delta)/\varepsilon(\delta)|$ for an estimate $\hat{\varepsilon}$) versus the number of compositions n, for the moments accountant, the SPA-CLT bound, and SPA-MSD of orders k = 1, 2, 3. The SPA outperforms the other closed-form accountants in this experiment.]

The saddle-point approach leaves a few questions open. The relative-error plot in Figure 3 indicates that, while the SPA-CLT bounds achieve reasonable relative error, the original approximations given by SPA-CLT and SPA-MSD seem to be several orders of magnitude more accurate than can be captured by the bounds we derive herein. Hence, it is an interesting future line of work to refine our bounds to further reveal the power of the saddle-point approximation. One promising path towards such a refinement might be through finding mechanism-specific bounds. Relatedly, such finer bounds would shed light on the question: how large is large-enough n? The additional experiments in Appendix N show that n might only need to be of moderate size for the SPA to provide tight guarantees, yet a more complete answer requires additional techniques.

Acknowledgements

We thank the anonymous referees for their careful critique, which helped improve the quality of the paper considerably. This material is based upon work supported by the National Science Foundation under Grant Nos. CAREER-1845852, FAI-2040880, CIF-1900750, SCH-2312666, CIF-1922971, and CIF-1901243; by NSERC Canada; and by the U.S. Department of Energy Award No. DE-SC0022158.

References

NIST Digital Library of Mathematical Functions. http://dlmf.nist.gov/, Release 1.1.6 of 2022-06-30. F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, B. V. Saunders, H. S. Cohl, and M. A. McClain, eds.

Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In Proc. ACM SIGSAC CCS, pp. 308-318, 2016. doi: 10.1145/2976749.2978318.

Balle, B., Barthe, G., and Gaboardi, M. Privacy amplification by subsampling: Tight analyses via couplings and divergences. In NeurIPS, pp. 6280-6290, 2018.

Bu, Z., Dong, J., Long, Q., and Su, W. Deep learning with Gaussian differential privacy. Harvard Data Science Review, 2(3), Sep 30 2020. https://hdsr.mitpress.mit.edu/pub/u24wj42y.
De, S., Berrada, L., Hayes, J., Smith, S. L., and Balle, B. Unlocking high-accuracy differentially private image classification through scale. arXiv preprint arXiv:2204.13650, 2022.

Dong, J., Roth, A., and Su, W. J. Gaussian differential privacy. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(1):3-37, 2022. ISSN 1369-7412. doi: 10.1111/rssb.12454. URL https://doi.org/10.1111/rssb.12454.

Doroshenko, V., Ghazi, B., Kamath, P., Kumar, R., and Manurangsi, P. Connect the dots: Tighter discrete approximations of privacy loss distributions. In Privacy Enhancing Technologies Symposium (PETS), 2022.

Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., and Naor, M. Our data, ourselves: Privacy via distributed noise generation. In Vaudenay, S. (ed.), EUROCRYPT, pp. 486-503, 2006a.

Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In Proc. Theory of Cryptography (TCC), pp. 265-284, Berlin, Heidelberg, 2006b.

Ghazi, B., Kamath, P., Kumar, R., and Manurangsi, P. Faster privacy accounting via evolving discretization. In Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., and Sabato, S. (eds.), Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pp. 7470-7483. PMLR, 17-23 Jul 2022.

Gopi, S., Lee, Y. T., and Wutschitz, L. Numerical composition of differential privacy. In Advances in Neural Information Processing Systems (NeurIPS), 2021.

Hall, P. The Bootstrap and Edgeworth Expansion. Springer Science & Business Media, 2013.

Jeffreys, H. and Jeffreys, B. Methods of Mathematical Physics. Cambridge University Press, 3rd edition, 1999. doi: 10.1017/CBO9781139168489.

Kasiviswanathan, S. P., Lee, H. K., Nissim, K., Raskhodnikova, S., and Smith, A. What can we learn privately? SIAM J. Comput., 40(3):793-826, June 2011. ISSN 0097-5397. doi: 10.1137/090756090.

Koskela, A. and Honkela, A. Computing differential privacy guarantees for heterogeneous compositions using FFT. CoRR, abs/2102.12412, 2021. URL https://arxiv.org/abs/2102.12412.

Koskela, A., Jälkö, J., and Honkela, A. Computing tight differential privacy guarantees using FFT. In International Conference on Artificial Intelligence and Statistics, pp. 2560-2569. PMLR, 2020.

Koskela, A., Jälkö, J., Prediger, L., and Honkela, A. Tight differential privacy for discrete-valued mechanisms and for the subsampled Gaussian mechanism using FFT. In International Conference on Artificial Intelligence and Statistics, pp. 3358-3366. PMLR, 2021.

Mironov, I. Rényi differential privacy. In Proc. Computer Security Foundations (CSF), pp. 263-275, 2017.

Mironov, I., Talwar, K., and Zhang, L. Rényi differential privacy of the sampled Gaussian mechanism. arXiv preprint arXiv:1908.10530, 2019.

Murtagh, J. and Vadhan, S. The complexity of computing the optimal composition of differential privacy. In Proc. Int. Conf. Theory of Cryptography, pp. 157-175, 2016.

Sommer, D. M., Meiser, S., and Mohammadi, E. Privacy loss classes: The central limit theorem in differential privacy. Proceedings on Privacy Enhancing Technologies, 2019(2):245-269, 2019.

Wang, H., Gao, S., Zhang, H., Shen, M., and Su, W. J. Analytical composition of differential privacy via the Edgeworth accountant. arXiv preprint arXiv:2206.04236, 2022.

Zhu, Y. and Wang, Y.-X. Poission subsampled Rényi differential privacy. In Chaudhuri, K. and Salakhutdinov, R. (eds.), ICML, volume 97, pp. 7634-7642, 09-15 Jun 2019.
Zhu, Y., Dong, J., and Wang, Y.-X. Optimal accounting of differential privacy via characteristic function. In International Conference on Artificial Intelligence and Statistics, pp. 4782-4817. PMLR, 2022.

A. Subsampling: Proof of Lemma 2.2

Subsampling is a fundamental tool in the analysis of differentially private mechanisms. Informally, subsampling entails applying a differentially private mechanism to a small set of randomly sampled datapoints from a given dataset. There are several ways of formally defining the subsampling operator; see, e.g., (Balle et al., 2018). The most well-known one, Poisson subsampling, is parameterized by the subsampling rate $\lambda \in (0, 1]$, which indicates the probability of selecting a datapoint. More formally, the subsampled datapoints from a dataset D can be expressed as $\{x \in D : B_x = 1\}$, where $B_x$ is a Bernoulli random variable with parameter λ, independent for each $x \in D$. Given any mechanism M, we define the subsampled mechanism $M_\lambda$ as the composition of M and the Poisson subsampling operator. Characterizing the privacy guarantees of subsampled mechanisms is the subject of the privacy amplification by subsampling principle (Kasiviswanathan et al., 2011). This principle is well-studied, particularly for characterizing the privacy guarantees of subsampled Gaussian mechanisms in the context of a variant of differential privacy, namely, Rényi differential privacy (Zhu & Wang, 2019; Abadi et al., 2016; Mironov et al., 2019). We can mirror their formulation to characterize ε and δ for subsampled Gaussian mechanisms. Recall that a Gaussian mechanism satisfies $M(D) = N(f(D), \sigma^2 I_d)$, where f is a query function with $\ell_2$-sensitivity 1. For the subsampled Gaussian, the optimal privacy curve (of a single composition) is

$$\delta_{M_\lambda}(\varepsilon) = \max\left\{ E_{e^\varepsilon}(P \| Q), \, E_{e^\varepsilon}(Q \| P) \right\}, \qquad (48)$$

where $P = N(0, \sigma^2 I_d)$, $Q = (1-\lambda)P + \lambda P'$, and $P' = N(e_1, \sigma^2 I_d)$, where $e_1$ is the first standard basis vector. In Lemma 2.2 (restated below for convenience), we show that the above maximum is always attained by $E_{e^\varepsilon}(Q \| P)$ for any ε ≥ 0, and that this holds for a larger family of DP mechanisms (including Gaussian and Laplace mechanisms). A similar ordering bound was proved by Mironov et al. (2019, Theorem 5) for the Rényi divergence.

Lemma A.1. Fix a Borel probability measure P over $\mathbb{R}^n$ that is symmetric around the origin (i.e., $P(A) = P(-A)$ for every Borel $A \subseteq \mathbb{R}^n$), and fix constants $(s, \lambda, \gamma) \in \mathbb{R}^n \times [0,1] \times [1,\infty)$. Let $T_s P$ be the probability measure given by $(T_s P)(A) \triangleq P(A - s)$, and let $Q = (1-\lambda)P + \lambda T_s P$. We have the inequality $E_\gamma(P \| Q) \le E_\gamma(Q \| P)$, with equality if and only if $(\gamma - 1) \cdot \lambda \cdot \|s\| \cdot E_\gamma(Q \| P) = 0$.

Proof. The case λ = 0 is clear, so assume $\lambda \in (0, 1]$. Suppose for now that $\gamma(1 - \lambda) < 1$. Denote $R \triangleq T_s P$, and consider the function $G : (0, \infty) \to [0, \infty)$ defined by

$$G(t) \triangleq t \, E_{1 + \frac{\gamma - 1}{t}}(P \| R). \qquad (49)$$

Since $\gamma \mapsto E_\gamma(P \| R)$ is monotonically decreasing, we have that G is monotonically increasing. Note that $0 < \gamma\lambda + 1 - \gamma \le \lambda$. Thus, plugging $t \in \{\gamma\lambda + 1 - \gamma, \lambda\}$ into G, we obtain

$$(\gamma\lambda + 1 - \gamma) \, E_{\frac{\gamma\lambda}{\gamma\lambda + 1 - \gamma}}(P \| R) \le \lambda \, E_{\frac{\lambda - (1 - \gamma)}{\lambda}}(P \| R). \qquad (50)$$

Now, note that

$$(\gamma\lambda + 1 - \gamma) \, E_{\frac{\gamma\lambda}{\gamma\lambda + 1 - \gamma}}(P \| R) = (\gamma\lambda + 1 - \gamma) \sup_A \left( P(A) - \frac{\gamma\lambda}{\gamma\lambda + 1 - \gamma} R(A) \right) \qquad (51)$$

$$= \sup_A \left( P(A) - \gamma \left( (1-\lambda)P(A) + \lambda R(A) \right) \right) \qquad (52)$$

$$= E_\gamma(P \| Q), \qquad (53)$$

where the suprema are taken over all Borel sets $A \subseteq \mathbb{R}^n$. In addition, by symmetry of P around the origin, we have that

$$E_\gamma(P \| R) = \sup_A \left( P(A) - \gamma P(A - s) \right) \qquad (54)$$

$$= \sup_A \left( P(-A) - \gamma P(-A + s) \right) \qquad (55)$$

$$= \sup_A \left( P(A) - \gamma P(A + s) \right) \qquad (56)$$

$$= \sup_A \left( P(A - s) - \gamma P(A) \right) \qquad (57)$$

$$= E_\gamma(R \| P). \qquad (58)$$

Similarly,

$$\lambda \, E_{\frac{\lambda - (1-\gamma)}{\lambda}}(P \| R) = \lambda \, E_{\frac{\lambda - (1-\gamma)}{\lambda}}(R \| P) \qquad (59)$$

$$= \lambda \sup_A \left( R(A) - \frac{\lambda - (1-\gamma)}{\lambda} P(A) \right) \qquad (60)$$

$$= \sup_A \left( (1-\lambda)P(A) + \lambda R(A) - \gamma P(A) \right) \qquad (61)$$

$$= E_\gamma(Q \| P). \qquad (62)$$

We conclude from (50) the desired inequality $E_\gamma(P \| Q) \le E_\gamma(Q \| P)$. In addition, the case $\gamma(1-\lambda) \ge 1$ follows immediately, since then $E_\gamma(P \| Q) = 0 \le E_\gamma(Q \| P)$.
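Lemma A.1 is easy to probe numerically in one dimension (our check, with arbitrary illustrative parameters): both hockey-stick divergences in (48) are integrals of positive parts of density differences.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def hockey_stick(p, q, gamma):
    """E_gamma(P || Q) = integral of (p(x) - gamma*q(x))_+ dx."""
    val, _ = quad(lambda x: max(p(x) - gamma * q(x), 0.0),
                  -np.inf, np.inf, limit=200)
    return val

sigma, lam, s, eps = 0.5, 0.5, 1.0, 0.2
p = lambda x: norm.pdf(x, 0, sigma)
q = lambda x: (1 - lam) * norm.pdf(x, 0, sigma) + lam * norm.pdf(x, s, sigma)

gamma = np.exp(eps)
# First value should not exceed the second, as Lemma A.1 predicts.
print(hockey_stick(p, q, gamma), hockey_stick(q, p, gamma))
```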
In light of this lemma, the privacy guarantee of a subsampled Gaussian mechanism is fully characterized by computing only $E_{e^\varepsilon}\!\left( (1-\lambda)P + \lambda T_s P \,\|\, P \right)$, where $P = N(0, \sigma^2 I_d)$. Based on this result, for our numerical experiments we only compute the saddle-point accountant with this order of P and Q.

B. The Method of Steepest Descent

We describe the general approach of the method of steepest descent. Our task is to compute the contour integral

$$I_n \triangleq \frac{1}{2\pi i} \int_{t - i\infty}^{t + i\infty} e^{F_n(z)} \, dz. \qquad (63)$$

What we will obtain is an asymptotic expansion

$$I_n \approx \frac{e^{F_n(t_0)}}{\sqrt{2\pi F_n''(t_0)}} \left( 1 + \sum_{m \ge 1} \beta_{n,m} \right). \qquad (64)$$

In a nutshell, the method of steepest descent is a powerful tool for choosing the best parameter t that renders the computation of $I_n$ easiest. In particular, this choice of t is called the saddle-point, which is found as follows. Here, $F_n$ is holomorphic over a strip $(0, T) + i\mathbb{R}$ in the complex plane, the parameter $n \in \mathbb{N}$ is growing without bound, and $t \in (0, T) \subseteq \mathbb{R}$ is a free parameter. In particular, the value of the integral $I_n$ is assumed to be independent of the parameter t. This can be satisfied for certain choices of $F_n$ by virtue of its analyticity and in view of Cauchy's integral theorem. As we show in Theorem 3.1, computing the above contour integral amounts exactly to computing the privacy parameter $\delta_{L^{(n)}}(\varepsilon)$ if we choose the function

$$F_n(z) = K_{L^{(n)}}(z) - z\varepsilon - \log z - \log(1+z). \qquad (65)$$

Suppose that $F_n''(t) > 0$ over $t \in (0, T)$ (in particular, $F_n$ is strictly convex over the real interval (0, T)) and that there is a value $t_0 \in (0, T)$ solving the equation $F_n'(t_0) = 0$, which is then necessarily unique. Then, a second-order Taylor expansion around $t_0$ yields that

$$F_n(z) = F_n(t_0) + \frac{(z - t_0)^2}{2} F_n''(t_0) + O\!\left( |z - t_0|^3 \right). \qquad (66)$$

Looking at the values of the approximating quadratic $F_n(t_0) + \frac{(z - t_0)^2}{2} F_n''(t_0)$ for z near $t_0$ along the real axis (so $z = t$ for real t near $t_0$) and along the axis $t_0 + i\mathbb{R}$ (so $z = t_0 + is$ for real s), we see that this approximation has a local minimum at $t_0$ along the real axis and a local maximum at $t_0$ along the axis $t_0 + i\mathbb{R}$. Hence, $t_0$ is a saddle-point of the approximating quadratic. Further, as the integral we are concerned with runs along the contour $t + i\mathbb{R}$, we expect the value of $I_n$ to come primarily from values $z \approx t_0$. Now, consider the function

$$G_n(z) \triangleq F_n(t_0 + z) - F_n(t_0) - \frac{z^2}{2} F_n''(t_0). \qquad (67)$$

We have that $G_n$ is holomorphic over some vertical strip centered at the origin, and

$$G_n^{(k)}(0) = \begin{cases} 0, & 0 \le k \le 2, \\ F_n^{(k)}(t_0), & k \ge 3. \end{cases} \qquad (68)$$

We assume for the next steps that $G_n$ is an entire function. Thus, $G_n$ has the expansion

$$G_n(z) = \sum_{k \ge 3} \frac{F_n^{(k)}(t_0)}{k!} z^k. \qquad (69)$$

Furthermore, $e^{G_n(z)}$ has the power series expansion

$$e^{G_n(z)} = 1 + \sum_{k \ge 3} \alpha_{n,k} z^k, \qquad (70)$$

where

$$\alpha_{n,k} \triangleq \frac{1}{k!} B_k\!\left( 0, 0, F_n^{(3)}(t_0), \ldots, F_n^{(k)}(t_0) \right). \qquad (71)$$

As we may write

$$F_n(t_0 + is) = F_n(t_0) + G_n(is) - \frac{s^2}{2} F_n''(t_0), \qquad (72)$$

we get the exact value of the integral

$$I_n = \frac{e^{F_n(t_0)}}{2\pi} \int_{-\infty}^{\infty} e^{-s^2 F_n''(t_0)/2} \left( 1 + \sum_{k \ge 3} \alpha_{n,k} (is)^k \right) ds. \qquad (73)$$

The steps derived thus far have all been justified rigorously. The final step, however, is a heuristic, where we truncate the power series expansion to obtain possible estimates of $I_n$. The point is that the expressions derived through this heuristic have the potential of being proved, by other means, to indeed be close approximations of $I_n$. For instance, dropping the whole series beyond the constant term yields the basic saddle-point approximation

$$I_{n,1} \triangleq \frac{e^{F_n(t_0)}}{2\pi} \int_{-\infty}^{\infty} e^{-s^2 F_n''(t_0)/2} \, ds = \frac{e^{F_n(t_0)}}{\sqrt{2\pi F_n''(t_0)}}. \qquad (74)$$

Note that this approximation is in fact exact if $F_n$ is a quadratic, i.e., for computing the Gaussian integral. Keeping the terms $k \in \{3, \ldots, 2k^\star\}$, it is not hard to see that one obtains the $k^\star$-th estimate

$$I_{n,k^\star} \triangleq \frac{e^{F_n(t_0)}}{\sqrt{2\pi F_n''(t_0)}} \left( 1 + \sum_{m=1}^{k^\star} \beta_{n,m} \right), \qquad (75)$$

where we denote the constants

$$\beta_{n,m} \triangleq \frac{(-1)^m \, B_{2m}\!\left( 0, 0, F_n^{(3)}(t_0), \ldots, F_n^{(2m)}(t_0) \right)}{2^m \, m! \, F_n''(t_0)^m}. \qquad (76)$$

Then one might say that $I_n$ has the asymptotic expansion

$$I_n \approx \frac{e^{F_n(t_0)}}{\sqrt{2\pi F_n''(t_0)}} \left( 1 + \sum_{m \ge 1} \beta_{n,m} \right). \qquad (77)$$

Recall that this does not mean that the above equation holds with equality for any particular n. Rather, it is a heuristic indicating the potential of the truncated expansion to give close approximations of the intended integral $I_n$.
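The heuristic above can be probed numerically (our check): for an $F_n$ built from a quadratic CGF, the basic approximation (74) can be compared against a direct quadrature of the contour integral (63) along the vertical line through the saddle-point.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

# Check (74) against a direct evaluation of (63) for F built from the
# Gaussian-mechanism CGF K(z) = c*(z + z^2)/2 (c = n*eta^2, chosen here).
c, eps = 2.5, 2.0
F = lambda z: 0.5 * c * (z + z * z) - z * eps - np.log(z) - np.log(1 + z)
dF = lambda t: 0.5 * c * (1 + 2 * t) - eps - 1 / t - 1 / (1 + t)
t0 = brentq(dF, 1e-9, 1e6)

# Direct evaluation: I = (1/(2*pi)) * int_R Re[e^{F(t0 + i*s)}] ds
# (the imaginary part integrates to zero by conjugate symmetry).
integrand = lambda s: np.exp(F(t0 + 1j * s)).real
direct, _ = quad(integrand, -np.inf, np.inf, limit=400)
direct /= 2 * np.pi

d2F = c + 1 / t0**2 + 1 / (1 + t0)**2
approx = np.exp(F(t0)) / np.sqrt(2 * np.pi * d2F)
print(direct, approx)  # the two values agree closely
```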
C. Satisfiability of the Assumptions

We explain here how Assumption 2.6 is satisfied by the subsampled Gaussian and Laplace mechanisms. Note that, by the Lebesgue decomposition theorem, the probability measure of the PLRV can always be decomposed into a sum of an absolutely continuous measure, a discrete measure, and a singular measure (such as the Cantor distribution). Thus, Assumption 2.6 requires the exclusion of singular components. This can easily be seen to be satisfied by the subsampled Gaussian and Laplace mechanisms. Further, Assumption 2.6 does not impose any requirement on the discrete part. Thus, we consider the continuous part here. Note that the PLRV for the subsampled Gaussian mechanism (with subsampling rate λ, variance σ², and sensitivity s) is given by

$$L = \log\left( 1 - \lambda + \lambda \, e^{(2sX - s^2)/(2\sigma^2)} \right), \qquad (78)$$

where $X \sim (1-\lambda)N(0, \sigma^2) + \lambda N(s, \sigma^2)$. Hence,

$$P[L \le z] = P\left[ X \le \frac{\sigma^2}{s} \log \frac{e^z - (1-\lambda)}{\lambda} + \frac{s}{2} \right] \qquad (79)$$

if $z > \log(1-\lambda)$, and $P[L \le z] = 0$ otherwise. So, L is continuous with PDF

$$p_L(z) = A \, \frac{e^{2z}}{g(z)^{3/2}} \exp\left( -\frac{\sigma^2}{2s^2} \log^2 \frac{g(z)}{\lambda} \right) \mathbb{1}_{(\log(1-\lambda), \infty)}(z), \qquad (80)$$

where $g(z) \triangleq e^z - (1-\lambda)$ and we have the constant $A \triangleq \frac{\sigma\sqrt{\lambda}}{s\sqrt{2\pi}} \exp\left( -\frac{s^2}{8\sigma^2} \right)$. From this, we see that $p_L(z)$ decays super-exponentially as $z \to \infty$. Further, it is continuous. Indeed, we only need to check continuity at $z = \log(1-\lambda)$; but this is immediate using, e.g., the substitution $y = \log(g(z)/\lambda)$ and taking $y \to -\infty$. These properties imply that Assumption 2.6 is satisfied by the subsampled Gaussian mechanism. Finally, we note that the case of the subsampled Laplace mechanism is simpler. Indeed, taking the analogous expression for L as in (78), we see that L has only a discrete component and a continuous component. Further, the continuous part comes from values of X between 0 and s. This boundedness translates into the fact that the PDF of the continuous part of L is compactly supported, so Assumption 2.6 is satisfied in this case too.

D. Well-Definedness of the Saddle-Point

The well-definedness of the saddle-point, given ε < ess sup L, follows from the convexity of $F_\varepsilon$ over the positive reals. Namely, we show that $F_\varepsilon$ is complex-differentiable and that there is a unique positive real $t_0$ such that $F_\varepsilon'(t_0) = 0$. Let $K_L|_{\mathbb{R}}$ be the restriction of the CGF to the real axis. We have that $K_L|_{\mathbb{R}}$ is convex over $(0, \infty)$, and thus $F_\varepsilon|_{\mathbb{R}}$ is strictly convex there. Thus, the minimum of $F_\varepsilon$ over the positive reals is unique; further, the real derivative at this minimum vanishes. Nevertheless, finiteness of $M_L$ over $(0, \infty)$ implies its analyticity over the half-plane $(0, \infty) + i\mathbb{R}$; in particular, the complex derivative of $F_\varepsilon$ exists in the same half-plane. Hence, the function $F_\varepsilon$ is complex-differentiable at $t_0$, and its derivative vanishes there, as required.
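As a numerical sanity check of the density (80) derived above (our check), one can verify that it integrates to one over its support:

```python
import numpy as np
from scipy.integrate import quad

# Check that the PLRV density (80) for the subsampled Gaussian integrates
# to 1 over (log(1-lam), inf).
lam, sigma, s = 0.01, 2.0, 1.0
A = sigma * np.sqrt(lam) / (s * np.sqrt(2 * np.pi)) * np.exp(-s**2 / (8 * sigma**2))

def p_L(z):
    g = np.exp(z) - (1 - lam)
    return A * np.exp(2 * z) / g**1.5 * np.exp(-(sigma**2 / (2 * s**2)) * np.log(g / lam)**2)

# Tiny offset keeps the integrand away from the g = 0 singularity endpoint.
total, _ = quad(p_L, np.log(1 - lam) + 1e-12, np.inf)
print(total)  # ~1.0
```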
E. New Formulas for the Privacy Curve: Proof of Theorem 3.1

Before proving Theorem 3.1, we show the following general Parseval identity. For $f \in L^1(\mathbb{R})$, we denote the Fourier transform by

$$\hat{f}(\xi) \triangleq \int_{\mathbb{R}} f(x) e^{-ix\xi} \, dx. \qquad (81)$$

Lemma E.1. Let $P = Q + R$ be a Borel probability measure on $\mathbb{R}$, where Q is absolutely continuous with respect to the Lebesgue measure with a square-integrable PDF and R is discrete. For any continuous function $f : \mathbb{R} \to \mathbb{R}$ such that $f \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$, $\hat{f} \in L^1(\mathbb{R})$, and $E_{X \sim P}[|f(X)|] < \infty$, we have the Parseval identity

$$\int_{\mathbb{R}} f(x) \, dP(x) = \frac{1}{2\pi} \int_{\mathbb{R}} \hat{f}(\xi) \, \varphi_P(\xi) \, d\xi, \qquad (82)$$

where $\varphi_P(\xi) \triangleq E_{X \sim P}[e^{i\xi X}]$ is the characteristic function.

Proof. Let q denote the PDF of Q. Suppose R is supported over $\{x_j\}_{j \in J}$, where J is at most countable, and write $r_j \triangleq R(\{x_j\})$. Then, we may write

$$\int_{\mathbb{R}} f(x) \, dP(x) = \int_{\mathbb{R}} f(x) \, dQ(x) + \int_{\mathbb{R}} f(x) \, dR(x) \qquad (83)$$

$$= \int_{\mathbb{R}} f(x) q(x) \, dx + \sum_{j \in J} f(x_j) r_j. \qquad (84)$$

Since $f, q \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$, we have the Parseval identity

$$\int_{\mathbb{R}} f(x) q(x) \, dx = \frac{1}{2\pi} \int_{\mathbb{R}} \hat{f}(\xi) \, \varphi_Q(\xi) \, d\xi. \qquad (85)$$

As we also have continuity of f and integrability of $\hat{f}$, we have the Fourier inversion

$$f(x) = \frac{1}{2\pi} \int_{\mathbb{R}} \hat{f}(\xi) e^{ix\xi} \, d\xi \qquad (86)$$

for every $x \in \mathbb{R}$. In particular, we have that

$$\sum_{j \in J} f(x_j) r_j = \frac{1}{2\pi} \int_{\mathbb{R}} \hat{f}(\xi) \, \varphi_R(\xi) \, d\xi. \qquad (87)$$

The desired result follows from $\varphi_P = \varphi_Q + \varphi_R$.
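Identity (82) is easy to verify numerically in a purely continuous case (our check), using a pair with known transforms: $f(x) = e^{-x^2}$ with $\hat{f}(\xi) = \sqrt{\pi}\, e^{-\xi^2/4}$, and P = N(0, 1) with $\varphi_P(\xi) = e^{-\xi^2/2}$; both sides equal $1/\sqrt{3}$.

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x**2)
phi_pdf = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # N(0,1) density

lhs, _ = quad(lambda x: f(x) * phi_pdf(x), -np.inf, np.inf)
rhs, _ = quad(lambda xi: np.sqrt(np.pi) * np.exp(-xi**2 / 4) * np.exp(-xi**2 / 2),
              -np.inf, np.inf)
print(lhs, rhs / (2 * np.pi))  # both ~ 1/sqrt(3) = 0.57735...
```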
Now, we apply Lemma E.1 to derive Theorem 3.1.

Proof of Theorem 3.1. Expectations of functions of $\tilde{L}$ can be written in terms of L as $E[f(\tilde{L})] = E[e^{tL} f(L)] / M_L(t)$. Thus, the MGF of the tilted variable $\tilde{L}$ is given by

$$M_{\tilde{L}}(z) = E\left[ e^{z\tilde{L}} \right] = \frac{E\left[ e^{tL} e^{zL} \right]}{M_L(t)} = \frac{M_L(t+z)}{M_L(t)}. \qquad (88)$$

Similarly, expectations of functions of L can be written in terms of $\tilde{L}$ as $E[f(L)] = M_L(t) \, E[e^{-t\tilde{L}} f(\tilde{L})]$. Thus, we can write the privacy curve $\delta_L$ in terms of the tilted variable $\tilde{L}$ as

$$\delta_L(\varepsilon) = E\left[ \left( 1 - e^{\varepsilon - L} \right)_+ \right] \qquad (89)$$

$$= M_L(t) \, E\left[ e^{-t\tilde{L}} \left( 1 - e^{\varepsilon - \tilde{L}} \right)_+ \right]. \qquad (90)$$

In other words, the formula in (22) holds. Next, we use Assumption 2.6 to apply Parseval's identity (Lemma E.1) to the expectation in (90) and obtain the contour-integral formula in (23). Specifically, consider the function

$$f(x) = e^{-tx} \left( 1 - e^{\varepsilon - x} \right)_+. \qquad (91)$$

Note that f is bounded, continuous, and $f \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$. Further, we have the Fourier transform

$$\hat{f}(s) = \frac{e^{-(t+is)\varepsilon}}{(t+is)(t+1+is)} \in L^1(\mathbb{R}). \qquad (92)$$

In addition, by Assumption 2.6, the probability measure $P_L$ induced by L may be written as $P_L = Q_L + R_L$, where $Q_L$ is absolutely continuous with respect to the Lebesgue measure with a PDF $q_L$ such that $x \mapsto e^{\tau x} q_L(x)^2$ is integrable for every τ > 0, and $R_L$ is discrete. Suppose $R_L$ is supported over $\{x_j\}_{j \in J}$ with J at most countable, and write $r_{L,j} \triangleq R_L(\{x_j\})$. Then, by the definition of exponential tilting, for every Borel set $B \subseteq \mathbb{R}$, we have that

$$P_{\tilde{L}}(B) = \frac{1}{M_L(t)} \int_B e^{tx} \, dP_L(x) \qquad (93)$$

$$= \frac{1}{M_L(t)} \int_B e^{tx} \, dQ_L(x) + \frac{1}{M_L(t)} \int_B e^{tx} \, dR_L(x) \qquad (94)$$

$$= \frac{1}{M_L(t)} \int_B e^{tx} q_L(x) \, dx + \frac{1}{M_L(t)} \sum_{j \in J : x_j \in B} e^{t x_j} r_{L,j} \qquad (95)$$

$$= Q(B) + R(B), \qquad (96)$$

where we define the Borel measures

$$Q(B) \triangleq \frac{1}{M_L(t)} \int_B e^{tx} q_L(x) \, dx, \qquad (97)$$

$$R(B) \triangleq \frac{1}{M_L(t)} \sum_{j \in J : x_j \in B} e^{t x_j} r_{L,j}. \qquad (98)$$

From these definitions, it is clear that R is discrete and Q is absolutely continuous with respect to the Lebesgue measure with PDF $q(x) \triangleq e^{tx} q_L(x) / M_L(t)$. Furthermore, by the assumption on $q_L$, we have that $q \in L^2(\mathbb{R})$. Therefore, we may apply Parseval's identity (Lemma E.1) to f and $P_{\tilde{L}}$ to obtain

$$E\left[ f(\tilde{L}) \right] = \frac{1}{2\pi} \int_{\mathbb{R}} \hat{f}(s) \, \varphi_{P_{\tilde{L}}}(s) \, ds. \qquad (99)$$

Next, applying the formula for $M_{\tilde{L}}$ in (88), we see that

$$\varphi_{P_{\tilde{L}}}(s) = E\left[ e^{is\tilde{L}} \right] = M_{\tilde{L}}(is) = \frac{M_L(t+is)}{M_L(t)}. \qquad (100)$$

Therefore, combining formulas (90) and (99), we get

$$\delta_L(\varepsilon) = M_L(t) \, E\left[ f(\tilde{L}) \right] = \frac{1}{2\pi} \int_{\mathbb{R}} \hat{f}(s) \, M_L(t+is) \, ds. \qquad (101)$$

Now, using the contour $\{z = t + is : -\infty < s < \infty\}$ oriented counter-clockwise, we see that (101) may be rewritten as the contour integral

$$\delta_L(\varepsilon) = \frac{1}{2\pi i} \int_{t-i\infty}^{t+i\infty} \hat{f}\!\left( (z - t)/i \right) M_L(z) \, dz. \qquad (102)$$

Finally, using the formula for $\hat{f}$ in (92), we deduce

$$\delta_L(\varepsilon) = \frac{1}{2\pi i} \int_{t-i\infty}^{t+i\infty} \frac{e^{-z\varepsilon} \, M_L(z)}{z(z+1)} \, dz \qquad (103)$$

$$= \frac{1}{2\pi i} \int_{t-i\infty}^{t+i\infty} e^{F_\varepsilon(z)} \, dz, \qquad (104)$$

where we define

$$F_\varepsilon(z) \triangleq K_L(z) - \varepsilon z - \log(z) - \log(1+z) \qquad (105)$$

and we take the principal branch of the complex logarithm. This is precisely the desired formula for $\delta_L$ stated in (23), and the proof of the theorem is therefore complete.

F. The Large-Composition Regime: Proof of Theorem 4.1

We may show this result using the standard Berry-Esseen approach. By the Berry-Esseen theorem, we have, for $Z \sim$
H. Contrast between SPA and the Standard CLT

To illustrate the advantage of our tilting approach, we compare the asymptotic behavior of the error in Theorem 5.7 to that obtainable from the non-tilted Berry-Esseen bound. Let $L = L_1 + \cdots + L_n$ for independent PLRVs $L_1, \ldots, L_n$ that satisfy Assumption 2.5. Suppose that Assumption 2.7 holds too. By the Berry-Esseen theorem, we have for a Gaussian $Z \sim N(\mathrm{E}[L], \sigma_L^2)$ that^11

\delta_L(\varepsilon) = \mathrm{E}\left[(1 - e^{\varepsilon - L})_+\right]   (130)
= \int_0^1 \mathrm{P}\left[L > \varepsilon - \log(1-u)\right] du   (131)
= \delta_Z(\varepsilon) + \theta \cdot \frac{0.56 \, P_0}{\sigma_L^3},   (132)

where $|\theta| \le 1$. By Assumption 2.7, the error term in the standard Berry-Esseen approach shown above satisfies

\mathrm{err}_{\text{Standard}}(\varepsilon) \triangleq \frac{0.56 \, P_0}{\sigma_L^3} \sim \frac{0.56 \, P}{V^{3/2} \sqrt{n}}.   (133)

Thus, the improvement our approach yields is asymptotically (see Theorem 5.7 for the definitions of $C(b)$ and $\tau$)

\frac{\mathrm{err}_{\text{SP}}(\varepsilon; t_0)}{\mathrm{err}_{\text{Standard}}(\varepsilon)} \sim 2\sqrt{e} \, e^{-C(b)\tau}.   (134)

Even for moderate values of $b$, the above ratio is very small (recall that we denote $\varepsilon = \mathrm{E}[L] + b\sigma_L$). For example, if $b \approx 6.4$ (so $\delta \approx 10^{-10}$ in the limit; see Theorem 4.1 on the high-composition regime), we obtain the limit of the ratio

\lim_{n \to \infty} \frac{\mathrm{err}_{\text{SP}}(\varepsilon; t_0)}{\mathrm{err}_{\text{Standard}}(\varepsilon)} \approx 3 \times 10^{-9}.   (135)

In addition, in the complementary regime of $\delta \to 0$, e.g., when $\varepsilon = \mathrm{E}[L] + b\sigma_L$ with $b \asymp \log n$ (and still $b = o(n^{1/6})$), one has that the error term in the standard CLT dominates the approximation of $\delta$:

\delta_Z(\varepsilon) = o\left(\mathrm{err}_{\text{Standard}}(\varepsilon)\right).   (136)

In contrast, in the same regime, our error term $\mathrm{err}_{\text{SP}}(\varepsilon; t_0)$ is always vanishingly smaller than the approximation itself, i.e.,

\mathrm{err}_{\text{SP}}(\varepsilon; t_0) = o\left(\delta_{L,\text{SP-CLT}}(\varepsilon)\right).   (137)

^11 Note that $Z$ is not necessarily a PLRV associated to a Gaussian mechanism, since in general $\sigma_L^2 \neq 2\,\mathrm{E}[L]$.

I. Proofs of Section 5.1

I.1. Proof of Proposition 5.3

Denote $K = K_L$ for short. The Gaussian expectation may be computed as

\mathrm{E}[f(Z - \varepsilon, t)] = \exp\!\left(\frac{K''(t) t^2}{2} - (K'(t) - \varepsilon) t\right) Q\!\left(\sqrt{K''(t)} \, t - \frac{K'(t) - \varepsilon}{\sqrt{K''(t)}}\right)
- \exp\!\left(\frac{K''(t) (t+1)^2}{2} - (K'(t) - \varepsilon)(t+1)\right) Q\!\left(\sqrt{K''(t)} \, (t+1) - \frac{K'(t) - \varepsilon}{\sqrt{K''(t)}}\right).   (138)

Using $Q(z) = q(z) e^{-z^2/2}/\sqrt{2\pi}$ and the definitions of $\alpha, \beta, \gamma$, we get

\mathrm{E}[f(Z - \varepsilon, t)] = \frac{q(\alpha) - q(\beta)}{\sqrt{2\pi}} \, e^{-\gamma^2/2}.   (139)

Plugging this into the definition of $\delta_{L,\text{SP-CLT}}$ completes the proof.

I.2. Proof of Proposition 5.5

Let $Z \sim N(K_L'(t), K_L''(t))$ be the variable in the expectation in (38). Its PDF is upper bounded by $p_Z(z) \le 1/\sqrt{2\pi K_L''(t)}$. Thus,

\mathrm{E}\left[e^{-t(Z - \varepsilon)} (1 - e^{-(Z - \varepsilon)})_+\right] = \int_\varepsilon^\infty p_Z(z) \, e^{-t(z - \varepsilon)} (1 - e^{-(z - \varepsilon)}) \, dz   (140)
\le \frac{1}{\sqrt{2\pi K_L''(t)}} \int_\varepsilon^\infty e^{-t(z - \varepsilon)} (1 - e^{-(z - \varepsilon)}) \, dz   (141)
= \frac{1}{\sqrt{2\pi K_L''(t)} \; t(t+1)}.   (142)

Applying this bound to the definition of $\delta_{L,\text{SP-CLT}}(\varepsilon)$ in (38) completes the proof.
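Proposition 5.3 makes the SPA-CLT estimate a closed form that can be evaluated stably: the factor $q(z) = Q(z)/\varphi(z)$ is the Gaussian Mills ratio, which equals $\sqrt{\pi/2}\,\mathrm{erfcx}(z/\sqrt{2})$ and therefore avoids the overflow of computing $e^{z^2/2}$ directly. The following Python sketch (our own, with the Gaussian CGF as a stand-in for a generic $K_L$; all names are ours) evaluates $\delta_{L,\text{SP-CLT}}$ at the saddle-point $t_0$ of (122).

```python
# Sketch: SPA-CLT estimate delta(eps) = e^{K(t0) - eps*t0 - gamma^2/2}
#         * (q(alpha) - q(beta)) / sqrt(2*pi), with the Mills ratio q = Q/phi
#         computed stably via erfcx. Illustrative Gaussian CGF; names are ours.
import numpy as np
from scipy.special import erfcx
from scipy.optimize import brentq

n, mu, eps = 1000, 0.1, 8.0
K = lambda t: n * 0.5 * mu**2 * (t + t**2)     # CGF of L = L_1 + ... + L_n
K1 = lambda t: n * 0.5 * mu**2 * (1 + 2 * t)   # K'
K2 = lambda t: n * mu**2                       # K'' (constant in the Gaussian case)

q = lambda z: np.sqrt(np.pi / 2) * erfcx(z / np.sqrt(2))  # Mills ratio Q(z)/phi(z)

t0 = brentq(lambda t: K1(t) - eps - 1/t - 1/(1+t), 1e-12, 1e6)
gamma = (K1(t0) - eps) / np.sqrt(K2(t0))
alpha = np.sqrt(K2(t0)) * t0 - gamma
beta = np.sqrt(K2(t0)) * (t0 + 1) - gamma

delta_sp_clt = np.exp(K(t0) - eps * t0 - gamma**2 / 2) \
               * (q(alpha) - q(beta)) / np.sqrt(2 * np.pi)
print(delta_sp_clt)
```

For the Gaussian CGF used here the tilted PLRV is itself Gaussian, so the CLT step is exact and the output reproduces the known Gaussian-mechanism privacy curve; for general mechanisms the output is accurate up to the error bound of Theorem 5.6.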
J. Error Bound for SPA-CLT: Proof of Theorem 5.6

Fix $t > 0$. Recall from (36) that

\delta_L(\varepsilon) = e^{K_L(t) - \varepsilon t} \, \mathrm{E}\left[f(\tilde{L} - \varepsilon, t)\right],   (143)

where $\tilde{L}$ is the exponential tilting of $L$ with parameter $t$, and

f(x, t) = e^{-xt} (1 - e^{-x})_+.   (144)

Note that $K_L'(t) = \mathrm{E}[\tilde{L}]$ and $K_L''(t) = \mathrm{Var}[\tilde{L}]$. We consider the function $f(x, t)$. We show next that, for fixed $t$, $x \mapsto f(x, t)$ is a unimodal function with a maximal value of $t^t/(t+1)^{t+1}$. Certainly $f(x, t) \ge 0$ for all $x$. For $x > 0$ the derivative (with respect to $x$) is

f'(x, t) = -t e^{-tx} (1 - e^{-x}) + e^{-tx} e^{-x}   (145)
= e^{-tx} \left(-t + (t+1) e^{-x}\right).   (146)

Note that $-t + (t+1) e^{-x}$ is monotonically decreasing in $x$, which means that $f(x, t)$ is increasing until $-t + (t+1) e^{-x} = 0$, and is subsequently decreasing. In particular, the maximal value of $f$ is attained when

x = x_0 = \log\frac{t+1}{t}.   (147)

Note that $x_0 > 0$. Thus, the maximal value of $f$ is

f_{\max} \triangleq f(x_0, t) = f\!\left(\log\frac{t+1}{t}, \, t\right)   (148)
= \left(\frac{t}{t+1}\right)^t \left(1 - \frac{t}{t+1}\right) = \frac{t^t}{(t+1)^{t+1}}.   (149)

Thus, between $x = 0$ and $x = x_0$, $f(x, t)$ is monotonically increasing from 0 to $f_{\max}$; then from $x = x_0$ to $x = \infty$, $f(x, t)$ is monotonically decreasing from $f_{\max}$ to 0. Thus, there exist functions $f_1^{-1}(z)$ and $f_2^{-1}(z)$ such that, for any $z \in (0, f_{\max})$, $f(x, t) > z$ if and only if $f_1^{-1}(z) < x < f_2^{-1}(z)$. Hence,

\mathrm{E}\left[f(\tilde{L} - \varepsilon, t)\right] = \int_0^{f_{\max}} \mathrm{P}\left[f(\tilde{L} - \varepsilon, t) > z\right] dz   (150)
= \int_0^{f_{\max}} \mathrm{P}\left[f_1^{-1}(z) < \tilde{L} - \varepsilon < f_2^{-1}(z)\right] dz.   (151)

In addition, we may apply the Berry-Esseen theorem to write

\left|\mathrm{P}[\tilde{L} > x] - \mathrm{P}[Z > x]\right| \le \frac{0.56 \, P_t}{K_L''(t)^{3/2}},   (152)

where $Z \sim N(K_L'(t), K_L''(t))$ and $P_t$ is defined in the beginning of Section 5. Applying (152) at both endpoints of the interval in (151) (whence the constant $1.12 = 2 \times 0.56$), we have the upper bound

\delta_L(\varepsilon) = e^{K_L(t) - \varepsilon t} \, \mathrm{E}\left[f(\tilde{L} - \varepsilon, t)\right]   (153)
= e^{K_L(t) - \varepsilon t} \int_0^{f_{\max}} \mathrm{P}\left[f_1^{-1}(z) < \tilde{L} - \varepsilon < f_2^{-1}(z)\right] dz
\le e^{K_L(t) - \varepsilon t} \left(\int_0^{f_{\max}} \mathrm{P}\left[f_1^{-1}(z) < Z - \varepsilon < f_2^{-1}(z)\right] dz + \frac{1.12 \, f_{\max} \, P_t}{K_L''(t)^{3/2}}\right)
= e^{K_L(t) - \varepsilon t} \left(\mathrm{E}\left[f(Z - \varepsilon, t)\right] + \frac{1.12 \, f_{\max} \, P_t}{K_L''(t)^{3/2}}\right).

Similarly, we have the lower bound

\delta_L(\varepsilon) \ge e^{K_L(t) - \varepsilon t} \left(\mathrm{E}\left[f(Z - \varepsilon, t)\right] - \frac{1.12 \, f_{\max} \, P_t}{K_L''(t)^{3/2}}\right).

This completes the proof of the theorem.

K. Asymptotic of the SPA-CLT Approximation Error: Proof of Theorem 5.7

We write $K = K_L$ for short. Recall the definition of the error term in (44):

\mathrm{err}_{\text{SP}}(\varepsilon; t_0) = e^{K(t_0) - \varepsilon t_0} \cdot \frac{t_0^{t_0}}{(1+t_0)^{1+t_0}} \cdot \frac{1.12 \, P_{t_0}}{K''(t_0)^{3/2}}.   (158)

From the characterization of the saddle-point in Theorem 4.2, we have that

t_0 \sim \frac{b + \sqrt{b^2 + 4}}{2 \sigma_L}.   (159)

By Assumption 2.7, we have that $\sigma_L^2 = K''(0) \sim nV$ as $n \to \infty$. Hence, $t_0 \sim c/\sqrt{n}$ for $c = (b + \sqrt{b^2+4})/(2\sqrt{V}) = o(n^{1/6})$. Thus, by Assumption 2.7 again, $(K''(t_0), P_{t_0})/n \to (V, P)$. As we also have that $t_0 \to 0$, we conclude that

\frac{t_0^{t_0}}{(1+t_0)^{1+t_0}} \cdot \frac{1.12 \, P_{t_0}}{K''(t_0)^{3/2}} \sim \frac{1.12 \, P}{V^{3/2} \sqrt{n}}.   (160)

Thus, it only remains to analyze the asymptotic of $\exp(K(t_0) - \varepsilon t_0)$. We use the following Taylor expansion of $K$ around 0:

K(t_0) = t_0 \, \mathrm{E}[L] + \frac{t_0^2}{2} \sigma_L^2 + \frac{t_0^3}{6} K'''(\xi),   (161)

where $0 \le \xi \le t_0$. Using $\varepsilon = \mathrm{E}[L] + b\sigma_L$, and writing $t_0 = d_0/\sigma_L$ (so $d_0 \sim (b + \sqrt{b^2+4})/2$ by (159)), we obtain

K(t_0) - \varepsilon t_0 = \frac{d_0^2}{2} - b d_0 + \frac{d_0^3 \, K'''(\xi)}{6 \sigma_L^3}.   (162)

Now, note that $K'''(\xi) = \sum_{j=1}^n K_{L_j}'''(\xi)$. Thus, applying the triangle inequality, we obtain that $|K'''(\xi)| \le P_\xi$. As $0 \le \xi \le t_0$, Assumption 2.7 yields that $|K'''(\xi)| = O(n)$. As $\sigma_L = \Theta(\sqrt{n})$ and $d_0 = o(n^{1/6})$, we infer that

\frac{d_0^3 \, K'''(\xi)}{6 \sigma_L^3} \to 0   (163)

as $n \to \infty$. Hence,

\exp(K(t_0) - \varepsilon t_0) \sim \exp\!\left(\frac{d_0^2}{2} - b d_0\right).   (164)

Writing $d_0 = \tau_0 (b + \sqrt{b^2+4})/2$, so $\tau_0 > 0$ and $\tau_0 \to 1$ by (159), then collecting terms (and using that $(b + \sqrt{b^2+4})/2$ is the positive root of $x^2 - bx - 1 = 0$), we obtain

\frac{d_0^2}{2} - b d_0 = \frac{\tau_0^2}{2} - (2 - \tau_0)\tau_0 \cdot \frac{b^2 + b\sqrt{b^2+4}}{4}.   (165)

Therefore, we obtain that

\exp(K(t_0) - \varepsilon t_0) \sim \sqrt{e} \, e^{-C(b)\tau},   (166)

where $\tau \triangleq (2 - \tau_0)\tau_0 \to 1$ and $C(b) = (b^2 + b\sqrt{b^2+4})/4$. Putting the asymptotics shown above together, we conclude that

\mathrm{err}_{\text{SP}}(\varepsilon; t_0) \sim \frac{1.12 \sqrt{e} \, P \, e^{-C(b)\tau}}{V^{3/2} \sqrt{n}},   (167)

as desired.

L. Instantiation of the Saddle-Point Accountant

Algorithm 1 (SADDLEPOINTACCOUNTANT), giving the workflow of the versions of the SPA, is presented here.

Algorithm 1: SADDLEPOINTACCOUNTANT (SPA)
1: Input: A finite set $E \subset [0, \infty)$ (values of $\varepsilon$), and tightly dominating distributions $(P_1, Q_1), \ldots, (P_n, Q_n)$.
2: Output: Four approximations $\delta^{(k)}_{L,\text{SP-MSD}}$, $1 \le k \le 3$, and $\delta_{L,\text{SP-CLT}}$ of the privacy curve $\delta_L$, and an error bound so that $|\delta_L(\varepsilon) - \delta_{L,\text{SP-CLT}}(\varepsilon)| \le \mathrm{err}_{\text{SP}}(\varepsilon)$.
3: $L_j \leftarrow \log\frac{dP_j}{dQ_j}(X_j)$ where $X_j \sim P_j$, for $j \in [n]$
4: $K_{L_j}(t) \leftarrow \log \mathrm{E}[e^{t L_j}]$, for $j \in [n]$
5: $L \leftarrow L_1 + \cdots + L_n$
6: $K_L \leftarrow K_{L_1} + \cdots + K_{L_n}$
7: for $\varepsilon \in E$ do
8:   $t_0 \leftarrow$ positive solution to $K_L'(t_0) = \varepsilon + \frac{1}{t_0} + \frac{1}{t_0+1}$
9:   $F_\varepsilon(t) \leftarrow K_L(t) - \varepsilon t - \log t - \log(t+1)$
10:  $\beta_{\varepsilon,2} \leftarrow \frac{F^{(4)}_\varepsilon(t_0)}{8 F''_\varepsilon(t_0)^2} - \frac{5 F^{(3)}_\varepsilon(t_0)^2}{24 F''_\varepsilon(t_0)^3}$
11:  $\beta_{\varepsilon,3} \leftarrow -\frac{F^{(6)}_\varepsilon(t_0)}{48 F''_\varepsilon(t_0)^3} + \frac{7 F^{(3)}_\varepsilon(t_0) F^{(5)}_\varepsilon(t_0)}{48 F''_\varepsilon(t_0)^4} + \frac{35 F^{(4)}_\varepsilon(t_0)^2}{384 F''_\varepsilon(t_0)^4} - \frac{35 F^{(3)}_\varepsilon(t_0)^2 F^{(4)}_\varepsilon(t_0)}{64 F''_\varepsilon(t_0)^5} + \frac{385 F^{(3)}_\varepsilon(t_0)^4}{1152 F''_\varepsilon(t_0)^6}$
12:  $\delta^{(1)}_{L,\text{SP-MSD}}(\varepsilon) \leftarrow \frac{e^{F_\varepsilon(t_0)}}{\sqrt{2\pi F''_\varepsilon(t_0)}}$
13:  $\delta^{(2)}_{L,\text{SP-MSD}}(\varepsilon) \leftarrow \frac{e^{F_\varepsilon(t_0)}}{\sqrt{2\pi F''_\varepsilon(t_0)}} \, (1 + \beta_{\varepsilon,2})$
14:  $\delta^{(3)}_{L,\text{SP-MSD}}(\varepsilon) \leftarrow \frac{e^{F_\varepsilon(t_0)}}{\sqrt{2\pi F''_\varepsilon(t_0)}} \, (1 + \beta_{\varepsilon,2} + \beta_{\varepsilon,3})$
15:  $\gamma \leftarrow \frac{K_L'(t_0) - \varepsilon}{\sqrt{K_L''(t_0)}}$
16:  $(\alpha, \beta) \leftarrow \left(\sqrt{K_L''(t_0)} \, t_0 - \gamma, \; \sqrt{K_L''(t_0)} \, (t_0+1) - \gamma\right)$
17:  $\delta_{L,\text{SP-CLT}}(\varepsilon) \leftarrow e^{K_L(t_0) - \varepsilon t_0 - \gamma^2/2} \, \frac{q(\alpha) - q(\beta)}{\sqrt{2\pi}}$
18:  $\tilde{L}_j \leftarrow$ exponential tilting of $L_j$ with parameter $t_0$, for $j \in [n]$
19:  $P^{(n)}_{t_0} \leftarrow \sum_{j \in [n]} \mathrm{E}\left|\tilde{L}_j - K_{L_j}'(t_0)\right|^3$
20:  $\mathrm{err}_{\text{SP}}(\varepsilon) \leftarrow e^{K_L(t_0) - \varepsilon t_0} \cdot \frac{t_0^{t_0}}{(1+t_0)^{1+t_0}} \cdot \frac{1.12 \, P^{(n)}_{t_0}}{K_L''(t_0)^{3/2}}$
21: end for
22: Return: $\delta^{(k)}_{L,\text{SP-MSD}}$, $1 \le k \le 3$, $\delta_{L,\text{SP-CLT}}$, $\mathrm{err}_{\text{SP}}$.
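To make lines 8-12 of Algorithm 1 concrete, the following minimal sketch (our own, with hypothetical names; the released implementation linked in the introduction is more general) evaluates the order-1 SPA-MSD estimate $\delta^{(1)}_{L,\text{SP-MSD}}$ for $n$ i.i.d. Gaussian-mechanism PLRVs, for which $F_\varepsilon$ and its second derivative are available in closed form.

```python
# Sketch of Algorithm 1, lines 8-12 (order-1 SPA-MSD), for n i.i.d. Gaussian
# PLRVs with K_L(t) = n*(mu^2/2)*(t + t^2). Names are ours, for illustration.
import numpy as np
from scipy.optimize import brentq

def spa_msd_order1(eps, n, mu):
    K = lambda t: n * 0.5 * mu**2 * (t + t**2)
    K1 = lambda t: n * 0.5 * mu**2 * (1 + 2 * t)
    K2 = n * mu**2                                  # K'' is constant here

    # Line 8: saddle point of F_eps.
    t0 = brentq(lambda t: K1(t) - eps - 1/t - 1/(1+t), 1e-12, 1e6)

    # Line 9 and its second derivative: F(t) = K(t) - eps*t - log t - log(1+t),
    # so F''(t) = K''(t) + 1/t^2 + 1/(1+t)^2.
    F_t0 = K(t0) - eps * t0 - np.log(t0) - np.log(1 + t0)
    F2_t0 = K2 + 1 / t0**2 + 1 / (1 + t0)**2

    # Line 12: first-order steepest-descent approximation.
    return np.exp(F_t0) / np.sqrt(2 * np.pi * F2_t0)

print(spa_msd_order1(eps=8.0, n=1000, mu=0.1))
```

The higher-order estimates of lines 13-14 only add the correction factors $\beta_{\varepsilon,2}$ and $\beta_{\varepsilon,3}$, which require the higher derivatives of $F_\varepsilon$ at $t_0$ but no further optimization.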
M. Ground-Truth Curve Computation

We explain here how the ground-truth curve in Figure 2 is computed. Since the setting there is for self-composition, we employ that here too. So, let $L_1, \ldots, L_n$ be i.i.d. PLRVs for the subsampled Gaussian mechanism, and consider the PLRV $L = L_1 + \cdots + L_n$ for the composed mechanism. Recall that the saddle-point accountant gives various approximations to the contour integral given in Theorem 3.1, which we copy here:

\delta_{L_1}(\varepsilon) = \frac{1}{2\pi i} \int_{t - i\infty}^{t + i\infty} e^{F_\varepsilon(z)} \, dz,   (168)

where the function $F_\varepsilon$ is defined as

F_\varepsilon(z) = K_{L_1}(z) - \varepsilon z - \log z - \log(1+z).   (169)

After $n$ compositions, the contour integral becomes

\delta_L(\varepsilon) = \frac{1}{2\pi i} \int_{t - i\infty}^{t + i\infty} e^{n K_{L_1}(z) - \varepsilon z - \log z - \log(1+z)} \, dz.   (170)

Recall that this formula holds for any value of $t > 0$. The ground truth in (170) is then computed via standard numerical integration, which evidently is a time-consuming process, yet one that produces a reference value against which the accuracies of the accountants can be compared.

Let $P = N(0, \sigma^2)$ and $Q = (1-\lambda) N(0, \sigma^2) + \lambda N(s, \sigma^2)$. The composed subsampled Gaussian mechanism has the PLRV $L = L_1 + \cdots + L_n$, where the $L_j$ are independent and (see Lemma 2.2)

L_j = \log \frac{dQ}{dP}(X) = \log\left(1 - \lambda + \lambda e^{s(2X - s)/(2\sigma^2)}\right), \quad X \sim (1-\lambda) N(0, \sigma^2) + \lambda N(s, \sigma^2).   (171)

In addition, the MGF of $L_1$ may be written as

M_{L_1}(z) = \mathrm{E}[e^{z L_1}]   (172)
= \mathrm{E}_{X \sim Q}\left[\left(\frac{dQ}{dP}(X)\right)^{z}\right]   (173)
= \mathrm{E}_{X \sim P}\left[\left(\frac{dQ}{dP}(X)\right)^{z+1}\right]   (174)
= \int_{\mathbb{R}} \left(1 - \lambda + \lambda e^{s(2x - s)/(2\sigma^2)}\right)^{z+1} dP(x).   (175)

Recall that the CGF is given by

K_{L_1}(z) = \log M_{L_1}(z).   (176)

Plugging (176) into the contour integral (170), the contour integral can be directly computed using standard numerical libraries. We note that this calculation is very slow, as the integrand in (170) itself involves an integral over $\mathbb{R}$. Moreover, we numerically invert this function via bisection to obtain the curve described in Figure 2. This ground-truth curve was computed on a 64-core cluster using multi-processing to distribute the workload, and took a wall-time of 45 minutes. This amounts to a runtime of 48 CPU hours. In contrast, all other accountants run in the order of seconds on a commercial laptop.
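For concreteness, the inner integral (175) and the outer contour integral (170) can be chained with standard numerical libraries as follows. This bare-bones Python sketch (names are ours; the actual ground-truth computation additionally parallelizes over $\varepsilon$ and over the contour) is deliberately unoptimized and slow, exactly as described above.

```python
# Sketch of the ground-truth computation (170)+(175) for the subsampled
# Gaussian: an outer contour integral whose integrand requires an inner
# integral for the per-step CGF. Bare-bones and slow by design; names are ours.
import numpy as np
from scipy.integrate import quad

sigma, lam, s, n, eps = 2.0, 0.01, 1.0, 3000, 1.0
t = 0.5   # any t > 0 is valid in exact arithmetic; a value near the
          # saddle-point improves the numerical behavior of the outer integral

def K1_subsampled(z):
    """K_{L1}(z) = log E_P[(1 - lam + lam*exp(s(2x-s)/(2 sigma^2)))^(z+1)]."""
    def integrand(x, part):
        ratio = 1 - lam + lam * np.exp(s * (2 * x - s) / (2 * sigma**2))
        val = ratio ** (z + 1) * np.exp(-x**2 / (2 * sigma**2))
        return val.real if part == 're' else val.imag
    re, _ = quad(lambda x: integrand(x, 're'), -np.inf, np.inf)
    im, _ = quad(lambda x: integrand(x, 'im'), -np.inf, np.inf)
    return np.log((re + 1j * im) / np.sqrt(2 * np.pi * sigma**2))

def contour_integrand(u):
    z = t + 1j * u
    F = n * K1_subsampled(z) - eps * z - np.log(z) - np.log(1 + z)
    return np.exp(F).real   # the imaginary part is odd in u and cancels

delta, _ = quad(contour_integrand, -np.inf, np.inf, limit=200)
print(delta / (2 * np.pi))
```

Every outer quadrature node triggers two inner quadratures, which is what makes this reference computation orders of magnitude slower than the closed-form SPA estimates.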
N. Additional Numerical Experiments

We provide further experiments exploring the flexibility of the saddle-point accountant. We show that the SPA-MSD approximations can be accurate even in the moderate-composition regime, though the SPA-CLT bounds become loose for a small number of compositions. We demonstrate this using parameters from a real-world application of DP to image-classification SGD in (De et al., 2022), which uses the subsampled Gaussian as the DP mechanism. In particular, we use the noise scale $\sigma = 9.4$ and subsampling rate $\lambda = 2^{14}/50000$, as these were the values that allowed a 40-layer Wide-ResNet to achieve a new SOTA accuracy of 81.4% on CIFAR-10 under $(\varepsilon = 8, \delta = 10^{-5})$-DP. This algorithm went up to $n = 2000$ compositions to achieve this SOTA.

First, we plot the $(\varepsilon, \delta)$-curves at $n \in \{100, 250, 500, 2000\}$ compositions in Figure 4. We observe that the CLT bounds get tighter as the number of compositions increases, but the order-1 SPA-MSD remains consistently accurate for all presented compositions and values of $\delta$. Second, we demonstrate the accuracy of the order-1 SPA-MSD for all compositions up to 2000 in Figure 5, where we fix $\delta = 10^{-5}$, vary the number of compositions, and plot the resulting value of $\varepsilon$. These two plots verify that the order-1 SPA-MSD is much more accurate than the CLT bounds suggest.

Finally, to demonstrate the applicability of the saddle-point accountant to mechanisms beyond the subsampled Gaussian, we present in Figure 6 results for the composed Laplace mechanism. In particular, we borrow the mechanism parameters used in the PRV Accountant notebook showcasing their implementation, i.e., $n = 1000$ compositions with a sensitivity of $s = 0.01$ and a shape parameter $b = 1$. We compare against the Moments Accountant (using the Laplace cumulant), the PRV Accountant, and the Connect the Dots implementation, in the same spirit as Figure 1. Note that we use $\varepsilon_{\text{error}} = 0.01$ for the PRV Accountant, which is an order of magnitude finer than the $\varepsilon_{\text{error}} = 0.1$ used in the linked notebook, for demonstration purposes. Once again, the SPA upper and lower bounds have a narrow gap between them.

[Figure 4: four panels, (a) 100, (b) 250, (c) 500, and (d) 2000 compositions, each comparing the Moments Accountant, PRV Accountant, Connect the Dots, SPA-MSD (k=1), and SPA-CLT Bounds.]

Figure 4. Accounting for the composition of $n \in \{100, 250, 500, 2000\}$ subsampled Gaussian mechanisms, with noise scale $\sigma = 9.4$ and subsampling rate $\lambda = 2^{14}/50000$. The PRV Accountant (Gopi et al., 2021) discretization parameters are $\varepsilon_{\text{error}} = 0.1$, $\delta_{\text{error}} = 10^{-10}$. The Connect the Dots (Doroshenko et al., 2022) discretization interval length is 0.005.

Figure 5. Privacy budget $\varepsilon$ of the subsampled Gaussian mechanism after $1 \le n \le 2000$ compositions using the order-1 SPA-MSD, the Moments Accountant, and the PRV Accountant (Gopi et al., 2021). We use subsampling rate $\lambda = 2^{14}/50000$, noise scale $\sigma = 9.4$, and $\delta = 10^{-5}$. The discretization parameters for the PRV Accountant are $\varepsilon_{\text{error}} = 0.1$, $\delta_{\text{error}} = 10^{-10}$.

Figure 6. Accounting for the composition of 1000 Laplace mechanisms having shape parameter $b = 1$ and sensitivity $s = 0.01$. The remaining FFT discretization parameters are set to $\varepsilon_{\text{error}} = 0.01$, $\delta_{\text{error}} = 10^{-10}$ for the PRV Accountant (Gopi et al., 2021), and discretization interval length of $2 \times 10^{-4}$ for Connect the Dots (Doroshenko et al., 2022).
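As noted in Appendix M, fixed-$\delta$ budgets like those reported in Figure 5 are obtained by inverting the monotonically decreasing map $\varepsilon \mapsto \delta(\varepsilon)$ via bisection. A minimal sketch of this inversion, assuming any callable accountant estimate delta_of_eps (e.g., the SPA sketches above; names are ours):

```python
# Sketch: invert a privacy curve eps -> delta(eps) (strictly decreasing in eps)
# by bisection, to report the privacy budget eps at a target delta as in
# Figure 5. `delta_of_eps` is any accountant's estimate; names are ours.
from scipy.optimize import brentq

def eps_at_delta(delta_of_eps, target_delta, eps_lo=1e-6, eps_hi=100.0):
    # Requires delta_of_eps(eps_lo) > target_delta > delta_of_eps(eps_hi).
    return brentq(lambda e: delta_of_eps(e) - target_delta, eps_lo, eps_hi)

# Example, reusing the spa_msd_order1 sketch from Appendix L:
# eps = eps_at_delta(lambda e: spa_msd_order1(e, n=2000, mu=0.1), 1e-5)
```

Since each SPA evaluation is constant-time, this inversion also runs in constant time per target $\delta$, in contrast to the FFT-based accountants, which must redo a discretized convolution for each setting.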