# optimizing_noise_distributions_for_differential_privacy__96d1fc63.pdf Optimizing Noise Distributions for Differential Privacy Atefeh Gilani 1 Juan Felipe Gomez 2 Shahab Asoodeh 3 Flavio P. Calmon 4 Oliver Kosut 1 Lalitha Sankar 1 We propose a unified optimization framework for designing continuous and discrete noise distributions that ensure differential privacy (DP) by minimizing R enyi DP, a variant of DP, under a cost constraint. R enyi DP has the advantage that by considering different values of the R enyi parameter α, we can tailor our optimization for any number of compositions. To solve the optimization problem, we reduce it to a finite-dimensional convex formulation and perform preconditioned gradient descent. The resulting noise distributions are then compared to their Gaussian and Laplace counterparts. Numerical results demonstrate that our optimized distributions are consistently better, with significant improvements in (ε, δ)-DP guarantees in the moderate composition regimes, compared to Gaussian and Laplace distributions with the same variance. 1. Introduction Differential privacy (DP) (Dwork et al., 2006a;b) provides strong privacy protections by ensuring that information released from queries on sensitive data does not reveal whether any individual is included in the dataset. This is achieved through privacy mechanisms that add noise to query responses, effectively masking any private information. In practice, privacy mechanisms are rarely applied only once to sensitive data. Instead, they are often used sequentially, with the frequency depending on the specific use case. In what we call the moderate composition regime, privacy mechanisms are applied a limited number of times to release aggregated data. For example, the U.S. Census Bureau 1School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, USA 2Department of Physics, Harvard University, Cambridge, MA, USA 3Department of Computing and Software, Mc Master University, Hamilton, Ontario, Canada 4School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA. Correspondence to: Atefeh Gilani . Proceedings of the 42 nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s). (Abowd et al., 2022) publishes demographic statistics such as population density or employment rates for each state, requiring repeated applications of privacy-preserving methods. On the other hand, the large composition regime involves thousands of sequential applications of privacy mechanisms, such as during the training of a machine learning model. The most widely used noise distributions incorporated into privacy mechanisms for achieving DP are Laplace noise (Dwork et al., 2006b), typically employed when pure DP is desired, and Gaussian noise (Abadi et al., 2016), often used for approximate DP (i.e., (ε, δ)-DP). However, these distributions may not necessarily be optimal across all settings in which they are applied. For example, while the Gaussian distribution is commonly used to ensure approximate DP in the large composition regime, the parametrized Cactus distribution has been shown to be optimal in this regime for a fixed sensitivity s > 0 (Alghamdi et al., 2022). Moreover, the Schr odinger distribution has been demonstrated to be optimal for vanishing sensitivity (s 0+), and it only recovers the Gaussian distribution as a special case under a quadratic cost function (Alghamdi et al., 2023). Additionally, in (Geng et al., 2015), it was shown that the Staircase distribution is optimal for ε-DP in the single composition regime. In this paper, we focus on designing optimal noise distributions tailored to a fixed number of compositions and sensitivity. We demonstrate that the resulting optimal R enyi DP noise distribution significantly outperforms both Laplace and Gaussian distributions in the moderate composition regime. Discrete noise distributions also hold significant value for DP. This is because finite precision implementations that generate continuous random variables can result in naive floating-point approximations which have been shown to compromise their de facto privacy guarantees (Mironov, 2012). Restricting noise distributions to integer support offers resilience against floating-point attacks and better suits applications involving integer-valued queries, such as population counts in the US Census. To address these challenges, the discrete Gaussian distribution (Canonne et al., 2020) and the discrete Laplace distribution (Ghosh et al., 2012; Balcer & Vadhan, 2017) have been proposed. These distributions, supported on integer values, serve as natural discrete analogues to the continuous Gaussian and Laplace distributions. Optimizing Noise Distributions for Differential Privacy In this paper, we apply exactly the same approach as for continuous distributions to optimize discrete noise distributions. The comparison between the discrete R enyi DP noise distribution and the discrete Gaussian and Laplace distributions mirrors that of their continuous counterparts, with the discrete R enyi DP noise distribution also outperforming the discrete Gaussian and Laplace distributions in the moderate composition regime. Whether in the continuous or discrete case, optimizing with respect to (ε, δ)-DP in the context of compositions is challenging due to the problem s non-convex nature. To overcome this, we leverage the properties of R enyi differential privacy (RDP) (Mironov, 2017), a variant of DP defined in terms of the R enyi divergence of order α, where α (1, ). RDP has garnered significant attention (Abadi et al., 2016; Chen et al., 2019; Feldman et al., 2021; L ecuyer et al., 2021; Feldman et al., 2022). One of its key advantages is that, for composition, the total RDP is the sum of the individual RDPs. Therefore, our approach is to optimize each RDP term individually, which is sufficient to optimize the overall privacy guarantee. In addition, RDP can be converted into (ε, δ)-DP, as in the moments accountant in (Abadi et al., 2016; Mironov, 2017; Bun & Steinke, 2016; Balle et al., 2020; Asoodeh et al., 2021). This conversion involves minimizing over the R enyi order α. Thus, the optimal α is closely tied to the number of compositions, and is generally decreasing with the number of compositions: it tends toward 1 in the large composition regime, and towards infinity in the single-composition regime. For moderate compositions, α generally lies within an intermediate range, avoiding the extremes of 1 and infinity. Because our framework allows us to optimize the noise for any α, our approach encompasses optimal noise distributions in both the large and small composition regimes. In the large composition regime, the Cactus distribution is shown to be optimal in (Alghamdi et al., 2022); this distribution is found by minimizing the Kullback-Leibler divergence, which corresponds to RDP as α 1. Since we incorporate the moments accountant into our optimization framework, for large numbers of compositions, we will automatically seek α near 1, thus recovering the Cactus distribution. In the single-composition regime for pure DP, the optimal α tends to infinity, and our noise distribution recovers the Staircase distribution, consistent with RDP converging to pure DP as α . Our main contributions include: 1. We propose an algorithm for optimizing noise distributions under (ε, δ)-DP, outlined in Algorithm 1 1. Our procedure receives as input user-defined (i) target δ, (ii) number of compositions, (iii) sensitivity, and (iv) average distortion per query (i.e., a cost constraint) such as mean 1Our implementation is available on Git Hub (git, 2025). Algorithm 1 Optimal Noise Distribution for (ε, δ)-DP input : Privacy parameter δ, Number of compositions Nc, Noise scale σ, Query sensitivity s, type (Discrete or Continuous), Hyperparameters θ output : Optimal distribution P minimizing ε for the given δ and Nc, satisfying E[Cost] σ2 1: P0 Initialize Distribution(type, σ, s, θ) 2: P Optimization Algorithm(P0, type, σ, s, δ, Nc, θ) 0.0 0.5 1.0 1.5 2.0 10 8 RDP noise Staircase Cactus Gaussian Laplace Figure 1. This plot compares the privacy curve of our optimized noise designed for 10 compositions, s = 1, and δ = 10 6 against other noise distributions, all with the same standard deviation of 8. At the target δ = 10 6, our noise achieves an ε of 1.62, compared to 1.76 for Laplace/Staircase and 1.74 for Gaussian/Cactus. Replacing these mechanisms with our optimized noise results in at least a 6.89% improvement in the ε value squared-error (MSE). The algorithm outputs an optimized discrete or continuous additive noise distribution. Figure 1 illustrates the resulting (ε, δ) curve for an optimized noise distribution considering 10 compositions, noise standard deviation constraint of 8, and target δ = 10 6. Relative to other noise mechanisms, our optimized mechanisms achieve a lower ε for the same distortion at the target δ. 2. We formulate a finite-dimensional convex optimization problem for producing optimized noise distributions and solve it using preconditioned gradient descent, denoted as Optimization Algorithm in Algorithm 1. This convex problem leverages RDP of order α as an intermediate optimization objective. The RDP hyperparameter α is automatically selected based on the the user-defined input parameters. 3. Our algorithm recovers as special cases noise distributions that are known to be optimal in different regimes, such as the the Staircase and Cactus mechanisms. In the moderate composition regime (roughly 10 40 compositions), both discrete and continuous distributions derived from our framework achieve superior (ε, δ)-DP guarantees compared to their Gaussian and Laplace counterparts for a target cost, Optimizing Noise Distributions for Differential Privacy number of compositions, and δ. We validate the favorable performance of our optimized noise distributions using Connect-the-Dots (Doroshenko et al., 2022) accounting. 2. Preliminaries Notation. Vectors are represented by bold lowercase letters, e.g., x, and matrices by bold capital letters, e.g., A. The symbol B+ denotes the Moore-Penrose right inverse of B (Wikipedia, a). The notation diag(x) denotes a diagonal matrix with the elements of x. The symbol [N] denotes the set of integers from 0 to N, i.e., [N] = {0, 1, 2, . . . , N}. The symbol Φ represents the cumulative distribution function of the standard normal distribution. We review some basic definitions and results from the DP literature. Given an alphabet X, let P(X) be the set of distributions supported on X. For P, Q P(X), the R enyi divergence of order α, for α (0, 1) (1, ), is Dα(P Q) = 1 α 1 log EP Let D be a set of possible datasets, and be a neighboring relation among elements of D. That is, for d, d D we write d d to mean that d and d are neighbors, which typically means that they differ in one entry. A mechanism is a function M : D P(X), which, for each d D, selects a distribution Md P(X). This can be interpreted as a conditional distribution for a random variable supported in X given a dataset from D. Definition 2.1. (Dwork et al., 2006a;b) A mechanism M : D P(X) is said to be (ε, δ)-DP if for all A X, Md(A) eεMd (A) + δ for all d, d D, d d . (2) For a mechanism M, we also define the best ε for a given δ as εM(δ) = inf{ε : M is (ε, δ)-DP}. Definition 2.2. (Mironov, 2017) A mechanism M : D P(X) is said to be (α, γ)-RDP if Dα(Md Md ) γ for all d, d D, d d . (3) For a mechanism M, also define the best γ for a given α as γM(α) = inf{γ : M is (α, γ)-RDP}. (4) For two mechanisms M(1), M(2), each outputting a variable in X, their composition M : D P(X X) is Md(x1, x2) = M(1) d (x1)M(2) d (x2). (5) Proposition 2.3. (Mironov, 2017) For any mechanisms M(1), M(2) and their composition M, γM(α) γM(1)(α) + γM(2)(α). (6) The above is non-adaptive composition, in that each mechanism works independently of the other s output. In contrast, in an adaptive composition, the second mechanism may depend on the output of the first. A similar composition result holds for the adaptive setting (Mironov, 2017), but for brevity we omit the details. A particular consequence of Proposition 2.3 is that, if the same (or equivalent) mechanisms are composed Nc times, then the RDP is simply multiplied by Nc. The moments accountant (Abadi et al., 2016) provides a method to derive (ε, δ) guarantees from (α, γ) guarantees. The most basic form of the moments accountant follows. Proposition 2.4. (Abadi et al., 2016) For a mechanism M, εM(δ) inf α>1 γM(α) + log(1/δ) While improvements to the moments accountant have been made in (Balle et al., 2020; Asoodeh et al., 2021), we use this simple version in our algorithm to simultaneously optimize for the noise distribution and α. However, when we perform numerical privacy accounting for the resulting distribution, we use the state-of-the-art Connect-the-Dots accountant. The sensitivity of a query q : D R describes the maximum difference in the query s output when changing a single entry in the input dataset. Formally, it is defined as s = max d,d D,d d |q(d) q(d )|. (8) Given a real-valued query function q with sensitivity bound s, the Gaussian mechanism involves adding Gaussian noise with zero mean and variance σ2 to the query q. Similarly, the Laplace mechanism involves adding noise from a Laplace distribution with zero mean and scale parameter t to the query q. For many natural queries, the output is inherently discrete, specifically integer-valued, such as when counting how many records in a dataset meet a certain condition. In these cases, it is preferable to add discrete noise directly to the query results. Adding continuous noise to discrete results would require rounding, which can affect the privacy guarantees. Discrete Gaussian and discrete Laplace distributions are the discrete counterparts of their continuous versions and are specifically designed for discrete-valued queries. The definitions of discrete Gaussian and Laplace distributions are as follows. Let Z be the set of integers. The discrete Gaussian distribution, denoted NZ(µ, σ2), for µ Z and σ > 0, is the PMF in P(Z) given by P(x) = e (x µ)2 y Z e (y µ)2 2σ2 , x Z. (9) Optimizing Noise Distributions for Differential Privacy The discrete Laplace distribution, denoted Lap Z(µ, t), for µ Z and t > 0, is the PMF in P(Z) given by P(x) = e1/t 1 e1/t + 1 e |x µ|/t, x Z. (10) 3. Optimized R enyi DP Distributions In this section, we find noise distributions with the best R enyi DP guarantees by formulating a minimax optimization problem. To set up this problem, we first give the following definitions. We use Z to denote the domain of the DP query output, which can be either the real numbers or the integers. The set S is defined as the intersection of the interval [ s, s] with Z, where s Z+ represents the query s sensitivity. Specifically, S = [ s, s] if Z = R, and S = { s, . . . , 0, . . . , s} if Z = Z. We assume a cost function c : Z [0, ) that is symmetric, and an upper bound C R+ on the expected cost. For a function f, Ttf is the shifted version of f by t, i.e., (Ttf)(x) := f(x t). With the above definitions, the optimization problem is minimize PZ P(Z) max t S Dα (PZ Tt PZ) , subject to E[c(Z)] C. (11) For α > 1, we can move the min-max operation inside the logarithm of the R enyi divergence, considering it as our main objective. Let gα(PZ, t) denote the expression inside the logarithm of Dα (PZ Tt PZ), defined as: gα(PZ, t) = EPZ and let gα(PZ) denote the inner maximization: gα(PZ) = max t S gα(PZ, t). (13) We first show that gα(PZ) is convex in PZ. By leveraging this property, along with the symmetry of the cost function, we restrict our search to symmetric noise distributions, as formalized in the following theorem. A detailed proof is provided in the Appendix A. Theorem 3.1. The function gα(PZ) is convex in PZ. Moreover, it suffices to restrict the search to the set of symmetric noise distributions within P(Z) to solve the optimization problem in (11). 3.1. Finite-Dimensional Distribution Classes While we would like to solve (11), it cannot be solved as-is because the search space of distributions is infinitedimensional. In this subsection, we address this by defining finite-dimensional families of distributions that can closely approximate solutions to (11) while being computationally tractable. These families draw inspiration from a similar finite-dimensional parameterization in (Alghamdi et al., 2022). We then present a method for solving the optimization problem within these defined families. For the continuous case, we focus on symmetric piecewise constant probability density functions (PDFs). Importantly, this approach is highly flexible, as any PDF can be closely approximated by using sufficiently small bin widths. This shifts the problem from optimizing the density at every real number to determining the probability pi of bin i. Given these probabilities, the PDF is given by f(z) := p|i| , for z Ii, i Z, (14) where Ii = (i 1 2) , (i + 1 2) denotes the open interval corresponding to bin i, and > 0 represents the bin width. Remark 3.2. As the Lebesgue measure of the breakpoints in the piecewise-constant representation is zero, it is not necessary to explicitly define the value of f at these points. This formulation still involves countably infinite variables, specifically {pi}i Z 0. To ensure a finite number of variables, we introduce the constraint that the distribution exhibits geometric tails. The geometric tails begin beyond a specific interval, starting after bin N, where N is a hyperparameter for the distribution family. The decay rate of the geometric tail is given by a second hyperparameter r. Importantly, this tail assumption does not significantly impact the optimization, provided that the interval is sufficiently large to capture the majority of the PDF s significant behavior. For a fixed , these tails are fully characterized by the probability mass of the bins immediately preceding them and the decay factor r. This reduces the problem to determining the probability masses of the bins preceding the geometric tails, thus converting the problem into a finite-dimensional one. The following formally defines the family of distributions under consideration. Definition 3.3 (Symmetric Piecewise-Constant PDF Family). Let N N, r (0, 1), and > 0. The PDF associated with a vector of probabilities p = (p0, p1, . . . , p N) [0, 1]N+1 is fp,r,N, (z) := , if z Ii, |i| < N, r|i| N, if z Ii, |i| N, (15) subject to the normalization constraint Z R fp,r,N, (z) dz = p0 + 2 j=1 pj + 2p N 1 r = 1. (16) For the discrete case, it is again sufficient to restrict our search to symmetric distributions. We also assume geomet- Optimizing Noise Distributions for Differential Privacy ric tails as in the continuous setting to reduce to a finitedimensional search space. The following gives the family of discrete distributions under consideration. Definition 3.4 (Symmetric PMF Family). Let N N, r (0, 1). The PMF associated with a vector of probabilities p = (p0, p1, , p N) [0, 1]N+1 is Pp,r,N(i) = ( p|i|, for i Z, with |i| N p Nr|i| N, for i Z, with |i| > N, (17) subject to the normalization constraint: i Z Pp,r,N(i) = p0 + 2 i=1 pi + 2p N 1 r = 1. (18) The following proposition characterizes the expected cost for these two distribution families. Proposition 3.5. Let Z be a random variable with distribution given by either the continuous or discrete distribution families described above. Then the expected cost is E[c(Z)] = p0A0 + 2 i=1 pi Ai + 2p N i=N ri NAi, (19) where, for a continuous distribution, z Ii c(z) dz, i Z 0, (20) and for a discrete distribution Ai = c(i). In the special case where c(z) = z2, corresponding to the variance2 of Z, the expression simplifies to, for the continuous case, 12 + 2 2 N 1 X + 2p N 2 r2(N 1)2 + N 2(1 2r) + r(2N + 1) For the discrete case, the variance is the same as in (21), except that the 2/12 term is not present, and = 1. The proof is provided in Appendix B. 3.2. Finite-Dimensional Convex Optimization Within the distribution family described in Definition 3.3 or Definition 3.4, the task of finding the optimal distribution in (11) reduces to determining the vector p = (p0, . . . , p N). Since gα(PZ), defined in (13), is convex in PZ, it is also convex in p. Therefore, the minimization component of the optimization reduces to a finite-dimensional convex optimization problem. The following theorem characterizes 2The distribution s symmetry ensures Z has a mean of zero. gα(PZ, t) in (12) for the two distribution families, and delineates the feasible regions for both continuous and discrete cases. Notably, the resulting finite-dimensional convex optimization problem is almost the same for the two domains. Theorem 3.6. Let s be the sensitivity of a query. Within the symmetric piecewise-constant PDF family defined in Definition 3.3, with a specified bin width of such that s is an integer, the optimization problem for the continuous case, as detailed in (11), can be reformulated as follows: minimize p=(p0,p1, ,p N) max t {1,..., s subject to E[c(Z)] C, j=1 pj + 2 p N pi [0, 1] for all i [N], (22) gα(p, t) = p N r r(1 α)t + rαt + j= N pα |t+j| p1 α |j| + pα Nrα(t N) N X j=N t+1 rαj p1 α |j| + p1 α N r N(1 α) N 1 X j= t N pα |t+j| r j(1 α). (23) For the discrete case within the symmetric PMF family from Definition 3.4, the optimization problem in (11) can also be reformulated as (22) (23) except that we take = 1. The cost constraint E[c(Z)] C for both cases is further detailed in Proposition 3.5. Proof Sketch: The objective in (22) is the quantity inside the logarithm of the R enyi divergence thus the RDP can be easily calculated from the optimal objective. Although the inner maximization in the continuous case initially considers t [ s, s], we demonstrate that the divergence is maximized when the bins of a piecewise constant PDF align with those of its shifted version. In this scenario, the optimal shift is a multiple of the bin width within the interval [ s, s]. The set {1, . . . , s } corresponds to the possible integer multiples of that determine the optimal shift values. The detailed proof is in Appendix C. In the piecewise-constant PDF family introduced in Definition 3.3, setting = 1 creates a PDF where each bin has a width of 1 and is centered on the integers, which can be interpreted as a PMF over the integers. This leads to the objective for the discrete case being a specific instance of the continuous case when = 1. Theorem 3.6 highlights that solving for the optimal noise in the two domains (continuous and discrete) is nearly identical. Optimizing Noise Distributions for Differential Privacy Algorithm 2 Initialize Distribution input : Noise scale σ, type (discrete or continuous), Query sensitivity s, Distribution parameters N, r, output : p0 1: ˆCmin 0 2: ˆCmax 2σ2 4: ˆC ˆCmin + ˆCmax 2 5: for i = 0 to N 1 do 6: pi Φ (i + 1 7: end for 8: p N (1 r) 1 Φ (N 1 9: Var variance of the noise associated with p and type using Proposition 3.5 10: if Var < σ2 then 11: ˆCmin ˆC 12: else 13: ˆCmax ˆC 14: end if 15: until Var and σ2 are sufficiently close 16: p0 p The objectives differ only in that must be 1 for the discrete case. Moreover, as shown in Proposition 3.5, for quadratic cost, the constraints differ only in that the continuous case includes an extra constant term 2/12 whereas discrete does not. Remark 3.7. Let σ represent the standard deviation of the noise. As demonstrated in (Mironov, 2017), for Gaussian and Laplace distributions, the RDP of order α depends solely on the ratio σ/s, rather than on σ and s individually. In fact, our noise distributions satisfy exactly the same property. In the optimization problem in (22), for a quadratic cost with C = σ2, we can rewrite the cost constraint from (21) as: + 2p N r2(N 1)2 + N 2(1 2r) + r(2N + 1) (1 r)3 . (24) Moreover, s appears only in the set over which we maximize. By fixing s/ to an integer, say m, which is feasible because can be adjusted to maintain the desired ratio, we get = s/m. Substituting this into (24), it becomes a function of σ/s. This implies that the optimization now depends solely on the ratio σ/s. Therefore, the RDP of our noise depend solely on the ratio σ/s, similar to the behavior observed for Gaussian and Laplace distributions. Algorithm 3 Optimization Algorithm input : Privacy parameter δ, Number of compositions Nc, Noise scale σ, Query sensitivity s, Total number of iterations K, Initial distribution p0, Distribution parameters N, r, , R enyi parameter (α) update time step T, type (discrete or continuous) output : Final solution p K 2 log (1/δ) 2: Calculate A, b so the linear constraints for cost and normalization are given by Ap = b 3: for k = 1 to K do 4: t arg maxt {1, , s } gα(pk 1, t) 5: M diag(pk 1) 1 6: g M 1 pgα(pk 1, t ) 7: B AM 1 8: B+ BT (BBT ) 1 9: gproj g B+Bg 10: µub mini [N],gproj i >0 1 gproj i 11: pk pk 1 12: gmin gα(pk 1, t ) 13: for µ n µub, µub 2 , . . . , µub 14: ˆp M 1(1 µ gproj) 15: ˆg maxt {1, , s } gα(ˆp, t) 16: if ˆg < gmin then 17: pk ˆp 18: end if 19: end for 20: if k is a multiple of T then 21: α α γ pk,Nc,δ(α) γ pk,Nc,δ(α) 22: end if 23: end for 4. Preconditioned Gradient Descent Algorithm In this section, we present our algorithms for identifying the noise distribution that achieves the best (ε, δ)-DP guarantees. Algorithm 1 is the overall algorithm, which calls Algorithm 2 and then Algorithm 3. The role of Algorithm 2 is to initialize the vector p, which is passed to Algorithm 3. This latter algorithm performs preconditioned gradient descent to find the optimal noise distribution. Algorithm 2 Overview: Since the Gaussian distribution is a straightforward baseline, we initialize with an approximation of this distribution. Since our distribution families cannot exactly capture the Gaussian (or discrete Gaussian) distribution, we must slightly adjust the parameters to exactly match the desired σ. In the algorithm, ˆC represents the variance of the Gaussian: based on this, we derive p to approximate a Gaussian with variance ˆC. However, the resulting distribution will have a slightly different variance, denoted Var. We employ a bisection algorithm to find ˆC so Optimizing Noise Distributions for Differential Privacy 6 4 2 0 2 4 6 z RDP noise Cactus 10.0 7.5 5.0 2.5 0.0 2.5 5.0 7.5 10.0 z RDP noise Staircase Figure 2. The left plot compares our optimized noise distribution and a Cactus distribution, both with standard deviation 0.3, under the setting δ = 10 5, s = 1, and 50,000 compositions, corresponding to α = 1.006 (moments accountant), with noise parameters = 0.005, N = 1600, and r = 0.9999. The right plot compares our optimized noise and a Staircase distribution, both with standard deviation 1, under the setting δ = 10 20, s = 1, and one composition, corresponding to α = 125.01, with noise parameters = 0.01, N = 2000, and r = 0.9999. 15 10 5 0 5 10 15 z 10.0 7.5 5.0 2.5 0.0 2.5 5.0 7.5 10.0 z Figure 3. The left plot displays the optimized noise distribution for a standard deviation of 4, δ = 10 6, s = 1, and 20 compositions, corresponding to α = 5.95 (using the moments accountant). The right plot shows the distribution for a standard deviation of 2, δ = 10 10, s = 1, and 8 compositions, corresponding to α = 11.32. The noise parameters are = 0.01, r = 0.9999, and N = 8000 for the left plot, and N = 4000 for the right. that Var = σ2. The bisection method allows us to converge to the desired variance for the initial noise mechanism iteratively. It does so by iteratively tightening the chosen upper and lower bounds on ˆC until they converge. Algorithm 3 Overview: The main task is to solve the optimization problem described from Theorem 3.6, given by (22). The minimization problem is convex in p, so gradient descent will convergence to the optimal objective value. Recall that our optimization is a minimax problem. The maximization in (22) only requires iterating over a finite set, while a preconditioned gradient descent method is used to solve the convex optimization part. Explanation of Key Aspects of Algorithm 3: Linear constraints: The optimization problem involves two linear constraints, corresponding to the cost constraint (which is active) and the normalization constraint. These two constraints can be expressed as a matrix equation: Ap = b. We focus on the quadratic cost c(z) = z2 with C = σ2, where σ is the standard deviation of the noise. The appropriate cost constraint (depending on whether type is discrete or continuous) from Proposition 3.5 is used in each case. The normalization constraint is in (22). Optimization for α: Algorithm 3 simultaneously optimizes for p and α. Specifically, we optimize α according to the moments accountant formula, given by γp,Nc,δ(α) = Nc α 1 log gα(p) + log(1/δ) where gα(p) = maxt {1,...,s/ } gα(p, t). The initial value of α is set to the optimal α for the moments accountant with Gaussian noise (corresponding to the initialization of p to being approximately Gaussian). Using the fact that the RDP of Gaussian noise is s2α 2σ2 in the moments accountant formula, the optimal α for Gaussian is α = q Subsequently, every T iterations, α is updated by taking a Newton step to optimize the moments accountant formula. Optimizing Noise Distributions for Differential Privacy 1 5 10 15 20 25 30 35 40 45 Compositions 6.0 Laplace RDP noise Gaussian Staircase Cactus Figure 4. For fixed δ = 10 6, the grid evaluates different combinations of composition count and noise standard deviation (σ), marking RDP noise as the winner whenever it achieves more than a 2% improvement in ε over the best alternative (among Gaussian, Laplace, Staircase, and Cactus). Even when another distribution is marked as the best, our noise still consistently outperforms the others, although by less than 2%. 0.0 0.5 1.0 1.5 2.0 10 8 RDP noise (pointwise min) Staircase Cactus Gaussian Laplace 0.0 0.5 1.0 1.5 2.0 10 8 Discrete RDP noise (pointwise min) Discrete Staircase Discrete Cactus Discrete Gaussian Discrete Laplace Figure 5. The plots compare the pointwise minimum of privacy curves from our optimized noise distributions designed for 10 compositions, sensitivity 1, and varying δ with privacy curves for Gaussian, Laplace, Cactus, and Staircase mechanisms, all with a fixed standard deviation of σ = 8. The left plot shows continuous distributions; the right plot shows discrete distributions. All curves are evaluated at 10 compositions. Preconditioning: Standard gradient descent is ill-suited to this optimization problem. The reason is that the probability vector p is constrained to be non-negative, in addition to the normalization and cost constraints. All these constraints by themselves can be handled by projected gradient descent (indeed, this is our approach to dealing with the normalization and cost constraints; see below). However, doing so will tend to produce p vectors with zeros, which lead to infinite values of the objective in (23). This blowing-up occurs because the R enyi divergence between distributions with different supports is infinite. Thus, it is not enough to keep p in the feasible set, it must be kept strictly feasible. This can be addressed using standard gradient descent by reducing the step size, but this causes very slow convergence. To address this limitation, we instead use preconditioned gradient descent, where we apply a linear transformation to the solution space, and take a gradient step in the transformed space. In particular, we use the transformation matrix M = diag(pk 1) 1, where pk 1 is the current optimization variable. This transformation introduces uniformity, ensuring that p maintains a roughly consistent distance from the boundary imposed by the nonnegativity constraint. Gradient Calculation and Projection: After computing the gradient in the transformed basis, we project it onto the region that meets the cost and normalization constraints. Satisfying the equation Ap = b for a vector p in the original basis is equivalent to satisfying AM 1q = b for a vector q in the transformed basis. Backtracking Line Search: The current position pk 1 in the original basis maps to Mpk 1 = diag(pk 1) 1pk 1 = 1 in the transformed basis. Thus, in each update within the transformed basis, the current position is represented by an all-ones vector. The update rule is: q = 1 µ gproj, where µ is the learning rate. The Optimizing Noise Distributions for Differential Privacy Table 1. MSE improvement (%) of RDP noise over Gaussian for δ = 10 6 and 10 queries (mean std across 20 seeds). Dataset ε = 0.62 0.69 0.78 0.84 0.97 1.05 Breast Cancer 8.28 0.19 9.14 0.19 9.63 0.23 8.61 0.17 10.34 0.20 11.48 0.18 Diabetes 8.11 0.19 9.06 0.21 9.43 0.16 8.48 0.17 10.06 0.17 11.12 0.22 UCI Heart Disease 8.14 0.22 9.05 0.16 9.50 0.26 8.52 0.19 10.20 0.20 11.31 0.18 corresponding update in the original basis is: pk = M 1q. The constraint pk 0 implies q 0, leading to the upper bound µub = mini [N],gproj i >0 1 gproj i . We implement a backtracking line search strategy that evaluates learning rates in the sequence µub, µub/2, . . . , µub/210. Among these, the learning rate that achieves the largest reduction in the objective function is selected. 5. Numerical Results In this section, we present numerical results evaluating the privacy characteristics of our proposed RDP noise. All (ε, δ)-DP guarantees presented here are computed using the Connect-the-Dots accounting (Doroshenko et al., 2022). Figures 2 and 3 show optimized RDP noise distributions across different settings. In particular, Figure 2 highlights how our framework automatically recovers Cactus and Staircase distributions in the regimes where they perform best (α close to 1 and respectively). Figure 4 illustrates what we call the moderate composition regime, showing the range of compositions and noise standard deviations (σ) where our optimized noise distribution achieves at least a 2% improvement in ε compared to other known mechanisms (Gaussian, Laplace, Staircase, and Cactus) for δ = 10 6. The regions with more than two percent improvement depend on both σ and the number of compositions. Importantly, there is no single mechanism that is always best across all parameter settings so finding the best one typically requires testing all four for each combination. Even in cases where another distribution is labeled as the best, our noise still consistently performs better than the rest, although the margin is less than 2%. Our noise is beneficial not only when it gives a clear 2% or greater advantage, but also in situations where it is unclear in advance which mechanism will perform best. For example, even though the Staircase distribution is provably optimal only for a single composition and in the pure DP limit (δ 0), it sometimes outperforms the others at δ = 10 6 for certain σ and composition choices yet it is easy to mistakenly pick a different baseline like Laplace. Our optimized noise is robust and helps avoid mistakes when choosing among existing mechanisms. Note that this plot is specific to δ = 10 6; for other values of δ, there are additional combinations of σ and compositions where our noise provides similar improvements, some of which are shown in Appendix D. In the two parts of Figure 5, we fix the number of compositions and standard deviation while optimizing the noise distribution for different δ values, for both continuous (left) and discrete (right) domains. Each δ results in a different optimized noise distribution, generating its own privacy curve. The RDP noise curves represent the pointwise minimum of these curves: that is, each point (ε, δ) on this curve is derived from the noise distribution optimized for that specific δ. In contrast, Figure 1 shows the privacy curve for a specific noise distribution optimized for a single target δ. Our approach remains optimal across all δ values, closely matching the performance of other noise mechanisms in their respective optimal regimes while significantly outperforming them elsewhere. Note that the range of ε where we observe the greatest improvement varies with the choice of σ and the number of compositions. We provide an additional set of plots for a different combination in Appendix D. In Table 1, we report the performance improvement of our optimized noise over the Gaussian baseline on three widely used datasets for privacy-preserving machine learning: Breast Cancer Wisconsin (Diagnostic) (Wolberg et al., 1993), Diabetes (learn developers), and the UCI Heart Disease dataset (Cleveland subset) (Janosi et al., 1988). Details about the datasets and the experimental setup are provided in Appendix D due to space constraints. Since Gaussian noise consistently outperformed Laplace, Cactus, and Staircase mechanisms across all tested settings (with δ = 10 6, 10 queries, and a range of target ε values), we focus our comparison on the Gaussian baseline. As shown, replacing Gaussian noise with our optimized mechanism yields an improvement of 8% to 12% across the evaluated ε values. 6. Conclusion We have introduced a unified optimization framework to identify optimal continuous and discrete noise distributions for (ε, δ)-DP for a given cost and number of compositions. To address the problem s non-convexity, we have converted the objective into R enyi DP of an optimized order α, yielding a finite-dimensional convex optimization problem. We have introduced a novel preconditioned gradient descent algorithm to efficiently solve this optimization problem. The resulting noise distributions are consistently better than Gaussian and Laplace distributions, with significant improvements in the moderate composition regime. Optimizing Noise Distributions for Differential Privacy Acknowledgements We thank the anonymous reviewers for their valuable feedback, which significantly improved the quality of this paper. This work was supported by the National Science Foundation under Grant Nos. CIF-2312666 and CIF-2312667. Impact Statement This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here. Renyi DP mechanism design. https://github.com/ Sankar Lab/Renyi-DP-Mechanism-Design, May 2025. Abadi, M., Chu, A., Goodfellow, I., Mc Mahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, pp. 308 318, 2016. Abowd, J., Ashmead, R., Cumings-Menon, R., Garfinkel, S., Heineck, M., Heiss, C., Johns, R., Kifer, D., Leclerc, P., Machanavajjhala, A., Moran, B., Sexton, W., Spence, M., and Zhuravlev, P. The 2020 Census Disclosure Avoidance System Top Down Algorithm. Harvard Data Science Review, (Special Issue 2), jun 24 2022. https://hdsr.mitpress.mit.edu/pub/7evz361i. Alghamdi, W., Asoodeh, S., Calmon, F. P., Kosut, O., Sankar, L., and Wei, F. Cactus mechanisms: Optimal differential privacy mechanisms in the large-composition regime. In 2022 IEEE International Symposium on Information Theory (ISIT), pp. 1838 1843, 2022. doi: 10.1109/ISIT50566.2022.9834438. Alghamdi, W., Asoodeh, S., Calmon, F. P., Felipe Gomez, J., Kosut, O., and Sankar, L. Schr odinger mechanisms: Optimal differential privacy mechanisms for small sensitivity. In 2023 IEEE International Symposium on Information Theory (ISIT), pp. 2201 2206, 2023. doi: 10.1109/ISIT54713.2023.10206616. Asoodeh, S., Liao, J., Calmon, F. P., Kosut, O., and Sankar, L. Three variants of differential privacy: Lossless conversion and applications. IEEE Journal on Selected Areas in Information Theory, 2(1):208 222, 2021. Balcer, V. and Vadhan, S. P. Differential privacy on finite computers. In Information Technology Convergence and Services, 2017. URL https://api. semanticscholar.org/Corpus ID:6692720. Balle, B., Barthe, G., Gaboardi, M., Hsu, J., and Sato, T. Hypothesis testing interpretations and R enyi differential privacy. In Int. Conf. Art. Intelligence and Stat. (AISTAT), pp. 2496 2506, 2020. Bun, M. and Steinke, T. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Proc. Theory of Cryptography, pp. 635 658, 2016. Canonne, C. L., Kamath, G., and Steinke, T. The discrete Gaussian for differential privacy. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS 20, Red Hook, NY, USA, 2020. Curran Associates Inc. ISBN 9781713829546. Chen, Y., Thakurta, A. G., and Woodruff, D. P. Poisson subsampled r enyi differential privacy. In Proceedings of the 52nd Annual ACM Symposium on Theory of Computing (STOC), pp. 849 860. ACM, 2019. doi: 10.1145/3313276.3316369. Doroshenko, V., Ghazi, B., Kamath, P., Kumar, R., and Manurangsi, P. Connect the dots: Tighter discrete approximations of privacy loss distributions, 2022. URL https://arxiv.org/abs/2207.04380. Dwork, C., Kenthapadi, K., Mc Sherry, F., Mironov, I., and Naor, M. Our data, ourselves: Privacy via distributed noise generation. In Vaudenay, S. (ed.), EUROCRYPT, pp. 486 503, 2006a. Dwork, C., Mc Sherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In Proc. Theory of Cryptography (TCC), pp. 265 284, Berlin, Heidelberg, 2006b. ISBN 3-540-32731-2, 978-3-540-327318. Feldman, V., Acharya, A., Dinesh, P. N., and Rothblum, G. N. On the r enyi differential privacy of the shuffle model. In Proceedings of the 52nd Annual ACM Symposium on Theory of Computing (STOC), pp. 1232 1244. ACM, 2021. doi: 10.1145/3460120.3484794. Feldman, V., Liu, Y., Karpov, D. V., Muthukrishnan, S., and Kuczynski, J. Stronger privacy amplification by shuffling for r enyi and approximate differential privacy. ar Xiv preprint ar Xiv:2208.04591, 2022. URL https: //arxiv.org/abs/2208.04591. Geng, Q., Kairouz, P., Oh, S., and Viswanath, P. The staircase mechanism in differential privacy. IEEE Journal of Selected Topics in Signal Processing, 9(7):1176 1184, 2015. doi: 10.1109/JSTSP.2015.2425831. Ghosh, A., Roughgarden, T., and Sundararajan, M. Universally utility-maximizing privacy mechanisms. SIAM J. Comput., 41(6):1673 1693, January 2012. ISSN 0097-5397. doi: 10.1137/09076828X. URL https: //doi.org/10.1137/09076828X. Optimizing Noise Distributions for Differential Privacy Janosi, A., Steinbrunn, W., Pfisterer, M., and Detrano, R. Heart disease data set. https://archive.ics. uci.edu/ml/datasets/Heart+Disease, 1988. Accessed: 2025-04-29. learn developers, S. Diabetes dataset. https: //scikit-learn.org/stable/modules/ generated/sklearn.datasets.load_ diabetes.html. Accessed: 2025-04-29. L ecuyer, M., Lee, J. M., Smith, A., and Vadhan, S. Practical privacy filters and odometers with r enyi differential privacy and applications to differentially private deep learning. ar Xiv preprint ar Xiv:2103.01379, 2021. URL https://arxiv.org/abs/2103.01379. Mironov, I. On significance of the least significant bits for differential privacy. In Proceedings of the 2012 ACM Conference on Computer and Communications Security, CCS 12, pp. 650 661, New York, NY, USA, 2012. Association for Computing Machinery. ISBN 9781450316514. doi: 10.1145/2382196.2382264. URL https://doi. org/10.1145/2382196.2382264. Mironov, I. R enyi differential privacy. In Proc. IEEE Comp. Security Foundations Symp. (CSF), pp. 263 275, 2017. van Erven, T. and Harremos, P. R enyi divergence and kullback-leibler divergence. IEEE Transactions on Information Theory, 60(7):3797 3820, 2014. doi: 10.1109/ TIT.2014.2320500. Wikipedia. Moore Penrose inverse . https: //en.wikipedia.org/wiki/Moore%E2% 80%93Penrose_inverse, a. Wikipedia. Tonelli s theorem for non-negative measurable functions . https://en.wikipedia.org/wiki/ Fubini%27s_theorem#Tonelli s_theorem_ for_non-negative_measurable_functions, b. Wolberg, W. H., Mangasarian, O. L., and Street, W. N. Breast cancer wisconsin (diagnostic) data set. https: //archive.ics.uci.edu/ml/datasets/ Breast+Cancer+Wisconsin+(Diagnostic), 1993. Accessed: 2025-04-29. Optimizing Noise Distributions for Differential Privacy A. Proof of Theorem 3.1 Let Z be the domain of the query s output, which may consist of real numbers or integers, and let P(Z) denote the set of all probability distributions supported on Z. Let the distribution P Z represents the reflection of PZ with respect to the y-axis, such that P Z(z) = PZ( z) for all z Z. The feasible region Pf P(Z) encompasses all noise distributions PZ P(Z) that satisfy the normalization constraint EPZ[1] = 1, the positivity constraint PZ(z) 0 for all z Z, and the cost constraint EPZ[c(Z)] C. The set S represent the possible shifts: specifically, S = [ s, s] if Z = R, and S = { s, . . . , 0, . . . , s} if Z = Z. Recall that gα(PZ, t) = EPZ and gα(PZ) = max t S gα(PZ, t). (27) To justify restricting the search to symmetric noise distributions, we demonstrate that the feasible region Pf is both symmetric and convex. Additionally, we prove that gα(PZ) exhibits symmetry in Z and convexity in PZ. By leveraging these properties, we establish that minimizing over the entire feasible region yields the same result as minimizing solely over the symmetric distributions within that region. Convexity of the Feasible Region: Let λ (0, 1) and PZ, QZ Pf, we show that λPZ + (1 λ)QZ Pf. The convex combination λPZ + (1 λ)QZ is a valid distribution, as it satisfies the normalization constraint as follows: EλPZ+(1 λ)QZ[1] = λ EPZ[1] + (1 λ) EQZ[1] = 1, (28) and it is non-negative for all z Z, i.e., λPZ(z) + (1 λ)QZ(z) 0. We also have EλPZ+(1 λ)QZ[c(Z)] = λ EPZ[c(Z)] + (1 λ) EQZ[c(Z)] λC + (1 λ)C = C. (29) The inequality holds because PZ, QZ Pf, and thus EPZ[c(Z)] C and EQZ[c(Z)] C. So, we have λPZ + (1 λ)QZ Pf, and the feasible region Pf is convex. Symmetry of the Feasible Region: We show that if PZ lies within the feasible region, its reflection P Z must also belong to the feasible region. If PZ P(Z) (i.e., it is a non-negative, normalized measure), then P Z P(Z). Thus, our focus is on verifying the cost constraint. Given that EPZ[c(Z)] C, we need to show that EP Z[c(Z)] C as well. We have EP Z[c(Z)] = EPZ[c( Z)] = EPZ[c(Z)] C (30) where the first equality follows from a variable change and the symmetry of Z. The second equality follows from the symmetry of the cost function. Symmetry and Convexity of the inner maximization:. As shown in (van Erven & Harremos, 2014) (Proof of Theorem 13), the expression EPZ PZ QZ α 1 is jointly convex in (PZ, QZ). In (26), with QZ = Tt PZ and the linearity of Tt, it follows that gα(PZ, t) is convex in PZ. Moreover, since the pointwise maximum over t S preserves convexity, gα(PZ) is also convex in PZ. We now show that gα(PZ) is symmetric as follows: gα(P Z) = max t S gα(P Z, t) (31) = max t S EP Z = max t S EPZ Optimizing Noise Distributions for Differential Privacy = max t S EPZ = max t S gα(PZ, t) (35) = gα(PZ), (36) where (33) follows from the symmetry of Z and the variable substitution from z to z, while (34) results from the inherent symmetry of S. Sufficiency of Restricting the Search to Symmetric Noise Distributions: Leveraging these properties, we now establish that it is sufficient to restrict the search to symmetric noise distributions. Let P Z be an optimal noise distribution; i.e., P Z minimizes gα(PZ) over all PZ Pf. Also let M = gα(P Z) be the optimal objective value. Let λ (0, 1), since the feasible region is convex, we have λP Z + (1 λ)P Z Pf. (37) Since gα(PZ) is symmetric, we have gα(P Z) = gα(P Z) = M , and so P Z is also an optimal noise distribution. Considering the convexity of g(PZ), we have gα λP Z + (1 λ)P Z λgα(P Z) + (1 λ)gα(P Z) = λM + (1 λ)M = M (38) which means every noise distribution on the line segment connecting P Z and P Z is an optimal noise distribution. For the special case of λ = 1 2, we get an optimal noise distribution P Z + P Z 2 which is symmetric. This proves the existence of a symmetric noise distribution among the set of all possible solutions and so the sufficiency of searching over only symmetric noise distributions. B. Proof of Proposition 3.5 For the continuous case, we have E[c(Z)] = Z z R c(z) f Z(z) dz (39) c(z) dz (40) z Ii c(z) dz (41) z I0 c(z) dz + 2 z Ii c(z) dz + 2 z Ii c(z) dz (42) i=1 pi Ai + 2p N i=N ri NAi, (43) z Ii c(z) dz, (44) and (42) follows from the symmetry of c(z). When c(z) = z2, Ai simplifies to: z Ii z2 dz = 2 2)3 = 2 i2 + 1 Optimizing Noise Distributions for Differential Privacy Var(Z) = p0 2 i=1 pi 2 i2 + 1 i=N ri N 2 i2 + 1 i=N 2p Nri N ! i=1 pi 2i2 + 2p N i=N ri N 2i2 (47) i=1 2pi + 2p N i=1 pi 2i2 + 2p N i=N ri N 2i2 (48) 12 + 2 2 N 1 X i=1 pi i2 + 2p N 2 X i=N ri Ni2, (49) where (49) follows from the normalization constraint in (16), and we have i=N ri N i2 = r2(N 1)2 + N 2(1 2r) + r(2N + 1) (1 r)3 . (50) The proof for the discrete case follows a similar approach. C. Proof of Theorem 3.6 Here, we present a proof for the continuous case. The proof for the discrete case follows a similar approach to that of the continuous case. Recall that in the continuous case, the PDF is given by fp,r,N, (z) := pi , for z Ii, with i Z, (51) where Ii = (i 1 2) , (i + 1 2) , p i = pi, and pi = p Nr|i| N for |i| N. To simplify the expression inside the logarithm of the RDP, as presented in (12), we begin by substituting the piecewise-constant form of the PDF and simplifying it for any shift t [ s, s]. After substitution, the resulting expression becomes piecewise linear in t, with breakpoints occurring when t is a multiple of the bin width , aligning the bins of the PDF and its shifted version. Thus, to find the maximum over t, we only need to consider the breakpoints within [ s, s] and the interval s endpoints, forming a finite set of candidates. Finally, we simplify the expression for each element in this finite set. For the continuous case, we have gα(fp,r,N, , t) = Z R fp,r,N, (z)α fp,r,N, (z t)1 αdz (52) pj 1{z t Ij} i= pα i 1{z Ii} j= p1 α j 1{z t Ij} dz (54) j= pα i p1 α j 1{z Ii} 1{z t Ij} dz (55) j= pα i p1 α j R 1{z Ii} 1{z t Ij} dz. (56) Equation (56) follows from applying Tonelli s theorem for non-negative measurable functions (Wikipedia, b) twice. This holds true because both the Lebesgue measure on the real numbers and the counting measure on the integers are σ-finite, and the expression pα i p1 α j 1{z Ii} 1{z t Ij} is non-negative for all i, j Z and z R. The integral within the Optimizing Noise Distributions for Differential Privacy expression above can be simplified as follows: R 1{z Ii}1{z t Ij} dz = t + (j i + 1) , if (i j 1) t (i j) , t + (i j + 1) , if (i j) t (i j + 1) , 0, otherwise (57) = (t + (j i + 1) ) 1 {t ((i j 1) , (i j) ]} + ( t + (i j + 1) ) 1 {t ((i j) , (i j + 1) )} (58) Substituting the expression from (58), the expression in (56) simplifies to: j= pα i p1 α j R 1{z Ii} 1{z t Ij} dz (59) i= pα i p1 α j (t + (j i + 1) ) 1 {t ((i j 1) , (i j) ]} + ( t + (i j + 1)) 1 {t ((i j) , (i j + 1) )} m= pα m+j p1 α j (t + ( m + 1) ) 1 {t ((m 1) , m ]} + ( t + (m + 1) ) 1 {t (m , (m + 1) )} where, in (61), we apply the variable change m = i j for each fixed j. Since t is a real number in the interval [ s, s], the indicator functions 1 {t ((m 1) , m ]} and 1 {t (m , (m + 1) )} are zero for values of m outside the set { s , . . . , 0, . . . , s }, where is chosen such that s is an integer. Therefore, gα(fp,r,N, , t) simplifies to: gα(fp,r,N, , t) = 1 pα m+j p1 α j (t + ( m + 1) ) 1 {t ((m 1) , m ]} + ( t + (m + 1) ) 1 {t (m , (m + 1) )} To solve the maximization problem over t [ s, s], observe that the above expression is piecewise linear in t. Therefore, it is sufficient to evaluate the maximum at the endpoints of the linear segments. These endpoints are given by t n a | a n s , . . . , 0, . . . , s oo . Let evaluate the expression in (62) at t = a : gα(fp,r,N, , a ) pα m+j p1 α j (a + ( m + 1) ) 1 {a ((m 1) , m ]} + ( a + (m + 1) ) 1 {a (m , (m + 1) )} j= pα a+j p1 α j (a + ( a + 1) ) (64) j= pα a+j p1 α j (65) Optimizing Noise Distributions for Differential Privacy min{ N a, N} 1 X j= (p N r a j N)α (p N r j N)1 α + max{N,N a} X j=min{ N a, N} pα a+j p1 α j j=max{N,N a}+1 (p N ra+j N)α (p N rj N)1 α (66) = p N r aα N min{ N a, N} 1 X max{N,N a} X j=min{ N a, N} pα a+j p1 α j + p N rαa N X j=max{N,N a}+1 rj (67) = p N r aα N r min{ N a, N}+1 max{N,N a} X j=min{ N a, N} pα a+j p1 α j + p N rαa N rmax{N,N a}+1 r aα N+max{N+a,N} + raα N+max{N,N a} + max{N,N a} X j=min{ N a, N} pα a+j p1 α j (69) r aα+max{a,0} + raα+max{0, a} + max{N,N a} X j=min{ N a, N} pα a+j p1 α j (70) r(1 α)|a| + rα|a| + j= N pα |a|+j p1 α j + pα Nrα(|a| N) j=N |a|+1 rαj p1 α j + p1 α N r N(1 α) N 1 X j= |a| N pα |a|+j r j(1 α) 1{a = 0} (71) where (64) follows because an open interval (m , (m + 1) ) does not include any endpoints. So 1 {a (m , (m + 1) )} = 0 and a half-open interval ((m 1) , m ] includes the endpoint a if m = a. The equality (71) follows because r aα+max{a,0} + raα+max{0, a} = r aα+a + raα if a 0 r aα + raα a if a < 0 = r(1 α)|a| + rα|a| (72) max{N,N a} X j=min{ N a, N} pα a+j p1 α j (73) j= N pα |a|+j p1 α j + p1 α N r N(1 α) N 1 X j= |a| N pα |a|+j r j(1 α) 1{a = 0} (74) j= N pα |a|+j p1 α j + pα Nrα(|a| N) N X j=N |a|+1 rαj p1 α j p1 α N r N(1 α) N 1 X j= |a| N pα |a|+j r j(1 α) 1{a = 0}. (75) where we have used the assumption that the distribution is symmetric, meaning pi = p i. The quantity in (71) is symmetric with respect to a; therefore, the maximum over t n a | a n s , . . . , 0, . . . , s Optimizing Noise Distributions for Differential Privacy 1 5 10 15 20 25 30 35 40 45 50 55 60 Compositions 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 Laplace RDP noise Gaussian Staircase Cactus Figure 6. For fixed δ = 10 9, the grid evaluates different combinations of composition count and noise standard deviation (σ), marking RDP noise as the winner whenever it achieves more than a 2% improvement in ε over the best alternative (among Gaussian, Laplace, Staircase, and Cactus). Even when another distribution is marked as the best, our noise still consistently outperforms the others, although by less than 2%. reduces to a maximum over t n a | a n 0, . . . , s and |a| simplifies to a. Moreover, when a = 0, the two distributions are identical, and the divergence reaches its minimum value of zero. Hence, we can exclude a = 0 from the set. So, the task of finding the worst-case shift collapses to the following optimization: max a {1,..., s r(1 α)a + rαa + j= N pα a+j p1 α j + pα Nrα(a N) j=N a+1 rαj p1 α j + p1 α N r N(1 α) N 1 X j= a N pα a+j r j(1 α). (76) Utilizing the symmetry of the noise distribution (pi = p i), we can rewrite the maximization as follows: max a {1,..., s r(1 α)a + rαa + j= N pα |a+j| p1 α |j| + pα Nrα(a N) j=N a+1 rαj p1 α |j| + p1 α N r N(1 α) N 1 X j= a N pα |a+j| r j(1 α). (77) D. Additional Numerical Results The region of compositions and noise levels (σ) where our optimized noise achieves at least a 2% improvement in ε over standard mechanisms (Gaussian, Laplace, Staircase, and Cactus) shifts as δ changes. Figure 6 presents a plot similar to Figure 4 but for δ = 10 9. These results further highlight that the optimal noise distribution is highly setting-dependent. Since our RDP noise consistently performs well across different settings and avoids the need for manual selection among existing mechanisms, it offers a more reliable default. Figure 7 shows a plot similar to Figure 5, but for a smaller σ = 5, with the number of compositions fixed at 10 in both cases. As shown, the range of ε where our method provides the greatest improvement shifts with the choice of σ. For σ = 5 (Figure 7), the peak improvement occurs between ε = 2 and 2.8, while for the larger σ = 8 in Figure 5, it lies between approximately 1.2 and 1.8. This shift is expected optimizing for a larger σ corresponds to a higher privacy regime, resulting in smaller ε values. To obtain the results in Table 1, we assume that the 5th and 95th percentiles of each feature are privately released. These quantiles are used to rescale each feature by subtracting the 5th percentile and dividing by the difference between the 95th and 5th percentiles. This transformation maps most feature values to the [0, 1] range, and any values outside this range are Optimizing Noise Distributions for Differential Privacy 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 10 8 RDP noise (pointwise min) Staircase Cactus Gaussian Laplace 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 10 8 Discrete RDP noise (pointwise min) Discrete Staircase Discrete Cactus Discrete Gaussian Discrete Laplace Figure 7. The plots compare the pointwise minimum of privacy curves from our optimized noise distributions designed for 10 compositions, sensitivity 1, and varying δ with privacy curves for Gaussian, Laplace, Cactus, and Staircase mechanisms, all with a fixed standard deviation of σ = 5. The left plot shows continuous distributions; the right plot shows discrete distributions. All curves are evaluated at 10 compositions. clipped to 0 or 1. This normalization step ensures that all features are on a consistent scale before noise is added. Mapping back to the original feature scale can be done using the privately released quantiles. Since the quantile release step incurs the same privacy cost for both the Gaussian and RDP noise mechanisms, we do not account for it separately in our analysis. Instead, we focus on evaluating the privacy-utility trade-offs introduced by the noise-adding mechanisms themselves. We then add noise using either the Gaussian mechanism or our optimized RDP noise. For each query, we generate 100,000 differentially private outputs by adding noise according to the selected mechanism and compute the mean squared error (MSE) with respect to the true (non-private) value. We average the MSEs across 10 queries, repeat the process for 20 random seeds, and report the mean and standard deviation ( ) of the improvement over Gaussian noise. For the remainder of this section, we provide details about the datasets and the queries used in the experiments presented in Table 1. Breast Cancer Wisconsin (Diagnostic) (Wolberg et al., 1993): This dataset contains 569 records with 30 continuous features extracted from digitized images of fine needle aspirates of breast masses. These features capture various properties of cell nuclei, including radius, texture, perimeter, area, smoothness, and symmetry. We define 10 continuous queries, each computing the average of a key diagnostic indicator. These include the overall average values of features such as mean radius, texture, area, smoothness, and fractal dimension. Such queries are commonly used in biomedical research and medical data analysis to support disease characterization and model interpretability in diagnostic applications. Diabetes dataset (learn developers): This dataset includes 442 patient records and 10 features representing physiological variables such as age, sex, body mass index (BMI), blood pressure, and six blood serum measurements (s1 to s6). We define 10 continuous queries, each computing the mean of a feature across the dataset. These queries capture the average body mass index, blood pressure, age, sex indicator (reflecting gender distribution), and the average values of each of the six serum biomarkers. UCI Heart Disease Dataset (Cleveland subset)(Janosi et al., 1988): A widely used clinical dataset containing diagnostic information for 303 patients undergoing evaluation for coronary artery disease. It includes a mix of demographic and clinical features such as age, sex, resting blood pressure, serum cholesterol, maximum heart rate achieved during exercise, and indicators of exercise-induced angina. To simulate realistic medical analytics, we define 10 queries focused on patient demographics and cardiac health metrics. These include the average age and maximum heart rate within subgroups defined by sex and angina status, as well as the average resting blood pressure and cholesterol levels for higher-risk populations, such as individuals with hypertension or adults over the age of 55.