# privacy_profiles_for_private_selection__b3c07d37.pdf

Privacy Profiles for Private Selection

Antti Koskela 1 Rachel Redberg 2 Yu-Xiang Wang 3

Private selection mechanisms (e.g., Report Noisy Max, Sparse Vector) are fundamental primitives of differentially private (DP) data analysis with wide applications to private query release, voting, and hyperparameter tuning. Recent work (Liu & Talwar, 2019; Papernot & Steinke, 2022) has made significant progress in both generalizing private selection mechanisms and tightening their privacy analysis using modern numerical privacy accounting tools, e.g., Renyi DP. But Renyi DP is known to be lossy when (ϵ, δ)-DP is ultimately needed, and there is a trend to close the gap by directly handling privacy profiles, i.e., δ as a function of ϵ or its equivalent dual form known as f-DPs. In this paper, we work out an easy-to-use recipe that bounds the privacy profiles of Report Noisy Max and Private Tuning using the privacy profiles of the base algorithms they corral. Numerically, our approach improves over the RDP-based accounting in all regimes of interest and leads to substantial benefits in end-to-end private learning experiments. Our general result also allows analysing the case of binomially-distributed number of rounds, which leads to more concentrated distributions compared to the previously considered Poisson distribution.

1. Introduction

Differential privacy (DP) bounds the privacy loss incurred when an algorithm is run on a dataset. While the analysis of this bound is often quite nuanced, the rough tally is that whenever the algorithm accesses the data, it incurs a privacy cost.

By this rule of thumb, data privacy for modern machine learning (ML) applications is in trouble. Most modern ML

1Nokia Bell Labs 2Northeastern University 3UC San Diego. Correspondence to: Antti Koskela <antti.h.koskela@nokia-belllabs.com>.

Proceedings of the 41 st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

algorithms are notoriously finicky and require extensive hyperparameter tuning in order to achieve good performance. In the context of data privacy, this means that the evaluation of every additional hyperparameter candidate could have a privacy cost.

Imagine that you have an (ϵ, δ)-DP training algorithm and wish to evaluate it across k candidates. Na ıvely it s possible to analyze the privacy of this procedure via composition, a property of differential privacy which in its most basic form implies that the privacy parameters ϵ and δ add up (Dwork et al., 2006). This na ıve analysis would allow an ML practitioner to privately release all k trained models at a cost of (kϵ, kδ)-DP i.e., the privacy cost scales linearly in the number of model evaluations. But for hyperparameter tuning, practitioners are typically most interested in the best model (with the highest quality score) and hence might want to output only one out of the k trained models. Thus, a tighter analysis may often be available for this class of private selection mechanisms which choose from a set of candidates the item which approximately maximizes a given quality score.

The aim of our work is to improve the privacy analysis of private hyperparameter tuning algorithms for machine learning algorithms. This line of inquiry originates with Chaudhuri & Vinterbo (2013), who highlighted the need for differentially private parameter tuning in order to achieve end-to-end privacy in machine learning applications. Chaudhuri & Vinterbo (2013) proposed a procedure for training and validation which provides differential privacy under the somewhat stringent condition that the learning algorithm must uphold a notion of Lipschitz-like stability on the score function.

The subsequent work of Liu & Talwar (2019) ushered in the age of black-box frameworks for differentially private hyperparameter tuning. Liu & Talwar (2019) relaxed the stability assumption on the score function to the weaker requirement that each individual candidate be differentially private. They proposed black-box tuning methods that use a random stopping strategy; in particular, the number of candidates evaluated by the learning algorithm follows a geometric distribution. Liu & Talwar (2019) showed that if each candidate (or base mechanism , as we will call it in our work) satisfies ϵ-DP, then their end-to-end algorithm for

Privacy Profiles for Private Selection

private selection satifies 3ϵ-DP. They also gave approximate (ϵ, δ)-DP results.

The Liu & Talwar (2019) bounds apply only to (ϵ, δ)-DP. Mohapatra et al. (2022) gave adaptive tuning methods and showed that for a reasonable number of privately chosen candidates, na ıve accounting via R enyi differential privacy (RDP) often yields tighter DP bounds than private selection using the methods by Liu & Talwar (2019). One might then ask, what could more sophisticated RDP accounting do?

We base our approach on the quite notable results of Papernot & Steinke (2022), which built on Liu & Talwar (2019) s work and provided RDP bounds for black-box tuning algorithms which scale logarithmically in relation to the number of model evaluations. Papernot & Steinke (2022) also generalized the random stopping strategy to encompass distributions other than the geometric distribution: the number of repetitions can now be distributed according to a truncated negative binomial distribution, or to a Poisson distribution. In either case the RDP bound for the end-to-end algorithm is constructed from the RDP and DP bounds on the base mechanisms.

What is missing from the private selection and private hyperparameter tuning line of work, are bounds that would directly use the privacy profiles of the base mechanisms as building blocks and would also be capable of taking advantage of numerical accounting methods (e.g., the recent work of Koskela et al., 2021; Zhu et al., 2022; Gopi et al., 2021). In this work we address this shortfall. Our proposed bounds utilize only point-wise information about the (ϵ, δ)-profile of the base mechanism, similarly to bounds given by Liu & Talwar (2019).

As one application of our results we consider Differentially Private Stochastic Gradient Descent (DP-SGD) (Abadi et al., 2016), a popular technique for training machine learning models with DP. DP-SGD introduces extra hyperparameters such as the noise level σ and clipping constant C; factors like the subsampling ratio q and the training duration also affect both the privacy and accuracy of the methods. As demonstrated by Papernot & Steinke (2022), tweaking DPSGD s hyperparameters often relies on using sensitive data which require privacy protection. The risk of data leakage from hyperparameters is arguably smaller than from model parameters, but developing methods with low privacy costs has proven challenging. The leading algorithms proposed by Papernot & Steinke (2022) still incur a significant privacy cost overhead. Our aim is to further lower this overhead.

As another application of our analysis we consider Generalized Propose-Test-Release (Redberg et al., 2023), which broadens the reach of the Propose-Test-Release (PTR) framework by allowing it to handle queries with unbounded sensitivity, making it applicable to problems such as lin-

ear regression. The method proposed by Redberg et al. (2023) gives point-wise (ϵ, δ)-privacy guarantees for generalized PTR. Thus, in order to account for the privacy cost of tuning the hyperparameters of the underlying queries, we can directly use our analysis for which point-wise (ϵ, δ)- guarantees are sufficient. We show that our bounds are considerably tighter than those of Liu & Talwar (2019), the only previous applicable results. We also empirically illustrate that compared to well-established non-adaptive methods, our bounds considerably improve the privacy-utility trade-off for linear regression problems.

2. Preliminaries

We first give the basic definitions. An input dataset containing n data points is denoted as X = {x1, . . . , xn}. Denote the set of all possible datasets by X. We say X and X

are neighbors if we get one by adding or removing one data element to or from the other, or by replacing one data element in the other (denoted X X ). Consider a randomized mechanism M : X O, where O denotes the output space. The (ϵ, δ)-definition of DP can be given as follows (Dwork, 2006). Definition 2.1. Let ϵ > 0 and δ [0, 1]. We say that a mechanism M is (ϵ, δ)-DP, if for all neighboring datasets X and X and for every measurable set E O we have:

Pr(M(X) E) eϵPr(M(X ) E) + δ.

We state many of our results for general f-divergences. For a convex function f : [0, ) R, we define the fdivergence between distributions P and Q taking values in Y as

Hf(P||Q) = Z

Notice that we do not require the normalization f(1) = 0 often used in the so-called Czsis ar divergences (Liese & Vajda, 2006) as it is not necessary and can be obtained simply by scaling. Especially, our aim is to find tight bounds for the hockey stick divergence, i.e., when f(z) = [z eϵ]+ for some ϵ R. This is due to the fact that tight (ϵ, δ)- bounds can be obtained using the hockey-stick-divergence: Lemma 2.2 (Balle et al. 2018, Theorem 1). A mechanism M satisfies (ϵ, δ)-DP if and only if, max X X Hf(M(X)||M(X )) δ for f(z) = [z eϵ]+.

We denote the hockey stick divergence determined by ϵ R by He ϵ throughout the paper, and will refer to

δM(ϵ) := max X X Heϵ(M(X)||M(X ))

as the privacy profile of mechanism M.

We will also use the R enyi differential privacy (RDP) (Mironov, 2017) which is defined as follows.

Privacy Profiles for Private Selection

R enyi divergence of order α > 1 between two distributions P and Q is defined as

Dα(P||Q) = 1 α 1 log Z P(t)

α Q(t) dt (2.1)

and we say that a mechanism M is (α, ϵ)-RDP, if for all neighboring datasets X and X , the output distributions of M(X) and M(X ) have R enyi divergence of order α > 1 at most ϵ, i.e., if

max X X Dα M(X)||M(X ) ϵ.

To convert from RDP to (ϵ, δ)-DP, we use the formula given in Appendix (D.1).

Notice that the R enyi divergence of order α > 1 is a scaled logarithm of an f-divergence determined by f(z) = zα. Existing RDP analyses for Report Noisy Max and private selection use the fact f(u, v) = ( u

v )αv is a jointly convex function for u and v. We formulate many of our results for general f-divergences, for which the following technical result will then be central (see e.g., Ch. 3, Boyd & Vandenberghe, 2004).

Lemma 2.3. If f : R+ R+ is a convex function, then the function f x y y is jointly convex for x 0 and y > 0.

For the analysis of private selection algorithms, the number of times K that the base algorithm is evaluated is a random variable. To analyze different alternatives for choosing K, we will need the concept of probability generating functions.

Definition 2.4. Let K be a random variable taking values in N {0}. The probability generating function (PGF) of K, φ : [0, 1] [0, 1] is defined as

k=0 P(K = k) zk.

Our main result Thm. 4.1 is stated for a general PGF φ and we use it obtain method-specific bounds. Throughout the paper, we will denote m = E[K].

3. Report Noisy Max for Additive Noise Mechanisms

As a first application of the hockey-stick divergence-based analysis, we consider the Report Noisy Max (RNM) of 1dimensional additive noise mechanisms. This will serve as a segue into the more involved analysis of private selection, where we also obtain bigger gains compared to the previous results. Our analysis here is based on the RDP analysis by Zhu & Wang (2022). We mention that recent applications of RNM include private in-context learning of LLMs (Wu et al., 2024; Tang et al., 2024).

Let X = {x1, x2, . . . , x N}, where xi X for all i [N], be a data set and consider the mechanism

M(X) = arg max{M1(X), . . . , Mm(X)}, (3.1)

where for every i [m],

Mi(X) = f(X) + Zi,

for some function fi : X R such that

max X X |fi(X) fi(X )| 1

and where the noises Zi, i [m], are i.i.d. We have the following existing RDP bound. Theorem 3.1 (Zhu & Wang 2022, Theorem 8). Let α > 1. The mechanism M(X) is (α, ϵ )-RDP for

ϵ = ϵ + log m

where ϵ denotes the RDP guarantee of order α for an additive noise mechanism with noise Z and sensitivity 2. Theorem 3.2. Let X X and ϵ R. We have:

He ϵ M(X)||M(X ) m δ(ϵ), (3.3)

where δ(ϵ) is the privacy profile of the additive noise mechanism with sensitivity 2. If we assume monotonicity, i.e., if fi(X) fi(X ) or fi(X) fi(X ) for all i [m], then the sensitivity of 2 can be replaced with a sensitivity of 1. Theorem 3.3. For an adaptive composition of k mechanisms of the form (3.1), we get the privacy profile upper bound mk δ(ϵ), where δ(ϵ) is the privacy profile of an m-wise composition of an additive noise mechanism with noise Z and sensitivity 2.

The RDP analysis of the private selection algorithm provided by Papernot & Steinke (2022) shows that the RDP guarantees essentially grow as log m, where m is the expected number of candidates for the private selection algorithm. When Z is normally distributed, we directly see from our analysis that for a fixed δ the ϵ-values of the private selection algorithm grow as 1

σ log 1 2 m

δ . This result is obtained for the RNM using a simple tail bound of the Gaussian. Corollary 3.4. Consider the mechanism M defined in Eq. 3.1 and suppose Z is normally distributed with variance σ2. Let δ > 0. Then M is (ϵ, δ)-DP for

As Figure 1 shows, with the bound given in Thm. 3.2 we observe differences one usually observes between the accurate hockey stick and R enyi divergence bounds (for comparisons, see, e.g. Canonne et al., 2020). When considering the private selection problem where the number of candidates is randomized, we obtain larger differences. Also, the ϵ-growth order O( 1

σ log 1 2 m

δ ) is retained. All of this is discussed in the next two sections.

Privacy Profiles for Private Selection

101 102 103 104 105

δ , Cor. 3.4

RDP bound, Thm. 3.1 HS bound, Thm. 3.2

Figure 1. Comparison of the (ϵ, δ)-bounds for the RNM mechanism (3.1) when the base mechanisms Mi, i [m], are 1-d Gaussian mechanisms with sensitivity 1 and noise scale σ = 4.0, and when δ = 10 6. Also plotted is the bound of Corollary 3.4.

4. General Bound for Private Selection

We use the notation and setting of Papernot & Steinke (2022). This means that the tuning algorithm A outputs both the argument of the maximizer (the best hyperparameters or the index) and the output of the base mechanism (e.g., the model trained with the best hyperparameters).

Let Y denote the finite and ordered output space (the quality score), Q(y) the density function of the quality score of the base mechanism taking values in Y, K the random variable for the number of times the base mechanism is run and A(y) the density function of the tuning algorithm that outputs the best one of the K alternatives. Let A and A denote the output distributions of the tuning algorithm evaluated on neighboring datasets X and X , respectively. Then, the f-divergence between A and A can be bounded using the following result. We use the proof technique from Lemma 7 of Papernot & Steinke (2022) and similarly invoke the argument that our proof for finite Y can be extended to the general case.

Theorem 4.1. Let X X and let A and A be the density functions of the hyperparameter tuning algorithm as defined above, evaluated on X and X , respectively. Let Q and Q be the density functions of the quality score of the base mechanism, evaluated on X and X , respectively. Let K be random variable for the times the base mechanism is run and φ(z) the PGF of K. Let f : [0, ) R be a convex function. Then,

Hf(A||A ) X

y Y f Q(y)φ (qy)

Q (y)φ (q y)

Q (y)φ (q y),

where for each y Y, qy and q y are obtained by applying

the same y-dependent post-processing function to Q and Q , respectively.

Looking at the bound given by Thm. 4.1, we can decompose the right-hand side in case f corresponds to the R enyi divergence and obtain (Lemma 7, Papernot & Steinke, 2022) as a corollary.

Remark 4.2. Let f(z) = zλ for some λ 1. Then, by Thm. 4.1,

e(λ 1)Dλ(A||A ) = Hf(A||A )

λ Q (y) φ (qy)

for some q and q that are obtained by applying the same post-processing function to Q and Q , respectively. Taking the logarithm and dividing by λ 1, we obtain (Lemma 7, Papernot & Steinke, 2022).

In case of RDP analysis, the bounds for the randomized private selection algorithms (see Theorems 5.1 and 5.6) allow optimizing the bound (4.1) with respect to the privacy profile of Q. For example: as q is a result of post-processing of Q, we have that for all ϵ 0

q eϵq + δ(ϵ),

where δ(ϵ) gives the privacy profile of Q, and we can carry out an optimization of the bound (4.1) individually for each RDP order λ w.r.t. ϵ. In case the function f in Thm. 4.1 corresponds to the hockey stick divergence, the best we can have is a uniform bound for the ratio φ (qy)

φ (q y) which uses the bound qy eϵq y + δ(ϵ) with the same value of ϵ for each y Y. As a result there is one degree of freedom less to optimize in the upper bounds we obtain using the hockey stick divergence. Nevertheless, the analysis with the hockey stick divergence becomes much simpler and the resulting (ϵ, δ)-DP bounds for private selection become tighter.

5. Distribution Specific Bounds for Private Selection

We next consider privacy profile bounds for two specific choices of the distribution K, the truncated negative binomial distribution and the binomial distribution. As we show, in both cases the bounds allow evaluating a considerably larger number of private candidates than the state-of-the-art bounds. The bounds for the binomial distribution generalize the bounds for the Poisson distribution, improve the state-ofthe-art bounds for the Poisson distribution, and also allow concentrating K further.

Privacy Profiles for Private Selection

5.1. Truncated Negative Binomial Distribution

Suppose the number of trials K is distributed according to the truncated negative binomial distribution Dη,γ which is determined by γ (0, 1) and η ( 1, ) and by the probabilities (k N) for η = 0 by

P(K = k) = (1 γ)k

and for η = 0 by

P(K = k) = (1 γ)k

k log 1/γ .

It holds that when η = 0,

EK = η(1 γ)

and when η = 0,

The derivative of the corresponding probability generating function is given by

φ (z) = 1 (1 γ)z η 1 γη+1 EK. (5.1)

As a baseline, we consider the following RDP bound.

Theorem 5.1 (Papernot & Steinke 2022). Let Q satisfy α, ϵ -RDP and bα,bϵ -RDP for some α (1, ) and bα [1, ). Draw K from a truncated negative binomial distribution distribution Dη,γ, where γ (0, 1) and η ( 1, ). Run Q(X) for K times. Then A(X) returns the best value of those K runs (also Q s output). Then A satisfies α, ϵ (α) -RDP, where

ϵ (α) = ϵ(α) + (η + 1) 1 1

+ (1 + η) log(1/γ)

Using the PGF (5.1) and our general result Thm.4.1, we obtain the following bound using the hockey-stick divergence.

Theorem 5.2. Let K Dη,γ and let δ(ϵ1), ϵ1 R, define the privacy profile of the base mechanism Q. Then, for A and A , the output distributions of the selection algorithm evaluated on neighboring datasets X and X , respectively, and for all ϵ1 0,

He ϵ(A||A ) m δ(bϵ)

bϵ = ϵ (η + 1) log eϵ1 + 1 γ

We directly obtain the following pure ϵ-DP result from Thm. 5.2. Notice that it includes Theorem 1.3 (3ϵ-bound) by Liu & Talwar (2019) as a special case.

Corollary 5.3. Let K Dη,γ. If the base mechanism Q is ϵ-DP, then the selection algorithm A is (η + 2)ϵ-DP. For η = 1 we get Theorem 1.3 of (Liu & Talwar, 2019).

The following (ϵ, δ)-DP result is also a straightforward corollary of Thm. 5.2.

Corollary 5.4. Let K Dη,γ. If the base mechanism Q is (ϵ, δ)-DP, then then the selection algorithm A is (η + 2)ϵ + γ 1δ, mδ -DP.

Notice that for the geometric distribution (η = 1), Cor. 5.4 implies that if Q is (ϵ, δ)-DP, then A is 3ϵ + m δ, m δ -DP.

We can show that for a fixed δ the ϵ-value of the private selection algorithm is O(log 1 2 m

δ ), in case the base mechanism is Gaussian differentially private (Dong et al., 2022), i.e., if its privacy profile is dominated by the hockey-stick divergence between two Gaussians.

Corollary 5.5 (ϵ-values when Q is GDP). Let K Dη,γ with η 1 and suppose the base mechanism is dominated by the Gaussian mechanism with noise parameter σ > 0 and L2-sensitivity 1. Then, for a fixed δ > 0, the private selection algorithm A is (ϵ, δ)-DP for

ϵ = (η + 1) 1

Figure 2 illustrates the various bounds for the truncated negative binomial distribution with η = 1 when the base mechanism is the Gaussian mechanism with σ = 4 and L2-sensitivity 1, when m = 30, 300 and 3000. Figure 3 shows the increase of the ϵ-values for a fixed value of δ and the bound of Lemma 5.5 for comparison.

5.2. Binomial Distribution

The choice of distribution for the number of repetitions K has practical implications on the utility of private selection (Papernot & Steinke, 2022). One issue is that when the distribution of K is less concentrated, the value of K is likely to be small meaning fewer candidates evaluated even when its expectation is large. This is especially problematic for smaller numbers of candidates where the expectation of K is not even large to begin with! As a practical alternative to the less-concentrated truncated negative binomial distribution, Papernot & Steinke (2022) consider the Poisson distribution for K.

In our work, by considering K Bin(n, p) to be binomially distributed, we can still further concentrate the distribution of K by allowing a small additional privacy cost.

Privacy Profiles for Private Selection

0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 ϵ

RDP bound, Thm. 5.1, m=30 RDP bound, Thm. 5.1, m=300 RDP bound, Thm. 5.1, m=3000 HS bound, Thm. 5.2, m=30 HS bound, Thm. 5.2, m=300 HS bound, Thm. 5.2, m=3000

Priv. Profile of Base Mechanism

Figure 2. Comparison of various (ϵ, δ)-bounds when K Dη,γ with η = 1 (the geometric distribution, E[K] = γ 1) and m = 30, 300, 3000. The base mechanism is the Gaussian mechanism with L2-sensitivity 1 and noise parameter σ = 4.

101 102 103 104 105

RDP bound, Thm. 5.1 HS bound, Thm. 5.2

Figure 3. Growth of ϵ-values for δ = 10 6 for the RDP and hockey stick divergence based bounds as a function of m, when K Dη,γ with η = 1 and the base mechanism is the Gaussian mechanism with L2-sensitivity 1 and noise parameter σ = 4. The privacy profile bound given by Thm. 5.2 retains the O(log 1 2 m

δ ) growth of ϵ-values from the DP RNM (See Fig. 1).

The hockey stick-based approach simplifies the analysis such that when compared to the RDP bounds for Poissondistributed K, we essentially get much more concentrated K for the same privacy cost using the binomial distribution.

Our bound for K Bin(n, p) is a strict generalization of the Poisson distribution case, as we get the result for the Poisson distribution as a limit when p 0. The RDP bound to compare is the one given by Papernot & Steinke (2022).

Theorem 5.6 (Papernot & Steinke 2022). Let Q satisfy α, ϵ -RDP and (bϵ, bδ)-DP for some α (1, ) and

ϵ,bϵ, bδ 0. Draw K from a Poisson distribution with mean m. Run Q(X) for K times. Then A(X) returns the best value of those K runs (also Q s output). If K = 0, A(X) returns some arbitrary output. If ebϵ 1 + 1 α 1, then A satisfies α, ϵ (α) -RDP, where

ϵ (α) = ϵ + m bδ + log m

For a hockey stick divergence bound, we consider the binomially distributed K which is a strict generalization of the Poisson case and includes its privacy profile bound as a special case. We can derive the following result from Thm. 4.1 when using the PGF of the binomial distribution.

Theorem 5.7. Let K Bin(n, p) for some n N and 0 < p < 1, and let δ(ϵ1), ϵ1 R, define the privacy profile of the base mechanism Q. Suppose

ϵ1 log 1 + p 1 p δ(ϵ1) .

Then, for A and A , the output distributions of the selection algorithm evaluated on neighboring datasets X and X , respectively, for all ϵ > 0 and for all ϵ1 0,

He ϵ(A||A ) m δ(bϵ), (5.2)

bϵ = ϵ (n 1) log 1 + p (eϵ1 1) + p δ(ϵ1) .

We get the hockey stick divergence bound for the case K Poisson(m) as a corollary of Thm. 5.7.

Corollary 5.8. Let K Poisson(m) for some m N, and let δ(ϵ1), ϵ1 R, define the privacy profile of the base mechanism Q. Then, for A and A , the output distributions of the selection algorithm evaluated on neighboring datasets X and X , respectively, and for all ϵ > 0 and for all ϵ1 0,

He ϵ(A||A ) m δ(bϵ), (5.3)

bϵ = ϵ m (eϵ1 1) m δ(ϵ1).

The proof of Cor. 5.8 essentially follows from the fact that Bin(n, m/n) approaches Poisson(m) in total variation distance as n grows and that the bound (5.2) approaches the bound (5.3) as n grows. Figure 4 illustrates that when compared to the RDP bound of Thm. 5.6 for the Poisson distributed K, we can obtain much smaller probabilities for small values of K for the same privacy cost when using K Bin(n, m/n) and Thm. 5.7.

Privacy Profiles for Private Selection

0.0 0.5 1.0 1.5 2.0 2.5 ϵ

RDP bound, Thm. 5.6, K Poisson(m)

HS bound, Thm. 5.7, K Bin(n, m/n), n = 15

HS bound, Thm. 5.7, K Bin(n, m/n), n = 20

HS bound, Thm. 5.7, K Bin(n, m/n), n = 1000

0 2 4 6 8 10 12 K

CDF of Poisson(m)

CDF of Bin(n, m/n), n = 15

CDF of Bin(n, m/n), n = 20

CDF of Bin(n, m/n), n = 1000

Figure 4. Top: Comparison of the bound of Thm. 5.7 for K Bin(n, m/n) for different values of n, when m = 10, and the RDP bound of Thm. 5.6. The base mechanism is the Gaussian mechanism with sensitivity 1 and σ = 4.0. Bottom: Comparison of the CDFs for different values of n. When comparing to the RDP bound, we see that at δ 10 6 we get more concentrated K for free by using the binomial distribution and Thm. 5.7.

6. Applications

6.1. Hyperparameter tuning for Propose-Test-Release

Propose-Test-Release (PTR) (Dwork & Lei, 2009) is one of the most versatile recipes for data-adaptive DP mechanism design. In its vanilla form, it involves three steps: (1) propose (or guess) a bound of the local sensitivity; (2) privately test whether the bound is valid; (3) If it passes the test, calibrate noise proportional to the proposed bound; otherwise, refuse to answer. Recently, Redberg et al. (2023) generalized this approach by considering data-adaptive privacy losses instead. The biggest challenge to apply the method is to know which bound to propose. The data-dependent privacy loss will depend on both the dataset and the hyperparameters of the query (as an example, think of the noise scale in additive noise mechanisms). To tune these hyperpa-

rameters we consider the private selection algorithm with geometrically distributed K. The tricky issue with PTR is that PTR does not satisfy R enyi DP, and thus disqualifies the approach from Papernot & Steinke (2022). Meanwhile, our methods deal with (ε, δ)-DP and δ-approximate Gaussian DP very naturally. Using our Thm. 5.2 we can select the best threshold to propose in a large number of candidates without resorting to composition. We have the following result for Generalized PTR with an (bϵ, bδ)-DP test. We refer to Redberg et al. (2023) for more details.

Theorem 6.1 (Redberg et al. 2023). Consider a proposal ϕ and a data-dependent function ϵϕ(X) w.r.t. δ > 0. Suppose that we have an (bϵ, bδ)-DP test T : X {0, 1} such that when ϵϕ(X) > ϵ, T (X) = 1 with probability δ and 0 with probability 1 δ . Then the Generalized PTR algorithm (Redberg et al., 2023, Alg. 2) is (ϵ +bϵ, δ + bδ + δ )-DP.

Our approach is to wrap the private selection algorithm around generalized PTR and tune the parameter ϕ. We use the point-wise guarantees given by Thm. 6.1 for Gen. PTR and our Thm. 5.1 for the tuning algorithm. To illustrate the effectiveness of this approach, we consider a linear regression problem on two UCI benchmark data sets (Bache & Lichman, 2013), see Fig. 5. We apply the Generalized PTR to the one-posterior sample (OPS) algorithm described in (Redberg et al., 2023) which includes privately releasing the L2-norm of the non-private solution and also the smallest eigenvalue of the feature covariance matrix. The parameter to tune in the method is the regularization strength λ (see Alg. 7, Redberg et al., 2023) and we carry out a random search on a pre-defined logarithmically equidistant grid meaning that we pick a random value from the grid at each of the K rounds. Notice that we could draw the candidates from any fixed probability distribution; the only requirement is that each candidate mechanism has the same privacy profile. As baselines we have the same approach using the privacy bounds of Liu & Talwar (Thm. 3.5, 2019), the output perturbation method (Chaudhuri et al., 2011) and the non-adaptive method OPS-Balanced by Wang (2018).

6.2. Private Hyperparameter Tuning of DP-SGD

Our results enable using numerical accountants for computing the privacy profile δ(ϵ) for the subsampled Gaussian mechanism (see, e.g., Koskela et al., 2021; Gopi et al., 2021; Zhu et al., 2022). We consider the simplest case, i.e., the Poisson subsampling and the add/remove neighborhood relation of datasets. We apply the numerical method proposed by Koskela et al. (2021) to the dominating pairs given in (Thm. 11, Zhu et al., 2022) to obtain accurate privacy profiles for the base mechanism. We remark that Zhu et al. (2022) give privacy profiles for various subsampling schemes under both add/remove and substitute neighborhood relations of datasets. With these results, one can numer-

Privacy Profiles for Private Selection

UCI Bike dataset (n = 17379, d = 17)

non-private Out Pert OPS OPS with PTR (Thm. 5.2) OPS with PTR (Liu&Talwar)

UCI Elevator dataset (n = 8752, d = 18)

non-private Out Pert OPS OPS with PTR (Thm. 5.2) OPS with PTR (Liu&Talwar)

Figure 5. Linear regression problem on two UCI benchmark datasets. Tuning OPS-PTR (i.e., generalized PTR applied to the one-posterior sample algorithm) via the private selection algorithm outperforms baseline methods when the privacy cost of the tuning procedure is calculated using our Thm. 5.2.

ically construct the dominating pair of distributions using methods of Doroshenko et al. (2022) and also obtain upper bounds for compositions.

We find that our bounds are tighter than the RDP bounds across a variety of parameter combinations. Figure 6 shows comparisons with parameters taken from an example of (Papernot & Steinke, 2022). RDP parameters are evaluated using the Opacus library (Yousefpour et al., 2021). Often, using larger batch sizes and noise ratios leads to increased privacy-utility tradeoff (De et al., 2022; Ponomareva et al., 2023). Figure 7 shows comparisons in a setting of such a high-accuracy experiment (De et al., 2022).

Adjusting Hyperparameters. One important question is, how to adjust the hyperparameters in case we are optimizing parameters that affect the privacy guarantees themselves. For example, one may consider tuning the noise parameter σ in DP-SGD in which case one may also consider adjusting the length of the training.

0 5 10 15 20 25 30 m

RDP bound (Poisson), Thm. 5.6 HS bound (Poisson), Cor. 5.8 RDP bound (Geom.), Thm. 5.1 HS bound (Geom.), Thm. 5.2

Figure 6. Comparison of the hockey stick divergence bounds of Thm. 5.2 and Cor. 5.8 and the RDP bounds of Thm. 5.1 and 5.6 for the private selection for a Poisson subsampled Gaussian mechanism with subsampling ratio q = 256/60000, noise parameter σ = 1.1 and number of steps T = 60/q .

100 101 102 103 104

RDP bound, Thm. 5.1 HS bound, Thm. 5.2

Figure 7. Comparison of the bounds, when the base mechanism is a Poisson subsampled Gaussian mechanism with the parameters q = 16384/50000, σ = 21.1 and T = 250 (see Table 18 in De et al., 2022). We see that the bound of Thm. 5.2 allows evaluating approximately 3 times as many models as the RDP bound.

The bound of Thm. 5.2 essentially requires only point-wise information about the privacy profile of the base mechanism Q. However, the bound can be optimized w.r.t. the privacy profile of Q which may affect it considerably. For finding ideal point-wise DP thresholds, we consider the following procedure. Suppose the base mechanisms all satisfy (ϵQ, δ) for some ϵQ > 0 and δ > 0. As the privacy profiles start to resemble those of the Gaussian mechanism for large numbers of compositions (Dong et al., 2022), we carry out the optimization of the upper bound of Thm. 5.2 w.r.t. to

Privacy Profiles for Private Selection

the privacy profile of a Gaussian mechanism (GM) that is adjusted to be (ϵQ, δ)-DP. More specifically, we first adjust the noise scale σ of the GM such that it is (ϵQ, δ)-DP, to obtain a privacy profile δQ(ϵ). Then, in case we are using K Dη,γ, we carry out the minimization

ϵ1 = arg min ϵ log eϵ + 1 γ

and set δ1 = δQ(ϵ1). This results in the first threshold (ϵ1, δ1). Using the same privacy profile we find an bϵ-value where this GM is (bϵ, δ/m)-DP. The following corollary result of Thm. 5.2 then gives the ϵ-value for which the private selection algorithm is (ϵ, δ)-DP.

Corollary 6.2. Suppose the base mechanism Q is (ϵ1, δ1)- DP and (bϵ, δ/m)-DP for some ϵ1 0 and bϵ 0. Then, the private selection algorithm with K Dη,γ is (ϵ, δ)-DP for

ϵ = bϵ + (η + 1) log eϵ1 + 1 γ

Figure 8 shows the upper bound obtained using this procedure, when we are tuning the σ-parameter for the Poisson subsampled Gaussian mechanism. We fix q = 0.01 and set as a threshold ϵQ = 1.5 and δ = 10 6. We consider three σ-candidates: 2.0, 3.0 and 4.0 and for each of them the number of iterations T is determined to be the maximum such that the privacy profile of the candidate is below (ϵ1, δ1)- and (bϵ, δ/m)-thresholds. As a result we can run the candidate models for 4000, 10000 and 18000 iterations, respectively. We compare the (ϵ, δ)-DP bound given by Cor. 6.2 and the bounds we would obtain by optimizing Thm. 5.2 individually for each candidate mechanism and see that there is a very small gap.

7. Conclusions and Future Work

We have filled a gap in the private selection literature by providing (ϵ, δ)-privacy analysis for various selection algorithms that is able to accurately use the privacy profiles of the candidate mechanisms. Our bounds can be used as a drop-in replacement of the existing RDP bounds. When compared to existing RDP bounds, in DP-SGD tuning, for example, the new bounds allow evaluating approximately 3 times as many candidate models. The bounds also improve existing point-wise (ϵ, δ)-bounds which translates to improved utility in data-adaptive analyses using the generalized PTR framework. We have also shown how to use the bounds to adjust parameters of the candidate models when tuning hyperparameters that affect the privacy guarantees of the candidate mechanisms.

Related to the results of Section 3, it is an open problem how to find tight bounds for a given noise-adding mechanism in case of unbounded queries. It is fairly straightforward to find accurate numerical bounds for a fixed pair of datasets

101 102 103

Upper Bound Optimized Candidate Bound, σ = 2.0

Optimized Candidate Bound, σ = 3.0

Optimized Candidate Bound, σ = 4.0

Figure 8. Tuning the σ-parameter for the Poisson subsampled Gaussian mechanism with the geometrically distributed K. We fix q = 0.01 and consider three σ-candidates: 2.0, 3.0 and 4.0. Shown are the (ϵ, δ)-bound given by Cor. 6.2 and the bounds obtained by optimizing Thm. 5.2 individually for each candidate mechanism.

and a given noise-adding mechanism, but it is unclear what would generally be the worst-case pairs of query values in case of unbounded queries (for results in case of bounded queries, see Lebensold et al., 2024). Regarding the results of Section 4 and 5, our impression is that when the number of candidates is randomized, our bounds cannot be much tightened. However, one could consider different randomized selection algorithms such as those by Cohen et al. (2023) where the number of candidates is Beta-binomially distributed. It is an interesting question whether our general result of Thm. 4.1 could be then used to derive tighter privacy bounds. Note that all our results are non-data adaptive in the sense that they do not benefit, for example, from the case that one of the candidates is clearly better than the rest. As an alternative, one could consider data-adaptive mechanisms such as Bayesian optimization methods (Wang et al., 2023) or data-adaptive top-k selection mechanisms (Zhu & Wang, 2022).

Impact Statement

This paper presents work whose goal is to advance the field of Differentially Private Machine Learning. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here.

Acknowledgments

The work was partially supported by NSF Award #2048091. The authors thank Yuqing Zhu for sharing code from Redberg et al. (2023).

Privacy Profiles for Private Selection

Abadi, M., Chu, A., Goodfellow, I., Mc Mahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308 318, 2016.

Bache, K. and Lichman, M. UCI machine learning repository, 2013. URL http://archive.ics.uci. edu/ml.

Balle, B. and Wang, Y.-X. Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. In International Conference on Machine Learning, pp. 394 403. PMLR, 2018.

Balle, B., Barthe, G., and Gaboardi, M. Privacy amplification by subsampling: Tight analyses via couplings and divergences. Advances in Neural Information Processing Systems, pp. 6277 6287, 2018.

Boyd, S. P. and Vandenberghe, L. Convex optimization. Cambridge university press, 2004.

Canonne, C. L., Kamath, G., and Steinke, T. The discrete gaussian for differential privacy. Advances in Neural Information Processing Systems, 33:15676 15688, 2020.

Chaudhuri, K. and Vinterbo, S. A. A stability-based validation procedure for differentially private machine learning. Advances in Neural Information Processing Systems, 26, 2013.

Chaudhuri, K., Monteleoni, C., and Sarwate, A. D. Differentially private empirical risk minimization. J. Mach. Learn. Res., 12:1069 1109, 2011.

Cohen, E., Lyu, X., Nelson, J., Sarl os, T., and Stemmer, U. Generalized private selection and testing with high confidence. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023). Schloss-Dagstuhl-Leibniz Zentrum f ur Informatik, 2023.

De, S., Berrada, L., Hayes, J., Smith, S. L., and Balle, B. Unlocking high-accuracy differentially private image classification through scale. ar Xiv preprint ar Xiv:2204.13650, 2022.

Dong, J., Roth, A., and Su, W. J. Gaussian differential privacy. Journal of the Royal Statistical Society Series B, 84(1):3 37, 2022.

Doroshenko, V., Ghazi, B., Kamath, P., Kumar, R., and Manurangsi, P. Connect the dots: Tighter discrete approximations of privacy loss distributions. ar Xiv preprint ar Xiv:2207.04380, 2022.

Dwork, C. Differential privacy. In Proc. 33rd Int. Colloq. on Automata, Languages and Prog. (ICALP 2006), Part II, pp. 1 12, 2006.

Dwork, C. and Lei, J. Differential privacy and robust statistics. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pp. 371 380, 2009.

Dwork, C., Mc Sherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography: Third Theory of Cryptography Conference, TCC, Proceedings 3, pp. 265 284. Springer, 2006.

Gopi, S., Lee, Y. T., and Wutschitz, L. Numerical composition of differential privacy. Advances in Neural Information Processing Systems, 34:11631 11642, 2021.

Koskela, A., J alk o, J., Prediger, L., and Honkela, A. Tight differential privacy for discrete-valued mechanisms and for the subsampled Gaussian mechanism using FFT. In International Conference on Artificial Intelligence and Statistics, pp. 3358 3366. PMLR, 2021.

Lebensold, J., Precup, D., and Balle, B. On the privacy of selection mechanisms with gaussian noise. pp. 1495 1503, 2024.

Liese, F. and Vajda, I. On divergences and informations in statistics and information theory. IEEE Transactions on Information Theory, 52(10):4394 4412, 2006.

Liu, J. and Talwar, K. Private selection from private candidates. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pp. 298 309, 2019.

Mironov, I. R enyi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pp. 263 275, Aug 2017. doi: 10.1109/CSF.2017.11.

Mohapatra, S., Sasy, S., He, X., Kamath, G., and Thakkar, O. The role of adaptive optimizers for honest private hyperparameter selection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 7806 7813, 2022.

Papernot, N. and Steinke, T. Hyperparameter tuning with renyi differential privacy. In International Conference on Learning Representations, 2022.

Ponomareva, N., Hazimeh, H., Kurakin, A., Xu, Z., Denison, C., Mc Mahan, H. B., Vassilvitskii, S., Chien, S., and Thakurta, A. G. How to dp-fy ml: A practical guide to machine learning with differential privacy. Journal of Artificial Intelligence Research, 77:1113 1201, 2023.

Redberg, R., Zhu, Y., and Wang, Y.-X. Generalized PTR: User-friendly recipes for data-adaptive algorithms with

Privacy Profiles for Private Selection

differential privacy. In International Conference on Artificial Intelligence and Statistics, pp. 3977 4005. PMLR, 2023.

Tang, X., Shin, R., Inan, H. A., Manoel, A., Mireshghallah, F., Lin, Z., Gopi, S., Kulkarni, J., and Sim, R. Privacypreserving in-context learning with differentially private few-shot generation. In International Conference on Learning Representations, 2024.

Wang, H., Gao, S., Zhang, H., Su, W. J., and Shen, M. DPHy Po: An adaptive private framework for hyperparameter optimization. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.

Wang, Y.-X. Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain. Uncertainty in Artificial Intelligence (UAI-18), 2018.

Wu, T., Panda, A., Wang, J. T., and Mittal, P. Privacypreserving in-context learning for large language models. In International Conference on Learning Representations, 2024.

Yousefpour, A., Shilov, I., Sablayrolles, A., Testuggine, D., Prasad, K., Malek, M., Nguyen, J., Ghosh, S., Bharadwaj, A., Zhao, J., et al. Opacus: User-friendly differential privacy library in pytorch. In Neur IPS 2021 Workshop Privacy in Machine Learning, 2021.

Zhu, Y. and Wang, Y.-X. Adaptive private-k-selection with adaptive k and application to multi-label pate. In International Conference on Artificial Intelligence and Statistics, pp. 5622 5635. PMLR, 2022.

Zhu, Y., Dong, J., and Wang, Y.-X. Optimal accounting of differential privacy via characteristic function. In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, 2022.

Privacy Profiles for Private Selection

A. Proofs for Section 3

A.1. Proof of Theorem 3.2 (Privacy Profile of Additive Noise RNM)

Theorem A.1. Let X X and ϵ R. We have

He ϵ M(X)||M(X ) m δ(ϵ),

where δ(ϵ) is the privacy profile of the additive noise mechanism with sensitivity 2.

Proof. For each i [m], denoting the density function of Zi by p(ri), we have that

Hf M(X)||M(X )

i=1 f P(M(X) = i)

P(M(X ) = i)

P(M(X ) = i)

R p(ri 2) P(fi(X) + ri 2 > maxj [m],j =i{fj(X) + rj}) dri R p(ri) P(fi(X ) + ri > maxj [m],j =i{fj(X ) + rj}) dri

p(ri) P(fi(X ) + ri > max j [m],j =i{fj(X ) + rj}) dri,

where p(ri) s are the density functions of Zi s, respectively, and where the randomness in P(fi(X) + ri > maxj [m],j =i{fj(X) + rj}) is w.r.t. rj s.

From the Lipschitz property it follows that for all i [m],

P(fi(X) + ri 2 > maxj [m],j =i{fj(X) + rj})

P(fi(X ) + ri > maxj [m],j =i{fj(X ) + rj}) 1. (A.2)

For an f divergence determined by a convex f of the RNM of additive noise mechanisms evaluated at X and X , we have that

Hf M(X)||M(X ) =

i=1 f P(M(X) = i)

P(M(X ) = i)

P(M(X ) = i)

R p(ri 2) P(fi(X) + ri 2 > maxj [m],j =i{fj(X) + rj}) dri R p(ri) P(fi(X ) + ri > maxj [m],j =i{fj(X ) + rj}) dri

p(ri) P(fi(X ) + ri > max j [m],j =i{fj(X ) + rj}) dri

f p(ri 2) P(fi(X) + ri 2 > maxj [m],j =i{fj(X) + rj})

p(ri) P(fi(X ) + ri > maxj [m],j =i{fj(X ) + rj})

p(ri) P(fi(X ) + ri > max j [m],j =i{fj(X ) + rj}) dri

p(ri) P(fi(X ) + ri > max j [m],j =i{fj(X ) + rj}) dri

i=1 Hf p(ri)||p(ri 2)

in case f corresponds to the hockey-stick-divergence and where δ(ϵ) is the privacy profile of the additive noise mechanism with sensitivity 2. In the first inequality we have used Lemma 2.3 and Jensen s inequality term-wise and in the second inequality we have used the Lipschitz property and the fact that f(z) is a non-decreasing function of z. In case of monotonicity, i.e., if fi(X) fi(X ) for all i [m], the condition (A.2) holds with ri 2 replaced by ri 1 and we have the result with ri 2 replaced by ri 1.

Privacy Profiles for Private Selection

A.2. Proof of Theorem 3.3

Lemma A.2. In case of an adaptive composition of k mechanisms of the form (3.1), we get the privacy profile upper bound mk δ(ϵ), where δ(ϵ) is the privacy profile of a m-wise composition of the additive noise mechanism with noise Z and sensitivity 2.

Proof. We use the proof technique used in (Thm. 27 Zhu et al., 2022) and consider an adaptive composition of two mechanisms. The general case follows from the proof. Let X X . Denote the density functions of M1(X) and M2 M1(X), X by f1(t) and f2(t, s), respectively, and the density functions of M1(X ) and M2 M1(X ), X by f 1(t) and f 2(t, s), respectively. Denote the density function of Z by p(t). Using the bound given by Thm. 3.2, we have:

He ϵ (M(X), M(X) ||(M(X), M(X) = Z

R max{f1(t)f2(t, s) eϵf 1(t)f 2(t, s), 0} dsdt

R max{f2(t, s) e ϵ log f1(t)

f 1(t) f 2(t, s), 0} ds

R max{p(s 2) e ϵ log f1(t)

f 1(t) p(s), 0} ds

R max{f1(t) e ϵ log p(s 2)

p(s) f 1(t), 0} dt

R max{p(t 2) e ϵ log p(s 2)

p(s) p(t), 0} dt

R max{p(s 2)p(t 2) eϵp(s)p(t), 0} dtds

which shows the claim for k = 2. The general case follows by induction.

A.3. Proof of Corollary 3.4

Lemma A.3. Consider the mechanism M defined in Eq. 3.1 and suppose Z is normally distributed with variance σ2. Let δ > 0. Then M is (ϵ, δ)-DP for

Proof. Let δ > 0. By (Lemma 3, Balle & Wang, 2018) we know that the Gaussian mechanism with L2-sensitivity and noise scale σ is (ϵ, δ)-DP if P(ω ϵ) δ,

Using a simple Chernoff bound for the Gaussian, we see that for any ϵ 2

P(ω ϵ) e eϵ2σ2

where eϵ = ϵ 2

2σ2 . Setting

we see that P(ω ϵ) 1

The claim follows setting = 2 and using Thm. 3.2.

Privacy Profiles for Private Selection

B. Proof of Theorem 4.1 (General Bound for the Privacy Selection)

Similarly to (Lemma 7, Papernot & Steinke, 2022) we denote Q( y) = P

y y Q(y ) and Q(< y) = P

y <y Q(y ) and formulate our main result for the case P(K = 0) = 0. The case P(K = 0) > 0 could be included in the upper bound as a small additional term of the form f(0) P(K = 0). E.g., in case of the hockey stick divergence, it does not account for the divergence in case ϵ > 0 so we neglect it.

Also, φ denotes the probability generating function of K, i.e.,

k=1 P(K = k) zk.

As shown in (Lemma 7, Papernot & Steinke, 2022),

k=1 P[K = k] Q( y)k Q(< y)k

= φ Q( y) φ Q(< y)

Q(<y) φ (z) dz

= Q(y) E X [Q(<y),Q( y)][φ (X)],

where X [Q(< y), Q( y)] denotes uniformly distributed r.v. on the interval [Q(< y), Q( y)].

Theorem B.1. Let X X and let A and A be the density functions of the hyperparameter tuning algorithm, evaluated on X and X , respectively. Let Q and Q be the density functions of the quality score of the base mechanism, evaluated on X and X , respectively. Let K be random variable for the times the base mechanism is run and φ(z) the PGF of K. Let f : [0, ) R be a convex function. Then,

Hf(A||A ) X

y Y f Q(y)φ (qy)

Q (y)φ (q y)

Q (y)φ (q y),

where for each y Y, qy and q y are obtained by applying the same y-dependent post-processing function to Q and Q , respectively.

Proof. For the mechanism A defined in (B.1), we can bound the HS divergence as follows:

Hf(A||A ) = X

y Y f Q(y) EX [Q(<y),Q( y)][φ (X)] Q (y) EX [Q (<y),Q ( y)][φ (X )]

Q (y) E X [Q (<y),Q ( y)][φ (X )]

y Y E X [Q(<y),Q( y)], X [Q (<y),Q ( y)] f Q(y)φ (X)

Q (y)φ (X )

Q (y)φ (X )

y Y max y E X [Q(<y ),Q( y )], X [Q (<y ),Q ( y )] f Q(y)φ (X)

Q (y)φ (X )

Q (y)φ (X ),

where in the first inequality we use Lemma 2.3 and Jensen s inequality. Notice that in the second inequality the maximum is taken only over the arguments in the expectation over X and X .

Jensen s inequality applies in case X and X are arbitrarily coupled and we use the same coupling between X and X as in (Lemma 7, Papernot & Steinke, 2022), i.e., we couple X and X such that

Q(y) = X Q (< y)

Q (y) . (B.3)

Privacy Profiles for Private Selection

We see that for X [Q(< y ), Q( y )] and X [Q (< y ), Q ( y )], the expressions (B.3) are between 0 and 1. Thus, continuing from (B.2), we find that

Hf(A||A ) X

y Y max y E X [Q(<y ),Q( y )], X [Q (<y ),Q ( y )] f Q(y)φ (X)

Q (y)φ (X )

Q (y)φ (X )

y Y max y max (X,X ) f Q(y)φ (X)

Q (y)φ (X )

Q (y)φ (X )

y Y max y f Q(y)φ (Q(< y ) + ty Q(y ))

Q (y)φ (Q (< y ) + ty Q (y ))

Q (y)φ (Q (< y ) + ty Q (y ))

for some {ty}y Y, where ty [0, 1] for all y Y. Furthermore, taking the maximum over ty s, we get

Hf(A||A ) X

y Y max y ,ty f Q(y)φ (Q(< y ) + ty Q(y ))

Q (y)φ (Q (< y ) + ty Q (y ))

Q (y)φ (Q (< y ) + ty Q (y ))

y Y f Q(y)φ (qy)

Q (y)φ (q y)

Q (y)φ (q y),

where for each y Y, qy and q y are obtained by applying the same y-dependent post-processing function to Q and Q , respectively. This can be seen using a similar reasoning as in (Lemma 7, Papernot & Steinke, 2022). It follows from the fact that for all y Y, there clearly exist y , t such that

(y , t ) = argmax y ,ty f Q(y)φ (Q(< y ) + ty Q(y ))

Q (y)φ (Q (< y ) + ty Q (y ))

Q (y)φ (Q (< y ) + ty Q (y )).

The kernel of the post-processing function is then given by

1, if z < y , t , if z = y , 0, else,

i.e., qy = P

z Y g(z)Q(z) and q y = P

z Y g(z)Q (z).

B.1. The Case of Continuous Output Score Function

In case the ordered output space Y of the base mechanism is continuous, the proof simplifies considerably. Theorem B.2. For continuous output space Y,

Hf(A||A ) = Z

Q(y) φ Q( y)

Q (y) φ Q ( y)

Q (y) φ Q ( y) dy, (B.4)

where where Q(y) and Q( y) denote the density function the CDF of the base mechanism Q, respectively.

Proof. The CDF of the private selection algorithm at y Y is given by

k=1 P[K = k] Q( y)k,

Therefore, differentiating, we see that the density function of A is given by

k=1 k P[K = k] Q(y) Q( y)k 1

k=1 k P[K = k] Q( y)k 1

= Q(y) φ Q( y) .

We have a similar representation for the density A (y), and the claim follows.

Privacy Profiles for Private Selection

C. Proofs for Section 5

C.1. Proof of Theorem 5.2

Theorem C.1. Let K Dη,γ and let δ(ϵ1), ϵ1 R, define the privacy profile of the base mechanism Q. Then, for A and A , the output distributions of the selection algorithm evaluated on neighboring datasets X and X , respectively, and for all ϵ1 0, He ϵ(A||A ) m δ(bϵ)

bϵ = ϵ (η + 1) log eϵ1 + 1 γ

Proof. Let q and q be results of applying some post-processing function to Q and Q , respectively. Then,

1 q eϵ1(1 q) + δ(ϵ1) (C.1)

for all ϵ1 R. Thus, we see that for all ϵ1 0,

φ (q) φ (q ) = (γ 1)q + 1

= (1 γ)(1 q ) + γ

(1 γ)(1 q) + γ

(1 γ)(1 q)eϵ1 + δ(ϵ1)(1 γ) + γ

(1 γ)(1 q) + γ

(1 γ)(1 q) eϵ1 + δ(ϵ1)(1 γ) + γ eϵ1

(1 γ)(1 q) + γ

= eϵ1 + δ(ϵ1)(1 γ) (1 γ)(1 q) + γ

γ δ(ϵ1) η+1 ,

where we have used the inequality (C.1) in the first inequality and the inequality γ eϵ1γ in the second inequality. Using Thm. 4.1 for the hockey-stick divergence f(z) = [z eϵ]+, we have that for all ϵ1 0,

Hf(A||A ) X

Q (y)φ (q y) eϵ

+ Q (y) φ (q y)

1 e ϵ log Q(y) Q (y) log φ (qy) φ (q y) #

+ Q(y) E[K] γη+1 1 qy(1 γ) η+1

1 e ϵ log Q(y) Q (y) log φ (qy) φ (q y) #

+ Q(y) E[K]

1 eϵ log Q(y) Q (y) (η+1) log(e ϵ1+ 1 γ

+ Q(y) E[K]

= E[K] He eϵ Q||Q ,

where eϵ = ϵ (η + 1) log eϵ1 + 1 γ

γ δ(ϵ1) . In the third inequality we have used the inequality (C.2) the fact that

[1 eϵ s]+ is a non-decreasing function of s for all ϵ R.

C.2. Proof of Corollaries 5.3 and 5.4

Corollary C.2. Let K Dη,γ. If the base mechanism Q is ϵ-DP, then the selection algorithm A is (η + 2)ϵ-DP. For η = 1 we get Theorem 1.3 of (Liu & Talwar, 2019).

Privacy Profiles for Private Selection

Proof. Let ϵ1 be such that δ(ϵ1) = 0, where ϵ1, δ(ϵ1) gives a privacy profile for the base mechanism Q. Then,

bϵ = ϵ (η + 1) log eϵ1 + 1 γ

= ϵ (η + 1)ϵ1,

and by Theorem 5.2, He ϵ(A||A ) = 0 if ϵ = (η + 2)ϵ1.

Corollary C.3. Let K Dη,γ. If the base mechanism Q is (ϵ, δ)-DP, then then the selection algorithm A is (η + 2)ϵ + γ 1δ, mδ -DP.

Proof. Let ϵ1, δ(ϵ1) be a privacy profile for the base mechanism Q and ϵ1 0. Then,

bϵ = ϵ (η + 1) log eϵ1 + 1 γ

ϵ (η + 1) log eϵ1 1 + γ 1δ(ϵ1)

ϵ (η + 1) ϵ1 + γ 1δ(ϵ1)

and by Thm. 5.2, He ϵ(A||A ) = mδ if ϵ = (η + 2)ϵ1 + γ 1δ.

C.3. Proof of Corollary 5.5

Corollary C.4 (ϵ-values when Q is GDP). Let K Dη,γ with η 1 and suppose the base mechanism is dominated by the Gaussian mechanism with noise parameter σ > 0 and L2-sensitivity 1. Then, for a fixed δ > 0, the private selection algorithm A is (ϵ, δ)-DP for

ϵ = (η + 1) 1

Proof. By (Lemma 3, Balle & Wang, 2018), we know that for the Gaussian mechanism with sensitivity 1 and noise scale σ the privacy loss random variable ω is distributed as ω N 1

σ2 . Thus, using a simple Chernoff bound for the Gaussian, for its privacy profile δ(ϵ1) we have

δ(ϵ1) P(ω ϵ1) e eϵ2σ2

where eϵ = ϵ1 1 2σ2 . Choosing

ϵ1 = 1 2σ2 + 1

we have that δ(ϵ1) δ

Furthermore, for any η 1, for ϵ1 of Eq. C.4, we have the following bound for the additional term in the bound of Thm. 5.2:

(η + 1) log eϵ1 + 1 γ

γ δ(ϵ1) (η + 1) log eϵ1 + eϵ1 1 γ

= (η + 1)ϵ1 + (η + 1) log 1 + 1 γ

(η + 1)ϵ1 + (η + 1) log 1 + 1 γ

= (η + 1)ϵ1 + (η + 1) log 1 + 1 γ

= (η + 1)ϵ1 + (η + 1) log 1 + 1 γη

(η + 1)ϵ1 + (η + 1) log 1 + 1

(η + 1)ϵ1 + η + 1

(η + 1)ϵ1 + 2δ.

Privacy Profiles for Private Selection

Setting ϵ = (η + 2)ϵ1 + 2δ, Thm. 5.2 and the inequalities (C.6) and (C.5) show that

He ϵ(A||A ) m δ ϵ (η + 1) log 1 + 1 γ

and the claim follows.

C.4. Proof of Theorem 5.7 (Private Selection, Binomial Distribution)

First, the following auxiliary lemma is needed.

Lemma C.5. Let a, b, c, d > 0. Then, for x 0, the function

f(x) = ax + b

is non-decreasing if and only if a

Proof. The claim follows from the expression

f (x) = ad cb (cx + d)2 .

Recall that for K Bin(n, p), the probability generating function is given by

φ(z) = (1 p + pz)n. (C.7)

Theorem C.6. Let K Bin(n, p) for some n N and 0 < p < 1, and let δ(ϵ1), ϵ1 R, define the privacy profile of the base mechanism Q. Suppose ϵ1 log 1 + p 1 pδ(ϵ1) .

Then, for A and A , the output distributions of the selection algorithm evaluated on neighboring datasets X and X , respectively, for all ϵ > 0 and for all ϵ1 0,

He ϵ(A||A ) m δ(bϵ), (C.8)

where bϵ = ϵ (n 1) log 1 + p(eϵ1 1) + pδ(ϵ1) .

Proof. Using the PGF of the binomial distribution given in Eq. (C.7) by the auxiliary Lemma C.5, for each y Y,

φ (qy) φ (q y) = 1 p + pqy

1 p + peϵ1q y + pδ(ϵ1) 1 p + pq y

= 1 + p(eϵ1 1)q y + pδ(ϵ1) 1 p + pq y

1 + p(eϵ1 1) + pδ(ϵ1) n 1,

in case p(eϵ1 1)

pδ(ϵ1) p 1 p,

Privacy Profiles for Private Selection

ϵ1 log 1 + p 1 pδ(ϵ1) .

Moreover, we see that

φ (q) n p = m (C.10)

for all 0 q 1. The claim follows from the inequalities (C.9) and (C.10) and from Theorem 4.1.

C.5. Proof of Corollary 5.8

Corollary C.7. Let K Poisson(m) for some m N, and let δ(ϵ1), ϵ1 R, define the privacy profile of the base mechanism Q. Then, for A and A , the output distributions of the selection algorithm evaluated on neighboring datasets X and X , respectively, and for all ϵ > 0, and for all ϵ1 0,

He ϵ(A||A ) m δ(bϵ),

bϵ = ϵ m (eϵ1 1) m δ(ϵ1).

Proof. Let K Poisson(m) and Kn Bin(n, m/n) and let A, A denote the density functions of the private selection algorithm corresponding to K and let An, A n those corresponding to Kn, evaluated on neighboring datasets X, X , respectively. Looking at the form of A given in Eq. (B.1), we have that

He ϵ A||A = X

y Y max{A(y) eϵA (y), 0}

k=1 max{P[K = k] Q( y)k Q(< y)k eϵP[K = k] Q ( y)k Q (< y)k , 0}

k=1 max{P[Kn = k] Q( y)k Q(< y)k eϵP[Kn = k] Q ( y)k Q (< y)k , 0}

k=0 |P[K = k] P[Kn = k]|

= He ϵ An||A n + (1 + eϵ)

k=0 |P[K = k] P[Kn = k]| ,

where the inequality follows from the fact that max{a + b, 0} |a| + max{b, 0} for all a, b R. Since by Le Cam s inequality, Bin(n, m/n) Poisson(m) in total variation distance as n , we have that

He ϵ A||A = lim n He ϵ An||A n .

Fixing n p = m in the bound (5.2) of Thm. 5.7 (bound for the case K Bin(n, n/m)), we see that the bound approaches the bound (5.3) of Cor. 5.8 (bound for the case K Poisson(m)) as p 0, since then

(n 1) log 1 + p(eϵ1 1) + pδ(ϵ1) m (eϵ1 1) + m δ(ϵ1).

This follows from the fact that log(1+x)

x 1 as x 0.

Remark C.8. We can also get Cor. 5.8 directly using the PGF of the Poisson distribution. For K Poisson(m), the PGF

Privacy Profiles for Private Selection

is φ(z) = em(z 1), i.e. φ (z) = m em(z 1). Using Thm. 4.1 for the hockey-stick divergence f(z) = [a eϵ]+, we get

Hf(A||A ) X

y Y f Q(y)φ (qy)

Q (y)φ (q y)

Q (y) φ (q y)

Q (y)φ (q y) eϵ

+ Q (y) φ (q y)

1 e ϵ log Q(y) Q (y) log φ (qy) φ (q y) #

+ Q(y) φ (qy)

1 eϵ log Q(y) Q (y) m (qy q y)

+ Q(y) m em (qy 1).

As the probabilities qy and q y are obtained by applying the same post-processing to Q and Q , for all ϵ1 0, q y eϵ1qy + δ(ϵ1). Using also the fact that [1 eϵ s]+ is a non-decreasing function of s for all ϵ R, we get from the inequality (C.11) that for all ϵ1 0,

Hf(A||A ) X

1 eϵ log Q(y) Q (y) m (qy q y)

+ Q(y) m em (qy 1)

1 eϵ log Q(y) Q (y) m q y(1 e ϵ1) m δ(ϵ1)

+ Q(y) m em (qy 1).

1 eϵ log Q(y) Q (y) m (1 e ϵ1) m δ(ϵ1)

which gives the claim of Cor. 5.8.

D. Converting RDP Bounds to (ϵ, δ)-Bounds

To convert from R enyi DP to approximate DP we use following formula.

Lemma D.1 (Canonne et al. 2020). Suppose the mechanism M is α, ϵ -RDP. Then M is also (ϵ, δ(ϵ))-DP for arbitrary ϵ 0 with

δ(ϵ) = exp (α 1)(ϵ ϵ)

α 1 . (D.1)