# Instance-Specific Asymmetric Sensitivity in Differential Privacy

David Durfee, Mozilla Anonym, ddurfee@mozilla.com

38th Conference on Neural Information Processing Systems (NeurIPS 2024).

We provide a new algorithmic framework for differentially private estimation of general functions that adapts to the hardness of the underlying dataset. We build upon previous work that gives a paradigm for selecting an output through the exponential mechanism based upon closeness of the inverse to the underlying dataset, termed the inverse sensitivity mechanism. Our framework will slightly modify the closeness metric and instead give a simple and efficient application of the sparse vector technique. While the inverse sensitivity mechanism was shown to be instance optimal, it was only with respect to a class of unbiased mechanisms such that the most likely outcome matches the underlying data. We break this assumption in order to more naturally navigate the bias-variance tradeoff, which will also critically allow for extending our method to unbounded data. In consideration of this tradeoff, we provide theoretical guarantees and empirical validation that our technique will be particularly effective when the distances to the underlying dataset are asymmetric. This asymmetry is inherent to a range of important problems including fundamental statistics such as variance, as well as commonly used machine learning performance metrics for both classification and regression tasks. We efficiently instantiate our method in 𝑂(𝑛) time for these problems and empirically show that our techniques will give substantially improved differentially private estimations.

1 Introduction

We consider the general problem of estimating aggregate functions or statistics of a dataset with differential privacy. The massive increase in data collection to improve analytics and modelling across industries has made such data computations invaluable, but these computations can also leak sensitive individual information. Rigorously measuring such leakage can be achieved through differential privacy, which quantifies the extent to which one individual's data can affect the output. Much of the focus within the field of differential privacy is upon constructing algorithms that give both accurate output and privacy guarantees by injecting specific types of randomness. One of the most canonical mechanisms for achieving this considers the maximum effect one individual's data could have upon the output of a given function, referred to as the sensitivity of the function, and adds proportional noise to the function output. In general, the notion of sensitivity plays a central role in many differentially private algorithms, directly affecting the accuracy of the output.

While using the worst-case sensitivity across all potential datasets will ensure privacy guarantees, the utility can be improved by using variants of sensitivity that are specific to the underlying dataset. This notion was initially considered in Nissim et al. (2007), introducing smooth sensitivity, an interpolation between worst-case sensitivity and local sensitivity of the underlying data, by which noise could be added proportionally. The smooth sensitivity adapts well to the underlying data and was further extended to other commonly used variants of the original privacy definition Bun & Steinke (2019).
More aggressive methods were also considered that take a data-independent conjectured sensitivity parameter and provide more accurate results when the underlying data complies with the parameter. The propose-test-release methods check that all datasets close to the underlying data have sensitivity below a parameter, add proportional noise when this criterion is met, and fail otherwise Dwork & Lei (2009); Thakurta & Smith (2013). Preprocessing methods provide an approximation of the function with sensitivity below a given parameter, by which noise can be added proportionally, and the approximation is accurate for underlying data with low sensitivity Chen & Zhou (2013); Blocki et al. (2013); Kasiviswanathan et al. (2013); Cummings & Durfee (2020). Clipping techniques, commonly seen in differentially private stochastic gradient descent Abadi et al. (2016), are also a more rudimentary and efficient preprocessing method for ensuring sufficiently small sensitivity. The primary challenge with these approaches is that the sensitivity parameter must be specified a priori and can add significant bias if the underlying data does not comply with the parameter.

In contrast, the inverse sensitivity mechanism directly improves upon the smooth sensitivity technique, adapting even better to the underlying data. While several instantiations had been previously known in the literature, it was introduced in its full generality in Asi & Duchi (2020b). Specifically, this framework considers all potential outputs based upon the closeness of their inverse to the underlying data and applies the exponential mechanism to select a point accordingly. This exact methodology can even improve upon adding noise proportional to the local sensitivity of the underlying data, which generally violates differential privacy. Follow-up work gave approximations of this method that allow for efficient implementations of more complex instantiations Asi & Duchi (2020a). For both the exact and approximate versions, the inverse sensitivity mechanism is instance optimal and nearly instance optimal, respectively, under certain assumptions Asi & Duchi (2020a,b).

1.1 Our techniques

We build upon the inverse sensitivity mechanism, particularly within the class of functions for which it was shown to be optimal. However, those guarantees only held for a class of unbiased mechanisms such that the most likely outcome matches the underlying data. The inverse sensitivity mechanism and smooth sensitivity techniques fit this characterization of unbiased. The methods that specify a data-independent sensitivity parameter break this assumption by essentially fixing the variance through this parameter but adding significant bias if the variance parameter is set too low. Our method will also break the unbiased assumption but still adapt well to the underlying data to more naturally navigate the bias-variance tradeoff. In particular, we similarly consider the distance from the underlying data to the inverse of each possible output, which can be considered the inverse sensitivity. However, we instead invoke the well-known sparse vector technique, originally introduced in Dwork et al. (2009), to select an output close to the underlying data. The iterative nature of the sparse vector technique will create a slight bias, while still adapting well to the underlying data.
By utilizing this iterative technique, we can also better take advantage of settings where the sensitivities are asymmetric, which allows us to reduce the variance, and we thus term our method the asymmetric sensitivity mechanism. In fact, the local sensitivity can be infinite with unbounded data, and our technique can still naturally handle this setting for a wide variety of functions including our instantiations. We support this with theoretical utility guarantees that are asymptotically superior to previous work under these conditions.

Our notion of asymmetric sensitivities is inherent to a range of problems, and we first instantiate our method upon variance, a fundamental property of a dataset that is widely used in statistical analysis. Likewise, this property will also apply to commonly used machine learning performance metrics: cross-entropy loss, mean squared error (MSE), and mean absolute error (MAE). Model performance evaluation is an essential part of a machine learning pipeline, particularly for iterative improvement, so accurate and private evaluation is critical. We instantiate our method upon these functions as well, and give an extensive empirical study for each instantiation across a variety of datasets and privacy parameters. We show that our method significantly improves performance of private estimation for these important problems. We further complement our results with an approximate method that allows for more efficient implementations of general functions while still preserving the asymmetry that we exploit for improved estimations. This will allow us to give 𝑂(𝑛) time implementations for each invocation of our method.

1.2 Additional related works

While the most closely related literature was discussed in more detail previously, we provide additional related works here. Recent work considered instance-optimality but for estimators of population quantities McMillan et al. (2022), which differ from the empirical quantities studied here and in the other previously mentioned work. Additional work formally considered the bias-variance-privacy trade-off particularly for mean estimation Kamath et al. (2023), but considers bias in the more classical sense. Interestingly, it has also been seen in the work for obtaining (asymptotically) optimal mean estimation for subgaussian distributions Karwa & Vadhan (2018); Bun & Steinke (2019) and distributions satisfying bounded moment conditions Barber & Duchi (2014); Kamath et al. (2020), that adding bias was necessary. This fits with our results where bias, albeit a different type, was needed to improve instance-specific differential privacy.

1.3 Our contributions

We summarize our primary contributions as the following:
1. We introduce a new algorithmic framework for private estimation of general functions, which we refer to as the asymmetric sensitivity mechanism, along with a more computationally efficient approximate variant (see Section 3).
2. We provide theoretical utility guarantees that asymptotically confirm our method's advantage when the sensitivities are asymmetric and further give intuition and empirical support of this asymmetric advantage (see Section 4).
3. We efficiently instantiate our method for private variance estimation, and provide an extensive empirical study showing significantly improved accuracy (see Section 5).
4. We further invoke our method upon model evaluation for both classification and regression tasks with corresponding efficient implementations and empirical studies showing improved estimations (see Section 6).
Additional and supplemental analysis, results and empirical studies are pushed to the appendix.

2 Preliminaries

For simplicity and ease of comparison we borrow much of the notation from Asi & Duchi (2020a,b).

Definition 2.1. Let 𝒙, 𝒙′ be datasets from our data universe 𝒳^𝑛. We define 𝑑_ham(𝒙, 𝒙′) = |{𝑖 : 𝒙_𝑖 β‰  𝒙′_𝑖}| to be the Hamming distance between datasets. If 𝑑_ham(𝒙, 𝒙′) ≀ 1 then 𝒙, 𝒙′ are neighboring datasets.

Note that we assume the swap definition of neighboring datasets but will also discuss how our results apply to the add-subtract definition in Appendix B.3. We further define the (global) sensitivity.

Definition 2.2. A function 𝑓 : 𝒳^𝑛 β†’ ℝ has sensitivity Ξ” if for any neighboring datasets 𝒙, 𝒙′ we have |𝑓(𝒙) βˆ’ 𝑓(𝒙′)| ≀ Ξ”.

We will be using the classical (pure) differential privacy definition, but will also discuss how our methods apply to other definitions with improved guarantees.

Definition 2.3. Dwork et al. (2006b,a) A mechanism 𝑀 : 𝒳^𝑛 β†’ 𝒴 is (πœ€, 𝛿)-differentially-private (DP) if for any neighboring datasets 𝒙, 𝒙′ and any measurable 𝑆 βŠ† 𝒴: Pr[𝑀(𝒙) ∈ 𝑆] ≀ 𝑒^πœ€ Pr[𝑀(𝒙′) ∈ 𝑆] + 𝛿. If 𝛿 = 0 then 𝑀 is πœ€-DP.

2.1 Sparse vector technique

We define the fundamental sparse vector technique introduced in Dwork et al. (2009) and often considered to apply Laplacian noise Lyu et al. (2017). However, recent work showed the noise can instead be added from the exponential distribution at the same parameter for improved utility Durfee (2023). Let Expo(𝑏) denote a draw from the exponential distribution with scale parameter 𝑏. The sparse vector technique iteratively calls the following algorithm. This technique can further see improvement when the queries are monotonic, which will apply to most of our instantiations of our method.

Algorithm 1 Above Threshold
Require: Input dataset 𝒙, a stream of queries {𝑓_𝑖 : 𝒳^𝑛 β†’ ℝ} with sensitivity Ξ”, and a threshold 𝑇
1: Set 𝑇̂ = 𝑇 + Expo(Ξ”/πœ€_1)
2: for each query 𝑖 do
3: Set 𝜈_𝑖 = Expo(Ξ”/πœ€_2)
4: if 𝑓_𝑖(𝒙) + 𝜈_𝑖 β‰₯ 𝑇̂ then
5: Output ⊀ and halt
6: else
7: Output βŠ₯
8: end if
9: end for

Definition 2.4. We say that a stream of queries {𝑓_𝑖 : 𝒳^𝑛 β†’ ℝ} with sensitivity Ξ” is monotonic if for any neighboring 𝒙, 𝒙′ ∈ 𝒳^𝑛 we have either 𝑓_𝑖(𝒙) ≀ 𝑓_𝑖(𝒙′) for all 𝑖 or 𝑓_𝑖(𝒙) β‰₯ 𝑓_𝑖(𝒙′) for all 𝑖.

This allows for the following differential privacy guarantees from Durfee (2023).

Proposition 2.5. Algorithm 1 is (πœ€_1 + 2πœ€_2)-DP in general and (πœ€_1 + πœ€_2)-DP for monotonic queries.

2.2 Inverse sensitivity mechanism

The inverse sensitivity mechanism had seen several previous instantiations but was introduced in its full generality in Asi & Duchi (2020b). We first introduce the exponential mechanism.

Definition 2.6. McSherry & Talwar (2007) The Exponential Mechanism is a randomized mapping 𝑀 : 𝒳^𝑛 β†’ 𝒴 such that Pr[𝑀(𝒙) = 𝑑] ∝ exp(πœ€ Β· π‘ž(𝒙, 𝑑) / (2Ξ”)) where π‘ž : 𝒳^𝑛 Γ— 𝒴 β†’ ℝ has sensitivity Ξ”.

Proposition 2.7. McSherry & Talwar (2007) The exponential mechanism is πœ€-DP.

We then define the distance of a potential output from the underlying dataset to be the Hamming distance required to change the data such that the new data matches the output.

Definition 2.8. For a function 𝑓 : 𝒳^𝑛 β†’ 𝒴 and 𝒙 ∈ 𝒳^𝑛, let the inverse sensitivity of 𝑑 ∈ 𝒴 be len_𝑓(𝒙; 𝑑) := inf_{𝒙′} {𝑑_ham(𝒙, 𝒙′) | 𝑓(𝒙′) = 𝑑}.

By construction this distance metric for any output cannot change by more than one between neighboring datasets due to the triangle inequality for Hamming distance.

Corollary 2.9.
For any neighboring datasets 𝒙, 𝒙′ ∈ 𝒳^𝑛 and 𝑑 ∈ im(𝑓), where im(𝑓) is the image of the function, we have |len_𝑓(𝒙; 𝑑) βˆ’ len_𝑓(𝒙′; 𝑑)| ≀ 1.

The inverse sensitivity mechanism then draws from the exponential mechanism instantiated upon the distance metric, giving the density function

πœ‹_{𝑀_inv(𝒙)}(𝑑) = 𝑒^{βˆ’len_𝑓(𝒙;𝑑)Β·πœ€/2} / ∫ 𝑒^{βˆ’len_𝑓(𝒙;𝑠)Β·πœ€/2} 𝑑𝑠   (M.1)

and mechanism M.1 is πœ€-DP by Proposition 2.7 and Corollary 2.9.

3 Asymmetric Sensitivity Mechanism

In this section we introduce our general methodology for instance-specific differentially private estimation, which we term the asymmetric sensitivity mechanism. We first give the exact formulation which will fit an extensive class of functions focused upon in Asi & Duchi (2020b). Next we provide a simple framework by which our method can be implemented. The efficiency of this implementation is highly dependent upon the function of interest, but we supplement these results with an approximate method. This can allow for broader efficient implementations and also extends our methodology to general functions. While this section will set up and provide the necessary rigor for our techniques, we also point the reader to Appendix C for a more intuitive explanation of our approach compared to the inverse sensitivity mechanism.

3.1 Exact asymmetric sensitivity mechanism

Our method will similarly consider the distance for each output from the underlying data with the goal being to select an output with distance close to zero. The inverse sensitivity mechanism does this through the exponential mechanism, but we will instead apply the sparse vector technique. In order to better apply the sparse vector technique, we will first modify the inverse sensitivity such that it is negative for outputs that are less than the output from the underlying data.

Definition 3.1. For 𝑓 : 𝒳^𝑛 β†’ ℝ and 𝒙 ∈ 𝒳^𝑛, let the reflective inverse sensitivity of 𝑑 ∈ ℝ be s_𝑓(𝒙; 𝑑) := sgn(𝑑 βˆ’ 𝑓(𝒙)) Β· (len_𝑓(𝒙; 𝑑) βˆ’ 1/2).

Intuitively, the goal of applying the sparse vector technique will be to identify when the reflective inverse sensitivity crosses the threshold from negative to positive. While there are reasonable methods of extending our approach to higher dimensions, it will both become computationally inefficient and the notion of asymmetry, which gives our method the most significant improvement, is less inherent in higher dimensions. Initially, we focus upon a general class of functions considered in Asi & Duchi (2020b) that was shown to include all continuous functions from a convex domain.

Definition 3.2 (Definition 4.1 in Asi & Duchi (2020b)). Let 𝑓 : 𝒳^𝑛 β†’ ℝ. Then 𝑓 is sample-monotone if for every 𝒙 ∈ 𝒳^𝑛 and 𝑠, 𝑑 ∈ ℝ satisfying 𝑓(𝒙) ≀ 𝑠 ≀ 𝑑 or 𝑑 ≀ 𝑠 ≀ 𝑓(𝒙), we have len_𝑓(𝒙; 𝑠) ≀ len_𝑓(𝒙; 𝑑).

For this class of functions, we show that the reflective inverse sensitivity of an output maintains closeness between neighboring datasets. Accordingly, we can apply the sparse vector technique to a stream of potential outputs in order to (noisily) identify when the reflective inverse sensitivity crosses the threshold from negative to positive. This gives the asymmetric sensitivity mechanism for a stream of potential outputs {𝑑_𝑖} that calls Above Threshold and returns 𝑑_π‘˜ when

Above Threshold(𝒙, {s_𝑓(𝒙; 𝑑_𝑖)}, 𝑇 = 0) = (βŠ₯_1, ..., βŠ₯_{π‘˜βˆ’1}, ⊀)   (M.2)

To be effective, this stream of potential outputs should be increasing (or decreasing if we flip the sign of s_𝑓) but will still achieve the desired privacy guarantees regardless, which is shown in Appendix A.2.

Theorem 3.3.
Given sample-monotone 𝑓 : 𝒳^𝑛 β†’ ℝ and any stream of potential outputs {𝑑_𝑖}, we have that mechanism M.2 is (πœ€_1 + 2πœ€_2)-DP in general and (πœ€_1 + πœ€_2)-DP if s_𝑓 is monotonic.

We further detail in Appendix B a simple, general, and robust strategy for selecting potential outputs that provides a reasonable limit on the number of queries.

3.2 Implementation framework

The primary bottleneck in efficiently implementing both our asymmetric sensitivity mechanism and the inverse sensitivity mechanism is the computation of the inverse sensitivity. In particular, it will require computing upper and lower output bounds for different Hamming distances from our underlying data.

Definition 3.4. For a function 𝑓 : 𝒳^𝑛 β†’ ℝ, we define the upper and lower output bounds for Hamming distance 𝓁 as π‘ˆ_𝑓^𝓁(𝒙) := sup_{𝒙′} {𝑓(𝒙′) : 𝑑_ham(𝒙, 𝒙′) ≀ 𝓁} and 𝐿_𝑓^𝓁(𝒙) := inf_{𝒙′} {𝑓(𝒙′) : 𝑑_ham(𝒙, 𝒙′) ≀ 𝓁}.

The complexity of computing these depends upon the function, but we can use the upper and lower output bounds to get the inverse sensitivity with the following lemma, proven in Appendix A.2.

Lemma 3.5. If 𝑓 is sample-monotone then len_𝑓(𝒙; 𝑑) = inf{𝓁 : 𝐿_𝑓^𝓁(𝒙) ≀ 𝑑 ≀ π‘ˆ_𝑓^𝓁(𝒙)} for all 𝑑 ∈ ℝ.

If we then assume access to the array 𝐿_𝑓^𝑛(𝒙), ..., 𝐿_𝑓^1(𝒙), 𝑓(𝒙), π‘ˆ_𝑓^1(𝒙), ..., π‘ˆ_𝑓^𝑛(𝒙), for any potential output 𝑑_𝑖 we can compute len_𝑓(𝒙; 𝑑_𝑖) in 𝑂(log(𝑛)) time with a simple binary search. Alternatively, we could also take 𝑂(𝑛) amortized time by maintaining a pointer and iteratively increasing the index for each new potential output if we assume the stream of potential outputs is non-decreasing. This gives the general implementation framework:
1. Compute upper and lower output bounds π‘ˆ_𝑓^𝓁(𝒙) and 𝐿_𝑓^𝓁(𝒙) for all 𝓁 ∈ [𝑛]
2. Use the output bounds to efficiently run Above Threshold(𝒙, {s_𝑓(𝒙; 𝑑_𝑖)}, 𝑇 = 0)

3.3 Approximate asymmetric sensitivity mechanism

In Appendix A, we show how we can extend our asymmetric sensitivity mechanism to general functions 𝑓 : 𝒳^𝑛 β†’ ℝ and provide more efficient implementations. It will follow closely with our exact version above.

4 Asymmetric Sensitivity Advantage

In this section, we first connect our definitions with the corresponding definitions in the previous work, by which utility guarantees are provided. Then we discuss the notion of asymmetric sensitivities and provide our utility guarantees that exploit this asymmetry to asymptotically improve upon the previous work under those conditions.

4.1 Connection to previous work

An essential quantity for our method and both the inverse and smooth sensitivity mechanisms is the amount a function output can change if π‘˜ individuals change their data. This is quantified in Equation 3 from Asi & Duchi (2020b), which can be translated to our definitions (in the case when 𝒴 = ℝ) as πœ”_𝑓(𝒙; π‘˜) := max{|𝑓(𝒙) βˆ’ 𝐿_𝑓^π‘˜(𝒙)|, |𝑓(𝒙) βˆ’ π‘ˆ_𝑓^π‘˜(𝒙)|}. Note that if π‘˜ = 1 then this is the local sensitivity of the function. It is then shown in Asi & Duchi (2020b) (Corollary 4.2 and Equation 13, respectively) that the general utility guarantees of both the inverse sensitivity mechanism and the smooth sensitivity mechanism are bounded with respect to this quantity. More simply, the accuracy guarantees degrade as the local sensitivity increases and there is no utility bound if the local sensitivity is infinite.
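Before analyzing the asymmetric setting, the following minimal Python sketch illustrates the framework of Sections 2.1 and 3.2: the exponential-noise Above Threshold of Algorithm 1 applied to reflective inverse sensitivities computed from precomputed output-bound arrays via Lemma 3.5. This is only an illustrative sketch rather than the implementation used in our experiments; the function names, the array layout, and the fallback when the candidate stream is exhausted are conventions introduced here for exposition.

```python
import numpy as np

def reflective_inverse_sensitivity(t, f_x, lower, upper):
    """s_f(x; t) from output-bound arrays (Lemma 3.5).

    lower[k] and upper[k] hold L^k_f(x) and U^k_f(x) for k = 0..n with
    lower[0] = upper[0] = f(x); unknown upper bounds may be float('inf').
    """
    if t == f_x:
        return 0.0
    if t > f_x:
        # len_f(x; t) is the smallest k whose upper bound reaches up to t
        k = next((k for k, u in enumerate(upper) if t <= u), float("inf"))
        return k - 0.5
    # t < f_x: smallest k whose lower bound reaches down to t
    k = next((k for k, l in enumerate(lower) if l <= t), float("inf"))
    return -(k - 0.5)

def asymmetric_sensitivity_mechanism(f_x, lower, upper, candidates,
                                     eps1, eps2, rng=None):
    """Mechanism M.2: Above Threshold (Algorithm 1, exponential noise, T = 0)
    over the reflective inverse sensitivities of an increasing candidate stream."""
    rng = rng or np.random.default_rng()
    noisy_threshold = 0.0 + rng.exponential(1.0 / eps1)  # Expo(Delta/eps1), Delta = 1
    last = None
    for t in candidates:
        s = reflective_inverse_sensitivity(t, f_x, lower, upper)
        if s + rng.exponential(1.0 / eps2) >= noisy_threshold:
            return t        # first (noisy) crossing from negative to positive
        last = t
    return last             # candidate stream exhausted (see Appendix B.2)
```

Instantiating the sketch only requires the output-bound arrays for the function of interest and a data-independent increasing candidate stream, for example the geometric stream 𝑑_𝑖 = Ξ²^𝑖 βˆ’ 1 used in Section 5.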
4.2 Asymmetric accuracy guarantees

To understand the advantages of our method, we will consider the sensitivities to be asymmetric if |𝑓(𝒙) βˆ’ π‘ˆ_𝑓^π‘˜(𝒙)| ≫ |𝑓(𝒙) βˆ’ 𝐿_𝑓^π‘˜(𝒙)| for most π‘˜ (or vice versa), which is to say that changing an individual's data can generally increase the function more than decrease it. In general, we expect any lower bounded function to inherently limit the amount changing one individual's data can decrease the function compared to increasing the function. Each instantiation in our empirical study is a non-negative function, which then fits this characterization. For simplicity, we will restrict our consideration to non-negative functions for our utility guarantees, but these can easily be applied to other settings.

The goal for our method is to exploit the asymmetric sensitivities by instead applying the sparse vector technique. The iterative nature of this technique biases the output towards being less than 𝑓(𝒙), but more importantly the π‘ˆ_𝑓^π‘˜(𝒙) values will have little effect upon the accuracy. Specifically, once the threshold is crossed it becomes increasingly unlikely that the sparse vector technique will proceed. Explicitly connecting this with our mechanism, even if π‘ˆ_𝑓^1(𝒙) = ∞ and so the local sensitivity is infinite, we still have s_𝑓(𝒙; 𝑑_𝑖) = 1/2 for all 𝑑_𝑖 > 𝑓(𝒙), and assuming 𝑑_𝑖 is increasing it is increasingly unlikely that we output larger 𝑑_𝑖 > 𝑓(𝒙). We formalize this intuition by providing a theoretical utility guarantee that does not depend upon the upper bound values. Essentially, we are able to replace the |𝑓(𝒙) βˆ’ π‘ˆ_𝑓^π‘˜(𝒙)| in πœ”_𝑓(𝒙; π‘˜) with a relative error bound based upon a parameter that we fix across all our empirical instantiations. Accordingly, as π‘ˆ_𝑓^1(𝒙), and thereby both the asymmetry and the local sensitivity, increases, our utility guarantees are asymptotically superior to both the inverse and smooth sensitivity mechanisms.

Lemma 4.1. Let 𝑀 denote the mechanism from M.2 with 𝑑_𝑖 = 𝛽^𝑖 βˆ’ 1 where 𝛽 > 1, and let πœ€ = πœ€_1 + 2πœ€_2 with πœ€_1 = πœ€_2 in the call to Algorithm 1. Given non-negative sample-monotone 𝑓 : 𝒳^𝑛 β†’ ℝ we have

Pr[|𝑀(𝒙) βˆ’ 𝑓(𝒙)| < max{|𝑓(𝒙) βˆ’ 𝐿_𝑓^{log π‘˜+1}(𝒙)|, (𝛽^π‘˜ βˆ’ 1)(𝑓(𝒙) + 1)}] > 1 βˆ’ 𝑂(1 / (π‘˜ 𝑒^{πœ€/6}))

for any 𝒙 ∈ 𝒳^𝑛 such that 𝑓(𝒙) ≀ 𝛽^{𝑂(1)}.

We provide the proof in Appendix C along with further intuition and empirical evidence of our asymmetric advantage. We also extend these results to general non-negative functions in Corollary C.4 by applying our approximate variant to achieve the same guarantees. We set 𝛽 = 1.005 in all our empirical studies and also note that our method is robust to reasonable choices of the 𝛽 parameter (Appendix B.2).

5 Private Variance Estimation

In this section, we instantiate our asymmetric sensitivity mechanism upon variance, a fundamental property of a dataset that is widely used in statistical analysis.

Definition 5.1. Let 𝒳 = ℝ and for 𝒙 ∈ 𝒳^𝑛 let 𝒙̄ = (1/𝑛) βˆ‘_{𝑖=1}^{𝑛} 𝒙_𝑖 and define Var[𝒙] := (1/𝑛) βˆ‘_{𝑖=1}^{𝑛} (𝒙_𝑖 βˆ’ 𝒙̄)Β².

There has been extensive work in the privacy literature upon covariance estimation for univariate and multivariate Gaussians with a focus upon optimizing asymptotic performance.ΒΉ Our focus here will be practical methods for general data, so a rigorous comparison to all these methods for Gaussians is untenable and outside the scope of this work. We first show how the variance instantiation of asymmetric sensitivity can be implemented efficiently and give intuition upon why we expect asymmetric sensitivities for this function.
Next we give a detailed empirical study that confirms this intuition, showing that our method will substantially outperform inverse sensitivity on the task of variance estimation.

5.1 Efficient variance instantiation

As seen in Section 3.2, we need to efficiently provide upper and lower output bounds in order to achieve an efficient implementation. We first consider the lower output bounds and can provide the exact bounds from Definition 3.4, which we prove in Appendix D.

Lemma 5.2. Given 𝒙 ∈ ℝ^𝑛, if 𝒙_1 ≀ ... ≀ 𝒙_𝑛 are ordered then we have lower output bounds

𝐿_Var^𝓁(𝒙) = ((𝑛 βˆ’ 𝓁)/𝑛) Β· min_{0 ≀ 𝑖 ≀ 𝓁} Var[𝒙[𝓁+1βˆ’π‘– : π‘›βˆ’π‘–]]

where we let 𝒙[𝓁+1βˆ’π‘– : π‘›βˆ’π‘–] := (𝒙_{𝓁+1βˆ’π‘–}, ..., 𝒙_{π‘›βˆ’π‘–}).

ΒΉSee Karwa & Vadhan (2018); Du et al. (2020); Biswas et al. (2020); Kamath et al. (2019); Bun et al. (2019); Aden-Ali et al. (2021); Ashtiani & Liaw (2022); Kothari et al. (2022); Tsfadia et al. (2022); Liu et al. (2022); Kamath et al. (2022).

While the formula above immediately suggests an 𝑂(𝑛²) time computation of all the lower output bounds, we will further prove in Appendix D that we can use our approximation to get a more efficient implementation. In particular, we consider a data-independent fixed distance and only compute the exact bounds for outputs closer to the underlying data, which still ensures accurate estimations.

Lemma 5.3. Given 𝒙 ∈ ℝ^𝑛 and 𝑐 β‰₯ 0, we can compute all approximate output bounds 𝐿̂_Var^𝓁(𝒙) = 𝐿_Var^𝓁(𝒙) for 𝓁 ≀ 𝑐 and 𝐿̂_Var^𝓁(𝒙) = 0 for 𝓁 > 𝑐 in 𝑂(𝑛 + 𝑐²) time.

Next we consider the upper output bounds, but if the data is unbounded then we must have π‘ˆ_Var^1(𝒙) = ∞. It is precisely for this reason that asymmetric sensitivities are inherent for variance. Our method can naturally handle this setting and we show in Appendix B.1 that unbounded upper output bounds barely affect our accuracy. However, applying the inverse sensitivity mechanism requires reasonable bounds upon each data point that should be data-independent and also sufficiently loose to not add bias from clipping data points.

Lemma 5.4. If we restrict all values to the interval [π‘Ž, 𝑏] then given 𝒙 ∈ [π‘Ž, 𝑏]^𝑛 we can give approximate upper output bounds π‘ˆΜ‚_Var^𝓁(𝒙) = Var[𝒙] + 𝓁(𝑏 βˆ’ π‘Ž)Β².

To our knowledge, there is no efficient method for computing the exact upper output bounds for general data (contained in a range), so to maintain practicality we provide approximate bounds, proven in Appendix D.

Algorithm 2 Variance instantiation of asymmetric sensitivity mechanism
Require: Input dataset 𝒙, and parameter 𝛽 > 1
1: Compute all 𝐿̂_Var^𝓁(𝒙) with 𝑐 = 100 (Lemma 5.3)
2: Compute all π‘ˆΜ‚_Var^𝓁(𝒙) if domain is restricted to [π‘Ž, 𝑏]^𝑛 (Lemma 5.4)
3: (βŠ₯_1, ..., βŠ₯_{π‘˜βˆ’1}, ⊀) ← Above Threshold(𝒙, {s_𝑓(𝒙; 𝛽^𝑖 βˆ’ 1)}, 𝑇 = 0); output 𝛽^π‘˜ βˆ’ 1

Theorem 5.5. Algorithm 2 is (πœ€_1 + 2πœ€_2)-DP and has a runtime of 𝑂(𝑛 + π‘ž) where π‘ž is the number of queries in Above Threshold.

Proof. If we assume the domain is restricted to [π‘Ž, 𝑏]^𝑛 then the privacy guarantees follow from Lemma 5.3 and Lemma 5.4 applied to Theorem A.4. If not, then we apply Lemma 5.3 and π‘ˆΜ‚_Var^1(𝒙) = ∞ to Theorem A.4 to get our privacy guarantees. For the runtime, computing all 𝐿̂_Var^𝓁(𝒙) is 𝑂(𝑛) time by Lemma 5.3 and fixing 𝑐 = 100, and computing all π‘ˆΜ‚_Var^𝓁(𝒙) is 𝑂(𝑛) time. Finally, we can run Above Threshold in 𝑂(𝑛 + π‘ž) time as seen in Section 3.2.

In our implementations we'll more reasonably assume 𝛽 β‰₯ 1.001, so 𝛽^50000 > 10^21, and we'll simply terminate Above Threshold after at most 50,000 queries for all datasets without affecting the privacy guarantees. This then gives a runtime of 𝑂(𝑛).
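To make the output-bound computation concrete, here is a minimal Python sketch of Lemmas 5.2-5.4. It assumes the 1/𝑛-normalized variance of Definition 5.1; the prefix-sum trick is one way to meet the 𝑂(𝑛 + 𝑐²) bound of Lemma 5.3, and the function names are illustrative rather than taken from our actual implementation.

```python
import numpy as np

def variance_lower_bounds(x, c=100):
    """Approximate lower output bounds of Lemma 5.3: exact (Lemma 5.2) for
    l <= c and 0 beyond, in O(n + c^2) time via prefix sums."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    s1 = np.concatenate(([0.0], np.cumsum(x)))        # prefix sums of x
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))    # prefix sums of x^2

    def window_var(a, b):                             # Var of x[a:b] in O(1)
        m = b - a
        mean = (s1[b] - s1[a]) / m
        return (s2[b] - s2[a]) / m - mean * mean

    bounds = np.zeros(n + 1)
    bounds[0] = window_var(0, n)                      # L^0 = Var[x]
    for l in range(1, min(c, n - 1) + 1):
        # drop the i largest and l - i smallest points (Lemma 5.2)
        best = min(window_var(l - i, n - i) for i in range(l + 1))
        bounds[l] = (n - l) / n * best
    return bounds                                     # entries beyond c remain 0

def variance_upper_bounds(x, a, b):
    """Approximate upper output bounds of Lemma 5.4 when all values lie in [a, b]."""
    x = np.asarray(x, dtype=float)
    return x.var() + np.arange(len(x) + 1) * (b - a) ** 2
```

These two arrays can then be fed, together with the geometric candidate stream 𝑑_𝑖 = Ξ²^𝑖 βˆ’ 1, to the Above Threshold sketch given after Section 4.1, mirroring Algorithm 2.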
5.2 Empirical study of variance estimation

For our instantiations of machine learning model evaluation we will be using the following datasets for regression tasks: the Diamonds dataset containing diamond prices and related features Wickham (2016); the Abalone dataset containing age of abalone and related features Nash et al. (1995); and the Bike dataset containing number of bike rentals and related features Fanaee-T (2013). We will also use the labels from these datasets to test our variance invocation. We also use the Adult dataset, Becker & Kohavi (1996), for model evaluation of classification tasks, so we will borrow two of the features, age and hours worked per week, to test our variance invocation.

While our method does not require any bounds on the data to still maintain high accuracy (see Appendix B.1), it is necessary for the other mechanisms. All of our data is inherently non-negative and some data has innate upper bounds. If not, we use reasonable upper bounds, which should be data independent, to avoid adding bias from clipping the data. We use the following bounds: [0, 50000] for diamond prices; [0, 50] for abalone ages; [0, 5000] for city bike rentals; [0, 168] for hours; and [0, 125] for age. Our algorithm (Algorithm 2 in Appendix D) will have a parameter 𝛽 which we fix at 𝛽 = 1.005 and will maintain this consistency across all experiments. We further show in Appendix B.2 that our method is robustly accurate across reasonable 𝛽 choices.

For each privacy parameter, we repeat 100 times: sample 1,000 datapoints from the dataset and call each mechanism on the sampled data for estimates. We plot the average absolute error for each method along with confidence intervals of 0.9 in Figure 1. As expected, we see that our approach of variance estimation sees substantially less error across privacy parameters and datasets.

Figure 1: Plots comparing each method for estimating variance. For each privacy parameter we sample 1,000 datapoints from the dataset and call each mechanism 100 times, plotting the average absolute error with 0.9 confidence intervals.

6 Private Machine Learning Model Evaluation

In this section, we invoke our asymmetric sensitivity mechanism upon commonly used metrics for machine learning model evaluation for both classification and regression tasks. In particular, we consider cross-entropy loss for classification tasks, and mean squared error (MSE) and mean absolute error (MAE) for regression tasks. Note that combining our improved estimation for variance in Section 5 with the improved estimation for MSE also implies an improved estimation of the coefficient of determination, 𝑅², also commonly used for evaluating regression performance. We provide full definitions along with technical analysis in Appendix E.

6.1 Empirical study of model evaluation for classification

For our empirical study of model evaluation for classification tasks we will consider two tabular datasets with binary labels, the Adult dataset Becker & Kohavi (1996) and the Diabetes dataset Efron et al. (2004), along with two computer vision tasks with 10 classes, the mnist dataset LeCun et al. (2010) and the cifar10 dataset Krizhevsky et al. (2009). Our focus here is upon the accuracy of our evaluation, not optimizing the quality of the model itself. As such, we will be using reasonable choices for models for simplicity but certainly not state-of-the-art models. For the tabular data, we partition into train and test with an 80/20 split and train with an xgboost classifier with the default parameters.
For the mnist data, which is already partitioned, we use a simple MLP with one inner dense layer of 128 neurons and relu activation, and the final layer of 10 neurons has a softmax activation. We train this model for 5 epochs. For the cifar10 data, which is already partitioned, we use a relatively small CNN with several pooling and convolutional layers, and several dense layers at the end with relu activation and a final layer with softmax activation. We train this model for 10 epochs.

Figure 2: Plots comparing each method for estimating cross entropy loss. For each privacy parameter we both sample 1,000 datapoints from the test set and call each mechanism 100 times, plotting the average absolute error with 0.9 confidence intervals.

Again our method does not require any bounds on the data to maintain high accuracy, but it is necessary for the inverse sensitivity mechanism. Given the softmax activation for our models, the outputs are unbounded, but we will provide reasonable bounds. We will use the bounds [βˆ’10, 10] of the model output for binary classification tabular data, and bounds [βˆ’25, 25]^10 of the model output for the multi-classification vision data. Once again, we fix our parameter 𝛽 = 1.005.

6.2 Empirical study of model evaluation for regression

As discussed in Section 5, our machine learning model evaluations for regression will use the following datasets: the Diamonds dataset containing diamond prices and related features Wickham (2016); the Abalone dataset containing age of abalone and related features Nash et al. (1995); and the Bike dataset containing number of bike rentals and related features Fanaee-T (2013). We also use the same parameters from Section 5 for these datasets. Once again, our goal here is to accurately assess the quality of the model and not optimize performance. As such we simply train with an xgboost regressor under default parameters after partitioning each dataset into train and test with an 80/20 split.

We first consider mean squared error estimation and repeat the following 100 times for each privacy parameter: draw 1,000 datapoints from the test set and call each mechanism for estimates. We then plot the average absolute error along with confidence intervals of 0.9. We repeat this process for mean absolute error.

References

Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308-318, 2016.

Aden-Ali, I., Ashtiani, H., and Kamath, G. On the sample complexity of privately learning unbounded high-dimensional gaussians. In Algorithmic Learning Theory, pp. 185-216. PMLR, 2021.

Ashtiani, H. and Liaw, C. Private and polynomial time algorithms for learning gaussians and beyond. In Conference on Learning Theory, pp. 1075-1076. PMLR, 2022.

Asi, H. and Duchi, J. C. Instance-optimality in differential privacy via approximate inverse sensitivity mechanisms. Advances in Neural Information Processing Systems, 33:14106-14117, 2020a.

Asi, H. and Duchi, J. C. Near instance-optimality in differential privacy. arXiv preprint arXiv:2005.10630, 2020b.

Barber, R. F. and Duchi, J. C. Privacy and statistical risk: Formalisms and minimax bounds. arXiv preprint arXiv:1412.4451, 2014.

Becker, B. and Kohavi, R. Adult. UCI Machine Learning Repository, 1996. DOI: https://doi.org/10.24432/C5XW20.

Biswas, S., Dong, Y., Kamath, G., and Ullman, J. Coinpress: Practical private mean and covariance estimation. Advances in Neural Information Processing Systems, 33:14475-14485, 2020.
Blocki, J., Blum, A., Datta, A., and Sheffet, O. Differentially private data analysis of social networks via restricted sensitivity. In Proceedings of the 4th Conference on Innovations in Theoretical Computer Science, pp. 87-96, 2013.

Bun, M. and Steinke, T. Average-case averages: Private algorithms for smooth sensitivity and mean estimation. Advances in Neural Information Processing Systems, 32, 2019.

Bun, M., Kamath, G., Steinke, T., and Wu, S. Z. Private hypothesis selection. Advances in Neural Information Processing Systems, 32, 2019.

Chen, S. and Zhou, S. Recursive mechanism: towards node differential privacy and unrestricted joins. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 653-664, 2013.

Cummings, R. and Durfee, D. Individual sensitivity preprocessing for data privacy. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 528-547. SIAM, 2020.

Du, W., Foot, C., Moniot, M., Bray, A., and Groce, A. Differentially private confidence intervals. arXiv preprint arXiv:2001.02285, 2020.

Durfee, D. Unbounded differentially private quantile and maximum estimation. arXiv preprint arXiv:2305.01177, 2023.

Dwork, C. and Lei, J. Differential privacy and robust statistics. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, pp. 371-380, 2009.

Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., and Naor, M. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology - EUROCRYPT 2006: 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, St. Petersburg, Russia, May 28-June 1, 2006. Proceedings 25, pp. 486-503. Springer, 2006a.

Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3, pp. 265-284. Springer, 2006b.

Dwork, C., Naor, M., Reingold, O., Rothblum, G. N., and Vadhan, S. On the complexity of differentially private data release: efficient algorithms and hardness results. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, pp. 381-390, 2009.

Dwork, C., Roth, A., et al. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211-407, 2014.

Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. Diabetes dataset. https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf, 2004.

Fanaee-T, H. Bike Sharing Dataset. UCI Machine Learning Repository, 2013. DOI: https://doi.org/10.24432/C5W894.

Kamath, G., Li, J., Singhal, V., and Ullman, J. Privately learning high-dimensional distributions. In Conference on Learning Theory, pp. 1853-1902. PMLR, 2019.

Kamath, G., Singhal, V., and Ullman, J. Private mean estimation of heavy-tailed distributions. In Conference on Learning Theory, pp. 2204-2235. PMLR, 2020.

Kamath, G., Mouzakis, A., Singhal, V., Steinke, T., and Ullman, J. A private and computationally-efficient estimator for unbounded gaussians. In Conference on Learning Theory, pp. 544-572. PMLR, 2022.

Kamath, G., Mouzakis, A., Regehr, M., Singhal, V., Steinke, T., and Ullman, J. A bias-variance-privacy trilemma for statistical estimation. arXiv preprint arXiv:2301.13334, 2023.

Karwa, V. and Vadhan, S. Finite sample differentially private confidence intervals. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
Kasiviswanathan, S. P., Nissim, K., Raskhodnikova, S., and Smith, A. Analyzing graphs with node differential privacy. In Theory of Cryptography: 10th Theory of Cryptography Conference, TCC 2013, Tokyo, Japan, March 3-6, 2013. Proceedings, pp. 457-476. Springer, 2013.

Kothari, P., Manurangsi, P., and Velingker, A. Private robust estimation by stabilizing convex relaxations. In Conference on Learning Theory, pp. 723-777. PMLR, 2022.

Krizhevsky, A., Nair, V., and Hinton, G. CIFAR-10 (Canadian Institute for Advanced Research, CIFAR-10 dataset). https://www.cs.toronto.edu/~kriz/cifar.html, 2009.

LeCun, Y., Cortes, C., and Burges, C. MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2010.

Liu, X., Kong, W., and Oh, S. Differential privacy and robust statistics in high dimensions. In Conference on Learning Theory, pp. 1167-1246. PMLR, 2022.

Lyu, M., Su, D., and Li, N. Understanding the sparse vector technique for differential privacy. Proceedings of the VLDB Endowment, 10(6), 2017.

McMillan, A., Smith, A., and Ullman, J. Instance-optimal differentially private estimation. arXiv preprint arXiv:2210.15819, 2022.

McSherry, F. and Talwar, K. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 07), pp. 94-103. IEEE, 2007.

Nash, W., Sellers, T., Talbot, S., Cawthorn, A., and Ford, W. Abalone. UCI Machine Learning Repository, 1995. DOI: https://doi.org/10.24432/C55C7W.

Nissim, K., Raskhodnikova, S., and Smith, A. Smooth sensitivity and sampling in private data analysis. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, pp. 75-84, 2007.

Smith, A. Privacy-preserving statistical estimation with optimal convergence rates. In Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, pp. 813-822, 2011.

Thakurta, A. G. and Smith, A. Differentially private feature selection via stability arguments, and the robustness of the lasso. In Conference on Learning Theory, pp. 819-850. PMLR, 2013.

Tsfadia, E., Cohen, E., Kaplan, H., Mansour, Y., and Stemmer, U. FriendlyCore: Practical differentially private aggregation. In International Conference on Machine Learning, pp. 21828-21863. PMLR, 2022.

Wickham, H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. ISBN 978-3-319-24277-4. URL https://ggplot2.tidyverse.org.

A Additional Analysis of Asymmetric Sensitivity Mechanism

In this section, we give full details upon extending our asymmetric sensitivity mechanism to general functions 𝑓 : 𝒳^𝑛 β†’ ℝ and provide more efficient implementations. We also provide all missing proofs from Section 3.

A.1 Approximate asymmetric sensitivity mechanism

As noted in Section 3, computing the upper and lower output bounds, which is necessary for our method, the inverse sensitivity method, and the smooth sensitivity method, can be inefficient or even infeasible. As such we provide an approximation of these bounds that is a slightly more granular version of the approximation method provided in Asi & Duchi (2020a) and also applies to the inverse sensitivity mechanism.

Definition A.1.
For a function 𝑓 : 𝒳^𝑛 β†’ ℝ, we define approximate upper and lower sensitivity bounding functions of Hamming distance 𝓁 to be π‘ˆΜ‚_𝑓^𝓁 : 𝒳^𝑛 β†’ ℝ and 𝐿̂_𝑓^𝓁 : 𝒳^𝑛 β†’ ℝ, if for all 𝓁 β‰₯ 0 and any neighboring datasets 𝒙, 𝒙′ ∈ 𝒳^𝑛 we have π‘ˆΜ‚_𝑓^𝓁(𝒙) β‰₯ π‘ˆ_𝑓^𝓁(𝒙) and π‘ˆΜ‚_𝑓^𝓁(𝒙) ≀ π‘ˆΜ‚_𝑓^{𝓁+1}(𝒙′), along with 𝐿̂_𝑓^𝓁(𝒙) ≀ 𝐿_𝑓^𝓁(𝒙) and 𝐿̂_𝑓^𝓁(𝒙) β‰₯ 𝐿̂_𝑓^{𝓁+1}(𝒙′).

In particular, this definition separates the approximation for the upper vs lower bounds as opposed to treating them symmetrically. This maintains asymmetry, which is precisely where our technique excels most. We then define a variant of closeness for each output to the underlying data which utilizes these approximate upper and lower bounds.

Definition A.2. For a function 𝑓 : 𝒳^𝑛 β†’ ℝ along with π‘ˆΜ‚_𝑓^𝓁 : 𝒳^𝑛 β†’ ℝ and 𝐿̂_𝑓^𝓁 : 𝒳^𝑛 β†’ ℝ, for any 𝑑 ∈ ℝ, we let lenΜ‚_𝑓(𝒙; 𝑑) := inf{𝓁 : 𝐿̂_𝑓^𝓁(𝒙) ≀ 𝑑 ≀ π‘ˆΜ‚_𝑓^𝓁(𝒙)}.

Outputs can only be closer to the underlying data under this approximate definition, which could hurt accuracy, but will still give the desired privacy for the inverse sensitivity mechanism. We then extend this definition equivalently for our reflective inverse sensitivity.

Definition A.3. For a function 𝑓 : 𝒳^𝑛 β†’ ℝ along with π‘ˆΜ‚_𝑓^𝓁 : 𝒳^𝑛 β†’ ℝ and 𝐿̂_𝑓^𝓁 : 𝒳^𝑛 β†’ ℝ, for any 𝑑 ∈ ℝ, we let ŝ_𝑓(𝒙; 𝑑) := sgn(𝑑 βˆ’ 𝑓(𝒙)) Β· (lenΜ‚_𝑓(𝒙; 𝑑) βˆ’ 1/2).

We will then be able to show in general that the approximate reflective inverse sensitivity of an output maintains closeness between neighboring datasets. This gives the approximate asymmetric sensitivity mechanism for a stream of potential outputs {𝑑_𝑖} that calls Above Threshold and returns 𝑑_π‘˜ when

Above Threshold(𝒙, {ŝ_𝑓(𝒙; 𝑑_𝑖)}, 𝑇 = 0) = (βŠ₯_1, ..., βŠ₯_{π‘˜βˆ’1}, ⊀)   (M.3)

This will then achieve the same privacy guarantees, which is shown in Appendix A.3.

Theorem A.4. Given 𝑓 : 𝒳^𝑛 β†’ ℝ and any stream of potential outputs {𝑑_𝑖}, we have that mechanism M.3 is (πœ€_1 + 2πœ€_2)-DP in general and (πœ€_1 + πœ€_2)-DP if ŝ_𝑓 is monotonic.

A.2 Analysis for exact asymmetric sensitivity mechanism

We first prove a helper lemma regarding the closeness of the reflective inverse sensitivities for neighboring datasets.

Lemma A.5. Given sample-monotone 𝑓 : 𝒳^𝑛 β†’ ℝ, we must have that im(𝑓) is a convex set, and for any neighboring datasets 𝒙, 𝒙′ ∈ 𝒳^𝑛 and 𝑑 ∈ im(𝑓) we have |s_𝑓(𝒙; 𝑑) βˆ’ s_𝑓(𝒙′; 𝑑)| ≀ 1.

Proof. We first show the image is convex. For any π‘Ž, 𝑏 ∈ im(𝑓), there must be datasets 𝒙_π‘Ž, 𝒙_𝑏 ∈ 𝒳^𝑛 such that 𝑓(𝒙_π‘Ž) = π‘Ž and 𝑓(𝒙_𝑏) = 𝑏. Furthermore, by Definition 2.1 we know 𝑑_ham(𝒙_π‘Ž, 𝒙_𝑏) ≀ 𝑛, so len_𝑓(𝒙_π‘Ž; 𝑏) ≀ 𝑛. Without loss of generality assume π‘Ž ≀ 𝑏; then by Definition 3.2 for any 𝑐 ∈ [π‘Ž, 𝑏] we must have len_𝑓(𝒙_π‘Ž; 𝑐) ≀ 𝑛, which implies 𝑐 ∈ im(𝑓). Next we show the closeness between neighboring datasets. First consider the case when 𝑑 > max{𝑓(𝒙), 𝑓(𝒙′)} or 𝑑 < min{𝑓(𝒙), 𝑓(𝒙′)}. This implies sgn(𝑑 βˆ’ 𝑓(𝒙)) = sgn(𝑑 βˆ’ 𝑓(𝒙′)) and the bound follows from Corollary 2.9. Without loss of generality, assume 𝑓(𝒙) ≀ 𝑓(𝒙′) and consider the other case when 𝑓(𝒙) ≀ 𝑑 ≀ 𝑓(𝒙′). Due to the fact that they are neighboring, len_𝑓(𝒙; 𝑓(𝒙′)) ≀ 1 and len_𝑓(𝒙′; 𝑓(𝒙)) ≀ 1. Applying Definition 3.2, len_𝑓(𝒙; 𝑑) ≀ 1 and len_𝑓(𝒙′; 𝑑) ≀ 1, which implies |s_𝑓(𝒙; 𝑑)| ≀ 1/2 and |s_𝑓(𝒙′; 𝑑)| ≀ 1/2, giving the desired bound.

With this lemma we can prove Theorem 3.3.

Proof of Theorem 3.3. For 𝑑_𝑖 ∈ im(𝑓) we know the sensitivity is at most 1 from Lemma A.5. Further, if 𝑑_𝑖 βˆ‰ im(𝑓) then by the convexity of im(𝑓) from Lemma A.5 we know that either 𝑑_𝑖 < 𝑓(𝒙), and so s_𝑓(𝒙; 𝑑_𝑖) = βˆ’βˆž for all 𝒙 ∈ 𝒳^𝑛, or 𝑑_𝑖 > 𝑓(𝒙), and so s_𝑓(𝒙; 𝑑_𝑖) = ∞ for all 𝒙 ∈ 𝒳^𝑛. The privacy guarantees then follow from Proposition 2.5.

We also provide a proof of Lemma 3.5 connecting the inverse sensitivities to the upper and lower output bounds for sample-monotone functions.
Proof of Lemma 3.5. First consider the case when 𝑑 < 𝐿_𝑓^𝑛(𝒙) or 𝑑 > π‘ˆ_𝑓^𝑛(𝒙), which implies inf{𝓁 : 𝐿_𝑓^𝓁(𝒙) ≀ 𝑑 ≀ π‘ˆ_𝑓^𝓁(𝒙)} = ∞ because the infimum of the empty set is infinity. This also implies that 𝑑 βˆ‰ im(𝑓), so len_𝑓(𝒙; 𝑑) = ∞ for the same reason. Next, consider the other case when 𝐿_𝑓^𝑛(𝒙) ≀ 𝑑 ≀ π‘ˆ_𝑓^𝑛(𝒙) and let π‘˜ = inf{𝓁 : 𝐿_𝑓^𝓁(𝒙) ≀ 𝑑 ≀ π‘ˆ_𝑓^𝓁(𝒙)}. If there exists 𝒙′ such that 𝑓(𝒙′) = 𝑑 and 𝑑_ham(𝒙, 𝒙′) = π‘˜β€² < π‘˜, then this would imply 𝐿_𝑓^{π‘˜β€²}(𝒙) ≀ 𝑑 ≀ π‘ˆ_𝑓^{π‘˜β€²}(𝒙), so inf{𝓁 : 𝐿_𝑓^𝓁(𝒙) ≀ 𝑑 ≀ π‘ˆ_𝑓^𝓁(𝒙)} < π‘˜, giving a contradiction. Thus we must have len_𝑓(𝒙; 𝑑) β‰₯ π‘˜. Furthermore, the sample-monotone definition implies that len_𝑓(𝒙; 𝑑) ≀ π‘˜ because 𝐿_𝑓^π‘˜(𝒙) ≀ 𝑑 ≀ π‘ˆ_𝑓^π‘˜(𝒙). Therefore, we also have len_𝑓(𝒙; 𝑑) = π‘˜.

A.3 Analysis for approximate asymmetric sensitivity mechanism

We again prove a helper lemma regarding the closeness of the approximate inverse sensitivities for neighboring datasets.

Lemma A.6. Given 𝑓 : 𝒳^𝑛 β†’ ℝ, for any neighboring datasets 𝒙, 𝒙′ ∈ 𝒳^𝑛 and inf_𝒙{𝑓(𝒙)} ≀ 𝑑 ≀ sup_𝒙{𝑓(𝒙)} we have |lenΜ‚_𝑓(𝒙; 𝑑) βˆ’ lenΜ‚_𝑓(𝒙′; 𝑑)| ≀ 1.

Proof. By construction, 𝐿_𝑓^𝑛(𝒙) = inf_𝒙{𝑓(𝒙)} and π‘ˆ_𝑓^𝑛(𝒙) = sup_𝒙{𝑓(𝒙)} because the Hamming distance between any two datasets is always at most 𝑛, and the approximate bounds are at least as loose. Thus we must have lenΜ‚_𝑓(𝒙; 𝑑) ≀ 𝑛 and lenΜ‚_𝑓(𝒙′; 𝑑) ≀ 𝑛. Without loss of generality, assume lenΜ‚_𝑓(𝒙; 𝑑) ≀ lenΜ‚_𝑓(𝒙′; 𝑑) and lenΜ‚_𝑓(𝒙; 𝑑) = 𝓁. By Definition A.1 we then have 𝐿̂_𝑓^{𝓁+1}(𝒙′) ≀ 𝑑 ≀ π‘ˆΜ‚_𝑓^{𝓁+1}(𝒙′), so lenΜ‚_𝑓(𝒙′; 𝑑) ≀ 𝓁 + 1, which implies our desired inequality.

We then extend this closeness to the approximate reflective inverse sensitivities for neighboring datasets.

Lemma A.7. Given 𝑓 : 𝒳^𝑛 β†’ ℝ, for any neighboring datasets 𝒙, 𝒙′ ∈ 𝒳^𝑛 and inf_𝒙{𝑓(𝒙)} ≀ 𝑑 ≀ sup_𝒙{𝑓(𝒙)} we have |ŝ_𝑓(𝒙; 𝑑) βˆ’ ŝ_𝑓(𝒙′; 𝑑)| ≀ 1.

Proof. First consider the case when 𝑑 > max{𝑓(𝒙), 𝑓(𝒙′)} or 𝑑 < min{𝑓(𝒙), 𝑓(𝒙′)}. This implies sgn(𝑑 βˆ’ 𝑓(𝒙)) = sgn(𝑑 βˆ’ 𝑓(𝒙′)) and the bound follows from Lemma A.6. Without loss of generality, assume 𝑓(𝒙) ≀ 𝑓(𝒙′) and consider the other case when 𝑓(𝒙) ≀ 𝑑 ≀ 𝑓(𝒙′). Due to the fact that they are neighboring and Definition A.2, we have lenΜ‚_𝑓(𝒙; 𝑑) ≀ 1 and lenΜ‚_𝑓(𝒙′; 𝑑) ≀ 1. This implies |ŝ_𝑓(𝒙; 𝑑)| ≀ 1/2 and |ŝ_𝑓(𝒙′; 𝑑)| ≀ 1/2, giving the desired bound.

With these lemmas we can now prove our Theorem A.4.

Proof of Theorem A.4. For 𝑑 such that inf_𝒙{𝑓(𝒙)} ≀ 𝑑 ≀ sup_𝒙{𝑓(𝒙)} we know the sensitivity is at most 1 from Lemma A.7. Otherwise either 𝑑_𝑖 < 𝑓(𝒙), and so ŝ_𝑓(𝒙; 𝑑_𝑖) = βˆ’βˆž for all 𝒙 ∈ 𝒳^𝑛, or 𝑑_𝑖 > 𝑓(𝒙), and so ŝ_𝑓(𝒙; 𝑑_𝑖) = ∞ for all 𝒙 ∈ 𝒳^𝑛. The privacy guarantees then follow from Proposition 2.5.

B Supplemental Results

In this section we provide supplemental results and experiments to our main results. We first show that the asymmetric sensitivity mechanism naturally handles unbounded data for our instantiations with negligible accuracy loss. Next we provide a simple, general, and robust strategy for selecting potential outputs for our method and provide a corresponding empirical study. Finally, we discuss how our methods can also apply to the add-subtract definition of neighboring datasets.

B.1 Naturally handling unbounded data

As previously discussed in our instantiations from Sections 5 and 6, the functions considered will have infinite upper output bounds if the data is unbounded. Given the iterative nature of the sparse vector technique, we will be able to naturally handle this setting with negligible accuracy loss. In particular, Algorithm 1 outputs the first query above the threshold, and we see from our Definition A.3 that even if π‘ˆΜ‚_𝑓^𝓁(𝒙) = ∞ then we will have that the reflective inverse sensitivity is 1/2 for all possible outputs greater than 𝑓(𝒙).
Thus each query of potential outputs greater than 𝑓(𝒙) is more likely than not to terminate the algorithm. The probability of termination increases even more if the reflective inverse sensitivity is greater than 1/2, but this will have minimal effect. We test this upon our variance instantiation by using the bounds from Section 5 and also considering the unbounded case. We also use the same parameters and datasets from Section 5. From Figure 3, we see that the difference in performance for the unbounded setting is slim and thus our method can inherently consider unbounded data.

Figure 3: Plots of the absolute error for variance estimation with both reasonable bounds on the data and unbounded data.

While this works for our instantiations, this is aided by each function being non-negative by construction, and applying our method generally to unbounded data will often require an innate lower or upper bound on the function output. In contrast, efficient implementations of the inverse sensitivity mechanism require both upper and lower bounds on the function output, and often further require bounded data for reasonable accuracy, such as in our instantiations. More specifically, the inverse sensitivity mechanism uses the upper and lower output bounds from Definition 3.4 to construct intervals, draws an interval from the exponential mechanism, and uniformly selects a point from the chosen interval. But this requires setting a data-independent upper and lower bound on the function for efficient implementation. This efficient approach was first seen in an instantiation of a close variant of the inverse sensitivity mechanism for privately computing quantiles Smith (2011).

B.2 Robust and efficient potential output selection

In order to handle both fully unbounded and partially unbounded functions, we provide an efficient and robust potential output selection method that also borrows from the private quantile literature Durfee (2023), which instantiates a close variant of our asymmetric sensitivity mechanism. In Algorithm 2, we provided the explicit approach for selecting potential outputs when the outputs are non-negative. This same approach can be shifted to handle any other lower bounded functions and symmetrically applied to upper bounded functions. The exponential nature of the potential outputs implies that they will become incredibly large or small within a reasonable number of queries, which limits the running time. Specifically, if we reasonably assume 𝛽 β‰₯ 1.001, then 𝛽^50000 > 10^21 and we can simply terminate Above Threshold after at most 50,000 queries for all datasets without affecting the privacy guarantees. If the function has no innate bounds then we can call the sparse vector technique with two iterations, searching through the positive numbers first, and then searching through the negatives if the first iteration immediately terminated. If the function has both innate upper and lower bounds then we can uniformly select the potential outputs from these bounds.

Figure 4: Plot of variance estimation using our method for different 𝛽 parameters.

While this methodology introduces a new parameter 𝛽 that must be selected independent of the underlying data, we were able to fix 𝛽 = 1.005 as a default and still see high performance across different invocations and datasets. Furthermore, we show here that we could consider other reasonable settings of 𝛽 and still see robustly accurate performance from our methodology.
In fact, from our experiments we can see that our empirical results could have been further improved by setting 𝛽 = 1.01.

B.3 Add-subtract neighboring

In Definition 2.1, we made the notion of neighboring datasets follow the swap definition. Another commonly used notion of neighboring datasets in the differential privacy literature is the add-subtract definition, where a user's data is added or removed between neighbors. Our asymmetric sensitivity mechanism can naturally extend to this notion of Hamming distance as well. The primary difference would be that all datasets in the data universe would no longer be at most distance 𝑛 from one another, as the size of the dataset could vary. As such, the list of upper and lower output bounds could be infinite, but we can circumvent this with minimal practical impact. Potential outputs far from the underlying data are already incredibly unlikely to be selected, so relaxing the bounds will barely affect the output distribution. As such, we can set the approximate upper and lower bounds to be positive and negative infinity, respectively, which is essentially identical to what was done in Lemma 5.3. To account for this changed definition we would also need to update the upper and lower output bounds for our instantiations.

C Intuition and Asymmetric Advantage

In this section, we first give a more visual explanation of our methodology compared to the inverse sensitivity mechanism. We then provide strong intuition upon why our method will substantially improve the estimation accuracy when the sensitivities are asymmetric by more naturally balancing the bias-variance tradeoff. We further supplement this intuition with a formal metric that quantifies the asymmetry of the sensitivities, and we empirically validate that increased asymmetry directly corresponds to improved relative performance of our method. Finally, we give the missing analysis of Lemma 4.1, our theoretical utility guarantees.

C.1 Visualization of both methods

Recall that the inverse sensitivity method considered the distance metric of any output from the underlying data in Definition 2.8. We then proposed a variant of that definition better suited to applying the sparse vector technique in Definition 3.1.

Figure 5: We provide here an informal visualization of the inverse sensitivity and reflective inverse sensitivity. For most functions of interest we can just plot these as step functions using the upper and lower output bounds from Definition 3.4. In the plot, we denote 𝐿_𝑓^πœ…(𝒙) and π‘ˆ_𝑓^πœ…(𝒙) with 𝐿^πœ… and π‘ˆ^πœ…, respectively.

We will go into further detail in the next section but note that the sensitivities are perfectly symmetric in this example. Both of these functions essentially shift between neighboring datasets, which allows for maintaining closeness between outputs for each metric. We further note that unlike the inverse sensitivity, a shift in the reflective inverse sensitivity will be monotonic because of its increasing nature, allowing for improved privacy guarantees for many instantiations. Given the closeness between outputs for neighboring datasets, we can apply the exponential mechanism or sparse vector technique. Applying our method entails considering an increasing stream of potential outputs {𝑑_𝑖} and (noisily) identifying when the reflective inverse sensitivity crosses from negative to positive by calling the sparse vector technique.
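As a small illustrative sketch of the potential-output selection of Appendix B.2 (the function name and defaults here are our own), the increasing stream {𝑑_𝑖} fed to the sparse vector technique can be generated as follows for a function bounded below by lower_bound:

```python
def candidate_outputs(lower_bound=0.0, beta=1.005, max_queries=50_000):
    """Geometric stream of potential outputs t_i = lower_bound + (beta^i - 1),
    following Appendix B.2; for beta >= 1.001 we have beta**50_000 > 1e21, so
    the query cap does not restrict the search range in practice."""
    for i in range(max_queries):
        yield lower_bound + beta ** i - 1.0
```

For a function with no innate bounds, Appendix B.2 instead runs two such passes (positive candidates first, then negative ones), and when both upper and lower bounds are known a uniform grid over the interval can be used instead.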
Figure 6: We provide here an informal visualization of the approximate PDFs for the inverse sensitivity mechanism (ISM) and our asymmetric sensitivity mechanism (ASM) when the sensitivities are perfectly symmetric. We slightly alter our mechanism M.3 to uniformly draw an output in [𝑑_{π‘˜βˆ’1}, 𝑑_π‘˜] for easier visualization, which implies our PDF will be a step function between the potential outputs.

The iterative nature of the sparse vector technique will most often lead to our method being slightly more likely to find an output less than the true output. This introduces bias into our mechanism but we still remain competitive with inverse sensitivity even in the perfectly symmetric setting.

C.2 Naturally navigating the bias-variance tradeoff

In Figure 6 we assumed that the sensitivities were perfectly symmetric. More specifically, for 𝓁 > 0 we let Ξ”_𝐿^𝓁(𝑓; 𝒙) := 𝐿_𝑓^{π“βˆ’1}(𝒙) βˆ’ 𝐿_𝑓^𝓁(𝒙) and Ξ”_π‘ˆ^𝓁(𝑓; 𝒙) := π‘ˆ_𝑓^𝓁(𝒙) βˆ’ π‘ˆ_𝑓^{π“βˆ’1}(𝒙), which are the amounts the function can marginally decrease and increase, respectively, by changing the 𝓁th individual's data. Note that for 𝓁 = 1 these quantities correspond to the local sensitivity. We then say that the sensitivities are perfectly symmetric if Ξ”_𝐿^𝓁(𝑓; 𝒙) = Ξ”_π‘ˆ^𝓁(𝑓; 𝒙) for all 𝓁 > 0, which held by construction for our examples in the previous section. Informally, we will say that the sensitivities are asymmetric if Ξ”_𝐿^𝓁(𝑓; 𝒙) β‰ͺ Ξ”_π‘ˆ^𝓁(𝑓; 𝒙) (or the reverse) for most 𝓁, particularly those closer to 0. We now consider an example in which the upper and lower output bounds imply asymmetric sensitivities and compare the approximate PDFs for each method.

Figure 7: We provide here an informal visualization of the approximate PDFs but with asymmetric sensitivities. We remove the labels for 𝐿_𝑓^1(𝒙), 𝐿_𝑓^2(𝒙), and 𝐿_𝑓^3(𝒙) as they become too condensed around 𝑓(𝒙) but we keep their tick marks on the x-axis. We again slightly alter our mechanism M.3 to uniformly draw an output in [𝑑_{π‘˜βˆ’1}, 𝑑_π‘˜] for easier visualization, which implies our PDF will be a step function between the potential outputs.

For a more encompassing discussion on the bias-variance trade-off, we first consider applying the smooth sensitivity framework to this example. Smooth sensitivity is unbiased in the classical sense (the expected output matches the underlying data) but the noise is added proportional to at least the local sensitivity, which would be Ξ”_π‘ˆ^1(𝑓; 𝒙) here. The inverse sensitivity mechanism weakens the unbiased definition to better take advantage of the asymmetry from Ξ”_𝐿^𝓁(𝑓; 𝒙) β‰ͺ Ξ”_π‘ˆ^𝓁(𝑓; 𝒙), and we see in Figure 7 that the probability mass of outputs less than 𝑓(𝒙) becomes much more closely concentrated around 𝑓(𝒙). However, using their notion of unbiased still limits the extent that it can improve the private estimation. In contrast, a variant of the preprocessing method from Cummings & Durfee (2020) could be applied here and potentially be both unbiased and have low variance if the a priori sensitivity parameter closely matches Ξ”_𝐿^𝓁(𝑓; 𝒙) for 𝓁 close to zero. However, applying this same reduced sensitivity parameter, which essentially fixes the variance and must be data independent, would lead to significant bias in the setting of Figure 6. By adding the slight bias towards early stopping from the iterative nature of the sparse vector technique, we can take full advantage of the tighter grouping of the lower output bounds to significantly reduce the variance.
More specifically, if we map Figure 5 to these lower output bounds, then the reflective inverse sensitivity will be much steeper right below $f(\boldsymbol{x})$, which has the biggest impact upon when the sparse vector technique terminates. In fact, we could set the upper output bounds to be infinite and this would only slightly decay the accuracy of our method. This allows us to naturally consider unbounded data for a variety of functions, and we specifically examine the effects upon variance estimation in Appendix B.1, with empirical results showing negligible impact upon accuracy. While our method does still have dependence upon the data-independent stream of potential outputs $\{t_i\}$, we provide a simple and general strategy for this selection in Appendix B that robustly maintains accuracy across different invocations and datasets. As a result, the slight bias from our method can utilize the asymmetry for significant improvement while also remaining competitive under perfect symmetry, giving a more inherent optimization of the bias-variance tradeoff.

C.3 Formalizing asymmetry of sensitivities

We previously defined asymmetric sensitivity to informally be a consistent mismatch between $\Delta^{\ell}_L(f; \boldsymbol{x})$ and $\Delta^{\ell}_U(f; \boldsymbol{x})$ for most $\ell$. Additionally, this mismatch has the highest impact when $\ell$ is closest to 0, as the outputs farther away from the underlying data are far less likely to be selected. However, the level of privacy further affects this likelihood, where smaller $\varepsilon$ implies mismatched $\Delta^{\ell}_L(f; \boldsymbol{x})$ and $\Delta^{\ell}_U(f; \boldsymbol{x})$ have a higher impact upon symmetry for larger $\ell$. Therefore, to obtain a formal measurement of asymmetry, we should consider an averaging of $\Delta^{\ell}_L(f; \boldsymbol{x})$ vs $\Delta^{\ell}_U(f; \boldsymbol{x})$ over all $\ell$, but with higher weight given to smaller $\ell$, and this weighting should be further scaled by the privacy parameter. We observe that the inverse sensitivity mechanism uniformly draws from $[L^{\ell}_f, L^{\ell-1}_f]$ and $[U^{\ell-1}_f, U^{\ell}_f]$ with probability proportional to $\Delta^{\ell}_L(f; \boldsymbol{x}) \exp(-\ell \varepsilon / 2)$ and $\Delta^{\ell}_U(f; \boldsymbol{x}) \exp(-\ell \varepsilon / 2)$, respectively, by construction of the exponential mechanism. This then precisely fits with our desired weighting, and the probability that the inverse sensitivity mechanism selects an output greater than $f(\boldsymbol{x})$ is simply a normalization of $\sum_{\ell} \Delta^{\ell}_U(f; \boldsymbol{x}) \exp(-\ell \varepsilon / 2)$. With this connection of our informal notion to the inverse sensitivity mechanism, we then give a formal definition for measuring the asymmetry of the sensitivities (a small computational sketch is given after Figure 8).

Definition C.1. Given a function $f : \mathcal{X}^n \to \mathbb{R}$, we measure the asymmetry of the sensitivities by
\[ \left| \Pr\left[ M_{\mathrm{inv}}(\boldsymbol{x}) > f(\boldsymbol{x}) \right] - \tfrac{1}{2} \right| \]
based upon the probability distribution from M.1.

Accordingly, we have that the sensitivities are symmetric if $f(\boldsymbol{x})$ is the median of the inverse sensitivity mechanism. But this property does not imply accurate estimation, as the variance could still be quite large. However, it is not surprising that it will perform relatively worse than our method for more asymmetric sensitivities. We empirically test this conjecture across different levels of sensitivity symmetry, which we also vary by toggling the level of privacy.

Figure 8: We consider a range of randomly sampled output bounds and privacy parameters and compute the absolute error of our asymmetric sensitivity mechanism (ASM) and the inverse sensitivity mechanism (ISM) over a small number of random draws. We plot the (error ISM) / (error ASM) corresponding to the asymmetry of the sensitivities for the given output bounds and privacy parameter.
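As a concrete companion to Definition C.1 and the simulations in Figure 8, the following is a minimal sketch of the asymmetry measure computed from arrays of output bounds; the array layout and the function name are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def asymmetry_of_sensitivities(lower_bounds, upper_bounds, epsilon):
    """Sketch of the asymmetry measure in Definition C.1.

    `lower_bounds[l]` and `upper_bounds[l]` are assumed to hold L^l_f(x) and
    U^l_f(x) for l = 0, 1, ..., with index 0 equal to f(x).  The marginal
    decreases Delta_L and increases Delta_U are weighted by exp(-l * eps / 2),
    mirroring the exponential mechanism weights discussed above.
    """
    lower = np.asarray(lower_bounds, dtype=float)
    upper = np.asarray(upper_bounds, dtype=float)
    ell = np.arange(1, len(lower))
    delta_L = lower[:-1] - lower[1:]           # Delta_L^l = L^{l-1} - L^l >= 0
    delta_U = upper[1:] - upper[:-1]           # Delta_U^l = U^l - U^{l-1} >= 0
    weights = np.exp(-ell * epsilon / 2.0)
    mass_below = np.sum(delta_L * weights)     # weight placed on outputs below f(x)
    mass_above = np.sum(delta_U * weights)     # weight placed on outputs above f(x)
    prob_above = mass_above / (mass_below + mass_above)
    return abs(prob_above - 0.5)
```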
For our simulations, we uniformly distribute the lower and upper output bounds, and we toggle the range of the upper output bounds to vary the level of asymmetry. We also consider upper output bounds that are more heavy-tailed and toggle the level of privacy, thereby determining the impact of the tail, to vary the asymmetry of the sensitivities. For each simulated set of output bounds and privacy parameter, we compute the asymmetry of the sensitivities, make several calls to both methods for private estimations, and compute the average absolute error of each. While these simulations are not all-encompassing, they empirically validate our provided intuition, and we will further see in our instantiations of functions with inherent asymmetry that our methodology gives substantially improved estimates.

C.4 Analysis of theoretical utility guarantees

In this section we provide the proof of Lemma 4.1. Our proof will first bound the probability that we output a value that is too small and then bound the probability that we output a value that is too large. As a result, we provide two helper lemmas, one for each direction, that are close variants of Theorem 3.24 in Dwork et al. (2014).

Lemma C.2. For any sequence of $m$ queries $f_1, \dots, f_m$ with sensitivity 1 such that $|\{i \in [m] : f_i(\boldsymbol{x}) \geq T - \alpha\}| = 0$, then Algorithm 1 (where we set $\varepsilon_1 = \varepsilon_2$ and $\varepsilon = \varepsilon_1 + 2\varepsilon_2$) will terminate during these queries with probability at most $m \cdot e^{-\alpha\varepsilon/3}$.

Proof. We know that exponential noise is non-negative, so to terminate at query $i$, the noisy result must exceed the threshold. By the CDF of the exponential distribution and our assumption, we have $\Pr_{\nu_i \sim \mathrm{Expo}(3/\varepsilon)}\left[f_i(\boldsymbol{x}) + \nu_i > T\right] \leq e^{-\alpha\varepsilon/3}$. Applying a union bound over all $m$ queries gives our desired result.

Lemma C.3. For any sequence of $m$ queries $f_j, \dots, f_{j+m}$ with sensitivity 1 such that $|\{i \in (j, j+m) : f_i(\boldsymbol{x}) < T + \alpha\}| = 0$, then Algorithm 1 (where we set $\varepsilon_1 = \varepsilon_2$ and $\varepsilon = \varepsilon_1 + 2\varepsilon_2$) will terminate at query $j+m$ or later with probability at most $e^{-\alpha\varepsilon/3}/m$.

Proof. In order to terminate at query $j+m$ or later we need that the noisy threshold is above all previous noisy queries. Let $\nu_T \sim \mathrm{Expo}(3/\varepsilon)$ and $\nu_i \sim \mathrm{Expo}(3/\varepsilon)$. There are $m - 1$ indices in $(j, j+m)$, so by independence we have
\[ \frac{1}{m} = \Pr\left[\nu_T > \max_{i \in (j, j+m)} \nu_i\right], \]
and furthermore, by change of variable and the PDF of the exponential distribution,
\[ \Pr\left[\nu_T > \max_{i \in (j, j+m)} \nu_i\right] = e^{\alpha\varepsilon/3} \Pr\left[\nu_T - \alpha > \max_{i \in (j, j+m)} \nu_i\right]. \]
Due to our assumption that $f_i(\boldsymbol{x}) \geq T + \alpha$ for all $i \in (j, j+m)$, we see that $\Pr[\nu_T - \alpha > \max_{i \in (j, j+m)} \nu_i]$ is an upper bound on the probability of still continuing at step $j+m$. This implies our desired claim.

Proof of Lemma 4.1. We first show $\Pr\left[M(\boldsymbol{x}) < L^{\ln k}_f(\boldsymbol{x})\right] < C/(k e^{\varepsilon/6})$ for a constant $C$. By Definition 3.1 and Lemma 3.5 we know that for any $t_i < L^{\ln k + 1}_f(\boldsymbol{x})$ we have $\mathrm{s}_f(\boldsymbol{x}; t_i) < T - \ln k$. Furthermore, we assumed that $f(\boldsymbol{x}) \leq \beta^C$ for a constant $C$ and $t_i = \beta^{i-1}$, so $t_C > f(\boldsymbol{x}) \geq L^{\ln k + 1}_f(\boldsymbol{x})$. We then apply Lemma C.2 with $\alpha = \ln k$ and $m = C$, which implies $\Pr\left[M(\boldsymbol{x}) < L^{\ln k}_f(\boldsymbol{x})\right] < C/(k e^{\varepsilon/6})$ for a constant $C$ as desired.

Next, we want to show that $\Pr\left[M(\boldsymbol{x}) > \beta^k (f(\boldsymbol{x}) + 1)\right] < C/(k e^{\varepsilon/6})$ for a constant $C$. Using the fact that $f(\boldsymbol{x}) + 1 \geq 1$, there are at least $k - 1$ values of $t_i$ such that $t_i \in [f(\boldsymbol{x}), \beta^k (f(\boldsymbol{x}) + 1)]$. By Definition 3.1 and Lemma 3.5 we know that for any $t_i \in [f(\boldsymbol{x}), \beta^k (f(\boldsymbol{x}) + 1)]$ we have $\mathrm{s}_f(\boldsymbol{x}; t_i) \geq T + 1/2$.
We then apply Lemma C.3 with $\alpha = 1/2$ and $m = k$ to get that $\Pr\left[M(\boldsymbol{x}) > \beta^k (f(\boldsymbol{x}) + 1)\right] \leq 1/(k e^{\varepsilon/6})$ as desired. Combining these inequalities gives our desired
\[ \Pr\left[ |M(\boldsymbol{x}) - f(\boldsymbol{x})| < \max\left\{ \left|f(\boldsymbol{x}) - L^{\log k + 1}_f(\boldsymbol{x})\right|, \ (\beta^k - 1)(f(\boldsymbol{x}) + 1) \right\} \right] > 1 - O\!\left( \frac{1}{k e^{\varepsilon/6}} \right). \]

Corollary C.4. Let $M$ denote the mechanism from M.3 with $t_i = \beta^{i-1}$ where $\beta > 1$, and let $\varepsilon = \varepsilon_1 + 2\varepsilon_2$ with $\varepsilon_1 = \varepsilon_2$ in the call to Algorithm 1. Given non-negative $f : \mathcal{X}^n \to \mathbb{R}$ we have
\[ \Pr\left[ |M(\boldsymbol{x}) - f(\boldsymbol{x})| < \max\left\{ \left|f(\boldsymbol{x}) - L^{\log k + 1}_f(\boldsymbol{x})\right|, \ (\beta^k - 1)(f(\boldsymbol{x}) + 1) \right\} \right] > 1 - O\!\left( \frac{1}{k e^{\varepsilon/6}} \right) \]
when $f(\boldsymbol{x}) \leq \beta^{O(1)}$.

Proof. The proof follows identically, but instead of applying Lemma 3.5 we can directly use Definition A.2 and note that setting the approximate bounds $\hat{L}^{\ell}_f = L^{\ell}_f$ satisfies Definition A.1.

D Additional Analysis of Variance Invocation

In this section we provide the necessary proofs for the efficient variance instantiation of our method. Most of the analysis could be considered folklore properties of variance, as it is such a well-studied statistical property, but we duplicate some of these properties for completeness. We first provide a known alternative definition of variance.

Definition D.1. Let $\mathcal{X} = \mathbb{R}$ and for $\boldsymbol{x} \in \mathcal{X}^n$, we can equivalently define variance as
\[ \mathrm{Var}[\boldsymbol{x}] \overset{\mathrm{def}}{=} \frac{1}{n^2} \sum_{i, j \in [n]} \frac{1}{2}(\boldsymbol{x}_i - \boldsymbol{x}_j)^2. \]

We will also add some helpful notation, where for any $S \subseteq [n]$ we let $\boldsymbol{x}_S = \{\boldsymbol{x}_i : i \in S\}$ and similarly for any $1 \leq a \leq b \leq n$ we let $\boldsymbol{x}_{[a:b]} = (\boldsymbol{x}_a, \dots, \boldsymbol{x}_b)$.

D.1 Analysis for Lemma 5.2

We first prove a helper lemma that if $\ell$ individuals' data is changed in order to minimize variance, then those values should be set to be the mean of the remaining values.

Lemma D.2. For any $\boldsymbol{x} \in \mathcal{X}^n$ and any $S \subseteq [n]$, we have that
\[ \inf_{\boldsymbol{x}_S} \mathrm{Var}[\boldsymbol{x}] = \frac{n - |S|}{n} \mathrm{Var}\left[\boldsymbol{x}_{[n] \setminus S}\right]. \]

Proof. By least squares minimization we know that $\sum_{i \in [n] \setminus S} (\boldsymbol{x}_i - y)^2$ is minimized by setting
\[ y = \frac{1}{n - |S|} \sum_{i \in [n] \setminus S} \boldsymbol{x}_i \overset{\mathrm{def}}{=} \bar{\boldsymbol{x}}_{[n] \setminus S}. \]
If we set $\boldsymbol{x}_i = \bar{\boldsymbol{x}}_{[n] \setminus S}$ for all $i \in S$, then we have $\bar{\boldsymbol{x}} = \bar{\boldsymbol{x}}_{[n] \setminus S}$ and we have $\sum_{i \in S} (\boldsymbol{x}_i - \bar{\boldsymbol{x}})^2 = 0$, which must be minimal by the non-negativity of squared error. Combining these properties we get
\[ \inf_{\boldsymbol{x}_S} \mathrm{Var}[\boldsymbol{x}] = \frac{1}{n} \sum_{i \in [n] \setminus S} \left(\boldsymbol{x}_i - \bar{\boldsymbol{x}}_{[n] \setminus S}\right)^2. \]
We can then apply our Definition 5.1 to get our desired result.

With this helper lemma we can then provide the proof of our lower output bounds through a proof by contradiction.

Proof of Lemma 5.2. By Lemma D.2 we have that
\[ L^{\ell}_{\mathrm{Var}}(\boldsymbol{x}) = \min_{S \subseteq [n], |S| = \ell} \frac{n - \ell}{n} \mathrm{Var}\left[\boldsymbol{x}_{[n] \setminus S}\right]. \]
We will then give our desired claim through a proof by contradiction. Suppose this is minimized by $S \subseteq [n]$ with some $i \in S$ such that there exist $j, k \notin S$ where $\boldsymbol{x}_j < \boldsymbol{x}_i < \boldsymbol{x}_k$. Let $S_j = (S \setminus \{i\}) \cup \{j\}$ and $S_k = (S \setminus \{i\}) \cup \{k\}$. To obtain our contradiction it then suffices to show that
\[ \min\left\{ \mathrm{Var}\left[\boldsymbol{x}_{[n] \setminus S_j}\right], \ \mathrm{Var}\left[\boldsymbol{x}_{[n] \setminus S_k}\right] \right\} < \mathrm{Var}\left[\boldsymbol{x}_{[n] \setminus S}\right]. \]
Applying our alternative formulation of variance from Definition D.1 we have
\[ \mathrm{Var}\left[\boldsymbol{x}_{[n] \setminus S}\right] = \frac{1}{(n - \ell)^2} \sum_{a, b \in [n] \setminus S} \frac{1}{2}(\boldsymbol{x}_a - \boldsymbol{x}_b)^2. \]
Through cancellation of like terms we have
\[ \mathrm{Var}\left[\boldsymbol{x}_{[n] \setminus S_j}\right] < \mathrm{Var}\left[\boldsymbol{x}_{[n] \setminus S}\right] \iff \sum_{a \in [n] \setminus (S_j \cup \{i\})} (\boldsymbol{x}_a - \boldsymbol{x}_i)^2 < \sum_{a \in [n] \setminus (S \cup \{j\})} (\boldsymbol{x}_a - \boldsymbol{x}_j)^2. \]
By construction we have $S_j \cup \{i\} = S \cup \{j\}$, so by the convexity of least squares minimization and the fact that $\boldsymbol{x}_j < \boldsymbol{x}_i$, we have that this inequality holds if $\boldsymbol{x}_i \leq \bar{\boldsymbol{x}}_{[n] \setminus (S \cup \{j\})}$. Equivalently, we have
\[ \boldsymbol{x}_i \geq \bar{\boldsymbol{x}}_{[n] \setminus (S \cup \{k\})} \implies \mathrm{Var}\left[\boldsymbol{x}_{[n] \setminus S_k}\right] < \mathrm{Var}\left[\boldsymbol{x}_{[n] \setminus S}\right]. \]
Further, we know that $\bar{\boldsymbol{x}}_{[n] \setminus (S \cup \{j\})} > \bar{\boldsymbol{x}}_{[n] \setminus (S \cup \{k\})}$ because $\boldsymbol{x}_j < \boldsymbol{x}_k$. Therefore we must have either $\boldsymbol{x}_i \leq \bar{\boldsymbol{x}}_{[n] \setminus (S \cup \{j\})}$ or $\boldsymbol{x}_i \geq \bar{\boldsymbol{x}}_{[n] \setminus (S \cup \{k\})}$, which implies
\[ \min\left\{ \mathrm{Var}\left[\boldsymbol{x}_{[n] \setminus S_j}\right], \ \mathrm{Var}\left[\boldsymbol{x}_{[n] \setminus S_k}\right] \right\} < \mathrm{Var}\left[\boldsymbol{x}_{[n] \setminus S}\right] \]
and gives our desired contradiction.
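As a small numerical check of the helper lemma above (not part of the paper's experiments), the following Python snippet verifies on an arbitrary example that substituting the mean of the remaining values attains the infimum claimed in Lemma D.2.

```python
import numpy as np

# Check Lemma D.2: changing the values indexed by S to the mean of the remaining
# values minimizes the variance, and the minimum equals ((n - |S|)/n) * Var[x_{[n] \ S}].
x = np.array([1.0, 4.0, 2.5, 9.0, 3.0])   # arbitrary example data
S = [1, 3]                                  # indices whose values we may change
rest = np.delete(x, S)
n, k = len(x), len(S)

best_from_lemma = (n - k) / n * np.var(rest)

y = x.copy()
y[S] = rest.mean()                          # set the changed values to the mean of the rest
assert np.isclose(np.var(y), best_from_lemma)

# A random search over replacement values never beats the mean substitution.
rng = np.random.default_rng(0)
for _ in range(10_000):
    z = x.copy()
    z[S] = rng.uniform(rest.min() - 5, rest.max() + 5, size=k)
    assert np.var(z) >= best_from_lemma - 1e-12

print(best_from_lemma)
```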
D.2 Analysis for Lemma 5.3

In this section we prove that the approximate lower output bounds fit Definition A.1 and can be efficiently computed. We note that these lower output bounds will still maintain high accuracy because the bounds further away from the underlying data have far less effect upon the mechanism, and accordingly, loose bounds will have little effect.

Proof of Lemma 5.3. Recall that we assumed a fixed constant $c$ and defined the approximate lower output bounds by $\hat{L}^{\ell}_{\mathrm{Var}}(\boldsymbol{x}) = L^{\ell}_{\mathrm{Var}}(\boldsymbol{x})$ for $\ell \leq c$ and $\hat{L}^{\ell}_{\mathrm{Var}}(\boldsymbol{x}) = 0$ for $\ell > c$. By construction, we know that variance is non-negative, so we must have $L^{\ell}_{\mathrm{Var}}(\boldsymbol{x}) \geq 0$ for all $\ell$. This then implies $\hat{L}^{\ell}_f(\boldsymbol{x}) \leq L^{\ell}_f(\boldsymbol{x})$ for all $\ell$. Furthermore, if $\ell > c$ then we immediately have $\hat{L}^{\ell}_f(\boldsymbol{x}) \geq \hat{L}^{\ell+1}_f(\boldsymbol{x}')$. Otherwise $\hat{L}^{\ell}_f(\boldsymbol{x}) = L^{\ell}_f(\boldsymbol{x})$, and we know by definition that $L^{\ell}_f(\boldsymbol{x}) \geq L^{\ell+1}_f(\boldsymbol{x}')$, which implies $\hat{L}^{\ell}_f(\boldsymbol{x}) \geq \hat{L}^{\ell+1}_f(\boldsymbol{x}')$.

For the runtime, by Lemma 5.2 we see that it suffices to compute $\mathrm{Var}\left[\boldsymbol{x}_{[\ell+1-i : n-i]}\right]$ for all $0 \leq i \leq \ell \leq c$, where we assume $\boldsymbol{x}_1 \leq \dots \leq \boldsymbol{x}_n$. However, this ordering only matters for the largest and smallest $c$ values, so we can compute these in $O(n + c \log(c))$ time. Further, we compute $\sum_{i=1}^{n} \boldsymbol{x}_i$ and $\sum_{i=1}^{n} \boldsymbol{x}_i^2$ in $O(n)$ time and will utilize the well-known fact that variance can be computed in $O(1)$ time with these quantities. We can then just iterate through all $i, \ell$ such that $0 \leq i \leq \ell \leq c$, updating the sums of values and squares to compute each $\mathrm{Var}\left[\boldsymbol{x}_{[\ell+1-i : n-i]}\right]$ in $O(1)$ time, taking a total of $O(c^2)$ time.

D.3 Analysis for Lemma 5.4

Proof of Lemma 5.4. We first show that for any neighboring datasets $\boldsymbol{x}, \boldsymbol{x}'$ we have $\hat{U}^{\ell}_{\mathrm{Var}}(\boldsymbol{x}) \leq \hat{U}^{\ell+1}_{\mathrm{Var}}(\boldsymbol{x}')$. By construction, this reduces to showing that $\mathrm{Var}[\boldsymbol{x}] \leq \mathrm{Var}[\boldsymbol{x}'] + (b - a)^2/n$. Let $i$ be the index such that $\boldsymbol{x}_i \neq \boldsymbol{x}'_i$. Applying the alternative variance formulation in Definition D.1 and cancelling like terms reduces this to showing
\[ \frac{1}{n^2} \sum_{j \neq i} (\boldsymbol{x}_i - \boldsymbol{x}_j)^2 \leq \frac{1}{n^2} \sum_{j \neq i} (\boldsymbol{x}'_i - \boldsymbol{x}_j)^2 + \frac{(b - a)^2}{n}. \]
We assumed that all datasets were restricted to $[a, b]^n$, so we must then have $\sum_{j \neq i} (\boldsymbol{x}_i - \boldsymbol{x}_j)^2 \leq n(b - a)^2$, which then implies our desired inequality by the non-negativity of squared error.

Next we show that $U^{\ell}_{\mathrm{Var}}(\boldsymbol{x}) \leq \hat{U}^{\ell}_{\mathrm{Var}}(\boldsymbol{x})$. It suffices to show that for an arbitrary $\boldsymbol{x}'$ such that $d_{\mathrm{ham}}(\boldsymbol{x}, \boldsymbol{x}') = \ell$ we have $\mathrm{Var}[\boldsymbol{x}'] \leq \mathrm{Var}[\boldsymbol{x}] + \frac{\ell (b - a)^2}{n}$. We previously showed that $\mathrm{Var}[\boldsymbol{x}] \leq \mathrm{Var}[\boldsymbol{x}'] + (b - a)^2/n$ for any $\boldsymbol{x}, \boldsymbol{x}'$ such that $d_{\mathrm{ham}}(\boldsymbol{x}, \boldsymbol{x}') = 1$, so our claim then follows inductively. Therefore, our construction of $\hat{U}^{\ell}_{\mathrm{Var}}$ satisfies Definition A.1 as desired.
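The following is a minimal sketch of the approximate output bounds for variance, combining the running-sum computation described in the proof of Lemma 5.3 with the upper-bound construction analyzed in Lemma 5.4. The function name, the array conventions, the use of a full sort (rather than only extracting the $c$ extreme values), and the assumption $c < n$ are simplifying choices, not the paper's implementation.

```python
import numpy as np

def approx_variance_output_bounds(x, a, b, c=100):
    """Sketch of approximate output bounds for variance.

    Lower bounds: the exact L^l_Var(x) for l <= c, computed from prefix sums of
    values and squares so each candidate block variance costs O(1); the
    relaxation to 0 for l > c is left to the caller.
    Upper bounds: Var[x] + l * (b - a)^2 / n for data restricted to [a, b]^n.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    prefix = np.concatenate(([0.0], np.cumsum(x)))        # prefix sums of values
    prefix_sq = np.concatenate(([0.0], np.cumsum(x ** 2)))  # prefix sums of squares

    def block_var(lo, hi):
        # Population variance of the sorted block x[lo:hi] in O(1) from the prefix sums.
        m = hi - lo
        s = prefix[hi] - prefix[lo]
        return (prefix_sq[hi] - prefix_sq[lo]) / m - (s / m) ** 2

    lower = [min(((n - ell) / n) * block_var(ell - i, n - i) for i in range(ell + 1))
             for ell in range(c + 1)]
    base_var = block_var(0, n)
    upper = [base_var + ell * (b - a) ** 2 / n for ell in range(c + 1)]
    return lower, upper
```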
E Additional Analysis of Model Evaluation Invocations

In this section, we provide the definitions and implementation details for applying our method to machine learning model evaluation functions. We further unify this analysis by considering applying our methodology to linearly separable functions. We now define our datasets as $(\boldsymbol{x}, y) \in \mathcal{X}^n \times \mathbb{R}^n$ and we set $d_{\mathrm{ham}}((\boldsymbol{x}, y), (\boldsymbol{x}', y')) = |\{i : \boldsymbol{x}_i \neq \boldsymbol{x}'_i \text{ or } y_i \neq y'_i\}|$ to be the Hamming distance between datasets. We will be considering binary classification and multi-class classification, so we define cross-entropy loss for both.

Definition E.1. Given a machine learning model for binary classification $\omega : \mathcal{X} \to \mathbb{R}$ and assuming all $y_i \in \{0, 1\}$, let
\[ \mathrm{BCE}_{\omega}(\boldsymbol{x}, y) = -\sum_{i=1}^{n} \left[ y_i \log\left( \frac{1}{1 + e^{-\omega(\boldsymbol{x}_i)}} \right) + (1 - y_i) \log\left( \frac{e^{-\omega(\boldsymbol{x}_i)}}{1 + e^{-\omega(\boldsymbol{x}_i)}} \right) \right]. \]
Similarly, given a machine learning model for multi-class classification $\omega : \mathcal{X} \to \mathbb{R}^c$ with $c$ classes and assuming all $y_i \in [c]$, let
\[ \mathrm{CE}_{\omega}(\boldsymbol{x}, y) = -\sum_{i=1}^{n} \log\left( \frac{e^{\omega(\boldsymbol{x}_i)_{y_i}}}{\sum_{j=1}^{c} e^{\omega(\boldsymbol{x}_i)_j}} \right). \]

We also define the mean squared error and mean absolute error.

Definition E.2. Given a dataset $(\boldsymbol{x}, y)$ and machine learning model $\omega : \mathcal{X} \to \mathbb{R}$, we define
\[ \mathrm{MSE}_{\omega}(\boldsymbol{x}, y) = \sum_{i=1}^{n} (\omega(\boldsymbol{x}_i) - y_i)^2 \quad \text{and} \quad \mathrm{MAE}_{\omega}(\boldsymbol{x}, y) = \sum_{i=1}^{n} |\omega(\boldsymbol{x}_i) - y_i|. \]

E.1 Application to linearly separable functions

In this section we consider all linearly separable functions $f : \mathcal{X}^n \to \mathbb{R}$, i.e., functions of the form $f(\boldsymbol{x}) = \sum_{i=1}^{n} h(\boldsymbol{x}_i)$ where $h : \mathcal{X} \to \mathbb{R}$. We could also extend our results here to per-element functions that are specific to the index, but for simplicity we will restrict our consideration. Without loss of generality, assume the indices are ordered such that $h(\boldsymbol{x}_i) \leq h(\boldsymbol{x}_{i+1})$. We first provide lower output bounds for these functions.

Lemma E.3. Given a linearly separable function $f : \mathcal{X}^n \to \mathbb{R}$, then
\[ L^{\ell}_f(\boldsymbol{x}) = \sum_{i=1}^{n-\ell} h(\boldsymbol{x}_i) + \ell \cdot \inf_{\boldsymbol{x}_k \in \mathcal{X}} \{ h(\boldsymbol{x}_k) \}. \]

Proof. Changing any individual's data can only change their contribution to the sum, by the linearly separable property. Therefore, decreasing the $\ell$ individuals with the highest contribution to the sum must minimize the function over datasets with Hamming distance $\ell$ from the underlying data.

Similarly, we provide upper output bounds for linearly separable functions.

Lemma E.4. Given a linearly separable function $f : \mathcal{X}^n \to \mathbb{R}$, then
\[ U^{\ell}_f(\boldsymbol{x}) = \sum_{i=\ell+1}^{n} h(\boldsymbol{x}_i) + \ell \cdot \sup_{\boldsymbol{x}_k \in \mathcal{X}} \{ h(\boldsymbol{x}_k) \}. \]

Proof. Follows equivalently to the proof of Lemma E.3.

We then provide approximate relaxations of these bounds that will allow for easier application.

Lemma E.5. Given a linearly separable function $f : \mathcal{X}^n \to \mathbb{R}$, the approximate upper and lower sensitivity bounding functions $\hat{U}^{\ell}_f : \mathcal{X}^n \to \mathbb{R}$ and $\hat{L}^{\ell}_f : \mathcal{X}^n \to \mathbb{R}$ satisfy Definition A.1 for
\[ \hat{L}^{\ell}_f(\boldsymbol{x}) = \sum_{i=1}^{n-\ell} h(\boldsymbol{x}_i) + \ell \cdot a \quad \text{and} \quad \hat{U}^{\ell}_f(\boldsymbol{x}) = \sum_{i=\ell+1}^{n} h(\boldsymbol{x}_i) + \ell \cdot b \]
if $a \leq \inf_{\boldsymbol{x}_k} \{ h(\boldsymbol{x}_k) \}$ and $b \geq \sup_{\boldsymbol{x}_k} \{ h(\boldsymbol{x}_k) \}$.

Proof. We show this for the approximate lower output bounds, and the upper bounds follow equivalently. We first have $\hat{L}^{\ell}_f(\boldsymbol{x}) \leq L^{\ell}_f(\boldsymbol{x})$ by construction. Next, for neighboring $\boldsymbol{x}, \boldsymbol{x}'$, let $j$ be the index at which $\boldsymbol{x}_j \neq \boldsymbol{x}'_j$. We assumed an ordering of the indices for simplicity, but we equivalently have
\[ \hat{L}^{\ell}_f(\boldsymbol{x}) = \min_{S \subseteq [n], |S| = n - \ell} \left\{ \sum_{i \in S} h(\boldsymbol{x}_i) \right\} + \ell \cdot a. \]
For some given $\ell$, let $S_{\boldsymbol{x}}$ be the subset of indices that minimizes this for $\hat{L}^{\ell}_f(\boldsymbol{x})$. If $j \notin S_{\boldsymbol{x}}$ then we have $\hat{L}^{\ell}_f(\boldsymbol{x}) \geq \hat{L}^{\ell}_f(\boldsymbol{x}')$, so $\hat{L}^{\ell}_f(\boldsymbol{x}) \geq \hat{L}^{\ell+1}_f(\boldsymbol{x}')$. Otherwise, we know that
\[ \hat{L}^{\ell+1}_f(\boldsymbol{x}') = \min_{S \subseteq [n], |S| = n - (\ell+1)} \left\{ \sum_{i \in S} h(\boldsymbol{x}'_i) \right\} + (\ell + 1) \cdot a \leq \sum_{i \in S_{\boldsymbol{x}} \setminus \{j\}} h(\boldsymbol{x}_i) + (\ell + 1) \cdot a, \]
and because $a \leq h(\boldsymbol{x}_j)$ this implies $\hat{L}^{\ell}_f(\boldsymbol{x}) \geq \hat{L}^{\ell+1}_f(\boldsymbol{x}')$.

We further show that our approximate bounds allow for monotonic reflective inverse sensitivities, which implies improved privacy guarantees for many instantiations.

Lemma E.6. Given a linearly separable function $f : \mathcal{X}^n \to \mathbb{R}$, along with approximate upper and lower sensitivity bounding functions $\hat{U}^{\ell}_f : \mathcal{X}^n \to \mathbb{R}$ and $\hat{L}^{\ell}_f : \mathcal{X}^n \to \mathbb{R}$ such that
\[ \hat{L}^{\ell}_f(\boldsymbol{x}) = \sum_{i=1}^{n-\ell} h(\boldsymbol{x}_i) + \ell \cdot a \quad \text{and} \quad \hat{U}^{\ell}_f(\boldsymbol{x}) = \sum_{i=\ell+1}^{n} h(\boldsymbol{x}_i) + \ell \cdot b \]
where $a \leq \inf_{\boldsymbol{x}_k} \{ h(\boldsymbol{x}_k) \}$ and $b \geq \sup_{\boldsymbol{x}_k} \{ h(\boldsymbol{x}_k) \}$. For any neighboring datasets $\boldsymbol{x}, \boldsymbol{x}'$ we have that either $\mathrm{s}_f(\boldsymbol{x}; t) \geq \mathrm{s}_f(\boldsymbol{x}'; t)$ for all $t \in \mathbb{R}$ or $\mathrm{s}_f(\boldsymbol{x}; t) \leq \mathrm{s}_f(\boldsymbol{x}'; t)$ for all $t \in \mathbb{R}$.

Proof. Without loss of generality, assume $f(\boldsymbol{x}) \leq f(\boldsymbol{x}')$; we will show $\mathrm{s}_f(\boldsymbol{x}; t) \geq \mathrm{s}_f(\boldsymbol{x}'; t)$ for all $t \in \mathbb{R}$. We first consider $t \in [f(\boldsymbol{x}), f(\boldsymbol{x}')]$. Given that the datasets are neighboring, we must have $\hat{L}^{1}_f(\boldsymbol{x}) \leq t \leq \hat{U}^{1}_f(\boldsymbol{x})$ and $\hat{L}^{1}_f(\boldsymbol{x}') \leq t \leq \hat{U}^{1}_f(\boldsymbol{x}')$. By Definition A.1 and Definition A.2, we have that $\mathrm{len}_f(\boldsymbol{x}; t) \leq 1$ and $\mathrm{len}_f(\boldsymbol{x}'; t) \leq 1$. Given that $t \in [f(\boldsymbol{x}), f(\boldsymbol{x}')]$, this then implies $\mathrm{s}_f(\boldsymbol{x}; t) \in \{1/2, 0\}$ and $\mathrm{s}_f(\boldsymbol{x}'; t) \in \{-1/2, 0\}$. Therefore $\mathrm{s}_f(\boldsymbol{x}; t) \geq \mathrm{s}_f(\boldsymbol{x}'; t)$.

Next consider the case when $t < f(\boldsymbol{x})$. This implies $\mathrm{sgn}(t - f(\boldsymbol{x})) = \mathrm{sgn}(t - f(\boldsymbol{x}')) = -1$ and also that $\mathrm{len}_f(\boldsymbol{x}; t) \geq 1$ and $\mathrm{len}_f(\boldsymbol{x}'; t) \geq 1$. By Lemma E.7 we have that $\hat{L}^{\ell}_f(\boldsymbol{x}) \leq \hat{L}^{\ell}_f(\boldsymbol{x}')$ for all $\ell \geq 0$, which implies $\mathrm{len}_f(\boldsymbol{x}; t) \leq \mathrm{len}_f(\boldsymbol{x}'; t)$ and therefore $\mathrm{s}_f(\boldsymbol{x}; t) \geq \mathrm{s}_f(\boldsymbol{x}'; t)$.

Finally consider $t > f(\boldsymbol{x}')$. This implies $\mathrm{sgn}(t - f(\boldsymbol{x})) = \mathrm{sgn}(t - f(\boldsymbol{x}')) = 1$ and also that $\mathrm{len}_f(\boldsymbol{x}; t) \geq 1$ and $\mathrm{len}_f(\boldsymbol{x}'; t) \geq 1$. By Lemma E.7 we have that $\hat{U}^{\ell}_f(\boldsymbol{x}) \leq \hat{U}^{\ell}_f(\boldsymbol{x}')$ for all $\ell \geq 0$, which implies $\mathrm{len}_f(\boldsymbol{x}; t) \geq \mathrm{len}_f(\boldsymbol{x}'; t)$ and therefore $\mathrm{s}_f(\boldsymbol{x}; t) \geq \mathrm{s}_f(\boldsymbol{x}'; t)$.
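The output bounds of Lemma E.5 reduce to sorted prefix sums of the per-element contributions. The following is a minimal sketch under that reading, where the function name and the array of contributions $h(\boldsymbol{x}_i)$ are illustrative assumptions.

```python
import numpy as np

def linearly_separable_output_bounds(contributions, a, b, max_ell):
    """Sketch of the approximate output bounds in Lemma E.5.

    `contributions` holds the per-element values h(x_i) of a linearly separable
    function f(x) = sum_i h(x_i); `a` and `b` are any constants with
    a <= inf h and b >= sup h.  Sorting ascending, the lower bound drops the
    l largest contributions and replaces them with a, while the upper bound
    drops the l smallest and replaces them with b.
    """
    h = np.sort(np.asarray(contributions, dtype=float))
    n = len(h)
    prefix = np.concatenate(([0.0], np.cumsum(h)))   # prefix[m] = sum of the m smallest contributions
    total = prefix[n]
    lower = [prefix[n - ell] + ell * a for ell in range(max_ell + 1)]
    upper = [(total - prefix[ell]) + ell * b for ell in range(max_ell + 1)]
    return lower, upper
```

For the losses considered below, `contributions` would hold each example's loss term and $a, b$ the corresponding per-example extremes.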
We will also require the following helper lemma in order to prove Lemma E.6.

Lemma E.7. Given a linearly separable function $f : \mathcal{X}^n \to \mathbb{R}$, along with approximate upper and lower sensitivity bounding functions $\hat{U}^{\ell}_f : \mathcal{X}^n \to \mathbb{R}$ and $\hat{L}^{\ell}_f : \mathcal{X}^n \to \mathbb{R}$ such that
\[ \hat{L}^{\ell}_f(\boldsymbol{x}) = \sum_{i=1}^{n-\ell} h(\boldsymbol{x}_i) + \ell \cdot a \quad \text{and} \quad \hat{U}^{\ell}_f(\boldsymbol{x}) = \sum_{i=\ell+1}^{n} h(\boldsymbol{x}_i) + \ell \cdot b \]
where $a \leq \inf_{\boldsymbol{x}_k} \{ h(\boldsymbol{x}_k) \}$ and $b \geq \sup_{\boldsymbol{x}_k} \{ h(\boldsymbol{x}_k) \}$. For any neighboring datasets $\boldsymbol{x}, \boldsymbol{x}'$, if $f(\boldsymbol{x}) \leq f(\boldsymbol{x}')$ then $\hat{L}^{\ell}_f(\boldsymbol{x}) \leq \hat{L}^{\ell}_f(\boldsymbol{x}')$ and $\hat{U}^{\ell}_f(\boldsymbol{x}) \leq \hat{U}^{\ell}_f(\boldsymbol{x}')$ for all $\ell \geq 0$.

Proof. Let $j$ be the index at which $\boldsymbol{x}_j \neq \boldsymbol{x}'_j$, which implies $h(\boldsymbol{x}_j) \leq h(\boldsymbol{x}'_j)$ because of our linearly separable property. We assumed an ordering of the indices for simplicity, but we equivalently have
\[ \hat{L}^{\ell}_f(\boldsymbol{x}) = \min_{S \subseteq [n], |S| = n - \ell} \left\{ \sum_{i \in S} h(\boldsymbol{x}_i) \right\} + \ell \cdot a. \]
For a given $\ell$, let $S_{\boldsymbol{x}'}$ denote the set of indices that minimizes $\hat{L}^{\ell}_f(\boldsymbol{x}')$. If $j \in S_{\boldsymbol{x}'}$ then we have
\[ \hat{L}^{\ell}_f(\boldsymbol{x}') = h(\boldsymbol{x}'_j) + \sum_{i \in S_{\boldsymbol{x}'} \setminus \{j\}} h(\boldsymbol{x}_i) + \ell \cdot a \geq h(\boldsymbol{x}_j) + \sum_{i \in S_{\boldsymbol{x}'} \setminus \{j\}} h(\boldsymbol{x}_i) + \ell \cdot a = \sum_{i \in S_{\boldsymbol{x}'}} h(\boldsymbol{x}_i) + \ell \cdot a \geq \hat{L}^{\ell}_f(\boldsymbol{x}). \]
Similarly, if $j \notin S_{\boldsymbol{x}'}$ then
\[ \hat{L}^{\ell}_f(\boldsymbol{x}') = \sum_{i \in S_{\boldsymbol{x}'}} h(\boldsymbol{x}_i) + \ell \cdot a \geq \hat{L}^{\ell}_f(\boldsymbol{x}). \]
The proof for $\hat{U}^{\ell}_f(\boldsymbol{x}) \leq \hat{U}^{\ell}_f(\boldsymbol{x}')$ follows equivalently.

E.2 Efficient cross-entropy loss instantiation

We will provide the efficient instantiation for multi-class cross-entropy loss, as this can easily be extended to binary cross-entropy loss. Without loss of generality, assume the indices are ordered such that
\[ \log\left( \frac{e^{\omega(\boldsymbol{x}_i)_{y_i}}}{\sum_{j=1}^{c} e^{\omega(\boldsymbol{x}_i)_j}} \right) \geq \log\left( \frac{e^{\omega(\boldsymbol{x}_{i+1})_{y_{i+1}}}}{\sum_{j=1}^{c} e^{\omega(\boldsymbol{x}_{i+1})_j}} \right). \]
We can then provide the lower output bounds for cross-entropy loss.

Lemma E.8. Given a dataset $(\boldsymbol{x}, y)$ and machine learning model $\omega : \mathcal{X} \to \mathbb{R}^c$, then
\[ L^{\ell}_{\mathrm{CE}_{\omega}}(\boldsymbol{x}, y) = -\sum_{i=1}^{n-\ell} \log\left( \frac{e^{\omega(\boldsymbol{x}_i)_{y_i}}}{\sum_{j=1}^{c} e^{\omega(\boldsymbol{x}_i)_j}} \right). \]

Proof. We apply Lemma E.3 and Lemma E.5, and we observe that
\[ \inf_{\boldsymbol{x}_i, y_i} \left\{ -\log\left( \frac{e^{\omega(\boldsymbol{x}_i)_{y_i}}}{\sum_{j=1}^{c} e^{\omega(\boldsymbol{x}_i)_j}} \right) \right\} \geq 0. \]

As seen with variance in Section 5, we have that $U^{1}_{\mathrm{CE}_{\omega}}(\boldsymbol{x}, y) = \infty$, which implies that cross-entropy loss has inherently asymmetric sensitivities. Similarly, we will need to restrict the range of these values in order to apply the inverse sensitivity mechanism, even though our method could easily handle the unbounded setting. We provide a proof for these upper output bounds in Appendix E.

Lemma E.9. Given a dataset $(\boldsymbol{x}, y)$ and machine learning model $\omega : \mathcal{X} \to \mathbb{R}^c$ where we restrict $\omega(\boldsymbol{x}_i) \in [a, b]^c$ for all $i$, then
\[ U^{\ell}_{\mathrm{CE}_{\omega}}(\boldsymbol{x}, y) = -\left( \ell \cdot \log\left( \frac{e^{a-b}}{e^{a-b} + c - 1} \right) + \sum_{i=\ell+1}^{n} \log\left( \frac{e^{\omega(\boldsymbol{x}_i)_{y_i}}}{\sum_{j=1}^{c} e^{\omega(\boldsymbol{x}_i)_j}} \right) \right). \]

Proof. We apply Lemma E.4 and Lemma E.5, and we observe that with our restricted bounds we have
\[ \sup_{\boldsymbol{x}_i, y_i} \left\{ -\log\left( \frac{e^{\omega(\boldsymbol{x}_i)_{y_i}}}{\sum_{j=1}^{c} e^{\omega(\boldsymbol{x}_i)_j}} \right) \right\} \leq -\log\left( \frac{e^{a-b}}{e^{a-b} + c - 1} \right). \]

The corresponding algorithm for instantiating cross-entropy loss with our method is similar to Algorithm 2. The privacy guarantees from Theorem 5.5 also follow equivalently, but we can additionally improve the privacy to be $(\varepsilon_1 + \varepsilon_2)$-DP by applying Lemma E.6 to achieve monotonicity. We can also achieve $O(n \log(n) + q)$ runtime by computing all of our approximate upper and lower bounds, but we could also easily employ the same strategy of setting these bounds to be infinity and zero, respectively, for all $\ell > c$ where we set $c = 100$. This then gives the linear runtime, where we also utilize the fact that we will never run Above Threshold for more than 50,000 queries.
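Under the same reading, the following is a sketch of how the cross-entropy bounds of Lemmas E.8 and E.9 could be computed from model logits; the function name, array shapes, and the numerically stable log-sum-exp are illustrative choices rather than the paper's implementation.

```python
import numpy as np

def cross_entropy_output_bounds(logits, labels, a, b, max_ell):
    """Sketch of the cross-entropy output bounds in Lemmas E.8 and E.9.

    `logits` is an (n, c) array of model outputs omega(x_i) (assumed clipped to
    [a, b] for the upper bounds) and `labels` holds y_i in {0, ..., c-1}.
    Each example contributes -log softmax(omega(x_i))[y_i]; the lower bound keeps
    the n - l smallest contributions, while the upper bound replaces the l
    smallest with the worst-case per-example loss -log(e^(a-b) / (e^(a-b) + c - 1)).
    """
    logits = np.asarray(logits, dtype=float)
    n, c = logits.shape
    # Per-example loss via a numerically stable log-sum-exp.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    losses = np.sort(-log_probs[np.arange(n), labels])        # ascending contributions
    worst_case = -np.log(np.exp(a - b) / (np.exp(a - b) + c - 1))

    prefix = np.concatenate(([0.0], np.cumsum(losses)))
    total = prefix[n]
    lower = [prefix[n - ell] for ell in range(max_ell + 1)]                              # Lemma E.8
    upper = [(total - prefix[ell]) + ell * worst_case for ell in range(max_ell + 1)]     # Lemma E.9
    return lower, upper
```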
E.3 Efficient implementation for regression evaluation

We will only provide the efficient implementation for MSE in this section, as MAE will follow identically. Without loss of generality, assume the indices are ordered such that $(\omega(\boldsymbol{x}_i) - y_i)^2 \leq (\omega(\boldsymbol{x}_{i+1}) - y_{i+1})^2$.

Lemma E.10. Given a dataset $(\boldsymbol{x}, y)$ and machine learning model $\omega : \mathcal{X} \to \mathbb{R}$, then
\[ L^{\ell}_{\mathrm{MSE}_{\omega}}(\boldsymbol{x}, y) = \sum_{i=1}^{n-\ell} (\omega(\boldsymbol{x}_i) - y_i)^2. \]

Proof. We apply Lemma E.3 and Lemma E.5, and we observe that
\[ \inf_{\boldsymbol{x}_i, y_i} \left\{ (\omega(\boldsymbol{x}_i) - y_i)^2 \right\} \geq 0. \]

As seen with variance in Section 5, we have that $U^{1}_{\mathrm{MSE}_{\omega}}(\boldsymbol{x}, y) = \infty$, which implies that MSE has inherently asymmetric sensitivities. Similarly, we will need to restrict the range of these values in order to apply the inverse sensitivity mechanism, even though our method could easily handle the unbounded setting.

Lemma E.11. Given a dataset $(\boldsymbol{x}, y)$ and machine learning model $\omega : \mathcal{X} \to \mathbb{R}$ where we restrict $\omega(\boldsymbol{x}_i) \in [a, b]$ and $y_i \in [a, b]$ for all $i$, then
\[ U^{\ell}_{\mathrm{MSE}_{\omega}}(\boldsymbol{x}, y) = \ell (b - a)^2 + \sum_{i=\ell+1}^{n} (\omega(\boldsymbol{x}_i) - y_i)^2. \]

Proof. We apply Lemma E.4 and Lemma E.5, and we observe that for our bounded setting
\[ \sup_{\boldsymbol{x}_i, y_i} \left\{ (\omega(\boldsymbol{x}_i) - y_i)^2 \right\} \leq (b - a)^2. \]

The corresponding algorithm for instantiating MSE with our method is identical to Algorithm 2. The privacy guarantees from Theorem 5.5 also follow equivalently, but we can additionally improve the privacy to be $(\varepsilon_1 + \varepsilon_2)$-DP by applying Lemma E.6 to achieve monotonicity. We can also achieve $O(n \log(n) + q)$ runtime by computing all of our approximate upper and lower bounds, but we could also easily employ the same strategy of setting these bounds to be infinity and zero, respectively, for all $\ell > c$ where we set $c = 100$. This then gives the linear runtime, where we also utilize the fact that we will never run Above Threshold for more than 50,000 queries.

NeurIPS Paper Checklist

1. Claims
Question: Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?
Answer: [Yes]
Justification: We enumerate the main claims made in the abstract and introduction at the end of the introduction with pointers to each section that contains their support.

2. Limitations
Question: Does the paper discuss the limitations of the work performed by the authors?
Answer: [Yes]
Justification: We provide an approximate variant of our method to overcome the limitations of the exact method, which are discussed. We also give theoretical and empirical details upon when our method is advantageous.

3. Theory Assumptions and Proofs
Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof?
Answer: [Yes]
Justification: All assumptions for each theoretical claim are contained in the claim and all proofs are in the appendix.

4. Experimental Result Reproducibility
Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)?
Answer: [Yes]
Justification: We provide full details on datasets, parameters, and experimental setup for all empirical results.

5. Open access to data and code
Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?
Answer: [No]
Justification: The data is open source and the code is straightforward to reproduce as all algorithms are simple, but we have not open sourced the code. We'd be happy to provide all code used upon request.

6. Experimental Setting/Details
Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results?
Answer: [Yes]
Justification: For each empirical study and dataset we specify all parameters, training/test splits, and models used from open source packages.

7. Experiment Statistical Significance
Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments?
Answer: [Yes]
Justification: All figures from our empirical results contain confidence interval error bars.

8. Experiments Compute Resources
Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments?
Answer: [No]
Justification: Our algorithms are lightweight, so we just used basic Colab notebooks to run the different empirical studies, but this was not specified in the paper.

9. Code Of Ethics
Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics https://neurips.cc/public/EthicsGuidelines?
Answer: [Yes]
Justification: We believe this work conforms to the Code of Ethics.

10. Broader Impacts
Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed?
Answer: [NA]
Justification: This work only provides improved methods in differential privacy to give improved estimation for the same level of privacy.

11. Safeguards
Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)?
Answer: [NA]
Justification: We don't propose any algorithms or models that could be considered high risk.

12. Licenses for existing assets
Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected?
Answer: [Yes]
Justification: We provide citations for all datasets used.

13. New Assets
Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?
Answer: [NA]
Justification: We don't introduce new assets.

14. Crowdsourcing and Research with Human Subjects
Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?
Answer: [NA]
Justification: This paper doesn't involve crowdsourcing or research with human subjects.

15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects
Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained?
Answer: [NA]
Justification: This paper doesn't involve crowdsourcing or research with human subjects.