# Breaking the Communication-Privacy-Accuracy Tradeoff with f-Differential Privacy

Richeng Jin¹, Zhonggen Su¹, Caijun Zhong¹, Zhaoyang Zhang¹, Tony Q.S. Quek², Huaiyu Dai³
¹Zhejiang University, ²Singapore University of Technology and Design, ³North Carolina State University

37th Conference on Neural Information Processing Systems (NeurIPS 2023).

## Abstract

We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability. The commonly adopted compression schemes introduce information loss into local data while improving communication efficiency, and it remains an open problem whether such discrete-valued mechanisms provide any privacy protection. In this paper, we study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of f-differential privacy (DP). More specifically, we advance the existing literature by deriving tight f-DP guarantees, in closed-form expressions, for a variety of discrete-valued mechanisms, including the binomial noise and the binomial mechanisms that are proposed for privacy preservation, and the sign-based methods that are proposed for data compression. We further investigate the amplification in privacy by sparsification and propose a ternary stochastic compressor. By leveraging compression for privacy amplification, we improve the existing methods by removing the dependency of accuracy (in terms of mean square error) on communication cost in the popular use case of distributed mean estimation, thereby breaking the three-way tradeoff between privacy, communication, and accuracy.

## 1 Introduction

Nowadays, the massive data generated and collected for analysis, and consequently the prohibitive communication overhead for data transmission, are overwhelming the centralized data analytics paradigm. Federated data analytics has therefore been proposed as a new distributed computing paradigm that enables data analysis while keeping the raw data locally on the user devices [1]. Similar to its most notable use case, i.e., federated learning (FL) [2, 3], federated data analytics faces two critical challenges: data privacy and communication efficiency. On one hand, the local data of users may contain sensitive information, and privacy-preserving mechanisms are needed. On the other hand, the user devices are usually equipped with limited communication capabilities, and compression mechanisms are often adopted to improve communication efficiency.

Differential privacy (DP) has become the gold standard for privacy measures due to its rigorous foundation and simple implementation. One classic technique to ensure DP is adding Gaussian or Laplacian noise to the data [4]. However, such noises are prone to numerical errors on finite-precision computers [5] and may not be suitable for federated data analytics with communication constraints due to their continuous nature. With such considerations, various discrete noises with privacy guarantees have been proposed, e.g., the binomial noise [6], the discrete Gaussian mechanism [7], and the Skellam mechanism [8]. Nonetheless, the additive noises in [7] and [8] assume an infinite range, which renders them less communication-efficient without appropriate clipping. Unfortunately, clipping usually ruins the unbiasedness of the mechanism.
[9] develops a Poisson binomial mechanism (PBM) that does not rely on additive noise. In PBM, each user adopts a binomial mechanism, which takes a continuous input and encodes it into the success probability of a binomial distribution. The output of the binomial mechanism is shared with a central server, which releases the aggregated result that follows the Poisson binomial distribution. However, [9] focuses on distributed DP, in which the server only observes the output of the aggregated results instead of the data shared by each individual user, and therefore requires a secure computation function (e.g., secure aggregation [3]).

In addition to discrete DP mechanisms, existing works have investigated the fundamental tradeoff between communication, privacy, and accuracy under the classic (ϵ, δ)-DP framework (e.g., [10, 11, 12, 13]). Notably, in the case of distributed mean estimation, [13] incorporates Kashin's representation and proposes Subsampled and Quantized Kashin's Response (SQKR), which achieves order-optimal mean square error (MSE) that has a linear dependency on the dimension d of the private data. SQKR first computes Kashin's representation of the private data and quantizes each coordinate into a 1-bit message. Then, k coordinates are randomly sampled and privatized by the 2^k-Random Response mechanism [14]. SQKR achieves an order-optimal three-way tradeoff between privacy, accuracy, and communication. Nonetheless, it does not account for the privacy introduced during sparsification. Intuitively, as compression becomes more aggressive, less information will be shared by the users, which naturally leads to better privacy protection. However, formally quantifying the privacy guarantees of compression mechanisms remains an open problem.

In this work, we close the gap by investigating the local DP guarantees of discrete-valued mechanisms, based on which a ternary stochastic compressor is proposed to leverage the privacy amplification by compression and advance the literature by achieving a better communication-privacy-accuracy tradeoff. More specifically, we focus on the emerging concept of f-DP [15], which can be readily converted to (ϵ, δ)-DP and Rényi differential privacy [16] in a lossless way while enjoying better composition properties [17].

Our contributions. In this work, we derive the closed-form expressions of the tradeoff function between type I and type II error rates in the hypothesis testing problem for a generic discrete-valued mechanism with a finite output space, based on which f-DP guarantees of the binomial noise (cf. Section 4.1) and the binomial mechanism (cf. Section 4.2), which cover a variety of discrete differentially private mechanisms and compression mechanisms as special cases, are obtained. Our analyses lead to tighter privacy guarantees for binomial noise than [6] and extend the results for the binomial mechanism in [9] to local DP. To the best of our knowledge, this is the first work that investigates the f-DP guarantees of discrete-valued mechanisms, and the results could possibly inspire the design of better differentially private compression mechanisms. Inspired by the analytical results, we also leverage the privacy amplification of the sparsification scheme and propose a ternary stochastic compressor (cf. Section 5).
By accounting for the privacy amplification of compression, our analyses reveal that given a privacy budget of µ-GDP (a special case of f-DP) with µ < √(4dr/(1 − r)) (in which r is the ratio of non-zero coordinates in expectation for the sparsification scheme), the MSE of the ternary stochastic compressor only depends on µ in the use case of distributed mean estimation (which is the building block of FL). In this sense, we break the three-way tradeoff between communication overhead, privacy, and accuracy by removing the dependency of accuracy on the communication overhead. Different from existing works which suggest that, in the high-privacy regime, the error introduced by compression is dominated by the error introduced for privacy, we show that the error caused by compression can be translated into an enhancement in privacy. Compared to SQKR [13], the proposed scheme yields better privacy guarantees given the same MSE and communication cost. For the scenario where each user i observes x_i ∈ {−c, c}^d for some constant c > 0, the proposed scheme achieves the same privacy guarantee and MSE as those of the classic Gaussian mechanism in the large d regime, which essentially means that the improvement in communication efficiency is achieved for free. We remark that the regime of large d is often of interest in practical FL, in which d is the number of training parameters.

## 2 Related Work

Recently, there has been a surge of interest in developing differentially private data analysis techniques, which can be divided into three categories: central differential privacy (CDP), which assumes a trusted central server to perturb the collected data [18]; distributed differential privacy, which relies on secure aggregation during data collection [3]; and local differential privacy (LDP), which avoids the need for a trusted server by perturbing the local data on the user side [19]. To overcome the drawbacks of the Gaussian and Laplacian mechanisms, several discrete mechanisms have been proposed. [18] introduces one-dimensional binomial noise, which is extended to the general d-dimensional case in [6] with a more comprehensive analysis in terms of (ϵ, δ)-DP. [20] analyzes the LDP guarantees of discrete Gaussian noise, while [7] further considers secure aggregation. [8] studies the Rényi DP guarantees of the Skellam mechanism. However, both the discrete Gaussian mechanism and the Skellam mechanism assume infinite ranges at the output, which makes them less communication-efficient without appropriate clipping. Moreover, all of the above three mechanisms achieve differential privacy at the cost of exploding variance of the additive noise in the high-privacy regime.

Another line of studies jointly considers privacy preservation and compression. [10, 11] propose to achieve DP by quantizing, sampling, and perturbing each entry, while [12] proposes a vector quantization scheme with local differential privacy. However, the MSE of these schemes grows with d². [13] investigates the three-way communication-privacy-accuracy tradeoff and incorporates Kashin's representation to achieve order-optimal estimation error in mean estimation. [21] proposes to first sample a portion of coordinates, followed by the randomized response mechanism [22]. [23] and [24] further incorporate shuffling for privacy amplification. [25] proposes to compress LDP schemes using a pseudorandom generator, while [26] utilizes minimal random coding.
[27] proposes a privacy-aware compression mechanism that accommodates the DP requirement and unbiasedness simultaneously. However, it considers pure ϵ-DP, which cannot be easily generalized to the relaxed variants. [9] proposes the Poisson binomial mechanism with Rényi DP guarantees. Nonetheless, Rényi DP lacks the favorable hypothesis testing interpretation, and its conversion to (ϵ, δ)-DP is lossy. Moreover, most of the existing works focus on privatizing the compressed data or vice versa, leaving the privacy guarantees of compression mechanisms largely unexplored. [28] proposes a numerical accountant based on the fast Fourier transform [29] to evaluate the (ϵ, δ)-DP of general discrete-valued mechanisms. Recently, an independent work [30] studies privacy amplification by compression for central (ϵ, δ)-DP and multi-message shuffling frameworks. In this work, we consider LDP through the lens of f-DP and eliminate the need for a trusted server or shuffler.

Among the relaxations of differential privacy notions [31, 16, 32], f-DP [15] is a variant of ϵ-DP with a hypothesis testing interpretation, which enjoys the property of lossless conversion to (ϵ, δ)-DP and tight composition [33]. As a result, it leads to favorable performance in distributed/federated learning [34, 35]. However, to the best of our knowledge, none of the existing works study the f-DP of discrete-valued mechanisms. In this work, we bridge the gap by deriving tight f-DP guarantees of various compression mechanisms in closed form, based on which a ternary stochastic compressor is proposed to achieve a better communication-privacy-accuracy tradeoff than existing methods.

## 3 Problem Setup and Preliminaries

### 3.1 Problem Setup

We consider a set of N users (denoted by N) with local data x_i ∈ R^d. The users aim to share the x_i's with a central server in a privacy-preserving and communication-efficient manner. More specifically, the users adopt a privacy-preserving mechanism M to obfuscate their data and share the perturbed results M(x_i)'s with the central server. In the use case of distributed/federated learning, each user has a local dataset S. During each training step, it computes the local stochastic gradients and shares the obfuscated gradients with the server. In this sense, the overall gradient computation and obfuscation mechanism M takes the local dataset S as the input and outputs the obfuscated result M(S). Upon receiving the shared M(S)'s, the server estimates the mean of the local gradients.

### 3.2 Differential Privacy

Formally, differential privacy is defined as follows.

Definition 1 ((ϵ, δ)-DP [18]). A randomized mechanism M is (ϵ, δ)-differentially private if for all neighboring datasets S and S′ and all measurable sets O in the range of M, we have

P(M(S) ∈ O) ≤ e^ϵ P(M(S′) ∈ O) + δ,  (1)

in which S and S′ are neighboring datasets that differ in only one record, and ϵ, δ ≥ 0 are the parameters that characterize the level of differential privacy.

### 3.3 f-Differential Privacy

Assuming that there exist two neighboring datasets S and S′, from the hypothesis testing perspective, we have the following two hypotheses:

H₀: the underlying dataset is S,  H₁: the underlying dataset is S′.  (2)

Let P and Q denote the probability distributions of M(S) and M(S′), respectively. [15] formulates the problem of distinguishing the two hypotheses as the tradeoff between the achievable type I and type II error rates.
More precisely, consider a rejection rule 0 ≤ ϕ ≤ 1 (which rejects H₀ with probability ϕ); the type I and type II error rates are defined as α_ϕ = E_P[ϕ] and β_ϕ = 1 − E_Q[ϕ], respectively. In this sense, f-DP characterizes the tradeoff between type I and type II error rates. The tradeoff function and f-DP are formally defined as follows.

Definition 2 (tradeoff function [15]). For any two probability distributions P and Q on the same space, the tradeoff function T(P, Q): [0, 1] → [0, 1] is defined as T(P, Q)(α) = inf{β_ϕ : α_ϕ ≤ α}, where the infimum is taken over all (measurable) rejection rules ϕ.

Definition 3 (f-DP [15]). Let f be a tradeoff function. With a slight abuse of notation, a mechanism M is f-differentially private if T(M(S), M(S′)) ≥ f for all neighboring datasets S and S′, which suggests that the attacker cannot achieve a type II error rate smaller than f(α).

f-DP can be converted to (ϵ, δ)-DP as follows.

Lemma 1 ([15]). A mechanism is f(α)-differentially private if and only if it is (ϵ, δ)-differentially private with

f(α) = max{0, 1 − δ − e^ϵ α, e^{−ϵ}(1 − δ − α)}.  (3)

Finally, we introduce a special case of f-DP with f(α) = Φ(Φ^{−1}(1 − α) − µ), which is denoted as µ-GDP. More specifically, µ-GDP corresponds to the tradeoff function of two normal distributions with means 0 and µ, respectively, and a variance of 1.

## 4 Tight f-DP Analysis for Existing Discrete-Valued Mechanisms

In this section, we derive the f-DP guarantees for a variety of existing differentially private discrete-valued mechanisms in the scalar case (i.e., d = 1) to illustrate the main ideas. The vector case will be discussed in Section 6. More specifically, according to Definition 3, the f-DP of a mechanism M is given by the infimum of the tradeoff function over all neighboring datasets S and S′, i.e., f(α) = inf_{S,S′} inf_ϕ{β_ϕ(α) : α_ϕ ≤ α}. Therefore, the analysis consists of two steps: 1) we obtain the closed-form expressions of the tradeoff functions, i.e., inf_ϕ{β_ϕ(α) : α_ϕ ≤ α}, for a generic discrete-valued mechanism (see Section A in the supplementary material); and 2) given the tradeoff functions, we derive the f-DP by identifying the mechanism-specific infimums of the tradeoff functions over all possible neighboring datasets. We remark that the tradeoff functions for the discrete-valued mechanisms are essentially piece-wise functions, with both the domain and range of each piece determined by both the mechanisms and the datasets, which renders the analysis for the second step highly non-trivial.

### 4.1 Binomial Noise

In this subsection, we consider the binomial noise (i.e., Algorithm 1) proposed in [6], which serves as a communication-efficient alternative to the classic Gaussian noise. More specifically, the output of stochastic quantization in [6] is perturbed by a binomial random variable.

Algorithm 1 Binomial Noise [6]
Input: x_i ∈ {0, 1, ..., l}, i ∈ N, number of trials M, success probability p.
Privatization: Z_i = x_i + Binom(M, p).

Theorem 1. Let Z̃ = Binom(M, p); the binomial noise mechanism in Algorithm 1 is f_bn(α)-differentially private with

f_bn(α) = min{β⁺_{ϕ,inf}(α), β⁻_{ϕ,inf}(α)},  (4)

β⁺_{ϕ,inf}(α) = P(Z̃ ≥ k + l) + [P(Z̃ = k + l)/P(Z̃ = k)]·[P(Z̃ < k) − α], for α ∈ [P(Z̃ < k), P(Z̃ ≤ k)], k ∈ [0, M − l]; and β⁺_{ϕ,inf}(α) = 0, for α ∈ [P(Z̃ ≤ M − l), 1],

β⁻_{ϕ,inf}(α) = P(Z̃ ≤ k − l) + [P(Z̃ = k − l)/P(Z̃ = k)]·[P(Z̃ > k) − α], for α ∈ [P(Z̃ > k), P(Z̃ ≥ k)], k ∈ [l, M]; and β⁻_{ϕ,inf}(α) = 0, for α ∈ [P(Z̃ ≥ l), 1].

Given that P(Z̃ = k) = (M choose k)·p^k(1 − p)^{M−k}, it can be readily shown that when p = 0.5, both β⁺_{ϕ,inf}(α) and β⁻_{ϕ,inf}(α) are maximized, and f(α) = β⁺_{ϕ,inf}(α) = β⁻_{ϕ,inf}(α).
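To make the mechanism and its tradeoff function concrete, the following Python sketch (our own illustration, not the authors' code; the function names are ours) simulates Algorithm 1 and evaluates the piecewise-linear β⁺_{ϕ,inf}(α) of Theorem 1 by locating the segment [P(Z̃ < k), P(Z̃ ≤ k)] that contains α, using the binomial pmf/cdf from `scipy.stats`.

```python
import numpy as np
from scipy.stats import binom

def binomial_noise(x, M=500, p=0.5, rng=None):
    """Algorithm 1: privatize x in {0, ..., l} by adding Binom(M, p) noise."""
    rng = rng or np.random.default_rng()
    return x + rng.binomial(M, p)

def beta_plus_inf(alpha, l=8, M=500, p=0.5):
    """Evaluate beta^+_{phi,inf}(alpha) from Theorem 1; for p = 0.5 it equals f_bn(alpha).

    The tradeoff function is linear on each segment [P(Z < k), P(Z <= k)]:
    find the segment containing alpha and evaluate the corresponding line.
    """
    Z = binom(M, p)
    for k in range(M - l + 1):
        if Z.cdf(k - 1) <= alpha <= Z.cdf(k):
            return Z.sf(k + l - 1) + Z.pmf(k + l) / Z.pmf(k) * (Z.cdf(k - 1) - alpha)
    return 0.0  # alpha >= P(Z <= M - l): the type II error can be driven to zero

# Larger M pushes the curve toward f(alpha) = 1 - alpha, i.e., better privacy (cf. Fig. 1).
for M in (10, 50, 500):
    print(M, round(beta_plus_inf(0.3, M=M), 4))
```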
Fig. 1 shows the impact of M when l = 8, which confirms the result in [6] that a larger M provides better privacy protection (recall that given the same α, a larger β_α means that the attacker is more likely to make mistakes in the hypothesis testing, and therefore corresponds to better privacy protection). Note that the output of Algorithm 1 satisfies Z_i ∈ {0, 1, ..., M + l}, which requires a communication overhead of log₂(M + l + 1) bits. We can readily convert f(α)-DP to (ϵ, δ)-DP by utilizing Lemma 1.

[Figure 1: Impact of M on Algorithm 1 with l = 8, for M ∈ {10, 50, 100, 200, 500} and p = 0.5.]

Remark 1. The results derived in this work improve upon [6] in two aspects: (1) Theorem 1 in [6] requires Mp(1 − p) ≥ max(23 log(10d/δ), 2l/s) ≥ max(23 log(10), 2l/s), in which 1/s ∈ N is some scaling factor. When p = 1/2, this requires M ≥ 212. More specifically, for M = 500, [6] requires δ > 0.044. Our results imply that there exist some (ϵ, δ) such that Algorithm 1 is (ϵ, δ)-DP as long as M > l. For M = 500, δ can be as small as 4.61 × 10⁻¹³⁶. (2) Our results are tight, in the sense that no relaxation is applied in our derivation. As an example, when M = 500 and p = 0.5, Theorem 1 in [6] gives (3.18, 0.044)-DP while Theorem 1 in this paper yields (1.67, 0.039)-DP.

### 4.2 Binomial Mechanism

Algorithm 2 Binomial Mechanism [9]
Input: c > 0, x_i ∈ [−c, c], M ∈ N, p_i(x_i) ∈ [p_min, p_max].
Privatization: Z_i = Binom(M, p_i(x_i)).

In this subsection, we consider the binomial mechanism (i.e., Algorithm 2). Different from Algorithm 1, which perturbs the data with noise following the binomial distribution with the same success probability, the binomial mechanism encodes the input x_i into the success probability of the binomial distribution. We establish the privacy guarantee of Algorithm 2 as follows.

Theorem 2. The binomial mechanism in Algorithm 2 is f_bm(α)-differentially private with

f_bm(α) = min{β⁺_{ϕ,inf}(α), β⁻_{ϕ,inf}(α)},  (7)

in which

β⁺_{ϕ,inf}(α) = 1 − [P(Y < k) + γP(Y = k)] = P(Y ≥ k) + [P(Y = k)/P(X = k)]·[P(X < k) − α],

for α ∈ [P(X < k), P(X ≤ k)] and k ∈ {0, 1, 2, ..., M}, where X = Binom(M, p_max) and Y = Binom(M, p_min), and

β⁻_{ϕ,inf}(α) = 1 − [P(Y > k) + γP(Y = k)] = P(Y ≤ k) + [P(Y = k)/P(X = k)]·[P(X > k) − α],

for α ∈ [P(X > k), P(X ≥ k)] and k ∈ {0, 1, 2, ..., M}, where X = Binom(M, p_min) and Y = Binom(M, p_max). When p_max = 1 − p_min, we have β⁺_{ϕ,inf}(α) = β⁻_{ϕ,inf}(α).

Remark 2 (Comparison to [9]). The binomial mechanism is part of the Poisson binomial mechanism proposed in [9]. More specifically, in [9], each user i shares the output of the binomial mechanism Z_i with the server, in which p_i(x_i) = 1/2 + θx_i/c and θ is some design parameter. It can be readily verified that p_max = 1 − p_min in this case. The server then aggregates the result through x̂ = c/(MNθ)·(Σ_{i∈N} Z_i − MN/2). [9] requires secure aggregation and considers the privacy leakage of releasing x̂, while we complement it by showing the LDP, i.e., the privacy leakage of releasing Z_i for each user. In addition, we eliminate the constraint θ ∈ [0, 1/4], and the results hold for any selection of p_i(x_i). Moreover, the privacy guarantees in Theorem 2 are tight since no relaxation is involved.

Fig. 2 shows the impact of M on the privacy guarantee. In contrast to binomial noise, the privacy of the binomial mechanism improves as M (and, equivalently, the communication overhead) decreases, which implies that it is more suitable for communication-constrained scenarios. We also derive the f-DP of the Poisson binomial mechanism, which is presented in Section C in the supplementary material.
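A companion sketch for Algorithm 2 follows the same pattern. The encoding p_i(x) = 1/2 + θx/c below is the PBM-style choice from Remark 2 and serves only as one admissible example; Theorem 2 holds for any p_i(x) ∈ [p_min, p_max]. The helper names are ours.

```python
import numpy as np
from scipy.stats import binom

def binomial_mechanism(x, c, M, theta=0.25, rng=None):
    """Algorithm 2 with the PBM-style encoding p_i(x) = 1/2 + theta * x / c."""
    rng = rng or np.random.default_rng()
    return rng.binomial(M, 0.5 + theta * x / c)

def beta_plus_inf_bm(alpha, M, p_min, p_max):
    """beta^+_{phi,inf}(alpha) in Theorem 2, with X ~ Binom(M, p_max), Y ~ Binom(M, p_min)."""
    X, Y = binom(M, p_max), binom(M, p_min)
    for k in range(M + 1):
        if X.cdf(k - 1) <= alpha <= X.cdf(k):
            return Y.sf(k - 1) + Y.pmf(k) / X.pmf(k) * (X.cdf(k - 1) - alpha)
    return 0.0

# In contrast to binomial noise, privacy improves as M decreases (cf. Fig. 2).
for M in (1, 5, 20):
    print(M, round(beta_plus_inf_bm(0.2, M, p_min=0.25, p_max=0.75), 4))
```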
[Figure 2: Impact of M on Algorithm 2.]

In the following, we present two existing compressors that are special cases of the binomial mechanism.

Example 1. We first consider the following stochastic sign compressor proposed in [36].

Definition 4 (Two-Level Stochastic Compressor [36]). For any given x ∈ [−c, c], the compressor sto-sign outputs

sto-sign(x, A) = 1 with probability (A + x)/(2A), and −1 with probability (A − x)/(2A),

where A > c is the design parameter that controls the level of stochasticity.

With a slight modification (i.e., mapping the output space from {0, 1} to {−1, 1}), sto-sign(x, A) can be understood as a special case of the binomial mechanism with M = 1 and p_i(x_i) = (A + x_i)/(2A). In this case, we have p_max = (A + c)/(2A) and p_min = (A − c)/(2A). Applying the results in Theorem 2 yields

f_sto-sign(α) = β⁺_{ϕ,inf}(α) = β⁻_{ϕ,inf}(α) = 1 − [(A + c)/(A − c)]·α, for α ∈ [0, (A − c)/(2A)]; and [(A − c)/(A + c)]·(1 − α), for α ∈ [(A − c)/(2A), 1].  (9)

Combining (9) with (3) suggests that the sto-sign compressor ensures (ln((A + c)/(A − c)), 0)-DP.

Example 2. The second sign-based compressor that we examine is CLDP_∞(ϵ) [23].

Definition 5 (CLDP_∞(ϵ) [23]). For any given x ∈ [−c, c], the compressor CLDP_∞(ϵ) outputs

+1 with probability 1/2 + (x/(2c))·(e^ϵ − 1)/(e^ϵ + 1), and −1 with probability 1/2 − (x/(2c))·(e^ϵ − 1)/(e^ϵ + 1).  (10)

CLDP_∞(ϵ) can be understood as a special case of sto-sign(x, A) with A = c(e^ϵ + 1)/(e^ϵ − 1). In this case, according to (9), we have

f_CLDP∞(α) = 1 − e^ϵ α, for α ∈ [0, (A − c)/(2A)]; and e^{−ϵ}(1 − α), for α ∈ [(A − c)/(2A), 1].  (11)

Combining the above result with (3) suggests that CLDP_∞(ϵ) ensures (ϵ, 0)-DP, which recovers the result in [23]. It is worth mentioning that CLDP_∞(ϵ) can be understood as the composition of sto-sign with A = c followed by the randomized response mechanism [22], and is equivalent to the one-dimensional case of the compressor in [13]. Moreover, the one-dimensional case of the schemes in [10, 11] can also be understood as special cases of sto-sign.

## 5 The Proposed Ternary Compressor

The output of the binomial mechanism with M = 1 lies in the set {0, 1}, which coincides with the sign-based compressors. In this section, we extend the analysis to the ternary case, which can be understood as a combination of sign-based quantization and sparsification (when the output takes value 0, no transmission is needed since it does not contain any information) and leads to improved communication efficiency. More specifically, we propose the following ternary compressor.

Definition 6 (Ternary Stochastic Compressor). For any given x ∈ [−c, c], the compressor ternary outputs ternary(x, A, B), which is given by

ternary(x, A, B) = 1 with probability (A + x)/(2B); 0 with probability 1 − A/B; and −1 with probability (A − x)/(2B),

where B > A > c are the design parameters that control the level of sparsity.

For the ternary stochastic compressor in Definition 6, we establish its privacy guarantee as follows.

Theorem 3. The ternary stochastic compressor is f_ternary(α)-differentially private with

f_ternary(α) = 1 − [(A + c)/(A − c)]·α, for α ∈ [0, (A − c)/(2B)]; 1 − c/B − α, for α ∈ [(A − c)/(2B), 1 − (A + c)/(2B)]; and [(A − c)/(A + c)]·(1 − α), for α ∈ [1 − (A + c)/(2B), 1].  (13)

Remark 3 (Privacy amplification by sparsification). It can be observed from (9) and (13) that f_ternary(α) > f_sto-sign(α) when α ∈ ((A − c)/(2B), 1 − (A + c)/(2B)), and f_ternary(α) = f_sto-sign(α) otherwise. Fig. 3 shows f_ternary(α) and f_sto-sign(α) for c = 0.1, A = 0.25, B = 0.5, and the shaded gray area corresponds to the improvement in privacy. It can be observed that communication efficiency and privacy are improved simultaneously.

[Figure 3: Sparsification improves privacy; the ternary compressor additionally satisfies (ln(2), 0.05)-DP.]
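The ternary compressor and its closed-form tradeoff function (13) translate directly into code. The sketch below (our illustration) reproduces the amplification of Remark 3 numerically by comparing a sparsifying choice B > A against the sto-sign case B = A in the Fig. 3 setting.

```python
import numpy as np

def ternary(x, A, B, rng=None):
    """Definition 6: output 1 w.p. (A+x)/(2B), 0 w.p. 1 - A/B, -1 w.p. (A-x)/(2B)."""
    rng = rng or np.random.default_rng()
    return rng.choice([1, 0, -1], p=[(A + x) / (2 * B), 1 - A / B, (A - x) / (2 * B)])

def f_ternary(alpha, c, A, B):
    """Closed-form tradeoff function of Theorem 3; B = A recovers f_sto-sign in (9)."""
    p_min, p_max = (A - c) / (2 * B), (A + c) / (2 * B)
    if alpha <= p_min:
        return 1 - (p_max / p_min) * alpha   # steep segment, slope -(A+c)/(A-c)
    if alpha <= 1 - p_max:
        return 1 - c / B - alpha             # middle segment created by sparsification
    return (p_min / p_max) * (1 - alpha)     # shallow tail segment

# Fig. 3 setting: the middle segment lifts the curve above the sto-sign one.
c, A = 0.1, 0.25
for alpha in (0.2, 0.5):
    print(alpha, f_ternary(alpha, c, A, B=0.5), f_ternary(alpha, c, A, B=A))
```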
It is worth mentioning that if we convert the privacy guarantees to (ϵ, 0)-DP, we have ϵ = ln(7/3) for both compressors. However, the ternary compressor ensures (ln(2), 0.05)-DP (i.e., f_ternary(α) ≥ max{0, 0.95 − 2α, 0.5(0.95 − α)}) while the sto-sign compressor does not. We note that for the same A, as B increases (i.e., as the communication cost decreases), f_ternary(α) approaches f(α) = 1 − α (which corresponds to perfect privacy). In the following, we present a special case of the proposed ternary stochastic compressor.

Example 3. The ternary-based compressor proposed in [37] is formally defined as follows.

Definition 7 (ternarize(·) [37]). For any given x ∈ [−c, c], the compressor ternarize(·) outputs ternarize(x, B) = sign(x) with probability |x|/B and ternarize(x, B) = 0 otherwise, in which B > c is the design parameter.

ternarize(x, B) can be understood as a special case of ternary(x, A, B) with A = |x|. According to Theorem 3, f_ternary(α) = 1 − c/B − α for α ∈ [0, 1 − c/B] and f_ternary(α) = 0 for α ∈ [1 − c/B, 1]. Combining the above result with (3), we have δ = c/B and ϵ = 0, i.e., ternarize(·) provides perfect privacy protection (ϵ = 0) with a violation probability of δ = c/B. Specifically, the attacker cannot distinguish x_i from x′_i if the output of ternarize(·) is 0 (perfect privacy protection), while no differential privacy is provided if the output of ternarize(·) is non-zero (violation of the privacy guarantee).

Remark 4. It is worth mentioning that, in [37], the users transmit a scaled version of ternarize(·), and the scaling factor reveals the magnitude information of x_i. Therefore, the compressor in [37] is not differentially private.

## 6 Breaking the Communication-Privacy-Accuracy Tradeoff

In this section, we extend the results in Section 5 to the vector case via two different approaches, followed by discussions on the three-way tradeoff between communication, privacy, and accuracy. The results in Section 4 can be extended similarly. Specifically, in the first approach, we derive the µ-GDP guarantee in closed form, at the cost of some loss in the privacy guarantee. In the second approach, a tight approximation is presented.

Given the results in Section 5, we can readily convert f-DP in the scalar case to Gaussian differential privacy in the vector case as follows.

Theorem 4. Given a vector x_i = [x_{i,1}, x_{i,2}, ..., x_{i,d}] with |x_{i,j}| ≤ c, ∀j, applying the ternary compressor to the j-th coordinate of x_i independently yields µ-GDP with µ = 2Φ^{−1}(1/(1 + ((A + c)/(A − c))^{−d})).

Remark 5. Note that ||x_i||₂ ≤ c is a sufficient condition for |x_{i,j}| ≤ c, ∀j. In the proof of Theorem 4, we first convert f_ternary(α)-DP to (ϵ, 0)-DP for the scalar case, and then obtain (dϵ, 0)-DP for the d-dimensional case, followed by the conversion to GDP. One may notice that some loss in the privacy guarantee is introduced, since the extreme case |x_{i,j}| = c, ∀j actually violates the condition ||x_i||₂ ≤ c. To address this issue, following a similar method as in [13, 38, 9], one may introduce Kashin's representation to transform the l₂ geometry of the data into the l_∞ geometry. More specifically, [39] shows that for D > d, there exists a tight frame U such that one can always represent each x_i ∈ R^d by some y_i ∈ [−γ₀/√d, γ₀/√d]^D with x_i = U y_i, for some constant γ₀.

In Theorem 4, some loss in privacy guarantees is introduced when we convert f-DP to µ-GDP. In fact, since each coordinate of the vector is processed independently, the extension from the scalar case to the d-dimensional case may be understood as the d-fold composition of the mechanism in the scalar case.
The composed result can be well approximated or numerically obtained via the central limit theorem for f-DP in [15] or the Edgeworth expansion in [33]. In the following, we present the result for the ternary compressor obtained by utilizing the central limit theorem for f-DP.

Theorem 5. For a vector x_i = [x_{i,1}, x_{i,2}, ..., x_{i,d}] with |x_{i,j}| ≤ c, ∀j, the ternary compressor with B ≥ A > c is f_ternary(α)-DP with

G_µ(α + γ) − γ ≤ f_ternary(α) ≤ G_µ(α − γ) + γ,  (14)

in which

µ = 2√d·c/√(AB − c²),  (15)

and γ = 0.56·h(A, B, c)·d^{−1/2} for a Berry-Esseen-type constant h(A, B, c) given in the supplementary material.

Given the above results, we investigate the communication-privacy-accuracy tradeoff and compare the proposed ternary stochastic compressor with the state-of-the-art method SQKR in [13] and the classic Gaussian mechanism. According to the discussion in Remark 5, given an l₂ norm constraint, Kashin's representation can be applied to transform it into the l_∞ geometry. Therefore, for ease of discussion, we consider the setting in which each user i stores a vector x_i = [x_{i,1}, x_{i,2}, ..., x_{i,d}] with |x_{i,j}| ≤ c = C/√d, ∀j, and ||x_i||₂ ≤ C.

Ternary Stochastic Compressor: Let Z_{i,j} = ternary(x_{i,j}, A, B); then E[B·Z_{i,j}] = x_{i,j} and Var(B·Z_{i,j}) = AB − x²_{i,j}. In this sense, applying the ternary stochastic compressor to each coordinate of x_i independently yields an unbiased estimator with a variance of ABd − ||x_i||²₂. The privacy guarantee is given by Theorem 5, and the communication overhead is (log₂(d) + 1)·(A/B)·d bits in expectation.

SQKR: In SQKR, each user first quantizes each coordinate of x_i to {−c, c} with 1-bit stochastic quantization. Then, it samples k coordinates (with replacement) and privatizes the k-bit message via the 2^k-Random Response mechanism with ϵ-LDP [14]. The SQKR mechanism yields an unbiased estimator with a variance of (d/k)·((e^ϵ + 2^k − 1)/(e^ϵ − 1))²·C² − ||x_i||²₂. The privacy guarantee is ϵ-LDP, and the corresponding communication overhead is (log₂(d) + 1)·k bits.

Gaussian Mechanism: We apply the Gaussian mechanism (i.e., adding independent zero-mean Gaussian noise n_{i,j} ~ N(0, σ²) to x_{i,j}), followed by sparsification with probability 1 − A/B as in ternary(x_{i,j}, A, B), which gives Z^Gauss_{i,j} = (B/A)·(x_{i,j} + n_{i,j}) with probability A/B and Z^Gauss_{i,j} = 0 otherwise. It can be observed that E[Z^Gauss_{i,j}] = x_{i,j} and Var(Z^Gauss_{i,j}) = (B/A)·σ² + (B/A − 1)·x²_{i,j}. Therefore, the Gaussian mechanism yields an unbiased estimator with a variance of (B/A)·σ²d + (B/A − 1)·||x_i||²₂. By utilizing the post-processing property, it can be shown that the above Gaussian mechanism is 2√d·c/σ-GDP [15], and the communication overhead is (log₂(d) + 32)·(A/B)·d bits in expectation.

Discussion: It can be observed that for SQKR, with a given privacy guarantee ϵ-LDP, the variance (i.e., MSE) depends on k (i.e., the communication overhead). When e^ϵ ≪ 2^k (which corresponds to the high-privacy regime), the variance grows rapidly as k increases. For the proposed ternary stochastic compressor, it can be observed that both the privacy guarantee (in terms of µ-GDP) and the variance depend on AB. Particularly, with a given privacy guarantee µ < √(4dr/(1 − r)) for r = A/B, the variance is given by (4d/µ² + 1)·C² − ||x_i||²₂, which remains the same regardless of the communication overhead.
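The decoupling claimed above can be checked numerically. The sketch below (our illustration; it uses the µ-GDP expression µ = 2√d·c/√(AB − c²) from Theorem 5) matches the ternary compressor to a Gaussian mechanism via AB = c² + σ² and sweeps the sparsity ratio r = A/B at a fixed product AB: the privacy level and MSE stay constant while only the expected communication cost changes.

```python
import numpy as np

d, C = 250, 1.0
c = C / np.sqrt(d)      # per-coordinate bound, |x_{i,j}| <= c = C / sqrt(d)
sigma = 2.0             # Gaussian-mechanism noise level to match

# Match the ternary compressor to the Gaussian mechanism via AB = c^2 + sigma^2,
# then sweep the sparsity ratio r = A/B at a fixed product AB.
for r in (0.2, 0.6, 1.0):
    B = np.sqrt((c**2 + sigma**2) / r)
    A = r * B
    mu = 2 * np.sqrt(d) * c / np.sqrt(A * B - c**2)  # Theorem 5; equals the Gaussian 2*sqrt(d)*c/sigma
    mse = A * B * d                                   # MSE up to the data-dependent -||x_i||_2^2 term
    bits = (np.log2(d) + 1) * r * d                   # expected communication overhead
    print(f"r={r:.1f}: mu={mu:.3f}, MSE={mse:.3f}, bits={bits:.0f}")
```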
In this sense, we essentially remove the dependency of accuracy on the communication overhead and therefore break the three-way tradeoff between communication overhead, privacy, and accuracy.¹ This is mainly realized by accounting for privacy amplification by sparsification. At a high level, when fewer coordinates are shared (which corresponds to a larger privacy amplification and a larger MSE), the ternary stochastic compressor introduces less ambiguity to each coordinate (which corresponds to worse privacy protection and a smaller MSE), such that both the privacy guarantee and the MSE remain the same.

¹ In practice, utilizing the closed-form expressions of the MSE and the privacy guarantee µ, one may readily obtain the corresponding A and B for any given privacy/MSE and communication cost specifications.

[Figure 4: For the left figure, we set k = 10 and derive the corresponding variance for SQKR with ϵ ∈ {1, 2, 5}, based on which A and B for the ternary stochastic compressor are computed such that they have the same communication overhead and MSE in expectation. The middle and right figures show the tradeoff between µ-GDP and MSE. For the middle figure, we set σ ∈ {2/3, 1, 2, 4, 6, 8, 10} for the Gaussian mechanism, given which A and B are computed such that AB = c² + σ² for each sparsity ratio A/B ∈ {0.2, 0.4, 0.6, 0.8, 1.0}. For the right figure, we set A ∈ {5c, 10c, 20c, 30c} and A/B ∈ {0.2, 0.4, 0.6, 0.8, 1.0}, given which the corresponding σ's are computed such that AB = c² + σ².]

Since we use different differential privacy measures from [13] (i.e., µ-GDP in this work and ϵ-DP in [13]), we focus on the comparison between the proposed ternary stochastic compressor and the Gaussian mechanism (which is order-optimal in most parameter regimes, see [30]) in the following discussion, and present the detailed comparison with SQKR in the experiments in Section 7.

Let AB = c² + σ²; it can be observed that the f-DP guarantee of the ternary compressor approaches that of the Gaussian mechanism as d increases, and the corresponding variance is given by Var(B·Z_{i,j}) = σ² + c² − x²_{i,j}. When A = B, i.e., no sparsification is applied, we have Var(B·Z_{i,j}) − Var(Z^Gauss_{i,j}) = c² − x²_{i,j}. Specifically, when x_{i,j} ∈ {−c, c}, 1 ≤ j ≤ d, the ternary compressor demonstrates the same f-DP privacy guarantee and variance as those of the Gaussian mechanism, i.e., the improvement in communication efficiency is obtained for free (in the large d regime). When B > A, we have Var(B·Z_{i,j}) − Var(Z^Gauss_{i,j}) = (1 − B/A)·σ² + c² − (B/A)·x²_{i,j}, and there exists some B such that the ternary compressor outperforms the Gaussian mechanism in terms of both variance and communication efficiency. It is worth mentioning that the privacy guarantee of the Gaussian mechanism is derived by utilizing the post-processing property. We believe that sparsification brings an improvement in privacy for the Gaussian mechanism as well, which is, however, beyond the scope of this paper.

Optimality: It has been shown that, for k-bit unbiased compression mechanisms, there is a lower bound of Ω(C²d/k) on the MSE [40]. For the proposed ternary compressor, the MSE and the communication cost are given by O(ABd) and (A/B)·(log₂(d) + 1)·d bits, respectively. Let k = (A/B)·(log₂(d) + 1)·d; then it achieves an MSE of O(A²d²(log₂(d) + 1)/k).
Since A > c = C/√d, the MSE of the ternary compressor is given by O(C²d(log₂(d) + 1)/k), which implies that it is order-optimal up to a factor of log₂(d). Note that the factor of log₂(d) is used to represent the indices of the coordinates that are non-zero, and it can be eliminated by allowing for shared randomness between the users and the server.

## 7 Experiments

In this section, we examine the performance of the proposed ternary compressor in the case of distributed mean estimation. We follow the setup of [9] and generate N = 1000 user vectors with dimension d = 250, i.e., x_1, ..., x_N ∈ R²⁵⁰. Each local vector has bounded l₂ and l_∞ norms, i.e., ||x_i||₂ ≤ C = 1 and ||x_i||_∞ ≤ c = 1/√d.

Fig. 4 compares the proposed ternary stochastic compressor with SQKR and the Gaussian mechanism. More specifically, the left figure in Fig. 4 compares the privacy guarantees (in terms of the tradeoff between type I and type II error rates) of the ternary stochastic compressor and SQKR given the same communication overhead and MSE. It can be observed that the proposed ternary stochastic compressor outperforms SQKR in terms of privacy preservation, i.e., given the same type I error rate α, the type II error rate β of the ternary stochastic compressor is significantly larger than that of SQKR, which implies better privacy protection. For example, for SQKR with ϵ = 2, given type I error rate α = 0.5, the type II error rate of the attacker is around f_SQKR(α) = 0.068, while the ternary compressor attains f_ternary(α) = 0.484. Given the same MSE and communication cost as those of SQKR with ϵ_SQKR ∈ {1, 2, 5}, if we translate the privacy guarantees of the ternary compressor from f-DP to ϵ-DP via Lemma 1 (we numerically test different ϵ's such that f_ternary(α) ≥ max{0, 1 − δ − e^ϵα, e^{−ϵ}(1 − δ − α)} holds for δ = 0), we have ϵ_ternary ∈ {0.05, 0.2, 3.9} for the ternary compressor, which demonstrates its effectiveness.

The middle and right figures in Fig. 4 show the tradeoff between MSE and DP guarantees for the Gaussian mechanism and the proposed ternary compressor. Particularly, in the middle figure, the tradeoff curves for the ternary compressor with all the examined sparsity ratios overlap with that of the Gaussian mechanism with A/B = 1, since they essentially have the same privacy guarantees and the difference in MSE is negligible. For the Gaussian mechanism with A/B < 1, the MSE is larger due to sparsification, which validates our discussion in Section 6. In the right figure, we examine the MSEs of the proposed ternary compressor with various A's and B's. It can be observed that the corresponding tradeoff between MSE and privacy guarantee matches that of the Gaussian mechanism well, which validates that the improvement in communication efficiency for the proposed ternary compressor is obtained for free.

## 8 Limitation

The main results derived in this paper are for the scalar case, and they are extended to the vector case by invoking the central limit theorem. In this case, the privacy guarantees derived in Theorem 5 are tight only in the large d regime. Fortunately, in applications like distributed learning, d corresponds to the model size (usually on the order of millions for modern neural networks).
Moreover, although the privacy-accuracy tradeoff of the proposed ternary compressor matches that of the Gaussian mechanism, which is order-optimal in (ϵ, δ)-DP, the optimality of the proposed ternary compressor in the f-DP regime remains to be established.

## 9 Conclusion

In this paper, we derived the privacy guarantees of discrete-valued mechanisms with finite output space through the lens of f-differential privacy, which covers various differentially private mechanisms and compression mechanisms as special cases. By leveraging the privacy amplification of sparsification, a ternary compressor that achieves a better accuracy-privacy-communication tradeoff than existing methods is proposed. It is expected that the proposed methods can find broader applications in the design of communication-efficient and differentially private federated data analysis techniques.

## Acknowledgments and Disclosure of Funding

Richeng Jin was supported in part by the National Natural Science Foundation of China under Grant No. 62301487, in part by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LQ23F010021, and in part by the Ng Teng Fong Charitable Foundation in the form of ZJU-SUTD IDEA Grant No. 188170-11102. Zhonggen Su was supported by the Fundamental Research Funds for the Central Universities Grants. Zhaoyang Zhang was supported in part by the National Natural Science Foundation of China under Grant No. U20A20158, in part by the National Key R&D Program of China under Grant No. 2020YFB1807101, and in part by the Zhejiang Provincial Key R&D Program under Grant No. 2023C01021. Huaiyu Dai was supported by the US National Science Foundation under Grant No. ECCS-2203214. The views expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation.

## References

[1] D. Wang, S. Shi, Y. Zhu, and Z. Han, "Federated analytics: Opportunities and challenges," IEEE Network, vol. 36, no. 1, pp. 151–158, 2021.

[2] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," in Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282.

[3] P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings et al., "Advances and open problems in federated learning," Foundations and Trends in Machine Learning, vol. 14, no. 1, 2021.

[4] C. Dwork, F. McSherry, K. Nissim, and A. Smith, "Calibrating noise to sensitivity in private data analysis," in Theory of Cryptography Conference. Springer, 2006, pp. 265–284.

[5] I. Mironov, "On significance of the least significant bits for differential privacy," in Proceedings of the 2012 ACM Conference on Computer and Communications Security, 2012, pp. 650–661.

[6] N. Agarwal, A. T. Suresh, F. X. X. Yu, S. Kumar, and B. McMahan, "cpSGD: Communication-efficient and differentially-private distributed SGD," in Advances in Neural Information Processing Systems, 2018, pp. 7564–7575.

[7] P. Kairouz, Z. Liu, and T. Steinke, "The distributed discrete Gaussian mechanism for federated learning with secure aggregation," in International Conference on Machine Learning. PMLR, 2021, pp. 5201–5212.

[8] N. Agarwal, P. Kairouz, and Z. Liu, "The Skellam mechanism for differentially private federated learning," Advances in Neural Information Processing Systems, vol. 34, pp. 5052–5064, 2021.

[9] W.-N. Chen, A. Özgür, and P.
Kairouz, "The Poisson binomial mechanism for unbiased federated learning with secure aggregation," in International Conference on Machine Learning. PMLR, 2022, pp. 3490–3506.

[10] T. T. Nguyên, X. Xiao, Y. Yang, S. C. Hui, H. Shin, and J. Shin, "Collecting and analyzing data from smart device users with local differential privacy," arXiv preprint arXiv:1606.05053, 2016.

[11] T. Wang, J. Zhao, X. Yang, and X. Ren, "Locally differentially private data collection and analysis," arXiv preprint arXiv:1906.01777, 2019.

[12] V. Gandikota, D. Kane, R. K. Maity, and A. Mazumdar, "vqSGD: Vector quantized stochastic gradient descent," in International Conference on Artificial Intelligence and Statistics. PMLR, 2021, pp. 2197–2205.

[13] W.-N. Chen, P. Kairouz, and A. Özgür, "Breaking the communication-privacy-accuracy trilemma," Advances in Neural Information Processing Systems, vol. 33, pp. 3312–3324, 2020.

[14] A. T. Suresh, X. Y. Felix, S. Kumar, and H. B. McMahan, "Distributed mean estimation with limited communication," in International Conference on Machine Learning. PMLR, 2017, pp. 3329–3337.

[15] J. Dong, A. Roth, and W. Su, "Gaussian differential privacy," Journal of the Royal Statistical Society, 2021.

[16] I. Mironov, "Rényi differential privacy," in IEEE Computer Security Foundations Symposium (CSF). IEEE, 2017, pp. 263–275.

[17] A. El Ouadrhiri and A. Abdelhadi, "Differential privacy for deep and federated learning: A survey," IEEE Access, vol. 10, pp. 22359–22380, 2022.

[18] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor, "Our data, ourselves: Privacy via distributed noise generation," in Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 2006, pp. 486–503.

[19] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith, "What can we learn privately?" SIAM Journal on Computing, vol. 40, no. 3, pp. 793–826, 2011.

[20] C. L. Canonne, G. Kamath, and T. Steinke, "The discrete Gaussian for differential privacy," Advances in Neural Information Processing Systems, vol. 33, pp. 15676–15688, 2020.

[21] G. Cormode and I. L. Markov, "Bit-efficient numerical aggregation and stronger privacy for trust in federated analytics," arXiv preprint arXiv:2108.01521, 2021.

[22] P. Kairouz, S. Oh, and P. Viswanath, "Extremal mechanisms for local differential privacy," The Journal of Machine Learning Research, vol. 17, no. 1, pp. 492–542, 2016.

[23] A. Girgis, D. Data, S. Diggavi, P. Kairouz, and A. T. Suresh, "Shuffled model of differential privacy in federated learning," in International Conference on Artificial Intelligence and Statistics. PMLR, 2021, pp. 2521–2529.

[24] A. M. Girgis, D. Data, S. Diggavi, P. Kairouz, and A. T. Suresh, "Shuffled model of federated learning: Privacy, accuracy and communication trade-offs," IEEE Journal on Selected Areas in Information Theory, vol. 2, no. 1, pp. 464–478, 2021.

[25] V. Feldman and K. Talwar, "Lossless compression of efficient private local randomizers," in International Conference on Machine Learning. PMLR, 2021, pp. 3208–3219.

[26] A. Shah, W.-N. Chen, J. Balle, P. Kairouz, and L. Theis, "Optimal compression of locally differentially private mechanisms," in International Conference on Artificial Intelligence and Statistics. PMLR, 2022, pp. 7680–7723.

[27] K. Chaudhuri, C. Guo, and M. Rabbat, "Privacy-aware compression for federated data analysis," in The 38th Conference on Uncertainty in Artificial Intelligence, 2022.

[28] A. Koskela, J. Jälkö, L. Prediger, and A.
Honkela, "Tight differential privacy for discrete-valued mechanisms and for the subsampled Gaussian mechanism using FFT," in International Conference on Artificial Intelligence and Statistics. PMLR, 2021, pp. 3358–3366.

[29] A. Koskela, J. Jälkö, and A. Honkela, "Computing tight differential privacy guarantees using FFT," in International Conference on Artificial Intelligence and Statistics. PMLR, 2020, pp. 2560–2569.

[30] W.-N. Chen, D. Song, A. Özgür, and P. Kairouz, "Privacy amplification via compression: Achieving the optimal privacy-accuracy-communication trade-off in distributed mean estimation," arXiv preprint arXiv:2304.01541, 2023.

[31] C. Dwork and G. N. Rothblum, "Concentrated differential privacy," arXiv preprint arXiv:1603.01887, 2016.

[32] M. Bun, C. Dwork, G. N. Rothblum, and T. Steinke, "Composable and versatile privacy via truncated CDP," in Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018, pp. 74–86.

[33] Q. Zheng, J. Dong, Q. Long, and W. Su, "Sharp composition bounds for Gaussian differential privacy via Edgeworth expansion," in International Conference on Machine Learning. PMLR, 2020, pp. 11420–11435.

[34] Z. Bu, J. Dong, Q. Long, and W. J. Su, "Deep learning with Gaussian differential privacy," Harvard Data Science Review, vol. 2020, no. 23, 2020.

[35] Q. Zheng, S. Chen, Q. Long, and W. Su, "Federated f-differential privacy," in International Conference on Artificial Intelligence and Statistics. PMLR, 2021, pp. 2251–2259.

[36] R. Jin, Y. Huang, X. He, H. Dai, and T. Wu, "Stochastic-Sign SGD for federated learning with theoretical guarantees," arXiv preprint arXiv:2002.10940, 2020.

[37] W. Wen, C. Xu, F. Yan, C. Wu, Y. Wang, Y. Chen, and H. Li, "TernGrad: Ternary gradients to reduce communication in distributed deep learning," in Advances in Neural Information Processing Systems, 2017, pp. 1509–1519.

[38] M. Safaryan, E. Shulgin, and P. Richtárik, "Uncertainty principle for communication compression in distributed and federated learning and the search for an optimal compressor," arXiv preprint arXiv:2002.08958, 2020.

[39] Y. Lyubarskii and R. Vershynin, "Uncertainty principles and vector quantization," IEEE Transactions on Information Theory, vol. 56, no. 7, pp. 3491–3501, 2010.

[40] W.-N. Chen, C. A. C. Choo, P. Kairouz, and A. T. Suresh, "The fundamental price of secure aggregation in differentially private federated learning," in International Conference on Machine Learning. PMLR, 2022, pp. 3056–3089.

[41] E. L. Lehmann, J. P. Romano, and G. Casella, Testing Statistical Hypotheses. Springer, 2005, vol. 3.

[42] J. Lee, M. Kim, S. W. Kwak, and S. Jung, "Differentially private multivariate statistics with an application to contingency table analysis," arXiv preprint arXiv:2211.15019, 2022.

[43] Y. Liu, K. Sun, L. Kong, and B. Jiang, "Identification, amplification and measurement: A bridge to Gaussian differential privacy," arXiv preprint arXiv:2210.09269, 2022.

# Breaking the Communication-Privacy-Accuracy Tradeoff with f-Differential Privacy: Supplementary Material

## A Tradeoff Functions for a Generic Discrete-Valued Mechanism

We consider a general randomization protocol M(·) with a discrete and finite output space. In this case, we can always find a one-to-one mapping between the range of M(·) and a subset of Z. With such consideration, we assume that the output of the randomization protocol is an integer, i.e., M(S) ∈ Z_M ⊆ Z, ∀S, without loss of generality.
Given the randomization protocol and the hypothesis testing problem in (2), we derive its tradeoff function as a function of the type I error rate in the following lemma.

Lemma 2. For two neighboring datasets S and S′, suppose that the union of the ranges of the randomized mechanism is R(M(S)) ∪ R(M(S′)) = Z^U_M = [Z^U_L, ..., Z^U_R] ⊆ Z and the intersection is R(M(S)) ∩ R(M(S′)) = Z^I_M = [Z^I_L, ..., Z^I_R] ⊆ Z. Let X = M(S) and Y = M(S′). Then,

Case (1): If M(S) ∈ [Z^I_L, Z^I_L + 1, ..., Z^U_R], M(S′) ∈ [Z^U_L, Z^U_L + 1, ..., Z^I_R], and P(Y = k)/P(X = k) is a decreasing function of k for k ∈ Z^I_M, the tradeoff function in Definition 2 is given by

β⁺_ϕ(α) = P(Y ≥ k) + [P(Y = k)/P(X = k)]·[P(X < k) − α], if α ∈ (P(X < k), P(X ≤ k)], k ∈ [Z^I_L, Z^I_R]; and β⁺_ϕ(α) = 0, if α ∈ (P(X ≤ Z^I_R), 1].  (16)

Case (2): If M(S) ∈ [Z^U_L, Z^U_L + 1, ..., Z^I_R], M(S′) ∈ [Z^I_L, Z^I_L + 1, ..., Z^U_R], and P(Y = k)/P(X = k) is an increasing function of k for k ∈ Z^I_M, the tradeoff function in Definition 2 is given by

β⁻_ϕ(α) = P(Y ≤ k) + [P(Y = k)/P(X = k)]·[P(X > k) − α], if α ∈ (P(X > k), P(X ≥ k)], k ∈ [Z^I_L, Z^I_R]; and β⁻_ϕ(α) = 0, if α ∈ (P(X > Z^I_L − 1), 1].  (17)

Remark 6. It is assumed in Lemma 2 that P(Y = k)/P(X = k) is a decreasing function (for part (1)) or an increasing function (for part (2)) of k ∈ Z^I_M, without loss of generality. In practice, thanks to the post-processing property of DP [15], one can relabel the output of the mechanism to ensure that this condition holds, and Lemma 2 can be adapted accordingly.

Remark 7. We note that in Lemma 2, both X and Y depend on both the randomized mechanism M(·) and the neighboring datasets S and S′. Therefore, the infimums of the tradeoff functions in (16) and (17) are mechanism-specific, and they should be analyzed individually. After identifying the neighboring datasets S and S′ that minimize β⁺_ϕ(α) and β⁻_ϕ(α) for a mechanism M(·) (which is highly non-trivial), we can obtain the distributions of X and Y in (16) and (17) and derive the corresponding f-DP guarantees.

Remark 8. Since β⁺_ϕ(α) is a piecewise linear function with decreasing slopes w.r.t. k (see, e.g., Fig. 1), it can be readily shown that β⁺_ϕ(α) ≥ max{P(Y ≥ k) + [P(Y = k)/P(X = k)]·P(X < k) − [P(Y = k)/P(X = k)]·α, 0}, ∀k ∈ Z^I_M. As a result, utilizing Lemma 1, we may obtain different pairs of (ϵ, δ) given different k's.

Remark 9. Although we assume a finite output space, a similar method can be applied to mechanisms with an infinite range. Taking the discrete Gaussian noise [20] as an example, M(x) = x + V with P(V = v) = e^{−v²/(2σ²)} / Σ_{v′∈Z} e^{−v′²/(2σ²)}. One may easily verify that P(M(x_i) = k)/P(M(x′_i) = k) is a decreasing function of k if x′_i > x_i (and increasing otherwise). Then we can find some threshold ṽ for the rejection rule ϕ such that α_ϕ = P(M(x_i) ≥ ṽ) = α, and the corresponding β_ϕ(α) = 1 − P(M(x′_i) ≥ ṽ).

The key to proving Lemma 2 is finding the rejection rule ϕ such that β_ϕ(α) is minimized for a pre-determined α ∈ [0, 1]. To this end, we utilize the Neyman-Pearson Lemma [41], which states that for a given α, the most powerful rejection rule is threshold-based, i.e., if the likelihood ratio P(Y = k)/P(X = k) is larger than/equal to/smaller than a threshold h, H₀ is rejected with probability 1/γ/0. More specifically, since X and Y may have different ranges, we divide the discussion into two cases (i.e., Case (1) and Case (2) in Lemma 2). The Neyman-Pearson Lemma [41] is given as follows.

Lemma 3 (Neyman-Pearson Lemma [41]). Let P and Q be probability distributions on Ω with densities p and q, respectively. For the hypothesis testing problem H₀: P vs. H₁: Q, a test ϕ: Ω → [0, 1] is the most powerful test at level α if and only if there are two constants h ∈ [0, +∞] and γ ∈ [0, 1] such that ϕ has the form

ϕ(x) = 1 if q(x)/p(x) > h; γ if q(x)/p(x) = h; and 0 if q(x)/p(x) < h,  (18)

and E_P[ϕ] = α. The rejection rule suggests that H₀ is rejected with a probability of ϕ(x) given the observation x.

Given Lemma 3, the problem is then reduced to finding the corresponding h and γ such that the type I error rate α_ϕ = α.
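Lemma 2 can also be cross-checked numerically: for any discrete mechanism with finite support, the Neyman-Pearson construction below (our illustrative code) sorts outcomes by likelihood ratio and sweeps the randomized threshold, yielding the exact tradeoff curve. Applied to the ternary compressor of Definition 6 with the worst-case inputs x = c and x′ = −c, it matches Theorem 3.

```python
import numpy as np

def tradeoff_curve(p, q, alphas):
    """Exact T(P, Q)(alpha) for distributions p, q on a common finite support.

    Following the Neyman-Pearson lemma, outcomes are rejected in decreasing order of
    the likelihood ratio q/p; randomizing at the threshold makes the curve piecewise
    linear between the deterministic-test breakpoints.
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    order = np.argsort(-q / np.maximum(p, 1e-300))                 # most "Q-like" outcomes first
    alpha_bp = np.concatenate(([0.0], np.cumsum(p[order])))        # type I errors at breakpoints
    beta_bp = 1.0 - np.concatenate(([0.0], np.cumsum(q[order])))   # corresponding type II errors
    return np.interp(alphas, alpha_bp, beta_bp)

# Sanity check against Theorem 3: ternary compressor, worst-case inputs x = c, x' = -c.
c, A, B = 0.1, 0.25, 0.5
P = [(A + c) / (2 * B), 1 - A / B, (A - c) / (2 * B)]  # law of ternary(c)  on (+1, 0, -1)
Q = [(A - c) / (2 * B), 1 - A / B, (A + c) / (2 * B)]  # law of ternary(-c) on (+1, 0, -1)
print(tradeoff_curve(P, Q, np.array([0.15, 0.5])))      # expected: [0.65, 0.3]
```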
For part (1) (the results for part (2) can be shown similarly), we divide the range of α (i.e., [0, 1]) into multiple segments, as shown in Fig. 5. To achieve α = 0, we set h = +∞ and γ = 1, which suggests that the hypothesis H₀ is always rejected when k < Z^I_L and accepted otherwise. To achieve α ∈ (P(X < k), P(X ≤ k)] for k ∈ [Z^I_L, Z^I_R], we set h = P(Y = k)/P(X = k) and γ = [α − P(X < k)]/P(X = k). For α ∈ (P(X ≤ Z^I_R), 1], we set h = 0, in which case it can be shown that α_ϕ = α. The corresponding β_ϕ can be derived accordingly, which is given by (16). The complete proof is given below.

Proof. Given Lemma 3, the problem is reduced to finding the parameters h and γ in (18) such that E_P[ϕ] = α, which can be done as follows.

Case (1): We divide α ∈ [0, 1] into Z^U_R − Z^I_L + 1 segments: [P(X < Z^U_L), P(X < Z^I_L)], (P(X < Z^I_L), P(X ≤ Z^I_L)], ..., (P(X < k), P(X ≤ k)], ..., (P(X < Z^U_R), P(X ≤ Z^U_R)], as shown in Fig. 5.

[Figure 5: Dividing α into multiple segments for part (1).]

When α = P(X < Z^U_L) = P(X < Z^I_L) = 0, we set h = +∞. In this case, noticing that P(Y = k)/P(X = k) = h for k < Z^I_L and P(Y = k)/P(X = k) < h otherwise, we have

E_P[ϕ] = γP(X < Z^I_L) = 0 = α,  (19)

and

β⁺_ϕ(0) = 1 − E_Q[ϕ] = 1 − γP(Y < Z^I_L).  (20)

The infimum is attained when γ = 1, which yields β⁺_ϕ(0) = P(Y ≥ Z^I_L).

When α ∈ (P(X < k), P(X ≤ k)] for k ∈ [Z^I_L, Z^I_R], we set h = P(Y = k)/P(X = k). In this case, P(Y = k′)/P(X = k′) ≤ h for k′ ≥ k, and P(Y = k′)/P(X = k′) > h for k′ < k, and therefore

E_P[ϕ] = P(X < k) + γP(X = k).  (21)

We adjust γ such that E_P[ϕ] = α, which yields

γ = [α − P(X < k)]/P(X = k),  (22)

and

β⁺_ϕ(α) = 1 − [P(Y < k) + γP(Y = k)] = P(Y ≥ k) − P(Y = k)·[α − P(X < k)]/P(X = k) = P(Y ≥ k) + [P(Y = k)/P(X = k)]·P(X < k) − [P(Y = k)/P(X = k)]·α.  (23)

When α ∈ (P(X < k), P(X ≤ k)] for k ∈ (Z^I_R, Z^U_R], we set h = 0. In this case, P(Y = k′)/P(X = k′) = h for k′ > Z^I_R, and P(Y = k′)/P(X = k′) > h for k′ ≤ Z^I_R. As a result,

E_P[ϕ] = P(X ≤ Z^I_R) + γP(X > Z^I_R),  (24)

and

β⁺_ϕ(α) = 1 − [P(Y ≤ Z^I_R) + γP(Y > Z^I_R)] = 0.  (25)

Similarly, we can prove the second part of Lemma 2 as follows.

Case (2): We also divide α ∈ [0, 1] into Z^I_R − Z^U_L + 1 segments: [P(X > Z^U_L), P(X ≥ Z^U_L)], ..., (P(X > k), P(X ≥ k)], ..., (P(X > Z^I_R), P(X ≥ Z^I_R)], as shown in Fig. 6.

[Figure 6: Dividing α into multiple segments for part (2).]

When α ∈ (P(X > k), P(X ≥ k)] for k ∈ [Z^U_L, Z^I_L), we set h = 0. In this case,

E_P[ϕ] = P(X ≥ Z^I_L) + γP(X < Z^I_L),  (26)

and

β⁻_ϕ(α) = 1 − [P(Y ≥ Z^I_L) + γP(Y < Z^I_L)] = 0.  (27)

When α ∈ (P(X > k), P(X ≥ k)] for k ∈ [Z^I_L, Z^I_R], we set h = P(Y = k)/P(X = k). In this case,

E_P[ϕ] = P(X > k) + γP(X = k).  (28)

Setting E_P[ϕ] = α yields

γ = [α − P(X > k)]/P(X = k),  (29)

and

β⁻_ϕ(α) = 1 − [P(Y > k) + γP(Y = k)] = P(Y ≤ k) − P(Y = k)·[α − P(X > k)]/P(X = k) = P(Y ≤ k) + [P(Y = k)/P(X = k)]·P(X > k) − [P(Y = k)/P(X = k)]·α.  (30)

When α = P(X > Z^I_R) = 0, we set h = +∞. In this case,

E_P[ϕ] = γP(X > Z^I_R) = 0 = α,  (31)

and

β⁻_ϕ(0) = 1 − E_Q[ϕ] = 1 − γP(Y > Z^I_R).  (32)

The infimum is attained when γ = 1, which yields β⁻_ϕ(0) = P(Y ≤ Z^I_R).

## B Proofs of Theoretical Results

### B.1 Proof of Theorem 1

Theorem 1. Let Z̃ = Binom(M, p); the binomial noise mechanism in Algorithm 1 is f_bn(α)-differentially private with

f_bn(α) = min{β⁺_{ϕ,inf}(α), β⁻_{ϕ,inf}(α)},  (33)

β⁺_{ϕ,inf}(α) = P(Z̃ ≥ k + l) + [P(Z̃ = k + l)/P(Z̃ = k)]·[P(Z̃ < k) − α], for α ∈ [P(Z̃ < k), P(Z̃ ≤ k)], k ∈ [0, M − l]; and β⁺_{ϕ,inf}(α) = 0, for α ∈ [P(Z̃ ≤ M − l), 1],

β⁻_{ϕ,inf}(α) = P(Z̃ ≤ k − l) + [P(Z̃ = k − l)/P(Z̃ = k)]·[P(Z̃ > k) − α], for α ∈ [P(Z̃ > k), P(Z̃ ≥ k)], k ∈ [l, M]; and β⁻_{ϕ,inf}(α) = 0, for α ∈ [P(Z̃ ≥ l), 1].
Given that P(Z̃ = k) = (M choose k)·p^k(1 − p)^{M−k}, it can be readily shown that when p = 0.5, both β⁺_{ϕ,inf}(α) and β⁻_{ϕ,inf}(α) are maximized, and f(α) = β⁺_{ϕ,inf}(α) = β⁻_{ϕ,inf}(α).

Before proving Theorem 1, we first show the following lemma.

Lemma 4. Let X = x_i + Binom(M, p) and Y = x′_i + Binom(M, p). Then, if x_i > x′_i,

β⁺_ϕ(α) = P(Y ≥ k) + [P(Y = k)/P(X = k)]·[P(X < k) − α], if α ∈ (P(X < k), P(X ≤ k)], k ∈ [x_i, x′_i + M]; and β⁺_ϕ(α) = 0, if α ∈ (P(X ≤ x′_i + M), 1];

and if x_i < x′_i,

β⁻_ϕ(α) = P(Y ≤ k) + [P(Y = k)/P(X = k)]·[P(X > k) − α], if α ∈ (P(X > k), P(X ≥ k)], k ∈ [x′_i, x_i + M]; and β⁻_ϕ(α) = 0, if α ∈ (P(X > x′_i − 1), 1].

Proof of Lemma 4. When x_i > x′_i, it can be easily verified that P(X = k) > 0 only for k ∈ [x_i, x_i + 1, ..., x_i + M], and P(Y = k) > 0 only for k ∈ [x′_i, x′_i + 1, ..., x′_i + M]. For k ∈ [x_i, ..., x′_i + M], we have

P(Y = k)/P(X = k) = [(M choose k − x′_i)·p^{k−x′_i}(1 − p)^{M−k+x′_i}] / [(M choose k − x_i)·p^{k−x_i}(1 − p)^{M−k+x_i}] = {[(M − k + x′_i + 1)(M − k + x′_i + 2)···(M − k + x_i)] / [(k − x_i + 1)(k − x_i + 2)···(k − x′_i)]}·(p/(1 − p))^{x_i − x′_i}.

It can be observed that P(Y = k)/P(X = k) is a decreasing function of k.

When x_i < x′_i, it can be easily verified that P(X = k) > 0 only for k ∈ [x_i, x_i + 1, ..., x_i + M], and P(Y = k) > 0 only for k ∈ [x′_i, x′_i + 1, ..., x′_i + M]. For k ∈ [x′_i, ..., x_i + M], we have

P(Y = k)/P(X = k) = [(M choose k − x′_i)·p^{k−x′_i}(1 − p)^{M−k+x′_i}] / [(M choose k − x_i)·p^{k−x_i}(1 − p)^{M−k+x_i}] = {[(k − x′_i + 1)(k − x′_i + 2)···(k − x_i)] / [(M − k + x_i + 1)(M − k + x_i + 2)···(M − k + x′_i)]}·((1 − p)/p)^{x′_i − x_i}.

It can be observed that P(Y = k)/P(X = k) is an increasing function of k, and invoking Lemma 2 completes the proof.

Given Lemma 4, we are ready to prove Theorem 1.

Proof of Theorem 1. Let Z̃ = Binom(M, p), X = x_i + Z̃, and Y = x′_i + Z̃. Two cases are considered:

Case 1: x_i > x′_i. In this case, according to Lemma 4, we have

β⁺_ϕ(α) = P(Y ≥ k) + [P(Y = k)/P(X = k)]·[P(X < k) − α], for α ∈ [P(X < k), P(X ≤ k)], k ∈ [x_i, x′_i + M]; and β⁺_ϕ(α) = 0, for α ∈ [P(X ≤ x′_i + M), 1].

In the following, we show the infimum of β⁺_ϕ(α). For ease of presentation, let k̃ = k − x_i and Δ = x_i − x′_i. Then, we have P(Y ≥ k) = P(Z̃ ≥ k̃ + Δ), P(Y = k) = P(Z̃ = k̃ + Δ), P(X < k) = P(Z̃ < k̃), and P(X = k) = P(Z̃ = k̃), so that β⁺_ϕ(α) can be rewritten as

P(Z̃ ≥ k̃ + Δ) + [P(Z̃ = k̃ + Δ)/P(Z̃ = k̃)]·[P(Z̃ < k̃) − α], for α ∈ [P(Z̃ < k̃), P(Z̃ ≤ k̃)], k̃ ∈ [0, M − Δ]; and 0, for α ∈ [P(Z̃ ≤ M − Δ), 1].

Let J(Δ, k̃) = P(Z̃ ≥ k̃ + Δ) + [P(Z̃ = k̃ + Δ)/P(Z̃ = k̃)]·[P(Z̃ < k̃) − α]; we have

J(Δ + 1, k̃) − J(Δ, k̃) = −P(Z̃ = k̃ + Δ) + {[P(Z̃ = k̃ + Δ + 1) − P(Z̃ = k̃ + Δ)]/P(Z̃ = k̃)}·[P(Z̃ < k̃) − α].

Since α ∈ [P(Z̃ < k̃), P(Z̃ ≤ k̃)], we have P(Z̃ < k̃) − α ∈ [−P(Z̃ = k̃), 0]. If P(Z̃ = k̃ + Δ + 1) − P(Z̃ = k̃ + Δ) > 0, then J(Δ + 1, k̃) − J(Δ, k̃) ≤ −P(Z̃ = k̃ + Δ) < 0. If P(Z̃ = k̃ + Δ + 1) − P(Z̃ = k̃ + Δ) < 0, then J(Δ + 1, k̃) − J(Δ, k̃) ≤ −P(Z̃ = k̃ + Δ + 1) < 0. As a result, the infimum of β⁺_ϕ(α) is attained when Δ = l, i.e., x_i = l and x′_i = 0, which yields

β⁺_{ϕ,inf}(α) = P(Z̃ ≥ k̃ + l) + [P(Z̃ = k̃ + l)/P(Z̃ = k̃)]·[P(Z̃ < k̃) − α], for α ∈ [P(Z̃ < k̃), P(Z̃ ≤ k̃)], k̃ ∈ [0, M − l]; and β⁺_{ϕ,inf}(α) = 0, for α ∈ [P(Z̃ ≤ M − l), 1].  (44)

Case 2: x_i < x′_i. In this case, according to Lemma 4, we have

β⁻_ϕ(α) = P(Y ≤ k) + [P(Y = k)/P(X = k)]·[P(X > k) − α], for α ∈ [P(X > k), P(X ≥ k)], k ∈ [x′_i, x_i + M]; and β⁻_ϕ(α) = 0, for α ∈ [P(X ≥ x′_i), 1].  (45)

In the following, we show the infimum of β⁻_ϕ(α). For ease of presentation, let k̃ = k − x_i and Δ = x′_i − x_i. Then, we have P(Y ≤ k) = P(x′_i + Z̃ ≤ k) = P(Z̃ ≤ k̃ − Δ), P(Y = k) = P(Z̃ = k̃ − Δ), P(X > k) = P(x_i + Z̃ > k) = P(Z̃ > k̃), and P(X = k) = P(Z̃ = k̃), so that (45) can be rewritten as

P(Z̃ ≤ k̃ − Δ) + [P(Z̃ = k̃ − Δ)/P(Z̃ = k̃)]·[P(Z̃ > k̃) − α], for α ∈ [P(Z̃ > k̃), P(Z̃ ≥ k̃)], k̃ ∈ [Δ, M]; and 0, for α ∈ [P(Z̃ ≥ Δ), 1].  (46)

Let

J(Δ, k̃) = P(Z̃ ≤ k̃ − Δ) + [P(Z̃ = k̃ − Δ)/P(Z̃ = k̃)]·[P(Z̃ > k̃) − α];  (47)

we have

J(Δ + 1, k̃) − J(Δ, k̃) = −P(Z̃ = k̃ − Δ) + {[P(Z̃ = k̃ − Δ − 1) − P(Z̃ = k̃ − Δ)]/P(Z̃ = k̃)}·[P(Z̃ > k̃) − α].  (48)

Since α ∈ [P(Z̃ > k̃), P(Z̃ ≥ k̃)], we have P(Z̃ > k̃) − α ∈ [−P(Z̃ = k̃), 0]. If P(Z̃ = k̃ − Δ − 1) − P(Z̃ = k̃ − Δ) > 0, then J(Δ + 1, k̃) − J(Δ, k̃) ≤ −P(Z̃ = k̃ − Δ) < 0. If P(Z̃ = k̃ − Δ − 1) − P(Z̃ = k̃ − Δ) < 0, then J(Δ + 1, k̃) − J(Δ, k̃) ≤ −P(Z̃ = k̃ − Δ − 1) < 0. As a result, the infimum of β⁻_ϕ(α) is attained when Δ = l, i.e., x_i = 0 and x′_i = l, which yields

β⁻_{ϕ,inf}(α) = P(Z̃ ≤ k̃ − l) + [P(Z̃ = k̃ − l)/P(Z̃ = k̃)]·[P(Z̃ > k̃) − α], for α ∈ [P(Z̃ > k̃), P(Z̃ ≥ k̃)], k̃ ∈ [l, M]; and β⁻_{ϕ,inf}(α) = 0, for α ∈ [P(Z̃ ≥ l), 1].  (49)

Combining (44) and (49) completes the first part of the proof. When p = 0.5, it can be found that both β⁺_{ϕ,inf}(α) and β⁻_{ϕ,inf}(α) are maximized, and f(α) = β⁺_{ϕ,inf}(α) = β⁻_{ϕ,inf}(α).

### B.2 Proof of Theorem 2

Theorem 2.
B.2 Proof of Theorem 2

Theorem 2. The binomial mechanism in Algorithm 2 is f_bm(α)-differentially private with

f_bm(α) = min{β_ϕ,inf^+(α), β_ϕ,inf^−(α)}, (50)

in which

β_ϕ,inf^+(α) = 1 − [P(Y < k) + γP(Y = k)] = P(Y ≥ k) + P(Y = k)P(X < k)/P(X = k) − (P(Y = k)/P(X = k))·α (51)

for α ∈ [P(X < k), P(X ≤ k)] and k ∈ {0, 1, 2, ⋯, M}, where X = Binom(M, p_max) and Y = Binom(M, p_min), and

β_ϕ,inf^−(α) = 1 − [P(Y > k) + γP(Y = k)] = P(Y ≤ k) + P(Y = k)P(X > k)/P(X = k) − (P(Y = k)/P(X = k))·α (52)

for α ∈ [P(X > k), P(X ≥ k)] and k ∈ {0, 1, 2, ⋯, M}, where X = Binom(M, p_min) and Y = Binom(M, p_max). When p_max = 1 − p_min, we have β_ϕ,inf^+(α) = β_ϕ,inf^−(α).
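The following short sketch (ours; beta_segments and f_bm are hypothetical helper names, and the parameter values are illustrative) evaluates the Theorem 2 tradeoff numerically, instantiating X and Y with the extreme success probabilities as the theorem prescribes:

```python
# Sketch (ours) of the Theorem 2 tradeoff for the binomial mechanism.
import numpy as np
from scipy.stats import binom

def beta_segments(M, p_x, p_y, alpha, side="+"):
    ks = np.arange(M + 1)
    px, py = binom.pmf(ks, M, p_x), binom.pmf(ks, M, p_y)
    cx, cy = np.cumsum(px), np.cumsum(py)
    for k in ks:
        if side == "+":
            lo = cx[k] - px[k]                       # P(X < k)
            if lo <= alpha <= cx[k]:                 # alpha <= P(X <= k)
                return (1 - cy[k] + py[k]) + py[k] * (lo - alpha) / px[k]
        else:
            lo = 1 - cx[k]                           # P(X > k)
            if lo <= alpha <= lo + px[k]:            # alpha <= P(X >= k)
                return cy[k] + py[k] * (lo - alpha) / px[k]
    return 0.0

def f_bm(alpha, M, p_min, p_max):
    b_plus = beta_segments(M, p_max, p_min, alpha, "+")   # X ~ p_max, Y ~ p_min
    b_minus = beta_segments(M, p_min, p_max, alpha, "-")  # X ~ p_min, Y ~ p_max
    return min(b_plus, b_minus)

print(round(f_bm(0.2, 50, 0.3, 0.7), 4))  # p_max = 1 - p_min: symmetric case
```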
Proof. Observe that the output space of the binomial mechanism remains the same for different data x_i, i.e., Z_L^I = Z_L^U = 0 and Z_R^I = Z_R^U = M in Lemma 2. Moreover, letting X = Binom(M, p) and Y = Binom(M, q), we have

P(Y = k)/P(X = k) = [C(M, k)·q^k(1 − q)^{M−k}] / [C(M, k)·p^k(1 − p)^{M−k}] = ((1 − q)/(1 − p))^M · (q(1 − p)/(p(1 − q)))^k.

Similarly, we consider the following two cases.

Case 1: q < p. In this case, P(Y = k)/P(X = k) is a decreasing function of k. Therefore, according to Lemma 2, we have

β_ϕ^+(α) = 1 − [P(Y < k) + γP(Y = k)] = P(Y ≥ k) − P(Y = k)·(α − P(X < k))/P(X = k) = P(Y ≥ k) + P(Y = k)P(X < k)/P(X = k) − (P(Y = k)/P(X = k))·α.

In the following, we show that the infimum is attained when p = p_max and q = p_min. Consider X = Binom(M, p) and X̂ = Binom(M, p̂) with p > p̂. Suppose that α ∈ [P(X < k), P(X ≤ k)] and α ∈ [P(X̂ < k̂), P(X̂ ≤ k̂)] for some k and k̂ are satisfied simultaneously; it can be readily shown that k ≥ k̂. In addition, α ∈ [max{P(X < k), P(X̂ < k̂)}, min{P(X ≤ k), P(X̂ ≤ k̂)}]. Let

β_ϕ,p^+(α) = P(Y ≥ k) + P(Y = k)[P(X < k) − α]/P(X = k), (53)

β_ϕ,p̂^+(α) = P(Y ≥ k̂) + P(Y = k̂)[P(X̂ < k̂) − α]/P(X̂ = k̂). (54)

Then

β_ϕ,p^+(α) − β_ϕ,p̂^+(α) = P(Y ≥ k) − P(Y ≥ k̂) + P(Y = k)[P(X < k) − α]/P(X = k) − P(Y = k̂)[P(X̂ < k̂) − α]/P(X̂ = k̂)
= P(Y > k) − P(Y > k̂) + P(Y = k)[P(X ≤ k) − α]/P(X = k) − P(Y = k̂)[P(X̂ ≤ k̂) − α]/P(X̂ = k̂).

Obviously, P(Y ≥ k) − P(Y ≥ k̂) ≤ 0 and P(Y > k) − P(Y > k̂) ≤ 0 for k ≥ k̂. Observing that β_ϕ,p^+(α) − β_ϕ,p̂^+(α) is a linear function of α ∈ [max{P(X < k), P(X̂ < k̂)}, min{P(X ≤ k), P(X̂ ≤ k̂)}] given Y, X, X̂, k and k̂, we consider the following four possible cases.

1) P(X < k) ≤ P(X̂ < k̂) and α = P(X̂ < k̂): In this case, P(Y = k)[P(X < k) − α]/P(X = k) ≤ 0 and the last term vanishes, so β_ϕ,p^+(α) − β_ϕ,p̂^+(α) ≤ P(Y ≥ k) − P(Y ≥ k̂) ≤ 0.

2) P(X < k) > P(X̂ < k̂) and α = P(X < k): In this case,

β_ϕ,p^+(α) − β_ϕ,p̂^+(α) = P(Y ≥ k) − P(Y ≥ k̂) − P(Y = k̂)[P(X̂ < k̂) − P(X < k)]/P(X̂ = k̂).

When k = k̂, since p > p̂, we have P(X̂ < k̂) − P(X < k̂) ≥ 0, which violates the condition that P(X < k) > P(X̂ < k̂). When k > k̂, we have P(Y ≥ k) − P(Y ≥ k̂) ≤ −P(Y = k̂). Therefore,

β_ϕ,p^+(α) − β_ϕ,p̂^+(α) ≤ −P(Y = k̂) − P(Y = k̂)[P(X̂ < k̂) − P(X < k)]/P(X̂ = k̂) = −P(Y = k̂)[P(X̂ ≤ k̂) − P(X < k)]/P(X̂ = k̂) ≤ 0.

3) P(X ≤ k) ≤ P(X̂ ≤ k̂) and α = P(X ≤ k): In this case,

P(Y = k)[P(X ≤ k) − α]/P(X = k) − P(Y = k̂)[P(X̂ ≤ k̂) − α]/P(X̂ = k̂) = −P(Y = k̂)[P(X̂ ≤ k̂) − P(X ≤ k)]/P(X̂ = k̂) ≤ 0.

As a result, β_ϕ,p^+(α) − β_ϕ,p̂^+(α) ≤ P(Y > k) − P(Y > k̂) ≤ 0.

4) P(X ≤ k) > P(X̂ ≤ k̂) and α = P(X̂ ≤ k̂): In this case, when k = k̂, since p > p̂, we have P(X̂ ≤ k̂) − P(X ≤ k̂) ≥ 0, which violates the condition that P(X ≤ k) > P(X̂ ≤ k̂). When k > k̂,

β_ϕ,p^+(α) − β_ϕ,p̂^+(α) = P(Y ≥ k) − P(Y ≥ k̂) + P(Y = k)[P(X < k) − P(X̂ ≤ k̂)]/P(X = k) − P(Y = k̂)[P(X̂ < k̂) − P(X̂ ≤ k̂)]/P(X̂ = k̂)
= P(Y ≥ k) − P(Y > k̂) + P(Y = k)[P(X < k) − P(X̂ ≤ k̂)]/P(X = k).

Since k > k̂, P(Y ≥ k) − P(Y > k̂) ≤ 0. In addition, P(X < k) − P(X̂ ≤ k̂) ≤ 0 since α ∈ [max{P(X < k), P(X̂ < k̂)}, P(X̂ ≤ k̂)]. As a result, β_ϕ,p^+(α) − β_ϕ,p̂^+(α) ≤ 0.

Now that β_ϕ,p^+(α) − β_ϕ,p̂^+(α) is a linear function of α ∈ [max{P(X < k), P(X̂ < k̂)}, min{P(X ≤ k), P(X̂ ≤ k̂)}], which is non-positive at the extreme points (i.e., the boundaries), we can conclude that β_ϕ,p^+(α) − β_ϕ,p̂^+(α) ≤ 0 for any α ∈ [max{P(X < k), P(X̂ < k̂)}, min{P(X ≤ k), P(X̂ ≤ k̂)}]. Therefore, the infimum of β_ϕ^+(α) is attained when p = p_max (a symmetric argument in q shows that it is attained when q = p_min).

Case 2: q > p. In this case, P(Y = k)/P(X = k) is an increasing function of k. As a result, according to Lemma 2, we have

β_ϕ^−(α) = P(Y ≤ k) + P(Y = k)P(X > k)/P(X = k) − (P(Y = k)/P(X = k))·α. (60)

Similarly, it can be shown that the infimum is attained when q = p_max and p = p_min. As a result, we have

T(P, Q)(α) = min{β_ϕ,inf^+(α), β_ϕ,inf^−(α)}, (61)

which completes the proof.

B.3 Proof of Theorem 3

Theorem 3. The ternary stochastic compressor is f_ternary(α)-differentially private with

f_ternary(α) =
  1 − ((A + c)/(A − c))·α, for α ∈ [0, (A − c)/(2B)],
  1 − c/B − α, for α ∈ [(A − c)/(2B), 1 − (A + c)/(2B)],
  ((A − c)/(A + c))·(1 − α), for α ∈ [1 − (A + c)/(2B), 1]. (62)

We provide the f-DP analysis for a generic ternary stochastic compressor defined as follows.

Definition 8 (Generic Ternary Stochastic Compressor). For any given x ∈ [−c, c], the generic compressor ternary outputs ternary(x, p_1, p_0, p_{−1}), which is given by

ternary(x, p_1, p_0, p_{−1}) =
  1, with probability p_1(x),
  0, with probability p_0,
  −1, with probability p_{−1}(x), (63)

where p_0 is the design parameter that controls the level of sparsity and p_1(x), p_{−1}(x) ∈ [p_min, p_max].

It can be readily verified that p_1 = (A + x)/(2B), p_0 = 1 − A/B, and p_{−1} = (A − x)/(2B) (and therefore p_min = (A − c)/(2B) and p_max = (A + c)/(2B)) for the ternary stochastic compressor in Definition 6. In the following, we show the f-DP of the generic ternary stochastic compressor; the corresponding f-DP guarantee for the compressor in Definition 6 is obtained with p_min = (A − c)/(2B), p_max = (A + c)/(2B), and p_0 = 1 − A/B.

Lemma 5. Suppose that p_0 is independent of x, p_max + p_min = 1 − p_0, and p_1(x) > p_1(y) for x > y. The ternary compressor is f_ternary(α)-differentially private with

f_ternary(α) =
  1 − (p_max/p_min)·α, for α ∈ [0, p_min],
  p_0 + 2p_min − α, for α ∈ [p_min, 1 − p_max],
  (p_min/p_max)·(1 − α), for α ∈ [1 − p_max, 1]. (64)

Proof. Similar to the binomial mechanism, the output space of the ternary mechanism remains the same for different inputs. Let Y = ternary(x_i′, p_1, p_0, p_{−1}) and X = ternary(x_i, p_1, p_0, p_{−1}). We have

P(Y = −1)/P(X = −1) = p_{−1}(x_i′)/p_{−1}(x_i), P(Y = 0)/P(X = 0) = 1, P(Y = 1)/P(X = 1) = p_1(x_i′)/p_1(x_i). (65)

When x_i > x_i′, it can be observed that P(Y = k)/P(X = k) is a decreasing function of k. According to Lemma 2, we have

β_ϕ^+(α) =
  1 − (p_{−1}(x_i′)/p_{−1}(x_i))·α, for α ∈ [0, p_{−1}(x_i)],
  p_0 + p_1(x_i′) + p_{−1}(x_i) − α, for α ∈ [p_{−1}(x_i), 1 − p_1(x_i)],
  (p_1(x_i′)/p_1(x_i))·(1 − α), for α ∈ [1 − p_1(x_i), 1]. (66)

When x_i < x_i′, it can be observed that P(Y = k)/P(X = k) is an increasing function of k. According to Lemma 2, we have

β_ϕ^−(α) =
  1 − (p_1(x_i′)/p_1(x_i))·α, for α ∈ [0, p_1(x_i)],
  p_0 + p_{−1}(x_i′) + p_1(x_i) − α, for α ∈ [p_1(x_i), 1 − p_{−1}(x_i)],
  (p_{−1}(x_i′)/p_{−1}(x_i))·(1 − α), for α ∈ [1 − p_{−1}(x_i), 1]. (67)

The infimum of β_ϕ^+(α) is attained when p_{−1}(x_i′) = p_max and p_{−1}(x_i) = p_min, while the infimum of β_ϕ^−(α) is attained when p_1(x_i′) = p_max and p_1(x_i) = p_min. As a result, we have

f_ternary(α) =
  1 − (p_max/p_min)·α, for α ∈ [0, p_min],
  p_0 + 2p_min − α, for α ∈ [p_min, 1 − p_max],
  (p_min/p_max)·(1 − α), for α ∈ [1 − p_max, 1], (68)

which completes the proof.
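The compressor of Definition 8 and the piecewise tradeoff function of Lemma 5 are simple enough to state in a few lines of code. The sketch below (ours, with illustrative A, B, c; the function names are hypothetical) samples from the compressor of Definition 6 and evaluates f_ternary(α) in the three regimes:

```python
# Sketch (ours) of the ternary compressor (Definition 8, instantiated as in
# Definition 6) and the piecewise tradeoff of Lemma 5 / Theorem 3.
import numpy as np

rng = np.random.default_rng(0)

def ternary(x, A, B):
    """Sample {-1,0,1} with p1=(A+x)/(2B), p0=1-A/B, p_-1=(A-x)/(2B)."""
    p1, p0 = (A + x) / (2 * B), 1 - A / B
    return rng.choice([1, 0, -1], p=[p1, p0, 1 - p1 - p0])

def f_ternary(alpha, A, B, c):
    p_min, p_max = (A - c) / (2 * B), (A + c) / (2 * B)
    if alpha <= p_min:
        return 1 - (p_max / p_min) * alpha
    if alpha <= 1 - p_max:
        return 1 - c / B - alpha            # = p0 + 2*p_min - alpha
    return (p_min / p_max) * (1 - alpha)

A, B, c = 2.0, 4.0, 1.0                     # illustrative values, B >= A > c
print([ternary(0.5, A, B) for _ in range(8)])
print([round(f_ternary(a, A, B, c), 3) for a in (0.1, 0.4, 0.9)])
```

Note how the middle segment has slope −1: over that range the mechanism behaves like a randomized-response-style output, and larger p_0 (more sparsity) widens the high-privacy region.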
B.4 Proof of Theorem 4

Theorem 4. Given a vector x_i = [x_{i,1}, x_{i,2}, ⋯, x_{i,d}] with |x_{i,j}| ≤ c, ∀j, applying the ternary compressor to the j-th coordinate of x_i independently yields µ-GDP with µ = −2Φ^{−1}(1/(1 + ((A + c)/(A − c))^d)).

Before proving Theorem 4, we first introduce the following lemma.

Lemma 6 ([42, 43]). Any (ϵ, 0)-DP algorithm is also µ-GDP for µ = −2Φ^{−1}(1/(1 + e^ϵ)), in which Φ(·) is the cumulative distribution function of the standard normal distribution.

Proof. According to Theorem 3, in the scalar case, the ternary stochastic compressor is f_ternary(α)-differentially private with

f_ternary(α) =
  1 − ((A + c)/(A − c))·α, for α ∈ [0, (A − c)/(2B)],
  1 − c/B − α, for α ∈ [(A − c)/(2B), 1 − (A + c)/(2B)],
  ((A − c)/(A + c))·(1 − α), for α ∈ [1 − (A + c)/(2B), 1]. (69)

It can be easily verified that f_ternary(α) ≥ max{0, 1 − ((A + c)/(A − c))·α, ((A − c)/(A + c))·(1 − α)}. Invoking Lemma 1 suggests that the scalar mechanism is (log((A + c)/(A − c)), 0)-DP. Extending it to the d-dimensional case by composition yields (d·log((A + c)/(A − c)), 0)-DP. As a result, according to Lemma 6, it is µ-GDP with µ = −2Φ^{−1}(1/(1 + ((A + c)/(A − c))^d)).

B.5 Proof of Theorem 5

Theorem 5. For a vector x_i = [x_{i,1}, x_{i,2}, ⋯, x_{i,d}] with |x_{i,j}| ≤ c, ∀j, the ternary compressor with B ≥ A > c is f_ternary^{⊗d}(α)-DP with

G_µ(α + γ) − γ ≤ f_ternary^{⊗d}(α) ≤ G_µ(α − γ) + γ, (70)

in which

µ = 2c·√(d/(AB − c²)), γ = 0.56·[(A − c)/(2B)·((B + c)/B)³ + (1 − A/B)·(c/B)³ + (A + c)/(2B)·((B − c)/B)³] / [((AB − c²)/B²)^{3/2}·d^{1/2}]. (71)

Before proving Theorem 5, we first define the following functions as in [15]:

kl(f) = −∫₀¹ log|f′(x)| dx, (72)

κ₂(f) = ∫₀¹ log²|f′(x)| dx, (73)

κ₃(f) = ∫₀¹ |log|f′(x)||³ dx, (74)

κ̄₃(f) = ∫₀¹ |log|f′(x)| + kl(f)|³ dx. (75)

The central limit theorem for f-DP is formally introduced as follows.

Lemma 7 ([15]). Let f_1, ⋯, f_d be symmetric trade-off functions such that κ₃(f_i) < ∞ for all 1 ≤ i ≤ d. Denote µ = 2||kl||₁/√(||κ₂||₁ − ||kl||₂²) and γ = 0.56·||κ̄₃||₁/(||κ₂||₁ − ||kl||₂²)^{3/2}, and assume γ < 1/2. Then, for all α ∈ [γ, 1 − γ], we have

G_µ(α + γ) − γ ≤ f_1 ⊗ f_2 ⊗ ⋯ ⊗ f_d(α) ≤ G_µ(α − γ) + γ. (76)

Given Lemma 7, we are ready to prove Theorem 5.

Proof. Given f_i(α) in (62) for each coordinate, we have

kl(f_ternary) = (A + c)/(2B)·log((A + c)/(A − c)) − (A − c)/(2B)·log((A + c)/(A − c)) = (c/B)·log((A + c)/(A − c)), (77)

κ₂(f_ternary) = (A − c)/(2B)·log²((A + c)/(A − c)) + (A + c)/(2B)·log²((A − c)/(A + c)) = (A/B)·log²((A + c)/(A − c)), (78)

κ̄₃(f_ternary) = [(A − c)/(2B)·((B + c)/B)³ + (1 − A/B)·(c/B)³ + (A + c)/(2B)·((B − c)/B)³]·log³((A + c)/(A − c)). (79)

Plugging these into Lemma 7 with d identical trade-off functions (the log³ factors cancel in γ), the corresponding µ and γ are given as follows:

µ = 2c·√(d/(AB − c²)), (81)

γ = 0.56·[(A − c)/(2B)·((B + c)/B)³ + (1 − A/B)·(c/B)³ + (A + c)/(2B)·((B − c)/B)³] / [((AB − c²)/B²)^{3/2}·d^{1/2}], (82)

which completes the proof.

C f-DP of the Poisson Binomial Mechanism

The Poisson binomial mechanism [9] is presented in Algorithm 3.

Algorithm 3 Poisson Binomial Mechanism
Input: p_i ∈ [p_min, p_max], i ∈ N.
Privatization: Z_pb ∼ PB(p_1, p_2, ⋯, p_N) = Σ_{i∈N} Binom(M, p_i).

In the following, we show the f-DP guarantee of the Poisson binomial mechanism with M = 1. The extension of the proof to M > 1 is straightforward by following a similar technique.

Theorem 6. The Poisson binomial mechanism with M = 1 in Algorithm 3 is f_pb(α)-differentially private with

f_pb(α) = min{ max{0, 1 − ((1 − p_min)/(1 − p_max))·α, (p_min/p_max)·(1 − α)}, max{0, 1 − (p_max/p_min)·α, ((1 − p_max)/(1 − p_min))·(1 − α)} }. (83)
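Since f_pb(α) in Theorem 6 is an explicit pointwise minimum of two piecewise-linear curves, it can be evaluated directly. The snippet below (our own sketch; f_pb is our name, and p_min, p_max are illustrative) does exactly that:

```python
# Sketch (ours) of the Theorem 6 lower bound for the Poisson binomial
# mechanism with M = 1: both max{...} branches, then the pointwise min.
def f_pb(alpha, p_min, p_max):
    branch1 = max(0.0,
                  1 - (1 - p_min) / (1 - p_max) * alpha,
                  p_min / p_max * (1 - alpha))
    branch2 = max(0.0,
                  1 - p_max / p_min * alpha,
                  (1 - p_max) / (1 - p_min) * (1 - alpha))
    return min(branch1, branch2)

# f_pb(0) = 1 and f_pb(1) = 0, as any tradeoff function must satisfy.
print([round(f_pb(a, 0.3, 0.7), 3) for a in (0.0, 0.25, 0.5, 0.75, 1.0)])
```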
Proof. Let

X = PB(p_1, p_2, ⋯, p_{i−1}, p_i, p_{i+1}, ⋯, p_N),
Y = PB(p_1, p_2, ⋯, p_{i−1}, p_i′, p_{i+1}, ⋯, p_N),
Z = PB(p_1, p_2, ⋯, p_{i−1}, p_{i+1}, ⋯, p_N), (84)

in which PB stands for the Poisson binomial distribution. In this case,

P(Y = k + 1)/P(X = k + 1) = [P(Z = k + 1)(1 − p_i′) + P(Z = k)p_i′] / [P(Z = k + 1)(1 − p_i) + P(Z = k)p_i]. (85)

In addition,

P(Y = k + 1)P(X = k) − P(Y = k)P(X = k + 1) = [P(Z = k + 1)P(Z = k − 1) − (P(Z = k))²]·(p_i − p_i′). (86)

Since P(Z = k + 1)P(Z = k − 1) − (P(Z = k))² < 0 for the Poisson binomial distribution (its probability mass function is log-concave), we have

P(Y = k + 1)P(X = k) − P(Y = k)P(X = k + 1) > 0 if p_i < p_i′, and < 0 if p_i > p_i′. (87)

That being said, P(Y = k)/P(X = k) is an increasing function of k if p_i < p_i′ and a decreasing function of k if p_i > p_i′. Following the same analysis as that in the proof of Theorem 2, for p_i > p_i′, we have

β_ϕ^+(α) = 1 − [P(Y < k) + γP(Y = k)] = P(Y ≥ k) − P(Y = k)·(α − P(X < k))/P(X = k) = P(Y ≥ k) + P(Y = k)P(X < k)/P(X = k) − (P(Y = k)/P(X = k))·α (88)

for α ∈ [P(X < k), P(X ≤ k)] and k ∈ {0, 1, 2, ⋯, N}. In the following, we show that the infimum of β_ϕ^+(α) is attained when p_i = p_max and p_i′ = p_min.

Case 1: k = 0. In this case,

P(Y ≥ 0) = 1, P(Y = 0) = P(Z = 0)(1 − p_i′), P(X < 0) = 0, P(X = 0) = P(Z = 0)(1 − p_i). (89)

Plugging (89) into (88) yields

β_ϕ^+(α) = 1 − ((1 − p_i′)/(1 − p_i))·α. (90)

It is obvious that the infimum is attained when p_i = p_max and p_i′ = p_min.

Case 2: k > 0. In this case,

P(Y ≥ k) = P(Z ≥ k) + P(Z = k − 1)p_i′,
P(Y = k) = P(Z = k)(1 − p_i′) + P(Z = k − 1)p_i′,
P(X < k) = P(Z < k) − P(Z = k − 1)p_i,
P(X = k) = P(Z = k)(1 − p_i) + P(Z = k − 1)p_i. (91)

Plugging (91) into (88) yields

β_ϕ^+(α) = P(Z > k) + P(Z = k)p_i′ + [P(X ≤ k) − α]·(P(Z = k) − [P(Z = k) − P(Z = k − 1)]p_i′)/P(X = k). (92)

The p_i′-related term is given by

p_i′·{P(X = k)P(Z = k) − [P(Z = k) − P(Z = k − 1)][P(X ≤ k) − α]}/P(X = k). (93)

Observing that (93) is a linear function of α, we only need to examine α ∈ {P(X < k), P(X ≤ k)}. More specifically, when α = P(X ≤ k), it is reduced to P(Z = k)p_i′; when α = P(X < k), it is reduced to P(Z = k − 1)p_i′. In both cases, the coefficient of p_i′ is non-negative, so the infimum is attained when p_i′ = p_min. Given that p_i′ = p_min, the same technique as in the proof of Theorem 2 can be applied to show that the infimum is attained when p_i = p_max.

Since P(Y = k)/P(X = k) is a decreasing function of k when p_i > p_i′, we have

p_min/p_max ≤ P(Y = k)/P(X = k) ≤ (1 − p_min)/(1 − p_max). (94)

Given that β_ϕ^+(α) is a decreasing function of α with β_ϕ^+(0) = 1 and β_ϕ^+(1) = 0, we can readily conclude that β_ϕ^+(α) ≥ max{0, 1 − ((1 − p_min)/(1 − p_max))·α} and β_ϕ^+(α) ≥ (p_min/p_max)·(1 − α). That being said, β_ϕ^+(α) ≥ max{0, 1 − ((1 − p_min)/(1 − p_max))·α, (p_min/p_max)·(1 − α)}.

Similarly, for p_i < p_i′, we have

β_ϕ^−(α) = 1 − [P(Y > k) + γP(Y = k)] = P(Y ≤ k) − P(Y = k)·(α − P(X > k))/P(X = k) = P(Y ≤ k) + P(Y = k)P(X > k)/P(X = k) − (P(Y = k)/P(X = k))·α (95)

for α ∈ [P(X > k), P(X ≥ k)] and k ∈ {0, 1, 2, ⋯, N}. The infimum is attained when p_i = p_min and p_i′ = p_max. Since P(Y = k)/P(X = k) is an increasing function of k when p_i < p_i′, we have

(1 − p_max)/(1 − p_min) ≤ P(Y = k)/P(X = k) ≤ p_max/p_min. (96)

Given that β_ϕ^−(α) is a decreasing function of α with β_ϕ^−(0) = 1 and β_ϕ^−(1) = 0, we can similarly conclude that β_ϕ^−(α) ≥ max{0, 1 − (p_max/p_min)·α} and β_ϕ^−(α) ≥ ((1 − p_max)/(1 − p_min))·(1 − α). That being said, β_ϕ^−(α) ≥ max{0, 1 − (p_max/p_min)·α, ((1 − p_max)/(1 − p_min))·(1 − α)}. Combining the two cases yields (83), which completes the proof.
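The proof hinges on two distributional facts: the log-concavity of the Poisson binomial probability mass function, which gives the sign in (86), and the resulting monotonicity of P(Y = k)/P(X = k) in (87). Both can be checked numerically; the following sketch (our own, with arbitrary illustrative success probabilities) does so via exact convolution:

```python
# Our own numerical sanity check of two facts used in the Theorem 6 proof:
# (i) the Poisson binomial pmf is log-concave, giving the sign in (86);
# (ii) P(Y=k)/P(X=k) is monotone in k when one success probability changes.
import numpy as np

def pb_pmf(probs):
    """Exact Poisson binomial pmf (M = 1 per user) via iterated convolution."""
    pmf = np.array([1.0])
    for p in probs:
        pmf = np.convolve(pmf, [1 - p, p])
    return pmf

base = [0.2, 0.5, 0.6, 0.35]          # success probabilities of the other users
Z = pb_pmf(base)
# (i) Log-concavity: P(Z=k+1)P(Z=k-1) <= P(Z=k)^2 for all interior k.
assert np.all(Z[2:] * Z[:-2] <= Z[1:-1] ** 2 + 1e-12)

for (pi, pi_prime) in [(0.7, 0.3), (0.3, 0.7)]:
    X, Y = pb_pmf(base + [pi]), pb_pmf(base + [pi_prime])
    diffs = np.diff(Y / X)
    # (ii) Decreasing when p_i > p_i', increasing when p_i < p_i' (cf. (87)).
    assert (np.all(diffs <= 1e-12) if pi > pi_prime
            else np.all(diffs >= -1e-12))
print("checks passed")
```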