# Universal Exact Compression of Differentially Private Mechanisms

Yanxiao Liu, The Chinese University of Hong Kong, yanxiaoliu@link.cuhk.edu.hk
Wei-Ning Chen, Stanford University, wnchen@stanford.edu
Ayfer Özgür, Stanford University, aozgur@stanford.edu
Cheuk Ting Li, The Chinese University of Hong Kong, ctli@ie.cuhk.edu.hk

38th Conference on Neural Information Processing Systems (NeurIPS 2024).

To reduce the communication cost of differential privacy mechanisms, we introduce a novel construction, called Poisson private representation (PPR), designed to compress and simulate any local randomizer while ensuring local differential privacy. Unlike previous simulation-based local differential privacy mechanisms, PPR exactly preserves the joint distribution of the data and the output of the original local randomizer. Hence, the PPR-compressed privacy mechanism retains all desirable statistical properties of the original privacy mechanism, such as unbiasedness and Gaussianity. Moreover, PPR achieves a compression size within a logarithmic gap from the theoretical lower bound. Using PPR, we give a new order-wise trade-off between communication, accuracy, central and local differential privacy for distributed mean estimation. Experimental results on distributed mean estimation show that PPR consistently gives a better trade-off between communication, accuracy and central differential privacy compared to the coordinate subsampled Gaussian mechanism, while also providing local differential privacy.

1 Introduction

In modern data science, there is a growing dependence on large amounts of high-quality data, often generated by edge devices (e.g., photos and videos captured by smartphones, or messages hosted by social networks). However, this data inherently contains personal information, making it susceptible to privacy breaches during acquisition, collection, or utilization. For instance, despite the significant recent advancement in foundational models [9], studies have shown that these models can accidentally memorize their training data. This poses a risk where malicious users, even with just API access, can extract substantial portions of sensitive information [14, 15]. In recent years, differential privacy (DP) [28] has emerged as a powerful framework for safeguarding users' privacy by ensuring that local data is properly randomized before leaving users' devices.

Apart from privacy concerns, communicating local data from edge devices to the central server often becomes a bottleneck in the system pipeline, especially with high-dimensional data common in many machine learning scenarios. This leads to the following fundamental question: how can we efficiently communicate privatized data? Recent works have shown that a wide range of differential privacy mechanisms can be simulated and compressed using shared randomness, resulting in a compressed mechanism which has a smaller communication cost compared to the original mechanism, while retaining the (perhaps slightly weakened) privacy guarantee. This can be done via rejection sampling [31], importance sampling [71, 78], or dithered quantization [56, 72, 46, 49, 80], with each approach having its own advantages and disadvantages. For example, the importance-sampling-based methods [71, 78] and the rejection-sampling-based method [31] can simulate a wide range of privacy mechanisms; however, the output distribution of the induced mechanism does not perfectly match the original mechanism.
This is limiting in scenarios where the original mechanism is designed to satisfy some desired statistical properties, e.g. it is often desirable for the local randomizer to be unbiased or to be summable noise such as Gaussian or other infinitely divisible distributions. Since the induced mechanism is different from the original one, these statistical properties are not preserved. On the other hand, ditheredquantization-based approaches [48, 56, 72, 46, 49, 80] can ensure a correct simulated distribution, but they can only simulate additive noise mechanisms. More importantly, dithered quantization relies on shared randomness between the user and the server, and the server needs to know the dither for decoding. This annuls the local privacy guarantee on the user data, unless we are willing to assume a trusted aggregator [46], use an additional secure aggregation step [49], or restrict attention to specific privacy mechanisms (e.g., one-dimensional Laplace [72]). Our contribution In this paper, we introduce a novel DP mechanism compressor called Poisson private representation (PPR), designed to compress and exactly simulate any local randomizer while ensuring local DP, through the use of shared randomness.1 We elaborate on three main advantages of PPR, namely universality, exactness and communication efficiency. Universality. Unlike dithered-quantization-based approaches which can only simulate additive noise mechanisms, PPR can simulate any local or central DP mechanism with discrete or continuous input and output. Moreover, PPR is universal in the sense that the user and the server only need to agree on the output space and a proposal distribution, and the user can simulate any DP mechanism with the same output space. The user can choose a suitable DP mechanism and privacy budget according to their communication bandwidth and privacy requirement, without divulging their choice to the server. Exactness. Unlike previous DP mechanism compressors such as Feldman and Talwar [31], Shah et al. [71], Triastcyn et al. [78], PPR enables exact simulation, ensuring that the reproduced distribution perfectly matches the original one. Exact distribution recovery offers several advantages. Firstly, the compressed sample maintains the same statistical properties as the uncompressed one. If the local randomizer is unbiased (a crucial requirement for many machine learning tasks like DP-SGD), the outcome of PPR remains unbiased. In contrast, reconstruction distributions in prior simulation-based compression methods [31, 71] are often biased unless specific debiasing steps are performed (only possible for certain DP mechanisms [71]). Secondly, when the goal is to compute the mean (e.g., for private mean or frequency estimation problems) and the local noise is summable (e.g., Gaussian noise or other infinitely divisible distributions [55, 42]), exact distribution recovery of the local noise enables precise privacy accounting for the final central DP guarantee, without relying on generic privacy amplification techniques like shuffling [30, 32]. PPR can compress a central DP mechanism (e.g., the Gaussian mechanism [27]) and simultaneously achieve weaker local DP (i.e., with a larger εlocal) and stronger central DP (i.e., with a smaller εcentral), while maintaining exactly the same privacy-utility trade-offs as the uncompressed Gaussian mechanism. Communication efficiency. PPR compresses the output of any DP mechanism to a size close to the theoretical lower bound. 
For a mechanism on the data X with output Z, the compression size of PPR is I(X; Z) + log(I(X; Z) + 1) + O(1), with only a logarithmic gap from the mutual information lower bound I(X; Z).2 The O(1) constant can be given explicitly in terms of a tunable parameter α > 1 which controls the trade-off between compression size, computational time and privacy. The main technical tool we utilize for PPR is the Poisson functional representation [61, 60], which provides precise control over the reconstructed joint distribution in channel simulation problems [6, 45, 61, 35, 41, 10, 7, 21]. Channel simulation aims to achieve the minimum communication for simulating a channel (i.e., a specific conditional distribution). Typically, these methods rely on shared randomness between the user and server, and privacy is only preserved when the shared randomness is hidden from the adversary. This setup conflicts with local DP, where the server (which requires access to shared randomness for decoding) is considered adversarial. To ensure local DP, we introduce a randomized encoder based on the Poisson functional representation, which stochastically maps 1Our code can be found in https://github.com/cheuktingli/Poisson Private Repr 2This is similar to channel simulation [45] and the strong functional representation lemma [61], though [45, 61] do not concern privacy. a private local message to its representation. Hence, PPR achieves order-wise trade-offs between privacy, communication, and accuracy, while preserving the original distribution of local randomizers. Notations. Entropy H(X), mutual information I(X; Y ), KL divergence D(P Q) and logarithm are to the same base, e.g., they can be all in bits (base 2), or all in nats (base e). For P, Q, d P( )/d Q denotes the Radon-Nikodym derivative. 2 Related Work Generic compression of local DP mechanisms. In this work, we consider both central DP [28] and local DP [79, 53]. Recent research has explored methods for compressing local DP randomizers when shared randomness is involved. For instance, when ε 1, Bassily and Smith [5] demonstrated that a single bit can simulate any local DP randomizer with a small degradation of utility, as long as the output can be computed using only a subset of the users data. Bun et al. [12] proposed another generic compression technique based on rejection sampling, which compresses a ε-DP mechanism into a 10ε-DP mechanism. Feldman and Talwar [31] proposed a distributed simulation approach using rejection sampling with shared randomness, while Shah et al. [71], Triastcyn et al. [78] utilized importance sampling (or more specifically, minimum random coding [20, 74, 47]). However, all these methods only approximate the original local DP mechanism, unlike our scheme, which achieves an exact distribution recovery. Distributed mean estimation under DP. Mean estimation is the canonical problems in distributed learning and analytics. They have been widely studied under privacy [24, 8, 23, 4], communication [40, 11, 75], or both constraints [18, 31, 71, 43, 17, 19]. Among them, Asi et al. [4] has demonstrated that the optimal unbiased mean estimation scheme under local differential privacy is priv Unit [8]. Subsequently, communication-efficient mechanisms introduced by Feldman and Talwar [31], Shah et al. [71], Isik et al. [51] aimed to construct communication-efficient versions of priv Unit, either through distributed simulation or discretization. 
However, these approaches only approximate the priv Unit distribution, while our proposed method ensures exact distribution recovery. Distributed channel simulation. Our approach relies on the notion of channel simulation [6, 45, 61, 35, 41, 10, 7, 21]. One-shot channel simulation is a lossy compression task, which aims to find the minimum amount of communications over a noiseless channel that is in need to simulate some channel PZ|X (a specific conditional distribution). By Harsha et al. [45], Li and El Gamal [61], the average communication cost is I(X; Z) + O(log(I(X; Z))). In [45], algorithms based on rejection sampling are proposed, and it is further generalized in [39] by introducing the greedy rejection coding. Dithered quantization [81] has also been used to simulate an additive noise channel in [2] for neural compression. As also shown in [2], the time complexity of channel simulation protocols (e.g., in [61]) is usually high, and [76, 35, 41] try to improve the runtime under certain assumptions. Moreover, channel simulation tools have also been used in neural network compression [47], image compression via variational autoencoders [37], diffusion models with perfect realism [77] and differentially private federated learning [71]. Poisson functional representation. The Poisson functional representation is a channel simulation scheme studied in [61]. Also refer to [65] for related constructions for Monte Carlo simulations. Based on the Poisson functional representation, the Poisson matching lemma has been used in proving one-shot achievability results for various network information theory problems [60, 64]. Also see applications on unequal message protection [54], hypothesis testing [44], information hiding [63], minimax learning [62] and secret key generation [50]. A variation called the importance matching lemma [69] has also used in distributed lossy compression. By [38], the Poisson functional representation can be viewed as a certain variant of the A sampling [66, 65], and hence an optimized version with better runtime for one-dimensional unimodal distribution has been proposed in [38]. 3 Preliminaries We begin by reviewing the formal definitions of differential privacy (DP). We consider two models of DP data analysis. In the central model, introduced in Dwork et al. [28], the data of the individuals is stored in a database X X by the server. The server is then trusted to perform data analysis whose output Z = A(X) Z (where A is a randomized algorithm), which is sent to an untrusted data analyst, does not reveal too much information about any particular individual s data. While this model requires a higher level of trust than the local model, it is possible to design significantly more accurate algorithms. We say that two databases X, X X are neighboring if they differ in a single data point. More generally, we can consider a symmetric neighbor relation N X 2, and regard X, X as neighbors if (X, X ) N. On the other hand, in the local model, each individual (or client) randomizes their data before sending it to the server, meaning that individuals are not required to trust the server. A local DP mechanism [53] is a local randomizer A that maps the local data X X to the output Z = A(X) Z. Note that here X is the data at one user, unlike central-DP where X is the database with the data of all users. We now review the notion of (ε, δ)-central and local DP. Definition 3.1 (Differential privacy [28, 53]). 
Given a mechanism A which induces the conditional distribution P_{Z|X} of Z = A(X), we say that it satisfies (ε, δ)-DP if for any neighboring (x, x′) ∈ N and any measurable S ⊆ Z, it holds that

Pr(Z ∈ S | X = x) ≤ e^ε · Pr(Z ∈ S | X = x′) + δ.

In particular, if N = X², we say that the mechanism satisfies (ε, δ)-local DP [53].³ When a mechanism satisfies (ε, 0)-central/local DP, we will refer to it simply as ε-central/local DP. ε-DP can be generalized to metric privacy by considering a metric d_X(x, x′) over X [16, 3].

Definition 3.2 (ε·d_X-privacy [16, 3]). Given a mechanism A with conditional distribution P_{Z|X}, and a metric d_X over X, we say that A satisfies ε·d_X-privacy if for any x, x′ ∈ X and S ⊆ Z, we have

Pr(Z ∈ S | X = x) ≤ e^{ε·d_X(x, x′)} · Pr(Z ∈ S | X = x′).

This recovers the original ε-central DP by taking d_X to be the Hamming distance among databases, and recovers the original ε-local DP by taking d_X to be the discrete metric [16]. The reason we use X to refer to both the database in central DP and the user's data in local DP is that our proposed method can compress both central and local DP mechanisms in exactly the same manner. In the following sections, the mechanism A to be compressed (often written as a conditional distribution P_{Z|X}) can be either a central or local DP mechanism, and the neighbor relation N can be any symmetric relation. The encoder refers to the server in central DP, or the user in local DP. The decoder refers to the data analyst in central DP, or the server in local DP.

4 Poisson Private Representation

Definition 4.1 (Poisson functional representation [61, 60]). Let (T_i)_i be a Poisson process with rate 1 (i.e., T_1, T_2 − T_1, T_3 − T_2, . . . i.i.d. ∼ Exp(1)), independent of Z_i i.i.d. ∼ Q for i = 1, 2, . . .. Then (Z_i, T_i)_i is a Poisson process with intensity measure Q ⊗ λ_{[0,∞)} [57], where λ_{[0,∞)} is the Lebesgue measure over [0, ∞). Fix any distribution P over Z that is absolutely continuous with respect to Q. Let

T̃_i := T_i · ((dP/dQ)(Z_i))^{−1}.    (1)

Then (Z_i, T̃_i)_i is a Poisson process with intensity measure P ⊗ λ_{[0,∞)}, which follows from the mapping theorem [57]. The Poisson functional representation (PFR) [61, 60] selects the point Z = Z_K with the smallest associated T̃_K, i.e., it takes K := argmin_i T̃_i and Z := Z_K.⁴

The PFR selects a sample following the target distribution P using another distribution Q. It draws a random sequence (Z_i)_i from Q and a sequence of times (T_i)_i according to a Poisson process. If we select the sample Z_i with the smallest T_i, then the selected sample follows Q. To obtain a sample from P instead, we multiply the time by the factor ((dP/dQ)(Z_i))^{−1} in (1) to give T̃_i, so the Z_i with the smallest T̃_i will follow P.

³ Equivalently, local DP can be viewed as a special case of central DP with dataset size n = 1.
⁴ Since the T̃_i's are continuous, with probability 1, there do not exist two equal values among the T̃_i's.

The Poisson functional representation guarantees that Z ∼ P [61]. To simulate a DP mechanism with a conditional distribution P_{Z|X} using the Poisson functional representation, we can use (Z_i)_i as the shared randomness between the encoder and the decoder.⁵ Upon observing X, the encoder generates the Poisson process (T_i)_i, computes T̃_i and K using P = P_{Z|X}(·|X), and transmits K to the decoder. The decoder simply outputs Z_K, which follows the conditional distribution P_{Z|X}. The issue is that K is a function of X and the shared randomness (Z_i, T_i)_i, and a change of X may affect K in a deterministic manner, and hence this method cannot be directly used to protect the privacy of X.
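To make Definition 4.1 concrete, the following is a minimal sketch (not the authors' implementation) of the Poisson functional representation for a finite alphabet, truncated to a fixed number of points; the truncation, the toy distributions and the function name are illustrative assumptions.

```python
import numpy as np

def pfr_sample(p, q, n_points=10_000, rng=None):
    """Truncated Poisson functional representation: draw Z ~ P using proposal Q.

    p, q: probability vectors over the same finite alphabet, so dP/dQ = p/q.
    Returns (K, Z_K): the selected (1-based) index and the selected symbol.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.cumsum(rng.exponential(1.0, size=n_points))  # T_1 < T_2 < ...: rate-1 Poisson process
    z = rng.choice(len(q), size=n_points, p=q)          # Z_i i.i.d. ~ Q
    with np.errstate(divide="ignore"):
        t_tilde = t / (p[z] / q[z])                     # T~_i = T_i * (dP/dQ(Z_i))^{-1}, as in (1)
    k = int(np.argmin(t_tilde))                         # PFR keeps the point with the smallest T~_i
    return k + 1, int(z[k])

# The output is (up to truncation) distributed as P, not Q:
p, q = np.array([0.7, 0.2, 0.1]), np.array([1 / 3, 1 / 3, 1 / 3])
draws = [pfr_sample(p, q, n_points=300)[1] for _ in range(2000)]
print(np.bincount(draws, minlength=3) / len(draws))     # roughly [0.7, 0.2, 0.1]
```

The argmin selection is exactly what makes K a deterministic function of X and the shared randomness; the construction below replaces it with a randomized selection to restore privacy.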
Poisson private representation. To ensure privacy, we introduce randomness in the encoder by a generalization of the Poisson functional representation, which we call the Poisson private representation (PPR), with parameter α ∈ (1, ∞], proposal distribution Q and simulated mechanism P_{Z|X}. Both X and Z can be discrete or continuous, though as a regularity condition, we require P_{Z|X}(·|X) to be absolutely continuous with respect to Q almost surely. The PPR-compressed mechanism is given as follows:

1. We use (Z_i)_{i=1,2,...}, Z_i i.i.d. ∼ Q, as the shared randomness between the encoder and the decoder. Practically, the encoder and the decoder can share a random seed and generate Z_i i.i.d. ∼ Q from it using a pseudorandom number generator.⁶

2. The encoder knows (Z_i)_i, X, P_{Z|X} and performs the following steps:
(a) Generates the Poisson process (T_i)_i with rate 1.
(b) Computes T̃_i := T_i · ((dP/dQ)(Z_i))^{−1}, where P := P_{Z|X}(·|X). Take T̃_i = ∞ if (dP/dQ)(Z_i) = 0.
(c) Generates K ∈ Z_+ using local randomness with Pr(K = k) = T̃_k^{−α} / Σ_{i=1}^∞ T̃_i^{−α}.
(d) Compresses K (e.g., using Elias delta coding [29]) and sends it.

3. The decoder, which knows (Z_i)_i and K, outputs Z = Z_K.

Note that when α = ∞, we have K = argmin_i T̃_i, and PPR reduces to the original Poisson functional representation [61, 60]. PPR can simulate the privacy mechanism P_{Z|X} precisely, as shown in the following proposition. The proof is in Appendix A.

Proposition 4.2. The output Z of PPR follows the conditional distribution P_{Z|X} exactly.

Due to the exactness of PPR, it guarantees unbiasedness for tasks such as DME. If the goal is only to design a stand-alone privacy mechanism, we can focus on the privacy and utility of the mechanism without studying the output distribution. However, if the output of the mechanism is used for downstream tasks (e.g., for DME, after receiving information from clients, the server sends information about the aggregated mean to data analysts, where central DP is crucial), having an exact characterization of the conditional distribution of the output given the input allows us to obtain precise (central) privacy and utility guarantees.

Notably, PPR is universal in the sense that only the encoder needs to know the simulated mechanism P_{Z|X}. The decoder can decode the index K as long as it has access to the shared randomness (Z_i)_i. This allows the encoder to choose an arbitrary mechanism P_{Z|X} with the same output space Z, and adapt the choice of P_{Z|X} to the communication and privacy constraints without explicitly informing the decoder which mechanism is chosen.

Practically, the algorithm cannot compute the whole infinite sequence (T̃_i)_i. We can truncate the method and only compute T̃_1, . . . , T̃_N for a large N and select K ∈ {1, . . . , N}, which incurs a small distortion in the distribution of Z.⁷ While this method is practically acceptable, it might defeat the purpose of having an exact algorithm that ensures the correct conditional distribution P_{Z|X}. In Appendix B, we will present an exact algorithm for PPR that terminates in a finite amount of time, using a reparametrization that allows the encoder to know when the optimal point Z_i has already been encountered (see Algorithm 1 in Appendix B).

⁵ The original Poisson functional representation [61, 60] uses the whole (Z_i, T_i)_i as the shared randomness. It is clear that (T_i)_i is not needed by the decoder, and hence we can use only (Z_i)_i as the shared randomness.
⁶ We note that our analyses assume that the adversary knows both the index K and the shared randomness (Z_i)_i, and we prove that the mechanism is still private despite the shared randomness between the encoder and the decoder, since the privacy is provided by locally randomizing K in Step 2c.
⁷ To compare to the minimal random coding (MRC) [47, 20, 74] scheme in [71], which also utilizes a finite number N of samples (Z_i)_{i=1,...,N}: while truncating the number of samples to N in both PPR and MRC results in a distortion in the distribution of Z that tends to 0 as N → ∞, the difference is that log K (which is approximately the compression size) in MRC grows like log N, whereas log K does not grow with N in PPR. The size N in truncated PPR merely controls the trade-off between the accuracy of the distribution of Z and the running time of the algorithm.
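For intuition, here is a minimal truncated sketch of Steps 1–3 above for a finite alphabet (an illustrative approximation with a fixed number of shared samples, not the exact Algorithm 1 of Appendix B); the seed handling and function names are our assumptions.

```python
import numpy as np

def ppr_encode(p, q, alpha, seed, n_points=10_000):
    """Truncated PPR encoder: returns the index K (1-based) to transmit."""
    shared = np.random.default_rng(seed)                  # Step 1: shared randomness (Z_i)_i from a common seed
    local = np.random.default_rng()                       # local randomness, never revealed to the decoder
    z = shared.choice(len(q), size=n_points, p=q)         # Z_i i.i.d. ~ Q
    t = np.cumsum(local.exponential(1.0, size=n_points))  # Step 2(a): rate-1 Poisson process (T_i)_i
    with np.errstate(divide="ignore"):
        t_tilde = t / (p[z] / q[z])                       # Step 2(b): T~_i, equal to inf where dP/dQ(Z_i) = 0
    w = np.where(np.isinf(t_tilde), 0.0, t_tilde ** (-alpha))
    k = local.choice(n_points, p=w / w.sum())             # Step 2(c): Pr(K = k) proportional to T~_k^(-alpha)
    return int(k) + 1                                     # Step 2(d): K would then be prefix-coded and sent

def ppr_decode(q, k, seed, n_points=10_000):
    """Decoder (Step 3): regenerate (Z_i)_i from the shared seed and output Z_K."""
    shared = np.random.default_rng(seed)
    z = shared.choice(len(q), size=n_points, p=q)
    return int(z[k - 1])
```

Running ppr_decode(q, ppr_encode(p, q, alpha=2.0, seed=0), seed=0) reproduces a sample whose law approaches P as the truncation grows, while the decoder never observes the local randomness used to draw K.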
By the lower bound for channel simulation [6, 61], we must have H(K) ≥ I(X; Z), i.e., the compression size is at least the mutual information between the data X and the output Z. The following result shows that the compression provided by PPR is almost optimal, i.e., close to the theoretical lower bound I(X; Z). The proof is given in Appendix F.

Theorem 4.3 (Compression size of PPR). For PPR with parameter α > 1, when the encoder is given the input x, the message K given by PPR satisfies

E[log K] ≤ D(P ‖ Q) + log(3.56) / min{(α − 1)/2, 1},

where P := P_{Z|X}(·|x). As a result, when the input X ∼ P_X is random, taking Q = P_Z, we have E[log K] ≤ I(X; Z) + log(3.56) / min{(α − 1)/2, 1}.

Note that the running time complexity (which depends on the number of samples Z_i the algorithm must examine before outputting the index K) can be quite high. Since E[log K] is close to I(X; Z), K (and hence the running time) is at least exponential in I(X; Z). See more discussions in Section 8.

If a prefix-free encoding of K is required, then the number of bits needed is slightly larger than log₂ K. For example, if the Elias delta code [29] is used, the expected compression size is E[log₂ K] + 2 log₂(E[log₂ K] + 1) + 1 bits. If the Shannon code [73] (an almost-optimal prefix-free code) for the Zipf distribution p(k) ∝ k^{−λ} with λ = 1 + 1/E[log₂ K] is used, the expected compression size is E[log₂ K] + log₂(E[log₂ K] + 1) + 2 bits (see [61]). Both codes yield an I(X; Z) + O(log I(X; Z)) size, within a logarithmic gap from the lower bound I(X; Z). This is similar to some other channel simulation schemes such as [45, 10, 61], though these schemes do not provide privacy guarantees.

Note that if P_{Z|X} is ε-DP, then by definition, (dP_{Z|X=x}/dP_{Z|X=x₀})(z) ≤ e^ε for any z ∈ Z and x, x₀ ∈ X, and hence

D(P_{Z|X=x} ‖ P_{Z|X=x₀}) = E_{Z ∼ P_{Z|X=x}}[log (dP_{Z|X=x}/dP_{Z|X=x₀})(Z)] ≤ ε log e.

Setting the proposal distribution Q = P_{Z|X=x₀} for an arbitrary x₀ ∈ X gives the following bound.

Corollary 4.4 (Compression size under ε-LDP). Let P_{Z|X} satisfy ε-differential privacy. Let x₀ ∈ X and Q = P_{Z|X=x₀}. Then for PPR with parameter α > 1, the expected compression size is at most ℓ + log₂(ℓ + 1) + 2 bits, where ℓ := ε log₂ e + log₂(3.56) / min{(α − 1)/2, 1}.

Next, we analyze the privacy guarantee of PPR. The PPR method induces a conditional distribution P_{(Z_i)_i, K | X} of the knowledge of the decoder ((Z_i)_i, K), given the data X. To analyze the privacy guarantee, we study whether the randomized mapping P_{(Z_i)_i, K | X} from X to ((Z_i)_i, K) satisfies ε-DP or (ε, δ)-DP.⁸ This is similar to the privacy condition in [71], and is referred to as decoder privacy in [72], which is stronger than database privacy, which concerns the privacy of the randomized mapping from X to the final output Z [72] (which is simply the privacy of the original mechanism P_{Z|X} to be compressed, since PPR simulates P_{Z|X} precisely).

⁸ Note that the encoder does not actually send ((Z_i)_i, K); it only sends K. The common randomness (Z_i)_i is independent of the data X, and can be pre-generated using a common random seed in practice. While this seed must be communicated between the client and the server as a small overhead, the client and the server only ever need to communicate one seed to initialize a pseudorandom number generator, which can then be used in all subsequent privacy mechanisms and communication tasks (to transmit high-dimensional data or to use DP mechanisms many times). The conditional distribution P_{(Z_i)_i, K | X} is only relevant for privacy analysis.
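As a concrete instance of the prefix-free encoding of K mentioned in Step 2(d) and in the compression-size discussion above, here is a standard Elias delta coder (a textbook construction, not taken from the paper's code); its codeword length is ⌊log₂ K⌋ + 2⌊log₂(⌊log₂ K⌋ + 1)⌋ + 1 bits, consistent with the E[log₂ K] + 2 log₂(E[log₂ K] + 1) + 1 estimate quoted above.

```python
def elias_delta_encode(k: int) -> str:
    """Elias delta code [29] of a positive integer k, as a bit string."""
    assert k >= 1
    n = k.bit_length()                       # number of bits of k, i.e. floor(log2 k) + 1
    nn = n.bit_length()                      # number of bits of n
    return "0" * (nn - 1) + format(n, "b") + format(k, "b")[1:]

def elias_delta_decode(bits: str) -> int:
    zeros = 0
    while bits[zeros] == "0":                # count the leading zeros
        zeros += 1
    n = int(bits[zeros : 2 * zeros + 1], 2)  # the next zeros+1 bits give the bit length of k
    return int("1" + bits[2 * zeros + 1 : 2 * zeros + n], 2)

assert elias_delta_encode(11) == "00100011"
assert all(elias_delta_decode(elias_delta_encode(k)) == k for k in range(1, 1000))
```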
Since the decoder knows ((Z_i)_i, K), which is more than just the final output Z, we expect the PPR-compressed mechanism P_{(Z_i)_i, K | X} to have a worse privacy guarantee than the original mechanism P_{Z|X}; this is the price of having a smaller communication cost. The following result shows that, if the original mechanism P_{Z|X} is ε-DP, then the PPR-compressed mechanism is guaranteed to be 2αε-DP.

Theorem 4.5 (ε-DP of PPR). If the mechanism P_{Z|X} is ε-differentially private, then PPR P_{(Z_i)_i, K | X} with parameter α > 1 is 2αε-differentially private.

Similar results also apply to (ε, δ)-DP and metric DP.

Theorem 4.6 ((ε, δ)-DP of PPR). If the mechanism P_{Z|X} is (ε, δ)-differentially private, then PPR P_{(Z_i)_i, K | X} with parameter α > 1 is (2αε, 2δ)-differentially private.

Theorem 4.7 (Metric privacy of PPR). If the mechanism P_{Z|X} satisfies ε·d_X-privacy, then PPR P_{(Z_i)_i, K | X} with parameter α > 1 satisfies 2αε·d_X-privacy.

Refer to Appendices C and D for the proofs. In Theorem 4.5 and Theorem 4.6, PPR imposes a multiplicative penalty 2α on the privacy parameter ε. This penalty can be made arbitrarily close to 2 by taking α close to 1, which increases the communication cost (see Theorem 4.3). Compared to minimal random coding, which has a factor-2 penalty in the DP guarantee [47, 71], the 2α factor in PPR is slightly larger, though PPR ensures exact simulation (unlike [47, 71], which are approximate). The method in [31] does not have a penalty on ε, but its utility and compression size depend on computational hardness assumptions on the pseudorandom number generator, and there is no guarantee that the compression size is close to the optimum. In comparison, the compression and privacy guarantees of PPR are unconditional and do not rely on computational assumptions.

In order to make the penalty of PPR close to 1, we have to consider (ε, δ)-differential privacy and allow a small failure probability, i.e., a small increase in δ. The following result shows that PPR can compress any ε-DP mechanism into an (≈ε, ≈0)-DP mechanism as long as α is close enough to 1 (i.e., almost no inflation). More generally, PPR can compress an (ε, δ)-DP mechanism into an (≈ε, ≈2δ)-DP mechanism for α close to 1. The proof is in Appendix E.

Theorem 4.8 (Tighter (ε, δ)-DP of PPR). If the mechanism P_{Z|X} is (ε, δ)-differentially private, then PPR P_{(Z_i)_i, K | X} with parameter α > 1 is (αε + ε̄, 2(δ + δ̄))-differentially private, for every ε̄ ∈ (0, 1] and δ̄ ∈ (0, 1/3] that satisfy α ≤ e^{−4.2} δ̄ ε̄² / (−ln δ̄) + 1.
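To see the privacy-communication trade-off in α numerically, the small helper below simply evaluates the bounds stated in Theorem 4.5 and Corollary 4.4 (the function name and the example values are ours).

```python
import math

def ppr_tradeoff(eps: float, alpha: float) -> dict:
    """Decoder-side DP level (Theorem 4.5) and expected compression size in bits
    (Corollary 4.4) when PPR compresses an eps-LDP mechanism with parameter alpha."""
    eta = math.log2(3.56) / min((alpha - 1) / 2, 1)   # penalty term from Theorem 4.3
    ell = eps * math.log2(math.e) + eta               # bound on E[log2 K]
    return {"decoder_dp": 2 * alpha * eps,
            "bits": ell + math.log2(ell + 1) + 2}     # prefix-free coding overhead

# Smaller alpha drives the privacy penalty toward 2x but inflates the message size:
for a in (1.1, 2.0, 4.0):
    print(a, ppr_tradeoff(eps=1.0, alpha=a))
```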
5 Applications to Distributed Mean Estimation

We demonstrate the efficacy of PPR by applying it to distributed mean estimation (DME) [75]. Note that private DME is the core sub-routine in various private and federated optimization algorithms, such as DP-SGD [1] or DP-FedAvg [67]. Consider the following general distributed setting: each of n clients holds a local data point X_i ∈ X, and a central server aims to estimate a function μ(X^n) of all local data, subject to privacy and local communication constraints. To this end, each client i compresses X_i into a message Z_i ∈ Z via a local encoder, and we require that each Z_i can be encoded into a bit string with an expected length of at most b bits. Upon receiving Z^n := (Z_1, . . . , Z_n), the central server decodes it and outputs a DP estimate μ̂. Two DP criteria can be considered: the (ε, δ)-central DP of the randomized mapping from X^n to μ̂, and the (ε, δ)-local DP of the randomized mapping from X_i to Z_i for each client i.

In the distributed L2 mean estimation problem, X = B_d(C) := {v ∈ R^d : ‖v‖₂ ≤ C}, and the central server aims to estimate the sample mean μ(X^n) := (1/n) Σ_{i=1}^n X_i by minimizing the mean squared error (MSE) E[‖μ − μ̂‖₂²]. It was recently proved that under ε-local DP, PrivUnit [8, 4] is the optimal mechanism. By simulating PrivUnit with PPR and applying Corollary 4.4 and Theorem 4.6, we immediately obtain the following corollary:

Corollary 5.1 (PPR simulating PrivUnit). Let P be the density defined by ε-PrivUnit₂ in Bhowmick et al. [8, Algorithm 1]. Let Q be the uniform density over the sphere S^{d−1}(1/m), where the radius 1/m is defined in Bhowmick et al. [8, (15)]. Let r := e^ε. Then the outcome of PPR (see Algorithm 1) satisfies (1) 2αε-local DP; and (2) (αε + ε̄, 2δ̄)-DP for any α ≤ e^{−4.2} δ̄ ε̄² / log(1/δ̄) + 1. In addition, the average compression size is at most ℓ + log₂(ℓ + 1) + 2 bits, where ℓ := ε log₂ e + log₂(3.56) / min{(α − 1)/2, 1}. Moreover, PPR achieves the same MSE as ε-PrivUnit₂, which is O(d / min{ε, ε²}).

Note that PPR can simulate arbitrary local DP mechanisms. However, we present only the result for PrivUnit₂ because it achieves the optimal privacy-accuracy trade-off. Besides simulating local DP mechanisms, PPR can also compress central DP mechanisms while still preserving some (albeit weaker) local guarantees. We give a corollary of Theorems 4.3 and 4.6. The proof is in Appendix H.

Corollary 5.2 (PPR-compressed Gaussian mechanism). Let ε, δ ∈ (0, 1). Consider the Gaussian mechanism P_{Z|X}(·|x) = N(x, (σ²/n) I_d) and the proposal distribution Q = N(0, (C²/d + σ²/n) I_d), where σ := C√(2 ln(1.25/δ)) / ε. For each client i, let Z_i be the output of PPR applied on P_{Z|X}(·|X_i). We have: μ̂(Z^n) := (1/n) Σ_{i=1}^n Z_i yields an unbiased estimator of μ(X^n) = (1/n) Σ_{i=1}^n X_i satisfying (ε, δ)-central DP, with MSE E[‖μ − μ̂‖₂²] = σ²d/n². As long as ε < 1/√n, PPR satisfies (2α√n ε, 2δ)-local DP.⁹ The average per-client communication cost is at most ℓ + log₂(ℓ + 1) + 2 bits, where

ℓ := (d/2) log₂(nC²/(dσ²) + 1) + η_α = (d/2) log₂(nε²/(2d ln(1.25/δ)) + 1) + η_α,

where η_α := log₂(3.56) / min{(α − 1)/2, 1}.

A few remarks are in order. First, notice that when α is fixed, for an O(C²d log(1/δ)/(n²ε²)) MSE, the per-client communication cost is O(d log(nε²/(d log(1/δ)) + 1) + 1), which is at least as good as the O(nε²/log(1/δ) + 1) bound in [75, 19], and can be better than O(nε²/log(1/δ) + 1) when n ≫ d. Hence, the PPR-compressed Gaussian mechanism is order-wise optimal.
Second, compared to other works that also compress the Gaussian mechanism, PPR is the only lossless compressor; schemes based on random sparsification, projection, or minimum random coding (e.g., Triastcyn et al. [78], Chen et al. [19]) are lossy, i.e., they introduce additional distortion on top of the DP noise. Finally, other DP mechanism compressors tailored to local randomizers [31, 71] do not provide the same level of central DP guarantees when applied to local Gaussian noise since the reconstructed noise is no longer Gaussian. Refer to Section 7 for experiments. 6 Applications to Metric Privacy Metric privacy [16, 3] (see Definition 3.2) allows users to send privatized version Z Rd of their data vectors X Rd to an untrusted server, so that the server can know X approximately but not exactly. A popular mechanism is the Laplace mechanism [16, 3, 33, 34], where a d-dimensional Laplace noise is added to X. The conditional density function of Z given X is f Z|X(z|x) e εd X (x,z), where ε is the privacy parameter, and the metric d X (x, z) = x z 2 is the Euclidean distance. The Laplace mechanism achieves ε d X -privacy, and has been used, for example, in geo-indistinguishability to privatize the users locations [3], and to privatize high-dimensional word embedding vectors [33, 34]. A problem is that the real vector Z cannot be encoded into finitely many bits. To this end, [3] studies a discrete Laplace mechanism where each coordinate of Z is quantized to a finite number of levels, introducing additional distortion to Z. PPR provides an alternative compression method that preserves the statistical behavior of Z (e.g., unbiasedness) exactly. We give a corollary of Theorems 4.3 and 4.7. The proof is in Appendix I. Refer to Appendix J for an experiment on metric privacy. Corollary 6.1 (PPR-compressed Laplace mechanism). Consider PPR applied to the Laplace mechanism PZ|X where X Bd(C) = {x Rd | x 2 C}, with a proposal distribution Q = N(0, ( C2 ε2 )Id). It achieves an MSE d(d+1) ε2 , a 2αϵ d X -privacy, and a compression size at most ℓ+ log2(ℓ+ 1) + 2 bits, where d + d + 1 log2 Γ(d + 1) Γ( d 2 + 1) + ηα, where ηα := (log2(3.56))/ min{(α 1)/2, 1}. 9The restricted range on ε < 1/ n is due to the simpler privacy accountant [25]. By using the Rényi DP accountant instead, one can achieve a tighter result that applies to any n. We present the Rényi DP version of the corollary in Appendix G. Moreover, in the context of federated learning, n refers to the number of clients in each round, which is typically much smaller than the total number of clients. For example, as observed in [52], the per-round cohort size in Google s FL application typically ranges from 103 to 105, significantly smaller than the number of trainable parameters d [106, 109] or the number of available users N [106, 108]. 7 Empirical Results We empirically evaluate our scheme on the DME problem (which is formally introduced in Section 5), examine the privacy-accuracy-communication trade-off, and compare it with the Coordinate Subsampled Gaussian Mechanism (CSGM) [19, Algorithm 1], an order-optimal scheme for DME under central DP. In Chen et al. [19], each client only communicates partial information (via sampling a subset of the coordinates of the data vector) about its samples to amplify the privacy, and the compression is mainly from subsampling. Moreover, CSGM only guarantees central DP. 
We use the same setup that has been used in [19]: consider n = 500 clients, and the dimension of local vectors is d = 1000, each of which is generated according to Xi(j) i.i.d. (2 Ber(0.8) 1), where Ber(0.8) is a Bernoulli random variable with parameter p = 0.8. We require (ε, δ)-central DP with δ = 10 6 and ε [0.05, 6] and apply the PPR with α = 2 to simulate the Gaussian mechanism, where the privacy budgets are accounted via Rényi DP. We compare the MSE of PPR (α = 2, using Theorem 4.3) and CSGM under various compression sizes in Figure 1 (the y-axis is in logarithmic scale).10 Note that the MSE of the (uncompressed) Gaussian mechanism coincides with the CSGM with 1000 bits, and the PPR with only 400 bits. We see that PPR consistently achieves a smaller MSE compared to CSGM for all ε s and compression sizes considered. For ϵ = 1 and we compress d = 1000 to 50 bits, CSGM has an MSE 0.1231 , while PPR has an MSE 0.08173, giving a 33.61% reduction. For ϵ = 0.5 and we compress d = 1000 to 25 bits (the case of high compression and conservative privacy), CSGM has an MSE 0.3877, while PPR has an MSE 0.3011, giving a 22.33% reduction. These reductions are significant, since all considered mechanisms are asymptotically close to optimal and a large improvement compared to an (almost optimal) mechanism is unexpected. See Section L for more about MSE against the compression sizes. We also emphasize that PPR provides both central and local DP guarantees according to Theorem 4.5, 4.6 and 4.8. In contrast, CSGM only provides central DP guarantees. Another advantage of PPR under conservative privacy (small ϵ) is that the trade-off between ϵ and MSE of PPR exactly coincides with the trade-off of the Gaussian mechanism for small ϵ (see Figure 1), and CSGM is only close to (but strictly worse than) the Gaussian mechanism. This means that for small ϵ, PPR provides compression without any drawback in terms of ϵ-MSE trade-off compared to the Gaussian mechanism (which requires an infinite size communication to exactly realize). Moreover, although directly applying PPR on the d-dimensional vectors is impractical for a large d, one can ensure an efficient O(d) running time (see Section 8 for details) by breaking the vector with d = 1000 dimensions into small chunks of fixed lengths (we use dchunk = 50 dimensions for each chunk), and apply the PPR to each chunk. We call it the sliced PPR in Figure 1. Though the sliced PPR has a small penalty on the MSE (as shown in Figure 1), it still outperforms the CSGM (400 bits) for the range of ε in the plot. For the sliced PPR for one d = 1000 vector, when ϵ = 0.05, the running time is 1.3348 seconds on average.11 For larger ϵ s, we can choose smaller dchunk s to have reasonable running time: For ϵ = 6 and dchunk = 2 we have an average running time 0.0127 seconds and with dchunk = 4 we have an average running time 0.6343 seconds; for ϵ = 10 and dchunk = 2 we have an average running time 0.0128 seconds and with dchunk = 4 we have an average running time 0.7301 seconds. See Appendix K for more experiments on the running time of the sliced PPR. 8 Limitations While PPR is communication-efficient, having only a logarithmic gap from the theoretical lower bound on the compression size as shown in Theorem 4.3, the running time complexity can be high. However, we note that an exponential complexity is also needed in sampling methods that do not ensure privacy, such as [65, 47]. 
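A minimal sketch of the slicing strategy described above: the vector is split into fixed-size chunks and a separate PPR instance is run per chunk. The helper names are ours, and encode_chunk/decode_chunk stand for any per-chunk PPR encoder/decoder, e.g. one simulating the Gaussian mechanism restricted to that chunk.

```python
import numpy as np

def sliced_ppr_encode(x, chunk_size, encode_chunk):
    """Split a d-dimensional vector into chunks and run one PPR encoder per chunk.

    encode_chunk(x_chunk, j) should return the transmitted index K_j for chunk j.
    """
    chunks = np.array_split(x, int(np.ceil(len(x) / chunk_size)))
    return [encode_chunk(c, j) for j, c in enumerate(chunks)]

def sliced_ppr_decode(indices, decode_chunk):
    """decode_chunk(K_j, j) should return the reconstructed chunk; concatenate them."""
    return np.concatenate([decode_chunk(k, j) for j, k in enumerate(indices)])

# With d = 1000 and chunk_size = 50, each client transmits 20 indices; the running
# time becomes linear in the number of chunks (Section 8), at the cost of a small
# per-chunk overhead in compression size and MSE, as reported in Figure 1.
```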
It has been proved in [2] that no polynomial time general sampling- 10Source code: https://github.com/cheuktingli/Poisson Private Repr. Experiments were executed on M1 Pro Macbook, 8-core CPU ( 3.2 GHz) with 16GB memory. For PPR under a privacy budget ε and communication budget b, we find the largest ε ε such that the communication cost bound in Theorem 4.3 (with Shannon code [73]) for simulating the Gaussian mechanism with (ε , δ)-central DP is at most b, and use PPR to simulate this Gaussian mechanism. Thus, MSE of PPR in Figure 1 becomes flat for large ε, as PPR falls back to using a smaller ε instead of ε due to the communication budget. 11The running time is calculated by 1000 50 Tchunk, where each chunk s running time Tchunk is averaged over 1000 trials. The estimate of the mean of Tchunk is 0.0667, whereas the standard deviation is 0.2038. Figure 1: MSE of distributed mean estimation for PPR and CSGM [19] for different ε s. based method exists (even without privacy constraint), if RP = NP. All existing polynomial time exact channel simulation methods can only simulate specific noisy channels.12 Hence, a polynomial time algorithm for exactly compressing a general DP mechanism is likely nonexistent. Nevertheless, this is not an obstacle for simulating local DP mechanisms, since the mutual information I(X; Z) for a reasonable local DP mechanism must be small, or else the leakage of the data X in Z would be too large. For an ε-local DP mechanism, we have I(X; Z) min{ε, ε2} (in nats) [22]. Hence, the PPR algorithm can terminate quickly even if has a running time exponential in I(X; Z). Another way to ensure a polynomial running time is to divide the data into small chunks and apply the mechanism to each chunk separately. For example, to apply the Gaussian mechanism to a high-dimensional vector, we break it into several shorter vectors and apply the mechanism to each vector. Experiments in Section 7 show that this greatly reduces the running time while having only a small penalty on the compression size. See Appendix K for experiments on the running time of PPR. 9 Conclusion We proposed a novel scheme for compressing DP mechanisms, called Poisson private representation (PPR). Unlike previous schemes which are either constrained on special classes of DP mechanisms or introducing additional distortions on the output, our scheme can compress and exactly simulate arbitrary mechanisms while protecting differential privacy, with a compression size that is close to the theoretic lower bound. A future direction is to reduce the running time of PPR under certain restrictions on PZ|X. For example, the techniques in [38, 35] may be useful when PZ|X is unimodal. Acknowledgment YL was partially supported by the CUHK Ph D International Mobility for Partnerships and Collaborations Award 2023-24. WC and AO were supported by the NSF grant CIF-2213223. CTL was partially supported by two grants from the Research Grants Council of the Hong Kong Special Administrative Region, China [Project No.s: CUHK 24205621 (ECS), CUHK 14209823 (GRF)]. 12For example, [36] and dithered-quantization-based schemes [48, 72] can only simulate additive noise mechanisms. Among these existing works, only [72] ensures local DP. [1] Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan Mc Mahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, pages 308 318, 2016. [2] Eirikur Agustsson and Lucas Theis. 
Universally quantized neural compression. Advances in neural information processing systems, 33:12367 12376, 2020. [3] Miguel E Andrés, Nicolás E Bordenabe, Konstantinos Chatzikokolakis, and Catuscia Palamidessi. Geo-indistinguishability: Differential privacy for location-based systems. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, pages 901 914, 2013. [4] Hilal Asi, Vitaly Feldman, and Kunal Talwar. Optimal algorithms for mean estimation under local differential privacy. In International Conference on Machine Learning, pages 1046 1056. PMLR, 2022. [5] Raef Bassily and Adam Smith. Local, private, efficient protocols for succinct histograms. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 127 135, 2015. [6] Charles H Bennett, Peter W Shor, John Smolin, and Ashish V Thapliyal. Entanglement-assisted capacity of a quantum channel and the reverse Shannon theorem. IEEE Trans. Inf. Theory, 48 (10):2637 2655, 2002. [7] Charles H Bennett, Igor Devetak, Aram W Harrow, Peter W Shor, and Andreas Winter. The quantum reverse shannon theorem and resource tradeoffs for simulating quantum channels. IEEE Transactions on Information Theory, 60(5):2926 2959, 2014. [8] Abhishek Bhowmick, John Duchi, Julien Freudiger, Gaurav Kapoor, and Ryan Rogers. Protection against reconstruction and its applications in private federated learning. ar Xiv preprint ar Xiv:1812.00984, 2018. [9] Rishi Bommasani, Drew A Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et al. On the opportunities and risks of foundation models. ar Xiv preprint ar Xiv:2108.07258, 2021. [10] Mark Braverman and Ankit Garg. Public vs private coin in bounded-round information. In International Colloquium on Automata, Languages, and Programming, pages 502 513. Springer, 2014. [11] Mark Braverman, Ankit Garg, Tengyu Ma, Huy L Nguyen, and David P Woodruff. Communication lower bounds for statistical estimation problems via a distributed data processing inequality. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 1011 1020, 2016. [12] Mark Bun, Jelani Nelson, and Uri Stemmer. Heavy hitters and the structure of local privacy. In Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, SIGMOD/PODS 18, page 435 447, New York, NY, USA, 2018. Association for Computing Machinery. [13] Clément L Canonne, Gautam Kamath, and Thomas Steinke. The discrete Gaussian for differential privacy. Advances in Neural Information Processing Systems, 33:15676 15688, 2020. [14] Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, et al. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pages 2633 2650, 2021. [15] Nicolas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramer, Borja Balle, Daphne Ippolito, and Eric Wallace. Extracting training data from diffusion models. In 32nd USENIX Security Symposium (USENIX Security 23), pages 5253 5270, 2023. [16] Konstantinos Chatzikokolakis, Miguel E Andrés, Nicolás Emilio Bordenabe, and Catuscia Palamidessi. Broadening the scope of differential privacy using metrics. In Privacy Enhancing Technologies: 13th International Symposium, PETS 2013, Bloomington, IN, USA, July 10-12, 2013. Proceedings 13, pages 82 102. 
Springer, 2013. [17] Kamalika Chaudhuri, Chuan Guo, and Mike Rabbat. Privacy-aware compression for federated data analysis. In Uncertainty in Artificial Intelligence, pages 296 306. PMLR, 2022. [18] Wei-Ning Chen, Peter Kairouz, and Ayfer Özgür. Breaking the communication-privacy-accuracy trilemma. Advances in Neural Information Processing Systems, 33:3312 3324, 2020. [19] Wei-Ning Chen, Dan Song, Ayfer Özgür, and Peter Kairouz. Privacy amplification via compression: Achieving the optimal privacy-accuracy-communication trade-off in distributed mean estimation. Advances in Neural Information Processing Systems, 36, 2024. [20] Paul Cuff. Communication requirements for generating correlated random variables. In 2008 IEEE International Symposium on Information Theory, pages 1393 1397. IEEE, 2008. [21] Paul Cuff. Distributed channel synthesis. IEEE Trans. Inf. Theory, 59(11):7071 7096, Nov 2013. [22] Paul Cuff and Lanqing Yu. Differential privacy as a mutual information constraint. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 43 54, 2016. [23] John Duchi and Ryan Rogers. Lower bounds for locally private estimation via communication complexity. In Conference on Learning Theory, pages 1161 1191. PMLR, 2019. [24] John C Duchi, Michael I Jordan, and Martin J Wainwright. Local privacy and statistical minimax rates. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 429 438. IEEE, 2013. [25] Cynthia Dwork. Differential privacy. In International colloquium on automata, languages, and programming, pages 1 12. Springer, 2006. [26] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3 4):211 407, 2014. [27] Cynthia Dwork, Krishnaram Kenthapadi, Frank Mc Sherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology EUROCRYPT 2006: 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, St. Petersburg, Russia, May 28-June 1, 2006. Proceedings 25, pages 486 503. Springer, 2006. [28] Cynthia Dwork, Frank Mc Sherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pages 265 284. Springer, 2006. [29] Peter Elias. Universal codeword sets and representations of the integers. IEEE transactions on information theory, 21(2):194 203, 1975. [30] Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2468 2479. SIAM, 2019. [31] Vitaly Feldman and Kunal Talwar. Lossless compression of efficient private local randomizers. In International Conference on Machine Learning, pages 3208 3219. PMLR, 2021. [32] Vitaly Feldman, Audra Mc Millan, and Kunal Talwar. Hiding among the clones: A simple and nearly optimal analysis of privacy amplification by shuffling. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 954 964. IEEE, 2022. [33] Natasha Fernandes, Mark Dras, and Annabelle Mc Iver. Generalised differential privacy for text document processing. In Principles of Security and Trust: 8th International Conference, POST 2019, pages 123 148. Springer International Publishing, 2019. 
[34] Oluwaseyi Feyisetan, Borja Balle, Thomas Drake, and Tom Diethe. Privacy-and utilitypreserving textual analysis via calibrated multivariate perturbations. In Proceedings of the 13th international conference on web search and data mining, pages 178 186, 2020. [35] Gergely Flamich. Greedy poisson rejection sampling. Advances in Neural Information Processing Systems, 36, 2024. [36] Gergely Flamich and Lucas Theis. Adaptive greedy rejection sampling. In 2023 IEEE International Symposium on Information Theory (ISIT), pages 454 459. IEEE, 2023. [37] Gergely Flamich, Marton Havasi, and José Miguel Hernández-Lobato. Compressing images by encoding their latent representations with relative entropy coding. Advances in Neural Information Processing Systems, 33:16131 16141, 2020. [38] Gergely Flamich, Stratis Markou, and José Miguel Hernández-Lobato. Fast relative entropy coding with A* coding. In International Conference on Machine Learning, pages 6548 6577. PMLR, 2022. [39] Gergely Flamich, Stratis Markou, and José Miguel Hernández Lobato. Faster relative entropy coding with greedy rejection coding. ar Xiv preprint ar Xiv:2309.15746, 2023. [40] Ankit Garg, Tengyu Ma, and Huy Nguyen. On communication cost of distributed statistical estimation and dimensionality. In Advances in Neural Information Processing Systems, pages 2726 2734, 2014. [41] Daniel Goc and Gergely Flamich. On channel simulation with causal rejection samplers. ar Xiv preprint ar Xiv:2401.16579, 2024. [42] Slawomir Goryczka and Li Xiong. A comprehensive comparison of multiparty secure additions with differential privacy. IEEE transactions on dependable and secure computing, 14(5): 463 477, 2015. [43] Chuan Guo, Kamalika Chaudhuri, Pierre Stock, and Michael Rabbat. Privacy-aware compression for federated learning through numerical mechanism design. In International Conference on Machine Learning, pages 11888 11904. PMLR, 2023. [44] Yuanxin Guo, Sadaf Salehkalaibar, Stark C. Draper, and Wei Yu. One-shot achievability region for hypothesis testing with communication constraint. In accepted at the IEEE Information Theory Workshop. IEEE, 2024. [45] Prahladh Harsha, Rahul Jain, David Mc Allester, and Jaikumar Radhakrishnan. The communication complexity of correlation. IEEE Trans. Inf. Theory, 56(1):438 449, Jan 2010. [46] Burak Hasırcıo glu and Deniz Gündüz. Communication efficient private federated learning using dithering. In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 7575 7579. IEEE, 2024. [47] Marton Havasi, Robert Peharz, and José Miguel Hernández-Lobato. Minimal random code learning: Getting bits back from compressed model parameters. In 7th International Conference on Learning Representations, ICLR 2019, 2019. [48] Mahmoud Hegazy and Cheuk Ting Li. Randomized quantization with exact error distribution. In 2022 IEEE Information Theory Workshop (ITW), pages 350 355. IEEE, 2022. [49] Mahmoud Hegazy, Rémi Leluc, Cheuk Ting Li, and Aymeric Dieuleveut. Compression with exact error distribution for federated learning. In Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, volume 238 of Proceedings of Machine Learning Research, pages 613 621. PMLR, 02 04 May 2024. [50] Henri Hentilä, Yanina Y Shkel, and Visa Koivunen. Communication-constrained secret key generation: Second-order bounds. IEEE Transactions on Information Theory, 2024. [51] Berivan Isik, Wei-Ning Chen, Ayfer Özgür, Tsachy Weissman, and Albert No. 
Exact optimality of communication-privacy-utility tradeoffs in distributed mean estimation. Advances in Neural Information Processing Systems, 36, 2024. [52] Peter Kairouz, H Brendan Mc Mahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al. Advances and open problems in federated learning. Foundations and trends in machine learning, 14(1 2):1 210, 2021. [53] Shiva Prasad Kasiviswanathan, Homin K Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793 826, 2011. [54] Ashish Khisti, Arash Behboodi, Gabriele Cesa, and Pratik Kumar. Unequal message protection: One-shot analysis via poisson matching lemma. In 2024 IEEE International Symposium on Information Theory (ISIT). IEEE, 2024. [55] Samuel Kotz, Tomasz Kozubowski, and Krzystof Podgorski. The Laplace distribution and generalizations: a revisit with applications to communications, economics, engineering, and finance. Springer Science & Business Media, 2012. [56] Natalie Lang, Elad Sofer, Tomer Shaked, and Nir Shlezinger. Joint privacy enhancement and quantization in federated learning. IEEE Transactions on Signal Processing, 71:295 310, 2023. [57] Günter Last and Mathew Penrose. Lectures on the Poisson process, volume 7. Cambridge University Press, 2017. [58] Cheuk Ting Li. Pointwise redundancy in one-shot lossy compression via Poisson functional representation. ar Xiv preprint, 2024. [59] Cheuk Ting Li and Venkat Anantharam. Pairwise multi-marginal optimal transport and embedding for earth mover s distance. ar Xiv preprint ar Xiv:1908.01388, 2019. [60] Cheuk Ting Li and Venkat Anantharam. A unified framework for one-shot achievability via the Poisson matching lemma. IEEE Transactions on Information Theory, 67(5):2624 2651, 2021. [61] Cheuk Ting Li and Abbas El Gamal. Strong functional representation lemma and applications to coding theorems. IEEE Transactions on Information Theory, 64(11):6967 6978, Nov 2018. [62] Cheuk Ting Li, Xiugang Wu, Ayfer Özgür, and Abbas El Gamal. Minimax learning for remote prediction. In 2018 IEEE ISIT, pages 541 545, June 2018. doi: 10.1109/ISIT.2018.8437318. [63] Yanxiao Liu and Cheuk Ting Li. One-shot information hiding. In accepted at the IEEE Information Theory Workshop. IEEE, 2024. [64] Yanxiao Liu and Cheuk Ting Li. One-shot coding over general noisy networks. ar Xiv preprint ar Xiv:2402.06021, 2024. [65] Chris J Maddison. A Poisson process model for Monte Carlo. Perturbation, Optimization, and Statistics, pages 193 232, 2016. [66] Chris J Maddison, Daniel Tarlow, and Tom Minka. A* sampling. Advances in neural information processing systems, 27, 2014. [67] Brendan Mc Mahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273 1282. PMLR, 2017. [68] Ilya Mironov. Rényi differential privacy. In 2017 IEEE 30th computer security foundations symposium (CSF), pages 263 275. IEEE, 2017. [69] Buu Phan, Ashish Khisti, and Christos Louizos. Importance matching lemma for lossy compression with side information. In International Conference on Artificial Intelligence and Statistics, pages 1387 1395. PMLR, 2024. [70] John K Salmon, Mark A Moraes, Ron O Dror, and David E Shaw. Parallel random numbers: as easy as 1, 2, 3. 
In Proceedings of 2011 international conference for high performance computing, networking, storage and analysis, pages 1 12, 2011. [71] Abhin Shah, Wei-Ning Chen, Johannes Balle, Peter Kairouz, and Lucas Theis. Optimal compression of locally differentially private mechanisms. In International Conference on Artificial Intelligence and Statistics, pages 7680 7723. PMLR, 2022. [72] Ali Moradi Shahmiri, Chih Wei Ling, and Cheuk Ting Li. Communication-efficient laplace mechanism for differential privacy via random quantization. In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4550 4554. IEEE, 2024. [73] Claude E Shannon. A mathematical theory of communication. Bell system technical journal, 27(3):379 423, 1948. [74] Eva C Song, Paul Cuff, and H Vincent Poor. The likelihood encoder for lossy compression. IEEE Trans. Inf. Theory, 62(4):1836 1849, 2016. [75] Ananda Theertha Suresh, X Yu Felix, Sanjiv Kumar, and H Brendan Mc Mahan. Distributed mean estimation with limited communication. In International conference on machine learning, pages 3329 3337. PMLR, 2017. [76] Lucas Theis and Noureldin Y Ahmed. Algorithms for the communication of samples. In International Conference on Machine Learning, pages 21308 21328. PMLR, 2022. [77] Lucas Theis, Tim Salimans, Matthew D Hoffman, and Fabian Mentzer. Lossy compression with Gaussian diffusion. ar Xiv preprint ar Xiv:2206.08889, 2022. [78] Aleksei Triastcyn, Matthias Reisser, and Christos Louizos. DP-REC: Private & communicationefficient federated learning. ar Xiv preprint ar Xiv:2111.05454, 2021. [79] Stanley L Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63 69, 1965. [80] Guangfeng Yan, Tan Li, Tian Lan, Kui Wu, and Linqi Song. Layered randomized quantization for communication-efficient and privacy-preserving distributed learning. ar Xiv preprint ar Xiv:2312.07060, 2023. [81] Jacob Ziv. On universal quantization. IEEE Transactions on Information Theory, 31(3):344 347, 1985. A Proof of Proposition 4.2 Write (Xi)i PP(µ) if the points (Xi)i (as a multiset, ignoring the ordering) form a Poisson point process with intensity measure µ. Similarly, for f : [0, )n [0, ), we write PP(f) for the Poisson point process with intensity function f (i.e., the intensity measure has a Radon-Nikodym derivative f against the Lebesgue measure). Let (Ti)i PP(1) be a Poisson process with rate 1, independent of Z1, Z2, . . . iid Q. By the marking theorem [57], (Zi, Ti)i PP(Q λ[0, )), where Q λ[0, ) is the product measure between Q and the Lebesgues measure over [0, ). Let P = PZ|X( |x), and Ti = Ti ( d P d Q(Zi)) 1. By the mapping theorem [57] (also see [61, 60]), (Zi, Ti)i PP(P λ[0, )). Note that the points (Zi, Ti)i may not be sorted in ascending order of Ti. Therefore, we will sort them as follows. Let j1 be the j such that Tj is the smallest, j2 be the j other than j1 such that Tj is the smallest, and so on. Break ties arbitrarily. Then ( Tji)i is an ascending sequence, and we still have (Zji, Tji)i PP(P λ[0, )) since we are merely rearranging the points. Comparing (Zji, Tji)i PP(P λ[0, )) with the definition of (Zi, Ti)i PP(Q λ[0, )), we can see that ( Tji)i PP(1) is independent of Zj1, Zj2, . . . iid P. Recall that in PPR, we generate K Z+ with Pr(K = k) = T α k P i=1 T α i , and the final output is ZK. 
Rearranging the points according to $(j_i)_i$, the distribution of the final output remains the same if we instead generate $K \in \mathbb{Z}_+$ with $\Pr(K = k) = \tilde{T}_{j_k}^{-\alpha} / \sum_{i=1}^{\infty} \tilde{T}_{j_i}^{-\alpha}$, and take the final output to be $Z_{j_K}$. Since $(\tilde{T}_{j_i})_i \sim \mathrm{PP}(1)$ is independent of $Z_{j_i} \overset{iid}{\sim} P$, we know that $K$ is independent of $(Z_{j_i})_i$, and hence $Z_{j_K} \sim P$ follows the desired distribution.

B Reparametrization and Detailed Algorithm of PPR

We now discuss the implementation of the Poisson private representation in Section 4. In practice, the algorithm cannot compute the whole infinite sequence $(\tilde{T}_i)_i$. We now present an exact algorithm for PPR that terminates in a finite amount of time using a reparametrization. In the proof of Theorem F.1, we showed that, letting $(T_i)_i \sim \mathrm{PP}(1)$, $Z_1, Z_2, \ldots \overset{iid}{\sim} Q$, $R_i := (dP/dQ)(Z_i)$, $V_1, V_2, \ldots \overset{iid}{\sim} \mathrm{Exp}(1)$, PPR can be equivalently expressed as $K = \operatorname{argmin}_k T_k^{\alpha} R_k^{-\alpha} V_k$. The problem with finding $K$ is that there is no stopping criterion for the argmin. For example, if we scan the points $(T_i, R_i, V_i)_i$ in increasing order of $T_i$, it is always possible that there is a future point with $V_i$ so small that it makes $T_i^{\alpha} R_i^{-\alpha} V_i$ smaller than the current minimum. If we scan the points in increasing order of $V_i$ instead, it is likewise possible that there is a future point with a very small $T_i$. We could scan the points in increasing order of $U_i := T_i^{\alpha} V_i$, but then we would not know the indices of the points in the original process (in which $T_1 \le T_2 \le \cdots$ are in increasing order), which is necessary to find the $Z_i$ corresponding to each point (recall that in PPR, the point with the smallest $T_i$ corresponds to $Z_1$, the second smallest $T_i$ corresponds to $Z_2$, etc.). Therefore, we will scan the points in increasing order of $B_i := T_i^{\alpha} \min\{V_i, 1\}$ instead. By the mapping theorem [57], $(T_i^{\alpha})_i \sim \mathrm{PP}(\alpha^{-1} t^{1/\alpha - 1})$. By the marking theorem [57], $(T_i^{\alpha}, V_i)_i \sim \mathrm{PP}(\alpha^{-1} t^{1/\alpha - 1} e^{-v})$. By the mapping theorem, $(T_i^{\alpha}, T_i^{\alpha} V_i)_i \sim \mathrm{PP}(\alpha^{-1} t^{1/\alpha - 2} e^{-v t^{-1}})$. Since $B_i = \min\{T_i^{\alpha}, T_i^{\alpha} V_i\}$, again by the mapping theorem,
$(B_i)_i \sim \mathrm{PP}\Big( \int_b^{\infty} \alpha^{-1} b^{1/\alpha - 2} e^{-v b^{-1}} \, dv + \int_b^{\infty} \alpha^{-1} t^{1/\alpha - 2} e^{-b t^{-1}} \, dt \Big) = \mathrm{PP}\big( \alpha^{-1} b^{1/\alpha - 1} e^{-1} + \alpha^{-1} b^{1/\alpha - 1} \gamma(1 - \alpha^{-1}, 1) \big) = \mathrm{PP}\big( \alpha^{-1} (e^{-1} + \gamma_1) b^{1/\alpha - 1} \big),$
where $\gamma_1 := \gamma(1 - \alpha^{-1}, 1)$ and $\gamma(\beta, x) = \int_0^x e^{-\tau} \tau^{\beta - 1} d\tau$ is the lower incomplete gamma function. Comparing the distribution of $(B_i)_i$ and $(T_i^{\alpha})_i$, we can generate $(B_i)_i$ by first generating $(U_i)_i \sim \mathrm{PP}(1)$, and then taking $B_i = (U_i \alpha / (e^{-1} + \gamma_1))^{\alpha}$. The conditional distribution of $(T_i, V_i)$ given $B_i = b$ is described as follows. With probability $e^{-1}/(e^{-1} + \gamma_1)$, we have $T_i^{\alpha} = b$ and $T_i^{\alpha} V_i \sim b(\mathrm{Exp}(1) + 1)$, and hence $T_i = b^{1/\alpha}$ and $V_i \sim \mathrm{Exp}(1) + 1$. With probability $\gamma_1/(e^{-1} + \gamma_1)$, we have $T_i^{\alpha} V_i = b$ and $T_i^{\alpha}$ follows the density $t \mapsto \frac{\alpha^{-1} t^{1/\alpha - 2} e^{-b t^{-1}}}{\alpha^{-1} \gamma(1 - \alpha^{-1}, 1) b^{1/\alpha - 1}}$. Hence, for $0 < \tau \le 1$, $\Pr(V_i \le \tau) = \Pr(T_i^{\alpha} \ge b/\tau) = \gamma(1 - \alpha^{-1}, \tau) / \gamma(1 - \alpha^{-1}, 1)$, and $V_i$ follows the truncated gamma distribution with shape $1 - \alpha^{-1}$ and scale 1, truncated within the interval $[0, 1]$. We then have $T_i = (b/V_i)^{1/\alpha}$. The algorithm is given in Algorithm 1. The encoder and decoder require a shared random seed $s$. One way to generate $s$ is to have the encoder and decoder maintain two synchronized pseudorandom number generators (PRNGs) that are always at the same state, and invoke the PRNGs to generate $s$, guaranteeing that the $s$ at the encoder is the same as the $s$ at the decoder. The encoder maintains a collection of points $(T_i, V_i, \Theta_i)$, stored in a heap to allow fast query and removal of the point with the smallest $T_i$. The value $\Theta_i \in \{0, 1\}$ indicates whether it is possible that the point $(T_i, V_i)$ attains the minimum of $T_k^{\alpha} R_k^{-\alpha} V_k$.
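To make the reparametrized sampling above concrete, the following is a minimal Python sketch of how one candidate point can be drawn (the names `next_point`, `alpha`, `gamma1`, and the accumulator `u` are ours, and this is an illustration of the derivation rather than the authors' released code). It advances the rate-1 process to obtain the next $B_i$ and then samples $(T_i, V_i)$ from the conditional distribution given $B_i = b$:

```python
import numpy as np
from scipy.special import gammainc, gamma as gamma_fn


def next_point(u, alpha, rng):
    """Draw the next candidate point (b, t, v) of the reparametrized process.

    u is the running sum of Exp(1) increments (a rate-1 Poisson process);
    (t, v) is sampled from the conditional law given B = b, following the
    two cases derived in Appendix B.
    """
    # gamma1 = lower incomplete gamma function evaluated at (1 - 1/alpha, 1)
    gamma1 = gammainc(1.0 - 1.0 / alpha, 1.0) * gamma_fn(1.0 - 1.0 / alpha)
    u = u + rng.exponential(1.0)                       # advance the rate-1 process
    b = (u * alpha / (np.exp(-1.0) + gamma1)) ** alpha
    if rng.uniform() < np.exp(-1.0) / (np.exp(-1.0) + gamma1):
        # Case 1: T^alpha = b, and V ~ Exp(1) + 1
        t = b ** (1.0 / alpha)
        v = rng.exponential(1.0) + 1.0
    else:
        # Case 2: T^alpha * V = b, and V ~ Gamma(1 - 1/alpha, 1) truncated to [0, 1]
        while True:
            v = rng.gamma(1.0 - 1.0 / alpha, 1.0)
            if v <= 1.0:
                break
        t = (b / v) ** (1.0 / alpha)
    return u, b, t, v
```

Repeated calls with the running value of `u` produce candidate points in increasing order of $B_i$; the full encoder below additionally maintains the heap and the stopping rule.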
The encoding algorithm repeats until there are no possible points left in the heap and it is impossible for any future point to be better than the current minimum of $T_k^{\alpha} R_k^{-\alpha} V_k$. The encoding time complexity is $O(\sup_z (dP/dQ)(z) \cdot \log(\sup_z (dP/dQ)(z)))$, which is close to other sampling-based channel simulation schemes [45, 36].13 The decoding algorithm simply outputs the $k$-th sample generated using the random seed $s$, which can be performed in $O(1)$ time.14 The PPR is implemented by Algorithm 1. We write $x \sim \mathrm{Exp}_{\mathcal{G}}(1)$ to mean that we generate an exponential random variate $x$ with rate 1 using the pseudorandom number generator $\mathcal{G}$, and write $x \sim \mathrm{Exp}_{\mathrm{local}}(1)$ to mean that $x$ is generated using a local pseudorandom number generator (not $\mathcal{G}$).

13 It was shown in [36] that greedy rejection sampling [45] runs in $O(\sup_z (dP/dQ)(z))$ time. The PPR algorithm has an additional log term due to the use of a heap.
14 A counter-based PRNG [70] allows us to directly jump to the state after $k$ uses of the PRNG, without the need of generating all $k$ samples, greatly improving the decoding efficiency. This technique is applicable to greedy rejection sampling [45] and the original Poisson functional representation [61, 60] as well.

Algorithm 1 Poisson private representation

Procedure PPRENCODE(α, Q, r, r*, s):
Input: parameter α > 1, distribution Q, density r(z) := (dP/dQ)(z), bound r* ≥ sup_z r(z), random seed s
Output: index k ∈ Z_{>0}
1: Initialize PRNG G using the seed s
2: u ← 0, w* ← ∞, k ← 0, k* ← 0, n ← 0
3: γ_1 ← γ(1 − α^{-1}, 1) = ∫_0^1 e^{−τ} τ^{−α^{-1}} dτ
4: h ← (empty heap)
5: while true do
6:   u ← u + Exp_local(1)   ▷ Generated using local randomness (not G)
7:   b ← (uα/(e^{−1} + γ_1))^α
8:   if n = 0 and b(r*)^{−α} ≥ w* then   ▷ No possible points left and future points impossible
9:     return k*
10:  end if
11:  if Unif_local(0, 1) < e^{−1}/(e^{−1} + γ_1) then   ▷ Run with prob. e^{−1}/(e^{−1} + γ_1)
12:    t ← b^{1/α}, v ← Exp_local(1) + 1
13:  else
14:    repeat
15:      v ← Gamma_local(1 − α^{-1}, 1)   ▷ Gamma distribution
16:    until v ≤ 1
17:    t ← (b/v)^{1/α}
18:  end if
19:  θ ← 1{(t/r*)^α v < w*}   ▷ Is it possible for this point to be optimal
20:  Push (t, v, θ) to h
21:  n ← n + θ   ▷ Number of possible points in heap
22:  while h ≠ ∅ and min_{(t',v',θ')∈h} t' ≤ b^{1/α} do   ▷ Assign Z_i's to points in heap with small T_i
23:    (t, v, θ) ← argmin_{(t',v',θ')∈h} t', and pop (t, v, θ) from h
24:    n ← n − θ
25:    k ← k + 1
26:    Generate z ∼ Q using G
27:    w ← (t/r(z))^α v
28:    if w < w* then
29:      w* ← w
30:      k* ← k
31:    end if
32:  end while
33: end while

Procedure PPRDECODE(Q, k, s):
Input: Q, index k ∈ Z_{>0}, seed s
Output: sample z
1: Initialize PRNG G using the seed s
2: for i = 1, 2, ..., k do
3:   Generate z ∼ Q using G   ▷ See footnote 14
4: end for
5: return z

C Proofs of Theorem 4.5 and Theorem 4.7

We first prove Theorem 4.5. Consider an ε-DP mechanism $P_{Z|X}$. Consider neighbors $x_1, x_2$, and let $P_j := P_{Z|X}(\cdot|x_j)$, $\tilde{T}_{j,i} := T_i / (\frac{dP_j}{dQ}(Z_i))$, and $K_j$ be the output of PPR applied on $P_j$, for $j = 1, 2$. Since $P_{Z|X}$ is ε-DP,
$e^{-\varepsilon} \frac{dP_2}{dQ}(z) \le \frac{dP_1}{dQ}(z) \le e^{\varepsilon} \frac{dP_2}{dQ}(z)$  (2)
for Q-almost every z,15 and hence $e^{-\varepsilon} \tilde{T}_{2,i} \le \tilde{T}_{1,i} \le e^{\varepsilon} \tilde{T}_{2,i}$. For $k \in \mathbb{Z}_+$, we have, almost surely,
$\Pr(K_1 = k \mid (Z_i, T_i)_i) = \frac{\tilde{T}_{1,k}^{-\alpha}}{\sum_{i=1}^{\infty} \tilde{T}_{1,i}^{-\alpha}} \le \frac{e^{\alpha\varepsilon} \tilde{T}_{2,k}^{-\alpha}}{\sum_{i=1}^{\infty} e^{-\alpha\varepsilon} \tilde{T}_{2,i}^{-\alpha}} = e^{2\alpha\varepsilon} \Pr(K_2 = k \mid (Z_i, T_i)_i).$
For any measurable $S \subseteq \mathcal{Z}^{\infty} \times \mathbb{Z}_{>0}$,
$\Pr(((Z_i)_i, K_1) \in S) = \mathbf{E}\big[ \Pr( ((Z_i)_i, K_1) \in S \mid (Z_i, T_i)_i ) \big] = \mathbf{E}\Big[ \sum_{k: ((Z_i)_i, k) \in S} \Pr(K_1 = k \mid (Z_i, T_i)_i) \Big] \le e^{2\alpha\varepsilon} \mathbf{E}\Big[ \sum_{k: ((Z_i)_i, k) \in S} \Pr(K_2 = k \mid (Z_i, T_i)_i) \Big] = e^{2\alpha\varepsilon} \Pr(((Z_i)_i, K_2) \in S).$  (3)
Hence, $P_{(Z_i)_i, K | X}$ is 2αε-DP. For Theorem 4.7, consider an $\varepsilon d_{\mathcal{X}}$-private mechanism $P_{Z|X}$, and consider $x_1, x_2 \in \mathcal{X}$. We have
$e^{-\varepsilon d_{\mathcal{X}}(x_1, x_2)} \frac{dP_2}{dQ}(z) \le \frac{dP_1}{dQ}(z) \le e^{\varepsilon d_{\mathcal{X}}(x_1, x_2)} \frac{dP_2}{dQ}(z)$  (4)
for Q-almost every z.
By exactly the same arguments as in the proof of Theorem 4.5, Pr (((Zi)i, K1) S) e2αε d X (x1,x2) Pr (((Zi)i, K2) S), and hence P(Zi)i,K|X is 2αε d X - private. D Proof of Theorem 4.6 Consider a (ε, δ)-DP mechanism PZ|X. Consider neighbors x1, x2, and let Pj := PZ|X( |xj), and Kj be the output of PPR applied on Pj, for j = 1, 2. By the definition of (ε, δ)-differential privacy, we have Z max {ρ1(z) eερ2(z), 0} Q(dz) δ, (5) Z max {ρ2(z) eερ1(z), 0} Q(dz) δ. (6) ρ(z) := min max ρ1(z), e ερ2(z) , eερ2(z) . Note that e ερ2(z) ρ(z) eερ2(z). We then consider two cases: ρ(z)Q(dz) 1. Let ρ3(z) be such that R ρ3(z)Q(dz) = 1 and ρ(z) ρ3(z) eερ2(z). We can always find such ρ3 by taking an appropriate convex combination of the lower bound above (which integrates to 1) and the upper obund above (which integrates to 1). We then have e ερ2(z) ρ3(z) eερ2(z). (7) 15ε-DP only implies that (2) holds for P1-almost every z (or equivalently P2-almost every z since P1, P2 are absolutely continuous with respect to each other). We now show that (2) holds for Q-almost every z. Apply Lebesgue s decomposition theorem to find measures Q, ˆQ such that Q = Q + ˆQ, Q P1 and ˆQ P1. There exists Z Z such that P1(Z ) = 1 and ˆQ(Z ) = 0. Since P1 Q, we have P1 Q. We have (2) for Q-almost every z. Also, we have (2) for ˆQ-almost every z since z / Z gives d P1 d Q (z) = d P1 d ˆ Q (z) = 0 for ˆQ-almost every z, and also d P2 d Q (z) = 0 for ˆQ-almost every z since P2 P1. If ρ1(z) eερ2(z) 0, then ρ1(z) ρ3(z) ρ1(z) ρ(z) 0. If ρ1(z) eερ2(z) > 0, then ρ3(z) = ρ(z) = eερ2(z). Either way, we have max {ρ1(z) ρ3(z), 0} = max {ρ1(z) eερ2(z), 0}. By (5), we have Z max {ρ1(z) ρ3(z), 0} Q(dz) δ. Let P3 = ρ3Q be the probability measure with d P3/d Q = ρ3. Then the total variation distance d TV(P1, P3) between P1 and P3 is at most δ, and by (7), d Q (z) d P3 d Q (z) eε d P2 d Q (z). (8) ρ(z)Q(dz) > 1. Let ρ3(z) be such that R ρ3(z)Q(dz) = 1 and e ερ2(z) ρ3(z) ρ(z). We can always find such ρ3 by taking an appropriate convex combination of the lower bound above (which integrates to 1) and the upper obund above (which integrates to > 1). We again have e ερ2(z) ρ3(z) eερ2(z). If e ερ2(z) ρ1(z) 0, then ρ3(z) ρ1(z) ρ(z) ρ1(z) 0. If e ερ2(z) ρ1(z) > 0, then ρ3(z) = ρ(z) = e ερ2(z). Either way, we have max {ρ3(z) ρ1(z), 0} = max {e ερ2(z) ρ1(z), 0}. By (6), we have Z max {ρ3(z) ρ1(z), 0} Q(dz) e εδ δ. Let P3 = ρ3Q be the probability measure with d P3/d Q = ρ3. Again, we have d TV(P1, P3) δ and (8). Therefore, regardless of whether Case 1 or Case 2 holds, we can construct P3 satisfying d TV(P1, P3) δ and (8). Let K3 be the output of PPR applied on P3. In the proof of Theorem F.1, we see that PPR has the following equivalent formulation. Let (Ti)i PP(1) be a Poisson process with rate 1, independent of Z1, Z2, . . . iid Q. Let Ri := (d P/d Q)(Zi), and let its probability measure be PR. Let V1, V2, . . . iid Exp(1). PPR can be equivalently expressed as K = argmin k T α k R α k Vk = argmin k Tk V 1/α k Rk . Note that (Ti V 1/α i )i PP( R 0 v 1/αe vdv) = PP(Γ(1 α 1)) is a uniform Poisson process. Therefore PPR is the same as the Poisson functional representation [61, 60] applied on (Ti V 1/α i )i. By the grand coupling property of Poisson functional representation [60, 59] (see [59, Theorem 3]), if we apply the Poisson functional representation on P1 and P3 to get K1 and K3 respectively, then Pr(K1 = K3) 2d TV(P1, P3) 2δ. 
Therefore, for any measurable S Z Z>0, Pr (((Zi)i, K1) S) Pr (((Zi)i, K3) S) + 2δ e2αε Pr (((Zi)i, K2) S) + 2δ, where the last inequality is by applying (3) on P3, P2 instead of P1, P2. Hence, P(Zi)i,K|X is (2αε, 2δ)-DP. E Proof of Theorem 4.8 We present the proof of (ε, δ)-DP of PPR (i.e., Theorem 4.8). Proof. We assume where β := e 4.2. Using the Laplace functional of the Poisson process ( Ti)i [57, Theorem 3.9], for w > 0, 0 (1 exp( wt α))dt (10) = exp w1/αΓ(1 α 1) . We first bound the left tail of P i T α i . By Chernoff bound, for d 0, inf w>0 ewd E = inf w>0 exp wd w1/αΓ(1 α 1) α α 1 d Γ(1 α 1) 1 α 1 Γ(1 α 1) = exp Γ(1 α 1) α α 1 d 1 α 1 α α α 1 α 1 α 1 αd (Γ(1 α 1))α 1 α 1 1 α 1 ! (Γ(2 α 1))α 1 α 1 1 α 1 ! (α 1)d (Γ(2 α 1))α Therefore, to guarantee Pr(P i T α i d) δ/3, we require d Γ(2 α 1)α ln( δ/3) (α 1) Γ(2 α 1)α ln( δ/3) (α 1) (exp ( γ(α 1)))α ln( δ2) (α 1) ! 2 ln δ β δ ε2 ln δ 2e 1β δ ε2 ! exp 0.81 β ε2 since 1 < α 2, 0 < δ 1/3, β = e 4.2 and 0 < ε 1, where γ is the Euler-Mascheroni constant. Hence, we have i T α i e ε/2 We then bound the right tail of P i T α i . Unfortunately, the Laplace functional (10) does not work since the integral diverges for small t. Therefore, we have to bound t away from 0. Note that mini Ti Exp(1), and hence Pr(min i Ti δ/3) δ/3. (12) Write τ = δ/3. Using the Laplace functional again, for w > 0, τ (1 exp(wt α))dt τ (exp(wt α) 1)dt τ (exp(wτ α) 1) t α = exp exp(wτ α) 1 τ α τ (α 1) = exp τ(exp(wτ α) 1) Therefore, by Chernoff bound, for d 0, inf w>0 exp wd + τ(exp(wτ α) 1) exp dτ α ln(d(α 1)τ α 1) + τ(exp(ln(d(α 1)τ α 1)) 1) = exp dτ α ln(d(α 1)τ α 1) + τ d(α 1)τ α 1 1 = exp cτ α 1 ln c + τ c 1 = exp τ α 1 (c ln c c + 1) exp τ(2 ln 2 1)(c 1)2 where c := d(α 1)τ α 1, and the last inequality holds whenever c [1, 2] since in this range, c ln c c + 1 (2 ln 2 1)(c 1)2. Substituting we have c = e ε/2τ α 1. By (13), to guarantee Pr(P i: Ti>τ T α i d) δ/3 = τ, we require τ(2 ln 2 1)(e ε/2τ α 1 1)2 (α 1)( ln τ) τ(2 ln 2 1) + 1. (14) Substituting (9), we have e ε/2τ α 1 e ε/2τ 2 + 2 ln δ β ε 3 ln δ since 0 < δ 1/3. Note that this also guarantees c = e ε/2τ α 1 [1, 2] since β = e 4.2 and 0 < ε 1. We also have (α 1)( ln τ) τ(2 ln 2 1) ln δ( ln τ) τ(2 ln 2 1) ln δ( 2 ln δ) ( δ/3)(2 ln 2 1) 2 ln 2 1 16β ε2. (α 1)( ln τ) τ(2 ln 2 1) + 1 4 ε p (a) exp ε 1 e ε/2τ α 1, where (a) is by β = e 4.2. Hence (14) is satisfied, and T α i e ε/2 Combining this with (11) and (12), i T α i / he ε/2 i T α i e ε/2 + Pr(min i Ti δ/3) T α i e ε/2 Consider an (ε, δ)-differentially private mechanism PZ|X. Consider neighbors x1, x2, and let Pj := PZ|X( |xj), Tj,i := Ti/( d Pj d Q (Zi)), and Kj be the output of PPR applied on Pj, for j = 1, 2. We first consider the case δ = 0, which gives d P1 d Q (z) eε d P2 d Q (z) for every z. For any measurable S Z Z>0, Pr (((Zi)i, K1) S) = E Pr ((Zi)i, K1) S (Zi, Ti)i k: ((Zi)i,k) S Pr K1 = k (Zi, Ti)i k: ((Zi)i,k) S i T α 1,i he ε/2 k: ((Zi)i,k) S i T α 1,i , 1 k: ((Zi)i,k) S T α 1,k e ε/2/(α 1), 1 k: ((Zi)i,k) S d Q (Zk))αT α k e ε/2/(α 1) , 1 k: ((Zi)i,k) S d Q (Zk))αT α k e ε/2/(α 1) , 1 i T α 2,i he ε/2 k: ((Zi)i,k) S d Q (Zk))αT α k e ε/2/(α 1) , 1 k: ((Zi)i,k) S d Q (Zk))αT α k e ε P i T α 2,i , 1 k: ((Zi)i,k) S = eαε+ ε Pr (((Zi)i, K2) S) + 2 δ. (15) Hence PPR is (αε + ε, 2 δ)-differentially private. For the case δ > 0, by the definition of (ε, δ)-differential privacy, we have Z max d P1 d Q (z) eε d P2 d Q (z), 0 Q(dz) δ. Let P3 be a probability measure that satisfies d Q (z), eε d P2 d Q (z) d P3 d Q (z) eε d P2 for every z. 
Such P3 can be constructed by taking an appropriate convex combination of the lower bound above (which integrates to 1) and the upper bound above (which integrates to 1) such that P3 integrates to 1. We have Z max d P1 d Q (z) d P3 d Q (z), 0 Q(dz) δ, and hence the total variation distance d TV(P1, P3) between P1 and P3 is at most δ. Let K3 be the output of PPR applied on P3. In the proof of Theorem F.1, we see that PPR has the following equivalent formulation. Let (Ti)i PP(1) be a Poisson process with rate 1, independent of Z1, Z2, . . . iid Q. Let Ri := (d P/d Q)(Zi), and let its probability measure be PR. Let V1, V2, . . . iid Exp(1). PPR can be equivalently expressed as K = argmin k T α k R α k Vk = argmin k Tk V 1/α k Rk . Note that (Ti V 1/α i )i PP( R 0 v 1/αe vdv) = PP(Γ(1 α 1)) is a uniform Poisson process. Therefore PPR is the same as the Poisson functional representation [61, 60] applied on (Ti V 1/α i )i. By the grand coupling property of Poisson functional representation [60, 59] (see [59, Theorem 3]), if we apply the Poisson functional representation on P1 and P3 to get K1 and K3 respectively, then Pr(K1 = K3) 2d TV(P1, P3) 2δ. Therefore, for any measurable S Z Z>0, Pr (((Zi)i, K1) S) Pr (((Zi)i, K3) S) + 2δ eαε+ ε Pr (((Zi)i, K2) S) + 2 δ + 2δ, where the last inequality is by applying (15) on P3, P2 instead of P1, P2. This completes the proof. F Proof of Theorem 4.3 We now bound the size of the index output by the Poisson private representation. The following is a refined version of Theorem 4.3. Theorem F.1. For PPR with parameter α > 1, when the encoder is given the input x, the message K given by PPR satisfies E[log K] D(P Q) + inf η (0,1] (0,α 1) 1 η log α )Γ(η + 1) (Γ(1 1 D(P Q) + log(3.56) min{(α 1)/2, 1}, (17) where P := PZ|X( |x). Note that for α = , (16) with η = 1 gives E[log K] D(P Q) + log 2, recovering the bound in [58] (which strengthened [61]). Proof. Write (Xi)i PP(µ) if the points (Xi)i (as a multiset, ignoring the ordering) form a Poisson point process with intensity measure µ. Similarly, for f : [0, )n [0, ), we write PP(f) for the Poisson point process with intensity function f (i.e., the intensity measure has a Radon-Nikodym derivative f against the Lebesgue measure). Let (Ti)i PP(1) be a Poisson process with rate 1, independent of Z1, Z2, . . . iid Q. Let Ri := (d P/d Q)(Zi), and let its probability measure be PR. We have Ti = Ti/Ri. Let V1, V2, . . . iid Exp(1). By the property of exponential random variables, for any p1, p2, . . . 0 with P i pi < , we have Pr(argmink Vk/pk = k) = pk/ P i pi. Therefore, PPRF can be equivalently expressed as K = argmin k T α k R α k Vk. By the marking theorem [57], (Ti, Ri, Vi)i is a Poisson process over [0, )3 with intensity measure (Ti, Ri, Vi)i PP e v PR(r) . By the mapping theorem [57], letting Wi := T α i R α i Vi, we have (Ti, Ri, Wi)i PP rαt αe wrαt αPR(r) . (18) Again by the mapping theorem, (Wi)i PP ER PR 0 Rαt αe w Rαt αdt = PP E h α 1(w Rα)1/α 1Γ(1 α 1)Rαi = PP E h α 1w1/α 1Γ(1 α 1)R i = PP α 1w1/α 1Γ(1 α 1) since E[R] = R (d P/d Q)(z)Q(dz) = 1. Recall that WK = mini Wi by the definition of K. We have Pr(WK > w) = exp Z w 0 α 1v1/α 1Γ(1 α 1)dv = exp w1/αΓ(1 α 1) . Hence the probability density function of WK is dw exp w1/αΓ(1 α 1) = α 1w1/α 1Γ(1 α 1) exp w1/αΓ(1 α 1) . 
(19) By (18), the Radon-Nikodym derivative between the conditional distribution of RK given WK = w and PR is Pr(RK [r, r + dr) | WK = w)/PR(dr) R 0 rαt αe wrαt αdt ER PR R 0 Rαt αe w Rαt αdt = α 1w1/α 1Γ(1 α 1)r α 1w1/α 1Γ(1 α 1) = r does not depend on w. Hence RK is independent of WK. By (18), for 0 η < α 1, E[T η K | RK = r, WK = w] R 0 tηrαt αe wrαt αdt R 0 rαt αe wrαt αdt = α 1w(η+1)/α 1Γ(1 (η + 1)α 1)rη+1 α 1w1/α 1Γ(1 α 1)r . (20) Since RK is independent of WK, using (20) and (19), for η (0, 1] (0, α 1), E[T η K | RK = r] 0 α 1w(η+1)/α 1Γ(1 (η + 1)α 1)rη exp w1/αΓ(1 α 1) dw = rηΓ(1 (η + 1)α 1) Z 0 α 1w(η+1)/α 1 exp w1/αΓ(1 α 1) dw = rηΓ(1 (η + 1)α 1)(Γ(1 α 1)) (η+1)Γ(η + 1) =: cα,ηrη, (21) where cα,η := Γ(1 (η + 1)α 1)(Γ(1 α 1)) (η+1)Γ(η + 1). Hence, E[log(TK + 1) | RK = r] E[log((T η K + 1)1/η) | RK = r] = E[η 1 log(T η K + 1) | RK = r] η 1 log(cα,ηrη + 1). (22) Note that K 1 = |{i : Ti < TK}| , and hence the expecation of K 1 given TK should be around TK. This is not exact since conditioning on TK changes the distribution of the process (Ti, Ri, Vi)i. To resolve this problem, we define a new process (T i, R i, V i )i which includes all points in (Ti, Ri, Vi)i excluding the point (TK, RK, VK), together with newly generated points according to PP e v PR(r)1{tαr αv < T α KR α K VK} . Basically, {tαr αv < T α KR α K VK} is the impossible region where the points in (Ti, Ri, Vi)i cannot be located in, since K attains the minimum of T α KR α K VK. The new process (T i, R i, V i )i removes the point (TK, RK, VK), and then fills in the impossible region. It is straightforward to check that (T i, R i, V i )i PP(e v PR(r)), independent of (TK, RK, VK). We have E[K | TK] = E h |{i : Ti < TK}| TK i + 1 E h |{i : T i < TK}| TK i + 1 = TK + 1. Therefore, by (22) and Jensen s inequality, E[log K] = E [E[log K | TK]] E [log(TK + 1)] = E E log(TK + 1) RK E η 1 log(cα,ηRη K + 1) d Q(Z) η + 1 = η 1E log d P d Q(Z) η + η 1E log cα,η + d P D(P Q) + η 1 log cα,η + E d P D(P Q) + η 1 log(cα,η + 1), where the last line is due to E[((d P/d Q)(Z)) 1] = R ((d P/d Q)(Z)) 1P(dz) 1 (this step appeared in [58]). The bound (16) follows from minimizing over η (0, 1] (0, α 1). To show (17), substituting η = min{(α 1)/2, 1}, cα,η = Γ(1 (η + 1)α 1)Γ(η + 1) (Γ(1 α 1))η+1 (a) (1 α 1)η+1 0.885η+1 (1 (η + 1)α 1) 0.8852 (1 ((α 1)/2 + 1)α 1) = 2 0.8852 (1 α 1)η where (a) is because 0.885 xΓ(x) = Γ(x + 1) 1 for 0 < x 1. Hence, E[log K] D(P Q) + η 1 log(cα,η + 1), D(P Q) + log(3.56) min{(α 1)/2, 1}. G Distributed Mean Estimation with Rényi DP In many machine learning applications, privacy budgets are often accounted in the moment space, and one popular moment accountant is the Rényi DP accountant. For completeness, we provide a Rényi DP version of Corollary 5.2 in this section. We begin with the following definition of Rényi DP: Definition G.1 (Rényi Differential privacy [1, 68]). Given a mechanism A which induces the conditional distribution PZ|X of Z = A(X), we say that it satisfies (γ, ε)- Rényi DP if for any neighboring (x, x ) N and S Z, it holds that Dγ PZ|X=x PZ|X=x ε, Dγ (P Q) := 1 γ 1 log EQ is the Rényi divergence between P and Q. If N = X 2, we say that the mechanism satisfies (γ, ε)-local DP. The following conversion lemma from [13] relates Rényi DP to (εDP(δ), δ)-DP. Lemma G.2. If A satisfies (γ, ε)-Rényi DP for some γ 1, then, for any δ > 0, A satisfies (εDP(δ), δ)-DP, where εDP(δ) = ε + log (1/γδ) γ 1 + log(1 1/γ). 
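As a quick numeric illustration of this conversion (a sketch under our own function and variable names, not code from the paper), one can compute εDP(δ) from a Rényi guarantee as follows:

```python
import math


def renyi_to_dp(gamma: float, eps_renyi: float, delta: float) -> float:
    """Convert a (gamma, eps_renyi)-Renyi DP guarantee to (eps_dp, delta)-DP
    using the conversion stated in Lemma G.2 (from [13])."""
    assert gamma > 1 and 0 < delta < 1
    return (eps_renyi
            + math.log(1.0 / (gamma * delta)) / (gamma - 1.0)
            + math.log(1.0 - 1.0 / gamma))


# Example: a (gamma=10, eps=0.5)-Renyi DP mechanism at delta = 1e-6
print(renyi_to_dp(10.0, 0.5, 1e-6))
```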
(23) The following theorem states that, when simulating the Gaussian mechanism, PPR satisfies the following both central and local DP guarantee: Corollary G.3 (PPR-compressed Gaussian mechanism). Let ε 0 and γ 1. Consider the Gaussian mechanism PZ|X( |x) = N(x, σ2 n Id), and the proposal distribution Q = N(0, ( C2 n )Id), where σ q 2ε . For each client i, let Zi be the output of PPR applied on PZ|X( |Xi). We have: ˆµ(Zn) := 1 i Zi yields an unbiased estimator of µ(Xn) = 1 n Pn i=1 Xi satisfying (γ, ε)- (central) Rényi DP and (εDP(δ), δ)-DP, where εDP(δ) is defined in (23). PZ|Xi satisfies (2α εDP(δ), 2δ)-local DP, where εDP(δ) := nε + log (1/γδ) γ 1 + log(1 1/γ). ˆµ(Zn) has MSE E[ µ ˆµ 2 2] = σ2d/n2. The average per-client communication cost is at most ℓ+ log2(ℓ+ 1) + 2 bits where dσ2 + 1 + ηα d 2d ln(1.25/δ) + 1 + ηα, where ηα := (log2(3.56))/ min{(α 1)/2, 1}. Proof. The central DP guarantee follows from [68] and Lemma G.2. The local DP guarantee follows from Lemma G.2 and Theorem 4.8. Finally, the communication bound can be obtained from the same analysis as in Corollary 5.2. H Proof of Corollary 5.2 Consider the PPR applied on the Gaussian mechanism PZ|X( |x) = N(x, σ2 n Id), with the proposal distribution Q = N(0, ( C2 n )Id). PPR ensures that Zi follows the distribution N(Xi, σ2 n Id). Therefore the MSE is E µ ˆµ 2 2 = E i=1 (Xi Zi) For the compression size, for x Rd with x 2 C, we have D(PZ|X( |x) Q) = EZ PZ|X( |x) log d PZ|X( |x) = EZ PZ|X( |x) log (2πσ2/n) d/2 exp( 1 2 Z x 2 2/(σ2/n)) n )) d/2 exp( 1 2 Z 2 2/( C2 = EZ PZ|X( |x) n Z x 2 2 σ2/n Hence, by Theorem 4.3, the compression size is at most ℓ+ log2(ℓ+ 1) + 2 bits, where dσ2 + 1 + ηα 2d ln(1.25/δ) + 1 + ηα nϵ2 log2(e) 4 ln(1.25/δ) + ηα, where ηα := (log2(3.56))/ min{(α 1)/2, 1}. The central-DP guarantee follows from (ε, δ)-DP of Gaussian mechanism [26, Appendix A] since the output distribution of PPR is exactly the same as the Gaussian mechanism, whereas the local-DP guarantee follows from Theorem 4.6 and [26, Appendix A]. I Proof of Corollary 6.1 Let X Z 2 = RS where R [0, ) is the magnitude of X Z, and S 2 = 1. As shown in [33], R follows the Gamma distribution with shape d and scale 1/ε. Hence the MSE is E X Z 2 2 = E R2 = d ε2 = d(d + 1) The conditional differential entropy (in nats) of Z given X is h(Z|X) = h(R) + h(S|R) = d + ln Γ(d) (d 1)ψ(d) ln ε + E ln(n Rd 1Vol(Bd(1))) = d + ln Γ(d) (d 1)ψ(d) ln ε + ln d + ln(Vol(Bd(1))) + (d 1)E [ln R] = d + ln Γ(d) (d 1)ψ(d) ln ε + ln d + d 2 ln π ln Γ d (d 1) ln ϵ + (d 1)ψ(d) ε + ln dΓ(d) Γ( d where ψ is the digamma function. Therefore, the KL divergence between PZ|X( |x) and Q (in nats) is D(PZ|X( |x) Q) = EZ PZ|X( |x) Z 2 2 2( C2 + EZ PZ|X( |x) Z 2 2 ε2 ) d ln e π ε ln dΓ(d) Γ( d + C2 + d(d+1) ε2 ) d ln e π ε ln dΓ(d) Γ( d d + d + 1 ln Γ(d + 1) Hence, by Theorem 4.3, the compression size is at most ℓ+ log2(ℓ+ 1) + 2 bits. The metric privacy guarantee follows from Theorem 4.7. J Experiments on Metric Privacy We use PPR to simulate the Laplace mechanism [3, 33, 34] f Z|X(z|x) e εd X (x,z) discussed in Section 6. We consider X Bd(C) where C = 10000 and d = 500. A large number of dimensions d is common, for example, in privatizing word embedding vectors [33, 34]. We compare the performance of PPR-compressed Laplace mechanism (Corollary 6.1) with the discrete Laplace mechanism [3]. 
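For concreteness, sampling from the multivariate Laplace mechanism $f_{Z|X}(z|x) \propto e^{-\varepsilon \|x - z\|_2}$ used in this experiment can be sketched as follows, using the magnitude/direction decomposition from Appendix I (function and variable names are ours; a minimal illustration, not the authors' code):

```python
import numpy as np


def laplace_mechanism_sample(x: np.ndarray, eps: float,
                             rng: np.random.Generator) -> np.ndarray:
    """Draw Z with density proportional to exp(-eps * ||x - z||_2).

    Uses the decomposition Z = x + R * S, where R ~ Gamma(shape=d, scale=1/eps)
    and S is uniform on the unit sphere (see Appendix I and [33]).
    """
    d = x.shape[0]
    r = rng.gamma(shape=d, scale=1.0 / eps)   # magnitude of the noise
    s = rng.normal(size=d)
    s /= np.linalg.norm(s)                    # uniform direction on the sphere
    return x + r * s


rng = np.random.default_rng(0)
z = laplace_mechanism_sample(np.zeros(500), eps=1.0, rng=rng)
```

PPR compresses a sample from exactly this conditional distribution, whereas the discrete Laplace baseline described next quantizes a truncated version of it.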
The discrete Laplace mechanism is described as follows (slightly modified from [3] to work for the d-ball $B_d(C)$): 1) generate a Laplace noise $Y$ with probability density function $f_Y(y) \propto e^{-\varepsilon \|y\|_2}$; 2) compute $\hat{Z} = X + Y$; 3) truncate $\hat{Z}$ to the closest point $Z$ in $B_d(C)$; and 4) quantize each coordinate of $Z$ by a quantizer with step size $u > 0$. The number of bits required by the discrete Laplace mechanism is $\log_2(\mathrm{Vol}(B_d(C))/u^d)$, where $\mathrm{Vol}(B_d(C))/u^d$ is the number of quantization cells (hypercubes of side length $u$) inside $B_d(C)$. The parameter $u$ is selected to fit the number of bits allowed. Figure 2 shows the mean squared error of the PPR-compressed Laplace mechanism (α = 2) and the discrete Laplace mechanism for different values of ε, when the number of bits is limited to 500, 1000 and 1500.16 We can see that PPR performs better for larger ε or smaller MSE, whereas the discrete Laplace mechanism performs better for smaller ε or larger MSE. The better performance of the discrete Laplace mechanism for smaller ε is due to the truncation step, which limits Z to $B_d(C)$ and reduces the error at the expense of distorting the distribution of Z and making Z a biased estimate of X. In comparison, PPR preserves the Laplace conditional distribution $f_{Z|X}$ exactly, and hence produces an unbiased Z.

16 The MSE of PPR is computed using the closed-form formula in Corollary 6.1, which is tractable since Z follows the Laplace conditional distribution $f_{Z|X}$ exactly. The number of bits used by PPR is given by the bound in Corollary 6.1. The MSE of the discrete Laplace mechanism is estimated using 5000 trials per data point. Although we do not plot the error bars, the largest coefficient of variation of the sample mean (i.e., standard error of the mean divided by the sample mean) is only 0.00117, which would be unnoticeable even if the error bars were plotted.

Figure 2: MSE of the PPR-compressed Laplace mechanism and the discrete Laplace mechanism [3] for different values of ε.

K Running Time of PPR

As discussed in Section 7, we can ensure an O(d) running time for the Gaussian mechanism by using the sliced PPR, where the d-dimensional vector X is divided into d/d_chunk chunks, each with a fixed dimension d_chunk (possibly except the last chunk if d_chunk is not a factor of d); a short sketch of this chunking is given after the caption of Figure 3 below. The average total running time is (d/d_chunk) · T_chunk, where T_chunk is the average running time of PPR applied on one chunk.17 Therefore, to study the running time of the sliced PPR, we study how T_chunk depends on d_chunk. In Figure 3 we show the running time T_chunk of PPR applied on one chunk with dimension d_chunk, where d_chunk ranges from 40 to 110.18 With d = 1000, n = 500, ε = 0.05 and δ = 10^{-6}, we require a Gaussian mechanism with noise $\mathcal{N}(0, \frac{\sigma^2}{n} I_{d_{\mathrm{chunk}}})$ where σ = 1.0917 at each user in order to ensure (ε, δ)-central DP. We record the mean T_chunk and the standard error of the mean19 of the running time of PPR applied to simulate this Gaussian mechanism (averaged over 20000 trials).

17 Note that the chunks may be processed in parallel for improved efficiency.
18 Experiments were executed on an M1 Pro MacBook, 8-core CPU (~3.2 GHz) with 16GB memory.
19 The standard error of the mean is given by σ_mean = σ_time/√n_trials, where σ_time is the standard deviation of the running time among the n_trials = 20000 trials.

Figure 3: Average running time of PPR applied to a chunk of dimension d_chunk, with error bars indicating the interval T_chunk ± 2σ_mean, where T_chunk is the sample mean of the running time, and σ_mean is the standard error of the mean (see Footnote 19).
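The chunk splitting used by the sliced PPR above is straightforward; the following minimal sketch (with a hypothetical `ppr_encode` standing in for Algorithm 1 applied to one chunk) shows how a d-dimensional input is processed chunk by chunk, which is also how per-chunk timings such as those in Figure 3 can be collected:

```python
import time
import numpy as np


def sliced_ppr_encode(x: np.ndarray, d_chunk: int, ppr_encode):
    """Apply PPR independently to each chunk of dimension d_chunk.

    `ppr_encode` is assumed to map one chunk to its PPR index K; it is a
    placeholder here. Returns the list of indices and the total encoding time.
    """
    chunks = [x[i:i + d_chunk] for i in range(0, len(x), d_chunk)]
    indices, total_time = [], 0.0
    for chunk in chunks:
        start = time.perf_counter()
        indices.append(ppr_encode(chunk))
        total_time += time.perf_counter() - start
    return indices, total_time
```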
We plot the average running time (over 20000 trials for each data point) against the values of ϵ [0.06, 10], with dchunk always chosen to be 4. The average running time is denoted as Tchunk, and the standard error of the mean is given by σmean = σtime/ ntrials, where σtime is the standard deviation of the running time among the σtime = 20000 trials. Figure 4: Average running time (over 20000 trials), dchunk = 4 and ε [0.06, 10], with error bars indicating the interval Tchunk 2σmean, where Tchunk is the sample mean of the running time, and σmean is the standard error of the mean. L MSE against Compression Size We plot the MSE against the compression size (ranging from 25 to 1000 bits) for ϵ {0.25, 0.5, 1.0, 2.0} in the following figure. Figure 5: The MSE of PPR and CSGM against the compression size in bits, where ε is chosen from {0.25, 0.5, 1.0, 2.0} and compression sizes vary from 25 to 1000 bits. Note that parts of the curves for PPR are flat, because a lower compression size is already sufficient for PPR to exactly simulate the best Gaussian mechanism for that value of ε, so a higher compression size than necessary will not affect the result. Neur IPS Paper Checklist Question: Do the main claims made in the abstract and introduction accurately reflect the paper s contributions and scope? Answer: [Yes] Justification: We have clearly stated our paper s contributions and scope in both the abstract and introduction. We try to answer the fundamental question: how can we efficiently communicate privatized data? The main contribution is our novel DP mechanism compressor , whose main advantages: universality, exactness and communication efficiency, have been elaborated in the introduction. Guidelines: The answer NA means that the abstract and introduction do not include the claims made in the paper. The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations. A No or NA answer to this question will not be perceived well by the reviewers. The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings. It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper. 2. Limitations Question: Does the paper discuss the limitations of the work performed by the authors? Answer: [Yes] Justification: The main limitation of the proposed method is that it has a running time exponential with respect to the mutual information, though this is not an obstacle for simulating local DP mechanism (where the mutual information must be small). This is elaborated in Section 8. Note that there is no limitation on the type of DP mechanism that can be compressed by PPR (it can compress any DP mechanism to almost the theoretically smallest size). See Section 8 for details. Guidelines: The answer NA means that the paper has no limitation while the answer No means that the paper has limitations, but those are not discussed in the paper. The authors are encouraged to create a separate "Limitations" section in their paper. The paper should point out any strong assumptions and how robust the results are to violations of these assumptions (e.g., independence assumptions, noiseless settings, model well-specification, asymptotic approximations only holding locally). The authors should reflect on how these assumptions might be violated in practice and what the implications would be. 
The authors should reflect on the scope of the claims made, e.g., if the approach was only tested on a few datasets or with a few runs. In general, empirical results often depend on implicit assumptions, which should be articulated. The authors should reflect on the factors that influence the performance of the approach. For example, a facial recognition algorithm may perform poorly when image resolution is low or images are taken in low lighting. Or a speech-to-text system might not be used reliably to provide closed captions for online lectures because it fails to handle technical jargon. The authors should discuss the computational efficiency of the proposed algorithms and how they scale with dataset size. If applicable, the authors should discuss possible limitations of their approach to address problems of privacy and fairness. While the authors might fear that complete honesty about limitations might be used by reviewers as grounds for rejection, a worse outcome might be that reviewers discover limitations that aren t acknowledged in the paper. The authors should use their best judgment and recognize that individual actions in favor of transparency play an important role in developing norms that preserve the integrity of the community. Reviewers will be specifically instructed to not penalize honesty concerning limitations. 3. Theory Assumptions and Proofs Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof? Answer: [Yes] Justification: All theorems are precise mathematical statements, with all necessary assumptions included in either the theorem statement or the definition of PPR. Complete proofs are included in the appendices. Guidelines: The answer NA means that the paper does not include theoretical results. All the theorems, formulas, and proofs in the paper should be numbered and crossreferenced. All assumptions should be clearly stated or referenced in the statement of any theorems. The proofs can either appear in the main paper or the supplemental material, but if they appear in the supplemental material, the authors are encouraged to provide a short proof sketch to provide intuition. Inversely, any informal proof provided in the core of the paper should be complemented by formal proofs provided in appendix or supplemental material. Theorems and Lemmas that the proof relies upon should be properly referenced. 4. Experimental Result Reproducibility Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)? Answer: [Yes] Justification: The detailed pseudocode of the PPR method is given in Algorithm 1 in the appendix, and the implementation together with the codes and data for Section 7 have been submitted in the supplementary material. Guidelines: The answer NA means that the paper does not include experiments. If the paper includes experiments, a No answer to this question will not be perceived well by the reviewers: Making the paper reproducible is important, regardless of whether the code and data are provided or not. If the contribution is a dataset and/or model, the authors should describe the steps taken to make their results reproducible or verifiable. Depending on the contribution, reproducibility can be accomplished in various ways. 
For example, if the contribution is a novel architecture, describing the architecture fully might suffice, or if the contribution is a specific model and empirical evaluation, it may be necessary to either make it possible for others to replicate the model with the same dataset, or provide access to the model. In general. releasing code and data is often one good way to accomplish this, but reproducibility can also be provided via detailed instructions for how to replicate the results, access to a hosted model (e.g., in the case of a large language model), releasing of a model checkpoint, or other means that are appropriate to the research performed. While Neur IPS does not require releasing code, the conference does require all submissions to provide some reasonable avenue for reproducibility, which may depend on the nature of the contribution. For example (a) If the contribution is primarily a new algorithm, the paper should make it clear how to reproduce that algorithm. (b) If the contribution is primarily a new model architecture, the paper should describe the architecture clearly and fully. (c) If the contribution is a new model (e.g., a large language model), then there should either be a way to access this model for reproducing the results or a way to reproduce the model (e.g., with an open-source dataset or instructions for how to construct the dataset). (d) We recognize that reproducibility may be tricky in some cases, in which case authors are welcome to describe the particular way they provide for reproducibility. In the case of closed-source models, it may be that access to the model is limited in some way (e.g., to registered users), but it should be possible for other researchers to have some path to reproducing or verifying the results. 5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We have provided detailed implementations of Algorithm 1 and data for Figure 1 (the main experimental result) in the supplemental material. Guidelines: The answer NA means that paper does not include experiments requiring code. Please see the Neur IPS code and data submission guidelines (https://nips.cc/ public/guides/Code Submission Policy) for more details. While we encourage the release of code and data, we understand that this might not be possible, so No is an acceptable answer. Papers cannot be rejected simply for not including code, unless this is central to the contribution (e.g., for a new open-source benchmark). The instructions should contain the exact command and environment needed to run to reproduce the results. See the Neur IPS code and data submission guidelines (https: //nips.cc/public/guides/Code Submission Policy) for more details. The authors should provide instructions on data access and preparation, including how to access the raw data, preprocessed data, intermediate data, and generated data, etc. The authors should provide scripts to reproduce all experimental results for the new proposed method and baselines. If only a subset of experiments are reproducible, they should state which ones are omitted from the script and why. At submission time, to preserve anonymity, the authors should release anonymized versions (if applicable). 
Providing as much information as possible in supplemental material (appended to the paper) is recommended, but including URLs to data and code is permitted. 6. Experimental Setting/Details Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results? Answer: [Yes] Justification: We have specified all the experiment details in Section 7. Guidelines: The answer NA means that the paper does not include experiments. The experimental setting should be presented in the core of the paper to a level of detail that is necessary to appreciate the results and make sense of them. The full details can be provided either with the code, in appendix, or as supplemental material. 7. Experiment Statistical Significance Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments? Answer: [Yes] Justification: For the PPR applied on the Gaussian mechanism in Section 7, the mean squared error is obtained by a closed-form formula, and the compression size is bounded using the expression in Theorem 4.3. Therefore, the plot about PPR in in Figure 1 is precise. For the running time of sliced PPR, we have reported the standard error of the mean of the running time via error bars in Figure 3. For the experiments on the Laplace mechanism in Appendix J, the standard error of the mean has been recorded for the estimation of the MSE of the discrete Laplace mechanism, and the largest coefficient of variation of the sample mean is reported, which is very small and would be unnoticeable even if the error bars were plotted. Guidelines: The answer NA means that the paper does not include experiments. The authors should answer "Yes" if the results are accompanied by error bars, confidence intervals, or statistical significance tests, at least for the experiments that support the main claims of the paper. The factors of variability that the error bars are capturing should be clearly stated (for example, train/test split, initialization, random drawing of some parameter, or overall run with given experimental conditions). The method for calculating the error bars should be explained (closed form formula, call to a library function, bootstrap, etc.) The assumptions made should be given (e.g., Normally distributed errors). It should be clear whether the error bar is the standard deviation or the standard error of the mean. It is OK to report 1-sigma error bars, but one should state it. The authors should preferably report a 2-sigma error bar than state that they have a 96% CI, if the hypothesis of Normality of errors is not verified. For asymmetric distributions, the authors should be careful not to show in tables or figures symmetric error bars that would yield results that are out of range (e.g. negative error rates). If error bars are reported in tables or plots, The authors should explain in the text how they were calculated and reference the corresponding figures or tables in the text. 8. Experiments Compute Resources Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: We have provided all the necessary information on the experiments in Section 7, including the computer and CPU information. 
Figure 1 plotted according to closed-form formulas and hence can be immediately derived. Guidelines: The answer NA means that the paper does not include experiments. The paper should indicate the type of compute workers CPU or GPU, internal cluster, or cloud provider, including relevant memory and storage. The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute. The paper should disclose whether the full research project required more compute than the experiments reported in the paper (e.g., preliminary or failed experiments that didn t make it into the paper). 9. Code Of Ethics Question: Does the research conducted in the paper conform, in every respect, with the Neur IPS Code of Ethics https://neurips.cc/public/Ethics Guidelines? Answer: [Yes] Justification: We confirm that the research conducted in this paper conforms with the Neur IPS Code of Ethics. Guidelines: The answer NA means that the authors have not reviewed the Neur IPS Code of Ethics. If the authors answer No, they should explain the special circumstances that require a deviation from the Code of Ethics. The authors should make sure to preserve anonymity (e.g., if there is a special consideration due to laws or regulations in their jurisdiction). 10. Broader Impacts Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed? Answer: [No] Justification: We believe research on privacy preservation generally has a positive societal impact, and research on reducing the communication cost of privacy mechanisms (such as this paper) is beneficial. Nevertheless, we have not addressed this in the paper, since we believe our theoretical results has no specific societal impact outside of the general impacts of privacy and reduced communication cost. Guidelines: The answer NA means that there is no societal impact of the work performed. If the authors answer NA or No, they should explain why their work has no societal impact or why the paper does not address societal impact. Examples of negative societal impacts include potential malicious or unintended uses (e.g., disinformation, generating fake profiles, surveillance), fairness considerations (e.g., deployment of technologies that could make decisions that unfairly impact specific groups), privacy considerations, and security considerations. The conference expects that many papers will be foundational research and not tied to particular applications, let alone deployments. However, if there is a direct path to any negative applications, the authors should point it out. For example, it is legitimate to point out that an improvement in the quality of generative models could be used to generate deepfakes for disinformation. On the other hand, it is not needed to point out that a generic algorithm for optimizing neural networks could enable people to train models that generate Deepfakes faster. The authors should consider possible harms that could arise when the technology is being used as intended and functioning correctly, harms that could arise when the technology is being used as intended but gives incorrect results, and harms following from (intentional or unintentional) misuse of the technology. 
If there are negative societal impacts, the authors could also discuss possible mitigation strategies (e.g., gated release of models, providing defenses in addition to attacks, mechanisms for monitoring misuse, mechanisms to monitor how a system learns from feedback over time, improving the efficiency and accessibility of ML). 11. Safeguards Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)? Answer: [NA] Justification: This piece of research is theoretical. We do not release any new dataset or pretrained model. Guidelines: The answer NA means that the paper poses no such risks. Released models that have a high risk for misuse or dual-use should be released with necessary safeguards to allow for controlled use of the model, for example by requiring that users adhere to usage guidelines or restrictions to access the model or implementing safety filters. Datasets that have been scraped from the Internet could pose safety risks. The authors should describe how they avoided releasing unsafe images. We recognize that providing effective safeguards is challenging, and many papers do not require this, but we encourage authors to take this into account and make a best faith effort. 12. Licenses for existing assets Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected? Answer: [Yes] Justification: We have appropriately mentioned and cited the paper whenever we compare with it. Guidelines: The answer NA means that the paper does not use existing assets. The authors should cite the original paper that produced the code package or dataset. The authors should state which version of the asset is used and, if possible, include a URL. The name of the license (e.g., CC-BY 4.0) should be included for each asset. For scraped data from a particular source (e.g., website), the copyright and terms of service of that source should be provided. If assets are released, the license, copyright information, and terms of use in the package should be provided. For popular datasets, paperswithcode.com/datasets has curated licenses for some datasets. Their licensing guide can help determine the license of a dataset. For existing datasets that are re-packaged, both the original license and the license of the derived asset (if it has changed) should be provided. If this information is not available online, the authors are encouraged to reach out to the asset s creators. 13. New Assets Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets? Answer: [Yes] Justification: This paper is mainly for theoretical research. We have provided the codes for our algorithm (implemented by Python) in the supplementary material, which is well documented. We do not release any new dataset. Guidelines: The answer NA means that the paper does not release new assets. Researchers should communicate the details of the dataset/code/model as part of their submissions via structured templates. This includes details about training, license, limitations, etc. The paper should discuss whether and how consent was obtained from people whose asset is used. At submission time, remember to anonymize your assets (if applicable). 
You can either create an anonymized URL or include an anonymized zip file. 14. Crowdsourcing and Research with Human Subjects Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)? Answer: [NA] Justification: This piece of research does not involve crowdsourcing or human subjects. Guidelines: The answer NA means that the paper does not involve crowdsourcing nor research with human subjects. Including this information in the supplemental material is fine, but if the main contribution of the paper involves human subjects, then as much detail as possible should be included in the main paper. According to the Neur IPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector. 15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained? Answer: [NA] Justification: This piece of research does not involve crowdsourcing or human subjects. Guidelines: The answer NA means that the paper does not involve crowdsourcing nor research with human subjects. Depending on the country in which research is conducted, IRB approval (or equivalent) may be required for any human subjects research. If you obtained IRB approval, you should clearly state this in the paper. We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the Neur IPS Code of Ethics and the guidelines for their institution. For initial submissions, do not include any information that would break anonymity (if applicable), such as the institution conducting the review.