# simulationbased_bayesian_inference_from_privacy_protected_data__8e710456.pdf

Published in Transactions on Machine Learning Research (03/2025)

Simulation-based Bayesian Inference from Privacy Protected Data

Yifei Xiong xiong173@purdue.edu Department of Statistics Purdue University

Nianqiao Phyllis Ju nianqiao@purdue.edu Department of Statistics Purdue University

Sanguo Zhang sgzhang@ucas.ac.cn School of Mathematical Sciences University of Chinese Academy of Sciences

Reviewed on Open Review: https: // openreview. net/ forum? id= SB7Jzh DG45

Many modern statistical analysis and machine learning applications require training models on sensitive user data. Under a formal definition of privacy protection, differentially private algorithms inject calibrated noise into the confidential data or during the data analysis process to produce privacy-protected datasets or queries. However, restricting access to only privatized data during statistical analysis makes it computationally challenging to make valid statistical inferences. In this work, we propose simulation-based inference methods from privacy-protected datasets. In addition to sequential Monte Carlo approximate Bayesian computation, we adopt neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of model parameters given the observed private query results. We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms.

1 Introduction

Motivation. Many AI systems require collecting and training on massive amounts of personal information (such as income, disease status, location, purchase history, etc.). Despite unprecedented data collection efforts by companies, governments, researchers, and other agencies, oftentimes, data collectors have to lock the data inside their own databases due to privacy concerns. Differential privacy (DP) provides a mathematical definition for the protection of individual data (Drechsler et al., 2024). Under this framework, privacy-protecting procedures (i.e., DP algorithms) have enabled data collectors, such as tech companies, the US Census Bureau, and social scientists, to share research data in a wide variety of settings while protecting the privacy of individual users. Privacy researchers typically collect confidential data and then inject calibrated random noise into the confidential data to achieve the desired levels of privacy protection. Some algorithms aim for DP data analysis, resulting in DP optimizations (Arora et al., 2023; Bassily et al., 2021), approximations (Chaudhuri et al., 2013; Bie et al., 2023), or predictions (Rho et al., 2023); while other algorithms produce DP datasets (or descriptive statistics), which enables data sharing across research teams and entities. Examples of the latter include the 2020 US Census (Abowd et al., 2022; Gong et al., 2022;

Published in Transactions on Machine Learning Research (03/2025)

Drechsler, 2023), the Facebook URL dataset (Evans & King, 2023), and New York Airbnb Open Data (Guo & Hu, 2023). Our work tackles this data-sharing regime: we aim to make valid statistical inferences with privatized data, and we place a special focus on using complex models such as continuous-time Markov jump processes.

Although the DP data-sharing regime corrupts confidential information in order to satisfy privacy, since the probabilistic design of such mechanisms can be publicly known, in principle analysts can still conduct reliable estimation and uncertainty quantification by accounting for bias and noises introduced during privatization. However, in practice, valid inference based on privatized data is a challenge that requires the revision of existing statistical methods designed originally for confidential data (Foulds et al., 2016). Even for well-understood procedures such as ordinary linear regression and generalized linear models, adding the extra layer of privacy protection has introduced new theoretical and methodological questions in statistics (Cai et al., 2021; Alabi & Vadhan, 2022; Li et al., 2023; Barrientos et al., 2019).

However, for complex models, even the confidential data likelihood functions are intractable or time-consuming to compute. Accounting for privacy noise on top of that is a formidable challenge. Without developing valid inference procedures under this regime, we must either make biased estimations or restrict ourselves to simple models. In this work, we propose methods to estimate the parameters of complex models that underlie privacy-protected data.

Related works. There is a fast-growing literature on statistical inference under differential privacy. Early work by Williams & Mc Sherry (2010) first proposed using Bayesian inference to handle DP noise. Since then, various Bayesian methods have been developed for privatized data. Markov Chain Monte Carlo (MCMC) methods have been proposed in some specific models and priors, such as exponential family distributions (Bernstein & Sheldon, 2018), Bayesian linear regression (Bernstein & Sheldon, 2019) and generalized linear models (Kulkarni et al., 2021). As for generic algorithms, Ju et al. (2022) proposed a dataaugmentation MCMC strategy to overcome the intractable marginal likelihood resulting from privatization, and Gong (2022) derived point estimates of the posterior distribution using the Expectation-Maximization algorithm. Several frequentist inference methods have also been developed. Karwa et al. (2015) employed a parametric bootstrap method to construct confidence intervals for the model parameters of the log-linear model. Awan & Wang (2023) proposes simulation-based inference methods for hypothesis testing and confidence intervals, for frequentist inference.

To the best of our knowledge, only three papers (Waites & Cummings, 2021; Lee et al., 2022; Su et al., 2023) have incorporated normalizing flows (Kobyzev et al., 2020; Papamakarios et al., 2021) and DP, and both works design DP-versions of normalizing flow. In contrast, our work uses flow-based methods as a neural density estimation tool to analyze DP-protected data.

Our contributions. In this work, we propose several likelihood-free inference methods that make statistical inferences from privacy-protected data. First, we highlight that sequential Monte Carlo approximate Bayesian computation (SMC-ABC) can be used for this purpose, which improves the current practice of using ABC (Gong, 2022). Next, we propose sequential private posterior estimation (SPPE) and sequential private likelihood estimation (SPLE), two sequential neural density estimation methods to approximate the private data posterior distribution with normalizing flows. Unlike likelihood-based methods to learn from private data, SMC-ABC, SPPE, and SPLE require only simulations from a generative model for confidential data, and hence are applicable to more models. We demonstrate the efficiency of our methods on an infectious disease model using synthetic and several real disease outbreak data. SPPE and SPLE require fewer numbers of simulations from the data model to achieve the same inference results compared with SMC-ABC. We also propose a privacy mechanism for the release of the infection curve. Our experiments also demonstrate the privacy-utility trade-off in linear regression, where we compared the proposed likelihood-free methods with data augmentation MCMC, a likelihood-based inference method. The code is available on Git Hub1.

1https://github.com/Yifei-Xiong/Simulation-based-Bayesian-Inference-from-Privacy-Protected-Data

Published in Transactions on Machine Learning Research (03/2025)

2 Background and challenges

Let θ Θ be the model parameter and x = (x1, , xn) Xn represent the confidential database, where n is the sample size. We model the database with some likelihood function f(x | θ). This confidential data model can be a simulator whose likelihood can be impossible to compute. In the confidential data setting, our interest is to approximate the posterior distribution π(θ | xo) π(θ)f(xo | θ) where xo is the observed dataset and π(θ) is the prior. In this section, we first review some key approaches for likelihood-free inference in the confidential data setting and then describe output perturbation mechanisms to generate differentially private queries sdp. Finally, we explain why learning θ given observed so dp is challenging. We will present new methods for likelihood-free inference in the privatized data setting in Section 3.

2.1 Likelihood-free inference for confidential data

Likelihood-free inference methods require only the ability to generate data from a simulator model, instead of evaluating its likelihood.

Approximate Bayesian Computation (ABC) and sequential Monte Carlo ABC (SMC-ABC). ABC-based methods circumvent likelihood evaluations with simulations. In its most basic form, a set of parameters are sampled from the prior θ(i) π(θ) and a data set x(i) is simulated from the model for each θ(i). The samples θ(i) such that x(i) is close to the observed data xo provide an approximation to the posterior π(θ | xo). It is also common to choose some summary statistics s and to retain samples according to distance between each s(x(i)) and s(xo). The efficiency of ABC is controlled by the closeness between the prior distribution, which can be viewed as its proposal distribution, and the posterior distribution, which is the target distribution. SMC-ABC (Sisson et al., 2007; Beaumont et al., 2009) improves ABC by specifying a sequence of intermediate target distributions between the prior and the posterior, and thus the θ(i) samples can gradually evolve towards the target distribution. The sequential updates in SMC-ABC propagate and reweight the parameter samples with importance sampling.

Neural density estimation. In contrast to sampling-based approximations like ABC and SMC-ABC, we can pursue density approximation of π(θ | xo) with neural networks. We illustrate a neural estimator for the posterior density π(θ | x). Let {qϕ(θ | x)}ϕ denote the collection of densities representable by an architecture q and weight parameters ϕ. We can train the neural estimator to minimize an ideal loss function

ˆϕ = arg min ϕ Ep(θ,x) [ log qϕ(θ | x)] = arg min ϕ

Θ Xn [ log qϕ(θ | x)] π(θ)f(x | θ)dθdx.

This loss is an expected value of log qϕ under the joint distribution p(θ, x) = π(θ)f(x | θ). When f(x | θ) is intractable, the expectation is also intractable, and training relies on Monte Carlo approximations of the integral using samples {(θ(j), x(j))}N i=1. In sequential neural density estimation, we can sequentially improve the Monte Carlo estimation of the loss function (Papamakarios & Murray, 2016; Lueckmann et al., 2017; Greenberg et al., 2019). Later, in Section 3.4, we will use randomized quasi-Monte Carlo (RQMC) methods (Owen, 1997a;b) as a drop-in replacement for standard Monte Carlo to reduce variance. Other neural density approaches include approximating likelihood functions (Papamakarios et al., 2019) or likelihood ratios (Miller et al., 2022). We refer the readers to (Cranmer et al., 2020; Lueckmann et al., 2021) for systematic reviews of neural density approximations.

A notable neural density estimation method is normalizing flows (NF) (Dinh et al., 2017; Papamakarios et al., 2017; Durkan et al., 2019; Papamakarios et al., 2021), which are adopted in our experiments. NF start with a simple base density (e.g. multivariate Normal) and push it through a sequence of invertible and differentiable transformations. The resulting distribution is richer and more complex. We choose NF because of their expressive power and popularity in probabilistic modeling.

2.2 Differential privacy

Given confidential data x, let η be a randomized algorithm to produce a differentially private statistic sdp from x. We also use η(sdp | x) to denote the conditional density of the private output sdp given the

Published in Transactions on Machine Learning Research (03/2025)

confidential data x. Intuitively, a procedure is private when perturbing one individual s response in the dataset leads to only a small change in the algorithm s outcome. This is characterized by the ϵ-DP definition from Dwork et al. (2006), based on neighboring databases. Definition 1 (ϵ-DP). We say x, x Xn are neighboring databases if they differ by one and only one data record. Denote this by d(x, x ) 1. A privacy mechanism with conditional density η satisfies ϵ-DP if, for all possible values of sdp and for neighboring datasets, the following probability ratio is bounded: R

A η(sdp | x) dsdp R

A η(sdp | x ) dsdp exp(ϵ), A Range(η). (1)

The parameter ϵ is referred to as the privacy loss budget, and it plays a pivotal role in determining the extent to which sdp discloses information about x: Larger values of ϵ correspond to reduced privacy guarantees, whereas ϵ = 0 signifies perfect privacy.

Output perturbation methods achieve privacy by first computing a query s : Xn S (such as mean, median, histogram, contingency tables) of the database and then releasing s(x) with added noise. To satisfy ϵ-DP, the query s must have finite sensitivity. Definition 2 (Global sensitivity (Nissim et al., 2007)). The Lp sensitivity of a function s, denoted p(s), is the maximum Lp-norm change in the function s value between neighboring databases x and x , namely

p(s) = max d(x,x )=1 s(x) s(x ) p .

A common output perturbation technique is the Laplace mechanism. Proposition 3 (Laplace mechanism (Dwork et al., 2006)). For a real-valued query s : Xn S, adding zero-centered Laplace noise with parameter 1(s)/ϵ achieves ϵ-DP.

Chaudhuri et al. (2013) provides a multivariate version of the Laplace mechanism. Other mechanisms include the exponential mechanism and the Gaussian mechanism (Liu, 2018). There are also relaxations of ϵ-DP, such as (ϵ, δ)-DP and Gaussian DP (Dong et al., 2022).

We exploit output perturbation on summary statistics in two ways. From an inference perspective, when s(x) is a sufficient statistic for θ, the confidential data posterior π(θ | x) is fully characterized by s(x) and the prior. This is the rationale for choosing π(θ | sdp) as the target distribution and for the posterior from perturbed data to be still informative about the model parameters. From a computation perspective, since many summary statistics are of lower dimension than the confidential data, this can facilitate efficient computations. In Section 3.4, we leverage the low dimensionality of sdp by incorporating quasi-Monte Carlo techniques.

2.3 Bayesian inference with privatized queries

In the private data setting, our goal is to approximate the posterior distribution of θ given privatized data sdp. A key challenge with data analysis on privatized data via π(θ | sdp) is the intractable private data marginal likelihood.

f(sdp | θ) = Z

Xn f(x | θ)η(sdp | x)dx. (2)

When the confidential data likelihood f(x | θ) can be evaluated, Ju et al. (2022) proposed a data-augmentation MCMC algorithm to approximate the doubly-intractable private data posterior

π(θ | sdp) f(sdp | θ)π(θ). (3)

The data augmentation strategy circumvents the intractability of evaluating Equation (2) by working with the joint posterior distribution p(θ, x | sdp) f(x | θ)π(θ)η(sdp | x) instead of the marginal posterior in Equation (3).

In this work, we study the challenging scenarios when the confidential likelihood f(x | θ) is intractable. This situation is triply intractable , as there are three levels of intractability in f(x | θ), f(sdp | θ), and π(θ | sdp). Data augmentation MCMC, which includes an inference step to sample from π(θ | x), is no longer applicable in the triply intractable setting.

Published in Transactions on Machine Learning Research (03/2025)

3 Proposed methods

We present methods for likelihood-free inference for the triply intractable posterior distribution π(θ | sdp). First and foremost, we recognize that SMC-ABC methods are viable solutions to approximate the private data posterior, as long as the privacy mechanism is publicly known and can be replicated by the data analyst.

Next, under the same assumptions, we focus on two neural density approximation methods that leverage state-of-the-art simulation-based inference methods and can be more efficient than the SMC-ABC baseline. We present two complementary approaches: a) Section 3.2: approximate the private data marginal likelihood f(sdp | θ) in Equation (2), and b) Section 3.3: approximate π(θ | sdp) from Equation (3) directly.

3.1 Adapting SMC-ABC to the private data setting

A data analyst can create a (hierarchical) private data simulator for sdp by combining the confidential data simulator x f( | θ) and a tractable output perturbation mechanism (e.g. the Laplace mechanism) with sdp η( | x). With this simulator for sdp, a rejection ABC algorithm can simulate parameter samples {θ(i)}i=1,...,N from the prior, and then keep samples correspond to a simulated private query s(i) dp close to the observed value so dp. This insight has been exploited in Gong (2022), using ABC to approximate π(θ | so dp). We point out that SMC-ABC can also be used for this purpose. We include a description of the algorithm in Appendix C for completeness.

Both SMC-ABC and ABC aim to target the exact posterior distribution, unlike neural density estimation methods which insist that the approximation comes from some parametric family qϕ. For this reason, we will use SMC-ABC as a baseline to test and validate our methods in Section 4.

3.2 Private data likelihood estimation

Our first neural density approximation strategy is to approximate the private data marginal likelihood f(sdp | θ) with a neural network denoted by qϕ(sdp | θ). When training qϕ(sdp | θ) f(sdp | θ), we aim to minimize their average KL divergence under the prior π(θ), corresponding to minimizing

Eπ(θ) [DKL (f(sdp | θ) qϕ(sdp | θ))] . (4)

After some derivations (in Appendix A), Equation (4) is equivalent to Ep(θ,sdp) [ log qϕ(sdp | θ)] up to a constant independent of ϕ. The expectation is taken with respect to the prior and the private data generating process p(θ, sdp) = π(θ) f(sdp | θ), which is intractble. To facilitate computations, let s write Equation (4) with respect to the confidential data generating process, resulting in

ℓPLE(ϕ) = Ep(θ,x)

S η(sdp | x) log qϕ(sdp | θ)dsdp

With bϕ = arg min ℓPLE(ϕ), the resulting private data likelihood estimation is qb ϕ(sdp | θ). Since Equation (5) is an expectation with respect to a tractable distribution, we can approximate the integral with Monte Carlo methods, discussed in Section 3.4.

When the primary inference goal is the maximum likelihood estimator, one can approximate it with bθMLE = arg maxθ qb ϕ(sdp | θ). Under the Bayesian paradigm, the posterior approximation of Equation (3) can be bπPLE(θ) π(θ)qb ϕ(sdp | θ). Quantities such as posterior median, mean, and credible regions can be estimated accordingly.

3.3 Private data posterior estimation

Now we approximate the private data posterior π(θ | sdp) in Equation (3) directly with a neural posterior estimator qϕ(θ | sdp), bypassing the synthetic likelihood step in Section 3.2. To find qϕ(θ | sdp) π(θ | sdp), let us minimize their KL divergence

DKL (π(θ | sdp) qϕ(θ | sdp)) = Eπ(θ|sdp)

log π(θ | sdp)

qϕ(θ | sdp)

Published in Transactions on Machine Learning Research (03/2025)

As shown in Section A, this is equivalent to

ℓPPE(ϕ) = Ep(θ,x)

S η(sdp | x) log qϕ(θ | sdp)dsdp

Many Bayesian problems use uninformative priors π(θ), which are dispersed in the parameter space. As a result, when employing a naive Monte Carlo strategy that uses the prior as the proposal to approximate expectations in Equation (5) or Equation (6) has most of its samples falling in low-density regions, leading to inefficient estimation due to high variance. To address this issue, importance sampling utilizes a proposal distribution p to change the bases of integration and thus reduce the variance of numerical integration. The automatic posterior transformation (APT) method (Greenberg et al., 2019) uses p(θ, x) = p(θ)f(x | θ) as proposal and has (unnormalized) importance weights

qϕ(θ | sdp) qϕ(θ | sdp) p(θ)

Based on this idea, we design a modified loss of Equation (6) that leverages the APT framework to improve efficiency in posterior estimation. Specifically, we propose the following loss function

ℓPPE A(ϕ) = E p(θ,x)

S η(sdp | x) log qϕ(θ | sdp)dsdp

and it is useful for the sequential training strategy we introduce in Section 3.5. The derivation of Equation (8) is in Appendix A.

3.4 Nested RQMC estimators

We can inspect the general form of loss functions in Equations (5) and (8) with

ℓ(ϕ) = E p(θ,x)

S η(sdp | x)g(sdp, θ)dsdp

Here g(sdp, θ) = log qϕ(θ | sdp) for PPE-A and g(sdp, θ) = log qϕ(sdp | θ) for PLE. Equation 9 is a double integral, where the outer expectation is with respect to some proposal distribution p and the inner integral involves the privacy mechanism η and the neural estimator qϕ.

With independent and identically distributed (i.i.d.) samples from the joint density {(θ(i), x(i))}N i=1 p(θ, x) = p(θ)f(x | θ), we can unbiasedly approximate Equation (9) with

S η(sdp | x(i))g(sdp, θ(i))dsdp

The inner integrals I(θ, x) = R η(sdp | x)g(sdp, θ)dsdp can be approximated with standard Monte Carlo integration techniques, as popular DP mechanisms such as Laplace and Gaussian mechanisms can be easily simulated. Using M i.i.d. samples, the root-mean-squared-error (RMSE) to estimate I(θ, x) approximations are typically on the order of O(M 1/2) due to the central limit theorem.

In many applications, the DP query result sdp serves as a private descriptive statistic of a dataset x. This statistic is commonly embedded in a low-dimensional space S, where the dimension r = dim(S) is significantly less than the dimension of the data space dim(Xn). For the privacy mechanisms, we typically model the generation process as τ : (u, s(x)) 7 sdp where u U[0, 1]r is a uniform random variable from the r-dimensional hypercube. For example, the process τ(u, s(x)) = s(x) 1(s)

2) log 1 2|u 1

achieves ϵ DP for 1-dimensional queries.

Randomized quasi-Monte Carlo (RQMC) methods, as detailed in (Owen, 1997a;b), differ from traditional Monte Carlo (MC) methods in that it generate correlated, low-discrepancy sequences {v(1), , v(M)} [0, 1]r. These sequences cover the parameter space more evenly than the pseudo-random sequences used by MC

Published in Transactions on Machine Learning Research (03/2025)

methods. This low-discrepancy feature helps to reduce the variance of the estimators, which is particularly useful in low-dimensional integration tasks (L Ecuyer, 2018). The estimator used in RQMC can be described as

ˆIRQMC θ,x = 1

j=1 g(τ(v(j); x), θ) := 1

j=1 gθ,x(v(j)). (11)

The RQMC estimator is unbiased and has a smaller RMSE compared to MC estimators. The -discrepancy of the point set {v(1:M)}, denoted by D (v(1:M)) is of the order O(M 1(log M)r) = O(M 1+δ) for a positive constant δ. If our neural approximation family gθ,x( ) has bounded Hardy-Krause variation VHK[ gθ,x( )] (Aistleitner et al., 2017), then, according to Basu & Owen (2016), the mean squared error of the RQMC estimator in Equation (11) satisfies

E ˆIRQMC θ,x I(θ, x) 2 V 2 HK( gθ,x)(D )2 = O(M 2+2δ). (12)

Thus, the RMSE of the RQMC estimator is of the order O(M 1+δ), achieving faster convergence than an MC estimator with O(M 1/2). We verify this improvement in convergence from the RQMC estimator on the SIR model and linear regression example with the neural spline flow approximation family (Durkan et al., 2019), in Appendix E, Figure 7.

3.5 Sequential neural estimations

This section presents our sequential neural approximation methods on privacy-protected data. Our central goal is to approximate the private data posterior distribution π(θ | sdp) given observed privatized data query sdp. We summarize the Sequential Private Posterior Estimation (SPPE) algorithm in Algo.1 and the Sequential Private Likelihood Estimation (SPLE) in Algo.2.

Algorithm 1 Sequential private-data posterior estimation (SPPE)

Input: observed privatized summary statistics so dp, neural estimation family qϕ(θ | sdp), and confidential data simulator f(x | θ) Initialization: set p0(θ) = π(θ), simulated data filtration D0 = {} for r = 1, 2, , R do

Sample {θ(i)}i=1:N from pr 1(θ) Simulate x(i) f( | θ(i)) for each i Update filtration Dr = Dr 1 {(θ(i), x(i))}i=1:N Update ϕ arg minϕ ˆℓPPE A(ϕ) using ˆIRQMC Equation (11) on Dr Set proposal pr(θ) = qϕ(θ | so dp) end for return ˆπ(θ | so dp) = qϕ(θ | so dp)

Both SPPE and SPLE use normalizing flows as the variational family to minimize some KL divergence, which takes the general form of Equation (9). We have designed their sequential training procedures to be sample efficient, in the sense that training data generated during previous rounds are kept and used in subsequent rounds.

In a sequential approximation procedure, we iteratively refine the neural approximations towards the target distribution. After the r-th training round, we incorporate the current neural density estimator qϕ into the proposal distribution of the next training round, using the automatic posterior transformation weights and loss functions described in Equations (7) and (8) respectively. Sequential training procedures can gradually move qϕ towards high-density regions of the private data posterior, and thus achieve good accuracy with fewer samples from the simulator, compared with non-sequential training procedures that use a fixed proposal distribution.

Published in Transactions on Machine Learning Research (03/2025)

Algorithm 2 Sequential private-data likelihood estimation (SPLE)

Input: observed privatized summary statistics so dp, neural estimation family qϕ(sdp | θ), and confidential data simulator f(x | θ) Initialization: set p0(θ) := π(θ), simulated data filtration D0 = {} for r = 1, 2, , R do

Sample {θ(i)}N i=1 from pr 1(θ). Simulate x(i) f( | θ(i)) for each i Update filtration Dr = Dr 1 {(θ(i), x(i))}i=1:N Update ϕ arg minϕ ˆℓPLE(ϕ) using ˆIRQMC Equation (11) on Dr Set proposal pr(θ) π(θ)qϕ(so dp | θ) end for return posterior estimation bπ(θ | so dp) = p R(θ) and likelihood estimation ˆfθ(sdp | θ) = qϕ (sdp | θ)

4 Applications

Here we illustrate our methods on the susceptible-infected-recovered (SIR) model and linear regression. We include experiments on the Naïve Bayes log-linear model in the Appendix.

4.1 SIR model for disease spread

The SIR model is a time-series model that describes how an infectious disease spreads in a closed population. It is most often used as a deterministic ordinary differential equation (ODE), but can also be represented by a Markov jump process.

To the best of our knowledge, inference on privacy-protected data with the SIR model has not been studied in the literature. Our proposed methods are particularly suitable for this problem for two reasons. First, our methods are simulation-based and thus are applicable under the ODE model, when other likelihood-based methods can no longer be applied. Second, in the SIR model, low-dimensional summary statistics can be very informative about model parameters. Then the RQMC methods discussed in Section 3.4 can provide efficiency and accuracy gains when evaluating the loss functions.

We describe a stochastic SIR model in a closed population with K people. As the disease spreads, the individuals progress through the three states: susceptible, infected, and recovered. We use S(t), I(t), and R(t) to denote the number of individuals within each compartment at time t. We make the following assumptions: (a) individuals are infected at a rate β SI

K , resulting in a decrease of S by one and an increase of I by one, (b) infected individuals recover with a rate γI, leading to a decrease of I by one and an increase of R by one. The confidential data likelihood of this continuous-time Markov jump process is hard to compute. Our goal is to infer the infection and recovery rates θ = (β, γ), under initial conditions (S, I, R) = (K 1, 1, 0).

Privatizing the infection curve. Here, we propose a mechanism to privatize the infection trajectory I(t)/K, which is the proportion of infected individuals at each t.

Proposition 4 (DP infection trajectory). Consider a sequence of L points {t1, , t L} in the time interval [0, T], our privatized query can be sdp = (s1, , s L) where each si Binomial n, I(ti)+m

K+2m independently.

The mechanism generating sdp = (s1, , s L) satisfies ϵ-DP, with ϵ = n

This algorithm adds calibrated noise to the SIR process to produce sdp, a differential private time series. It can probably protect each individual s infection status. We demonstrate that analysts can still make inferences about population parameters (β, γ) by only knowing sdp, which retains information about the speed of disease spread.

Experiments on synthetic data. We illustrate the performance of SPPE and SPLE on synthetic privatized SIR model data. The data generating parameters are set to emulate a measles outbreak. We set L = 10 time points with n = 1000 and m = 1000, achieving ϵ = 10 differential privacy. Lower privacy loss budgets

Published in Transactions on Machine Learning Research (03/2025)

2 4 6 8 10 Number of Rounds

2 4 6 8 10 Number of Rounds

2.5 2.0 1.5 1.0 0.5

Figure 1: Inference on SIR model. A. Convergence of sequential posterior estimations given DP-protected infection trajectory. Each round entails N = 1000 simulations. B. Approximation accuracy by SPPE (orange) and SPLE (red) against the number of rounds, the error bars represent the mean with the upper and lower quartiles over 20 random trials.

(ϵ = 1, 0.1) are explored in Appendix E.1.1 by adjusting m accordingly. We also describe prior specifications and implementation details in the Appendix.

Figure 1-A describes the convergence of the posterior approximations ˆπ(θ | so dp) towards the SMC-ABC (Beaumont et al., 2009) baseline, requiring up to 5 105 simulations. We use SMC-ABC as the baseline because it does not resort to the variational approximations employed by SPPE and SPLE. Both SPPE and SPLE quickly adapt to meet the SMC-ABC results, with orders of magnitudes fewer simulations needed. After the first round of simulations, SPPE can identify the high probability region of π(θ | sdp), while the SPLE posterior is still exploring the parameter space. See Appendix for the SPLE approximation after r = 1. After r = 5 rounds, both methods can concentrate around the posterior mean and have captured the posterior correlation cov(β, γ | sdp). By inspecting marginal posterior histograms (Figure 6), we find that, in this example, the posterior approximated by SPPE is slightly more concentrated than those from SMC-ABC and SPLE.

To quantitatively evaluate the performance of our methods, we use the following metrics: (1) MMD (Gretton et al., 2012): maximum mean discrepancy between the neural estimated posteriors and SMC-ABC posterior; (2) C2ST (Lopez-Paz & Oquab, 2016): classifier two sample tests; (3) NLOG: negative log density at true parameters; and (4) LMD: log median distance from simulated to observed sdp. Smaller values indicate better performance for all four metrics. In Figure 1-B, we compare the performance of these methods at various numbers of simulation samples/rounds. After Round 5, SPLE and SPPE have similar accuracy. We also compare their runtime in Table 1 of the Appendix. To achieve MMD lower than 0.1, SPPE is 6x faster and SPLE is 2x faster than SMC-ABC.

Real disease outbreaks. We apply our privacy mechanism and inference methods to several real infectious disease outbreaks: influenza, Ebola, and COVID-19. In Figure 2-A, we compare the posterior distributions π(β, γ | sdp) obtained from SMC-ABC, SPPE, and SPLE. The results are similar to synthetic data experiments: when SPPE and SPLE use the same computational resources, the SPPE posterior converges faster than SPLE. In Figure 2-B, we inspect the 95%-credible interval for the ratio R0 = β/γ, which is known as the basic reproduction number. Our R0 estimates using privatized data are consistent with common estimates of R0 for these diseases (Eisenberg, 2020). Specifically, typical R0 ranges are (1.51, 2.53) for Ebola, (1.5, 3.5) for COVID-19, and with the exception of the flu outbreak (0.9, 2.1), which should be modeled by the SI model instead of SIR.

4.2 Bayesian linear regression

We demonstrate our methods on a linear regression model, and compare it to existing work on DP regression analysis like (Ju et al., 2022; Bernstein & Sheldon, 2019). We consider linear regression with n subjects and p predictors. Denote x0 Rn p as the design matrix without intercept terms, and let x = (1n 1, x0) represent the design matrix with the intercept. Ordinary linear regression models assume that the outcomes

Published in Transactions on Machine Learning Research (03/2025)

Figure 2: Inference on real infectious disease outbreaks. A. Visualization of the posterior distribution given private infection curve applied to flu, Ebola [in a) Guinea, b) Liberia, and c) Sierra Leone], and COVID-19 in Clark County, Nevada. All experiments use a privacy level of ϵ = 10. B. Mean and 95% credible intervals for R0 = β/γ with different methods in each dataset. Grey: SMC-ABC; orange: SPPE; red: SPLE. A non-DP baseline (black stars) is also shown, which is obtained from the solution of the corresponding ordinary differential equations.

y satisfy y|x0 Nn(xβ, σ2In). Under the constraint of differential privacy, both outcomes y and predictors x0 are subject to calibrated noise. In a Bayesian setting, we model the predictors with x0,i Np(m, Σ) for i = 1, 2, , n independently. Our parameter of interest is β, which represents the (p + 1)-dimensional vector of regression coefficients. Our experiments assume that σ, m, and Σ are fixed, to illustrate our algorithm. In practice, one can also estimate these parameters from data. Our setting is the same as that used in Ju et al. (2022).

Private sufficient statistics. We achieve ϵ-DP on confidential data (x, y) by adding Laplace noise to sufficient statistics. Achieving privacy requires finite ℓ1 sensitivity on confidential data. As a result, before adding noise for privacy, we first need to clamp the predictors and the responses, and then normalize them to take values in [ 1, 1]. Let s denote the clamped confidential data as x and y, respectively. We then define the summary statistics of clamped data as s := 1

n x x . We have refined the sensitivity analysis of Ju et al. (2022) to (s) = 1

n(p2 + 3p + 3). The privacy summary statistics sdp is achieved by adding independent Laplace(0, 1( s)/ϵ) noise to each entry of s. This output perturbation mechanism satisfies ϵ-DP. Details of clamping and sensitivity analysis are in the supplementary materials.

Posterior approximations. We compare the 95% posterior credible intervals obtained by methods applicable to linear regression, including SMC-ABC, SPPE, SPLE, Data-augmentation MCMC (DA-MCMC) (Ju et al., 2022), Gibbs-SS (Bernstein & Sheldon, 2019), and RNPE (Ward et al., 2022). See Table 2, 3, and 4 for marginal posterior credible intervals of β | sdp at privacy levels ϵ = 10, 1 and 0.1 respectively. Both tables are in Appendix E.2. We also use the Kolmogorov-Smirnov test to assess the similarity of empirical posterior marginal, shown in Figure 3 at a privacy level of ϵ = 10. Results from SPPE, SPLE, SMC-ABC, and DA-MCMC reach agreement on the posterior marginals. However, results from Gibbs-SS and RNPE are qualitatively different from the other four methods, yielding biased approximations for β0 and β1 respectively. Among the likelihood-free methods, SPLE attempts to approximate the likelihood function and is the most similar to DA-MCMC, our likelihood-based baseline.

The cost of privacy. Although it has been a standard practice in many DP work (Bernstein & Sheldon, 2018; 2019; Ju et al., 2022; Gong, 2022) to achieve finite global sensitivity through clamping, this benefit of privacy protection comes at the cost of accuracy of the subsequent statistical analysis.

Our experiments demonstrate the privacy-utility trade-off, which has also been demonstrated by related works on noise/privacy-aware inference. A naive plug-in estimator (plugging in sdp as s into the conjugate posterior π(θ | s)) gives the wrong posterior; See the second rows in Table 2. Besides, achieving privacy protection comes at the cost of estimation accuracy: private data posterior π(θ | sdp) is different from the

Published in Transactions on Machine Learning Research (03/2025)

Figure 3: Kolmogorov-Smirnov test statistics between approximations of posterior marginals, at ϵ = 10.

confidential data posterior π(θ | x) even under a high loss budget of ϵ = 10 (small privacy noise) setting; See the first rows in Table 2, 3, and 4. With small privacy noise, data corruption primarily comes from the clamping step. Evaluating this censoring bias is still an open problem in DP data analysis, with some recent attempts in Biswas et al. (2020); Evans et al. (2019); Covington et al. (2021).

5 Discussion

In this work, we propose three simulation-based inference methods to learn population parameters from privacy-protected data: SMC-ABC, SPPE, and SPLE. The latter two are neural density estimation methods. We have designed SPPE and SPLE to leverage state-of-the-art computational tools, such as normalizing flows and randomized quasi-Monte Carlo, to be computationally efficient for complex data models. SPPE aims to approximate the posterior-data posterior, and SPLE approximates the posterior-data marginal likelihood.

We compare our methods to similar DP data analysis work that focuses on post processing of privacy-protected datasets (or their summary statistics). Compared with existing ABC-based analysis for DP data (Gong, 2022), SPPE and SPLE do not reject training samples and hence are more computationally efficient. Compared with DA-MCMC (Ju et al., 2022), our method does not require that the confidential data likelihood can be evaluated easily. Our methods require only that one can simulate from the prior distribution π(θ) and the confidential model f(x | θ). They also all scale linearly with the sample size of the confidential database.

Our work contributes to the growing literature on statistical analysis with privatized data. In particular, our simulation-based inference framework can be applied to complex models with intractable likelihood functions and the resulting triply-intractable private data posterior. We hope our methods can catalyze DP-protected data sharing between data collectors and analysts. Our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by privacy-protection mechanisms. We advocate for increases in both sharing privacy-protected data by collectors and using valid inference procedures.

We acknowledge the limitations of the present work and point out future directions. Our methods leverage the fact that popular DP mechanisms can be efficiently achieved with random number generators. In some tasks, such as DP principle component analysis (Chaudhuri et al., 2013), the privacy mechanism is actually intractable to simulate from but its density is easy to evaluate. Our method is not applicable to this type of DP mechanism. Additionally, since our method scales linearly in the sample size of the confidential database, it might not be ideal for massive datasets, such as the Facebook URL dataset (Evans & King, 2023), which concerns millions of users. It is of interest to develop methods that scale sub-linearly in sample size. Finally, in Appendix D, we show the algorithms scale linearly with the dimension of the summary statistics and quadratically with the number of nodes per layer in the neural network. In our experiments, we found that using at most 8 layers and 50 nodes can lead to satisfactory performance. A precise understanding of how the convergence rate depends on the sample size of the confidential data and the dimensionality of the summary statistics remains to be established and is likely to vary on a case-by-case basis for the choice neural density estimators.

Published in Transactions on Machine Learning Research (03/2025)

Acknowledgments

We sincerely appreciate the valuable comments and suggestions from anonymous referees. The work of the third author was supported by the National Natural Science Foundation of China No.12171454 and Fundamental Research Funds for the Central Universities.

John Abowd, Robert Ashmead, Ryan Cumings-Menon, Simson Garfinkel, Micah Heineck, Christine Heiss, Robert Johns, Daniel Kifer, Philip Leclerc, Ashwin Machanavajjhala, Brett Moran, William Sexton, Matthew Spence, and Pavel Zhuravlev. The 2020 Census Disclosure Avoidance System Top Down Algorithm. Harvard Data Science Review, (Special Issue 2), Jun 24 2022. https://hdsr.mitpress.mit.edu/pub/7evz361i.

Christoph Aistleitner, Florian Pausinger, Anne Marie Svane, and Robert F Tichy. On functions of bounded variation. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 162, pp. 405 418. Cambridge University Press, 2017.

Daniel Alabi and Salil Vadhan. Hypothesis testing for differentially private linear regression. Advances in Neural Information Processing Systems, 35:14196 14209, 2022.

Raman Arora, Raef Bassily, Tomás González, Cristóbal A Guzmán, Michael Menart, and Enayat Ullah. Faster rates of convergence to stationary points in differentially private optimization. In International Conference on Machine Learning, pp. 1060 1092. PMLR, 2023.

Jordan Awan and Zhanyu Wang. Simulation-based, finite-sample inference for privatized data, 2023.

Andrés F Barrientos, Jerome P Reiter, Ashwin Machanavajjhala, and Yan Chen. Differentially private significance tests for regression coefficients. Journal of Computational and Graphical Statistics, 28(2): 440 453, 2019.

Raef Bassily, Cristóbal A Guzmán, and Michael Menart. Differentially private stochastic optimization: new results in convex and non-convex settings. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman Vaughan (eds.), Advances in Neural Information Processing Systems, 2021. URL https://openreview. net/forum?id=Ra-2Ov Xr7UU.

Kinjal Basu and Art B Owen. Transformations and Hardy Krause variation. SIAM Journal on Numerical Analysis, 54(3):1946 1966, 2016.

Mark A Beaumont, Jean-Marie Cornuet, Jean-Michel Marin, and Christian P Robert. Adaptive approximate Bayesian computation. Biometrika, 96(4):983 990, 2009.

Garrett Bernstein and Daniel R Sheldon. Differentially private Bayesian inference for exponential families. Advances in Neural Information Processing Systems, 31, 2018.

Garrett Bernstein and Daniel R Sheldon. Differentially private Bayesian linear regression. Advances in Neural Information Processing Systems, 32, 2019.

Alex Bie, Gautam Kamath, and Guojun Zhang. Private GANs, Revisited. ar Xiv preprint ar Xiv:2302.02936, 2023.

Sourav Biswas, Yihe Dong, Gautam Kamath, and Jonathan Ullman. Coinpress: Practical private mean and covariance estimation. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 14475 14485. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/ a684eceee76fc522773286a895bc8436-Paper.pdf.

T Tony Cai, Yichen Wang, and Linjun Zhang. The cost of privacy: Optimal rates of convergence for parameter estimation with differential privacy. The Annals of Statistics, 49(5):2825 2850, 2021.

Published in Transactions on Machine Learning Research (03/2025)

Kamalika Chaudhuri, Anand D Sarwate, and Kaushik Sinha. A near-optimal algorithm for differentiallyprivate principal components. Journal of Machine Learning Research, 14, 2013.

Christian Covington, Xi He, James Honaker, and Gautam Kamath. Unbiased statistical estimation and valid confidence intervals under differential privacy. ar Xiv preprint ar Xiv:2110.14465, 2021.

Kyle Cranmer, Johann Brehmer, and Gilles Louppe. The frontier of simulation-based inference. Proceedings of the National Academy of Sciences, 117(48):30055 30062, 2020.

Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using real NVP. In International Conference on Learning Representations, 2017. URL https://arxiv.org/pdf/1605.08803.pdf.

Jinshuo Dong, Aaron Roth, and Weijie J Su. Gaussian differential privacy. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(1):3 37, 2022.

Jörg Drechsler. Differential privacy for government agencies are we there yet? Journal of the American Statistical Association, 118(541):761 773, 2023. doi: 10.1080/01621459.2022.2161385. URL https://doi. org/10.1080/01621459.2022.2161385.

Jörg Drechsler, Daniel Kifer, Jerome Reiter, and Aleksandra Slavković. Handbook of Sharing Confidential Data: Differential Privacy, Secure Multiparty Computation, and Synthetic Data. CRC Press, 2024.

Conor Durkan, Artur Bekasov, Iain Murray, and George Papamakarios. Neural spline flows. Advances in Neural Information Processing Systems, 32, 2019.

Cynthia Dwork, Frank Mc Sherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3, pp. 265 284. Springer, 2006.

Joseph Eisenberg. R0: How scientists quantify the intensity of an outbreak like coronavirus and its pandemic potential. https://sph.umich.edu/pursuit/2020posts/how-scientists-quantify-outbreaks.html, 2020. Accessed: 2023-10-08.

Georgina Evans and Gary King. Statistically valid inferences from differentially private data releases, with application to the Facebook URLs dataset. Political Analysis, 31(1):1 21, 2023. doi: 10.1017/pan.2022.1.

Georgina Evans, Gary King, Margaret Schwenzfeier, and Abhradeep Thakurta. Statistically valid inferences from privacy-protected data. American Political Science Review, pp. 1 16, 2019.

James Foulds, Joseph Geumlek, Max Welling, and Kamalika Chaudhuri. On the theory and practice of privacy-preserving Bayesian data analysis. ar Xiv preprint ar Xiv:1603.07294, 2016.

Daniel T Gillespie. Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry, 81(25):2340 2361, 1977.

Ruobin Gong. Exact inference with approximate computation for differentially private data via perturbations. Journal of Privacy and Confidentiality, 12(2), Nov. 2022. doi: 10.29012/jpc.797. URL https://journalprivacyconfidentiality.org/index.php/jpc/article/view/797.

Ruobin Gong, Erica L. Groshen, and Salil Vadhan. Harnessing the known unknowns: Differential privacy and the 2020 census (co-editors forward). Harvard Data Science Review, (Special Issue 2), 2022. URL https://doi.org/10.1162/99608f92.cb06b469.

David Greenberg, Marcel Nonnenmacher, and Jakob Macke. Automatic posterior transformation for likelihoodfree inference. In International Conference on Machine Learning, pp. 2404 2414. PMLR, 2019.

Arthur Gretton, Karsten M Borgwardt, Malte J Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. The Journal of Machine Learning Research, 13(1):723 773, 2012.

Published in Transactions on Machine Learning Research (03/2025)

Shijie Guo and Jingchen Hu. Data privacy protection and utility preservation through Bayesian data synthesis: A case study on airbnb listings. The American Statistician, 77(2):192 200, 2023. doi: 10.1080/00031305. 2022.2077440. URL https://doi.org/10.1080/00031305.2022.2077440.

Nianqiao Ju, Jordan Awan, Ruobin Gong, and Vinayak Rao. Data augmentation MCMC for Bayesian inference from privatized data. Advances in Neural Information Processing Systems, 35:12732 12743, 2022.

Vishesh Karwa, Dan Kifer, and Aleksandra B Slavković. Private posterior distributions from variational approximations. ar Xiv preprint ar Xiv:1511.07896, 2015.

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. ar Xiv preprint ar Xiv:1412.6980, 2014.

Ivan Kobyzev, Simon JD Prince, and Marcus A Brubaker. Normalizing flows: An introduction and review of current methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11):3964 3979, 2020.

Tejas Kulkarni, Joonas Jälkö, Antti Koskela, Samuel Kaski, and Antti Honkela. Differentially private Bayesian inference for generalized linear models. In International Conference on Machine Learning, pp. 5838 5849. PMLR, 2021.

Jaewoo Lee, Minjung Kim, Yonghyun Jeong, and Youngmin Ro. Differentially private normalizing flows for synthetic tabular data generation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 7345 7353, 2022.

Sai Li, Linjun Zhang, T Tony Cai, and Hongzhe Li. Estimation and inference for high-dimensional generalized linear models with knowledge transfer. Journal of the American Statistical Association, pp. 1 12, 2023.

Fang Liu. Generalized Gaussian mechanism for differential privacy. IEEE Transactions on Knowledge and Data Engineering, 31(4):747 756, 2018.

David Lopez-Paz and Maxime Oquab. Revisiting classifier two-sample tests. ar Xiv preprint ar Xiv:1610.06545, 2016.

Jan-Matthis Lueckmann, Pedro J Goncalves, Giacomo Bassetto, Kaan Öcal, Marcel Nonnenmacher, and Jakob H Macke. Flexible statistical inference for mechanistic models of neural dynamics. Advances in Neural Information Processing Systems, 30, 2017.

Jan-Matthis Lueckmann, Jan Boelts, David Greenberg, Pedro Goncalves, and Jakob Macke. Benchmarking simulation-based inference. In International Conference on Artificial Intelligence and Statistics, pp. 343 351. PMLR, 2021.

Pierre L Ecuyer. Randomized quasi-Monte Carlo: An introduction for practitioners. Springer, 2018.

Benjamin Kurt Miller, Christoph Weniger, and Patrick Forré. Contrastive neural ratio estimation. ar Xiv preprint ar Xiv:2210.06170, 2022.

Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in private data analysis. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, STOC 07, pp. 75 84, New York, NY, USA, 2007. Association for Computing Machinery. ISBN 9781595936318. doi: 10.1145/1250790.1250803. URL https://doi.org/10.1145/1250790.1250803.

Art B Owen. Monte Carlo variance of scrambled net quadrature. SIAM Journal on Numerical Analysis, 34 (5):1884 1910, 1997a.

Art B Owen. Scrambled net variance for integrals of smooth functions. The Annals of Statistics, 25(4): 1541 1562, 1997b.

George Papamakarios and Iain Murray. Fast ε-free inference of simulation models with Bayesian conditional density estimation. Advances in Neural Information Processing Systems, 29, 2016.

Published in Transactions on Machine Learning Research (03/2025)

George Papamakarios, Theo Pavlakou, and Iain Murray. Masked autoregressive flow for density estimation. Advances in Neural Information Processing Systems, 30, 2017.

George Papamakarios, David Sterratt, and Iain Murray. Sequential neural likelihood: Fast likelihood-free inference with autoregressive flows. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 837 848. PMLR, 2019.

George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan. Normalizing flows for probabilistic modeling and inference. J. Mach. Learn. Res., 22(1), jan 2021. ISSN 1532-4435.

Saeyoung Rho, Rachel Cummings, and Vishal Misra. Differentially private synthetic control. In International Conference on Artificial Intelligence and Statistics, pp. 1457 1491. PMLR, 2023.

Scott A Sisson, Yanan Fan, and Mark M Tanaka. Sequential Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences, 104(6):1760 1765, 2007.

Bingyue Su, Yu Wang, Daniele E Schiavazzi, and Fang Liu. Differentially private normalizing flows for density estimation, data synthesis, and variational inference with application to electronic health records. ar Xiv preprint ar Xiv:2302.05787, 2023.

Chris Waites and Rachel Cummings. Differentially private normalizing flows for privacy-preserving density estimation. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, AIES 21, pp. 1000 1009, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450384735. doi: 10.1145/3461702.3462625. URL https://doi.org/10.1145/3461702.3462625.

Daniel Ward, Patrick Cannon, Mark Beaumont, Matteo Fasiolo, and Sebastian Schmon. Robust neural posterior estimation and statistical model criticism. Advances in Neural Information Processing Systems, 35:33845 33859, 2022.

Oliver Williams and Frank Mc Sherry. Probabilistic inference and differential privacy. Advances in Neural Information Processing Systems, 23, 2010.

A Derivations for SPLE and SPPE objectives

Lemma 5. For private data likelihood estimation, minimizing the average KL divergence DKL (f(sdp | θ) qϕ(sdp | θ)) under the prior π(θ), is equivalent to minimizing the objective function

ℓPLE(ϕ) = Ep(θ,x)

S η(sdp | x) log qϕ(sdp | θ)dsdp

With respect to the joint distribution p(θ, x) p(θ)f(x | θ), the objective function still has the form

ℓPLE(ϕ) = E p(θ,x)

S η(sdp | x) log qϕ(sdp | θ)dsdp

Published in Transactions on Machine Learning Research (03/2025)

Proof. Note that

Eπ(θ) [DKL (f(sdp | θ) qϕ(sdp | θ))]

S f(sdp | θ) (log f(sdp | θ) log qϕ(sdp | θ)) dsdp

S Θ π(θ)f(sdp | θ) log qϕ(sdp | θ)dsdpdθ

S Θ Xn π(θ)η(sdp | x)f(x | θ) log qϕ(sdp | θ)dsdpdθdx

Θ Xn p(θ, x) Z

S η(sdp | x) log qϕ(sdp | θ)dsdp

= C1 Ep(θ,x)

S η(sdp | x) log qϕ(sdp | θ)dsdp

where C1 = RR

S Θ π(θ)f(sdp | θ) log f(sdp | θ)dsdpdθ is a constant unrelated to ϕ. Furthermore, if θ are sampled from some proposal p(θ), we still have

E p(θ) [DKL (f(sdp | θ) qϕ(sdp | θ))]

S Θ p(θ)f(sdp | θ) log f(sdp | θ)dsdpdθ E p(θ,x)

S η(sdp | x) log qϕ(sdp | θ)dsdp

Lemma 6. For private data posterior estimation, minimizing the average KL divergence DKL (p(θ | sdp) qϕ(θ | sdp)) with respect to the marginal evidence p(sdp) = R

S π(θ)f(sdp | θ)dθ, is equivalent to minimizing the objective function

ℓPPE(ϕ) = Ep(θ,x)

S η(sdp | x) log qϕ(θ | sdp)dsdp

With respect to the joint distribution p(θ, x) p(θ)f(x | θ), then the objective function has the form

ℓPPE A(ϕ) = E p(θ,x)

S η(sdp | x) log qϕ(θ | sdp)dsdp

qϕ(θ | sdp) := qϕ(θ | sdp) p(θ)

π(θ) 1 Z(sdp, ϕ), Z(sdp, ϕ) = Z

Θ qϕ(θ | sdp) p(θ)

π(θ)dθ. (17)

Proof. We have

Ep(sdp) [DKL (π(θ | sdp) qϕ(θ | sdp))]

Θ π(θ | sdp) (log π(θ | sdp) log qϕ(θ | sdp)) dθ dsdp

S Θ p(sdp)π(θ | sdp) log qϕ(θ | sdp)dsdpdθ

S Θ Xn π(θ)η(sdp | x)f(x | θ) log qϕ(θ | sdp)dsdpdθdx

Θ Xn p(θ, x) Z

S η(sdp | x) log qϕ(θ | sdp)dsdp

= C2 Ep(θ,x)

S η(sdp | x) log qϕ(θ | sdp)dsdp

Published in Transactions on Machine Learning Research (03/2025)

where C2 = RR

S Θ p(sdp)π(θ | sdp) log π(θ | sdp)dsdpdθ is a constant unrelated to ϕ. Furthermore, if θ are sampled from some proposal p(θ), then

Ep(sdp) [DKL ( π(θ | sdp) qϕ(θ | sdp))]

S Θ p(sdp) π(θ | sdp) log π(θ | sdp)dsdpdθ E p(θ,x)

S η(sdp | x) log qϕ(θ | sdp)dsdp

where π(θ | sdp) is called proposal posterior (Greenberg et al., 2019), which satisfied

π(θ | sdp) = π(θ | sdp) p(θ)p(sdp)

π(θ) p(sdp), p(sdp) = Z

Θ p(θ)f(sdp | θ)dθ.

Based on Prop. 1 in the work of Papamakarios & Murray (2016), minimizing Equation (18) results in the convergence of qϕ(θ | sdp) to π(θ | sdp) and qϕ(θ | sdp) to π(θ | sdp).

B Proof of Proposition 4

Proposition 4. Consider a sequence of L points {t1, , t L} in the time interval [0, T], our privatized query can be sdp = (s1, , s L) where each si Binomial n, I(ti)+m

K+2m independently. The mechanism generating

sdp = (s1, , s L) satisfies ϵ-DP, with ϵ = n

Proof. Denote the numbers of infectious as I(t) {0, 1, , K} and its neighbors I(t), note that |I(t) I(t)| 1 holds for all t [0, T] because a change of the infection status of any one of the K individuals will at most increase or decrease I(t) by only 1. We examine the following density ratio:

p(si | I(ti)) p(si | I(ti)) =

n si I(ti)+m

K+2m si K I(ti)+m

K+2m (n si)

n si I(ti)+m

K+2m si K I(ti)+m

K+2m (n si)

= I(ti) + m

si K I(ti) + m

K I(ti) + m

This expression can be analyzed under three distinct cases. In the first case, when I(ti) > I(ti), we have

I(ti) = I(ti) + 1, leading to Hi K I(ti)+m K I(ti)+m

(n si) K I(ti)+m K I(ti)+m

m n. In the second case, if

I(ti) < I(ti), then I(ti) = I(ti) 1, and we obtain Hi I(ti)+m

m n. Lastly, when

I(ti) = I(ti), Hi equals to 1. Thus, p(si|I(ti))

p(si| I(ti)) 1+m

m . Now for sdp = (s1, , s L), we have

p(sdp | {I(t)|t [0, T]}) p(sdp | { I(t)|t [0, T]}) = p(sdp | I(t1), , I(t L))

p(sdp | I(t1), , I(t L)) exp n

which gives the mechanism generating sdp = (s1, , s L) satisfies ϵ-DP, with ϵ = n

C Description of the SMC-ABC algorithm

Algorithm 3 presents the extension of the SMC-ABC algorithm (Sisson et al., 2007; Beaumont et al., 2009) to posterior estimation from the privatized data settings, by incorporating the private data simulator f(x | θ) and the privatized algorithm η.

Published in Transactions on Machine Learning Research (03/2025)

Algorithm 3 Sequential Monte Carlo - Approximate Bayesian Computation (SMC-ABC)

Input: observed privatized summary statistics so dp, discrepancy function ρ, thresholds {ϵt}, transition kernels {Kt}, simulator f(x | θ), and privatized algorithm η for t = 1, 2, . . . , T do

for i = 1, 2, . . . , N do

if t = 1 then

Sample θ π(θ) independently else

Sample θ from the previous population {θ(i) t 1} with weights {W (i) t 1} Perturb θ using transition kernel: θ Kt(θ | θ ) end if repeat

Generate x f(x | θ ) Generate s dp η(sdp | x ) until ρ(s dp, so dp) ϵt Set θ(i) t = θ

Compute weights:

1, if t = 1

π(θ(i) t ) PN

j=1 W (j) t 1Kt(θ(i) t |θ(j) t 1), if t > 1

end for Normalize the weights so that PN i=1 W (i) t = 1

Compute effective sample size: ESS = h PN i=1(W (i) t )2i 1

if ESS < E then

Resample {θ(i) t } with weights {W (i) t } to obtain a new population Set weights {W (i) t = 1/N} end if end for return posterior estimation {θ(i) T , W (i) T }N i=1

D Computational Complexity Analysis

In this section, we analyze the computational complexity of the proposed (S)PLE and (S)PPE algorithms in detail. We begin by examining the computational cost of neural spline flows, which serve as the core density estimators in our framework.

Computational complexity for neural spline flows. Denote the neural spline flow (Durkan et al., 2019) as qϕ(y | z) with trainable parameters ϕ. Let nl represent the number of flow layers, and each layer containing nh hidden layers of nu units to generate the parameters of K knots for monotonic spline. Assuming the input dimensions of y and z are dy and dz, respectively. Each flow layer processes the concatenated input (y, z) and outputs (3K 1) parameter for constructing the monotonic spline for each dimension of y (see Appendix A.1 in Durkan et al. (2019) for details).

The computational complexity of evaluating gradient on one sample, i.e., ϕqϕ(yi | zi), is then given by

CNSF = O nl (dz + dy)nu + nhn2 u + Kdynu . (19)

Computational complexity for PLE. To minimize the PLE loss ℓPLE(ϕ), the gradient with respect to ϕ is computed over mini-batches of size B. For each pair (θi, xi) in a mini-batch, the integral I(θi, xi) from Equation (12) is estimated using M pseudo-samples {v(1), . . . , v(M)} generated via RQMC.

Published in Transactions on Machine Learning Research (03/2025)

Considering a neural spline flow with nl layers, each containing nh hidden layers with nu units, the computational cost for processing a single mini-batch is then

CPLE = O nl BM (dim(θ) + dim(s))nu + nhn2 u + Kdim(s)nu + ndim(X)BM log M , (20)

where the last term represents the complexity of generating an RQMC-based estimator with M samples. Specifically, the factor M log M accounts for the cost of generating an RQMC sequence, and dim(x) = ndim(X) represents the dimension of the confidential data x Xn with sample size n.

Computational complexity for PPE. For the PPE method, the roles of the parameters θ and summary statistics sdp are reversed compared to PLE. To estimate the normalization constant Z(sdp, ϕ) in Equation (17), we using the atomic proposal method (Greenberg et al., 2019), approximated as

Z(sdp, ϕ) = Z

Θ qϕ(θ | sdp) p(θ)

j=1 qϕ(θj | sdp,j) 1 π(θj),

where na is the number of atoms, and {θj}na j=1 are subsamples under the current mini-batch, which can be regarded as the samples from proposal p(θ) (see Appendix A.2 in Greenberg et al. (2019) for details). The computational complexity for PPE is then given by

CPPE = O nl BM (dim(θ) + dim(s))nu + nanhn2 u + Kdim(s)nanu + ndim(X)BM log M . (21)

Both PLE and PPE scale linearly with the posterior dimension dim(θ). Notably, while they also scale linearly with the confidential data sample size n, this scaling is decoupled from the complexity of the flow structure, which directly models the conditional density between the summary statistics s (or sdp) and the model parameters θ, rather than the high-dimensional confidential data itself.

E Experimental details

The training and inference processes of the methods were primarily implemented using the Pytorch package in Python.

Experimental setup. We employed neural spline flows (Durkan et al., 2019) as the conditional density estimator, consisting of 8 layers. Following the flow structure settings in (Lueckmann et al., 2021), each layer consists of two residual blocks with 50 units and Re LU activation function, with 10 bins in each monotonic piecewise rational-quadratic transforms, and the tail bound was set to 5. Through empirical evaluation, we found that 8 layers provided sufficient flexibility for posterior estimation. Appendix E.4 compares 5-layer and 10-layer models, showing that 5 layers slightly degrade performance, while 10 layers perform similarly to 8 layers but with higher computational cost.

In the training process, the number of samples simulated in each round is N = 1000 and there are R = 10 rounds in total. In each round of training, we randomly select 5% of the newly generated samples as validation data. According to the early stop criterion proposed by Papamakarios et al. (2019), we stop training if the value of loss on validation data does not decrease after 20 epochs in a single round. For stochastic gradient descent optimizer, we choose the Adam (Kingma & Ba, 2014) with the batchsize of 100, the learning rate of 5 10 4 and the weight decay is 10 4.

E.1 SIR model

E.1.1 Detailed results on synthetic data

In our experiments, we use the Gillespie algorithm (Gillespie, 1977) to simulate the whole process over a duration of T = 160 time units and record the populations at intervals of 1-time units. The prior distribution of β is set to N(log 0.4, 0.5) and prior distribution of γ is set to N(log 0.125, 0.2). The value of K is configured as 1000000, while N is set to 1000 for the number of observations.

Published in Transactions on Machine Learning Research (03/2025)

To publish the privatized data about the infection process, we select the infectious group I(t) at L = 10 evenly-spaced points in time [0, T], with the privacy parameters set to n = 1000 and m = 1000, which satisfied ϵ-DP with ϵ = 10. The ground truth parameters are

θ = (exp( 0.5), exp( 3)),

and the observed private statistic so dp simulated from the model with ground truth parameters θ is

so dp = (0.0010, 0.0310, 0.6140, 0.2630, 0.1230, 0.0470, 0.0180, 0.0090, 0.0050, 0.0030).

Figure 4 illustrates the convergence of the approximate posterior by SPPE and SPLE in each round, and compares our results with the SMC-ABC method, where we performed simulations up to 5 105 times for the SMC-ABC method to generate the near exact True posterior. The tolerance parameter ϵt in SMC-ABC was selected as ϵt = 0.5 0.7t, where t represents the iteration round, and the discrepancy function ρ was chosen as the ℓ2 norm. Figure 5 depicts the results of the SMC-ABC method under the same performance metrics. Our method, after 10 rounds or 104 simulations, achieved a similar performance as the SMC-ABC method with approximately 105 simulations. The computational time costs comparison between our methods and SMC-ABC, as illustrated in Table 1, also reveals that our approaches require significantly fewer simulations, resulting in substantially lower simulation time expenditures compared to the SMC-ABC method. However, our methods require extra time for the training of normalizing flows, a duration dependent on the flow s complexity. Overall, both SPPE and SPLE attain a low MMD more rapidly than SMC-ABC.

Table 1: Computational Cost to achieve MMD < 0.1 in the SIR Model (Mean Standard Deviation)

Method Simulation Time (min) Network Training Time (min) Total Number of Simulations SMC-ABC 115.38 1.51 - 115.38 1.51 82.85 1.03 SPPE 2.65 1.03 15.73 7.67 18.38 8.63 2.12 0.71 SPLE 7.11 1.01 45.98 14.46 53.09 15.26 5.40 0.77

Figure 4: Detailed convergence of sequential posterior estimations given DP-protected infection trajectory under the SIR model. Each round entails N = 1000 simulations. Orange: SPPE; red: SPLE; grey: SMC-ABC.

0.6 0.7 0.8 0.9 1.0

104 5 104 105 2 105 Number of Simulations

104 5 104 105 2 105 Number of Simulations

Figure 5: Approximation accuracy by SMC-ABC on the SIR model against the number of simulations.

Published in Transactions on Machine Learning Research (03/2025)

Finally, we investigate the marginal posterior distributions π(β | sdp) and π(γ | sdp) in Figure 6, and conclude that SPPE and SPLE perform similarly well. In this example, the posterior approximated by SPPE is slightly more concentrated than those from SMC-ABC and SPLE.

True posterior

0.6 0.8 1.0 0.0

0.05 0.06 0

Figure 6: Marginal posterior histograms of β, γ, and β/γ in SIR model on synthetic data. Grey: SMC-ABC; orange: SPPE; red: SPLE. The vertical lines indicate true data generating parameters, set to mimic a measles outbreak.

102 104 Number of samples

Task: SIR model

MC Slope: -0.5039 RQMC Slope: -0.9682

102 104 Number of samples

Task: linear regression

MC Slope: -0.4931 RQMC Slope: -0.8857

Figure 7: Rate of convergence of MC and RQMC: The x-axis represents the number of samples M used for integral estimation, and the y-axis shows the logarithmic value of the root-mean-square-error (RMSE). The results indicate that the RMSE of the MC method is approximately O(M 1/2), while the RMSE of the RQMC method is approximately O(M 1).

Finally, we evaluate the impact of the privacy loss budget on the performance of SPPE and SPLE. By setting the parameter m to either 104 or 105, we can adjust the privacy loss budget to ϵ = 1 or ϵ = 0.1, respectively. Figure 8 illustrates the posterior approximation accuracy under different ϵ values. SPPE demonstrates rapid and stable convergence, while SPLE requires approximately 5 training rounds to converge.

E.1.2 Inference on real disease outbreaks

We applied our inference methods (SPPE and SPLE) to analyze several real infectious disease outbreaks, namely influenza, Ebola, and COVID-19. For these experiments, we chose the privacy parameters M = 100, N = 100, and L = 10, which result in an overall privacy loss budget ϵ = 10 according to Proposition 4.

influenza outbreak. We utilized the dataset from a boarding school, obtained from https://search. r-project.org/CRAN/refmans/epimdr/html/flu.html. The total population in the school was K = 763.

Published in Transactions on Machine Learning Research (03/2025)

2 4 6 8 10 Number of Rounds

0.5 0.0 0.5 1.0 1.5

2 4 6 8 10 Number of Rounds

2 4 6 8 10 Number of Rounds

2 4 6 8 10 Number of Rounds

2.5 2.0 1.5 1.0 0.5

Figure 8: Approximation accuracy by SPPE (orange) and SPLE (red) against the number of rounds with A. Privacy loss budget ϵ = 1. B. Privacy loss budget ϵ = 0.1. The error bars represent the mean with the upper and lower quartiles.

We considered a daily time interval, and the observed private statistic so dp is

so dp,flu = (0.0010, 0.0039, 0.0105, 0.0367, 0.0996, 0.2910, 0.3840, 0.3368, 0.3106, 0.2516).

Ebola outbreak in West Africa, 2014. We analyzed datasets from three regions: a) Guinea, b) Liberia, and c) Sierra Leone. The dataset source is from https://apps.who.int/gho/data/node.ebola-sitrep. We assumed potential contact individuals of K = 100, 000. We selected 9 equally spaced time intervals of 120 days starting from 03/31/2014. The resulting observed private statistic so dp is as follows

so dp,(a) = (0.0010, 0.0111, 0.0085, 0.0129, 0.0510, 0.0520, 0.0224, 0.0212, 0.0084, 0.0023).

so dp,(b) = (0.0010, 0.0007, 0.0019, 0.0664, 0.2579, 0.0742, 0.0610, 0.0542, 0.0172, 0.0003).

so dp,(c) = (0.0010, 0.0079, 0.0434, 0.2156, 0.2054, 0.0928, 0.0549, 0.0366, 0.0237, 0.0232).

COVID-19. We examined the COVID-19 dataset for Clark County, Nevada. See https://usafacts. org/visualizations/coronavirus-covid-19-spread-map/state/nevada/county/clark-county/. We assumed a potential contact population of K = 100, 000. We selected 9 equally spaced time intervals of 24 days, starting from 09/07/2020. To obtain the number of currently infected individuals, we calculated the difference in the total confirmed cases with a time interval of 14 days from the original dataset. The resulting observed private statistic so dp is:

so dp,covid = (0.0010, 0.0281, 0.0566, 0.0978, 0.2108, 0.2443, 0.2574, 0.0864, 0.0434, 0.0263).

The numerical results are presented and compared in Figure 4 of the main text.

E.2 Bayesian linear regression

Data generating parameters. Following the parameters settings in Ju et al. (2022), we model the predictors with x0,i Np(m, Σ), where Σ = In and m = (0.9, 1.17). The outcomes y satisfy y | x0 Nn(xβ, σ2In) where σ2 = 2, and the prior for β is independent normal N(0, 1). We set the privacy loss budget ϵ = 10 and the number of subjects n = 100. The ground truth parameters are denoted as

θ = ( 1.79, 2.89, 0.66).

We simulate the summary statistics s from the model using the ground truth parameters θ , resulting in

0.3742 0.0629 0.0299

1.0000 0.0938 0.1270 0.0938 0.0180 0.0094 0.1270 0.0094 0.0280

its corresponding vector form is

svec = ( 0.3742, 0.0629, 0.0299, 0.2499, 0.0938, 0.1270, 0.0180, 0.0094, 0.0280).

Published in Transactions on Machine Learning Research (03/2025)

Furthermore, the observed private statistic so dp is given by

0.3824 0.0667 0.0320

1.0000 0.0988 0.1385 0.0988 0.0219 0.0229 0.1385 0.0229 0.0341

its corresponding vector form is

so dp,vec = ( 0.3824, 0.0667, 0.0320, 0.2720, 0.0988, 0.1385, 0.0219, 0.0229, 0.0341).

Experimental results. In Figure 9, we present a comparison of the performance of the SPPE and SPLE methods across different metrics as the number of simulation rounds increases. The SPLE method demonstrates a faster convergence in this task. Figure 10 displays the posterior after 10 rounds, where both the SPPE and SPLE methods achieve results that are close to the near exact posterior obtained by the SMC-ABC method.

2 4 6 8 10 Number of Rounds

2 4 6 8 10 Number of Rounds

2.45 2.40 2.35 2.30 2.25

Figure 9: Approximation accuracy by SPPE (orange) and SPLE (red) on the Bayesian linear regression model against number of rounds, the error bars represent the mean with the upper and lower quartiles.

Figure 10: Posterior comparison on the Bayesian linear regression model. Grey: SMC-ABC; orange: SPPE; red: SPLE. The vertical lines and black dots indicate true data generating parameters.

To further characterize the privacy-utility trade-off, we compare the private data posterior distributions under several levels of privacy loss budget, in Figure 11. The underlying confidential data x is the same for each ϵ = 0.1, 0.3, 1, 3, 10, 30, 100. For each privacy loss level, we simulate one private summary sdp(ϵ), and use SPPE and SPLE to approximate the private data posterior π(θ | sdp; ϵ). The two methods yield very

Published in Transactions on Machine Learning Research (03/2025)

similar results. More interestingly, we use the same confidential data x as Ju et al. (2022) and the posterior distributions in Figure 11 follow a similar trend with that in Figure 2 of Ju et al. (2022). For larger privacy loss budget (smaller noise), confidential data are mainly corrupted by clamping, and the proposed methods can elevate the effect of this censoring bias, as E(θ | sdp) is closer to the confidential data expectation E(θ | x). As ϵ gets closer to 0 (near perfect privacy), the privacy mechanism has injected so much noise into sdp that the posterior distribution is more dispersed, and little information about θ can be learned based on observing sdp.

0.1 0.3 1 3 10 30 100

0.1 0.3 1 3 10 30 100

0.1 0.3 1 3 10 30 100

Figure 11: Marginal posterior histograms of θ = (β0, β1, β2) given sdp generated with several levels of privacy loss budget on the same confidential data x. Orange: SPPE; red: SPLE. The dash-dotted horizontal lines indicate the confidential data posterior means, and the dotted lines indicate prior means.

Finally, we compare the posterior means and 95% credible intervals obtained by different methods under various privacy loss budgets, as shown in Tables 2, 3, and 4. When larger noise is added, the posterior from all methods converge toward the prior of the parameters. Additionally, the naive posterior approximation fails to compute results due to the non-positive definiteness of the covariance matrix. The corresponding Kolmogorov-Smirnov test results, presented in Figure 12, shows that RNPE exhibits a larger bias compared to other methods, while the results from DA-MCMC, SMC-ABC, SPPE, SPLE, and Gibbs-SS are closely aligned.

Table 2: Estimated posterior mean and 95% credible intervals for the linear regression example using various methods. Here privacy loss budget is set to ϵ = 10.

β0 β1 β2 Confidential Posterior given x -2.15 (-2.68, -1.61) -2.79 (-3.08, -2.50) -0.83 (-1.08, -0.58) Naive posterior approximation -4.63 (-5.04, -4.22) -6.23 (-6.57, -5.90) -5.10 (-5.40, -4.79) DA-MCMC -0.62 (-2.50, 0.99) -2.72 (-3.74, -0.96) 0.54 (-1.06, 2.46) SMC-ABC -0.59 (-2.28, 0.93) -2.44 (-3.67, -0.38) 0.85 (-0.88, 2.77) SPPE -0.51 (-2.25, 1.07) -2.40 (-3.61, -0.30) 0.90 (-0.89, 2.85) SPLE -0.64 (-2.31, 0.96) -2.61 (-3.65, -0.98) 0.64 (-0.93, 2.48) Gibbs-SS -0.46 (-2.22, 1.39) -0.50 (-2.08, 1.28) 0.41 (-1.68, 2.13) RNPE -1.40 (-3.59, 0.94) -2.15 (-3.34, 0.51) 0.29 (-2.11, 2.97)

Table 3: Estimated posterior mean and 95% credible intervals for the linear regression example using various methods. Here privacy loss budget is set to ϵ = 1.

β0 β1 β2 Confidential Posterior given x -2.15 (-2.68, -1.61) -2.79 (-3.08, -2.50) -0.83 (-1.08, -0.58) Naive posterior approximation - - - DA-MCMC -1.00 (-2.75, 0.85) -0.96 (-2.66, 0.89) 1.23 (-0.76, 2.88) SMC-ABC -0.80 (-2.74, 1.17) -0.83 (-2.64, 1.23) 0.90 (-1.00, 2.58) SPPE -0.93 (-2.80, 1.04) -0.90 (-2.62, 0.97) 1.11 (-0.71, 2.72) SPLE -0.89 (-2.69, 1.06) -1.04 (-2.82, 0.91) 1.25 (-0.55, 2.88) Gibbs-SS -0.92 (-2.72, 0.94) -0.67 (-2.21, 1.04) 1.04 (-0.62, 2.47) RNPE -1.37 (-3.81, 2.07) -0.51 (-3.01, 2.53) 1.34 (-1.77, 3.76)

Published in Transactions on Machine Learning Research (03/2025)

Table 4: Estimated posterior mean and 95% credible intervals for the linear regression example using various methods. Here privacy loss budget is set to ϵ = 0.1.

β0 β1 β2 Confidential Posterior given x -2.15 (-2.68, -1.61) -2.79 (-3.08, -2.50) -0.83 (-1.08, -0.58) Naive posterior approximation - - - DA-MCMC -0.12 (-1.98, 1.77) -0.13 (-2.13, 1.88) 0.13 (-1.91, 2.12) SMC-ABC -0.09 (-1.97, 1.93) -0.10 (-2.09, 1.95) 0.11 (-1.90, 2.22) SPPE -0.12 (-2.13, 1.73) -0.12 (-2.11, 1.83) 0.14 (-1.81, 2.09) SPLE -0.04 (-2.01, 1.99) -0.06 (-2.00, 1.85) 0.12 (-1.86, 2.09) Gibbs-SS -0.05 (-1.94, 1.95) -0.01 (-1.95, 1.97) 0.21 (-1.62, 1.89) RNPE -0.22 (-3.56, 3.16) 0.13 (-3.12, 3.51) 0.34 (-3.12, 3.61)

Figure 12: Kolmogorov-Smirnov test statistics between approximations of posterior marginals. A. Privacy loss budget ϵ = 1. B. Privacy loss budget ϵ = 0.1.

E.3 Naïve Bayes log-linear model

Model description. The naïve Bayes log-linear model is a commonly used approach for modeling categorical data (Karwa et al., 2015). The input feature-vector, denoted as x = (x1, , x K), consists of K features, each taking values in the range {1, 2, , Jk}. The output class, denoted as y, represents the target category and takes values in {1, 2, , I}. The model assumes that the conditional probability of the input given the output, denoted as p(x | y), can be factorized as the product of individual feature probabilities: p(x | y) = QK k=1 p(xk | y). The model parameters are pk ij, which represents the probability p(xk = j | y = i), with prior (pk i,1, , pk i,Jk) Dirichlet(αk i,1, , αk i,Jk) for all i and k; and pi = p(y = i), with prior (p1, , p I) Dirichlet(α1, , αI).

We assume that (nk i,1, , nk i,Jk) Multinomial(ni; pk i,1, , pk i,Jk) for all i and k and (n1, , n I) Multinomial(n; p1, , p I). Here nk i,j represents the counts #(y = i, xk = j). One sufficient statistics of the model is the proportion of the counts rk i,j := 1

nnk i,j, where n = PI i=1 PJk j=1 nk i,j for all k. To protect the privacy of the dataset, Laplace noise ek i,j is added to the proportion of the counts, resulting in the privatized proportion mk i,j = rk i,j + ek i,j. When ek i,j Laplace(0, 2K

nϵ ), the released private statistic {mk i,j}i,j,k satisfied ϵ-DP.

Published in Transactions on Machine Learning Research (03/2025)

In our simulation, we set αk i,j = αi = 2 for all i, j, k, and n = 100, with I = 2, K = 2 and Jk = 2 for all k. The privacy loss budget ϵ = 10. The ground truth parameters are

p1 1,1 = 0.3887, p1 1,2 = 0.6113, p2 1,1 = 0.7537, p2 1,2 = 0.2463,

p1 2,1 = 0.6534, p1 2,2 = 0.3466, p2 2,1 = 0.5834, p2 2,2 = 0.4166,

p1 = 0.8489, p2 = 0.1511.

and the observed private statistics simulated from the model with ground truth parameters are

r1 1,1 = 0.3275, r1 1,2 = 0.4520, r2 1,1 = 0.5862, r2 1,2 = 0.1827,

r1 2,1 = 0.1293, r1 2,2 = 0.0858, r2 2,1 = 0.1288, r2 2,2 = 0.0954.

Experimental results. Figure 13 illustrates the performance of the SPPE and SPLE methods across four different metrics, and Figure 14 shows the marginal posterior histograms after 10 rounds, our methods stabilize in performance after round 3.

0.00 0.01 0.02 0.03 0.04

0.58 0.60 0.62 0.64 0.66

2 4 6 8 10 Number of Rounds

2 4 6 8 10 Number of Rounds

Figure 13: Approximation accuracy by SPPE (orange) and SPLE (red) on the log-linear model against the number of rounds, the error bars represent the mean with the upper and lower quartiles.

True posterior

0.25 0.50 0.75 p1

0.25 0.50 0.75 p1

0.50 0.75 p2

0.25 0.50 p2

0.25 0.50 0.75

0.25 0.50 0.75

0.25 0.50 0.75

0.25 0.50 0.75

0.50 0.75 p1

0.25 0.50 p2

Figure 14: Marginal posterior histograms of the log-linear model. Grey: SMC-ABC; orange: SPPE; red: SPLE. The vertical lines indicate true data generating parameters.

E.4 Neural spline flows with different layers

To evaluate the impact of varying the number of layers in the neural spline flow (NSF) model, we conducted additional experiments using configurations with 5 and 10 layers, in addition to the existing 8-layer setup.

The results are shown in Figure 15. For the Bayesian linear regression task, the posterior approximation performance with 5-layer and 10-layer configurations is comparable to the 8-layer NSF. This indicates that

Published in Transactions on Machine Learning Research (03/2025)

a 5-layer NSF possesses sufficient flexibility to represent the posterior or likelihood. However, for the SIR model, the SPLE method with the 5-layer configuration converges slightly slower compared to the 8-layer NSF, indicating that fewer layers may marginally affect convergence in some cases. Besides, as analyzed in Appendix D, increasing the number of layers also results in a linear increase in computational cost.

2 4 6 8 10 Number of Rounds

2 4 6 8 10 Number of Rounds

2.5 2.0 1.5 1.0 0.5

0.6 0.7 0.8 0.9 1.0

2 4 6 8 10 Number of Rounds

2 4 6 8 10 Number of Rounds

2.5 2.0 1.5 1.0 0.5

2 4 6 8 10 Number of Rounds

2 4 6 8 10 Number of Rounds

2 4 6 8 10 Number of Rounds

2 4 6 8 10 Number of Rounds

Figure 15: Comparison of approximation performance for NSF models with different layer configurations by SPPE (orange) and SPLE (red). A. 5-layer NSF on SIR model. B. 10-layer NSF on SIR model. C. 5-layer NSF on Bayesian linear regression model. D. 10-layer NSF on Bayesian linear regression model.

F Statement on Computing Resources

Our numerical experiments were conducted on a computer equipped with four Ge Force RTX 2080 Ti graphics cards and a pair of 14-core Intel E5-2690 v4 CPUs.