# Privately Learning Markov Random Fields

Huanyu Zhang¹, Gautam Kamath*², Janardhan Kulkarni*³, Zhiwei Steven Wu*⁴

*These authors are in alphabetical order. ¹School of Electrical and Computer Engineering, Cornell University. ²Cheriton School of Computer Science, University of Waterloo. ³Microsoft Research Redmond. ⁴Computer Science & Engineering, University of Minnesota. Correspondence to: Huanyu Zhang.

Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020. Copyright 2020 by the author(s).

Abstract

We consider the problem of learning Markov Random Fields (including the prototypical example, the Ising model) under the constraint of differential privacy. Our learning goals include both structure learning, where we try to estimate the underlying graph structure of the model, as well as the harder goal of parameter learning, in which we additionally estimate the parameter on each edge. We provide algorithms and lower bounds for both problems under a variety of privacy constraints, namely pure, concentrated, and approximate differential privacy. While non-privately both learning goals enjoy roughly the same complexity, we show that this is not the case under differential privacy. In particular, only structure learning under approximate differential privacy maintains the non-private logarithmic dependence on the dimensionality of the data; a change in either the learning goal or the privacy notion necessitates a polynomial dependence. As a result, we show that the privacy constraint imposes a strong separation between these two learning problems in the high-dimensional data regime.

1. Introduction

Graphical models are a common structure used to model high-dimensional data, and they find a myriad of applications in diverse research disciplines, including probability theory, Markov chain Monte Carlo, computer vision, theoretical computer science, social network analysis, game theory, and computational biology (Levin et al., 2009; Chatterjee, 2005; Felsenstein, 2004; Daskalakis et al., 2011; Geman & Graffigne, 1986; Ellison, 1993; Montanari & Saberi, 2010). While statistical tasks involving general distributions over $p$ variables often run into the curse of dimensionality (i.e., an exponential sample complexity in $p$), Markov Random Fields (MRFs) are a particular family of undirected graphical models which are parameterized by the order $t$ of their interactions. Restricting the order of interactions allows us to capture most distributions which may naturally arise, and also avoids this severe dependence on the dimension (i.e., we often pay an exponential dependence on $t$ instead of $p$). An MRF is defined as follows; see Section 2 for more precise definitions and the notation we use in this paper.

Definition 1.1. Let $k, t, p \in \mathbb{N}$, let $G = (V, E)$ be a graph on $p$ nodes, and let $C_t(G)$ be the set of cliques of size at most $t$ in $G$. A Markov Random Field with alphabet size $k$ and $t$-order interactions is a distribution $\mathcal{D}$ over $[k]^p$ such that
$$\Pr_{Z \sim \mathcal{D}}[Z = z] \propto \exp\Big(\sum_{I \in C_t(G)} \psi_I(z)\Big),$$
where each $\psi_I : [k]^p \to \mathbb{R}$ depends only on the variables in $I$.

We note that each node corresponds to one coordinate of $Z$. Furthermore, the case $k = t = 2$ corresponds to the prototypical example of an MRF, the Ising model (Ising, 1925) (Definition 2.1). More generally, if $t = 2$, we call the model pairwise (Definition 2.2), and if $k = 2$ but $t$ is unrestricted, we call the model a binary MRF (Definition 2.4).
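To make Definition 1.1 concrete, the following minimal sketch (ours, not from the paper) evaluates the Ising special case ($k = t = 2$) by brute-force enumeration on a tiny toy instance; all parameter values are made up for illustration, and this is only feasible for very small $p$.

```python
# Brute-force evaluation of a tiny Ising model, the k = t = 2 special case of
# Definition 1.1.  All parameter values below are arbitrary illustrative choices.
import itertools
import numpy as np

def ising_distribution(A, theta):
    """Return all configurations of {-1,1}^p and their exact probabilities
    under Pr[Z = z] proportional to exp( sum_{i<j} A_ij z_i z_j + sum_i theta_i z_i )."""
    p = len(theta)
    configs = np.array(list(itertools.product([-1, 1], repeat=p)))
    # The 0.5 factor avoids double counting, since A is symmetric with zero diagonal.
    log_w = 0.5 * np.einsum("ci,ij,cj->c", configs, A, configs) + configs @ theta
    probs = np.exp(log_w - log_w.max())
    return configs, probs / probs.sum()

if __name__ == "__main__":
    p = 4
    A = np.zeros((p, p))
    A[0, 1] = A[1, 0] = 0.8           # one strong (ferromagnetic) edge
    A[2, 3] = A[3, 2] = -0.5          # one antiferromagnetic edge
    theta = np.array([0.1, 0.0, 0.0, -0.2])
    configs, probs = ising_distribution(A, theta)
    # Strongly coupled coordinates 0 and 1 should agree most of the time.
    agree = probs[configs[:, 0] == configs[:, 1]].sum()
    print(f"Pr[Z_0 = Z_1] = {agree:.3f}")
```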
In this paper, we mainly look at these two special cases of MRFs.

Given the wide applicability of these graphical models, there has been a great deal of work on the problem of graphical model estimation (Ravikumar et al., 2010; Santhanam & Wainwright, 2012; Bresler, 2015; Vuffray et al., 2016; Klivans & Meka, 2017; Hamilton et al., 2017; Rigollet & Hütter, 2017; Lokhov et al., 2018; Wu et al., 2018). That is, given a dataset generated from a graphical model, can we infer properties of the underlying distribution? Most of the attention has focused on two related learning goals.

1. Structure learning (Definition 2.5): Recover the set of non-zero edges in $G$.
2. Parameter learning (Definition 2.6): Recover the set of non-zero edges in $G$, as well as $\psi_I$ for all cliques $I$ of size at most $t$.

It is clear that structure learning is no harder than parameter learning. Nonetheless, the sample complexity of both learning goals is known to be roughly equivalent. That is, both can be performed using a number of samples which is only logarithmic in the dimension $p$ (assuming a model of bounded width $\lambda$¹), thus facilitating estimation in very high-dimensional settings.

However, in modern settings of data analysis, we may be running our algorithms on datasets which are sensitive in nature. For instance, graphical models are often used to model medical and genetic data (Friedman et al., 2000; Lagor et al., 2001). If our learning algorithm reveals too much information about the individual datapoints used to train the model, this is tantamount to releasing the medical records of the individuals providing their data, thus violating their privacy. In order to assuage these concerns, we consider the problem of learning graphical models under the constraint of differential privacy (DP) (Dwork et al., 2006), considered by many to be the gold standard of data privacy. Informally, an algorithm is said to be differentially private if its distribution over outputs is insensitive to the addition or removal of a single datapoint from the dataset (a more formal definition is provided in Section 2). Differential privacy has enjoyed widespread adoption, including deployments at Apple (Differential Privacy Team, Apple, 2017), Google (Erlingsson et al., 2014), and Microsoft (Ding et al., 2017), as well as by the US Census Bureau for the 2020 Census (Dajani et al., 2017).

Our goal is to design algorithms which guarantee both:

Accuracy: With high probability, the algorithm learns the underlying graphical model.
Privacy: For every dataset, the algorithm guarantees differential privacy.

Thematically, we investigate the following question: how much additional data is needed to learn Markov Random Fields under the constraint of differential privacy? As mentioned before, absent privacy constraints, the sample complexity is logarithmic in $p$. Can we guarantee privacy with comparable amounts of data? Or if more data is needed, how much more?

1.1. Results and Techniques

We proceed to describe our results on privately learning Markov Random Fields. In this section, we will assume familiarity with some of the most common notions of differential privacy: pure $\varepsilon$-differential privacy, $\rho$-zero-concentrated differential privacy, and approximate $(\varepsilon, \delta)$-differential privacy. In particular, one should know that these are in (strictly) decreasing order of strength (i.e., an algorithm which satisfies pure DP gives a stronger privacy guarantee than one which satisfies only concentrated DP); formal definitions appear in Section 2.
Furthermore, in order to be precise, some of our theorem statements will use notation which is defined later (Section 2); these may be skipped on a first reading, as our prose will not require this knowledge.

¹This is a common parameterization of the problem, which roughly corresponds to the graph having bounded degree; see Section 2 for more details.

Upper Bounds. Our first upper bounds are for parameter learning. We begin with the following theorem, which gives an upper bound for parameter learning of pairwise graphical models under concentrated differential privacy, showing that this learning goal can be achieved with $O(\sqrt{p})$ samples. In particular, this includes the special case of the Ising model, which corresponds to an alphabet size of $k = 2$. Note that this implies the same result if one relaxes the learning goal to structure learning, or the privacy notion to approximate DP, as these modifications only make the problem easier. Further details are given in Section 3.3.

Theorem 1.2. There exists an efficient $\rho$-zCDP algorithm which learns the parameters of a pairwise graphical model to accuracy $\alpha$ with probability at least $2/3$, given
$$n = O\!\left(\frac{\lambda^2 k^5 \log(pk)\, e^{O(\lambda)}}{\alpha^4} + \frac{\sqrt{p}\, \lambda^2 k^{5.5} \log^2(pk)\, e^{O(\lambda)}}{\sqrt{\rho}\, \alpha^3}\right)$$
samples.

This result can be seen as a private adaptation of the elegant work of (Wu et al., 2018) (which in turn builds on the structural results of (Klivans & Meka, 2017)). Wu, Sanghavi, and Dimakis (Wu et al., 2018) show that $\ell_1$-constrained logistic regression suffices to learn the parameters of all pairwise graphical models. We first develop a private analog of this method, based on the private Frank-Wolfe method of Talwar, Thakurta, and Zhang (Talwar et al., 2014; 2015), which is of independent interest. This method is studied in Section 3.1.

Theorem 1.3. For the problem of private sparse logistic regression, there exists an efficient $\rho$-zCDP algorithm that produces a parameter vector $w^{\mathrm{priv}}$ such that, with probability at least $1 - \beta$, the empirical risk satisfies
$$\mathcal{L}(w^{\mathrm{priv}}; D) - \mathcal{L}(w^{\mathrm{erm}}; D) = O\!\left(\frac{\lambda^{4/3} \log(np/\beta)}{(n\sqrt{\rho})^{2/3}}\right).$$

We note that Theorem 1.3 avoids a polynomial dependence on the dimension $p$ in favor of a polynomial dependence on the sparsity parameter $\lambda$. The greater dependence on $p$ which arises in Theorem 1.2 comes from applying Theorem 1.3 and then using the composition properties of concentrated DP.

We go on to generalize the results of (Wu et al., 2018), showing that $\ell_1$-constrained logistic regression can also learn the parameters of binary $t$-wise MRFs. This result is novel even in the non-private setting. Due to the page limit, we defer coverage of binary MRFs to the supplement. The following theorem shows that we can learn the parameters of binary $t$-wise MRFs with $O(\sqrt{p})$ samples.

Theorem 1.4. Let $\mathcal{D}$ be an unknown binary $t$-wise MRF with associated polynomial $h$. Then there exists a $\rho$-zCDP algorithm which learns the maximal monomials of $h$ to accuracy $\alpha$, given $n$ i.i.d. samples $Z^1, \ldots, Z^n \sim \mathcal{D}$, where
$$n = O\!\left(\frac{e^{5\lambda t}\, \sqrt{p}\, \log^2(p)}{\sqrt{\rho}\, \alpha^{9/2}} + \frac{t \lambda^2 \sqrt{p} \log p}{\sqrt{\rho}\, \alpha^{2}} + \frac{e^{6\lambda t} \log(p)}{\alpha^{4}}\right).$$

To obtain the rate above, our algorithm uses the Private Multiplicative Weights (PMW) method of (Hardt & Rothblum, 2010) to estimate all parity queries of order at most $t$. The PMW method runs in time exponential in $p$, since it maintains a distribution over the data domain. We can also obtain an oracle-efficient algorithm that runs in polynomial time when given access to an empirical risk minimization oracle over the class of parities.
By replacing PMW with such an oracle-efficient algorithm, FEM, from (Vietri et al., 2019), we obtain a slightly worse sample complexity of
$$n = O\!\left(\frac{e^{5\lambda t}\, \sqrt{p}\, \log^2(p)}{\sqrt{\rho}\, \alpha^{9/2}} + \frac{t \lambda^2 p^{3/2} \log p}{\sqrt{\rho}\, \alpha^{2}} + \frac{e^{6\lambda t} \log(p)}{\alpha^{4}}\right).$$

For the special case of structure learning under approximate differential privacy, we provide a significantly better algorithm. In particular, we can achieve an $O(\log p)$ sample complexity, which improves exponentially on the above algorithms' sample complexity of $O(\sqrt{p})$. The following is a representative theorem statement for pairwise graphical models, though we derive similar statements for binary MRFs of higher order.

Theorem 1.5. There exists an efficient $(\varepsilon, \delta)$-differentially private algorithm which, with probability at least $2/3$, learns the structure of a pairwise graphical model, with a sample complexity of
$$n = O\!\left(\frac{\lambda^2 k^4 \exp(14\lambda) \log(pk) \log(1/\delta)}{\varepsilon\, \eta^4}\right),$$
where $\eta$ is the minimum parameter weight in absolute value; the detailed definition is in Section 2.

This result can be derived using stability properties of non-private algorithms. In particular, in the non-private setting, the guarantees of algorithms for this problem recover the entire graph exactly with high probability. This allows us to derive private algorithms at a multiplicative cost of $O(\log(1/\delta)/\varepsilon)$ samples, using either the propose-test-release framework (Dwork & Lei, 2009) or stability-based histograms (Korolova et al., 2009; Bun et al., 2015). Further details are given in Section 5.

Lower Bounds. We note the significant gap between the aforementioned upper bounds: in particular, our more generally applicable upper bound (Theorem 1.2) has an $O(\sqrt{p})$ dependence on the dimension, whereas the best known lower bound is $\Omega(\log p)$ (Santhanam & Wainwright, 2012). However, we show that our upper bound is tight. That is, even if we relax the privacy notion to approximate differential privacy, or relax the learning goal to structure learning, the sample complexity remains $\Omega(\sqrt{p})$. Perhaps surprisingly, if we perform both relaxations simultaneously, the problem falls into the purview of Theorem 1.5, and the sample complexity drops to $O(\log p)$.

First, we show that even under approximate differential privacy, learning the parameters of a graphical model requires $\Omega(\sqrt{p})$ samples. The formal statement is given in Section 4.

Theorem 1.6 (Informal). Any algorithm which satisfies approximate differential privacy and learns the parameters of a pairwise graphical model with probability at least $2/3$ requires $\mathrm{poly}(p)$ samples.

This result is proved by constructing a family of instances of binary pairwise graphical models (i.e., Ising models) which encode product distributions. Specifically, we consider the set of graphs formed by a perfect matching with edges $(2i, 2i+1)$ for $i \in [p/2]$. In order to estimate the parameter on every edge, one must estimate the correlation between each such pair of nodes, which can be shown to correspond to learning the mean of a particular product distribution in $\ell_\infty$-distance. This problem is well known to have a gap between the non-private and private sample complexities, due to methods derived from fingerprinting codes (Bun et al., 2014; Dwork et al., 2015; Steinke & Ullman, 2017) or differentially private Fano's inequality (Acharya et al., 2020).

Second, we show that learning the structure of a graphical model, under either pure or concentrated differential privacy, requires $\mathrm{poly}(p)$ samples. The formal theorem appears in Section 6.
Theorem 1.7 (Informal). Any algorithm which satisfies pure or concentrated differential privacy and learns the structure of a pairwise graphical model with probability at least $2/3$ requires $\mathrm{poly}(p)$ samples.

We derive this result via packing arguments (Hardt & Talwar, 2010; Beimel et al., 2014; Acharya et al., 2020), by showing that there exists a large number (exponential in $p$) of different binary pairwise graphical models which must be distinguished. The construction of a packing of size $m$ implies lower bounds of $\Omega(\log m)$ and $\Omega(\sqrt{\log m})$ for learning under pure and concentrated differential privacy, respectively.

1.1.1. SUMMARY AND DISCUSSION

We summarize our findings on privately learning Markov Random Fields in Table 1, focusing on the specific case of the Ising model. We note that qualitatively similar relationships between problems also hold for general pairwise models as well as higher-order binary Markov Random Fields. Each cell denotes the sample complexity of a learning task, which is a combination of an objective and a privacy constraint. Problems become harder as we go down (as the privacy requirement is tightened) and to the right (structure learning is easier than parameter learning). The top row shows that both learning goals require only $\Theta(\log p)$ samples absent privacy constraints, and are thus tractable even in very high-dimensional settings or when data is limited. However, if we additionally wish to guarantee privacy, our results show that this logarithmic sample complexity is achievable only for structure learning under approximate differential privacy. If one changes the learning goal to parameter learning, or tightens the privacy notion to concentrated differential privacy, then the sample complexity jumps to become polynomial in the dimension, in particular $\Omega(\sqrt{p})$. Nonetheless, we provide algorithms which match this dependence, giving a tight $\Theta(\sqrt{p})$ bound on the sample complexity.

| | Structure Learning | Parameter Learning |
| --- | --- | --- |
| Non-private | $\Theta(\log p)$ (folklore) | $\Theta(\log p)$ (folklore) |
| Approximate DP | $\Theta(\log p)$ (Corollary 5.3) | $\Theta(\sqrt{p})$ (Theorems 3.3 and 4.1) |
| Zero-concentrated DP | $\Theta(\sqrt{p})$ (Theorems 3.3 and 6.1) | $\Theta(\sqrt{p})$ (Theorems 3.3 and 4.1) |
| Pure DP | $\Omega(p)$ (Theorem 6.1) | $\Omega(p)$ (Theorem 6.1) |

Table 1. Sample complexity (dependence on $p$) of privately learning an Ising model.

Due to space restrictions, details of our results on $t$-wise MRFs and several proofs appear in the supplement.

1.2. Related Work

As mentioned before, there has been significant work on learning the structure and parameters of graphical models; see, e.g., (Chow & Liu, 1968; Csiszár & Talata, 2006; Abbeel et al., 2006; Ravikumar et al., 2010; Jalali et al., 2011a;b; Santhanam & Wainwright, 2012; Bresler et al., 2014; Bresler, 2015; Vuffray et al., 2016; Klivans & Meka, 2017; Hamilton et al., 2017; Rigollet & Hütter, 2017; Lokhov et al., 2018; Wu et al., 2018). Perhaps a turning point in this literature is the work of Bresler (Bresler, 2015), who showed for the first time that general Ising models of bounded degree can be learned in polynomial time. Since this result, follow-up works have focused both on generalizing these results to broader settings (including MRFs with higher-order interactions and non-binary alphabets) and on simplifying existing arguments. There has also been work on learning, testing, and inferring other statistical properties of graphical models (Bhattacharya & Mukherjee, 2016; Martín del Campo et al., 2016; Daskalakis et al., 2017; Mukherjee et al., 2018; Bhattacharya, 2019). In particular, learning and testing Ising models in statistical distance have also been explored (Daskalakis et al., 2018; Gheissari et al., 2018; Devroye et al., 2018; Daskalakis et al., 2019; Bezakova et al., 2019), and these are interesting questions under the constraint of privacy.
Recent investigations at the intersection of graphical models and differential privacy include (Bernstein et al., 2017; Chowdhury et al., 2019; McKenna et al., 2019). Bernstein et al. (2017) privately learn graphical models by adding noise to the sufficient statistics and using an expectation-maximization-based approach to recover the parameters. However, their focus is somewhat different, as they do not provide finite-sample accuracy guarantees for parameter recovery, nor do they consider structure learning at all. Chowdhury, Rekatsinas, and Jha (Chowdhury et al., 2019) study differentially private learning of Bayesian networks, another popular type of graphical model which is incomparable with Markov Random Fields. McKenna, Sheldon, and Miklau (McKenna et al., 2019) apply graphical models in place of full contingency tables to privately perform inference.

Graphical models can be seen as a natural extension of product distributions, which correspond to the case when the order $t$ of the MRF is 1. There has been significant work on differentially private estimation of product distributions (Blum et al., 2005; Bun et al., 2014; Dwork et al., 2006; Steinke & Ullman, 2017; Kamath et al., 2019; Cai et al., 2019; Bun et al., 2019). Recently, this investigation has been broadened to differentially private distribution estimation more generally, including sample-based estimation of properties and parameters; see, e.g., (Nissim et al., 2007; Smith, 2011; Bun et al., 2015; Diakonikolas et al., 2015; Karwa & Vadhan, 2018; Acharya et al., 2018; Kamath et al., 2019; Bun et al., 2019). For further coverage of differentially private statistics, see (Kamath & Ullman, 2020).

2. Preliminaries

Given a set of points $Z^1, \ldots, Z^n$, we use superscripts, i.e., $Z^i$, to denote the $i$-th datapoint. Given a vector $Z \in \mathbb{R}^p$, we use subscripts, i.e., $Z_i$, to denote its $i$-th coordinate. We also use $Z_{-i}$ to denote the vector obtained by deleting the $i$-th coordinate, i.e., $Z_{-i} = [Z_1, \ldots, Z_{i-1}, Z_{i+1}, \ldots, Z_p]$.

2.1. Markov Random Field Preliminaries

We first introduce the definition of the Ising model, which is the special case of general MRFs with $k = t = 2$.

Definition 2.1. The $p$-variable Ising model is a distribution $\mathcal{D}(A, \theta)$ on $\{-1, 1\}^p$ that satisfies
$$\Pr(Z = z) \propto \exp\Big(\sum_{1 \le i < j \le p} A_{i,j} z_i z_j + \sum_{i \in [p]} \theta_i z_i\Big),$$
where $A \in \mathbb{R}^{p \times p}$ is a symmetric weight matrix with $A_{i,i} = 0$ for all $i \in [p]$, and $\theta \in \mathbb{R}^p$ is a mean-field vector. The dependency graph of $\mathcal{D}(A, \theta)$ is the undirected graph $G = (V, E)$ with vertices $V = [p]$ and edges $E = \{(i, j) : A_{i,j} \ne 0\}$. The width of $\mathcal{D}(A, \theta)$ is defined as
$$\lambda(A, \theta) = \max_{i \in [p]} \Big(\sum_{j \in [p]} |A_{i,j}| + |\theta_i|\Big).$$
Let $\eta(A, \theta)$ be the minimum edge weight in absolute value, i.e., $\eta(A, \theta) = \min_{i,j \in [p]: A_{i,j} \ne 0} |A_{i,j}|$.
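The following minimal sketch (ours, not from the paper) computes the quantities just defined, the dependency graph, the width $\lambda(A, \theta)$, and the minimum edge weight $\eta(A, \theta)$, for a toy Ising model; the parameter values are arbitrary.

```python
# Summary quantities of Definition 2.1 for an Ising model D(A, theta).
import numpy as np

def ising_summary(A, theta):
    """Return (edges, width, eta) for an Ising model with weight matrix A and
    mean-field vector theta."""
    p = len(theta)
    edges = [(i, j) for i in range(p) for j in range(i + 1, p) if A[i, j] != 0]
    # width: max_i ( sum_j |A_ij| + |theta_i| )
    width = np.max(np.sum(np.abs(A), axis=1) + np.abs(theta))
    # eta: minimum non-zero edge weight in absolute value
    eta = min(abs(A[i, j]) for (i, j) in edges) if edges else 0.0
    return edges, width, eta

if __name__ == "__main__":
    p = 4
    A = np.zeros((p, p))
    A[0, 1] = A[1, 0] = 0.8
    A[1, 2] = A[2, 1] = -0.3
    theta = np.array([0.1, 0.0, -0.2, 0.0])
    edges, width, eta = ising_summary(A, theta)
    print("edges:", edges, " width:", round(float(width), 3), " eta:", eta)
```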
We note that the Ising model is supported on $\{-1, 1\}^p$. A natural generalization is to extend its support to $[k]^p$ while maintaining pairwise correlations.

Definition 2.2. The $p$-variable pairwise graphical model is a distribution $\mathcal{D}(W, \Theta)$ on $[k]^p$ that satisfies
$$\Pr(Z = z) \propto \exp\Big(\sum_{1 \le i < j \le p} W_{i,j}(z_i, z_j) + \sum_{i \in [p]} \theta_i(z_i)\Big),$$
where $\mathcal{W} = \{W_{i,j} \in \mathbb{R}^{k \times k} : i \ne j \in [p]\}$ is a set of weight matrices satisfying $W_{i,j} = W_{j,i}^{T}$, and $\Theta = \{\theta_i \in \mathbb{R}^k : i \in [p]\}$ is a set of mean-field vectors. The dependency graph of $\mathcal{D}(W, \Theta)$ is the undirected graph $G = (V, E)$ with vertices $V = [p]$ and edges $E = \{(i, j) : W_{i,j} \ne 0\}$. The width of $\mathcal{D}(W, \Theta)$ is defined as
$$\lambda(W, \Theta) = \max_{i \in [p], a \in [k]} \Big(\sum_{j \in [p] \setminus \{i\}} \max_{b \in [k]} |W_{i,j}(a, b)| + |\theta_i(a)|\Big).$$
Define $\eta(W, \Theta) = \min_{(i,j) \in E} \max_{a,b} |W_{i,j}(a, b)|$.

Both of the above models only consider pairwise interactions between nodes. In order to capture higher-order interactions, we examine the more general model of Markov Random Fields (MRFs). In this paper, we restrict our attention to MRFs over a binary alphabet (i.e., distributions over $\{\pm 1\}^p$). In order to define binary $t$-wise MRFs, we first need the following definitions of multilinear polynomials, partial derivatives, and maximal monomials.

Definition 2.3. A multilinear polynomial is a function $h : \mathbb{R}^p \to \mathbb{R}$ of the form $h(x) = \sum_{I} \bar{h}(I) \prod_{i \in I} x_i$, where $\bar{h}(I)$ denotes the coefficient of the monomial $\prod_{i \in I} x_i$ with respect to the variables $(x_i : i \in I)$. Let $\partial_i h(x) = \sum_{J : i \notin J} \bar{h}(J \cup \{i\}) \prod_{j \in J} x_j$ denote the partial derivative of $h$ with respect to $x_i$. We say $I \subseteq [p]$ is a maximal monomial of $h$ if $\bar{h}(J) = 0$ for all $J \supsetneq I$.

Now we are able to formally define binary $t$-wise MRFs.

Definition 2.4. For a graph $G = (V, E)$ on $p$ vertices, let $C_t(G)$ denote all cliques of size at most $t$ in $G$. A binary $t$-wise Markov random field on $G$ is a distribution $\mathcal{D}$ on $\{-1, 1\}^p$ which satisfies
$$\Pr_{Z \sim \mathcal{D}}(Z = z) \propto \exp\Big(\sum_{I \in C_t(G)} \varphi_I(z)\Big),$$
where each $\varphi_I : \mathbb{R}^p \to \mathbb{R}$ is a multilinear polynomial that depends only on the variables in $I$. We call $G$ the dependency graph of the MRF and $h(x) = \sum_{I \in C_t(G)} \varphi_I(x)$ the factorization polynomial of the MRF. The width of $\mathcal{D}$ is defined as $\lambda = \max_{i \in [p]} \|\partial_i h\|_1$, where $\|h\|_1 := \sum_{I} |\bar{h}(I)|$.

Finally, we define two possible goals for learning graphical models. First, the easier goal is structure learning, which involves recovering the set of non-zero edges.

Definition 2.5. An algorithm learns the structure of a graphical model if, given samples $Z^1, \ldots, Z^n \sim \mathcal{D}$, it outputs a graph $\hat{G} = (V, \hat{E})$ over $V = [p]$ such that $\hat{E} = E$, the set of edges in the dependency graph of $\mathcal{D}$.

The more difficult goal is parameter learning, which requires the algorithm to learn not only the locations of the edges, but also their parameter values.

Definition 2.6. An algorithm learns the parameters of an Ising model (resp. pairwise graphical model) if, given samples $Z^1, \ldots, Z^n \sim \mathcal{D}$, it outputs a matrix $\hat{A}$ (resp. a set of matrices $\hat{\mathcal{W}}$) such that $\max_{i,j \in [p]} |A_{i,j} - \hat{A}_{i,j}| \le \alpha$ (resp. $|W_{i,j}(a, b) - \hat{W}_{i,j}(a, b)| \le \alpha$ for all $i \ne j \in [p]$ and $a, b \in [k]$).

Definition 2.7. An algorithm learns the parameters of a binary $t$-wise MRF with associated polynomial $h$ if, given samples $X^1, \ldots, X^n \sim \mathcal{D}$, it outputs another multilinear polynomial $u$ such that for every maximal monomial $I \subseteq [p]$, $|\bar{h}(I) - \bar{u}(I)| \le \alpha$.
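The objects of Definitions 2.3 and 2.4 are easy to manipulate directly. The following minimal sketch (ours, not from the paper) stores a multilinear polynomial as a map from monomials to coefficients and computes partial derivatives, maximal monomials, and the width; the example polynomial is an arbitrary toy choice.

```python
# Definition 2.3 / 2.4 made concrete: a multilinear polynomial as
# {frozenset of variable indices: coefficient}.

def partial_derivative(h, i):
    """d_i h: keep monomials containing i and drop x_i from them."""
    return {frozenset(I - {i}): c for I, c in h.items() if i in I}

def maximal_monomials(h):
    """Monomials I of h (non-zero coefficient) with no strict superset in the support."""
    support = [I for I, c in h.items() if c != 0]
    return [I for I in support if not any(I < J for J in support)]  # '<' is strict subset

def width(h, p):
    """lambda = max_i || d_i h ||_1, where ||h||_1 = sum_I |h(I)|."""
    return max(sum(abs(c) for c in partial_derivative(h, i).values())
               for i in range(p))

if __name__ == "__main__":
    # h(x) = 0.5 x0 x1 x2 - 0.2 x1 x2 + 0.3 x3  (a toy 3-wise binary MRF polynomial)
    h = {frozenset({0, 1, 2}): 0.5, frozenset({1, 2}): -0.2, frozenset({3}): 0.3}
    print("maximal monomials:", [sorted(I) for I in maximal_monomials(h)])
    print("width:", width(h, p=4))
```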
2.2. Privacy Preliminaries

A dataset $X = (X^1, \ldots, X^n) \in \mathcal{X}^n$ is a collection of points from some universe $\mathcal{X}$. We say that two datasets $X$ and $X'$ are neighboring, denoted $X \sim X'$, if they differ in exactly one point. In our work we consider a few different variants of differential privacy. The first is the standard notion of differential privacy.

Definition 2.8 (Differential Privacy (DP) (Dwork et al., 2006)). A randomized algorithm $A : \mathcal{X}^n \to \mathcal{S}$ satisfies $(\varepsilon, \delta)$-differential privacy ($(\varepsilon, \delta)$-DP) if for every pair of neighboring datasets $X, X' \in \mathcal{X}^n$ and every event $S \subseteq \mathcal{S}$,
$$\Pr(A(X) \in S) \le e^{\varepsilon} \Pr(A(X') \in S) + \delta.$$

The second is concentrated differential privacy (Dwork & Rothblum, 2016). In this work, we specifically consider its refinement, zero-concentrated differential privacy (Bun & Steinke, 2016).

Definition 2.9 (Zero-Concentrated Differential Privacy (zCDP) (Bun & Steinke, 2016)). A randomized algorithm $A : \mathcal{X}^n \to \mathcal{S}$ satisfies $\rho$-zCDP if for every pair of neighboring datasets $X, X' \in \mathcal{X}^n$ and all $\alpha \in (1, \infty)$,
$$D_\alpha\big(A(X) \,\|\, A(X')\big) \le \rho\alpha,$$
where $D_\alpha(A(X) \,\|\, A(X'))$ is the $\alpha$-Rényi divergence between $A(X)$ and $A(X')$.

The following lemma quantifies the relationships between $(\varepsilon, 0)$-DP, $\rho$-zCDP, and $(\varepsilon, \delta)$-DP.

Lemma 2.10 (Relationships Between Variants of DP (Bun & Steinke, 2016)). For every $\varepsilon \ge 0$:
1. If $A$ satisfies $(\varepsilon, 0)$-DP, then $A$ is $\frac{\varepsilon^2}{2}$-zCDP.
2. If $A$ satisfies $\frac{\varepsilon^2}{2}$-zCDP, then $A$ satisfies $\big(\frac{\varepsilon^2}{2} + \varepsilon\sqrt{2\log(1/\delta)}, \delta\big)$-DP for every $\delta > 0$.

Roughly speaking, $(\varepsilon, 0)$-DP is stronger than zCDP, which is stronger than $(\varepsilon, \delta)$-DP with $\delta > 0$. A crucial property of all the variants of differential privacy is that they can be composed adaptively. By adaptive composition, we mean a sequence of algorithms $A_1(X), \ldots, A_T(X)$ where the algorithm $A_t(X)$ may also depend on the outcomes of the algorithms $A_1(X), \ldots, A_{t-1}(X)$.

Lemma 2.11 (Composition of DP (Dwork et al., 2010; Bun & Steinke, 2016)). If $A$ is an adaptive composition of differentially private algorithms $A_1, \ldots, A_T$, then the following two properties hold:
1. If $A_1, \ldots, A_T$ are $(\varepsilon_0, \delta_1), \ldots, (\varepsilon_0, \delta_T)$-DP for some $\varepsilon_0 \le 1$, then for every $\delta_0 > 0$, $A$ is $(\varepsilon, \delta)$-DP for $\varepsilon = \varepsilon_0\sqrt{6T\log(1/\delta_0)}$ and $\delta = \delta_0 + \sum_t \delta_t$.
2. If $A_1, \ldots, A_T$ are $\rho_1, \ldots, \rho_T$-zCDP, then $A$ is $\rho$-zCDP for $\rho = \sum_t \rho_t$.

3. Parameter Learning of Pairwise Graphical Models

3.1. Private Sparse Logistic Regression

As a subroutine of our parameter learning algorithm, we solve the following problem of private sparse logistic regression: given a training dataset $D$ consisting of $n$ examples $d_j = (x_j, y_j)$ drawn from a distribution $\mathcal{P}$, where $x_j \in \mathbb{R}^p$ with $\|x_j\|_\infty \le 1$ and $y_j \in \{\pm 1\}$, and a constraint set $C = \{w \in \mathbb{R}^p : \|w\|_1 \le \lambda\}$, we want to minimize the population logistic loss $\mathbb{E}_{\mathcal{P}}\big[\log(1 + e^{-Y\langle w, X\rangle})\big]$ over $C$ subject to the privacy constraint. To do so, we leverage the private Frank-Wolfe (FW) algorithm of (Talwar et al., 2014), which minimizes the empirical risk $\mathcal{L}(w; D) = \frac{1}{n}\sum_{j=1}^{n}\ell(w; d_j) = \frac{1}{n}\sum_{j=1}^{n}\log(1 + e^{-y_j\langle w, x_j\rangle})$. We show that their algorithm also satisfies zCDP, and we establish its empirical and population loss guarantees for sparse logistic regression. These results are stated in Theorem 1.3 and Theorem 3.1, respectively; we defer the proofs to the supplement.

Algorithm 1 $\mathcal{A}_{PFW}(D, \mathcal{L}, \rho, C)$: Private FW Algorithm
Input: dataset $D = \{d_1, \ldots, d_n\}$; loss function $\mathcal{L}(w; D) = \frac{1}{n}\sum_{j=1}^{n}\log(1 + e^{-y_j\langle w, x_j\rangle})$; convex set $C = \{w \in \mathbb{R}^p : \|w\|_1 \le \lambda\}$, with $S$ denoting its set of vertices; number of iterations $T$; privacy parameter $\rho$.
  Initialize $w_1$ at an arbitrary point in $C$.
  for $t = 1$ to $T - 1$ do
    $\forall s \in S$: $\alpha_s \leftarrow \langle s, \nabla\mathcal{L}(w_t; D)\rangle + \mathrm{Lap}\Big(0, \frac{L_1\|C\|_1\sqrt{T}}{n\sqrt{\rho}}\Big)$.
    $\tilde{w}_t \leftarrow \arg\min_{s \in S} \alpha_s$.
    $w_{t+1} \leftarrow (1 - \mu_t)w_t + \mu_t\tilde{w}_t$, where $\mu_t = \frac{2}{t+2}$.
  end for
Output: $w^{\mathrm{priv}} = w_T$.

Theorem 3.1 (Private sparse logistic regression). Algorithm 1 satisfies $\rho$-zCDP. Given a dataset $D$ drawn i.i.d. from an unknown distribution $\mathcal{P}$, with probability at least $1 - \beta$ over the randomness of the algorithm and $D$,
$$\mathbb{E}_{\mathcal{P}}\big[\ell(w^{\mathrm{priv}}; (X, Y))\big] - \min_{w \in C}\mathbb{E}_{\mathcal{P}}\big[\ell(w; (X, Y))\big] \le O\!\left(\frac{\lambda^{4/3}\log(np/\beta)}{(n\sqrt{\rho})^{2/3}} + \lambda\sqrt{\frac{\log(1/\beta)}{n}}\right).$$
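The sketch below (ours) is a simplified, runnable rendering of Algorithm 1 for the $\ell_1$ ball. The noise scale follows the form $L_1\|C\|_1\sqrt{T}/(n\sqrt{\rho})$ shown in the pseudocode, but the exact constants needed for a formal $\rho$-zCDP guarantee are omitted, so treat it as an illustration of the mechanism (report-noisy-min over the vertices of $C$) rather than a vetted private implementation; the toy data at the bottom is made up.

```python
# Simplified private Frank-Wolfe over the l1 ball for logistic loss.
import numpy as np

def logistic_grad(w, X, y):
    """Gradient of (1/n) * sum_j log(1 + exp(-y_j <w, x_j>))."""
    margins = -y * (X @ w)
    coeff = -y / (1.0 + np.exp(-margins))        # = -y * sigmoid(margins)
    return X.T @ coeff / len(y)

def private_frank_wolfe(X, y, lam, rho, T, rng):
    """Approximately minimize the empirical logistic loss over C = {||w||_1 <= lam},
    adding Laplace noise to the per-vertex scores before the arg-min step."""
    n, p = X.shape
    w = np.zeros(p)                               # arbitrary starting point in C
    L1 = np.max(np.abs(X))                        # bound on feature magnitudes
    noise_scale = L1 * (2 * lam) * np.sqrt(T) / (n * np.sqrt(rho))  # ||C||_1 taken as the l1 diameter
    for t in range(T - 1):
        grad = logistic_grad(w, X, y)
        # Vertices of C are +/- lam * e_j; their scores are +/- lam * grad_j.
        scores = np.concatenate([lam * grad, -lam * grad])
        scores += rng.laplace(scale=noise_scale, size=scores.shape)
        idx = int(np.argmin(scores))
        vertex = np.zeros(p)
        vertex[idx % p] = lam if idx < p else -lam
        mu = 2.0 / (t + 3)                        # mu_t = 2/(t+2), with t starting at 1
        w = (1 - mu) * w + mu * vertex
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 2000, 20
    w_true = np.zeros(p); w_true[:3] = [1.0, -1.0, 0.5]     # sparse ground truth
    X = rng.choice([-1.0, 1.0], size=(n, p))
    y = np.where(rng.random(n) < 1 / (1 + np.exp(-(X @ w_true))), 1.0, -1.0)
    w_hat = private_frank_wolfe(X, y, lam=3.0, rho=0.5, T=50, rng=rng)
    print("recovered (first 5 coords):", np.round(w_hat[:5], 2))
```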
3.2. Privately Learning Ising Models

We first consider the problem of estimating the weight matrix of the Ising model. To be precise, given $n$ i.i.d. samples $\{z^1, \ldots, z^n\}$ generated from an unknown distribution $\mathcal{D}(A, \theta)$, our goal is to design a $\rho$-zCDP estimator $\hat{A}$ such that with probability at least $2/3$, $\max_{i,j \in [p]} |A_{i,j} - \hat{A}_{i,j}| \le \alpha$.

A useful observation about the Ising model is that for any node $Z_i$, the probability that $Z_i = 1$ conditioned on the values of the remaining nodes $Z_{-i}$ follows a sigmoid function. The next lemma, from (Klivans & Meka, 2017), formalizes this observation.

Lemma 3.2. Let $Z \sim \mathcal{D}(A, \theta)$ with $Z \in \{-1, 1\}^p$. Then for all $i \in [p]$ and $x \in \{-1, 1\}^{[p]\setminus\{i\}}$,
$$\Pr(Z_i = 1 \mid Z_{-i} = x) = \sigma\Big(\sum_{j \ne i} 2A_{i,j}x_j + 2\theta_i\Big) = \sigma(\langle w, x'\rangle),$$
where $w = 2[A_{i,1}, \ldots, A_{i,i-1}, A_{i,i+1}, \ldots, A_{i,p}, \theta_i] \in \mathbb{R}^p$ and $x' = [x, 1] \in \mathbb{R}^p$.

By Lemma 3.2, we can estimate the weight matrix by solving a logistic regression for each node, which is the approach used in (Wu et al., 2018) to design non-private estimators. Our algorithm uses the private FW method to solve the per-node logistic regression problem and achieves the following theoretical guarantee.

Algorithm 2 Privately Learning Ising Models
Input: $n$ samples $\{z^1, \ldots, z^n\}$, where $z^m \in \{\pm 1\}^p$ for $m \in [n]$; an upper bound $\lambda$ on $\lambda(A, \theta)$; privacy parameter $\rho$.
  for $i = 1$ to $p$ do
    $\forall m \in [n]$: $x^m \leftarrow [z^m_{-i}, 1]$, $y^m \leftarrow z^m_i$.
    $w^{\mathrm{priv}} \leftarrow \mathcal{A}_{PFW}(D, \mathcal{L}, \rho', C)$, where $\rho' = \frac{\rho}{p}$, $D = \{(x^m, y^m)\}_{m=1}^{n}$, $C = \{w : \|w\|_1 \le 2\lambda\}$, and $\mathcal{L}(w; D) = \frac{1}{n}\sum_{m=1}^{n}\log\big(1 + e^{-y^m\langle w, x^m\rangle}\big)$.
    $\forall j \in [p] \setminus \{i\}$: $\hat{A}_{i,j} \leftarrow \frac{1}{2}w^{\mathrm{priv}}_{\tilde{j}}$, where $\tilde{j} = j$ when $j < i$ and $\tilde{j} = j - 1$ when $j > i$.
  end for
Output: $\hat{A} \in \mathbb{R}^{p \times p}$.

Theorem 3.3. Let $\mathcal{D}(A, \theta)$ be an unknown $p$-variable Ising model with $\lambda(A, \theta) \le \lambda$. There exists an efficient $\rho$-zCDP algorithm which outputs a weight matrix $\hat{A} \in \mathbb{R}^{p \times p}$ such that, with probability greater than $2/3$, $\max_{i,j \in [p]} |A_{i,j} - \hat{A}_{i,j}| \le \alpha$, provided the number of i.i.d. samples satisfies
$$n = O\!\left(\frac{\lambda^2\log(p)e^{12\lambda}}{\alpha^4} + \frac{\sqrt{p}\,\lambda^2\log^2(p)e^{9\lambda}}{\sqrt{\rho}\,\alpha^3}\right).$$

Proof. We first prove that Algorithm 2 satisfies $\rho$-zCDP. In each iteration, the algorithm solves a private sparse logistic regression under $\frac{\rho}{p}$-zCDP; therefore, Algorithm 2 satisfies $\rho$-zCDP by composition (Lemma 2.11).

For the accuracy analysis, we start by looking at the first iteration ($i = 1$) and showing that $|A_{1,j} - \hat{A}_{1,j}| \le \alpha$ for all $j \in [p]$ with probability greater than $1 - \frac{1}{10p}$. Given a random sample $Z \sim \mathcal{D}(A, \theta)$, let $X = [Z_{-1}, 1]$ and $Y = Z_1$. From Lemma 3.2, $\Pr(Y = 1 \mid X = x) = \sigma(\langle w^*, x\rangle)$, where $w^* = 2[A_{1,2}, \ldots, A_{1,p}, \theta_1]$. We also note that $\|w^*\|_1 \le 2\lambda$, as a consequence of the width constraint of the Ising model. Given $n$ i.i.d. samples $\{z^m\}_{m=1}^{n}$ drawn from the Ising model, let $x^m = [z^m_{-1}, 1]$ and $y^m = z^m_1$; it is easy to check that each $(x^m, y^m)$ is a realization of $(X, Y)$. Let $w^{\mathrm{priv}}$ be the output of $\mathcal{A}_{PFW}\big(D, \mathcal{L}, \frac{\rho}{p}, \{w : \|w\|_1 \le 2\lambda\}\big)$, where $D = \{(x^m, y^m)\}_{m=1}^{n}$. By Theorem 3.1, when $n = O\big(\frac{\sqrt{p}\,\lambda^2\log^2(p)}{\sqrt{\rho}\,\gamma^{3/2}} + \frac{\lambda^2\log(p)}{\gamma^2}\big)$, with probability greater than $1 - \frac{1}{10p}$,
$$\mathbb{E}_{Z\sim\mathcal{D}(A,\theta)}\big[\mathcal{L}(w^{\mathrm{priv}}; (X, Y))\big] - \mathbb{E}_{Z\sim\mathcal{D}(A,\theta)}\big[\mathcal{L}(w^*; (X, Y))\big] \le \gamma.$$

We will use the following lemma from (Wu et al., 2018). Roughly speaking, under the assumption that the samples are generated from an Ising model, any estimator $w^{\mathrm{priv}}$ which achieves a small error in the loss $\mathcal{L}$ guarantees an accurate parameter recovery in $\ell_\infty$ distance.

Lemma 3.4. Let $X, Y$ be defined as above, suppose the joint distribution of $(X, Y)$ is $\mathcal{P}$, and suppose $\Pr(Y = 1 \mid X = x) = \sigma(\langle u_1, x\rangle + \theta_1)$ for some $u_1 \in \mathbb{R}^{p-1}$ and $\theta_1 \in \mathbb{R}$. If
$$\Big|\,\mathbb{E}_{(X,Y)\sim\mathcal{P}}\big[\log\big(1 + e^{-Y(\langle u_1, X\rangle + \theta_1)}\big)\big] - \mathbb{E}_{(X,Y)\sim\mathcal{P}}\big[\log\big(1 + e^{-Y(\langle u_2, X\rangle + \theta_2)}\big)\big]\Big| \le \gamma$$
for some $u_2 \in \mathbb{R}^{p-1}$, $\theta_2 \in \mathbb{R}$, and $\gamma \le \frac{1}{2}e^{-4\lambda-6}$, then $\|u_1 - u_2\|_\infty = O(e^{2\lambda}\sqrt{\gamma})$.

By Lemma 3.4, if $\mathbb{E}_{Z\sim\mathcal{D}(A,\theta)}[\mathcal{L}(w^{\mathrm{priv}}; (X, Y))] - \mathbb{E}_{Z\sim\mathcal{D}(A,\theta)}[\mathcal{L}(w^*; (X, Y))] \le O(\alpha^2 e^{-6\lambda})$, we have $\|w^{\mathrm{priv}} - w^*\|_\infty \le \alpha$. Setting $\gamma = \alpha^2 e^{-6\lambda}$, we conclude that $|A_{1,j} - \hat{A}_{1,j}| \le \alpha$ for all $j$ with probability greater than $1 - \frac{1}{10p}$. A similar argument works for the other iterations, and non-overlapping parts of the matrix are recovered in different iterations. By a union bound over the $p$ iterations, with probability at least $2/3$, $\max_{i,j \in [p]} |A_{i,j} - \hat{A}_{i,j}| \le \alpha$.
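The reduction above rests entirely on Lemma 3.2. The following self-contained sketch (ours, not part of the paper) checks the lemma numerically on a toy Ising model by brute-force enumeration, which is feasible only for very small $p$; all parameter values are arbitrary.

```python
# Numerical check of Lemma 3.2: Pr(Z_i = 1 | Z_{-i} = x) equals the sigmoid of a
# linear form in x, which is what Algorithm 2's per-node logistic regression uses.
import itertools
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

p = 4
rng = np.random.default_rng(1)
A = rng.normal(scale=0.3, size=(p, p))
A = np.triu(A, 1); A = A + A.T                     # symmetric, zero diagonal
theta = rng.normal(scale=0.2, size=p)

# Exact joint probabilities by enumeration.
configs = np.array(list(itertools.product([-1, 1], repeat=p)))
log_w = 0.5 * np.einsum("ci,ij,cj->c", configs, A, configs) + configs @ theta
probs = np.exp(log_w); probs /= probs.sum()

i = 0
x = np.array([1, -1, 1])                           # a fixed assignment to Z_{-i}
mask = np.all(np.delete(configs, i, axis=1) == x, axis=1)
cond = probs[mask & (configs[:, i] == 1)].sum() / probs[mask].sum()
pred = sigmoid(2 * np.delete(A[i], i) @ x + 2 * theta[i])
print(f"enumeration: {cond:.6f}   Lemma 3.2 sigmoid: {pred:.6f}")
assert np.isclose(cond, pred)
```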
3.3. Privately Learning Pairwise Graphical Models over a General Alphabet

Next, we study parameter learning for pairwise graphical models over a general alphabet. Given $n$ i.i.d. samples $\{z^1, \ldots, z^n\}$ drawn from an unknown distribution $\mathcal{D}(W, \Theta)$, we want to design a $\rho$-zCDP estimator $\hat{\mathcal{W}}$ such that with probability at least $2/3$, $|W_{i,j}(u, v) - \hat{W}_{i,j}(u, v)| \le \alpha$ for all $i \ne j \in [p]$ and $u, v \in [k]$. To facilitate our presentation, we assume that for all $i \ne j \in [p]$, every row (and column) vector of $W_{i,j}$ has zero mean.²

²The assumption that $W_{i,j}$ is centered is without loss of generality and is widely used in the literature (Klivans & Meka, 2017; Wu et al., 2018). We present the argument here for completeness. Suppose the $a$-th row of $W_{i,j}$ is not centered, i.e., $\sum_b W_{i,j}(a, b) \ne 0$. We can define $W'_{i,j}(a, b) = W_{i,j}(a, b) - \frac{1}{k}\sum_b W_{i,j}(a, b)$ and $\theta'_i(a) = \theta_i(a) + \frac{1}{k}\sum_b W_{i,j}(a, b)$, and the probability distribution remains unchanged.

Analogous to Lemma 3.2 for the Ising model, a pairwise graphical model has the following property, which can be utilized to recover its parameters. The proof is similar and we omit it for simplicity.

Lemma 3.5. Let $Z \sim \mathcal{D}(W, \Theta)$ with $Z \in [k]^p$. For any $i \in [p]$, any $u \ne v \in [k]$, and any $x \in [k]^{p-1}$,
$$\Pr(Z_i = u \mid Z_i \in \{u, v\},\, Z_{-i} = x) = \sigma\Big(\sum_{j \ne i}\big(W_{i,j}(u, x_j) - W_{i,j}(v, x_j)\big) + \theta_i(u) - \theta_i(v)\Big).$$

Now we introduce our algorithm. Without loss of generality, we consider estimating $W_{1,j}$ for all $j$ as a running example. Fix a pair of values $(u, v)$ with $u \ne v \in [k]$, and let $S_{u,v}$ be the set of samples in which $Z_1 \in \{u, v\}$. In order to utilize Lemma 3.5, we apply the following transformation to the samples in $S_{u,v}$: for the $m$-th sample $z^m$, let $y^m = 1$ if $z^m_1 = u$ and $y^m = -1$ otherwise, and let $x^m$ be the one-hot encoding of the vector $[z^m_{-1}, 1]$, where $\mathrm{OneHotEncode}(\cdot)$ maps $s \in [k]^p$ to a matrix in $\mathbb{R}^{p \times k}$ whose $i$-th row is the $t$-th standard basis vector when $s_i = t$. Then we define $w^* \in \mathbb{R}^{p \times k}$ as follows:
$$w^*(j, \cdot) = W_{1,j+1}(u, \cdot) - W_{1,j+1}(v, \cdot), \quad j \in [p-1]; \qquad w^*(p, \cdot) = [\theta_1(u) - \theta_1(v), 0, \ldots, 0].$$
Lemma 3.5 implies that $\Pr(Y = 1 \mid X = x) = \sigma(\langle w^*, x\rangle)$, where $\langle\cdot, \cdot\rangle$ denotes the entrywise inner product of matrices. By the definition of the width of $\mathcal{D}(W, \Theta)$, $\|w^*\|_1 \le 2\lambda k$.

Now we can apply the private sparse logistic regression of Algorithm 1 to the samples in $S_{u,v}$. Suppose $w^{\mathrm{priv}}_{u,v}$ is the output of the private Frank-Wolfe algorithm; we define $U_{u,v} \in \mathbb{R}^{p \times k}$ as follows: for all $b \in [k]$,
$$U_{u,v}(j, b) = w^{\mathrm{priv}}_{u,v}(j, b) - \frac{1}{k}\sum_{a \in [k]} w^{\mathrm{priv}}_{u,v}(j, a), \quad j \in [p-1]; \qquad U_{u,v}(p, b) = w^{\mathrm{priv}}_{u,v}(p, b) + \sum_{j \in [p-1]}\frac{1}{k}\sum_{a \in [k]} w^{\mathrm{priv}}_{u,v}(j, a). \tag{1}$$
$U_{u,v}$ can be seen as a centered version of $w^{\mathrm{priv}}_{u,v}$ (for the first $p - 1$ rows). It is not hard to see that $\langle U_{u,v}, x\rangle = \langle w^{\mathrm{priv}}_{u,v}, x\rangle$ for any one-hot encoded $x$, so $U_{u,v}$ is also a minimizer of the sparse logistic regression.

Algorithm 3 Privately Learning Pairwise Graphical Models
Input: alphabet size $k$; $n$ i.i.d. samples $\{z^1, \ldots, z^n\}$, where $z^m \in [k]^p$ for $m \in [n]$; an upper bound $\lambda$ on $\lambda(W, \Theta)$; privacy parameter $\rho$.
  for $i = 1$ to $p$ do
    for each pair $u \ne v \in [k]$ do
      $S_{u,v} \leftarrow \{z^m, m \in [n] : z^m_i \in \{u, v\}\}$.
      $\forall z^m \in S_{u,v}$: $x^m \leftarrow \mathrm{OneHotEncode}([z^m_{-i}, 1])$; $y^m \leftarrow 1$ if $z^m_i = u$, and $y^m \leftarrow -1$ if $z^m_i = v$.
      $w^{\mathrm{priv}}_{u,v} \leftarrow \mathcal{A}_{PFW}(D, \mathcal{L}, \rho', C)$, where $\rho' = \frac{\rho}{k^2 p}$, $D = \{(x^m, y^m) : z^m \in S_{u,v}\}$, $\mathcal{L}(w; D) = \frac{1}{|S_{u,v}|}\sum_{m=1}^{|S_{u,v}|}\log\big(1 + e^{-y^m\langle w, x^m\rangle}\big)$, and $C = \{w : \|w\|_1 \le 2\lambda k\}$.
      Define $U_{u,v} \in \mathbb{R}^{p \times k}$ by centering the first $p - 1$ rows of $w^{\mathrm{priv}}_{u,v}$, as in Equation (1).
    end for
    for $j \in [p] \setminus \{i\}$ and $u \in [k]$ do
      $\hat{W}_{i,j}(u, :) \leftarrow \frac{1}{k}\sum_{v \in [k]} U_{u,v}(\tilde{j}, :)$, where $\tilde{j} = j$ when $j < i$ and $\tilde{j} = j - 1$ when $j > i$.
    end for
  end for
Output: $\hat{W}_{i,j} \in \mathbb{R}^{k \times k}$ for all $i \ne j \in [p]$.
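The following self-contained sketch (ours, not from the paper) illustrates the one-hot encoding and verifies the invariance claimed after Equation (1): for any one-hot encoded input $x$, the centered matrix has the same inner product with $x$ as the original, so centering does not change the logistic-regression objective. The dimensions and random values are arbitrary.

```python
# Check that the centering step of Equation (1) preserves <w, x> for one-hot x.
import numpy as np

def one_hot_encode(s, k):
    """Map s in [k]^p (0-indexed here) to a p x k matrix whose i-th row is e_{s_i}."""
    x = np.zeros((len(s), k))
    x[np.arange(len(s)), s] = 1.0
    return x

def center(w):
    """Equation (1): subtract row means from the first p-1 rows and push the
    total subtracted mass into the last (mean-field) row."""
    u = w.copy()
    row_means = w[:-1].mean(axis=1)
    u[:-1] -= row_means[:, None]
    u[-1] += row_means.sum()
    return u

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, k = 5, 3
    w = rng.normal(size=(p, k))                    # stand-in for w^priv_{u,v}
    u = center(w)
    assert np.allclose(u[:-1].sum(axis=1), 0)      # first p-1 rows are centered
    for _ in range(5):
        s = rng.integers(k, size=p)
        x = one_hot_encode(s, k)
        assert np.isclose((u * x).sum(), (w * x).sum())
    print("centering preserves <w, x> for one-hot x")
```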
For now, suppose that for all $j \in [p-1]$ and $b \in [k]$, $U_{u,v}(j, b)$ is a good approximation of $W_{1,j+1}(u, b) - W_{1,j+1}(v, b)$; this is proved in the supplement. Summing over $v \in [k]$, it can be shown that $\frac{1}{k}\sum_{v \in [k]} U_{u,v}(j, b)$ is also a good approximation of $W_{1,j+1}(u, b)$, for all $j \in [p-1]$ and $u, b \in [k]$, because of the centering assumption on $\mathcal{W}$, i.e., $\sum_{v \in [k]} W_{1,j+1}(v, b) = 0$ for all $j \in [p-1]$ and $b \in [k]$.

The following theorem is the main result of this section; its proof is structurally similar to that of Theorem 3.3 and we leave it to the supplement.

Theorem 3.6. Let $\mathcal{D}(W, \Theta)$ be an unknown $p$-variable pairwise graphical model with width $\lambda(W, \Theta) \le \lambda$. There exists an efficient $\rho$-zCDP algorithm which outputs $\hat{\mathcal{W}}$ such that, with probability greater than $2/3$, $|W_{i,j}(u, v) - \hat{W}_{i,j}(u, v)| \le \alpha$ for all $i \ne j \in [p]$ and $u, v \in [k]$, provided the number of i.i.d. samples satisfies
$$n = O\!\left(\frac{\lambda^2 k^5 \log(pk)e^{O(\lambda)}}{\alpha^4} + \frac{\sqrt{p}\,\lambda^2 k^{5.5}\log^2(pk)e^{O(\lambda)}}{\sqrt{\rho}\,\alpha^3}\right).$$

4. Lower Bounds for Parameter Learning

The lower bound for parameter estimation is based on mean estimation in $\ell_\infty$ distance. For details on the construction, refer to Section 1.1; the proof appears in the supplement.

Theorem 4.1. Suppose $\mathcal{A}$ is an $(\varepsilon, \delta)$-differentially private algorithm that takes $n$ i.i.d. samples $Z^1, \ldots, Z^n$ drawn from any unknown $p$-variable Ising model $\mathcal{D}(A, \theta)$ and outputs $\hat{A}$ such that $\mathbb{E}\big[\max_{i,j \in [p]} |A_{i,j} - \hat{A}_{i,j}|\big] \le \alpha \le 1/50$. Then $n = \Omega(\sqrt{p})$.

5. Structure Learning of Graphical Models

In this section, we give an $(\varepsilon, \delta)$-differentially private algorithm for learning the structure of a Markov Random Field. The dependence on the dimension $p$ is only logarithmic, in contrast to the complexity of privately learning the parameters. The following lemma is immediate from stability-based mode arguments (see, e.g., Proposition 3.4 of (Vadhan, 2017)).

Lemma 5.1. Suppose there exists a (non-private) algorithm which takes $X = (X^1, \ldots, X^n)$ sampled i.i.d. from some distribution $\mathcal{D}$ and outputs some fixed value $Y$ (which may depend on $\mathcal{D}$) with probability at least $2/3$. Then there exists an $(\varepsilon, \delta)$-differentially private algorithm which takes $O\big(\frac{n\log(1/\delta)}{\varepsilon}\big)$ samples and outputs $Y$ with probability at least $1 - \delta$.

We can now directly import the following theorem from (Wu et al., 2018).

Theorem 5.2 ((Wu et al., 2018)). There exists an algorithm which, with probability at least $2/3$, learns the structure of a pairwise graphical model. It requires $n = O\big(\frac{\lambda^2 k^4 e^{14\lambda}\log(pk)}{\eta^4}\big)$ samples.

This gives us the following private learning result as a corollary. Similar results for binary MRFs appear in the supplement.

Corollary 5.3. There exists an $(\varepsilon, \delta)$-differentially private algorithm which, with probability at least $2/3$, learns the structure of a pairwise graphical model. It requires $n = O\big(\frac{\lambda^2 k^4 e^{14\lambda}\log(pk)\log(1/\delta)}{\varepsilon\eta^4}\big)$ samples.

6. Lower Bounds for Structure Learning

In this section, we state our structure learning lower bounds under pure DP and zCDP, for learning either Ising models or pairwise graphical models. We show that under both $\varepsilon$-DP and $\rho$-zCDP, a polynomial dependence on the dimension is unavoidable. Due to the page limit, we defer the proofs to the supplement.

Theorem 6.1. Any $(\varepsilon, 0)$-DP algorithm which learns the structure of an Ising model with minimum edge weight $\eta$ requires $n = \Omega\big(\frac{p}{\varepsilon}\big)$ samples. Furthermore, $n = \Omega\big(\sqrt{\frac{p}{\rho}}\big)$ samples are required for the same task under $\rho$-zCDP.

Theorem 6.2. Any $(\varepsilon, 0)$-DP algorithm which learns the structure of a $p$-variable pairwise graphical model with minimum edge weight $\eta$ requires $n = \Omega\big(\frac{p}{\varepsilon}\big)$ samples. Furthermore, $n = \Omega\big(\sqrt{\frac{p}{\rho}}\big)$ samples are required for the same task under $\rho$-zCDP.
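Returning to the upper bound of Section 5, the reduction of Lemma 5.1 and Corollary 5.3 can be made concrete with the minimal sketch below (ours, and not the paper's exact construction): one standard instantiation splits the dataset into groups, runs a non-private structure learner on each group, and releases the modal graph only if a noisy count of its multiplicity clears a threshold. Here `nonprivate_structure_learner` is a hypothetical stand-in for the algorithm of Theorem 5.2, and the threshold is illustrative rather than calibrated to a formal $(\varepsilon, \delta)$ guarantee.

```python
# Stability-based histogram over candidate graphs: one standard way to realize Lemma 5.1.
from collections import Counter
import math
import random

def private_structure_learning(samples, eps, delta, nonprivate_structure_learner,
                               group_size, rng=random):
    """Split samples into groups, learn a graph per group, and release the modal
    graph only if its (noisy) vote count is large enough to be insensitive to
    any single sample."""
    groups = [samples[i:i + group_size]
              for i in range(0, len(samples) - group_size + 1, group_size)]
    votes = Counter(frozenset(nonprivate_structure_learner(g)) for g in groups)
    graph, count = votes.most_common(1)[0]
    # Changing one sample moves at most one vote, so the count has sensitivity 1.
    noisy_count = count + rng.expovariate(eps) - rng.expovariate(eps)   # Laplace(1/eps)
    threshold = 1 + (2.0 / eps) * math.log(1.0 / delta)                 # illustrative choice
    return sorted(graph) if noisy_count >= threshold else None

if __name__ == "__main__":
    # Toy usage with a fake learner that always returns the same edge set.
    fake_learner = lambda g: [(0, 1), (2, 3)]
    data = list(range(1000))
    print(private_structure_learning(data, eps=1.0, delta=0.01,
                                     nonprivate_structure_learner=fake_learner,
                                     group_size=50))
```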
Acknowledgments The authors would like to thank Kunal Talwar for suggesting the study of this problem, and Adam Klivans, Frederic Koehler, Ankur Moitra, and Shanshan Wu for helpful and inspiring conversations. GK would like to thank Chengdu Style Restaurant (古月飘香) in Berkeley for inspiration in the conception of this project. Huanyu Zhang is supported by NSF #1815893 and by NSF #1704443. This work was partially done while the author was an intern at Microsoft Research Redmond. Gautam Kamath is supported by a University of Waterloo startup grant. Part of this work was done while supported as a Microsoft Research Fellow, as part of the Simons-Berkeley Research Fellowship program, and while visiting Microsoft Research Redmond. Zhiwei Steven Wu is supported in part by the NSF FAI Award #1939606, a Google Faculty Research Award, a J.P. Morgan Faculty Award, a Facebook Research Award, and a Mozilla Research Grant. Abbeel, P., Koller, D., and Ng, A. Y. Learning factor graphs in polynomial time and sample complexity. Journal of Machine Learning Research, 7(Aug):1743 1788, 2006. Acharya, J., Kamath, G., Sun, Z., and Zhang, H. Inspectre: Privately estimating the unseen. In Proceedings of the 35th International Conference on Machine Learning, ICML 18, pp. 30 39. JMLR, Inc., 2018. Acharya, J., Sun, Z., and Zhang, H. Differentially private assouad, fano, and le cam. ar Xiv preprint ar Xiv:2004.06830, 2020. Beimel, A., Brenner, H., Kasiviswanathan, S. P., and Nissim, K. Bounds on the sample complexity for private learning and private data release. Machine Learning, 94(3):401 437, 2014. Bernstein, G., Mc Kenna, R., Sun, T., Sheldon, D., Hay, M., and Miklau, G. Differentially private learning of undirected graphical models using collective graphical models. In Proceedings of the 34th International Conference on Machine Learning, ICML 17, pp. 478 487. JMLR, Inc., 2017. Privately Learning Markov Random Fields Bezakova, I., Blanca, A., Chen, Z., ˇStefankoviˇc, D., and Vigoda, E. Lower bounds for testing graphical models: Colorings and antiferromagnetic Ising models. In Proceedings of the 32nd Annual Conference on Learning Theory, COLT 19, pp. 283 298, 2019. Bhattacharya, B. B. A general asymptotic framework for distribution-free graph-based two-sample tests. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 81(3):575 602, 2019. Bhattacharya, B. B. and Mukherjee, S. Inference in Ising models. Bernoulli, 2016. Blum, A., Dwork, C., Mc Sherry, F., and Nissim, K. Practical privacy: The Su LQ framework. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 05, pp. 128 138, New York, NY, USA, 2005. ACM. Bresler, G. Efficiently learning Ising models on arbitrary graphs. In Proceedings of the 47th Annual ACM Symposium on the Theory of Computing, STOC 15, pp. 771 782, New York, NY, USA, 2015. ACM. Bresler, G., Gamarnik, D., and Shah, D. Structure learning of antiferromagnetic Ising models. In Advances in Neural Information Processing Systems 27, NIPS 14, pp. 2852 2860. Curran Associates, Inc., 2014. Bun, M. and Steinke, T. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Proceedings of the 14th Conference on Theory of Cryptography, TCC 16-B, pp. 635 658, Berlin, Heidelberg, 2016. Springer. Bun, M., Ullman, J., and Vadhan, S. Fingerprinting codes and the price of approximate differential privacy. In Proceedings of the 46th Annual ACM Symposium on the Theory of Computing, STOC 14, pp. 
1 10, New York, NY, USA, 2014. ACM. Bun, M., Nissim, K., Stemmer, U., and Vadhan, S. Differentially private release and learning of threshold functions. In Proceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science, FOCS 15, pp. 634 649, Washington, DC, USA, 2015. IEEE Computer Society. Bun, M., Kamath, G., Steinke, T., and Wu, Z. S. Private hypothesis selection. In Advances in Neural Information Processing Systems 32, Neur IPS 19. Curran Associates, Inc., 2019. Cai, T. T., Wang, Y., and Zhang, L. The cost of privacy: Optimal rates of convergence for parameter estimation with differential privacy. ar Xiv preprint ar Xiv:1902.04495, 2019. Chatterjee, S. Concentration Inequalities with Exchangeable Pairs. Ph D thesis, Stanford University, June 2005. Chow, C. and Liu, C. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462 467, 1968. Chowdhury, A. R., Rekatsinas, T., and Jha, S. Datadependent differentially private parameter learning for directed graphical models. ar Xiv preprint ar Xiv:1905.12813, 2019. Csisz ar, I. and Talata, Z. Consistent estimation of the basic neighborhood of Markov random fields. The Annals of Statistics, 34(1):123 145, 2006. Dajani, A. N., Lauger, A. D., Singer, P. E., Kifer, D., Reiter, J. P., Machanavajjhala, A., Garfinkel, S. L., Dahl, S. A., Graham, M., Karwa, V., Kim, H., Lelerc, P., Schmutte, I. M., Sexton, W. N., Vilhuber, L., and Abowd, J. M. The modernization of statistical disclosure limitation at the U.S. census bureau, 2017. Presented at the September 2017 meeting of the Census Scientific Advisory Committee. Daskalakis, C., Mossel, E., and Roch, S. Evolutionary trees and the Ising model on the Bethe lattice: A proof of Steel s conjecture. Probability Theory and Related Fields, 149(1):149 189, 2011. Daskalakis, C., Dikkala, N., and Kamath, G. Concentration of multilinear functions of the Ising model with applications to network data. In Advances in Neural Information Processing Systems 30, NIPS 17. Curran Associates, Inc., 2017. Daskalakis, C., Dikkala, N., and Kamath, G. Testing Ising models. In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 18, pp. 1989 2007, Philadelphia, PA, USA, 2018. SIAM. Daskalakis, C., Dikkala, N., and Kamath, G. Testing Ising models. IEEE Transactions on Information Theory, 65 (11):6829 6852, 2019. Devroye, L., Mehrabian, A., and Reddad, T. The minimax learning rate of normal and Ising undirected graphical models. ar Xiv preprint ar Xiv:1806.06887, 2018. Diakonikolas, I., Hardt, M., and Schmidt, L. Differentially private learning of structured discrete distributions. In Advances in Neural Information Processing Systems 28, NIPS 15, pp. 2566 2574. Curran Associates, Inc., 2015. Differential Privacy Team, Apple. Learning with privacy at scale. https:// machinelearning.apple.com/docs/ learning-with-privacy-at-scale/ Privately Learning Markov Random Fields appledifferentialprivacysystem.pdf, December 2017. Ding, B., Kulkarni, J., and Yekhanin, S. Collecting telemetry data privately. In Advances in Neural Information Processing Systems 30, NIPS 17, pp. 3571 3580. Curran Associates, Inc., 2017. Dwork, C. and Lei, J. Differential privacy and robust statistics. In Proceedings of the 41st Annual ACM Symposium on the Theory of Computing, STOC 09, pp. 371 380, New York, NY, USA, 2009. ACM. Dwork, C. and Rothblum, G. N. Concentrated differential privacy. ar Xiv preprint ar Xiv:1603.01887, 2016. 
Dwork, C., Mc Sherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Conference on Theory of Cryptography, TCC 06, pp. 265 284, Berlin, Heidelberg, 2006. Springer. Dwork, C., Rothblum, G. N., and Vadhan, S. Boosting and differential privacy. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science, FOCS 10, pp. 51 60, Washington, DC, USA, 2010. IEEE Computer Society. Dwork, C., Smith, A., Steinke, T., Ullman, J., and Vadhan, S. Robust traceability from trace amounts. In Proceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science, FOCS 15, pp. 650 669, Washington, DC, USA, 2015. IEEE Computer Society. Ellison, G. Learning, local interaction, and coordination. Econometrica, 61(5):1047 1071, 1993. Erlingsson, U., Pihur, V., and Korolova, A. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM Conference on Computer and Communications Security, CCS 14, pp. 1054 1067, New York, NY, USA, 2014. ACM. Felsenstein, J. Inferring Phylogenies. Sinauer Associates Sunderland, 2004. Friedman, N., Linial, M., Nachman, I., and Pe er, D. Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7(3-4):601 620, 2000. Geman, S. and Graffigne, C. Markov random field image models and their applications to computer vision. In Proceedings of the International Congress of Mathematicians, pp. 1496 1517. American Mathematical Society, 1986. Gheissari, R., Lubetzky, E., and Peres, Y. Concentration inequalities for polynomials of contracting Ising models. Electronic Communications in Probability, 23(76):1 12, 2018. Hamilton, L., Koehler, F., and Moitra, A. Information theoretic properties of Markov random fields, and their algorithmic applications. In Advances in Neural Information Processing Systems 30, NIPS 17. Curran Associates, Inc., 2017. Hardt, M. and Rothblum, G. N. A multiplicative weights mechanism for privacy-preserving data analysis. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science, FOCS 10, pp. 61 70, Washington, DC, USA, 2010. IEEE Computer Society. Hardt, M. and Talwar, K. On the geometry of differential privacy. In Proceedings of the 42nd Annual ACM Symposium on the Theory of Computing, STOC 10, pp. 705 714, New York, NY, USA, 2010. ACM. Ising, E. Beitrag zur theorie des ferromagnetismus. Zeitschrift f ur Physik A Hadrons and Nuclei, 31(1):253 258, 1925. Jalali, A., Johnson, C. C., and Ravikumar, P. K. On learning discrete graphical models using greedy methods. In Advances in Neural Information Processing Systems 24, NIPS 11, pp. 1935 1943. Curran Associates, Inc., 2011a. Jalali, A., Ravikumar, P. K., Vasuki, V., and Sanghavi, S. On learning discrete graphical models using group-sparse regularization. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, AISTATS 11, pp. 378 387. JMLR, Inc., 2011b. Kamath, G. and Ullman, J. A primer on private statistics. ar Xiv preprint ar Xiv:2005.00010, 2020. Kamath, G., Li, J., Singhal, V., and Ullman, J. Privately learning high-dimensional distributions. In Proceedings of the 32nd Annual Conference on Learning Theory, COLT 19, pp. 1853 1902, 2019. Karwa, V. and Vadhan, S. Finite sample differentially private confidence intervals. In Proceedings of the 9th Conference on Innovations in Theoretical Computer Science, ITCS 18, pp. 44:1 44:9, Dagstuhl, Germany, 2018. 
Schloss Dagstuhl Leibniz-Zentrum fuer Informatik. Klivans, A. and Meka, R. Learning graphical models using multiplicative weights. In Proceedings of the 58th Annual IEEE Symposium on Foundations of Computer Science, FOCS 17, pp. 343 354, Washington, DC, USA, 2017. IEEE Computer Society. Korolova, A., Kenthapadi, K., Mishra, N., and Ntoulas, A. Releasing search queries and clicks privately. In Proceedings of the 18th International World Wide Web Privately Learning Markov Random Fields Conference, WWW 09, pp. 171 180, New York, NY, USA, 2009. ACM. Lagor, C., Aronsky, D., Fiszman, M., and Haug, P. J. Automatic identification of patients eligible for a pneumonia guideline: comparing the diagnostic accuracy of two decision support models. Studies in Health Technology and Informatics, 84(1):493 497, 2001. Levin, D. A., Peres, Y., and Wilmer, E. L. Markov Chains and Mixing Times. American Mathematical Society, 2009. Lokhov, A. Y., Vuffray, M., Misra, S., and Chertkov, M. Optimal structure and parameter learning of Ising models. Science Advances, 4(3):e1700791, 2018. Mart ın del Campo, A., Cepeda, S., and Uhler, C. Exact goodness-of-fit testing for the Ising model. Scandinavian Journal of Statistics, 2016. Mc Kenna, R., Sheldon, D., and Miklau, G. Graphical-model based estimation and inference for differential privacy. ar Xiv preprint ar Xiv:1901.09136, 2019. Montanari, A. and Saberi, A. The spread of innovations in social networks. Proceedings of the National Academy of Sciences, 107(47):20196 20201, 2010. Mukherjee, R., Mukherjee, S., and Yuan, M. Global testing against sparse alternatives under Ising models. The Annals of Statistics, 46(5):2062 2093, 2018. Nissim, K., Raskhodnikova, S., and Smith, A. Smooth sensitivity and sampling in private data analysis. In Proceedings of the 39th Annual ACM Symposium on the Theory of Computing, STOC 07, pp. 75 84, New York, NY, USA, 2007. ACM. Ravikumar, P., Wainwright, M. J., and Lafferty, J. D. Highdimensional Ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics, 38(3):1287 1319, 2010. Rigollet, P. and H utter, J.-C. High dimensional statistics. http://www-math.mit.edu/ rigollet/ PDFs/Rig Notes17.pdf, 2017. Lecture notes. Santhanam, N. P. and Wainwright, M. J. Informationtheoretic limits of selecting binary graphical models in high dimensions. IEEE Transactions on Information Theory, 58(7):4117 4134, 2012. Smith, A. Privacy-preserving statistical estimation with optimal convergence rates. In Proceedings of the 43rd Annual ACM Symposium on the Theory of Computing, STOC 11, pp. 813 822, New York, NY, USA, 2011. ACM. Steinke, T. and Ullman, J. Between pure and approximate differential privacy. The Journal of Privacy and Confidentiality, 7(2):3 22, 2017. Talwar, K., Thakurta, A., and Zhang, L. Private empirical risk minimization beyond the worst case: The effect of the constraint set geometry. ar Xiv preprint ar Xiv:1411.5417, 2014. Talwar, K., Thakurta, A., and Zhang, L. Nearly-optimal private LASSO. In Advances in Neural Information Processing Systems 28, NIPS 15, pp. 3025 3033. Curran Associates, Inc., 2015. Vadhan, S. The complexity of differential privacy. In Lindell, Y. (ed.), Tutorials on the Foundations of Cryptography: Dedicated to Oded Goldreich, chapter 7, pp. 347 450. Springer International Publishing AG, Cham, Switzerland, 2017. Vietri, G., Tian, G., Bun, M., Steinke, T., and Wu, Z. S. New oracle efficient algorithms for private synthetic data release. Neur IPS Pri ML workshop, 2019. 
Vuffray, M., Misra, S., Lokhov, A., and Chertkov, M. Interaction screening: Efficient and sample-optimal learning of Ising models. In Advances in Neural Information Processing Systems 29, NIPS 16, pp. 2595 2603. Curran Associates, Inc., 2016. Wu, S., Sanghavi, S., and Dimakis, A. G. Sparse logistic regression learns all discrete pairwise graphical models. ar Xiv preprint ar Xiv:1810.11905, 2018.