# Differentially Private Learning of Geometric Concepts

Haim Kaplan¹ ², Yishay Mansour¹ ², Yossi Matias², Uri Stemmer³ ⁴

¹Tel Aviv University. ²Google Research, Israel. ³Ben-Gurion University. ⁴Supported by a gift from Google Ltd. Correspondence to: Uri Stemmer.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

**Abstract.** We present efficient differentially private algorithms for learning unions of polygons in the plane (which are not necessarily convex). Our algorithms achieve $(\alpha,\beta)$-PAC learning and $(\varepsilon,\delta)$-differential privacy using a sample of size $\tilde O\!\left(\frac{1}{\alpha\varepsilon}k\log d\right)$, where the domain is $[d]\times[d]$ and $k$ is the number of edges in the union of polygons.

## 1. Introduction

Machine learning algorithms have exciting and wide-ranging potential. However, as the data frequently contain sensitive personal information, there are real privacy concerns associated with the development and the deployment of this technology. Motivated by this observation, the line of work on differentially private learning (initiated by Kasiviswanathan et al. (2011)) aims to construct learning algorithms that provide strong (mathematically proven) privacy protections for the training data.

Both government agencies and industrial companies have realized the importance of introducing strong privacy protection to statistical and machine learning tasks. A few recent examples include Google (Erlingsson et al., 2014) and Apple (Thakurta et al., 2017), which are already using differentially private estimation algorithms that feed into machine learning algorithms, and the US Census Bureau's announcement that it will use differentially private data publication techniques in the next decennial census (Abowd, 2016). Differential privacy is increasingly accepted as a standard for rigorous privacy. We refer the reader to the excellent surveys in (Dwork & Roth, 2014) and (Vadhan, 2016). The definition of differential privacy is as follows.

**Definition 1.1** (Dwork et al. (2006)). Let $A$ be a randomized algorithm whose input is a sample. Algorithm $A$ is $(\varepsilon,\delta)$-differentially private if for every two samples $S, S'$ that differ in one example, and for every event $T$, we have
$$\Pr[A(S)\in T]\ \le\ e^{\varepsilon}\cdot\Pr[A(S')\in T]+\delta.$$

For now, we can think of a (non-private) learner as an algorithm that operates on a set of classified random examples and outputs a hypothesis $h$ that misclassifies fresh examples with probability at most (say) $1/10$. A private learner must achieve the same goal while guaranteeing that the choice of $h$ preserves differential privacy of the sample points. Intuitively, this means that the choice of $h$ should not be significantly affected by any single example. While many learning tasks of interest are compatible with differential privacy, privacy comes with a cost (in terms of computational resources and the amount of data needed), and it is important to understand how efficient private learning can be. Indeed, there has been a significant amount of work aimed at understanding the sample complexity of private learning (Beimel et al., 2010; Chaudhuri & Hsu, 2011; Beimel et al., 2013a;b; Feldman & Xiao, 2015; Bun et al., 2015; Alon et al., 2018; Beimel et al., 2019), the computational complexity of private learning (Bun & Zhandry, 2016), and variations of the private learning model (Beimel et al., 2015; Bun et al., 2016; Dwork & Feldman, 2018; Bassily et al., 2018).
However, in spite of the significant progress made in recent years, much remains unknown and answers to fundamental questions are still missing. In particular, the literature lacks effective constructions of private learners for specific concept classes of interest, such as halfspaces, polynomials, d-dimensional balls, and more. We remark that, in principle, every (non-private) learner that works in the statistical queries (SQ) model of Kearns (1998) can be transformed to preserve differential privacy. However, as the transformation is tight only up to polynomial factors, and as SQ learners are often much less efficient than their PAC counterparts, the resulting private learners are typically far from practical.

In this work we make an important step towards bringing differentially private learning closer to practice, and construct an effective algorithm for privately learning simple geometric shapes, focusing on polygons in the plane. To motivate our work, consider the task of analyzing GPS navigation data, or the task of learning the shape of a flood or a fire based on users' location reports. As user location data might be sensitive, the ability to privately learn such shapes in the plane is of significant importance.

### 1.1. A Non-Private Learner for Conjunctions and Existing Techniques

Our learner is obtained by designing a private variant of the classical (non-private) learner for conjunctions that uses the greedy set-cover algorithm. Before describing our new learner, we first quickly recall this non-private technique (see, e.g., (Kearns & Vazirani, 1994) for more details).

Let $\mathrm{CONJ}_{k,d}$ denote the class of all conjunctions (i.e., AND) of at most $k$ literals over $d$ Boolean variables $v_1,\ldots,v_d$, e.g., $v_1\wedge v_4\wedge v_5$. Here, for a labeled example $(\vec x,\sigma)$, the vector $\vec x\in\{0,1\}^d$ is interpreted as an assignment to the $d$ Boolean variables, and $\sigma=1$ iff this assignment satisfies the target concept. Given a sample $S=\{(\vec x_i,\sigma_i)\}$ of labeled examples, the classical (non-private) learner for this class begins with the hypothesis $h=v_1\wedge\bar v_1\wedge\cdots\wedge v_d\wedge\bar v_d$, and then proceeds by deleting from $h$ every literal that contradicts a positively labeled example in $S$. Observe that at the end of this process, the set of literals appearing in $h$ contains the set of literals appearing in the target concept (because a literal is deleted only when it is contradicted by a positively labeled example).

The next step is to eliminate unnecessary literals from the hypothesis $h$ (in order to guarantee generalization). Note that removing literals from $h$ might cause it to err on negative examples in $S$, and hence, the algorithm must carefully choose which literals to eliminate. This can be done using the greedy algorithm for set cover, as follows. We have already made sure that none of the literals in $h$ errs on positive examples in $S$ (since such literals were deleted), and we know that there is a choice of $k$ literals from $h$ that together correctly classify all negative examples in $S$ (since the $k$ literals of the target concept are contained in $h$). Thus, our task can be restated as identifying a small number of literals from $h$ that together correctly classify all negative examples in $S$. This can be done using the greedy algorithm for set cover, where every literal in $h$ corresponds to a set, and this set covers a negative example if the literal is zero on that example.
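To make the two phases concrete, the following is a minimal Python sketch of this classical non-private learner. The helper names and the encoding of a literal as a pair `(index, value)` (meaning "variable `index` takes value `value`") are our own illustrative choices, not notation from the paper.

```python
def learn_conjunction(sample):
    """Non-private learner for k-literal conjunctions via greedy set cover.

    `sample` is a list of (x, label) pairs, where x is a 0/1 tuple of length d
    and label is 1 iff x satisfies the (unknown) target conjunction.
    Returns a list of literals (i, v); the hypothesis is their conjunction.
    """
    d = len(sample[0][0])
    positives = [x for (x, y) in sample if y == 1]
    negatives = [x for (x, y) in sample if y == 0]

    # Phase 1: keep only the literals consistent with every positive example.
    literals = [(i, v) for i in range(d) for v in (0, 1)
                if all(x[i] == v for x in positives)]

    # Phase 2: greedy set cover. A literal (i, v) "covers" a negative example
    # x if the literal evaluates to zero on x, i.e. x[i] != v.
    uncovered = set(range(len(negatives)))
    chosen = []
    while uncovered:
        def gain(lit):
            return sum(negatives[j][lit[0]] != lit[1] for j in uncovered)
        best = max(literals, key=gain)
        if gain(best) == 0:        # cannot happen for a realizable sample
            break
        chosen.append(best)
        uncovered = {j for j in uncovered if negatives[j][best[0]] == best[1]}
    return chosen
```

On a realizable sample, Phase 1 never removes a literal of the target concept, so the negative examples always admit a cover by $k$ of the surviving literals, and the greedy phase therefore selects only $O(k\log n)$ literals.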
To summarize, the algorithm first identifies the collection of all literals that are consistent with the positive data, and then uses the greedy algorithm for set cover in order to identify a small subset of these literals that together correctly classify the negative data. This is a good starting point for designing a private learner for conjunctions, since the greedy algorithm for set cover has a private variant (Gupta et al., 2010). The challenge here is that in the private algorithm of Gupta et al. (2010), the collection of sets from which the cover is chosen is assumed to be fixed and independent of the input data. In contrast, in our case the collection of sets corresponds to the literals that correctly classify the positive data, which is data dependent. One might try to overcome this challenge by first identifying, in a private manner, a collection $L$ of all literals that correctly classify (most of) the positive data, and then running the private algorithm of Gupta et al. (2010) to choose a small number of literals from $L$. However, a direct implementation of this idea would require accounting for the privacy loss incurred due to each literal in $L$. As $|L|$ can be linear in $d$ (the number of possible literals is $2d$), this would result in an algorithm with sample complexity $\mathrm{poly}(d)$. When we apply this strategy to learn polygons in the plane, $d$ corresponds to the size of an assumed grid on the plane, which we think of as being very large, e.g., $d=2^{64}$. Hence, $\mathrm{poly}(d)$ sample complexity is unacceptable.

### 1.2. Our Results

Our first result is an efficient private learner for conjunctions. Our learner is obtained by modifying the strategy outlined above to use the greedy algorithm for set cover in order to choose a small number of literals directly out of the set of all $2d$ possible literals (instead of choosing them out of the set of literals that agree with the positive examples). However, this must be done carefully, since, unlike before, we need to ensure that the selected literals do not cause our hypothesis to err on the positive examples. Specifically, in every step of the greedy algorithm we aim to choose a literal that eliminates (i.e., evaluates to zero on) many of the negative examples without eliminating (essentially) any of the positive examples. In the terminology of the set cover problem, we have two types of elements, positive and negative, and our goal is to identify a small cover of the negative examples that does not cover (essentially) any of the positive examples. We show:

**Theorem 1.2.** There exists an efficient $(\varepsilon,\delta)$-differentially private $(\alpha,\beta)$-PAC learner for $\mathrm{CONJ}_{k,d}$ with sample complexity¹ $\tilde O\!\left(\frac{1}{\alpha\varepsilon}k\log d\right)$.

¹For simplicity we use the $\tilde O$ notation to hide logarithmic factors in $\alpha,\beta,\varepsilon,\delta,k$. The dependency on these factors will be made explicit in the following sections.

We remark that our technique extends to disjunctions of literals in a straightforward way, as follows.

**Theorem 1.3.** There exists an efficient $(\varepsilon,\delta)$-differentially private $(\alpha,\beta)$-PAC learner for the class of all disjunctions (i.e., OR) of at most $k$ literals over $d$ Boolean variables with sample complexity $\tilde O\!\left(\frac{1}{\alpha\varepsilon}k\log d\right)$.

We then show that our technique can be used to privately learn (not necessarily convex) polygons in the plane. To see the connection, let us first consider convex polygons, and observe that a convex polygon with $k$ edges can be represented as the intersection of $k$ halfplanes.
Thus, as in our learner for Boolean conjunctions, if we could identify a halfplane that eliminates (i.e., evaluates to zero on) many of the negative examples without eliminating (essentially) any of the positive examples in the sample, then we could learn convex polygons in iterations. However, recall that in our learner for conjunctions, the parameter $d$ controlled both the description length of examples (since each example specifies an assignment to $d$ variables) and the number of possible literals (which is $2d$). Thus, the running time of our algorithm was allowed to be linear in the number of possible literals. The main challenge when applying this technique to (convex) polygons is that the number of possible halfplanes (which corresponds to the parameter $d$) is huge, and our algorithm cannot run in time linear in the number of possible halfplanes.

To recover from this difficulty, we consider the dual plane, in which sample points correspond to lines, and show that in the dual plane it is possible to (privately) identify a point (which corresponds to a halfplane in the primal plane) with the required properties, that is, eliminating many negative examples while not eliminating positive examples. The idea is that since there are only $n$ input points, in the dual plane there are only $n$ lines to consider, which partition the dual plane into at most $n^2$ regions. Now, two different halfplanes in the primal plane correspond to two different points in the dual plane, and if these two points fall in the same region, then the two halfplanes behave identically on the input sample. We can therefore partition the halfplanes (in the primal plane) into at most $n^2$ equivalence classes with respect to the way they behave on the input sample. This fact can be leveraged to implement the algorithm efficiently.

Our techniques extend to non-convex polygons, which, unlike convex polygons, cannot be represented as an intersection of halfplanes. It is well known that every (simple²) polygon with $k$ edges can be represented as the union of at most $k$ triangles, each of which can be represented as the intersection of at most 3 halfplanes (as a triangle is a convex polygon with 3 edges). In other words, a (simple) polygon with $k$ edges can be represented as a DNF formula (i.e., a disjunction of conjunctive clauses) in which each clause has at most 3 literals. As we will see, our techniques can be extended to capture this case efficiently. Our main theorem is the following.

²A simple polygon is one that does not intersect itself.

**Theorem 1.4** (informal). There exists an efficient $(\varepsilon,\delta)$-differentially private $(\alpha,\beta)$-PAC learner for unions of (simple) polygons in the plane with sample complexity $\tilde O\!\left(\frac{1}{\alpha\varepsilon}k\log d\right)$, where $k$ is the number of edges in the union of polygons and $\log d$ is the description length of examples.

As the greedy algorithm for set cover has many applications in computational learning theory, we hope that our techniques will continue to find much broader use.

**Remark 1.5.** Any informalities made hereafter are removed in the full version of the paper, available at https://arxiv.org/abs/1902.05017.

## 2. Preliminaries

In the following, $X$ is some arbitrary domain. A concept (similarly, hypothesis) over domain $X$ is a predicate defined over $X$.
A concept class (similarly, hypothesis class) is a set of concepts. Two labeled databases $S,S'\in(X\times\{0,1\})^n$ are called neighboring if they differ in a single (labeled) entry. A function $f$ (mapping labeled databases to $\mathbb{R}$) has sensitivity $s$ if for every neighboring $S,S'$ it holds that $|f(S)-f(S')|\le s$.

**Definition 2.1.** Let $H$ be a concept class over a domain $X$, and let $k\in\mathbb{N}$. We use $H^{\vee k}$ to denote the class of all disjunctions (i.e., OR) of at most $k$ concepts from $H$, and similarly, we use $H^{\wedge k}$ to denote the class of all conjunctions (i.e., AND) of at most $k$ concepts from $H$.

Without privacy considerations, the sample complexity of PAC learning a class $C$ is essentially characterized by a combinatorial quantity called the Vapnik-Chervonenkis dimension of $C$, denoted $\mathrm{VC}(C)$. For conjunctions and disjunctions of a class $H$ we have:

**Observation 2.2.** For every concept class $H$ we have $\mathrm{VC}(H^{\vee k})=O(k\log(k)\cdot\mathrm{VC}(H))$ and $\mathrm{VC}(H^{\wedge k})=O(k\log(k)\cdot\mathrm{VC}(H))$.

Observation 2.2 is standard (see, e.g., (Eisenstat & Angluin, 2007)). Our strategy in the following sections for privately learning a concept class $C$ will be to use a simpler concept class $H$ such that for some (hopefully small) $k$ we have $C\subseteq H^{\vee k}$ or $C\subseteq H^{\wedge k}$.

**Example 2.3.** Let $\mathrm{DISJ}_{k,d}$ denote the class of all disjunctions (i.e., OR) of at most $k$ literals over $d$ Boolean variables, and similarly, let $\mathrm{CONJ}_{k,d}$ denote the class of all conjunctions (i.e., AND) of at most $k$ literals over $d$ Boolean variables. Trivially, $\mathrm{DISJ}_{k,d}=(\mathrm{DISJ}_{1,d})^{\vee k}$ and $\mathrm{CONJ}_{k,d}=(\mathrm{CONJ}_{1,d})^{\wedge k}$.

The most basic constructions of differentially private algorithms are via the Laplace mechanism, as follows.

**Theorem 2.4** (The Laplace Mechanism (Dwork et al., 2006)). A random variable has probability distribution $\mathrm{Lap}(b)$ if its probability density function is $f(x)=\frac{1}{2b}\exp\!\left(-\frac{|x|}{b}\right)$, where $x\in\mathbb{R}$. Let $f$ be a function of sensitivity $s$ mapping databases to $\mathbb{R}$. The mechanism $A$ that on input $S$ adds noise distributed as $\mathrm{Lap}\!\left(\frac{s}{\varepsilon}\right)$ to the output of $f(S)$ preserves $\varepsilon$-differential privacy.

We next describe the exponential mechanism of McSherry & Talwar (2007). Let $X$ be a domain and $H$ a set of solutions (in our context, $H$ will be a set of hypotheses). Given a quality function $q:X^*\times H\to\mathbb{N}$ with sensitivity 1 and a database $S\in X^*$, the goal is to choose a solution $h\in H$ approximately maximizing $q(S,h)$. The mechanism chooses a solution $h\in H$ with probability proportional to $\exp(\varepsilon\cdot q(S,h)/2)$.

**Proposition 2.5** (Properties of the Exponential Mechanism). (i) The exponential mechanism is $\varepsilon$-differentially private. (ii) Let $\hat e\triangleq\max_{f\in H}\{q(S,f)\}$ and $\lambda>0$. The exponential mechanism outputs a solution $h$ such that $q(S,h)\le\hat e-\lambda$ with probability at most $|H|\cdot\exp(-\varepsilon\lambda/2)$.

## 3. A Generic Construction via Set Cover

In this section we present our generic construction for privately learning a concept class $C$ containing concepts that can be written as the conjunction or the disjunction of functions in a (hopefully simpler) class $H$. For readability we focus on conjunctions; the extension to disjunctions is straightforward. The construction, algorithm SetCoverLearner, is given in the boxed description later in this section; its steps are referred to throughout.

**Claim 3.1.** Fix a target function $c^*\in C$, and consider the execution of SetCoverLearner on a sample $S=\{(x_i,c^*(x_i))\}_{i=1}^n$. Assume that every run of the selection procedure $A$ in Step 1e returns a hypothesis $h_j$ such that $q(h_j)\ge\max_{f\in H}\{q(f)\}-\lambda$. Then, with probability at least $1-\beta$, the hypothesis $h_{\mathrm{fin}}$ errs on at most $\max\{\frac{\alpha n}{2},\,4\Delta\}+4k\lambda\log\frac{2}{\alpha}$ examples in $S$, where $\Delta$ is the bound on the Laplace noise used in Step 1c.

*Proof.* First observe that there are $2k\log\frac{2}{\alpha}$ draws from $\mathrm{Lap}\!\left(\frac{2k}{\alpha}\right)$ throughout the execution.
By the properties of the Laplace distribution, with probability at least $1-\beta$ the maximum absolute value of these random variables is at most $\Delta$, the shift used in Step 1c. We continue the analysis assuming that this is the case. In particular, this means that in every iteration $j$ we have $|S_0|-2\Delta\le b_j\le|S_0|$. Thus, in every iteration there exists a hypothesis $\tilde h\in H$ with $q(\tilde h)\ge 0$. To see this, recall that the target concept $c^*$ can be written as $c^*=h^*_1\wedge\cdots\wedge h^*_k$ for $h^*_1,\ldots,h^*_k\in H$. Hence, in every iteration $j$ there is a hypothesis $\tilde h\in H$ that correctly classifies all of the (remaining) positive points in $S$ while correctly classifying at least a $1/k$ fraction of the (remaining) negative points in $S$, i.e., at least $|S_0|/k\ge b_j/k$ of them. Such a hypothesis $\tilde h$ satisfies $q(\tilde h)\ge 0$.

By our assumption on the selection procedure $A$, we therefore have that in each iteration $j$ the selection procedure identifies a hypothesis $h_j$ with $q(h_j)\ge-\lambda$. By the definition of $q$, in every iteration $j$ the selected $h_j$ misclassifies at most $\lambda$ of the remaining positive examples in $S$. Therefore, $h_{\mathrm{fin}}$ misclassifies at most $2k\lambda\log\frac{2}{\alpha}$ positive examples in $S$. Moreover, in every iteration $j$ such that $|S_0|\ge2k\lambda+4\Delta$, the hypothesis $h_j$ correctly classifies at least a $\frac{1}{2k}$ fraction of the negative examples in $S$. To see this, observe that since $q(h_j)\ge-\lambda$ we have
$$\#_{h_j\to0}(S_0)\ \ge\ \frac{b_j}{k}-\lambda\ \ge\ \frac{|S_0|-2\Delta}{k}-\lambda\ \ge\ \frac{|S_0|}{2k}.$$
That is, either there exists an iteration $j$ in which the number of negative points in $S$ drops below $2k\lambda+4\Delta$, or every iteration shrinks the number of negative examples by a factor of $(1-\frac{1}{2k})$, in which case after $2k\log\frac{2}{\alpha}$ iterations there can be at most $\frac{\alpha n}{2}$ negative points left in $S$. Observe that $h_{\mathrm{fin}}$ does not err on negative points that were removed from $S$, and therefore, there can be at most $\max\{\frac{\alpha n}{2},\,2k\lambda+4\Delta\}$ negative points on which $h_{\mathrm{fin}}$ errs. Overall, $h_{\mathrm{fin}}$ errs on at most $\max\{\frac{\alpha n}{2},\,4\Delta\}+4k\lambda\log\frac{2}{\alpha}$ points in $S$. ∎

Claim 3.1 ensures that if in every step $A$ picks an $h_j$ of high quality, then (w.h.p.) algorithm SetCoverLearner returns a hypothesis from $H^{\wedge2k\log\frac{2}{\alpha}}$ with low empirical error. Combining this with standard generalization bounds and with Observation 2.2 (which bounds the VC dimension of $H^{\wedge2k\log\frac{2}{\alpha}}$), we get the following theorem.

**Theorem 3.2.** Let $C$ and $H$ be two concept classes, and let $k$ be an integer such that $C\subseteq H^{\wedge k}$. Let $A$ be a selection procedure that takes a database $S$ and a quality function $q$, and returns a hypothesis $h\in H$ such that $q(h)\ge\max_{f\in H}\{q(f)\}-\lambda$ with probability at least $1-\frac{\beta}{4k\log(2/\alpha)}$. Then, algorithm SetCoverLearner with $A$ as the selection procedure is an $(\alpha,\beta)$-PAC learner for $C$ with sample complexity
$$n=\Theta\!\left(\frac{k\log\frac{1}{\alpha}}{\alpha}\left(\mathrm{VC}(H)\log(k)+\lambda+\frac{1}{\varepsilon}\log\frac{k}{\beta}\log\frac{1}{\delta}\right)\right).$$

**Tuning the selection procedure.** If $H$ is finite, then one can directly implement the selection procedure $A$ using the exponential mechanism of McSherry & Talwar (2007) to find a hypothesis $h_j$ with large $q(h_j)$ in each iteration. In order to guarantee that all of the iterations of algorithm SetCoverLearner together satisfy $(\varepsilon,\delta)$-differential privacy, it suffices that each application of the exponential mechanism satisfies $\hat\varepsilon\approx\frac{\varepsilon}{\sqrt{k}}$-differential privacy (see, e.g., (Dwork et al., 2010) for composition theorems for differential privacy). When choosing such an $\hat\varepsilon$, the exponential mechanism identifies, in every iteration, an $h_j$ such that $q(h_j)\ge\max_{f\in H}\{q(f)\}-O\!\left(\frac{1}{\hat\varepsilon}\log|H|\right)\ge\max_{f\in H}\{q(f)\}-O\!\left(\frac{\sqrt{k}}{\varepsilon}\log|H|\right)$. This gives a selection procedure $A$ that selects $h_j$ with $q(h_j)\ge\max_{f\in H}\{q(f)\}-\lambda$, for $\lambda=O\!\left(\frac{\sqrt{k}}{\varepsilon}\log|H|\right)$.
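The following is a minimal Python sketch of such an exponential-mechanism selection step; the function name and interface are our own, and the $\exp(\hat\varepsilon\cdot q/2)$ weighting follows Proposition 2.5, so the privacy guarantee applies only when the supplied quality function has sensitivity 1.

```python
import math
import random

def exponential_mechanism(hypotheses, quality, eps):
    """Sample one hypothesis h with Pr[h] proportional to exp(eps * quality(h) / 2)."""
    scores = [eps * quality(h) / 2.0 for h in hypotheses]
    shift = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - shift) for s in scores]
    return random.choices(hypotheses, weights=weights, k=1)[0]
```

Running such a step with a per-iteration budget $\hat\varepsilon\approx\varepsilon/\sqrt{k}$ over the $2d$ literals is, up to constants, the instantiation behind Example 3.3 below.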
**Example 3.3.** There exist efficient $(\varepsilon,\delta)$-differentially private $(\alpha,\beta)$-PAC learners for $\mathrm{CONJ}_{k,d}$ and for $\mathrm{DISJ}_{k,d}$ with sample complexity $n=\tilde\Theta\!\left(\frac{1}{\alpha\varepsilon}k^{1.5}\log d\right)$.

The reason for the dependency on $k^{1.5}$ in the above example is that, for the privacy analysis, we wanted to ensure that each iteration is differentially private with parameter $\varepsilon/\sqrt{k}$ (because when composing $\ell$ differentially private mechanisms the privacy budget deteriorates proportionally to $\sqrt{\ell}$). This results in roughly $\sqrt{k}/\varepsilon$ misclassified points per iteration (and there are roughly $k$ iterations). As we next explain, in our case the privacy parameter does not need to deteriorate with the number of iterations, which allows us to improve the sample complexity by a $\sqrt{k}$ factor. Our approach to proving this improved bound builds on the analysis of Gupta et al. (2010) for their private set cover algorithm. The main difference is that we have both positive and negative examples, which we need to handle differently. This is achieved using Step 1c of SetCoverLearner, described next.

**Algorithm SetCoverLearner**

Settings: Concept classes $C,H$ and an integer $k\in\mathbb{N}$ such that $C\subseteq H^{\wedge k}$.
Input: Labeled sample $S=\{(x_i,\sigma_i)\}_{i=1}^n\in(X\times\{0,1\})^n$, privacy parameter $\varepsilon$.
Tool used: A selection procedure $A$ that takes a database $S$ and a quality function $q$ (assigning a score to each hypothesis in $H$), and returns a hypothesis $h\in H$.

1. For $j=1$ to $2k\log\frac{2}{\alpha}$:
   (a) Let $S_1$ and $S_0$ denote the set of positive and negative examples in $S$, respectively.
   (b) For $h\in H$ let $\#_{h\to0}(S_1)$ and $\#_{h\to0}(S_0)$ denote the number of positive and negative examples in $S$, respectively, that $h$ labels as 0. That is, $\#_{h\to0}(S_1)=|\{x_i\in S_1:h(x_i)=0\}|$ and $\#_{h\to0}(S_0)=|\{x_i\in S_0:h(x_i)=0\}|$.
   (c) Let $w_j\leftarrow\mathrm{Lap}\!\left(\frac{2k}{\alpha}\right)$ and set $b_j=|S_0|+w_j-\Delta$, where $\Delta$ is a high-probability upper bound on $\max_j|w_j|$.
   (d) For every $h\in H$, define $q(h)=\min\left\{\#_{h\to0}(S_0)-\frac{b_j}{k},\ -\#_{h\to0}(S_1)\right\}$.
   (e) Let $h_j\leftarrow A(S,q)$, and delete from $S$ every $(x_i,\sigma_i)$ such that $h_j(x_i)=0$.
2. Return the hypothesis $h_{\mathrm{fin}}=h_1\wedge h_2\wedge\cdots\wedge h_{2k\log\frac{2}{\alpha}}$.
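The following Python sketch mirrors the structure of SetCoverLearner for conjunctions. It is schematic: `noise_scale` and `shift` stand in for the Laplace scale and the threshold shift of Step 1c (illustrative placeholders, not the paper's exact constants), and `select` can be any selection procedure $A$, e.g. `lambda H, q: exponential_mechanism(H, q, eps_hat)` using the sketch above.

```python
import math
import random

def set_cover_learner(sample, hypotheses, k, alpha, select, noise_scale, shift):
    """Schematic SetCoverLearner for a class C contained in H^{AND k}.

    `sample`     : list of (x, label) pairs with label in {0, 1}
    `hypotheses` : finite collection of predicates h(x) -> 0/1 (the class H)
    `select`     : selection procedure A(hypotheses, q) -> chosen hypothesis
    `noise_scale`, `shift` : placeholders for the noise parameters of Step 1c
    """
    S = list(sample)
    chosen = []
    for _ in range(math.ceil(2 * k * math.log(2 / alpha))):
        S1 = [x for (x, y) in S if y == 1]   # remaining positive examples
        S0 = [x for (x, y) in S if y == 0]   # remaining negative examples
        # Laplace noise, sampled as a difference of two exponential variables.
        w = random.expovariate(1 / noise_scale) - random.expovariate(1 / noise_scale)
        b = len(S0) + w - shift              # noisy proxy for |S0| (Step 1c)

        def q(h):
            # Step 1d: reward labeling negatives as 0, penalize doing so on positives.
            neg_zero = sum(1 for x in S0 if h(x) == 0)
            pos_zero = sum(1 for x in S1 if h(x) == 0)
            return min(neg_zero - b / k, -pos_zero)

        h_j = select(hypotheses, q)                      # Step 1e
        chosen.append(h_j)
        S = [(x, y) for (x, y) in S if h_j(x) == 1]      # drop covered examples

    return lambda x: 1 if all(h(x) == 1 for h in chosen) else 0   # h_fin
```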
**Claim 3.4.** Let $\varepsilon\in(0,1)$ and $\delta<1/e$. Instantiating SetCoverLearner with the exponential mechanism as the selection procedure $A$, with privacy parameter $\hat\varepsilon=\frac{\varepsilon}{2\ln(e/\delta)}$ per iteration, satisfies $(\varepsilon,\delta)$-differential privacy.

We next present an intuitive (and oversimplified) overview of the proof. Consider two neighboring databases $S$ and $S'$ such that $S=S'\cup\{(x',\sigma')\}$, and let us focus here on the case where $\sigma'=0$. Fix a possible output $\vec h=(h_1,h_2,\ldots,h_{2k\log\frac{2}{\alpha}})$ of SetCoverLearner. We will analyze the ratio
$$\frac{\Pr[\mathrm{SetCoverLearner}(S)=\vec h]}{\Pr[\mathrm{SetCoverLearner}(S')=\vec h]}.\qquad(1)$$
Let $t$ be such that $h_t$ is the first hypothesis in this output vector satisfying $h_t(x')=0$. Observe that after the $t$th iteration, the executions on $S$ and on $S'$ continue exactly the same, since $(x',\sigma')$ is removed from $S$ during the $t$th iteration (because in every iteration we remove all input elements on which the selected hypothesis evaluates to 0). Intuitively, if $t$ is small then we only need to pay (in the privacy analysis) for a small number of iterations. In general, however, $t$ might be as large as $2k\log\frac{2}{\alpha}$, and accounting for that many iterations in the privacy analysis is exactly what we are trying to avoid.

Recall that each iteration $j$ of SetCoverLearner draws a random noise $w_j$ from $\mathrm{Lap}\!\left(\frac{2k}{\alpha}\right)$. Let us denote these noises as they are in the execution on $S$ by $w=(w_1,\ldots,w_{2k\log\frac{2}{\alpha}})$ and in the execution on $S'$ by $w'=(w'_1,\ldots,w'_{2k\log\frac{2}{\alpha}})$. Furthermore, let us assume that $w'_j=w_j-1$ for every $j\le t$ and that $w'_j=w_j$ for every $j>t$. By the properties of the Laplace distribution, this assumption distorts our bound on the ratio in expression (1) by at most an $e^{\varepsilon}$ factor. (In a sense, for these random noises we do account for all $2k\log\frac{2}{\alpha}$ potential iterations, by sampling random noises with larger variance. However, this larger variance is mitigated by the fact that in the quality function $q$ we divide the noise by $k$, and hence we do not incur a $\mathrm{poly}(k)$ increase in the sample complexity due to this issue.)

We have already established that after the $t$th iteration the two executions are identical. In addition, due to our assumption on $w$ and $w'$, during the first $t$ iterations the only hypotheses with different qualities (between the execution on $S$ and on $S'$) are those hypotheses that label $x'$ as 0. This is because if a hypothesis $h$ labels $x'$ as 1, then $(x',0)$ affects the quality $q(h)$ only via the noisy estimate of the size of $S_0$ (denoted $b_j$ in the algorithm), which, by our assumption on $w$ and $w'$, is the same in the two executions (because the difference in the noise cancels out the difference in $|S_0|$).

To summarize, after conditioning on $w$ and $w'$, the additional example $(x',0)$ causes the two executions to differ only in their first $t$ iterations, and within these $t$ iterations it affects only the qualities of the hypotheses that label $x'$ as zero. This can be formalized to bound the ratio in expression (1) by $\prod_{j=1}^{t}\exp(\varepsilon\cdot p_j)$, where $p_j$ is the probability that a hypothesis that labels $x'$ as 0 is chosen in step $j$ of the algorithm. The proof then concludes by arguing that if these probabilities $\{p_j\}$ are small then they reduce our privacy costs (since they multiply $\varepsilon$), whereas if these probabilities $\{p_j\}$ are large then the index $t$ should be small (since we are likely to identify a hypothesis that labels $x'$ as zero quickly, and $t$ is the index of the first such hypothesis), and therefore we only need to account for the privacy loss incurred during a small number of iterations.

For example, by combining Claim 3.4 with Claim 3.1, we get improved learners for conjunctions and disjunctions:

**Theorem 3.5.** There exist efficient $(\varepsilon,\delta)$-differentially private $(\alpha,\beta)$-PAC learners for $\mathrm{CONJ}_{k,d}$ and $\mathrm{DISJ}_{k,d}$ with sample complexity $n=\tilde\Theta\!\left(\frac{1}{\alpha\varepsilon}k\log d\right)$. The runtime of the learners is polynomial in $k$, $d$, $\frac{1}{\beta}$, and $\log\frac{1}{\delta}$.

## 4. Convex Polygons in the Plane

In this section we show how our generic construction from the previous section applies to convex polygons in a (discrete version of the) Euclidean plane. This is an important step towards our construction for (not necessarily convex) polygons. We represent a convex polygon with $k$ edges as the intersection of $k$ halfplanes. A halfplane over $\mathbb{R}^2$ can be represented using three parameters $a,b,c\in\mathbb{R}$ with $f_{a,b,c}(x,y)=1$ iff $cy\ge ax+b$. Denote the set of all such halfplanes over $\mathbb{R}^2$ by $\mathrm{HALFPLANE}=\{f_{a,b,c}:a,b,c\in\mathbb{R}\}$. We can now define the class of convex polygons with $k$ edges over $\mathbb{R}^2$ as $\text{CONVEX-}k\text{-GON}=\mathrm{HALFPLANE}^{\wedge k}$.

For a parameter $d\in\mathbb{N}$, let $X_d=\{0,1,2,\ldots,d\}$, and let $X_d^2=(X_d)^2$ denote a discretization of the Euclidean plane in which each axis consists of the points in $X_d$. We assume that our examples come from $X_d^2$. Hence, as explained next, we are able to represent a halfplane using only two real parameters $a,b\in\mathbb{R}$ and a bit $z\in\{\pm1\}$. The parameters $a$ and $b$ define the line $y=ax+b$, and the parameter $z$ determines whether the halfplane is above or below the line. In other words, $f_{a,b,z}(x,y)=1$ iff $z\,y\ge z\,(ax+b)$.
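As a small illustration of this representation (keeping the bit $z$ explicit rather than folding it into a single parameter), a halfplane and a convex polygon can be evaluated as follows; the triangle used in the example is our own.

```python
def halfplane(a, b, z):
    """The predicate f_{a,b,z}: accept (x, y) iff z*y >= z*(a*x + b)."""
    return lambda x, y: 1 if z * y >= z * (a * x + b) else 0

def convex_polygon(halfplanes):
    """A convex polygon as the intersection (conjunction) of halfplanes."""
    return lambda x, y: 1 if all(h(x, y) == 1 for h in halfplanes) else 0

# Example: the triangle with vertices (0,0), (4,0), (2,4).
triangle = convex_polygon([
    halfplane(0.0, 0.0, +1),    # y >= 0
    halfplane(2.0, 0.0, -1),    # y <= 2x
    halfplane(-2.0, 8.0, -1),   # y <= -2x + 8
])
assert triangle(2, 1) == 1 and triangle(4, 3) == 0
```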
Even though this representation does not capture vertical lines, vertical lines will not be needed for our purposes. The reason is that when the examples come from the discretization $X_d^2$, a vertical line can always be replaced with a non-vertical line such that the corresponding halfplanes behave exactly the same on all of $X_d^2$. Moreover, since the discretization $X_d^2$ is finite, it suffices to consider bounded real-valued parameters $a,b\in[-2d^2,2d^2]$. Actually, by letting $a$ reside in a bigger range, we can encode the bit $z$ in $a$, and thus represent a halfplane using only two real numbers. We denote the set of all such halfplanes as
$$\mathrm{HALFPLANE}_d=\left\{f_{\hat a,b}\ :\ \hat a\in[-2d^2,6d^2],\ b\in[-2d^2,2d^2]\right\},$$
where $f_{\hat a,b}(x,y)=1$ iff $z\,y\ge z\,(ax+b)$ for $a=\hat a-4d^2\cdot\mathbb{1}\{\hat a>2d^2\}$ and $z=1-2\cdot\mathbb{1}\{\hat a>2d^2\}$. This discussion can be formalized to obtain the following observation.

**Observation 4.1.** For every $f\in\mathrm{HALFPLANE}$ there exists an $\hat f\in\mathrm{HALFPLANE}_d$ such that for every $(x,y)\in X_d^2$ we have $f(x,y)=\hat f(x,y)$.

**Remark 4.2.** We think of the discretization size $d$ as a large number, e.g., $d=2^{64}$. The runtime and the sample complexity of our algorithms are at most logarithmic in $d$.

A consequence of Observation 4.1 is that, in order to learn CONVEX-$k$-GON over examples in $X_d^2$, it suffices to describe a learner for the class $\mathrm{HALFPLANE}_d^{\wedge k}$ over examples in $X_d^2$. As we next explain, this can be done using our techniques from Section 3. Concretely, we need to specify the selection procedure used in Step 1e of algorithm SetCoverLearner for privately choosing a hypothesis from $\mathrm{HALFPLANE}_d$. Our selection procedure appears in algorithm SelectHalfplane, given below.

**Privacy analysis of SelectHalfplane.** Consider running algorithm SelectHalfplane with a score function $q$ whose sensitivity is (at most) 1, and observe that, as in the standard analysis of the exponential mechanism (McSherry & Talwar, 2007), algorithm SelectHalfplane satisfies $2\varepsilon$-differential privacy. To see this, fix two neighboring databases $S,S'$, and fix a pair $(\hat a,b)$. Let us denote the probability density functions in the executions on $S$ and on $S'$ as $p_S(\hat a,b)$ and $p_{S'}(\hat a,b)$, respectively. Since $q$ has sensitivity 1, we have that $p_S(\hat a,b)\le e^{2\varepsilon}p_{S'}(\hat a,b)$, and hence, for any set of possible outcomes $U$ we have $\Pr[\mathrm{SelectHalfplane}(S)\in U]\le e^{2\varepsilon}\Pr[\mathrm{SelectHalfplane}(S')\in U]$, as required. Moreover, an analysis similar to that of Claim 3.4 shows the following.

**Claim 4.3.** Consider instantiating SetCoverLearner with the selection procedure SelectHalfplane. In order for the whole execution to satisfy $(\varepsilon,\delta)$-differential privacy, it suffices to execute each instance of SelectHalfplane with a privacy parameter $\hat\varepsilon=O(\varepsilon/\log(1/\delta))$.

**Utility analysis of SelectHalfplane.** In algorithm SelectHalfplane we identify points in $X_d^2$ with lines in $D^2$, and vice versa. The following observation states that if two points in $D^2$ belong to the same region (as defined in Step 3), then these two points correspond to halfplanes that agree on every point in the input sample $S$. This allows us to partition the halfplanes (in the primal plane) into a small number of equivalence classes.

**Observation 4.4.** Consider the execution of SelectHalfplane on a sample $S$, and let $\hat R=\{r^j_i\}$ be the regions defined in Step 3 (for $j\in\{1,2\}$). For every region $r^j_i\in\hat R$, for every two points $(a_1,b_1),(a_2,b_2)\in r^j_i$ in this region, and for every example $(x,y)$ in the sample $S$, we have $f_{a_1,b_1}(x,y)=f_{a_2,b_2}(x,y)$.

*Proof.* Fix two points $(a_1,b_1),(a_2,b_2)$ that belong to the same region in $\hat R$. By the definition of the regions in $\hat R$,
for every example $(x,y)$ in the sample $S$ we have that $y\le a_1x+b_1$ if and only if $y\le a_2x+b_2$, and hence, $f_{a_1,b_1}(x,y)=1$ iff $f_{a_2,b_2}(x,y)=1$. ∎

**Algorithm SelectHalfplane**

Input: Sample $S=\{((x_i,y_i),\sigma_i)\}_{i=1}^n\in(X_d^2\times\{0,1\})^n$, privacy parameter $\varepsilon$, quality function $q:\mathrm{HALFPLANE}_d\to\mathbb{R}$.

1. Denote $D=[-2d^2,2d^2]$ and $F=[-2d^2,6d^2]$. We refer to the axes of $D^2$ and of $F\times D$ as $a$ and $b$.
2. Identify every example $((x,y),\sigma)\in S$ with the line $\ell_{x,y}$ in $D^2$ defined by the equation $y=xa+b$, where $a,b$ are the variables and $x,y$ are the coefficients. Denote $S_{\mathrm{dual}}=\{\ell_{x,y}:((x,y),\sigma)\in S\}$.
3. Let $R=\{r^1_1,r^1_2,\ldots,r^1_{|R|}\}$ denote the partition of $D^2$ into regions defined by the lines in $S_{\mathrm{dual}}$. Also let $R'=\{r^2_1,\ldots,r^2_{|R|}\}$ be a partition of $[2d^2,6d^2]\times D$ identical to $R$, except that it is shifted by $4d^2$ on the $a$ axis. Denote $\hat R=R\cup R'$.
   % Note that, by induction, $n$ lines can divide the plane into at most $n^2$ different regions. Hence, $|R|$ is small.
4. For every $1\le i\le|R|$, let $w_i$ denote the area of region $r^1_i$ (which is the same as the area of $r^2_i$), and let $(a^1_i,b^1_i)\in r^1_i$ and $(a^2_i,b^2_i)\in r^2_i$ be arbitrary points in these regions.
5. Denote $N=\sum_{r^j_i\in\hat R}w_i\exp\!\big(\varepsilon\cdot q(f_{a^j_i,b^j_i})\big)$, where $f_{a^j_i,b^j_i}$ is a halfplane in $\mathrm{HALFPLANE}_d$.
6. Choose and return a pair $(\hat a,b)\in[-2d^2,6d^2]\times[-2d^2,2d^2]$ with probability density function $p(\hat a,b)=\frac{1}{N}\exp\!\big(\varepsilon\cdot q(f_{\hat a,b})\big)$.
   % Note that for every $(a,b),(a',b')\in r^j_i$ in the same region we have $q(f_{a,b})=q(f_{a',b'})$ (see Observation 4.4). Hence, this step can be implemented by first selecting a region $r^j_i\in\hat R$ with probability proportional to $w_i\exp(\varepsilon\cdot q(f_{a^j_i,b^j_i}))$, and then selecting a random $(a,b)\in r^j_i$ uniformly.

In particular, Observation 4.4 shows that the function $p$ defined in Step 6 indeed defines a probability density function, since for $F=[-2d^2,6d^2]$ and $D=[-2d^2,2d^2]$ we have
$$\int_{F\times D}p(a,b)\,d^2(a,b)=\sum_{r^j_i\in\hat R}\int_{r^j_i}p(a,b)\,d^2(a,b)=\frac{1}{N}\sum_{r^j_i\in\hat R}\int_{r^j_i}\exp\!\big(\varepsilon\cdot q(f_{a,b})\big)\,d^2(a,b)=\frac{1}{N}\sum_{r^j_i\in\hat R}\exp\!\big(\varepsilon\cdot q(f_{a^j_i,b^j_i})\big)\int_{r^j_i}1\,d^2(a,b)=\frac{1}{N}\sum_{r^j_i\in\hat R}w_i\exp\!\big(\varepsilon\cdot q(f_{a^j_i,b^j_i})\big)=1.$$

We also need to argue about the area of the regions in the dual plane that correspond to hypotheses with high quality (as the probability of choosing a hypothesis from a region is proportional to the region's area). This is done in the following claim.

**Claim 4.5.** Consider the execution of SelectHalfplane on a sample $S$, and let $w_1,\ldots,w_{|R|}$ denote the areas of the regions defined in Step 3. Then for every $i$ we have $w_i\ge\frac{1}{4d^4}$.

*Proof.* We show that every two different vertices of the regions in $R$ are at distance at least $1/d^2$, and hence, the minimal possible area of a region is that of an equilateral triangle with edge length $1/d^2$, which has area $\frac{\sqrt3}{4}\cdot\frac{1}{d^4}\ge\frac{1}{4d^4}$.

To show this lower bound on the distance between a pair of vertices, let $\ell_{x_1,y_1},\ell_{x_2,y_2},\ell_{x_3,y_3},\ell_{x_4,y_4}$ be four lines in $S_{\mathrm{dual}}$, and assume that $\ell_{x_1,y_1}$ and $\ell_{x_2,y_2}$ intersect at $(a_{1,2},b_{1,2})$, and that $\ell_{x_3,y_3}$ and $\ell_{x_4,y_4}$ intersect at $(a_{3,4},b_{3,4})$. Moreover, assume that these two intersection points are different. We can write the coordinates of these intersection points as
$$a_{1,2}=\frac{y_1-y_2}{x_1-x_2},\quad b_{1,2}=y_1-x_1\cdot\frac{y_1-y_2}{x_1-x_2},\quad a_{3,4}=\frac{y_3-y_4}{x_3-x_4},\quad b_{3,4}=y_3-x_3\cdot\frac{y_3-y_4}{x_3-x_4}.$$
Now, if $a_{1,2}\ne a_{3,4}$, then
$$\big\|(a_{1,2},b_{1,2})-(a_{3,4},b_{3,4})\big\|_2\ \ge\ |a_{1,2}-a_{3,4}|=\left|\frac{y_1-y_2}{x_1-x_2}-\frac{y_3-y_4}{x_3-x_4}\right|=\frac{\big|(y_1-y_2)(x_3-x_4)-(y_3-y_4)(x_1-x_2)\big|}{\big|(x_1-x_2)(x_3-x_4)\big|}\ \ge\ \frac{1}{d^2},$$
and if $a_{1,2}=a_{3,4}$, then
$$\big\|(a_{1,2},b_{1,2})-(a_{3,4},b_{3,4})\big\|_2\ \ge\ |b_{1,2}-b_{3,4}|=\big|y_1-y_3-a_{1,2}(x_1-x_3)\big|=\frac{\big|(y_1-y_3)(x_1-x_2)-(y_1-y_2)(x_1-x_3)\big|}{|x_1-x_2|}\ \ge\ \frac{1}{d}\ \ge\ \frac{1}{d^2}.\ ∎$$
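The primal-dual correspondence behind Observation 4.4 can be checked mechanically. In the sketch below (our own, with $z=+1$ fixed for simplicity), a sample point $(x,y)$ is accepted by the halfplane with parameters $(a,b)$ exactly when the dual point $(a,b)$ lies on or below the dual line $b=y-xa$; consequently, any two dual points in the same cell of the arrangement of $S_{\mathrm{dual}}$ induce halfplanes that classify the whole sample identically.

```python
import random

def halfplane_accepts(a, b, x, y):
    """f_{a,b} with z = +1: accept the point (x, y) iff y >= a*x + b."""
    return y >= a * x + b

def below_dual_line(a, b, x, y):
    """Is the dual point (a, b) on or below the dual line of (x, y), i.e. b <= y - x*a?"""
    return b <= y - x * a

# The two tests coincide; this is why two dual points in the same cell of the
# arrangement classify every sample point identically (Observation 4.4).
for _ in range(1000):
    a, b = random.uniform(-10, 10), random.uniform(-10, 10)
    x, y = random.randint(0, 8), random.randint(0, 8)
    assert halfplane_accepts(a, b, x, y) == below_dual_line(a, b, x, y)
```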
The following lemma states the utility guarantees of SelectHalfplane.

**Lemma 4.6.** Consider executing SelectHalfplane on a sample $S$, and assume that there exists a hypothesis $f\in\mathrm{HALFPLANE}_d$ with $q(f)\ge\nu$. Then, with probability at least $1-\beta$, the output $f'$ satisfies $q(f')\ge\nu-\frac{8}{\varepsilon}\ln\frac{2d}{\beta}$.

*Proof.* Denote $F=[-2d^2,6d^2]$ and $D=[-2d^2,2d^2]$. Let $\hat R=\{r^1_1,r^2_1,\ldots,r^1_{|R|},r^2_{|R|}\}$ denote the regions defined in Step 3, and let $B\subseteq\hat R$ denote the subset of all regions such that the halfplanes corresponding to points in these regions have quality less than $\nu-\frac{8}{\varepsilon}\ln\frac{2d}{\beta}$. By Claim 4.5, the region containing the assumed hypothesis $f$ has area at least $\frac{1}{4d^4}$, and hence $N\ge\frac{1}{4d^4}\exp(\varepsilon\nu)$. Therefore, the probability that SelectHalfplane outputs a hypothesis $f'$ with $q(f')<\nu-\frac{8}{\varepsilon}\ln\frac{2d}{\beta}$ is at most
$$\sum_{r\in B}\int_{r}p(a,b)\,d^2(a,b)\ \le\ \frac{\mathrm{area}(F\times D)\cdot e^{\varepsilon\nu-8\ln\frac{2d}{\beta}}}{N}\ \le\ \frac{32d^4\cdot e^{\varepsilon\nu-8\ln\frac{2d}{\beta}}}{\frac{1}{4d^4}\exp(\varepsilon\nu)}\ =\ 128d^8e^{-8\ln\frac{2d}{\beta}}\ \le\ \beta.\ ∎$$

Combining Lemma 4.6 with Claim 4.3 and Theorem 3.2 yields our private learner for convex polygons:

**Theorem 4.7.** There exists an efficient $(\varepsilon,\delta)$-differentially private $(\alpha,\beta)$-PAC learner for CONVEX-$k$-GON over examples from $X_d^2$ with sample complexity $\tilde O\!\left(\frac{k}{\alpha\varepsilon}\log\frac{1}{\alpha}\log d\right)$.

## 5. Union of Non-Convex Polygons

In this section we briefly describe how our techniques from the previous sections can be used to learn the class of (simple) polygons in the plane, defined as follows. For a simple and closed curve³ $C$, we use $\mathrm{interior}(C)$ to denote the union of $C$ and its bounded area. We define the class of all simple polygons in the plane with (at most) $k$ edges as
$$k\text{-GON}=\left\{\mathrm{interior}(C)\ :\ C\text{ is a simple and closed curve in }\mathbb{R}^2\text{ consisting of at most }k\text{ line segments}\right\}.$$

³A curve is simple and closed if it does not cross itself and ends at the same point where it begins.

By standard arguments in computational geometry, every such polygon with $k$ edges can be represented as the union of at most $k$ triangles, each of which can be represented as the intersection of at most 3 halfplanes (since a triangle is a convex polygon with 3 edges). Let us denote the class of all triangles in the plane by TRIANGLE. Hence,
$$k\text{-GON}\ \subseteq\ \mathrm{TRIANGLE}^{\vee k}\ \subseteq\ \left(\mathrm{HALFPLANE}^{\wedge3}\right)^{\vee k}.$$

Thus, in order to learn polygons with $k$ edges, it suffices to construct a learner for the class $(\mathrm{HALFPLANE}^{\wedge3})^{\vee k}$. In fact, this class captures unions of polygons with a total of at most $k$ edges. In addition, arguments similar to those given in Section 4 show that if the input examples come from $X_d^2=\{0,1,\ldots,d\}^2$, then it suffices to construct a learner for $(\mathrm{HALFPLANE}_d^{\wedge3})^{\vee k}$, which we can do using our techniques from Sections 3 and 4.

First, as we mentioned, a straightforward modification of algorithm SetCoverLearner yields an algorithm for learning classes of the form $C\subseteq H^{\vee k}$ (instead of $C\subseteq H^{\wedge k}$ as stated in Section 3). Now, to get an efficient construction, we need to specify the selection procedure for choosing a hypothesis $h_j\in\mathrm{HALFPLANE}_d^{\wedge3}$ in each step of SetCoverLearner. As before, given an input sample $S$ we consider the dual plane $D^2$, such that every input example in $S$ from the primal plane corresponds to a line in the dual plane, and every point in the dual plane corresponds to a halfplane in the primal plane. Recall that in the previous section we identified a hypothesis (which was a halfplane) with a point in the dual plane. The modification is that now a hypothesis is a triangle, which we identify with three points in the dual plane (these three points correspond to three halfplanes in the primal plane whose intersection is a triangle).

We use $k$-UNION-GON to denote the class of all unions of (simple) polygons with a total of at most $k$ edges. That is, every hypothesis $h\in k\text{-UNION-GON}$ can be written as $h=h_1\vee\cdots\vee h_m$ for $(h_1,\ldots,h_m)\in(k_1\text{-GON}\times\cdots\times k_m\text{-GON})$, where $k_1+\cdots+k_m\le k$.
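For concreteness, here is a minimal sketch (ours) of how a hypothesis in $(\mathrm{HALFPLANE}^{\wedge3})^{\vee k}$, i.e. a union of triangles, can be represented and evaluated; each triangle is a list of three $(a,b,z)$ halfplanes.

```python
def in_halfplane(hp, x, y):
    """hp = (a, b, z); membership test z*y >= z*(a*x + b)."""
    a, b, z = hp
    return z * y >= z * (a * x + b)

def in_union_of_triangles(triangles, x, y):
    """A union of triangles is a disjunction of 3-halfplane conjunctions."""
    return any(all(in_halfplane(hp, x, y) for hp in tri) for tri in triangles)

# A non-convex union: two triangles sharing only the vertex (4, 0).
T1 = [(0.0, 0.0, +1), (2.0, 0.0, -1), (-2.0, 8.0, -1)]    # (0,0), (4,0), (2,4)
T2 = [(0.0, 0.0, +1), (2.0, -8.0, -1), (-2.0, 16.0, -1)]  # (4,0), (8,0), (6,4)
assert in_union_of_triangles([T1, T2], 2, 1)        # inside the first triangle
assert not in_union_of_triangles([T1, T2], 4, 3)    # in neither triangle
```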
We obtain the following theorem.

**Theorem 5.1.** There exists an efficient $(\varepsilon,\delta)$-differentially private $(\alpha,\beta)$-PAC learner for $k$-UNION-GON over examples from $X_d^2$ with sample complexity $\tilde O\!\left(\frac{k}{\alpha\varepsilon}\log\frac{1}{\alpha}\log d\right)$. The runtime of the learner is polynomial in $k$, $\frac{1}{\beta}$, and $\log d$.

## 6. Conclusion and Future Work

In this work we presented a computationally efficient differentially private PAC learner for simple geometric concepts in the plane, namely concepts that can be described as a union of polygons. Our results extend to higher dimensions by replacing lines with hyperplanes and triangles with simplices; the running time, however, depends exponentially on the dimension. Our results also extend, via linearization, to other simple geometric concepts whose boundaries are defined by low-degree polynomials, such as balls. In general, the dimension of the linearization depends on the degrees of the polynomials. This motivates the open problem of improving the dependency of the running time on the dimension of the problem.

## Acknowledgements

The authors would like to thank Kobbi Nissim, and the anonymous reviewers, for their helpful comments.

## References

Abowd, J. M. The challenge of scientific reproducibility and privacy protection for statistical agencies. Census Scientific Advisory Committee, 2016.

Alon, N., Livni, R., Malliaris, M., and Moran, S. Private PAC learning implies finite Littlestone dimension. CoRR, abs/1806.00949, 2018.

Bassily, R., Thakurta, A. G., and Thakkar, O. D. Model-agnostic private learning. In Advances in Neural Information Processing Systems 31 (NeurIPS 2018), pp. 7102-7112, 2018.

Beimel, A., Kasiviswanathan, S. P., and Nissim, K. Bounds on the sample complexity for private learning and private data release. In TCC, volume 5978 of LNCS, pp. 437-454. Springer, 2010.

Beimel, A., Nissim, K., and Stemmer, U. Characterizing the sample complexity of private learners. In ITCS, pp. 97-110. ACM, 2013a.

Beimel, A., Nissim, K., and Stemmer, U. Private learning and sanitization: Pure vs. approximate differential privacy. In APPROX-RANDOM, volume 8096 of LNCS, pp. 363-378. Springer, 2013b.

Beimel, A., Nissim, K., and Stemmer, U. Learning privately with labeled and unlabeled examples. In SODA, pp. 461-477. SIAM, 2015.

Beimel, A., Moran, S., Nissim, K., and Stemmer, U. Private center points and learning of halfspaces. CoRR, abs/1902.10731, 2019.

Bun, M. and Zhandry, M. Order-revealing encryption and the hardness of private learning. In TCC, volume 9562 of LNCS, pp. 176-206. Springer, 2016.

Bun, M., Nissim, K., Stemmer, U., and Vadhan, S. P. Differentially private release and learning of threshold functions. In FOCS, pp. 634-649, 2015.

Bun, M., Nissim, K., and Stemmer, U. Simultaneous private learning of multiple concepts. In ITCS, pp. 369-380. ACM, 2016.

Chaudhuri, K. and Hsu, D. Sample complexity bounds for differentially private learning. In COLT, volume 19 of JMLR Proceedings, pp. 155-186, 2011.

Dwork, C. and Feldman, V. Privacy-preserving prediction. In COLT, pp. 1693-1702, 2018.

Dwork, C. and Roth, A. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211-407, 2014.

Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In TCC, volume 3876 of LNCS, pp. 265-284. Springer, 2006.

Dwork, C., Rothblum, G. N., and Vadhan, S. P. Boosting and differential privacy. In FOCS, pp. 51-60. IEEE Computer Society, 2010.
Eisenstat, D. and Angluin, D. The VC dimension of k-fold union. Inf. Process. Lett., 101(5):181-184, 2007. doi: 10.1016/j.ipl.2006.10.004.

Erlingsson, U., Pihur, V., and Korolova, A. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In CCS, 2014.

Feldman, V. and Xiao, D. Sample complexity bounds on differentially private learning via communication complexity. SIAM J. Comput., 44(6):1740-1764, 2015.

Gupta, A., Ligett, K., McSherry, F., Roth, A., and Talwar, K. Differentially private combinatorial optimization. In SODA, pp. 1106-1125, 2010.

Kasiviswanathan, S. P., Lee, H. K., Nissim, K., Raskhodnikova, S., and Smith, A. What can we learn privately? SIAM J. Comput., 40(3):793-826, 2011.

Kearns, M. J. Efficient noise-tolerant learning from statistical queries. J. ACM, 45(6):983-1006, 1998.

Kearns, M. J. and Vazirani, U. V. An Introduction to Computational Learning Theory. MIT Press, Cambridge, Massachusetts, 1994.

McSherry, F. and Talwar, K. Mechanism design via differential privacy. In FOCS, pp. 94-103. IEEE Computer Society, 2007.

Thakurta, A., Vyrros, A., Vaishampayan, U., Kapoor, G., Freudiger, J., Sridhar, V., and Davidson, D. Learning new words. US Patent 9594741, 2017.

Vadhan, S. The complexity of differential privacy, 2016.