# privately_counting_partially_ordered_data__05a66e06.pdf Published as a conference paper at ICLR 2025 PRIVATELY COUNTING PARTIALLY ORDERED DATA Matthew Joseph, M onica Ribero & Alexander Yu Google Research NY We consider differentially private counting when each data point consists of d bits satisfying a partial order. Our main technical contribution is a problem-specific K-norm mechanism that runs in time O(d2). Experiments show that, depending on the partial order in question, our solution dominates existing pure differentially private mechanisms, and can reduce their error by an order of magnitude or more. 1 INTRODUCTION A differentially private (DP) (Dwork et al., 2006) statistic incorporates randomness to obscure the contribution of any single data point. To achieve this, differentially private algorithms typically require and, if necessary, enforce restrictions on the data points. A common restriction is ℓp sensitivity: if the statistic T is a vector in Rd, adding a user s data point must not change the ℓp norm of T by more than p. This framework is generic enough to apply to a wide variety of problems, but it can yield relatively poor performance for statistics whose sensitivity is not tightly characterized by an ℓp norm. One such statistic is counting on partially ordered data. As a straightforward running example, the National Health Interview Survey (Services & Medicaid, 2024) has been administered annually since 1957 by the Center for Disease Control. The Hypertension section of the 2024 survey first asks if the respondent has been told they have hypertension and then asks if they have been told multiple times. The survey structure requires a respondent to answer yes to the first question in order to answer yes to the second; equivalently, if the answers are given as binary vector x, the survey structure imposes partial order x2 x1. A count of this partially ordered data then records the number of yes responses to each question. Other examples of partially ordered data include software library dependencies, coursework prerequisites, and any data encoded using directed acyclic graphs. It is possible to apply ℓp sensitivity to partially ordered data. In the survey example, a respondent can answer yes to every question, so the vector of counts has ℓ1 sensitivity d. However, this analysis also protects against inputs that cannot occur, as the binary and order constraints means that no response can have the form (d, 0, . . . , 0) or (0, 1, . . .). In this sense, straightforward application of Laplace (or other ℓp mechanism) noise may be loose . 1.1 CONTRIBUTIONS Our main technical contribution is an efficient implementation of the K-norm mechanism (Hardt & Talwar, 2010) for counting partially ordered data. Our solution can be applied to any partial order, satisfies pure differential privacy, and runs in time O(d2) (Theorem 3.15). This provides a more than cubic speedup over the fastest general polytope sampler and is provably exponentially faster than any ℓp ball rejection sampler (Theorem 3.17). Experiments using both synthetic and real-world partial orders demonstrate significant error reductions over existing DP mechanisms (Section 4). 1.2 RELATED WORK Joseph & Yu (2024) previously constructed efficient implementations of the K-norm mechanism to obtain better noise distributions for the problems of (contribution-bounded) sum, count, and vote. For a longer discussion of the K-norm mechanism, its variants, and its relation to other queryanswering solutions like the projection (Nikolov et al., 2013; Nikolov, 2023), matrix (Li et al., {mtjoseph, mribero, alexjyu}@google.com. Published as a conference paper at ICLR 2025 2015; Mc Kenna et al., 2018), and factorization (Edmonds et al., 2020; Nikolov & Tang, 2023) mechanisms, we refer the interested reader to their paper; here, we note that these mechanisms can be viewed as general but possibly slow approximations of optimal constructions of problem-specific private additive noise. In contrast, we provide a problem-specific but optimal and fast construction. The simplest difference between our work and that of Joseph & Yu (2024) is that we consider different problems that require different sampling techniques. In more detail, these different problems are characterized by different combinatorial objects. In Joseph & Yu (2024), their sum and vote polytopes are based on a truncated hypercube and permutohedron, respectively. Here, the poset polytope is most closely related to the double poset polytope. Similarly, though our samplers are also based on identifying an efficiently sampleable triangulation of the polytope, the triangulations themselves are different: the sum polytope triangulation is indexed by permutations with a fixed number of ascents, while ours is indexed by pairs of non-interfering chains in the Birkhoff lattice. 2 PRELIMINARIES 2.1 DIFFERENTIAL PRIVACY The material in this subsection is largely adapted from the preliminaries of Joseph & Yu (2024), who derived efficient instances of the K-norm mechanism for different problems. We use pure differential privacy, in the add-remove model. Definition 2.1 (Dwork et al. (2006)). Databases X, X from data domain X are neighbors X X if they differ in the presence or absence of a single record. A randomized mechanism M : X O is ε-differentially private (DP) if for all X X X and any S O, PM [M(X) S] eεPM [M(X ) S] . Our solution will apply the K-norm mechanism. Lemma 2.2 (Hardt & Talwar (2010)). Given statistic T with -sensitivity and database X, the K-norm mechanism has output density f X(y) exp ε y T(X) and satisfies ε-DP. We apply an instance of the K-norm mechanism chosen using our statistic s sensitivity space. Definition 2.3 (Kattis & Nikolov (2017); Awan & Slavkovi c (2021)). The sensitivity space of statistic T is S(T) = {T(X) T(X ) | X, X are neighboring databases}. Sensitivity spaces that induce norms are particularly interesting, as the corresponding K-norm mechanisms enjoy certain notions of optimality. Lemma 2.4. If set W is convex, bounded, absorbing (for every u Rd, there exists c > 0 such that u c W), and symmetric around 0 (u W u W), then the function W : Rd R 0 given by u W = inf{c R 0 | u c W} is a norm, and we say W induces W . Definition 2.5. If statistic T has a sensitivity space convex hull CH(S(T)) satisfying Lemma 2.4, we say T induces a norm, and call CH(S(T)) the induced norm ball of T. Awan & Slavkovi c (2021) show that the instance of the K-norm mechanism using a statistic s induced norm is optimal with respect to entropy and conditional variance (see Sections 3.2 and 3.3 of their paper for details). The only remaining challenge is running the resulting mechanism. Lemma 2.6 (Hardt & Talwar (2010)). Running the K-norm mechanism reduces to sampling the unit ball for the norm . Our main technical contribution is therefore a fast sampler for the induced norm ball for counting partially ordered data. This is a d-dimensional polytope with Ω(d) constraints. Applying the best general sampler takes time O(d2+ω) where ω 2 is the matrix multiplication exponent (Theorem 1.5 of Laddha et al. (2020)), and achieving a close enough approximation for O(ε)-DP adds another O(d) factor (Appendix A of Hardt & Talwar (2010)). 2.2 PARTIALLY ORDERED DATA Throughout, we assume that our goal is to compute a sum T(X) = P x X x of length-d binary vectors with bits that satisfy a partial order. Published as a conference paper at ICLR 2025 Definition 2.7. A partially ordered set or poset (P, ) is a finite set P together with a reflexive, transitive, and anti-symmetric relation . If p1, p2 P and p1 p2, we say that p1 is a child of p2 and p2 is a parent of p1. A linear extension of poset (P, ) is a total ordering (P, ) that preserves , i.e. an ordering of the elements of P given by p1 p2 . . . pd such that pi pj implies i < j. We assume that a poset is given in the form of a |P| |P| binary matrix M where Mij = 1 pi pj. We say a length-|P| binary vector x is partially ordered if pi pj implies xi xj. We also assume that each poset has a single root. Assumption 2.8. The poset (P, ) has a root r P such that p r for all p P. For surveys, this corresponds to a question asking if the respondent wants to take the survey at all, and allows us to count nonrespondents. Our mechanism is optimal for these posets in the sense described by Awan & Slavkovi c (2021). Note also that if we are instead given a poset without a root, we can simply add a root, apply our mechanism, and ignore the added dimension in the final output, as done in our experiments. By post-processing, this does not alter the privacy guarantee. 3 AN EFFICIENT MECHANISM We start with a high level summary of the sampler. Since our goal is to sample the norm ball induced by our statistic, we first show that counting partially ordered data induces a norm, and that the induced norm ball can be expressed as a combinatorial object called a double order polytope (Section 3.1). This connection allows us to leverage a triangulation for double order polytopes developed by Chappell et al. (2017). The problem then reduces to sampling a single simplex from the triangulation. However, the simplices in this triangulation are indexed by pairs of non-interfering chains (Definition 3.10) essentially disjoint sets of elements in the poset ground set with a special structure given by the order relation and it is not obvious how to sample such an object uniformly at random. To do so, we prove a bijection between the pairs of non-interfering chains of the triangulation and a family of new structures called extended bipartitions (Definition 3.12), for which we construct a uniform sampler (Section 3.3). Putting these elements together yields an overall sampler of a simplex of the triangulation, from which we can easily sample a single point (Section 3.4). 3.1 POSET BALL As described in Section 2.1, an efficient application of the optimal K-norm mechanism for counting partially ordered data reduces to sampling the norm ball induced by its sensitivity space. We start with some basic results about the structure of this norm ball. Throughout, we denote our original poset by P . Recall from Assumption 2.8 that we assume the poset (P , ) ordering our data points has a root r such that p r for all p P ; we omit this assumption for the rest of this section. Lemma 3.1. Let T(P , )(X) = P x X x be a statistic summing data points x that satisfy a partial order (P , ) (Definition 2.7). Then T(P , ) induces a norm. Proof. We verify the conditions in Lemma 2.4. We write T = T(P , ) for brevity. CH(S(T)) is a convex hull and so immediately convex. Let V (CH(S(T))) be the set of vertices of CH(S(T)). By the definition of sensitivity space, the vertices of CH(S(T)) are the length-|P | partially ordered binary vectors and their negations. There are finitely many vertices, so CH(S(T)) is bounded. Any point p CH(S(T)) is a convex combination of the vertices P civi, and vi V (CH(S(T))) implies vi V (CH(S(T))), so p = P ci( vi) CH(S(T)). Thus, CH(S(T)) is symmetric around the origin. It remains to show CH(S(T)) is absorbing. Consider any linear extension (P , ) of (P , ). Then (P , ) has all of the relations in (P , ), so S(T(P , )) S(T(P , )), and CH(S(T(P , ))) CH(S(T(P , ))). Define {v1, . . . , vd} to be the set of binary vectors where vi is the vector whose first i entries are 1 while the rest are 0. Then V (CH(S(T(P , )))) is { v1, ..., vd}. Since {v1, ..., vd} forms a basis of Rd, there is an invertible linear map A mapping {v1, ..., vd} to the standard basis. In particular, A(CH(S(T(P , )))) is the unit ℓ1 ball Bd 1. Let b be a small origin centered ball around 0 in Bd 1. Then A 1(b) is a small origin Published as a conference paper at ICLR 2025 Figure 1: The solid orange border outlines the d = 2 path graph (x2 x1) poset ball. The dashed teal border is the minimum containing ℓ1 ball, and the dotted purple border is the minimum containing ℓ ball. centered ellipsoid in CH(S(T(P , ))), i.e. 0 is an interior point of CH(S(T(P , ))). Since S(T(P , )) S(T(P , )), it follows that CH(S(T(P , ))) is absorbing. We shorthand T s induced norm ball as the poset ball. As a simple example, the poset ball for the path graph x2 x1 appears in Figure 1. The next result shows that the poset ball can be reinterpreted as an object from combinatorics, the double order polytope. This interpretation allows us to apply results about the double order polytope from Chappell et al. (2017). Reasoning about the double order polytope requires some additional definitions related to posets. Definition 3.2. A double poset is a triple (P, +, ) where P+ = (P, +) and P = (P, ) are posets on the same ground set. A double poset is compatible if P+ and P have a shared linear extension. Definition 3.3 (Chappell et al. (2017)). Let P1, P2 Rd be polytopes. The Cayley sum of P1 and P2 is defined by P1 P2 = CH({1} P1 { 1} P2). Define P1 P2 = P1 P2. The order polytope O(P) of a poset (P, ) is the set of all order preserving functions f : P [0, 1] such that 0 f(a) f(b) 1 for all a, b P where a b. The double order polytope O2(P) of a double poset (P, +, ) is O(P+) O(P ).1 We can now state the connection between the poset ball and double order polytope. Lemma 3.4. The poset ball for poset (P , ) is O2(P r), the double order polytope on the double order poset (P r, , ). Proof. Let S be the set of functions f : P {0, 1} such that 0 f(a) f(b) 1 for all a, b P where a b. We can view functions in S as |P |-dim binary vectors in the natural way. Recall that the poset ball is CH(S S). Equivalently, it is CH(O(P ) O(P )). We apply the following general result about the convex hulls of reflected polytopes. A similar result is stated without proof in Chappell et al. (2017); for completeness, a proof appears in Appendix A. Claim 3.5. Let P be a (d 1)-dim convex polytope in Rd lying in the hyperplane x1 = 1. Let P = P {0}. Then V (CH(P P )) = V (P) V ( P). By Claim 3.5, the vertices of CH(O(P ) O(P )) are V (O(P )) V ( O(P )) {0}. The vertices in V (O(P )) {0} are of the form 1 v where v is a vertex of V (O(P r)), and the first dimension corresponds to the r-dimension. In other words, the vertices of this shape are exactly the vertices of O2(P r) , the double order polytope on the double poset (P r, , ). By Lemma 3.4, our new goal is to construct a fast sampler for the double order polytope O2(P r). 1This is a slight modification of the definition of the double order polytope given by Chappell et al. (2017), but the change does not meaningfully affect the properties of this structure (see Remark A.1 for details). Published as a conference paper at ICLR 2025 3.2 DOUBLE ORDER POLYTOPES AND DOUBLE CHAIN POLYTOPES We sample O2(P r) by constructing a triangulation that decomposes it into disjoint d-dimensional simplices of equal volume. By this construction, it suffices to sample one of the simplices uniformly at random and then return a uniform random sample from the chosen simplex. The following result about uniformly sampling a simplex is folklore; we take the statement from Joseph & Yu (2024). Lemma 3.6. A collection of points x0, . . . , xd Rn with n d are affinely independent if Pd i=0 αixi = 0 and Pd i=0 αi = 0 implies α = 0. A d-simplex is the convex hull of d + 1 affinely independent points and can be uniformly sampled in time O(d log(d)). Since sampling a simplex is easy, it remains to find a triangulation of O2(P r) into simplices. For completeness, the following definition formalizes the notion of a triangulation. Definition 3.7. Let P Rd be a polytope. For 0 k d 1, a proper k-dim face of P is an intersection I of the form H P of a (d 1)-dim hyperplane H with the boundary of P, such that the I P and dim(I) = k. A subdivision of P is a collection of d-dim polytopes P1, ..., Pm such that P = m i=1Pi and for each 1 i < j m the (possibly empty) set Pi Pj is a proper face of each of Pi and Pj. A triangulation of P is a subdivision containing only simplices. A simplex with vertices in a lattice Λ is unimodular if it has minimal volume. A triangulation is unimodular if all of its simplices are unimodular. We use a result from Chappell et al. (2017) that provides a mapping between the double order polytope and a related geometric object, the double chain polytope. Definition 3.8 (Chappell et al. (2017)). Given poset P, a chain is a sequence of (strictly) increasing elements in P. The chain polytope C(P) is the set of functions g : P R 0 such that g(a1) + ... + g(ak) 1 for all chains a1 ... ak of P. The double chain polytope C2(P) of a double poset (P, +, ) is C(P+) C(P ) Lemma 3.9 (Theorem 4.3 Chappell et al. (2017)). For any compatible double poset (P, +, ), there is an explicit homeomorphism between the double chain polytope C2(P) and the double order polytope O2(P). Conveniently, the mapping transfers a triangulation of the double chain polytope to a triangulation of the double order polytope also due to Chappell et al. (2017). This triangulation will be indexed by objects called non-interfering pairs of chains. Definition 3.10 (Chappell et al. (2017)). Given poset (P, ), I P is a filter if x I and x y implies y I, and the Birkhoff poset J (P) is the poset given by the filters of P ordered by inclusion. Given double poset (P, +, ), for chains of filters C+ J (P+) and C J (P ), we say the pair of chains C = (C+, C ) is non-interfering if min(J+) min(J ) = for all J+ C+, J C , where min( ) denotes the set of minimal elements of the filter. Moreover, we denote by ni(P) the collection of non-interfering pairs of chains. We therefore obtain the following triangulation of the double order polytope indexed by noninterfering pairs of chains. This follows from Corollary 4.1, Theorem 4.3, and Equation 13 of Chappell et al. (2017). Lemma 3.11 (Chappell et al. (2017)). Let (P, +, ) be a double poset. For L J (P), define F(L) = CH({1J | J L}) R|P |. For non-interfering pair of chains C = (C+, C ) of size |C| = |C+| + |C | = |P| + 1, define map F(C) = F(C+) F(C ). Then { F(C) | C ni(P)} is a triangulation of O2(P) into |P|-dimensional simplices. Moreover, given C, F(C) can be computed in time O(d2). As a result of this triangulation, it now suffices to uniformly sample a non-interfering pair of chains associated with P r of size d + 1. This will be easier, because we can construct a bijection between these pairs and extended bipartitions of P r , and extended bipartitions are conceptually simpler than non-interfering pairs of chains. 3.3 EXTENDED BIPARTITIONS Definition 3.12. Given poset (P, ), an extended bipartition is a quadruple ((A, ), (B, ), A, B) where (A, ), (B, ) are subposets of P such that A B = , A B = P, and A, B are linear extensions of (A, ), (B, ) respectively. Published as a conference paper at ICLR 2025 At a high level, given a non-interfering pair of chains, it can be shown that the set difference between the minimal elements of adjacent filters in each chain has size exactly 1, and that these successive differences form a linear extension of the subposet induced by those elements. Moreover, the ground sets of the two linear extensions form a bipartition of the original ground set. Conversely, given an extended bipartition, we can define a non-interfering pair of chains of filters by treating each suffix of each linear extension of the bipartition as the minimal elements of a filter. Due to space constraints, the full proof of this result appears in Appendix A. Lemma 3.13. There is a bijection between the set of non-interfering pairs of chains C = (C+, C ) of the double poset (P r, , ) of size |C| = |C+| + |C | = d + 1 and the set of extended bipartitions of P r. Moreover, given an extended bipartition, its corresponding non-interfering pair of chains can be computed in time O(d2). The problem is now reduced to uniformly sampling an extended bipartition. Constructing such a sampler is the last technical step in our argument. Lemma 3.14. Given poset (P, ) of size |P| = d, there is an algorithm to uniformly sample an extended bipartition of P in time O(d2). Proof. We start by reformulating the matrix M associated with P (Definition 2.7) as a more suitable data structure GP : iterate through the matrix M and, for each element v, construct a list pv of its parent elements and a list cv of its child elements. This takes a single O(d2) pass through M. Note that GP enables us to compute a maximal element in time O(d) by starting at any node and following parent pointers. Our algorithm, A, receives a GP representation of a poset P and uses recursion as follows. At each step, compute maximal element v P in O(d) and then construct GP v from GP by deleting v and the associated parent pointers from its children in time O(d). Then compute A(GP v) to obtain extended bipartition ((A, ), (B, ), A, B) of P v. We will use this extended bipartition of P v to construct one for P. Note that A and B can be represented by increasing ordered lists of elements of A and B respectively. Let aj be the A-largest element that is -smaller than v in a1 A A ak. To find aj, iterate from right to left (i.e., A-largest to A-smallest) over the A list and check whether the current node w has the property that Mw,v = 1, stopping when the first child of v is found. Define bj similarly. Then modifying A by inserting v anywhere to the right of aj in a1 ak produces a linear extension A and the extended bipartition ((A {v}, ), (B, ), A , B) of P. We call the set of all such insertion points the valid placements for A, and define the valid placements for B similarly. Let L be the union of the valid placements for A and B. By the above, computing L and uniformly sampling a valid placement takes time O(d). Finally, inserting v into a linear extension takes time O(d) assuming the linear extensions are implemented using linked lists. Overall, we do O(d) work at each recursive step. In the base case that |P| = 1, we flip a coin to either return ((P, ), P , ) or (( , P), , P ) where P and are trivial linear extensions. Since we do O(d) work at each recursive step and there are |P| = d steps, the total run time is O(d2). It remains to verify that the sampling is uniform. For any extended bipartition of (P, ) given by ((A, ), (B, ), A, B), removing maximal v yields a smaller extended bipartition of the subposet (P v, ) such that if aj and bj are defined with respect to the smaller extended bipartition, then v is either to the right of aj in A of A or to the right of bj in B of B. 3.4 OVERALL ALGORITHM Having sampled an extended bipartition, Lemma 3.13 already verified that we can convert it to a non-interfering pair of chains efficiently, and Lemma 3.11 shows how to map that to a simplex in the double order polytope in time O(d2). Sampling the simplex takes time O(d log(d)) (Lemma 3.6), so putting these steps together yields the overall sampler. Pseudocode appears in Algorithm 1. Theorem 3.15. The poset ball for (P , ) can be sampled in time O(d2). Published as a conference paper at ICLR 2025 Algorithm 1 Poset Ball Sampler 1: Input: Poset P satisfying Assumption 2.8 2: Uniformly sample an extended bipartition (N+, N , A, B) (Lemma 3.14) 3: Convert (N+, N , A, B) into its non-interfering chain C = (C+, C ) (Lemma 3.13) 4: Compute the vertices of the simplex F(C) = F(C+) F(C ) (Lemma 3.11) 5: Return a uniform sample from F(C) (Lemma 3.6) 3.5 REJECTION SAMPLING THE POSET BALL IS INEFFICIENT We complement our efficient sampler with a negative result for rejection sampling the poset ball using any ℓp ball. First, the best possible candidate for rejection sampling is the ℓ ball. Lemma 3.16. The unit ℓ ball Bd is the minimum volume ℓp ball containing the poset ball. Proof. Any ℓp ball that contains (1, .., 1) must also contain all points in {0, 1}d {(1, ..., 1)} since each of these other points has strictly smaller ℓp norm. In particular, any ℓp ball that contains (1, ..., 1) must contain Bd . Since the poset ball contains (1, . . . , 1), the claim follows. Our overall result thus needs only to analyze rejection sampling from Bd . Theorem 3.17. Rejection sampling the poset ball using any ℓp ball is inefficient. Proof. By Lemma 3.16, it suffices to consider Bd . Consider the poset (P, ) on {r, p1, . . . , pd 1} with set of relations {pi r}d 1 i=1 . Then for any other poset P , its poset ball has smaller volume as R(P) R(P ) means that P can be formed from P by intersecting P with half-spaces defined by the relations corresponding to the relations in R(P ) R(P). Then P s poset ball is a cylinder with bases [0, 1]d 1 and height 2 (in the direction of the root axis) so has volume 1d 1(2) = 2. However, |Bd | = 2d. It follows that rejection sampling requires Ω(2d) samples from Bd in expectation. 4 EXPERIMENTS This section evaluates our K-norm mechanism on a variety of poset structures . First, we derive a general result about the squared expected ℓ2 norm of ℓp balls (Section 4.1). Based on this result, the strongest comparison for our algorithm is the ℓ mechanism. We use this as the baseline for experiments on path posets, random posets (Section 4.3) and the National Health Interview Survey (Services & Medicaid, 2024) (Section 4.4). An evaluation of runtime appears in Section 4.5. Note that our mechanism augments a given d-element poset with a root to ensure Assumption 2.8 as necessary. Since the baseline ℓ mechanism is applied to the d-dimensional problem, we ignore the root dimension of the (d+1)-dimensional noise vector produced by our mechanism when comparing squared ℓ2 norms. As discussed in Section 2.2, by post-processing, this does not affect privacy. 4.1 CHOICE OF BASELINE This section justifies our choice of the ℓ mechanism i.e., the K-norm mechanism instantiated with the ℓ norm as the baseline in our experiments. This choice relies on the following result, proved in Appendix B. Lemma 4.1. Let E2 2(X) denote the expected squared ℓ2 norm of a uniform sample from X, and let Bd p denote the d-dimensional unit ℓp ball. Then E2 2(Bd p) = d 3 3d d+2 Γ( d , where Γ is the gamma function, and E2 2(Bd ) = d/3. Let rp,d = d1/p be the minimum radius for which rp,d Bd p contains Y(P ). By the above, E2 2(rp,d Bd p) = r2 p,d E2 2(Bd p). Remark 4.2 from Hardt & Talwar (2010) establishes that, fixing privacy parameters, the expected squared ℓ2 norm of a sample from the K-norm mechanism is proportional to the expected squared Published as a conference paper at ICLR 2025 0 10 20 30 40 50 dimension d expected squared 2 norm p = 1 p = 2 p = 5 p = 10 p = Figure 2: r2 p,d E2 2(Bd p) (see Lemma 4.1). ℓ2 norm of its norm ball. Thus, to identify the best ℓp-norm mechanism, it suffices to identify the p minimizing E2 2(rp,d Bd p) in Lemma 4.1. Figure 8 plots the analytic expression derived in Lemma 4.1 for various p and d and leads us to take the ℓ -norm mechanism as the strongest baseline. We note briefly that Laplace (ℓ1) noise is approximately 2 (d = 2) to 5 (d = 50) times worse than ℓ noise by squared ℓ2 norm, so the following results are even stronger when evaluated against the Laplace mechanism. Furthermore, for (1, 10 6)-DP, the analytical Gaussian mechanism (Balle & Wang, 2018) is also dominated by the ℓ mechanism over the range d 50 featured in our experiments (see plot in Appendix B.2), so we omit it here. 4.2 PATH POSET We start with the path poset defined by the ground set P = {1, ..., d} and the corresponding linear order 1 2 ... d (see Figure 1 for an illustration of its poset ball for d = 2). Figure 3 demonstrates that our K-norm mechanism improves significantly over the baseline on the path poset, up to over an order of magnitude near d = 50. As subsequent experiments will develop, this dramatic error reduction is a consequence of the path graph s extreme depth. Informally, it is the graph on d elements with the smallest poset ball, since any other graph can be created from the path graph by repeatedly moving the leaf element of the original path and grafting it somewhere higher up the tree, and this operation strictly decreases the number of relations in the poset. 0 10 20 30 40 50 depth poset norm/ norm Figure 3: Each point records the average mean squared ℓ2 norm ratio between the poset and ℓ balls for the path graph over 100 trials for each of various d. Published as a conference paper at ICLR 2025 4.3 RANDOM POSETS The second set of experiments features random posets. The posets are generated using the algorithm given by Melanc on et al. (2001) for uniformly sampling a directed acyclic graph on a fixed number of uniquely labeled vertices. As their algorithm has runtime O(d4), we restrict attention to relatively small values for the number of poset elements d. The first random poset experiment examines our algorithm s improvement over the ℓ mechanism as d grows. Figure 4 demonstrates that our algorithm s advantage in terms of expected squared ℓ2 norm widens with d, increasing to over 90% at d = 40. 5 10 15 20 25 30 35 40 d poset norm/ norm Figure 4: Each point records the average mean squared ℓ2 norm ratio between the poset and ℓ balls over 100 trials of random posets. A lower value means our mechanism achieves lower error. The second and third random poset experiments break this data out along two dimensions (Figure 5). Fixing d = 10 (ignoring the root), we plot improvement as a function of both the depth (longest chain of relations) and number of relations in the poset. Both numbers are computed on the transitive reduction of the directed acyclic graph corresponding to the poset, i.e., the directed acyclic graph that minimizes the number of edges while preserving all paths of the poset. As observed for d, at a high level we find that larger and richer poset structures lead to correspondingly more dramatic improvements, up to a 4x reduction in error. 5 6 7 8 9 10 depth poset norm/ norm 20 22 24 26 28 30 num_edges poset norm/ norm Figure 5: Each point records the average mean squared ℓ2 norm ratio between the poset and ℓ balls over at least 100 trials. We omit extremes of both depth and number of edges, as they do not occur enough times in our overall sample of 5000 random posets to cross the 100-sample threshold. 4.4 NATIONAL HEALTH INTERVIEW SURVEY The final set of experiments uses the National Health Interview Survey (NHIS) (Services & Medicaid, 2024). As described in the introduction, the survey includes or omits certain questions depending on previous answers. This induces a poset, and our experiments use either the first, first two, Published as a conference paper at ICLR 2025 or first three sections of the survey (Hypertension, Cholesterol, and Asthma). The resulting posets have size from d = 4 to d = 15. As shown in Figure 6, our mechanism more roughly halves the error of the baseline mechanism. # survey sections poset ball squared ℓ2 norm / l ball squared ℓ2 norm 1 0.573 2 0.503 3 0.460 Figure 6: NHIS average mean squared ℓ2 norm ratios, from 10,000 trials each. 4.5 RUNTIME Empirically comparing our algorithm s speed to the general O(d3+ω) sampler from Laddha et al. (2020) is hard, as their algorithm is complex and relies on asymptotic statements that make accurate implementation difficult to the best of our knowledge, no public implementation of the algorithm exists. However, in addition to its O(d2) runtime, we provide experiments demonstrating the empirical speed of our algorithm over random graphs of varying depth. In Figure 7 we plot the average runtime of sampling the poset ball as the dimension d varies. On a 2 CPU machine with 32GB RAM, our method takes less than half a second for any of the d used in our experiments. 0 10 20 30 40 50 dimension d average runtime (s) Figure 7: Poset ball sampler average runtime to generate one sample. For each depth we average over 10 different uniformly random posets and over 10 samples of the mechanism. 5 DISCUSSION The material above demonstrates that the K-norm mechanism offers strong utility improvements over existing pure differentially private mechanisms for privately counting partially ordered data without sacrificing efficiency. A natural direction for future work is to consider the same problem under approximate differential privacy. There, elliptic Gaussian noise is the primary tool, and the geometric problem changes from sampling the induced norm ball to computing the smallest (in terms of expected squared ℓ2 norm) ellipse enclosing the induced norm ball. Unfortunately, while Joseph & Yu (2024) solved this problem for count and vote, their solutions relied on the norm balls being symmetric around the vector (1, . . . , 1), and this does not hold for the poset ball in general. This suggests that new techniques may be required to obtain a similar result for partially ordered data. Published as a conference paper at ICLR 2025 Jordan Awan and Aleksandra Slavkovi c. Structure and sensitivity in differential privacy: Comparing k-norm mechanisms. Journal of the American Statistical Association, 2021. Borja Balle and Yu-Xiang Wang. Improving the gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. In International Conference on Machine Learning (ICML), 2018. Thomas Chappell, Tobias Friedl, and Raman Sanyal. Two double poset polytopes. SIAM Journal on Discrete Mathematics, 2017. Cynthia Dwork, Frank Mc Sherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference (TCC), 2006. Alexander Edmonds, Aleksandar Nikolov, and Jonathan Ullman. The power of factorization mechanisms in local and central differential privacy. In Symposium on the Theory of Computing (STOC), 2020. Moritz Hardt and Kunal Talwar. On the geometry of differential privacy. In Symposium on the Theory of Computing (STOC), 2010. Matthew Joseph and Alexander Yu. Some Constructions of Private, Efficient, and Optimal K-Norm and Elliptic Gaussian Noise. In Conference on Learning Theory (COLT), 2024. Assimakis Kattis and Aleksandar Nikolov. Lower bounds for differential privacy from gaussian width. In Symposium on Computational Geometry (SOCG), 2017. Aditi Laddha, Yin Tat Lee, and Santosh Vempala. Strong self-concordance and sampling. In Symposium on Theory of Computing (STOC), 2020. Chao Li, Gerome Miklau, Michael Hay, Andrew Mc Gregor, and Vibhor Rastogi. The matrix mechanism: optimizing linear counting queries under differential privacy. The VLDB Journal, 2015. Ryan Mc Kenna, Gerome Miklau, Michael Hay, and Ashwin Machanavajjhala. Optimizing Error of High-Dimensional Statistical Queries under Differential Privacy. The VLDB Journal, 2018. Guy Melanc on, Isabelle Dutour, and Mireille Bousquet-M elou. Random generation of directed acyclic graphs. Electronic Notes in Discrete Mathematics, 2001. Aleksandar Nikolov. Private query release via the johnson-lindenstrauss transform. In Symposium on Discrete Algorithms (SODA), 2023. Aleksandar Nikolov and Haohua Tang. Gaussian Noise is Nearly Instance Optimal for Private Unbiased Mean Estimation. ar Xiv preprint ar Xiv:2301.13850, 2023. Aleksandar Nikolov, Kunal Talwar, and Li Zhang. The geometry of differential privacy: the sparse and approximate cases. In Symposium on the Theory of Computing (STOC), 2013. Center for Medicare Services and Medicaid. National health interview survey. https: //www.cms.gov/about-cms/agency-information/omh/resource-center/ hcps-and-researchers/data-tools/sgm-clearinghouse/nhis, 2024. A SAMPLER PROOFS Remark A.1. The exposition from Chappell et al. (2017) defines the double order polytope of a double poset (P, +, ) as O(2P+) O(2P ), defines the double chain polytope as C(2P+) C(2P ) and for a pair of chains C = (C+, C ) it defines F(C) = 2F(C+) 2F(C ), and F (C) = 2F (C+) 2F (C ). Removing these factors of 2 from the definitions does not change the combinatorial structure of the double order polytope or double chain polytope since this removal simply scales down the (d 1) dimensions of Rd that excludes the dimension that is added from the operation. Consequently, the triangulation of lemma 3.11 still holds. The reason we remove the factors of 2 is because the sensitivity space we are interested in is CH(O(P ) O(P )) which is equivalent to our definition of the double order polytope O2(P r) where r is the global maximum of P. Published as a conference paper at ICLR 2025 Claim A.2. Let P be a (d 1)-dim convex polytope in Rd lying in the hyperplane x1 = 1. Let P = P {0}. Then V (CH(P P )) = V (P) V ( P). Proof. We have V (CH(P P )) = V (CH(V (P ) V ( P ))) V (P ) V ( P ) = V (P) V ( P) {0} where the comes from the fact that the vertices of the convex hull of a finite set is a subset of that set. Since CH(P P ) has antipodal symmetry, then x CH(P P ) implies x CH(P P ) which means that the line segment between x and x contains a small 1-dim neighborhood B around the origin such that B CH(P P ), so the origin is not a vertex of CH(P P ), so we have that V (CH(P P )) V (P) V ( P). Now, suppose that v V (P) is not a vertex of CH(P P ). We call a convex combination of vertices non-singleton if its support contains at least 2 vertices. The supposition implies there is a non-singleton convex combination of vertices in V (P ) V ( P ) that equals v. None of vertices in the support of this combination can lie in V ( P ) because then the x1 coordinate of the combination would be less than 1 whereas the first coordinate of v is 1. So v must be a nonsingleton convex combination of points in V (P), contradicting the fact that v is a vertex of V (P). So V (P) V (CH(P P )) and similarly V ( P) V (CH(P P )). But by the previous paragraph, V (CH(P P )) V (P) V ( P), so V (CH(P P )) = V (P) V ( P). Lemma 3.13. There is a bijection between the set of non-interfering pairs of chains C = (C+, C ) of the double poset (P r, , ) of size |C| = |C+| + |C | = d + 1 and the set of extended bipartitions of P r. Moreover, given an extended bipartition, its corresponding non-interfering pair of chains can be computed in time O(d2). Proof. Suppose we have a non-interfering pair of chains C = (C+, C ) where |C| = d + 1. Write each chain of filters as Cσ = Jσ 1 ... Jσ |Cσ| for σ {+, }. Then | min(Jσ i+1) min(Jσ i )| 1 for 1 i |Cσ| 1. Any element p (min(Jσ i+1) min(Jσ i )) lies in Jσ i+1. If it also lies in Jσ i , then p min(Jσ i ) means there exists p Jσ i with p p, but Jσ i Jσ i+1 then implies that p min(Jσ i+1), a contradiction. Thus (min(Jσ i+1) min(Jσ i )) (Jσ i+1 Jσ i ), and similarly for any j > i, we have (min(Jσ j+1) min(Jσ j )) (Jσ j+1 Jσ j ). Then Jσ i+1 Jσ j implies (min(Jσ i+1 min(Jσ i )) (min(Jσ j+1 min(Jσ j )) = . Let Nσ = |Cσ| 1 i=1 (min(Jσ i+1) min(Jσ i )). The above implies |Nσ| |Cσ| 1. Moreover, N+ N = since Nσ |Cσ| 1 i=1 min(Jσ i+1) , and |C+| 1 i=1 min(J+ i+1) |C | 1 i=1 min(J i+1) = since C is non-interfering. Thus |N+ N | = |N+| + |N | |C+| + |C | 2 = |C| 2 = d 1 = |P r|. But N+ N P r, so |N+ N | = |P r|. This means that (N+, N ) partition P r and | min(Jσ i+1) min(Jσ i )| = 1 for 1 i |Cσ| 1. It follows that min(Jσ 1 ) = Jσ 1 = . This leads us to the following extended bipartition. Let ai = min(J+ |C+| i+1) min(J+ |C+| i) for 1 i |C+| 1 and bi = min(J |C | i+1) min(J |C | i) for 1 i |C | 1. Since filters are upwards closed, either ai aj for some aj {ai+1, ..., a|C+| 1} or ai is incomparable to each element of {ai+1, ..., a|C+| 1}. In either case, a1 A ... A a|C+| 1 is a linear extension of N+, and similarly b1 B ... B b|C | 1 is a linear extension of N , so ((N+, ), (N , ), A, B) is an extended bipartition of P r. Conversely, given an extended bipartition (N+, N , a1 A ... A a|N+|, b1 B ... B b|N |), define sets m+ 1 = , m 1 = , m+ i = min( i 1 j=1a|N+|+1 j) for 2 i |N+| + 1 and m i = min( i 1 j=1b|N |+1 j) for 2 i |N | + 1. Define Jσ i to be the filter with minimal elements mσ i for 2 i |Nσ| + 1 and let Jσ 1 = . Let Cσ = Jσ 1 ... Jσ |Nσ|+1. Then (C+, C ) is a non-interfering pair with |C+| + |C | = |N+| + |N | + 2 = d + 1, and this map from the set of extended bipartitions to the set of non-interfering pairs of chains is an inverse of the mapping of the previous paragraphs. It remains to verify the runtime of computing the non-interfering pair of chains we constructed above. Without loss of generality, we focus on J+ i . Filter J+ 1 = and J+ 2 is the set of elements with Published as a conference paper at ICLR 2025 a 1 in row |N+| of matrix M, computed in time O(d). Now, suppose we have computed J+ i and want to compute Ji+1. By comparing rows |N+| + 1 i and |N+| + 2 i of M, we can compute the set pi of parents of a|N+|+1 i that are not parents of a|N+|+2 i (we include a|N+|+1 i pi) in time O(d). Note that pi = Ji+1 Ji. Then J+ i+1 = J+ i pi where is disjoint union. Thus each filter of J+ 1 . . . J+ |N+|+1 can be computed in time O(d), taking time O(d2) overall. B EXPERIMENT PROOFS B.1 EXPECTED SQUARED NORM We start with a simple result about the expected squared ℓ2 norm, E2 2. Recall from Section 4.1 that E2 2(X) denotes the expected squared ℓ2 norm of a uniform sample from X. Lemma B.1. Then E2 2(X Y ) = E2 2(X) + E2 2(Y ). Proof. To uniformly sample X Y , we can uniformly sample x X, y Y and then return x y. Samples x and y embed into disjoint coordinates of x y, so x y 2 2 = x 2 2 + y 2 2. Lemma 4.1. Let E2 2(X) denote the expected squared ℓ2 norm of a uniform sample from X, and let Bd p denote the d-dimensional unit ℓp ball. Then E2 2(Bd p) = d 3 3d d+2 Γ( d , where Γ is the gamma function, and E2 2(Bd ) = d/3. Let rp,d = d1/p be the minimum radius for which rp,d Bd p contains Y(P ). By the above, E2 2(rp,d Bd p) = r2 p,d E2 2(Bd p). Proof. We use the fact that R 1 0 ta 1(1 t)b 1 = Γ(a)Γ(b) Γ(a+b) . Then E2 2(Bd p) = |Bd p| 1 Z |x1|p+...+|xd|p 1 i=1 x2 i x1... xd = |Bd p| 1 Z |x1|p+...+|xd|p 1 dx2 1 x1... xd = |Bd p| 1 Z 1 |x2|p+...+|xd|p 1 |x1|p x2... xd = |Bd p| 1 Z 1 1 dx2 1|Bd 1 p |(1 |x1|p) = 2d|Bd p| 1|Bd 1 p | Z 1 We substitute y = xp to get y = pxp 1 x x = 1 pxp 1 y Published as a conference paper at ICLR 2025 and the chain of equalities continues with E2 2(Bd p) = 2 d|Bd p| 1|Bd 1 p | Z 1 0 y 2 p (1 y) d|Bd p| 1|Bd 1 p | 3 0 y 3 p 1(1 y) d|Bd p| 1|Bd 1 p | 3 Γ 3 p Γ( d 1 p + 1)Γ( d 1 For p = , we use E2 2(Bd ) = E2 2([0, 1]d) = d E2 2([0, 1]) = d Z 1 where the first equality comes from the fact that the expectation restricted to the positive orthant does not change the expectation value because of symmetry over the orthants, and the second equality comes from Lemma B.1. We show that rp,d = d1/p. Note that rp,d is equal to the smallest radius such that rp,d Bd p contains the point (1, ..., 1) Y(P ) because if (1, ..., 1) rp,d Bd p then all other binary vectors of length d are also contained in rp,d Bd p, i.e. Y(P ) rp,d Bd p. Next, we show that E2 2(rp,d Bd p) = r2 p,d E2 2(Bd p). Since rp,d Bd p = {rp,dx : x Bd p}, E2 2(rp,d Bd p) = |Bd p| 1 Z |x1|p+...+|xd|p 1 i=1 (rp,dxi)2 x1... xd = r2 p,d|Bd p| 1 Z |x1|p+...+|xd|p 1 i=1 x2 i x1... xd = r2 p,d E2 2(Bd p). B.2 ANALYTICAL GAUSSIAN MECHANISM COMPARISON Published as a conference paper at ICLR 2025 0 10 20 30 40 50 d mean squared 2 norm error comparison Figure 8: A comparison of expected squared ℓ2 norms for the 1-DP ℓ mechanism (with ℓ sensitivity 1) and (1, 10 6)-DP analytical Gaussian mechanism (with ℓ2 sensitivity