# optimality_of_matrix_mechanism_on_ell_ppmetric__7772b4e3.pdf Published as a conference paper at ICLR 2025 OPTIMALITY OF MATRIX MECHANISM ON ℓp p-METRIC Zongrui Zou, Jingcheng Liu State Key Laboratory for Novel Software Technology, New Cornerstone Science Laboratory Nanjing University 163 Xianlin Avenue, Nanjing, Jiangsu Province, 210023, China zou.zongrui@smail.nju.edu.cn, liu@nju.edu.cn Jalaj Upadhyay Management Science & Information Systems Department Rutgers University 100 Rockafeller Road, Piscataway, NJ 08854, USA jalaj.upadhyay@rutgers.edu In this paper, we introduce the ℓp p-error metric (for p 2) when answering linear queries under the constraint of differential privacy. We characterize such an error under (ε, δ)-differential privacy in the natural add/remove model. Before this paper, tight characterization in the hardness of privately answering linear queries was known under ℓ2 2-error metric (Edmonds et al. (2020)) and ℓ2 p-error metric for unbiased mechanisms in the substitution model (Nikolov & Tang (2024)). As a direct consequence of our results, we give tight bounds on answering prefix sum and parity queries under differential privacy for all constant p in terms of the ℓp p error, generalizing the bounds in Henzinger et al. (2023) for p = 2. 1 INTRODUCTION Analysis or learning with sensitive datasets under privacy has garnered increasing attention in recent years. In this paper, we study the most fundamental question of answering linear queries on confidential dataset x Rn while preserving differential privacy (DP) (Dwork et al., 2016). Informally speaking, differential privacy captures the property of a randomized algorithm that its output distribution is relatively stable when executed on two neighboring datasets, i.e., datasets that can be formed by changing one data point. More formally, Definition 1.1 Let M : X R be a randomized algorithm, where R is the output domain. For fixed ε > 0 and δ [0, 1), we say that M preserves (ε, δ)-differential privacy if, for any measurable set S R and any pair of neighboring datasets x, y X, Pr[M(x) S] Pr[M(y) S] eε + δ. If δ = 0, we say A preserves pure differential privacy (denoted by ε-DP). Many fundamental analyses can be cast as a set of linear queries (Dwork & Roth, 2014; Vadhan, 2017): given an input x Rn, a set of m linear queries can be represented as the rows of a matrix A Rm n. The answer to the set of queries is simply the matrix-vector product Ax. Here, x, x Rn are neighboring if x x 1 1 (known as add/remove model of privacy). When these queries are answered using a privacy-preserving algorithm, M, the performance of the algorithm is usually measured in terms of its absolute error or mean squared error (eq. (13)). JL and ZZ have been supported by National Science Foundation of China under Grant No. 62472212 and the New Cornerstone Science Foundation. JU s research was funded by the Rutgers Decanal Grant no. 302918 and an unrestricted gift from Google. Published as a conference paper at ICLR 2025 In this paper, we initiate the study of ℓp p-error metric that seamlessly interpolate1 between p = 2 (squared error) to p = (absolute error): errℓp p(M, A) := max x Rn E M(x) Ax p p 1/p . (1) The error metric defined above has a natural and intuitive interpretation for data analysis. To elaborate on this, consider the most natural mechanism that adds i.i.d. noise to each answer of a set of linear queries, and let vi be the error in answering the i-th query. Then our error metric captures the p-th moment of the error, which is a random vector v Rm. By considering all p, one can identify the exact nature of the probability distribution of the error. One popular mechanism for privately answering linear queries under different error metrics is the matrix mechanism (Li et al., 2015), also known as the factorization mechanism. In the matrix mechanism, given a set of m linear queries represented by a workload matrix A Rm n, we compute a factorization LR = A (where L Rm k, R Rk n) and output L(Rx + z) for any input x Rn with an appropriately scaled Gaussian random vector z Rk. This mechanism is both unbiased (i.e., E[z] = 0) and oblivious, i.e., the distribution of z is stochastically independent of x. In this paper, we show that the optimal matrix mechanism is also optimal among all differentially private mechanisms with respect to the ℓp p metric, up to logarithmic factors: Theorem 1.2 (Informal statement of Theorem 1.4 and Theorem D.1) Fix A Rm n be a matrix representing m linear queries, and let M : Rn Rm be any (ε, δ)-DP algorithm. Then, there exists a factorization of A = LR such that Mmatrix(x) = L(Rx + z) with z N(0, R 2 1 2Ik) preserves (ε, δ)-DP and that errℓp p(Mmatrix, A) errℓp p(M, A) polylog(1/δ, m). Here, Ik Rk k is the identity matrix. To prove this, we characterize the ℓp p-error for answering linear queries under (ε, δ)-differential privacy generalizing Edmonds et al. (2020), and also obtain a characterization of errℓp p(Mmatrix, A) that is tight up to log factors, for every query matrix A and p 2. For the convenience of use, we start by stating a weaker form of our lower bound. We will see that this is an immediate corollary of our main theorem. Theorem 1.3 Let A Rm n be a matrix representing m linear queries. Then for any (ε, δ)-DP algorithm M, errℓp p(M, A) = Ωε,δ(m1/p 1/2 A 1/ n). Here, A 1 is the Schatten-1 norm of A and Ωε,δ( ) hides the dependency on the privacy parameters. To demonstrate the power of the above results, we obtain tight bounds for privately answering prefix sum and parity queries. These are two important classes of queries: for example, prefix sum is used as a subroutine in private learning (Kairouz et al., 2021) and parity queries2 are often used for hardness results (Kasiviswanathan et al., 2011). 1. (Prefix sum) In this problem, the data curator outputs P i t xi of a vector x = (x1, x2, , xn) in a differentially private manner for all t n. This is equivalent to asking linear queries with Aprefix {0, 1}n n where Aprefix is a lower-triangular matrix with non-zero entry equal to one. In Theorem 1.5, we show that, for all constant p, the ℓp p-error of prefix sum under (ε, δ)-differential privacy is Θε,δ(n1/p log(n)) and can be achieved by the same mechanism for all p = O(1); for p = ω(1), the gap between the upper and lower bound is of factor p log(n). This generalizes the result of Dwork et al. (2010) and Henzinger et al. (2023). 2. (Parity Queries). Let QP d,w = {q P (x) = Q i P xi : P [d], |P| = w} be the class of parity queries over the input domain { 1, 1}d. In Theorem 1.6, we show that for any (ε, δ)-differentially private mechanism M that takes as input d and w, errℓp p M, QP d,w = Ωε,δ m1/2+1/p 1Interpolation plays a key role in functional analysis (Riesz, 1927; Thorin, 1948), and it is one of the primary reasons for the initial support for Riesz (1910; 1913) s study of ℓp-norm despite Minkowski s skepticism (Minkowski, 1913). 2Depending on which parity queries are made, the workload matrix would consist of a subset of rows of a normalized n n Hadamard matrix Published as a conference paper at ICLR 2025 for m = d w . Since Oε,δ m1/2+1/p min{p, log(m)} is the ℓp p error of the trivial Gaussian mechanism, this is optimal whenever min{p, log(m)} = O(1). Our study is motivated and inspired by recent elegant work by Nikolov & Tang (2024), who proved the instance optimality of the matrix mechanism instantiated using correlated Gaussian noise for unbiased mean estimation. They considered the following error metric3: err NT (M, A) := max x Rn E M(x) Ax 2 p 1/2 . (2) Nikolov & Tang (2024) showed that, with the privacy notion where x x 1 O(1) and that P i [n](xi x i) = 0 (commonly known as the substitution model of privacy), matrix mechanism is instance-optimal for any unbiased mechanism under the metric defined in eq. (2)4. Our work instead focuses on obliviousness of the matrix mechanism, a more natural ℓp p metric and a different (stronger) privacy notion, i.e., it differs both in the error metric and the results: 1. To understand the difference between these two error metrics, consider the error vector v Rm. The metric used in Nikolov & Tang (2024) amounts to estimating E[(|v1|p + + |vm|p)2/p] instead of a more natural E[vp 1 + + vp m] in eq. (1). In other words, it does not explain the behavior of the error even in the case of the naive additive noise mechanisms. This is one of the primary reasons we believe our error metric is more natural. 2. They focused on instance optimality of unbiased mean estimation. While this is a wellstudied problem, it does not cover the question of the ℓp-optimality of general linear queries under the error metric defined by eq. (1) for general mechanisms. We answer this question broadly and prove equivalent results for a more natural error metric. 3. Differential privacy in the add/remove model (i.e., x x 1 1) is usually the most natural notion considered in literature of answering linear queries ( Edmonds et al. (2020); Nikolov et al. (2013); Bhaskara et al. (2012)). In contrast, Nikolov & Tang (2024) considered the substitution model for unbiased mean estimation over some convex polytopes. The sensitivity polytope related to the substitution model can be substantially smaller than that of the add/remove model (see also Appendix B.3 for a more detailed discussion). As a result, deriving a new lower bound for the stronger add/remove model is needed. From a pure analysis perspective (and as is often the case in mathematics) as well, one prefers a metric respecting the symmetry as shown in our choice of metric, the ℓp-norm, and Fp moments studied in the streaming literature. While both of the error metrics (eq. (1) and eq. (2)) converge to the same metric as p and when p 2, the mathematical object the sequence captures as a function of p is vastly different. That is, our results complement that of Nikolov & Tang (2024). 1.1 OUR CONTRIBUTIONS Our main result is a lower bound on general (ε, δ)-differentially private mechanisms for answering linear queries in high privacy regimes in terms of certain factorization norms Nikolov & Tang (2024) defined below5 : γ(p)(A) := min LR=A trp/2(LL ) R 1 2 o , where trp(U) := Pd i=1 U p ii 1/p p < maxi [d] |Uii| p = is the p-trace. Equipped with this definition, we state our lower bound: 3Nikolov and Tang confirmed with us that they did not consider the metric considered in this paper. 4We note that a Gaussian distribution is entirely characterized by its first two moments and, at a high level, eq. (2) captures the variance of the ℓp norm of the zero mean vector representing the additive error. 5Let B p q = min x p=1 Bx q. Then two commonly studied factorization norms in privacy and functional analysis denoted by γ2(A) and γF (A) are defined as γ2(A) = min LR=A{ L 2 R 1 2} and γF (A) = min LR=A{ L F R 1 2}. Both these norms are special cases of γ(p)( ) because when p = 2, trp/2(LL ) = L 2 F and when p , then tr (LL ) = maxi [d](LL )ii = L 2 2 . Published as a conference paper at ICLR 2025 Theorem 1.4 (Lower bound for (ε, δ)-DP) Fix any n, m N, ε (0, 1 2), 0 δ 1 and p [2, ). For any query matrix A Rm n, if a mechanism M : Rn Rm preserves (ε, δ)- differential privacy, then there exists a universal constant C , errℓp p(M, A) (1 δ)γ(p)(A) C ε , where δ = 2e2ε(e1/2 1) Theorem 1.4 generalizes the result of Edmonds et al. (2020) for p = 2 to all p 2. Our result can also be contrasted with the lower bound which uses discrepancy methods. It is known that the ℓ -error of an (ε, δ)-differentially private algorithm for linear queries is lower bounded by the hereditary discrepancy of the corresponding matrix A Rm n (Muthukrishnan & Nikolov, 2012), which in turn is lower bounded by γ( )/ p log(m) using its characterization in terms of a semidefinite program (Matoušek et al., 2020). Our result shows that we can get a p log(m) better lower bound. We complement this lower-bound with a tight upper bound in Appendix D (see Theorem D.1) matching it up to an O(log(1/δ) min{p, log(2m)}) factor, combining this and Theorem 1.4 gives Theorem 1.2. The meaning of γ(p)(A) in Theorem 1.4 is not immediately apparent. Thus, as one of its applications, we study explicit lower bound (with respect to n instead of γ(p)(A)) for some special types of queries that are widely used in the community of privacy. We first characterize the accuracy of prefix sum, i.e., when the query matrix Aprefix is a lower-triangular all-one matrix. The upper bound in Theorem 1.5 follows from the binary tree mechanism (Chan et al., 2011; Dwork et al., 2010) while the lower bound uses Theorem 1.4. Notably, we can extend the lower bound for prefix sum queries to all ε > 0, rather than limiting it to a high privacy regime of ε < 1/2 as in Theorem 1.4. Theorem 1.5 For any n N and any p [2, ), the matrix mechanism, Mfact, achieves the following error guarantee while preserving (ε, δ)-differential privacy: errℓp p(Mfact, Aprefix, n) = O n1/p log(n) p log(1/δ) min{p, log(n)} Further, there is no (ε, δ)-differentially private mechanism M that achieves errℓp p(M, Aprefix, n) = o (1 δ)n1/p log(n) ( 1 16, ε2, Θ We note that e3ε 1 = O(ε) in a high privacy regime where ε = O(1), so the lower and upper bound match in such a regime. Theorem 1.5 recovers the result in Henzinger et al. (2023) for p = 2 as a special case. Moreover, it exactly characterizes the error of prefix sum with respect to any ℓp p metric for all constant p. One can obtain an Ω(log(n)) lower bound on ℓ -error using a slight modification of the packing argument in Dwork et al. (2010) for δ = o(1/n). Our bound extends the packing-based lower bound to larger values of δ and all p [2, ). As another application, in Theorem 1.6 (shown in Appendix C), we characterize the lower bound on privately answering parity queries. The theorem recovers the lower bound in Section 8 of Henzinger et al. (2023) for p = 2 and Section 3.6 of Edmonds et al. (2020) when p . Theorem 1.6 Let QP d,w be the collection of parity queries. For any (ε, δ)-differentially private mechanism M for answering queries in QP d,w, the worst case ℓp p error M, QP d,w, d w (1 δ) e3ε 1 Organization of the proof. In Section 2.1, we first develop the lower bound for additive noise mechanism on arbitrary matrix with linearly independent rows. For general matrix, in Section 2.2, we remove the linear independency assumption in the start of Section 2, and then with the help of the back-box reduction from additive noise mechanism to general mechanism, we give a (ε, δ)-DP lower bound with respect to general matrix A Rm n in only high privacy regime, which establishes Theorem 1.4. Next, in Section 2.3, we prove our easy-to-use bound Theorem 1.3 based on Theorem 1.4. We derive Theorem 1.5 as a corollary of previous sections. The proof of Theorem 1.6 follows a similar reasoning and we defer it in Section C. Published as a conference paper at ICLR 2025 2 LOWER BOUND FOR (ε, δ)-DP AND ITS APPLICATION In this section, we prove our lower bound and its applications in proving lower bounds of prefix sum and parity queries in ℓp p metric. Throughout this paper, we write a b if there exists some universal constant c such that a 1 cb. For proving the lower bound in terms of (ε, δ)-differential privacy, we first consider a special class of mechanisms that adds noise sampled from an appropriate distribution to the real answer of the queries (a high-level idea of our proof is presented in Appendix B). We call such a class of mechanisms the additive noise mechanisms. Unlike Nikolov & Tang (2024), we do not assume that the mechanism is unbiased which makes our analysis more subtle. Before stating the result, we fix some notations. Let Bn p := {x Rd : x p 1} denote the n-dimensional ℓp-ball and ABn 1 := {Ax : x Bn 1 } denote the sensitivity polytope. To describe the lower bound, for any matrix A Rm n, we define the map, κ : Rm n R, that computes the width of the sensitivity polytope with respect to the most narrow direction: κ(A) := min θ 2=1 w ABn 1 (θ) where w ABn 1 (θ) := max x 1 1 θ Ax min x 1 1 θ Ax. (3) To prove Theorem 1.4, we first show a lower bound for additive noise mechanisms when A has linearly independent rows. We then remove this assumption in a high privacy regime (ε < 1/2) in Section 2.2; Theorem 1.4 then follows by using a general reduction of Bhaskara et al. (2012). 2.1 LOWER BOUND ON ADDITIVE NOISE MECHANISMS Theorem 2.1 (Lower bound for additive noise mechanisms) Fix any ε > 0, p [2, ) and query matrix A Rm n. There exists a δ(A, ε, n) := min n 1 16, ε2, ε κ(A)n1 2/p 12γ(p)(A) o such that for any δ δ(A, ε, n), if M is a (ε, δ)-differentially private additive noise mechanism, then for any x Rn, E M(x) Ax p p 1/p (1 δ )γ(p)(A) 8(e3ε 1) , where δ = 2δ 1 e ε . The above theorem implies an almost tight lower bound in a high privacy regime. For example, when 0 ε 1 3, since 3ε e3ε 1 6ε, it directly implies that E M(x) Ax p p 1/p (1 δ )γ(p)(A) This matches the upper bound given in Theorem D.1. For additive noise mechanisms, Theorem 2.1 is naturally instance-optimal on any x Rn. We note that the range of δ in Theorem 2.1 depends on κ(A), and it is easy to verify that κ(A) > 0 if and only if A has linearly independent rows (see Lemma 2.5 for details). While special linear queries such as prefix sum and parity queries inherently possess linearly independent rows, there are many interesting matrices without linearly independent rows. In the high privacy regime, which was the setting considered in Edmonds et al. (2020), we remove the full rank assumption (see Theorem 2.8 in Section 2.2). This underpins Theorem 1.4. The main technical obstacle of Theorem 2.1 lies in making explicit all the intricate dependencies on the width of the sensitivity polytope, and how to handle the bias in lower bound proofs. We note that Edmonds et al. (2020) only studies ℓ2 2 error. Therefore, without loss of generality, it can be assumed that the bias is 0. Nikolov & Tang (2024) studies an unbiased setting and their approximate DP lower bound depends on the minimum width of the polytope (w0 in Nikolov & Tang (2024)). Our lower bound does not assume unbiasedness, and our lower bound in Theorem 2.1 does not depend on the minimum width in the bound itself. Instead, the minimum width is only required in Theorem 2.1 for the applicable range of δ. This means that our bound remains non-trivial even if the minimum width is like 1/n. In proving the new lower bound, we also adapt geometric characterizations in Nikolov & Tang (2024) to handle bias of an additive noise mechanism. We also give a more detailed discussion in Appendix B.3. To prove Theorem 2.1, we consider mechanisms of the form M(x) = Ax+z, where z is stochastically independent of x. For any input x Rn, we define the covariance matrix of M(x) to be ΣM(x) = E[(M(x) E[M(x)])(M(x) E[M(x)]) ]. Published as a conference paper at ICLR 2025 Since an additive noise mechanism can be biased, E[M(x)] is not necessarily Ax. We prove in Appendix E.1 the following relationship between the ℓp p error and the p-trace of the covariance matrix. Lemma 2.2 Fix any p [2, ) and any additive noise mechanism M : Rn Rm. It holds that Rn, E M(x) Ax p p 1/p q trp/2(ΣM(x)). Therefore to prove Theorem 2.1, it suffices to prove a lower bound on trp/2(ΣM(x)) for any additive noise private mechanism M( ). To start with, we give a statement bounding the bias of an additive noise mechanism. In particular, using the Hölder s inequality, for p 2, we have i [n] (Ezi)2 i n (Ezi)p ! 2 p n(p 2)/p E[ z p p] 2/p n(p 2)/p. Taking the square root of both sides gives the following result. Lemma 2.3 Fix p 2. Let M(x) = Ax + z be an additive noise mechanism with z Rm, then E[ M(x) Ax p p] 1/p Ez 2 n(1/p 1/2). Therefore, we can assume E[z] 2 γ(p)(A)n(p 2)/2p ε . Otherwise, due to Lemma 2.3, for all x Rn, E M(x) Ax p p 1/p E[z] 2 n(p 2)/2p > γ(p)(A)n(p 2)/2p εn(p 2)/2p = γ(p)(A) ε > (1 δ )γ(p)(A) So, it suffices to prove a lower bound on trp/2(ΣM(x)) for additive noise mechanisms with small bias. For this, we use a folklore trick (Smith, 2016) that has been used frequently in the literature of differential privacy. It consists of the following steps: For distributions D and D corresponding to the output distribution of a privacy-preserving mechanism on the neighboring dataset, we first define the support on which the privacy loss variable with respect to D and D is bounded. Then we update the probability distribution D such that the privacy loss random variable with respect to the new distribution and D is still bounded and the measure of the new distribution is close in some metric to D. In more details, for any two distributions P and Q over Ωand ε > 0, let SP,Q,ε := n ω Ω: e ε P (ω) Q(ω) eεo be the subset of the ground set Ωin which P and Q are ε-indistinguishable. Note that this is the same as Bad0 in Kasiviswanathan & Smith (2014b). As in Nikolov & Tang (2024), define b P = Q(SP,Q,2ε) P(SP,Q,2ε)P(T SP,Q,2ε) + Q(T\SP,Q,2ε) (4) where T Ω. This allows us to reduce differential privacy to χ2-divergence (eq. (10)) using Lemma 46 in Nikolov & Tang (2024) (see Lemma A.18). The following lemma (proven in Appendix E.2) states that, for a small bias additive noise mechanism, if Ω R and P,Q are distributions of some additive noise mechanism on neighboring datasets, then |EX b P [X] EX Q[X]| cannot be small. Lemma 2.4 Suppose the additive noise mechanism M(x) = Ax + z is (ε, δ)-differentially private. Fix any θ Rm. Let Mθ(x) : Rn R such that Mθ(x) := θ Ax + θ z where E[z] 2 ε n(p 2)/2p. Let ε, δ be such that δ = 2δ 1 e ε min{ 1 16, ε κ(A) n p 2 2p 12γ(p)(A) , 1 e ε}. Then, for any x Rn, there exists a neighboring dataset x such that if P, Q are the distributions of Mθ(x) and Mθ(x ) respectively, and let b P be the distribution defined in eq. (4), we have |EX b P [X] EX Q[X]| 1 2 2δ w ABn 1 (θ) 2 17 δ Var[θ M(x)]. We will use Lemma 2.4 to prove Theorem 2.1. To do so, we need to study the applicable range of δ in Lemma 2.4. Fix any θ Rm. Given any ε (0, 1 2), let δ(A, ε, n) be the maximum value of δ such that δ = 2δ 1 e ε min ( 1 16, 1 e ε, ε κ(A) n Published as a conference paper at ICLR 2025 Note that δ > 0 iff κ(A) > 0 as other quantities are positive. We characterize when κ(A) > 0 in Appendix F.2 that ensures δ > 0 through the following lemma: Lemma 2.5 κ(A) > 0 if and only if A has linearly independent rows. We will also need two geometric lemmas inspired by Nikolov & Tang (2024), that connect ℓ1 geometry and ℓ2 geometry, and also to the factorization norm. For K, L Rm, denote by K L v Rm, K + v L. That is, K L means that K is covered by L in terms of translation. We define Λp(A) := inf W Rm m trp/2(WW ) : ABn 1 WBm 2 o . The first lemma is similar to the one in Nikolov & Tang (2024), but for a general mechanism (instead of only for unbiased mechanisms). This lemma shows that if the variance of one way marginal of an additive noise mechanism M( ) is lower bounded by the square of the width of the sensitivity polytope ABn 1 , then ABn 1 can be covered by C p ΣM(x)Bm 2 in terms of translation with proper C. Lemma 2.6 (Nikolov & Tang (2024)) Let M : Rn Rm be any randomized mechanism and A Rm n be any matrix. If there exists some universal constant C such that for any input x Rn and any θ Rm, it satisfies Var[θ M(x)] wθ(ABn 1 ) C 2 , then ABn 1 C p ΣM(x)Bm 2 . The original lemma in Nikolov & Tang (2024) is only stated for unbiased mechanisms instead of general mechanisms. Thus, we include a proof in Appendix F (restated as Lemma F.4) for completeness. The final piece we need is a lemma implicit in Nikolov & Tang (2024) that connects Λp(A) and the factorization norm γ(p)(A). Lemma 2.7 (Nikolov & Tang (2024)) For any p [2, ] and A Rm n, Λp(A) γ(p)(A). Now we are ready to prove Theorem 2.1. Proof: [Proof of Theorem 2.1]. Let ε = 2ε log(1 δ ). Note that, for every ε > 0, we have δ 1 e ε, and thus ε 2ε + ε 3ε. Finally n1 2/p n 1. For any x and x chosen in Lemma 2.4, we consider two cases based on the variance, Var[θ M(x)]: 1. When Var[θ M(x)] < w ABn 1 (θ) 2 256δ . By Lemma 2.4, |EX b P [X] EX Q[X]| is at least 1 2 2δ w ABn 1 (θ) 2 17 δ Var[θ M(x)] 1 8δ 8 w ABn 1 (θ). Note that Q is the distribution of θ M(x ), and Var[θ M(x)] = Var[θ M(x )] since the oblivious noise θ z is independent of the input. Then, by the Hammersley-Chapman-Robins bound (Lemma A.14), we have that for such a pair of datasets (x, x ): Var[θ M(x)] = Var[θ M(x )] EX b P [X] EX Q[X] 2 χ2( b P, θ M(x )) (1 8δ )2 w ABn 1 (θ) 2 64χ2( b P, Q) (1 8δ )2 w ABn 1 (θ) 2 64e ε(e ε 1)2 . (5) Here, we used Lemma A.18 that shows that b P and Q are ε-indistinguishable (Definition A.17). Thus χ2( b P, Q) e ε(e ε 1)2 by Lemma 39 in Nikolov & Tang (2024) (also see Lemma A.13). 2. When Var[θ M(x)] w ABn 1 (θ) 2 256δ . First note that, when δ ε2 ε2, we have 1 8δ δ . Therefore, for every θ Rm, as in the other case, for any x, Var[θ M(x)] (1 8δ )2 w ABn 1 (θ) 2 64e 3ε(e3ε 1)2 . (6) Published as a conference paper at ICLR 2025 Lemma 2.6 with eq. (5) and eq. (6) implies that ABn 1 C p ΣM(x)Bm 2 where C = 16 ε 1 8δ . So C2 trp/2(ΣM(x)) inf W Rm m trp/2(WW ) : ABn 1 WBm 2 = (Λp(A))2 (7) for all p 2. That is, trp/2(ΣM(x)) (1 8δ )Λp(A) Combining Lemma 2.2, Lemma 2.7, and eq. (7) therefore gives us the result: E M(x) Ax p p 1/p q trp/2(ΣM(X)) (1 8δ )γ(p)(A) The proof of Theorem 2.1 is complete after replacing ε by ε. 2.2 PROOF OF THEOREM 1.4 Theorem 2.1 assumes that rows of the linear query matrix is linearly independent (i.e., κ(A) > 0), otherwise the lower bound reduces to the one for pure differential privacy. We next show that for additive noise mechanisms, this assumption can be removed in the most natural high privacy regime: Theorem 2.8 Fix any 0 < ε < 1 2, 0 δ 1, p [2, ] and query matrix A Rm n. If M( ) is an additive noise mechanism such that M(x) = Ax + z for any dataset x Rn and M( ) preserves (ε, δ)-differential privacy, then for every x Rn, we have that there exists a universal constant C, E M(x) Ax p p 1/p (1 δ )γ(p)(A) Cε , where δ = e1/2 1 Unlike the proof of Theorem 2.1, we prove the above result using Lemma F.5, in which some of the technical ingredients are implicit in (Kasiviswanathan et al., 2010, Lemma 4.12). Then we combine it with (Nikolov & Tang, 2024, Lemma 35). We defer the proof of Theorem 2.8 to Section F. Note that for general ε > 0, the analysis of Theorem 2.1 also naturally gives an Ω(γ(p)(A)/(e3ε 1)) lower bound when A has full rank rows, while Theorem 2.8 only works for high privacy regime. To obtain a lower bound for general (ε, δ)-differentially private algorithm, we recall that the reduction in Bhaskara et al. (2012) does not rely on the error metric. In particular, Theorem 1.4 follows by combining Theorem 2.8 and the reduction given by the following theorem to get a worst-case lower bound for arbitrary mechanisms. Theorem 2.9 (Theorem 4.3 in Bhaskara et al. (2012)) Fix any A Rm n. Let M : Rn Rm be a (ε, δ)-differentially private algorithm. Then there exists a (2ε, eεδ)-differentially private algorithm M := Ax + z with oblivious z such that errℓp p(M , A) errℓp p(M, A). 2.3 CONNECTING γ(p)( ) AND SCHATTEN-1 NORM: PROOF OF THEOREM 1.3 In previous sections, we established the connection between the hardness of privately answering linear queries defined by A and the generalized factorization norm of A, denoted as γ(p)(A). However, expressing γ(p)(A) analytically for a general matrix A can be difficult. To provide a more practical lower bound and facilitate potential applications, in the following lemma, we give a lower bound of γ(p) in terms of the Schatten-1 norm of A, which is simply the sum of singular values of A. Lemma 2.10 Let A Rm n be any real matrix. It holds that γ(p)(A) m1/p A 1/ mn. Proof: By (Nikolov & Tang, 2023, Theorem 23) and Lemma 27 in Nikolov & Tang (2024), for any p > 2, the γ(p)-norm of A can be rewritten as the following optimization problem: γ(p)(A) = max{γ(2)(DA) : D is diagonal , D 0, Trq(D2) = 1} where q = p p 2. Let D = m 1 p 1 2 I, then D is a diagonal PSD matrix and Trq(D2) = m 2 p 1 m 1 q = 1. Using (Henzinger et al., 2023, Lemma 1.1), we therefore have γ(p)(A) m1/p 1/2 γ(2)(I A) = m1/p 1/2 γ(2)(A) m1/p 1/2 A 1 n , Published as a conference paper at ICLR 2025 completing the proof. Theorem 1.3 directly follows from Lemma 2.10 and Theorem 1.4. 2.4 APPLICATION I: TIGHT LOWER BOUND FOR PRIVATE PREFIX SUM WITH ℓp p ERROR So far, we have seen that the lower bounds on privately answering linear queries depend on γ(p)(A). In this section, we focus on a fundamental type of query: prefix sum and establish an explicit bound that underpins Theorem 1.5 by giving tight upper and lower bounds of γ(p)(A) and κ(A) for such a specific A. In particular, given n N+, we consider the prefix sum (i.e., continual counting) matrix Aprefix, whose (i, j)-th entry is Aprefix[i, j] = 1 i j 0 otherwise (8) be the matrix computing prefix sum of the dataset x Rn. We first give the following lower bound on private prefix sum: Theorem 2.11 Fix any ε (0, 1 6) and p [2, ]. Then, for any δ Cεn1/p 1/2 where 12 ε(1 e ε)e ε (1 + ln(4n/5)/π), ε2e ε(1 e ε) if M : Rn Rm preserves (ε, δ)-differential privacy, then max x Rn E M(x) Aprefixx p p 1/p (1 δ) n1/p log n 96ε where δ = 2δeε Next, we show that there exists a factorization of Aprefix such that the ℓp p-error of the matrix mechanism is bounded by O(n1/p log(n)) implying the lower bound in Theorem 2.11 is optimal for p = O(1) proving Theorem 1.5. If p = Ω(1), then this lower bound is near optimal with only a Θ( log n) gap. Theorem 2.12 Fix parameters ε > 0 and 0 < δ < 1. Given any x Rn, there exists a (ε, δ)- differentially private matrix mechanism M such that E[ M(x) Aprefixx p p] 1/p 3n1/p log n log(1/δ) min{p, log(n)} The upper and lower bound in Theorem 1.5 directly follows from Theorem 2.11 and Theorem 2.12. We defer the proof of Theorem 2.12 to Appendix G.2. 2.4.1 PROOF OF THEOREM 2.11 To prove this theorem for all ε > 0, we need two things: firstly, a lower bound on the factorization norm γ(p)(Aprefix); secondly, in order to determine δ , we show that κ(Aprefix) is lower bounded by a constant (Lemma 2.14). Such a geometric property of Aprefix could also be of independent interest. Then, the explicit lower bound on privately computing Aprefixx is obtained by applying Theorem 2.1 and the black-box reduction given in Theorem 2.9 (Theorem 4.3 in Bhaskara et al. (2012)). Lemma 2.13 Let Aprefix be the matrix defined in Equation (8). Then, γ(p)(Aprefix) n1/p log n. Proof: Recall that for any p > 2, the ℓp factorization norm of Aprefix can be rewritten as: γ(p)(Aprefix) = max{γ(2)(DAprefix) : D is diagonal , D 0, Trq(D2) = 1} where q = p p 2. Let D = n1/p 1/2I, then D is a diagonal PSD matrix and Trq(D2) = n2/p 1 n1/q = 1. Using (Henzinger et al., 2023, equation (5.30)), γ(p)(Aprefix) n1/p 1/2 γ(2)(I Aprefix) n1/p log n completing the proof. Next, we compute κ(Aprefix). The proof of Lemma 2.14 is given in Section G.1. Published as a conference paper at ICLR 2025 Lemma 2.14 Let κ( ) be as defined in eq. (3). Then κ(Aprefix) = 2. Now we are ready to complete the proof of Theorem 2.11, which is a lower bound for private continual releasing of the prefix sum on arbitrary ℓp p metric with 2 p < . Recalling Theorem 2.1, it remains to show δ (Aprefix, ε, n) Cεn1/p 1/2. In particular, we have the following: δ (Aprefix, ε, n) = 2(eε 1) ( 1 16, ε κ(Aprefix) n 12γ(p)(Aprefix) , ε2 ) 6γF (Aprefix), ε2 ) 12 ε(1 e ε)e ε (1 + ln(4n/5)/π), ε2e ε(1 e ε) = Cε n1/2 1/p , where the first inequality comes from that γ(p)(A) γ(2)(A) = γF (A) for any A and p 2, the second inequality follows from Henzinger et al. (2023). This completes the proof. 3 DISCUSSION In this paper, we established lower bounds on approximating the linear query Ax with respect to approximate differential privacy under ℓp p error, so we can study the optimality of matrix mechanisms not only in expectation but also with respect to probability tail bounds. For limitations, we note that we only give a worst case lower bound over all x Rn by the definition of ℓp p error metric (see also eq. (1)). To understand why we cannot get a instance-optimal lower bound, consider a trivial mechanism Mx0 such that for any x Rn, it always outputs Ax0 where x0 Rn is any given dataset. Clearly Mx0 is not an oblivious additive noise mechanism, and it preserves perfect differential privacy, i.e., ε = 0, and perfect accuracy on the input x0, which explains why an instance-optimal lower bound is unrealistic for general mechanisms. In Nikolov & Tang (2024), the authors study unbiased mechanism, and show that the Gaussian mechanism is indeed instance-optimal over all such unbiased mechanisms, by giving an asymmetric lower bound saying that if an unbiased mechanism performs well in an input x0, then it must perform worse in some other inputs x where x neighboring x0. It is still open if such an asymmetric lower bound exists for general linear queries over all general mechanisms. ACKNOWLEDGMENTS The authors would like to thank Aleksandar Nikolov for his valuable comments and especially his insights on removing the assumption of linear independence in Edmonds et al. (2020) and an earlier version of this paper. The authors also thank George Li for his discussions during the project and his helpful feedback on the paper draft. Published as a conference paper at ICLR 2025 Aditya Bhaskara, Daniel Dadush, Ravishankar Krishnaswamy, and Kunal Talwar. Unconditional differentially private mechanisms for linear queries. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pp. 1269 1284, 2012. T-H Hubert Chan, Elaine Shi, and Dawn Song. Private and continual release of statistics. ACM Transactions on Information and System Security (TISSEC), 14(3):1 24, 2011. Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3 4):211 407, 2014. Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N Rothblum. Differential privacy under continual observation. In Proceedings of the forty-second ACM symposium on Theory of computing, pp. 715 724, 2010. Cynthia Dwork, Frank Mc Sherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. Journal of Privacy and Confidentiality, 7(3):17 51, 2016. Alexander Edmonds, Aleksandar Nikolov, and Jonathan Ullman. The power of factorization mechanisms in local and central differential privacy. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pp. 425 438, 2020. Hendrik Fichtenberger, Monika Henzinger, and Jalaj Upadhyay. Constant matters: Fine-grained error bound on differentially private continual observation. In International Conference on Machine Learning, ICML, Proceedings of Machine Learning Research, pp. 10072 10092, 2023. Monika Henzinger and Jalaj Upadhyay. Improved differentially private continual observation using group algebra. In Proceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 2951 2970. SIAM, 2025. Monika Henzinger, Jalaj Upadhyay, and Sarvagya Upadhyay. Almost tight error bounds on differentially private continual counting. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 5003 5039. SIAM, 2023. Peter Kairouz, Brendan Mc Mahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, and Zheng Xu. Practical and private (deep) learning without sampling or shuffling. In International Conference on Machine Learning, pp. 5213 5225. PMLR, 2021. Shiva Prasad Kasiviswanathan and Adam Smith. On the semantics of differential privacy: A bayesian formulation. J. Priv. Confidentiality, 6(1), 2014a. doi: 10.29012/jpc.v6i1.634. URL https://doi.org/10.29012/jpc.v6i1.634. Shiva Prasad Kasiviswanathan and Adam Smith. On the semantics of differential privacy: A bayesian formulation. Journal of Privacy and Confidentiality, 6(1), 2014b. Shiva Prasad Kasiviswanathan, Mark Rudelson, Adam Smith, and Jonathan Ullman. The price of privately releasing contingency tables and the spectra of random matrices with correlated rows. In Proceedings of the forty-second ACM symposium on Theory of computing, pp. 775 784, 2010. Shiva Prasad Kasiviswanathan, Homin K Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793 826, 2011. Chao Li and Gerome Miklau. Optimal error of query sets under the differentially-private matrix mechanism. In Proceedings of the 16th International Conference on Database Theory, pp. 272 283, 2013. Chao Li, Gerome Miklau, Michael Hay, Andrew Mc Gregor, and Vibhor Rastogi. The matrix mechanism: optimizing linear counting queries under differential privacy. The VLDB journal, 24 (6):757 781, 2015. Jiˇrí Matoušek, Aleksandar Nikolov, and Kunal Talwar. Factorization norms and hereditary discrepancy. International Mathematics Research Notices, 2020(3):751 780, 2020. Published as a conference paper at ICLR 2025 Hermann Minkowski. Review of les systèmes d équations linéaires à une infinite d inconnues". Paris: Gauthier-Villars, VI + 182 S. 8 . (Collection Borel.) (1913)., 1913. Shanmugavelayutham Muthukrishnan and Aleksandar Nikolov. Optimal private halfspace counting via discrepancy. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pp. 1285 1292, 2012. Aleksandar Nikolov and Haohua Tang. Gaussian noise is nearly instance optimal for private unbiased mean estimation. Co RR, abs/2301.13850, 2023. Aleksandar Nikolov and Haohua Tang. General gaussian noise mechanisms and their optimality for unbiased mean estimation. In Innovation in Theoretical Computer Science, 2024. Aleksandar Nikolov, Kunal Talwar, and Li Zhang. The geometry of differential privacy: the sparse and approximate cases. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pp. 351 360. ACM, 2013. Friedrich Riesz. Untersuchungen über systeme integrierbarer funktionen. Mathematische Annalen, 69(4):449 497, 1910. Frigyes Riesz. Les systèmes d équations linéaires à une infinité d inconnues. Gauthier-Villars, 1913. Marcel Riesz. On the maxima of bilinear forms and on linear functionals. Acta mathematica, 49 (3-4):465 497, 1927. Adam Smith. Personal communication. 2016. G Olof Thorin. Convexity theorems generalizing those of m. riesz and hadamard with some applications. Comm. Sém. Math. Univ. Lund [Medd. Lunds Univ. Mat. Sem.], 9:1 58, 1948. Salil Vadhan. The complexity of differential privacy. Tutorials on the Foundations of Cryptography: Dedicated to Oded Goldreich, pp. 347 450, 2017. Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge University Press, 2018. Published as a conference paper at ICLR 2025 A BASIC DEFINITIONS AND PRELIMINARIES Matrix theory and Convex Geometry We first introduce several definitions regarding the matrix norms and geometric properties of the query matrix A. Definition A.1 (Schatten-1 norm) Let s1, , sm be the singular values of A, we define the Schatten-1 norm to be Definition A.2 (p-trace) Fix any d N+. Let U Rd d be a positive semi-definite matrix, we define the p-trace norm to be Naturally, we define tr (U) = maxi [d] |Uii|. The following definition for generalized factorization norm was firstly pointed out by Nikolov and Tang Nikolov & Tang (2024): Definition A.3 For any 2 p and A Rm n, we define γ(p)(A) := min LR=A trp/2(LL ) R 1 2 o . It can be verified that γ(2)(A) = γF (A) and γ( )(A) = γ2(A). This is because when p = 2, trp/2(LL ) = L 2 F and when p , then tr (LL ) = maxi [d](LL )ii = L 2 2 . We use the following result to connect the factorization norm and the Schatten-1 norm: Lemma A.4 (Henzinger et al. (2023) and Li & Miklau (2013)) Let A Cm n be a complex matrix. Then γ(2)(A) A 1 n Using Theorem A.4, Henzinger et al. Henzinger et al. (2023) showed the following bound: Theorem A.5 (Henzinger et al. (2023)) For any n N, let Mcount be a lower triangular matrix with all ones. Then γ(2)(Mcount) n 2 + ln 2n + 1 + ln(2n + 1) Definition A.6 (Parity Query) Let d and w be integer parameters and let the domain be X = { 1}d. Then a parity query is a query that belongs to the family of queries q P (x) = Y i P xi : P {1, , d}, |P| = w Definition A.7 (Hadamard matrix) Fix any integer d 1, the d-th Hadamard matrix Hd is a 2d 2d matrix Hd 1 Hd 1 Hd 1 Hd 1 When d = 0, H0 = [1]. The following definitions are related to the geometry property of a query matrix. Definition A.8 Fix any d N. For any K, L Rd, we write K L v Rd, K + v L. Published as a conference paper at ICLR 2025 This is saying that if K L, then K can be covered by L by relocating the center of K. Next, we define the width of a convex body. Definition A.9 (Width of a convex body) Given any vector θ Rm, we define the width of any convex body K Rm with respect to θ be w K(θ) := max x K θ x min x K θ x. We use Bd p to denote the unit ball of dimension d with respect to the ℓp norm. Formally, Bd p = {x Rd : x p 1}. For matrix W with d columns, we also write WBd p = {Wx : x Bd p} to denote the sensitivity polytope of W with respect to the p-th norm. Definition A.10 (Nikolov & Tang (2024)) For any query matrix A Rm n and p [2, ], we define Λp(A) := inf W Rm m trp/2(WW ) : ABn 1 WBm 2 o . Here, we give some insights about why Λp(A) in Definition A.10 is useful for establishing the lower bound. Geometrically, ABn 1 is exactly the convex body comprising differences between the ground truth output of any pair of neighboring datasets, A(x x ) where x x 1. Since Λp(A) is the minimum trace norm of WW where WBm 2 covers the sensitivity polytope ABn 1 , then, Λp(A) can be interpreted as a specific kind of measurements on the volume of the body WBm 2 that covers A(x x ) over all pair of neighboring datasets. Intuitively, if this volume gets larger, it is harder to preserve utility because the outputs of neighboring datasets will be far apart. Therefore, it gives a way to prove the lower bound by establishing a connection between the ℓp p error and Λp(A). The following lemma also reveals the relationship between Λp(A) and the factorization norm γ(p)(A): Lemma A.11 (Nikolov & Tang (2024)) For any p [2, ] and A Rm n, Λp(A) γ(p)(A). Basically speaking, for any matrix A Rm n, one can always find a factorization of A = LR such that R 1 2 is smaller than 1 and that ABn 1 LBm 2 . Then, taking the L that minimizes q trp/2(LL ) yields the above lemma. Differential privacy. Here, we first introduce Gaussian mechanism, which is the main component of the upper bound proof in this paper. Lemma A.12 (Gaussian mechanism) Fix any 0 ε, δ 1. Let f : X Y be any deterministic function. If for all neighboring dataset x, x , f(x) f(x ) 2 , then M(x) = f(x) + z where z N(0, σ2I) satisfies (ε, δ)-differential privacy as long as σ2 9 2 log(1/δ) As in Nikolov & Tang (2024), one of the necessary conditions of DP algorithms that we will consider is that DP algorithms preserve the χ2-divergence between neighboring datasets. For two distribution P and Q, the χ2 divergence between them is χ2(P, Q) := Ex Q It is not hard to verify (perhaps it is also well-known) the following lemma: Lemma A.13 (Lemma 39 in Nikolov & Tang (2024)) Suppose M is an ε-differentially private algorithm and x, x be two neighboring datasets such that x x 1 1. Let P and Q be the distributions of M(x) and M(x ) respectively. Then χ2(P, Q) e ε(eε 1)2. Published as a conference paper at ICLR 2025 The reason why we consider χ2 distribution is that the lower bound of the variance of a real random variable can be characterized by its χ2 divergence between another arbitrary random variable. This is the classical Hammersley-Chapman-Robins bound stated in the following lemma: Lemma A.14 (Hammersley-Chapman-Robins bound) For any two distributions P, Q over real numbers and for X, Y distributed, respectively, according to P and Q, we have p Var(Y ) |E[X] E[Y ]| p We also need the following lemma in this paper: Lemma A.15 (Lemma 4.4 in Kasiviswanathan et al. (2010)) Let w Rn be any single query and M (x) := w x + z (z R) be any additive noise mechanism that is (ε, 0)-differentially private for any 0 < ε < 1, then E[z2] 1 ε2 for some universal constant C. Lemma A.16 (Kasiviswanathan & Smith (2014b)) Let M be any (ε, δ)-differentially private mechanism, let P be the distribution of M(x) and Q be the distribution of M(x ). Let SP,Q,ε := ω Ω: e ε P(ω) Q(ω) eε . (11) Then max {Pr[P / S], Pr[Q / S]} δ = 2δ 1 e ε . Given two distribution P and Q and set defined by eq. (11), Nikolov & Tang (2024) defined a a distribution b P such that for any T Ω, b P = Q(SP,Q,2ε) P(SP,Q,2ε)P(T SP,Q,2ε) + Q(T\SP,Q,2ε) (12) Here we define (ε, δ)-indistinguishability: Definition A.17 (Kasiviswanathan & Smith (2014b)) Let Ωbe a ground set and µ1, µ2 be two distributions with support Ω1 Ω, Ω2 Ωrespectively. We say that µ1 and µ2 are (ε, δ)- indistinguishable for ε > 0 and δ (0, 1) if for any S Ω, it holds that µ1(S) µ2(S) eε + δ and µ2(S) µ1(S) eε + δ. If δ = 0, we also say µ1 and µ2 are ε-indistinguishable. We use the following lemma to characterize the relation between b P and Q: Lemma A.18 (Lemma 46 in Nikolov & Tang (2024)) Let P, Q be a pair of (ε, δ)- indistinguishable distributions over Ωand b P be the distribution defined in eq. (12), then log b P(ω) Q(ω) 2ε log(1 δ ) = ε for all ω Ω. Here, δ = 2δ 1 e ε . That is to say, b P and Q are ε-indistinguishable. Error Metric. Two of the normally used metrics for a private mechanism M are the squared error (denoted by err MSE) and absolute error (denoted by errℓ ), respectively: err MSE(M, A, n) := max x Rn E 1 n M(x) Ax 2 2 errℓ (M, A, n) := max x Rn E [ M(x) Ax ] . (13) B HIGH-LEVEL OVERVIEW OF OUR TECHNIQUES In this section, we briefly discuss some techniques and ideas that underpin our proof. Published as a conference paper at ICLR 2025 B.1 UPPER BOUND ON MATRIX MECHANISM IN ℓp p METRIC. To find an upper bound on answering linear queries, we use the Gaussian mechanism that adds correlated noise based on a factorization of the query matrix A. Specifically, given a query matrix A Rm n, we consider the additive noise mechanism M(x) = Ax + z. For any factorization of A = LR where L Rm k and R Rk n, such a mechanism can be rewritten as M(x) = L(Rx + z ) where Lz has the same distribution as z. Finally, we show that minimizing the ℓp p error on such mechanism is equivalent to finding an optimal factorization of A, and the optimal error can be characterized by the generalized factorization norm γ(p)(A). B.2 LOWER BOUND ON OBLIVIOUS ADDITIVE NOISE APPROXIMATE DP MECHANISMS IN ℓp p METRIC. To prove a lower bound on mechanisms that add oblivious additive noise, we consider the convex sensitivity polytope ABn 1 = {Ay : y Rn and y 1 1} of the query matrix A Rm n. We use the following measurement introduced by Nikolov & Tang (2024): trp/2(WW ) : v Rm, ABn 1 + v WBm 2 o (14) to bound the minimum scale of the variance needed for the noise to achieve differential privacy. Intuitively, if the measure of the sensitivity polytope ABn 1 is larger (in terms of q trp/2(WW )), then it is harder to make two points in ABn 1 indistinguishable. To formulate such intuition, we first establish a bridge between ℓp p error and the covariance matrix ΣM(x) Rm m of the output distribution (Lemma 2.2). Next, a direct approach is to show that if an oblivious mechanism M is (ε, δ)-differentially private, then by a standard lower bound in Kasiviswanathan et al. (2010), the square root of the covariance matrix ΣM(x) satisfies that ABn 1 + v p for some v Rm, which establishes a relationship between the infimum value in eq. (14) and the ℓp p error. Finally, we apply Lemma 20 in Nikolov and Tang Nikolov & Tang (2024) to lower bound such infimum value by the general factorization norm γ(p)(A). However, the lower bound in Kasiviswanathan et al. (2010) works in only high privacy regime. To get a lower bound for approximate DP algorithms for all ε > 0, we use the fact that output distributions of differentially private mechanisms under two adjacent datasets must be close under χ2-divergence. Consequently, we employ the χ2-divergence to set a lower bound on the minimum variance of the oblivious noise that must be introduced to achieve differential privacy. Since we do not assume the unbiasedness as in Nikolov & Tang (2024), we have to consider the bias of the oblivious noise. However, we show that such a pipeline still works if the oblivious noise has a small bias. On the other hand, if the noise has a large enough bias, then one can show that the ℓp p error is already large. Combined, we establish a lower bound that the 1 p-root of the ℓp p error is at least Ω((1 δ)γ(p)(A)/ε) for any oblivious (ε, δ)-DP mechanisms on any query matrix A Rm n with κ(A) > 0. With the lower bounds on oblivious mechanisms, we use the standard reduction in Bhaskara et al. (2012) to obtain a worst case (in terms of the input x Rn) lower bound for general (ε, δ)-DP mechanisms that might be data-dependent. B.3 COMPARISON OF TECHNIQUES As alluded to in the introduction, the focus of Nikolov and Tang Nikolov & Tang (2023) is the ℓ2 p instance optimality of matrix mechanisms among unbiased mechanisms, while we instead focus on the worst case ℓp p optimality of matrix mechanisms among any differentially private mechanisms. Here, we highlight key departures between this paper and previous works. Add/remove model v.s. Substitution model. Our main departure compared to Nikolov & Tang (2024) lies in a different privacy notion, leading to different choice of sensitivity polytopes for the geometric arguments of lower bound. To elaborate, we recall a natural way to view linear query as a mean estimation task as suggested in Nikolov & Tang (2024): Published as a conference paper at ICLR 2025 Let X be the domain of data points, X = (x1, x2, , xk) be the dataset, and Q : X Rm be the query workload. Then, answering Q(X) is equivalent to doing mean estimation in the polytope KQ = {Q(x) : x X}. The authors in Nikolov & Tang (2024) considered a substitution DP model, where a data point can be replaced by another within the domain. If we further let h RX (where |X| = n in the notation of our paper) be the histogram of the dataset, then the substitution DP model corresponds to the neighboring notion where h h 1 2 and P x X (hx h x) = 0 (i.e., bounded DP). This differs from the commonly used privacy notion for linear queries (as used in Bhaskara et al. (2012), Edmonds et al. (2020) etc.) where we only ask h h 1 = O(1), representing a natural ℓ1 sensitivity, and is more like the add/remove DP model when translated back to mean estimation. 6 Even if we restrict our discussion to the substitution DP model, the lowerbound for mean estimation from Nikolov and Tang would be in terms of Γp(KQ) := inf nq Trp/2(AA ) : KQ ABd 2 o . In contrast, our lowerbound for linear queries is in terms of the factorization norm of the workload matrix A associated with the query Q. These two are not comparable: imagine when the workload matrix consists of vectors that are very far from the origin, but are close to each other; then we have small Γp(KQ) but a substantially larger factorization norm of the workload matrix A. Therefore, a lower bound in terms of Γp(KQ) does not imply a lower bound in terms of the factorization norm γ(p)(A) as ours. Essentially, this is due to the fact that the most suitable choice for "sensitivity polytope" is different for mean estimation and linear queries. Technical comparisons. Our departure in analysis and its complication compared to previous works stems from the fact that we do not assume unbiased mechanisms. Our different approach also means that our dependency on κ appears only in the applicable range of the privacy parameter δ, instead of showing up in the lower bound itself. We elaborate it next. The lower bound of Nikolov & Tang (2024) combined techniques from Edmonds et al. (2020) for oblivious mechanisms with the classical results for unbiased estimators, i.e., they crucially rely on the estimator being unbiased. We first explain at a high level why they need the assumption of an unbiased mechanism. Edmonds et al. (2020) showed that the variance of the one-dimensional private mechanism is lower bounded by the width of the underlying sensitivity polytope. For a data oblivious mechanism, as considered in Edmonds et al. (2020) in their first step, this almost immediately implies a lower bound. However, this might not always be true for an unbiased mechanism. In fact, since Edmonds et al. (2020) consider the ℓ2 error metric, they can assume without any loss of generality that the bias is 0. This is not the case for ℓ2 p error considered in Nikolov & Tang (2024) or ℓp-error as considered in this paper. This causes the departure of our proof technique from Edmonds et al. (2020) and Nikolov & Tang (2024) since we cannot assume that the bias is 0 either by an assumption of unbiased mechanism or because of the choice of metric (i.e., ℓ2 error metric). We first show in Lemma 2.3 that the error would be large if the bias is large enough. So, the rest of our proof has to deal with the setting when the bias is small. In fact, using a case analysis based on the magnitude of bias is also helpful from another perspective: our lower bound depends only on γ(p)( ) norm while the effect of minimum width of sensitivity polytope is reflected in the applicable range of δ when we consider any ε > 0 (including the low privacy regime). In general, the width of the sensitivity polytope can be 0 as shown in Lemma 2.5, but as we show it is lower bounded by a constant for two important linear query matrices. Further, for general linear query matrices whose sensitivity polytope has a small minimum width, say 1/n, our lower bound remains non-trivial, while Nikolov & Tang (2024) only provided a very weak lower bound (that is, dependent inversely on the dimension). We discuss it next. For approximate differential privacy, Nikolov & Tang (2024) proved that any mechanism would have a large error either on the input x or one of its neighbor x . This is because they rely on a classical result from statistics, known as Hammersley-Chapman-Robins bound (Lemma A.14). To apply this bound, they need to prove that the χ2-divergence between the mechanism s output on two neighboring datasets is bounded. However, while this is true for ε-differential privacy, this is not true for (ε, δ)-differential privacy because the support of the two mechanisms might differ. To ensure that 6We have also confirmed with the authors of Nikolov & Tang (2024) that if using add/remove DP as in our setting, a lot of results in their paper need to be modified. Consequently, there is no black-box reduction and it does not appear to simplify our current proof. Published as a conference paper at ICLR 2025 the two distributions have the same support, they use the general trick used in differential privacy (and, to our knowledge, first appeared in Kasiviswanathan & Smith (2014a)) and define a set as we defined in eq. (4). This set serves two purposes: (i) the χ2-divergence between both distributions is bounded, and (ii) the difference of the expectation of either of the two distributions restricted over the set is close to the original unrestricted distribution unless one of the two distributions has a large variance. We can now do the case analysis. In case 1, if the expectation of neither of the two distributions changes much, we can restrict our attention to the defined set. Otherwise, we are in case 2, where we just pick the distribution whose expectation changed by a lot and for which we are in the case where the variance is high. As a result, one can only prove that either Munbiased(x) or Munbiased(x ) have a large error. There is another price with this analysis. If we are in case 2, then their technique gives a lower bound on the variance that depends on the minimum width (κ( ) in our paper and w0( ) in Nikolov & Tang (2024)). Due to Lemma 2.5, their result by itself is vacuous if the query matrix A has linearly dependent rows. They alleviate this concern using the following trick: one can always find a random subspace, so the minimum width is at least the inverse of the dimension of the original sensitivity polytope under the projection onto that subspace. In other words, in case 2, we can only prove a lower bound with inverse dependence on the dimension. As a result, the lower bound is less useful as the dimension increases. Since the reduction from the class of oblivious additive noise differentially private mechanism to the class of general differentially private mechanism follows from Bhaskar et al. Bhaskara et al. (2012), we only focus on the class of oblivious additive noise differentially private mechanism in the following exposition. Since the large bias case is easy to deal with (and already implies a lower bound on the error as shown in Lemma 2.3), we need to deal with the case when the mechanism has a small bias. Dealing with the possibility of bias results in an extra term of E[θ z] 1 Q P in eq. (17), where P and Q are the distribution of the output of the mechanism on two neighboring datasets, θ Rd and z is the noise which is stochastically independent of the input. Since E[θ z] is not identically zero, this term finally results in an extra term of δ E[θ z] term. For a non-vacuous lower bound, this term has to be o(w ABn 1 (θ)) in all directions θ. Using the fact that we are in the low bias case, we have an upper bound on E[θ z]; this gives us an applicable range of δ , i.e., the value of δ for which the term E[θ z] o(w ABn 1 (θ)), which in turn depends on the narrowest direction of w ABn 1 . This narrowest direction is κ(A) by definition. C PROOF OF THEOREM 1.6 We use the observation made in Edmonds et al. (2020). Let Q = Qd,w be the corresponding matrix of the w-way parity queries on the domain { 1, 1}d. Then, Q is the sub-matrix of a 2d 2d Hadamard matrix H (see also Definition A.7) produced by selecting d w rows of H. We have the following lemma that gives the lower bound on κ(Q). This allows us to set the range of δ and combined with the worst case lower bound Theorem 1.4 give an (ε, δ)-DP lower bound for general mechanisms on answering parity queries. Lemma C.1 κ(Q) 2. Proof: Let ℓ= d w and let q 1 , q 2 , q ℓbe the rows of Q. We first note that since Q contains ℓ rows of a Hardamard matrix, then each row of Q is orthogonal to each other, and the ℓ2 norm of each row q i is 2d/2 where 1 i ℓ. We recall that κ(Q) := min θ θ=1 max x 1 1 θ Qx min x 1 1 θ Qx . (15) First note that for any fixed unit vector θ = (θ1, θ2, , θℓ) Rℓ, θ Q 2 2 = (θ1q 1 + + θℓq ℓ)(θ1q1 + + θℓqℓ) = i=1 θ2 i q i qi = 2d ℓ X i=1 θ2 i = 2d, Published as a conference paper at ICLR 2025 where the second equality comes from that q i qj = 0 for any i = j. Then, we choose x+ R2d be the vector such that x+ 1 = 1 and θ Q = cx+ for some scalar c > 0, and x = x+. Finally, observe that θ Qx+ θ Qx = 2θ Qx+ = 2 θ Q 2 x+ 2 2 θ Q 2 where the inequality comes from the fact that x+ 2 x1 1/ 2d. Since for any θ we can always find such a pair of x+ and x , then we have κ(Q) 2 by eq. (15). Lemma C.2 Q 1 = d w 2d/2. Proof: As noted in Edmonds et al. (2020), the parity query matrix, Q, is the submatrix formed by choosing the appropriate d w rows of a 2d 2d unnormalized Hadamard matrix. In other words, n = 2d and m = d w . Since the Hadamard matrix is orthogonal, the rows of Q are linearly independent. Furthermore, there are d w singular values, all of which are 2d/2. Since Q 1 is just the sum of the singular values of Q, we have the result. Setting n = 2d and m = d w gives us the required bound and proof of Theorem 1.6. D PROOF OF THE UPPER BOUND We first state the theorem in its full generality for the ease of the readers. Theorem D.1 Fix any 0 < ε, δ < 1 and 2 p < . For any query matrix A Rm n and dataset x Rn, there exists a factorization of A = LR and a parameter σ = σ(ε, δ, R) such that the mechanism M(x) := L(Rx + z) where each entry in z is i.i.d sampled from N 0, σ2 preserves (ε, δ)-differential privacy. Moreover, E M(x) Ax p p 1/p 3γ(p)(A) log(1/δ)min{p, log(2m)} Proof: Let ρ = ε 3 log(1/δ). Note that the factorization of query matrix A is independent of x. Thus, the mechanism M( ) can be considered as the post-processing of Rx + z. The ℓ2 sensitivity of Rx can be bounded by Rx Rx 2 max y 1=1 Ry 2 = R 1 2, since x x 1 1 if (x, x ) is a pair of neighboring datasets. Then, let σ2 = 2 2ρ2 where = R 1 2, by Lemma A.12, Rx + z preserves (ε, δ)-DP as well as M(x). For the utility part, we consider the Gaussian variable z = Lz and thus z N(0, σ2LL ). Then, the ℓp p error can be formulated as E M(x) Ax p p 1/p = E LRx + Lz Ax P p 1/p = E Ax + z Ax P p 1/p = E z p p 1/p = i n E[|z i|p] min{p, log(2m)} i n (Var[z i]) p 2 min{p, log(2m)} σ q min{p, log(2m)} q trp/2(LL ) R 1 2. Letting L and R be the optimal factorization of A yields the desired result. Here, the inequality comes from the standard bound on the p-th moment of the Gaussian variable (Proposition 2.5.2 in Vershynin Vershynin (2018)) and the union bound over all coordinates respectively. Published as a conference paper at ICLR 2025 E MISSING PROOFS FROM SECTION 2.1 Recall that ΣM(x) = E[(M(x) E[M(x)])(M(x) E[M(x)]) ] is the covariance matrix of M(x). E.1 PROOF OF LEMMA 2.2 The proof follows from the following set of derivation. E M(x) Ax p p 2/p E M(x) Ax 2 p = E i=1 (M(x)i (Ax)i)2 p E (M(x)i (Ax)i)2 p E z2 i (Ezi)2 p i=1 Var[M(x)i] p 2 p = trp/2(ΣM(x)). E.2 PROOF OF LEMMA 2.4 Note that θ M(x) preserves (ε, δ)-differential privacy for any θ if M( ) is (ε, δ)-differentially private. Further, w ABn 1 (θ) = maxv ABn 1 θ v minv ABn 1 θ v. For any proposition P, we let 1{P} = 1 if P is true 0 otherwise . Let S = SP,Q,2ε. By the definition of b P, similar to Nikolov & Tang (2024), we have |EX b P [X] EX Q[X]| = x b P(x) x Q(x) + Z = Q(S) P(S)EX P [X1{X S}] EX Q[X1{X S}] Q(S) P(S)EX P [X] EX Q[X] | {z } S1 Q(S) P(S)EX Q[X1{X / S}] EX Q[X1{X / S}] | {z } S2 Published as a conference paper at ICLR 2025 We now bound the above two terms separately. Recall that P and Q are distributions of Mθ(x) = θ (Ax + z) and Mθ(x ) = θ (Ax + z) respectively, then S1 := Q(S) P(S)EX P [X] EX Q[X] θ A Q(S) P(S)x x |E[θ z]| 1 Q(S) By Lemma A.16, 1 δ P(S) 1 and 1 δ Q(S) 1. Further, if we chose ε and δ such that δ = 2δ 1 e ε < 1 P(S) 1 1 δ 1 + 2δ . (18) Now we consider the term f(x, y) := θ A Q(S) We do a case analysis based on the ratio Q(S) P (S) 1, then ABn 1 KP,Q := A Q(S) P(S)x y : x y 1 Therefore, there exists a pair of (x+, x +) with (x+ y+) Bn 1 such that f(x+, y+) = θ A Q(S) = w ABn 1 (θ) 2 . P (S) < 1, then the set K P,Q = P(S) Q(S) A Q(S) P(S)x y : x y 1 Q(S)y : x y 1 contains ABn 1 . In this case, there also exists a pair of (x , y ) with (x y ) Bn 1 such that θ A Q(S) P(S) w ABn 1 (θ) 2 (1 δ )w ABn 1 (θ) 2 . Finally, we have that Q(S) P(S)EX P [X] EX Q[X] (1 δ ) w ABn 1 (θ) 2 2δ Eθ z. (19) Next, we try to bound the second term in eq. (16): S2 = Q(S) P(S)EX P [X1{X / S}] EX Q[X1{X / S}] Q(S) P(S)EX P [(X EP [X])1{X / S}] EX Q[(X EQ[X])1{X / S}] + Q(S) P(S)EP [X] EP [1{X / S}] EQ[X] EQ[1{X / S}] Q(S) P(S)EX P [(X EP [X])1{X / S}] | {z } S21 + |EX Q[(X EQ[X])1{X / S}]| | {z } S22 + δ Q(S) P(S)EX P [X] EX Q[X] | {z } S23 We bound each of these terms separately. Published as a conference paper at ICLR 2025 Bounding S21 and S22 Using Q(S) P (S) 1 + 2δ , we have P(S) |EX P [(X EP [X])1{X / S}]| (1 + 2δ ) |EX P [(X EP [X])1{X / S}]| (1 + 2δ ) p EX P [(X EP [X])2]E[1{X / S}] (1 + 2δ ) p δ EX P [(X EP [X])2] (1 + 2δ ) q δ Var[θ M(x)]. Similarly, we see that S22 = |EX Q[(X EQ[X])1{X / S}]| q δ Var[θ M(x )]. S21 + S22 = Q(S) P(S) |EX P [(X EP [X])1{X / S}]| + |EX Q[(X EQ[X])1{X / S}]| δ Var[θ M(x)] + q δ Var[θ M(x )] (1 + 2δ ) q δ Var[θ M(x)] + q δ Var[θ M(x )], where the last inequality is due to eq. (18). Bounding S23 With a similar argument as in S1, we have S23 = δ Q(S) P(S)EX P [X] EX Q[X] δ θ A Q(S) P(S)x y + δ |E[θ z]| 1 Q(S) δ w ABn 1 (θ) 2 + 2δ Eθ z , (22) where the last inequality can be achieved under the same choice of x and y as in eq. (19). Plugging the bound in eq. (21) and eq. (22) in to eq. (20), we get S2 S21 + S22 + S23 δ w ABn 1 (θ) 2 + 2δ Eθ z + (1 + 2δ ) q δ Var[θ M(x)] + q δ Var[θ M(x )] (23) Plugging eq. (17) and eq. (23) in eq. (16) and setting (ε, δ) such that 16, ε κ(A) n 12γ(p)(A) , ε2} 1 for any fix θ, we have that for every x Rn, there exists an x such that x x 1 1 and |EX b P [X] EX Q[X]| Q(S) P(S)EX P [X] EX Q[X] Q(S) P(S)EX Q[X1{X / S}] EX Q[X1{X / S}] (1 2δ ) w ABn 1 (θ) 2 3δ Eθ z (1 + 2δ ) q δ Var[θ M(x)] q δ Var[θ M(x )] (1 2δ ) w ABn 1 (θ) 2 ε κ(A) 4γ(p)(A) γ(p)(A) ε (2 + 2δ ) q δ Var[θ M(x)] 2 2δ w ABn 1 (θ) 2 17 δ Var[θ M(x)], which completes the proof of Lemma 2.4. Here, the second last inequality comes from that M( ) adds oblivious noise and thus Var[θ M(x)] = Var[θ M(x )]. Published as a conference paper at ICLR 2025 F MISSING PROOFS IN SECTION 2.2 F.1 PROOF OF THEOREM 2.8 The key step in proving Theorem 2.8 is applying Lemma 4.12 in Kasivishwanathan et al. Kasiviswanathan et al. (2010). Lemma F.1 (Kasiviswanathan et al. (2010)) Suppose X, Y are real-valued random variables with statistical difference at most e1/2 1 + δ. Then, for all a R, at least one of E[X2] or E[(Y a)2] is Ω(a2(1 δ)2). The following lemma introduced ε into the above lower bound. Lemma F.2 (Dwork and Roth Dwork & Roth (2014)) Fix any 0 < ε 1 2 and δ > 0. Let A(x) : Rn R be any randomized algorithm. If A is (ε, δ)-differentially private, then A 1 (1/2, e1/2 1 eε 1 δ)-DP. We first prove the following lemma based on Lemma F.1, which has also been claimed in Edmonds et al. (2020) (Lemma 26) but without a proof. Lemma F.3 Let w Rn be any single query and M(x) := w x + z (z R) be any dataindependent mechanism that is (ε, δ)-differentially private for 0 < ε 1 2 and 0 δ 1, then (E[z2])1/2 1 δ Cε w for some universal constant C. Here, δ = e1/2 1 Proof: We consider the mechanism M (x) = 2εM( 1 2εx) = w x + 2εz. Let δ = e1/2 1 eε 1 δ, then M is ( 1 2, δ )-differentially private. Fix any pair of neighboring dataset x and x such that x x 1 1. Let X = M (x) and Y = M (x ) respectively. Then, it is easy to verify that d T V (X, Y ) = max S R |Pr[X S] Pr[Y S]| e1/2 1 + δ since M is ( 1 2, δ )-differentially private and thus Pr[X S] e1/2Pr[Y S] + δ for any S. Next, let X = X w x = 2εz, Y = Y w x = 2εz + w (x x) and a = w (x x). Then d T V (X , Y ) = e1/2 1 + δ and thus by Lemma F.1 (Lemma 4.12 in Kasiviswanathan et al. (2010)), we have E[z2] = 1 4ε2 E[X 2] = 1 4ε2 E[(Y a)2] (w (x x ))2 Cε2 (1 δ )2 for some universal constant C. Finally, choose the pair of neighboring datasets x and x that maximizes w (x x ) completes the proof. Now, we are ready to start the proof of Theorem 2.8. Proof: (Of Theorem 2.8.) This proof can be considered as a complementary version of the proof in Nikolov & Tang (2024) and Edmonds et al. (2020) since they only focus on unbiased mean estimation or linear queries in ℓ2 2 metric. We recall the reader the notation K L for K, L Rm. The notations means that there exists a v Rm such that K + v L. We now restate Lemma 2.6 and give a proof here: Lemma F.4 (Restatement of Lemma 2.6 in (Nikolov & Tang, 2024)) Let M : Rn Rm be any randomized mechanism and A Rm n be any matrix. If there exists some universal constant C such that for any input x Rn and any θ Rm, it satisfies Var[θ M(x)] wθ(ABn 1 ) C then ABn 1 C p ΣM(x)Bm 2 . Published as a conference paper at ICLR 2025 Proof: Recall that ΣM(x) = E[(M(x) E[M(x)])(M(x) E[M(x)]) ] is the covariance matrix of M(x). Therefore, Var[θ M(x)] can be written as q Var[θ M(x)] = q Note that p ΣM(x)θ 2 = p ΣM(x)θ 2 u 2 for any u Bm 2 . Therefore, by Cauchy-Schwarz inequality, we have that p ΣM(x)θ 2 max u Bm 2 θ p ΣM(x)u = max v E θ v ΣM(x)Bm 2 . In the above, the equality can be achieved if u = θ/ θ 2. Now, for any v ABn 1 , let Kv = {u v : u ABn 1 } be a convex body. Since v ABn 1 , for any θ Rm, max u Kv θ u = max w ABn 1 {θ w} θ v 0. This implies the following set of inequalities: max u Kv θ u max u Kv θ u + max u Kv θ u = max u Kv θ u min u Kv θ u = wθ(Kv) = wθ(ABn 1 ). Finally, the assumption in Lemma F.4 is equivalent to the following: for any θ Rm, c max w E θ w max u Kv θ u. Since both E = p ΣM(x)Bm 2 and Kv = ABn 1 v (for some v ABn 1 ) contain zero vector, we have Kv c p ΣM(x)Bm 2 . This completes the proof of Lemma F.4 (and Lemma 2.6). In the view of Lemma F.4, we next show that for any direction θ Rm, p Var[θ M(x)] 1 εw ABn 1 (θ) as long as M(x) is (ε, δ)-differentially private. Lemma F.5 Fix any 0 < ε < 1 2, 0 δ < 1 and a query matrix A Rm n. Let C > 0 be some universal constant. If M( ) is an additive noise mechanism such that M(x) = Ax + z for any dataset x Rn and M( ) preserves (ε, δ)-differential privacy, then for any x Rn and any direction θ Rm, let δ = e1/2 1 eε 1 δ, we have q Var[θ M(x)] 1 δ Cε w ABn 1 (θ). Proof: [Proof Of Lemma F.5] Given any vector θ Rm, recall that the width of a convex body K Rm with respect to θ is defined as w K(θ) := max x K θ x min x K θ x in Definition A.9. In this Lemma, we aim to show that if M(x) preserves (ε, δ)-differential privacy, then the variance of the one-dimensional marginal of M(x) cannot be very small in terms of w ABn 1 (θ). Unlike Nikolov & Tang (2024), since we focus on the additive noise mechanisms, we consider Lemma F.3 relating to the lower bound for such mechanisms. Fix any θ Rd. We remark that we are trying the give the lower bound of the variance of onedimensional marginal θ M(x), and θ M(x) = θ Ax + θ z. By the post-processing property, θ M(x) also preserves (ε, 0)-differential privacy since M(x) is ε-differentially private. Thus, by Lemma F.3, E (θ z)2 = Var[θ M(x)] (1 δ )2 ε2 Aθ 2 (1 δ )2 ε2 max x x 1 1 |θ A(x x )|2. (25) Published as a conference paper at ICLR 2025 Fix any θ Rm. We then show that for any x, there always exist a neighboring dataset x such that |θ A(x x )| can be lower bounded by w ABn 1 (θ)/2. This gives a lower bound of E (θ z)2 . The construction closely follows Nikolov & Tang (2024) and we state the construction here for completeness. Consider a mapping f : Rm Rn from ABn 1 to Bn 1 such that for any v ABn 1 , v = Af(v). Given any θ Rm, let w be the vector in ABn 1 that maximizes θ w. Then, we can choose x + such that (x, x +) is a pair of neighboring datasets such that x x + = f(w). In this case, θ A(x x +) = θ Af(w) = θ w = max v ABn 1 θ v. Similarly, for any x we can also find another pair of neighboring datasets x and x such that θ A(x x ) = max v ABn 1 θ v = min v ABn 1 θ v. Thus, we have |θ A(x x +)| + |θ A(x x )| = max v ABn 1 θ v + min v ABn 1 θ v max v ABn 1 θ v min v ABn 1 θ v w ABn 1 (θ). In particular, this implies that, for any θ Rm and any x Rn, there exists an x Rn neighboring to x such that |θ A(x x )| w ABn 1 (θ) 2 , where w K(θ) := max v K v θ min v K v θ. (26) Combining eq. (25) and eq. (26), we get Var[θ M(x)] (1 δ )2 ε2 max x x 1 1 |θ A(x x )|2 ε2 |θ A(x x )|2 (1 δ )2 w ABn 1 (θ) ε for any θ Rm. We now proceed to complete the proof of Theorem 2.8. As a consequence of Lemma F.4 and Lemma F.5, by setting W = Cε p ΣM(x), we see that for any (ε, δ)-differentially private algorithm M and p 2, there are trp/2(ΣM(x)) inf W Rm m trp/2(WW ) : ABn 1 WBm 2 o = Λp(A). Then, by Lemma 2.7 and Lemma 2.2, we have E M(x) Ax p p 1/p 1 δ trp/2(ΣM(X)) (1 δ )Λp(A) Cε (1 δ )γ(p)(A) which completes the proof of Theorem 2.8. F.2 PROOF OF LEMMA 2.5 Suppose A has linearly independent rows, then for any non-zero bθ Rm, bθ A must have at least one non-zero elements since bθ A would be linear combinations of rows of A. Let the non-zero element be (bθ A)i > 0. Then κ(A) = min θ θ=1 max x Bn 1 θ Ax min x Bn 1 θ Ax 2(bθ A)i > 0. On the other hand, if A has linearly dependent rows, then there will be a bθ Rn such that bθ A = 0, and thus κ(A). Intuitively, such A maps Bn 1 to a lower dimension polytope. Published as a conference paper at ICLR 2025 Figure 1: A geometric intuition of Lemma 2.14. G MISSING PROOFS FROM SECTION 2.4 G.1 PROOF OF LEMMA 2.14 Proof: We first show that κ(Aprefix) 2. Fix any unit vector θ = (θ1, θn) Rn, we have max x 1 1 θ Aprefixx = |θ1| + 2|θ2| + + n|θn| = i=1 i|θi|. (27) We show the minimum value of eq. (27) by induction. We claim that for any 1 i n, conditioned on Pi j=1 θ2 j = a for any 0 < a 1, the minimum value of Pi j=1 j|θj| is at least c. Now, consider the new condition Pi+1 j=1 θ2 j = c for some 0 < c 1. Let a = c θ2 i+1, according to the assumption, we have that i X j=1 j|θj| + (i + 1)|θi+1| q c θ2 i+1 + (i + 1)|θi+1|. Consider the function f(y) = p c y2 + (i + 1)y for 0 < c 1 and 0 y c. We have df dy = y c y2 +i+1. Clearly for any i 1, there exists a 0 < c0 < c such that f(y) monotonically increasing in (0, c0) and monotonically decreasing in (c0, c). Thus, f(y) min{f(0), f( c)} c. That is, j=1 j|θj| + (i + 1)|θi+1| c. Since c can be any value in (0, 1] and |θ1| = 1 if θ2 1 = 1, by induction, for any unit vector θ Rn, max x 1 1 θ Aprefixx = i=1 i|θi| 1. With a symmetric argument, we have that for the same vector θ, min x 1 1 θ Aprefixx = i=1 i|θi| 1, Which implies that κ(Aprefix) 2. On the other hand, Let bθ = e1 = (1, 0, , 0) , then it is easy to see that w ABn 1 (bθ) = 2. Thus, κ(Aprefix) = 2. Figure 1 also gives a geometric explanation of the most narrow width of ABn 1 when n = 2. In the following diagram, x+ = (1, 1) Aprefix B2 1 and x = ( 1, 1) Aprefix B2 1. Published as a conference paper at ICLR 2025 G.2 PROOF OF THEOREM 2.12 We show two factorizations of the counting matrix Aprefix that works across all p-norm for constant p. If we use the factorization of Aprefix as in Fichtenberger et al. (2023) or Henzinger & Upadhyay (2025). We here discuss Fichtenberger et al. (2023) for its simplicity. To recall their result, they construct matrices L and R such that LR = Aprefix and L[i, j] = R[i, j] = f(i j) i j 0 i < j , where f(k) = 1 1 2k f(k 1) k 1 1 k = 0 By noting that f(k) is the Wallis formula, we know that f(k) 1 πk. This implies that i=1 R[i, 1]2 = i=1 f(i 1)2 1 + 1 π(i 1) = O(log(n)) trp/2(LL ) = i=1 ( L[i, :] 2)p/2 !1/p i=1 logp/2(i) That is, γ(p)(Aprefix) = O(n1/p log(n)). Then, Theorem 2.12 follows using Theorem D.1.