# Robust Testing and Estimation under Manipulation Attacks

Jayadev Acharya*¹, Ziteng Sun*¹, Huanyu Zhang*¹

*Equal contribution. ¹Electrical and Computer Engineering, Cornell University, Ithaca, USA. Correspondence to: Jayadev Acharya, Ziteng Sun, Huanyu Zhang.

Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).

**Abstract.** We study robust testing and estimation of discrete distributions in the strong contamination model. We consider both the centralized setting and the distributed setting with information constraints, including communication and local privacy (LDP) constraints. Our technique relates the strength of manipulation attacks to the earth-mover distance with Hamming distance as the metric between messages (samples) from the users. In the centralized setting, we provide optimal error bounds for both learning and testing. Our lower bounds under local information constraints build on the recent lower bound methods in distributed inference. In the communication-constrained setting, we develop novel algorithms based on random hashing and an $\ell_1/\ell_1$ isometry.

1. Introduction

Data from users form the backbone of modern distributed learning systems such as federated learning (Kairouz et al., 2019). Two key aspects of such large-scale distributed systems that make inference tasks challenging are (i) information constraints at the users (e.g., preserving privacy, bandwidth limitations), and (ii) $\gamma$-manipulation attacks, where an adversary has complete control over a $\gamma$ fraction of the users. An extreme example is when malicious users are deliberately injected to disrupt the system. Note that when there are only manipulation attacks and no information constraints, the setting is equivalent to robust inference where a fraction of the samples can be adversarially corrupted.

(Cheu et al., 2021) initiated the study of manipulation attacks under local differential privacy (LDP), thereby considering the practically important setting where both of the challenges above exist simultaneously. Motivated by their work, we further study manipulation attacks for inference on discrete distributions, both with and without information constraints.

[Figure 1. The information-constrained distributed model with manipulation attack: users hold samples $X_1, \dots, X_n$, pass them through channels $W_1, \dots, W_n$ to produce messages $Y_1, \dots, Y_n$; the adversary rewrites the shaded (manipulated) messages, yielding the observed sequence $Z_1, \dots, Z_n$.]

**Problem setup.** Let $\Delta_k$ be the simplex of all distributions over a discrete domain $\mathcal{X}$ of size $k$ (wlog let $\mathcal{X} = [k] := \{1, \dots, k\}$), and let $X^n := (X_1, \dots, X_n)$ be $n$ independent samples from an unknown $p \in \Delta_k$, distributed across $n$ users. Each user $i$ sends a message $Y_i \in \mathcal{Y}$ based on $X_i$ according to a pre-specified communication protocol $\Pi$ to the server. An adversary has access to $\Pi$ and observes the intended messages $Y^n := (Y_1, \dots, Y_n)$. It then chooses a set $C \subseteq [n]$ with $|C| \le m := \gamma n$ and performs an attack $M_C : \mathcal{Y}^n \to \mathcal{Y}^n$ as follows: for each $i \in C$, it can change $Y_i$ to an arbitrary $Y_i'$. The output of this attack is $Z^n := (Z_1, \dots, Z_n) = M_C(Y^n)$, which satisfies $Z_i = Y_i$ if $i \notin C$, and $Z_i = Y_i'$ if $i \in C$. We call this a $\gamma$-manipulation attack. A central server observes $Z^n$ (and has no knowledge of $C$ or $M_C$) and has to solve an inference task on $p$. See Figure 1 for an overview of the model.
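To make the threat model concrete, the following minimal simulation sketch (our own illustration; the function names are not from the paper) instantiates the pipeline of Figure 1 in the unconstrained case, where each user's channel is the identity map:

```python
import numpy as np

rng = np.random.default_rng(0)

def manipulation_attack(y, gamma, rng):
    """A gamma-manipulation attack: the adversary observes all intended
    messages Y^n, picks a set C with |C| <= gamma*n, and rewrites those
    messages arbitrarily. This toy attack pushes them onto symbol 0;
    real attacks may depend on Y^n (and U) in arbitrary ways."""
    z = y.copy()
    m = int(gamma * len(y))
    C = rng.choice(len(y), size=m, replace=False)   # corrupted users
    z[C] = 0
    return z

k, n, gamma = 10, 1000, 0.1
p = np.full(k, 1.0 / k)                  # unknown distribution (uniform here)
x = rng.choice(k, size=n, p=p)           # user samples X^n
y = x.copy()                             # unconstrained case: Y_i = X_i
z = manipulation_attack(y, gamma, rng)   # the server observes Z^n
print(np.bincount(z, minlength=k) / n)   # empirical histogram, biased toward 0
```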
Remark 1. A natural question is what happens if the adversary, in addition to $Y^n$, can also observe the original samples $X^n$. The algorithms proposed in this paper all work with the same guarantees in this setting as well. Moreover, the attacks we propose only use $Y^n$, and therefore any optimality result also holds under this stronger threat model.

Remark 2. Our threat model is close to the strong contamination model (Diakonikolas & Kane, 2019) considered in the recent literature on robust statistics. This adversary is stronger than that of (Cheu et al., 2021), which is only allowed to select $C$ and choose the $Y_i'$'s based on the protocol $\Pi$ rather than on the messages $Y^n$.

**Communication protocol.** We consider public-coin non-interactive protocols in this work. The users have access to public randomness $U$, which is independent of $X^n$. Based on $U$, user $i$ chooses a channel $W_i$, which is a (possibly randomized) mapping described by
$$W_i(y \mid x) = \Pr(Y_i = y \mid X_i = x).$$
When the input distribution is $p$ and the channel is $W_i$, the distribution of $Y_i$ is
$$\Pr(Y_i = y) = \sum_{x} p(x)\, W_i(y \mid x) = \mathbb{E}_{X \sim p}[W_i(y \mid X)].$$
For a given set of channels $W^n$ and input distribution $p$, let $p^{Y^n}$ denote the distribution of the output messages; by independence of the $X_i$'s,
$$p^{Y^n}(y^n) = \prod_{i=1}^{n} \mathbb{E}_{X \sim p}[W_i(y_i \mid X)].$$
All users then send their messages $Y_i = W_i(X_i)$, $i \in [n]$, to the server simultaneously. The adversary also observes $U$ and can use this information in its attack (e.g., choose $C$ depending on $U$ as well).

**Information constraints.** We model information constraints at the users by a set of channels $\mathcal{W}$ with input domain $[k]$. We illustrate this with two canonical examples: local differential privacy and communication constraints.

*Local differential privacy (LDP).* A channel $W : [k] \to \mathcal{Y}$ (for an arbitrary output alphabet $\mathcal{Y}$) is $\varepsilon$-LDP if
$$\forall y \in \mathcal{Y},\ x, x' \in \mathcal{X}, \quad W(y \mid x) \le e^{\varepsilon}\, W(y \mid x'),$$
which requires that the output distributions be close no matter what the input is, hence protecting the identity of the input. We use $\mathcal{W}_\varepsilon$ to denote the set of all $\varepsilon$-LDP channels.

*Communication constraints.* Let $\ell < \log k$, and let $\mathcal{W}_\ell := \{W : [k] \to \mathcal{Y} = \{0,1\}^{\ell}\}$ be the set of channels that output $\ell$-bit messages.
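As a concrete instance of a channel in $\mathcal{W}_\varepsilon$, the sketch below (our own; $k$-ary randomized response is a standard mechanism, not a protocol proposed in this paper) builds the channel matrix and verifies the $\varepsilon$-LDP constraint numerically:

```python
import numpy as np

def randomized_response(k, eps):
    """k-ary randomized response: report the true symbol with probability
    e^eps / (e^eps + k - 1), and each other symbol with probability
    1 / (e^eps + k - 1). Rows are inputs x, columns are outputs y."""
    W = np.full((k, k), 1.0 / (np.exp(eps) + k - 1))
    np.fill_diagonal(W, np.exp(eps) / (np.exp(eps) + k - 1))
    return W

eps = 1.0
W = randomized_response(k=5, eps=eps)
assert np.allclose(W.sum(axis=1), 1.0)   # each row is a distribution over outputs
# eps-LDP check: W(y|x) <= e^eps * W(y|x') for all y, x, x'.
assert np.all(W.max(axis=0) <= np.exp(eps) * W.min(axis=0) + 1e-12)
```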
**Inference tasks.** We consider the fundamental tasks of distribution estimation (learning) and goodness-of-fit (identity testing), described below.

*Distribution learning (DL).* The goal is to design messaging schemes and an estimator $\hat p : \mathcal{Y}^n \to \Delta_k$ for the underlying distribution $p$. The loss is measured in the expected total variation (TV) distance between $\hat p$ and $p$, i.e., $\mathbb{E}_p[d_{\rm TV}(\hat p(Z^n), p)]$, where the expectation is over the randomness of the samples $X^n \sim p$ and of the scheme. We wish to characterize the following minimax loss (risk) under manipulation attacks, where we design the best messaging schemes $W^n := (W_1, \dots, W_n) \in \mathcal{W}^n$ and estimator $\hat p$ for the worst distribution:
$$R_{\rm DL}(k, n, \mathcal{W}, \gamma) := \inf_{\hat p,\, W^n} \sup_{p} \sup_{M_C : |C| \le \gamma n} \mathbb{E}\left[d_{\rm TV}(\hat p(Z^n), p)\right].$$
(Note that, by definition, $\sup_{M_C : |C| \le \gamma n}$ should sit inside the expectation, since the attacker observes the messages. However, since $M_C$ is a function of $Y^n$, the supremum over $M_C$ already covers all possible attacks, so both minimax formulations give the same quantity.) Without any information constraints and without any manipulation (i.e., $\gamma = 0$), when the server observes $X^n$, the risk is known to be $\Theta(\sqrt{k/n})$, achieved by the empirical histogram.

*Identity testing (IT).* Let $q \in \Delta_k$ be a known reference distribution and $\alpha > 0$ be a distance parameter. The goal is to design $W^n$ and a tester $\mathcal{T} : \mathcal{Y}^n \to \{\text{yes}, \text{no}\}$ such that under any $\gamma$-manipulation attack $M_C$,
$$\Pr_p[\mathcal{T}(Z^n) = \text{yes}] > 0.9 \text{ if } p = q, \qquad \Pr_p[\mathcal{T}(Z^n) = \text{no}] > 0.9 \text{ if } d_{\rm TV}(p, q) \ge \alpha. \tag{1}$$
In other words, with probability at least 0.9, we can test whether $p = q$ or $p$ is $\alpha$-far in total variation distance from $q$. The minimax risk of IT under manipulation attacks is
$$R_{\rm IT}(k, n, \mathcal{W}, \gamma) := \inf\{\alpha : \forall q \in \Delta_k,\ \exists W^n, \mathcal{T} \text{ s.t. (1) holds}\},$$
the smallest $\alpha$ for which we can test whether a distribution is $\alpha$-far from $q$. Uniformity testing (UT) refers to the testing problem where we restrict $q$ to be $u_{[k]}$, the uniform distribution over $[k]$; we denote the corresponding risk by $R_{\rm UT}(k, n, \mathcal{W}, \gamma)$. We also denote by $R^{\beta}_{\rm IT(UT)}(k, n, \mathcal{W}, \gamma)$ the smallest $\alpha$ such that (1) holds with probability $\beta$ (instead of 0.9). Without constraints and attacks, the risk is known to be $\Theta(k^{1/4}/\sqrt{n})$ (Paninski, 2008; Chan et al., 2014).

We now mention two special cases of our setting. When there are no information constraints (i.e., $\mathcal{W}$ contains every channel), users can transmit $Y_i = X_i$, i.e., the samples are sent as is; a $\gamma$ fraction of the samples are then corrupted, which reduces to the strong contamination model in robust estimation. We denote the corresponding rates by $R_{\rm DL(IT)}(k, n, \gamma)$, dropping $\mathcal{W}$ from the notation. When $\gamma = 0$, there is no manipulation and only information constraints are present, and we denote the rates by $R_{\rm DL(IT)}(k, n, \mathcal{W})$.

**Organization.** We present our contributions and related work in Sections 2 and 3, respectively. In Section 4, we establish our lower bound technique based on the earth-mover distance (EMD). In Section 5 we establish tight risk bounds under manipulation attacks without information constraints, and in Section 6 we show bounds for manipulation attacks under information constraints.

2. Our contributions

**Lower bounds from EMD.** Since manipulation attacks can change a $\gamma$ fraction of the $n$ messages, we characterize the difficulty of learning and testing under such attacks in terms of the earth-mover distance (EMD) between messages with Hamming distance as the metric, stated in Theorem 1. Using Le Cam's method, the lower bounds are given in terms of the EMD between distributions of messages generated from mixtures of sources (distributions), which is critical for obtaining tight bounds.

**Robust learning and testing.** Without information constraints, the server observes the true samples from $p$, a $\gamma$ fraction of which are adversarially corrupted. For distribution learning the minimax risk is $\Theta(\sqrt{k/n} + \gamma)$. While this result is standard, we provide it for completeness in Corollary 3. For testing, the optimal risk is more involved. In Theorem 2 we show that when a $\gamma$ fraction of the samples are corrupted, the risk is
$$\Theta\left(\frac{k^{1/4}}{\sqrt{n}} + \gamma + \sqrt{\frac{k\gamma}{n}} + \sqrt{\gamma}\left(\frac{k}{n}\right)^{1/4}\right),$$
where the first term is the statistical rate proved in (Paninski, 2008; Valiant & Valiant, 2014; Diakonikolas et al., 2018). In particular, when $\gamma \ge \min\{1/\sqrt{k}, 1/\sqrt{n}\}$, the risk increases significantly compared to the uncorrupted case.

**Manipulation attacks under information constraints.** In Corollary 14, we provide a general lower bound for estimating and testing distributions under information constraints and a $\gamma$-fraction manipulation attack. The result builds on the recently developed framework for distributed inference in (Acharya et al., 2019b) and bounds the EMD between messages in terms of the trace norm of a channel information matrix (Definition 3).
**Communication constraints.** In Theorems 8 and 9, we establish risk bounds for distribution learning and testing under $\ell$-bit communication constraints. We propose a protocol based on random hashing which matches the lower bound we prove up to logarithmic factors. Our bounds show that manipulation attacks are significantly stronger under communication constraints: the error due to manipulation can be as large as $\Theta(\gamma\sqrt{k/2^\ell})$, compared to $\gamma$ in the unconstrained setting. We also provide a robust testing algorithm under communication constraints based on an $\ell_1/\ell_1$ isometry from (Acharya et al., 2020a). However, the resulting bounds match the lower bounds only for $\ell = O(1)$ or $\ell = \Theta(\log k)$. The testing bound in the unconstrained case suggests that more effort is needed to understand how communication constraints limit EMD; closing this gap is an interesting future direction.

**Privacy constraints.** In Theorem 10 we prove a lower bound that matches the upper bounds of (Cheu et al., 2021) for both testing (up to a constant factor) and learning (up to a logarithmic factor). We note that (Cheu et al., 2021) proves a lower bound smaller by a logarithmic factor under a weaker threat model, which is not directly comparable to our result.

The results are summarized in Table 1. (All stated risk bounds are upper bounded by 1, which is omitted throughout the paper for simplicity.)

Table 1. Summary of results. For the problems marked by (*), (Cheu et al., 2021) also provides lower bounds smaller than the stated bounds by a logarithmic factor, under a weaker threat model.

| Task | Constraint | Manipulation risk (UB) | Manipulation risk (LB) |
|---|---|---|---|
| DL | none | $O(\sqrt{k/n} + \gamma)$ (folklore) | $\Omega(\sqrt{k/n} + \gamma)$ (folklore, Corollary 3) |
| DL | $\varepsilon$-LDP (*) | $\tilde O\!\left(k/(\sqrt{n}\,\varepsilon) + \gamma\sqrt{k}/\varepsilon\right)$ (Cheu et al., 2021) | $\Omega\!\left(k/(\sqrt{n}\,\varepsilon) + \gamma\sqrt{k}/\varepsilon\right)$ (Theorem 10) |
| DL | $\ell$ bits | $\tilde O\!\left(\sqrt{k^2/(n(2^\ell \wedge k))} + \gamma\sqrt{k/2^\ell}\right)$ (Theorem 8) | $\Omega\!\left(\sqrt{k^2/(n(2^\ell \wedge k))} + \gamma\sqrt{k/2^\ell}\right)$ (Theorem 8) |
| IT | none | $O\!\left(k^{1/4}/\sqrt{n} + \gamma + \sqrt{k\gamma/n} + \sqrt{\gamma}(k/n)^{1/4}\right)$ (Theorem 2) | $\Omega\!\left(k^{1/4}/\sqrt{n} + \gamma + \sqrt{k\gamma/n} + \sqrt{\gamma}(k/n)^{1/4}\right)$ (Theorem 2) |
| IT | $\varepsilon$-LDP (*) | $O\!\left(\sqrt{k}/(\sqrt{n}\,\varepsilon) + \gamma\sqrt{k}/\varepsilon\right)$ (Cheu et al., 2021) | $\Omega\!\left(\sqrt{k}/(\sqrt{n}\,\varepsilon) + \gamma\sqrt{k}/\varepsilon\right)$ (Theorem 10) |
| IT | $\ell$ bits | $O\!\left(\sqrt{k/(2^{\ell/2} n)} + \gamma\sqrt{k/2^\ell} + \sqrt{k\gamma/n} + \sqrt{\gamma k}/(2^{\ell/4} n^{1/4})\right)$ (Theorem 9) | $\Omega\!\left(\sqrt{k/(2^{\ell/2} n)} + \gamma\sqrt{k/2^\ell} + \sqrt{k\gamma/n}\right)$ (Theorem 9) |

3. Related work

Without local information constraints, our work is related to the literature on robust statistical inference. Robust statistics has a long history (Huber, 2004). More recently, interest has focused on designing computationally efficient robust algorithms for high-dimensional estimation (Diakonikolas et al., 2016; Lai et al., 2016); see (Diakonikolas & Kane, 2019) for a survey. In (Diakonikolas et al., 2016; Chen et al., 2016), it is proved that for estimating a single Gaussian distribution, the risk due to adversarial attack is $\Theta(\gamma)$. For discrete distributions, a line of work (Qiao & Valiant, 2018; Chen et al., 2020; Jain & Orlitsky, 2020a;b) considers robust estimation in the distributed setting where each user contributes $s \ge 1$ samples. Compared to the result in Corollary 3, they show that in this case the risk due to manipulation can be much smaller than $\gamma$.

(Valiant & Valiant, 2011; Daskalakis et al., 2018) study tolerant identity testing, where the goal is to test between $d_{\rm TV}(p, q) \le \alpha/10$ and $d_{\rm TV}(p, q) \ge \alpha$ for a reference distribution $q$. The optimal sample complexity has been established to be $\Theta(k/(\alpha^2 \log k))$. Suppose $\gamma = \alpha$; then a $\gamma$-robust identity tester is also a tolerant tester, since with a $\gamma$ fraction of the users under its control, the adversary can, with high probability, simply change the distribution to another distribution within TV distance $\alpha/2$. Theorem 2 shows that the robust setting is strictly harder: $\Theta(k)$ samples are needed when $\gamma$ and $\alpha$ are both constants. This is due to the richer class of attacks available to the adversary compared to tolerant testing, where the samples remain independent. Robust identity testing of Gaussian distributions without information constraints has been studied in (Diakonikolas & Kane, 2020), where the contamination model is slightly different.
In (Acharya et al., 2020c), identity testing of Gaussians is studied under communication constraints, without manipulation attacks. Without manipulation attacks, there has been significant recent interest in discrete distribution learning and testing in the distributed setting under information constraints. Optimal risks have been established under communication constraints (Han et al., 2018; Acharya et al., 2020d; 2019b; 2020b) and LDP constraints (Duchi et al., 2013; Kairouz et al., 2016; Sheffet, 2017; Acharya et al., 2019a; 2020b).

In distributed learning systems, especially federated learning (Kairouz et al., 2019), manipulation attacks are related to so-called model poisoning attacks (Bhagoji et al., 2019; Bagdasaryan et al., 2020), where the attacker has full control of a fraction of the users and can change model updates arbitrarily, without having to obey the local training and messaging protocol. These works show empirically that manipulation attacks can significantly outperform classic data poisoning attacks, where the attacker can only insert data points at local users whose messages still follow the local messaging protocol.

4. Moving the earth: the power of manipulation attacks

We now characterize the power of manipulation attacks in terms of the earth-mover (a.k.a. Wasserstein) distance between the distributions of the messages at the output of the channels. We first recall the EMD with the Hamming metric.

Definition 1. Let $Q_1$ and $Q_2$ be distributions over $\mathcal{Y}^n$ and let $\pi(Q_1, Q_2)$ be the set of all couplings between $Q_1$ and $Q_2$. The earth-mover distance (EMD) between $Q_1$ and $Q_2$ is
$$d_{\rm EM}(Q_1, Q_2) := \inf_{Q \in \pi(Q_1, Q_2)} \mathbb{E}_{(Y^n_{(1)}, Y^n_{(2)}) \sim Q}\left[d_{\rm Ham}(Y^n_{(1)}, Y^n_{(2)})\right].$$

Note that a $\gamma$-manipulation attack can change $Y^n_{(1)} \in \mathcal{Y}^n$ to another sequence $Y^n_{(2)} \in \mathcal{Y}^n$ as long as $d_{\rm Ham}(Y^n_{(1)}, Y^n_{(2)}) \le \gamma n$. If $Q_1$ and $Q_2$ are distributions over length-$n$ messages in $\mathcal{Y}^n$ with EMD at most $c\gamma n$, for some small constant $c$, the attack can effectively confuse sequences generated from $Q_1$ and $Q_2$. We formalize this intuition in Theorem 1 below.
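Definition 1 is an optimal-transport problem and, for tiny message spaces, can be evaluated exactly as a linear program over couplings. The sketch below (our own illustration, assuming scipy is available) computes $d_{\rm EM}$ with the Hamming ground cost:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def emd_hamming(Q1, Q2, seqs):
    """Exact EMD (Definition 1) between distributions Q1, Q2 over `seqs`,
    with Hamming distance as the ground cost: minimize <C, Q> over all
    couplings Q with row marginals Q1 and column marginals Q2."""
    N = len(seqs)
    C = np.array([[sum(a != b for a, b in zip(s, t)) for t in seqs] for s in seqs])
    A_eq, b_eq = [], []
    for i in range(N):                       # sum_j Q[i, j] = Q1[i]
        row = np.zeros(N * N); row[i * N:(i + 1) * N] = 1
        A_eq.append(row); b_eq.append(Q1[i])
    for j in range(N):                       # sum_i Q[i, j] = Q2[j]
        col = np.zeros(N * N); col[j::N] = 1
        A_eq.append(col); b_eq.append(Q2[j])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun

# Messages of length n = 2 over Y = {0, 1}.
seqs = list(itertools.product([0, 1], repeat=2))
Q1 = np.array([0.25, 0.25, 0.25, 0.25])     # uniform over {0,1}^2
Q2 = np.array([0.40, 0.10, 0.10, 0.40])     # correlated bits
# 0.15 mass moves from each of 01 and 10, at Hamming cost 1: EMD = 0.30.
print(emd_hamming(Q1, Q2, seqs))
```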
A key ingredient in the theorem is to consider message distributions arising from a mixture of input distributions. Let $q \in \Delta_k$ be a reference distribution and let $P \subseteq \Delta_k$ be a finite set such that
$$\text{for all } p \in P, \quad d_{\rm TV}(p, q) \ge \alpha. \tag{2}$$
Let $p$ be drawn uniformly from $P$. Further, for a fixed $W^n \in \mathcal{W}^n$, let
$$\mathbb{E}\left[p^{Y^n}\right] = \frac{1}{|P|} \sum_{p \in P} p^{Y^n} \tag{3}$$
be the message distribution when the input distribution is chosen uniformly from $P$.

Theorem 1. Suppose $P$ satisfies (2) for some $q$ and $\mathbb{E}[p^{Y^n}]$ is as defined in (3). If for all $W^n \in \mathcal{W}^n$,
$$d_{\rm EM}\left(\mathbb{E}\left[p^{Y^n}\right],\ q^{Y^n}\right) \le \frac{\gamma n}{2},$$
then for both distribution learning and testing,
$$R_{\rm IT}(k, n, \mathcal{W}, \gamma) = \Omega(\alpha), \qquad R_{\rm DL}(k, n, \mathcal{W}, \gamma) = \Omega(\alpha).$$

Proof. We first show a reduction from testing to learning. Suppose there exists a distribution learning algorithm with risk $\alpha/20$ under any $\gamma$-manipulation attack. We can use it for testing as follows: (i) by Markov's inequality, the learner outputs a $\hat p$ such that with probability at least 0.9, $d_{\rm TV}(\hat p, p) \le \alpha/2$; and (ii) we then test whether $d_{\rm TV}(\hat p, q) \le \alpha/2$ to perform identity testing with respect to $q$. This shows that $R_{\rm DL}(k, n, \mathcal{W}, \gamma) \ge c\, R_{\rm IT}(k, n, \mathcal{W}, \gamma)$ for some constant $c$. Therefore, it suffices to prove that $R_{\rm IT}(k, n, \mathcal{W}, \gamma) = \Omega(\alpha)$.

Fix $W^n \in \mathcal{W}^n$ (without loss of generality, $W^n$ can be taken as fixed, since the adversary observes the public randomness $U$); we have $d_{\rm EM}(\mathbb{E}[p^{Y^n}], q^{Y^n}) \le \gamma n/2$. By the existence of a minimizer for optimal transport (Hamming distance is a lower semi-continuous cost function), there exists a randomized mapping $F$ such that if $Y^n_{(1)} \sim \mathbb{E}[p^{Y^n}]$, then $Y^n_{(2)} \stackrel{D}{=} F(Y^n_{(1)}) \sim q^{Y^n}$ and
$$\mathbb{E}\left[d_{\rm Ham}(Y^n_{(1)}, F(Y^n_{(1)}))\right] \le \frac{\gamma n}{2}.$$
By Markov's inequality,
$$\Pr\left(d_{\rm Ham}(Y^n_{(1)}, F(Y^n_{(1)})) > \gamma n\right) \le \frac12. \tag{4}$$
Consider the following $\gamma$-manipulation attack $M_C$:
$$M_C(Y^n_{(1)}) = \begin{cases} F(Y^n_{(1)}), & \text{if } d_{\rm Ham}(Y^n_{(1)}, F(Y^n_{(1)})) \le \gamma n, \\ Y^n_{(1)}, & \text{if } d_{\rm Ham}(Y^n_{(1)}, F(Y^n_{(1)})) > \gamma n. \end{cases}$$
Then by (4),
$$d_{\rm TV}\left(M_C(Y^n_{(1)}),\ Y^n_{(2)}\right) = d_{\rm TV}\left(M_C(Y^n_{(1)}),\ F(Y^n_{(1)})\right) \le \frac12.$$
Hence, if the attacker sends the true messages when the distribution is $q$, it is impossible, by the Bayes risk, to test between $q$ and $\mathbb{E}[p^{Y^n}]$ with success probability at least $9/10$. By Le Cam's two-point method, it is therefore impossible to tell whether the unknown distribution equals $q$ or comes from $P$, which concludes the proof. □

With the main technique at hand, for each family of channels we prove lower bounds by designing distributions that are separated in TV distance while the corresponding messages are close in earth-mover distance. We start with the unconstrained setting in Section 5 and turn to the constrained case in Section 6.

5. Robust identity testing and learning

We consider robust identity testing without information constraints, i.e., the server observes the raw samples, a $\gamma$ fraction of which are adversarially corrupted. We prove the following tight minimax rate.

Theorem 2.
$$R_{\rm IT}(k, n, \gamma) = \Theta\left(\frac{k^{1/4}}{\sqrt{n}} + \gamma + \sqrt{\frac{k\gamma}{n}} + \sqrt{\gamma}\left(\frac{k}{n}\right)^{1/4}\right).$$

The first term is the statistical rate, implied by the sample complexity bound in (Paninski, 2008). Our bound implies that when $\gamma > \min\{1/\sqrt{n}, 1/\sqrt{k}\}$, the risk due to manipulation can be significantly larger than the statistical risk. The upper bound is based on the $\ell_1$-tester proposed in (Diakonikolas et al., 2018), which we present in Section 5.1. The lower bound is proved using the earth-mover distance technique developed in Section 4, and is given in Section 5.2.

We get the next corollary for learning under $\gamma$-manipulation attacks without information constraints.

Corollary 3 (folklore).
$$R_{\rm DL}(k, n, \gamma) = \Theta\left(\sqrt{\frac{k}{n}} + \gamma\right).$$

The upper bound is achieved by the empirical distribution. The first term is the standard risk without manipulation, and the second term follows from the second term in Theorem 2 and the reduction from testing to learning. We omit the details.
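The upper bound of Corollary 3 can be checked empirically; the sketch below (our own, with one arbitrary choice of attack) compares the TV error of the empirical histogram on clean and on $\gamma$-manipulated samples:

```python
import numpy as np

rng = np.random.default_rng(1)

def tv(p, q):
    return 0.5 * np.abs(p - q).sum()

k, n, gamma = 100, 100_000, 0.05
p = rng.dirichlet(np.ones(k))          # an arbitrary true distribution
x = rng.choice(k, size=n, p=p)         # clean samples X^n
z = x.copy()
z[rng.choice(n, size=int(gamma * n), replace=False)] = 0   # one possible attack
p_hat_clean = np.bincount(x, minlength=k) / n
p_hat_corr = np.bincount(z, minlength=k) / n
print(tv(p_hat_clean, p))   # ~ sqrt(k/n) ≈ 0.03: the statistical term
print(tv(p_hat_corr, p))    # roughly gamma = 0.05 larger: the manipulation term
```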
5.1. Upper bound for testing

Our upper bound proceeds in two stages: we first reduce identity testing to uniformity testing, and then provide an algorithm for uniformity testing.

**Reduction from identity to uniformity testing.** For unconstrained distribution testing, Theorem 1 of (Goldreich, 2016) showed that, up to constant factors, the risk of testing identity to any distribution is upper bounded by the risk of uniformity testing. We extend this argument to the $\gamma$-manipulation attack setting. In particular, we will show that
$$R_{\rm IT}(k, n, \gamma) \le 3\, R_{\rm UT}(6k, n, \gamma).$$

Proof. (Goldreich, 2016) showed that for any distribution $q$ over $[k]$, there exists a randomized function $G_q : [k] \to [6k]$ such that if $X \sim q$, then $G_q(X) \sim u_{[6k]}$, and if $X \sim p$ for a distribution $p$ with $d_{\rm TV}(p, q) \ge \alpha$, then $d_{\rm TV}(G_q(X), u_{[6k]}) \ge \alpha/3$. Let $X^n$ be $n$ samples from $p$, and let $Z^n$ be a $\gamma$-corrupted version of $X^n$. We apply $G_q$ independently to each $Z_i$ to obtain a new sequence $G_q(Z^n) := (G_q(Z_1), \dots, G_q(Z_n))$. If $p = q$, then $G_q(X^n)$ is distributed according to $u_{[6k]}^n$ and $G_q(Z^n)$ is a $\gamma$-manipulated version of it. If $d_{\rm TV}(p, q) \ge \alpha$, then $G_q(X^n)$ consists of i.i.d. samples from a distribution at least $\alpha/3$-far from $u_{[6k]}$. Therefore, an algorithm for $\gamma$-robust uniformity testing with distance parameter $\alpha/3$ can be used for $\alpha$-identity testing with respect to $q$. □

We now present an algorithm for $\gamma$-robust uniformity testing. Recall that $Z^n$ is obtained by perturbing at most $\gamma n$ of the samples $X^n \sim p$. For $z \in [k]$, let $M_z(Z^n)$ be the number of appearances of $z$ in $Z^n$. Our test statistic, as in (Diakonikolas et al., 2018), is the TV distance between the empirical histogram of $Z^n$ and the uniform distribution:
$$S(Z^n) := \frac12 \sum_{z \in [k]} \left|\frac{M_z(Z^n)}{n} - \frac1k\right|. \tag{5}$$
We first bound the change in the test statistic between $Z^n$ and $X^n$ when $d_{\rm Ham}(Z^n, X^n) \le \gamma n$.

Lemma 5. Suppose $Z^n$ is obtained from $X^n$ by manipulating at most $\gamma n$ samples. Then
$$|S(X^n) - S(Z^n)| \le \min\left\{\gamma,\ \frac{n\gamma}{k}\right\}.$$

Proof. For the first term, by the triangle inequality, for any $a, b \in \mathbb{R}$ we have $||a| - |b|| \le |a - b|$. Using this in (5), we obtain
$$|S(X^n) - S(Z^n)| \le \frac{1}{2n} \sum_{x \in [k]} |M_x(X^n) - M_x(Z^n)| \le \gamma,$$
where we used the fact that changing one sample of $X^n$ changes $M_x(X^n)$ for at most two $x \in [k]$, each by at most one, so that $\sum_{x=1}^{k} |M_x(X^n) - M_x(Z^n)| \le 2\gamma n$.

We note that the second term is smaller than the first when $n < k$. When $n < k$, every symbol $x$ with $M_x \ge 1$ satisfies $M_x/n \ge 1/n > 1/k$, hence
$$S(X^n) = \frac12\left(\sum_{x : M_x \ge 1}\left(\frac{M_x(X^n)}{n} - \frac1k\right) + \frac{\Phi_0(X^n)}{k}\right) = \frac{\Phi_0(X^n)}{k},$$
where $\Phi_0(X^n)$ is the number of symbols not appearing in $X^n$. Since changing one sample changes $\Phi_0$ by at most one,
$$|S(X^n) - S(Z^n)| \le \frac1k\, |\Phi_0(X^n) - \Phi_0(Z^n)| \le \frac{n\gamma}{k}. \qquad \square$$

Next, we use the following result from (Diakonikolas et al., 2018), which shows a separation in the test statistic between $p = u_{[k]}$ and $d_{\rm TV}(p, u_{[k]}) \ge \alpha$. Let $\mu(p) := \mathbb{E}_{X^n \sim p}[S(X^n)]$ be the expectation of the statistic on the original samples.

Lemma 6 ((Diakonikolas et al., 2018), Lemma 4). Let $X^n$ be i.i.d. samples from $p$ over $[k]$. For every $\beta \in (0, 1)$, there exist constants $c_1, c_2$ such that if $\alpha \ge c_1 k^{1/4}/\sqrt{n}$, then with probability at least $1 - \beta$:

1. when $p = u_{[k]}$, $\ S(X^n) - \mu(u_{[k]}) < \frac{9}{10}\, c_2 \alpha^2 \min\left\{\frac{n^2}{k^2}, \sqrt{\frac{n}{k}}\right\}$;
2. when $d_{\rm TV}(p, u_{[k]}) \ge \alpha$, $\ S(X^n) - \mu(u_{[k]}) > \frac{11}{10}\, c_2 \alpha^2 \min\left\{\frac{n^2}{k^2}, \sqrt{\frac{n}{k}}\right\}$.

Our test for uniformity is the following:
$$\mathcal{T}(Z^n) = \begin{cases} \text{yes} & \text{if } S(Z^n) - \mu(u_{[k]}) \le c_2 \alpha^2 \min\left\{\frac{n^2}{k^2}, \sqrt{\frac{n}{k}}\right\}, \\ \text{no} & \text{otherwise}. \end{cases} \tag{6}$$
By Lemma 5, when $\alpha$ is at least a sufficiently large constant multiple of the manipulation terms $\sqrt{k\gamma/n} + \sqrt{\gamma}(k/n)^{1/4}$ in Theorem 2,
$$|S(X^n) - S(Z^n)| \le \frac{1}{10}\, c_2 \alpha^2 \min\left\{\frac{n^2}{k^2}, \sqrt{\frac{n}{k}}\right\}.$$
Setting $\beta = 1/10$ in Lemma 6 then shows that the test in (6) solves the uniformity testing problem.

Remark 3. Note that the proof shows that for any constant failure probability $\beta$, the risk of robust identity testing matches Theorem 2 up to a constant factor: for any $\beta$, there is a constant $c(\beta)$ such that
$$R^{\beta}_{\rm IT}(k, n, \mathcal{W}, \gamma) \le c(\beta)\left(\frac{k^{1/4}}{\sqrt{n}} + \gamma + \sqrt{\frac{k\gamma}{n}} + \sqrt{\gamma}\left(\frac{k}{n}\right)^{1/4}\right).$$
This will be important when we consider error boosting in the proof of Lemma 12.
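A compact rendering of the test in (6) follows (our own sketch; the constant $c_2$ from Lemma 6 is unspecified and set to a placeholder here, and $\mu(u_{[k]})$ is estimated by Monte Carlo):

```python
import numpy as np

rng = np.random.default_rng(2)

def S(z, k):
    """Test statistic (5): TV distance between the empirical histogram and u[k]."""
    hist = np.bincount(z, minlength=k) / len(z)
    return 0.5 * np.abs(hist - 1.0 / k).sum()

def uniformity_test(z, k, alpha, mu_uniform, c2=1.0):
    """The robust test (6). mu_uniform approximates mu(u[k]); c2 stands in
    for the constant from Lemma 6 and would need calibration in practice."""
    n = len(z)
    threshold = c2 * alpha**2 * min(n**2 / k**2, np.sqrt(n / k))
    return "yes" if S(z, k) - mu_uniform <= threshold else "no"

k, n = 50, 5000
# Estimate mu(u[k]) = E[S(X^n)] under uniform samples by Monte Carlo.
mu = np.mean([S(rng.integers(0, k, size=n), k) for _ in range(200)])
print(uniformity_test(rng.integers(0, k, size=n), k, 0.1, mu))   # expected: yes
far = rng.choice(k, size=n, p=np.where(np.arange(k) % 2 == 0, 1.6, 0.4) / k)
print(uniformity_test(far, k, 0.1, mu))   # expected: no (TV(p, u[k]) = 0.3)
```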
5.2. Lower bound

In uniformity testing we have $q = u_{[k]}$. We use for $P$ the following class of $2^{k/2}$ distributions from (Paninski, 2008), indexed by $z \in \{\pm1\}^{k/2}$:
$$P = \{p_z : z \in \{\pm1\}^{k/2}\}, \quad \text{where } p_z(2i-1) = \frac{1 + 2\alpha z_i}{k}, \quad p_z(2i) = \frac{1 - 2\alpha z_i}{k}. \tag{7}$$
Note that for any $z \in \{\pm1\}^{k/2}$, $d_{\rm TV}(p_z, u_{[k]}) = \alpha$. The following lemma, proved in (Acharya et al., 2018), characterizes the earth-mover distance between $\mathbb{E}_{Z \sim u\{\pm1\}^{k/2}}[p_Z^n]$ and $u_{[k]}^n$.

Lemma 7 ((Acharya et al., 2018), Lemma 7).
$$d_{\rm EM}\left(\mathbb{E}_{Z \sim u\{\pm1\}^{k/2}}\left[p_Z^n\right],\ u_{[k]}^n\right) = O\left(n\alpha^2 \min\left\{\frac{n}{k}, \sqrt{\frac{n}{k}}\right\}\right).$$

Note that in the centralized case $Y^n = X^n$, hence $p^{Y^n} = p^n$. Plugging Lemma 7 into Theorem 1, we get the lower bound by setting the EMD to $\gamma n/2$ and solving for $\alpha$.

6. Robust constrained inference

In this section we consider learning and testing under communication and LDP constraints. We first state the results; Sections 6.1 and 6.2 then establish the upper bounds, and Section 6.3 closes with the lower bounds.

**Communication constraints.** For distribution learning under $\ell$-bit communication constraints, we establish the following risk bound, which is optimal up to logarithmic factors.

Theorem 8.
$$R_{\rm DL}(k, n, \mathcal{W}_\ell, \gamma) = \tilde\Theta\left(\sqrt{\frac{k^2}{n(2^\ell \wedge k)}} + \gamma\sqrt{\frac{k}{2^\ell}}\right).$$

The first term is the risk under communication constraints without manipulation attacks (Han et al., 2018; Acharya et al., 2020d). The second term shows that if $\ell < \log k$, a manipulation attack increases the risk by $\Theta(\gamma\sqrt{k/2^\ell})$, compared to the increase of $\Theta(\gamma)$ in the unconstrained setting (Corollary 3).

For identity testing, we obtain the following risk bounds.

Theorem 9. Suppose $\ell \le \log k$. Then
$$R_{\rm IT}(k, n, \mathcal{W}_\ell, \gamma) = O\left(\sqrt{\frac{k}{2^{\ell/2} n}} + \gamma\sqrt{\frac{k}{2^\ell}} + \sqrt{\frac{k\gamma}{n}} + \frac{\sqrt{\gamma k}}{2^{\ell/4} n^{1/4}}\right),$$
$$R_{\rm IT}(k, n, \mathcal{W}_\ell, \gamma) = \Omega\left(\sqrt{\frac{k}{2^{\ell/2} n}} + \gamma\sqrt{\frac{k}{2^\ell}} + \sqrt{\frac{k\gamma}{n}}\right).$$

The first term above is the risk of testing under $\mathcal{W}_\ell$ without manipulation attacks. The risk due to manipulation in the upper bound is increased by a factor of $\sqrt{k/2^\ell}$ compared to the unconstrained case in Theorem 2. We remark that the upper and lower bounds above match (up to constant factors) for $\ell = \Theta(1)$ and for $\ell = \Theta(\log k)$ (when $\ell = \log k$, the bound matches the risk of unconstrained testing in Theorem 2). We believe that the upper bound is tight, and proving a better lower bound is an interesting direction for future work.

**Local privacy constraints.** We establish the following lower bounds for learning and testing under $\varepsilon$-LDP.

Theorem 10. Suppose $\varepsilon = O(1)$. Then
$$R_{\rm DL}(k, n, \mathcal{W}_\varepsilon, \gamma) = \Omega\left(\frac{k}{\sqrt{n}\,\varepsilon} + \frac{\gamma\sqrt{k}}{\varepsilon}\right), \qquad R_{\rm IT}(k, n, \mathcal{W}_\varepsilon, \gamma) = \Omega\left(\frac{\sqrt{k}}{\sqrt{n}\,\varepsilon} + \frac{\gamma\sqrt{k}}{\varepsilon}\right).$$

We note that (Cheu et al., 2021) designs algorithms that achieve the bounds above up to a constant factor for testing and up to logarithmic factors for learning. However, their lower bounds, proved under a weaker threat model, are a logarithmic factor smaller than Theorem 10, making the bounds incomparable.

6.1. Distribution estimation with $\ell$ bits under manipulation

Without loss of generality, we assume $\ell < \log k$. We now present a scheme based on random hashing that achieves the upper bound for learning with the rate in Theorem 8. Random hashing has previously been used for sparse estimation under communication constraints (Acharya et al., 2021).

Definition 2. A random mapping $h : [k] \to [2^\ell]$ is a random hashing function if for all $x \in [k]$ and $y \in [2^\ell]$, $\Pr(h(x) = y) = 1/2^\ell$, with $\{h(x)\}_{x \in [k]}$ mutually independent.

Let $\mathcal{T}$ be the set of all $k \times 2^\ell$ binary matrices that have exactly one 1 in each row. A random hashing function $h$ is equivalent to a $T_h$ drawn uniformly at random from $\mathcal{T}$, with the correspondence
$$T_h(x, y) = \mathbb{1}\{h(x) = y\}.$$
For any fixed $y \in [2^\ell]$, the $y$th column of $T_h$ has each entry an independent $\mathrm{Ber}(1/2^\ell)$ random variable. We describe the protocol and the estimator below.

1. Using the public randomness $U$, the users obtain independent random hashing functions $h_1, \dots, h_n$, and user $i$ sends $Y_i = h_i(X_i)$.
2. Upon receiving the (possibly manipulated) samples $Z^n \in [2^\ell]^n$, the server outputs
$$\hat p(Z^n) = \frac{2^\ell}{2^\ell - 1}\left(\frac1n \sum_{i=1}^{n} T_{h_i}(\cdot, Z_i) - \frac{1}{2^\ell}\right).$$

Without manipulation attacks, when the server receives $Y^n$, it has been shown in (Acharya et al., 2021) that
$$\mathbb{E}\left[d_{\rm TV}(\hat p(Y^n), p)\right] = O\left(\sqrt{\frac{k^2}{2^\ell n}}\right).$$
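The sketch below (our own rendering of the protocol above; the debiasing normalization of $\hat p$ follows from the unbiasedness relation $\mathbb{E}[\mathbb{1}\{h(x) = h(X)\}] = p(x)(1 - 2^{-\ell}) + 2^{-\ell}$) simulates one round without manipulation:

```python
import numpy as np

rng = np.random.default_rng(3)

k, n, ell = 64, 20_000, 3
D = 2**ell

p = rng.dirichlet(np.ones(k))
x = rng.choice(k, size=n, p=p)

# Step 1: independent random hashes h_i via public randomness; Y_i = h_i(X_i).
H = rng.integers(0, D, size=(n, k))      # H[i, x] = h_i(x)
y = H[np.arange(n), x]

# Step 2: the server averages the indicator vectors T_{h_i}(., Z_i) and
# debiases, using E[1{h(x) = h(X)}] = p(x) * (1 - 1/D) + 1/D.
T_bar = (H == y[:, None]).mean(axis=0)   # (1/n) * sum_i T_{h_i}(., Y_i)
p_hat = (D / (D - 1)) * (T_bar - 1.0 / D)

print(0.5 * np.abs(p_hat - p).sum())     # TV error ~ sqrt(k^2/(2^ell n)) ≈ 0.16
```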
Note that in Theorem 8, the second term becomes larger than one when $\gamma > c\sqrt{2^\ell/k}$; we therefore focus on $\gamma \le \sqrt{2^\ell/k}$. By the triangle inequality, it suffices to bound $\mathbb{E}[d_{\rm TV}(\hat p(Y^n), \hat p(Z^n))]$. Let $C$ be the set of manipulated samples, so that $|C| \le \gamma n = m$. Then
$$d_{\rm TV}(\hat p(Y^n), \hat p(Z^n)) \le \frac1n \left\|\sum_{i \in C}\left(T_{h_i}(\cdot, Y_i) - T_{h_i}(\cdot, Z_i)\right)\right\|_1 \le \frac1n \max_{|C'| = m}\ \max_{y_i, z_i \in [2^\ell]} \left\|\sum_{i \in C'}\left(T_{h_i}(\cdot, y_i) - T_{h_i}(\cdot, z_i)\right)\right\|_1 \tag{8}$$
$$= \frac1n \max_{|C'| = m}\ \max_{y_i, z_i \in [2^\ell]}\ \max_{v \in \{\pm1\}^k}\ v^T \sum_{i \in C'}\left(T_{h_i}(\cdot, y_i) - T_{h_i}(\cdot, z_i)\right), \tag{9}$$
where (8) follows by maximizing over $C'$, and (9) holds since for $u \in \mathbb{R}^k$, $\|u\|_1 = \max_{v \in \{\pm1\}^k} v^T u$. Recall that for $i \in C'$ and $y_i, z_i \in [2^\ell]$, $T_{h_i}(\cdot, y_i)$ and $T_{h_i}(\cdot, z_i)$ are both $k$-dimensional binary vectors with each coordinate an independent $\mathrm{Ber}(1/2^\ell)$.

Then for any $C' \subseteq [n]$ with $|C'| = m$, any $y_i, z_i \in [2^\ell]$, and any $v \in \{\pm1\}^k$, the quantity $v^T \sum_{i \in C'}(T_{h_i}(\cdot, y_i) - T_{h_i}(\cdot, z_i))$ is distributed as the difference between two $\mathrm{Binom}(km, 1/2^\ell)$ random variables (the two binomials may be correlated, but the union bound used below still holds), since the $h_i$'s are independently generated. Hence, by the multiplicative Chernoff bound, for any $\eta \in (0, \sqrt{3})$,
$$\Pr\left(v^T \sum_{i \in C'}\left(T_{h_i}(\cdot, y_i) - T_{h_i}(\cdot, z_i)\right) > \frac{2\eta k m}{2^\ell}\right) \le 2\exp\left(-\frac{\eta^2 k m}{3 \cdot 2^\ell}\right). \tag{10}$$
Taking a union bound over the $\binom{n}{m}$ subsets $C'$, the $(2^\ell)^m \cdot (2^\ell)^m$ possible choices of $\{y_i, z_i\}_{i \in C'}$, and the $2^k$ choices of $v \in \{\pm1\}^k$, by (9) and (10) we have
$$\Pr\left(d_{\rm TV}(\hat p(Y^n), \hat p(Z^n)) > \frac{2\eta k m}{2^\ell n}\right) \le 2\binom{n}{m} (2^\ell)^{2m}\, 2^k \exp\left(-\frac{\eta^2 k m}{3 \cdot 2^\ell}\right).$$
Choosing
$$\eta^2 = \frac{6 \cdot 2^\ell}{km}\left(k + 2m\ell \log 2 + m\log n + \log\frac{2^\ell}{\gamma^2 k}\right)$$
(the assumption $\ell < \log k$ guarantees that this choice of $\eta$ lies in $(0, \sqrt{3})$), the right-hand side is at most $\gamma^2 k/2^\ell \le \gamma\sqrt{k/2^\ell}$, and since $d_{\rm TV} \le 1$ the failure event contributes at most $\gamma\sqrt{k/2^\ell}$ to the expectation. Hence, using $n = m/\gamma$,
$$\mathbb{E}\left[d_{\rm TV}(\hat p(Y^n), \hat p(Z^n))\right] = O\left(\gamma\sqrt{\frac{k}{2^\ell}\left(\ell + \log n\right)} + \sqrt{\gamma}\sqrt{\frac{k^2}{2^\ell n}}\right),$$
where the second term is dominated by the first term of Theorem 8. This yields the upper bound of Theorem 8 up to logarithmic factors.

6.2. $\ell$-bit identity testing under manipulation

We now establish the upper bound in Theorem 9. We first reduce the problem to an unconstrained testing problem over a domain of size $2^\ell$, which can be represented using $\ell$ bits, and then invoke our bounds from Theorem 2 for robust identity testing without information constraints. For the reduction, we use the domain compression technique proposed in (Acharya et al., 2020a) to compress the observed samples to a smaller domain of size $2^\ell$. Moreover, the compression preserves the TV distance between any pair of distributions up to a factor of $\Omega(\sqrt{2^\ell/k})$ with constant probability. More precisely, we use the following lemma.

Lemma 11 (Theorem 3.2 of (Acharya et al., 2020a)). For any $\ell < \log k$, there exist a mapping $\phi : \{0,1\}^* \times [k] \to [2^\ell]$ and universal constants $c_1, c_2$ such that for all $p, q \in \Delta_k$ with $d_{\rm TV}(p, q) \ge \alpha$, with probability at least $c_2$ over the public random string $U$,
$$d_{\rm TV}(\phi(U, p), \phi(U, q)) \ge c_1 \alpha \sqrt{\frac{2^\ell}{k}},$$
where, with a slight abuse of notation, $\phi(U, p)$ denotes the distribution of $\phi(U, X)$ when $X \sim p$.

Using this lemma, our testing scheme is the following (a schematic rendering is given at the end of this subsection):

1. Divide the users into $N = \log_{1 - c_2/2}(1/10)$ disjoint batches of equal size, where $c_2$ is the constant in Lemma 11. For each batch $B_j$, $j \in [N]$, generate an independent public random string $U_j$.
2. Each user $i \in B_j$ sends the $\ell$-bit message $Y_i = \phi(U_j, X_i)$.
3. For $j \in [N]$, let $Z(B_j)$ be the messages received from users in $B_j$. Run the robust testing algorithm from Section 5.1 with alphabet size $2^\ell$ and distance parameter $R^{\beta}_{\rm IT}(2^\ell, n/N, N\gamma)$, where $\beta = \min\{c_2/2,\ 1 - (9/10)^{1/N}\}$.
4. If all tests output yes, output yes; otherwise, output no.

We now analyze the algorithm. In the null case, when $p = q$, we have $\phi(U, p) = \phi(U, q)$, and the test in each batch outputs yes with probability at least $1 - \beta$ (see Remark 3). Since the batches are disjoint, all tests output yes with probability at least $(1 - \beta)^N \ge 9/10$, since $\beta \le 1 - (9/10)^{1/N}$.

In the alternate case, suppose that
$$d_{\rm TV}(p, q) \ge \frac{1}{c_1}\sqrt{\frac{k}{2^\ell}}\, R^{\beta}_{\rm IT}\left(2^\ell, \frac{n}{N}, N\gamma\right).$$
Then with probability at least $c_2$ over the randomness of $U_j$, we have $d_{\rm TV}(\phi(U_j, p), \phi(U_j, q)) \ge R^{\beta}_{\rm IT}(2^\ell, n/N, N\gamma)$. Conditioned on this event, by Theorem 2, the test in batch $j$ outputs no with probability at least $1 - \beta \ge 1 - c_2/2$; unconditionally, batch $j$ outputs no with probability at least $c_2(1 - c_2/2) \ge c_2/2$. By the disjointness (hence independence) of the batches, at least one of the tests outputs no with probability at least $1 - (1 - c_2/2)^N \ge 9/10$, by the choice of $N$.

From Remark 3, we know that $R^{\beta}_{\rm IT}(2^\ell, n/N, N\gamma)$ is larger than $R_{\rm IT}(2^\ell, n, \gamma)$ by at most a multiplicative constant (as $N$ and $\beta$ are constants), which gives the following reduction.

Lemma 12.
$$R_{\rm IT}(k, n, \mathcal{W}_\ell, \gamma) = O\left(\sqrt{\frac{k}{2^\ell \wedge k}}\ R_{\rm IT}(2^\ell \wedge k, n, \gamma)\right).$$

To obtain the upper bound in Theorem 9, we only need to plug in the bound on robust testing without information constraints from Theorem 2.
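Schematically, the batched tester looks as follows (our own sketch; `compress_map` is a hypothetical stand-in for the domain-compression map $\phi$ of (Acharya et al., 2020a), whose actual construction differs, and `robust_id_test` is the Section 5.1 tester treated as a black box):

```python
import numpy as np

def compress_map(seed, k, ell):
    """Hypothetical stand-in for phi(U, .): a random assignment of [k] into
    2^ell buckets derived from the public string (here, a seed). The actual
    map of (Acharya et al., 2020a) is constructed more carefully."""
    return np.random.default_rng(seed).integers(0, 2**ell, size=k)

def batched_test(x, q, k, ell, robust_id_test, N=5):
    """Steps 1-4 above: N disjoint batches, independent public strings U_j,
    ell-bit messages phi(U_j, X_i), and a robust identity test per batch.
    Outputs yes iff every batch outputs yes."""
    for j, batch in enumerate(np.array_split(x, N)):
        phi = compress_map(j, k, ell)    # U_j := j, purely for illustration
        # Compressed reference: q'(y) = sum of q(x) over x with phi(x) = y.
        q_c = np.bincount(phi, weights=q, minlength=2**ell)
        if robust_id_test(phi[batch], q_c) == "no":
            return "no"
    return "yes"
```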
6.3. Lower bounds by $\chi^2$-contraction

To establish the lower bounds, by Theorem 1 it suffices to upper bound the EMD between the messages $Y^n$ induced by a suitably chosen mixture of distributions and those induced by the uniform distribution. In particular, we relate the EMD under information constraints to the channel information matrices of the allowed channels, which describe how a channel can exploit the structure of the set of distributions in (7) to solve inference tasks.

Definition 3 (Channel information matrix). For a channel $W : [k] \to \mathcal{Y}$, the channel information matrix of $W$, denoted $H(W)$, is the $(k/2) \times (k/2)$ matrix with entries, for $i_1, i_2 \in [k/2]$,
$$H(W)(i_1, i_2) := \sum_{y \in \mathcal{Y}} \frac{\left(W(y \mid 2i_1 - 1) - W(y \mid 2i_1)\right)\left(W(y \mid 2i_2 - 1) - W(y \mid 2i_2)\right)}{\sum_{x \in [k]} W(y \mid x)}.$$

We establish the following upper bound on the EMD under local information constraints.

Lemma 13. For $P$ defined in (7), $p$ drawn uniformly from $P$, and any $W^n \in \mathcal{W}^n$, let $\mathbb{E}[p_Z^{Y^n}] = \frac{1}{|P|}\sum_{p \in P} p^{Y^n}$ be the message distribution under the uniform mixture and channels $W^n$. Then
$$d_{\rm EM}\left(\mathbb{E}\left[p_Z^{Y^n}\right],\ u_{[k]}^{Y^n}\right) \le 2n\alpha\sqrt{\frac{\max_{W \in \mathcal{W}} \|H(W)\|_*}{k}},$$
where $\|\cdot\|_*$ denotes the trace norm of a matrix.

We use this lemma with Theorem 1 to obtain the following lower bound for robust inference.

Corollary 14.
$$R_{\rm IT(DL)}(k, n, \mathcal{W}, \gamma) = \Omega\left(\gamma\sqrt{\frac{k}{\max_{W \in \mathcal{W}} \|H(W)\|_*}}\right).$$

Under privacy and communication constraints, we have $\max_{W \in \mathcal{W}_\varepsilon} \|H(W)\|_* = O(\varepsilon^2)$ and $\max_{W \in \mathcal{W}_\ell} \|H(W)\|_* \le 2^\ell$, both proved in (Acharya et al., 2019b). Using Corollary 14, we obtain the corresponding manipulation terms in the lower bound parts of Theorems 8, 9, and 10. The first terms in these bounds are the lower bounds without manipulation attacks, proved in (Acharya et al., 2019b; Han et al., 2018; Duchi et al., 2013), respectively.

Remark 4. Lower bounds without information constraints are automatically lower bounds for constrained inference. Therefore, the $\Omega(\sqrt{k\gamma/n})$ term in the lower bound of Theorem 9 follows from Theorem 2.

It now remains to prove Lemma 13.

Proof. We use the lower bound construction stated in (7). We bound the earth-mover distance using the naive coupling for length-$n$ independent sequences, whose cost equals the sum of the TV distances of the individual entries. We then relate the TV distances to the $\chi^2$-divergence bounds proved in (Acharya et al., 2019b), stated below.

Lemma 15 (Theorem IV.11 of (Acharya et al., 2019b)). With $\|\cdot\|_*$ the trace norm and $p_Z$ as defined in (7),
$$\mathbb{E}_{Z \sim u\{\pm1\}^{k/2}}\left[d_{\chi^2}\left(W \circ p_Z,\ W \circ u_{[k]}\right)\right] = \frac{8\alpha^2 \|H(W)\|_*}{k},$$
where, for $q \in \Delta_k$, $W \circ q$ denotes the distribution of the message $Y$ given the input $X \sim q$.

Let $Z \sim u\{\pm1\}^{k/2}$. Then we have
$$d_{\rm EM}\left(\mathbb{E}\left[p_Z^{Y^n}\right],\ u_{[k]}^{Y^n}\right) \le \mathbb{E}_Z\left[d_{\rm EM}\left(p_Z^{Y^n},\ u_{[k]}^{Y^n}\right)\right] \quad \text{(convexity)}$$
$$\le \mathbb{E}_Z\left[\sum_{i=1}^{n} d_{\rm TV}\left(W_i \circ p_Z,\ W_i \circ u_{[k]}\right)\right] \quad \text{(naive coupling)}$$
$$\le \mathbb{E}_Z\left[\sum_{i=1}^{n} \sqrt{\tfrac12\, d_{\chi^2}\left(W_i \circ p_Z,\ W_i \circ u_{[k]}\right)}\right] \quad \text{(Pinsker's inequality)}$$
$$\le \sum_{i=1}^{n} \sqrt{\tfrac12\, \mathbb{E}_Z\left[d_{\chi^2}\left(W_i \circ p_Z,\ W_i \circ u_{[k]}\right)\right]} \quad \text{(concavity)}$$
$$\le 2n\alpha\sqrt{\frac{\max_{W \in \mathcal{W}} \|H(W)\|_*}{k}}. \quad \text{(Lemma 15)} \qquad \square$$
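Definition 3 is straightforward to evaluate numerically; the sketch below (our own) computes $H(W)$ and its trace norm for $k$-ary randomized response, illustrating the $O(\varepsilon^2)$ bound used above:

```python
import numpy as np

def channel_information_matrix(W):
    """H(W) from Definition 3, for a row-stochastic k x |Y| channel matrix W
    (rows = inputs, k even): H(i1, i2) = sum_y Delta_{i1}(y) Delta_{i2}(y)
    / sum_x W(y|x), where Delta_i(y) = W(y|2i-1) - W(y|2i)."""
    delta = W[0::2, :] - W[1::2, :]    # (k/2) x |Y| matrix of row differences
    col_sums = W.sum(axis=0)           # sum_x W(y|x), for each output y
    return (delta / col_sums) @ delta.T

k, eps = 8, 1.0
W = np.full((k, k), 1.0 / (np.exp(eps) + k - 1))    # k-ary randomized response
np.fill_diagonal(W, np.exp(eps) / (np.exp(eps) + k - 1))
H = channel_information_matrix(W)
trace_norm = np.abs(np.linalg.eigvalsh(H)).sum()    # equals trace(H): H is PSD
print(trace_norm, eps**2)   # the trace norm falls within the O(eps^2) bound
```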
7. Acknowledgement

This research is supported by the NSF through grants CCF-1846300 (CAREER) and CCF-1815893.

References

Acharya, J., Sun, Z., and Zhang, H. Differentially private testing of identity and closeness of discrete distributions. In Advances in Neural Information Processing Systems, pp. 6879–6891, 2018.

Acharya, J., Canonne, C. L., Freitag, C., and Tyagi, H. Test without trust: Optimal locally private distribution testing. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 2067–2076, 2019a.

Acharya, J., Canonne, C. L., and Tyagi, H. Inference under information constraints: Lower bounds from chi-square contraction. Proceedings of Machine Learning Research, 99:1–15, 2019b.

Acharya, J., Canonne, C. L., Han, Y., Sun, Z., and Tyagi, H. Domain compression and its application to randomness-optimal distributed goodness-of-fit. In Conference on Learning Theory, pp. 3–40. PMLR, 2020a.

Acharya, J., Canonne, C. L., Liu, Y., Sun, Z., and Tyagi, H. Interactive inference under information constraints. arXiv preprint arXiv:2007.10976, 2020b.

Acharya, J., Canonne, C. L., and Tyagi, H. Distributed signal detection under communication constraints. In Conference on Learning Theory, pp. 41–63. PMLR, 2020c.

Acharya, J., Canonne, C. L., and Tyagi, H. Inference under information constraints II: Communication constraints and shared randomness. IEEE Transactions on Information Theory, 66(12):7856–7877, 2020d.

Acharya, J., Kairouz, P., Liu, Y., and Sun, Z. Estimating sparse discrete distributions under privacy and communication constraints. In Feldman, V., Ligett, K., and Sabato, S. (eds.), Proceedings of the 32nd International Conference on Algorithmic Learning Theory, volume 132 of Proceedings of Machine Learning Research, pp. 79–98. PMLR, 2021. URL http://proceedings.mlr.press/v132/acharya21b.html.

Bagdasaryan, E., Veit, A., Hua, Y., Estrin, D., and Shmatikov, V. How to backdoor federated learning. In International Conference on Artificial Intelligence and Statistics, pp. 2938–2948. PMLR, 2020.

Bhagoji, A. N., Chakraborty, S., Mittal, P., and Calo, S. Analyzing federated learning through an adversarial lens. In International Conference on Machine Learning, pp. 634–643. PMLR, 2019.

Chan, S.-O., Diakonikolas, I., Valiant, G., and Valiant, P. Optimal algorithms for testing closeness of discrete distributions. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '14, pp. 1193–1203, Philadelphia, PA, USA, 2014. SIAM.

Chen, M., Gao, C., and Ren, Z. A general decision theory for Huber's ε-contamination model. Electronic Journal of Statistics, 10(2):3752–3774, 2016.

Chen, S., Li, J., and Moitra, A. Learning structured distributions from untrusted batches: Faster and simpler. Advances in Neural Information Processing Systems, 2020.

Cheu, A., Smith, A., and Ullman, J. Manipulation attacks in local differential privacy. In 2021 IEEE Symposium on Security and Privacy (SP), pp. 1–18, Los Alamitos, CA, USA, May 2021. doi: 10.1109/SP40001.2021.00001.

Daskalakis, C., Kamath, G., and Wright, J. Which distribution distances are sublinearly testable? In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '18, pp. 2747–2764, Philadelphia, PA, USA, 2018. SIAM.

Diakonikolas, I. and Kane, D. M. Recent advances in algorithmic high-dimensional robust statistics. arXiv preprint arXiv:1911.05911, 2019.

Diakonikolas, I. and Kane, D. M. The sample complexity of robust covariance testing. arXiv e-prints, arXiv 2012, 2020.

Diakonikolas, I., Kamath, G., Kane, D. M., Li, J., Moitra, A., and Stewart, A. Robust estimators in high dimensions without the computational intractability. In Proceedings of the 57th Annual IEEE Symposium on Foundations of Computer Science, FOCS '16, pp. 655–664, Washington, DC, USA, 2016. IEEE Computer Society.

Diakonikolas, I., Gouleakis, T., Peebles, J., and Price, E. Sample-optimal identity testing with high probability. In Proceedings of the 45th International Colloquium on Automata, Languages, and Programming, ICALP '18, pp. 41:1–41:14, 2018.

Duchi, J. C., Jordan, M. I., and Wainwright, M. J. Local privacy and statistical minimax rates. In Proceedings of the 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS '13, pp. 429–438. IEEE, 2013.
Goldreich, O. The uniform distribution is complete with respect to testing identity to a fixed distribution. In Electronic Colloquium on Computational Complexity (ECCC), volume 23, 2016.

Han, Y., Mukherjee, P., Özgür, A., and Weissman, T. Distributed statistical estimation of high-dimensional and nonparametric distributions. In 2018 IEEE International Symposium on Information Theory (ISIT), pp. 506–510, 2018. doi: 10.1109/ISIT.2018.8437818.

Han, Y., Özgür, A., and Weissman, T. Geometric lower bounds for distributed parameter estimation under communication constraints. In Conference On Learning Theory, pp. 3163–3188. PMLR, 2018.

Huber, P. J. Robust Statistics, volume 523. John Wiley & Sons, 2004.

Jain, A. and Orlitsky, A. A general method for robust learning from batches. Advances in Neural Information Processing Systems, 33, 2020a.

Jain, A. and Orlitsky, A. Optimal robust learning of discrete distributions from batches. In International Conference on Machine Learning, pp. 4651–4660. PMLR, 2020b.

Kairouz, P., Bonawitz, K., and Ramage, D. Discrete distribution estimation under local privacy. In Proceedings of the 33rd International Conference on Machine Learning, ICML '16, pp. 2436–2444, 2016.

Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., Bonawitz, K., Charles, Z., Cormode, G., Cummings, R., D'Oliveira, R. G. L., Rouayheb, S. E., Evans, D., Gardner, J., Garrett, Z., Gascón, A., Ghazi, B., Gibbons, P. B., Gruteser, M., Harchaoui, Z., He, C., He, L., Huo, Z., Hutchinson, B., Hsu, J., Jaggi, M., Javidi, T., Joshi, G., Khodak, M., Konečný, J., Korolova, A., Koushanfar, F., Koyejo, S., Lepoint, T., Liu, Y., Mittal, P., Mohri, M., Nock, R., Özgür, A., Pagh, R., Raykova, M., Qi, H., Ramage, D., Raskar, R., Song, D., Song, W., Stich, S. U., Sun, Z., Suresh, A. T., Tramèr, F., Vepakomma, P., Wang, J., Xiong, L., Xu, Z., Yang, Q., Yu, F. X., Yu, H., and Zhao, S. Advances and open problems in federated learning, 2019.

Lai, K. A., Rao, A. B., and Vempala, S. Agnostic estimation of mean and covariance. In Proceedings of the 57th Annual IEEE Symposium on Foundations of Computer Science, FOCS '16, pp. 665–674. IEEE Computer Society, 2016.

Paninski, L. A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Transactions on Information Theory, 54(10):4750–4755, 2008.

Qiao, M. and Valiant, G. Learning discrete distributions from untrusted batches. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.

Sheffet, O. Differentially private ordinary least squares. In Proceedings of the 34th International Conference on Machine Learning, ICML '17, pp. 3105–3114. JMLR, Inc., 2017.

Valiant, G. and Valiant, P. Estimating the unseen: An n/log n-sample estimator for entropy and support size, shown optimal via new CLTs. In Proceedings of the 43rd Annual ACM Symposium on the Theory of Computing, STOC '11, pp. 685–694, New York, NY, USA, 2011. ACM.

Valiant, G. and Valiant, P. An automatic inequality prover and instance optimal identity testing. In Proceedings of the 55th Annual IEEE Symposium on Foundations of Computer Science, pp. 51–60. IEEE, 2014.