# Adversarial Examples from Computational Constraints

Sébastien Bubeck (1), Yin Tat Lee (1,2), Eric Price (3), Ilya Razenshteyn (1)

*Equal contribution. (1) Microsoft Research, Redmond, Washington, USA; (2) University of Washington, Seattle, Washington, USA; (3) University of Texas, Austin, Texas, USA. Correspondence to: Ilya Razenshteyn.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

Abstract

Why are classifiers in high dimension vulnerable to adversarial perturbations? We show that it is likely not due to information theoretic limitations, but rather it could be due to computational constraints. First we prove that, for a broad set of classification tasks, the mere existence of a robust classifier implies that it can be found by a possibly exponential-time algorithm with relatively few training examples. Then we give two particular classification tasks where learning a robust classifier is computationally intractable. More precisely, we construct two binary classification tasks in high-dimensional space which are (i) information theoretically easy to learn robustly for large perturbations, (ii) efficiently learnable (non-robustly) by a simple linear separator, (iii) yet are not efficiently robustly learnable, even for small perturbations. Specifically, for the first task hardness holds for any efficient algorithm in the statistical query (SQ) model, while for the second task we rule out any efficient algorithm under a cryptographic assumption. These examples give an exponential separation between classical learning and robust learning in the statistical query model or under a cryptographic assumption. This suggests that adversarial examples may be an unavoidable byproduct of computational limitations of learning algorithms.

1. Introduction

The most basic task in learning theory is to learn from a data set $(X_i, f(X_i))_{i \in [n]}$ a good approximation to the unknown input-output function $f$. One is typically interested in finding a hypothesis function $h$ with small out-of-sample probability of error. That is, assuming the $X_i$'s are i.i.d. from some distribution $\mathcal{D}$, one wishes to approximately minimize $\mathbb{P}_{X \sim \mathcal{D}}(h(X) \ne f(X))$.

A more challenging task is to learn a robust hypothesis, that is, one that would minimize the probability of error against adversarially corrupted examples. More precisely, assume that the input space is endowed with a norm $\|\cdot\|$ and let $\varepsilon > 0$ be a fixed robustness parameter. In robust learning the goal is to find $h$ to minimize
$$\mathbb{P}_{X \sim \mathcal{D}}\big(\exists z \text{ such that } \|z\| \le \varepsilon \text{ and } h(X+z) \ne f(X+z)\big). \qquad (1)$$
Such an input $X + z$ in the above event is colloquially referred to as an adversarial example.¹

¹ In the literature one sometimes uses a more stringent definition of adversarial examples, where $X$ and $z$ are in addition required to satisfy $f(X + z) = f(X)$. We ignore this requirement here.

Following Szegedy et al. (2013) there is a rapidly expanding literature exploring the vulnerability of neural networks to adversarially chosen perturbations. The surprising observation is that, say in vision applications, for most images $X \sim \mathcal{D}$ the perturbation can be chosen in a way that is imperceptible to a human yet dramatically changes the output of state-of-the-art neural networks. This is a particularly important issue as these neural networks are currently being deployed in real-world situations. Naturally there is by now a large literature (in fact going back at least to (Dalvi et al., 2004; Globerson and Roweis, 2006)) on attacks (finding adversarial perturbations) and defenses (making classifiers robust against certain types of attacks).
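To fix intuition for the quantity (1), here is a minimal numerical sketch (ours, not from the paper): it estimates the robust error of a hypothesis $h$ by trying random $\ell_2$ perturbations of size $\varepsilon$. Since the true quantity requires a worst-case search over the whole ball, this only gives a lower bound on (1).

```python
import numpy as np

def robust_error_lower_bound(h, f, X, eps, n_trials=100, rng=None):
    """Crude Monte Carlo lower bound on the robust error (1).

    h, f : callables mapping a batch of points (n, d) to {0, 1} labels.
    X    : (n, d) array of samples from D.
    eps  : l2 robustness radius; random directions are used as a proxy
           for the worst-case perturbation z with ||z|| <= eps.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    fooled = np.zeros(n, dtype=bool)
    for _ in range(n_trials):
        z = rng.normal(size=(n, d))
        z *= eps / np.linalg.norm(z, axis=1, keepdims=True)  # points on the eps-sphere
        fooled |= h(X + z) != f(X + z)                        # the event in (1)
    return fooled.mean()

# toy usage: a linear "true" classifier and a slightly rotated hypothesis
f = lambda x: (x[:, 0] > 0).astype(int)
h = lambda x: (x[:, 0] + 0.3 * x[:, 1] > 0).astype(int)
X = np.random.default_rng(0).normal(size=(1000, 10))
print(robust_error_lower_bound(h, f, X, eps=0.5))
```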
While we have a sophisticated theory for the classical goal of minimizing the non-robust probability of error, our understanding of the robust scenario is still very rudimentary. At the moment, the attackers seem to be winning the "arms race" against the defenders; see, e.g., (Athalye et al., 2018). We identify four mutually exclusive possibilities for why all known classification algorithms are vulnerable to adversarial examples:

1. No robust classifier exists.
2. Identifying a robust classifier requires too much training data.
3. Identifying a robust classifier from limited training data is information theoretically possible but computationally intractable.
4. We just have not found the right algorithm yet.

The goal of this paper is to provide two pieces of evidence, one in favor of hypothesis 3 and one against hypothesis 2. Our primary result is that hypothesis 3 is indeed possible: there exist robust classification tasks that are information theoretically easy but computationally intractable, either within a powerful model of computation (namely the statistical query model, see below) or for unrestricted efficient algorithms under a cryptographic hardness assumption. Our secondary result is evidence against hypothesis 2, showing that if a robust classifier exists then it can be found with relatively few training examples under a standard assumption on the data distribution (for example, that the distribution within each label is close to a Lipschitz generative model, or is drawn from a finite set of exponential size).

In Section 1.1 we discuss related work on adversarial examples in light of those four hypotheses. In Section 1.2 we introduce the model of computation under which we will prove intractability. We conclude the introduction with Section 1.4, where we give a brief proof overview for our primary and secondary results. These results are discussed in greater depth in Sections 4 and 5 and in Section 3, respectively.

1.1. Related work on adversarial examples

To the best of our knowledge, previous works have not linked computational constraints to adversarial examples, but instead have focused on the other three hypotheses.

Supporting hypothesis 1 is the work of Fawzi et al. (2018). Here the authors consider a generative model for the features, namely $X = g(r)$ where $r \in \mathbb{R}^d$ is sampled from an isotropic Gaussian (in particular it is typically of Euclidean norm roughly $\sqrt{d}$). The observation is that, due to Gaussian isoperimetry, no classifier is robust to perturbations in $r$ of Euclidean norm $O(1)$. If $g$ is $L$-Lipschitz, this corresponds to perturbations of the image $X$ of at most $O(L)$. On the other hand, evidence against hypothesis 1 is the fact that humans seem to be robust classifiers with low error rate (albeit nonzero error rate, as shown by examples in (Elsayed et al., 2018)). This suggests that, to fit real distributions on images, the Lipschitz parameter $L$ in the data model assumed in (Fawzi et al., 2018) may be prohibitively large.

Another work arguing the inevitability of adversarial examples is Gilmer et al. (2018). There the authors propose a simple classification task, namely distinguishing between samples on the unit sphere in high dimension and samples on a sphere of radius $R$ bounded away from 1.
They show experimentally that even in such a simple setup, state-of-the-art neural networks have adversarial examples at most points. We note however that this example only applies to specific classifiers, since it is easy to construct an efficient robust classifier for the given example (e.g., just use a linear model on the norm of the features); thus the hardness here only appears for a given network structure.

Supporting hypothesis 2 is the work of Schmidt et al. (2018). Here the authors consider a mixture of two separated Gaussians (isotropic, with means at distance $\Theta(\sqrt{d})$). With such a separation a single sample is sufficient to learn non-robustly; but to learn a classifier that is robust to $O(1)$-size perturbations in $\ell_\infty$-norm one needs $\Omega(\sqrt{d})$ samples. This polynomial separation suggests that avoiding adversarial examples in high dimension requires a lot more samples than mere learning, but only up to a $\sqrt{d}$ factor. In fact, since their hard instance is essentially a set of $2^d$ possible distributions, our secondary result gives a black-box algorithm that would produce a robust classifier with $O(d)$ samples.

Finally the large body of work on adversarial defense can be viewed as investigating hypothesis 4. We note that, at the time of writing, the state-of-the-art defense of Madry et al. (2018) (according to (Athalye et al., 2018)) is still far from being robust. Indeed, on the CIFAR-10 dataset its accuracy is below 50% even with very small perturbations (of order $10^{-2}$ in $\ell_\infty$-norm), while state-of-the-art non-robust accuracy is higher than 95%.

Update. Since the first version of the paper appeared (in May 2018), all the directions discussed above have seen lots of progress (for a small and by no means representative sample, see (Garg et al., 2018; Yin et al., 2018; Zhang et al., 2019)). However, the status quo does not seem to have changed significantly. Namely, the defense from (Madry et al., 2018) is still the state of the art for the CIFAR dataset, and there are no attacks that perform drastically better than simple projected gradient descent (PGD). Perhaps the most significant developments have been related to training models with provable robustness guarantees (Dvijotham et al., 2018; Wong and Kolter, 2018; Weng et al., 2018; Xiao et al., 2018); however, currently all the methods for such training are either extremely slow or the certified bound is much weaker than the bound achieved by the PGD attack.

1.2. The SQ model

Proving computational hardness is a notoriously difficult problem. To circumvent this difficulty one usually either (i) reduces the problem at hand to a well-established computational hardness conjecture (e.g., proving NP-hardness), or (ii) proves an unconditional hardness within a limited computational framework (such as the oracle lower bounds in convex optimization, (Nesterov, 2004)). Our task here is further complicated by the average-case nature of the problem (the datasets are i.i.d. from some fixed distribution). Fortunately there is a growing set of results on computational hardness in learning theory that we can leverage.
The statistical query (SQ) model of computation from Kearns (1998) is a particularly successful instance of approach (ii) for learning theory: (a) most known learning algorithms fall within the framework, including in particular logistic regression, SVM, stochastic gradient descent, etc.; and (b) SQ-hardness has been proved for many interesting problems that are believed to be computationally hard, such as learning parity with noise (Kearns, 1998), learning intersections of halfspaces (Klivans and Sherstov, 2007), the planted clique problem (Feldman et al., 2013), robust estimation of high-dimensional Gaussians (Diakonikolas et al., 2017), or learning a function computable by a small neural network (Song et al., 2017). Thus we naturally use this model to prove our main result on the computational hardness of robust learning. We now recall the definition of the SQ model and state informally our main result.

As Kearns put it in his original paper, the SQ model considers learning algorithms that "construct a hypothesis based on statistical properties of large samples rather than on the idiosyncrasies of a particular sample". More precisely, rather than having access to a data set $(X_i, f(X_i))$, in the SQ model one must make queries to a $\tau$-SQ oracle which operates as follows: given a $[0,1]$-valued function $\psi$ defined on input/output pairs, the SQ oracle returns a value $\mathbb{E}_{X \sim \mathcal{D}}[\psi(X, f(X))] + \xi$ where $|\xi| \le \tau$. We refer to $\tau$ as the precision of the oracle. Obviously, an algorithm using $T$ queries to an oracle with precision $\tau$ can be simulated using a data set of size roughly $T/\tau^2$.

In our main result we consider an oracle with exponential precision. More concretely we take $\tau$ of order $\exp(-Cd^c)$ where $d$ is the dimension of the problem and $c, C > 0$ are some numerical constants. Observe that such a high precision oracle cannot be simulated with a polynomial (in $d$) number of samples. Yet we show that even with such a high precision one needs an exponential number of queries to achieve robust learning for a certain task which, on the other hand, is easy to learn and information theoretically learnable robustly:

Theorem 1.1 (informal). For any $\varepsilon > 0$, there exists a classification task in $\mathbb{R}^d$ which is
- learnable in poly($d$) time and poly($d$) samples;
- robustly learnable in poly($d$) samples with $\ell_2$-robustness parameter $\log^{0.49} d$ (while with high probability all samples have $\ell_2$-norm $O(\sqrt{d})$);
- not efficiently robustly learnable in the statistical query model, in the sense that even with an exponential (in $d$) precision statistical query oracle one needs an exponential (in $d$) number of queries in order to robustly learn with robustness parameter $\varepsilon$.
The same result holds using the $\ell_\infty$ norm instead of $\ell_2$, except with diameter $O(\sqrt{d \log d})$.

Of course, a number of natural machine learning algorithms such as nearest neighbor are not based on statistical queries. Although we cannot prove it, we believe that our input distributions are computationally hard in general. For the case of nearest neighbor, the distances to points of each class have very similar distributions: indeed, the two distributions match on polynomially many moments. This suggests that exponentially many samples are necessary for nearest neighbor. For more information about nearest neighbor classifiers in the context of adversarial examples, see (Wang et al., 2017).
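To make the oracle defined above concrete, the following sketch (ours, with a toy distribution) simulates a $\tau$-SQ oracle from i.i.d. labeled samples by answering each query with an empirical mean over roughly $1/\tau^2$ fresh samples, mirroring the $T/\tau^2$ simulation mentioned above.

```python
import numpy as np

class SimulatedSQOracle:
    """Answers statistical queries psi(x, f(x)) within tolerance tau (w.h.p.)
    by averaging over ~1/tau^2 fresh labeled samples per query."""

    def __init__(self, sample_x, f, tau, rng=None):
        self.sample_x = sample_x      # callable: n -> (n, d) samples from D
        self.f = f                    # target labeling function
        self.tau = tau
        self.rng = np.random.default_rng(rng)
        self.samples_used = 0

    def query(self, psi):
        n = int(np.ceil(4.0 / self.tau ** 2))      # Chernoff-style sample size
        X = self.sample_x(n)
        self.samples_used += n
        return float(np.mean(psi(X, self.f(X))))   # E psi(X, f(X)) +- tau w.h.p.

# toy usage: query a [0,1]-valued statistic of the first coordinate on positives
rng = np.random.default_rng(0)
oracle = SimulatedSQOracle(lambda n: rng.normal(size=(n, 5)),
                           f=lambda X: (X[:, 0] > 0).astype(int), tau=0.05)
print(oracle.query(lambda X, y: X[:, 0].clip(0, 1) * y))
print("samples used:", oracle.samples_used)
```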
Moreover, there are very few problems in any domain with exponential SQ hardness for which polynomial time algorithms are known; in fact, the only such problems involve solving systems of linear equations over finite fields (Feldman, 2017). Since Theorem 1.1 involves a real-valued problem, finding a polynomial time algorithm that avoids the SQ lower bound would be a remarkable breakthrough in SQ theory.

1.3. Cryptographic hardness

We complement Theorem 1.1 with an alternative construction, which has qualitatively similar properties. However, there are two important differences. First, instead of a $\log^{0.49} d$-robust classifier, we can guarantee the existence of an $\Omega(\sqrt{d})$-robust one (which is best possible, since the diameter of the dataset is $O(\sqrt{d})$). Second, instead of ruling out only efficient SQ algorithms, we can rule out all efficient algorithms. However, this is of course not an unconditional result, and we show it under a cryptographic assumption. More specifically, we build a classification task out of a pseudo-random generator from (Blum et al., 1986), and hardness of robust learning follows from the computational indistinguishability of the output of the generator and the uniform distribution.

1.4. Overview of proofs

Our secondary result, on the information theoretic achievability of robustness, is proved via simple arguments reminiscent of PAC-learning theory. Namely, if a classifier is not good enough for a given pair of distributions, we can rule it out with high confidence by looking at not too many samples. Then, we use a union bound to claim the result for a family of pairs that is either at most exponentially large, or is at least covered by a net of at most exponential size (the only subtlety is in the proper definition of a net in this robust context).

Our primary result, on the hardness of robustness, is technically much more challenging. The central object in the proof is a natural high-dimensional generalization of a construction from Diakonikolas et al. (2017). Roughly speaking, a hard pair of distributions is obtained by taking a standard multivariate Gaussian, choosing a random $k$-dimensional subspace, and planting there two well-separated distributions that match many moments of a Gaussian (in (Diakonikolas et al., 2017) only the case $k = 1$ is considered). To show an SQ lower bound, we use as in (Diakonikolas et al., 2017) the framework of (Blum et al., 1994; Feldman et al., 2013) to reduce the question to computing a certain non-standard notion of correlation between the distributions. To bound said correlation, we deviate from (Diakonikolas et al., 2017) significantly, since their argument is tailored crucially to the case $k = 1$. Our argument is less precise, but allows $k \gg 1$, which is necessary to obtain a large separation between the distributions (which in turn controls the robustness parameter in Theorem 1.1).

For the cryptographic hardness, we, roughly speaking, require a classifier to distinguish the uniform distribution from the output of a pseudo-random generator (PRG) on a uniformly random seed. Because the seed is much shorter than the output, extremely robust classifiers exist (since the image of the generator is tiny). In order to provide an example where an efficient robust classifier exists and is information-theoretically easy but computationally intractable to learn from data, we use a trapdoor PRG.
The construction from (Blum et al., 1986) gives a trapdoor PRG under standard cryptographic assumptions (Vazirani and Vazirani, 1983).

2. Definitions

Throughout we restrict ourselves to binary classifiers with feature space $\mathbb{R}^d$, as well as to balanced classes. We fix some norm $\|\cdot\|$ on $\mathbb{R}^d$, and we denote $B(\varepsilon) = \{z \in \mathbb{R}^d : \|z\| \le \varepsilon\}$.

Definition 2.1. The $\varepsilon$-robust zero-one loss (with respect to $\|\cdot\|$) is defined as follows, for $f : \mathbb{R}^d \to \{0,1\}$ and $(x, i) \in \mathbb{R}^d \times \{0,1\}$,
$$\ell_\varepsilon(f, x, i) = \mathbb{1}\{\exists z \in B(\varepsilon) : f(x+z) \ne i\}.$$

Definition 2.2. A binary classifier $f : \mathbb{R}^d \to \{0,1\}$ is $(\varepsilon, \delta)$-robust for a pair of distributions $(D_0, D_1)$ on $X$ if for any $i \in \{0,1\}$,
$$\mathbb{E}_{X \sim D_i}[\ell_\varepsilon(f, X, i)] \le \delta.$$

Definition 2.3. A (binary) classification task is given by a family $\mathcal{D}$ of pairs of distributions $D = (D_0, D_1)$ over a domain $X$. The goal is to map datasets $X_0, X_1$ consisting of $n$ i.i.d. samples from $D_0$ and $D_1$ respectively into a classifier $f : \mathbb{R}^d \to \{0,1\}$. We say that $\mathcal{D}$ is $(\varepsilon, \delta)$-robustly learnable with $n$ samples if there is a classification mapping such that, for every $D \in \mathcal{D}$, with probability at least $2/3$ over $X_0$ and $X_1$, the resulting classifier $f$ is $(\varepsilon, \delta)$-robust for $D$.

Remark 2.4. The success probability $2/3$ is an arbitrary constant larger than $1/2$. It is easy to see that, for any $\eta > 0$, by using $O(n \log(1/\eta))$ samples one can obtain a success probability of $1 - \eta$. We also note that the classical $(\varepsilon', \delta')$-PAC learning scenario, with $\delta' = 1/3$, corresponds to our definition of $(\varepsilon, \delta)$-robust classification with parameters $\varepsilon = 0$ and $\delta = \varepsilon'$. Slightly more precisely, a concept class $\mathcal{F} \subseteq \{0,1\}^{\mathbb{R}^d}$ for PAC-learning corresponds to the family $\mathcal{D}$ of all pairs of distributions supported respectively on $f^{-1}(0)$ and $f^{-1}(1)$ for some $f \in \mathcal{F}$.

Definition 2.5. We say that $\mathcal{D}$ is $(\varepsilon, \delta)$-robustly feasible if every $D \in \mathcal{D}$ admits an $(\varepsilon, \delta)$-robust classifier. When it exists we denote $f_D$ such a classifier (chosen arbitrarily among all robust classifiers for $D$), and $\mathcal{F}_{\mathcal{D}} = \{f_D, D \in \mathcal{D}\}$.

3. Robust learning with few samples

Obviously robust feasibility is a necessary condition for robust learnability. We show that it is in fact sufficient, even for sample-efficient robust learnability. We first do so when a finite set of classifiers $\mathcal{F}_{\mathcal{D}}$ suffices for robust feasibility.

3.1. Robust empirical risk minimization

Theorem 3.1. Assume that $\mathcal{D}$ is $(\varepsilon, \delta)$-robustly feasible. Then it is $(\varepsilon, \delta + \delta')$-robustly learnable with $n = \Omega\!\left(\frac{\delta+\delta'}{\delta'^2}\log(|\mathcal{F}_{\mathcal{D}}|)\right)$.

Proof. Let $\hat{D}_i = \frac{1}{n}\sum_{j=1}^n \delta_{X_i(j)}$ be the empirical measure corresponding to the dataset $X_i$. We will show that ERM on the $\varepsilon$-robust loss gives the claimed sample complexity. More precisely we consider the following classifier:
$$\hat{f} = \arg\min_{f \in \mathcal{F}_{\mathcal{D}}} \max_{i \in \{0,1\}} \mathbb{E}_{X \sim \hat{D}_i}\, \ell_\varepsilon(f, X, i).$$
For shorthand notation we write $p_f = \max_{i \in \{0,1\}} \mathbb{E}_{X \sim D_i}\, \ell_\varepsilon(f, X, i)$ and $\hat{p}_f = \max_{i \in \{0,1\}} \mathbb{E}_{X \sim \hat{D}_i}\, \ell_\varepsilon(f, X, i)$. In particular we simply want to prove that $p_{\hat{f}} \le \delta + \delta'$. Note that by definition $p_{f_D} \le \delta$. A standard Chernoff bound gives that, with probability at least $2/3$, one has for every $f \in \mathcal{F}_{\mathcal{D}}$,
$$|p_f - \hat{p}_f| = O\!\left(\sqrt{p_f \log(|\mathcal{F}_{\mathcal{D}}|)/n}\right).$$
Now observe that for $n \ge 4\,\frac{\delta+\delta'}{\delta'^2}\log(|\mathcal{F}_{\mathcal{D}}|)$ one has $\sqrt{p_{f_D}\log(|\mathcal{F}_{\mathcal{D}}|)/n} \le \delta'/2$, and thus, since $\hat{f}$ minimizes the empirical robust loss over $\mathcal{F}_{\mathcal{D}}$,
$$\hat{p}_{\hat{f}} \le \hat{p}_{f_D} \le p_{f_D} + \frac{\delta'}{2} \le \delta + \frac{\delta'}{2}.$$
On the other hand, the concentration bound applied to $\hat{f}$ gives
$$\hat{p}_{\hat{f}} \ge p_{\hat{f}} - \frac{\delta'}{2}\sqrt{\frac{p_{\hat{f}}}{\delta+\delta'}}.$$
It now suffices to observe that $s > \delta + \delta'$ implies $s - \frac{\delta'}{2}\sqrt{\frac{s}{\delta+\delta'}} > \delta + \frac{\delta'}{2}$, so that $p_{\hat{f}} \le \delta + \delta'$.
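To illustrate the robust ERM rule $\hat{f}$ used in the proof of Theorem 3.1, here is a small sketch (ours; it approximates the robust loss $\ell_\varepsilon$ with random $\ell_2$ perturbations, whereas the definition requires a worst-case search over $B(\varepsilon)$). It selects, from a finite family playing the role of $\mathcal{F}_{\mathcal{D}}$, the classifier with the smallest worst-class empirical robust loss.

```python
import numpy as np

def empirical_robust_loss(f, X, label, eps, n_dirs=50, rng=None):
    """Empirical eps-robust zero-one loss of f on samples X with true label
    `label`, approximated with random l2 perturbations (a proxy for l_eps)."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    hit = f(X) != label                        # z = 0 is allowed in B(eps)
    for _ in range(n_dirs):
        z = rng.normal(size=(n, d))
        z *= eps / np.linalg.norm(z, axis=1, keepdims=True)
        hit |= f(X + z) != label
    return hit.mean()

def robust_erm(family, X0, X1, eps):
    """Return argmin over f in `family` of max_i E_{X ~ D_i-hat} l_eps(f, X, i)."""
    def worst_class_loss(f):
        return max(empirical_robust_loss(f, X0, 0, eps),
                   empirical_robust_loss(f, X1, 1, eps))
    return min(family, key=worst_class_loss)

# toy usage: three threshold classifiers, data from two shifted Gaussians
rng = np.random.default_rng(0)
X0 = rng.normal(loc=-2.0, size=(500, 3))
X1 = rng.normal(loc=+2.0, size=(500, 3))
family = [lambda x, t=t: (x[:, 0] > t).astype(int) for t in (-1.5, 0.0, 1.5)]
best = robust_erm(family, X0, X1, eps=1.0)
print(empirical_robust_loss(best, X1, 1, eps=1.0))
```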
3.2. Robust covering number

In many natural situations the classification task is specified by a continuous set of distributions. For example one might have a set of the form $\mathcal{D} = \{(g_0(w_0), g_1(w_1)) : (w_0, w_1) \in \Omega\}$ where $g_0$ and $g_1$ are Lipschitz functions and $\Omega$ is some compact subset of $\mathbb{R}^{d'}$. In this case Theorem 3.1 does not apply, although one would like to say that essentially $\mathcal{D}$ is of log-size roughly $d'$. The classical solution to this difficulty is with covering numbers:

Definition 3.2. For a metric space $(X, \mathrm{dist})$ we write
$$N_{\mathrm{dist}}(X, \varepsilon) = \inf\Big\{|X'| \text{ s.t. } X' \subseteq X,\; X \subseteq \bigcup_{x \in X'}\{y : \mathrm{dist}(x, y) \le \varepsilon\}\Big\}.$$
With a slight abuse of notation we also extend the distance to the Cartesian product $X \times X$ by $\mathrm{dist}((x, x'), (y, y')) = \max(\mathrm{dist}(x, y), \mathrm{dist}(x', y'))$.

With the above definitions one can obtain the following result as a straightforward corollary of Theorem 3.1 and the definition of total variation distance.

Theorem 3.3. Assume that $\mathcal{D}$ is $(\varepsilon, \delta)$-robustly feasible. Then $\mathcal{D}$ is $(\varepsilon, \delta + 2\delta')$-robustly learnable with $n = \Omega\!\left(\frac{\delta+\delta'}{\delta'^2}\log N_{TV}(\mathcal{D}, \delta')\right)$.

In fact, if one is willing to lose a little bit of robustness, one can use a significantly weaker notion of distance than total variation. Indeed we can consider a broader class of modifications to a distribution that preserve the robustness of a classifier: in Theorem 3.3 we used that we may move a small amount of mass arbitrarily, but in fact we may also move an arbitrary amount of mass by a small distance. While the former type of movement corresponds to total variation distance, the latter corresponds to the (infinity) Wasserstein distance. We denote $W_\infty(D, D')$ for the infimum of $\sup_{(x, x') \in \mathrm{supp}(\mu)} \|x - x'\|$ over all measures $\mu(x, x')$ with marginal over $x$ (respectively $x'$) equal to $D$ (respectively $D'$).

Next we introduce a slightly non-standard notion of covering with respect to a pair of distances.

Definition 3.4. For a metric space $X$ equipped with two distances $\mathrm{dist}$ and $\mathrm{dist}'$ we define an $(\varepsilon, \delta)$-neighborhood by²
$$U_{\varepsilon, \delta}(x) = \{y : \exists z \text{ s.t. } \mathrm{dist}'(x, z) \le \delta \text{ and } \mathrm{dist}(z, y) \le \varepsilon\}.$$
The corresponding covering number is:
$$N_{\mathrm{dist}, \mathrm{dist}'}(X, \varepsilon, \delta) = \inf\Big\{|X'| \text{ s.t. } X' \subseteq X,\; X \subseteq \bigcup_{x \in X'} U_{\varepsilon, \delta}(x)\Big\}.$$

² The choice of first moving with $\mathrm{dist}'$ and then with $\mathrm{dist}$ will fit our application. In general a more natural definition would be: $U_{\varepsilon, \delta}(x) = \{y : \exists\, x = z_1, z'_1, \ldots, z_n, z'_n = y \text{ such that } \sum_i \mathrm{dist}(z_i, z'_i) \le \varepsilon \text{ and } \sum_i \mathrm{dist}'(z'_i, z_{i+1}) \le \delta\}$.

It is now easy to prove the following strengthening of Theorem 3.3:

Theorem 3.5. Assume that $\mathcal{D}$ is $(\varepsilon, \delta)$-robustly feasible. Then $\mathcal{D}$ is $(\varepsilon - \varepsilon', \delta + 2\delta')$-robustly learnable with $n = \Omega\!\left(\frac{\delta+\delta'}{\delta'^2}\log N_{W_\infty, TV}(\mathcal{D}, \varepsilon', \delta')\right)$.

Proof. Let $A$ be the set realizing the infimum in the definition of $N_{W_\infty, TV}(\mathcal{D}, \varepsilon', \delta')$. Observe that $\mathcal{D}$ is $(\varepsilon - \varepsilon', \delta + \delta')$-robustly feasible with classifiers from $\mathcal{F}_A$, and apply Theorem 3.1.

3.3. Covering number bound from generative models

We now show that distributions approximated by generative models have bounded covering numbers (in terms of Definition 3.4), so Theorem 3.5 gives a good sample complexity for such distributions. The proof is deferred to Appendix C in the supplementary material.

Definition 3.6. A generative model $g_w : \mathbb{R}^k \to \mathbb{R}^d$ is a neural network indexed by weights $w \in \mathbb{R}^m$. The generated distribution $D(g_w)$ is the distribution of $g_w(x)$ for $x \sim N(0, I_k)$.

Lemma 3.7. Let $g_w$ be an $\ell$-layer neural network architecture with at most $d$ activations in each layer and Lipschitz nonlinearities such as ReLUs. Consider any family of distribution pairs $\mathcal{D}$ such that for each $D \in \mathcal{D}$ and each $i \in \{0,1\}$, there exists some $w \in [-B, B]^m$ with $W_\infty(D_i, D(g_w)) \le \varepsilon$. Then $\log N_{W_\infty, TV}(\mathcal{D}, \varepsilon + \delta, \delta) \le O(m\ell \log(dB/\delta))$.
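As a concrete (toy) instance of Definition 3.6, the sketch below (ours, with arbitrary placeholder weights) samples from the generated distribution $D(g_w)$ of a two-layer ReLU network; it is only meant to make the objects of Lemma 3.7 tangible.

```python
import numpy as np

def sample_generated(W1, b1, W2, b2, n, rng=None):
    """Draw n samples from D(g_w): x ~ N(0, I_k), output g_w(x) in R^d,
    where g_w is the two-layer ReLU network with weights w = (W1, b1, W2, b2)."""
    rng = np.random.default_rng(rng)
    k = W1.shape[1]
    x = rng.normal(size=(n, k))                 # latent Gaussian input
    h = np.maximum(x @ W1.T + b1, 0.0)          # ReLU hidden layer
    return h @ W2.T + b2                        # output in R^d

# toy usage: k = 4 latent dimensions, d = 16 output dimensions
rng = np.random.default_rng(0)
k, hidden, d = 4, 32, 16
W1, b1 = rng.normal(size=(hidden, k)), np.zeros(hidden)
W2, b2 = rng.normal(size=(d, hidden)) / np.sqrt(hidden), np.zeros(d)
print(sample_generated(W1, b1, W2, b2, n=5).shape)   # (5, 16)
```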
4. Lower bound for the SQ model

Let $D_0$ and $D_1$ be two distributions over a set $X$, for which we would like to solve a (binary) classification task. The SQ model, introduced in (Kearns, 1998), is defined as follows. An algorithm is allowed to access $D_0$ and $D_1$ through queries of the following kind. A query is specified by a function $h : X \to [0,1]$, and the response is two numbers $u, v \in \mathbb{R}$ such that $|u - \mathbb{E}_{x \sim D_0}[h(x)]| \le \tau$ and $|v - \mathbb{E}_{x \sim D_1}[h(x)]| \le \tau$. Here $\tau > 0$ is a positive parameter called precision. After asking a number of such queries, the algorithm must output a required (robust or non-robust) classifier for $D_0$ and $D_1$. Our main result is as follows:

Theorem 4.1. For every sufficiently small $\rho, \gamma > 0$ the following holds. There exists a family of $2^{d^{O(1)}}$ pairs of distributions $(\widetilde{D}_0, \widetilde{D}_1)$ over $\mathbb{R}^d$ such that:
- Almost all the mass of $\widetilde{D}_0$ and $\widetilde{D}_1$ is supported in an $\ell_2$-ball of radius $O(\sqrt{d})$;
- The distributions $\widetilde{D}_0$ and $\widetilde{D}_1$ admit a $(\Omega(\sqrt{1/\gamma}), 2^{-d^{\Omega(\gamma)}})$-robust classifier; moreover, a $(\Omega(\sqrt{1/\gamma}), 0.01)$-robust classifier can be learned from $O(d)$ samples from $\widetilde{D}_0$ and $\widetilde{D}_1$;
- For $\widetilde{D}_0$ and $\widetilde{D}_1$, there exists a linear (non-robust) classifier, which can be learned in polynomial time;
- For every $\varepsilon > \rho$, in order to learn an $(\varepsilon, 0.01)$-robust classifier for $\widetilde{D}_0$ and $\widetilde{D}_1$, one needs at least $2^{d^{\Omega(1)}}$ statistical queries with accuracy as good as $2^{-d^{\Omega(\gamma)}}$.

For instance, if $\gamma$ is a small constant we get the existence of a $C$-robust classifier, where $C$ is a large constant. One could push $C$ as high as $\Omega(\log^{1/2 - \varepsilon} d)$ at the cost of the lower bound being against SQ queries with somewhat worse accuracy ($2^{-2^{\log^{\Omega(\varepsilon)} d}}$ instead of $2^{-d^{\Omega(1)}}$).

We first show a family of pairs $(D_0, D_1)$ that admit a robust classifier, yet it is hard (in the SQ model) to learn any (non-robust) classifier. Later, in Section 4.3, we show a simple modification of this family to obtain the main result.

4.1. Hard family of distributions

Here we define a hard family of pairs of distributions $(D_0, D_1)$ as discussed above. This section contains the definition and key properties of the family; proofs of those properties appear in Appendix A. This family can be seen to be a high-dimensional generalization and modification of a family considered in (Diakonikolas et al., 2017). The family depends on three parameters: integers $1 \le k \le d$, $m \ge 1$ and a positive real $\varepsilon > 0$.

Fix an integer $m \ge 1$. We introduce two auxiliary distributions over $\mathbb{R}$ that we will use later as building blocks.

Lemma 4.2. There exist two distributions $D_A$ and $D_B$ over $\mathbb{R}$ with everywhere positive p.d.f.'s $A(t)$ and $B(t)$ respectively such that:
- $D_A$ and $D_B$ match $N(0,1)$ in the first $m$ moments;
- There exist two subsets $S_A, S_B \subseteq \mathbb{R}$ such that the distance between $S_A$ and $S_B$ is at least $\Omega(1/\sqrt{m})$, $\mathbb{P}_{x \sim D_A}[x \in S_A] \ge 1 - e^{-\Omega(m)}$, and $\mathbb{P}_{x \sim D_B}[x \in S_B] \ge 1 - e^{-\Omega(m)}$;
- $A, B \in C^\infty$, and for every $0 \le l \le m+1$ and every $t$, one has: $\left|\frac{d^l}{dt^l}\frac{A(t)}{G(t)}\right|, \left|\frac{d^l}{dt^l}\frac{B(t)}{G(t)}\right| \le m^{O(l+1)}$.

(See Figure 1 for the illustration.)

[Figure 1: densities of $D_A$, $D_B$, and a rescaled $N(0,1)$; panel title "Well-separated distributions matching on 19 moments".] Figure 1. The distributions in Lemma 4.2 are similar to discretized Gaussians, with careful discretization and weighting from Gauss-Hermite quadrature.

Next let us fix parameters $1 \le k \le d$ and $\varepsilon > 0$. Let $U = \{U_i\}$ be a family of $k$-dimensional subspaces of $\mathbb{R}^d$ with fixed orthonormal bases such that for every $i \ne j$ and $u \in U_i$, one has $\|\mathrm{proj}_{U_j} u\|_2 \le \varepsilon \|u\|_2$. Informally speaking, subspaces from $U$ are pairwise near-orthogonal.

Lemma 4.3. For every $k \le d^{\Omega(1)}$, there exists such a family $U$ with $\varepsilon \le d^{-0.49}$ and $|U| = 2^{d^{\Theta(1)}}$.

Now we are ready to define our family of hard pairs $(D_0, D_1)$ of distributions over $\mathbb{R}^d$. The family is parameterized by a $k$-dimensional subspace $U \in \mathcal{U}$ together with an orthonormal basis $u_1, u_2, \ldots, u_k$ of $U$, where $\mathcal{U}$ is the family of subspaces guaranteed by Lemma 4.3. Let us extend the above basis to a basis of the whole $\mathbb{R}^d$: $u_1, u_2, \ldots, u_d$.
Now we define a pair of distributions $D_{U,A}$ and $D_{U,B}$ via their p.d.f.'s $A_U(x)$ and $B_U(x)$ respectively as follows:
$$A_U(x) = \prod_{i=1}^{k} A(\langle x, u_i\rangle)\prod_{i=k+1}^{d} G(\langle x, u_i\rangle) \quad\text{and}\quad B_U(x) = \prod_{i=1}^{k} B(\langle x, u_i\rangle)\prod_{i=k+1}^{d} G(\langle x, u_i\rangle),$$
where $A(\cdot)$ and $B(\cdot)$ are densities of the distributions $D_A$ and $D_B$ from Lemma 4.2, and $G(t) = \frac{1}{\sqrt{2\pi}}e^{-t^2/2}$ is the p.d.f. of the standard Gaussian distribution $N(0,1)$. Now we simply take $D_0$ to be $D_{U,A}$ and $D_1$ to be $D_{U,B}$.

Lemma 4.4. There exist two sets $S_{U,A}, S_{U,B} \subseteq \mathbb{R}^d$ such that the distance between $S_{U,A}$ and $S_{U,B}$ is $\Omega(\sqrt{k/m})$, and for which $\mathbb{P}_{x \sim D_{U,A}}[x \in S_{U,A}] \ge 1 - e^{-\Omega(km)}$ and $\mathbb{P}_{x \sim D_{U,B}}[x \in S_{U,B}] \ge 1 - e^{-\Omega(km)}$.

As a result, the pair $(D_0, D_1)$ admits a $(\Omega(\sqrt{k/m}), e^{-(km)^{\Omega(1)}})$-robust classifier. Moreover, since $\log|U| \le O(d)$ (which follows from standard bounds on the number of pairwise near-orthogonal unit vectors in $\mathbb{R}^d$), it follows from Theorem 3.1 that one can learn a $(\Omega(\sqrt{k/m}), 0.01)$-robust classifier from merely $O(d)$ samples.

4.2. SQ lower bound for learning a classifier for $D_{U,A}$ and $D_{U,B}$

The heart of the matter is to show that it requires $2^{d^{\Omega(1)}}$ statistical queries with precision $\tau = 2^{-d^{\Theta(\gamma)}}$ to learn a classifier for $D_{U,A}$ and $D_{U,B}$, provided that all the parameters $m, k, \varepsilon$ are set correctly. The argument is fairly involved and uses the framework of (Feldman et al., 2013) to reduce the question to that of upper bounding the $\chi$-correlation between the distributions. Due to space limitations, we give the argument in Appendix B of the supplementary material.

4.3. Making the distribution easy to learn non-robustly

Let us now show a family of pairs of distributions $(\widetilde{D}_0, \widetilde{D}_1)$ over $\mathbb{R}^{d+1}$ such that it is easy to learn a (non-robust) classifier, but hard to learn a robust one. The construction is very simple: we take distributions $(D_0, D_1)$ over $\mathbb{R}^d$ as defined above and define $x \sim \widetilde{D}_0$ to be $x = (0, y_1, y_2, \ldots, y_d)$, where $y \sim D_0$, and, similarly, $x \sim \widetilde{D}_1$ to be $x = (\rho, y_1, y_2, \ldots, y_d)$, where $y \sim D_1$ and $\rho > 0$. These distributions admit a trivial (non-robust) classifier based on the first coordinate. Moreover, since $\widetilde{D}_0$ and $\widetilde{D}_1$ are linearly separable, they can be classified using linear SVM or logistic regression. Information-theoretically, one can learn a $(\sqrt{1/\gamma}, 0.1)$-robust classifier using $O(d)$ samples by ignoring the first coordinate and applying Theorem 3.1. However, for every $\varepsilon > \rho$, one needs $2^{d^{\Omega(1)}}$ SQ queries with accuracy $2^{-d^{\Theta(\gamma)}}$ to learn an $(\varepsilon, 0.1)$-robust separator. This can be shown exactly the same way as for $D_0$ and $D_1$ (see Appendix B in the supplementary material).

The above distributions are hard to learn robustly with respect to the $\ell_2$ norm. We can switch to $\ell_\infty$ by replacing $x$ by its Hadamard transform $Hx$. Since $\|Hx - Hy\|_\infty \ge \|H(x - y)\|_2/\sqrt{d} = \|x - y\|_2$, the robustness parameters in the theorem are unchanged while the diameter becomes $O(\sqrt{d \log d})$.
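The two transformations of this subsection are easy to state in code. The sketch below (ours; `sample_D0` and `sample_D1` are hypothetical samplers standing in for the hard pair of Section 4.1) prepends the label-revealing first coordinate (0 or $\rho$) and, optionally, applies an unnormalized Hadamard transform to pass from $\ell_2$ to $\ell_\infty$ geometry as described above.

```python
import numpy as np
from scipy.linalg import hadamard

def make_easy_nonrobust(sample_Di, label, rho, n):
    """Sample from D~_i: prepend coordinate 0 (label 0) or rho (label 1)
    to samples y from the underlying hard distribution D_i over R^d."""
    Y = sample_Di(n)                              # (n, d) samples from D_i
    first = np.full((n, 1), rho if label == 1 else 0.0)
    return np.hstack([first, Y])

def to_linf_geometry(X):
    """Apply an (unnormalized) Hadamard transform, padding the dimension to a
    power of two, so that l2 robustness statements translate to l_infty."""
    n, d = X.shape
    D = 1 << (d - 1).bit_length()                 # next power of two
    Xp = np.pad(X, ((0, 0), (0, D - d)))          # zero-pad extra coordinates
    H = hadamard(D)
    return Xp @ H

# toy usage with placeholder Gaussian samplers standing in for D_0, D_1
rng = np.random.default_rng(0)
sample_D0 = lambda n: rng.normal(size=(n, 100))
sample_D1 = lambda n: rng.normal(size=(n, 100))
X0 = make_easy_nonrobust(sample_D0, label=0, rho=0.1, n=5)
X1 = make_easy_nonrobust(sample_D1, label=1, rho=0.1, n=5)
print(to_linf_geometry(X1).shape)
```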
5. Cryptographic hardness

5.1. Hard-to-compute robust classifiers

We will now exhibit a binary classification task that admits a maximally robust classifier (that is, robust to perturbations comparable to the diameter of the support), yet any efficiently computable classifier has an accuracy close to random guessing. Let $G : \{0,1\}^{d/2} \to \{0,1\}^d$ be a cryptographic pseudorandom generator (PRG). Let $D_0$ be uniform on $\{0,1\}^d$ and $D_1$ be the distribution of $G(s)$ for $s$ uniform in $\{0,1\}^{d/2}$. Clearly a simple volume argument shows that there exists a classifier which satisfies (1) for $\varepsilon = \Theta(\sqrt{d})$ (i.e., this problem admits a maximally robust classifier). Yet by definition of a PRG no polynomial time algorithm can have a non-trivial classification accuracy (let alone robust accuracy).

5.2. Adversarial examples and trapdoor PRG

Given Section 5.1, our goal is now to construct a classification task which admits a maximally robust classifier that is also efficiently computable, yet one cannot get non-trivial accuracy in polynomial time. The main idea here is to replace the PRG in the construction of Section 5.1 with a trapdoor PRG. In a nutshell, a trapdoor PRG comes with a key, such that knowing the key allows one to efficiently distinguish the PRG from a true source of randomness (and thus allows for efficient classification in the construction of Section 5.1). Note also that, by a simple union bound, the sample complexity of such a problem would be of the order of the number of bits in the key.

Let us now detail the construction a bit more. For the sake of concreteness, we use a specific trapdoor PRG, namely the Blum-Blum-Shub PRG (Blum et al., 1986) (in its backward form). Let $p$ and $q$ be large distinct prime numbers congruent to 3 mod 4, let $N = pq$ and $d = O(\log N)$. The BBS PRG $G_N$ maps a seed $s \in \{0,1\}^d$ to a bit sequence as follows. First it maps the seed $s$ to a quadratic residue $x_0$ modulo $N$, in such a way that a uniformly random seed gives a nearly-uniform quadratic residue modulo $N$. Next it iteratively takes square roots mod $N$: let $x_{i+1}$ be such that $x_i = x_{i+1}^2 \bmod N$ and $x_{i+1}$ is a quadratic residue itself (this is well-defined by our assumption on $p$ and $q$). The $i$-th element of the output of $G_N$ is then simply the parity of $x_i$.

The key property of the BBS PRG is that, without knowing the factorization $N = pq$, its output is computationally indistinguishable (under the quadratic residuosity assumption) from a true source of randomness (even when the seed is known), while on the other hand knowing the factorization allows for efficient distinguishing. To make this mathematically precise let us recall the notion of computational statistical distance for a family of pairs of distributions $\{(D_0(\omega), D_1(\omega)), \omega \in \Omega\}$: it is the supremum over all polynomial-time algorithms of the infimum over $\omega \in \Omega$ of the success probability one can have in identifying whether a random sample was generated from $D_0(\omega)$ or from $D_1(\omega)$. Let us fix some constant $c > 1$ and denote $D_0^d = \mathrm{unif}(\{0,1\}^{d^c})$ and $D_1^d(N)$ the distribution of the first $d^c$ bits of the concatenation of $s$ and $G_N(s)$, where $s$ is a uniformly random element of $\{0,1\}^d$.

Theorem 5.1 ((Blum et al., 1986; Vazirani and Vazirani, 1983)). Assuming that for infinitely many $N$ the computational statistical distance of $\{(D_0^d, D_1^d(pq))\}_{p,q}$ is greater than $1/2 + 1/\mathrm{poly}(d)$ would refute the quadratic residuosity assumption. On the other hand, if $p$ and $q$ are known, then the computational statistical distance of $\{(D_0^d, D_1^d(N))\}$ is $1 - o_d(1)$.

From the above discussion we have the following properties for the classification task described by the family $\{(D_0^d, D_1^d(pq))\}_{p,q}$:

a. The (robust) sample complexity of this family is $O(d)$.

b. Any task in this family admits a maximally robust classifier (same volume argument as in Section 5.1) that is also efficiently computable (second statement in Theorem 5.1).

c. Under the quadratic residuosity assumption, any polynomial time learning algorithm for this family has an accuracy close to chance on some task in the family (first statement in Theorem 5.1).
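For concreteness, here is a toy sketch (ours, with tiny insecure parameters) of the backward Blum-Blum-Shub iteration described above: generating the stream requires taking principal square roots modulo $N$, which the code does using the factorization $N = pq$, i.e., the trapdoor; without $p$ and $q$ this step is believed to be infeasible.

```python
def principal_sqrt(a, p, q):
    """Square root of the quadratic residue a mod pq that is itself a
    quadratic residue. Uses the factorization (the trapdoor)."""
    rp = pow(a, (p + 1) // 4, p)           # a square root of a mod p
    rq = pow(a, (q + 1) // 4, q)           # a square root of a mod q
    if pow(rp, (p - 1) // 2, p) != 1:      # pick the root that is itself a QR
        rp = p - rp
    if pow(rq, (q - 1) // 2, q) != 1:
        rq = q - rq
    t = (rp - rq) * pow(q, -1, p) % p      # combine the two roots via CRT
    return (rq + q * t) % (p * q)

def bbs_backward(seed, n_bits, p, q):
    """Backward BBS: map the seed to a quadratic residue, then repeatedly take
    principal square roots mod N and output the parity of each iterate."""
    N = p * q
    x = pow(seed, 2, N)                    # seed -> quadratic residue mod N
    bits = []
    for _ in range(n_bits):
        x = principal_sqrt(x, p, q)        # needs the trapdoor (p, q)
        bits.append(x & 1)                 # i-th output bit = parity of x_i
    return bits

# toy usage: tiny primes, both congruent to 3 mod 4 (insecure, illustration only)
p, q = 499, 547
print(bbs_backward(seed=123456, n_bits=16, p=p, q=q))
```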
We also note that, using the trick of adding a dummy coordinate revealing the label from Section 4.3, one could replace property c by c' and add property d as follows (for any fixed $\varepsilon > 0$):

c'. Under the quadratic residuosity assumption, any polynomial time learning algorithm for this family has $\varepsilon$-robust accuracy close to chance on some task in the family.

d. One can learn non-robustly in polynomial time (and with polynomial sample complexity).

Remark. After the preliminary version of the present paper appeared, we were notified by Degwekar and Vaikuntanathan that there is an issue with the above construction. Namely, it is not clear that item b holds (the existence of an efficient robust classifier), since one can corrupt the seed, which prevents us from (efficiently) distinguishing $D_0$ and $D_1$. However, as they explain in their paper (Degwekar and Vaikuntanathan, 2019), this can be remedied by post-composing our construction with a constant-rate, linear-distance, efficient error-correcting code. We refer the reader to (Degwekar and Vaikuntanathan, 2019) for a further discussion.

6. Conclusion and future directions

In this paper we put forward the thesis that adversarial examples might be an unavoidable consequence of computational constraints for learning algorithms. Our main piece of evidence consists of two classification tasks for which there exist classifiers robust to large Euclidean perturbations, yet finding any non-trivial robust classifier is hard in the statistical query model or under a cryptographic hardness assumption. The most important question for the validity of our thesis is whether one could prove a similar hardness result for natural distributions. This is a particularly challenging open problem as the concept of a natural distribution is fuzzy (for instance there is no consensus on what a natural distribution for images should look like).

References

Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Proceedings of the 35th International Conference on Machine Learning, ICML '18, 2018. URL https://arxiv.org/abs/1802.00420.

Avrim Blum, Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In Proceedings of the Twenty-sixth Annual ACM Symposium on Theory of Computing, STOC '94, pages 253–262. ACM, 1994.

Lenore Blum, Manuel Blum, and Mike Shub. A simple unpredictable pseudo-random number generator. SIAM Journal on Computing, 15(2):364–383, 1986.

Nilesh Dalvi, Pedro Domingos, Mausam, Sumit Sanghai, and Deepak Verma. Adversarial classification. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, pages 99–108. ACM, 2004.

Akshay Degwekar and Vinod Vaikuntanathan. Computational limitations in robust classification and win-win results. arXiv preprint arXiv:1902.01086, 2019.

Ilias Diakonikolas, Daniel Kane, and Alistair Stewart. Statistical query lower bounds for robust estimation of high-dimensional Gaussians and Gaussian mixtures. In Proceedings of the Fifty-eighth Annual Symposium on Foundations of Computer Science, FOCS '17, 2017. URL https://arxiv.org/pdf/1611.03473.pdf.

Krishnamurthy Dvijotham, Robert Stanforth, Sven Gowal, Timothy Mann, and Pushmeet Kohli. A dual approach to scalable verification of deep networks. arXiv preprint arXiv:1803.06567, 2018.
Gamaleldin F. Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alex Kurakin, Ian Goodfellow, and Jascha Sohl-Dickstein. Adversarial examples that fool both human and computer vision. arXiv preprint arXiv:1802.08195, 2018.

Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier, 2018. URL https://arxiv.org/pdf/arXiv:1802.08686.pdf.

Vitaly Feldman. A general characterization of the statistical query complexity. Proceedings of Machine Learning Research, 65:1–46, 2017.

Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh Vempala, and Ying Xiao. Statistical algorithms and a lower bound for detecting planted cliques. In Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, STOC '13, pages 655–664. ACM, 2013.

Shivam Garg, Vatsal Sharan, Brian Zhang, and Gregory Valiant. A spectral view of adversarially robust features. In Advances in Neural Information Processing Systems, pages 10159–10169, 2018.

Amparo Gil, Javier Segura, and Nico M. Temme. Asymptotic approximations to the nodes and weights of Gauss-Hermite and Gauss-Laguerre quadratures. Studies in Applied Mathematics, 140(3):298–332, 2018.

Justin Gilmer, Luke Metz, Fartash Faghri, Sam Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. Adversarial spheres. In International Conference on Learning Representations Workshop, 2018. URL https://arxiv.org/pdf/1801.02774.pdf.

Amir Globerson and Sam Roweis. Nightmare at test time: Robust learning by feature deletion. In Proceedings of the 23rd International Conference on Machine Learning, ICML '06, pages 353–360. ACM, 2006.

Michael Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM (JACM), 45(6):983–1006, 1998.

Adam R. Klivans and Alexander A. Sherstov. Unconditional lower bounds for learning intersections of halfspaces. Machine Learning, 69(2):97–114, 2007.

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018. URL https://arxiv.org/pdf/1706.06083.pdf.

Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, 2004.

Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data, 2018. URL https://arxiv.org/pdf/arXiv:1804.11285.pdf.

Le Song, Santosh Vempala, John Wilmes, and Bo Xie. On the complexity of learning neural networks. In Advances in Neural Information Processing Systems, pages 5520–5528, 2017.

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2013. URL https://arxiv.org/pdf/1312.6199.pdf.

Gabor Szego. Orthogonal Polynomials, volume 23. American Mathematical Society, 1939.

Umesh V. Vazirani and Vijay V. Vazirani. Trapdoor pseudo-random number generators, with applications to protocol design. In 24th Annual Symposium on Foundations of Computer Science, pages 23–30. IEEE, 1983.

Yizhen Wang, Somesh Jha, and Kamalika Chaudhuri. Analyzing the robustness of nearest neighbors to adversarial examples. arXiv preprint arXiv:1706.03922, 2017.

Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Duane Boning, Inderjit S. Dhillon, and Luca Daniel.
Towards fast computation of certified robustness for ReLU networks. arXiv preprint arXiv:1804.09699, 2018.

Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, pages 5283–5292, 2018.

Kai Y. Xiao, Vincent Tjeng, Nur Muhammad Shafiullah, and Aleksander Madry. Training for faster adversarial robustness verification via inducing ReLU stability. arXiv preprint arXiv:1809.03008, 2018.

Dong Yin, Kannan Ramchandran, and Peter Bartlett. Rademacher complexity for adversarially robust generalization. arXiv preprint arXiv:1810.11914, 2018.

Huan Zhang, Hongge Chen, Zhao Song, Duane Boning, Inderjit S. Dhillon, and Cho-Jui Hsieh. The limitations of adversarial training and the blind-spot attack. arXiv preprint arXiv:1901.04684, 2019.