# Query Complexity of Adversarial Attacks

Grzegorz Głuch¹, Rüdiger Urbanke¹

¹School of Computer and Communication Sciences, EPFL, Switzerland. Correspondence to: Grzegorz Głuch. Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).

Abstract

There are two main attack models considered in the adversarial robustness literature: black-box and white-box. We consider these threat models as two ends of a fine-grained spectrum, indexed by the number of queries the adversary can ask. Using this point of view we investigate how many queries the adversary needs to make to design an attack that is comparable to the best possible attack in the white-box model. We give a lower bound on that number of queries in terms of the entropy of the decision boundaries of the classifier. Using this result we analyze two classical learning algorithms on two synthetic tasks for which we prove meaningful security guarantees. The obtained bounds suggest that some learning algorithms are inherently more robust against query-bounded adversaries than others.

1. Introduction

Modern neural networks achieve high accuracy on tasks such as image classification (Krizhevsky et al., 2012) or speech recognition (Collobert & Weston, 2008). However, they are typically susceptible to small, adversarially-chosen perturbations of the inputs (Szegedy et al., 2014; Nguyen et al., 2015): more precisely, given a correctly-classified input x, one can typically find a small perturbation δ such that x + δ is misclassified by the network while to the human eye this perturbation is not perceptible.

There are two main threat models considered in the literature: black-box and white-box. In the white-box model, on the one hand, the attacker (Biggio et al., 2013; Zheng et al., 2019) is assumed to have access to a full description of the model. For the case of neural networks that amounts to knowledge of the architecture and the weights. In the black-box model, on the other hand, the adversary (Papernot et al., 2017; Chen et al., 2017; Liu et al., 2016; Xiao et al., 2018; Hayes & Danezis, 2017) can only observe the input-output behavior of the model.

Many defenses have been proposed to date. To mention just a few: adversarial learning (Goodfellow et al., 2014; Madry et al., 2018; Tramèr et al., 2018; Xiao et al., 2019), input denoising (Bhagoji et al., 2017; Xu et al., 2017), or, more recently, randomized smoothing (Lécuyer et al., 2019; Cohen et al., 2019; Salman et al., 2019; Gluch & Urbanke, 2019). Unfortunately, most heuristic defenses break in the presence of suitably strong adversaries (Carlini & Wagner, 2017; Athalye et al., 2018; Uesato et al., 2018) and provable defenses are often impractical or allow only very small perturbations. Thus a full defense remains elusive. The current literature on this topic is considerable. We refer the reader to Chakraborty et al. (2018) for an overview of both attacks and defenses and to Bhambri et al. (2019) for a survey focused on black-box attacks.

We consider black-box and white-box models as the extreme points of a spectrum parameterized by the number of queries allowed to the adversary. This point of view is related to Ilyas et al. (2018), where the authors design a black-box attack with a limited number of queries. Intuitively, the more queries the adversary can make, the more knowledge he gains about the classifier.
When the number of queries approaches infinity we transition from a black-box to a white-box model, as in this limit the adversary knows the classifying function exactly. Using this point of view we ask: How many queries does the adversary need to make to reliably find adversarial examples? By reliably we mean comparable with the information-theoretic white-box performance.

To be more formal, we assume that there is a distribution D and a high-accuracy classifier f that maps R^d to classes Y. The adversary A only has black-box access to f. Moreover, ϵ ∈ R+ is an upper bound on the norm (usually an ℓp norm) of the allowed adversarial perturbation. Assume that f is susceptible to ϵ-bounded adversarial perturbations for an η-fraction of the underlying distribution D. The quantity η is the largest error an adversary, who has access to unbounded computational resources and fully knows f, can achieve. We ask: How many queries to the classifier f does A need to make in order to be able to find adversarial examples for, say, an η/2-fraction of the distribution D?

This question is similar to problems considered in Ashtiani et al. (2020). The difference is that in Ashtiani et al. (2020) the authors define the query complexity of the adversary as a function of the number of points for which the adversarial examples are to be found. Moreover, they require the adversary to be perfect, that is, to find adversarial examples whenever they exist. This stands in contrast to our approach, which only requires the adversary to succeed for, say, a 1/2-fraction of the adversarial examples. The question we ask is also similar to ideas in Cullina et al. (2018). In this paper the authors consider a generalization of PAC learning and ask how many queries an algorithm requires in order to learn robustly. Similar questions were also asked in Yin et al. (2019) and Schmidt et al. (2018). The difference is that we focus on the query complexity of the attacker and not the defender. Diochnos et al. (2018); Gourdeau et al. (2019) discuss the use of membership queries as part of converting a black-box classifier to a white-box classifier.

Our contributions. We introduce a new notion - the query complexity (QC) of adversarial attacks. This notion unifies the two most popular attack models and enables a systematic study of the robustness of learning algorithms against query-bounded adversaries. Our findings are the following: the higher the entropy of the decision boundaries that are created by the learning algorithm, the more secure is the resulting system in our attack model. We first prove a general lower bound on the QC in terms of the entropy of decision boundaries. Then, using this result, we present two scenarios for which we are able to show meaningful lower bounds on the QC. The first one is a simple 2-dimensional distribution and a nearest neighbor algorithm. For this setting we are able to prove a strong query lower bound of Θ(m), where m is the number of samples on which the classifier was trained. For the second example we consider the well-known adversarial spheres distribution, introduced in the seminal paper Gilmer et al. (2018). For this learning task we argue that quadratic neural networks have a query lower bound of Θ(d), where d is the dimensionality of the data. We discuss why certain learning algorithms, like linear classifiers and also neural networks, might be far less secure than K-NN against query-bounded adversaries.
Finally, we use the lower bound on the QC in terms of entropy to prove, for a broad class of learning algorithms, a security guarantee against query-bounded adversaries that grows with accuracy.

There exist tasks for which it is easy to find high-accuracy classifiers but finding robust models is infeasible. E.g., in Bubeck et al. (2019) the authors describe a situation where it is information-theoretically easy to learn robustly but there is no algorithm in the statistical query model that computes a robust classifier. In Bubeck et al. (2018) an even stronger result is proven. It is shown that under a standard cryptographic assumption there exist learning tasks for which no algorithm can efficiently learn a robust classifier. Finally, in Tsipras et al. (2019) it was shown that robust and accurate classifiers might not even exist. The query-bounded point of view shows a way to address these fundamental difficulties: even for tasks for which it is impossible to produce a model that is secure against a resource-unbounded adversary, it might be possible to defend against a query-bounded adversary.

Organization of the paper. In Section 2 we formally define the threat model and the query complexity of adversarial attacks. In Section 7 we show a security guarantee against query-bounded adversaries that grows with accuracy for a rich class of learning algorithms. In Sections 4 and 5 we analyze the query complexity of the K-NN and Quadratic Neural Network learning algorithms, respectively. In Section 6 we present a universal defense against query-bounded adversaries. Finally, in Section 9 we summarize the results and discuss future directions. We defer most of the proofs to the appendix.

2. The Query Complexity of adversarial attacks

We start by formally defining the threat model. For a data distribution D over R^d and a set A ⊆ R^d let µ(A) := P_{X∼D}[X ∈ A]. For simplicity we consider only separable binary classification tasks. Such tasks are fully specified by D as well as a ground truth h : R^d → {−1, 1}. For a binary classification task with a ground truth h : R^d → {−1, 1} and a classifier f : R^d → {−1, 1} we define the error set as E(f) := {x ∈ R^d : f(x) ≠ h(x)}. Note that with this definition it might happen that E(f) ⊄ supp(D). For x ∈ R^d and ϵ > 0 we write B_ϵ(x) to denote the closed ball with center x and radius ϵ, and B_ϵ to denote the closed ball with center 0 and radius ϵ. We say that a function p : R^d → R^d is an ϵ-perturbation if for all x ∈ R^d we have ∥p(x) − x∥₂ ≤ ϵ. For n ∈ N we denote the set {1, 2, ..., n} by [n]. For x, y ∈ R^d we will use [x, y] to denote the closed line segment between x and y. For A, B ⊆ R^d we define A + B := {x + y : x ∈ A, y ∈ B}. We use m to denote the sample size.

Definition 1 (Risk). Consider a separable, binary classification task with a ground truth h : R^d → {−1, 1}. For a classifier f : R^d → {−1, 1} we define the Risk as R(f) := P_X[f(X) ≠ h(X)].

Definition 2 (Adversarial risk). Consider a binary classification task with separable classes with a ground truth h : R^d → {−1, 1}. For a classifier f : R^d → {−1, 1} and ϵ ∈ R≥0 we define the Adversarial Risk as

AR(f, ϵ) := P_X[∃γ ∈ B_ϵ : f(X + γ) ≠ h(X + γ)].

Moreover, for an ϵ-perturbation p we define

AR(f, p) := P_X[f(p(X)) ≠ h(p(X))],

the adversarial risk of a specific perturbation function p.
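For intuition, the quantities in Definitions 1 and 2 can be estimated by Monte-Carlo sampling. The sketch below is only illustrative and not part of the formal development: `f` and `h` are hypothetical callables returning labels in {−1, 1}, `X` is an array of samples from D, and the inner maximization over B_ϵ is replaced by random search, so the second function only yields a lower estimate of AR(f, ϵ).

```python
import numpy as np

def empirical_risk(f, h, X):
    """Estimate R(f) = P[f(X) != h(X)] from the rows of X."""
    return np.mean([f(x) != h(x) for x in X])

def empirical_adversarial_risk(f, h, X, eps, n_dirs=200, seed=0):
    """Crude estimate of AR(f, eps): for each sample, try random l2
    perturbations of norm <= eps and count the samples for which at least
    one of them is misclassified (a lower estimate, since the supremum
    over the ball is replaced by random search)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for x in X:
        for _ in range(n_dirs):
            g = rng.standard_normal(x.shape[0])
            gamma = eps * rng.uniform() * g / np.linalg.norm(g)
            if f(x + gamma) != h(x + gamma):
                hits += 1
                break
    return hits / len(X)
```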
Note: In the literature other definitions were also considered, see Diochnos et al. (2018); Gourdeau et al. (2019). In order to keep the exposition simple, we restrict our discussion to our definition of adversarial risk and to ℓ2-bounded adversarial perturbations. Other norms can of course be considered and might be important in practice.

Definition 3 (Query-bounded adversary). For ϵ ∈ R≥0 and f : R^d → {−1, 1}, a q-bounded adversary with parameter ϵ is a deterministic algorithm* A that asks at most q ∈ N (potentially adaptive) queries of the form "f(x) = 1?" and outputs an ϵ-perturbation A(f) : R^d → R^d.

*We use "algorithm" here since this seems more natural. But we do not limit the attacker computationally nor are we concerned with questions of computability. Hence, "function" would be equally correct.

Definition 4 (Query complexity of adversarial attacks). Consider a binary classification task T for separable classes with a ground truth h : R^d → {−1, 1} and a distribution D. Assume that there is a learning algorithm ALG for this task that given S ∼ D^m learns a classifier ALG(S) : R^d → {−1, 1}. For ϵ ∈ R≥0 define the Query Complexity of adversarial attacks on ALG with respect to (T, m, ϵ), denoted QC(ALG, T, m, ϵ), as the minimum q ∈ N such that there exists a q-bounded adversary A with parameter ϵ for which the probability, over S ∼ D^m, of the event

AR(ALG(S), A(ALG(S))) ≥ (1/2) · AR(ALG(S), ϵ)

is at least 0.99.

In words, it is the minimum number of queries that is needed so that there exists an adversary who can achieve an error of half the maximum achievable error (with high probability over the data samples).

Note that it follows from Definitions 3 and 4 that A is computationally unbounded, knows the distribution D and the ground truth h of the learning task, and also knows the learning algorithm ALG. The only restriction that is imposed on A is the number of allowed queries. What is important is that A does not know S nor the potential randomness of ALG (in the generalized setting ALG can be randomized, see Definition 5); this is what makes the QC non-degenerate. To see this, observe that if A knew S and ALG was deterministic then A could achieve AR(ALG(S), A(ALG(S))) = AR(ALG(S), ϵ) without asking any queries. This is because A can, for every point x, check if there exists γ ∈ B_ϵ such that ALG(S)(x + γ) ≠ h(x + γ), as A can compute ALG(S) without asking any queries. This allows A to achieve an adversarial risk of AR(ALG(S), ϵ) (see Definition 3).

Definition 4 was guided by experiments. For instance, in Papernot et al. (2017), in order to attack a neural network f the adversary trains a new neural network f̂. This f̂ acts as an approximation of f. She does so by creating a training set, which is labeled using f, and then training f̂ on this dataset. The concept of transferability allows the adversary to ensure that f will also misclassify inputs misclassified by f̂. This phenomenon is reflected in our definition: after the initial phase of querying f (and training f̂) the adversary no longer has access to f. This means that after this phase the adversary has implicitly constructed an ϵ-perturbation p as in Definition 2.

One can also consider different definitions. For instance, one can ask about the number of queries required to attack a given point, or measure the adversarial risk in absolute terms (instead of 1/2 of the white-box performance). Both of these questions are valid and are of interest in their own right. For the sake of definiteness we choose what we consider to be one important viewpoint, a viewpoint that is well motivated by experiments.
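For concreteness, the evaluation protocol behind Definition 4 can be sketched as follows. All names below (`ALG`, `sample_D`, `adversary`, `ar_of_perturbation`, `white_box_ar`) are hypothetical placeholders, and in practice the adversarial risks would themselves be estimated; the only point of the sketch is to show where the query budget and the 1/2-approximation threshold enter.

```python
class QueryOracle:
    """Black-box access to a classifier f; counts queries of the form 'f(x) = 1?'."""
    def __init__(self, f, budget):
        self.f, self.budget, self.used = f, budget, 0

    def query(self, x):
        assert self.used < self.budget, "query budget exhausted"
        self.used += 1
        return self.f(x) == 1

def qc_experiment(ALG, sample_D, adversary, ar_of_perturbation, white_box_ar,
                  m, q, eps, n_trials=100):
    """Empirical analogue of Definition 4.  The adversary only sees a q-query
    oracle for ALG(S); it then commits to an eps-perturbation p : R^d -> R^d.
    A trial succeeds if p reaches at least half of the white-box adversarial
    risk AR(ALG(S), eps)."""
    successes = 0
    for _ in range(n_trials):
        S = sample_D(m)                        # training sample, hidden from the adversary
        f = ALG(S)                             # trained classifier
        p = adversary(QueryOracle(f, q), eps)  # at most q queries, then commit to p
        if ar_of_perturbation(f, p) >= 0.5 * white_box_ar(f, eps):
            successes += 1
    return successes / n_trials                # Definition 4 asks for at least 0.99
```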
Definition 4 can be generalized to incorporate randomness in the learning algorithm. Intuitively, the randomness in ALG can increase the entropy of the learning process and that in turn may lead to a higher QC. Further, both the approximation constant (which is chosen to be 1/2 in Definition 4) as well as the success probability can also be generalized.

Definition 5 (Query complexity of adversarial attacks - generalized). Consider a binary classification task T for separable classes with a ground truth h : R^d → {−1, 1} and a distribution D. Assume that there is a randomized learning algorithm ALG for this task that given S ∼ D^m and a sequence of random bits B ∼ B learns a classifier ALG(S, B) : R^d → {−1, 1}. For ϵ ∈ R≥0 and κ, α ∈ [0, 1] define the Query Complexity of adversarial attacks on ALG with respect to (T, m, ϵ, α, κ), denoted QC(ALG, T, m, ϵ, α, κ), as the minimum q ∈ N such that there exists a q-bounded adversary A with parameter ϵ for which the probability, over S ∼ D^m and B ∼ B, of the event

AR(ALG(S, B), A(ALG(S, B))) ≥ α · AR(ALG(S, B), ϵ)

is at least 1 − κ. If the above holds for A we will refer to α as the approximation constant of A and to κ as the error probability of A.

For the sake of clarity we will, whenever possible, restrict ourselves to Definition 4. This eliminates two parameters from our expressions and restricts the attention to deterministic learning algorithms. Only when the distinction becomes important will we refer to Definition 5.

Summary: The query complexity of adversarial attacks is the minimum q for which there exists a q-bounded adversary that carries out a successful attack. Such adversaries are computationally unbounded, know the learning task and the learning algorithm, but don't know the training set.

3. High-entropy decision boundaries lead to robustness

The decision boundary of a learning algorithm applied to a given task can be viewed as the outcome of a random process: (i) generate a training set and (ii) apply to it the, potentially randomized, learning algorithm. Recall, see Definitions 4 and 5, that a query-bounded adversary does not know the sample on which the model was trained nor the randomness used by the learner. This means that if the decision boundary has high entropy then the adversary needs to ask many questions to recover the boundary to a high degree of precision. This suggests that high-entropy decision boundaries are robust against query-bounded adversaries since, intuitively, an approximate knowledge of the decision boundary is a prerequisite for a successful attack. Next we present a result that makes this intuition formal.

Before delving into the details let us explain the intuition behind this approach. Let us recall the set-up. The classifier is trained on a sample S that is unknown to the adversary. This classifier has a particular error set. We say that an adversary succeeds if, after asking some queries, it manages to produce an ϵ-perturbation with the property that this perturbation moves sufficient mass into the error set of the classifier. Here, sufficient means at least half of what is possible if the adversary had known the classifier exactly. Let us say in this case that the ϵ-perturbation is consistent with the error set. The following theorem states that if for every ϵ-perturbation the probability that the error set of a classifier is consistent with that perturbation is small, then the QC is high.
This is true since, if for every ϵ-perturbation only a small fraction of the probability space (i.e., of the possible classifiers) is consistent with this perturbation, then A's protocol has to return many distinct ϵ-perturbations depending on the outcome of its queries. And to distinguish which perturbation it should return it has to ask many queries.

Theorem 1 (Reduction). Let ϵ ∈ R≥0 and let T be a binary classification task on R^d with separable classes. Let ALG be a randomized learning algorithm for T that uses m samples. Then for every κ ∈ [0, 1) the following holds:

QC(ALG, T, m, ϵ, 1/2, κ) ≥ log( (1 − κ) / sup_{p : ϵ-perturbation} P_{S∼D^m, B∼B}[E(S, B, p)] ),

where the event E(S, B, p) is defined as

µ(p⁻¹(E(ALG(S, B)))) ≥ (1/2) · AR(ALG(S, B), ϵ).

Remark 1. For the sake of clarity and consistency with the standard setup we fixed the approximation constant to be equal to 1/2. We note, however, that Theorem 1 (and its proof) is also true for all approximation constants.

Summary: Theorem 1 is a key ingredient in most of our results. It gives a lower bound on the QC in terms of a geometric-like notion of entropy of error sets. This is often much easier to compute than it is to analyze the inner workings of a particular learning algorithm.

4. Entropy due to locality - K-NN algorithms

Let us now analyze the QC of K-Nearest Neighbor (K-NN) algorithms. Nearest neighbor algorithms are among the simplest and most studied algorithms in machine learning. They are also widely used as a benchmark. It was shown in Cover & Hart (1967) that for a sufficiently large training set, the risk of the 1-NN learning rule is upper bounded by twice the optimal risk. It is also known that these methods suffer from the curse of dimensionality: for d-dimensional distributions they typically require m = 2^{Θ(d log d)} samples. That is why in practice one often uses some dimensionality reduction subroutine before applying K-NN. Moreover, K-NN techniques are one of the few learning algorithms that do not require a training phase. In the naive implementation all computation is deferred until function evaluation. This is related to the most interesting fact from our perspective, namely that the classification rule of the K-NN algorithm depends only on the local structure of the training set.

We argue that this property makes K-NN algorithms secure against query-bounded adversaries. Intuitively, if the adversary A wants to achieve a high adversarial risk she needs to understand the global structure of the decision boundary. But if the classification rule is only very weakly correlated between distant regions of the space, then this intuition suggests that A may need to ask Θ(m) many queries to guarantee a high adversarial risk. This is consistent with the entropy point of view. Moreover, there are experimental results (see Wang et al. (2018); Papernot et al. (2016)) that show that it is hard to attack K-NN classifiers in the black-box model.

We make these intuitions formal in the following sense. We design a synthetic binary learning task in R², where the data is uniformly distributed on two parallel intervals corresponding to the two classes. We then show a Θ(m) lower bound for the QC of the 1-NN algorithm for this learning task. This means that the number of queries the adversary needs to make to attack 1-NN is proportional to the number of samples on which the algorithm was trained. We conjecture that a similar behavior occurs in higher dimensions as well.

4.1. K-NN QC lower bounds

Consider the following distribution. Let m ∈ N and z ∈ R+. Let L−, L+ ⊂ R² be two parallel intervals of length m placed at distance z apart. More formally, L− := [(0, 0), (m, 0)], L+ := [(0, z), (m, z)]. Let the binary learning task Tintervals(z) be as follows. We generate x ∈ R² uniformly at random from the union L− ∪ L+. We assign the label y = −1 if x ∈ L− and y = +1 otherwise. In Figure 1 we visualize a decision boundary of the 1-NN algorithm for a random S ∼ D^m on Tintervals. Horizontal lines, black and gray, represent the two classes, crosses are data points, white and gray regions depict the classification rule, and the union of the red intervals is equal to the error set. We also include more visualizations in the appendix.

[Figure 1. A random decision boundary of 1-NN for Tintervals.]

The main result of this subsection is:

Theorem 2. There exists a function λ : R+ → (0, 1) such that the 1-Nearest Neighbor (1-NN) algorithm applied to the learning task Tintervals(z) satisfies

QC(1-NN, Tintervals(z), 2m, z/10, 1 − λ(z), 0.1) ≥ Θ(m),

provided that z = Ω(1).
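A toy simulation of the setting of Theorem 2 (a sketch only; it assumes that the ground truth at a perturbed point is the label of its original interval, which holds for ϵ = z/10 < z/2): it samples Tintervals(z), fits 1-NN, and estimates the adversarial risk of one specific ϵ-perturbation, which lower-bounds AR(f, ϵ).

```python
import numpy as np

def sample_T_intervals(n, m, z, rng):
    """n points drawn uniformly from L- = [(0,0),(m,0)] union L+ = [(0,z),(m,z)]."""
    xs = rng.uniform(0.0, m, size=n)
    top = rng.integers(0, 2, size=n).astype(bool)
    X = np.stack([xs, np.where(top, z, 0.0)], axis=1)
    y = np.where(top, 1, -1)
    return X, y

def one_nn(X_train, y_train):
    def f(x):
        return y_train[np.argmin(np.linalg.norm(X_train - x, axis=1))]
    return f

rng = np.random.default_rng(0)
m, z = 200, 1.0
eps = z / 10                                   # as in Theorem 2
X, y = sample_T_intervals(2 * m, m, z, rng)    # 2m training samples
f = one_nn(X, y)

# One specific eps-perturbation: push every point by eps towards the other
# interval.  Since eps < z/2 the ground truth label does not change, so the
# fraction of flipped 1-NN predictions estimates AR(f, p) <= AR(f, eps).
Xt, yt = sample_T_intervals(2000, m, z, rng)
pushed = Xt + np.stack([np.zeros(len(Xt)), -yt * eps], axis=1)
print("AR of this perturbation:", np.mean([f(xp) != yi for xp, yi in zip(pushed, yt)]))
```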
Summary: The K-NN algorithm learns classification rules that depend only on the local structure of the data. This implies high-entropy decision boundaries, which in turn lead to robustness against query-bounded adversaries. The QC of 1-NN scales at least linearly with the size of the training set.

5. Entropy due to symmetry - Quadratic Neural Networks

In this section we analyze the QC of Quadratic Neural Networks (QNN) applied to a learning task defined in Gilmer et al. (2018). Let S^{d−1}_r := {x ∈ R^d : ∥x∥₂ = r}. The distribution D is defined by the following process: generate x ∼ U[S^{d−1}_1] and b ∼ U{−1, 1} (where U denotes the uniform distribution). If b = −1 return (x, −1), otherwise return (1.3x, +1). The associated ground truth is defined as h(x) = −1 for x ∈ R^d with ∥x∥₂ ≤ 1.15 and h(x) = +1 otherwise.

The QNN is a single hidden-layer network where the activation function is the quadratic function σ(x) = x². There are no bias terms in the hidden layer. The output node computes the sum of the activations from the hidden layer, multiplies it by a scalar and adds a bias. If we assume that the hidden layer contains h nodes then the network has d·h + 2 parameters. It was shown in Gilmer et al. (2018) that the function that is learned by a QNN has the form y(x) = Σ_{i=1}^d α_i z_i² − 1, where the α_i's are scalars that depend on the parameters of the network and z = M x for some orthogonal matrix M. The decision boundary is thus Σ_{i=1}^d α_i z_i² = 1, which means that it is an ellipsoid centered at the origin.

In a series of experiments performed for the Concentric Spheres (CS) dataset in Gilmer et al. (2018) it was shown that a QNN trained with N = 10^6 samples, with d = 500 and h = 1000, learns a classifier with an estimated error of approximately 10^{-20}, but the adversarial risk η is high and is estimated to be 1/2 already for ϵ ≈ 0.18. On the theoretical side, it was proven in Gluch & Urbanke (2019) (see Section 9.1) that

ϵ ≤ O( log(η/δ) / d ).    (1)

In words, (1) gives an upper bound on the biggest allowed perturbation ϵ in terms of the risk δ, the adversarial risk η and the dimension d. In particular, if we want the classifier to be adversarially robust for ϵ = Θ(1) (that is, for perturbations comparable with the separation between the two classes) then δ = 2^{-Ω(d)}. Even robustness of only ϵ = Θ(1/√d) requires the risk to be as small as δ = 2^{-Ω(√d)}. These results paint a bleak picture of the adversarial robustness for CS. The QC point of view is more optimistic.
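As an illustration of the CS task and of the ellipsoidal decision rule of a QNN, here is a small sketch; the classifier below is a hand-made perturbation of the ideal boundary ∥x∥₂ = 1.15 with a random rotation, not a network trained as in Gilmer et al. (2018).

```python
import numpy as np

def sample_CS(n, d, rng):
    """Inner sphere (radius 1, label -1) and outer sphere (radius 1.3, label +1)."""
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)        # uniform on S^{d-1}_1
    outer = rng.integers(0, 2, size=n).astype(bool)
    return np.where(outer[:, None], 1.3 * x, x), np.where(outer, 1, -1)

def ellipsoid_classifier(alpha, M):
    """QNN-style rule: sign(sum_i alpha_i z_i^2 - 1) with z = M x."""
    def f(x):
        z = M @ x
        return 1 if np.dot(alpha, z ** 2) > 1.0 else -1
    return f

d, rng = 50, np.random.default_rng(0)
M, _ = np.linalg.qr(rng.standard_normal((d, d)))          # random orthogonal matrix
alpha = (1.0 + 0.01 * rng.standard_normal(d)) / 1.15**2   # noisy version of the ideal boundary
f = ellipsoid_classifier(alpha, M)

X, y = sample_CS(5000, d, rng)
print("estimated risk:", np.mean([f(x) != yi for x, yi in zip(X, y)]))
```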
Using results from Section 7 we first show that if the network learns classifiers with risk 2^{-Ω(k)} then this automatically leads to a lower bound on the QC of Θ(k). Moreover, for a simplified model of the network, we show that even if the risk of the learned classifier is only a small constant, say 0.01, this results in a lower bound on the QC of Θ(d) for perturbations of Θ(1/√d). Using (1), our result guarantees security against Θ(d)-bounded adversaries for perturbations which are Θ(√d) times bigger than the best possible against unbounded adversaries. This shows that restricting the power of the adversary can make a significant difference.

We argue that the obtained Θ(d) lower bound is close to the real QC for this algorithm and learning task. Observe that the decision boundary of the network is an ellipsoid, which can be described by O(d²) parameters (d² for the rotation matrix and d for the lengths of the principal axes). This suggests that it should be possible to design an O(d²)-bounded adversary that succeeds on this task. Assuming that this is indeed the case, our lower bound is only a factor O(d) away from the optimum.

The results of this section can be understood in the following way. The relatively simple structure of the decision boundaries allows the adversary to attack the model with only O(d²) queries. There is, however, enough entropy in the network to guarantee a lower bound for the QC of Θ(d). This entropy intuitively comes from the rotational invariance of the dataset and, in turn, of the learned decision boundary. We conjecture that algorithms like linear classifiers (e.g., SVMs) exhibit a similar behavior. That is, for natural learning tasks they are robust against q-bounded adversaries only for q = O(poly(d)). The reason is that all these algorithms generate classifiers with relatively simple decision boundaries which can be described by O(poly(d)) parameters.

But this is not the end of the story for CS. Our results don't preclude the possibility that there exists a learning algorithm that is secure against q-bounded adversaries for q ≫ d. In fact, in Section 6 we present an off-the-shelf solution that can be applied to the CS dataset and which, by "injecting entropy", achieves security against k-bounded adversaries for ϵ = Θ(1/(k√d)).

5.1. Quadratic Neural Networks - QC lower bounds

Using the results from Section 7 one can show that increased accuracy leads to increased robustness. More precisely, if the QNN has a risk of 2^{-Ω(k)} then it is secure against Θ(k)-bounded adversaries for ϵ = Θ(1). The proof of this fact is deferred to the appendix. Now we argue that also in the case where the risk achieved by the network is as large as a constant, QNNs are still robust against Θ(d)-bounded adversaries.

We first argue that any reasonable optimization algorithm applied to QNNs for the CS learning task gives rise to a distribution on error sets that is rotationally invariant. This follows from the fact that D itself is rotationally invariant. Now observe that for QNNs the error sets are of the form

{x ∈ R^d : ∥x∥₂ ≤ 1.15, Σ_{i=1}^d α_i z_i² > 1} ∪ {x ∈ R^d : ∥x∥₂ > 1.15, Σ_{i=1}^d α_i z_i² < 1},

as the decision boundary learned by the QNN is defined by Σ_{i=1}^d α_i z_i² = 1, where z = M x for some orthonormal matrix M. These sets might be quite complicated as they are basically defined as the set difference of a ball and an ellipsoid. We will refer to the real distribution on error sets of QNNs as E_QNN.
We assume that the standard risk of classifiers learned by the QNN is concentrated around a constant δ. Intuitively, a complicated (high entropy) distribution on error sets results in a high QC and a simple (low entropy) distribution results in a low QC. In the rest of this section we first introduce a set of simple, artificial distributions over error sets and then we state QC lower bounds for these distributions. Formal definitions are presented in Definition "Distributions on Spherical Caps" in the appendix; here we give a short description of what they are.

For y ∈ S^{d−1}_1 let cap(y, r, τ) := B_r ∩ {x ∈ R^d : ⟨x, y⟩ ≥ τ}. Let τ : [0, 1] → [0, 1] be such that for every δ ∈ [0, 1] we have ν(cap(·, 1, τ(δ)))/ν(S^{d−1}_1) = δ, where ν is the (d−1)-dimensional measure on the sphere S^{d−1}_1. For k ∈ N+ let

E−(k) := cap(e₁, 1.15, τ(δ/k)) \ B_{1.15/1.3},
E+(k) := cap(e₁, 1.495, 1.3·τ(δ/k)) \ B_{1.15},

where e₁ is a standard basis vector. Note that for every k we have 1.3·E−(k) = E+(k). For δ ∈ (0, 1) and k ∈ N+ the distributions are:

Cap(δ) - a randomly rotated copy of either E−(1) or E+(1), chosen uniformly at random;
Caps^{i.i.d.}_k(δ) - a union of k i.i.d. randomly rotated sets, each either E−(k) or E+(k), chosen uniformly at random;
Caps^G_k(δ) - k randomly rotated sets, each either E−(k) or E+(k), chosen uniformly at random; the relative positions of the caps' normal vectors are determined by G, where G is a given distribution on (S^{d−1}_1)^k.

We conjecture that Cap(0.01), Caps^{i.i.d.}_d(0.01) and Caps^G_d(0.01) have QCs that are no larger than the QC of E_QNN that achieves standard risk 0.01. The intuitive reason is that they contain less entropy than E_QNN and so it should be easier to attack these distributions. In Lemma 1 we prove a Θ(d) lower bound for Cap(0.01) and, in the appendix, we give a matching upper bound of Θ(d). Also in the appendix, we give two reductions that lower-bound the QC of Caps^{i.i.d.}_d and Caps^G_d based on a conjecture (see the Cap conjecture in the appendix). We summarize the proved and conjectured lower bounds in Table 1.

Table 1. QC lower bounds for CS.
  Cap(0.01): Θ(d) (Lemma 1)
  Caps^{i.i.d.}_k(0.01): see appendix*
  Caps^G_k(0.01): see appendix*
  *This lower bound is conditional on the Cap conjecture (in the appendix).

Lemma 1 (Lower bound for Cap). There exists λ > 0 such that the following holds. If a q-bounded adversary A succeeds on Cap(0.01) with approximation constant 1 − λ and error probability 2/3 for ϵ = τ(0.01), then q ≥ Ω(d).
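The threshold τ(δ) used in the cap constructions (and as ϵ in Lemma 1) can be calibrated numerically. The sketch below does so by Monte-Carlo sampling of the uniform measure on S^{d−1}_1 and also shows how a single draw of a (toy) Cap(δ) direction could be generated; it is an assumption-laden illustration, not the construction from the appendix. For d = 500, τ(0.01) comes out on the order of 1/√d, matching the Θ(1/√d) perturbations discussed above.

```python
import numpy as np

def tau_of_delta(delta, d, n=20_000, seed=0):
    """Monte-Carlo calibration of tau(delta): the threshold for which the cap
    {x in S^{d-1}_1 : <x, e_1> >= tau} carries a delta-fraction of the sphere."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    return np.quantile(x[:, 0], 1.0 - delta)    # <x, e_1> is the first coordinate

def sample_cap_direction(d, rng):
    """A single draw of the uniformly random cap normal used by Cap(delta)."""
    y = rng.standard_normal(d)
    return y / np.linalg.norm(y)

d, delta = 500, 0.01
print("tau(0.01) for d = 500:", tau_of_delta(delta, d))   # roughly 2.3 / sqrt(d) for large d
```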
Summary: Quadratic neural networks have simple decision boundaries - they are of the form of ellipsoids. But due to the rotational symmetry there is sufficient entropy to guarantee robustness against Θ(d)-bounded adversaries. (This summary is conditional on the Cap conjecture.)

6. How to increase the entropy of an existing scheme - a universal defense

It was proven in Gluch & Urbanke (2019) that there exists a universal defense against adversarial attacks. The defense algorithm gets as input access to a high-accuracy classifier f and outputs a new classifier g that is adversarially robust. The idea of the defense is based on randomized smoothing (Cohen et al., 2019; Salman et al., 2019) and random partitions of metric spaces. A simple rephrasing of Theorem 5 from Gluch & Urbanke (2019) in the language of the QC of adversarial attacks gives the following:

Theorem 3. For every d ∈ N+ there exists a randomized algorithm DEF satisfying the following. It is given as input access to an initial classifier R^d → {−1, 1} and provides oracle access to a new classifier R^d → {−1, 1}. For every separable binary classification task T in R^d with separation ϵ̄ the following conditions hold. Let ALG be a learning algorithm for T that uses m samples. Then for every S ∼ D^m we have R(DEF(ALG(S))) ≤ 2·R(ALG(S)) and for every ϵ > 0:

QC(DEF(ALG), T, m, ϵ) ≥ Θ( ϵ̄ / (√d · ϵ) ).

Summary: There exists a universal defense that can be applied on top of any learning algorithm to make it secure against query-bounded adversaries. Roughly speaking, it works by injecting additional randomness to increase the entropy of the final classifier.

7. Robustness and accuracy - foes no more

It was argued in Tsipras et al. (2019) that there might be an inherent tension between accuracy and adversarial robustness. We argue that this potential tension disappears for a rich class of learning algorithms if we consider q-bounded adversaries. We show that if a learning algorithm satisfies a particular natural property then there is a lower bound for the QC of this algorithm that grows with accuracy.

Theorem 4. For every ϵ ∈ R≥0, C, δ, η ∈ R+ and T a binary classification task on R^d with separable classes the following conditions hold. If ALG is a learning algorithm for T that satisfies the properties

1. for all x ∈ supp(D) + B_ϵ we have P_{S∼D^m}[ALG(S)(x) ≠ h(x)] ≤ C·δ,
2. P_{S∼D^m}[AR(ALG(S), ϵ) ≥ η] ≥ 0.99,
3. P_{S∼D^m}[R(ALG(S)) ≤ δ] ≥ 0.99,

then:

QC(ALG, T, m, ϵ) ≥ log( η / (3·C·δ) ).

The lower bound obtained in Theorem 4 is useful in situations where ALG(S) has high accuracy but the adversarial risk is large. This is a typical situation when using neural networks: one is often able to find classifiers that have high accuracy but are not adversarially robust.

Summary: For a rich class of learning algorithms our security guarantee against query-bounded adversaries increases with accuracy. A risk of 2^{-Ω(k)} leads to robustness against Θ(k)-bounded adversaries.

8. Discussion

For a given task the QC can vary considerably depending on the learning algorithm. In this section we try to explain our current understanding of what governs this dependence.

Consider the two-intervals learning task from Theorem 2. As proven, if we use the 1-NN classifier the QC is lower-bounded by Θ(m), where m is the number of training examples. Now consider what happens if we used the concept class of linear separators with the standard ERM algorithm. Then we expect the learned classifier to be a line that approximately separates the two intervals. To approximately recover this line the adversary can find two points through which the line passes by running two binary search procedures. This implies that the QC is independent of m. Thus QCs can be as different as Ω(m) and O(1) depending on the used concept class/learning algorithm.
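The two binary searches mentioned above can be made concrete: each one recovers a boundary point on a vertical segment with roughly log₂(z/precision) label queries, independently of m. A minimal sketch (the separating line y = c below is a hypothetical stand-in for the classifier learned by ERM):

```python
import numpy as np

def boundary_point(query, a, b, tol=1e-3):
    """Binary search on the segment [a, b] for a point on the decision boundary,
    assuming query(a) != query(b).  Uses about log2(|b - a| / tol) queries."""
    label_a = query(a)
    while np.linalg.norm(b - a) > tol:
        mid = (a + b) / 2.0
        if query(mid) == label_a:
            a = mid
        else:
            b = mid
    return (a + b) / 2.0

# Hypothetical black-box: a linear separator y = c between the two intervals.
c = 0.47                                   # unknown to the attacker
query = lambda x: x[1] > c                 # one label query per call
p1 = boundary_point(query, np.array([0.0, 0.0]), np.array([0.0, 1.0]))
p2 = boundary_point(query, np.array([50.0, 0.0]), np.array([50.0, 1.0]))
print(p1, p2)                              # two points that pin down the line, independent of m
```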
Note that if we used the SVM classifier then with an allowed perturbation of z/10 (as in Theorem 2) the adversarial risk would be 0 (for m big enough), as the learned classifier would be defined by the line passing through (0, z/2) and (m, z/2), which makes the QC degenerate and equal to 0.

Consider now the CS dataset and see what happens when we vary the number of samples m and the learning algorithm. As we mentioned in Section 4, for CS, K-NN might need as many as m = 2^{Θ(d log d)} samples. This means that the lower bound of Θ(m) (from Theorem 2) becomes 2^{Θ(d log d)}. On the other hand, QNN applied to this task is conjectured to have a QC of Θ(d²). One can also look at an interesting learning algorithm from Montasser et al. (2019), which for a concept class H can improperly learn H to a constant adversarial risk using O(VC-dim(H) · dual-VC-dim(H)) many samples. As the VC-dimension, and also the dual VC-dimension, of ellipsoids is in O(d²), we get that this algorithm achieves constant adversarial risk using m = O(d⁴) samples. The QC in this case is, in a way, irrelevant because the adversarial risk is already small. Thus we see three algorithms with different QCs, but it is not clear if we can directly compare them as they require different sample sizes.

These examples suggest that there is a strong connection between the number of degrees of freedom of the classifier and the QC. However, this connection cannot be expressed in terms of VC-dim or Adversarial VC-dim (Cullina et al., 2018), as there exist concept classes and learning tasks for which QC ≫ VC-dim/Adversarial VC-dim. Understanding the connection between the QC and the number of parameters is part of ongoing work.

9. Conclusions and takeaways

We investigate the robustness of learning algorithms against query-bounded adversaries. We start by introducing a definition of the QC of adversarial attacks and then proceed to study its properties. We prove a general lower bound for the QC in terms of an entropy-like property. Using this result we then show a series of lower bounds for classical learning algorithms. Specifically, we give a lower bound of Θ(d) for QNNs and a lower bound of Θ(m) for the 1-NN algorithm. Moreover, we observe that the sources of the entropy can be varied. For K-NN the entropy is high due to the locality of the learning algorithm, whereas for QNN it comes from the rotational symmetry of the data. The entropy can also be increased by introducing randomness into the learning algorithm itself. We also show that improvements in the accuracy of a model lead to improved security against query-bounded adversaries.

Our analysis identifies properties of learning algorithms that make them (non-)robust. These results give a rule of thumb for assessing the QC of a given algorithm: the higher the entropy of the decision boundary, the better. We believe that a systematic investigation of learning algorithms from the point of view of QC will lead to more adversarially-robust systems. Specifically, it should be possible to design generic defenses that can be applied on top of any learning algorithm. One example of such a defense was given in Section 6. Significantly more work is needed in order to fulfill the potential of this approach. But imagine that this type of defense could be applied efficiently with only black-box access to the underlying classifier. And imagine further that it could guarantee a QC of, say, q = 2^{Θ(d)}. This would arguably solve the adversarial robustness problem.

References

Ashtiani, H., Pathak, V., and Urner, R. Black-box certification and learning under adversarial perturbations. ArXiv, abs/2006.16520, 2020.

Athalye, A., Carlini, N., and Wagner, D. A. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pp. 274-283, 2018. URL http://proceedings.mlr.press/v80/athalye18a.html.

Bhagoji, A., Cullina, D., and Mittal, P. Dimensionality reduction as a defense against evasion attacks on machine learning classifiers. 04 2017.

Bhambri, S., Muku, S., Tulasi, A. S., and Buduru, A. B. A study of black box adversarial attacks in computer vision. ArXiv, abs/1912.01667, 2019.
Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndić, N., Laskov, P., Giacinto, G., and Roli, F. Evasion attacks against machine learning at test time. In Blockeel, H., Kersting, K., Nijssen, S., and Železný, F. (eds.), Machine Learning and Knowledge Discovery in Databases, pp. 387-402, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg. ISBN 978-3-642-40994-3.

Bubeck, S., Lee, Y. T., Price, E., and Razenshteyn, I. P. Adversarial examples from cryptographic pseudo-random generators. CoRR, abs/1811.06418, 2018. URL http://arxiv.org/abs/1811.06418.

Bubeck, S., Lee, Y. T., Price, E., and Razenshteyn, I. Adversarial examples from computational constraints. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 831-840, Long Beach, California, USA, 09-15 Jun 2019. PMLR. URL http://proceedings.mlr.press/v97/bubeck19a.html.

Carlini, N. and Wagner, D. A. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec@CCS 2017, Dallas, TX, USA, November 3, 2017, pp. 3-14, 2017. doi: 10.1145/3128572.3140444. URL https://doi.org/10.1145/3128572.3140444.

Chakraborty, A., Alam, M., Dey, V., Chattopadhyay, A., and Mukhopadhyay, D. Adversarial attacks and defences: A survey. ArXiv, abs/1810.00069, 2018.

Chen, P.-Y., Zhang, H., Sharma, Y., Yi, J., and Hsieh, C.-J. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec '17, pp. 15-26, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450352024. doi: 10.1145/3128572.3140448. URL https://doi.org/10.1145/3128572.3140448.

Cohen, J., Rosenfeld, E., and Kolter, Z. Certified adversarial robustness via randomized smoothing. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 1310-1320, Long Beach, California, USA, 09-15 Jun 2019. PMLR. URL http://proceedings.mlr.press/v97/cohen19c.html.

Collobert, R. and Weston, J. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, ICML '08, pp. 160-167, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-205-4. doi: 10.1145/1390156.1390177. URL http://doi.acm.org/10.1145/1390156.1390177.

Cover, T. and Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory, 13:21-27, 1967.

Cullina, D., Bhagoji, A. N., and Mittal, P. PAC-learning in the presence of evasion adversaries. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS '18, pp. 228-239, Red Hook, NY, USA, 2018. Curran Associates Inc.

Diochnos, D. I., Mahloujifar, S., and Mahmoody, M. Adversarial risk and robustness: General definitions and implications for the uniform distribution. ArXiv, abs/1810.12272, 2018.

Gilmer, J., Metz, L., Faghri, F., Schoenholz, S. S., Raghu, M., Wattenberg, M., and Goodfellow, I. J. Adversarial spheres. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Workshop Track Proceedings, 2018. URL https://openreview.net/forum?id=SkthlLkPf.
Gluch, G. and Urbanke, R. L. Constructing a provably adversarially-robust classifier from a high accuracy one. CoRR, abs/1912.07561, 2019. URL http://arxiv.org/abs/1912.07561.

Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. 2014. URL http://arxiv.org/abs/1412.6572.

Gourdeau, P., Kanade, V., Kwiatkowska, M., and Worrell, J. On the hardness of robust classification. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper/2019/file/8133415ea4647b6345849fb38311cf32-Paper.pdf.

Hayes, J. and Danezis, G. Machine learning as an adversarial service: Learning black-box adversarial examples. 08 2017.

Ilyas, A., Engstrom, L., Athalye, A., and Lin, J. Black-box adversarial attacks with limited queries and information. In Dy, J. and Krause, A. (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 2137-2146, Stockholmsmässan, Stockholm, Sweden, 10-15 Jul 2018. PMLR. URL http://proceedings.mlr.press/v80/ilyas18a.html.

Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Pereira, F., Burges, C. J. C., Bottou, L., and Weinberger, K. Q. (eds.), Advances in Neural Information Processing Systems 25, pp. 1097-1105. Curran Associates, Inc., 2012.

Lécuyer, M., Atlidakis, V., Geambasu, R., Hsu, D., and Jana, S. Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, May 19-23, 2019, pp. 656-672, 2019. doi: 10.1109/SP.2019.00044. URL https://doi.org/10.1109/SP.2019.00044.

Liu, Y., Chen, X., Liu, C., and Song, D. Delving into transferable adversarial examples and black-box attacks. CoRR, abs/1611.02770, 2016. URL http://arxiv.org/abs/1611.02770.

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rJzIBfZAb.

Montasser, O., Hanneke, S., and Srebro, N. VC classes are adversarially robustly learnable, but only improperly. Volume 99 of Proceedings of Machine Learning Research, pp. 2512-2530, Phoenix, USA, 25-28 Jun 2019. PMLR. URL http://proceedings.mlr.press/v99/montasser19a.html.

Nguyen, A. M., Yosinski, J., and Clune, J. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In CVPR, pp. 427-436. IEEE Computer Society, 2015. ISBN 978-1-4673-6964-0. URL http://dblp.uni-trier.de/db/conf/cvpr/cvpr2015.html#NguyenYC15.

Papernot, N., McDaniel, P., and Goodfellow, I. J. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. ArXiv, abs/1605.07277, 2016.

Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., and Swami, A. Practical black-box attacks against machine learning. pp. 506-519, 04 2017. doi: 10.1145/3052973.3053009.

Salman, H., Yang, G., Li, J., Zhang, P., Zhang, H., Razenshteyn, I. P., and Bubeck, S. Provably robust deep learning via adversarially trained smoothed classifiers. ArXiv, abs/1906.04584, 2019.
Schmidt, L., Santurkar, S., Tsipras, D., Talwar, K., and Madry, A. Adversarially robust generalization requires more data. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS '18, pp. 5019-5031, Red Hook, NY, USA, 2018. Curran Associates Inc.

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014. URL http://arxiv.org/abs/1312.6199.

Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D., and McDaniel, P. Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rkZvSe-RZ.

Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., and Madry, A. Robustness may be at odds with accuracy. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=SyxAb30cY7.

Uesato, J., O'Donoghue, B., Kohli, P., and van den Oord, A. Adversarial risk and the dangers of evaluating against weak attacks. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pp. 5032-5041, 2018. URL http://proceedings.mlr.press/v80/uesato18a.html.

Wang, Y., Jha, S., and Chaudhuri, K. Analyzing the robustness of nearest neighbors to adversarial examples. Volume 80 of Proceedings of Machine Learning Research, pp. 5133-5142, Stockholmsmässan, Stockholm, Sweden, 10-15 Jul 2018. PMLR. URL http://proceedings.mlr.press/v80/wang18c.html.

Xiao, C., Li, B., Zhu, J.-Y., He, W., Liu, M., and Song, D. Generating adversarial examples with adversarial networks, 2018. URL https://openreview.net/forum?id=HknbyQbC-.

Xiao, K. Y., Tjeng, V., Shafiullah, N. M. M., and Madry, A. Training for faster adversarial robustness verification via inducing ReLU stability. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=BJfIVjAcKm.

Xu, W., Evans, D., and Qi, Y. Feature squeezing: Detecting adversarial examples in deep neural networks. 04 2017.

Yin, D., Kannan, R., and Bartlett, P. Rademacher complexity for adversarially robust generalization. Volume 97 of Proceedings of Machine Learning Research, pp. 7085-7094, Long Beach, California, USA, 09-15 Jun 2019. PMLR. URL http://proceedings.mlr.press/v97/yin19b.html.

Zheng, T., Chen, C., and Ren, K. Distributionally adversarial attack. Proceedings of the AAAI Conference on Artificial Intelligence, 33:2253-2260, 07 2019. doi: 10.1609/aaai.v33i01.33012253.