# Black-box Certification and Learning under Adversarial Perturbations

Hassan Ashtiani *1, Vinayak Pathak *2, Ruth Urner *3

*Equal contribution. 1 Department of Computing and Software, McMaster University, Hamilton, ON, Canada. 2 Scotiabank, Toronto, ON, Canada. 3 Lassonde School of Engineering, EECS Department, York University, Toronto, ON, Canada. Correspondence to: Hassan Ashtiani, Vinayak Pathak, Ruth Urner.

Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020.

Abstract

We formally study the problem of classification under adversarial perturbations, both from a learner's perspective and from that of a third party who aims to certify the robustness of a given black-box classifier. We analyze a PAC-type framework of semi-supervised learning and identify possibility and impossibility results for proper learning of VC classes in this setting. We further introduce a new setting of black-box certification under a limited query budget, and analyze it for various classes of predictors and perturbation types. We also consider the viewpoint of a black-box adversary that aims to find adversarial examples, showing that the existence of an adversary with polynomial query complexity can imply the existence of a sample-efficient robust learner.

1. Introduction

We formally study the problem of classification under adversarial perturbations. An adversarial perturbation is an imperceptible alteration of a classifier's input that changes its prediction. The existence of adversarial perturbations for real-world input instances and typical classifiers (Szegedy et al., 2014) has contributed to a lack of trust in predictive tools derived from automated learning. Recent years have thus seen a surge of studies proposing various heuristics to enhance robustness to adversarial attacks (Chakraborty et al., 2018). Existing solutions often either (i) modify the learning procedure to increase adversarial robustness, e.g., by modifying the training data or the loss function used for training (Sinha et al., 2018; Cohen et al., 2019; Salman et al., 2019), or (ii) post-process an existing classifier to enhance its robustness (Cohen et al., 2019).

A user of a predictive tool, however, may often neither be involved in the training of the classifier nor have the technical access or capabilities to modify its input/output behavior. Instead, the predictor may have been provided by a third party, and the user may have merely black-box access to it. That is, the predictor h presents itself as an oracle that takes an input query x and responds with the label h(x). The provider of the predictive tool, while not necessarily assumed to have malicious intent, is still naturally considered untrusted, and the user thus has an interest in verifying the predictor's performance (including adversarial robustness) on its own application domain. While the standard notion of classification accuracy can easily be estimated from an i.i.d. sample generated from the user's data-generating distribution, estimating the expected robust loss is not as easy: given a labeled instance (x, y), the user can immediately verify whether the instance is misclassified (h(x) ≠ y) using a single query to h, but determining whether x is vulnerable to adversarial perturbations may require many more queries to the oracle.
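To make this contrast concrete, the following is a minimal sketch (ours, not from the paper) of the two estimation tasks. The hidden linear predictor, the sample sizes, and the grid probe are all illustrative assumptions: estimating the binary loss needs one query per sampled point, while even a crude heuristic probe for adversarial perturbations multiplies the query count per point and still only lower-bounds the robust loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden predictor, visible only through label queries (illustrative stand-in).
w_true, b_true = np.array([1.0, -2.0]), 0.5
def oracle(x):
    """Black-box label query O_h(x)."""
    return int(np.dot(w_true, x) + b_true >= 0)

# An i.i.d. labeled sample from the user's distribution P.
m = 2000
X = rng.normal(size=(m, 2))
y = np.array([oracle(x) for x in X])        # labels here happen to agree with h

# Binary loss: one query per sampled point suffices.
binary_loss = np.mean([oracle(x) != yi for x, yi in zip(X, y)])

# Robust loss w.r.t. an l_inf ball of radius r: even a crude grid probe of the
# ball issues k*k extra queries per point, and only lower-bounds the robust loss.
r, k = 0.1, 5
offsets = np.array(np.meshgrid(np.linspace(-r, r, k),
                               np.linspace(-r, r, k))).reshape(2, -1).T
def probed_vulnerable(x, yi):
    return any(oracle(x + d) != yi for d in offsets)

robust_loss_lb = np.mean([probed_vulnerable(x, yi) for x, yi in zip(X[:200], y[:200])])
print(f"binary loss ~ {binary_loss:.3f}, probed robust loss (lower bound) ~ {robust_loss_lb:.3f}")
```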
We introduce and analyze a formal model of black-box certification under query access, and provide examples of hypothesis classes and perturbation types¹ for which such a certifier exists. We further introduce the notion of witness sets for certification, and identify more general classes of problems and perturbation types that admit black-box certification with finitely many queries. In contrast, we demonstrate simple classes for which the query complexity of certification is unbounded. We further look at the problem from the viewpoint of the adversary, connecting the query complexity of an adversary (for finding adversarial examples) to that of the certifier. An intriguing question that we explore is whether the sample complexity of robustly learning a hypothesis class is related to the query complexity of an optimal adversary (or certifier) for that class. We uncover such a connection, showing that the existence of a successful adversary with polynomial query complexity for a properly compressible class implies sample-efficient robust learnability of that class. For this, we adapt a compression-based argument, demonstrating a sample complexity upper bound for robust learning that is smaller than what was previously known (Montasser et al., 2019), assuming that an adversary with linear query complexity exists and the class is properly compressible.

¹A perturbation type captures the set of admissible perturbations the adversary is allowed to make at each point (see Section 2).

We start our investigations with the problem of robustly (PAC-)learning classes of finite VC-dimension. It has been shown recently that, while the VC-dimension characterizes the proper learnability of a hypothesis class under the binary (misclassification) loss, there are classes of small VC-dimension that are not properly learnable under the robust loss (Montasser et al., 2019). We define the notion of the margin class (associated with a hypothesis class and a perturbation type) and show that, if both the class and the margin class are simple (as measured by their VC-dimension), then proper learning under the robust loss is not significantly more difficult than learning with respect to the binary loss. The VC-dimension of the margin class, however, can be large for specific choices of perturbation types and hypothesis classes. We thus investigate and provide scenarios where a form of semi-supervised learning can overcome the impossibility of proper robust learning. We believe our investigations of robust learnability in these scenarios may help shed some light on where the difficulty of general robust classification stems from.

1.1. Related work

Recent years have produced a surge of work on adversarial attack (and defense) mechanisms (Madry et al., 2018; Chakraborty et al., 2018; Chen et al., 2017; Dong et al., 2018; Narodytska & Kasiviswanathan, 2017; Papernot et al., 2017; Akhtar & Mian, 2018; Su et al., 2019), as well as the development of reference implementations of these (Goodfellow et al., 2018). Here, we briefly review earlier work on the theoretical understanding of the problem. Several recent studies have suggested and analyzed approaches of training under data augmentation (Sinha et al., 2018; Salman et al., 2019). The general idea is to add adversarial perturbations to data points already at training time to promote smoothness around the support of the data-generating distribution.
These studies then provide statistical guarantees for the robustness of the learned classifier. Similarly, statistical guarantees have been presented for robust training that modifies the loss function rather than the training data (Wong & Kolter, 2018). However, the notion of robustness certification used in these works differs from what we propose: while they focus on designing learning methods that are certifiably robust, we aim at certifying an arbitrary classifier, possibly on a new distribution.

The robust learnability of finite VC-classes has been studied only recently, often with pessimistic conclusions. An early result demonstrated that there exist distributions for which robust learning requires provably more data than its non-robust counterpart (Schmidt et al., 2018). Recent works have studied adversarially robust classification in the PAC-learning framework of computational learning theory (Cullina et al., 2018; Awasthi et al., 2019; Montasser et al., 2020) and presented hardness results for binary distribution and hypothesis classes in this framework (Diochnos et al., 2018; Gourdeau et al., 2019; Diochnos et al., 2019). On the other hand, robust learning has been shown to be possible, for example when the hypothesis class is finite and the adversary has a finite number of options for corrupting the input (Feige et al., 2015). This result has also been extended to the more general case of classes with finite VC-dimension (Attias et al., 2019). It has also been shown that robust learning is possible (by Robust Empirical Risk Minimization (RERM)) under a feasibility assumption on the distribution and bounded covering numbers of the hypothesis class (Bubeck et al., 2019). However, more recent work has presented classes of VC-dimension 1 whose robust loss class has arbitrarily large VC-dimension (Cullina et al., 2018) and, moreover, for which proper learning (such as RERM) is impossible in a distribution-free finite-sample regime (Montasser et al., 2019). Remarkably, the latter work also presents an improper learning scheme for any VC-class and any adversary type. The sample complexity of this approach, however, depends on the dual VC-dimension, which can be exponential in the VC-dimension of the class.

We note that two additional aspects of our work have appeared in the literature before: considering robust learnability by imposing computational constraints on an adversary has been explored recently (Bubeck et al., 2019; Gourdeau et al., 2019; Garg et al., 2019), and earlier work has hypothesized that unlabeled data may facilitate adversarially robust learning, demonstrating a scenario where access to unlabeled data yields a better bound on the sample complexity under a specific generative model for the data (Carmon et al., 2019; Alayrac et al., 2019).

Less closely related to our work, the theory of adversarially robust learnability has also been studied for non-parametric learners. A first study in that framework showed that a nearest neighbor classifier's robust loss converges to that of the Bayes optimal (Wang et al., 2018). A follow-up work then derived a characterization of the best classifier with respect to the robust loss (analogous to the notion of the Bayes optimal), and suggested a training-data pruning approach for non-parametric robust classification (Yang et al., 2019).

1.2. Outline and summary of contributions

Problem setup and the adversarial loss formulation. In Section 2, we provide the formal setup for the problem of adversarial learning.
We also decompose the adversarial loss, and define the notion of the margin class associated with a hypothesis class and a perturbation type (Def. 4).

Using unlabeled data for adversarial learning of VC-classes. In Section 3, we study the sample complexity of proper robust learning. While this sample complexity can be infinite for general VC-classes (Cullina et al., 2018; Montasser et al., 2019; Yin et al., 2019), we show that VC-classes are properly robustly learnable if the margin class also has finite VC-dimension (Thm. 7). We formalize an idealized notion of semi-supervised learning where the learner has additional oracle access to the probability weights of the margin sets. We show that, perhaps counter-intuitively, oracle access to both (exact) margin weights and (exact) binary losses does not suffice for identifying the minimizer of the adversarial loss in a class H (Thm. 9), even in the 0/1-realizable case (Thm. 12). However, under the additional assumption of robust realizability, proper learning becomes feasible with access to the marginal distribution or to sufficient unlabeled data (Thms. 10 and 11).

Black-box certification with query access. We formally define the problem of black-box certification through query access (Def. 15), and demonstrate examples where certification is possible (Obs. 16) or impossible (Obs. 17). Motivated by this impossibility result, we also introduce a tolerant notion of certification (Def. 19). We show that while more classes are certifiable under this definition (Obs. 20), some simple classes remain impossible to certify (Obs. 21). We identify a sufficient condition for certifiability of a hypothesis class with respect to a perturbation type through the notion of witness sets (Def. 22 and Thm. 23). We then consider the query complexity of the adversary (as opposed to that of the certifier) for finding adversarial instances (Defs. 24, 25, and 26) and, for the case of a non-adaptive adversary, relate it to the existence of a witness set (Obs. 27).

Connecting adversarial query complexity and PAC-learnability. The culminating result connects the two themes of our work: robust (PAC-)learnability and the query complexity of an adversary. With Theorem 28, we show that the existence of a perfect adversary with small query complexity implies sample-efficient robust learning for properly compressible classes.

We include proof sketches in the paper, and refer the reader to the supplementary material for detailed proofs.

2. Setup and Definitions

We let X denote the domain (often X ⊆ R^d) and Y (mostly Y = {0, 1}) a (binary) label space. We assume that data is generated by some distribution P over X × Y and let P_X denote the marginal of P over X. A hypothesis is a function h : X → Y, and can naturally be identified with a subset of X × Y, namely h = {(x, y) ∈ X × Y | y = h(x)}. Since we are working with binary labels, we also sometimes identify a hypothesis h with the pre-image of 1 under h, that is, the domain subset {x ∈ X | h(x) = 1}. We let F denote the set of all Borel functions² from X to Y (or all functions in the case of a countable domain). A hypothesis class is a subset of F, often denoted by H ⊆ F. The quality of prediction of a hypothesis on a labeled example (x, y) is measured by a loss function ℓ : (F × X × Y) → R. For classification problems, the quality of prediction is typically measured with the binary loss ℓ^{0/1}(h, x, y) = 1[h(x) ≠ y], where 1[·] denotes the indicator function of the predicate in brackets.

²For an uncountable domain, we only consider Borel-measurable hypotheses to avoid dealing with measurability issues.
For (adversarially) robust classification, we let U : X → 2^X, the perturbation type, be a function that maps each instance x to the set of admissible perturbations at x. We assume that the perturbation type satisfies x ∈ U(x) for all x ∈ X. If X is equipped with a metric dist, then a natural choice for the set of perturbations at x is a ball B_r(x) = {z ∈ X | dist(x, z) ≤ r} of radius r around x. For x ∈ X and h ∈ H, we say that x′ ∈ U(x) is an adversarial point of x with respect to h if h(x) ≠ h(x′). We use the following definition of the adversarially robust loss with respect to perturbation type U:

ℓ^U(h, x, y) = 1[∃z ∈ U(x) : h(z) ≠ y].

If U(x) is always a ball of radius r around x, we will also use the notation ℓ^r(h, x, y) = ℓ^{B_r}(h, x, y). We assume that the perturbation type is such that ℓ^U(f, ·, ·) is a measurable function for all f ∈ F. A sufficient condition for this is that the sets U(x) are open (where X is assumed to be equipped with some topology) and the perturbation type further satisfies z ∈ U(x) if and only if x ∈ U(z) for all x, z ∈ X (see Appendix B for a proof and an example of a simple perturbation type that renders the corresponding loss function of a threshold predictor non-measurable).

We denote the expected loss (or true loss) of a hypothesis h with respect to the distribution P and loss function ℓ by L^ℓ_P(h) = E_{(x,y)∼P}[ℓ(h, x, y)]. In particular, we denote the true binary loss by L^{0/1}_P(h) and the true robust loss by L^U_P(h). Further, we denote the approximation error of class H with respect to distribution P and loss function ℓ by L^ℓ_P(H) = inf_{h∈H} L^ℓ_P(h). The empirical loss of a hypothesis h with respect to loss function ℓ and a sample S = ((x_1, y_1), ..., (x_n, y_n)) is defined as L^ℓ_S(h) = (1/n) ∑_{i=1}^n ℓ(h, x_i, y_i).
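As a concrete illustration of these definitions, the following sketch evaluates ℓ^{0/1}, ℓ^U, and the corresponding empirical losses for one-dimensional threshold classifiers with interval perturbations U(x) = [x − r, x + r]. The class, the radius, and the data are our illustrative choices, not constructions from the paper. For a threshold h_t(x) = 1[x ≥ t], some z ∈ U(x) is labeled 1 exactly when x + r ≥ t, and some z is labeled 0 exactly when x − r < t, so the robust loss has a closed form here.

```python
import numpy as np

def binary_loss(t, x, y):
    """l^{0/1}(h_t, x, y) for the threshold h_t(x) = 1[x >= t]."""
    return int((x >= t) != y)

def robust_loss(t, x, y, r):
    """l^U(h_t, x, y) with U(x) = [x - r, x + r], evaluated in closed form."""
    can_be_1 = x + r >= t        # some admissible perturbation is labeled 1
    can_be_0 = x - r < t         # some admissible perturbation is labeled 0
    return int((y == 0 and can_be_1) or (y == 1 and can_be_0))

rng = np.random.default_rng(1)
xs = rng.uniform(-1, 1, size=1000)
ys = (xs >= 0.0).astype(int)     # labels generated by the threshold t* = 0

t, r = 0.05, 0.1
L01 = np.mean([binary_loss(t, x, y) for x, y in zip(xs, ys)])
LU = np.mean([robust_loss(t, x, y, r) for x, y in zip(xs, ys)])
print(L01, LU)                   # the empirical robust loss dominates the binary loss
```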
A learner A is a function that takes in a finite sequence of labeled instances S = ((x_1, y_1), ..., (x_n, y_n)) and outputs a hypothesis h = A(S). The following is a standard notion of (PAC-)learnability of a hypothesis class from finite samples (Vapnik & Chervonenkis, 1971; Valiant, 1984; Blumer et al., 1989; Shalev-Shwartz & Ben-David, 2014).

Definition 1 ((Agnostic) Learnability). A hypothesis class H is agnostically learnable with respect to a set of distributions P and loss function ℓ if there exists a learner A such that for all ε, δ ∈ (0, 1) there is a sample size m(ε, δ) such that, for any distribution P ∈ P, if the input to A is an i.i.d. sample S from P of size m ≥ m(ε, δ), then, with probability at least 1 − δ over the samples, the learner outputs a hypothesis h = A(S) with L^ℓ_P(h) ≤ L^ℓ_P(H) + ε. H is said to be learnable in the realizable case with respect to loss function ℓ if the above holds under the condition that L^ℓ_P(H) = 0. We say that H is distribution-free learnable (or simply learnable) if it is learnable when P is the set of all probability measures over X × Y.

Definition 2 (VC-dimension). We say that a collection of subsets G ⊆ 2^X of some domain X shatters a subset B ⊆ X if for every F ⊆ B there exists G ∈ G such that G ∩ B = F. The VC-dimension of G, denoted by VC(G), is defined to be the supremum of the sizes of the sets that are shattered by G. It is easy to see that the VC-dimension of a binary hypothesis class H is independent of whether we view H as a subset of X × Y or as pre-images of 1 (thus, subsets of X).

It is well known that, for the binary loss, a hypothesis class is (distribution-free) learnable if and only if it has finite VC-dimension (Blumer et al., 1989). Furthermore, any learnable binary hypothesis class can be learned with a proper learner.

Definition 3 (Proper Learnability). We call a learner A a proper learner for the class H if, for all input samples S, we have A(S) ∈ H. A class H is properly learnable if the conditions in Definition 1 hold with a proper learner A.

It has recently been shown that there are classes of finite VC-dimension that are not properly learnable with respect to the adversarially robust loss (Montasser et al., 2019).

2.1. Decomposing the robust loss

In this work, we adopt the most commonly used notion of an adversarially robust loss (Montasser et al., 2019; Yang et al., 2019). Note that ℓ^U(h, x, y) = 1 if and only if at least one of the following conditions holds: h makes a mistake on x with respect to label y, or there is a close-by instance z ∈ U(x) that h labels differently than x, that is, x is close to h's decision boundary. The first condition holds when (x, y) falls into the error region err_h = (X × Y) \ h. The notion of error region then naturally captures the (non-adversarial) loss: L^{0/1}_P(h) = P_{(x,y)∼P}[(x, y) ∈ err_h] = P(err_h). The second condition holds when x lies in the margin area of h. The following definition makes this notion explicit. Let h ∈ F be some hypothesis. We define the margin area of h with respect to perturbation type U as the subset mar^U_h ⊆ X × Y given by

mar^U_h = {(x, y) ∈ X × Y | ∃z ∈ U(x) : h(x) ≠ h(z)}.

Based on these definitions, the adversarially robust loss with respect to U is 1 if and only if the sample (x, y) falls into the error region err_h and/or the margin area mar^U_h, that is,

L^U_P(h) = P(err_h ∪ mar^U_h).

Definition 4. For a class H, we refer to the collection H^U_mar = {mar^U_h | h ∈ H} as the margin class of H.

While we defined the margin areas mar^U_h as subsets of X × Y, it is sometimes natural to identify them with their projection on X, thus simply as subsets of X.

Remark 5. There is more than one way to formulate a loss function that captures both classification accuracy and robustness to (small) adversarial perturbations. The notion we adopt has the property that even the true labeling function can have positive robust loss if the true labels themselves change within the adversarial neighbourhoods. A natural alternative is to say that an adversarial point is a point in the neighbourhood of an instance that is misclassified by the classifier. However, such a notion cannot be phrased as a loss function ℓ(h, x, y), as it depends on the true label of the perturbed instance. Previous studies have provided excellent discussions of the various options (Diochnos et al., 2018; Gourdeau et al., 2019).
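The decomposition above can be checked directly in the same toy setting (one-dimensional thresholds with U(x) = [x − r, x + r], an illustrative choice of ours): per point, ℓ^U equals 1 exactly when (x, y) lies in err_h or x lies in mar^U_h. Note that the margin indicator below depends only on x, which is what makes margin weights estimable from unlabeled data.

```python
import numpy as np

def in_err(t, x, y):                 # (x, y) in err_h: h_t misclassifies x
    return (x >= t) != y

def in_margin(t, x, r):              # x in mar^U_h: U(x) = [x - r, x + r] straddles t
    return (x - r < t) and (x + r >= t)

def robust_loss(t, x, y, r):         # l^U evaluated from its definition, as before
    return (y == 0 and x + r >= t) or (y == 1 and x - r < t)

rng = np.random.default_rng(2)
xs = rng.uniform(-1, 1, size=2000)
ys = (xs >= 0.0).astype(int)
t, r = 0.05, 0.1

direct = [robust_loss(t, x, y, r) for x, y in zip(xs, ys)]
decomposed = [in_err(t, x, y) or in_margin(t, x, r) for x, y in zip(xs, ys)]
assert direct == decomposed          # matches L^U_P(h) = P(err_h ∪ mar^U_h) pointwise
print(np.mean(direct),
      np.mean([in_err(t, x, y) for x, y in zip(xs, ys)]),
      np.mean([in_margin(t, x, r) for x in xs]))
```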
Semi-Supervised Learning (SSL). Since the margin areas mar^U_h can naturally be viewed as subsets of X, their weights P(mar^U_h) under the data-generating distribution can potentially be estimated with samples from P_X, that is, from unlabeled data. A learner that takes in both a labeled sample S from P and an unlabeled sample T from P_X is called a semi-supervised learner. For scenarios where robust learning has been shown to be hard, we explore whether this hardness can be overcome by SSL. We consider semi-supervised learners that take in labeled and unlabeled samples, and also idealized semi-supervised learners that, in addition to a labeled sample, have oracle access to the probability weights of certain subsets of X (Göpfert et al., 2019).

3. Robust Learning of VC Classes

It has been shown that there is a class H of bounded VC-dimension (VC(H) = 1, in fact) and a perturbation type U such that H is not robustly properly learnable (Montasser et al., 2019), even if the distribution is realizable with respect to H under the U-robust loss. The perturbation type U in that lower-bound construction can actually be chosen to be balls with respect to some metric over X = R^d (for any d, even d = 1). The same work also shows that if a class has bounded VC-dimension, then it is (improperly) robustly learnable with respect to any perturbation type U.

Theorem 6 ((Montasser et al., 2019)). (1) There is a class H over X = R^d with VC(H) = 1, and a set of distributions P with L^r_P(H) = 0 for all P ∈ P, such that H is not properly learnable over P with respect to the loss function ℓ^r. (2) Let X be any domain and U : X → 2^X be any type of perturbation, and let H ⊆ {0, 1}^X be a hypothesis class with finite VC-dimension. Then H is distribution-free agnostically learnable with respect to the loss function ℓ^U.

While the second part of the above theorem seems to settle adversarially robust learnability for binary hypothesis classes, the positive result is achieved with a compression-based learner, which has potentially much higher sample complexity than what suffices for the binary loss. In fact, the size of the best known general compression scheme (Moran & Yehudayoff, 2016) depends on the VC-dimension of the dual class of H, making the sample complexity of this approach generally exponential in the VC-dimension of H.

We first show that the impossibility part of the above theorem crucially depends on the combination of the class H and a perturbation type U (despite the latter being balls in a Euclidean space) being such that the margin class H^U_mar has infinite VC-dimension. We prove that, if both H and H^U_mar have finite VC-dimension, then H is (distribution-free) learnable with respect to the robust loss with a proper learner.

Theorem 7 (Proper learnability for finite VC and finite margin-VC). Let X be any domain and H ⊆ F be a hypothesis class with finite VC-dimension. Further, let U : X → 2^X be any perturbation type such that H^U_mar has finite VC-dimension. We set D = VC(H) + VC(H^U_mar). Then H is distribution-free (agnostically) properly learnable with respect to the robust loss ℓ^U, and the sample complexity is O((D log(D) + log(1/δ)) / ε²).

Proof Sketch. We provide the more detailed argument in the appendix. Recall that a set S ⊆ X × Y is said to be an ε-approximation of P with respect to a collection H of subsets of X × Y if for all h ∈ H we have |P[h] − |h ∩ S| / |S|| ≤ ε, that is, if the empirical estimates with respect to S of the sets in H are ε-close to their true probability weights. Consider the class of subsets G = {(err_h ∪ mar^U_h) ⊆ X × Y | h ∈ H} of point-wise unions of error and margin regions. A simple counting argument shows that VC(G) = O(D log(D)), where D = VC(H) + VC(H^U_mar). Thus, by basic VC-theory, a sample of size O((D log D + log(1/δ)) / ε²) will be an ε-approximation of G with respect to P with probability at least 1 − δ. Thus any empirical risk minimizer with respect to ℓ^U is a successful proper and agnostic robust learner for H.
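A concrete (toy) instance of the proper learner in the proof sketch is empirical risk minimization with respect to ℓ^U over a grid of one-dimensional thresholds. The grid search below is our illustrative stand-in for exact robust ERM over H; it is not a construction from the paper.

```python
import numpy as np

def emp_robust_loss(t, S, r):
    """Empirical robust loss of h_t(x) = 1[x >= t] under U(x) = [x - r, x + r]."""
    total = 0
    for x, y in S:
        total += (y == 0 and x + r >= t) or (y == 1 and x - r < t)
    return total / len(S)

rng = np.random.default_rng(3)
xs = rng.uniform(-1, 1, size=500)
ys = (xs >= 0.3).astype(int)
S = list(zip(xs, ys))
r = 0.05

# Robust ERM over a finite set of candidate thresholds (proper: output lies in H).
candidates = np.linspace(-1, 1, 401)
t_hat = min(candidates, key=lambda t: emp_robust_loss(t, S, r))
print(t_hat, emp_robust_loss(t_hat, S, r))
```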
Observation 8. We believe the conditions of Theorem 7 hold for most natural classes and perturbation types U. For example, if H is the class of linear predictors in R^d and U consists of balls with respect to some p-norm, then both H and H^U_mar have finite VC-dimension (see also (Yin et al., 2019)).

3.1. Using unlabeled data for robust proper learning

In light of the above two general results, we turn to investigate whether unlabeled data can help in overcoming the discrepancy between the two setups. In particular, under various additional assumptions, we consider the case of VC(H) being finite but VC(H^U_mar) (potentially) being infinite, and a learner having additional access to P_X. We model knowledge of P_X as the learner having access to an oracle that returns the probability weights of various subsets of X. We say that the learner has access to a margin oracle for class H if, for every h ∈ H, it has access to (can query) the probability weight of the margin set of h, that is, P(mar^U_h). Since the margin areas can be viewed as subsets of X, if the margin class of H under perturbation type U has finite VC-dimension, a margin oracle can be approximated using an unlabeled sample from the distribution P. Similarly, one could define an error oracle for H as an oracle that, for every h ∈ H, would return the weight of the error set P(err_h). This is typically approximated with a labeled sample from the data-generating distribution, if the class has finite VC-dimension. This is similar to the settings of learning by distances (Ben-David et al., 1995) or learning with statistical queries (Kearns, 1998; Feldman, 2017). To minimize the adversarial loss, however, the learner needs to find (through oracle access or through approximations by samples) a minimizer of the weights P(err_h ∪ mar^U_h). We now first show that having access to both an exact error oracle and an exact margin oracle does not suffice for this.

Theorem 9. There is a class H with VC(H) = 1 over a domain X with |X| = 7, a perturbation type U : X → 2^X, and two distributions P¹ and P² over X × {0, 1} that are indistinguishable with error and margin oracles for H, while their robust loss minimizers in H differ.

Proof. Let X = {x_1, x_2, ..., x_7} be the domain. We consider two distributions P¹ and P² over X × {0, 1}. Both have true label 0 on all points, that is, P(y = 1 | x) = 0 for all x ∈ X. However, their marginals P¹_X and P²_X differ:

P¹_X(x_1) = P¹_X(x_3) = 0, P¹_X(x_2) = 2/6, and P¹_X(x_i) = 1/6 for i ∈ {4, 5, 6, 7};

P²_X(x_4) = P²_X(x_6) = 0, P²_X(x_5) = 2/6, and P²_X(x_i) = 1/6 for i ∈ {1, 2, 3, 7}.

The class H consists of two functions: h_1 = 1[x = x_2 ∨ x = x_3] and h_2 = 1[x = x_5 ∨ x = x_6]. Further, we consider the following perturbation sets (for readability, we first state them without the points themselves):

Ũ(x_1) = {x_2}, Ũ(x_2) = {x_1, x_3}, Ũ(x_3) = {x_2}, Ũ(x_4) = {x_5}, Ũ(x_5) = {x_4, x_6}, Ũ(x_6) = {x_5}, Ũ(x_7) = ∅.

Now we set U(x_i) = Ũ(x_i) ∪ {x_i}, so that each point is included in its own perturbation set. Both h_1 and h_2 have 0/1-loss 2/6 = 1/3 on both P¹ and P², and for both h_1 and h_2 the margin areas have weight 2/6 = 1/3 on both P¹ and P². However, the adversarial loss minimizer for P¹ is h_1 and for P² is h_2 (by a gap of 1/6 each).
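The construction in the proof can be verified numerically; the sketch below (ours) recomputes the error weights, margin weights, and robust losses of h_1 and h_2 under P¹ and P².

```python
# Points are indexed 1..7; under both distributions every point has true label 0.
P1 = {1: 0, 2: 2/6, 3: 0, 4: 1/6, 5: 1/6, 6: 1/6, 7: 1/6}
P2 = {1: 1/6, 2: 1/6, 3: 1/6, 4: 0, 5: 2/6, 6: 0, 7: 1/6}

U = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4, 6}, 6: {5}, 7: set()}
U = {x: nbrs | {x} for x, nbrs in U.items()}     # add each point to its own set

h1 = lambda x: int(x in (2, 3))
h2 = lambda x: int(x in (5, 6))

def err_weight(h, P):                            # binary loss (true labels are all 0)
    return sum(p for x, p in P.items() if h(x) != 0)

def mar_weight(h, P):                            # weight of the margin area of h
    return sum(p for x, p in P.items() if any(h(z) != h(x) for z in U[x]))

def robust_loss(h, P):                           # P(err_h ∪ mar^U_h)
    return sum(p for x, p in P.items()
               if h(x) != 0 or any(h(z) != h(x) for z in U[x]))

for pname, P in (("P1", P1), ("P2", P2)):
    for hname, h in (("h1", h1), ("h2", h2)):
        print(pname, hname, err_weight(h, P), mar_weight(h, P), robust_loss(h, P))
# Error and margin weights are 1/3 in every case, yet the robust-loss minimizer
# is h1 under P1 (1/3 vs. 1/2) and h2 under P2 (1/2 vs. 1/3).
```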
While the impossibility result in the above example can, of course, be overcome by estimating the weights of the seven points in the domain, the construction shows that merely estimating classification error and the weights of margin sets does not suffice for proper learning with respect to the adversarial loss. The example shows that the learner also needs to take into account the interactions (intersections) between the two components of the adversarial loss. However, the weights of the intersection sets err_h ∩ mar^U_h inherently involve label information. In the following subsection we show that realizability with respect to the robust loss implies that robust learning becomes possible with access to a (bounded-size) labeled sample from the distribution and additional access to a margin oracle or a (bounded-size) unlabeled sample. In Appendix C.3, we further explore weakening this assumption to only require 0/1-realizability with access to a stronger version of the margin oracle.

3.1.1. ROBUST REALIZABILITY: ∃h* ∈ H WITH L^U_P(h*) = 0

This is the setup of the impossibility result for proper learning (Montasser et al., 2019). We show that proper learning becomes possible with access to a margin oracle for H.

Theorem 10. Let X be some domain, H a hypothesis class with finite VC-dimension, and U : X → 2^X any perturbation type. If a learner is given additional access to a margin oracle for H, then H is properly learnable with respect to the robust loss ℓ^U and the class of distributions P that are robust-realizable by H, L^U_P(H) = 0, with labeled sample complexity O((VC(H) + log(1/δ)) / ε).

Proof Sketch. By robust realizability, there is an h* ∈ H with L^U_P(h*) = 0, implying that L^{0/1}_P(h*) = 0, that is, the distribution is (standard) realizable by H. Basic VC-theory tells us that an i.i.d. sample S of size O((VC(H) + log(1/δ)) / ε) guarantees that all functions in the version space of S (that is, all h with L^{0/1}_S(h) = 0) have true binary loss at most ε (with probability at least 1 − δ). Now, with access to a margin oracle for H, a learner can remove all hypotheses with P(mar^U_h) > 0 from the version space and return any remaining hypothesis (at least h* will remain).

Note that the above procedure crucially depends on actual access to a margin oracle. The weights P(mar^U_h) cannot in general be estimated if H^U_mar has infinite VC-dimension, as the impossibility result for proper learning from finite samples shows. Thus proper learnability even under these (strong) assumptions cannot always be achieved by a semi-supervised proper learner that has access only to finite amounts of unlabeled data. We also note that the above result (even with access to P_X) does not allow for an extension to the agnostic case via the type of reductions known from compression-based bounds (Montasser et al., 2019; Moran & Yehudayoff, 2016). On the other hand, if the margin class has finite, but potentially much larger, VC-dimension than H, then we can use unlabeled data to approximate the margin oracle in Theorem 10. The following result thus provides an improved bound on the number of labeled samples that suffice for robust proper learning under the assumptions of Theorem 7.
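To illustrate Theorem 10 and the forthcoming Theorem 11, here is a toy sketch (ours) of the proper learner: keep the version space of the labeled sample, then discard hypotheses with positive margin weight, using either an exact margin oracle or margin hits on an unlabeled sample. The threshold class, interval perturbations, and the two-point marginal are illustrative assumptions.

```python
import numpy as np

r = 0.1
H = [lambda x, t=t: int(x >= t) for t in np.linspace(-1, 1, 201)]   # proper: outputs lie in H
in_margin = lambda h, x: h(x - r) != h(x + r)   # exact for 1-D thresholds with U(x) = [x-r, x+r]

def learner_with_margin_oracle(S, margin_weight):
    """Theorem 10: version space of S, filtered by an exact margin oracle."""
    for h in H:
        if all(h(x) == y for x, y in S) and margin_weight(h) == 0:
            return h                             # exists under robust realizability
    return None

def ssl_learner(S, T):
    """Theorem 11: the margin oracle is replaced by margin hits on an unlabeled sample T."""
    for h in H:
        if all(h(x) == y for x, y in S) and not any(in_margin(h, x) for x in T):
            return h
    return None

rng = np.random.default_rng(4)
xs = rng.choice([-0.5, 0.5], size=200)           # the marginal keeps a 2r gap around 0
S = [(x, int(x >= 0)) for x in xs[:20]]          # small labeled sample
T = xs[20:]                                      # larger unlabeled sample

margin_weight = lambda h: np.mean([in_margin(h, x) for x in (-0.5, 0.5)])  # exact on toy marginal
h1, h2 = learner_with_margin_oracle(S, margin_weight), ssl_learner(S, T)
print(h1(-0.5), h1(0.5), h2(-0.5), h2(0.5))      # both robustly classify the support
```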
Theorem 11. Let X be some domain, H a hypothesis class with finite VC-dimension, and let U : X → 2^X be a perturbation type such that the margin class H^U_mar also has finite VC-dimension. If a learner is given additional access to an (unlabeled) sample T from P_X, then H is properly learnable with respect to the robust loss ℓ^U and the class of distributions P that are robust-realizable by H, L^U_P(H) = 0, with labeled sample complexity O((VC(H) + log(1/δ)) / ε) and unlabeled sample complexity O((VC(H^U_mar) + log(1/δ)) / ε).

Proof. The stated sample sizes imply that all functions in H in the version space of the labeled sample S have true binary loss at most ε, and all functions in H whose margin areas are not hit by T have true margin weight at most ε. The learner can thus output any function h with zero classification error on S and zero margin weight under T (at least h* will satisfy these conditions), and we get L^U_P(h) = P(err_h ∪ mar^U_h) ≤ 2ε.

The assumption in the above theorems states that there exists one function h* in the class that has both perfect classification accuracy and no weight in its margin area. The proof of the impossibility construction of Theorem 9 employs a class and distributions where no function in the class has perfect margin or perfectly classifies the task. We can modify that construction to show that the double realizability in Theorem 10 is necessary if access to the marginal is restricted to a margin oracle for H. The proof of the following result can be found in Appendix C.2.

Theorem 12. There is a class H with VC(H) = 1 over a domain X with |X| = 8, a perturbation type U : X → 2^X, and two distributions P¹ and P² over X × {0, 1}, such that there are functions h_r, h_c ∈ H with L^{0/1}_{P^i}(h_r) = 0 and P^i(mar^U_{h_c}) = 0 for both i ∈ {1, 2}, while P¹ and P² are indistinguishable with error and margin oracles for H and their robust loss minimizers in H differ.

4. Black-box Certification and the Query Complexity of Adversarial Attacks

Given a fixed hypothesis h, a basic concentration inequality (e.g., Hoeffding's inequality) indicates that the empirical loss of h on a sample S ∼ P^m, L^{0/1}_S(h), gives an O(m^{−1/2})-accurate estimate of the true loss with respect to P, L^{0/1}_P(h). In fact, in order to compute L^{0/1}_S(h), we do not need to know h directly; it suffices to be able to query h(x) on the given sample. Therefore, we can say that it is possible to estimate the true binary loss of h up to additive error ε using O(1/ε²) samples from P and O(1/ε²) queries to h(·). The high-level question that we ask in this section is whether and when we can do the same for the adversarial loss, L^U_P(h). If possible, it would mean that we can have a third party that certifies the robustness of a given black-box predictor (e.g., without relying on knowledge of the learning algorithm that produced it).

Definition 13 (Label Query Oracle). We call an oracle O_h a label query oracle for a hypothesis h if, for all x ∈ X, upon querying for x, the oracle returns the label O_h(x) = h(x).

Definition 14 (Query-based Algorithm). We call an algorithm A : (∪_{i=1}^∞ X^i, O_h) → R a query-based algorithm if A has access to a label query oracle O_h.

Definition 15 (Certifiability). A class H is certifiable with respect to U if there exists a query-based algorithm A and functions q, m : (0, 1)² → N such that for every ε, δ ∈ (0, 1], every distribution P over X × Y, and every h ∈ H, we have that with probability at least 1 − δ over an i.i.d. sample S ∼ P^m_X of size m ≥ m(ε, δ),

|A(S, O_h) − L^U_P(h)| ≤ ε,

with a query budget of q(ε, δ) for A. In this case, we say that H admits (m, q) black-box query certification.
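The following small wrapper (ours, purely illustrative) makes the interface of Definitions 13–15 concrete: a label query oracle for a black-box h, instrumented to count queries so that a certifier's query budget q(ε, δ) can be enforced.

```python
class LabelQueryOracle:
    """Label query oracle O_h (Definition 13) with an explicit query budget."""

    def __init__(self, h, budget):
        self.h = h                     # the black-box predictor
        self.budget = budget           # query budget q(eps, delta) granted to the certifier
        self.queries = 0

    def __call__(self, x):
        if self.queries >= self.budget:
            raise RuntimeError("query budget exhausted")
        self.queries += 1
        return self.h(x)

# Usage: a query-based certifier (Definition 14) receives the sample S and this
# oracle, and must output its estimate of L^U_P(h) within the allotted budget.
oracle = LabelQueryOracle(h=lambda x: int(x >= 0.0), budget=5000)
print(oracle(0.3), oracle(-1.2), oracle.queries)
```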
In light of Section 2.1, the task of robust certification is to estimate the probability weight of the set err_h ∪ mar^U_h.

Observation 16. Let H be the set of all half-spaces in R² and let U(x) = {z : ‖x − z‖_1 ≤ 1} be the unit ball with respect to the 1-norm centred at x. Then H admits (m, q)-certification under U for functions m, q ∈ O(1/ε²).

Proof. Say we have a sample S ∼ P^m_X. For each point x ∈ S define the set w(x) = {x + (0, 1), x + (1, 0), x + (−1, 0), x + (0, −1)}, i.e., the four corner points of U(x). The certifier can determine whether x ∈ err_h by querying the label of x; further, it can determine whether x ∈ mar^U_h by querying all points in w(x). Let W = ∪_{x∈S} w(x). By querying all points in S ∪ W, the certifier can calculate the robust loss of h on S. This will be an ε-accurate estimate of L^U_P(h) when m = O(1/ε²).

We immediately see that certification is non-trivial, in that there are cases where robust certification is impossible. The proof can be found in Appendix D.

Observation 17. Let H be the set of all half-spaces in R² and let U(x) = {z : ‖x − z‖_2 ≤ 1} be the unit ball with respect to the 2-norm centred at x. Then H is not certifiable under U.

This motivates us to define a tolerant version of certification.

Definition 18 (Restriction of a perturbation type). Let U, V : X → 2^X be perturbation types. We say that U is a restriction of V if U(x) ⊆ V(x) for all x ∈ X. Note that if U is a restriction of V, then, for all distributions P and predictors h, we have L^U_P(h) ≤ L^V_P(h).

Definition 19 (Tolerant Certification). A class H is tolerantly certifiable with respect to U and V, where U is a restriction of V, if there exists a query-based algorithm A and functions q, m : (0, 1)² → N such that for every ε, δ ∈ (0, 1], every distribution P over X × Y, and every h ∈ H, we have that with probability at least 1 − δ over an i.i.d. sample S ∼ P^m_X of size m ≥ m(ε, δ),

A(S, O_h) ∈ [L^U_P(h) − ε, L^V_P(h) + ε],

with a query budget of q(ε, δ) for A. In this case, we say that H admits tolerant (m, q) black-box query certification.

Observation 20. Let U(x) = {z : ‖x − z‖_2 ≤ 1} and V(x) = {z : ‖x − z‖_2 ≤ 1 + γ}. Let H be the set of all half-spaces in R². Then H is (O(1/ε²), O(1/(√γ ε²))) tolerantly certifiable with respect to U and V.

Proof sketch. For each x, we can always find a regular polygon with O(1/√γ) vertices that sits between U(x) and V(x). Therefore, in order to find out whether x is adversarially vulnerable or not, it suffices to make O(1/√γ) queries. Combining this with Hoeffding's inequality shows that if we sample O(1/ε²) points from P and make O(1/√γ) queries for each, we can output an estimate that satisfies the tolerant certification guarantee with error ε.

Though more realistic, even the tolerant notion of certifiability does not make all seemingly simple classes certifiable.

Observation 21. Let U(x) = {z : ‖x − z‖_2 ≤ 1} and V(x) = {z : ‖x − z‖_2 ≤ 1 + γ}. There exists a hypothesis class H with VC-dimension 1 such that H is not tolerantly certifiable with respect to U and V.

Proof sketch. For any p ∈ R², let h_p(x) = 1[x = p]. Let H = {h_p : p ∈ R²}. H clearly has VC-dimension 1, but we claim that it is not tolerantly certifiable. We construct an argument similar to Observation 17: no matter which queries the certifier chooses, we can always set p to be a point that was not queried and is either inside U or outside V, depending on the certifier's answer.
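The certifier of Observation 16 is simple enough to spell out. The sketch below (ours) assumes the certifier also sees the true labels of the sampled points and uses an illustrative black-box half-space. For a half-space, the linear score attains its extremes over the ℓ1 ball at the four vertices, so x lies in the margin exactly when the queried labels of x and its four vertices disagree.

```python
import numpy as np

VERTICES = [np.array(v) for v in ((0, 1), (1, 0), (-1, 0), (0, -1))]   # w(x) - x from Obs. 16

def certify(oracle, labeled_sample):
    """Estimate the robust loss of a black-box half-space under l_1 unit-ball perturbations."""
    losses = []
    for x, y in labeled_sample:
        hx = oracle(x)
        in_err = hx != y
        in_mar = any(oracle(x + v) != hx for v in VERTICES)   # 5 queries per sample point
        losses.append(in_err or in_mar)
    return float(np.mean(losses))

# Illustrative black-box half-space and data (labels here agree with h on the sample).
w, b = np.array([1.0, 1.0]), 0.0
oracle = lambda x: int(np.dot(w, x) + b >= 0)

rng = np.random.default_rng(5)
X = rng.uniform(-3, 3, size=(1000, 2))
sample = [(x, oracle(x)) for x in X]
print(certify(oracle, sample))    # ~ weight of the margin area of the half-space
```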
4.1. Witness Sets for Certification

A common feature of the previous examples is that, whenever the certifier can identify a set of points whose labels determine which points of S lie in the margin of h, querying those points is enough for robust certification. This motivates the following definition.

Definition 22 (Witness sets). Given a hypothesis class H and a perturbation type U, for any point x ∈ X, we say that w(x) ⊆ X is a witness set for x if there exists a mapping f : {0, 1}^{w(x)} → {0, 1} such that for any hypothesis h ∈ H, f(h|_{w(x)}) = 1 if and only if x lies in the margin of h (where h|_{w(x)} denotes the restriction of hypothesis h to the set w(x)).

Clearly, all positive examples above were created using witness sets. The following theorem identifies a large class of (H, U) pairs that exhibit finite witness sets.

Theorem 23. For any x ∈ X, consider two partial orderings ⪯_1^x and ⪯_0^x over the elements of H where, for h_1, h_2 ∈ H, we say h_1 ⪯_1^x h_2 if U(x) ∩ h_1 ⊆ U(x) ∩ h_2, and h_1 ⪯_0^x h_2 if U(x) \ h_1 ⊆ U(x) \ h_2. For both partial orderings we identify (as equivalent) hypotheses for which these sets coincide, and we further remove all hypotheses for which the sets are empty.³ If both partial orders have a finite number of minima for each x, then the pair (H, U) exhibits a finite witness set and hence is certifiable.

³Here, we think of hypotheses h_1 and h_2 as the pre-image of 1 (as noted in Section 2), and hence as subsets of X.

Proof. For this proof we identify hypotheses with their equivalence classes in each partial ordering. Let M_0(x) ⊆ H be the set of minima for ⪯_0^x and M_1(x) ⊆ H the set of minima for ⪯_1^x. For each h ∈ M_0(x), we pick a point x′ ∈ U(x) such that x′ ∈ h but x′ ∉ h′ for any h′ ≺ h, thus forming a set w_0(x). Similarly, we define the set w_1(x). We claim that w(x) = w_0(x) ∪ w_1(x) ∪ {x} is a witness set for x, i.e., we can determine whether x is in the margin of any hypothesis h ∈ H by looking at the labels that h assigns to points in w(x). We only consider the case where h(x) = 0, since the case h(x) = 1 is similar. We claim that x is in the margin of h if and only if there exists a point in w_1(x) that is assigned the label 1 by h. Indeed, suppose there exists such a point. Then since the point lies in U(x) and is assigned the opposite label to x by h, x must lie in the margin of h. For the other direction, suppose x lies in the margin of h. Then there must exist a point x′ ∈ U(x) such that h(x′) = 1, which means there must be a hypothesis ĥ ∈ M_1(x) such that ĥ ⪯_1^x h, which in turn means there must exist x̂ ∈ w_1(x) such that h(x̂) = 1.

We can easily verify, for example, that for the (H, U) pair defined in Observation 16, the set of minima defined by the partial orderings above is finite. Indeed, the (equivalence classes of) half-spaces corresponding to the four corner points of U(x) constitute the minima.

4.2. Query complexity of adversarial attacks and its connection to robust PAC learning

Even though in the literature on (practical) adversarial attacks an adversary is often modelled as an actual algorithm, in the theoretical literature the focus has been on whether adversarial examples merely exist.⁴ However, one can say a robust learner is successful if it merely finds a hypothesis that is potentially non-robust in the conventional sense, yet whose adversarial examples are hard to find for the adversary. To formalize this idea, one needs to define some notion of a bounded adversary in a way that enables the study of the complexity of finding adversarial examples.
Attempts have been made at studying computationally bounded adversaries in certain scenarios (Garg et al., 2019), but not in the distribution-free setting. Here we study an adversary's query complexity. We start by formally defining an adversary and discussing a few different properties of adversaries.

⁴E.g., the definition of the adversarial loss in Section 2 is only concerned with whether an adversarial point exists.

Definition 24. For an (H, U) pair, an adversary is an algorithm A tasked with the following: given a set S of n points from the domain and query access to a hypothesis h ∈ H, return a set S′ such that (i) each point x′ ∈ S′ is an adversarial point of some point in S, and (ii) for every x ∈ S that has an adversarial point, there exists x′ ∈ S′ such that x′ is an adversarial point of x. If the two conditions hold, we call S′ an admissible attack on S with respect to (H, U). We call an adversary perfect for (H, U) if for every S ⊆ X it outputs an admissible attack on S. We say that the adversary is proper if all its queries are in the set ∪_{x∈S} U(x).

There have been (successful) attempts (Papernot et al., 2017; Brendel et al., 2018) at attacking trained neural network models where the adversary was not given any information about the gradients, and had to rely solely on black-box queries to the model. Our definition of the adversary fits those scenarios. Next, we define the query complexity of the adversary.

Definition 25. If, for (H, U), there is a function f : N → N such that, for any h ∈ H and any set S, the adversary A will produce an admissible attack S′ after at most f(|S|) queries, we say that adversary A has query complexity bounded by f on (H, U). We say that the adversary is efficient if f(n) is linear in n.

Note that it is possible that the adversary's queries are adaptive, i.e., the i-th point it queries depends on the output of its first i − 1 queries. A weaker version of an adversary is one for which that is not the case.

Definition 26. An adversary is called non-adaptive if the set of points it queries is uniquely determined by the set S, before making any queries to h.

Intuitively, there is a connection between perfect adversaries and witness sets, because a witness set merely helps identify the points in S that have adversarial points, whereas an adversary finds those adversarial points.

Observation 27. If the (H, U) pair exhibits a perfect, non-adaptive adversary with query complexity f(n), then it also has witness sets of size f(n).

Finally, we tie everything together by showing that, for properly compressible classes, the existence of a proper, perfect adversary implies that the robust learning problem has small sample complexity. We say a class H is properly compressible if (i) it admits a sample compression scheme (Littlestone & Warmuth, 1986) of size O(VC log(n)), where VC is the VC-dimension of H and n is the number of samples being compressed, and (ii) the hypothesis output by the scheme is always a member of H. Note that (i) holds for all hypothesis classes (by a boosting-based compression scheme (Schapire & Freund, 2013; Moran & Yehudayoff, 2016)). Furthermore, many natural classes are known to have proper compression schemes, and it is open whether this is true for all VC-classes (see (Floyd & Warmuth, 1995; Ben-David & Litman, 1998)).
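As a concrete instance of Definitions 24–26, the sketch below (our instantiation, not from the paper) is a perfect, proper, non-adaptive, and efficient adversary for the pair from Observation 16: half-spaces in R² with ℓ1 unit-ball perturbations. For each x it queries x and the four vertices of U(x); since a linear score attains its extremes at these vertices, every vulnerable x is caught, so the returned attack is admissible.

```python
import numpy as np

VERTICES = [np.array(v) for v in ((0, 1), (1, 0), (-1, 0), (0, -1))]

def adversary(oracle, S):
    """Return an admissible attack S' on S; 5 queries per point, fixed in advance."""
    attack = []
    for x in S:
        hx = oracle(x)
        for v in VERTICES:                       # proper: every query lies in U(x)
            if oracle(x + v) != hx:
                attack.append(x + v)             # an adversarial point for x
                break
    return attack

w, b = np.array([2.0, -1.0]), 0.3
oracle = lambda x: int(np.dot(w, x) + b >= 0)    # illustrative black-box half-space

rng = np.random.default_rng(6)
S = [rng.uniform(-3, 3, size=2) for _ in range(50)]
S_prime = adversary(oracle, S)
print(f"{len(S_prime)} of {len(S)} points admit an adversarial perturbation")
```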
Theorem 28. Assume H is properly compressible. If the robust learning problem defined by (H, U) has a perfect, proper, and efficient adversary, then in the robust realizable case (L^U_P(H) = 0) it can be robustly learned with O(t log²(t)) samples, where t = VC(H)/ε².

Proof Sketch. We adapt the compression-based approach of (Montasser et al., 2019) to prove the result. Assume that we are given a sample S of size |S| = m that is labeled by some h ∈ H, and we want to compress this sample using a small subset K ⊆ S. Let ĥ be the hypothesis that is reconstructed from K. For the compression to succeed, we need ℓ^U(h, x, y) = ℓ^U(ĥ, x, y) for every (x, y) ∈ S. Given the perfect, proper, and efficient adversary, we can find all the adversarial points in S using C queries per point of S, for some constant C > 0. In fact, we can append the points corresponding to these queries to S to create an inflated set, which we call T. Note that |T| ≤ C · |S|. Furthermore, we can now replace the condition ∀(x, y) ∈ S, ℓ^U(h, x, y) = ℓ^U(ĥ, x, y) with ∀(x, y) ∈ T, ℓ^{0/1}(h, x, y) = ℓ^{0/1}(ĥ, x, y) (the latter implies the former by the definition of a perfect adversary). Therefore, our task becomes compressing T with respect to the standard binary loss, for which we invoke the assumption that H has a proper compression scheme. Assume this compression scheme compresses T to a subset K ⊆ T of size k. We argue that, by invoking the adversary, we can convert the compressed set into one that only includes points from the original sample S, plus some additional bits of side information. For each point x ∈ K in the compressed set that is not an original sample point from S, we know that x ∈ U(x_S) for some x_S ∈ S, since the adversary is proper. We construct a new compressed set K′ ⊆ S by replacing each such point x with its corresponding point x_S, together with bits b that encode the rank of x among the queries that the adversary would make for x_S. The decompressor can then first recover the set K by invoking the adversary, and then apply the standard decompression. Finally, the size of the compression is O(VC(H) log(m)), and the result follows from the classic connection between compression and learning (Littlestone & Warmuth, 1986; Montasser et al., 2019).

This result shows an interesting connection between the difficulty of finding adversarial examples and that of robust learning. In particular, if adversarial points can be found easily (at least when measured by query complexity), then robust learning is almost as easy as non-robust learning (in the sense of agnostic sample complexity). Stated in the contrapositive: if robust learning is hard, then even if adversarial points exist, finding them is going to be hard. It is possible to further extend the result to the agnostic learning scenario, using the same reduction from agnostic to realizable learning that was proposed by (David et al., 2016) and used in (Montasser et al., 2019).
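The compression step can be summarized in a schematic sketch (ours). Here `adversary.queries_for`, `compress`, and `decompress` are assumed interfaces standing in for the perfect, proper, efficient adversary and the proper O(VC log n) compression scheme; for simplicity we assume the query set for each x is determined by x alone and lies in U(x). Under robust realizability, every query point z ∈ U(x) carries the same label y as its source x, which is what lets the compressed set be re-encoded with points of S plus rank bits.

```python
def robust_compress(S, adversary, compress):
    """Schematic compression of a robustly-realizable labeled sample S w.r.t. l^U."""
    # Points are assumed to be plain hashable objects (e.g., tuples); all named
    # components are assumed interfaces, not objects defined in the paper.
    # 1. Inflate S with the adversary's query points; by robust realizability each
    #    query z in U(x) inherits the label y of its source point x.
    queries = {x: adversary.queries_for(x) for x, _ in S}
    T = list(S) + [(z, y) for x, y in S for z in queries[x]]
    # 2. Compress T with respect to the *binary* loss using the proper scheme.
    K = compress(T)
    # 3. Re-encode compressed points that are not sample points as their source
    #    point in S plus the rank of the query (a few bits of side information).
    encoded = []
    sample_points = {x for x, _ in S}
    for z, y in K:
        if z in sample_points:
            encoded.append(((z, y), None))
        else:
            x_src = next(x for x in queries if z in queries[x])
            encoded.append(((x_src, y), queries[x_src].index(z)))
    return encoded

def robust_decompress(encoded, adversary, decompress):
    """Recover the binary-loss compressed set by re-issuing the adversary's queries."""
    K = [(x if rank is None else adversary.queries_for(x)[rank], y)
         for (x, y), rank in encoded]
    return decompress(K)   # a hypothesis agreeing with the labels of T, as the argument requires
```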
5. Conclusion

We formalized the problem of black-box certification and its relation to an adversary with a bounded query budget. We showed that the existence of an adversary with small query complexity implies small sample complexity for robust learning. This suggests that the apparent hardness of robust learning compared to standard PAC learning in terms of sample complexity may not actually matter as long as we are dealing with bounded adversaries. It would be interesting to explore other types of adversaries (e.g., non-proper and/or non-perfect) to see if they lead to efficient robust learners as well. Another interesting direction is finding scenarios where finite unlabeled data can substitute for knowledge of the marginal distribution, as discussed in Section 3.

Acknowledgements

We thank the Vector Institute for providing us with the meeting space in which this work was developed! Ruth Urner and Hassan Ashtiani were supported by NSERC Discovery Grants.

References

Akhtar, N. and Mian, A. Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access, 6:14410-14430, 2018.

Alayrac, J., Uesato, J., Huang, P., Fawzi, A., Stanforth, R., and Kohli, P. Are labels required for improving adversarial robustness? In Advances in Neural Information Processing Systems 32, NeurIPS, pp. 12192-12202, 2019.

Attias, I., Kontorovich, A., and Mansour, Y. Improved generalization bounds for robust learning. In Algorithmic Learning Theory, ALT, pp. 162-183, 2019.

Awasthi, P., Dutta, A., and Vijayaraghavan, A. On robustness to adversarial examples and polynomial optimization. In Advances in Neural Information Processing Systems, NeurIPS, pp. 13760-13770, 2019.

Ben-David, S. and Litman, A. Combinatorial variability of Vapnik-Chervonenkis classes with applications to sample compression schemes. Discrete Applied Mathematics, 86(1):3-25, 1998.

Ben-David, S., Itai, A., and Kushilevitz, E. Learning by distances. Inf. Comput., 117(2):240-250, 1995.

Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. K. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM (JACM), 36(4):929-965, 1989.

Brendel, W., Rauber, J., and Bethge, M. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. 2018.

Bubeck, S., Lee, Y. T., Price, E., and Razenshteyn, I. P. Adversarial examples from computational constraints. In Proceedings of the 36th International Conference on Machine Learning, ICML, pp. 831-840, 2019.

Carmon, Y., Raghunathan, A., Schmidt, L., Duchi, J. C., and Liang, P. Unlabeled data improves adversarial robustness. In Advances in Neural Information Processing Systems 32, NeurIPS, pp. 11190-11201, 2019.

Chakraborty, A., Alam, M., Dey, V., Chattopadhyay, A., and Mukhopadhyay, D. Adversarial attacks and defences: A survey. CoRR, abs/1810.00069, 2018.

Chen, P.-Y., Zhang, H., Sharma, Y., Yi, J., and Hsieh, C.-J. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15-26, 2017.

Cohen, J. M., Rosenfeld, E., and Kolter, J. Z. Certified adversarial robustness via randomized smoothing. In Proceedings of the 36th International Conference on Machine Learning, ICML, pp. 1310-1320, 2019.

Cullina, D., Bhagoji, A. N., and Mittal, P. PAC-learning in the presence of adversaries. In Advances in Neural Information Processing Systems, NeurIPS, pp. 230-241, 2018.

David, O., Moran, S., and Yehudayoff, A. Supervised learning through the lens of compression. In Advances in Neural Information Processing Systems, NIPS, pp. 2784-2792, 2016.

Diochnos, D., Mahloujifar, S., and Mahmoody, M. Adversarial risk and robustness: General definitions and implications for the uniform distribution. In Advances in Neural Information Processing Systems 31, NeurIPS, pp. 10359-10368, 2018.

Diochnos, D. I., Mahloujifar, S., and Mahmoody, M. Lower bounds for adversarially robust PAC learning. CoRR, abs/1906.05815, 2019.
Dong, Y., Liao, F., Pang, T., Su, H., Zhu, J., Hu, X., and Li, J. Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9185-9193, 2018.

Feige, U., Mansour, Y., and Schapire, R. Learning and inference in the presence of corrupted inputs. In Conference on Learning Theory, COLT, pp. 637-657, 2015.

Feldman, V. A general characterization of the statistical query complexity. In Proceedings of the 30th Conference on Learning Theory, COLT, pp. 785-830, 2017.

Floyd, S. and Warmuth, M. Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Machine Learning, 21(3):269-304, 1995.

Garg, S., Jha, S., Mahloujifar, S., and Mahmoody, M. Adversarially robust learning could leverage computational hardness. CoRR, abs/1905.11564, 2019.

Goodfellow, I. J., McDaniel, P. D., and Papernot, N. Making machine learning robust against adversarial inputs. Commun. ACM, 61(7):56-66, 2018.

Göpfert, C., Ben-David, S., Bousquet, O., Gelly, S., Tolstikhin, I. O., and Urner, R. When can unlabeled data improve the learning rate? In Conference on Learning Theory, COLT, pp. 1500-1518, 2019.

Gourdeau, P., Kanade, V., Kwiatkowska, M., and Worrell, J. On the hardness of robust classification. In Advances in Neural Information Processing Systems 32, NeurIPS, pp. 7444-7453, 2019.

Haussler, D. and Welzl, E. Epsilon-nets and simplex range queries. Discret. Comput. Geom., 2:127-151, 1987.

Kearns, M. J. Efficient noise-tolerant learning from statistical queries. J. ACM, 45(6):983-1006, 1998.

Littlestone, N. and Warmuth, M. Relating data compression and learnability. 1986.

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In 6th International Conference on Learning Representations, ICLR, 2018.

Montasser, O., Hanneke, S., and Srebro, N. VC classes are adversarially robustly learnable, but only improperly. In Conference on Learning Theory, COLT, pp. 2512-2530, 2019.

Montasser, O., Goel, S., Diakonikolas, I., and Srebro, N. Efficiently learning adversarially robust halfspaces with noise. arXiv preprint arXiv:2005.07652, 2020.

Moran, S. and Yehudayoff, A. Sample compression schemes for VC classes. Journal of the ACM (JACM), 63(3):1-10, 2016.

Narodytska, N. and Kasiviswanathan, S. Simple black-box adversarial attacks on deep neural networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1310-1318. IEEE, 2017.

Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., and Swami, A. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506-519, 2017.

Salman, H., Li, J., Razenshteyn, I. P., Zhang, P., Zhang, H., Bubeck, S., and Yang, G. Provably robust deep learning via adversarially trained smoothed classifiers. In Advances in Neural Information Processing Systems 32, NeurIPS, pp. 11289-11300, 2019.

Schapire, R. E. and Freund, Y. Boosting: Foundations and Algorithms. Kybernetes, 2013.

Schmidt, L., Santurkar, S., Tsipras, D., Talwar, K., and Madry, A. Adversarially robust generalization requires more data. In Advances in Neural Information Processing Systems, NeurIPS, pp. 5014-5026, 2018.

Shalev-Shwartz, S. and Ben-David, S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
Sinha, A., Namkoong, H., and Duchi, J. C. Certifying some distributional robustness with principled adversarial training. In 6th International Conference on Learning Representations, ICLR, 2018.

Su, J., Vargas, D. V., and Sakurai, K. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 23(5):828-841, 2019.

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. J., and Fergus, R. Intriguing properties of neural networks. In 2nd International Conference on Learning Representations, ICLR, 2014.

Valiant, L. G. A theory of the learnable. Commun. ACM, 27(11):1134-1142, 1984.

Vapnik, V. N. and Chervonenkis, A. Y. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability & Its Applications, 16(2):264-280, 1971.

Wang, Y., Jha, S., and Chaudhuri, K. Analyzing the robustness of nearest neighbors to adversarial examples. In Proceedings of the 35th International Conference on Machine Learning, ICML, pp. 5120-5129, 2018.

Wong, E. and Kolter, J. Z. Provable defenses against adversarial examples via the convex outer adversarial polytope. In Proceedings of the 35th International Conference on Machine Learning, ICML, pp. 5283-5292, 2018.

Yang, Y., Rashtchian, C., Wang, Y., and Chaudhuri, K. Adversarial examples for non-parametric methods: Attacks, defenses and large sample limits. CoRR, abs/1906.03310, 2019.

Yin, D., Ramchandran, K., and Bartlett, P. L. Rademacher complexity for adversarially robust generalization. In Proceedings of the 36th International Conference on Machine Learning, ICML, pp. 7085-7094, 2019.