# Rademacher Complexity for Adversarially Robust Generalization

Dong Yin¹, Kannan Ramchandran¹, Peter Bartlett¹²

¹Department of Electrical Engineering and Computer Sciences, UC Berkeley, Berkeley, CA, USA. ²Department of Statistics, UC Berkeley, Berkeley, CA, USA. Correspondence to: Dong Yin.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

Abstract

Many machine learning models are vulnerable to adversarial attacks; for example, adding adversarial perturbations that are imperceptible to humans can often make machine learning models produce wrong predictions with high confidence. Moreover, although we may obtain robust models on the training dataset via adversarial training, in some problems the learned models cannot generalize well to the test data. In this paper, we focus on ℓ∞ attacks and study the adversarially robust generalization problem through the lens of Rademacher complexity. For binary linear classifiers, we prove tight bounds for the adversarial Rademacher complexity and show that the adversarial Rademacher complexity is never smaller than its natural counterpart, and that it has an unavoidable dimension dependence unless the weight vector has bounded ℓ1 norm; our results also extend to multi-class linear classifiers. In addition, for (nonlinear) neural networks, we show that the dimension dependence in the adversarial Rademacher complexity also exists. We further consider a surrogate adversarial loss for one-hidden-layer ReLU networks and prove margin bounds for this setting. Our results indicate that having ℓ1 norm constraints on the weight matrices might be a potential way to improve generalization in the adversarial setting. We demonstrate experimental results that validate our theoretical findings.

1. Introduction

In recent years, many modern machine learning models, in particular, deep neural networks, have achieved success in tasks such as image classification (He et al., 2016), speech recognition (Graves et al., 2013), machine translation (Bahdanau et al., 2014), game playing (Silver et al., 2016), etc. However, although these models achieve state-of-the-art performance in many standard benchmarks or competitions, it has been observed that by adversarially adding some perturbation to the input of the model (images, audio signals), machine learning models can be made to produce wrong predictions with high confidence. These adversarial inputs are often called adversarial examples. Typical methods of generating adversarial examples include adding small perturbations that are imperceptible to humans (Szegedy et al., 2013), changing surrounding areas of the main objects in images (Gilmer et al., 2018a), and even simple rotation and translation (Engstrom et al., 2017). This phenomenon was first discovered by Szegedy et al. (2013) in image classification problems, and similar phenomena have been observed in other areas (Carlini & Wagner, 2018; Kos et al., 2018). Adversarial examples bring serious challenges in many security-critical applications, such as medical diagnosis and autonomous driving: the existence of these examples shows that many state-of-the-art machine learning models are actually unreliable in the presence of adversarial attacks.
Since the discovery of adversarial examples, there has been a race between designing robust models that can defend against adversarial attacks and designing attack algorithms that can generate adversarial examples and fool machine learning models (Goodfellow et al., 2014; Gu & Rigazio, 2014; Carlini & Wagner, 2016; 2017). As of now, it seems that the attackers are winning this game. For example, a recent work shows that many of the defense algorithms fail when the attacker uses a carefully designed gradient-based method (Athalye et al., 2018). Meanwhile, adversarial training (Huang et al., 2015; Shaham et al., 2015; Madry et al., 2017) seems to be the most effective defense method. Adversarial training takes a robust optimization (Ben-Tal et al., 2009) perspective on the problem, and the basic idea is to minimize some adversarial loss over the training data. We elaborate below.

Suppose that data points (x, y) are drawn according to some unknown distribution D over the feature-label space X × Y, with X ⊆ R^d. Let F be a hypothesis class (e.g., a class of neural networks with a particular architecture), and let ℓ(f(x), y) be the loss associated with f ∈ F. Consider the ℓ∞ white-box adversarial attack in which an adversary is allowed to observe the trained model and choose some x′ such that ‖x′ − x‖∞ ≤ ε and ℓ(f(x′), y) is maximized. Therefore, to better defend against adversarial attacks, during training the learner should aim to solve the empirical adversarial risk minimization problem

$$\min_{f \in \mathcal{F}}\ \frac{1}{n}\sum_{i=1}^{n}\ \max_{\|x_i' - x_i\|_\infty \le \epsilon} \ell(f(x_i'), y_i), \qquad (1)$$

where {(x_i, y_i)}_{i=1}^n are n i.i.d. training examples drawn according to D. This minimax formulation raises many interesting theoretical and practical questions. For example, we need to understand how to efficiently solve the optimization problem in (1), and in addition, we need to characterize the generalization property of the adversarial risk, i.e., the gap between the empirical adversarial risk in (1) and the population adversarial risk E_{(x,y)∼D}[max_{‖x′−x‖∞≤ε} ℓ(f(x′), y)]. In fact, for deep neural networks, both questions are still wide open. In particular, for the generalization problem, it has been observed that even if we can minimize the adversarial training error, the adversarial test error can still be large. For example, for a ResNet (He et al., 2016) model on CIFAR10, using the PGD adversarial training algorithm by Madry et al. (2017), one can achieve about 96% adversarial training accuracy, but the adversarial test accuracy is only 47%. This generalization gap is significantly larger than that in the natural setting (without adversarial attacks), and thus it has become increasingly important to better understand the generalization behavior of machine learning models in the adversarial setting.
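For concreteness, the following is a minimal NumPy sketch of the inner maximization in (1), approximated by projected gradient ascent in the style of Madry et al. (2017). The gradient oracle `loss_grad_x` is a hypothetical placeholder standing in for the model, and the step size and iteration count are illustrative choices rather than values from this paper.

```python
import numpy as np

def pgd_attack(x, y, loss_grad_x, eps, steps=40, alpha=None):
    """Approximate argmax_{||x' - x||_inf <= eps} loss(f(x'), y), i.e., the inner
    maximization of the empirical adversarial risk in (1), by projected gradient
    ascent in l_inf geometry."""
    alpha = alpha if alpha is not None else 2.5 * eps / steps
    x_adv = x + np.random.uniform(-eps, eps, size=x.shape)  # random start in the ball
    for _ in range(steps):
        g = loss_grad_x(x_adv, y)                 # gradient of the loss w.r.t. the input
        x_adv = x_adv + alpha * np.sign(g)        # steepest-ascent step for l_inf geometry
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back onto the eps-ball around x
    return x_adv
```

The outer minimization in (1) then runs an ordinary training step on the perturbed examples.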
In this paper, we focus on the adversarially robust generalization property and make a first step towards a deeper understanding of this problem. We focus on ℓ∞ adversarial attacks and analyze generalization through the lens of Rademacher complexity. We study both linear classifiers and nonlinear feedforward neural networks, and both binary and multi-class classification problems. We summarize our contributions as follows, and provide a detailed comparison with existing works in Section 6.

1.1. Our Contributions

- For binary linear classifiers, we prove tight upper and lower bounds for the adversarial Rademacher complexity. We show that the adversarial Rademacher complexity is never smaller than its counterpart in the natural setting, which provides theoretical evidence for the empirical observation that adversarially robust generalization can be hard. We also show that under an ℓ∞ adversarial attack, when the weight vector of the linear classifier has bounded ℓp norm (p ≥ 1), a polynomial dimension dependence in the adversarial Rademacher complexity is unavoidable, unless p = 1.
- For multi-class linear classifiers, we prove margin bounds in the adversarial setting. Similar to binary classifiers, the margin bound also exhibits polynomial dimension dependence when the weight vector for each class has bounded ℓp norm (p > 1).
- For neural networks, we show that in contrast to the margin bounds derived by Bartlett et al. (2017) and Golowich et al. (2017), which depend only on the norms of the weight matrices and the data points, the adversarial Rademacher complexity has a lower bound with an explicit dimension dependence, which is also an effect of the ℓ∞ attack.
- We further consider a surrogate adversarial loss for one-hidden-layer ReLU networks, based on the SDP relaxation proposed by Raghunathan et al. (2018a). We prove margin bounds using the surrogate loss and show that if the weight matrix of the first layer has bounded ℓ1 norm, the margin bound does not have explicit dimension dependence. This suggests that in the adversarial setting, controlling the ℓ1 norms of the weight matrices may be a way to improve generalization.
- We conduct experiments on linear classifiers and neural networks to validate our theoretical findings; more specifically, our experiments show that ℓ1 regularization can reduce the adversarial generalization error, and that the adversarial generalization gap increases as the dimension of the feature space increases.

Notation. We define the set [N] := {1, 2, . . . , N}. For two sets A and B, we denote by B^A the set of all functions from A to B. We denote the indicator function of an event A by 1(A). Unless otherwise stated, we denote vectors by boldface lowercase letters such as w, and the elements of a vector are denoted by italic letters with subscripts, such as w_k. All-one vectors are denoted by 1. Matrices are denoted by boldface uppercase letters such as W. For a matrix W ∈ R^{d×m} with columns w_i, i ∈ [m], the (p, q) matrix norm of W is defined as ‖W‖_{p,q} = ‖[‖w_1‖_p, ‖w_2‖_p, . . . , ‖w_m‖_p]‖_q, and we use the shorthand ‖·‖_p ≡ ‖·‖_{p,p}. We denote the spectral norm of a matrix by ‖·‖_σ and the Frobenius norm by ‖·‖_F (i.e., ‖·‖_F ≡ ‖·‖_{2,2}). We use B_x^∞(ε) to denote the ℓ∞ ball centered at x ∈ R^d with radius ε, i.e., B_x^∞(ε) = {x′ ∈ R^d : ‖x′ − x‖∞ ≤ ε}.

2. Problem Setup

We start with the general statistical learning framework. Let X and Y be the feature and label spaces, respectively, and suppose that there is an unknown distribution D over X × Y. In this paper, we assume that the feature space is a subset of the d-dimensional Euclidean space, i.e., X ⊆ R^d. Let F ⊆ V^X be the hypothesis class that we use to make predictions, where V is another space that might be different from Y. Let ℓ : V × Y → [0, B] be the loss function. Throughout this paper we assume that ℓ is bounded, i.e., B is a positive constant. In addition, we introduce the function class ℓ_F ⊆ [0, B]^{X×Y} by composing the functions in F with ℓ(·, y), i.e., ℓ_F := {(x, y) ↦ ℓ(f(x), y) : f ∈ F}. The goal of the learning problem is to find f ∈ F such that the population risk R(f) := E_{(x,y)∼D}[ℓ(f(x), y)] is minimized.
We consider the supervised learning setting where one has access to n i.i.d. training examples drawn according to D, denoted by (x_1, y_1), (x_2, y_2), . . . , (x_n, y_n). A learning algorithm maps the n training examples to a hypothesis f ∈ F. In this paper, we are interested in the gap between the empirical risk R_n(f) := (1/n) Σ_{i=1}^n ℓ(f(x_i), y_i) and the population risk R(f), known as the generalization error. Rademacher complexity (Bartlett & Mendelson, 2002) is one of the classic measures of generalization error. Here, we present its formal definition. For any function class H ⊆ R^Z, given a sample S = {z_1, z_2, . . . , z_n} of size n, the empirical Rademacher complexity is defined as

$$\mathfrak{R}_S(\mathcal{H}) := \mathbb{E}_{\sigma}\left[\sup_{h \in \mathcal{H}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i h(z_i)\right],$$

where σ_1, . . . , σ_n are i.i.d. Rademacher random variables with P{σ_i = 1} = P{σ_i = −1} = 1/2. In our learning problem, denote the training sample by S, i.e., S := {(x_1, y_1), (x_2, y_2), . . . , (x_n, y_n)}. We then have the following theorem, which connects the population and empirical risks via Rademacher complexity.

Theorem 1 (Bartlett & Mendelson, 2002; Mohri et al., 2012). Suppose that the range of ℓ(f(x), y) is [0, B]. Then, for any δ ∈ (0, 1), with probability at least 1 − δ, the following holds for all f ∈ F:

$$R(f) \le R_n(f) + 2B\,\mathfrak{R}_S(\ell_{\mathcal{F}}) + 3B\sqrt{\frac{\log(2/\delta)}{2n}}.$$

As we can see, Rademacher complexity measures the rate at which the empirical risk converges to the population risk uniformly across F. In fact, according to the antisymmetrization lower bound by Koltchinskii et al. (2006), one can show that, up to constant factors, sup_{f∈F} (R(f) − R_n(f)) is both upper and lower bounded by R_S(ℓ_F). Therefore, Rademacher complexity is a tight bound for the rate of uniform convergence of a loss function class, and in many settings can be a tight bound for the generalization error.

The above discussion can be extended to the adversarial setting. In this paper, we focus on the ℓ∞ adversarial attack. In this setting, the learning algorithm still has access to n i.i.d. uncorrupted training examples drawn according to D. Once the learning procedure finishes, the output hypothesis f is revealed to an adversary. For any data point (x, y) drawn according to D, the adversary is allowed to perturb x within some ℓ∞ ball to maximize the loss. Our goal is to minimize the adversarial population risk, i.e.,

$$\widetilde{R}(f) := \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\max_{x' \in \mathbb{B}_x^\infty(\epsilon)} \ell(f(x'), y)\right],$$

and to this end, a natural approach is to conduct adversarial training, minimizing the adversarial empirical risk

$$\widetilde{R}_n(f) := \frac{1}{n}\sum_{i=1}^{n}\ \max_{x_i' \in \mathbb{B}_{x_i}^\infty(\epsilon)} \ell(f(x_i'), y_i).$$

Let us define the adversarial loss ℓ̃(f(x), y) := max_{x′∈B_x^∞(ε)} ℓ(f(x′), y) and the function class ℓ̃_F ⊆ [0, B]^{X×Y} as ℓ̃_F := {(x, y) ↦ ℓ̃(f(x), y) : f ∈ F}. Since the range of ℓ̃(f(x), y) is still [0, B], we have the following direct corollary of Theorem 1.

Corollary 1. For any δ ∈ (0, 1), with probability at least 1 − δ, the following holds for all f ∈ F:

$$\widetilde{R}(f) \le \widetilde{R}_n(f) + 2B\,\mathfrak{R}_S(\widetilde{\ell}_{\mathcal{F}}) + 3B\sqrt{\frac{\log(2/\delta)}{2n}}.$$

As we can see, the Rademacher complexity of the adversarial loss function class ℓ̃_F, i.e., R_S(ℓ̃_F), is again the key quantity for the generalization ability of the learning problem. A natural problem of interest is to compare the Rademacher complexities in the natural and the adversarial settings, and to obtain generalization bounds for the adversarial loss.
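As an illustration of the definition (not a quantity computed in this paper), the empirical Rademacher complexity can be estimated by Monte Carlo over the Rademacher variables; restricting the supremum to a finite list of hypotheses is an assumption made purely so that it is computable.

```python
import numpy as np

def empirical_rademacher(sample, hypotheses, num_draws=1000, seed=0):
    """Monte Carlo estimate of R_S(H) = E_sigma[ sup_h (1/n) sum_i sigma_i h(z_i) ].
    `hypotheses` is a finite list of callables standing in for the class H; for an
    infinite class this only estimates the supremum over the chosen subset."""
    rng = np.random.default_rng(seed)
    n = len(sample)
    values = np.array([[h(z) for z in sample] for h in hypotheses])  # shape (|H|, n)
    total = 0.0
    for _ in range(num_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)   # i.i.d. Rademacher variables
        total += np.max(values @ sigma) / n       # sup over the (finite) class
    return total / num_draws
```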
3. Linear Classifiers

3.1. Binary Classification

We start with binary linear classifiers. In this setting, we define Y = {−1, +1}, and let the hypothesis class F ⊆ R^X be a set of linear functions of x ∈ X. More specifically, we define f_w(x) := ⟨w, x⟩ and consider prediction vectors w with an ℓp norm constraint (p ≥ 1), i.e.,

$$\mathcal{F} = \{f_w(x) : \|w\|_p \le W\}. \qquad (2)$$

We predict the label with the sign of f_w(x); more specifically, we assume that the loss function ℓ(f_w(x), y) can be written as ℓ(f_w(x), y) ≡ φ(y⟨w, x⟩), where φ : R → [0, B] is monotonically nonincreasing and L_φ-Lipschitz. In fact, if φ(0) ≥ 1, we can obtain a bound on the classification error according to Theorem 1. That is, with probability at least 1 − δ, for all f_w ∈ F,

$$\mathbb{P}_{(x,y)\sim\mathcal{D}}\{\mathrm{sgn}(f_w(x)) \neq y\} \le \frac{1}{n}\sum_{i=1}^{n} \ell(f_w(x_i), y_i) + 2B\,\mathfrak{R}_S(\ell_{\mathcal{F}}) + 3B\sqrt{\frac{\log(2/\delta)}{2n}}.$$

In addition, recall that according to the Ledoux-Talagrand contraction inequality (Ledoux & Talagrand, 2013), we have R_S(ℓ_F) ≤ L_φ R_S(F). For the adversarial setting, we have

$$\widetilde{\ell}(f_w(x), y) = \max_{x' \in \mathbb{B}_x^\infty(\epsilon)} \ell(f_w(x'), y) = \phi\Big(\min_{x' \in \mathbb{B}_x^\infty(\epsilon)} y\langle w, x'\rangle\Big).$$

Let us define the following function class F̃ ⊆ R^{X×{±1}}:

$$\widetilde{\mathcal{F}} = \Big\{(x, y) \mapsto \min_{x' \in \mathbb{B}_x^\infty(\epsilon)} y\langle w, x'\rangle : \|w\|_p \le W\Big\}. \qquad (3)$$

Again, we have R_S(ℓ̃_F) ≤ L_φ R_S(F̃). The first major contribution of our work is the following theorem, which provides a comparison between R_S(F) and R_S(F̃).

Theorem 2 (Main Result 1). Let F := {f_w(x) : ‖w‖_p ≤ W} and F̃ := {(x, y) ↦ min_{x′∈B_x^∞(ε)} y⟨w, x′⟩ : ‖w‖_p ≤ W}. Suppose that 1/p + 1/q = 1. Then, there exists a universal constant c ∈ (0, 1) such that

$$\max\Big\{\mathfrak{R}_S(\mathcal{F}),\ c\,\epsilon W \frac{d^{1/q}}{\sqrt{n}}\Big\} \le \mathfrak{R}_S(\widetilde{\mathcal{F}}) \le \mathfrak{R}_S(\mathcal{F}) + \epsilon W \frac{d^{1/q}}{\sqrt{n}}.$$

We prove Theorem 2 in Appendix B. We can see that the adversarial Rademacher complexity, i.e., R_S(F̃), is always at least as large as the Rademacher complexity in the natural setting. This implies that uniform convergence in the adversarial setting is at least as hard as that in the natural setting. In addition, since max{a, b} ≥ (a + b)/2, we have

$$\frac{c}{2}\Big(\mathfrak{R}_S(\mathcal{F}) + \epsilon W \frac{d^{1/q}}{\sqrt{n}}\Big) \le \mathfrak{R}_S(\widetilde{\mathcal{F}}) \le \mathfrak{R}_S(\mathcal{F}) + \epsilon W \frac{d^{1/q}}{\sqrt{n}}.$$

Therefore, we have a tight bound for R_S(F̃) up to a constant factor. Further, if p > 1, the adversarial Rademacher complexity has an unavoidable polynomial dimension dependence, i.e., R_S(F̃) is always at least of order εW d^{1/q}/√n. On the other hand, one can easily show that in the natural setting, R_S(F) = (W/n) E_σ[‖Σ_{i=1}^n σ_i x_i‖_q], which implies that R_S(F) depends on the distribution of the x_i and the norm constraint W, but does not have an explicit dimension dependence. This means that R_S(F̃) could be order-wise larger than R_S(F), depending on the distribution of the data. An interesting fact is that if we have an ℓ1 norm constraint on the prediction vector w, we can avoid the dimension dependence in R_S(F̃).
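For linear classifiers the inner minimization in (3) has a closed form: min_{x′∈B_x^∞(ε)} y⟨w, x′⟩ = y⟨w, x⟩ − ε‖w‖₁, attained at x′ = x − εy·sign(w). The sketch below evaluates this worst-case margin; the logistic surrogate is our illustrative choice and, unlike the bounded φ assumed above, is unbounded.

```python
import numpy as np

def adversarial_margin(w, x, y, eps):
    """Closed-form inner minimization for a binary linear classifier:
    min over the l_inf ball of radius eps of y * <w, x'>."""
    return y * np.dot(w, x) - eps * np.linalg.norm(w, ord=1)

def worst_case_point(w, x, y, eps):
    """A perturbation attaining the minimum above."""
    return x - eps * y * np.sign(w)

def adversarial_logistic_loss(w, x, y, eps):
    """phi(t) = log(1 + exp(-t)) evaluated at the worst-case margin; since phi is
    nonincreasing, maximizing the loss is the same as minimizing the margin."""
    return np.log1p(np.exp(-adversarial_margin(w, x, y, eps)))
```

Since ‖w‖₁ ≤ d^{1/q}‖w‖_p by Hölder's inequality, the extra ε‖w‖₁ term is exactly the source of the d^{1/q} factor in Theorem 2, and it disappears only under an ℓ1 constraint on w.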
3.2. Multi-class Classification

Margin Bounds for Multi-class Classification. We proceed to study multi-class linear classifiers. We start with the standard margin bound framework for multi-class classification. In K-class classification problems, we choose Y = [K], and the functions in the hypothesis class F map X to R^K, i.e., F ⊆ (R^K)^X. Intuitively, the k-th coordinate of f(x) is the score that f gives to the k-th class, and we make a prediction by choosing the class with the highest score. We define the margin operator M(z, y) : R^K × [K] → R as M(z, y) = z_y − max_{y′≠y} z_{y′}. For a training example (x, y), a hypothesis f makes a correct prediction if and only if M(f(x), y) > 0. We also define the function class M_F := {(x, y) ↦ M(f(x), y) : f ∈ F} ⊆ R^{X×[K]}. For multi-class classification problems, we consider a particular loss function ℓ(f(x), y) = φ_γ(M(f(x), y)), where γ > 0 and φ_γ : R → [0, 1] is the ramp loss:

$$\phi_\gamma(t) = \begin{cases} 1 & t \le 0, \\ 1 - t/\gamma & 0 < t < \gamma, \\ 0 & t \ge \gamma. \end{cases} \qquad (4)$$

One can check that ℓ(f(x), y) satisfies

$$\mathbb{1}\Big(y \neq \arg\max_{y' \in [K]} [f(x)]_{y'}\Big) \le \ell(f(x), y) \le \mathbb{1}\Big([f(x)]_y \le \gamma + \max_{y' \neq y} [f(x)]_{y'}\Big). \qquad (5)$$

Let S = {(x_i, y_i)}_{i=1}^n ∈ (X × [K])^n be the i.i.d. training examples, and define the function class ℓ_F := {(x, y) ↦ φ_γ(M(f(x), y)) : f ∈ F} ⊆ R^{X×[K]}. Since φ_γ(t) ∈ [0, 1] and φ_γ(·) is 1/γ-Lipschitz, by combining (5) with Theorem 1, we obtain the following direct corollary as the generalization bound in the multi-class setting (Mohri et al., 2012).

Corollary 2. Consider the above multi-class classification setting. For any fixed γ > 0, with probability at least 1 − δ, for all f ∈ F,

$$\mathbb{P}_{(x,y)\sim\mathcal{D}}\Big\{y \neq \arg\max_{y' \in [K]} [f(x)]_{y'}\Big\} \le \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\Big([f(x_i)]_{y_i} \le \gamma + \max_{y' \neq y_i} [f(x_i)]_{y'}\Big) + 2\mathfrak{R}_S(\ell_{\mathcal{F}}) + 3\sqrt{\frac{\log(2/\delta)}{2n}}.$$

In the adversarial setting, the adversary tries to maximize the loss ℓ(f(x), y) = φ_γ(M(f(x), y)) over an ℓ∞ ball centered at x. We have the adversarial loss ℓ̃(f(x), y) := max_{x′∈B_x^∞(ε)} ℓ(f(x′), y) and the function class ℓ̃_F := {(x, y) ↦ ℓ̃(f(x), y) : f ∈ F} ⊆ R^{X×[K]}. Thus, we have the following generalization bound in the adversarial setting.

Corollary 3. Consider the above adversarial multi-class classification setting. For any fixed γ > 0, with probability at least 1 − δ, for all f ∈ F,

$$\mathbb{P}_{(x,y)\sim\mathcal{D}}\Big\{\exists\, x' \in \mathbb{B}_x^\infty(\epsilon) \text{ s.t. } y \neq \arg\max_{y' \in [K]} [f(x')]_{y'}\Big\} \le \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\Big(\exists\, x_i' \in \mathbb{B}_{x_i}^\infty(\epsilon),\ [f(x_i')]_{y_i} \le \gamma + \max_{y' \neq y_i} [f(x_i')]_{y'}\Big) + 2\mathfrak{R}_S(\widetilde{\ell}_{\mathcal{F}}) + 3\sqrt{\frac{\log(2/\delta)}{2n}}.$$

Multi-class Linear Classifiers. We now focus on multi-class linear classifiers. For linear classifiers, each function in the hypothesis class is linearly parametrized by a matrix W ∈ R^{K×d}, i.e., f(x) ≡ f_W(x) = Wx. Let w_k ∈ R^d be the k-th column of W^⊤; then [f_W(x)]_k = ⟨w_k, x⟩. We assume that each w_k has ℓp norm (p ≥ 1) upper bounded by W, which implies that F = {f_W(x) : ‖W^⊤‖_{p,∞} ≤ W}. In the natural setting, we have the following margin bound for linear classifiers as a corollary of the multi-class margin bounds by Kuznetsov et al. (2015) and Maximov & Reshetova (2016).

Theorem 3. Consider the multi-class linear classifiers in the above setting, and suppose that 1/p + 1/q = 1, p, q ≥ 1. For any fixed γ > 0 and W > 0, with probability at least 1 − δ, for all W such that ‖W^⊤‖_{p,∞} ≤ W,

$$\mathbb{P}_{(x,y)\sim\mathcal{D}}\Big\{y \neq \arg\max_{y' \in [K]} \langle w_{y'}, x\rangle\Big\} \le \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\Big(\langle w_{y_i}, x_i\rangle \le \gamma + \max_{y' \neq y_i} \langle w_{y'}, x_i\rangle\Big) + \frac{2WK}{\gamma n}\sum_{y=1}^{K} \mathbb{E}_{\sigma}\Big\|\sum_{i=1}^{n} \sigma_i x_i \mathbb{1}(y_i = y)\Big\|_q + 3\sqrt{\frac{\log(2/\delta)}{2n}}.$$

We prove Theorem 3 in Appendix C.1 for completeness. In the adversarial setting, we have the following margin bound.

Theorem 4 (Main Result 2). Consider the multi-class linear classifiers in the adversarial setting, and suppose that 1/p + 1/q = 1, p, q ≥ 1. For any fixed γ > 0 and W > 0, with probability at least 1 − δ, for all W such that ‖W^⊤‖_{p,∞} ≤ W,

$$\mathbb{P}_{(x,y)\sim\mathcal{D}}\Big\{\exists\, x' \in \mathbb{B}_x^\infty(\epsilon) \text{ s.t. } y \neq \arg\max_{y' \in [K]} \langle w_{y'}, x'\rangle\Big\} \le \frac{1}{n}\sum_{i=1}^{n} E_i + \frac{2WK}{\gamma}\Big(\frac{\epsilon K d^{1/q}}{\sqrt{n}} + \frac{1}{n}\sum_{y=1}^{K} U_y\Big) + 3\sqrt{\frac{\log(2/\delta)}{2n}},$$

where

$$E_i = \mathbb{1}\Big(\langle w_{y_i}, x_i\rangle \le \gamma + \max_{y' \neq y_i}\big(\langle w_{y'}, x_i\rangle + \epsilon\|w_{y'} - w_{y_i}\|_1\big)\Big), \qquad U_y = \mathbb{E}_{\sigma}\Big\|\sum_{i=1}^{n} \sigma_i x_i \mathbb{1}(y_i = y)\Big\|_q.$$

We prove Theorem 4 in Appendix C.2. As we can see, similar to the binary classification problem, if p > 1 the margin bound in the adversarial setting has an explicit polynomial dependence on d, whereas in the natural setting the margin bound has no dimension dependence. This shows that, at least for the generalization upper bounds that we obtain, the dimension dependence in the adversarial setting also appears in multi-class classification problems.
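The quantity inside the indicator E_i of Theorem 4 is the worst-case multi-class margin over the ℓ∞ ball, which again has a closed form because the scores are linear in x. A minimal sketch of our own, with W holding the class weight vectors as rows:

```python
import numpy as np

def adversarial_multiclass_margin(W, x, y, eps):
    """Worst-case margin of f(x) = Wx over the l_inf ball of radius eps:
        min_{||x'-x||_inf <= eps} ( <w_y, x'> - max_{y' != y} <w_{y'}, x'> )
      = min_{y' != y} ( <w_y - w_{y'}, x> - eps * ||w_y - w_{y'}||_1 ).
    W has shape (K, d); row k is the weight vector of class k, y is an integer label."""
    diffs = W[y] - np.delete(W, y, axis=0)                    # rows w_y - w_{y'}
    return float(np.min(diffs @ x - eps * np.abs(diffs).sum(axis=1)))
```

A training example contributes to the empirical term of Theorem 4 exactly when this worst-case margin is at most γ.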
4. Neural Networks

We proceed to consider feedforward neural networks with ReLU activation. Here, each function f in the hypothesis class F is parametrized by a sequence of matrices W = (W_1, W_2, . . . , W_L), i.e., f ≡ f_W. Assume that W_h ∈ R^{d_h × d_{h−1}}, and let ρ(·) be the ReLU function, i.e., ρ(t) = max{t, 0} for t ∈ R. For vectors, ρ(x) is the vector generated by applying ρ(·) to each coordinate of x, i.e., [ρ(x)]_i = ρ(x_i). We have¹

$$f_W(x) = W_L\, \rho(W_{L-1}\, \rho(\cdots \rho(W_1 x) \cdots)).$$

For K-class classification, we have d_L = K, f_W(x) : R^d → R^K, and [f_W(x)]_k is the score for the k-th class. In the special case of binary classification, as discussed in Section 3.1, we can have Y = {+1, −1}, d_L = 1, and the loss function can be written as ℓ(f_W(x), y) = φ(y f_W(x)), where φ : R → [0, B] is monotonically nonincreasing and L_φ-Lipschitz.

¹This implies that d_0 ≡ d.

4.1. Comparison of Rademacher Complexity Bounds

We start with a comparison of the Rademacher complexities of neural networks in the natural and adversarial settings. Although naively applying the definition of Rademacher complexity may provide a loose generalization bound (Zhang et al., 2016a), when properly normalized by the margin, one can still derive generalization bounds that match experimental observations via Rademacher complexity (Bartlett et al., 2017). Our comparison shows that, when the weight matrices of the neural network have bounded norms, in the natural setting the Rademacher complexity is upper bounded by a quantity with only logarithmic dependence on the dimension; in the adversarial setting, however, the Rademacher complexity is lower bounded by a quantity with an explicit dimension dependence.

We focus on binary classification. For the natural setting, we review the results by Bartlett et al. (2017). Let S = {(x_i, y_i)}_{i=1}^n ∈ (X × {−1, +1})^n be the i.i.d. training examples, and define X := [x_1, x_2, · · · , x_n] ∈ R^{d×n} and d_max = max{d, d_1, d_2, . . . , d_L}.

Theorem 5 (Bartlett et al., 2017). Consider the neural network hypothesis class F = {f_W(x) : W = (W_1, W_2, . . . , W_L), ‖W_h‖_σ ≤ s_h, ‖W_h^⊤‖_{2,1} ≤ b_h, h ∈ [L]} ⊆ R^X. Then,

$$\mathfrak{R}_S(\mathcal{F}) \le \Big(\frac{4}{n^{3/2}} + \frac{26\log(n)\log(2 d_{\max})}{n}\Big) A, \quad \text{where } A = \|X\|_F \Big(\prod_{h=1}^{L} s_h\Big)\Big(\sum_{j=1}^{L} \Big(\frac{b_j}{s_j}\Big)^{2/3}\Big)^{3/2}.$$

On the other hand, in this work we prove the following result, which shows that when the product of the spectral norms of all the weight matrices is bounded, the Rademacher complexity of the adversarial loss function class is lower bounded by a quantity with an explicit √d factor. More specifically, for binary classification problems, since

$$\widetilde{\ell}(f_W(x), y) = \max_{x' \in \mathbb{B}_x^\infty(\epsilon)} \ell(f_W(x'), y) = \phi\Big(\min_{x' \in \mathbb{B}_x^\infty(\epsilon)} y f_W(x')\Big),$$

and φ(·) is Lipschitz, we consider the function class

$$\widetilde{\mathcal{F}} = \Big\{(x, y) \mapsto \min_{x' \in \mathbb{B}_x^\infty(\epsilon)} y f_W(x') : W = (W_1, W_2, . . . , W_L),\ \prod_{h=1}^{L} \|W_h\|_\sigma \le r\Big\} \subseteq \mathbb{R}^{\mathcal{X}\times\{\pm 1\}}. \qquad (6)$$

Then we have the following result.

Theorem 6 (Main Result 3). Let F̃ be defined as in (6). Then, there exists a universal constant c > 0 such that

$$\mathfrak{R}_S(\widetilde{\mathcal{F}}) \ge c\, r\, \epsilon \sqrt{\frac{d}{n}}.$$

We prove Theorem 6 in Appendix D.1. This result shows that if we aim to study the Rademacher complexity of the function class defined in (6), a √d dimension dependence may be unavoidable, in contrast to the natural setting, where the dimension dependence is only logarithmic.

4.2. Generalization Bound for Surrogate Adversarial Loss

For neural networks, even with only one hidden layer, evaluating the adversarial loss ℓ̃(f_W(x), y) = max_{x′∈B_x^∞(ε)} ℓ(f_W(x′), y) for a particular data point (x, y) can be hard, since it requires maximizing a non-concave function over a bounded set. A recent line of work tries to find upper bounds for ℓ̃(f_W(x), y) that can be computed in polynomial time. More specifically, we would like to find a surrogate adversarial loss ℓ̂(f_W(x), y) such that ℓ̂(f_W(x), y) ≥ ℓ̃(f_W(x), y) for all x, y, and W. Such surrogate adversarial losses can thus provide certified defenses against adversarial examples, and can be computed efficiently.
In addition, the surrogate adversarial loss ℓ̂(f_W(x), y) should be as tight as possible: it should be close enough to the original adversarial loss ℓ̃(f_W(x), y) so that the surrogate adversarial loss can indeed represent the robustness of the model against adversarial attacks. Recently, a few approaches to designing surrogate adversarial losses have been developed; SDP relaxation (Raghunathan et al., 2018a;b) and LP relaxation (Kolter & Wong, 2017; Wong et al., 2018) are two major examples. In this section, we focus on the SDP relaxation for one-hidden-layer neural networks with ReLU activation proposed by Raghunathan et al. (2018a). We prove a generalization bound for the surrogate adversarial loss, and show that this generalization bound has no explicit dimension dependence if the weight matrix of the first layer has bounded ℓ1 norm.

We consider K-class classification problems in this section (i.e., Y = [K]), and start with the definition and properties of the SDP surrogate loss. Since we only have one hidden layer, f_W(x) = W_2 ρ(W_1 x). Let w_{2,k} be the k-th column of W_2^⊤. Then, we have the following result according to Raghunathan et al. (2018a).

Theorem 7 (Raghunathan et al., 2018a). For any (x, y), W_1, W_2, and y′ ≠ y,

$$\max_{x' \in \mathbb{B}_x^\infty(\epsilon)} \big([f_W(x')]_{y'} - [f_W(x')]_y\big) \le [f_W(x)]_{y'} - [f_W(x)]_y + \frac{\epsilon}{4}\max_{P \succeq 0,\ \mathrm{diag}(P) \le 1} \big\langle Q(w_{2,y'} - w_{2,y}, W_1), P\big\rangle,$$

where

$$Q(v, W) := \begin{pmatrix} 0 & 0 & \mathbf{1}^\top W^\top \mathrm{diag}(v) \\ 0 & 0 & W^\top \mathrm{diag}(v) \\ \mathrm{diag}(v) W \mathbf{1} & \mathrm{diag}(v) W & 0 \end{pmatrix}.$$

Since we consider multi-class classification problems in this section, we use the ramp loss φ_γ defined in (4) composed with the margin operator as our loss function. Thus, ℓ(f_W(x), y) = φ_γ(M(f_W(x), y)) and ℓ̃(f_W(x), y) = max_{x′∈B_x^∞(ε)} φ_γ(M(f_W(x′), y)). Here, we design a surrogate loss ℓ̂(f_W(x), y) based on Theorem 7.

Lemma 1. Define

$$\widehat{\ell}(f_W(x), y) := \phi_\gamma\Big(M(f_W(x), y) - \frac{\epsilon}{2}\max_{k \in [K],\, z = \pm 1}\ \max_{P \succeq 0,\ \mathrm{diag}(P) \le 1} z\,\big\langle Q(w_{2,k}, W_1), P\big\rangle\Big).$$

Then, we have

$$\max_{x' \in \mathbb{B}_x^\infty(\epsilon)} \mathbb{1}\Big(y \neq \arg\max_{y' \in [K]} [f_W(x')]_{y'}\Big) \le \widehat{\ell}(f_W(x), y) \le \mathbb{1}\Big(M(f_W(x), y) \le \frac{\epsilon}{2}\max_{k \in [K],\, z = \pm 1}\ \max_{P \succeq 0,\ \mathrm{diag}(P) \le 1} z\,\big\langle Q(w_{2,k}, W_1), P\big\rangle + \gamma\Big).$$

We prove Lemma 1 in Appendix D.2. With this surrogate adversarial loss in hand, we can develop the following margin bound for adversarial generalization. In this theorem, we use X = [x_1, x_2, · · · , x_n] ∈ R^{d×n} and d_max = max{d, d_1, K}.

Theorem 8 (Main Result 4). Consider the neural network hypothesis class F = {f_W(x) : W = (W_1, W_2), ‖W_h‖_σ ≤ s_h, h = 1, 2, ‖W_1‖_1 ≤ b_1, ‖W_2^⊤‖_{2,1} ≤ b_2}. Then, for any fixed γ > 0, with probability at least 1 − δ, for all f_W(·) ∈ F,

$$\mathbb{P}_{(x,y)\sim\mathcal{D}}\Big\{\exists\, x' \in \mathbb{B}_x^\infty(\epsilon) \text{ s.t. } y \neq \arg\max_{y' \in [K]} [f_W(x')]_{y'}\Big\} \le \frac{1}{n}\sum_{i=1}^{n} E_i + \frac{2}{\gamma}\Big(\frac{4}{n^{3/2}} + \frac{60\log(n)\log(2 d_{\max})}{n}\Big) A + 3\sqrt{\frac{\log(2/\delta)}{2n}},$$

where

$$E_i = \mathbb{1}\Big([f_W(x_i)]_{y_i} \le \gamma + \max_{y' \neq y_i} [f_W(x_i)]_{y'} + \frac{\epsilon}{2}\max_{k \in [K],\, z = \pm 1}\ \max_{P \succeq 0,\ \mathrm{diag}(P) \le 1} z\,\big\langle Q(w_{2,k}, W_1), P\big\rangle\Big), \qquad A = s_1 s_2 \Big(\Big(\frac{b_1}{s_1}\Big)^{2/3} + \Big(\frac{b_2}{s_2}\Big)^{2/3}\Big)^{3/2} \|X\|_F.$$

We prove Theorem 8 in Appendix D.3. Similar to linear classifiers, in the adversarial setting, if we have an ℓ1 norm constraint on the matrix W_1, then the generalization bound of the surrogate adversarial loss does not have an explicit dimension dependence.
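The inner semidefinite program in Theorem 7 and Lemma 1 can be evaluated with an off-the-shelf solver. Below is a sketch using cvxpy, assembling the block matrix Q(v, W) as written in Theorem 7; it is an illustration of the relaxation under those assumptions, not the authors' implementation.

```python
import numpy as np
import cvxpy as cp

def sdp_surrogate_term(v, W1):
    """Evaluate max_{P >= 0, diag(P) <= 1} <Q(v, W1), P> from Theorem 7.
    v: length-m vector (e.g., w_{2,y'} - w_{2,y}); W1: (m, d) first-layer weights."""
    m, d = W1.shape
    dvW = np.diag(v) @ W1                    # diag(v) W1, shape (m, d)
    N = 1 + d + m
    Q = np.zeros((N, N))
    Q[0, 1 + d:] = dvW @ np.ones(d)          # block 1^T W1^T diag(v)
    Q[1:1 + d, 1 + d:] = dvW.T               # block W1^T diag(v)
    Q = Q + Q.T                              # fill the symmetric lower blocks
    P = cp.Variable((N, N), PSD=True)
    problem = cp.Problem(cp.Maximize(cp.sum(cp.multiply(Q, P))), [cp.diag(P) <= 1])
    problem.solve()
    return problem.value
```

Plugging v = w_{2,k} (and its negation, corresponding to z = ±1) into this routine gives the penalty subtracted from the margin in Lemma 1.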
5. Experiments

In this section, we validate our theoretical findings for linear classifiers and neural networks via experiments. Our experiments are implemented with TensorFlow (Abadi et al., 2016) on the MNIST dataset (LeCun et al., 1998).

5.1. Linear Classifiers

We validate two theoretical findings for linear classifiers: (i) controlling the ℓ1 norm of the model parameters can reduce the adversarial generalization error, and (ii) there is a dimension dependence in adversarial generalization, i.e., adversarially robust generalization is harder when the dimension of the feature space is higher. We train the multi-class linear classifier using the following objective function:

$$\frac{1}{n}\sum_{i=1}^{n}\ \max_{x_i' \in \mathbb{B}_{x_i}^\infty(\epsilon)} \ell(f_W(x_i'), y_i) + \lambda \|W\|_1, \qquad (8)$$

where ℓ(·) is the cross-entropy loss and f_W(x) ≡ Wx. Since we focus on the generalization property, we use a small number of training data points so that the generalization gap is more significant. More specifically, in each run of the training algorithm, we randomly sample n = 1000 data points from the training set of MNIST as the training data, and run adversarial training to minimize the objective (8). Our training algorithm alternates between mini-batch stochastic gradient descent with respect to W and computing adversarial examples on the chosen batch in each iteration. Here, we note that since we consider linear classifiers, the adversarial examples can be computed analytically according to Appendix C.2.

In our first experiment, we vary the values of ε and λ, and for each (ε, λ) pair we conduct 10 runs of the training algorithm, sampling the 1000 training data points independently in each run. In Figure 1, we plot the adversarial generalization error as a function of ε and λ; the error bars show the standard deviation over the 10 runs. As we can see, when λ increases, the generalization gap decreases, and thus we conclude that ℓ1 regularization is helpful for reducing the adversarial generalization error.

Figure 1. Linear classifiers: adversarial generalization error (train accuracy minus test accuracy) vs. ℓ∞ perturbation ε, for regularization coefficients λ = 0, 0.001, 0.002.

In our second experiment, we choose λ = 0 and study the dependence of the adversarial generalization error on the dimension of the feature space. Recall that each data point in the original MNIST dataset is a 28×28 image, i.e., d = 784. We construct two additional image datasets with d = 196 (downsampled) and d = 3136 (expanded), respectively. To construct the downsampled image, we replace each 2×2 patch of the original image, say with pixel values a, b, c, d, with a single pixel of value √(a² + b² + c² + d²). To construct the expanded image, we replace each pixel of the original image, say with value a, with a 2×2 patch in which each pixel has value a/2. This construction keeps the ℓ2 norm of every single image the same across the three datasets, and thus leads to a fair comparison. The adversarial generalization error is plotted in Figure 2; as we can see, when the dimension d increases, the generalization gap also increases.

Figure 2. Linear classifiers: adversarial generalization error (train accuracy minus test accuracy) vs. ℓ∞ perturbation ε, for feature space dimensions d = 196, 784, 3136.
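For reference, a short sketch of the ℓ2-norm-preserving resampling described above (our own illustration of the construction, assuming square images with even side length):

```python
import numpy as np

def downsample_l2(img):
    """Map each 2x2 patch (a, b, c, d) to one pixel sqrt(a^2 + b^2 + c^2 + d^2),
    preserving the image's l_2 norm (28x28 -> 14x14, i.e., d = 196)."""
    h, w = img.shape
    patches = img.reshape(h // 2, 2, w // 2, 2)
    return np.sqrt((patches ** 2).sum(axis=(1, 3)))

def expand_l2(img):
    """Replace each pixel a by a 2x2 patch of value a/2, preserving the image's
    l_2 norm (28x28 -> 56x56, i.e., d = 3136)."""
    return np.kron(img, np.full((2, 2), 0.5))
```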
5.2. Neural Networks

In this experiment, we validate our theoretical result that ℓ1 regularization can reduce the adversarial generalization error on a four-layer ReLU neural network, where the first two layers are convolutional and the last two layers are fully connected. We use PGD-based adversarial training (Madry et al., 2017) to minimize the ℓ1-regularized objective (8). We use the whole training set of MNIST, and once the model is obtained, we use the PGD attack to evaluate the adversarial training and test errors. We present the adversarial generalization errors under the PGD attack in Figure 3. As we can see, the adversarial generalization error decreases as we increase the regularization coefficient λ; thus, ℓ1 regularization indeed reduces the adversarial generalization error under the PGD attack.

Figure 3. Neural networks: adversarial generalization error (train accuracy minus test accuracy) vs. regularization coefficient λ, for ℓ∞ perturbations ε = 0.05 and ε = 0.2.
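A rough sketch of how such an adversarial generalization gap can be measured; here a single-step sign attack stands in for the multi-step PGD attack actually used above, and `loss_grad_x` and `predict` are hypothetical model interfaces.

```python
import numpy as np

def adversarial_accuracy(data, loss_grad_x, predict, eps):
    """Accuracy under a one-step l_inf sign attack (a cheap stand-in for PGD)."""
    correct = 0
    for x, y in data:
        x_adv = np.clip(x + eps * np.sign(loss_grad_x(x, y)), x - eps, x + eps)
        correct += int(predict(x_adv) == y)
    return correct / len(data)

def adversarial_gap(train_data, test_data, loss_grad_x, predict, eps):
    """Adversarial train accuracy minus adversarial test accuracy (the y-axis of Figure 3)."""
    return (adversarial_accuracy(train_data, loss_grad_x, predict, eps)
            - adversarial_accuracy(test_data, loss_grad_x, predict, eps))
```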
6. Related Work

During the preparation of the initial draft of this paper, we became aware of another independent and concurrent work by Khim & Loh (2018), which studies a similar problem. In this section, we first compare our work with Khim & Loh (2018) and then discuss other related work. We make the comparison in the following aspects.

- For binary classification problems, the adversarial Rademacher complexity upper bound by Khim & Loh (2018) is similar to ours. However, we provide an adversarial Rademacher complexity lower bound that matches the upper bound. Our lower bound shows that the adversarial Rademacher complexity is never smaller than that in the natural setting, indicating the hardness of adversarially robust generalization. As mentioned, although our lower bound is for Rademacher complexity rather than generalization, Rademacher complexity is a tight bound for the rate of uniform convergence of a loss function class (Koltchinskii et al., 2006) and thus in many settings can be a tight bound for generalization. In addition, we provide a lower bound for the adversarial Rademacher complexity of neural networks. These lower bounds do not appear in the work by Khim & Loh (2018).
- We discuss generalization bounds for the multi-class setting, whereas Khim & Loh (2018) focus only on binary classification.
- Both our work and Khim & Loh (2018) prove adversarial generalization bounds using surrogate adversarial losses (upper bounds for the actual adversarial loss). Khim & Loh (2018) use a method called tree transform, whereas we use the SDP relaxation proposed by Raghunathan et al. (2018a). These two approaches are based on different ideas and thus we believe that they are not directly comparable.

We proceed to discuss other related work.

Adversarially robust generalization. As discussed in Section 1, it has been observed by Madry et al. (2017) that there might be a significant generalization gap when training deep neural networks in the adversarial setting. This generalization problem has been further studied by Schmidt et al. (2018), who show that to correctly classify two separated d-dimensional spherical Gaussian distributions, in the natural setting one only needs O(1) training data, but in the adversarial setting one needs Θ(√d) data. Obtaining distribution-agnostic generalization bounds (also known as the PAC-learning framework) for the adversarial setting is proposed as an open problem by Schmidt et al. (2018). In a subsequent work, Cullina et al. (2018) study PAC-learning guarantees for binary linear classifiers in the adversarial setting via VC-dimension, and show that the VC-dimension does not increase in the adversarial setting. This result does not provide an explanation for the empirical observation that adversarially robust generalization may be hard. In fact, although VC-dimension and Rademacher complexity can both provide valid generalization bounds, VC-dimension usually depends on the number of parameters in the model, while Rademacher complexity usually depends on the norms of the weight matrices and data points, and can often provide tighter generalization bounds (Bartlett, 1998). Suggala et al. (2018) discuss a similar notion of adversarial risk but do not prove explicit generalization bounds. Attias et al. (2018) prove adversarial generalization bounds in a setting where the number of potential adversarial perturbations is finite, which is a weaker notion than the ℓ∞ attack that we consider.

Provable defense against adversarial attacks. Besides the generalization property, another recent line of work aims to design provable defenses against adversarial attacks. Two examples of provable defense are SDP relaxation (Raghunathan et al., 2018a;b) and LP relaxation (Kolter & Wong, 2017; Wong et al., 2018). The idea of these methods is to construct upper bounds on the adversarial risk that can be efficiently evaluated and optimized. The analyses of these algorithms usually focus on minimizing training error and do not have generalization guarantees; in contrast, we focus on the generalization property in this paper.

Generalization of neural networks. Generalization of neural networks has been an important topic, even in the natural setting where there is no adversary. The key challenge is to understand why deep neural networks can generalize to unseen data despite the high capacity of the model class. The problem has received attention since the early stage of neural network study (Bartlett, 1998; Anthony & Bartlett, 1999). Recently, understanding the generalization of deep nets was raised as an open problem, since traditional techniques such as VC-dimension, Rademacher complexity, and algorithmic stability were observed to produce vacuous generalization bounds (Zhang et al., 2016a). Progress has been made more recently. In particular, it has been shown that when properly normalized by the margin, using Rademacher complexity or PAC-Bayesian analysis, one can obtain generalization bounds that tend to match the experimental results (Bartlett et al., 2017; Neyshabur et al., 2017; Arora et al., 2018; Golowich et al., 2017). In addition, in this paper, we show that when the weight vectors or matrices have bounded ℓ1 norm, there is no dimension dependence in the adversarial generalization bound. This result is consistent with, and related to, several previous works (Lee et al., 1996; Bartlett, 1998; Mei et al., 2018; Zhang et al., 2016b).

A few other lines of work have conducted theoretical analysis of adversarial examples (Wang et al., 2017; Bubeck et al., 2018; Gilmer et al., 2018b; Dohmatob, 2018; Mahloujifar et al., 2018). We provide additional discussions on related work in Appendix A.

Acknowledgements

D. Yin is partially supported by the Berkeley DeepDrive Industry Consortium. K. Ramchandran is partially supported by NSF CIF award 1703678. P. Bartlett is partially supported by NSF grant IIS-1619362. The authors would like to thank Justin Gilmer for helpful discussions and the anonymous reviewers for their comments. Cloud computing resources are provided by AWS Cloud Credits for Research.

References

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al. TensorFlow: A system for large-scale machine learning. In OSDI, volume 16, pp. 265–283, 2016.
Anthony, M. and Bartlett, P. L. Neural network learning: Theoretical foundations. Cambridge University Press, 1999.

Arora, S., Ge, R., Neyshabur, B., and Zhang, Y. Stronger generalization bounds for deep nets via a compression approach. arXiv preprint arXiv:1802.05296, 2018.

Athalye, A., Carlini, N., and Wagner, D. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.

Attias, I., Kontorovich, A., and Mansour, Y. Improved generalization bounds for robust learning. arXiv preprint arXiv:1810.02180, 2018.

Bahdanau, D., Cho, K., and Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.

Bartlett, P. L. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2):525–536, 1998.

Bartlett, P. L. and Mendelson, S. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463–482, 2002.

Bartlett, P. L., Foster, D. J., and Telgarsky, M. J. Spectrally-normalized margin bounds for neural networks. In Advances in Neural Information Processing Systems, 2017.

Ben-Tal, A., El Ghaoui, L., and Nemirovski, A. Robust optimization. Princeton University Press, 2009.

Bubeck, S., Price, E., and Razenshteyn, I. Adversarial examples from computational constraints. arXiv preprint arXiv:1805.10204, 2018.

Carlini, N. and Wagner, D. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016.

Carlini, N. and Wagner, D. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14. ACM, 2017.

Carlini, N. and Wagner, D. Audio adversarial examples: Targeted attacks on speech-to-text. arXiv preprint arXiv:1801.01944, 2018.

Cullina, D., Bhagoji, A. N., and Mittal, P. PAC-learning in the presence of evasion adversaries. arXiv preprint arXiv:1806.01471, 2018.

Dohmatob, E. Limitations of adversarial robustness: strong no free lunch theorem. arXiv preprint arXiv:1810.04065, 2018.

Engstrom, L., Tsipras, D., Schmidt, L., and Madry, A. A rotation and a translation suffice: Fooling CNNs with simple transformations. arXiv preprint arXiv:1712.02779, 2017.

Gilmer, J., Adams, R. P., Goodfellow, I., Andersen, D., and Dahl, G. E. Motivating the rules of the game for adversarial example research. arXiv preprint arXiv:1807.06732, 2018a.

Gilmer, J., Metz, L., Faghri, F., Schoenholz, S. S., Raghu, M., Wattenberg, M., and Goodfellow, I. Adversarial spheres. arXiv preprint arXiv:1801.02774, 2018b.

Golowich, N., Rakhlin, A., and Shamir, O. Size-independent sample complexity of neural networks. arXiv preprint arXiv:1712.06541, 2017.

Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

Graves, A., Mohamed, A.-r., and Hinton, G. Speech recognition with deep recurrent neural networks. In International Conference on Acoustics, Speech, and Signal Processing. IEEE, 2013.

Gu, S. and Rigazio, L. Towards deep neural network architectures robust to adversarial examples. arXiv preprint arXiv:1412.5068, 2014.

He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Computer Vision and Pattern Recognition, 2016.
Huang, R., Xu, B., Schuurmans, D., and Szepesvári, C. Learning with a strong adversary. arXiv preprint arXiv:1511.03034, 2015.

Khim, J. and Loh, P.-L. Adversarial risk bounds for binary classification via function transformation. arXiv preprint arXiv:1810.09519, 2018.

Koltchinskii, V. et al. Local Rademacher complexities and oracle inequalities in risk minimization. The Annals of Statistics, 34(6):2593–2656, 2006.

Kolter, J. Z. and Wong, E. Provable defenses against adversarial examples via the convex outer adversarial polytope. arXiv preprint arXiv:1711.00851, 2017.

Kos, J., Fischer, I., and Song, D. Adversarial examples for generative models. In 2018 IEEE Security and Privacy Workshops (SPW), pp. 36–42. IEEE, 2018.

Kuznetsov, V., Mohri, M., and Syed, U. Rademacher complexity margin bounds for learning with a large number of classes. In ICML Workshop on Extreme Classification: Learning with a Very Large Number of Labels, 2015.

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

Ledoux, M. and Talagrand, M. Probability in Banach Spaces: isoperimetry and processes. Springer Science & Business Media, 2013.

Lee, W. S., Bartlett, P. L., and Williamson, R. C. Efficient agnostic learning of neural networks with bounded fan-in. IEEE Transactions on Information Theory, 42(6):2118–2132, 1996.

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.

Mahloujifar, S., Diochnos, D. I., and Mahmoody, M. The curse of concentration in robust learning: Evasion and poisoning attacks from concentration of measure. arXiv preprint arXiv:1809.03063, 2018.

Maximov, Y. and Reshetova, D. Tight risk bounds for multiclass margin classifiers. Pattern Recognition and Image Analysis, 26(4):673–680, 2016.

Mei, S., Montanari, A., and Nguyen, P.-M. A mean field view of the landscape of two-layers neural networks. arXiv preprint arXiv:1804.06561, 2018.

Mohri, M., Rostamizadeh, A., and Talwalkar, A. Foundations of machine learning. MIT Press, 2012.

Neyshabur, B., Bhojanapalli, S., McAllester, D., and Srebro, N. A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks. arXiv preprint arXiv:1707.09564, 2017.

Raghunathan, A., Steinhardt, J., and Liang, P. Certified defenses against adversarial examples. arXiv preprint arXiv:1801.09344, 2018a.

Raghunathan, A., Steinhardt, J., and Liang, P. S. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, 2018b.

Schmidt, L., Santurkar, S., Tsipras, D., Talwar, K., and Madry, A. Adversarially robust generalization requires more data. arXiv preprint arXiv:1804.11285, 2018.

Shaham, U., Yamada, Y., and Negahban, S. Understanding adversarial training: Increasing local stability of neural nets through robust optimization. arXiv preprint arXiv:1511.05432, 2015.

Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484, 2016.

Suggala, A. S., Prasad, A., Nagarajan, V., and Ravikumar, P. On adversarial risk and training. arXiv preprint arXiv:1806.02924, 2018.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

Wang, Y., Jha, S., and Chaudhuri, K. Analyzing the robustness of nearest neighbors to adversarial examples. arXiv preprint arXiv:1706.03922, 2017.

Wong, E., Schmidt, F., Metzen, J. H., and Kolter, J. Z. Scaling provable adversarial defenses. arXiv preprint arXiv:1805.12514, 2018.

Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530, 2016a.

Zhang, Y., Lee, J. D., and Jordan, M. I. ℓ1-regularized neural networks are improperly learnable in polynomial time. In International Conference on Machine Learning, 2016b.