Fundamental Tradeoffs in Distributionally Adversarial Training

Mohammad Mehrabi*1, Adel Javanmard*1, Ryan A. Rossi2, Anup B. Rao2, Tung Mai2

*Equal contribution. 1Department of Data Sciences and Operations, USC Marshall School of Business, University of Southern California, USA; 2Adobe Research, USA. Correspondence to: Adel Javanmard. Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).

Abstract

Adversarial training is among the most effective techniques to improve the robustness of models against adversarial perturbations. However, the full effect of this approach on models is not well understood. For example, while adversarial training can reduce the adversarial risk (prediction error against an adversary), it sometimes increases the standard risk (generalization error when there is no adversary). In this paper, we focus on the distribution-perturbing adversary framework, wherein the adversary can change the test distribution within a neighborhood of the training data distribution. The neighborhood is defined via the Wasserstein distance between distributions, and the radius of the neighborhood is a measure of the adversary's manipulative power. We study the tradeoff between standard risk and adversarial risk and derive the Pareto-optimal tradeoff, achievable over specific classes of models, in the infinite-data limit with the feature dimension kept fixed. We consider three learning settings: 1) regression with the class of linear models; 2) binary classification under the Gaussian mixtures data model, with the class of linear classifiers; 3) regression with the class of random features models (which can be equivalently represented as two-layer neural networks with random first-layer weights). We show that a tradeoff between standard and adversarial risk is manifested in all three settings. We further characterize the Pareto-optimal tradeoff curves and discuss how a variety of factors, such as features correlation, the adversary's power, or the width of the two-layer neural network, affect this tradeoff.

1. Introduction

Modern machine learning algorithms, and in particular deep neural networks, have demonstrated breakthrough empirical performance, and have been deployed in a multitude of application domains ranging from visual object classification to speech recognition, robotics, natural language processing, and healthcare. The common practice to train these models is empirical loss minimization on the training data. Nonetheless, it has been observed that the resulting models are surprisingly vulnerable to minute discrepancies between the test and the training data distributions. There are several well-documented examples of such behavior in computer vision and image processing, where small imperceptible manipulations of images can significantly compromise the performance of state-of-the-art classifiers (Szegedy et al., 2014; Biggio et al., 2013). Other examples include well-designed malicious content such as malware that can be labeled legitimate by the classifier and harm the system (Chen et al., 2017; Papernot et al., 2017), or adversarial attacks on speech recognition systems, such as Google Now or Siri, which consist of voice commands that are incomprehensible or even completely inaudible to humans and can still control these systems (Carlini et al., 2016; Vaidya et al., 2015; Zhang et al., 2017).
It is evident that in practice such vulnerability can have catastrophic consequences. By studying adversarial samples, one can in turn improve the robustness of machine learning algorithms against adversarial attacks. In the past few years, there has been significant research on generating various adversarial samples (Carlini & Wagner, 2017; Athalye et al., 2018; Goodfellow et al., 2015b; Papernot et al., 2016a) and defenses (Madry et al., 2018a; Cisse et al., 2017; Papernot et al., 2016b). Among the considerable effort to improve the adversarial robustness of algorithms, adversarial training is one of the most effective techniques. Adversarial training is often formulated as a minimax optimization problem, where the inner maximization aims to find an adversarial example that maximizes a predictive loss function, while the outer minimization aims to train a robust estimator using the generated adversarial examples (Goodfellow et al., 2015a; Kurakin et al., 2016; Madry et al., 2018a; Raghunathan et al., 2018; Wong & Kolter, 2018). While adversarial training techniques have been successful in improving the adversarial robustness of models, their full effect on machine learning systems is not well understood. In particular, some studies (Madry et al., 2018a) observed that the robustness virtue of adversarial training comes at the cost of worsening the performance on natural unperturbed inputs, i.e., increasing the generalization error. However, (Tsipras et al., 2018) observes empirically that when there are very few training data, adversarial training can help reduce the generalization error. Complicating matters further, (Raghunathan et al., 2019) argues that additional unlabeled data can mitigate the tension between adversarial risk (predictive performance against adversarial perturbations) and standard risk (predictive performance when there is no adversary, a.k.a. generalization error). These observations raise the following important questions regarding adversarial training: Is there a fundamental tradeoff between adversarial risk and standard risk? Or do there exist models that are optimal with respect to both of these measures? What are the roles of different factors, such as the adversary's power, the problem dimension, and the complexity of the model class (e.g., number of neurons) in the interplay between standard risk and adversarial risk? Here, by fundamental tradeoff we mean a tradeoff that holds given unlimited computational power and infinite training data to train a model. In this work, we answer these questions for adversarial distribution shifts, where the adversary can shift the test data distribution, making it different from the training data distribution. The test data distribution can be an arbitrary but fixed distribution in a neighborhood of the training data distribution, and the radius of this neighborhood is a measure of the adversary's power.

Contributions. We next summarize our contributions in this paper: We characterize the fundamental tradeoff between standard risk and adversarial risk for distributionally adversarial training in the settings of linear regression and binary classification (under a Gaussian mixtures model). We focus on the infinite data limit (n → ∞) with finite feature dimension (d), and hence our analysis is at the population level.
The fundamental tradeoff is characterized by studying the Pareto optimal fronts of the achievable region in the two-dimensional (standard risk, adversarial risk) plane. The Pareto optimal front consists of the set of estimators for which one cannot decrease both standard and adversarial risk by deviating from these estimators. Similar tradeoffs have been derived for the linear regression setting with norm-bounded adversarial perturbations and isotropic Gaussian features (Javanmard et al., 2020). Here we focus on distribution-perturbing adversaries and consider general anisotropic Gaussian features. For binary classification we consider a Gaussian mixtures model with general feature covariance and a distribution-perturbing adversary, where the perturbation is measured in terms of the Wasserstein metric with a general ℓr norm. (We refer to Sections 2.2 and 3.2 for further details and formal definitions.) Our analysis shows how the fundamental tradeoff between standard and adversarial risk is impacted by a variety of factors, such as the adversary's power, feature dimension, features correlation, and the choice of ℓr perturbation norm. An interesting observation is that for r = 2 the tradeoff between standard and adversarial risk vanishes. In other words, there exists a model that achieves both the optimal standard risk and the optimal adversarial risk. We also study the Pareto optimal tradeoffs between standard and adversarial risks for the problem of learning an unknown function over the d-dimensional sphere using the random features model. This can be represented as a linear model with N random nonlinear features of the form σ(w_a^T x), 1 ≤ a ≤ N, with σ(·) a nonlinear activation. Equivalently, this can be characterized as fitting a two-layer neural network with random first-layer weights. Building upon an approximation formula for the adversarial risk, we study the effect of the network width N on the tradeoff between standard and adversarial risks.

1.1. Further related work

Very recent work (Javanmard & Soltanolkotabi, 2020; Taheri et al., 2020) has focused on binary classification under the Gaussian mixtures model and proposed a precise characterization of the standard and adversarial risk achieved by a specific class of adversarial training approaches (Tsipras et al., 2018; Madry et al., 2018b). These works consider an asymptotic regime where the sample size grows in proportion to the problem dimension d and focus on norm-bounded adversarial perturbations. In comparison, we consider a fixed-d, infinite-n setting and a distribution-perturbing adversary. Also, we focus on fundamental tradeoffs achieved by any linear classifier, while (Javanmard & Soltanolkotabi, 2020; Taheri et al., 2020) work with a specific class of saddle point estimators. The other work (Dobriban et al., 2020) also considers norm-bounded adversarial perturbations for the classification problem and studies the optimal ℓ2 and ℓ∞ robust linear classifiers assuming access to the class averages. Furthermore, it also studies the tradeoff between standard and robust accuracies from a Bayesian perspective by contrasting this optimal robust classifier with the Bayes optimal classifier in a non-adversarial setting.

2. Problem formulation

In a classic supervised learning setting, a learner is given n pairs of data points {z_i := (x_i, y_i)}_{i=1:n}, with x_i ∈ R^d representing feature vectors and y_i the response variables (or labels).
The common assumption in supervised machine learning is that the data points z_i are drawn independently and identically from some probability measure P_Z defined over the space Z := X × Y. Given this training data, the learner would like to fit a parametric function f_θ, with θ ∈ R^d, to predict the response (label) on new points x. A common approach to model fitting is empirical risk minimization:

\hat{\theta} = \arg\min_{\theta \in R^d} \frac{1}{n} \sum_{j=1}^{n} \ell(\theta; (x_j, y_j)),   (1)

with ℓ(θ; (x, y)) := ℓ̃(f_θ(x), y) and ℓ̃ being a loss function that captures the discrepancy between the estimated value f_θ(x) and the true response value y. The performance of the model is then measured in terms of the standard risk (a.k.a. generalization error), defined as

SR(\theta) := E_{z=(x,y) \sim P_Z}[\ell(\theta; (x, y))].   (2)

The standard risk is a population risk and quantifies the expected error on new data points drawn from the same distribution as the training data. Although empirical risk minimization is a widely used approach for model learning, it is well known that the resulting models can be highly vulnerable to adversarial perturbations of their inputs, known as adversarial attacks. We next discuss the adversarial setting and two common adversary models considered in the literature.

2.1. Adversarial setting

The adversarial setting can be perceived as a game between the learner and the adversary. Given access to the training data, drawn i.i.d. from a common distribution P_Z, the learner chooses a model θ. Depending on the adversary's budget ε, the adversary chooses a test data point (x̃, ỹ) that can deviate from a typical test point according to one of the following models. The performance of the model θ is then measured in terms of predicting ỹ given the perturbed input x̃.

Norm-bounded perturbations. In this setting, ỹ = y (no perturbation on the response) and x̃ = x + δ, where δ can be an arbitrary vector in the ℓr-ball of radius ε. The adversarial risk in this case is defined as

AR(\theta) := E_{(x,y) \sim P_Z}\Big[ \sup_{\|\delta\|_{\ell_r} \le \varepsilon} \ell(\theta; (x + \delta, y)) \Big].   (3)

Distribution shift. In this setting, the adversary can shift the distribution of the test data, making it different from the training distribution P_Z. Specifically, (x̃, ỹ) ∼ Q where Q ∈ U_ε(P_Z), an ε-neighborhood of the distribution P_Z. A popular choice of this neighborhood is via the Wasserstein distance, which is formally defined below. In this case, the adversarial risk is defined as

AR(\theta) := \sup_{Q \in U_\varepsilon(P_Z)} E_{(\tilde{x}, \tilde{y}) \sim Q}[\ell(\theta; (\tilde{x}, \tilde{y}))].   (4)

Note that these are strong notions of adversary, as the perturbation is chosen after observing both the model θ and the data point (x, y) (in the norm-bounded perturbation model) or the training data distribution P_Z (in the distribution shift model). The distribution-perturbing adversary is a common model in a multitude of application domains and has already been adopted by several works, including (Staib & Jegelka, 2017; Dong et al., 2020; Pydi & Jog, 2020). Our primary focus in this work is on the distribution shift adversary model, with the Wasserstein metric measuring the distance between distributions. The next section provides a brief background on the Wasserstein robust loss, which will be used later in our work.
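To make the two adversary models above concrete, the following short Python sketch (ours, not part of the paper) estimates the standard risk (2) and the norm-bounded adversarial risk (3) by Monte Carlo for a linear model f_θ(x) = x^Tθ under the squared loss and an ℓ2-ball adversary. The data-generating parameters are arbitrary; the sketch uses the elementary fact that for the squared loss the inner maximization in (3) has the closed form sup_{‖δ‖₂ ≤ ε} (y − (x + δ)^Tθ)² = (|y − x^Tθ| + ε‖θ‖₂)².

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data distribution P_Z (arbitrary parameters): y = x^T theta0 + noise.
d, n, eps = 10, 200_000, 0.5
theta0 = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ theta0 + rng.normal(scale=1.0, size=n)

theta = 0.8 * theta0          # some candidate model to evaluate
res = y - X @ theta           # residuals y - x^T theta

# Standard risk (2): population squared error, estimated by Monte Carlo.
SR = np.mean(res ** 2)

# Norm-bounded adversarial risk (3) with an ell_2 ball of radius eps:
# the worst-case perturbation aligns with -sign(residual) * theta, giving
# (|y - x^T theta| + eps * ||theta||_2)^2 per sample.
AR_norm_bounded = np.mean((np.abs(res) + eps * np.linalg.norm(theta)) ** 2)

print(f"SR ~ {SR:.3f}")
print(f"AR ~ {AR_norm_bounded:.3f}  (norm-bounded adversary, eps={eps})")
```

As the definitions require, the estimated adversarial risk is always at least the standard risk, and the gap grows with ε and with ‖θ‖₂.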
2.2. Background on Wasserstein robust loss

Let Z be a metric space endowed with a metric d : Z × Z → R_{≥0}. Denote by P(Z) the set of all Borel probability measures on Z. For a Q-measurable function f, the L_p(Q)-norm of f is defined as

\|f\|_{L_p(Q)} = \Big( \int_Z |f|^p \, dQ \Big)^{1/p} for p ∈ [1, ∞), and Q\text{-ess sup}_{z \in Z} |f(z)| for p = ∞.   (5)

For two distributions P, Q ∈ P(Z), the Wasserstein distance of order p is given by

W_p(P, Q) := \inf_{\pi \in \mathrm{Cpl}(P, Q)} \|d\|_{\pi, p},   (6)

where the coupling set Cpl(P, Q) denotes the set of all probability measures π on Z × Z with first marginal π_1 := π(· × Z) = P and second marginal π_2 := π(Z × ·) = Q. We use the Wasserstein distance to define the neighborhood set U_ε in the distribution shift adversary model. Namely,

U_\varepsilon(P_Z) := \{ Q \in P(Z) : W_p(P_Z, Q) \le \varepsilon \}.   (7)

In this case we refer to AR(θ) given by (4) as the Wasserstein adversarial risk. Note that this notion involves a maximization over distributions Q ∈ P(Z), which can be daunting. However, an important result from distributionally robust optimization, which we also use in our characterization of AR(θ), is that strong duality holds for this problem under general conditions. The dual problem of (4) is given by

\min_{\gamma \ge 0} \big\{ \gamma \varepsilon^p + E_{P_Z}[\varphi_\gamma(\theta; z)] \big\} for p ∈ [1, ∞), and E_{z \sim P_Z}\big[ \sup_{\tilde{z} \in Z} \{ \ell(\theta; \tilde{z}) : d(\tilde{z}, z) \le \varepsilon \} \big] for p = ∞.   (8)

Here φ_γ(θ; z) is the robust surrogate for the loss function ℓ(θ; z) and is defined as

\varphi_\gamma(\theta; z_0) := \sup_{z \in Z} \{ \ell(\theta; z) - \gamma \, d^p(z, z_0) \}.   (9)

For p ∈ [1, ∞) it is shown that strong duality holds if either P_Z has finite support or Z is a Polish space (Gao & Kleywegt, 2016). For p = ∞, Lemma EC.2 in (Gao et al., 2020) shows that strong duality holds if P_Z has finite support.

Remark 2.1. It is worth noting that the Wasserstein adversary model is stronger than, and generalizes, the norm-bounded perturbation model. In particular, for any p ∈ [1, ∞],

E_{(x,y) \sim P_Z}\Big[ \sup_{\|\delta\|_{\ell_r} \le \varepsilon} \ell(\theta; (x + \delta, y)) \Big] \le \sup_{Q \in U_\varepsilon(P_Z)} E_{(\tilde{x}, \tilde{y}) \sim Q}[\ell(\theta; (\tilde{x}, \tilde{y}))],

where U_ε(P_Z) is given by (7) with the Wasserstein distance defined with respect to the ℓr distance, which is also used in the definition of the norm-bounded perturbation model. Equality holds for p = ∞. We refer to (Staib & Jegelka, 2017, Proposition 3.1) and (Pydi & Jog, 2020, Corollary 2.1, Theorem 1) for the proof and further explanation.

2.2.1. Regularization effect of Wasserstein adversarial loss

It is clear from the definition that AR(θ) ≥ SR(θ) for any model θ. Understanding the tradeoff between standard and adversarial risks is intimately related to the gap AR(θ) − SR(θ). The gap between the Wasserstein adversarial loss and the standard loss has been studied in several settings in the context of distributionally robust optimization (DRO) (Bartl et al., 2020; Gao et al., 2020). In particular, (Bartl et al., 2020; Gao et al., 2020) introduced the notion of variation of the loss, denoted by V(ℓ), as a measure of the magnitude of change in the expected loss when the data distribution is perturbed, and showed that the Wasserstein adversarial loss is closely related to regularizing the nominal loss by the variation V(ℓ). The formal definition of the variation of the loss, recalled from (Gao et al., 2020), is given below.

Definition 2.2 (Variation of the loss). Suppose that Z is a normed space with norm ‖·‖. Let ℓ be a continuous function on Z. Also assume that ∇_z ℓ exists P-almost everywhere. The variation of the loss ℓ with respect to P is defined as

V_{P,q}(\ell) := \big\| \, \|\nabla_z \ell\|_* \, \big\|_{P,q} for q ∈ [1, ∞), and P_Z\text{-ess sup}_{z \in Z} \ \sup_{\tilde{z} \neq z} \frac{(\ell(\tilde{z}) - \ell(z))_+}{\|\tilde{z} - z\|} for q = ∞.

Here ‖·‖_* denotes the dual norm of ‖·‖, and we recall that ‖·‖_{P,q} is the L_q(P)-norm given by (5).
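As a concrete illustration of Definition 2.2, the following sketch (ours, not from the paper) estimates V_{P,2}(ℓ) for the squared loss of a linear model, treating only the feature part of z = (x, y) as perturbable, in line with the metric adopted later in (12). Then ∇_z ℓ reduces to ∇_x ℓ = −2(y − x^Tθ)θ, and with the ℓ2 norm (which is its own dual) the variation reduces to the closed form 2‖θ‖₂ (E[(y − x^Tθ)²])^{1/2}; the Monte Carlo estimate can be checked against it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative distribution P_Z (arbitrary parameters): y = x^T theta0 + noise.
d, n = 10, 200_000
theta0 = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ theta0 + rng.normal(size=n)

theta = 0.7 * theta0
res = y - X @ theta

# Gradient of the squared loss w.r.t. the (perturbable) feature vector x:
# grad_x loss = -2 (y - x^T theta) theta, so ||grad_x loss||_2 = 2 |res| ||theta||_2.
grad_norms = 2.0 * np.abs(res) * np.linalg.norm(theta)

q = 2                                                   # conjugate exponent of p = 2
V_mc = np.mean(grad_norms ** q) ** (1.0 / q)            # Definition 2.2, Monte Carlo
V_closed = 2 * np.linalg.norm(theta) * np.sqrt(np.mean(res ** 2))  # closed form for this loss

print(f"V_(P,2) Monte Carlo : {V_mc:.4f}")
print(f"V_(P,2) closed form : {V_closed:.4f}")
```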
The following proposition from (Bartl et al., 2020; Gao et al., 2020) states that the variation of the loss captures the first-order term of the gap between the Wasserstein adversarial risk and the standard risk for small ε.

Proposition 2.3. Suppose that the loss ℓ(θ; z) is differentiable in the interior of Z for every θ, and ∇_z ℓ is continuous on Z. When p ∈ (1, ∞), assume that there exist M, L ≥ 0 such that for every θ and z, z̃ ∈ Z,

\| \nabla_z \ell(\theta; z) - \nabla_z \ell(\theta; \tilde{z}) \|_* \le M + L \| z - \tilde{z} \|^{p-1}.

When p = ∞, assume instead that there exist M ≥ 0 and δ_0 > 0 such that for every θ and z, z̃ ∈ Z with ‖z − z̃‖ < δ_0, we have ‖∇_z ℓ(θ; z) − ∇_z ℓ(θ; z̃)‖_* ≤ M. Then, there exists ε̄ such that for all 0 ≤ ε < ε̄ and all θ,

AR(\theta) - SR(\theta) = \varepsilon \, V_{P_Z, q}(\ell) + O(\varepsilon^2),   (11)

where 1/p + 1/q = 1 and p is the order of the Wasserstein distance defining the set U_ε(P_Z) in the adversarial risk (4).

By virtue of Proposition 2.3, the Wasserstein adversarial risk can be perceived as a regularized form of the standard risk, with regularization given by the variation of the loss. Nonetheless, note that this is only an approximation which captures the first-order terms for small adversary's power ε. (See also Remark 8 in (Bartl et al., 2020) for an upper bound on the gap, up to second-order terms in ε.) In this paper, we consider the settings of linear regression and binary classification. For these settings, only for the special case of p = 1 (1-Wasserstein), and when the loss is Lipschitz and its derivative converges at ∞, is it shown that the gap (11) is linear in ε and therefore is precisely characterized as εV_{P_Z,q}(ℓ). However, as we consider more common losses for these settings, namely the quadratic loss for linear regression and the 0-1 loss for classification, such a characterization does not apply to our settings and a direct derivation of the adversarial risk is required. Later, in Section 3.3 we use the result of Proposition 2.3 to study the tradeoff between SR and AR in the problem of learning an unknown function over the d-dimensional sphere S^{d−1}.
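The first-order relation (11) can also be checked numerically. The sketch below (our own illustration, not from the paper) does so for the quadratic loss of a linear model with p = 2 and the ℓ2 metric on x: under the assumption γ > ‖θ‖₂², the robust surrogate (9) admits the closed form φ_γ(θ; (x, y)) = γ(y − x^Tθ)²/(γ − ‖θ‖₂²), so the dual (8) can be minimized by a one-dimensional search over γ. The resulting gap AR − SR is then compared with εV_{P_Z,2}(ℓ) = 2ε‖θ‖₂(E[(y − x^Tθ)²])^{1/2} for a small ε.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative distribution P_Z (arbitrary parameters): y = x^T theta0 + noise.
d, n = 10, 200_000
theta0 = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ theta0 + rng.normal(size=n)

theta = 0.7 * theta0
SR = np.mean((y - X @ theta) ** 2)      # standard risk (Monte Carlo)
s = np.linalg.norm(theta) ** 2          # ||theta||_2^2
eps = 0.05                              # small adversary budget, so (11) should be accurate

# Dual objective (8) with p = 2, using the closed-form robust surrogate
# phi_gamma = gamma * r^2 / (gamma - ||theta||^2), valid for gamma > ||theta||^2.
def dual_objective(gamma):
    return gamma * eps ** 2 + (gamma / (gamma - s)) * SR

# One-dimensional search over gamma > ||theta||^2.
AR = min(dual_objective(g) for g in s + np.logspace(-3, 2, 2000))

V = 2 * np.sqrt(s) * np.sqrt(SR)        # variation of the quadratic loss (q = 2)

print(f"AR - SR           : {AR - SR:.5f}")
print(f"eps * V (1st ord.): {eps * V:.5f}")
```

The two printed values agree up to an O(ε²) term, in line with Proposition 2.3.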
3. Main results

In this paper we focus on the distribution-perturbing adversary and aim at understanding the fundamental tradeoff between standard risk and adversarial risk, which holds regardless of computational power or the size of the training data. We consider the 2-Wasserstein distance (p = 2) with the metric d(z, z̃) defined as

d(z, \tilde{z}) = \| x - \tilde{x} \|_{\ell_2} + \infty \cdot I\{ y \neq \tilde{y} \},   (12)

for z = (x, y) and z̃ = (x̃, ỹ). Therefore, the adversary with a finite power ε can only perturb the distribution of the input feature x, but not y. Otherwise, the distance d(z, z̃) becomes infinite and so the Wasserstein distance between the data distribution P_Z and the adversary's distribution Q, given by (6), also becomes infinite. It is worth noting that this choice of d is only for simplicity of presentation, and our results in this section can be derived in a straightforward manner for distances that also allow perturbations on the y component. The following remark relates the two types of adversary discussed in Section 2.1 and follows readily from the definition (6) and Equation (5).

Remark 3.1. For the distance d(·,·) given by (12), the adversary model with norm-bounded perturbations corresponds to the distribution-shifting adversary model with the p = ∞ Wasserstein distance.

3.1. Linear regression

We consider the class of linear models fit to the data with the quadratic loss ℓ(z; θ) = (y − x^Tθ)². Our first result is a closed-form representation of the Wasserstein adversarial risk (4) in this case.

Proposition 3.2. Consider the quadratic loss ℓ(z; θ) = (y − x^Tθ)² and the distribution-perturbing adversary with U_ε(P_Z) given by (7) with p = 2 and the metric d given by (12). In this case the adversarial risk AR(θ) admits the following form:

AR(\theta) = \Big( \sqrt{ E_{P_Z}[(y - x^T\theta)^2] } + \varepsilon \|\theta\|_{\ell_2} \Big)^2.   (13)

To prove Proposition 3.2 we exploit the dual problem (8). We refer to Section A.1 for the proof of Proposition 3.2.

Pareto optimal curve. For the linear regression setting, note that the standard risk SR(θ) and the adversarial risk AR(θ) are convex functions of θ. (The latter is convex since E_Q[(y − x^Tθ)²] is convex for any distribution Q and maximization preserves convexity.) Therefore, we can find (almost) all Pareto optimal points by minimizing a weighted combination of the two risk measures, varying the weight λ:

\theta_\lambda := \arg\min_{\theta} \ \lambda \, SR(\theta) + AR(\theta).   (14)

The Pareto optimal curve is then given by {(SR(θ_λ), AR(θ_λ)) : λ ≥ 0}.

Theorem 3.3. Consider the setting of Proposition 3.2 with v := E[yx], σ_y² := E[y²], and Σ := E[xx^T]. Then the solution θ_λ of optimization (14) is given either by (i) θ_λ = 0, or (ii) θ_λ = (Σ + γ_* I)^{-1} v, with γ_* the fixed point of the following two equations:

\gamma = \frac{ \varepsilon A (A + \varepsilon) }{ (1 + \lambda) A + \varepsilon },   (15)

A = \frac{1}{ \| (\Sigma + \gamma I)^{-1} v \|_{\ell_2} } \Big( \sigma_y^2 + \| \Sigma^{1/2} (\Sigma + \gamma I)^{-1} v \|_{\ell_2}^2 - 2 v^T (\Sigma + \gamma I)^{-1} v \Big)^{1/2}.   (16)

In case (i) we have SR(θ_λ) = AR(θ_λ) = σ_y². In case (ii) we have

SR(\theta_\lambda) = A^2 \, \| (\Sigma + \gamma_* I)^{-1} v \|_{\ell_2}^2, \qquad AR(\theta_\lambda) = (A + \varepsilon)^2 \, \| (\Sigma + \gamma_* I)^{-1} v \|_{\ell_2}^2,   (17)

where A is given by (16) with γ = γ_*. The proof of Theorem 3.3 is given in Section A.2.

Corollary 3.4. Suppose that data are generated according to the linear model y = x^Tθ_0 + w with w ∼ N(0, σ²) and isotropic features satisfying E[xx^T] = I_d. Then the solution θ_λ of optimization (14) is given either by (i) θ_λ = 0, or (ii) θ_λ = (1 + γ_*)^{-1} θ_0, where γ_* is the fixed point of the following two equations:

\gamma = \frac{ \varepsilon A (A + \varepsilon) }{ (1 + \lambda) A + \varepsilon },   (18)

A = \frac{ \sqrt{ \gamma^2 \|\theta_0\|_{\ell_2}^2 + (1 + \gamma)^2 \sigma^2 } }{ \|\theta_0\|_{\ell_2} }.   (19)

In case (i) we have SR(θ_λ) = AR(θ_λ) = σ² + ‖θ_0‖²_{ℓ2}. In case (ii) we have

SR(\theta_\lambda) = A^2 (1 + \gamma_*)^{-2} \|\theta_0\|_{\ell_2}^2,   (20)

AR(\theta_\lambda) = (A + \varepsilon)^2 (1 + \gamma_*)^{-2} \|\theta_0\|_{\ell_2}^2,   (21)

where A is given by (19) with γ = γ_*. The proof of Corollary 3.4 is provided in Section A.3.
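For readers who wish to reproduce curves of this type, the following sketch traces the Pareto curve of Theorem 3.3 numerically: for each λ it iterates the fixed-point equations (15)–(16) and then evaluates (17); if the iteration diverges (γ → ∞, i.e., θ_λ → 0), it falls back to case (i). This is our own illustration of the statement above, with arbitrary choices of Σ, θ_0, σ, and ε.

```python
import numpy as np

rng = np.random.default_rng(3)

# Problem instance (illustrative): y = x^T theta0 + w, x ~ N(0, Sigma), w ~ N(0, sigma^2).
d, rho, sigma, eps = 10, 0.5, 1.0, 1.0
Sigma = rho ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))
theta0 = rng.normal(size=d) / np.sqrt(d)
v = Sigma @ theta0                                  # v = E[yx]
sigma_y2 = sigma ** 2 + theta0 @ Sigma @ theta0     # sigma_y^2 = E[y^2]

def A_of_gamma(gamma):
    """Equation (16): A as a function of gamma."""
    t = np.linalg.solve(Sigma + gamma * np.eye(d), v)   # (Sigma + gamma I)^{-1} v
    R = sigma_y2 + t @ Sigma @ t - 2 * v @ t            # standard risk at this theta
    return np.sqrt(R) / np.linalg.norm(t)

def pareto_point(lam, n_iter=500, gamma_cap=1e8):
    """Fixed point of (15)-(16) for a given lambda; returns (SR, AR) via (17)."""
    gamma = eps ** 2
    for _ in range(n_iter):
        A = A_of_gamma(gamma)
        gamma = eps * A * (A + eps) / ((1 + lam) * A + eps)   # equation (15)
        if gamma > gamma_cap:                                 # case (i): theta_lambda = 0
            return sigma_y2, sigma_y2
    A = A_of_gamma(gamma)
    t = np.linalg.solve(Sigma + gamma * np.eye(d), v)
    norm2 = np.linalg.norm(t) ** 2
    return A ** 2 * norm2, (A + eps) ** 2 * norm2             # equation (17)

for lam in [0.0, 0.5, 1.0, 2.0, 5.0, 20.0, 1e6]:
    SR, AR = pareto_point(lam)
    print(f"lambda = {lam:>9.1f}   SR = {SR:.3f}   AR = {AR:.3f}")
```

Sweeping λ from 0 to large values traces the curve from the adversarially optimal end to the standard-risk-optimal end; the λ → ∞ limit matches the values (σ², (σ + ε‖θ_0‖₂)²) computed in the discussion of Figure 1b below.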
Figure 1 shows the effect of various parameters on the Pareto optimal tradeoff between adversarial (AR) and standard (SR) risks in the linear regression setting. We consider data generated according to the linear model y = x^Tθ_0 + w with w ∼ N(0, 1) and features x_i sampled i.i.d. from N(0, Σ), where Σ_{i,j} = ρ^{|i−j|}. Figure 1a demonstrates the role of the feature dimension d on the Pareto optimal curve for the setting with ρ = 0 (identity covariance matrix), adversary's power ε = 1, and the entries of θ_0 generated independently from N(0, 1/40). Note that by Corollary 3.4, in the case of isotropic features, the standard and adversarial risks depend on θ_0 only through its ℓ2 norm. The variation in the Pareto curve here is due to the variation in ‖θ_0‖_{ℓ2} as d changes.

[Figure 1. The effect of feature dimension (d), dependency across features (ρ), and the adversary's power (ε) on the Pareto optimal tradeoff between adversarial (AR) and standard (SR) risks in the linear regression setting. Panels: (a) Pareto optimal curves for several feature dimensions d, with ρ = 0 and ε = 1; (b) Pareto optimal curves for several feature dependency values ρ, with d = 10 and ε = 1; (c) Pareto optimal curves for several values of the adversary's power ε, with ρ = 0 and d = 10. Axes: standard risk (horizontal) vs. adversarial risk (vertical).]

Figure 1b investigates the role of dependency across features (ρ) in the optimal tradeoff between standard and adversarial risks. In this setting d = 10, ε = 1, and θ_0 ∼ (1/√d) N(0, I). As we see, all the curves start from the same point. This can be easily verified by the result of Theorem 3.3: for the linear data model y = x^Tθ_0 + w, we have v = Σθ_0, and at λ = ∞ the Pareto-optimal estimator is the minimizer of the standard risk, i.e., θ_{λ=∞} = θ_0. Also, by (15) we have γ = 0, and by (16) we obtain A = σ/‖θ_0‖_{ℓ2}. Plugging these values into (17) we get SR(θ_∞) = σ² and AR(θ_∞) = (σ + ε‖θ_0‖_{ℓ2})². Therefore both metrics become independent of ρ at λ = ∞. Also, looking at the right-most point of the Pareto curves, corresponding to λ = 0, we see that as ρ increases from small to moderate values, this point moves upward-right, indicating that both standard and adversarial risks increase; but after some value of ρ, we start to see a reverse behavior, where standard and adversarial risks start to decrease. Finally, in Figure 1c we observe the effect of the adversary's budget ε on the Pareto optimal curve. Here, d = 10, ρ = 0, and θ_0 ∼ (1/√d) N(0, I). Clearly, as ε grows there is a wider range of Pareto-optimal estimators and the two risk measures deviate further from each other. When ε becomes smaller, the standard and adversarial risks get closer to each other and so the Pareto-optimal curve becomes shorter.

3.2. Binary classification

We next consider the problem of binary classification under a Gaussian mixture data model. Under this model, each data point belongs to one of two classes {±1} with corresponding probabilities π_+ and π_− = 1 − π_+. The feature vectors in each class are generated independently according to a Gaussian distribution with mean ±μ depending on the class. In other words, given the label y_i ∈ {±1}, the feature vector x_i ∈ R^d is drawn from N(y_iμ, Σ). We focus on the class of linear classifiers {x^Tθ : θ ∈ R^d}. Given a model θ, the predicted labels are simply given by sign(x^Tθ). We consider the 0-1 loss ℓ(θ; z) = I(ŷ ≠ y) = I(y x^Tθ ≤ 0). We also consider Wasserstein adversarial training with the distance

d(z, \tilde{z}) = \| x - \tilde{x} \|_{\ell_r} + \infty \cdot I\{ y \neq \tilde{y} \},   (22)

for z = (x, y) and z̃ = (x̃, ỹ). Our next result characterizes the standard risk and the Wasserstein adversarial risk for this model.

Proposition 3.5. Consider binary classification with the Gaussian mixture data model and the 0-1 loss. Let

a_\theta := \frac{ \mu^T \theta }{ \| \Sigma^{1/2} \theta \|_{\ell_2} }.

Then, for a linear classifier x ↦ sgn(x^Tθ), the standard risk is given by SR(θ) = Φ(−a_θ), where Φ(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2} \, dt denotes the c.d.f. of a standard Gaussian distribution. In addition, define the function F(θ, γ) : R^d × R_{≥0} → R_{≥0} as follows:

F(\theta, \gamma) = \gamma \, b_\theta \, \varepsilon^2 + \Phi(-a_\theta) + E_{Z \sim N(0,1)}\big[ \big(1 - \gamma (a_\theta + Z)^2\big)_+ \, I\{ Z > -a_\theta \} \big],   (23)

with b_θ := ‖θ‖²_{ℓq} / ‖Σ^{1/2}θ‖²_{ℓ2}, where ℓq denotes the dual norm of ℓr (i.e., 1/r + 1/q = 1), and ϕ(t) := \frac{1}{\sqrt{2\pi}} e^{-t^2/2} standing for the p.d.f. of a standard Gaussian distribution (the expectation in (23) can be written in closed form in terms of Φ and ϕ). Then, the Wasserstein adversarial risk with p = 2 and the metric d(·,·) given by (22) is characterized as

AR(\theta) = \inf_{\gamma \ge 0} F(\theta, \gamma).   (24)

Note that, as an implication of Proposition 3.5, the standard risk SR(θ) and the adversarial risk AR(θ) depend on the estimator θ only through the quantities a_θ and b_θ. We next characterize the Pareto optimal front for the region {(SR(θ), AR(θ)) : θ ∈ R^d}. Since the 0-1 loss I(y x^Tθ ≤ 0) is convex in θ, both the standard risk and the adversarial risk are convex functions of θ (by an argument similar to the one given prior to Theorem 3.3).

Theorem 3.6. Assume the setting of Proposition 3.5 and consider the following minimization problem:

(\theta_\lambda, \gamma_\lambda) := \arg\min_{\gamma \ge 0, \, \theta} \ \lambda \, \Phi(-a_\theta) + F(\theta, \gamma).   (25)

The Pareto optimal curve is given by {(Φ(−a_{θ_λ}), F(θ_λ, γ_λ)) : λ ≥ 0}. Theorem 3.6 follows from the fact that the Pareto front of a convex set is characterized by the intersection points of the set with its supporting hyperplanes.
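As a complement to the characterization above, AR(θ) can also be evaluated numerically directly from the duality (8)–(9): for the 0-1 loss, the robust surrogate equals 1 if the point is already misclassified and max(0, 1 − γ(margin/‖θ‖_{ℓq})²) otherwise, since the smallest ℓr-perturbation that flips a correctly classified point has size margin/‖θ‖_{ℓq}. The sketch below uses this route (our own illustration, with arbitrary μ, Σ, and θ) and performs a one-dimensional search over γ; it also reports SR(θ) = Φ(−a_θ) for comparison.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)

def Phi(t):                       # standard Gaussian c.d.f.
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

# Illustrative Gaussian mixture instance: x ~ N(y*mu, Sigma), y uniform on {+1, -1}.
d, rho, eps, r = 10, 0.3, 0.3, np.inf
Sigma = rho ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))
mu = rng.normal(size=d) / np.sqrt(d)
theta = mu.copy()                 # classifier direction to evaluate (any theta works)

# Dual exponent q with 1/r + 1/q = 1, and the corresponding dual norm of theta.
if np.isinf(r):
    theta_q = np.sum(np.abs(theta))              # q = 1
elif r == 1:
    theta_q = np.max(np.abs(theta))              # q = infinity
else:
    q = r / (r - 1.0)
    theta_q = np.sum(np.abs(theta) ** q) ** (1.0 / q)

# Monte Carlo sample of the margin y * x^T theta ~ N(mu^T theta, theta^T Sigma theta).
n = 400_000
L = np.linalg.cholesky(Sigma)
ysign = rng.choice([-1.0, 1.0], size=n)
x = ysign[:, None] * mu + rng.normal(size=(n, d)) @ L.T
margin = ysign * (x @ theta)

a_theta = (mu @ theta) / np.sqrt(theta @ Sigma @ theta)
SR = Phi(-a_theta)                # standard risk, Proposition 3.5

# Robust surrogate (9) for the 0-1 loss: 1 if misclassified,
# otherwise (1 - gamma * (margin / ||theta||_q)^2)_+ .
def dual_objective(gamma):
    surrogate = np.where(margin <= 0, 1.0,
                         np.clip(1.0 - gamma * (margin / theta_q) ** 2, 0.0, None))
    return gamma * eps ** 2 + surrogate.mean()

AR = min(dual_objective(g) for g in np.logspace(-3, 3, 400))

print(f"SR = Phi(-a_theta) = {SR:.4f}")
print(f"AR (dual, eps={eps}, r=inf) = {AR:.4f}")
```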
Remark 3.7. For r = q = 2 and Σ = I, we have b_θ = 1. In this case the objective of (25) is decreasing in a_θ, and since |a_θ| ≤ ‖μ‖_{ℓ2}, it is minimized at a_θ = ‖μ‖_{ℓ2}. In addition, SR(θ) is decreasing in a_θ and is minimized at the same value a_θ = ‖μ‖_{ℓ2}. Therefore, introducing c := ‖μ‖_{ℓ2}, the Pareto-optimal curve shrinks to a single point given by

SR = \Phi(-c),   (26)

AR = \inf_{\gamma \ge 0} \Big\{ \gamma \varepsilon^2 + \Phi(-c) + E_{Z \sim N(0,1)}\big[ \big(1 - \gamma (c + Z)^2\big)_+ \, I\{ Z > -c \} \big] \Big\}.

In other words, the tradeoff between standard and adversarial risks, achieved by linear classifiers, vanishes in this case, and the estimators in the direction of the class average μ are optimal with respect to both the standard risk and the Wasserstein adversarial risk. We refer to Section A.5 for the proof of Remark 3.7.

Figure 2 showcases the effect of different factors in a binary classification setting on the Pareto-optimal tradeoff between standard and adversarial risks. Here the features x are drawn from N(yμ, Σ), with Σ_{ij} = ρ^{|i−j|}. The class average μ has i.i.d. entries from N(0, 1/d) with d = 10. In Figure 2a, we investigate the role of the norm r used in the Wasserstein adversary model, cf. Equation (22). As discussed in Remark 3.7, when r = 2, the tradeoff between standard and adversarial risks vanishes and the estimators in the direction of the class average μ are optimal with respect to both the standard risk and the Wasserstein adversarial risk. Figure 2b illustrates the effect of dependency among features (ρ) on the optimal tradeoff between standard and adversarial risks. In this setting r = ∞ and ε = 0.3. From the result of Theorem 3.6, we see that these risks very much depend on the interaction between the class average μ and the feature covariance Σ, and so the curves are shifted in a highly nontrivial way depending on the value of ρ when we fix μ. The role of the adversary's budget ε is depicted in Figure 2c, in which r = ∞ and ρ = 0. Similar to the linear regression setting, when ε is small the two risk measures are close to each other and we have a small range of Pareto-optimal models. As ε grows, the standard risk and the adversarial risk differ significantly and we obtain a wide range of Pareto-optimal models.

[Figure 2. The effect of the ℓr norm on the feature space, the dependency across features (ρ), and the adversary's power (ε) on the Pareto optimal tradeoff between adversarial and standard risks in binary classification under the Gaussian mixture model. Panels: (a) Pareto optimal curves for several ℓr norms, with d = 10, ε = 0.5, and ρ = 0; (b) Pareto optimal curves for several feature dependency values ρ, with d = 10, ε = 0.3, and r = ∞; (c) Pareto optimal curves for several values of the adversary's budget ε, with d = 10, r = ∞, and ρ = 0. Axes: standard risk (horizontal) vs. adversarial risk (vertical).]

3.3. Learning nonlinear functions

We next investigate the tradeoff between standard and adversarial risk for the problem of learning an unknown function over the d-dimensional sphere S^{d−1}. More precisely, we consider the following data generative model:

y = f_d(x) + w,   (27)

with x ∼ Unif(S^{d−1}(√d)), the d-dimensional sphere of radius √d, and w ∼ N(0, σ²) independent of x. We consider fitting a random features model to data generated according to (27). The class of random features models is given by

F_{RF}(\theta, U) = \Big\{ f(x; \theta, U) := \sum_{i=1}^{N} \theta_i \, \sigma(u_i^T x), \ \text{with} \ \theta_i \in R, \ i = 1, \ldots, N \Big\},   (28)

where U ∈ R^{N×d} is a matrix whose i-th row is the vector u_i, uniformly drawn from S^{d−1}(1), independently from the data. The random features model can be equivalently represented by a two-layer neural network with the first-layer weights U chosen randomly and θ = (θ_i)_{1≤i≤N} corresponding to the second-layer weights. The random features model was introduced by (Rahimi & Recht, 2007) for scaling kernel methods to large datasets.
There is indeed a substantial literature drawing connections between random features models, kernel methods, and fully trained neural networks (Daniely et al., 2016; Daniely, 2017; Jacot et al., 2018; Li & Liang, 2018). In (Mei & Montanari, 2019), the generalization error (standard risk) of the random features model was precisely characterized for the problem of learning a function f_d(·) over S^{d−1}(√d) in the regime where the network width N, the sample size n, and the feature dimension d grow in proportion. The nonlinear model considered in (Mei & Montanari, 2019) is of the form

f_d(x) = \beta_{d,0} + x^T \beta_{d,1} + f_d^{NL}(x),   (29)

where the nonlinear component f_d^{NL}(x) is a centered isotropic Gaussian process indexed by x. We follow the same model and consider the following random quadratic function:

f_d(x) = \beta_{d,0} + x^T \beta_{d,1} + \frac{F}{d}\big[ x^T G x - \mathrm{tr}(G) \big],   (30)

for some fixed F ∈ R and G ∈ R^{d×d} a random matrix with i.i.d. entries from N(0, 1). Our goal is to study the Pareto-optimal tradeoff between standard and adversarial risks for this learning setting, achieved by the class of random features models (28). The standard risk in this setting is given by

SR(\theta) = E_{x,y}\big[ (y - \theta^T \sigma(Ux))^2 \big] = E_{x}\big[ (f_d(x) - \theta^T \sigma(Ux))^2 \big] + \sigma^2.   (31)

For the Wasserstein adversarial risk we use the following corollary, which is obtained by specializing Proposition 2.3 to the random features model.

Corollary 3.8. Consider the class of random features models given by (28). In this case, the 2-Wasserstein adversarial risk with the distance d(·,·) given by (12) admits the following first-order approximation:

AR(\theta) = SR(\theta) + 2\varepsilon \, E_x\Big[ \big( (f_d(x) - \theta^T\sigma(Ux))^2 + \sigma^2 \big) \, \| U^T \mathrm{diag}(\sigma'(Ux)) \theta \|_{\ell_2}^2 \Big]^{1/2} + O(\varepsilon^2),   (32)

with σ'(·) denoting the derivative of the activation σ(·) and SR(θ) given by (31). The proof of Corollary 3.8 is given in Appendix A.6.

The standard risk is quadratic and hence convex in θ. The adversarial risk is also convex in θ (this follows from the fact that pointwise maximization preserves convexity). Therefore, for small values of ε (a weak adversary), the first-order approximation of AR(θ) is also convex in θ. As such, (almost) all Pareto optimal points are given by minimizing a weighted combination of the two risk measures as the weight λ varies in [0, ∞). Namely,

\theta_\lambda := \arg\min_{\theta} \ \lambda \, SR(\theta) + AR(\theta),   (33)

with SR(θ) given by (31) and AR(θ) given by (32).
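The quantities in (31)–(32) are straightforward to estimate by Monte Carlo once an activation is fixed. The sketch below is our own illustration under assumptions not pinned down in the excerpt: σ(t) = tanh(t) as the activation and a ridge-regression fit for the second-layer weights θ. It draws the random first-layer weights U, generates data from (27) and (30), and reports the empirical standard risk and the first-order approximation of the adversarial risk from Corollary 3.8.

```python
import numpy as np

rng = np.random.default_rng(5)

# Setting of Section 3.3 with illustrative sizes; sigma(t) = tanh(t) is our own choice.
d, N, n, noise_sd, eps = 10, 200, 50_000, 2.0, 0.2

def sphere(m, dim, radius):
    g = rng.normal(size=(m, dim))
    return radius * g / np.linalg.norm(g, axis=1, keepdims=True)

# Target f_d from (30) with F = 1, beta_{d,0} = 0, beta_{d,1} ~ N(0, I/d).
beta1 = rng.normal(size=d) / np.sqrt(d)
G = rng.normal(size=(d, d))
def f_d(x):                       # x has shape (m, d)
    return x @ beta1 + ((x @ G) * x).sum(axis=1) / d - np.trace(G) / d

U = sphere(N, d, 1.0)             # random first-layer weights, rows on the unit sphere
X = sphere(n, d, np.sqrt(d))      # features x ~ Unif(S^{d-1}(sqrt(d)))
y = f_d(X) + rng.normal(scale=noise_sd, size=n)

feats = np.tanh(X @ U.T)          # sigma(Ux), shape (n, N)

# Example second-layer weights: a ridge fit (any theta could be plugged in here).
theta = np.linalg.solve(feats.T @ feats + 1e-3 * np.eye(N), feats.T @ y)

resid2 = (f_d(X) - feats @ theta) ** 2
SR = resid2.mean() + noise_sd ** 2                         # equation (31)

# ||U^T diag(sigma'(Ux)) theta||_2^2 per sample, with sigma'(t) = 1 - tanh(t)^2.
grads = (1.0 - feats ** 2) * theta                         # shape (n, N)
grad_norm2 = np.sum((grads @ U) ** 2, axis=1)

AR_first_order = SR + 2 * eps * np.sqrt(np.mean((resid2 + noise_sd ** 2) * grad_norm2))  # eq. (32)

print(f"SR ~ {SR:.3f}")
print(f"AR ~ {AR_first_order:.3f}  (first-order approximation, eps={eps})")
```

Plugging in different θ (for instance, minimizers of (33) for various λ) and recording the two estimates is one way to trace curves of the kind shown in Figure 3.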
We use the above characterization to derive the Pareto-optimal tradeoff curves between standard and adversarial risks for learning the function f_d(x) given by (30) with F = 1, β_{d,0} = 0, and β_{d,1} ∈ R^d with i.i.d. entries from N(0, 1/d). The data are generated according to (27) with σ = 2, d = 10, and N ∈ {250, 500, 750, 1000}. To compute θ_λ we use the empirical loss with n = 500K samples of x ∼ Unif(S^{d−1}(√d)). For each value of λ and N we generate 15 realizations of the weights U and compute θ_λ for each realization using gradient descent on the loss function (33). The Pareto optimal points {(SR(θ_λ), AR(θ_λ)) : λ ≥ 0} are plotted in Figure 3 for ε = 0.2.

[Figure 3. Pareto-optimal tradeoff curves for learning random quadratic functions using the random features model, for N ∈ {250, 500, 750, 1000}. Data are generated according to (27) with σ = 2 and f_d(x) given by (30). Here, d = 10 and N is the number of random features (the width of the neural network). Axes: standard risk (horizontal) vs. adversarial risk (vertical).]

As we see, for each value of N the tradeoff curves concentrate as N grows, implying that the estimator θ_λ becomes independent of the specific realization of the weights U. We also observe that the tradeoff between standard and adversarial risks exists even for large values of N. Interestingly, as the network width N grows, both the standard risk and the adversarial risk decrease, but the tradeoff between them clearly remains (the length of the Pareto front does not shrink).

4. Conclusion

Linear regression and binary classification are two simple, yet foundational, settings in machine learning, and still the full effect of adversarial training is not known for these settings. In this work we focus on the distribution-perturbing adversary and provide a framework for thinking about the tradeoff between the standard risk (SR) and the adversarial risk (AR), its existence, and its quantitative behavior with respect to the data distribution and the hypothesis class. Note that these are non-trivial questions, and previously there have only been specific examples hinting at such a tradeoff. A tantalizing question is whether one can remove this tradeoff (or improve SR and AR simultaneously) by considering a more complex class of hypotheses. Our discussion in Section 3.3 is a first attempt to answer this question for the random features model.

Acknowledgements

A. Javanmard is partially supported by an Adobe Data Science Faculty Research Award, a Sloan Research Fellowship, and the NSF CAREER Award DMS-18444. The authors would like to thank the anonymous reviewers for their thoughtful comments that helped us to improve our work.

References

Athalye, A., Carlini, N., and Wagner, D. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, International Conference on Machine Learning, 2018.

Bartl, D., Drapeau, S., Obloj, J., and Wiesel, J. Robust uncertainty sensitivity analysis. arXiv preprint arXiv:2006.12022, 2020.

Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndić, N., Laskov, P., Giacinto, G., and Roli, F. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 387-402. Springer, 2013.

Carlini, N. and Wagner, D. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3-14, 2017.

Carlini, N., Mishra, P., Vaidya, T., Zhang, Y., Sherr, M., Shields, C., Wagner, D., and Zhou, W. Hidden voice commands. In 25th USENIX Security Symposium (USENIX Security 16), pp. 513-530, 2016.

Chen, L., Ye, Y., and Bourlai, T. Adversarial machine learning in malware detection: Arms race between evasion attack and defense. In 2017 European Intelligence and Security Informatics Conference (EISIC), pp. 99-106. IEEE, 2017.
1 Cisse, M., Bojanowski, P., Grave, E., Dauphin, Y., and Usunier, N. Parseval networks: Improving robustness to adversarial examples. ar Xiv preprint ar Xiv:1704.08847, International Conference on Machine Learning, 2017. 1 Daniely, A. Sgd learns the conjugate kernel class of the network. In Advances in Neural Information Processing Systems, pp. 2422 2430, 2017. 8 Daniely, A., Frostig, R., and Singer, Y. Toward deeper understanding of neural networks: The power of initialization and a dual view on expressivity. In Advances In Neural Information Processing Systems, pp. 2253 2261, 2016. 7 Dobriban, E., Hassani, H., Hong, D., and Robey, A. Provable tradeoffs in adversarially robust classification. ar Xiv preprint ar Xiv:2006.05161, 2020. 2 Dong, Y., Deng, Z., Pang, T., Su, H., and Zhu, J. Adversarial distributional training for robust deep learning. ar Xiv preprint ar Xiv:2002.05999, 2020. 3 Gao, R. and Kleywegt, A. J. Distributionally robust stochastic optimization with wasserstein distance. ar Xiv preprint ar Xiv:1604.02199, 2016. 4 Fundamental Tradeoffs in Distributionally Adversarial Training Gao, R., Chen, X., and Kleywegt, A. J. Wasserstein distributionally robust optimization and variation regularization. ar Xiv preprint ar Xiv:1712.06050v3, 2020. 4 Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015a. 1 Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. ar Xiv preprint ar Xiv:1412.6572, International Conference on Learning Representations, 2015b. 1 Jacot, A., Gabriel, F., and Hongler, C. Neural tangent kernel: Convergence and generalization in neural networks. In Advances in neural information processing systems, pp. 8571 8580, 2018. 8 Javanmard, A. and Soltanolkotabi, M. Precise statistical analysis of classification accuracies for adversarial training. ar Xiv preprint ar Xiv:2010.11213, 2020. 2 Javanmard, A., Soltanolkotabi, M., and Hassani, H. Precise tradeoffs in adversarial training for linear regression. volume 125 of Proceedings of Machine Learning Research, Conference of Learning Theory (COLT), pp. 2034 2078. PMLR, 09 12 Jul 2020. 2 Kurakin, A., Goodfellow, I., and Bengio, S. Adversarial machine learning at scale. ar Xiv preprint ar Xiv:1611.01236, 2016. 1 Li, Y. and Liang, Y. Learning overparameterized neural networks via stochastic gradient descent on structured data. In Advances in Neural Information Processing Systems, pp. 8157 8166, 2018. 8 Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018a. 1 Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. ar Xiv preprint ar Xiv:1706.06083, International Conference on Learning Representations, 2018b. 2 Mei, S. and Montanari, A. The generalization error of random features regression: Precise asymptotics and double descent curve. ar Xiv preprint ar Xiv:1908.05355, 2019. 8 Papernot, N., Mc Daniel, P., Swami, A., and Harang, R. Crafting adversarial input sequences for recurrent neural networks. In MILCOM 2016-2016 IEEE Military Communications Conference, pp. 49 54. IEEE, 2016a. 
1 Papernot, N., Mc Daniel, P., Wu, X., Jha, S., and Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582 597. IEEE, 2016b. 1 Papernot, N., Mc Daniel, P., Goodfellow, I., Jha, S., Celik, Z. B., and Swami, A. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia conference on computer and communications security, pp. 506 519, 2017. 1 Pydi, M. S. and Jog, V. Adversarial risk via optimal transport and optimal couplings. In International Conference on Machine Learning, pp. 7814 7823. PMLR, 2020. 3, 4 Raghunathan, A., Steinhardt, J., and Liang, P. Certified defenses against adversarial examples. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018. 1 Raghunathan, A., Xie, S. M., Yang, F., Duchi, J. C., and Liang, P. Adversarial training can hurt generalization. ar Xiv preprint ar Xiv:1906.06032, 2019. 2 Rahimi, A. and Recht, B. Random features for large-scale kernel machines. Advances in neural information processing systems, 20:1177 1184, 2007. 7 Staib, M. and Jegelka, S. Distributionally robust deep learning as a generalization of adversarial training. In NIPS workshop on Machine Learning and Computer Security, 2017. 3, 4 Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. International Conference on Learning Representations (ICLR), 2014. 1 Taheri, H., Pedarsani, R., and Thrampoulidis, C. Asymptotic behavior of adversarial training in binary classification. ar Xiv preprint ar Xiv:2010.13275, 2020. 2 Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., and Madry, A. Robustness may be at odds with accuracy. ar Xiv preprint ar Xiv:1805.12152, 2018. 2 Vaidya, T., Zhang, Y., Sherr, M., and Shields, C. Cocaine noodles: exploiting the gap between human and machine speech recognition. In 9th {USENIX} Workshop on Offensive Technologies ({WOOT} 15), 2015. 1 Fundamental Tradeoffs in Distributionally Adversarial Training Wong, E. and Kolter, J. Z. Provable defenses against adversarial examples via the convex outer adversarial polytope. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsm assan, Stockholm, Sweden, July 10-15, 2018, pp. 5283 5292, 2018. 1 Zhang, G., Yan, C., Ji, X., Zhang, T., Zhang, T., and Xu, W. Dolphinattack: Inaudible voice commands. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 103 117, 2017. 1