Published in Transactions on Machine Learning Research (11/2024)

Stability and Generalization in Free Adversarial Training

Xiwei Cheng xwcheng@link.cuhk.edu.hk
The Chinese University of Hong Kong

Kexin Fu fu448@purdue.edu
Purdue University

Farzan Farnia farnia@cse.cuhk.edu.hk
The Chinese University of Hong Kong

Reviewed on OpenReview: https://openreview.net/forum?id=jmwEiC9bq2

Abstract

While adversarial training methods have significantly improved the robustness of deep neural networks against norm-bounded adversarial perturbations, the generalization gap between their performance on training and test data is considerably greater than that of standard empirical risk minimization. Recent studies have aimed to connect the generalization properties of adversarially trained classifiers to the min-max optimization algorithm used in their training. In this work, we analyze the interconnections between generalization and optimization in adversarial training using the algorithmic stability framework. Specifically, our goal is to compare the generalization gap of neural networks trained using the vanilla adversarial training method, which fully optimizes perturbations at every iteration, with the free adversarial training method, which simultaneously optimizes norm-bounded perturbations and classifier parameters. We prove bounds on the generalization error of these methods, indicating that the free adversarial training method may exhibit a lower generalization gap between training and test samples due to its simultaneous min-max optimization of classifier weights and perturbation variables. We conduct several numerical experiments to evaluate the train-to-test generalization gap in vanilla and free adversarial training methods. Our empirical findings also suggest that the free adversarial training method could lead to a smaller generalization gap over a similar number of training iterations. The paper code is available at https://github.com/Xiwei-Cheng/Stability_FreeAT.

1 Introduction

Deep neural networks (DNNs) have attained state-of-the-art results in various supervised learning tasks in computer vision, speech recognition, and natural language processing. However, it is widely recognized that DNNs are susceptible to minor adversarially designed perturbations to their input data, commonly regarded as adversarial attacks (Szegedy et al., 2013; Goodfellow et al., 2014). Adversarial examples are typically generated by optimizing a norm-constrained perturbation leading to the maximum classification loss for input data. A standard approach to defending against adversarial attacks is adversarial training (AT) (Madry et al., 2017), according to which the DNN classifier is learned using adversarially perturbed training data. AT methods have significantly improved the robustness of DNNs against norm-bounded perturbations. In recent years, several variants of AT methods have been developed to accelerate and facilitate the application of AT to large-scale models and datasets (Kannan et al., 2018; Shafahi et al., 2019; Wong et al., 2020). While AT algorithms offer significant robustness against standard norm-bounded attacks, the generalization gap between their performance on training and test data has been found to be considerably greater than that of DNNs trained by standard empirical risk minimization (ERM) (Schmidt et al., 2018; Raghunathan et al., 2019).
To study the overfitting phenomenon in adversarial training, the recent literature has focused on the generalization analysis of adversarially trained models. These studies have attempted to analyze the generalization error in learning adversarially-robust models (Yin et al., 2019; Awasthi et al., 2020; Xiao et al., 2022b) and to show the reduction of the generalization gap under explicit and implicit regularization methods such as early stopping and Lipschitz regularization (Rice et al., 2020; Zhang et al., 2022; Xiao et al., 2023b). Specifically, several works (Lei et al., 2021; Farnia & Ozdaglar, 2021; Xiao et al., 2022b; 2024b) have concentrated on the interconnections between the optimization and generalization of adversarially-trained models.

Since adversarial training methods use adversarial training examples with the worst-case norm-bounded perturbations, they are typically formulated as min-max optimization problems where the classifier and adversarial perturbations are the minimization and maximization variables, respectively. To solve the min-max optimization problem, the vanilla AT framework follows an iterative algorithm where, at every iteration, the inner maximization problem is fully solved to design the optimal perturbations and, subsequently, a single gradient update is applied to the DNN's parameters. Therefore, vanilla AT results in a non-simultaneous optimization of the minimization and maximization variables of the underlying min-max problem. However, the theoretical generalization error bounds in (Farnia & Ozdaglar, 2021; Lei et al., 2021) suggest that the non-simultaneous optimization of the min and max variables in a min-max learning problem could lead to a greater generalization gap. Therefore, a natural question is whether an adversarial training algorithm with simultaneous optimization of the min and max problems can reduce the generalization gap.

In this work, we focus on a widely-used variant of adversarial training proposed by Shafahi et al. (2019), adversarial training for free (free AT), and aim to analyze its generalization behavior compared to the vanilla AT approach. While vanilla AT follows a sequential optimization of the DNN and perturbation variables, the free AT approach simultaneously computes the gradients of the two groups of variables at every round of applying the backpropagation algorithm to the multi-layer DNN. We aim to demonstrate that the simultaneous optimization of the classifier and adversarial examples in free AT could translate into a lower generalization error compared to vanilla AT. To this end, we provide theoretical and numerical results to compare the generalization properties of vanilla vs. free AT frameworks.

On the theory side, we leverage the algorithmic stability framework (Bousquet & Elisseeff, 2002; Hardt et al., 2015) to derive generalization error bounds for free and vanilla adversarial training methods. The shown generalization bounds suggest that in the nonconvex-nonconcave regime, the free AT algorithm could enjoy a lower generalization gap than vanilla AT, since it applies simultaneous gradient updates to the DNN and perturbation variables. We also develop a similar generalization bound for the fast AT methodology (Goodfellow et al., 2014), which uses a single gradient step to optimize the perturbations. Our theoretical results suggest a comparable generalization bound between the free and fast AT approaches.
Finally, we present the results of our numerical experiments to compare the generalization performance of the vanilla and free AT methods over standard computer vision datasets and neural network architectures. Our numerical results also suggest that the free AT method results in a considerably lower generalization gap than vanilla AT. Furthermore, we propose applying the free AT framework to implement Free-TRADES, a free-AT version of the adversarial training algorithm TRADES (Zhang et al., 2019). The numerical results suggest that the TRADES algorithm similarly has a lower generalization gap under simultaneous optimization of the min and max optimization variables. This work's contributions can be summarized as:

- Applying the algorithmic stability framework to analyze the generalization behavior of free AT,
- Providing a theoretical comparison of the generalization properties of the vanilla and free AT methods,
- Numerically analyzing the generalization performance of the free vs. vanilla AT schemes under white-box and black-box adversarial attacks,
- Proposing Free-TRADES, a simultaneous updating variant of TRADES with better generalization.

2 Related Work

Generalization in Adversarial Training: Since the discovery of adversarial examples (Szegedy et al., 2013), a large body of work has focused on training robust DNNs against adversarial perturbations (Goodfellow et al., 2014; Carlini & Wagner, 2017; Madry et al., 2017; Zhang et al., 2019). Shafahi et al. (2019) proposed the free adversarial training algorithm to update the neural net and adversarial perturbations simultaneously, aimed at reducing the computational cost of adversarial training. Also, Wong et al. (2020) discussed the application of the fast AT algorithm, aiming to lower the computational costs. Compared to standard ERM training, the overfitting in adversarial training is shown to be more severe (Rice et al., 2020). A line of works analyzed adversarial generalization by applying uniform convergence tools such as VC-dimension (Montasser et al., 2019; Attias et al., 2022) and Rademacher complexity (Yin et al., 2019; Farnia et al., 2018; Awasthi et al., 2020; Xiao et al., 2022a; 2024a). Schmidt et al. (2018) proved tight bounds on the adversarially robust generalization error, showing that vanilla adversarial training requires more data for proper generalization than standard training. Xing et al. (2022) studied the phase transition of the generalization error from standard training to adversarial training. Also, the reference (Andriushchenko & Flammarion, 2020) discusses the catastrophic overfitting in the fast AT method.

Uniform Stability: Bousquet & Elisseeff (2002) developed the algorithmic stability framework to analyze the generalization performance of learning algorithms. Hardt et al. (2015) further extended the algorithmic stability approach to stochastic gradient-based optimization (SGD) methods. Bassily et al. (2020); Lei (2023) analyzed the stability under non-smooth functions. Some recent works applied the stability framework to study the generalization gap of adversarial training, while they mostly assumed an oracle to obtain a perfect perturbation and focused on the stability of the training process. Xing et al. (2021) analyzed the stability by shedding light on the non-smooth nature of the adversarial loss. Xiao et al. (2022b) further investigated the stability bound by introducing a notion of approximate smoothness. Based on this result, Xiao et al.
(2022c) proposed a smoothed version of SGDmax to improve the adversarial generalization. Xiao et al. (2023a) utilized the stability framework to improve the robustness of DNNs under various types of attacks.

Generalization in minimax learning frameworks: The generalization analysis of general minimax learning frameworks has been studied in several related works. Arora et al. (2017) established a uniform convergence generalization bound in terms of the discriminator's parameters in generative adversarial networks (GANs). Zhang et al. (2017); Bai et al. (2018) characterized the generalizability of GANs using the Rademacher complexity of the discriminator function space. Some works also analyzed generalization in GANs from the algorithmic perspective. Farnia & Ozdaglar (2021); Lei et al. (2021) compared the generalization of SGDA and SGDmax in minimax optimization problems using algorithmic stability. Wu et al. (2019) studied generalization in GANs from the perspective of differential privacy. Ozdaglar et al. (2022) proposed a new metric to evaluate the generalization of minimax problems and studied the generalization behaviors of SGDA and SGDmax.

3 Preliminaries

Suppose that a labeled sample (x, y) is randomly drawn from some unknown distribution D. The goal of adversarial training is to find a model $f_w$ with parameter $w \in W$ which minimizes the population risk against the adversarial perturbation δ from a feasible perturbation set Δ, defined as:

$$R(w) := \mathbb{E}_{(x,y)\sim D}\Big[\max_{\delta \in \Delta} h(w, \delta; x, y)\Big],$$

where $h(w, \delta; x, y) = \mathrm{Loss}(f_w(x + \delta), y)$ is the loss function in the supervised learning problem. Since the learner does not have access to the underlying distribution D but only a dataset $S = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ of size n, we define the empirical adversarial risk as

$$R_S(w) := \frac{1}{n}\sum_{j=1}^{n} \max_{\delta \in \Delta} h(w, \delta; x_j, y_j).$$

The generalization adversarial risk $\mathcal{E}_{\mathrm{gen}}(w)$ of model parameter w is defined as the difference between the population and empirical risks, i.e., $\mathcal{E}_{\mathrm{gen}}(w) := R(w) - R_S(w)$. For a potentially randomized algorithm A which takes a dataset S as input and outputs a random vector $w = A(S)$, we can define its expected generalization adversarial risk over the randomness of the training set S and the stochastic algorithm A, e.g. under mini-batch selection in stochastic gradient methods or random initialization of the weights of a neural net classifier,

$$\mathcal{E}_{\mathrm{gen}}(A) := \mathbb{E}_{S,A}\big[R(A(S)) - R_S(A(S))\big].$$

Throughout the paper, unless specified otherwise, we use $\|\cdot\|$ to denote the L2 norm of vectors or the Frobenius norm of matrices.

3.1 Adversarial Training

In standard applications of adversarial training, the perturbation set Δ is usually an L2-norm or L∞-norm bounded ball of some small radius ε (Szegedy et al., 2013; Goodfellow et al., 2014).
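To make these definitions concrete, the sketch below estimates the empirical adversarial risk $R_S(w)$ in PyTorch, assuming access to an `attack` oracle that approximates the inner maximization; the helper names (`attack`, `loss_fn`, `data`) are illustrative placeholders of ours, not identifiers from the paper's released code.

```python
import torch

def empirical_adversarial_risk(model, loss_fn, data, attack):
    """Estimate R_S(w) = (1/n) * sum_j max_{delta in Delta} h(w, delta; x_j, y_j).

    `loss_fn` must return per-sample losses (e.g., reduction='none'), and
    `attack(model, x, y)` returns an (approximate) worst-case perturbation
    for each sample; in practice PGD serves as this surrogate oracle.
    """
    model.eval()
    total, n = 0.0, 0
    for x, y in data:                         # iterable of (inputs, labels) batches
        delta = attack(model, x, y)           # approximate argmax over Delta
        with torch.no_grad():
            total += loss_fn(model(x + delta), y).sum().item()
        n += x.shape[0]
    return total / n
```

The generalization adversarial risk is then estimated by evaluating this quantity on held-out data and subtracting the corresponding training value.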
To robustify a neural network, the standard methodology $A_{\text{Vanilla}}$ is to train the network with (approximately) perfectly perturbed samples, both in practice (Madry et al., 2017; Rice et al., 2020) and in theoretical analysis (Xing et al., 2021; Xiao et al., 2022b), which is formally defined as follows:

Algorithm 1 Vanilla Adversarial Training Algorithm $A_{\text{Vanilla}}$
1: Input: Training samples S, perturbation set Δ, learning rate of model weight $\alpha_w$, mini-batch size b, number of iterations T
2: for step $t \in 1, \ldots, T$ do
3:   Sample a uniformly random mini-batch $B \subseteq S$ of size b
4:   Compute the adversarial attack $\delta^*_j$ for all $(x_j, y_j) \in B$: $\delta^*_j \leftarrow \arg\max_{\delta \in \Delta} h(w, \delta; x_j, y_j)$
5:   Update w with the perturbed samples: $w \leftarrow w - \frac{\alpha_w}{b}\sum_{(x_j, y_j) \in B} \nabla_w h(w, \delta^*_j; x_j, y_j)$
6: end for

In practice, due to the non-convexity of neural networks, it is computationally intractable to compute the best adversarial attack $\delta^* = \arg\max_{\delta \in \Delta} h(w, \delta; x)$, but the standard projected gradient descent (PGD) attack (Madry et al., 2017) is widely believed to produce near-optimal attacks: it iteratively projects the gradient $\nabla_\delta h(w, \delta; x)$ onto the set of extreme points of Δ, i.e.,

$$\pi_\Delta(g) := \arg\min_{\delta \in \mathrm{ExtremePoints}(\Delta)} \|g - \delta\|_2, \qquad (1)$$

updates the attack δ with the projected gradient $\pi_\Delta(\nabla_\delta h(w, \delta; x))$ and some step size $\alpha_\delta$, and projects the updated attack back onto the feasible set Δ, i.e.,

$$P_\Delta(g) := \arg\min_{\delta \in \Delta} \|g - \delta\|_2. \qquad (2)$$

Despite the significant robustness gained from $A_{\text{Vanilla}}$, it demands significant computational costs for training. The free adversarial training algorithm $A_{\text{Free}}$ (Shafahi et al., 2019) was proposed to avoid this overhead by simultaneously updating the model weight parameter w while performing PGD attacks. $A_{\text{Free}}$ is empirically observed to achieve comparable robustness to $A_{\text{Vanilla}}$, while it can considerably reduce the training time (Shafahi et al., 2019; Wong et al., 2020).

Algorithm 2 Free Adversarial Training Algorithm $A_{\text{Free}}$
1: Input: Training samples S, perturbation set Δ, step size of model weight $\alpha_w$, learning rate of adversarial attack $\alpha_\delta$, free step m, mini-batch size b, number of iterations T
2: for step $t \in 1, \ldots, T/m$ do
3:   Sample a uniformly random mini-batch $B \subseteq S$ of size b
4:   $\delta := [\delta_j]_{j:(x_j,y_j)\in B} \sim \mathrm{Uniform}(\Delta^b)$
5:   for iteration $i \in 1, \ldots, m$ do
6:     Compute the weight gradient and attack gradient by backpropagation:
7:     $g_w \leftarrow \frac{1}{b}\sum_{(x_j,y_j)\in B} \nabla_w h(w, \delta_j; x_j, y_j)$, and $g_\delta \leftarrow [\nabla_\delta h(w, \delta_j; x_j, y_j)]_{j:(x_j,y_j)\in B}$
8:     Update w with mini-batch gradient descent: $w \leftarrow w - \alpha_w g_w$
9:     Update δ with projected gradient ascent: $\delta \leftarrow [P_\Delta(\delta_j + \alpha_\delta \pi_\Delta(g_{\delta_j}))]_{j:(x_j,y_j)\in B}$
10:   end for
11: end for
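To illustrate how Algorithm 2 couples the two updates, here is a minimal PyTorch-style sketch of one outer step for an L2-ball perturbation set: a single backward pass per inner iteration produces both $g_w$ and $g_\delta$, after which the weights take a descent step and the perturbations take a normalized, projected ascent step. The initialization of δ and all function names are our own simplifying assumptions, not the authors' implementation.

```python
import torch

def l2_project(delta, eps):
    # P_Delta: project each per-sample perturbation onto the L2 ball of radius eps.
    flat = delta.flatten(1)
    factor = (eps / flat.norm(dim=1, keepdim=True).clamp_min(1e-12)).clamp(max=1.0)
    return (flat * factor).view_as(delta)

def free_at_step(model, opt, loss_fn, x, y, eps, alpha_delta, m):
    """One outer step of free AT (Algorithm 2): m simultaneous updates of w and delta."""
    delta = l2_project(torch.randn_like(x), eps)   # crude stand-in for Uniform(Delta^b)
    for _ in range(m):
        delta.requires_grad_(True)
        loss = loss_fn(model(x + delta), y)        # scalar (mean) loss over the batch
        opt.zero_grad()
        loss.backward()                            # one backprop gives grads w.r.t. w and delta
        g_delta = delta.grad.detach()
        opt.step()                                 # gradient descent step on w
        with torch.no_grad():
            g_flat = g_delta.flatten(1)
            direction = eps * (g_flat / g_flat.norm(dim=1, keepdim=True).clamp_min(1e-12))
            # pi_Delta(g) followed by P_Delta: normalized ascent step, then projection
            delta = l2_project(delta.detach() + alpha_delta * direction.view_as(delta), eps)
    return loss.item()
```

Note that, unlike a PGD adversary, the weight update here reuses the same mini-batch for all m inner iterations, which is exactly what makes the m backpropagations "free".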
We also compare $A_{\text{Vanilla}}$ and $A_{\text{Free}}$ with the fast adversarial training algorithm $A_{\text{Fast}}$ (Wong et al., 2020), a variant of the fast gradient sign method (FGSM) proposed by Goodfellow et al. (2014). Instead of computing a perfect perturbation, it applies only one projected gradient step with a fine-tuned step size from a randomly initialized point in Δ. $A_{\text{Fast}}$ is empirically shown to achieve comparable robustness to standard PGD training with lower training costs (Wong et al., 2020; Andriushchenko & Flammarion, 2020).

Algorithm 3 Fast Adversarial Training Algorithm $A_{\text{Fast}}$
1: Input: Training samples S, perturbation set Δ, learning rate of model weight $\alpha_w$, step size of adversarial attack $\alpha_\delta$, mini-batch size b, number of iterations T
2: for step $t \in 1, \ldots, T$ do
3:   Sample a uniformly random mini-batch $B \subseteq S$ of size b
4:   Compute the adversarial attack δ with a random start:
5:   $\tilde\delta := [\tilde\delta_j]_{j:(x_j,y_j)\in B} \sim \mathrm{Uniform}(\Delta^b)$
6:   $g_\delta \leftarrow [\nabla_\delta h(w, \tilde\delta_j; x_j, y_j)]_{j:(x_j,y_j)\in B}$
7:   $\delta \leftarrow [P_\Delta(\tilde\delta_j + \alpha_\delta \pi_\Delta(g_{\tilde\delta_j}))]_{j:(x_j,y_j)\in B}$
8:   Update w with the perturbed samples: $w \leftarrow w - \frac{\alpha_w}{b}\sum_{(x_j,y_j)\in B} \nabla_w h(w, \delta_j; x_j, y_j)$
9: end for

4 Stability and Generalization in Adversarial Training

To bound the generalization adversarial risk, we introduce the notion of uniform stability with respect to the adversarial loss (Bousquet & Elisseeff, 2002).

Definition 1. A randomized algorithm A is ϵ-uniformly stable if for all datasets $S, S' \in D^n$ such that S and S' differ in at most one example, we have

$$\sup_{x}\ \mathbb{E}_A\Big[\max_{\delta \in \Delta} h(A(S), \delta; x) - \max_{\delta \in \Delta} h(A(S'), \delta; x)\Big] \le \epsilon. \qquad (3)$$

Similar to Theorem 2.2 in Hardt et al. (2015), the expected generalization risk of a uniformly stable algorithm can be bounded by the following theorem.

Theorem 1. Assume that a randomized algorithm A is ϵ-uniformly stable. Then the expected generalization risk satisfies

$$|\mathcal{E}_{\mathrm{gen}}(A)| = \big|\mathbb{E}_{S,A}\big[R(A(S)) - R_S(A(S))\big]\big| \le \epsilon.$$

Proof. The proof follows Theorem 2.2 in Hardt et al. (2015) by replacing the loss function with the adversarial loss $\max_{\delta \in \Delta} h(w, \delta; x)$.

In order to study the uniform stability of adversarial training, we make the following assumptions on the Lipschitzness and smoothness of the objective function. Our generalization results hold as long as Assumptions 1 and 2 hold locally within an attack-radius distance from the support set of X.

Assumption 1. $h(w, \delta)$ is jointly L-Lipschitz in $(w, \delta)$ and $L_w$-Lipschitz in w over $W \times \Delta$, i.e., for every $w, w' \in W$ and $\delta, \delta' \in \Delta$ we have

$$|h(w, \delta) - h(w', \delta')|^2 \le L^2\big(\|w - w'\|^2 + \|\delta - \delta'\|^2\big), \qquad |h(w, \delta) - h(w', \delta)|^2 \le L_w^2\,\|w - w'\|^2.$$

Assumption 2. $h(w, \delta)$ is continuously differentiable and β-smooth over $W \times \Delta$, i.e., $[\nabla_w h(w,\delta), \nabla_\delta h(w,\delta)]$ is β-Lipschitz and for every $w, w' \in W$, $\delta, \delta' \in \Delta$ we have

$$\|\nabla_w h(w, \delta) - \nabla_w h(w', \delta')\|^2 + \|\nabla_\delta h(w, \delta) - \nabla_\delta h(w', \delta')\|^2 \le \beta^2\big(\|w - w'\|^2 + \|\delta - \delta'\|^2\big).$$

We clarify that the Lipschitzness and smoothness assumptions are common practice in uniform stability analysis (Hardt et al., 2015; Xing et al., 2021; Farnia & Ozdaglar, 2021; Xiao et al., 2022b). In practice, although the ReLU activation function is non-smooth, recent works (Du et al., 2019; Allen-Zhu et al., 2019) showed that the loss function of over-parameterized neural networks is semi-smooth; another line of works (Xie et al., 2020; Singla et al., 2021) suggests that replacing ReLU with smooth activation functions can strengthen adversarial training; and some works (Fazlyab et al., 2019; Shi et al., 2022) attempt to compute the Lipschitz constant of neural networks.
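As an empirical companion to these assumptions, one can probe the local Lipschitz constant $L_w$ by random perturbations of the weights; the sketch below is a crude Monte-Carlo check of ours that only certifies a lower bound (exact or certified estimates require dedicated methods such as those of Fazlyab et al. (2019); Shi et al. (2022)).

```python
import torch

def lipschitz_lower_bound(model, loss_fn, x, y, sigma=1e-3, trials=20):
    """Monte-Carlo lower bound on the local Lipschitz constant L_w of h in w:
    sample small random weight perturbations and measure |h(w') - h(w)| / ||w' - w||."""
    params = list(model.parameters())
    with torch.no_grad():
        base = loss_fn(model(x), y).item()
        best = 0.0
        for _ in range(trials):
            noise = [sigma * torch.randn_like(p) for p in params]
            for p, z in zip(params, noise):
                p.add_(z)                      # move to the perturbed weights w'
            diff = abs(loss_fn(model(x), y).item() - base)
            dist = torch.sqrt(sum((z ** 2).sum() for z in noise)).item()
            for p, z in zip(params, noise):
                p.sub_(z)                      # restore the original weights w
            best = max(best, diff / max(dist, 1e-12))
    return best
```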
5 Stability-based Generalization Bounds for Free AT

In this section, we provide generalization bounds for the vanilla, free, and fast adversarial training algorithms. While previous works mainly focus on theoretically analyzing the stability behavior of vanilla adversarial training under the scenario where $h(w, \delta; x)$ is convex in w (Xing et al., 2021; Xiao et al., 2022b), or where $h(w, \delta; x)$ is concave or even strongly-concave in δ (Lei et al., 2021; Farnia & Ozdaglar, 2021; Yang et al., 2022; Ozdaglar et al., 2022), our analysis focuses on the nonconvex-nonconcave scenario: we make no assumption on the convexity of $h(w, \delta; x)$ in w or the concavity of $h(w, \delta; x)$ in δ.

We defer the proofs of Theorems 2 and 4 to Appendices A.1 and A.2. Throughout the proofs, we assume that Assumptions 1 and 2 hold.

Theorem 2 (Stability generalization bound of $A_{\text{Vanilla}}$). Assume that $h(w, \delta)$ satisfies Assumptions 1 and 2 and is bounded in [0, 1], and the perturbation set Δ is an L2-norm ball of some constant radius ε, i.e., $\Delta = \{\delta : \|\delta\| \le \varepsilon\}$. Suppose that we run $A_{\text{Vanilla}}$ in Algorithm 1 for T steps with vanishing step size $\alpha_{w,t} \le c/t$. Let the constant $\lambda_{\text{Vanilla}} := \beta c$. Then

$$\mathcal{E}_{\mathrm{gen}}(A_{\text{Vanilla}}) \le \frac{b}{n}\Big(1 + \frac{1}{\lambda_{\text{Vanilla}}}\Big)\Big(\frac{2cL_w(\varepsilon\beta n + L)}{b}\Big)^{\frac{1}{\lambda_{\text{Vanilla}}+1}}\, T^{\frac{\lambda_{\text{Vanilla}}}{\lambda_{\text{Vanilla}}+1}}. \qquad (4)$$

By equation 4, we have the following asymptotic bound on $\mathcal{E}_{\mathrm{gen}}(A_{\text{Vanilla}})$ with respect to T and n:

$$\mathcal{E}_{\mathrm{gen}}(A_{\text{Vanilla}}) = O\Big(T^{\frac{\lambda_{\text{Vanilla}}}{\lambda_{\text{Vanilla}}+1}}\big/ n^{\frac{\lambda_{\text{Vanilla}}}{\lambda_{\text{Vanilla}}+1}}\Big). \qquad (5)$$

This bound suggests that the vanilla adversarial training algorithm could lead to large generalization gaps, because for any $T = \Omega(n)$ the bound $T^{\lambda_{\text{Vanilla}}/(\lambda_{\text{Vanilla}}+1)}/n^{\lambda_{\text{Vanilla}}/(\lambda_{\text{Vanilla}}+1)} = \Omega(1)$ is non-vanishing even when we are given infinitely many samples. This implication is also confirmed by the following lower bound from the work of Xing et al. (2021) and Xiao et al. (2022b):

Theorem 3 (Lower bound on stability; Theorem 1 in Xing et al. (2021), Theorem 5.2 in Xiao et al. (2022b)). Suppose $\Delta = \{\delta : \|\delta\| \le \varepsilon\}$. Assume $w(S)$ is the output of running $A_{\text{Vanilla}}$ on the dataset S with mini-batch size b = 1 and constant step size $\alpha_w \le 1/\beta$ for T steps. There exist some loss function $h(w, \delta; x)$ which is differentiable and convex with respect to w, some constant ε > 0, and some datasets S and S' that differ in only one sample, such that

$$\mathbb{E}\big[\|w(S) - w(S')\|\big] \ge \Omega\Big(\frac{\varepsilon T}{n}\Big). \qquad (6)$$

This lower bound indicates that $A_{\text{Vanilla}}$ could lack stability when the attack radius $\varepsilon = \Omega(1)$; hence the algorithm may result in a significant generalization error from the stability perspective. Note that the lower bound in equation 6 is not inconsistent with Theorem 2, in which the step size is assumed to be vanishing, $\alpha_{w,t} \le c/t$, and thus the lower bound is not directly applicable under that assumption. However, this constant generalization gap could be reduced by free adversarial training.

Theorem 4 (Stability generalization bound of $A_{\text{Free}}$). Assume that $h(w, \delta)$ satisfies Assumptions 1 and 2 and is bounded in [0, 1], and the perturbation set Δ is an L2-norm ball of some constant radius ε, i.e., $\Delta = \{\delta : \|\delta\| \le \varepsilon\}$. Suppose we run $A_{\text{Free}}$ in Algorithm 2 for T/m steps with vanishing step size $\alpha_{w,t} \le c/(mt)$ and constant step size $\alpha_\delta$. If the norm of the gradient $\nabla_\delta h(w, \delta; x)$ is lower bounded by $1/\psi$ for a constant ψ > 0 with probability 1 during the training process, let the constant $\lambda_{\text{Free}} := \beta c(1 + \beta c/m + \alpha_\delta\varepsilon\psi\beta)^{m-1}$. Then

$$\mathcal{E}_{\mathrm{gen}}(A_{\text{Free}}) \le \frac{b}{n}\Big(1 + \frac{1}{\lambda_{\text{Free}}}\Big)\Big(\frac{2LL_w\lambda_{\text{Free}}}{b\beta}\Big)^{\frac{1}{\lambda_{\text{Free}}+1}}\Big(\frac{T}{m}\Big)^{\frac{\lambda_{\text{Free}}}{\lambda_{\text{Free}}+1}}. \qquad (7)$$

Remark 1. To validate the soundness of the assumption on the lower-bounded norm of $\nabla_\delta h(w, \delta; x)$ by a constant $1/\psi$ in practical settings, in Appendix B.6 we present a numerical evaluation of the gradient norm over the course of free AT on the CIFAR-10 and CIFAR-100 data, indicating that the norm is consistently lower bounded by a constant on the order of $10^{-3}$ in the experiments.

Remark 2. Theorem 4 indicates how the simultaneous updates influence the generalization of adversarial training. From equation 7, we have the following asymptotic bound on $\mathcal{E}_{\mathrm{gen}}(A_{\text{Free}})$ with respect to T and n:

$$\mathcal{E}_{\mathrm{gen}}(A_{\text{Free}}) = O\Big(T^{\frac{\lambda_{\text{Free}}}{\lambda_{\text{Free}}+1}}\big/ n\Big). \qquad (8)$$

Therefore, by controlling the step size $\alpha_\delta$ of the maximization step, we can bound the coefficient $\lambda_{\text{Free}}$ and thus control the generalization gap of $A_{\text{Free}}$, where a lower $\alpha_\delta$ can result in a smaller generalization gap.
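To see how the coefficients enter the rates numerically, the short script below evaluates $\lambda_{\text{Vanilla}}$, $\lambda_{\text{Fast}}$, and $\lambda_{\text{Free}}$ together with the exponent $\lambda/(\lambda+1)$ governing the $T$-dependence of each bound; the constants are arbitrary placeholders we chose for illustration, not values fitted to any experiment. Note that the exponent alone does not capture the decisive difference: the free and fast bounds scale as $1/n$ in the sample size, whereas the vanilla bound only scales as $n^{-\lambda_{\text{Vanilla}}/(\lambda_{\text{Vanilla}}+1)}$.

```python
def t_exponent(lam):
    # Exponent lambda / (lambda + 1) of T in the stability-based bounds
    return lam / (lam + 1)

# Placeholder constants: smoothness beta, step-size scale c, attack step alpha_delta,
# radius eps, inverse gradient-norm bound psi, and number of free steps m.
beta, c, alpha_delta, eps, psi, m = 1.0, 0.1, 0.5, 0.5, 2.0, 4

lam_vanilla = beta * c
lam_fast = beta * c * (1 + alpha_delta * eps * psi * beta)
lam_free = beta * c * (1 + beta * c / m + alpha_delta * eps * psi * beta) ** (m - 1)

for name, lam in [("vanilla", lam_vanilla), ("fast", lam_fast), ("free", lam_free)]:
    print(f"{name:8s} lambda = {lam:.3f}, T-exponent = {t_exponent(lam):.3f}")
```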
Remark 3. One technical contribution of this work is to perform the stability analysis where the maximization variable is re-initialized after every m iterations, when a new mini-batch of data is used. Our theoretical results suggest that as long as m is bounded by $O\big(1/(\alpha_\delta\varepsilon\psi\beta)\big)$, the generalization risk will not change significantly with a greater m value, which is not implied by the standard bounds in the existing works (Farnia & Ozdaglar, 2021) in the literature. Also, our theoretical analysis considers the normalized gradient (instead of the vanilla gradient) for the gradient ascent step of solving the maximization sub-problem, and mini-batch stochastic optimization for updating the min and max variables at every iteration, neither of which is analyzed in the previous literature (Farnia & Ozdaglar, 2021).

We note that Theorem 3 (Xing et al., 2021; Xiao et al., 2022b) gives a lower bound $\Omega(T/n)$ on the algorithmic stability of vanilla adversarial training. On the other hand, comparing equation 8 with equation 5 suggests that $A_{\text{Free}}$ can generalize better than $A_{\text{Vanilla}}$ for any $T = O(n)$, since

$$\frac{T^{\frac{\lambda_{\text{Free}}}{\lambda_{\text{Free}}+1}}\big/ n}{T^{\frac{\lambda_{\text{Vanilla}}}{\lambda_{\text{Vanilla}}+1}}\big/ n^{\frac{\lambda_{\text{Vanilla}}}{\lambda_{\text{Vanilla}}+1}}} = \Big(\frac{T}{n}\Big)^{\frac{1}{\lambda_{\text{Vanilla}}+1}} \cdot \frac{1}{T^{\frac{1}{\lambda_{\text{Free}}+1}}} = O\Big(1\big/T^{\frac{1}{\lambda_{\text{Free}}+1}}\Big).$$

Furthermore, when $T = O(n)$, equation 8 gives $\mathcal{E}_{\mathrm{gen}}(A_{\text{Free}}) = O\big(1/n^{\frac{1}{\lambda_{\text{Free}}+1}}\big)$, which implies that the generalization gap of $A_{\text{Free}}$ can be bounded given enough samples. If the number of iterations T is fixed, one can see that the generalization gap of $A_{\text{Free}}$ converges to 0 faster than that of $A_{\text{Vanilla}}$. Therefore, neural nets trained by the free AT algorithm could generalize better than vanilla adversarially-trained networks due to their improved algorithmic stability. Our theoretical results also echo the conclusion in Schmidt et al. (2018) that adversarially robust generalization requires more data, since $\lambda_{\text{Free}}$ increases with respect to ε.

Remark 4. We note that the free AT method in Algorithm 2 follows the update rule of projected gradient descent ascent (projected GDA), which has been widely studied in the optimization literature. To the best of our knowledge, a tight convergence rate for projected GDA applied to a general nonconvex-nonconcave optimization problem is still an open question in the community (Lin et al., 2020; Li et al., 2022). As a tight convergence rate for GDA in nonconvex-nonconcave optimization is not available in the literature, we relied on numerical experiments to verify the improvement in the generalization gap by the free AT method. Note that our numerical experiments use standard selections of step sizes and other hyperparameters for the AT algorithms, and their results suggest that the standard hyperparameter selection leads to a lower generalization gap for free AT compared to vanilla AT.

We also provide a theoretical analysis of the fast adversarial training algorithm $A_{\text{Fast}}$ in Theorem 5, whose proof is deferred to Appendix A.3.

Theorem 5 (Stability generalization bound of $A_{\text{Fast}}$). Assume that $h(w, \delta)$ satisfies Assumptions 1 and 2 and is bounded in [0, 1], and the perturbation set Δ is an L2-norm ball of some constant radius ε, i.e., $\Delta = \{\delta : \|\delta\| \le \varepsilon\}$. Suppose that we run $A_{\text{Fast}}$ in Algorithm 3 for T steps with vanishing step size $\alpha_{w,t} \le c/t$ and constant step size $\alpha_\delta$. If the norm of the gradient $\nabla_\delta h(w, \delta; x)$ is lower bounded by $1/\psi$ for some constant ψ > 0 with probability 1 during the training process, let the constant $\lambda_{\text{Fast}} := \beta c(1 + \alpha_\delta\varepsilon\psi\beta)$. Then

$$\mathcal{E}_{\mathrm{gen}}(A_{\text{Fast}}) \le \frac{b}{n}\Big(1 + \frac{1}{\lambda_{\text{Fast}}}\Big)\Big(\frac{2cLL_w}{b}\Big)^{\frac{1}{\lambda_{\text{Fast}}+1}}\, T^{\frac{\lambda_{\text{Fast}}}{\lambda_{\text{Fast}}+1}}. \qquad (9)$$
Figure 1: Learning curves of different algorithms for a ResNet18 model adversarially trained against L2 and L∞ attacks on CIFAR-10: (a) robust train/test error and generalization gap against the L2-norm attack; (b) robust train/test error and generalization gap against the L∞-norm attack. The free curves are scaled horizontally by a factor of m.

Similar to Remark 2, we have the following asymptotic bound on $\mathcal{E}_{\mathrm{gen}}(A_{\text{Fast}})$ with respect to T and n:

$$\mathcal{E}_{\mathrm{gen}}(A_{\text{Fast}}) = O\Big(T^{\frac{\lambda_{\text{Fast}}}{\lambda_{\text{Fast}}+1}}\big/ n\Big). \qquad (10)$$

Therefore, by controlling the step size $\alpha_\delta$ of the maximization step, we can bound the coefficient $\lambda_{\text{Fast}}$ and thus control the generalization gap of $A_{\text{Fast}}$, where a lower $\alpha_\delta$ can result in a smaller generalization gap. Comparing equation 10 with equation 5 also suggests that for any $T = O(n)$, $A_{\text{Fast}}$ can generalize better than $A_{\text{Vanilla}}$, since

$$\frac{T^{\frac{\lambda_{\text{Fast}}}{\lambda_{\text{Fast}}+1}}\big/ n}{T^{\frac{\lambda_{\text{Vanilla}}}{\lambda_{\text{Vanilla}}+1}}\big/ n^{\frac{\lambda_{\text{Vanilla}}}{\lambda_{\text{Vanilla}}+1}}} = \Big(\frac{T}{n}\Big)^{\frac{1}{\lambda_{\text{Vanilla}}+1}}\cdot\frac{1}{T^{\frac{1}{\lambda_{\text{Fast}}+1}}} = O\Big(1\big/T^{\frac{1}{\lambda_{\text{Fast}}+1}}\Big).$$

6 Numerical Results

In this section, we evaluate the generalization performance of the vanilla, fast, and free adversarial training algorithms in a series of numerical experiments. We first demonstrate the overfitting issue in vanilla adversarial training and show that the free and fast algorithms can considerably reduce the generalization gap. We demonstrate that the smaller generalization gap could translate into greater robustness against score-based or transferred black-box attacks. To examine the advantages of free AT, we also study the generalization gap for different numbers of training samples.

Experiment Settings: We conduct our experiments on the datasets CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), Tiny-ImageNet (Le & Yang, 2015), and SVHN (Netzer et al., 2011). Following the standard setting in Madry et al. (2017), we use ResNet18 (He et al., 2016) for CIFAR-10 and CIFAR-100, ResNet50 for Tiny-ImageNet, and VGG19 (Simonyan & Zisserman, 2014) for SVHN to validate our results on a diverse selection of network architectures. For the vanilla adversarial training algorithm, since the inner optimization task $\max_\delta h(w, \delta; x)$ is computationally intractable for neural networks, which are generally non-concave, we apply standard projected gradient descent (PGD) attacks (Madry et al., 2017) as a surrogate adversary. For the free and fast algorithms, we adopt $A_{\text{Free}}$ and $A_{\text{Fast}}$ defined in Algorithms 2 and 3, following Shafahi et al. (2019); Wong et al. (2020).

Table 1: Robust training accuracy, testing accuracy, and generalization gap of the vanilla, fast, and free algorithms across the CIFAR-10 and CIFAR-100 datasets.

| Dataset | Attack | Results (%) | Vanilla-7 | Vanilla-10 | Fast | Free-2 | Free-4 | Free-6 | Free-8 | Free-10 |
|---|---|---|---|---|---|---|---|---|---|---|
| CIFAR-10 | L2 | Train Acc. | 100.0 | 100.0 | 89.7 | 89.5 | 86.4 | 78.9 | 74.3 | 71.1 |
| | | Test Acc. | 65.5 | 65.6 | 59.7 | 62.1 | 66.0 | 66.2 | 65.4 | 64.2 |
| | | Gen. Gap | 34.5 | 34.4 | 30.0 | 27.4 | 20.4 | 12.7 | 8.9 | 6.9 |
| | L∞ | Train Acc. | 94.8 | 95.1 | 81.4 | 37.9 | 63.7 | 58.8 | 55.6 | 52.7 |
| | | Test Acc. | 42.9 | 43.7 | 38.9 | 28.2 | 46.3 | 47.2 | 48.2 | 46.9 |
| | | Gen. Gap | 51.9 | 51.4 | 42.5 | 9.7 | 17.4 | 11.6 | 7.4 | 5.8 |
| CIFAR-100 | L2 | Train Acc. | 99.9 | 99.9 | 81.5 | 94.0 | 82.9 | 68.1 | 58.2 | 49.7 |
| | | Test Acc. | 35.7 | 36.5 | 30.8 | 34.8 | 36.6 | 38.0 | 37.8 | 36.9 |
| | | Gen. Gap | 64.2 | 63.4 | 50.7 | 59.2 | 46.3 | 30.1 | 20.4 | 12.8 |
| | L∞ | Train Acc. | 95.6 | 96.0 | 70.3 | 36.9 | 52.9 | 43.2 | 36.7 | 32.2 |
| | | Test Acc. | 20.0 | 20.3 | 17.3 | 15.0 | 22.4 | 24.5 | 24.9 | 24.5 |
| | | Gen. Gap | 75.6 | 75.7 | 53.0 | 21.9 | 30.5 | 18.7 | 11.8 | 7.7 |
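For reference, the following is a minimal PyTorch-style sketch of the L∞ PGD surrogate adversary used in these experiments; details such as pixel-range clipping and per-sample loss handling are omitted, and the names are our own illustrative choices.

```python
import torch

def pgd_attack(model, loss_fn, x, y, eps, alpha, steps):
    """L-infinity PGD surrogate for max_{delta in Delta} h(w, delta; x, y):
    signed-gradient ascent with projection onto the eps-ball (random start)."""
    delta = torch.empty_like(x).uniform_(-eps, eps)   # random start in Delta
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = loss_fn(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps)  # ascent + projection
    return delta.detach()
```

For instance, calling `pgd_attack(model, loss_fn, x, y, eps=8/255, alpha=2/255, steps=10)` corresponds to the Vanilla-10 L∞ adversary configuration described in Appendix B.1.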
Robust Overfitting during the Training Process: We applied the L2-norm attack of radius ε = 128/255 and the L∞-norm attack of radius ε = 8/255 to adversarially train ResNet18 models on CIFAR-10. For the vanilla algorithm, we used a PGD adversary to perturb the images. For the free algorithm, we applied the learning rate of adversarial attack $\alpha_\delta = \varepsilon$ with the free step m set to 2, 4, 6, 8, and 10.¹ The other implementation details are deferred to Appendix B.1. We trained the models for 200 epochs, and after every epoch we tested the models' robust accuracy against a PGD adversary and evaluated the generalization gap. The numerical results on CIFAR-10 and CIFAR-100 are presented in Table 1, and the training curves are plotted in Figure 1. Further numerical results on SVHN and Tiny-ImageNet are deferred to Table 4 in Appendix B.1. Based on the empirical results, we observe significant overfitting in the robust accuracy of vanilla adversarial training: the generalization gap is above 30% against the L2 attack and 50% against the L∞ attack. On the other hand, the free AT algorithm has less severe overfitting and reduced the generalization gap to 20%. Although the free AT algorithm applies a weaker adversary, it achieves comparable robustness on test samples to the vanilla AT algorithm against the PGD attacks by lowering the generalization gap. Additional numerical results for different numbers of free AT steps and on other datasets are provided in Appendix B.1.

Robustness Evaluation Against Black-box Attacks: To study the consequences of the generalization behavior of the free AT algorithm, we evaluated the robustness of the adversarially-trained networks against black-box attack schemes, where the attacker does not have access to the parameters of the target models (Bhagoji et al., 2018). We applied the square attack (Andriushchenko et al., 2020), a score-based methodology via random search, to examine networks adversarially trained by the discussed algorithms, as shown in Figure 2. We also used adversarial examples transferred from other independently trained robust models, as shown in Figure 3. More experiments on different datasets are provided in Appendix B.2. We extensively observe the improvements of the free algorithm compared to the vanilla algorithm against different black-box attacks, which suggests that its robustness is not gained from gradient masking (Athalye et al., 2018) but is rather attributed to the smaller generalization gap. Furthermore, our numerical findings in Appendix B.3 indicate that the adversarial perturbations designed for DNNs trained by free AT could transfer better to an unseen target neural net than those found for DNNs trained by vanilla and fast AT.

¹Throughout this work, we use Vanilla-m to denote the vanilla AT algorithm with m PGD-adversary iterations, and Vanilla without specification means Vanilla-10 by default. We also use Free-m to denote the free AT algorithm with m free steps, and Free without specification means Free-4 by default. It is worth noting that Vanilla-1 is equivalent to the fast AT algorithm.

Figure 2: Robust accuracy of ResNet18 models adversarially trained by the vanilla, fast, and free algorithms against the square attack on CIFAR-10. The left figure applies L2 attacks of radius ranging from 64 to 192, and the right figure applies L∞ attacks of radius ranging from 1 to 9.
Figure 3: Robust accuracy against transferred attacks designed for another independently trained robust model. The left figure applies L2 attacks of radius ranging from 64 to 192, and the right figure applies L∞ attacks of radius ranging from 1 to 9.

Generalization Gap for Different Numbers of Training Samples: To examine our theoretical results in Theorems 2 and 4, we evaluated the robust generalization loss with respect to different numbers of training samples n. We randomly sampled a subset of the CIFAR-10 training dataset of size $n \in \{10000, 20000, 30000, 40000, 50000\}$ and adversarially trained ResNet18 models on the subset for a fixed number of iterations. As shown in Figure 4, the generalization gap of free AT decreases notably faster than that of vanilla AT with respect to n, which is consistent with our theoretical analysis. More experimental results are discussed in Appendix B.4.

Generalization analysis of friendly adversarial training (Zhang et al., 2020) with adaptive attack steps: We also perform numerical experiments to compare vanilla AT with friendly adversarial training (Friendly-AT) (Zhang et al., 2020), which applies an adaptive number of steps and an adaptive training perturbation radius $\varepsilon_{\text{train}}$. As shown in Table 2, by choosing a sufficiently large $\varepsilon_{\text{train}}$, its generalization improvement over vanilla AT is similar to that of fast AT. On the other hand, a smaller $\varepsilon_{\text{train}}$ results in PGD-AT-like numerical results, showing the generalization-optimization accuracy trade-off explored by Friendly-AT. The theoretical analysis of Friendly-AT will be an interesting future direction for our work.

Figure 4: Adversarial generalization gap of ResNet18 models adversarially trained by the vanilla, fast, and free algorithms for a fixed number of steps on a subset of CIFAR-10: (a) L2-norm-based adversarial training; (b) L∞-norm-based adversarial training.

Table 2: Robust generalization performance of Friendly-AT (Zhang et al., 2020) and vanilla AT for ResNet18 models adversarially trained against L2 and L∞ attacks on CIFAR-10. We set the extent steps τ = 3 in Friendly-AT, and the other implementation details follow Zhang et al. (2020).

| Results (%) | Vanilla (L2) | Friendly (L2, ε_train = 128/255) | Friendly (L2, ε_train = 256/255) | Vanilla (L∞) | Friendly (L∞, ε_train = 8/255) | Friendly (L∞, ε_train = 16/255) |
|---|---|---|---|---|---|---|
| Train Acc. | 100.0 | 99.9 | 100.0 | 95.1 | 94.7 | 88.7 |
| Test Acc. | 65.6 | 65.8 | 66.9 | 43.7 | 44.1 | 46.3 |
| Gen. Gap | 34.4 | 34.1 | 33.1 | 51.4 | 50.6 | 42.4 |

7 Free-TRADES

Our theoretical results suggest that the improved generalization of the free AT algorithm could follow from its simultaneous min-max optimization updates. A natural question is whether we can extend these results to other adversarial training methods. Here we propose the Free-TRADES algorithm, a combination of the free AT algorithm and another well-established adversarial learning algorithm, TRADES (Zhang et al., 2019), and numerically evaluate the proposed Free-TRADES method. The main characteristic of TRADES can be summarized as substituting the adversarial loss $h(w, \delta; x, y) = \mathrm{Loss}(f_w(x + \delta), y)$ with a surrogate loss:

$$h_\lambda(w, \delta; x, y) := \mathrm{Loss}(f_w(x), y) + \frac{1}{\lambda}\,\mathrm{Loss}\big(f_w(x), f_w(x + \delta)\big). \qquad (11)$$

The TRADES algorithm aims to minimize the surrogate risk $\frac{1}{n}\sum_{j=1}^{n}\max_{\delta\in\Delta} h_\lambda(w, \delta; x_j, y_j)$. Therefore, a natural idea gained from our theoretical analysis is to apply simultaneous updates to the adversarial attack δ and the model weight w, which is stated in Algorithm 4.

Algorithm 4 Free-TRADES Adversarial Training Algorithm $A_{\text{Free-TRADES}}$
1: Input: Training samples S, perturbation set Δ, step size of model weight $\alpha_w$, learning rate of attack $\alpha_\delta$, free step m, mini-batch size b, number of iterations T, TRADES coefficient λ
2: for step $t \in 1, \ldots, T/m$ do
3:   Sample a uniformly random mini-batch $B \subseteq S$ of size b
4:   $\delta := [\delta_j]_{j:(x_j,y_j)\in B} \sim \mathrm{Uniform}(\Delta^b)$
5:   for iteration $i \in 1, \ldots, m$ do
6:     Compute the weight gradient and attack gradient by backpropagation:
7:     $g_w \leftarrow \frac{1}{b}\sum_{(x_j,y_j)\in B} \nabla_w h_\lambda(w, \delta_j; x_j, y_j)$, and $g_\delta \leftarrow [\nabla_\delta h_\lambda(w, \delta_j; x_j, y_j)]_{j:(x_j,y_j)\in B}$
8:     Update w with mini-batch gradient descent: $w \leftarrow w - \alpha_w g_w$
9:     Update δ with projected gradient ascent: $\delta \leftarrow [P_\Delta(\delta_j + \alpha_\delta \pi_\Delta(g_{\delta_j}))]_{j:(x_j,y_j)\in B}$
10:   end for
11: end for
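A sketch of the surrogate loss $h_\lambda$ from equation 11 in PyTorch is given below; following common TRADES implementations, we instantiate the second loss term as the KL divergence between the clean and perturbed predictive distributions, which is our assumption here rather than a detail stated in the paper. Plugging this loss into the free AT inner loop in place of h yields the Free-TRADES update of Algorithm 4.

```python
import torch.nn.functional as F

def trades_surrogate(model, x, y, delta, lam):
    """h_lambda(w, delta; x, y) = Loss(f_w(x), y) + (1/lambda) * Loss(f_w(x), f_w(x + delta)).
    The second term is taken as KL(f_w(x) || f_w(x + delta)), as in common TRADES code."""
    logits_clean = model(x)
    logits_adv = model(x + delta)
    clean_loss = F.cross_entropy(logits_clean, y)
    robust_loss = F.kl_div(F.log_softmax(logits_adv, dim=1),
                           F.softmax(logits_clean, dim=1),
                           reduction="batchmean")
    return clean_loss + robust_loss / lam
```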
We performed several numerical experiments to compare the performance of the TRADES and Free-TRADES algorithms. The results in Table 3 show that Free-TRADES could considerably improve the generalization gap while attaining a comparable (sometimes better) test performance to TRADES, which indicates that adversarial training algorithms other than vanilla AT can also benefit from simultaneous optimization updates. Furthermore, we note that free AT and Free-TRADES also have better generalization performance on clean data than vanilla AT and TRADES. Our theoretical results suggest that the training process of free AT is algorithmically more stable than that of vanilla AT; therefore, free AT could similarly be expected to generalize better on clean data than vanilla AT. The numerical results shown in Table 5 in Appendix B.1 are consistent with this intuition. The theoretical analysis of $A_{\text{Free-TRADES}}$ will be an interesting future direction for our work.

Table 3: Robust generalization performance of the TRADES and Free-TRADES algorithms for ResNet18 models adversarially trained against L2 and L∞ attacks on CIFAR-10 and CIFAR-100. We set the TRADES coefficient λ = 1/6, free steps m = 4, and the other details follow Appendix B.1. We run five independent trials and report the mean and standard deviation.

| Dataset | Attack | Results (%) | TRADES | Free-TRADES |
|---|---|---|---|---|
| CIFAR-10 | L2 | Train Acc. | 99.1 ± 0.1 | 83.4 ± 0.3 |
| | | Test Acc. | 66.3 ± 0.3 | 68.2 ± 0.2 |
| | | Gen. Gap | 32.8 ± 0.4 | 15.2 ± 0.2 |
| | L∞ | Train Acc. | 85.6 ± 0.3 | 61.2 ± 0.8 |
| | | Test Acc. | 50.4 ± 0.3 | 49.8 ± 0.5 |
| | | Gen. Gap | 35.3 ± 0.1 | 11.4 ± 0.3 |
| CIFAR-100 | L2 | Train Acc. | 99.6 ± 0.1 | 81.2 ± 0.8 |
| | | Test Acc. | 35.6 ± 0.3 | 40.0 ± 0.2 |
| | | Gen. Gap | 64.0 ± 0.2 | 41.1 ± 0.9 |
| | L∞ | Train Acc. | 83.5 ± 0.7 | 46.5 ± 0.4 |
| | | Test Acc. | 25.3 ± 0.3 | 25.4 ± 0.3 |
| | | Gen. Gap | 57.3 ± 0.6 | 21.1 ± 0.4 |

8 Conclusion

In this work, we studied the role of min-max optimization algorithms in the generalization performance of adversarial training methods. We focused on the widely-used free adversarial training method and, leveraging the algorithmic stability framework, compared its generalization behavior with that of vanilla adversarial training. Our generalization bounds suggest that not only can the free AT approach lead to faster optimization compared to vanilla AT, but it also leads to a lower generalization gap between the performance on training and test data. We note that our theoretical conclusions are based on upper bounds following the algorithmic stability-based generalization analysis, and an interesting topic for future study is to prove a similar result for the actual generalization gap under simple linear or shallow neural net classifiers.
Another future direction could be to extend our theoretical analysis of the simultaneous optimization updates to other adversarial training methods such as ALP (Kannan et al., 2018).

Acknowledgment

The work of Farzan Farnia is partially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China, Project 14209920, and is partially supported by a CUHK Direct Research Grant with CUHK Project No. 4055164.

References

Zeyuan Allen-Zhu, Yuanzhi Li, and Zhao Song. A convergence theory for deep learning via over-parameterization. In International Conference on Machine Learning, pp. 242–252. PMLR, 2019.

Maksym Andriushchenko and Nicolas Flammarion. Understanding and improving fast adversarial training. Advances in Neural Information Processing Systems, 33:16048–16059, 2020.

Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square attack: a query-efficient black-box adversarial attack via random search. In European Conference on Computer Vision, pp. 484–501. Springer, 2020.

Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, and Yi Zhang. Generalization and equilibrium in generative adversarial nets (GANs). In International Conference on Machine Learning, pp. 224–232. PMLR, 2017.

Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, pp. 274–283. PMLR, 2018.

Idan Attias, Aryeh Kontorovich, and Yishay Mansour. Improved generalization bounds for adversarially robust learning. The Journal of Machine Learning Research, 23(1):7897–7927, 2022.

Pranjal Awasthi, Natalie Frank, and Mehryar Mohri. Adversarial learning guarantees for linear hypotheses and neural networks. In International Conference on Machine Learning, pp. 431–441. PMLR, 2020.

Yu Bai, Tengyu Ma, and Andrej Risteski. Approximability of discriminators implies diversity in GANs. arXiv preprint arXiv:1806.10586, 2018.

Raef Bassily, Vitaly Feldman, Cristóbal Guzmán, and Kunal Talwar. Stability of stochastic gradient descent on nonsmooth convex losses. Advances in Neural Information Processing Systems, 33:4381–4391, 2020.

Arjun Nitin Bhagoji, Warren He, Bo Li, and Dawn Song. Practical black-box attacks on deep neural networks using efficient query mechanisms. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 154–169, 2018.

Olivier Bousquet and André Elisseeff. Stability and generalization. The Journal of Machine Learning Research, 2:499–526, 2002.

Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE, 2017.

Simon Du, Jason Lee, Haochuan Li, Liwei Wang, and Xiyu Zhai. Gradient descent finds global minima of deep neural networks. In International Conference on Machine Learning, pp. 1675–1685. PMLR, 2019.

Farzan Farnia and Asuman Ozdaglar. Train simultaneously, generalize better: Stability of gradient-based minimax learners. In International Conference on Machine Learning, pp. 3174–3185. PMLR, 2021.

Farzan Farnia, Jesse M Zhang, and David Tse. Generalizable adversarial training via spectral normalization. arXiv preprint arXiv:1811.07457, 2018.

Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George Pappas. Efficient and accurate estimation of Lipschitz constants for deep neural networks.
Advances in Neural Information Processing Systems, 32, 2019.

Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

Moritz Hardt, Benjamin Recht, and Yoram Singer. Train faster, generalize better: Stability of stochastic gradient descent. arXiv preprint arXiv:1509.01240, 2015.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.

Zhichao Huang, Yanbo Fan, Chen Liu, Weizhong Zhang, Yong Zhang, Mathieu Salzmann, Sabine Süsstrunk, and Jue Wang. Fast adversarial training with adaptive step size. IEEE Transactions on Image Processing, 2023.

Harini Kannan, Alexey Kurakin, and Ian Goodfellow. Adversarial logit pairing. arXiv preprint arXiv:1803.06373, 2018.

Hoki Kim, Woojin Lee, and Jaewook Lee. Understanding catastrophic overfitting in single-step adversarial training. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 8119–8127, 2021.

Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, Toronto, Ontario, 2009. URL https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf.

Ya Le and Xuan Yang. Tiny ImageNet visual recognition challenge. CS 231N, 7(7):3, 2015.

Yunwen Lei. Stability and generalization of stochastic optimization with nonconvex and nonsmooth problems. In The Thirty Sixth Annual Conference on Learning Theory, pp. 191–227. PMLR, 2023.

Yunwen Lei, Zhenhuan Yang, Tianbao Yang, and Yiming Ying. Stability and generalization of stochastic gradient methods for minimax problems. In International Conference on Machine Learning, pp. 6175–6186. PMLR, 2021.

Haochuan Li, Farzan Farnia, Subhro Das, and Ali Jadbabaie. On convergence of gradient descent ascent: A tight local analysis. In International Conference on Machine Learning, pp. 12717–12740. PMLR, 2022.

Tianyi Lin, Chi Jin, and Michael Jordan. On gradient descent ascent for nonconvex-concave minimax problems. In International Conference on Machine Learning, pp. 6083–6093. PMLR, 2020.

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.

Omar Montasser, Steve Hanneke, and Nathan Srebro. VC classes are adversarially robustly learnable, but only improperly. In Conference on Learning Theory, pp. 2512–2530. PMLR, 2019.

Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.

Asuman Ozdaglar, Sarath Pattathil, Jiawei Zhang, and Kaiqing Zhang. What is a good metric to study generalization of minimax learners? Advances in Neural Information Processing Systems, 35:38190–38203, 2022.

Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John C Duchi, and Percy Liang. Adversarial training can hurt generalization. arXiv preprint arXiv:1906.06032, 2019.

Leslie Rice, Eric Wong, and Zico Kolter. Overfitting in adversarially robust deep learning. In International Conference on Machine Learning, pp. 8093–8104. PMLR, 2020.

Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data.
Advances in Neural Information Processing Systems, 31, 2018.

Ali Shafahi, Mahyar Najibi, Mohammad Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! Advances in Neural Information Processing Systems, 32, 2019.

Zhouxing Shi, Yihan Wang, Huan Zhang, J Zico Kolter, and Cho-Jui Hsieh. Efficiently computing local Lipschitz constants of neural networks via bound propagation. Advances in Neural Information Processing Systems, 35:2350–2364, 2022.

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

Vasu Singla, Sahil Singla, Soheil Feizi, and David Jacobs. Low curvature activations reduce overfitting in adversarial training. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 16423–16433, 2021.

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

Eric Wong, Leslie Rice, and J Zico Kolter. Fast is better than free: Revisiting adversarial training. arXiv preprint arXiv:2001.03994, 2020.

Bingzhe Wu, Shiwan Zhao, Chaochao Chen, Haoyang Xu, Li Wang, Xiaolu Zhang, Guangyu Sun, and Jun Zhou. Generalization in generative adversarial networks: A novel perspective from privacy protection. Advances in Neural Information Processing Systems, 32, 2019.

Jiancong Xiao, Yanbo Fan, Ruoyu Sun, and Zhi-Quan Luo. Adversarial Rademacher complexity of deep neural networks. arXiv preprint arXiv:2211.14966, 2022a.

Jiancong Xiao, Yanbo Fan, Ruoyu Sun, Jue Wang, and Zhi-Quan Luo. Stability analysis and generalization bounds of adversarial training. Advances in Neural Information Processing Systems, 35:15446–15459, 2022b.

Jiancong Xiao, Jiawei Zhang, Zhi-Quan Luo, and Asuman E. Ozdaglar. Smoothed-SGDmax: A stability-inspired algorithm to improve adversarial generalization. In NeurIPS ML Safety Workshop, 2022c. URL https://openreview.net/forum?id=4rksWKdGovR.

Jiancong Xiao, Zeyu Qin, Yanbo Fan, Baoyuan Wu, Jue Wang, and Zhi-Quan Luo. Improving adversarial training for multiple perturbations through the lens of uniform stability. In The Second Workshop on New Frontiers in Adversarial Machine Learning, 2023a. URL https://openreview.net/forum?id=qvALKz8BUV.

Jiancong Xiao, Ruoyu Sun, and Zhi-Quan Luo. PAC-Bayesian spectrally-normalized bounds for adversarially robust generalization. Advances in Neural Information Processing Systems, 36:36305–36323, 2023b.

Jiancong Xiao, Qi Long, Weijie Su, et al. Bridging the gap: Rademacher complexity in robust and standard generalization. In The Thirty Seventh Annual Conference on Learning Theory, pp. 5074–5075. PMLR, 2024a.

Jiancong Xiao, Jiawei Zhang, Zhi-Quan Luo, and Asuman E. Ozdaglar. Uniformly stable algorithms for adversarial training and beyond. In Forty-first International Conference on Machine Learning, 2024b. URL https://openreview.net/forum?id=odCl49tWA6.

Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, and Quoc V Le. Smooth adversarial training. arXiv preprint arXiv:2006.14536, 2020.

Yue Xing, Qifan Song, and Guang Cheng. On the algorithmic stability of adversarial training. Advances in Neural Information Processing Systems, 34:26523–26535, 2021.

Yue Xing, Qifan Song, and Guang Cheng. Phase transition from clean training to adversarial training.
Advances in Neural Information Processing Systems, 35:9330–9343, 2022.

Zhenhuan Yang, Shu Hu, Yunwen Lei, Kush R Varshney, Siwei Lyu, and Yiming Ying. Differentially private SGDA for minimax problems. In Uncertainty in Artificial Intelligence, pp. 2192–2202. PMLR, 2022.

Dong Yin, Kannan Ramchandran, and Peter Bartlett. Rademacher complexity for adversarially robust generalization. In International Conference on Machine Learning, pp. 7085–7094. PMLR, 2019.

Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.

Bohang Zhang, Du Jiang, Di He, and Liwei Wang. Rethinking Lipschitz neural networks and certified robustness: A Boolean function perspective. Advances in Neural Information Processing Systems, 35:19398–19413, 2022.

Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In International Conference on Machine Learning, pp. 7472–7482. PMLR, 2019.

Jingfeng Zhang, Xilie Xu, Bo Han, Gang Niu, Lizhen Cui, Masashi Sugiyama, and Mohan Kankanhalli. Attacks which do not kill training make adversarial learning stronger. In International Conference on Machine Learning, pp. 11278–11287. PMLR, 2020.

Pengchuan Zhang, Qiang Liu, Dengyong Zhou, Tao Xu, and Xiaodong He. On the discrimination-generalization tradeoff in GANs. arXiv preprint arXiv:1711.02771, 2017.

A Proof of Stability Generalization Bounds

We first prove the Lipschitzness of $\max_{\delta\in\Delta} h(w, \delta; x)$ by the following lemma:

Lemma 1. Define $h_{\max}(w; x) := \max_{\delta\in\Delta} h(w, \delta; x)$. If $h(w, \delta; x)$ satisfies Assumption 1, then $h_{\max}$ is $L_w$-Lipschitz in w for any fixed x.

Proof. For any w, w' and $x \in X$, without loss of generality assume that $h_{\max}(w; x) \ge h_{\max}(w'; x)$. Then, upon defining $\delta^* = \arg\max_{\delta\in\Delta} h(w, \delta; x)$, we have

$$|h_{\max}(w; x) - h_{\max}(w'; x)| = h(w, \delta^*; x) - \max_{\delta\in\Delta} h(w', \delta; x) \le h(w, \delta^*; x) - h(w', \delta^*; x) \le L_w \|w - w'\|,$$

which completes the proof.

Another important observation is that, similar to Lemma 3.11 in Hardt et al. (2015), mini-batch gradient descent typically makes several steps before it encounters the one example on which the two datasets in the stability analysis differ.

Lemma 2. Suppose $h_{\max}(w; x) := \max_{\delta\in\Delta} h(w, \delta; x)$ is bounded as $h_{\max} \in [0, 1]$. Applying mini-batch gradient descent to two datasets S, S' that differ in only one sample s, denote by $w_t$ and $w'_t$ the outputs after t steps, respectively, and define $d_t^{(w)} := \|w_t - w'_t\|$. Then for any x and $t_0$,

$$\mathbb{E}\big[|h_{\max}(w_t; x) - h_{\max}(w'_t; x)|\big] \le \frac{b\,t_0}{n} + L_w\, \mathbb{E}\big[d_t^{(w)} \,\big|\, d_{t_0}^{(w)} = 0\big].$$

Proof. Denote the event $E = \mathbb{1}\{d_{t_0}^{(w)} = 0\}$. Noting that if the sample s is not visited before the $t_0$-th step then $w_{t_0} = w'_{t_0}$, since the updates are the same, we have

$$\Pr(E^c) \le \sum_{t=1}^{t_0} \Pr(s \in B_t) = \frac{b\,t_0}{n}.$$

Then, by the law of total probability,

$$\mathbb{E}\big[|h_{\max}(w_t; x) - h_{\max}(w'_t; x)|\big] = \Pr(E)\,\mathbb{E}\big[|h_{\max}(w_t; x) - h_{\max}(w'_t; x)| \,\big|\, E\big] + \Pr(E^c)\,\mathbb{E}\big[|h_{\max}(w_t; x) - h_{\max}(w'_t; x)| \,\big|\, E^c\big] \le L_w\,\mathbb{E}\big[d_t^{(w)} \,\big|\, d_{t_0}^{(w)} = 0\big] + \frac{b\,t_0}{n},$$

where the last step uses the Lipschitzness proved in Lemma 1 for the first term and the boundedness $h_{\max} \in [0, 1]$ for the second term.

Equipped with Lemmas 1 and 2, it remains to bound $\mathbb{E}[d_T^{(w)} \mid d_{t_0}^{(w)} = 0]$. The following lemma allows us to do this, given that for every t, $\mathbb{E}[d_t^{(w)}]$ is recursively bounded in terms of $\mathbb{E}[d_{t-1}^{(w)}]$.

Lemma 3.
Suppose that for every step t, the expected distance between the weight parameters is bounded by the following recursion for some constants ν and ξ (depending on the algorithm A):

$$\mathbb{E}\big[d_t^{(w)}\big] \le \Big(1 + \frac{\nu}{t}\Big)\,\mathbb{E}\big[d_{t-1}^{(w)}\big] + \frac{\nu\xi}{nt}. \qquad (12)$$

Then, after T steps, the generalization adversarial risk can be bounded by

$$\mathcal{E}_{\mathrm{gen}}(A) \le \frac{b}{n}\Big(1 + \frac{1}{\nu}\Big)\Big(\frac{L_w \xi \nu}{b}\Big)^{\frac{1}{\nu+1}}\, T^{\frac{\nu}{\nu+1}}.$$

Proof. Conditioned on $d_{t_0}^{(w)} = 0$ for some $t_0$, unrolling the recursion bound in equation 12 yields

$$\mathbb{E}\big[d_T^{(w)} \,\big|\, d_{t_0}^{(w)} = 0\big] \le \sum_{t=t_0+1}^{T} \frac{\nu\xi}{nt} \prod_{k=t+1}^{T}\Big(1 + \frac{\nu}{k}\Big) \le \sum_{t=t_0+1}^{T} \frac{\nu\xi}{nt} \exp\Big(\nu \sum_{k=t+1}^{T} \frac{1}{k}\Big) \le \sum_{t=t_0+1}^{T} \frac{\nu\xi}{nt} \exp\Big(\nu \log\frac{T}{t}\Big) = \frac{\nu\xi T^{\nu}}{n} \sum_{t=t_0+1}^{T} t^{-\nu-1} \le \frac{\xi}{n}\Big(\frac{T}{t_0}\Big)^{\nu}.$$

By Lemma 2, we further obtain

$$\mathbb{E}\big[|h_{\max}(w_T; x) - h_{\max}(w'_T; x)|\big] \le \frac{b\,t_0}{n} + \frac{L_w \xi}{n}\Big(\frac{T}{t_0}\Big)^{\nu}.$$

The right-hand side is minimized at $t_0 = \big(L_w \xi \nu T^{\nu} / b\big)^{\frac{1}{\nu+1}}$. Combining this bound with Theorem 1 finally gives us

$$\mathcal{E}_{\mathrm{gen}}(A) \le \sup_{S, S', x}\, \mathbb{E}\big[|h_{\max}(w_T; x) - h_{\max}(w'_T; x)|\big] \le \frac{b}{n}\Big(1 + \frac{1}{\nu}\Big)\Big(\frac{L_w \xi \nu}{b}\Big)^{\frac{1}{\nu+1}}\, T^{\frac{\nu}{\nu+1}},$$

hence the proof is complete.

A.1 Proof of Theorem 2

Lemma 4 (Growth Lemma of $A_{\text{Vanilla}}$). Consider two datasets S, S' that differ in only one sample s. Then the following recursion holds for any step t:

$$\mathbb{E}\big[d_t^{(w)}\big] \le (1 + \alpha_{w,t}\beta)\,\mathbb{E}\big[d_{t-1}^{(w)}\big] + 2\varepsilon\alpha_{w,t}\beta + \frac{2\alpha_{w,t}L}{n}. \qquad (13)$$

Proof. At step t, let $B_t$, $B'_t$ denote the mini-batches of the two runs, respectively. If $s \notin B_t$ (so $B_t = B'_t$), we have

$$d_t^{(w)} = \Big\|w_{t-1} - \frac{\alpha_{w,t}}{b}\sum_{x_j \in B_t} \nabla_w h(w_{t-1}, \delta_j; x_j) - w'_{t-1} + \frac{\alpha_{w,t}}{b}\sum_{x_j \in B'_t} \nabla_w h(w'_{t-1}, \delta'_j; x_j)\Big\|$$
$$\le \|w_{t-1} - w'_{t-1}\| + \frac{\alpha_{w,t}}{b}\sum_{x_j \in B_t} \big\|\nabla_w h(w_{t-1}, \delta_j; x_j) - \nabla_w h(w'_{t-1}, \delta_j; x_j)\big\| \qquad (14)$$
$$\quad + \frac{\alpha_{w,t}}{b}\sum_{x_j \in B_t} \big\|\nabla_w h(w'_{t-1}, \delta_j; x_j) - \nabla_w h(w'_{t-1}, \delta'_j; x_j)\big\| \qquad (15)$$
$$\le \|w_{t-1} - w'_{t-1}\| + \frac{\alpha_{w,t}}{b}\sum_{x_j \in B_t} \beta\|w_{t-1} - w'_{t-1}\| + \frac{\alpha_{w,t}}{b}\sum_{x_j \in B_t} \beta\,\|\delta_j - \delta'_j\| \qquad (16)$$
$$\le (1 + \alpha_{w,t}\beta)\,d_{t-1}^{(w)} + 2\varepsilon\alpha_{w,t}\beta, \qquad (17)$$

where the last inequality holds because the perturbation set is $\Delta = \{\delta : \|\delta\| \le \varepsilon\}$, hence $\|\delta_j - \delta'_j\| \le 2\varepsilon$.

If $s \in B_t$, by the Lipschitzness of h we can bound $\|\nabla_w h(w, \delta_s; s)\| \le L$ for all w, $\delta_s$, s. Hence

$$\big\|w_{t-1} - \alpha_{w,t}\nabla_w h(w_{t-1}, \delta_s; s) - w'_{t-1} + \alpha_{w,t}\nabla_w h(w'_{t-1}, \delta'_{s'}; s')\big\| \le d_{t-1}^{(w)} + 2\alpha_{w,t}L. \qquad (18)$$

Similar to equation 17, we can further bound

$$d_t^{(w)} = \Big\|w_{t-1} - \frac{\alpha_{w,t}}{b}\sum_{x_j \in B_t} \nabla_w h(w_{t-1}, \delta_j; x_j) - w'_{t-1} + \frac{\alpha_{w,t}}{b}\sum_{x_j \in B'_t} \nabla_w h(w'_{t-1}, \delta'_j; x_j)\Big\| \le (1 + \alpha_{w,t}\beta)\,d_{t-1}^{(w)} + 2\varepsilon\alpha_{w,t}\beta + \frac{2\alpha_{w,t}L}{b}. \qquad (19)$$

Since $B_t$ is randomly drawn from S, $\Pr(s \in B_t) = b/n$. Combining equations 17 and 19, by the law of total probability we have

$$\mathbb{E}\big[d_t^{(w)}\big] \le (1 + \alpha_{w,t}\beta)\,\mathbb{E}\big[d_{t-1}^{(w)}\big] + 2\varepsilon\alpha_{w,t}\beta + \frac{2\alpha_{w,t}L}{n},$$

hence we finish the proof of Lemma 4.

By Lemma 4, upon plugging $\nu = \beta c$ and $\xi = 2\varepsilon n + 2L/\beta$ into Lemma 3, we obtain the desired result

$$\mathcal{E}_{\mathrm{gen}}(A_{\text{Vanilla}}) \le \frac{b}{n}\Big(1 + \frac{1}{\beta c}\Big)\Big(\frac{2cL_w(\varepsilon\beta n + L)}{b}\Big)^{\frac{1}{\beta c + 1}}\, T^{\frac{\beta c}{\beta c + 1}}.$$

A.2 Proof of Theorem 4

Lemma 5 (Iteration-wise Growth Lemma of $A_{\text{Free}}$). Consider two datasets S, S' that differ in only one sample s. At iteration i of step t, let $B_t$, $B'_t$ denote the mini-batches, let $w_{t,i}$, $w'_{t,i}$ denote the model parameters and $\delta_{t,i}$, $\delta'_{t,i}$ the perturbations of the two runs, respectively, and let $d_{t,i}^{(w)} := \|w_{t,i} - w'_{t,i}\|$ and $d_{t,i}^{(\delta)} := \frac{1}{b}\sum_{j: x_j \in B_t} \|(\delta_{t,i})_j - (\delta'_{t,i})_j\|$. Define the expansivity matrix

$$\eta_t := \begin{pmatrix} 1 + \alpha_{w,t}\beta & \alpha_{w,t}\beta \\ \alpha_\delta\varepsilon\psi\beta & 1 + \alpha_\delta\varepsilon\psi\beta \end{pmatrix}. \qquad (20)$$

Then we have

$$\begin{pmatrix} d_{t,i+1}^{(w)} \\ d_{t,i+1}^{(\delta)} \end{pmatrix} \le \eta_t \begin{pmatrix} d_{t,i}^{(w)} \\ d_{t,i}^{(\delta)} \end{pmatrix} + \mathbb{1}\{s \in B_t\} \begin{pmatrix} \frac{2\alpha_{w,t}L}{b} \\ \frac{2\alpha_\delta\varepsilon\psi L}{b} \end{pmatrix}. \qquad (21)$$

Proof. If $s \notin B_t$, which implies $B_t = B'_t$, we have

$$d_{t,i+1}^{(w)} \le \|w_{t,i} - w'_{t,i}\| + \frac{\alpha_{w,t}}{b}\sum_{x_j \in B_t} \big\|\nabla_w h(w_{t,i}, (\delta_{t,i})_j; x_j) - \nabla_w h(w'_{t,i}, (\delta_{t,i})_j; x_j)\big\| \qquad (22)$$
$$\quad + \frac{\alpha_{w,t}}{b}\sum_{x_j \in B_t} \big\|\nabla_w h(w'_{t,i}, (\delta_{t,i})_j; x_j) - \nabla_w h(w'_{t,i}, (\delta'_{t,i})_j; x_j)\big\| \qquad (23)$$
$$\le \|w_{t,i} - w'_{t,i}\| + \frac{\alpha_{w,t}}{b}\sum_{x_j \in B_t} \beta\|w_{t,i} - w'_{t,i}\| + \frac{\alpha_{w,t}}{b}\sum_{x_j \in B_t} \beta\,\big\|(\delta_{t,i})_j - (\delta'_{t,i})_j\big\| \qquad (24)$$
$$= (1 + \alpha_{w,t}\beta)\,d_{t,i}^{(w)} + \alpha_{w,t}\beta\, d_{t,i}^{(\delta)}. \qquad (25)$$

If $s \in B_t$, the gradient difference with respect to s and s' must be bounded separately. By the Lipschitzness of h, we can bound the expansive property of the minimization step with respect to s and s' by

$$\big\|w_{t,i} - \alpha_{w,t}\nabla_w h(w_{t,i}, (\delta_{t,i})_s; s) - w'_{t,i} + \alpha_{w,t}\nabla_w h(w'_{t,i}, (\delta'_{t,i})_{s'}; s')\big\| \le d_{t,i}^{(w)} + 2\alpha_{w,t}L.$$
A.2 Proof of Theorem 4

Lemma 5 (Iteration-wise Growth Lemma of $\mathcal{A}_{\mathrm{Free}}$). Consider two datasets $S, S'$ that differ in only one sample $s$. At iteration $i$ of step $t$, let $B_t, B'_t$ denote the respective mini-batches, let $w_{t,i}, w'_{t,i}$ denote the model parameters and $\delta_{t,i}, \delta'_{t,i}$ the perturbations, and define $d^{(w)}_{t,i} := \|w_{t,i} - w'_{t,i}\|$ and $d^{(\delta)}_{t,i} := \frac{1}{b} \sum_{j: x_j \in B_t} \|(\delta_{t,i})_j - (\delta'_{t,i})_j\|$. Define the expansivity matrix
$$\eta_t := \begin{pmatrix} 1 + \alpha_{w,t}\beta & \alpha_{w,t}\beta \\ \alpha_\delta \varepsilon\psi\beta & 1 + \alpha_\delta \varepsilon\psi\beta \end{pmatrix}. \qquad (20)$$
Then we have
$$\begin{pmatrix} d^{(w)}_{t,i+1} \\ d^{(\delta)}_{t,i+1} \end{pmatrix} \preceq \eta_t \begin{pmatrix} d^{(w)}_{t,i} \\ d^{(\delta)}_{t,i} \end{pmatrix} + \mathbb{1}\{s \in B_t\} \begin{pmatrix} \frac{2\alpha_{w,t}L}{b} \\ \frac{2\alpha_\delta \varepsilon\psi L}{b} \end{pmatrix},$$
where $\preceq$ denotes the entrywise inequality.

Proof. If $s \notin B_t$, which implies $B_t = B'_t$, we have
$$d^{(w)}_{t,i+1} = \Big\| w_{t,i} - \frac{\alpha_{w,t}}{b} \sum_{x_j \in B_t} \nabla_w h(w_{t,i}, (\delta_{t,i})_j; x_j) - w'_{t,i} + \frac{\alpha_{w,t}}{b} \sum_{x'_j \in B'_t} \nabla_w h(w'_{t,i}, (\delta'_{t,i})_j; x'_j) \Big\|$$
$$\leq \|w_{t,i} - w'_{t,i}\| + \frac{\alpha_{w,t}}{b} \sum_{x_j \in B_t} \big\| \nabla_w h(w_{t,i}, (\delta_{t,i})_j; x_j) - \nabla_w h(w'_{t,i}, (\delta_{t,i})_j; x_j) \big\| \qquad (22)$$
$$\quad + \frac{\alpha_{w,t}}{b} \sum_{x_j \in B_t} \big\| \nabla_w h(w'_{t,i}, (\delta_{t,i})_j; x_j) - \nabla_w h(w'_{t,i}, (\delta'_{t,i})_j; x_j) \big\| \qquad (23)$$
$$\leq \|w_{t,i} - w'_{t,i}\| + \frac{\alpha_{w,t}}{b} \sum_{x_j \in B_t} \beta \|w_{t,i} - w'_{t,i}\| + \frac{\alpha_{w,t}}{b} \sum_{x_j \in B_t} \beta \|(\delta_{t,i})_j - (\delta'_{t,i})_j\| \qquad (24)$$
$$= (1 + \alpha_{w,t}\beta)\, d^{(w)}_{t,i} + \alpha_{w,t}\beta\, d^{(\delta)}_{t,i}. \qquad (25)$$
If $s \in B_t$, the gradient differences with respect to $s$ and $s'$ must be bounded separately. By the Lipschitzness of $h$, the expansivity of the minimization step with respect to $s$ and $s'$ is bounded by
$$\big\| w_{t,i} - \alpha_{w,t} \nabla_w h(w_{t,i}, (\delta_{t,i})_s; s) - w'_{t,i} + \alpha_{w,t} \nabla_w h(w'_{t,i}, (\delta'_{t,i})_{s'}; s') \big\| \leq d^{(w)}_{t,i} + 2\alpha_{w,t}L.$$
Similar to equation 25, we can further derive the bound for mini-batch gradient descent:
$$d^{(w)}_{t,i+1} \leq \Big(1 - \frac{1}{b}\Big)\Big((1 + \alpha_{w,t}\beta)\, d^{(w)}_{t,i} + \alpha_{w,t}\beta\, d^{(\delta)}_{t,i}\Big) + \frac{1}{b}\Big(d^{(w)}_{t,i} + 2\alpha_{w,t}L\Big) \leq (1 + \alpha_{w,t}\beta)\, d^{(w)}_{t,i} + \alpha_{w,t}\beta\, d^{(\delta)}_{t,i} + \frac{2\alpha_{w,t}L}{b}.$$
We then proceed to bound $d^{(\delta)}_{t,i}$ recursively. When $\Delta = \{\delta : \|\delta\| \leq \varepsilon\}$, by the definition of the projected gradient in equation 1, we have
$$\pi_\Delta(g) = \arg\min_{\delta \in \mathrm{ExtremePoints}(\Delta)} \|g - \delta\|^2 = \frac{\varepsilon g}{\|g\|}.$$
Since we assume that, with probability 1, the norm of the gradient $\nabla_\delta h(w, \delta; x)$ is lower bounded by $1/\psi$ for some constant $\psi > 0$, we can translate $\pi_\Delta$ into the projection onto the convex set $\Delta$. For any vector $g$ with $\|g\| \geq 1/\psi$,
$$\pi_\Delta(g) = \frac{\varepsilon g}{\|g\|} = \frac{\varepsilon\psi g}{\psi \|g\|} = \frac{\varepsilon\psi g}{\max\{1, \|\varepsilon\psi g\| / \varepsilon\}} = \arg\min_{\delta \in \Delta} \|\varepsilon\psi g - \delta\|^2 = P_\Delta(\varepsilon\psi g), \qquad (26)$$
where the third equality uses $\psi\|g\| \geq 1$.

Since $\Delta$ is convex, the projection $P_\Delta(\cdot)$ is 1-Lipschitz. Writing $g_{\delta_j} := \nabla_\delta h(w_{t,i}, (\delta_{t,i})_j; x_j)$ and $g_{\delta'_j} := \nabla_\delta h(w'_{t,i}, (\delta'_{t,i})_j; x'_j)$, for all $j$ such that $x_j \in B_t$ we have
$$\|(\delta_{t,i+1})_j - (\delta'_{t,i+1})_j\| = \big\| P_\Delta\big((\delta_{t,i})_j + \alpha_\delta \pi_\Delta(g_{\delta_j})\big) - P_\Delta\big((\delta'_{t,i})_j + \alpha_\delta \pi_\Delta(g_{\delta'_j})\big) \big\| \leq \big\| (\delta_{t,i})_j + \alpha_\delta \pi_\Delta(g_{\delta_j}) - (\delta'_{t,i})_j - \alpha_\delta \pi_\Delta(g_{\delta'_j}) \big\|$$
$$\leq \|(\delta_{t,i})_j - (\delta'_{t,i})_j\| + \alpha_\delta \big\| P_\Delta(\varepsilon\psi g_{\delta_j}) - P_\Delta(\varepsilon\psi g_{\delta'_j}) \big\| \leq \|(\delta_{t,i})_j - (\delta'_{t,i})_j\| + \alpha_\delta \varepsilon\psi \big\| \nabla_\delta h(w_{t,i}, (\delta_{t,i})_j; x_j) - \nabla_\delta h(w'_{t,i}, (\delta'_{t,i})_j; x'_j) \big\|.$$
If $x_j = x'_j$, by the smoothness we can further bound
$$\|(\delta_{t,i+1})_j - (\delta'_{t,i+1})_j\| \leq \|(\delta_{t,i})_j - (\delta'_{t,i})_j\| + \alpha_\delta \varepsilon\psi \big\| \nabla_\delta h(w_{t,i}, (\delta_{t,i})_j; x_j) - \nabla_\delta h(w'_{t,i}, (\delta_{t,i})_j; x_j) \big\| \qquad (27)$$
$$\quad + \alpha_\delta \varepsilon\psi \big\| \nabla_\delta h(w'_{t,i}, (\delta_{t,i})_j; x_j) - \nabla_\delta h(w'_{t,i}, (\delta'_{t,i})_j; x_j) \big\| \qquad (28)$$
$$\leq (1 + \alpha_\delta \varepsilon\psi\beta) \|(\delta_{t,i})_j - (\delta'_{t,i})_j\| + \alpha_\delta \varepsilon\psi\beta\, d^{(w)}_{t,i}. \qquad (29)$$
Otherwise, if $x_j = s \neq x'_j = s'$, we can bound it by the Lipschitzness:
$$\|(\delta_{t,i+1})_j - (\delta'_{t,i+1})_j\| \leq \|(\delta_{t,i})_j - (\delta'_{t,i})_j\| + 2\alpha_\delta \varepsilon\psi L. \qquad (30)$$
Upon combining equations 29 and 30, we obtain the desired recursion bound for $d^{(\delta)}_{t,i}$:
$$d^{(\delta)}_{t,i+1} \leq (1 + \alpha_\delta \varepsilon\psi\beta)\, d^{(\delta)}_{t,i} + \alpha_\delta \varepsilon\psi\beta\, d^{(w)}_{t,i} + \mathbb{1}\{s \in B_t\} \frac{2\alpha_\delta \varepsilon\psi L}{b}.$$
Finally, combining the above bounds completes the proof.

Lemma 6 (Step-wise Growth Lemma of $\mathcal{A}_{\mathrm{Free}}$). Consider two datasets $S, S'$ that differ in only one sample $s$. Let $B_t, B'_t$ denote the mini-batches at step $t$, and let $d^{(w)}_t := \|w_{t,m} - w'_{t,m}\|$ be the distance between the weight parameters after step $t$. Over the randomness of $B_t$, we have
$$\mathbb{E}[d^{(w)}_t] + \frac{2L}{n\beta} \leq \Big(1 + \frac{\beta c}{t} \big(1 + \alpha_{w,t}\beta + \alpha_\delta \varepsilon\psi\beta\big)^{m-1}\Big) \Big(\mathbb{E}[d^{(w)}_{t-1}] + \frac{2L}{n\beta}\Big).$$

Proof. Noting that the additive vector in Lemma 5 equals $(\eta_t - I)\, \mathbb{1}\{s \in B_t\} \frac{2L}{b\beta} (1, 0)^{\top}$, by Lemma 5 we obtain
$$\begin{pmatrix} d^{(w)}_{t,i+1} \\ d^{(\delta)}_{t,i+1} \end{pmatrix} + \mathbb{1}\{s \in B_t\} \frac{2L}{b\beta} \begin{pmatrix} 1 \\ 0 \end{pmatrix} \preceq \eta_t \left( \begin{pmatrix} d^{(w)}_{t,i} \\ d^{(\delta)}_{t,i} \end{pmatrix} + \mathbb{1}\{s \in B_t\} \frac{2L}{b\beta} \begin{pmatrix} 1 \\ 0 \end{pmatrix} \right).$$
Since the updates are repeated for $m$ iterations within one step, and both trajectories start each step from the same perturbation initialization so that $d^{(\delta)}_{t,0} = 0$, by induction we have
$$\begin{pmatrix} \tilde d_m \\ d^{(\delta)}_{t,m} \end{pmatrix} \preceq \eta_t^m \begin{pmatrix} \tilde d_0 \\ 0 \end{pmatrix},$$
where we denote $\tilde d_0 := d^{(w)}_{t,0} + \mathbb{1}\{s \in B_t\} \frac{2L}{b\beta}$ and $\tilde d_m := d^{(w)}_{t,m} + \mathbb{1}\{s \in B_t\} \frac{2L}{b\beta}$.

Denote $\alpha := \alpha_{w,t}\beta$ and $r := \alpha_\delta \varepsilon\psi / \alpha_{w,t}$ for simplicity. By the definition in equation 20, we can calculate the eigendecomposition of $\eta_t$:
$$\eta_t = \begin{pmatrix} 1 + \alpha & \alpha \\ \alpha r & 1 + \alpha r \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ r & -1 \end{pmatrix} \begin{pmatrix} 1 + \alpha(r+1) & 0 \\ 0 & 1 \end{pmatrix} \frac{1}{r+1} \begin{pmatrix} 1 & 1 \\ r & -1 \end{pmatrix}.$$
Hence we can solve the recursion:
$$\begin{pmatrix} \tilde d_m \\ d^{(\delta)}_{t,m} \end{pmatrix} \preceq \eta_t^m \begin{pmatrix} \tilde d_0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ r & -1 \end{pmatrix} \begin{pmatrix} (1 + \alpha(r+1))^m & 0 \\ 0 & 1 \end{pmatrix} \frac{1}{r+1} \begin{pmatrix} \tilde d_0 \\ r \tilde d_0 \end{pmatrix} = \begin{pmatrix} \frac{r + (1+\alpha(r+1))^m}{r+1}\, \tilde d_0 \\ \frac{r\big((1+\alpha(r+1))^m - 1\big)}{r+1}\, \tilde d_0 \end{pmatrix}.$$
Noting that by assumption $\alpha_{w,t} \leq \frac{c}{mt}$, upon plugging in $\alpha = \alpha_{w,t}\beta$ and $r = \alpha_\delta \varepsilon\psi / \alpha_{w,t}$, we have
$$\frac{r + (1 + \alpha(r+1))^m}{r+1} = 1 + \frac{(1 + \alpha(r+1))^m - 1}{r+1} = 1 + \alpha \sum_{j=0}^{m-1} (1 + \alpha(r+1))^j \leq 1 + \alpha m (1 + \alpha(r+1))^{m-1} \leq 1 + \frac{\beta c}{t} (1 + \alpha_{w,t}\beta + \alpha_\delta \varepsilon\psi\beta)^{m-1}.$$
Since $d^{(w)}_t = d^{(w)}_{t,m}$ and $d^{(w)}_{t-1} = d^{(w)}_{t,0}$, we obtain that
$$d^{(w)}_t + \mathbb{1}\{s \in B_t\} \frac{2L}{b\beta} \leq \Big(1 + \frac{\beta c}{t} (1 + \alpha_{w,t}\beta + \alpha_\delta \varepsilon\psi\beta)^{m-1}\Big) \Big(d^{(w)}_{t-1} + \mathbb{1}\{s \in B_t\} \frac{2L}{b\beta}\Big).$$
Since $B_t$ is drawn uniformly at random from $S$, $\Pr(s \in B_t) = b/n$. Taking expectations via the law of total probability completes the proof.

By Lemma 6, since $\alpha_{w,t} \leq c/(mt) \leq c/m$, upon plugging $\nu = \beta c (1 + \beta c/m + \alpha_\delta \varepsilon\psi\beta)^{m-1} =: \lambda_{\mathrm{Free}}$ and $\xi = \frac{2L}{\beta n}$ into Lemma 3, we obtain that after $T/m$ steps,
$$\epsilon_{\mathrm{gen}}(\mathcal{A}_{\mathrm{Free}}) \leq \Big(1 + \frac{1}{\lambda_{\mathrm{Free}}}\Big) \frac{1}{n} \Big(\frac{2L_w L\, \lambda_{\mathrm{Free}}}{\beta}\Big)^{\frac{1}{\lambda_{\mathrm{Free}}+1}} \Big(\frac{bT}{m}\Big)^{\frac{\lambda_{\mathrm{Free}}}{\lambda_{\mathrm{Free}}+1}}.$$
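The eigendecomposition used in the proof of Lemma 6 can be checked numerically. The following minimal NumPy sketch (the values of $\alpha$, $r$, $m$, and $\tilde d_0$ are arbitrary illustrative choices) confirms that $\eta_t$ has eigenvalues $1 + \alpha(r+1)$ and $1$, and that the $w$-coordinate of $\eta_t^m (\tilde d_0, 0)^{\top}$ matches the closed form $\frac{r + (1+\alpha(r+1))^m}{r+1}\, \tilde d_0$:

```python
import numpy as np

alpha, r, m, d0 = 0.01, 5.0, 10, 1.0  # illustrative values only

# Expansivity matrix of Lemma 5 with alpha = alpha_{w,t}*beta and r = alpha_delta*eps*psi/alpha_{w,t}
eta = np.array([[1 + alpha, alpha],
                [alpha * r, 1 + alpha * r]])

# Eigenvalues should be {1, 1 + alpha*(r+1)}
print(np.sort(np.linalg.eigvals(eta)))

# Power iteration vs. the closed form for the w-coordinate after m iterations
vec = np.linalg.matrix_power(eta, m) @ np.array([d0, 0.0])
closed_form = (r + (1 + alpha * (r + 1)) ** m) / (r + 1) * d0
print(vec[0], closed_form)  # the two values coincide up to floating-point error
```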
A.3 Proof of Theorem 5

To prove Theorem 5, we start with the following growth lemma of $\mathcal{A}_{\mathrm{Fast}}$.

Lemma 7 (Growth Lemma of $\mathcal{A}_{\mathrm{Fast}}$). Consider two datasets $S, S'$ that differ in only one sample $s$. Then the following recursion holds for any step $t$:
$$\mathbb{E}[d^{(w)}_t] \leq \big(1 + \alpha_{w,t}\beta(1 + \alpha_\delta \varepsilon\psi\beta)\big)\, \mathbb{E}[d^{(w)}_{t-1}] + \frac{2\alpha_{w,t}L}{n}.$$

Proof. We first bound the difference between $\delta_j$ and $\delta'_j$. At step $t$, let $B_t, B'_t$ denote the respective mini-batches, and let $\bar\delta_j$ denote the shared initialization of the perturbation for sample $x_j$. By equation 26, $\pi_\Delta(g) = P_\Delta(\varepsilon\psi g)$ whenever $\|g\| \geq 1/\psi$. Since $\Delta$ is convex, the projection $P_\Delta(\cdot)$ is 1-Lipschitz. Hence, for all $j$ such that $x_j \in B_t$,
$$\|\delta_j - \delta'_j\| = \big\| P_\Delta\big(\bar\delta_j + \alpha_\delta \pi_\Delta(g_{\delta_j})\big) - P_\Delta\big(\bar\delta_j + \alpha_\delta \pi_\Delta(g_{\delta'_j})\big) \big\| \leq \big\| \big(\bar\delta_j + \alpha_\delta \pi_\Delta(g_{\delta_j})\big) - \big(\bar\delta_j + \alpha_\delta \pi_\Delta(g_{\delta'_j})\big) \big\|$$
$$= \alpha_\delta \big\| P_\Delta(\varepsilon\psi g_{\delta_j}) - P_\Delta(\varepsilon\psi g_{\delta'_j}) \big\| \leq \alpha_\delta \varepsilon\psi \big\| \nabla_\delta h(w_{t-1}, \bar\delta_j; x_j) - \nabla_\delta h(w'_{t-1}, \bar\delta_j; x'_j) \big\|.$$
If $x_j = x'_j$, by smoothness we obtain $\|\delta_j - \delta'_j\| \leq \alpha_\delta \varepsilon\psi\beta\, d^{(w)}_{t-1}$, so
$$\big\| \nabla_w h(w_{t-1}, \delta_j; x_j) - \nabla_w h(w'_{t-1}, \delta'_j; x'_j) \big\| \leq \big\| \nabla_w h(w_{t-1}, \delta_j; x_j) - \nabla_w h(w'_{t-1}, \delta_j; x_j) \big\| + \big\| \nabla_w h(w'_{t-1}, \delta_j; x_j) - \nabla_w h(w'_{t-1}, \delta'_j; x_j) \big\|$$
$$\leq \beta \|w_{t-1} - w'_{t-1}\| + \beta \|\delta_j - \delta'_j\| \leq \beta (1 + \alpha_\delta \varepsilon\psi\beta)\, d^{(w)}_{t-1}.$$
Otherwise, if $x_j = s \neq x'_j = s'$, we can only bound this term by the Lipschitzness:
$$\big\| \nabla_w h(w_{t-1}, \delta_j; x_j) - \nabla_w h(w'_{t-1}, \delta'_j; x'_j) \big\| \leq 2L.$$
Therefore, we can bound $d^{(w)}_t$ by the following recursion:
$$d^{(w)}_t = \Big\| w_{t-1} - \frac{\alpha_{w,t}}{b} \sum_{x_j \in B_t} \nabla_w h(w_{t-1}, \delta_j; x_j) - w'_{t-1} + \frac{\alpha_{w,t}}{b} \sum_{x'_j \in B'_t} \nabla_w h(w'_{t-1}, \delta'_j; x'_j) \Big\|$$
$$\leq \|w_{t-1} - w'_{t-1}\| + \frac{\alpha_{w,t}}{b} \sum_{x_j \in B_t} \big\| \nabla_w h(w_{t-1}, \delta_j; x_j) - \nabla_w h(w'_{t-1}, \delta'_j; x'_j) \big\| \leq \big(1 + \alpha_{w,t}\beta(1 + \alpha_\delta \varepsilon\psi\beta)\big)\, d^{(w)}_{t-1} + \mathbb{1}\{s \in B_t\} \frac{2\alpha_{w,t}L}{b}.$$
Since $B_t$ is drawn uniformly at random from $S$, $\Pr(s \in B_t) = b/n$, so by the law of total probability we have
$$\mathbb{E}[d^{(w)}_t] \leq \big(1 + \alpha_{w,t}\beta(1 + \alpha_\delta \varepsilon\psi\beta)\big)\, \mathbb{E}[d^{(w)}_{t-1}] + \frac{2\alpha_{w,t}L}{n},$$
which finishes the proof of Lemma 7.

Armed with Lemmas 3 and 7, by letting $\nu = \beta c(1 + \alpha_\delta \varepsilon\psi\beta)$ and $\xi = \frac{2L}{\beta n (1 + \alpha_\delta \varepsilon\psi\beta)}$ we obtain
$$\epsilon_{\mathrm{gen}}(\mathcal{A}_{\mathrm{Fast}}) \leq \Big(1 + \frac{1}{\beta c(1 + \alpha_\delta \varepsilon\psi\beta)}\Big) \frac{1}{n}\, (2cL_w L)^{\frac{1}{\beta c(1 + \alpha_\delta \varepsilon\psi\beta)+1}}\, (bT)^{\frac{\beta c(1 + \alpha_\delta \varepsilon\psi\beta)}{\beta c(1 + \alpha_\delta \varepsilon\psi\beta)+1}},$$
which completes the proof.

B Additional Numerical Results

B.1 Robust Overfitting During the Training Process

Implementation details: For the network training process, we follow the standards in the literature (Madry et al., 2017; Rice et al., 2020). We apply mini-batch gradient descent with batch size $b = 128$ and weight decay $2 \times 10^{-4}$. We adopt a piecewise learning-rate decay schedule, starting at 0.1 and decaying by a factor of 10 at the 100th and 150th epochs, for 200 epochs in total. For the vanilla algorithm, we apply PGD adversaries with 1, 7, or 10 iterations and set the step size to $\varepsilon$, $\varepsilon/4$, or $\varepsilon/4$, respectively. Since the free algorithm repeats $m$ iterations at each step, we train it for $200/m$ epochs to match the number of training iterations for a fair comparison. For the adversaries, we consider both the $L_2$-norm attack of radius $\varepsilon = 128/255$ and the $L_\infty$-norm attack of radius $\varepsilon = 8/255$ (except for Tiny-ImageNet, where $\varepsilon = 4/255$).

We extend the experiments in Section 6 using different free steps $m$, various neural network architectures, and additional datasets. We use ResNet18 for CIFAR-10 and CIFAR-100, ResNet50 for Tiny-ImageNet, and VGG19 for SVHN. We apply both $L_2$ and $L_\infty$ adversaries and provide the training curves for CIFAR-10 and CIFAR-100 in Figure 5. We observe that vanilla training suffers from robust overfitting, while the free algorithm with a moderate number of free steps generalizes better and achieves robustness comparable to the vanilla algorithm.
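For concreteness, the free training loop used above can be outlined as in the following minimal PyTorch-style sketch. This is an illustration only, not the paper's released implementation (see the repository linked in the abstract for that); the model, loader, and step-size choices are placeholders, an $L_\infty$ constraint of radius eps is assumed, and clipping to the valid image range is omitted for brevity:

```python
import torch
import torch.nn.functional as F

def free_at_epoch(model, loader, optimizer, eps=8/255, m=4, device="cuda"):
    """One epoch of free adversarial training: each mini-batch is replayed m times,
    updating the weights and the perturbation simultaneously from a single
    backward pass, in the spirit of Shafahi et al. (2019)."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        delta = torch.zeros_like(x, requires_grad=True)  # warm-started per batch in this sketch
        for _ in range(m):
            loss = F.cross_entropy(model(x + delta), y)
            optimizer.zero_grad()
            loss.backward()           # one backward pass yields both gradient groups
            optimizer.step()          # descent step on the classifier weights
            with torch.no_grad():     # simultaneous ascent step on the perturbation
                delta += eps * delta.grad.sign()
                delta.clamp_(-eps, eps)
            delta.grad.zero_()
```

The key point mirrored from the analysis is that the minimization and maximization variables are updated from the same backpropagation round, rather than fully solving the inner maximization before each weight update as in vanilla AT.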
Table 4: Robust training accuracy, test accuracy, and generalization gap of the vanilla, fast, and free algorithms on the Tiny-ImageNet and SVHN datasets.

Dataset        Attack  Results (%)  Vanilla-7  Vanilla-10  Fast   Free-2  Free-4  Free-6  Free-8  Free-10
Tiny-ImageNet  L2      Train Acc.   100.0      100.0       100.0  100.0   100.0   99.3    65.1    46.8
                       Test Acc.    22.9       23.0        23.5   23.8    24.8    22.7    23.6    23.8
                       Gen. Gap     77.1       77.0        76.5   76.2    75.2    76.6    41.5    23.0
               L∞      Train Acc.   100.0      100.0       100.0  99.2    99.8    96.2    60.2    39.5
                       Test Acc.    13.8       13.9        13.5   12.3    14.5    14.4    15.7    16.8
                       Gen. Gap     86.2       86.1        86.5   86.9    85.3    81.8    44.5    22.7
SVHN           L2      Train Acc.   100.0      100.0       96.9   89.2    95.1    90.0    93.5    91.0
                       Test Acc.    61.4       61.2        60.8   61.6    60.8    62.3    61.6    62.5
                       Gen. Gap     38.6       38.8        36.1   27.6    34.3    27.7    31.9    28.5
               L∞      Train Acc.   89.1       88.9        55.1   49.6    55.4    63.8    56.7    56.9
                       Test Acc.    38.7       38.8        39.5   38.3    42.7    47.1    46.7    45.0
                       Gen. Gap     50.4       50.1        15.6   11.3    12.7    16.7    10.0    11.9

Table 5: Clean and robust training accuracy, test accuracy, and generalization gap of the vanilla, free, TRADES, and Free TRADES algorithms on the CIFAR-10 dataset.

Attack  Results (%)        Vanilla  Free  TRADES  Free TRADES
L2      Clean Train Acc.   100.0    98.7  99.8    95.8
        Clean Test Acc.    88.1     88.6  86.2    86.8
        Clean Gen. Gap     11.9     10.1  13.6    9.0
        Robust Train Acc.  100.0    86.4  99.2    83.1
        Robust Test Acc.   65.6     66.0  66.2    68.0
        Robust Gen. Gap    34.4     20.4  33.0    15.1
L∞      Clean Train Acc.   99.9     95.1  97.7    91.0
        Clean Test Acc.    84.7     85.9  82.4    83.1
        Clean Gen. Gap     15.2     9.2   15.3    7.9
        Robust Train Acc.  95.1     63.7  85.5    60.2
        Robust Test Acc.   43.7     46.3  50.1    48.7
        Robust Gen. Gap    51.4     17.4  35.4    11.5

Figure 5: Learning curves of different algorithms for a ResNet18 model adversarially trained against $L_2$-norm and $L_\infty$-norm attacks on CIFAR-10 and CIFAR-100. Panels: (a) robust train/test error and generalization gap against the $L_2$-norm attack on CIFAR-10; (b) against the $L_\infty$-norm attack on CIFAR-10; (c) against the $L_2$-norm attack on CIFAR-100; (d) against the $L_\infty$-norm attack on CIFAR-100. The free curves are scaled horizontally by a factor of $m$ for clear comparison.

B.2 Robustness Against Black-box Attacks

We perform additional experiments to test the robust accuracy of models trained by the vanilla, fast, and free algorithms against black-box attacks. In Figure 6, we report their accuracy against $L_2$ and $L_\infty$ square attacks with 5000 queries over a range of radii. In Figure 7, we use attacks transferred from other independently trained robust models.

Figure 6: Robust accuracy of models adversarially trained by the vanilla, fast, and free algorithms against square attacks. Panels: (a) ResNet18 models on CIFAR-10; (b) ResNet18 models on CIFAR-100; (c) ResNet50 models on Tiny-ImageNet. In each panel, the left plot applies $L_2$ attacks of radius from 64 to 192, and the right plot applies $L_\infty$ attacks of radius from 1 to 9.
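The black-box evaluation protocol can be sketched as follows, assuming the third-party torchattacks package is available. The Square attack interface shown here (the norm, eps, and n_queries arguments) is our assumption about that library's API and should be checked against its documentation; it is not taken from the paper's code:

```python
import torch
import torchattacks  # assumed third-party package

def square_attack_accuracy(model, loader, eps, norm="Linf", queries=5000, device="cuda"):
    """Estimate robust accuracy under the black-box Square attack.
    NOTE: the torchattacks.Square signature below is an assumption."""
    attack = torchattacks.Square(model, norm=norm, eps=eps, n_queries=queries)
    correct = total = 0
    model.eval()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = attack(x, y)  # query-based attack; no gradients of the target model are used
        correct += (model(x_adv).argmax(1) == y).sum().item()
        total += y.numel()
    return correct / total
```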
Figure 7: Robust accuracy of models adversarially trained by the vanilla, fast, and free algorithms against attacks transferred from other independently trained robust models. Panels: (a) ResNet18 models on CIFAR-10; (b) ResNet18 models on CIFAR-100; (c) ResNet50 models on Tiny-ImageNet. In each panel, the left plot applies $L_2$ attacks of radius ranging from 64 to 192, and the right plot applies $L_\infty$ attacks of radius ranging from 1 to 9.

B.3 Transferability

We further investigate the transferability of adversarial examples designed for the models trained by the aforementioned adversarial training algorithms. We compute the adversarial perturbations designed for the robust models and use them to attack standard ERM-trained models. We test the transferability for models trained by the vanilla, fast, and free algorithms. In Figure 8, we transfer the attacks to a standard ResNet18 model on CIFAR-100 and a standard ResNet50 model on Tiny-ImageNet. In Figure 9, we transfer the attacks to several standard models, including ResNet18, ResNet50, and WideResNet34 (Zagoruyko & Komodakis, 2016), on CIFAR-10. Our numerical results suggest that the better generalization performance of the free algorithm could yield more transferable adversarial perturbations, which can be more detrimental to the performance of unseen neural network models.

Figure 8: Test accuracy of standard target models against attacks transferred from models adversarially trained by the vanilla, fast, and free algorithms. Panels: (a) a standard ResNet18 target model on CIFAR-100; (b) a standard ResNet50 target model on Tiny-ImageNet. In each panel, the left plot applies $L_2$ attacks of radius ranging from 64 to 192, and the right plot applies $L_\infty$ attacks of radius ranging from 4 to 12.

Figure 9: Test accuracy of standard ResNet18, ResNet50, and WideResNet34 target models against attacks transferred from ResNet18 models adversarially trained by the vanilla, fast, and free algorithms on CIFAR-10. Panels: (a) ResNet18 target; (b) ResNet50 target; (c) WideResNet34 target. In each panel, the left plot applies $L_2$ attacks of radius ranging from 64 to 192, and the right plot applies $L_\infty$ attacks of radius ranging from 4 to 12.

B.4 Generalization Gap for Different Numbers of Training Samples

We perform additional experiments to study the relationship between the number of training samples and the generalization gap. We randomly sample subsets of the CIFAR-10 and CIFAR-100 training sets of size $n \in \{10000, 20000, 30000, 40000, 50000\}$ and adversarially train ResNet18 models on each subset for a fixed number of iterations, following the subsampling protocol sketched below. The results are shown in Figure 10.
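A minimal PyTorch sketch of this subsampling protocol is given below (the dataset path, seed, and loader settings are illustrative placeholders):

```python
import torch
from torch.utils.data import Subset, DataLoader
from torchvision import datasets, transforms

def subsampled_loader(n, batch_size=128, seed=0):
    """Build a loader over a random size-n subset of the CIFAR-10 training set."""
    train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                                 transform=transforms.ToTensor())
    generator = torch.Generator().manual_seed(seed)
    indices = torch.randperm(len(train_set), generator=generator)[:n]
    return DataLoader(Subset(train_set, indices.tolist()), batch_size=batch_size,
                      shuffle=True, num_workers=2)

# e.g., loaders for each subset size n in {10000, ..., 50000}
loaders = {n: subsampled_loader(n) for n in range(10000, 50001, 10000)}
```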
Figure 10: Adversarial generalization gap of ResNet18 models adversarially trained by the vanilla, fast, and free algorithms (with free steps $m = 4$ or $m = 6$) for a fixed number of steps on subsets of CIFAR-10 and CIFAR-100. Panels: (a) $L_2$-norm adversarial training on CIFAR-10; (b) $L_\infty$-norm adversarial training on CIFAR-10; (c) $L_2$-norm adversarial training on CIFAR-100; (d) $L_\infty$-norm adversarial training on CIFAR-100.

B.5 Hyperparameter Choices and Generalization Behavior with Early Stopping

For the fast adversarial training algorithm, we applied an attack learning rate of $\alpha_\delta = 7/255$ for the $L_\infty$ attack and $\alpha_\delta = 64/255$ for the $L_2$ attack over 200 training epochs, following Andriushchenko & Flammarion (2020). We note that our hyperparameter selection differs from the original implementation of fast AT (Wong et al., 2020), which applied an attack step size of $\alpha_\delta = 10/255$ for the $L_\infty$ attack over 15 training epochs with early stopping. We clarify that we do not use early stopping in our implementation, in order to test the generalization behavior of fast AT over the same 200 epochs used for PGD and free AT; the choice of 200 epochs follows Rice et al. (2020), which studies overfitting in adversarial training. Regarding the choice of attack step size, we observed that setting the fast attack learning rate to $\alpha_\delta = 10/255$ leads to catastrophic overfitting in fast AT over the course of 200 training epochs. The learning curves of fast AT with learning rates $\alpha_\delta = 7/255$ and $\alpha_\delta = 10/255$ for the $L_\infty$ attack over 200 training epochs are shown in Figure 11. The catastrophic overfitting phenomenon in fast AT has been similarly observed and reported in the literature (Andriushchenko & Flammarion, 2020; Kim et al., 2021; Huang et al., 2023).

Figure 11: Learning curves of fast AT with attack learning rates $\alpha_\delta = 7/255$ and $\alpha_\delta = 10/255$ for a ResNet18 model adversarially trained against $L_\infty$ attacks on CIFAR-10.

We also note that, since the stability framework yields a generalization bound in terms of the number of min and max steps, our theoretical analysis remains applicable when early stopping is employed. From the learning curves in Figure 5, we observe that throughout the entire training process, the generalization performance of free AT is consistently better than that of vanilla AT, which is consistent with our theoretical results. For instance, Table 6 reports the generalization gaps of the models trained by the vanilla, fast (with $\alpha_\delta = 10/255$), and free AT algorithms when the training process is stopped at the 50th epoch.

Table 6: Robust training, testing, and generalization performance of the models trained by the vanilla, fast (with $\alpha_\delta = 10/255$), and free AT algorithms against the $L_\infty$ attack on CIFAR-10, where the training process is stopped at the 50th epoch.

Results (%)  Vanilla  Fast  Free
Train Acc.   53.7     41.8  37.0
Test Acc.    47.1     39.6  35.4
Gen. Gap     6.6      2.2   1.6

B.6 Soundness of the Lower-bounded Gradient Norm Assumption in Theorem 4

In Theorem 4, we assume that the norm of the gradient $\nabla_\delta h(w, \delta; x)$ is lower bounded by $1/\psi$ for some constant $\psi > 0$ with probability 1 during the training process. We note that this assumption is only required at points within $\varepsilon$-distance of the training data. To address the soundness of this assumption, we numerically evaluated the gradient norm over the course of free AT training on the CIFAR-10 and CIFAR-100 datasets. The minimum gradient norm on the training data is consistently lower bounded by a value on the order of $10^{-3}$ in those experiments, i.e., $\psi$ on the order of $10^3$ suffices as a finite constant.
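A recording routine of this kind might look as follows: a minimal PyTorch sketch (illustrative only; model and loader are placeholders, 4-D image batches are assumed, and the perturbations are drawn uniformly on the $\varepsilon$-sphere rather than taken from a training run):

```python
import torch
import torch.nn.functional as F

def min_delta_grad_norm(model, loader, eps=128/255, device="cuda"):
    """Record the smallest per-sample L2 norm of grad_delta h(w, delta; x)
    over one pass, evaluated at random perturbations of L2 norm eps."""
    model.eval()
    min_norm = float("inf")
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        delta = torch.randn_like(x)
        delta = eps * delta / delta.flatten(1).norm(dim=1).clamp(min=1e-12).view(-1, 1, 1, 1)
        delta.requires_grad_(True)
        # 'sum' reduction makes grad w.r.t. delta equal to the per-sample gradients
        loss = F.cross_entropy(model(x + delta), y, reduction="sum")
        grad = torch.autograd.grad(loss, delta)[0]
        norms = grad.flatten(1).norm(dim=1)  # ||grad_delta h||_2 for each sample
        min_norm = min(min_norm, norms.min().item())
    return min_norm
```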
We trained ResNet18 networks on the CIFAR-10 and CIFAR-100 datasets, applying the $L_2$ adversary and setting the free step $m$ to 4 and 6, and recorded the gradient norm $\|\nabla_\delta h(w, \delta_j; x_j, y_j)\|_2$ of every sample throughout the training process. The resulting heatmaps are plotted in Figure 12.

Figure 12: Heatmaps of the per-sample gradient norm throughout the training process. Panels: (a) CIFAR-10, Free-4; (b) CIFAR-10, Free-6; (c) CIFAR-100, Free-4; (d) CIFAR-100, Free-6.