# On the Diversity of Adversarial Ensemble Learning

Jun-Qi Guo*1, Meng-Zhang Qian*1, Wei Gao1, Zhi-Hua Zhou1

*Equal contribution. 1National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; School of Artificial Intelligence, Nanjing University, Nanjing, China. Correspondence to: Wei Gao.

Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s).

Diversity has been one of the most crucial factors in the design of adversarial ensemble methods. This work focuses on two fundamental problems: how to define diversity for an adversarial ensemble, and how to correlate that diversity with algorithmic performance. We first show that it is an NP-hard problem to precisely calculate the diversity of two networks in adversarial ensemble learning, which makes it different from prior diversity analysis. We present the first diversity decomposition under the first-order approximation for adversarial ensemble learning. Specifically, the adversarial ensemble loss can be decomposed into the average of individual adversarial losses, gradient diversity, prediction diversity and cross diversity. Hence, it is not sufficient to merely consider gradient diversity when characterizing diversity, as in previous adversarial ensemble methods. We present a similar diversity decomposition for classification with the cross-entropy loss. Based on the theoretical analysis, we develop a new ensemble method via orthogonal adversarial predictions to simultaneously improve gradient diversity and cross diversity. We finally conduct experiments to validate the effectiveness of our method.

1. Introduction

General machine learning models may be misled heavily by examples with adversarial perturbations (Szegedy et al., 2014; Goodfellow et al., 2015), which raises serious concerns about the reliability of models, particularly in high-risk applications such as healthcare, finance and autonomous driving (Finlayson et al., 2019; Deng et al., 2021; Fursov et al., 2021). Various robust methods have been developed against adversarial examples in recent years (Zhang et al., 2019b; Gowal et al., 2021; Pang et al., 2021; Wang et al., 2023; Peng et al., 2023; Bartoldson et al., 2024).

Ensemble learning combines multiple learners, rather than relying on a single learner, to achieve better performance, and it has received much attention in adversarial robustness learning. Sequential robust ensembles have been constructed via the boosting framework (Abernethy et al., 2021; Zhang et al., 2019a; 2022; Guo et al., 2022), and parallel ensembles have also been developed for robust learning (Pinot et al., 2020; Sen et al., 2020; Yang et al., 2021; 2022; Deng & Mu, 2024).

Diversity has always been one of the most crucial factors in the design of ensemble methods (Zhou, 2012; Wood et al., 2023). For adversarial ensemble learning, Pang et al. (2019) took prediction disagreements of base learners as diversity, while Yang et al. (2020) defined the diversity via losses of base learners on exchanged adversarial examples. Another idea is to consider the misalignment of gradient directions for diversity (Kariyappa & Qureshi, 2019; Dabouei et al., 2020; Huang et al., 2021; Bogun et al., 2022). There are some fundamental problems open for adversarial ensemble learning.
For example, how to formally define the diversity of an adversarial ensemble, and, what is more, how to correlate the diversity definition with algorithmic performance from a theoretical view. This work studies fundamental problems of diversity in adversarial ensemble learning, and the main contributions are summarized as follows:

- We first show that it is an NP-hard problem to precisely calculate the diversity of two neural networks in an adversarial ensemble, since diversity is heavily relevant to intrinsic structures and output predictions of models simultaneously. This challenge makes it different from traditional diversity on output predictions (Zhou, 2012; Wood et al., 2023). Sun & Zhou (2018) indicated the importance of structure diversity for decision trees, whereas it remains open for neural networks.

- We present the first diversity decomposition under the first-order approximation in adversarial ensemble learning. Specifically, the adversarial ensemble loss is decomposed into the average of individual adversarial losses, prediction diversity, gradient diversity and cross diversity. It is not sufficient to only consider gradient diversity when characterizing diversity, as in prior adversarial ensemble methods. We present a similar decomposition for classification with the cross-entropy loss, which is commonly used for neural networks.

- Based on the theoretical analysis, we develop the Adv EOAP adversarial ensemble method (code is available at https://github.com/GuoJQ42/AdvOAP) via the orthogonality of adversarial predictions of base learners, which could improve gradient and cross diversity simultaneously.

- We finally conduct empirical studies to validate the effectiveness of our Adv EOAP method.

The rest of this work is organized as follows: Section 2 presents some preliminaries. Section 3 provides diversity analysis. Section 4 develops our method. Section 5 conducts experiments. Section 6 concludes with future work.

2. Preliminary

Let $\mathcal{X} \subseteq \mathbb{R}^d$ and $\mathcal{Y}$ denote the instance and label space, respectively, where $\mathcal{Y} = \{0, 1\}$ for binary classification and $\mathcal{Y} \subseteq \mathbb{R}$ for regression. Let $\mathcal{D}$ be an underlying distribution over the product space $\mathcal{X} \times \mathcal{Y}$, and we have training data $S_n = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where each sample is drawn i.i.d. from distribution $\mathcal{D}$. Define the perturbation set $\Delta^\epsilon_p$ w.r.t. the $l_p$ norm and $\epsilon > 0$ as
$$\Delta^\epsilon_p = \{\delta \in \mathbb{R}^d : \|\delta\|_p = (|\delta_1|^p + |\delta_2|^p + \cdots + |\delta_d|^p)^{1/p} \le \epsilon\},$$
which characterizes imperceptible perturbations of instances (Szegedy et al., 2014). Let $\mathcal{F} = \{f : \mathcal{X} \to \mathbb{R}\}$ be a hypothesis space, and a loss function $\ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$ is introduced to measure performance. Given $f \in \mathcal{F}$, we define the adversarial perturbation w.r.t. example $x$ as $\delta^*_f \in \arg\max_{\delta \in \Delta^\epsilon_p} \ell(f(x+\delta), y)$, and $x + \delta^*_f$ is called the adversarial example w.r.t. $x$. We define the expected adversarial loss as
$$L_{adv}(f, \mathcal{D}) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\delta \in \Delta^\epsilon_p} \{\ell(f(x+\delta), y)\}\Big],$$
and define the empirical adversarial loss w.r.t. data $S_n$ as
$$\hat{L}_{adv}(f, S_n) = \frac{1}{n}\sum_{i=1}^{n} \max_{\delta \in \Delta^\epsilon_p} \{\ell(f(x_i+\delta), y_i)\}.$$
We finally introduce some useful notations in this work. Denote by $\langle\cdot,\cdot\rangle$ the inner product of two vectors, and $e_i$ is the unit vector with $i$-th element 1. Write $[k] = \{1, 2, \ldots, k\}$ for integer $k > 0$. For two real numbers $a, b \in [0, 1]$, we define the KL-divergence as
$$\mathrm{KL}(a, b) = a\ln(a/b) + (1-a)\ln\big((1-a)/(1-b)\big),$$
and for two probability vectors $a = (a_1, a_2, \ldots, a_m)$ and $b = (b_1, b_2, \ldots, b_m)$, we define the KL-divergence as
$$\mathrm{KL}(a, b) = \sum_{i=1}^{m} a_i\ln(a_i/b_i).$$
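For general neural networks the inner maximization in the empirical adversarial loss has no closed form, so in practice it is approximated by projected gradient ascent, e.g., the PGD attack (Madry et al., 2018) used later in this paper. The following minimal PyTorch sketch estimates $\hat{L}_{adv}(f, S_n)$ on a batch for the $l_\infty$ ball with the cross-entropy loss; the function name, the step size `alpha` and the number of steps are illustrative placeholders, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def empirical_adv_loss(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """PGD estimate of (1/n) sum_i max_{||delta||_inf <= eps} CE(f(x_i + delta), y_i).

    `model` maps a batch of inputs in [0, 1] to class logits; `y` holds labels.
    """
    delta = torch.zeros_like(x)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # gradient-ascent step in l_inf geometry, then project onto the eps-ball
        delta = (delta.detach() + alpha * grad.sign()).clamp(-eps, eps)
        # keep the perturbed inputs inside the valid pixel range
        delta = (x + delta).clamp(0, 1) - x
    with torch.no_grad():
        return F.cross_entropy(model(x + delta), y).item()
```

For the $l_2$ ball, the sign step would be replaced by a normalized gradient step and the clamp by rescaling $\delta$ back onto the ball of radius $\epsilon$.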
3. Theoretical Analysis on Diversity

Given $m$ learners $f_1(x), \ldots, f_m(x)$, this work focuses on the simplest ensemble method (Dietterich, 2000; Zhou, 2012), i.e., the averaged ensemble $\bar f(x) = \sum_{j=1}^{m} f_j(x)/m$. We begin with the challenge in the analysis of adversarial diversity, and then present diversity decompositions w.r.t. the squared loss and the cross-entropy loss, respectively.

3.1. Main challenge on analysis of adversarial diversity

For traditional non-adversarial ensemble learning, it is easy to make the error-ambiguity decomposition over example $(x, y)$ w.r.t. the squared loss (Krogh & Vedelsby, 1994; Zhou, 2012) as follows:
$$\underbrace{(\bar f(x) - y)^2}_{\text{ensemble loss}} = \underbrace{\frac{1}{m}\sum_{j=1}^{m}(f_j(x) - y)^2}_{\text{average loss}} - \underbrace{\frac{1}{m}\sum_{j=1}^{m}(f_j(x) - \bar f(x))^2}_{\text{ambiguity}},$$
where the ambiguity can be viewed as the ensemble diversity. Wood et al. (2023) further presented a bias-variance-diversity decomposition for ensemble learning, simplified as
$$\text{ensemble loss} = \text{bias} + \text{variance} - \text{diversity}.$$
It is natural to consider some similar decompositions in adversarial ensemble learning. However, there remain some challenges, as shown by the following theorem.

Theorem 3.1. For the squared loss, it is an NP-hard problem to precisely calculate the diversity w.r.t. example $(x, y)$ in adversarial ensemble learning, given by
$$\frac{1}{2}\sum_{j=1}^{2}\max_{\delta\in\Delta^\epsilon_p}\{(f_j(x+\delta) - y)^2\} - \max_{\delta\in\Delta^\epsilon_p}\{(\bar f(x+\delta) - y)^2\},$$
where the error-ambiguity decomposition is considered for two neural networks $f_1(x)$ and $f_2(x)$ with ReLU activation, and $\bar f(x) = (f_1(x) + f_2(x))/2$.

Theorem 3.1 shows that, even for the ensemble of two neural networks, it is an NP-hard problem to make the error-ambiguity decomposition in adversarial ensemble learning. The main challenge is that the adversarial example is heavily dependent on the intrinsic structure of the neural network, and the diversity analysis is relevant to intrinsic structure, model prediction and adversarial perturbation simultaneously. This is different from traditional diversity analysis on model predictions (Zhou, 2012; Wood et al., 2023). We also notice the importance of structure diversity for decision trees (Sun & Zhou, 2018), whereas it remains open for neural networks.

The proof idea involves a reduction from the 3-SAT problem. Specifically, we consider a 3-SAT problem with $k$ clauses, where each clause is a disjunction of 3 literals. We construct a neural network $g(x)$ with $\Theta(k)$ layers and $\Theta(k)$ width, where each literal can be regarded as an input node of $g(x)$, and each disjunction and conjunction can be replaced with the max and min operators, respectively. The max and min operators can be constructed by ReLU activations. We then construct two neural networks $f_1(x, x_0) = \min(g(x), x_0)$ and $f_2(x, x_0) = \min(g(x), -x_0)$ by adding an auxiliary variable $x_0$. The detailed proof is given in Appendix A, which is partially motivated by previous work on the $l_1$ or $l_\infty$ norm (Katz et al., 2017; Weng et al., 2018), while our work generalizes to the $l_p$ norm with $p = 1, 2, \ldots, \infty$.

3.2. Diversity decomposition w.r.t. squared loss

Previous ensemble methods generally take the first-order approximation of the adversarial loss function, and focus on gradient diversity (Kariyappa & Qureshi, 2019; Dabouei et al., 2020; Huang et al., 2021). This is partially because of the NP-hardness of adversarial diversity in Theorem 3.1. Following the first-order Taylor approximation, we have $f(x+\delta) \approx f(x) + \nabla f(x)^\top\delta$. We now present the first decomposition of the adversarial ensemble loss w.r.t. the squared loss as follows:

Theorem 3.2.
For ensemble $\bar f = \sum_{j=1}^{m} f_j/m$, we present the decomposition of the adversarial ensemble loss under the first-order approximation over example $(x, y)$ as follows:
$$\max_{\delta\in\Delta^\epsilon_p}\{(\bar f(x+\delta)-y)^2\}
= \underbrace{\frac{1}{m}\sum_{j=1}^{m}\max_{\delta\in\Delta^\epsilon_p}\{(f_j(x+\delta)-y)^2\}}_{\text{average of individual adversarial losses}}
- \underbrace{\frac{1}{m}\sum_{j=1}^{m}\big(f_j(x)-\bar f(x)\big)^2}_{\text{prediction diversity}}
- \underbrace{\epsilon^2\Big(\frac{1}{m}\sum_{j=1}^{m}\|\nabla f_j(x)\|_q^2-\|\nabla\bar f(x)\|_q^2\Big)}_{\text{gradient diversity}}
- \underbrace{2\epsilon\Big(\frac{1}{m}\sum_{j=1}^{m}\|\nabla f_j(x)\|_q\,|f_j(x)-y|-\|\nabla\bar f(x)\|_q\,|\bar f(x)-y|\Big)}_{\text{cross diversity}},$$
where $1/p + 1/q = 1$.

In this theorem, the adversarial ensemble loss is decomposed into the average of individual adversarial losses, prediction diversity, gradient diversity and cross diversity. Gradient diversity measures the dispersion of gradients around the mean gradient w.r.t. the $l_q$ norm, and it exactly becomes the variance of the gradients for $q = 2$. Prediction diversity can be understood as the traditional diversity such as ambiguity (Krogh & Vedelsby, 1994; Zhou, 2012). Cross diversity can be viewed as a cross term between functional outputs and gradients. Gradient and cross diversities are highly relevant to the intrinsic structures of functions and to functional outputs. The detailed proof is given in Appendix B.1.

From Theorem 3.2, it is also observable that gradient and cross diversities play more important roles in adversarial ensemble learning for larger $\epsilon$ (i.e., the radius of the perturbation ball), yet prediction diversity dominates for smaller $\epsilon$. Also, the decomposition depends on the $l_p$ norm of the perturbation ball. Thus, the characterization of diversity in adversarial ensemble learning is more complicated than that of traditional non-adversarial ensemble learning.

Figure 1. An illustration of diversity decomposition in Theorem 3.2 on dataset CIFAR10 with perturbation balls of the l2 and l-infinity norm, respectively. Here, we consider CNNs as base learners. (The panels plot the average adversarial loss, the adversarial ensemble loss, and the prediction, gradient and cross diversities against the number of CNN base learners.)

Figure 1 presents an intuitive illustration for the diversity decomposition of Theorem 3.2. Here, we consider dataset CIFAR10 with two classes, and focus on the perturbation ball with the popular l2 and l-infinity norm. More experiment details are given in Appendix B.2, and we try to understand the trends of loss functions and diversities as the number of CNN base learners increases. As can be seen from Figure 1, we have a smaller adversarial ensemble loss for larger prediction, gradient and cross diversities w.r.t. the l2 and l-infinity norm perturbation balls, when we keep the average of individual adversarial losses stable. Generally, gradient and cross diversities are larger than prediction diversity and take more important roles. This is nicely in accordance with our Theorem 3.2, and diversity empirically plays an important role in adversarial ensemble learning.
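For linear base learners the first-order approximation is exact and, by Lemma B.3 in the appendix, the adversarial squared loss has the closed form $(|f(x)-y| + \epsilon\|\nabla f(x)\|_q)^2$, so the identity of Theorem 3.2 can be checked numerically term by term. A minimal NumPy sketch (the random learners, $\epsilon$ and $p$ below are arbitrary illustrative choices, not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, eps, p = 5, 10, 0.3, 2.0
q = p / (p - 1.0)                      # dual norm index, 1/p + 1/q = 1

W = rng.normal(size=(m, d))            # gradients of the linear base learners
b = rng.normal(size=m)
x, y = rng.normal(size=d), 1.0

preds = W @ x + b                      # f_j(x)
w_bar, pred_bar = W.mean(axis=0), preds.mean()

def adv_sq_loss(pred, w):              # closed-form adversarial squared loss (Lemma B.3)
    return (abs(pred - y) + eps * np.linalg.norm(w, q)) ** 2

avg_adv = np.mean([adv_sq_loss(preds[j], W[j]) for j in range(m)])
ens_adv = adv_sq_loss(pred_bar, w_bar)

pred_div  = np.mean((preds - pred_bar) ** 2)
grad_div  = eps**2 * (np.mean([np.linalg.norm(W[j], q)**2 for j in range(m)])
                      - np.linalg.norm(w_bar, q)**2)
cross_div = 2*eps * (np.mean([np.linalg.norm(W[j], q)*abs(preds[j]-y) for j in range(m)])
                     - np.linalg.norm(w_bar, q)*abs(pred_bar-y))

# Theorem 3.2: ensemble loss = average - prediction div - gradient div - cross div
assert np.isclose(ens_adv, avg_adv - pred_div - grad_div - cross_div)
print(ens_adv, avg_adv, pred_div, grad_div, cross_div)
```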
We focus on the first-order approximation as in previous adversarial ensemble methods (Kariyappa & Qureshi, 2019; Dabouei et al., 2020; Huang et al., 2021), and it is interesting to explore the second-order (or higher-order) approximation based on gradients and Hessian matrices. The main challenge is how to obtain the closed-form solution of the adversarial loss via the second-order approximation, because it is relevant to the roots of high-order polynomials (Forsythe & Golub, 1965; Moré & Sorensen, 1983; Fortin & Wolkowicz, 2004). More discussions on this issue are given in Appendix B.3.

Relation to previous adversarial ensemble methods. Most previous ensemble methods consider the first-order approximation of adversarial loss functions (Kariyappa & Qureshi, 2019; Dabouei et al., 2020; Yang et al., 2021; Bogun et al., 2022), and the diversity is measured by the average of cos values over pairs of gradients of base learners. Specifically, the diversity w.r.t. instance $x$ is given by
$$\frac{2}{m(m-1)}\sum_{i=1}^{m}\sum_{j=i+1}^{m}\cos\big(\nabla f_i(x), \nabla f_j(x)\big),$$
where $\cos(\nabla f_i(x), \nabla f_j(x))$ denotes the cos value of the angle between $\nabla f_i(x)$ and $\nabla f_j(x)$. We could derive the relationship between our gradient diversity (with $q = 2$) and the previous cos-based diversity as follows:
$$\text{Gradient diversity} = \frac{(m-1)\epsilon^2}{m^2}\sum_{i=1}^{m}\|\nabla f_i(x)\|_2^2 - \frac{\epsilon^2}{m^2}\sum_{i\neq j}\|\nabla f_i(x)\|_2\,\|\nabla f_j(x)\|_2\cos\big(\nabla f_i(x), \nabla f_j(x)\big).$$
It is feasible to enlarge gradient diversity by decreasing the cos values, which is nicely in accordance with previous work (Dabouei et al., 2020; Bogun et al., 2022). Meanwhile, other important factors, such as the gradient norms, also affect gradient diversity, rather than only one factor, as shown by the following example.

Example 1. There exist two ensembles with the same averages of cos values and of individual adversarial losses, but with different adversarial ensemble losses.

Proof. We focus on the 2-dimensional instance space $\mathcal{X}\subseteq\mathbb{R}^2$ and label space $\mathcal{Y}\subseteq\mathbb{R}$, and consider
$$f_1(x) = x_1 + x_2,\quad f_2(x) = x_1 - 3x_2,\quad f_3(x) = abx_1 + bx_2,\quad f_4(x) = abx_1 - bx_2,$$
where $a = (\sqrt{5}-1)/2$ and $b \approx 1.987$ is chosen such that the two ensembles share the same average of individual adversarial losses. We study two ensembles: one ensemble of $f_1$ and $f_2$; the other ensemble of $f_3$ and $f_4$. For example $(x, y) = ([1, 0], 1)$ and perturbation set $\Delta = \{\delta : \|\delta\|_2 \le 2\}$, we have
$$\cos(\nabla f_1(x), \nabla f_2(x)) = \cos(\nabla f_3(x), \nabla f_4(x)),$$
also with the same average of individual adversarial losses
$$\frac{1}{2}\sum_{i=1}^{2}\max_{\delta\in\Delta}(f_i(x+\delta)-y)^2 = \frac{1}{2}\sum_{i=3}^{4}\max_{\delta\in\Delta}(f_i(x+\delta)-y)^2.$$
However, the adversarial ensemble losses are different:
$$\max_{\delta\in\Delta}\big(f_1(x+\delta)/2 + f_2(x+\delta)/2 - y\big)^2 = 8,\qquad \max_{\delta\in\Delta}\big(f_3(x+\delta)/2 + f_4(x+\delta)/2 - y\big)^2 \approx 7.2.$$
Here, we consider the $l_2$ norm in the perturbation set $\Delta$, and a similar analysis could be made for the $l_p$ norm. More details are presented in Appendix B.4.

In addition to Example 1, we also present empirical studies on adversarial ensemble losses versus the cos values over dataset CIFAR10, and the experiment details are given in Appendix B.2. Figure 2 shows the curves of the adversarial ensemble loss and the averages of cos values and individual adversarial losses with the l2 and l-infinity norm.

Figure 2. An illustration of the influence of the average cos value on dataset CIFAR10 with perturbation balls of the l2 and l-infinity norm, respectively. Here, we consider CNNs as base learners. (The panels plot the average adversarial loss, the adversarial ensemble loss, and the average cos value against the number of CNN base learners.)
From Figure 2, it is clear that the adversarial ensemble loss keeps decreasing when we increase the number of CNN base learners, whereas the averages of cos values and of individual adversarial losses remain almost constant. Therefore, it is not sufficient to characterize the diversity of an adversarial ensemble by merely considering the average of cos values as in (Dabouei et al., 2020; Bogun et al., 2022).

3.3. Diversity decomposition w.r.t. cross-entropy loss

We study the decomposition of the adversarial ensemble w.r.t. the cross-entropy loss for binary classification, and also follow the first-order approximation $f(x+\delta) \approx f(x) + \nabla f(x)^\top\delta$. For base learners with logit outputs (i.e., the logarithm of the ratio of probabilities), the probability of the positive class over $x + \delta$ is
$$p_f(x+\delta) = \big(1 + \exp(-(f(x) + \nabla f(x)^\top\delta))\big)^{-1}. \quad (1)$$

Figure 3. An illustration of diversity decomposition in Theorem 3.4 on dataset CIFAR10 with perturbation balls of the l2 and l-infinity norm, respectively. Here, we consider CNNs as base learners. (The panels plot the average adversarial cross-entropy loss, the adversarial ensemble loss, and the gradient and cross diversities against the number of CNN base learners.)

Figure 4. An illustration of the influence of the average cos value on dataset CIFAR10 with perturbation balls w.r.t. the l2 and l-infinity norm, respectively. Here, we consider CNNs as base learners. (The panels plot the adversarial losses and the average cos value against the number of CNN base learners.)

The cross-entropy loss over $(x+\delta, y)$ is given by
$$\ell(f(x+\delta), y) = -y\big(f(x) + \nabla f(x)^\top\delta\big) + \ln\big(1 + \exp(f(x) + \nabla f(x)^\top\delta)\big),$$
as in (Bishop & Nasrabadi, 2006), and recall that
$$\delta^*_f \in \arg\max_{\delta\in\Delta^\epsilon_p} \ell(f(x+\delta), y). \quad (2)$$
We have the following closed-form solution for $p_f(x+\delta^*_f)$, and the detailed proof is presented in Appendix C.1.

Lemma 3.3. For function $f$ and example $(x, y)$, we have
$$p_f(x+\delta^*_f) = \Big(1 + \exp\big(-(f(x) - (2y-1)\|\nabla f(x)\|_q\,\epsilon)\big)\Big)^{-1},$$
where the probability of the positive class $p_f(\cdot)$ and the perturbation $\delta^*_f$ are defined by Eqns. (1) and (2), respectively.

Based on this lemma, we have

Theorem 3.4. For ensemble $\bar f = \sum_{j=1}^{m} f_j/m$, we have the decomposition of the adversarial ensemble loss under the first-order approximation over example $(x, y)$ as follows:
$$\max_{\delta\in\Delta^\epsilon_p}\ell(\bar f(x+\delta), y)
= \underbrace{\frac{1}{m}\sum_{j=1}^{m}\max_{\delta\in\Delta^\epsilon_p}\ell(f_j(x+\delta), y)}_{\text{average of individual adversarial losses}}
- \underbrace{r\Big(\frac{1}{m}\sum_{j=1}^{m}\|\nabla f_j(x)\|_q - \|\nabla\bar f(x)\|_q\Big)}_{\text{gradient diversity}}
- \underbrace{\frac{1}{m}\sum_{j=1}^{m}\mathrm{KL}\big(p_{\bar f}(\tilde x_{\bar f}),\, p_{f_j}(\tilde x_{f_j})\big)}_{\text{cross diversity}},$$
where $1/p + 1/q = 1$, $r = \epsilon\,(y - p_{\bar f}(\tilde x_{\bar f}))(2y-1)$, $p_f(\cdot)$ is defined by Eqn. (1), and $\tilde x_f = x + \delta^*_f$ is the adversarial example with $\delta^*_f$ from Eqn. (2).

In this theorem, the adversarial ensemble loss is decomposed into the average of individual adversarial losses, gradient diversity and cross diversity.
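Before unpacking the individual terms, the identity can be verified numerically for linear logits, where the first-order approximation is exact and Lemma 3.3 gives the worst-case probability of the positive class in closed form. A minimal NumPy sketch (the random draws, $m$ and $\epsilon$ below are illustrative choices only):

```python
import numpy as np

rng = np.random.default_rng(1)
m, d, eps, q = 4, 6, 0.5, 2.0          # l2 perturbation ball, so q = 2
W = rng.normal(size=(m, d))            # linear logits f_j(x) = W[j] . x + b[j]
b = rng.normal(size=m)
x, y = rng.normal(size=d), 1

sigmoid  = lambda z: 1 / (1 + np.exp(-z))
softplus = lambda z: np.log1p(np.exp(z))
kl = lambda p, s: p*np.log(p/s) + (1-p)*np.log((1-p)/(1-s))   # Bernoulli KL

def adv_ce(pred, grad):
    # worst-case logit via Lemma 3.3, and the corresponding CE loss / probability
    z = pred - (2*y - 1) * eps * np.linalg.norm(grad, q)
    return -y*z + softplus(z), sigmoid(z)

preds = W @ x + b
w_bar, pred_bar = W.mean(axis=0), preds.mean()

losses, probs = zip(*(adv_ce(preds[j], W[j]) for j in range(m)))
ens_loss, p_bar = adv_ce(pred_bar, w_bar)

r = eps * (y - p_bar) * (2*y - 1)
grad_div  = r * (np.mean([np.linalg.norm(W[j], q) for j in range(m)])
                 - np.linalg.norm(w_bar, q))
cross_div = np.mean([kl(p_bar, pj) for pj in probs])

# Theorem 3.4: ensemble = average of individual losses - gradient div - cross div
assert np.isclose(ens_loss, np.mean(losses) - grad_div - cross_div)
print(ens_loss, np.mean(losses), grad_div, cross_div)
```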
The cross diversity shows the diversity of base learners according to their KL divergence to ensemble f, which is relevant to function outputs and gradients, as shown in Lemma 3.3. It is evident that the decomposition of Theorem 3.4 is quite different from that of Theorem 3.2 because of different loss functions. The detailed proof is presented in Appendix. C.2. We also present Figure 3 to illustrate the decomposition of Theorem 3.4, and some experimental details are given in Appendix B.2. As can be seen, adversarial ensemble loss gets smaller as for larger cross and gradient diversities w.r.t. l2 and l norm perturbation balls, when we maintain the average of individual adversarial losses stable. In addition, it is not sufficient to characterize the diversity of adversarial ensemble by only merely considering the average of cos values as done in (Dabouei et al., 2020; Bogun et al., 2022). We also present some empirical studies on the influence of average cos values on adversarial ensemble loss shown in Figure 4, and more details are given in Appendix C.3. 4. Our Adv EOAP Method Motivated from our theoretical analysis in Theorem 3.4, we develop a robust ensemble method for multi-class learning, and the basic idea is to train multiple deep neural networks for adversarial ensemble via a regularization by considering gradient diversity and cross diversity simultaneously. For multi-class learning with κ classes, the base learner f = (f1, f2, , fκ): X Rκ maps each instance to a κ-dimensional logit vector. The predicted probability vector pf(x) = (pf1(x), pf2(x), , pfκ(x)) can be calculated from f(x) via softmax function as follows pfk(x) = exp (fk(x)) Pκ j=1 exp (fj(x)) . (3) This section also focuses on the first-order approximation f(x + δ) f(x) + Jf(x)δ, where Jf(x) is the Jacobi matrix of f (Rudin et al., 1964). Denote by δ f arg maxδ ϵpℓ(f(x + δ), y) . (4) For m base learners f1, , fm with fj = (fj,1, , fj,κ), On the Diversity of Adversarial Ensemble Learning class 1 class 2 class 3 class 4 Probability class 1 class 2 class 3 class 4 Probability class 1 class 2 class 3 class 4 Probability class 1 class 2 class 3 class 4 Probability Figure 5. An illustration of orthogonality by optimizing Eqn. (5). we have cross diversity for multi-classification Cross diversity = KL(p f(x + δ f), pfj(x + δ fj)) where pf( ) and δ f are given by Eqns. (3)-(4), respectively. From some algebraic derivations in Appendix D.1, we have Gradient diversity = r, Jfj(x)δ fj J f(x)δ f r, fj(x + δ fj) f(x + δ f) where r = p f(x + δ f) ey. For multi-class learning, we could simultaneously improve cross diversity and gradient diversity by diversifying the output predictions of adversarial examples of base learners. This motivates us to develop a new ensemble algorithm via orthogonal adversarial predictions, i.e., we orthogonalize the outputs over adversarial examples for base learners to improve diversity. The orthogonal idea is inspired by (Pang et al., 2019), which is limited only on clean examples, while our work generalizes to adversarial examples. Specifically, we introduce a regularization for orthogonal adversarial output predictions of f1, , fm as Γα(x, y) = H pfj(x + δ fj) +α log(det(AT A)) , where H( ) is the function of information entropy and A = [ pf1(x + δ f1), , pfm(x + δ fm)] R(K 1) m . 
Here, pf( ) is an (K 1)-dimensional vector obtained by removing the y-th element of pf( ) and normalizing with l1 norm, and det(AT A) shows the square of volume of polytope spanned by pf1(x + δ f1), , pfm(x + δ fm) as in (Bernstein, 2009). Intuitively, the orthogonal probability vectors pf1(x + δ f1), , pfm(x + δ fm) could yield Algorithm 1 The Adv EOAP method Input: training dataset S, number of base learners m, learning rate η and the SGD iterations T Initialize: base learners f1, , fm for t = 1 to T do Partition training dataset S into batches S1, , Sb for k [b] do Generate adversarial examples for every base learner and example by the PGD-attack for j = 1 to m do Update learner fj by gradient descent in Eqn. (5) end for end for end for Output: Ensemble model f = Pm j=1 fj/m larger volume and entropy in Γα(x, y), and hence improve gradient diversity and cross diversity simultaneously. Given training sample S = {(x1, y1), , (xn, yn)}, we present the final objective optimization as j=1 max δ ϵp ℓ(fj(xi + δ), yi) λΓα(xi, yi) where λ is a hyper-parameter to tradeoff adversarial loss and regularization Γα( ). We could obtain mutual orthogonal base learners f1, , fm from the optimization of Eqn. (5), and the details are given in Appendix D.2. Figure 5 gives an illustration for the orthogonality of base learners. Here are three base learners in multi-class learning of 4 classes, and the ground-truth class is class 1. The output predictions are mutually orthogonal except for the groundtruth class 1. Therefore, the ensemble of three base learners could make correct prediction robustly even if three base learners are misled to different classes. On the optimization of Eqn. (5), we take adversarial training method with stochastic gradient descent from (Madry et al., 2018). We generate adversarial examples w.r.t. all base learners by PGD-attack, and calculate gradients to update the parameters for each base learner of deep neural network. Algorithm 1 presents the detailed description for our Adversarial Ensemble training with Orthogonal Adversarial Predictions, which is short for Adv EOAP. For Algorithm 1, the time complexity takes m-times as that of training a single neural network adversarially. In addition, it takes O(m3) computational cost to calculate the regularization and gradients. More details are given in Appendix D.3 On the Diversity of Adversarial Ensemble Learning Table 1. Comparison of classification accuracies (mean std %) over adversarial examples generated by different adversarial attacks. / indicates that our Adv EOAP is significantly better/worse than the corresponding methods (pair-wise t-test at 95% significance level). 
Methods FGSM PGD10 PGD20 PGD40 Auto PGD MORA Auto Attack Our Adv EOAP 96.413 0.124 95.676 0.039 95.648 0.056 95.504 0.098 94.907 0.187 94.672 0.209 94.072 0.423 GAL 10.271 0.709 00.001 0.002 00.002 0.005 00.000 0.000 00.000 0.000 00.004 0.005 00.000 0.000 ADP 10.888 1.931 00.000 0.000 00.000 0.000 00.000 0.000 00.000 0.000 00.004 0.006 00.000 0.000 Adv ADP 95.333 0.098 95.119 0.061 95.059 0.089 93.032 0.191 93.298 0.109 93.172 0.063 92.948 0.062 DVERGE 74.903 1.059 37.506 4.292 34.807 4.981 05.846 2.573 00.001 0.002 00.299 0.361 00.000 0.000 PDD 10.446 1.293 06.041 2.640 04.059 2.904 02.256 2.792 01.163 1.295 04.100 4.006 00.000 0.000 TRS 91.044 0.892 86.544 1.300 86.709 1.014 85.954 3.283 80.902 4.905 79.628 5.471 78.615 5.997 i GATADP 83.814 2.408 79.892 1.410 79.281 1.732 79.778 2.028 59.357 4.559 50.208 6.439 48.589 5.178 Our Adv EOAP 82.743 0.263 81.770 0.245 81.752 0.248 80.543 0.290 81.467 0.079 80.643 0.332 80.506 0.327 GAL 15.404 6.709 00.352 0.447 00.170 0.215 00.315 0.306 00.000 0.000 00.009 0.005 00.000 0.000 ADP 22.287 1.741 00.001 0.002 00.001 0.002 00.000 0.000 00.000 0.000 00.009 0.005 00.000 0.000 Adv ADP 82.728 0.331 79.167 0.338 79.074 0.387 80.080 0.408 78.461 0.372 77.833 0.313 77.732 0.245 DVERGE 48.242 1.536 27.140 2.878 25.846 3.019 32.652 3.033 17.222 4.786 20.126 4.605 15.035 5.560 PDD 29.936 4.991 18.740 4.925 17.958 4.873 19.043 4.524 12.161 5.521 14.408 3.617 00.319 0.466 TRS 70.767 0.466 69.330 0.125 69.285 0.105 67.641 0.440 68.719 0.098 67.800 0.183 67.250 0.304 i GATADP 66.278 2.051 63.139 0.236 62.967 0.149 62.882 0.379 61.920 0.083 51.869 4.773 48.750 5.763 Our Adv EOAP 55.718 0.245 53.076 0.249 52.996 0.255 52.903 0.295 51.997 0.234 48.318 0.065 47.884 0.060 GAL 12.370 2.959 00.007 0.013 00.000 0.000 00.000 0.000 00.000 0.000 00.002 0.004 00.000 0.000 ADP 23.100 0.757 00.008 0.010 00.001 0.003 00.000 0.000 00.000 0.000 00.001 0.003 00.000 0.000 Adv ADP 55.478 0.214 47.116 0.278 46.802 0.284 46.573 0.332 44.192 0.216 42.904 0.235 42.174 0.198 DVERGE 28.536 0.882 05.358 0.344 04.830 0.343 04.690 0.269 02.246 0.155 02.868 0.178 01.748 0.164 PDD 23.750 4.005 14.116 4.596 14.896 4.596 17.400 2.710 05.570 2.365 09.526 4.345 01.000 0.626 TRS 39.350 0.402 37.548 0.431 37.453 0.440 37.263 0.259 36.690 0.440 32.828 0.433 32.588 0.433 i GATADP 19.990 0.509 18.075 0.035 18.000 0.056 13.707 2.596 17.260 0.099 12.365 1.054 12.090 1.103 5. Experiments We conduct experiments on three datasets2: MNIST of 70000 images and 784 dimensions, F-MNIST of 70000 images and 784 dimensions, and CIFAR10 of 60000 images and 3072 dimensions. Three datasets have been wellstudied in previous works (Strauss et al., 2017; Kariyappa & Qureshi, 2019; Yang et al., 2021; Deng & Mu, 2024). We compare our method with the state-of-the-art methods on adversarial ensemble learning as follows: GAL: Non-adversarial training via the diversity of cos values of gradients (Kariyappa & Qureshi, 2019); ADP: Non-adversarial training via the diversity of the orthogonality of predictions (Pang et al., 2019); Adv ADP: Adversarial training via the diversity of the orthogonality of predictions (Pang et al., 2019); DVERGE: Adversarial training on the exchange of adversarial examples in base learners to diversify the adversarial vulnerability (Yang et al., 2020); 2Download from https://paperswithcode.com/dataset. 
PDD: Non-adversarial training to diversify the feature representations via dropouts (Huang et al., 2021); TRS: GAL by preserving the smoothness of base learners (Yang et al., 2021); i GATADP: Adv ADP by allocating globally adversarial examples to base learners (Deng & Mu, 2024). For all datasets, the perturbation size is set as 0.2, 0.05 and 0.03 under l -norm ball, respectively, as done in (Croce & Hein, 2020; Deng & Mu, 2024). For all methods, we select Res Net20 as base learners with learners number as 3, 3 and 8 for MNIST, F-MNIST and CIFAR10, respectively. More settings are given in Appendix E.1. All experiments are performed on a server with 64 CPU cores (2 Intel Xeon Gold 6430 CPUs) and NVIDIA Ge Force RTX 4090 GPU, running Ubuntu 24.04 with 1TB main memory. 5.1. Performance under adversarial attacks We take accuracy to measure performance on adversarial examples generated by seven popular adversarial attacks, i.e., FGSM (Goodfellow et al., 2015), PGD10, PGD20, On the Diversity of Adversarial Ensemble Learning Table 2. Comparison of classification accuracies (mean std %) over adversarial examples generated by EOT and BPDA attacks. / indicates that our Adv EOAP is significantly better/worse than the corresponding methods (pair-wise t-test at 95% significance level). Attacks Datasets Our Adv EOAP GAL Adv ADP PDD DVERGE TRS i GATADP MNIST 88.116 0.316 01.178 1.958 85.724 0.163 03.913 5.471 38.358 3.147 75.535 4.981 72.721 1.258 F-MNIST 62.416 0.778 02.201 1.886 61.137 0.547 08.323 1.815 35.162 2.390 56.592 1.110 57.893 2.147 CIFAT10 42.003 0.504 01.490 0.179 40.787 0.238 08.535 0.006 20.371 0.550 31.629 1.156 16.408 2.697 MNIST 95.382 0.079 00.002 0.005 92.713 0.137 02.757 2.852 14.187 7.868 85.024 3.758 71.763 3.041 F-MNIST 80.080 0.256 00.352 0.447 79.074 0.387 17.958 4.873 25.846 3.019 66.409 0.450 62.967 0.149 CIFAT10 49.530 0.050 00.000 0.000 44.687 0.286 14.896 4.596 04.730 0.343 33.417 0.310 10.693 1.776 MNIST 95.676 0.104 00.000 0.000 93.646 0.133 12.398 4.848 19.385 8.467 86.783 3.416 84.261 1.767 F-MNIST 80.832 0.311 01.098 0.243 80.832 0.380 30.152 8.392 40.672 2.473 68.032 0.364 73.444 0.657 CIFAT10 53.092 0.230 00.000 0.000 47.858 0.174 25.277 4.449 05.608 0.343 37.383 0.355 14.500 2.608 MNIST 96.132 0.071 00.000 0.000 94.857 0.085 09.165 0.941 57.491 1.640 88.993 2.873 91.126 0.672 F-MNIST 81.309 0.409 00.343 0.206 82.106 0.468 30.265 9.179 52.157 1.658 68.900 0.330 75.948 0.619 CIFAT10 52.440 0.226 00.000 0.000 46.240 0.252 25.690 3.737 05.347 0.382 36.480 0.261 13.600 2.548 MNIST 94.596 0.135 00.022 0.020 92.830 0.244 04.941 4.538 02.635 1.482 81.595 2.255 44.669 7.900 F-MNIST 80.241 0.247 00.026 0.045 80.757 0.386 05.706 7.595 50.324 4.693 67.526 0.467 68.174 0.462 CIFAT10 53.383 0.110 00.273 0.073 53.397 0.160 12.650 1.642 26.573 0.046 40.507 0.399 14.410 1.161 Table 3. Comparison of accuracies (mean std%) for our Adv EOAP with and without regularization Γ( ) under the APGD attack. Our Adv EOAP MNIST F-MNIST CIFAR10 without Γα( ) 93.513 0.046 80.317 0.159 50.833 0.102 with Γα( ) 94.907 0.187 81.202 0.311 51.997 0.234 Improvement 1.394 0.149 1.150 0.140 1.163 0.257 PGD40 (Madry et al., 2018), Auto PGD (Croce & Hein, 2020), MORA (Gao et al., 2022) and Auto Attack (Croce & Hein, 2020). All methods are evaluated over 50 runs with different random initializations, as summarized in Table 1. From Table 1, it is clear that our Adv EOAP method achieves significantly better performance than GAL, PDD and TRS, since it wins at most times and never loses. 
This is because such methods merely focus on cos values as the diversity measure, yet ignore other factors such as cross diversity and individual adversarial losses, which is consistent with the ensemble decomposition as in Theorem 3.4. Our Adv EOAP is also better than DEVERGE and i GATADP, since DVERGE exchanges adversarial examples of base learners and i GATADP allocates adversarial examples of ensemble, without the consideration of diversities of base learners. Our Adv EOAP outperforms ADP and Adv ADP, since such methods consider diversity over clean examples, rather than adversarial examples. It is quite different to study diversity in the adversarial ensemble learning, which is heavily relevant to intrinsic structures and output predictions of models simultaneously as in Theorem 3.1. We further study two additional adversarial attacks BPDA (Athalye et al., 2018a) and EOT (Athalye et al., 2018b), 0 0.01 0.02 0.03 0.04 0.05 Perturbation Size 0.80 0.83 0.85 0.88 0.90 0.93 0.95 0.98 1.00 Adversarial Accuracy Our method Adv ADP TRS i GAT 0 0.01 0.02 0.03 0.04 0.05 Perturbation Size Adversarial Accuracy Our method Adv ADP TRS i GAT Figure 6. Influence of perturbation sizes under the PGD20 attack. where EOT considers adversarial perturbations insensitive to transformations, and BPDA considers potential gradient risks and designs corresponded attacks. More details are given in Appendix E.2. We implement EOT and four BPDA attacks, and experimental comparisons are summarized in Table 2. It is obvious that our Adv EOAP method achieves better performance than other adversarial ensembles, which shows the robustness of our method to gradient risks and adversarial perturbations insensitive to transformations. We also present some ablation experiments to verify the effectiveness of the regularization Γα( ) in Eqn. (5), which essentially considers gradient diversity and cross diversity via orthogonal adversarial predictions. Table 3 shows some experimental comparisons for our Adv EOAP with and without regularization. It is clear that our method takes better performance with regularization, which nicely shows the importance of diversity on the design of ensemble methods. We study the influence of different perturbation size ϵ over datasets MNIST and FMNIST, as shown in Figure 6. It is observable that our Adv EOAP achieves better or comparable performance than other ensemble methods for different size of perturbations, in particular for larger ϵ. This shows the effectiveness of our methods for larger perturbation size. On the Diversity of Adversarial Ensemble Learning Running Time (seconds) Our Adv E-OAP GAL(non-adv.) ADP(non-adv.) Adv ADP Figure 7. Comparisons of running time (seconds/epoch) for our Adv EOAP and other adversarial ensemble methods. 1 2 3 4 5 6 7 8 Number of Misled Classes With regularization Without regularization 1 2 3 4 5 6 7 8 Number of Misled Classes With regularization Without regularization Figure 8. The frequency of number of misled classes. We finally compare the training time of Adv EOAP with other methods in Figure 7. As can be seen, our Adv EOAP takes comparable training time to other adversarial-training ensembles Adv ADP, DVERGE, PDD, TRS and i GAT, but with more time than the non-adversarial training methods GAL and ADP, which obviously take smaller adversarial prediction accuracy as shown in Table 1. 5.2. 
Orthogonality and convergence analysis We now illustrate the orthogonality of base learners for our Adv EOAP, which could mislead base learners to different classes under adversarial attacks. Figure 8 summarizes the frequency of number of misled classes with 8 base learners over two datasets MNIST and FMNIST. It is clear that base learners of our Adv EOAP predict with more different classes than that of Adv EOAP without regularization. We also present the convergence analysis on adversarial ensemble loss, average of adversarial losses, as well as gradient diversity and cross diversity during the training process. Figure 9 presents the convergence curves over three datasets MNIST, FMNIST and CIFAR10. It is clear that our Adv EOAP method could decrease adversarial ensemble loss, and simultaneously increase gradient diversity and cross diversity. This is nicely in accordance with our diversity decomposition in Theorem 3.4. 6. Conclusion Diversity has always been one of the most crucial factors on the designs of ensemble methods. This work focuses on the fundamental problems of diversity in the adversarial 0 10 20 30 40 50 60 Epoch Adversarial Loss Average adversarial loss Adversarial ensemble loss 0 10 20 30 40 50 60 Epoch 0 2 9 2 8 2 7 2 6 2 5 2 4 2 3 2 2 2 1 Gradient diversity Cross diversity 0 10 20 30 40 50 60 Epoch Adversarial Loss Average adversarial loss Adversarial ensemble loss 0 10 20 30 40 50 60 Epoch 2 9 2 8 2 7 2 6 2 5 2 4 2 3 2 2 Gradient diversity Cross diversity 0 50 100 150 200 250 Epoch Adversarial Loss Average adversarial loss Adversarial ensemble loss 0 50 100 150 200 250 Epoch 2 9 2 8 2 7 2 6 2 5 2 4 2 3 2 2 Gradient diversity Cross diversity Figure 9. The curve of adversarial losses and diversities during the training process, where we normalize according to the average of individual adversarial losses. ensemble learning. We prove the NP-Hard problem on the precise calculation of diversity for networks in adversarial ensemble learning, and give the first diversity decomposition under first-order approximation. Specifically, adversarial ensemble loss can be decomposed into average of individual adversarial losses, prediction diversity, gradient diversity and cross diversity. We consider similar diversity decomposition for classification with cross-entropy loss. Based on theoretical analysis, we develop a new ensemble method via orthogonal adversarial predictions to improve gradient and cross diversity simultaneously. An interesting future work is to explore other adversarial ensemble algorithms with better robustness and generalization from our theoretical analysis. Acknowledgements The authors want to thank the reviewers for their helpful comments and suggestions. This research was supported by National Key R&D Program of China (2021ZD0112802) and NSFC (62376119). Impact Statement This work shows the NP-Hardness of the calculation for diversity in adversarial ensemble learning and presents the first diversity decomposition with first-order approximation. It further develops a new ensemble method for adversarial defense and validates it through a series of experiments. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here. On the Diversity of Adversarial Ensemble Learning Abernethy, J., Awasthi, P., and Kale, S. A multiclass boosting framework for achieving fast and provable adversarial robustness. Co RR, abs/2103.01276, 2021. Andriushchenko, M., Croce, F., Flammarion, N., and Hein, M. 
Square attack: A query-efficient black-box adversarial attack via random search. In Proceedings of the 16th European Conference on Computer Vision, pp. 484 501, Virtual Event, 2020. Athalye, A., Carlini, N., and Wagner, D. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Proceedings of the 35th International Conference on Machine Learning, pp. 274 283, Stockholm, Sweden, 2018a. Athalye, A., Engstrom, L., Ilyas, A., and Kwok, K. Synthesizing robust adversarial examples. In Proceedings of the 35th International Conference on Machine Learning, pp. 284 293, Stockholm, Sweden, 2018b. Bartoldson, B. R., Diffenderfer, J., Parasyris, K., and Kailkhura, B. Adversarial robustness limits via scalinglaw and human-alignment studies. In Proceedings of the 41st International Conference on Machine Learning, pp. 3046 3072, Vienna, Austria, 2024. Bernstein, D. S. Matrix Mathematics: Theory, Facts, and Formulas. Princeton University Press, 2009. Bishop, C. M. and Nasrabadi, N. M. Pattern Recognition and Machine Learning, volume 4. Springer, 2006. Bogun, A., Kostadinov, D., and Borth, D. Saliency diversified deep ensemble for robustness to adversaries. In Proceedings of the 36th AAAI Workshop on Adversarial Machine Learning and Beyond, Virtual Event, 2022. Croce, F. and Hein, M. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In Proceedings of the 37th International Conference on Machine Learning, pp. 2206 2216, Vienna, Austria, 2020. Dabouei, A., Soleymani, S., Taherkhani, F., Dawson, J., and Nasrabadi, N. M. Exploiting joint robustness to adversarial perturbations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1122 1131, Seattle, WA, 2020. Deng, Y. and Mu, T. Understanding and improving ensemble adversarial defense. In Advances in Neural Information Processing Systems 37, pp. 58075 58087, New Orleans, LA, 2024. Deng, Y., Zhang, T., Lou, G., Zheng, X., Jin, J., and Han, Q.-L. Deep learning-based autonomous driving systems: A survey of attacks and defenses. IEEE Transactions on Industrial Informatics, 17(12):7897 7912, 2021. Dietterich, T. G. Ensemble methods in machine learning. In Proceedings of the 1st International Workshop on Multiple Classifier Systems, pp. 1 15, Cagliari, Italy, 2000. Finlayson, S. G., Bowers, J. D., Ito, J., Zittrain, J. L., Beam, A. L., and Kohane, I. S. Adversarial attacks on medical machine learning. Science, 363(6433):1287 1289, 2019. Forsythe, G. E. and Golub, G. H. On the stationary values of a second-degree polynomial on the unit sphere. Journal of the Society for Industrial and Applied Mathematics, 13 (4):1050 1068, 1965. Fortin, C. and Wolkowicz, H. The trust region subproblem and semidefinite programming. Optimization Methods and Software, 19(1):41 67, 2004. Fursov, I., Morozov, M., Kaploukhaya, N., Kovtun, E., Rivera-Castro, R., Gusev, G., Babaev, D., Kireev, I., Zaytsev, A., and Burnaev, E. Adversarial attacks on deep models for financial transaction records. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 2868 2878, Virtual Event, 2021. Gao, X., Xu, C.-Z., et al. Mora: Improving ensemble robustness evaluation with model reweighing attack. In Advances in Neural Information Processing Systems 35, pp. 26955 26965, New Orleans, LA, 2022. Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. 
In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, 2015. Gowal, S., Rebuffi, S.-A., Wiles, O., Stimberg, F., Calian, D. A., and Mann, T. A. Improving robustness using generated data. In Advances in Neural Information Processing Systems 34, pp. 4218 4233, Virtual Event, 2021. Guo, J.-Q., Teng, M.-Z., Gao, W., and Zhou, Z.-H. Fast provably robust decision trees and boosting. In Proceedings of the 39th International Conference on Machine Learning, pp. 8127 8144, Baltimore, MD, 2022. Huang, B., Ke, Z., Wang, Y., Wang, W., Shen, L., and Liu, F. Adversarial defence by diversified simultaneous training of deep ensembles. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, pp. 7823 7831, Virtual Event, 2021. Kariyappa, S. and Qureshi, M. K. Improving adversarial robustness of ensembles with diversity training. Co RR, abs/1901.09981, 2019. On the Diversity of Adversarial Ensemble Learning Katz, G., Barrett, C., Dill, D. L., Julian, K., and Kochenderfer, M. J. Reluplex: An efficient smt solver for verifying deep neural networks. In Proceedings of the 29th International Conference on Computer Aided Verification, pp. 97 117, Heidelberg, Germany, 2017. Krogh, A. and Vedelsby, J. Neural network ensembles, cross validation, and active learning. In Advances in Neural Information Processing Systems 7, pp. 231, Denver, CO, 1994. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, Canada, 2018. Mor e, J. J. and Sorensen, D. C. Computing a trust region step. SIAM Journal on Scientific and Statistical Computing, 4(3):553 572, 1983. Pang, T., Xu, K., Du, C., Chen, N., and Zhu, J. Improving adversarial robustness via promoting ensemble diversity. In Proceedings of the 36th International Conference on Machine Learning, pp. 4970 4979, Long Beach, LA, 2019. Pang, T., Yang, X., Dong, Y., Su, H., and Zhu, J. Bag of tricks for adversarial training. In Proceedings of the 9th International Conference on Learning Representations, Virtual Event, 2021. Peng, S., Xu, W., Cornelius, C., Hull, M., Li, K., Duggal, R., Phute, M., Martin, J., and Chau, D. H. Robust principles: Architectural design principles for adversarially robust cnns. In 34th British Machine Vision Conference, pp. 739 740, Aberdeen, UK, 2023. Pinot, R., Ettedgui, R., Rizk, G., Chevaleyre, Y., and Atif, J. Randomization matters how to defend against strong adversarial attacks. In Proceedings of the 37th International Conference on Machine Learning, pp. 7717 7727, Vienna, Austria, 2020. Robbins, H. and Monro, S. A stochastic approximation method. Annals of Mathematical Statistics, pp. 400 407, 1951. Rudin, W. et al. Principles of Mathematical Analysis, volume 3. Mc Graw-hill New York, 1964. Sen, S., Ravindran, B., and Raghunathan, A. Empir: Ensembles of mixed precision deep networks for increased robustness against adversarial attacks. In Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, 2020. Strauss, T., Hanselmann, M., Junginger, A., and Ulmer, H. Ensemble methods as a defense to adversarial perturbations against deep neural networks. Co RR, abs/1709.03423, 2017. Sun, T. and Zhou, Z.-H. Structural diversity of decision tree ensemble learning. Frontiers of Computer Science, 12(3): 560 570, 2018. 
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. In Proceedings of the 2nd International Conference on Learning Representations, Banff, Canada, 2014. Wang, Z., Pang, T., Du, C., Lin, M., Liu, W., and Yan, S. Better diffusion models further improve adversarial training. In Proceedings of the 40th International Conference on Machine Learning, pp. 36246 36263, Honolulu, Hawaii, 2023. Weng, L., Zhang, H., Chen, H., Song, Z., Hsieh, C.-J., Daniel, L., Boning, D., and Dhillon, I. Towards fast computation of certified robustness for relu networks. In Proceedings of the 35th International Conference on Machine Learning, pp. 5276 5285, Stockholm, Sweden, 2018. Wood, D., Mu, T., Webb, A. M., Reeve, H. W., Lujan, M., and Brown, G. A unified theory of diversity in ensemble learning. Journal of Machine Learning Research, 24 (359):1 49, 2023. Yang, H., Zhang, J., Dong, H., Inkawhich, N., Gardner, A., Touchet, A., Wilkes, W., Berry, H., and Li, H. Dverge: diversifying vulnerabilities for enhanced robust generation of ensembles. In Advances in Neural Information Processing Systems 33, pp. 5505 5515, Virtual Event, 2020. Yang, Z., Li, L., Xu, X., Zuo, S., Chen, Q., Zhou, P., Rubinstein, B., Zhang, C., and Li, B. Trs: Transferability reduced ensemble via promoting gradient diversity and model smoothness. In Advances in Neural Information Processing Systems 34, pp. 17642 17655, Virtual Event, 2021. Yang, Z., Li, L., Xu, X., Kailkhura, B., Xie, T., and Li, B. On the certified robustness for ensemble models and beyond. In Proceedings of the 10th International Conference on Learning Representations, Virtual Event, 2022. Young, L. An inequality of the h older type, connected with stieltjes integration. Acta Mathematica, 67(1):251 282, 1936. On the Diversity of Adversarial Ensemble Learning Zhang, D., Zhang, H., Courville, A., Bengio, Y., Ravikumar, P., and Suggala, A. S. Building robust ensembles via margin boosting. In Proceedings of the 39th International Conference on Machine Learning, pp. 26669 26692, Baltimore, MD, 2022. Zhang, H., Cheng, M., and Hsieh, C.-J. Enhancing certifiable robustness via a deep model ensemble. Co RR, abs/1910.14655, 2019a. Zhang, H., Yu, Y., Jiao, J., Xing, E., El Ghaoui, L., and Jordan, M. Theoretically principled trade-off between robustness and accuracy. In Proceedings of the 36th International Conference on Machine Learning, pp. 7472 7482, Long Beach, LA, 2019b. Zhou, Z.-H. Ensemble Methods: Foundations and Algorithms. CRC Press, 2012. On the Diversity of Adversarial Ensemble Learning Figure 10. The structure of transformed neural network. The φ1,j( ), max( ) and min( ) functions can be constructed by Re LU functions. The φ2 is fully connected layer with 1, 1 or 0 weights. The φ3 is fully connected layer with 1 or 0 weights. A. Proof of Theorem 3.1 Definition A.1 (3-SAT problem). Given r boolean variables and s clauses in a conjunctive normal form CNF formula with each clause s size at most 3, is there an assignment to the r variables to make the CNF formula to be satisfied? We present a key lemma for NP-hardness of adversarial squared loss, and the basic idea is a reduction from 3-SAT problem. Lemma A.2. For some Re LU neural network g( ) and example (x0, y0), it is an NP-hard problem to precisely calculate the following adversarial squared loss max δ ϵp {(g(x + δ) y)2} . Proof. It is sufficient to prove the NP-hardness of solving maxδ ϵp{(g(x + δ) y)2} = γ for some γ > 0. 
Let ϕ = C1 C2 Cs be a 3-SAT formula over a variable set V = {v1, , vr}. Each Ci = q1 i q2 i q3 i is a disjunction with three literals q1 i , q2 i , q3 i , and each literal is a variable from V or their negations. The 3-SAT problem is to determine whether there exists an assignment a : V {0, 1} for true ϕ. We will show that any 3-SAT formula ϕ can be transformed into a neural network g over sample (x0, y0) = (0, 1) in polynomial time, as well as the following sufficient and necessary condition: ϕ is satisfiable max δ ϵp {(g(x0 + δ) y0)2} = (ϵ/r1/p + 1)2 . (6) We will construct the Re LU neural network g(x) = φ5 φ4 φ3 φ2 φ1(x) with input x = (x1, x2, , xr) Rr as shown in Figure 10. Specifically, we present the detailed constructions as follows: We construct function φ1(x) : Rr Rr with the j-th element [φ1(x)]j = max(xj +ϵ/r1/p, 0) max(xj ϵ/r1/p, 0) ϵ/r1/p = ϵ/r1/p for xj > ϵ/r1/p xj for ϵ/r1/p < xj ϵ/r1/p ϵ/r1/p for xj ϵ/r1/p , For x0 = 0 and δ p ϵ, we can achieve the maximum or minimum of the elements of φ1(x0 + δ) independently. It is easy to construct φ1(x) with a Re LU neural network because of three operators +, and max( , 0). Intuitively, the i-th element of φ1(x0 + δ) can be viewed as the variable vi in 3-SAT problem, and its value ϵ/r1/p and ϵ/r1/p can be viewed as the false and true of variable vi, respectively. We construct φ2(x) = (x, x), and this follows that, for j [2r], x0 = 0 and δ p ϵ, ϵ/r1/p [φ2 φ1(x0 + δ)]j ϵ/r1/p . We achieve the maximum or minimum for the first r elements of φ2 φ1(x0 + δ) independently. The last r elements are always the opposite of the first r elements. Intuitively, the last r elements of φ2 φ1(x0 + δ) can be viewed as the negation of variables v1, , vr in 3-SAT problem. On the Diversity of Adversarial Ensemble Learning We construct φ3(x) = Wx with W = (w1 1; w2 1; w3 1; ; w1 s; w2 s; w3 s)T R3s 2r. Here, we construct three vectors w1 i , w2 i , w3 i {0, 1}2r for clause Ci = q1 i q2 i q3 i for i [s]. For k [3], the wk i is the unit vector with j-th and (j + r)-th element 1 if qk i is variable vj and its negation vj, respectively. For j [3s], x0 = 0 and δ p ϵ, we have ϵ/r1/p [φ3 φ2 φ1(x0 + δ)]j ϵ/r1/p . Intuitively, every three elements of φ3 φ2 φ1(x0 + δ) can be viewed as three literals of a clause in 3-SAT problem. We achieve independently the maximum or minimum for φ3 φ2 φ1(x0 + δ) whose corresponding literals are different variables. We construct φ4(x) = (max{x1, x2, x3}, max{x4, x5, x6}, , max{x3s 2, x3s 1, x3s}), where max{ , , } = max{max{ , }, } and min{ , , } = min{min{ , }, } can be constructed via Re LU function (max{ , 0} function) as max{a, b} = max{a b, 0} + b and min{a, b} = max{b a, 0} + b . (7) For j [s], x0 = 0 and δ p ϵ, we have ϵ/r1/p [φ4 φ3 φ2 φ1(x0 + δ)]j ϵ/r1/p . We can achieve the maximum for the j-th element of φ4 φ3 φ2 φ1(x0 +δ) if and only if the (3j 2)-th, (3j 1)-th or 3j-th elements of φ3 φ2 φ1(x0 + δ) are maximal. Intuitively, the max function can be viewed as the operator, and the i-th element of φ4 φ3 φ2 φ1(x0 + δ) can be viewed as the clause Ci in 3-SAT problem. We construct φ5(x) = min{x1, x2, , xs} = min{min{min{ , xs 2}, xs 1}, xs} via Re LU function. For x0 = 0 and δ p ϵ, we have ϵ/r1/p φ5 φ4 φ3 φ2 φ1(x0 + δ) ϵ/r1/p . (8) We achieve the maximum if and only if all elements of φ4 φ3 φ2 φ1(x0+δ) are maximal. Intuitively, the min function can be viewed as the operator in 3-SAT problem. The ϵ/r1/p and ϵ/r1/p value of φ5 φ4 φ3 φ2 φ1(x0 + δ) can be viewed as the false and true of the CNF formula ϕ = C1 C2 Cs, respectively. 
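To make the construction concrete, the following toy sketch (our illustration only; the hand-written formula, the helper names and the constant `c`, which stands for the value $\epsilon/r^{1/p}$ in the proof, are not taken from the paper) implements the disjunction and conjunction layers with ReLU units via Eqn. (7) and checks, by enumerating the corners of the box, that the network outputs $+c$ exactly at the assignments satisfying the formula:

```python
import numpy as np
from itertools import product

relu = lambda t: np.maximum(t, 0.0)
# Eqn. (7): max / min written with ReLU units only
max2 = lambda a, b: relu(a - b) + b
min2 = lambda a, b: b - relu(b - a)

def clause_net(lits):                    # disjunction of 3 literals -> max
    return max2(max2(lits[0], lits[1]), lits[2])

def formula_net(clause_values):          # conjunction of clauses -> min
    out = clause_values[0]
    for v in clause_values[1:]:
        out = min2(out, v)
    return out

# toy CNF over v1..v3: (v1 or v2 or not v3) and (not v1 or v3 or v3)
phi = [[(1, True), (2, True), (3, False)], [(1, False), (3, True), (3, True)]]
c = 1.0                                  # stands for eps / r**(1/p) in the proof

def g_at_corner(assign):                 # evaluate the network at a +-c corner
    lits = [[c if assign[v - 1] == pos else -c for (v, pos) in clause]
            for clause in phi]
    return formula_net([clause_net(l) for l in lits])

for assign in product([True, False], repeat=3):
    sat = all(any(assign[v - 1] == pos for (v, pos) in clause) for clause in phi)
    assert g_at_corner(assign) == (c if sat else -c)
```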
For (x0, y0) = (0, 1) and ϵ 1, it remains to show that, from Eqns. (6) and (8), ϕ is satisfiable there is an δ s.t. g(δ) = ϵ/r1/p . (9) = If ϕ = C1 C2 Cm is satisfiable, then there is a satisfiable assignment α. We set the i-th element of δ as ϵ/r1/p and ϵ/r1/p if α(vi) is true and false, respectively. Then, we discuss g(δ) step by step as follows: The i-th element of φ1(δ) is ϵ/r1/p and ϵ/r1/p if the i-th element of δ is ϵ/r1/p and ϵ/r1/p, respectively; The first r elements of φ2 φ1(δ) are equal to φ1(δ) while the last r elements are the opposite of the first r elements; The (3(i 1) + j)-th element of φ3 φ2 φ1(δ) is the l-th and (l + r)-th element of φ2 φ1(δ) if literal qj i is variable vl and vl, respectively; Every element of φ4 φ3 φ2 φ1(δ) is ϵ/r1/p, since there is at least one true literal in q1 i , q2 i , q3 i for every satisfiable Ci = q1 i q2 i q3 i , and hence there is at least one element with ϵ/r1/p in every three elements of φ3 φ2 φ1(δ); The final output φ5 φ4 φ3 φ2 φ1(δ) is ϵ/r1/p, since φ5 takes the minimum of φ4 φ3 φ2 φ1(δ). = If g(δ) = ϵ/r1/p, we discuss δ as follows: Every elements of φ4 φ3 φ2 φ1(δ) is ϵ/r1/p from φ5( ) [ ϵ/r1/p, ϵ/r1/p]; At least one element is equal to ϵ/r1/p in every three elements of φ3 φ2 φ1(δ), since φ4 takes the maximum of every three elements. Without loss of generality, let all (3(i 1) + 1)-th elements be ϵ/r1/p for i [m]; On the Diversity of Adversarial Ensemble Learning Figure 11. The structure of the transformed neural network. The φ block is a network similarly in the proof of Theorem A.2. For the first r elements of φ2 φ1(δ), the l-th element will be equal to ϵ/r1/p if literal q1 i is variable vl. For the last r elements of φ2 φ1(δ), the (l + r)-th element will be equal to ϵ/r1/p if literal q1 i is the negation of variable vl; The l-th element of φ1(δ) is ϵ/r1/p and ϵ/r1/p if the l-th element of φ2 φ1(δ) is ϵ/r1/p and ϵ/r1/p, respectively; The l-th element of δ is ϵ/r1/p and ϵ/r1/p if the l-th element of φ1(δ) is ϵ/r1/p and ϵ/r1/p, respectively. Let vl be true and false if the l-th element of δ is ϵ/r1/p and ϵ/r1/p, respectively. This assignment makes the CNF formula satisfiable, since there is at least one element in every three elements of φ3 φ2 φ1(δ) with ϵ/r1/p. Proof of Theorem 3.1. It is sufficient to prove the NP-hardness of solving j=1 max δ ϵp {(fj(x + δ) y)2} max δ ϵp {( f(x + δ) y)2} = γ for some γ > 0 . Let ϕ = C1 C2 Cs be a 3-SAT formula over variable set V = {v1, , vr}. Each Ci = q1 i q2 i q3 i is a disjunction with three literals q1 i , q2 i , q3 i , and each literal is a variable from V or their negations. We will show that any 3-SAT formula ϕ can be transformed into two neural networks f1, f2 and sample (x0, y0) = (0, 1) in polynomial time, as well as the following sufficient and necessary condition: ϕ is satisfiable Υ := 1 j=1 max δ ϵp {(fj(δ) + 1)2} max δ ϵp {( f(δ) + 1)2} = (ϵ/r1/p + 1)2 1 . (10) We will construct the Re LU neural network f1(x) and f2(x) with inputs x Rr+1 in Figure 11. For ϕ = C1 C2 Cs, we construct neural network g(x1:r) as shown in Lemma A.2, where x1:r are the first r elements of x. We construct f1(x) = min{g(x1:r), η(xr+1)} with η(xr+1) = max(xr+1 + ϵ r1/p , 0) max(xr+1 ϵ r1/p , 0) ϵ r1/p = ϵ/r1/p for xr+1 > ϵ/r1/p xr+1 for ϵ/r1/p < xr+1 ϵ/r1/p ϵ/r1/p for xr+1 ϵ/r1/p . It is easy to construct f1(x) via a Re LU neural network from Eqn. (7), and we have ϕ is satisfiable max δ ϵp {(f1(δ) + 1)2} = (ϵ/r1/p + 1)2 . (11) This is because ϵ/r1/p f1(x) ϵ/r1/p from Eqn. 
Proof of Theorem 3.1. It is sufficient to prove the NP-hardness of deciding whether
$$\frac{1}{m}\sum_{j=1}^{m} \max_{\delta \in \Delta^{\epsilon}_{p}} \{(f_j(x + \delta) - y)^2\} - \max_{\delta \in \Delta^{\epsilon}_{p}} \{(\bar{f}(x + \delta) - y)^2\} = \gamma \quad \text{for some } \gamma > 0.$$
Let $\phi = C_1 \wedge C_2 \wedge \cdots \wedge C_s$ be a 3-SAT formula over the variable set $V = \{v_1, \dots, v_r\}$. Each $C_i = q_i^1 \vee q_i^2 \vee q_i^3$ is a disjunction of three literals $q_i^1, q_i^2, q_i^3$, and each literal is a variable from $V$ or its negation. We will show that any 3-SAT formula $\phi$ can be transformed into two neural networks $f_1, f_2$ and a sample $(x_0, y_0) = (0, -1)$ in polynomial time, together with the following sufficient and necessary condition:
$$\phi \text{ is satisfiable} \iff \Upsilon := \frac{1}{2}\sum_{j=1}^{2} \max_{\delta \in \Delta^{\epsilon}_{p}} \{(f_j(\delta) + 1)^2\} - \max_{\delta \in \Delta^{\epsilon}_{p}} \{(\bar{f}(\delta) + 1)^2\} = (\epsilon/r^{1/p} + 1)^2 - 1. \quad (10)$$
We construct the ReLU neural networks $f_1(x)$ and $f_2(x)$ with inputs $x \in \mathbb{R}^{r+1}$ as shown in Figure 11.

Figure 11. The structure of the transformed neural network. The $\varphi$ block is a network as in the proof of Lemma A.2.

For $\phi = C_1 \wedge C_2 \wedge \cdots \wedge C_s$, we construct the neural network $g(x_{1:r})$ as shown in Lemma A.2, where $x_{1:r}$ denotes the first $r$ elements of $x$. We construct $f_1(x) = \min\{g(x_{1:r}), \eta(x_{r+1})\}$ with
$$\eta(x_{r+1}) = \max(x_{r+1} + \epsilon/r^{1/p}, 0) - \max(x_{r+1} - \epsilon/r^{1/p}, 0) - \epsilon/r^{1/p} = \begin{cases} \epsilon/r^{1/p} & \text{for } x_{r+1} > \epsilon/r^{1/p}, \\ x_{r+1} & \text{for } -\epsilon/r^{1/p} < x_{r+1} \le \epsilon/r^{1/p}, \\ -\epsilon/r^{1/p} & \text{for } x_{r+1} \le -\epsilon/r^{1/p}. \end{cases}$$
It is easy to construct $f_1(x)$ via a ReLU neural network from Eqn. (7), and we have
$$\phi \text{ is satisfiable} \iff \max_{\delta \in \Delta^{\epsilon}_{p}} \{(f_1(\delta) + 1)^2\} = (\epsilon/r^{1/p} + 1)^2. \quad (11)$$
This is because $-\epsilon/r^{1/p} \le f_1(x) \le \epsilon/r^{1/p}$ from Eqn. (8) and, for $\epsilon \in (0, 1]$, $\phi$ is satisfiable if and only if there exists a $\delta$ s.t. $f_1(\delta) = \epsilon/r^{1/p}$. If $f_1(\delta) = \epsilon/r^{1/p}$, then $\phi$ is satisfiable from Eqn. (9), since $f_1(\delta) = \min\{g(\delta_{1:r}), \eta(\delta_{r+1})\}$ and $\eta(\delta_{r+1}) \le \epsilon/r^{1/p}$ imply $g(\delta_{1:r}) = \epsilon/r^{1/p}$. For the converse direction, if $\phi$ is satisfiable, then there is a $\delta_{1:r}$ s.t. $g(\delta_{1:r}) = \epsilon/r^{1/p}$ from Eqn. (9), and it follows that $f_1(\delta) = \epsilon/r^{1/p}$ by taking $\delta_{r+1} = \epsilon/r^{1/p}$.

In a similar manner, we construct $f_2(x) = \min\{g(x_{1:r}), \eta(-x_{r+1})\}$, and have
$$\phi \text{ is satisfiable} \iff \max_{\delta \in \Delta^{\epsilon}_{p}} \{(f_2(\delta) + 1)^2\} = (\epsilon/r^{1/p} + 1)^2. \quad (12)$$
Since $\eta$ is an odd function, the ensemble $\bar{f}(x) = (f_1(x) + f_2(x))/2$ satisfies
$$\bar{f}(x) = \frac{1}{2}\big(\min\{g(x_{1:r}), \eta(x_{r+1})\} + \min\{g(x_{1:r}), \eta(-x_{r+1})\}\big) \le \frac{1}{2}\big(\eta(x_{r+1}) + \eta(-x_{r+1})\big) = \frac{1}{2}\big(\eta(x_{r+1}) - \eta(x_{r+1})\big) = 0.$$
We thus have $-\epsilon/r^{1/p} \le \bar{f}(x) \le 0$, so that $(\bar{f}(\delta) + 1)^2 \le 1$ for $\epsilon \in (0, 1]$, with equality for $\delta = 0$. Hence
$$\max_{\|\delta\|_p \le \epsilon} \{(\bar{f}(\delta) + 1)^2\} = 1. \quad (13)$$
For the proof of Eqn. (10), if $\phi$ is satisfiable, then $\Upsilon = (\epsilon/r^{1/p} + 1)^2 - 1$ from Eqns. (11)-(13). For the converse direction, if $\Upsilon = (\epsilon/r^{1/p} + 1)^2 - 1$, then we have, from Eqn. (13),
$$\frac{1}{2}\sum_{j=1}^{2} \max_{\delta \in \Delta^{\epsilon}_{p}} \{(f_j(\delta) + 1)^2\} = (\epsilon/r^{1/p} + 1)^2. \quad (14)$$
From $f_1(\delta), f_2(\delta) \in [-\epsilon/r^{1/p}, \epsilon/r^{1/p}]$ and $\epsilon \in (0, 1]$, we have
$$(f_1(\delta) + 1)^2 \le (\epsilon/r^{1/p} + 1)^2 \quad \text{and} \quad (f_2(\delta) + 1)^2 \le (\epsilon/r^{1/p} + 1)^2.$$
It follows from Eqn. (14) that
$$\max_{\delta \in \Delta^{\epsilon}_{p}} \{(f_1(\delta) + 1)^2\} = (\epsilon/r^{1/p} + 1)^2 \quad \text{and} \quad \max_{\delta \in \Delta^{\epsilon}_{p}} \{(f_2(\delta) + 1)^2\} = (\epsilon/r^{1/p} + 1)^2;$$
therefore, $\phi$ is satisfiable from Eqns. (11)-(12). This completes the proof.

B. Appendix for Section 3.2

B.1. Proof of Theorem 3.2

We begin with some useful lemmas as follows.

Lemma B.1 (Hölder's inequality (Young, 1936)). For two real vectors $a = (a_1, \dots, a_d)$ and $b = (b_1, \dots, b_d)$, we have
$$\sum_{i=1}^{d} |a_i b_i| \le \|a\|_p \|b\|_q \quad \text{for positive } p \text{ and } q \text{ with } 1/p + 1/q = 1,$$
where the equality holds if and only if $\alpha |a_i|^p = \beta |b_i|^q$ for every $i \in [d]$ w.r.t. some positive constants $\alpha$ and $\beta$.

Lemma B.2. For vectors $w, \delta \in \mathbb{R}^d$, we have
$$\max_{\|\delta\|_p \le \epsilon} w^T \delta = \epsilon \|w\|_q \quad \text{and} \quad \min_{\|\delta\|_p \le \epsilon} w^T \delta = -\epsilon \|w\|_q.$$
Proof. For every $\delta$ with $\|\delta\|_p \le \epsilon$, we have, from Lemma B.1,
$$w^T \delta \le \sum_{i=1}^{d} |w_i \delta_i| \le \|\delta\|_p \|w\|_q \le \epsilon \|w\|_q.$$
Notice that the above equality holds if we choose $\delta = \delta^* = (\delta_1^*, \delta_2^*, \dots, \delta_d^*)$ with
$$\delta_i^* = \mathrm{sign}(w_i)\,\epsilon\,\big(|w_i|^q / \|w\|_q^q\big)^{1/p},$$
where $\mathrm{sign}(w_i)$ is equal to $-1$, $0$, $1$ if $w_i$ is negative, zero or positive, respectively. It follows that
$$\max_{\|\delta\|_p \le \epsilon} w^T \delta = \epsilon \|w\|_q.$$
We also have, by letting $\delta = -\delta^*$,
$$\min_{\|\delta\|_p \le \epsilon} w^T \delta = \min_{\|\delta\|_p \le \epsilon} w^T(-\delta) = -\max_{\|\delta\|_p \le \epsilon} w^T \delta = -\epsilon\|w\|_q,$$
which completes the proof.

Lemma B.3. For the linear function $h_w(x) = w^T x + b$, we have, for positive $p, q$ with $1/p + 1/q = 1$,
$$\max_{\|\delta\|_p \le \epsilon} (w^T(x + \delta) + b - y)^2 = (|w^T x + b - y| + \|w\|_q \epsilon)^2.$$
Proof. We have the adversarial squared loss
$$\max_{\|\delta\|_p \le \epsilon} (w^T(x + \delta) + b - y)^2 = \max_{\|\delta\|_p \le \epsilon} (w^T x + b - y + w^T \delta)^2.$$
From Lemma B.2, we have $-\|w\|_q \epsilon \le w^T \delta \le \|w\|_q \epsilon$ and
$$\max_{\|\delta\|_p \le \epsilon} (w^T x + b - y + w^T \delta)^2 = (|w^T x + b - y| + \|w\|_q \epsilon)^2,$$
which completes the proof.

Proof of Theorem 3.2. We have the adversarial losses of $f_j$ and $\bar{f}$, from Lemma B.3,
$$\max_{\delta \in \Delta^{\epsilon}_{p}} \{(\bar{f}(x + \delta) - y)^2\} = \max_{\delta \in \Delta^{\epsilon}_{p}} \{(\bar{f}(x) + \nabla \bar{f}(x)^T \delta - y)^2\} = (|\bar{f}(x) - y| + \|\nabla \bar{f}(x)\|_q \epsilon)^2,$$
$$\max_{\delta \in \Delta^{\epsilon}_{p}} \{(f_j(x + \delta) - y)^2\} = \max_{\delta \in \Delta^{\epsilon}_{p}} \{(f_j(x) + \nabla f_j(x)^T \delta - y)^2\} = (|f_j(x) - y| + \|\nabla f_j(x)\|_q \epsilon)^2.$$
It follows that
$$\frac{1}{m}\sum_{j=1}^{m} \max_{\delta \in \Delta^{\epsilon}_{p}} \{(f_j(x + \delta) - y)^2\} - \max_{\delta \in \Delta^{\epsilon}_{p}} \{(\bar{f}(x + \delta) - y)^2\} = \frac{1}{m}\sum_{j=1}^{m} (|f_j(x) - y| + \|\nabla f_j(x)\|_q \epsilon)^2 - (|\bar{f}(x) - y| + \|\nabla \bar{f}(x)\|_q \epsilon)^2$$
$$= \frac{1}{m}\sum_{j=1}^{m}\Big[(f_j(x) - y)^2 + 2|f_j(x) - y|\,\|\nabla f_j(x)\|_q\epsilon + \|\nabla f_j(x)\|_q^2\epsilon^2\Big] - \Big[(\bar{f}(x) - y)^2 + 2|\bar{f}(x) - y|\,\|\nabla \bar{f}(x)\|_q\epsilon + \|\nabla \bar{f}(x)\|_q^2\epsilon^2\Big]$$
$$= \epsilon^2\Big(\frac{1}{m}\sum_{j=1}^{m}\|\nabla f_j(x)\|_q^2 - \|\nabla \bar{f}(x)\|_q^2\Big) + 2\epsilon\Big(\frac{1}{m}\sum_{j=1}^{m}\|\nabla f_j(x)\|_q|f_j(x) - y| - \|\nabla \bar{f}(x)\|_q|\bar{f}(x) - y|\Big) + \frac{1}{m}\sum_{j=1}^{m}(f_j(x) - y)^2 - (\bar{f}(x) - y)^2$$
$$= \text{Gradient Diversity} + \text{Cross Diversity} + \frac{1}{m}\sum_{j=1}^{m}(f_j(x) - y)^2 - (\bar{f}(x) - y)^2.$$
We also have
$$\frac{1}{m}\sum_{j=1}^{m}(f_j(x) - y)^2 - (\bar{f}(x) - y)^2 = \frac{1}{m}\sum_{j=1}^{m} f_j(x)^2 - \bar{f}(x)^2 = \frac{1}{m}\sum_{j=1}^{m}(f_j(x) - \bar{f}(x))^2,$$
which completes the proof.
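To illustrate Lemma B.3 and the decomposition of Theorem 3.2, the following NumPy sketch checks the identity numerically for linear base learners with $p = q = 2$, for which the first-order expansion is exact; helper names such as `adv_sq_loss` are illustrative, not from the released code.

```python
import numpy as np

def dual_norm(w, q):
    return np.sum(np.abs(w) ** q) ** (1.0 / q)

def adv_sq_loss(w, b, x, y, eps, q):
    """Lemma B.3: max_{||delta||_p <= eps} (w^T(x+delta) + b - y)^2 for a linear model."""
    return (np.abs(w @ x + b - y) + eps * dual_norm(w, q)) ** 2

rng = np.random.default_rng(0)
d, m, eps, p = 5, 3, 0.1, 2
q = p / (p - 1.0)
x, y = rng.normal(size=d), 1.0
W, bvec = rng.normal(size=(m, d)), rng.normal(size=m)
w_bar, b_bar = W.mean(axis=0), bvec.mean()            # the ensemble averages the linear learners

preds, grads = W @ x + bvec, W                         # f_j(x) and their (exact) gradients
pred_bar, grad_bar = w_bar @ x + b_bar, w_bar

lhs = np.mean([adv_sq_loss(W[j], bvec[j], x, y, eps, q) for j in range(m)]) \
      - adv_sq_loss(w_bar, b_bar, x, y, eps, q)

grad_div = eps ** 2 * (np.mean([dual_norm(g, q) ** 2 for g in grads]) - dual_norm(grad_bar, q) ** 2)
cross_div = 2 * eps * (np.mean([dual_norm(grads[j], q) * abs(preds[j] - y) for j in range(m)])
                       - dual_norm(grad_bar, q) * abs(pred_bar - y))
pred_div = np.mean((preds - pred_bar) ** 2)            # prediction diversity (ambiguity term)

print(np.isclose(lhs, grad_div + cross_div + pred_div))   # True: the four-term decomposition holds
```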
B.2. Training Details for Diversity Decomposition

We use base learners that are convolutional neural networks with two convolutional layers and one MLP layer of 100 neurons. The first convolutional layer has 24 filters with kernel size 5, and the second convolutional layer has 24 filters with kernel size 5. We take the ReLU activation function, and the input and output sizes are 3 x 32 x 32 and 1, respectively. We select the 5-th and 6-th classes of the MNIST and F-MNIST datasets to train the base learners independently, and use SGD (Robbins & Monro, 1951) with batch size 256 and learning rate 0.01. We use the PGD attack (Madry et al., 2018) to calculate the adversarial ensemble loss and the average of individual adversarial losses. The perturbation size is set to 8/255 and 128/255 for the $l_\infty$ and $l_2$ norms, respectively.

B.3. Discussions on the Second-Order Approximation

For the second-order approximation, we have
$$f(x+\delta) \approx f(x) + \nabla f(x)^T\delta + \frac{1}{2}\delta^T H(x)\delta,$$
where $H(x)$ is the Hessian matrix of $f$ at point $x$. It follows that
$$\max_{\delta \in \Delta^{\epsilon}_{p}}(f(x+\delta) - y)^2 = \max_{\delta \in \Delta^{\epsilon}_{p}}\Big(f(x) + \nabla f(x)^T\delta + \frac{1}{2}\delta^T H(x)\delta - y\Big)^2,$$
and hence it is sufficient to consider the two problems
$$\max_{\delta\in\Delta^{\epsilon}_{p}}\Big\{f(x) + \nabla f(x)^T\delta + \frac{1}{2}\delta^T H(x)\delta - y\Big\} \quad \text{and} \quad \min_{\delta\in\Delta^{\epsilon}_{p}}\Big\{f(x) + \nabla f(x)^T\delta + \frac{1}{2}\delta^T H(x)\delta - y\Big\}.$$
We focus on the special case $p = 2$, where the above two problems can be formalized as
$$\min_{\|\delta\|_2\le\epsilon} \frac{1}{2}\delta^T B\delta - b^T\delta \quad \text{for some } b\in\mathbb{R}^n \text{ and symmetric matrix } B\in\mathbb{R}^{n\times n}. \quad (15)$$
This is known as the trust region subproblem (Forsythe & Golub, 1965; Moré & Sorensen, 1983; Fortin & Wolkowicz, 2004), and we have:
- the optimal solution $\delta^* = B^{-1}b$ if the solution does not lie on the boundary of $\{\delta : \|\delta\|_2 \le \epsilon\}$ in (15), from the positive definiteness of $B$ and $\|B^{-1}b\| < \epsilon$;
- the optimal solution $\delta^* = (B + \alpha^* I)^{-1}b$ if the solution lies on the boundary of $\{\delta : \|\delta\|_2 \le \epsilon\}$ in (15) and $\alpha^* > -\lambda_1$. Here, $\lambda_1$ is the smallest eigenvalue of $B$ and $\alpha^*$ is the solution of
$$\sum_{j=1}^{n} \frac{\gamma_j^2}{(\lambda_j + \alpha)^2} = \epsilon^2, \quad (16)$$
where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $B$ and $\gamma_1, \dots, \gamma_n$ are the elements of $Q^T b$ with the eigendecomposition $B = Q\Lambda Q^T$;
- the optimal solution $\delta^* = \delta_0 + \tau z$ if the solution lies on the boundary of $\{\delta : \|\delta\|_2 \le \epsilon\}$ in (15) and $\alpha^* \le -\lambda_1$ in (16). Here, $\delta_0$ is the solution of $(B - \lambda_1 I)\delta_0 = b$ s.t. $\|\delta_0\| \le \epsilon$, $z$ is an eigenvector of $B$ with eigenvalue $\lambda_1$, and $\tau \in \mathbb{R}$ satisfies $\|\delta_0 + \tau z\| = \epsilon$.

Here, the main challenge is to obtain a closed-form solution of the adversarial loss under the second-order approximation, since it depends on the roots of the high-order polynomial equation (16); a numerical sketch of this case analysis is given at the end of this appendix.

B.4. Discussions of Average of cos Values for $l_\infty$ Norm

We focus on the 2-dimensional instance space $\mathcal{X} \subseteq \mathbb{R}^2$ and label space $\mathcal{Y} \subseteq \mathbb{R}$, and consider
$$f_1(x) = x_1 + x_2, \quad f_2(x) = x_1 - 3x_2, \quad f_3(x) = abx_1 + bx_2 \quad \text{and} \quad f_4(x) = abx_1 - bx_2,$$
where $a = (\sqrt{5}-1)/2$ and $b = \sqrt{5}/5$. We study two ensembles: one ensemble of $f_1$ and $f_2$, and the other of $f_3$ and $f_4$. For the example $(x, y) = ([1, 0], 1)$ and the perturbation set $\Delta = \{\delta : \|\delta\|_\infty \le 1\}$, we have
$$\cos(\nabla f_1(x), \nabla f_2(x)) = \frac{\langle\nabla f_1(x), \nabla f_2(x)\rangle}{\|\nabla f_1(x)\|_2\|\nabla f_2(x)\|_2} = \frac{\langle\nabla f_3(x), \nabla f_4(x)\rangle}{\|\nabla f_3(x)\|_2\|\nabla f_4(x)\|_2} = \cos(\nabla f_3(x), \nabla f_4(x)),$$
and we also have the same average of individual adversarial losses from Lemma B.3:
$$\frac{1}{2}\sum_{i=1}^{2}\max_{\delta\in\Delta}(f_i(x+\delta) - y)^2 = \frac{1}{2}\sum_{i=3}^{4}\max_{\delta\in\Delta}(f_i(x+\delta) - y)^2.$$
However, the adversarial ensemble losses are different:
$$\max_{\delta\in\Delta}(f_1(x+\delta)/2 + f_2(x+\delta)/2 - y)^2 = 8 \quad \text{and} \quad \max_{\delta\in\Delta}(f_3(x+\delta)/2 + f_4(x+\delta)/2 - y)^2 \approx 7.2.$$
Thus, there exist two ensembles with the same averages of cos values and of individual adversarial losses, but with different adversarial ensemble losses for the $l_\infty$ norm.
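The following is a small numerical sketch of the case analysis in Appendix B.3 above for $p = 2$, using an eigendecomposition and SciPy's one-dimensional root search for Eqn. (16). It assumes $b$ has a component along the eigenvector of the smallest eigenvalue and omits the degenerate hard case; the helper name `trust_region_min` is illustrative.

```python
import numpy as np
from scipy.optimize import brentq

def trust_region_min(B, b, eps):
    """Minimize 0.5 * d^T B d - b^T d over ||d||_2 <= eps, following Appendix B.3.
    The degenerate "hard case" (b orthogonal to the smallest eigenvector) is omitted."""
    lam, Q = np.linalg.eigh(B)                         # B = Q diag(lam) Q^T, lam ascending
    gamma = Q.T @ b
    if lam[0] > 0:                                     # interior case: B positive definite
        d = Q @ (gamma / lam)                          # d = B^{-1} b
        if np.linalg.norm(d) < eps:
            return d
    # Boundary case: find alpha > -lambda_1 solving sum_j gamma_j^2 / (lam_j + alpha)^2 = eps^2.
    secular = lambda a: np.sum(gamma ** 2 / (lam + a) ** 2) - eps ** 2
    lo = -lam[0] + 1e-12
    hi = np.linalg.norm(b) / eps - lam[0] + 1.0        # large enough that secular(hi) < 0
    alpha = brentq(secular, lo, hi)
    return Q @ (gamma / (lam + alpha))                 # d = (B + alpha I)^{-1} b

# Sanity check on a random instance against points sampled inside the ball.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)); B = (A + A.T) / 2; b = rng.normal(size=4); eps = 0.3
d_star = trust_region_min(B, b, eps)
obj = lambda d: 0.5 * d @ B @ d - b @ d
dirs = rng.normal(size=(20000, 4))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
samples = dirs * (eps * rng.uniform(size=(20000, 1)) ** 0.25)   # roughly uniform in the ball
print(obj(d_star) <= min(obj(d) for d in samples) + 1e-6)       # True
```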
C. Appendix for Section 3.3

C.1. Proof of Lemma 3.3

Under the first-order approximation, we have the cross-entropy loss
$$\ell(f(x+\delta), y) = -y\big(f(x) + \nabla f(x)^T\delta\big) + \ln\big(1 + \exp(f(x) + \nabla f(x)^T\delta)\big),$$
and the adversarial cross-entropy loss
$$\max_{\delta\in\Delta^{\epsilon}_{p}}\ell(f(x+\delta), y) = \max_{\delta\in\Delta^{\epsilon}_{p}}\Big\{-y\big(f(x) + \nabla f(x)^T\delta\big) + \ln\big(1 + \exp(f(x) + \nabla f(x)^T\delta)\big)\Big\}. \quad (17)$$
For $\delta\in\Delta^{\epsilon}_{p}$, we have, from Lemma B.1,
$$-\|\nabla f(x)\|_q\epsilon \le \nabla f(x)^T\delta \le \|\nabla f(x)\|_q\epsilon.$$
We attain the maximum in Eqn. (17) when
$$\nabla f(x)^T\delta = \begin{cases} \|\nabla f(x)\|_q\epsilon & \text{for } y = 0, \\ -\|\nabla f(x)\|_q\epsilon & \text{for } y = 1, \end{cases}$$
that is, the optimal adversarial perturbation $\delta^*$ satisfies $\nabla f(x)^T\delta^* = -(2y-1)\|\nabla f(x)\|_q\epsilon$. We finally have
$$f(x+\delta^*) = f(x) + \nabla f(x)^T\delta^* = f(x) - (2y-1)\|\nabla f(x)\|_q\epsilon,$$
and the probability of the positive class for the adversarial example $x+\delta^*$ is
$$p^{adv}_{f,+} = \frac{1}{1 + \exp(-(f(x) + \nabla f(x)^T\delta^*))} = \frac{1}{1 + \exp(-(f(x) - (2y-1)\|\nabla f(x)\|_q\epsilon))},$$
which completes the proof.

C.2. Proof of Theorem 3.4

For brevity, write $u_j = f_j(x) - (2y-1)\|\nabla f_j(x)\|_q\epsilon$ and $\bar{u} = \bar{f}(x) - (2y-1)\|\nabla \bar{f}(x)\|_q\epsilon$ for the adversarial predictions given by Lemma 3.3. We have
$$\ln(1+\exp(u_j)) - \ln(1+\exp(\bar{u})) = \frac{1}{1+\exp(\bar{u})}\ln\frac{1+\exp(u_j)}{1+\exp(\bar{u})} + \frac{1}{1+\exp(-\bar{u})}\ln\frac{1+\exp(u_j)}{1+\exp(\bar{u})}$$
$$= \frac{1}{1+\exp(\bar{u})}\ln\frac{1+\exp(u_j)}{1+\exp(\bar{u})} + \frac{1}{1+\exp(-\bar{u})}\ln\frac{1+\exp(-u_j)}{1+\exp(-\bar{u})} + \frac{1}{1+\exp(-\bar{u})}\,(u_j - \bar{u})$$
$$= \mathrm{KL}(p_{\bar{f},adv}, p_{f_j,adv}) + p_{\bar{f},+,adv}\,(u_j - \bar{u}).$$
It follows that, from Lemma 3.3,
$$\frac{1}{m}\sum_{j=1}^{m}\max_{\delta\in\Delta^{\epsilon}_{p}}\ell(f_j(x+\delta), y) - \max_{\delta\in\Delta^{\epsilon}_{p}}\ell(\bar{f}(x+\delta), y) = \frac{1}{m}\sum_{j=1}^{m}\big[\ln(1+\exp(u_j)) - y\,u_j\big] - \ln(1+\exp(\bar{u})) + y\,\bar{u}$$
$$= \frac{1}{m}\sum_{j=1}^{m}\mathrm{KL}(p_{\bar{f},adv}, p_{f_j,adv}) + (p_{\bar{f},+,adv} - y)(2y-1)\epsilon\cdot\frac{1}{m}\sum_{j=1}^{m}\big(\|\nabla\bar{f}(x)\|_q - \|\nabla f_j(x)\|_q\big),$$
where the last step uses $\frac{1}{m}\sum_{j=1}^{m}f_j(x) = \bar{f}(x)$. This completes the proof.
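Analogously to the squared-loss check above, the following NumPy sketch verifies the decomposition of Theorem 3.4 for base learners with linear logits, for which Lemma 3.3 is exact; the helper names are illustrative and not part of the released code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adv_logit(w, b, x, y, eps, q):
    """Lemma 3.3 (exact for a linear logit): worst-case logit f(x) - (2y-1)*eps*||grad||_q."""
    return w @ x + b - (2 * y - 1) * eps * np.sum(np.abs(w) ** q) ** (1.0 / q)

def ce(u, y):                        # binary cross-entropy written on the logit u
    return -y * u + np.log1p(np.exp(u))

def kl(a, b):                        # KL divergence between Bernoulli(a) and Bernoulli(b)
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

rng = np.random.default_rng(1)
d, m, eps, q, y = 4, 3, 0.1, 2.0, 1
x = rng.normal(size=d)
W, bs = rng.normal(size=(m, d)), rng.normal(size=m)
w_bar, b_bar = W.mean(axis=0), bs.mean()

u = np.array([adv_logit(W[j], bs[j], x, y, eps, q) for j in range(m)])
u_bar = adv_logit(w_bar, b_bar, x, y, eps, q)
p, p_bar = sigmoid(u), sigmoid(u_bar)

lhs = ce(u, y).mean() - ce(u_bar, y)                  # avg individual adv loss minus ensemble adv loss
norms = np.sum(np.abs(W) ** q, axis=1) ** (1.0 / q)
norm_bar = np.sum(np.abs(w_bar) ** q) ** (1.0 / q)
rhs = kl(p_bar, p).mean() + (p_bar - y) * (2 * y - 1) * eps * (norm_bar - norms.mean())

print(np.isclose(lhs, rhs))                           # True: KL diversity plus cross diversity
```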
C.3. Discussions of Average of cos Values for Cross-Entropy Loss

Example 2. There exist two ensembles with the same averages of cos values and of individual adversarial cross-entropy losses, but with different adversarial ensemble losses for the cross-entropy loss and the $l_2$ norm.

Proof. We focus on the 2-dimensional instance space $\mathcal{X}\subseteq\mathbb{R}^2$ and label space $\mathcal{Y}\subseteq\mathbb{R}$, and consider
$$f_1(x) = x_1 + x_2, \quad f_2(x) = x_1 - 3x_2, \quad f_3(x) = abx_1 + bx_2 \quad\text{and}\quad f_4(x) = abx_1 - bx_2,$$
where $a = (\sqrt{5}-1)/2$ and $b = \dfrac{2\ln\big(\sqrt{(1+\exp(1+\sqrt{2}))(1+\exp(1+\sqrt{10}))} - 1\big)}{\sqrt{5}-1+\sqrt{10-2\sqrt{5}}}$. We study two ensembles: one ensemble of $f_1$ and $f_2$, and the other of $f_3$ and $f_4$. For the example $(x, y) = ([-1, 0], 1)$ and the perturbation set $\Delta = \{\delta : \|\delta\|_2 \le 1\}$, we have
$$\cos(\nabla f_1(x), \nabla f_2(x)) = \frac{\langle\nabla f_1(x), \nabla f_2(x)\rangle}{\|\nabla f_1(x)\|_2\|\nabla f_2(x)\|_2} = \frac{\langle\nabla f_3(x), \nabla f_4(x)\rangle}{\|\nabla f_3(x)\|_2\|\nabla f_4(x)\|_2} = \cos(\nabla f_3(x), \nabla f_4(x)),$$
and we also have the same average of individual adversarial losses from Lemma 3.3:
$$\frac{1}{2}\sum_{i=1}^{2}\max_{\delta\in\Delta}\ell(f_i(x+\delta), y) = \frac{1}{2}\sum_{i=3}^{4}\max_{\delta\in\Delta}\ell(f_i(x+\delta), y) = \frac{1}{2}\big(\ln(1+\exp(1+\sqrt{2})) + \ln(1+\exp(1+\sqrt{10}))\big).$$
However, the adversarial ensemble losses are different:
$$\max_{\delta\in\Delta}\ell\big(f_1(x+\delta)/2 + f_2(x+\delta)/2,\, y\big) \approx 2.4999 \quad\text{and}\quad \max_{\delta\in\Delta}\ell\big(f_3(x+\delta)/2 + f_4(x+\delta)/2,\, y\big) \approx 2.3738,$$
which completes the proof.

Example 3. There exist two ensembles with the same averages of cos values and of individual adversarial cross-entropy losses, but with different adversarial ensemble losses for the cross-entropy loss and the $l_\infty$ norm.

Proof. We focus on the 2-dimensional instance space $\mathcal{X}\subseteq\mathbb{R}^2$ and label space $\mathcal{Y}\subseteq\mathbb{R}$, and consider
$$f_1(x) = x_1 + x_2, \quad f_2(x) = x_1 - 3x_2, \quad f_3(x) = abx_1 + bx_2 \quad\text{and}\quad f_4(x) = abx_1 - bx_2,$$
where $a = (\sqrt{5}-1)/2$ and $b = \ln\big(\sqrt{(1+\exp(3))(1+\exp(5))} - 1\big)/\sqrt{5}$. We study two ensembles: one ensemble of $f_1$ and $f_2$, and the other of $f_3$ and $f_4$. For the example $(x, y) = ([-1, 0], 1)$ and the perturbation set $\Delta = \{\delta : \|\delta\|_\infty \le 1\}$, we have
$$\cos(\nabla f_1(x), \nabla f_2(x)) = \frac{\langle\nabla f_1(x), \nabla f_2(x)\rangle}{\|\nabla f_1(x)\|_2\|\nabla f_2(x)\|_2} = \frac{\langle\nabla f_3(x), \nabla f_4(x)\rangle}{\|\nabla f_3(x)\|_2\|\nabla f_4(x)\|_2} = \cos(\nabla f_3(x), \nabla f_4(x)),$$
and we also have the same average of individual adversarial losses from Lemma 3.3:
$$\frac{1}{2}\sum_{i=1}^{2}\max_{\delta\in\Delta}\ell(f_i(x+\delta), y) = \frac{1}{2}\sum_{i=3}^{4}\max_{\delta\in\Delta}\ell(f_i(x+\delta), y) = \frac{1}{2}\big(\ln(1+\exp(3)) + \ln(1+\exp(5))\big).$$
However, the adversarial ensemble losses are different:
$$\max_{\delta\in\Delta}\ell\big(f_1(x+\delta)/2 + f_2(x+\delta)/2,\, y\big) \approx 3.0486 \quad\text{and}\quad \max_{\delta\in\Delta}\ell\big(f_3(x+\delta)/2 + f_4(x+\delta)/2,\, y\big) \approx 2.3738,$$
which completes the proof.

D. Appendix for Section 4

D.1. Proof of the Extended Diversity

For simplicity, we abbreviate $f(\tilde{x}_f)$ and $p_f(\tilde{x}_f)$ to $f$ and $p_f$, respectively, and let $f_k$ and $p_{f_k}$, $k\in[K]$, be the $k$-th elements of $f$ and $p_f$, respectively. We have the adversarial cross-entropy loss for multi-class classification
$$\ell(f(\tilde{x}_f), y) = -f_y + \log\sum_{k=1}^{K}\exp(f_k).$$
It follows that
$$\ell(\bar{f}(\tilde{x}_{\bar{f}}), y) - \frac{1}{m}\sum_{j=1}^{m}\ell(f_j(\tilde{x}_{f_j}), y) = -\bar{f}_y + \frac{1}{m}\sum_{j=1}^{m}f_{j,y} + \log\sum_{k=1}^{K}\exp(\bar{f}_k) - \frac{1}{m}\sum_{j=1}^{m}\log\sum_{k=1}^{K}\exp(f_{j,k}). \quad (18)$$
We also have
$$\log\sum_{k=1}^{K}\exp(\bar{f}_k) - \log\sum_{k=1}^{K}\exp(f_{j,k}) = \sum_{k_1=1}^{K}\frac{\exp(\bar{f}_{k_1})}{\sum_{k=1}^{K}\exp(\bar{f}_k)}\log\frac{\sum_{k=1}^{K}\exp(\bar{f}_k)}{\sum_{k=1}^{K}\exp(f_{j,k})}$$
$$= \sum_{k_1=1}^{K}\frac{\exp(\bar{f}_{k_1})}{\sum_{k=1}^{K}\exp(\bar{f}_k)}\left(\log\frac{\exp(f_{j,k_1})/\sum_{k=1}^{K}\exp(f_{j,k})}{\exp(\bar{f}_{k_1})/\sum_{k=1}^{K}\exp(\bar{f}_k)} + \bar{f}_{k_1} - f_{j,k_1}\right) = \sum_{k=1}^{K}p_{\bar{f}_k}\big(\bar{f}_k - f_{j,k}\big) - \mathrm{KL}(p_{\bar{f}}, p_{f_j}),$$
and it follows that, by setting $r = p_{\bar{f}}(\tilde{x}_{\bar{f}}) - e_y$,
$$\frac{1}{m}\sum_{j=1}^{m}\ell(f_j(\tilde{x}_{f_j}), y) - \ell(\bar{f}(\tilde{x}_{\bar{f}}), y) = \frac{1}{m}\sum_{j=1}^{m}\Big(\big\langle r,\, f_j(\tilde{x}_{f_j}) - \bar{f}(\tilde{x}_{\bar{f}})\big\rangle + \mathrm{KL}\big(p_{\bar{f}}(\tilde{x}_{\bar{f}}), p_{f_j}(\tilde{x}_{f_j})\big)\Big). \quad (19)$$
From the first-order approximation $f(\tilde{x}_f) \approx f(x) + J_f(x)\delta^*_f$ and $\frac{1}{m}\sum_{j=1}^{m}f_j(x) = \bar{f}(x)$, we have
$$\frac{1}{m}\sum_{j=1}^{m}\big\langle r,\, f_j(\tilde{x}_{f_j}) - \bar{f}(\tilde{x}_{\bar{f}})\big\rangle = \frac{1}{m}\sum_{j=1}^{m}\big\langle r,\, J_{f_j}(x)\delta^*_{f_j} - J_{\bar{f}}(x)\delta^*_{\bar{f}}\big\rangle,$$
which completes the proof by combining with Eqns. (18)-(19).

D.2. Proof of Orthogonality

We now show that optimizing Eqn. (5) orthogonalizes the predictions of the base learners, as follows.

Theorem D.1. For $K$-class learning, let $f_1, \dots, f_m$ be $m$ base learners with cross-entropy losses being at least $B$, and let $K-1$ be a multiple of $m$. The minimizer of Eqn. (5) over an example $(x, y)$ is
$$p_{f_i}(x+\delta^*_{f_i})_k = \begin{cases} \exp(-B) & \text{for } k = y, \\ \dfrac{m\,(1-\exp(-B))}{K-1} & \text{for } k\in s_i, \\ 0 & \text{otherwise}, \end{cases} \quad (20)$$
where $s_1, \dots, s_m$ is a partition of the set $\{1, \dots, K\}\setminus\{y\}$ and $p_{f_i}(\tilde{x}_{f_i})_k$ is the $k$-th element of $p_{f_i}(\tilde{x}_{f_i})$.

Proof. We first have $p_{f_i}(x+\delta^*_{f_i})_y = \exp(-B)$, since the adversarial cross-entropy loss is at least $B$ for each base learner. The regularization in Eqn. (5) is defined as
$$\Gamma_\alpha(x, y) = H\Big(\sum_{j=1}^{m}\tilde{p}_{f_j}(\tilde{x}_{f_j})/m\Big) + \alpha\log\big(V(\tilde{p}_{f_1}(\tilde{x}_{f_1}), \dots, \tilde{p}_{f_m}(\tilde{x}_{f_m}))\big).$$
The volume $V(\tilde{p}_{f_1}(\tilde{x}_{f_1}), \dots, \tilde{p}_{f_m}(\tilde{x}_{f_m}))$ achieves its maximum if and only if the non-label probability vectors of the individual networks are mutually orthogonal (Bernstein, 2009), and the entropy $H(\sum_{j=1}^{m}\tilde{p}_{f_j}(\tilde{x}_{f_j})/m)$ achieves its maximum if and only if the mean of the non-label probability vectors of the individual networks is uniform. Eqn. (20) obviously satisfies the two conditions simultaneously; thus, Eqn. (20) is the minimizer of Eqn. (5).

D.3. Other Details of Algorithm 1

PGD-attack for $l_p$ norm perturbation ball. The PGD-attack generates adversarial examples iteratively for the $l_\infty$-norm perturbation ball as
$$x_{t+1,j} = \Pi_{x+\Delta^{\epsilon}_{\infty}}\big(x_{t,j} + \alpha\,\mathrm{sign}(\nabla_x\ell(f_j(x_{t,j}), y))\big),$$
where $\Pi$ denotes the projection and $x+\Delta^{\epsilon}_{\infty} = \{x+\delta \mid \|\delta\|_\infty \le \epsilon\}$. Here, $x_{0,j}$ is initialized as $x$, and $x_{T,j}$ is used as the adversarial example of base learner $f_j$.
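The following is a minimal PyTorch sketch of this inner PGD loop for the $l_\infty$ ball, assuming a cross-entropy loss, a logit-producing model for base learner $f_j$, and inputs in $[0, 1]$; it is an illustration rather than the released implementation.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps, alpha, steps=10):
    """Generate adversarial examples for one base learner under the l_inf ball of radius eps."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()              # ascent step along the gradient sign
            x_adv = x + (x_adv - x).clamp(-eps, eps)         # project onto {x + delta : ||delta||_inf <= eps}
            x_adv = x_adv.clamp(0.0, 1.0)                    # keep a valid pixel range (assumed [0, 1])
    return x_adv.detach()
```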
For other $l_p$ norm perturbation balls, the PGD-attack generates adversarial examples as
$$x_{t+1,j} = \Pi_{x+\Delta^{\epsilon}_{p}}\big(x_{t,j} + \alpha\, g_p(\nabla_x\ell(f_j(x_{t,j}), y))\big),$$
where $g_p$ is the function that maps the gradient to the update direction with
$$[g_p(w)]_i = \mathrm{sign}(w_i)\,\epsilon\,\big(|w_i|^q/\|w\|_q^q\big)^{1/p}.$$

Calculating the volume of the polytope. For $m$ vectors $x_1, \dots, x_m\in\mathbb{R}^d$ and $X = (x_1, \dots, x_m)\in\mathbb{R}^{d\times m}$, we have, from matrix theory (Bernstein, 2009),
$$V^2(x_1, \dots, x_m) = \det(X^TX),$$
where $\det(X^TX)$ is the determinant of the matrix $X^TX$. For a matrix $A\in\mathbb{R}^{n\times n}$, we also have
$$\frac{\partial \det(A)}{\partial a_{ij}} = \det(A)\,\big[(A^{-1})^T\big]_{ij},$$
where $a_{ij}$ is the element in the $i$-th row and $j$-th column of $A$. We finally optimize the objective in Eqn. (5) for neural networks with the SGD method (Robbins & Monro, 1951).

Time Complexity of Algorithm 1. The time complexity of Algorithm 1 is $m$ times that of adversarially training a single neural network, where $m$ is the number of neural networks in the ensemble. In addition, it takes $O(m^3)$ computational cost to evaluate the regularization and its gradient. In practice, the regularization takes a much smaller computational cost than training the neural networks, as shown in Table 4.

Table 4. Comparison of training time (seconds per epoch) for our AdvEOAP with and without the regularization Γ.

| Our AdvEOAP | MNIST | F-MNIST | CIFAR10 |
|---|---|---|---|
| without regularization Γα(·) | 40.08 | 40.11 | 123.84 |
| with regularization Γα(·) | 41.45 | 41.32 | 125.48 |

E. Appendix for Section 5

E.1. Experimental settings

For iGAT(ADP), we take 150, 150 and 480 epochs on MNIST, F-MNIST and CIFAR10 for convergence, while for the other ensemble methods we take 60, 60 and 250 epochs, respectively. For the adversarial examples in the training process, we take PGD with 10 steps (PGD-10) with step sizes 0.04, 0.01 and 0.008 for MNIST, F-MNIST and CIFAR10, respectively. We set α = 0.02 and λ = 10 for our method, and Table 5 summarizes the parameter settings of the other methods.

Table 5. Hyperparameters of all ensemble methods used in our experiments. Parameters that are not applicable are left blank.

| Parameter | GAL | ADP | AdvADP | DVERGE | PDD | TRS | iGAT(ADP) |
|---|---|---|---|---|---|---|---|
| α | 0.5 | 2 | 2 | - | 0.01 | 1 | 2 |
| β | - | 0.5 | 0.5 | - | - | 5 | 0.5 |

E.2. EOT and BPDA attacks

We adopt the Backward Pass Differentiable Approximation (BPDA) attack (Athalye et al., 2018a) to account for potential gradient-masking risks, and design four different attacks as follows:
- BPDA1: For the potential gradient-vanishing risk (i.e., a small ensemble gradient caused by the differing gradients of the base learners), we instead use $k$ times the average logits of the base learners as the logit of the ensemble, where $k\in[1, m]$ and $m$ is the number of base learners. We evaluate all possible values of $k\in[1, m]$ and report the lowest adversarial accuracy observed; a sketch of this variant is given below.
- BPDA2: For potentially incorrect gradients caused by the random gradients of a single base learner, we consider the attack that deletes one base learner and uses the gradients of the remaining ensemble.
- BPDA3: For potentially incorrect gradients caused by the random gradients of several base learners, we consider the attack that randomly selects half of the base learners at each attack step.
- BPDA4: For other potential gradient risks, we consider the black-box attack (Andriushchenko et al., 2020).

We also adopt the Expectation over Transformation (EOT) attack (Athalye et al., 2018b) to craft adversarial perturbations that remain effective under random transformations, implemented with 20 random rotations within -30 to +30 degrees.
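The following PyTorch sketch illustrates the BPDA1 variant, assuming the base learners are given as a list of logit-producing models and that inputs lie in $[0, 1]$; it is a minimal illustration rather than the attack implementation used in the experiments, and the robust-accuracy evaluation over $k$ is omitted.

```python
import torch
import torch.nn.functional as F

def bpda1_attack(models, x, y, eps, alpha, steps=10):
    """BPDA1: back-propagate through k * (average of base-learner logits) for k = 1..m.
    Returns one batch of adversarial examples per k; the attacker then reports the
    lowest robust accuracy of the true ensemble over all k."""
    out = []
    for k in range(1, len(models) + 1):
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            logits = k * torch.stack([f(x_adv) for f in models]).mean(dim=0)   # surrogate ensemble logit
            grad = torch.autograd.grad(F.cross_entropy(logits, y), x_adv)[0]
            with torch.no_grad():
                x_adv = x + (x_adv + alpha * grad.sign() - x).clamp(-eps, eps)  # l_inf projection
                x_adv = x_adv.clamp(0.0, 1.0)                                   # assumed pixel range
        out.append(x_adv.detach())
    return out
```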