# on_the_perils_of_cascading_robust_classifiers__98ec706a.pdf

Published as a conference paper at ICLR 2023

ON THE PERILS OF CASCADING ROBUST CLASSIFIERS

Ravi Mangal , Zifan Wang , Chi Zhang Electrical and Computer Engineering Carnegie Mellon University Pittsburgh, PA 15213 {rmangal, zifanw, chiz5}@andrew.cmu.edu

Klas Leino School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 kleino@cs.cmu.edu

Corina P as areanu Carnegie Mellon University and NASA Ames Moffett Field, CA 94043 pcorina@andrew.cmu.edu

Matt Fredrikson School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 mfredrik@cmu.edu

Ensembling certiﬁably robust neural networks is a promising approach for improving the certiﬁed robust accuracy of neural models. Black-box ensembles that assume only query-access to the constituent models (and their robustness certiﬁers) during prediction are particularly attractive due to their modular structure. Cascading ensembles are a popular instance of black-box ensembles that appear to improve certiﬁed robust accuracies in practice. However, we show that the robustness certiﬁer used by a cascading ensemble is unsound. That is, when a cascading ensemble is certiﬁed as locally robust at an input x (with respect to ϵ), there can be inputs x in the ϵ-ball centered at x, such that the cascade s prediction at x is different from x and thus the ensemble is not locally robust. Our theoretical ﬁndings are accompanied by empirical results that further demonstrate this unsoundness. We present cascade attack (Cas A), an adversarial attack against cascading ensembles, and show that: (1) there exists an adversarial input for up to 88% of the samples where the ensemble claims to be certiﬁably robust and accurate; and (2) the accuracy of a cascading ensemble under our attack is as low as 11% when it claims to be certiﬁably robust and accurate on 97% of the test set. Our work reveals a critical pitfall of cascading certiﬁably robust models by showing that the seemingly beneﬁcial strategy of cascading can actually hurt the robustness of the resulting ensemble. Our code is available at https://github.com/Trista Chi/ensemble KW.

1 INTRODUCTION

Local robustness has emerged as an important requirement of classiﬁer models. It ensures that models are not susceptible to misclassiﬁcations caused by small perturbations to correctly classiﬁed inputs. A lack of robustness can be exploited by not only malicious actors (in the form of adversarial examples (Szegedy et al., 2014)) but can also lead to incorrect behavior in the presence of natural noise (Gilmer et al., 2019). However, ensuring local robustness of neural network classiﬁers has turned out to be a hard challenge. Although neural networks can achieve state-of-the-art classiﬁcation accuracies on a variety of important tasks, neural classiﬁers with comparable certiﬁed robust accuracies1 (CRA, Def. 2.2) remain elusive, even when trained in a robustness-aware manner (Madry et al., 2018; Wong & Kolter, 2018; Cohen et al., 2019; Leino et al., 2021). In light of the limitations of robustness-aware training, ensembling certiﬁably robust neural classiﬁers has been shown to be a promising approach for improving certiﬁed robust accuracies (Wong et al., 2018; Yang et al., 2022). An ensemble combines the outputs of multiple base classiﬁers to make a prediction, and is a well-known mechanism for improving classiﬁcation accuracy when one only has access to weak learners (Dietterich, 2000; Bauer & Kohavi, 1999).

Ensembles designed to improve CRA take one of two forms. White-box ensembles (Yang et al., 2022; Zhang et al., 2019; Liu et al., 2020) assume white-box access to the constituent models. They

Equal Contribution 1Percentage of inputs where the classiﬁer is accurate and certiﬁed as locally robust.

Published as a conference paper at ICLR 2023

(𝑎) (𝑏) (𝑐)

Constituent Models

Figure 1: Visualizing classiﬁcation results of 2D points for constituent models (a-c) and the corresponding Cascading Ensemble (d, Def. 2.7) and Uniform Voting Ensemble (e, Def. 5.3). Regions with colors correspond to predictions (0: red, 1: blue, 2: green) made by the underlying model (or ensemble). Darker colors indicate that the accompanying robustness certiﬁcation of the underlying model (or ensemble) returns 1 and lighter colors are for cases when the certiﬁcation returns 0. All points receiving 1 for certiﬁcations (darker regions) are at least ϵ-away from the other classes in (a)-(c), i.e. certiﬁcation is sound (Def. 2.3). This property is violated in (d), e.g. points from dark red regions are not ϵ-away from the blue region in the zoomed-in view on the left, but preserved in (e). Namely, voting ensembles are soundness-preserving (Def. 2.6) while cascading ensembles are not.

calculate new logits by averaging the corresponding logits of the constituent classiﬁers. For local robustness certiﬁcation, they treat the ensemble as a single, large model and then use off-the-shelf techniques (Cohen et al., 2019; Weng et al., 2018; Wong & Kolter, 2018; Zhang et al., 2018) for certiﬁcation. Black-box ensembles (Wong et al., 2018; Blum et al., 2022), on the other hand, assume only query-access to the constituent classiﬁers during prediction, and are, therefore, agnostic to their internal details. They re-use the prediction and certiﬁcation outcomes of the constituent models to calculate the ensemble s prediction and certiﬁcate. Their black-box nature lends them modularity and permits any combination of constituent classiﬁers, irrespective of their individual certiﬁcation mechanism, so we focus our efforts on them in this paper.

Cascading ensembles (Wong et al., 2018; Blum et al., 2022) are a particularly popular instance of black-box ensembles that appear to improve CRA in practice. They evaluate the constituent classiﬁers (and their certiﬁers) in a ﬁxed sequence. The ensemble s prediction is the output of the ﬁrst constituent classiﬁer in the sequence that is certiﬁed locally robust, defaulting to the last classiﬁer s output if no model can be certiﬁed. Importantly, the cascading ensemble is itself certiﬁed locally robust only when at least one of the constituent classiﬁers is certiﬁed locally robust.

Our contributions. We show in this paper that the local robustness certiﬁcation mechanism used by cascading ensembles is unsound even when the certiﬁers used by each of the constituent classiﬁers are sound (Theorem 2.8). In other words, when a cascading ensemble is certiﬁed as locally robust at an input x, there can, in fact, be inputs x in the ϵ-ball centered at x, such that the cascade s prediction at x is different from x. Figure 1 demonstrates this visually on a toy dataset. The cascading ensemble can have points that are less than ϵ away from the decision boundary, yet the ensemble is certiﬁed locally robust at such points (Figure 1(d)). As a consequence of our result, use of a cascading ensemble in any scenario requiring local robustness guarantees is unsafe and existing empirical results that report the CRA of cascading ensembles are not valid.

Guided by our theoretical construction, we propose cascade attack (Cas A, Algorithm 3.1), an adversarial attack against cascading ensembles, and conduct an empirical evaluation with the cascading ensembles trained by Wong et al. (2018) for MNIST and CIFAR-10 datasets. With Cas A, we show that: (1) there exists an adversarial example for up to 88% of the samples where the ensemble claims

Published as a conference paper at ICLR 2023

to be certiﬁably robust and accurate; (2) the empirical robust accuracy of a cascading ensemble is as low as 11% while it claims to be certiﬁably robust and accurate on 97% of the test set; and (3) viewing all experiments as a whole, the empirical robust accuracy of a cascading ensemble is almost always lower than even the CRA of the single best model in the ensemble. Namely, a cascading ensemble is often less robust. Our results conclusively demonstrate that the unsoundness of the cascading ensemble certiﬁcation mechanism can be exploited in practice, and cause the ensemble to perform markedly worse than the single best constituent model.

We also present an alternate ensembling mechanism based on weighted voting that, like cascading ensembles, assumes only query-access to the constituent classiﬁers but comes with a provably sound local robustness certiﬁcation procedure (Section 5). We show through a thought experiment that it is possible for a voting ensemble to improve upon the CRA of its constituent models (Section 5.2), and observe that the key ingredient for the success of voting ensembles is a suitable balance between diversity and similarity of their constituents. We leave the design of training algorithms that balance the diversity and similarity of the constituent classiﬁers as future work.

2 CASCADING ENSEMBLES

In this section, we introduce our notation and required deﬁnitions. We then show the local robustness certiﬁcation procedure used by cascading ensembles is unsound.

2.1 CERTIFIABLE CLASSIFIERS AND ENSEMBLERS

We begin with our notation and necessary deﬁnitions. Suppose a neural network f :Rd Rm takes an input and outputs the probability of m different classes. The subscript xj denotes the j-th element of a vector x. When discussing multiple networks, we differentiate them with a superscript, e.g. f (1),f (2). Throughout the paper we use the upper-case letter F to denote the prediction of f such that F(x)=argmaxj Y{fj(x)} where fj is the logit for class j and Y =[m].2 The prediction F(x) is considered ϵ-locally robust at x if all neighbors within an ϵ-ball centered at x receive the same predictions, which is formally stated in Def. 2.1.

Deﬁnition 2.1 (ϵ-Local Robustness). A network, F, is ϵ-locally robust at x w.r.t to norm, || ||, if x . ||x x|| ϵ= F(x )=F(x).

Though local robustness certiﬁcation of Re LU networks is NP-Complete (Katz et al., 2017), due to its importance, the problem has been receiving increasing attention from the community. Proposed certiﬁcation methods rely on a variety of algorithmic approaches like solving corresponding linear (Jordan et al., 2019) or semi-deﬁnite programs (Raghunathan et al., 2018), interval propagations (Gowal et al., 2018; Lee et al., 2020; Zhang et al., 2018), abstract interpretation (Singh et al., 2019a), geometric projections (Fromherz et al., 2021), dual networks (Wong & Kolter, 2018), or Lipschitz approximations (Leino et al., 2021; Weng et al., 2018).

If a certiﬁcation method is provided for a network F, we use F ϵ : Rd Y {0,1} to denote a certiﬁable neural classiﬁer that returns a prediction according to F and the outcome of the certiﬁcation method applied to F with respect to robustness radius ϵ. We use F ϵ label(x) to refer to the prediction and F ϵ cert(x) to refer to the certiﬁcation outcome. If F ϵ cert(x)=0, the accompanying robustness certiﬁcation is unable to certify F (i.e., Flabel) as ϵ-locally robust at x. When ϵ is clear from the context, we directly write F. One popular metric to evaluate the performance of any F is Certiﬁed Robust Accuracy (CRA).

Deﬁnition 2.2 (Certiﬁed Robust Accuracy). The certiﬁed robust accuracy (CRA) of a certiﬁable classiﬁer F Rd Y {0, 1} on a given dataset Sk Rd Y with k samples is given by CRA( F,Sk):=1/k P

(xi,yi) Sk1[ F(xi)=(yi,1)].

For F and its CRA to be useful in practice without providing false robustness guarantees, it must be sound to begin with (Def. 2.3).

Deﬁnition 2.3 (Certiﬁcation Soundness). A certiﬁable classiﬁer, F : Rd Y {0,1}, is sound if x Rd . Fcert(x)=1 = Flabel is ϵ-locally robust at x.

2[m]:={0,1,...,m 1}

Published as a conference paper at ICLR 2023

Notice that if there exist ϵ-close inputs x,x where F(x)=(y,1) and F(x )=(y ,0), where y =y , then it still means that F is not sound. We deﬁne an ensembler (Def. 2.4) as a function that combines multiple certiﬁable classiﬁers into a single certiﬁable classiﬁer (i.e., an ensemble).

Deﬁnition 2.4 (Ensembler). Let F := Rd Y {0,1} represent the set of all certiﬁable classiﬁers. An ensembler E : FN F is a function over N certiﬁable classiﬁers that returns a certiﬁable classiﬁer.

A query-access ensembler formalizes our notion of a black-box ensemble.

Deﬁnition 2.5 (Query-Access Ensembler). Let G:=(Y {0,1})N Y {0,1}. E is a query-access ensembler if, F (0), F (1),..., F (N 1) F . G G . x Rd.

E F (0), F (1),..., F (N 1) (x) = G F (0)(x), F (1)(x),..., F (N 1)(x)

Def. 2.5 says that if E is query-access its output F can always be re-written as a function over the outputs of the certiﬁable classiﬁers F (0), F (1),..., F (N 1). Put differently, F only has black-box or query-access to classiﬁers F (0), F (1),..., F (N 1).

Finally, a soundness-preserving ensembler (Def. 2.6) ensures that if the constituent certiﬁable classiﬁers are sound (as deﬁned in Def. 2.3), the ensemble output by the ensembler is also sound. Deﬁnition 2.6 (Soundness-Preserving Ensembler). An ensembler E is soundness-preserving if, F (0), F (1),..., F (N 1) F, F (0), F (1),..., F (N 1) are sound = E( F (0), F (1),..., F (N 1)) is sound.

2.2 CASCADING ENSEMBLER IS NOT SOUNDNESS-PRESERVING

Cascading ensembles (Wong et al., 2018; Blum et al., 2022) are a popular instance of black-box ensembles that appear to be practically effective in improving certiﬁed robust accuracies. However, we show that cascading ensembles are not sound.

We deﬁne a cascading ensemble to be the output of a cascading ensembler (Def. 2.7). A cascading ensemble evaluates its constituent certiﬁable classiﬁers in a ﬁxed sequence. For a given input x, the ensemble either returns the prediction and certiﬁcation outcomes of the ﬁrst constituent classiﬁer F (j)

such that F (j) cert = 1 or of the last constituent classiﬁer in case none of the constituent classiﬁers can be certiﬁed. Clearly, cascading ensemblers are query-access (formal proof in Appendix A).

Deﬁnition 2.7 (Cascading Ensembler). Let F (0), F (1),..., F (N 1) be N certiﬁable classiﬁers. A cascading ensembler EC : FN F is deﬁned as follows

EC F (0), F (1),..., F (N 1) (x) := F (j)(x) if j N 1 . c(j)=1 F (N 1)(x) otherwise

where c(j):=1 if ( F (j) cert(x)=1) and ( i<j, F (i) cert(x)=0), and 0 otherwise.

Theorem 2.8 shows that cascading ensemblers are not soundness-preserving, and so a cascading ensemble can be unsound. We show this by means of a counterexample.3

Theorem 2.8. The cascading ensembler EC is not soundness-preserving.

Proof. We can re-write the theorem statement as, F (0), F (1), ... , F (N 1) F such that for F :=EC( F (0), F (1),..., F (N 1)), x Rd, Fcert(x)=1 = Flabel is ϵ-locally robust at x.

We prove by constructing the following counterexample. Consider a cascading ensemble F constituted of certiﬁable classiﬁers F (0) and F (1). F (0) and F (1) are such there exists an x where

F (0) cert(x)=0 F (1)(x)=(y,1) (1)

Using Def. 2.7, it is true that F(x) = (y,1). Without violating (1), we can have another point x such that,

3The use of a cascade of certiﬁcation methods for a single classiﬁer as in (Gowal et al., 2018; Singh et al., 2019b) is orthogonal and sound.

Published as a conference paper at ICLR 2023

Algorithm 3.1: Cascade Attack (Cas A)

Inputs: Ensemble F F, constituent models F (0), ..., F (N 1) F, input x Rd, attack bound ϵ R, and distance metric ℓp Output: An attack input x Rd

1 Attack( F, F (0) , ... , F (N 1) , x , ϵ , ℓp):

2 y := Flabel(x)

3 idxs := {i | i [N] F (i) label(x) =y} {N 1}

4 foreach i idxs do

5 if i=N 1 then

6 x := x + argmax δ Bp(0,ϵ)

Lce(one-hot( F (i) label(x+δ)), one-hot(y)) + P

k<i (1 F (k) cert (x+δ))

8 x := x + argmax δ Bp(0,ϵ)

F (i) cert(x+δ) + P

k<i (1 F (k) cert (x+δ))

9 if Flabel(x ) =y then

10 return x

11 return x

(||x x || ϵ) ( F (0)(x )=(y ,1)) ( F (1)(x )=(y,0)) (y =y) (2)

Using Def. 2.7, it is true that F(x ) = (y ,1). Thus, for two points x,x constructed as above, we show that x,x , s.t. ||x x|| ϵ, Fcert(x) = 1 = Flabel(x ) = Flabel(x), which violates the condition of local robustness (Def. 2.1).

The counterexample constructed in Thm. 2.8 is not just hypothetical, but something that materializes on real models (see Figure 1 for a toy example and Section 4 for our empirical evaluation).

3 ATTACKING CASCADING ENSEMBLES

Section 2.2 shows that a cascading ensemble does not provide a robustness guarantee. We further show here how one can attack the cascading ensemble and ﬁnd an adversarial example within the ϵ-ball centered at the input x.

Overview of Attack. Algorithm 3.1 describes the attack algorithm, cascade attack (Cas A), inspired by the proof of Theorem 2.8. Given an input x, the goal of the algorithm is to ﬁnd an input x in the ϵ-ball centered at x such that the predictions of the cascade at x and x are different. The inputs to the algorithm are an ensemble F, its constituent classiﬁers F (0), ..., F (N 1), the input x to be attacked, and the attack distance bound ϵ as well as distance metric ℓp. The algorithm either returns a successful adversarial input x such that ||x x ||p ϵ and Flabel(x) = Flabel(x ) or it returns the original input x if no adversarial input was found. We use the following notations: Lce stands for cross-entropy loss, one-hot is the one-hot encoding function, and Bp(0,ϵ) is the ℓp-ball of radius ϵ centered at 0.

Preparing Targets. Cas A gets the label y predicted by the ensemble F at input x (line 2) to select the constituent models it may attack. The attacker is only interested in a constituent model (by remembering its index i) if it predicts a label other than y at x or it is the last one (line 3). We are not interested in attacking a model F (j) that predicts y at x because such an effort is bound to fail. F (j) is still sound even though the ensemble is not; therefore, no point x assigned a label other than y by F (j) is such that it is both less than ϵ-away from x and F (j) is also certiﬁably robust at x (the second condition is necessary for F (j) to be used for prediction at x by the ensemble). However, the last model F (N 1) is an exception and always remembered, i.e. idxs always includes N 1. The reason is that, given an input x , if all models F (i);i<N 1 fail to be certiﬁably robust at x , the ensemble uses F (N 1) for prediction at x irrespective of whether F (N 1) is itself certiﬁably robust at x or not.

4In our implementation, we use a surrogate version of this objective (see Section 3).

Published as a conference paper at ICLR 2023

Attacker s Steps. For each model index in idxs, we try to ﬁnd an adversarial example (lines 4-10). An attacker stops as soon as they ﬁnd a valid adversarial example (lines 9-10). Lines 6 and 8 describe the objective an attacker minimizes to ﬁnd the adversarial examples. If index i =N 1, the attacker optimizes δ such that, at input x + δ, the model F (i) is certiﬁed robust whereas all other models F (k);k <i are not certiﬁed robust. This ensures that model F (i) is used for prediction at input x+δ as it is certiﬁably robust at x. If index i = N 1, we still require that all models F (k);k < i are not certiﬁed robust at x+δ. But instead of requiring that F (i) is certiﬁed robust at x+δ, we only require that the predicted label at x+δ be different from y. We solve the optimization problems on lines 6 and 8 using projected gradient descent (PGD) (Madry et al., 2018).

Surrogate Objectives. For cases when the certiﬁcation procedure, i.e. F (i) cert(x+δ), is not differentiable or too expensive to run multiple times, we provide the following cheap surrogate replacements. The intuition underlying the surrogate versions is that, given a model, the distance to the decision boundary from an input is correlated with the margin between the top logit scores of the model at that input. To use the surrogate objectives for the attack, we need to assume access to the logit scores of the models. For the problem argmaxδ Bp(0,ϵ) F (i) cert(x+δ), we try to increase the logit score associated with the desired prediction as much as possible. Then, a surrogate version of the problem is as follows (where F (i) logit represents the logit scores produced by model F (i)):

argmax δ Bp(0,ϵ) Lce( F (i) logit(x+δ), one-hot( F (i) label(x))) (3)

For the problem argmaxδ Bp(0,ϵ) P

k<i(1 F (k) cert(x + δ)), we want the input x + δ to be as close as possible to the decision boundaries for each of the models by F (k),k < i so that the robustness certiﬁcations will fail. The speciﬁc predictions F (k) label(x+δ) of these models do not matter. Towards that end, we aim to make the margin between the logit scores of any model F (k) be as small as possible. This leads to the following surrogate problem (where unif is a discrete uniform distribution): argmax δ Bp(0,ϵ) X

k<i Lce( F (k) logit(x+δ), unif) (4)

4 EMPIRICAL EVALUATION

The goal of our empirical evaluation is to demonstrate the extent to which the unsoundness of the cascading ensembles manifests in practice and can be exploited by an adversary, i.e. Cas A. For our measurements, we use the ℓ and ℓ2 robust cascading ensembles constructed by Wong et al. (2018) for MNIST (Le Cun et al., 1998) and CIFAR-10 (Krizhevsky, 2009) datasets. The constituent classiﬁers in these cascades use a local robustness certiﬁcation mechanism based on dual networks (Wong et al., 2018). Each cascade includes between 2-7 certiﬁable classiﬁers with the same architecture (except for the ℓ robust, CIFAR-10 Resnet cascades that include only a single constituent model, and are hence not considered in our evaluation). The training code and all the constituent models in the ensembles are made available by Wong et al. (2018) in a public repository (Wong & Kolter).

We report the certiﬁed robust accuracy (CRA) and standard accuracy (Acc) for the cascading ensemble as well as the single best constituent model in the ensemble. While the certiﬁer for a single model is sound, the ensemble certiﬁer is unsound and the reported ensemble CRA is an over-estimate. We therefore measure the empirical robustness of the ensemble under Cas A. Certifying with dual networks (Wong et al., 2018) is differentiable but extremely expensive. To run the attack more efﬁciently, we use the surrogate replacements in Section 3 and take 100 steps using PGD (other hyper-parameters to follow in Appendix B) to empirically minimize the objectives. After the attack, we report the false positive rate (FPR), i.e. % of test inputs for which an adversarial example is found within the ϵ-ball, and the empirical robust accuracy (ERA), i.e. % of test set where the cascade is empirically robust (i.e., our attack failed). All our experiments were run on a pair of NVIDIA TITAN RTX GPUs with 24 GB of RAM each, and a 4.2GHz Intel Core i7-7700K with 64 GB of RAM.

Table 1 shows the results for ℓ robustness (top) and ℓ2 robustness (bottom). Each row in the table represents a speciﬁc combination of dataset (MNIST or CIFAR-10), architecture (Small or Large convolutional networks), and ϵ value used for local robustness certiﬁcation. The structure of the table is the same as Tables 2 and 4 in (Wong et al., 2018), except we add the columns reporting FPR and ERA.

Summary of Results. We see from Table 1 that, irrespective of the dataset, model, ϵ value, or ℓp metric under consideration, our attack can ﬁnd false positives, with false positive rates (FPR) as

Published as a conference paper at ICLR 2023

Table 1: Results on models pre-trained by Wong et al. (2018) for ℓ (top) and ℓ2 (bottom) robustness. CRA: % of test set where model is certiﬁed robust and accurate. Acc: % of test set where model is accurate. FPR: among all test inputs where cascade is certiﬁed robust and accurate, % of inputs for which an adversarial example is found within the ϵ-ball using our ensemble attack (i.e., false positive rate). ERA: % of test set where the cascading is empirically robust (i.e., our attack failed) and accurate (ERA of a single model is always greater or equal to its CRA because of soundness and therefore not included). The unsoundness of cascade certiﬁcation is shown by the high false positive rates (FPR).

ℓ Single Model Cascading Ensemble Dataset Model ϵ CRA(%) Acc(%) unsound CRA(%) FPR(%) Acc(%) ERA(%)

MNIST Small, Exact 0.1 95.54 98.96 96.33 88.71 96.62 11.17 MNIST Small 0.1 94.94 98.79 96.07 81.93 96.24 17.51 MNIST Large 0.1 95.55 98.81 96.27 86.37 96.42 13.27

MNIST Small 0.3 56.21 85.23 65.41 88.87 65.80 7.67 MNIST Large 0.3 58.02 88.84 65.50 85.27 65.50 9.65

CIFAR10 Small 2/255 46.43 60.86 56.65 11.51 56.65 50.13 CIFAR10 Large 2/255 52.65 67.70 64.87 10.47 65.13 58.15

CIFAR10 Small 8/255 20.58 27.60 28.32 16.00 28.32 23.79 CIFAR10 Large 8/255 16.04 19.01 20.83 17.67 20.83 17.15

ℓ2 Single Model Cascading Dataset Model ϵ CRA(%) Acc(%) unsound CRA(%) FPR(%) Acc(%) ERA(%)

MNIST Small, Exact 1.58 43.52 88.14 75.58 44.72 80.43 43.46 MNIST Small 1.58 43.34 87.73 74.66 40.93 79.07 45.73 MNIST Large 1.58 43.96 88.39 74.50 51.95 74.99 35.81

CIFAR10 Small 36/255 46.05 54.39 49.89 3.61 51.37 49.27 CIFAR10 Large 36/255 50.26 60.14 58.72 2.70 58.76 57.17 CIFAR10 Resnet 36/255 51.65 60.70 58.65 3.41 58.69 56.68

Table 2: Run time and peak memory usage of Cas A. Results are reported on one Titan RTX.

Dataset Model # of Models ϵ Objective unsound CRA(%) ERA(%) Batch Size Mem./Batch (MB) Time/Batch (Min)

MNIST Small 7 0.3 (ℓ ) dual networks 65.41 4.93 32 12565 4.34 MNIST Small 7 0.3 (ℓ ) our surrogates 65.41 7.67 32 4548 0.06

CIFAR10 Small 5 2/255 (ℓ ) dual networks 28.32 23.79 32 10124 2.43 CIFAR10 Small 5 2/255 (ℓ ) our surrogates 28.32 22.92 32 4898 0.06

high as 88.87%. In other words, there always exist test inputs where the ensemble is accurate and declares itself to be certiﬁed robust, but our attack is able to ﬁnd an adversarial example. This result demonstrates that the unsoundness of the cascading ensemble certiﬁcation mechanism is not just a problem in theory but it can be exploited by adversaries in practice. More strikingly, the empirical robust accuracy (ERA) of the ensemble is often signiﬁcantly lower than the certiﬁed robust accuracy (CRA) of the best constituent model. Since the ERA of a model is an upper-bound of its CRA, the actual CRA of the ensemble can be no larger than the reported ERA. This result shows that the use of a cascading ensemble can actually hurt the robustness instead of improving it. Note that there are small differences between the Acc and CRA numbers reported in Table 1 and those in (Wong et al., 2018). Though we use the evaluation code and pre-trained models made available by Wong et al. (2018), the hardware and Py Torch versions we use in our experiments are different.

Attack Efﬁciency. In Table 2, we compare the attack results of Cas A using the original objectives, i.e. dual networks (Wong et al., 2018), and using surrogate replacements. Because the ensemble on MNIST contains more constituent models, it uses more memory with dual networks compared to CIFAR10. Our report of run time and memory usage shows that using surrogate replacements allows us to run attacks with larger batch size, less memory and time to reach the same level of performance.

5 A QUERY-ACCESS, SOUNDNESS-PRESERVING ENSEMBLER

We present a query-access, soundness-preserving ensembler based on weighted voting in this section. Voting is a natural choice for the design of a query-access ensembler but ensuring that the ensembler is soundness-preserving can be subtle. Section 5.1 deﬁnes our ensembler and proves that it is soundnesspreserving. In Section 5.2, we present a thought experiment demonstrating that it is possible for a voting

Published as a conference paper at ICLR 2023

ensemble to signiﬁcantly improve upon the CRA of its constituent models. Appendix D describes our algorithm for learning the weights to be used in weighted voting, and Appendix E presents initial empirical results with the voting ensemble. Our results show that improving the CRA via a voting ensemble can be difﬁcult on realistic datasets since it requires the ensemble to demonstrate a suitable balance between diversity and similarity of its constituents, but we believe that this is a fruitful direction for future research.

5.1 WEIGHTED VOTING ENSEMBLE

Voting ensembles run a vote amongst their constituent classiﬁers to make a prediction. In the simplest case, each constituent has a single vote that gets assigned to their predicted label. The label with the maximum number of votes is chosen as the ensemble s prediction. More generally, weighted voting allows a single constituent to be allocated more than one vote. The decimal number of votes allocated to each constituent is referred to as its weight. For simplicity, we assume that weights of the constituents in an ensemble are normalized so that they sum up to 1. We use weighted voting to not only choose the ensemble s prediction but to also decide its certiﬁcation outcome. The interaction between voting and certiﬁcation is subtle and needs careful design to ensure that the certiﬁcation procedure is sound.

Extra Notations. Let vw x (j,c) denote the total number of votes allocated to certiﬁable classiﬁers, F (i), in the ensemble that output (j,c). More formally, for an input x, label j Y, certiﬁcation outcome c {0,1},weight w [0,1]N, and a set of constituent certiﬁable classiﬁers, F (0),..., F (N 1), let vw x (j,c) := PN 1 i=0 wi 1[ F (i)(x) = (j,c)]. We ﬁnd it useful to use vw x ( ,c) to denote the number of votes for any class with certiﬁcate, c; i.e., vw x ( ,c)=P

j Yvw x (j,x). Likewise, we will use vw x (j) to denote the number of votes for class j regardless of certiﬁcate; i.e., vw x (j)=vw x (j,0)+vw x (j,1).

Deﬁnition 5.1 (Weighted Voting Ensembler). Let F (1),..., F (N) be N certiﬁable classiﬁers. A weighted voting ensembler, Ew V : FN F is deﬁned as follows F(x):=Ew V ( F (0), F (1),..., F (N 1))(x):= Flabel(x), Fcert(x) , Flabel(x):=argmax j

n vw x (j) o

where Fcert(x):= 1 if j = Flabel(x) . vw x ( Flabel(x),1)>vw x ( ,0)+vw x (j,1) 0 otherwise

The prediction of the weighted voting ensemble is the label receiving the maximum number of votes regardless of the certiﬁcate. However, for the certiﬁcation outcome, the ensemble has to consider the certiﬁcates of the constituent models. The ensemble should be certiﬁed robust only if its prediction outcome, i.e., the label receiving the maximum number of votes (regardless of the certiﬁcate), can be guaranteed to not change in an ϵ-ball. The condition under which Fcert(x)=1 ensures this is the case, and allows us to prove that weighted voting ensemblers are soundness-preserving (Theorem 5.2). A key observation underlying the condition is that only constituent classiﬁers that are not certiﬁed robust at the current input can change their predicted label in the ϵ-ball, and, in the worst case, transfer all their votes (vw x ( ,0)) to the label with the second highest number of votes at x. We believe that our proof of soundness-preservation is of independent interest. We also note that weighted-voting ensemblers are query-access (formal proofs in Appendix A).

Theorem 5.2. The weighted voting ensembler Ew V is soundness-preserving.

Deﬁnition 5.3 (Uniform Voting Ensembler). Let F (0),..., F (N 1) be N certiﬁable classiﬁers. The uniform voting ensembler, EU : FN F is a weighted voting ensembler that assigns equal weights to each classiﬁer, i.e. EU =Ew V where i {0,...,N 1}.wi =1/N.

5.2 EFFECTIVENESS OF VOTING: A THOUGHT EXPERIMENT

Voting ensembles require the constituents to strike the right balance between diversity and similarity to be effective. In other words, while the constituents should be accurate and robust in different regions of the input space (diversity), these regions should also have some overlap (similarity). We conduct a thought experiment using a simple hypothetical example (Example. 5.4) where such a balance is struck. The existence of this example provides evidence and hope that voting ensembles can improve the CRA.We present the example informally here. The detailed, rigorous argument is in Appendix A.

5In case of a tie, we assume that the label corresponding to the logit with the lowest index is returned.

Published as a conference paper at ICLR 2023

Example 5.4. Assume that we have a uniform voting ensemble F with three constituent classiﬁers F (0), F (1), and F (2). Assume that on a given dataset Sk with 100 samples, each of the constituent classiﬁers has CRA equal to 0.5. Let s say that the samples in Sk are ordered such that F (0) is accurate and robust on the ﬁrst 50 samples (i.e., samples 0-49), F (1) is accurate and robust on samples 25-74, and F (2) on samples 0-24 and 50-74. Then, for each of the ﬁrst 75 samples, two out of three constituents in the ensemble are accurate and robust. Therefore, by Def. 5.1, the ensemble F is accurate and robust on samples 0-74, and has CRA equal to 0.75.

6 RELATED WORK

Ensembling is a well-known approach for improving the accuracy of models as long as the constituent models are suitably diverse (Dietterich, 2000). In recent years, with the growing focus on robust accuracy as a metric of model quality, a number of ensembling techniques have been proposed for improving this metric. Depending on whether an ensemble is query-access or not (i.e., does not or does require access to the internal details of the constituent models for prediction and certiﬁcation), it can be classiﬁed as a white-box or a black-box ensemble. The modularity of black-box ensembles is attractive as the constituent classiﬁers can each be from a different model family (i.e., neural networks, decision trees, support vector machines, etc.) and each use a different mechanism for robustness certiﬁcation. The constituents of white-box ensembles, on the other hand, tend to be from the same model family but this provides the beneﬁt of tuning the ensembling strategy to the model family being used.

White-box ensembles. Several works (Yang et al., 2022; Zhang et al., 2019; Liu et al., 2020) present certiﬁable ensembles where the ensemble logits are calculated by averaging the corresponding logits of the constituent classiﬁers. Needing access to the logits of the constituent classiﬁers, and not just their predictions, is one aspect that makes these ensembles white-box. More importantly, the approaches used by these ensembles for local robustness certiﬁcation are also in violation of our deﬁnition of query-access ensembles (Def. 2.5). For instance, randomized smoothing (Cohen et al., 2019) is used in (Yang et al., 2022; Liu et al., 2020) to certify the ensemble, which requires evaluating the constituent models on a large number of inputs for each prediction, and not just one input. Other approaches (Zhang et al., 2019) use interval bound propagation (IBP) (Gowal et al., 2018; Zhang et al., 2018) to certify the ensemble. Calculating the interval bounds requires access to the architecture and weights of each of the constituent models, violating the requirements of a query-access ensemble. A number of white-box ensembling techniques (Pang et al., 2019; Yang et al., 2020; Kariyappa & Qureshi, 2019; Sen et al., 2020; Zhang et al., 2022) only aim to improve empirical robust accuracy, i.e., these ensembles do not provide a robustness certiﬁcate. As before, the ensemble logits are calculated by averaging the corresponding logits of the constituent models. These approaches differ from each other in the training interventions used to promote diversity in the constituent models.

Black-box ensembles. Cascading ensembles (Wong et al., 2018; Blum et al., 2022) are the most popular example of certiﬁably robust black-box ensembles. While Wong et al. (2018) empirically evaluate their cascading ensemble, the results of Blum et al. (2022) are purely theoretical. However, as we show in this work, the certiﬁcation mechanism used by cascading ensembles is unsound. Devvrit et al. (2020); Sen et al. (2020) present a black-box voting ensemble but, unlike our voting ensemble, their ensemble does not provide robustness certiﬁcates. Nevertheless, they are able to show improvements in the empirical robust accuracy with the voting ensemble.

7 CONCLUSION

In this paper, we showed that the local robustness certiﬁcation mechanism used by cascading ensembles is unsound. As a consequence, existing empirical results that report the certiﬁed robust accuracies (CRA) of cascading ensembles are not valid. Guided by our theoretical results, we designed an attack algorithm against cascading ensembles and demonstrated that their unsoundness can be easily exploited in practice. In fact, the performance of the ensembles is markedly worse than their single best constituent model. Finally, we presented an alternate black-box ensembling mechanism based on weighted voting that we prove to be sound, and, via a thought experiment, showed that voting ensembles can signiﬁcantly improve the CRA if the constituent models have the right balance between diversity and similarity.

Published as a conference paper at ICLR 2023

ETHICS STATEMENT

Our work sheds light on existing vulnerabilities in state-of-the-art certiﬁably robust neural classiﬁers. The presented attacks can be used by malicious entities to adversarially attack deployed cascading ensembles of certiﬁably robust models. However, by putting this knowledge out in the public domain and making practitioners aware of the existence of the problem, we hope that precautions can be taken to protect existing systems. Moreover, it highlights the need to harden future systems against such attacks.

REPRODUCIBILITY STATEMENT

To examine our theoretical results, the proof of Theorem 2.8 directly follows the body of the theorem in Section 2.2 while the proof of Theorem 5.2 is delayed to Appendix A and F, together with the proofs of other theorems that only appear in the appendix, i.e. Theorem A.1, A.2 (Appendix A) and Theorem F.2 (Appendix F). All the datasets used in our work are publicly available with links in their corresponding reference. Our experimental code is uploaded in the supplementary material (and also at https://github.com/Trista Chi/ensemble KW) with a detailed README ﬁle and weights of models to reproduce the results in Table 1, 2, 5, 6, 7, and 8. Moreover, hyper-parameters used in these table are also documented in Appendix B and D. The hardware information used in all experiments is reported in Section 4.

ACKNOWLEDGEMENTS

We would like to thank the reviewers for their comments which helped us improve this article. The work described in this paper has been supported by the Software Engineering Institute under its FFRDC Contract No. FA8702-15-D-0002 with the U.S. Department of Defense, DARPA and the Air Force Research Laboratory under agreement number FA8750-15-2-0277, as well as DARPA GARD Contract HR00112020006.

Eric Bauer and Ron Kohavi. An empirical comparison of voting classiﬁcation algorithms: Bagging, boosting, and variants. Machine learning, 36(1):105 139, 1999.

Avrim Blum, Omar Montasser, Greg Shakhnarovich, and Hongyang Zhang. Boosting barely robust learners: A new perspective on adversarial robustness. ar Xiv preprint ar Xiv:2202.05920, 2022.

Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certiﬁed adversarial robustness via randomized smoothing. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 1310 1320. PMLR, 09 15 Jun 2019. URL https://proceedings.mlr.press/v97/cohen19c.html.

Devvrit, Minhao Cheng, Cho-Jui Hsieh, and Inderjit S. Dhillon. Voting based ensemble improves robustness of defensive models. Ar Xiv, abs/2011.14031, 2020.

Thomas G. Dietterich. Ensemble methods in machine learning. In Multiple Classiﬁer Systems, pp. 1 15, Berlin, Heidelberg, 2000. Springer Berlin Heidelberg. ISBN 978-3-540-45014-6.

Aymeric Fromherz, Klas Leino, Matt Fredrikson, Bryan Parno, and Corina P as areanu. Fast geometric projections for local robustness certiﬁcation. In International Conference on Learning Representations (ICLR), 2021.

Justin Gilmer, Nicolas Ford, Nicholas Carlini, and Ekin Cubuk. Adversarial examples are a natural consequence of test error in noise. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 2280 2289. PMLR, 09 15 Jun 2019. URL https://proceedings.mlr.press/v97/gilmer19a.html.

Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training veriﬁably robust models. ar Xiv preprint ar Xiv:1810.12715, 2018.

Published as a conference paper at ICLR 2023

Matt Jordan, Justin Lewis, and Alexandros G Dimakis. Provable certiﬁcates for adversarial examples: Fitting a ball in the union of polytopes. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/ paper/2019/file/ae3f4c649fb55c2ee3ef4d1abdb79ce5-Paper.pdf.

Sanjay Kariyappa and Moinuddin K Qureshi. Improving adversarial robustness of ensembles with diversity training. ar Xiv preprint ar Xiv:1901.09981, 2019.

Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: An efﬁcient smt solver for verifying deep neural networks. In International Conference on Computer Aided Veriﬁcation, pp. 97 117. Springer, 2017.

Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.

Yann Le Cun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278 2324, 1998.

Sungyoon Lee, Jaewook Lee, and Saerom Park. Lipschitz-certiﬁable training with a tight outer bound. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 16891 16902. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper/2020/ file/c46482dd5d39742f0bfd417b492d0e8e-Paper.pdf.

Klas Leino, Zifan Wang, and Matt Fredrikson. Globally-robust neural networks. In International Conference on Machine Learning (ICML), 2021.

Chizhou Liu, Yunzhen Feng, Ranran Wang, and Bin Dong. Enhancing certiﬁed robustness via smoothed weighted ensembling. ar Xiv preprint ar Xiv:2005.09363, 2020.

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.

Tianyu Pang, Kun Xu, Chao Du, Ning Chen, and Jun Zhu. Improving adversarial robustness via promoting ensemble diversity. In International Conference on Machine Learning, pp. 4970 4979. PMLR, 2019.

Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certiﬁed defenses against adversarial examples. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=Bys4ob-Rb.

Sanchari Sen, Balaraman Ravindran, and Anand Raghunathan. Empir: Ensembles of mixed precision deep networks for increased robustness against adversarial attacks. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=HJem3y HKw H.

Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. An abstract domain for certifying neural networks. Proc. ACM Program. Lang., 3(POPL), January 2019a.

Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. Robustness certiﬁcation with reﬁnement. In International Conference on Learning Representations, 2019b.

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In Yoshua Bengio and Yann Le Cun (eds.), 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 1416, 2014, Conference Track Proceedings, 2014. URL http://arxiv.org/abs/1312.6199.

Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certiﬁed robustness for relu networks. In International Conference on Machine Learning, pp. 5276 5285. PMLR, 2018.

Eric Wong and J Zico Kolter. URL https://github.com/locuslab/convex_ adversarial.

Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, pp. 5286 5295. PMLR, 2018.

Published as a conference paper at ICLR 2023

Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J Zico Kolter. Scaling provable adversarial defenses. Advances in Neural Information Processing Systems, 31, 2018.

Huanrui Yang, Jingyang Zhang, Hongliang Dong, Nathan Inkawhich, Andrew Gardner, Andrew Touchet, Wesley Wilkes, Heath Berry, and Hai Li. Dverge: diversifying vulnerabilities for enhanced robust generation of ensembles. Advances in Neural Information Processing Systems, 33:5505 5515, 2020.

Zhuolin Yang, Linyi Li, Xiaojun Xu, Bhavya Kailkhura, Tao Xie, and Bo Li. On the certiﬁed robustness for ensemble models and beyond. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=t Ua4REj Gj Tf.

Dinghuai Zhang, Hongyang Zhang, Aaron Courville, Yoshua Bengio, Pradeep Ravikumar, and Arun Sai Suggala. Building robust ensembles via margin boosting. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato (eds.), Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pp. 26669 26692. PMLR, 17 23 Jul 2022. URL https://proceedings.mlr.press/v162/zhang22aj.html.

Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efﬁcient neural network robustness certiﬁcation with general activation functions. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS 18, pp. 4944 4953, Red Hook, NY, USA, 2018. Curran Associates Inc.

Huan Zhang, Minhao Cheng, and Cho-Jui Hsieh. Enhancing certiﬁable robustness via a deep model ensemble. ar Xiv preprint ar Xiv:1910.14655, 2019.

Published as a conference paper at ICLR 2023

Theorem 5.2. The weighted voting ensembler Ew V is soundness-preserving.

Proof. Let F (0), ..., F (N 1) be N certiﬁable classiﬁers, which we assume are sound. Let F :=Ew V ( F (0),..., F (N 1)); i.e., F is given by Deﬁnition 5.1.

Assume for the sake of contradiction x,x s.t. ||x x || ϵ, F(x) = (j1,1), and Flabel(x ) = j2 where j2 =j1.

Since F(x) = (j1,1), by Deﬁnition 5.1., j = j1, vw x (j1,1) > vw x ( ,0) + vw x (j,1), and thus, in particular, Equation 5 holds. vw x (j1,1)>vw x ( ,0)+vw x (j2,1) (5)

Consider the votes on x . The models that contribute to vw x (j1,1) are all locally robust6 at x, so each of these models must output the label j1 on x , which is at distance no greater than ϵ from x; thus Equation 6 holds. vw x (j1) vw x (j1,1) (6)

Conversely, only points that are non-robust at x can change labels on x , thus we obtain Equation 7. vw x (j2) vw x ( ,0)+vw x (j2,1) (7)

Putting things together we have vw x (j1) vw x (j1,1) by (6) >vw x ( ,0)+vw x (j2,1) by (5) vw x (j2) by (7)

Thus, since vw x (j1)>vw x (j2), Flabel(x ) cannot be j2 .

Example 5.4. We want to show that, F (0), F (1), ... , F (N 1) F, w [0, 1]N such that for F :=Ew C ( F (0), F (1),..., F (N 1)),

Sk Rd Y. i {0,...,N 1}. CRA( F,Sk)>CRA( F (i),Sk)

Consider a weighted voting ensemble F constituted of certiﬁable classiﬁers F (0), F (1), and F (2)

with weights w=( 1

3), i.e., F is a uniform voting ensemble.

Suppose k = 100, i.e., Sk is a dataset with 100 samples. Moreover, lets say that CRA( F (0), Sk) = CRA( F (1), Sk) = CRA( F (2), Sk) = 0.5. Also, suppose that the samples in Sk are arranged in a ﬁxed sequence Sk =(x0,y0),...,(xk 1,yk 1) such that, i [0,49]. F (0)(xi)=(yi,1) (8)

i [25,74]. F (1)(xi)=(yi,1) (9)

i [0,24] [50,74]. F (2)(xi)=(yi,1) (10) where [i,j] is the set of integers from i to j, i and j included. 8, 9, and 10 are consistent with the fact that certiﬁed robust accuracy of each model is 0.5.

From 8, 9, 10, and Deﬁnition 5.1, i [0,74]. Flabel(xi)=argmax j

n vw xi(j) o =yi (11)

i [0,74]. j =yi . vw xi(yi,1)>vw xi( ,0)+vw xi(j,1) (12)

From 12 and Deﬁnition 5.1, i [0,74]. Fcert(xi)=1 (13)

From 11, 13, and Deﬁnition 2.2, CRA( F,Sk)=0.75 (14)

Theorem A.1. The cascading ensembler EC is query-access.

6The models are known to be locally robust because they are sound and their output matches ( ,1).

Published as a conference paper at ICLR 2023

Table 3: Hyper-parameters Used for Cas A in Table 1

Dataset Norm ϵ Normalization Max Steps Step Size

MNIST ℓ 0.1 [0,1] 100 0.004 MNIST ℓ 0.3 [0,1] 100 0.012 MNIST ℓ2 1.58 [0,1] 100 0.03

CIFAR10 ℓ 2/255 µ=[0.485, 0.456, 0.406], σ=0.225 100 0.0003 CIFAR10 ℓ 8/255 µ=[0.485, 0.456, 0.406], σ=0.225 100 0.00124 CIFAR10 ℓ2 36/255 µ=[0.485, 0.456, 0.406], σ=0.225 100 0.0003

Proof. Let F :=EC( F (0), F (1),..., F (N 1)). Let g(0),g(1),...,g(N 1) Y {0,1}. We use g(j) 2 to refer to the second element in the pair g(j). Deﬁne G as follows,

G(g(0),g(1),...,g(N 1)):= g(j) if j N 1. c(j)=1 g(N 1) otherwise

where c(j):=1 if (g(j) 2 =1) ( i<j,g(i) 2 =0) and 0 otherwise.

Then, by Def. 2.7, F(x) = G( F (0)(x), F (1)(x),..., F (N 1)(x)). Then, by Def. 2.5, cascading ensembler is query-access.

Theorem A.2. The weighted voting ensembler Ew V is query-access.

Proof. Let F :=Ew V ( F (0), F (1),..., F (N 1)). Let g(0),g(1),...,g(N 1) Y {0,1}. We use g to refer to the set {g(0),g(1),...,g(N 1)}. Deﬁne G as follows, G(g(0),g(1),...,g(N 1)):= G1(g(0),g(1),...,g(N 1)),G2(g(0),g(1),...,g(N 1))) where ˆj =G1(g(0),g(1),...,g(N 1)):=argmax j

n vw g (j, ) o ,

G2(g(0),g(1),...,g(N 1)):= 1 if j =ˆj . vw g (ˆj,1)>vw g ( ,0)+vw g (j,1) 0 otherwise and

vw g (j,c)=

i=0 wi 1 h g(i) =(j,c) i

Then, by Def. 5.1, F(x) = G( F (0)(x), F (1)(x),..., F (N 1)(x)). Then, by Def. 2.5, weighted voting ensembler is query-access.

B HYPER-PARAMETERS OF TABLE 1

In Table 3, we report hyper-parameters used to run Cas A to reach the statistics reported in Table 1. Notice that if a normalization is µ=[0.485, 0.456, 0.406], σ =0.225 , we divide the ϵ and step size by σ during the experiment. We use SGD as the optimizer for all experiments.

C ATTACKING NON-SEQUENTIALLY TRAINED CASCADING ENSEMBLES

Wong et al. (2018) train cascading ensembles in a sequential manner, i.e., each model in the sequence is only trained on those training samples that could not be certiﬁed robust by any of the previous models. The training algorithm is described by Wong et al. (2018) in appendix C (Algorithm 2) of their paper. We evaluate the efﬁcacy of our attack algorithm (Cas A) on cascade ensembles trained in a non-sequential manner. That is, each constituent model is trained independently on the entire train dataset, and each constituent only differs due to the randomness of initialization and of stochastic gradient descent during training. Besides this difference, the code, architecture, hyperparameters, and data used for training are the same as that used by Wong et al. (2018). For every combination of dataset, architecture, and ϵ value, we train three constituent models, and use them to construct non-sequentially trained cascading ensembles.

Table 4 shows the results of running our attack on such cascades with non-sequentially trained constituents. We make the following observations:

Published as a conference paper at ICLR 2023

Table 4: Attack results on non-sequentially trained cascade ensembles.

Dataset Model ℓp ϵ unsound CRA(%) FPR(%) Acc(%) ERA(%)

MNIST Small,Exact ℓ 0.1 97.07 0.32 99.14 98.1 MNIST Small ℓ 0.1 96.25 0.27 98.69 97.4 MNIST Small ℓ 0.3 66.09 3.60 84.77 72.97 CIFAR10 Small ℓ 2/255 55.53 6.43 62.05 54.59 CIFAR10 Small ℓ 8/255 25.3 9.05 27.44 23.35

MNIST Small,Exact ℓ2 1.58 47.42 0.0002 87.78 73.56 MNIST Small ℓ2 1.58 47.4 0.0 87.5 72.93 CIFAR10 Small ℓ2 36/255 52.33 3.29 55.88 53.71

The non-sequentially trained ensembles continue to be unsound and our attack is able to ﬁnd adversarial inputs (demonstrated by non-zero FPR). The success rate of our attack is much lower than on the sequentially-trained models shared by Wong et al. (2018). The unsound CRA of these ensembles is comparable to that of the sequentially-trained models.

We hypothesize that our attack demonstrates much higher success rates on sequentially-trained models because, when trained sequentially, it is likely that the later models in the cascade are degenerate, i.e., are very robust but with low accuracy (similar to constant functions). Then, to attack the ensemble, we just need to ﬁnd an attack input where the initial models cannot be certiﬁed, since the remaining degenerate models are typically robust and inaccurate.

The degeneracy of the later models in the sequentially-trained ensemble may also explain why the unsound CRAs of the two kinds of ensembles are comparable. Models are trained in a sequential manner to enhance their "diversity". However, due to the degeneracy of the later models, the sequentially-trained ensembles likely end up being about only as diverse as the non-sequentially trained ensembles.

Finally, we note that the sequential style of training cascade ensembles is quite natural. In fact, both, Wong et al. (2018) and Blum et al. (2022) train models in a sequential manner. But these results suggest that sequentially training may make it easier to exploit the unsoundness of cascading ensembles.

D WEIGHTED VOTING ENSEMBLE: LEARNING WEIGHTS

The weights w in Ew V determines the importance of each constituent classiﬁer in the ensemble. Given a set of k labeled inputs, Sk (e.g. the training set), we would like to learn the optimal weights w that maximize the ensembler s CRA (Def. 2.2) over Sk. When Sk resembles the true distribution of the test points, the learned w is expected to be close to the optimal weights that maximizes the CRA of the test set. Weight optimization over Sk naturally takes the following form.

max w [0,1]N 1 k

(xi,yi) Sk 1 h Ew V ( F (0),..., F (N 1))(xi)=(yi,1) i (15)

For the indicator to output 1, it is required that the margin of votes be greater than 0, i.e. w xi(yi) := vw xi(yi,1) vw xi( ,0) maxj =yi{vw xi(j,1)} > 0. Namely, the votes for the class yi, i.e. vw xi(yi,1), must be greater than the votes for all other classes i.e. maxj =yi{vw xi(j,1)} plus the votes for non-robust predictions vw xi( ,0) as discussed in Def. 5.1. Eq.(15) then becomes:

max w [0,1]N 1 k

(xi,yi) Sk 1 w xi(yi)>0 (16)

The indicator function is not differentiable so we replace it with a differentiable and monotonically increasing function s, which leads to Eq. 17.

max w [0,1]N 1 k

(xi,yi) Sk s( w xi(yi)) (17)

In this paper, we choose s to be the sigmoid function σt where t is the temperature only for negative inputs, i.e., σt(x) := σ(x) if x > 0 and σ(x/t) otherwise, where σ is the standard sigmoid function. Sigmoid is non-negative so margins with opposite signs do not cancel, and it also avoids biasing training towards producing larger margins on a small number of points. Indeed, vanishing gradients are

Published as a conference paper at ICLR 2023

Table 5: Results on models pre-trained by Wong et al. (2018) for ℓ robustness.

Single Model Cascading Uniform Voting Weighted Voting Dataset Model ϵ CRA(%) Acc(%) unsound CRA(%) Acc(%) ERA(%) CRA(%) Acc(%) CRA(%) Acc(%)

MNIST Small, Exact 0.1 95.54 98.96 96.33 96.62 11.17 0.01 61.68 95.54 61.68 MNIST Small 0.1 94.94 98.79 96.07 96.24 17.51 9.29 65.85 94.94 98.79 MNIST Large 0.1 95.55 98.81 96.27 96.42 13.27 10.12 63.89 95.55 98.81

MNIST Small 0.3 56.21 85.23 65.41 65.80 7.67 11.48 56.46 56.21 85.23 MNIST Large 0.3 58.02 88.84 65.50 65.50 9.65 26.95 65.97 58.02 88.84

CIFAR10 Small 2/255 46.43 60.86 56.65 56.65 50.13 18.58 40.88 46.43 60.86 CIFAR10 Large 2/255 52.65 67.70 64.88 65.14 58.15 18.07 48.92 52.65 67.70

CIFAR10 Small 8/255 20.58 27.60 28.32 28.32 23.79 10.78 24.11 19.00 23.78 CIFAR10 Large 8/255 16.04 19.01 20.83 20.83 17.15 5.18 21.01 16.04 19.01

Table 6: Results on models pre-trained by Wong et al. (2018) for ℓ2 robustness.

Single Model Cascading Uniform Voting Weighted Voting Dataset Model ϵ CRA(%) Acc(%) unsound CRA(%) Acc(%) ERA(%) CRA(%) Acc(%) CRA(%) Acc(%)

MNIST Small, Exact 1.58 43.52 88.14 75.58 80.43 43.46 6.42 74.25 43.52 88.14 MNIST Small 1.58 43.34 87.73 74.66 79.07 45.73 6.74 74.24 43.34 87.73 MNIST Large 1.58 43.96 88.39 74.50 74.99 35.81 6.35 65.99 43.96 88.39

CIFAR10 Small 36/255 46.05 54.39 49.89 51.37 49.27 11.47 38.74 46.05 54.39 CIFAR10 Large 36/255 50.26 60.14 58.72 58.76 57.17 10.56 39.74 50.26 60.14 CIFAR10 Resnet 36/255 51.65 60.7 58.65 58.69 56.28 22.74 46.71 51.65 60.7

useful on points around large positive margins, so the temperature is only applied on negative inputs. This leads us to Eq. 18, the optimization objective we solve for optimal weights w .

w =argmax w [0,1]N 1 k

(xi,yi) Sk σt( w xi(yi)) (18)

E WEIGHTED VOTING ENSEMBLE: EMPIRICAL RESULTS

The goal of these experiments is to evaluate the efﬁcacy of our sound voting ensemble. For our experiments, we use the pre-trained ensemble constituent models made available by Wong et al. (2018) to construct three kinds of ensembles, namely, cascading ensembles, uniform voting ensembles, and weighted voting ensembles. The weights for the weighted voting ensemble are learned in the manner described in Appendix D. We report certiﬁed robust accuracy (CRA) and standard accuracy (Acc) for each ensemble as well as for the best constituent model. Note that all these ensembles are query-access but only the uniform voting and weighted voting ensembles are soundness-preserving. Consequently, the CRA reported for the cascading ensemble grossly overestimates the actual CRA as demonstrated by our attack results. We always set the temperature to 1e5 and learning rate to 1e-2 when learning the weights as described in Appendix D.

Table 5 shows the results for ℓ robustness. Each row in the table represents a speciﬁc combination of dataset (MNIST or CIFAR-10), architecture (Small or Large convolutional networks or Resnet), and ϵ value used for local robustness certiﬁcation. Table 6 shows the results for ℓ2 robustness using constituent models pre-trained by Wong et al. (2018).

Summary of Results. We see from Tables 5 and 6 that while the cascading ensemble appears to improve upon the CRA of the single best model in the ensemble, these numbers are misleading due to the unsoundness of the certiﬁcation mechanism. The CRA for the uniform voting and weighted ensembles are consistently lower than that reported by the cascading ensemble, and in many cases, signiﬁcantly so. Uniform voting ensembles stand-out for their low CRA but there is a simple explanation for these results. The constituent models are trained by Wong et al. (2018) in a cascading manner, i.e., later constituent models are trained on only those points that cannot be certiﬁed by the previous models. This strategy causes the subset of inputs labeled correct and certiﬁably robust by each constituent model to have minimal overlap. However, voting ensembles need these input subsets to strike the right balance between diversity and overlap for improving the CRA .

Another interesting observation is that, in most cases, the CRA of the weighted voting ensemble and the single best constituent model are the same. This is again a consequence of the cascaded manner in which the constituent models are trained. The ﬁrst model in the cascade typically vastly outperforms

Published as a conference paper at ICLR 2023

Table 7: Results on non-sequentially trained models for ℓ robustness.

Single Model Cascading Uniform Voting Weighted Voting Dataset Model ϵ CRA(%) Acc(%) unsound CRA(%) Acc(%) ERA(%) CRA(%) Acc(%) CRA(%) Acc(%)

MNIST Small, Exact 0.1 95.61 99.02 97.07 99.14 98.1 95.56 99.16 95.54 98.96 MNIST Small 0.1 94.94 98.79 96.25 98.69 97.4 94.46 98.78 94.94 98.79

MNIST Small 0.3 56.21 85.23 66.09 84.77 72.97 55.24 85.02 56.21 85.23

CIFAR10 Small 2/255 46.43 60.86 55.49 62.06 54.59 43.48 62.79 42.35 61.09

CIFAR10 Small 8/255 21.04 28.29 25.11 27.73 23.35 20.44 28.32 21.04 28.29

Table 8: Results on non-sequentially trained models for ℓ2 robustness.

Single Model Cascading Uniform Voting Weighted Voting Dataset Model ϵ CRA(%) Acc(%) unsound CRA(%) Acc(%) ERA(%) CRA(%) Acc(%) CRA(%) Acc(%)

MNIST Small, Exact 1.58 43.52 88.14 47.42 87.78 73.56 42.71 88.19 43.52 88.14 MNIST Small 1.58 43.34 87.73 47.40 87.50 72.93 42.87 87.99 43.34 87.73 CIFAR10 Small 36/255 46.05 54.39 52.33 55.88 53.71 37.07 57.40 42.12 54.65

the subsequent models. Moreover, as already mentioned, the constituent models have almost no overlap in the input regions where they perform well, and their presence only ends up harming the performance of the voting ensemble. As a consequence, the optimal normalized weights, learned by solving the optimization problem described in Appendix D, typically assign all the mass to the ﬁrst model. The detailed weights for each of the weighted voting ensemble are given in Tables 9, 10, 11, and 12.

These results suggest two takeaway messages. First, the cascaded strategy of Wong et al. (2018) for training constituent models is in conﬂict with the requirement that constituent models overlap in their behavior for voting ensembles to be effective. This gives up hope that if the constituent models are suitably trained, voting ensembles can improve the CRA. We leave this exploration for future work. Second, even if the constituent models do not show the right balance between diversity and similarity, our weight learning procedure ensures that the performance of the weighted voting ensemble is no worse than the single best constituent model. Ideally, we would like the weights to be equally distributed since this conveys that every constituent in the ensemble has something to contribute. But, in the worst case, the weights play the role of a model selection procedure, assigning zero weights to constituent models that do not contribute to the ensemble.

Non-Sequential Training. We conduct another set of experiments where instead of using the constituent models pre-trained by Wong et al. (2018), we train them ourselves in a non-sequential manner. That is, each constituent model is trained on the entire train dataset, and each constituent only differs due to the randomness of initialization and of stochastic gradient descent during training. Besides this difference, the code, architecture, hyperparameters, and data used for training are the same as that used by Wong et al. (2018). For every combination of dataset, architecture, and ϵ value, we train three constituent models, and use them to construct cascading, uniform voting, and weighted voting ensembles.

Table 7 shows the results for ℓ robustness using non-sequentially trained constituent model ands Table 8 shows the results for ℓ2 robustness. We observe that, for non-sequentially trained models, the CRA of uniform voting and weighted voting ensembles are comparable, and similar to the CRA of the single best constituent model in the ensemble. In this case, the constituent models have too much overlap and almost no diversity. These results reafﬁrm our observation that voting ensembles require a balance between diversity and similarity to be effective.

F AN ALTERNATE FORMULATION OF UNIFORM VOTING ENSEMBLER

Deﬁnition F.1 (Permutation-based Cascading Ensembler). Let F (0), F (1), ... , F (N 1) be N certiﬁable classiﬁers and N is odd. Suppose Π is the set of all permutations of {0,1,...,N 1}. A permutation-based cascading ensembler EP : FN F is deﬁned as follows

EP ( F (0), F (1),..., F (N 1))(x):=

F (π0)(x) if π Π. c2(π)=1 ( F (π0) label (x),0) if π Π. c2(π )=1 π Π. c1(π)=1 ( ,0) otherwise

Published as a conference paper at ICLR 2023

Table 9: Learned weights for weighted voting ensemble with models pre-trained by Wong et al. (2018) for ℓ robustness.

number of Dataset Model ϵ models weights

MNIST Small, Exact 0.1 6 [0.996, 0.003, 0.000, 0.001, 0.000, 0.000] MNIST Small 0.1 7 [0.996, 0.000, 0.000, 0.003, 0.000, 0.000, 0.001] MNIST Large 0.1 5 [0.996, 0.002, 0.001, 0.000, 0.001]

MNIST Small 0.3 3 [0.995, 0.003, 0.002] MNIST Large 0.3 3 [0.948, 0.008, 0.044]

CIFAR10 Small 2/255 5 [0.995, 0.002, 0.001, 0.001, 0.001] CIFAR10 Large 2/255 4 [0.994, 0.003, 0.001, 0.002]

CIFAR10 Small 8/255 3 [0.003, 0.995, 0.002] CIFAR10 Large 8/255 3 [0.995, 0.002, 0.003]

Table 10: Learned weights for weighted voting ensembles with models pre-trained by Wong et al. (2018) for ℓ2 robustness.

number of Dataset Model ϵ models weights

MNIST Small, Exact 1.58 6 [0.995, 0.001, 0.001, 0.000, 0.003, 0.001] MNIST Small 1.58 6 [0.995, 0.001, 0.002, 0.001, 0.001, 0.000] MNIST Large 1.58 6 [0.996, 0.000, 0.001, 0.002, 0.000, 0.001]

CIFAR10 Small 36/255 2 [0.994, 0.006] CIFAR10 Large 36/255 6 [0.994, 0.002, 0.001, 0.001, 0.001, 0.001] CIFAR10 Resnet 36/255 4 [0.994, 0.004, 0.001, 0.001]

Table 11: Learned weights for weighted voting ensemble with non-sequentially trained models for ℓ robustness.

number of Dataset Model ϵ models weights

MNIST Small, Exact 0.1 3 [0.710, 0.131, 0.159] MNIST Small 0.1 3 [0.694, 0.154, 0.152]

MNIST Small 0.3 3 [0.908, 0.061, 0.031]

CIFAR10 Small 2/255 3 [0.011, 0.956, 0.0323]

CIFAR10 Small 8/255 3 [0.042, 0.087, 0.871]

Table 12: Learned weights for weighted voting ensembles with non-sequentially trained models for ℓ2 robustness.

number of Dataset Model ϵ models weights

MNIST Small, Exact 1.58 3 [0.695, 0.159, 0.146] MNIST Small 1.58 3 [0.660, 0.196, 0.144] CIFAR10 Small 36/255 3 [0.003, 0.002, 0.995]

Published as a conference paper at ICLR 2023

where is a random label selected from Y7 ,π0 refers to the ﬁrst element of the permutation π,

( 1 if j. ( N+1

2 j N 1) ( i<j. F (πi) label(x)= F (πj) label (x)) 0 otherwise (19)

( 1 if j. ( N+1

2 j N 1) ( i<j. F (πi)(x)= F (πj)(x)) ( F (πj) cert (x)=1) 0 otherwise (20)

and i,j {0,1,...,N 1}. Theorem F.2. The permutation-based cascading ensembler EP is a soundness-preserving ensembler.

Proof. Let F :=EP ( F (0), F (1),..., F (N 1)). For F we want to show that, x Rd, Fcert(x)=1= Flabel is ϵ-locally robust at x. (21)

W.L.O.G suppose Flabel(x)=y. If F(x)=(y,1), let us assume that π is the permutation such that c2(π)=1. Let k be the integer that makes c2(π)=1 to be true. Thus

F(x)=(y,1)= (N +1

2 k N 1) ( i<k, F (πi)(x)= F (πk)(x)=(y,1)) (22)

By our assumptions that F (0), F (1),..., F (N 1) are sound, which are invariant to the permutation of these models. Therefore, by Def. 2.3, i k,

F (πi)(x)=(y,1)= ( x . ||x x|| ϵ= F (πi) label(x )= F (πi) label(x)=y) (23)

Eq. (23) implies that x s.t. ||x x|| ϵ, the following statement is true

2 k N 1) ( i k, F (πi) label(x)= F (πk) label (x )=y) (24)

Plug the condition (24) into Def. F.1, we ﬁnd that c1(π) = 1 for x . Moreover, there cannot be a permutation π such that c1(π ) = c2(π ) = 1 F (π 0) label = y since k N+1

2 . Therefore, Flabel(x )= F (π0) label (x )=y, and we arrive at the following statement, x . ||x x|| ϵ= Flabel(x )= Flabel(x)=y (25) which completes the proof for the soundness of Flabel at any x.

7One can also return the plurality prediction of all models for the consideration of clean accuracy but the choice of will not change the relevant theorems.