# Random Noise Defense Against Query-Based Black-Box Attacks

Zeyu Qin1, Yanbo Fan2, Hongyuan Zha1, Baoyuan Wu1
1 School of Data Science, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen
2 Tencent AI Lab
zeyuqin@link.cuhk.edu.cn, fanyanbo0124@gmail.com, zhahy@cuhk.edu.cn, wubaoyuan@cuhk.edu.cn

35th Conference on Neural Information Processing Systems (NeurIPS 2021).

Query-based black-box attacks pose serious threats to machine learning models in many real applications. In this work, we study a lightweight defense method, dubbed Random Noise Defense (RND), which adds proper Gaussian noise to each query. We conduct a theoretical analysis of the effectiveness of RND against query-based black-box attacks and the corresponding adaptive attacks. Our theoretical results reveal that the defense performance of RND is determined by the magnitude ratio between the noise induced by RND and the noise added by the attackers for gradient estimation or local search. A large magnitude ratio leads to stronger defense performance of RND, and it is also critical for mitigating adaptive attacks. Based on our analysis, we further propose to combine RND with a plausible Gaussian augmentation fine-tuning (RND-GF). It enables RND to add larger noise to each query while maintaining clean accuracy, yielding a better trade-off between clean accuracy and defense performance. Additionally, RND can be flexibly combined with existing defense methods, such as adversarial training (AT), to further boost adversarial robustness. Extensive experiments on CIFAR-10 and ImageNet verify our theoretical findings and the effectiveness of RND and RND-GF.

1 Introduction

Deep neural networks (DNNs) have been successfully applied in many safety-critical tasks, such as autonomous driving, face recognition and verification, etc. However, it has been shown that DNN models are vulnerable to adversarial examples [18, 21, 26, 29, 48], which are indistinguishable from natural examples but make a model produce erroneous predictions. For real-world applications, the DNN model, as well as the training dataset, is often hidden from users. Instead, only the model feedback for each query (e.g., labels or confidence scores) is accessible. In this case, product providers mainly face severe threats from query-based black-box attacks, which do not require any knowledge about the attacked models. In this work, we focus on efficient defense techniques against query-based black-box attacks, whose main challenges are: 1) the defender should not significantly influence the model's feedback to normal queries, but it is difficult to know whether a query is normal or malicious; 2) the defender has no information about what kind of black-box attack strategy is adopted by the attacker.

Considerable efforts have been devoted to improving the adversarial robustness of DNNs [13, 34, 49, 50]. Among them, adversarial training (AT) is considered the most effective defense technique [3, 49]. However, the improved robustness from AT is often accompanied by a significant degradation of clean accuracy. Besides, the training cost of AT is much higher than that of standard training, and it also often suffers from poor generalization to new samples or adversarial attacks [20, 39, 45, 46]. Thus, we argue that AT-based defense is not a very suitable choice for black-box defense.
In contrast, we expect that a good defense technique should satisfy the following requirements: well-preserved clean accuracy, lightweight, and plug-and-play. To this end, we study a lightweight defense strategy, dubbed Random Noise Defense (RND), against query-based black-box attacks. For query-based attacks [1, 2, 7, 9, 11, 19, 23, 26, 27, 32, 35], the core is to find an attack direction by gradient estimation or random search based on the exact feedback of consecutive queries, which leads to a decrease of the designed objective. RND is realized by adding random noise to each query at inference time. Therefore, the returned feedback carries randomness that results in poor gradient estimation or random search and slows down the attack process.

To better understand the effectiveness of RND, we provide a theoretical analysis of the defense performance of RND against query-based attacks and adaptive attacks. Our theoretical results reveal that the defense performance of RND is determined by the magnitude ratio between the noise induced by RND and the noise added by the attackers for gradient estimation or random search. A large magnitude ratio significantly reduces attack efficiency: the attackers need more queries to find adversarial examples or fail to find a successful attack under a limited query budget. That is, a larger ratio leads to better defense performance of RND. Apart from standard attacks, the adaptive attack based on Expectation Over Transformation (EOT) [3] has been considered an effective strategy to mitigate the random effect induced by the defenders. We also conduct a theoretical analysis of the defense effect of RND against EOT attacks and demonstrate that the magnitude ratio is also crucial for mitigating adaptive attacks, and that adaptive attacks have limited impact in evading RND.

On the other hand, adding large random noise to each query may degrade clean accuracy. To achieve a better trade-off between the defense effect and clean accuracy while maintaining training efficiency, we further propose combining RND with a lightweight Gaussian augmentation fine-tuning (RND-GF). RND-GF enables us to adopt larger noise at inference time to better disturb the query process. We conduct extensive experiments on CIFAR-10 and ImageNet. The experimental results verify our theoretical results and demonstrate the effectiveness of RND-GF. It is worth noting that RND is plug-and-play and easy to combine with existing defense methods such as AT to further boost the defense performance. To verify this, we evaluate the performance of RND combined with AT [22] and find that RND can improve the robust accuracy of AT by up to 23.1% against the SOTA black-box Square attack [2] while maintaining clean accuracy.

The main contributions of this work are four-fold:
- We study a lightweight random noise defense (RND) against black-box attacks theoretically and empirically. Our theoretical results reveal that the effectiveness of RND is determined by the magnitude ratio between the noise induced by RND and the noise added by the attackers for gradient estimation and random search.
- We theoretically analyze the performance of RND against the adaptive attack (EOT) and demonstrate that EOT has limited effect in evading RND.
- Leveraging our theoretical analysis, we further propose an efficient and stronger defense strategy, RND-GF, which combines Gaussian augmentation fine-tuning with RND towards a better trade-off between clean and adversarial performance.
- Extensive experiments verify our theoretical analysis and show the effectiveness of our defense methods against several state-of-the-art query-based attacks.

2 Related Work

Query-based Methods. Here we mainly review query-based black-box attack methods, which can be categorized into two classes: gradient estimation and search-based methods. Gradient estimation methods are based on zero-order (ZO) optimization algorithms. In search-based methods, the attacker utilizes a direct search strategy to find the search direction instead of explicitly estimating the gradient. Furthermore, our theoretical analysis focuses on score-based queries, where a continuous score (e.g., the posterior probability or the logit) is returned for each query, in contrast to decision-based queries, which return hard labels. Specifically, [26] proposed the first limited-query attack method by utilizing Natural Evolutionary Strategies (NES) to estimate the gradient. [32] proposed the ZO-signSGD algorithm, which is similar to NES. Based on bandit optimization, [27] proposed to combine time- and data-dependent gradient priors with gradient estimation, which dramatically reduced the number of queries. For the ℓ2 norm constraint, SimBA [23] randomly samples a perturbation from an orthonormal basis. SignHunter [1] focuses on estimating the sign of the gradient and flips the sign of the perturbation to improve query efficiency. Square attack [2] is the state-of-the-art query-based attack method, which selects localized square-shaped updates at random positions of images.

Black-Box Defense. Compared with defenses against white-box attacks, defenses specifically designed for black-box attacks have not been well studied. Two recent works [8, 31] proposed to detect malicious queries by comparing them with the query history, since malicious queries are more similar to each other than normal queries are. AdvMind [37] proposed to infer the intent of the adversary, and it also needs to store the query history. However, suppose the attacker adopts a strategy of long-interval malicious queries. In that case, the defender has to store a long history, with a very high cost of storage and comparison. [42] proposed the first black-box certified defense method, dubbed denoised smoothing. [5] also showed that adding random noise can defend against query-based attacks through experimental evaluations, but without a theoretical analysis of the defense effect. There are also a few randomization-based defenses. R&P [51] proposed a random input-transform-based method. RSE [33] adds large Gaussian noise to both the input and the activations of each layer. PNI [25] combines adversarial training with adding Gaussian noise to the input or weights of each layer. However, these methods significantly sacrifice accuracy on benign examples and also have a huge training cost. PixelDP [30] and randomized smoothing [13, 41] propose to train classifiers with large Gaussian noise to obtain certified robustness under the ℓ2 norm. However, they require a majority prediction over many noisy copies of each query. Obviously, this places a huge burden on model inference and can even help the attacker obtain a more accurate gradient estimate. The overly large Gaussian noise used in those methods also sacrifices clean accuracy. In contrast, RND maintains good clean accuracy and only perturbs each query once, without any extra burden.
[40] showed that a model with Gaussian augmentation training achieves state-of-the-art defense against common corruptions, but its defense against black-box attacks was not evaluated.

3 Preliminaries

3.1 Score-based Black-Box Attack

In this work, we mainly focus on score-based attacks. The analysis and evaluation of decision-based attacks are shown in the supplementary materials. We denote the attacked model as $M : \mathcal{X} \to \mathcal{Y}$, with $\mathcal{X}$ being the input space and $\mathcal{Y}$ being the output space. Given a benign example $(x, y) \in (\mathcal{X}, \mathcal{Y})$, the goal of an adversarial attack is to generate an adversarial example $x_{adv}$ that is similar to $x$ but enforces the prediction of $M$ to differ from the true label $y$ (i.e., untargeted attack) or to equal a target label $h$ (i.e., targeted attack). The optimization of the untargeted attack can be formulated as follows,

$$\min_{x_{adv} \in \mathcal{N}_R(x)} f(x_{adv}) = \min_{x_{adv} \in \mathcal{N}_R(x)} \Big( M_y(x_{adv}) - \max_{j \neq y} M_j(x_{adv}) \Big). \tag{1}$$

For the targeted attack, the objective function is $\max_{j \neq h} M_j(x_{adv}) - M_h(x_{adv})$. $M_j(x_{adv})$ denotes the logit or softmax output w.r.t. class $j$. $\mathcal{N}_R(x) = \{x' \mid \|x' - x\|_p \leq R\}$ indicates an ℓp ball around $x$ ($p$ is often specified as 2 or ∞), with $R > 0$. In attack evaluation, the attack is considered successful as soon as $f(x_{adv})$ is less than 0. Score-based attack algorithms commonly adopt projected gradient descent,

$$x_{t+1} = \mathrm{Proj}_{\mathcal{N}_R(x)}\big(x_t - \eta_t\, g(x_t)\big). \tag{2}$$

For white-box attacks, $g(x_t)$ is the gradient of $f(x)$ w.r.t. $x_t$. However, for black-box attacks, the gradient cannot be obtained directly, so the attackers utilize gradient estimation or random search to construct the update $g(x_t)$.

3.2 Zero-Order Optimization for Black-Box Attack

Zero-order (ZO) optimization has become the mainstream of black-box attacks, dubbed ZO attacks. The general idea of ZO attack methods is to estimate the gradient from the objective values returned by queries. The gradient estimator widely used in score-based and decision-based attacks [7, 10, 12, 16, 26, 27, 32, 36] is

$$g_\mu(x) = \frac{f(x + \mu u) - f(x)}{\mu}\, u, \quad \text{where } u \sim \mathcal{N}(0, I) \text{ and } \mu > 0. \tag{3}$$

Based on this gradient estimator, the attack update becomes

$$x_{t+1} = \mathrm{Proj}_{\mathcal{N}_R(x)}\big(x_t - \eta_t\, g_\mu(x_t)\big). \tag{4}$$

3.3 Random Search for Black-Box Attack

Search-based attacks also depend on the value of $f(x + \mu u) - f(x)$ to find attack directions. Here, $u$ is a search direction sampled from some pre-defined distribution, such as Gaussian noise in [12], an orthonormal basis in [23], or squared perturbations in [2]. The $\mu$ is the size of the perturbation in each search step, such as the size of the square in Square attack [2]. The works of [1, 35] adopt a fixed size schedule. If $f(x + \mu u) - f(x) < 0$, the attacker considers $u$ a valuable attack direction. The direction search therefore becomes

$$s(x) = \mathbb{I}(h(x) < 0)\, \mu u, \quad \text{where } h(x) = f(x + \mu u) - f(x), \tag{5}$$

where $\mathbb{I}$ is the indicator function. The update of search-based attacks then becomes

$$x_{t+1} = \mathrm{Proj}_{\mathcal{N}_R(x)}\big(x_t + s(x_t)\big). \tag{6}$$

4 RND: Random Noise Defense Against Query-based Attack Methods

4.1 Random Noise Defense

In the gradient estimator (e.g., Eq. (3)) and the direction search (e.g., Eq. (5)), the attacker adds one small random perturbation $\mu u$ to obtain the objective difference between two queries. Thus, if the defender can further disturb this random perturbation, the gradient estimation or direction search is expected to be misled, decreasing attack efficiency. However, it is also hard for defenders to identify whether a query is normal or malicious.
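To make the attacker-side primitives of Eqs. (3), (5), and (6) concrete before turning to the defense, the following is a minimal NumPy sketch, not the implementation of any cited attack; `score_fn`, the toy objective, the step size, and the radius are hypothetical placeholders for whatever the attacked API and attack configuration provide.

```python
import numpy as np

def zo_gradient_estimate(score_fn, x, mu, rng):
    """Two-point estimator of Eq. (3): g_mu(x) = (f(x + mu*u) - f(x)) / mu * u."""
    u = rng.standard_normal(x.shape)
    return (score_fn(x + mu * u) - score_fn(x)) / mu * u

def random_search_step(score_fn, x, mu, rng):
    """Direction test of Eq. (5): keep mu*u only if it decreases the objective."""
    u = rng.standard_normal(x.shape)
    h = score_fn(x + mu * u) - score_fn(x)
    return mu * u if h < 0 else np.zeros_like(x)

def project_l2(x_adv, x0, R):
    """Projection onto the l2 ball N_R(x0) used in Eqs. (2), (4), (6)."""
    delta = x_adv - x0
    norm = np.linalg.norm(delta)
    return x0 + delta * min(1.0, R / (norm + 1e-12))

# One iteration of the ZO attack update of Eq. (4) on a stand-in objective
# (untargeted attack: f < 0 would mean success against a real model).
rng = np.random.default_rng(0)
x0 = rng.random(3 * 32 * 32)                  # hypothetical flattened input
score_fn = lambda x: float(np.cos(x).sum())   # stand-in for the black-box margin f
x_t = project_l2(x0 - 0.01 * zo_gradient_estimate(score_fn, x0, mu=0.01, rng=rng), x0, R=1.0)
```

Both primitives rely on the sign and magnitude of a single query difference, which is exactly the quantity that the defense described next perturbs.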
Inspired by these observations, we study a lightweight defense method, dubbed Random Noise Defense (RND), that simply adds random noise to each query. Under RND, the feedback for a query $x$ is $M(x + \nu v)$, where $v \sim \mathcal{N}(0, I)$ is standard Gaussian noise added by the defender and the factor $\nu > 0$ controls the magnitude of the random noise. Considering the task of defending against query-based black-box attacks, RND should satisfy: 1) the prediction of each query should not change significantly; 2) the estimated gradient or search direction should be perturbed as much as possible for better defense performance. The gradient estimator and the direction search under RND become

$$g_{\mu,\nu}(x) = \frac{f(x + \mu u + \nu v_1) - f(x + \nu v_2)}{\mu}\, u, \tag{7}$$

$$s_\nu(x) = \mathbb{I}(h_\nu(x) < 0)\, \mu u, \quad \text{where } h_\nu(x) = f(x + \mu u + \nu v_1) - f(x + \nu v_2), \tag{8}$$

where $v_1, v_2 \sim \mathcal{N}(0, I)$ are standard Gaussian noises generated by the defender. To satisfy the first requirement, one cannot add too large a noise to each query; to meet the second requirement, $\nu$ should be large enough to change the gradient estimation or direction search. We therefore need to choose a proper $\nu$ to achieve a good trade-off between these two requirements. In the following, we provide a theoretical analysis of RND, which sheds light on the setting of $\nu$.

4.2 Theoretical Analysis of Random Noise Defense Against ZO Attacks

In this section, we present the theoretical analysis of the effect of RND against ZO attacks. Specifically, we study the convergence property of the ZO attack in Eq. (4) with $g_{\mu,\nu}(x)$ in Eq. (7) as the gradient estimator. Throughout our analysis, the adversarial perturbation is measured in the ℓ2 norm, corresponding to $\mathcal{N}_R(x) = \{x' \mid \|x' - x\|_2 \leq R\}$. To facilitate the subsequent analyses, we first introduce some assumptions, notations, and definitions.

Assumption 1. $f(x)$ is Lipschitz-continuous, i.e., $|f(y) - f(x)| \leq L_0(f)\,\|y - x\|$.

Assumption 2. $f(x)$ is continuous and differentiable, and $\nabla f(x)$ is Lipschitz-continuous, i.e., $\|\nabla f(y) - \nabla f(x)\| \leq L_1(f)\,\|y - x\|$.

Notations. We denote the sequence of standard Gaussian noises added by the attacker as $\mathcal{U}_t = \{u_0, u_1, \ldots, u_t\}$, with $t$ being the iteration index in the update of Eq. (4). The sequence of standard Gaussian noises added by the defender is denoted as $\mathcal{V}_t = \{v_{01}, v_{02}, \ldots, v_{t1}, v_{t2}\}$. The generated sequence of solutions is denoted as $\{x_0, x_1, \ldots, x_Q\}$, and the benign example $x$ is used as the initial solution, i.e., $x_0 = x$. $d = |\mathcal{X}|$ denotes the input dimension.

Definition 1. The Gaussian-smoothed function corresponding to $f(x)$ with $\nu > 0$ and $v \sim \mathcal{N}(0, I)$ is

$$f_\nu(x) = \frac{1}{(2\pi)^{d/2}} \int f(x + \nu v)\, e^{-\frac{1}{2}\|v\|_2^2}\, dv. \tag{9}$$

Due to the noise inserted by the defender, $f_\nu$ becomes the objective function seen by the attacker. We also define the minimum of $f_\nu(x)$ as $f_\nu^* = f_\nu(x^*) = \min_{x \in \mathcal{N}_R(x_0)} f_\nu(x)$.

4.2.1 Theoretical Analysis under the General Non-Convex Case

Here, we report the theoretical analysis of RND under the general non-convex case satisfying Assumption 1.

Theorem 1. Under Assumption 1, for any $Q \geq 0$, consider a sequence $\{x_t\}_{t=0}^{Q}$ generated according to the descent update of Eq. (4) using the gradient estimator $g_{\mu,\nu}(x)$ in Eq. (7), with the constant stepsize $\eta = \left[\frac{R\epsilon}{\gamma(\alpha)^2 d^3 L_0^3(f)(Q+1)}\right]^{1/2}$. Then,

$$\frac{1}{Q+1}\sum_{t=0}^{Q} \mathbb{E}_{\mathcal{U}_t,\mathcal{V}_t}\big(\|\nabla f_{\mu,\nu}(x_t)\|^2\big) \leq 2\, L_0(f)^{\frac{5}{2}}\, R^{\frac{1}{2}}\, d^{\frac{3}{2}}\, (Q+1)^{-\frac{1}{2}}\, \epsilon^{-\frac{1}{2}}\, \gamma(\alpha), \tag{10}$$

where $\alpha = \nu/\mu$ and $\gamma(\alpha) = \alpha + 2\sqrt{2}$, which is an increasing function of $\alpha$.

Remark 1. Due to the non-convexity assumption, we only guarantee convergence to a stationary point of the function $f_{\mu,\nu}(x)$, which is a smoothing approximation of $f_\nu$.
To bound the gap $\epsilon$ between $f_{\mu,\nu}(x)$ and $f_\nu(x)$, i.e., $|f_{\mu,\nu}(x) - f_\nu(x)| \leq \epsilon$ for all $x \in \mathcal{N}_R(x)$, we can choose $\mu \leq \hat{\mu} = \frac{\epsilon}{d^{1/2} L_0(f)}$ [36]. If the target value of the upper bound in Eq. (10) is denoted as $\delta$, then the expected number of queries needed to reach it is $O\!\left(\frac{\gamma(\alpha)^2 d^3 L_0^5(f) R}{\epsilon \delta^2}\right)$. The convergence rate of ZO attacks matters because attackers need to find adversarial examples effectively within a limited number of queries. Theorem 1 shows that the convergence error is positively related to the ratio ν/µ: a larger ratio leads to a higher upper bound on the convergence error and a slower convergence rate. Consequently, the attackers need many more queries to find adversarial examples. Specifically, if ν/µ ≤ 1, the query complexity remains of the same order as the original term $O\!\left(\frac{d^3}{\epsilon \delta^2}\right)$; when ν/µ > 1, the query complexity grows well beyond $O\!\left(\frac{d^3}{\epsilon \delta^2}\right)$. Therefore, the attack efficiency is significantly reduced under the query-limited setting, leading to failed attacks or a much larger number of queries for successful attacks. In short, a large ratio ν/µ drives the effectiveness of RND. In accordance with intuition, the defender should insert larger noise (i.e., ν) than that added by the attacker (i.e., µ) to achieve a satisfactory defense effect.

Trade-off between larger ν and clean accuracy: If $f(x)$ is Lipschitz-continuous, then $|f_\nu(x) - f(x)| \leq \nu L_0(f) d^{1/2}$ [36]. The larger ν is, the larger the gap between $f_\nu(x)$ and $f(x)$, so the clean accuracy of the model with larger added noise also decreases. This is also shown in Figure 1 (d), and it forms a trade-off between the defense performance of RND and clean accuracy.

Larger noise size µ adopted by attackers: The attacker may be aware of the defense mechanism and increase the adopted noise size µ. Our experimental results also verify this point. As shown in Figure 2 (a), for the NES attack, the attack failure rate is almost 0 when ν = µ = 0.01. This shows that adding small noise cannot always guarantee an effective defense; previous works [5, 15] do not consider this and overestimate the effect of small noise. However, increasing the noise size µ also leads to less accurate gradient estimation and random search in Eq. (3) and Eq. (5), and consequently to a significant decrease in attack performance. Taking the NES [26] and Bandit [27] attacks on ImageNet as examples, as shown in Figure 3, when µ increases from 0.01 to 0.05, the ℓ∞ attack failure rates increase from 5.1% and 5.0% to 22.8% and 25.7%, respectively. For Square attack [2], when µ increases from 0.3 to 0.5, the average number of queries doubles. To guarantee a successful attack, attackers cannot keep increasing µ. Our experimental results show that the attack performance decreases significantly when µ is larger than 0.01 for ZO attacks and larger than 0.3 for Square attack. Based on the above analysis, we propose a stronger defense method in Section 4.4, which enables us to add larger noise (ν > µ) while still maintaining good clean accuracy.
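To illustrate the role of the ratio ν/µ numerically (an illustrative toy study, not an experiment from the paper), the NumPy sketch below measures how well the perturbed estimator $g_{\mu,\nu}$ of Eq. (7) aligns with the true gradient of a synthetic smooth function. The function, dimension, and constants are assumptions chosen only to make the trend visible.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100                                     # hypothetical input dimension
w = rng.standard_normal(d)
f = lambda x: np.tanh(w @ x)                # synthetic smooth stand-in for the margin
grad_f = lambda x: (1 - np.tanh(w @ x) ** 2) * w

def mean_alignment(x, mu, nu, n_trials=2000):
    """Average cosine similarity between g_{mu,nu} of Eq. (7) and the true gradient,
    over fresh draws of the attacker noise u and the defender noises v1, v2."""
    g_true = grad_f(x)
    sims = []
    for _ in range(n_trials):
        u = rng.standard_normal(d)
        v1, v2 = rng.standard_normal(d), rng.standard_normal(d)
        g = (f(x + mu * u + nu * v1) - f(x + nu * v2)) / mu * u
        sims.append(g @ g_true / (np.linalg.norm(g) * np.linalg.norm(g_true) + 1e-12))
    return float(np.mean(sims))

x = rng.standard_normal(d)
for ratio in [0.1, 1.0, 2.0, 5.0]:          # nu / mu
    print(f"nu/mu = {ratio:>4}: alignment = {mean_alignment(x, mu=0.01, nu=0.01 * ratio):.4f}")
```

Under these assumptions, the average alignment with the true gradient shrinks as ν/µ grows, in line with the discussion above; the exact numbers depend entirely on the synthetic function and are not claims about any real model.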
4.2.2 Theoretical Analysis of Random Noise Defense Against Adaptive Attacks

As suggested in recent studies of robust defense [3, 6, 49], the defender should conduct a robust evaluation against the corresponding adaptive attack, in which the attacker is aware of the defense mechanism. Here we study the defense effect of RND against adaptive attacks. Since the idea of RND is to insert random noise into each query, an adaptive attacker could utilize Expectation Over Transformation (EOT) [4] to obtain a more accurate estimation, i.e., querying one sample multiple times and averaging the returned values. The gradient estimator used in the ZO attack of Eq. (4) then becomes

$$g_{\mu,\nu}(x) = \frac{1}{M}\sum_{j=1}^{M} \frac{f(x + \mu u + \nu v_{j1}) - f(x + \nu v_{j2})}{\mu}\, u, \quad \text{where } v_{j1}, v_{j2} \sim \mathcal{N}(0, I),\ j = 1, \ldots, M. \tag{11}$$

Note that here the definition of the sequence of standard Gaussian noises added by the defender (see Section 4.2.1) should be updated to $\mathcal{V}_t = \{v_{01}, \ldots, v_{0M}, \ldots, v_{t1}, \ldots, v_{tM}\}$, where each $v_{ij} \in \mathcal{V}_t$ contains $v_{ij1}$ and $v_{ij2}$. The convergence analysis of the ZO attack with Eq. (11) against RND is presented in Theorem 2.

Theorem 2. Under Assumptions 1 and 2, for any $Q \geq 0$, consider a sequence $\{x_t\}_{t=0}^{Q}$ generated according to the descent update of Eq. (4) using the gradient estimator $g_{\mu,\nu}(x)$ in Eq. (11). We have

$$\sum_{t=0}^{Q} \eta_t\, \mathbb{E}_{\mathcal{U}_t,\mathcal{V}_t}\big(\|\nabla f_{\mu,\nu}(x_t)\|^2\big) \leq L_0(f) R + \sum_{t=0}^{Q} \eta_t^2 \left( L_0(f)^2 L_1(f)\, d^2 \Big(1 + \frac{1}{M}\Big) + \frac{\nu^2 L_0(f) L_1(f)^2}{\mu}\, d^{\frac{5}{2}} + \frac{\nu^4 L_1(f)^3 (M+1)}{M\,\mu^2}\, d^{3} \right). \tag{12}$$

The effect of a larger M in EOT: Theorem 2 shows that the upper bound is reduced by a larger M, so EOT can mitigate the defense effect caused by the randomness of RND. However, even as M goes to infinity, the upper bound on the expected convergence error (i.e., Eq. (12)) is still dominated by the term of order $\frac{\nu^4}{\mu^2} d^3$. This implies that the attack improvement from EOT is limited, especially with a larger ratio ν/µ. The experiments in Section 5.3 verify our analysis of the EOT attack.

4.3 Theoretical Analysis of Random Noise Defense Against Search-based Attacks

In this section, we show how RND affects search-based attacks. Based on Eq. (5) and Eq. (8), adding the noise νv makes the value of $h_\nu(x)$ differ from that of $h(x)$, and there is a certain probability that $\mathrm{Sign}(h_\nu(x))$ differs from $\mathrm{Sign}(h(x))$. When the random noise νv causes an inconsistency between $\mathrm{Sign}(h_\nu(x))$ and $\mathrm{Sign}(h(x))$, RND misleads the attacker into selecting an incorrect attack direction (i.e., abandoning a descent direction w.r.t. $f$ or selecting an ascent direction), thereby decreasing the attack performance. We have the following theoretical analysis of this effect.

Theorem 3. Under Assumption 1, considering the direction update of Eq. (6) with Eq. (8) in search-based attacks, we have

$$P\big(\mathrm{Sign}(h(x)) \neq \mathrm{Sign}(h_\nu(x))\big) \leq \frac{2\, L_0(f)\, \nu \sqrt{d}}{|h(x)|}. \tag{13}$$

The probability $P$ is intuitively controlled by the relative values of $h_\nu(x)$ and $h(x)$. The proof is given in Section F of the supplementary materials. Theorem 3 shows that the probability of misleading the attacker is positively correlated with $\nu / |h(x)|$. Due to the small value of µ and the local linearity of the model, we have $|h(x)| = |f(x + \mu u) - f(x)| \leq L_0(f)\,\mu\,\|u\|$, so $|h(x)|$ is positively correlated with the stepsize µ within a small neighborhood. The probability of changing the sign is therefore positively correlated with ν/µ: a larger ratio ν/µ yields a larger upper bound on this probability, the attackers are more likely to be misled and need many more queries to find adversarial examples, and the defense performance of RND is better. As shown in Figure 1 (a-c), the evaluations on Square attack verify this analysis.

Figure 1: (a-c): Attack failure rate (%) of Square ℓ∞ attacks on VGG-16 (CIFAR-10), Inception v3 (ImageNet), and an AT model (ImageNet) under different values of µ and ν, where µ is the square size in Square attack. (d): Clean accuracy of different models on CIFAR-10 and ImageNet. The circle lines and triangle lines represent models on CIFAR-10 and ImageNet, respectively.
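The defender-side mechanism analyzed in Sections 4.1-4.3 is itself only a few lines. Below is a minimal PyTorch-style sketch (the class name and interface are hypothetical; a deployed system would apply the same noise at the serving layer rather than as a module), showing RND as a wrapper around a frozen classifier.

```python
import torch
import torch.nn as nn

class RNDWrapper(nn.Module):
    """Sketch of RND: add fresh Gaussian noise nu*v to every incoming query
    before the (frozen) target model sees it."""

    def __init__(self, model: nn.Module, nu: float = 0.02):
        super().__init__()
        self.model = model
        self.nu = nu

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = torch.randn_like(x)          # independent defender noise per query
        return self.model(x + self.nu * v)

# Usage sketch: wrap any pretrained classifier, including an AT model (RND-AT).
# defended = RNDWrapper(pretrained_model, nu=0.02)
# logits = defended(query_batch)
```

Because two consecutive queries receive independent noise draws $v_1$ and $v_2$, the difference observed by the attacker is exactly the perturbed $h_\nu(x)$ analyzed in Theorem 3.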
4.4 RND with Gaussian Augmentation Fine-tuning

The aforementioned theoretical analyses reveal that RND should choose a proper noise magnitude ν to achieve a good balance between clean accuracy and defense effectiveness. To achieve a high-quality balance, we can reduce the sensitivity of the target model to random noise, so that the influence of the noise on the accuracy of each query is reduced. To satisfy the lightweight requirement of black-box defense, we propose to utilize Gaussian augmentation fine-tuning (GF), which fine-tunes the deployed model with Gaussian augmentation: random Gaussian noise is added to each training sample as a pre-processing step during fine-tuning. Consequently, the model fine-tuned with GF is expected to maintain good accuracy even when the defender adds relatively large random noise to each query. As shown in Figure 1 (d), GF models still maintain high clean accuracy under larger noise.

5 Experiments

5.1 Experimental Settings

Datasets and Classification Models. We conduct experiments on two widely used benchmark datasets in adversarial machine learning: CIFAR-10 [28] and ImageNet [14]. For classification models, we use VGG-16 [43], WideResNet-16-10, and WideResNet-28-10 [54] on CIFAR-10. We conducted standard training, and their clean accuracies on the test set are 92.76%, 95.10%, and 96.60%, respectively. For ImageNet, we adopt the pretrained Inception v3 model [47] and ResNet-50 model [24] provided by the torchvision package, whose clean accuracies are 76.80% and 74.90%, respectively.

Black-box Attack Methods and Compared Defense Methods. We consider several mainstream query-based black-box attack methods, including NES [26], ZO-signSGD (ZS) [32], Bandit [27], ECO [35], SimBA [23], SignHunter [1], and Square attack [2]. Note that NES, ZS, and Bandit are gradient-estimation-based attack methods, while the other four are search-based methods. SimBA and ECO are designed only for ℓ2 and ℓ∞ attacks, respectively. Following [26], we evaluate all attack methods on the whole test set of CIFAR-10 and on 1,000 randomly sampled images from the validation set of ImageNet. We report the evaluation against untargeted attacks under both the ℓ∞ and ℓ2 norm settings. The perturbation budget for ℓ∞ is set to 0.05 for both datasets. For ℓ2 attacks, the perturbation budget is set to 1 and 5 on CIFAR-10 and ImageNet, respectively. The maximum number of queries is set to 10,000. We adopt the attack failure rate and the average number of queries as evaluation metrics: the higher the attack failure rate and the larger the average number of queries, the better the adversarial defense performance. We compare our methods with RSE [33] and PNI [25] on CIFAR-10, and we adopt the pre-trained AT model [22], which shows better robustness than other AT models. For ImageNet, we compare with the pre-trained Feature Denoising (FD) model [52] and adopt the pre-trained AT model in the Robustness Library [17]. The implementation details of these methods are given in Sections B.1 and B.2 of the supplementary materials.

5.2 Evaluation of RND Against Query-based Black-Box Attacks

We first evaluate the defense performance of RND with various settings of µ and ν against query-based black-box attacks and verify the theoretical results in Sections 4.2 and 4.3.

Figure 2: Attack failure rate (%) of query-based attacks on VGG-16 and CIFAR-10 under different values of µ and ν. We adopt a logarithmic scale in subplots (a-c) for better illustration. The complete evaluation under ℓ2 attacks is given in Section B.4 of the supplementary materials.

Figure 3: Attack failure rate (%) of query-based attacks on Inception v3 and ImageNet under different values of µ and ν. We adopt a logarithmic scale in subplots (a-c) for better illustration. The complete evaluation under ℓ2 attacks is given in Section B.4 of the supplementary materials.
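As a concrete reference for the GF step used by RND-GF (Section 4.4), here is a minimal PyTorch-style fine-tuning sketch. The function and argument names are hypothetical, the noise level `sigma` is an assumption, and the paper's actual schedule (cyclic learning rate, 50 epochs) is described in Section 5.4 and the supplementary materials.

```python
import torch
import torch.nn.functional as F

def gf_finetune_epoch(model, loader, optimizer, sigma=0.05, device="cpu"):
    """One epoch of Gaussian augmentation fine-tuning (GF): each training sample is
    perturbed with Gaussian noise before the forward pass, so the model becomes
    less sensitive to the noise that RND will add at inference time."""
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        noisy = images + sigma * torch.randn_like(images)   # Gaussian augmentation
        # (optionally clamp `noisy` to the valid input range of the data pipeline)
        loss = F.cross_entropy(model(noisy), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

After fine-tuning, the deployed model is wrapped with RND at inference time; that combination is the RND-GF configuration evaluated below.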
For this evaluation, we set ν ∈ {0.0, 0.01, 0.02} and µ ∈ {10⁻⁴, 5×10⁻⁴, 10⁻³, 0.005, 0.01, 0.05} for ZO attacks on CIFAR-10, and µ ∈ {10⁻⁴, 0.001, 0.01, 0.05} on ImageNet. For Square attack, we set µ ∈ {0.05, 0.1, 0.2, 0.3}. Figure 1 (a-c) presents the defense performance of RND against Square attack on CIFAR-10 and ImageNet, while Figures 2 (a-c) and 3 (a-c) present the defense performance of RND against NES, ZS, and Bandit on CIFAR-10 and ImageNet, respectively. The results in Figures 1 (a-c), 2 (a-c), and 3 (a-c) show that the attack failure rate of all attack methods generally increases as the value of ν/µ increases. Specifically, for a fixed value of ν (e.g., 0.01), the attack failure rate increases as µ decreases, while for a given value of µ (e.g., 10⁻⁴), the attack failure rate increases as ν increases. These results corroborate our theoretical analysis that the ratio ν/µ determines the probability of changing the sign and the convergence rate of ZO attacks: the larger ν/µ, the higher the probability of changing the sign and the larger the convergence error of ZO attacks, which results in poor attack performance under the query-limited setting.

We also evaluate the defense performance of RND with various values of ν against the other search-based black-box attacks; the results are shown in Figure 2 (d) for CIFAR-10 and Figure 3 (d) for ImageNet. As shown in the plots, RND significantly boosts the defense performance, and the attack failure rate increases as ν increases. The complete evaluations, including WideResNet-16 on CIFAR-10, the ℓ2 attacks, and the mean and median numbers of queries of successful attacks, are reported in Section B.4 of the supplementary materials.

5.3 Evaluation of RND Against Adaptive Attacks

We then evaluate the defense effect of RND against the adaptive EOT attack and verify our theoretical analysis in Section 4.2.2. We evaluate EOT with M ∈ {1, 5, 10} on CIFAR-10 and ImageNet; ν is set to 0.02, and µ is set to 0.001 for ZO attacks and 0.1 for Square attack. The evaluation with even larger M and µ is presented in Section B.5 of the supplementary materials. We consider two query-budget settings: 1) an adaptive query budget that assigns a budget of 10,000·M to each value of M; 2) a fixed query budget of 10,000 for all M. We first evaluate the performance under the adaptive query budget on NES and ZS; the numerical results are shown in Table 1. The attack failure rate decreases as M increases on both datasets. However, the average number of queries of successful attacks also increases significantly as M increases, which demonstrates that the adaptive EOT attack can increase the attack success rate only with a sacrifice of query efficiency.
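For reference, the EOT attacker evaluated in this subsection averages repeated two-point differences against the randomized model, as in Eq. (11). Below is a minimal NumPy sketch under the same hypothetical score-function convention as the earlier sketches; `noisy_score_fn` is assumed to draw fresh defender noise on every call, exactly as an RND-defended API would.

```python
import numpy as np

def eot_gradient_estimate(noisy_score_fn, x, mu, M, rng):
    """EOT estimator of Eq. (11): for one attacker direction u, average M repeated
    two-point differences against the RND-defended model to suppress the variance
    injected by the defense. Each estimate costs 2*M queries."""
    u = rng.standard_normal(x.shape)
    diffs = [noisy_score_fn(x + mu * u) - noisy_score_fn(x) for _ in range(M)]
    return np.mean(diffs) / mu * u

# Under a fixed total query budget, a larger M leaves the attacker fewer iterations,
# which is the trade-off visible for NES and ZS in Table 1.
```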
Table 1: Evaluation of EOT with ℓ∞ attacks on CIFAR-10 and ImageNet under the adaptive and fixed query-budget settings. The left part reports results on CIFAR-10 and the right part on ImageNet. Each cell shows the average number of queries of successful attacks / the attack failure rate. The performance of EOT with ℓ2 attacks is reported in Section B.5 of the supplementary materials.

| Setting | Method | CIFAR-10, M=1 | CIFAR-10, M=5 | CIFAR-10, M=10 | ImageNet, M=1 | ImageNet, M=5 | ImageNet, M=10 |
|---|---|---|---|---|---|---|---|
| adaptive | NES | 1448/0.484 | 4078/0.361 | 5763/0.342 | 2532/0.762 | 5364/0.705 | 7582/0.691 |
| adaptive | ZS | 1489/0.493 | 3189/0.374 | 5912/0.349 | 2824/0.825 | 5735/0.761 | 7662/0.740 |
| fixed | NES | 1448/0.484 | 2528/0.452 | 3246/0.443 | 2533/0.762 | 5240/0.775 | 5658/0.781 |
| fixed | ZS | 1489/0.493 | 2765/0.448 | 3123/0.421 | 2824/0.825 | 4023/0.842 | 4652/0.861 |
| fixed | Bandit | 436/0.696 | 276/0.582 | 314/0.543 | 305/0.604 | 759/0.523 | 946/0.49 |
| fixed | Square | 380/0.301 | 181/0.162 | 223/0.121 | 93/0.353 | 145/0.20 | 328/0.171 |
| fixed | SignHunter | 459/0.367 | 559/0.224 | 759/0.191 | 173/0.532 | 336/0.456 | 659/0.431 |
| fixed | ECO | 904/0.720 | 1681/0.761 | 2560/0.793 | 1237/0.666 | 3065/0.678 | 3091/0.692 |
| fixed | SimBA | 1353/0.650 | 3852/0.467 | 4103/0.396 | 274/0.891 | 468/0.878 | 517/0.869 |

Table 2: Comparison of RND (ν = 0.02), GF, RND-GF (ν = 0.05), AT, RND-AT (ν = 0.05), PNI, RSE, and FD on CIFAR-10 and ImageNet. Each cell shows the average number of queries of successful attacks / the attack failure rate. In the original table, the best and second-best attack failure rates under each attack are highlighted in bold and underlined, respectively. The evaluation under ℓ2 attacks is shown in Section B.6 of the supplementary materials.

| Dataset | Method | Clean Acc | NES (ℓ∞) | ZS (ℓ∞) | Bandit (ℓ∞) | Sign (ℓ∞) | Square (ℓ∞) | SimBA (ℓ2) | ECO (ℓ∞) |
|---|---|---|---|---|---|---|---|---|---|
| CIFAR-10 (WideResNet-28) | Clean Model | 96.60% | 465.5/0.01 | 581.8/0.06 | 210.2/0.03 | 167.6/0.03 | 137.1/0.02 | 457.2/0.04 | 457.8/0.0 |
| CIFAR-10 (WideResNet-28) | GF | 91.72% | 999.0/0.407 | 759.9/0.544 | 744.5/0.116 | 348.3/0.027 | 581.0/0.061 | 1146.8/0.395 | 883.9/0.067 |
| CIFAR-10 (WideResNet-28) | RSE [33] | 84.12% | 1246.3/0.396 | 1327.8/0.422 | 281.7/0.372 | 243.7/0.221 | 413.3/0.243 | 498.3/0.337 | 578.3/0.534 |
| CIFAR-10 (WideResNet-28) | PNI [25] | 87.20% | 1071.4/0.725 | 1310.7/0.823 | 324.9/0.824 | 267.0/0.708 | 295.3/0.612 | 945.0/0.857 | 2342.2/0.623 |
| CIFAR-10 (WideResNet-28) | AT [22] | 89.48% | 821.6/0.807 | 614.9/0.862 | 1451.5/0.623 | 766.3/0.476 | 1135.4/0.499 | 1523.2/0.635 | 1180.4/0.484 |
| CIFAR-10 (WideResNet-28) | RND | 93.60% | 842.5/0.05 | 941.8/0.143 | 273.1/0.478 | 977.2/0.226 | 762.4/0.116 | 2112.6/0.549 | 912.8/0.688 |
| CIFAR-10 (WideResNet-28) | RND-GF | 92.40% | 2805.7/0.516 | 2966.3/0.730 | 1223.5/0.841 | 1017.1/0.407 | 1207.3/0.378 | 1220.2/0.863 | 687.2/0.872 |
| CIFAR-10 (WideResNet-28) | RND-AT | 87.40% | 2499.2/0.842 | 2625.7/0.923 | 891.5/0.891 | 767.9/0.737 | 1170.7/0.730 | 1787.4/0.912 | 687.4/0.911 |
| ImageNet (ResNet-50) | Clean Model | 74.90% | 1031.9/0.0 | 2013.0/0.235 | 329.2/0.02 | 264.1/0.03 | 76.5/0.0 | 1234.5/0.281 | 347.7/0.0 |
| ImageNet (ResNet-50) | GF [40] | 74.70% | 1685.5/0.03 | 1712.1/0.347 | 601.4/0.02 | 329.0/0.0 | 97.28/0.0 | 1417.4/0.112 | 362.4/0.0 |
| ImageNet (ResNet-50) | FD [52] | 54.20% | 1997.2/0.679 | 1555.5/0.775 | 1579.2/0.426 | 1633.1/0.332 | 1092.4/0.242 | 2607.9/0.613 | 1501.0/0.240 |
| ImageNet (ResNet-50) | AT [17] | 61.60% | 2113.4/0.724 | 1688.7/0.815 | 1091.5/0.416 | 1522.7/0.289 | 1109.0/0.159 | 2638.2/0.651 | 1440.6/0.200 |
| ImageNet (ResNet-50) | RND | 73.00% | 3041.5/0.245 | 2266.2/0.330 | 390.6/0.536 | 661.0/0.314 | 81.5/0.101 | 825.3/0.612 | 2435.5/0.540 |
| ImageNet (ResNet-50) | RND-GF | 71.15% | 2489.3/0.421 | 2053.5/0.563 | 495.9/0.603 | 514.0/0.348 | 1009.9/0.146 | 777.2/0.762 | 994.8/0.702 |
| ImageNet (ResNet-50) | RND-AT | 58.15% | 2556.6/0.864 | 2596.6/0.870 | 448.0/0.810 | 724.2/0.632 | 1306.3/0.386 | 1210.5/0.953 | 631.1/0.865 |

More importantly, we observe that the relative performance improvements induced by EOT generally decrease as M increases. These observations verify our theoretical analysis in Section 4.2.2. The evaluation under the fixed query budget is also reported in Table 1. On the small-scale CIFAR-10 dataset, the attack failure rate of all attacks generally decreases as M increases, yet we again observe that the relative performance improvements induced by EOT decrease as M increases. On the large-scale ImageNet dataset, EOT increases the attack performance only with a significant sacrifice of query efficiency in most cases, except for NES and ZS.
The attack performance of NES and ZS decreases as M increases. Specifically, for a fixed number of queries, the larger M is, the smaller the number of iterations available to the ZO attack under the limited query setting. The poor performance of NES and ZS with larger M may be because the improvement induced by EOT is smaller than the degradation caused by the reduced number of iterations.

5.4 Evaluation of RND-GF and the Combination of RND with AT

In this section, we evaluate the performance of RND combined with Gaussian augmentation fine-tuning, dubbed RND-GF. In the fine-tuning phase, we adopt a cyclic learning rate [44] to achieve super-convergence within 50 epochs; the training details are given in Section B.3 of the supplementary materials. We adopt the attack failure rate and the average number of queries under the stronger adaptive EOT attack for evaluation. As in Section 5.2, we tune the parameter µ for the attackers and tune M ∈ {1, 5, 10} for EOT to achieve the best attack performance for all attack methods. The comparison with other defense methods is shown in Table 2, where GF refers to standard fine-tuning with Gaussian augmentation, RND denotes the clean model coupled with random noise defense, and RND-GF indicates the GF model coupled with RND. As shown in Table 2, RND significantly boosts the defense performance of the clean model on both datasets; for example, RND achieves 20%-40% improvements against Bandit and SignHunter on both datasets. The proposed GF protects the clean accuracy from the larger noise induced by RND, so we adopt a relatively large ν = 0.05 for RND-GF. According to our theoretical analysis, a larger ν leads to better defense performance. The experimental results also show that RND-GF significantly improves the defense performance under all attack methods while maintaining good clean accuracy compared with RND; for example, RND-GF achieves 40%-50% and 20%-30% improvements against ZO and search-based attacks on CIFAR-10, respectively. Compared with RSE, PNI, and FD, RND-GF achieves a higher failure rate against Bandit, SimBA, and ECO, forces many more queries for the other attacks, and also maintains much better clean accuracy with a lower training cost.

Without any extra modification, RND can be easily combined with many existing defense methods. We adopt the pre-trained AT WideResNet-28 (ℓ∞) model [22] and the pre-trained AT ResNet-50 (ℓ∞) model in the Robustness Library [17] on CIFAR-10 and ImageNet, respectively. As shown in Figure 1 (d), AT models, like GF models, are less affected by added noise, so we can adopt the larger noise (ν = 0.05) at inference time. Combining AT with RND, RND-AT significantly improves the robustness against all attacks and achieves the best performance among all methods. Built on the AT models, RND-AT achieves 23.1% and 22.7% improvements against Square attack on CIFAR-10 and ImageNet, respectively.

5.5 Discussion and Limitations

As stated in Section 3.1, we focus on score-based attacks; the analysis and evaluation of RND against decision-based attacks are given in Section C of the supplementary materials. Besides, since we mainly focus on practical query-based black-box attacks, where the models and the training datasets are unknown to the attackers, we do not cover attacks that exploit transferability from surrogate models to target models. These methods usually assume that the surrogate models are trained on the same training set as the target model, and it is difficult to obtain the training set behind the target model in real scenarios.
Meanwhile, several defense strategies have been developed for transfer-based attacks [38, 50, 53]. It would be interesting to explore the combination of RND with these works towards better defense performance against transfer-based attacks; we leave this to future work.

6 Conclusion

In this work, we study a lightweight black-box defense, dubbed Random Noise Defense (RND), against query-based black-box attacks, which is realized by adding proper Gaussian noise to each query. We give the first theoretical analysis of the effectiveness of RND against both standard and adaptive query-based black-box attacks. Based on our analysis, we further propose RND-GF, which combines RND with Gaussian augmentation fine-tuning to obtain a better trade-off between clean accuracy and robustness. Extensive experiments verify our theoretical analysis and demonstrate the effectiveness of RND-GF against several state-of-the-art query-based attacks. Without any extra modification, RND can easily be combined with existing defense methods such as AT, and we further demonstrate that the resulting RND-AT significantly boosts adversarial robustness. Given that RND is simple and effective, we recommend it as a baseline for evaluating new query-based black-box attack methods.

Acknowledgment

We want to thank Jiancong Xiao, Ziniu Li, Congliang Chen, Jiawei Zhang, and Yingru Li for helpful discussions. We want to thank the anonymous reviewers for their valuable suggestions and comments. Baoyuan Wu and Zeyu Qin are partially supported by the Natural Science Foundation of China under grant No. 62076213, the university development fund of the Chinese University of Hong Kong, Shenzhen under grant No. 01001810, the special project fund of Shenzhen Research Institute of Big Data under grant No. T00120210003, and the Tencent AI Lab Rhino-Bird Focused Research Program under grant No. JR202123. Hongyuan Zha is partially supported by a grant from Shenzhen Research Institute of Big Data.

References

[1] Abdullah Al-Dujaili and Una-May O'Reilly. Sign bits are all you need for black-box attacks. In International Conference on Learning Representations, 2020.
[2] Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square attack: a query-efficient black-box adversarial attack via random search. In European Conference on Computer Vision, pages 484–501. Springer, 2020.
[3] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, pages 274–283, 2018.
[4] Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robust adversarial examples. In International Conference on Machine Learning, pages 284–293. PMLR, 2018.
[5] Junyoung Byun, Hyojun Go, and Changick Kim. Small input noise is enough to defend against query-based black-box attacks, 2021.
[6] Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian Goodfellow, Aleksander Madry, and Alexey Kurakin. On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705, 2019.
[7] Jianbo Chen, Michael I. Jordan, and Martin J. Wainwright. HopSkipJumpAttack: A query-efficient decision-based attack. In 2020 IEEE Symposium on Security and Privacy (SP), pages 1277–1294. IEEE, 2020.
[8] Steven Chen, Nicholas Carlini, and David Wagner. Stateful detection of black-box adversarial attacks. In Proceedings of the 1st ACM Workshop on Security and Privacy on Artificial Intelligence, pages 30–39, 2020.
[9] Weilun Chen, Zhaoxiang Zhang, Xiaolin Hu, and Baoyuan Wu. Boosting decision-based black-box adversarial attacks with random sign flip. In European Conference on Computer Vision, pages 276–293. Springer, 2020.
[10] Minhao Cheng, Thong Le, Pin-Yu Chen, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh. Query-efficient hard-label black-box attack: An optimization-based approach. In International Conference on Learning Representations, 2019.
[11] Minhao Cheng, Simranjit Singh, Patrick Chen, Pin-Yu Chen, Sijia Liu, and Cho-Jui Hsieh. Sign-OPT: A query-efficient hard-label adversarial attack. arXiv preprint arXiv:1909.10773, 2019.
[12] Minhao Cheng, Simranjit Singh, Patrick H. Chen, Pin-Yu Chen, Sijia Liu, and Cho-Jui Hsieh. Sign-OPT: A query-efficient hard-label adversarial attack. In International Conference on Learning Representations, 2020.
[13] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pages 1310–1320, 2019.
[14] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[15] Yinpeng Dong, Qi-An Fu, Xiao Yang, Tianyu Pang, Hang Su, Zihao Xiao, and Jun Zhu. Benchmarking adversarial robustness on image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 321–331, 2020.
[16] John C. Duchi, Michael I. Jordan, Martin J. Wainwright, and Andre Wibisono. Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Transactions on Information Theory, 61(5):2788–2806, 2015.
[17] Logan Engstrom, Andrew Ilyas, Hadi Salman, Shibani Santurkar, and Dimitris Tsipras. Robustness (Python library), 2019.
[18] Yanbo Fan, Baoyuan Wu, Tuanhui Li, Yong Zhang, Mingyang Li, Zhifeng Li, and Yujiu Yang. Sparse adversarial attack via perturbation factorization. In Computer Vision - ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XXII, pages 35–50. Springer, 2020.
[19] Yan Feng, Baoyuan Wu, Yanbo Fan, Li Liu, Zhifeng Li, and Shutao Xia. Boosting black-box attack with partially transferred conditional adversarial distribution, 2020.
[20] Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In ICLR, 2019.
[21] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
[22] Sven Gowal, Chongli Qin, Jonathan Uesato, Timothy Mann, and Pushmeet Kohli. Uncovering the limits of adversarial training against norm-bounded adversarial examples. arXiv preprint arXiv:2010.03593, 2020.
[23] Chuan Guo, Jacob Gardner, Yurong You, Andrew Gordon Wilson, and Kilian Weinberger. Simple black-box adversarial attacks. In International Conference on Machine Learning, pages 2484–2493, 2019.
[24] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[25] Zhezhi He, Adnan Siraj Rakin, and Deliang Fan. Parametric noise injection: Trainable randomness to improve deep neural network robustness against adversarial attack. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 588–597, 2019.
[26] Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. In International Conference on Machine Learning, pages 2137–2146, 2018.
[27] Andrew Ilyas, Logan Engstrom, and Aleksander Madry. Prior convictions: Black-box adversarial attacks with bandits and priors. In International Conference on Learning Representations, 2018.
[28] Alex Krizhevsky et al. Learning multiple layers of features from tiny images. 2009.
[29] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. Artificial Intelligence Safety and Security, pages 99–112, Jul 2018.
[30] Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy (SP), pages 656–672. IEEE, 2019.
[31] Huiying Li, Shawn Shan, Emily Wenger, Jiayun Zhang, Haitao Zheng, and Ben Y. Zhao. Blacklight: Defending black-box adversarial attacks on deep neural networks. arXiv preprint arXiv:2006.14042, 2020.
[32] Sijia Liu, Pin-Yu Chen, Xiangyi Chen, and Mingyi Hong. signSGD via zeroth-order oracle. In International Conference on Learning Representations, 2018.
[33] Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neural networks via random self-ensemble. In Proceedings of the European Conference on Computer Vision (ECCV), pages 369–385, 2018.
[34] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
[35] Seungyong Moon, Gaon An, and Hyun Oh Song. Parsimonious black-box adversarial attacks via efficient combinatorial optimization. In International Conference on Machine Learning, pages 4636–4645. PMLR, 2019.
[36] Yurii Nesterov and Vladimir Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527–566, 2017.
[37] Ren Pang, Xinyang Zhang, Shouling Ji, Xiapu Luo, and Ting Wang. AdvMind: Inferring adversary intent of black-box attacks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1899–1907, 2020.
[38] Tianyu Pang, Kun Xu, Chao Du, Ning Chen, and Jun Zhu. Improving adversarial robustness via promoting ensemble diversity. In International Conference on Machine Learning, pages 4970–4979. PMLR, 2019.
[39] Leslie Rice, Eric Wong, and Zico Kolter. Overfitting in adversarially robust deep learning. In International Conference on Machine Learning, pages 8093–8104. PMLR, 2020.
[40] Evgenia Rusak, Lukas Schott, Roland S. Zimmermann, Julian Bitterwolf, Oliver Bringmann, Matthias Bethge, and Wieland Brendel. A simple way to make neural networks robust against diverse image corruptions. In European Conference on Computer Vision, pages 53–69. Springer, 2020.
[41] Hadi Salman, Jerry Li, Ilya Razenshteyn, Pengchuan Zhang, Huan Zhang, Sebastien Bubeck, and Greg Yang. Provably robust deep learning via adversarially trained smoothed classifiers. In Advances in Neural Information Processing Systems, pages 11292–11303, 2019.
[42] Hadi Salman, Mingjie Sun, Greg Yang, Ashish Kapoor, and J. Zico Kolter. Denoised smoothing: A provable defense for pretrained classifiers. Advances in Neural Information Processing Systems, 33, 2020.
[43] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[44] Leslie N. Smith. Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 464–472. IEEE, 2017.
[45] Jure Sokolic, Raja Giryes, Guillermo Sapiro, and Miguel R. D. Rodrigues. Generalization error of invariant classifiers. In Aarti Singh and Xiaojin (Jerry) Zhu, editors, AISTATS, pages 1094–1103, 2017.
[46] David Stutz, Matthias Hein, and Bernt Schiele. Confidence-calibrated adversarial training: Generalizing to unseen attacks. In International Conference on Machine Learning, pages 9155–9166. PMLR, 2020.
[47] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2818–2826, 2016.
[48] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[49] Florian Tramer, Nicholas Carlini, Wieland Brendel, and Aleksander Madry. On adaptive attacks to adversarial example defenses. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 1631–1643. Curran Associates, Inc., 2020.
[50] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.
[51] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. In International Conference on Learning Representations, 2018.
[52] Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan L. Yuille, and Kaiming He. Feature denoising for improving adversarial robustness. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 501–509, 2019.
[53] Huanrui Yang, Jingyang Zhang, Hongliang Dong, Nathan Inkawhich, Andrew Gardner, Andrew Touchet, Wesley Wilkes, Heath Berry, and Hai Li. DVERGE: Diversifying vulnerabilities for enhanced robust generation of ensembles. Advances in Neural Information Processing Systems, 33, 2020.
[54] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.

Checklist

1. For all authors...
   (a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes] See the final part of Section 1.
   (b) Did you describe the limitations of your work? [Yes] See Section 5.5.
   (c) Did you discuss any potential negative societal impacts of your work? [Yes] See Supplementary Materials.
   (d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]
2. If you are including theoretical results...
   (a) Did you state the full set of assumptions of all theoretical results? [Yes] See Section 4.2.
   (b) Did you include complete proofs of all theoretical results? [Yes] See Supplementary Materials.
3. If you ran experiments...
   (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
   (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Section 5.1 and Supplementary Materials.
   (c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [No] We run all experiments 3 times and average all results over 3 random seeds. We find the results of each experiment are almost identical to each other.
   (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Supplementary Materials.
4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
   (a) If your work uses existing assets, did you cite the creators? [Yes] See Section 5.1.
   (b) Did you mention the license of the assets? [Yes] See Supplementary Materials.
   (c) Did you include any new assets either in the supplemental material or as a URL? [Yes] Yes, we provide our code in the Supplemental Materials.
   (d) Did you discuss whether and how consent was obtained from people whose data you are using/curating? [No] No, all the used data are well-known public benchmark datasets such as CIFAR-10 and ImageNet.
   (e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [No] No, all the used data are well-known public benchmark datasets such as CIFAR-10 and ImageNet.
5. If you used crowdsourcing or conducted research with human subjects...
   (a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A] Not applicable.
   (b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A] Not applicable.
   (c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A] Not applicable.