# Neural Architecture Dilation for Adversarial Robustness

Yanxi Li 1, Zhaohui Yang 2,3, Yunhe Wang 2, Chang Xu 1

1 School of Computer Science, University of Sydney, Australia; 2 Huawei Noah's Ark Lab; 3 Key Lab of Machine Perception (MOE), Department of Machine Intelligence, Peking University, China

yali0722@uni.sydney.edu.au, zhaohuiyang@pku.edu.cn, yunhe.wang@huawei.com, c.xu@sydney.edu.au

## Abstract

With the tremendous advances in the architecture and scale of convolutional neural networks (CNNs) over the past few decades, they can easily reach or even exceed the performance of humans in certain tasks. However, a recently discovered shortcoming of CNNs is that they are vulnerable to adversarial attacks. Although the adversarial robustness of CNNs can be improved by adversarial training, there is a trade-off between standard accuracy and adversarial robustness. From the neural architecture perspective, this paper aims to improve the adversarial robustness of backbone CNNs that already have a satisfactory accuracy. Under a minimal computational overhead, the introduced dilation architecture is expected to be friendly to the standard performance of the backbone CNN while pursuing adversarial robustness. Theoretical analyses of the standard and adversarial error bounds naturally motivate the proposed neural architecture dilation algorithm. Experimental results on real-world datasets and benchmark neural networks demonstrate the effectiveness of the proposed algorithm in balancing accuracy and adversarial robustness.

## 1 Introduction

In the past few decades, novel architecture design and network scale expansion have achieved significant success in the development of convolutional neural networks (CNNs) [12, 13, 11, 23, 9, 31, 1, 35, 17, 15]. These advanced neural networks can already reach or even exceed the performance of humans in certain tasks [10, 21]. Despite this success, a recently discovered shortcoming of CNNs is that they are vulnerable to adversarial attacks: ingeniously designed small perturbations, when applied to images, can mislead the networks into predicting incorrect labels [7]. This vulnerability notably reduces the reliability of CNNs in practical applications. Hence, developing solutions that increase the adversarial robustness of CNNs against adversarial attacks has attracted particular attention from researchers.

Adversarial training is arguably the most standard defense approach; it augments the training data with adversarial examples, which are often generated by the fast gradient sign method (FGSM) [7] or projected gradient descent (PGD) [18]. Tramèr et al. [24] investigate the adversarial examples produced by a number of pre-trained models and develop an ensemble adversarial training method. Focusing on the worst-case loss over a convex outer region, Wong and Kolter [27] introduce a provably robust model. There are further improvements of PGD adversarial training, including Lipschitz regularization [6] and curriculum adversarial training [2].

As shown in a recent study by Tsipras et al. [25], there exists a trade-off between standard accuracy and adversarial robustness: after networks have been trained to defend against adversarial attacks, their performance on natural image classification can be negatively affected. TRADES [32] theoretically studies this trade-off by introducing a boundary error between the natural (i.e., standard) error and the robust error.
Instead of directly adjusting the trade-off, friendly adversarial training (FAT) [33] proposes to exploit weak adversarial examples so that the standard accuracy drops only slightly.

Numerous efforts have been made to defend against adversarial attacks by carefully designing various training objective functions. Less noticed, however, is that the neural architecture itself bounds the performance of the network. Recently, there have been a few attempts to analyze the adversarial robustness of neural networks from the architecture perspective. For example, RACL [5] applies a Lipschitz constraint on the architecture parameters in one-shot NAS to reduce the Lipschitz constant and improve robustness. RobNet [8] searches for adversarially robust network architectures directly with adversarial training. Despite these studies, a deeper understanding of the accuracy-robustness trade-off from the architecture perspective is still largely missing.

In this paper, we focus on designing neural networks that are sufficient for both standard and adversarial classification from the architecture perspective. We propose neural architecture dilation for adversarial robustness (NADAR). Beginning with a backbone network that has a satisfactory accuracy on natural data, we search for a dilation architecture that pursues a maximal robustness gain while incurring a minimal accuracy drop. In addition, we apply a FLOPs-aware approach to optimize the architecture, which prevents the dilation from increasing the computation cost of the network too much. We theoretically analyze our dilation framework and prove that the constrained optimization objectives effectively realize these motivations. Experimental results on benchmark datasets demonstrate the significance of studying adversarial robustness from the architecture perspective and the effectiveness of the proposed algorithm.

## 2 Related Works

### 2.1 Adversarial Training

FGSM [7] attributes the adversarial vulnerability of neural networks to their linear nature rather than to the nonlinearity and overfitting previously suspected, and based on this perspective proposes a method to generate adversarial examples for adversarial training to reduce the adversarial error. PGD [18] studies adversarial robustness from the view of robust optimization and proposes a first-order, gradient-based method for iterative adversarial example generation. FreeAT [22] reduces the computational overhead of generating adversarial examples: the gradient information computed during network training is recycled to generate adversarial examples, achieving a 7 to 30 times speedup.

However, adversarial robustness comes at a price. Tsipras et al. [25] reveal that there is a trade-off between standard accuracy and adversarial robustness because the features learned by the optimal standard classifier and the optimal robust classifier differ. TRADES [32] theoretically analyzes this trade-off: a boundary error is identified between the standard and adversarial errors to guide the design of defenses against adversarial attacks, and a tuning parameter λ is introduced into the framework to adjust the trade-off. Friendly adversarial training (FAT) [33] generates weak adversarial examples that satisfy a minimal margin of loss; the misclassified adversarial examples with the lowest classification loss are selected for adversarial training.
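As a concrete illustration of the iterative attacks discussed above, the sketch below implements a PGD-style $\ell_\infty$ attack in PyTorch. It is a minimal, hedged example: the function name, the random start, and the default hyperparameters (perturbation budget, step size, number of steps) are illustrative choices and are not taken from this paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, step_size=2 / 255, num_steps=10):
    """Generate l_inf PGD adversarial examples (illustrative sketch)."""
    # Start from a random point inside the epsilon-ball around x.
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    x_adv = torch.clamp(x_adv, 0.0, 1.0)

    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss along the gradient sign (first-order attack).
        x_adv = x_adv.detach() + step_size * grad.sign()
        # Project back into the epsilon-ball and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()
```

Adversarial training then simply replaces the natural minibatch with the output of such an attack before the usual cross-entropy update; FGSM corresponds to a single step of this loop.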
### 2.2 Neural Architecture Search

NAS aims to automatically design neural architectures. Early NAS methods [1, 35, 16, 19] are computationally intensive, requiring hundreds or thousands of GPU hours because they demand the training and evaluation of a large number of architectures. Recently, the differentiable and one-shot NAS approaches of Liu et al. [17] and Xu et al. [29] construct a one-shot supernetwork and optimize the architecture parameter with gradient descent, which dramatically reduces the computational overhead. Differentiable neural architecture search allows joint and differentiable optimization of the model weights and the architecture parameter using gradient descent. Due to the parallel training of multiple architectures, DARTS [17] is memory consuming. Several follow-up works aim to reduce the memory cost and improve the efficiency of NAS. One remarkable approach among them is PC-DARTS [29], which utilizes a partial channel connections technique in which only a sampled subset of the channels of the intermediate features is processed, thereby reducing memory usage and computational cost. Besides directly reducing the training cost, CARS [30] proposes an efficient continuous evolutionary approach based on historical evaluations. Similarly, PVLL-NAS [14] performs evaluation with a performance estimator, which samples neural architectures both for architecture search and for iterative training of the estimator itself.

Considering adversarial attacks in the optimization of neural architectures can help design networks that are inherently resistant to adversarial attacks. RACL [5] applies a constraint on the architecture parameter in differentiable one-shot NAS to reduce the Lipschitz constant. Previous works [3, 26] have shown that a smaller Lipschitz constant generally corresponds to a more robust network; it is therefore effective to improve the robustness of neural architectures by constraining their Lipschitz constant. RobNet [8] directly optimizes the architecture by adversarial training with PGD.

## 3 Methodology

Adversarial training can be considered a minimax problem, in which adversarial perturbations are generated to attack the network by maximizing the classification loss, and the network is optimized to defend against such attacks:

$$
\min_{f} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ \max_{x' \in \mathcal{B}_p(x, \varepsilon)} \ell\big(y, f(x')\big) \right], \tag{1}
$$

where $\mathcal{D}$ is the distribution of the natural examples $x$ and labels $y$, $\mathcal{B}_p(x, \varepsilon) = \{x' : \|x' - x\|_p \le \varepsilon\}$ is the set of allowed adversarial examples $x'$ within a small perturbation scale $\varepsilon$ under the $\ell_p$ norm, and $f$ is the network under attack.
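To make the minimax objective in Eq. (1) concrete, the sketch below pairs an inner-maximization attack (such as the PGD sketch in Section 2.1) with an outer minimization over the network weights. This is a generic, hedged illustration in PyTorch; the function name, optimizer choice, and loop structure are illustrative assumptions rather than the paper's exact training procedure.

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, attack_fn, device="cuda"):
    """One epoch of Eq. (1): inner max via attack_fn, outer min via SGD (sketch)."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # Inner maximization: craft worst-case examples inside the epsilon-ball.
        x_adv = attack_fn(model, x, y)
        # Outer minimization: update the weights on the adversarial examples.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```

Here `attack_fn` could be the `pgd_attack` sketch above; swapping in a different attack changes only the inner maximization.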
### 3.1 Robust Architecture Dilation

*Figure 1: The overall structure of a NADAR hybrid network (the backbone network combined with the dilation network).*

The capacity of a deep neural network has been demonstrated to be critical to its adversarial robustness [18, 25, 34]. Madry et al. [18] find that capacity plays an important role in adversarial robustness and that networks require larger capacity for adversarial tasks than for standard ones. Tsipras et al. [25] suggest that a simple classifier sufficient for standard tasks cannot reach good performance on adversarial tasks. However, it remains an open question how to obtain adversarial robustness with a minimal increase of network capacity.

Suppose that we have a backbone network $f_b$ that achieves a satisfactory accuracy on natural data. To strengthen its adversarial robustness without hurting the standard accuracy, we propose to increase the capacity of this backbone network $f_b$ by dilating it with a network $f_d$, whose architecture and parameters are optimized within adversarial training.

The backbone network $f_b$ is split into blocks. A block $f_b^{(l)}$ is defined as a set of successive layers in the backbone with the same resolution. For a backbone with $L$ blocks, i.e., $f_b = \{f_b^{(l)}, l \in 1, \dots, L\}$, we attach a cell $f_d^{(l)}$ of the dilation network to each block $f_b^{(l)}$. Therefore, the dilation network also has $L$ cells, i.e., $f_d = \{f_d^{(l)}, l \in 1, \dots, L\}$. For the dilation architecture, we search for cells within a NASNet-like [35] search space, in which each cell takes the two previous outputs as its inputs. The backbone and the dilation network are aggregated by element-wise summation. The overall structure of a NADAR hybrid network is shown in Figure 1. Formally, the hybrid network for adversarial training is defined as

$$
f_{\mathrm{hyb}}(x) = h \circ \Big( \bigcirc_{l=1,\dots,L} \big[ f_b^{(l)}(z_{\mathrm{hyb}}^{(l-1)}) + f_d^{(l)}(z_{\mathrm{hyb}}^{(l-1)}, z_{\mathrm{hyb}}^{(l-2)}) \big] \Big), \tag{2}
$$

where $z_{\mathrm{hyb}}^{(l)} = f_b^{(l)}(z_{\mathrm{hyb}}^{(l-1)}) + f_d^{(l)}(z_{\mathrm{hyb}}^{(l-1)}, z_{\mathrm{hyb}}^{(l-2)})$ is the latent feature extracted by the $l$-th backbone block and dilation cell, and $\circ$ denotes functional composition. We also define a classification hypothesis $h: z_{\mathrm{hyb}}^{(L)} \mapsto \hat{y}$, where $z_{\mathrm{hyb}}^{(L)}$ is the latent representation extracted by the last convolutional block $L$ and $\hat{y}$ is the predicted label.

During search, the backbone network $f_b$ has a fixed architecture and is parameterized by network weights $\theta_b$. The dilation network $f_d$ is parameterized not only by network weights $\theta_d$ but also by the architecture parameter $\alpha_d$. The objective of robust architecture dilation is to optimize $\alpha_d$ for the minimal adversarial loss:

$$
\min_{\alpha_d} \; \mathcal{L}^{(\mathrm{adv})}_{\mathrm{valid}}\big(f_{\mathrm{hyb}}; \theta_d^*(\alpha_d)\big), \tag{3}
$$
$$
\text{s.t.} \quad \theta_d^*(\alpha_d) = \arg\min_{\theta_d} \mathcal{L}^{(\mathrm{adv})}_{\mathrm{train}}(f_{\mathrm{hyb}}), \tag{4}
$$

where $\mathcal{L}^{(\mathrm{adv})}_{\mathrm{train}}(f_{\mathrm{hyb}})$ and $\mathcal{L}^{(\mathrm{adv})}_{\mathrm{valid}}(f_{\mathrm{hyb}}; \theta_d^*(\alpha_d))$ are the adversarial losses of $f_{\mathrm{hyb}}$ (with the form of Eq. 1) on the training set $\mathcal{D}_{\mathrm{train}}$ and the validation set $\mathcal{D}_{\mathrm{valid}}$, respectively, and $\theta_d^*(\alpha_d)$ are the optimal network weights of $f_d$ given the current dilation architecture $\alpha_d$.

### 3.2 Standard Performance Constraint

Existing works on adversarial robustness often fix the network capacity, so the increase in adversarial robustness is accompanied by a drop in standard accuracy [25, 32]. In this work, however, we increase the capacity with dilation, which allows us to increase the robustness while maintaining a competitive standard accuracy. We achieve this with a standard performance constraint on the dilation architecture. The constraint compares the standard performance of the hybrid network $f_{\mathrm{hyb}}$ with that of the backbone. We denote the network using the backbone only as $f_{\mathrm{bck}}$, formally defined as

$$
f_{\mathrm{bck}}(x) = h \circ \Big( \bigcirc_{l=1,\dots,L} f_b^{(l)}(z_{\mathrm{bck}}^{(l-1)}) \Big), \tag{5}
$$

where $z_{\mathrm{bck}}^{(l)} = f_b^{(l)}(z_{\mathrm{bck}}^{(l-1)})$ is the latent feature extracted by the $l$-th backbone block. The standard model is optimized with natural examples by

$$
\min_{\theta_b} \; \mathcal{L}^{(\mathrm{std})}(f_{\mathrm{bck}}) = \mathbb{E}_{(x, y) \sim \mathcal{D}} \big[ \ell\big(y, f_{\mathrm{bck}}(x)\big) \big], \tag{6}
$$

where $\mathcal{L}^{(\mathrm{std})}$ is the standard loss. Similarly, we can define the standard loss $\mathcal{L}^{(\mathrm{std})}(f_{\mathrm{hyb}})$ for the hybrid network $f_{\mathrm{hyb}}$. In this way, we can compare the two networks by the difference of their losses and constrain the standard loss of the hybrid network to be equal to or lower than that of the backbone-only network:

$$
\mathcal{L}^{(\mathrm{std})}(f_{\mathrm{hyb}}) - \mathcal{L}^{(\mathrm{std})}(f_{\mathrm{bck}}) \le 0. \tag{7}
$$
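The sketch below shows one way the hybrid forward pass of Eq. (2) and the constraint of Eq. (7) might be wired together in PyTorch. All names (`HybridNetwork`, `backbone_blocks`, `dilation_cells`, `classifier`) are hypothetical stand-ins for $f_b$, $f_d$, and $h$, and the hinge-penalty form of the constraint is our own illustrative choice, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridNetwork(nn.Module):
    """Backbone blocks dilated with searchable cells, aggregated by element-wise sum.

    A sketch of Eq. (2); blocks and cells are assumed to produce features of
    matching shape at each stage (resolution handling is omitted for brevity).
    """

    def __init__(self, backbone_blocks, dilation_cells, classifier):
        super().__init__()
        self.backbone_blocks = nn.ModuleList(backbone_blocks)  # f_b^(1..L), fixed architecture
        self.dilation_cells = nn.ModuleList(dilation_cells)    # f_d^(1..L), searched cells
        self.classifier = classifier                           # hypothesis h

    def forward(self, x):
        z_prev, z_prev_prev = x, x  # z^(l-1), z^(l-2); both start from the input
        for block, cell in zip(self.backbone_blocks, self.dilation_cells):
            z = block(z_prev) + cell(z_prev, z_prev_prev)  # element-wise sum of Eq. (2)
            z_prev_prev, z_prev = z_prev, z
        return self.classifier(z_prev)

def standard_constraint_penalty(hybrid, backbone_only, x, y):
    """Hinge penalty for Eq. (7): penalize L_std(f_hyb) - L_std(f_bck) > 0 (sketch)."""
    loss_hyb = F.cross_entropy(hybrid(x), y)
    with torch.no_grad():
        loss_bck = F.cross_entropy(backbone_only(x), y)  # backbone is kept fixed
    return torch.clamp(loss_hyb - loss_bck, min=0.0)
```

In a search step, such a penalty could simply be added to the adversarial validation loss of Eq. (3), so that dilation architectures that degrade the standard loss below the backbone's level are discouraged.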
We do not directly optimize the dilation architecture on the standard task, because the dilation is introduced to capture the difference between the standard and adversarial tasks and thereby improve the robustness of the standard-trained backbone. It is unnecessary for both the backbone network and the dilation network to learn the standard task.

### 3.3 FLOPs-Aware Architecture Optimization

By enlarging the capacity of the network we can improve its robustness, but a drawback is that the model size and computation cost increase. We want to obtain the largest robustness improvement with the lowest computation overhead; therefore, a computation budget constraint is applied to the architecture search. As we are not targeting any specific platform, we consider the number of floating-point operations (FLOPs) of the architecture rather than its inference latency. The FLOPs are calculated by counting the number of multiply-add operations in the network.

We optimize the dilation architecture in a differentiable manner. In differentiable NAS, a directed acyclic graph (DAG) is constructed as the supernetwork, whose nodes are latent representations and whose edges are operations. Given that adversarial training is computationally intensive, we utilize the partial channel connections technique proposed by Xu et al. [29] to reduce the search cost. During search, the operation candidates on each edge are weighted and summed according to a softmax distribution over the architecture parameter $\alpha$:

$$
\bar{o}^{(i,j)}(x_i) = (1 - S_{i,j}) \odot x_i + \sum_{o \in \mathcal{O}} \frac{\exp\big(\alpha^{(o)}_{i,j}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha^{(o')}_{i,j}\big)} \, o(S_{i,j} \odot x_i),
$$

where $\mathcal{O}$ is the set of operation candidates, $x_i$ is the output of the $i$-th node, and $S_{i,j}$ is a binary mask on edge $(i, j)$ for partial channel connections. The binary mask $S_{i,j}$ is set to 1 or 0 to select or bypass a channel, respectively. Besides the architecture parameter $\alpha$, the partial channel connections technique also introduces an edge normalization weight $\beta$, normalized over the incoming edges of each node as $\exp(\beta_{i,j}) / \sum_{i' < j} \exp(\beta_{i',j})$.
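The partial channel connection above can be sketched as follows, using the practical concatenation form of the formula (the processed channel subset is concatenated with the bypassed channels). The candidate operation list, the fixed channel-sampling ratio `1/k`, and the class name are illustrative assumptions, not the exact NADAR implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialChannelMixedOp(nn.Module):
    """Softmax-weighted mixture of candidate ops on a sampled channel subset (sketch)."""

    def __init__(self, candidate_ops, channels, k=4):
        super().__init__()
        # Candidate set O; each op is assumed to be built for channels // k channels
        # and to preserve both the channel count and the spatial resolution.
        self.ops = nn.ModuleList(candidate_ops)
        self.alpha = nn.Parameter(torch.zeros(len(candidate_ops)))  # architecture parameter alpha_(i,j)
        self.num_selected = channels // k  # only 1/k of the channels are processed

    def forward(self, x):
        # S_(i,j): process the first 1/k channels and bypass the rest
        # (PC-DARTS samples the subset; a fixed split keeps the sketch simple).
        x_sel, x_bypass = x[:, :self.num_selected], x[:, self.num_selected:]
        weights = F.softmax(self.alpha, dim=0)  # softmax over operation candidates
        mixed = sum(w * op(x_sel) for w, op in zip(weights, self.ops))
        # (1 - S) ⊙ x passes through unchanged; concatenate it back with the mixed output.
        return torch.cat([mixed, x_bypass], dim=1)
```

During search, `alpha` (together with the edge weights $\beta$) would be updated on the adversarial validation loss of Eq. (3), while the weights inside the candidate operations follow the inner problem of Eq. (4).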