# Neural Architecture Dilation for Adversarial Robustness

Yanxi Li 1, Zhaohui Yang 2,3, Yunhe Wang 2, Chang Xu 1

1 School of Computer Science, University of Sydney, Australia; 2 Huawei Noah's Ark Lab; 3 Key Lab of Machine Perception (MOE), Department of Machine Intelligence, Peking University, China

yali0722@uni.sydney.edu.au, zhaohuiyang@pku.edu.cn, yunhe.wang@huawei.com, c.xu@sydney.edu.au

## Abstract

With the tremendous advances in the architecture and scale of convolutional neural networks (CNNs) over the past few decades, they can easily reach or even exceed the performance of humans in certain tasks. However, a recently discovered shortcoming of CNNs is that they are vulnerable to adversarial attacks. Although the adversarial robustness of CNNs can be improved by adversarial training, there is a trade-off between standard accuracy and adversarial robustness. From the neural architecture perspective, this paper aims to improve the adversarial robustness of backbone CNNs that already have a satisfactory accuracy. Under a minimal computational overhead, the introduced dilation architecture is expected to be friendly to the standard performance of the backbone CNN while pursuing adversarial robustness. Theoretical analyses of the standard and adversarial error bounds naturally motivate the proposed neural architecture dilation algorithm. Experimental results on real-world datasets and benchmark neural networks demonstrate the effectiveness of the proposed algorithm in balancing accuracy and adversarial robustness.

## 1 Introduction

In the past few decades, novel architecture design and network scale expansion have achieved significant success in the development of convolutional neural networks (CNNs) [12, 13, 11, 23, 9, 31, 1, 35, 17, 15]. These advanced neural networks can already reach or even exceed the performance of humans in certain tasks [10, 21]. Despite this success, a recently discovered shortcoming of CNNs is that they are vulnerable to adversarial attacks: ingeniously designed small perturbations, when applied to images, can mislead the networks into predicting incorrect labels [7]. This vulnerability notably reduces the reliability of CNNs in practical applications. Hence, developing solutions that increase the adversarial robustness of CNNs against adversarial attacks has attracted particular attention from researchers.

Adversarial training is arguably the most standard defense approach; it augments the training data with adversarial examples, which are often generated by the fast gradient sign method (FGSM) [7] or projected gradient descent (PGD) [18]. Tramèr et al. [24] investigate the adversarial examples produced by a number of pre-trained models and develop an ensemble adversarial training method. Focusing on the worst-case loss over a convex outer region, Wong and Kolter [27] introduce a provably robust model. There are further improvements of PGD adversarial training, including Lipschitz regularization [6] and curriculum adversarial training [2].

As shown in a recent study by Tsipras et al. [25], there exists a trade-off between standard accuracy and adversarial robustness: after networks have been trained to defend against adversarial attacks, their performance on natural image classification can be negatively affected. TRADES [32] theoretically studies this trade-off by introducing a boundary error between the natural (i.e., standard) error and the robust error.
Instead of directly adjusting the trade-off, friendly adversarial training (FAT) [33] proposes to exploit weak adversarial examples so that the standard accuracy drops only slightly.

Numerous efforts have been made to defend against adversarial attacks by carefully designing various training objective functions. Less noticed, however, is that the neural architecture itself bounds the performance of the network. Recently, there have been a few attempts to analyze the adversarial robustness of neural networks from the architecture perspective. For example, RACL [5] applies a Lipschitz constraint on the architecture parameters in one-shot NAS to reduce the Lipschitz constant and improve robustness. RobNet [8] searches for adversarially robust network architectures directly with adversarial training. Despite these studies, a deeper understanding of the accuracy-robustness trade-off from the architecture perspective is still largely missing.

In this paper, we focus on designing neural networks that are sufficient for both standard and adversarial classification from the architecture perspective. We propose neural architecture dilation for adversarial robustness (NADAR). Beginning with a backbone network that has a satisfactory accuracy on natural data, we search for a dilation architecture that pursues a maximal robustness gain while incurring a minimal accuracy drop. In addition, we apply a FLOPs-aware approach to optimize the architecture, which prevents the dilation from increasing the computation cost of the network too much. We theoretically analyze our dilation framework and prove that the constrained optimization objectives effectively realize these motivations. Experimental results on benchmark datasets demonstrate the significance of studying adversarial robustness from the architecture perspective and the effectiveness of the proposed algorithm.

## 2 Related Works

### 2.1 Adversarial Training

FGSM [7] attributes the adversarial vulnerability of neural networks to their linear nature rather than to the nonlinearity and overfitting previously suspected, and based on this perspective proposes a method to generate adversarial examples for adversarial training to reduce the adversarial error. PGD [18] studies adversarial robustness from the view of robust optimization and proposes a first-order, gradient-based method for iterative adversarial example generation. FreeAT [22] reduces the computational overhead of generating adversarial examples: the gradient information computed during network training is recycled to generate adversarial examples, achieving a 7 to 30 times speedup.

However, adversarial robustness comes at a price. Tsipras et al. [25] reveal that there is a trade-off between standard accuracy and adversarial robustness because the features learned by the optimal standard classifier and the optimal robust classifier differ. TRADES [32] theoretically analyzes this trade-off: a boundary error is identified between the standard and adversarial errors to guide the design of defenses against adversarial attacks, and a tuning parameter λ is introduced into the framework to adjust the trade-off. Friendly adversarial training (FAT) [33] generates weak adversarial examples that satisfy a minimal margin of loss; the misclassified adversarial examples with the lowest classification loss are selected for adversarial training.
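As a concrete illustration of the iterative attacks discussed above, the sketch below implements a PGD-style $\ell_\infty$ attack in PyTorch. It is a minimal, hedged example: the function name, the random start, and the default hyperparameters (perturbation budget, step size, number of steps) are illustrative choices and are not taken from this paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, step_size=2 / 255, num_steps=10):
    """Generate l_inf PGD adversarial examples (illustrative sketch)."""
    # Start from a random point inside the epsilon-ball around x.
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    x_adv = torch.clamp(x_adv, 0.0, 1.0)

    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss along the gradient sign (first-order attack).
        x_adv = x_adv.detach() + step_size * grad.sign()
        # Project back into the epsilon-ball and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()
```

Adversarial training then simply replaces the natural minibatch with the output of such an attack before the usual cross-entropy update; FGSM corresponds to a single step of this loop.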
### 2.2 Neural Architecture Search

NAS aims to automatically design neural architectures. Early NAS methods [1, 35, 16, 19] are computationally intensive, requiring hundreds or thousands of GPU hours because they demand the training and evaluation of a large number of architectures. Recently, the differentiable and one-shot NAS approaches of Liu et al. [17] and Xu et al. [29] construct a one-shot supernetwork and optimize the architecture parameter with gradient descent, which dramatically reduces the computational overhead. Differentiable neural architecture search allows joint and differentiable optimization of the model weights and the architecture parameter using gradient descent. Due to the parallel training of multiple architectures, DARTS [17] is memory consuming. Several follow-up works aim to reduce the memory cost and improve the efficiency of NAS. One remarkable approach among them is PC-DARTS [29], which utilizes a partial channel connections technique in which only a sampled subset of the channels of the intermediate features is processed, thereby reducing memory usage and computational cost. Besides directly reducing the training cost, CARS [30] proposes an efficient continuous evolutionary approach based on historical evaluations. Similarly, PVLL-NAS [14] performs evaluation with a performance estimator, which samples neural architectures both for architecture search and for iterative training of the estimator itself.

Considering adversarial attacks in the optimization of neural architectures can help design networks that are inherently resistant to adversarial attacks. RACL [5] applies a constraint on the architecture parameter in differentiable one-shot NAS to reduce the Lipschitz constant. Previous works [3, 26] have shown that a smaller Lipschitz constant generally corresponds to a more robust network; it is therefore effective to improve the robustness of neural architectures by constraining their Lipschitz constant. RobNet [8] directly optimizes the architecture by adversarial training with PGD.

## 3 Methodology

Adversarial training can be considered a minimax problem, in which adversarial perturbations are generated to attack the network by maximizing the classification loss, and the network is optimized to defend against such attacks:

$$
\min_{f} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ \max_{x' \in \mathcal{B}_p(x, \varepsilon)} \ell\big(y, f(x')\big) \right], \tag{1}
$$

where $\mathcal{D}$ is the distribution of the natural examples $x$ and labels $y$, $\mathcal{B}_p(x, \varepsilon) = \{x' : \|x' - x\|_p \le \varepsilon\}$ is the set of allowed adversarial examples $x'$ within a small perturbation scale $\varepsilon$ under the $\ell_p$ norm, and $f$ is the network under attack.
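To make the minimax objective in Eq. (1) concrete, the sketch below pairs an inner-maximization attack (such as the PGD sketch in Section 2.1) with an outer minimization over the network weights. This is a generic, hedged illustration in PyTorch; the function name, optimizer choice, and loop structure are illustrative assumptions rather than the paper's exact training procedure.

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, attack_fn, device="cuda"):
    """One epoch of Eq. (1): inner max via attack_fn, outer min via SGD (sketch)."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # Inner maximization: craft worst-case examples inside the epsilon-ball.
        x_adv = attack_fn(model, x, y)
        # Outer minimization: update the weights on the adversarial examples.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```

Here `attack_fn` could be the `pgd_attack` sketch above; swapping in a different attack changes only the inner maximization.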
### 3.1 Robust Architecture Dilation

*Figure 1: The overall structure of a NADAR hybrid network (the backbone network combined with the dilation network).*

The capacity of a deep neural network has been demonstrated to be critical to its adversarial robustness [18, 25, 34]. Madry et al. [18] find that capacity plays an important role in adversarial robustness and that networks require larger capacity for adversarial tasks than for standard ones. Tsipras et al. [25] suggest that a simple classifier sufficient for standard tasks cannot reach good performance on adversarial tasks. However, it remains an open question how to obtain adversarial robustness with a minimal increase of network capacity.

Suppose that we have a backbone network $f_b$ that achieves a satisfactory accuracy on natural data. To strengthen its adversarial robustness without hurting the standard accuracy, we propose to increase the capacity of this backbone network $f_b$ by dilating it with a network $f_d$, whose architecture and parameters are optimized within adversarial training.

The backbone network $f_b$ is split into blocks. A block $f_b^{(l)}$ is defined as a set of successive layers in the backbone with the same resolution. For a backbone with $L$ blocks, i.e., $f_b = \{f_b^{(l)}, l \in 1, \dots, L\}$, we attach a cell $f_d^{(l)}$ of the dilation network to each block $f_b^{(l)}$. Therefore, the dilation network also has $L$ cells, i.e., $f_d = \{f_d^{(l)}, l \in 1, \dots, L\}$. For the dilation architecture, we search for cells within a NASNet-like [35] search space, in which each cell takes the two previous outputs as its inputs. The backbone and the dilation network are aggregated by element-wise summation. The overall structure of a NADAR hybrid network is shown in Figure 1. Formally, the hybrid network for adversarial training is defined as

$$
f_{\mathrm{hyb}}(x) = h \circ \Big( \bigcirc_{l=1,\dots,L} \big[ f_b^{(l)}(z_{\mathrm{hyb}}^{(l-1)}) + f_d^{(l)}(z_{\mathrm{hyb}}^{(l-1)}, z_{\mathrm{hyb}}^{(l-2)}) \big] \Big), \tag{2}
$$

where $z_{\mathrm{hyb}}^{(l)} = f_b^{(l)}(z_{\mathrm{hyb}}^{(l-1)}) + f_d^{(l)}(z_{\mathrm{hyb}}^{(l-1)}, z_{\mathrm{hyb}}^{(l-2)})$ is the latent feature extracted by the $l$-th backbone block and dilation cell, and $\circ$ denotes functional composition. We also define a classification hypothesis $h: z_{\mathrm{hyb}}^{(L)} \mapsto \hat{y}$, where $z_{\mathrm{hyb}}^{(L)}$ is the latent representation extracted by the last convolutional block $L$ and $\hat{y}$ is the predicted label.

During search, the backbone network $f_b$ has a fixed architecture and is parameterized by network weights $\theta_b$. The dilation network $f_d$ is parameterized not only by network weights $\theta_d$ but also by the architecture parameter $\alpha_d$. The objective of robust architecture dilation is to optimize $\alpha_d$ for the minimal adversarial loss:

$$
\min_{\alpha_d} \; \mathcal{L}^{(\mathrm{adv})}_{\mathrm{valid}}\big(f_{\mathrm{hyb}}; \theta_d^*(\alpha_d)\big), \tag{3}
$$
$$
\text{s.t.} \quad \theta_d^*(\alpha_d) = \arg\min_{\theta_d} \mathcal{L}^{(\mathrm{adv})}_{\mathrm{train}}(f_{\mathrm{hyb}}), \tag{4}
$$

where $\mathcal{L}^{(\mathrm{adv})}_{\mathrm{train}}(f_{\mathrm{hyb}})$ and $\mathcal{L}^{(\mathrm{adv})}_{\mathrm{valid}}(f_{\mathrm{hyb}}; \theta_d^*(\alpha_d))$ are the adversarial losses of $f_{\mathrm{hyb}}$ (with the form of Eq. 1) on the training set $\mathcal{D}_{\mathrm{train}}$ and the validation set $\mathcal{D}_{\mathrm{valid}}$, respectively, and $\theta_d^*(\alpha_d)$ are the optimal network weights of $f_d$ given the current dilation architecture $\alpha_d$.

### 3.2 Standard Performance Constraint

Existing works on adversarial robustness often fix the network capacity, so the increase in adversarial robustness is accompanied by a drop in standard accuracy [25, 32]. In this work, however, we increase the capacity with dilation, which allows us to increase the robustness while maintaining a competitive standard accuracy. We achieve this with a standard performance constraint on the dilation architecture. The constraint compares the standard performance of the hybrid network $f_{\mathrm{hyb}}$ with that of the backbone. We denote the network using the backbone only as $f_{\mathrm{bck}}$, formally defined as

$$
f_{\mathrm{bck}}(x) = h \circ \Big( \bigcirc_{l=1,\dots,L} f_b^{(l)}(z_{\mathrm{bck}}^{(l-1)}) \Big), \tag{5}
$$

where $z_{\mathrm{bck}}^{(l)} = f_b^{(l)}(z_{\mathrm{bck}}^{(l-1)})$ is the latent feature extracted by the $l$-th backbone block. The standard model is optimized with natural examples by

$$
\min_{\theta_b} \; \mathcal{L}^{(\mathrm{std})}(f_{\mathrm{bck}}) = \mathbb{E}_{(x, y) \sim \mathcal{D}} \big[ \ell\big(y, f_{\mathrm{bck}}(x)\big) \big], \tag{6}
$$

where $\mathcal{L}^{(\mathrm{std})}$ is the standard loss. Similarly, we can define the standard loss $\mathcal{L}^{(\mathrm{std})}(f_{\mathrm{hyb}})$ for the hybrid network $f_{\mathrm{hyb}}$. In this way, we can compare the two networks by the difference of their losses and constrain the standard loss of the hybrid network to be equal to or lower than that of the backbone-only network:

$$
\mathcal{L}^{(\mathrm{std})}(f_{\mathrm{hyb}}) - \mathcal{L}^{(\mathrm{std})}(f_{\mathrm{bck}}) \le 0. \tag{7}
$$
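The sketch below shows one way the hybrid forward pass of Eq. (2) and the constraint of Eq. (7) might be wired together in PyTorch. All names (`HybridNetwork`, `backbone_blocks`, `dilation_cells`, `classifier`) are hypothetical stand-ins for $f_b$, $f_d$, and $h$, and the hinge-penalty form of the constraint is our own illustrative choice, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridNetwork(nn.Module):
    """Backbone blocks dilated with searchable cells, aggregated by element-wise sum.

    A sketch of Eq. (2); blocks and cells are assumed to produce features of
    matching shape at each stage (resolution handling is omitted for brevity).
    """

    def __init__(self, backbone_blocks, dilation_cells, classifier):
        super().__init__()
        self.backbone_blocks = nn.ModuleList(backbone_blocks)  # f_b^(1..L), fixed architecture
        self.dilation_cells = nn.ModuleList(dilation_cells)    # f_d^(1..L), searched cells
        self.classifier = classifier                           # hypothesis h

    def forward(self, x):
        z_prev, z_prev_prev = x, x  # z^(l-1), z^(l-2); both start from the input
        for block, cell in zip(self.backbone_blocks, self.dilation_cells):
            z = block(z_prev) + cell(z_prev, z_prev_prev)  # element-wise sum of Eq. (2)
            z_prev_prev, z_prev = z_prev, z
        return self.classifier(z_prev)

def standard_constraint_penalty(hybrid, backbone_only, x, y):
    """Hinge penalty for Eq. (7): penalize L_std(f_hyb) - L_std(f_bck) > 0 (sketch)."""
    loss_hyb = F.cross_entropy(hybrid(x), y)
    with torch.no_grad():
        loss_bck = F.cross_entropy(backbone_only(x), y)  # backbone is kept fixed
    return torch.clamp(loss_hyb - loss_bck, min=0.0)
```

In a search step, such a penalty could simply be added to the adversarial validation loss of Eq. (3), so that dilation architectures that degrade the standard loss below the backbone's level are discouraged.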
We do not directly optimize the dilation architecture on the standard task, because the dilation is introduced to capture the difference between the standard and adversarial tasks and thereby improve the robustness of the standard-trained backbone. It is unnecessary for both the backbone network and the dilation network to learn the standard task.

### 3.3 FLOPs-Aware Architecture Optimization

By enlarging the capacity of the network we can improve its robustness, but a drawback is that the model size and computation cost increase. We want to obtain the largest robustness improvement with the lowest computation overhead; therefore, a computation budget constraint is applied to the architecture search. As we are not targeting any specific platform, we consider the number of floating-point operations (FLOPs) of the architecture rather than its inference latency. The FLOPs are calculated by counting the number of multiply-add operations in the network.

We optimize the dilation architecture in a differentiable manner. In differentiable NAS, a directed acyclic graph (DAG) is constructed as the supernetwork, whose nodes are latent representations and whose edges are operations. Given that adversarial training is computationally intensive, we utilize the partial channel connections technique proposed by Xu et al. [29] to reduce the search cost. During search, the operation candidates on each edge are weighted and summed according to a softmax distribution over the architecture parameter $\alpha$:

$$
\bar{o}^{(i,j)}(x_i) = (1 - S_{i,j}) \odot x_i + \sum_{o \in \mathcal{O}} \frac{\exp\big(\alpha^{(o)}_{i,j}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha^{(o')}_{i,j}\big)} \, o(S_{i,j} \odot x_i),
$$

where $\mathcal{O}$ is the set of operation candidates, $x_i$ is the output of the $i$-th node, and $S_{i,j}$ is a binary mask on edge $(i, j)$ for partial channel connections. The binary mask $S_{i,j}$ is set to 1 or 0 to select or bypass a channel, respectively. Besides the architecture parameter $\alpha$, the partial channel connections technique also introduces an edge normalization weight $\beta$, normalized over the incoming edges of each node as $\exp(\beta_{i,j}) / \sum_{i' < j} \exp(\beta_{i',j})$.
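The partial channel connection above can be sketched as follows, using the practical concatenation form of the formula (the processed channel subset is concatenated with the bypassed channels). The candidate operation list, the fixed channel-sampling ratio `1/k`, and the class name are illustrative assumptions, not the exact NADAR implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialChannelMixedOp(nn.Module):
    """Softmax-weighted mixture of candidate ops on a sampled channel subset (sketch)."""

    def __init__(self, candidate_ops, channels, k=4):
        super().__init__()
        # Candidate set O; each op is assumed to be built for channels // k channels
        # and to preserve both the channel count and the spatial resolution.
        self.ops = nn.ModuleList(candidate_ops)
        self.alpha = nn.Parameter(torch.zeros(len(candidate_ops)))  # architecture parameter alpha_(i,j)
        self.num_selected = channels // k  # only 1/k of the channels are processed

    def forward(self, x):
        # S_(i,j): process the first 1/k channels and bypass the rest
        # (PC-DARTS samples the subset; a fixed split keeps the sketch simple).
        x_sel, x_bypass = x[:, :self.num_selected], x[:, self.num_selected:]
        weights = F.softmax(self.alpha, dim=0)  # softmax over operation candidates
        mixed = sum(w * op(x_sel) for w, op in zip(weights, self.ops))
        # (1 - S) ⊙ x passes through unchanged; concatenate it back with the mixed output.
        return torch.cat([mixed, x_bypass], dim=1)
```

During search, `alpha` (together with the edge weights $\beta$) would be updated on the adversarial validation loss of Eq. (3), while the weights inside the candidate operations follow the inner problem of Eq. (4).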