# Sparse DNNs with Improved Adversarial Robustness

Yiwen Guo¹,²  Chao Zhang³  Changshui Zhang²  Yurong Chen¹

¹ Intel Labs China
² Institute for Artificial Intelligence, Tsinghua University (THUAI), State Key Lab of Intelligent Technologies and Systems, Beijing National Research Center for Information Science and Technology (BNRist), Department of Automation, Tsinghua University
³ Academy for Advanced Interdisciplinary Studies, Center for Data Science, Peking University

{yiwen.guo, yurong.chen}@intel.com, pkuzc@pku.edu.cn, zcs@mail.tsinghua.edu.cn

## Abstract

Deep neural networks (DNNs) are computationally/memory-intensive and vulnerable to adversarial attacks, which makes them prohibitive in some real-world applications. By converting dense models into sparse ones, pruning appears to be a promising solution to reducing the computation/memory cost. This paper studies classification models, especially DNN-based ones, to demonstrate that there exist intrinsic relationships between their sparsity and adversarial robustness. Our analyses reveal, both theoretically and empirically, that nonlinear DNN-based classifiers behave differently under $\ell_2$ attacks from some linear ones. We further demonstrate that an appropriately higher model sparsity implies better robustness of nonlinear DNNs, whereas over-sparsified models can be less capable of resisting adversarial examples.

## 1 Introduction

Although deep neural networks (DNNs) have advanced the state of the art of many artificial intelligence techniques, some undesired properties may hinder them from being deployed in real-world applications. With the continued proliferation of deep-learning-powered applications, one major concern raised recently is the heavy computation and storage burden that DNN models lay upon mobile platforms. Such burden stems from substantially redundant feature representations and parameterizations [6]. To address this issue and make DNNs less resource-intensive, a variety of solutions have been proposed. In particular, it has been reported that more than 90% of the connections in a well-trained DNN can be removed using pruning strategies [14, 13, 28, 21, 23], while no accuracy loss is observed. Such remarkable network sparsity leads to considerable compression and speedups on both GPUs and CPUs [25]. Aside from being efficient, sparse representations are theoretically attractive [2, 8] and have made their way into tremendous applications over the past decade.

Orthogonal to the inefficiency issue, it has also been discovered that DNN models are vulnerable to adversarial examples: maliciously generated images which are perceptually similar to benign ones but can fool classifiers into making arbitrary predictions [26, 3]. Furthermore, generic regularizations (e.g., dropout and weight decay) do not really help in resisting adversarial attacks [11]. Such an undesirable property may prohibit DNNs from being applied to security-sensitive applications. The cause of this phenomenon seems mysterious and remains an open question. One reasonable explanation is the local linearity of modern DNNs [11]. Quite a lot of attempts, including adversarial training [11, 27, 19], knowledge distillation [24], detecting and rejecting [18], and some gradient masking techniques like randomization [31], have been made to ameliorate this issue and defend against adversarial attacks.

The first two authors contributed equally to this work.
It is crucial to study potential relationships between the inefficiency (i.e., redundancy) and adversarial robustness of classifiers, in consideration of the inclination to avoid "robbing Peter to pay Paul", if possible. Towards shedding light on such relationships, especially for DNNs, we provide comprehensive analyses in this paper from both the theoretical and empirical perspectives. By introducing reasonable metrics, we reveal, somewhat surprisingly, that there is a discrepancy between the robustness of sparse linear classifiers and that of nonlinear DNNs under $\ell_2$ attacks. Our results also demonstrate that an appropriately higher sparsity implies better robustness of nonlinear DNNs, whereas over-sparsified models can be less capable of resisting adversarial examples, under both the $\ell_\infty$ and $\ell_2$ circumstances.

## 2 Related Works

In light of the Occam's razor principle, we presume there exist intrinsic relationships between the sparsity and robustness of classifiers, and thus perform a comprehensive study in this paper. Our theoretical and empirical analyses shall cover both linear classifiers and nonlinear DNNs, in which the middle-layer activations and connection weights can all become sparse.

The (in)efficiency and robustness of DNNs have seldom been discussed together, especially from a theoretical point of view. Very recently, Gopalakrishnan et al. [12, 20] propose to sparsify the input representations as a defense and provide provable evidence of resisting $\ell_\infty$ attacks. Though intriguing, their theoretical analyses are limited to only linear and binary classification cases. Contemporaneous with our work, Wang et al. [29] and Ye et al. [32] experimentally discuss how pruning shall affect the robustness of some DNNs but surprisingly draw opposite conclusions. Galloway et al. [9] focus on binarized DNNs instead of sparse ones and show that performing adversarial attacks on binarized DNNs remains as difficult as training them.

To some extent, several very recent defense methods also utilize the sparsity of DNNs. For improved model robustness, Gao et al. [10] attempt to detect the feature activations exclusive to adversarial examples and prune them away. Dhillon et al. [7] choose an alternative way that prunes activations stochastically to mask gradients. These methods focus only on the sparsity of middle-layer activations and pay little attention to the sparsity of connections.

## 3 Sparsity and Robustness of Classifiers

This paper aims at analyzing and exploring potential relationships between the sparsity and robustness of classifiers to untargeted white-box adversarial attacks, from both theoretical and practical perspectives. To be more specific, we consider models which learn parameterized mappings $x_i \mapsto y_i$ when given a set of labelled training samples $\{(x_i, y_i)\}$ for supervision. Similar to many other theoretical efforts, our analyses start from linear classifiers and will be generalized to nonlinear DNNs later in Section 3.2. Generally, the sparsity of a DNN model can be considered in two aspects: the sparsity of connections among neurons and the sparsity of neuron activations. In particular, the sparsity of activations includes that of middle-layer activations and that of inputs, where the latter can be treated as a special case.
Knowing that the input sparsity has been previously discussed [12], we shall focus primarily on the weight and activation sparsity for nonlinear DNNs and study only the weight sparsity for linear models.

### 3.1 Linear Models

For simplicity of notation, we first give theoretical results for binary classifiers with $\hat{y}_i = \mathrm{sgn}(w^T x_i)$, in which $w, x_i \in \mathbb{R}^n$. We also ignore the bias term $b$ for clarity. Notice that $w^T x + b$ can simply be rewritten as $w'^T [x; 1]$ with $w' = [w; b]$, so all our theoretical results in the sequel apply directly to linear cases with bias. Given ground-truth labels $y_i \in \{+1, -1\}$, a classifier can be effectively trained by minimizing an empirical loss $\sum_i \tau(-y_i\, w^T x_i)$ with, for instance, the softplus function $\tau(z) = \log(1 + \exp(z))$ [11].

Adversarial attacks typically minimize an $\ell_p$ norm (e.g., $\ell_2$, $\ell_\infty$, $\ell_1$ and $\ell_0$) of the required perturbation under certain (box) constraints. Though not completely aligned with perceptual distinctions in the visual domain, such norms play a crucial role in evaluating adversarial robustness. We study both the $\ell_\infty$ and $\ell_2$ attacks in this paper. With an ambition to aggregate them over the data distribution, we propose to evaluate the robustness of linear models using the following metrics, which describe the ability to resist the two kinds of attacks respectively:

$$\text{Binary:}\quad r_\infty := \mathbb{E}_{x,y}\big[\mathbb{1}_{y=\mathrm{sgn}(w^T \check{x})}\big], \qquad r_2 := \mathbb{E}_{x,y}\big[\mathbb{1}_{y=\hat{y}}\, d(x, \tilde{x})\big]. \tag{1}$$

Here we let $\check{x} = x - \epsilon y\, \mathrm{sgn}(w)$ and $\tilde{x} = x - w(w^T x)/\|w\|_2^2$ be the adversarial examples generated by applying the fast gradient sign (FGS) [11] and DeepFool [22] methods as representatives. Without box constraints on the image domain, they can be regarded as the optimal $\ell_\infty$ and $\ell_2$ attacks targeting the linear classifiers [20, 22]. The function $d$ calculates the Euclidean distance between two $n$-dimensional vectors, and we know that $d(x, \tilde{x}) = |w^T x| / \|w\|_2$.

The two introduced metrics evaluate the robustness of classifiers from two different perspectives: $r_\infty$ calculates the expected accuracy on (FGS) adversarial examples, and $r_2$ measures a decision margin between benign examples from the two classes. For both of them, a higher value indicates stronger adversarial robustness. Note that, unlike some metrics calculating (maybe normalized) Euclidean distances between all pairs of benign and adversarial examples, our $r_2$ omits the originally misclassified examples, which makes more sense if the classifiers are imperfect in the sense of prediction accuracy. We will refer to $\mu_k := \mathbb{E}(x \mid y = k, \hat{y} = k)$, which is the conditional expectation for class $k$. Be aware that although there exist attack-agnostic guarantees on the model robustness [16, 30], they are all instance-specific. Instead of generalizing them to the entire input space for analysis, we focus on the proposed statistical metrics and present their connections to the guarantees later in Section 3.2. Some other experimentally feasible metrics shall be involved in Section 4.
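As a concrete illustration of the metrics in Eq. (1), the sketch below evaluates $r_\infty$ and $r_2$ for a given binary linear classifier using the closed-form FGS and DeepFool perturbations stated above. This is a minimal NumPy sketch under our own assumptions: the variable names, the $\epsilon$ value and the toy data are illustrative and not taken from the paper.

```python
import numpy as np

def binary_robustness_metrics(w, X, y, eps=0.1):
    """Evaluate r_inf and r_2 from Eq. (1) for a bias-free binary linear
    classifier y_hat = sgn(w^T x), with labels y in {+1, -1}."""
    scores = X @ w                      # w^T x for every sample
    y_hat = np.sign(scores)

    # FGS (optimal l_inf attack on a linear model): x_check = x - eps * y * sgn(w).
    # The prediction on x_check stays correct iff y * w^T x > eps * ||w||_1.
    r_inf = np.mean(y * scores - eps * np.abs(w).sum() > 0)

    # DeepFool (optimal l_2 attack): x_tilde projects x onto the hyperplane
    # w^T z = 0, so d(x, x_tilde) = |w^T x| / ||w||_2. Only correctly
    # classified samples contribute, as in the definition of r_2.
    margins = np.abs(scores) / np.linalg.norm(w)
    r_2 = np.mean(np.where(y == y_hat, margins, 0.0))
    return r_inf, r_2

# Toy usage on synthetic data (purely illustrative).
rng = np.random.default_rng(0)
w = rng.standard_normal(784)
X = rng.standard_normal((1000, 784))
y = np.sign(X @ w + 0.5 * rng.standard_normal(1000))
print(binary_robustness_metrics(w, X, y))
```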
The following theorem sheds light on intrinsic relationships between the described robustness metrics and the $\ell_p$ norms of $w$.

**Theorem 3.1.** (The sparsity and robustness of binary linear classifiers). Suppose that $P_y(k) = 1/2$ for $k = \pm 1$, and an obtained linear classifier achieves the same expected accuracy $t$ on different classes; then we have

$$r_2 \le \frac{t\, w^T(\mu_{+1} - \mu_{-1})}{2\,\|w\|_2} \quad \text{and} \quad r_\infty \le \frac{t\, w^T(\mu_{+1} - \mu_{-1})}{2\,\epsilon\,\|w\|_1}. \tag{2}$$

*Proof.* For $r_\infty$, we first rewrite it in the form of $\Pr(y\, w^T \check{x} > 0)$. We know from the assumptions that $\Pr(\hat{y} = k \mid y = k) = t$ and $\Pr(y = k) = 1/2$, so we further get

$$r_\infty = \sum_{k} \frac{t}{2}\, \Pr\!\big(k\, w^T x > \epsilon\, \|w\|_1 \,\big|\, y = k, \hat{y} = k\big) \tag{3}$$

by using the law of total probability and substituting $\check{x}$ with $x - \epsilon y\, \mathrm{sgn}(w)$. Lastly, the result follows after using Markov's inequality. As for $r_2$, the proof is straightforward by similarly casting its definition into a sum of conditional expectations. That is,

$$r_2 = \sum_{k} \frac{t}{2}\, \mathbb{E}_{x \mid y, \hat{y}}\big[\, d(x, \tilde{x}) \,\big|\, y = k, \hat{y} = k\,\big]. \tag{4}$$

Theorem 3.1 indicates clear relationships between the sparsity and robustness of linear models. In terms of $r_\infty$, optimizing (i.e., maximizing) its upper bound gives rise to a sparse solution of $w$. By duality, maximizing the squared upper bound of $r_\infty$ also resembles solving a sparse PCA problem [5]. Reciprocally, we might also concur that a highly sparse $w$ implies relatively robust classification results. Nevertheless, it seems that the defined $r_2$ has nothing to do with the sparsity of $w$. It gets maximized iff $w$ aligns (up to scaling) with $\mu_{+1} - \mu_{-1}$, and sparsifying $w$ probably does not help in reaching this goal. In fact, under some assumptions about the data distributions, the dense reference model can be nearly optimal in the sense of $r_2$. We will see that this phenomenon remains in multi-class linear classification in Theorem 3.2 but does not remain in nonlinear DNNs in Section 3.2. One can check Sections 4.1 and 4.2 for more detailed experimental discussions.

Having realized that the $\ell_\infty$ robustness of binary linear classifiers is closely related to $\|w\|_1$, we now turn to multi-class cases with the ground truth $y_i \in \{1, \dots, c\}$ and prediction $\hat{y}_i = \arg\max_k (w_k^T x_i)$, in which $w_k = W[:, k]$ indicates the $k$-th column of a matrix $W \in \mathbb{R}^{n \times c}$. Here the training objective $f$ calculates the cross-entropy loss between the ground-truth labels and the outputs of a softmax function. The two introduced metrics shall be slightly modified to:

$$\text{Multi-class:}\quad r_\infty := \mathbb{E}_{x,y}\big[\mathbb{1}_{y=\arg\max_k (w_k^T \check{x})}\big], \qquad r_2 := \mathbb{E}_{x,y}\big[\mathbb{1}_{y=\hat{y}}\, d(x, \tilde{x})\big]. \tag{5}$$

Likewise, $\check{x} = x + \epsilon\, \mathrm{sgn}(\nabla f(x))$ and $\tilde{x} = x - w_\delta (w_\delta^T x)/\|w_\delta\|_2^2$ are the FGS and DeepFool adversarial examples under multi-class circumstances, in which $w_\delta = w_{\hat{y}} - w_e$ and $e \in \{1, \dots, c\} \setminus \{\hat{y}\}$ is carefully chosen such that $|(w_{\hat{y}} - w_e)^T x| / \|w_{\hat{y}} - w_e\|_2$ is minimized. Denoting an averaged classifier by $\bar{w} := \sum_k w_k / c$, we provide upper bounds for both $r_\infty$ and $r_2$ in the following theorem.

**Theorem 3.2.** (The sparsity and robustness of multi-class linear classifiers). Suppose that $P_y(k) = 1/c$ for $k \in \{1, \dots, c\}$, and an obtained linear classifier achieves the same expected accuracy $t$ on different classes; then we have

$$r_2 \le \sum_k \frac{t\,(w_k - \bar{w})^T \mu_k}{c\,\|w_k - \bar{w}\|_2} \quad \text{and} \quad r_\infty \le \sum_k \frac{t\,(w_k - \bar{w})^T \mu_k}{c\,\epsilon\,\|w_k - \bar{w}\|_1} \tag{6}$$

under two additional assumptions: (I) FGS achieves higher per-class success rates than a weaker perturbation like $-\epsilon\, \mathrm{sgn}(w_y - \bar{w})$, and (II) the FGS perturbation does not correct misclassifications.

We present in Theorem 3.2 similar bounds for multi-class classifiers to those provided in Theorem 3.1, under some mild assumptions. Our proof is deferred to the supplementary material. We emphasize that the two additional assumptions are intuitively acceptable. First, increasing the classification loss in a more principled way, say using FGS, ought to diminish the expected accuracy more effectively. Second, with high probability, an original misclassification cannot be fixed using the FGS method, as one intends to do precisely the opposite. Similarly, the presented bound for $r_\infty$ also implies sparsity, though it is the sparsity of $w_k - \bar{w}$. In fact, this is directly related to the sparsity of $w_k$, considering that the classifiers can be post-processed to subtract their average simultaneously, whilst the classification decision won't change for any possible input. Particularly, Theorem 3.2 also partially suits linear DNN-based classification. Letting the classifier $g_k$ be factorized in the form $w_k^T = (w'_k)^T W_{d-1}^T \cdots W_1^T$, it is evident that higher sparsity of the multipliers encourages a higher probability of a sparse $w_k$.
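To make the multi-class constructions in Eq. (5) concrete, the following sketch generates the FGS and DeepFool perturbations for a linear softmax classifier; the gradient of the cross-entropy loss and the choice of the competing class $e$ follow the definitions above. It is a NumPy sketch under our own assumptions: the function names and the toy setup are illustrative and not the authors' code.

```python
import numpy as np

def fgs_linear(W, x, y, eps=0.1):
    """FGS for a linear softmax classifier: x_check = x + eps * sgn(grad_x f)."""
    logits = W.T @ x                       # scores w_k^T x, with W of shape n x c
    p = np.exp(logits - logits.max())
    p /= p.sum()                           # softmax probabilities
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    grad_x = W @ (p - onehot)              # gradient of cross-entropy w.r.t. x
    return x + eps * np.sign(grad_x)

def deepfool_linear(W, x):
    """DeepFool for a linear classifier: project x onto the nearest decision
    hyperplane (w_yhat - w_e)^T z = 0, with e minimizing the normalized margin."""
    logits = W.T @ x
    y_hat = int(np.argmax(logits))
    best, x_tilde = np.inf, x
    for e in range(W.shape[1]):
        if e == y_hat:
            continue
        w_delta = W[:, y_hat] - W[:, e]
        margin = abs(w_delta @ x) / np.linalg.norm(w_delta)
        if margin < best:
            best = margin
            x_tilde = x - w_delta * (w_delta @ x) / (w_delta @ w_delta)
    return x_tilde, best                   # best equals d(x, x_tilde) used by r_2

# Illustrative usage with random parameters and one random input.
rng = np.random.default_rng(0)
W, x = rng.standard_normal((784, 10)), rng.standard_normal(784)
x_check = fgs_linear(W, x, y=3)
x_tilde, dist = deepfool_linear(W, x)
```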
### 3.2 Deep Neural Networks

A nonlinear feedforward DNN is usually specified by a directed acyclic graph $G = (V, E)$ [4] with a single root node for the final outputs. According to the forward-propagation rule, the activation value of each internal (and also output) node is calculated based on its incoming nodes and the learnable weights corresponding to the edges. Nonlinear activation functions are incorporated to ensure the capacity. For biases, some nodes output a special value of one; we omit them for simplicity as before. Classification is performed by comparing the prediction scores corresponding to different classes, which means $\hat{y} = \arg\max_{k \in \{1,\dots,c\}} g_k(x)$.

Benefiting from some very recent theoretical efforts [16, 30], we can directly utilize well-established robustness guarantees for nonlinear DNNs. Let us first denote by $B_p(x, R)$ a closed ball centred at $x$ with radius $R$, and then denote by $L^k_{q,x}$ the (best) local Lipschitz constant of the function $g_{\hat{y}}(x) - g_k(x)$ over a fixed $B_p(x, R)$, if there exists one. It has been proven that the following proposition offers a reasonable lower bound for the required $\ell_p$ norm of instance-specific perturbations when all classifiers are Lipschitz continuous [30].

**Proposition 3.1.** [30] Let $\hat{y} = \arg\max_{k \in \{1,\dots,c\}} g_k(x)$ and $1/p + 1/q = 1$; then for any $\delta_x \in B_p(0, R)$, $p \in \mathbb{R}^+$ and a set of Lipschitz continuous functions $\{g_k : \mathbb{R}^n \mapsto \mathbb{R}\}$, with

$$\|\delta_x\|_p \le \min\left\{\min_{k \neq \hat{y}} \frac{g_{\hat{y}}(x) - g_k(x)}{L^k_{q,x}},\ R\right\} := \gamma, \tag{7}$$

it holds that $\hat{y} = \arg\max_{k \in \{1,\dots,c\}} g_k(x + \delta_x)$, which means the classification decision does not change on $B_p(x, \gamma)$.

Here the introduced $\gamma$ is basically an instance-specific lower bound that guarantees the robustness of multi-class classifiers. We shall later discuss its connections with our $r_p$'s, for $p \in \{\infty, 2\}$, and we now try to provide a local Lipschitz constant (which may not be the smallest) of the function $g_{\hat{y}}(x) - g_k(x)$, to help us delve deeper into the robustness of nonlinear DNNs. Without loss of generality, the following discussion is made under a fixed radius $R > 0$ and a given instance $x \in \mathbb{R}^n$.

Some modern DNNs can be structurally very complex. Let us simply consider a multi-layer perceptron (MLP) parameterized by a series of weight matrices $W_1 \in \mathbb{R}^{n_0 \times n_1}, \dots, W_d \in \mathbb{R}^{n_{d-1} \times n_d}$, in which $n_0 = n$ and $n_d = c$. Discussions about networks with more advanced architectures like convolutions, pooling and skip connections can be directly generalized [1]. Specifically, we have

$$g_k(x_i) = w_k^T\, \sigma\big(W_{d-1}^T\, \sigma(\cdots \sigma(W_1^T x_i))\big), \tag{8}$$

in which $w_k = W_d[:, k]$ and $\sigma$ is the nonlinear activation function. Here we mostly focus on ReLU networks with rectified-linear-flavoured nonlinearity, so the neuron activations in the middle layers are naturally sparse. For clarity, we discuss the weight and activation sparsities separately. Mathematically, we let $a_0 = x$ and $a_j = \sigma(W_j^T a_{j-1})$ for $0 < j < d$ be the layer-wise activations. We will refer to

$$D_j(x) := \mathrm{diag}\big(\mathbb{1}_{W_j[:,1]^T a_{j-1} > 0},\ \dots,\ \mathbb{1}_{W_j[:,n_j]^T a_{j-1} > 0}\big), \tag{9}$$

which is a diagonal matrix whose entries taking value one correspond to the nonzero activations within the $j$-th layer, and to $M_j \in \{0, 1\}^{n_{j-1} \times n_j}$, which is a binary mask corresponding to each (possibly sparse) $W_j$.
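The layer-wise activations and the diagonal indicator matrices $D_j(x)$ of Eqs. (8)-(9) can be computed with a few lines of NumPy; the sketch below only illustrates the notation, and the architecture, initialization and names are our own assumptions.

```python
import numpy as np

def mlp_forward_with_masks(weights, x):
    """Forward pass of a ReLU MLP g(x) = W_d^T relu(W_{d-1}^T ... relu(W_1^T x)),
    returning the logits and the diagonal of each D_j(x) from Eq. (9)."""
    a, masks = x, []
    for W in weights[:-1]:                 # hidden layers W_1, ..., W_{d-1}
        pre = W.T @ a                      # pre-activation W_j^T a_{j-1}
        mask = (pre > 0).astype(float)     # diagonal of D_j(x): 1 for active neurons
        masks.append(mask)
        a = pre * mask                     # ReLU, i.e., D_j(x) W_j^T a_{j-1}
    logits = weights[-1].T @ a             # g_k(x) = w_k^T a_{d-1}
    return logits, masks

# Toy usage: a 784-300-100-10 MLP with random weights (shapes mirror LeNet-300-100).
rng = np.random.default_rng(0)
shapes = [(784, 300), (300, 100), (100, 10)]
weights = [rng.standard_normal(s) / np.sqrt(s[0]) for s in shapes]
logits, masks = mlp_forward_with_masks(weights, rng.standard_normal(784))
print([m.mean() for m in masks])           # fraction of active neurons per layer
```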
Along with some analyses, the following lemma and theorem present intrinsic relationships between the adversarial robustness and the (both weight and activation) sparsity of nonlinear DNNs.

**Lemma 3.1.** (A local Lipschitz constant for ReLU networks). Let $1/p + 1/q = 1$; then for any $x \in \mathbb{R}^n$, $k \in \{1, \dots, c\}$ and $q \in \{1, 2\}$, the local Lipschitz constant of the function $g_{\hat{y}}(x) - g_k(x)$ satisfies

$$L^k_{q,x} \le \|w_{\hat{y}} - w_k\|_q \sup_{x' \in B_p(x, R)} \prod_{j=1}^{d-1} \big(\|D_j(x')\|_p\, \|W_j\|_p\big). \tag{10}$$

**Theorem 3.3.** (The sparsity and robustness of nonlinear DNNs). Let the weight matrices be represented as $W_j = W'_j \odot M_j$, in which $\{M_j[u, v]\}$ are independent Bernoulli $\mathcal{B}(1, 1 - \alpha_j)$ random variables and $0 \notin \{W'_j[u, v]\}$, for $j \in \{1, \dots, d-1\}$. Then for any $x \in \mathbb{R}^n$ and $k \in \{1, \dots, c\}$, it holds that

$$\mathbb{E}_{M_1,\dots,M_{d-1}}\big(L^k_{2,x}\big) \le c_2\, \big(1 - \eta(\alpha_1, \dots, \alpha_{d-1};\, x)\big) \tag{11}$$

and

$$\mathbb{E}_{M_1,\dots,M_{d-1}}\big(L^k_{1,x}\big) \le c_1\, \big(1 - \eta(\alpha_1, \dots, \alpha_{d-1};\, x)\big), \tag{12}$$

in which the function $\eta$ is monotonically increasing w.r.t. each $\alpha_j$, and $c_2 = \|w_{\hat{y}} - w_k\|_2 \prod_j \|W'_j\|_F$ and $c_1 = \|w_{\hat{y}} - w_k\|_1 \prod_j \|W'_j\|_1$ are two constants.

*Proof Sketch.* The function $\prod_j \|D_j(\cdot)\|_p \|W_j\|_p$ defined on $\mathbb{R}^n$ is bounded from above and below, thus we know there exists an $\hat{x} \in B_p(x, R)$ satisfying

$$L^k_{q,x} \le \|w_{\hat{y}} - w_k\|_q \prod_j \|D_j(\hat{x})\|_p\, \|W_j\|_p. \tag{13}$$

Particularly, $\prod_j \|D_j(\hat{x})\|_p = 0$ is fulfilled iff $\|D_{d-1}(\hat{x})\|_p = 0$ (otherwise it equals 1 for $q \in \{1, 2\}$). Under the assumptions on $M_j$, we know that the entries of $W_j$ are independent of each other, thus

$$\Pr_{M_1,\dots,M_{d-1}}\big(D_{d-1}(\hat{x})[u, u] = 0\big) = \Pr_{M_1,\dots,M_{d-1}}\big(W_{d-1}[:, u]^T a_{d-2} \le 0\big) \ge \prod_{u'} \big(\alpha_{d-1} + \xi_{d-2,u'} - \alpha_{d-1}\, \xi_{d-2,u'}\big), \tag{14}$$

in which $\xi_{d-2,u'}$ is a newly introduced scalar that is less than or equal to the probability of the $u'$-th neuron of the $(d-2)$-th layer being deactivated. In this manner, we can recursively define the function $\eta$, and it is easy to validate its monotonicity. Additionally, we prove that $c_q \ge \|w_{\hat{y}} - w_k\|_q\, \mathbb{E}\big(\prod_j \|W_j\|_p \,\big|\, \|D_{d-1}(\hat{x})\|_p = 1\big)$ holds for $q \in \{1, 2\}$, and the result follows. See the supplementary material for a detailed proof.

In Lemma 3.1 we introduce probably smaller local Lipschitz constants than the commonly known ones (i.e., $c_2$ and $c_1$), and subsequently in Theorem 3.3 we build theoretical relationships between $L^k_{q,x}$ and the network sparsity, for $q \in \{1, 2\}$ (i.e., $p \in \{\infty, 2\}$). Apparently, $L^k_{q,x}$ is prone to get smaller if any weight matrix gets more sparse. It is worth noting that the local Lipschitz constant is of great importance in evaluating the robustness of DNNs, and it is effective to regularize DNNs by just minimizing $L^k_{q,x}$, or equivalently $\|\nabla g_{\hat{y}}(x) - \nabla g_k(x)\|_q$ for differentiable continuous functions [16]. Thus we reckon that, when the network is over-parameterized, an appropriately higher weight sparsity implies a larger $\gamma$ and stronger robustness. There are similar conclusions if $a_j$ gets more sparse.

Recall that in the linear binary case, we apply the DeepFool adversarial example $\tilde{x}$ when evaluating the robustness using $r_2$. It is not difficult to validate that the equality $d(x, \tilde{x}) = |(w_{\hat{y}} - w_{k \neq \hat{y}})^T x| / L^k_{2,x}$ holds for such $\tilde{x}$ (taking $w_{+1} := w$ and $w_{-1} := -w$), which means the DeepFool perturbation ideally minimizes the Euclidean norm and helps us measure a lower bound in this regard. This can be directly generalized to multi-class classifiers. Unlike $r_2$, which represents a margin, our $r_\infty$ is basically an expected accuracy. Nevertheless, we also know that a perturbation of $-\epsilon y\, \mathrm{sgn}(w)$ shall successfully fool the classifier if $\epsilon \ge |(w_{\hat{y}} - w_{k \neq \hat{y}})^T x| / L^k_{1,x}$.
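As a rough numerical illustration of Lemma 3.1 and Theorem 3.3, the sketch below computes a point-wise layer-product quantity of the form $\|w_{\hat{y}} - w_k\|_2 \prod_j \|D_j(x) W_j^T\|$ for a toy ReLU MLP, before and after masking weights with Bernoulli variables. The choice of matrix norm (spectral here), the mask rate and all names are our own assumptions for illustration; the paper's constants $c_1$ and $c_2$ use other norms.

```python
import numpy as np

def masked_product_bound(weights, x, yhat, k):
    """Crude point-wise analogue of the bound in Lemma 3.1: the l2 norm of the
    local gradient of g_yhat - g_k is at most
    ||w_yhat - w_k||_2 * prod_j ||D_j(x) W_j^T|| (spectral norm used here)."""
    a, bound = x, 1.0
    for W in weights[:-1]:
        pre = W.T @ a
        D = np.diag((pre > 0).astype(float))       # D_j(x) from Eq. (9)
        bound *= np.linalg.norm(D @ W.T, ord=2)    # ||D_j(x) W_j^T||_2
        a = np.maximum(pre, 0.0)
    w_out = weights[-1]
    return np.linalg.norm(w_out[:, yhat] - w_out[:, k]) * bound

rng = np.random.default_rng(0)
shapes = [(784, 300), (300, 100), (100, 10)]
weights = [rng.standard_normal(s) / np.sqrt(s[0]) for s in shapes]
x = rng.standard_normal(784)

dense = masked_product_bound(weights, x, yhat=0, k=1)
# Bernoulli weight masks as in Theorem 3.3: keep each entry with probability 1 - alpha.
alpha = 0.8
masked = [W * (rng.random(W.shape) > alpha) for W in weights[:-1]] + [weights[-1]]
sparse = masked_product_bound(masked, x, yhat=0, k=1)
print(dense, sparse)   # the product bound typically shrinks as weights are masked
```

In line with the theorem, the more aggressively the hidden-layer weights are masked, the smaller this layer-wise product tends to become in expectation.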
## 4 Experimental Results

In this section, we conduct experiments to verify our theoretical results. To be consistent, we still start from linear models and turn to nonlinear DNNs afterwards. As previously discussed, we perform both $\ell_\infty$ and $\ell_2$ attacks on the classifiers to evaluate their adversarial robustness. In addition to the FGS [11] and DeepFool [22] attacks, which have been thoroughly discussed in Section 3, we introduce two more attacks in this section for extensive comparisons of the model robustness.

**Adversarial attacks.** We use the FGS and randomized FGS (rFGS) [27] methods to perform $\ell_\infty$ attacks. As a famous $\ell_\infty$ attack, FGS has been widely exploited in the literature. In order to generate adversarial examples, it calculates the gradient of the training loss w.r.t. the benign inputs and uses its sign as the perturbation, in an element-wise manner. The rFGS attack is a computationally efficient alternative to multi-step $\ell_\infty$ attacks with an ability to break adversarial-training-based defences. We keep its hyper-parameters fixed for all experiments in this paper. For $\ell_2$ attacks, we choose DeepFool and the C&W attack [3]. DeepFool linearizes nonlinear classifiers locally and approximates the optimal perturbations iteratively. The C&W method casts the problem of constructing adversarial examples as optimizing an objective function without constraints, such that some recent gradient-descent-based solvers can be adopted. On the basis of the different attacks, four $r_\infty$ and $r_2$ values can be calculated for each classification model.

### 4.1 The Sparse Linear Classifier Behaves Differently under $\ell_\infty$ and $\ell_2$ Attacks

In our experiments on linear classifiers, both the binary and multi-class scenarios shall be evaluated. We choose the well-established MNIST dataset as a benchmark, which consists of 70,000 28×28 images of handwritten digits. According to the official test protocol, 10,000 of them should be used for performance evaluation and the remaining 60,000 for training. For experiments on the binary cases, we randomly choose a pair of digits (e.g., "0" and "8", or "1" and "7") as the positive and negative classes. Linear classifiers are trained following our previous discussions and utilizing the softplus function: $\min_{w,b} \sum_i \log(1 + \exp(-y_i (w^T x_i + b)))$. Parameters $w$ and $b$ are randomly initialized and learnt by means of stochastic gradient descent with momentum. For the "1" versus "7" classification case, we train 10 reference models from different initializations and achieve a prediction accuracy of 99.17 ± 0.00% on the benign test set. For the classification of all 10 classes, we train 10 references similarly and achieve a test-set accuracy of 92.26 ± 0.08%.

To produce models with different weight sparsities, we use a progressive pruning strategy [14]. That is to say, we follow a pipeline of iteratively pruning and re-training. Within each iteration, a portion ($\rho$) of the nonzero entries of $w$, whose magnitudes are relatively small in comparison with the others, will be directly set to zero and shall never be activated again. After $m$ rounds of such pruning, we shall collect $10(m + 1)$ models from all 10 dense references. Here we set $m = 16$ and $\rho = 1/3$, so the achieved final percentage of zero weights should be $1 - (1 - \rho)^m \approx 99.74\%$.
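The progressive prune-and-retrain pipeline described above can be sketched as follows. The retraining step is abstracted away as a callback, and the function names, the callback interface and the defaults merely mirror the $\rho = 1/3$, $m = 16$ settings quoted in the text; this is not the authors' actual implementation.

```python
import numpy as np

def progressive_prune(w, retrain, rho=1/3, m=16):
    """Iterative magnitude pruning: at each round, zero out a fraction rho of the
    currently nonzero entries with the smallest magnitudes, keep them frozen via a
    mask, and re-train the survivors. Returns the m+1 progressively sparser models."""
    mask = np.ones_like(w, dtype=bool)
    snapshots = [w.copy()]
    for _ in range(m):
        alive = np.abs(w[mask])
        threshold = np.quantile(alive, rho)        # cut the smallest rho of survivors
        mask &= np.abs(w) > threshold              # pruned entries never come back
        w = w * mask
        w = retrain(w, mask)                       # re-train the remaining weights only
        snapshots.append(w.copy())
    return snapshots

# Illustrative usage with a dummy "retraining" step that leaves weights untouched.
rng = np.random.default_rng(0)
models = progressive_prune(rng.standard_normal(784),
                           retrain=lambda w, mask: w * mask)
print([float((m_ != 0).mean()) for m_ in models])  # fraction of surviving weights
```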
We calculate the prediction accuracies on adversarial examples (i.e., $r_\infty$) under different $\ell_\infty$ attacks and the average Euclidean norm of the required perturbations (i.e., $r_2$) under different $\ell_2$ attacks to evaluate the adversarial robustness of the different models in practice. For the $\ell_\infty$ attacks, we set $\epsilon = 0.1$. Figure 1 illustrates how our metrics of robustness vary with the weight sparsity. We only demonstrate the variability of the first 12 points (from left to right) on each curve, to make the error bars more resolvable. The upper and lower subfigures correspond to the binary and multi-class cases, respectively.

[Figure 1: The robustness of linear classifiers with varying weight sparsity. Upper: binary classification between "1"s and "7"s; lower: multi-class classification on the whole MNIST test set.]

Obviously, the experimental results are consistent with our previous theoretical ones. While sparse linear models are prone to be more robust in the sense of $r_\infty$, their $r_2$ robustness remains similar to or becomes even slightly weaker than that of the dense references, until there emerge inevitable accuracy degradations on benign examples (i.e., when $r_\infty$ may drop as well). We also observe from Figure 1 that, in both the binary and multi-class cases, $r_2$ starts decreasing much earlier than the benign-set accuracy. Though very slight in the binary case, the degradation of $r_2$ actually occurs after the first round of pruning (from 2.0103 ± 0.0022 to 2.0009 ± 0.0016 with DeepFool incorporated, and from 2.3151 ± 0.0023 to 2.3061 ± 0.0023 with the C&W attack).

### 4.2 Sparse Nonlinear DNNs Can Be Consistently More Robust

Regarding nonlinear DNNs, we follow the same experimental pipeline as described in Section 4.1. We train MLPs with 2 hidden fully-connected layers and convolutional networks with 2 convolutional layers, 2 pooling layers and 2 fully-connected layers as references on MNIST, following the LeNet-300-100 and LeNet-5 architectures in network compression papers [14, 13, 28, 21]. We also follow the training policy suggested by Caffe [17] and train the network models for 50,000 iterations with a batch size of 64, such that the training cross-entropy loss does not decrease any longer. The well-trained reference models achieve much higher prediction accuracies (LeNet-300-100: 98.20 ± 0.07% and LeNet-5: 99.11 ± 0.04%) than the previously tested linear ones on the benign test set.

**Weight sparsity.** We then prune the dense references and illustrate some major results regarding the robustness and weight sparsity in Figure 2 (a)-(d). (See Figure 3 in our supplementary material for results under rFGS and the C&W attack.) Weight matrices/tensors within each layer are uniformly pruned, so the network sparsity should be approximately equal to the layer-wise sparsity. As expected, we observe similar results to the previous linear cases in the context of our $r_\infty$ but significantly different results in $r_2$. Unlike the previous linear models, which behave differently under $\ell_\infty$ and $\ell_2$ attacks, nonlinear DNN models show a consistent trend of adversarial robustness with respect to the sparsity. In particular, we observe increased $r_\infty$ and $r_2$ values under different attacks when continually pruning the models, until the sparsity reaches some thresholds and leads to inevitable capacity degradations. For additional verification, we calculate the CLEVER [30] scores, which approximate attack-agnostic lower bounds of the $\ell_p$ norms of the required perturbations (see Table 3 in the supplementary material).

[Figure 2: The robustness of nonlinear DNNs with varying weight sparsity. (a)-(b): LeNet-300-100, (c)-(d): LeNet-5, (e)-(f): the VGG-like network, (g)-(h): ResNet-32.]
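For concreteness, the $r_\infty$ numbers reported here amount to measuring accuracy on FGS examples at each sparsity level; a minimal PyTorch-style sketch of such an evaluation loop is given below. It assumes a generic `model` and `loader`, uses $\epsilon = 0.1$ as in our linear experiments, and omits box constraints and dataset-specific preprocessing; it is not the authors' code.

```python
import torch
import torch.nn.functional as F

def fgs_accuracy(model, loader, eps=0.1):
    """Accuracy on FGS adversarial examples (an empirical r_inf of a model)."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x = x.clone().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        grad, = torch.autograd.grad(loss, x)
        x_adv = (x + eps * grad.sign()).detach()   # x_check = x + eps * sgn(grad)
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total
```

Sweeping such a function over the pruned snapshots produced by the pruning pipeline above would yield $r_\infty$-versus-sparsity curves of the kind plotted in Figure 2.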
Experiments are also conducted on CIFAR-10, on which deeper nonlinear networks can be involved. We train 10 VGG-like network models [23] (each incorporating 12 convolutional layers and 2 fully-connected layers) and 10 ResNet models [15] (each incorporating 31 convolutional layers and a single fully-connected layer) from scratch. Such deep architectures lead to average prediction accuracies of 93.01% and 92.89%. Still, we prune the dense network models in the progressive manner and illustrate the quantitative relationships between the robustness and weight sparsity in Figure 2 (e)-(h). The first and last layers in each network are kept dense to avoid early accuracy degradation on the benign set. The same observations can be made. Note that the ResNets are capable of resisting some DeepFool examples, for which the second and subsequent iterations make little sense and can be disregarded.

**Activation sparsity.** Having verified the relationship between the robustness and weight sparsity of nonlinear DNNs, we now examine the activation sparsity. As previously mentioned, the middle-layer activations of ReLU-incorporated DNNs are naturally sparse. We simply add an $\ell_1$ norm regularization of the weight matrices/tensors to the learning objective to encourage higher sparsities and calculate $r_\infty$ and $r_2$ accordingly. Experiments are conducted on MNIST. Table 1 summarizes the results, in which "Sparsity (a)" indicates the percentage of deactivated (i.e., zero) neurons feeding into the last fully-connected layer. Here the $r_\infty$ and $r_2$ values are calculated using the FGS and DeepFool attacks, respectively. Apparently, we still observe positive correlations between the robustness and (activation) sparsity in a certain range.

Table 1: The robustness of DNNs regularized using the $\ell_1$ norm of weight matrices/tensors.

| Network | $r_\infty$ | $r_2$ | Accuracy | Sparsity (a) |
| --- | --- | --- | --- | --- |
| LeNet-300-100 | 0.2862 ± 0.0113 | 1.3213 ± 0.0207 | 98.20 ± 0.07% | 45.25 ± 1.14% |
| LeNet-300-100 | 0.3993 ± 0.0079 | 1.5903 ± 0.0240 | 98.27 ± 0.04% | 75.92 ± 0.54% |
| LeNet-300-100 | 0.2098 ± 0.0133 | 1.1440 ± 0.0402 | 97.96 ± 0.07% | 95.22 ± 0.18% |
| LeNet-5 | 0.7388 ± 0.0188 | 2.7831 ± 0.1490 | 99.11 ± 0.04% | 51.26 ± 1.88% |
| LeNet-5 | 0.7729 ± 0.0081 | 3.1688 ± 0.1203 | 99.19 ± 0.05% | 97.54 ± 0.10% |
| LeNet-5 | 0.6741 ± 0.0162 | 2.0799 ± 0.0522 | 99.10 ± 0.06% | 99.64 ± 0.02% |
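The $\ell_1$-regularized objective used to produce models like those in Table 1 can be written, for example, as in the PyTorch-style sketch below; the regularization strength `lam` and the exclusion of biases are our own illustrative choices, not values reported in the paper.

```python
import torch
import torch.nn.functional as F

def l1_regularized_loss(model, x, y, lam=1e-4):
    """Cross-entropy plus an l1 penalty on the weight matrices/tensors, which
    encourages sparse weights and, for ReLU networks, sparser activations."""
    loss = F.cross_entropy(model(x), y)
    l1 = sum(p.abs().sum() for p in model.parameters() if p.dim() > 1)
    return loss + lam * l1
```

Varying `lam` then trades benign accuracy against the activation sparsity reported in the last column of Table 1.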
### 4.3 Avoid Over-pruning

We discover from Figure 2 that a sharp decrease of the adversarial robustness, especially in the sense of $r_2$, may occur in advance of the benign-set accuracy degradation. Hence, it can be necessary to evaluate the adversarial robustness of DNNs during an aggressive surgery, even though the prediction accuracy of the compressed models may remain competitive with their references on benign test sets. To further explore this, we collect some off-the-shelf sparse models (including a 56× compressed LeNet-300-100 and a 108× compressed LeNet-5) [13] and their corresponding dense references from the Internet and evaluate their $r_\infty$ and $r_2$ robustness. Table 2 compares the robustness of the different models. Obviously, these extremely sparse models are more vulnerable to the DeepFool attack, and what is worse, the over-100×-pruned LeNet-5 also seems more vulnerable to FGS, which suggests that researchers take care and avoid over-pruning if possible. One might also discover this fact with other pruning methods.

Table 2: The robustness of pre-compressed nonlinear DNNs and their provided dense references.

| Model | $r_\infty$ | $r_2$ | Sparsity (W) |
| --- | --- | --- | --- |
| LeNet-300-100 dense | 0.2663 | 1.3899 | 0.00% |
| LeNet-300-100 sparse | 0.3823 | 1.1058 | 98.21% |
| LeNet-5 dense | 0.7887 | 2.7226 | 0.00% |
| LeNet-5 sparse | 0.6791 | 1.7383 | 99.07% |

## 5 Conclusions

In this paper, we study some intrinsic relationships between the adversarial robustness and the sparsity of classifiers, both theoretically and empirically. By introducing plausible metrics, we demonstrate that, unlike some linear models which behave differently under $\ell_\infty$ and $\ell_2$ attacks, sparse nonlinear DNNs can be consistently more robust to both of them than their corresponding dense references, until their sparsity reaches certain thresholds and inevitably causes harm to the network capacity. Our results also demonstrate that such sparsity, including sparse connections and middle-layer neuron activations, can be effectively imposed using network pruning and $\ell_1$ regularization of the weight tensors.

## Acknowledgement

We would like to thank the anonymous reviewers for their constructive suggestions. Changshui Zhang is supported by NSFC (Grant No. 61876095, No. 61751308 and No. 61473167) and Beijing Natural Science Foundation (Grant No. L172037).

## References

[1] Peter L. Bartlett, Dylan J. Foster, and Matus J. Telgarsky. Spectrally-normalized margin bounds for neural networks. In NIPS, 2017.
[2] Emmanuel J. Candès, Justin Romberg, and Terence Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, 2006.
[3] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In SP, 2017.
[4] Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. In ICML, 2017.
[5] Alexandre d'Aspremont, Francis Bach, and Laurent El Ghaoui. Optimal solutions for sparse principal component analysis. JMLR, 9(July):1269–1294, 2008.
[6] Misha Denil, Babak Shakibi, Laurent Dinh, Marc'Aurelio Ranzato, and Nando De Freitas. Predicting parameters in deep learning. In NIPS, 2013.
[7] Guneet S. Dhillon, Kamyar Azizzadenesheli, Zachary C. Lipton, Jeremy Bernstein, Jean Kossaifi, Aran Khanna, and Anima Anandkumar. Stochastic activation pruning for robust adversarial defense. In ICLR, 2018.
[8] David L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.
[9] Angus Galloway, Graham W. Taylor, and Medhat Moussa. Attacking binarized neural networks. In ICLR, 2018.
[10] Ji Gao, Beilun Wang, Zeming Lin, Weilin Xu, and Yanjun Qi. DeepCloak: Masking deep neural network models for robustness against adversarial samples. In ICLR Workshop, 2017.
[11] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
[12] Soorya Gopalakrishnan, Zhinus Marzi, Upamanyu Madhow, and Ramtin Pedarsani. Combating adversarial attacks using sparse representations. In ICLR Workshop, 2018.
[13] Yiwen Guo, Anbang Yao, and Yurong Chen. Dynamic network surgery for efficient DNNs. In NIPS, 2016.
[14] Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. In NIPS, 2015.
[15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
[16] Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In NIPS, 2017.
[17] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. In MM, 2014.
[18] Jiajun Lu, Theerasit Issaranon, and David Forsyth. SafetyNet: Detecting and rejecting adversarial examples robustly. In ICCV, 2017.
[19] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.
[20] Zhinus Marzi, Soorya Gopalakrishnan, Upamanyu Madhow, and Ramtin Pedarsani. Sparsity-based defense against adversarial attacks on linear classifiers. arXiv preprint arXiv:1801.04695, 2018.
[21] Dmitry Molchanov, Arsenii Ashukha, and Dmitry Vetrov. Variational dropout sparsifies deep neural networks. In ICML, 2017.
[22] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In CVPR, 2016.
[23] Kirill Neklyudov, Dmitry Molchanov, Arsenii Ashukha, and Dmitry P. Vetrov. Structured Bayesian pruning via log-normal multiplicative noise. In NIPS, 2017.
[24] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In SP, 2016.
[25] Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, and Pradeep Dubey. Faster CNNs with direct sparse convolutions and guided pruning. In ICLR, 2017.
[26] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.
[27] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In ICLR, 2018.
[28] Karen Ullrich, Edward Meeds, and Max Welling. Soft weight-sharing for neural network compression. In ICLR, 2017.
[29] Luyu Wang, Gavin Weiguang Ding, Ruitong Huang, Yanshuai Cao, and Yik Chau Lui. Adversarial robustness of pruned neural networks. In ICLR Workshop submission, 2018.
[30] Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Dong Su, Yupeng Gao, Cho-Jui Hsieh, and Luca Daniel. Evaluating the robustness of neural networks: An extreme value theory approach. In ICLR, 2018.
[31] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. In ICLR, 2018.
[32] Shaokai Ye, Siyue Wang, Xiao Wang, Bo Yuan, Wujie Wen, and Xue Lin. Defending DNN adversarial attacks with pruning and logits augmentation. In ICLR Workshop submission, 2018.