# Scaling provable adversarial defenses

Eric Wong
Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213
ericwong@cs.cmu.edu

Frank R. Schmidt
Bosch Center for Artificial Intelligence, Renningen, Germany
frank.r.schmidt@de.bosch.com

Jan Hendrik Metzen
Bosch Center for Artificial Intelligence, Renningen, Germany
janhendrik.metzen@de.bosch.com

J. Zico Kolter
Computer Science Department, Carnegie Mellon University and Bosch Center for Artificial Intelligence, Pittsburgh, PA 15213
zkolter@cs.cmu.edu

Abstract

Recent work has developed methods for learning deep network classifiers that are provably robust to norm-bounded adversarial perturbation; however, these methods are currently only possible for relatively small feedforward networks. In this paper, in an effort to scale these approaches to substantially larger models, we extend previous work in three main directions. First, we present a technique for extending these training procedures to much more general networks, with skip connections (such as ResNets) and general nonlinearities; the approach is fully modular, and can be implemented automatically (analogous to automatic differentiation). Second, in the specific case of ℓ∞ adversarial perturbations and networks with ReLU nonlinearities, we adopt a nonlinear random projection for training, which scales linearly in the number of hidden units (previous approaches scaled quadratically). Third, we show how to further improve robust error through cascade models. On both MNIST and CIFAR data sets, we train classifiers that improve substantially on the state of the art in provable robust adversarial error bounds: from 5.8% to 3.1% on MNIST (with ℓ∞ perturbations of ϵ = 0.1), and from 80% to 36.4% on CIFAR (with ℓ∞ perturbations of ϵ = 2/255). Code for all experiments in the paper is available at https://github.com/locuslab/convex_adversarial/.

1 Introduction

A body of recent work in adversarial machine learning has shown that it is possible to learn provably robust deep classifiers [Wong and Kolter, 2017, Raghunathan et al., 2018, Dvijotham et al., 2018]. These are deep networks that are verifiably guaranteed to be robust to adversarial perturbations under some specified attack model; for example, a certain robustness certificate may guarantee that for a given example x, no perturbation δ with ℓ∞ norm less than some specified ϵ could change the class label that the network predicts for the perturbed example x + δ. However, up until this point, such provable guarantees have only been possible for reasonably small-sized networks. It has remained unclear whether these methods could extend to larger, more representationally complex networks.

In this paper, we make substantial progress towards the goal of scaling these provably robust networks to realistic sizes. Specifically, we extend the techniques of Wong and Kolter [2017] in three key ways. First, while past work has only applied to pure feedforward networks, we extend the framework to deal with arbitrary residual/skip connections (a hallmark of modern deep network architectures) and arbitrary activation functions (Dvijotham et al. [2018] also worked with arbitrary activation functions, but only for feedforward networks, and only discusses network verification rather than robust training).
Second, and possibly most importantly, computing the upper bound on the robust loss in [Wong and Kolter, 2017] in the worst case scales quadratically in the number of hidden units in the network, making the approach impractical for larger networks. In this work, we use a nonlinear random projection technique to estimate the bound in a manner that scales only linearly in the number of hidden units (i.e., only a constant multiple times the cost of traditional training), and which empirically can be used to train the networks with no degradation in performance relative to the previous work. Third, we show how to further improve the robust performance of these methods, though at the expense of worse non-robust error, using multi-stage cascade models. Through these extensions, we are able to improve substantially upon the verified robust errors obtained by past work.

2 Background and related work

Work on adversarial defenses typically falls into one of three primary categories. First, there is ongoing work in developing heuristic defenses against adversarial examples: [Goodfellow et al., 2015, Papernot et al., 2016, Kurakin et al., 2017, Metzen et al., 2017] to name a few. While this work is largely empirical at this point, substantial progress has been made towards developing networks that seem much more robust than previous approaches. Although a distressingly large number of these defenses are quickly broken by more advanced attacks [Athalye et al., 2018], there have also been some methods that have proven empirically resistant to the current suite of attacks; the recent NIPS 2017 adversarial example challenge [Kurakin et al., 2018], for example, highlights some of the progress made on developing classifiers that appear much stronger in practice than many of the ad-hoc techniques developed in previous years. Many of these approaches, though not formally verified in the strict sense during training, nonetheless have substantial theoretical justification for why they may perform well: Sinha et al. [2018] use properties of distributional robustness to develop an approach that is not much more difficult to train and which empirically does achieve some measure of resistance to attacks; Madry et al. [2017] consider robustness to a first-order adversary, and show that a randomized projected gradient descent procedure is optimal in this setting. Indeed, in some cases the classifiers trained via these methods can be verified to be adversarially robust using the verification techniques discussed below (though only for very small networks). Despite this progress, we believe it is also crucially important to consider defenses that are provably robust, to avoid any possible attack.

Second, our work in this paper relates closely to techniques for the formal verification of neural network systems (indeed, our approach can be viewed as a convex procedure for verification, coupled with a method for training networks via the verified bounds). In this area, most past work focuses on using exact (combinatorial) solvers to verify the robustness properties of networks, either via Satisfiability Modulo Theories (SMT) solvers [Huang et al., 2017, Ehlers, 2017, Carlini and Wagner, 2017] or integer programming approaches [Lomuscio and Maganti, 2017, Tjeng and Tedrake, 2017, Cheng et al., 2017]. These methods have the benefit of being able to reason exactly about robustness, but at the cost of being combinatorial in complexity.
This drawback has so far prevented these methods from being effectively scaled to large models or used within a training setting. There have also been a number of recent attempts to verify networks using non-combinatorial methods (the current work fits broadly into this general area). For example, Gehr et al. [2018] develop a suite of verification methods based upon abstract interpretations (these can be broadly construed as relaxations of the sets of activations that are maintained as they pass through the network). Dvijotham et al. [2018] use an approach based upon analytically solving an optimization problem resulting from dual functions of the activations (which extends to activations beyond the ReLU). However, these methods apply only to simple feedforward architectures without skip connections, and focus only on verification of existing networks.

Third, and most relevant to our current work, there are several approaches that go beyond verification alone and also integrate the verification procedure into the training of the network itself. For example, Hein and Andriushchenko [2017] develop a formal bound for robustness to ℓ2 perturbations in two-layer networks, and train a surrogate of their bound. Raghunathan et al. [2018] develop a semidefinite programming (SDP) relaxation of exact verification methods, and train a network by minimizing this bound via the dual SDP. And Wong and Kolter [2017] present a linear-programming (LP) based upper bound on the robust error or loss that can be suffered under norm-bounded perturbation, then minimize this upper bound during training; the method is particularly efficient since they do not solve the LP directly, but instead show that it is possible to bound the LP optimal value and compute elementwise bounds on the activation functions based on a backward pass through the network. However, it is still the case that none of these approaches scale to realistically-sized networks; even the approach of [Wong and Kolter, 2017], which empirically has been scaled to the largest settings of all the above approaches, in the worst case scales quadratically in the number of hidden units in the network and dimensions of the input. Thus, all the approaches so far have been limited to relatively small networks and problems such as MNIST.

Contributions This paper fits into this third category of integrating verification into training, and makes substantial progress towards scaling these methods to realistic settings. While we cannot yet reach, e.g., ImageNet scale, even in this current work, we show that it is possible to overcome the main hurdles to scalability of past approaches. Specifically, we develop a provably robust training procedure, based upon the approach in [Wong and Kolter, 2017], but extending it in three key ways. The resulting method: 1) extends to general networks with skip connections, residual layers, and activations besides the ReLU; we do so by using a general formulation based on the Fenchel conjugate function of activations; 2) scales linearly in the dimensionality of the input and the number of hidden units in the network, using techniques from nonlinear random projections, all while suffering minimal degradation in accuracy; and 3) further improves the quality of the bound with model cascades. We describe each of these contributions in the next section.
3 Scaling provably robust networks

3.1 Robust bounds for general networks via modular dual functions

This section presents an approach for constructing provably robust bounds for general deep network architectures using Fenchel duality. Importantly, we derive the dual of each network operation in a fully modular fashion, reducing the problem of deriving robust bounds for a network to that of bounding the dual of individual functions. By building up a toolkit of dual operations, we can automatically construct the dual of any network architecture by iterating through the layers of the original network.

The adversarial problem for general networks We consider a generalized k-layer neural network $f_\theta : \mathbb{R}^{|x|} \rightarrow \mathbb{R}^{|y|}$ given by the equations

$$z_i = \sum_{j=1}^{i-1} f_{ij}(z_j), \quad \text{for } i = 2, \ldots, k \tag{1}$$

where $z_1 = x$, $f_\theta(x) \equiv z_k$ (i.e., the output of the network), and $f_{ij} : \mathbb{R}^{|z_j|} \rightarrow \mathbb{R}^{|z_i|}$ is some function from layer j to layer i. Importantly, this differs from prior work in two key ways. First, unlike the conjugate forms found in Wong and Kolter [2017], Dvijotham et al. [2018], we no longer assume that the network consists of linear operations followed by activation functions, and instead opt to work with an arbitrary sequence of k functions. This simplifies the analysis of sequential non-linear activations commonly found in modern architectures, e.g. max pooling or a normalization strategy followed by a ReLU,¹ by analyzing each activation independently, whereas previous work would need to analyze the entire sequence as a single, joint activation. Second, we allow layers to depend not just on the previous layer, but also on all layers before it. This generalization applies to networks with any kind of skip connections, e.g. residual networks and dense networks, and greatly expands the set of possible architectures.

¹ Batch normalization, since it depends on entire minibatches, is formally not covered by the approach, but it can be approximated by considering the scaling and shifting to be generic parameters, as is done at test time.

Let $\mathcal{B}(x) \subseteq \mathbb{R}^{|x|}$ represent some input constraint for the adversary. For this section we will focus on an arbitrary norm ball $\mathcal{B}(x) = \{x + \Delta : \|\Delta\| \le \epsilon\}$. This is the constraint set considered for norm-bounded adversarial perturbations; other constraint sets can certainly be considered as well. Then, given an input example x, a known label $y^\star$, and a target label $y_{\text{targ}}$, the problem of finding the most adversarial example within $\mathcal{B}$ (i.e., a so-called targeted adversarial attack) can be written as

$$\begin{aligned} \operatorname*{minimize}_{z_k} \quad & c^T z_k, \\ \text{subject to} \quad & z_i = \sum_{j=1}^{i-1} f_{ij}(z_j), \quad \text{for } i = 2, \ldots, k, \\ & z_1 \in \mathcal{B}(x) \end{aligned} \tag{2}$$

where $c = e_{y^\star} - e_{y_{\text{targ}}}$.

Dual networks via compositions of modular dual functions To bound the adversarial problem, we look to its dual optimization problem using the machinery of Fenchel conjugate functions [Fenchel, 1949], described in Definition 1.

Definition 1. The conjugate of a function f is another function $f^*$ defined by

$$f^*(y) = \max_x \; x^T y - f(x) \tag{3}$$

Specifically, we can lift the constraints $z_i = \sum_{j=1}^{i-1} f_{ij}(z_j)$ from Equation 2 into the objective with indicator functions, and use conjugate functions to obtain a lower bound. For brevity, we will use the subscript notation $(\cdot)_{1:i} = ((\cdot)_1, \ldots, (\cdot)_i)$, e.g. $z_{1:i} = (z_1, \ldots, z_i)$. Due to the skip connections, the indicator functions are not independent, so we cannot directly conjugate each individual indicator function. We can, however, still form its dual using the conjugate of a different indicator function corresponding to the backwards direction, as shown in Lemma 1.
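Before stating the lemma, the following is a small illustration (not the authors' code; the weights, sizes, and layer choices are arbitrary) of the general network form in Equation 1, where each layer is a sum of functions of all earlier layers and a skip connection is simply an extra (i, j) pair:

```python
import numpy as np

rng = np.random.default_rng(0)
W2, W4 = rng.standard_normal((8, 4)), rng.standard_normal((4, 8))

# f[(i, j)] maps layer j's activations to a contribution to layer i.
f = {
    (2, 1): lambda z: W2 @ z,              # linear layer
    (3, 2): lambda z: np.maximum(z, 0.0),  # ReLU
    (4, 3): lambda z: W4 @ z,              # linear layer
    (4, 1): lambda z: z,                   # identity skip (residual) connection
}

def forward(x, f, k):
    """Evaluate z_i = sum_{j < i} f_ij(z_j) for i = 2..k and return z_k."""
    z = {1: x}
    for i in range(2, k + 1):
        z[i] = sum(f[(i, j)](z[j]) for j in range(1, i) if (i, j) in f)
    return z[k]

print(forward(rng.standard_normal(4), f, k=4).shape)  # (4,)
```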
Lemma 1. Let the indicator function for the ith constraint be

$$\chi_i(z_{1:i}) = \begin{cases} 0 & \text{if } z_i = \sum_{j=1}^{i-1} f_{ij}(z_j) \\ \infty & \text{otherwise} \end{cases} \tag{4}$$

for $i = 2, \ldots, k$, and consider the joint indicator function $\sum_{i=2}^{k} \chi_i(z_{1:i})$. Then, the joint indicator is lower bounded by

$$\max_{\nu_{1:k}} \; \nu_k^T z_k - \nu_1^T z_1 - \sum_{i=1}^{k-1} \chi_i^*(-\nu_i, \nu_{i+1:k}),$$

where

$$\chi_i^*(\nu_{i:k}) = \max_{z_i} \; \nu_i^T z_i + \sum_{j=i+1}^{k} \nu_j^T f_{ji}(z_i) \tag{5}$$

for $i = 1, \ldots, k-1$.

Note that $\chi_i^*(\nu_{i:k})$ is the exact conjugate of the indicator for the set $\{x_{i:k} : x_j = f_{ji}(x_i)\ \forall j > i\}$, which is different from the set indicated by $\chi_i$. However, when there are no skip connections (i.e. $z_i$ only depends on $z_{i-1}$), $\chi_i^*$ is exactly the conjugate of $\chi_i$. We defer the proof of Lemma 1 to Appendix A.1.

With structured upper bounds on these conjugate functions, we can bound the original adversarial problem using the dual network described in Theorem 1. We can then optimize the bound using any standard deep learning toolkit, with the same robust optimization procedure as in Wong and Kolter [2017] but using our bound instead. This amounts to minimizing the loss evaluated on our bound of possible network outputs under perturbations, as a drop-in replacement for the traditional network output. For the adversarial setting, note that an ℓ∞ perturbation results in a dual norm of ℓ1.

Theorem 1. Let $g_{ij}$ and $h_i$ be any functions such that

$$\chi_i^*(-\nu_i, \nu_{i+1:k}) \le h_i(\nu_{i:k}) \quad \text{subject to} \quad \nu_i = \sum_{j=i+1}^{k} g_{ij}(\nu_j) \tag{6}$$

for $i = 1, \ldots, k-1$. Then, the adversarial problem from Equation 2 is lower bounded by

$$J(x, \nu_{1:k}) = -\nu_1^T x - \epsilon\|\nu_1\|_* - \sum_{i=1}^{k-1} h_i(\nu_{i:k}) \tag{7}$$

where $\|\cdot\|_*$ is the dual norm, and $\nu_{1:k} = g(c)$ is the output of a k-layer neural network g on input c, given by the equations

$$\nu_k = -c, \qquad \nu_i = \sum_{j=i+1}^{k} g_{ij}(\nu_j), \quad \text{for } i = 1, \ldots, k-1. \tag{8}$$

We call the upper bound on the conjugate function from Equation 6 a dual layer, and defer the proof to Appendix A.2. To give a concrete example, we present two possible dual layers, for linear operators and ReLU activations, in Corollaries 1 and 2 (their derivations are in Appendix B), and we also depict an example dual residual block in Figure 1.

Figure 1: An example of the layers forming a typical residual block (left) and its dual (right), using the dual layers described in Corollaries 1 and 2. Note that the bias terms of the residual network go into the dual objective and are not part of the structure of the dual network, and the skip connections remain in the dual network but go in the opposite direction.

Corollary 1. The dual layer for a linear operator $\hat{z}_{i+1} = W_i z_i + b_i$ is

$$\chi_i^*(\nu_{i:k}) = \nu_{i+1}^T b_i \quad \text{subject to} \quad \nu_i = W_i^T \nu_{i+1}. \tag{9}$$

Corollary 2. Suppose we have lower and upper bounds $\ell_{i,j}, u_{i,j}$ on the pre-activations. The dual layer for a ReLU activation $\hat{z}_{i+1} = \max(z_i, 0)$ is

$$\chi_i^*(\nu_{i:k}) \le \sum_{j \in \mathcal{I}_i} \ell_{i,j}\,[\nu_{i,j}]_+ \quad \text{subject to} \quad \nu_i = D_i \nu_{i+1}, \tag{10}$$

where $\mathcal{I}_i^-$, $\mathcal{I}_i^+$, $\mathcal{I}_i$ denote the index sets where the bounds are negative, positive, or spanning the origin respectively, and where $D_i$ is a diagonal matrix with entries

$$(D_i)_{jj} = \begin{cases} 0 & j \in \mathcal{I}_i^- \\ 1 & j \in \mathcal{I}_i^+ \\ \dfrac{u_{i,j}}{u_{i,j} - \ell_{i,j}} & j \in \mathcal{I}_i. \end{cases} \tag{11}$$

We briefly note that these dual layers recover the original dual network described in Wong and Kolter [2017]. Furthermore, the dual linear operation is the exact conjugate and introduces no looseness into the bound, while the dual ReLU uses the same relaxation used in Ehlers [2017], Wong and Kolter [2017]. More generally, the strength of the bound from Theorem 1 relies entirely on how tightly the individual dual layers bound their respective conjugate functions in Equation 6.
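The following is a minimal sketch (not the released library code) of the dual ReLU layer in Corollary 2: given elementwise pre-activation bounds ℓ < u, it forms the diagonal of $D_i$, maps $\nu_{i+1}$ to $\nu_i = D_i\nu_{i+1}$, and returns the term $\sum_{j\in\mathcal{I}_i}\ell_{i,j}[\nu_{i,j}]_+$ that enters the objective; the toy bounds and input vector are arbitrary.

```python
import numpy as np

def dual_relu(nu_next, l, u):
    """Dual ReLU layer sketch: return nu_i = D_i nu_{i+1} and the bound term h_i."""
    I_neg = u <= 0                    # units that are always off
    I_pos = l >= 0                    # units that are always on
    I_span = ~I_neg & ~I_pos          # units whose bounds span the origin
    d = np.where(I_pos, 1.0, 0.0)
    d = np.where(I_span, u / (u - l), d)   # u / (u - l) on the spanning set
    nu = d * nu_next                  # nu_i = D_i nu_{i+1}
    h = np.sum(l[I_span] * np.maximum(nu[I_span], 0.0))  # sum_j l_j [nu_j]_+
    return nu, h

rng = np.random.default_rng(0)
l = rng.uniform(-1.0, 1.0, size=6)
u = l + rng.uniform(0.1, 1.0, size=6)     # toy bounds with l < u
nu_i, h_i = dual_relu(rng.standard_normal(6), l, u)
print(nu_i, h_i)
```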
While any $g_{ij}$, $h_i$ can be chosen to upper bound the conjugate function, a tighter bound on the conjugate results in a tighter bound on the adversarial problem. If the dual layers for all operations are linear, the bounds for all layers can be computed with a single forward pass through the dual network, using a direct generalization of the form used in Wong and Kolter [2017] (due to their similarity, we defer the exact algorithm to Appendix F). By trading off tightness of the bound against computational efficiency with linear dual layers, we can efficiently compute all bounds and construct the dual network one layer at a time. The end result is that we can automatically construct dual networks from dual layers in a fully modular fashion, completely independent of the overall network architecture (similar to how auto-differentiation tools proceed one function at a time to compute all parameter gradients using only the local gradient of each function). With a sufficiently comprehensive toolkit of dual layers, we can compute provable bounds on the adversarial problem for any network architecture.

For other dual layers, we point the reader to two resources. For the explicit form of the dual layers for hardtanh, batch normalization, and residual connections, we direct the reader to Appendix B. For analytical forms of the conjugate functions of other activation functions such as tanh, sigmoid, and max pooling, we refer the reader to Dvijotham et al. [2018].

3.2 Efficient bound computation for ℓ∞ perturbations via random projections

A limiting factor of the proposed algorithm, as of the work of Wong and Kolter [2017], is its computational complexity: for instance, to compute the bounds exactly for ℓ∞ norm-bounded perturbations in ReLU networks, it is computationally expensive to calculate $\|\nu_1\|_1$ and $\sum_{j \in \mathcal{I}_i} \ell_{i,j}[\nu_{i,j}]_+$. In contrast to other terms like $\nu_{i+1}^T b_i$, which require only sending a single bias vector through the dual network, the matrices $\nu_1$ and $\nu_{i,\mathcal{I}_i}$ must be explicitly formed by sending an example through the dual network for each input dimension and for each $j \in \mathcal{I}_i$, which renders the entire computation quadratic in the number of hidden units. To scale the method to larger ReLU networks with ℓ∞ perturbations, we look to random Cauchy projections; the overall procedure is summarized in Algorithm 1.

Algorithm 1 Estimating $\|\nu_1\|_1$ and $\sum_{j \in \mathcal{I}_i} \ell_{i,j}[\nu_{i,j}]_+$
  input: linear dual network operations $g_{ij}$, projection dimension $r$, lower bounds $\ell_{i,j}$, $d_{i,j}$ from Equation 13, layer-wise sizes $|z_i|$
  $R^{(1)}_1 := \text{Cauchy}(r, |z_1|)$   // initialize random matrix for the ℓ1 term
  for $i = 2, \ldots, k$ do   // pass each term forward through the network
    for $j = 1, \ldots, i-1$ do
      $R^{(i)}_j, S^{(i)}_j := \sum_{m=1}^{i-1} g^T_{mi}\big(R^{(m)}_j\big),\; \sum_{m=1}^{i-1} g^T_{mi}\big(S^{(m)}_j\big)$
    end for
    $R^{(i)}_i, S^{(i)}_i := \operatorname{diag}(d_i)\,\text{Cauchy}(|z_i|, r),\; d_i$   // initialize terms for layer i
  end for
  output: $\operatorname{median}(|R^{(k)}_1|),\; \tfrac{1}{2}\operatorname{median}(|R^{(k)}_2|) + S^{(k)}_2,\; \ldots,\; \tfrac{1}{2}\operatorname{median}(|R^{(k)}_k|) + S^{(k)}_k$

Note that for an ℓ2-norm-bounded adversarial perturbation, the dual norm is also an ℓ2 norm, so we can use traditional random projections [Vempala, 2005]. Experiments for the ℓ2 norm are explored further in Appendix H. However, for the remainder of this section we focus on the ℓ1 case arising from ℓ∞ perturbations.

Estimating with Cauchy random projections From the work of Li et al. [2007], we can use the sample median estimator with Cauchy random projections to directly estimate $\|\nu_1\|_1$ for linear dual networks, and use a variation of it to estimate $\sum_{j \in \mathcal{I}_i} \ell_{i,j}[\nu_{i,j}]_+$, as shown in Theorem 2 (the proof is in Appendix D.1).
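Before stating the theorem, a quick numerical check of the basic estimator (an illustration only, not the paper's implementation): for a standard Cauchy matrix R, the entries of $\nu^T R$ are Cauchy-distributed with scale $\|\nu\|_1$, so the sample median of their absolute values estimates the ℓ1 norm.

```python
import numpy as np

rng = np.random.default_rng(0)
nu = rng.standard_normal(1000)               # stand-in for a row of nu_1
for r in (10, 50, 1000):                     # projection dimensions
    R = rng.standard_cauchy(size=(1000, r))  # |z_1| x r standard Cauchy matrix
    # median of |nu^T R| vs. the exact l1 norm of nu
    print(r, np.median(np.abs(nu @ R)), np.abs(nu).sum())
```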
Theorem 2. Let $\nu_{1:k}$ be the dual network from Equation 8 with linear dual layers, and let $r > 0$ be the projection dimension. Then, we can estimate

$$\|\nu_1\|_1 \approx \operatorname{median}\big(|\nu_1^T R|\big) \tag{12}$$

where R is a $|z_1| \times r$ standard Cauchy random matrix and the median is taken over the second axis. Furthermore, we can estimate

$$\sum_{j \in \mathcal{I}_i} \ell_{i,j}[\nu_{i,j}]_+ \approx \frac{1}{2}\operatorname{median}\big(|\nu_i^T \operatorname{diag}(d_i) R|\big) + \nu_i^T d_i, \qquad d_{i,j} = \begin{cases} \dfrac{u_{i,j}}{u_{i,j} - \ell_{i,j}} & j \in \mathcal{I}_i \\ 0 & j \notin \mathcal{I}_i \end{cases} \tag{13}$$

where R is a $|z_i| \times r$ standard Cauchy random matrix, and the median is taken over the second axis.

This estimate has two main advantages. First, it is simple to compute: evaluating $\nu_1^T R$ amounts to passing the random matrix forward through the dual network (similarly, the other term requires passing a modified random matrix through the dual network; the exact procedure is detailed in Algorithm 1). Second, it is memory efficient in the backward pass, as the gradient need only propagate through the median entries.

These random projections reduce the computational complexity of computing these terms to that of passing r random Cauchy vectors (and an additional vector) through the network. Crucially, the complexity is no longer a quadratic function of the network size: if we fix the projection dimension to some constant r, then the computational complexity is linear in the input dimension and $|\mathcal{I}_i|$. Since previous work was either quadratic or combinatorially expensive to compute, estimating the bound with random projections is the fastest and most scalable approach towards training robust networks that we are aware of. At test time, the bound can be computed exactly, as the gradients no longer need to be stored. However, if desired, it is possible to use a different estimator (specifically, the geometric estimator) for the ℓ1 norm to calculate high-probability bounds on the adversarial problem, which is discussed in Appendix E.1.

Table 1: Number of hidden units, parameters, and time per epoch for various architectures.

| Model  | Dataset | # hidden units | # parameters | Time (s) / epoch |
|--------|---------|----------------|--------------|------------------|
| Small  | MNIST   | 4804           | 166406       | 74               |
| Small  | CIFAR   | 6244           | 214918       | 48               |
| Large  | MNIST   | 28064          | 1974762      | 667              |
| Large  | CIFAR   | 62464          | 2466858      | 466              |
| Resnet | MNIST   | 82536          | 3254562      | 2174             |
| Resnet | CIFAR   | 107496         | 4214850      | 1685             |

Table 2: Results on MNIST and CIFAR10 with small networks, large networks, residual networks, and cascaded variants.

| Dataset | Model        | Epsilon | Single model robust error | Single model standard error | Cascade robust error | Cascade standard error |
|---------|--------------|---------|---------------------------|-----------------------------|----------------------|------------------------|
| MNIST   | Small, Exact | 0.1     | 4.48%  | 1.26%  | -      | -      |
| MNIST   | Small        | 0.1     | 4.99%  | 1.37%  | 3.13%  | 3.13%  |
| MNIST   | Large        | 0.1     | 3.67%  | 1.08%  | 3.42%  | 3.18%  |
| MNIST   | Small        | 0.3     | 43.10% | 14.87% | 33.64% | 33.64% |
| MNIST   | Large        | 0.3     | 45.66% | 12.61% | 41.62% | 35.24% |
| CIFAR10 | Small        | 2/255   | 52.75% | 38.91% | 39.35% | 39.35% |
| CIFAR10 | Large        | 2/255   | 46.59% | 31.28% | 38.84% | 36.08% |
| CIFAR10 | Resnet       | 2/255   | 46.11% | 31.72% | 36.41% | 35.93% |
| CIFAR10 | Small        | 8/255   | 79.25% | 72.24% | 71.71% | 71.71% |
| CIFAR10 | Large        | 8/255   | 83.43% | 80.56% | 79.24% | 79.14% |
| CIFAR10 | Resnet       | 8/255   | 78.22% | 71.33% | 70.95% | 70.77% |

3.3 Bias reduction with cascading ensembles

A final major challenge of training models to minimize a robust bound on the adversarial loss is that the robustness penalty acts as a regularizer. For example, in a two-layer ReLU network, the robust loss penalizes $\epsilon\|\nu_1\|_1 = \epsilon\|W_1 D_1 W_2\|_1$, which effectively acts as a regularizer on the network with weight ϵ. Because of this, the resulting networks (even those with large representational capacity) are typically over-regularized to the point that many filters/weights become identically zero (i.e., the network capacity is not used).
To address this point, we advocate using a robust cascade of networks: that is, we train a sequence of robust classifiers, where later elements of the cascade are trained (and evaluated) only on those examples that the previous elements of the cascade cannot certify (i.e., those examples that lie within ϵ of the decision boundary). This procedure is formally described in Algorithm 2 in the Appendix.

4 Experiments

Dataset and Architectures We evaluate the techniques in this paper on two main datasets: MNIST digit classification [LeCun et al., 1998] and CIFAR10 image classification [Krizhevsky, 2009].² We test on a variety of deep and wide convolutional architectures, with and without residual connections. All code for these experiments is available at https://github.com/locuslab/convex_adversarial/. The small network is the same as that used in [Wong and Kolter, 2017], with two convolutional layers of 16 and 32 filters and a fully connected layer of 100 units. The large network is a scaled-up version of it, with four convolutional layers of 32, 32, 64, and 64 filters, and two fully connected layers of 512 units. The residual networks use the same structure as [Zagoruyko and Komodakis, 2016], with 4 residual blocks of 16, 16, 32, and 64 filters. We highlight a subset of the results in Table 2, and briefly describe a few key observations below. We leave more extensive experiments and details regarding the experimental setup to Appendix G, including additional experiments on ℓ2 perturbations. Except where otherwise noted, all results use random projections of 50 dimensions.

² We fully realize the irony of a paper with "scaling" in the title that currently maxes out at CIFAR10 experiments. But we emphasize that when it comes to certifiably robust networks, the networks we consider here, as illustrated in Table 1, are more than an order of magnitude larger than any that have been considered previously in the literature. Thus, our emphasis is really on the potential scaling properties of these approaches rather than on large-scale experiments on, e.g., ImageNet-sized data sets.

Figure 2: Training and testing robust error curves over epochs on the MNIST dataset using k projection dimensions. The ϵ value for training is scheduled from 0.01 to 0.1 over the first 20 epochs. The projections force the model to generalize over higher variance, reducing the generalization gap.

Figure 3: Robust error curves as we add models to the cascade for the CIFAR10 dataset on a small model. The ϵ value for training is scheduled to reach 2/255 after 20 epochs. The training curves are for each individual model, and the testing curves are for the whole cascade up to that stage.

Summary of results For the different data sets and models, the final robust and nominal test errors are given in Table 2. We emphasize that in all cases we report the robust test error, that is, our upper bound on the possible test set error that the classifier can suffer under any norm-bounded attack (thus, considering different empirical attacks is orthogonal to our main presentation and not something we include, as we are focused on verified performance). As we are focusing on the random projections discussed above, all experiments consider attacks with bounded ℓ∞ norm and the ReLU networks highlighted above.
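To make the notion of verified robust error concrete, the sketch below (an illustration with toy data, not the paper's experimental code) counts an example as an error if it is either misclassified or cannot be certified. For a plain linear classifier $f(x) = Wx + b$ the certificate is exact: under an ℓ∞ perturbation of radius ϵ, $\min_{\|\delta\|_\infty \le \epsilon} c^T f(x+\delta) = c^T(Wx+b) - \epsilon\|W^T c\|_1$ with $c = e_y - e_{y_{\text{targ}}}$, and the example is certified when this quantity is positive for every target class.

```python
import numpy as np

def robust_error(W, b, X, y, eps):
    """Fraction of examples that are misclassified or not certifiably robust."""
    errors = 0
    for x, yi in zip(X, y):
        logits = W @ x + b
        if logits.argmax() != yi:
            errors += 1
            continue
        certified = True
        for t in range(len(b)):
            if t == yi:
                continue
            c = np.zeros(len(b))
            c[yi], c[t] = 1.0, -1.0
            # Exact worst-case margin for a linear classifier under an l_inf ball.
            if c @ logits - eps * np.abs(W.T @ c).sum() <= 0:
                certified = False
                break
        errors += int(not certified)
    return errors / len(X)

rng = np.random.default_rng(0)
W, b = rng.standard_normal((3, 5)), rng.standard_normal(3)
X, y = rng.standard_normal((100, 5)), rng.integers(0, 3, size=100)
print(robust_error(W, b, X, y, eps=0.1))
```

For the deep networks reported in Table 2, the same counting rule is applied with the lower bound J from Theorem 1 in place of the exact linear margin.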
On MNIST, the (non-cascaded) large model reaches a final robust error of 3.7% for ϵ = 0.1, and the best cascade reaches 3.1% error. This contrasts with the best previous bound of 5.8% robust error for this epsilon, from [Wong and Kolter, 2017]. On CIFAR10, the ResNet model achieves 46.1% robust error for ϵ = 2/255, and the cascade lowers this to 36.4% error. In contrast, the previous best verified robust error for this ϵ, from [Dvijotham et al., 2018], was 80%. While the robust error is naturally substantially higher for ϵ = 8/255 (the amount typically considered in empirical works), we are still able to achieve 71% provable robust error; for comparison, the best empirical robust performance against current attacks is 53% error at ϵ = 8/255 [Madry et al., 2017], and most heuristic defenses have been broken to beyond this error [Athalye et al., 2018].

Number of random projections On the MNIST dataset (the only data set where it is tractable to run exact training without projection), we have evaluated our approach using different projection dimensions as well as exact training (i.e., without random projections). We note that using a substantially lower projection dimension does not have a significant impact on the test error, as highlighted in Figure 2. Using the same convolutional architecture used by Wong and Kolter [2017], which previously required gigabytes of memory and took hours to train, it is sufficient to use only 10 random projections to achieve test error comparable to training with the exact bound. Each training epoch with 10 random projections takes less than a minute on a single GeForce GTX 1080 Ti graphics card, while using less than 700MB of memory, achieving a significant speedup and memory reduction over Wong and Kolter [2017]. The estimation quality and the corresponding speedups obtained are explored in more detail in Appendix E.6.

Cascades Finally, we consider the performance of the cascaded versus non-cascaded models. In all cases, cascading the models improves the robust error, sometimes substantially, for instance decreasing the robust error on CIFAR10 from 46.1% to 36.4% for ϵ = 2/255. However, this comes at a cost as well: the nominal error increases throughout the cascade (this is to be expected, since the cascade essentially tries to force the robust and nominal errors to match). Thus, there is substantial value both in improving the single-model networks and in integrating cascades into the prediction.

5 Conclusion

In this paper, we have presented a general methodology for deriving dual networks from compositions of dual layers, based on the machinery of conjugate functions, in order to train classifiers that are provably robust to adversarial attacks. Importantly, the methodology scales linearly for ReLU-based networks against ℓ∞ norm-bounded attacks, making it possible to train large-scale, provably robust networks that were previously out of reach, and the obtained bounds can be improved further with model cascades. While this marks a significant step forward in scalable defenses for deep networks, there are several directions for improvement. One particularly important direction is better architecture development: a wide range of functions and activations not found in traditional deep residual networks may have better robustness properties or more efficient dual layers that also allow for scalable training.
But perhaps even more importantly, we also need to consider the nature of adversarial perturbations beyond just norm-bounded attacks. Better characterizing the space of perturbations that a network should be resilient to represents one of the major challenges going forward for adversarial machine learning.

References

Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.

Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 39–57. IEEE, 2017.

Chih-Hong Cheng, Georg Nührenberg, and Harald Ruess. Maximum resilience of artificial neural networks. In International Symposium on Automated Technology for Verification and Analysis, pages 251–268. Springer, 2017.

Krishnamurthy Dvijotham, Robert Stanforth, Sven Gowal, Timothy Mann, and Pushmeet Kohli. A dual approach to scalable verification of deep networks. arXiv preprint arXiv:1803.06567, 2018.

Ruediger Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis, 2017.

Werner Fenchel. On conjugate convex functions. Canadian Journal of Mathematics, 1:73–77, 1949.

Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. AI2: Safety and robustness certification of neural networks with abstract interpretation. In IEEE Conference on Security and Privacy, 2018.

Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015. URL http://arxiv.org/abs/1412.6572.

Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems, 2017.

Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. Safety verification of deep neural networks. In International Conference on Computer Aided Verification, pages 3–29. Springer, 2017.

Alex Krizhevsky. Learning multiple layers of features from tiny images. 2009.

Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. In International Conference on Learning Representations, 2017.

Alexey Kurakin, Ian Goodfellow, Samy Bengio, Yinpeng Dong, Fangzhou Liao, Ming Liang, Tianyu Pang, Jun Zhu, Xiaolin Hu, Cihang Xie, et al. Adversarial attacks and defences competition. arXiv preprint arXiv:1804.00097, 2018.

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

Ping Li, Trevor J. Hastie, and Kenneth W. Church. Nonlinear estimators and tail bounds for dimension reduction in l1 using Cauchy random projections. Journal of Machine Learning Research, 8(Oct):2497–2532, 2007.

Alessio Lomuscio and Lalit Maganti. An approach to reachability analysis for feed-forward ReLU neural networks. arXiv preprint arXiv:1706.07351, 2017.

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.

Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations. In International Conference on Learning Representations, 2017.
Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In Security and Privacy (SP), 2016 IEEE Symposium on, pages 582–597. IEEE, 2016.

Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. In International Conference on Learning Representations, 2018.

Aman Sinha, Hongseok Namkoong, and John Duchi. Certifiable distributional robustness with principled adversarial training. In International Conference on Learning Representations, 2018.

Vincent Tjeng and Russ Tedrake. Verifying neural networks with mixed integer programming. CoRR, abs/1711.07356, 2017. URL http://arxiv.org/abs/1711.07356.

Santosh S. Vempala. The Random Projection Method, volume 65. American Mathematical Society, 2005.

Eric Wong and J. Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. arXiv preprint arXiv:1711.00851, 2017.

Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.