# Fast and Effective Robustness Certification

Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, Martin Vechev
Department of Computer Science, ETH Zurich, Switzerland
{gsingh,timon.gehr,matthew.mirman,pueschel,martin.vechev}@inf.ethz.ch

We present a new method and system, called DeepZ, for certifying neural network robustness based on abstract interpretation. Compared to state-of-the-art automated verifiers for neural networks, DeepZ: (i) handles ReLU, Tanh, and Sigmoid activation functions, (ii) supports feedforward, convolutional, and residual architectures, (iii) is significantly more scalable and precise, and (iv) is sound with respect to floating point arithmetic. These benefits are due to carefully designed approximations tailored to the setting of neural networks. As an example, DeepZ achieves a verification accuracy of 97% on a large network with 88,500 hidden units under an $L_\infty$ attack with $\epsilon = 0.1$, with an average runtime of 133 seconds.

## 1 Introduction

Neural networks have become an integral part of many critical applications such as vehicle control, pattern recognition, and medical diagnosis. However, it has recently been shown that neural networks are susceptible to adversarial attacks, where the network can be easily tricked into making a wrong classification by only slightly modifying its inputs [26, 12, 17, 3, 2, 9, 23, 22]. As a result, there is considerable interest in formally ensuring the robustness of neural networks against such attacks.

Robustness verifiers can be complete or incomplete. Complete verifiers do not have false positives but have limited scalability, as they are based on computationally expensive methods such as SMT solving [14, 8], mixed integer linear programming [27], or input refinement [28]. Incomplete verifiers, on the other hand, can produce false positives but scale better than complete verifiers. Incomplete verifiers employ a variety of methods including duality [24, 7], abstract interpretation [10], and linear approximations [29, 15].

Orthogonal to robustness certification, adversarial training focuses on making neural networks robust by training against an adversarial model of attack. Gu and Rigazio [13] add concrete noise to the training set and remove it statistically to defend against adversarial examples. Goodfellow et al. [12] generate adversarial examples misclassified by neural networks and then design a defense by explicitly training against perturbations generated by the attack. Madry et al. [19] show that training against an optimal attack also guards against non-optimal attacks. While this technique was effective in experiments, Carlini et al. [4] demonstrated an attack for the safety-critical problem of ground-truthing where this defense occasionally exacerbated the problem. The work of [30] and [21] proposes scalable defenses against the standard $L_\infty$-based adversarial attacks.

In this paper we build on the work of Gehr et al. [10], which introduced the idea of using the classic framework of abstract interpretation [5] to soundly approximate the behavior of a network. Like [10], we use the Zonotope abstraction for sound approximation. The Zonotope domain provides a closed affine form for each variable and enables a precise and cheap way to capture the effect of affine transformations inside neural networks, without requiring backpropagation as in [15, 29].
The key limitation of [10], however, is that it only provides a fairly generic abstract transformer for the ReLU activation function, which is slow and imprecise. Further, that work does not discuss approximations of other important functions (e.g., Sigmoid, Tanh). Indeed, defining sound, scalable, and precise abstract transformers is the most difficult aspect of abstract-interpretation-based analyses. While generic transformers tend to be easier to reason about and to prove sound, they often lack the scalability and precision of transformers that exploit the underlying properties of the abstract domain (e.g., Zonotope) and the function being approximated (e.g., ReLU).

Our contributions. In this work we address these limitations and make the following contributions:

- We introduce new, pointwise Zonotope abstract transformers specifically designed for the ReLU, Sigmoid, and Tanh activations often used by neural networks. Our transformers minimize the area of the projection of the zonotope onto the 2-D input-output plane. Further, our transformers are sound w.r.t. floating point arithmetic.
- We implemented both sequential and parallel versions of our transformers in an end-to-end automated verification system called DeepZ.
- We evaluated DeepZ on the task of verifying local robustness against $L_\infty$-norm based attacks on large MNIST and CIFAR10 feedforward, convolutional, and residual networks. In our evaluation we considered both undefended and defended versions of the same network (defended against $L_\infty$ attacks using state-of-the-art defenses). Our experimental results demonstrate that DeepZ is more precise and faster than prior work. DeepZ precisely verifies large networks with more than 88,000 hidden units under $L_\infty$-norm based perturbations within a few minutes, while being sound w.r.t. floating point arithmetic.

To the best of our knowledge, DeepZ is currently the most scalable system for certifying local robustness of neural networks while guaranteeing soundness w.r.t. floating point operations (used by all neural networks). All of our code, datasets, and results are publicly available at http://safeai.ethz.ch/.

## 2 Abstract Interpretation for Verifying Robustness of Neural Networks

Abstract interpretation [5] is a classic method for sound and precise over-approximation of a potentially unbounded or infinite set of program behaviors. The key idea behind this framework is to define so-called abstract transformers for the statements used by the program (e.g., affine arithmetic, ReLU functions, etc.). These transformers soundly approximate the behavior of a statement by defining its effect on an abstract domain. An abstract domain is simply a set of abstract elements (approximations), typically ordered in a lattice of some kind. A key challenge when defining abstract transformers is striking a balance between scalability (how fast the transformer computes the approximation) and precision (how much precision it loses). Once transformers are defined, the analysis with abstract interpretation proceeds by executing them on the particular program (e.g., a neural network) and computing a final approximation (a fixed point).
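For intuition, the following minimal sketch (our own illustration, not part of the paper's implementation) shows abstract transformers for the simple Interval domain: an affine layer and a ReLU are each given an effect on abstract elements (here, coordinate-wise bounds), and the analysis simply composes these effects layer by layer.

```python
import numpy as np

def affine_transformer(lower, upper, W, b):
    """Interval abstract transformer for y = Wx + b.

    Each output bound picks the worst-case input bound depending on
    the sign of the corresponding weight (a sound over-approximation).
    """
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    new_lower = W_pos @ lower + W_neg @ upper + b
    new_upper = W_pos @ upper + W_neg @ lower + b
    return new_lower, new_upper

def interval_relu(lower, upper):
    """Interval abstract transformer for y = max(0, x)."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

# Propagate the box [x - eps, x + eps] through one layer.
x, eps = np.array([0.5, -0.2]), 0.1
W, b = np.array([[1.0, -1.0], [2.0, 1.0]]), np.array([0.0, 0.5])
l, u = affine_transformer(x - eps, x + eps, W, b)
l, u = interval_relu(l, u)
```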
The relevant property can then be checked on this final approximation: if the property can be proved, then it holds for any concrete input to the program; otherwise, it may either hold but the abstraction was too coarse to prove it (i.e., a false positive), or it may indeed not hold.

Verifying robustness properties of neural networks exactly is computationally expensive, as it usually requires evaluating the network exhaustively on a prohibitively large set of inputs. Abstract interpretation can be leveraged for this problem by designing abstract transformers specifically for the computations used in the network, e.g., affine arithmetic and activation functions. The network can then be analyzed using these abstract transformers. For example, we can abstract a concrete input $x$ and relevant perturbations of $x$ (resulting in many different inputs) into one abstract element $\alpha_R$ and then analyze the network starting from $\alpha_R$, producing an abstract output $\alpha_R^o$. We can then verify the robustness property of interest over $\alpha_R^o$: if successful, it means we verified it over all concrete outputs corresponding to all perturbations of the concrete input.

In this paper, we consider local robustness properties $(R_{x,\epsilon}, C_L)$ where $R_{x,\epsilon}$ represents the set of perturbed inputs around the original input $x \in \mathbb{R}^m$ based on a small constant $\epsilon > 0$. $C_L$ is a robustness condition which defines the set of outputs that all have the same label $L$:

$$C_L = \{\, y \in \mathbb{R}^n \mid \arg\max_{i \in \{1,\ldots,n\}} y_i = L \,\}.$$

A robustness property $(R_{x,\epsilon}, C_L)$ holds iff the set of outputs $O_R$ corresponding to all inputs in $R_{x,\epsilon}$ is included in $C_L$. $(R_{x,\epsilon}, C_L)$ can be verified using abstract interpretation by checking whether the abstract output $\alpha_R^o$, resulting from analyzing the network with an abstraction of $R_{x,\epsilon}$, is included in $C_L$.

Zonotope abstraction. In this work, we build on the classic Zonotope numerical abstract domain, which we discuss below. This domain was already shown to be a suitable basis for analyzing neural networks by Gehr et al. [10]. In the next section, we introduce our new abstract transformers, which leverage properties of the domain and are the novel contribution of this work.

Let $\mathcal{X}$ be the set of $n$ variables. The Zonotope abstraction [11] builds on affine arithmetic by associating an affine expression $\hat{x}$ with each variable $x \in \mathcal{X}$:

$$\hat{x} := \alpha_0 + \sum_{i=1}^{p} \alpha_i \cdot \epsilon_i, \quad \text{where } \alpha_0, \alpha_i \in \mathbb{R},\ \epsilon_i \in [a_i, b_i] \subseteq [-1, 1]. \tag{1}$$

This expression consists of a center coefficient $\alpha_0$, a set of noise symbols $\epsilon_i$, and coefficients $\alpha_i$ representing partial deviations around the center. Crucially, the noise symbols $\epsilon_i$ can be shared between the affine forms of different variables, which creates implicit dependencies and constraints between the affine forms. This makes the Zonotope abstraction more powerful than an Interval abstraction, which only maintains a range for each variable $x$. A range $[l_x, u_x]$ can be derived from the affine form simply by computing the minimal and maximal values it can take. A zonotope $Z \subseteq \mathbb{R}^n$ is represented by a collection of affine forms for all variables $x \in \mathcal{X}$, and is the set of all possible (joint) values of the affine forms for an arbitrary instantiation of the noise symbols $\epsilon_i$. As it is impossible in practice to compute with arbitrary real numbers, we instead use a slightly modified definition:

$$\hat{x} := [\alpha_0, \beta_0] + \sum_{i=1}^{p} [\alpha_i, \beta_i] \cdot \epsilon_i, \quad \text{where } \alpha_0, \beta_0, \alpha_i, \beta_i \in \mathbb{R},\ \epsilon_i \in [a_i, b_i] \subseteq [-1, 1]. \tag{2}$$

In this interval affine form, we have replaced all coefficients by intervals. All computations on intervals are performed using standard interval arithmetic.
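As an illustration of the real-valued affine form of Eq. (1), the following minimal sketch (our own example, not the ELINA implementation) represents a variable as a center plus deviation coefficients, derives its interval concretization $[l_x, u_x]$, and shows how shared noise symbols keep variables correlated.

```python
import numpy as np

class AffineForm:
    """Zonotope affine form: x_hat = alpha_0 + sum_i alpha_i * eps_i,
    with shared noise symbols eps_i in [-1, 1] (Eq. (1), real arithmetic)."""

    def __init__(self, center, coeffs):
        self.center = float(center)                 # alpha_0
        self.coeffs = np.asarray(coeffs, dtype=float)  # alpha_1..alpha_p

    def bounds(self):
        """Interval concretization [l_x, u_x]: each eps_i independently
        pushes the value by at most |alpha_i| in either direction."""
        dev = np.abs(self.coeffs).sum()
        return self.center - dev, self.center + dev

    def affine(self, w, b):
        """Exact transformer for y = w * x + b (no new noise symbol)."""
        return AffineForm(w * self.center + b, w * self.coeffs)

# Two variables sharing the noise symbol eps_1 stay correlated:
x1 = AffineForm(1.0, [0.5, 0.0])  # x1 = 1 + 0.5*eps_1
x2 = x1.affine(-1.0, 2.0)         # x2 = 1 - 0.5*eps_1, so x1 + x2 = 2 exactly
print(x1.bounds(), x2.bounds())   # (0.5, 1.5) (0.5, 1.5)
```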
To ensure soundness with respect to different rounding modes, and to account for the lack of algebraic properties such as associativity and distributivity in the floating point world, lower bounds and upper bounds are rounded towards $-\infty$ and $+\infty$ respectively, and suitable error values are added, as explained in [20]. Since affine arithmetic is fast and exact for affine transformations, it is an attractive candidate for the verification of neural networks [10]. However, the Zonotope abstraction is inherently not exact for non-linear activation functions such as ReLU, Sigmoid, and Tanh. Thus, approximation is needed, which creates a tradeoff between the cost of computation and precision. As mentioned earlier, a generic approximation of the ReLU function was proposed by Gehr et al. [10]; however, this approximation is both imprecise and costly, as it relies on the expensive Zonotope join operator. Overall, this results in suboptimal precision and performance of the analysis.

## 3 Fast Zonotope Abstract Transformers

We now introduce our fast and precise pointwise Zonotope abstract transformers for the ReLU, Sigmoid, and Tanh activations (Sigmoid and Tanh are not supported by Gehr et al. [10]) and show their optimality in terms of area in the input-output plane. Our evaluation in Section 4 shows that our proposed approximations strike a good balance between precision and performance.

### 3.1 ReLU

The effect of applying the ReLU function on an input zonotope $Z$ can be represented with the assignment $y := \max(0, x)$ where $x, y \in \mathcal{X}$. If $x$ can only have positive values ($l_x > 0$) or only non-positive values ($u_x \leq 0$) in $Z$, then $\hat{y} = \hat{x}$ or $\hat{y} = [0, 0]$ respectively. The affine forms for the remaining variables are not affected and the resulting zonotope is exact. When $x$ can have both positive and negative values, the output cannot be exactly captured by the zonotope abstraction, and thus approximations are required. We define such an approximation for this case. The approximation can also be applied pointwise per layer, namely, only altering the affine form $\hat{y}$ while keeping all other affine forms in $Z$ unaltered.

Figure 1: Two zonotope approximations for the ReLU function parameterized by the slope $\lambda$: (a) $0 \leq \lambda \leq \frac{u_x}{u_x - l_x}$ with boundary line $\hat{y} = \lambda \cdot \hat{x} + u_x \cdot (1 - \lambda)$, and (b) $\frac{u_x}{u_x - l_x} \leq \lambda \leq 1$ with boundary line $\hat{y} = \lambda \cdot (\hat{x} - l_x)$.

Fig. 1 shows the projections onto the $xy$-plane of two sets of sound zonotope approximations. The projections have the shape of a parallelogram with two vertical lines and two parallel lines of slope $\lambda$, which is a parameter. To ensure soundness for all approximations in Fig. 1 (a), we require $0 \leq \lambda \leq \frac{u_x}{u_x - l_x}$. Similarly, $\frac{u_x}{u_x - l_x} \leq \lambda \leq 1$ for Fig. 1 (b). Notice that the two sets have one element in common at $\lambda = \frac{u_x}{u_x - l_x}$. Among the different candidate approximations in Fig. 1, we choose the one minimizing the area of the parallelogram in the $xy$-plane. The area $A_1(\lambda)$ of the parallelogram in Fig. 1 (a) is:

$$A_1(\lambda) = (1 - \lambda) \cdot u_x \cdot (u_x - l_x). \tag{3}$$

$A_1(\lambda)$ is a decreasing function of $\lambda$; thus $A_1$ is minimized at $\lambda = \frac{u_x}{u_x - l_x}$. Similarly, the area $A_2(\lambda)$ of the parallelogram in Fig. 1 (b) is:

$$A_2(\lambda) = \lambda \cdot (-l_x) \cdot (u_x - l_x). \tag{4}$$

$A_2(\lambda)$ is an increasing function of $\lambda$ and is also minimized at $\lambda = \frac{u_x}{u_x - l_x}$. In summary, we obtain the following theorem:

Theorem 3.1. Let $Z$ be the input to a ReLU function $y = \mathrm{ReLU}(x)$. Consider the set of pointwise Zonotope approximations $O$ of the output that only alter the affine form $\hat{y}$ of the variable $y$. The new affine form $\hat{y}$ for the output with the minimal area in the $xy$-plane is given by:

$$\hat{y} = \begin{cases} \hat{x}, & \text{if } l_x > 0, \\ [0, 0], & \text{if } u_x \leq 0, \\ [\lambda_l, \lambda_u] \cdot \hat{x} + [\mu_l, \mu_u] + [\mu_l, \mu_u] \cdot \epsilon_{\mathrm{new}}, & \text{otherwise.} \end{cases} \tag{5}$$

Here, $\lambda_l, \lambda_u$ are floating point representations of $\lambda_{\mathrm{opt}} = \frac{u_x}{u_x - l_x}$ using rounding towards $-\infty$ and $+\infty$ respectively. Similarly, $\mu_l, \mu_u$ are floating point representations of $\mu = \frac{-u_x \cdot l_x}{2 \cdot (u_x - l_x)}$ using rounding towards $-\infty$ and $+\infty$ respectively, and $\epsilon_{\mathrm{new}} \in [-1, 1]$ is a new noise symbol.

The running time of the optimal transformer in Theorem 3.1 is linear in the number $p$ of noise symbols. One could also define an optimal Zonotope transformer minimizing the volume of the output zonotope; however, this is too expensive, and the resulting transformer cannot be applied pointwise.
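The following sketch, continuing the AffineForm example from Section 2, shows how the optimal pointwise ReLU transformer of Theorem 3.1 acts on an affine form. It works in real arithmetic only, so it omits the directed rounding and error terms the theorem requires for floating point soundness.

```python
import numpy as np

def relu_transformer(x_hat):
    """Pointwise zonotope ReLU transformer (Theorem 3.1, real-valued
    sketch; a floating-point-sound version must additionally round
    lambda and mu outwards).

    Returns a new AffineForm with one extra noise symbol eps_new; in a
    full analysis every other affine form gets coefficient 0 for it."""
    lx, ux = x_hat.bounds()
    if lx > 0:                        # x strictly positive: ReLU is identity
        return x_hat
    if ux <= 0:                       # x non-positive: output is exactly 0
        return AffineForm(0.0, np.zeros(len(x_hat.coeffs) + 1))
    lam = ux / (ux - lx)              # area-optimal slope lambda_opt
    mu = -ux * lx / (2 * (ux - lx))   # center shift; also coefficient of eps_new
    return AffineForm(lam * x_hat.center + mu,
                      np.append(lam * x_hat.coeffs, mu))
```

For an input with bounds $[l_x, u_x]$ straddling zero, the resulting output bounds are $[\lambda_{\mathrm{opt}} \cdot l_x,\ u_x]$, which matches the parallelogram of Fig. 1 at the optimal slope.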
### 3.2 Sigmoid

The effect of applying the Sigmoid function on an input zonotope $Z$ can be represented with the assignment $y := \sigma(x)$ where $x, y \in \mathcal{X}$ and $\sigma(x) = \frac{e^x}{1 + e^x}$. For the assigned variable $y$, we have $[l_y, u_y] \subseteq [0, 1]$. When $l_x = u_x$, then $\hat{y} := \left[\frac{e^{u_x}}{1 + e^{u_x}}, \frac{e^{u_x}}{1 + e^{u_x}}\right]$ and the resulting zonotope is exact; otherwise the output cannot be exactly represented by a zonotope, and thus approximations are required. We define pointwise approximations for the Sigmoid function such that $l_y = \sigma(l_x)$, $u_y = \sigma(u_x)$, and then choose the one minimizing the area of its projection onto the $xy$-plane.

Fig. 2 shows the projections onto the $xy$-plane of a set of sound zonotope approximations for the output of the Sigmoid function which have $l_y = \sigma(l_x)$, $u_y = \sigma(u_x)$. As for ReLU, the projections have the shape of a parallelogram with two vertical lines and two parallel lines of slope $\lambda$, which parameterizes the set. To ensure soundness, we require $0 \leq \lambda \leq \min(\sigma'(l_x), \sigma'(u_x))$, where $\sigma'(x) = \frac{e^x}{(1 + e^x)^2}$. The area $A(\lambda)$ of the parallelogram with slope $\lambda$ in Fig. 2 is:

$$A(\lambda) = (\sigma(u_x) - \sigma(l_x) - \lambda \cdot (u_x - l_x)) \cdot (u_x - l_x). \tag{6}$$

Figure 2: Zonotope approximation for the Sigmoid function parameterized by slope $\lambda$, where $0 \leq \lambda \leq \min(f'(l_x), f'(u_x))$; the two parallel boundary lines are $\hat{y} = f(l_x) + \lambda \cdot (\hat{x} - l_x)$ and $\hat{y} = f(u_x) + \lambda \cdot (\hat{x} - u_x)$.

$A(\lambda)$ is a decreasing function of $\lambda$ and thus is minimized at $\lambda_{\mathrm{opt}} = \min(\sigma'(l_x), \sigma'(u_x))$. This yields the following theorem:

Theorem 3.2. Let $Z$ be the input to a smooth S-shaped¹ function $y = f(x)$ (such as the Sigmoid function $y = \sigma(x) = \frac{e^x}{1 + e^x}$). Consider the set of pointwise Zonotope approximations $O$ of the output that only alter the affine form $\hat{y}$ of the variable $y$ and where the box concretization of $\hat{y}$ satisfies $l_y = f(l_x)$, $u_y = f(u_x)$. The new affine form $\hat{y}$ for the output with the minimum area in the $xy$-plane is given by:

$$\hat{y} = \begin{cases} [f(u_x)_l, f(u_x)_u], & \text{if } l_x = u_x, \\ [\lambda_l, \lambda_u] \cdot \hat{x} + [\mu^1_l, \mu^1_u] + [\mu^2_l, \mu^2_u] \cdot \epsilon_{\mathrm{new}}, & \text{otherwise.} \end{cases} \tag{7}$$

Here, $f(u_x)_l, f(u_x)_u$ are floating point representations of $f(u_x)$ rounded towards $-\infty$ and $+\infty$ respectively, and $\lambda_l, \lambda_u$ are floating point representations of $\lambda_{\mathrm{opt}} = \min(f'(l_x), f'(u_x))$ using rounding towards $-\infty$ and $+\infty$ respectively. Similarly, $\mu^1_l, \mu^1_u$ and $\mu^2_l, \mu^2_u$ are floating point representations of $\mu^1 = \frac{1}{2}(f(u_x) + f(l_x) - \lambda_{\mathrm{opt}} \cdot (u_x + l_x))$ and $\mu^2 = \frac{1}{2}(f(u_x) - f(l_x) - \lambda_{\mathrm{opt}} \cdot (u_x - l_x))$, computed using rounding towards $-\infty$ and $+\infty$ and adding the error due to the non-associativity of floating point addition, and $\epsilon_{\mathrm{new}} \in [-1, 1]$ is a new noise symbol.

As with ReLU, the optimal Sigmoid transformer in Theorem 3.2 has running time linear in the number of noise symbols and can be applied pointwise.

### 3.3 Tanh

The Tanh function is also S-shaped, like the Sigmoid function. A fast, optimal, and pointwise Tanh transformer can be defined using Theorem 3.2 by setting $f(x) = \tanh(x)$ and $f'(x) = 1 - \tanh^2(x)$.

¹A smooth function $f : \mathbb{R} \to \mathbb{R}$ is said to be S-shaped if $f'(x) \geq 0$ for all $x \in \mathbb{R}$ and there exists a value $x^*$ such that for all $x \in \mathbb{R}$, we have $f''(x) \geq 0 \iff x \leq x^*$.
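Analogously to the ReLU sketch above, a real-valued version of the S-shaped transformer of Theorem 3.2 can be written as follows; the directed rounding and error terms of the theorem are again omitted, so this is a sketch rather than a floating-point-sound implementation.

```python
import numpy as np

def s_shaped_transformer(x_hat, f, df):
    """Pointwise zonotope transformer of Theorem 3.2 for a smooth
    S-shaped f with derivative df (e.g., sigmoid or tanh);
    real-valued sketch without outward rounding."""
    lx, ux = x_hat.bounds()
    if lx == ux:                                   # degenerate input: exact
        return AffineForm(f(ux), np.zeros(len(x_hat.coeffs) + 1))
    lam = min(df(lx), df(ux))                      # area-optimal slope
    mu1 = 0.5 * (f(ux) + f(lx) - lam * (ux + lx))  # new center offset
    mu2 = 0.5 * (f(ux) - f(lx) - lam * (ux - lx))  # coefficient of eps_new
    return AffineForm(lam * x_hat.center + mu1,
                      np.append(lam * x_hat.coeffs, mu2))

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
dsigmoid = lambda x: sigmoid(x) * (1.0 - sigmoid(x))
# Tanh: s_shaped_transformer(x_hat, np.tanh, lambda x: 1 - np.tanh(x) ** 2)
```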
## 4 Experiments

We now evaluate the effectiveness of our new Zonotope transformers for verifying local robustness of neural networks. Our implementation is available as an end-to-end automated verifier, called DeepZ. The verifier is implemented in Python; the underlying abstract transformers, however, are implemented in C (for performance), in both sequential and parallel versions, and are made available as part of the public ELINA [1, 25] library.

### 4.1 Experimental setup

Evaluation datasets. We used the popular MNIST [18] and CIFAR10 [16] datasets for our experiments. MNIST contains 60,000 grayscale images of size 28 × 28 pixels and CIFAR10 consists of 60,000 RGB images of size 32 × 32 pixels.

Neural networks. Table 1 shows the fully connected feedforward (FFNN), convolutional (CNN), and residual networks for the MNIST and CIFAR10 datasets used in our experiments. We used both undefended and defended training procedures for training our networks. For adversarial training, we used DiffAI [21] and projected gradient descent (PGD) [6] parameterized with $\epsilon$. In our evaluation, we refer to the undefended nets as Point, and to the defended networks by the name of the training procedure (either DiffAI or PGD). More details on our neural networks and the training procedures can be found in the appendix.

Table 1: Neural network architectures used in our experiments.

| Dataset | Model | Type | #Hidden units |
|---|---|---|---|
| MNIST | FFNNSmall | fully connected | 610 |
| | FFNNBig | fully connected | 3,010 |
| | ConvSmall | convolutional | 3,604 |
| | ConvMed | convolutional | 4,804 |
| | ConvBig | convolutional | 34,688 |
| | ConvSuper | convolutional | 88,500 |
| | Skip | residual | 71,650 |
| CIFAR10 | FFNNBig | fully connected | 3,010 |
| | ConvSmall | convolutional | 4,852 |
| | ConvBig | convolutional | 62,464 |

Robustness property. We consider the standard $L_\infty$-norm-based perturbation regions $R_{x,\epsilon}$ [3], where $R_{x,\epsilon}$ contains all perturbed inputs $x'$ in which each pixel $x'_i$ has a distance of at most $\epsilon$ from the corresponding pixel $x_i$ in the original input $x$. $R_{x,\epsilon}$ can be exactly represented by a single zonotope.

Benchmarks. We selected the first 100 images from the test set of each dataset. Then, we specified a robustness property for each image using a set of robustness bounds $\epsilon$.
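For concreteness, the following sketch (our own illustration, continuing the real-valued AffineForm examples above) shows how $R_{x,\epsilon}$ is encoded as a single zonotope with one noise symbol per pixel, and how the robustness condition $C_L$ can be checked on the network's output affine forms by bounding the differences $y_L - y_j$ with the shared noise symbols. A floating-point-sound implementation such as ELINA would additionally use directed rounding.

```python
import numpy as np

def input_zonotope(x, eps):
    """Encode R_{x,eps} exactly: pixel i is x_i + eps * eps_i with its
    own noise symbol eps_i in [-1, 1] (clipping to valid pixel range
    is omitted for brevity)."""
    m = len(x)
    return [AffineForm(x[i], eps * np.eye(m)[i]) for i in range(m)]

def verified_label(outputs, L):
    """Check the robustness condition C_L on output affine forms
    (assumed to share one aligned noise-symbol vector): for every
    j != L, require the lower bound of y_L - y_j to be positive."""
    yL = outputs[L]
    for j, yj in enumerate(outputs):
        if j == L:
            continue
        diff = AffineForm(yL.center - yj.center, yL.coeffs - yj.coeffs)
        lo, _ = diff.bounds()
        if lo <= 0:   # cannot prove y_L > y_j for all perturbations
            return False
    return True
```

Bounding the difference $y_L - y_j$ directly, rather than comparing the individual intervals of $y_L$ and $y_j$, exploits the shared noise symbols and is strictly more precise.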
### 4.2 Experimental results

All experiments for the FFNNs were carried out on a 3.3 GHz 10-core Intel i9-7900X Skylake CPU with 64 GB main memory; the CNNs and the residual network were evaluated on a 2.6 GHz 14-core Intel Xeon E5-2690 CPU with 512 GB main memory. We used a time limit of 10 minutes per run for all our experiments.

Comparison with prior work. We compare the precision and performance of the sequential version of DeepZ against two state-of-the-art certifiers, Fast-Lin [29] and AI2 [10], on the FFNNSmall MNIST network with ReLU activation. We note that both of these certifiers support only a subset of the network architectures that DeepZ supports. Specifically, Fast-Lin supports only FFNNs with ReLU activations, whereas AI2 supports FFNNs and CNNs with ReLU activations. We also note that Fast-Lin is not sound under floating point semantics.

Figure 3: Comparing the performance and precision of DeepZ with the state of the art on the MNIST FFNNSmall ReLU network: (a) verified robustness and (b) runtime, both as functions of $\epsilon$.

Fig. 3 shows the percentage of verified robustness and the average analysis time of all three certifiers; the values of $\epsilon$ are shown on the x-axis. DeepZ has the same precision as Fast-Lin but is up to 2.5× faster. We note that the runtime of DeepZ increases with increasing values of $\epsilon$; this is because the complexity of our analysis is determined by the maximum number of noise symbols in the affine forms. Our ReLU transformer creates one noise symbol for any variable that can take both positive and negative values, and the number of such cases rises with increasing $\epsilon$. On the other hand, AI2 is significantly less precise and slower than both Fast-Lin and DeepZ. We also compared DeepZ against the duality-based certifier from [7]; however, it always timed out in our experiments.

Detailed experiments. Next, we evaluate DeepZ on the remaining networks using the parallelized version of our Zonotope transformers.

Figure 4: Verified robustness and runtime of DeepZ on the MNIST networks with ReLU activations: (a, b) FFNNBig and (c, d) ConvSmall.

Fig. 4 shows the percentage of verified robustness and the average analysis time of DeepZ for the MNIST networks with ReLU activations. DeepZ analyzes all FFNNBig networks with an average runtime of 22 seconds and proves 95% of the robustness properties at $\epsilon = 0.04$ for the defended PGD$_{\epsilon=0.3}$ network. DeepZ analyzes all ConvSmall networks with an average runtime of 2 seconds. It proves 95% of the robustness properties at $\epsilon = 0.1$ on the ConvSmall network defended with DiffAI.

Table 2: Verified robustness by DeepZ on the large networks trained with DiffAI.

| Dataset | Model | ϵ | % verified robustness | Average runtime (s) |
|---|---|---|---|---|
| MNIST | ConvBig | 0.1 | 97 | 5 |
| | ConvBig | 0.2 | 79 | 7 |
| | ConvBig | 0.3 | 37 | 17 |
| | ConvSuper | 0.1 | 97 | 133 |
| | Skip | 0.1 | 95 | 29 |
| CIFAR10 | ConvBig | 0.006 | 50 | 39 |

Table 2 shows the precision and the performance of DeepZ on the ConvBig, ConvSuper, and Skip networks trained with DiffAI. DeepZ proves 95% and 97% of the robustness properties for the Skip and ConvSuper networks, containing more than 71,000 and 88,000 hidden units respectively, in 29 and 133 seconds on average.

Figure 5: Verified robustness by DeepZ on the MNIST networks with Sigmoid and Tanh activations: (a) FFNNBig Sigmoid, (b) FFNNBig Tanh, (c) ConvMed Sigmoid, (d) ConvMed Tanh.

Fig. 5 shows the precision and the performance of DeepZ on the MNIST FFNNBig and ConvMed networks with Sigmoid and Tanh activations. DeepZ verifies 74% of the robustness properties on the FFNNBig Sigmoid and Tanh networks trained with PGD$_{\epsilon=0.3}$ for $\epsilon = 0.03$. DeepZ verifies 82% of the robustness properties on the ConvMed Sigmoid network for $\epsilon = 0.1$; the corresponding number for the Tanh network is 33%. We note that unlike the ReLU transformer, both the Sigmoid and the Tanh transformers always create a new noise symbol whenever $l_x \neq u_x$.
Thus, the runtime does not increase significantly with $\epsilon$ and is not plotted. DeepZ has an average runtime of 35 and 22 seconds on all FFNNBig and ConvMed networks, respectively.

Figure 6: Verified robustness by DeepZ on the CIFAR10 networks with ReLU activations: (a, b) FFNNBig and (c, d) ConvSmall.

Fig. 6 shows that DeepZ has an average runtime of 50 seconds for the CIFAR10 FFNNBig ReLU networks. It can be seen that the defended FFNNBig CIFAR10 ReLU networks are not significantly more provable than the undefended network. However, DeepZ verifies more properties on the defended ConvSmall networks than on the undefended one, and proves 75% of the robustness properties on the DiffAI-defended network for $\epsilon = 0.01$. DeepZ has an average runtime of 3 seconds on all ConvSmall networks. DeepZ verifies 50% of the robustness properties on the ConvBig network defended with DiffAI, with an average runtime of 39 seconds, as shown in Table 2.

Figure 7: Verified robustness by DeepZ on the CIFAR10 networks with Sigmoid and Tanh activations: (a) FFNNBig Sigmoid and (b) FFNNBig Tanh.

DeepZ verifies 82% of the robustness properties on the FFNNBig Sigmoid network defended with PGD$_{\epsilon=0.0078}$ for $\epsilon = 0.0012$ in Fig. 7. It verifies 46% of the robustness properties on the FFNNBig network with Tanh activation trained using PGD$_{\epsilon=0.0313}$ for the same $\epsilon$. The average runtime of DeepZ on all networks is 90 seconds.

## 5 Conclusion

We introduced fast and precise Zonotope abstract transformers for the key non-linear activations used in modern neural networks. We used these transformers to build DeepZ, an automated verifier for proving the robustness of neural networks against adversarial attacks. We evaluated the effectiveness of DeepZ on verifying the robustness of large feedforward, convolutional, and residual networks against challenging $L_\infty$-norm attacks. Our results show that DeepZ is more precise and faster than prior work, while also ensuring soundness with respect to floating point operations.

## References

[1] ELINA: ETH Library for Numerical Analysis. http://elina.ethz.ch.
[2] Anish Athalye and Ilya Sutskever. Synthesizing robust adversarial examples. In Proc. International Conference on Machine Learning (ICML), 2018.
[3] Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. In Proc. IEEE Symposium on Security and Privacy (SP), pages 39–57, 2017.
[4] Nicholas Carlini, Guy Katz, Clark Barrett, and David L. Dill. Ground-truth adversarial examples. CoRR, abs/1709.10207, 2017.
[5] Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proc. Symposium on Principles of Programming Languages (POPL), pages 238–252, 1977.
[6] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[7] Krishnamurthy Dvijotham, Robert Stanforth, Sven Gowal, Timothy Mann, and Pushmeet Kohli. A dual approach to scalable verification of deep networks. In Proc. Uncertainty in Artificial Intelligence (UAI), pages 162–171, 2018.
[8] Rüdiger Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In Automated Technology for Verification and Analysis (ATVA), pages 269–286, 2017.
[9] Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, and Dawn Song. Robust physical-world attacks on deep learning models. arXiv preprint arXiv:1707.08945, 2017.
[10] T. Gehr, M. Mirman, D. Drachsler-Cohen, P. Tsankov, S. Chaudhuri, and M. Vechev. AI2: Safety and robustness certification of neural networks with abstract interpretation. In Proc. IEEE Symposium on Security and Privacy (SP), pages 948–963, 2018.
[11] Khalil Ghorbal, Eric Goubault, and Sylvie Putot. The zonotope abstract domain Taylor1+. In Proc. Computer Aided Verification (CAV), pages 627–633, 2009.
[12] Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In Proc. International Conference on Learning Representations (ICLR), 2015.
[13] Shixiang Gu and Luca Rigazio. Towards deep neural network architectures robust to adversarial examples. arXiv preprint arXiv:1412.5068, 2014.
[14] Guy Katz, Clark W. Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In Proc. Computer Aided Verification (CAV), pages 97–117, 2017.
[15] J. Zico Kolter and Eric Wong. Provable defenses against adversarial examples via the convex outer adversarial polytope. In Proc. International Conference on Machine Learning (ICML), 2018.
[16] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
[17] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016.
[18] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, pages 2278–2324, 1998.
[19] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In Proc. International Conference on Learning Representations (ICLR), 2018.
[20] Antoine Miné. Relational abstract domains for the detection of floating-point run-time errors. In Proc. European Symposium on Programming (ESOP), pages 3–17, 2004.
[21] Matthew Mirman, Timon Gehr, and Martin Vechev. Differentiable abstract interpretation for provably robust neural networks. In Proc. International Conference on Machine Learning (ICML), 2018.
[22] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.
[23] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proc. Asia Conference on Computer and Communications Security, pages 506–519, 2017.
[24] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. In Proc. International Conference on Machine Learning (ICML), 2018.
[25] Gagandeep Singh, Markus Püschel, and Martin Vechev. Fast polyhedra abstract domain. In Proc. Principles of Programming Languages (POPL), pages 46–59, 2017.
[26] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013.
[27] Vincent Tjeng and Russ Tedrake. Verifying neural networks with mixed integer programming. CoRR, abs/1711.07356, 2017.
[28] Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. Formal security analysis of neural networks using symbolic intervals. In USENIX Security Symposium (USENIX Security 18), pages 1599–1614, 2018.
[29] Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards fast computation of certified robustness for ReLU networks. In Proc. International Conference on Machine Learning (ICML), pages 5273–5282, 2018.
[30] Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J. Zico Kolter. Scaling provable adversarial defenses. CoRR, abs/1805.12514, 2018. URL http://arxiv.org/abs/1805.12514.

## A Dataset Normalization

For each dataset, we include a normalization layer (which gets applied after the $\epsilon$-sized box has been computed) using an approximate mean $\mu$ and standard deviation $\sigma$ per channel, as $\frac{X - \mu}{\sigma}$.

- MNIST: $\mu = 0.1307$, $\sigma = 0.3081$.
- CIFAR10: $\mu = [0.4914, 0.4822, 0.4465]$, $\sigma = [0.2023, 0.1994, 0.2010]$.

## B Neural Networks Evaluated

We test with six architectures: one feedforward, four convolutional (without maxpool), and one with a residual connection. These are trained in various ways. In the following descriptions, we use Conv$_s$ $C \times W \times H$ to mean a convolutional layer that outputs $C$ channels, with a kernel width of $W$ pixels, a height of $H$ pixels, and a stride of $s$, which then applies ReLU to every output. FC $n$ is a fully connected layer which outputs $n$ neurons without automatically applying ReLU. For each architecture we test three versions: (i) an undefended network; (ii) a network defended with MI-FGSM (a PGD variant, which we refer to as PGD in the graphs) [6] with $\mu = 1$, 22 iterations, and two restarts, where the step size is $\epsilon \cdot 5.5^{-1}$ for the $\epsilon$ used for training; (iii) a network defended with a system based on DiffAI [21].

FFNN. A 6-layer feedforward net with 500 nodes in each layer and an activation (ReLU, Sigmoid, or Tanh) after each layer except the last.

ConvSmall. Our smallest convolutional network, with no convolutional padding: x → Conv$_2$ 16×4×4 → ReLU → Conv$_2$ 32×4×4 → ReLU → FC 100 → z.

ConvMed. Similar to ConvSmall, but with a convolutional padding of 1. Here we test with the three activations Act ∈ {ReLU, Sigmoid, Tanh}: x → Conv$_2$ 16×4×4 → Act → Conv$_2$ 32×4×4 → Act → FC 1000 → z.

ConvBig. A significantly larger convolutional network with a convolutional padding of 1: x → Conv$_1$ 32×3×3 → ReLU → Conv$_2$ 32×4×4 → ReLU → Conv$_1$ 64×3×3 → ReLU → Conv$_2$ 64×4×4 → ReLU → FC 512 → ReLU → FC 512 → z.

ConvSuper. Our largest convolutional network, with no padding: x → Conv$_1$ 32×3×3 → Conv$_1$ 32×4×4 → Conv$_1$ 64×3×3 → Conv$_1$ 64×4×4 → FC 512 → ReLU → FC 512 → z.

Skip. Two convolutional networks of different sizes whose outputs are concatenated. This network uses no convolutional padding: x → Conv$_1$ 16×3×3 → ReLU → Conv$_1$ 16×3×3 → ReLU → Conv$_1$ 32×3×3 → ReLU → FC 200 → o1; x → Conv$_1$ 32×4×4 → ReLU → Conv$_1$ 32×4×4 → ReLU → FC 200 → o2; CAT(o1, o2) → ReLU → FC 200 → ReLU → z.
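As a concrete reading of the notation above, a plausible PyTorch rendering of the MNIST ConvSmall architecture might look as follows. This is our own sketch, not the paper's training code; in particular, the final 10-class output layer is an assumption, since the paper's notation leaves the mapping to class logits implicit. With 28×28 inputs, stride 2, and no padding, the hidden unit count is 13·13·16 + 5·5·32 + 100 = 3,604, matching Table 1.

```python
import torch
import torch.nn as nn

# ConvSmall: x -> Conv_2 16x4x4 -> ReLU -> Conv_2 32x4x4 -> ReLU -> FC 100 -> z
conv_small = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=4, stride=2),   # 1x28x28 -> 16x13x13
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=4, stride=2),  # 16x13x13 -> 32x5x5
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(5 * 5 * 32, 100),                  # FC 100 (counted as hidden)
    nn.ReLU(),
    nn.Linear(100, 10),                          # assumed 10-class logits z
)

x = torch.randn(1, 1, 28, 28)
print(conv_small(x).shape)  # torch.Size([1, 10])
```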