# PROVABLY ROBUST ADVERSARIAL EXAMPLES

Published as a conference paper at ICLR 2022

Dimitar I. Dimitrov¹, Gagandeep Singh²³, Timon Gehr¹, Martin Vechev¹
¹ ETH Zurich  ² University of Illinois, Urbana-Champaign  ³ VMware Research
{dimitar.dimitrov, timon.gehr, martin.vechev}@inf.ethz.ch, ggnds@illinois.edu

We introduce the concept of provably robust adversarial examples for deep neural networks: connected input regions constructed from standard adversarial examples which are guaranteed to be robust to a set of real-world perturbations (such as changes in pixel intensity and geometric transformations). We present a novel method called PARADE for generating these regions in a scalable manner, which works by iteratively refining the region initially obtained via sampling until the refined region is certified to be adversarial with existing state-of-the-art verifiers. At each step, a novel optimization procedure is applied to maximize the region's volume under the constraint that the convex relaxation of the network behavior with respect to the region implies a chosen bound on the certification objective. Our experimental evaluation shows the effectiveness of PARADE: it successfully finds large provably robust regions, including ones containing 10^573 adversarial examples for pixel intensity and 10^599 for geometric perturbations. The provability enables our robust examples to be significantly more effective against state-of-the-art defenses based on randomized smoothing than the individual attacks used to construct the regions.

1 INTRODUCTION

Deep neural networks (DNNs) are vulnerable to adversarial attacks: small input perturbations that cause misclassification (Szegedy et al., 2013). This has caused an increased interest in investigating powerful attacks (Goodfellow et al., 2015; Carlini & Wagner, 2017; Madry et al., 2018; Andriushchenko et al., 2019; Zheng et al., 2019; Wang et al., 2019; Croce & Hein, 2019; Tramèr et al., 2020). An important limitation of existing attack methods is that they only produce a single concrete adversarial example, and their effect can often be mitigated with existing defenses (Madry et al., 2018; Hu et al., 2019; Sen et al., 2020; Xiao et al., 2020; Pang et al., 2020; Lécuyer et al., 2019; Cohen et al., 2019; Salman et al., 2019; Fischer et al., 2020; Li et al., 2020). The effectiveness of these attacks can be improved by robustifying them: consolidating the individual examples to produce a large symbolic region guaranteed to only contain adversarial examples.

This Work: Provably Robust Adversarial Examples. We present the concept of a provably robust adversarial example: a tuple consisting of an adversarial input point and an input region around it capturing a very large set (e.g., > 10^100) of points guaranteed to be adversarial. We then introduce a novel algorithm for generating such regions and apply it to the setting of pixel intensity changes and geometric transformations. Our work relates to prior approaches on generating robust adversarial examples (Athalye et al., 2018; Qin et al., 2019) but differs in a crucial point: our regions are guaranteed to be adversarial, while prior approaches are empirical and offer no such guarantees.

Main Contributions. Our key contributions are:

- The concept of a provably robust adversarial example: a connected input region capturing a very large set of points, generated by a set of perturbations, guaranteed to be adversarial.
- A novel scalable method for synthesizing such examples, called ProvAbly Robust ADversarial Examples (PARADE), based on iterative refinement that employs existing state-of-the-art techniques to certify robustness. Our method is compatible with a wide range of certification techniques, making it easily extendable to new adversarial attack models. We make the code of PARADE available at https://github.com/eth-sri/parade.git.
- A thorough evaluation of PARADE, demonstrating it can generate provable regions containing 10^573 concrete adversarial points for pixel intensity changes, in 2 minutes, and 10^599 concrete points for geometric transformations, in 20 minutes, on a challenging CIFAR10 network. We also demonstrate that our robust adversarial examples are significantly more effective against state-of-the-art defenses based on randomized smoothing than the individual attacks used to construct the regions.

2 BACKGROUND

We now discuss the background necessary for the remainder of the paper. We consider a neural network f : R^n0 → R^nl with l layers, n0 input neurons and nl output classes. While our method can handle arbitrary activations, we focus on networks with the widely-used ReLU activation. The network classifies an input x to the class y(x) with the largest corresponding output value, i.e., y(x) = argmax_i [f(x)]_i. For brevity, we omit the argument to y when it is clear from the context.

2.1 NEURAL NETWORK CERTIFICATION

In this work, we rely on existing state-of-the-art neural network certification methods based on convex relaxations to prove that the adversarial examples produced by our algorithm are robust. These certification methods take a convex input region I ⊆ R^n0 and prove that every point in I is classified as the target label yt by f. They propagate the set I through the layers of the network, producing a convex region that covers all possible values of the output neurons (Gehr et al., 2018). Robustness follows by proving that, for all combinations of output neuron values in this region, the output neuron corresponding to class yt has a larger value than the one corresponding to any other class y ≠ yt. Commonly, one proves this property by computing a function L_y : R^n0 → R for each label y ≠ yt, such that, for all x ∈ I, we have L_y(x) ≤ [f(x)]_yt − [f(x)]_y. For each L_y, one computes min_{x∈I} L_y(x) to obtain a global lower bound that holds for all x ∈ I. If the resulting bounds are positive for all y ≠ yt, robustness is proven. To simplify notation, we will say that the certification objective L(x) is the function L_y(x) with the smallest minimum value on I, and we will call its minimum value the certification error. We require L_y(x) to be a linear function of x. This requirement is consistent with many popular certification algorithms based on convex relaxation, such as CROWN (Zhang et al., 2018), DeepZ (Singh et al., 2018a), and DeepPoly (Singh et al., 2019). Without loss of generality, for the rest of this paper, we will treat DeepPoly as our preferred certification method.
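To make the role of the linear certification objective concrete, the following is a minimal sketch (not the ERAN/DeepPoly interface): it builds an L∞ input box around an image and computes the guaranteed minimum of a hypothetical linear objective L(x) = aᵀx + b over that box via interval arithmetic; the sign of this minimum is the certification error used throughout the paper.

```python
import numpy as np

def linf_box(x, eps, lo=0.0, hi=1.0):
    """Hyperbox [l, u] of the L-infinity ball of radius eps around x, clipped to the valid pixel range."""
    return np.clip(x - eps, lo, hi), np.clip(x + eps, lo, hi)

def lower_bound_linear(a, b, l, u):
    """Exact minimum of the linear objective a.x + b over the hyperbox [l, u]:
    positive coefficients are pushed to the lower bound, negative ones to the upper bound."""
    return np.maximum(a, 0.0) @ l + np.minimum(a, 0.0) @ u + b

# Toy usage: a random "image" and a hypothetical objective returned by a verifier.
x = np.random.rand(784)
a, b = np.random.randn(784), 0.5
l, u = linf_box(x, eps=0.05)
print("certification error:", lower_bound_linear(a, b, l, u))  # >= 0 would mean certified
```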
2.2 CERTIFICATION AGAINST GEOMETRIC TRANSFORMATIONS

DeepPoly operates over specifications based on linear constraints over the input pixels. These constraints are straightforward to provide for simple pixel intensity transformations such as adversarial patches (Chiang et al., 2020) and L∞ (Carlini & Wagner, 2017) perturbations, which admit a closed-form formula for the input region. However, geometric transformations do not yield such linear regions.

To prove the robustness of our generated examples to geometric transformations, we rely on DeepG (Balunović et al., 2019) which, given a range of geometric transformation parameters, creates an overapproximation of the set of input images generated by the geometric perturbations. DeepG then leverages DeepPoly to certify the resulting input image region. When generating our geometric robust examples, we work directly in the geometric parameter space and, thus, our input region I and the inputs to our certification objective L(x) are also in geometric space. Despite this change, as our approach is agnostic to the choice of the verifier, in the remainder of the paper we will assume the certification is done using DeepPoly and not DeepG, unless otherwise stated.

2.3 RANDOMIZED SMOOTHING

Randomized smoothing (Lécuyer et al., 2019; Cohen et al., 2019) is a provable defense mechanism against adversarial attacks. For a chosen standard deviation σ and a neural network f as defined above, randomized smoothing computes a smoothed classifier g based on f, such that g(x) = argmax_c P(y(x + ϵ) = c) with random Gaussian noise ϵ ∼ N(0, σ²I). This construction of g allows Cohen et al. (2019) to introduce the procedure CERTIFY that provides probabilistic guarantees on the robustness of g around a point x:

Proposition 1. (From Cohen et al. (2019)) With probability at least 1 − α over the randomness in CERTIFY, if CERTIFY returns a class y and a radius R (i.e., does not abstain), then g predicts y within radius R around x: g(x + δ) = y for all ‖δ‖₂ < R.

We define adversarial attacks on smoothed classifiers as follows:

Definition 1 (Adversarial attack on smoothed classifiers). For a fixed σ, α and an adversarial distance R̃ ∈ R>0, we call x̃ ∈ R^n0 an adversarial attack on the smoothed classifier g at the point x ∈ R^n0 if ‖x̃ − x‖₂ < R̃ and g(x̃) ≠ g(x).

Similarly to generating adversarial attacks on the network f, we need to balance the adversarial distance R̃ on g: if it is too big the problem becomes trivial; if it is too small no attacks exist. We outline the exact procedure we use to heuristically select R̃ in Appendix C.4. Using the above definition, we define the strength of an attack x̃ as follows:

Definition 2 (Strength of adversarial attack on smoothed classifiers). We measure the strength of an attack x̃ in terms of R_adv, the radius around x̃ whose L2 ball is certified by CERTIFY to be of the same adversarial class as x̃ on the smoothed network g, for a chosen σ and α.

Intuitively, this definition states that for points x̃ with bigger R_adv, the smoothed classifier is less confident about predicting the correct class, since more adversarial examples are sampled in this region and, therefore, the attack on g is stronger. We use this measure in Section 5 to compare the effectiveness of our adversarial examples to examples obtained by PGD on g.
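For intuition, the following is a rough Monte Carlo sketch of the smoothed classifier g (our simplified stand-in; the actual CERTIFY procedure of Cohen et al. (2019) additionally derives statistical confidence bounds and a certified radius, which are omitted here):

```python
import numpy as np

def smoothed_predict(f, x, sigma=0.15, n_samples=1000, rng=None):
    """Monte Carlo approximation of g(x) = argmax_c P(y(x + eps) = c), eps ~ N(0, sigma^2 I).
    `f` maps a batch of inputs to a batch of logit vectors."""
    rng = np.random.default_rng(rng)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    preds = np.argmax(f(x[None, :] + noise), axis=1)   # hard predictions of the base classifier
    counts = np.bincount(preds)
    return int(np.argmax(counts))                      # the majority class approximates g(x)

# Usage with a toy linear "network" on a random input (for illustration only).
W = np.random.randn(10, 784)
f = lambda batch: batch.reshape(len(batch), -1) @ W.T
x = np.random.rand(784)
print("smoothed prediction:", smoothed_predict(f, x))
```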
3 OVERVIEW

Existing methods for generating robust adversarial examples focus on achieving empirical robustness (Qin et al., 2019; Athalye et al., 2018). In contrast, we consider provably robust adversarial examples, defined below:

Definition 3 (Provably Robust Adversarial Example). We define a provably robust adversarial example to be any large connected neural network input region, defined by a set of perturbations of an input, that can be formally proven to only contain adversarial examples.

In this section, we outline how PARADE generates such regions. The technical details are given in Section 4.

To generate a provably robust adversarial example, ideally, we would like to directly maximize the input region's size under the constraint that it only contains adversarial examples. This leads to multiple challenges. Small changes of the parametrization of the input region (e.g., as a hyperbox) can lead to large changes of the certification objective, necessitating a small learning rate for optimization algorithms based on gradient descent. At the same time, the optimizer would have to solve a full forward verification problem in each optimization step, which is slow and impractical. Additionally, like Balunovic & Vechev (2020), we empirically observed that bad initialization of the robust region causes convergence to local minima, resulting in small regions. This can be explained by the practical observation in Jovanović et al. (2021), which shows that optimization problems involving convex relaxations are hard to solve for all but the simplest convex shapes.

We now provide an overview of how we generate robust adversarial regions while alleviating these problems. Our method for generating robust examples in the shape of a hyperbox is shown in Figure 1 and assumes an algorithm A that generates adversarial examples and a neural network verifier V. We require V to provide the certification objective L(x) as a linear function of the network's input neurons (whose values can be drawn from the input hyperbox), as described in Section 2. We split our algorithm into two steps, described in Section 3.1 and Section 3.2, to address the challenges outlined above. An optional final step, illustrated in Figure 2 and outlined in Appendix E, demonstrates the extension of PARADE to generate polyhedral examples.

3.1 COMPUTING AN OVERAPPROXIMATION REGION

In the first step, we compute an overapproximation hyperbox O by fitting a hyperbox around samples obtained from A. O is represented as a dashed blue rectangle in Figure 1. Intuitively, we use O to restrict the search space for adversarial regions to a part of the input space that A can attack; a minimal sketch of this step is given after Figure 1.

Figure 1: Overview of PARADE. Steps: (1) generate samples by attacking the network; (2) generate an overapproximation box from the samples; (3) try to verify the network and create a target hyperplane; (4) adjust the target hyperplane's bias and shrink the box accordingly. The green triangle denoted T represents the ground-truth adversarial region and the red crosses represent the attacks obtained by A. The dashed blue and dotted violet rectangles, denoted O and U, represent the fitted overapproximation and underapproximation boxes, respectively. The solid black lines represent hyperplanes generated by the verification procedure, while their dash-dotted counterparts represent the hyperplanes after bias adjustment. The arrows going out of the hyperplanes represent the direction of their corresponding half-spaces.
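The following sketch illustrates Steps 1 and 2 of Figure 1 under simplifying assumptions: `attack` is a hypothetical stand-in for the algorithm A, and the overapproximation box O is taken to be the tightest hyperbox containing the successful adversarial samples.

```python
import numpy as np

def fit_overapprox_box(attack, x, target, n_samples=2500):
    """Steps 1-2 of Figure 1: collect adversarial samples with A and fit the tightest
    hyperbox O = [l, u] around them. `attack(x, target)` is a stand-in for A that either
    returns an adversarial input classified as `target`, or None on failure."""
    samples = [s for s in (attack(x, target) for _ in range(n_samples)) if s is not None]
    if not samples:
        return None  # A could not attack this image/target pair
    samples = np.stack(samples)
    return samples.min(axis=0), samples.max(axis=0)  # lower and upper bounds of O
```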
3.2 COMPUTING AN UNDERAPPROXIMATION REGION

In the second step, we compute the robust adversarial example as a hyperbox U, represented as a dotted violet rectangle in Figure 1. We will refer to U as the underapproximation hyperbox to distinguish it from the polyhedral robust example P generated in Appendix E. U is obtained by iteratively shrinking O until arriving at a provably adversarial region. We denote the hyperbox created at iteration i of the shrinking process with U_i, where U_0 = O.

At each iteration i, we compute the certification objective L_{i−1}(x) of the region U_{i−1} computed in the previous iteration. If the objective is proven to be non-negative, we set U = U_{i−1}, as in Step 4 in Figure 1, and return. Otherwise, we solve an optimization problem, optimizing a surrogate for the volume of U_i ⊆ U_{i−1} under the constraint that min_{x∈U_i} L_{i−1}(x) ≥ −p_{i−1}, for a chosen bound p_{i−1} ≥ 0. We describe the optimization problem involved in the shrinking procedure in the next paragraph. In Section 4, we will present and motivate our choice of p_{i−1}. In Appendix D.2, we further discuss a sufficient condition under which the overall procedure for generating U is guaranteed to converge and why this condition often holds in practice. An important property of our choice of p_{i−1} is that it becomes smaller over time and thus forces the certification objective to increase over time until it becomes positive. We note that optimizing L(x) this way is simpler than directly optimizing for U by differentiating through the convex relaxation of the network. In particular, our method does not require propagating a gradient through the convex relaxation of the network, and the convex relaxation is updated only once per iteration. Additionally, it allows PARADE to work with convex relaxation procedures that are not differentiable, such as the DeepG method employed in this paper. The resulting hyperbox U is a provable adversarial region.

Shrinking U_{i−1}. There are multiple ways to shrink U_{i−1} so that the condition min_{x∈U_i} L_{i−1}(x) ≥ −p_{i−1} is satisfied. We make the greedy choice to shrink U_{i−1} in a way that maximizes a proxy for the volume of the new box U_i. We note, however, that this may not be the globally optimal choice across all iterations of the algorithm, and that even achieving local optimality in terms of the volume of U_i is a hard problem, as it is non-convex in high dimensions.

One possible solution to the shrinking problem is to use a scheme that we call uniform shrinking, where all lower bounds [l_{i−1}]_j and upper bounds [u_{i−1}]_j in each dimension j of the hyperbox U_{i−1} are changed by the same amount. The optimal amount for this uniform shrinking scheme can then be selected using a binary search. While this gives an efficient shrinking algorithm, in practice we observe that uniform shrinking produces small regions. This is expected, as it may force all dimensions of the hyperbox to be shrunk by a large amount even though it might be enough to shrink only along some subset of the dimensions; a sketch of this baseline is given below.
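A minimal sketch of the uniform-shrinking baseline, assuming a hypothetical `is_certified(l, u)` oracle (e.g., a DeepPoly query through ERAN). The binary search below also assumes certification success is roughly monotone in the shrink amount, which need not hold in general.

```python
import numpy as np

def uniform_shrink(l0, u0, is_certified, iters=20):
    """Baseline from Section 3.2: move all lower bounds up and all upper bounds down by the
    same amount d (never past the box center) and binary-search the smallest d that certifies."""
    center = (l0 + u0) / 2.0
    box = lambda d: (np.minimum(l0 + d, center), np.maximum(u0 - d, center))
    lo, hi = 0.0, float(np.max(u0 - l0)) / 2.0
    for _ in range(iters):
        d = (lo + hi) / 2.0
        if is_certified(*box(d)):
            hi = d      # certified: try removing less volume
        else:
            lo = d      # not certified: remove more volume
    return box(hi)
```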
In contrast, our shrinking scheme adjusts each [l_{i−1}]_j and [u_{i−1}]_j individually. While this shrinking scheme is more powerful, optimizing the size of U_i with respect to its lower and upper bounds [l_i]_j and [u_i]_j becomes more complex. We approach the problem differently in the high-dimensional input setting related to L∞ perturbations and in the low-dimensional setting related to geometric changes.

Shrinking for L∞ Transformations. For the high-dimensional setting of L∞ pixel transformations, we maximize Σ_{j=1}^{n0} [w_i]_j, where [w_i]_j = [u_i]_j − [l_i]_j denotes the width of the jth dimension of U_i. The linear objective allows for solutions where a small number of dimensions are shrunk to zero width in favor of allowing most input dimensions to retain a big width. As a result, in our computed region U_i there are few pixels that can only take a single value, while most other pixels can take a big range of values. Since Σ_{j=1}^{n0} [w_i]_j is a linear function of [l_i]_j and [u_i]_j, the optimization can be posed as a Linear Program (LP) (Schrijver, 1998) and efficiently solved with an LP solver to obtain optimal values for [u_i]_j and [l_i]_j w.r.t. the objective, as we show in Appendix B.2.

Shrinking for Geometric Transformations. As discussed in Section 2.2, we generate our provably robust examples to geometric transformations in the low-dimensional geometric parameter space. Therefore, the resulting region U_i is also low-dimensional. We found that maximizing the logarithm of the volume of U_i is more effective in this setting, as this objective encourages roughly equally-sized ranges for the dimensions of U_i. We solve the optimization problem using PGD, which can lead to merely locally-optimal solutions for the volume of U_i. We therefore rely on multiple initializations of PGD and choose the best result. Appendix B.3 describes the process in detail.

3.3 PROVABLY ROBUST POLYHEDRAL ADVERSARIAL EXAMPLES

We remark that our provably robust adversarial examples are not restricted to hyperbox shapes. Appendix E describes an optional final step of PARADE that generates provably robust polyhedral adversarial examples P from U and O. P is iteratively cut from O, such that P ⊇ U is ensured. Therefore, P represents a bigger region, but takes additional time to compute. The computation of P requires a V that accumulates imprecision only at activation functions, such as ReLU or sigmoid, in order to work. This property is violated by DeepG's handling of the geometric transformations. For this reason, we only apply it in the L∞-transformation setting. We refer interested readers to Appendix E for information on how we generate polyhedral adversarial examples.

4 PARADE: PROVABLY ROBUST ADVERSARIAL EXAMPLES

We now present PARADE in more formal terms. We already discussed how to compute O in Section 3. We use Algorithm 1 to compute U, which is the output of PARADE. It requires a neural network f : R^n0 → R^nl with l layers, an adversarial target class yt, an overapproximation hyperbox O, a speed/precision trade-off parameter c ∈ [0, 1], a maximum number of iterations uit, an early termination threshold t, and a certification method V that takes a neural network f, a hyperbox U_{i−1} and a target yt and returns a linear certification objective L_{i−1}(x).

Algorithm 1 GENERATE_UNDERAPPROX
1: func GENERATE_UNDERAPPROX( f, O, V, yt, c, uit, t )
2:   U_0 = O
3:   for i ∈ {1, 2, . . . , uit} do
4:     L_{i−1}(x), e_{i−1} = V( f, U_{i−1}, yt )
5:     if e_{i−1} ≥ 0 then
6:       return U_{i−1}
7:     end if
8:     p_{i−1} = −e_{i−1} · c
9:     if p_{i−1} ≤ t then
10:      p_{i−1} = 0
11:    end if
12:    if L∞_transformation then
13:      U_i = SHRINK_LP( U_{i−1}, L_{i−1}(x), p_{i−1} )
14:    else
15:      U_i = SHRINK_PGD( U_{i−1}, L_{i−1}(x), p_{i−1} )
16:    end if
17:  end for
18:  return FailedToConverge
19: end func

As described in Section 3.2, the algorithm generates a sequence of hyperboxes U_0 ⊇ U_1 ⊇ . . . ⊇ U_uit with U_0 = O and returns the first hyperbox from the sequence proven to be adversarial by V. At each iteration, the algorithm attempts to certify the hyperbox from the previous iteration, U_{i−1}, to be adversarial by computing the certification error e_{i−1} (Line 4) and checking whether it is non-negative (Line 5). If successful, U_{i−1} is returned. Otherwise, we use L_{i−1}(x) generated by our attempt at verifying U_{i−1} (Line 4) and generate the constraint min_{x∈U_i} L_{i−1}(x) ≥ −p_{i−1} based on a parameter p_{i−1} ≥ 0. We note that min_{x∈U_i} L_{i−1}(x) is a function of the hyperbox U_i, parametrized by its lower and upper bounds.
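For readability, a direct Python transcription of the loop in Algorithm 1, with the verifier V and the two shrinking procedures passed in as callables (stand-ins for DeepPoly/ERAN and for the procedures in Appendices B.2 and B.3):

```python
def generate_underapprox(f, O, V, y_t, c, u_it, t, shrink_lp, shrink_pgd, linf=True):
    """Sketch of Algorithm 1. `V(f, box, y_t)` must return the linear certification objective
    L(x) (e.g., its coefficients (a, b)) together with the certification error
    e = min_{x in box} L(x); `shrink_*` implement Appendix B.2 / B.3."""
    U = O
    for _ in range(u_it):
        L, e = V(f, U, y_t)
        if e >= 0:                 # box certified adversarial: done
            return U
        p = -e * c                 # bound on how far below 0 the old objective may remain
        if p <= t:                 # close enough to provable: force termination
            p = 0.0
        U = shrink_lp(U, L, p) if linf else shrink_pgd(U, L, p)
    raise RuntimeError("Failed to converge")
```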
In order to shrink U_{i−1}, we optimize the lower and upper bounds of U_i explicitly, such that the constraint holds for the bounds. Next, we specify and motivate our choice of p_{i−1}. At each iteration i, we use the constraint min_{x∈U_i} L_{i−1}(x) ≥ −p_{i−1} to shrink U_{i−1} (Lines 12-16) by calling either the SHRINK_LP or the SHRINK_PGD optimization procedure. SHRINK_LP and SHRINK_PGD are detailed in Appendix B.2 and Appendix B.3, respectively.

Table 1: Comparison between different methods for creating adversarial examples robust to intensity changes. Column ϵ gives the perturbation radius used by adversarial algorithm A. Columns #Corr, #Img, #Reg show the number of correctly classified images, adversarial images and adversarial regions for the network. For each method, columns #VerReg, Time and Size show the number of verified regions, the average time taken, and the number of concrete adversarial examples inside the regions (Size_O for the polyhedral regions).

| Dataset | Model | ϵ | #Corr | #Img | #Reg | Baseline #VerReg / Time / Size | PARADE Box #VerReg / Time / Size | PARADE Poly #VerReg / Time / Size_O |
|---|---|---|---|---|---|---|---|---|
| MNIST | 8×200 | 0.045 | 97 | 22 | 53 | 41 / 272s / 10^24 | 53 / 114s / 10^121 | 53 / 1556s / < 10^191 |
| MNIST | ConvSmall | 0.12 | 100 | 21 | 32 | 31 / 171s / 10^339 | 32 / 74s / 10^494 | 32 / 141s / < 10^561 |
| MNIST | ConvBig | 0.05 | 98 | 18 | 29 | 15 / 1933s / 10^9 | 28 / 880s / 10^137 | 28 / 5636s / < 10^173 |
| CIFAR10 | ConvSmall | 0.006 | 59 | 23 | 44 | 28 / 238s / 10^360 | 44 / 113s / 10^486 | 44 / 264s / < 10^543 |
| CIFAR10 | ConvBig | 0.008 | 60 | 25 | 36 | 26 / 479s / 10^380 | 36 / 404s / 10^573 | 36 / 610s / < 10^654 |

To enforce progress of the algorithm, we require the constraint to remove non-zero volume from U_{i−1}. Let pmax_{i−1} be an upper bound on p_{i−1} such that, if p_{i−1} exceeds pmax_{i−1}, the constraint min_{x∈U_i} L_{i−1}(x) ≥ −p_{i−1} is trivially satisfied with U_i = U_{i−1}. We can show that pmax_{i−1} = −min_{x∈U_{i−1}} L_{i−1}(x) is one such upper bound, as we have min_{x∈U_i} L_{i−1}(x) ≥ min_{x∈U_{i−1}} L_{i−1}(x) = −pmax_{i−1}, where the inequality follows from U_i ⊆ U_{i−1}; hence, for p_{i−1} ≥ pmax_{i−1}, the constraint already holds for U_i = U_{i−1}. We show the hyperplane L_{i−1}(x) = −pmax_{i−1}, denoted as Upper, in Step 3 in Figure 1. We note that Upper always touches U_{i−1} by construction. Decreasing the value of p_{i−1} increases the volume removed from U_{i−1}, as demonstrated by the other two hyperplanes in Step 3 in Figure 1. Additionally, we require p_{i−1} ≥ 0 because any value of p_{i−1} ≤ 0 already creates a provably adversarial region U_i, and choosing p_{i−1} < 0 generates a smaller region U_i than choosing p_{i−1} = 0. We denote the hyperplane corresponding to p_{i−1} = 0 in Figure 1 as Provable. We thus use p_{i−1} ∈ [0, pmax_{i−1}).

We note that while p_{i−1} = 0 is guaranteed to generate a provable region U_i, it is often too greedy. This is due to the fact that with an imprecise V, L_{i−1}(x) itself is also imprecise and, therefore, a more precise certification method may certify U_i as adversarial for p_{i−1} > 0. Further, as the region U_{i−1} shrinks, the precision of V increases. Thus, V might be capable of certifying U_i for p_{i−1} > 0, since on the next iteration of the algorithm the more precise certification objective L_i(x) will be used for certifying the robustness of the smaller U_i. We choose p_{i−1} heuristically, by setting p_{i−1} = pmax_{i−1} · c for a chosen parameter c ∈ [0, 1). This is depicted in Line 8 in Algorithm 1. The closer c is to 1, the bigger the chance that the modified region will not verify, but also the bigger the region we produce, and vice versa. Thus, c balances the precision and the speed of convergence of the algorithm. We empirically verify the effect of c on our regions in Appendix A.1. The hyperplane resulting from c = 0.65, denoted as Chosen, is shown in Step 3 in Figure 1.
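To make the schedule concrete, the following is a small worked summary (under the monotonicity assumption formalized in Appendix D.2) relating the choice p_{i−1} = c · pmax_{i−1} to the speed at which the certification error approaches zero:

```latex
% Choice of the bound and its effect on the certification error (cf. Appendix D.2).
% With e_{i-1} = \min_{x \in U_{i-1}} L_{i-1}(x) < 0 and p_{i-1} = c \cdot p^{\max}_{i-1} = -c\, e_{i-1}:
\min_{x \in U_i} L_{i-1}(x) \;\ge\; -p_{i-1} \;=\; c\, e_{i-1},
\qquad\text{so, if}\quad \min_{x \in U_i} L_i(x) \ge \min_{x \in U_i} L_{i-1}(x),
\quad\text{then}\quad e_i \;\ge\; c\, e_{i-1} \;\ge\; c^{\,i} e_0 .
% E.g., for c = 0.65 and e_0 = -2, after 5 iterations e_5 \ge 0.65^5 \cdot (-2) \approx -0.23.
```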
Finally, if at some iteration i the Upper hyperplane gets closer than a small threshold t to the Provable hyperplane, that is, p_{i−1} ≤ t, we set p_{i−1} = 0 to force the algorithm to terminate (Line 10). This speeds up the convergence of PARADE in the final stages, when the precision loss of V is not too big.

5 EXPERIMENTAL EVALUATION

We now evaluate the effectiveness of PARADE on realistic networks. We implemented PARADE in Python and used TensorFlow (Abadi et al., 2015) for generating PGD attacks. We use Gurobi 9.0 (Gurobi Optimization, LLC, 2020) for solving the LP instances. We rely on ERAN (Singh et al., 2018b) for its DeepPoly and DeepG implementations. We ran all our experiments on a 2.8 GHz 16-core Intel(R) Xeon(R) Gold 6242 processor with 64 GB RAM.

Neural Networks. We use MNIST (LeCun et al., 1998) and CIFAR10 (Krizhevsky, 2009) based neural networks. Table 5 in Appendix C.1 shows the sizes and types of the different networks. All networks in our L∞ experiments, except the CIFAR10 ConvBig, are not adversarially trained. In our experiments, normally trained networks were more sensitive to geometric perturbations than to L∞ ones. Therefore, we use DiffAI-trained (Mirman et al., 2018) networks, which tend to be less sensitive to geometric transformations. Our deepest network is the MNIST 8×200 fully-connected network (FFN), which has 8 layers with 200 neurons each and another layer with 10 neurons. Our largest network is the CIFAR10 ConvBig network with 6 layers and 62K neurons. This network is among the largest benchmarks that existing certification methods can handle in a scalable and precise manner. We compare the methods on the first 100 test images of the datasets, a standard practice in the certification literature (Balunović et al., 2019; Singh et al., 2019), while filtering out the wrongly classified ones. For each experiment, we tuned the value of c to balance runtime and size.

Table 2: Comparison between methods for creating adversarial examples robust to geometric changes. Columns #Corr, #Img, #Reg show the number of correctly classified images, adversarial images and adversarial regions for the network. For each method, columns #VerReg, Time and #Splits show the number of verified regions, the average time taken, and the number of splits used to verify. Under and Over show median bounds on the number of concrete adversarial examples inside the regions.

| Dataset | Model | Transform | #Corr | #Img | #Reg | Baseline #VerReg / Time / #Splits | PARADE #VerReg / Time / #Splits | Under | Over |
|---|---|---|---|---|---|---|---|---|---|
| MNIST | ConvSmall | R(17) Sc(18) Sh(0.03) | 99 | 38 | 54 | 10 / 890s / 2×5×2 | 51 / 774s / 1×2×1 | > 10^96 | < 10^195 |
| MNIST | ConvSmall | Sc(20) T(-1.7,1.7,-1.7,1.7) | 99 | 32 | 56 | 5 / 682s / 4×3×3 | 51 / 521s / 2×1×1 | > 10^71 | < 10^160 |
| MNIST | ConvSmall | Sc(20) R(13) B(10, 0.05) | 99 | 33 | 48 | 2 / 420s / 3×2×2×2 | 40 / 370s / 2×1×1×1 | > 10^70 | < 10^455 |
| MNIST | ConvBig | R(10) Sc(15) Sh(0.03) | 95 | 40 | 50 | 9 / 812s / 2×4×2 | 44 / 835s / 1×2×1 | > 10^77 | < 10^205 |
| MNIST | ConvBig | Sc(20) T(0,1,0,1) | 95 | 34 | 46 | 2 / 435s / 4×2×2 | 42 / 441s / 2×1×1 | > 10^64 | < 10^174 |
| MNIST | ConvBig | Sc(15) R(9) B(5, 0.05) | 95 | 39 | 52 | 2 / 801s / 3×2×2×2 | 46 / 537s / 2×1×1×1 | > 10^119 | < 10^545 |
| CIFAR | ConvSmall | R(2.5) Sc(10) Sh(0.02) | 53 | 24 | 29 | 1 / 1829s / 5×2×2 | 29 / 1369s / 2×1×1 | > 10^599 | < 10^1173 |
| CIFAR | ConvSmall | Sc(10) T(0,1,0,1) | 53 | 28 | 32 | 1 / 1489s / 4×3×3 | 32 / 954s / 2×1×1 | > 10^66 | < 10^174 |
| CIFAR | ConvSmall | Sc(5) R(8) B(1, 0.01) | 53 | 21 | 25 | 1 / 2189s / 5×2×2×2 | 21 / 1481s / 2×1×1×1 | > 10^513 | < 10^2187 |

5.1 ADVERSARIAL EXAMPLES ROBUST TO INTENSITY CHANGES

Table 1 summarizes our results on generating examples robust to intensity changes, whose precise definition is given in Appendix B.1.
Further, Appendix F.2 shows images of the examples obtained in our experiment. We compare the uniform shrinking described in Section 3, used as a baseline, against the PARADE variants for generating robust hyperbox and polyhedral adversarial examples. In all experiments, we compute O using examples collected within an L∞ ball around a test image with the radius ϵ specified in Table 1. The values of ϵ are chosen such that the attack has a non-trivial probability (> 0 but < 1) of success. Additional details about the experimental setup are given in Appendix C.2.

For the hyperbox adversarial examples obtained by both PARADE and the baseline, we calculate the size of the example in terms of the number of concrete adversarial examples it contains and report the median over all regions under the Size column in Table 1. For the polyhedral adversarial examples, we report the median number of concrete adversarial examples contained within an overapproximated hyperbox around the polyhedral region in the Size_O column. We note that our hyperbox regions are contained within our polyhedral regions by construction and, thus, the size of the hyperbox regions acts as an underapproximation of the size of our polyhedral regions. We also report the average runtime of all algorithms on the attackable regions. We note that for MNIST ConvBig and 8×200, the regions obtained for O produced a huge certification error (of magnitude above 1000), which prevented our underapproximation algorithm for U from converging on these networks. To alleviate the problem, we first used uniform shrinking to reduce the magnitude of the error to below 100, and then used the obtained box in place of O. We conjecture the reason for this issue is the huge amount of imprecision accumulated in these networks due to their depth and width, and note that uniform shrinking alone performs worse on these networks.

For each network, in Table 1 we report the number of distinct pairs of an image and an adversarial target on which A succeeds (column #Reg), as well as the number of unique images on which A succeeds (column #Img). We note that #Reg > #Img. We further report the number of test images that are correctly classified by the networks (column #Corr). The table details the number of regions on which the different methods succeed (column #VerReg), that is, on which they find a robust adversarial example containing > 1000 concrete adversarial examples. As can be seen in Table 1, PARADE Box creates symbolic regions that consistently contain more concrete adversarial examples (up to 10^573 images on CIFAR10 ConvBig) than the baseline, while being faster. PARADE also succeeds more often, failing only for a single region, due to the corresponding hyperbox O also containing < 1000 concrete examples, versus 53 failures for the baseline. Overall, our method, unlike the baselines, generates robust examples containing a large number of concrete examples in almost all cases in which the image is attackable, within a few minutes.

5.2 ADVERSARIAL EXAMPLES ROBUST TO GEOMETRIC CHANGES

Next, we demonstrate the results of our algorithm for generating examples robust to geometric transformations, whose precise definition is given in Appendix B.1. Here, PARADE relies on DeepG for robustness certification, as detailed in Section 2. To increase its precision, DeepG selectively employs input space splitting, where each split is verified individually. This strategy is effective due to the low dimensionality of the input space (3 to 4 geometric parameters in our experiments).
To this end, we also uniformly split our method's initial region O, where all splits are individually shrunk and their sizes are summed. We compare PARADE against a baseline based on a more aggressive uniform splitting of O that does not use our shrinking procedure. Since the cost and precision of verification increase with the number of splits, to allow for a fair comparison we select the number of baseline splits so that the time taken by the baseline is similar to or more than that of PARADE.

Table 3: The robustness of different examples to L2 smoothing defenses.

| Method | MNIST 8×200 | MNIST ConvSmall | MNIST ConvBig | CIFAR ConvSmall | CIFAR ConvBig |
|---|---|---|---|---|---|
| Baseline | 0.55 | 0.38 | 0.59 | 0.53 | 0.26 |
| PARADE | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Individual attacks (mean) | 0.29 | 0.16 | 0.18 | 0.48 | 0.25 |
| Individual attacks (95th percentile) | 0.53 | 0.44 | 0.51 | 0.61 | 0.37 |

Table 2 summarizes the results for PARADE and the baseline on the MNIST and CIFAR10 datasets. Further, Appendix F.3 shows visualizations of the adversarial examples obtained in the experiment. The Transform column in Table 2 represents the set of common geometric transformations on the images. Here, R(x) signifies a rotation of up to x degrees; Sc(x), scaling of up to x%; Sh(m), shearing with a factor of up to m%; B(γ, β), changes in contrast of up to γ% and brightness of up to β; and T(xl, xr, yd, yu), translation of up to xl pixels left, xr pixels right, yd pixels down and yu pixels up. The selected combinations of transformations contain 3 to 4 parameters, as is standard for assessing adversarial robustness against geometric perturbations (Balunović et al., 2019). Unlike DeepG, we chose only combinations of at least 3 parameters, as they are more challenging. The chosen combinations ensure that all transformations from DeepG are present. Similar to L∞, we chose the parameter bounds such that the attack algorithm has a non-trivial probability of success. Additional details about the experimental setup are given in Appendix C.3.

We note that in this setting PARADE creates robust regions in the geometric parameter space. However, since we are interested in the number of concrete adversarial images possible within our computed regions, we use DeepG to compute a polyhedron overapproximating the values of the pixels inside the region. For each polyhedron, we compute underapproximation and overapproximation hyperboxes, resulting in a lower and an upper bound on the number of concrete examples within our computed regions. The underapproximation hyperbox is computed as in Gopinath et al. (2019) and the overapproximation is constructed by computing the extreme values of each pixel. The medians of the bounds are shown in the Over and Under columns in Table 2.

For all experiments, the baseline failed to generate regions with > 1000 examples on most images and, thus, we omit its computed size to save space. On the few images where it does not fail, the size generated by PARADE is always larger. We report the number of concrete examples, the number of splits, the average runtime on attackable regions, and the number of successfully verified regions for both the baseline and PARADE in the Under/Over, #Splits, Time, and #VerReg columns. Table 2 demonstrates that in a similar or smaller amount of time, PARADE is capable of generating robust adversarial examples for most regions O, while the baseline fails for most. Further, our regions are non-trivial, as they contain more than 10^64 concrete adversarial attacks.
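The overapproximation hyperbox mentioned above can be obtained with two small LPs per pixel. The sketch below is an illustration under the assumption that the DeepG polyhedron is given in the form A x ≤ b; it computes the extreme values of each pixel with scipy.

```python
import numpy as np
from scipy.optimize import linprog

def overapprox_box(A, b):
    """Tightest hyperbox around the polyhedron {x : A x <= b}: for every pixel j,
    minimize and maximize x_j subject to the polyhedron constraints (two LPs per pixel)."""
    n = A.shape[1]
    lo, hi = np.empty(n), np.empty(n)
    for j in range(n):
        c = np.zeros(n); c[j] = 1.0
        lo[j] = linprog(c, A_ub=A, b_ub=b, bounds=(None, None)).fun       # min x_j
        hi[j] = -linprog(-c, A_ub=A, b_ub=b, bounds=(None, None)).fun     # max x_j
    return lo, hi
```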
We note that the runtimes for generating robust examples in this setting are higher than in Table 1. This is due to both the inefficiency of the attack and the time required for the robustness certification of a single region. Due to the flexibility and the generality of our method, we believe it can be easily adapted to more efficient attack and certification methods in the future.

5.3 ROBUSTNESS OF ADVERSARIAL EXAMPLES TO RANDOMIZED SMOOTHING

Next, we show that our adversarial examples robust to intensity changes are significantly more effective against state-of-the-art defenses based on randomized smoothing compared to the concrete attacks by A generated for our examples, as well as the baseline from Table 1. We note that our setting is different from Salman et al. (2019), which directly attacks the smoothed classifier. Instead, we create examples with PARADE on the original network f with A, based on the L2 PGD attack, and evaluate them on g. For all methods, we calculate the quantity R_adv for g, introduced in Definition 2, which measures the adversarial attack strength on g. For PARADE and the baseline, we select x̃ in Definition 2 as the middle point of the underapproximation box, projected back to the L2 ball with center x and radius the adversarial distance R̃. For the concrete attacks, x̃ is simply the attack itself. We exclude images whose adversarial distance R̃ is more than 33% bigger than the certified radius R of x, as we consider them robust to attacks. We experiment with networks f with the same architectures as in Table 1. We supply additional details on the experimental setup in Appendix C.4.

In Table 3, we show the ratio between the adversarial strengths R_adv of all other methods and PARADE, averaged across the first 100 test images. The last two rows in Table 3 depict the mean and the 95th percentile of the strength ratio across the attacks generated by A. We note that the smaller the ratio in Table 3, the weaker the attacker is with respect to PARADE. We observe that our examples produce attacks on the smoothed classifier g with average strength ranging from 2× to 6× larger compared to the alternatives. We conjecture this is caused by our examples being chosen such that the regions around them have comparatively higher concrete example density. We note that the uniform baseline also largely performs better than the mean individual attack, further supporting our claim.

5.4 COMPARISON WITH EMPIRICALLY ROBUST EXAMPLES

We compare our provably robust adversarial examples to empirically robust adversarial examples (Athalye et al., 2018): large input regions, defined by a set of perturbations of an image, that are empirically verified to be adversarial. The empirical verification is based on calculating the expectation over transformations (EoT) of the obtained region, as defined in Athalye et al. (2018). We consider a region to be an empirically robust adversarial example if it has an EoT ≥ 90%. We demonstrate that EoT is a limited measure, as it is not uniform. That is, empirically robust adversarial examples can exhibit high EoT (e.g., 95%), while specially chosen subregions incur very low EoT scores. In Appendix C.5, we discuss our extension to Athalye et al. (2018) used in this experiment. We execute the experiment on the first MNIST ConvBig transformation in Table 2. We note that the algorithm from Athalye et al. (2018) relies on sampling and does not scale to high-dimensional L∞ perturbations.
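For reference, a minimal sketch of how the EoT of a hyperbox region can be estimated by sampling (our simplified stand-in for the measure of Athalye et al. (2018); `predict` and the region bounds `l`, `u` are assumed inputs):

```python
import numpy as np

def eot(predict, l, u, target, n_samples=1000, rng=None):
    """Estimate the expectation over transformations of the region [l, u]: the fraction of
    uniformly sampled points in the box that the classifier assigns to the adversarial target."""
    rng = np.random.default_rng(rng)
    samples = rng.uniform(l, u, size=(n_samples, len(l)))
    return float(np.mean(predict(samples) == target))

def corner_subbox(l, u, x_orig, frac=0.1):
    """Sub-box of [l, u] with side lengths frac*(u-l), placed at the corner of the region
    closest to the unperturbed input x_orig (used below to probe non-uniformity of EoT)."""
    w = frac * (u - l)
    lo = np.where(np.abs(l - x_orig) <= np.abs(u - x_orig), l, u - w)
    return lo, lo + w
```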
The algorithm produces 24 different regions, significantly fewer than our 44 provable regions. The 24 empirically robust adversarial examples produced achieve an average EoT of 95.5%. To demonstrate the non-uniformity of the empirical robustness, for each region we considered a subregion of the example, with dimensions of size 10% of the original example, placed at the corner of the example closest to the original unperturbed image. We calculated the EoT of these subregions and found that 14/24 regions had an EoT of < 50%, with 9 having an EoT of 0%. This demonstrates that the fundamental assumption that the EoT of a region is roughly uniform throughout does not hold; instead, it is possible and even common in practice that a significant part of an empirically robust example is much less robust than the remaining region. Further, this often happens close to the original input image, thus providing the misleading impression that an empirically robust adversarial example is a much stronger adversarial attack than it actually is. This misleading behaviour suggests that provably robust adversarial examples, which cannot exhibit such behaviour by construction, provide a more reliable method of constructing adversarial examples invariant to intensity changes and geometric transformations.

We note that a limiting factor to the practical adoption of our method remains scalability to real-world networks and datasets such as ImageNet, due to the reliance on certification techniques that do not yet scale there. We note that our method is general and efficient enough to directly benefit from any future advancement in verification.

5.5 COMPARISON WITH LIU ET AL. (2019)

In this section, we extend the method proposed in Liu et al. (2019) to generate provably robust hyperbox regions against pixel intensity changes and compare the resulting algorithm to PARADE. As the code provided in Liu et al. (2019) only applies to fully connected networks, we applied both their approach and PARADE on a new 8×200 MNIST fully connected network with the same architecture as the one used in Table 1. To facilitate a simpler comparison, the network was trained using the code provided in Liu et al. (2019). We initialized both methods with the same overapproximation hyperbox O and compare the shrinking phase. With similar runtime (around 200s), out of 84 possible regions, PARADE and Liu et al. (2019) created 84 and 76 regions with median underapproximation volumes of 10^184 and 10^46, respectively. We point out that our method is capable of creating polyhedral regions and handling geometric perturbations, while Liu et al. (2019) cannot.

6 CONCLUSION

We introduced the concept of provably robust adversarial examples: large input regions guaranteed to be adversarial under a set of perturbations. We presented a scalable method called PARADE for synthesizing such examples, generic enough to handle multiple perturbation types, including pixel intensity and geometric transformations. We demonstrated the effectiveness of PARADE by showing it creates more and larger regions than baselines, in similar or less time, including regions containing 10^573 adversarial examples for pixel intensity and 10^599 for geometric transformations.

REFERENCES

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL http://tensorflow.org/. Software available from tensorflow.org.
Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square attack: a query-efficient black-box adversarial attack via random search, 2019.

Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. Synthesizing robust adversarial examples. In Proc. International Conference on Machine Learning (ICML), volume 80, pp. 284-293, 2018.

Mislav Balunovic and Martin Vechev. Adversarial training and provable defenses: Bridging the gap. 2020.

Mislav Balunović, Maximilian Baader, Gagandeep Singh, Timon Gehr, and Martin Vechev. Certifying geometric robustness of neural networks. In Advances in Neural Information Processing Systems 32, 2019.

Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. In Proc. IEEE Symposium on Security and Privacy (SP), pp. 39-57, 2017.

Ping-yeh Chiang, Renkun Ni, Ahmed Abdelkader, Chen Zhu, Christoph Studer, and Tom Goldstein. Certified defenses for adversarial patches. In International Conference on Learning Representations (ICLR), 2020.

Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pp. 1310-1320. PMLR, 2019.

Francesco Croce and Matthias Hein. Minimally distorted adversarial examples with a fast adaptive boundary attack, 2019.

Ruediger Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In D. D'Souza and K. N. Kumar (eds.), International Symposium on Automated Technology for Verification and Analysis, pp. 269-286. Springer, 2017.

Logan Engstrom, Brandon Tran, Dimitris Tsipras, Ludwig Schmidt, and Aleksander Madry. Exploring the landscape of spatial robustness. In International Conference on Machine Learning, pp. 1802-1811. PMLR, 2019.

Marc Fischer, Maximilian Baader, and Martin T. Vechev. Certified defense to image transformations via randomized smoothing. In Proc. Neural Information Processing Systems (NeurIPS), 2020.

Marguerite Frank and Philip Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2):95-110, 1956.

Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. AI2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), pp. 3-18. IEEE, 2018.

Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.

Divya Gopinath, Hayes Converse, Corina Pasareanu, and Ankur Taly. Property inference for deep neural networks. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 797-809. IEEE, 2019.
Gurobi Optimization, LLC. Gurobi optimizer reference manual, 2020. URL http://www.gurobi.com.

Shengyuan Hu, Tao Yu, Chuan Guo, Wei-Lun Chao, and Kilian Q. Weinberger. A new defense against adversarial images: Turning a weakness into a strength. In Proc. Neural Information Processing Systems (NeurIPS), pp. 1633-1644, 2019.

Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu. Spatial transformer networks. arXiv preprint arXiv:1506.02025, 2015.

Nikola Jovanović, Mislav Balunović, Maximilian Baader, and Martin Vechev. Certified defenses: Why tighter relaxations may hurt training. arXiv preprint arXiv:2102.06700, 2021.

Alex Krizhevsky. Learning multiple layers of features from tiny images. 2009.

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.

Mathias Lécuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy (SP), pp. 656-672. IEEE, 2019.

Linyi Li, Maurice Weber, Xiaojun Xu, Luka Rimanic, Tao Xie, Ce Zhang, and Bo Li. Provable robust learning based on transformation-specific smoothing. CoRR, abs/2002.12398, 2020.

Chen Liu, Ryota Tomioka, and Volkan Cevher. On certifying non-uniform bounds against adversarial attacks. In International Conference on Machine Learning, pp. 4072-4081. PMLR, 2019.

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In Proc. International Conference on Learning Representations (ICLR), 2018.

Matthew Mirman, Timon Gehr, and Martin Vechev. Differentiable abstract interpretation for provably robust neural networks. In International Conference on Machine Learning, pp. 3578-3586. PMLR, 2018.

Tianyu Pang, Kun Xu, Yinpeng Dong, Chao Du, Ning Chen, and Jun Zhu. Rethinking softmax cross-entropy loss for adversarial robustness. In Proc. International Conference on Learning Representations (ICLR), 2020.

Yao Qin, Nicholas Carlini, Garrison Cottrell, Ian Goodfellow, and Colin Raffel. Imperceptible, robust, and targeted adversarial examples for automatic speech recognition. In International Conference on Machine Learning, pp. 5231-5240. PMLR, 2019.

Hadi Salman, Jerry Li, Ilya P. Razenshteyn, Pengchuan Zhang, Huan Zhang, Sébastien Bubeck, and Greg Yang. Provably robust deep learning via adversarially trained smoothed classifiers. In Proc. Neural Information Processing Systems (NeurIPS), pp. 11289-11300, 2019.

Alexander Schrijver. Theory of linear and integer programming. John Wiley & Sons, 1998.

Sanchari Sen, Balaraman Ravindran, and Anand Raghunathan. EMPIR: ensembles of mixed precision deep networks for increased robustness against adversarial attacks. In Proc. International Conference on Learning Representations (ICLR), 2020.

Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, pp. 10802-10813, 2018a.

Gagandeep Singh, Jonathan Maurer, Matthew Mirman, Timon Gehr, Adrian Hoffmann, Petar Tsankov, Dana Drachsler Cohen, Markus Püschel, and Martin Vechev. ERAN: ETH Robustness Analyzer for Neural Networks, 2018b. URL https://github.com/eth-sri/eran.

Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL):41, 2019.
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

Yusuke Tashiro, Yang Song, and Stefano Ermon. Output diversified initialization for adversarial attacks. arXiv preprint arXiv:2003.06878, 2020.

Florian Tramèr, Nicholas Carlini, Wieland Brendel, and Aleksander Madry. On adaptive attacks to adversarial example defenses. CoRR, abs/2002.08347, 2020.

Shiqi Wang, Yizheng Chen, Ahmed Abdou, and Suman Jana. Enhancing gradient-based attacks with symbolic intervals, 2019.

Chang Xiao, Peilin Zhong, and Changxi Zheng. Enhancing adversarial defense by k-winners-take-all. In Proc. International Conference on Learning Representations (ICLR), 2020.

Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, pp. 4939-4948, 2018.

Tianhang Zheng, Changyou Chen, and Kui Ren. Distributionally adversarial attack. Proc. AAAI Conference on Artificial Intelligence, 33:2253-2260, 2019.

A ADDITIONAL EXPERIMENTS

A.1 DEPENDENCE OF PARADE ON THE VALUE OF c

In this section, we examine the effect of the parameter c, introduced in Algorithm 1, on the size of our provably robust adversarial examples. We show how the number of concrete adversarial examples contained in our regions robust to geometric transformations varies as a function of c. Table 4 depicts the results on the MNIST ConvBig scaling and translation experiment, originally depicted in Table 2. The columns Time and #It depict the average time and number of iterations taken by Algorithm 1 in order to converge. Similarly to Table 2, the #VerReg column depicts the number of verified regions. The results correspond to our intuition for c: the higher the value of c, the greater the volume produced by our method, and vice versa. Further, we observe that for higher values of c, our algorithm succeeds in producing significantly more verified regions. We chose the value c = 0.65 for the experiments in Table 2 in the main body of the paper, as it represents a good trade-off between the speed and the number of regions recovered, as shown in Table 4.

Table 4: Comparison between regions created with different values for the parameter c introduced in Algorithm 1 on the MNIST ConvBig Sc(20) T(0,1,0,1) experiment. Columns Under and Over show the median bounds on the number of concrete adversarial examples contained within the regions. The columns Time and #It depict the average time and number of iterations taken by Algorithm 1.

| c | #VerReg | Under | Over | Time | #It |
|---|---|---|---|---|---|
| 0.45 | 25 | > 10^59 | < 10^167 | 264s | 1.6 |
| 0.55 | 36 | > 10^52 | < 10^162 | 333s | 2.9 |
| 0.65 | 42 | > 10^64 | < 10^174 | 441s | 4.5 |
| 0.75 | 44 | > 10^67 | < 10^181 | 585s | 6.7 |
| 0.85 | 44 | > 10^113 | < 10^199 | 871s | 10.6 |

B FURTHER DETAILS ON OUR ALGORITHM DESIGN

B.1 ADVERSARIAL EXAMPLE DEFINITIONS

In this section, we provide precise definitions of our adversarial examples robust to pixel intensity changes and geometric changes, used in Table 1 and Table 2.

B.1.1 ADVERSARIAL EXAMPLES ROBUST TO L∞ PIXEL INTENSITY CHANGES
Definition 4. For an image x ∈ R^n0, a neural network f, an adversarial budget ϵ ∈ R, and an adversarial class yt ≠ f(x), a convex region I ⊆ R^n0 is called an adversarial example around x robust to pixel intensity changes iff for all points x̃ ∈ I it is satisfied that ‖x̃ − x‖∞ ≤ ϵ and yt = f(x̃).

B.1.2 ADVERSARIAL EXAMPLES ROBUST TO GEOMETRIC CHANGES

Definition 5. Let T : R^n0 × R^t → R^n0 be a geometric perturbation that takes an image x ∈ R^n0 and a vector p ∈ R^t of geometric parameters, bounded within the hyperbox [pl, pu] given by the lower and upper bound vectors pl, pu ∈ R^t, and produces a geometrically transformed version of the image, denoted x̃, that should be classified the same as x. For an image x ∈ R^n0, a neural network f, and an adversarial class yt ≠ f(x), a convex region I ⊆ R^t is called an adversarial example around x robust to the geometric perturbation T iff for all points p ∈ I it is satisfied that p ∈ [pl, pu] and yt = f(T(x, p)).

We note that this definition allows T to be a complex composition of simple geometric transformations such as rotations and translations, as is the case in Table 2. In this case, p is a vector of the parameters of all transformations. We further note that while Definition 5 defines I in terms of the geometric parameters p, one can also look at I as the region in image space obtained by propagating all possible p through T. DeepG (Balunović et al., 2019) overapproximates the latter region in image space with a polyhedron to produce the polyhedral region I in image space. In Table 2, we report the underapproximation and overapproximation sizes of I in terms of the number of concrete images it contains.

Algorithm 2 PGD_PROJECT
1: func PGD_PROJECT( l_i, u_i, l_{i−1}, u_{i−1}, hp, a, b, p_{i−1} )
2:   l_i = Clip( l_i, l_{i−1}, u_{i−1} )
3:   u_i = Clip( u_i, l_i, u_{i−1} )
4:   val = hp( l_i, u_i ) − b
5:   if val ≥ −p_{i−1} − b then
6:     return l_i, u_i
7:   end if
8:   [l_i]_{a>0} = [l_i]_{a>0} · (−p_{i−1} − b) / val
9:   [u_i]_{a<0} = [u_i]_{a<0} · (−p_{i−1} − b) / val
10:  l_i = Clip( l_i, l_{i−1}, u_{i−1} )
11:  u_i = Clip( u_i, l_i, u_{i−1} )
12:  return l_i, u_i
13: end func

B.2 DETAILED DESCRIPTION OF SHRINK_LP

In this section we describe in detail the function SHRINK_LP, used to shrink U_{i−1} to U_i when generating examples robust to pixel intensity changes, based on the parameters L_{i−1}(x) and p_{i−1}. SHRINK_LP defines two sets of LP variables for the lower bounds l_i ∈ R^n0 and the upper bounds u_i ∈ R^n0 of U_i. As described in Section 3.2, the LP optimizes the sum of widths of the dimensions of U_i, i.e., Σ_{j=1}^{n0} ([u_i]_j − [l_i]_j). Recall that L_{i−1}(x) is a linear function of x and, thus, can be represented as L_{i−1}(x) = aᵀx + b. As described in Section 3.2, the LP we define needs to enforce min_{x∈U_i} L_{i−1}(x) ≥ −p_{i−1}. We rewrite min_{x∈U_i} L_{i−1}(x) as hp(l_i, u_i) = max(0, a)ᵀ l_i + min(0, a)ᵀ u_i + b. Since hp(l_i, u_i) is a linear function of l_i and u_i, we add the constraint hp(l_i, u_i) ≥ −p_{i−1} to the LP. This results in the LP problem:

l*_i, u*_i = argmax Σ_{j=1}^{n0} ([u_i]_j − [l_i]_j)
s.t. hp(l_i, u_i) ≥ −p_{i−1},
     [l_{i−1}]_j ≤ [l_i]_j ≤ [u_i]_j ≤ [u_{i−1}]_j for all j,

where l_{i−1}, u_{i−1} ∈ R^n0 represent the lower and upper bounds of U_{i−1}, the second constraint enforces U_i ⊆ U_{i−1}, and l*_i, u*_i ∈ R^n0 denotes the LP solution. Finally, SHRINK_LP outputs the hyperbox U_i = [l*_i, u*_i].
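A minimal sketch of this LP using scipy.optimize.linprog (the paper's implementation uses Gurobi), with the objective L_{i−1}(x) = aᵀx + b given by its coefficients:

```python
import numpy as np
from scipy.optimize import linprog

def shrink_lp(l_prev, u_prev, a, b, p):
    """One SHRINK_LP step (Appendix B.2): maximize the sum of widths of U_i = [l, u] subject to
    hp(l, u) = max(a,0).l + min(a,0).u + b >= -p and [l, u] being contained in [l_prev, u_prev]."""
    n = len(a)
    c = np.concatenate([np.ones(n), -np.ones(n)])           # minimize sum(l) - sum(u)
    # Constraint hp(l, u) >= -p rewritten as  -max(a,0).l - min(a,0).u <= p + b.
    A_hp = np.concatenate([-np.maximum(a, 0.0), -np.minimum(a, 0.0)])[None, :]
    # Constraints l_j <= u_j for all j.
    A_ord = np.hstack([np.eye(n), -np.eye(n)])
    A_ub = np.vstack([A_hp, A_ord])
    b_ub = np.concatenate([[p + b], np.zeros(n)])
    bounds = [(l_prev[j], u_prev[j]) for j in range(n)] * 2  # same box bounds for l and u variables
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n], res.x[n:]                              # new lower and upper bounds of U_i
```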
B.3 DETAILED DESCRIPTION OF SHRINK_PGD

In this section we describe in detail the function SHRINK_PGD, used to shrink U_{i−1} to U_i when generating regions robust to geometric transformations, based on the parameters L_{i−1}(x) and p_{i−1}. SHRINK_PGD relies on PGD to optimize the log volume of U_i with respect to its lower and upper bounds, as described in Section 3.2. The optimization problem can be written as:

l*_i, u*_i = argmax Σ_{j=1}^{n0} log([u_i]_j − [l_i]_j)
s.t. hp(l_i, u_i) ≥ −p_{i−1},
     [l_{i−1}]_j ≤ [l_i]_j ≤ [u_i]_j ≤ [u_{i−1}]_j for all j,      (1)

where we use the same notation as in the previous section. Since we rely on PGD, we need to define how to project values of l_i and u_i that violate the constraints in Equation 1. Algorithm 2 demonstrates the heuristic we use for the projection. Algorithm 2 slightly abuses the notation above and treats l_i and u_i as the concrete values to be projected, instead of treating them as variables. It first clips l_i and u_i to enforce U_i ⊆ U_{i−1} (Lines 2-3). It then checks whether hp(l_i, u_i) ≥ −p_{i−1} is satisfied and, if so, returns (Lines 4-7). Otherwise, all elements of l_i corresponding to positive values of a, [l_i]_{a>0}, and all elements of u_i corresponding to negative values of a, [u_i]_{a<0}, are adjusted by a factor of (−p_{i−1} − b)/val, where val = hp(l_i, u_i) − b. This enforces hp(l_i, u_i) ≥ −p_{i−1} (Lines 8-9). In rare cases the second projection can result in values of l_i and u_i violating U_i ⊆ U_{i−1}. Our solution is to clip again (Lines 10-11). After the last clipping operation, in the rare case mentioned above, the returned values for l_i and u_i violate hp(l_i, u_i) ≥ −p_{i−1}. However, we did not find this to be a problem in practice, as it occurs rarely. Further, one can view such violations as implicitly choosing a different value for c in Algorithm 1, resulting in a different value for p_{i−1}.

C FURTHER DETAILS ON EXPERIMENTAL SETUP

C.1 NETWORKS

Table 5 shows the architectures of the networks on which we evaluate.

Table 5: Neural networks used in our experiments. For each, we list the number of layers and neurons, as well as their type (fully connected (FFN) or convolutional (Conv)).

| Dataset | Model | Type | Neurons | Layers |
|---|---|---|---|---|
| MNIST | 8×200 | FFN | 1,610 | 9 |
| MNIST | ConvSmall | Conv | 3,604 | 3 |
| MNIST | ConvBig | Conv | 48,064 | 6 |
| CIFAR10 | ConvSmall | Conv | 4,852 | 3 |
| CIFAR10 | ConvBig | Conv | 62,464 | 6 |

C.2 FURTHER DETAILS ON THE EXPERIMENTAL SETUP IN SECTION 5.1

In this section, we provide further details on the experimental setup in Section 5.1. In this experiment, for creating O we use 2500 adversarial examples each from the Frank-Wolfe optimization algorithm (Frank & Wolfe, 1956) and PGD, with step sizes 0.1ϵ and 0.01ϵ, respectively. For the PGD examples, we use output diversification (Tashiro et al., 2020) with 5 iterations. Throughout the experiment, we set c = 0.99 and multiply it by a factor of 0.99 at each iteration of Algorithm 1.

C.3 FURTHER DETAILS ON THE EXPERIMENTAL SETUP IN SECTION 5.2

In this section, we provide further details on the experimental setup in Section 5.2. For all experiments in Table 2, we compute O using random sampling attacks, which are the state-of-the-art for geometric transformations (Engstrom et al., 2019). For the 3-dimensional experiments, we rely on 15000 random samples, while for the 4-dimensional configurations we use 50000. We use c = 0.65 for all experiments, except for the 4-dimensional CIFAR10 experiment, where we use c = 0.75 to increase the precision of PARADE. For the MNIST experiments in Table 2, we execute SHRINK_PGD with 50 initializations and 200 PGD steps of size 5 · 10^-5. For CIFAR10 experiments, we use 500 initializations and 50 PGD steps instead.
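A minimal sketch of the random-sampling attack used to seed O in this setting (our illustration; `transform` and `predict` are assumed stand-ins for the composed geometric transformation T and the classifier):

```python
import numpy as np

def sample_geometric_attacks(predict, transform, x, target, p_lo, p_hi, n_samples=15000, rng=None):
    """Randomly sample geometric parameter vectors p in [p_lo, p_hi] and keep those for which
    the transformed image T(x, p) is classified as the adversarial target (Appendix C.3 setting)."""
    rng = np.random.default_rng(rng)
    hits = []
    for _ in range(n_samples):
        p = rng.uniform(p_lo, p_hi)
        if predict(transform(x, p)) == target:
            hits.append(p)
    return np.array(hits)  # the overapproximation box O is then fit around these parameter vectors
```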
C FURTHER DETAILS ON EXPERIMENTAL SETUP

C.1 NETWORKS

Table 5 shows the architectures of the networks on which we evaluate.

Table 5: Neural networks used in our experiments. For each network, we list its type (fully connected (FFN) or convolutional (Conv)), together with its number of neurons and layers.

DATASET   MODEL       TYPE   NEURONS   LAYERS
MNIST     8 x 200     FFN    1,610     9
MNIST     ConvSmall   Conv   3,604     3
MNIST     ConvBig     Conv   48,064    6
CIFAR10   ConvSmall   Conv   4,852     3
CIFAR10   ConvBig     Conv   62,464    6

C.2 FURTHER DETAILS ON THE EXPERIMENTAL SETUP IN SECTION 5.1

In this section, we provide further details on the experimental setup in Section 5.1. In this experiment, to create $O$ we use 2500 adversarial examples each from the Frank-Wolfe optimization algorithm (Frank & Wolfe, 1956) and from PGD, with step sizes $0.1\epsilon$ and $0.01\epsilon$, respectively. For the PGD examples, we use output diversification (Tashiro et al., 2020) with 5 iterations. Throughout the experiment, we set $c = 0.99$ and multiply it by a factor of 0.99 at each iteration of Algorithm 1.

C.3 FURTHER DETAILS ON THE EXPERIMENTAL SETUP IN SECTION 5.2

In this section, we provide further details on the experimental setup in Section 5.2. For all experiments in Table 2, we compute $O$ using random sampling attacks, which are the state of the art for geometric transformations (Engstrom et al., 2019). For the 3-dimensional experiments, we rely on 15000 random samples, while for the 4-dimensional configurations we use 50000. We use $c = 0.65$ for all experiments, except for the 4-dimensional CIFAR10 experiment, where we use $c = 0.75$ to increase the precision of PARADE. For the MNIST experiments in Table 2, we execute SHRINK_PGD with 50 initializations and 200 PGD steps of size $5 \times 10^{-5}$. For the CIFAR10 experiments, we use 500 initializations and 50 PGD steps instead.

C.4 DETAILED DESCRIPTION OF EXPERIMENTAL SETUP IN SECTION 5.3

Neural networks. In Section 5.3, we experiment with networks $f$ with the same architectures as in Table 5. Following the standard practice for training networks used alongside smoothing, outlined in Cohen et al. (2019), we trained the networks on images with added Gaussian noise with a standard deviation of 0.15.

Selecting $\alpha$ and $\sigma$. In the experiments in Section 5.3, we need to select $\alpha$ and $\sigma$ to compute $R_{adv}$ using Definition 2. For all methods we use $\alpha = 0.005$, as suggested by Cohen et al. (2019). We select the $\sigma$ value that produces the largest adversarial strength $R_{adv}$ for our method and use it for computing $R_{adv}$ for all methods. We note that tuning $\sigma$ for the individual attacks generated by $A$ is computationally very intensive and we therefore avoid it.

Selecting $R$. In this paragraph, we detail how the adversarial distance $R$ is chosen. We choose it heuristically for each individual $x$. For each $x$, we select $R$ by searching for the smallest adversarial distance on $f$ at which at least 10% of 500 attacks with $A$ on $f$ succeed, and we exclude images $x$ whose $R$ is more than 33% bigger than the certified radius of $g$. This procedure selects an adversarial distance $R$ that is neither too small nor too big, which is important to avoid making the problem trivial, as outlined in Section 2.3. In particular, for this choice of $R$ the smoothed classifier $g$ is likely to be attackable, since $f$ admits enough attacks for a high-density adversarial input region to exist. In addition, excluding images $x$ with too large an adversarial distance $R$ experimentally allowed us to exclude trivial attacks, for which $x$ is attackable for most classes on $g$.

C.5 DETAILED DESCRIPTION OF ADAPTED EOT USED IN EXPERIMENTS IN SECTION 5.4

In this section, we discuss the extension of Athalye et al. (2018) that we use to produce the empirically robust adversarial examples for the experiment in Section 5.4. We define an optimization procedure that searches for a hyperbox geometric region $U_{emp} \subseteq \mathbb{R}^{n_0}$, defined in terms of the lower bound vector $l_{emp} \in \mathbb{R}^{n_0}$ and the upper bound vector $u_{emp} \in \mathbb{R}^{n_0}$. We seek $U_{emp}$ within the set of allowed geometric transformations $I \subseteq \mathbb{R}^{n_0}$, such that $U_{emp}$ is an empirically robust adversarial example with maximal log volume and a high EoT objective. Similarly to Athalye et al. (2018), the EoT objective is computed on a set of samples drawn from $U_{emp}$. We rely on differentiable bilinear interpolation (Jaderberg et al., 2015) to allow gradients to flow through the geometric transformations we consider, and we optimize $l_{emp}$ and $u_{emp}$ using PGD.

D ANALYSIS OF ALGORITHM 1

In this section, we analyze the properties of PARADE.

D.1 SOUNDNESS OF ALGORITHM 1

Theorem 1. If Algorithm 1 converges, it returns a provably robust region $U$.

Proof. Algorithm 1 can only complete its execution at Line 6 or Line 18. Since we assume the algorithm successfully converged, returning at Line 18 is impossible. Therefore, the algorithm must have returned the region $U_{i-1}$ at Line 6, and thus $e_{i-1}$ must be non-negative (otherwise we would not have taken the right branch at Line 5). However, $e_{i-1}$ is obtained by executing the verifier $V$ on $U_{i-1}$ at Line 4. By the definition of our verifier $V$, it returns a non-negative $e_{i-1}$ only if $U_{i-1}$ is provably robust. This finishes our proof.

D.2 CONVERGENCE OF ALGORITHM 1

In this subsection, we demonstrate that Algorithm 1 converges exponentially fast under a suitable assumption on the monotonicity of the neural network certification method $V$. To this end, we introduce the following definition of monotonicity of $V$, which we will leverage for the rest of the subsection.
Definition 6. Let $V(f, I, y_t)$ be a neural network certification method that returns the certification objective $L_I(x)$ for an input region $I$ and target $y_t$ on the neural network $f$. We call $V$ objective-monotonic if for all $f$ and all input regions $X$ and $Y$ with $X \subseteq Y$ the property
$$\min_{x \in X} L_X(x) \ge \min_{x \in X} L_Y(x)$$
is satisfied.

Intuitively, Definition 6 states that a verifier is objective-monotonic if the certification objective produced for a smaller region $X$ is always higher (closer to verification) than the certification objective produced for a bigger region $Y$, when both are optimized over the smaller region $X$. We note that most practical neural network certification methods, including DeepPoly, are not objective-monotonic, that is, one can find triples $f$, $X$ and $Y$ violating the definition above. However, in practice we find that for all practical neural network certification methods the definition holds for most triples. This is expected, as for a violating triple the overapproximation generated for $X$ must be larger than the one generated for $Y$. Intuitively, if that were the case for a large portion of the triples, the certification method would not be tight. Next, we leverage this definition to demonstrate that our algorithm converges exponentially fast under the idealized condition that the certification method used is objective-monotonic.

Theorem 2. For objective-monotonic certification methods $V$ in Algorithm 1, the certification error at the $i$-th iteration, $e_i$, converges exponentially fast towards 0 (towards certification).

Proof. To prove Theorem 2, we leverage the definition of an objective-monotonic certification method to bound the certification error between consecutive iterations of Algorithm 1. We then use this bound to derive a bound on the certification error $e_i$ in terms of the initial certification error $e_0$. Finally, we use that bound to show the exponential convergence. Since $U_i \subseteq U_{i-1}$, we can apply Definition 6 to get the inequality
$$\min_{x \in U_i} L_i(x) \ge \min_{x \in U_i} L_{i-1}(x).$$
By construction of $U_i$, we further know
$$\min_{x \in U_i} L_{i-1}(x) \ge p_{i-1} = e_{i-1} \cdot c.$$
Combining both inequalities and noting that $e_i = \min_{x \in U_i} L_i(x)$, we get
$$e_i \ge e_{i-1} \cdot c. \tag{2}$$
Since Equation 2 holds for all $i$, we can apply it recursively on its right-hand side to get
$$e_i \ge e_0 \cdot c^i. \tag{3}$$
We recall that $e_i$ is negative unless $U_i$ verifies. Therefore, by Equation 3, $e_i$ converges exponentially fast towards 0, regardless of $e_0$.

E PROVABLY ROBUST POLYHEDRA ADVERSARIAL EXAMPLES

In this section, we show an extension of PARADE that allows us to generate provably robust adversarial examples represented as general polyhedra $P$ in the neural network input space. The approach relies on the underapproximation and overapproximation boxes $U$ and $O$, presented in the main body, to guide the search for $P$, and is illustrated in Figure 2. The results of the method presented in this section were shown in Table 1 in Section 5.1.

E.1 BACKGROUND

Our method for generating polyhedral robust adversarial examples requires a more restricted form of $V$ compared to the algorithms presented in the main body. In this subsection, we discuss the relevant background and the additional requirements we impose on $V$, and we introduce the notation used in the rest of this section. For this section, we focus on ReLU-based neural networks. We note that our concepts can easily be extended to networks with other activation functions such as sigmoid or tanh. We consider a neural network with $l$ layers $f_1, \ldots, f_l$.
The first $l-1$ layers are of the form $f_i(x) = \max(0, A_i x + b_i)$ for $i \in \{1, \ldots, l-1\}$, and the final layer is of the form $f_l(x) = A_l x + b_l$. The neural network $f$ is the composition of all layers: $f = f_l \circ \cdots \circ f_1$. Let $n_i$ be the number of neurons in layer $i$. If $z^a_{i,j}$ is the input to the $j$-th ReLU activation in layer $i$ and $z^r_{i,j}$ is the corresponding ReLU output, we can represent the neural network as the following system of constraints: for $i \in \{1, \ldots, l\}$ and $j \in \{1, \ldots, n_i\}$,
$$z^a_{i,j} = [b_i]_j + \sum_{k=1}^{n_{i-1}} [A_i]_{j,k} \, z^r_{i-1,k}, \qquad z^r_{i,j} = \max(0, z^a_{i,j}).$$
Here, $z^r_{0,j}$ denotes the $j$-th of the $n_0$ input activations. We call the two types of constraints affine constraints and ReLU constraints, respectively. We additionally constrain our inputs $z^r_0$ to be points inside a polyhedron $P \subseteq \mathbb{R}^{n_0}$. An adversarial class $y_t$ has the highest score on all inputs in $P$ if $\min(z^a_{l,y_t} - z^a_{l,y}) > 0$ with respect to all constraints, for all $y \in \{1, \ldots, n_l\} \setminus \{y_t\}$. The goal in this section is to find a large polyhedron $P$ for which we can verify that this is true.

Figure 2: Overview of our method for generating provably robust polyhedral adversarial examples. The method proceeds in three steps: (1) try to verify the network and obtain hyperplanes; (2) adjust the hyperplane biases until they do not intersect the underapproximation box; (3) repeat until the polyhedral region is verifiable. The green triangle denoted $T$ represents the ground-truth adversarial region. The dashed blue rectangle denoted $O$ and the violet dotted rectangle denoted $U$ represent the fitted overapproximation and underapproximation boxes, respectively. The solid black lines represent hyperplanes generated by the certification procedure $V$, while their dash-dotted counterparts represent the hyperplanes after bias adjustment. The small arrows going out of the hyperplanes in Step 1 and Step 2 indicate the direction retained by the hyperplanes' corresponding half-spaces. The output polyhedral region, denoted $P$, is shown in red in Step 3.

Linear approximations of ReLU. As reasoning about the neural network constraints directly is often intractable, given bounds $l_{i,j} \le z^a_{i,j} \le u_{i,j}$, the triangle approximation (Ehlers, 2017) relaxes the ReLU constraints such that all involved inequalities are linear:
$$z^r_{i,j} \ge z^a_{i,j}, \qquad z^r_{i,j} \ge 0, \qquad z^r_{i,j} \le \lambda_{i,j} z^a_{i,j} + \mu_{i,j}.$$
Figure 3 (a) visualizes the triangle approximation. Here, $\lambda_{i,j} = u_{i,j}/(u_{i,j} - l_{i,j})$ and $\mu_{i,j} = -l_{i,j} u_{i,j}/(u_{i,j} - l_{i,j})$ are selected such that this set of constraints describes the convex hull of the ReLU constraints in the $(z^a_{i,j}, z^r_{i,j})$-plane under the given bounds. The DeepPoly approximation (Singh et al., 2019) further relaxes the triangle approximation by keeping only one of the lower bounds on each variable $z^r_{i,j}$. It picks either $z^r_{i,j} \ge z^a_{i,j}$ or $z^r_{i,j} \ge 0$, whichever minimizes the area of the resulting triangle in the $(z^a_{i,j}, z^r_{i,j})$-plane. Figure 3 (b) and (c) show the two convex relaxations used in DeepPoly.

Figure 3: Convex approximations of the ReLU function: (a) shows the triangle approximation (Ehlers, 2017) with the minimum area in the input-output plane; (b) and (c) show the two convex approximations used in DeepPoly (Singh et al., 2019), with $\lambda_{i,j} = u_{i,j}/(u_{i,j} - l_{i,j})$ and $\mu_{i,j} = -l_{i,j} u_{i,j}/(u_{i,j} - l_{i,j})$. The figure is taken from Singh et al. (2019).
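As a minimal illustration of the relaxations above, the sketch below computes, for a single neuron with pre-activation bounds $[l, u]$, the linear lower and upper bounds corresponding to the DeepPoly choice; the function name and return convention are ours, not DeepPoly's API.

```python
def deeppoly_relu_relaxation(l, u):
    """Per-neuron ReLU relaxation for pre-activation bounds [l, u].
    Returns (lower_slope, lower_bias, upper_slope, upper_bias) such that
    lower_slope*z + lower_bias <= relu(z) <= upper_slope*z + upper_bias on [l, u]."""
    if u <= 0.0:                      # stably inactive neuron: relu(z) = 0
        return 0.0, 0.0, 0.0, 0.0
    if l >= 0.0:                      # stably active neuron: relu(z) = z
        return 1.0, 0.0, 1.0, 0.0
    # undecided neuron: upper bound is the chord through (l, 0) and (u, u)
    lam = u / (u - l)
    mu = -l * u / (u - l)
    # DeepPoly keeps a single lower bound, picking the one that minimizes the
    # area of the resulting triangle in the (z, relu(z)) plane
    lower_slope = 1.0 if u >= -l else 0.0
    return lower_slope, 0.0, lam, mu
```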
We can obtain $l_{i,j}$ and $u_{i,j}$ by optimizing each component of $z^a_{i,j}$ subject to the previously determined constraints for layers $0, \ldots, i$. We note that in the special case where $u_{i,j} \le 0$ or $l_{i,j} \ge 0$, no approximation is needed and $z^r_{i,j}$ is directly assigned $0$ or $z^a_{i,j}$, respectively. Neurons for which the approximation is needed are called undecided.

Computing neuron bounds. Most verification algorithms based on convex relaxation start with a chosen input region for which they calculate the bounds $l_{i,j}$ and $u_{i,j}$ of $z^a_{i,j}$ in a layer-by-layer fashion, where the constraints for previous layers are used to derive the bounds for layer $i$. In addition to DeepPoly and the triangle approximation, this is also true for other commonly used convex relaxation algorithms such as CROWN (Zhang et al., 2018) and DeepZ (Singh et al., 2018a). Next, we briefly describe how the bounds are computed for a particular layer in DeepPoly. We focus on DeepPoly since it is the convex relaxation algorithm we chose to employ in this section. For every neuron, DeepPoly stores a single linear lower and upper bound inequality. These inequalities bound the neuron's value in terms of the values of neurons from previous layers of the network. For ReLU neurons, these are the inequalities described above. For affine neurons, these are simply the affine transformations, where the affine equality is converted to a pair of inequalities. To arrive at the bounds $l_{i,j}$ and $u_{i,j}$ of $z^a_{i,j}$, DeepPoly repeatedly backsubstitutes the variables in the linear inequalities of $z^a_{i,j}$ with their corresponding linear inequalities from previous layers. The process continues until we arrive at linear inequalities $L_{i,j}(z^r_0) \le z^a_{i,j} \le U_{i,j}(z^r_0)$, with $L_{i,j}(x) = \underline{a}_{i,j} \cdot x + \underline{b}_{i,j}$ and $U_{i,j}(x) = \overline{a}_{i,j} \cdot x + \overline{b}_{i,j}$ denoting linear functions of the input variables $x \in \mathbb{R}^{n_0}$ with coefficients $\underline{a}_{i,j}, \overline{a}_{i,j} \in \mathbb{R}^{n_0}$ and $\underline{b}_{i,j}, \overline{b}_{i,j} \in \mathbb{R}$. The final bounds $l_{i,j}$ and $u_{i,j}$ are obtained by optimizing the respective functions $L_{i,j}(x)$ and $U_{i,j}(x)$ over the input region. We note that our algorithm does not depend on the details of DeepPoly's backsubstitution algorithm: it only requires the existence of linear functions $L_{i,j}(x)$ and $U_{i,j}(x)$ bounding the neuron values $z^a_{i,j}$ in terms of the input activations. This makes it compatible with other popular verification algorithms based on convex relaxation, such as CROWN (Zhang et al., 2018) and DeepZ (Singh et al., 2018a).

Certification Objective. To certify the robustness of an input region, $V$ needs to certify that $\min(z^a_{l,y_t} - z^a_{l,y}) > 0$ for all $y \in \{1, \ldots, n_l\} \setminus \{y_t\}$. To calculate lower bounds on those minima, a common trick is to introduce an affine layer that computes the objectives $z^a_{l,y_t} - z^a_{l,y}$ for each $y$. We refer to this layer as the objective layer. To obtain lower bounds on our objectives, we compute lower bounds of the corresponding neuron activations in the objective layer with the usual backsubstitution procedure. For the rest of the section, we denote the backsubstituted linear lower bounds on our objectives in terms of input activations, for a given target $y_t$, by $L_{l+1,y}(x) = \underline{a}_{l+1,y} \cdot x + \underline{b}_{l+1,y}$ for each $y$. Finally, to verify that $\min(z^a_{l,y_t} - z^a_{l,y}) > 0$, it suffices to show that $L_{l+1,y}(x) > 0$ for each point $x$ in the input region and all $y \neq y_t$. We note that the function denoted $L_{l+1,y}(x)$ in this section corresponds to the function denoted $L_y(x)$ in the main body.
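To make the last step concrete, here is a small sketch of how a backsubstituted lower bound $a \cdot x + b$ can be concretized over the input region: analytically when the region is a box, and with an LP when it is a general polyhedron, which is the per-neuron query needed later when the input region becomes a polyhedron $P$. The function names and the $(A, b)$ polyhedron encoding are our own illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

def lower_bound_over_box(a, b, l, u):
    """Analytic minimum of the backsubstituted bound a·x + b over the box [l, u]."""
    return np.maximum(a, 0.0) @ l + np.minimum(a, 0.0) @ u + b

def lower_bound_over_polyhedron(a, b, A_poly, b_poly, l, u):
    """Minimum of a·x + b over {x : A_poly x <= b_poly} intersected with [l, u],
    computed with an LP; assumes the polyhedron is non-empty."""
    res = linprog(a, A_ub=A_poly, b_ub=b_poly,
                  bounds=list(zip(l, u)), method="highs")
    return res.fun + b
```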
E.2 COMPUTING P

The algorithm for computing $P$ is presented in Algorithm 3. It takes as input:
- The neural network $f$, as described in Section E.1.
- The underapproximation hyperbox $U$.
- The overapproximation hyperbox $O$.
- A neural network certification method $V(f, P_{it-1}, i, j)$ that takes a neural network $f$, a polyhedron $P_{it-1}$ at iteration $it-1$, a layer number $i$ and a neuron number $j$, and returns the linear functions $L_{i,j}(x)$ and $U_{i,j}(x)$ together with the corresponding concrete bounds $l_{i,j}$ and $u_{i,j}$ (see Section E.1).
- The adversarial attack's target class $y_t$.
- The maximum number $p_{it}$ of iterations used to generate the polyhedron $P$.

The algorithm computes $P$ such that $U \subseteq P \subseteq O$ is satisfied. We motivate this choice by noting that $U$ is known to be certifiably robust and is therefore a lower bound on $P$.

Algorithm 3 GEN_POLY
1: func GEN_POLY($f$, $U$, $O$, $V$, $y_t$, $p_{it}$)
2:   $P_0 = O$
3:   for $it \in \{1, 2, \ldots, p_{it}\}$ do
4:     $hs_o$, Ver = Gen_Obj_Planes($f$, $U$, $P_{it-1}$, $V$, $y_t$)
5:     $hs_b$ = Gen_Bound_Planes($f$, $U$, $P_{it-1}$, $V$)
6:     $P_{it}$ = Add_Planes($P_{it-1}$, $hs_b \cup hs_o$)
7:     if Ver then
8:       return $P_{it-1}$
9:     end if
10:  end for
11:  return FailedToConverge
12: end func

Algorithm 4 GEN_OBJ_PLANES
1: func GEN_OBJ_PLANES($f$, $U$, $P_{it-1}$, $V$, $y_t$)
2:   $hs_o$ = []
3:   Ver = True
4:   for $y \in \{1, \ldots, n_l\} \setminus \{y_t\}$ do
5:     $L_{l+1,y}(x)$, _, $l_y$, _ = $V(f, P_{it-1}, l+1, y)$
6:     if $l_y \le 0$ then
7:       Ver = False
8:       $L_{l+1,y}(x)$ = Adjust_Bias($U$, $L_{l+1,y}(x)$)
9:       $hs_o$ += [$L_{l+1,y}(x) \ge 0$]
10:    end if
11:  end for
12:  return $hs_o$, Ver
13: end func

The algorithm generates a sequence of polyhedra $P_0 \supseteq P_1 \supseteq \ldots \supseteq P_{p_{it}}$ with $P_0 = O$. At each iteration, it computes two sets of half-space constraints, $hs_o$ (Line 4) and $hs_b$ (Line 5), corresponding to the neurons in the objective and the affine layers of the network. We detail how these constraints are computed in the next two paragraphs. We add these constraints to the polyhedron from the last iteration, $P_{it-1}$, to obtain $P_{it}$ (Line 6). The algorithm terminates when $P_{it-1}$ is certified to be robust (Lines 7-9), in which case $P_{it-1}$ is returned. The result of our algorithm is represented by the red region denoted $P$ in Step 3 of Figure 2. We note that, since $P_0 = O$, $P \subseteq O$ is guaranteed by construction, as each subsequent iteration of the algorithm only shrinks the polyhedron. However, $U \subseteq P$ depends on the choice of the half-space constraints.

Generating Certification Objective Constraints. Algorithm 4 details the procedure we use to collect the half-space constraints corresponding to the objective layer of the network. To obtain constraints from $V$, we first compute the linear functions $L_{l+1,y}(x)$ and their respective certification errors $l_y$ for all output classes $y \in \{1, \ldots, n_l\} \setminus \{y_t\}$ (Line 5). We know that adding the constraints $L_{l+1,y}(x) \ge 0$ to our polyhedron $P_{it-1}$ would result in a certifiably robust region $P_{it}$; however, the region would not obey $U \subseteq P_{it}$. Therefore, we instead adjust the biases of the constraints so that the resulting hyperplanes $L_{l+1,y}(x) = 0$ do not intersect $U$ (Line 8). This process is depicted geometrically in Step 2 of Figure 2 and is described in more detail below. All resulting bias-adjusted hyperplanes are collected in the set $hs_o$ (Line 9) and returned. We note that if $l_y > 0$ for some $y$, the half-space $L_{l+1,y}(x) \ge 0$ is trivial, that is, $\{L_{l+1,y}(x) \ge 0\} \cap P_{it-1} = P_{it-1}$, and it is therefore filtered out of $hs_o$ (Line 6). In addition to detecting the trivial constraints, we use the certification errors $l_y$ to check whether $P_{it-1}$ is certifiably robust and return the result alongside $hs_o$ (Line 12).
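For concreteness, one possible representation of the evolving polyhedron and of Add_Planes is sketched below: the region is kept as a system $A x \le b$ initialized from the box $O$, and each bias-adjusted half-space $h(x) = a \cdot x + b_0 \ge 0$ is appended as the row $-a \cdot x \le b_0$. The class and its methods are our own illustration, not PARADE's implementation.

```python
import numpy as np

class Polyhedron:
    """Minimal sketch of the polyhedral region used by GEN_POLY: {x : A x <= b}."""
    def __init__(self, A, b):
        self.A, self.b = A, b

    @staticmethod
    def from_box(l, u):
        """Initialize P_0 = O from the overapproximation box [l, u]."""
        eye = np.eye(l.size)
        # rows encode  x <= u  and  -x <= -l
        return Polyhedron(np.vstack([eye, -eye]), np.concatenate([u, -l]))

    def add_planes(self, halfspaces):
        """Add_Planes: each half-space is a pair (a, b0) encoding a·x + b0 >= 0,
        i.e., the additional LP row  -a·x <= b0."""
        for a, b0 in halfspaces:
            self.A = np.vstack([self.A, -a])
            self.b = np.concatenate([self.b, [b0]])
        return self
```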
Issues Related to Using Only Objective Constraints. In our experiments, we found that adding only bias-adjusted objective constraints leads to slow convergence of our algorithm. In particular, the first few iterations of the algorithm result in a substantial improvement of the certification objective. However, after this initial fast progress, the change in the certification objective becomes very small. The reason is that the constraints generated by consecutive executions of $V$ are very similar. We conjecture that the underlying cause is that the constraints no longer change the input shape enough to substantially change the convex approximation of the network. This is in contrast to what we observe when computing the underapproximation hyperbox region $U$. The difference between the two settings comes from the number of constraints added per iteration: for $U$, usually many lower and upper bounds are adjusted in each iteration, while far fewer objective constraints are added, especially in the pixel intensity experiment, where the input is very high-dimensional.

Algorithm 5 GEN_BOUND_PLANES
1: func GEN_BOUND_PLANES($f$, $U$, $P_{it-1}$, $V$)
2:   $hs_b$ = []
3:   for $i \in \{1, 2, \ldots, l-1\}$ do
4:     for $j \in \{1, 2, \ldots, n_i\}$ do
5:       $L_{i,j}(x)$, $U_{i,j}(x)$, $l_{i,j}$, $u_{i,j}$ = $V(f, P_{it-1}, i, j)$
6:       if $l_{i,j} < 0$ and $u_{i,j} > 0$ then
7:         $L_{i,j}(x)$ = Adjust_Bias($U$, $L_{i,j}(x)$)
8:         $U_{i,j}(x)$ = $-$Adjust_Bias($U$, $-U_{i,j}(x)$)
9:         $hs_b$ += [$L_{i,j}(x) \ge 0$, $U_{i,j}(x) \le 0$]
10:      end if
11:    end for
12:  end for
13:  return $hs_b$
14: end func

Generating Constraints from Affine Layers. To facilitate faster convergence, we generate additional constraints for all undecided neurons in the network. We generate two constraints per neuron, one with respect to its lower bound, $L_{i,j}(x) \ge 0$, and one with respect to its upper bound, $U_{i,j}(x) \le 0$. Algorithm 5 describes the process. $L_{i,j}(x)$ and $U_{i,j}(x)$ are computed by $V$ (Line 5) alongside their concrete bounds $l_{i,j}$ and $u_{i,j}$. The concrete bounds are used to check which neurons are undecided (Line 6). The generated constraints have the effect of making these neurons decided, while also enforcing a substantial change in the convex relaxation of the neural network. Note that the newly obtained constraints are bias-adjusted similarly to the objective constraints described above (Lines 7-8). We point out that the bias adjustment described below takes care of the problem that the raw constraints $L_{i,j}(x) \ge 0$ and $U_{i,j}(x) \le 0$ cannot be simultaneously true. Empirically, these hyperplanes play an important role in improving the convergence rate of our algorithm. Algorithm 5 collects them in the set $hs_b$ (Line 9) and returns them to be added to $P_{it-1}$.

Hyperplane bias adjustment. So far, we have discussed how constraints are generated, but not how their biases are adjusted. The bias adjustment algorithm is given in Algorithm 6.

Algorithm 6 ADJUST_BIAS
1: func ADJUST_BIAS($U$, $h(x)$)
2:   $h_{min} = \min_{x \in U} h(x)$
3:   $b_{adj} = \min(0, h_{min})$
4:   $\tilde{h}(x) = h(x) - b_{adj}$
5:   return $\tilde{h}(x)$
6: end func

The bias adjustment for a lower-bound constraint $h(x) \ge 0$ enforces that the intersection of the constraint with the given underapproximation hyperbox $U$ is $U$ itself. It does so by computing $h_{min} = \min_{x \in U} h(x)$ (Line 2) and changing the given constraint to $h(x) \ge h_{min}$ (Line 4). The optimization $h_{min} = \min_{x \in U} h(x)$ is analytically solvable, as demonstrated in Section B.2. Note that our algorithm adjusts only the constraints with negative $h_{min}$ (Line 3), since constraints with positive $h_{min}$ already satisfy the property.
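A minimal numpy sketch of ADJUST_BIAS for a constraint given in coefficient form $h(x) = a \cdot x + b$ (our own encoding, not the paper's interface) is shown below; the closed-form minimum over the box $U$ is the same expression used for $h_p$ in Section B.2.

```python
import numpy as np

def adjust_bias(l_U, u_U, a, b):
    """ADJUST_BIAS sketch for a lower-bound constraint h(x) = a·x + b >= 0:
    shift the bias so that the resulting half-space contains the whole box
    U = [l_U, u_U]."""
    h_min = np.maximum(a, 0.0) @ l_U + np.minimum(a, 0.0) @ u_U + b  # min of h over U
    b_adj = min(0.0, h_min)          # only adjust constraints that cut into U
    return a, b - b_adj              # adjusted constraint: a·x + (b - b_adj) >= 0

def adjust_bias_upper(l_U, u_U, a, b):
    """Upper-bound constraints h(x) <= 0 are handled by adjusting -h and negating back."""
    a_adj, b_adj = adjust_bias(l_U, u_U, -a, -b)
    return -a_adj, -b_adj            # adjusted constraint: a·x + b' <= 0
```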
The following lemma shows the soundness of our bias-adjustment algorithm:

Lemma 3. The intersection of the bias-adjusted half-space constraint $\tilde{h}(x) \ge 0$ defined in Algorithm 6 with $U$ is $U$.

Proof. By the construction of $h_{min}$, for all points $x \in U$ we have $h(x) \ge h_{min}$. We distinguish two cases for the value of $h_{min}$: non-negative and negative. If $h_{min} \ge 0$, the bias adjustment $b_{adj}$ is assigned $0$ and $\tilde{h}(x) = h(x)$. Since $h(x) \ge h_{min}$ for all points $x \in U$, we have $\tilde{h}(x) = h(x) \ge h_{min} \ge 0$ for all points $x \in U$. Therefore, there is no point in $U$ for which the bias-adjusted half-space constraint $\tilde{h}(x) \ge 0$ is violated. If $h_{min} < 0$, the bias adjustment $b_{adj}$ is assigned $h_{min}$ and $\tilde{h}(x) = h(x) - h_{min}$. Therefore, $\tilde{h}(x) = h(x) - h_{min} \ge h_{min} - h_{min} = 0$ for all points $x \in U$. Analogously to the other case, $\tilde{h}(x) \ge 0$ is not violated by any point in $U$. Since in both cases $\tilde{h}(x) \ge 0$ is not violated by any point in $U$, the intersection of the constraint and $U$ is simply $U$.

The bias adjustment for upper-bound constraints $h(x) \le 0$ is achieved by calling Algorithm 6 with $-h(x)$, as in Line 8 of Algorithm 5.

Adapting DeepPoly to Polyhedral Input Regions. Out of the box, DeepPoly only handles certification of hyperbox input regions. This is because, in order to obtain the concrete bounds $l_{i,j} = \min_{x \in I} L_{i,j}(x)$ and $u_{i,j} = \max_{x \in I} U_{i,j}(x)$ for an input region $I \subseteq \mathbb{R}^{n_0}$, it uses the analytical solution of the optimization problems discussed in Section B.2. In order to use DeepPoly for a polyhedral region $I$, we solve these optimization problems with a linear program (LP). This is computationally expensive: it involves solving two LP instances per neuron. For large networks, this can be prohibitive. To alleviate this issue, in the first iteration of Algorithm 3 we record which neurons are undecided. The computation of the undecided neurons in the first iteration can be done using regular DeepPoly, as $P_0 = O$. In subsequent iterations, we solve the LP instances only for the undecided neurons, while reusing the $P_0$ bounds for the decided neurons. This optimization is sound, since $P_{it} \subseteq P_0$ by construction. It improves performance substantially, since for most certifiably robust input regions in ReLU-based networks only a few neurons are undecided.

Proving that P contains U and is contained within O. In this paragraph, we prove the desired property of Algorithm 3, namely that $P$ contains $U$ and is contained within $O$.

Theorem 4. For all polyhedral regions $P$ generated by Algorithm 3, $U \subseteq P \subseteq O$ holds.

Proof. To prove $P \subseteq O$, we use the fact that the polyhedron $P$ is constructed from the overapproximation box $O$ by intersecting it with the half-space constraints in $hs_b$ and $hs_o$. Since every added constraint only reduces the polyhedron, it follows by induction over the generated half-space constraints that $P_{it} \subseteq O$ for all $it$. Therefore, $P \subseteq O$. To prove $U \subseteq P$, we note that the generated half-space constraints in $hs_b$ and $hs_o$ are chosen such that the intersection of each of them with $U$ is $U$ (see Lemma 3). Applying this property by induction over the generated half-space constraints, it follows that $U \subseteq P_{it}$ for all $it$. Therefore, $U \subseteq P$.

F VISUALISATION OF ROBUST ADVERSARIAL EXAMPLES

F.1 DISCUSSION

Figure 4 shows an adversarial region provably robust to intensity changes containing $10^{284}$ adversarial images, produced by our method on MNIST ConvBig. The original image is of the digit 5, and all images in our region are classified as 3 by the network.
The colorbar in Figure 4 quantifies the number of values each pixel can take in our inferred region. The yellow and violet colors represent the two extremes. The intensity of the yellow-colored pixels can vary the most, thus these pixels contribute the most adversarial examples. The intensity of the violet-colored pixels varies the least, thus the adversarial examples in our region are sensitive to the intensity values of these pixels. In our region, the intensity of most background pixels on the edges of the image can vary a lot, as these are green. Violet and green colors are more evenly distributed among pixels closer to the foreground (the part of the digit "5"). Further, the intensity of several pixels in the foreground can also vary significantly. We note that all yellow pixels occur in the foreground. In summary, our region captures examples that can be generated by significant variations in the intensities of several background and foreground pixels. We supply visualizations for all of the experiments described in Sections 5.1 and 5.2 in the next two sections.

Figure 4: Adversarial region robust to pixel intensity changes.

F.2 ADVERSARIAL EXAMPLES ROBUST TO INTENSITY CHANGES

In Figures 5-8, we visualise adversarial examples provably robust to pixel intensity changes for the different networks in Table 1. For all figures, the colorbar on the right-hand side quantifies the number of values each pixel can take in our inferred region. The yellow and violet colors represent the two extremes. The intensity of the yellow-colored pixels can vary the most, thus these pixels contribute more to the adversarial examples.

In Figure 5, we show adversarial examples for the MNIST ConvBig network. The 3 sub-figures represent the digits 9, 5, and 3, but the images in our regions are classified as 4, 3, and 8, respectively. Our regions are of size $10^{220}$, $10^{284}$, and $10^{262}$, respectively.

Figure 5: Sensitivity of provably robust adversarial examples based on pixel intensity changes on the MNIST ConvBig network.

In Figure 6, we show adversarial examples for the MNIST ConvSmall network. The 3 sub-figures represent the digits 2, 9, and 9, but the images in our regions are classified as 3, 5, and 4, respectively. Our regions are of size $10^{520}$, $10^{652}$, and $10^{708}$, respectively.

In Figure 7, we show adversarial examples for the MNIST 8 x 200 network. The 3 sub-figures represent the digits 5, 9, and 2, but the images in our regions are classified as 8, 8, and 3, respectively. Our regions are of size $10^{148}$, $10^{128}$, and $10^{157}$, respectively.

In Figure 8, we show adversarial examples for the CIFAR10 ConvSmall network. The 3 sub-figures represent a boat, a bird, and another bird, but the images in our regions are classified as an airplane, a frog, and a dog, respectively. Our regions are of size $10^{626}$, $10^{563}$, and $10^{608}$, respectively.

Figure 6: Sensitivity of provably robust adversarial examples based on pixel intensity changes on the MNIST ConvSmall network.

Figure 7: Sensitivity of provably robust adversarial examples based on pixel intensity changes on the MNIST 8 x 200 network.

Figure 8: Sensitivity of provably robust adversarial examples based on pixel intensity changes on the CIFAR10 ConvSmall network.

F.3 ADVERSARIAL EXAMPLES ROBUST TO GEOMETRIC CHANGES

In Figures 9-11, we visualise adversarial examples provably robust to geometric perturbations for the different networks in Table 2.
For all figures, the colorbar on the right-hand side quantifies the number of values each pixel can take in our inferred region. The yellow and violet colors represent the two extremes. The intensity of the yellow-colored pixels can vary the most, thus these pixels contribute more to the adversarial examples. For all figures, the sub-figures represent an adversarial example for each of the rows in Table 2.

In Figure 9, we show adversarial examples for the MNIST ConvBig network. The 3 sub-figures represent the digits 6, 1, and 3, but the images in our regions are classified as 5, 8, and 7, respectively. Our regions have underapproximations of size $10^{159}$, $10^{183}$, and $10^{13}$, respectively.

Figure 9: Sensitivity of provably robust adversarial examples based on geometric perturbations. Subfigures (a), (b) and (c) correspond to the first, second, and third geometric perturbations for the MNIST ConvBig network in Table 2, respectively.

In Figure 10, we show adversarial examples for the MNIST ConvSmall network. The 3 sub-figures represent the digits 6, 1, and 3, but the images in our regions are classified as 5, 8, and 7, respectively. Our regions are of size $10^{115}$, $10^{82}$, and $10^{63}$, respectively.

Figure 10: Sensitivity of provably robust adversarial examples based on geometric perturbations. Subfigures (a), (b) and (c) correspond to the first, second, and third geometric perturbations for the MNIST ConvSmall network in Table 2, respectively.