# Lipschitz-Certifiable Training with a Tight Outer Bound

Sungyoon Lee (Seoul National University, Seoul, Korea; goman1934@snu.ac.kr), Jaewook Lee (Seoul National University, Seoul, Korea; jaewook@snu.ac.kr), Saerom Park (Sungshin Women's University, Seoul, Korea; psr6275@sungshin.ac.kr)

Verifiable training is a promising research direction for training a robust network. However, most verifiable training methods are slow or lack scalability. In this study, we propose a fast and scalable certifiable training algorithm based on Lipschitz analysis and interval arithmetic. Our certifiable training algorithm provides a tight propagated outer bound by introducing box constraint propagation (BCP), and it efficiently computes the worst logit over the outer bound. In the experiments, we show that BCP achieves a tighter outer bound than the global Lipschitz-based outer bound. Moreover, our certifiable training algorithm is over 12 times faster than the state-of-the-art dual-relaxation-based method while achieving comparable or better verification performance and improving natural accuracy. Our fast certifiable training algorithm with the tight outer bound scales to Tiny ImageNet with a verification accuracy of 20.1% (ℓ2-perturbation of ϵ = 36/255). Our code is available at https://github.com/sungyoon-lee/bcp.

## 1 Introduction

Deep learning has shown successful results in many applications. However, it has been demonstrated that deep neural networks are vulnerable to small but adversarially designed perturbations of the input, which can mislead a network into predicting a wrong label [33]. There have been many studies on such adversarial attacks and on defenses against them [12, 19, 28, 27, 24, 4, 35, 41, 14]. However, Athalye et al. [1] have shown that many of these defense methods are designed to defend against specific predefined adversarial attacks and, in turn, can still be broken by unseen, stronger adaptive adversaries. Thus, many verification methods have been proposed to guarantee stable prediction for inputs within a perturbation set [18, 5, 9, 16, 38, 34, 23, 11, 3, 30, 8, 43, 2]. Verification of a neural network provides either lower bounds on the norm of the input perturbations required to fool the network or upper bounds on the worst-case errors of the network against specified perturbations. In particular, verifiable training incorporates the verification procedure using the upper bound into the training loop and yields a robust model [39, 40, 7, 8, 29, 30].

Verifiable training methods are mainly categorized into two approaches: dual relaxation and layer-wise bound propagation. The dual relaxation approach formulates verifiable training as a convex optimization problem and uses duality to build a relaxed bound on the optimization problem and to relieve the computational load [39, 40, 29, 7, 30]. Although these verifiable training methods can provide relatively exact robustness bounds for verification, they still involve expensive computations and scale poorly. In contrast, the layer-wise bound propagation approach calculates upper bounds on the worst-case error through relaxation of the layer-wise operations and forward propagation of the perturbation set, which can consist of ℓ∞- or ℓ2-balls [13, 44, 36, 25, 32, 37]. These layer-wise methods are computationally efficient but have loose bounds in the initial phase of training, hindering their application to larger networks.
Apart from these deterministic verification approaches, randomized smoothing is another promising approach to obtaining robust classification results. It can make any classifier acquire certified adversarial robustness by constructing a smoothed classifier [22, 21, 6, 31]. However, randomized smoothing methods require a large number of samples to certify the classifier.

In this study, we propose an efficient certifiable training method with a tight outer bound propagation. This propagation enables the model to scale to Tiny ImageNet. Our algorithm minimizes an upper bound on the robust classification error. We further tighten the upper bound by introducing a valid box constraint into the optimization problem, thereby improving the optimal solution. By tightening the upper bound on the objective, both the robustness and the standard accuracy of our method improve. To summarize, the main contributions of this paper are as follows:

- We propose a fast certifiable training algorithm called Box Constraint Propagation (BCP) with an efficient computation of the upper bound on the robust classification error. BCP is over 12 times faster than the state-of-the-art dual-relaxation-based method [40].
- We obtain tighter outer bounds than those without BCP. These bounds are on average 25.3-55.4% tighter in terms of the length of the worst logit translation. Therefore, our certificate using BCP achieves verification accuracy comparable to CAP [40] while improving the natural accuracy on CIFAR-10.
- Our approach scales to Tiny ImageNet and learns a certificate that achieves 20.1% verification accuracy (ϵ = 36/255). To the best of our knowledge, this is the first non-trivial (deterministic) verification accuracy on Tiny ImageNet under ℓ2-robustness.
- Our verification loss can adapt to the input location; thus, the model can learn behavior that depends on the input location.

## 2 Certifiable Training with Worst Logit

In this section, we introduce the robust training problem for multi-class classification, define the specification based on the worst logit, and establish the objective that provides an upper bound on the robust training problem.

**Notation** We consider a $c$-class classification problem, where $x \in \mathcal{X} \subseteq \mathbb{R}^N$ is an input, $y \in \mathcal{Y} = \{0, 1, \dots, c-1\}$ is the label with respect to the input $x$, and $c$ is the number of classes. A mapping that takes an input $x$ and outputs a logit vector $\zeta = z(x) \in \mathcal{Z}$ is denoted by $z : \mathcal{X} \to \mathcal{Z} \subseteq \mathbb{R}^c$, and the corresponding classifier is $f : \mathcal{X} \to \mathcal{Y}$ with $f(x) = \operatorname{argmax}_{m \in \mathcal{Y}} z_m(x)$, where $z_m$ is the output logit for a class $m \in \mathcal{Y}$. We assume the classifier network is a feedforward network with $K$ layers, $z^{(k)} = h^{(k)}(z^{(k-1)})$ for $k = 1, \dots, K$, where $z^{(k)}$ is the vector of activations in the $k$-th layer, $z^{(K)} = \zeta$, $z^{(0)} = x$, and $h^{(k)}$ is the operation in the $k$-th layer. Let $B(x, \epsilon)$ be a perturbation set around the input $x$ with a perturbation level $\epsilon$. Then, for a classifier $f$, the robust classification error within the perturbation set $B(\cdot, \epsilon)$ on a data distribution $\mathcal{D}$ is defined as

$$R(f) = \mathbb{P}_{(x,y)\sim\mathcal{D}}\big(\exists\, x' \in B(x, \epsilon)\ \text{s.t.}\ f(x') \neq y\big).$$

We omit the dependency on $\mathcal{D}$ and $\epsilon$ for simplicity.

### 2.1 Robust Training

The main goal of certifiable training is to minimize the robust classification error $R(f)$. However, because exact verification of $R(f)$ is NP-complete [18], a simple surrogate of $R(f)$ is used to construct the objective of certifiable training.
Our certifiable training minimizes an upper bound on $R(f)$ that serves as a certificate of robustness, whereas adversarial training [24] minimizes a lower bound on $R(f)$. To obtain the upper bound on $R(f)$, we propagate the perturbation set $B(x, \epsilon)$ and calculate an outer bound on its propagated image in the logit space $\mathcal{Z}$. For simplicity, $B(x)$ denotes the input perturbation set $B(x, \epsilon)$. Let $\hat z(B(x)) \subseteq \mathcal{Z}$ be an outer bound on the logit image $z(B(x))$ of the perturbation set. Then, we can construct the following upper bound $\hat R(f)$ on $R(f)$:

$$R(f) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\zeta \in z(B(x))} \max_{y' \neq y} \mathbb{1}\big[\zeta_{y'} - \zeta_y \geq 0\big]\Big] \;\leq\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\zeta \in \hat z(B(x))} \max_{y' \neq y} \mathbb{1}\big[\zeta_{y'} - \zeta_y \geq 0\big]\Big] = \hat R(f), \tag{1}$$

where $\zeta_m$ is the $m$-th element of the logit vector $\zeta$ and $\mathbb{1}[\cdot]$ denotes the indicator function.

### 2.2 Worst-Translated Logit

Based on the upper bound $\hat R(f)$ in (1), we can construct an objective for verifiable training as $\mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\max_{\zeta \in \hat z(B(x))} \mathcal{L}(\zeta, y)\big]$, using the cross-entropy loss $\mathcal{L}$ as a surrogate for the 0-1 loss of $\hat R(f)$. However, it is still inefficient to find the optimal solution of the non-convex maximization problem $\max_{\zeta \in \hat z(B(x))} \mathcal{L}(\zeta, y)$. The dual relaxation approach addresses this problem by computing a differentiable upper bound on the robust classification error using a feasible dual solution of the underlying relaxed LP [39, 40, 8]. In contrast, the layer-wise propagation approach uses the worst-case logit or a certifiable margin in the logit space to obtain a differentiable upper bound on the robust classification error [36, 13, 44]. In this study, we introduce the worst-translated logit $\bar z(x)$, which provides an upper bound on the cross-entropy loss over an outer bound $\hat z(B(x))$, as follows:

**Definition 1.** The worst-translated logit over an outer bound $\hat z(B(x))$ for the input $x$ and the corresponding label $y$ is defined as $\bar z(x; y) = z(x) + t(x; y)$, where the $m$-th element of the translation vector $t(x; y)$ is

$$t_m(x; y) = \big(z_y(x) - z_m(x)\big) - \min_{\zeta \in \hat z(B(x))} \big(\zeta_y - \zeta_m\big). \tag{2}$$

When the context is clear, we omit $y$ in $\bar z(x; y)$ and $t(x; y)$ and just write $\bar z(x)$ and $t(x)$ for brevity.

**Proposition 1** (Wong and Kolter [39]). For an outer bound $\hat z(B(x)) \supseteq z(B(x))$ and its corresponding worst-translated logit $\bar z(x)$, the following inequality holds:

$$\max_{\zeta \in \hat z(B(x))} \mathcal{L}(\zeta, y) \leq \mathcal{L}(\bar z(x), y), \tag{3}$$

where $\mathcal{L}$ is the cross-entropy loss function. Finally, the objective to be minimized is formulated as follows:

$$\mathcal{J}(f, \mathcal{D}) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\mathcal{L}(\bar z(x), y)\big]. \tag{4}$$

Note that the worst-translated logit $\bar z(x)$ for the input $x$ may not lie inside the outer bound $\hat z(B(x))$. The remaining problems are how to calculate the outer bound $\hat z(B(x))$ of the logit image $z(B(x))$ and how to solve the minimization in (2) over that outer bound; these are discussed in Sections 3.1 and 3.2, respectively. Furthermore, by using the worst-translated logit $\bar z(x)$ as a certificate that guarantees robustness to adversarial perturbations, we can obtain the verification error of the model $f$ on the test data $\mathcal{D}_{\text{test}}$ as

$$R_V(f) = \hat{\mathbb{P}}_{(x,y)\sim\mathcal{D}_{\text{test}}}\Big(\min_{y' \neq y} \big(\bar z_y(x) - \bar z_{y'}(x)\big) \leq 0\Big), \tag{5}$$

which is an upper bound on the robust classification error $R(f)$ on the test data $\mathcal{D}_{\text{test}}$.
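To make Definition 1 and the objective (4) concrete, the sketch below computes the worst-translated logit and its cross-entropy loss for a single example, assuming the certified lower bounds $\min_{\zeta \in \hat z(B(x))}(\zeta_y - \zeta_m)$ have already been obtained from some outer bound; they are passed in as a vector `margin_lb`, a name introduced here only for illustration. This is a minimal NumPy sketch, not the authors' implementation:

```python
import numpy as np

def worst_translated_logit(logits, margin_lb, y):
    """Worst-translated logit z_bar = z + t of Definition 1.

    logits    : (c,) logit vector z(x)
    margin_lb : (c,) vector whose m-th entry is min_{zeta in outer bound} (zeta_y - zeta_m);
                the entry at index y is ignored
    y         : true label
    """
    t = (logits[y] - logits) - margin_lb  # Eq. (2): t_m = (z_y - z_m) - min (zeta_y - zeta_m)
    t[y] = 0.0                            # the true class is not translated
    return logits + t

def cross_entropy(logits, y):
    """Numerically stable cross-entropy loss of a single logit vector."""
    shifted = logits - logits.max()
    return float(-shifted[y] + np.log(np.exp(shifted).sum()))

# Toy example with c = 3 classes and true label y = 1.
z = np.array([1.0, 3.0, 0.5])
margin_lb = np.array([0.4, 0.0, -0.2])   # hypothetical certified margin lower bounds
z_bar = worst_translated_logit(z, margin_lb, y=1)
print(z_bar)                                            # logits fed to the surrogate loss in (4)
print(cross_entropy(z_bar, 1) >= cross_entropy(z, 1))   # the certified loss upper-bounds the clean loss here
```

In training, these margin lower bounds come from the BCP outer bound described in Section 3.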
## 3 Lipschitz-Certifiable Training with Tight Outer Bound

In this section, we propose a tight outer bound estimation and an efficient algorithm for calculating the worst-translated logit. We mainly focus on ℓ2-perturbation sets in the input space, but our method can be easily extended to any ℓp-perturbation for p > 0 and to ℓ∞-perturbations, as described later.

**Notation** The ℓ2-perturbation set and the ℓ∞-perturbation set in the input space are denoted by $B_2(x, \epsilon) = \{x' : \|x' - x\|_2 \leq \epsilon\}$ and $B_\infty(x, \epsilon) = \{x' : |x'_i - x_i| \leq \epsilon,\ \forall i\}$, respectively. To obtain a tight outer bound $\hat z(B_2(x, \epsilon))$, we propagate the perturbation sets through the layers and calculate layer-wise outer bounds $B_2^{(k)}$ and $B_\infty^{(k)}$ in the $k$-th layer. The $k$-th layer ℓ∞-bound $B_\infty^{(k)}$ can be represented as the box constraint $B_\infty^{(k)} = \text{midrad}(m^{(k)}, r^{(k)}) \equiv \{p : |p_i - m^{(k)}_i| \leq r^{(k)}_i,\ \forall i\}$ with midpoint $m^{(k)}$ and radius $r^{(k)}$ [26]. We call $B_2^{(k)}$ the "ball outer bounds" and $B_\infty^{(k)}$ the "box outer bounds".

Figure 1: Illustration of BCP. For the $k$-th layer, the $k$-th box $B_\infty^{(k)}$ (left) is propagated to the next box $B_\infty^{(k+1)}$ (right), both colored in red. Note that the $k$-th ball $B_2^{(k)}$ is independently propagated to the next ball $B_2^{(k+1)}$, which has an $L^{(k)}$ times larger radius.

### 3.1 Outer Bound Estimation

The outer bound for an ℓ2-perturbation, $\hat z(B_2(x, \epsilon))$, can be constructed simply by using the global Lipschitz constant $L$ of the logit function $z$: $\hat z(B_2(x, \epsilon)) = B_2(z(x), \epsilon L)$ [36]. The global Lipschitz constant is efficiently computed as the product of all layer-wise Lipschitz constants, $L = \prod_{k=1}^{K} L^{(k)}$. To tighten this spherical outer bound in the logit space, Tsuzuku et al. [36] replaced it with the ellipsoidal outer bound $\hat z(B_2(x, \epsilon)) = h^{(K)}(B_2(z^{(K-1)}, \rho^{(K-1)}))$, where $\rho^{(k)} = \epsilon \prod_{i=1}^{k} L^{(i)}$. In the objective (4), the ellipsoidal outer bound allows the minimization in (2) to be solved explicitly, yielding the Lipschitz margin. However, the global Lipschitz constant can still overestimate the outer bound and impose a strong penalty on it, limiting the expressiveness of the model [17]. This leads to poor classification performance, since the model cannot have sharp transitions near the decision boundary [15]. On the other hand, using local Lipschitz constants to estimate the outer bound for each given input $x$ is computationally infeasible to integrate into the training loop for medium-sized networks, and such approaches are thus limited to two-layer networks [15, 10, 20].

To this end, we propose a method called BCP, which uses layer-wise propagation with the Lipschitz constant and interval arithmetic to efficiently approximate the propagation of the perturbation set adaptively to the input location. This addresses the problems of the global Lipschitz-constant-based outer bound while retaining the efficient computations needed for certifiable training. In addition, it enables us to obtain a certificate that adapts to the local properties of the classifier. We further discuss the intuition behind the design of BCP in the supplementary material.

**Box Constraint Propagation** Our outer bound propagation starts with the ℓ2- and ℓ∞-perturbation sets $(B_2^{(0)}, B_\infty^{(0)})$, propagates them through the layers, and derives a tight ℓ∞-outer bound $B_\infty^{(k)}$ in each layer by circumscribing the propagated images and taking the intersection of the circumscribed boxes. In the penultimate layer, we combine the propagated box constraint bound $B_\infty^{(K-1)}$ and the propagated global Lipschitz bound $B_2^{(K-1)}$ to tighten the final outer bound $\hat z(B(x))$. For ℓ2-certifiable training, we first consider the pair $(B_2^{(0)}, B_\infty^{(0)})$, where $B_2^{(0)} = B_2(x, \epsilon)$ and $B_\infty^{(0)} = B_\infty(x, \epsilon)$ circumscribes $B_2^{(0)}$ in the input space. Next, we propagate them through the layers to compute the layer-wise outer bound pair $(B_2^{(k)}, B_\infty^{(k)})$. Here, we assume a feedforward network, but the method can be extended to residual networks (see the supplementary material).
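The propagation below relies on the layer-wise Lipschitz constants $L^{(i)}$. For linear and convolutional layers, these are the spectral norms of the weight operators, which the paper estimates with power iteration. The following is a minimal NumPy sketch for a dense weight matrix; the iteration count is an illustrative choice rather than the paper's setting, and convolutions would apply the same idea to the convolution operator:

```python
import numpy as np

def lipschitz_power_iteration(W, n_iter=50, seed=0):
    """Estimate the largest singular value of W, i.e. the Lipschitz constant
    of the linear map x -> W x with respect to the l2-norm."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = W @ v                 # forward pass
        u /= np.linalg.norm(u)
        v = W.T @ u               # backward pass
        v /= np.linalg.norm(v)
    return float(np.linalg.norm(W @ v))

# The global Lipschitz constant of a feedforward stack is bounded by the product of
# the layer-wise constants (ReLU-type activations are 1-Lipschitz): L <= prod_k L^(k).
rng = np.random.default_rng(1)
weights = [rng.standard_normal((64, 32)), rng.standard_normal((10, 64))]
layer_constants = [lipschitz_power_iteration(W) for W in weights]
print(layer_constants, float(np.prod(layer_constants)))
```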
The ball outer bound in the $k$-th layer, $B_2^{(k)}$, is the Lipschitz outer bound $B_2(z^{(k)}, \rho^{(k)})$ with radius $\rho^{(k)} = \epsilon \prod_{i=1}^{k} L^{(i)}$, where we use power iteration to estimate the layer-wise Lipschitz constants $L^{(i)}$. The box outer bound $B_\infty^{(k+1)}$ in the $(k+1)$-th layer ($k = 0, 1, \dots, K-2$) is obtained from two box constraints: one circumscribes the propagated ellipsoid image $h^{(k+1)}(B_2^{(k)})$ of the ball $B_2^{(k)}$, and the other circumscribes the propagated parallelepiped image $h^{(k+1)}(B_\infty^{(k)})$ of the box $B_\infty^{(k)}$, as described in Figure 1. For linear layers, the circumscribing box of the propagated ellipsoid image, $\text{out}_\square(h^{(k+1)}(B_2^{(k)}))$, is calculated as follows:

$$\text{out}_\square\big(h^{(k+1)}(B_2^{(k)})\big) = \text{midrad}(\hat m^{(k)}, \hat r^{(k)}) \quad \text{s.t.} \quad \hat m^{(k)} = h^{(k+1)}(z^{(k)}), \quad \hat r^{(k)}_i = \big\|W^{(k+1)}_{i,:}\big\|_2\, \rho^{(k)}, \tag{6}$$

where $W^{(k+1)}_{i,:}$ is the $i$-th row of the weight matrix $W^{(k+1)}$ of the linear function $h^{(k+1)}$. Simultaneously, we use interval arithmetic to obtain the other box, circumscribing the propagated parallelepiped image, $\text{IA}(B_\infty^{(k)})$, as in [13]:

$$\text{IA}\big(B_\infty^{(k)}\big) = \text{midrad}(\tilde m^{(k)}, \tilde r^{(k)}) \quad \text{s.t.} \quad \tilde m^{(k)} = h^{(k+1)}(m^{(k)}), \quad \tilde r^{(k)} = \big|W^{(k+1)}\big|\, r^{(k)}, \tag{7}$$

where $|W|$ takes the element-wise absolute value of $W$. Both propagations can be easily extended to nonlinear layers; the details are described in the supplementary material. Finally, we obtain the box outer bound $B_\infty^{(k+1)} = \text{out}_\square(h^{(k+1)}(B_2^{(k)})) \cap \text{IA}(B_\infty^{(k)})$ for the next $(k+1)$-th layer, as illustrated in Figure 1, with the following equations:

$$m^{(k+1)} = \big(ub^{(k+1)} + lb^{(k+1)}\big)/2, \qquad r^{(k+1)} = \big(ub^{(k+1)} - lb^{(k+1)}\big)/2$$
$$\text{s.t.} \quad ub^{(k+1)} = \min\big(\hat m^{(k)} + \hat r^{(k)},\ \tilde m^{(k)} + \tilde r^{(k)}\big), \qquad lb^{(k+1)} = \max\big(\hat m^{(k)} - \hat r^{(k)},\ \tilde m^{(k)} - \tilde r^{(k)}\big), \tag{8}$$

where $\min$ and $\max$ are taken element-wise. In the penultimate layer, we obtain the intersection $B_\infty^{(K-1)} \cap B_2^{(K-1)}$ of the box outer bound $B_\infty^{(K-1)}$ and the ball outer bound $B_2^{(K-1)}$. The intersection is propagated to the logit space through the last linear layer to obtain the tight outer bound $\hat z(B(x)) = h^{(K)}\big(B_\infty^{(K-1)} \cap B_2^{(K-1)}\big) \subseteq \mathcal{Z}$.

**Extension to ℓp-norm** We note that BCP can easily be extended to ℓp-certifiable training for any $p > 0$ by replacing $B_2^{(0)} = B_2(x, \epsilon)$ with $B_2(x, \epsilon')$ circumscribing $B_p(x, \epsilon)$ in the input space $\mathbb{R}^N$, where $\epsilon' = N^{1/2 - 1/\max(p,2)}\,\epsilon$. For the ℓ∞-case, we can use $B_2^{(0)} = B_2(x, \sqrt{N}\epsilon)$ circumscribing $B_\infty^{(0)}$. Thus, for the ℓ∞-bound, BCP can be considered a generalized version of interval bound propagation (IBP) [13]. We found that BCP shows performance similar to IBP for ℓ∞-certifiable training (see the supplementary material for details). Here we focus on the performance of BCP under ℓ2-perturbations.
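As a concrete illustration of one BCP step through a dense linear layer, the sketch below propagates the ball (Eq. (6)), propagates the box by the interval arithmetic of Eq. (7), and intersects the two circumscribed boxes as in Eq. (8). It is a minimal NumPy sketch under simplifying assumptions (a single linear layer, no nonlinearity); the paper's implementation also covers convolutions, nonlinear layers, and residual blocks.

```python
import numpy as np

def bcp_linear_step(W, b, z_mid, rho, m, r):
    """One BCP propagation step through h(x) = Wx + b.

    z_mid, rho : center and radius of the incoming l2 ball  B2  = {x : ||x - z_mid||_2 <= rho}
    m, r       : midpoint and radius of the incoming box     Binf = {x : |x - m| <= r}
    Returns the propagated ball (center, radius) and the tightened box (midpoint, radius).
    """
    # Ball propagation: the radius grows by the layer's spectral norm (its Lipschitz constant).
    lipschitz = np.linalg.norm(W, 2)
    z_next, rho_next = W @ z_mid + b, lipschitz * rho

    # Eq. (6): box circumscribing the propagated ellipsoid W(B2) + b.
    m_hat = W @ z_mid + b
    r_hat = np.linalg.norm(W, axis=1) * rho          # ||W_{i,:}||_2 * rho per output coordinate

    # Eq. (7): interval arithmetic on the propagated box W(Binf) + b.
    m_tld = W @ m + b
    r_tld = np.abs(W) @ r

    # Eq. (8): element-wise intersection of the two circumscribed boxes.
    ub = np.minimum(m_hat + r_hat, m_tld + r_tld)
    lb = np.maximum(m_hat - r_hat, m_tld - r_tld)
    return z_next, rho_next, (ub + lb) / 2.0, (ub - lb) / 2.0

# Toy usage: a 4-dimensional input perturbed in an l2 ball of radius eps,
# together with the circumscribing box of the same per-coordinate radius.
rng = np.random.default_rng(0)
W, b = rng.standard_normal((3, 4)), rng.standard_normal(3)
x, eps = rng.standard_normal(4), 0.1
z1, rho1, m1, r1 = bcp_linear_step(W, b, x, eps, x, np.full(4, eps))
print(rho1, r1)   # the intersected box is never looser than either circumscribed box alone
```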
### 3.2 Certifiable Training Algorithm

**Formulation** Our certifiable training algorithm aims to minimize the objective $\mathcal{J}(f, \mathcal{D})$ in (4) to obtain a robust classifier. The objective contains the worst-translated logit $\bar z(x)$, which requires computation of the translation vector $t(x)$ in (2). In this section, we propose an efficient algorithm to calculate $t(x)$ for the tight propagated outer bound $\hat z(B(x)) = h^{(K)}(B_\infty^{(K-1)} \cap B_2^{(K-1)})$ proposed in Section 3.1. In Equation (2), $z_y(x) - z_m(x)$ is easily obtained by a forward pass through the network. However, the optimization $\min_{\zeta \in \hat z(B(x))} (\zeta_y - \zeta_m)$ is nontrivial and depends on the outer bound $\hat z(B(x))$. Without loss of generality, we can assume that $y = 1$ and $m = 0$. Then, the optimal values $\zeta_0^*, \zeta_1^*$ are given by

$$\zeta_0^*, \zeta_1^* = \operatorname*{argmin}_{(\zeta_0, \zeta_1) \in \Pi_{0,1}\hat z(B(x))} \big(\zeta_1 - \zeta_0\big), \tag{9}$$

where $\Pi_{0,1}$ is the projection onto the $\zeta_0\zeta_1$-plane. Then, we can formulate the following optimization:

$$\min_{\zeta \in \hat z(B(x))} (e_1 - e_0)^T \zeta = \min_{\zeta' \in B_2^{(K-1)} \cap B_\infty^{(K-1)}} (e_1 - e_0)^T h^{(K)}(\zeta') = \min_{\zeta' \in B_2^{(K-1)} \cap B_\infty^{(K-1)}} \big(W^{(K)}_{1,:} - W^{(K)}_{0,:}\big)\zeta' + b^{(K)}_1 - b^{(K)}_0, \tag{10}$$

where $e_i$ is the $i$-th standard basis vector, and $W^{(K)}$ and $b^{(K)}$ are the weight matrix and the bias vector of the last linear layer $h^{(K)}$. Note that $\zeta'$ denotes a vector in the penultimate layer. Therefore, we can construct the following optimization problem, which finds the largest violation of the specification to verify the network:

$$\min_{\zeta'} \; c^T \zeta' \quad \text{s.t.} \quad \big\|\zeta' - z^{(K-1)}\big\|_2 \leq \rho^{(K-1)}, \quad \big|\zeta' - m^{(K-1)}\big| \leq r^{(K-1)}, \tag{11}$$

where $c$ is a specification vector with $c^T = W^{(K)}_{1,:} - W^{(K)}_{0,:}$, and the second constraint takes the element-wise absolute value and the element-wise inequality. Since it is computationally expensive to integrate a typical optimization tool within the training loop, we propose a simple iterative algorithm that approaches the optimal solution of (11) in a finite number of steps. We emphasize that by solving (11) we can obtain a better certificate than the global Lipschitz-based certificate because it uses the additional box constraint $|\zeta' - m^{(K-1)}| \leq r^{(K-1)}$. We will see how this additional constraint affects the outer bound and the verification performance in Section 4.

**Algorithm 1** Box Constraint Propagation (BCP) certifiable training

- **Input:** training data $(x, y) \sim \mathcal{D}$, target perturbation size $\epsilon_{\text{target}}$, network parameterized by $\theta$
- **Output:** robust network $f_\theta$
- **repeat**
  - Read a mini-batch $\mathcal{B}$ from $\mathcal{D}$ and adjust $\epsilon$ and $\lambda$ according to the schedule.
  - Compute the box outer bound and the ball outer bound:
    - $B_\infty^{(K-1)} = \text{midrad}(m^{(K-1)}, r^{(K-1)})$, where $m^{(K-1)}, r^{(K-1)} = \text{BCP}(x, \epsilon; \theta)$ ((6)–(8)).
    - $B_2^{(K-1)} = B_2(z^{(K-1)}, \rho^{(K-1)})$, where $z^{(K-1)} = h^{(K-1)} \circ \cdots \circ h^{(1)}(x)$ and $\rho^{(K-1)} = \epsilon \prod_{i=1}^{K-1} L^{(i)}$.
  - Solve the optimization in (11) for each $m \neq y$ in parallel:
    - Initialize $p = z^{(K-1)} - \rho^{(K-1)}\, c / \|c\|$.
    - **while** not $|p - m^{(K-1)}| \leq r^{(K-1)}$ **do**
      - Decompose $p$ into two parts: $p = p[I] + p[I^c]$, where $I \leftarrow \{l : |p_l - m^{(K-1)}_l| > r^{(K-1)}_l\}$.
      - (First phase) Project $p[I]$ onto $B_\infty^{(K-1)}$.
      - (Second phase) With the scaling parameter $\eta$ in (12), update $p \leftarrow \Pi_{B_\infty^{(K-1)}} p[I] + \eta\, p[I^c]$.
    - **end while**
  - Calculate the worst-translated logit $\bar z(x) = z(x) + t(x)$ with (2) and (10): $t_m(x) = c^T\big(z^{(K-1)} - p\big)$.
  - Update the parameters $\theta$ with the objective (13): $\theta \leftarrow \theta - \alpha \nabla_\theta \mathcal{J}(f_\theta, \mathcal{B}; \lambda)$.
- **until** the training phase ends

**Solving the optimization** We solve the optimization (11) using an efficient iterative algorithm that terminates when no element violates the box constraint. Our algorithm starts from the initial point $p = z^{(K-1)} - \rho^{(K-1)}\, c / \|c\|$, which is the optimal solution of (11) when the box constraint is ignored. Then $p$ satisfies the ball constraint but is not guaranteed to satisfy the box constraint. We decompose the indices of $p$ into two parts, $I$ and $I^c$, where $I \leftarrow \{l : |p_l - m^{(K-1)}_l| > r^{(K-1)}_l\}$. Then, we can represent $p = p[I] + p[I^c]$, where $p[J] = \sum_{l \in J} p_l e_l$. Note that $I$ or $I^c$ can be empty, and we define $p[\varnothing] = 0$. We then iterate the following two phases to find the optimal solution efficiently. In the first phase, $p[I]$ is projected onto the box, denoted by $\Pi_{B_\infty^{(K-1)}} p[I]$. In the second phase, $p[I^c]$ is scaled with an adaptive parameter $\eta$, computed as

$$\eta = \frac{\sqrt{\rho^{(K-1)2} - \big\|\Pi_{B_\infty^{(K-1)}} p[I] - z^{(K-1)}[I]\big\|^2}}{\big\|p[I^c] - z^{(K-1)}[I^c]\big\|}. \tag{12}$$

Based on (12), the next point $p \leftarrow \Pi_{B_\infty^{(K-1)}} p[I] + \eta\, p[I^c]$ lies on the boundary $\partial B_2^{(K-1)}$ of the ball $B_2^{(K-1)}$ when $I^c \neq \varnothing$; we skip the scaling when $I^c = \varnothing$. This iterative algorithm terminates when $p$ satisfies the box constraint.
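The sketch below implements the two-phase loop for a single specification vector c: coordinates that violate the box constraint are clipped onto the box and kept fixed, and the remaining coordinates are rescaled about the ball center so that the iterate returns to the sphere, which is the geometric role of the scaling parameter η in Eq. (12). This is our own NumPy reading of the procedure, not the authors' code; the exact scaling convention should be checked against the paper's supplementary material.

```python
import numpy as np

def solve_worst_margin(c, z, rho, m, r):
    """Approximately solve Eq. (11):  min_p c^T p  s.t.  ||p - z||_2 <= rho,  |p - m| <= r.

    c      : specification vector, e.g. W^(K)_{y,:} - W^(K)_{m,:} for the margin zeta_y - zeta_m
    z, rho : center and radius of the penultimate-layer ball B2^(K-1)
    m, r   : midpoint and radius of the penultimate-layer box Binf^(K-1)
    """
    n = len(c)
    p = z - rho * c / np.linalg.norm(c)       # optimum when the box constraint is ignored
    fixed = np.zeros(n, dtype=bool)           # indices already clipped onto the box
    for _ in range(n):                        # Proposition 2: finitely many iterations suffice
        violated = np.abs(p - m) > r
        if not violated.any():
            break
        fixed |= violated
        # First phase: project the violating coordinates onto the box.
        p[fixed] = np.clip(p[fixed], (m - r)[fixed], (m + r)[fixed])
        free = ~fixed
        if not free.any():
            break
        # Second phase: rescale the free coordinates about the center z so that
        # ||p - z||_2 = rho again (the role of eta in Eq. (12)).
        budget = rho**2 - np.sum((p[fixed] - z[fixed])**2)
        dev = p[free] - z[free]
        dev_norm = np.linalg.norm(dev)
        if budget <= 0 or dev_norm == 0:
            break
        p[free] = z[free] + np.sqrt(budget) / dev_norm * dev
    return p, float(c @ p)

# Toy usage with hypothetical penultimate-layer bounds.
rng = np.random.default_rng(0)
n = 8
c, z = rng.standard_normal(n), rng.standard_normal(n)
m, r = z + 0.05 * rng.standard_normal(n), np.full(n, 0.2)
p_star, worst = solve_worst_margin(c, z, rho=1.0, m=m, r=r)
print(worst, float(c @ z) - np.linalg.norm(c))  # the box constraint can only raise the ball-only optimum
```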
Proposition 2 below shows that our algorithm terminates within a finite number of steps, determined by the number of elements of $c$.

**Proposition 2.** The while loop in Algorithm 1 finds the optimal solution $p^* = \zeta'^*$ of the optimization problem (11) in a finite number of iterative steps less than the number of elements of $c$.

*Proof.* The proof is deferred to the supplementary material.

Algorithm 1 summarizes the BCP training algorithm. Similar to Kurakin et al. [19], we train on a mixture of the normal logit $z(x)$ and the worst logit $\bar z(x)$ as follows:

$$\mathcal{J}(f, \mathcal{D}; \lambda) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[(1 - \lambda)\,\mathcal{L}(z(x), y) + \lambda\,\mathcal{L}(\bar z(x), y)\Big]. \tag{13}$$

We gradually increase the perturbation $\epsilon$ from 0 to the target bound $\epsilon_{\text{target}}$ and increase $\lambda$ in (13) from 0 to 1, which stabilizes the initial phase of training and improves natural accuracy [13, 44]. Therefore, our algorithm enables fast certifiable training of a robust model with a tight outer bound and is thus scalable to large networks.

## 4 Experiments

We demonstrate that the proposed method provides a tight outer bound for the ℓ2-perturbation set and trains certifiably robust networks, comparing its performance against state-of-the-art certifiable training methods (LMT [36], CAP [40], and IBP [13]) on MNIST and CIFAR-10. Moreover, we show that the BCP scheme can scale to Tiny ImageNet (https://tiny-imagenet.herokuapp.com/) and obtain a meaningful verification accuracy. We further investigate robustness under a wide range of perturbation levels. The details of the hyper-parameters and architectures used in the experiments can be found in the supplementary material.

Figure 2: Illustration of the outer bounds for the BCP-trained models on (a) MNIST, (b) CIFAR-10, and (c) Acorn-or-seashore classification tasks. BCP cuts off the area below the red line from the elliptic area and tightens the outer bound. The shaded parallelogram area in (c) indicates the image of the feasible region of the box constraint after the last linear layer.

**Visualization of Tightening Effects** Figure 2 illustrates how BCP tightens the outer region by introducing the box constraint $B_\infty^{(K-1)}$ in (11). We can easily visualize the high-dimensional ellipsoid $h^{(K)}(B_2^{(K-1)}) \subseteq \mathbb{R}^c$ in the 2D plane with $\zeta_y$- and $\zeta_{m^*}$-axes by projection, where $m^*$ is the most probable class other than the true class $y$. However, the high-dimensional parallelepiped $h^{(K)}(B_\infty^{(K-1)})$ is hard to visualize in the 2D plane. Thus, we use the red lines in Figure 2 (a)-(b) to indicate that the projection of the outer region $h^{(K)}(B_2^{(K-1)} \cap B_\infty^{(K-1)})$ must lie above the red line and inside the ellipsoid, showing how much area is cut off by the box constraint $B_\infty^{(K-1)}$. We compute the worst-case margin $\zeta_y^* - \zeta_{m^*}^*$ based on (9) and build the verification boundary $\zeta_y - \zeta_{m^*} = \zeta_y^* - \zeta_{m^*}^*$ with it, where the red line is obtained from the solution $\zeta_y^*, \zeta_{m^*}^*$ of (9) for $\hat z(B(x)) = h^{(K)}(B_2^{(K-1)} \cap B_\infty^{(K-1)})$, and the blue line is for $\hat z(B(x)) = h^{(K)}(B_2^{(K-1)})$. To verify the network, we utilize the verification boundary: the verification for $B_2(x, \epsilon_{\text{target}})$ succeeds if the verification boundary lies above the decision boundary ($\zeta_y = \zeta_{m^*}$).
Figure 2c explicitly illustrates the ellipsoid $h^{(K)}(B_2^{(K-1)})$ and the parallelogram $h^{(K)}(B_\infty^{(K-1)})$ together with the verification boundary for a toy binary classification problem between acorn and seashore derived from the Tiny ImageNet dataset. We also indicate the logits of adversarial examples found by PGD attacks based on the cross-entropy loss (PGD) and a margin-based loss (PGD-Margin); these cannot cross the verification boundaries.

**Quantitative Analysis of Tightness of Outer Bounds** To quantitatively analyze how much BCP tightens the outer bound, we use the "normalized ℓ1-norm" of the translation vector as a measure of tightness of the outer bound $\hat z(B(x))$ for a given input $x$, defined as $\tau(x) = \sum_{i \neq j} t_i(x; j) / (c(c-1))$. Without BCP, this measure is a constant over $\mathcal{X}$, $\tau = \sum_{i \neq j} \rho^{(K-1)} \|W^{(K)}_{i,:} - W^{(K)}_{j,:}\|_2 / (c(c-1))$, since it only considers the global Lipschitz constant and does not depend on the input. We indicate this constant tightness measure $\tau$ for each dataset as the dotted lines in Figure 3. In contrast, BCP accounts for the local properties of each input and thus yields a different tightness for each input. As shown in the violin plots in Figure 3, BCP tightens the outer bounds by 55.4% (MNIST), 45.8% (CIFAR-10), and 25.3% (Tiny ImageNet) on average.

Figure 3: Violin plots of the tightness (translation) of the outer bounds on MNIST, CIFAR-10, and Tiny ImageNet. The dotted lines indicate the tightness without BCP. A smaller value indicates a better tightness.

Table 1: Computation time compared to CAP [40]. BCP is over 12 times faster than CAP (*For WRN, CAP uses two GPUs because of the memory limit).

| Data | Structure | CAP (s/epoch) | BCP (s/epoch) | Speed up |
|---|---|---|---|---|
| MNIST | 4C3F | 689 | 57.5 | 12.0 |
| CIFAR-10 | 4C3F | 645 | 53.0 | 12.2 |
| CIFAR-10 | 6C2F | 1,369 | 56.5 | 24.2 |
| CIFAR-10 | WRN | 1,121* | 89.5 | 12.5 |
| Tiny ImageNet | 8C2F | - | 3,268 | - |

**Verification performance** We evaluate our certifiable training algorithm and other state-of-the-art methods (LMT [36], CAP [40], and IBP [13]) with $\epsilon_{\text{target}} = 1.58$, $36/255$, and $36/255$ on MNIST, CIFAR-10, and Tiny ImageNet, respectively. We use the same bound for evaluation, $\epsilon_{\text{eval}} = \epsilon_{\text{target}}$. For MNIST, BCP outperforms the other methods not only in verification accuracy but also in standard accuracy. For CIFAR-10, BCP outperforms LMT and IBP, and produces verification accuracy comparable to CAP while outperforming it in both standard accuracy and robust accuracy against PGD. For Tiny ImageNet, BCP achieves a verification accuracy of 20.08%, while LMT and IBP learn constant models and CAP does not scale to Tiny ImageNet.

To further investigate the robustness of the models, Figure 4 shows how the verification accuracy changes over different ℓ2-perturbation levels $\epsilon_{\text{eval}}$. We train the robust models with $\epsilon_{\text{target}} = 1.58$ on MNIST (Figure 4a) and $\epsilon_{\text{target}} = 36/255$ and $2\epsilon_{\text{target}} = 72/255$ on CIFAR-10 (Figures 4b, 4c). Compared to the state-of-the-art methods, BCP achieves the highest verification accuracy over a wide range of $\epsilon_{\text{eval}}$. The verification accuracy of BCP decreases slowly and almost linearly as $\epsilon$ increases, whereas CAP shows a significant drop in verification accuracy when $\epsilon_{\text{eval}} \geq \epsilon_{\text{target}}$. We emphasize that verification accuracy over a range of perturbation levels provides a more meaningful picture of robustness than the verification performance at the single perturbation bound $\epsilon_{\text{eval}}$ reported in Table 2 (see the supplementary material for more detailed results).

Table 2: Comparison to other verifiable training methods. Best performances are highlighted in bold.
| Data | Structure | # parameters | Method | Standard (%) | PGD (%) | Verification (%) |
|---|---|---|---|---|---|---|
| MNIST | 4C3F | 1,974,762 | CAP | 88.39 | 62.25 | 43.95 |
| | | | LMT | 86.48 | 53.56 | 40.55 |
| | | | BCP | **92.41** | **64.70** | **47.95** |
| CIFAR-10 | 4C3F | 2,466,858 | CAP | 60.14 | 55.67 | **50.29** |
| | | | LMT | 56.49 | 49.83 | 37.20 |
| | | | IBP | 34.50 | 31.79 | 24.39 |
| | | | BCP | **63.88** | **58.75** | 49.58 |
| CIFAR-10 | 6C2F | 2,250,378 | CAP | 60.10 | 56.20 | 50.87 |
| | | | LMT | 63.05 | 58.32 | 38.11 |
| | | | IBP | 32.96 | 31.08 | 23.42 |
| | | | BCP | **65.72** | **60.78** | **51.30** |
| CIFAR-10 | WRN | 4,214,850 | CAP | 60.70 | 56.77 | **51.63** |
| | | | LMT | 61.33 | 56.39 | 33.35 |
| | | | BCP | **64.79** | **59.16** | 50.33 |
| Tiny ImageNet | 8C2F | 4,342,984 | BCP | 28.76 | 26.64 | 20.08 |

Figure 4: Verification performance of the verifiable training methods as a function of $\epsilon_{\text{eval}}$ on (a) MNIST ($\epsilon_{\text{target}} = 1.58$), (b) CIFAR-10 (36/255), and (c) CIFAR-10 (72/255). The vertical lines indicate $\epsilon_{\text{target}}$.

**Computational Cost** Table 1 shows that BCP is over 12 times faster than CAP. We evaluate the computation times on a single Titan X GPU. For a fair comparison, we use the same batch size for both methods: 50 on MNIST and CIFAR-10 and 5 on Tiny ImageNet. Because CAP is memory-inefficient, it cannot use a larger batch size, whereas BCP can be further sped up with one. In the case of WRN [42] on CIFAR-10, we can speed up to 61.1 s/epoch using a batch size of 128, while CAP needs two GPUs to run with a batch size of 50. This implies that our certifiable training is efficiently applicable to large-scale datasets.

## 5 Conclusion

In this study, we propose a fast certifiable training method with a tight outer bound. To obtain the tight outer bound, we propose BCP, which efficiently computes box constraints that tighten the outer bound. We then train a certifiably robust model by minimizing the certificate loss based on the worst-translated logit over the tight outer bound. By doing so, we build the first certifiably robust model on Tiny ImageNet under ℓ2-perturbations. We hope that our method can serve as a strong benchmark for certifiable training on large-scale datasets.

## Broader Impact

Verifiable training can be used as a general learning scheme for applications in security-sensitive domains such as self-driving cars, face recognition, and medical diagnostics. In these applications, an adversarial example is a potential safety hazard that we want to avoid. By training a model with BCP, we can guarantee that no adversarial attack within a given norm-based perturbation can break the model. However, we should note that there is a trade-off between security and performance. Our work leans toward the security aspect and has relatively low accuracy on natural data. This sacrifice of performance can diminish the benefits of applying deep learning models, and security concerns can restrain deployment in real systems. We are already familiar with deep learning models embedded in everyday products and services, such as smart speakers, ride-sharing apps, and social media services. Therefore, a balance of performance and security is required, depending on the characteristics of the application.
The development of verifiable training algorithms makes it possible to improve standard accuracy while exactly quantifying security. In addition, quantifying both performance and security can help adjust the balance between them.

## Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2018R1D1A1A02085851), and in part by the NRF Grant funded by the Korean Government (MSIT) (NRF-2019R1A2C2002358). The corresponding author is Saerom Park.

## References

[1] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.
[2] A. Boopathy, T.-W. Weng, P.-Y. Chen, S. Liu, and L. Daniel. CNN-Cert: An efficient framework for certifying robustness of convolutional neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3240–3247, 2019.
[3] R. Bunel, I. Turkaslan, P. H. Torr, P. Kohli, and M. P. Kumar. Piecewise linear neural networks verification: A comparative study. 2018.
[4] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
[5] N. Carlini, G. Katz, C. Barrett, and D. L. Dill. Ground-truth adversarial examples. arXiv preprint arXiv:1709.10207, 2017.
[6] J. M. Cohen, E. Rosenfeld, and J. Z. Kolter. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918, 2019.
[7] K. Dvijotham, S. Gowal, R. Stanforth, R. Arandjelovic, B. O'Donoghue, J. Uesato, and P. Kohli. Training verified learners with learned verifiers. arXiv preprint arXiv:1805.10265, 2018.
[8] K. Dvijotham, R. Stanforth, S. Gowal, T. A. Mann, and P. Kohli. A dual approach to scalable verification of deep networks. In UAI, pages 550–559, 2018.
[9] R. Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis, pages 269–286. Springer, 2017.
[10] M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. Pappas. Efficient and accurate estimation of Lipschitz constants for deep neural networks. In Advances in Neural Information Processing Systems, pages 11423–11434, 2019.
[11] M. Fischetti and J. Jo. Deep neural networks and mixed integer linear optimization. Constraints, 23(3):296–309, 2018.
[12] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
[13] S. Gowal, K. Dvijotham, R. Stanforth, R. Bunel, C. Qin, J. Uesato, T. Mann, and P. Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715, 2018.
[14] C. Guo, M. Rana, M. Cisse, and L. van der Maaten. Countering adversarial images using input transformations. arXiv preprint arXiv:1711.00117, 2017.
[15] M. Hein and M. Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems, pages 2266–2276, 2017.
[16] X. Huang, M. Kwiatkowska, S. Wang, and M. Wu. Safety verification of deep neural networks. In International Conference on Computer Aided Verification, pages 3–29. Springer, 2017.
[17] T. Huster, C.-Y. J. Chiang, and R. Chadha. Limitations of the Lipschitz constant as a defense against adversarial examples. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 16–29. Springer, 2018.
[18] G. Katz, C. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pages 97–117. Springer, 2017.
[19] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
[20] F. Latorre, P. Rolland, and V. Cevher. Lipschitz constant estimation of neural networks via sparse polynomial optimization. arXiv preprint arXiv:2004.08688, 2020.
[21] M. Lecuyer, V. Atlidakis, R. Geambasu, D. Hsu, and S. Jana. Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy (SP), pages 656–672. IEEE, 2019.
[22] B. Li, C. Chen, W. Wang, and L. Carin. Second-order adversarial attack and certifiable robustness. arXiv preprint arXiv:1809.03113, 2018.
[23] A. Lomuscio and L. Maganti. An approach to reachability analysis for feed-forward ReLU neural networks. arXiv preprint arXiv:1706.07351, 2017.
[24] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
[25] M. Mirman, T. Gehr, and M. Vechev. Differentiable abstract interpretation for provably robust neural networks. In International Conference on Machine Learning, pages 3575–3583, 2018.
[26] R. E. Moore, R. B. Kearfott, and M. J. Cloud. Introduction to Interval Analysis, volume 110. SIAM, 2009.
[27] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1765–1773, 2017.
[28] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pages 582–597. IEEE, 2016.
[29] A. Raghunathan, J. Steinhardt, and P. Liang. Certified defenses against adversarial examples. arXiv preprint arXiv:1801.09344, 2018.
[30] A. Raghunathan, J. Steinhardt, and P. S. Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, pages 10877–10887, 2018.
[31] H. Salman, J. Li, I. Razenshteyn, P. Zhang, H. Zhang, S. Bubeck, and G. Yang. Provably robust deep learning via adversarially trained smoothed classifiers. In Advances in Neural Information Processing Systems, pages 11289–11300, 2019.
[32] G. Singh, T. Gehr, M. Mirman, M. Püschel, and M. Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, pages 10802–10813, 2018.
[33] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[34] V. Tjeng, K. Xiao, and R. Tedrake. Evaluating robustness of neural networks with mixed integer programming. arXiv preprint arXiv:1711.07356, 2017.
[35] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.
[36] Y. Tsuzuku, I. Sato, and M. Sugiyama. Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks. In Advances in Neural Information Processing Systems, pages 6541–6550, 2018.
[37] T.-W. Weng, H. Zhang, H. Chen, Z. Song, C.-J. Hsieh, D. Boning, I. S. Dhillon, and L. Daniel. Towards fast computation of certified robustness for ReLU networks. arXiv preprint arXiv:1804.09699, 2018.
[38] T.-W. Weng, H. Zhang, P.-Y. Chen, J. Yi, D. Su, Y. Gao, C.-J. Hsieh, and L. Daniel. Evaluating the robustness of neural networks: An extreme value theory approach. arXiv preprint arXiv:1801.10578, 2018.
[39] E. Wong and J. Z. Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. arXiv preprint arXiv:1711.00851, 2017.
[40] E. Wong, F. Schmidt, J. H. Metzen, and J. Z. Kolter. Scaling provable adversarial defenses. In Advances in Neural Information Processing Systems, pages 8400–8409, 2018.
[41] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille. Mitigating adversarial effects through randomization. arXiv preprint arXiv:1711.01991, 2017.
[42] S. Zagoruyko and N. Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.
[43] H. Zhang, T.-W. Weng, P.-Y. Chen, C.-J. Hsieh, and L. Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, pages 4939–4948, 2018.
[44] H. Zhang, H. Chen, C. Xiao, B. Li, D. Boning, and C.-J. Hsieh. Towards stable and efficient training of verifiably robust neural networks. arXiv preprint arXiv:1906.06316, 2019.