# Lipschitz-Certifiable Training with a Tight Outer Bound

Sungyoon Lee (Seoul National University, Seoul, Korea; goman1934@snu.ac.kr), Jaewook Lee (Seoul National University, Seoul, Korea; jaewook@snu.ac.kr), Saerom Park (Sungshin Women's University, Seoul, Korea; psr6275@sungshin.ac.kr)

Verifiable training is a promising research direction for training a robust network. However, most verifiable training methods are slow or lack scalability. In this study, we propose a fast and scalable certifiable training algorithm based on Lipschitz analysis and interval arithmetic. Our certifiable training algorithm provides a tight propagated outer bound by introducing box constraint propagation (BCP), and it efficiently computes the worst logit over the outer bound. In the experiments, we show that BCP achieves a tighter outer bound than the global Lipschitz-based outer bound. Moreover, our certifiable training algorithm is over 12 times faster than the state-of-the-art dual-relaxation-based method while achieving comparable or better verification performance and improving natural accuracy. Our fast certifiable training algorithm with the tight outer bound scales to Tiny ImageNet with a verification accuracy of 20.1% (ℓ2-perturbation of ϵ = 36/255). Our code is available at https://github.com/sungyoon-lee/bcp.

## 1 Introduction

Deep learning has shown successful results in many applications. However, it has been demonstrated that deep neural networks are vulnerable to small but adversarially designed perturbations of the input, which can mislead a network into predicting a wrong label [33]. There have been many studies on such adversarial attacks and on defenses against them [12, 19, 28, 27, 24, 4, 35, 41, 14]. However, Athalye et al. [1] have shown that many of these defense methods are designed to defend against specific predefined adversarial attacks and, in turn, can still be broken by unseen, stronger adaptive adversaries. Thus, many verification methods have been proposed to guarantee stable prediction for inputs within a perturbation set [18, 5, 9, 16, 38, 34, 23, 11, 3, 30, 8, 43, 2]. Verification of a neural network provides either lower bounds on the norm of the input perturbations required to fool the network or upper bounds on the worst-case errors of the network against specified perturbations. In particular, verifiable training incorporates the verification procedure using the upper bound into the training loop and yields a robust model [39, 40, 7, 8, 29, 30].

Verifiable training methods are mainly categorized into two approaches: dual relaxation and layer-wise bound propagation. The dual relaxation approach formulates verifiable training as a convex optimization problem and uses duality to build a relaxed bound on the optimization problem and to relieve the computational load [39, 40, 29, 7, 30]. Although these verifiable training methods can provide relatively exact robustness bounds for verification, they still involve expensive computations and scale poorly. In contrast, the layer-wise bound propagation approach calculates upper bounds on the worst-case error through relaxation of the layer-wise operations and forward propagation of the perturbation set, which can consist of ℓ∞- or ℓ2-balls [13, 44, 36, 25, 32, 37]. These layer-wise methods are computationally efficient but have loose bounds in the initial phase of training, hindering their application to larger networks.
Apart from these deterministic verification approaches, randomized smoothing is another promising approach to obtaining robust classification results. It can make any classifier acquire certified adversarial robustness by constructing a smoothed classifier [22, 21, 6, 31]. However, randomized smoothing methods require a large number of samples to certify the classifier.

In this study, we propose an efficient certifiable training method with a tight outer bound propagation. This propagation enables the model to scale to Tiny ImageNet. Our algorithm minimizes an upper bound on the robust classification error. We further tighten the upper bound by introducing a valid box constraint into the optimization problem, thereby improving the optimal solution. By tightening the upper bound on the objective, both the robustness and the standard accuracy of our method improve. To summarize, the main contributions of this paper are as follows:

- We propose a fast certifiable training algorithm called Box Constraint Propagation (BCP) with an efficient computation of the upper bound on the robust classification error. BCP is over 12 times faster than the state-of-the-art dual-relaxation-based method [40].
- We obtain tighter outer bounds than those without BCP. These bounds are on average 25.3-55.4% tighter in terms of the length of the worst logit translation. Therefore, our certificate using BCP achieves verification accuracy comparable to CAP [40] while improving the natural accuracy on CIFAR-10.
- Our approach scales to Tiny ImageNet and learns a certificate that achieves 20.1% verification accuracy (ϵ = 36/255). To the best of our knowledge, this is the first non-trivial (deterministic) verification accuracy on Tiny ImageNet under ℓ2-robustness.
- Our verification loss can adapt to the input location; thus, the model can learn behavior that depends on the input location.

## 2 Certifiable Training with Worst Logit

In this section, we introduce the robust training problem for multi-class classification, define the specification based on the worst logit, and establish the objective that provides an upper bound on the robust training problem.

**Notation** We consider a $c$-class classification problem, where $x \in \mathcal{X} \subseteq \mathbb{R}^N$ is an input, $y \in \mathcal{Y} = \{0, 1, \dots, c-1\}$ is the label with respect to the input $x$, and $c$ is the number of classes. A mapping that takes an input $x$ and outputs a logit vector $\zeta = z(x) \in \mathcal{Z}$ is denoted by $z : \mathcal{X} \to \mathcal{Z} \subseteq \mathbb{R}^c$, and the corresponding classifier is $f : \mathcal{X} \to \mathcal{Y}$ with $f(x) = \operatorname{argmax}_{m \in \mathcal{Y}} z_m(x)$, where $z_m$ is the output logit for a class $m \in \mathcal{Y}$. We assume the classifier network is a feedforward network with $K$ layers, $z^{(k)} = h^{(k)}(z^{(k-1)})$ for $k = 1, \dots, K$, where $z^{(k)}$ is the vector of activations in the $k$-th layer, $z^{(K)} = \zeta$, $z^{(0)} = x$, and $h^{(k)}$ is the operation in the $k$-th layer. Let $B(x, \epsilon)$ be a perturbation set around the input $x$ with a perturbation level $\epsilon$. Then, for a classifier $f$, the robust classification error within the perturbation set $B(\cdot, \epsilon)$ on a data distribution $\mathcal{D}$ is defined as

$$R(f) = \mathbb{P}_{(x,y)\sim\mathcal{D}}\big(\exists\, x' \in B(x, \epsilon)\ \text{s.t.}\ f(x') \neq y\big).$$

We omit the dependency on $\mathcal{D}$ and $\epsilon$ for simplicity.

### 2.1 Robust Training

The main goal of certifiable training is to minimize the robust classification error $R(f)$. However, because exact verification of $R(f)$ is NP-complete [18], a simple surrogate of $R(f)$ is used to construct the objective of certifiable training.
Our certifiable training minimizes an upper bound on $R(f)$ that serves as a certificate of robustness, whereas adversarial training [24] minimizes a lower bound on $R(f)$. To obtain the upper bound on $R(f)$, we propagate the perturbation set $B(x, \epsilon)$ and calculate an outer bound on its propagated image in the logit space $\mathcal{Z}$. For simplicity, $B(x)$ denotes the input perturbation set $B(x, \epsilon)$. Let $\hat z(B(x)) \subseteq \mathcal{Z}$ be an outer bound on the logit image $z(B(x))$ of the perturbation set. Then, we can construct the following upper bound $\hat R(f)$ on $R(f)$:

$$R(f) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\zeta \in z(B(x))} \max_{y' \neq y} \mathbb{1}\big[\zeta_{y'} - \zeta_y \geq 0\big]\Big] \;\leq\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\zeta \in \hat z(B(x))} \max_{y' \neq y} \mathbb{1}\big[\zeta_{y'} - \zeta_y \geq 0\big]\Big] = \hat R(f), \tag{1}$$

where $\zeta_m$ is the $m$-th element of the logit vector $\zeta$ and $\mathbb{1}[\cdot]$ denotes the indicator function.

### 2.2 Worst-Translated Logit

Based on the upper bound $\hat R(f)$ in (1), we can construct an objective for verifiable training as $\mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\max_{\zeta \in \hat z(B(x))} \mathcal{L}(\zeta, y)\big]$, using the cross-entropy loss $\mathcal{L}$ as a surrogate for the 0-1 loss of $\hat R(f)$. However, it is still inefficient to find the optimal solution of the non-convex maximization problem $\max_{\zeta \in \hat z(B(x))} \mathcal{L}(\zeta, y)$. The dual relaxation approach addresses this problem by computing a differentiable upper bound on the robust classification error using a feasible dual solution of the underlying relaxed LP [39, 40, 8]. In contrast, the layer-wise propagation approach uses the worst-case logit or a certifiable margin in the logit space to obtain a differentiable upper bound on the robust classification error [36, 13, 44]. In this study, we introduce the worst-translated logit $\bar z(x)$, which provides an upper bound on the cross-entropy loss over an outer bound $\hat z(B(x))$, as follows:

**Definition 1.** The worst-translated logit over an outer bound $\hat z(B(x))$ for the input $x$ and the corresponding label $y$ is defined as $\bar z(x; y) = z(x) + t(x; y)$, where the $m$-th element of the translation vector $t(x; y)$ is

$$t_m(x; y) = \big(z_y(x) - z_m(x)\big) - \min_{\zeta \in \hat z(B(x))} \big(\zeta_y - \zeta_m\big). \tag{2}$$

When the context is clear, we omit $y$ in $\bar z(x; y)$ and $t(x; y)$ and just write $\bar z(x)$ and $t(x)$ for brevity.

**Proposition 1** (Wong and Kolter [39]). For an outer bound $\hat z(B(x)) \supseteq z(B(x))$ and its corresponding worst-translated logit $\bar z(x)$, the following inequality holds:

$$\max_{\zeta \in \hat z(B(x))} \mathcal{L}(\zeta, y) \leq \mathcal{L}(\bar z(x), y), \tag{3}$$

where $\mathcal{L}$ is the cross-entropy loss function. Finally, the objective to be minimized is formulated as follows:

$$\mathcal{J}(f, \mathcal{D}) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\mathcal{L}(\bar z(x), y)\big]. \tag{4}$$

Note that the worst-translated logit $\bar z(x)$ for the input $x$ may not lie inside the outer bound $\hat z(B(x))$. The remaining problems are how to calculate the outer bound $\hat z(B(x))$ of the logit image $z(B(x))$ and how to solve the minimization in (2) over that outer bound; these are discussed in Sections 3.1 and 3.2, respectively. Furthermore, by using the worst-translated logit $\bar z(x)$ as a certificate that guarantees robustness to adversarial perturbations, we can obtain the verification error of the model $f$ on the test data $\mathcal{D}_{\text{test}}$ as

$$R_V(f) = \hat{\mathbb{P}}_{(x,y)\sim\mathcal{D}_{\text{test}}}\Big(\min_{y' \neq y} \big(\bar z_y(x) - \bar z_{y'}(x)\big) \leq 0\Big), \tag{5}$$

which is an upper bound on the robust classification error $R(f)$ on the test data $\mathcal{D}_{\text{test}}$.
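To make Definition 1 and the objective (4) concrete, the sketch below computes the worst-translated logit and its cross-entropy loss for a single example, assuming the certified lower bounds $\min_{\zeta \in \hat z(B(x))}(\zeta_y - \zeta_m)$ have already been obtained from some outer bound; they are passed in as a vector `margin_lb`, a name introduced here only for illustration. This is a minimal NumPy sketch, not the authors' implementation:

```python
import numpy as np

def worst_translated_logit(logits, margin_lb, y):
    """Worst-translated logit z_bar = z + t of Definition 1.

    logits    : (c,) logit vector z(x)
    margin_lb : (c,) vector whose m-th entry is min_{zeta in outer bound} (zeta_y - zeta_m);
                the entry at index y is ignored
    y         : true label
    """
    t = (logits[y] - logits) - margin_lb  # Eq. (2): t_m = (z_y - z_m) - min (zeta_y - zeta_m)
    t[y] = 0.0                            # the true class is not translated
    return logits + t

def cross_entropy(logits, y):
    """Numerically stable cross-entropy loss of a single logit vector."""
    shifted = logits - logits.max()
    return float(-shifted[y] + np.log(np.exp(shifted).sum()))

# Toy example with c = 3 classes and true label y = 1.
z = np.array([1.0, 3.0, 0.5])
margin_lb = np.array([0.4, 0.0, -0.2])   # hypothetical certified margin lower bounds
z_bar = worst_translated_logit(z, margin_lb, y=1)
print(z_bar)                                            # logits fed to the surrogate loss in (4)
print(cross_entropy(z_bar, 1) >= cross_entropy(z, 1))   # the certified loss upper-bounds the clean loss here
```

In training, these margin lower bounds come from the BCP outer bound described in Section 3.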
## 3 Lipschitz-Certifiable Training with Tight Outer Bound

In this section, we propose a tight outer bound estimation and an efficient algorithm for calculating the worst-translated logit. We mainly focus on ℓ2-perturbation sets in the input space, but our method can be easily extended to any ℓp-perturbation for p > 0 and to ℓ∞-perturbations, as described later.

**Notation** The ℓ2-perturbation set and the ℓ∞-perturbation set in the input space are denoted by $B_2(x, \epsilon) = \{x' : \|x' - x\|_2 \leq \epsilon\}$ and $B_\infty(x, \epsilon) = \{x' : |x'_i - x_i| \leq \epsilon,\ \forall i\}$, respectively. To obtain a tight outer bound $\hat z(B_2(x, \epsilon))$, we propagate the perturbation sets through the layers and calculate layer-wise outer bounds $B_2^{(k)}$ and $B_\infty^{(k)}$ in the $k$-th layer. The $k$-th layer ℓ∞-bound $B_\infty^{(k)}$ can be represented as the box constraint $B_\infty^{(k)} = \text{midrad}(m^{(k)}, r^{(k)}) \equiv \{p : |p_i - m^{(k)}_i| \leq r^{(k)}_i,\ \forall i\}$ with midpoint $m^{(k)}$ and radius $r^{(k)}$ [26]. We call $B_2^{(k)}$ the "ball outer bounds" and $B_\infty^{(k)}$ the "box outer bounds".

Figure 1: Illustration of BCP. For the $k$-th layer, the $k$-th box $B_\infty^{(k)}$ (left) is propagated to the next box $B_\infty^{(k+1)}$ (right), both colored in red. Note that the $k$-th ball $B_2^{(k)}$ is independently propagated to the next ball $B_2^{(k+1)}$, which has an $L^{(k)}$ times larger radius.

### 3.1 Outer Bound Estimation

The outer bound for an ℓ2-perturbation, $\hat z(B_2(x, \epsilon))$, can be constructed simply by using the global Lipschitz constant $L$ of the logit function $z$: $\hat z(B_2(x, \epsilon)) = B_2(z(x), \epsilon L)$ [36]. The global Lipschitz constant is efficiently computed as the product of all layer-wise Lipschitz constants, $L = \prod_{k=1}^{K} L^{(k)}$. To tighten this spherical outer bound in the logit space, Tsuzuku et al. [36] replaced it with the ellipsoidal outer bound $\hat z(B_2(x, \epsilon)) = h^{(K)}(B_2(z^{(K-1)}, \rho^{(K-1)}))$, where $\rho^{(k)} = \epsilon \prod_{i=1}^{k} L^{(i)}$. In the objective (4), the ellipsoidal outer bound allows the minimization in (2) to be solved explicitly, yielding the Lipschitz margin. However, the global Lipschitz constant can still overestimate the outer bound and impose a strong penalty on it, limiting the expressiveness of the model [17]. This leads to poor classification performance, since the model cannot have sharp transitions near the decision boundary [15]. On the other hand, using local Lipschitz constants to estimate the outer bound for each given input $x$ is computationally infeasible to integrate into the training loop for medium-sized networks, and such approaches are thus limited to two-layer networks [15, 10, 20].

To this end, we propose a method called BCP, which uses layer-wise propagation with the Lipschitz constant and interval arithmetic to efficiently approximate the propagation of the perturbation set adaptively to the input location. This addresses the problems of the global Lipschitz-constant-based outer bound while retaining the efficient computations needed for certifiable training. In addition, it enables us to obtain a certificate that adapts to the local properties of the classifier. We further discuss the intuition behind the design of BCP in the supplementary material.

**Box Constraint Propagation** Our outer bound propagation starts with the ℓ2- and ℓ∞-perturbation sets $(B_2^{(0)}, B_\infty^{(0)})$, propagates them through the layers, and derives a tight ℓ∞-outer bound $B_\infty^{(k)}$ in each layer by circumscribing the propagated images and taking the intersection of the circumscribed boxes. In the penultimate layer, we combine the propagated box constraint bound $B_\infty^{(K-1)}$ and the propagated global Lipschitz bound $B_2^{(K-1)}$ to tighten the final outer bound $\hat z(B(x))$. For ℓ2-certifiable training, we first consider the pair $(B_2^{(0)}, B_\infty^{(0)})$, where $B_2^{(0)} = B_2(x, \epsilon)$ and $B_\infty^{(0)} = B_\infty(x, \epsilon)$ circumscribes $B_2^{(0)}$ in the input space. Next, we propagate them through the layers to compute the layer-wise outer bound pair $(B_2^{(k)}, B_\infty^{(k)})$. Here, we assume a feedforward network, but the method can be extended to residual networks (see the supplementary material).
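The propagation below relies on the layer-wise Lipschitz constants $L^{(i)}$. For linear and convolutional layers, these are the spectral norms of the weight operators, which the paper estimates with power iteration. The following is a minimal NumPy sketch for a dense weight matrix; the iteration count is an illustrative choice rather than the paper's setting, and convolutions would apply the same idea to the convolution operator:

```python
import numpy as np

def lipschitz_power_iteration(W, n_iter=50, seed=0):
    """Estimate the largest singular value of W, i.e. the Lipschitz constant
    of the linear map x -> W x with respect to the l2-norm."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = W @ v                 # forward pass
        u /= np.linalg.norm(u)
        v = W.T @ u               # backward pass
        v /= np.linalg.norm(v)
    return float(np.linalg.norm(W @ v))

# The global Lipschitz constant of a feedforward stack is bounded by the product of
# the layer-wise constants (ReLU-type activations are 1-Lipschitz): L <= prod_k L^(k).
rng = np.random.default_rng(1)
weights = [rng.standard_normal((64, 32)), rng.standard_normal((10, 64))]
layer_constants = [lipschitz_power_iteration(W) for W in weights]
print(layer_constants, float(np.prod(layer_constants)))
```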
The ball outer bound in the $k$-th layer, $B_2^{(k)}$, is the Lipschitz outer bound $B_2(z^{(k)}, \rho^{(k)})$ with radius $\rho^{(k)} = \epsilon \prod_{i=1}^{k} L^{(i)}$, where we use power iteration to estimate the layer-wise Lipschitz constants $L^{(i)}$. The box outer bound $B_\infty^{(k+1)}$ in the $(k+1)$-th layer ($k = 0, 1, \dots, K-2$) is obtained from two box constraints: one circumscribes the propagated ellipsoid image $h^{(k+1)}(B_2^{(k)})$ of the ball $B_2^{(k)}$, and the other circumscribes the propagated parallelepiped image $h^{(k+1)}(B_\infty^{(k)})$ of the box $B_\infty^{(k)}$, as described in Figure 1. For linear layers, the circumscribing box of the propagated ellipsoid image, $\text{out}_\square(h^{(k+1)}(B_2^{(k)}))$, is calculated as follows:

$$\text{out}_\square\big(h^{(k+1)}(B_2^{(k)})\big) = \text{midrad}(\hat m^{(k)}, \hat r^{(k)}) \quad \text{s.t.} \quad \hat m^{(k)} = h^{(k+1)}(z^{(k)}), \quad \hat r^{(k)}_i = \big\|W^{(k+1)}_{i,:}\big\|_2\, \rho^{(k)}, \tag{6}$$

where $W^{(k+1)}_{i,:}$ is the $i$-th row of the weight matrix $W^{(k+1)}$ of the linear function $h^{(k+1)}$. Simultaneously, we use interval arithmetic to obtain the other box, circumscribing the propagated parallelepiped image, $\text{IA}(B_\infty^{(k)})$, as in [13]:

$$\text{IA}\big(B_\infty^{(k)}\big) = \text{midrad}(\tilde m^{(k)}, \tilde r^{(k)}) \quad \text{s.t.} \quad \tilde m^{(k)} = h^{(k+1)}(m^{(k)}), \quad \tilde r^{(k)} = \big|W^{(k+1)}\big|\, r^{(k)}, \tag{7}$$

where $|W|$ takes the element-wise absolute value of $W$. Both propagations can be easily extended to nonlinear layers; the details are described in the supplementary material. Finally, we obtain the box outer bound $B_\infty^{(k+1)} = \text{out}_\square(h^{(k+1)}(B_2^{(k)})) \cap \text{IA}(B_\infty^{(k)})$ for the next $(k+1)$-th layer, as illustrated in Figure 1, with the following equations:

$$m^{(k+1)} = \big(ub^{(k+1)} + lb^{(k+1)}\big)/2, \qquad r^{(k+1)} = \big(ub^{(k+1)} - lb^{(k+1)}\big)/2$$
$$\text{s.t.} \quad ub^{(k+1)} = \min\big(\hat m^{(k)} + \hat r^{(k)},\ \tilde m^{(k)} + \tilde r^{(k)}\big), \qquad lb^{(k+1)} = \max\big(\hat m^{(k)} - \hat r^{(k)},\ \tilde m^{(k)} - \tilde r^{(k)}\big), \tag{8}$$

where $\min$ and $\max$ are taken element-wise. In the penultimate layer, we obtain the intersection $B_\infty^{(K-1)} \cap B_2^{(K-1)}$ of the box outer bound $B_\infty^{(K-1)}$ and the ball outer bound $B_2^{(K-1)}$. The intersection is propagated to the logit space through the last linear layer to obtain the tight outer bound $\hat z(B(x)) = h^{(K)}\big(B_\infty^{(K-1)} \cap B_2^{(K-1)}\big) \subseteq \mathcal{Z}$.

**Extension to ℓp-norm** We note that BCP can easily be extended to ℓp-certifiable training for any $p > 0$ by replacing $B_2^{(0)} = B_2(x, \epsilon)$ with $B_2(x, \epsilon')$ circumscribing $B_p(x, \epsilon)$ in the input space $\mathbb{R}^N$, where $\epsilon' = N^{1/2 - 1/\max(p,2)}\,\epsilon$. For the ℓ∞-case, we can use $B_2^{(0)} = B_2(x, \sqrt{N}\epsilon)$ circumscribing $B_\infty^{(0)}$. Thus, for the ℓ∞-bound, BCP can be considered a generalized version of interval bound propagation (IBP) [13]. We found that BCP shows performance similar to IBP for ℓ∞-certifiable training (see the supplementary material for details). Here we focus on the performance of BCP under ℓ2-perturbations.
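As a concrete illustration of one BCP step through a dense linear layer, the sketch below propagates the ball (Eq. (6)), propagates the box by the interval arithmetic of Eq. (7), and intersects the two circumscribed boxes as in Eq. (8). It is a minimal NumPy sketch under simplifying assumptions (a single linear layer, no nonlinearity); the paper's implementation also covers convolutions, nonlinear layers, and residual blocks.

```python
import numpy as np

def bcp_linear_step(W, b, z_mid, rho, m, r):
    """One BCP propagation step through h(x) = Wx + b.

    z_mid, rho : center and radius of the incoming l2 ball  B2  = {x : ||x - z_mid||_2 <= rho}
    m, r       : midpoint and radius of the incoming box     Binf = {x : |x - m| <= r}
    Returns the propagated ball (center, radius) and the tightened box (midpoint, radius).
    """
    # Ball propagation: the radius grows by the layer's spectral norm (its Lipschitz constant).
    lipschitz = np.linalg.norm(W, 2)
    z_next, rho_next = W @ z_mid + b, lipschitz * rho

    # Eq. (6): box circumscribing the propagated ellipsoid W(B2) + b.
    m_hat = W @ z_mid + b
    r_hat = np.linalg.norm(W, axis=1) * rho          # ||W_{i,:}||_2 * rho per output coordinate

    # Eq. (7): interval arithmetic on the propagated box W(Binf) + b.
    m_tld = W @ m + b
    r_tld = np.abs(W) @ r

    # Eq. (8): element-wise intersection of the two circumscribed boxes.
    ub = np.minimum(m_hat + r_hat, m_tld + r_tld)
    lb = np.maximum(m_hat - r_hat, m_tld - r_tld)
    return z_next, rho_next, (ub + lb) / 2.0, (ub - lb) / 2.0

# Toy usage: a 4-dimensional input perturbed in an l2 ball of radius eps,
# together with the circumscribing box of the same per-coordinate radius.
rng = np.random.default_rng(0)
W, b = rng.standard_normal((3, 4)), rng.standard_normal(3)
x, eps = rng.standard_normal(4), 0.1
z1, rho1, m1, r1 = bcp_linear_step(W, b, x, eps, x, np.full(4, eps))
print(rho1, r1)   # the intersected box is never looser than either circumscribed box alone
```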
### 3.2 Certifiable Training Algorithm

**Formulation** Our certifiable training algorithm aims to minimize the objective $\mathcal{J}(f, \mathcal{D})$ in (4) to obtain a robust classifier. The objective contains the worst-translated logit $\bar z(x)$, which requires computation of the translation vector $t(x)$ in (2). In this section, we propose an efficient algorithm to calculate $t(x)$ for the tight propagated outer bound $\hat z(B(x)) = h^{(K)}(B_\infty^{(K-1)} \cap B_2^{(K-1)})$ proposed in Section 3.1. In Equation (2), $z_y(x) - z_m(x)$ is easily obtained by a forward pass through the network. However, the optimization $\min_{\zeta \in \hat z(B(x))} (\zeta_y - \zeta_m)$ is nontrivial and depends on the outer bound $\hat z(B(x))$. Without loss of generality, we can assume that $y = 1$ and $m = 0$. Then, the optimal values $\zeta_0^*, \zeta_1^*$ are given by

$$\zeta_0^*, \zeta_1^* = \operatorname*{argmin}_{(\zeta_0, \zeta_1) \in \Pi_{0,1}\hat z(B(x))} \big(\zeta_1 - \zeta_0\big), \tag{9}$$

where $\Pi_{0,1}$ is the projection onto the $\zeta_0\zeta_1$-plane. Then, we can formulate the following optimization:

$$\min_{\zeta \in \hat z(B(x))} (e_1 - e_0)^T \zeta = \min_{\zeta' \in B_2^{(K-1)} \cap B_\infty^{(K-1)}} (e_1 - e_0)^T h^{(K)}(\zeta') = \min_{\zeta' \in B_2^{(K-1)} \cap B_\infty^{(K-1)}} \big(W^{(K)}_{1,:} - W^{(K)}_{0,:}\big)\zeta' + b^{(K)}_1 - b^{(K)}_0, \tag{10}$$

where $e_i$ is the $i$-th standard basis vector, and $W^{(K)}$ and $b^{(K)}$ are the weight matrix and the bias vector of the last linear layer $h^{(K)}$. Note that $\zeta'$ denotes a vector in the penultimate layer. Therefore, we can construct the following optimization problem, which finds the largest violation of the specification to verify the network:

$$\min_{\zeta'} \; c^T \zeta' \quad \text{s.t.} \quad \big\|\zeta' - z^{(K-1)}\big\|_2 \leq \rho^{(K-1)}, \quad \big|\zeta' - m^{(K-1)}\big| \leq r^{(K-1)}, \tag{11}$$

where $c$ is a specification vector with $c^T = W^{(K)}_{1,:} - W^{(K)}_{0,:}$, and the second constraint takes the element-wise absolute value and the element-wise inequality. Since it is computationally expensive to integrate a typical optimization tool within the training loop, we propose a simple iterative algorithm that approaches the optimal solution of (11) in a finite number of steps. We emphasize that by solving (11) we can obtain a better certificate than the global Lipschitz-based certificate because it uses the additional box constraint $|\zeta' - m^{(K-1)}| \leq r^{(K-1)}$. We will see how this additional constraint affects the outer bound and the verification performance in Section 4.

**Algorithm 1** Box Constraint Propagation (BCP) certifiable training

- **Input:** training data $(x, y) \sim \mathcal{D}$, target perturbation size $\epsilon_{\text{target}}$, network parameterized by $\theta$
- **Output:** robust network $f_\theta$
- **repeat**
  - Read a mini-batch $\mathcal{B}$ from $\mathcal{D}$ and adjust $\epsilon$ and $\lambda$ according to the schedule.
  - Compute the box outer bound and the ball outer bound:
    - $B_\infty^{(K-1)} = \text{midrad}(m^{(K-1)}, r^{(K-1)})$, where $m^{(K-1)}, r^{(K-1)} = \text{BCP}(x, \epsilon; \theta)$ ((6)–(8)).
    - $B_2^{(K-1)} = B_2(z^{(K-1)}, \rho^{(K-1)})$, where $z^{(K-1)} = h^{(K-1)} \circ \cdots \circ h^{(1)}(x)$ and $\rho^{(K-1)} = \epsilon \prod_{i=1}^{K-1} L^{(i)}$.
  - Solve the optimization in (11) for each $m \neq y$ in parallel:
    - Initialize $p = z^{(K-1)} - \rho^{(K-1)}\, c / \|c\|$.
    - **while** not $|p - m^{(K-1)}| \leq r^{(K-1)}$ **do**
      - Decompose $p$ into two parts: $p = p[I] + p[I^c]$, where $I \leftarrow \{l : |p_l - m^{(K-1)}_l| > r^{(K-1)}_l\}$.
      - (First phase) Project $p[I]$ onto $B_\infty^{(K-1)}$.
      - (Second phase) With the scaling parameter $\eta$ in (12), update $p \leftarrow \Pi_{B_\infty^{(K-1)}} p[I] + \eta\, p[I^c]$.
    - **end while**
  - Calculate the worst-translated logit $\bar z(x) = z(x) + t(x)$ with (2) and (10): $t_m(x) = c^T\big(z^{(K-1)} - p\big)$.
  - Update the parameters $\theta$ with the objective (13): $\theta \leftarrow \theta - \alpha \nabla_\theta \mathcal{J}(f_\theta, \mathcal{B}; \lambda)$.
- **until** the training phase ends

**Solving the optimization** We solve the optimization (11) using an efficient iterative algorithm that terminates when no element violates the box constraint. Our algorithm starts from the initial point $p = z^{(K-1)} - \rho^{(K-1)}\, c / \|c\|$, which is the optimal solution of (11) when the box constraint is ignored. Then $p$ satisfies the ball constraint but is not guaranteed to satisfy the box constraint. We decompose the indices of $p$ into two parts, $I$ and $I^c$, where $I \leftarrow \{l : |p_l - m^{(K-1)}_l| > r^{(K-1)}_l\}$. Then, we can represent $p = p[I] + p[I^c]$, where $p[J] = \sum_{l \in J} p_l e_l$. Note that $I$ or $I^c$ can be empty, and we define $p[\varnothing] = 0$. We then iterate the following two phases to find the optimal solution efficiently. In the first phase, $p[I]$ is projected onto the box, denoted by $\Pi_{B_\infty^{(K-1)}} p[I]$. In the second phase, $p[I^c]$ is scaled with an adaptive parameter $\eta$, computed as

$$\eta = \frac{\sqrt{\rho^{(K-1)2} - \big\|\Pi_{B_\infty^{(K-1)}} p[I] - z^{(K-1)}[I]\big\|^2}}{\big\|p[I^c] - z^{(K-1)}[I^c]\big\|}. \tag{12}$$

Based on (12), the next point $p \leftarrow \Pi_{B_\infty^{(K-1)}} p[I] + \eta\, p[I^c]$ lies on the boundary $\partial B_2^{(K-1)}$ of the ball $B_2^{(K-1)}$ when $I^c \neq \varnothing$; we skip the scaling when $I^c = \varnothing$. This iterative algorithm terminates when $p$ satisfies the box constraint.
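The sketch below implements the two-phase loop for a single specification vector c: coordinates that violate the box constraint are clipped onto the box and kept fixed, and the remaining coordinates are rescaled about the ball center so that the iterate returns to the sphere, which is the geometric role of the scaling parameter η in Eq. (12). This is our own NumPy reading of the procedure, not the authors' code; the exact scaling convention should be checked against the paper's supplementary material.

```python
import numpy as np

def solve_worst_margin(c, z, rho, m, r):
    """Approximately solve Eq. (11):  min_p c^T p  s.t.  ||p - z||_2 <= rho,  |p - m| <= r.

    c      : specification vector, e.g. W^(K)_{y,:} - W^(K)_{m,:} for the margin zeta_y - zeta_m
    z, rho : center and radius of the penultimate-layer ball B2^(K-1)
    m, r   : midpoint and radius of the penultimate-layer box Binf^(K-1)
    """
    n = len(c)
    p = z - rho * c / np.linalg.norm(c)       # optimum when the box constraint is ignored
    fixed = np.zeros(n, dtype=bool)           # indices already clipped onto the box
    for _ in range(n):                        # Proposition 2: finitely many iterations suffice
        violated = np.abs(p - m) > r
        if not violated.any():
            break
        fixed |= violated
        # First phase: project the violating coordinates onto the box.
        p[fixed] = np.clip(p[fixed], (m - r)[fixed], (m + r)[fixed])
        free = ~fixed
        if not free.any():
            break
        # Second phase: rescale the free coordinates about the center z so that
        # ||p - z||_2 = rho again (the role of eta in Eq. (12)).
        budget = rho**2 - np.sum((p[fixed] - z[fixed])**2)
        dev = p[free] - z[free]
        dev_norm = np.linalg.norm(dev)
        if budget <= 0 or dev_norm == 0:
            break
        p[free] = z[free] + np.sqrt(budget) / dev_norm * dev
    return p, float(c @ p)

# Toy usage with hypothetical penultimate-layer bounds.
rng = np.random.default_rng(0)
n = 8
c, z = rng.standard_normal(n), rng.standard_normal(n)
m, r = z + 0.05 * rng.standard_normal(n), np.full(n, 0.2)
p_star, worst = solve_worst_margin(c, z, rho=1.0, m=m, r=r)
print(worst, float(c @ z) - np.linalg.norm(c))  # the box constraint can only raise the ball-only optimum
```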
Proposition 2 below shows that our algorithm terminates within a finite number of steps, determined by the number of elements of $c$.

**Proposition 2.** The while loop in Algorithm 1 finds the optimal solution $p^* = \zeta'^*$ of the optimization problem (11) in a finite number of iterative steps less than the number of elements of $c$.

*Proof.* The proof is deferred to the supplementary material.

Algorithm 1 summarizes the BCP training algorithm. Similar to Kurakin et al. [19], we train on a mixture of the normal logit $z(x)$ and the worst logit $\bar z(x)$ as follows:

$$\mathcal{J}(f, \mathcal{D}; \lambda) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[(1 - \lambda)\,\mathcal{L}(z(x), y) + \lambda\,\mathcal{L}(\bar z(x), y)\Big]. \tag{13}$$

We gradually increase the perturbation $\epsilon$ from 0 to the target bound $\epsilon_{\text{target}}$ and increase $\lambda$ in (13) from 0 to 1, which stabilizes the initial phase of training and improves natural accuracy [13, 44]. Therefore, our algorithm enables fast certifiable training of a robust model with a tight outer bound and is thus scalable to large networks.

## 4 Experiments

We demonstrate that the proposed method provides a tight outer bound for the ℓ2-perturbation set and trains certifiably robust networks, comparing its performance against state-of-the-art certifiable training methods (LMT [36], CAP [40], and IBP [13]) on MNIST and CIFAR-10. Moreover, we show that the BCP scheme can scale to Tiny ImageNet (https://tiny-imagenet.herokuapp.com/) and obtain a meaningful verification accuracy. We further investigate robustness under a wide range of perturbation levels. The details of the hyper-parameters and architectures used in the experiments can be found in the supplementary material.

Figure 2: Illustration of the outer bounds for the BCP-trained models on (a) MNIST, (b) CIFAR-10, and (c) Acorn-or-seashore classification tasks. BCP cuts off the area below the red line from the elliptic area and tightens the outer bound. The shaded parallelogram area in (c) indicates the image of the feasible region of the box constraint after the last linear layer.

**Visualization of Tightening Effects** Figure 2 illustrates how BCP tightens the outer region by introducing the box constraint $B_\infty^{(K-1)}$ in (11). We can easily visualize the high-dimensional ellipsoid $h^{(K)}(B_2^{(K-1)}) \subseteq \mathbb{R}^c$ in the 2D plane with $\zeta_y$- and $\zeta_{m^*}$-axes by projection, where $m^*$ is the most probable class other than the true class $y$. However, the high-dimensional parallelepiped $h^{(K)}(B_\infty^{(K-1)})$ is hard to visualize in the 2D plane. Thus, we use the red lines in Figure 2 (a)-(b) to indicate that the projection of the outer region $h^{(K)}(B_2^{(K-1)} \cap B_\infty^{(K-1)})$ must lie above the red line and inside the ellipsoid, showing how much area is cut off by the box constraint $B_\infty^{(K-1)}$. We compute the worst-case margin $\zeta_y^* - \zeta_{m^*}^*$ based on (9) and build the verification boundary $\zeta_y - \zeta_{m^*} = \zeta_y^* - \zeta_{m^*}^*$ with it, where the red line is obtained from the solution $\zeta_y^*, \zeta_{m^*}^*$ of (9) for $\hat z(B(x)) = h^{(K)}(B_2^{(K-1)} \cap B_\infty^{(K-1)})$, and the blue line is for $\hat z(B(x)) = h^{(K)}(B_2^{(K-1)})$. To verify the network, we utilize the verification boundary: the verification for $B_2(x, \epsilon_{\text{target}})$ succeeds if the verification boundary lies above the decision boundary ($\zeta_y = \zeta_{m^*}$).
Figure 2c explicitly illustrates the ellipsoid $h^{(K)}(B_2^{(K-1)})$ and the parallelogram $h^{(K)}(B_\infty^{(K-1)})$ together with the verification boundary for a toy binary classification problem between acorn and seashore derived from the Tiny ImageNet dataset. We also indicate the logits of adversarial examples found by PGD attacks based on the cross-entropy loss (PGD) and a margin-based loss (PGD-Margin); these cannot cross the verification boundaries.

**Quantitative Analysis of Tightness of Outer Bounds** To quantitatively analyze how much BCP tightens the outer bound, we use the "normalized ℓ1-norm" of the translation vector as a measure of tightness of the outer bound $\hat z(B(x))$ for a given input $x$, defined as $\tau(x) = \sum_{i \neq j} t_i(x; j) / (c(c-1))$. Without BCP, this measure is a constant over $\mathcal{X}$, $\tau = \sum_{i \neq j} \rho^{(K-1)} \|W^{(K)}_{i,:} - W^{(K)}_{j,:}\|_2 / (c(c-1))$, since it only considers the global Lipschitz constant and does not depend on the input. We indicate this constant tightness measure $\tau$ for each dataset as the dotted lines in Figure 3. In contrast, BCP accounts for the local properties of each input and thus yields a different tightness for each input. As shown in the violin plots in Figure 3, BCP tightens the outer bounds by 55.4% (MNIST), 45.8% (CIFAR-10), and 25.3% (Tiny ImageNet) on average.

Figure 3: Violin plots of the tightness (translation) of the outer bounds on MNIST, CIFAR-10, and Tiny ImageNet. The dotted lines indicate the tightness without BCP. A smaller value indicates a better tightness.

Table 1: Computation time compared to CAP [40]. BCP is over 12 times faster than CAP (*For WRN, CAP uses two GPUs because of the memory limit).

| Data | Structure | CAP (s/epoch) | BCP (s/epoch) | Speed up |
|---|---|---|---|---|
| MNIST | 4C3F | 689 | 57.5 | 12.0 |
| CIFAR-10 | 4C3F | 645 | 53.0 | 12.2 |
| CIFAR-10 | 6C2F | 1,369 | 56.5 | 24.2 |
| CIFAR-10 | WRN | 1,121* | 89.5 | 12.5 |
| Tiny ImageNet | 8C2F | - | 3,268 | - |

**Verification performance** We evaluate our certifiable training algorithm and other state-of-the-art methods (LMT [36], CAP [40], and IBP [13]) with $\epsilon_{\text{target}} = 1.58$, $36/255$, and $36/255$ on MNIST, CIFAR-10, and Tiny ImageNet, respectively. We use the same bound for evaluation, $\epsilon_{\text{eval}} = \epsilon_{\text{target}}$. For MNIST, BCP outperforms the other methods not only in verification accuracy but also in standard accuracy. For CIFAR-10, BCP outperforms LMT and IBP, and produces verification accuracy comparable to CAP while outperforming it in both standard accuracy and robust accuracy against PGD. For Tiny ImageNet, BCP achieves a verification accuracy of 20.08%, while LMT and IBP learn constant models and CAP does not scale to Tiny ImageNet.

To further investigate the robustness of the models, Figure 4 shows how the verification accuracy changes over different ℓ2-perturbation levels $\epsilon_{\text{eval}}$. We train the robust models with $\epsilon_{\text{target}} = 1.58$ on MNIST (Figure 4a) and $\epsilon_{\text{target}} = 36/255$ and $2\epsilon_{\text{target}} = 72/255$ on CIFAR-10 (Figures 4b, 4c). Compared to the state-of-the-art methods, BCP achieves the highest verification accuracy over a wide range of $\epsilon_{\text{eval}}$. The verification accuracy of BCP decreases slowly and almost linearly as $\epsilon$ increases, whereas CAP shows a significant drop in verification accuracy when $\epsilon_{\text{eval}} \geq \epsilon_{\text{target}}$. We emphasize that verification accuracy over a range of perturbation levels provides a more meaningful picture of robustness than the verification performance at the single perturbation bound $\epsilon_{\text{eval}}$ reported in Table 2 (see the supplementary material for more detailed results).

Table 2: Comparison to other verifiable training methods. Best performances are highlighted in bold.
| Data | Structure | # parameters | Method | Standard (%) | PGD (%) | Verification (%) |
|---|---|---|---|---|---|---|
| MNIST | 4C3F | 1,974,762 | CAP | 88.39 | 62.25 | 43.95 |
| | | | LMT | 86.48 | 53.56 | 40.55 |
| | | | BCP | **92.41** | **64.70** | **47.95** |
| CIFAR-10 | 4C3F | 2,466,858 | CAP | 60.14 | 55.67 | **50.29** |
| | | | LMT | 56.49 | 49.83 | 37.20 |
| | | | IBP | 34.50 | 31.79 | 24.39 |
| | | | BCP | **63.88** | **58.75** | 49.58 |
| CIFAR-10 | 6C2F | 2,250,378 | CAP | 60.10 | 56.20 | 50.87 |
| | | | LMT | 63.05 | 58.32 | 38.11 |
| | | | IBP | 32.96 | 31.08 | 23.42 |
| | | | BCP | **65.72** | **60.78** | **51.30** |
| CIFAR-10 | WRN | 4,214,850 | CAP | 60.70 | 56.77 | **51.63** |
| | | | LMT | 61.33 | 56.39 | 33.35 |
| | | | BCP | **64.79** | **59.16** | 50.33 |
| Tiny ImageNet | 8C2F | 4,342,984 | BCP | 28.76 | 26.64 | 20.08 |

Figure 4: Verification performance of the verifiable training methods as a function of $\epsilon_{\text{eval}}$ on (a) MNIST ($\epsilon_{\text{target}} = 1.58$), (b) CIFAR-10 (36/255), and (c) CIFAR-10 (72/255). The vertical lines indicate $\epsilon_{\text{target}}$.

**Computational Cost** Table 1 shows that BCP is over 12 times faster than CAP. We evaluate the computation times on a single Titan X GPU. For a fair comparison, we use the same batch size for both methods: 50 on MNIST and CIFAR-10 and 5 on Tiny ImageNet. Because CAP is memory-inefficient, it cannot use a larger batch size, whereas BCP can be further sped up with one. In the case of WRN [42] on CIFAR-10, we can speed up to 61.1 s/epoch using a batch size of 128, while CAP needs two GPUs to run with a batch size of 50. This implies that our certifiable training is efficiently applicable to large-scale datasets.

## 5 Conclusion

In this study, we propose a fast certifiable training method with a tight outer bound. To obtain the tight outer bound, we propose BCP, which efficiently computes box constraints that tighten the outer bound. We then train a certifiably robust model by minimizing the certificate loss based on the worst-translated logit over the tight outer bound. By doing so, we build the first certifiably robust model on Tiny ImageNet under ℓ2-perturbations. We hope that our method can serve as a strong benchmark for certifiable training on large-scale datasets.

## Broader Impact

Verifiable training can be used as a general learning scheme for applications in security-sensitive domains such as self-driving cars, face recognition, and medical diagnostics. In these applications, an adversarial example is a potential safety hazard that we want to avoid. By training a model with BCP, we can guarantee that no adversarial attack within a given norm-based perturbation can break the model. However, we should note that there is a trade-off between security and performance. Our work leans toward the security aspect and has relatively low accuracy on natural data. This sacrifice of performance can diminish the benefits of applying deep learning models, and security concerns can restrain deployment in real systems. We are already familiar with deep learning models embedded in everyday products and services, such as smart speakers, ride-sharing apps, and social media services. Therefore, a balance of performance and security is required, depending on the characteristics of the application.
The development of verifiable training algorithms makes it possible to improve standard accuracy while exactly quantifying security. In addition, quantifying both performance and security can help adjust the balance between them.

## Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2018R1D1A1A02085851), and in part by the NRF Grant funded by the Korean Government (MSIT) (NRF-2019R1A2C2002358). The corresponding author is Saerom Park.

## References

[1] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.
[2] A. Boopathy, T.-W. Weng, P.-Y. Chen, S. Liu, and L. Daniel. CNN-Cert: An efficient framework for certifying robustness of convolutional neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3240–3247, 2019.
[3] R. Bunel, I. Turkaslan, P. H. Torr, P. Kohli, and M. P. Kumar. Piecewise linear neural networks verification: A comparative study. 2018.
[4] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
[5] N. Carlini, G. Katz, C. Barrett, and D. L. Dill. Ground-truth adversarial examples. arXiv preprint arXiv:1709.10207, 2017.
[6] J. M. Cohen, E. Rosenfeld, and J. Z. Kolter. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918, 2019.
[7] K. Dvijotham, S. Gowal, R. Stanforth, R. Arandjelovic, B. O'Donoghue, J. Uesato, and P. Kohli. Training verified learners with learned verifiers. arXiv preprint arXiv:1805.10265, 2018.
[8] K. Dvijotham, R. Stanforth, S. Gowal, T. A. Mann, and P. Kohli. A dual approach to scalable verification of deep networks. In UAI, pages 550–559, 2018.
[9] R. Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis, pages 269–286. Springer, 2017.
[10] M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. Pappas. Efficient and accurate estimation of Lipschitz constants for deep neural networks. In Advances in Neural Information Processing Systems, pages 11423–11434, 2019.
[11] M. Fischetti and J. Jo. Deep neural networks and mixed integer linear optimization. Constraints, 23(3):296–309, 2018.
[12] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
[13] S. Gowal, K. Dvijotham, R. Stanforth, R. Bunel, C. Qin, J. Uesato, T. Mann, and P. Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715, 2018.
[14] C. Guo, M. Rana, M. Cisse, and L. van der Maaten. Countering adversarial images using input transformations. arXiv preprint arXiv:1711.00117, 2017.
[15] M. Hein and M. Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems, pages 2266–2276, 2017.
[16] X. Huang, M. Kwiatkowska, S. Wang, and M. Wu. Safety verification of deep neural networks. In International Conference on Computer Aided Verification, pages 3–29. Springer, 2017.
[17] T. Huster, C.-Y. J. Chiang, and R. Chadha. Limitations of the Lipschitz constant as a defense against adversarial examples. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 16–29. Springer, 2018.
[18] G. Katz, C. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pages 97–117. Springer, 2017.
[19] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
[20] F. Latorre, P. Rolland, and V. Cevher. Lipschitz constant estimation of neural networks via sparse polynomial optimization. arXiv preprint arXiv:2004.08688, 2020.
[21] M. Lecuyer, V. Atlidakis, R. Geambasu, D. Hsu, and S. Jana. Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy (SP), pages 656–672. IEEE, 2019.
[22] B. Li, C. Chen, W. Wang, and L. Carin. Second-order adversarial attack and certifiable robustness. arXiv preprint arXiv:1809.03113, 2018.
[23] A. Lomuscio and L. Maganti. An approach to reachability analysis for feed-forward ReLU neural networks. arXiv preprint arXiv:1706.07351, 2017.
[24] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
[25] M. Mirman, T. Gehr, and M. Vechev. Differentiable abstract interpretation for provably robust neural networks. In International Conference on Machine Learning, pages 3575–3583, 2018.
[26] R. E. Moore, R. B. Kearfott, and M. J. Cloud. Introduction to Interval Analysis, volume 110. SIAM, 2009.
[27] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1765–1773, 2017.
[28] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pages 582–597. IEEE, 2016.
[29] A. Raghunathan, J. Steinhardt, and P. Liang. Certified defenses against adversarial examples. arXiv preprint arXiv:1801.09344, 2018.
[30] A. Raghunathan, J. Steinhardt, and P. S. Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, pages 10877–10887, 2018.
[31] H. Salman, J. Li, I. Razenshteyn, P. Zhang, H. Zhang, S. Bubeck, and G. Yang. Provably robust deep learning via adversarially trained smoothed classifiers. In Advances in Neural Information Processing Systems, pages 11289–11300, 2019.
[32] G. Singh, T. Gehr, M. Mirman, M. Püschel, and M. Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, pages 10802–10813, 2018.
[33] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[34] V. Tjeng, K. Xiao, and R. Tedrake. Evaluating robustness of neural networks with mixed integer programming. arXiv preprint arXiv:1711.07356, 2017.
[35] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.
[36] Y. Tsuzuku, I. Sato, and M. Sugiyama. Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks. In Advances in Neural Information Processing Systems, pages 6541–6550, 2018.
[37] T.-W. Weng, H. Zhang, H. Chen, Z. Song, C.-J. Hsieh, D. Boning, I. S. Dhillon, and L. Daniel. Towards fast computation of certified robustness for ReLU networks. arXiv preprint arXiv:1804.09699, 2018.
[38] T.-W. Weng, H. Zhang, P.-Y. Chen, J. Yi, D. Su, Y. Gao, C.-J. Hsieh, and L. Daniel. Evaluating the robustness of neural networks: An extreme value theory approach. arXiv preprint arXiv:1801.10578, 2018.
[39] E. Wong and J. Z. Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. arXiv preprint arXiv:1711.00851, 2017.
[40] E. Wong, F. Schmidt, J. H. Metzen, and J. Z. Kolter. Scaling provable adversarial defenses. In Advances in Neural Information Processing Systems, pages 8400–8409, 2018.
[41] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille. Mitigating adversarial effects through randomization. arXiv preprint arXiv:1711.01991, 2017.
[42] S. Zagoruyko and N. Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.
[43] H. Zhang, T.-W. Weng, P.-Y. Chen, C.-J. Hsieh, and L. Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, pages 4939–4948, 2018.
[44] H. Zhang, H. Chen, C. Xiao, B. Li, D. Boning, and C.-J. Hsieh. Towards stable and efficient training of verifiably robust neural networks. arXiv preprint arXiv:1906.06316, 2019.