# Towards Certificated Model Robustness Against Weight Perturbations

The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)

Tsui-Wei Weng,1 Pu Zhao,2 Sijia Liu,3 Pin-Yu Chen,3 Xue Lin,2 Luca Daniel1
1Massachusetts Institute of Technology, Cambridge, MA 02139
2Northeastern University, Boston, MA 02115
3MIT-IBM Watson AI Lab, IBM Research, Yorktown Heights, NY 10598
{twweng, dluca}@mit.edu, zhao.pu@husky.neu.edu, {sijia.liu, pin-yu.chen}@ibm.com, xue.lin@northeastern.edu
(Equal contribution in alphabetical order.)

Abstract

This work studies the sensitivity of neural networks to weight perturbations, motivated first by a newly developed threat model that perturbs the neural network parameters. We propose an efficient approach to compute a certified robustness bound on weight perturbations, within which neural networks will not make erroneous outputs as desired by the adversary. In addition, we identify a useful connection between our developed certification method and the problem of weight quantization, a popular model compression technique in deep neural networks (DNNs) and a must-try step in the design of DNN inference engines on resource-constrained computing platforms such as mobiles, FPGAs, and ASICs. Specifically, we study the problem of weight quantization (weight perturbations in the non-adversarial setting) through the lens of certificated robustness, and we demonstrate significant improvements in the generalization ability of quantized networks through our robustness-aware quantization scheme.

Introduction

Although deep neural networks (DNNs) have achieved human-level performance in many learning tasks, intricate adversarial examples have been shown to exist in DNNs (Szegedy et al. 2014; Moosavi-Dezfooli, Fawzi, and Frossard 2016; Chen et al. 2018; Zhao et al. 2019a). An ever-increasing amount of research effort has been devoted to implementing adversarial attacks in various applications (Athalye, Carlini, and Wagner 2018; Carlini and Wagner 2017; Papernot et al. 2016a; Song et al. 2018; Carlini and Wagner 2018), developing defense methods ranging from heuristic methods to provable defenses (Papernot et al. 2016b; Liu et al. 2018; Madry et al. 2018; Kolter and Wong 2018; Liu et al. 2019), as well as efficient verification of neural networks against adversarial examples (Hein and Andriushchenko 2017; Weng et al. 2018b; 2018a; Gehr et al. 2018; Boopathy et al. 2019) and random noises (Weng et al. 2019) in image classification, natural language processing (Ko et al. 2019), and reinforcement learning (Wang, Weng, and Daniel 2019). Different from the above work on robustness against input perturbations, this work aims to evaluate the sensitivity of DNNs to weight perturbations.

Weight perturbations of DNNs are of realistic significance. First, a new threat model of weight perturbations was proposed by (Liu et al. 2017; Zhao et al. 2019b), which showed that the so-called fault sneaking/injection attack can force a DNN to misclassify some natural input images into target labels by slightly modifying the weights at a single layer, while keeping the classification of unspecified input images intact. This implies that the outputs of DNNs are also sensitive to weight perturbations. Moreover, weight perturbations also arise in the non-adversarial setting.
For example, weight quantization (Zhou et al. 2017; Leng et al. 2018; Ren et al. 2019), a major DNN model compression technique commonly utilized by industry for DNN acceleration/implementation, induces weight perturbations by replacing full floating-point precision weights with fixed-point lower-precision weights. It is even well supported on GPUs and mobile devices, e.g., PyTorch (Paszke et al. 2017) on NVIDIA GPUs and TensorFlow Lite (Abadi et al. 2016) for mobile devices. However, such a direct mapping from full-precision weights of DNNs to quantized weights can result in significant generalization error (Sheng et al. 2018). Different from the robustness issue caused by input perturbations, weight perturbations concern the behavior of DNN models on natural (unperturbed) examples rather than on adversarial examples. If training and testing samples stem from the same distribution, evaluating the model robustness against weight perturbations on the training dataset (namely, the sensitivity of training accuracy to weight perturbations) can provide informative guidelines on the generalization ability of the weight-perturbed network (e.g., a weight-quantized network). Thus, both the fault injection attack and weight quantization motivate us to study the problem of model robustness against weight perturbations.

Contributions. First, we formulate the certificate problem of model robustness against weight perturbations. The solution to this problem provides the certified weight perturbation region such that DNNs will maintain their accuracy if weight perturbations lie within that region. Second, we find provable non-trivial lower bounds on the exact certified weight perturbation region in two scenarios: a) single-layer perturbation and b) multi-layer perturbation. We empirically show that the certified lower bound provides a reasonable assessment of the practical model robustness against fault injection (Liu et al. 2017; Zhao et al. 2019b). Third, we propose a new design of weight quantization by leveraging the statistics obtained from the certified weight perturbation region. This leads to a robustness-aware quantization scheme, which we show can be efficiently achieved via the alternating direction method of multipliers (ADMM) (Boyd et al. 2011). The resulting quantized network yields a significant improvement in its generalization ability compared to other quantization methods that neglect the effect of weight perturbations on the training procedure.

Related works. A line of work relevant to ours is formal verification of neural networks (Liu et al. 2019), which provides a robustness guarantee that any input perturbation within a neighborhood of the natural example cannot fool the classifier's top-1 prediction. In (Katz et al. 2017; Ehlers 2017; Bunel et al. 2018; Dutta et al. 2017), satisfiability modulo theory (SMT) or mixed-integer programming (MIP) based methods provide an exact robustness certificate with respect to the input perturbation strength. However, these approaches suffer from scalability issues due to their high computational complexity. Moreover, the works (Raghunathan, Steinhardt, and Liang 2018; Dvijotham et al. 2018; Wong and Kolter 2017; Weng et al. 2018a; 2018b; Wong et al. 2018) relax the exact verification problem by over-approximating the output space of a network given a neighborhood of the natural input. Such a relaxation leads to fast computation in the verification process but only proves a lower bound of the exact robustness guarantee.
Our paper extends the second line of work on certification of networks but with the main difference being the threat model: weight perturbations rather than input perturbations. Besides rigorously certifying network robustness, the work (Cheney, Schrimpf, and Kreiman 2017) empirically showed the robustness of convolutional neural networks (CNNs) against weight perturbations. Different from the existing literature, extending certified robustness from input to weight perturbations is non-trivial, since the latter can be coupled across multiple layers. Most importantly, we provide a use case, the design of a weight quantization scheme, showing that robustness against weight perturbations matters even in the non-adversarial setting. It is worth mentioning that, different from existing weight quantization work (Leng et al. 2018; Park, Ahn, and Yoo 2017; Zhou et al. 2017; Lin, Talathi, and Annapureddy 2016; Wu et al. 2016; Rastegari et al. 2016; Hubara et al. 2016), our work provides a solution through the lens of formal verification of model robustness and has the potential to be integrated with well-developed weight quantization frameworks (Paszke et al. 2017; Abadi et al. 2016).

Problem Setup

Notations. Throughout the paper, unless specified otherwise, we use the following notations. Let (x, c) denote a pair of an example x and its class label c. For a K-layer neural network, let n_k, W^(k) ∈ R^{n_k × n_{k-1}}, and b^(k) ∈ R^{n_k} denote the number of neurons, the weight matrix, and the bias vector at layer k, respectively. We use the superscript (k) to indicate a variable associated with layer k. We also define W := {W^(k)}_{k=1}^K and b := {b^(k)}_{k=1}^K to denote the sets containing all variables indexed by k, and we use [K] to denote the integer set {1, 2, ..., K}. Let f(x; W, b) ∈ R^{n_K} be the neural network function with respect to the input x for n_K output classes. Here we refer to f as the logit layer; the softmax layer can be safely discarded in our analysis due to its monotonicity. We use f_j(x; W, b) (or simply f_j) to denote the j-th class output of the neural network. Let Φ̂^(k)(x) ∈ R^{n_k} and Φ^(k)(x) ∈ R^{n_k} denote the pre-activation and post-activation values of the k-th layer, respectively, and let σ(·) denote a non-linear element-wise activation function.

Neural network model. We focus on K-layer fully-connected (FC) feedforward neural networks with ReLU activation functions, but the results can also be generalized to convolutional neural networks and general activations. The input-output relationship of the network is given by

$$
\Phi^{(k)}(x) = \sigma\big(\hat{\Phi}^{(k)}(x)\big), \qquad \hat{\Phi}^{(k)}(x) = W^{(k)} \Phi^{(k-1)}(x) + b^{(k)}, \qquad k \in [K], \tag{1}
$$

where Φ^(0)(x) := x and f(x; W, b) = Φ̂^(K)(x). In the classification setting, the predicted class c is the class with the largest output value: arg max_j f_j.

Weight perturbations. The problem of our interest is to provide a robustness certificate for a neural network when its weight parameters are perturbed. We define the ℓ∞-norm based weight perturbation set as

$$
B(\mathbf{W}, \epsilon) = \Big\{ \hat{\mathbf{W}} \;\Big|\; \hat{\mathbf{W}} = \{\hat{W}^{(k)}\},\; \|\hat{W}^{(k)} - W^{(k)}\|_\infty \le \epsilon,\; \forall k \in [K] \Big\}, \tag{2}
$$

where ϵ is the perturbation radius and Ŵ^(k) denotes the perturbed weights against the original weights W^(k) at each layer. Given ϵ, verifying the neural network robustness (at input x with true label c) against weight perturbation can be cast as the optimization problem

$$
\begin{aligned}
\underset{\hat{\mathbf{W}}}{\text{minimize}} \quad & f_c(\hat{\mathbf{W}}) - \max_{j \neq c} f_j(\hat{\mathbf{W}}) \\
\text{subject to} \quad & \hat{\mathbf{W}} \in B(\mathbf{W}, \epsilon),
\end{aligned} \tag{3}
$$

where we use the simplified notation f_j(Ŵ) to highlight the dependency of the classification on the model weights by omitting x and b.
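To ground the notation, the following minimal sketch (ours, not the authors' code) evaluates the network of eq. (1) and the margin objective of problem (3) for a candidate set of perturbed weights; the helper names `forward`, `margin`, and `in_ball` are illustrative, not from the paper.

```python
import numpy as np

def forward(x, weights, biases):
    """Evaluate the K-layer ReLU network of eq. (1); returns the logits f(x; W, b)."""
    phi = x
    for k, (W, b) in enumerate(zip(weights, biases)):
        pre = W @ phi + b
        # ReLU on hidden layers; the final (logit) layer has no activation
        phi = np.maximum(pre, 0.0) if k < len(weights) - 1 else pre
    return phi

def margin(x, c, weights_hat, biases):
    """Objective of problem (3): f_c - max_{j != c} f_j under perturbed weights W_hat.
    A positive value means the top-1 prediction is still class c."""
    logits = forward(x, weights_hat, biases)
    others = np.delete(logits, c)
    return logits[c] - others.max()

def in_ball(weights_hat, weights, eps):
    """Membership test for the perturbation set B(W, eps) of eq. (2) (entrywise l_inf)."""
    return all(np.max(np.abs(Wh - W)) <= eps for Wh, W in zip(weights_hat, weights))
```

Certifying robustness at radius ϵ amounts to showing that `margin` stays positive for every `weights_hat` accepted by `in_ball`, which is exactly what the bounds derived below make tractable.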
If the optimal value of problem (3) is positive, then the robustness of the neural network is certified under ϵ-tolerant weight perturbation at input x.

Goal of robustness certification under weight perturbation: find the largest ϵ such that problem (3) has a positive optimal value.

We remark that problem (3) mirrors the formulation used for verifying a neural network's robustness against input perturbation, e.g., in recent works (Wong and Kolter 2017; Weng et al. 2018a; 2018b; Wong et al. 2018). However, none of them investigated the direction of certifying robustness against weight perturbation. Different from an input perturbation, which generates an adversarial example, the effect of a weight perturbation on network robustness is measured under the original input. In this sense, the certified perturbation region in terms of ϵ in (3) offers a new perspective on how sensitive the prediction accuracy of a well-trained network can be to weight perturbation. Moreover, since weights can be perturbed at multiple layers, the problem of weight perturbation suffers from a more complicated layer-wise coupling issue than that of certifying input perturbation.

Towards Certified Sensitivity of Weight Perturbations

In this section, we formally describe the idea of the certified lower bound when the weights of neural networks are perturbed. We start from a simple 2-layer MLP example and provide general results in Theorem 0.1. Based on Theorem 0.1, we illustrate how to further generalize our results to the multi-layer perturbation setting.

Certified lower bound: Single-layer weight perturbation

When the weight perturbation occurs only at a single layer (e.g., the N-th layer, N ≤ K), we have the constraint

$$
\|\hat{W}^{(N)} - W^{(N)}\|_\infty \le \epsilon, \qquad \hat{W}^{(k)} = W^{(k)} \;\; \text{if } k \neq N, \; k \in [K]. \tag{4}
$$

Ideally, we would like to solve problem (3) exactly to get the maximum possible (exact) tolerance ϵ on the weight perturbation such that the top-1 prediction of the neural network classifier will not change. However, it has been shown that there does not exist a polynomial-time algorithm to compute the exact robustness of neural networks (Katz et al. 2017). Hence, our goal is to find a non-trivial ϵ efficiently, and this problem can be formulated as follows. Let f^L_j(Ŵ) and f^U_j(Ŵ) be two linear functions of Ŵ such that f^L_j(Ŵ) ≤ f_j(Ŵ) ≤ f^U_j(Ŵ) for all Ŵ and every class j, and let

$$
\gamma^L_c = \min_{\hat{\mathbf{W}} \in B(\mathbf{W},\epsilon)} f^L_c(\hat{\mathbf{W}}), \qquad \gamma^U_j = \max_{\hat{\mathbf{W}} \in B(\mathbf{W},\epsilon)} f^U_j(\hat{\mathbf{W}}). \tag{5}
$$

We can compute ϵ by solving the following problem:

$$
\underset{\epsilon}{\text{maximize}} \;\; \epsilon \qquad \text{subject to} \;\; \gamma^L_c - \gamma^U_j > 0, \;\; \forall j \neq c. \tag{6}
$$

Note that the constraint γ^L_c − max_{j≠c} γ^U_j > 0 (namely, γ^L_c − γ^U_j > 0 for all j ≠ c) is more restrictive than f_c(Ŵ) − max_{j≠c} f_j(Ŵ) > 0. Thus, the solution to problem (6) provides a certified lower bound on the maximum ϵ that ensures a positive objective value of problem (3). In fact, in the following we show that γ^L_c and γ^U_j can be computed analytically, and hence we are able to find the solution of (6) efficiently through bisection on ϵ. Note that in the work (Weng et al. 2018a), the authors proposed an efficient algorithm, Fast-Lin, to compute a certified lower bound for neural networks under input perturbation, whereas in this work we focus on weight perturbations of the neural network and show that it is also possible to derive a certified lower bound in this setting. Furthermore, we show in the following sections that our technique is beneficial to developing a new weight quantization scheme with better generalization.
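To make the bisection step concrete, here is a minimal sketch (ours, under the assumption that a checker for the constraint of problem (6) is available); `is_certified` is a hypothetical callable that returns True when γ^L_c − γ^U_j > 0 for all j ≠ c at the given radius.

```python
def largest_certified_eps(is_certified, eps_hi=1.0, tol=1e-6, max_iter=50):
    """Bisection search for the largest eps such that is_certified(eps) holds.

    Because gamma_U_j increases and gamma_L_c decreases monotonically in eps,
    certification is monotone in eps and bisection applies.
    """
    # Grow the upper end until it is no longer certified (or a cap is hit).
    while is_certified(eps_hi) and eps_hi < 1e6:
        eps_hi *= 2.0
    lo, hi = 0.0, eps_hi
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        mid = 0.5 * (lo + hi)
        if is_certified(mid):
            lo = mid          # mid is certified: search larger radii
        else:
            hi = mid          # mid is not certified: shrink the interval
    return lo                 # certified lower bound on the true robustness radius
```

The returned value is, by construction, a lower bound on the exact tolerance of problem (3), matching the role of the certified bound discussed above.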
Neural network f is bounded by two linear functions f^L_c(Ŵ) and f^U_c(Ŵ). The core idea in deriving the linear bounds f^L_c(Ŵ) and f^U_c(Ŵ) of a K-layer feed-forward neural network f is to apply linear upper and lower bounds to each neuron's activation and to consider the signs of the associated weights. We start with a 2-layer network (K = 2) and then extend the derivation to the general case. Suppose that the first-layer weights are perturbed, i.e., we have (4) with N = 1. The j-th output of the network (with respect to Ŵ^(1)) is then given by

$$
f_j(\hat{W}^{(1)}) = \sum_{r \in [n_1]} W^{(2)}_{j,r} \, \sigma\big(\hat{W}^{(1)}_{r,:}\, x + b^{(1)}_r\big) + b^{(2)}_j, \tag{7}
$$

where W_{j,r} denotes the (j, r)-th entry of W and Ŵ_{r,:} denotes the r-th row of Ŵ. Since W^(1)_{r,:} − ϵ ≤ Ŵ^(1)_{r,:} ≤ W^(1)_{r,:} + ϵ, the pre-activation y^(1)_r := Ŵ^(1)_{r,:} x + b^(1)_r at the first layer is bounded by constants l^(1)_r and u^(1)_r, which are determined by the signs of x and the bounds on Ŵ^(1)_{r,:}. Given y^(1)_r ∈ [l^(1)_r, u^(1)_r], the non-linear activation function σ(y^(1)_r) has explicit linear bounds (Zhang et al. 2018) with slope and bias parameters {α^(1)_{L,r}, α^(1)_{U,r}} and {β^(1)_{L,r}, β^(1)_{U,r}} as follows:

$$
\alpha^{(1)}_{L,r}\big(y^{(1)}_r + \beta^{(1)}_{L,r}\big) \;\le\; \sigma\big(y^{(1)}_r\big) \;\le\; \alpha^{(1)}_{U,r}\big(y^{(1)}_r + \beta^{(1)}_{U,r}\big). \tag{8}
$$

If l^(1)_r < 0 < u^(1)_r, then α^(1)_{L,r} = α^(1)_{U,r} = u^(1)_r / (u^(1)_r − l^(1)_r), β^(1)_{L,r} = 0, and β^(1)_{U,r} = −l^(1)_r; if l^(1)_r ≤ u^(1)_r ≤ 0, then all parameters are zero; if 0 ≤ l^(1)_r ≤ u^(1)_r, then α^(1)_{L,r} = α^(1)_{U,r} = 1 and β^(1)_{L,r} = β^(1)_{U,r} = 0.

Equations (7) and (8) of the 2-layer network example imply two general rules: 1. The pre-activation bounds at the N-th layer are known a priori (since no weight prior to the N-th layer is perturbed), and thus we only need to perform bound propagation for the layers k > N; 2. The final-layer bounds f^L_c(Ŵ) and f^U_c(Ŵ) can be computed via bound propagation. The idea is to compute the pre-activation bounds layer by layer (so-called bound propagation) analytically via Theorem 0.1. In Theorem 0.1, we give the analytic output bounds of a neural network when there exists a single-layer ℓ_p-norm weight perturbation with p ≥ 1.

Theorem 0.1. Suppose that the N-th layer weights are perturbed in a K-layer neural network. Let f : R^{n_N × n_{N-1}} → R^{n_K} denote the mapping from the perturbed weights Ŵ^(N) at the single layer N to the predicted outputs at the final layer K. Then there exist two explicit functions f^L_j : R^{n_N × n_{N-1}} → R and f^U_j : R^{n_N × n_{N-1}} → R for each class j ∈ [n_K] such that the following inequality holds:

$$
f^L_j(\hat{W}^{(N)}) \;\le\; f_j(\hat{W}^{(N)}) \;\le\; f^U_j(\hat{W}^{(N)}), \tag{9}
$$

where ‖Ŵ^(N)_{s,:} − W^(N)_{s,:}‖_p ≤ ϵ for every row s, Ŵ^(k) = W^(k) for k ≠ N, and p ≥ 1. The closed forms of the lower and upper bounds in (9) are given by

$$
f^U_j(\hat{W}^{(N)}) = \Lambda^{(N-1)}_{j,:}\, \hat{W}^{(N)}\, \Phi^{(N-1)}(x) + \sum_{k=N}^{K} \Lambda^{(k)}_{j,:}\big(b^{(k)} + \Delta^{(k)}_{:,j}\big), \tag{10}
$$

$$
f^L_j(\hat{W}^{(N)}) = \Omega^{(N-1)}_{j,:}\, \hat{W}^{(N)}\, \Phi^{(N-1)}(x) + \sum_{k=N}^{K} \Omega^{(k)}_{j,:}\big(b^{(k)} + \Theta^{(k)}_{:,j}\big), \tag{11}
$$

with

$$
\Lambda^{(k-1)}_{j,:} =
\begin{cases}
e_j^\top & \text{if } k = K+1, \\
\Lambda^{(k)}_{j,:} \odot \lambda^{(k-1)}_{j,:} & \text{if } k = N, \\
\big(\Lambda^{(k)}_{j,:} W^{(k)}\big) \odot \lambda^{(k-1)}_{j,:} & \text{otherwise},
\end{cases}
\qquad
\Omega^{(k-1)}_{j,:} =
\begin{cases}
e_j^\top & \text{if } k = K+1, \\
\Omega^{(k)}_{j,:} \odot \omega^{(k-1)}_{j,:} & \text{if } k = N, \\
\big(\Omega^{(k)}_{j,:} W^{(k)}\big) \odot \omega^{(k-1)}_{j,:} & \text{otherwise}.
\end{cases}
$$

Here the matrices λ^(k), ω^(k), Δ^(k), Θ^(k) are functions of the linear bounding parameters {α^(k)_{L,i}, α^(k)_{U,i}} and {β^(k)_{L,i}, β^(k)_{U,i}} of each neuron i at layer k:

$$
\lambda^{(k)}_{j,i} =
\begin{cases}
\alpha^{(k)}_{U,i} & \text{if } k \neq N-1,\;\; \Lambda^{(k+1)}_{j,:} W^{(k+1)}_{:,i} \ge 0, \\
\alpha^{(k)}_{L,i} & \text{if } k \neq N-1,\;\; \Lambda^{(k+1)}_{j,:} W^{(k+1)}_{:,i} < 0, \\
1 & \text{if } k = N-1,
\end{cases}
\qquad
\omega^{(k)}_{j,i} =
\begin{cases}
\alpha^{(k)}_{L,i} & \text{if } k \neq N-1,\;\; \Omega^{(k+1)}_{j,:} W^{(k+1)}_{:,i} \ge 0, \\
\alpha^{(k)}_{U,i} & \text{if } k \neq N-1,\;\; \Omega^{(k+1)}_{j,:} W^{(k+1)}_{:,i} < 0, \\
1 & \text{if } k = N-1,
\end{cases}
$$

$$
\Delta^{(k)}_{i,j} =
\begin{cases}
\beta^{(k)}_{U,i} & \text{if } k \neq N-1,\;\; \Lambda^{(k+1)}_{j,:} W^{(k+1)}_{:,i} \ge 0, \\
\beta^{(k)}_{L,i} & \text{if } k \neq N-1,\;\; \Lambda^{(k+1)}_{j,:} W^{(k+1)}_{:,i} < 0, \\
0 & \text{if } k = N-1,
\end{cases}
$$
$$
\Theta^{(k)}_{i,j} =
\begin{cases}
\beta^{(k)}_{L,i} & \text{if } k \neq N-1,\;\; \Omega^{(k+1)}_{j,:} W^{(k+1)}_{:,i} \ge 0, \\
\beta^{(k)}_{U,i} & \text{if } k \neq N-1,\;\; \Omega^{(k+1)}_{j,:} W^{(k+1)}_{:,i} < 0, \\
0 & \text{if } k = N-1,
\end{cases}
$$

where ⊙ is the Hadamard (element-wise) product and e_j ∈ R^{n_K} is the j-th standard basis vector.

Proof. The proof for the input-perturbation case (Weng et al. 2018a; Zhang et al. 2018; Boopathy et al. 2019) can be adapted to the weight-perturbation case, where we have

$$
f_j(\hat{W}^{(N)}) \le f^{U,K-1}_j(\hat{W}^{(N)}) \le f^{U,K-2}_j(\hat{W}^{(N)}) \le \dots \le f^{U,N+1}_j(\hat{W}^{(N)}) = W^{(N+1)}_{j,:}\,\sigma\big(\hat{W}^{(N)} \Phi^{(N-1)} + b^{(N)}\big).
$$

The derivation of Λ^(k−1)_{j,:} and Ω^(k−1)_{j,:} from layer N+1 to layer K−1 is the same as in the input-perturbation case; only the N-th layer differs. For the N-th layer, since the perturbation is now on Ŵ^(N) rather than on x, we let Λ^(N−1)_{j,:} = Λ^(N)_{j,:} ⊙ λ^(N−1)_{j,:}. By using the same trick to decompose σ(y) via inequality (8) with the associated sign of the equivalent matrix W^(N+1)_{j,:}, we obtain the final upper bound (10). The lower bound f^L_j(Ŵ^(N)) can be derived similarly.

Global output bounds γ^U_j and γ^L_j. Based on (10) and (11), we can further derive the global output bounds γ^U_j and γ^L_j, which are constants determined by (5). Since f^L_j and f^U_j are linear functions and Ŵ ∈ B(W, ϵ) is a convex norm constraint, the optimal values in (5) can be obtained via Hölder's inequality:

$$
\gamma^U_j = \epsilon\,\big\|\Lambda^{(N-1)}_{j,:}\big\|_1\,\big\|\Phi^{(N-1)}(x)\big\|_q + \Lambda^{(N-1)}_{j,:} W^{(N)} \Phi^{(N-1)}(x) + \sum_{k=N}^{K} \Lambda^{(k)}_{j,:}\big(b^{(k)} + \Delta^{(k)}_{:,j}\big), \tag{12}
$$

$$
\gamma^L_j = -\epsilon\,\big\|\Omega^{(N-1)}_{j,:}\big\|_1\,\big\|\Phi^{(N-1)}(x)\big\|_q + \Omega^{(N-1)}_{j,:} W^{(N)} \Phi^{(N-1)}(x) + \sum_{k=N}^{K} \Omega^{(k)}_{j,:}\big(b^{(k)} + \Theta^{(k)}_{:,j}\big), \tag{13}
$$

where 1/q = 1 − 1/p. As γ^U_j and γ^L_c are monotonically increasing and decreasing in ϵ, respectively, we can solve problem (6) by a bisection search over ϵ, which yields the certified lower bound on the network's robustness against weight perturbation. As a final remark, our work differs from adversarial robustness against input perturbation in that f is here a function of the perturbed weight matrix Ŵ^(N), bounded by the two linear functions f^U_j(Ŵ^(N)) and f^L_j(Ŵ^(N)), as opposed to a function of a perturbed input x. Interestingly, once the neurons' activations are linearly bounded, we can directly apply ideas similar to (Weng et al. 2018a; Zhang et al. 2018; Boopathy et al. 2019) to compute the bounds layer by layer with Theorem 0.1. Our results can also be directly extended to convolutional layers following (Boopathy et al. 2019).
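The following sketch instantiates the single-layer certificate for the simplest case treated above: a 2-layer ReLU network with the first-layer weights perturbed row-wise in ℓ∞ (so q = 1 in Hölder's inequality). It is a toy illustration of eqs. (7), (8), (12), and (13), written by us under those stated assumptions; it is not the authors' released implementation.

```python
import numpy as np

def relu_relaxation(l, u):
    """Slopes/intercepts (alpha_L, beta_L, alpha_U, beta_U) of eq. (8) for ReLU on [l, u]."""
    aL = np.zeros_like(l); bL = np.zeros_like(l)
    aU = np.zeros_like(l); bU = np.zeros_like(l)
    pos = l >= 0                              # always-active neurons: identity
    aL[pos] = aU[pos] = 1.0
    mix = (l < 0) & (u > 0)                   # unstable neurons: chord upper bound, same-slope lower bound
    slope = u[mix] / (u[mix] - l[mix])
    aL[mix] = aU[mix] = slope
    bU[mix] = -l[mix]                         # upper line passes through (l, 0)
    return aL, bL, aU, bU                     # always-inactive neurons keep zeros

def certified_margin_2layer(W1, b1, W2, b2, x, c, eps):
    """Return gamma_L_c - max_{j != c} gamma_U_j for a 2-layer ReLU net when only the
    first-layer weights are perturbed, with row-wise l_inf radius eps (q = 1)."""
    x_l1 = np.abs(x).sum()
    y = W1 @ x + b1
    l, u = y - eps * x_l1, y + eps * x_l1     # pre-activation bounds under the perturbation
    aL, bL, aU, bU = relu_relaxation(l, u)

    def output_bound(j, upper):
        w = W2[j]                             # coefficients on the post-activations
        use_upper = (w >= 0) if upper else (w < 0)   # flip relaxation for negative coefficients
        alpha = np.where(use_upper, aU, aL)
        beta = np.where(use_upper, bU, bL)
        lam = w * alpha                       # plays the role of Lambda / Omega in Theorem 0.1
        nominal = lam @ (W1 @ x + b1 + beta) + b2[j]
        slack = eps * np.abs(lam).sum() * x_l1       # Hoelder term: eps * ||lam||_1 * ||x||_q
        return nominal + slack if upper else nominal - slack

    gL_c = output_bound(c, upper=False)
    gU = [output_bound(j, upper=True) for j in range(W2.shape[0]) if j != c]
    return gL_c - max(gU)                     # > 0 certifies robustness at radius eps
```

Plugging `lambda e: certified_margin_2layer(W1, b1, W2, b2, x, c, e) > 0` into the earlier bisection routine recovers a certified radius for this toy case.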
Certified lower bound: Multi-layer perturbation

When weights are perturbed at multiple layers, deriving the certified lower bound becomes much more involved, as the weight perturbations are coupled across layers. However, computing a layer-wise bound for each neuron is still possible: we can integrate the previous single-layer results with interval bound propagation, as demonstrated below. Without loss of generality, assume that the weight matrices at the S-th layer and the N-th layer are both perturbed, with S < N < K. Equations (12) and (13) can be directly used to compute the perturbation bounds at each neuron of layer S by viewing the (N−1)-th layer as the output layer. For N−1 ≤ i ≤ K, we have l̂^(i)_r ≤ Φ^(i)_r ≤ û^(i)_r, where û^(i)_r = σ(u^(i)_r) and l̂^(i)_r = σ(l^(i)_r). Let k = N−1, p = ∞, and q = 1; we then have, for i = k,

$$
u^{(i+1)}_j = \big|W^{(i+1)}_{j,:}\big|\,\frac{\hat{u}^{(i)} - \hat{l}^{(i)}}{2} + W^{(i+1)}_{j,:}\,\frac{\hat{u}^{(i)} + \hat{l}^{(i)}}{2} + b^{(i+1)}_j + \epsilon\,\Big\|\frac{\hat{u}^{(i)} - \hat{l}^{(i)}}{2}\Big\|_q + \epsilon\,\Big\|\frac{\hat{u}^{(i)} + \hat{l}^{(i)}}{2}\Big\|_q,
$$

$$
l^{(i+1)}_j = -\big|W^{(i+1)}_{j,:}\big|\,\frac{\hat{u}^{(i)} - \hat{l}^{(i)}}{2} + W^{(i+1)}_{j,:}\,\frac{\hat{u}^{(i)} + \hat{l}^{(i)}}{2} + b^{(i+1)}_j - \epsilon\,\Big\|\frac{\hat{u}^{(i)} - \hat{l}^{(i)}}{2}\Big\|_q - \epsilon\,\Big\|\frac{\hat{u}^{(i)} + \hat{l}^{(i)}}{2}\Big\|_q,
$$

and, for i > k,

$$
u^{(i+1)}_j = \big|W^{(i+1)}_{j,:}\big|\,\frac{\hat{u}^{(i)} - \hat{l}^{(i)}}{2} + W^{(i+1)}_{j,:}\,\frac{\hat{u}^{(i)} + \hat{l}^{(i)}}{2} + b^{(i+1)}_j, \qquad
l^{(i+1)}_j = -\big|W^{(i+1)}_{j,:}\big|\,\frac{\hat{u}^{(i)} - \hat{l}^{(i)}}{2} + W^{(i+1)}_{j,:}\,\frac{\hat{u}^{(i)} + \hat{l}^{(i)}}{2} + b^{(i+1)}_j,
$$

and the network output f(Ŵ^(S), Ŵ^(N)) is bounded by l^(K)_j ≤ f_j(Ŵ^(S), Ŵ^(N)) ≤ u^(K)_j. For simplicity, we present the above analysis with the same ϵ for both layers and without bias perturbation, but it can be extended to the case where a layer-specific ϵ_i is associated with the i-th layer and the biases b are also perturbed.

Weight Quantization with Perturbation Certificate

In this section, we revisit the problem of weight quantization, commonly used for network compression, through the lens of certificated robustness against weight perturbation. We propose a unified quantization framework with a perturbation certificate by leveraging the alternating direction method of multipliers (ADMM).

Weight quantization. Given a finite number of bits, the goal is to discretize a model's weights while preserving its accuracy. For d_k-bit quantization of layer k, we use 1 bit to represent the zero value and the remaining d_k − 1 bits to represent at most 2^{d_k−1} different values with spacing q^(k) (here we follow the convention of (Zhou et al. 2017; Leng et al. 2018)). The quantization of the continuous weights W^(k) = C^(k) is given by

$$
\hat{W}^{(k)} \in \big\{-2^{d_k-2} q^{(k)}, \dots, -2q^{(k)}, -q^{(k)}, 0, q^{(k)}, 2q^{(k)}, \dots, 2^{d_k-2} q^{(k)}\big\}, \qquad \forall k \in [K]. \tag{14}
$$

Here we use the notation of perturbed weights Ŵ^(k) to represent the weights after quantization. We remark that the set of quantized values in (14) can easily be encoded using binary bits and implemented in hardware. For example, by storing q^(k), we can express {−2q^(k), −q^(k), 0, q^(k), 2q^(k)} as {−2, −1, 0, 1, 2}. Given the training loss f and the numbers of bits {d_k}, the problem of weight quantization determines {q^(k), Ŵ^(k)} by solving the optimization problem

$$
\underset{\{q^{(k)},\, \hat{W}^{(k)}\}}{\text{minimize}} \;\; f(\{\hat{W}^{(k)}\}) \qquad \text{subject to} \;\; (14). \tag{15}
$$

Note that a quantized network may suffer from a generalization issue. However, any weight perturbation within the certified region found by Theorem 0.1 will not cause misclassification. Thus, from the perspective of weight perturbation, one can integrate the certified region with problem (15) to reduce the generalization error caused by quantization. We call the resulting quantization problem certified weight quantization, given by

$$
\underset{\{q^{(k)},\, \hat{W}^{(k)}\}}{\text{minimize}} \;\; f(\{\hat{W}^{(k)}\}) \qquad \text{subject to} \;\; (14), \;\; \|\hat{W}^{(k)} - C^{(k)}\|_\infty \le \epsilon^{(k)}_c, \;\forall k, \tag{16}
$$

where ϵ^(k)_c is a threshold chosen according to the radius of the certified perturbation region with respect to (w.r.t.) the continuous weights C^(k) at layer k. Given a training sample x, the solution to problem (6) provides ϵ^(k)_c(x). The sample-independent threshold ϵ^(k)_c can then be set as the average or a chosen percentile of {ϵ^(k)_c(x)} over multiple samples. Our experiments will show that incorporating the certification constraints on weight perturbation yields a significant empirical improvement in the generalization error of quantized networks. We finally remark that, at first glance, one may think that problem (16) would yield a worse training loss than problem (15), since the former has additional constraints. However, due to non-convexity, it is difficult to solve problems (15) and (16) to global optimality. The story is then different when comparing a local optimal solution to problem (16) with a local optimal solution to problem (15). Suppose that training and testing samples obey the same underlying data distribution; then the proposed robustness certificate can drive the designed sub-optimal quantizer toward better generalization capability. Indeed, our experiments show that the introduction of certificate constraints reduces both training and generalization errors; see Figure 2.
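Before turning to the ADMM treatment, here is a small illustration (ours, with made-up parameter values) of the feasible set in (14) and the certificate constraint in (16): it rounds continuous weights to the grid and checks whether the induced perturbation stays inside a given certified radius. The helper names are hypothetical and the snippet is not the optimization scheme itself.

```python
import numpy as np

def quantize_to_grid(C, q, d):
    """Round continuous weights C to the grid {-2^(d-2) q, ..., -q, 0, q, ..., 2^(d-2) q} of eq. (14)."""
    v_max = 2 ** (d - 2)                          # largest integer level
    V = np.clip(np.rint(C / q), -v_max, v_max)    # integer levels V_hat
    return q * V

def respects_certificate(C, W_hat, eps_c):
    """Check the certificate constraint ||W_hat - C||_inf <= eps_c from problem (16)."""
    return np.max(np.abs(W_hat - C)) <= eps_c

# toy usage for one layer with 4-bit quantization (all values illustrative)
rng = np.random.default_rng(0)
C = rng.normal(scale=0.1, size=(8, 8))            # pre-trained continuous weights C^(k)
q, d, eps_c = 0.05, 4, 0.03                       # grid step, bits, certified radius
W_hat = quantize_to_grid(C, q, d)
print(np.max(np.abs(W_hat - C)), respects_certificate(C, W_hat, eps_c))
```

Naive rounding alone may violate the certificate (or hurt the training loss), which is precisely why problem (16) is solved jointly over the grid step q^(k) and the quantized weights, as described next.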
Certified weight quantization via ADMM. Upon defining Ŵ^(k) = q^(k) V̂^(k), problem (16) can be rewritten as

$$
\underset{\{q^{(k)},\, \hat{V}^{(k)}\}}{\text{minimize}} \;\; f(\{q^{(k)} \hat{V}^{(k)}\}) \qquad \text{subject to} \;\; \hat{V}^{(k)} \in D^{(k)}, \;\forall k; \quad \|q^{(k)} \hat{V}^{(k)} - C^{(k)}\|_\infty \le \epsilon^{(k)}_c, \;\forall k, \tag{17}
$$

where D^(k) := {−2^{d_k−2}, ..., −2, −1, 0, 1, 2, ..., 2^{d_k−2}}. Note that solving problem (17) is challenging since the certificate constraints involve the bilinear terms {q^(k) V̂^(k)}. We propose an ADMM-based problem formulation and optimization algorithm to obtain a solution. We begin by introducing auxiliary variables Ĝ^(k) together with the constraints Ĝ^(k) = V̂^(k), and reformulate (17), lending itself to the application of ADMM, as

$$
\underset{\{q^{(k)},\, \hat{V}^{(k)},\, \hat{G}^{(k)}\}}{\text{minimize}} \;\; f(\{q^{(k)} \hat{G}^{(k)}\}) + I_1(\{\hat{V}^{(k)}\}) + I_2(\{q^{(k)} \hat{G}^{(k)}\}) \qquad \text{subject to} \;\; \hat{G}^{(k)} = \hat{V}^{(k)}, \;\forall k, \tag{18}
$$

where I_1 and I_2 are indicator functions encoding the constraints a) V̂^(k) ∈ D^(k) for all k, and b) ‖q^(k) Ĝ^(k) − C^(k)‖_∞ ≤ ϵ^(k)_c for all k. The key difference from the ADMM formulation in (Leng et al. 2018) is that we impose the auxiliary variables w.r.t. V̂^(k) rather than Ŵ^(k), so that the variable q^(k) is explicitly included in our formulation. We then introduce the augmented Lagrangian function of (18), which will be alternately minimized in ADMM,

$$
L(\{q^{(k)}\}, \{\hat{V}^{(k)}\}, \{\hat{G}^{(k)}\}, \{\hat{U}^{(k)}\}) = f(\{q^{(k)} \hat{G}^{(k)}\}) + I_1(\{\hat{V}^{(k)}\}) + I_2(\{q^{(k)} \hat{G}^{(k)}\}) + \sum_{k=1}^{K} \mathrm{tr}\big((\hat{U}^{(k)})^\top (\hat{G}^{(k)} - \hat{V}^{(k)})\big) + \frac{\rho}{2} \sum_{k=1}^{K} \big\|\hat{G}^{(k)} - \hat{V}^{(k)}\big\|_F^2, \tag{19}
$$

where ρ > 0 is a regularization parameter, the Û^(k) are Lagrangian multipliers w.r.t. the equality constraints of (18), and ‖·‖_F denotes the Frobenius norm. For ease of notation, let Ĝ, q, V̂, and U denote the sets of variables {Ĝ^(k)}, {q^(k)}, {V̂^(k)}, and {Û^(k)}. The main steps of ADMM alternately minimize (19) over two blocks of variables, Ĝ and {q, V̂}. That is, ADMM at the t-th iteration is given by

$$
\hat{G}(t+1) = \arg\min_{\hat{G}} \; L(q(t), \hat{V}(t), \hat{G}, U(t)), \tag{20}
$$

$$
\{q(t+1), \hat{V}(t+1)\} = \arg\min_{q, \hat{V}} \; L(q, \hat{V}, \hat{G}(t+1), U(t)), \tag{21}
$$

where U(t+1) = U(t) + ρ(Ĝ(t+1) − V̂(t+1)), and we initialize ADMM with specified q(0), V̂(0), and U(0). We note that problem (21) is decomposable w.r.t. q and V̂; see the equivalent ADMM steps in Proposition 1.

Proposition 1. The ADMM subproblems (20) and (21) can be equivalently transformed into a) a Ĝ-minimization step, b) a q-minimization step, and c) a V̂-minimization step. That is,

Ĝ-minimization step: Ĝ(t+1) in (20) is given by the solution of the problem

$$
\underset{\hat{G}}{\text{minimize}} \;\; f(\hat{G}; q(t)) + \frac{\rho}{2} \sum_{k=1}^{K} \big\|\hat{G}^{(k)} - \hat{V}^{(k)}(t) + (1/\rho)\hat{U}^{(k)}(t)\big\|_F^2 \qquad \text{subject to} \;\; (1/q^{(k)})\,\check{D}^{(k)} \le \hat{G}^{(k)} \le (1/q^{(k)})\,\hat{D}^{(k)}, \;\forall k, \tag{22}
$$

where f(Ĝ; q(t)) represents f w.r.t. the variables Ĝ under the given values q(t), Ď^(k) := C^(k) − ϵ^(k)_c I, and D̂^(k) := ϵ^(k)_c I + C^(k).

q-minimization step: q(t+1) in (21) is given by the solution of the problem

$$
\underset{q}{\text{minimize}} \;\; f(q; \hat{G}(t+1)) \qquad \text{subject to} \;\; \max\{\hat{a}^{(k)}_1, \hat{a}^{(k)}_2\} \le q^{(k)} \le \min\{\check{a}^{(k)}_1, \check{a}^{(k)}_2\}, \;\forall k. \tag{23}
$$

Let I_{k,+} and I_{k,−} denote the index sets of positive and negative elements of Ĝ^(k)(t+1), respectively. Let [X]_I denote the sub-matrix of X selected by the index set I, and let / and max{·} denote element-wise division and maximum, respectively. Then â^(k)_1, â^(k)_2, ǎ^(k)_1, and ǎ^(k)_2 in (23) are given by

$$
\hat{a}^{(k)}_1 := \max\big\{[\check{D}^{(k)}]_{I_{k,+}} / [\hat{G}^{(k)}(t+1)]_{I_{k,+}}\big\}, \qquad
\hat{a}^{(k)}_2 := \max\big\{[\hat{D}^{(k)}]_{I_{k,-}} / [\hat{G}^{(k)}(t+1)]_{I_{k,-}}\big\},
$$

$$
\check{a}^{(k)}_1 := \min\big\{[\hat{D}^{(k)}]_{I_{k,+}} / [\hat{G}^{(k)}(t+1)]_{I_{k,+}}\big\}, \qquad
\check{a}^{(k)}_2 := \min\big\{[\check{D}^{(k)}]_{I_{k,-}} / [\hat{G}^{(k)}(t+1)]_{I_{k,-}}\big\}.
$$

V̂-minimization step: V̂(t+1) in (21) is given by

$$
\hat{V}^{(k)}_{ij}(t+1) =
\begin{cases}
\lfloor B_{ij} \rceil & \text{if } B_{ij} \in [-2^{d_k-2},\, 2^{d_k-2}], \\
-2^{d_k-2} & \text{if } B_{ij} < -2^{d_k-2}, \\
2^{d_k-2} & \text{if } B_{ij} > 2^{d_k-2},
\end{cases} \tag{24}
$$

where V̂^(k)_{ij}(t+1) denotes the (i, j)-th element of V̂^(k)(t+1), ⌊x⌉ represents the nearest integer to x, and B := Ĝ(t+1) + (1/ρ)U(t).

Proof: See Appendix.¹

¹The appendix and code are available at https://github.com/lilyweng/Quantization.
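To illustrate how the pieces of Proposition 1 fit together, here is a minimal sketch (ours, not the released code at the repository above) of one ADMM sweep for a single layer. It makes several simplifications that we state explicitly: `grad_f` is a user-supplied gradient of the training loss w.r.t. that layer's weights, the q-step of eq. (23) is skipped (q is held fixed), and the Ĝ-step of eq. (22) is approximated by a few projected-gradient iterations, which is only one of several valid solvers.

```python
import numpy as np

def v_step(G, U, rho, d):
    """Closed-form V-hat update of eq. (24): round B = G + U/rho to integer levels and clip."""
    v_max = 2 ** (d - 2)
    B = G + U / rho
    return np.clip(np.rint(B), -v_max, v_max)

def g_step(G, V, U, q, C, eps_c, rho, grad_f, lr=0.1, steps=20):
    """Approximate G-hat update of eq. (22) by projected gradient descent on
    f(q*G) + (rho/2)||G - V + U/rho||_F^2 over the box (C - eps_c)/q <= G <= (C + eps_c)/q."""
    lo, hi = (C - eps_c) / q, (C + eps_c) / q
    for _ in range(steps):
        g = q * grad_f(q * G) + rho * (G - V + U / rho)   # gradient of the smooth part
        G = np.clip(G - lr * g, lo, hi)                    # project back onto the box
    return G

def admm_iteration(G, V, U, q, C, eps_c, rho, d, grad_f):
    """One simplified ADMM sweep: G-step, V-step, dual update (q held fixed)."""
    G = g_step(G, V, U, q, C, eps_c, rho, grad_f)
    V = v_step(G, U, rho, d)
    U = U + rho * (G - V)
    return G, V, U
```

Iterating `admm_iteration` drives Ĝ and V̂ toward agreement, at which point q·V̂ is a quantized weight matrix that also respects the certificate box, mirroring the full scheme of Proposition 1.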
Comparison with (Leng et al. 2018). We note that (Leng et al. 2018) also uses ADMM to solve a similar problem. The key differences between the two ADMM-based solutions are as follows. 1) The introduced auxiliary variables in ADMM are different. In (Leng et al. 2018), the auxiliary variables are copies of the network weights, ignoring the effect of q^(k). In (18), by contrast, the auxiliary variables Ĝ^(k) are weights separated from the quantization intervals q^(k). 2) The operator splitting is different. We perform a two-block splitting over the variables Ĝ^(k) and {q^(k), V̂^(k)}, and show that the resulting second-block subproblem can be further decomposed into the easier problems (23) and (24). In contrast, a sub-optimal iterative optimization method was used internally in (Leng et al. 2018) to solve a more involved splitting problem.

In Proposition 1, each subproblem can be efficiently handled by standard optimization solvers. In (22)-(23), since the constraint sets are simple box constraints, projected gradient descent methods can be used to find the solutions Ĝ(t+1) and q(t+1). In (24), we have a closed-form expression for the solution V̂(t+1). We also note that ADMM is convergent and reaches a sub-linear convergence rate for nonconvex and nonsmooth stochastic optimization (Huang and Chen 2018).

Experiments

We demonstrate the effectiveness of our approach under two applications: a) weight quantization, described in the preceding section, and b) model robustness against the fault sneaking attack (Zhao et al. 2019b). To align with our theoretical results, we perform experiments on multilayer perceptron (MLP) models with various numbers of layers. The performance is evaluated on 4 datasets: MNIST, MNIST-Fashion, SVHN, and CIFAR-10.

Figure 1: Testing accuracy of a quantized 8-layer MLP (4, 6, and 8 bits per layer) on SVHN versus the certification constraint parameter chosen by different percentiles of certified weight perturbation lower bounds over 100 training images.

Robustness-aware weight quantization. We consider MLP models of 2, 4, 6, 8, and 10 layers, each of which is quantized using 4, 6, and 8 bits. Figure 1 presents the testing accuracy of the quantized 8-layer MLP on SVHN versus the choice of the certification constraint parameter ϵ^(k)_c in problem (16). Here we set ϵ^(k)_c as a percentile of the certified robustness bounds (6) over 100 training images. For comparison, we also present the performance of weight quantization obtained by solving problem (15) without imposing certification constraints.
Figure 2: Training/testing accuracy of quantization with/without certification constraints. Left: MNIST-Fashion. Middle: SVHN. Dashed lines denote training accuracy and solid lines represent testing accuracy. Blue lines show the proposed ADMM method with bounds, green lines show the proposed ADMM method without bounds, and red lines show the LB method without bounds.

We see that the use of certification bounds significantly boosts the testing accuracy of quantized networks (around 10% improvement) and approaches the testing accuracy without quantization. Moreover, we observe that the improved performance is insensitive to the choice of ϵ^(k)_c, except for a slight degradation when ϵ^(k)_c is greater than the 90th percentile of the certification bounds. This is not surprising since the performance of weight quantization without imposing certification constraints is worst, corresponding to the extreme case ϵ^(k)_c → ∞. In the following experiments, unless specified otherwise, we choose ϵ^(k)_c as the 50th percentile of the certified robustness bounds. Additional results on CIFAR-10 and other model architectures are provided in Table A1 of the Appendix. We note that the test accuracy of quantization becomes worse as ϵ^(k)_c becomes much larger; e.g., using 2, 5, and 10 times the 100th-percentile certification bound yields worse test accuracies of 82.7%, 78.6%, and 76.1% for a 6-layer MLP with 6-bit quantization, whereas using the 0th and 100th percentiles gives 83.4% and 83%. More results are shown in the appendix.

In Figure 2, we compare our proposed method with the ADMM-based low-bits (LB) quantization method (Leng et al. 2018) in terms of training/testing accuracy versus ADMM iterations on the MNIST-Fashion and SVHN datasets. Recall that the LB method was designed to solve problem (15) without introducing certification constraints. All the methods are initialized from the same pre-trained model with continuous weights. It can be seen from the training accuracy that our method converges to a better local optimal solution than the LB method even in the absence of certification constraints. When information on the certified robustness bounds is incorporated, both training and testing accuracy are significantly improved, consistent with the results in Figure 1 and Table A1.

Model robustness against the fault sneaking attack (Zhao et al. 2019b). It was shown in (Zhao et al. 2019b) that slightly perturbing model weights at a single layer is capable of misclassifying a specific set of natural images toward target labels while keeping the classification of unspecified input images intact. The corresponding threat model is called the fault sneaking attack (FSA). It is worth mentioning that such an attack is commonly performed at deep layers of a network due to its stealthiness requirement. We refer readers to the Appendix for the detailed setting of attack generation and our experiment.

Table 1: Certified perturbation bounds and ℓ∞-norm of the weight perturbations caused by FSA. Here FSA perturbs each of the last 4 layers of a 10-layer MLP on 4 datasets.

| Dataset | Metric | Layer 7 | Layer 8 | Layer 9 | Layer 10 |
| --- | --- | --- | --- | --- | --- |
| MNIST | original Acc (%) | 97.8 | 97.8 | 97.8 | 97.8 |
| | Acc after attack (%) | 94 | 93.9 | 93.6 | 94.3 |
| | certified bound | 2e-6 | 2.6e-5 | 0.0003 | 0.0083 |
| | attack perturbation | 0.042 | 0.048 | 0.052 | 0.073 |
| MNIST-Fashion | original Acc (%) | 88.6 | 88.6 | 88.6 | 88.6 |
| | Acc after attack (%) | 84.6 | 84.3 | 84.0 | 84.3 |
| | certified bound | 2e-6 | 2.4e-5 | 0.00033 | 0.0117 |
| | attack perturbation | 0.025 | 0.0306 | 0.0351 | 0.11 |
| SVHN | original Acc (%) | 82.6 | 82.6 | 82.6 | 82.6 |
| | Acc after attack (%) | 80.5 | 80.1 | 80.3 | 80.6 |
| | certified bound | 4e-6 | 5.6e-5 | 0.0008 | 0.035 |
| | attack perturbation | 0.023 | 0.027 | 0.036 | 0.102 |
| CIFAR-10 | original Acc (%) | 56.7 | 56.7 | 56.7 | 56.7 |
| | Acc after attack (%) | 50.2 | 51.1 | 51.5 | 51.2 |
| | certified bound | 4e-6 | 5e-5 | 0.0007 | 0.027 |
| | attack perturbation | 0.036 | 0.04 | 0.056 | 0.15 |
Table 1 shows the original accuracy (Acc), the Acc after FSA, the certified lower bound on weight perturbation, and the ℓ∞-norm of the weight perturbations caused by FSA. We see that the certified lower bound decreases as the layer index decreases, since the linear bound approximation becomes looser when it needs to propagate over more layers before reaching the output layer. Moreover, the certified lower bound follows the same pattern as the FSA perturbation across layer indices. This indicates the intrinsic robustness of networks against FSA: a shallower layer is more vulnerable to FSA, which is also verified by Figure A1. Additional results on ImageNet are shown in the appendix.

Conclusion

In this paper, we take the first step toward studying the sensitivity of neural networks to weight perturbations and propose an efficient algorithm for computing a certified robustness bound. Our study of weight perturbation can be useful in both non-adversarial environments (weight quantization with certified robustness) and adversarial environments (robustness indicator against weight perturbation).

Acknowledgments

This work is partly supported by National Science Foundation CNS-1929300.

References

Abadi, M.; Agarwal, A.; Barham, P.; et al. 2016. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
Athalye, A.; Carlini, N.; and Wagner, D. 2018. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML 2018; arXiv preprint arXiv:1802.00420.
Boopathy, A.; Weng, T.-W.; Chen, P.-Y.; Liu, S.; and Daniel, L. 2019. CNN-Cert: An efficient framework for certifying robustness of convolutional neural networks. In AAAI.
Boyd, S.; Parikh, N.; Chu, E.; et al. 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning.
Bunel, R. R.; Turkaslan, I.; Torr, P.; Kohli, P.; and Mudigonda, P. K. 2018. A unified view of piecewise linear neural network verification. In NIPS, 4790-4799.
Carlini, N., and Wagner, D. 2017. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, 39-57. IEEE.
Carlini, N., and Wagner, D. 2018. Audio adversarial examples: Targeted attacks on speech-to-text. In 2018 IEEE Security and Privacy Workshops (SPW), 1-7.
Chen, P.-Y.; Sharma, Y.; Zhang, H.; Yi, J.; and Hsieh, C.-J. 2018. EAD: Elastic-net attacks to deep neural networks via adversarial examples. AAAI.
Cheney, N.; Schrimpf, M.; and Kreiman, G. 2017. On the robustness of convolutional neural networks to internal architecture and weight perturbations. arXiv preprint arXiv:1703.08245.
Dutta, S.; Jha, S.; Sankaranarayanan, S.; and Tiwari, A. 2017. Output range analysis for deep neural networks. arXiv preprint arXiv:1709.09130.
Dvijotham, K.; Stanforth, R.; Gowal, S.; et al. 2018. A dual approach to scalable verification of deep networks. UAI.
Ehlers, R. 2017. Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis, 269-286. Springer.
Gehr, T.; Mirman, M.; Drachsler-Cohen, D.; Tsankov, P.; Chaudhuri, S.; and Vechev, M. 2018. AI2: Safety and robustness certification of neural networks with abstract interpretation. In IEEE SP.
Hein, M., and Andriushchenko, M. 2017. Formal guarantees on the robustness of a classifier against adversarial manipulation. In NIPS.
Huang, F., and Chen, S. 2018. Mini-batch stochastic ADMMs for nonconvex nonsmooth optimization. arXiv preprint arXiv:1802.03284.
Hubara, I.; Courbariaux, M.; Soudry, D.; El-Yaniv, R.; and Bengio, Y. 2016. Binarized neural networks. In NeurIPS.
Katz, G.; Barrett, C.; Dill, D. L.; et al. 2017. Reluplex: An efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, 97-117. Springer.
Ko, C.-Y.; Lyu, Z.; Weng, T.-W.; Daniel, L.; Wong, N.; and Lin, D. 2019. POPQORN: Quantifying robustness of recurrent neural networks. arXiv preprint arXiv:1905.07387.
Kolter, J. Z., and Wong, E. 2018. Provable defenses against adversarial examples via the convex outer adversarial polytope. ICML 2018; arXiv preprint arXiv:1711.00851.
Leng, C.; Dou, Z.; Li, H.; et al. 2018. Extremely low bit neural network: Squeeze the last bit out with ADMM. In AAAI.
Lin, D.; Talathi, S.; and Annapureddy, S. 2016. Fixed point quantization of deep convolutional networks. In International Conference on Machine Learning, 2849-2858.
Liu, Y.; Wei, L.; Luo, B.; and Xu, Q. 2017. Fault injection attack on deep neural network. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 131-138.
Liu, X.; Cheng, M.; Zhang, H.; and Hsieh, C.-J. 2018. Towards robust neural networks via random self-ensemble. In Proceedings of the European Conference on Computer Vision (ECCV), 369-385.
Liu, C.; Arnon, T.; Lazarus, C.; Barrett, C.; and Kochenderfer, M. J. 2019. Algorithms for verifying deep neural networks. arXiv preprint arXiv:1903.06758.
Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; and Vladu, A. 2018. Towards deep learning models resistant to adversarial attacks. ICLR 2018; arXiv preprint arXiv:1706.06083.
Moosavi-Dezfooli, S.-M.; Fawzi, A.; and Frossard, P. 2016. DeepFool: A simple and accurate method to fool deep neural networks. In CVPR, 2574-2582.
Papernot, N.; McDaniel, P.; Swami, A.; and Harang, R. 2016a. Crafting adversarial input sequences for recurrent neural networks. In IEEE Military Communications Conference (MILCOM), 49-54.
Papernot, N.; McDaniel, P.; Wu, X.; et al. 2016b. Distillation as a defense to adversarial perturbations against deep neural networks. In Security and Privacy (SP), 2016 IEEE Symposium on, 582-597.
Park, E.; Ahn, J.; and Yoo, S. 2017. Weighted-entropy-based quantization for deep neural networks. In CVPR, 7197-7205.
Paszke, A.; Gross, S.; Chintala, S.; and Chanan, G. 2017. PyTorch.
Raghunathan, A.; Steinhardt, J.; and Liang, P. 2018. Certified defenses against adversarial examples. ICLR.
Rastegari, M.; Ordonez, V.; Redmon, J.; and Farhadi, A. 2016. XNOR-Net: ImageNet classification using binary convolutional neural networks. In ECCV, 525-542. Springer.
Ren, A.; Zhang, T.; Ye, S.; et al. 2019. ADMM-NN: An algorithm-hardware co-design framework of DNNs using alternating direction methods of multipliers. In ASPLOS, 925-938. ACM.
Sheng, T.; Feng, C.; Zhuo, S.; et al. 2018. A quantization-friendly separable convolution for MobileNets. In 2018 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications, 14-18. IEEE.
Song, D.; Eykholt, K.; Evtimov, I.; et al. 2018. Physical adversarial examples for object detectors. In 12th USENIX Workshop on Offensive Technologies (WOOT 18).
Szegedy, C.; Zaremba, W.; Sutskever, I.; et al. 2014. Intriguing properties of neural networks. ICLR 2014.
Wang, Y.-S.; Weng, T.-W.; and Daniel, L. 2019. Verification of neural network control policy under persistent adversarial perturbation. arXiv preprint arXiv:1908.06353.
Weng, T.-W.; Zhang, H.; Chen, H.; et al. 2018a. Towards fast computation of certified robustness for ReLU networks. ICML.
Weng, T.-W.; Zhang, H.; Chen, P.-Y.; et al. 2018b. Evaluating the robustness of neural networks: An extreme value theory approach. ICLR.
Weng, L.; Chen, P.-Y.; Nguyen, L.; Squillante, M.; Boopathy, A.; Oseledets, I.; and Daniel, L. 2019. PROVEN: Verifying robustness of neural networks with a probabilistic approach. In ICML 2019.
Wong, E., and Kolter, J. Z. 2017. Provable defenses against adversarial examples via the convex outer adversarial polytope. arXiv preprint arXiv:1711.00851.
Wong, E.; Schmidt, F.; Metzen, J. H.; and Kolter, J. Z. 2018. Scaling provable adversarial defenses. arXiv preprint arXiv:1805.12514.
Wu, J.; Leng, C.; Wang, Y.; et al. 2016. Quantized convolutional neural networks for mobile devices. In CVPR, 4820-4828.
Zhang, H.; Weng, T.-W.; Chen, P.-Y.; Hsieh, C.-J.; and Daniel, L. 2018. Efficient neural network robustness certification with general activation functions. In NIPS.
Zhao, P.; Liu, S.; Chen, P.-Y.; et al. 2019a. On the design of black-box adversarial examples by leveraging gradient-free optimization and operator splitting method. In ICCV 2019.
Zhao, P.; Wang, S.; Gongye, C.; et al. 2019b. Fault sneaking attack: A stealthy framework for misleading deep neural networks. In Proceedings of the 56th Annual Design Automation Conference.
Zhou, A.; Yao, A.; Guo, Y.; et al. 2017. Incremental network quantization: Towards lossless CNNs with low-precision weights. ICLR 2017.