# Higher-Order Certification for Randomized Smoothing

Jeet Mohapatra¹ · Ching-Yun Ko¹ · Tsui-Wei Weng¹,² · Pin-Yu Chen² · Sijia Liu² · Luca Daniel¹
¹ MIT  ² MIT-IBM Watson AI Lab, IBM Research

Randomized smoothing is a recently proposed defense against adversarial attacks that has achieved state-of-the-art provable robustness against ℓ2 perturbations. A number of publications have extended the guarantees to other metrics, such as ℓ1 or ℓ∞, by using different smoothing measures. Although the current framework has been shown to yield near-optimal ℓp radii, the total safety region certified by the current framework can be arbitrarily small compared to the optimal. In this work, we propose a framework to improve the certified safety region for these smoothed classifiers without changing the underlying smoothing scheme. Our theoretical contributions are as follows: 1) we generalize the certification for randomized smoothing by reformulating the certified radius calculation as a nested optimization problem over a class of functions; 2) we provide a method to calculate the certified safety region using zeroth-order and first-order information for Gaussian-smoothed classifiers, along with a framework that generalizes the calculation for certification using higher-order information; 3) we design efficient, high-confidence estimators for the relevant statistics of the first-order information. Combining theoretical contributions 2) and 3) allows us to certify safety regions that are significantly larger than the ones provided by current methods. On the CIFAR10 and ImageNet datasets, the new regions certified by our approach achieve significant improvements on the general ℓ1 certified radii and on the ℓ2 certified radii for color-space attacks (ℓ2 perturbations restricted to only one color channel), while also achieving smaller improvements on the general ℓ2 certified radii. As discussed in the future-work section, our framework can also provide a way to circumvent the current impossibility results on achieving higher magnitudes of certified radii without requiring the use of data-dependent smoothing techniques.

## 1 Introduction

Deep neural networks (DNNs) can be highly sensitive: small imperceptible input perturbations can lead to misclassification [1, 2]. This poses a major problem for the deployment of DNNs in safety-critical applications, including aircraft control systems, video surveillance and self-driving cars, which require near-zero tolerance to lack of robustness. Thus, it is important to provide guarantees for the robustness of deep neural network models against multiple worst-case perturbations. Popular threat models are the ℓp-norm-based attacks, where possible perturbations are constrained in an ℓp-ball with respect to a given input x. To that end, recent research efforts have focused on attack-agnostic robustness certification which, given an input x, provides a safety region within which the model is guaranteed not to change its prediction.

Randomized smoothing is a recently proposed defense [3, 4, 5] that has achieved state-of-the-art robustness guarantees. Given any classifier f, denoted as a base classifier, randomized smoothing predicts the class that is most likely to be returned when noise is added to the input x.
Thus, randomized smoothing acts as an operator that, given a base classifier and a noise model (denoted as the smoothing measure), produces a new classifier, denoted as the smoothed classifier. The smoothed classifiers thus produced are easily certifiable with strong (even near-optimal for ℓ2) robustness guarantees under various ℓp norm threat models [3, 6, 7]. However, for any given threat model, the state-of-the-art robustness guarantees are achieved only for the smoothed classifiers obtained using very specific smoothing measures. Thus, the classifiers that attain state-of-the-art guarantees under one threat model might perform poorly under another (Gaussian-smoothed classifiers are optimal under ℓ2 but perform poorly under ℓ1; uniform-noise-smoothed classifiers are state-of-the-art under ℓ1 but have poor performance under ℓ2). Moreover, some recent works [8, 9, 10] show that the existing framework, which uses only the zeroth-order information (the function value of the smoothed classifier g(x)), is incapable of producing large certified radii for ℓp norms with high values of p.

Motivated by the two limitations above, we focus our attention on improving the certified safety region, which is agnostic of threat models. To that end, we propose a general framework to provide a larger certified safety region by better utilizing the information derived from a hard-label classifier. In particular, we summarize our contributions as follows:

1. We propose a general framework that calculates a certified safety region of a smoothed classifier g, around an input point x, by exploiting the estimated local properties (e.g., gradient, Hessian, etc.) of the classifier g at x.
2. We give a threat-model-agnostic asymptotic-optimality result for smoothed classifiers obtained by using the standard Gaussian as the smoothing measure, i.e., Gaussian-smoothed classifiers. Using Theorem 2 in Section 3.1, we show that it is theoretically possible to produce arbitrarily tight certificates for any classifier (for this result, optimality assumes we abstain at points where the top-1 class has probability lower than 0.5). As a consequence, we see that the impossibility results for the existing framework cannot be extended to certificates obtained using higher-order information.
3. We motivate and prove properties, such as convexity (Proposition 1) and monotonic dependence on angle (Proposition 2), of the certified safety regions of Gaussian-smoothed classifiers produced using the zeroth- and first-order local information. Using these properties, we give formulas for calculating certified radii for Gaussian-smoothed classifiers under ℓp threat models and their subspace variants with p = 1, 2, ∞.
4. We design new efficient estimators (see Table 1) to provide high-confidence interval estimates of the relevant first-order information about the classifier, since the naive Monte-Carlo estimators have prohibitively high sample complexity.

Finally, we use the 3rd and 4th contributions above to empirically verify the effectiveness of the new certification framework in providing state-of-the-art certified accuracy for multiple threat models simultaneously. In particular, our proposed framework substantially boosts the certified accuracy for the ℓ1 norm and the subspace ℓ2 norm while maintaining (at times marginally improving) the state-of-the-art near-optimal results for the ℓ2 norm. On the CIFAR10 dataset, our results for the ℓ∞ norm also show improvement over the state-of-the-art bounds given by Gaussian smoothing.

## 2 Background and Related Works

### 2.1 Related Works
**Randomized Smoothing.** Randomized smoothing was initially introduced as a heuristic defense by [11] and [12]. Later, [3] formulated it as a certification method using ideas from differential privacy, which was then improved by [6] using Rényi divergence. For Gaussian-smoothed classifiers, [5] made the certified bounds worst-case-optimal in the context of certified ℓ2 norm radii by using the Neyman-Pearson Lemma, while the authors of [13] combined the certification method with adversarial training to further improve the empirical results. Along another line of work, some extended existing certification methods to obtain better ℓp norm certified radii using different smoothing distributions, e.g., a discrete distribution for ℓ0 certificates [14], the Laplace distribution for ℓ1 certificates [15], and the generalized Gaussian distribution for ℓ∞ certificates [16]. Recently, [8] proposed a general method for finding the optimal smoothing distribution given any threat model, as well as a framework for calculating the certified robustness of the resulting smoothed classifier.

**Impossibility Results.** Recently, a number of works have shown that for ℓp norm threat models with large p, it is impossible to give a large certified radius (of order $d^{1/p - 1/2}$, where d is the input dimension) while retaining high standard accuracy. In particular, the results on the ℓ∞ threat model given in [9, 10] and the results on ℓp (for sufficiently large p) threat models given in [8] establish a certification/accuracy trade-off, which underscores the need for an extended and generalized framework that breaks the confined trade-off and impossibility results.

### 2.2 Preliminaries and Notations

Generally, a classifier in machine learning is represented as a function f that maps from the feature space $\mathbb{R}^d$ to a probability vector over all the classes $\mathcal{Y}$. We use $\mathcal{F}$ to denote the set of all classifiers $f : \mathbb{R}^d \mapsto \Delta_{\mathcal{Y}}$, where $\Delta_{\mathcal{Y}}$ is the probability simplex collecting all possible probability vectors of dimension $|\mathcal{Y}|$. In this paper, we work under the restrictive black-box model, i.e., we assume we only have access to the class labels outputted by the classifier f. For ease of use, we consider the output of f to be the hard-thresholding of the probability vector, i.e., the output is a one-hot encoding vector.

Given a black-box classifier $f \in \mathcal{F}$, randomized smoothing is a technique that creates a smoothed classifier g by convolving f (hereafter referred to as the base classifier) with a smoothing measure $\mu$, i.e., $g = f * \mu$. We use $\mathcal{G}^{\mu}_{\mathcal{F}}$ to denote the class of these smoothed functions, where $*$ denotes the convolution operator. Note that the smoothing measure $\mu$ is a probability measure, and each smoothed function $g \in \mathcal{G}^{\mu}_{\mathcal{F}}$ defines a mapping from $\mathbb{R}^d$ to $\Delta_{\mathcal{Y}}$. Thus, we have $\mathcal{G}^{\mu}_{\mathcal{F}} \subseteq \mathcal{F}$.

## 3 A General Framework for Randomized Smoothing

Given a smoothing measure $\mu$, the certified safety region around a point x is given as the subset of the space over which the classifier output is guaranteed not to change. In practice, this involves estimating some local information about g around the point x and giving the certified safety region as the common subset (intersection) of the safety regions of all classifiers $h \in \mathcal{G}^{\mu}_{\mathcal{F}}$ whose local properties match the estimated local information. In the current literature, the only local information used is the function value g(x) [5, 13]; however, more local information about g at x, such as the gradient or higher-order derivatives, can in fact be exploited to derive better certified radii.
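As a concrete reference point, the zeroth-order information $g(x)_A$ used by the existing framework is estimated by simple Monte-Carlo sampling of the base classifier's hard labels. The following is a minimal sketch (our own illustration, not the paper's certification procedure; the function name and the toy linear base classifier are hypothetical):

```python
import numpy as np

def smoothed_predict(f, x, sigma, n=10_000, rng=None):
    """Monte-Carlo estimate of the Gaussian-smoothed classifier g = f * mu:
    returns the majority class A under noise and the estimated probability
    g(x)_A of that class.  `f` maps a batch of inputs to hard labels."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=(n,) + x.shape)
    labels = f(x + noise)                    # hard labels, shape (n,)
    classes, counts = np.unique(labels, return_counts=True)
    top = np.argmax(counts)
    return classes[top], counts[top] / n     # (class A, estimate of g(x)_A)

# Toy base classifier: a linear decision rule on R^2 (hypothetical).
f = lambda batch: (batch @ np.array([1.0, -1.0]) > 0).astype(int)
print(smoothed_predict(f, np.array([0.3, 0.1]), sigma=0.5))
```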
To illustrate the idea, we express the local information as local constraints and rewrite the problem of finding the certified radius as follows. Let $H^x_i(g)$ be an estimate of some local property of g at the input point x, and let there be k such constraints. Then the certified safety region $SR^x$ can be written as

$$SR^x = \bigcap \Big\{\, S_h(x) \;\Big|\; h \in \mathcal{G}^{\mu}_{\mathcal{F}} \text{ and } H^x_i(h) = H^x_i(g),\ 1 \le i \le k \,\Big\}, \qquad S_h(x) = \big\{\, \delta \;\big|\; \arg\max h(x+\delta) = \arg\max h(x) \,\big\} \tag{1}$$

where $S_h(x)$ gives the safety region for the function h. The existing state-of-the-art certificates [5] via randomized smoothing are a special case of this framework with $k = 1$ and $H^x_1(h) = (h(x))_A$ given $H^x_1(g) \ge \underline{p_A}$, where $h(\cdot)_A$ denotes the A-th component of the vector $h(\cdot)$ and $\underline{p_A}$ is the lower bound of the estimated probability that x belongs to the predicted class A.

**Reduction to Binary.** In the existing literature, certificates for randomized-smoothing-based classifiers effectively reduce the multi-class classification to a binary one by reducing the problem to the top-1 prediction vs. all other classes. We follow the same idea to simplify our certification objectives, observing that $h(x+\delta)_{\arg\max(h(x))} > 0.5 \implies \arg\max(h(x+\delta)) = \arg\max(h(x))$. Assuming $\arg\max(h(x)) = c$, the set $S^h_L(x) = \{\delta \mid h(x+\delta)_c > 0.5\}$ is a set lower bound of $S_h(x)$, i.e., $S^h_L(x) \subseteq S_h(x)$. Substituting $S_h(x)$ in (1) by $S^h_L(x)$ gives a set lower bound $SR_L(x)$ of $SR^x$ with the safety region subset:

$$SR_L(x) = \bigcap \Big\{\, \{\delta \mid h(x+\delta)_c > 0.5\} \;\Big|\; h \in \mathcal{G}^{\mu}_{\mathcal{F}} \ \text{and}\ H^x_i(h) = H^x_i(g),\ 1 \le i \le k \,\Big\}$$

This set can be rewritten as $SR_L(x) = \{\delta \mid p_x(x+\delta) > 0.5\}$, where the probability function $p_x(z)$ is given by the optimization problem:

$$p_x(z) = \min_{h \in \mathcal{G}^{\mu}_{\mathcal{F}}} h(z)_c \quad \text{s.t.} \quad H^x_i(h) = H^x_i(g),\ 1 \le i \le k. \tag{2}$$

**Our proposed framework.** Use the Generalized Neyman-Pearson Lemma [17] to solve Equation (2) for $p_x(z)$. The safety region is then given as the super-level set of $p_x$, i.e., $SR_L(x) = \{\delta \mid p_x(x+\delta) > 0.5\}$.

**Discussion.** Under the proposed general framework for calculating certified radii, it is easy to see that adding more local constraints ($H^x_i$) in Equation (2) gives a larger value of $p_x(z)$ for any x, z, which makes the super-level set of $p_x$, equivalently the certified safety region, bigger. In the following subsection, we study the properties of functions in $\mathcal{G}^{\mu}_{\mathcal{F}}$ to get some intuition about which information might help us achieve a larger certified safety region. Then, we provide an example usage of this framework that exploits additional local higher-order information to give larger certified radii. The proofs of all theoretical results hereafter are supplemented in the appendix.

### 3.1 Regularity Properties of $\mathcal{G}^{\mu}_{\mathcal{F}}$

For any black-box classifier f, the function map between the input pixel-space $\mathbb{R}^d$ and the prediction is discontinuous. As a result, the higher-order derivatives might not always exist for the smoothed function g when smoothing with general probability measures $\mu$. Therefore, it is crucial to establish conditions on the probability measure $\mu$ that guarantee the existence of higher-order derivatives of g at all points of the pixel-space $\mathbb{R}^d$. We give this in the following theorem:

**Theorem 1.** If for all $\alpha \in \mathbb{N}^d$, $\int_{\mathbb{R}^d} |D^{\alpha}_x \mu(y - x)|\, dy$ is finite (here $D^{\alpha}$ is the multivariate differential operator, where $\alpha$ is a multi-index), then $\mathcal{G}^{\mu}_{\mathcal{F}} \subseteq C^{\infty}$. Moreover, if $g \in \mathcal{G}^{\mu}_{\mathcal{F}}$ is given as $g = f * \mu$ for some $f \in \mathcal{F}$, then $\nabla^i g(x) = \int_{\mathbb{R}^d} f(y)\, (-1)^i\, \nabla^i \mu(y - x)\, dy$.

**Corollary 1.1.** When $\mu$ is the isotropic Gaussian distribution $\mathcal{N}(0, \sigma^2 I)$, then $\mathcal{G}^{\mu}_{\mathcal{F}} \subseteq C^{\infty}$ and $\nabla^i g(x) = \int_{\mathbb{R}^d} f(y)\, (-1)^i\, \nabla^i \mu(y - x)\, dy$.
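For instance, specializing the corollary to $i = 1$ and substituting the derivative of the Gaussian density expresses the gradient of the smoothed classifier as an expectation over hard-label outputs of the base classifier, which is exactly the quantity $y^{(1)}$ estimated in Section 4:

$$\nabla g(x) = \frac{1}{\sigma^2}\, \mathbb{E}_{w \sim \mathcal{N}(0,\, \sigma^2 I)}\big[\, f(x+w)\, w \,\big].$$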
Corollary 1.1 guarantees the existence of higher-order derivatives for functions obtained by Gaussian smoothing, and it also provides a way of evaluating the higher-order terms using only the output of the base classifier f. Considering the fact that the truncated m-th order Taylor expansion around x gives better local approximations of the function as m increases, we expect to get better approximations of the safety region by using higher-order information about g at x. Now, in order to understand the limitations of this technique, we consider the asymptotic limit where we have all the higher-order terms of g at x. We observe that:

**Theorem 2.** When $\mu$ is the isotropic Gaussian distribution $\mathcal{N}(0, \sigma^2 I)$, then for all $g \in \mathcal{G}^{\mu}_{\mathcal{F}}$, g is a real analytic function with infinite radius of convergence, i.e., the Taylor series of g around any point w converges to the function g everywhere.

**Asymptotic-Optimality Remark.** As a consequence of Theorem 2, when $\mu$ is the isotropic Gaussian distribution, any function $h \in \mathcal{G}^{\mu}_{\mathcal{F}}$ is uniquely identified by its Taylor expansion. Thus, if we can accurately estimate all the higher-order information about g at any point x, then, along with the fact that $g \in \mathcal{G}^{\mu}_{\mathcal{F}}$, the feasible set in problem (2) reduces to the singleton set containing only the function g, making $p_x(z) = g(z)_c$. This gives us the safety region $SR_L(x) = \{\delta \mid g(x+\delta)_c > 0.5\}$, which is the exact safety region under the binary, top-1 vs. rest, relaxation (we abstain when the top-1 class has probability less than 0.5). Computing the exact safety region implies that we can get large certified ℓp radii for all p. Hence, the impossibility results shown in the recent works by Yang et al. [8], Blum et al. [9] and Kumar et al. [10], or any of their extensions, cannot hold for certification methods that use higher-order information. See Appendix C for remarks on the optimality of certificates obtained using only zeroth- and first-order information.

*Figure 1: A longitudinal slice of the certified safety region around a point under various values of $y^{(0)}, y^{(1)}$, keeping the direction of $y^{(1)}$ fixed along the negative x-axis. The normalized gradient magnitude is given as $\|y^{(1)}\|_2 / \max_{g(x)=y^{(0)}} \|\nabla g(x)\|_2$. The green region is certified using only zeroth-order information and the blue regions using both zeroth- and first-order information.*

### 3.2 Certification for Randomized Smoothing Using First-Order Information

For the rest of this paper, we focus on the certified region produced by solving the optimization problem described in Equation (2) using the zeroth- and first-order local information of g at a point x. The resulting first-order optimization problem is given as follows:

$$p_x(z) = \min_{h \in \mathcal{G}^{\mu}_{\mathcal{F}}} h(z)_c \quad \text{s.t.} \quad h(x) = y^{(0)},\ \nabla h(x) = y^{(1)} \tag{3}$$

where $y^{(0)}, y^{(1)}$ are the zeroth- and first-order local information of g at x estimated using Corollary 1.1. In practice, it is only possible to give interval estimates for $y^{(0)}, y^{(1)}$. The following theorem gives a lower bound for $p_x$ given interval estimates of $y^{(0)}, y^{(1)}$:

**Theorem 3 (Lower Bound of $p_x(z)$).** For a base classifier $f \in \mathcal{F}$, if $g = f * \mu$, where $\mu$ is the isotropic Gaussian distribution $\mathcal{N}(0, \sigma^2 I)$, $y^{(0)} = g(x)$, $y^{(1)} = \nabla g(x)$, then for any unit vector v and any positive value of r, $p_x(x + \sigma r v)$ can be lower bounded by solving the following set of equations for $(c_0, c_1, c_2)$:

$$\int_{-\infty}^{\infty} \phi(x)\, \Phi(c(x))\, dx = q \tag{4}$$
$$\int_{-\infty}^{\infty} \phi(x)\, \phi(c(x))\, dx = m_2 \tag{5a}$$
$$\int_{-\infty}^{\infty} x\, \phi(x)\, \Phi(c(x))\, dx = m_1 \tag{5b}$$

with $q \triangleq y^{(0)}$, $m_1 \triangleq \sigma\, v^T y^{(1)}$, $m_2 \triangleq \sigma\, \|y^{(1)} - (v^T y^{(1)})\, v\|_2$, $c(x) := c_0 + c_1 x + c_2 e^{rx}$, and $\Phi(z)$ ($\phi(z)$) being the CDF (PDF) of the standard normal distribution.
If the solution $(c_0, c_1, c_2)$ of the above equations has $c_1 < 0$, then the lower bound of $p_x(x + \sigma r v)$ is instead given by solving Equations (4) to (5a) with $c(x) := c_0 + c_2 e^{rx}$. Given $p_x(z)$, the certified safety region is the super-level set $\{\delta \mid p_x(x+\delta) > 0.5\}$.

To supplement the above theorem, we point out that using only Equation (4) gives exactly the zeroth-order certification. Intuitively, we see that unlike the zeroth-order certification, which is calculated through equations independent of v, the first-order certification uses the additional Equations (5a), (5b) that depend on v, breaking the symmetry and admitting non-isotropic certified radius bounds. Referring to Theorem 3, we observe that $p_x(x + rv)$ only depends on $g(x)$, $\|\nabla g(x)\|_2$ and the angle between v and $\nabla g(x)$. Thus, the first-order certified region has a cylindrical symmetry with its axis along the vector $y^{(1)}$.

In Figure 1, we compare a longitudinal slice of the safety region around a point x calculated using both zeroth- and first-order information to the longitudinal slice of the safety region obtained using only the zeroth-order information. We use these figures to deduce some properties of these safety regions:

i) The relative improvement in the size of the safety region is largest for low values of $y^{(0)}$ and gets successively smaller as $y^{(0)}$ increases.
ii) For a fixed value of $y^{(0)}$, the volume of the certified safety region grows to infinity (approaching the half-space indicated by the dashed line) as the magnitude $\|y^{(1)}\|_2$ increases to its maximum possible value.
iii) The certified safety region is convex, and for any given values of $y^{(0)}$ and $\|y^{(1)}\|_2$, the directional certified radius is largest along the direction of $y^{(1)}$ and gets successively smaller as we rotate away from it (as the angle between v and $y^{(1)}$ increases), reaching its lowest value in the direction opposite to $y^{(1)}$.

We formalize the third observation into the following propositions, which allow us to calculate the certified radii for the various threat models discussed in the rest of this section.

**Proposition 1.** The certified safety region $SR_L(x)$ calculated using the zeroth- and first-order local information is convex, i.e., if $x_1, x_2 \in SR_L(x)$ then $\frac{x_1 + x_2}{2} \in SR_L(x)$.

Using Proposition 1, we see that along any vector v, if $x + Rv \in SR_L(x)$ for some R, then for all $0 \le r \le R$, $x + rv \in SR_L(x)$. Thus, along any direction v we can define a directional robustness $R_v$ such that for all $0 \le r < R_v$, $x + rv \in SR_L(x)$ and for all $r \ge R_v$, $x + rv \notin SR_L(x)$. It is easy to see that $R_v$ can be given as the solution of the equation $p_x(x + rv) = 0.5$, $r \ge 0$.

**Proposition 2.** For any given value of $y^{(0)}, y^{(1)}$, the directional robustness along v, $R_v$, given by the first-order certification method is a non-increasing function of the angle between v and $y^{(1)}$, i.e., of $\cos^{-1}\big(v^T y^{(1)} / \|y^{(1)}\|_2\big)$.

**(I) Certified ℓ2 Norm Radius:** By Proposition 2, we have that for first-order certification, the directional robustness along all directions v is at least as big as the directional robustness along the direction $-y^{(1)}$, implying that the ℓ2 norm certified radius is equal to $R_{-y^{(1)}}$. Thus, we have:

**Corollary 3.1 (Certified ℓ2 Norm Radius).** For a base classifier $f \in \mathcal{F}$, if $g = f * \mu$, where $\mu$ is the isotropic Gaussian distribution $\mathcal{N}(0, \sigma^2 I)$, $y^{(0)} = g(x)$, $y^{(1)} = \nabla g(x)$, the ℓ2 norm radius R is given as $R = \sigma r$, where $(r, w_1, w_2)$ is the solution of the system of equations:

$$\Phi(w_1 - r) - \Phi(w_2 - r) = 0.5 \tag{6}$$
$$\Phi(w_1) - \Phi(w_2) = q \tag{7a}$$
$$\frac{1}{\sqrt{2\pi}} e^{-w_1^2/2} - \frac{1}{\sqrt{2\pi}} e^{-w_2^2/2} = m_1 \tag{7b}$$

with $q \triangleq y^{(0)}$ and $m_1 \triangleq \sigma \|y^{(1)}\|_2$.

Corollary 3.1 gives the same result as Cohen et al. [5] if we remove constraint (7b) and consider the minimum value of r produced by Equation (6) over all possible pairs $(w_1, w_2)$ satisfying Equation (7a). As a result, the certified ℓ2 norm radius given by Corollary 3.1 is always greater than or equal to the radius given by the existing framework, with equality holding only when the classifier is linear.
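To make Corollary 3.1 concrete, the following numerical sketch (our own illustration; `l2_radius` is a hypothetical helper assuming SciPy, $0.5 < q < 1$, and a feasible gradient value $0 \le m_1 < \phi(\Phi^{-1}(q))$) solves the system (6)-(7b) for the normalized radius r:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def l2_radius(q, m1):
    """Normalized certified l2 radius r (certified radius = sigma * r) from
    q = g(x)_c and m1 = sigma * ||grad g(x)||_2, as in Corollary 3.1."""
    # Solve (7a)-(7b): Phi(w1) - Phi(w2) = q and pdf(w1) - pdf(w2) = m1,
    # using (7a) to express w2 as a function of w1.
    w2_of = lambda w1: norm.ppf(norm.cdf(w1) - q)
    w1 = brentq(lambda w1: norm.pdf(w1) - norm.pdf(w2_of(w1)) - m1,
                norm.ppf(q) + 1e-9, 12.0)
    w2 = w2_of(w1)
    # Solve (6): Phi(w1 - r) - Phi(w2 - r) = 0.5 for the radius r.
    return brentq(lambda r: norm.cdf(w1 - r) - norm.cdf(w2 - r) - 0.5,
                  0.0, 20.0)

# With q = 0.8 and m1 = 0.05, r comes out around 1.16, versus the
# zeroth-order (Cohen et al. [5]) value Phi^{-1}(0.8) ~ 0.84.
print(l2_radius(0.8, 0.05), norm.ppf(0.8))
```

Consistent with the discussion above, as $m_1$ approaches its maximum feasible value (a linear classifier), the computed r shrinks back to the zeroth-order radius $\Phi^{-1}(q)$.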
**(II) Certified ℓ1 Norm Radius:** Using Proposition 1, we see that the minimum ℓ1 norm radius is along one of the (signed) basis vectors, and using Proposition 2, we see that the minimum directional robustness must be along the signed basis vector with the biggest angle with $y^{(1)}$. The magnitude of the projection of $y^{(1)}$ along this vector can be given as $\|y^{(1)}\|_{\infty}$.

**Corollary 3.2 (Certified ℓ1 Norm Radius).** For a base classifier $f \in \mathcal{F}$, if $g = f * \mu$, where $\mu$ is the isotropic Gaussian distribution $\mathcal{N}(0, \sigma^2 I)$, $y^{(0)} = g(x)$, $y^{(1)} = \nabla g(x)$, the ℓ1 norm radius R is obtained by solving $p_x(x + Rv) = 0.5$, where $p_x(x + Rv)$ is given by solving the problem in Theorem 3 with $m_1 \triangleq \sigma \|y^{(1)}\|_{\infty}$ and $m_2 \triangleq \sigma \sqrt{\|y^{(1)}\|_2^2 - \|y^{(1)}\|_{\infty}^2}$.

**(III) Certified ℓ∞ Norm Radius:** Similarly, the minimum ℓ∞ norm radius must occur along a vector of all 1s and −1s, and using Proposition 2 we see it must be along the vector of all 1s and −1s with the biggest angle with $y^{(1)}$. The magnitude of the normalized projection of $y^{(1)}$ along this vector can be given as $\|y^{(1)}\|_1 / \sqrt{d}$.

**Corollary 3.3 (Certified ℓ∞ Norm Radius).** For a base classifier $f \in \mathcal{F}$, if $g = f * \mu$, where $\mu$ is the isotropic Gaussian distribution $\mathcal{N}(0, \sigma^2 I)$, $y^{(0)} = g(x)$, $y^{(1)} = \nabla g(x)$, the ℓ∞ norm radius R is obtained by solving $p_x(x + Rv) = 0.5$, where $p_x(x + Rv)$ is given by solving the problem in Theorem 3 with $m_1 \triangleq \frac{\sigma}{\sqrt{d}} \|y^{(1)}\|_1$ and $m_2 \triangleq \sigma \sqrt{\|y^{(1)}\|_2^2 - \|y^{(1)}\|_1^2 / d}$.

*Table 1: Estimators for the different norm values of $\nabla g(x)$ (\* marks newly designed estimators). The high-confidence upper and lower bounds are built from the empirical batch means $\bar{X}_{n_1}, \bar{Y}_{n_2}$, their inner product $\bar{X}_{n_1}^T \bar{Y}_{n_2}$, and confidence terms $t$, $\epsilon_u$, $\epsilon_l$ of the form given in Theorem 5.*

**(IV) Certified ℓp Norm Radius over a Subspace:** Going beyond ℓp norm threat models, we give the subspace ℓp norm threat model, which allows us to, among other things, measure the sensitivity of a deep learning model over a subset of the pixels instead of the entire image.

**Definition 1 (Subspace Certified ℓp Norm Radius).** Given a subspace S, we define the ℓp certified radius in the subspace as

$$\max_{R}\; R \quad \text{s.t.} \quad \forall \delta \in S \text{ with } \|\delta\|_p \le R:\ \arg\max(g(x+\delta)) = \arg\max(g(x)).$$

Using Propositions 1 and 2, we are able to extend our results to give:

**Corollary 3.4 (Subspace Certified ℓp Norm Radius).** For a base classifier $f \in \mathcal{F}$, if $g = f * \mu$, where $\mu$ is the isotropic Gaussian distribution $\mathcal{N}(0, \sigma^2 I)$, $y^{(0)} = g(x)$, $y^{(1)} = \nabla g(x)$, and a subspace S with orthogonal projection matrix $P_S$, for p = 1, 2, ∞, the subspace ℓp norm certified radius R is obtained by solving $p_x(x + Rv) = 0.5$, where $p_x(x + Rv)$ is given by solving the problem in Theorem 3 with $m_1 \triangleq \sigma \|P_S\, y^{(1)}\|_{p^*}$ and $m_2 \triangleq \sigma \sqrt{\|y^{(1)}\|_2^2 - \|P_S\, y^{(1)}\|_{p^*}^2}$, where $p^*$ is the dual norm of p.

## 4 Numerical Estimation of First-Order Information

In the previous section, a concrete set of methods was proposed to calculate the first-order certificate for randomized smoothing, assuming access to the first-order information $y^{(1)}$. In this section, we therefore complete the framework by giving ways to obtain high-confidence estimates of the vector $y^{(1)}$. Recalling that the hard-label classifier f is not continuous and hence not differentiable, we need to use the zeroth-order information about f to approximate $y^{(1)}$, i.e., use the result from Corollary 1.1: $y^{(1)} = \mathbb{E}_{y \sim \mu}\big[\frac{y}{\sigma^2}\, f(x + y)\big]$.
As the per-sample estimator $\frac{y}{\sigma^2} f(x+y)$ is noisy [13], approximating the expectation using Monte-Carlo sampling is hard, i.e., the number of samples required to achieve a non-trivial high-confidence estimate of $y^{(1)}$ scales linearly with the input dimension. In practice, this translates to a sample complexity of roughly 10 billion for CIFAR10 and nearly a trillion for ImageNet. Although estimating the vector $y^{(1)}$ is hard, we observe from Corollaries 3.1, 3.2, 3.3 and 3.4 that we do not need the whole vector $y^{(1)}$, but only $\|y^{(1)}\|_2, \|y^{(1)}\|_{\infty}$ for the ℓ1 certified radius; $\|y^{(1)}\|_2$ for the ℓ2 certified radius; $\|y^{(1)}\|_1, \|y^{(1)}\|_2$ for the ℓ∞ certified radius; and $\|P_S\, y^{(1)}\|_{p^*}, \|y^{(1)}\|_2$ for the subspace ℓp certified radius (p = 1, 2, ∞). Most of these statistics can be estimated much more efficiently due to the following observation:

**Theorem 4.** Given a black-box classifier f and the random vector $z = w\,\big(f(x+w)_c - \frac{1}{2}\big)$ where $w \sim \mathcal{N}(0, \sigma^2 I)$, we have that $z - \sigma^2 y^{(1)}$ is a sub-gaussian random vector with parameter $k = \sigma^2\big(\frac{1}{4} - \frac{1}{8\pi e}\big)$. For convenience, with a slight abuse of notation, we denote this as $(z - \sigma^2 y^{(1)}) \sim \mathrm{subG}(k)$.

Leveraging the properties of sub-gaussian random variables, we establish the following general result:

**Theorem 5.** For any $\alpha \ge 2e^{-d/16}$, if we have two random vectors X, Y such that $(X - \beta) \sim \mathrm{subG}(k_1)$ and $(Y - \beta) \sim \mathrm{subG}(k_2)$, then using

$$t = \sqrt{2 k_1 k_2\, d \log(2/\alpha)}, \qquad \epsilon_u = \sqrt{\frac{(k_1 + k_2)\log(2/\alpha)}{2\,(X^T Y + t)}}, \qquad \epsilon_l = \sqrt{\frac{(k_1 + k_2)\log(2/\alpha)}{2\,(X^T Y - t)}},$$

we obtain, with probability at least $1 - \alpha$,

$$\sqrt{X^T Y - t}\,\Big(\sqrt{1 + \epsilon_l^2} + \epsilon_l\Big)^{-1} \;\le\; \|\beta\|_2 \;\le\; \sqrt{X^T Y + t}\,\Big(\sqrt{1 + \epsilon_u^2} + \epsilon_u\Big).$$

Now, let $X = \bar{X}_{n_1}$, $Y = \bar{Y}_{n_2}$ be the empirical averages of $n_1, n_2$ independent samples of the random variable z. Then, using Theorems 4 and 5 allows us to give the estimators illustrated in Table 1. However, we see that the estimation of $\|y^{(1)}\|_1$ still scales linearly with d, making it impractical. Thus, the ℓ∞ norm certified radius is still given as $1/\sqrt{d}$ times the ℓ2 norm certified radius. Finally, for estimating the ℓp norm (p = 1, 2, ∞) over a subspace S, we show that we can use the corresponding estimators from Table 1, except with $\bar{X}^S_{n_1} = P_S \bar{X}_{n_1}$, $\bar{Y}^S_{n_2} = P_S \bar{Y}_{n_2}$ and $d = d_S$, where $P_S$ is the orthogonal projection matrix for S and $d_S$ is the dimension of S.
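To illustrate how the Table 1 statistics are assembled in practice, here is a minimal two-batch sketch (our own illustration with hypothetical helper names; it assumes `f` labels a batch of inputs at once and returns hard labels, and it produces only point estimates, whereas Table 1 additionally applies the confidence terms t, $\epsilon_u$, $\epsilon_l$ from Theorem 5). Because the two batches are independent, $\mathbb{E}[\bar{X}_{n_1}^T \bar{Y}_{n_2}] = \|\sigma^2 y^{(1)}\|_2^2$, so the inner product of the batch means estimates $\|y^{(1)}\|_2^2$ without first estimating the full vector:

```python
import numpy as np

def batch_mean_z(f, x, sigma, c, n, rng):
    """Empirical mean of z = w * (f(x+w)_c - 1/2), whose expectation is
    sigma^2 * grad g(x) (Theorem 4).  `f` returns hard labels for a batch."""
    w = rng.normal(0.0, sigma, size=(n, x.size))
    indicator = (f(x + w) == c).astype(float)      # f(x+w)_c in {0, 1}
    return (w * (indicator - 0.5)[:, None]).mean(axis=0)

def grad_l2_norm_estimate(f, x, sigma, c, n1, n2, seed=0):
    """Point estimate of ||grad g(x)||_2 from two independent batches."""
    rng = np.random.default_rng(seed)
    X = batch_mean_z(f, x, sigma, c, n1, rng)
    Y = batch_mean_z(f, x, sigma, c, n2, rng)
    # Independence of the batches gives E[X^T Y] = ||sigma^2 grad g(x)||^2.
    return np.sqrt(max(float(X @ Y), 0.0)) / sigma**2
```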
## 5 Experimental Results

We empirically study the performance of the new certification schemes on the standard image classification datasets CIFAR10 and ImageNet. We reuse the models given by Cohen et al. [5] and calculate the certified accuracy at radius R by counting the test-set samples that are correctly classified by the smoothed classifier g with certified radii of at least R. For both our proposed certificate and the baseline certificate [5], we use a failure probability of α = 0.001 and N = 200,000 samples for CIFAR10 and N = 1,250,000 samples for ImageNet. For the ℓ∞ radius we require many more samples to get better results, as our current estimator is too noisy. In order to show the possibility presented by our approach, we also provide results for the ℓ∞ certified radius for CIFAR10 estimated using 4,000,000 samples. In our plots, we present, for each threat model, the upper envelope of certified accuracies attained over the range of considered σ ∈ {0.12, 0.25, 0.50, 1.00}. Further experiments (including all ImageNet experiments) are given in the appendix.

*Figure 2: Increase in certified accuracy for CIFAR10 under various threat models: (a) certified ℓ1 radius, (b) certified ℓ2 radius, (c) certified ℓ∞ radius, (d) certified subspace ℓ2 radius.*

### 5.1 Certified Accuracy

As expected, we see from Figure 2b that the new framework gives only a marginal improvement over the ℓ2 radius certified by existing methods. This follows from the fact that the existing methods already produce near-optimal certified ℓ2 radii. However, certifying a significantly bigger safety region allows us to give significant improvements on the certified ℓ1 radius, the ℓ∞ radius and the subspace ℓ2 radii (the subspace considered here is the red channel of the image, i.e., we only allow perturbations over the red component of the RGB pixels of the image). We note here that, considering the performance for individual threat models, the ℓ1 norm certified accuracy given here is still smaller than the existing state-of-the-art. However, the current state-of-the-art for the ℓ1 certified radius uses the uniform smoothing distribution, which performs quite poorly for the ℓ2 norm radius. If we consider multiple threat models simultaneously, the proposed method gives the best joint performance. Similar findings are also reported for the ImageNet experiments (given in the appendix). Moreover, we see that for the certified ℓ∞ radius, the new method gives an improvement over the certified ℓ∞ radius provided by existing methods for Gaussian noise. As Gaussian noise was shown to achieve the best possible certified ℓ∞ radius for existing methods, this provides an improvement over the best possible ℓ∞ radius certifiable using existing methods. Although the improvements are small in magnitude, we believe they can be improved upon by using tighter estimators for $\|y^{(1)}\|_1$.

## 6 Conclusion and Future Work

In this work, we give a new direction for improving the robustness certificates of randomized-smoothing-based classifiers. We have shown that, even in the black-box (hard-label classifier) setting, leveraging more information about the distribution of the labels among the sampled points allows us to certify larger regions and thus guarantee large certified radii against multiple threat models simultaneously. We have shown this to hold theoretically and also demonstrated it on CIFAR10 and ImageNet classifiers. Additionally, for Gaussian smoothing, the proposed framework gives a way to circumvent the recently proposed impossibility results and also promises a threat-model-agnostic asymptotic-optimality result. However, we note that the first-order smoothing technique given in this paper is only a proof of concept, showing that it is possible to better leverage local information to certify larger safety regions without changing the smoothing measure. In the future, this work could be extended to derive and use, for any given threat model, the best local information to exploit in order to improve the certificates for that threat model.

## Broader Impact

In recent years, machine learning and intelligent systems have started to be widely adopted into everyday life, including several safety-critical applications. The widespread use of these systems requires them to be held to a higher level of scrutiny. One such requirement is robustness, i.e., a system's prediction should not be too sensitive to small input perturbations. In general, undefended/vanilla deep learning models have been found to be extremely sensitive to small imperceptible perturbations. This in turn has given rise to many defense techniques able to overcome existing threats to robustness. However, most of these models have later proved to be ineffective against newer threats. In turn, this has given rise to a class of techniques which provide some mathematical guarantees about the robustness of their predictions.
In this paper, we extended one such framework that provides some mathematically provable guarantees against any single threat model chosen from a large class of threat models. Specifically, we provided a way to construct models with some mathematically provable guarantees against multiple simultaneous threat models. The benefits of this contribution include providing a more holistic picture of the robustness guarantees of deployed models. This can hopefully bring us a step closer to trustworthy AI. Since this work helps build models that give some guarantees on their behavior, we hope it will also lead to the wider adoption of deep learning models. Moreover, we hope that the guarantees given by these models will allow people to have an improved sense of security when using deep-learning-based products. Considering intensive applications such as product matching, categorization in assembly lines or video surveillance, which require long periods of hard effort, robust models will give more reliable and efficient means of achieving the task while considerably reducing the burden on humans.

As for the potentially negative effects of this contribution, we feel that any progress towards robustness certification could easily become a double-edged sword. This is because, if adopted blindly (i.e., specifically without extremely careful attention to the types of guarantees provided and, most importantly, those NOT provided), it may also give the false sense of security that current deep-learning systems are already ready for deployment. These robustness concerns are one of the major bottlenecks for the adoption of deep-learning models in safety-critical applications such as self-driving cars and air traffic control systems; hence, the ability to provide some robustness guarantees might result in widespread premature adoption of deep learning models in such applications. We would like to candidly admit to developer and user readers of this paper that the presence of some robustness guarantees for a model does not mean we understand the model or that it is completely safe to deploy it. On the contrary, the authors of this paper believe that we are still quite far from such a safe-adoption scenario.

Further issues lie beyond robustness. A negative effect of this publication, and of publications like it on robustness, is that it might give the highly incorrect impression that with robustness one can feel safe in deploying AI systems in society. On the contrary, we would like to candidly admit and remind the readers that deep-learning-based models suffer from problems such as a lack of accuracy guarantees for out-of-distribution data, lack of fairness, lack of explainability and many others that MUST be solved before AI systems are viable for real-world applications. More specifically, regarding the lack of accuracy guarantees for out-of-distribution data: the robustness of a model does not necessarily mean its prediction is always accurate. Robustness and accuracy are indeed two disjoint concepts. While it is well known that an accurate prediction might not be robust, it is also essential to keep in mind that a robust prediction need not always be accurate. In other words, some models may generate a prediction that is indeed highly robust (i.e., does not change upon perturbation) but consistently incorrect!
As a result, in applications such as air traffic control systems, the models might display extremely bad behaviour in situations not present in their training data, while the presence of some robustness guarantees might give a false sense of security to an inexperienced user (e.g., one not familiar with the admittedly subtle mathematical intricacies of different threat models) that these systems are completely fault-tolerant. In conclusion, we would like to remark that although this paper helps to take a step towards building more trustworthy AI, that goal is indeed still quite far off.

## Funding Disclosure

This work was funded by the MIT-IBM collaboration under the research grant "Toward Trustworthy AI: Efficient Algorithms for Building Provably Robust and Verifiable Neural Networks".

## References

[1] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, Intriguing properties of neural networks, in ICLR, 2014.
[2] I. Goodfellow, J. Shlens, and C. Szegedy, Explaining and harnessing adversarial examples, in ICLR, 2015.
[3] M. Lecuyer, V. Atlidakis, R. Geambasu, D. Hsu, and S. Jana, Certified robustness to adversarial examples with differential privacy, in 2019 IEEE Symposium on Security and Privacy (SP), pp. 656–672, 2019.
[4] Y. Li, X. Bian, and S. Lyu, Attacking object detectors via imperceptible patches on background, arXiv preprint arXiv:1809.05966, 2018.
[5] J. Cohen, E. Rosenfeld, and Z. Kolter, Certified adversarial robustness via randomized smoothing, in Proceedings of the 36th International Conference on Machine Learning (K. Chaudhuri and R. Salakhutdinov, eds.), vol. 97 of Proceedings of Machine Learning Research, pp. 1310–1320, PMLR, 2019.
[6] B. Li, C. Chen, W. Wang, and L. Carin, Certified adversarial robustness with additive noise, in NeurIPS, 2019.
[7] K. D. Dvijotham, J. Hayes, B. Balle, Z. Kolter, C. Qin, A. György, K. Xiao, S. Gowal, and P. Kohli, A framework for robustness certification of smoothed classifiers using f-divergences, in ICLR, 2020.
[8] G. Yang, T. Duan, E. Hu, H. Salman, I. Razenshteyn, and J. Li, Randomized smoothing of all shapes and sizes, arXiv preprint arXiv:2002.08118, 2020.
[9] A. Blum, T. Dick, N. Manoj, and H. Zhang, Random smoothing might be unable to certify ℓ∞ robustness for high-dimensional images, arXiv preprint arXiv:2002.03517, 2020.
[10] A. Kumar, A. Levine, T. Goldstein, and S. Feizi, Curse of dimensionality on randomized smoothing for certifiable robustness, arXiv preprint arXiv:2002.03239, 2020.
[11] X. Liu, M. Cheng, H. Zhang, and C.-J. Hsieh, Towards robust neural networks via random self-ensemble, in Proceedings of the European Conference on Computer Vision (ECCV), pp. 369–385, 2018.
[12] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille, Mitigating adversarial effects through randomization, arXiv preprint arXiv:1711.01991, 2017.
[13] H. Salman, J. Li, I. Razenshteyn, P. Zhang, H. Zhang, S. Bubeck, and G. Yang, Provably robust deep learning via adversarially trained smoothed classifiers, in Advances in Neural Information Processing Systems, pp. 11289–11300, 2019.
[14] G.-H. Lee, Y. Yuan, S. Chang, and T. S. Jaakkola, Tight certificates of adversarial robustness for randomly smoothed classifiers, in Advances in Neural Information Processing Systems, 2019.
[15] J. Teng, G.-H. Lee, and Y. Yuan, ℓ1 adversarial robustness certificates: a randomized smoothing approach, 2019.
[16] D. Zhang*, M. Ye*, C. Gong*, Z. Zhu, and Q. Liu, Filling the soap bubbles: Efficient black-box adversarial certification with non-Gaussian smoothing, 2020.
[17] H. Chernoff and H. Scheffe, A generalization of the Neyman-Pearson fundamental lemma, The Annals of Mathematical Statistics, pp. 213–225, 1952.
[18] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[19] J. Wendel, Note on the gamma function, The American Mathematical Monthly, vol. 55, no. 9, pp. 563–564, 1948.