# Adversarial robustness via robust low rank representations

Pranjal Awasthi (Google Research and Rutgers University, pranjal.awasthi@rutgers.edu), Himanshu Jain (Google Research, himj@google.com), Ankit Singh Rawat (Google Research, ankitsrawat@google.com), Aravindan Vijayaraghavan (Northwestern University, aravindv@northwestern.edu)

Abstract. Adversarial robustness measures the susceptibility of a classifier to imperceptible perturbations made to the inputs at test time. In this work we highlight the benefits of natural low rank representations, which often exist for real data such as images, for training neural networks with certified robustness guarantees. Our first contribution is for certified robustness to perturbations measured in ℓ2 norm. We exploit low rank data representations to provide improved guarantees over state-of-the-art randomized smoothing-based approaches on standard benchmark datasets such as CIFAR-10 and CIFAR-100. Our second contribution is for the more challenging setting of certified robustness to perturbations measured in ℓ∞ norm. We demonstrate empirically that natural low rank representations have inherent robustness properties that can be leveraged to provide significantly better guarantees for certified robustness to ℓ∞ perturbations in those representations. Our certificate of ℓ∞ robustness relies on a natural quantity, the ∞→2 matrix operator norm associated with the representation, to translate robustness guarantees from ℓ2 to ℓ∞ perturbations. A key technical ingredient for our certification guarantees is a fast algorithm, with provable guarantees, based on the multiplicative weights update method to provide upper bounds on the above matrix norm. Our algorithmic guarantees improve upon the state of the art for this problem, and may be of independent interest.
1 Introduction

It is now well established across several domains like images, audio and natural language that small input perturbations that are imperceptible to humans can fool deep neural networks at test time [1, 2, 3, 4]. This phenomenon has led to a flurry of research on adversarial robustness in recent years (see Section A for a discussion of related work). Following most prior work in this area [5, 6, 7, 8, 9], we will study the setting where adversarial perturbations to an input x are measured in an ℓp norm (p = 2 or p = ∞). In this work, we study methods for certified adversarial robustness in the framework developed in [10, 11]. The goal is to output a classifier f that on input x ∈ ℝⁿ outputs a prediction y in the label space Y, along with a certified radius r_f(x). The classifier is guaranteed to be robust at x up to the radius r_f(x) (with high probability), i.e., ∀z : ∥z∥_p ≤ r_f(x), f(x + z) = f(x).

34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.

For an ℓp norm and ε > 0, the certified accuracy of a classifier f is defined as

acc^(ℓp)_ε(f) = P_{(x,y)∼D}[ f(x) = y and r_f(x) ≥ ε ],  (1)

where D is the data distribution generating test inputs. We call the radius r_f(x) returned by the classifier the certified radius at x. When ε = 0 this is the natural accuracy of f. For certified adversarial robustness to ℓ2 perturbations, the randomized smoothing procedure proposed in [10, 11] is a simple and efficient method that can be applied to any neural network. Randomized smoothing works by creating a smoothed version of a given classifier by adding Gaussian noise to the inputs (see Section 2). The smoothed classifier exhibits certain Lipschitzness properties, from which one can derive good certified robustness guarantees.
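As a concrete illustration of the smoothing step, the smoothed prediction can be approximated by a Monte Carlo vote over noisy copies of the input. The sketch below is our own minimal illustration with a toy one-dimensional base classifier, not the paper's implementation; all names are ours.

```python
import random
from collections import Counter

def smoothed_predict(f, x, sigma, n_samples=1000, rng=None):
    """Monte Carlo estimate of the smoothed classifier
    g(x) = argmax_y P(f(x + delta) = y), with delta ~ N(0, sigma^2 I)."""
    rng = rng or random.Random(0)
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[f(noisy)] += 1
    top_label, top_count = votes.most_common(1)[0]
    return top_label, top_count / n_samples  # label and empirical p_A

# Toy base classifier: sign of the first coordinate.
f = lambda z: int(z[0] > 0)
label, p_hat = smoothed_predict(f, [1.5, -0.3], sigma=0.5)
```

Here the input sits 3 standard deviations from the decision boundary, so almost all noisy votes agree and the empirical top-class probability p_hat is close to 1.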
The study of randomized smoothing for certified ℓ2 robustness is an active research area, and the current best guarantees are obtained by incorporating the smoothed classifier into the training process [12] (see Section A). It seems much more challenging to obtain certified adversarial robustness to ℓ∞ perturbations [13, 14, 15]. In particular, the design of a procedure akin to randomized smoothing has been difficult to achieve for ℓ∞ perturbations. One approach to obtain certified ℓ∞ robustness is to translate a certified radius guarantee of ε for ℓ2 perturbations (via randomized smoothing) into an ε/√n certified radius guarantee for ℓ∞ perturbations; here n is the dimensionality of the ambient space. Furthermore, recent work [16, 17, 18] has established lower bounds proving that randomized smoothing based methods cannot break the above √n barrier for ℓ∞ robustness in the worst case. However, real data such as images are not worst case and often exhibit natural low rank structure. In this work we show how to leverage such natural low-rank representations of the data in order to design algorithms based on randomized smoothing with improved certified robustness guarantees for both ℓ2 and ℓ∞ perturbations.

Our Contributions. We now describe our main contributions.

Improved certified ℓ2 robustness: Our first contribution is to design new smoothed classifiers for achieving certified robustness to ℓ2 perturbations. These classifiers achieve improved tradeoffs between natural accuracy and certified accuracy at higher radii. We achieve this by leveraging the existence of good low-rank representations for the data. We modify the randomized smoothing approach to selectively inject more noise along certain directions, without compromising the accuracy of the classifier. The larger amount of noise leads to a classifier that is less sensitive to ℓ2 perturbations, and hence achieves higher certified accuracy across a wide range of radii.
We empirically demonstrate the improvements obtained by our approach on image data in Section 2.

Fast algorithms for translating certified robustness guarantees from ℓ2 to ℓ∞: For the more challenging setting of ℓ∞ robustness we consider classifiers of the form f(Px), where P is an arbitrary linear map and f represents an arbitrary neural network. When translating certified robustness guarantees for ℓ2 perturbations into guarantees for ℓ∞ perturbations, the loss incurred is captured by the ∞→2 operator norm of the matrix P. While computing this operator norm is NP-hard, we design a fast approximate algorithm, based on the multiplicative weights update method, with provable guarantees. Our algorithmic guarantees give significant improvements over the best known bounds [19, 20] for this problem and may be of independent interest (see Section 3).

Certified ℓ∞ robustness in natural data representations: Real data such as images have natural representations that are often used in image processing, e.g., via the Discrete Cosine Transform (DCT). Via an empirical study we highlight the need for achieving ℓ∞ robustness in the DCT basis. More importantly, we demonstrate that the representation in the DCT basis is robust, i.e., there exist low rank projections that capture most of the signal in the data and that at the same time have small ∞→2 operator norm. (This is also true for domains such as audio in the DCT basis; see Appendix G for experimental evidence.) We develop a fast heuristic based on sparse PCA to find such robust projections. When combined with our multiplicative weights based algorithm, this leads to a new training procedure based on randomized smoothing. Our procedure can be applied to any network architecture and provides stronger guarantees on robustness to ℓ∞ perturbations in the DCT basis.

2 Certified Robustness to ℓ2 Perturbations

We build upon the randomized smoothing technique proposed in [10, 11] and further developed in [12].
Consider a multiclass classification problem and a classifier f : ℝⁿ → Y, where Y is the label set. Given f, randomized smoothing produces a smoothed classifier g where

g(x) = argmax_{y∈Y} P(f(x + δ) = y).  (2)

Here δ ∼ N(0, σ²I) is the Gaussian noise added. The following proposition holds.

Proposition 2.1 ([10, 11]). Given a classifier f, let g be its smoothed version as defined in (2) above. On an input x, and for δ ∼ N(0, σ²I), define y_A := argmax_y P(f(x + δ) = y), and let p_A = P(f(x + δ) = y_A). Then the prediction of g at x is unchanged up to ℓ2 perturbations of radius

(σ/2) (Φ⁻¹(p_A) − Φ⁻¹(p_B)).  (3)

Here p_B = max_{y≠y_A} P(f(x + δ) = y) and Φ⁻¹ is the inverse of the standard Gaussian CDF.

Hence, randomized smoothing provides a fast method to certify the robustness of any given classifier on various inputs. In order to get robustness to large perturbations it is desirable to choose the noise magnitude σ as large as possible. However, there is a natural tradeoff between the amount of noise added and the natural accuracy of the classifier. As an example, consider an input x ∈ ℝⁿ of ℓ2 length √n. If σ is the average amount of noise added, then one is restricted to choosing σ to be a small constant in order for the noise to not overwhelm the signal. However, it is well known that natural data such as images are low dimensional in nature. Figure 5 in Appendix B shows that for the CIFAR-10 and CIFAR-100 datasets, even when projected, via PCA, onto 200 dimensions, the reconstruction error remains small. If the input is close to an r-dimensional subspace, then it is natural to add noise only within the subspace for smoothing. Formally, let Π be the projection matrix onto an r-dimensional subspace and let x be such that ∥Πx∥₂ ≈ ∥x∥₂ = √n. For δ ∼ N(0, σ²I) we have ∥Πδ∥₂ ≈ σ√r. Hence if we only add noise within the subspace, then σ can be as large as √(n/r), as opposed to a constant, without significantly affecting the natural accuracy.
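For concreteness, the certified radius in (3) is straightforward to evaluate once p_A and p_B have been estimated. The snippet below is a minimal sketch (the function name is ours) using the standard Gaussian inverse CDF from Python's standard library.

```python
from statistics import NormalDist

def certified_l2_radius(sigma, p_a, p_b):
    """Certified l2 radius from Proposition 2.1:
    (sigma/2) * (Phi^{-1}(p_A) - Phi^{-1}(p_B))."""
    inv = NormalDist().inv_cdf  # inverse of the standard Gaussian CDF
    return 0.5 * sigma * (inv(p_a) - inv(p_b))

# With p_A = 0.9 and p_B = 0.1: Phi^{-1}(0.9) = -Phi^{-1}(0.1) ~ 1.2816.
r = certified_l2_radius(sigma=0.5, p_a=0.9, p_b=0.1)      # ~0.64
# Scaling sigma up (as the subspace trick allows) scales the radius linearly.
r_big = certified_l2_radius(sigma=2.0, p_a=0.9, p_b=0.1)  # 4x larger
```

The linear dependence on σ is exactly why injecting larger noise within the subspace, at fixed p_A and p_B, yields certificates at proportionally larger radii.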
We formalize this into an efficient training algorithm as follows: we take a base classifier/neural network f(x) and replace it with the smoothed classifier g_Π(x), where

g_Π(x) = argmax_{y∈Y} P(f(Πx + δ_Π) = y),  (4)

where Π is a projection matrix onto an r-dimensional subspace and δ_Π is a standard Gaussian of variance σ² that lies within the range of Π. For data such as images, good projections Π can be obtained via methods like PCA. Furthermore, certifying the robustness of our proposed smoothed classifier can be easily incorporated into existing pipelines for adversarial training with minimal overhead. In particular, using the rotational symmetry of Gaussian distributions it is easy to show the following.

Proposition 2.2. Given a base classifier f : ℝⁿ → Y and a projection matrix Π, on any input x, the smoothed classifier g_Π(x) as defined in (4) is equivalent to the classifier given by

g_Π(x) = argmax_{y∈Y} P(f(Π(x + δ)) = y).  (5)

Here δ is a standard Gaussian of variance σ² in every direction.

Hence constructing our proposed smoothed classifier simply requires adding a linear transformation layer to any existing network architecture before training via randomized smoothing. We propose to train the smoothed classifier as defined in (5) by minimizing its adversarial cross entropy loss as proposed in [6]. However, since dealing with the argmax is hard from an optimization point of view, we follow the approach of [12] and instead minimize the cross entropy loss of the following soft classifier

G_Π(x) = E_{δ∼N(0,σ²I)}[ f(Π(x + δ)) ].  (6)

This leads to the following objective, where ℓ_ce is the standard cross-entropy loss and ε > 0 is the perturbation radius chosen for the training procedure:

argmin_f E_{(x,y)} [ max_{z:∥z∥₂≤ε} ℓ_ce(G_Π(x + z), y) ].  (7)

Following [6, 12], the inner maximization of finding adversarial perturbations is solved via projected gradient descent (PGD), and given the adversarial perturbations, the outer minimization uses stochastic gradient descent.
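The soft classifier in (6) can likewise be estimated by averaging the network's probability outputs over noise draws. Below is a minimal sketch with a toy two-class softmax standing in for a real network; all names are ours, not the paper's.

```python
import math
import random

def soft_classifier(f_probs, x, sigma, n_samples=500, rng=None):
    """Monte Carlo estimate of G(x) = E_delta[f(x + delta)], where
    f_probs returns a probability vector (e.g. a softmax output)."""
    rng = rng or random.Random(0)
    k = len(f_probs(x))
    avg = [0.0] * k
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        p = f_probs(noisy)
        avg = [a + pi / n_samples for a, pi in zip(avg, p)]
    return avg

# Toy 2-class softmax "network" driven by the first coordinate.
def f_probs(z):
    e0, e1 = math.exp(-z[0]), math.exp(z[0])
    return [e0 / (e0 + e1), e1 / (e0 + e1)]

g = soft_classifier(f_probs, [0.8, 0.0], sigma=0.25)  # a probability vector
```

Because the output is a probability vector rather than a hard label, its cross entropy against the true label is differentiable, which is what makes the PGD inner step and the SGD outer step in (7) possible.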
This leads to the following training procedure.

Algorithm 1 Adversarial training via projections
1: function RobustTrain(training data (x1, y1), ..., (xm, ym), subspace rank r, base noise magnitude σ, λ ∈ [0, 1], number of steps T, mini-batch size b)
2: Perform PCA on the (unlabeled) data matrix A ∈ ℝ^{n×m} to obtain a rank-r projection matrix Π.
3: Set G_Π as in (6) with σ = λ√(n/r).
4: for t = 1, ..., T do
5:   Obtain a mini-batch of b examples (x_t1, y_t1), ..., (x_tb, y_tb).
6:   For each x_ti use projected gradient ascent on the inner maximization in (7) to get x′_ti.
7:   Given perturbed examples {(x′_ti, y_ti)}_{i∈[b]}, update network parameters via SGD.
8: Output the smoothed classifier g_Π(x).

Empirical Evaluation. We compare Algorithm 1 with the algorithm of [12] for various values of σ and ε (used during training to optimize (7)). We choose ε ∈ {0.25, 0.5, 0.75, 1.0}, and for each ε we choose the value of σ as described in [12]. In each case, we train the classifier proposed in [12] using a noise magnitude σ, and we train our proposed smoothed classifier using higher noise values of λσ√(n/r), where λ is a parameter that we vary. To obtain the projection matrix Π we perform a PCA on each image channel separately and use the top 200 principal components. In all experiments, we train a ResNet-32 network on the CIFAR-10 dataset by optimizing (7). The complexity of Algorithm 1 is comparable to the complexity of training a smoothed classifier as in the work of [12]. The PCA step incurs a one-time preprocessing cost, and the projection step at the beginning simply corresponds to adding a linear layer to an existing ResNet architecture. As an example, on the CIFAR-10 dataset, for ε = 0.25, training the classifier of [12] takes on average 21.27 seconds per epoch, whereas Algorithm 1 takes 21.29 seconds per epoch on average. The same behavior holds across different parameter settings.
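Step 2 of Algorithm 1 is an ordinary PCA. The self-contained numpy sketch below (synthetic data and names ours) builds a rank-r projection matrix from the top principal directions and checks that it is indeed an orthogonal projection with small reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "images": n = 8 features, data concentrated near a rank-3 subspace.
basis = np.linalg.qr(rng.normal(size=(8, 3)))[0]  # orthonormal columns
data = rng.normal(size=(500, 3)) @ basis.T + 0.01 * rng.normal(size=(500, 8))

# Step 2 of Algorithm 1: rank-r projection matrix via PCA.
r = 3
cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
top = eigvecs[:, -r:]                    # top-r principal directions
Pi = top @ top.T                         # the rank-r projection matrix

# Relative reconstruction error of the data under the projection.
recon_err = np.linalg.norm(data - data @ Pi) / np.linalg.norm(data)
```

Since `top` has orthonormal columns, `Pi` is symmetric and idempotent (Π² = Π), which is what lets the algorithm treat Πx as "the signal part" of x when adding noise only within the subspace.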
Figure 1 shows a comparison of certified accuracies for different radii and different values of λ. See Appendix B for a description of the hyperparameters and additional experiments. For applications to other domains such as text data, where the input representation is not fixed, training the linear projection Π along with the network could be beneficial and in fact necessary. For the image datasets above, we also experimented with simultaneously training the projection with the network parameters. The results obtained were similar to using a fixed projection and we did not see any significant advantage. As can be seen from Figure 1, varying the value of λ lets us trade off lower accuracy at small values of the radius for a significant gain in certified accuracy at higher radii as compared to the method of [12]. In particular, we find that choosing values of λ close to 0.5 leads to networks that can certify accuracy at much higher radii with minimal to no loss in the natural accuracy as compared to the approach of [12]. In Figure 2 we present the results of our training procedure for various values of ε and σ and compare with the ℓ2 smoothing method of [12] on the CIFAR-10 and CIFAR-100 datasets. For both datasets, our trained networks outperform the method of [12] across a large range of radius values. For higher values of the radius (say, 0.5) our method achieves a desired certified accuracy with significantly higher natural accuracy as compared to the method of [12]. For instance, on the CIFAR-10 dataset, at a radius of 1.0 and a desired certified accuracy of at least 0.35, the method of [12] achieves a natural accuracy of 0.5 (blue dotted curve at radius 0). In contrast, our method achieves the same with a natural accuracy of 0.65 (green solid curve at radius 0). On the other hand, at very small radius values the method of [12] is better. This is expected, as we suffer a small loss in natural accuracy due to the PCA step in Algorithm 1. We remark that in practice, we may not know the radius of adversarial perturbation (and the ideal choice of ε) beforehand. Hence sacrificing a small amount of accuracy at small radii for a significant gain at higher radii is a desirable tradeoff.

Figure 1: Plot of certified accuracy achieved at different radii when different values of λ are chosen to optimize the smoothed classifier in (5). We compare with the method of [12]. The plot is obtained by training a ResNet-32 architecture on CIFAR-10 with ε = 1.0. The y-intercept of each curve represents the natural accuracy of the corresponding classifier.

Figure 2: A comparison of certified radius guarantees obtained via Algorithm 1 as compared to the approach of [12]. The x-axis is the radius, and the y-axis represents the certified accuracy. The top row describes results for the CIFAR-10 dataset: (left) certified accuracies for various values of ε, (right) for ε = 0.5. Similarly, the bottom row describes the results for the CIFAR-100 dataset.

3 Methods for Certified ℓ∞ Robustness

We now describe our algorithms for the more challenging problem of certified robustness to ℓ∞ perturbations in a given basis or representation. Our approach is to leverage the existence of good representations of natural data, as measured by a certain robustness parameter, and translate the ℓ2 robustness guarantees from Section 2 into certified ℓ∞ guarantees. Consider f : ℝⁿ → Y and g(x) := f(Πx), where Π ∈ ℝ^{n×n} is a projection matrix. The certified accuracy of g satisfies, for every ε′ > 0,

acc^(ℓ∞)_ε(g) ≥ acc^(ℓ2)_{ε′}(g) for 0 ≤ ε ≤ ε′/∥Π∥∞→2,

where ∥Π∥∞→2 = max_{x:∥x∥∞≤1} ∥Πx∥₂ is the ∞→2 operator norm of Π and represents a robustness parameter (see Proposition C.2 for a formal claim). Hence, to translate guarantees from ℓ2 to ℓ∞, we look for robust projections Π that have small ∞→2 operator norm.
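The translation above is easy to exercise on a toy example. Since x ↦ ∥Πx∥₂ is convex, its maximum over the cube ∥x∥∞ ≤ 1 is attained at a vertex, so for tiny n the ∞→2 norm can be computed exactly by enumerating sign vectors. The sketch below is for illustration only (names ours); it contrasts a dense projection, where the translation loses a √n factor, with a sparse one, where it loses nothing.

```python
import math
from itertools import product

def op_norm_inf_to_2(M):
    """Exact ||M||_{inf->2} by enumerating the 2^n sign vectors: the max of
    the convex function x -> ||Mx||_2 over the cube ||x||_inf <= 1 is
    attained at a vertex. Only viable for tiny n."""
    n = len(M[0])
    best = 0.0
    for x in product((-1.0, 1.0), repeat=n):
        y = [sum(row[j] * x[j] for j in range(n)) for row in M]
        best = max(best, math.sqrt(sum(v * v for v in y)))
    return best

# Dense rank-1 projection onto span{(1,1,1)/sqrt(3)}: norm sqrt(3) = sqrt(n).
v = [1.0 / math.sqrt(3.0)] * 3
Pi_dense = [[v[i] * v[j] for j in range(3)] for i in range(3)]
# Sparse rank-1 projection onto the first coordinate axis: norm 1 = sqrt(r).
Pi_sparse = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

l2_radius = 0.75  # a certified l2 radius, e.g. from randomized smoothing
linf_dense = l2_radius / op_norm_inf_to_2(Pi_dense)
linf_sparse = l2_radius / op_norm_inf_to_2(Pi_sparse)  # sqrt(3)x larger
```

This is exactly the phenomenon the section exploits: a sparse low-rank projection keeps ∥Π∥∞→2 near √r, so dividing the ℓ2 certificate by it costs far less than the generic √n factor.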
Our approach is inspired by the recent theoretical work of [21], and finds a low-dimensional representation given by an (orthogonal) projection matrix Π for the data that has small ∥Π∥∞→2. By matrix norm duality, there is a nice characterization of ∥Π∥∞→2 as the maximum ℓ1 norm among Euclidean unit vectors in the subspace of Π (this is a notion of sparsity of the vectors). For a rank-r projector Π, the range of values taken by ∥Π∥∞→2 is [√r, √n]. Hence if the projection is not low-rank and sparse, ∥Π∥∞→2 could be as large as √n (e.g., when Π = I). This is consistent with the loss of a √n factor in the robustness radius to ℓ∞ perturbations for general datasets [16, 17]. Moreover, as we have seen in Section 2, good low-rank representations of the data also give stronger certified ℓ2 robustness guarantees (and in turn, stronger certified ℓ∞ guarantees). The goal is to find a good robust rank-r projection of the data if it exists. We propose a heuristic based on sparse PCA [22] to find a robust projection with low error (see Section D.2). Since we aim for certified robustness, an important step is to compute and certify an upper bound on ∥Π∥∞→2 for a projection Π. This is an NP-hard problem, related to computing the famous Grothendieck norm [23]. We describe a new, scalable, approximate algorithm for computing upper bounds on ∥Π∥∞→2 with provable guarantees.

Certifying the ∞→2 operator norm. Our fast algorithm is based on the multiplicative weights update (MWU) method for approximately solving a natural semi-definite programming (SDP) relaxation, and produces a good upper bound on ∥Π∥∞→2. Our upper bound also comes with a certificate from the dual SDP, i.e., a short proof of the correctness of the upper bound. Given a candidate Π, our algorithm computes an upper bound on ∥Π∥∞→1, which by matrix norm duality (and since Π is a projection) satisfies

∥Π∥²∞→2 = ∥Π∥∞→1 = max x⊤Πx subject to ∥x∥∞ ≤ 1.  (8)
This problem falls into the more general class of problems called quadratic programming: given a symmetric matrix M with M_ii ≥ 0 for all i ∈ [n],

max_{x:∥x∥∞≤1} x⊤Mx.  (9)

The standard SDP relaxation for the problem (see (12) in Appendix D.1) has primal variables represented by the positive semi-definite (PSD) matrix X ∈ ℝ^{n×n} satisfying the constraints X_ii ≤ 1 for each i ∈ [n]. The SDP dual of this relaxation (given in (14) of Appendix D.1) has variables y1, ..., yn ≥ 0 corresponding to the n constraints in the primal SDP. Since the SDP is a valid relaxation for (9), it provides an upper bound on the ∞→1 operator norm. (A fast algorithm that potentially finds a local optimum for the problem will not suffice for our purposes; we need an upper bound on the global optimum.) Classical results show that the SDP value is always within a factor of π/2 of the actual value of ∥M∥∞→1 [24, 25]. However, it is computationally intensive to solve the SDP using off-the-shelf SDP solvers (even for CIFAR-10 images, X is 1024 × 1024). We design a fast algorithm based on the multiplicative weights update (MWU) framework [26, 19].

Description of the algorithm. Our algorithm differs slightly from the standard MWU approach for solving the above SDP. The algorithm below takes as input a matrix M and always returns a valid upper bound upbd_min on the SDP value, along with a dual feasible solution y_ubmin that can act as a certificate, and a candidate primal solution X that attains the same value (and is potentially feasible). Theorem 3.1 proves that for the right setting of parameters, particularly the number of iterations T_f = O(n log n/δ³), the solution X is also guaranteed to be feasible up to a small error δ > 0.

Algorithm 2 Fast certification of the ∞→1 norm and quadratic programming
1: function CertifySDP(M ∈ ℝ^{n×n}, iteration bound T_f, slack δ, damping ρ)
2: Initialize α = (1, 1, ..., 1) ∈ ℝⁿ, primal X = 0, dual y = 0ⁿ, upbd_min = ∞, y_ubmin = 0ⁿ.
3: for t = 0, 1, ..., T do
4:   ᾱ ← (1 − δ)α + δ(1, 1, ..., 1).
5:   λ ← max-eigenvalue(diag(ᾱ)^{−1/2} M diag(ᾱ)^{−1/2}), and let u ∈ ℝⁿ be its eigenvector.
6:   v ← √n · diag(ᾱ)^{−1/2} u,  y ← (1/(t+1))(t·y + λᾱ),  X ← (1/(t+1))(t·X + vv⊤).
7:   if ∥v∥²∞ ≤ 1 + δ or max_i X_ii ≤ 1 + δ, then stop early and return the appropriate values.
8:   Update α(i) ← α(i)·exp(δ(v(i)² − 1)/ρ) for each i ∈ [n], and renormalize so that Σⁿ_{i=1} α(i) = n.
9:   if upbd_min > nλ, then set upbd_min = nλ and y_ubmin = λᾱ.
10: Output upbd_min, the dual solution y_ubmin, and the primal candidate X.

Recall from (8) that an estimate of the ∞→1 norm immediately translates to an estimate of the ∞→2 norm. In the above algorithm, there are weights given by α for the n different constraints of the form X_ii ≤ 1. At each iteration, the algorithm maximizes the objective subject to one aggregated constraint of the form Σᵢ ᾱᵢ X_ii ≤ n, where ᾱ involves a small correction to α that is crucial to ensure the run-time guarantees. The maximization is done using a maximum eigenvalue/eigenvector computation. The weights α are then updated via a multiplicative update based on the violation of the solution found in the current iterate. The damping factor ρ determines the rate of progress of the iterations: the smaller the value of ρ, the faster the progress, but a very small value may lead to oscillations. A more aggressive choice of ρ, compared to the one in Theorem 3.1, seems to work well in practice. Finally, we remark that for every choice of α and ρ we get a valid upper bound (due to dual feasibility). We show the following guarantee for our algorithm for problem (9).

Theorem 3.1. Let δ > 0, and let M be any symmetric matrix with M_ii ≥ 0 for all i ∈ [n]. For any α ∈ ℝⁿ_{≥0} with Σⁿ_{i=1} α(i) = n, if λ = λ_max(diag(α)^{−1/2} M diag(α)^{−1/2}), then y = λα is feasible for the dual SDP and gives a valid upper bound of nλ on the objective value of the SDP relaxation to (9).
Moreover, Algorithm 2 on input M, with parameters δ and ρ = O(n/δ), after T = O(n log n/δ³) iterations finds a feasible SDP solution X̂ ⪰ 0 and a feasible dual solution ŷ ∈ ℝⁿ that together sandwich the optimal SDP value within a 1 + δ factor. (See Proposition D.1 and Theorem D.2 in Appendix D.1 for formal statements along with proofs.)

Each iteration only involves a single maximum eigenvalue computation, which can be done up to (1 + δ) accuracy in T_eig = O(m/δ) time, where m is the number of non-zeros in M (see e.g., [26]). To the best of our knowledge, this gives significant improvements over the prior best bound of O(n^1.5·m/δ^2.5) runtime for solving the above SDP [19, 20]. Our algorithm and analysis differ from the general MWU framework [20] by treating the objective differently from the constraints, so that the width parameter does not depend on the objective. A crucial step in our algorithm and proof is to add a correction term of O(δ) to the weights in each step, which ensures that the potential violation of each constraint in an iteration is bounded. In addition, our algorithm is more scalable than existing off-the-shelf methods. See Appendices D.1 and F for details and comparisons.

4 Training Certified ℓ∞ Robust Networks in Natural Representations

Building upon our theoretical results from the previous section, we now demonstrate that for natural representations one can indeed achieve better certified robustness to ℓ∞ perturbations by translating guarantees from certified ℓ2 robustness. We focus on image data, and study the representation of images in the DCT basis. Before we describe the details of our training and certification procedure for ℓ∞ robustness, we provide further empirical evidence that imperceptibility in natural representations such as the DCT basis is a desirable property.

Figure 3: Original images (leftmost) from the CIFAR-10 dataset and their perturbed versions when sparse random perturbations are added in the DCT basis with sparsity k.
Large perturbations in the DCT basis (e.g., k = 1) lead to perceptible changes in the pixel space even though they are ℓ∞ perturbations of only ε ≈ 0.09 there. As k increases, the imperceptibility of the perturbed images improves.

Study of imperceptibility in the DCT basis. We argue that for adversarial perturbations to be imperceptible to humans they should be of small magnitude in the DCT representation (perhaps in addition to being small in the pixel basis). We take images from the CIFAR-10 dataset in their DCT representation and add sparse random perturbations to them. In particular, for a sparsity parameter k, we pick k coordinates in the DCT basis at random and add a random perturbation with ℓ∞ norm c_k/√k, where c_k ≈ ε√n is chosen such that the perturbed images are ε ≈ 0.09 away from the unperturbed images in the pixel space. Notice that for small values of k, a perturbation of large ℓ∞ norm is added. Figure 3 visualizes the perturbed images for different values of k. As seen, large perturbations in the DCT basis lead to visually perceptible changes, even if they are (ε ≈ 0.09)-close in the pixel basis. For comparison, we also include in Appendix E imperceptible adversarial examples for these images that were generated via the PGD based method of [6] on a ResNet-32 network trained on the CIFAR-10 dataset for robustness to ℓ∞ perturbations of magnitude ε = 0.09. This further motivates studying robustness in natural data representations.

Training certified ℓ∞ robust networks in the DCT basis. The methods developed in Sections 2 and 3 together give algorithms for training classifiers with certified ℓ∞ robustness. However, while we want ℓ∞ robustness in a different representation U (e.g., in the DCT basis), it may still be more convenient to use off-the-shelf methods to perform the training in the original representation X ⊆ ℝⁿ (e.g., the pixel representation). Let the orthogonal matrix O ∈ ℝ^{n×n} represent the DCT transformation.
Consider an input x ∈ X and let u = Ox ∈ U be its DCT representation, where U is the space of images in the DCT basis. It is easy to see that the functions f : X → Y and g : U → Y given by g(u) := f(O⁻¹u) = f(x) have the same certified accuracy to ℓ2 perturbations. Moreover, if the classifier g satisfies g(u) = g(Πu) for some projection Π, the robust accuracy of g to ℓ∞ perturbations in the representation U satisfies (see Proposition C.3 in the appendix)

acc^(ℓ∞)_ε(g) ≥ acc^(ℓ2)_{ε′}(f), for any ε′ > 0 and 0 ≤ ε ≤ ε′/∥Π∥∞→2.  (10)

Figure 4: A comparison of the certified ℓ∞ accuracy (y-axis) in the DCT basis of our method to that of [12], for different values of ε and with different certified radii on the x-axis, for λ = 0.5. The left and right plots describe results for the CIFAR-10 and CIFAR-100 datasets.

Experimental Data. We evaluate our approach on the CIFAR-10 and CIFAR-100 datasets. From (10), it is sufficient to train a classifier in the original pixel space with the appropriate projection Π̄ = O⁻¹ΠO. Hence, we train a smoothed classifier as defined in (2) using Algorithm 1. To obtain the required Π, we first use the sparse PCA based heuristic (Algorithm 3) to find a projection matrix of rank 200 for each of the three image channels separately. We then use Algorithm 2 to compute upper bounds on the ∞→2 operator norm of the projection matrices. Finally, we combine the obtained projection matrices from the channels to obtain a projection Π. Table 1 shows the values of the operator norms certified by our algorithm for each image channel and for the combined projection matrix. Notice that the obtained subspaces have operator norm values significantly smaller than √n ≈ 55.42. The reconstruction error in each case, when projecting onto Π, is at most 0.0345.

| Dataset | R | G | B | Π |
|---|---|---|---|---|
| CIFAR-10 | 17.45 | 17.51 | 17.39 | 30.22 |
| CIFAR-100 | 17.22 | 17.33 | 17.37 | 29.97 |

Table 1: Bounds on the ∞→2 norm for the projection matrices obtained by Algorithm 2 on the CIFAR-10 and CIFAR-100 training sets.
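Because the three channel projections act on disjoint blocks of coordinates, the ∞→2 norm of the combined block-diagonal projection separates across channels into a root-sum-of-squares. The sketch below is our own consistency check (the function name is ours), and it reproduces the Π column of Table 1 from the per-channel values.

```python
import math

def combined_inf_to_2(channel_norms):
    """||diag(P_R, P_G, P_B)||_{inf->2}. With x = (x_R, x_G, x_B) and
    ||x||_inf <= 1, ||Px||_2^2 = sum_c ||P_c x_c||_2^2 maximizes channel
    by channel, giving the root-sum-of-squares of the per-channel norms."""
    return math.sqrt(sum(s * s for s in channel_norms))

cifar10_pi = combined_inf_to_2([17.45, 17.51, 17.39])    # ~30.22 (Table 1)
cifar100_pi = combined_inf_to_2([17.22, 17.33, 17.37])   # ~29.97 (Table 1)
```

The agreement with Table 1 to two decimal places is consistent with the channels being certified separately and then combined this way.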
After training, on an input x we obtain a certified radius for ℓ∞ perturbations in the DCT basis by obtaining a certified ℓ2 radius via randomized smoothing and then dividing the obtained value by ∥Π∥∞→2. We compare with the approach of [12] for training a smoothed classifier without projections. Since the classifier of [12] does not involve projections, we translate the resulting ℓ2 robustness guarantee into an ℓ∞ guarantee by dividing by √n ≈ 55.42, as done in [12]. Figure 4 shows that across a range of training parameters, our proposed approach leads to significantly higher certified accuracy to ℓ∞ perturbations in the DCT basis.

5 Conclusion

In this paper, we have shown significant benefits in leveraging the natural structure that exists in real-world data, e.g., low-rank or sparse representations, for obtaining certified robustness guarantees under both ℓ2 perturbations and ℓ∞ perturbations in natural data representations. Our experiments involving imperceptibility in the DCT basis for images suggest that further study of ℓ∞ robustness in other natural bases (apart from the coordinate basis) would be useful for different data domains like images, audio, etc. We also gave faster algorithms for approximately solving semi-definite programs for quadratic programming (with provable guarantees that improve the state of the art), to obtain certified ℓ∞ robustness guarantees. Such problem-specific fast approximate algorithms for powerful algorithmic techniques like SDPs and other convex relaxations may lead to more scalable certification procedures with better guarantees. Finally, it would be interesting to see if our ideas and techniques involving the ∞→2 operator norm can be adapted into the training phase, in order to achieve better certified ℓ∞ robustness in any desired basis without compromising much on natural accuracy.

Broader Impact

Our work provides efficient algorithms for training neural networks with certified robustness guarantees.
This can have significant positive societal impact considering the importance of protecting AI systems against malicious adversaries. A classifier with certified robustness guarantees can give a sense of security to the end user. On the other hand, our methods achieve robustness at the expense of a small loss in natural test accuracy as compared to non-adversarial training. It is unclear how this loss in accuracy is distributed across the population. This could have a negative societal impact if the loss in accuracy falls disproportionately on data points/individuals belonging to a specific demographic group based on, say, race or gender. That said, robustness to perturbations also corresponds to a natural notion of individual fairness, since data points with similar features need to be treated similarly by a robust classifier. Hence, a careful study must be done to understand these effects before a large scale practical deployment of systems based on our work.

Acknowledgments and Disclosure of Funding

The first author acknowledges support from the NSF HDR TRIPODS award CCF-1934924. The last author was supported by the National Science Foundation (NSF) under Grant Nos. CCF-1652491, CCF-1637585 and CCF-1934931.

References

[1] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[2] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 387–402. Springer, 2013.
[3] Nicholas Carlini and David Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. In 2018 IEEE Security and Privacy Workshops (SPW), pages 1–7. IEEE, 2018.
[4] Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou.
HotFlip: White-box adversarial examples for text classification. arXiv preprint arXiv:1712.06751, 2017.

[5] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

[6] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.

[7] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P Xing, Laurent El Ghaoui, and Michael I Jordan. Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573, 2019.

[8] Ali Shafahi, Mahyar Najibi, Mohammad Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! In Advances in Neural Information Processing Systems, pages 3353-3364, 2019.

[9] Eric Wong, Leslie Rice, and J Zico Kolter. Fast is better than free: Revisiting adversarial training. arXiv preprint arXiv:2001.03994, 2020.

[10] Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy (SP), pages 656-672. IEEE, 2019.

[11] Jeremy M Cohen, Elan Rosenfeld, and J Zico Kolter. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918, 2019.

[12] Hadi Salman, Greg Yang, Jerry Li, Pengchuan Zhang, Huan Zhang, Ilya Razenshteyn, and Sebastien Bubeck. Provably robust deep learning via adversarially trained smoothed classifiers. arXiv preprint arXiv:1906.04584, 2019.

[13] Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. Semidefinite relaxations for certifying robustness to adversarial examples. In Advances in Neural Information Processing Systems, pages 10877-10887, 2018.
[14] Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715, 2018.

[15] Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, pages 5283-5292, 2018.

[16] Avrim Blum, Travis Dick, Naren Manoj, and Hongyang Zhang. Random smoothing might be unable to certify ℓ∞ robustness for high-dimensional images. arXiv preprint arXiv:2002.03517, 2020.

[17] Greg Yang, Tony Duan, Edward Hu, Hadi Salman, Ilya Razenshteyn, and Jerry Li. Randomized smoothing of all shapes and sizes. arXiv preprint arXiv:2002.08118, 2020.

[18] Aounon Kumar, Alexander Levine, Tom Goldstein, and Soheil Feizi. Curse of dimensionality on randomized smoothing for certifiable robustness. arXiv preprint arXiv:2002.03239, 2020.

[19] Sanjeev Arora, Elad Hazan, and Satyen Kale. Fast algorithms for approximate semidefinite programming using the multiplicative weights update method. In 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05), pages 339-348. IEEE, 2005.

[20] Satyen Kale. Efficient Algorithms Using the Multiplicative Weights Update Method. PhD thesis, Princeton University, 2007.

[21] Pranjal Awasthi, Vaggos Chatziafratis, Xue Chen, and Aravindan Vijayaraghavan. Adversarially robust low dimensional representations. arXiv preprint arXiv:1911.13268, 2019.

[22] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro. Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pages 689-696, New York, NY, USA, 2009. Association for Computing Machinery.

[23] Noga Alon and Assaf Naor. Approximating the cut-norm via Grothendieck's inequality.
In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, pages 72-80. ACM, 2004.

[24] Yu Nesterov. Semidefinite relaxation and nonconvex quadratic optimization. Optimization Methods and Software, 9(1-3):141-160, 1998.

[25] Noga Alon, Konstantin Makarychev, Yury Makarychev, and Assaf Naor. Quadratic forms on graphs. Inventiones Mathematicae, 163(3):499-522, 2006.

[26] Philip Klein and Hsueh-I Lu. Efficient approximation algorithms for semidefinite programs arising from MAX CUT and COLORING. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC '96, pages 338-347, New York, NY, USA, 1996. Association for Computing Machinery.

[27] Mitali Bafna, Jack Murtagh, and Nikhil Vyas. Thwarting adversarial examples: An ℓ0-robust sparse Fourier transform. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 10075-10085. Curran Associates, Inc., 2018.

[28] Amartya Sanyal, Varun Kanade, and Philip H. S. Torr. Low rank structure of learned representations. CoRR, abs/1804.07090, 2018.

[29] Francesco Croce, Maksym Andriushchenko, and Matthias Hein. Provable robustness of ReLU networks via maximization of linear regions. arXiv preprint arXiv:1810.07481, 2018.

[30] Shiqi Wang, Yizheng Chen, Ahmed Abdou, and Suman Jana. MixTrain: Scalable training of formally robust neural networks. arXiv preprint arXiv:1811.02625, 2018.

[31] Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. Efficient formal safety analysis of neural networks. In Advances in Neural Information Processing Systems, pages 6367-6377, 2018.

[32] Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. AI2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), pages 3-18. IEEE, 2018.
[33] Matthew Mirman, Timon Gehr, and Martin Vechev. Differentiable abstract interpretation for provably robust neural networks. In International Conference on Machine Learning, pages 3578-3586, 2018.

[34] Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification. In Advances in Neural Information Processing Systems, pages 10802-10813, 2018.

[35] Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Duane Boning, Inderjit S Dhillon, and Luca Daniel. Towards fast computation of certified robustness for ReLU networks. arXiv preprint arXiv:1804.09699, 2018.

[36] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In Advances in Neural Information Processing Systems, pages 4939-4948, 2018.

[37] Pranjal Awasthi, Abhratanu Dutta, and Aravindan Vijayaraghavan. On robustness to adversarial examples and polynomial optimization. In Advances in Neural Information Processing Systems, pages 13737-13747, 2019.

[38] Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neural networks via random self-ensemble. In Proceedings of the European Conference on Computer Vision (ECCV), pages 369-385, 2018.

[39] Xiaoyu Cao and Neil Zhenqiang Gong. Mitigating evasion attacks to deep neural networks via region-based classification. In Proceedings of the 33rd Annual Computer Security Applications Conference, pages 278-287, 2017.

[40] Yao-Yuan Yang, Cyrus Rashtchian, Hongyang Zhang, Ruslan Salakhutdinov, and Kamalika Chaudhuri. Adversarial robustness through local Lipschitzness. arXiv preprint arXiv:2003.02460, 2020.

[41] Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Certified adversarial robustness with additive noise. In Advances in Neural Information Processing Systems, pages 9459-9469, 2019.
[42] Krishnamurthy Dvijotham, Sven Gowal, Robert Stanforth, Relja Arandjelovic, Brendan O'Donoghue, Jonathan Uesato, and Pushmeet Kohli. Training verified learners with learned verifiers. arXiv preprint arXiv:1805.10265, 2018.

[43] Hadi Salman, Mingjie Sun, Greg Yang, Ashish Kapoor, and J Zico Kolter. Black-box smoothing: A provable defense for pretrained classifiers. arXiv preprint arXiv:2003.01908, 2020.

[44] Yuzhe Yang, Guo Zhang, Dina Katabi, and Zhi Xu. ME-Net: Towards effective adversarial robustness with matrix estimation. arXiv preprint arXiv:1905.11971, 2019.

[45] S. A. Plotkin, D. B. Shmoys, and E. Tardos. Fast approximation algorithms for fractional packing and covering problems. In Proceedings of the 32nd Annual Symposium on Foundations of Computer Science, pages 495-504, 1991.

[46] Sanjeev Arora and Satyen Kale. A combinatorial, primal-dual approach to semidefinite programs. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, STOC '07, pages 227-236, New York, NY, USA, 2007. Association for Computing Machinery.

[47] G. Iyengar, D. J. Phillips, and C. Stein. Approximating semidefinite packing programs. SIAM Journal on Optimization, 21(1):231-268, 2011.

[48] Rahul Jain and Penghui Yao. A parallel approximation algorithm for positive semidefinite programming. In Proceedings of the 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, FOCS '11, pages 463-471, USA, 2011. IEEE Computer Society.

[49] Zeyuan Allen-Zhu, Yin Tat Lee, and Lorenzo Orecchia. Using optimization to obtain a width-independent, parallel, simpler, and faster positive SDP solver. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '16, pages 1824-1831, USA, 2016. Society for Industrial and Applied Mathematics.

[50] Arun Jambulapati, Yin Tat Lee, Jerry Li, Swati Padmanabhan, and Kevin Tian. Positive semidefinite programming: Mixed, parallel, and width-independent.
In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 789-802. ACM, 2020.

[51] Yin Tat Lee and Swati Padmanabhan. An Õ(m/ε^3.5)-cost algorithm for semidefinite programs with diagonal constraints. CoRR, abs/1903.01859, 2019.

[52] Farid Alizadeh. Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM Journal on Optimization, 5(1):13-51, 1995.

[53] Y. T. Lee, A. Sidford, and S. C. Wong. A faster cutting plane method and its implications for combinatorial and convex optimization. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pages 1049-1065, 2015.

[54] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, 2016.

[55] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.

[56] Moses Charikar and Anthony Wirth. Maximizing quadratic programs: Extending Grothendieck's inequality. In Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, FOCS '04, pages 54-60, USA, 2004. IEEE Computer Society.

[57] Vijay Bhattiprolu, Mrinalkanti Ghosh, Venkatesan Guruswami, Euiwoong Lee, and Madhur Tulsiani. Inapproximability of matrix p→q norms. arXiv preprint arXiv:1802.07425, 2018.

[58] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[59] MOSEK ApS. The MOSEK Fusion API for Python manual. Version 9.1.13, 2019.

[60] Steven Diamond and Stephen Boyd. CVXPY: A Python-embedded modeling language for convex optimization.
Journal of Machine Learning Research, 17(83):1-5, 2016.