# Adversarial Robustness of Supervised Sparse Coding

Jeremias Sulam, Johns Hopkins University, jsulam1@jhu.edu
Ramchandran Muthukumar, Johns Hopkins University, rmuthuk1@jhu.edu
Raman Arora, Johns Hopkins University, arora@cs.jhu.edu

Several recent results provide theoretical insights into the phenomena of adversarial examples. Existing results, however, are often limited due to a gap between the simplicity of the models studied and the complexity of those deployed in practice. In this work, we strike a better balance by considering a model that involves learning a representation while at the same time giving a precise generalization bound and a robustness certificate. We focus on the hypothesis class obtained by combining a sparsity-promoting encoder coupled with a linear classifier, and show an interesting interplay between the expressivity and stability of the (supervised) representation map and a notion of margin in the feature space. We bound the robust risk (to $\ell_2$-bounded perturbations) of hypotheses parameterized by dictionaries that achieve a mild encoder gap on training data. Furthermore, we provide a robustness certificate for end-to-end classification. We demonstrate the applicability of our analysis by computing certified accuracy on real data, and compare with other alternatives for certified robustness.

## 1 Introduction

With machine learning applications becoming ubiquitous in modern-day life, there is increasing concern about the robustness of the deployed models. Since first reported in [Szegedy et al., 2013, Goodfellow et al., 2014, Biggio et al., 2013], adversarial attacks are small perturbations of the input, imperceptible to the human eye, which can nonetheless completely fluster otherwise well-performing systems. Because of clear security implications [DARPA, 2019], this phenomenon has sparked an increasing amount of work dedicated to devising defense strategies [Metzen et al., 2017, Gu and Rigazio, 2014, Madry et al., 2017] and correspondingly more sophisticated attacks [Carlini and Wagner, 2017, Athalye et al., 2018, Tramer et al., 2020], with each group trying to triumph over the other in an arms race of sorts.

A different line of research attempts to understand adversarial examples from a theoretical standpoint. Some works have focused on giving robustness certificates, thus providing a guarantee to withstand the attack of an adversary under certain assumptions [Cohen et al., 2019, Raghunathan et al., 2018, Wong and Kolter, 2017]. Other works address questions of learnability [Shafahi et al., 2018, Cullina et al., 2018, Bubeck et al., 2018, Tsipras et al., 2018] or sample complexity [Schmidt et al., 2018, Yin et al., 2018, Tu et al., 2019], in the hope of better characterizing the increased difficulty of learning hypotheses that are robust to adversarial attacks. While many of these results are promising, the analysis is often limited to simple models.

Here, we strike a better balance by considering a model that involves learning a representation while at the same time giving a precise generalization bound and a robustness certificate. In particular, we focus our attention on the adversarial robustness of the supervised sparse coding model [Mairal et al., 2011], or task-driven dictionary learning, consisting of a linear classifier acting on the representation computed via a supervised sparse encoder.
We show an interesting interplay between the expressivity and stability of a (supervised) representation map and a notion of margin in the feature space.

The idea of employing sparse representations as data-driven features for supervised learning goes back to the early days of deep learning [Coates and Ng, 2011, Kavukcuoglu et al., 2010, Zeiler et al., 2010, Ranzato et al., 2007], and has had a significant impact on applications in computer vision and machine learning [Wright et al., 2010, Henaff et al., 2011, Mairal et al., 2008, 2007, Gu et al., 2014]. More recently, new connections between deep networks and sparse representations were formalized by Papyan et al. [2018], which further helped derive stability guarantees [Papyan et al., 2017b], provide architecture search strategies and analysis [Tolooshams et al., 2019, Murdock and Lucey, 2020, Sulam et al., 2019], and yield other theoretical insights [Xin et al., 2016, Aberdam et al., 2019, Aghasi et al., 2020, Aberdam et al., 2020, Moreau and Bruna, 2016]. While some recent work has leveraged the stability properties of these latent representations to provide robustness guarantees against adversarial attacks [Romano et al., 2019], these rely on rather stringent generative model assumptions that are difficult to satisfy and verify in practice. In contrast, our assumptions rely on the existence of a positive gap in the encoded features, as proposed originally by Mehta and Gray [2013]. This distributional assumption is significantly milder: it is directly satisfied under traditional sparse generative model assumptions, and it can be quantified directly from data.

This work makes two main contributions. The first is a bound on the robust risk of hypotheses that achieve a mild encoder gap, where the adversarial corruptions are bounded in $\ell_2$-norm. Our proof technique follows a standard argument based on a minimal $\epsilon$-cover of the parameter space, dating back to Vapnik and Chervonenkis [1971] and adapted for matrix factorization and dictionary learning problems in Gribonval et al. [2015]. However, the analysis of the Lipschitz continuity of the adversarial loss with respect to the model parameters is considerably more involved. The increase in the sample complexity is mild, with adversarial corruptions of size $\nu$ manifesting as an additional term of order $O\big((1 + \nu)^2/m\big)$ in the bound, where $m$ is the number of samples, and a minimal encoder gap of $O(\nu)$ is necessary. Much of our results extend directly to other supervised learning problems (e.g., regression). Our second contribution is a robustness certificate that holds for every hypothesis in the function class under $\ell_2$ perturbations in multiclass classification. In a nutshell, this result guarantees that the label produced by the hypothesis will not change if the encoder gap is large enough relative to the energy of the adversary, the classifier margin, and properties of the model (e.g., dictionary incoherence).

## 2 Preliminaries and Background

In this section, we first describe our notation and the learning problem, and then proceed to situate our contribution in relation to prior work. Consider the space of inputs, $\mathcal{X} = \mathcal{B}_{\mathbb{R}^d}$, i.e. the unit ball in $\mathbb{R}^d$, and the space of labels, $\mathcal{Y}$. Much of our analysis is applicable to a broad class of label spaces, but we focus on the binary and multiclass classification settings in particular. We assume that the data is sampled according to some unknown distribution $P$ over $\mathcal{X} \times \mathcal{Y}$. Let $\mathcal{H} = \{f : \mathcal{X} \to \hat{\mathcal{Y}}\}$ denote a hypothesis class mapping inputs into some output space $\hat{\mathcal{Y}} \subseteq \mathbb{R}$.
Of particular interest to us are norm-bounded linear predictors, $f(x) = \langle w, x \rangle$, parameterized by $d$-dimensional vectors $w \in \mathcal{W} = \{w \in \mathbb{R}^d : \|w\|_2 \leq B\}$. From a learning perspective, we have a considerable understanding of the linear hypothesis class, both in a stochastic non-adversarial setting as well as in an adversarial context [Charles et al., 2019, Li et al., 2019]. However, from an application standpoint, linear predictors are often too limited and are rarely applied directly on input features. Instead, most state-of-the-art systems involve learning a representation. In general, an encoder map $\phi_\theta : \mathcal{X} \to \mathcal{Z} \subseteq \mathbb{R}^p$, parameterized by $\theta$, is composed with a linear function, so that $f(x) = \langle w, \phi_\theta(x) \rangle$ for $w \in \mathbb{R}^p$. This description applies to a large variety of popular models, including kernel methods, multilayer perceptrons and deep convolutional neural networks. Herein we focus on an encoder given as the solution to a Lasso problem [Tibshirani, 1996]. More precisely, we consider $\phi_D : \mathbb{R}^d \to \mathbb{R}^p$, defined by
$$\phi_D(x) := \arg\min_{z} \; \tfrac{1}{2}\|x - Dz\|_2^2 + \lambda \|z\|_1. \tag{1}$$
Note that when $D$ is overcomplete, i.e. $p > d$, this problem is not strongly convex. Nonetheless, we will assume that the solution to Problem (1) is unique¹, and study the hypothesis class given by $\mathcal{H} = \{f_{D,w}(x) = \langle w, \phi_D(x) \rangle : w \in \mathcal{W}, D \in \mathcal{D}\}$, where $\mathcal{W} = \{w \in \mathbb{R}^p : \|w\|_2 \leq B\}$, and $\mathcal{D}$ is the oblique manifold of all matrices with unit-norm columns (or atoms), i.e. $\mathcal{D} = \{D \in \mathbb{R}^{d \times p} : \|D_i\|_2 = 1 \;\; \forall\, i \in [p]\}$. While not explicit in our notation, $\phi_D(x)$ depends on the value of $\lambda$. For notational simplicity, we also suppress the subscripts $(D, w)$ in $f_{D,w}(\cdot)$ and simply write $f(\cdot)$.

We consider a bounded loss function $\ell : \mathcal{Y} \times \hat{\mathcal{Y}} \to [0, b]$ with Lipschitz constant $L_\ell$. The goal of learning is to find an $f \in \mathcal{H}$ with minimal risk, or expected loss, $R(f) = \mathbb{E}_{(x,y)\sim P}\,[\ell(y, f(x))]$. Given a sample $S = \{(x_i, y_i)\}_{i=1}^m$ drawn i.i.d. from $P$, a popular learning algorithm is empirical risk minimization (ERM), which involves finding the $f_{D,w}$ that solves the following problem:
$$\min_{D,w} \; \frac{1}{m} \sum_{i=1}^m \ell\big(y_i, f_{D,w}(x_i)\big).$$

**Adversarial Learning.** In an adversarial setting, we are interested in hypotheses that are robust to adversarial perturbations of the inputs. We focus on evasion attacks, in which an attack is deployed at test time (while the training samples are not tampered with). As a result, a more appropriate loss that incorporates robustness to such contamination is the robust loss [Madry et al., 2017], $\tilde{\ell}_\nu(y, f(x)) := \max_{v \in \Delta_\nu} \ell(y, f(x + v))$, where $\Delta_\nu$ is some subset of $\mathbb{R}^d$ that restricts the power of the adversary. Herein we focus on $\ell_2$ norm-bounded corruptions, $\Delta_\nu = \{v \in \mathbb{R}^d : \|v\|_2 \leq \nu\}$, and denote by $\tilde{R}_S(f) = \frac{1}{m}\sum_{i=1}^m \tilde{\ell}_\nu(y_i, f(x_i))$ the empirical robust risk of $f$ and by $\tilde{R}(f) = \mathbb{E}_{(x,y)\sim P}\,[\tilde{\ell}_\nu(y, f(x))]$ its population robust risk with respect to the distribution $P$.
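Before stating our assumptions, we note that the encoder in (1) can be computed with any off-the-shelf Lasso solver. The following is a minimal sketch using plain ISTA, the unaccelerated form of the proximal-gradient method of Beck and Teboulle [2009]; the function name, step size, iteration budget, and stopping tolerance are illustrative choices of this sketch rather than part of the model.

```python
import numpy as np

def soft_threshold(u, t):
    """Proximal operator of t*||.||_1 (element-wise soft-thresholding)."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def sparse_encoder(x, D, lam, n_iter=1000, tol=1e-8):
    """Approximate phi_D(x) = argmin_z 0.5*||x - D z||_2^2 + lam*||z||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient of the smooth part
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z_new = soft_threshold(z - D.T @ (D @ z - x) / L, lam / L)
        if np.linalg.norm(z_new - z) <= tol * max(1.0, np.linalg.norm(z)):
            return z_new
        z = z_new
    return z
```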
**Main Assumptions.** We make two general assumptions throughout this work. First, we assume that the dictionaries in $\mathcal{D}$ are $s$-incoherent, i.e., they satisfy a restricted isometry property (RIP). More precisely, for any $s$-sparse vector $z \in \mathbb{R}^p$ with $\|z\|_0 = s$, there exists a minimal constant $\eta_s < 1$ so that $D$ is close to an isometry, i.e. $(1 - \eta_s)\|z\|_2^2 \leq \|Dz\|_2^2 \leq (1 + \eta_s)\|z\|_2^2$. Broad classes of matrices are known to satisfy this property (e.g. sub-Gaussian matrices [Foucart and Rauhut, 2017]), although empirically computing this constant for a fixed (deterministic) matrix is generally intractable. Nonetheless, this quantity can be upper bounded by the correlation between the columns of $D$, either via the mutual coherence [Donoho and Elad, 2003] or the Babel function [Tropp et al., 2003], both easily computed in practice.

Second, we assume that the map $\phi_D$ induces a positive encoder gap on the computed features. Given a sample $x \in \mathcal{X}$ and its encoding $\phi_D(x)$, we denote by $\Lambda_{p-s}$ the collection of index sets of cardinality $(p - s)$, i.e., $\Lambda_{p-s} = \{I \subseteq \{1, \dots, p\} : |I| = p - s\}$. The encoder gap $\tau_s(\cdot)$ induced by $\phi_D$ on a sample $x$ is defined [Mehta and Gray, 2013] as
$$\tau_s(x) := \max_{I \in \Lambda_{p-s}} \; \min_{i \in I} \; \big( \lambda - |\langle D_i, x - D\phi_D(x) \rangle| \big).$$
An equivalent and conceptually simpler definition of $\tau_s(x)$ is the $(s+1)$-th smallest entry of the vector $\lambda \mathbf{1} - |D^T(x - D\phi_D(x))|$. Intuitively, this quantity can be viewed as a measure of the maximal energy along any dictionary atom that is not in the support of an input vector. More precisely, recall from the optimality conditions of Problem (1) that $|D_i^T(x - D\phi_D(x))| = \lambda$ if $[\phi_D(x)]_i \neq 0$, and $|D_i^T(x - D\phi_D(x))| \leq \lambda$ otherwise. Therefore, if $\tau_s(x)$ is large, there exists a set $I$ of $(p - s)$ atoms that are far from entering the support of $\phi_D(x)$. If $\phi_D(x)$ has exactly $k$ non-zero entries, we may choose some $s > k$ to obtain $\tau_s(x)$. In general, $\tau_s(\cdot)$ depends on the energy of the residual, $x - D\phi_D(x)$, the correlation between the atoms, the parameter $\lambda$, and the cardinality $s$. In a nutshell, if a dictionary $D$ provides a quickly decaying approximation error as a function of the cardinality $s$, then a positive encoder gap exists for some $s$. We consider dictionaries that induce a positive encoder gap on every input sample from a dataset, and define the minimum such margin as $\tau_s^* := \min_{i \in [m]} \tau_s(x_i) > 0$. Such a positive encoder gap exists for quite general distributions, such as $s$-sparse and approximately sparse signals. However, this definition is more general and will allow us to avoid making any other, stronger distributional assumptions. We now illustrate the encoder gap with both analytic and numerical examples².

¹ The solution is unique under mild assumptions [Tibshirani et al., 2013], and otherwise our results hold for any solution returned by a deterministic solver.
² Code to reproduce all of our experiments is made available at our github repository.

Figure 1: Encoder gap, $\tau_s^*$, as a function of $s$ for several values of $\lambda$: (a) synthetic approximately sparse signals, (b) MNIST digits, and (c) CIFAR10 images.

**Approximate $k$-sparse signals.** Consider signals $x$ obtained as $x = Dz + v$, where $D \in \mathcal{D}$, $\|v\|_2 \leq \nu$, and $z$ is sampled from a distribution of sparse vectors with up to $k \leq \frac{1}{3}\big(1 + \frac{1}{\mu(D)}\big)$ non-zeros, where $\mu(D) = \max_{i \neq j} |\langle D_i, D_j \rangle|$ is the mutual coherence of $D$. Then, for a particular choice of $\lambda$, we have that $\tau_s(x) > \lambda - \frac{15\,\mu\nu}{2}$ for all $s > k$. This can be shown using standard results in [Tropp, 2006]; we defer the proof to Appendix A. Different values of $\lambda$ provide different values of $\tau_s(x)$. To illustrate this trade-off, we generate synthetic approximately $k$-sparse signals ($k = 15$) from a dictionary with 120 atoms in 100 dimensions and contaminate them with Gaussian noise. We then numerically compute the value of $\tau_s^*$ as a function of $s$ for different values of $\lambda$, and present the results in Figure 1a.

**Image data.** We now demonstrate that a positive encoder gap exists for natural images as well. In Figure 1b-c we similarly depict the value of $\tau_s(\cdot)$, as a function of $s$, for encoders computed on MNIST digits and CIFAR images (from a validation set) with learned dictionaries (further details in Section 6).
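Computing $\tau_s(x)$ is inexpensive once $\phi_D(x)$ is available, via the $(s+1)$-th smallest entry characterization above. A minimal sketch, reusing the illustrative `sparse_encoder` helper from the previous sketch:

```python
import numpy as np

def encoder_gap(x, D, lam, s, z=None):
    """tau_s(x): the (s+1)-th smallest entry of lam*1 - |D^T (x - D*phi_D(x))|."""
    if z is None:
        z = sparse_encoder(x, D, lam)
    slack = lam - np.abs(D.T @ (x - D @ z))   # slack per atom; lies in [0, lam] at an exact solution
    return float(np.sort(slack)[s])           # index s corresponds to the (s+1)-th smallest value

def min_encoder_gap(X, D, lam, s):
    """tau_s^* = min_i tau_s(x_i), the minimal encoder gap over the rows of X."""
    return min(encoder_gap(x, D, lam, s) for x in X)
```

Curves such as those in Figure 1 can be produced by evaluating `min_encoder_gap` over a grid of $s$ for several values of $\lambda$.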
In summary, the encoder gap is a measure of the ability of a dictionary to sparsely represent data, and one can induce a larger encoder gap by increasing the regularization parameter $\lambda$ or the cardinality $s$. As we will shortly see, this provides a controllable knob in our generalization bound.

## 3 Prior Work

Many results exist on the approximation power and stability of Lasso (see [Foucart and Rauhut, 2017]), most of which rely on assuming that the data is (approximately) $k$-sparse under a given dictionary. As explained in the previous section, we instead follow an analysis inspired by Mehta and Gray [2013], which relies on the encoder gap. Mehta and Gray [2013] leverage the encoder gap to derive a generalization bound for the supervised sparse coding model in a stochastic (non-adversarial) setting. Their result, which follows a symmetrization technique [Mendelson and Philips, 2004], scales as $O\big(\sqrt{(dp + \log(1/\delta))/m}\big)$ and requires a minimal number of samples that is $O(1/(\tau_s \lambda))$. In contrast, we study generalization in the adversarially robust setting detailed above. Our analysis is based on an $\epsilon$-cover of the parameter space and on a local Lipschitz property of the adversarial loss. The proof of our generalization bound is simpler, and shows a mild deterioration of the upper bound on the generalization gap due to adversarial corruption.

Our work is also inspired by the line of work initiated by Papyan et al. [2017a], who regard the representations computed by neural networks as approximations of those computed by a Lasso encoder across different layers. In fact, a first analysis of adversarial robustness for such a model is presented by Romano et al. [2019]; however, they make strong generative model assumptions and thus their results are not applicable to real-data practical scenarios. Our robustness certificate mirrors the analysis of that work, though it leverages a new and more general stability bound (Lemma 5.2) that relies instead on the existence of a positive encoder gap. In a related work, and in the context of neural networks, Cisse et al. [2017] propose a regularization term inspired by Parseval frames, with the empirical motivation of improving adversarial robustness. Their regularization term can in fact be related to minimizing the (average) mutual coherence of the dictionaries, which naturally arises as a control for the generalization gap in our analysis.

Lastly, several works have employed sparsity as a beneficial property in adversarial learning [Marzi et al., 2018, Demontis et al., 2016], with little or no theoretical analysis, or in different frameworks (e.g. sparse weights in deep networks [Guo et al., 2018, Balda et al., 2019], or in different domains [Bafna et al., 2018]). Our setting is markedly different from that of Chen et al. [2013], who study the adversarial robustness of Lasso as a sparse predictor acting directly on input features. In contrast, the model we study here employs Lasso as an encoder with a data-dependent dictionary, on which a linear hypothesis is applied. A few works have recently begun to analyze the effect of learned representations in an adversarial learning setting [Ilyas et al., 2019, Allen-Zhu and Li, 2020]. Adding to that line of work, our analysis demonstrates that benefits can be obtained by exploiting a trade-off between the expressivity and stability of the computed representations and the classifier margin.
## 4 Generalization bound for robust risk

In this section, we present a bound on the robust risk for models satisfying a positive encoder gap. Recall that, given a $b$-bounded loss $\ell$ with Lipschitz constant $L_\ell$, $\tilde{R}_S(f) = \frac{1}{m}\sum_{i=1}^m \tilde{\ell}_\nu(y_i, f(x_i))$ is the empirical robust risk and $\tilde{R}(f) = \mathbb{E}_{(x,y)\sim P}\,[\tilde{\ell}_\nu(y, f(x))]$ is the population robust risk with respect to the distribution $P$, where adversarial perturbations are bounded in $\ell_2$ norm by $\nu$. Our main result below guarantees that if a hypothesis $f_{D,w}$ is found with a sufficiently large encoder gap, and a large enough training set, its generalization gap is bounded as $\tilde{O}\big(b\sqrt{dp/m}\big)$, where $\tilde{O}$ ignores poly-logarithmic factors.

Theorem 4.1. Let $\mathcal{W} = \{w \in \mathbb{R}^p : \|w\|_2 \leq B\}$, and let $\mathcal{D}$ be the set of column-normalized dictionaries with $p$ columns and RIP constant at most $\eta_s$. Let $\mathcal{H} = \{f_{D,w}(x) = \langle w, \phi_D(x)\rangle : w \in \mathcal{W}, D \in \mathcal{D}\}$. Denote by $\tau_s^*$ the minimal encoder gap over the $m$ samples. Then, with probability at least $1 - \delta$ over the draw of the $m$ samples, the generalization gap of any hypothesis $f \in \mathcal{H}$ that achieves an encoder gap on the samples of $\tau_s^* > 2\nu$ satisfies
$$\tilde{R}(f) - \tilde{R}_S(f) \;\leq\; b\,\sqrt{\frac{(d+1)\,p\,\log\!\big(\frac{3m}{2\lambda(1-\eta_s)}\big) + p\log(B) + \log 4}{m}} \;+\; b\,\sqrt{\frac{2\log(m/2) + 2\log(2/\delta)}{m}} \;+\; \frac{12\,(1+\nu)^2 L_\ell B \sqrt{s}}{m},$$
as long as $m > \frac{\lambda(1-\eta_s)}{(\tau_s^* - 2\nu)^2}\,K_\lambda$, where $K_\lambda = 2\big(1 + \frac{1+\nu}{2\lambda}\big) + 5(1+\nu)$.

A few remarks are in order. First, note that adversarial generalization incurs a polynomial dependence on the adversarial perturbation $\nu$. This is mild, especially since it only affects the fast $O(1/m)$ term. Second, the bound requires a minimal number of samples. Such a requirement is intrinsic to the stability of Lasso (see Lemma 4.2 below) and exists also in the non-adversarial setting [Mehta and Gray, 2013]. In the adversarial case, this requirement becomes more demanding, as reflected by the term $(\tau_s^* - 2\nu)$ in the denominator. Moreover, a minimal encoder gap $\tau_s^* > 2\nu$ is needed as well.

Theorem 4.1 suggests an interesting trade-off. One can obtain a large $\tau_s^*$ by increasing $\lambda$ and $s$, as demonstrated in Figure 1. But increasing $\lambda$ may come at the expense of hurting the empirical error, while increasing $s$ makes the term $1 - \eta_s$ smaller. Therefore, if one obtains a model with small training error, along with a large $\tau_s^*$ over the training samples for an appropriate choice of $\lambda$ and $s$, while ensuring that $\eta_s$ is bounded away from 1, then $f_{D,w}$ is guaranteed to generalize well. Furthermore, note that the excess error depends only mildly (poly-logarithmically) on $\lambda$ and $\eta_s$.

Our proof technique is based on a minimal $\epsilon$-cover of the parameter space, and the full proof is included in Appendix B. Special care is needed to ensure that the encoder gap of the dictionary holds for a sample drawn from the population, as we can only measure this gap on the provided $m$ samples. To address this, we split the data equally into a training set and a development set: the former is used to learn the dictionary, and the latter to provide a high-probability bound on the event that $\tau_s(x) > \tau_s^*$. This ensures that the random samples of the encoder margin are i.i.d. for measure concentration. Ideally, we would like to utilize the entire dataset for learning the predictor; we leave that for future work. While most of the techniques we use are standard³, the Lipschitz continuity of the robust loss function requires a more delicate analysis. For that, we have the following result.

³ See [Seibert, 2019] for a comprehensive review of these tools in matrix factorization problems.

Lemma 4.2 (Parameter adversarial stability). Let $D, D' \in \mathcal{D}$.
If $\|D - D'\|_2 \leq \epsilon \leq 2\lambda/(1+\nu)^2$, then
$$\max_{\|v\|_2 \leq \nu} \big\| \phi_D(x + v) - \phi_{D'}(x + v) \big\|_2 \;\leq\; \gamma\,(1+\nu)^2\,\epsilon, \qquad \text{with } \gamma = \frac{\sqrt{s}}{\lambda(1-\eta_s)}, \tag{2}$$
as long as $\tau_s(x) \geq 2\nu + \sqrt{\tfrac{\epsilon}{\lambda}}\,(1+\nu) + \tfrac{2\epsilon(1+\nu)}{\lambda}$.

Lemma 4.2 is central to our proof, as it provides a bound on the difference between the features computed by the encoder under model deviations. Note that the condition on the minimal encoder gap, $\tau_s(x)$, puts an upper bound on the distance between the models $D$ and $D'$. This in turn results in the condition imposed on the minimal number of samples in Theorem 4.1. It is worth stressing that the lower bound on $\tau_s(x)$ is on the unperturbed encoder gap, which can be evaluated on the samples from the dataset without the need of the adversary. We defer the proof of this lemma to Appendix B.1.

## 5 Robustness Certificate

Next, we turn to address an equally important question about robust adversarial learning, that of giving a formal certification of robustness. Formally, we would like to guarantee that the output of the trained model, $f_{D,w}(x)$, does not change under norm-bounded adversarial perturbations of a certain size. Our second main result provides such a certificate for the supervised sparse coding model. Here, we consider a multiclass classification setting with $y \in \{1, \dots, K\}$; simplified results for binary classification are included in Appendix C. The hypothesis class is parameterized as $f_{D,W}(x) = W^T \phi_D(x)$, with $W = [W_1, W_2, \dots, W_K] \in \mathbb{R}^{p \times K}$. The multiclass margin at a labeled sample $(x, y)$ is defined as
$$\rho_x = W_y^T \phi_D(x) - \max_{j \neq y} W_j^T \phi_D(x).$$
We show the following result.

Theorem 5.1 (Robustness certificate for multiclass supervised sparse coding). Let $\rho_x > 0$ be the multiclass classifier margin of $f_{D,W}(x)$, composed of an encoder with gap $\tau_s(x)$ and a dictionary $D$ with RIP constant $\eta_s < 1$. Let $c_W := \max_{i \neq j} \|W_i - W_j\|_2$. Then,
$$\arg\max_{j \in [K]} \, [W^T \phi_D(x)]_j \;=\; \arg\max_{j \in [K]} \, [W^T \phi_D(x + v)]_j, \quad \forall\, v : \|v\|_2 \leq \nu, \tag{3}$$
so long as $\nu \leq \min\{\tau_s(x)/2, \; \rho_x \sqrt{1 - \eta_s}/c_W\}$.

Theorem 5.1 clearly captures the potential contamination on two flanks: robustness can no longer be guaranteed as soon as the energy of the perturbation is enough either to significantly modify the computed representation, or to induce a perturbation larger than the classifier margin in the feature space. The proof of Theorem 5.1, detailed in Appendix C, relies on the following lemma, showing that under an encoder gap assumption the computed features are only moderately affected by adversarial corruptions of the input vector.

Lemma 5.2 (Stability of representations under adversarial perturbations). Let $D$ be a dictionary with RIP constant $\eta_s$. Then, for any $x \in \mathcal{X}$ and its additive perturbation $x + v$ with $\|v\|_2 \leq \nu$, if $\tau_s(x) > 2\nu$, then we have that
$$\|\phi_D(x) - \phi_D(x + v)\|_2 \;\leq\; \frac{\nu}{\sqrt{1 - \eta_s}}. \tag{4}$$

An extensive set of results exists for the stability of the solution provided by Lasso relying on generative model assumptions [Foucart and Rauhut, 2017, Elad, 2010]. The novelty of Lemma 5.2 is in replacing such assumptions with the existence of a positive encoder gap on $\phi_D(x)$. Going back to Theorem 5.1, note that the upper bound on $\nu$ depends on the RIP constant $\eta_s$, which is not computable for a given (deterministic) matrix $D$. Yet, this result can be naturally relaxed by upper bounding $\eta_s$ with measures of correlation between the atoms, such as the mutual coherence. This quantity provides a measure of the worst correlation between two atoms in the dictionary $D$, and it is defined as $\mu(D) = \max_{i \neq j} |\langle D_i, D_j \rangle|$ (for $D$ with normalized columns). For general (overcomplete and full-rank) dictionaries, clearly $0 < \mu(D) \leq 1$. While conceptually simple, results that use $\mu(D)$ tend to be too conservative.
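Given any valid upper bound on $\eta_s$, checking the sufficient condition of Theorem 5.1 for a labeled sample reduces to two scalar comparisons. A minimal sketch in numpy; here `W` holds one classifier column per class, `eta_s` stands for whatever upper bound on the RIP constant is available, and the helper names are those of the earlier sketches:

```python
import numpy as np

def classifier_margin(z, W, y):
    """Multiclass margin rho_x = W_y^T z - max_{j != y} W_j^T z for features z = phi_D(x)."""
    scores = W.T @ z
    return scores[y] - np.max(np.delete(scores, y))

def is_certified(tau_s_x, rho_x, c_W, eta_s, nu):
    """Sufficient condition of Theorem 5.1 for one sample: True when the prediction
    provably cannot change under any perturbation with ||v||_2 <= nu."""
    return rho_x > 0 and nu <= min(tau_s_x / 2.0, rho_x * np.sqrt(1.0 - eta_s) / c_W)
```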
Tighter bounds on $\eta_s$ can be provided by the Babel function⁴, $\mu(s)$, which quantifies the maximum correlation between an atom and any other collection of $s$ atoms in $D$. It can be shown [Tropp et al., 2003, Elad, 2010, Chapter 2] that $\eta_s \leq \mu(s-1) \leq (s-1)\mu(D)$. Therefore, we have the following:

Corollary 5.3. Under the same assumptions as those in Theorem 5.1,
$$\arg\max_{j \in [K]} \, [W^T \phi_D(x)]_j \;=\; \arg\max_{j \in [K]} \, [W^T \phi_D(x + v)]_j, \quad \forall\, v : \|v\|_2 \leq \nu, \tag{5}$$
so long as $\nu \leq \min\{\tau_s(x)/2, \; \rho_x \sqrt{1 - \mu(s-1)}/c_W\}$.

⁴ Let $\Lambda$ denote subsets (supports) of $\{1, 2, \dots, p\}$. Then, the Babel function is defined as $\mu(s) = \max_{\Lambda : |\Lambda| = s} \max_{j \notin \Lambda} \sum_{i \in \Lambda} |\langle D_i, D_j \rangle|$.

Although the condition on $\nu$ in the corollary above is stricter (and the bound looser), the quantities involved can easily be computed numerically, leading to practical and useful bounds, as we see next.

## 6 Experiments

In this section, we illustrate the robustness certificate guarantees on both synthetic and real data, as well as the trade-offs between the constants in our sample complexity result. First, we construct samples from a separable binary distribution of $k$-sparse signals. To this end, we employ a dictionary with 120 atoms in 100 dimensions with a mutual coherence of 0.054. Sparse representations $z$ are constructed by first drawing their support (with cardinality $k$) uniformly at random, and then drawing their non-zero entries from a uniform distribution bounded away from zero. Samples are obtained as $x = Dz$ and normalized to unit norm. We finally enforce separability by drawing $w$ at random from the unit ball, determining the labels as $y = \mathrm{sign}(w^T \phi_D(x))$, and discarding samples with a margin $\rho$ smaller than a pre-specified amount (0.05 in this case). Because of the separable construction, the accuracy of the resulting classifier is 1. We then attack the obtained model using the projected gradient descent method [Madry et al., 2017], and analyze the degradation in accuracy as a function of the energy budget $\nu$. We compare this empirical performance with the bound in Corollary 5.3: given the obtained margin, $\rho$, and the dictionary's Babel function, $\mu(s)$, we can compute the maximal certified radius for a sample $x$ as
$$\nu^*(x) = \max_s \; \min\{\tau_s(x)/2, \; \rho_x \sqrt{1 - \mu(s-1)}/c_W\}. \tag{6}$$
For a given dataset, we can compute the minimal certified radius over the samples, $\nu^* = \min_{i \in [n]} \nu^*(x_i)$. This is the bound depicted by the vertical line in Figure 2a. As can be seen, despite being somewhat loose, attacks within this radius do not change the label of the samples, thus preserving the accuracy.

In non-separable distributions, one may study how the accuracy depends on the soft margin of the classifier. In this way, one can determine a target margin that results in, say, 75% accuracy on a validation set. One can obtain a corresponding certified radius $\nu^*$ as before, which guarantees that the accuracy will not drop below 75% as long as $\nu < \nu^*$. This is illustrated in Figure 2b.

An alternative way of employing our results from Section 5 is to study the certified accuracy achieved by the resulting hypothesis. The certified accuracy quantifies the percentage of samples in a test set that are classified correctly while being certifiable. In our context, this means that a sample $x$ achieves a margin of $\rho_x$ for which a certified radius of $\nu^*$ can be obtained with (6). In this way, one may study how the certified accuracy decreases with increasing $\nu^*$.
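Since $\mu(s)$ depends only on the Gram matrix $D^T D$, the certified radius in (6) is simple to evaluate. A minimal sketch, reusing the illustrative `sparse_encoder`, `encoder_gap`, and `classifier_margin` helpers from the earlier sketches; the range of cardinalities `s_values` over which to maximize is a user choice:

```python
import numpy as np

def babel(D, s):
    """Babel function mu(s): for each atom j, sum the s largest off-diagonal
    correlations |<D_i, D_j>| in its Gram column, and take the maximum over j."""
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)
    G.sort(axis=0)                                  # ascending within each column
    return float(G[-s:, :].sum(axis=0).max())

def certified_radius(x, y, D, W, lam, s_values):
    """nu*(x) = max_s min{ tau_s(x)/2, rho_x * sqrt(1 - mu(s-1)) / c_W }, as in (6)."""
    z = sparse_encoder(x, D, lam)
    rho = classifier_margin(z, W, y)
    if rho <= 0:                                    # misclassified samples are not certified
        return 0.0
    c_W = max(np.linalg.norm(W[:, i] - W[:, j])
              for i in range(W.shape[1]) for j in range(W.shape[1]) if i != j)
    best = 0.0
    for s in s_values:
        mu_sm1 = babel(D, s - 1) if s > 1 else 0.0
        if mu_sm1 >= 1.0:                           # Corollary 5.3 gives nothing for this s
            continue
        best = max(best, min(encoder_gap(x, D, lam, s, z=z) / 2.0,
                             rho * np.sqrt(1.0 - mu_sm1) / c_W))
    return best
```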
This analysis lets us compare our bounds with those of other popular certification techniques, such as randomized smoothing [Cohen et al., 2019]. Randomized smoothing provides high-probability robustness guarantees against $\ell_2$ attacks for any classifier by composing it with a Gaussian distribution (other distributions have recently been explored as well for other $\ell_p$ norms [Salman et al., 2020]). In a nutshell, the larger the variance of the Gaussian, the larger the certifiable radius becomes, albeit at the expense of a drop in accuracy. We use the MNIST dataset for this analysis. We train a model with 256 atoms by minimizing the following regularized empirical risk using stochastic gradient descent (employing Adam [Kingma and Ba, 2014]; the implementation details are deferred to Appendix D):
$$\min_{W, D} \; \frac{1}{m} \sum_{i=1}^m \ell\big(y_i, W^T \phi_D(x_i)\big) + \alpha \,\|I - D^T D\|_F^2 + \beta\, \|W\|_F^2, \tag{7}$$
where $\ell$ is the cross-entropy loss. Recall that $\phi_D(x)$ depends on $\lambda$, and we train two different models with two values of this parameter ($\lambda = 0.2$ and $\lambda = 0.3$).
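For concreteness, the following is a minimal PyTorch sketch of one optimization step on the objective in (7), in which the encoder is computed by unrolling ISTA so that gradients with respect to $D$ flow through $\phi_D$. The unrolling depth, the hyperparameters $\alpha$, $\beta$, and $\lambda$, and the renormalization of the atoms after each step are choices of this sketch, and need not match the implementation described in Appendix D or in our code repository.

```python
import torch
import torch.nn.functional as F

def ista_encode(X, D, lam, n_iter=50):
    """Unrolled ISTA approximation of phi_D on a batch X (one sample per row);
    the iterations are differentiable, so gradients reach D."""
    L = torch.linalg.matrix_norm(D, 2) ** 2            # squared spectral norm of D
    Z = torch.zeros(X.shape[0], D.shape[1], device=X.device)
    for _ in range(n_iter):
        grad = (Z @ D.T - X) @ D                        # row-wise gradient of 0.5*||x - D z||_2^2
        Z = F.softshrink(Z - grad / L, (lam / L).item())
    return Z

def train_step(X, y, D, W, optimizer, lam=0.2, alpha=1e-3, beta=1e-4):
    """One step on the regularized empirical risk (7) with the cross-entropy loss."""
    Z = ista_encode(X, D, lam)
    loss = F.cross_entropy(Z @ W, y)
    loss = loss + alpha * ((torch.eye(D.shape[1], device=D.device) - D.T @ D) ** 2).sum()
    loss = loss + beta * (W ** 2).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():                               # keep the atoms (columns of D) unit-norm
        D /= D.norm(dim=0, keepdim=True)
    return loss.item()
```

Here `D` and `W` would be `torch.nn.Parameter` tensors of shapes $d \times p$ and $p \times K$, optimized jointly, e.g. with `torch.optim.Adam([D, W], lr=1e-3)`.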
Figure 2: Numerical demonstrations of our results. (a) Accuracy as a function of the adversary's energy budget $\nu$ for the synthetic separable distribution, with our certified bound marked. (b) The same for a synthetic non-separable distribution, with the bound corresponding to 75% accuracy marked. (c-d) Certified accuracy as a function of the certified radius on MNIST with $\lambda = 0.2$ and $\lambda = 0.3$, respectively, comparing our bound with randomized smoothing at noise levels $\sigma \in \{0, 0.03, 0.05, 0.08, 0.10\}$.

Figures 2c and 2d illustrate the certified accuracy on 200 test samples obtained by different degrees of randomized smoothing and by our result. While the certified accuracy resulting from our bound is comparable to that of randomized smoothing, the latter provides a certificate by defending the model (i.e. composing it with a Gaussian distribution). In other words, different smoothed models have to be constructed to provide different levels of certified accuracy. In contrast, our model is not defended or modified in any way, and the certificate relies solely on our careful characterization of the function class. Since randomized smoothing makes no assumptions about the model, the bounds provided by this strategy rely on the estimation of the output probabilities. This results in a heavy computational burden to provide a high-probability result (a failure probability of 0.01% was used for these experiments). In contrast, our bound is deterministic and trivial to compute. Lastly, comparing the results in Figure 2c (where $\lambda = 0.2$) and Figure 2d (where $\lambda = 0.3$), we see the trade-off we alluded to in Section 4: larger values of $\lambda$ allow for larger encoder gaps, resulting in an overall larger possible certified radius. In fact, $\lambda$ determines a hard bound on the achievable certified radius, given by $\lambda/2$, as per (6). This, however, comes at the expense of reducing the complexity of the representations computed by the encoder $\phi_D(x)$, which impacts the risk attained.

## 7 Conclusion

In this paper we study the adversarial robustness of the supervised sparse coding model from two main perspectives: we provide a bound on the robust risk of any hypothesis that achieves a minimum encoder gap over the samples, as well as a robustness certificate for the resulting end-to-end classifier. Our results describe guarantees relying on the interplay between the computed representations, or features, and the classifier margin. While the model studied is still relatively simple, we envision several ways in which our analysis can be extended to more complex models. First, high-dimensional data with shift-invariant properties (such as images) often benefit from convolutional features. Our results do hold for convolutional dictionaries, but the conditions on the mutual coherence may become prohibitive in this setting. An analogous definition of the encoder gap in terms of convolutional sparsity [Papyan et al., 2017b] may provide a solution to this limitation. Furthermore, this analysis could also be extended to sparse models with multiple layers, as in [Papyan et al., 2017a, Sulam et al., 2019]. On the other hand, while our result does not provide a uniform learning bound over the hypothesis class, we have found empirically that regularized ERM does indeed return hypotheses satisfying non-trivial encoder gaps. The theoretical underpinning of this phenomenon needs further research. More generally, even though this work focuses on sparse encoders, we believe similar principles could be generalized to other forms of representations in a supervised learning setting, providing a framework for the principled analysis of adversarial robustness of machine learning models.

## Broader Impact

This work contributes to the theoretical understanding of the limitations and achievable robustness guarantees for supervised learning models. Our results can therefore provide tools that could be deployed in sensitive settings where these types of guarantees are a priority. On a broader note, this work advocates for the precise analysis and characterization of the data-driven features computed by modern machine learning models, and we hope our results facilitate their generalization to other, more complex models.

## Acknowledgements

This research was supported, in part, by DARPA GARD award HR00112020004, NSF BIGDATA award IIS-1546482, NSF CAREER award IIS-1943251 and NSF TRIPODS award CCF-1934979. Jeremias Sulam kindly thanks Aviad Aberdam for motivating and inspiring discussions. Raman Arora acknowledges support from the Simons Institute as part of the program on the Foundations of Deep Learning and the Institute for Advanced Study (IAS), Princeton, NJ, as part of the special year on Optimization, Statistics, and Theoretical Machine Learning.

## References

Aviad Aberdam, Jeremias Sulam, and Michael Elad. Multi-layer sparse coding: the holistic way. SIAM Journal on Mathematics of Data Science, 1(1):46–77, 2019.
Aviad Aberdam, Alona Golts, and Michael Elad. Ada-LISTA: Learned solvers adaptive to varying models. arXiv preprint arXiv:2001.08456, 2020.
Alireza Aghasi, Afshin Abdi, and Justin Romberg. Fast convex pruning of deep neural networks. SIAM Journal on Mathematics of Data Science, 2(1):158–188, 2020.
Zeyuan Allen-Zhu and Yuanzhi Li. Feature purification: How adversarial training performs robust deep learning. arXiv preprint arXiv:2005.10190, 2020.
Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.
Mitali Bafna, Jack Murtagh, and Nikhil Vyas. Thwarting adversarial examples: An ℓ0-robust sparse Fourier transform. In Advances in Neural Information Processing Systems, pages 10075–10085, 2018.
Emilio Rafael Balda, Arash Behboodi, Niklas Koep, and Rudolf Mathar. Adversarial risk bounds for neural networks through sparsity based compression. arXiv preprint arXiv:1906.00698, 2019.
Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 387–402. Springer, 2013.
Sébastien Bubeck, Eric Price, and Ilya Razenshteyn. Adversarial examples from computational constraints. arXiv preprint arXiv:1805.10204, 2018.
Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 3–14, 2017.
Zachary Charles, Shashank Rajput, Stephen Wright, and Dimitris Papailiopoulos. Convergence and margin of adversarial training on separable data. arXiv preprint arXiv:1905.09209, 2019.
Yudong Chen, Constantine Caramanis, and Shie Mannor. Robust sparse regression under adversarial corruption. In International Conference on Machine Learning, pages 774–782, 2013.
Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 854–863. JMLR.org, 2017.
Adam Coates and Andrew Y. Ng. The importance of encoding versus training with sparse coding and vector quantization. 2011.
Jeremy M. Cohen, Elan Rosenfeld, and J. Zico Kolter. Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918, 2019.
Daniel Cullina, Arjun Nitin Bhagoji, and Prateek Mittal. PAC-learning in the presence of adversaries. In Advances in Neural Information Processing Systems, pages 230–241, 2018.
DARPA. https://www.darpa.mil/news-events/2019-02-06, 2019.
Ambra Demontis, Paolo Russu, Battista Biggio, Giorgio Fumera, and Fabio Roli. On security and sparsity of linear classifiers for adversarial settings. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR), pages 322–332. Springer, 2016.
David L. Donoho and Michael Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proceedings of the National Academy of Sciences, 100(5):2197–2202, 2003.
Michael Elad. Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer Science & Business Media, 2010.
Simon Foucart and Holger Rauhut. A mathematical introduction to compressive sensing. Bull. Am. Math, 54:151–165, 2017.
Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
Rémi Gribonval, Rodolphe Jenatton, Francis Bach, Martin Kleinsteuber, and Matthias Seibert. Sample complexity of dictionary learning and other matrix factorizations. IEEE Transactions on Information Theory, 61(6):3469–3486, 2015.
Shixiang Gu and Luca Rigazio. Towards deep neural network architectures robust to adversarial examples. arXiv preprint arXiv:1412.5068, 2014.
Shuhang Gu, Lei Zhang, Wangmeng Zuo, and Xiangchu Feng. Projective dictionary pair learning for pattern classification. In Advances in Neural Information Processing Systems, pages 793–801, 2014.
Yiwen Guo, Chao Zhang, Changshui Zhang, and Yurong Chen. Sparse DNNs with improved adversarial robustness. In Advances in Neural Information Processing Systems, pages 242–251, 2018.
Mikael Henaff, Kevin Jarrett, Koray Kavukcuoglu, and Yann LeCun. Unsupervised learning of sparse features for scalable audio classification. In ISMIR, volume 11, page 2011, 2011.
Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Madry. Adversarial examples are not bugs, they are features. In Advances in Neural Information Processing Systems, pages 125–136, 2019.
Koray Kavukcuoglu, Marc'Aurelio Ranzato, and Yann LeCun. Fast inference in sparse coding algorithms with applications to object recognition. arXiv preprint arXiv:1010.3467, 2010.
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
Yan Li, Ethan X. Fang, Huan Xu, and Tuo Zhao. Inductive bias of gradient descent based adversarial training on separable data. arXiv preprint arXiv:1906.02931, 2019.
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
Julien Mairal, Michael Elad, and Guillermo Sapiro. Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17(1):53–69, 2007.
Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro, and Andrew Zisserman. Discriminative learned dictionaries for local image analysis. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2008.
Julien Mairal, Francis Bach, and Jean Ponce. Task-driven dictionary learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4):791–804, 2011.
Zhinus Marzi, Soorya Gopalakrishnan, Upamanyu Madhow, and Ramtin Pedarsani. Sparsity-based defense against adversarial attacks on linear classifiers. In 2018 IEEE International Symposium on Information Theory (ISIT), pages 31–35. IEEE, 2018.
Nishant Mehta and Alexander Gray. Sparsity-based generalization bounds for predictive sparse coding. In International Conference on Machine Learning, pages 36–44, 2013.
Shahar Mendelson and Petra Philips. On the importance of small coordinate projections. Journal of Machine Learning Research, 5(Mar):219–238, 2004.
Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations. arXiv preprint arXiv:1702.04267, 2017.
Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2018.
Thomas Moreau and Joan Bruna. Understanding trainable sparse coding via matrix factorization. arXiv preprint arXiv:1609.00285, 2016.
Calvin Murdock and Simon Lucey. Dataless model selection with the deep frame potential. arXiv preprint arXiv:2003.13866, 2020.
Vardan Papyan, Yaniv Romano, and Michael Elad. Convolutional neural networks analyzed via convolutional sparse coding. The Journal of Machine Learning Research, 18(1):2887–2938, 2017a.
Vardan Papyan, Jeremias Sulam, and Michael Elad. Working locally thinking globally: Theoretical guarantees for convolutional sparse coding. IEEE Transactions on Signal Processing, 65(21):5687–5701, 2017b.
Vardan Papyan, Yaniv Romano, Jeremias Sulam, and Michael Elad. Theoretical foundations of deep learning via sparse representations: A multilayer sparse model and its connection to convolutional neural networks. IEEE Signal Processing Magazine, 35(4):72–89, 2018.
Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. arXiv preprint arXiv:1801.09344, 2018.
Marc'Aurelio Ranzato, Fu Jie Huang, Y-Lan Boureau, and Yann LeCun. Unsupervised learning of invariant feature hierarchies with applications to object recognition. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2007.
Yaniv Romano, Aviad Aberdam, Jeremias Sulam, and Michael Elad. Adversarial noise attacks of deep learning architectures: Stability analysis via sparse-modeled signals. Journal of Mathematical Imaging and Vision, pages 1–15, 2019.
Hadi Salman, Mingjie Sun, Greg Yang, Ashish Kapoor, and J. Zico Kolter. Black-box smoothing: A provable defense for pretrained classifiers. arXiv preprint arXiv:2003.01908, 2020.
Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data. In Advances in Neural Information Processing Systems, pages 5014–5026, 2018.
Matthias Seibert. Sample Complexity of Representation Learning for Sparse and Related Data Models. PhD thesis, Technische Universität München, 2019.
Ali Shafahi, W. Ronny Huang, Christoph Studer, Soheil Feizi, and Tom Goldstein. Are adversarial examples inevitable? arXiv preprint arXiv:1809.02104, 2018.
Jeremias Sulam, Aviad Aberdam, Amir Beck, and Michael Elad. On multi-layer basis pursuit, efficient algorithms and convolutional neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.
Ryan J. Tibshirani et al. The lasso problem and uniqueness. Electronic Journal of Statistics, 7:1456–1490, 2013.
Bahareh Tolooshams, Sourav Dey, and Demba Ba. Deep residual auto-encoders for expectation maximization-based dictionary learning. arXiv preprint arXiv:1904.08827, 2019.
Florian Tramer, Nicholas Carlini, Wieland Brendel, and Aleksander Madry. On adaptive attacks to adversarial example defenses. arXiv preprint arXiv:2002.08347, 2020.
Joel A. Tropp. Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Transactions on Information Theory, 52(3):1030–1051, 2006.
Joel A. Tropp, Anna C. Gilbert, Sambavi Muthukrishnan, and Martin J. Strauss. Improved sparse approximation over quasiincoherent dictionaries. In Proceedings 2003 International Conference on Image Processing (Cat. No. 03CH37429), volume 1, pages I-37. IEEE, 2003.
Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. arXiv preprint arXiv:1805.12152, 2018.
Zhuozhuo Tu, Jingwei Zhang, and Dacheng Tao. Theoretical analysis of adversarial learning: A minimax approach. In Advances in Neural Information Processing Systems, pages 12259–12269, 2019.
Vladimir N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. In Theory of Probability and its Applications, pages 264–280, 1971.
Eric Wong and J. Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. arXiv preprint arXiv:1711.00851, 2017.
John Wright, Yi Ma, Julien Mairal, Guillermo Sapiro, Thomas S. Huang, and Shuicheng Yan. Sparse representation for computer vision and pattern recognition. Proceedings of the IEEE, 98(6):1031–1044, 2010.
Bo Xin, Yizhou Wang, Wen Gao, David Wipf, and Baoyuan Wang. Maximal sparsity with deep networks? In Advances in Neural Information Processing Systems, pages 4340–4348, 2016.
Dong Yin, Kannan Ramchandran, and Peter Bartlett. Rademacher complexity for adversarially robust generalization. arXiv preprint arXiv:1810.11914, 2018.
Matthew D. Zeiler, Dilip Krishnan, Graham W. Taylor, and Rob Fergus. Deconvolutional networks. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 2528–2535. IEEE, 2010.