# Functional Adversarial Attacks

Cassidy Laidlaw, University of Maryland (claidlaw@umd.edu)
Soheil Feizi, University of Maryland (sfeizi@cs.umd.edu)

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

We propose functional adversarial attacks, a novel class of threat models for crafting adversarial examples to fool machine learning models. Unlike a standard $\ell_p$-ball threat model, a functional adversarial threat model allows only a single function to be used to perturb input features to produce an adversarial example. For example, a functional adversarial attack applied to the colors of an image can change all red pixels simultaneously to light red. Such global, uniform changes in images can be less perceptible than perturbing pixels of the image individually. For simplicity, we refer to functional adversarial attacks on image colors as ReColorAdv, which is the main focus of our experiments. We show that functional threat models can be combined with existing additive ($\ell_p$) threat models to generate stronger threat models that allow both small, individual perturbations and large, uniform changes to an input. Moreover, we prove that such combinations encompass perturbations that would not be allowed in either constituent threat model. In practice, ReColorAdv can significantly reduce the accuracy of a ResNet-32 trained on CIFAR-10. Furthermore, to the best of our knowledge, combining ReColorAdv with other attacks leads to the strongest existing attack, even after adversarial training.

## 1 Introduction

There is an extensive recent literature on adversarial examples: small perturbations to inputs of machine learning algorithms that cause the algorithms to report an erroneous output, e.g. the incorrect label for a classifier. Adversarial examples present serious security challenges for real-world systems like self-driving cars, since a change in the environment that is not noticeable to a human may cause unexpected, unwanted, or dangerous behavior. Many methods of generating adversarial examples (called adversarial attacks) have been proposed [23, 5, 15, 17, 3]. Defenses against such attacks have also been explored [18, 14, 30].

Most existing attack and defense methods consider a threat model in which adversarial examples can differ from normal inputs by a small $\ell_p$ distance. However, this threat model encompasses only a simple definition of "small perturbation" and misses other types of perturbations that may also be imperceptible to humans. For instance, small spatial perturbations have been used to generate adversarial examples [4, 27, 26].

In this paper, we propose a new class of threat models for adversarial attacks, called functional threat models. Under a functional threat model, adversarial examples can be generated from a regular input to a classifier by applying a single function to all features of the input:

$$\text{Additive threat model:}\quad (x_1, \dots, x_n) \mapsto (x_1 + \delta_1, \dots, x_n + \delta_n)$$
$$\text{Functional threat model:}\quad (x_1, \dots, x_n) \mapsto (f(x_1), \dots, f(x_n))$$

For instance, the perturbation function $f(\cdot)$ could darken every red pixel in an image, or increase the volume of every timestep in an audio sample. Functional threat models are in some ways more restrictive because features cannot be perturbed individually. However, the uniformity of the perturbation in a functional threat model makes the change less perceptible, allowing for larger absolute modifications.
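To make the contrast concrete, the following minimal NumPy sketch (our own illustration; the darkening function and the bound values are hypothetical, not taken from the paper) applies an additive perturbation, which changes each pixel independently, next to a functional perturbation, which applies one mapping to every pixel:

```python
import numpy as np

x = np.random.rand(32, 32, 3)  # a toy RGB image; each pixel is a feature in [0, 1]^3

# Additive threat model: each feature gets its own small perturbation delta_i,
# constrained here to an l_inf ball of radius 8/255.
delta = np.random.uniform(-8 / 255, 8 / 255, size=x.shape)
x_additive = np.clip(x + delta, 0, 1)

# Functional threat model: a single function f is applied to every feature.
# Here f uniformly darkens every pixel by 20%; the change is large in absolute
# terms but uniform, so it tends to be less perceptible.
def f(pixel):
    return 0.8 * pixel

x_functional = np.apply_along_axis(f, -1, x)  # equivalent to f(x) here, applied pixel by pixel
```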
Figure 1: A visualization of an additive adversarial attack, a functional adversarial attack, and their combination. The additive attack perturbs each feature (pixel) separately, whereas the functional attack applies the same function $f(\cdot)$ to every feature.

For example, one could darken or lighten an entire image by quite a bit without the change becoming noticeable. This stands in contrast to separate changes to each pixel, which must be smaller to avoid becoming perceptible. We discuss various regularizations that can be applied to the perturbation function $f(\cdot)$ to ensure that even large changes are imperceptible.

The advantages and disadvantages of additive ($\ell_p$) and functional threat models complement each other: additive threat models allow small, individual changes to every feature of an input, while functional threat models allow large, uniform changes. Thus, we combine the threat models (see figure 1) and show that the combination encompasses more potential perturbations than either one separately, as stated informally in the following theorem and made precise in section 3.2.

Theorem 1 (informal). Let $x$ be a grayscale image with $n \geq 2$ pixels. Consider an additive threat model that allows changing each pixel by up to a certain amount, and a functional threat model that allows darkening or lightening the entire image by a greater amount. Then the combination of these threat models allows potential perturbations that are not allowed in either constituent threat model.

Functional threat models can be used in a variety of domains such as images (e.g. by uniformly changing image colors), speech/audio (e.g. by changing the "accent" of an audio clip), text (e.g. by replacing a word throughout a document with its synonym), or fraud analysis (e.g. by uniformly modifying an actor's financial activities). Moreover, because functional perturbations are large and uniform, they may also be easier to use for physical adversarial examples, where the small pixel-level changes created by additive perturbations could be drowned out by environmental noise.

In this paper, we focus on one such domain, images, and define ReColorAdv, a functional adversarial attack on pixel colors (see figure 2). In ReColorAdv, we use a flexibly parameterized function $f$ to map each pixel color $c$ in the input to a new pixel color $f(c)$ in an adversarial example. We regularize $f(\cdot)$ both to ensure that no color is perturbed by more than a certain amount and to make sure that the mapping is smooth, i.e. similar colors are perturbed in a similar way. We show that ReColorAdv can use colors defined in the standard red, green, blue (RGB) color space and also in the CIELUV color space, which results in less perceptually different adversarial examples (see figure 4).

We experiment by attacking defended and undefended classifiers with ReColorAdv, by itself and in combination with other attacks. We find that ReColorAdv is a strong attack, reducing the accuracy of a ResNet-32 trained on CIFAR-10 to 3.0%. Combinations of ReColorAdv and other attacks are yet more powerful; one such combination lowers a CIFAR-10 classifier's accuracy to 3.6%, even after adversarial training. This is lower than the previous strongest attack of Jordan et al. [10]. We also demonstrate the fragility of adversarial defenses based on an additive threat model by reducing the accuracy of a classifier trained with TRADES [30] to 5.7%.
Although one might attempt to mitigate the ReColorAdv attack by converting images to grayscale before classification, which removes color information, we show that this simply decreases a classifier's accuracy (both natural and adversarial). Furthermore, we find that combining ReColorAdv with other attacks increases the strength of the attack without increasing the perceptual difference, as measured by LPIPS [32], of the generated adversarial example.

Figure 2: Four ImageNet adversarial examples generated by ReColorAdv against an Inception-v4 classifier. From left to right in each group: original image, adversarial example, magnified difference.

Our contributions are summarized as follows:

- We introduce a novel class of threat models, functional adversarial threat models, and combine them with existing threat models. We also describe ways of regularizing functional threat models to ensure that generated adversarial examples are imperceptible.
- Theoretically, we prove that additive and functional threat models combine to create a threat model that encompasses more potential perturbations than either threat model alone.
- Experimentally, we show that ReColorAdv, which uses a functional threat model on images, is a strong adversarial attack against image classifiers. To the best of our knowledge, combining ReColorAdv with other attacks leads to the strongest existing attack, even after adversarial training.

## 2 Review of Existing Threat Models

In this section, we define the problem of generating an adversarial example and review existing adversarial threat models and attacks.

Problem Definition. Consider a classifier $g : \mathcal{X}^n \to \mathcal{Y}$ from a feature space $\mathcal{X}^n$ to a set of labels $\mathcal{Y}$. Given an input $x \in \mathcal{X}^n$, an adversarial example is a slight perturbation $\tilde{x}$ of $x$ such that $g(\tilde{x}) \neq g(x)$; that is, $\tilde{x}$ is given a different label than $x$ by the classifier. Since the aim of an adversarial example is to be perceptually indistinguishable from a normal input, $\tilde{x}$ is usually constrained to be close to $x$ by some threat model. Formally, Jordan et al. [10] define a threat model as a function $t : \mathcal{P}(\mathcal{X}^n) \to \mathcal{P}(\mathcal{X}^n)$, where $\mathcal{P}$ denotes the power set. The function $t(\cdot)$ maps a set of classifier inputs $S$ to a set of perturbed inputs $t(S)$ that are imperceptibly different. With this definition, we can formalize the problem of generating an adversarial example from an input: find $\tilde{x}$ such that $g(\tilde{x}) \neq g(x)$ and $\tilde{x} \in t(\{x\})$.

Additive Threat Model. The most common threat model used when generating adversarial examples is the additive threat model. Let $x = (x_1, \dots, x_n)$, where each $x_i \in \mathcal{X}$ is a feature of $x$. For instance, $x_i$ could correspond to a pixel in an image or the filterbank energies for a timestep in an audio sample. In an additive threat model, we assume $\tilde{x} = (x_1 + \delta_1, \dots, x_n + \delta_n)$; that is, a value $\delta_i$ is added to each feature of $x$ to generate the adversarial example $\tilde{x}$. Under this threat model, perceptual similarity is usually enforced by a bound on the norm of $\delta = (\delta_1, \dots, \delta_n)$. Thus, the additive threat model is defined as

$$t_{\text{add}}(S) \triangleq \{ (x_1 + \delta_1, \dots, x_n + \delta_n) \mid (x_1, \dots, x_n) \in S,\ \|\delta\| \leq \epsilon \}.$$

Commonly used norms include $\ell_2$ (Euclidean distance), which constrains the sum of squares of the $\delta_i$; $\ell_0$, which constrains the number of features that can be changed; and $\ell_\infty$, which allows changing each feature by up to a certain amount. Note that all of the $\delta_i$ can be modified individually to generate a misclassification, as long as the norm constraint is met. Thus, a small $\epsilon$ is usually necessary because otherwise the input could be made incomprehensible by noise.
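As a concrete illustration of the additive threat model with an $\ell_\infty$ bound (a minimal sketch of our own, not the attack code used in the paper), projecting a candidate perturbation back into $t_{\text{add}}$ amounts to clipping each $\delta_i$:

```python
import numpy as np

def project_l_inf(x, x_adv, epsilon):
    """Project a perturbed input back into the l_inf additive threat model:
    each feature may differ from the original by at most epsilon, and the
    result must remain a valid input (here, in [0, 1])."""
    delta = np.clip(x_adv - x, -epsilon, epsilon)
    return np.clip(x + delta, 0.0, 1.0)

x = np.random.rand(32, 32, 3)            # original input
x_adv = x + np.random.randn(*x.shape)    # some candidate (too-large) perturbation
x_adv = project_l_inf(x, x_adv, epsilon=8 / 255)
```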
Most previous work on generating adversarial examples has employed the additive threat model. This includes gradient-based methods like FGSM [5], DeepFool [15], and Carlini & Wagner [3], and gradient-free methods like SPSA [25] and the Boundary Attack [2].

Other Threat Models. Some recent work has focused on spatial threat models, which allow slight perturbations of the locations of features in an input rather than perturbations of the features themselves [27, 26, 4]. Others have proposed threat models based on the properties of a 3D renderer [29] or on constructing adversarial examples with a GAN [22]. Finally, some research has focused on coloring-based threat models through modification of an image's hue and saturation [7], inverting images [8], using a colorization network [1], and applying an affine transformation to colors followed by PGD [31]. See appendix E for a discussion of non-additive threat models and a comparison to our proposed functional threat model.

## 3 Functional Threat Model

In this section, we define the functional threat model and explore its combinations with existing threat models. Recall that in the additive threat model, each feature of an input can only be perturbed by a small amount. Because all the features are changed separately, larger changes could make the input unrecognizable. Our key insight is that larger perturbations to an input should be possible if the dependencies between features are considered. Unlike the additive threat model, in the functional threat model the features $x_i$ are transformed by a single function $f : \mathcal{X} \to \mathcal{X}$, called the perturbation function. That is,

$$\tilde{x} = f(x) = (f(x_1), \dots, f(x_n)).$$

Under this threat model, features which have the same value in the input must be mapped to the same value in the adversarial example. Even large perturbations allowed by a functional threat model may be imperceptible to human eyes because they preserve dependencies between features (for example, shape boundaries and shading in images; see figure 1). Note that the features $x_i$ which are modified by the perturbation function $f(\cdot)$ need not be scalars; depending on the application, vector-valued features may be useful.

### 3.1 Regularizing Functional Threat Models

In the functional threat model, various regularizations can be used to ensure that the change remains imperceptible. In general, we can enforce that $f \in \mathcal{F}$, where $\mathcal{F}$ is a family of allowed perturbation functions. For instance, we may want to bound by some small $\epsilon$ the maximum difference between the input and output of the perturbation function. In that case, we have:

$$\mathcal{F}_{\text{diff}} \triangleq \{ f : \mathcal{X} \to \mathcal{X} \mid \forall x_i \in \mathcal{X},\ \|f(x_i) - x_i\| \leq \epsilon \} \tag{1}$$

$\mathcal{F}_{\text{diff}}$ prevents absolute changes of more than a certain amount. Note that the $\epsilon$ bound may be higher than that of an additive model, since uniform changes are less perceptible. However, this regularization may not be enough to prevent noticeable changes: $\mathcal{F}_{\text{diff}}$ still includes functions that map similar (but not identical) features very differently. Therefore, a second constraint could be used that forces similar features to be perturbed similarly:

$$\mathcal{F}_{\text{smooth}} \triangleq \{ f \mid \forall x_i, x_j \in \mathcal{X},\ \|x_i - x_j\| \leq r \implies \|(f(x_i) - x_i) - (f(x_j) - x_j)\| \leq \epsilon_{\text{smooth}} \} \tag{2}$$

$\mathcal{F}_{\text{smooth}}$ requires that similar features are perturbed in the same "direction": for instance, if green pixels in an image are lightened, then yellow-green pixels should be as well. Depending on the application, these constraints or others may be needed to maintain an imperceptible change.
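To make constraints (1) and (2) concrete, here is a minimal sketch of our own (not code from the paper) that checks whether a perturbation function, represented as a lookup table over a discretized scalar feature space, satisfies both constraints:

```python
import numpy as np

# Discretize a scalar feature space X = [0, 1] into 256 values and represent a
# candidate perturbation function f by its outputs on those values.
xs = np.linspace(0.0, 1.0, 256)
f_xs = np.clip(xs * 0.9 + 0.05, 0.0, 1.0)  # an example f: compress the range slightly

def in_F_diff(xs, f_xs, eps):
    # Constraint (1): |f(x) - x| <= eps for every feature value x.
    return np.all(np.abs(f_xs - xs) <= eps)

def in_F_smooth(xs, f_xs, r, eps_smooth):
    # Constraint (2): feature values within distance r of each other must be
    # perturbed in nearly the same way.
    diffs = f_xs - xs  # per-value perturbation f(x) - x
    close = np.abs(xs[:, None] - xs[None, :]) <= r
    similar = np.abs(diffs[:, None] - diffs[None, :]) <= eps_smooth
    return np.all(similar[close])

print(in_F_diff(xs, f_xs, eps=0.1), in_F_smooth(xs, f_xs, r=0.05, eps_smooth=0.02))
```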
We may want to choose $\mathcal{F}$ to be $\mathcal{F}_{\text{diff}}$, $\mathcal{F}_{\text{smooth}}$, $\mathcal{F}_{\text{diff}} \cap \mathcal{F}_{\text{smooth}}$, or an entirely different family of functions. Once we have chosen an $\mathcal{F}$, we can define a corresponding functional threat model as

$$t_{\text{func}}(S) \triangleq \{ (f(x_1), \dots, f(x_n)) \mid (x_1, \dots, x_n) \in S,\ f \in \mathcal{F} \}.$$

### 3.2 Combining Threat Models

Jordan et al. [10] argue that combining multiple threat models allows a better approximation of the complete set of adversarial perturbations which are imperceptible. Here, we show that combining the additive threat model with a simple functional threat model can allow adversarial examples which are not allowed by either model on its own. The following theorem (proved in appendix A) demonstrates this on images for a combination of an additive threat model, which allows changing each pixel by a small, bounded amount, and a functional threat model, which allows darkening or lightening the entire image by up to a larger amount; both are arguably imperceptible transformations.

Figure 3: ReColorAdv transforms each pixel in the input image $x$ (left) by the same function $f(\cdot)$ (center) to produce an adversarial example $\tilde{x}$ (right). The perturbation function $f$ is shown as a vector field in CIELUV color space.

Theorem 1. Let $x$ be a grayscale image with $n \geq 2$ pixels, i.e. $x \in [0, 1]^n = \mathcal{X}^n$. Let $t_{\text{add}}$ be an additive threat model where the $\ell_\infty$ distance between input and adversarial example is bounded by $\epsilon_1$, i.e. $\|(\delta_1, \dots, \delta_n)\|_\infty \leq \epsilon_1$. Let $t_{\text{func}}$ be a functional threat model where $f(x) = c \cdot x$ for some $c \in [1 - \epsilon_2, 1 + \epsilon_2]$ such that $\epsilon_2 > \epsilon_1 > 0$. Let $t_{\text{combined}} = t_{\text{add}} \circ t_{\text{func}}$. Then the combined threat model allows adversarial perturbations which are not allowed by either constituent threat model. Formally, if $S \subseteq \mathcal{X}^n$ contains an image $x$ that is not dark, that is, $\exists\, x_i$ s.t. $x_i > \epsilon_1 / \epsilon_2$, then

$$t_{\text{combined}}(S) \supsetneq t_{\text{add}}(S) \cup t_{\text{func}}(S),$$

or equivalently, $\exists\, \tilde{x}$ s.t. $\tilde{x} \in t_{\text{combined}}(S)$ and $\tilde{x} \notin t_{\text{add}}(S) \cup t_{\text{func}}(S)$.

## 4 ReColorAdv: Functional Adversarial Attacks on Image Colors

In this section, we define ReColorAdv, a novel adversarial attack against image classifiers that leverages a functional threat model. ReColorAdv generates adversarial examples that fool image classifiers by uniformly changing the colors of an input image. We treat each pixel $x_i$ in the input image $x$ as a point in a 3-dimensional color space $\mathcal{C} \subseteq [0, 1]^3$. For instance, $\mathcal{C}$ could be the normal RGB color space; in section 4.1, we discuss our use of alternative color spaces. We leverage a perturbation function $f : \mathcal{C} \to \mathcal{C}$ to produce the adversarial example. Specifically, each pixel in the output $\tilde{x}$ is perturbed from the input $x$ by applying $f(\cdot)$ to the color of that pixel:

$$x_i = (c_{i,1}, c_{i,2}, c_{i,3}) \in \mathcal{C} \subseteq [0,1]^3 \qquad \tilde{x}_i = (\tilde{c}_{i,1}, \tilde{c}_{i,2}, \tilde{c}_{i,3}) = f(c_{i,1}, c_{i,2}, c_{i,3})$$

For the purposes of finding an $f(\cdot)$ that generates a successful adversarial example, we need a parameterization of the function that allows both flexibility and ease of computation. To accomplish this, we let $G = \{g_1, \dots, g_m\} \subseteq [0, 1]^3$ be a discrete grid of points (a point lattice) on which $f$ is explicitly defined. That is, we define parameters $\theta_1, \dots, \theta_m$ and let $f(g_i) = \theta_i$. For points not on the grid, i.e. $x_i \notin G$, we define $f(x_i)$ using trilinear interpolation: trilinear interpolation considers the "cube" of lattice points $g_j$ surrounding the argument $x_i$ and linearly interpolates the explicitly defined $\theta_j$ values at the 8 corners of this cube to compute $f(x_i)$.
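The following sketch shows how a color mapping parameterized on a regular grid can be evaluated at arbitrary colors by trilinear interpolation. It is a simplified illustration under our own assumptions (an 8-point lattice per dimension, parameters initialized to the identity mapping); the authors' released implementation at https://github.com/cassidylaidlaw/ReColorAdv is the reference for the actual attack.

```python
import numpy as np

GRID_SIZE = 8  # number of lattice points per color-space dimension (assumed)

# theta[a, b, c] is f evaluated at the grid point (a, b, c) / (GRID_SIZE - 1).
# Initializing theta to the grid itself makes f the identity mapping.
grid_1d = np.linspace(0.0, 1.0, GRID_SIZE)
theta = np.stack(np.meshgrid(grid_1d, grid_1d, grid_1d, indexing="ij"), axis=-1)

def apply_f(colors, theta):
    """Evaluate the grid-parameterized color mapping f at an array of colors
    of shape (..., 3) via trilinear interpolation."""
    scaled = colors * (GRID_SIZE - 1)
    lo = np.clip(np.floor(scaled).astype(int), 0, GRID_SIZE - 2)  # lower corner of the cell
    frac = scaled - lo                                            # position inside the cell
    out = np.zeros_like(colors)
    # Sum over the 8 corners of the surrounding cell, weighting each corner's
    # theta value by how close the color is to that corner along each axis.
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                weight = (
                    (frac[..., 0] if dx else 1 - frac[..., 0])
                    * (frac[..., 1] if dy else 1 - frac[..., 1])
                    * (frac[..., 2] if dz else 1 - frac[..., 2])
                )
                corner = theta[lo[..., 0] + dx, lo[..., 1] + dy, lo[..., 2] + dz]
                out += weight[..., None] * corner
    return out

image = np.random.rand(32, 32, 3)                 # pixels as points in [0, 1]^3
assert np.allclose(apply_f(image, theta), image)  # identity parameters leave colors unchanged
```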
Constraints on the perturbation function. We enforce two constraints on $f(\cdot)$ to ensure that the crafted adversarial example is indistinguishable from the original image. These constraints are based on slight modifications of $\mathcal{F}_{\text{diff}}$ and $\mathcal{F}_{\text{smooth}}$ defined in section 3.1. First, we ensure that no pixel can be perturbed by more than a certain amount along each dimension of the color space:

$$\mathcal{F}_{\text{diff-col}} \triangleq \{ f : \mathcal{C} \to \mathcal{C} \mid \forall (c_1, c_2, c_3) \in G,\ |c_i - \tilde{c}_i| < \epsilon_i \ \text{for } i = 1, 2, 3 \}$$

where $(\tilde{c}_1, \tilde{c}_2, \tilde{c}_3) = f(c_1, c_2, c_3)$. This formulation allows us to set different bounds $(\epsilon_1, \epsilon_2, \epsilon_3)$ on the maximum perturbation along each dimension of the color space. We also define a constraint based on $\mathcal{F}_{\text{smooth}}$, but instead of using a radius parameter $r$ as in (2), we consider the neighbors $N(g_j)$ of each lattice point $g_j$ in the grid $G$:

$$\mathcal{F}_{\text{smooth-col}} \triangleq \{ f : \mathcal{C} \to \mathcal{C} \mid \forall g_j \in G,\ \forall g_k \in N(g_j),\ \| (f(g_j) - g_j) - (f(g_k) - g_k) \|_2 \leq \epsilon_{\text{smooth}} \}$$

In the above, $\|\cdot\|_2$ is the $\ell_2$ (Euclidean) norm in the color space $\mathcal{C}$. We define our set of allowable perturbation functions as $\mathcal{F}_{\text{col}} = \mathcal{F}_{\text{diff-col}} \cap \mathcal{F}_{\text{smooth-col}}$ with parameters $(\epsilon_1, \epsilon_2, \epsilon_3, \epsilon_{\text{smooth}})$.

Figure 4: The color space used affects the adversarial example produced by ReColorAdv. The original image at center is attacked by ReColorAdv with the CIELUV color space (left) and RGB color space (right). The RGB color space results in noticeable bright green artifacts in the adversarial example, while the perceptually accurate CIELUV color space produces a more realistic perturbation.

Optimization. To generate an adversarial example with ReColorAdv, we wish to minimize $\mathcal{L}_{\text{adv}}(f, x)$ subject to $f \in \mathcal{F}_{\text{col}}$, where $\mathcal{L}_{\text{adv}}$ enforces the goal of generating an adversarial example that is misclassified. It is defined as the $f_6$ loss from Carlini and Wagner [3], where $g(x)_i$ represents the classifier's $i$-th logit:

$$\mathcal{L}_{\text{adv}}(f, x) = \max\left( g(\tilde{x})_y - \max_{i \neq y} g(\tilde{x})_i,\ 0 \right) \tag{3}$$

When solving this constrained minimization problem, it is easy to constrain $f \in \mathcal{F}_{\text{diff-col}}$ by clipping the perturbation of each color to lie within the $\epsilon_i$ bounds. However, it is difficult to enforce $f \in \mathcal{F}_{\text{smooth-col}}$ directly. Thus, we instead solve a Lagrangian relaxation in which the smoothness constraint is replaced by an additional regularization term:

$$\arg\min_{f \in \mathcal{F}_{\text{diff-col}}} \mathcal{L}_{\text{adv}}(f, x) + \lambda\, \mathcal{L}_{\text{smooth}}(f) \tag{4}$$

$$\mathcal{L}_{\text{smooth}}(f) = \sum_{g_j \in G} \sum_{g_k \in N(g_j)} \left\| (f(g_j) - g_j) - (f(g_k) - g_k) \right\|_2$$

Our $\mathcal{L}_{\text{smooth}}$ is similar to the loss function used by Xiao et al. [27] to ensure a smooth flow field. We use the projected gradient descent (PGD) optimization algorithm to solve (4).

### 4.1 RGB vs. LUV Color Space

Most image classifiers take as input an array of pixels specified in RGB color space, but the RGB color space has two disadvantages. First, the $\ell_p$ distance between points in RGB color space is weakly correlated with the perceptual difference between the colors they represent. Second, RGB gives no separation between the luma (brightness) and chroma (hue/colorfulness) of a color. In contrast, the CIELUV color space separates luma from chroma and places colors such that the Euclidean distance between them is roughly equivalent to the perceptual difference [21]. CIELUV represents a color by three components $(L, U, V)$; $L$ is the luma, while $U$ and $V$ together define the chroma. We run experiments using both RGB and CIELUV color spaces. CIELUV allows us to regularize the perturbation function $f(\cdot)$ in a perceptually accurate way (see figure 4 and appendix B.1). We also experimented with the hue, saturation, value (HSV) and YPbPr color spaces; however, neither is perceptually accurate, and the HSV transformation from RGB is difficult to differentiate (see appendix C).
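A heavily simplified PyTorch sketch of this optimization loop is shown below. It is our own illustration under stated assumptions: a toy linear classifier, and a per-channel affine color map standing in for the grid-interpolated $f$; the released ReColorAdv code implements the full grid parameterization, CIELUV handling, and hyperparameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # toy classifier
x = torch.rand(1, 3, 32, 32)                         # input image, channels in [0, 1]
y_label = classifier(x).argmax(dim=1).item()         # attack the currently predicted class

# Toy stand-in for f: a per-channel scale and shift (the real attack uses a color
# mapping defined on a lattice and evaluated by trilinear interpolation).
scale = torch.ones(1, 3, 1, 1, requires_grad=True)
shift = torch.zeros(1, 3, 1, 1, requires_grad=True)
eps, lam, step = 0.06, 0.05, 0.01

for _ in range(100):
    x_adv = (scale * x + shift).clamp(0, 1)
    logits = classifier(x_adv)[0]
    # Carlini-Wagner-style adversarial loss (eq. 3): positive while still classified as y.
    other = torch.cat([logits[:y_label], logits[y_label + 1:]])
    loss_adv = torch.clamp(logits[y_label] - other.max(), min=0)
    # Smoothness surrogate: penalize differences between the per-channel offsets
    # (squared here for simplicity; the paper's L_smooth sums an l2 norm over
    # neighboring lattice points of the grid).
    offsets = torch.cat([(scale - 1).flatten(), shift.flatten()])
    loss_smooth = (offsets[:, None] - offsets[None, :]).pow(2).sum()
    loss = loss_adv + lam * loss_smooth

    grad_scale, grad_shift = torch.autograd.grad(loss, [scale, shift])
    with torch.no_grad():
        # PGD step followed by projection into an F_diff-col-style box constraint.
        scale -= step * grad_scale.sign()
        shift -= step * grad_shift.sign()
        scale.clamp_(1 - eps, 1 + eps)
        shift.clamp_(-eps, eps)
```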
Table 1: Accuracy (%) of adversarially trained models against various combinations of attacks on CIFAR-10. Columns correspond to attacks and rows correspond to models trained against a particular attack. C (C-RGB) is ReColorAdv using the CIELUV (RGB) color space, D is the delta attack, and S is the StAdv attack. TRADES is the method of Zhang et al. [30]. For classifiers marked (B&W), images are converted to black-and-white before classification.

| Defense | None | C-RGB | C | D | S | C+S | C+D | S+D | C+S+D |
|---|---|---|---|---|---|---|---|---|---|
| Undefended | 92.2 | 5.9 | 3.0 | 0.0 | 0.9 | 0.8 | 0.0 | 0.0 | 0.0 |
| C | 88.7 | 43.5 | 45.8 | 5.7 | 3.6 | 3.4 | 0.9 | 0.2 | 0.2 |
| D | 84.8 | 74.9 | 50.6 | 30.6 | 16.0 | 11.7 | 8.9 | 2.7 | 2.2 |
| S | 82.7 | 16.9 | 8.0 | 0.5 | 26.2 | 4.8 | 0.0 | 0.1 | 0.0 |
| C+S | 89.5 | 31.7 | 23.0 | 0.7 | 10.9 | 8.7 | 0.5 | 0.6 | 0.4 |
| C+D | 88.5 | 36.3 | 19.5 | 7.5 | 2.7 | 2.8 | 5.2 | 4.1 | 4.6 |
| S+D | 82.1 | 66.9 | 42.7 | 35.4 | 21.9 | 13.4 | 12.2 | 7.6 | 4.1 |
| C+S+D | 88.9 | 30.6 | 17.2 | 7.3 | 3.5 | 3.3 | 5.5 | 3.7 | 3.6 |
| TRADES | 84.4 | 81.3 | 59.2 | 53.6 | 26.6 | 17.5 | 22.0 | 8.6 | 5.7 |
| Undefended (B&W) | 88.3 | 5.3 | 4.1 | 0.0 | 0.9 | 0.6 | 0.0 | 0.0 | 0.0 |
| C (B&W) | 85.8 | 40.8 | 38.9 | 4.2 | 2.5 | 2.5 | 0.5 | 0.1 | 0.2 |

## 5 Experiments

We evaluate ReColorAdv against defended and undefended neural networks on CIFAR-10 [13] and ImageNet [20]. For CIFAR-10, we evaluate the attack against a ResNet-32 [6]; for ImageNet, we evaluate against an Inception-v4 network [24]. We also consider all combinations of ReColorAdv with delta attacks, which use an $\ell_\infty$ additive threat model with bound $\epsilon = 8/255$, and the StAdv attack of Xiao et al. [27], which perturbs images spatially through a flow field. See appendix B for a full discussion of the hyperparameters and computing infrastructure used in our experiments. We release our code at https://github.com/cassidylaidlaw/ReColorAdv.

### 5.1 Adversarial Training

We first experiment by attacking adversarially trained models with ReColorAdv and other attacks. For each combination of attacks, we adversarially train a ResNet-32 on CIFAR-10 against that particular combination. We then attack each of these adversarially trained models with all combinations of attacks. The results of this experiment are shown in the first part of table 1.

Combination attacks are most powerful. As expected, combinations of attacks are the strongest against defended and undefended classifiers. In particular, the ReColorAdv + StAdv + delta attack often resulted in the lowest classifier accuracy. The accuracy of only 3.6% after adversarially training against the ReColorAdv + StAdv + delta attack is the lowest we know of.

Transferability of robustness across perturbation types. While adversarial attack "transferability" often refers to the ability of attacks to transfer between models [16], here we investigate to what degree a model robust to one type of adversarial perturbation is robust to other types of perturbations, similarly to Kang et al. [11]. To some degree, the perturbation types investigated are orthogonal: adversarial training against a particular type of perturbation is less effective at conferring robustness to the others. StAdv is especially separate from the other two attacks; models trained against StAdv attacks are still very vulnerable to ReColorAdv and delta attacks. However, the ReColorAdv and delta attacks allow more transferable robustness between each other. These results are likely due to the fact that both the delta and ReColorAdv attacks operate on a per-pixel basis, whereas the StAdv attack allows spatial movement of features across pixels.

Effect of color space. The ReColorAdv attack using the CIELUV color space is stronger than the one using the RGB color space. In addition, the CIELUV color space produces less perceptible perturbations (see figure 4). This highlights the need for perceptually accurate models of color when designing and defending against adversarial examples.
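For readers unfamiliar with adversarial training, the sketch below shows the generic structure of one training step against an arbitrary attack, with the attack abstracted as a callable; the specific attacks, models, and hyperparameters used in our experiments are described in appendix B, and the FGSM stand-in here is only an illustrative assumption.

```python
import torch
import torch.nn as nn

def adversarial_training_step(model, attack, optimizer, images, labels):
    """One step of adversarial training: craft adversarial examples with the
    given attack (e.g. ReColorAdv, delta, StAdv, or a combination) and update
    the model on them."""
    model.eval()
    adv_images = attack(model, images, labels)  # attack is an assumed callable
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(adv_images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with a simple FGSM-style delta attack as a stand-in:
def fgsm_attack(model, images, labels, eps=8 / 255):
    images = images.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(images), labels)
    grad, = torch.autograd.grad(loss, images)
    return (images + eps * grad.sign()).clamp(0, 1).detach()

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
images, labels = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
adversarial_training_step(model, fgsm_attack, optimizer, images, labels)
```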
Figure 5: Adversarial examples generated with combinations of attacks (columns: original, C, D, C+D, C+S+D) against a CIFAR-10 WideResNet [28] trained using TRADES; the difference from the original is shown below each example. Combinations of attacks tend to produce less perceptible changes than the attacks do separately.

Figure 6: The perceptual distortion (LPIPS) and strength (error rate) of combinations of ReColorAdv and delta attacks with various bounds. The annotated points mark the bounds used in other experiments: C is ReColorAdv, D is a delta attack, and C+D is their combination. Combining the attacks does not increase the perceptible change by much (left), but it greatly increases attack strength (right).

### 5.2 Other Defenses

TRADES. TRADES is a training algorithm for deep neural networks that aims to improve robustness to adversarial examples by optimizing a surrogate loss [30]. The algorithm is designed around an additive threat model, but we evaluate a TRADES-trained classifier on all combinations of attacks (see the second part of table 1). This is the best defense method against almost all attacks, despite having been trained based on just an additive threat model. However, the combined ReColorAdv + StAdv + delta attack still reduces the accuracy of the classifier to just 5.7%.

Grayscale conversion. Since ReColorAdv attacks input images by changing their colors, one might attempt to mitigate the attack by converting all images to grayscale before classification. This could reduce the potential perturbations available to ReColorAdv, since altering the chroma of a color would not affect the grayscale image; only changes in luma would. We train models on CIFAR-10 that convert all images to grayscale as a preprocessing step, both with and without adversarial training against ReColorAdv. The results of this experiment (see the third part of table 1) show that conversion to grayscale is not a viable defense against ReColorAdv. In fact, natural accuracy and robustness against almost all attacks decrease when grayscale conversion is applied.

### 5.3 Perceptual Distance

We quantify the perceptual distortion caused by ReColorAdv attacks using the Learned Perceptual Image Patch Similarity (LPIPS) metric, a distance measure between images based on deep network activations which has been shown to correlate with human perception [32]. We combine ReColorAdv and delta attacks and vary the bound of each attack (see figure 6). We find that the attacks can be combined without much increase, or sometimes even with a decrease, in perceptual difference. As Jordan et al. [10] find for combinations of StAdv and delta attacks, the lowest perceptual difference at a particular attack strength is achieved by a combination of ReColorAdv and delta attacks.
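One way to compute this metric in practice (an assumption about tooling on our part, not something specified in this paper) is with the `lpips` PyPI package released by the authors of [32]:

```python
import torch
import lpips  # pip install lpips; the reference implementation from [32]

# LPIPS expects image tensors of shape (N, 3, H, W) scaled to [-1, 1].
loss_fn = lpips.LPIPS(net='alex')

original = torch.rand(1, 3, 32, 32) * 2 - 1
adversarial = (original + 0.05 * torch.randn_like(original)).clamp(-1, 1)

distance = loss_fn(original, adversarial)
print(distance.item())  # perceptual distance between the two images
```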
## 6 Conclusion

We have presented functional threat models for adversarial examples, which allow large, uniform changes to an input. They can be combined with additive threat models to provably increase the potential perturbations allowed in an adversarial attack. In practice, the ReColorAdv attack, which leverages a functional threat model against image pixel colors, is a strong adversarial attack on image classifiers. It can also be combined with other attacks to produce yet more powerful attacks, even after adversarial training, without a significant increase in perceptual distortion. Besides images, functional adversarial attacks could be designed for audio, text, and other domains. It will be crucial to develop defense methods against these attacks, which encompass a more complete threat model of the potential adversarial examples that are imperceptible to humans.

## Acknowledgements

This work was supported in part by NSF award CDS&E:1854532 and award HR00111990077.

## References

[1] Anand Bhattad, Min Jin Chong, Kaizhao Liang, Bo Li, and David A. Forsyth. Big but Imperceptible Adversarial Perturbations via Semantic Manipulation. arXiv preprint arXiv:1904.06347, 2019.

[2] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models. In International Conference on Learning Representations, 2018.

[3] Nicholas Carlini and David Wagner. Towards Evaluating the Robustness of Neural Networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39-57. IEEE, 2017.

[4] Logan Engstrom, Brandon Tran, Dimitris Tsipras, Ludwig Schmidt, and Aleksander Madry. A Rotation and a Translation Suffice: Fooling CNNs with Simple Transformations. arXiv preprint arXiv:1712.02779, 2017.

[5] Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and Harnessing Adversarial Examples. In International Conference on Learning Representations, 2015.

[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, 2016.

[7] Hossein Hosseini and Radha Poovendran. Semantic Adversarial Examples. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1614-1619, 2018.

[8] Hossein Hosseini, Baicen Xiao, Mayoore Jaiswal, and Radha Poovendran. On the Limitation of Convolutional Neural Networks in Recognizing Negative Images. In 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pages 352-358. IEEE, 2017.

[9] Yerlan Idelbayev. Proper ResNet Implementation for CIFAR10/CIFAR100 in PyTorch, 2018. URL https://github.com/akamaster/pytorch_resnet_cifar10.

[10] Matt Jordan, Naren Manoj, Surbhi Goel, and Alexandros G. Dimakis. Quantifying Perceptual Distortion of Adversarial Examples. arXiv preprint arXiv:1902.08265, 2019.

[11] Daniel Kang, Yi Sun, Tom Brown, Dan Hendrycks, and Jacob Steinhardt. Transfer of Adversarial Robustness Between Perturbation Types. arXiv preprint arXiv:1905.01034, 2019.

[12] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980, 2014.

[13] Alex Krizhevsky and Geoffrey Hinton. Learning Multiple Layers of Features from Tiny Images. Technical report, Citeseer, 2009.

[14] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. In International Conference on Learning Representations, 2018.

[15] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a Simple and Accurate Method to Fool Deep Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574-2582, 2016.

[16] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples. arXiv preprint arXiv:1605.07277, 2016.

[17] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The Limitations of Deep Learning in Adversarial Settings. In IEEE European Symposium on Security and Privacy (EuroS&P), pages 372-387. IEEE, 2016.
[18] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks. In IEEE Symposium on Security and Privacy (SP), pages 582-597. IEEE, 2016.

[19] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic Differentiation in PyTorch. In NIPS-W, 2017.

[20] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, and Michael Bernstein. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 115(3):211-252, 2015.

[21] Jim Schwiegerling. Field Guide to Visual and Ophthalmic Optics. SPIE Publications, Bellingham, Wash., 2004. ISBN 978-0-8194-5629-8.

[22] Yang Song, Rui Shu, Nate Kushman, and Stefano Ermon. Constructing Unrestricted Adversarial Examples with Generative Models. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 8322-8333. Curran Associates Inc., 2018.

[23] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing Properties of Neural Networks. In International Conference on Learning Representations, 2014.

[24] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A. Alemi. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.

[25] Jonathan Uesato, Brendan O'Donoghue, Pushmeet Kohli, and Aaron Oord. Adversarial Risk and the Dangers of Evaluating Against Weak Attacks. In International Conference on Machine Learning, pages 5025-5034, 2018. URL http://proceedings.mlr.press/v80/uesato18a.html.

[26] Eric Wong, Frank R. Schmidt, and J. Zico Kolter. Wasserstein Adversarial Examples via Projected Sinkhorn Iterations. arXiv preprint arXiv:1902.07906, 2019.

[27] Chaowei Xiao, Jun-Yan Zhu, Bo Li, Warren He, Mingyan Liu, and Dawn Song. Spatially Transformed Adversarial Examples. arXiv preprint arXiv:1801.02612, 2018.

[28] Sergey Zagoruyko and Nikos Komodakis. Wide Residual Networks. In British Machine Vision Conference. British Machine Vision Association, 2016.

[29] Xiaohui Zeng, Chenxi Liu, Yu-Siang Wang, Weichao Qiu, Lingxi Xie, Yu-Wing Tai, Chi Keung Tang, and Alan L. Yuille. Adversarial Attacks Beyond the Image Space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.

[30] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, and Michael I. Jordan. Theoretically Principled Trade-off Between Robustness and Accuracy. In International Conference on Machine Learning, 2019.

[31] Huan Zhang, Hongge Chen, Zhao Song, Duane S. Boning, Inderjit S. Dhillon, and Cho-Jui Hsieh. The Limitations of Adversarial Training and the Blind-Spot Attack. In International Conference on Learning Representations, 2019.

[32] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586-595, 2018.