# Sensitivity Analysis of Deep Neural Networks

The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

Hai Shu, Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
Hongtu Zhu, AI Labs, Didi Chuxing, Beijing, China, zhuhongtu@didiglobal.com

Deep neural networks (DNNs) have achieved superior performance in various prediction tasks, but can be very vulnerable to adversarial examples or perturbations. Therefore, it is crucial to measure the sensitivity of DNNs to various forms of perturbations in real applications. We introduce a novel perturbation manifold and its associated influence measure to quantify the effects of various perturbations on DNN classifiers. Such perturbations include various external and internal perturbations to input samples and network parameters. The proposed measure is motivated by information geometry and provides desirable invariance properties. We demonstrate that our influence measure is useful for four model building tasks: detecting potential outliers, analyzing the sensitivity of model architectures, comparing network sensitivity between training and test sets, and locating vulnerable areas. Experiments show reasonably good performance of the proposed measure for the popular DNN models ResNet50 and DenseNet121 on the CIFAR10 and MNIST datasets.

1 Introduction

Deep neural networks (DNNs) have exhibited impressive power in image classification and have outperformed human detection in the ImageNet challenge (Russakovsky et al. 2015; He et al. 2015; 2016; Huang et al. 2017). Despite this huge success, it is well known that state-of-the-art DNNs can be sensitive to small perturbations (Szegedy et al. 2013; Goodfellow, Shlens, and Szegedy 2015; Moosavi-Dezfooli, Fawzi, and Frossard 2016; Carlini and Wagner 2017; Su, Vargas, and Kouichi 2017). This vulnerability has called into question their use in safety-critical applications, including self-driving cars (Bojarski et al. 2016) and face recognition (Sharif et al. 2017), among many others (Akhtar and Mian 2018). There is a rich literature on quantifying the sensitivity or robustness of DNNs to external perturbations that affect the input samples; see (Fawzi, Moosavi-Dezfooli, and Frossard 2017; Akhtar and Mian 2018; Novak et al. 2018). For instance, one popular robustness measure computes the minimum adversarial distortion for a given sample (Moosavi-Dezfooli, Fawzi, and Frossard 2016; Hein and Andriushchenko 2017; Weng et al. 2018). However, very little work has been done on measuring the effects of various internal perturbations to network trainable parameters on DNNs. To the best of our knowledge, (Cheney, Schrimpf, and Kreiman 2017) is the first paper to examine the robustness of AlexNet (Krizhevsky, Sutskever, and Hinton 2012) by tracking the classification performance over several chosen standard deviations of Gaussian perturbations to network weights. The aim of this paper is to develop a novel perturbation manifold and its associated influence measure to evaluate the effects of various perturbations to input samples and/or network trainable parameters. Our influence measure is a novel extension of the local influence measures proposed in (Zhu et al. 2007; Zhu, Ibrahim, and Tang 2011) to classification problems by using information geometry (Amari 1985; Amari and Nagaoka 2000).
Compared with the existing methods (Akhtar and Mian 2018), we make the following two major methodological contributions. First, our influence measure is motivated by information geometry, and its calculation is computationally straightforward and does not require optimizing any objective function. Second, when the dimension of the perturbation vector is larger than the number of classes, the perturbation manifold in (Zhu et al. 2007; Zhu, Ibrahim, and Tang 2011) has a singular metric tensor and thus fails to form a Riemannian manifold. We address this singularity issue by introducing a low-dimensional transform and show that our influence measure still provides invariance under diffeomorphisms of the original perturbation. Such an invariance property is critical for assessing the simultaneous effects, or comparing the individual impacts, of different external and/or internal perturbations within or between DNNs without concern for their differences in scale, such as the comparison between perturbations to trainable parameters in a convolution layer and those in a batch normalization layer within a single DNN. In contrast, existing measures, such as the Jacobian norm (Novak et al. 2018) and Cook's local influence measure (Cook 1986), do not have this invariance property, which can lead to misleading results. Our proposed influence measure is applicable to various forms of external and internal perturbations and useful for four important model building tasks: (i) detecting potential outliers, (ii) analyzing the sensitivity of model architectures, (iii) comparing network sensitivity between training and test sets, and (iv) locating vulnerable areas. For task (i), downweighting outliers may be used to train a DNN with increased robustness. Task (ii) may serve as a guide to the improvement of an existing network architecture. Task (iii) can evaluate the heterogeneity of the model robustness between training and test sets, and combining tasks (i)-(iii) may be useful for selecting DNNs. For task (iv), the discovered vulnerable locations in a given image can be utilized either to craft adversarial examples or to fortify a DNN with data augmentation. The application of our influence measure to tasks (i)-(iv) is illustrated for two popular DNNs, ResNet50 (He et al. 2016) and DenseNet121 (Huang et al. 2017), on the benchmark datasets CIFAR10 and MNIST.

2 Method

2.1 Perturbation Manifold

Given an input image $x$ and a DNN model $N$ with a trainable parameter vector $\theta$, the prediction probability for the response variable $y \in \{1, \ldots, K\}$ is denoted as $P(y|x,\theta) = N_\theta(y,x)$. Let $\omega = (\omega_1, \ldots, \omega_p)^T$ be a perturbation vector varying in an open subset $\Omega \subset \mathbb{R}^p$. The perturbation $\omega$ can be flexibly imposed on $x$, $\theta$, or even the combination of $x$ and $\theta$. Denote by $P(y|x,\theta,\omega)$ the prediction probability under perturbation $\omega$, such that $\sum_{y=1}^{K} P(y|x,\theta,\omega) = 1$. It is assumed that there is an $\omega_0 \in \Omega$ such that $P(y|x,\theta,\omega_0) = P(y|x,\theta)$. Also, $\{P(y|x,\theta,\omega)\}_{y=1}^{K}$ is assumed to be positive and sufficiently smooth for all $\omega \in \Omega$. Following the development in (Zhu et al. 2007; Zhu, Ibrahim, and Tang 2011), we may define $M = \{P(y|x,\theta,\omega) : \omega \in \Omega\}$ as a perturbation manifold. The tangent space of $M$ at $\omega$ is denoted by $T_\omega$, which is spanned by $\{\partial \ell(\omega|y,x,\theta)/\partial\omega_i\}_{i=1}^{p}$, where $\ell(\omega|y,x,\theta) = \log P(y|x,\theta,\omega)$. Let
$$G_\omega(\omega) = \sum_{y=1}^{K} \nabla_\omega^T \ell(\omega|y,x,\theta)\, \nabla_\omega \ell(\omega|y,x,\theta)\, P(y|x,\theta,\omega), \quad \text{with } \nabla_\omega = (\partial/\partial\omega_1, \ldots, \partial/\partial\omega_p).$$
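To make the construction concrete, $G_\omega(\omega_0)$ can be assembled directly from the per-class gradients of the log prediction probability. The following is a minimal sketch, not the authors' code, for a perturbation imposed on the input ($\omega = \Delta x$, $\omega_0 = 0$); the helper name `metric_tensor`, the toy model, and the use of PyTorch autograd are illustrative assumptions.

```python
# Minimal sketch (assumed names, PyTorch autograd); not the paper's implementation.
import torch

def metric_tensor(model, x):
    """G(omega_0) = sum_y grad^T grad * P(y|x,theta,omega_0) for an input perturbation."""
    omega = torch.zeros_like(x, requires_grad=True)              # perturbation at omega_0 = 0
    probs = torch.softmax(model(x + omega), dim=-1).squeeze(0)   # P(y | x, theta, omega)
    K, p = probs.numel(), omega.numel()
    G = torch.zeros(p, p, device=x.device)
    for y in range(K):
        # gradient of ell(omega | y, x, theta) = log P(y | x, theta, omega) w.r.t. omega
        g = torch.autograd.grad(torch.log(probs[y]), omega, retain_graph=True)[0].reshape(-1)
        G += probs[y].detach() * torch.outer(g, g)
    return G                                                     # rank(G) <= K

# Hypothetical usage with a tiny classifier (K = 3 classes, p = 8 input dimensions):
model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 3))
x = torch.randn(1, 8)
G = metric_tensor(model, x)
print(G.shape, torch.linalg.matrix_rank(G))  # torch.Size([8, 8]), rank <= 3
```

As the comment notes, the returned matrix has rank at most $K$, which is precisely the singularity issue addressed next.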
If $G_\omega(\omega)$ is positive definite, then for any two tangent vectors $v_i(\omega) = h_i^T \nabla_\omega^T \ell(\omega|y,x,\theta) \in T_\omega$, $i = 1, 2$, where $h_i$ denotes the coordinate vector of $v_i(\omega)$ in the basis $\nabla_\omega \ell(\omega|y,x,\theta)$, the inner product can be defined by
$$\langle v_1(\omega), v_2(\omega)\rangle = \sum_{y=1}^{K} v_1(\omega)\, v_2(\omega)\, P(y|x,\theta,\omega) = h_1^T G_\omega(\omega)\, h_2. \qquad (1)$$
Subsequently, the length of $v_1(\omega)$ is given by $\langle v_1(\omega), v_1(\omega)\rangle^{1/2} = \big[h_1^T G_\omega(\omega)\, h_1\big]^{1/2}$. With the above inner product defined by $G_\omega(\omega)$, $M$ is a Riemannian manifold and $G_\omega(\omega)$ is its Riemannian metric tensor (Amari 1985; Amari and Nagaoka 2000).

We need the positive definiteness of $G_\omega(\omega)$. However, $G_\omega(\omega)$, being a sum of $K$ rank-1 matrices, has $\mathrm{rank}(G_\omega(\omega)) \le K$, so it is singular whenever $K < p$. The case $K < p$ arises in many classification problems, since the number of classes is often much smaller than the dimension of $\omega$. The singularity of $G_\omega(\omega)$ indicates that the $p$ tangent vectors $\partial\ell(\omega|y,x,\theta)/\partial\omega_i$ are linearly dependent, so some components of $\omega$ are redundant. In addition, our focus is on small perturbations around $\omega_0$. We therefore transform the $p$-dimensional $\omega$ into a vector $\nu$ such that $G_\nu(\nu)$ is positive definite in a small neighborhood of the point $\nu_0$ corresponding to $\omega_0$.

Our low-dimensional transform is implemented as follows. We first obtain a compact singular value decomposition (cSVD) of $G_\omega(\omega_0)$. For very large $p$, rather than the direct but extremely expensive cSVD computation of the $p\times p$ matrix, we apply a computationally efficient approach using the cSVD of the much smaller $p\times K$ matrix $L_0 = [\nabla_\omega^T \ell(\omega_0|y,x,\theta)\, P^{1/2}(y|x,\theta,\omega_0)]_{1\le y\le K}$, by noticing that $G_\omega(\omega_0) = L_0 L_0^T$. Let $r_0 = \mathrm{rank}(G_\omega(\omega_0))$. The usual cSVD computation easily yields $L_0 = B_0 A_0$ and $A_0 A_0^T = U_A \Lambda_0 U_A^T$, where $B_0$ is a $p\times r_0$ matrix with orthonormal columns, $U_A$ is an $r_0\times r_0$ orthogonal matrix, and $\Lambda_0$ is an $r_0\times r_0$ diagonal matrix. We hence obtain the cSVD $G_\omega(\omega_0) = U_0 \Lambda_0 U_0^T$ with $U_0 = B_0 U_A$. Define the desired transform of $\omega \in \Omega$ by $\nu = \Lambda_0^{1/2} U_0^T \omega$. Denote $P(y|x,\theta,\nu) = P(y|x,\theta,\omega = U_0 \Lambda_0^{-1/2}\nu + \xi_0)$, where $\xi_0 = \omega_0 - U_0 \Lambda_0^{-1/2}\nu_0$ and $\nu_0 = \Lambda_0^{1/2} U_0^T \omega_0$. It follows from $\nabla_\nu \ell = \nabla_\omega \ell\, U_0 \Lambda_0^{-1/2}$ that $G_\nu(\nu_0) = I$. By the smoothness of $P(y|x,\theta,\omega)$ in $\omega\in\Omega$, the metric tensor $G_\nu(\nu)$ is positive definite in an open ball $B_{\nu_0}$ centered at $\nu_0$.

Definition 1. We define the Riemannian manifold $M_{\nu_0} = \{P(y|x,\theta,\nu) : \nu \in B_{\nu_0}\}$, with the inner product defined by $G_\nu(\nu)$ as in (1), as the perturbation manifold around $\nu_0$.

2.2 Influence Measure

Let $f(\omega)$ be the objective function of interest for sensitivity analysis. We define the influence measure to evaluate the discrepancy of the objective function $f(\omega)$ at two points $\omega_1$ and $\omega_2$, corresponding to $\nu_i = \Lambda_0^{1/2} U_0^T \omega_i$, $i = 1, 2$, on the perturbation manifold $M_{\nu_0}$. Let $C(t) = P(y|x,\theta,\nu(t))$ be a smooth curve on $M_{\nu_0}$ connecting $\nu_1 = \nu(t_1)$ and $\nu_2 = \nu(t_2)$, where $\nu(t) = \Lambda_0^{1/2} U_0^T \omega(t)$ and $\omega(t)$ is a smooth curve connecting $\omega_1 = \omega(t_1)$ and $\omega_2 = \omega(t_2)$. The distance between $\nu_1$ and $\nu_2$ along the curve $C(t)$ is defined by
$$S_C(\nu_1, \nu_2) = \int_{t_1}^{t_2} \big[\dot\nu(t)^T G_\nu(\nu(t))\, \dot\nu(t)\big]^{1/2}\, dt,$$
with $\dot\nu(t) = d\nu(t)/dt$. Following (Zhu, Ibrahim, and Tang 2011), the influence measure for $f(\omega)$ along $C(t)$ is given by
$$I_C(\omega_1, \omega_2) = \frac{[f(\omega_1) - f(\omega_2)]^2}{S_C^2(\nu_1, \nu_2)}.$$
Let $\omega(0) = \omega_0$, so that $\nu(0) = \nu_0$. We define the (first-order) local influence measure of $f(\omega)$ at $\omega_0$ by
$$FI_\omega(\omega_0) = \max_C \lim_{t\to 0} I_C(\omega(t), \omega(0)). \qquad (2)$$
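Before turning to the closed form of (2), the low-dimensional transform above can be sketched computationally as follows: build the $p\times K$ matrix $L_0$, take its compact SVD, and return $U_0$ and $\Lambda_0$ so that $\nu = \Lambda_0^{1/2} U_0^T \omega$ and $G_\nu(\nu_0) = I$. The helper name `low_dim_transform`, the rank tolerance `eps`, and the PyTorch-based gradients are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (assumed names); reuses the input-perturbation setting of metric_tensor.
import torch

def low_dim_transform(model, x, eps=1e-8):
    omega = torch.zeros_like(x, requires_grad=True)
    probs = torch.softmax(model(x + omega), dim=-1).squeeze(0)
    cols = []
    for y in range(probs.numel()):
        g = torch.autograd.grad(torch.log(probs[y]), omega, retain_graph=True)[0].reshape(-1)
        cols.append(g * probs[y].detach().sqrt())          # column of L0: grad * P(y)^{1/2}
    L0 = torch.stack(cols, dim=1)                          # p x K, so G(omega_0) = L0 @ L0.T
    U, S, _ = torch.linalg.svd(L0, full_matrices=False)    # cheap: at most K singular triplets
    r0 = int((S > eps).sum())                              # numerical rank r0 <= K
    return U[:, :r0], S[:r0] ** 2                          # U0 and the diagonal of Lambda0

# Hypothetical check that the transformed metric is the identity, G_nu(nu_0) = I:
#   U0, Lam0 = low_dim_transform(model, x)
#   Gn = torch.diag(1 / Lam0.sqrt()) @ U0.T @ metric_tensor(model, x) @ U0 @ torch.diag(1 / Lam0.sqrt())
#   torch.allclose(Gn, torch.eye(len(Lam0)), atol=1e-4)
```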
Denote $h_\nu = \dot\nu(0) = \Lambda_0^{1/2} U_0^T h_\omega$, $h_\omega = \dot\omega(0)$, $\dot\omega(t) = d\omega/dt$, $\nabla f(\omega_0) = \nabla_\omega f|_{\omega=\omega_0}$, and $H_f(\omega_0) = \partial^2 f/(\partial\omega\,\partial\omega^T)|_{\omega=\omega_0}$. Plugging $S_C^2(\nu(t), \nu(0)) = t^2 h_\nu^T G_\nu(\nu_0) h_\nu + o(t^2)$ and $f(\omega(t)) = f(\omega(0)) + \nabla f(\omega_0) h_\omega t + \frac{1}{2}\big(h_\omega^T H_f(\omega_0) h_\omega + \nabla f(\omega_0)\, \frac{d^2\omega(0)}{dt^2}\big) t^2 + o(t^2)$ into (2) yields the closed form
$$FI_\omega(\omega_0) = \max_{h_\nu} \frac{h_\nu^T\, \nabla^T f(\nu_0)\, \nabla f(\nu_0)\, h_\nu}{h_\nu^T G_\nu(\nu_0)\, h_\nu} = \nabla f(\nu_0)\, \nabla^T f(\nu_0) = \nabla f(\omega_0)\, G_\omega^{+}(\omega_0)\, \nabla^T f(\omega_0), \qquad (3)$$
where $\nabla f(\nu_0) := \nabla f(\omega_0)\, U_0 \Lambda_0^{-1/2}$, $G_\omega^{+}(\omega_0)$ is the pseudoinverse of $G_\omega(\omega_0)$, and we used the identities $G_\nu(\nu_0) = I$ and $\nabla_\omega f\, \dot\omega = \nabla_\omega f\, (\partial\omega/\partial\nu)\, \dot\nu = \nabla_\omega f\, U_0 \Lambda_0^{-1/2}\, \dot\nu$.

Definition 2. We define the influence measure of $f(\omega)$ at $\omega_0$ to be $FI_\omega(\omega_0)$ given in (2), with the closed form in (3).

Theorem 1. Suppose that $\varphi$ is a diffeomorphism of $\omega$. Then $FI_\omega(\omega_0)$ is invariant with respect to any reparameterization corresponding to $\varphi$.

Proof. Let $\varphi = \varphi(\omega)$, $\omega = \omega(\varphi)$, and $\varphi_0 = \varphi(\omega_0)$. Denote their Jacobian matrices by $\Phi = \partial\varphi/\partial\omega$ and $\Psi = \partial\omega/\partial\varphi$. Differentiating $\omega = \omega(\varphi(\omega))$ with respect to $\omega$ yields $I = \Psi\Phi$. Denote $\Psi_0 = \Psi(\varphi_0)$, $\Phi_0 = \Phi(\omega_0)$, $\dot\omega_0 = \dot\omega(0)$, and $\dot\varphi_0 = d\varphi(0)/dt$. We have
$$FI_\omega(\omega_0) = \max_{h_\omega} \frac{h_\omega^T \nabla^T f(\omega_0)\, \nabla f(\omega_0)\, h_\omega}{h_\omega^T U_0 \Lambda_0 U_0^T h_\omega} = \max_{\dot\omega_0} \frac{\dot\omega_0^T \nabla^T f(\omega_0)\, \nabla f(\omega_0)\, \dot\omega_0}{\dot\omega_0^T G_\omega(\omega_0)\, \dot\omega_0} = \max_{\dot\omega_0} \frac{\dot\omega_0^T \Phi_0^T \Psi_0^T \nabla^T f(\omega_0)\, \nabla f(\omega_0)\, \Psi_0 \Phi_0 \dot\omega_0}{\dot\omega_0^T \Phi_0^T \Psi_0^T G_\omega(\omega_0)\, \Psi_0 \Phi_0 \dot\omega_0} = \max_{\dot\varphi_0} \frac{\dot\varphi_0^T \nabla^T f(\varphi_0)\, \nabla f(\varphi_0)\, \dot\varphi_0}{\dot\varphi_0^T G_\varphi(\varphi_0)\, \dot\varphi_0} = FI_\varphi(\varphi_0).$$

Theorem 1 shows the invariance of $FI_\omega(\omega_0)$ under any diffeomorphic (e.g., scaling) reparameterization of the original perturbation vector $\omega$, rather than of $\nu$. This result is analogous to those in (Zhu et al. 2007; Zhu, Ibrahim, and Tang 2011), but we extend it to cases where the original perturbation model $M$ with $G_\omega(\omega)$ is not a Riemannian manifold, in particular when $K < p$. The invariance property is not enjoyed by the widely used Jacobian norm (Novak et al. 2018) or Cook's local influence measure (Cook 1986). For example, consider the perturbation $\alpha + \Delta\alpha$, where $\alpha = (\alpha_1,\ldots,\alpha_p)^T$ is a subvector of $(x^T, \theta^T)^T$, and its scaled version $\alpha^* + \Delta\alpha^*$ with $\alpha^* = k\alpha$. Let $(\omega, \omega_0) = (\Delta\alpha, 0)$ and its scaled counterpart $(\omega^*, \omega_0^*) = (\Delta\alpha^*, 0)$. Then the Jacobian norm
$$\|J(\alpha)\|_F = \Big[\sum_{i=1}^{p} \big(\partial f/\partial\alpha_i\big)^2\Big]^{1/2} = k\, \|J(\alpha^*)\|_F \qquad (4)$$
and Cook's local influence measure
$$C_{\eta,\omega} = \frac{1}{\big(1 + \nabla f(\omega_0)\,\nabla^T f(\omega_0)\big)^{1/2}}\; \frac{\eta^T H_f(\omega_0)\, \eta}{\eta^T \big(I + \nabla^T f(\omega_0)\,\nabla f(\omega_0)\big)\eta} \;\ne\; C_{\eta,\omega^*}, \qquad (5)$$
with $\omega(t) = \omega_0 + t\eta$, are not scaling-invariant. This is problematic especially when scale heterogeneity exists among the parameters to which the perturbations are imposed. For instance, for simultaneous perturbations to both the input image $x$ and the trainable network parameters $\theta$, i.e., $\alpha = (x^T, \theta^T)^T$, the contribution of $x$ appears exaggerated if $x$ is scaled with larger values than $\theta$. Another example is the comparison between perturbations to the trainable parameters (weights and biases) in a convolution layer and those (shift/scale parameters) in a batch normalization layer. There are no uniform criteria for the scaling, because either rescaling to a unit norm or keeping the original scales has its own advantages. Our influence measure, however, evades this scaling issue by utilizing the metric tensor of the perturbation manifold rather than that of the usual Euclidean space.

2.3 Perturbation Examples

In this subsection, we illustrate how to compute the proposed influence measure for a trained DNN model $P(y|x,\theta) = N_\theta(y,x)$. We consider the following commonly used perturbations to the input image $x$ or the trainable parameters $\theta = (\theta_1^T, \ldots, \theta_L^T)^T$, where $\theta_l$ collects the parameters of the $l$-th trainable network layer. Case 1: $x + \Delta x$; Case 2: $\theta + \Delta\theta$; Case 3: $\theta_l + \Delta\theta_l$. All three cases can be written in the unified form $\alpha + \Delta\alpha$ with $\alpha \in \{x, \theta, \theta_l\}$. Let the perturbation vector be $\omega = \Delta\alpha$ with $\omega_0 = 0$.
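For Case 1, the closed form (3) then amounts to expressing the gradient of $f$ in the $\nu$-coordinates and taking its squared norm. Below is a minimal sketch, reusing the hypothetical `low_dim_transform` above and assuming the cross-entropy objective; `influence_measure` is an illustrative name, not the paper's API.

```python
# Minimal sketch (assumed names): FI = grad_f(omega_0) G^+(omega_0) grad_f(omega_0)^T for Case 1.
import torch

def influence_measure(model, x, y_true):
    U0, Lam0 = low_dim_transform(model, x)                     # G(omega_0) = U0 diag(Lam0) U0^T
    omega = torch.zeros_like(x, requires_grad=True)
    log_probs = torch.log_softmax(model(x + omega), dim=-1).squeeze(0)
    f = -log_probs[y_true]                                     # f(omega) = -log P(y_true | x, theta, omega)
    grad_f = torch.autograd.grad(f, omega)[0].reshape(1, -1)   # 1 x p row vector at omega_0
    grad_f_nu = grad_f @ U0 / Lam0.sqrt()                      # grad_f(nu_0) = grad_f U0 Lambda0^{-1/2}
    return float((grad_f_nu ** 2).sum())                       # = grad_f G^+ grad_f^T
```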
For the influence measure $FI_\omega(\omega_0)$ in (3), we have
$$\nabla f(\nu_0) = (\nabla_\alpha f|_{\omega=\omega_0})\, U_0 \Lambda_0^{-1/2}, \qquad (6)$$
where $\Lambda_0$ and $U_0$ are obtained from the matrix $L_0 = [\nabla_\omega^T \ell(\omega_0|y,x,\theta)\, P^{1/2}(y|x,\theta,\omega_0)]_{1\le y\le K}$ through $L_0 = B_0 A_0$, $A_0 A_0^T = U_A \Lambda_0 U_A^T$, and $U_0 = B_0 U_A$. The component $\nabla_\omega \ell(\omega_0|y,x,\theta)$ in $L_0$ is now computed by
$$\nabla_\omega \ell(\omega_0|y,x,\theta) = \nabla_\alpha \log P(y|x,\theta). \qquad (7)$$
The gradients $\nabla_\alpha f|_{\omega=\omega_0}$ and $\nabla_\alpha \log P(y|x,\theta)$ can be calculated easily via backpropagation (Goodfellow et al. 2016) in deep learning libraries such as TensorFlow (Abadi et al. 2016) and PyTorch (Paszke et al. 2017).

Next, we consider a specific DNN example under Case 3. Consider the following feedforward DNN architecture before the softmax layer:
$$n_\theta(x) = \sigma_L(\Theta_L\, \sigma_{L-1}(\cdots(\Theta_3\, \sigma_2(\Theta_2\, \sigma_1(\Theta_1 x)))\cdots)) \in \mathbb{R}^K,$$
where $x \in \mathbb{R}^{k_0}$, $\Theta_l \in \mathbb{R}^{k_l \times k_{l-1}}$, $\theta_l = \mathrm{vec}(\Theta_l^T)$, and the $\sigma_l$'s are entry-wise activation functions. For notational simplicity, we set all bias terms to zero and use the sigmoid function $\sigma(x) = [1 + \exp(-x)]^{-1}$, with $\dot\sigma(x) = \sigma(x)(1 - \sigma(x))$, for all activation functions. Let $i_l(x,\theta)$ and $o_l(x,\theta)$ be the input and output vectors of the $l$-th layer, such that $o_l(x,\theta) = \sigma_l(i_l(x,\theta))$ and $o_0(x,\theta) = x$. The softmax function is given by
$$g(z) = \Big(\frac{\exp(z_1)}{\sum_{k=1}^{K}\exp(z_k)}, \ldots, \frac{\exp(z_K)}{\sum_{k=1}^{K}\exp(z_k)}\Big)^T.$$

[Figure 1: Manhattan plots for Setup 1.]

The whole DNN model is $P(y|x,\theta) = N_\theta(y,x) = g(n_\theta(x))[y]$ for $y = 1,\ldots,K$, where $g(\cdot)[y]$ is the $y$-th entry of the vector $g(\cdot)$. Under Case 3, we have $\alpha = \theta_l$. Choose the objective function $f$ to be the cross-entropy, i.e., $f(\alpha,\omega) = -\log P(y = y_{\mathrm{true}}|x,\theta,\omega)$. Hence, in (6) we have $\nabla_\alpha f|_{\omega=\omega_0} = -\nabla_\alpha \log P(y = y_{\mathrm{true}}|x,\theta)$. Then, to calculate the gradients in (6) and (7), we only need to consider $\nabla_\alpha \log P(y|x,\theta) = \nabla_{\theta_l} \log(g(n_\theta(x))[y])$. Note that $\nabla_z \log(g(z)[y]) = (e_y - g(z))^T$, where $e_y \in \mathbb{R}^K$ has 1 in the $y$-th entry and 0 elsewhere. Moreover,
$$\nabla_{\theta_l} n_\theta(x) = D_L \Theta_L D_{L-1} \cdots D_{l+1} \Theta_{l+1} D_l\, O_{l-1},$$
with $D_l = \mathrm{diag}(\{\dot\sigma(i_l(x,\theta)[j])\}_{j=1}^{k_l}) \in \mathbb{R}^{k_l \times k_l}$ and $O_{l-1} = \mathrm{diag}(o_{l-1}^T(x,\theta), \ldots, o_{l-1}^T(x,\theta)) \in \mathbb{R}^{k_l \times (k_{l-1} k_l)}$. Hence, for (6) and (7), we have
$$\nabla_\alpha \log P(y|x,\theta) = (e_y - g(n_\theta(x)))^T D_L \Theta_L D_{L-1} \cdots D_{l+1} \Theta_{l+1} D_l\, O_{l-1}.$$

3 Experiments

In this section, we investigate the performance of our local influence measure. We address the four tasks stated in the Introduction through the following setups under the three perturbation cases in Section 2.3.

Setup 1: Compute each training image's FI under Case 1, with $f$ being the cross-entropy, i.e., $f = -\log P(y = y_{\mathrm{true}}|x,\theta,\omega)$.

Setup 2: Let $f = -\log P(y = y_{\mathrm{true}}|x,\theta,\omega)$. Setup 2.1: Compute each training image's FI under Case 2. Setup 2.2: Compute each trainable network layer's FI under Case 3 for each training image.

Setup 3: Compute each image's FI under Case 1 for both the training and test sets, where $f = -\log P(y = y_{\mathrm{pred}}|x,\theta,\omega)$.

Setup 4: Compute each pixel's FI under Case 1 for a given image. We adopt a multi-scale strategy that accounts for the spatial effect: for each pixel, we restrict $\Delta x$ in Case 1 to the $k \times k$ square centered at that pixel, with scale $k \in \{1, 3, 5, 7\}$. We use $f = -\log P(y = y_{\mathrm{pred}}|x,\theta,\omega)$.

For the cross-entropy-type objective $f$ in Setups 3 and 4, we use the predicted label $y_{\mathrm{pred}}$ instead of the true label $y_{\mathrm{true}}$, since these setups target prediction rather than training as in the first two setups. In Setup 4, the scale of the pixel-level FI is analogous to the convolutional kernel size.
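As a companion to the Case 3 derivation above, and to Setup 2.2 in particular, the per-layer FI can also be obtained with automatic differentiation instead of the analytic product of Jacobians. The sketch below makes the same assumptions as before (PyTorch autograd, hypothetical helper name `layer_influence`); it treats the flattened parameters $\theta_l$ of one chosen layer as the perturbed vector and uses the cross-entropy objective.

```python
# Minimal sketch (assumed names): FI for a Case-3 perturbation of one layer's parameters.
import torch

def layer_influence(model, layer, x, y_true, eps=1e-8):
    params = [p for p in layer.parameters() if p.requires_grad]
    log_probs = torch.log_softmax(model(x), dim=-1).squeeze(0)
    cols = []
    for y in range(log_probs.numel()):
        grads = torch.autograd.grad(log_probs[y], params, retain_graph=True)
        g = torch.cat([gr.reshape(-1) for gr in grads])             # grad_{theta_l} log P(y)
        cols.append(g * log_probs[y].detach().exp().sqrt())         # times P(y)^{1/2}
    L0 = torch.stack(cols, dim=1)                                   # p_l x K, G(omega_0) = L0 L0^T
    U, S, _ = torch.linalg.svd(L0, full_matrices=False)
    r0 = int((S > eps).sum())
    grads_f = torch.autograd.grad(-log_probs[y_true], params)       # f = -log P(y_true)
    grad_f = torch.cat([gr.reshape(-1) for gr in grads_f]).reshape(1, -1)
    grad_f_nu = grad_f @ U[:, :r0] / S[:r0]                         # grad_f U0 Lambda0^{-1/2}
    return float((grad_f_nu ** 2).sum())                            # closed form (3)
```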
We conduct experiments on the two benchmark datasets CIFAR10 and MNIST using the two popular DNN models ResNet50 (He et al. 2016) and DenseNet121 (Huang et al. 2017). Originally, there are 50,000 and 60,000 training images for CIFAR10 and MNIST, respectively. As the validation sets, we use a randomly selected 10% of those images, with the same number from each class. No data augmentation is used during training. Both datasets have 10,000 test images. The prediction accuracy of our trained models is summarized in Table 1.

Table 1: Accuracy for models trained without data augmentation

| Model | CIFAR10 Training | CIFAR10 Test | MNIST Training | MNIST Test |
| --- | --- | --- | --- | --- |
| ResNet50 | 99.78% | 88.70% | 99.87% | 99.29% |
| DenseNet121 | 99.87% | 91.16% | 99.998% | 99.58% |

3.1 Outlier Detection

We study the outlier detection ability of our proposed influence measure under Setup 1. Figure 1 illustrates the results of Setup 1 using Manhattan plots. DenseNet121 generally has smaller FIs than ResNet50 on the two benchmark datasets, except for several large FIs above 10 shown in Figure 1(c) for CIFAR10. The images with the top 5 largest FIs are displayed in Figure 2 for each case. Most of the 20 images, especially those in MNIST, are difficult even for human visual detection. This indicates the strong power of our influence measure in detecting outlier images.

[Figure 2: Images with the top 5 largest FIs in Setup 1 for ResNet50 (R) and DenseNet121 (D). Each subcaption shows ytrue, ypred, and FI.]

We further examine the outlier detection power of our proposed influence measure by simulating outlier images from MNIST. Each outlier image was generated by overlapping two training digits of different classes, shifted by up to 4 pixels in each direction, with the true label randomly set to one of the two classes. The two DNN models in Table 1 are trained for an additional 50 epochs after incorporating 2,700 and 300 simulated outlier images into the training and validation sets, respectively, which reduces the training and test accuracies by up to 0.38% and 0.11%, respectively. The original 54,000 training images are all treated as non-outliers. We compare the proposed FI measure with the Jacobian norm given in (4), using the cross-entropy as the objective function $f$. The maximal Cook's local influence, $\max_\eta C_{\eta,\omega}$, is not considered here because of the expensive computation of the very large Hessian matrix; see (5). Figure 3 shows the outlier detection results of the two considered measures. Although the receiver operating characteristic (ROC) curves of the two measures are almost overlapping, our FI measure significantly outperforms the Jacobian norm in terms of the precision-recall (PR) curves, which are more informative for highly unbalanced data (Davis and Goadrich 2006).

[Figure 3: ROC and PR curves of our proposed FI measure (red) and the Jacobian norm (blue) on MNIST with simulated outliers.]

3.2 Sensitivity Analysis on DNN Architectures

We conduct the sensitivity analysis on DNN architectures under Setup 2. The invariance property of our FI measure shown in Theorem 1 enables us to fairly compare the effects of small perturbations to model parameters of different scales within or between DNNs. Setup 2.1 compares the sensitivity between the two DNNs, while Setup 2.2 undertakes the comparison across trainable layers within each single DNN. The Manhattan plots for Setup 2 on CIFAR10 are presented in Figure 4; results for MNIST are provided in the Supplementary Material. The patterns on CIFAR10 under Setup 2.1 in Figure 4(a)-(c), with mostly smaller FIs for DenseNet121, are quite similar to those for Setup 1 in Figure 1(a)-(c), indicating that DenseNet121 is generally less sensitive than ResNet50 to infinitesimal perturbations of all network trainable parameters.

[Figure 4: Manhattan plots for Setup 2 on CIFAR10.]
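The layer-by-layer comparison of Setup 2.2 might then be a simple loop over the trainable layers, reusing the hypothetical `layer_influence` sketch from Section 2.3; the module filtering below is an assumption about how layers are enumerated, not the authors' procedure.

```python
# Minimal sketch (assumed names): one FI value per trainable layer of a network, for one image.
import torch

def layerwise_fi(model, x, y_true):
    fis = {}
    for name, module in model.named_modules():
        # keep only leaf modules that own trainable parameters (conv, linear, batch norm, ...)
        if any(p.requires_grad for p in module.parameters(recurse=False)):
            fis[name] = layer_influence(model, module, x, y_true)
    return fis   # e.g., one point per layer in Manhattan plots such as Figure 4(d)-(f)
```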
From Figure 4(d)-(f) for Setup 2.2, we see stable patterns of FIs over the trainable layers for the two DNNs; modifying their network architectures does not appear to be necessary here. Note that the FI value for the trainable parameters of any single network layer is theoretically dominated by that for all trainable parameters of the entire network, which is well supported by the comparison between Figure 4(a)-(c) and (d)-(f).

3.3 Sensitivity Comparison between Training and Test Sets

We compare the network sensitivity between the training and test sets under Setup 3. Figure 5 and Table 2 show the FI values for Setup 3 on CIFAR10; results for MNIST are also provided in the Supplementary Material.

[Figure 5: Manhattan plots for Setup 3 on CIFAR10.]

Table 2: Percentiles of FI values for Setup 3 on CIFAR10

| Percentile | ResNet50 Training | ResNet50 Test | DenseNet121 Training | DenseNet121 Test |
| --- | --- | --- | --- | --- |
| 75th | 2.87e-3 | 0.031 | 1.10e-4 | 1.83e-3 |
| 80th | 5.38e-3 | 0.073 | 2.42e-4 | 6.57e-3 |
| 85th | 0.010 | 0.160 | 6.00e-4 | 0.025 |
| 90th | 0.023 | 0.343 | 1.78e-3 | 0.097 |
| 95th | 0.064 | 0.678 | 7.80e-3 | 0.352 |
| 98th | 0.177 | 0.999 | 0.037 | 0.755 |
| 99th | 0.316 | 1.215 | 0.099 | 0.951 |
| 100th | 2.160 | 3.579 | 2.533 | 2.215 |

In the figure and table, the test set has more large FIs than the training set for both DNNs, while FIs are generally smaller in both sets for DenseNet121. We suggest selecting a DNN model with similar sensitivity performance and smaller FI values on both the training and test sets. Together with the results of Sections 3.1 and 3.2 shown in Figures 1(a)-(c) and 4(a)-(c), and with the model accuracies in Table 1, DenseNet121 is preferred over ResNet50 on CIFAR10 in terms of both sensitivity and accuracy.

3.4 Vulnerable Region Detection

We apply the multi-scale strategy of Setup 4 to detect the areas of an image that are vulnerable to small perturbations. For Setup 4, the test images from the two benchmark datasets with the largest FI in Setup 3 under DenseNet121 are illustrated in Figure 6. The vulnerable areas of both images are mainly in or around the object, and the image boundaries are generally less sensitive to perturbations. The figure also reasonably shows that the vulnerable areas expand as the scale of the pixel-level FI increases.

[Figure 6: Multi-scale pixel-level FI maps for Setup 4 using DenseNet121. Results are shown for the test image with the largest FI in Setup 3. The CIFAR10 test image has Setup-3 FI = 2.22, ytrue = dog, and ypred = bird. The MNIST test image has Setup-3 FI = 1.28, ytrue = 1, and ypred = 7.]

Figure 7 illustrates one-pixel adversarial attacks based on pixel-wise FI maps. The two selected test images are correctly predicted by ResNet50 with high probability and also have large FIs in Setup 3. The pixel-wise FI map is the scale-1 pixel-level FI map for the MNIST image, and the average of the scale-1 maps over the three RGB channels for the CIFAR10 image. For each image, the attacked pixel is the one with the largest value in the pixel-wise FI map. We see that the prediction result changes significantly after slightly altering the selected pixel's value. This indicates that our FI measure is useful for discovering vulnerable locations and crafting adversarial examples.

[Figure 7: One-pixel adversarial attacks on ResNet50 using pixel-wise FI maps. Panels: (a) original image, (b) adversarial image, (c) annotated adversarial image, (d) pixel-wise FI map, (e) overlay. The original images have ytrue = 8 and ytrue = truck, respectively. The prediction probabilities from ResNet50 are given in the parentheses. The attacked pixels are framed in red.]
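The multi-scale pixel-level FI map of Setup 4 can be sketched by restricting the input perturbation to the $k\times k$ patch around each pixel and applying the same closed form. The function name `pixel_fi_map` and its brute-force structure (one small SVD per pixel) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumed names): pixel-level FI map at one scale k, with f = -log P(y_pred).
import torch

def pixel_fi_map(model, x, scale=3, eps=1e-8):
    _, c, h, w = x.shape
    omega = torch.zeros_like(x, requires_grad=True)
    log_probs = torch.log_softmax(model(x + omega), dim=-1).squeeze(0)
    y_pred = int(log_probs.argmax())
    # per-class gradient images of log P(y | x, theta, omega) w.r.t. the input perturbation
    grads = [torch.autograd.grad(log_probs[y], omega, retain_graph=True)[0].squeeze(0)
             for y in range(log_probs.numel())]
    probs = log_probs.detach().exp()
    fi, r = torch.zeros(h, w, device=x.device), scale // 2
    for i in range(h):
        for j in range(w):
            sl = (slice(None), slice(max(i - r, 0), i + r + 1), slice(max(j - r, 0), j + r + 1))
            L0 = torch.stack([g[sl].reshape(-1) * p.sqrt() for g, p in zip(grads, probs)], dim=1)
            gf = -grads[y_pred][sl].reshape(1, -1)          # grad of f on the k x k patch
            U, S, _ = torch.linalg.svd(L0, full_matrices=False)
            r0 = int((S > eps).sum())
            fi[i, j] = ((gf @ U[:, :r0] / S[:r0]) ** 2).sum()
    return fi   # repeat for each scale k in {1, 3, 5, 7} as in the paper
```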
4 Conclusion

In this paper, we introduced a novel perturbation manifold and its associated influence measure for the sensitivity analysis of DNN classifiers. The new measure is constructed on a Riemannian manifold and is invariant under any diffeomorphic (e.g., scaling) reparameterization of the perturbations. This invariance property is not shared by widely used measures such as the Jacobian norm and Cook's local influence. Our influence measure performs very well for ResNet50 and DenseNet121 trained on the CIFAR10 and MNIST datasets in the tasks of outlier detection, sensitivity comparison between network architectures and between training and test sets, and vulnerable region detection.

References

Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; Kudlur, M.; Levenberg, J.; Monga, R.; Moore, S.; Murray, D. G.; Steiner, B.; Tucker, P.; Vasudevan, V.; Warden, P.; Wicke, M.; Yu, Y.; and Zheng, X. 2016. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 265-283.

Akhtar, N., and Mian, A. 2018. Threat of adversarial attacks on deep learning in computer vision: A survey. arXiv preprint arXiv:1801.00553.

Amari, S., and Nagaoka, H. 2000. Methods of Information Geometry. American Mathematical Society, Providence, RI.

Amari, S. 1985. Differential-Geometrical Methods in Statistics. Springer-Verlag, New York.

Bojarski, M.; Del Testa, D.; Dworakowski, D.; Firner, B.; Flepp, B.; Goyal, P.; Jackel, L. D.; Monfort, M.; Muller, U.; Zhang, J.; Zhang, X.; Zhao, J.; and Zieba, K. 2016. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316.

Carlini, N., and Wagner, D. 2017. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy, 39-57.

Cheney, N.; Schrimpf, M.; and Kreiman, G. 2017. On the robustness of convolutional neural networks to internal architecture and weight perturbations. arXiv preprint arXiv:1703.08245.

Cook, R. D. 1986. Assessment of local influence. Journal of the Royal Statistical Society, Series B (Methodological) 48(2):133-169.

Davis, J., and Goadrich, M. 2006. The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, 233-240.

Fawzi, A.; Moosavi-Dezfooli, S.-M.; and Frossard, P. 2017. The robustness of deep networks: A geometrical perspective. IEEE Signal Processing Magazine 34(6):50-62.

Goodfellow, I.; Bengio, Y.; Courville, A.; and Bengio, Y. 2016. Deep Learning. Cambridge: MIT Press.

Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2015. Explaining and harnessing adversarial examples. In International Conference on Learning Representations. arXiv:1412.6572.

He, K.; Zhang, X.; Ren, S.; and Sun, J. 2015. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, 1026-1034.

He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770-778.

Hein, M., and Andriushchenko, M. 2017.
Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems, 2266-2276.

Huang, G.; Liu, Z.; Weinberger, K. Q.; and van der Maaten, L. 2017. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4700-4708.

Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 1097-1105.

Moosavi-Dezfooli, S.-M.; Fawzi, A.; and Frossard, P. 2016. DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2574-2582.

Novak, R.; Bahri, Y.; Abolafia, D. A.; Pennington, J.; and Sohl-Dickstein, J. 2018. Sensitivity and generalization in neural networks: An empirical study. In International Conference on Learning Representations. arXiv:1802.08760.

Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; and Lerer, A. 2017. Automatic differentiation in PyTorch. In NIPS 2017 Autodiff Workshop.

Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; Berg, A. C.; and Li, F.-F. 2015. ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115(3):211-252.

Sharif, M.; Bhagavatula, S.; Bauer, L.; and Reiter, M. K. 2017. Adversarial generative nets: Neural network attacks on state-of-the-art face recognition. arXiv preprint arXiv:1801.00349.

Su, J.; Vargas, D. V.; and Kouichi, S. 2017. One pixel attack for fooling deep neural networks. arXiv preprint arXiv:1710.08864.

Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; and Fergus, R. 2013. Intriguing properties of neural networks. In International Conference on Learning Representations. arXiv:1312.6199.

Weng, T.-W.; Zhang, H.; Chen, P.-Y.; Yi, J.; Su, D.; Gao, Y.; Hsieh, C.-J.; and Daniel, L. 2018. Evaluating the robustness of neural networks: An extreme value theory approach. In International Conference on Learning Representations. arXiv:1801.10578.

Zhu, H.; Ibrahim, J. G.; Lee, S.; and Zhang, H. 2007. Perturbation selection and influence measures in local influence analysis. The Annals of Statistics 35(6):2565-2588.

Zhu, H.; Ibrahim, J. G.; and Tang, N. 2011. Bayesian influence analysis: A geometric approach. Biometrika 98(2):307-323.