# Feature Space Targeted Attacks by Statistic Alignment

Lianli Gao, Yaya Cheng, Qilong Zhang, Xing Xu and Jingkuan Song
Center for Future Media, University of Electronic Science and Technology of China
yaya.cheng@hotmail.com, qilong.zhang@std.uestc.edu.cn, jingkuan.song@gmail.com

## Abstract

By adding human-imperceptible perturbations to images, DNNs can be easily fooled. As one of the mainstream methods, feature space targeted attacks perturb images by modulating their intermediate feature maps so that the discrepancy between the source and target intermediate features is minimized. However, the current choice of pixel-wise Euclidean distance to measure this discrepancy is questionable, because it unreasonably imposes a spatial-consistency constraint on the source and target features. Intuitively, an image can be categorized as "cat" no matter whether the cat is on the left or the right of the image. To address this issue, we propose to measure the discrepancy by statistic alignment. Specifically, we design two novel approaches, Pair-wise Alignment Attack and Global-wise Alignment Attack, which measure similarities between feature maps using high-order statistics with translation invariance. Furthermore, we systematically analyze the layer-wise transferability under varied transfer difficulties to obtain highly reliable attacks. Extensive experiments verify the effectiveness of our proposed method, which outperforms the state-of-the-art algorithms by a large margin. Our code is publicly available at https://github.com/yaya-cheng/PAA-GAA.

## 1 Introduction

Deep neural networks (DNNs) [He et al., 2016; Huang et al., 2017; Simonyan and Zisserman, 2015; Szegedy et al., 2016] have made impressive achievements in recent years and now dominate various fields, e.g., object detection [Redmon et al., 2016]. However, recent works demonstrate that DNNs are highly vulnerable to adversarial examples [Szegedy et al., 2014; Biggio et al., 2013], i.e., images to which only human-imperceptible perturbations have been added. To uncover such security flaws in DNNs, many works focus on the generation of adversarial examples.

In general, attack methods can be grouped into three broad categories: white-box, gray-box, and black-box attacks. In the white-box setting [Moosavi-Dezfooli et al., 2016; Carlini and Wagner, 2017], the adversary can access all information (e.g., the architectures and parameters) of the victim's models, so the update directions of the adversarial examples are accurate. In the gray-box setting [Ilyas et al., 2018; Ru et al., 2020], only the output logits or labels are available; therefore, most works craft adversarial examples through a considerable number of queries. However, in many scenarios both white-box and gray-box attacks are infeasible owing to the opacity of deployed models. In the black-box setting, all information about the victim's models is unavailable. Since the decision boundaries of different DNNs are similar, adversarial examples crafted on substitute models, e.g., publicly available well-trained models, are often effective against other models as well, which is called the transferability of adversarial examples. Most black-box attack methods [Dong et al., 2018; Inkawhich et al., 2019; Gao et al., 2020a; Gao et al., 2020b; Lin et al., 2020] aim at enhancing the transferability of adversarial examples, relying on information from the classification layers of the substitute models.
However, it is still challenging to improve the success rate of black-box targeted attacks, i.e., to induce the victim's models to predict pre-set target labels. To tackle the poor effectiveness of black-box targeted attacks, researchers [Sabour et al., 2016; Inkawhich et al., 2019] delve into feature space targeted attacks, which perturb images by modulating their intermediate feature maps. For example, given a source image, [Inkawhich et al., 2019] first select a single sample of the target label whose intermediate activation is furthest from the source one under Euclidean distance. Then, the perturbation is crafted by minimizing the Euclidean distance between the source and target features. However, since Euclidean distance focuses on the spatial gap between two features, it selects the spatially furthest target image rather than the most semantically different one. For instance, consider a source image with a cat on the left and the target label "dog": under the above setting, the algorithm tends to choose a target image that has a dog on the right instead of on the left. When generating the perturbation, the algorithm then has to align the semantic meaning of the source and target features while also closing a huge, irrelevant spatial discrepancy. Overall, the current choice of pixel-wise Euclidean distance to measure the discrepancy is questionable, as it unreasonably imposes a spatial-consistency constraint on the source and target features.

To produce spatial-agnostic measurements, we propose two novel approaches called Pair-wise Alignment Attack and Global-wise Alignment Attack, which measure similarities between features by high-order statistics with translation invariance. From this perspective, we treat feature space targeted attacks as a problem of statistic alignment. By aligning the source and target high-order statistics, rather than relying on Euclidean distance, we can make the two feature maps semantically close without introducing an excessive spatial gap in feature space.

To sum up, our contributions are three-fold: 1) We point out that the current choice of pixel-wise Euclidean distance to measure the discrepancy between two features is questionable, for it unreasonably imposes a spatial-consistency constraint on the source and target features. By exploring high-order statistics with translation invariance, two novel methods are proposed: a) Pair-wise Alignment Attack and b) Global-wise Alignment Attack, which treat feature space targeted attacks as a problem of statistic alignment. 2) To obtain highly reliable results, we systematically analyze the layer-wise transferability. Furthermore, to place all images under the same transfer difficulty, which ranges from the easiest to the hardest, we assign target labels of the same difficulty level to them and give a comprehensive evaluation of our methods. 3) Extensive experimental results show the effectiveness of our methods, which outperform the state-of-the-art by 6.92% at most and 1.70% on average in typical setups.

## 2 Related Works

After the discovery of adversarial examples [Szegedy et al., 2014; Biggio et al., 2013], many excellent works have been proposed. Generally, based on their goals, attack methods can be divided into non-targeted attacks and targeted attacks.
For non-targeted attacks (e.g., [Xie et al., 2019]), all one needs to do is fool DNNs into misclassifying the perturbed images. For targeted attacks, the adversary must make the DNNs predict specific untrue labels for the adversarial examples. [Li et al., 2020] apply Poincaré distance and Triplet loss to regularize the targeted attack process. [Gao et al., 2021] propose a staircase sign method to utilize the gradients of the substitute models effectively. The above methods craft adversarial examples by directly using the outputs of the classification layers, i.e., logits (un-normalized log probabilities). Beyond these, researchers [Yosinski et al., 2014] observe that distorting the features in the intermediate layers of DNNs can also generate transferable adversarial examples. Based on this, [Inkawhich et al., 2019] generate adversarial examples by minimizing the Euclidean distance between the source and target feature maps. [Inkawhich et al., 2020a] leverage class-wise and layer-wise deep feature distributions of substitute models. [Inkawhich et al., 2020b] further exploit the feature hierarchy of DNNs to boost the performance of targeted adversarial attacks. However, these methods need to train a specific auxiliary classifier for each target label and thus suffer from expensive computation costs.

## 3 Methodology

In this section, we first give some notation for targeted attacks; the untargeted version can be derived straightforwardly. Then we describe our proposed methods, i.e., Pair-wise Alignment Attack and Global-wise Alignment Attack, in Subsections 3.2 and 3.3. The attack process is detailed in Subsection 3.4.

### 3.1 Preliminaries

Adversarial targeted attacks. This task aims at fooling a DNN $F$ into misclassifying the perturbed image $x^{adv} = x + \delta$, where $x$ is the original image with label $y$ and $\delta$ is an imperceptible perturbation added to $x$. In our work, the $\ell_\infty$-norm is applied to evaluate the imperceptibility of the perturbation, i.e., $\|\delta\|_\infty \le \epsilon$. Different from untargeted attacks, which only need to prevent $F$ from making a correct prediction, targeted attacks require the misclassified label to be $y^{tgt}$. The constrained optimization of targeted attacks can be written as:

$$x^{adv} = \arg\min_{x^{adv}} L(x^{adv}, y^{tgt}), \quad \text{s.t. } \|x^{adv} - x\|_\infty \le \epsilon, \tag{1}$$

where $L(\cdot,\cdot)$ is the loss function used to calculate perturbations.

Perceptions of DNNs. DNNs, especially convolutional neural networks (CNNs), have their own patterns for perceiving and understanding images [Zeiler and Fergus, 2014], which is caused by the mechanism of convolutional layers. As introduced in [Worrall et al., 2017], convolution kernels do not perform a one-time transformation that produces the result from the whole input. Instead, a small region of the input is perceived iteratively, so features at every layer still hold local structures similar to those of the input (see Supp. Sec. D). This property of convolution leads to the translation homogeneity of intermediate feature maps. Therefore, measuring only the Euclidean distance between two feature maps is inaccurate when there are translations, rotations, etc.

### 3.2 Pair-wise Alignment Attack

Given an image $x^{tgt}$ of target label $y^{tgt}$ and a specific intermediate layer $l$ of network $F$, we use $S^l \in \mathbb{R}^{N_l \times M_l}$ to denote the feature of $x^{adv}$ at layer $l$ of $F$. Similarly, $T^l \in \mathbb{R}^{N_l \times M_l}$ is the feature of $x^{tgt}$. Specifically, $N_l$ is the number of channels and $M_l$ is the product of the height and width of the feature map.
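To make the notation concrete, the following is a minimal PyTorch sketch of how the flattened feature $S^l \in \mathbb{R}^{N_l \times M_l}$ of a chosen layer can be captured with a forward hook. The model and the layer name (`features.denseblock3` of DenseNet-121) are illustrative assumptions on our part and do not reproduce the paper's exact layer-decoding scheme.

```python
import torch
import torchvision.models as models

# Illustrative substitute model and layer choice (not the paper's exact layer decoding).
model = models.densenet121(pretrained=True).eval()
layer = dict(model.named_modules())["features.denseblock3"]

captured = {}

def hook(module, inputs, output):
    # output has shape (B, N_l, H, W); flatten the spatial dims to get (B, N_l, M_l).
    captured["feat"] = output.flatten(start_dim=2)

handle = layer.register_forward_hook(hook)

x = torch.rand(1, 3, 224, 224)       # a stand-in source image
with torch.no_grad():
    model(x)
S_l = captured["feat"][0]            # S^l with shape (N_l, M_l)
print(S_l.shape)
handle.remove()
```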
As described before, since Euclidean distance imposes an unreasonable spatial-consistency constraint on $S^l$ and $T^l$, choosing it as the metric leads to redundant effort on spatial information matching. To handle this, we propose the Pair-wise Alignment Attack (PAA). Assuming that the label information is modeled by highly abstract features, we regard $S^l$ and $T^l$ as drawn from two distributions $p$ and $q$, which model the label information of $y$ and $y^{tgt}$, respectively. Naturally, an arbitrary feature extracted from $F$ is treated as a sample set of feature vectors from the corresponding distribution. The problem is then how to use these samples to estimate the difference between $p$ and $q$. Empirically, source and target sample sets $\Omega \sim p$ and $Z \sim q$ are built by splitting $S^l$ and $T^l$ into individual vectors, where $\Omega = \{s_{\cdot i}\}_{i=1}^{M_l}$ and $Z = \{t_{\cdot j}\}_{j=1}^{M_l}$. Another way of splitting, where $\Omega = \{s_{i\cdot}\}_{i=1}^{N_l}$ and $Z = \{t_{j\cdot}\}_{j=1}^{N_l}$, is analysed in Supp. Sec. C. After that, the discrepancy between $p$ and $q$ is estimated by measuring the similarity of $\Omega$ and $Z$. Typically, this is a two-sample problem [Gretton et al., 2012]. As introduced in [Gretton et al., 2012], maximum mean discrepancy (MMD) has been explored for the two-sample problem. Let $\mathcal{H}$ be a reproducing kernel Hilbert space (RKHS) with an associated continuous kernel $k(\cdot,\cdot)$. The mean embedding of $p$ in $\mathcal{H}$ is the unique element $\mu_p$ satisfying $\mathbb{E}_{\omega \sim p} f(\omega) = \langle f, \mu_p \rangle_{\mathcal{H}}$ for all $f \in \mathcal{H}$. Then, in our task, $\mathrm{MMD}^2[p, q]$ is defined as the RKHS distance between $\mu_p$ and $\mu_q$:

$$
\begin{aligned}
\mathrm{MMD}^2[p, q] &= \|\mu_p - \mu_q\|_{\mathcal{H}}^2 = \langle \mu_p, \mu_p \rangle_{\mathcal{H}} + \langle \mu_q, \mu_q \rangle_{\mathcal{H}} - 2\langle \mu_p, \mu_q \rangle_{\mathcal{H}} \\
&= \frac{1}{M_l^2}\sum_{i,j=1}^{M_l} k(s_{\cdot i}, s_{\cdot j}) + \frac{1}{M_l^2}\sum_{i,j=1}^{M_l} k(t_{\cdot i}, t_{\cdot j}) - \frac{2}{M_l^2}\sum_{i,j=1}^{M_l} k(s_{\cdot i}, t_{\cdot j}).
\end{aligned}
\tag{2}
$$

Specifically, MMD is calculated from two kinds of pairs: a) intra-distribution pairs $(s_{\cdot i}, s_{\cdot j})$ and $(t_{\cdot i}, t_{\cdot j})$, and b) inter-distribution pairs $(s_{\cdot i}, t_{\cdot j})$. Obviously, MMD is not affected by spatial translations, i.e., shifting or rotation does not change the result of Equation (2), which is the key difference from Euclidean distance. Furthermore, based on the critical property that $\mathrm{MMD}^2[p, q] = 0$ iff $p = q$ [Gretton et al., 2012], minimizing Equation (2) is equivalent to modulating the source feature towards the target's:

$$L_P(S^l, T^l) = \mathrm{MMD}^2[p, q]. \tag{3}$$

Since the kernel choice plays a key role in mean embedding matching [Gretton et al., 2012], three kernel functions are studied in our experiments to evaluate their effectiveness in statistic alignment:

- Linear kernel PAAℓ: $k(s, t) = s^{\mathsf{T}} t$.
- Polynomial kernel PAAp: $k(s, t) = (s^{\mathsf{T}} t + c)^d$.
- Gaussian kernel PAAg: $k(s, t) = \exp\!\left(-\frac{\|s - t\|_2^2}{2\sigma^2}\right)$,

where the bias $c$, power $d$, and variance $\sigma^2$ are hyper-parameters. Following [Inkawhich et al., 2019], a gallery for picking target images is maintained by randomly sampling images from each label. With the help of the gallery, the pipeline of obtaining $x^{tgt}$ by PAA is as follows: given a source image $x$, we obtain $y^{tgt}$ using one of the target-label selection strategies; $x^{tgt}$ is then chosen from the corresponding sub-gallery by finding the image with the largest loss $L_P$. It is worth noting that we adopt the linear-time unbiased estimator of $\mathrm{MMD}^2[p, q]$ from [Gretton et al., 2012] to decrease the space and computation complexity during the selection of the target image $x^{tgt}$.
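As a concrete illustration of Equations (2) and (3), below is a minimal PyTorch sketch of the pair-wise alignment loss with the polynomial kernel, using the quadratic, biased estimate of MMD². The function names (`polynomial_kernel`, `paa_loss`) and default hyper-parameters are ours for illustration; the linear-time unbiased estimator used by the paper for target-image selection is not shown.

```python
import torch

def polynomial_kernel(a: torch.Tensor, b: torch.Tensor, c: float = 0.0, d: int = 2) -> torch.Tensor:
    """k(s, t) = (s^T t + c)^d for every pair of columns of a and b."""
    # a, b: (N_l, M_l); the columns are the feature vectors s_{.i}, t_{.j}.
    return (a.t() @ b + c) ** d                     # (M_l, M_l) Gram matrix

def paa_loss(S: torch.Tensor, T: torch.Tensor, c: float = 0.0, d: int = 2) -> torch.Tensor:
    """Biased estimate of MMD^2[p, q] between the column sets of S^l and T^l (Eqs. 2-3)."""
    k_ss = polynomial_kernel(S, S, c, d).mean()     # intra-distribution pairs (s_i, s_j)
    k_tt = polynomial_kernel(T, T, c, d).mean()     # intra-distribution pairs (t_i, t_j)
    k_st = polynomial_kernel(S, T, c, d).mean()     # inter-distribution pairs (s_i, t_j)
    return k_ss + k_tt - 2.0 * k_st

# Toy usage: two random "feature maps" with N_l = 64 channels and M_l = 49 locations.
S_l = torch.randn(64, 49, requires_grad=True)
T_l = torch.randn(64, 49)
loss = paa_loss(S_l, T_l)
loss.backward()   # in the attack, gradients w.r.t. S^l flow back to x^adv
```

Note that permuting or rolling the columns of `S_l` (the spatial locations) leaves this loss unchanged, which mirrors the translation invariance argued for Equation (2).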
### 3.3 Global-wise Alignment Attack

Since the Pair-wise Alignment Attack involves time-consuming pair-wise computation, we propose another, more efficient approach that achieves comparable performance. Unlike the previous one, the Global-wise Alignment Attack (GAA) explicitly matches moments of the source and target sample sets $\Omega$ and $Z$. Specifically, we employ two global statistics, a) the first-order raw moment (mean) and b) the second-order central moment (variance), to guide the modulation of features.

*Figure 1: Performance (tSuc and tTR) of PAAp w.r.t. the 2nd, 10th, 100th, 500th, and 1000th settings (panels: Den121→VGG19, Den121→Inc-v3, Den121→Res50). A target label of higher ranking leads to better performance.*

Let $\mu^i_{S^l}$, $\mu^i_{T^l}$, $\sigma^i_{S^l}$, $\sigma^i_{T^l}$ be the mean and variance of the $i$-th channel of $S^l$ and $T^l$, respectively:

$$\mu^i_{S^l} = \frac{1}{M_l}\sum_{j=1}^{M_l} (S^l)_{ij}, \quad \sigma^i_{S^l} = \mathrm{Var}\big((S^l)_{i\cdot}\big), \tag{4}$$

$$\mu^i_{T^l} = \frac{1}{M_l}\sum_{j=1}^{M_l} (T^l)_{ij}, \quad \sigma^i_{T^l} = \mathrm{Var}\big((T^l)_{i\cdot}\big). \tag{5}$$

Minimizing the gaps between $\Omega$ and $Z$ in these two moments is equivalent to aligning the source and target features globally:

$$\delta_\mu = \big\|\mu_{S^l} - \mu_{T^l}\big\|, \quad \delta_\sigma = \big\|\sigma_{S^l} - \sigma_{T^l}\big\|, \quad L_G(S^l, T^l) = \delta_\mu + \delta_\sigma, \tag{6}$$

where $\mu_{S^l} = (\mu^1_{S^l}, \ldots, \mu^{N_l}_{S^l})$ and the other channel-wise statistics are stacked analogously. The reasons for performing global-wise alignment are: 1) the two moments are practical for estimating a distribution over a dataset, just as batch normalization does; and 2) as the architectures of DNNs go deeper, these two moments contain more complicated traits that represent different distributions [Li et al., 2018]. Similar to PAA, GAA also chooses the target image from the gallery, by calculating Equation (6).

### 3.4 Attack Algorithm

Motivated by MIFGSM [Dong et al., 2018], which uses momentum to memorize previous gradients, and following the setting of AA [Inkawhich et al., 2019], we integrate momentum into the pipeline of perturbation generation. Specifically, for both kinds of attacks, i.e., PAA and GAA, we first calculate gradients step by step:

$$g_\nu = \nabla_{x^{adv}_\nu} L(S^l_\nu, T^l), \tag{7}$$

where $\nu$ is the current step of the iteration, $S^l_\nu$ is the intermediate feature of the perturbed image $x^{adv}_\nu$ at iteration $\nu$, and $x^{adv}_0 = x$. Then the momentum term is accumulated from previous gradients:

$$\beta_{\nu+1} = \mu \cdot \beta_\nu + \frac{g_\nu}{\|g_\nu\|_1}, \tag{8}$$

where $\mu$ is the decay factor, $\beta_\nu$ is the momentum term at iteration $\nu$, and $\beta_0$ is initialized to 0. Finally, under the $\ell_\infty$-norm constraint, adversarial examples are crafted by performing the above calculations iteratively:

$$x^{adv}_{\nu+1} = \mathrm{clip}_{x,\epsilon}\big(x^{adv}_\nu - \alpha \cdot \mathrm{sign}(\beta_{\nu+1})\big), \tag{9}$$

where $\alpha$ is a given step size.
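To make the full procedure concrete, here is a minimal PyTorch sketch that plugs the GAA loss of Equation (6) into the momentum update of Equations (7)-(9). The hook-based feature extraction mirrors the earlier snippet; the names `gaa_loss` and `gaa_attack`, the L1 aggregation assumed for the norm in Equation (6), the DenseNet layer choice, and the assumption of un-normalized images in [0, 1] are our illustrative simplifications, not the authors' released implementation.

```python
import torch
import torchvision.models as models

model = models.densenet121(pretrained=True).eval()
layer = dict(model.named_modules())["features.denseblock3"]   # illustrative layer choice
feat = {}
layer.register_forward_hook(lambda m, i, o: feat.update(out=o.flatten(start_dim=2)))

def gaa_loss(S: torch.Tensor, T: torch.Tensor) -> torch.Tensor:
    """Eq. (6): align channel-wise means and variances of S^l and T^l (L1 aggregation assumed)."""
    d_mu = (S.mean(dim=-1) - T.mean(dim=-1)).abs().sum()
    d_sigma = (S.var(dim=-1) - T.var(dim=-1)).abs().sum()
    return d_mu + d_sigma

def gaa_attack(x: torch.Tensor, x_tgt: torch.Tensor, eps: float = 0.07,
               steps: int = 20, decay: float = 1.0) -> torch.Tensor:
    """Momentum-based iterative attack of Eqs. (7)-(9), minimizing the GAA loss."""
    alpha = eps / steps
    with torch.no_grad():
        model(x_tgt)
    T_l = feat["out"].detach()                      # target feature T^l (fixed)
    x_adv, beta = x.clone(), torch.zeros_like(x)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        model(x_adv)
        loss = gaa_loss(feat["out"], T_l)
        g = torch.autograd.grad(loss, x_adv)[0]     # Eq. (7)
        beta = decay * beta + g / g.abs().sum()     # Eq. (8): L1-normalized gradient
        with torch.no_grad():
            x_adv = x_adv - alpha * beta.sign()     # Eq. (9): descend on the alignment loss
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

# Toy usage with random tensors standing in for preprocessed images.
x, x_tgt = torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224)
x_adv = gaa_attack(x, x_tgt)
```

Swapping `gaa_loss` for the `paa_loss` of the previous snippet would yield the corresponding PAA variant of this loop.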
## 4 Experiments

To make comprehensive comparisons with the state of the art, we conduct a series of experiments to evaluate performance. Specifically, the baselines include a feature space targeted attack method, AA [Inkawhich et al., 2019], and two FGSM-based methods, MIFGSM [Dong et al., 2018] and TIFGSM [Dong et al., 2019]. Supp. Sec. G and Sec. H give comparisons with other FGSM-based methods.

*Figure 2: Performance (tSuc and tTR) of GAA w.r.t. the 2nd, 10th, 100th, 500th, and 1000th settings (panels: Den121→VGG19, Den121→Inc-v3, Den121→Res50). A target label of higher ranking leads to better performance.*

ImageNet models. For a better evaluation of transferability, four ImageNet-trained models with different architectures are chosen: VGG-19 with batch normalization (VGG19) [Simonyan and Zisserman, 2015], DenseNet-121 (Den121) [Huang et al., 2017], ResNet-50 (Res50) [He et al., 2016], and Inception-v3 (Inc-v3) [Szegedy et al., 2016].

Dataset. Attacking images that have already been misclassified is pointless. Hence, for each of the 1000 labels in the ImageNet validation set, we randomly select five images (5,000 in total) that are correctly classified by all the networks we consider and perturb them.

*Figure 3: tSuc and tTR performance w.r.t. relative layer depth for multiple transfer scenarios (methods: AA, GAA, PAAg, PAAℓ, PAAp). The figure is split into four phases: upper left, upper right, bottom left, and bottom right, corresponding to black-box attacks transferring from Den121, Inc-v3, VGG19, and Res50. All of our proposed methods outperform AA in most cases, which indicates the effectiveness of statistic alignment on various layers.*

*Figure 4: tSuc results w.r.t. bias c for PAAp transferring from Den121 (white-box model) to VGG19, Inc-v3, and Res50 (black-box models). We observe the highest results when c=0, i.e., a polynomial with pure second-order terms.*

Layer decoding scheme. Following AA, a layer decoding scheme is employed to better indicate which layer is chosen for the attack.
Generally, layers are arranged from shallow to deep and numbered by relative layer depth, e.g., layer 0 of Res50 (denoted as Res50[0]) is near the input layer, and Res50[16] is close to the classification layer. Supp. Sec. A details the scheme.

Target label selection. There are two strategies for target label selection: a) random sampling, adopted in AA, and b) selection by ranking. Previous feature space targeted attack methods, e.g., [Inkawhich et al., 2019], achieve relatively poor performance. Given the prior knowledge that different target labels involve different transfer difficulties, randomly sampling the target label leads to fluctuating transfer results (see Supp. Sec. F for more analysis). For instance, given an image of a cat, it is easier to fool a model into predicting it as a "dog" than as an "airplane". To avoid this, we assign $y^{tgt}$ by ranking. For example, "2nd" indicates that the label with the second-highest confidence is chosen as $y^{tgt}$. To give an exhaustive comparison, the 2nd, 10th, 100th, 500th, and 1000th settings are adopted. We also report results under the random sampling strategy to reveal the raw performance.

Implementation details. To make a fair comparison, all methods use an identical $\ell_\infty$ constraint $\epsilon=0.07$, number of iterations $T=20$, and step size $\alpha=\epsilon/T=0.0035$. The gallery size is set to 20×1000. For PAAg, we set the variance $\sigma^2$ to the mean of the squared $\ell_2$ distances of those pairs. For PAAp, we set the bias $c=0$ and only study the case of power $d=2$. For TIFGSM, we adopt the default kernel length of 15. For MIFGSM, we set the decay factor $\mu=1.0$.

Evaluation metrics. Following AA, we adopt two metrics, the targeted success rate (tSuc) and the targeted transfer rate (tTR), to evaluate the transferability of adversarial examples. tSuc is the percentage of adversarial examples that successfully fool the victim's DNNs. For tTR, given the set of adversarial examples that successfully attack the substitute model, tTR is the proportion of this set that also fools the black-box model.
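For clarity, the following is a small sketch of how the ranked target label and the two metrics could be computed from model predictions; the helper names (`kth_ranked_label`, `t_suc`, `t_tr`) are ours for illustration and are not part of the released code.

```python
import torch

def kth_ranked_label(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Return y^tgt under the 'k-th' setting, e.g., k=2 picks the second-most-confident class."""
    return logits.argsort(dim=-1, descending=True)[..., k - 1]

def t_suc(blackbox_preds: torch.Tensor, y_tgt: torch.Tensor) -> float:
    """Targeted success rate: fraction of adversarial examples the black-box model labels as y^tgt."""
    return (blackbox_preds == y_tgt).float().mean().item()

def t_tr(whitebox_preds: torch.Tensor, blackbox_preds: torch.Tensor, y_tgt: torch.Tensor) -> float:
    """Targeted transfer rate: among examples that already fool the white-box (substitute)
    model, the fraction that also fools the black-box model."""
    ok_white = whitebox_preds == y_tgt
    if ok_white.sum() == 0:
        return 0.0
    return (blackbox_preds[ok_white] == y_tgt[ok_white]).float().mean().item()
```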
### 4.1 Comparisons with State-of-the-Art Attacks

In this section, to comprehensively evaluate the transferability of adversarial examples, we first attack different white-box models using the random sampling strategy for target labels and then transfer the resultant adversarial examples to black-box models. For instance, Den121→Res50 indicates that we generate adversarial examples on Den121 and transfer them to Res50. Empirically, attack performance varies with the choice of layer. Under the random sampling strategy, VGG19[10], Den121[23], Res50[11], and Inc-v3[8] perform the best; their experimental results are shown in Table 1.

| Method | Den121→VGG19 tSuc | tTR | Den121→Inc-v3 tSuc | tTR | Den121→Res50 tSuc | tTR |
|---|---|---|---|---|---|---|
| TIFGSM | 0.40 | 0.41 | 0.08 | 0.08 | 0.24 | 0.24 |
| MIFGSM | 1.48 | 1.50 | 0.54 | 0.55 | 2.44 | 2.46 |
| AA | 1.18 | 1.61 | 0.50 | 0.68 | 1.78 | 2.32 |
| GAA | 3.20 | 4.17 | 0.70 | 0.91 | 4.22 | 5.62 |
| PAAg | 1.60 | 2.52 | 0.52 | 0.73 | 2.42 | 3.60 |
| PAAℓ | 3.20 | 3.97 | 0.74 | 0.94 | 4.40 | 5.65 |
| PAAp | 4.38 | 5.56 | 1.16 | 1.45 | 6.08 | 7.95 |

| Method | VGG19→Inc-v3 tSuc | tTR | VGG19→Den121 tSuc | tTR | VGG19→Res50 tSuc | tTR |
|---|---|---|---|---|---|---|
| TIFGSM | 0.08 | 0.08 | 0.26 | 0.26 | 0.12 | 0.12 |
| MIFGSM | 0.30 | 0.30 | 1.16 | 1.17 | 0.68 | 0.69 |
| AA | 0.08 | 0.14 | 0.38 | 0.48 | 0.16 | 0.24 |
| GAA | 0.12 | 0.20 | 0.72 | 1.56 | 0.44 | 0.88 |
| PAAg | 0.14 | 0.30 | 0.52 | 1.27 | 0.32 | 0.73 |
| PAAℓ | 0.12 | 0.19 | 0.34 | 0.74 | 0.18 | 0.49 |
| PAAp | 0.28 | 0.36 | 1.00 | 1.87 | 0.56 | 0.88 |

| Method | Inc-v3→VGG19 tSuc | tTR | Inc-v3→Den121 tSuc | tTR | Inc-v3→Res50 tSuc | tTR |
|---|---|---|---|---|---|---|
| TIFGSM | 0.16 | 0.18 | 0.08 | 0.09 | 0.08 | 0.09 |
| MIFGSM | 0.56 | 0.56 | 0.56 | 0.57 | 0.54 | 0.54 |
| AA | 0.24 | 0.66 | 0.28 | 1.02 | 0.24 | 0.66 |
| GAA | 0.60 | 2.49 | 0.72 | 2.38 | 0.68 | 2.60 |
| PAAg | 0.34 | 1.03 | 0.38 | 1.24 | 0.34 | 1.24 |
| PAAℓ | 0.22 | 0.71 | 0.32 | 1.95 | 0.32 | 2.13 |
| PAAp | 0.70 | 2.55 | 0.86 | 3.37 | 0.82 | 3.10 |

| Method | Res50→VGG19 tSuc | tTR | Res50→Inc-v3 tSuc | tTR | Res50→Den121 tSuc | tTR |
|---|---|---|---|---|---|---|
| TIFGSM | 0.32 | 0.33 | 0.08 | 0.08 | 0.44 | 0.45 |
| MIFGSM | 2.00 | 2.02 | 0.92 | 0.93 | 3.96 | 3.99 |
| AA | 0.78 | 2.05 | 0.54 | 1.29 | 1.96 | 4.93 |
| GAA | 2.14 | 5.28 | 0.76 | 1.78 | 3.92 | 9.80 |
| PAAg | 0.94 | 3.15 | 0.44 | 1.28 | 1.68 | 5.96 |
| PAAℓ | 2.16 | 5.91 | 0.62 | 1.71 | 3.10 | 8.54 |
| PAAp | 4.38 | 8.46 | 1.36 | 2.48 | 7.36 | 14.88 |

Table 1: Quantitative comparisons with state-of-the-art attacks under the random sampling strategy of target label selection. Ours achieve the best performance in most cases.

#### Effectiveness of PAA with Different Kernel Functions

As demonstrated in Table 1, all pair-wise alignments succeed in feature space targeted attacks. Specifically, compared with the Linear and Gaussian kernels, the Polynomial kernel brings the best performance, and our PAAp outperforms the state of the art by 6.92% at most and 1.70% on average, which shows the effectiveness of our pair-wise alignment. As for the reasons for the performance gains: compared with the FGSM-based methods, i.e., TIFGSM and MIFGSM, we exploit the information in the intermediate feature maps to perform highly transferable attacks. Compared with AA, which adopts Euclidean distance for measuring differences and therefore shows worse performance than ours, the results demonstrate the effectiveness of our proposed statistic alignment.

#### Effectiveness of GAA

Although GAA requires quite simple computations to perform attacks, it still shows convincing performance against all black-box models. Specifically, GAA outperforms the state of the art by 3.98% at most and 0.73% on average, which shows the effectiveness of globally aligning the statistics of the target and source. Moreover, when Den121 or Res50 is chosen as the white-box model, GAA shows performance comparable to PAAℓ; when the white-box model is VGG19 or Inc-v3, GAA achieves the second-best results in most cases.

### 4.2 Ablation Study

#### Transferability w.r.t. Target Labels

Considering the different difficulties of the target label $y^{tgt}$, for PAAp and GAA we study how the layer-wise transferability varies under the 2nd, 10th, 100th, 500th, and 1000th setups. As illustrated in Figure 1 and Figure 2, tSuc and tTR w.r.t. relative layer depth are evaluated under the above settings. Obviously, the layer-wise transferability trend is independent of the target label. In other words, different target labels do not affect the layer-wise transferability trends, although a $y^{tgt}$ further away from the ground truth $y$ leads to a more challenging transfer-based attack. For the case Den121→Res50 under 2nd, we report the results for the optimal layer of Den121 (Den121[22]) in Table 2. Formally, target labels of different ranks lead to different performance, and a lower ranking leads to worse performance. Specifically, 2nd is the best case and 1000th is the worst.

| Metric | Method | 2nd | 10th | 100th | 500th | 1000th |
|---|---|---|---|---|---|---|
| tSuc | GAA | 28.38 | 14.38 | 7.6 | 3.78 | 0.78 |
| tSuc | PAAg | 27.18 | 11.3 | 4.28 | 1.62 | 0.32 |
| tSuc | PAAℓ | 33.28 | 15.56 | 6.86 | 3.10 | 0.86 |
| tSuc | PAAp | 37.98 | 21.10 | 11.2 | 5.12 | 1.74 |
| tTR | GAA | 36.15 | 19.82 | 10.75 | 5.26 | 1.03 |
| tTR | PAAg | 34.94 | 16.56 | 6.68 | 2.64 | 0.58 |
| tTR | PAAℓ | 39.39 | 19.68 | 9.32 | 4.27 | 1.05 |
| tTR | PAAp | 44.90 | 26.01 | 14.66 | 6.74 | 2.15 |

Table 2: Transferability (tSuc and tTR) w.r.t. the 2nd, 10th, 100th, 500th, and 1000th settings. Different target labels lead to different performance, and those of lower ranking lead to worse performance.

#### Transferability w.r.t. Layers

In this section, transferability w.r.t. relative layer depth under the 2nd setting is investigated. The methods involved are PAAℓ, PAAp, PAAg, GAA, and AA. Specifically, given a white-box and black-box model pair, each subfigure of Figure 3 illustrates performance under a different metric w.r.t. relative layer depth. As demonstrated in the figure, compared with the Linear kernel, the Polynomial kernel brings better attack ability on the Res50, Inc-v3, and Den121 white-box models; for the VGG19 white-box, they achieve comparable results. Furthermore, on most of the chosen layers, all of our methods are superior to the baseline AA by a large margin. Similar to what is stated in [Inkawhich et al., 2019], given a white-box model, our layer-wise transferability holds a similar trend regardless of which black-box models we test. Specifically, for Den121, a deeper layer yields more transferability; for Inc-v3, VGG19, and Res50, the most powerful attacks come from perturbations generated at optimal middle layers. This phenomenon indicates that adversarial examples generated at our optimal layers can be transferred well to truly unknown models. From the experimental results under 2nd, we simply adopt VGG19[14], Den121[22], Res50[14], and Inc-v3[11] as our optimal layers.

#### Transferability w.r.t. Orders

As mentioned above, the Polynomial kernel leads to the most powerful attack. Since a larger bias $c$ ($c \ge 0$) results in a greater proportion of lower-order terms in the polynomial, in this section we study the appropriate value of $c$ under the 2nd and Den121[22] setup. Specifically, we attack Den121 using PAAp parameterized by $c$ ranging from 0.0 to 2.0 with a granularity of 0.1.
As illustrated in Figure 4, from the monotonically decreasing curves, we achieve the most effective attack when $c = 0.0$, where tSuc is 37.00%, 24.00%, and 37.78% for VGG19, Inc-v3, and Res50, respectively. Once $c$ reaches 1.3 or larger, tSuc remains stable. The overall average tSuc for VGG19, Inc-v3, and Res50 is 30.78%, 19.68%, and 32.12%, respectively.

## 5 Conclusion

In this paper, we propose a novel statistic alignment for feature space targeted attacks. Previous methods utilize Euclidean distance to craft perturbations. However, because of the spatially dependent nature of this metric, it unreasonably imposes a spatial-consistency constraint on the source and target features. To address this problem, two novel methods, i.e., Pair-wise Alignment Attack and Global-wise Alignment Attack, are proposed by employing high-order translation-invariant statistics. Moreover, since randomly selecting target labels results in fluctuating transfer results, we further analyze the layer-wise transferability under different transfer difficulties to obtain highly reliable attacks. Extensive experimental results show the effectiveness of our methods.

## Acknowledgements

This work is supported by the National Key Research and Development Program of China (No. 2018AAA0102200), the National Natural Science Foundation of China (Grant No. 61772116, No. 61872064, No. 62020106008), the Sichuan Science and Technology Program (Grant No. 2019JDTD0005), the Open Project of Zhejiang Lab (Grant No. 2019KD0AB05), and the Open Project of the Key Laboratory of Artificial Intelligence, Ministry of Education (Grant No. AI2019005).

## References

[Biggio et al., 2013] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Srndic, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In ECML/PKDD, 2013.

[Carlini and Wagner, 2017] Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. In SP, 2017.

[Dong et al., 2018] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In CVPR, 2018.

[Dong et al., 2019] Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Evading defenses to transferable adversarial examples by translation-invariant attacks. In CVPR, 2019.

[Gao et al., 2020a] Lianli Gao, Qilong Zhang, Jingkuan Song, Xianglong Liu, and Heng Tao Shen. Patch-wise attack for fooling deep neural network. In ECCV, 2020.

[Gao et al., 2020b] Lianli Gao, Qilong Zhang, Jingkuan Song, and Heng Tao Shen. Patch-wise++ perturbation for adversarial targeted attacks. CoRR, abs/2012.15503, 2020.

[Gao et al., 2021] Lianli Gao, Qilong Zhang, Xiaosu Zhu, Jingkuan Song, and Heng Tao Shen. Staircase sign method for boosting adversarial attacks. arXiv preprint arXiv:2104.09722, 2021.

[Gretton et al., 2012] Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander J. Smola. A kernel two-sample test. J. Mach. Learn. Res., 13:723-773, 2012.

[He et al., 2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.

[Huang et al., 2017] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In CVPR, 2017.

[Ilyas et al., 2018] Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. In ICML, 2018.
[Inkawhich et al., 2019] Nathan Inkawhich, Wei Wen, Hai (Helen) Li, and Yiran Chen. Feature space perturbations yield more transferable adversarial examples. In CVPR, 2019.

[Inkawhich et al., 2020a] Nathan Inkawhich, Kevin J. Liang, Lawrence Carin, and Yiran Chen. Transferable perturbations of deep feature distributions. In ICLR, 2020.

[Inkawhich et al., 2020b] Nathan Inkawhich, Kevin J. Liang, Binghui Wang, Matthew Inkawhich, Lawrence Carin, and Yiran Chen. Perturbing across the feature hierarchy to improve standard and strict blackbox attack transferability. In NeurIPS, 2020.

[Li et al., 2018] Yanghao Li, Naiyan Wang, Jianping Shi, Xiaodi Hou, and Jiaying Liu. Adaptive batch normalization for practical domain adaptation. Pattern Recognit., 80:109-117, 2018.

[Li et al., 2020] Maosen Li, Cheng Deng, Tengjiao Li, Junchi Yan, Xinbo Gao, and Heng Huang. Towards transferable targeted attack. In CVPR, 2020.

[Lin et al., 2020] Jiadong Lin, Chuanbiao Song, Kun He, Liwei Wang, and John E. Hopcroft. Nesterov accelerated gradient and scale invariance for adversarial attacks. In ICLR, 2020.

[Moosavi-Dezfooli et al., 2016] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In CVPR, 2016.

[Redmon et al., 2016] Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In CVPR, 2016.

[Ru et al., 2020] Binxin Ru, Adam D. Cobb, Arno Blaas, and Yarin Gal. BayesOpt adversarial attack. In ICLR, 2020.

[Sabour et al., 2016] Sara Sabour, Yanshuai Cao, Fartash Faghri, and David J. Fleet. Adversarial manipulation of deep representations. In ICLR, 2016.

[Simonyan and Zisserman, 2015] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.

[Szegedy et al., 2014] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.

[Szegedy et al., 2016] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In CVPR, 2016.

[Worrall et al., 2017] Daniel E. Worrall, Stephan J. Garbin, Daniyar Turmukhambetov, and Gabriel J. Brostow. Harmonic networks: Deep translation and rotation equivariance. In CVPR, 2017.

[Xie et al., 2019] Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jianyu Wang, Zhou Ren, and Alan L. Yuille. Improving transferability of adversarial examples with input diversity. In CVPR, 2019.

[Yosinski et al., 2014] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In NeurIPS, 2014.

[Zeiler and Fergus, 2014] Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014.