Published as a conference paper at ICLR 2020

# BAYESOPT ADVERSARIAL ATTACK

Binxin Ru, Adam D. Cobb, Arno Blaas
Machine Learning Research Group, Department of Engineering Science, University of Oxford
{robin, acobb, arno}@robots.ox.ac.uk

Yarin Gal
OATML Research Group, Department of Computer Science, University of Oxford
yarin.gal@cs.ox.ac.uk

ABSTRACT

Black-box adversarial attacks require a large number of attempts before finding successful adversarial examples that are visually indistinguishable from the original input. Current approaches relying on substitute model training, gradient estimation or genetic algorithms often require an excessive number of queries. Therefore, they are not suitable for real-world systems where the maximum query number is limited due to cost. We propose a query-efficient black-box attack which uses Bayesian optimisation in combination with Bayesian model selection to optimise over both the adversarial perturbation and the optimal degree of search space dimension reduction. We demonstrate empirically that our method[1] can achieve comparable success rates with 2-5 times fewer queries compared to previous state-of-the-art black-box attacks.

1 INTRODUCTION

Deep learning algorithms are widely deployed in many real-world systems and are increasingly being used for tasks ranging from identity verification (Liu et al., 2018) to financial services (Heaton et al., 2017) to autonomous driving (Bojarski et al., 2016). However, even the most accurate deep learning models can be easily deceived by perturbations which are visually imperceptible to the human eye (Szegedy et al., 2013; Carlini et al., 2016). The growing costs and risks associated with potential model failures have made the study of adversarial attacks important, both for assessing model robustness and for detecting such attacks.

In this paper we focus on highly practical adversarial attacks that fulfil the following two criteria. First, the attack is designed for a black-box setting, because in real-world examples the attacker would normally have no knowledge of the target deep learning model and can only interact with the model by querying it. Second, query efficiency is highly prioritised, because in practical cases where the damage caused by the attack is high, the query budget available to the attacker will be highly limited due to the risk of being detected by the defence system or other high inherent costs (monetary or computational) of model evaluations.

Despite the large array of adversarial attacks proposed in the literature, many of them are white-box approaches that assume full access to the target model architecture and the ability to perform back-propagation to obtain gradient information (Moosavi-Dezfooli et al., 2016; Kurakin et al., 2016; Gu & Rigazio, 2014; Goodfellow et al., 2014; Chen et al., 2018; Carlini & Wagner, 2017). On the other hand, various black-box techniques do not require access to the model architecture. One class of methods trains a white-box substitute model and attacks the target model with adversarial examples that successfully fool the substitute (Papernot et al., 2017). However, this type of method requires the availability of the original training data or a large set of query data to train the substitute network, and its performance is often limited by the mismatch between the substitute and the target models (Su et al., 2018).
The second class of black-box attacks, which shows better empirical performance than the substitute-model approaches, numerically estimates the gradient of the target model by repeatedly querying it (Chen et al., 2017; Ilyas et al., 2018; Tu et al., 2018) and attacks with the estimated gradient. Although various techniques are employed to increase the query efficiency of the gradient estimation, these methods still need an excessively large query budget to achieve a successful attack (Alzantot et al., 2018). Another line of work removes the need for gradient estimation and uses decision-based techniques (Brendel et al., 2017) or genetic algorithms (Alzantot et al., 2018) to generate adversarial examples. One popular technique adopted by many black-box attacks (Chen et al., 2017; Alzantot et al., 2018; Tu et al., 2018) to significantly improve the query efficiency is to search for adversarial perturbations in a low-dimensional latent space (search dimensionality reduction). However, learning the effective dimensionality of the latent search space is challenging in itself and, to the best of our knowledge, has not been investigated in prior work.

In light of the above limitations, we propose a query-efficient black-box attack that iteratively optimises over both the adversarial perturbation and the effective dimensionality of the latent search space. Our main contributions are summarised as follows:

- We introduce a novel gradient-free black-box attack method, the BayesOpt attack, which uses Bayesian optimisation with Gaussian process surrogate models to find effective adversarial examples and is capable of dealing with high-dimensional image inputs.
- We propose a Bayesian technique which learns the optimal degree of search dimensionality reduction by harnessing our statistical surrogate and the information from query data. This technique can be incorporated naturally into our attack procedure, leading to efficient optimisation over both the adversarial perturbation and the latent search space dimension.
- We empirically demonstrate that, under the L∞ constraint, our proposed attack method can achieve comparable success rates with about 2 to 5 times fewer model queries in comparison to current state-of-the-art query-efficient black-box attack methods.

[1] Our code is available at https://github.com/rubinxin/BayesOpt_Attack.git

2 RELATED WORK ON BLACK-BOX ATTACKS

Most of the existing adversarial attacks (Moosavi-Dezfooli et al., 2016; Kurakin et al., 2016; Gu & Rigazio, 2014; Goodfellow et al., 2014; Chen et al., 2018; Carlini & Wagner, 2017) focus on white-box settings, where the attacker has full access to the target model and complete knowledge of the architecture, weights and gradients to generate successful adversarial examples. However, real-world systems are usually attacked in a black-box setting, where one has no knowledge about the model and can only observe input-output correspondences by querying it. Here we give a brief overview of existing black-box attacks.

Substitute model. One class of black-box attacks uses data acquired from querying the target black-box model to train a substitute model (Papernot et al., 2017) which mimics the classification behaviour of the target model. The adversary can then employ any white-box method to attack the fully observable substitute model and apply the successful adversarial examples to the target model.
Such approaches rely on the transferability assumption: adversarial examples that are effective against the substitute model are also very likely to deceive the target model, given their similar classification performance on the same data (Szegedy et al., 2013). Moreover, to provide training data for the substitute model, the adversary either requires information on the target model's training set, which is highly unrealistic in real-world applications, or needs to build a synthetic training set by querying the target model (Papernot et al., 2017), which implies a large number of model evaluations and becomes hardly feasible for large models or complex datasets (Brendel et al., 2017).

Gradient estimation. An alternative to training a substitute model is to estimate the gradient via finite differences and use the estimated gradient information to produce attacks. However, naive coordinate-wise gradient estimation requires excessive queries to the target model (2 queries per coordinate per descent/attack step) and thus is not feasible for attacking models with high-dimensional inputs (e.g. classifiers on ImageNet). Chen et al. (2017) overcome this limitation by using stochastic coordinate gradient descent and selecting the batch of coordinates via importance sampling, introducing the state-of-the-art zeroth-order attack, ZOO, which achieves attack success rates and perturbation costs comparable to many white-box attacks. Although ZOO makes it computationally tractable to perform black-box attacks on high-dimensional image data, it still requires millions of queries to generate a successful adversarial example, making it impracticable for attacking real-world systems where model queries can be expensive and the budget limited. Improving on ZOO, AutoZOOM (Tu et al., 2018) significantly enhances the query efficiency by using random vectors to estimate the full gradient and adjusting the estimation parameters adaptively to trade off query efficiency against input perturbation cost. More importantly, AutoZOOM shows the benefits of employing dimension reduction techniques in accelerating attacks (i.e. searching for the adversarial perturbation in a low-dimensional latent space and decoding it back to the high-dimensional input space). Parallel work by Ilyas et al. (2018) estimates the gradient through a modified version of natural evolution strategies, which can be viewed as a finite-difference method over a random Gaussian basis. The estimated gradient is then used with projected gradient descent, a white-box attack method, to generate adversarial examples.

Gradient-free optimisation. As discussed, gradient-estimation approaches in general need an excessive number of queries to achieve a successful attack. Moreover, their dependence on gradient information makes them less robust to defences which manipulate the gradients (Athalye et al., 2018; Brendel et al., 2017; Guo et al., 2017). Thus, truly gradient-free methods are more likely to bypass such defences. One recent example, which has demonstrated state-of-the-art query efficiency, is GenAttack (Alzantot et al., 2018). GenAttack uses genetic algorithms to iteratively evolve a population of candidate adversarial examples. Besides having an annealing scheme for the mutation rate and range, GenAttack also adopts dimension reduction techniques, similar to AutoZOOM, to improve the query efficiency. In parallel to GenAttack, Brendel et al.
(2017) introduce a decision-based attack, Boundary Attack, which only requires access to the final model decision. Boundary Attack starts from a large adversarial perturbation and then iteratively reduces the perturbation through a random walk along the decision boundary. However, Boundary Attack takes about 72 times more queries than GenAttack to fool an undefended ImageNet model (Alzantot et al., 2018). Another recent approach introduced in Moon et al. (2019) reduces the search space to a discrete domain and subsequently uses combinatorial optimisation to find successful attacks, mostly focusing on the less challenging setting of untargeted attacks.

Finally, the prior works that use Bayesian optimisation for adversarial attacks (Suya et al., 2017; Zhao et al., 2019) only investigate the use of a simple Gaussian process as the surrogate model and are limited in their performance. The method proposed in (Suya et al., 2017) deals with the untargeted attack setting and only demonstrates the effectiveness of Bayesian optimisation in comparison to random search on a low-dimensional (d = 57) email attack task. BO-ADMM, proposed in (Zhao et al., 2019), works on image data but applies Bayesian optimisation directly on the search space of the image dimension to minimise the joint objective of attack loss and distortion loss. Despite its query efficiency, BO-ADMM leads to poor-quality adversarial examples with large distortion loss.

3 PRELIMINARIES

3.1 PROBLEM DEFINITION

We focus on the black-box attack setting, where the adversary has no knowledge about the network architecture, weights, gradients or training data of the target model $f$, and can only query the target model with an input $\mathbf{x}$ to observe its prediction scores on all $C$ classes (i.e. $f : \mathbb{R}^d \rightarrow [0, 1]^C$) (Tu et al., 2018; Alzantot et al., 2018). Moreover, we aim to perform targeted attacks, which are more challenging than untargeted attacks, subject to a constraint on the maximum change to any of the coordinates (i.e. an $L_\infty$ constraint) (Warde-Farley & Goodfellow, 2016; Alzantot et al., 2018). Specifically, targeted attacks refer to the case where, given a valid input $\mathbf{x}_{\text{origin}}$ of class $t_{\text{origin}}$ (i.e. $\arg\max_{i \in \{1,\dots,C\}} f(\mathbf{x}_{\text{origin}})_i = t_{\text{origin}}$) and a target $t \neq t_{\text{origin}}$, we aim to find an adversarial input $\mathbf{x}$, which is close to $\mathbf{x}_{\text{origin}}$ according to the $L_\infty$ norm, such that $\arg\max_{i \in \{1,\dots,C\}} f(\mathbf{x})_i = t$. Untargeted adversarial attacks refer to the case where, instead of classifying $\mathbf{x}_{\text{origin}}$ as $t_{\text{origin}}$, we try to find an input $\mathbf{x}$ so that $\arg\max_{i \in \{1,\dots,C\}} f(\mathbf{x})_i \neq t_{\text{origin}}$. In our approach, we follow the convention of optimising over the perturbation $\boldsymbol{\delta}$ instead of the adversarial example $\mathbf{x}$ directly (Chen et al., 2017; Alzantot et al., 2018; Tu et al., 2018). Therefore, our problem can be formulated as:

$$\arg\max_{i \in \{1,\dots,C\}} f(\mathbf{x}_{\text{origin}} + \boldsymbol{\delta})_i = t \quad \text{s.t.} \quad \|\boldsymbol{\delta}\|_\infty \leq \delta_{\max} \tag{1}$$

3.2 BAYESIAN OPTIMISATION

Bayesian optimisation (BayesOpt) is a query-efficient approach to global optimisation problems (Brochu et al., 2010). It is particularly useful when the objective function is a black box and is costly to evaluate. There are two key components in BayesOpt: a statistical surrogate, such as a Gaussian process (GP) or a Bayesian neural network (BNN), which models the unknown objective, and an acquisition function $\alpha(\cdot)$ which is maximised to recommend the next query location by trading off exploitation and exploration (see Algorithm 3 in Appendix A for standard BayesOpt; a minimal code sketch follows below).
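To make the two components concrete, the following is a minimal sketch of the standard BayesOpt loop, assuming a scikit-learn Gaussian process surrogate with a Matern-5/2 kernel and a simple GP-UCB acquisition maximised by random candidate search; the function names, the candidate count and the `kappa` value are illustrative placeholders rather than the settings used in the paper.

```python
# A minimal sketch of the standard BayesOpt loop, assuming a scikit-learn GP
# surrogate (Matern-5/2 kernel) and a GP-UCB acquisition; `objective` stands in
# for the black-box attack objective, and the random candidate search used to
# maximise the acquisition is a simplification.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def ucb(gp, candidates, kappa=2.0):
    """GP-UCB acquisition: posterior mean plus kappa times posterior std."""
    mu, std = gp.predict(candidates, return_std=True)
    return mu + kappa * std

def bayes_opt(objective, bounds, n_init=5, n_iter=50, rng=np.random.default_rng(0)):
    dim = len(bounds)
    lo, hi = np.array(bounds).T
    # Initial design: uniform random points within the box constraints.
    X = rng.uniform(lo, hi, size=(n_init, dim))
    y = np.array([objective(x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)                                   # update the surrogate
        cand = rng.uniform(lo, hi, size=(2000, dim))   # crude acquisition maximisation
        x_next = cand[np.argmax(ucb(gp, cand))]
        y_next = objective(x_next)                     # one model query per iteration
        X, y = np.vstack([X, x_next]), np.append(y, y_next)
    return X[np.argmax(y)]                             # best recommendation
```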
4 BAYESOPT ATTACK

4.1 BAYESOPT ATTACK OBJECTIVE

It has been shown that reducing the dimensionality of the search space in Equation (1) increases query efficiency significantly (Chen et al., 2017; Tu et al., 2018; Alzantot et al., 2018). Because we focus on the small query regime, where our surrogate model needs to be trained with a very small number of observations, we adopt the previously suggested dimensionality reduction technique of bilinear resizing to reduce the challenging problem of optimising over the high-dimensional input space of $\mathbf{x} \in \mathbb{R}^d$ to one over a relatively low-dimensional space, setting $\mathbf{x} = \mathbf{x}_{\text{origin}} + g(\boldsymbol{\delta})$ where $\boldsymbol{\delta} \in \mathbb{R}^{d_r}$ with $d_r < d$ and $g : \mathbb{R}^{d_r} \rightarrow \mathbb{R}^d$ being the bilinear resizing decoder[2]. Furthermore, we follow the approach of smoothing the discontinuous objective function in Equation (1), which has been found to be beneficial in previous work (Chen et al., 2017; Tu et al., 2018; Alzantot et al., 2018). Together with the dimensionality reduction, this leads to the following black-box objective for our Bayesian optimisation:

$$\boldsymbol{\delta}^* = \arg\max_{\boldsymbol{\delta}} y(\boldsymbol{\delta}) \quad \text{s.t.} \quad \boldsymbol{\delta} \in [-\delta_{\max}, \delta_{\max}]^{d_r} \tag{2}$$

where $y(\boldsymbol{\delta}) = \log f(\mathbf{x}_{\text{origin}} + g(\boldsymbol{\delta}))_t - \log \sum_{j \neq t} f(\mathbf{x}_{\text{origin}} + g(\boldsymbol{\delta}))_j$.

[2] Note that our method is not dependent on a particular choice of decoder, i.e. it could also be used together with an autoencoder- or PCA-based dimensionality reduction technique.

4.2 GP-BASED BAYESOPT ATTACK

We first use BayesOpt with a standard GP as the surrogate to solve the black-box attack objective in the reduced input dimension in Equation (2). The GP encodes our prior belief about the objective $y$:

$$y \sim \mathcal{GP}(\mu(\boldsymbol{\delta}), k(\boldsymbol{\delta}, \boldsymbol{\delta}')) \tag{3}$$

which is specified by a mean function $\mu$ and a kernel/covariance function $k$ (we use the Matern-5/2 kernel in our work). We normalise the objective function values and thus use a zero-mean prior $\mu(\cdot) = 0$. The predictive posterior distribution for $y_t$ at a test point $\boldsymbol{\delta}_t$, conditioned on the observation data $\mathcal{D}_{t-1} = \{\boldsymbol{\delta}_i, y_i\}_{i=1}^{t-1} = \{\boldsymbol{\delta}_{1:t-1}, \mathbf{y}_{1:t-1}\}$, then has the form:

$$p(y_t \mid \boldsymbol{\delta}_t, \mathcal{D}_{t-1}) = \mathcal{GP}(y; \mu_y(\cdot), k_y(\cdot, \cdot)) \tag{4}$$
$$\mu_y(\boldsymbol{\delta}_t) = k(\boldsymbol{\delta}_t, \boldsymbol{\delta}_{1:t-1}) K_{1:t-1}^{-1} \mathbf{y}_{1:t-1}, \tag{5}$$
$$k_y(\boldsymbol{\delta}_t, \boldsymbol{\delta}'_t) = k(\boldsymbol{\delta}_t, \boldsymbol{\delta}'_t) - k(\boldsymbol{\delta}_t, \boldsymbol{\delta}_{1:t-1}) K_{1:t-1}^{-1} k(\boldsymbol{\delta}_{1:t-1}, \boldsymbol{\delta}'_t), \tag{6}$$

where $K_{1:t-1} = K(\boldsymbol{\delta}_{1:t-1}, \boldsymbol{\delta}_{1:t-1})$ is the $(t-1) \times (t-1)$ matrix with pairwise covariances $k(\boldsymbol{\delta}_l, \boldsymbol{\delta}_m)$, $l, m \leq t-1$, as entries. The optimal GP hyperparameters $\theta^*$, such as the length scales and variance of the kernel function $k$, can be learnt by maximising the marginal likelihood $p(\mathcal{D}_{t-1}|\theta)$, which has the analytic form presented in Appendix F. Please refer to Rasmussen (2003) for an introduction to GPs.

Based on the predictive posterior distribution, we construct the acquisition function $\alpha(\boldsymbol{\delta}|\mathcal{D}_{t-1})$ to help select the next query point $\boldsymbol{\delta}_t$. The acquisition function can be considered a utility measure which balances exploration and exploitation by giving higher utility to input regions where the function value is high (high $\mu_y$) and where the model is very uncertain (high $k_y$). The approach of using BayesOpt with a GP surrogate to attack the target model is described in Algorithm 1.

Algorithm 1 BayesOpt Attack
1: Input: A black-box function $y$, observation data $\mathcal{D}_0$, iteration budget $T$, a decoder $g(\cdot)$
2: Output: The best recommended adversarial example $\mathbf{x}^* = \mathbf{x}_{\text{origin}} + g(\boldsymbol{\delta}^*)$
3: for $t = 1, \dots, T$ do
4:     Select $\boldsymbol{\delta}_t = \arg\max_{\boldsymbol{\delta}} \alpha_t(\boldsymbol{\delta}|\mathcal{D}_{t-1})$
5:     $y_t = y(\boldsymbol{\delta}_t)$ and $\mathcal{D}_t \leftarrow \mathcal{D}_{t-1} \cup (\boldsymbol{\delta}_t, y_t)$
6:     Update the surrogate model with $\mathcal{D}_t$
7: end for

4.3 ADDITIVE GP-BASED BAYESOPT ATTACK

Although techniques such as bilinear resizing are able to reduce the input dimension from the original image size (e.g.
$d = 3072$ for a CIFAR10 image) to a much lower dimension (e.g. $d_r = 192$), the reduced search space for the adversarial attack is still very high dimensional for GP-based BayesOpt (which is usually applied to problems with $d \leq 20$).

There are two challenges in using BayesOpt for high-dimensional problems. The first is the curse of dimensionality in modelling the objective function. When the unknown function is high-dimensional, estimating it with non-parametric regression becomes very difficult because it is impossible to densely fill the input space with a finite number of sample points, even if the sample size is very large (Györfi et al., 2006). The second challenge is the computational difficulty of optimising the acquisition function: the computational cost needed to optimise the acquisition function to within a desired accuracy grows exponentially with dimension (Kandasamy et al., 2015).

We adopt the additive-GP model (Duvenaud et al., 2011; Kandasamy et al., 2015) to deal with the above-mentioned challenges associated with searching for an adversarial perturbation in a high-dimensional space. The key assumption we make is that the objective can be decomposed into a sum of low-dimensional composite functions:

$$y(\boldsymbol{\delta}) = \sum_{j=1}^{M} y^{(j)}\big(\boldsymbol{\delta}^{(A_j)}\big) \tag{7}$$

where the $\boldsymbol{\delta}^{(A_j)}$ denote disjoint low-dimensional subspaces, $\boldsymbol{\delta} = \cup_{j=1}^{M} \boldsymbol{\delta}^{(A_j)}$ and $\boldsymbol{\delta}^{(A_j)} \cap \boldsymbol{\delta}^{(A_i)} = \emptyset$ for all $j \neq i$. If we impose a GP prior on each $y^{(j)}$, the prior for the overall objective $y$ is also a GP: $y \sim \mathcal{GP}(\mu(\boldsymbol{\delta}), k(\boldsymbol{\delta}, \boldsymbol{\delta}'))$ where $\mu(\boldsymbol{\delta}) = \sum_{j=1}^{M} \mu^{(j)}\big(\boldsymbol{\delta}^{(A_j)}\big)$ and $k(\boldsymbol{\delta}, \boldsymbol{\delta}') = \sum_{j=1}^{M} k^{(j)}\big(\boldsymbol{\delta}^{(A_j)}, \boldsymbol{\delta}'^{(A_j)}\big)$. The predictive posterior distribution for each subspace, $p(y^{(j)}_t \mid \boldsymbol{\delta}^{(A_j)}_t, \mathcal{D}_{t-1})$, is given by:

$$\mu^{(j)}_y\big(\boldsymbol{\delta}^{(A_j)}_t\big) = k^{(j)}\big(\boldsymbol{\delta}^{(A_j)}_t, \boldsymbol{\delta}^{(A_j)}_{1:t-1}\big) K_{1:t-1}^{-1} \mathbf{y}_{1:t-1},$$
$$k^{(j)}_y\big(\boldsymbol{\delta}^{(A_j)}_t, \boldsymbol{\delta}'^{(A_j)}_t\big) = k^{(j)}\big(\boldsymbol{\delta}^{(A_j)}_t, \boldsymbol{\delta}'^{(A_j)}_t\big) - k^{(j)}\big(\boldsymbol{\delta}^{(A_j)}_t, \boldsymbol{\delta}^{(A_j)}_{1:t-1}\big) K_{1:t-1}^{-1} k^{(j)}\big(\boldsymbol{\delta}^{(A_j)}_{1:t-1}, \boldsymbol{\delta}'^{(A_j)}_t\big).$$

In our case, the exact decomposition (i.e. which input dimension belongs to which low-dimensional subspace) is unknown, but we can learn it together with the other GP hyperparameters by maximising the marginal likelihood (Kandasamy et al., 2015). We demonstrate the effectiveness of our decomposition learning in Appendix E. Note that in this case the acquisition function is formulated based on $p(y^{(j)}_t \mid \boldsymbol{\delta}^{(A_j)}_t, \mathcal{D}_{t-1})$ for each subspace and is also optimised in the low-dimensional subspace, leading to a much more efficient optimisation task. The optimal perturbations in all the subspaces, $\{\boldsymbol{\delta}^{(A_j)}_{*}\}_{j=1}^{M}$, are then combined to give the next query point $\boldsymbol{\delta}_t$.

4.4 LEARNING THE OPTIMAL $d_r$

Generating a successful adversarial example $\mathbf{x} \in \mathbb{R}^d$ by searching for the perturbation in a reduced dimension $\boldsymbol{\delta} \in \mathbb{R}^{d_r}$ has become a popular practice that leads to significant improvements in query efficiency (Chen et al., 2017; Tu et al., 2018; Alzantot et al., 2018). However, what the optimal $d_r$ is and how to decide it efficiently have not been investigated in previous work (Chen et al., 2017; Tu et al., 2018; Alzantot et al., 2018). As we show empirically in Section 5.1, setting $d_r$ arbitrarily can lead to suboptimal attack performance in terms of query efficiency, attack success rate as well
as perturbation costs, while finding a good $d_r$ by trial and error or by progressively increasing it is computationally expensive and highly inefficient because the optimal $d_r$ varies with the attack input $\mathbf{x}_{\text{origin}}$. An effective way of learning $d_r$ is thus very important for adversarial attacks.

In this section, we propose a rigorous method, which is neatly compatible with our attack technique, to learn the optimal $d_r$ from the query information. The optimal $d_r$ should be the one that both takes into account our prior knowledge of the discrete $d_r$ choices and best explains the observed query data. Given that our BayesOpt attack uses a statistical surrogate (a GP in our case) to model the unknown relation between the attack objective score $y$ and the adversarial perturbation $\boldsymbol{\delta}$, this naturally translates to the criterion of maximising the posterior over $d_r$:

$$p(d_r^i \mid \mathcal{D}_{t-1}) = \frac{p(\mathcal{D}_{t-1} \mid d_r^i)\, p(d_r^i)}{p(\mathcal{D}_{t-1})} \tag{8}$$

where $p(\mathcal{D}_{t-1} \mid d_r^i)$ is the marginal likelihood (evidence) of adopting a specific $d_r^i$, $p(d_r^i)$ is our prior over the possible $d_r$ values, and $p(\mathcal{D}_{t-1}) = \sum_j p(\mathcal{D}_{t-1} \mid d_r^j)\, p(d_r^j)$ is a normalising constant. If we match different $d_r^j$ to different model choices, the problem of choosing $d_r$ becomes the classic Bayesian model selection task (Rasmussen, 2003). In most cases, our prior assumption is that we do not prefer one $d_r$ over another (i.e. a flat prior $p(d_r^j) = p(d_r^i)$ for $j \neq i$) and thus we can select $d_r$ by comparing the evidence (MacKay, 2003):

$$\frac{p(d_r^i \mid \mathcal{D}_{t-1})}{p(d_r^j \mid \mathcal{D}_{t-1})} = \frac{p(\mathcal{D}_{t-1} \mid d_r^i)\, p(d_r^i)}{p(\mathcal{D}_{t-1} \mid d_r^j)\, p(d_r^j)} = \frac{p(\mathcal{D}_{t-1} \mid d_r^i)}{p(\mathcal{D}_{t-1} \mid d_r^j)}. \tag{9}$$

The exact computation of the evidence term requires marginalisation over the model hyperparameters, which is intractable. We approximate the integral with a point estimate (i.e. the marginal likelihood of our GP model): $p(\mathcal{D}_{t-1} \mid d_r^j) = \int p(\mathcal{D}_{t-1} \mid \theta, d_r^j)\, p(\theta \mid d_r^j)\, d\theta \approx p(\mathcal{D}_{t-1} \mid \theta^*, d_r^j)$ where $\theta^* = \arg\max_\theta p(\mathcal{D}_{t-1} \mid \theta, d_r^j)$.

For query efficiency, we project the same perturbation query data in the original image dimension, $\mathcal{D}^d_{t-1} = \{g(\boldsymbol{\delta}_i), y_i\}_{i=1}^{t-1}$, to different latent spaces to obtain corresponding low-dimensional training data sets for separate GP models. The GP model that corresponds to $d_r^j$ is trained on $\mathcal{D}^{d_r^j}_{t-1} = \{g^{-1}(g(\boldsymbol{\delta}_i)), y_i\}_{i=1}^{t-1}$ where $g^{-1}(g(\boldsymbol{\delta}_i)) \in \mathbb{R}^{d_r^j}$. We then select the optimal $d_r$ by comparing the marginal likelihoods $p(\mathcal{D}^{d_r^j}_{t-1} \mid \theta^*, d_r^j)$ of the GP surrogates. The overall procedure for $d_r$ selection is described in Algorithm 2; a code sketch is given at the end of this section.

Algorithm 2 Bayesian selection of $d_r$
1: Input: A decoder $g(\cdot)$, observation data $\mathcal{D}^d_{t-1} = \{g(\boldsymbol{\delta}_i), y_i\}_{i=1}^{t-1}$ where $g(\boldsymbol{\delta}_i) \in \mathbb{R}^d$, and a set of possible reduced dimensions $\{d_r^j\}_{j=1}^{N}$
2: Output: The optimal reduced dimension $d_r^*$ and the corresponding GP model
3: for $j = 1, \dots, N$ do
4:     $\mathcal{D}^{d_r^j}_{t-1} = \{g^{-1}(g(\boldsymbol{\delta}_i)), y_i\}_{i=1}^{t-1}$ where $g^{-1}(g(\boldsymbol{\delta}_i)) \in \mathbb{R}^{d_r^j}$
5:     Fit a GP model to $\mathcal{D}^{d_r^j}_{t-1}$ and compute its maximum marginal likelihood $p(\mathcal{D}^{d_r^j}_{t-1} \mid \theta^*, d_r^j)$
6: end for
7: Select $d_r^* = \arg\max_{d_r^j} p(\mathcal{D}^{d_r^j}_{t-1} \mid \theta^*, d_r^j)$ and its corresponding GP model

We would like to highlight that the use of a statistical surrogate in our BayesOpt attack enables us to naturally apply Bayesian model selection to learn the optimal $d_r$, which automatically enjoys the trade-off between data-fit quality and model complexity. We also show empirically in Section 5.3 that by automating and incorporating the learning of $d_r$ into our BayesOpt attack, we gain higher success rates and query efficiency. Other adversarial attack methods could also use our proposed approach to decide $d_r$, but this would require the additional effort of constructing statistical models to provide $p(\mathcal{D}_{t-1} \mid d_r^j)$.
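The following is a minimal sketch of Algorithm 2, assuming a scikit-learn GP surrogate; `project`, which stands in for $g^{-1}$, uses bilinear resizing via `scipy.ndimage.zoom`, and the candidate side lengths are illustrative rather than the exact grids used in the experiments.

```python
# A minimal sketch of Algorithm 2 (Bayesian selection of d_r): project the stored
# queries to each candidate latent dimension, fit a GP, and keep the dimension
# whose fitted GP has the largest (type-II ML) marginal likelihood.
import numpy as np
from scipy.ndimage import zoom
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def project(delta_full, side):
    """Bilinearly resize an (H, W, C) perturbation down to (side, side, C) and flatten."""
    h, w, c = delta_full.shape
    return zoom(delta_full, (side / h, side / w, 1), order=1).ravel()

def select_dr(deltas_full, ys, candidate_sides=(6, 8, 10, 12, 14, 16)):
    """deltas_full: past perturbations decoded to image shape; ys: their objective values."""
    best_side, best_evidence, best_gp = None, -np.inf, None
    for side in candidate_sides:
        # Project every stored query to the candidate latent dimension d_r = side*side*C.
        X = np.stack([project(d, side) for d in deltas_full])
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(X, np.asarray(ys))
        # Point-estimate approximation of the evidence p(D | theta*, d_r).
        evidence = gp.log_marginal_likelihood_value_
        if evidence > best_evidence:
            best_side, best_evidence, best_gp = side, evidence, gp
    return best_side, best_gp
```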
5 EXPERIMENTS

We empirically compare the performance of our BayesOpt attacks against state-of-the-art black-box methods, namely ZOO (Chen et al., 2017), AutoZOOM (Tu et al., 2018) and GenAttack (Alzantot et al., 2018). We denote the BayesOpt attack with the standard GP surrogate as GP-BO, its variant that learns $d_r$ automatically as GP-BO-auto-dr, and the attack with the additive GP surrogate as ADDGP-BO. For all the BayesOpt methods, we use GP-UCB as the acquisition function[3] and update the GP hyperparameters every 5 BayesOpt iterations. We relearn the optimal $d_r$ for GP-BO-auto-dr and the search space decomposition for ADDGP-BO every 40 iterations. For ADDGP-BO, we decompose the search space into $M = 12$ subspaces of equal dimension for MNIST and CIFAR10 and into $M = 27$ subspaces for ImageNet.

[3] The parameter that trades off exploration and exploitation is set to 2.

The target models that we attack follow the same architectures as those used in AutoZOOM and GenAttack; these are image classifiers for MNIST (a CNN with 99.5% test accuracy) and CIFAR10 (a CNN with 80% test accuracy). Following the experimental design in (Tu et al., 2018), we randomly select 50 correctly classified images from the CIFAR10 and MNIST test data. We then perform targeted attacks on these images. Each selected image is attacked 9 times, targeting all but its true class, which gives a total of 450 attack instances for both CIFAR10 and MNIST. For ImageNet, we also select 50 correctly classified images from the test set but perform one random targeted attack per image. We set $\delta_{\max} = 0.3$ for attacking MNIST and $\delta_{\max} = 0.05$ for CIFAR10 and ImageNet, as used in (Alzantot et al., 2018). We use the recommended parameter settings and the open-sourced implementations of all competing attack methods (Chen et al., 2017; Tu et al., 2018; Alzantot et al., 2018).

Figure 1: Performance of the BayesOpt attack with a GP surrogate for different reduced dimensions $d_r$. Each colour represents a particular $d_r$. The left subplot shows the attack success rates (ASR) for attacking all the other 9 classes on each of the 5 original images; the $d_r$ that leads to the highest ASR varies with the original image. The middle subplot shows the number of queries needed for the same attack instances (i.e. the same original image and the same attack target) using different $d_r$. For the same 4 attack instances, the right subplot shows the effect of the choice of $d_r$ on the mean per-pixel L2 distances between the resulting adversarial examples and the original image.

5.1 EFFECT OF $d_r$

We first empirically investigate the effect of the reduced dimensionality $d_r$ of the latent space in which we search for the adversarial perturbation. We experiment with the GP-based BayesOpt attack on the CIFAR10 classifier using reduced dimensions $d_r$ of {6×6×3, 8×8×3, 10×10×3, 12×12×3, 14×14×3, 16×16×3, 18×18×3} and perform targeted attacks on 5 target images, with each image being attacked 9 times, leading to 45 attack instances. We first investigate the attack success rate (ASR) achieved at different $d_r$ for all 5 images. The results on ASR out of the 9 targeted attacks for each of the 5 images, indicated by Image ID 1 to 5, are shown in the left subplot of Figure 1.
It is evident that the $d_r$ which leads to the highest attack success rate varies for different original images $\mathbf{x}_{\text{origin}}$. We then examine how $d_r$ affects the query efficiency and attack quality (average L2 distance of the adversarial image from the original image) for the attack instances (e.g. making the classifier misclassify an airplane image as a cat) that the GP-based BayesOpt attack succeeds on at all $d_r$. We present the results on 4 attack instances of an airplane image in the middle and right subplots of Figure 1. We can see that, even for the same original image and attack instance, varying $d_r$ can impact the query efficiency and the L2 norm of the successful adversarial perturbation significantly. For example, $d_r$ = 8×8×3 is the most query efficient for attack instances 1 and 4 but is outperformed by other dimensions on attack instances 2 and 3. Therefore, the importance of $d_r$ and the difficulty of finding the optimal $d_r$ for a specific target/image motivate us to derive our method for learning it automatically from the data. As shown in the following sections, our $d_r$ learner does lead to superior attack performance.

5.2 PERFORMANCE UNDER A FIXED QUERY BUDGET

In this experiment, we limit the total query number to 1000 for MNIST and CIFAR10, which is slightly above the median query count needed for GenAttack to make a successful attack on these datasets (Alzantot et al., 2018), and to 2000 for ImageNet. We adopt the reduced search dimensions recommended by AutoZOOM, $d_r$ = 14×14×3 for CIFAR10 and $d_r$ = 14×14×1 for MNIST, and allow GP-BO-auto-dr to automatically learn the reduced dimension $d_r$ in a range between 6×6×1 and 28×28×1 for MNIST, and between 6×6×3 and 32×32×3 for CIFAR10. For ImageNet, due to the high dimensionality of its images, we adopt a hierarchical decoding process: 1) we first perform the BayesOpt attacks (ADDGP-BO and GP-BO) in a reduced dimension of $d_r^1$ = 48×48×3; 2) we then decode the adversarial perturbation found in $d_r^1$ to $d_r^2$ = 96×96×3 via bilinear upsampling; 3) this is followed by another bilinear decoder projecting the adversarial perturbation in $d_r^2$ back to the image dimension of $d$ = 299×299×3. For GP-BO-auto-dr, we allow the algorithm to automatically learn the reduced dimension $d_r^1$ in phase 1) in the range between 6×6×3 and 60×60×3 (a sketch of this hierarchical decoding is given below).
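As a concrete illustration of this hierarchical decoding, the sketch below upsamples a perturbation found in the reduced 48×48×3 space to the 299×299×3 ImageNet input space, assuming `scipy.ndimage.zoom` with `order=1` as the bilinear resizing operator; the final clipping is only a safeguard, since bilinear interpolation of values already inside $[-\delta_{\max}, \delta_{\max}]$ stays inside that box.

```python
# A minimal sketch of the hierarchical bilinear decoding used for ImageNet
# (48x48x3 -> 96x96x3 -> 299x299x3); sizes follow the scheme described above.
import numpy as np
from scipy.ndimage import zoom

def bilinear_resize(delta, out_h, out_w):
    """Bilinearly resize an (H, W, C) perturbation to (out_h, out_w, C)."""
    h, w, _ = delta.shape
    return zoom(delta, (out_h / h, out_w / w, 1), order=1)

def hierarchical_decode(delta_small, delta_max=0.05):
    """Decode a 48x48x3 perturbation up to the 299x299x3 ImageNet input space."""
    delta_mid = bilinear_resize(delta_small, 96, 96)     # phase 2: 48x48x3 -> 96x96x3
    delta_full = bilinear_resize(delta_mid, 299, 299)    # phase 3: 96x96x3 -> 299x299x3
    # Safeguard: keep the decoded perturbation inside the L-infinity ball of radius delta_max.
    return np.clip(delta_full, -delta_max, delta_max)

# Example: decode a random perturbation from the reduced search space.
delta_r = np.random.default_rng(0).uniform(-0.05, 0.05, size=(48, 48, 3))
print(hierarchical_decode(delta_r).shape)  # (299, 299, 3)
```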
GenAttack is performed with the same upsampling as our methods. For AutoZOOM and ZOO, we use the optimal $d_r$ settings recommended in (Tu et al., 2018; Chen et al., 2017) for ImageNet. For the BayesOpt attacks, each iteration requires 1 query to the objective function, so we limit the number of iterations to 1000 and terminate the BayesOpt attack early once a successful adversarial example is found. AutoZOOM comprises 2 stages in its attack algorithm. The first, exploration stage aims to find a successful adversarial example. Once a successful attack is found, it switches to the fine-tuning stage to reduce the perturbation cost (e.g. L2 norm) of the successful attack. We report its attack success rate and the L2 norm of its adversarial perturbations after a budget of 1000 queries, which allows it to fine-tune the successful adversarial perturbations found. Moreover, AutoZOOM uses the L2 norm to measure perturbation costs, whereas our method and GenAttack limit the search space via the L∞ norm. We observe that the successful attacks found by AutoZOOM incur much higher perturbation costs in terms of the L∞ norm than GenAttack and our method. Therefore, we only count as successful attacks those final adversarial examples whose perturbations lie within $[-\delta_{\max}, \delta_{\max}]$, i.e. whose L∞ distances from the original images are at most $\delta_{\max}$[4].

[4] This still gives an advantage to ZOO and AutoZOOM because it allows them to inject perturbations whose magnitude is larger than $\delta_{\max}$ at certain pixels during the search, whereas our BayesOpt methods and GenAttack limit the perturbation at every pixel to be less than $\delta_{\max}$.

Table 1: Attack results on 50 random MNIST images using $d_r$ = 14×14×1. Q denotes the query count. ASR denotes attack success rate. Standard errors are in parentheses.

| Attack method  | ASR | Q (Max, Median, Mean) | Average L2 perturbation (per pixel) |
| -------------- | --- | --------------------- | ----------------------------------- |
| ADDGP-BO       | 97% | 829, 46, 95 (±6)      | 6.81×10^-3 (±3.12×10^-5)            |
| GP-BO-auto-dr  | 98% | 833, 75, 121 (±6)     | 6.89×10^-3 (±5.55×10^-5)            |
| GP-BO          | 82% | 899, 42, 77 (±5)      | 7.15×10^-3 (±4.17×10^-5)            |
| GenAttack      | 92% | 986, 146, 217 (±9)    | 6.66×10^-3 (±2.81×10^-5)            |
| AutoZOOM       | 99% | 998, 229, 265 (±9)    | 5.05×10^-3 (±7.71×10^-5)            |
| ZOO            | 1%  | 256, 128, 128 (±64)   | 2.78×10^-3 (±3.53×10^-5)            |

The results on MNIST in Table 1 show that all our BayesOpt attacks achieve a comparable success rate at a much lower query count in comparison to GenAttack (Alzantot et al., 2018) and AutoZOOM (Tu et al., 2018). Specifically, the median query count of ADDGP-BO is 68% less than GenAttack and 80% less than AutoZOOM, while that of GP-BO-auto-dr is 49% less than GenAttack and 67% less than AutoZOOM.

Table 2: Attack results on 50 randomly selected CIFAR10 images with $d_r$ = 14×14×3. Q denotes the query count. ASR denotes attack success rate. Standard errors are in parentheses.

| Attack method  | ASR | Q (Max, Median, Mean) | Average L2 perturbation (per pixel) |
| -------------- | --- | --------------------- | ----------------------------------- |
| ADDGP-BO       | 87% | 885, 154, 222 (±10)   | 5.87×10^-4 (±3.22×10^-6)            |
| GP-BO-auto-dr  | 87% | 891, 159, 234 (±10)   | 5.96×10^-4 (±5.00×10^-6)            |
| GP-BO          | 72% | 899, 141, 190 (±8)    | 5.78×10^-4 (±4.67×10^-6)            |
| GenAttack      | 68% | 991, 246, 329 (±14)   | 7.38×10^-4 (±4.14×10^-6)            |
| AutoZOOM       | 38% | 896, 102, 154 (±11)   | 9.97×10^-4 (±1.53×10^-5)            |
| ZOO            | 4%  | 768, 256, 369 (±57)   | 1.77×10^-4 (±1.16×10^-5)            |

As for the results on attacking CIFAR10, Table 2 shows that all our BayesOpt attacks achieve significantly higher success rates, again at remarkably lower query counts than the existing black-box approaches. For example, our ADDGP-BO achieves an 18% higher success rate while using 37.4% fewer queries in terms of the median as compared to GenAttack. In addition, our approaches also lead to better-quality adversarial examples (Figure 3 in Appendix B) which are closer to the original image than those of the benchmark methods, as reflected by the lower average L2 perturbation (20.5% less). More importantly, this set of experiments also demonstrates the effectiveness of our Bayesian method for learning $d_r$, as GP-BO-auto-dr leads to a 15% increase in attack success rate compared to GP-BO while remaining competitive in query efficiency and L2 distance.

Table 3: Attack results on 50 random ImageNet images with random target labels under a query budget of 2000. Q denotes the query count. ASR denotes attack success rate. Standard errors are in parentheses. ZOO fails to make any successful attack within this budget.

| Attack method  | ASR | Q (Max, Median, Mean)   | Average L2 perturbation (per pixel) |
| -------------- | --- | ----------------------- | ----------------------------------- |
| ADDGP-BO       | 60% | 1985, 1247, 1206 (±90)  | 1.74×10^-4 (±1.89×10^-6)            |
| GP-BO-auto-dr  | 32% | 1999, 922, 938 (±128)   | 1.96×10^-4 (±8.07×10^-6)            |
| GP-BO          | 16% | 1947, 1113, 1232 (±162) | 1.62×10^-4 (±7.94×10^-6)            |
| GenAttack      | 12% | 1971, 1354, 1266 (±259) | 2.03×10^-4 (±9.81×10^-6)            |
| AutoZOOM       | 2%  | 1451, 1451, 1451 (±0.00)| 1.21×10^-5 (±0.00)                  |

Similarly, the results on ImageNet in Table 3 show that all our BayesOpt attacks achieve a higher attack success rate than the competing methods given a query budget of 2000.
ADDGP-BO is clearly the best performing method in this very high-dimensional setting, achieving a 5 times higher success rate than the best competing method, GenAttack, while again obtaining successful adversarial perturbations with a lower average L2 perturbation (14% less) than GenAttack. This confirms our hypothesis that the additive-GP surrogate together with decomposition learning is very effective for high-dimensional optimisation.

5.3 QUERY EFFICIENCY COMPARISON

We finish by comparing the query efficiency, measuring the change in attack success rate over query counts for all the methods. We limit the query budget of our BayesOpt attacks to 1000 but let the competing methods continue running until they achieve the same best attack success rate as our best BayesOpt attacks or exceed a much higher limit (2000 queries for MNIST, 5000 queries for CIFAR10 and 14000 queries for ImageNet).

As shown in Figure 2, the BayesOpt attacks converge much faster to high attack success rates than the other methods. Specifically, GP-BO-auto-dr takes fewer than 833 queries to achieve a success rate of 98% on MNIST, which is 24% of that required by GenAttack (3500). As for CIFAR10, both ADDGP-BO and GP-BO-auto-dr take around 890 queries to achieve a success rate of 87%, which is 23% of the query count of AutoZOOM and 17% of that of GenAttack. One point to note is that AutoZOOM appears to be slightly more query efficient than GenAttack in this set of experiments. However, we need to bear in mind that, although we limit the mean L∞ norm of the adversarial perturbations found by AutoZOOM to $\delta_{\max}$, AutoZOOM still has the advantage of exploring beyond $\delta_{\max}$ in many dimensions. In the case of ImageNet, all our BayesOpt attacks again enjoy faster convergence, with ADDGP-BO taking only 1985 queries to achieve 60% ASR. Meanwhile, GenAttack takes a budget of 4711 queries, 2.4 times that of ADDGP-BO, to achieve the same ASR, and AutoZOOM costs 19451 queries, which is 9.8 times that of ADDGP-BO.

Figure 2: Query efficiency of the BayesOpt attacks on (a) MNIST, (b) CIFAR10 and (c) ImageNet. The plots show the attack success rate (ASR) of the different methods up to a given query count. The best BayesOpt attacks, GP-BO-auto-dr (red) and ADDGP-BO (blue), achieve an ASR of 98% on MNIST (a) and 87% on CIFAR10 (b) within a budget of 1000 queries. To achieve the same success rates, GenAttack (purple) takes 3500 queries for MNIST and 5141 for CIFAR10, and AutoZOOM (green) takes 800 for MNIST and 3880 for CIFAR10. For ImageNet (c), the best BayesOpt attack is ADDGP-BO (blue), which achieves an ASR of 60% with 1985 queries. To achieve the same success rate, GenAttack (purple) takes a budget of 4711 queries and AutoZOOM (green) takes 19451 queries. ZOO fails to make any successful attack on ImageNet within the given budget.
6 CONCLUSION

We introduce a new black-box adversarial attack which leverages Bayesian optimisation to find successful adversarial perturbations with high query efficiency. We also improve our attack by adopting an additive surrogate structure to ease the optimisation challenge of the typically high-dimensional search task. Moreover, we take full advantage of our statistical surrogate model and the available query data to learn the optimal degree of dimension reduction for the search space via Bayesian model selection. In comparison to several existing black-box attack methods, our BayesOpt attacks achieve high success rates with 2-5 times fewer queries while still producing adversarial examples that are closer to the original image (in terms of average L2 distance). We believe our BayesOpt attacks can be a competitive alternative for assessing model robustness. One limitation of our attack methods is that, due to the poor scalability of GPs, the attack algorithms are computationally more expensive than most existing alternatives, especially as the query number increases. Thus, our BayesOpt attacks are best suited to the setting where the cost of evaluating the target model, be it the monetary cost, the computational cost or the risk of being detected, is much higher than the computational cost of the attack algorithm itself, so that query efficiency is highly prioritised. On the other hand, replacing the GP surrogates with more scalable models such as sparse GPs or Bayesian neural networks is an interesting direction for future research.

REFERENCES

Moustafa Alzantot, Yash Sharma, Supriyo Chakraborty, and Mani Srivastava. GenAttack: Practical black-box attacks with gradient-free optimization. arXiv preprint arXiv:1805.11090, 2018.

Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.

Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316, 2016.

Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248, 2017.

Eric Brochu, Vlad M. Cora, and Nando De Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599, 2010.

Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39-57. IEEE, 2017.

Nicholas Carlini, Pratyush Mishra, Tavish Vaidya, Yuankai Zhang, Micah Sherr, Clay Shields, David Wagner, and Wenchao Zhou. Hidden voice commands. In 25th USENIX Security Symposium (USENIX Security 16), pp. 513-530, 2016.

Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15-26. ACM, 2017.

Pin-Yu Chen, Yash Sharma, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh. EAD: Elastic-net attacks to deep neural networks via adversarial examples.
In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

David K. Duvenaud, Hannes Nickisch, and Carl E. Rasmussen. Additive Gaussian processes. In Advances in Neural Information Processing Systems, pp. 226-234, 2011.

Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

Shixiang Gu and Luca Rigazio. Towards deep neural network architectures robust to adversarial examples. arXiv preprint arXiv:1412.5068, 2014.

Chuan Guo, Mayank Rana, Moustapha Cisse, and Laurens Van Der Maaten. Countering adversarial images using input transformations. arXiv preprint arXiv:1711.00117, 2017.

László Györfi, Michael Kohler, Adam Krzyzak, and Harro Walk. A distribution-free theory of nonparametric regression. Springer Science & Business Media, 2006.

JB Heaton, NG Polson, and Jan Hendrik Witte. Deep learning for finance: deep portfolios. Applied Stochastic Models in Business and Industry, 33(1):3-12, 2017.

Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. arXiv preprint arXiv:1804.08598, 2018.

Kirthevasan Kandasamy, Jeff Schneider, and Barnabás Póczos. High dimensional Bayesian optimisation and bandits via additive models. In International Conference on Machine Learning, pp. 295-304, 2015.

Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.

Yi Liu, Jie Ling, Zhusong Liu, Jian Shen, and Chongzhi Gao. Finger vein secure biometric template generation based on deep learning. Soft Computing, 22(7):2257-2265, 2018.

David J.C. MacKay. Information theory, inference and learning algorithms. Cambridge University Press, 2003.

Seungyong Moon, Gaon An, and Hyun Oh Song. Parsimonious black-box adversarial attacks via efficient combinatorial optimization. arXiv preprint arXiv:1905.06635, 2019.

Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574-2582, 2016.

Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506-519. ACM, 2017.

Carl Edward Rasmussen and Christopher K.I. Williams. Gaussian processes for machine learning. MIT Press, 2006.

Carl Edward Rasmussen. Gaussian processes in machine learning. In Summer School on Machine Learning, pp. 63-71. Springer, 2003.

Dong Su, Huan Zhang, Hongge Chen, Jinfeng Yi, Pin-Yu Chen, and Yupeng Gao. Is robustness the cost of accuracy? A comprehensive study on the robustness of 18 deep image classification models. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 631-648, 2018.

Fnu Suya, Yuan Tian, David Evans, and Paolo Papotti. Query-limited black-box attacks to classifiers. NIPS Workshop, 2017.

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

Chun-Chen Tu, Paishun Ting, Pin-Yu Chen, Sijia Liu, Huan Zhang, Jinfeng Yi, Cho-Jui Hsieh, and Shin-Ming Cheng. AutoZOOM: Autoencoder-based zeroth order optimization method for attacking black-box neural networks.
arXiv preprint arXiv:1805.11770, 2018.

David Warde-Farley and Ian Goodfellow. Adversarial perturbations of deep neural networks. Perturbations, Optimization, and Statistics, 311, 2016.

Pu Zhao, Sijia Liu, Pin-Yu Chen, Nghia Hoang, Kaidi Xu, Bhavya Kailkhura, and Xue Lin. On the design of black-box adversarial examples by leveraging gradient-free optimization and operator splitting method. In Proceedings of the IEEE International Conference on Computer Vision, pp. 121-130, 2019.

A BAYESIAN OPTIMISATION

Here we present the generic algorithm for Bayesian optimisation.

Algorithm 3 BayesOpt Algorithm
1: Input: A black-box function $y$, observation data $\mathcal{D}_0$, iteration budget $T$
2: Output: The best recommendation $\boldsymbol{\delta}^*$
3: for $t = 1, \dots, T$ do
4:     Select $\boldsymbol{\delta}_t = \arg\max_{\boldsymbol{\delta}} \alpha_t(\boldsymbol{\delta}|\mathcal{D}_{t-1})$
5:     $y_t = y(\boldsymbol{\delta}_t)$ and $\mathcal{D}_t \leftarrow \mathcal{D}_{t-1} \cup (\boldsymbol{\delta}_t, y_t)$
6:     Update the surrogate model with $\mathcal{D}_t$
7: end for

B ADVERSARIAL EXAMPLES FOR CIFAR10

Figure 3: CIFAR10 adversarial examples generated by our BayesOpt attack. True labels and target labels correspond to the rows and columns respectively.

C ADDITIONAL EXPERIMENTS ON MNIST AND CIFAR10

Tables 4 and 5 summarise additional attack results on MNIST and CIFAR10 for several fixed choices of $d_r$.

Table 4: Summary of attack results on 7 MNIST images. Q denotes the query count. ASR denotes attack success rate. Standard errors are in parentheses.

| Attack method | $d_r$   | ASR  | Q (Max, Median, Mean) | Average L2 perturbation (per pixel) |
| ------------- | ------- | ---- | --------------------- | ----------------------------------- |
| GP-BO         | 14×14×1 | 92%  | 899, 53, 119 (±28)    | 7.01×10^-3 (±1.13×10^-4)            |
| ADDGP-BO      | 14×14×1 | 98%  | 584, 34, 67 (±13)     | 6.50×10^-3 (±8.39×10^-5)            |
| AutoZOOM      | 14×14×1 | 97%  | 892, 186, 247 (±27)   | 5.10×10^-3 (±1.92×10^-4)            |
| GenAttack     | 14×14×1 | 97%  | 976, 184, 239 (±22)   | 5.56×10^-3 (±9.25×10^-5)            |
| GP-BO         | 16×16×1 | 97%  | 400, 50, 80 (±12)     | 7.08×10^-3 (±1.09×10^-4)            |
| ADDGP-BO      | 16×16×1 | 100% | 540, 42, 72 (±12)     | 6.49×10^-3 (±9.41×10^-5)            |
| AutoZOOM      | 16×16×1 | 98%  | 812, 251, 295 (±28)   | 5.98×10^-3 (±2.03×10^-4)            |
| GenAttack     | 16×16×1 | 97%  | 928, 202, 237 (±19)   | 5.61×10^-3 (±8.58×10^-5)            |
| GP-BO         | 28×28×1 | 98%  | 430, 201, 225 (±11)   | 8.71×10^-3 (±1.94×10^-4)            |
| ADDGP-BO      | 28×28×1 | 100% | 666, 57, 100 (±15)    | 9.71×10^-3 (±1.12×10^-4)            |
| AutoZOOM      | 28×28×1 | 98%  | 860, 185, 244 (±27)   | 1.01×10^-2 (±1.63×10^-4)            |
| GenAttack     | 28×28×1 | 83%  | 976, 365, 399 (±33)   | 6.26×10^-3 (±6.69×10^-5)            |

Table 5: Summary of attack results on 27 CIFAR10 images. Q denotes the query count. ASR denotes attack success rate. Standard errors are in parentheses.

| Attack method | $d_r$   | ASR | Q (Max, Median, Mean) | Average L2 perturbation (per pixel) |
| ------------- | ------- | --- | --------------------- | ----------------------------------- |
| GP-BO         | 8×8×3   | 52% | 314, 48, 70 (±13)     | 5.78×10^-4 (±1.44×10^-5)            |
| ADDGP-BO      | 8×8×3   | 75% | 899, 140, 234 (±33)   | 5.55×10^-4 (±8.46×10^-6)            |
| AutoZOOM      | 8×8×3   | 56% | 382, 126, 139 (±17)   | 9.58×10^-4 (±2.66×10^-5)            |
| GenAttack     | 8×8×3   | 67% | 671, 236, 286 (±32)   | 7.40×10^-4 (±1.30×10^-5)            |
| GP-BO         | 14×14×3 | 81% | 899, 142, 192 (±11)   | 5.74×10^-4 (±6.96×10^-6)            |
| ADDGP-BO      | 14×14×3 | 90% | 885, 141, 213 (±15)   | 5.79×10^-4 (±5.16×10^-6)            |
| AutoZOOM      | 14×14×3 | 42% | 376, 95, 133 (±12)    | 9.82×10^-4 (±1.95×10^-5)            |
| GenAttack     | 14×14×3 | 75% | 971, 251, 321 (±21)   | 7.35×10^-4 (±6.92×10^-6)            |
| GP-BO         | 16×16×3 | 83% | 785, 280, 292 (±22)   | 5.34×10^-4 (±1.43×10^-5)            |
| ADDGP-BO      | 16×16×3 | 87% | 878, 168, 229 (±29)   | 5.76×10^-4 (±8.93×10^-6)            |
| AutoZOOM      | 16×16×3 | 3%  | 298, 170, 170 (±90)   | 7.50×10^-4 (±1.35×10^-5)            |
| GenAttack     | 16×16×3 | 79% | 986, 281, 339 (±36)   | 7.36×10^-4 (±1.55×10^-5)            |

D OBJECTIVE VALUE OVER BAYESOPT ITERATIONS

We illustrate the convergence of our methods via the case of attacking a CIFAR10 image of label 9 (class truck) on the other 9 target labels. In Figure 4 we plot the value of the objective function (Equation 2), which is equal to the negative of the attack loss, against the BayesOpt iterations/query counts. We can see that our additive GP surrogate (ADDGP-BO) as well as the Bayesian learning of the optimal $d_r$ (GP-BO-auto-dr) lead to faster convergence and thus a higher attack success rate for this instance.

E EFFECTIVENESS OF DECOMPOSITION LEARNING

To learn the decomposition for the additive-GP surrogate in the ADDGP-BO attack, we follow the approach proposed in (Kandasamy et al., 2015) to treat the decomposition as an additional hyperparameter and learn the optimal decomposition by maximising the marginal likelihood. However, exhaustive search over all $d!/\big((d_s!)^M M!\big)$ possible decompositions[5] is expensive. We adopt a computationally cheap alternative by randomly selecting 20 decompositions and choosing the one with
the largest marginal likelihood (see the sketch at the end of this appendix). The decomposition learning procedure is repeated every 40 BO iterations and we use $M = 12$ for CIFAR10. We denote this method as ADDGP-BO-LD in this section but as ADDGP-BO in the rest of the paper.

[5] $M$ is the number of subspaces and $d_s = |A_j|$ is the dimension of each subspace.

We also experiment with an alternative way to learn the decomposition (ADDGP-BO-FD), which is similar to the importance sampling in ZOO: we group pixels/dimensions together if the magnitudes of the average changes in their pixel values over the past 5 BO iterations are close (i.e. the pixels that are subject to the largest adversarial perturbations are grouped together). Again, we divide the dimensions into 12 disjoint groups as above to ensure a fair comparison.

We compare the two types of decomposition learning method using 20 randomly selected CIFAR10 images, each of which is attacked on the 9 classes other than its original class, with a query budget of 1000. The results are shown in Table 6. It is evident that learning the decomposition by maximising the marginal likelihood (ADDGP-BO-LD) achieves a higher attack success rate than the pixel-value-change-based decomposition learning (ADDGP-BO-FD) given the query budget.

Table 6: Summary of attack results on CIFAR10 for the different decomposition learning methods; $d_r$ = 14×14×3, further decomposed into 12 subspaces of $d_s = 49$. Q denotes the query count. ASR denotes attack success rate. Standard errors are in parentheses.

| Attack method | ASR | Q (Max, Median, Mean) | Average L2 perturbation (per pixel) |
| ------------- | --- | --------------------- | ----------------------------------- |
| ADDGP-BO-LD   | 94% | 885, 134, 212 (±16)   | 5.78×10^-4 (±5.32×10^-6)            |
| ADDGP-BO-FD   | 87% | 899, 143, 196 (±14)   | 5.05×10^-4 (±4.52×10^-6)            |

Figure 4: Attack objective value against BayesOpt iterations (query count) when using the various BayesOpt methods, (a) GP-BO, (b) GP-BO-auto-dr and (c) ADDGP-BO, to attack one CIFAR10 image of class label 9 (denoted by i). Curves of different colours correspond to the 9 different target labels (denoted by t). Convergence to an objective value of 0 indicates a successful attack. ADDGP-BO and GP-BO-auto-dr enjoy faster convergence and thus higher attack success rates than simple GP-BO.
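A minimal sketch of this random decomposition search is given below, assuming a zero-mean GP whose kernel is a sum of Matern-5/2 kernels over the candidate subspaces; the kernel hyperparameters and noise level are fixed for brevity, whereas in the attack they are optimised jointly with the decomposition.

```python
# A sketch of the random decomposition search: sample decompositions into M
# equal-sized groups and keep the one with the largest additive-GP marginal
# likelihood (hyperparameters fixed here for brevity).
import numpy as np

def matern52(X1, X2, lengthscale=1.0, variance=1.0):
    """Matern-5/2 kernel matrix between two sets of points."""
    d = np.sqrt(((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)) / lengthscale
    return variance * (1 + np.sqrt(5) * d + 5 * d**2 / 3) * np.exp(-np.sqrt(5) * d)

def additive_log_marginal_likelihood(X, y, groups, noise=1e-3):
    """Log marginal likelihood of a zero-mean GP whose kernel is a sum over subspaces."""
    K = sum(matern52(X[:, g], X[:, g]) for g in groups) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)

def learn_decomposition(X, y, n_groups=12, n_trials=20, rng=np.random.default_rng(0)):
    """Randomly sample decompositions into disjoint groups; keep the best-scoring one."""
    d = X.shape[1]
    best_groups, best_lml = None, -np.inf
    for _ in range(n_trials):
        perm = rng.permutation(d)
        groups = np.array_split(perm, n_groups)        # disjoint subspaces A_1, ..., A_M
        lml = additive_log_marginal_likelihood(X, y, groups)
        if lml > best_lml:
            best_groups, best_lml = groups, lml
    return best_groups
```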
F GAUSSIAN PROCESSES

Gaussian processes (GPs) are popular models for inference in regression problems, especially when a quantification of the uncertainty around predictions is of relevance and little prior information is available for the inference problem. These qualities, together with their analytical tractability, make them the most commonly used surrogate model in Bayesian optimisation. A comprehensive overview of GPs can be found in Rasmussen & Williams (2006). Below we highlight the concept of the marginal likelihood and how to use it for hyperparameter optimisation and model selection, based on the notation introduced in Section 4.2.

Hyperparameter tuning. The key to finding the right hyperparameters $\theta$ in light of $\mathcal{D}_{t-1}$ in a principled way is the marginal likelihood in Equation (10):

$$p(\mathcal{D}_{t-1}|\theta) = (2\pi)^{-\frac{t-1}{2}}\, |K_{1:t-1}(\theta)|^{-\frac{1}{2}} \exp\Big(-\tfrac{1}{2}\, \mathbf{y}_{1:t-1}^{T} K_{1:t-1}^{-1}(\theta)\, \mathbf{y}_{1:t-1}\Big) \tag{10}$$

where we have introduced the slightly augmented notation $K_{1:t-1} \equiv K_{1:t-1}(\theta)$ to highlight the dependence on the kernel hyperparameters $\theta$. In a truly Bayesian approach, one could shy away from fixing one set of hyperparameters and use Equation (10) to derive a posterior distribution over $\theta$ based on one's prior beliefs about $\theta$, expressed through some distribution $p_0(\theta)$. However, as such an approach generally requires sampling with methods like Markov chain Monte Carlo (MCMC) due to the intractability of the integrals, in practice it is often easier and computationally cheaper to replace the fully Bayesian approach by a maximum likelihood approach and simply maximise the likelihood $p(\mathcal{D}_{t-1}|\theta)$ w.r.t. $\theta$. We follow this approach and perform a mix of grid- and gradient-based search to maximise the logarithm of the r.h.s. of Equation (10). After evaluating a grid of 5,000 points, we start gradient ascent from each of the 5 most promising points, using the following expression for the gradient (Rasmussen & Williams, 2006) to find the hyperparameters $\theta^*$ which maximise the marginal likelihood:

$$\frac{\partial}{\partial \theta_j} \log p(\mathcal{D}_{t-1}|\theta) = \frac{1}{2}\, \mathbf{y}_{1:t-1}^{T} K_{1:t-1}^{-1} \frac{\partial K_{1:t-1}}{\partial \theta_j} K_{1:t-1}^{-1} \mathbf{y}_{1:t-1} - \frac{1}{2}\, \mathrm{tr}\Big(K_{1:t-1}^{-1} \frac{\partial K_{1:t-1}}{\partial \theta_j}\Big) = \frac{1}{2}\, \mathrm{tr}\Big(\big(K_{1:t-1}^{-1} \mathbf{y}_{1:t-1} \mathbf{y}_{1:t-1}^{T} K_{1:t-1}^{-1} - K_{1:t-1}^{-1}\big) \frac{\partial K_{1:t-1}}{\partial \theta_j}\Big).$$

Model selection. Once the optimal hyperparameters $\theta^*$ are found as described above, they can be plugged back into Equation (10) to choose amongst different models. In Section 4.4 this is described for the case of choosing between different values of the reduced dimensionality $d_r$.
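The sketch below illustrates Equation (10) and the gradient identity above in numpy, using a squared-exponential kernel instead of the Matern-5/2 kernel for the sake of a simple $\partial K / \partial \theta_j$; the data and hyperparameter values are purely illustrative, and the finite-difference check at the end is only a sanity test of the formula.

```python
# A small numpy sketch of the log marginal likelihood and its gradient w.r.t. the
# lengthscale, using a squared-exponential kernel for simplicity.
import numpy as np

def sqexp_kernel(X, lengthscale, variance):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def log_marginal_likelihood(K, y):
    # log p(D | theta) = -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)

def lml_gradient_lengthscale(X, y, lengthscale, variance, jitter=1e-6):
    # d log p / d lengthscale via the trace identity, with dK/dl for the squared-exponential.
    K = sqexp_kernel(X, lengthscale, variance) + jitter * np.eye(len(X))
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    dK = K * sq / lengthscale**3
    Kinv = np.linalg.inv(K)
    alpha = Kinv @ y
    return 0.5 * np.trace((np.outer(alpha, alpha) - Kinv) @ dK)

# Illustrative check against a finite-difference estimate of the gradient.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 4))
y = np.sin(X.sum(axis=1)) + 0.05 * rng.standard_normal(30)
l, v, eps = 0.8, 1.0, 1e-5
analytic = lml_gradient_lengthscale(X, y, l, v)
numeric = (log_marginal_likelihood(sqexp_kernel(X, l + eps, v) + 1e-6 * np.eye(30), y)
           - log_marginal_likelihood(sqexp_kernel(X, l - eps, v) + 1e-6 * np.eye(30), y)) / (2 * eps)
print(analytic, numeric)  # the two values should agree closely
```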