# Smoothed Embeddings for Certified Few-Shot Learning

Mikhail Pautov (Skolkovo Institute of Science and Technology), Olesya Kuznetsova (Skolkovo Institute of Science and Technology), Nurislam Tursynbek (The University of North Carolina at Chapel Hill), Aleksandr Petiushko (Moscow State University; Nuro, Inc.), Ivan Oseledets (Skolkovo Institute of Science and Technology, AIRI)

Abstract. Randomized smoothing is considered to be the state-of-the-art provable defense against adversarial perturbations. However, it heavily exploits the fact that classifiers map input objects to class probabilities and does not focus on the ones that learn a metric space in which classification is performed by computing distances to embeddings of class prototypes. In this work, we extend randomized smoothing to few-shot learning models that map inputs to normalized embeddings. We analyze the Lipschitz continuity of such models and derive a robustness certificate against $\ell_2$-bounded perturbations that may be useful in few-shot learning scenarios. Our theoretical results are confirmed by experiments on different datasets.

1 Introduction

Few-shot learning is a setting in which a classification model is evaluated on classes not seen during the training phase. Nowadays, quite a few few-shot learning approaches based on neural networks are known. Unfortunately, neural network-based classifiers are inherently vulnerable to adversarial perturbations [41, 10]: accurately crafted small modifications of the input that may significantly alter the model's prediction. This vulnerability poses a restriction on the deployment of such approaches in safety-critical scenarios, so research interest in attacks on neural networks has been great in recent years. Several works studied this phenomenon in different practical applications: image classification [2, 34, 33, 40], object detection [17, 26, 48, 50], face recognition [18, 5, 54], and semantic segmentation [7, 12, 50]. These studies show that it is easy to force a model to behave in a desired way by applying an imperceptible change to its input. As a result, defenses, both empirical [4, 55, 14] and provable [51, 21, 3, 47, 52, 15, 46, 35], were proposed recently. Although empirical defenses can be (and often are) broken by more powerful attacks, provable ones are of great interest since they make it possible to guarantee the correct behavior of a model under certain assumptions and, thus, possibly broaden the scope of tasks that may be trusted to neural networks.

Randomized smoothing [21, 3, 25] is the state-of-the-art approach used for constructing classification models provably robust against small-norm additive adversarial perturbations. This approach is scalable to large datasets and can be applied to any classifier since it does not use any assumptions about the model's architecture.

The code for this paper is available at https://github.com/koava36/certrob-fewshot.

36th Conference on Neural Information Processing Systems (NeurIPS 2022).

Figure 1: Illustration of the certification pipeline for a single image $x$. Given a model $f(\cdot)$ and $n$ realisations of zero-mean Gaussian noise $\varepsilon_1, \dots, \varepsilon_n \sim \mathcal{N}(0, \sigma^2 I)$, an estimate $\hat{g}(x) = \frac{1}{n}\sum_{i=1}^{n} f(x + \varepsilon_i)$ of $g(x) = \mathbb{E}_{\varepsilon \sim \mathcal{N}(0, \sigma^2 I)} f(x + \varepsilon)$ is computed. Note that $g(x)$ is $L$-Lipschitz with $L = \sqrt{2/(\pi\sigma^2)}$, according to Theorem 1. The number of samples $n$ is increased until the adversarial embedding risk $\gamma$ from Theorem 2 is computed with Algorithms 1-2, and the certified radius $r = \gamma/L$ is determined. The model $g(\cdot)$ is treated as certified at $x$ for all additive perturbations $\delta: \|\delta\|_2 < r$.
Generally, the idea is the following. Suppose we are given a base neural network classifier $f: \mathbb{R}^D \to [0, 1]^K$ that maps an input image $x$ to a fixed number $K$ of class probabilities. Its smoothed version with a zero-mean Gaussian distribution is

$$g(x) = \mathbb{E}_{\varepsilon \sim \mathcal{N}(0, \sigma^2 I)} f(x + \varepsilon). \quad (1)$$

As shown in [3], the new (smoothed) classifier is provably robust at $x$ to $\ell_2$-bounded perturbations if the base classifier $f$ is confident enough at $x$. However, the proof of certification heavily exploits the fact that classifiers are restricted to mapping an input to a fixed number of class probabilities. Thus, directly applying randomized smoothing to classifiers operating in a metric space, such as those used in few-shot learning, is a challenging task.

There are several works that aim at improving the robustness of few-shot classification [20, 9, 53, 28]. However, the focus of such works is either on the improvement of empirical robustness or on probabilistic guarantees of certified robustness; none of them provide theoretical guarantees on the worst-case model behavior. In this work, we fill this gap by generalizing and theoretically justifying the idea of randomized smoothing for few-shot learning. In this scenario, provable certification needs to be obtained not in the space of output class probabilities, but in the space of descriptive embeddings. This work is the first, to our knowledge, where theoretical robustness guarantees for a few-shot scenario are provided. Our contributions are summarized as follows:

- We provide the first theoretical robustness guarantee for few-shot learning classification.
- We analyze the Lipschitz continuity of such models and provide robustness certificates against $\ell_2$-bounded perturbations for few-shot learning scenarios.
- We propose to estimate confidence intervals not for the distances between the approximation of the smoothed embedding and the class prototypes, but for a dot product of vectors whose expectation equals the squared distance between the actual smoothed embedding and a class prototype.

2 Problem statement

2.1 Notation

We consider a few-shot classification problem where we are given a set of labeled objects $(x_1, y_1), \dots, (x_m, y_m)$, where $x_i \in \mathbb{R}^D$ and $y_i \in \{1, \dots, K\}$ are the corresponding labels. We follow the notation from [39] and denote by $S_k$ the set of objects of class $k$.

2.2 Few-shot learning classification

Suppose we have a function $f: \mathbb{R}^D \to \mathbb{R}^d$ that maps input objects to the space of normalized embeddings. Then $d$-dimensional prototypes of classes are computed as follows (the expression is given for the prototype of class $k$):

$$c_k = \frac{1}{|S_k|} \sum_{x_i \in S_k} f(x_i). \quad (2)$$

In order to classify a sample, one should compute the distances between its embedding and the class prototypes: a sample is assigned to the class with the closest prototype. Namely, given a distance function $\rho: \mathbb{R}^d \times \mathbb{R}^d \to [0, +\infty)$, the class $c$ of the sample $x$ is computed as below:

$$c = \arg\min_{k \in \{1, \dots, K\}} \rho\left(f(x), c_k\right). \quad (3)$$

Given an embedding function $f$, our goal is to construct a classifier $g$ provably robust to additive perturbations of a small norm. In other words, we want to find a norm threshold $t$ such that the equality

$$\arg\min_{k \in \{1, \dots, K\}} \rho\left(g(x), c_k\right) = \arg\min_{k \in \{1, \dots, K\}} \rho\left(g(x + \delta), c_k\right) \quad (4)$$

will be satisfied for all $\delta: \|\delta\|_2 \le t$.
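As a concrete illustration of (2) and (3), the following minimal NumPy sketch computes class prototypes and performs nearest-prototype classification; the embedding function `embed` and the data layout are illustrative assumptions and are not taken from the released code.

```python
import numpy as np

def class_prototypes(embed, support_sets):
    """Prototypes c_k from (2): the mean embedding of the support set S_k of each class.

    support_sets: dict mapping class label k -> iterable of support inputs S_k.
    embed: function mapping an input to a unit-norm embedding vector.
    """
    return {k: np.mean([embed(x) for x in xs], axis=0) for k, xs in support_sets.items()}

def nearest_prototype(embed, x, prototypes):
    """Classification rule (3): assign x to the class with the closest prototype in l2 distance."""
    dists = {k: np.linalg.norm(embed(x) - c_k) for k, c_k in prototypes.items()}
    return min(dists, key=dists.get)
```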
In this paper, the problem of constructing a classifier that satisfies the condition in (4) is approached by extending the robustness analysis of the smoothed classifiers described in (1) to the case of vector functions. The choice of the distance metric in (4) is motivated by the analysis of Lipschitz continuity given in the next section.

3 Randomized smoothing

3.1 Background

Randomized smoothing [21, 3] is a technique of convolving a base classifier $f$ with isotropic Gaussian noise such that the new classifier $g(x)$ returns the most probable prediction of $f$ for the random variable $\mathcal{N}(x, \sigma^2 I)$; the choice of the Gaussian distribution is motivated by the requirement that $g$ be robust against additive perturbations of bounded norm. In this case, given a classifier $f: \mathbb{R}^D \to [0, 1]$ and the smoothing distribution $\mathcal{N}(0, \sigma^2 I)$, the classifier $g$ looks as follows:

$$g(x) = \frac{1}{(2\pi\sigma^2)^{D/2}} \int_{\mathbb{R}^D} f(x + \varepsilon) \exp\left(-\frac{\|\varepsilon\|_2^2}{2\sigma^2}\right) d\varepsilon. \quad (5)$$

One can show by Stein's lemma that, given that the function $f$ in (5) is bounded (namely, $\forall x \in D(f),\ |f(x)| \le 1$), the function $g$ is $L$-Lipschitz:

$$\forall x, x' \in D(g), \quad \|g(x) - g(x')\|_2 \le L\|x - x'\|_2, \quad (6)$$

with $L = \sqrt{2/(\pi\sigma^2)}$, which immediately produces a theoretical robustness guarantee on $g$. Although this approach is simple and effective, it has a serious drawback: in practice, it is impossible to compute the expectation in (5) exactly and, thus, impossible to compute the prediction of the smoothed function $g$ at any point. Instead, the integral is computed with a Monte-Carlo approximation with $n$ samples to obtain the prediction with an arbitrary level of confidence. Notably, to achieve an appropriate accuracy of the Monte-Carlo approximation, the number of samples $n$ should be large, which may dramatically affect inference time.

In this work, we generalize the analysis of Lipschitz continuity to the case of vector functions $g: \mathbb{R}^D \to \mathbb{R}^d$ and provide robustness guarantees for classification performed in the space of embeddings. The certification pipeline is illustrated in Figure 1.

3.2 Randomized smoothing for vector functions

Lipschitz continuity of a vector function. In the work of [38], the robustness guarantee from [3] is proved by estimating the Lipschitz constant of a smoothed classifier. Unfortunately, a straightforward generalization of this approach to the case of vector functions leads to the estimation of the expectation of the norm of a multivariate Gaussian, which is known to depend on the number of dimensions of the space. Instead, we show that a simple adjustment to this technique may be done such that the estimate of the Lipschitz constant is the same as for the function in (5). Our results are formulated in the theorems below, the proofs of which are moved to the Appendix in order not to distract the reader.

Theorem 1. (Lipschitz continuity of a smoothed vector function) Suppose that $f: \mathbb{R}^D \to \mathbb{R}^d$ is a deterministic function and $g(x) = \mathbb{E}_{\varepsilon \sim \mathcal{N}(0, \sigma^2 I)} f(x + \varepsilon)$ is continuously differentiable for all $x$. If for all $x$, $\|f(x)\|_2 = 1$, then $g(x)$ is $L$-Lipschitz in the $\ell_2$ norm with $L = \sqrt{2/(\pi\sigma^2)}$.

Remark 1. We perform the analysis of Lipschitz continuity in Theorem 1 in the $\ell_2$ norm, so the distance metric in (4) is the $\ell_2$ distance. We do not consider other norms in this paper.
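To make Theorem 1 and the estimate $\hat{g}(x)$ from Figure 1 concrete, here is a minimal NumPy sketch of the Monte-Carlo approximation of the smoothed embedding and of the corresponding Lipschitz constant; `embed`, the random seed and the sample count are illustrative assumptions.

```python
import numpy as np

def smoothed_embedding(embed, x, sigma, n, rng=np.random.default_rng(0)):
    """Monte-Carlo estimate g_hat(x) = 1/n * sum_i embed(x + eps_i), eps_i ~ N(0, sigma^2 I)."""
    noise = rng.normal(scale=sigma, size=(n,) + x.shape)
    return np.mean([embed(x + eps) for eps in noise], axis=0)

def lipschitz_constant(sigma):
    """Lipschitz constant L = sqrt(2 / (pi * sigma^2)) of g from Theorem 1 (for ||f(x)||_2 = 1)."""
    return np.sqrt(2.0 / (np.pi * sigma ** 2))
```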
Robust classification in the space of embeddings. To provide certification for classification in the space of embeddings, one should estimate the maximum deviation of the classified embedding that does not change the closest class prototype. In Theorem 2, we show how this deviation is connected with the mutual arrangement of the embedding and the class prototypes.

Figure 2: Illustration of Theorem 2, one-shot case. The direction of the adversarial risk in the space of embeddings is always parallel to the vector $c_1 - c_2$. This is also true for the case $\|c_1\|_2 \ne \|c_2\|_2$.

Theorem 2. (Adversarial embedding risk) Given an input image $x \in \mathbb{R}^D$ and the embedding function $g: \mathbb{R}^D \to \mathbb{R}^d$, the closest point on the decision boundary in the embedding space (see Figure 2) is located at a distance (defined as the adversarial embedding risk)

$$\gamma = \|\Delta\|_2 = \frac{\|c_2 - g(x)\|_2^2 - \|c_1 - g(x)\|_2^2}{2\|c_2 - c_1\|_2}, \quad (7)$$

where $c_1 \in \mathbb{R}^d$ and $c_2 \in \mathbb{R}^d$ are the two closest prototypes.

The value of $\gamma$ is the distance between the classified embedding and the decision boundary between the classes represented by $c_1$ and $c_2$. Note that this is the minimum $\ell_2$ distortion in the embedding space required to change the prediction of $g$. The two theorems combined result in an $\ell_2$ robustness guarantee for few-shot classification:

Theorem 3. (Robustness guarantee) The $\ell_2$ robustness guarantee $r$ for an input image $x$ in the $D$-dimensional input space under classification by a classifier $g$ from Theorem 1 is $r = \gamma/L$, where $L$ is the Lipschitz constant from Theorem 1 and $\gamma$ is the adversarial risk from Theorem 2.

The value of $r$ is the certified radius of $g$ at $x$ or, in other words, the minimum $\ell_2$ distortion in the input space required to change the prediction of $g$. The proof of this fact straightforwardly follows from the definition (6) and the results of Theorems 1-2.
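As a sketch of Theorems 2-3 under the assumptions above (unit-norm embeddings and the Lipschitz constant $L = \sqrt{2/(\pi\sigma^2)}$ reconstructed from Theorem 1), the adversarial embedding risk and the certified radius can be computed as follows; the function names are illustrative, not from the released code.

```python
import numpy as np

def adversarial_embedding_risk(g_x, c1, c2):
    """Adversarial embedding risk gamma from (7): distance from the embedding g(x) to the
    decision boundary between the closest prototype c1 and the second closest prototype c2."""
    num = np.linalg.norm(c2 - g_x) ** 2 - np.linalg.norm(c1 - g_x) ** 2
    return num / (2.0 * np.linalg.norm(c2 - c1))

def certified_radius(g_x, c1, c2, sigma):
    """Certified l2 radius r = gamma / L from Theorem 3."""
    L = np.sqrt(2.0 / (np.pi * sigma ** 2))
    return adversarial_embedding_risk(g_x, c1, c2) / L
```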
4 Certification protocol

In this section, we describe the numerical implementation of our approach and estimate its failure probability.

4.1 Estimation of the prediction of the smoothed classifier

As mentioned previously, the procedure of few-shot classification is performed by assigning an object $x$ to the closest class prototype. Unfortunately, given the smoothed function $g$ in the form from Theorem 1 and a class prototype $c_k$ from (2), it is impossible to compute the value $\rho(g(x), c_k)$ explicitly, as well as to determine the closest prototype, since it is in general unknown what $g(x)$ looks like. We propose both to estimate the closest prototype for classification and to estimate the distance to the closest decision boundary $\gamma$ from Theorem 2 (the largest class-preserving perturbation in the embedding space) by computing two-sided confidence intervals for the random variables $\Delta_1 = \|\hat{g}(x) - c_1\|_2^2, \dots, \Delta_K = \|\hat{g}(x) - c_K\|_2^2$, where $\hat{g}(x) = \frac{1}{n}\sum_{i=1}^{n} f(x + \varepsilon_i)$ is the estimate of $g(x)$ computed as an empirical mean over $n$ samples of noise, and a one-sided confidence interval for $\gamma$ from Theorem 2, respectively. Pseudo-code for both procedures is presented in Algorithms 1-2.

Algorithm 1: Closest prototype computation algorithm.
GIVEN: base classifier $f$, noise level $\sigma$, object $x$, number of samples $n$ for $\hat{g}(x)$, class prototypes $\{c_i\}_{i=1}^K$, maximum number of samples $T$ and confidence level $\alpha$.
RETURNS: index $A$ of the closest prototype.

```
Function CLOSEST(f, σ, x, n, {c_i}_{i=1}^K, T, α)
    fs ← ∅;  n0 ← n
    while n ≤ T do
        ε_1, ..., ε_n ~ N(0, σ²I)
        ĝ(x) ← (1/n) Σ_{i=1}^n f(x + ε_i)
        fs ← fs ∪ {ĝ(x)}
        for all i ∈ [1, ..., K] do
            C ← c_i
            dsToC ← ρ(fs, C)
            (l_C, u_C) ← TwoSidedConfInt(dsToC, α)
            l_i ← l_C;  u_i ← u_C
        end for
        A ← arg min {l_i}_{i=1}^K
        if u_A < min({l_i}_{i=1, i≠A}^K) then
            Return A, fs
        else
            n ← n + n0   {increase the number of samples used for computing the approximation
                          ĝ(x) until the number of observations is large enough to separate
                          the two leftmost intervals, or until n = T}
        end if
    end while
End Function
```

Algorithm 1 describes the inference procedure for the smoothed classifier from Theorem 1. Algorithm 2 uses Algorithm 1 and, given the input parameters, estimates the adversarial risk from Theorem 2: it determines the two prototypes $A$ and $B$ closest to the smoothed embedding $g(x)$ and produces the lower confidence bound for the distance between $g(x)$ and the decision boundary between $A$ and $B$. Combined with the analysis from Theorem 1, it provides the certified radius for a sample, that is, the smallest $\ell_2$ norm of a perturbation in the input space required to change the prediction of the smoothed classifier. In the next subsection, we discuss in detail the procedure of computing confidence intervals in Algorithms 1-2.
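For illustration, the following NumPy sketch mirrors the control flow of Algorithm 1 in a simplified form: it applies a two-sided Hoeffding interval directly to per-sample squared distances rather than to the paired dot-product estimator described in Section 4.2, so it is only a placeholder for the actual TwoSidedConfInt routine; all names are illustrative.

```python
import numpy as np

def hoeffding_halfwidth(n, alpha, value_range=4.0):
    """Two-sided Hoeffding half-width for the mean of n observations with range b - a."""
    return value_range * np.sqrt(np.log(2.0 / alpha) / (2.0 * n))

def closest_prototype(embed, x, prototypes, sigma, n0, T, alpha, rng=np.random.default_rng(0)):
    """Simplified sketch of Algorithm 1: grow the noise sample until the confidence interval
    of the closest prototype separates from all others; return None (abstain) otherwise."""
    samples, n = [], n0
    while n <= T:
        noise = rng.normal(scale=sigma, size=(n0,) + x.shape)
        samples.extend(embed(x + eps) for eps in noise)
        sq_dists = np.array([[np.sum((s - c) ** 2) for c in prototypes] for s in samples])
        means = sq_dists.mean(axis=0)
        h = hoeffding_halfwidth(len(samples), alpha)
        lower, upper = means - h, means + h
        A = int(np.argmin(lower))
        if upper[A] < np.min(np.delete(lower, A)):
            return A
        n += n0
    return None  # abstain from classification
```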
4.2 Applicability of the algorithms

The computations of the smoothed function and of the distances to the class prototypes and to the decision boundary in Algorithms 1-2 are based on estimates of the corresponding random variables; thus, it is necessary to analyze the applicability of the algorithms. In this section, we propose a way to compute confidence intervals for the squares of the distances between the estimates of embeddings and class prototypes.

Computation of confidence intervals for the squares of distances. Recall that one way to estimate the value of a parameter of a random variable is to compute a confidence interval for the corresponding statistic. In this work, we construct intervals by applying the well-known Hoeffding inequality [13] in the form

$$\mathbb{P}\left(\left|\bar{X} - \mathbb{E}\bar{X}\right| \ge t\right) \le 2\exp\left(-\frac{2n^2t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right), \quad (9)$$

where $\bar{X}$ and $\mathbb{E}\bar{X}$ are the sample mean and the population mean of the random variable $X$, respectively, $n$ is the number of samples, and the numbers $a_i, b_i$ are such that $\mathbb{P}(X_i \in (a_i, b_i)) = 1$.

Algorithm 2: Adversarial embedding risk computation algorithm.
GIVEN: base classifier $f$, noise level $\sigma$, object $x$, number of samples $n$ for $\hat{g}(x)$, class prototypes $\{c_i\}_{i=1}^K$, maximum number of samples $T$ for $\hat{g}(x)$ and confidence level $\alpha$.
RETURNS: lower bound $\Gamma$ for the adversarial risk $\gamma$.

```
Function EMBEDDING-RISK(f, σ, x, n, {c_i}_{i=1}^K, T, α)
    A, fs_A ← CLOSEST(f, σ, x, n, {c_i}_{i=1}^K, T, α)
    B, fs_B ← CLOSEST(f, σ, x, n, {c_i}_{i=1, i≠A}^K, T, α)
    fs ← fs_A ∪ fs_B
    γs ← ∅
    for all ĝ ∈ fs do
        γ ← (‖c_B − ĝ‖₂² − ‖c_A − ĝ‖₂²) / (2‖c_B − c_A‖₂)
        γs ← γs ∪ {γ}
    end for
    Γ ← LowerConfBound(γs, α)
    Return Γ
End Function
```

However, a confidence interval for the distance $\|\hat{g}(x) - c_k\|_2$ with a certain confidence covers the expectation of the distance, $\mathbb{E}\|\hat{g}(x) - c_k\|_2$, and not the distance for the expectation, $\|\mathbb{E}\hat{g}(x) - c_k\|_2 = \|g(x) - c_k\|_2$. To solve this problem, we propose to compute confidence intervals for a dot product of vectors. Namely, given the quantity $\Delta_{x,k} = \langle g(x) - c_k,\ g(x) - c_k\rangle$, we sample its unbiased estimate with at most $2n$ samples of noise (here we have to mention that the number of samples $n$ from Algorithm 1 actually doubles, since we need a pair of estimates $\hat{g}(x)$ of the smoothed embedding):

$$\hat{\Delta}_{x,k} = \left\langle \frac{1}{n}\sum_{i=1}^{n} f(x + \varepsilon_i) - c_k,\ \frac{1}{n}\sum_{j=n+1}^{2n} f(x + \varepsilon_j) - c_k \right\rangle, \quad (10)$$

and, applying (9) to each of its terms at level $\alpha/3$, we compute a confidence interval $(l_{x,k}, u_{x,k})$ such that the population mean $\mathbb{E}(\hat{\Delta}_{x,k})$ is most probably located within it, or, equivalently,

$$\mathbb{P}\left(l_{x,k} \le \mathbb{E}(\hat{\Delta}_{x,k}) \le u_{x,k}\right) \ge 1 - 3\cdot\frac{\alpha}{3} = 1 - \alpha. \quad (11)$$

We have to mention that there are three confidence intervals for the three terms (one with a quadratic number of samples and two with linear numbers of samples) in the expression (10), which is why the fraction $1/3$ appears in equation (11). Also note that the population mean $\mathbb{E}(\hat{\Delta}_{x,k})$ is exactly $\|g(x) - c_k\|_2^2$:

$$\mathbb{E}(\hat{\Delta}_{x,k}) = \mathbb{E}\left\langle \frac{1}{n}\sum_{i=1}^{n} f(x + \varepsilon_i) - c_k,\ \frac{1}{n}\sum_{j=n+1}^{2n} f(x + \varepsilon_j) - c_k \right\rangle = \|g(x) - c_k\|_2^2, \quad (12)$$

since $f(x + \varepsilon_i)$ and $f(x + \varepsilon_j)$ are independent random variables for $i \ne j$. Finally, note that the confidence interval $(l_{x,k}, u_{x,k})$ for the quantity $\|g(x) - c_k\|_2^2$ implies the confidence interval

$$\left(\sqrt{l_{x,k}},\ \sqrt{u_{x,k}}\right) \quad (13)$$

for the quantity $\|g(x) - c_k\|_2$. Thus, the procedures TwoSidedConfInt and LowerConfBound from the algorithms return, respectively, an interval of the form (13) and its left bound for the random variable representing the corresponding distance.
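As an illustration of (10)-(13) under the stated assumptions (unit-norm embeddings, so each term of the expanded dot product lies in a range of width at most 4), here is a hedged NumPy sketch of the paired-batch estimator and a Hoeffding-style interval for $\|g(x) - c_k\|_2$; the exact interval construction used by TwoSidedConfInt may differ, and `embed` is an illustrative stand-in for the base model.

```python
import numpy as np

def distance_interval(embed, x, c_k, sigma, n, alpha, rng=np.random.default_rng(0)):
    """Sketch of (10)-(13): unbiased dot-product estimate of ||g(x) - c_k||^2 from two
    independent noise batches, followed by a Hoeffding-style interval at level alpha."""
    eps1 = rng.normal(scale=sigma, size=(n,) + x.shape)
    eps2 = rng.normal(scale=sigma, size=(n,) + x.shape)
    g1 = np.mean([embed(x + e) for e in eps1], axis=0)   # first estimate of g(x)
    g2 = np.mean([embed(x + e) for e in eps2], axis=0)   # independent second estimate
    delta_hat = float(np.dot(g1 - c_k, g2 - c_k))        # E[delta_hat] = ||g(x) - c_k||^2, see (12)
    # Simplified half-width: Hoeffding with range b - a = 4 at per-term level alpha / 3,
    # standing in for the three-term construction behind (11).
    h = 4.0 * np.sqrt(np.log(6.0 / alpha) / (2.0 * n))
    lower = max(delta_hat - h, 0.0)
    upper = delta_hat + h
    return np.sqrt(lower), np.sqrt(upper)                # interval (13) for ||g(x) - c_k||_2
```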
5 Experiments

5.1 Datasets

For the experimental evaluation of our approach we use several well-known datasets for few-shot classification. Cub-200-2011 [45] is a dataset with 11,788 images of 200 bird species, where 5,864 images of 100 species are in the train subset and 5,924 images of the other 100 species are in the test subset. Notably, many of the species in the dataset have a high degree of visual similarity, which makes their classification challenging even for humans. miniImageNet [44] is a subset of images from the ILSVRC 2015 [37] dataset with 64 image categories in the train subset, 16 categories in the validation subset and 20 categories in the test subset, with 600 images of size 84 × 84 in each category. CIFAR-FS [1] is a subset of the CIFAR-100 [19] dataset which was generated in the same way as miniImageNet and contains 37,800 images of 64 categories in the train set and 11,400 images of 20 categories in the test set. The experimental setup for all the datasets is presented in the next section.

5.2 Experimental settings and computation cost

Following [3], we compute approximate certified test set accuracy to estimate the performance of the smoothed model, with the prediction computed by Algorithm 1 and the embedding risk computed by Algorithm 2. The baseline model used for the experiments is a prototypical network introduced in [39] with a ConvNet-4 backbone. Compared to the original architecture, an additional fully-connected layer was added at the tail of the network to map embeddings to a 512-dimensional vector space. The model was trained to solve 1-shot and 5-shot classification tasks on each dataset, with 5-way classification on each iteration.

Parameters of experiments. For data augmentation, we applied Gaussian noise with zero mean, unit variance and probability 0.3 of augmentation. Each dataset was certified on a subsample of 500 images with the default parameters for Algorithm 1: number of samples $n = 1000$, confidence level $\alpha = 0.001$ and $\sigma = 1.0$, unless stated otherwise. For our settings, it may be shown from simple geometry that the values $(a_i, b_i)$ from (9) are such that $b_i - a_i \le 4$, so we use $b_i - a_i = 4$. The maximum number of samples $T$ in Algorithm 1 is set to $T = 5 \cdot 10^5$.

Computation cost. In the table below, we report the computation time of the certification procedure per image on a Tesla V100 GPU for the Cub-200-2011 dataset. The standard deviation in seconds is significant because the number of main-loop iterations required to separate the two leftmost confidence intervals varies from image to image in the test set.

Table 1: Computation time per image of the implementation of Algorithm 2, Cub-200-2011.

| $n$ | $10^3$ | $10^4$ | $10^5$ |
| --- | --- | --- | --- |
| $t$, sec | 0.044 ± 0.030 | 0.509 ± 0.403 | 4.744 ± 2.730 |

5.3 Results of experiments

In this section, we report the results of our experiments. In our evaluation protocol, we compute the approximate certified test set accuracy, CA. Given a sample $x$, a smoothed classifier $g(\cdot)$ from Algorithm 1 with the assigned classification rule $h(x) = \arg\min_{k \in \{1, \dots, K\}} \|g(x) - c_k\|_2$, a threshold value $\varepsilon$ for the $\ell_2$ norm of the additive perturbation, and the robustness guarantee $r = r(x)$ from Theorem 3 and Algorithm 2, we compute CA on a test set $S$ as follows:

$$\mathrm{CA}(S, \varepsilon) = \frac{\left|\{(x, y) \in S : r(x) > \varepsilon \ \& \ h(x) = y\}\right|}{|S|}. \quad (14)$$

In other words, we treat the model $g(\cdot)$ as certified at the point $x$ under a perturbation of norm $\varepsilon$ if $x$ is correctly classified by $g(\cdot)$ (which means that the classification procedure described in Algorithm 1 does not abstain at $x$) and $g(\cdot)$ has a certified radius $r(x) > \varepsilon$.
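For clarity, a minimal sketch of how (14) can be evaluated on a test set is shown below; `h` and `radius` are illustrative stand-ins for the classification rule of Algorithm 1 (returning None on abstention) and for the certified radius $r(x)$.

```python
def certified_accuracy(test_set, h, radius, eps):
    """Approximate certified accuracy CA(S, eps) from (14): the fraction of pairs (x, y)
    classified correctly without abstention and with certified radius r(x) > eps."""
    hits = 0
    for x, y in test_set:
        pred = h(x)                      # None encodes an abstention
        if pred is not None and pred == y and radius(x) > eps:
            hits += 1
    return hits / len(test_set)
```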
Visualization of results. Figures 3-4 show the dependency of certified accuracy on the norm of the additive perturbation for different learning settings (1-shot and 5-shot learning). The value of the attack radius corresponds to the threshold $\varepsilon$ from (14). For the Cub-200-2011 dataset we also provide the dependency of certified accuracy on the sample size $n$ of Algorithm 1 (Figure 5).

Figure 3: Dependency of certified accuracy on attack radius $\varepsilon$ for different $\sigma$, 1-shot case, $n = 1000$. Panels: (a) Cub-200-2011, (b) CIFAR-FS, (c) miniImageNet.

Figure 4: Dependency of certified accuracy on attack radius $\varepsilon$ for different $\sigma$, 5-shot case, $n = 1000$. Panels: (a) Cub-200-2011, (b) CIFAR-FS, (c) miniImageNet.

Figure 5: Dependency of certified accuracy on attack radius $\varepsilon$ for different numbers of samples $n$ in Algorithm 1, CIFAR-FS dataset, 5-shot case. It is notable that a relatively small number of samples may be used to achieve a satisfactory level of certified accuracy.

6 Limitations

In this section, we provide the failure probability of Algorithms 1-2, discuss abstentions from classification in Algorithm 1 and speculate on the application of our method in other few-shot scenarios.

6.1 Estimation of the errors of the algorithms

Note that the value of $\alpha$ from (11) is the probability that the value of $\|g(x) - c_k\|_2$ does not belong to the corresponding interval of the form (13). Given a sample $x$, the procedure in Algorithm 1 returns the two prototypes closest to $g(x)$. To determine the two leftmost confidence intervals, all the distances have to be located within their corresponding intervals; thus, the error probability of Algorithm 1 is $q_1 = K\alpha$, where $K$ is the number of classes. Similarly, the procedure in Algorithm 2 outputs the lower bound for the adversarial risk with coverage at least $1 - \alpha$ and depends on the output of Algorithm 1, and thus has error probability $q_2 = 1 - (1 - \alpha)(1 - K\alpha) = \alpha + K\alpha - K\alpha^2$, which corresponds to returning an overestimated lower bound for the adversarial risk from Theorem 2.

6.2 Abstentions from classification and extension to other few-shot approaches

It is crucial to note that the procedure in Algorithm 1 may require many samples to distinguish the two leftmost confidence intervals, and sometimes it does not finish before reaching the threshold value $T$ for the sample size. As a result, there may be input objects at which the smoothed classifier can be neither evaluated nor certified. In this subsection, we report the fraction of objects at which Algorithm 1 abstains from determining the closest class prototype (see Tables 2-3).

Table 2: Percentage of non-certified objects in the test subset, 1-shot case.

| | $\alpha = 10^{-2}$ | $\alpha = 10^{-3}$ | $\alpha = 10^{-4}$ |
| --- | --- | --- | --- |
| Cub-200-2011 | 1.6% | 1.6% | 1.6% |
| CIFAR-FS | 2.2% | 2.2% | 2.4% |
| miniImageNet | 1.9% | 2.2% | 2.4% |

Table 3: Percentage of non-certified objects in the test subset, 5-shot case.

| | $\alpha = 10^{-2}$ | $\alpha = 10^{-3}$ | $\alpha = 10^{-4}$ |
| --- | --- | --- | --- |
| Cub-200-2011 | 1.2% | 1.2% | 1.4% |
| CIFAR-FS | 3.0% | 3.4% | 3.8% |
| miniImageNet | 2.9% | 2.9% | 3.0% |

Since our method is based on randomized smoothing, among the approaches presented in [43] it is applicable to matching networks and, to some extent, to MAML networks. In the case of matching networks, a new sample is labeled using a weighted cosine distance to the support samples, so it is easy to transfer guarantees for the $\ell_2$ distance to ones for the cosine distance in the case of normalized embeddings. For MAML, smoothing may be applied to the embedding function as well, but theoretical derivations of the certificates are required.

7 Related work

Breaking neural networks with adversarial attacks and empirically defending against them have a long history of a cat-and-mouse game: for a particular proposed defense against existing adversarial perturbations, a new, more aggressive attack is found. This motivated researchers to look for defenses that are mathematically provable and certifiably robust to different kinds of input manipulations. Several works proposed exactly verified neural networks based on Satisfiability Modulo Theories solvers [16, 6] or mixed-integer linear programming [30, 8]. These methods are computationally inefficient, although they are guaranteed to find adversarial examples in case they exist. Another line of work uses more relaxed certification [47, 11, 36]. Although these methods aim to guarantee that an adversary does not exist in a certain region around a given input, they scale poorly to big networks and large datasets. The only provable defense against adversarial perturbations that scales to large datasets is randomized smoothing. Initially, it was proposed as an empirical defense to mitigate adversarial effects in neural networks [29, 49]. Later, several works provided mathematical proofs of its certified robustness [21, 25, 3, 38]. Lecuyer et al. [21] first provided a proof of certificates against adversarial examples using differential privacy. Later, Cohen et al. [3] provided the tightest bound using the Neyman-Pearson lemma. Interestingly, an alternative proof using Lipschitz continuity was found [38]. The scalability and simplicity of randomized smoothing attracted significant attention, and it was extended beyond $\ell_2$ perturbations [22, 42, 27, 24, 23, 20, 32, 51].

8 Conclusion and future work

In this work, we extended randomized smoothing as a defense against additive norm-bounded adversarial attacks to the case of classification in the embedding space that is used in few-shot learning scenarios. We performed an analysis of the Lipschitz continuity of smoothed normalized embeddings and derived a robustness certificate against $\ell_2$ attacks. Our theoretical findings are supported experimentally on several datasets. There are several directions for future work: our approach can possibly be extended to other types of attacks, such as semantic transformations; also, it is important to reduce the computational complexity of the certification procedure.

9 Acknowledgements

The work was supported by Ministry of Science and Higher Education grant No. 075-10-2021-068.

References

[1] Luca Bertinetto, Joao F Henriques, Philip HS Torr, and Andrea Vedaldi. Meta-learning with differentiable closed-form solvers. arXiv preprint arXiv:1805.08136, 2018.
[2] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39-57. IEEE, 2017.
[3] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pages 1310-1320. PMLR, 2019.
[4] Guneet S. Dhillon, Kamyar Azizzadenesheli, Zachary C. Lipton, Jeremy Bernstein, Jean Kossaifi, Aran Khanna, and Anima Anandkumar. Stochastic activation pruning for robust adversarial defense, 2018.
[5] Yinpeng Dong, Hang Su, Baoyuan Wu, Zhifeng Li, Wei Liu, Tong Zhang, and Jun Zhu. Efficient decision-based black-box adversarial attacks on face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7714-7722, 2019.
[6] Ruediger Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis, pages 269-286. Springer, 2017.
[7] Volker Fischer, Mummadi Chaithanya Kumar, Jan Hendrik Metzen, and Thomas Brox. Adversarial examples for semantic image segmentation. arXiv preprint arXiv:1703.01101, 2017.
[8] Matteo Fischetti and Jason Jo. Deep neural networks as 0-1 mixed integer linear programs: A feasibility study. arXiv preprint arXiv:1712.06174, 2017.
[9] Micah Goldblum, Liam Fowl, and Tom Goldstein. Adversarially robust few-shot learning: A meta-learning approach. Advances in Neural Information Processing Systems, 33, 2020.
[10] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
[11] Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715, 2018.
[12] Jan Hendrik Metzen, Mummadi Chaithanya Kumar, Thomas Brox, and Volker Fischer. Universal adversarial perturbations against semantic image segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pages 2755-2764, 2017.
[13] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. In The Collected Works of Wassily Hoeffding, pages 409-426. Springer, 1994.
[14] Yunseok Jang, Tianchen Zhao, Seunghoon Hong, and Honglak Lee. Adversarial defense via learning to generate diverse attacks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2740-2749, 2019.
[15] Jinyuan Jia, Xiaoyu Cao, Binghui Wang, and Neil Zhenqiang Gong. Certified robustness for top-k predictions against adversarial perturbations via randomized smoothing. arXiv preprint arXiv:1912.09899, 2019.
[16] Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pages 97-117. Springer, 2017.
[17] Edgar Kaziakhmedov, Klim Kireev, Grigorii Melnikov, Mikhail Pautov, and Aleksandr Petiushko. Real-world attack on MTCNN face detection system. In 2019 International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON), pages 0422-0427. IEEE, 2019.
[18] Stepan Komkov and Aleksandr Petiushko. AdvHat: Real-world adversarial attack on ArcFace face ID system. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 819-826. IEEE, 2021.
[19] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
[20] Aounon Kumar and Tom Goldstein. Center smoothing: Certified robustness for networks with structured outputs. Advances in Neural Information Processing Systems, 34, 2021.
[21] Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy (SP), pages 656-672. IEEE, 2019.
[22] Guang-He Lee, Yang Yuan, Shiyu Chang, and Tommi S Jaakkola. Tight certificates of adversarial robustness for randomly smoothed classifiers. arXiv preprint arXiv:1906.04948, 2019.
[23] Alexander Levine and Soheil Feizi. Robustness certificates for sparse adversarial attacks by randomized ablation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 4585-4593, 2020.
[24] Alexander Levine and Soheil Feizi. Wasserstein smoothing: Certified robustness against Wasserstein adversarial attacks. In International Conference on Artificial Intelligence and Statistics, pages 3938-3947. PMLR, 2020.
[25] Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Certified adversarial robustness with additive noise. arXiv preprint arXiv:1809.03113, 2018.
[26] Debang Li, Junge Zhang, and Kaiqi Huang. Universal adversarial perturbations against object detection. Pattern Recognition, 110:107584, 2021.
[27] Linyi Li, Maurice Weber, Xiaojun Xu, Luka Rimanic, Bhavya Kailkhura, Tao Xie, Ce Zhang, and Bo Li. TSS: Transformation-specific smoothing for robustness certification, 2021.
[28] Fan Liu, Shuyu Zhao, Xuelong Dai, and Bin Xiao. Long-term cross adversarial training: A robust meta-learning method for few-shot classification tasks. arXiv preprint arXiv:2106.12900, 2021.
[29] Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neural networks via random self-ensemble. In Proceedings of the European Conference on Computer Vision (ECCV), pages 369-385, 2018.
[30] Alessio Lomuscio and Lalit Maganti. An approach to reachability analysis for feed-forward ReLU neural networks. arXiv preprint arXiv:1706.07351, 2017.
[31] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
[32] Jeet Mohapatra, Ching-Yun Ko, Tsui-Wei Weng, Pin-Yu Chen, Sijia Liu, and Luca Daniel. Higher-order certification for randomized smoothing. arXiv preprint arXiv:2010.06651, 2020.
[33] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1765-1773, 2017.
[34] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574-2582, 2016.
[35] Mikhail Pautov, Nurislam Tursynbek, Marina Munkhoeva, Nikita Muravev, Aleksandr Petiushko, and Ivan Oseledets. CC-Cert: A probabilistic approach to certify general robustness of neural networks. arXiv preprint arXiv:2109.10696, 2021.
[36] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. arXiv preprint arXiv:1801.09344, 2018.
[37] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211-252, 2015.
[38] Hadi Salman, Greg Yang, Jerry Li, Pengchuan Zhang, Huan Zhang, Ilya Razenshteyn, and Sebastien Bubeck. Provably robust deep learning via adversarially trained smoothed classifiers. arXiv preprint arXiv:1906.04584, 2019.
[39] Jake Snell, Kevin Swersky, and Richard S Zemel. Prototypical networks for few-shot learning. arXiv preprint arXiv:1703.05175, 2017.
[40] Jiawei Su, Danilo Vasconcellos Vargas, and Kouichi Sakurai. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 23(5):828-841, 2019.
[41] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[42] Jiaye Teng, Guang-He Lee, and Yang Yuan. ℓ1 adversarial robustness certificates: a randomized smoothing approach. 2019.
[43] Eleni Triantafillou, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Utku Evci, Kelvin Xu, Ross Goroshin, Carles Gelada, Kevin Swersky, Pierre-Antoine Manzagol, et al. Meta-Dataset: A dataset of datasets for learning to learn from few examples. arXiv preprint arXiv:1903.03096, 2019.
[44] Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. Advances in Neural Information Processing Systems, 29:3630-3638, 2016.
[45] Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. The Caltech-UCSD Birds-200-2011 dataset. 2011.
[46] Lily Weng, Pin-Yu Chen, Lam Nguyen, Mark Squillante, Akhilan Boopathy, Ivan Oseledets, and Luca Daniel. PROVEN: Verifying robustness of neural networks with a probabilistic approach. In International Conference on Machine Learning, pages 6727-6736. PMLR, 2019.
[47] Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, pages 5286-5295. PMLR, 2018.
[48] Zuxuan Wu, Ser-Nam Lim, Larry S Davis, and Tom Goldstein. Making an invisibility cloak: Real world adversarial attacks on object detectors. In European Conference on Computer Vision, pages 1-17. Springer, 2020.
[49] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. arXiv preprint arXiv:1711.01991, 2017.
[50] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan Yuille. Adversarial examples for semantic segmentation and object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 1369-1378, 2017.
[51] Greg Yang, Tony Duan, J Edward Hu, Hadi Salman, Ilya Razenshteyn, and Jerry Li. Randomized smoothing of all shapes and sizes. In International Conference on Machine Learning, pages 10693-10705. PMLR, 2020.
[52] Dinghuai Zhang, Mao Ye, Chengyue Gong, Zhanxing Zhu, and Qiang Liu. Black-box certification with randomized smoothing: A functional optimization based framework. arXiv preprint arXiv:2002.09169, 2020.
[53] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In International Conference on Machine Learning, pages 7472-7482. PMLR, 2019.
[54] Yaoyao Zhong and Weihong Deng. Towards transferable adversarial attack against deep face recognition. IEEE Transactions on Information Forensics and Security, 16:1452-1466, 2020.
[55] Jianli Zhou, Chao Liang, and Jun Chen. Manifold projection for adversarial defense on face recognition. In European Conference on Computer Vision, pages 288-305. Springer, 2020.

Checklist

1. For all authors...
   (a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes]
   (b) Did you describe the limitations of your work? [Yes] See Section 6.
   (c) Did you discuss any potential negative societal impacts of your work? [N/A]
   (d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]
2. If you are including theoretical results...
   (a) Did you state the full set of assumptions of all theoretical results? [Yes]
   (b) Did you include complete proofs of all theoretical results? [Yes] The proofs are included in the supplementary material.
3. If you ran experiments...
   (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] The code with instructions is included in the supplementary material.
   (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] Applicable training details are described in Section 5.2.
   (c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [N/A]
   (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Section 5.2.
4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
   (a) If your work uses existing assets, did you cite the creators? [Yes] See Sections 5.1 and 5.2.
   (b) Did you mention the license of the assets? [No]
   (c) Did you include any new assets either in the supplemental material or as a URL? [No]
   (d) Did you discuss whether and how consent was obtained from people whose data you're using/curating? [No] The datasets and models are publicly available.
   (e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [N/A]
5. If you used crowdsourcing or conducted research with human subjects...
   (a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]
   (b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
   (c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]