Robustness Verification for Contrastive Learning

Zekai Wang 1  Weiwei Liu 1

Abstract
Contrastive adversarial training has successfully improved the robustness of contrastive learning (CL). However, the robustness metric used in these methods is linked to attack algorithms, image labels and downstream tasks, all of which may affect the consistency and reliability of the robustness metric for CL. To address these problems, this paper proposes a novel Robustness Verification framework for Contrastive Learning (RVCL). Furthermore, we use extreme value theory to reveal the relationship between the robust radius of the CL encoder and that of the supervised downstream task. Extensive experimental results on various benchmark models and datasets verify our theoretical findings, and further demonstrate that our proposed RVCL is able to evaluate the robustness of both models and images. Our code is available at https://github.com/wzekai99/RVCL.

1. Introduction
While neural networks (NNs) have achieved remarkable performance in various applications, many studies (Goodfellow et al., 2015; Madry et al., 2018) have demonstrated that NNs are vulnerable when dealing with imperceptibly perturbed images. A rapidly growing body of work therefore aims to investigate how a robust NN model might be obtained. One successful method in this field is based on adversarial training (AT) (Moosavi-Dezfooli et al., 2016; Carlini & Wagner, 2017), which improves the robustness of NNs by augmenting the training set with adversarial samples (Szegedy et al., 2014). Schmidt et al. (2018) show that AT requires a large amount of data, while the labels of this data are expensive to obtain. Thus, several existing works have attempted to improve the robustness of adversarially trained models through the use of additional unlabeled data (Alayrac et al., 2019; Carmon et al., 2019; Zhai et al., 2019).

1School of Computer Science, Wuhan University, China. Correspondence to: Weiwei Liu.
Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022. Copyright 2022 by the author(s).

Recently, attempts have been made to combine AT with contrastive learning (CL) (Kim et al., 2020; Fan et al., 2021). CL (Chen et al., 2020; He et al., 2020) has demonstrated superior abilities in unsupervised learning: specifically, it can surpass the standard accuracy of supervised learning on downstream image classification tasks by utilizing a large-scale unlabeled dataset. It is therefore of primary interest to study the robust performance achieved by contrastive AT (Gowal et al., 2021). However, existing contrastive AT methods use an empirical robustness metric (e.g. robust accuracy) to evaluate the robustness of encoders, an approach that relies on attack algorithms, image labels and downstream tasks. Three key questions naturally arise from this kind of measurement:
1. Attack algorithm: Robust accuracy is tied to a specific attack algorithm (e.g. the PGD attack (Madry et al., 2018)); thus, the results may not be consistent under more powerful adversaries.
2. Image label: It is far-fetched to train with unlabeled images while evaluating the encoder with labeled images.
3. Downstream task: We do not know whether the robustness benefits from the encoder or the linear classifier.
In fact, the first question has already been considered in supervised learning.
Considering the first question in more detail: the robustness analysis of NNs is entangled with the specific attack algorithms used for evaluation (Weng et al., 2018). Most defense heuristics have subsequently been shown to fail against suitably powerful attack algorithms (Carlini & Wagner, 2017; Uesato et al., 2018). Even if the model is made robust to the attack algorithm used for evaluation, there is no guarantee that it will remain robust to other unseen attacks. This has encouraged researchers to develop robustness verification (Katz et al., 2017; Wong et al., 2018): i.e., classifiers whose prediction at a point x is verified to be constant within a neighborhood of x, regardless of what attack algorithm is applied. The key concept is to find the largest radius of this neighborhood, referred to as the robust radius, which is an important metric for studying the robustness of NNs. Unfortunately, previous works on robustness verification consider only supervised learning; i.e., true labels of the data are required for both the supervised classifier and the verifier.

A naive approach would be to apply a current verification framework to supervised downstream tasks, although this method would also suffer from the issues highlighted in Questions 2 & 3, which prompts us to ask the following questions: Can we design a robustness verification framework for contrastive learning that does not require class labels and downstream tasks? Is there any relationship between the robust radius of the CL encoder and that of the downstream task?

Our work attempts to conduct a rigorous and comprehensive study that addresses the above questions. Our main contributions are summarized below.
1. We propose RVCL, a novel Robustness Verification framework for Contrastive Learning. We then define the robust radius for CL such that no points inside this radius can be recognized as negative samples; this is a robustness metric for encoders that does not require the involvement of an attack algorithm, image label or downstream task.
2. We use extreme value theory to prove that the robust radius of the CL encoder is an upper bound of the robust radius of the downstream task. This implies that robust data representations rendered by the CL encoder (with a large robust radius) can benefit the model's robust performance on downstream tasks.
3. To verify the efficacy of the proposed RVCL, we validate our proposed framework on two verification benchmark datasets (MNIST and CIFAR-10). Our experimental results illustrate that the proposed RVCL is a suitable robustness metric for models without labels, and accordingly validate our theoretical analysis. Moreover, RVCL is also able to evaluate the anti-disturbance ability of distinct images.

2. Related work
Self-supervised Contrastive Learning  Self-supervised learning (Jing & Tian, 2021), which involves training models using unlabeled data and various pretext tasks, has become popular as a means of extracting feature representations for deep NNs. Early advances have been used to solve image jigsaw puzzles (Noroozi & Favaro, 2016), predict rotation angles (Gidaris et al., 2018), fill image patches (Doersch et al., 2015), etc. Recently, contrastive learning (CL) (Wu et al., 2018; Chen et al., 2020; He et al., 2020; Tian et al., 2020) has been proposed, which maximizes the agreement between positive samples while contrasting with negative samples, and has been shown to work well in learning effective representations.
Some theoretical works have also been proposed; for example, Saunshi et al. (2019) provide the first generalization bound for CL, while Nozawa et al. (2020) extend it by means of a PAC-Bayesian approach.

Contrastive Adversarial Training  Due to the brittleness of NNs when faced with tiny input perturbations, AT (Madry et al., 2018) is one of the most powerful robust training methods used to enhance model robustness. Several recent works (Kim et al., 2020; Ho & Vasconcelos, 2020; Jiang et al., 2020; Fan et al., 2021) have explored how to improve robustness using contrastive AT. To obtain more robust data representations, AT is applied to contrastive pretraining tasks following the contrastive adversarial pretraining + supervised finetuning paradigm. However, existing methods use an empirical robustness metric; the systematic study of the verified robustness of CL has been less explored.

Robustness Verification  In this paper, we focus on deterministic verification, following the taxonomy of Li et al. (2020). When the given input is non-robust against the attack, deterministic verification is guaranteed to identify this non-robustness. The literature for this setting can be broadly divided into several categories: complete verifiers using satisfiability modulo theory (SMT) (Katz et al., 2017; Ehlers, 2017), mixed integer programming (MIP) (Tjeng et al., 2019; Anderson et al., 2020) and branch and bound (BaB) (Bunel et al., 2018; Wang et al., 2021); and incomplete verifiers using bound propagation (Zhang et al., 2018; Xu et al., 2020) and convex relaxation by linear programming (Wong et al., 2018; Wong & Kolter, 2018). However, these works focus on supervised settings; verification in CL settings remains unexplored.

Extreme Value Theory  Extreme value theory (EVT) has been recognized as a powerful tool, since it enables the limit distribution of properly normalized maxima to be effectively modeled (Scheirer et al., 2011). This success has produced strong empirical results for describable visual attributes (Scheirer et al., 2012), visual inspection tasks (Gibert et al., 2015), open set recognition problems (Rudd et al., 2018), etc. Recently, CLEVER (Weng et al., 2018) estimates the robust radius of supervised verification using EVT; the difference from our work is discussed in more detail in Appendix B. In this paper, we creatively use EVT to theoretically analyze the relationship between the robust radius of the CL encoder and that of the downstream task.

3. Preliminaries
We first present notation and describe the frameworks for contrastive learning and the supervised verification problem that will be essential for our analysis.

Notation  Let $\|\cdot\|_2$ and $\|\cdot\|_\infty$ denote the Euclidean norm $\ell_2$ and the infinity norm $\ell_\infty$, respectively. $\mathbb{1}[\text{Boolean expression}]$ is the indicator function (equal to 1 if the expression is True and 0 otherwise). $B_\infty(x_0, \epsilon) := \{x \mid \|x - x_0\|_\infty \le \epsilon\}$ denotes that the input $x$ is constrained to the $\ell_\infty$ ball. We let $\mathrm{sign}(x) = 1$ for $x \ge 0$ and $\mathrm{sign}(x) = -1$ for $x < 0$. If $S$ is a set, $|S|$ denotes its cardinality. The transpose of a vector/matrix is represented by the superscript $\top$. Given a point $u$, the $\ell_2$ normalization is defined as $\tilde{\rho}(u) = u / \|u\|_2$. Given points $u$ and $v$, the instance similarity is defined as $\rho(u, v) = u^\top v / (\|u\|_2 \|v\|_2)$, which is the dot product between the $\ell_2$-normalized $u$ and $v$ (i.e., cosine similarity). We use $[n]$ to represent the set $\{1, 2, \ldots, n\}$.

Neural network  Consider an input vector $x \in \mathbb{R}^{d_0}$ for a neural network with $L$ layers.
Let the number of neurons in the $k$-th layer be $d_k$, while $W_k \in \mathbb{R}^{d_k \times d_{k-1}}$ and $b_k \in \mathbb{R}^{d_k}$ ($k \in [L]$) represent the weights and biases of the NN. Let $\varphi_k : \mathbb{R}^{d_0} \to \mathbb{R}^{d_k}$ be the operator mapping the input layer to layer $k$. $\sigma(v)$ is the activation function; the ReLU activation is $\sigma(v) = \max(v, 0)$. For each $k \in [L]$,
$$\varphi_k(x) = W_k \hat{\varphi}_{k-1}(x) + b_k, \qquad \hat{\varphi}_k(x) = \sigma(\varphi_k(x)), \qquad \hat{\varphi}_0(x) = x.$$
We simply use $\varphi_k^{(j)}$ and $\hat{\varphi}_k^{(j)}$ to represent the pre-activation and post-activation values of the $j$-th neuron in the $k$-th layer. The output of the neural network is $\varphi_L(x) \in \mathbb{R}^{d_L}$; $d_L$ further denotes the number of classes.

3.1. Contrastive Learning
Let $\mathcal{X}$ denote the set of all possible data points. Let $\mathcal{Y}$ denote the label set of the downstream task, which is a discrete and finite set. $\mathcal{F}$ is a class of feature encoders $f : \mathcal{X} \to \mathbb{R}^d$. To highlight the key ideas, we present the CL framework proposed by Saunshi et al. (2019) in a simplified binary scenario, i.e., $\mathcal{Y} = \{-1, 1\}$. CL assumes that we obtain similar data in the form of pairs $(x, x^+)$ together with $K$ independent and identically distributed (i.i.d.) negative samples $x_1^-, x_2^-, \ldots, x_K^-$. Given an unlabeled dataset $U = \{z_i\}_{i=1}^m$, where $z_i = (x_i, x_i^+, x_{i1}^-, \ldots, x_{iK}^-)$, we aim to learn an encoder $f$ that makes $f(x)$ similar to $f(x^+)$ while keeping it away from $f(x_1^-), \ldots, f(x_K^-)$ at the same time.

Linear evaluation  One standard method for evaluating the performance of a CL model is linear evaluation (Chen et al., 2020; Kim et al., 2020), which learns a downstream linear layer after the base encoder, then uses the modified model for class-level classification. The test accuracy on the downstream task is used as a proxy for representation quality. The model with the downstream layer is fine-tuned on a labeled dataset $S = \{(x_i, y_i)\}_{i=1}^n$. Both $U$ and $S$ are assumed to be i.i.d. collections.

Data distributions  Let $\mathcal{C}$ denote the set of latent classes (Saunshi et al., 2019), i.e., all possible classes for points in $\mathcal{X}$. For each class $c \in \mathcal{C}$, there is a probability distribution $D_c$ over $\mathcal{X}$ that captures the probability that a point belongs to class $c$. The distribution on $\mathcal{C}$ is denoted by $\eta$. Let $c^+, c^-$ denote the positive and negative latent classes drawn from $\eta$; thus, $D_{c^+}$ and $D_{c^-}$ are the distributions used to sample positive and negative samples, respectively. The process for generating an unlabeled sample $z = (x, x^+, \{x_i^-\}_{i=1}^K) \in U$ is as follows:
1. Draw two latent classes $(c^+, c^-) \sim \eta^2$;
2. Draw two positive samples $(x, x^+) \sim D_{c^+}^2$ and $K$ negative samples $\{x_i^- \sim D_{c^-} \mid i \in [K]\}$.
To set up the labeled dataset $S$ for the binary scenario, we build the binomial distribution $\eta_{sup}$ by fixing two classes $c^+, c^-$:
$$\eta_{sup}(c^+) = \frac{\eta(c^+)}{\eta(c^-) + \eta(c^+)}, \qquad \eta_{sup}(c^-) = \frac{\eta(c^-)}{\eta(c^-) + \eta(c^+)}.$$
We fix $y_{c^+} = +1$ and $y_{c^-} = -1$, then generate a labeled sample $(x, y) \in S$ as follows:
1. Draw a class $c \sim \eta_{sup}$ and set the label $y = y_c$;
2. Draw a sample $x \sim D_c$.

Loss function  The learning process is divided into two steps: minimizing the contrastive loss on the encoder and fine-tuning the downstream layer using a supervised loss. We focus on the logistic loss $\ell(v) = \log_2(1 + \sum_j \exp(-v_j))$ for $v \in \mathbb{R}^K$. Thus, the contrastive loss (Chen et al., 2020; He et al., 2020) associated with the encoder $f$ in this framework is defined as follows:
$$L_{un}(f) = \mathbb{E}_{c^+, c^- \sim \eta^2}\; \mathbb{E}_{x, x^+ \sim D_{c^+}^2,\; x_j^- \sim D_{c^-}} \Big[ \ell\Big( \big\{ f(x)^\top \big( f(x^+) - f(x_j^-) \big) \big\}_{j=1}^{K} \Big) \Big]. \tag{1}$$
For linear evaluation, the supervised learning algorithm is given the mapped dataset $\hat{S} := \{(f(x_i), y_i)\}_{i=1}^n$ and returns a predictor $g : \mathbb{R}^d \to \mathbb{R}$. The label of $\hat{x} \in \hat{S}$ is obtained from $\hat{y} = \mathrm{sign}(g(\hat{x}))$, $\hat{y} \in \{-1, 1\}$. The logistic loss in (2) is $\ell(v) = \log_2(1 + \exp(-v))$ for $v \in \mathbb{R}$.
We aim to minimize the supervised loss as follows:
$$L_{sup}(g \circ f) = \mathbb{E}_{c \sim \eta_{sup}}\; \mathbb{E}_{x \sim D_c} \big[ \ell\big( y_c\, g(f(x)) \big) \big]. \tag{2}$$

3.2. Supervised Verification
In this section, we set up the verification problem for supervised learning. For simplicity, we outline the simplified binary scenario here. The supervised algorithm is given the labeled dataset $S = \{(x_i, y_i)\}_{i=1}^n$. Let the dimension of the NN output be $d_L = 1$. $\mathcal{H}$ contains a class of supervised predictors $h : \mathcal{X} \to \mathbb{R}$ with $h := \varphi_L(x)$. Thus, the label of $x$ is predicted by $\hat{y} = \mathrm{sign}(h(x))$, $\hat{y} \in \{-1, 1\}$.

Verification problem  We refer to $x' = x + \delta$ as an adversarial sample of $x$ for classifier $h$ if $h$ correctly classifies $x$ but assigns a different label to $x'$. Because many powerful attack methods (Goodfellow et al., 2015; Carlini & Wagner, 2017) and adversarial training frameworks (Madry et al., 2018; Zhang et al., 2019) use the $\ell_\infty$ norm, this paper also focuses on the setting in which $\delta$ satisfies the $\ell_\infty$ norm constraint $\|\delta\|_\infty \le \epsilon$. We say that model $h$ is $\ell_\infty^\epsilon$-verified at $(x, y)$ if it correctly classifies both $x$ and $x'$ as $y$ for any $x' \in B_\infty(x, \epsilon)$, i.e., there are no adversarial samples around $x$. The supervised verification at $(x, y)$, $y \in \{-1, 1\}$, seeks the solution of the following optimization problem:
$$\tilde{h}(x, y, \epsilon) := \min_{x'} \; y\, h(x') \quad \text{s.t.} \quad \varphi_k(x') = W_k \hat{\varphi}_{k-1}(x') + b_k,\; k \in [L]; \quad \hat{\varphi}_k(x') = \sigma(\varphi_k(x')),\; k \in [L-1]; \quad h(x') = \varphi_L(x'); \quad x' \in B_\infty(x, \epsilon). \tag{3}$$
If $\tilde{h} \le 0$, some $x' \in B_\infty(x, \epsilon)$ fools the model into producing an incorrect label; $h$ is $\ell_\infty^\epsilon$-verified if $\tilde{h}(x, y, \epsilon) > 0$. A complete verifier aims to solve (3) and calculate $\tilde{h}$ exactly. Unfortunately, complete verification is proven to be an NP-complete problem (Katz et al., 2017; Sinha et al., 2018). Therefore, many previous works (Wong & Kolter, 2018; Zhang et al., 2018; Xu et al., 2020) propose incomplete verifiers that relax the non-convex part of the NN to derive a lower bound $\underline{h} \le \tilde{h}$. If $\underline{h}(x, y, \epsilon) > 0$ is given by the incomplete verifier, model $h$ is also $\ell_\infty^\epsilon$-verified at $(x, y)$.

Robust radius  Whether $h$ is $\ell_\infty^\epsilon$-verified at $(x, y)$ depends on the radius of the largest $\ell_\infty$ ball centered at $x$ in which $h$ does not change its prediction. This radius is called the robust radius, which is formally defined as follows:
$$R(h; x, y) := \inf_{\mathrm{sign}(h(x')) \ne y} \|x' - x\|_\infty = \sup_{\epsilon} \; \epsilon \quad \text{s.t. } \tilde{h}(x, y, \epsilon) > 0. \tag{4}$$
If $\mathrm{sign}(h(x)) \ne y$, then $R(h; x, y) := 0$. It is natural to regard the robust radius as a robustness metric. Recall that computing the robust radius is an NP-hard problem due to the need for complete verification. We can instead derive a tight lower bound of $R$ given by $\underline{h}$, referred to as the certified radius, which is formally defined as
$$\underline{R}(h; x, y) := \sup_{\epsilon} \; \epsilon \quad \text{s.t. } \underline{h}(x, y, \epsilon) > 0. \tag{5}$$
The certified radius satisfies $0 \le \underline{R}(h; x, y) \le R(h; x, y)$.

4. RVCL: Robustness Verification Framework for Contrastive Learning
In this section, we first introduce the verification problem on supervised downstream tasks by simply modifying the supervised verification problem (3), and further present several weaknesses of adopting (3) in CL. We then propose the novel RVCL framework to resolve these issues.

4.1. Verification Problem for Linear Evaluation
By defining the supervised verification problem on linear evaluation, we can regard the robust radius $R$ as a proxy robustness metric for CL. We denote the encoder $f : \mathbb{R}^{d_0} \to \mathbb{R}^{d_L}$ with $f := \varphi_L(x)$, where $d_0$ and $d_L$ are the dimensions of the encoder's input and output, while $W_{LE} \in \mathbb{R}^{1 \times d_L}$ and $b_{LE} \in \mathbb{R}$ are the weight and bias of the downstream linear predictor $g(x)$.
The optimization problem for linear evaluation is defined by simply modifying the constraints of (3), as follows:
$$\tilde{g}(x, y, \epsilon) := \min_{x'} \; y\, g(x') \quad \text{s.t.} \quad \varphi_k(x') = W_k \hat{\varphi}_{k-1}(x') + b_k,\; k \in [L]; \quad \hat{\varphi}_k(x') = \sigma(\varphi_k(x')),\; k \in [L-1]; \quad g(x') = W_{LE}\, \varphi_L(x') + b_{LE}; \quad x' \in B_\infty(x, \epsilon). \tag{6}$$
The difference between (3) and (6) is that there is no activation function $\sigma(\cdot)$ between the encoder and the downstream layer. There is no barrier to applying incomplete verifiers to (6). Thus, the definitions of the robust radius $R_{LE}(g; x, y)$ and certified radius $\underline{R}_{LE}(g; x, y)$ on the downstream task are analogous to (4) and (5), respectively. We can therefore regard $R_{LE}$ as a proxy robust radius at data point $x$. However, this approach has serious problems:
1. $R_{LE}$ cannot be computed directly without a label.
2. Even if we have the label to compute $R_{LE}$, and use $R_{LE}$ to evaluate the model robustness, we do not know whether the robustness benefits from the encoder or the downstream layer.
These problems motivate us to propose RVCL, a novel framework for verifying the robustness of encoders without the need for labels and downstream tasks.

4.2. RVCL Framework
Many existing works have studied the supervised verification problem stated in 3.2. However, robustness verification for CL has received less research attention. In this section, we present the formal definition of the robustness verification problem on the encoder $f$, after which we provide two robustness metrics to study the performance of the CL encoder and the incomplete verifier.

4.2.1. VERIFICATION PROBLEM FOR CL
The core concept behind supervised verification is that the points in the small ball $B_\infty(x, \epsilon)$ should have the same label as $x$. Inspired by this idea, we define the conditions under which a disturbance successfully attacks the encoder. Given a positive sample $x^+$, let the negative sample $x^-$ be the attack target of $x^+$. We hope that the points $x' \in B_\infty(x^+, \epsilon)$ will be more similar to $x^+$ than to any negative sample $x^-$, while the instance-wise attack algorithm generates an adversarial sample $x' \in B_\infty(x^+, \epsilon)$ with attack strength $\epsilon$ in order to fool the model into judging $x'$ as similar to $x^-$.

[Figure 1. $\theta$ is the angle between features given by $f$. If $\theta_1 < \theta_2$, $f$ is not attacked successfully by the adversarial sample $x'$.]

If the instance similarity satisfies $\rho(f(x^+), f(x')) > \rho(f(x^-), f(x'))$, then $x'$ is similar to $x^+$ (i.e., $\theta_1 < \theta_2$ in Figure 1), which means that the encoder $f$ is not successfully attacked by $x'$. We say that the encoder $f$ is $\ell_\infty^\epsilon$-verified at $(x^+, x^-)$ if $x'$ is more similar to $x^+$ than to $x^-$ for any $x' \in B_\infty(x^+, \epsilon)$, i.e., there are no adversarial samples similar to $x^-$ around $x^+$. Note that the comparison of instance similarities has an equivalent form:
$$\rho(f(x^+), f(x')) > \rho(f(x^-), f(x')) \tag{7}$$
$$\iff \big( \tilde{\rho}(f(x^+)) - \tilde{\rho}(f(x^-)) \big)^\top f(x') > 0. \tag{8}$$
Judging whether or not (8) is True can be regarded as part of forward propagation; thus, we can define the optimization problem for CL by adding a linear layer after $f$ with weight $W_{CL} = (\tilde{\rho}(f(x^+)) - \tilde{\rho}(f(x^-)))^\top$.

Definition 4.1 (Verification problem for CL). Given a positive sample $x^+ \in \mathbb{R}^{d_0}$ and a negative sample $x^- \in \mathbb{R}^{d_0}$, the feature encoder $f : \mathbb{R}^{d_0} \to \mathbb{R}^{d_L}$ with $f := \varphi_L(x)$, and the $\ell_2$ normalization $\tilde{\rho}(u) = u / \|u\|_2$, for any fixed $\epsilon$, the robustness verification problem for CL is defined as follows:
$$\tilde{f}(x^+, x^-, \epsilon) := \min_{x'} \; W_{CL} f(x') \quad \text{s.t.} \quad \varphi_k(x') = W_k \hat{\varphi}_{k-1}(x') + b_k,\; k \in [L]; \quad \hat{\varphi}_k(x') = \sigma(\varphi_k(x')),\; k \in [L-1]; \quad W_{CL} = \big( \tilde{\rho}(f(x^+)) - \tilde{\rho}(f(x^-)) \big)^\top \in \mathbb{R}^{1 \times d_L}; \quad f(x') = \varphi_L(x'); \quad x' \in B_\infty(x^+, \epsilon). \tag{9}$$
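To make the objective in Definition 4.1 concrete, the following is a minimal NumPy sketch (ours, not part of the released code) that evaluates the margin $W_{CL} f(x')$ for a single candidate point $x'$; `encoder` is a placeholder for any feature encoder, and an actual verifier bounds this quantity over the entire ball $B_\infty(x^+, \epsilon)$ rather than evaluating it at one point.

```python
import numpy as np

def l2_normalize(u, eps=1e-12):
    # \tilde{rho}(u) = u / ||u||_2
    return u / (np.linalg.norm(u) + eps)

def cl_margin(encoder, x_pos, x_neg, x_prime):
    # W_CL = (rho~(f(x+)) - rho~(f(x-)))^T, as in (8) and Definition 4.1
    w_cl = l2_normalize(encoder(x_pos)) - l2_normalize(encoder(x_neg))
    # Positive value <=> x' is judged more similar to x+ than to x- (condition (7))
    return float(w_cl @ encoder(x_prime))
```

Condition (7) holds for $x'$ exactly when this margin is positive; verification asks whether the minimum of this margin over the whole ball remains positive.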
Moreover, the robust radius $R_{CL}$ and certified radius $\underline{R}_{CL}$ for CL are defined as follows:
$$R_{CL}(f; x^+, x^-) := \inf_{\rho(f(x'), f(x^+)) < \rho(f(x'), f(x^-))} \|x' - x^+\|_\infty = \sup_{\epsilon} \; \epsilon \quad \text{s.t. } \tilde{f}(x^+, x^-, \epsilon) > 0,$$
$$\underline{R}_{CL}(f; x^+, x^-) := \sup_{\epsilon} \; \epsilon \quad \text{s.t. } \underline{f}(x^+, x^-, \epsilon) > 0, \tag{10}$$
where $\underline{f}$ is the verified lower bound of $\tilde{f}$ given by the verifier; thus, $0 \le \underline{R}_{CL}(f; x^+, x^-) \le R_{CL}(f; x^+, x^-)$. For verified prediction, $\underline{f}(x^+, x^-, \epsilon) > 0$ for a given strength $\epsilon$ implies that $f$ is $\ell_\infty^\epsilon$-verified at $(x^+, x^-)$; the pseudocode is presented as PREDICT in Appendix C.2. To compute $\underline{R}_{CL}$, we can apply binary search, because $\underline{f}(x^+, x^-, \epsilon)$ is non-increasing in $\epsilon$; the pseudocode is presented as CERTIFY in Appendix C.2.

[Figure 2. Illustration of RVCL. $\|x' - x^+\|_\infty$ must be larger than the robust radius $R_{CL}$ if $x'$ is an instance-wise adversarial sample. In this case, the latent class of $x^+$ is "dog", while the feature of $x'$ is similar to that of $x^-$, which is an "elephant". The certified radius $\underline{R}_{CL}$ is provided by the incomplete verifier and is a lower bound of the robust radius $R_{CL}$. It is verified that no instance-wise adversarial sample exists in $B_\infty(x^+, \underline{R}_{CL})$.]

Note that (9) and (10) are both defined directly on the CL encoder without reference to any labels or downstream tasks, which resolves the issue articulated in 4.1.

4.2.2. ROBUSTNESS METRICS FOR CL
This subsection provides two robustness metrics used to study the robustness of the CL encoder and the performance of incomplete verifiers.

Average certified radius (ACR)  For supervised verification, ACR (Zhai et al., 2020) is an important metric used in evaluating robustness. Specifically, it is the average of the certified radius over the test dataset. We can define it directly on the supervised downstream task: $ACR_{LE} := \frac{1}{|S_{test}|} \sum_{(x,y) \in S_{test}} \underline{R}_{LE}(g; x, y)$, where $S_{test}$ is a labeled test dataset in which every $(x, y)$ is correctly classified by $g$. However, $ACR_{LE}$ still suffers from the problems discussed in 4.1. We therefore define the ACR for CL based on $\underline{R}_{CL}$, which directly reflects the robustness of the CL encoder without labels:

Definition 4.2 (Average certified radius for CL). Given an unlabeled test dataset $U_{test}$ generated following 3.1, with $z = (x^+, \{x_i^-\}_{i=1}^K) \in U_{test}$ and $\underline{R}_{CL}$ defined in (10), the average certified radius for CL is defined as follows:
$$ACR_{CL} := \frac{1}{K\,|U_{test}|} \sum_{z \in U_{test}} \sum_{i=1}^{K} \underline{R}_{CL}(f; x^+, x_i^-). \tag{11}$$

Certified instance accuracy  For supervised verification, certified accuracy is a metric used to evaluate the performance of incomplete verifiers. Wang et al. (2021) state that a verifier is stronger if its certified accuracy is a tighter lower bound of the supervised robust accuracy. Since there is no definition of robust accuracy for the CL encoder, we propose a novel robust accuracy for CL, without labels and downstream tasks, called robust instance accuracy, based on (7):

Definition 4.3 (Robust instance accuracy). Given an unlabeled test dataset $U_{test}$ with $z = (x^+, x^-) \in U_{test}$, we use the instance-wise PGD attack (Kim et al., 2020) to generate the adversarial point $x' \in B_\infty(x^+, \epsilon)$ by maximizing the contrastive loss (1). The robust instance accuracy with strength $\epsilon$ is defined as follows:
$$A^{\epsilon}_{CL} = \frac{1}{|U_{test}|} \sum_{z \in U_{test}} \mathbb{1}\big[\rho(f(x'), f(x^+)) - \rho(f(x'), f(x^-)) > 0\big]. \tag{12}$$
We then define the certified instance accuracy with strength $\epsilon$, which is the fraction of the test dataset for which $f$ is $\ell_\infty^\epsilon$-verified at $(x^+, x^-)$, i.e., $\underline{f} > 0$.

Definition 4.4 (Certified instance accuracy). Given an unlabeled test dataset $U_{test}$ with $z = (x^+, x^-) \in U_{test}$.
The certified instance accuracy with strength $\epsilon$ is defined as
$$\underline{A}^{\epsilon}_{CL} = \frac{1}{|U_{test}|} \sum_{z \in U_{test}} \mathbb{1}\big[\underline{f}(x^+, x^-, \epsilon) > 0\big]. \tag{13}$$
$f$ being $\ell_\infty^\epsilon$-verified at $(x^+, x^-)$ is a sufficient but not necessary condition for correctly recognizing the adversarial point $x'$ generated by a specific attack algorithm with attack strength $\epsilon$. This means that whenever the indicator condition in (13) holds, the condition in (12) also holds, but not vice versa. Thus, (13) is a lower bound of (12).

Remark 4.5. The certified instance accuracy $\underline{A}^{\epsilon}_{CL}$ is used to compare verifiers on the CL encoder without labels. A smaller gap between the robust instance accuracy $A^{\epsilon}_{CL}$ and $\underline{A}^{\epsilon}_{CL}$ implies a stronger incomplete verifier (see the experiments in 6.3). However, as $\underline{A}^{\epsilon}_{CL}$ is a function of a fixed attack strength $\epsilon$, it is difficult to compare the robustness of two models unless one is uniformly better than the other for all strengths $\epsilon$. Thus, $ACR_{CL}$ is a more suitable choice than $\underline{A}^{\epsilon}_{CL}$ for evaluating the robustness of CL encoders.

5. Theoretical Analysis for Robust Radius
What is the relationship between the robust radii $R_{CL}$ and $R_{LE}$? This section demonstrates that $R_{CL}$ is an upper bound of $R_{LE}$, which is further verified by the experimental results in 6.1. To provide the main insights, we first consider the situation in which only one positive sample is used. Formally, given an unlabeled sample $z = (x^+, \{x_i^-\}_{i=1}^K)$, we introduce the margin distance of $x^+$ as half of the minimum distance between $f(x^+)$ and the $f(x_i^-)$, defined as $M := \min_{i \in [K]} D_i$, where $D_i := (1 - \rho(f(x^+), f(x_i^-)))/2$. The idea is to estimate the lower tail of the distribution of $M$ by fitting to the $\lambda$ smallest $D_i$ of the negative samples $x_i^-$. We can then use this estimated distribution to produce the probability of a new point $x$ falling into the margin of $x^+$, which can be interpreted as the probability of $x$ being a positive sample of $x^+$. $x$ is classified as a positive sample of $x^+$ if it is inside the margin of $x^+$ with high probability. To estimate the distribution of the margin distance, we turn to the Fisher-Tippett-Gnedenko theorem in extreme value theory (see the complete statement in Appendix A.1).

Lemma 5.1 (Fisher-Tippett-Gnedenko theorem (Coles et al., 2001)). Let $X_1, X_2, \ldots$ be a sequence of independent random variables with common distribution function $F$. Let $M_n = \sup(X_1, \ldots, X_n)$. If there exist sequences $a_n > 0$, $b_n \in \mathbb{R}$ such that
$$\lim_{n \to \infty} P\!\left( \frac{M_n - b_n}{a_n} \le z \right) = G(z),$$
where $G$ is a non-degenerate distribution function, then $G$ belongs to either the Gumbel family, the Fréchet family or the Reverse Weibull family.

The theorem states that the maximum of a sequence of i.i.d. random variables, after proper normalization, can only converge to one of three possible distributions.

Theorem 5.2 (Margin distribution). Assume a continuous non-degenerate margin distribution exists. The distribution of the margin distance $M$ is then given by the Reverse Weibull distribution, and the probability of $x$ being a positive sample of $x^+$ is given by the following:
$$\Psi(x; x^+, \alpha, \sigma) = \exp\!\left( -\left( \frac{1 - \rho(f(x), f(x^+))}{\sigma} \right)^{\alpha} \right),$$
where $\rho(f(x), f(x^+))$ is the instance similarity between $x$ and $x^+$, and $\alpha, \sigma > 0$ are the Weibull shape and scale parameters, obtained by fitting to the $\lambda$ smallest margin distances $D_i$. See the proof in Appendix A.2.

Theorem 5.2 demonstrates that the probability of $x$ being a positive sample can be given by a Reverse Weibull distribution fitted on finitely many samples. This enables us to compare the robust radius of the encoder and that of the downstream task.
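As a concrete illustration of Theorem 5.2, the sketch below (our own helper names; SciPy's `weibull_min` standing in for the Reverse Weibull fit, not the authors' implementation) fits the tail model to the $\lambda$ smallest margin distances and evaluates the resulting probability $\Psi$; for internal consistency the constant factor $1/2$ in $D_i$ is dropped on both sides, which only rescales $\sigma$.

```python
import numpy as np
from scipy.stats import weibull_min

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def fit_margin_tail(f_pos, f_negs, lam=20):
    # Distances 1 - rho(f(x+), f(x_i^-)); fit to the `lam` smallest ones.
    d = np.sort([1.0 - cosine(f_pos, fn) for fn in f_negs])[:lam]
    # A Reverse Weibull for the maxima of -D_i corresponds to a Weibull
    # (weibull_min) on the positive distances with the location fixed at 0.
    alpha, _, sigma = weibull_min.fit(d, floc=0)
    return alpha, sigma

def positive_probability(f_x, f_pos, alpha, sigma):
    # Psi(x; x+, alpha, sigma) = exp( -((1 - rho(f(x), f(x+))) / sigma)^alpha )
    z = 1.0 - cosine(f_x, f_pos)
    return float(np.exp(-((z / sigma) ** alpha)))
```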
Intuitively, if the classifier can predict correctly with higher confidence, this implies that the classifier provides better certified robustness.

Theorem 5.3 (Robust radius bound). Given an encoder $f : \mathcal{X} \to \mathbb{R}^d$ and an unlabeled sample $z = (x^+, \{x_i^-\}_{i=1}^K)$, the downstream predictor $g : \mathbb{R}^d \to \mathbb{R}$ is trained on $\hat{S} = \{(f(x^+), y_{c^+}), (f(x_i^-), y_{c^-})_{i=1}^K\}$. Then, for the different negative samples $x_i^-$, we have
$$R_{CL}(f; x^+, x_i^-) \ge R_{LE}(g; x^+, y_{c^+}).$$

Proof sketch. The probability of the downstream layer predicting $x$ as positive can be given by $\Psi_{LE}$, which is fitted from the margin distances $\{D_i\}_{i=1}^K$ computed from $\hat{S}$. $\Psi_{CL}$ is fitted from a specific $D_i$, since $R_{CL}(f; x^+, x_i^-)$ is the robust radius for a specific pair of positive and negative samples. Figure 3 plots the cumulative distribution functions (CDFs) of $\Psi_{CL}$ and $\Psi_{LE}$. $\Psi \to 1$ when $x \to x^+$, which means $x$ is very likely to be a positive sample of $x^+$. If there exist negative samples between $x_i^-$ and $x^+$ (the additional negative samples marked in Figure 3), then $\Psi_{LE}$ will fit to these negative samples, which makes its CDF grow more slowly than that of $\Psi_{CL}$, i.e., $\Psi_{CL}(x) \ge \Psi_{LE}(x)$. Lemma A.4 provides the correspondence that a higher probability of being a positive sample implies a larger robust radius, which recovers the theorem statement. See the complete proof in Appendix A.3.

[Figure 3. Probability of $x$ being a positive sample; the marked points denote negative samples other than $x_i^-$.]

Remark 5.4. Theorem 5.3 implies that improving the robustness of the CL encoder enlarges the robust radius $R_{CL}$, which can benefit the robustness of the downstream layer by providing a large upper bound on $R_{LE}$. A larger $R_{CL}$ implies that the model will achieve higher robust performance on the downstream task; thus, it is reasonable to regard $R_{CL}$ as a robustness metric.

In the above, we discuss the decision margin by analyzing the case with a single $x^+$ and multiple $x^-$. Here, we discuss the robust radius of different $x^+$ in the following theorem:

Theorem 5.5. Given an encoder $f : \mathcal{X} \to \mathbb{R}^d$, two positive samples $x^+_1, x^+_2$ and one negative sample $x^-$, if $\rho(f(x^+_1), f(x^-)) \ge \rho(f(x^+_2), f(x^-))$, then $R_{CL}(f; x^+_1, x^-) \le R_{CL}(f; x^+_2, x^-)$.

See the proof in Appendix A.4. Theorem 5.5 proves that if $f(x^+)$ is similar to $f(x^-)$, it is easy to recognize the adversarial sample $x'$ of $x^+$ as a negative sample. In 6.2, we empirically show that a vague image for which it is difficult to identify the class characteristic is easy to attack.

6. Experiments
In this section, we verify the effectiveness of our proposed RVCL by means of numerical experiments. More specifically, 6.1 demonstrates the effectiveness of the average certified radius for CL. 6.2 shows that $ACR_{CL}$ can evaluate the anti-disturbance ability of individual images. 6.3 compares the strength of incomplete verifiers by means of certified instance accuracy. 6.4 illustrates the verified lower bound $\underline{f}$ given by the verifiers. 6.5 provides a sensitivity analysis for the parameters in RVCL. Due to space limitations, we compare the efficiency of different verifiers in Appendix E.2.

Set-up.  Our main experiments utilize four architectures: Base and Deep from Wang et al. (2021), and CNN-A and CNN-B from Dathathri et al. (2020), from which the last layer is removed to form the CL encoders. SimCLR (Chen et al., 2020) and RoCL (Kim et al., 2020) are used in this paper for CL training and contrastive AT, respectively. Contrastive AT is trained with instance-wise adversarial samples with different attack strengths $\epsilon_{train}$; $\epsilon_{train} = 0$ indicates that the encoder is trained with benign images. (A sketch of such instance-wise adversarial example generation is shown below.)
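The following PyTorch sketch illustrates what instance-wise adversarial sample generation of this kind looks like; the InfoNCE-style loss, the temperature of 0.5, and the use of a single positive view are our simplifying assumptions rather than RoCL's exact formulation.

```python
import torch
import torch.nn.functional as F

def instance_wise_pgd(encoder, x, x_view, x_negs, eps, alpha=0.007, steps=10):
    """l_inf PGD that perturbs x away from its positive view and towards
    the negatives by maximizing an InfoNCE-style instance loss (a sketch)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        z_adv = F.normalize(encoder(x_adv), dim=-1)        # (B, d)
        z_pos = F.normalize(encoder(x_view), dim=-1)       # (B, d)
        z_neg = F.normalize(encoder(x_negs), dim=-1)       # (K, d)
        pos = (z_adv * z_pos).sum(dim=-1, keepdim=True)    # (B, 1)
        neg = z_adv @ z_neg.t()                            # (B, K)
        logits = torch.cat([pos, neg], dim=1) / 0.5        # temperature: assumed 0.5
        labels = torch.zeros(x.size(0), dtype=torch.long, device=x.device)
        loss = F.cross_entropy(logits, labels)             # positive sits at index 0
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()       # ascend to maximize the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```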
Using the same datasets as previous deterministic verification works, all CL encoders are trained on MNIST (LeCun & Cortes, 2010) and CIFAR-10 (Krizhevsky & Hinton, 2009). We utilize two incomplete verifiers with different verified tightness in the RVCL framework: CROWN (Zhang et al., 2018) and CBC (β-CROWN (Wang et al., 2021) adapted for CL). A more detailed introduction to incomplete verifiers is presented in Appendix C.1. Further details of the models and experimental settings are presented in Appendix D. The code can be found in the supplementary materials and on GitHub.

6.1. Average certified radius
In this subsection, we compute the average certified radius (ACR, 4.2.2) over 100 test samples (i.e., $|S_{test}| = |U_{test}| = 100$), following previous verification works. For $ACR_{CL}$, we set the number of negative samples to $K = 10$. In the interests of efficiency, we use CROWN to compute $ACR_{CL}$, which is discussed further in Appendix E.2. To determine whether the model robustness benefits from the encoder or the linear classifier, we fix the encoder and fine-tune the downstream layer on benign images, i.e., without perturbations. For the empirical robustness test, we report supervised robust accuracy on the whole test dataset; this is the downstream classification accuracy over adversarial samples generated by a label-wise PGD attack (Madry et al., 2018) with attack strength $\epsilon_{test}$. Note that a larger value of $\epsilon_{train}$ implies that a more robust encoder is obtained by contrastive AT.

The results in Figure 4(a,b) show that $ACR_{CL}$ and $ACR_{LE}$ increase with increasing $\epsilon_{train}$ on both datasets. Meanwhile, those in Figure 4(c,d) show that, for different values of $\epsilon_{test}$, the robust accuracy of the downstream classifier increases as $\epsilon_{train}$ increases. We therefore conclude that:
1. It is effective to measure robustness using $ACR_{CL}$ without labels and downstream tasks, because both $ACR_{CL}$ and the supervised metrics ($ACR_{LE}$ and robust accuracy) grow consistently with increasing $\epsilon_{train}$.
2. Figure 4(a,b) shows that $ACR_{CL}$ is larger than $ACR_{LE}$ for the same $\epsilon_{train}$, which supports our Theorem 5.3. Theorem 5.3 states that $R_{CL}$ is an upper bound of $R_{LE}$, while $ACR_{CL}$ and $ACR_{LE}$ are related to $R_{CL}$ and $R_{LE}$, respectively.
3. Figure 4(a,b) shows that a robust encoder can significantly improve the model's robust performance on downstream tasks, since $ACR_{LE}$ grows with increasing $\epsilon_{train}$ even though the downstream layer is learned on benign images.

[Figure 4. (a,b) $ACR_{CL}$ and $ACR_{LE}$ on MNIST and CIFAR-10 with different values of $\epsilon_{train}$. (c,d) Supervised robust accuracy of the downstream classifier. On MNIST, the attack strengths $\epsilon_{test}$ are 0.15 and 0.2; on CIFAR-10, they are 8/255 and 16/255. "Clean" represents testing with benign images. (a,c) run on CNN-A, (b,d) run on CNN-B.]

6.2. Anti-disturbance ability of images
To study the robustness property of each image, in this subsection we compute $ACR_{CL}$ for a single test sample; i.e., $|U_{test}| = 1$, so $ACR_{CL} := \frac{1}{K} \sum_{i=1}^{K} \underline{R}_{CL}(f; x^+, x_i^-)$. We sample two images from CIFAR-10, as shown in Figure 5(b). The upper image is labeled as "deer" and is vague, making it difficult to identify the latent class. The lower image is labeled as "dog" and is much clearer than the deer. We calculate $\underline{R}_{CL}$ with fifty negative samples ($K = 50$).
Our findings show that $\underline{R}_{CL}$ of the deer image is significantly smaller than that of the dog image; this means that the distance between the feature of the deer and its negative samples is smaller than that between the feature of the dog and its negative samples, which supports our Theorem 5.5. We can therefore conclude that the anti-disturbance ability of the dog image is stronger than that of the deer image, i.e., the dog image is $\ell_\infty^\epsilon$-verified with a larger $\epsilon$ than the deer image. These results verify that $ACR_{CL}$ is able to quantify the anti-disturbance ability of images. We sample 10 more images from CIFAR-10 and plot the images and their $ACR_{CL}$ in Figure 5(a). This leads to the same conclusion: a vague image for which it is difficult to identify the latent class has a low $ACR_{CL}$. We further visualize the distribution of $ACR_{CL}$ for 500 images from CIFAR-10 by calculating $ACR_{CL}$ with $K = 20$, and additionally provide a kernel density plot of the distribution (see Figure 5(c)). About 90% of the images' $ACR_{CL}$ values lie within the range [0.044, 0.093], concentrated around 0.07.

[Figure 5. (a,b,c) run on CNN-B with $\epsilon_{train} = 4.4/255$. (a) $ACR_{CL}$ for 10 images randomly chosen from CIFAR-10; $K = 50$. (b) $\underline{R}_{CL}$ for two images (deer and dog) from CIFAR-10; $K = 50$. (c) Frequency distribution histogram and kernel density estimation (KDE) of $ACR_{CL}$ for 500 images from CIFAR-10; $K = 20$.]

6.3. Certified instance accuracy
In this subsection, we study the performance of incomplete verifiers on the CL encoder via the certified instance accuracy ($\underline{A}^{\epsilon}_{CL}$ in Definition 4.4). Table 1 summarizes some of the results on CIFAR-10. Due to space limitations, we provide detailed settings and explanations for further experimental results (on MNIST) in Appendix E.1.

[Table 1. Comparison of certified instance accuracy across various networks (Base, Deep, CNN-A, CNN-B), training strengths $\epsilon_{train}$ (0, 4.4/255, 8.8/255) and attack strengths $\epsilon_{test}$ on CIFAR-10, with $|U_{test}| = 100$ test samples. For each setting, the table reports the robust instance accuracy obtained by the instance-wise PGD attack and the certified instance accuracy given by CBC and by CROWN.]

The two incomplete verifiers we utilize here are CROWN and CBC; CBC is the state-of-the-art verifier and is more powerful than CROWN for supervised verification (Wang et al., 2021). $\underline{A}^{\epsilon}_{CL}$ is a lower bound of the robust instance accuracy $A^{\epsilon}_{CL}$ (Definition 4.3, obtained by the instance-wise PGD attack (Kim et al., 2020)); a tighter lower bound indicates a stronger incomplete verifier. Table 1 shows that the gap between $\underline{A}^{\epsilon}_{CL}$ given by CBC and $A^{\epsilon}_{CL}$ given by PGD is smaller than that between CROWN and PGD, which shows that CBC is a stronger verifier than CROWN. We further illustrate the verified lower bound $\underline{f}$ given by these two incomplete verifiers in 6.4, from which the same conclusions can be drawn. All these results demonstrate the efficacy of our proposed RVCL: a stronger supervised verifier still achieves a tighter certified radius in the RVCL framework.
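For completeness, the following small NumPy helpers (illustrative only; the per-pair verifier bounds and attack similarities are assumed to be precomputed) show how the two accuracies compared in Table 1 are obtained.

```python
import numpy as np

def certified_instance_accuracy(verified_lower_bounds):
    # Definition 4.4: fraction of (x+, x-) pairs with f_lower(x+, x-, eps) > 0.
    lb = np.asarray(verified_lower_bounds, dtype=float)
    return float((lb > 0).mean())

def robust_instance_accuracy(sim_to_pos, sim_to_neg):
    # Definition 4.3: fraction of pairs for which the PGD point x' is still
    # closer (in cosine similarity) to x+ than to x-.
    diff = np.asarray(sim_to_pos, dtype=float) - np.asarray(sim_to_neg, dtype=float)
    return float((diff > 0).mean())
```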
6.4. Tightness of verification

[Figure 6. Comparing the tightness of verifiers on 100 test samples from MNIST and CIFAR-10. (a) MNIST, $\epsilon_{train} = 0.3$, $\epsilon_{test} = 0.3$, run on CNN-A; (b) CIFAR-10, $\epsilon_{train} = 4.4/255$, $\epsilon_{test} = 4/255$, run on CNN-B. We plot the verified lower bound $\underline{f}(x^+, x^-, \epsilon)$ given by CBC and CROWN against the PGD adversarial upper bound $\hat{f}$ (the line $y = x$ corresponds to PGD). Some points exceed the plotted axis limits.]

The instance-wise PGD attack (Kim et al., 2020) provides an upper bound on the minimum distortion, $\hat{f} \ge \tilde{f}$, while RVCL provides the lower bound, $\tilde{f} \ge \underline{f}(x^+, x^-, \epsilon)$. It should be noted that, unlike in supervised verification (Dathathri et al., 2020; Wang et al., 2021), all distortion here is instance-wise. $(x^+, x^-)$ is verified if $\underline{f}(x^+, x^-, \epsilon) > 0$, meaning that $x^+$ cannot be disturbed towards $x^-$ with attack strength $\epsilon_{test}$. The closer the verified lower bound $\underline{f}(x^+, x^-, \epsilon)$ is to the PGD upper bound $\hat{f}$ (the line $y = x$ in Figure 6), the stronger the verifier. One hundred test samples ($|U_{test}| = 100$) are used to illustrate the tightness of the verification. As Figure 6 shows, the points above the dotted line are successfully verified. CBC achieves tight verification across all samples, and consistently outperforms CROWN on both MNIST and CIFAR-10. This result is consistent with supervised verification in Wang et al. (2021), which demonstrates the effectiveness of RVCL.

6.5. Sensitivity analysis

[Figure 7. Left: sensitivity of $ACR_{CL}$ to the feature dimension. Right: sensitivity of $ACR_{CL}$ to the number of negative samples $K$. Experiments run on CNN-B with different $\epsilon_{train}$ on CIFAR-10.]

Feature dimension  We investigate the influence of the feature dimension on $ACR_{CL}$ (see the left panel of Figure 7). From the results, we observe that $ACR_{CL}$ increases slightly with growing feature dimension, then remains stable for dimensions larger than 150. The results illustrate that $ACR_{CL}$ is not sensitive to the feature dimension.

Number of negative samples  We validate the influence of the number of negative samples $K$ on $ACR_{CL}$ (see the right panel of Figure 7). The result shows that $ACR_{CL}$ is not sensitive to $K$ for different values of $\epsilon_{train}$, which means that we can use small values of $K$ to efficiently evaluate the model robustness or the anti-disturbance ability of an image.

7. Conclusion
In this paper, we tackle the robustness verification problem for CL without any labels, and accordingly propose a novel RVCL framework that does not depend on any class labels, downstream tasks or specific attack algorithms. We then use extreme value theory to reveal the quantitative relationship between the robust radius of the CL encoder and that of the downstream task. All our experiments show that RVCL is an effective robustness framework for CL encoders, and that it can also be used to evaluate the anti-disturbance ability of images. Moreover, our experimental results verify our theory. We believe that RVCL offers a novel perspective from which to understand robustness in contrastive learning.

Acknowledgements
This work is supported by the National Natural Science Foundation of China under Grant 61976161.

References
Alayrac, J., Uesato, J., Huang, P., Fawzi, A., Stanforth, R., and Kohli, P.
Are labels required for improving adversarial robustness? In Neur IPS, pp. 12192 12202, 2019. Anderson, R., Huchette, J., Ma, W., Tjandraatmadja, C., and Vielma, J. P. Strong mixed-integer programming formulations for trained neural networks. Math. Program., 183(1):3 39, 2020. Bunel, R., Turkaslan, I., Torr, P. H. S., Kohli, P., and Mudigonda, P. K. A unified view of piecewise linear neural network verification. In Neur IPS, pp. 4795 4804, 2018. Carlini, N. and Wagner, D. A. Towards evaluating the robustness of neural networks. In SP, pp. 39 57, 2017. Carmon, Y., Raghunathan, A., Schmidt, L., Duchi, J. C., and Liang, P. Unlabeled data improves adversarial robustness. In Neur IPS, pp. 11190 11201, 2019. Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. E. A simple framework for contrastive learning of visual representations. In ICML, volume 119, pp. 1597 1607, 2020. Coles, S., Bawa, J., Trenner, L., and Dorazio, P. An introduction to statistical modeling of extreme values, volume 208. 2001. Dathathri, S., Dvijotham, K., Kurakin, A., Raghunathan, A., Uesato, J., Bunel, R., Shankar, S., Steinhardt, J., Goodfellow, I. J., Liang, P., and Kohli, P. Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming. In Neur IPS, 2020. Doersch, C., Gupta, A., and Efros, A. A. Unsupervised visual representation learning by context prediction. In ICCV, pp. 1422 1430, 2015. Ehlers, R. Formal verification of piece-wise linear feedforward neural networks. In Automated Technology for Verification and Analysis, volume 10482, pp. 269 286, 2017. Fan, L., Liu, S., Chen, P.-Y., Zhang, G., and Gan, C. When does contrastive learning preserve adversarial robustness from pretraining to finetuning? In Neur IPS, 2021. Gibert, X., Patel, V. M., and Chellappa, R. Sequential score adaptation with extreme value theory for robust railway track inspection. In ICCV, pp. 131 138, 2015. Gidaris, S., Singh, P., and Komodakis, N. Unsupervised representation learning by predicting image rotations. In ICLR, 2018. Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In ICLR, 2015. Gowal, S., Huang, P., van den Oord, A., Mann, T., and Kohli, P. Self-supervised adversarial robustness for the low-label, high-data regime. In ICLR, 2021. He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R. B. Momentum contrast for unsupervised visual representation learning. In CVPR, pp. 9726 9735, 2020. Ho, C. and Vasconcelos, N. Contrastive learning with adversarial examples. In Neur IPS, 2020. Jiang, Z., Chen, T., Chen, T., and Wang, Z. Robust pretraining by adversarial contrastive learning. In Neur IPS, 2020. Jing, L. and Tian, Y. Self-supervised visual feature learning with deep neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell., 43(11):4037 4058, 2021. Katz, G., Barrett, C. W., Dill, D. L., Julian, K., and Kochenderfer, M. J. Reluplex: An efficient SMT solver for verifying deep neural networks. In CAV, volume 10426, pp. 97 117, 2017. Kim, M., Tack, J., and Hwang, S. J. Adversarial selfsupervised contrastive learning. In Neur IPS, 2020. Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. In ICLR, 2015. Krizhevsky, A. and Hinton, G. Learning multiple layers of features from tiny images. 2009. Le Cun, Y. and Cortes, C. Mnist handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2010. Li, L., Qi, X., Xie, T., and Li, B. Sok: Certified robustness for deep neural networks. Co RR, abs/2009.04131, 2020. 
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018. Moosavi-Dezfooli, S., Fawzi, A., and Frossard, P. Deepfool: A simple and accurate method to fool deep neural networks. In CVPR, pp. 2574 2582, 2016. Noroozi, M. and Favaro, P. Unsupervised learning of visual representations by solving jigsaw puzzles. In ECCV, volume 9910, pp. 69 84, 2016. Nozawa, K., Germain, P., and Guedj, B. Pac-bayesian contrastive unsupervised representation learning. In UAI, volume 124, pp. 21 30, 2020. Robustness Verification for Contrastive Learning Rudd, E. M., Jain, L. P., Scheirer, W. J., and Boult, T. E. The extreme value machine. IEEE Trans. Pattern Anal. Mach. Intell., 40(3):762 768, 2018. Saunshi, N., Plevrakis, O., Arora, S., Khodak, M., and Khandeparkar, H. A theoretical analysis of contrastive unsupervised representation learning. In ICML, volume 97, pp. 5628 5637, 2019. Scheirer, W. J., Rocha, A., Micheals, R. J., and Boult, T. E. Meta-recognition: The theory and practice of recognition score analysis. IEEE Trans. Pattern Anal. Mach. Intell., 33(8):1689 1695, 2011. Scheirer, W. J., Kumar, N., Belhumeur, P. N., and Boult, T. E. Multi-attribute spaces: Calibration for attribute fusion and similarity search. In CVPR, pp. 2933 2940, 2012. Schmidt, L., Santurkar, S., Tsipras, D., Talwar, K., and Madry, A. Adversarially robust generalization requires more data. In Neur IPS, pp. 5019 5031, 2018. Sinha, A., Namkoong, H., and Duchi, J. C. Certifying some distributional robustness with principled adversarial training. In ICLR, 2018. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. J., and Fergus, R. Intriguing properties of neural networks. In ICLR, 2014. Tian, Y., Krishnan, D., and Isola, P. Contrastive multiview coding. In ECCV, volume 12356, pp. 776 794, 2020. Tjeng, V., Xiao, K. Y., and Tedrake, R. Evaluating robustness of neural networks with mixed integer programming. In ICLR, 2019. Uesato, J., O Donoghue, B., Kohli, P., and van den Oord, A. Adversarial risk and the dangers of evaluating against weak attacks. In ICML, volume 80, pp. 5032 5041, 2018. Wang, S., Zhang, H., Xu, K., Lin, X., Jana, S., Hsieh, C., and Kolter, J. Z. Beta-crown: Efficient bound propagation with per-neuron split constraints for complete and incomplete neural network verification. In Neur IPS, 2021. Weng, T., Zhang, H., Chen, P., Yi, J., Su, D., Gao, Y., Hsieh, C., and Daniel, L. Evaluating the robustness of neural networks: An extreme value theory approach. In ICLR, 2018. Wong, E. and Kolter, J. Z. Provable defenses against adversarial examples via the convex outer adversarial polytope. In ICML, volume 80, pp. 5283 5292, 2018. Wong, E., Schmidt, F. R., Metzen, J. H., and Kolter, J. Z. Scaling provable adversarial defenses. In Neur IPS, pp. 8410 8419, 2018. Wu, Z., Xiong, Y., Yu, S. X., and Lin, D. Unsupervised feature learning via non-parametric instance discrimination. In CVPR, pp. 3733 3742, 2018. Xu, K., Shi, Z., Zhang, H., Wang, Y., Chang, K., Huang, M., Kailkhura, B., Lin, X., and Hsieh, C. Automatic perturbation analysis for scalable certified robustness and beyond. In Neur IPS, 2020. Yang, G., Duan, T., Hu, J. E., Salman, H., Razenshteyn, I. P., and Li, J. Randomized smoothing of all shapes and sizes. In ICML, volume 119, pp. 10693 10705, 2020. Zhai, R., Cai, T., He, D., Dan, C., He, K., Hopcroft, J. E., and Wang, L. Adversarially robust generalization just requires more unlabeled data. 
Co RR, abs/1906.00555, 2019. Zhai, R., Dan, C., He, D., Zhang, H., Gong, B., Ravikumar, P., Hsieh, C., and Wang, L. MACER: attack-free and scalable robust training via maximizing certified radius. In ICLR, 2020. Zhang, H., Weng, T., Chen, P., Hsieh, C., and Daniel, L. Efficient neural network robustness certification with general activation functions. In Neur IPS, pp. 4944 4953, 2018. Zhang, H., Yu, Y., Jiao, J., Xing, E. P., Ghaoui, L. E., and Jordan, M. I. Theoretically principled trade-off between robustness and accuracy. In ICML, volume 97, pp. 7472 7482, 2019. Robustness Verification for Contrastive Learning A.1. Extreme Value Theory Before provding the proofs of main results, we first provide two important lemmas in extreme value theory (EVT). Lemma A.1 (Fisher-Tippett-Gnedenko theorem (Coles et al., 2001)). Let X1, X2, . . . be a sequence of independent random variables with common distribution function F. Let Mn = max(X1, . . . , Xn). If there exists a sequence an > 0, bn R such that lim n P(Mn bn an z) = G(z), (A.1) where G is a non-degenerate distribution function, then G belongs to either the Gumbel family (Type I), the Fr echet family (Type II) or the Reverse Weibull family (Type III) with their CDFs as follows: Gumbel family (Type I): G(z) = exp exp z b Fr echet family (Type II): G(z) = n 0, if z < b, exp z b a α , if z b, Reverse Weibull family (Type III): G(z) = n exp b z a α , if z < b, 1, if z b, where a > 0, b R and α > 0 are the scale, location and shape parameters, respectively. Lemma A.1 states that the rescaled sample maxima (Mn bn)/an converge in distribution to a variable that has a distribution within one of three families. Furthermore, these three families can be combined into a single family called generalized extreme value (GEV) distribution, which is a family of continuous probability distributions developed within extreme value theory. The Gumbel, Fr echet and Reverse Weibull families are special cases of GEV distribution. Lemma A.2 (Generalized Extreme Value (GEV) distribution (Coles et al., 2001)). Let X1, X2, . . . be a sequence of i.i.d. samples from the distribution function F. Let Mn = max(X1, . . . , Xn). If there exists a sequence an > 0, bn R such that limn P( Mn bn an z) = G(z), then if G is a non-degenerate distribution function, it belongs to the class of generalized extreme value (GEV) distributions with , z R : 1 + ξ z µ where ξ R, µ R and σ > 0 are the shape, location and scale parameters, respectively. As the special case, the Reverse Weibull family in Lemma A.1 can be derived by Lemma A.2, which let ξ < 0 and the upper endpoint of F be denoted by b, then α = 1/ξ > 0. A.2. Proof of Theorem 5.2 Theorem 5.2 (Margin distribution). Assume a continuous non-degenerate margin distribution exists. The distribution for margin distance M is then given by the Reverse Weibull distribution. The probability of x being a positive sample of x+ is given by the following: Ψ(x; x+, α, σ) = exp 1 ρ(f(x), f(x+)) where ρ(f(x), f(x+)) is the instance similarity between x and x+. α, σ > 0 are Weibull shape and scale parameters, obtained from fitting to the λ smallest margin distances Di. Proof. From the assume we know that G(z) in Lemma A.1 exists. Since Lemma A.1 applies to maxima, we transform the variables via M = maxi [K] Di, Di := (1 ρ(f(x+), f(x i ))/2. 
Because Di is bounded ( Di < 0), so the asymptotic distribution of M converges to the Reverse Weibull distribution: αo , if z < 0, 1, if z 0, (A.4) Robustness Verification for Contrastive Learning where α > 0, b is the upper endpoint of F, σ is the scale parameters. b = 0 in (A.4) since M is bounded above by 0 as a negative distance. We use margin distances Di of the λ closest samples with x+ to estimate the parameters α and σ, which means to estimate c W of the distribution funstion W. The margin distance between x and x+ defined as 1 ρ(f(x), f(x+)). Then we focus on the probability of x included in the margin of x+, which can be written as: P(1 ρ(f(x), f(x+)) < min(D1, . . . , Dn)) = P( min(D1, . . . , Dn) < ρ(f(x), f(x+)) 1) = P(max( D1, . . . , Dn) < ρ(f(x), f(x+)) 1) = P(M < ρ(f(x), f(x+)) 1) = c W(ρ(f(x), f(x+)) 1). Since ρ(f(x), f(x+)) 1 < 0, we can rewrite (A.5) as (A.3). Overall, we conclude our proof. A.3. Proof of Theorem 5.3 Theorem 5.3 (Robust radius bound). Given an encoder f : X Rd and an unlabeled sample z = (x+, {x i }K i=1), the downstream predictor g : Rd R is trained on b S = {(f(x+), yc+), (f(x i ), yc )K i=1}. Then, for different negative samples x i , we have RCL(f; x+, x i ) RLE(g; x+, yc+). (A.6) Before we formally prove Theorem 5.3, we first provide the following two lemmas. Lemma A.3 focuses on a model for the k largest order statistics. It extends the result in Lemma A.1 to extreme order statistics, by defining M (k) n = k-th largest of {X1, . . . , Xn} and further identifying the limiting behavior of this variable, for fixed k, as n . Lemma A.3 implies that, if the k-th largest order statistic is normalized in exactly the same way as the maximum, then its limiting distribution is of the form given by (A.8). Lemma A.3 (k-th largest order statistic (Coles et al., 2001)). Let X1, X2, . . . be a sequence of i.i.d. samples from the distribution function F. Let Mn = max(X1, . . . , Xn). If there exists a sequence an > 0, bn R such that limn P( Mn bn an z) = G(z) for a non-degenerate distribution function G, so that G is the GEV distribution function given by (A.2), then, for fixed k, lim n P((M (k) n bn)/an) = Gk(z) (A.7) on {z : 1 + ξ(z µ) σ > 0}, where Gk(z) = exp{ τ(z)} with τ(z) = 1 + ξ z µ Intuitively, the classifier predicting correctly with higher confidence implies that the classifier can provide better certified robustness. Lemma A.4 provides this relationship directly. Lemma A.4 (Robust radius (Yang et al., 2020)). The robust radius in any norm is at least 1 Φ(p)dp, (A.9) where Φ(p) := sup v =1 sup U Rd:q(U)=p limr 0 q(U rv) p r , λ is the probability that the binary classifier predicts the right label under perturbation, q(U) is the measure of U under q, i.e. q(U) = Prδ q(δ U), v is the perturbation vector. Finally, we prove Theorem 5.3. Robustness Verification for Contrastive Learning Proof. ΨLE is obtained from fitting to margin distances {Di}K i=1, Di := (1 ρ(f(x+), f(x i ))/2. From Lemma A.1 we know that (A.3) is the Reverse Weibull distribution and can be written as the form of (A.2): ΨLE(x; x+, α, σ) = exp with z = D = ρ(f(x), f(x+)) 1. RCL(f; x+, x i ) is defined on the positive and negative sample pair (x+, x i ); it means that ΨCL is fitted from specific (x+, x i ). We denote D(k) for x i is the k-th largest order statistic of { D1, . . . , DK}. By Lemma A.3, the distribution function ΨCL for D(k) can be written as: ΨCL(x; x+, α, σ, k) = exp{ τ(z)} with τ(z) = 1 + ξ z µ σ 1/ξ , z = ρ(f(x), f(x+)) 1. ξ, µ, σ are the same with (A.10). 
Let x Dc+ be the positive test sample of x+. As we discuss in 3.1, the latent label for unlabeled positive sample can be given by yc+. In 4, we propose that judging positive sample correctly on CL encoder and downstream task can be transformed to judge whether or not WCLf(x) > 0 and yc+ g(x) > 0 are True, respectively. Thus, for every negative sample x i , we have P(WCLf(x) > 0 | x) P(yc+ g(x) > 0 | x) = ΨCL(x; x+, α, σ, k) ΨLE(x; x+, α, σ) = exp{ τ(z)} = exp{ τ(z)} with τ(z) = 1 + ξ z µ σ 1/ξ, z = ρ(f(x), f(x+)) 1. Equality holds if and only if k = 1. Recall that x = x+ + δ, δ ϵ, i.e. x B (x+, ϵ), which is in the constraints of (6) and (9). By Lemma A.4, we have: RCL(f; x+, x i ) RLE(g; x+, yc+) = Z 1/2 1 ΨCL(x;x+,α,σ,k) 1 Φ(p)dp Z 1/2 1 ΨLE(x;x+,α,σ) = Z 1 ΨLE(x;x+,α,σ) 1 ΨCL(x;x+,α,σ,k) which recovers the theorem statement. Equality holds if and only if k = 1. A.4. Proof of Theorem 5.5 Theorem 5.5. Given an encoder f : X Rd, two positive samples x+ 1 , x+ 2 and one negative sample x , if ρ(f(x+ 1 ), f(x )) ρ(f(x+ 2 ), f(x )), then RCL(f; x+ 1 , x ) RCL(f; x+ 2 , x ). (A.14) Proof. In this proposition, we consider the situation in which multiple positive samples x+ and only one negative sample x are used. Formally, given an unlabeled sample z = ({x+ i }K i=1, x ), we define the margin distance of x as M := mini [K] D i , where D i := (1 ρ(f(x+ i ), f(x ))/2. Robustness Verification for Contrastive Learning We denote the lower tail of the distribution of M as Ψneg. Similar to the idea of Theorem 5.2, we use Ψneg to produce the probability of x falling into the margin of x , which can be interpreted as the probability of x being a negative sample. From Lemma A.1, we know that Ψneg also converges to the Reverse Weibull distribution, and can be written to the form as following: Ψneg(x; x , α, σ) = exp 1 ρ(f(x), f(x )) where ρ(f(x), f(x )) is the instance similarity between x and x . α, σ > 0 are Weibull shape and scale parameters, obtained from fitting to the λ smallest margin distances D i . Cumulative distribution function Ψneg is monotonic increasing. Given two positive samples x+ 1 , x+ 2 , if ρ(f(x+ 1 ), f(x )) ρ(f(x+ 2 ), f(x )), we have: Ψneg(x+ 1 ; x , α, σ) Ψneg(x+ 2 ; x , α, σ) (A.16) x being the positive sample of x+ means that x is more similar to x+ than to x . From Lemma A.4 we have: RCL(f; x+ 1 , x ) RCL(f; x+ 2 , x ) = Z 1/2 Ψneg(x+ 1 ;x ,α,σ) 1 Φ(p)dp Z 1/2 Ψneg(x+ 2 ;x ,α,σ) = Z Ψneg(x+ 1 ;x ,α,σ) Ψneg(x+ 2 ;x ,α,σ) which recovers the theorem statement. Equality holds if and only if ρ(f(x+ 1 ), f(x )) = ρ(f(x+ 2 ), f(x )). B. Difference with CLEVER CLEVER (Cross-Lipschitz Extreme Value for n Etwork Robustness) (Weng et al., 2018) estimates the robust radius R using extreme value theory (EVT). In this paper, we utilize EVT in a totally different way compared with CLEVER: 1. To produce the probability of x being a positive sample of x+, we utilize EVT to estimate the lower tail of the margin distance (Theorem 5.2); thus, we can compare the probability given by the CL encoder and the downstream task. While CLEVER focuses on estimating R by proposing a sampling based approach with EVT to estimate the local Lipschitz constant; it is based on a theoretical analysis of formal robustness guarantee with Lipschitz continuity assumption. 2. We use EVT to reveal the quantitative relationship between the robust radius of the CL encoder and that of the downstream task. While CLEVER only focuses on the robust radius for supervised verification. C. 
C. Complete Implementation

We first introduce the two incomplete verifiers, with different verified tightness, used by RVCL. We then present the complete implementation of verified prediction and certification.

C.1. Incomplete verifiers

Due to the nonlinear activations σ(·), the feasible set of (6) and (9) is nonconvex. One intuitive idea is to perform a convex relaxation of the feasible set in order to build incomplete verifiers. This paper considers ReLU networks with CROWN (Zhang et al., 2018), a method that relaxes the nonconvex equality constraints φ̂_k(·) = σ(φ_k(·)) to convex inequality constraints. Let l_k^{(j)} and u_k^{(j)} be the lower and upper bounds of φ_k^{(j)}, i.e. l_k^{(j)} < φ_k^{(j)} < u_k^{(j)}, k ∈ [L]. Given the ReLU activation function σ(y) = max(y, 0), CROWN relaxes the ReLU with linear constraints:

$$\alpha_k^{(j)}\, \phi_k^{(j)} \;\le\; \hat{\phi}_k^{(j)} \;\le\; \frac{u_k^{(j)}}{u_k^{(j)} - l_k^{(j)}}\big(\phi_k^{(j)} - l_k^{(j)}\big), \qquad \text{where } 0 \le \alpha_k^{(j)} \le 1.$$

After this convex relaxation, (6) and (9) can be solved efficiently, as follows:

Lemma C.1 (CROWN bound (Zhang et al., 2018)). Given an L-layer NN φ_L : R^{d_0} → R with weights W_k and biases b_k, and pre-activation bounds l_k^{(j)} < φ_k^{(j)} < u_k^{(j)} (k ∈ [L], j ∈ [d_k]), for x' ∈ B(x, ϵ) we have:

$$\min_{x'} W_L\, \hat{\phi}_{L-1}(x') + b_L \;\ge\; \min_{x'} c^{\top} x' + c_0, \tag{C.18}$$

where c and c_0 can be computed from W_k, b_k, l_k^{(j)}, u_k^{(j)}.

Another incomplete verifier, stronger than CROWN, is β-CROWN (Wang et al., 2021), the state-of-the-art verification method. β-CROWN uses a few steps of gradient ascent to make the bounds as tight as possible, but suffers from a high time cost.

Lemma C.2 (β-CROWN bound (Wang et al., 2021)). Given an L-layer NN φ_L : R^{d_0} → R with weights W_k and biases b_k, pre-activation bounds l_k^{(j)} < φ_k^{(j)} < u_k^{(j)}, x' ∈ B(x, ϵ), and split constraints z ∈ Z, we have:

$$\min_{x',\, z \in Z} W_L\, \hat{\phi}_{L-1}(x') + b_L \;\ge\; \max_{\beta \ge 0} \min_{x'} (a + P\beta)^{\top} x' + q^{\top}\beta + c_0, \tag{C.19}$$

where a, P, q and c_0 can be computed from W_k, b_k, l_k^{(j)}, u_k^{(j)}, and β is the multiplier of the Lagrange function.

We adapt these two supervised verifiers, which have different verification tightness, in order to show that a stronger verifier still achieves a tighter certified radius R_CL within the RVCL framework, which illustrates the efficacy of RVCL. The related experimental results are in Section 6.3. The incomplete-verifier procedure is denoted by INCOMPLETEVERIFIER in the next subsection; it returns the verified lower bound f̲(x^+, x^-, ϵ) of the function f̃ in (9). A sketch of the per-neuron ReLU relaxation is given below.
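The following is a minimal sketch (ours, not the CROWN code; names and the self-check are illustrative) of the per-neuron linear relaxation displayed above: given pre-activation bounds l ≤ φ ≤ u, it returns the slopes and intercepts of a linear lower and upper bound on ReLU(φ).

```python
import numpy as np


def relu_relaxation(l, u, alpha=0.5):
    """Linear relaxation of hat_phi = ReLU(phi) given pre-activation bounds l <= phi <= u.

    Returns (a_low, b_low, a_up, b_up) such that
        a_low * phi + b_low <= ReLU(phi) <= a_up * phi + b_up   for all phi in [l, u].
    """
    if u <= 0:                       # inactive neuron: ReLU is identically 0
        return 0.0, 0.0, 0.0, 0.0
    if l >= 0:                       # active neuron: ReLU is the identity
        return 1.0, 0.0, 1.0, 0.0
    # unstable neuron: upper bound is the chord, lower bound is alpha * phi, 0 <= alpha <= 1
    slope_up = u / (u - l)
    return alpha, 0.0, slope_up, -slope_up * l


# quick sanity check on a grid of pre-activation values
l, u = -1.0, 2.0
a_lo, b_lo, a_up, b_up = relu_relaxation(l, u, alpha=0.3)
phi = np.linspace(l, u, 101)
relu = np.maximum(phi, 0.0)
assert np.all(a_lo * phi + b_lo <= relu + 1e-9)
assert np.all(relu <= a_up * phi + b_up + 1e-9)
```

For an unstable neuron (l < 0 < u), the upper bound is the chord through (l, 0) and (u, u), while the lower bound α·φ with 0 ≤ α ≤ 1 matches the constraint in C.1; CROWN-style verifiers propagate such per-neuron bounds backwards through all layers to obtain the linear function in (C.18).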
C.2. Pseudocode

For verified prediction in CL, (x+, x−) being verified as correct with strength ϵ means that f is ℓϵ-verified at (x+, x−). Similar to the discussion of supervised verification in Section 3.2, if the bound f̲(x+, x−, ϵ) returned by INCOMPLETEVERIFIER is greater than 0, then (x+, x−) is verified to be correct. The verified prediction procedure is given in pseudocode as PREDICT. We use PREDICT to obtain the certified instance accuracy (A^ϵ_CL in Definition 4.4) on the test dataset U_test, since A^ϵ_CL is the fraction of the test dataset for which f is ℓϵ-verified at (x+, x−), i.e., PREDICT(f, x+, x−, ϵ) = True.

In addition to prediction, we are also interested in the certified radius R_CL for a given (x+, x−). Note that f̲(x+, x−, ϵ) is non-increasing in ϵ because of the inf operator, so we can apply binary search to obtain R_CL; the procedure is given as CERTIFY. More precisely, we check whether f is ℓϵ-verified at (x+, x−) for the current ϵ. If it is, then (x+, x−) is ℓϵ-verified for some ϵ larger than the current one, so we increase ϵ; otherwise, we decrease ϵ. The final value of ϵ is the certified radius R_CL.

Pseudocode: prediction and certification for CL

  # judge whether f is ℓϵ-verified at (x+, x−)
  function PREDICT(f, x+, x−, ϵ)
    Input: encoder f, positive sample x+, negative sample x−, perturbation bound ϵ
    Output: True if f is ℓϵ-verified at (x+, x−); False otherwise
    f̲ = INCOMPLETEVERIFIER(f, x+, x−, ϵ)
    if f̲ > 0 then return True
    else return False

  # compute the certified radius of (x+, x−) on encoder f
  function CERTIFY(f, x+, x−, τ, Rl, Ru)
    Input: encoder f, positive sample x+, negative sample x−, tolerance τ, lower bound Rl, upper bound Ru
    Output: certified radius RCL
    Initialization: τ = 1e−6, Rl = 0, Ru = 1
    while |Ru − Rl| > τ do
      ϵ = (Rl + Ru) / 2
      # if f is ℓϵ-verified at (x+, x−), the answer is larger than the current ϵ
      if PREDICT(f, x+, x−, ϵ) then Rl = ϵ
      else Ru = ϵ
    end while
    return ϵ

D. Experimental Settings

D.1. Datasets and model architectures

Datasets. For model training, we use MNIST (LeCun & Cortes, 2010) and CIFAR-10 (Krizhevsky & Hinton, 2009). MNIST is a dataset of 28 × 28 grayscale images of handwritten digits between 0 and 9, containing 60,000 training images and 10,000 test images in 10 classes. CIFAR-10 contains 50,000 training and 10,000 test images in 10 classes; each CIFAR-10 image is a 32 × 32 color image.

Model architectures. Table D.1 summarizes the CNN encoder architectures. Each layer (except the last linear layer) is followed by a ReLU activation function. Base and Deep are used in Wang et al. (2021); CNN-A and CNN-B are used in Dathathri et al. (2020). To study the sensitivity to the feature dimension, the output dimension of the encoder is varied from 50 to 250 on CNN-B.

Table D.1. Model structures used in our experiments. For example, Conv(1, 8, 4) stands for a convolutional layer with 1 input channel, 8 output channels and a 4 × 4 kernel; Linear(754, 100) stands for a fully connected layer with 754 input features and 100 output features.

  Dataset    Model name   Encoder structure
  MNIST      Base         Conv(1, 8, 4) - Conv(8, 16, 4) - Linear(784, 100)
             CNN-A        Conv(1, 16, 4) - Conv(16, 32, 4) - Linear(1568, 100)
  CIFAR-10   Base         Conv(3, 8, 4) - Conv(8, 16, 4) - Linear(1024, 100)
             Deep         Conv(3, 8, 4) - Conv(8, 8, 3) - Conv(8, 8, 3) - Conv(8, 8, 4) - Linear(412, 100)
             CNN-A        Conv(3, 16, 4) - Conv(16, 32, 4) - Linear(2048, 100)
             CNN-B        Conv(3, 32, 5) - Conv(32, 128, 4) - Linear(8192, 100)

D.2. Training setup

All NNs are trained in the verification-agnostic setting (Dathathri et al., 2020), i.e., without using any tricks to promote verifiability. We use the models described in Appendix D.1 as the base encoder network and a 2-layer multi-layer perceptron as the projection head (Chen et al., 2020). We set the step size of the instance-wise attack to α = 0.007 and the number of PGD maximization iterations to K = 10. For the rest, we follow a setup similar to SimCLR (Chen et al., 2020) and RoCL (Kim et al., 2020). For optimization, we train the encoder for 500 epochs with the Adam optimizer (Kingma & Ba, 2015) and a learning rate of 0.001. For learning rate scheduling, the learning rate is dropped by a factor of 10 every 100 epochs. The batch size during training is 256.

D.3. Evaluation setup

Linear evaluation. We train the downstream linear layer on top of the frozen encoder; the training images are clean. We train the linear layer for 100 epochs with a learning rate of 0.001 and the cross-entropy loss, and the learning rate is dropped by a factor of 10 every 50 epochs.
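For concreteness, a minimal sketch of this linear-evaluation loop is given below. It is ours, not the released code; in particular, the use of Adam for the linear head and the names `encoder`, `train_loader`, `feat_dim` are assumptions for illustration.

```python
import torch
import torch.nn as nn


def linear_evaluation(encoder, train_loader, feat_dim=100, num_classes=10, device="cpu"):
    """Train a linear classifier on top of a frozen encoder using clean images."""
    encoder.eval()                                   # freeze the encoder
    for p in encoder.parameters():
        p.requires_grad_(False)

    head = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=50, gamma=0.1)  # drop lr by 10x every 50 epochs
    criterion = nn.CrossEntropyLoss()

    for epoch in range(100):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                z = encoder(x)                       # frozen features
            loss = criterion(head(z), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        sched.step()
    return head
```

Keeping the encoder frozen ensures that any robustness measured downstream is attributable to the learned representation rather than to further feature learning.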
Robust test. To evaluate adversarial robustness, we use the white-box projected gradient descent (PGD) attack. We use an ℓ∞ attack with 20 iteration steps. We set ϵ_test ∈ {0, 0.15, 0.2} for MNIST and ϵ_test ∈ {0, 8/255, 16/255} for CIFAR-10.

Incomplete verifiers. Unless otherwise specified, CBC (β-CROWN (Wang et al., 2021) for CL), working as an incomplete verifier, uses three minutes per image.

D.4. Training efficiency

Our experiments are conducted on a 64-bit Ubuntu Linux workstation with a 10-core Intel Xeon Silver CPU (2.20 GHz) and NVIDIA GeForce RTX 2080 Ti GPUs with 11 GB of graphics memory. For adversarial contrastive learning of the base encoder, training for 500 epochs takes about 12 hours on MNIST CNN-A and on CIFAR-10 CNN-B with a single GPU, and training the downstream linear layer for 100 epochs takes about 150 minutes.

E. Additional Experiments

E.1. Certified instance accuracy

Table E.2. Complete comparison of certified instance accuracy across various networks and attack strengths ϵ_test. All accuracies are computed on 100 test samples from MNIST and CIFAR-10; CBC uses 3 minutes per sample. The columns are the dataset, ϵ_neg, ϵ_test, model, ϵ_train, the robust instance accuracy under PGD, and the certified instance accuracy under CBC and CROWN. The clearly recoverable MNIST rows are reproduced below; the CIFAR-10 rows (ϵ_neg = 16/255 and 24/255) lost their row labels in extraction and are omitted.

  Dataset   ϵ_neg   ϵ_test   Model   ϵ_train   PGD    CBC    CROWN
  MNIST     0.3     0.1      Base    0         20%    2%     0%
                             CNN-A   0         6%     0%     0%
                             Base    0.1       100%   100%   100%
                             CNN-A   0.1       100%   100%   99%
            0.5     0.3      Base    0.3       100%   98%    42%
                             CNN-A   0.3       100%   85%    3%

Appendix E.1 summarizes the complete results of the certified instance accuracy provided by our proposed RVCL on MNIST and CIFAR-10. To control the similarity ρ(f(x+), f(x−)) between x+ and x−, we generate the negative sample x− via an instance-wise PGD attack (Kim et al., 2020) with strength ϵ_neg, which is much larger than ϵ_test (a sketch of this attack is given after Remark E.1). In Section 6.3, we present some of the results on CIFAR-10 with ϵ_neg = 16/255. As shown by the results in Appendix E.1, the gap between CBC and PGD is smaller than that between CROWN and PGD on both MNIST and CIFAR-10. This is consistent with the experimental results in Wang et al. (2021), and further illustrates the effectiveness of our proposed RVCL. From Appendix E.1, we can also make the following observations and remarks.

Influence of ϵ_test. If ϵ_test is small, the certified instance accuracy A^ϵ_CL of both CROWN and CBC approaches the robust instance accuracy given by instance-wise PGD. However, the gap between CROWN and CBC becomes large as ϵ_test increases. These results show that CBC is a more powerful verifier than CROWN; the reason is that CBC optimizes the intermediate-layer bounds and then iteratively tightens the lower bound. We can further observe that instance-wise PGD successfully attacks the model on all images of CIFAR-10 under ϵ_test = 8/255.

Remark E.1. The results in Appendix E.1 often show a high robust instance accuracy. The direct reason is that the instance-wise attack is more difficult than the label-wise attack. Theoretically, Theorem 5.3 shows that the robust radius R_CL ≥ R_LE. This implies that one may successfully attack an image label-wise with a small ϵ_test, but it is nearly impossible to attack it instance-wise with the same small ϵ_test, which results in a high robust instance accuracy. Figure 4(b) experimentally confirms this conclusion: ACR_CL is an order of magnitude larger than ACR_LE. Nevertheless, without loss of effectiveness, we can still evaluate the tightness of verifiers by comparing the gap between the certified instance accuracy and the robust instance accuracy.
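Below is a minimal sketch of an instance-wise ℓ∞ PGD attack in the spirit of Kim et al. (2020), used both to generate negative samples with strength ϵ_neg and to measure robust instance accuracy. This is our illustration, not the released implementation; in particular, the cosine-similarity objective and the function name are assumptions.

```python
import torch
import torch.nn.functional as F


def instance_wise_pgd(encoder, x_anchor, eps, alpha=0.007, steps=10):
    """l_inf PGD that pushes x away from the anchor in representation space.

    Maximizes 1 - cos(f(x), f(x_anchor)) subject to ||x - x_anchor||_inf <= eps.
    The cosine-similarity objective is assumed for illustration.
    """
    z_anchor = encoder(x_anchor).detach()
    x = x_anchor.clone().detach()
    x = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)  # random start in the ball

    for _ in range(steps):
        x.requires_grad_(True)
        loss = (1.0 - F.cosine_similarity(encoder(x), z_anchor, dim=-1)).mean()
        grad, = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x = x + alpha * grad.sign()                      # ascent step
            x = x_anchor + (x - x_anchor).clamp(-eps, eps)   # project back to the l_inf ball
            x = x.clamp(0.0, 1.0).detach()
    return x
```

Because the objective only uses the encoder's representations, the attack requires neither labels nor a downstream classifier, matching the instance-wise setting discussed above.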
Influence of ϵ_train. The certified instance accuracy A^ϵ_CL increases with increasing ϵ_train (consistent with Figure 4(c,d)), which demonstrates that A^ϵ_CL can also evaluate model robustness. However, as discussed in Remark 4.5, A^ϵ_CL is a function of the specific attack strength ϵ_test, so it is hard to compare the robustness of two models by comparing A^ϵ_CL across various values of ϵ_test. Thus, ACR_CL is a more suitable choice for comparing models with different robust performance.

E.2. Efficiency of incomplete verifiers

Figure E.1. ACR_CL calculated by CBC and CROWN on MNIST and CIFAR-10; |U_test| = 100, K = 10. (a) ACR_CL for MNIST; (b) ACR_CL for CIFAR-10.

Table E.3. Average calculation time of ACR_CL; the number of negative samples is K = 5, and the timeout is set to 0.3 s.

  Time (s)    CROWN   CBC
  MNIST       1.16    464.52
  CIFAR-10    3.66    921.78

Sections 6.1 and 6.2 use CROWN to compute the certified radius ACR_CL in the interest of efficiency. This subsection shows that CBC achieves an ACR_CL similar to CROWN's, but takes much more time. The timeout is set to 0.3 s for each step of the CBC binary search. CNN-A is run on MNIST and CNN-B on CIFAR-10. The experimental setting of Figure E.1 is the same as that in Section 6.1. Table E.3 reports the average time to calculate ACR_CL, as defined in Section 6.2, over 20 images.

The results in Figure E.1 show that the ACR_CL over U_test provided by CBC and by CROWN is nearly the same, and both show the same tendency in robust performance. This is because the time allowed for each step of the binary search is too short for CBC to tighten the lower bound. However, Table E.3 shows that CBC is about 300 times slower than CROWN, even with a small timeout for CBC. Therefore, we conclude that CBC is able to achieve a tight bound but is time-consuming when computing R_CL, and it is reasonable to use CROWN to compute the ACR_CL of models and images.
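To tie E.2 back to the CERTIFY procedure of Appendix C.2, here is a minimal sketch (ours; the `verifier` callable and its signature are assumptions) of how ACR_CL can be computed over a set of (x+, x−) pairs by running the binary-search certification for each pair and averaging the resulting radii:

```python
def average_certified_radius(verifier, encoder, test_pairs, tol=1e-6, r_max=1.0):
    """Average certified radius over (x+, x-) pairs via the binary search of CERTIFY.

    `verifier(encoder, x_pos, x_neg, eps)` is assumed to return the verified lower
    bound of (9); the pair is l_eps-verified when that bound is positive.
    """
    radii = []
    for x_pos, x_neg in test_pairs:
        lo, hi = 0.0, r_max
        while hi - lo > tol:
            eps = 0.5 * (lo + hi)
            if verifier(encoder, x_pos, x_neg, eps) > 0:   # still verified: radius is larger
                lo = eps
            else:                                          # falsified: radius is smaller
                hi = eps
        radii.append(lo)
    return sum(radii) / len(radii)
```

Swapping the `verifier` between a CROWN-style and a β-CROWN-style bound is what produces the CROWN and CBC columns compared in Table E.3; the per-step timeout simply caps how long the stronger verifier may spend tightening each bound.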