Robustness Verification for Contrastive Learning

Zekai Wang 1  Weiwei Liu 1

Abstract
Contrastive adversarial training has successfully improved the robustness of contrastive learning (CL). However, the robustness metric used in these methods is linked to attack algorithms, image labels and downstream tasks, all of which may affect the consistency and reliability of the robustness metric for CL. To address these problems, this paper proposes a novel Robustness Verification framework for Contrastive Learning (RVCL). Furthermore, we use extreme value theory to reveal the relationship between the robust radius of the CL encoder and that of the supervised downstream task. Extensive experimental results on various benchmark models and datasets verify our theoretical findings, and further demonstrate that our proposed RVCL is able to evaluate the robustness of both models and images. Our code is available at https://github.com/wzekai99/RVCL.

1. Introduction
While neural networks (NNs) have achieved remarkable performance in various applications, many studies (Goodfellow et al., 2015; Madry et al., 2018) have demonstrated that NNs are vulnerable when dealing with imperceptibly perturbed images. A rapidly growing body of work therefore aims to investigate how a robust NN model might be obtained. One successful method in this field is based on adversarial training (AT) (Moosavi-Dezfooli et al., 2016; Carlini & Wagner, 2017), which improves the robustness of NNs by augmenting the training set with adversarial samples (Szegedy et al., 2014). Schmidt et al. (2018) show that AT requires a large amount of data, while the labels of this data are expensive to obtain. Thus, several existing works have attempted to improve the robustness of adversarially trained models through the use of additional unlabeled data (Alayrac et al., 2019; Carmon et al., 2019; Zhai et al., 2019).

1School of Computer Science, Wuhan University, China. Correspondence to: Weiwei Liu.
Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022. Copyright 2022 by the author(s).

Recently, attempts have been made to combine AT with contrastive learning (CL) (Kim et al., 2020; Fan et al., 2021). CL (Chen et al., 2020; He et al., 2020) has demonstrated superior abilities in unsupervised learning: specifically, it can surpass the standard accuracy of supervised learning on downstream image classification tasks by utilizing a large-scale unlabeled dataset. It is therefore of primary interest to study the robust performance achieved by contrastive AT (Gowal et al., 2021). However, existing contrastive AT methods use an empirical robustness metric (e.g. robust accuracy) to evaluate the robustness of encoders, an approach that relies on attack algorithms, image labels and downstream tasks. Three key questions naturally arise from this kind of measurement:
1. Attack algorithm: Robust accuracy is tied to a specific attack algorithm (e.g. the PGD attack (Madry et al., 2018)); thus, the results may not be consistent under more powerful adversaries.
2. Image label: It is far-fetched to train with unlabeled images while evaluating the encoder with labeled images.
3. Downstream task: We do not know whether the robustness benefits from the encoder or the linear classifier.
In fact, the first question has already been considered in supervised learning.
Considering the first question in more detail: the robustness analysis of NNs is entangled with the specific attack algorithms used for evaluation (Weng et al., 2018). Most defense heuristics have subsequently been shown to fail against suitably powerful attack algorithms (Carlini & Wagner, 2017; Uesato et al., 2018). Even if the model is made robust to the attack algorithm used for evaluation, there is no guarantee that it will remain robust to other unseen attacks. This has encouraged researchers to develop robustness verification (Katz et al., 2017; Wong et al., 2018): i.e., classifiers whose prediction at a point x is verified to be constant within a neighborhood of x, regardless of what attack algorithm is applied. The key concept is to find the largest radius of this neighborhood, referred to as the robust radius, which is an important metric for studying the robustness of NNs. Unfortunately, previous works on robustness verification consider only supervised learning; i.e., true labels of the data are required for both the supervised classifier and the verifier.

A naive approach would be to apply a current verification framework to supervised downstream tasks, although this method would also suffer from the issues highlighted in Questions 2 & 3, which prompts us to ask the following questions: Can we design a robustness verification framework for contrastive learning that does not require class labels and downstream tasks? Is there any relationship between the robust radius of the CL encoder and that of the downstream task?

Our work attempts to conduct a rigorous and comprehensive study that addresses the above questions. Our main contributions are summarized below.
1. We propose RVCL, a novel Robustness Verification framework for Contrastive Learning. We then define the robust radius for CL such that no points inside this radius can be recognized as negative samples; this is a robustness metric for encoders that does not require the involvement of an attack algorithm, image label or downstream task.
2. We use extreme value theory to prove that the robust radius of the CL encoder is an upper bound of the robust radius of the downstream task. This implies that robust data representations rendered by the CL encoder (with a large robust radius) can benefit the model's robust performance on downstream tasks.
3. To verify the efficacy of the proposed RVCL, we validate our proposed framework on two verification benchmark datasets (MNIST and CIFAR-10). Our experimental results illustrate that the proposed RVCL is a suitable robustness metric for models without labels, and accordingly validate our theoretical analysis. Moreover, RVCL is also able to evaluate the anti-disturbance ability of distinct images.

2. Related work
Self-supervised Contrastive Learning  Self-supervised learning (Jing & Tian, 2021), which involves training models using unlabeled data and various pretext tasks, has become popular as a means of extracting feature representations for deep NNs. Early advances have been used to solve image jigsaw puzzles (Noroozi & Favaro, 2016), predict rotation angles (Gidaris et al., 2018), fill image patches (Doersch et al., 2015), etc. Recently, contrastive learning (CL) (Wu et al., 2018; Chen et al., 2020; He et al., 2020; Tian et al., 2020) has been proposed, which maximizes the agreement between positive samples while contrasting with negative samples, and has been shown to work well in learning effective representations.
Some theoretical works have also been proposed; for example, Saunshi et al. (2019) provide the first generalization bound for CL, while Nozawa et al. (2020) extend it by means of a PAC-Bayesian approach.

Contrastive Adversarial Training  Due to the brittleness of NNs when faced with tiny input perturbations, AT (Madry et al., 2018) is one of the most powerful robust training methods used to enhance model robustness. Several recent works (Kim et al., 2020; Ho & Vasconcelos, 2020; Jiang et al., 2020; Fan et al., 2021) have explored how to improve robustness using contrastive AT. To obtain more robust data representations, AT is applied to contrastive pretraining tasks following the contrastive adversarial pretraining + supervised finetuning paradigm. However, existing methods use an empirical robustness metric; the systematic study of the verified robustness of CL has been less explored.

Robustness Verification  In this paper, we focus on deterministic verification, following the taxonomy of Li et al. (2020). When the given input is non-robust against the attack, deterministic verification is guaranteed to identify this non-robustness. The literature for this setting can be broadly divided into several categories: complete verifiers using satisfiability modulo theory (SMT) (Katz et al., 2017; Ehlers, 2017), mixed integer programming (MIP) (Tjeng et al., 2019; Anderson et al., 2020) and branch and bound (BaB) (Bunel et al., 2018; Wang et al., 2021); and incomplete verifiers using bound propagation (Zhang et al., 2018; Xu et al., 2020) and convex relaxation by linear programming (Wong et al., 2018; Wong & Kolter, 2018). However, these works focus on supervised settings; verification in CL settings remains unexplored.

Extreme Value Theory  Extreme value theory (EVT) has been recognized as a powerful tool, since it enables the limit distribution of properly normalized maxima to be effectively modeled (Scheirer et al., 2011). This success has produced strong empirical results for describable visual attributes (Scheirer et al., 2012), visual inspection tasks (Gibert et al., 2015), open set recognition problems (Rudd et al., 2018), etc. Recently, CLEVER (Weng et al., 2018) estimates the robust radius of supervised verification using EVT; the difference from our work is discussed in more detail in Appendix B. In this paper, we creatively use EVT to theoretically analyze the relationship between the robust radius of the CL encoder and that of the downstream task.

3. Preliminaries
We first present notation and describe the frameworks for contrastive learning and the supervised verification problem that will be essential for our analysis.

Notation  Let $\|\cdot\|_2$ and $\|\cdot\|_\infty$ denote the Euclidean norm $\ell_2$ and the infinity norm $\ell_\infty$, respectively. $\mathbb{1}[\text{Boolean expression}]$ is the indicator function (equal to 1 if the expression is True and 0 otherwise). $B_\infty(x_0, \epsilon) := \{x \mid \|x - x_0\|_\infty \le \epsilon\}$ denotes that the input $x$ is constrained to the $\ell_\infty$ ball. We let $\mathrm{sign}(x) = 1$ for $x \ge 0$ and $\mathrm{sign}(x) = -1$ for $x < 0$. If $S$ is a set, $|S|$ denotes its cardinality. The transpose of a vector/matrix is represented by the superscript $\top$. Given a point $u$, the $\ell_2$ normalization is defined as $\tilde{\rho}(u) = u / \|u\|_2$. Given points $u$ and $v$, the instance similarity is defined as $\rho(u, v) = u^\top v / (\|u\|_2 \|v\|_2)$, which is the dot product between the $\ell_2$-normalized $u$ and $v$ (i.e., cosine similarity). We use $[n]$ to represent the set $\{1, 2, \ldots, n\}$.

Neural network  Consider an input vector $x \in \mathbb{R}^{d_0}$ for a neural network with $L$ layers.
Let the number of neurons in the $k$-th layer be $d_k$, while $W_k \in \mathbb{R}^{d_k \times d_{k-1}}$ and $b_k \in \mathbb{R}^{d_k}$ ($k \in [L]$) represent the weights and biases of the NN. Let $\varphi_k : \mathbb{R}^{d_0} \to \mathbb{R}^{d_k}$ be the operator mapping the input layer to layer $k$. $\sigma(v)$ is the activation function; the ReLU activation is $\sigma(v) = \max(v, 0)$. For each $k \in [L]$,
$$\varphi_k(x) = W_k \hat{\varphi}_{k-1}(x) + b_k, \qquad \hat{\varphi}_k(x) = \sigma(\varphi_k(x)), \qquad \hat{\varphi}_0(x) = x.$$
We simply use $\varphi_k^{(j)}$ and $\hat{\varphi}_k^{(j)}$ to represent the pre-activation and post-activation values of the $j$-th neuron in the $k$-th layer. The output of the neural network is $\varphi_L(x) \in \mathbb{R}^{d_L}$; $d_L$ further denotes the number of classes.

3.1. Contrastive Learning
Let $\mathcal{X}$ denote the set of all possible data points. Let $\mathcal{Y}$ denote the label set of the downstream task, which is a discrete and finite set. $\mathcal{F}$ is a class of feature encoders $f : \mathcal{X} \to \mathbb{R}^d$. To highlight the key ideas, we present the CL framework proposed by Saunshi et al. (2019) in a simplified binary scenario, i.e., $\mathcal{Y} = \{-1, 1\}$. CL assumes that we obtain similar data in the form of pairs $(x, x^+)$ together with $K$ independent and identically distributed (i.i.d.) negative samples $x_1^-, x_2^-, \ldots, x_K^-$. Given an unlabeled dataset $U = \{z_i\}_{i=1}^m$, where $z_i = (x_i, x_i^+, x_{i1}^-, \ldots, x_{iK}^-)$, we aim to learn an encoder $f$ that makes $f(x)$ similar to $f(x^+)$ while keeping it away from $f(x_1^-), \ldots, f(x_K^-)$ at the same time.

Linear evaluation  One standard method for evaluating the performance of a CL model is linear evaluation (Chen et al., 2020; Kim et al., 2020), which learns a downstream linear layer after the base encoder, then uses the modified model for class-level classification. The test accuracy on the downstream task is used as a proxy for representation quality. The model with the downstream layer is fine-tuned on a labeled dataset $S = \{(x_i, y_i)\}_{i=1}^n$. Both $U$ and $S$ are assumed to be i.i.d. collections.

Data distributions  Let $\mathcal{C}$ denote the set of latent classes (Saunshi et al., 2019), i.e., all possible classes for points in $\mathcal{X}$. For each class $c \in \mathcal{C}$, there is a probability distribution $D_c$ over $\mathcal{X}$ that captures the probability that a point belongs to class $c$. The distribution on $\mathcal{C}$ is denoted by $\eta$. Let $c^+, c^-$ denote the positive and negative latent classes drawn from $\eta$; thus, $D_{c^+}$ and $D_{c^-}$ are the distributions used to sample positive and negative samples, respectively. The process for generating an unlabeled sample $z = (x, x^+, \{x_i^-\}_{i=1}^K) \in U$ is as follows:
1. Draw two latent classes $(c^+, c^-) \sim \eta^2$;
2. Draw two positive samples $(x, x^+) \sim D_{c^+}^2$ and $K$ negative samples $\{x_i^- \sim D_{c^-} \mid i \in [K]\}$.
To set up the labeled dataset $S$ for the binary scenario, we build the binomial distribution $\eta_{sup}$ by fixing two classes $c^+, c^-$:
$$\eta_{sup}(c^+) = \frac{\eta(c^+)}{\eta(c^-) + \eta(c^+)}, \qquad \eta_{sup}(c^-) = \frac{\eta(c^-)}{\eta(c^-) + \eta(c^+)}.$$
We fix $y_{c^+} = +1$ and $y_{c^-} = -1$, then generate a labeled sample $(x, y) \in S$ as follows:
1. Draw a class $c \sim \eta_{sup}$ and set the label $y = y_c$;
2. Draw a sample $x \sim D_c$.

Loss function  The learning process is divided into two steps: minimizing the contrastive loss on the encoder and fine-tuning the downstream layer using a supervised loss. We focus on the logistic loss $\ell(v) = \log_2(1 + \sum_j \exp(-v_j))$ for $v \in \mathbb{R}^K$. Thus, the contrastive loss (Chen et al., 2020; He et al., 2020) associated with the encoder $f$ in this framework is defined as follows:
$$L_{un}(f) = \mathbb{E}_{c^+, c^- \sim \eta^2}\; \mathbb{E}_{x, x^+ \sim D_{c^+}^2,\; x_j^- \sim D_{c^-}} \Big[ \ell\Big( \big\{ f(x)^\top \big( f(x^+) - f(x_j^-) \big) \big\}_{j=1}^{K} \Big) \Big]. \tag{1}$$
For linear evaluation, the supervised learning algorithm is given the mapped dataset $\hat{S} := \{(f(x_i), y_i)\}_{i=1}^n$ and returns a predictor $g : \mathbb{R}^d \to \mathbb{R}$. The label of $\hat{x} \in \hat{S}$ is obtained from $\hat{y} = \mathrm{sign}(g(\hat{x}))$, $\hat{y} \in \{-1, 1\}$. The logistic loss in (2) is $\ell(v) = \log_2(1 + \exp(-v))$ for $v \in \mathbb{R}$.
We aim to minimize the supervised loss as follows:
$$L_{sup}(g \circ f) = \mathbb{E}_{c \sim \eta_{sup}}\; \mathbb{E}_{x \sim D_c} \big[ \ell\big( y_c\, g(f(x)) \big) \big]. \tag{2}$$

3.2. Supervised Verification
In this section, we set up the verification problem for supervised learning. For simplicity, we outline the simplified binary scenario here. The supervised algorithm is given the labeled dataset $S = \{(x_i, y_i)\}_{i=1}^n$. Let the dimension of the NN output be $d_L = 1$. $\mathcal{H}$ contains a class of supervised predictors $h : \mathcal{X} \to \mathbb{R}$ with $h := \varphi_L(x)$. Thus, the label of $x$ is predicted by $\hat{y} = \mathrm{sign}(h(x))$, $\hat{y} \in \{-1, 1\}$.

Verification problem  We refer to $x' = x + \delta$ as an adversarial sample of $x$ for classifier $h$ if $h$ correctly classifies $x$ but assigns a different label to $x'$. Because many powerful attack methods (Goodfellow et al., 2015; Carlini & Wagner, 2017) and adversarial training frameworks (Madry et al., 2018; Zhang et al., 2019) use the $\ell_\infty$ norm, this paper also focuses on the setting in which $\delta$ satisfies the $\ell_\infty$ norm constraint $\|\delta\|_\infty \le \epsilon$. We say that model $h$ is $\ell_\infty^\epsilon$-verified at $(x, y)$ if it correctly classifies both $x$ and $x'$ as $y$ for any $x' \in B_\infty(x, \epsilon)$, i.e., there are no adversarial samples around $x$. The supervised verification at $(x, y)$, $y \in \{-1, 1\}$, seeks the solution of the following optimization problem:
$$\tilde{h}(x, y, \epsilon) := \min_{x'} \; y\, h(x') \quad \text{s.t.} \quad \varphi_k(x') = W_k \hat{\varphi}_{k-1}(x') + b_k,\; k \in [L]; \quad \hat{\varphi}_k(x') = \sigma(\varphi_k(x')),\; k \in [L-1]; \quad h(x') = \varphi_L(x'); \quad x' \in B_\infty(x, \epsilon). \tag{3}$$
If $\tilde{h} \le 0$, some $x' \in B_\infty(x, \epsilon)$ fools the model into producing an incorrect label; $h$ is $\ell_\infty^\epsilon$-verified if $\tilde{h}(x, y, \epsilon) > 0$. A complete verifier aims to solve (3) and calculate $\tilde{h}$ exactly. Unfortunately, complete verification is proven to be an NP-complete problem (Katz et al., 2017; Sinha et al., 2018). Therefore, many previous works (Wong & Kolter, 2018; Zhang et al., 2018; Xu et al., 2020) propose incomplete verifiers that relax the non-convex part of the NN to derive a lower bound $\underline{h} \le \tilde{h}$. If $\underline{h}(x, y, \epsilon) > 0$ is given by the incomplete verifier, model $h$ is also $\ell_\infty^\epsilon$-verified at $(x, y)$.

Robust radius  Whether $h$ is $\ell_\infty^\epsilon$-verified at $(x, y)$ depends on the radius of the largest $\ell_\infty$ ball centered at $x$ in which $h$ does not change its prediction. This radius is called the robust radius, which is formally defined as follows:
$$R(h; x, y) := \inf_{\mathrm{sign}(h(x')) \ne y} \|x' - x\|_\infty = \sup_{\epsilon} \; \epsilon \quad \text{s.t. } \tilde{h}(x, y, \epsilon) > 0. \tag{4}$$
If $\mathrm{sign}(h(x)) \ne y$, then $R(h; x, y) := 0$. It is natural to regard the robust radius as a robustness metric. Recall that computing the robust radius is an NP-hard problem due to the need for complete verification. We can instead derive a tight lower bound of $R$ given by $\underline{h}$, referred to as the certified radius, which is formally defined as
$$\underline{R}(h; x, y) := \sup_{\epsilon} \; \epsilon \quad \text{s.t. } \underline{h}(x, y, \epsilon) > 0. \tag{5}$$
The certified radius satisfies $0 \le \underline{R}(h; x, y) \le R(h; x, y)$.

4. RVCL: Robustness Verification Framework for Contrastive Learning
In this section, we first introduce the verification problem on supervised downstream tasks by simply modifying the supervised verification problem (3), and further present several weaknesses of adopting (3) in CL. We then propose the novel RVCL framework to resolve these issues.

4.1. Verification Problem for Linear Evaluation
By defining the supervised verification problem on linear evaluation, we can regard the robust radius $R$ as a proxy robustness metric for CL. We denote the encoder $f : \mathbb{R}^{d_0} \to \mathbb{R}^{d_L}$ with $f := \varphi_L(x)$, where $d_0$ and $d_L$ are the dimensions of the encoder's input and output, while $W_{LE} \in \mathbb{R}^{1 \times d_L}$ and $b_{LE} \in \mathbb{R}$ are the weight and bias of the downstream linear predictor $g(x)$.
The optimization problem for linear evaluation is defined by simply modifying the constraints of (3), as follows:
$$\tilde{g}(x, y, \epsilon) := \min_{x'} \; y\, g(x') \quad \text{s.t.} \quad \varphi_k(x') = W_k \hat{\varphi}_{k-1}(x') + b_k,\; k \in [L]; \quad \hat{\varphi}_k(x') = \sigma(\varphi_k(x')),\; k \in [L-1]; \quad g(x') = W_{LE}\, \varphi_L(x') + b_{LE}; \quad x' \in B_\infty(x, \epsilon). \tag{6}$$
The difference between (3) and (6) is that there is no activation function $\sigma(\cdot)$ between the encoder and the downstream layer. There is no barrier to applying incomplete verifiers to (6). Thus, the definitions of the robust radius $R_{LE}(g; x, y)$ and certified radius $\underline{R}_{LE}(g; x, y)$ on the downstream task are analogous to (4) and (5), respectively. We can therefore regard $R_{LE}$ as a proxy robust radius at data point $x$. However, this approach has serious problems:
1. $R_{LE}$ cannot be computed directly without a label.
2. Even if we have the label to compute $R_{LE}$, and use $R_{LE}$ to evaluate the model robustness, we do not know whether the robustness benefits from the encoder or the downstream layer.
These problems motivate us to propose RVCL, a novel framework for verifying the robustness of encoders without the need for labels and downstream tasks.

4.2. RVCL Framework
Many existing works have studied the supervised verification problem stated in 3.2. However, robustness verification for CL has received less research attention. In this section, we present the formal definition of the robustness verification problem on the encoder $f$, after which we provide two robustness metrics to study the performance of the CL encoder and the incomplete verifier.

4.2.1. VERIFICATION PROBLEM FOR CL
The core concept behind supervised verification is that the points in the small ball $B_\infty(x, \epsilon)$ should have the same label as $x$. Inspired by this idea, we define the conditions under which a disturbance successfully attacks the encoder. Given a positive sample $x^+$, let the negative sample $x^-$ be the attack target of $x^+$. We hope that the points $x' \in B_\infty(x^+, \epsilon)$ will be more similar to $x^+$ than to any negative sample $x^-$, while the instance-wise attack algorithm generates an adversarial sample $x' \in B_\infty(x^+, \epsilon)$ with attack strength $\epsilon$ in order to fool the model into judging $x'$ as similar to $x^-$.

[Figure 1. $\theta$ is the angle between features given by $f$. If $\theta_1 < \theta_2$, $f$ is not attacked successfully by the adversarial sample $x'$.]

If the instance similarity satisfies $\rho(f(x^+), f(x')) > \rho(f(x^-), f(x'))$, then $x'$ is similar to $x^+$ (i.e., $\theta_1 < \theta_2$ in Figure 1), which means that the encoder $f$ is not successfully attacked by $x'$. We say that the encoder $f$ is $\ell_\infty^\epsilon$-verified at $(x^+, x^-)$ if $x'$ is more similar to $x^+$ than to $x^-$ for any $x' \in B_\infty(x^+, \epsilon)$, i.e., there are no adversarial samples similar to $x^-$ around $x^+$. Note that the comparison of instance similarities has an equivalent form:
$$\rho(f(x^+), f(x')) > \rho(f(x^-), f(x')) \tag{7}$$
$$\iff \big( \tilde{\rho}(f(x^+)) - \tilde{\rho}(f(x^-)) \big)^\top f(x') > 0. \tag{8}$$
Judging whether or not (8) is True can be regarded as part of forward propagation; thus, we can define the optimization problem for CL by adding a linear layer after $f$ with weight $W_{CL} = (\tilde{\rho}(f(x^+)) - \tilde{\rho}(f(x^-)))^\top$.

Definition 4.1 (Verification problem for CL). Given a positive sample $x^+ \in \mathbb{R}^{d_0}$ and a negative sample $x^- \in \mathbb{R}^{d_0}$, the feature encoder $f : \mathbb{R}^{d_0} \to \mathbb{R}^{d_L}$ with $f := \varphi_L(x)$, and the $\ell_2$ normalization $\tilde{\rho}(u) = u / \|u\|_2$, for any fixed $\epsilon$, the robustness verification problem for CL is defined as follows:
$$\tilde{f}(x^+, x^-, \epsilon) := \min_{x'} \; W_{CL} f(x') \quad \text{s.t.} \quad \varphi_k(x') = W_k \hat{\varphi}_{k-1}(x') + b_k,\; k \in [L]; \quad \hat{\varphi}_k(x') = \sigma(\varphi_k(x')),\; k \in [L-1]; \quad W_{CL} = \big( \tilde{\rho}(f(x^+)) - \tilde{\rho}(f(x^-)) \big)^\top \in \mathbb{R}^{1 \times d_L}; \quad f(x') = \varphi_L(x'); \quad x' \in B_\infty(x^+, \epsilon). \tag{9}$$
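To make the objective in Definition 4.1 concrete, the following is a minimal NumPy sketch (ours, not part of the released code) that evaluates the margin $W_{CL} f(x')$ for a single candidate point $x'$; `encoder` is a placeholder for any feature encoder, and an actual verifier bounds this quantity over the entire ball $B_\infty(x^+, \epsilon)$ rather than evaluating it at one point.

```python
import numpy as np

def l2_normalize(u, eps=1e-12):
    # \tilde{rho}(u) = u / ||u||_2
    return u / (np.linalg.norm(u) + eps)

def cl_margin(encoder, x_pos, x_neg, x_prime):
    # W_CL = (rho~(f(x+)) - rho~(f(x-)))^T, as in (8) and Definition 4.1
    w_cl = l2_normalize(encoder(x_pos)) - l2_normalize(encoder(x_neg))
    # Positive value <=> x' is judged more similar to x+ than to x- (condition (7))
    return float(w_cl @ encoder(x_prime))
```

Condition (7) holds for $x'$ exactly when this margin is positive; verification asks whether the minimum of this margin over the whole ball remains positive.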
Moreover, the robust radius $R_{CL}$ and certified radius $\underline{R}_{CL}$ for CL are defined as follows:
$$R_{CL}(f; x^+, x^-) := \inf_{\rho(f(x'), f(x^+)) < \rho(f(x'), f(x^-))} \|x' - x^+\|_\infty = \sup_{\epsilon} \; \epsilon \quad \text{s.t. } \tilde{f}(x^+, x^-, \epsilon) > 0,$$
$$\underline{R}_{CL}(f; x^+, x^-) := \sup_{\epsilon} \; \epsilon \quad \text{s.t. } \underline{f}(x^+, x^-, \epsilon) > 0, \tag{10}$$
where $\underline{f}$ is the verified lower bound of $\tilde{f}$ given by the verifier; thus, $0 \le \underline{R}_{CL}(f; x^+, x^-) \le R_{CL}(f; x^+, x^-)$. For verified prediction, $\underline{f}(x^+, x^-, \epsilon) > 0$ for a given strength $\epsilon$ implies that $f$ is $\ell_\infty^\epsilon$-verified at $(x^+, x^-)$; the pseudocode is presented as PREDICT in Appendix C.2. To compute $\underline{R}_{CL}$, we can apply binary search, because $\underline{f}(x^+, x^-, \epsilon)$ is non-increasing in $\epsilon$; the pseudocode is presented as CERTIFY in Appendix C.2.

[Figure 2. Illustration of RVCL. $\|x' - x^+\|_\infty$ must be larger than the robust radius $R_{CL}$ if $x'$ is an instance-wise adversarial sample. In this case, the latent class of $x^+$ is "dog", while the feature of $x'$ is similar to that of $x^-$, which is an "elephant". The certified radius $\underline{R}_{CL}$ is provided by the incomplete verifier and is a lower bound of the robust radius $R_{CL}$. It is verified that no instance-wise adversarial sample exists in $B_\infty(x^+, \underline{R}_{CL})$.]

Note that (9) and (10) are both defined directly on the CL encoder without reference to any labels or downstream tasks, which resolves the issue articulated in 4.1.

4.2.2. ROBUSTNESS METRICS FOR CL
This subsection provides two robustness metrics used to study the robustness of the CL encoder and the performance of incomplete verifiers.

Average certified radius (ACR)  For supervised verification, ACR (Zhai et al., 2020) is an important metric used in evaluating robustness. Specifically, it is the average of the certified radius over the test dataset. We can define it directly on the supervised downstream task: $ACR_{LE} := \frac{1}{|S_{test}|} \sum_{(x,y) \in S_{test}} \underline{R}_{LE}(g; x, y)$, where $S_{test}$ is a labeled test dataset in which every $(x, y)$ is correctly classified by $g$. However, $ACR_{LE}$ still suffers from the problems discussed in 4.1. We therefore define the ACR for CL based on $\underline{R}_{CL}$, which directly reflects the robustness of the CL encoder without labels:

Definition 4.2 (Average certified radius for CL). Given an unlabeled test dataset $U_{test}$ generated following 3.1, with $z = (x^+, \{x_i^-\}_{i=1}^K) \in U_{test}$ and $\underline{R}_{CL}$ defined in (10), the average certified radius for CL is defined as follows:
$$ACR_{CL} := \frac{1}{K\,|U_{test}|} \sum_{z \in U_{test}} \sum_{i=1}^{K} \underline{R}_{CL}(f; x^+, x_i^-). \tag{11}$$

Certified instance accuracy  For supervised verification, certified accuracy is a metric used to evaluate the performance of incomplete verifiers. Wang et al. (2021) state that a verifier is stronger if its certified accuracy is a tighter lower bound of the supervised robust accuracy. Since there is no definition of robust accuracy for the CL encoder, we propose a novel robust accuracy for CL, without labels and downstream tasks, called robust instance accuracy, based on (7):

Definition 4.3 (Robust instance accuracy). Given an unlabeled test dataset $U_{test}$ with $z = (x^+, x^-) \in U_{test}$, we use the instance-wise PGD attack (Kim et al., 2020) to generate the adversarial point $x' \in B_\infty(x^+, \epsilon)$ by maximizing the contrastive loss (1). The robust instance accuracy with strength $\epsilon$ is defined as follows:
$$A^{\epsilon}_{CL} = \frac{1}{|U_{test}|} \sum_{z \in U_{test}} \mathbb{1}\big[\rho(f(x'), f(x^+)) - \rho(f(x'), f(x^-)) > 0\big]. \tag{12}$$
We then define the certified instance accuracy with strength $\epsilon$, which is the fraction of the test dataset for which $f$ is $\ell_\infty^\epsilon$-verified at $(x^+, x^-)$, i.e., $\underline{f} > 0$.

Definition 4.4 (Certified instance accuracy). Given an unlabeled test dataset $U_{test}$ with $z = (x^+, x^-) \in U_{test}$.
The certified instance accuracy with strength $\epsilon$ is defined as
$$\underline{A}^{\epsilon}_{CL} = \frac{1}{|U_{test}|} \sum_{z \in U_{test}} \mathbb{1}\big[\underline{f}(x^+, x^-, \epsilon) > 0\big]. \tag{13}$$
$f$ being $\ell_\infty^\epsilon$-verified at $(x^+, x^-)$ is a sufficient but not necessary condition for correctly recognizing the adversarial point $x'$ generated by a specific attack algorithm with attack strength $\epsilon$. This means that whenever the indicator condition in (13) holds, the condition in (12) also holds, but not vice versa. Thus, (13) is a lower bound of (12).

Remark 4.5. The certified instance accuracy $\underline{A}^{\epsilon}_{CL}$ is used to compare verifiers on the CL encoder without labels. A smaller gap between the robust instance accuracy $A^{\epsilon}_{CL}$ and $\underline{A}^{\epsilon}_{CL}$ implies a stronger incomplete verifier (see the experiments in 6.3). However, as $\underline{A}^{\epsilon}_{CL}$ is a function of a fixed attack strength $\epsilon$, it is difficult to compare the robustness of two models unless one is uniformly better than the other for all strengths $\epsilon$. Thus, $ACR_{CL}$ is a more suitable choice than $\underline{A}^{\epsilon}_{CL}$ for evaluating the robustness of CL encoders.

5. Theoretical Analysis for Robust Radius
What is the relationship between the robust radii $R_{CL}$ and $R_{LE}$? This section demonstrates that $R_{CL}$ is an upper bound of $R_{LE}$, which is further verified by the experimental results in 6.1. To provide the main insights, we first consider the situation in which only one positive sample is used. Formally, given an unlabeled sample $z = (x^+, \{x_i^-\}_{i=1}^K)$, we introduce the margin distance of $x^+$ as half of the minimum distance between $f(x^+)$ and the $f(x_i^-)$, defined as $M := \min_{i \in [K]} D_i$, where $D_i := (1 - \rho(f(x^+), f(x_i^-)))/2$. The idea is to estimate the lower tail of the distribution of $M$ by fitting to the $\lambda$ smallest $D_i$ of the negative samples $x_i^-$. We can then use this estimated distribution to produce the probability of a new point $x$ falling into the margin of $x^+$, which can be interpreted as the probability of $x$ being a positive sample of $x^+$. $x$ is classified as a positive sample of $x^+$ if it is inside the margin of $x^+$ with high probability. To estimate the distribution of the margin distance, we turn to the Fisher-Tippett-Gnedenko theorem in extreme value theory (see the complete statement in Appendix A.1).

Lemma 5.1 (Fisher-Tippett-Gnedenko theorem (Coles et al., 2001)). Let $X_1, X_2, \ldots$ be a sequence of independent random variables with common distribution function $F$. Let $M_n = \sup(X_1, \ldots, X_n)$. If there exist sequences $a_n > 0$, $b_n \in \mathbb{R}$ such that
$$\lim_{n \to \infty} P\!\left( \frac{M_n - b_n}{a_n} \le z \right) = G(z),$$
where $G$ is a non-degenerate distribution function, then $G$ belongs to either the Gumbel family, the Fréchet family or the Reverse Weibull family.

The theorem states that the maximum of a sequence of i.i.d. random variables, after proper normalization, can only converge to one of three possible distributions.

Theorem 5.2 (Margin distribution). Assume a continuous non-degenerate margin distribution exists. The distribution of the margin distance $M$ is then given by the Reverse Weibull distribution, and the probability of $x$ being a positive sample of $x^+$ is given by the following:
$$\Psi(x; x^+, \alpha, \sigma) = \exp\!\left( -\left( \frac{1 - \rho(f(x), f(x^+))}{\sigma} \right)^{\alpha} \right),$$
where $\rho(f(x), f(x^+))$ is the instance similarity between $x$ and $x^+$, and $\alpha, \sigma > 0$ are the Weibull shape and scale parameters, obtained by fitting to the $\lambda$ smallest margin distances $D_i$. See the proof in Appendix A.2.

Theorem 5.2 demonstrates that the probability of $x$ being a positive sample can be given by a Reverse Weibull distribution fitted on finitely many samples. This enables us to compare the robust radius of the encoder and that of the downstream task.
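As a concrete illustration of Theorem 5.2, the sketch below (our own helper names; SciPy's `weibull_min` standing in for the Reverse Weibull fit, not the authors' implementation) fits the tail model to the $\lambda$ smallest margin distances and evaluates the resulting probability $\Psi$; for internal consistency the constant factor $1/2$ in $D_i$ is dropped on both sides, which only rescales $\sigma$.

```python
import numpy as np
from scipy.stats import weibull_min

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def fit_margin_tail(f_pos, f_negs, lam=20):
    # Distances 1 - rho(f(x+), f(x_i^-)); fit to the `lam` smallest ones.
    d = np.sort([1.0 - cosine(f_pos, fn) for fn in f_negs])[:lam]
    # A Reverse Weibull for the maxima of -D_i corresponds to a Weibull
    # (weibull_min) on the positive distances with the location fixed at 0.
    alpha, _, sigma = weibull_min.fit(d, floc=0)
    return alpha, sigma

def positive_probability(f_x, f_pos, alpha, sigma):
    # Psi(x; x+, alpha, sigma) = exp( -((1 - rho(f(x), f(x+))) / sigma)^alpha )
    z = 1.0 - cosine(f_x, f_pos)
    return float(np.exp(-((z / sigma) ** alpha)))
```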
Intuitively, if the classifier can predict correctly with higher confidence, this implies that the classifier provides better certified robustness.

Theorem 5.3 (Robust radius bound). Given an encoder $f : \mathcal{X} \to \mathbb{R}^d$ and an unlabeled sample $z = (x^+, \{x_i^-\}_{i=1}^K)$, the downstream predictor $g : \mathbb{R}^d \to \mathbb{R}$ is trained on $\hat{S} = \{(f(x^+), y_{c^+}), (f(x_i^-), y_{c^-})_{i=1}^K\}$. Then, for the different negative samples $x_i^-$, we have
$$R_{CL}(f; x^+, x_i^-) \ge R_{LE}(g; x^+, y_{c^+}).$$

Proof sketch. The probability of the downstream layer predicting $x$ as positive can be given by $\Psi_{LE}$, which is fitted from the margin distances $\{D_i\}_{i=1}^K$ computed from $\hat{S}$. $\Psi_{CL}$ is fitted from a specific $D_i$, since $R_{CL}(f; x^+, x_i^-)$ is the robust radius for a specific pair of positive and negative samples. Figure 3 plots the cumulative distribution functions (CDFs) of $\Psi_{CL}$ and $\Psi_{LE}$. $\Psi \to 1$ when $x \to x^+$, which means $x$ is very likely to be a positive sample of $x^+$. If there exist negative samples between $x_i^-$ and $x^+$ (the additional negative samples marked in Figure 3), then $\Psi_{LE}$ will fit to these negative samples, which makes its CDF grow more slowly than that of $\Psi_{CL}$, i.e., $\Psi_{CL}(x) \ge \Psi_{LE}(x)$. Lemma A.4 provides the correspondence that a higher probability of being a positive sample implies a larger robust radius, which recovers the theorem statement. See the complete proof in Appendix A.3.

[Figure 3. Probability of $x$ being a positive sample; the marked points denote negative samples other than $x_i^-$.]

Remark 5.4. Theorem 5.3 implies that improving the robustness of the CL encoder enlarges the robust radius $R_{CL}$, which can benefit the robustness of the downstream layer by providing a large upper bound on $R_{LE}$. A larger $R_{CL}$ implies that the model will achieve higher robust performance on the downstream task; thus, it is reasonable to regard $R_{CL}$ as a robustness metric.

In the above, we discuss the decision margin by analyzing the case with a single $x^+$ and multiple $x^-$. Here, we discuss the robust radius of different $x^+$ in the following theorem:

Theorem 5.5. Given an encoder $f : \mathcal{X} \to \mathbb{R}^d$, two positive samples $x^+_1, x^+_2$ and one negative sample $x^-$, if $\rho(f(x^+_1), f(x^-)) \ge \rho(f(x^+_2), f(x^-))$, then $R_{CL}(f; x^+_1, x^-) \le R_{CL}(f; x^+_2, x^-)$.

See the proof in Appendix A.4. Theorem 5.5 proves that if $f(x^+)$ is similar to $f(x^-)$, it is easy to recognize the adversarial sample $x'$ of $x^+$ as a negative sample. In 6.2, we empirically show that a vague image for which it is difficult to identify the class characteristic is easy to attack.

6. Experiments
In this section, we verify the effectiveness of our proposed RVCL by means of numerical experiments. More specifically, 6.1 demonstrates the effectiveness of the average certified radius for CL. 6.2 shows that $ACR_{CL}$ can evaluate the anti-disturbance ability of individual images. 6.3 compares the strength of incomplete verifiers by means of certified instance accuracy. 6.4 illustrates the verified lower bound $\underline{f}$ given by the verifiers. 6.5 provides a sensitivity analysis for the parameters in RVCL. Due to space limitations, we compare the efficiency of different verifiers in Appendix E.2.

Set-up.  Our main experiments utilize four architectures: Base and Deep from Wang et al. (2021), and CNN-A and CNN-B from Dathathri et al. (2020), from which the last layer is removed to form the CL encoders. SimCLR (Chen et al., 2020) and RoCL (Kim et al., 2020) are used in this paper for CL training and contrastive AT, respectively. Contrastive AT is trained with instance-wise adversarial samples with different attack strengths $\epsilon_{train}$; $\epsilon_{train} = 0$ indicates that the encoder is trained with benign images. (A sketch of such instance-wise adversarial example generation is shown below.)
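The following PyTorch sketch illustrates what instance-wise adversarial sample generation of this kind looks like; the InfoNCE-style loss, the temperature of 0.5, and the use of a single positive view are our simplifying assumptions rather than RoCL's exact formulation.

```python
import torch
import torch.nn.functional as F

def instance_wise_pgd(encoder, x, x_view, x_negs, eps, alpha=0.007, steps=10):
    """l_inf PGD that perturbs x away from its positive view and towards
    the negatives by maximizing an InfoNCE-style instance loss (a sketch)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        z_adv = F.normalize(encoder(x_adv), dim=-1)        # (B, d)
        z_pos = F.normalize(encoder(x_view), dim=-1)       # (B, d)
        z_neg = F.normalize(encoder(x_negs), dim=-1)       # (K, d)
        pos = (z_adv * z_pos).sum(dim=-1, keepdim=True)    # (B, 1)
        neg = z_adv @ z_neg.t()                            # (B, K)
        logits = torch.cat([pos, neg], dim=1) / 0.5        # temperature: assumed 0.5
        labels = torch.zeros(x.size(0), dtype=torch.long, device=x.device)
        loss = F.cross_entropy(logits, labels)             # positive sits at index 0
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()       # ascend to maximize the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```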
Using the same datasets as previous deterministic verification works, all CL encoders are trained on MNIST (LeCun & Cortes, 2010) and CIFAR-10 (Krizhevsky & Hinton, 2009). We utilize two incomplete verifiers with different verified tightness in the RVCL framework: CROWN (Zhang et al., 2018) and CBC (β-CROWN (Wang et al., 2021) adapted for CL). A more detailed introduction to incomplete verifiers is presented in Appendix C.1. Further details of the models and experimental settings are presented in Appendix D. The code can be found in the supplementary materials and on GitHub.

6.1. Average certified radius
In this subsection, we compute the average certified radius (ACR, 4.2.2) over 100 test samples (i.e., $|S_{test}| = |U_{test}| = 100$), following previous verification works. For $ACR_{CL}$, we set the number of negative samples to $K = 10$. In the interests of efficiency, we use CROWN to compute $ACR_{CL}$, which is discussed further in Appendix E.2. To determine whether the model robustness benefits from the encoder or the linear classifier, we fix the encoder and fine-tune the downstream layer on benign images, i.e., without perturbations. For the empirical robustness test, we report supervised robust accuracy on the whole test dataset; this is the downstream classification accuracy over adversarial samples generated by a label-wise PGD attack (Madry et al., 2018) with attack strength $\epsilon_{test}$. Note that a larger value of $\epsilon_{train}$ implies that a more robust encoder is obtained by contrastive AT.

The results in Figure 4(a,b) show that $ACR_{CL}$ and $ACR_{LE}$ increase with increasing $\epsilon_{train}$ on both datasets. Meanwhile, those in Figure 4(c,d) show that, for different values of $\epsilon_{test}$, the robust accuracy of the downstream classifier increases as $\epsilon_{train}$ increases. We therefore conclude that:
1. It is effective to measure robustness using $ACR_{CL}$ without labels and downstream tasks, because both $ACR_{CL}$ and the supervised metrics ($ACR_{LE}$ and robust accuracy) grow consistently with increasing $\epsilon_{train}$.
2. Figure 4(a,b) shows that $ACR_{CL}$ is larger than $ACR_{LE}$ for the same $\epsilon_{train}$, which supports our Theorem 5.3. Theorem 5.3 states that $R_{CL}$ is an upper bound of $R_{LE}$, while $ACR_{CL}$ and $ACR_{LE}$ are related to $R_{CL}$ and $R_{LE}$, respectively.
3. Figure 4(a,b) shows that a robust encoder can significantly improve the model's robust performance on downstream tasks, since $ACR_{LE}$ grows with increasing $\epsilon_{train}$ even though the downstream layer is learned on benign images.

[Figure 4. (a,b) $ACR_{CL}$ and $ACR_{LE}$ on MNIST and CIFAR-10 with different values of $\epsilon_{train}$. (c,d) Supervised robust accuracy of the downstream classifier. On MNIST, the attack strengths $\epsilon_{test}$ are 0.15 and 0.2; on CIFAR-10, they are 8/255 and 16/255. "Clean" represents testing with benign images. (a,c) run on CNN-A, (b,d) run on CNN-B.]

6.2. Anti-disturbance ability of images
To study the robustness property of each image, in this subsection we compute $ACR_{CL}$ for a single test sample; i.e., $|U_{test}| = 1$, so $ACR_{CL} := \frac{1}{K} \sum_{i=1}^{K} \underline{R}_{CL}(f; x^+, x_i^-)$. We sample two images from CIFAR-10, as shown in Figure 5(b). The upper image is labeled as "deer" and is vague, making it difficult to identify the latent class. The lower image is labeled as "dog" and is much clearer than the deer. We calculate $\underline{R}_{CL}$ with fifty negative samples ($K = 50$).
Our findings show that $\underline{R}_{CL}$ of the deer image is significantly smaller than that of the dog image; this means that the distance between the feature of the deer and its negative samples is smaller than that between the feature of the dog and its negative samples, which supports our Theorem 5.5. We can therefore conclude that the anti-disturbance ability of the dog image is stronger than that of the deer image, i.e., the dog image is $\ell_\infty^\epsilon$-verified with a larger $\epsilon$ than the deer image. These results verify that $ACR_{CL}$ is able to quantify the anti-disturbance ability of images. We sample 10 more images from CIFAR-10 and plot the images and their $ACR_{CL}$ in Figure 5(a). This leads to the same conclusion: a vague image for which it is difficult to identify the latent class has a low $ACR_{CL}$. We further visualize the distribution of $ACR_{CL}$ for 500 images from CIFAR-10 by calculating $ACR_{CL}$ with $K = 20$, and additionally provide a kernel density plot of the distribution (see Figure 5(c)). About 90% of the images' $ACR_{CL}$ values lie within the range [0.044, 0.093], concentrated around 0.07.

[Figure 5. (a,b,c) run on CNN-B with $\epsilon_{train} = 4.4/255$. (a) $ACR_{CL}$ for 10 images randomly chosen from CIFAR-10; $K = 50$. (b) $\underline{R}_{CL}$ for two images (deer and dog) from CIFAR-10; $K = 50$. (c) Frequency distribution histogram and kernel density estimation (KDE) of $ACR_{CL}$ for 500 images from CIFAR-10; $K = 20$.]

6.3. Certified instance accuracy
In this subsection, we study the performance of incomplete verifiers on the CL encoder via the certified instance accuracy ($\underline{A}^{\epsilon}_{CL}$ in Definition 4.4). Table 1 summarizes some of the results on CIFAR-10. Due to space limitations, we provide detailed settings and explanations for further experimental results (on MNIST) in Appendix E.1.

[Table 1. Comparison of certified instance accuracy across various networks (Base, Deep, CNN-A, CNN-B), training strengths $\epsilon_{train}$ (0, 4.4/255, 8.8/255) and attack strengths $\epsilon_{test}$ on CIFAR-10, with $|U_{test}| = 100$ test samples. For each setting, the table reports the robust instance accuracy obtained by the instance-wise PGD attack and the certified instance accuracy given by CBC and by CROWN.]

The two incomplete verifiers we utilize here are CROWN and CBC; CBC is the state-of-the-art verifier and is more powerful than CROWN for supervised verification (Wang et al., 2021). $\underline{A}^{\epsilon}_{CL}$ is a lower bound of the robust instance accuracy $A^{\epsilon}_{CL}$ (Definition 4.3, obtained by the instance-wise PGD attack (Kim et al., 2020)); a tighter lower bound indicates a stronger incomplete verifier. Table 1 shows that the gap between $\underline{A}^{\epsilon}_{CL}$ given by CBC and $A^{\epsilon}_{CL}$ given by PGD is smaller than that between CROWN and PGD, which shows that CBC is a stronger verifier than CROWN. We further illustrate the verified lower bound $\underline{f}$ given by these two incomplete verifiers in 6.4, from which the same conclusions can be drawn. All these results demonstrate the efficacy of our proposed RVCL: a stronger supervised verifier still achieves a tighter certified radius in the RVCL framework.
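For completeness, the following small NumPy helpers (illustrative only; the per-pair verifier bounds and attack similarities are assumed to be precomputed) show how the two accuracies compared in Table 1 are obtained.

```python
import numpy as np

def certified_instance_accuracy(verified_lower_bounds):
    # Definition 4.4: fraction of (x+, x-) pairs with f_lower(x+, x-, eps) > 0.
    lb = np.asarray(verified_lower_bounds, dtype=float)
    return float((lb > 0).mean())

def robust_instance_accuracy(sim_to_pos, sim_to_neg):
    # Definition 4.3: fraction of pairs for which the PGD point x' is still
    # closer (in cosine similarity) to x+ than to x-.
    diff = np.asarray(sim_to_pos, dtype=float) - np.asarray(sim_to_neg, dtype=float)
    return float((diff > 0).mean())
```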
6.4. Tightness of verification

[Figure 6. Comparing the tightness of verifiers on 100 test samples from MNIST and CIFAR-10. (a) MNIST, $\epsilon_{train} = 0.3$, $\epsilon_{test} = 0.3$, run on CNN-A; (b) CIFAR-10, $\epsilon_{train} = 4.4/255$, $\epsilon_{test} = 4/255$, run on CNN-B. We plot the verified lower bound $\underline{f}(x^+, x^-, \epsilon)$ given by CBC and CROWN against the PGD adversarial upper bound $\hat{f}$ (the line $y = x$ corresponds to PGD). Some points exceed the plotted axis limits.]

The instance-wise PGD attack (Kim et al., 2020) provides an upper bound on the minimum distortion, $\hat{f} \ge \tilde{f}$, while RVCL provides the lower bound, $\tilde{f} \ge \underline{f}(x^+, x^-, \epsilon)$. It should be noted that, unlike in supervised verification (Dathathri et al., 2020; Wang et al., 2021), all distortion here is instance-wise. $(x^+, x^-)$ is verified if $\underline{f}(x^+, x^-, \epsilon) > 0$, meaning that $x^+$ cannot be disturbed towards $x^-$ with attack strength $\epsilon_{test}$. The closer the verified lower bound $\underline{f}(x^+, x^-, \epsilon)$ is to the PGD upper bound $\hat{f}$ (the line $y = x$ in Figure 6), the stronger the verifier. One hundred test samples ($|U_{test}| = 100$) are used to illustrate the tightness of the verification. As Figure 6 shows, the points above the dotted line are successfully verified. CBC achieves tight verification across all samples, and consistently outperforms CROWN on both MNIST and CIFAR-10. This result is consistent with supervised verification in Wang et al. (2021), which demonstrates the effectiveness of RVCL.

6.5. Sensitivity analysis

[Figure 7. Left: sensitivity of $ACR_{CL}$ to the feature dimension. Right: sensitivity of $ACR_{CL}$ to the number of negative samples $K$. Experiments run on CNN-B with different $\epsilon_{train}$ on CIFAR-10.]

Feature dimension  We investigate the influence of the feature dimension on $ACR_{CL}$ (see the left panel of Figure 7). From the results, we observe that $ACR_{CL}$ increases slightly with growing feature dimension, then remains stable for dimensions larger than 150. The results illustrate that $ACR_{CL}$ is not sensitive to the feature dimension.

Number of negative samples  We validate the influence of the number of negative samples $K$ on $ACR_{CL}$ (see the right panel of Figure 7). The result shows that $ACR_{CL}$ is not sensitive to $K$ for different values of $\epsilon_{train}$, which means that we can use small values of $K$ to efficiently evaluate the model robustness or the anti-disturbance ability of an image.

7. Conclusion
In this paper, we tackle the robustness verification problem for CL without any labels, and accordingly propose a novel RVCL framework that does not depend on any class labels, downstream tasks or specific attack algorithms. We then use extreme value theory to reveal the quantitative relationship between the robust radius of the CL encoder and that of the downstream task. All our experiments show that RVCL is an effective robustness framework for CL encoders, and that it can also be used to evaluate the anti-disturbance ability of images. Moreover, our experimental results verify our theory. We believe that RVCL offers a novel perspective from which to understand robustness in contrastive learning.

Acknowledgements
This work is supported by the National Natural Science Foundation of China under Grant 61976161.

References
Alayrac, J., Uesato, J., Huang, P., Fawzi, A., Stanforth, R., and Kohli, P.
Are labels required for improving adversarial robustness? In Neur IPS, pp. 12192 12202, 2019. Anderson, R., Huchette, J., Ma, W., Tjandraatmadja, C., and Vielma, J. P. Strong mixed-integer programming formulations for trained neural networks. Math. Program., 183(1):3 39, 2020. Bunel, R., Turkaslan, I., Torr, P. H. S., Kohli, P., and Mudigonda, P. K. A unified view of piecewise linear neural network verification. In Neur IPS, pp. 4795 4804, 2018. Carlini, N. and Wagner, D. A. Towards evaluating the robustness of neural networks. In SP, pp. 39 57, 2017. Carmon, Y., Raghunathan, A., Schmidt, L., Duchi, J. C., and Liang, P. Unlabeled data improves adversarial robustness. In Neur IPS, pp. 11190 11201, 2019. Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. E. A simple framework for contrastive learning of visual representations. In ICML, volume 119, pp. 1597 1607, 2020. Coles, S., Bawa, J., Trenner, L., and Dorazio, P. An introduction to statistical modeling of extreme values, volume 208. 2001. Dathathri, S., Dvijotham, K., Kurakin, A., Raghunathan, A., Uesato, J., Bunel, R., Shankar, S., Steinhardt, J., Goodfellow, I. J., Liang, P., and Kohli, P. Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming. In Neur IPS, 2020. Doersch, C., Gupta, A., and Efros, A. A. Unsupervised visual representation learning by context prediction. In ICCV, pp. 1422 1430, 2015. Ehlers, R. Formal verification of piece-wise linear feedforward neural networks. In Automated Technology for Verification and Analysis, volume 10482, pp. 269 286, 2017. Fan, L., Liu, S., Chen, P.-Y., Zhang, G., and Gan, C. When does contrastive learning preserve adversarial robustness from pretraining to finetuning? In Neur IPS, 2021. Gibert, X., Patel, V. M., and Chellappa, R. Sequential score adaptation with extreme value theory for robust railway track inspection. In ICCV, pp. 131 138, 2015. Gidaris, S., Singh, P., and Komodakis, N. Unsupervised representation learning by predicting image rotations. In ICLR, 2018. Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In ICLR, 2015. Gowal, S., Huang, P., van den Oord, A., Mann, T., and Kohli, P. Self-supervised adversarial robustness for the low-label, high-data regime. In ICLR, 2021. He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R. B. Momentum contrast for unsupervised visual representation learning. In CVPR, pp. 9726 9735, 2020. Ho, C. and Vasconcelos, N. Contrastive learning with adversarial examples. In Neur IPS, 2020. Jiang, Z., Chen, T., Chen, T., and Wang, Z. Robust pretraining by adversarial contrastive learning. In Neur IPS, 2020. Jing, L. and Tian, Y. Self-supervised visual feature learning with deep neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell., 43(11):4037 4058, 2021. Katz, G., Barrett, C. W., Dill, D. L., Julian, K., and Kochenderfer, M. J. Reluplex: An efficient SMT solver for verifying deep neural networks. In CAV, volume 10426, pp. 97 117, 2017. Kim, M., Tack, J., and Hwang, S. J. Adversarial selfsupervised contrastive learning. In Neur IPS, 2020. Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. In ICLR, 2015. Krizhevsky, A. and Hinton, G. Learning multiple layers of features from tiny images. 2009. Le Cun, Y. and Cortes, C. Mnist handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2010. Li, L., Qi, X., Xie, T., and Li, B. Sok: Certified robustness for deep neural networks. Co RR, abs/2009.04131, 2020. 
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018. Moosavi-Dezfooli, S., Fawzi, A., and Frossard, P. Deepfool: A simple and accurate method to fool deep neural networks. In CVPR, pp. 2574 2582, 2016. Noroozi, M. and Favaro, P. Unsupervised learning of visual representations by solving jigsaw puzzles. In ECCV, volume 9910, pp. 69 84, 2016. Nozawa, K., Germain, P., and Guedj, B. Pac-bayesian contrastive unsupervised representation learning. In UAI, volume 124, pp. 21 30, 2020. Robustness Verification for Contrastive Learning Rudd, E. M., Jain, L. P., Scheirer, W. J., and Boult, T. E. The extreme value machine. IEEE Trans. Pattern Anal. Mach. Intell., 40(3):762 768, 2018. Saunshi, N., Plevrakis, O., Arora, S., Khodak, M., and Khandeparkar, H. A theoretical analysis of contrastive unsupervised representation learning. In ICML, volume 97, pp. 5628 5637, 2019. Scheirer, W. J., Rocha, A., Micheals, R. J., and Boult, T. E. Meta-recognition: The theory and practice of recognition score analysis. IEEE Trans. Pattern Anal. Mach. Intell., 33(8):1689 1695, 2011. Scheirer, W. J., Kumar, N., Belhumeur, P. N., and Boult, T. E. Multi-attribute spaces: Calibration for attribute fusion and similarity search. In CVPR, pp. 2933 2940, 2012. Schmidt, L., Santurkar, S., Tsipras, D., Talwar, K., and Madry, A. Adversarially robust generalization requires more data. In Neur IPS, pp. 5019 5031, 2018. Sinha, A., Namkoong, H., and Duchi, J. C. Certifying some distributional robustness with principled adversarial training. In ICLR, 2018. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. J., and Fergus, R. Intriguing properties of neural networks. In ICLR, 2014. Tian, Y., Krishnan, D., and Isola, P. Contrastive multiview coding. In ECCV, volume 12356, pp. 776 794, 2020. Tjeng, V., Xiao, K. Y., and Tedrake, R. Evaluating robustness of neural networks with mixed integer programming. In ICLR, 2019. Uesato, J., O Donoghue, B., Kohli, P., and van den Oord, A. Adversarial risk and the dangers of evaluating against weak attacks. In ICML, volume 80, pp. 5032 5041, 2018. Wang, S., Zhang, H., Xu, K., Lin, X., Jana, S., Hsieh, C., and Kolter, J. Z. Beta-crown: Efficient bound propagation with per-neuron split constraints for complete and incomplete neural network verification. In Neur IPS, 2021. Weng, T., Zhang, H., Chen, P., Yi, J., Su, D., Gao, Y., Hsieh, C., and Daniel, L. Evaluating the robustness of neural networks: An extreme value theory approach. In ICLR, 2018. Wong, E. and Kolter, J. Z. Provable defenses against adversarial examples via the convex outer adversarial polytope. In ICML, volume 80, pp. 5283 5292, 2018. Wong, E., Schmidt, F. R., Metzen, J. H., and Kolter, J. Z. Scaling provable adversarial defenses. In Neur IPS, pp. 8410 8419, 2018. Wu, Z., Xiong, Y., Yu, S. X., and Lin, D. Unsupervised feature learning via non-parametric instance discrimination. In CVPR, pp. 3733 3742, 2018. Xu, K., Shi, Z., Zhang, H., Wang, Y., Chang, K., Huang, M., Kailkhura, B., Lin, X., and Hsieh, C. Automatic perturbation analysis for scalable certified robustness and beyond. In Neur IPS, 2020. Yang, G., Duan, T., Hu, J. E., Salman, H., Razenshteyn, I. P., and Li, J. Randomized smoothing of all shapes and sizes. In ICML, volume 119, pp. 10693 10705, 2020. Zhai, R., Cai, T., He, D., Dan, C., He, K., Hopcroft, J. E., and Wang, L. Adversarially robust generalization just requires more unlabeled data. 
Co RR, abs/1906.00555, 2019. Zhai, R., Dan, C., He, D., Zhang, H., Gong, B., Ravikumar, P., Hsieh, C., and Wang, L. MACER: attack-free and scalable robust training via maximizing certified radius. In ICLR, 2020. Zhang, H., Weng, T., Chen, P., Hsieh, C., and Daniel, L. Efficient neural network robustness certification with general activation functions. In Neur IPS, pp. 4944 4953, 2018. Zhang, H., Yu, Y., Jiao, J., Xing, E. P., Ghaoui, L. E., and Jordan, M. I. Theoretically principled trade-off between robustness and accuracy. In ICML, volume 97, pp. 7472 7482, 2019. Robustness Verification for Contrastive Learning A.1. Extreme Value Theory Before provding the proofs of main results, we first provide two important lemmas in extreme value theory (EVT). Lemma A.1 (Fisher-Tippett-Gnedenko theorem (Coles et al., 2001)). Let X1, X2, . . . be a sequence of independent random variables with common distribution function F. Let Mn = max(X1, . . . , Xn). If there exists a sequence an > 0, bn R such that lim n P(Mn bn an z) = G(z), (A.1) where G is a non-degenerate distribution function, then G belongs to either the Gumbel family (Type I), the Fr echet family (Type II) or the Reverse Weibull family (Type III) with their CDFs as follows: Gumbel family (Type I): G(z) = exp exp z b Fr echet family (Type II): G(z) = n 0, if z < b, exp z b a α , if z b, Reverse Weibull family (Type III): G(z) = n exp b z a α , if z < b, 1, if z b, where a > 0, b R and α > 0 are the scale, location and shape parameters, respectively. Lemma A.1 states that the rescaled sample maxima (Mn bn)/an converge in distribution to a variable that has a distribution within one of three families. Furthermore, these three families can be combined into a single family called generalized extreme value (GEV) distribution, which is a family of continuous probability distributions developed within extreme value theory. The Gumbel, Fr echet and Reverse Weibull families are special cases of GEV distribution. Lemma A.2 (Generalized Extreme Value (GEV) distribution (Coles et al., 2001)). Let X1, X2, . . . be a sequence of i.i.d. samples from the distribution function F. Let Mn = max(X1, . . . , Xn). If there exists a sequence an > 0, bn R such that limn P( Mn bn an z) = G(z), then if G is a non-degenerate distribution function, it belongs to the class of generalized extreme value (GEV) distributions with , z R : 1 + ξ z µ where ξ R, µ R and σ > 0 are the shape, location and scale parameters, respectively. As the special case, the Reverse Weibull family in Lemma A.1 can be derived by Lemma A.2, which let ξ < 0 and the upper endpoint of F be denoted by b, then α = 1/ξ > 0. A.2. Proof of Theorem 5.2 Theorem 5.2 (Margin distribution). Assume a continuous non-degenerate margin distribution exists. The distribution for margin distance M is then given by the Reverse Weibull distribution. The probability of x being a positive sample of x+ is given by the following: Ψ(x; x+, α, σ) = exp 1 ρ(f(x), f(x+)) where ρ(f(x), f(x+)) is the instance similarity between x and x+. α, σ > 0 are Weibull shape and scale parameters, obtained from fitting to the λ smallest margin distances Di. Proof. From the assume we know that G(z) in Lemma A.1 exists. Since Lemma A.1 applies to maxima, we transform the variables via M = maxi [K] Di, Di := (1 ρ(f(x+), f(x i ))/2. 
Because Di is bounded ( Di < 0), so the asymptotic distribution of M converges to the Reverse Weibull distribution: αo , if z < 0, 1, if z 0, (A.4) Robustness Verification for Contrastive Learning where α > 0, b is the upper endpoint of F, σ is the scale parameters. b = 0 in (A.4) since M is bounded above by 0 as a negative distance. We use margin distances Di of the λ closest samples with x+ to estimate the parameters α and σ, which means to estimate c W of the distribution funstion W. The margin distance between x and x+ defined as 1 ρ(f(x), f(x+)). Then we focus on the probability of x included in the margin of x+, which can be written as: P(1 ρ(f(x), f(x+)) < min(D1, . . . , Dn)) = P( min(D1, . . . , Dn) < ρ(f(x), f(x+)) 1) = P(max( D1, . . . , Dn) < ρ(f(x), f(x+)) 1) = P(M < ρ(f(x), f(x+)) 1) = c W(ρ(f(x), f(x+)) 1). Since ρ(f(x), f(x+)) 1 < 0, we can rewrite (A.5) as (A.3). Overall, we conclude our proof. A.3. Proof of Theorem 5.3 Theorem 5.3 (Robust radius bound). Given an encoder f : X Rd and an unlabeled sample z = (x+, {x i }K i=1), the downstream predictor g : Rd R is trained on b S = {(f(x+), yc+), (f(x i ), yc )K i=1}. Then, for different negative samples x i , we have RCL(f; x+, x i ) RLE(g; x+, yc+). (A.6) Before we formally prove Theorem 5.3, we first provide the following two lemmas. Lemma A.3 focuses on a model for the k largest order statistics. It extends the result in Lemma A.1 to extreme order statistics, by defining M (k) n = k-th largest of {X1, . . . , Xn} and further identifying the limiting behavior of this variable, for fixed k, as n . Lemma A.3 implies that, if the k-th largest order statistic is normalized in exactly the same way as the maximum, then its limiting distribution is of the form given by (A.8). Lemma A.3 (k-th largest order statistic (Coles et al., 2001)). Let X1, X2, . . . be a sequence of i.i.d. samples from the distribution function F. Let Mn = max(X1, . . . , Xn). If there exists a sequence an > 0, bn R such that limn P( Mn bn an z) = G(z) for a non-degenerate distribution function G, so that G is the GEV distribution function given by (A.2), then, for fixed k, lim n P((M (k) n bn)/an) = Gk(z) (A.7) on {z : 1 + ξ(z µ) σ > 0}, where Gk(z) = exp{ τ(z)} with τ(z) = 1 + ξ z µ Intuitively, the classifier predicting correctly with higher confidence implies that the classifier can provide better certified robustness. Lemma A.4 provides this relationship directly. Lemma A.4 (Robust radius (Yang et al., 2020)). The robust radius in any norm is at least 1 Φ(p)dp, (A.9) where Φ(p) := sup v =1 sup U Rd:q(U)=p limr 0 q(U rv) p r , λ is the probability that the binary classifier predicts the right label under perturbation, q(U) is the measure of U under q, i.e. q(U) = Prδ q(δ U), v is the perturbation vector. Finally, we prove Theorem 5.3. Robustness Verification for Contrastive Learning Proof. ΨLE is obtained from fitting to margin distances {Di}K i=1, Di := (1 ρ(f(x+), f(x i ))/2. From Lemma A.1 we know that (A.3) is the Reverse Weibull distribution and can be written as the form of (A.2): ΨLE(x; x+, α, σ) = exp with z = D = ρ(f(x), f(x+)) 1. RCL(f; x+, x i ) is defined on the positive and negative sample pair (x+, x i ); it means that ΨCL is fitted from specific (x+, x i ). We denote D(k) for x i is the k-th largest order statistic of { D1, . . . , DK}. By Lemma A.3, the distribution function ΨCL for D(k) can be written as: ΨCL(x; x+, α, σ, k) = exp{ τ(z)} with τ(z) = 1 + ξ z µ σ 1/ξ , z = ρ(f(x), f(x+)) 1. ξ, µ, σ are the same with (A.10). 
Let x Dc+ be the positive test sample of x+. As we discuss in 3.1, the latent label for unlabeled positive sample can be given by yc+. In 4, we propose that judging positive sample correctly on CL encoder and downstream task can be transformed to judge whether or not WCLf(x) > 0 and yc+ g(x) > 0 are True, respectively. Thus, for every negative sample x i , we have P(WCLf(x) > 0 | x) P(yc+ g(x) > 0 | x) = ΨCL(x; x+, α, σ, k) ΨLE(x; x+, α, σ) = exp{ τ(z)} = exp{ τ(z)} with τ(z) = 1 + ξ z µ σ 1/ξ, z = ρ(f(x), f(x+)) 1. Equality holds if and only if k = 1. Recall that x = x+ + δ, δ ϵ, i.e. x B (x+, ϵ), which is in the constraints of (6) and (9). By Lemma A.4, we have: RCL(f; x+, x i ) RLE(g; x+, yc+) = Z 1/2 1 ΨCL(x;x+,α,σ,k) 1 Φ(p)dp Z 1/2 1 ΨLE(x;x+,α,σ) = Z 1 ΨLE(x;x+,α,σ) 1 ΨCL(x;x+,α,σ,k) which recovers the theorem statement. Equality holds if and only if k = 1. A.4. Proof of Theorem 5.5 Theorem 5.5. Given an encoder f : X Rd, two positive samples x+ 1 , x+ 2 and one negative sample x , if ρ(f(x+ 1 ), f(x )) ρ(f(x+ 2 ), f(x )), then RCL(f; x+ 1 , x ) RCL(f; x+ 2 , x ). (A.14) Proof. In this proposition, we consider the situation in which multiple positive samples x+ and only one negative sample x are used. Formally, given an unlabeled sample z = ({x+ i }K i=1, x ), we define the margin distance of x as M := mini [K] D i , where D i := (1 ρ(f(x+ i ), f(x ))/2. Robustness Verification for Contrastive Learning We denote the lower tail of the distribution of M as Ψneg. Similar to the idea of Theorem 5.2, we use Ψneg to produce the probability of x falling into the margin of x , which can be interpreted as the probability of x being a negative sample. From Lemma A.1, we know that Ψneg also converges to the Reverse Weibull distribution, and can be written to the form as following: Ψneg(x; x , α, σ) = exp 1 ρ(f(x), f(x )) where ρ(f(x), f(x )) is the instance similarity between x and x . α, σ > 0 are Weibull shape and scale parameters, obtained from fitting to the λ smallest margin distances D i . Cumulative distribution function Ψneg is monotonic increasing. Given two positive samples x+ 1 , x+ 2 , if ρ(f(x+ 1 ), f(x )) ρ(f(x+ 2 ), f(x )), we have: Ψneg(x+ 1 ; x , α, σ) Ψneg(x+ 2 ; x , α, σ) (A.16) x being the positive sample of x+ means that x is more similar to x+ than to x . From Lemma A.4 we have: RCL(f; x+ 1 , x ) RCL(f; x+ 2 , x ) = Z 1/2 Ψneg(x+ 1 ;x ,α,σ) 1 Φ(p)dp Z 1/2 Ψneg(x+ 2 ;x ,α,σ) = Z Ψneg(x+ 1 ;x ,α,σ) Ψneg(x+ 2 ;x ,α,σ) which recovers the theorem statement. Equality holds if and only if ρ(f(x+ 1 ), f(x )) = ρ(f(x+ 2 ), f(x )). B. Difference with CLEVER CLEVER (Cross-Lipschitz Extreme Value for n Etwork Robustness) (Weng et al., 2018) estimates the robust radius R using extreme value theory (EVT). In this paper, we utilize EVT in a totally different way compared with CLEVER: 1. To produce the probability of x being a positive sample of x+, we utilize EVT to estimate the lower tail of the margin distance (Theorem 5.2); thus, we can compare the probability given by the CL encoder and the downstream task. While CLEVER focuses on estimating R by proposing a sampling based approach with EVT to estimate the local Lipschitz constant; it is based on a theoretical analysis of formal robustness guarantee with Lipschitz continuity assumption. 2. We use EVT to reveal the quantitative relationship between the robust radius of the CL encoder and that of the downstream task. While CLEVER only focuses on the robust radius for supervised verification. C. 
C. Complete Implementation

We first introduce the two incomplete verifiers, with different verified tightness, used by RVCL. We then present the complete implementation of verified prediction and certification.

C.1. Incomplete verifiers

Due to the nonlinear activations σ(·), the feasible set of (6) and (9) is nonconvex. One intuitive idea is to perform a convex relaxation of the feasible set in order to build incomplete verifiers. This paper considers ReLU networks with CROWN (Zhang et al., 2018), a method that relaxes the nonconvex equality constraints φ̂_k(·) = σ(φ_k(·)) to convex inequality constraints. Let l_k^{(j)} and u_k^{(j)} be the lower and upper bounds of φ_k^{(j)}, i.e. l_k^{(j)} < φ_k^{(j)} < u_k^{(j)}, k ∈ [L]. Given the ReLU activation function σ(y) = max(y, 0), CROWN relaxes the ReLU with linear constraints:

$$\alpha_k^{(j)}\, \phi_k^{(j)} \;\le\; \hat{\phi}_k^{(j)} \;\le\; \frac{u_k^{(j)}}{u_k^{(j)} - l_k^{(j)}}\big(\phi_k^{(j)} - l_k^{(j)}\big), \qquad \text{where } 0 \le \alpha_k^{(j)} \le 1.$$

After this convex relaxation, (6) and (9) can be solved efficiently, as follows:

Lemma C.1 (CROWN bound (Zhang et al., 2018)). Given an L-layer NN φ_L : R^{d_0} → R with weights W_k and biases b_k, and pre-activation bounds l_k^{(j)} < φ_k^{(j)} < u_k^{(j)} (k ∈ [L], j ∈ [d_k]), for x' ∈ B(x, ϵ) we have:

$$\min_{x'} W_L\, \hat{\phi}_{L-1}(x') + b_L \;\ge\; \min_{x'} c^{\top} x' + c_0, \tag{C.18}$$

where c and c_0 can be computed from W_k, b_k, l_k^{(j)}, u_k^{(j)}.

Another incomplete verifier, stronger than CROWN, is β-CROWN (Wang et al., 2021), the state-of-the-art verification method. β-CROWN uses a few steps of gradient ascent to make the bounds as tight as possible, but suffers from a high time cost.

Lemma C.2 (β-CROWN bound (Wang et al., 2021)). Given an L-layer NN φ_L : R^{d_0} → R with weights W_k and biases b_k, pre-activation bounds l_k^{(j)} < φ_k^{(j)} < u_k^{(j)}, x' ∈ B(x, ϵ), and split constraints z ∈ Z, we have:

$$\min_{x',\, z \in Z} W_L\, \hat{\phi}_{L-1}(x') + b_L \;\ge\; \max_{\beta \ge 0} \min_{x'} (a + P\beta)^{\top} x' + q^{\top}\beta + c_0, \tag{C.19}$$

where a, P, q and c_0 can be computed from W_k, b_k, l_k^{(j)}, u_k^{(j)}, and β is the multiplier of the Lagrange function.

We adapt these two supervised verifiers, which have different verification tightness, in order to show that a stronger verifier still achieves a tighter certified radius R_CL within the RVCL framework, which illustrates the efficacy of RVCL. The related experimental results are in Section 6.3. The incomplete-verifier procedure is denoted by INCOMPLETEVERIFIER in the next subsection; it returns the verified lower bound f̲(x^+, x^-, ϵ) of the function f̃ in (9). A sketch of the per-neuron ReLU relaxation is given below.
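The following is a minimal sketch (ours, not the CROWN code; names and the self-check are illustrative) of the per-neuron linear relaxation displayed above: given pre-activation bounds l ≤ φ ≤ u, it returns the slopes and intercepts of a linear lower and upper bound on ReLU(φ).

```python
import numpy as np


def relu_relaxation(l, u, alpha=0.5):
    """Linear relaxation of hat_phi = ReLU(phi) given pre-activation bounds l <= phi <= u.

    Returns (a_low, b_low, a_up, b_up) such that
        a_low * phi + b_low <= ReLU(phi) <= a_up * phi + b_up   for all phi in [l, u].
    """
    if u <= 0:                       # inactive neuron: ReLU is identically 0
        return 0.0, 0.0, 0.0, 0.0
    if l >= 0:                       # active neuron: ReLU is the identity
        return 1.0, 0.0, 1.0, 0.0
    # unstable neuron: upper bound is the chord, lower bound is alpha * phi, 0 <= alpha <= 1
    slope_up = u / (u - l)
    return alpha, 0.0, slope_up, -slope_up * l


# quick sanity check on a grid of pre-activation values
l, u = -1.0, 2.0
a_lo, b_lo, a_up, b_up = relu_relaxation(l, u, alpha=0.3)
phi = np.linspace(l, u, 101)
relu = np.maximum(phi, 0.0)
assert np.all(a_lo * phi + b_lo <= relu + 1e-9)
assert np.all(relu <= a_up * phi + b_up + 1e-9)
```

For an unstable neuron (l < 0 < u), the upper bound is the chord through (l, 0) and (u, u), while the lower bound α·φ with 0 ≤ α ≤ 1 matches the constraint in C.1; CROWN-style verifiers propagate such per-neuron bounds backwards through all layers to obtain the linear function in (C.18).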
C.2. Pseudocode

For verified prediction in CL, (x+, x−) being verified as correct with strength ϵ means that f is ℓϵ-verified at (x+, x−). Similar to the discussion of supervised verification in Section 3.2, if the bound f̲(x+, x−, ϵ) returned by INCOMPLETEVERIFIER is greater than 0, then (x+, x−) is verified to be correct. The verified prediction procedure is given in pseudocode as PREDICT. We use PREDICT to obtain the certified instance accuracy (A^ϵ_CL in Definition 4.4) on the test dataset U_test, since A^ϵ_CL is the fraction of the test dataset for which f is ℓϵ-verified at (x+, x−), i.e., PREDICT(f, x+, x−, ϵ) = True.

In addition to prediction, we are also interested in the certified radius R_CL for a given (x+, x−). Note that f̲(x+, x−, ϵ) is non-increasing in ϵ because of the inf operator, so we can apply binary search to obtain R_CL; the procedure is given as CERTIFY. More precisely, we check whether f is ℓϵ-verified at (x+, x−) for the current ϵ. If it is, then (x+, x−) is ℓϵ-verified for some ϵ larger than the current one, so we increase ϵ; otherwise, we decrease ϵ. The final value of ϵ is the certified radius R_CL.

Pseudocode: prediction and certification for CL

  # judge whether f is ℓϵ-verified at (x+, x−)
  function PREDICT(f, x+, x−, ϵ)
    Input: encoder f, positive sample x+, negative sample x−, perturbation bound ϵ
    Output: True if f is ℓϵ-verified at (x+, x−); False otherwise
    f̲ = INCOMPLETEVERIFIER(f, x+, x−, ϵ)
    if f̲ > 0 then return True
    else return False

  # compute the certified radius of (x+, x−) on encoder f
  function CERTIFY(f, x+, x−, τ, Rl, Ru)
    Input: encoder f, positive sample x+, negative sample x−, tolerance τ, lower bound Rl, upper bound Ru
    Output: certified radius RCL
    Initialization: τ = 1e−6, Rl = 0, Ru = 1
    while |Ru − Rl| > τ do
      ϵ = (Rl + Ru) / 2
      # if f is ℓϵ-verified at (x+, x−), the answer is larger than the current ϵ
      if PREDICT(f, x+, x−, ϵ) then Rl = ϵ
      else Ru = ϵ
    end while
    return ϵ

D. Experimental Settings

D.1. Datasets and model architectures

Datasets. For model training, we use MNIST (LeCun & Cortes, 2010) and CIFAR-10 (Krizhevsky & Hinton, 2009). MNIST is a dataset of 28 × 28 grayscale images of handwritten digits between 0 and 9, containing 60,000 training images and 10,000 test images in 10 classes. CIFAR-10 contains 50,000 training and 10,000 test images in 10 classes; each CIFAR-10 image is a 32 × 32 color image.

Model architectures. Table D.1 summarizes the CNN encoder architectures. Each layer (except the last linear layer) is followed by a ReLU activation function. Base and Deep are used in Wang et al. (2021); CNN-A and CNN-B are used in Dathathri et al. (2020). To study the sensitivity to the feature dimension, the output dimension of the encoder is varied from 50 to 250 on CNN-B.

Table D.1. Model structures used in our experiments. For example, Conv(1, 8, 4) stands for a convolutional layer with 1 input channel, 8 output channels and a 4 × 4 kernel; Linear(754, 100) stands for a fully connected layer with 754 input features and 100 output features.

  Dataset    Model name   Encoder structure
  MNIST      Base         Conv(1, 8, 4) - Conv(8, 16, 4) - Linear(784, 100)
             CNN-A        Conv(1, 16, 4) - Conv(16, 32, 4) - Linear(1568, 100)
  CIFAR-10   Base         Conv(3, 8, 4) - Conv(8, 16, 4) - Linear(1024, 100)
             Deep         Conv(3, 8, 4) - Conv(8, 8, 3) - Conv(8, 8, 3) - Conv(8, 8, 4) - Linear(412, 100)
             CNN-A        Conv(3, 16, 4) - Conv(16, 32, 4) - Linear(2048, 100)
             CNN-B        Conv(3, 32, 5) - Conv(32, 128, 4) - Linear(8192, 100)

D.2. Training setup

All NNs are trained in the verification-agnostic setting (Dathathri et al., 2020), i.e., without using any tricks to promote verifiability. We use the models described in Appendix D.1 as the base encoder network and a 2-layer multi-layer perceptron as the projection head (Chen et al., 2020). We set the step size of the instance-wise attack to α = 0.007 and the number of PGD maximization iterations to K = 10. For the rest, we follow a setup similar to SimCLR (Chen et al., 2020) and RoCL (Kim et al., 2020). For optimization, we train the encoder for 500 epochs with the Adam optimizer (Kingma & Ba, 2015) and a learning rate of 0.001. For learning rate scheduling, the learning rate is dropped by a factor of 10 every 100 epochs. The batch size during training is 256.

D.3. Evaluation setup

Linear evaluation. We train the downstream linear layer on top of the frozen encoder; the training images are clean. We train the linear layer for 100 epochs with a learning rate of 0.001 and the cross-entropy loss, and the learning rate is dropped by a factor of 10 every 50 epochs.
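For concreteness, a minimal sketch of this linear-evaluation loop is given below. It is ours, not the released code; in particular, the use of Adam for the linear head and the names `encoder`, `train_loader`, `feat_dim` are assumptions for illustration.

```python
import torch
import torch.nn as nn


def linear_evaluation(encoder, train_loader, feat_dim=100, num_classes=10, device="cpu"):
    """Train a linear classifier on top of a frozen encoder using clean images."""
    encoder.eval()                                   # freeze the encoder
    for p in encoder.parameters():
        p.requires_grad_(False)

    head = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=50, gamma=0.1)  # drop lr by 10x every 50 epochs
    criterion = nn.CrossEntropyLoss()

    for epoch in range(100):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                z = encoder(x)                       # frozen features
            loss = criterion(head(z), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        sched.step()
    return head
```

Keeping the encoder frozen ensures that any robustness measured downstream is attributable to the learned representation rather than to further feature learning.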
Robust test. To evaluate adversarial robustness, we use the white-box projected gradient descent (PGD) attack. We use an ℓ∞ attack with 20 iteration steps. We set ϵ_test ∈ {0, 0.15, 0.2} for MNIST and ϵ_test ∈ {0, 8/255, 16/255} for CIFAR-10.

Incomplete verifiers. Unless otherwise specified, CBC (β-CROWN (Wang et al., 2021) for CL), working as an incomplete verifier, uses three minutes per image.

D.4. Training efficiency

Our experiments are conducted on a 64-bit Ubuntu Linux workstation with a 10-core Intel Xeon Silver CPU (2.20 GHz) and NVIDIA GeForce RTX 2080 Ti GPUs with 11 GB of graphics memory. For adversarial contrastive learning of the base encoder, training for 500 epochs takes about 12 hours on MNIST CNN-A and on CIFAR-10 CNN-B with a single GPU, and training the downstream linear layer for 100 epochs takes about 150 minutes.

E. Additional Experiments

E.1. Certified instance accuracy

Table E.2. Complete comparison of certified instance accuracy across various networks and attack strengths ϵ_test. All accuracies are computed on 100 test samples from MNIST and CIFAR-10; CBC uses 3 minutes per sample. The columns are the dataset, ϵ_neg, ϵ_test, model, ϵ_train, the robust instance accuracy under PGD, and the certified instance accuracy under CBC and CROWN. The clearly recoverable MNIST rows are reproduced below; the CIFAR-10 rows (ϵ_neg = 16/255 and 24/255) lost their row labels in extraction and are omitted.

  Dataset   ϵ_neg   ϵ_test   Model   ϵ_train   PGD    CBC    CROWN
  MNIST     0.3     0.1      Base    0         20%    2%     0%
                             CNN-A   0         6%     0%     0%
                             Base    0.1       100%   100%   100%
                             CNN-A   0.1       100%   100%   99%
            0.5     0.3      Base    0.3       100%   98%    42%
                             CNN-A   0.3       100%   85%    3%

Appendix E.1 summarizes the complete results of the certified instance accuracy provided by our proposed RVCL on MNIST and CIFAR-10. To control the similarity ρ(f(x+), f(x−)) between x+ and x−, we generate the negative sample x− via an instance-wise PGD attack (Kim et al., 2020) with strength ϵ_neg, which is much larger than ϵ_test (a sketch of this attack is given after Remark E.1). In Section 6.3, we present some of the results on CIFAR-10 with ϵ_neg = 16/255. As shown by the results in Appendix E.1, the gap between CBC and PGD is smaller than that between CROWN and PGD on both MNIST and CIFAR-10. This is consistent with the experimental results in Wang et al. (2021), and further illustrates the effectiveness of our proposed RVCL. From Appendix E.1, we can also make the following observations and remarks.

Influence of ϵ_test. If ϵ_test is small, the certified instance accuracy A^ϵ_CL of both CROWN and CBC approaches the robust instance accuracy given by instance-wise PGD. However, the gap between CROWN and CBC becomes large as ϵ_test increases. These results show that CBC is a more powerful verifier than CROWN; the reason is that CBC optimizes the intermediate-layer bounds and then iteratively tightens the lower bound. We can further observe that instance-wise PGD successfully attacks the model on all images of CIFAR-10 under ϵ_test = 8/255.

Remark E.1. The results in Appendix E.1 often show a high robust instance accuracy. The direct reason is that the instance-wise attack is more difficult than the label-wise attack. Theoretically, Theorem 5.3 shows that the robust radius R_CL ≥ R_LE. This implies that one may successfully attack an image label-wise with a small ϵ_test, but it is nearly impossible to attack it instance-wise with the same small ϵ_test, which results in a high robust instance accuracy. Figure 4(b) experimentally confirms this conclusion: ACR_CL is an order of magnitude larger than ACR_LE. Nevertheless, without loss of effectiveness, we can still evaluate the tightness of verifiers by comparing the gap between the certified instance accuracy and the robust instance accuracy.
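Below is a minimal sketch of an instance-wise ℓ∞ PGD attack in the spirit of Kim et al. (2020), used both to generate negative samples with strength ϵ_neg and to measure robust instance accuracy. This is our illustration, not the released implementation; in particular, the cosine-similarity objective and the function name are assumptions.

```python
import torch
import torch.nn.functional as F


def instance_wise_pgd(encoder, x_anchor, eps, alpha=0.007, steps=10):
    """l_inf PGD that pushes x away from the anchor in representation space.

    Maximizes 1 - cos(f(x), f(x_anchor)) subject to ||x - x_anchor||_inf <= eps.
    The cosine-similarity objective is assumed for illustration.
    """
    z_anchor = encoder(x_anchor).detach()
    x = x_anchor.clone().detach()
    x = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)  # random start in the ball

    for _ in range(steps):
        x.requires_grad_(True)
        loss = (1.0 - F.cosine_similarity(encoder(x), z_anchor, dim=-1)).mean()
        grad, = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x = x + alpha * grad.sign()                      # ascent step
            x = x_anchor + (x - x_anchor).clamp(-eps, eps)   # project back to the l_inf ball
            x = x.clamp(0.0, 1.0).detach()
    return x
```

Because the objective only uses the encoder's representations, the attack requires neither labels nor a downstream classifier, matching the instance-wise setting discussed above.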
Influence of ϵ_train. The certified instance accuracy A^ϵ_CL increases with increasing ϵ_train (consistent with Figure 4(c,d)), which demonstrates that A^ϵ_CL can also evaluate model robustness. However, as discussed in Remark 4.5, A^ϵ_CL is a function of the specific attack strength ϵ_test, so it is hard to compare the robustness of two models by comparing A^ϵ_CL across various values of ϵ_test. Thus, ACR_CL is a more suitable choice for comparing models with different robust performance.

E.2. Efficiency of incomplete verifiers

Figure E.1. ACR_CL calculated by CBC and CROWN on MNIST and CIFAR-10; |U_test| = 100, K = 10. (a) ACR_CL for MNIST; (b) ACR_CL for CIFAR-10.

Table E.3. Average calculation time of ACR_CL; the number of negative samples is K = 5, and the timeout is set to 0.3 s.

  Time (s)    CROWN   CBC
  MNIST       1.16    464.52
  CIFAR-10    3.66    921.78

Sections 6.1 and 6.2 use CROWN to compute the certified radius ACR_CL in the interest of efficiency. This subsection shows that CBC achieves an ACR_CL similar to CROWN's, but takes much more time. The timeout is set to 0.3 s for each step of the CBC binary search. CNN-A is run on MNIST and CNN-B on CIFAR-10. The experimental setting of Figure E.1 is the same as that in Section 6.1. Table E.3 reports the average time to calculate ACR_CL, as defined in Section 6.2, over 20 images.

The results in Figure E.1 show that the ACR_CL over U_test provided by CBC and by CROWN is nearly the same, and both show the same tendency in robust performance. This is because the time allowed for each step of the binary search is too short for CBC to tighten the lower bound. However, Table E.3 shows that CBC is about 300 times slower than CROWN, even with a small timeout for CBC. Therefore, we conclude that CBC is able to achieve a tight bound but is time-consuming when computing R_CL, and it is reasonable to use CROWN to compute the ACR_CL of models and images.
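To tie E.2 back to the CERTIFY procedure of Appendix C.2, here is a minimal sketch (ours; the `verifier` callable and its signature are assumptions) of how ACR_CL can be computed over a set of (x+, x−) pairs by running the binary-search certification for each pair and averaging the resulting radii:

```python
def average_certified_radius(verifier, encoder, test_pairs, tol=1e-6, r_max=1.0):
    """Average certified radius over (x+, x-) pairs via the binary search of CERTIFY.

    `verifier(encoder, x_pos, x_neg, eps)` is assumed to return the verified lower
    bound of (9); the pair is l_eps-verified when that bound is positive.
    """
    radii = []
    for x_pos, x_neg in test_pairs:
        lo, hi = 0.0, r_max
        while hi - lo > tol:
            eps = 0.5 * (lo + hi)
            if verifier(encoder, x_pos, x_neg, eps) > 0:   # still verified: radius is larger
                lo = eps
            else:                                          # falsified: radius is smaller
                hi = eps
        radii.append(lo)
    return sum(radii) / len(radii)
```

Swapping the `verifier` between a CROWN-style and a β-CROWN-style bound is what produces the CROWN and CBC columns compared in Table E.3; the per-step timeout simply caps how long the stronger verifier may spend tightening each bound.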