# adversarially_robust_conformal_prediction__ce7bc1d0.pdf

Published as a conference paper at ICLR 2022

ADVERSARIALLY ROBUST CONFORMAL PREDICTION

Asaf Gendler1, Tsui-Wei Weng2, Luca Daniel2, Yaniv Romano1
1Department of Electrical and Computer Engineering, Technion - Israel Institute of Technology
2Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology

ABSTRACT

Conformal prediction is a model-agnostic tool for constructing prediction sets that are valid under the common i.i.d. assumption, which has been applied to quantify the prediction uncertainty of deep net classifiers. In this paper, we generalize this framework to the case where adversaries exist during inference time, under which the i.i.d. assumption is grossly violated. By combining conformal prediction with randomized smoothing, our proposed method forms a prediction set with a finite-sample coverage guarantee that holds for any data distribution with ℓ2-norm bounded adversarial noise, generated by any adversarial attack algorithm. The core idea is to bound the Lipschitz constant of the non-conformity score by smoothing it with Gaussian noise, and to leverage this knowledge to account for the effect of the unknown adversarial perturbation. We demonstrate the necessity of our method in the adversarial setting and the validity of our theoretical guarantee on three widely used benchmark data sets: CIFAR10, CIFAR100, and ImageNet.

1 INTRODUCTION

Deep neural net classifiers have achieved tremendous accomplishments over the last several years. Nevertheless, the increased deployment of these algorithms in real-world applications, especially ones that can be life-threatening like autonomous driving, raises major concerns about their reliability (Heaven, 2019). To alleviate these issues, it is important to develop techniques that allow the users to assess the uncertainty in predictions obtained by complex classifiers, revealing their limitations.

Conformal prediction (Vovk et al., 2005) is a simple yet powerful tool for generating prediction sets whose size reflects the prediction uncertainty. Specifically, suppose we are given n training examples {(Xi, Yi)}ni=1 with feature vector Xi ∈ ℝd, discrete and unordered class label Yi ∈ {1, 2, . . . , L} = 𝒴, and any learning algorithm that aims at predicting the unknown Yn+1 of a given test point Xn+1. Under the assumption that the training and test examples are sampled exchangeably (e.g., they may be drawn i.i.d. from an unknown distribution PXY), conformal prediction algorithms construct a distribution-free prediction set C(Xn+1) ⊆ 𝒴 guaranteed to contain the test label Yn+1 at any desired coverage probability 1 − α ∈ (0, 1):

$$\mathbb{P}\left[Y_{n+1} \in C(X_{n+1})\right] \geq 1 - \alpha. \tag{1}$$

For example, it is common to set the desired coverage level 1 − α to be 90% or 95%. Note that the coverage probability P[Yn+1 ∈ C(Xn+1)] is marginal because it is taken over all the training examples {(Xi, Yi)}ni=1 and the test point Xn+1.

The key idea of conformal prediction is to fit a classifier on the training set and use this model to assign non-conformity scores to held-out data points. These scores reflect the prediction error of the underlying classifier, where, loosely speaking, a smaller prediction error would lead to the construction of smaller and more informative sets.
However, the sets constructed by the vanilla conformal method may not have the right coverage when the training and test points violate the exchangeability assumption (Cauchois et al., 2020; Gibbs & Candès, 2021; Tibshirani et al., 2019; Podkopaev & Ramdas, 2021; Guan & Tibshirani, 2019), which is hardly satisfied by real data in practice as distribution shift happens frequently (Koh et al., 2021). In particular, we consider the potential threat of adversarial attacks (Goodfellow et al., 2015; Szegedy et al., 2014; Carlini & Wagner, 2017): carefully crafted, human-imperceptible noise perturbations that drive the fitted model to err at test (inference) time. Such noise perturbations can introduce a large and arbitrary distribution shift that is extremely hard to estimate.

Figure 1: Marginal coverage (i.e., the percent of test labels that are included in the prediction sets) and average set-size obtained by vanilla conformal prediction (Vanilla CP) and our RSCP method, evaluated on the test set of CIFAR10. We use a VGG-19 classifier and set the nominal coverage level to be (1 − α) = 0.9. See Section 5 and Supplementary Section S8 for more details.

In this setting, the prediction sets constructed by the vanilla conformal approach are often invalid, i.e., do not satisfy (1), as illustrated in Figure 1. Following that figure, while this method achieves the desired 90% coverage when applied to clean test data, the empirical coverage obtained when applying the same method on adversarial test examples falls dramatically below 90%, to about 30%. Therefore, it is the main motivation of this work to construct prediction sets that are robust to adversarial attacks. We formalize this requirement as follows:

$$\mathbb{P}\left[Y_{n+1} \in C_\delta(\tilde{X}_{n+1})\right] \geq 1 - \alpha, \tag{2}$$

where X̃n+1 = Xn+1 + ε is the adversarial test example, and ‖ε‖2 ≤ δ is a norm-bounded adversarial perturbation. Note that we use the notation Cδ to distinguish the new setting from the exchangeability case for which δ = 0. Crucially, we require (2) to hold in finite samples, for any PXY, any adversarial perturbation of magnitude bounded by δ, and regardless of the choice or accuracy of the underlying classifier. At the same time, we wish Cδ to be as small as possible.

To realize (2) with the desired coverage, we propose to combine randomized smoothing (Duchi et al., 2012; Cohen et al., 2019; Salman et al., 2019) with the vanilla conformal prediction procedure, and hence we name our technique Randomly Smoothed Conformal Prediction (RSCP). Randomized smoothing allows us to bound the Lipschitz constant of any non-conformity score function by convolving it with the Gaussian distribution function. Leveraging this bound, we show how to modify conformal prediction and rigorously construct prediction sets that account for the adversarial perturbation. Figure 1 illustrates that our proposed RSCP approach successfully attains the desired coverage level whereas the vanilla conformal method fails. Observe that RSCP constructs slightly larger prediction sets, reflecting the increased uncertainty induced by the adversarial noise.

The contributions of this paper are two-fold: (i) We propose, for the first time, a new conformal prediction method that can account for the potential adversarial threats during inference time.
Our RSCP method, described in Section 3, is model-agnostic in that it can work with any classifier and non-conformity score, scalable since the smoothing can be done by Monte Carlo integration with many i.i.d. Gaussian noise realizations, and robust against any ℓ2-norm bounded adversarial attack. (ii) We prove that the prediction sets constructed by RSCP are valid in the sense of (2), and, in Section 5, support this theoretical result with numerical experiments on the CIFAR10, CIFAR100, and ImageNet data sets.

2 CONFORMAL PREDICTION

Since the focus of this paper is on how to adapt the vanilla conformal prediction method to the adversarial setting, in this section we give background on conformal prediction. While we focus here on classification problems, the method of conformal prediction can also be applied to regression tasks. The vanilla conformal prediction can be divided into two categories: split conformal prediction and full conformal prediction. The first one involves data splitting, while the second one constructs valid sets without splitting the data but at the cost of a significant increase in computational complexity at test time (Papadopoulos et al., 2002; Papadopoulos, 2008; Vovk et al., 2005). To avoid prohibitive computational complexity, in this paper we focus only on split conformal prediction.

This method starts by splitting the training data into two disjoint subsets: a proper training set Itr ⊂ {1, . . . , n} and a calibration set Ical = {1, . . . , n} \ Itr. Then, a classifier f̂(x) ∈ [0, 1]L is fit to the proper training set, estimating the conditional class probabilities P[Y = y | X = x] for all y ∈ 𝒴. In the case of deep net classifiers, which is the focus of this work, this may be the output of the softmax layer. Next, we compute a non-conformity score Si = S(Xi, Yi) ∈ ℝ for each calibration point {(Xi, Yi)}i∈Ical. This score expresses how well the model prediction f̂(X) is aligned with the true label Y, where a lower score implies better alignment. For example, the score from Vovk et al. (2005); Lei et al. (2013) is given by

$$S(x, y) = 1 - \hat{f}(x)_y, \tag{3}$$

where f̂(x)y ∈ [0, 1] is the yth entry of the vector f̂(x). Another example is the score proposed by Romano et al. (2020b), which can be expressed as

$$S(x, y) = \sum_{y' \in \mathcal{Y}} \hat{f}(x)_{y'}\, \mathbb{I}\!\left\{\hat{f}(x)_{y'} > \hat{f}(x)_{y}\right\} + \hat{f}(x)_{y} \cdot u, \tag{4}$$

where 𝕀 is the indicator function and u is a random variable distributed uniformly over the segment [0, 1]. We refer to the score from (3) as HPS as it was shown to construct homogeneous prediction sets. Analogously, we refer to (4) as APS since it tends to yield adaptive prediction sets that better reflect the underlying uncertainty across sub-populations; see Romano et al. (2020b) for more details.

Given the desired coverage level 1 − α, the prediction set for a new test point Xn+1 is formulated as

$$C(X_{n+1}) = \left\{ y \in \mathcal{Y} : S(X_{n+1}, y) \leq Q_{1-\alpha}\!\left(\{S_i\}_{i \in \mathcal{I}_{\mathrm{cal}}}\right) \right\}, \tag{5}$$

where

$$Q_{1-\alpha}\!\left(\{S_i\}_{i \in \mathcal{I}_{\mathrm{cal}}}\right) := \text{the } (1-\alpha)\!\left(1 + \tfrac{1}{|\mathcal{I}_{\mathrm{cal}}|}\right)\text{-th empirical quantile of } \{S_i\}_{i \in \mathcal{I}_{\mathrm{cal}}} \tag{6}$$

is the score positioned ⌈(1 − α)(1 + |Ical|)⌉ in the sorted array of calibration scores Si, i ∈ Ical. In plain words, in (5) we sweep over all possible labels y ∈ 𝒴 and include in C(Xn+1) the guessed labels y whose scores S(Xn+1, y) are smaller than most of the calibration scores S(Xi, Yi). Since the calibration and test points are drawn exchangeably from PXY and f̂ is fixed, the score S(Xn+1, y) for the guess y = Yn+1 can fall anywhere in the sorted array of calibration scores with equal probability.
This property guarantees that the prediction set (5) satisfies (1); see Vovk et al. (2005).

3 RANDOMLY SMOOTHED CONFORMAL PREDICTION

In this section, we introduce our proposed RSCP framework for constructing prediction sets that are valid in the adversarial regime. Recall that an adversarial attack can lead to a significant distributional shift between the clean calibration points and the corrupted test example X̃n+1 = Xn+1 + ε, thus violating the fundamental exchangeability assumption of the split conformal procedure. Focusing on the guess y = Yn+1, an effective attack would result in a larger non-conformity score for the corrupted test point S(X̃n+1, y) compared to that of the clean input S(Xn+1, y). Therefore, a naive comparison of S(X̃n+1, y) to the same threshold Q1−α from (6), which neglects the increased uncertainty caused by the adversarial perturbation, will result in a prediction set that may not achieve the desired coverage, as already illustrated in Figure 1. To address this, we should compare the test score to an inflated threshold, larger than Q1−α, which rigorously accounts for the effect of the adversarial noise. This is the core idea behind our proposal described in detail below; see Figure 2.

Figure 2: Schematic demonstration of the effect of an adversarial noise on vanilla split conformal prediction, along with our proposed solution. The vanilla conformal set C(X) and our RSCP set Cδ(X) are generated as described in (5) and (8), respectively. The black arrows point to the values of the non-conformity scores that correspond to the clean (left) and adversarial (right) test points, which are further marked by the vertical purple lines.

3.1 ADVERSARIALLY ROBUST CALIBRATION

Suppose we are given a non-conformity score function S̃ for which we can bound by how much its value could be increased due to the adversarial noise ‖ε‖2 ≤ δ added to Xn+1. Formally, we require the score S̃ to satisfy the following relation:

$$\tilde{S}(\tilde{X}_{n+1}, y) \leq \tilde{S}(X_{n+1}, y) + M_\delta, \quad \forall y \in \mathcal{Y}, \tag{7}$$

where Mδ ≥ 0 is a constant that is a function of δ, such that Mδ1 ≤ Mδ2 for δ1 ≤ δ2 and Mδ = 0 for δ = 0. In essence, we would like (7) to hold for the smallest possible Mδ. We denote this score function by S̃ to emphasize that it must satisfy (7), distinguishing it from existing non-conformity scores S, e.g., (3) and (4); see Section 3.2 for a concrete and very general framework for designing S̃ for which the constant Mδ can be easily derived.

Importantly, Mδ serves as a bridge between the observed score S̃(X̃n+1, y) and the unobserved one S̃(Xn+1, y) for a fixed y ∈ 𝒴. Leveraging this property, we propose to construct a prediction set robust to a norm-bounded adversarial attack by applying the following decision rule:

$$C_\delta(\tilde{X}_{n+1}) = \left\{ y \in \mathcal{Y} : \tilde{S}(\tilde{X}_{n+1}, y) \leq Q_{1-\alpha}\!\left(\{\tilde{S}_i\}_{i \in \mathcal{I}_{\mathrm{cal}}}\right) + M_\delta \right\}, \tag{8}$$

where S̃i = S̃(Xi, Yi). In contrast to the vanilla split conformal approach (5), the prediction set defined above is generated by comparing the test score to an inflated threshold Q1−α + Mδ, as illustrated in Figure 2. Notice that the level of inflation is affected by the magnitude of the adversarial perturbation as well as the robustness of S̃, i.e., the value of Mδ. A larger perturbation implies larger inflation, and a more robust score S̃ implies smaller inflation.
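The calibration step in (5)–(6) and the inflated decision rule (8) amount to only a few lines of code. Below is a minimal NumPy sketch, not the authors' implementation, that uses the HPS score (3) as a stand-in; for the guarantee of Theorem 1 the same score S̃ satisfying (7) must be used both for calibration and at test time, with Mδ as derived in Section 3.2 (e.g., δ/σ for the smoothed score). The function names and the clipping of the quantile level are our own choices.

```python
import numpy as np

def hps_scores(probs, labels):
    """HPS non-conformity score (3): one minus the softmax mass of the true label.
    probs: (m, L) array of class probabilities; labels: (m,) array of integer labels."""
    return 1.0 - probs[np.arange(len(labels)), labels]

def conformal_quantile(cal_scores, alpha):
    """The (1 - alpha)(1 + 1/|I_cal|) empirical quantile from (6); the level is
    clipped at 1.0 as a safeguard for very small calibration sets."""
    n = len(cal_scores)
    level = min((1.0 - alpha) * (1.0 + 1.0 / n), 1.0)
    return np.quantile(cal_scores, level, method="higher")

def prediction_set(test_probs, threshold):
    """Vanilla set (5): all labels whose score falls below the calibrated threshold.
    test_probs: (L,) array of class probabilities for one test point."""
    scores = 1.0 - test_probs                 # HPS score for every candidate label
    return np.where(scores <= threshold)[0]

def robust_prediction_set(test_probs, threshold, m_delta):
    """Robust set (8): the same rule, but with the threshold inflated by M_delta."""
    return prediction_set(test_probs, threshold + m_delta)

# Usage sketch: cal_probs (n_cal, L) and cal_labels come from the calibration split,
# and test_probs (L,) is the classifier output on one (possibly attacked) test image.
# q = conformal_quantile(hps_scores(cal_probs, cal_labels), alpha=0.1)
# C = robust_prediction_set(test_probs, q, m_delta=delta / sigma)
```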
The theorem below states that the constructed prediction set (8) is guaranteed to contain the unknown target label Yn+1 with a probability of at least 1 − α, for any distribution PXY, sample size n, score function S̃ that satisfies (7), and adversarial perturbation of magnitude δ generated by any attack algorithm. The proof of this and all other results can be found in Section S1 of the Supplementary Material.

Theorem 1. Assume that the samples {(Xi, Yi)}n+1i=1 are drawn exchangeably from some unknown distribution PXY. Let X̃n+1 = Xn+1 + ε be a clean test example Xn+1 with an additive corruption of an ℓ2-norm bounded adversarial noise ‖ε‖2 ≤ δ. Then, the prediction set Cδ(X̃n+1) defined in (8) satisfies P[Yn+1 ∈ Cδ(X̃n+1)] ≥ 1 − α.

Before presenting our framework to construct scores that rigorously satisfy (7), we pause to prove a lower bound on the coverage that the vanilla split conformal could attain in the adversarial setting.

Theorem 2. Under the assumptions of Theorem 1, the prediction set C(X̃n+1) defined in (5), applied with S̃ in place of S, satisfies P[Yn+1 ∈ C(X̃n+1)] ≥ τ̃, where

$$\tilde{\tau} = \max\left\{ \tau : Q_{\tau}\!\left(\{\tilde{S}_i\}_{i \in \mathcal{I}_{\mathrm{cal}}}\right) \leq Q_{1-\alpha}\!\left(\{\tilde{S}_i\}_{i \in \mathcal{I}_{\mathrm{cal}}}\right) - M_\delta \right\}. \tag{9}$$

Note that τ̃ can be simply computed by running a grid search on the sorted array of calibration scores. It is important to observe that in contrast to Theorem 1, which guarantees at least 1 − α coverage for any user-specified level α, the worst coverage level τ̃ of the vanilla split conformal is not controlled explicitly, i.e., τ̃ is known only after looking at the training and calibration data. Observe also that, by construction, τ̃ ≤ 1 − α (see Supplementary Section S1), and equality is met in the special case where δ = 0, i.e., no attack is performed. In this special case, Mδ = 0 by definition, and both Theorems 1 and 2 reduce to the classic coverage guarantee of the vanilla conformal algorithm.

3.2 THE RANDOMLY SMOOTHED SCORE

To apply the robust calibration procedure presented above, we need to have access to a constant Mδ that satisfies (7) for a given non-conformity score function. In general, this constant is unknown for modern deep neural net classifiers as well as for complex non-conformity scores, such as the one defined in (4). In what follows, we present a novel framework that takes inspiration from the literature on randomized smoothing (Duchi et al., 2012; Cohen et al., 2019; Salman et al., 2019), offering a theoretically grounded mechanism to bound the Lipschitz constant of any scoring function derived from any black-box classifier. Concretely, we suggest constructing a new score S̃ from the base score S as follows:

$$\tilde{S}(x, y) = \Phi^{-1}\!\left( \mathbb{E}_{v \sim \mathcal{N}(0_d, \sigma^2 I_d)}\left[ S(x + v, y) \right] \right), \tag{10}$$

where S̃ is a smoothed version of the base score S, obtained by averaging the value of S(x + v, y) over many independent samples v ∼ N(0d, σ2Id) drawn from the multivariate normal distribution, and then applying Φ−1, the inverse of the cumulative distribution function (CDF) of the standard normal distribution. The hyper-parameter σ controls the level of smoothing, where a larger σ implies stronger smoothing. (In Section 5.1, Figure 3, and Supplementary Section S5.1 we discuss the effect of this parameter on the statistical efficiency of the overall calibration procedure.)
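Because the expectation in (10) has no closed form, in practice it is approximated by a Monte Carlo average (this is made precise in (12) below). The following PyTorch-style sketch is our own illustration rather than the authors' code: it estimates the smoothed HPS score for a single input, assumes `model` returns logits, and clamps the empirical mean away from 0 and 1 as a safeguard to keep Φ−1 finite.

```python
import torch
from torch.distributions import Normal

def smoothed_score(model, x, y, sigma, n_samples=256, eps=1e-6):
    """Monte Carlo estimate of the randomly smoothed score (10): average the HPS base
    score over Gaussian perturbations of x, then apply the standard normal inverse CDF.
    x: a (C, H, W) image tensor; y: integer label index."""
    noise = sigma * torch.randn(n_samples, *x.shape)          # v_j ~ N(0, sigma^2 I)
    with torch.no_grad():
        logits = model(x.unsqueeze(0) + noise)                # one batched forward pass
        probs = torch.softmax(logits, dim=1)
    base_scores = 1.0 - probs[:, y]                           # HPS base score, in [0, 1]
    mean_score = base_scores.mean().clamp(eps, 1.0 - eps)     # keep Phi^{-1} finite
    return Normal(0.0, 1.0).icdf(mean_score)                  # Phi^{-1}(average)
```

Calibrating with these estimated scores and inflating the threshold by Mδ = δ/σ, as justified next, yields the RSCP set of Algorithm 1.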
Assuming that the base score function satisfies S(x, y) ∈ [0, 1] for any given (x, y), which holds for most non-conformity scores in multi-class classification problems, we can invoke a well-known result from the randomized smoothing literature (Salman et al., 2019, Lemma 2) and get the following upper bound for the smoothed score:

$$\tilde{S}(\tilde{X}_{n+1}, y) \leq \tilde{S}(X_{n+1}, y) + \frac{\|\varepsilon\|_2}{\sigma} \leq \tilde{S}(X_{n+1}, y) + \frac{\delta}{\sigma}. \tag{11}$$

Importantly, (11) holds for every y ∈ 𝒴, hence rigorously satisfying the relation (7) for the choice of Mδ = δ/σ. Armed with S̃, we can invoke Theorems 1 and 2 and construct an adversarially robust prediction set by following (8), or lower bound the coverage of the vanilla split conformal procedure.

Corollary 1. Let S̃ be the randomly smoothed score function (10) and set Mδ = δ/σ. The prediction set defined in (8) satisfies P[Yn+1 ∈ Cδ(X̃n+1)] ≥ 1 − α, and the worst coverage of (5) is given by P[Yn+1 ∈ C(X̃n+1)] ≥ τ̃, where τ̃ = max{τ : Qτ({S̃i}i∈Ical) ≤ Q1−α({S̃i}i∈Ical) − δ/σ}.

The above leads to two possible use-cases of our proposal. In the first, we can implement our randomly smoothed conformal prediction procedure (RSCP) (8) and construct valid prediction sets at any desired level, bearing in mind that these sets are likely to be larger due to the increased uncertainty induced by the adversarial perturbation. In the second case, we can deploy the vanilla split conformal algorithm with a smoothed version of the base score, providing important information about the worst coverage that might be obtained under an adversarial attack.

In practice, the smoothed score cannot be evaluated explicitly as it requires computing the expectation in (10). However, this smoothed score can be easily estimated by averaging over many i.i.d. Gaussian noise realizations in a Monte Carlo fashion (Cohen et al., 2019, Section 3.2), as described in Algorithm 1; see Section 5.1 and Supplementary Section S5.2 for more details about the effect of this approximation on the coverage and set-size. Here, an interesting future direction is to develop a high probability bound for the smoothed score approximation error and integrate it with our method.

Algorithm 1 RSCP: Randomly Smoothed Conformal Prediction
Input: Data {(Xi, Yi)}ni=1; a test point X̃n+1 corrupted by adversarial noise of energy δ; learning algorithm A; a base non-conformity score S; and coverage level 1 − α.
1: Split the training data into 2 disjoint subsets Itr and Ical.
2: Train a classifier using A on all samples from Itr, i.e., f̂(X) ← A({(Xi, Yi)}i∈Itr).
3: Draw vij i.i.d. ∼ N(0d, σ2Id), where i ∈ Ical and j = 1, . . . , ns, and compute
$$\hat{S}_i = \hat{S}(X_i, Y_i) = \Phi^{-1}\!\left( \frac{1}{n_s} \sum_{j=1}^{n_s} S(X_i + v_{ij}, Y_i) \right), \quad \text{for all } i \in \mathcal{I}_{\mathrm{cal}}. \tag{12}$$
4: Compute Q1−α({Ŝi}i∈Ical) := the (1 − α)(1 + 1/|Ical|)-th empirical quantile of {Ŝi}i∈Ical.
5: Construct a prediction set for X̃n+1:
$$C_\delta(\tilde{X}_{n+1}) = \left\{ y \in \mathcal{Y} : \hat{S}(\tilde{X}_{n+1}, y) \leq Q_{1-\alpha}\!\left(\{\hat{S}_i\}_{i \in \mathcal{I}_{\mathrm{cal}}}\right) + \frac{\delta}{\sigma} \right\}.$$
Output: A prediction set Cδ(X̃n+1) for the unobserved test label Yn+1.

4 RELATED WORK

4.1 CONFORMAL PREDICTION UNDER DISTRIBUTION SHIFT

Previous work offers several generalizations of the vanilla conformal prediction framework to the case of distribution shift between the training and the test points; however, none of them addresses the adversarial setting specifically. Cauchois et al. (2020) suggested a method to construct valid sets when the test distribution of the non-conformity scores is in an f-divergence ball around their training distribution.
While this approach shares the idea of inflating the value of Q1 α, the f-divergence measure is notoriously difficult to estimate in practice (Nguyen et al., 2010; Tsybakov, 2009) and thus less applicable compared to our approach, which focuses on the addition of bounded ℓ2 adversarial noise (see Supplementary Section S12 for a comparison between the methods). Gibbs & Cand es (2021) generalized the vanilla conformal prediction approach to an online setting where the data generating distribution varies over time by modifying the threshold adaptively as new labeled points are being observed, such that the long-term empirical coverage frequency would be at least 1 α. This is of course different from our approach that works offline and assumes the same data generating process but with the addition of a malicious small perturbation induced by an attacker. Tibshirani et al. (2019) studied the case of covariate shift and offered a weighted version of conformal prediction to address it. In this setting, the shift is only in PX while PY |X remains intact. Podkopaev & Ramdas (2021) modified this method to tackle the label-shift case, in which PX|Y remains the same but PY varies between train and test time. Both approaches differ from ours, which handles a full distributional shift induced by the adversarial perturbation. Lastly, Hechtlinger et al. (2018) suggested a non-conformity score that empirically tends to be more robust to adversarial attacks, replacing the classifier s class probability estimates of Y | X with a density estimator that approximates the conditional distribution of X | Y . In contrast to our RSCP approach, this method is not supported by a theoretical guarantee for attaining the desired coverage in the adversarial setting. 4.2 CERTIFIED ADVERSARIAL ROBUSTNESS A different but related line of research is known as robustness certification . Here, the goal is to provide an ℓp radius around a test point, in which no adversarial perturbations exist, i.e, the predicted label provably remains the same inside this radius. Currently, one of the mainstream approaches in robustness certification is through randomized smoothing, which was proposed in (Lecuyer et al., 2019; Li et al., 2019; Cohen et al., 2019) to provide an analytic form of robustness certificate. More formally, given a base classifier ˆf(x), one can construct a smoothed classifier according to the following decision rule: ˆy = argmaxy P[ ˆfy(x + v)], v N(0d, σ2Id). (13) In contrast to the base classifier, the smoothed classifier can be supported by a rigorous ℓ2 certification radius around the test point Xn+1 = x. It is self-evident that our work is different from robustness certification, as we are not aiming to deliver a robustness certificate for a given classifier over some test point; instead, we construct a valid prediction set whose size can vary with x. However, both works make use of the randomized smoothing technique, sharing related set of ideas. Published as a conference paper at ICLR 2022 For example, practical evidence shows that for the smoothed model (13) to correctly classify the test point Xn+1, the base classifier ˆf needs to consistently label the noisy sample N Xn+1, σ2I as Yn+1 (Cohen et al., 2019, Section 3.3). Training the base classifier without accounting for the future smoothing operation would not necessarily satisfy this requirement, and, as a consequence, would lead to a very small certification radius. 
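To make the smoothed decision rule (13) concrete, here is a minimal sketch in the spirit of Cohen et al. (2019) that replaces the probability in (13) with a Monte Carlo majority vote over Gaussian perturbations. This is an illustration only: the actual certification procedure additionally requires a statistical test on the vote counts, and the function below is our own naming, not part of any referenced package.

```python
import torch

def smoothed_predict(model, x, sigma, n_samples=100):
    """Approximate the smoothed classifier (13): classify many Gaussian perturbations
    of x with the base model and return the label that wins the majority vote."""
    noise = sigma * torch.randn(n_samples, *x.shape)
    with torch.no_grad():
        votes = model(x.unsqueeze(0) + noise).argmax(dim=1)   # base-classifier predictions
    return torch.mode(votes).values.item()                    # most frequent label
```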
We note that this observation is also relevant to our setting, where we found that using a classifier that is not robust to Gaussian perturbations results in prediction sets that are often large and thus uninformative. More details are provided in Section S4 of Supplementary Material. One way to improve the certified radius is by training the base classifier on the labeled training pairs {(N Xi, σ2I , Yi)}i, i Itr as in Lecuyer et al. (2019), where σ is equal to the one used for smoothing. Later, Salman et al. (2019) further improved the robustness guarantee through adversarial training, where the adversary generates a bounded perturbation via the method of SMOOTHADV an adversarial attack designed specifically against smoothed classifiers. A more recent work by Salman et al. (2020) suggested adding a denoising pre-processing layer preceding the classifier, to remove the Gaussian noise used in randomized smoothing. In our experiments, we find that the training strategy proposed in Lecuyer et al. (2019) works well in practice and results in relatively small prediction sets, although a better training scheme such as the one from Salman et al. (2019) or Salman et al. (2020) could be easily combined with our method and may further improve the overall statistical efficiency. 5 EXPERIMENTS In this section, we evaluate the performance of our proposed methods on three benchmark image classification data sets: CIFAR10, CIFAR100 (Krizhevsky, 2009), and Image Net ILSVRC2012 (Deng et al., 2009), which are described in Supplementary Section S2. We evaluate our methods as follows. First, we fit a model on the entire training set. Then, we split the remaining data into two equally sized disjoint subsets, one is used for calibration and the second for testing. The calibration is done using the two base non-conformity scores mentioned in Section 2, i.e., HPS (3) and APS (4). Next, we attack each point in the test set by adding to it an adversarial perturbation whose ℓ2-norm is bounded by δ.1 Lastly, for each adversarial test point, we construct a prediction set using three different conformal methods (described below), asking for a nominal coverage level of 1 α. Given the constructed sets, we test the validity of each method by reporting the marginal coverage rate, and the statistical efficiency by computing the average size of the prediction sets. We repeat this process 50 times by randomly assigning data points to form the calibration and test sets. In our experiments, we compare three different approaches: (i) vanilla split conformal prediction implemented with the two base-scores (HPS and APS), denoted by Vanilla CP; (ii) vanilla split conformal prediction with our proposed smoothed version of the base scores, denoted by CP+SS; and (iii) the proposed randomly smoothed conformal prediction (see Algorithm 1), denoted by RSCP. Our experiments show that Vanilla CP constructs invalid prediction sets whose marginal coverage is way below the desired level. The smoothed score improves robustness and coverage but still does not achieve the nominal coverage rate. The prediction sets constructed by our RSCP approach achieve the desired coverage level, aligning with our theoretical guarantees. 
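The two quantities reported throughout this section, marginal coverage and average set-size, can be computed with a few lines; a sketch under the assumption that `build_set` maps a (possibly attacked) test image to a set of candidate labels, with helper names of our own choosing.

```python
import numpy as np

def evaluate(build_set, test_images, test_labels):
    """Empirical versions of the two metrics of Section 5: the fraction of test labels
    covered by their prediction set, and the average number of labels per set."""
    sets = [build_set(x) for x in test_images]
    coverage = float(np.mean([y in s for s, y in zip(sets, test_labels)]))
    avg_size = float(np.mean([len(s) for s in sets]))
    return coverage, avg_size
```

A valid method should keep the first number at or above the nominal 1 − α (here 0.9) on average over the 50 random calibration/test splits.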
5.1 IMPLEMENTATION DETAILS

Models. For all data sets, we choose ResNet (He et al., 2016) to be the base architecture of the deep net classifier and fit two different models for each data set: one on clean training points and the second on points that are augmented with Gaussian noise of standard deviation σ that is equal to the smoothing parameter from (10). The former is used to explore the effect of adversarial examples on the validity of Vanilla CP. The latter training strategy improves the robustness of the classifier and, as a consequence, reduces the size of the prediction sets constructed by the CP+SS and RSCP methods, as discussed in Section 4.2 and Supplementary Section S4. For the CIFAR10 and ImageNet data sets, we use the pre-trained ResNet-110 and ResNet-50 models from Cohen et al. (2019), and for CIFAR100 we fit our own ResNet-110 model as the latter data set is not studied by Cohen et al. (2019); please see Supplementary Section S3 for additional details on the training procedure. See also Supplementary Section S8 for similar experiments in which we choose DenseNet (Iandola et al., 2014) and VGG (Simonyan & Zisserman, 2014) to be the base architectures.

1 See Supplementary Section S9 for additional experiments evaluated on clean data points.

Figure 3: The effect of σ on the size of RSCP's prediction sets. The results correspond to the CIFAR100 data set, with δ = 0.125 and ns = 256.

Figure 4: The effect of the number of noise realizations ns (used to estimate the smoothed score) on the marginal coverage (left) and size (right) of prediction sets constructed by RSCP. The results correspond to the CIFAR10 data set, where we set δ and σ to be equal to 0.125 and 0.25, respectively.

Attack algorithm. For Vanilla CP, we generate adversarial noise using the projected gradient descent (PGD) attack (Madry et al., 2018), which is one of the strongest white-box attacks known in the literature to construct ℓ2-bounded adversarial examples. For the other two methods that make use of randomized smoothing (CP+SS and RSCP), we apply SMOOTHADV (Salman et al., 2019), a variation of PGD that is tailored for smoothed classifiers. For CIFAR10 and CIFAR100, we attack the test points with an adversarial perturbation of energy δ = 0.125, and for ImageNet we use a larger perturbation of magnitude δ = 0.25. Both choices lead to a significant reduction in the test accuracy of the base classifier, as summarized in Supplementary Section S6.

Choosing σ for smoothing. Recall Corollary 1 and observe that the coverage guarantee holds for any choice of the smoothing level σ. This parameter is intimately connected to the accuracy/robustness trade-off (Cohen et al., 2019, Section 4): a large σ will reduce the Lipschitz constant Mδ = δ/σ, however, it may also reduce the accuracy of the base classifier (even if fitted on noisy samples). Similar to previous work on randomized smoothing (Cohen et al., 2019; Salman et al., 2019), we treat σ as a hyper-parameter of our method which can be tuned, for example, by choosing the σ that produces the smallest sets by cross-validation.2 In practice, we find that setting σ = 2δ in (10) yields informative prediction sets, as illustrated in Figure 3. This figure also demonstrates the accuracy/robustness trade-off discussed above: the size of the prediction sets reduces for 1/8 ≤ Mδ ≤ 1/2, and increases for Mδ > 1/2.

2 Note that our objective is different from that of certified robustness, which seeks a σ that leads to a large radius in which the prediction is certifiably robust.
See Supplementary Section S5.1 for additional experiments that study the effect of σ on the marginal coverage and average set-size for all data sets.

Choosing the number of Monte Carlo iterations. Following Algorithm 1, the smoothed score is evaluated by replacing the expectation (10) with the empirical mean (12), averaging over ns i.i.d. Gaussian noise samples. This is similar to the approach Cohen et al. (2019, Section 3.2) used for approximating the smoothed classifier defined in (13). In our experiments, we find that our method is fairly robust to the choice of ns, and using more than ns = 256 realizations barely affects the marginal coverage and average set-size. This phenomenon is demonstrated in Figure 4 for CIFAR10; see Supplementary Section S5.2 for additional experiments that illustrate the influence of ns on CIFAR100 and ImageNet. Therefore, in our experiments we set ns = 256 both for CIFAR10 and CIFAR100, and ns = 64 for ImageNet due to limited GPU memory footprint; see Supplementary Section S7 for a discussion about the computational complexity, including the running time of RSCP.

5.2 RESULTS

Figure 5: Marginal coverage (left) and average set-size (right) obtained by different conformal methods. The target coverage level is 90%.

Figure 5 presents the marginal coverage obtained by Vanilla CP, CP+SS, and RSCP for the three data sets. As can be seen, Vanilla CP fails to construct valid prediction sets in the presence of adversarial noise. For instance, consider the choice of HPS as the non-conformity score, and observe that the average coverage for CIFAR10, CIFAR100, and ImageNet is 27.27%, 29.24%, and 38.02%, respectively; all are much smaller than the desired 90% level, emphasizing the necessity of our proposal. When using the same vanilla split conformal procedure, however, with the smoothed version of the base non-conformity score (i.e., CP+SS), we get a major improvement in terms of coverage. Here, for the smoothed version of the HPS score, the average coverage levels obtained for CIFAR10, CIFAR100, and ImageNet are closer to 90% and are equal to 86.78%, 87.10%, and 88.12%, respectively. This serves as evidence that the new smoothed score is more robust to adversarial attacks. Moreover, since we can derive the Lipschitz constant of the smoothed score, we can invoke the worst coverage bound from Theorem 2. As illustrated, the (average) worst coverage is indeed below the empirical coverage level obtained by CP+SS. While this lower bound is close to the empirical one (for both scores), there exists a gap between the two that can possibly be reduced by designing a better algorithm tailored to attack the smoothed score. By contrast, our RSCP method generates valid prediction sets whose coverage is slightly above the nominal level, supporting our theoretical guarantee.

Figure 5 also presents the average set-size constructed by all three methods. Observe that CP+SS generates larger sets compared to Vanilla CP since it relies on a more robust classifier that leads to better coverage at the cost of lower accuracy; see Supplementary Table S1.
Observe also that RSCP constructs larger sets than those of CP+SS due to the inflated threshold used in the calibration, accounting for the effect of the adversarial noise. In addition, the sets obtained by the APS score are larger than those of HPS. This can be explained by (i) the coverage plots: the marginal coverage obtained when using the smoothed APS score tends to be larger than that of the smoothed HPS; and (ii) it is known that the APS score tends to construct larger sets even in the noiseless case since it aims at constructing sets of 1 α coverage across different sub-populations; see Supplementary Section S11 for an experiment that illustrates this as well as Romano et al. (2020b). 6 CONCLUSIONS AND FUTURE WORK In this paper, we generalized the technique of conformal prediction to the adversarial setting and introduced a model agnostic, scalable, and simple method for constructing valid prediction sets to multi-class classification problems, accounting for the adversarial noise. Our experiments show that our RSCP method is most effective when combined with robust classifiers, e.g., ones that are fitted by augmenting Gaussian noise to the training data. However, this often leads to a degradation in test accuracy, a limitation that is not unique to our work. In fact, this is a matter of ongoing research in the robustness certification literature, and note that any new development in that front (e.g., Salman et al. (2020)) could be easily integrated with our method as demonstrated in Supplementary Section S10. An exciting future direction could be to extend the guarantees we provided beyond the assumption that the adversarial noise is ℓ2-norm bounded and consider different smoothing and calibration techniques that can handle the ℓ0, ℓ1, and ℓ cases (Lee et al., 2019; Teng et al., 2019; Zhang et al., 2019). Other directions could be to explore the use of our method to improve robustness to adversarial attacks in regression problems and to provide conditional validity guarantees of the style suggested by Vovk (2012). Published as a conference paper at ICLR 2022 ACKNOWLEDGMENTS A.G. and Y.R. were partially supported by the 2020 Zuckerman Fund, Technion Center for Machine Learning and Intelligent Systems (MLIS Grant No. 2029967), and the ISRAEL SCIENCE FOUNDATION (grant No. 729/21). Y.R. thanks the Career Advancement Fellowship, Technion, for providing research support. Y.R also thank Stephen Bates for his comments on an earlier version of this manuscript. ETHICS STATEMENT The ability to communicate the predictive uncertainty of any classifier by constructing prediction sets of valid coverage is crucial to assess the performance of machine learning algorithms (Holland, 2020), improve their reliability, and avoid unwanted discrimination (Romano et al., 2020a). Our proposed method further enriches the existing uncertainty quantification toolbox, by providing a rigorous form of robustness to adversarial perturbations. For example, one can promote fairness in high-stakes applications (in addition to robustness) by integrating RSCP within the framework of equalized coverage (Romano et al., 2020a). Here, we stress that the validity of our methods relies on the exchangeability assumption between the clean training and test points, and on the knowledge of the ℓ2-bound of the attack. If one of these assumptions is violated, the sets that RSCP constructs may not have the right coverage. 
Therefore, it is recommended to assess whether the above assumptions are reasonable when using the proposed methods. REPRODUCIBILITY STATEMENT A Python package that implements our methods and code to reproduce our experiments are available at https://github.com/Asafgendler/RSCP. Complete proofs of the theorems presented in this paper can be found in Supplementary Section S1. A complete description of the data sets used in our experiments, including pre-processing steps, can be found in Supplementary Section S2. The details about the models used in our experiments and how they were trained are provided in Supplementary sections S3 and S8. Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (SP), pp. 39 57. IEEE Computer Society, 2017. Maxime Cauchois, Suyash Gupta, Alnur Ali, and John C Duchi. Robust validation: Confident predictions even when distributions shift. ar Xiv preprint ar Xiv:2008.04267, 2020. Gilad Cohen, Guillermo Sapiro, and Raja Giryes. Detecting adversarial samples using influence functions and nearest neighbors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14453 14462, 2020. Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pp. 1310 1320. PMLR, 2019. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In IEEE conference on computer vision and pattern recognition, pp. 248 255. Ieee, 2009. John C Duchi, Peter L Bartlett, and Martin J Wainwright. Randomized smoothing for stochastic optimization. SIAM Journal on Optimization, 22(2):674 701, 2012. Reuben Feinman, Ryan R Curtin, Saurabh Shintre, and Andrew B Gardner. Detecting adversarial samples from artifacts. ar Xiv preprint ar Xiv:1703.00410, 2017. Rina Foygel Barber, Emmanuel J Candes, Aaditya Ramdas, and Ryan J Tibshirani. The limits of distribution-free conditional predictive inference. Information and Inference: A Journal of the IMA, 10(2):455 482, 2021. Isaac Gibbs and Emmanuel Cand es. Adaptive conformal inference under distribution shift. ar Xiv preprint ar Xiv:2106.00170, 2021. Published as a conference paper at ICLR 2022 Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. stat, 1050:20, 2015. Leying Guan and Rob Tibshirani. Prediction and outlier detection in classification problems. ar Xiv preprint ar Xiv:1905.04396, 2019. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770 778, 2016. Douglas Heaven. Why deep-learning ais are so easy to fool. Nature, 574(7777):163 166, 2019. Yotam Hechtlinger, Barnab as P oczos, and Larry Wasserman. Cautious deep learning. ar Xiv preprint ar Xiv:1805.09460, 2018. Matthew J Holland. Making learning more transparent using conformalized performance prediction. ar Xiv preprint ar Xiv:2007.04486, 2020. Forrest Iandola, Matt Moskewicz, Sergey Karayev, Ross Girshick, Trevor Darrell, and Kurt Keutzer. Densenet: Implementing efficient convnet descriptor pyramids. ar Xiv preprint ar Xiv:1404.1869, 2014. Pang Wei Koh, Shiori Sagawa, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, et al. 
Wilds: A benchmark of in-the-wild distribution shifts. In International Conference on Machine Learning, pp. 5637 5664. PMLR, 2021. A Krizhevsky. Learning multiple layers of features from tiny images. Master s thesis, University of Tront, 2009. Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. In IEEE Symposium on Security and Privacy (SP), pp. 656 672. IEEE, 2019. Guang-He Lee, Yang Yuan, Shiyu Chang, and Tommi Jaakkola. Tight certificates of adversarial robustness for randomly smoothed classifiers. Advances in Neural Information Processing Systems, 32:4910 4921, 2019. Jing Lei, James Robins, and Larry Wasserman. Distribution-free prediction sets. Journal of the American Statistical Association, 108(501):278 287, 2013. Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Certified adversarial robustness with additive noise. Advances in Neural Information Processing Systems, 32:9464 9474, 2019. Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018. Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations. ar Xiv preprint ar Xiv:1702.04267, 2017. Xuan Long Nguyen, Martin J Wainwright, and Michael I Jordan. Estimating divergence functionals and the likelihood ratio by convex risk minimization. IEEE Transactions on Information Theory, 56(11):5847 5861, 2010. Harris Papadopoulos. Inductive conformal prediction: Theory and application to neural networks. In Tools in Artificial Intelligence. Intech Open, 2008. Harris Papadopoulos, Kostas Proedrou, Volodya Vovk, and Alex Gammerman. Inductive confidence machines for regression. In European Conference on Machine Learning, pp. 345 356. Springer, 2002. Aleksandr Podkopaev and Aaditya Ramdas. Distribution-free uncertainty quantification for classification under label shift. ar Xiv preprint ar Xiv:2103.03323, 2021. Published as a conference paper at ICLR 2022 Yaniv Romano, Evan Patterson, and Emmanuel Candes. Conformalized quantile regression. Advances in Neural Information Processing Systems, 32:3543 3553, 2019. Yaniv Romano, Rina Foygel Barber, Chiara Sabatti, and Emmanuel Cand es. With malice toward none: Assessing uncertainty via equalized coverage. Harvard Data Science Review, 2, 4 2020a. Yaniv Romano, Matteo Sesia, and Emmanuel Candes. Classification with valid and adaptive coverage. Advances in Neural Information Processing Systems, 33:3581 3591, 2020b. Hadi Salman, Greg Yang, Jerry Li, Pengchuan Zhang, Huan Zhang, Ilya Razenshteyn, and S ebastien Bubeck. Provably robust deep learning via adversarially trained smoothed classifiers. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 11292 11303, 2019. Hadi Salman, Mingjie Sun, Greg Yang, Ashish Kapoor, and J Zico Kolter. Denoised smoothing: A provable defense for pretrained classifiers. Advances in Neural Information Processing Systems, 33, 2020. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. ar Xiv preprint ar Xiv:1409.1556, 2014. Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In 2nd International Conference on Learning Representations, ICLR 2014, 2014. 
Jiaye Teng, Guang-He Lee, and Yang Yuan. ℓ1 adversarial robustness certificates: a randomized smoothing approach, 2019.

Ryan J Tibshirani, Rina Foygel Barber, Emmanuel Candes, and Aaditya Ramdas. Conformal prediction under covariate shift. Advances in Neural Information Processing Systems, 32:2530-2540, 2019.

Alexandre B Tsybakov. Introduction to Nonparametric Estimation. Springer series in statistics. Springer, Dordrecht, 2009.

Vladimir Vovk. Conditional validity of inductive conformal predictors. In Asian Conference on Machine Learning, pp. 475-490. PMLR, 2012.

Vladimir Vovk, Alex Gammerman, and Glenn Shafer. Algorithmic learning in a random world. Springer Science & Business Media, 2005.

Dinghuai Zhang, Mao Ye, Chengyue Gong, Zhanxing Zhu, and Qiang Liu. Filling the soap bubbles: Efficient black-box adversarial certification with non-Gaussian smoothing, 2019.

SUPPLEMENTARY MATERIAL: ADVERSARIALLY ROBUST CONFORMAL PREDICTION

S1 SUPPLEMENTARY PROOFS

In this section, we prove Theorem 1 and Theorem 2 presented in Section 3 of the main manuscript.

Proof of Theorem 1. From the definition of the prediction set in (8):

$$\begin{aligned}
\mathbb{P}\left[Y_{n+1} \in C_\delta(\tilde{X}_{n+1})\right]
&= \mathbb{P}\left[\tilde{S}(\tilde{X}_{n+1}, Y_{n+1}) \leq Q_{1-\alpha}\!\left(\{\tilde{S}_i\}_{i \in \mathcal{I}_{\mathrm{cal}}}\right) + M_\delta\right] \\
&\geq \mathbb{P}\left[\tilde{S}(X_{n+1}, Y_{n+1}) + M_\delta \leq Q_{1-\alpha}\!\left(\{\tilde{S}_i\}_{i \in \mathcal{I}_{\mathrm{cal}}}\right) + M_\delta\right] \quad \text{(using (7))} \\
&= \mathbb{P}\left[\tilde{S}(X_{n+1}, Y_{n+1}) \leq Q_{1-\alpha}\!\left(\{\tilde{S}_i\}_{i \in \mathcal{I}_{\mathrm{cal}}}\right)\right] \geq 1 - \alpha,
\end{aligned}$$

where the last inequality follows from the validity of the prediction set constructed by the vanilla split conformal procedure, since S̃(Xi, Yi), i ∈ Ical ∪ {n + 1}, are exchangeable (Vovk et al., 2005); see also (Romano et al., 2019, Lemma 2).

Proof of Theorem 2. From the definition of the prediction set in (5):

$$\begin{aligned}
\mathbb{P}\left[Y_{n+1} \in C(\tilde{X}_{n+1})\right]
&= \mathbb{P}\left[\tilde{S}(\tilde{X}_{n+1}, Y_{n+1}) \leq Q_{1-\alpha}\!\left(\{\tilde{S}_i\}_{i \in \mathcal{I}_{\mathrm{cal}}}\right)\right] \\
&\geq \mathbb{P}\left[\tilde{S}(X_{n+1}, Y_{n+1}) + M_\delta \leq Q_{1-\alpha}\!\left(\{\tilde{S}_i\}_{i \in \mathcal{I}_{\mathrm{cal}}}\right)\right] \quad \text{(using (7))} \\
&= \mathbb{P}\left[\tilde{S}(X_{n+1}, Y_{n+1}) \leq Q_{1-\alpha}\!\left(\{\tilde{S}_i\}_{i \in \mathcal{I}_{\mathrm{cal}}}\right) - M_\delta\right] \\
&\geq \mathbb{P}\left[\tilde{S}(X_{n+1}, Y_{n+1}) \leq Q_{\tilde{\tau}}\!\left(\{\tilde{S}_i\}_{i \in \mathcal{I}_{\mathrm{cal}}}\right)\right] \quad \text{(using the definition of } \tilde{\tau} \text{ in (9))} \\
&\geq \tilde{\tau},
\end{aligned}$$

where, similarly to the proof of Theorem 1, the last inequality follows from (Romano et al., 2019, Lemma 2). Lastly, we show that τ̃ ≤ 1 − α. From the definition of τ̃ in (9), and since Mδ ≥ 0,

$$\tilde{\tau} = \max\left\{ \tau : Q_{\tau}\!\left(\{\tilde{S}_i\}_{i \in \mathcal{I}_{\mathrm{cal}}}\right) \leq Q_{1-\alpha}\!\left(\{\tilde{S}_i\}_{i \in \mathcal{I}_{\mathrm{cal}}}\right) - M_\delta \right\}
\leq \max\left\{ \tau : Q_{\tau}\!\left(\{\tilde{S}_i\}_{i \in \mathcal{I}_{\mathrm{cal}}}\right) \leq Q_{1-\alpha}\!\left(\{\tilde{S}_i\}_{i \in \mathcal{I}_{\mathrm{cal}}}\right) \right\} = 1 - \alpha.$$

S2 DATA SETS

Overview. We evaluate our methods on three widely used image classification data sets: CIFAR10, CIFAR100 (Krizhevsky, 2009), and ImageNet (Deng et al., 2009). The CIFAR10 data set consists of 60,000 RGB color images of size 32 × 32 × 3, corresponding to 10 different classes, with 6,000 images per class. In this data set, there are 50,000 training images and 10,000 test images. The CIFAR100 data set is identical in structure to CIFAR10, except it has 100 classes containing 600 images each. Overall, this data set consists of 50,000 training images and 10,000 test images. In our experiments, we use the entire training set for fitting the classifiers and split the remaining 10,000 points into two sets of equal size (calibration and test) 50 times. We use the ILSVRC2012 version of the ImageNet data set. Here, the training set consists of 1.2 million color RGB images of different sizes, corresponding to 1,000 class labels. To construct calibration and test sets, we randomly split the 50,000 images of the ILSVRC2012 validation set into calibration and test sets of equal size, 50 times. We use the entire ILSVRC2012 training set for fitting the predictive models.
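A minimal sketch of the repeated splitting protocol described above; the function name and the use of NumPy's default generator are our own choices, not taken from the released package.

```python
import numpy as np

def random_half_split(num_points, seed):
    """One of the 50 random splits: partition the held-out indices
    (10,000 for CIFAR10/CIFAR100, 50,000 for ImageNet) into two equal,
    disjoint halves used for calibration and testing, respectively."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_points)
    half = num_points // 2
    return perm[:half], perm[half:]

# e.g., cal_idx, test_idx = random_half_split(10_000, seed=trial)  # one of the 50 trials
```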
Published as a conference paper at ICLR 2022 Pre-processing The CIFAR10 and CIFAR100 images are normalized by (i) subtracting the training images mean value, equals to [0.4914, 0.4822, 0.4465], and (ii ) dividing the result by the images standard deviation, equals to [0.2023, 0.1994, 0.2010]. The Image Net images are first re-scaled to a size of 256 256. Then, the central 224 224 pixels are taken and normalized by subtracting the mean value of the training images [0.485, 0.456, 0.406] and dividing the result by their standard deviation, given by [0.229, 0.224, 0.225]. As described in Section 5.1 of the main manuscript, for CIFAR10 and Image Net data sets, we use the pre-trained Res Net-110 and Res Net-50 models from Cohen et al. (2019). We use the models that were trained on clean training points, as well as models that were trained on Gaussian augmented training points. These models are downloaded from https://github.com/ locuslab/smoothing. This link also provides the full details on how the models were trained. Since the CIFAR100 data set was not studied by Cohen et al. (2019), we fit our own models by modifying the software package available at https://github.com/bearpaw/ pytorch-classification. Specifically, we fit a residual network of depth 110 (Res Net110) by augmenting the training data with random crops of size 32 (after applying a zero padding of size 4) and random horizontal flips. The images are normalized as explained in Section S2. For convenience, we added the normalization as a pre-processing layer to the fitted model. As explained in Section 5.1 of the main manuscript, we fit two models: one on clean training points and another on points that are augmented with Gaussian noise of standard deviation σ that is equal to the one used for smoothing. The models are trained using the stochastic gradient descent optimizer with a momentum term of 0.9 for 164 epochs, using a batch size of 128 points. The learning rate starts at 0.1 and is multiplied by a factor of 0.1 at epochs 81 and 122. A weight decay regularization of 10 4 is added to the loss function. We train the models using Pytorch, on a single Nvidia GEFORCE GTX 1080 Ti GPU. S4 WHY DOES TRAINING WITH GAUSSIAN NOISE IMPORTANT? As stated in Section 4.2 of the main manuscript, while our method can work with any base classifier ˆf, in practice, we require ˆf to have some degree of robustness in order to construct small prediction sets. Figure S6 demonstrates that our RSCP procedure constructs valid but very large prediction sets when deployed with a standard classifier, fitted without augmenting Gaussian noise to the CIFAR10 training points. Here, the average size of the prediction sets is equal to 9, a trivial result for the CIFAR10 data set that has only 10 classes. In this experiment, we use σ = 0.5 for smoothing, an attack of magnitude δ = 0.125, and ns = 256 Gaussian noise samples for estimating the smoothed score. Marginal Coverage Nominal Level Average Set Size Figure S6: The importance of using robust models. Marginal coverage (left) and average set-size (right) obtained by different conformal methods. The target coverage level is 90%. Published as a conference paper at ICLR 2022 S5 HYPER PARAMETERS S5.1 INFLUENCE OF THE VALUE OF σ Herein, we extend the discussion from Section 5.1 of the main manuscript and study the effect of the choice of the hyper-parameter σ used to compute the smoothed score. Specifically, we repeat the same experiment described in the context of Figure 3 for all data sets. 
Figure S7 shows how the ratio between σ and δ affects the average set-size and the average coverage obtained by our RSCP method. In this experiment, we use an adversarial perturbation of magnitude δ = 0.125 for CIFAR10 and CIFAR100, and magnitude δ = 0.25 for ImageNet. As can be seen, a ratio of σ/δ = 2 works well for all data sets and the two non-conformity scores.

Figure S7: The effect of σ on the marginal coverage (top) and size (bottom) of prediction sets constructed by RSCP. Here, δ and ns are fixed and equal to 0.125 and 256 for CIFAR10 and CIFAR100, and 0.25 and 64 for ImageNet.

S5.2 INFLUENCE OF THE VALUE OF ns

As explained in Section 5.1 of the main manuscript, the smoothed score is evaluated by replacing the expectation (10) with the empirical mean (12), averaging over ns i.i.d. Gaussian noise samples. Figure S8 shows the effect of the value of ns on the average set-size and the average coverage obtained by our RSCP method for all data sets. Following that figure, averaging more than ns = 256 noise samples (or even ns = 64) barely affects the empirical coverage and size of the constructed sets.

Figure S8: The effect of the number of noise realizations ns (used to estimate the smoothed score) on the marginal coverage (top) and size (bottom) of prediction sets constructed by RSCP. Here, δ and σ are fixed and equal to 0.125 and 0.25 for CIFAR10 and CIFAR100, and to 0.25 and 0.5 for ImageNet, respectively.

S6 EFFECT OF THE ADVERSARIAL ATTACKS

As stated in Section 5.1 of the main manuscript, in all of our experiments we use an attack of ℓ2 magnitude δ = 0.125 for CIFAR10 and CIFAR100, and δ = 0.25 for ImageNet. Table S1 illustrates the effect of such adversarial perturbations on the test accuracy of the models used in the experiments presented in Section 5 of the main manuscript, as well as in Supplementary Section S5.1 and Supplementary Section S8 that is presented hereafter. The PGD attack leads to a significant reduction in the test accuracy of models that are fitted without Gaussian noise augmentation (i.e., σ = 0). By contrast, the models that are trained with Gaussian noise are more robust, as the decrease in test accuracy3 is smaller when attacked by SMOOTHADV. This, however, comes at the expense of an accuracy reduction on the clean test points.

3 When evaluating the test accuracy of the models trained with Gaussian noise, we add Gaussian noise with the same standard deviation σ to the test point, since we find this approach to improve the accuracy.

Table S1: The influence of the adversarial attacks used in our experiments on the classifiers' test accuracy.

| Architecture | Training σ | CIFAR10 Clean Test | CIFAR10 Adversarial Test | CIFAR100 Clean Test | CIFAR100 Adversarial Test | ImageNet Clean Test | ImageNet Adversarial Test |
|---|---|---|---|---|---|---|---|
| ResNet | 0 | 89.94% | 26.7% | 71.03% | 12.19% | 75.69% | 19.59% |
| ResNet | 0.0625 | 87.13% | 68.96% | 64.58% | 41.17% | - | - |
| ResNet | 0.125 | 81.69% | 67.84% | 58.06% | 41.95% | - | - |
| ResNet | 0.25 | 75.19% | 65.18% | 50.73% | 40.82% | 68.58% | 56.32% |
| ResNet | 0.375 | - | - | 44.23% | 37.14% | - | - |
| ResNet | 0.5 | 63.84% | 57.34% | 39.0% | 33.68% | 61.04% | 53.09% |
| ResNet | 0.75 | - | - | 31.75% | 28.1% | - | - |
| ResNet | 1 | 47.99% | 44.41% | 25.57% | 23.56% | 47.73% | 42.89% |
| DenseNet | 0 | 95.42% | 23.28% | 77.07% | 4.29% | - | - |
| DenseNet | 0.25 | 80.09% | 70.89% | 52.69% | 42.47% | - | - |
| VGG | 0 | 93.1% | 54.96% | 72.17% | 23.12% | - | - |
| VGG | 0.25 | 78.88% | 69.82% | 48.57% | 39.54% | - | - |

S7 COMPUTATIONAL COMPLEXITY

Following Algorithm 1, at test time, the computational complexity of RSCP is dominated by the number of calls to the predictive model, required to compute the smoothed score for each of the L possible labels y ∈ 𝒴. The number of calls to the predictive model for creating a set for a single test image is ns, and the number of classes L only affects the number of scores that need to be calculated using the model predictions. The training involves fitting a classifier, and the calibration requires computing the smoothed non-conformity scores for all calibration points, where both steps need to be performed only once.

Table S2 summarizes the average runtime for constructing a prediction set using Vanilla CP, CP+SS, and RSCP with the HPS score function (3), for a single image, where the average is taken over 100 realizations. We present the results obtained by two different implementation strategies of our methods: single mode and batch mode. The single implementation repeats the following two steps ns times: (i) augment to the input image one realization of Gaussian noise, and (ii) feed this noisy image to the neural net model. The batch implementation is more efficient since it parallelizes the inference step as follows: (i) augment ns noise realizations to the same input image, and (ii) pass all these noisy realizations together (as one batch) to the neural net model. We follow the exact same setup from the experiments used to create Figure 5 of the main manuscript. Specifically, ns = 256 for CIFAR10 and CIFAR100, and ns = 64 for ImageNet. The runtimes are evaluated using PyTorch, on a single Nvidia GEFORCE GTX 1080 Ti GPU. Following Table S2, we can see that when using the batch implementation the inference time of CP+SS and RSCP is approximately 3-4 times larger than that of Vanilla CP. Also, the runtimes of CP+SS and RSCP are almost identical, since the only difference between the two is the threshold used to construct the prediction sets.

Table S2: Inference time (seconds) for constructing a prediction set for a single image using different conformal prediction methods.

| Dataset | Vanilla CP | CP+SS Single | CP+SS Batch | RSCP Single | RSCP Batch |
|---|---|---|---|---|---|
| CIFAR10 | 0.019336 | 4.820160 | 0.030831 | 4.920013 | 0.030856 |
| CIFAR100 | 0.017984 | 5.981254 | 0.030915 | 5.937026 | 0.030892 |
| ImageNet | 0.097539 | 1.288073 | 0.424692 | 1.134275 | 0.435865 |

S8 RESULTS WITH DENSENET AND VGG MODELS

In this section, we study the performance of our method when applied with DenseNet (Iandola et al., 2014) and VGG (Simonyan & Zisserman, 2014) models on the CIFAR10 and CIFAR100 data sets, as opposed to the experiments presented thus far that use only ResNet (He et al., 2016) classifiers. The experiments here follow the protocol described in Section 5 of the main manuscript. We use the same attack algorithms and magnitude δ = 0.125 as in the ResNet-110 experiments for CIFAR10 and CIFAR100. As for the hyper-parameters of our methods, we choose σ = 0.25 for smoothing, and ns = 256 for estimating the smoothed score.
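Before turning to the DenseNet and VGG results, the batch-mode strategy of Section S7 above can be sketched in a few lines; this is our own PyTorch-style illustration, assuming the batch of noisy copies fits in GPU memory (which is why ns = 64 is used for ImageNet).

```python
import torch

def smoothed_probs_batch(model, x, sigma, n_samples=256):
    """Batch-mode inference (Section S7): build one batch of n_samples noisy copies of
    the test image and run a single forward pass, returning the class probabilities
    averaged over the noise draws. For the HPS base score, the smoothed score (10) is
    then Phi^{-1}(1 - avg_probs[y]) for each candidate label y."""
    batch = x.unsqueeze(0).repeat(n_samples, 1, 1, 1)   # (n_samples, C, H, W)
    batch = batch + sigma * torch.randn_like(batch)     # add i.i.d. Gaussian noise
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)
    return probs.mean(dim=0)                            # average over the n_samples draws
```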
Figure S9 compares the coverage and size of the prediction sets constructed by combining ResNet-110, VGG, or DenseNet with Vanilla CP, CP+SS, and RSCP. As can be seen, the difference between the three models is minor in terms of coverage, for all calibration methods and both data sets. We can also see that our theoretical guarantees hold: for CP+SS, the worst-coverage bound lies below the empirical coverage obtained, and RSCP yields valid prediction sets, in contrast to the two other methods. In terms of size, the difference between ResNet and DenseNet is minor, while VGG constructs larger sets, especially on CIFAR100. Figure 1 of the main manuscript presents the results obtained by applying the VGG models to the CIFAR10 data set, using the HPS score. In that figure, we additionally tested the vanilla conformal method on clean test points, to show the validity of this method when applied to clean test data that satisfy the exchangeability assumption.

Turning to the technical details of the training procedure: for each architecture, we fit two models, one on clean training points and the second on points augmented with Gaussian noise, as explained in Section 5.1 and Supplementary Section S3. All the models are trained on our local server. The DenseNet architecture consists of 100 layers with a growth rate of 12 and a compression rate of 2. We use the same data augmentation and normalization described in Supplementary Sections S2 and S3. The models are trained using stochastic gradient descent with a momentum of 0.9, a batch size of 64, and a total of 300 epochs. The learning rate starts at 0.1 and is multiplied by a factor of 0.1 at epochs 150 and 225. We also use a weight decay of 10^-4. The VGG architecture consists of 19 layers with batch normalization. We use the same data augmentation and normalization from Supplementary Sections S2 and S3. The VGG models are trained using stochastic gradient descent with a momentum of 0.9 and a batch size of 128, for a total of 164 epochs. The initial learning rate is 0.1 and is multiplied by a factor of 0.1 at epochs 81 and 122. We also use a weight decay of 10^-4. The ResNet models are trained as explained in Supplementary Section S3. We train all the models in PyTorch, on a single NVIDIA GeForce GTX 1080 Ti GPU.

Figure S9: A comparison between ResNet, VGG, and DenseNet models, applied to the CIFAR10 (bottom) and CIFAR100 (top) data sets. Marginal coverage (left) and average set-size (right) obtained by different conformal methods. The target coverage level is 90%.
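To make the training recipe above concrete, here is a minimal PyTorch sketch of the optimizer and learning-rate schedule used for the DenseNet models; `build_densenet100`, `train_loader`, and `sigma` are hypothetical placeholders (not part of the paper's released code), and the noise-augmentation line applies only to the second, Gaussian-augmented model.

```python
import torch
from torch import nn, optim

model = build_densenet100()   # hypothetical constructor: DenseNet-100, growth rate 12
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
# Initial learning rate 0.1, multiplied by 0.1 at epochs 150 and 225.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 225], gamma=0.1)

model.train()
for epoch in range(300):                       # batch size 64 is set inside train_loader
    for x, y in train_loader:
        x = x + sigma * torch.randn_like(x)    # omit this line for the clean-data model
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```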
S9 RESULTS ON CLEAN DATA

In this section, we study the performance of our methods when no adversarial attack is performed. We follow the setup from Figure 5 of the main manuscript and compare the three methods of interest (Vanilla CP, CP+SS, and RSCP) on the three data sets, using the same models, smoothing parameter σ, and number of noise realizations ns. Importantly, in contrast to Figure 5, the results presented in Figure S10 are obtained by applying these three methods to clean test points that are not corrupted by adversarial noise.

Following that figure, both Vanilla CP and CP+SS achieve the desired coverage of 90%, in line with the theoretical guarantee of split conformal prediction; see also Theorem 2. By contrast, RSCP results in a coverage rate higher than the desired one, since this method always accounts for the uncertainty induced by a possible attack, even when no attack is applied. Moving to the average set-size, notice that all the methods perform similarly to the case where the test data is corrupted, as depicted in Figure 5 of the main manuscript. Specifically, the average size of the sets constructed by CP+SS is larger than that of Vanilla CP, possibly due to the smoothed model used to make the predictions. However, it is important to stress that CP+SS suffers a significantly smaller drop in coverage in the adversarial setting. We also note that the average size of the sets constructed by RSCP is larger than that of CP+SS, due to the inflation of the threshold Q_{1-α}.

As a concluding remark, we reiterate the discussion from Section 3.2 of the main manuscript, highlighting the two possible use cases of our proposal. In the first, one can implement RSCP and construct valid prediction sets at any desired level of coverage, while bearing in mind that these sets are likely to be larger even if no attack is performed. In the second, one can deploy CP+SS, which guarantees that the sets attain the desired coverage when no attack is performed, while providing important information about the worst coverage that might be obtained under an adversarial attack. Moreover, we believe that in practice these two use cases can be combined via the following two-step procedure to improve statistical efficiency: (i) apply a method for detecting whether a given test point is corrupted by adversarial noise (e.g., Metzen et al., 2017; Feinman et al., 2017; Cohen et al., 2020); and (ii) if an attack is detected, apply the RSCP method to construct a valid prediction set, and otherwise use vanilla CP or CP+SS, since these methods are less conservative; a minimal sketch of this procedure follows Figure S10 below.

Figure S10: Performance on clean data. Marginal coverage (left) and average set-size (right) obtained by different conformal methods. The target coverage level is 90%.
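Below is a minimal sketch of the two-step procedure described above; `detector`, `scores`, `q_vanilla`, and `q_rscp` are illustrative placeholders rather than part of our implementation, and any of the cited detection methods and any calibrated thresholds could be plugged in.

```python
def robust_prediction_set(x, scores, q_vanilla, q_rscp, detector):
    """Two-step procedure: detect a possible attack, then pick the threshold.

    detector(x) is assumed to return True when x looks adversarially corrupted;
    scores(x) returns a non-conformity score per label; q_vanilla is the vanilla
    CP (or CP+SS) threshold and q_rscp is the inflated RSCP threshold.
    """
    threshold = q_rscp if detector(x) else q_vanilla
    return [label for label, s in enumerate(scores(x)) if s <= threshold]
```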
S10 RESULTS WITH AN ADVERSARIALLY-TRAINED MODEL

In this section, we study the performance of our methods integrated with the adversarial training scheme of Salman et al. (2019), which was shown to improve robustness against the specific SMOOTHADV attack we applied at test time. This new, more robust, predictive model replaces the one used in Section 5 (see also Supplementary Section S3), which is fitted merely by adding Gaussian noise to the training points. Recall that the SMOOTHADV attack has two parameters: the magnitude of the noise δ and the smoothing strength σ. Here, we use a training procedure that is ideal in the sense that it has access to the same δ used to attack the model and to the same σ used to compute the smoothed score. We perform this experiment only on the CIFAR10 data set and fit a ResNet-110 model using the code provided by Salman et al. (2019). Their software package is available at https://github.com/Hadisalman/smoothing-adversarial; this link also provides the full details on how the models are trained. The results are depicted in Figure S11, which follows the exact same experimental protocol used to create Figure 5 of the main manuscript.

That figure compares two versions of CP+SS: the first is implemented with a model fitted on Gaussian-augmented data, and the second with the adversarially trained model of Salman et al. (2019). As can be seen, the latter model achieves slightly better coverage, closer to the desired level, with the HPS score, and slightly worse coverage with the APS score. For both scores, however, this model improves the statistical efficiency. Interestingly, even in this optimistic case, CP+SS does not attain the desired coverage perfectly. It is also important to stress that if the new model were attacked by a different and more effective attack, the coverage could drop even further. It is worth noting that this experiment supports our discussion from Section 6, arguing that any new development in adversarial robustness can be easily integrated with our method to improve the overall performance.

Figure S11: Results using an adversarially trained model. Marginal coverage (left) and average set-size (right) obtained by CP+SS on the CIFAR10 dataset with two different models. The target coverage level is 90%.

S11 HPS VS. APS

In this section, we discuss the difference between the APS and HPS non-conformity scores and demonstrate the advantages that APS has over HPS. Romano et al. (2020b) argued that while conformal prediction methods that use the HPS score are guaranteed to achieve the marginal coverage criterion (1), they may fail to achieve the stronger and more informative notion of conditional coverage, defined as

P[Y_{n+1} ∈ C(x) | X_{n+1} = x] ≥ 1 − α. (S14)

In plain words, the above statement requires the coverage to hold for a specific observation X_{n+1} = x, and not merely on average over all test points. While it is known that conditional coverage cannot be achieved in finite samples without imposing strong regularity conditions and/or modeling assumptions (Foygel Barber et al., 2021; Vovk, 2012), Romano et al. (2020b) showed that the APS score performs better in terms of conditional coverage than the HPS score. In fact, Romano et al. (2020b) rigorously showed that by combining the APS score with an oracle classifier that has access to the true conditional class probabilities P_{Y|X}, one can exactly satisfy (S14).

To demonstrate the advantage of APS in the adversarial setting, we repeat the experiment from Figure 5 of the main manuscript and present the coverage and the average set-size conditional on each of the 10 classes of the CIFAR10 data set. Following Figure S12, the APS prediction sets achieve steady coverage across all classes, close to the desired level for CP+SS, whereas the coverage rates of HPS vary greatly. Moreover, the latter method tends to undercover the labels of samples that belong to classes 2-5. One can also see that the size of the prediction sets constructed by APS varies across the different classes, and therefore better reflects the prediction uncertainty. This stands in contrast to the sets constructed via the HPS score, whose sizes are more or less constant across class labels.

Figure S12: Comparison between HPS and APS. Marginal coverage (top) and average set-size (bottom) obtained by different conformal methods for each of the classes of CIFAR10 separately. The target coverage level is 90%.
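For reference, the following sketch spells out the two base scores as they are commonly defined in the conformal prediction literature; `probs` is the vector of softmax probabilities of a single input, and the randomization term of the original APS definition is omitted for simplicity, so the exact implementation used in the paper may differ slightly.

```python
import torch

def hps_score(probs, y):
    # HPS: one minus the estimated probability of the candidate label y.
    return 1.0 - probs[y].item()

def aps_score(probs, y):
    # APS: total probability mass of all labels ranked at least as high as y
    # (the randomization term of the original APS definition is omitted here).
    order = torch.argsort(probs, descending=True)
    cumulative = torch.cumsum(probs[order], dim=0)
    rank = (order == y).nonzero(as_tuple=True)[0].item()
    return cumulative[rank].item()
```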
S12 COMPARISON WITH CAUCHOIS ET AL. (2020)

As discussed in Section 4, the task of reliably handling distribution shifts between the calibration and the test points has been studied in the conformal prediction literature. However, none of the existing methods addresses our particular setup, in which a full distributional shift is induced by an adversarial attack. Perhaps the closest work to ours is that of Cauchois et al. (2020). They present a calibration framework that constructs robust prediction sets, given that the f-divergence between the test and calibration distributions of the non-conformity scores is bounded by a fixed value ρ. Since ρ is unknown in practice, they propose a method to estimate it from the data and use this approximation to inflate the threshold Q_{1-α}, so that the constructed sets are robust to natural distributional shifts. This is in contrast to our method, which assumes that unknown additive adversarial noise is added to the test points; as such, it is impossible to estimate ρ from the training/calibration data in this case.

To demonstrate this, we follow the experimental protocol from Section 5 to compare our methods with Cauchois et al. (2020). We estimate ρ by applying Cauchois et al. (2020, Algorithm 1), and then construct prediction sets for test points that are corrupted by adversarial perturbations of magnitude δ = 0.125 (we thank the authors of Cauchois et al. (2020) for sharing their code with us). We use the same attacks detailed in Section 5 and evaluate this method over 50 random splits of the CIFAR10 dataset. We compare the marginal coverage and average set sizes achieved by their method to our RSCP approach with σ matching the one used to augment the training points, and set ns = 256. We make this comparison with four different ResNet models: one trained on clean data (in this case we perform RSCP with σ = 0.25 for smoothing) and three trained on points augmented with Gaussian noise, each with a different σ (as stated in Section S6, we add Gaussian noise to the test points when applying these models to compute the non-conformity scores).

The results are depicted in Figure S13. As can be seen, while our method achieves the desired coverage no matter which base model is used (at the expense of larger prediction sets), the method of Cauchois et al. (2020) fails when using a model trained on clean points: the empirical coverage falls below the desired level (Figure S13a). The same happens with the σ = 0.0625 and σ = 0.125 models; see Figures S13b and S13c. When using a more robust classifier trained with σ = 0.25, Cauchois et al. (2020) achieves the desired coverage in practice, although it is not supported by a theoretical guarantee; see Figure S13d. Interestingly, for this model, while both methods construct prediction sets that achieve the desired coverage, our RSCP method, which uses smoothed scores, constructs smaller sets with HPS as the base score and sets of the same size with the APS score.
Figure S13: Comparison with Cauchois et al. (2020). Marginal coverage (left) and average set-size (right) obtained by different conformal methods on the CIFAR10 dataset with four different ResNet-110 models. The target coverage level is 90%. (a) Model trained on clean training points; (b) model trained on points augmented by Gaussian noise with σ = 0.0625; (c) model trained on points augmented by Gaussian noise with σ = 0.125; (d) model trained on points augmented by Gaussian noise with σ = 0.25.