# Understanding Intrinsic Robustness Using Label Uncertainty

Published as a conference paper at ICLR 2022

Xiao Zhang
Department of Computer Science, University of Virginia
shawn@virginia.edu

David Evans
Department of Computer Science, University of Virginia
evans@virginia.edu

ABSTRACT

A fundamental question in adversarial machine learning is whether a robust classifier exists for a given task. A line of research has made some progress towards this goal by studying the concentration of measure, but we argue that standard concentration fails to fully characterize the intrinsic robustness of a classification problem, since it ignores data labels, which are essential to any classification task. Building on a novel definition of label uncertainty, we empirically demonstrate that error regions induced by state-of-the-art models tend to have much higher label uncertainty than randomly-selected subsets. This observation motivates us to adapt a concentration estimation algorithm to account for label uncertainty, resulting in more accurate intrinsic robustness measures for benchmark image classification problems.

1 INTRODUCTION

Since the initial reports of adversarial examples against deep neural networks (Szegedy et al., 2014; Goodfellow et al., 2015), many defensive mechanisms have been proposed aiming to enhance the robustness of machine learning classifiers. Most have failed, however, against stronger adaptive attacks (Athalye et al., 2018; Tramer et al., 2020). PGD-based adversarial training (Madry et al., 2018) and its variants (Zhang et al., 2019; Carmon et al., 2019) are among the few heuristic defenses that have not been broken so far, but these methods still fail to produce satisfactorily robust classifiers, even for classification tasks on benchmark datasets like CIFAR-10.

Motivated by the empirical hardness of adversarially-robust learning, a line of theoretical work (Gilmer et al., 2018; Fawzi et al., 2018; Mahloujifar et al., 2019a; Shafahi et al., 2019) has argued that adversarial examples are unavoidable. In particular, these works proved that as long as the input distributions are concentrated with respect to the perturbation metric, adversarially robust classifiers do not exist. Recently, Mahloujifar et al. (2019b) and Prescott et al. (2021) generalized these results by developing empirical methods for measuring the concentration of arbitrary input distributions to derive an intrinsic robustness limit. (Appendix A provides a more thorough discussion of related work.)

We argue that the standard concentration of measure problem, which was studied in all of the aforementioned works, is not sufficient to capture a realistic intrinsic robustness limit for a classification problem. In particular, the standard concentration function is defined as an inherent property of the input metric probability space that does not take the underlying label information into account. We argue that such label information is essential for any supervised learning problem, including adversarially robust classification, so it must be incorporated into intrinsic robustness limits.

Contributions. We identify the insufficiency of the standard concentration of measure problem and demonstrate why it fails to capture a realistic intrinsic robustness limit (Section 3). Then, we introduce the notion of label uncertainty (Definition 4.1), which characterizes the average uncertainty of label assignments for an input region.
We then incorporate label uncertainty in the standard concentration measure as an initial step towards a more realistic characterization of intrinsic robustness (Section 4). Experiments on the CIFAR-10 and CIFAR-10H (Peterson et al., 2019) datasets demonstrate that error regions induced by state-of-the-art classification models all have high label uncertainty (Section 6.1), which validates the proposed label uncertainty constrained concentration problem.

Figure 1: Intrinsic robustness estimates for classification tasks on CIFAR-10 under (a) ℓ∞ perturbations with ϵ = 8/255 and (b) ℓ2 perturbations with ϵ = 0.5. Orange dots are intrinsic robustness estimates using the method in Prescott et al. (2021), which does not consider labels; green dots show the results using our methods that incorporate label uncertainty; blue dots are results achieved by the state-of-the-art adversarially-trained models in RobustBench (Croce et al., 2020). Three fundamental causes behind the adversarial vulnerability can be summarized as imperfect risk (red region), concentration of measure (orange region) and existence of uncertain inputs (green region).

By adapting the standard concentration estimation method in Mahloujifar et al. (2019b), we propose an empirical estimator for the label uncertainty constrained concentration function. We then theoretically study the asymptotic behavior of the proposed estimator and provide a corresponding heuristic algorithm for typical perturbation metrics (Section 5). We demonstrate that our method produces a more accurate characterization of the intrinsic robustness limit for benchmark datasets than was possible using prior methods that do not consider labels (Section 6.2).

Figure 1 illustrates the intrinsic robustness estimates resulting from our label uncertainty approach on two CIFAR-10 robust classification tasks. The intrinsic robustness estimates we obtain by incorporating label uncertainty are much lower than prior limits, suggesting that, compared with the concentration of measure phenomenon, the existence of uncertain inputs may more fundamentally explain the adversarial vulnerability of state-of-the-art robustly-trained models. In addition, we provide empirical evidence showing that both the clean and robust accuracies of state-of-the-art robust classification models are largely affected by the label uncertainty of the tested examples, suggesting that adding an abstain option based on label uncertainty is a promising avenue for improving adversarial robustness of deployed machine learning systems (Section 6.3).

Notation. We use [k] to denote {1, 2, . . . , k} and I_n to denote the n × n identity matrix. For any set A, |A| denotes its cardinality, pow(A) is the collection of all its measurable subsets, and 1_A(·) is the indicator function of A. Consider a metric probability space (X, µ, ∆), where ∆ : X × X → R≥0 is a distance metric on X. Define the empirical measure of µ with respect to a dataset S sampled from µ as µ̂_S(A) = Σ_{x∈S} 1_A(x) / |S|. Denote by Bϵ(x, ∆) the ball around x with radius ϵ measured by ∆. The ϵ-expansion of A is defined as Aϵ(∆) = {x ∈ X : ∃ x′ ∈ Bϵ(x, ∆) ∩ A}. When ∆ is clear from context, we simply write Bϵ(x) = Bϵ(x, ∆) and Aϵ = Aϵ(∆).

2 PRELIMINARIES

Adversarial Risk. Adversarial risk captures the vulnerability of a classifier against adversarial perturbations.
In particular, we adopt the following adversarial risk definition, which has been studied in several previous works, such as Gilmer et al. (2018); Bubeck et al. (2019); Mahloujifar et al. (2019a;b); Zhang et al. (2020b); Prescott et al. (2021).

Definition 2.1 (Adversarial Risk). Let (X, µ, ∆) be a metric probability space of instances and Y be the set of possible class labels. Assume c : X → Y is a concept function that gives each instance a label. For any classifier f : X → Y and ϵ ≥ 0, the adversarial risk of f is defined as:

AdvRiskϵ(f, c) = Pr_{x∼µ}[∃ x′ ∈ Bϵ(x) s.t. f(x′) ≠ c(x′)].

The adversarial robustness of f is defined as: AdvRobϵ(f, c) = 1 − AdvRiskϵ(f, c).

When ϵ = 0, adversarial risk equals the standard risk. Namely, AdvRisk_0(f, c) = Risk(f, c) := Pr_{x∼µ}[f(x) ≠ c(x)] holds for any classifier f. Other definitions of adversarial risk have been proposed, such as the one used in Madry et al. (2018). These definitions are equivalent to the one we use, as long as small perturbations preserve the labels assigned by c(·).

Intrinsic Robustness. The definition of intrinsic robustness was first introduced by Mahloujifar et al. (2019b) to capture the maximum adversarial robustness with respect to some set of classifiers:

Definition 2.2 (Intrinsic Robustness). Consider the input metric probability space (X, µ, ∆) and the set of labels Y. Let c : X → Y be a concept function that gives a label to each input. For any set of classifiers F ⊆ {f : X → Y} and ϵ ≥ 0, the intrinsic robustness with respect to F is defined as:

AdvRobϵ(F, c) = 1 − inf_{f∈F} AdvRiskϵ(f, c) = sup_{f∈F} {AdvRobϵ(f, c)}.

According to the definition of intrinsic robustness, there does not exist any classifier in F with adversarial robustness higher than AdvRobϵ(F, c) for the considered task. Prior works, including Gilmer et al. (2018); Mahloujifar et al. (2019a;b); Zhang et al. (2020b), selected F in Definition 2.2 as the set of imperfect classifiers Fα = {f : Risk(f, c) ≥ α}, where α ∈ (0, 1) is set as a small constant that reflects the best classification error rates achieved by state-of-the-art methods.

Concentration of Measure. Concentration of measure captures a closeness property of a metric probability space of instances. More formally, it is defined by the concentration function:

Definition 2.3 (Concentration Function). Let (X, µ, ∆) be a metric probability space. For any α ∈ (0, 1) and ϵ ≥ 0, the concentration function is defined as:

h(µ, α, ϵ) = inf_{E∈pow(X)} {µ(Eϵ) : µ(E) ≥ α}.

The standard notion of concentration function considers a special case of Definition 2.3 with α = 1/2 (e.g., Talagrand (1995)). For some special metric probability spaces, one can derive a closed-form expression for the concentration function. The Gaussian Isoperimetric Inequality (Borell, 1975; Sudakov & Tsirelson, 1974) characterizes the concentration function for a spherical Gaussian distribution and the ℓ2-norm distance metric, and was generalized by Prescott et al. (2021) to other ℓp norms.
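For a spherical Gaussian under the ℓ2 metric, the Gaussian Isoperimetric Inequality gives the concentration function in closed form, which can then be translated into a robustness upper bound (Section 3). The following is a minimal sketch of that computation; the function name and the example values are ours, not the paper's.

```python
from scipy.stats import norm

def gaussian_concentration(alpha, eps, sigma=1.0):
    """Closed-form concentration h(mu, alpha, eps) for a spherical Gaussian
    N(0, sigma^2 I) under the l2 metric: any subset of measure >= alpha has an
    eps-expansion of measure at least Phi(Phi^{-1}(alpha) + eps / sigma)."""
    return norm.cdf(norm.ppf(alpha) + eps / sigma)

# Example: a subset of measure 0.05 expands to measure ~0.13 after eps = 0.5,
# so no classifier with risk >= 0.05 can have adversarial robustness above 1 - h.
h = gaussian_concentration(alpha=0.05, eps=0.5)
print(f"h = {h:.4f}, implied robustness upper bound = {1 - h:.4f}")
```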
3 STANDARD CONCENTRATION IS INSUFFICIENT

We first explain a fundamental connection between the concentration of measure and the intrinsic robustness with respect to imperfect classifiers shown in previous work, and then argue that standard concentration fails to capture a realistic intrinsic robustness limit because it ignores data labels.

Connecting Intrinsic Robustness with Concentration of Measure. Let (X, µ, ∆) be the considered input metric probability space, Y be the set of possible labels, and c : X → Y be the concept function that gives each input a label. Given parameters 0 < α < 1 and ϵ ≥ 0, the standard concentration problem can be cast as the following optimization problem:

minimize_{E∈pow(X)} µ(Eϵ)  subject to  µ(E) ≥ α.  (3.1)

For any classifier f, let E_f = {x ∈ X : f(x) ≠ c(x)} be its induced error region with respect to c(·). By connecting the risk of f with the measure of E_f and the adversarial risk of f with the measure of the ϵ-expansion of E_f, Mahloujifar et al. (2019a) proved that the standard concentration problem (3.1) is equivalent to the following optimization problem regarding risk and adversarial risk:

minimize_f AdvRiskϵ(f, c)  subject to  Risk(f, c) ≥ α.

To be more specific, the following lemma characterizes the connection between the standard concentration function and the intrinsic robustness limit with respect to the set of imperfect classifiers:

Lemma 3.1 (Mahloujifar et al. (2019a)). Let α ∈ (0, 1) and Fα = {f : Risk(f, c) ≥ α} be the set of imperfect classifiers. For any ϵ ≥ 0, it holds that AdvRobϵ(Fα, c) = 1 − h(µ, α, ϵ).

Lemma 3.1 suggests that the concentration function of the input metric probability space h(µ, α, ϵ) can be translated into an adversarial robustness upper bound that applies to any classifier with risk at least α. If this upper bound is shown to be small, then one can conclude that it is impossible to learn an adversarially robust classifier, as long as the learned classifier has risk at least α.

Concentration without Labels Mischaracterizes Intrinsic Robustness. Despite the appealing relationship between concentration of measure and intrinsic robustness, we argue that solving the standard concentration problem is not enough to capture a meaningful intrinsic limit for adversarially robust classification. The standard concentration of measure problem (3.1), which aims to find the optimal subset that has the smallest ϵ-expansion with regard to the input metric probability space (X, µ, ∆), does not involve the concept function c(·) that determines the underlying class label of each input. Therefore, no matter how we assign the labels to the inputs, the concentration function h(µ, α, ϵ) remains the same for the considered metric probability space. In sharp contrast, learning an adversarially-robust classifier depends on the joint distribution of both the inputs and the labels.

Moreover, when the standard concentration function is translated into an intrinsic limit of adversarial robustness, it is defined with respect to the set of imperfect classifiers Fα (see Lemma 3.1). The only restriction imposed by Fα is that the classifier (or equivalently, the measure of the corresponding error region) has risk at least α. This fails to consider whether the classifier is learnable or not under the given classification problem. Therefore, the intrinsic robustness limit implied by standard concentration, AdvRobϵ(Fα, c), could be much higher than AdvRobϵ(Flearn, c), where Flearn denotes the set of classifiers that can be produced by some supervised learning method. Hence, it is not surprising that Mahloujifar et al. (2019b) found that the adversarial robustness attained by state-of-the-art robust training methods for several image benchmarks is much lower than the intrinsic robustness limit implied by standard concentration of measure.
In this work, to obtain a more meaningful intrinsic robustness limit, we restrict the search space of the standard concentration problem (3.1) by considering both the underlying class labels and the learnability of the given classification problem.

Gaussian Mixture Model. We further illustrate the insufficiency of standard concentration under a simple Gaussian mixture model. Let X ⊆ R^n be the input space and Y = {−1, +1} be the label space. Assume all the inputs are first generated according to a mixture of two Gaussians, x ∼ µ = ½ N(−θ, σ²I_n) + ½ N(θ, σ²I_n), and then labeled by the concept function c(x) = sgn(θ⊤x), where θ ∈ R^n and σ ∈ R are given parameters (this concept function is also the Bayes optimal classifier, which best separates the two Gaussian clusters). Theorem 3.2, proven in Appendix C.1, characterizes the optimal solution to the standard concentration problem under this assumed model.

Theorem 3.2. Consider the above Gaussian mixture model with the ℓ2 perturbation metric. The optimal solution to the standard concentration problem (3.1) is a halfspace, either H− = {x ∈ X : θ⊤x + b∥θ∥2 ≤ 0} or H+ = {x ∈ X : θ⊤x − b∥θ∥2 ≥ 0}, where b is a parameter depending on α and θ such that µ(H−) = µ(H+) = α.

Remark 3.3. Theorem 3.2 suggests that for the Gaussian mixture model, the optimal subset achieving the smallest ϵ-expansion under the ℓ2-norm distance metric is a halfspace H, which is far away from the boundary between the two Gaussian classes for small α. When translated into the intrinsic robustness problem, the corresponding optimal classifier f has to be constructed by treating H as the only error region, or more precisely, f(x) = c(x) if x ∉ H and f(x) ≠ c(x) otherwise. This optimally constructed classifier f, however, does not match our intuition of what a predictive classifier would do under the considered Gaussian mixture model. In particular, since all the inputs in H and their neighbours share the same class label and are also far away from the boundary, examples that fall into H should be easy to classify correctly using simple decision rules, such as k-nearest neighbours or maximum margin, whereas examples that are close to the boundary should be more likely to be misclassified by classifiers learned with supervision. This confirms our claim that standard concentration is not sufficient for capturing a meaningful intrinsic robustness limit.
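The contrast in Remark 3.3 can be made concrete by projecting the mixture onto θ: an extreme halfspace and a slab around the decision boundary can be given the same measure α, yet the halfspace has the smaller ϵ-expansion even though a learned classifier would err near the boundary instead. The following sketch compares the two in closed form under the illustrative assumptions ∥θ∥2 = 1 and σ = 1; all names are ours.

```python
from scipy.stats import norm
from scipy.optimize import brentq

# Project onto theta/||theta||: z = theta^T x / ||theta|| follows a 1-D mixture of Gaussians.
d, sigma, eps, alpha = 1.0, 1.0, 0.5, 0.05      # d = ||theta||_2

def mix_cdf(z):
    return 0.5 * norm.cdf(z, -d, sigma) + 0.5 * norm.cdf(z, d, sigma)

# (i) extreme half-space {z <= z0} of measure alpha; its eps-expansion is {z <= z0 + eps}
z0 = brentq(lambda z: mix_cdf(z) - alpha, -20, 20)
half_expansion = mix_cdf(z0 + eps)

# (ii) slab around the boundary {|z| <= s} of measure alpha; its eps-expansion is {|z| <= s + eps}
s = brentq(lambda z: mix_cdf(z) - mix_cdf(-z) - alpha, 1e-6, 20)
slab_expansion = mix_cdf(s + eps) - mix_cdf(-(s + eps))

print(f"half-space:  mu(E) = {alpha:.3f},  mu(E_eps) = {half_expansion:.3f}")
print(f"slab:        mu(E) = {alpha:.3f},  mu(E_eps) = {slab_expansion:.3f}")
```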
4 INCORPORATING LABEL UNCERTAINTY IN INTRINSIC ROBUSTNESS

In this section, we first propose a new concentration estimation framework by imposing a constraint based on label uncertainty (Definition 4.1) on the search space of the standard problem (3.1). Then, we explain why this yields a more realistic intrinsic robustness limit.

Let (X, µ) be the input probability space and Y = {1, 2, . . . , k} be the set of labels. A function η : X → [0, 1]^k is said to capture the full label distribution (Geng, 2016; Gao et al., 2017) if [η(x)]_y corresponds to the description degree of y to x for any x ∈ X and y ∈ Y, and Σ_{y∈[k]} [η(x)]_y = 1 holds for any x ∈ X. For classification tasks that rely on human labeling, one can approximate the label distribution for any input by collecting labels from multiple human annotators. Our experiments use the CIFAR-10H dataset, which did this for the CIFAR-10 test images (Peterson et al., 2019). For any subset E ∈ pow(X), we introduce label uncertainty to capture the average uncertainty level of the label assignments of the inputs within E:

Definition 4.1 (Label Uncertainty). Let (X, µ) be the input probability space and Y = {1, 2, . . . , k} be the complete set of class labels. Suppose c : X → Y is a concept function that assigns each input x a label y ∈ Y. Assume η : X → [0, 1]^k is the underlying label distribution function, where [η(x)]_y represents the description degree of y to x. For any subset E ∈ pow(X) with measure µ(E) > 0, the label uncertainty (LU) of E with respect to (X, µ), c(·) and η(·) is defined as:

LU(E; µ, c, η) = (1 / µ(E)) ∫_E ( 1 − [η(x)]_{c(x)} + max_{y≠c(x)} [η(x)]_y ) dµ.

We define LU(E; µ, c, η) as the average label uncertainty over all the examples that fall into E, where 1 − [η(x)]_{c(x)} + max_{y≠c(x)} [η(x)]_y represents the label uncertainty of a single example (x, c(x)). The range of label uncertainty is [0, 2]. For a single input, label uncertainty of 0 suggests the assigned label fully captures the underlying label distribution; label uncertainty of 1 means there are other classes as likely to be the ground-truth label as the assigned label; label uncertainty of 2 means the input is mislabeled and there is a different label that represents the ground-truth label.

Based on the notion of label uncertainty, we study the following constrained concentration problem:

minimize_{E∈pow(X)} µ(Eϵ)  subject to  µ(E) ≥ α and LU(E; µ, c, η) ≥ γ,  (4.1)

where γ ∈ [0, 2] is a constant. When γ is set to zero, (4.1) reduces to the standard concentration of measure problem. In this work, we set the value of γ to roughly represent the label uncertainty of the error region of state-of-the-art classifiers for the given classification problem. Theorem 4.2, proven in Appendix C.2, shows how (4.1) captures the intrinsic robustness limit with respect to the set of imperfect classifiers whose error region label uncertainty is at least γ.

Theorem 4.2. Define Fα,γ = {f : Risk(f, c) ≥ α, LU(E_f; µ, c, η) ≥ γ}, where α ∈ (0, 1), γ ∈ (0, 2) and E_f = {x ∈ X : f(x) ≠ c(x)} is the error region of f. For any ϵ ≥ 0, it holds that

inf_{E∈pow(X)} {µ(Eϵ) : µ(E) ≥ α, LU(E; µ, c, η) ≥ γ} = 1 − AdvRobϵ(Fα,γ, c).

Compared with standard concentration, (4.1) searches for the least expansive subset among input regions with high label uncertainty. According to Theorem 4.2, the translated intrinsic robustness limit is defined with respect to Fα,γ and is guaranteed to be no greater than AdvRobϵ(Fα, c). Although both AdvRobϵ(Fα, c) and AdvRobϵ(Fα,γ, c) can serve as valid robustness upper bounds for any f ∈ Fα,γ, the latter captures a more meaningful intrinsic robustness limit, since state-of-the-art classifiers are expected to more frequently misclassify inputs with large label uncertainty, as there is more discrepancy between their assigned labels and the underlying label distribution (Section 6 provides supporting empirical evidence for this on CIFAR-10).
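The empirical counterpart of Definition 4.1 is a simple average over the examples that land in a region. The following is a minimal sketch assuming the soft labels are given as a normalized array (e.g., CIFAR-10H annotation proportions); the function and variable names are ours, not the paper's.

```python
import numpy as np

def label_uncertainty(soft_labels, assigned, region_mask=None):
    """Empirical label uncertainty (Definition 4.1) of a region of examples.

    soft_labels: (n, k) array approximating eta(x); each row sums to 1.
    assigned:    (n,) integer labels given by the concept function c(x).
    region_mask: optional boolean mask selecting the region E (default: all examples).
    """
    if region_mask is not None:
        soft_labels, assigned = soft_labels[region_mask], assigned[region_mask]
    n = len(assigned)
    eta_c = soft_labels[np.arange(n), assigned]        # [eta(x)]_{c(x)}
    others = soft_labels.copy()
    others[np.arange(n), assigned] = -np.inf           # exclude the assigned class
    eta_top = others.max(axis=1)                       # max_{y != c(x)} [eta(x)]_y
    return float(np.mean(1.0 - eta_c + eta_top))       # average of per-example values in [0, 2]
```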
Need for Soft Labels. The proposed approach requires label uncertainty information for training examples. The CIFAR-10H dataset provides soft labels from humans that enabled our experiments, but typical machine learning datasets do not provide such information. Below, we discuss possible avenues for estimating label uncertainty when human soft labels are not available and are too expensive to acquire. A potential solution is to estimate the set of examples with high label uncertainty using the predicted probabilities of a classification model. Confident learning (Natarajan et al., 2013; Lipton et al., 2018; Huang et al., 2019; Northcutt et al., 2021b;a) provides a systematic method to identify label errors in a dataset based on this idea. If the estimated label errors match the examples with high human label uncertainty, then we can directly extend our framework by leveraging the estimated error set. Our experiments on CIFAR-10 (see Appendix G), however, suggest that there is a misalignment between human-recognized errors and errors produced by confident learning. The existence of such misalignment suggests that one should be cautious when incorporating the estimated set of label errors into our framework. As the field of confident learning advances to produce more accurate estimators of the label error set, it could serve as a good alternative for applying our framework in settings where human label information is not accessible.

5 MEASURING CONCENTRATION WITH LABEL UNCERTAINTY CONSTRAINTS

Directly solving (4.1) requires knowledge of the underlying input distribution µ and the ground-truth label distribution function η(·), which are usually not available for classification problems. Thus, we consider the following empirical counterpart of (4.1):

minimize_{E∈G} µ̂_S(Eϵ)  subject to  µ̂_S(E) ≥ α and LU(E; µ̂_S, c, η̂) ≥ γ,  (5.1)

where the search space is restricted to some specific collection of subsets G ⊆ pow(X), µ is replaced by the empirical distribution µ̂_S with respect to a set of inputs S sampled from µ, and the empirical label distribution η̂(x) is considered as an empirical replacement of η(x) for any given input x ∈ S.

Theorem 5.1, proven in Appendix C.3, characterizes a generalization bound for the proposed label uncertainty estimate. It shows that if G is not too complex and η̂ is close to the ground-truth label distribution function η, the empirical estimate of label uncertainty LU(E; µ̂_S, c, η̂) is guaranteed to be close to the actual label uncertainty LU(E; µ, c, η). The formal definition of the complexity penalty with respect to a collection of subsets is given in Appendix B.

Theorem 5.1 (Generalization of Label Uncertainty). Let (X, µ) be a probability space and G ⊆ pow(X) be a collection of subsets of X. Assume φ : N × R → [0, 1] is a complexity penalty for G. If η̂(·) is close to η(·) in L1-norm with respect to µ, i.e., ∫_X ∥η(x) − η̂(x)∥₁ dµ ≤ δη, where δη ∈ (0, 1) is a small constant, then for any α, δ ∈ (0, 1) such that δ < α, we have

Pr_{S∼µ^m} [ ∃ E ∈ G with µ(E) ≥ α : |LU(E; µ, c, η) − LU(E; µ̂_S, c, η̂)| ≥ (4δ + δη)/(α − δ) ] ≤ φ(m, δ).

Remark 5.2. Theorem 5.1 implies the generalization of concentration under label uncertainty constraints (see Theorem C.3 for a formal argument and its proof in Appendix C.5). If we choose G and the collection of its ϵ-expansions, Gϵ = {Eϵ : E ∈ G}, carefully so that both of their complexities are small, then with high probability, the empirical label uncertainty constrained concentration will be close to the actual concentration when the search space is restricted to G. Moreover, define h(µ, c, η, α, γ, ϵ, G) = inf_{E∈G} {µ(Eϵ) : µ(E) ≥ α, LU(E; µ, c, η) ≥ γ} as the generalized concentration function under label uncertainty constraints. Then, based on a proof technique similar to that used for Theorem 3.5 in Mahloujifar et al. (2019b), we can further show that if G also satisfies a universal approximation property, then with probability 1,

h(µ, c, η, α, γ − δη/α, ϵ) ≤ lim_{T→∞} h(µ̂_{S_T}, c, η̂, α, γ, ϵ, G(T)) ≤ h(µ, c, η, α, γ + δη/α, ϵ),  (5.2)

where T stands for the complexity of G and S_T denotes a set of samples of size m(T). Appendix C.4 provides a formal argument and proof for (5.2).
It is worth noting that (5.2) suggests that if we increase both the complexity of the collection of subsets G and the number of samples used for the empirical estimation, the optimal value of the empirical concentration problem (5.1) will converge to the actual concentration function with an error limit of δη/α on the parameter γ. When the difference between the empirical label distribution η̂(·) and the underlying label distribution η(·) is negligible, the optimal value of (5.1) is guaranteed to converge asymptotically to that of (4.1).

Concentration Estimation Algorithm. Although Remark 5.2 provides a general idea of how to choose G for measuring concentration, it does not indicate how to solve the empirical concentration problem (5.1) for a specific perturbation metric. This section presents a heuristic algorithm for estimating the least-expansive subset in optimization problem (5.1) when the metric is the ℓ2-norm or the ℓ∞-norm. We choose G as a union of balls for the ℓ2-norm distance metric and as a union of hypercubes for the ℓ∞-norm (see Appendix B for the formal definition of a union of ℓp-balls). It is worth noting that such choices of G satisfy the conditions required for Theorem C.2, since they are universal approximators for any set and the VC dimensions of both G and Gϵ are bounded (see Eisenstat & Angluin (2007) and Devroye et al. (2013)).

The remaining task is to solve (5.1) based on the selected G. Following Mahloujifar et al. (2019b), we place the balls for ℓ2 (or the hypercubes for ℓ∞) in a sequential manner, and search for the best placement that satisfies the label uncertainty constraint using a greedy approach. Algorithm 1 in Appendix D gives pseudocode for the search algorithm. It initializes the feasible set of hyperparameters Ω as an empty set for each placement of balls (or hypercubes), then enumerates all the possible initial placements S_init(u, k) whose empirical label uncertainty exceeds the given threshold γ. Finally, among all the feasible ball (or hypercube) placements, it records the one that has the smallest ϵ-expansion with respect to the empirical measure µ̂_S. In this way, the input region produced by Algorithm 1 serves as a good approximate solution to the empirical problem (5.1).

Figure 2: (a) Visualization of the CIFAR-10 test images with the soft labels from CIFAR-10H, the original assigned labels from CIFAR-10, and the label uncertainty scores computed based on Definition 4.1. (b) Histogram of the label uncertainty distribution for the CIFAR-10 test dataset.

6 EXPERIMENTS

We conduct experiments on the CIFAR-10H dataset (Peterson et al., 2019), which contains soft labels reflecting human perceptual uncertainty for the 10,000 CIFAR-10 test images (Krizhevsky & Hinton, 2009). These soft labels can be regarded as an approximation of the label distribution function η(·) at each given input, whereas the original CIFAR-10 test dataset provides the class labels given by the concept function c(·). We report on experiments showing the connection between label uncertainty and classification error rates (Section 6.1) and that incorporating label uncertainty enables better intrinsic robustness estimates (Section 6.2). Section 6.3 demonstrates the possibility of improving model robustness by abstaining on inputs in high label uncertainty regions.
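As a concrete starting point, the per-example uncertainty scores underlying Figure 2(b) follow directly from the single-example formula in Definition 4.1 applied to normalized CIFAR-10H annotation counts. A minimal sketch follows; the file names and array shapes are illustrative assumptions, not artifacts shipped with the paper.

```python
import numpy as np

counts = np.load("cifar10h_counts.npy")          # (10000, 10) human annotation counts (assumed format)
labels = np.load("cifar10_test_labels.npy")      # (10000,) original CIFAR-10 test labels c(x)
eta_hat = counts / counts.sum(axis=1, keepdims=True)

idx = np.arange(len(labels))
eta_c = eta_hat[idx, labels]                     # soft-label mass on the assigned class
eta_rest = eta_hat.copy()
eta_rest[idx, labels] = -np.inf
scores = 1.0 - eta_c + eta_rest.max(axis=1)      # per-example label uncertainty in [0, 2]

print("fraction with score < 0.1:", np.mean(scores < 0.1))
print("fraction with score > 0.7:", np.mean(scores > 0.7))
```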
6.1 ERROR REGIONS HAVE LARGER LABEL UNCERTAINTY

Figure 2(a) shows the label uncertainty scores for several images with both the soft labels from CIFAR-10H and the original class labels from CIFAR-10 (see Appendix F for more illustrations). Images with low uncertainty scores are typically easier for humans to recognize as members of their class (first row of Figure 2(a)), whereas images with high uncertainty scores look ambiguous or even misleading (second and third rows). Figure 2(b) shows the histogram of the label uncertainty distribution for all 10,000 CIFAR-10 test examples. In particular, more than 80% of the examples have label uncertainty scores below 0.1, suggesting the original class labels mostly capture the underlying label distribution well. However, around 2% of the examples have label uncertainty scores exceeding 0.7, and some 400 images appear to be mislabeled, with uncertainty scores above 1.2.

We hypothesize that ambiguous or misleading images should also be more likely to be misclassified by state-of-the-art machine learning classifiers. That is, their induced error regions should have larger than typical label uncertainty. To test this hypothesis, we conduct experiments on the CIFAR-10 and CIFAR-10H datasets. More specifically, we train different classification models, including intermediate models extracted at different epochs, using the CIFAR-10 training dataset, then empirically compute the standard risk, adversarial risk, and label uncertainty of the corresponding error region. The results are shown in Figure 3 (see Appendix E for experimental details).

Figures 3(a) and 3(b) demonstrate the relationship between label uncertainty and standard risk for various classifiers produced by standard training and adversarial training methods under ℓ∞ perturbations with ϵ = 8/255. In addition, we plot the label uncertainty (with error bars) of randomly-selected images from the CIFAR-10 test dataset as a reference. As the model classification accuracy increases, the label uncertainty of its induced error region increases, suggesting the misclassified examples tend to have higher label uncertainty. This observation holds consistently for both standard and adversarially trained models with any tested network architecture.

Figure 3: Visualizations of error region label uncertainty versus standard risk and adversarial risk with respect to classifiers produced by different machine learning methods: (a) Standard-trained classifiers with different network architectures; (b) Adversarially-trained classifiers using different learning algorithms; (c) State-of-the-art adversarially robust classification models from RobustBench.
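The error-region label uncertainty reported in Figure 3 is simply the average of Definition 4.1 restricted to the examples a given model misclassifies. The following is a minimal sketch reusing the label_uncertainty helper sketched in Section 4; the prediction and label arrays are assumed inputs, not artifacts of the paper.

```python
import numpy as np

def error_region_label_uncertainty(preds, labels, soft_labels):
    """Average label uncertainty over the test examples a classifier gets wrong."""
    error_mask = preds != labels                       # empirical error region of the model
    return label_uncertainty(soft_labels, labels, region_mask=error_mask)

def random_region_label_uncertainty(preds, labels, soft_labels, seed=0):
    """Reference value: label uncertainty of a random subset of the same size."""
    rng = np.random.default_rng(seed)
    size = int(np.sum(preds != labels))
    rand_mask = np.zeros(len(labels), dtype=bool)
    rand_mask[rng.choice(len(labels), size, replace=False)] = True
    return label_uncertainty(soft_labels, labels, region_mask=rand_mask)
```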
Figure 4: Estimated intrinsic robustness based on Algorithm 1 with γ = 0.17 under (a) ℓ∞ perturbations with ϵ = 8/255; and (b) ℓ2 perturbations with ϵ = 0.5. For comparison, we plot baseline estimates produced without considering label uncertainty using a half-space search method (Prescott et al., 2021) and using a union of hypercubes or balls (Algorithm 1 with γ = 0). Robust accuracies achieved by state-of-the-art RobustBench models are plotted in green.

Figure 3(c) summarizes the error region label uncertainty with respect to the state-of-the-art adversarially robust models documented in RobustBench (Croce et al., 2020). Regardless of the perturbation type or the learning method, the average label uncertainty of their misclassified examples falls into the range (0.17, 0.23), whereas the mean label uncertainty of all the CIFAR-10 test data is less than 0.1. This supports our hypothesis that error regions of state-of-the-art classifiers tend to have larger label uncertainty, and our claim that intrinsic robustness estimates should account for labels.

6.2 EMPIRICAL ESTIMATION OF INTRINSIC ROBUSTNESS

In this section, we apply Algorithm 1 to estimate the intrinsic robustness limit for the CIFAR-10 dataset under ℓ∞ perturbations with ϵ = 8/255 and ℓ2 perturbations with ϵ = 0.5. We set the label uncertainty threshold γ = 0.17 to roughly represent the error region label uncertainty of state-of-the-art classification models (see Figure 3). In particular, we adopt a 50/50 train-test split over the original 10,000 CIFAR-10 test images (see Appendix E for experimental details).

Figure 4 shows our intrinsic robustness estimates with γ = 0.17 for different values of α. We include the estimates of intrinsic robustness defined with Fα as a baseline, where no label uncertainty constraint is imposed (γ = 0). Results are shown both for our ℓp-ball search method and the half-space search method in Prescott et al. (2021). We also plot the standard error and the robust accuracy of the state-of-the-art adversarially robust models in RobustBench (Croce et al., 2020). For the concentration estimation methods, the plotted values are the empirical measure of the returned optimally-searched subset (x-axis) and 1 − the empirical measure of its ϵ-expansion (y-axis).
Figure 5 shows the results of experiments to study the effect of abstaining based on label uncertainty on both clean and robust accuracies using adversarially-trained CIFAR-10 classification models from Carmon et al. (2019) (ℓ , ϵ = 8/255) and Wu et al. (2020) (ℓ2, ϵ = 0.5). We first sort all the test CIFAR-10 images based on label uncertainty, then evaluate the model performance with respect to different abstaining ratios of top uncertain inputs. The accuracy curves suggest that a potential way to improve the robustness of classification systems is to enable the classifier an option to abstain on examples with high uncertainty score. For example, if we allow the robust classifier of Carmon et al. (2019) to abstain on the 2% of the test examples whose label uncertainty exceeds 0.7, the clean accuracy improves from 89.7% to 90.3%, while the robust accuracy increases from 59.5% to 60.4%. This is close to the maximum robust accuracy that could be achieved with a 2% abstention rate (0.595/(1 0.02) = 0.607). This result points to abstaining on examples in high label uncertainty regions as a promising path towards achieving adversarial robustness. 7 CONCLUSION Standard concentration fails to sufficiently capture intrinsic robustness since it ignores data labels. Based on the definition of label uncertainty, we observe that the error regions induced by state-ofthe-art classification models all tend to have high label uncertainty. This motivates us to develop an empirical method to study the concentration behavior regarding the input regions with high label uncertainty, which results in more accurate intrinsic robustness measures for benchmark image classification tasks. Our experiments show the importance of considering labels in understanding intrinsic robustness, and further suggest that abstaining based on label uncertainty could be a potential method to improve the classifier accuracy and robustness. Published as a conference paper at ICLR 2022 AVAILABILITY An implementation of our method, and code for reproducing our experiments, is available under an open source license from: https://github.com/xiaozhanguva/intrinsic_rob_lu. ACKNOWLEDGEMENTS This work was partially funded by an award from the National Science Foundation (NSF) Sa TC program (Center for Trustworthy Machine Learning, #1804603). ETHICS STATEMENT Our work is primarily focused on deepening our understanding of intrinsic adversarial robustness limit and the main contributions in this paper are theoretical. Our work could potentially enable construction of more robust classification systems, as suggested by the results in Section 6.3. For most applications, such as autonomous vehicles and malware detection, improving the robustness of classifiers is beneficial to society. There may be scenarios, however, such as face recognition where uncertainty and the opportunity to confuse classifiers with adversarial perturbations may be useful, so enabling more robust classifiers in these domains may have negative societal impacts. REPRODUCIBILITY STATEMENT Details of our experimental setup and methods are provided in Appendix E, and all of the datasets we use are publicly available. In addition, we state the assumptions for our theoretical results in each theorem. Detailed proofs of all the presented theorems are provided in Appendix C. Jean-Baptiste Alayrac, Jonathan Uesato, Po-Sen Huang, Alhussein Fawzi, Robert Stanforth, and Pushmeet Kohli. Are labels required for improving adversarial robustness? In Neur IPS, 2019. 
Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, 2018.

Arjun Nitin Bhagoji, Daniel Cullina, and Prateek Mittal. Lower bounds on adversarial robustness from optimal transport. In NeurIPS, 2019.

Christer Borell. The Brunn-Minkowski inequality in Gauss space. Inventiones mathematicae, 30(2):207–216, 1975.

Sebastien Bubeck, Yin Tat Lee, Eric Price, and Ilya Razenshteyn. Adversarial examples from computational constraints. In International Conference on Machine Learning, 2019.

Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, John C Duchi, and Percy S Liang. Unlabeled data improves adversarial robustness. In NeurIPS, 2019.

Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, 2019.

Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In International Conference on Machine Learning, 2020.

Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Nicolas Flammarion, Mung Chiang, Prateek Mittal, and Matthias Hein. RobustBench: a standardized adversarial robustness benchmark. arXiv preprint arXiv:2010.09670, 2020.

Luc Devroye, László Györfi, and Gábor Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 2013.

Elvis Dohmatob. Generalized no free lunch theorem for adversarial robustness. In International Conference on Machine Learning, 2019.

David Eisenstat and Dana Angluin. The VC dimension of k-fold union. Information Processing Letters, 101(5):181–184, 2007.

Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier. In NeurIPS, 2018.

Bin-Bin Gao, Chao Xing, Chen-Wei Xie, Jianxin Wu, and Xin Geng. Deep label distribution learning with label ambiguity. IEEE Transactions on Image Processing, 26(6):2825–2838, 2017.

Xin Geng. Label distribution learning. IEEE Transactions on Knowledge and Data Engineering, 28(7):1734–1748, 2016.

Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. Adversarial spheres. arXiv:1801.02774, 2018.

Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.

Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. Scalable verified training for provably robust image classification. In International Conference on Computer Vision, 2019.

Chuan Guo, Mayank Rana, Moustapha Cisse, and Laurens van der Maaten. Countering adversarial images using input transformations. In ICLR, 2018.

Jinchi Huang, Lie Qu, Rongfei Jia, and Binqiang Zhao. O2U-Net: A simple noisy label detection approach for deep neural networks. In International Conference on Computer Vision, 2019.

Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

Ryen Krusinga, Sohil Shah, Matthias Zwicker, Tom Goldstein, and David Jacobs. Understanding the (un)interpretability of natural image distributions using generative models. arXiv:1901.01499, 2019.

Michel Ledoux. Isoperimetry and Gaussian analysis. In Lectures on Probability Theory and Statistics. Springer, 1996.
Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Certified adversarial robustness with additive noise. In NeurIPS, 2019.

Zachary Lipton, Yu-Xiang Wang, and Alexander Smola. Detecting and correcting for label shift with black box predictors. In International Conference on Machine Learning, 2018.

Saeed Mahloujifar, Dimitrios Diochnos, and Mohammad Mahmoody. The curse of concentration in robust learning: Evasion and poisoning attacks from concentration of measure. In AAAI Conference on Artificial Intelligence, 2019a.

Saeed Mahloujifar, Xiao Zhang, Mohammad Mahmoody, and David Evans. Empirically measuring concentration: Fundamental limits on intrinsic robustness. In NeurIPS, 2019b.

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.

Nagarajan Natarajan, Inderjit S Dhillon, Pradeep K Ravikumar, and Ambuj Tewari. Learning with noisy labels. In NeurIPS, 2013.

Curtis Northcutt, Anish Athalye, and Jonas Mueller. Pervasive label errors in test sets destabilize machine learning benchmarks. In NeurIPS (Datasets and Benchmarks Track), 2021a.

Curtis Northcutt, Lu Jiang, and Isaac Chuang. Confident learning: Estimating uncertainty in dataset labels. Journal of Artificial Intelligence Research, 70:1373–1411, 2021b.

Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, 2016.

Joshua C Peterson, Ruairidh M Battleday, Thomas L Griffiths, and Olga Russakovsky. Human uncertainty makes classification more robust. In International Conference on Computer Vision, 2019.

Jack Prescott, Xiao Zhang, and David Evans. Improved estimation of concentration under ℓp-norm distance metrics using half spaces. In ICLR, 2021.

Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. In ICLR, 2018.

Ali Shafahi, W. Ronny Huang, Christoph Studer, Soheil Feizi, and Tom Goldstein. Are adversarial examples inevitable? In ICLR, 2019.

Vladimir N Sudakov and Boris S Tsirelson. Extremal properties of half-spaces for spherically invariant measures. Zapiski Nauchnykh Seminarov Leningrad Otdel Mathematical Institute Steklov (LOMI), 41:14–24, 1974.

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.

Michel Talagrand. Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathématiques de l'Institut des Hautes Études Scientifiques, 81(1):73–205, 1995.

Florian Tramer, Nicholas Carlini, Wieland Brendel, and Aleksander Madry. On adaptive attacks to adversarial example defenses. In NeurIPS, 2020.

Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, 2018.

Eric Wong, Frank R Schmidt, Jan Hendrik Metzen, and Zico Kolter. Scaling provable adversarial defenses. In NeurIPS, 2018.

Dongxian Wu, Shu-Tao Xia, and Yisen Wang. Adversarial weight perturbation helps robust generalization. In NeurIPS, 2020.

Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. In ICLR, 2018.

Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In International Conference on Machine Learning, 2019.
Huan Zhang, Hongge Chen, Chaowei Xiao, Sven Gowal, Robert Stanforth, Bo Li, Duane Boning, and Cho-Jui Hsieh. Towards stable and efficient training of verifiably robust neural networks. In ICLR, 2020a.

Xiao Zhang, Jinghui Chen, Quanquan Gu, and David Evans. Understanding the intrinsic robustness of image distributions using conditional generative models. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2020b.

A RELATED WORK

This section summarizes the work related to ours, beyond the brief background provided in the Introduction. First, we discuss the line of research aiming to develop robust classification models against adversarial examples. Then, we introduce the line of work that focuses on understanding the intrinsic robustness limit.

A.1 TRAINING ADVERSARIALLY ROBUST CLASSIFIERS

Witnessing the vulnerability of modern machine learning models to adversarial examples, extensive studies have been carried out aiming to build classification models that are robust against adversarial perturbations. Heuristic defense mechanisms (Goodfellow et al., 2015; Papernot et al., 2016; Guo et al., 2018; Xie et al., 2018; Madry et al., 2018) had been most popular until many of them were broken by stronger adaptive adversaries (Athalye et al., 2018; Tramer et al., 2020). The only scalable defense that seems to hold up well against adaptive adversaries is PGD-based adversarial training (Madry et al., 2018). Several variants of PGD-based adversarial training have been proposed, which either adopt different loss functions (Zhang et al., 2019; Wu et al., 2020) or make use of additional training data (Carmon et al., 2019; Alayrac et al., 2019). Nevertheless, the best current adversarially-trained classifiers can only achieve around 65% robust accuracy on CIFAR-10 against ℓ∞ perturbations with strength ϵ = 8/255, even with additional training data (see the leaderboard in Croce et al. (2020)).

To end the arms race between heuristic defenses and newly designed adaptive attacks that break them, certified defenses have been developed based on different approaches, including linear programming (Wong & Kolter, 2018; Wong et al., 2018), semidefinite programming (Raghunathan et al., 2018), interval bound propagation (Gowal et al., 2019; Zhang et al., 2020a) and randomized smoothing (Cohen et al., 2019; Li et al., 2019). Although certified defenses are able to train classifiers with robustness guarantees for input instances, most can only scale to small networks, and they usually come at the cost of reduced empirical robustness, especially for larger adversarial perturbations.

A.2 THEORETICAL UNDERSTANDING OF INTRINSIC ROBUSTNESS

Given the unsatisfactory status quo of building adversarially robust classification models, a line of research (Gilmer et al., 2018; Fawzi et al., 2018; Mahloujifar et al., 2019a; Shafahi et al., 2019; Dohmatob, 2019; Bhagoji et al., 2019) attempted to explain adversarial vulnerability from a theoretical perspective. These works proved that as long as the input distribution is concentrated with respect to the perturbation metric, adversarially robust classifiers cannot exist.
At the core of these results is the fundamental connection between the concentration of measure phenomenon and an intrinsic robustness limit that captures the maximum adversarial robustness with respect to some specific set of classifiers. For instance, Gilmer et al. (2018) showed that for inputs sampled from uniform n-spheres, a model-independent robustness upper bound under the Euclidean distance metric can be derived using the Gaussian Isoperimetric Inequality (Sudakov & Tsirelson, 1974; Borell, 1975). Mahloujifar et al. (2019a) generalized their result to any concentrated metric probability space of inputs. Nevertheless, it is unclear how to apply these theoretical results to typical image classification tasks, since whether or not natural image distributions are concentrated is unknown.

To address this question, Mahloujifar et al. (2019b) proposed a general way to empirically measure the concentration of any input distribution using data samples, then employed it to estimate an intrinsic robustness limit for typical image benchmarks. By showing the existence of a large gap between the limit implied by concentration and the empirical robustness achieved by state-of-the-art adversarial training methods, Mahloujifar et al. (2019b) further concluded that concentration of measure can only explain a small portion of the adversarial vulnerability of existing image classifiers. More recently, Prescott et al. (2021) strengthened this conclusion by using the set of half-spaces to estimate the concentration function, which achieves improved estimation accuracy. Other related works (Fawzi et al., 2018; Krusinga et al., 2019; Zhang et al., 2020b) proposed estimating lower bounds on the concentration of measure by approximating the underlying distribution using generative models. None of these works, however, consider data labels. Our main results show that data labels are essential for understanding intrinsic robustness limits.

B FORMAL DEFINITIONS

In this section, we introduce the formal definitions of the complexity penalty and the union of ℓp-balls that are used in Section 5. To begin with, we lay out the definition of a complexity penalty defined for some collection of subsets G ⊆ pow(X). VC dimension and Rademacher complexity are commonly-used examples of such a complexity penalty.

Definition B.1 (Complexity Penalty). Let G ⊆ pow(X). We say φ : N × R → [0, 1] is a complexity penalty for G if, for any δ ∈ (0, 1), it holds that

Pr_{S∼µ^m} [ ∃ E ∈ G : |µ̂_S(E) − µ(E)| ≥ δ ] ≤ φ(m, δ).

Next, we provide the formal definition of a union of ℓp-balls as follows:

Definition B.2 (Union of ℓp-Balls). Let p ≥ 1. For any T ∈ Z+, define the union of T ℓp-balls as

B(T; ℓp) = { ∪_{t=1}^{T} B^{(ℓp)}_{r_t}(u_t) : ∀ t ∈ [T], (u_t, r_t) ∈ R^n × R≥0 }.

When p = ∞, B(T; ℓ∞) corresponds to the union of T hypercubes.
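Membership in a union of ℓp-balls, and in its ϵ-expansion, is straightforward to evaluate on samples, which is what the empirical problem (5.1) requires. The following is a minimal sketch, using the fact that the ϵ-expansion of a union of ℓp-balls under the same ℓp metric is the union of the balls with radii inflated by ϵ; the function names are ours.

```python
import numpy as np

def in_union_of_balls(x, centers, radii, p=2, eps=0.0):
    """Membership test for a union of l_p balls (Definition B.2), or for its
    eps-expansion (each radius inflated by eps).
    x: (d,) point; centers: (T, d) ball centers; radii: (T,) ball radii."""
    ord_ = np.inf if p == np.inf else p
    dists = np.linalg.norm(centers - x, ord=ord_, axis=1)
    return bool(np.any(dists <= radii + eps))

def empirical_measure(points, centers, radii, p=2, eps=0.0):
    """Fraction of sample points inside the union (eps=0) or inside its eps-expansion."""
    return float(np.mean([in_union_of_balls(x, centers, radii, p, eps) for x in points]))
```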
C PROOFS OF MAIN RESULTS

In this section, we provide detailed proofs of our main results, including Theorem 3.2, Theorem 4.2, Theorem 5.1 and the argument presented in Remark 5.2.

C.1 PROOF OF THEOREM 3.2

In order to prove Theorem 3.2, we make use of the Gaussian Isoperimetric Inequality (Sudakov & Tsirelson, 1974; Borell, 1975). A proof of this inequality can be found in Ledoux (1996).

Lemma C.1 (Gaussian Isoperimetric Inequality). Let (R^n, µ) be the n-dimensional Gaussian space equipped with the ℓ2-norm distance metric. Consider an arbitrary subset E ∈ pow(R^n), and suppose H is a half space that satisfies µ(H) = µ(E). Then for any ϵ ≥ 0, we have

µ(Eϵ) ≥ µ(Hϵ) = Φ(Φ⁻¹(µ(E)) + ϵ),

where Φ(·) is the cumulative distribution function of N(0, 1) and Φ⁻¹(·) is its inverse function.

Now we are ready to prove Theorem 3.2.

Proof of Theorem 3.2. To begin with, we introduce the following notation. Let µ− be the probability measure of N(−θ, σ²I_n) and µ+ be the probability measure of N(θ, σ²I_n); then by definition, we have µ = µ−/2 + µ+/2. Consider the optimal subset E* = argmin_{E∈pow(X)} {µ(Eϵ) : µ(E) ≥ α}. Note that the standard concentration function h(µ, α, ϵ) is monotonically increasing with respect to α, thus µ(E*) = α holds for any continuous µ. Let α− = µ−(E*) and α+ = µ+(E*). According to the Gaussian Isoperimetric Inequality (Lemma C.1), it holds for any ϵ ≥ 0 that

µ(E*ϵ) = ½ µ−(E*ϵ) + ½ µ+(E*ϵ) ≥ ½ Φ(Φ⁻¹(α−) + ϵ) + ½ Φ(Φ⁻¹(α+) + ϵ).  (C.1)

Note that equality in (C.1) can be achieved if and only if E* is a half space. Next, we show that there always exists a half space H ∈ pow(X) such that µ−(H) = α− and µ+(H) = α+. Let f−(·), f+(·) be the PDFs of µ− and µ+ respectively. For any x ∈ X, f−(x) and f+(x) are always positive, thus we have

f+(x) / f−(x) = exp(−(x − θ)⊤(x − θ) / (2σ²)) / exp(−(x + θ)⊤(x + θ) / (2σ²)) = exp(2θ⊤x / σ²).

This implies that the ratio f+(x)/f−(x) is monotonically increasing with respect to θ⊤x. Consider the following extreme half space H− = {x ∈ X : θ⊤x + b∥θ∥2 ≤ 0} such that µ(H−) = α. We are going to prove µ−(H−) ≥ µ−(E*) = α− and µ+(H−) ≤ µ+(E*) = α+. Considering the sets E* \ H− and H− \ E*, we have

µ+(E* \ H−) / µ−(E* \ H−) ≥ inf_{x∈E*\H−} exp(2θ⊤x / σ²) ≥ sup_{x∈H−\E*} exp(2θ⊤x / σ²) ≥ µ+(H− \ E*) / µ−(H− \ E*).  (C.2)

Note that we also have

µ+(E* \ H−) + µ−(E* \ H−) = µ+(H− \ E*) + µ−(H− \ E*).  (C.3)

Thus, combining (C.2) and (C.3), we have µ+(E* \ H−) ≥ µ+(H− \ E*) and µ−(E* \ H−) ≤ µ−(H− \ E*). Adding the term µ+(E* ∩ H−) or µ−(E* ∩ H−) to both sides, we further have µ+(H−) ≤ µ+(E*) = α+ and µ−(H−) ≥ µ−(E*) = α−. On the other hand, consider the half space H+ = {x ∈ X : θ⊤x − b∥θ∥2 ≥ 0} such that µ(H+) = α. Using a similar technique, we can prove µ+(H+) ≥ µ+(E*) = α+ and µ−(H+) ≤ µ−(E*) = α−.

In addition, let H = {x ∈ X : w⊤x + b ≥ 0} be any half space such that µ(H) = α. Since both µ+ and µ− are continuous, as we rotate the half space (i.e., gradually increase the value of w⊤θ), µ−(H) and µ+(H) also change continuously. Therefore, it is guaranteed that there exists a half space H ∈ pow(X) such that µ−(H) = α− and µ+(H) = α+. This further implies that the lower bound of (C.1) can always be achieved.

Finally, since we have proved that the optimal subset has to be a half space, the remaining task is to solve the following optimization problem:

min_{H∈pow(X)}  ½ Φ(Φ⁻¹(µ−(H)) + ϵ) + ½ Φ(Φ⁻¹(µ+(H)) + ϵ)
s.t.  H = {x ∈ X : w⊤x + b ≥ 0} and µ(H) = α.  (C.4)

Construct the function g(u) = Φ(Φ⁻¹(u) + ϵ) + Φ(Φ⁻¹(2α − u) + ϵ), where u ∈ [0, 2α]. Based on the derivative-of-inverse-function formula, we compute the derivative of g with respect to u as follows:

g′(u) = exp(−(Φ⁻¹(u) + ϵ)² / 2) / exp(−(Φ⁻¹(u))² / 2) − exp(−(Φ⁻¹(2α − u) + ϵ)² / 2) / exp(−(Φ⁻¹(2α − u))² / 2)
      = exp(−ϵ²/2) [ exp(−ϵ Φ⁻¹(u)) − exp(−ϵ Φ⁻¹(2α − u)) ].

Noticing that the term exp(−ϵ Φ⁻¹(u)) is monotonically decreasing with respect to u, we then know that g(u) is monotonically increasing on [0, α] and monotonically decreasing on [α, 2α]. Therefore, the optimal solution to (C.4) is achieved when µ−(H) reaches its maximum or its minimum. According to the previous argument regarding the range of α− and α+, we can immediately prove the optimality results of Theorem 3.2.
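As a quick numerical sanity check of Lemma C.1 (not part of the paper), the ϵ-expansion of a centered disk of Gaussian measure 1/2 in R² has a closed form and can be compared against the half-space bound:

```python
from scipy.stats import norm, chi2

eps = 0.5
r = chi2.ppf(0.5, df=2) ** 0.5                   # disk radius with N(0, I_2)-measure exactly 1/2
disk_expansion = chi2.cdf((r + eps) ** 2, df=2)  # measure of the eps-expanded disk
halfspace_bound = norm.cdf(norm.ppf(0.5) + eps)  # Phi(Phi^{-1}(1/2) + eps)
print(f"disk: mu(E_eps) = {disk_expansion:.4f} >= half-space bound {halfspace_bound:.4f}")
```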
C.2 PROOF OF THEOREM 4.2

In this section, we prove Theorem 4.2 based on the techniques used in Mahloujifar et al. (2019a) for proving the connection between the standard concentration function and intrinsic robustness with respect to the set of imperfect classifiers.

Proof of Theorem 4.2. Let E* be the optimal solution to (4.1); then µ(E*ϵ) corresponds to the optimal value of (4.1). We are going to show 1 − AdvRobϵ(Fα,γ, c) = µ(E*ϵ) by proving both directions.

First, we prove 1 − AdvRobϵ(Fα,γ, c) ≥ µ(E*ϵ). Let f be any classifier in Fα,γ, and E(f) be the corresponding error region of f. According to the definitions of risk and adversarial risk, we have

Risk(f, c) = µ(E(f)) and AdvRiskϵ(f, c) = µ(Eϵ(f)),

where Eϵ(f) represents the ϵ-expansion of E(f). Since f ∈ Fα,γ, we have Risk(f, c) = µ(E(f)) ≥ α and LU(E(f); µ, c, η) ≥ γ. Thus, by (4.1), we obtain that

1 − AdvRobϵ(f, c) = AdvRiskϵ(f, c) = µ(Eϵ(f)) ≥ µ(E*ϵ).

By taking the infimum over f ∈ Fα,γ on both sides, we prove 1 − AdvRobϵ(Fα,γ, c) ≥ µ(E*ϵ).

Next, we show that 1 − AdvRobϵ(Fα,γ, c) ≤ µ(E*ϵ). We construct a classifier f* such that f*(x) = c(x) if x ∉ E*, and f*(x) ≠ c(x) otherwise. Note that by construction, E* corresponds to the error region of f*. Thus, according to the definitions of risk and adversarial risk, we know

Risk(f*, c) = µ(E*) ≥ α and AdvRiskϵ(f*, c) = µ(E*ϵ).

Since LU(E*; µ, c, η) ≥ γ, we know the error region label uncertainty of f* is at least γ, so f* ∈ Fα,γ. Thus, by the definition of intrinsic robustness, we know

1 − AdvRobϵ(Fα,γ, c) ≤ AdvRiskϵ(f*, c) = µ(E*ϵ).

Finally, putting the pieces together completes the proof.

C.3 PROOF OF THEOREM 5.1

Proof of Theorem 5.1. For simplicity, denote by lu(x; c, η) = 1 − [η(x)]_{c(x)} + max_{y≠c(x)} [η(x)]_y the label uncertainty of a given input x with respect to c(·) and η(·). Let E be a subset in G such that µ(E) ≥ α and |µ(E) − µ̂_S(E)| ≤ δ, where δ is a constant much smaller than α. Then, according to Definition 4.1, we can decompose the estimation error of the label uncertainty as:

LU(E; µ, c, η) − LU(E; µ̂_S, c, η̂)
= (1/µ(E)) ∫_E lu(x; c, η) dµ − (1/µ̂_S(E)) ∫_E lu(x; c, η̂) dµ̂_S
= (1/µ(E) − 1/µ̂_S(E)) ∫_E lu(x; c, η) dµ   [I₁]
  + (1/µ̂_S(E)) ∫_E (lu(x; c, η) − lu(x; c, η̂)) dµ   [I₂]
  + (1/µ̂_S(E)) ( ∫_E lu(x; c, η̂) dµ − ∫_E lu(x; c, η̂) dµ̂_S )   [I₃].

Next, we upper bound the absolute values of the three components, respectively. Consider the first term I₁. Note that 0 ≤ lu(x; c, η) ≤ 2 for any x ∈ X, thus we have |∫_E lu(x; c, η) dµ| ≤ 2µ(E). Therefore,

|I₁| ≤ |1/µ(E) − 1/µ̂_S(E)| · 2µ(E) = (2/µ̂_S(E)) |µ(E) − µ̂_S(E)|.

As for the second term I₂, the following inequality holds for any x ∈ X:

|lu(x; c, η) − lu(x; c, η̂)| ≤ |[η(x)]_{c(x)} − [η̂(x)]_{c(x)}| + |max_{y≠c(x)} [η(x)]_y − max_{y≠c(x)} [η̂(x)]_y| ≤ ∥η(x) − η̂(x)∥₁,

where the second inequality holds because |max_i a_i − max_i b_i| ≤ max_i |a_i − b_i| for any a, b ∈ R^n. Therefore, we can upper bound |I₂| by

|I₂| ≤ (1/µ̂_S(E)) ∫_E ∥η(x) − η̂(x)∥₁ dµ ≤ (1/µ̂_S(E)) ∫_X ∥η(x) − η̂(x)∥₁ dµ ≤ δη / µ̂_S(E).

For the last term I₃, since 0 ≤ lu(x; c, η̂) ≤ 2 holds for any x ∈ X, we have

|I₃| ≤ (2/µ̂_S(E)) |µ(E) − µ̂_S(E)|.

Finally, putting the pieces together, we have

|LU(E; µ, c, η) − LU(E; µ̂_S, c, η̂)| ≤ (4/µ̂_S(E)) |µ(E) − µ̂_S(E)| + δη / µ̂_S(E) ≤ (4δ + δη) / (α − δ),

provided µ(E) ≥ α and |µ(E) − µ̂_S(E)| ≤ δ. Making use of the definition of the complexity penalty for G completes the proof of Theorem 5.1.

C.4 PROOF OF REMARK 5.2

Before presenting the proofs, we first lay out the formal statement of Remark 5.2 in Theorem C.2. The proof technique of Theorem C.2 is inspired by Theorem 3.5 in Mahloujifar et al. (2019b).

Theorem C.2 (Formal Statement of Remark 5.2).
Consider the input metric probability space (X, µ, d), the concept function c, and the label distribution function η. Let {G(T)}_{T∈ℕ} be a sequence of collections of subsets of X. For any T ∈ ℕ, assume φ_T and φ_T^ϵ are complexity penalties for G(T) and G_ϵ(T), respectively, and η̂ is a function such that ∫_X ‖η̂(x) − η(x)‖₁ dµ ≤ δ_η. Define
$$h(\mu, c, \eta, \alpha, \gamma, \epsilon, \mathcal{G}) = \inf_{E \in \mathcal{G}} \big\{\mu(E_\epsilon) : \mu(E) \ge \alpha,\; \mathrm{LU}(E; \mu, c, \eta) \ge \gamma\big\}$$
to be the constrained concentration function; we simply write h(µ, c, η, α, γ, ϵ) when G = pow(X). Given a sequence of datasets {S_T}_{T∈ℕ}, where S_T consists of m(T) i.i.d. samples from µ, and a sequence of real numbers {δ(T)}_{T∈ℕ} with δ(T) ∈ (0, α/2), if the following assumptions hold:

1. ∑_{T=1}^∞ φ_T(m(T), δ(T)) < ∞;
2. ∑_{T=1}^∞ φ_T^ϵ(m(T), δ(T)) < ∞;
3. lim_{T→∞} δ(T) = 0;
4. lim_{T→∞} h(µ, c, η, α, γ, ϵ, G(T)) = h(µ, c, η, α, γ, ϵ);¹
5. h is locally continuous with respect to α and γ at (µ, c, η, α, γ ± δ_η/α, ϵ, pow(X)),

then with probability 1, we have
$$h(\mu, c, \eta, \alpha, \gamma - \delta_\eta/\alpha, \epsilon) \;\le\; \lim_{T\to\infty} h\big(\hat\mu_{S_T}, c, \hat\eta, \alpha, \gamma, \epsilon, \mathcal{G}(T)\big) \;\le\; h(\mu, c, \eta, \alpha, \gamma + \delta_\eta/\alpha, \epsilon).$$

¹It is worth noting that this assumption is satisfied for any family of collections of subsets that is a universal approximator, such as kernel SVMs and decision trees.

To prove Theorem C.2, we use the following theorem regarding the generalization of concentration under label uncertainty constraints. The proof of Theorem C.3 is provided in Appendix C.5.

Theorem C.3 (Generalization of Concentration). Let (X, µ, d) be a metric probability space and G ⊆ pow(X). Define h(µ, c, η, α, γ, ϵ, G) = inf_{E∈G}{µ(E_ϵ) : µ(E) ≥ α, LU(E; µ, c, η) ≥ γ} as the generalized concentration function under label uncertainty constraints. Then, under the same setting as Theorem 5.1, for any γ, ϵ ∈ [0, 1], α ∈ (0, 1] and δ ∈ (0, α/2), we have
$$\Pr_{S \sim \mu^m}\Big[\, h(\mu, c, \eta, \alpha - \delta, \gamma - \delta', \epsilon, \mathcal{G}) - \delta \;\le\; h(\hat\mu_S, c, \hat\eta, \alpha, \gamma, \epsilon, \mathcal{G}) \;\le\; h(\mu, c, \eta, \alpha + \delta, \gamma + \delta', \epsilon, \mathcal{G}) + \delta \,\Big] \;\ge\; 1 - 6\varphi(m, \delta) - 2\varphi_\epsilon(m, \delta),$$
where δ′ = (4δ + δ_η)/(α − 2δ) and φ_ϵ is the complexity penalty for G_ϵ.

In addition, we also make use of the Borel–Cantelli lemma to prove Theorem C.2.

Lemma C.4 (Borel–Cantelli Lemma). Let {E_T}_{T∈ℕ} be a sequence of events such that ∑_{T=1}^∞ Pr[E_T] < ∞. Then, with probability 1, only finitely many of the events occur.

Now we are ready to prove Theorem C.2.

Proof of Theorem C.2. For any T ∈ ℕ, let E_T be the event that
$$h\big(\mu, c, \eta, \alpha - \delta(T), \gamma - \delta'(T), \epsilon, \mathcal{G}(T)\big) - \delta(T) > h\big(\hat\mu_{S_T}, c, \hat\eta, \alpha, \gamma, \epsilon, \mathcal{G}(T)\big)$$
or
$$h\big(\mu, c, \eta, \alpha + \delta(T), \gamma + \delta'(T), \epsilon, \mathcal{G}(T)\big) + \delta(T) < h\big(\hat\mu_{S_T}, c, \hat\eta, \alpha, \gamma, \epsilon, \mathcal{G}(T)\big),$$
where δ′(T) = (4δ(T) + δ_η)/(α − 2δ(T)). Since δ(T) < α/2, according to Theorem C.3, for any T ∈ ℕ we have
$$\Pr[E_T] \;\le\; 6\varphi_T\big(m(T), \delta(T)\big) + 2\varphi_T^\epsilon\big(m(T), \delta(T)\big).$$
By Assumptions 1 and 2, this further implies
$$\sum_{T=1}^{\infty} \Pr[E_T] \;\le\; 6\sum_{T=1}^{\infty}\varphi_T\big(m(T), \delta(T)\big) + 2\sum_{T=1}^{\infty}\varphi_T^\epsilon\big(m(T), \delta(T)\big) \;<\; \infty.$$
Thus, according to Lemma C.4, with probability 1 there exists some j ∈ ℕ such that for all T ≥ j,
$$h\big(\mu, c, \eta, \alpha - \delta(T), \gamma - \delta'(T), \epsilon, \mathcal{G}(T)\big) - \delta(T) \;\le\; h\big(\hat\mu_{S_T}, c, \hat\eta, \alpha, \gamma, \epsilon, \mathcal{G}(T)\big) \;\le\; h\big(\mu, c, \eta, \alpha + \delta(T), \gamma + \delta'(T), \epsilon, \mathcal{G}(T)\big) + \delta(T). \tag{C.5}$$
In addition, by Assumptions 3, 4 and 5, and noting that δ′(T) → δ_η/α as T → ∞, we have
$$\lim_{T\to\infty} h\big(\mu, c, \eta, \alpha - \delta(T), \gamma - \delta'(T), \epsilon, \mathcal{G}(T)\big) = \lim_{T_1\to\infty}\lim_{T_2\to\infty} h\big(\mu, c, \eta, \alpha - \delta(T_1), \gamma - \delta'(T_1), \epsilon, \mathcal{G}(T_2)\big) = \lim_{T_1\to\infty} h\big(\mu, c, \eta, \alpha - \delta(T_1), \gamma - \delta'(T_1), \epsilon\big) = h\big(\mu, c, \eta, \alpha, \gamma - \delta_\eta/\alpha, \epsilon\big),$$
where the second equality is due to Assumption 4 and the last equality is due to Assumptions 3 and 5. Similarly, we have
$$\lim_{T\to\infty} h\big(\mu, c, \eta, \alpha + \delta(T), \gamma + \delta'(T), \epsilon, \mathcal{G}(T)\big) = h\big(\mu, c, \eta, \alpha, \gamma + \delta_\eta/\alpha, \epsilon\big).$$
Therefore, letting T go to infinity in (C.5), we obtain
$$h\big(\mu, c, \eta, \alpha, \gamma - \delta_\eta/\alpha, \epsilon\big) \;\le\; \lim_{T\to\infty} h\big(\hat\mu_{S_T}, c, \hat\eta, \alpha, \gamma, \epsilon, \mathcal{G}(T)\big) \;\le\; h\big(\mu, c, \eta, \alpha, \gamma + \delta_\eta/\alpha, \epsilon\big).$$
C.5 PROOF OF GENERALIZATION OF CONCENTRATION THEOREM

Proof of Theorem C.3. First, we introduce some notation. Let h(µ, c, η, α, γ, ϵ, G) be the optimal value and g(µ, c, η, α, γ, ϵ, G) be the optimal solution of the following generalized concentration of measure problem with label uncertainty constraints:
$$\underset{E \in \mathcal{G}}{\text{minimize}} \;\; \mu(E_\epsilon) \quad \text{subject to} \;\; \mu(E) \ge \alpha \;\text{ and }\; \mathrm{LU}(E; \mu, c, \eta) \ge \gamma. \tag{C.6}$$
Note that the difference between (C.6) and (4.1) is that the feasible set of E is restricted to some collection of subsets G ⊆ pow(X). Correspondingly, let h(µ̂_S, c, η̂, α, γ, ϵ, G) and g(µ̂_S, c, η̂, α, γ, ϵ, G) be the optimal value and optimal solution of the empirical optimization problem (5.1).

Let E = g(µ, c, η, α + δ, γ + δ′, ϵ, G) and Ê = g(µ̂_S, c, η̂, α, γ, ϵ, G), where δ′ will be specified later. When these optimal sets do not exist, we can select a set whose expansion is arbitrarily close to the optimum, and every step of the proof applies to this variant. According to the definition of complexity penalty, we have
$$\Pr_{S\sim\mu^m}\Big[\,\big|\hat\mu_S(\hat E) - \mu(\hat E)\big| \ge \delta\,\Big] \;\le\; \varphi(m, \delta). \tag{C.7}$$
Since µ̂_S(Ê) ≥ α by definition, (C.7) implies that
$$\Pr_{S\sim\mu^m}\Big[\,\mu(\hat E) \le \alpha - \delta\,\Big] \;\le\; \varphi(m, \delta). \tag{C.8}$$
In addition, according to Theorem 5.1, for any δ ∈ (0, α/2), we have
$$\Pr_{S\sim\mu^m}\Big[\,\big|\mathrm{LU}(\hat E; \mu, c, \eta) - \mathrm{LU}(\hat E; \hat\mu_S, c, \hat\eta)\big| \ge \tfrac{4\delta + \delta_\eta}{\alpha - 2\delta}\,\Big] \;\le\; 2\varphi(m, \delta), \tag{C.9}$$
where the inequality holds because of (C.8) and the union bound. Since LU(Ê; µ̂_S, c, η̂) ≥ γ by definition, (C.9) implies that
$$\Pr_{S\sim\mu^m}\Big[\,\mathrm{LU}(\hat E; \mu, c, \eta) \le \gamma - \tfrac{4\delta + \delta_\eta}{\alpha - 2\delta}\,\Big] \;\le\; 2\varphi(m, \delta). \tag{C.10}$$
Based on the definition of the concentration function h, combining (C.8) and (C.10) and making use of the union bound, we have
$$\Pr_{S\sim\mu^m}\Big[\,\mu(\hat E_\epsilon) < h(\mu, c, \eta, \alpha - \delta, \gamma - \delta', \epsilon, \mathcal{G})\,\Big] \;\le\; 3\varphi(m, \delta), \tag{C.11}$$
where we set δ′ = (4δ + δ_η)/(α − 2δ). Note that according to the definition of φ_ϵ, we have
$$\Pr_{S\sim\mu^m}\Big[\,\big|\mu(\hat E_\epsilon) - \hat\mu_S(\hat E_\epsilon)\big| \ge \delta\,\Big] \;\le\; \varphi_\epsilon(m, \delta), \tag{C.12}$$
thus combining (C.11) and (C.12) by the union bound yields
$$\Pr_{S\sim\mu^m}\Big[\,\hat\mu_S(\hat E_\epsilon) < h(\mu, c, \eta, \alpha - \delta, \gamma - \delta', \epsilon, \mathcal{G}) - \delta\,\Big] \;\le\; 3\varphi(m, \delta) + \varphi_\epsilon(m, \delta). \tag{C.13}$$
Since µ̂_S(Ê_ϵ) = h(µ̂_S, c, η̂, α, γ, ϵ, G), this completes the proof of one side of Theorem C.3. The other side can be proved using the same technique; in particular, we have
$$\Pr_{S\sim\mu^m}\Big[\,\hat\mu_S(\hat E_\epsilon) > h(\mu, c, \eta, \alpha + \delta, \gamma + \delta', \epsilon, \mathcal{G}) + \delta\,\Big] \;\le\; 3\varphi(m, \delta) + \varphi_\epsilon(m, \delta). \tag{C.14}$$
Combining (C.13) and (C.14) by the union bound completes the proof.

D HEURISTIC SEARCH ALGORITHM

The pseudocode of the heuristic search algorithm for the empirical label uncertainty constrained concentration problem (5.1) is shown in Algorithm 1.

Algorithm 1: Heuristic Search for Robust Error Region under ℓ_p (p ∈ {2, ∞})
Input: a set of labeled inputs {x, c(x), η̂(x)}_{x∈S}; parameters α, γ, ϵ, T
1:  Ê ← ∅, Ŝ_init ← ∅, Ŝ_exp ← ∅
2:  for t = 1, 2, . . . , T do
3:      k_lower ← ⌊(α|S| − |Ŝ_init|)/(T − t + 1)⌋,  k_upper ← α|S| − |Ŝ_init|
4:      Ω ← ∅
5:      for u ∈ S do
6:          for k ∈ [k_lower, k_upper] do
7:              r_k(u) ← the ℓ_p distance from u to its k-th nearest neighbour in S \ Ŝ_init
8:              S_init(u, k) ← {x ∈ S \ Ŝ_init : ‖x − u‖_p ≤ r_k(u)}
9:              S_exp(u, k) ← {x ∈ S \ Ŝ_exp : ‖x − u‖_p ≤ r_k(u) + ϵ}
10:             if LU(S_init(u, k); µ̂_S, c, η̂) ≥ γ then
11:                 insert (u, k) into Ω
12:     (û, k̂) ← argmin_{(u,k)∈Ω} {|S_exp(u, k)| − |S_init(u, k)|}
13:     Ê ← Ê ∪ Ball(û, r_k̂(û))
14:     Ŝ_init ← Ŝ_init ∪ S_init(û, k̂),  Ŝ_exp ← Ŝ_exp ∪ S_exp(û, k̂)
Output: Ê

E DETAILED EXPERIMENTAL SETTINGS

In this section, we specify the details of the experiments presented in Section 6. The robustness results of all the adversarially-trained models from RobustBench (Croce et al., 2020) are evaluated using AutoAttack (Croce & Hein, 2020).
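To make the greedy selection in Algorithm 1 (Appendix D) more concrete, the following is a minimal Python sketch of a single round of the search. It is a simplification under assumed data layouts (flattened inputs, soft-label arrays) and brute-force distance computations, not the implementation used for the experiments; all helper and variable names are illustrative.

```python
# A simplified single round of the greedy ball search in Algorithm 1 (illustrative sketch).
import numpy as np

def label_uncertainty(eta, y):
    """Per-example label uncertainty: 1 - eta_y + max_{j != y} eta_j (Definition 4.1)."""
    n = len(y)
    eta_true = eta[np.arange(n), y]
    eta_masked = eta.copy()
    eta_masked[np.arange(n), y] = -np.inf
    return 1.0 - eta_true + eta_masked.max(axis=1)

def greedy_ball_round(X, eta, y, remaining, expanded, gamma, eps, k_range, p=2):
    """Pick a ball (center u, radius r_k(u)) whose covered points have mean label
    uncertainty >= gamma and which adds the fewest expanded points per covered point.
    `remaining` indexes S \ S_init; `expanded` is a boolean mask for S_exp."""
    lu = label_uncertainty(eta, y)
    best, best_cost = None, np.inf
    for u in remaining:                                  # candidate ball centers
        d = np.linalg.norm(X[remaining] - X[u], ord=p, axis=1)
        order = np.argsort(d)
        for k in k_range:                                # k must satisfy 1 <= k <= len(d)
            r = d[order[k - 1]]                          # distance to the k-th nearest neighbour
            init_idx = remaining[d <= r]                 # points covered by the ball
            d_all = np.linalg.norm(X - X[u], ord=p, axis=1)
            exp_idx = np.where((d_all <= r + eps) & ~expanded)[0]   # new points in the eps-expansion
            if lu[init_idx].mean() >= gamma:             # label uncertainty constraint
                cost = len(exp_idx) - len(init_idx)
                if cost < best_cost:
                    best, best_cost = (u, r, init_idx, exp_idx), cost
    return best
```

Repeatedly calling such a routine while updating the covered and expanded index sets mirrors the outer loop over t = 1, . . . , T in Algorithm 1.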
All of our experiments are conducted on a GPU server with an NVIDIA GeForce RTX 2080 Ti graphics card.

Error Region Label Uncertainty. We explain the experimental details for Figure 3. For standard-trained classifiers, we implemented five neural network architectures: a 4-layer network with two convolutional layers and two fully-connected layers (small), a 7-layer network with four convolutional layers and three fully-connected layers (large), a ResNet-18 (resnet18), a ResNet-50 (resnet50), and a WideResNet-34-10 (wideresnet). We trained the small and large models using an Adam optimizer with initial learning rate 0.005, whereas we trained the resnet18, resnet50, and wideresnet models using an SGD optimizer with initial learning rate 0.01. All models are trained with a piecewise learning rate schedule that decays the learning rate by a factor of 10 at epochs 50 and 75. For Figure 3(a), we plotted the label uncertainty and standard risk of the intermediate models obtained at epochs 5, 10, . . . , 100 for each architecture. In addition, we randomly selected subsets of inputs with empirical measure 0.05, 0.10, . . . , 0.95 and plotted their corresponding label uncertainty with error bars. For adversarially trained classifiers, we implemented the vanilla adversarial training method (Madry et al., 2018) and adversarial training with adversarial weight perturbation (Wu et al., 2020), denoted AT and AT-AWP in Figure 3(b), respectively. Both the ResNet-18 (resnet18) and WideResNet-34-10 (wideresnet) architectures are used for each training method. A 10-step PGD attack (PGD-10) with step size 2/255 and maximum perturbation size 8/255 is used for each model during training. Each model is trained for 200 epochs using an SGD optimizer with initial learning rate 0.1 and a piecewise learning rate schedule that decays by a factor of 10 at epochs 100 and 150. We record the intermediate models at epochs 10, 20, . . . , 200.

Estimation of Intrinsic Robustness. For Figure 4, we first conduct a 50/50 train-test split over the 10,000 CIFAR-10 test images, then run Algorithm 1 for each setting on the training split to obtain the optimal subset. Here, we choose α ∈ {0.01, 0.02, . . . , 0.15} and tune the parameter T for each α. Next, we evaluate the empirical measure of the optimally-searched subset (denoted by empirical risk in Figure 4) and the empirical measure of its ϵ-expansion on the testing split, and translate the latter into an intrinsic robustness estimate. Finally, we plot the empirical risk and the estimated intrinsic robustness for each parameter setting in Figure 4.

F ADDITIONAL EXPERIMENTS

This appendix provides additional experimental results supporting our arguments in Section 6.

Visualization of label uncertainty. Figure 6 shows some CIFAR-10 images with their original CIFAR-10 labels and CIFAR-10H human uncertainty labels. The label uncertainty score is computed based on Definition 4.1 and provided under each image. A few examples with high label uncertainty have a CIFAR-10 label that contradicts the CIFAR-10H soft label (see the first two images in Figure 6(a)), indicating that they are actually mislabeled. Images with uncertainty scores around 1.0 do appear difficult for humans to recognize, whereas images with the lowest uncertainty scores look clearly representative of the labeled class.
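As a concrete illustration of how the per-image uncertainty scores shown in Figure 6 can be computed, here is a minimal sketch that applies Definition 4.1 to CIFAR-10H-style human soft labels; the file names and array layouts are assumptions for illustration (CIFAR-10H itself provides per-image human label counts).

```python
# Minimal sketch: label uncertainty scores from human soft labels (Definition 4.1).
# The .npy file names and array layouts are illustrative assumptions.
import numpy as np

counts = np.load("cifar10h-counts.npy")           # shape (10000, 10): human votes per test image
labels = np.load("cifar10-test-labels.npy")       # shape (10000,): original CIFAR-10 labels

eta = counts / counts.sum(axis=1, keepdims=True)  # human soft-label distribution eta(x)

idx = np.arange(len(labels))
eta_true = eta[idx, labels]                       # probability mass on the dataset label c(x)
eta_other = eta.copy()
eta_other[idx, labels] = -np.inf
lu = 1.0 - eta_true + eta_other.max(axis=1)       # lu(x) = 1 - eta_{c(x)} + max_{y != c(x)} eta_y

print("mean label uncertainty:", lu.mean())
print("fraction of images with lu > 0.5:", (lu > 0.5).mean())
```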
These observations show the usefulness of the proposed label uncertainty definition.

Estimation of Intrinsic Robustness. Table 1 summarizes our estimated intrinsic robustness limits produced by Algorithm 1 under different hyperparameter settings. In particular, we set α = 0.05 and γ = 0.17 to roughly reflect the standard error and the error-region label uncertainty of state-of-the-art classification models (see Figure 3), and use ϵ ∈ {4/255, 8/255, 16/255} for ℓ∞ and ϵ ∈ {0.5, 1.0, 1.5} for ℓ2. We also compare our estimates with the intrinsic robustness limits implied by standard concentration by setting γ = 0 in each setting. We perform a 50/50 train-test split on the CIFAR-10 test images: we obtain the optimal subset with the smallest ϵ-expansion on the training split using Algorithm 1 and evaluate it on the testing split. We report both the empirical measure of the optimally-found subset (Empirical Risk in Table 1) and the translated intrinsic robustness estimate. These results show that our estimation of intrinsic robustness generalizes from the training data to the testing data, and support the argument that our estimate is a more accurate characterization of intrinsic robustness than the standard one.

Figure 6: Illustration of human uncertainty labels and label uncertainty of CIFAR-10 test images. Each subfigure shows a group of images at a certain level of uncertainty score: (a) highest uncertainty scores; (b) lowest uncertainty scores; (c), (d) uncertainty scores around 1.0.

Table 1: Summary of the main results using our method under different settings on the CIFAR-10 dataset. We conduct 5 repeated trials for each setting and report the mean and standard deviation.
| Metric | α | ϵ | γ | T | Empirical Risk (%) training | Empirical Risk (%) testing | Intrinsic Robustness (%) training | Intrinsic Robustness (%) testing |
|---|---|---|---|---|---|---|---|---|
| ℓ∞ | 0.05 | 4/255 | 0.0 | 5 | 5.80 ± 0.04 | 4.50 ± 0.21 | 93.48 ± 0.10 | 93.86 ± 0.26 |
| ℓ∞ | 0.05 | 4/255 | 0.17 | 5 | 5.84 ± 0.10 | 5.06 ± 0.82 | 92.03 ± 0.45 | 92.61 ± 1.12 |
| ℓ∞ | 0.05 | 8/255 | 0.0 | 10 | 5.77 ± 0.01 | 4.76 ± 0.27 | 92.89 ± 0.11 | 92.36 ± 0.33 |
| ℓ∞ | 0.05 | 8/255 | 0.17 | 10 | 5.77 ± 0.02 | 4.85 ± 0.58 | 90.91 ± 0.53 | 90.98 ± 1.03 |
| ℓ∞ | 0.05 | 16/255 | 0.0 | 5 | 5.68 ± 0.04 | 5.30 ± 0.33 | 88.44 ± 0.47 | 87.89 ± 1.24 |
| ℓ∞ | 0.05 | 16/255 | 0.17 | 5 | 5.67 ± 0.25 | 4.79 ± 0.75 | 81.96 ± 1.69 | 83.83 ± 2.37 |
| ℓ2 | 0.05 | 0.5 | 0.0 | 5 | 5.76 ± 0.00 | 5.41 ± 0.60 | 93.78 ± 0.10 | 93.51 ± 0.67 |
| ℓ2 | 0.05 | 0.5 | 0.17 | 5 | 5.76 ± 0.00 | 5.36 ± 0.14 | 91.89 ± 0.38 | 91.70 ± 0.49 |
| ℓ2 | 0.05 | 1.0 | 0.0 | 5 | 5.76 ± 0.00 | 6.00 ± 0.50 | 92.93 ± 0.06 | 92.22 ± 0.55 |
| ℓ2 | 0.05 | 1.0 | 0.17 | 5 | 5.76 ± 0.00 | 5.32 ± 0.29 | 87.86 ± 0.79 | 87.75 ± 0.58 |
| ℓ2 | 0.05 | 1.5 | 0.0 | 5 | 5.76 ± 0.00 | 5.67 ± 0.56 | 91.98 ± 0.13 | 91.82 ± 0.65 |
| ℓ2 | 0.05 | 1.5 | 0.17 | 5 | 5.76 ± 0.00 | 5.69 ± 0.45 | 83.33 ± 2.04 | 82.87 ± 2.50 |

G ESTIMATING LABEL ERRORS USING CONFIDENT LEARNING

The proposed concentration estimation framework relies on knowledge of human soft labels to determine which examples have label uncertainty exceeding a given threshold. Since most machine learning datasets, unlike CIFAR-10H, do not provide such information, this raises the question of how to extend our method to settings where human soft labels are unavailable.

We make an initial attempt to address this issue using the confident learning approach of Northcutt et al. (2021b). Their goal was to identify label errors in a dataset, which is closely related to label uncertainty. The method first computes a confident joint matrix based on the predicted probabilities of a pretrained classifier, then selects the top examples according to a ranking rule such as self-confidence or max margin. If we are able to approximate human label uncertainty from the raw inputs and labels, or to identify the set of examples with high label uncertainty, then we can immediately adapt our proposed framework by leveraging such estimated results.

However, we observe only a weak correlation between the set of label errors produced by confident learning and the set of examples with high human label uncertainty. We conduct experiments on CIFAR-10 and identify a set of label errors using confident learning: we train a ResNet-50 classification model on the CIFAR-10 training data and select examples from the CIFAR-10 test dataset as labeling errors using the best ranking method suggested in Northcutt et al. (2021b). Figure 7(a) compares the distribution of human label uncertainty (based on the human soft labels from CIFAR-10H) between the sets of estimated label errors and non-errors. Although the examples estimated as label errors have relatively higher human label uncertainty than the non-errors, over 30% of the estimated label errors have zero label uncertainty for human annotators. This implies a mismatch between the label errors identified by humans and those estimated using confident learning techniques. The mismatch is further confirmed by the precision-recall curve presented in Figure 7(b): we treat examples with human label uncertainty exceeding 0.5 as the ground-truth uncertain images, and vary the size of the produced set of label errors to plot the precision and recall. The precision is uniformly below 0.25, indicating that over 75% of the estimated label errors have human label uncertainty less than 0.5.

Figure 7: Illustration of the misalignment between label errors recognized by humans and those identified by confident learning. (a) Distribution of human label uncertainty for estimated errors versus non-errors under confident learning; (b) precision-recall curve for recovering the set of examples with human label uncertainty exceeding 0.5.

Figure 8: Visualization of the label distributions of the most uncertain CIFAR-10 test images as estimated by the confident learning approach. Both the human and the estimated label distributions are plotted in each figure; the corresponding label uncertainty scores are provided under each image, and the original CIFAR-10 label is highlighted in blue above each image.

Figure 8 visualizes the human label distribution and the estimated label distribution on CIFAR-10. We compute the estimated label uncertainty of each CIFAR-10 test example by replacing the human label distribution with the predicted probabilities of the trained model. There is a clear misalignment between the human label distribution and the distribution estimated by the neural network.
This again confirms that label errors produced by confident learning are not guaranteed to be examples that are difficult for humans.
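To illustrate the kind of comparison reported in Figure 7(b), the following is a minimal sketch of the self-confidence ranking rule (one of the ranking rules used by confident learning) checked against CIFAR-10H-based label uncertainty. It is a simplified stand-in for the full confident-learning pipeline of Northcutt et al. (2021b), without the confident joint; all file names and array layouts are assumptions.

```python
# Simplified sketch: self-confidence ranking vs. human label uncertainty.
# File names and array layouts are illustrative assumptions, not the paper's code.
import numpy as np

pred_probs = np.load("cifar10-test-pred-probs.npy")   # (10000, 10) softmax outputs of a trained model
labels = np.load("cifar10-test-labels.npy")           # (10000,)  original CIFAR-10 labels
human_lu = np.load("cifar10h-label-uncertainty.npy")  # (10000,)  scores from Definition 4.1

# Self-confidence: the model's probability on the given label; low values suggest label issues.
self_conf = pred_probs[np.arange(len(labels)), labels]
ranked = np.argsort(self_conf)                        # most suspicious examples first

# Ground-truth "uncertain" set: human label uncertainty above 0.5.
uncertain = human_lu > 0.5

for k in (100, 500, 1000):
    flagged = ranked[:k]
    precision = uncertain[flagged].mean()
    recall = uncertain[flagged].sum() / max(uncertain.sum(), 1)
    print(f"top-{k}: precision={precision:.3f}, recall={recall:.3f}")
```

Sweeping k over a wider range and plotting the resulting precision and recall yields a curve of the same form as Figure 7(b).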