# Reliable learning in challenging environments

Maria-Florina Balcan (Carnegie Mellon University, ninamf@cs.cmu.edu), Steve Hanneke (Purdue University, steve.hanneke@gmail.com), Rattana Pukdee (Carnegie Mellon University, rpukdee@cs.cmu.edu), Dravyansh Sharma (Carnegie Mellon University, dravyans@cs.cmu.edu)

Abstract

The problem of designing learners that provide guarantees that their predictions are provably correct is of increasing importance in machine learning. However, learning-theoretic guarantees have so far only been considered in very specific settings. In this work, we consider the design and analysis of reliable learners in challenging test-time environments as encountered in modern machine learning problems: namely, adversarial test-time attacks (in several variations) and natural distribution shifts. We provide a reliable learner with provably optimal guarantees in such settings. We discuss computationally feasible implementations of the learner and further show that our algorithm achieves strong positive performance guarantees on several natural examples: for example, linear separators under log-concave distributions, or classifiers with smooth decision boundaries under smooth probability distributions.

1 Introduction

The question of providing reliability guarantees on the output of learned classifiers has been studied previously in the classical learning setting where the training and test data are independent and identically distributed (i.i.d.) draws from the same distribution [RS88, EYW10, EYW12]. Conceptually, a reliable learner outputs a prediction and may additionally output a correctness guarantee. We know that the learner is correct on all points with the guarantee as long as the learning-theoretic assumptions hold, e.g., realizability. While a trivial model that abstains from providing any guarantee is also a reliable learner, we are interested in a reliable learner that provides the guarantee on as many points as possible (useful in the sense of Rivest and Sloan [RS88]). [EYW10] provides a characterization of optimal reliable learners in this classical learning setting. However, the assumption that the training and test data are drawn from the same distribution is often violated in practice. The mismatch may take the form of a natural distribution shift, when the test distribution is different from the training distribution, or of adversarial attacks, when an adversary can perturb a test data point with the goal of changing the model's prediction. This is frequently accompanied by a significant performance drop, as well as the inability to guarantee the usefulness of the algorithm. As a result, there is significant interest in the study of test-time attacks [GSS15, CW17, MMS+18] and distribution shift [LWS18, RRSS19, MTR+21] in the applied machine learning community. Furthermore, there has recently been growing interest in the theoretical machine learning community in designing approaches with provable guarantees under test-time attacks [AKM19, MHS19, MHS22], as well as renewed interest in distribution shift [BDBCP06, MMR08, HK19]. Prior theoretical work in the literature has mainly focused on the effect of attacks or distribution shift on the average error rate (e.g., [BDBCP06, AKM19]). However,
this neglects a major relevant concern for users of machine learning algorithms, namely the ability to provide correctness guarantees for individual predictions: i.e., reliability. In this work, we advance this line of work by developing a general understanding of how to learn reliably in the presence of corruptions or changes to the test set, specifically under adversarial test-time attacks as well as distribution shift between the training (source) and test (target) data.

Our results. We consider algorithms that provide robustly-reliable predictions, which are guaranteed to be correct under standard assumptions from statistical learning theory, for both test-time attacks and distribution shift. Our first main set of results tackles the challenging case of adversarial test-time perturbations. For this setting, we introduce a novel, compelling reliability criterion on a learner that captures the challenge of reliability under test-time attacks. Given a test point z, a robustly-reliable classifier either abstains from prediction, or outputs both a prediction y and a reliability guarantee: y is guaranteed to be correct unless one of two bad events has occurred: 1) the true target function does not belong to the given hypothesis class H, or 2) the test point z has been perturbed from its original point by adversarial strength at least η (measured in the relevant metric). In the case of distribution shift, we provide a novel analysis and a complexity measure that extend the classical notion of reliable learning to the setting where the test distribution is allowed to be an arbitrary new distribution.

1.1 Summary of contributions

1. We propose robustly-reliable learners for test-time attacks, which guarantee reliable learning in the presence of test-time attacks, and characterize the region of instance space where they are simultaneously robust and reliable. Specifically, in the realizable setting, for adversarial perturbations within metric balls around the test points, we use the radius of the metric ball as a natural notion of adversarial strength. We output a reliability radius with the guarantee that our prediction on a point is correct as long as it was perturbed by a distance less than this radius (under a given metric). We further show that our proposed robustly-reliable learner achieves pointwise optimal values for this reliability radius: that is, no robustly-reliable learner can output a reliability radius larger than our learner's for any point in the instance space (Theorems B.1, B.2).

2. The pointwise optimal algorithm is easy to derive from our definition. We discuss a computationally efficient implementation of the optimal learners (Section 4).

3. We discuss variants of these algorithms and guarantees appropriate for three different variants of adversarial losses studied in the literature: depending on whether the perturbed point must have the same label as the original point, or, in lieu of this, whether the algorithm should predict the true label of the perturbed point, or the same label as the original point (Definition 1).

4. We further introduce a safely-reliable region, which captures the challenge caused by the adversary's ability to perturb a test point to cause a reduction in our reliability radius (Definition 6).
As examples, we show that the safely-reliable region can be large for linear separators under log-concave distributions and for classifiers with smooth decision boundaries under nearly-uniform distributions; as a consequence, the robustly-reliable region is large as well (Theorem 3.3).

5. We extend this characterization to abstention-based reliable predictions for arbitrary adversarial perturbation sets, where we no longer restrict ourselves to metric balls. We again obtain a tight characterization of the robustly-reliable region (Theorem C.1).

6. We also consider reliability in the distribution shift setting, where the test data points come from a different distribution. We introduce a novel refinement of the notion of the disagreement coefficient [Han07] to measure the transferability of reliability guarantees across distributions. We provide bounds on the probability mass of the reliable region under transfer for several interesting examples, including, when learning linear separators, transfer from β1 log-concave to β2 log-concave and to s-concave distributions (Theorems G.1, G.2). We additionally bound the probability of the reliable region for learning classifiers with general smooth classification boundaries, for transfer between smooth distributions (Theorem G.3).

7. We further extend our reliability results to the setting of robustness transfer, where the test data is simultaneously under adversarial perturbations as well as distribution shift (Section J).

8. Finally, we demonstrate that it is possible to extend our results to the agnostic setting (Section 7).

Conceptual advances over prior work. Prior works on certified robustness [SKL17, CRK19, WLF22] have examined pointwise consistency guarantees. The certified robustness guarantee is only that a prediction does not change under an adversarial perturbation; it does not guarantee that the prediction is correct (neither for the original point nor for the perturbation). In particular, a constant function is always certifiably robust, but it may not be useful. In contrast, our notion of a robustly-reliable learner guarantees that, for any test point x and perturbation z, if z is at distance less than η from x (η being the reliability radius), then the prediction will be correct (robust loss zero) in a sense informed by which robust loss we are addressing; we discuss this idea for several different losses, leading to different interpretations of this guarantee. In particular, for the stability loss, the prediction being correct means that it predicts the true label of the original point x; this implies certified robustness, but is even stronger, since it also guarantees the correct label. Prior work [BBHS22] introduces the notion of a robustly-reliable learner for poisoning attacks, which differs from our definition, tailored to test-time attacks, with a guarantee in terms of a reliability radius. In the distribution shift setting, we are the first to assess the transferability of reliability guarantees, which differs from the widely-studied metric of average error rate. For additional related work, we refer to Appendix A.

2 Preliminaries and problem formulation

Let X denote the instance space and Y = {0, 1} the label space. Let H be a hypothesis class. The learner L is given access to a labeled sample S = {(x_i, y_i)}_{i=1}^m drawn from a distribution D over X × Y and learns a concept h_L : X → Y. In the realizable setting, we assume we have a hypothesis (concept) class H and a target concept h* ∈ H such that the true label of any x ∈ X is given by h*(x). In particular, S = {(x_i, h*(x_i))}_{i=1}^m in this setting. Given a loss function ℓ : H × X → {0, 1}, define err_S(h, ℓ) = (1/m) Σ_{(x,y)∈S} ℓ(h, x); we write err_S(h) for the standard empirical 0-1 error (1/m) Σ_{(x,y)∈S} I[h(x) ≠ y]. We use D_X to denote the marginal distribution over X, and I[·] to denote the indicator function, which takes values in {0, 1}. We also define B_D^H(h*, r) = {h ∈ H | Pr_D[h(x) ≠ h*(x)] ≤ r} as the set of hypotheses in H that disagree with h* with probability at most r.
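To fix the notation computationally, the following is a minimal sketch of our own (not from the paper) of the empirical error err_S(h) and of the disagreement ball B_D^H(h*, r) for a finite pool of candidate hypotheses, with D_X approximated by a sample; the function names are hypothetical.

```python
import numpy as np

def empirical_error(h, S):
    """err_S(h) = (1/m) * sum of 0-1 losses of h on the labeled sample S."""
    return np.mean([h(x) != y for x, y in S])

def disagreement_ball(H, h_star, r, x_pool):
    """Approximate B_D^H(h*, r): hypotheses in the finite pool H whose
    disagreement with h* on a sample x_pool drawn from D_X is at most r."""
    return [h for h in H
            if np.mean([h(x) != h_star(x) for x in x_pool]) <= r]

# Toy usage with 1-D thresholds h_t(x) = 1[x >= t].
make_threshold = lambda t: (lambda x: int(x >= t))
h_star = make_threshold(0.0)
S = [(x, h_star(x)) for x in np.random.randn(50)]
H = [make_threshold(t) for t in np.linspace(-2, 2, 81)]
print(empirical_error(h_star, S))                                # 0.0 (realizable)
print(len(disagreement_ball(H, h_star, 0.1, np.random.randn(2000))))
```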
During test time, the learner makes a prediction on a test point z ∈ X. We consider the following settings.

1. Adversarial test-time attack. We consider adversarial attacks with a perturbation function U : X → 2^X that can perturb a test point x to an arbitrary point z from the perturbation set U(x) ⊆ X [MHS19]. We assume that the adversary has access to the learned concept h_L as well as the test point x, and can perturb this data point to any z ∈ U(x) and then provide this perturbed data point to the learner at test time. We want to provide pointwise robustness and reliability guarantees in this setting. We will assume that x ∈ U(x) for all x ∈ X. For any point z, we let U^{-1}(z) := {x ∈ X | z ∈ U(x)} denote the set of points that can be perturbed to z. We use "perturbation" to refer to a point z ∈ U(x) and to the perturbation set U(x) interchangeably.

2. Distribution shift. We consider the case when a test point z is drawn from a different distribution than the training samples. In this case, we want to provide a pointwise reliability guarantee. We discuss this further in Section 5.

2.1 Robust loss functions

In the applied and theoretical literature, various definitions of adversarial success have been explored, each depending on the interpretation of robustness: whether the perturbed point must have the same label as the original point, or, in lieu of this, whether the algorithm should predict the true label of the perturbed point, or the same label as the original point. To capture these, we formally consider the following loss functions.

Figure 1: Different perturbation sets considered in TL, ST (left), CA (mid) and IA (right). The dashed line represents the decision boundary of h* and the background colors of red and blue represent the labels 0 and 1, respectively. The ball around each point depicts the possible perturbation set U(x), and the shaded area inside each ball is the allowed perturbation. In TL, ST, we consider all perturbations in U(x), while in CA, we consider perturbations that do not change the true label of the perturbed point. Lastly, in IA, an adversary only perturbs points where the original true label is 0. In this case, if the true label of z changes, then the learner needs to match its prediction with the new label.

Definition 1 (Robust loss functions). For a hypothesis h, a test point x, and a perturbation function U, we consider the following adversarially successful events.

1. Constrained Adversary loss [SZS+14, BBSZ23]. There exists a perturbation z of x that does not change the true label of the original point x, but h(z) is incorrect:
$$\ell_{CA}(h, x) = \sup_{z \in U(x):\, h^*(z) = h^*(x)} \mathbb{I}[h(z) \neq h^*(z)].$$
For a fixed perturbation z ∈ U(x), define ℓ^{h*}_{CA}(h, x, z) = I[h(z) ≠ h*(z) ∧ h*(z) = h*(x)].

2. True Label loss [ZL19, GKKW21]. There exists a perturbation z of x such that h(z) is incorrect:
$$\ell_{TL}(h, x) = \sup_{z \in U(x)} \mathbb{I}[h(z) \neq h^*(z)].$$
For a fixed perturbation z ∈ U(x), define ℓ^{h*}_{TL}(h, x, z) = I[h(z) ≠ h*(z)].

3. Stability loss [AKM19, MHS19, MHS22]. There exists a perturbation z of x such that h(z) is different from h*(x):
$$\ell_{ST}(h, x) = \sup_{z \in U(x)} \mathbb{I}[h(z) \neq h^*(x)].$$
In this case, we focus on the consistency aspect, where we want the prediction on any perturbation z to be the same as the prediction on x, and this has to be correct w.r.t. x, i.e., equal to h*(x). For a fixed perturbation z ∈ U(x), define ℓ^{h*}_{ST}(h, x, z) = I[h(z) ≠ h*(x)].

4. Incentive-aware Adversary loss [ZC21]. We take inspiration from economics applications where the label 1 is the more favorable outcome, e.g., loan approval, for which an adversary has no incentive to make any perturbation when the original label is 1. Define the perturbation set
$$U_{IA}(x, h^*) = \begin{cases} U(x) & \text{if } h^*(x) = 0,\\ \{x\} & \text{if } h^*(x) = 1,\end{cases}$$
and define the incentive-aware adversary loss as
$$\ell_{IA}(h, x) = \sup_{z \in U_{IA}(x, h^*)} \mathbb{I}[h(z) \neq h^*(x)].$$
For a fixed perturbation z ∈ U(x), define ℓ^{h*}_{IA}(h, x, z) = I[h(z) ≠ h*(x) ∧ z ∈ U_{IA}(x, h*)].
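The four losses differ only in which perturbations are allowed and against which label the prediction is compared. Below is a minimal sketch of our own (the helper name robust_losses is hypothetical) that evaluates them over a finite sample of candidate perturbations standing in for U(x); it does not attempt the exact suprema over an infinite perturbation set.

```python
def robust_losses(h, h_star, x, U_x):
    """Evaluate the four robust losses of Definition 1 at x, where U_x is a finite
    set of candidate perturbations with x in U_x (an approximation of U(x); the
    true losses take a supremum over all of U(x))."""
    y_x = h_star(x)
    # Constrained Adversary: only label-preserving perturbations, compared to h*(z).
    l_CA = max((int(h(z) != h_star(z)) for z in U_x if h_star(z) == y_x), default=0)
    # True Label: any perturbation, prediction compared to the true label of z.
    l_TL = max(int(h(z) != h_star(z)) for z in U_x)
    # Stability: any perturbation, prediction compared to the label of the original x.
    l_ST = max(int(h(z) != y_x) for z in U_x)
    # Incentive-aware: adversary only moves points whose original label is 0.
    U_IA = U_x if y_x == 0 else [x]
    l_IA = max(int(h(z) != y_x) for z in U_IA)
    return {"CA": l_CA, "TL": l_TL, "ST": l_ST, "IA": l_IA}

# Toy usage with 1-D thresholds: target at 0.0, learned hypothesis at 0.2.
h_star = lambda x: int(x >= 0.0)
h = lambda x: int(x >= 0.2)
print(robust_losses(h, h_star, x=0.5, U_x=[0.5, 0.3, 0.1]))   # {'CA': 1, 'TL': 1, 'ST': 1, 'IA': 0}
```

On such a finite approximation, the implications among the losses noted below (e.g., ℓ_ST ⇒ ℓ_IA) can be checked directly.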
We say that h is robust to a perturbation function U at x w.r.t. a robust loss ℓ if ℓ(h, x) = 0.

Remark. ℓ_ST and ℓ_IA are robust losses that we can always evaluate in practice on the training data, since we compare h(z) with h*(x), which is known to us on the training data. For ℓ_CA and ℓ_TL, we compare h(z) with h*(z), where z may lie outside the support of the natural data distribution, so we may not have access to h*(z).

We illustrate the relationship between these losses by making a few useful observations. In the robustly-realizable case [MHS20], when the perturbation function U does not change the true label of any x in the training or test data, all the losses ℓ_CA, ℓ_TL, ℓ_ST are equivalent. This corresponds to a common assumption in the adversarial robustness literature that the perturbations are "human-imperceptible", which is usually quantified as the set of perturbations within a small metric ball around the data point. We provide an illustration of the perturbation sets considered by the various robust losses in Figure 1. By considering these perturbation sets, we have the following implications: ℓ_TL ⇒ ℓ_CA, ℓ_ST ⇒ ℓ_CA and ℓ_ST ⇒ ℓ_IA, where ℓ_1 ⇒ ℓ_2 means that robustness w.r.t. ℓ_1 implies robustness w.r.t. ℓ_2 (for label-preserving perturbations h*(z) = h*(x), so the comparisons in ℓ_CA coincide with those in ℓ_TL and ℓ_ST, and U_IA(x, h*) ⊆ U(x)).

3 Robustly-reliable learners w.r.t. metric-ball attacks

Although our robust losses are defined for general perturbation sets, we first consider the case where the perturbation sets are balls in some metric space. Such attacks are widely studied in the literature, in particular for balls with bounded Lp-norm. Moreover, the radius of the metric ball serves as a natural notion of adversarial strength that allows us to quantify the level of robustness. We will later (Theorem C.1) present results for general perturbation sets as well. Let M = (X, d) be a metric space equipped with distance metric d. We use the notation B_M(x, r) = {x' ∈ X | d(x, x') ≤ r} (resp. B°_M(x, r) = {x' ∈ X | d(x, x') < r}) to denote a closed (resp. open) ball of radius r centered at x. We will sometimes omit the underlying metric M from the subscript to reduce notational clutter. We formally define a metric-ball attack as follows.

Definition 2. Metric-ball attacks are the class of perturbation functions U_M = {u_γ : X → 2^X | u_γ(x) = B_M(x, γ)} induced by the metric M = (X, d) defined over the instance space.

At test time, given a test point z ∈ X, we would like to make a prediction at z with a reliability guarantee. We call this type of learner a robustly-reliable learner, defined formally as follows.
Definition 3 (Robustly-reliable learner w.r.t. M-ball attacks). A learner L is robustly-reliable w.r.t. M-ball attacks for hypothesis space H and robust loss function ℓ if, for any target concept h* ∈ H, given S labeled by h*, the learner outputs functions h^L_S : X → Y and r^L_S : X → [0, ∞) ∪ {−∞} such that for all x, z ∈ X, if r^L_S(z) = η > 0 and z ∈ B°_M(x, η), then ℓ^{h*}(h^L_S, x, z) = 0. Further, if r^L_S(z) = 0, then h*(z) = h^L_S(z).

Note that L outputs a prediction and a real value r (the reliability radius) for any test input. Here r = −∞ corresponds to abstention (even in the absence of perturbation), i.e., the learner is incapable of giving a reliability guarantee for that prediction, and r = η > 0 is a guarantee from the learner that if the adversary's attack lies in B°_M(x, η) then we are correct, i.e., if an adversary changes the original test point x to z, the attack will not succeed if the adversarial budget is less than η. Lastly, when r = 0, the learner guarantees that its prediction at z itself is correct.

Definition 4 (Robustly-reliable region w.r.t. M-ball attacks). For a robustly-reliable learner L w.r.t. M-ball attacks for sample S, hypothesis space H and robust loss function ℓ as defined above, the robustly-reliable region of L at reliability level η is defined as RR_L(S, η) = {x ∈ X | r^L_S(x) ≥ η} for sample S and η ≥ 0.

The robustly-reliable region contains all points with a reliability guarantee of at least η. We use RR^L_W to denote robustly-reliable regions with respect to the losses ℓ_W for W ∈ {CA, TL, ST, IA}. A natural goal is to find a robustly-reliable learner L that has the largest robustly-reliable region possible. First, we note that predictions that are known by the learner to be correct are still known to be correct even when the test points are attacked. Therefore, a test point z lies in the robustly-reliable region w.r.t. ℓ_CA, ℓ_TL as long as we can be sure that h^L_S(z) is correct. This is equivalent to z being classified perfectly, i.e., according to the true label. Therefore, the robustly-reliable region w.r.t. ℓ_CA, ℓ_TL is given by the agreement region of the version space, which is the largest region where we can be sure of what the correct label is in the absence of any adversarial attack [EYW10]. We recall the definitions of the version space [Mit82] and the agreement region [CAL94, BBL06].

Definition 5. For a set of hypotheses H' ⊆ H and any set of samples S, let DIS(H') = {x ∈ X : ∃ h1, h2 ∈ H' s.t. h1(x) ≠ h2(x)} be the disagreement region and Agree(H') = X \ DIS(H') the agreement region. Let H_0(S) = {h ∈ H | err_S(h) = 0} be the version space: the set of all hypotheses that correctly classify S. More generally, H_ν(S) = {h ∈ H | err_S(h) ≤ ν} for ν ≥ 0.

We can also characterize the robustly-reliable region with respect to the other robust losses in terms of the agreement region, in the following theorem.

Theorem 3.1. Let H be any hypothesis class. With respect to M-ball attacks and ℓ_W, for η ≥ 0, (a) there exists a robustly-reliable learner L such that RR^L_W(S, η) ⊇ A_W, and (b) for any robustly-reliable learner L, RR^L_W(S, η) ⊆ A_W. Specifically, for the robust loss ℓ_W, the optimal robustly-reliable region A_W is:
- ℓ_CA, ℓ_TL: A_CA = A_TL = {z | z ∈ Agree(H_0(S))};
- ℓ_ST: A_ST = {z | B°(z, η) ⊆ Agree(H_0(S)) and h(z) = h(x) for all x ∈ B°(z, η) and all h ∈ H_0(S)};
- ℓ_IA: A_IA = (A_ST ∩ {z | h*(z) = 1}) ∪ {z | z ∈ Agree(H_0(S)) and h*(z) = 0}.
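Before the proof sketch, the following toy instantiation (our own illustration, not an example from the paper) of the regions above for one-dimensional thresholds may help make the table concrete.

```latex
% Toy instantiation of Theorem 3.1 (illustrative; not from the paper).
\[
  X = \mathbb{R}, \qquad
  H = \{\, h_t : x \mapsto \mathbb{I}[x \ge t] \,\}, \qquad
  S \ \text{realizable, with largest negative example } a
  \ \text{and smallest positive example } b,\ a < b .
\]
\[
  H_0(S) = \{\, h_t : a < t \le b \,\}, \qquad
  \mathrm{Agree}(H_0(S)) = (-\infty, a] \cup [b, \infty),
\]
with agreed label $0$ on the left piece and $1$ on the right piece. For the attack ball
$B^{o}(z,\eta) = (z-\eta,\, z+\eta)$, this gives
\[
  A_{\mathrm{CA}} = A_{\mathrm{TL}} = (-\infty, a] \cup [b, \infty), \quad
  A_{\mathrm{ST}} = (-\infty, a-\eta] \cup [b+\eta, \infty), \quad
  A_{\mathrm{IA}} = (-\infty, a] \cup [b+\eta, \infty),
\]
% Label consistency holds automatically for A_ST because each piece of the agreement region
% carries a single agreed label, and only the agreed-label-1 piece needs a buffer for A_IA.
```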
Proof. (Sketch) We provide the construction of an optimal robustly-reliable learner L_opt such that RR^{L_opt}_W(S, η) ⊇ A_W, and later show that for any robustly-reliable learner L, we must also have RR^L_W(S, η) ⊆ A_W.

We start with ℓ_CA, ℓ_TL: consider a learner L_opt that predicts using an ERM classifier and outputs reliability radius ∞ for all points in the agreement region of H_0(S). Any prediction in Agree(H_0(S)) is reliable because it agrees with h* (h* ∈ H_0(S) by realizability). On the other hand, for z ∈ DIS(H_0(S)), there exist h1, h2 ∈ H_0(S) that disagree on z. For any learner L, it is not possible to guarantee that h_L(z) is correct, as we may have h* = h1 or h* = h2.

Now, for ℓ_ST, consider a learner L_opt that classifies using an ERM, but the reliability radius is now the largest η > 0 for which B°(z, η) ⊆ Agree(H_0(S)) and h(x) = h(z) for all x ∈ B°(z, η) and all h ∈ H_0(S); else the radius is 0 if z ∈ Agree(H_0(S)), and −∞ otherwise. The first condition guarantees that h(x) = h*(x) for all x ∈ B°(z, η). Combined with the second condition, we have h(z) = h(x) = h*(x) for all x ∈ B°(z, η). Thus, L_opt is a robustly-reliable learner. On the other hand, for a robustly-reliable learner L, consider z ∈ RR^L_ST(S, η) for η > 0. We must have h_L(z) = h*(x) for all x ∈ B°(z, η). Using a similar argument to the case of ℓ_CA, ℓ_TL, we have z ∈ Agree(H_0(S)). If there exists x ∈ B°(z, η) with x ∉ Agree(H_0(S)), then there exist h1, h2 ∈ H_0(S) with h_L(z) ≠ h1(x) or h_L(z) ≠ h2(x). It is not possible to guarantee that h_L(z) = h*(x), as we may have h* = h1 or h* = h2. Therefore, we must have B°(z, η) ⊆ Agree(H_0(S)). Finally, we cannot have x ∈ B°(z, η) with h(z) = h*(z) ≠ h*(x), since this would contradict h(z) = h*(x). Therefore, we must have h*(x) = h*(z). Since x ∈ Agree(H_0(S)), this implies that h(x) = h(z) for all h ∈ H_0(S). For ℓ_IA, the construction is similar to ℓ_ST. For the full proof, we refer to Appendix B.

For ℓ_ST, the learner is able to certify a subset of the agreement region which satisfies two additional conditions: h^L_S must be correct on all possible points x that could be perturbed to the observed test point z, and the true label of z should match the true label of x. We refer to the second condition, h(z) = h(x) for all x ∈ B°(z, η) and all h ∈ H_0(S), as the label consistency condition. For ℓ_IA, since robustness w.r.t. ℓ_ST implies robustness w.r.t. ℓ_IA, we have A_ST ⊆ A_IA. In addition, with an incentive-aware adversary, whenever h*(z) = 0 we must have h*(x) = 0, since the adversary does not perturb a data point with original label 1. Therefore, we can additionally provide a guarantee on any z that lies in the agreement region and has h*(z) = 0. We remark that we can identify the value h*(z) for any point z ∈ Agree(H_0(S)) due to the realizability assumption.

We have provided a robustly-reliable learner with the largest possible robustly-reliable region for the losses ℓ_CA, ℓ_TL, ℓ_ST, ℓ_IA. However, we note that the probability mass of the robustly-reliable region may not be a meaningful way to quantify the overall reliability of a learner, because a perturbation z may lie outside of the support of the natural data distribution and have zero probability mass. It is more useful to measure the mass of points x such that any perturbation z of x still lies within the robustly-reliable region. We formally define this region as the safely-reliable region.
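On a finite instance space with a finite hypothesis pool, the learner L_opt from the proof can be written down directly. The following is a minimal numpy sketch of our own (with hypothetical names; the paper's setting allows infinite X and H, where the same quantities require the class-specific computations of Section 4).

```python
import numpy as np

def optimal_robustly_reliable(H, S, X, metric, radii):
    """L_opt on a finite grid X for a finite hypothesis pool H (each h maps points
    to {0,1}).  Returns, for every z in X, a prediction and reliability radii:
    +inf / -inf encode "reliable at any radius" / "abstain".  The CA/TL radius is
    +inf on the agreement region of the version space; the ST radius is the largest
    eta in `radii` such that the open ball around z lies in the agreement region and
    every consistent hypothesis gives every point of the ball the same label as z."""
    V = [h for h in H if all(h(x) == y for x, y in S)]          # version space H_0(S)
    assert V, "sample not realizable by the pool"
    agree = {x for x in X if len({h(x) for h in V}) == 1}       # agreement region
    erm = V[0]                                                  # any consistent ERM
    out = {}
    for z in X:
        pred = erm(z)
        r_catl = np.inf if z in agree else -np.inf
        r_st = -np.inf
        if z in agree:
            r_st = 0.0
            for eta in sorted(radii):
                ball = [x for x in X if metric(x, z) < eta]
                ok = all(x in agree and all(h(x) == h(z) for h in V) for x in ball)
                if ok:
                    r_st = eta
                else:
                    break
        out[z] = (pred, r_catl, r_st)
    return out

# Toy usage: thresholds on a 1-D grid.
grid = [round(v, 2) for v in np.arange(-3, 3, 0.05)]
H = [(lambda t: (lambda x: int(x >= t)))(t) for t in grid]
h_star = lambda x: int(x >= 0.3)
S = [(x, h_star(x)) for x in [-2.0, -1.1, -0.4, 0.5, 1.2, 2.3]]
res = optimal_robustly_reliable(H, S, grid, lambda a, b: abs(a - b),
                                radii=np.arange(0.05, 1.0, 0.05))
print(res[-2.0], res[0.0], res[2.0])
```

Points deep inside the agreement region come back with large ST radii, points near the disagreement region get radius 0, and points inside it abstain, matching the proof's construction.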
Definition 6 (Safely-reliable region w.r.t. M-ball attacks). Let L be a robustly-reliable learner w.r.t. M-ball attacks for sample S, hypothesis class H and robust loss function ℓ. The safely-reliable region of learner L at reliability levels η_1, η_2 is defined as
1. SR^L_CA(S, η_1, η_2) = {x ∈ X | B_M(x, η_1) ∩ {z | h*(z) = h*(x)} ⊆ RR^L_CA(S, η_2)},
2. SR^L_W(S, η_1, η_2) = {x ∈ X | B_M(x, η_1) ⊆ RR^L_W(S, η_2)} for W ∈ {TL, ST},
3. SR^L_IA(S, η_1, η_2) = {x ∈ X | h*(x) = 0 and B_M(x, η_1) ⊆ RR^L_IA(S, η_2)} ∪ {x ∈ X | h*(x) = 1 and x ∈ RR^L_IA(S, η_2)}.

The safely-reliable region contains any point that retains a reliability radius of at least η_2 even after being attacked by an adversary with strength η_1. In the safely-reliable region, we consider the set of potential natural (pre-attack) points x, while in the robustly-reliable region, we consider the set of potential test points z. In the following subsections, we show that in interesting cases commonly studied in the literature, the probability mass of the safely-reliable region is actually quite large.

Figure 2: The robustly-reliable region for ℓ_CA, ℓ_TL (left), ℓ_ST (mid) and ℓ_IA (right) for linear separators with an L2-ball perturbation. The background colors of blue and red represent the agreement regions of class 1 and class 0, respectively. In this case, we can remove the label consistency condition and reduce the robustly-reliable region to the η-buffered agreement region.

3.1 Safely-reliable region for linear separators under log-concave distributions is large

We bound the probability mass of the safely-reliable regions with respect to the different losses for linear separators when the data distribution is isotropic (zero mean and identity covariance matrix) log-concave (the logarithm of the density function is concave), under a bounded L2-norm ball attack. For the full proof, we refer to Appendix D. We will rely on the following key lemma, which states that the agreement region of the class of linear separators cannot contain points that are arbitrarily close to the decision boundary of h*, for any sample S.

Lemma 3.2. Let D be a distribution over R^d and H = {h : x ↦ sign(⟨w_h, x⟩) | w_h ∈ R^d, ‖w_h‖_2 = 1} be the class of linear separators. For h* ∈ H and a set of samples S ∼ D^m such that no data point in S lies on the decision boundary, for any 0 < c_1 < c_2, there exists δ(S, c_1, c_2) > 0 such that for any x with c_1 ≤ ‖x‖_2 ≤ c_2 and |⟨w_{h*}, x⟩| < δ, we have x ∉ Agree(H_0(S)).

A direct implication of the lemma is that any L2-ball B(x, η) that lies in the agreement region must not contain the decision boundary of h*, and must therefore contain only points with the same label. This allows us to remove the label consistency condition and instead focus on whether the ball B(x, η) lies in the agreement region. Intuitively, the reliable region is now given by the η-buffered agreement region, where we only keep points at distance at least η from the boundary of the agreement region (Figure 2). We provide bounds on the probability mass of the safely-reliable region below; we refer to Appendix D for the full proof.

Theorem 3.3. Let D be isotropic log-concave over R^d and H = {h : x ↦ sign(⟨w_h, x⟩) | w_h ∈ R^d, ‖w_h‖_2 = 1} be the class of linear separators. Let B(·, γ) be an L2-ball perturbation of radius γ. For S ∼ D^m with m = O((1/ε^2)(VCdim(H) + ln(1/δ))), for an optimal robustly-reliable learner L,
(a) Pr(SR^L_TL(S, η_1, η_2)) ≥ 1 − 2η_1 − O(√d ε) with probability at least 1 − δ,
(b) SR^L_CA(S, η_1, η_2) = SR^L_TL(S, η_1, η_2) almost surely,
(c) Pr(SR^L_ST(S, η_1, η_2)) ≥ 1 − 2(η_1 + η_2) − O(√d ε) with probability at least 1 − δ,
(d) Pr(SR^L_IA(S, η_1, η_2)) ≥ 1 − (η_1 + η_2) − O(√d ε) with probability at least 1 − δ.
The O-notation suppresses dependence on logarithmic factors and distribution-specific constants.

We remark that we cannot always remove the label consistency condition for a general perturbation set. For example, consider U(x) = B(x − a, γ) ∪ {x} ∪ B(x + a, γ), which is made up of two L2 balls centered at x − a and x + a; with appropriate values of a and γ, each ball may lie on a different side of the agreement region, so that the whole perturbation set lies in the agreement region but contains points with different labels (Figure 3).

Figure 3: The perturbation set is represented by two dashed balls. It lies inside the agreement region but contains points with different labels.

We also provide bounds on the probability mass of the safely-reliable region for more general concept spaces beyond linear separators, specifically classifiers with smooth boundaries, in Appendix E.
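As a sanity check on the "buffered agreement region" picture, one can estimate the mass of points that keep a reliability radius of η_2 after an attack of strength η_1 by Monte Carlo. The following is a toy 1-D illustration of our own with thresholds under a standard Gaussian (a log-concave distribution), not the paper's linear-separator setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Thresholds on the line, target t* = 0, data from a standard Gaussian (log-concave).
m, eta1, eta2 = 200, 0.1, 0.1
xs = rng.standard_normal(m)
ys = (xs >= 0.0).astype(int)
a = xs[ys == 0].max()          # largest negative example
b = xs[ys == 1].min()          # smallest positive example

# For ST, x is safely reliable at (eta1, eta2) iff every z within eta1 of x has
# reliability radius >= eta2, i.e. iff [x - eta1 - eta2, x + eta1 + eta2] stays
# inside the agreement region (-inf, a] U [b, inf).  For TL only the eta1 buffer is needed.
test = rng.standard_normal(200_000)
buf = eta1 + eta2
safe_ST = np.mean((test + buf <= a) | (test - buf >= b))
safe_TL = np.mean((test + eta1 <= a) | (test - eta1 >= b))
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"Pr[safely reliable, TL] ~ {safe_TL:.3f}   Pr[safely reliable, ST] ~ {safe_ST:.3f}")
```

Both estimates are close to 1 minus a term linear in the reliability levels plus a term that shrinks with the sample size, mirroring the shape of the bounds in Theorem 3.3.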
4 On computational efficiency

Given the definition, it is possible to implement computationally efficient robustly-reliable learners; consider, for example, the class of linear separators under bounded L2-norm attacks. The optimal robustly-reliable learner L_opt described above may be implemented via a linearly constrained quadratic program that computes the (squared) distance of the test point z to the closest point z' in the disagreement region. This gives us the reliability radius, since for linear separators one must cross the decision boundary to perturb a point to a differently labeled point:
$$\min_{w, w', z'} \ \|z - z'\|^2 \quad \text{s.t.} \quad \langle w, x_i \rangle y_i \ge 0 \ \ \forall (x_i, y_i) \in S, \qquad \langle w', x_i \rangle y_i \ge 0 \ \ \forall (x_i, y_i) \in S, \qquad \langle w, z' \rangle \langle w', z' \rangle \le 0,$$
with labels encoded as ±1. Given the training sample S, for any given test point z, the learner L can efficiently compute the optimal value s of the above program and output √s as the reliability radius. We show that a variant of this objective also provides a reliability radius for a wide range of hypothesis classes under L2-ball attacks (see Lemma F.1). In addition, we can relax this objective into a regularized objective that gives a lower bound ‖z − z*‖_2 on the reliability radius, where z* is the solution of
$$(h_1, h_2, z^*) = \operatorname*{argmin}_{h, h', z'} \ \|z - z'\|^2 + \lambda\Big( \hat{R}\big(h, S \cup \{(z', 0)\}\big) + \hat{R}\big(h', S \cup \{(z', 1)\}\big) \Big),$$
where R̂(h, S) is the empirical risk of h on S. We provide a more detailed discussion in Appendix F.
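Complementing the program above, the agreement-region test that drives the ℓ_CA/ℓ_TL learner is itself cheap for linear separators: z lies in the disagreement region of H_0(S) exactly when two linear feasibility problems are both feasible (a consistent separator labeling z as 1, and one labeling it as 0). The following is a minimal sketch of our own using scipy, for homogeneous separators with labels recoded to ±1; strict consistency is enforced through a unit margin, which is a normalization choice of this sketch rather than of the paper.

```python
import numpy as np
from scipy.optimize import linprog

def _consistent_separator_exists(X, y_pm, extra_point, extra_sign):
    """Feasibility LP: is there w with y_i <w, x_i> >= 1 for all training points and
    extra_sign * <w, extra_point> >= 1?  (Homogeneous separators; scale is irrelevant,
    so the unit margin only rules out boundary ties.)"""
    d = X.shape[1]
    rows = np.vstack([-(y_pm[:, None] * X), -extra_sign * extra_point[None, :]])
    b = -np.ones(rows.shape[0])
    res = linprog(c=np.zeros(d), A_ub=rows, b_ub=b, bounds=[(None, None)] * d)
    return res.success

def ca_tl_reliable_prediction(X, y01, z):
    """CA/TL robustly-reliable prediction at z: if z is in the agreement region of the
    version space, return (label, +inf); otherwise abstain with radius -inf."""
    y_pm = 2 * np.asarray(y01) - 1
    can_be_pos = _consistent_separator_exists(X, y_pm, z, +1)
    can_be_neg = _consistent_separator_exists(X, y_pm, z, -1)
    if can_be_pos and can_be_neg:          # z in DIS(H_0(S)): no guarantee possible
        return None, -np.inf
    return (1 if can_be_pos else 0), np.inf

# Toy usage in R^2.
rng = np.random.default_rng(1)
w_star = np.array([1.0, 0.5])
Xtr = rng.standard_normal((60, 2))
ytr = (Xtr @ w_star >= 0).astype(int)
print(ca_tl_reliable_prediction(Xtr, ytr, np.array([2.0, 1.0])))     # confident, far from boundary
print(ca_tl_reliable_prediction(Xtr, ytr, np.array([0.01, -0.02])))  # abstains near the boundary
```

Note this only certifies membership of individual points in the agreement region (the ℓ_CA/ℓ_TL case); computing the ℓ_ST radius is what the quadratic program above is for.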
5 Robustly-reliable learning under distribution shift

We now consider the reliability aspect of distribution shift, a different kind of test-time robustness challenge where the test data comes from a different distribution than the training data. Formally, let P be the training distribution and Q the test distribution. We assume the realizable distribution shift setting, i.e., there is a target concept h* ∈ H such that the true label of any x ∈ X is given by h*(x) at both training time and test time, or err_P(h*) = err_Q(h*) = 0. As observed earlier, points that are known by the learner to be correct (reliable) are still known to be correct even when they are drawn from a different distribution. This reliability guarantee holds even when the distributions P and Q do not share a common support, a setting for which many prior theoretical works yield vacuous bounds. For example, suppose X = R^n, P and Q are supported on disjoint n-balls, and H is the class of linear separators. Then the total variation distance, the H-divergence [KBDG04, BDBC+10], as well as the discrepancy distance [MMR08] between P and Q are all 1. While the recent work of [HK19] does apply in this setting, they do not focus on the reliability guarantee.

In this work, we are interested in quantifying the transferability of the reliability guarantee between distributions P and Q. We recall the notion of reliable prediction [EYW10].

Definition 7 (Reliability). A learner L is reliable w.r.t. concept space H if, for any target concept h* ∈ H, given any sample S labeled by h*, the learner outputs functions h^L_S : X → Y and a^L_S : X → {0, 1} such that for all x ∈ X, if a^L_S(x) = 1 then h^L_S(x) = h*(x). Else, if a^L_S(x) = 0, the learner abstains from prediction. The reliable region of L is R_L(S) = {x ∈ X | a^L_S(x) = 1}.

We define the following metric to measure the reliability of a learner under distribution shift.

Definition 8 (P→Q reliable correctness). The P→Q-reliable correctness of L (at sample rate m, for distribution shift from P to Q) is defined as the expected probability mass of its reliable region under Q when trained on a random training sample S ∼ P^m, i.e., Pr_{x∼Q, S∼P^m}[x ∈ R_L(S)].

The disagreement coefficient was originally introduced to study the label complexity of agnostic active learning [Han07] and is also known to characterize reliable learning in the absence of any distribution shift [EYW10]. We propose the following refinement of the notion of the disagreement coefficient, which we will use to give bounds on a learner's P→Q reliable correctness.

Definition 9 (P→Q disagreement coefficient). For a hypothesis class H, the P→Q disagreement coefficient of h* ∈ H with respect to H is given by
$$\theta_{P \to Q}(\varepsilon) = \sup_{r > \varepsilon} \frac{\Pr_Q[\mathrm{DIS}(B_P(h^*, r))]}{r}, \qquad \text{where } B_P(h^*, r) = \{h \in H \mid \Pr_P[h(x) \neq h^*(x)] \le r\}.$$

This quantifies the rate of disagreement over Q among classifiers which are within disagreement-balls w.r.t. h* under P, relative to the version space radius. The proposed metric is asymmetric between P and Q, and also depends on the target concept h*. Some simple examples are given in Appendix H. We show that the P→Q-reliable correctness of our learner may be bounded in terms of the P→Q disagreement coefficient using a uniform-convergence-based argument. The proof details are in Appendix I.

Theorem 5.1. Let Q be a realizable distribution shift of P with respect to H, and let h* ∈ H be the target concept. Given a sufficiently large sample size m ≥ (c/ε^2)(d + ln(1/δ)), the P→Q-reliable correctness of L, the optimal robustly-reliable learner, is at least
$$\Pr_{x \sim Q,\, S \sim P^m}[x \in R_L(S)] \ge 1 - \theta_{P \to Q}(\varepsilon)\, \varepsilon - \delta.$$
Here c is an absolute constant, and d is the VC dimension of H.

In Appendix J, we show that this P→Q disagreement coefficient can be small for several examples, which implies that it is possible to transfer the reliability guarantee from one distribution to the other. In particular, when learning linear separators, we provide bounds for transferring from β1 log-concave to β2 log-concave and to s-concave distributions (Theorems G.1, G.2). In addition, when learning classifiers with general smooth classification boundaries, we provide bounds for transferring between smooth distributions (Theorem G.3).
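For intuition (a toy estimate of our own, not one of the paper's examples), the coefficient can be approximated by Monte Carlo for one-dimensional thresholds, where B_P(h*, r) has a simple form: a threshold t is within P-disagreement r of t* exactly when the P-mass between t and t* is at most r, so, up to boundary effects, DIS(B_P(h*, r)) is the set of x whose P-mass to t* is at most r.

```python
import numpy as np
from scipy import stats

def theta_P_to_Q(eps, P_cdf, Q_sampler, t_star=0.0, n=200_000, r_grid=None):
    """Monte Carlo estimate of the P->Q disagreement coefficient for 1-D thresholds
    h_t(x) = 1[x >= t] with target h_{t*}.  For this class, x lies in
    DIS(B_P(h*, r)) (up to boundary effects) iff the P-mass between x and t* is <= r."""
    if r_grid is None:
        r_grid = np.geomspace(eps, 0.5, 60)
    xq = Q_sampler(n)
    mass_to_tstar = np.abs(P_cdf(xq) - P_cdf(t_star))   # P-mass between x and t*
    return max(np.mean(mass_to_tstar <= r) / r for r in r_grid)

# Example: P standard Gaussian, Q a mean-shifted Gaussian.
rng = np.random.default_rng(0)
P_cdf = stats.norm(loc=0.0, scale=1.0).cdf
for mu in [0.0, 0.5, 2.0]:
    Q = lambda n, mu=mu: rng.normal(loc=mu, scale=1.0, size=n)
    print(mu, round(theta_P_to_Q(eps=0.01, P_cdf=P_cdf, Q_sampler=Q), 2))
```

With mu = 0 (no shift), the estimate recovers the classical disagreement coefficient of thresholds; as the shift grows, the quantity tracks how much Q-mass sits inside the region where near-consistent hypotheses still disagree.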
6 Safely-reliable correctness under distribution shift

There is growing practical [SSZ+20, SIE+20] as well as recent theoretical interest [DGH+23] in the setting of robustness transfer, where one simultaneously expects adversarial test-time attacks as well as distribution shift. We study the reliability aspect of this more challenging setting. We note that the definition of a robustly-reliable learner does not depend on the data distribution (see Definition 3), as the guarantee is pointwise. Our optimality result in Section 3 applies even when a test point is drawn from a different distribution Q. In this case, however, the safely-reliable region would have a different probability mass.

Definition 10 (P→Q safely-reliable correctness). The P→Q safely-reliable correctness of L (at sample rate m, for distribution shift from P to Q, w.r.t. robust loss ℓ) is defined as the probability mass of its safely-reliable region under Q, on a sample S ∼ P^m, i.e., Pr_{x∼Q, S∼P^m}[x ∈ SR^L_ℓ(S, η_1, η_2)].

We consider an example where the training distribution P is isotropic log-concave and the test distribution Q_µ is log-concave with its mean shifted by µ, while the covariance matrix remains the identity (see Figure 4, right).

Figure 4: The disagreement region and the agreement region under a distribution shift where P and Q are isotropic (left) and where there is a mean shift (right).

We provide the bound on the P→Q safely-reliable correctness for this example in Appendix J (see Theorem J.2).

7 Reliability in the agnostic setting

Above, we have assumed that the training samples S are realizable under our concept class H, i.e., there is a target concept h* consistent with our (uncorrupted) data. In the agnostic setting, we can have min_{h∈H} err_S(h) > 0, meaning no single concept is always correct. We define a ν-tolerably robustly-reliable learner under test-time attacks in the agnostic setting as a learner whose reliable predictions agree with every low-error hypothesis (error at most ν) on the training sample ([BBHS22] have proposed the corresponding definition for data poisoning attacks).

Definition 11 (ν-tolerably robustly-reliable learner w.r.t. M-ball attacks). A learner L is ν-tolerably robustly-reliable w.r.t. M-ball attacks for sample S, hypothesis space H and robust loss function ℓ if, for every concept h* ∈ H with err_S(h*) ≤ ν, the learner outputs functions h^L_S : X → Y and r^L_S : X → [0, ∞) ∪ {−∞} such that for all x, z ∈ X, if r^L_S(z) = η > 0 and z ∈ B°_M(x, η), then ℓ^{h*}(h^L_S, x, z) = 0. Further, if r^L_S(z) = 0, then h*(z) = h^L_S(z). Given a sample S such that some concept h* ∈ H satisfies err_S(h*) ≤ ν, the robustly-reliable region of L is defined as RR_L(S, ν, η) = {x ∈ X | r^L_S(x) ≥ η} for ν, η ≥ 0.

We generalize our results from Section 3 to the agnostic setting (proof details in Appendix K). Here H_ν(S) = {h ∈ H | err_S(h) ≤ ν}.

Theorem 7.1. Let H be any hypothesis class. With respect to M-ball attacks and ℓ_CA, for η ≥ 0, (a) there exists a ν-tolerably robustly-reliable learner L such that RR^L_CA(S, ν, η) ⊇ Agree(H_ν(S)), and (b) for any ν-tolerably robustly-reliable learner L, RR^L_CA(S, ν, η) ⊆ Agree(H_ν(S)).

8 Discussion

In this work, we generalize the classical line of work on reliable learning to address challenging test-time environments. We propose a novel robust-reliability criterion that is applicable to several variations of test-time attacks. Our analysis leads to an easy-to-derive algorithm that can be implemented efficiently in many cases. Additionally, we introduce a P→Q disagreement coefficient to capture the transferability of the reliability guarantee between distributions. The proposed robust-reliability criterion and the P→Q disagreement coefficient together provide a comprehensive framework for handling test-time attacks and evaluating the reliability of learning models. This contributes to the advancement of reliable learning methodologies in the face of challenging real-world scenarios, facilitating the development of more resilient and trustworthy machine learning systems. Notably, key questions remain open, including how to efficiently implement the algorithm for a class of neural networks, and how to learn reliably with respect to any general robust loss function.
9 Acknowledgements

This work was supported in part by NSF grants CCF-1910321 and SES-1919453, the Defense Advanced Research Projects Agency under cooperative agreement HR00112020003, a Bloomberg Data Science PhD fellowship, and a Simons Investigator Award.

References

[AB99] Martin Anthony and Peter L Bartlett. Neural network learning: Theoretical foundations. Cambridge University Press, 1999.
[ABGLP19] Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization. arXiv preprint arXiv:1907.02893, 2019.
[ABHU15] Pranjal Awasthi, Maria-Florina Balcan, Nika Haghtalab, and Ruth Urner. Efficient learning of linear separators under bounded noise. In Conference on Learning Theory, pages 167-190. PMLR, 2015.
[ABL14] Pranjal Awasthi, Maria Florina Balcan, and Philip M Long. The power of localization for efficiently learning linear separators with noise. In Proceedings of the forty-sixth annual ACM Symposium on Theory of Computing, pages 449-458, 2014.
[AK91] David Applegate and Ravi Kannan. Sampling and integration of near log-concave functions. In Proceedings of the twenty-third annual ACM Symposium on Theory of Computing, pages 156-163, 1991.
[AKM19] Idan Attias, Aryeh Kontorovich, and Yishay Mansour. Improved generalization bounds for robust learning. In Algorithmic Learning Theory, pages 162-183. PMLR, 2019.
[BBHS22] Maria-Florina Balcan, Avrim Blum, Steve Hanneke, and Dravyansh Sharma. Robustly-reliable learners under poisoning attacks. In Conference on Learning Theory. PMLR, 2022.
[BBL06] Maria-Florina Balcan, Alina Beygelzimer, and John Langford. Agnostic active learning. In Proceedings of the 23rd International Conference on Machine Learning, pages 65-72, 2006.
[BBS07] Steffen Bickel, Michael Brückner, and Tobias Scheffer. Discriminative learning for differing training and test distributions. In Proceedings of the 24th International Conference on Machine Learning, pages 81-88, 2007.
[BBSZ23] Maria-Florina Balcan, Avrim Blum, Dravyansh Sharma, and Hongyang Zhang. On the power of abstention and data-driven decision making for adversarial robustness. Journal of Machine Learning Research, 2023.
[BCKW15] Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight uncertainty in neural network. In International Conference on Machine Learning, pages 1613-1622. PMLR, 2015.
[BDBC+10] Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. A theory of learning from different domains. Machine Learning, 79:151-175, 2010.
[BDBCP06] Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. Advances in Neural Information Processing Systems, 19, 2006.
[BH21] Maria-Florina Balcan and Nika Haghtalab. Noise in classification. Beyond the Worst-Case Analysis of Algorithms, page 361, 2021.
[BL13] Maria-Florina Balcan and Phil Long. Active and passive learning of linear separators under log-concave distributions. In Conference on Learning Theory, pages 288-316. PMLR, 2013.
[BPRZ23] Maria-Florina Balcan, Rattana Pukdee, Pradeep Ravikumar, and Hongyang Zhang. Nash equilibria and pitfalls of adversarial training in adversarial robustness games. In The 26th International Conference on Artificial Intelligence and Statistics. PMLR, 2023.
[BZ17] Maria-Florina Balcan and Hongyang Zhang. Sample and computationally efficient learning algorithms under s-concave distributions. Advances in Neural Information Processing Systems, 30, 2017.
[CAL94] David Cohn, Les Atlas, and Richard Ladner. Improving generalization with active learning. Machine Learning, 15:201-221, 1994.
[CDG+18] Corinna Cortes, Giulia De Salvo, Claudio Gentile, Mehryar Mohri, and Scott Yang. Online learning with abstention. In International Conference on Machine Learning, pages 1059-1067. PMLR, 2018.
[CDM16] Corinna Cortes, Giulia De Salvo, and Mehryar Mohri. Boosting with abstention. Advances in Neural Information Processing Systems, 29, 2016.
[CRK19] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pages 1310-1320. PMLR, 2019.
[CRS+19] Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, John C Duchi, and Percy S Liang. Unlabeled data improves adversarial robustness. Advances in Neural Information Processing Systems, 32, 2019.
[CW17] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy, pages 39-57. IEEE, 2017.
[CWG+19] Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, and Tengyu Ma. Learning imbalanced datasets with label-distribution-aware margin loss. Advances in Neural Information Processing Systems, 32, 2019.
[Dan16] Amit Daniely. Complexity theoretic limitations on learning halfspaces. In Proceedings of the forty-eighth annual ACM Symposium on Theory of Computing, pages 105-117, 2016.
[DGH+23] Yuyang Deng, Nidham Gazagnadou, Junyuan Hong, Mehrdad Mahdavi, and Lingjuan Lyu. On the hardness of robustness transfer: A perspective from Rademacher complexity over symmetric difference hypothesis space. arXiv preprint arXiv:2302.12351, 2023.
[DGT19] Ilias Diakonikolas, Themis Gouleakis, and Christos Tzamos. Distribution-independent PAC learning of halfspaces with Massart noise. Advances in Neural Information Processing Systems, 32, 2019.
[DN21] John C Duchi and Hongseok Namkoong. Learning models with uniform performance via distributionally robust optimization. The Annals of Statistics, 49(3):1378-1406, 2021.
[EYW10] Ran El-Yaniv and Yair Wiener. On the foundations of noise-free selective classification. Journal of Machine Learning Research, 11(5), 2010.
[EYW12] Ran El-Yaniv and Yair Wiener. Active learning via perfect selective classification. Journal of Machine Learning Research, 13(2), 2012.
[GG16] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050-1059. PMLR, 2016.
[GKKM20] Shafi Goldwasser, Adam Tauman Kalai, Yael Kalai, and Omar Montasser. Beyond perturbations: Learning guarantees with arbitrary adversarial test examples. Advances in Neural Information Processing Systems, 33:15859-15870, 2020.
[GKKW21] Pascale Gourdeau, Varun Kanade, Marta Kwiatkowska, and James Worrell. On the hardness of robust classification. The Journal of Machine Learning Research, 22(1):12521-12549, 2021.
[GPSW17] Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. On calibration of modern neural networks. In International Conference on Machine Learning, pages 1321-1330. PMLR, 2017.
[GSS15] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. International Conference on Learning Representations, 2015.
[Han07] Steve Hanneke. A bound on the label complexity of agnostic active learning. In Proceedings of the 24th International Conference on Machine Learning, pages 353-360, 2007.
[HGB+06] Jiayuan Huang, Arthur Gretton, Karsten Borgwardt, Bernhard Schölkopf, and Alex Smola. Correcting sample selection bias by unlabeled data. Advances in Neural Information Processing Systems, 19, 2006.
[HK19] Steve Hanneke and Samory Kpotufe. On the value of target data in transfer learning. Advances in Neural Information Processing Systems, 32, 2019.
[HKLM20] Max Hopkins, Daniel Kane, Shachar Lovett, and Gaurav Mahajan. Noise-tolerant, reliable active classification with comparison queries. In Conference on Learning Theory, pages 1957-2006. PMLR, 2020.
[JSH20] Adel Javanmard, Mahdi Soltanolkotabi, and Hamed Hassani. Precise tradeoffs in adversarial training for linear regression. In Conference on Learning Theory, pages 2034-2078. PMLR, 2020.
[KBDG04] Daniel Kifer, Shai Ben-David, and Johannes Gehrke. Detecting change in data streams. In VLDB, volume 4, pages 180-191. Toronto, Canada, 2004.
[KKMS08] Adam Tauman Kalai, Adam R Klivans, Yishay Mansour, and Rocco A Servedio. Agnostically learning halfspaces. SIAM Journal on Computing, 37(6):1777-1805, 2008.
[KV94] Michael J Kearns and Umesh Vazirani. An introduction to computational learning theory. MIT Press, 1994.
[LAG+19] Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy, pages 656-672. IEEE, 2019.
[LCWC19] Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Certified adversarial robustness with additive noise. Advances in Neural Information Processing Systems, 32, 2019.
[LPB17] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in Neural Information Processing Systems, 30, 2017.
[LV07] László Lovász and Santosh Vempala. The geometry of logconcave functions and sampling algorithms. Random Structures & Algorithms, 30(3):307-358, 2007.
[LWS18] Zachary Lipton, Yu-Xiang Wang, and Alexander Smola. Detecting and correcting for label shift with black box predictors. In International Conference on Machine Learning, pages 3122-3130. PMLR, 2018.
[MHS19] Omar Montasser, Steve Hanneke, and Nathan Srebro. VC classes are adversarially robustly learnable, but only improperly. In Conference on Learning Theory, pages 2512-2530. PMLR, 2019.
[MHS20] Omar Montasser, Steve Hanneke, and Nati Srebro. Reducing adversarially robust learning to non-robust PAC learning. Advances in Neural Information Processing Systems, 33:14626-14637, 2020.
[MHS21] Omar Montasser, Steve Hanneke, and Nathan Srebro. Adversarially robust learning with unknown perturbation sets. In Conference on Learning Theory, pages 3452-3482. PMLR, 2021.
[MHS22] Omar Montasser, Steve Hanneke, and Nathan Srebro. Adversarially robust learning: A generic minimax optimal learner and characterization. Advances in Neural Information Processing Systems, 2022.
[MIG+19] Wesley J Maddox, Pavel Izmailov, Timur Garipov, Dmitry P Vetrov, and Andrew Gordon Wilson. A simple baseline for Bayesian uncertainty in deep learning. Advances in Neural Information Processing Systems, 32, 2019.
[Mit82] Tom M Mitchell. Generalization as search. Artificial Intelligence, 18(2):203-226, 1982.
[MMR08] Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Domain adaptation with multiple sources. Advances in Neural Information Processing Systems, 21, 2008.
[MMS+18] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu.
Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
[MTR+21] John P Miller, Rohan Taori, Aditi Raghunathan, Shiori Sagawa, Pang Wei Koh, Vaishaal Shankar, Percy Liang, Yair Carmon, and Ludwig Schmidt. Accuracy on the line: on the strong correlation between out-of-distribution and in-distribution generalization. In International Conference on Machine Learning, pages 7721-7735. PMLR, 2021.
[ND16] Hongseok Namkoong and John C Duchi. Stochastic gradient methods for distributionally robust optimization with f-divergences. Advances in Neural Information Processing Systems, 29, 2016.
[PY10] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345-1359, 2010.
[PZ21] Nikita Puchkin and Nikita Zhivotovskiy. Exponential savings in agnostic active learning through abstention. In Conference on Learning Theory, pages 3806-3832. PMLR, 2021.
[QCSSL08] Joaquin Quinonero-Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil D Lawrence. Dataset shift in machine learning. MIT Press, 2008.
[RRR21] Elan Rosenfeld, Pradeep Kumar Ravikumar, and Andrej Risteski. The risks of invariant risk minimization. In International Conference on Learning Representations, 2021.
[RRSS19] Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do ImageNet classifiers generalize to ImageNet? In International Conference on Machine Learning, pages 5389-5400. PMLR, 2019.
[RS88] Ronald L Rivest and Robert H Sloan. Learning complicated concepts reliably and usefully. In Association for the Advancement of Artificial Intelligence, pages 635-640, 1988.
[RWK20] Leslie Rice, Eric Wong, and Zico Kolter. Overfitting in adversarially robust deep learning. In International Conference on Machine Learning, pages 8093-8104. PMLR, 2020.
[RXY+20] Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John C Duchi, and Percy Liang. Understanding and mitigating the tradeoff between robustness and accuracy. In Proceedings of the 37th International Conference on Machine Learning, pages 7909-7919, 2020.
[Shi00] Hidetoshi Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2):227-244, 2000.
[SIE+20] Hadi Salman, Andrew Ilyas, Logan Engstrom, Ashish Kapoor, and Aleksander Madry. Do adversarially robust ImageNet models transfer better? Advances in Neural Information Processing Systems, 33:3533-3545, 2020.
[SKHL20] Shiori Sagawa, Pang Wei Koh, Tatsunori B Hashimoto, and Percy Liang. Distributionally robust neural networks. In International Conference on Learning Representations, 2020.
[SKL17] Jacob Steinhardt, Pang Wei W Koh, and Percy S Liang. Certified defenses for data poisoning attacks. Advances in Neural Information Processing Systems, 30, 2017.
[SNG+19] Ali Shafahi, Mahyar Najibi, Mohammad Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! Advances in Neural Information Processing Systems, 32, 2019.
[SS16] Baochen Sun and Kate Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In Computer Vision - ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8-10 and 15-16, 2016, Proceedings, Part III 14, pages 443-450. Springer, 2016.
[SST+18] Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data.
Advances in Neural Information Processing Systems, 31, 2018.
[SSZ+20] Ali Shafahi, Parsa Saadatpanah, Chen Zhu, Amin Ghiasi, Christoph Studer, David Jacobs, and Tom Goldstein. Adversarially robust transfer learning. In International Conference on Learning Representations, 2020.
[SZS+14] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.
[TKP+18] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations, 2018.
[TS10] Lisa Torrey and Jude Shavlik. Transfer learning. In Handbook of research on machine learning applications and trends: algorithms, methods, and techniques, pages 242-264. IGI Global, 2010.
[TSE+19] Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. In International Conference on Learning Representations, 2019.
[Vap98] Vladimir N Vapnik. Statistical learning theory. Wiley-Interscience, 1998.
[vdVW96] Aad W van der Vaart and Jon A Wellner. Weak convergence. Springer, 1996.
[Wan11] Liwei Wang. Smoothness, disagreement coefficient, and the label complexity of agnostic active learning. Journal of Machine Learning Research, 12(7), 2011.
[WLF22] Wenxiao Wang, Alexander J Levine, and Soheil Feizi. Improved certified defenses against data poisoning with (deterministic) finite aggregation. In International Conference on Machine Learning, pages 22769-22783. PMLR, 2022.
[WR06] Christopher KI Williams and Carl Edward Rasmussen. Gaussian processes for machine learning. MIT Press, Cambridge, MA, 2006.
[YCJ16] Songbai Yan, Kamalika Chaudhuri, and Tara Javidi. Active learning from imperfect labelers. Advances in Neural Information Processing Systems, 29, 2016.
[ZC21] Hanrui Zhang and Vincent Conitzer. Incentive-aware PAC learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 5797-5804, 2021.
[ZL19] Yuchen Zhang and Percy Liang. Defending against whitebox adversarial attacks via randomized discretization. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 684-693. PMLR, 2019.
[ZLLJ19] Yuchen Zhang, Tianle Liu, Mingsheng Long, and Michael Jordan. Bridging theory and algorithm for domain adaptation. In International Conference on Machine Learning, pages 7404-7413. PMLR, 2019.
[ZLWJ20] Yuchen Zhang, Mingsheng Long, Jianmin Wang, and Michael I Jordan. On localized discrepancy for domain adaptation. arXiv preprint arXiv:2008.06242, 2020.
[ZN22] Yinglun Zhu and Robert D Nowak. Efficient active learning with abstention. In Advances in Neural Information Processing Systems, 2022.
[ZYJ+19] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In International Conference on Machine Learning, pages 7472-7482. PMLR, 2019.