Adversarially Robust Learning with Uncertain Perturbation Sets

Tosca Lechner, University of Waterloo, Waterloo, Ontario, Canada, tlechner@uwaterloo.ca
Vinayak Pathak, Layer 6 AI, Toronto, Ontario, Canada, vinayak@layer6.ai
Ruth Urner, York University, Toronto, Ontario, Canada, uruth@eecs.yorku.ca

In many real-world settings the exact perturbation sets to be used by an adversary are not plausibly available to a learner. While prior literature has studied both scenarios with completely known and with completely unknown perturbation sets, we propose an in-between setting of learning with respect to a class of perturbation sets. We show that in this setting we can improve on previous results for completely unknown perturbation sets, while still addressing the concern that perfect knowledge of these sets is not available in real life. In particular, we give the first positive results for the learnability of classes of infinite Littlestone dimension when the learner has access to a perfect-attack oracle. We also consider a setting of learning with abstention, where predictions only count as robustness violations when a wrong label is predicted within the perturbation set. We show that there are classes for which perturbation-set-unaware learning without query access is possible, but abstention is required.

1 Introduction

Adversarial perturbations, imperceptibly small manipulations of input instances that cause unexpected failures of learned predictors, pose significant safety concerns for the deployment of such predictors. The phenomenon, first exposed a decade ago in image classification tasks [Szegedy et al., 2014], has since received substantial attention in both the practical and the theoretical machine learning research communities. Studies in both lines of research often skip over the question of how to suitably model imperceptible perturbations and rather investigate how to defend against various (fixed) types of attacks. In theoretical studies, this is typically modeled by defining an adversarial loss that, in addition to misclassification, penalizes the existence of a perturbed instance on which a classifier assigns a different label. This type of loss definition then crucially depends on a given perturbation type, that is, some function u : X → 2^X that assigns to every domain instance a set of points that would be viewed as admissible perturbations (often balls of some radius with respect to some norm are considered). However, in most realistic settings, the exact perturbation sets to be used by an adversary are not plausibly available to a learner. Moreover, it has been shown that encouraging robustness with respect to a perturbation type that is not suitable for the task easily degrades classification accuracy [Tsipras et al., 2019, Zhang et al., 2019, Yang et al., 2020]. In response, some work has explored the question of learnability when the perturbation type is not known at all in advance [Montasser et al., 2021], but information about possible perturbations can be accessed through queries. It is not difficult to be convinced, though, that without any restrictions on or knowledge of the perturbations to be used, robust learning from samples alone is impossible. Even for a small collection of possible perturbation types, as soon as these induce inconsistencies between optimal predictors, a sample-based learner cannot be simultaneously successful with respect to all options.
In this work we explore a middle ground between fixed and unknown perturbation types, which we term learning with uncertain perturbation sets. We will assume that the true perturbation type is a member of a fixed class of perturbation types. One can think of this as knowing that we should encourage robustness with respect to some type of norm induced ball, but may not be certain about which type of norm or which radius is most suitable for the task. We then study which structures (on the interplay between perturbation type class and the class of predictors to be learned) allow for PAC-type learnability guarantees. Given a class U of perturbation types and hypothesis class H, we define an H-induced partial order on U. We show that when U is actually totally ordered in this sense, statistical (PAC) learning is possible in the realizable case as soon as H has finite VC-dimension. U being totally ordered applies to a wide variety of settings, for example the case where the various perturbation types are balls in some metric space while the radius varies over U (and actually also does not need to be constant over domain X for each u U). However, we also show that, without the realizability assumption learning from samples alone is not feasible, even when U is totally ordered with respect to H. We thus explore two natural remedies: (1) We allow the learner to interact with a perfect attack oracle [Montasser et al., 2021]. Given an instance x and classifier f, such an oracle certifies robustness or provides an adversarial perturbation of x for f. Previous work on learning with unknown perturbation sets has employed such an oracle and shown learnability for classes with finite Littlestone dimension [Montasser et al., 2021]. In this work, we show that provided certain structures (U being totally ordered or a finite union of totally ordered perturbation type classes) learning with a perfect attack oracle becomes feasible for all classes of finite VC-dimension (which is a much wider space of natural hypothesis classes than those of finite Littlestone dimension). (2) We allow the learner to output a classifier that sometimes abstains from prediction. For this, we define a modification of the adversarial loss, where a hypothesis does not suffer loss when it abstains on an instance that was actually manipulated. We show that this again yields learnability for VC-classes when the perturbation type class is a finite union of totally ordered classes. We then consider the case in which H has finite disagreement-coefficient [Hanneke, 2007] and show that such H can be learned with respect to every class of perturbation types. And finally, we define a notion of (ϵ, H)-cover for a class of perturbation types U, introduce learners for this kind of cover and show that a class H is robustly learnable with respect to U if there is a finite-disagreement-cover learner for (H, U). Our results on learning with abstentions show that different strategies for guarantees for different kinds of robustness (for example different types of perturbation sets) can be combined into an abstention learner that only predicts in the agreement region. The guarantees for each learning strategy then yield a guarantee for the combined classifier that depends on the number of strategies used. This highlights a different aspect to the power of abstention in adversarially robust learning from what has been established in prior work [Balcan et al., 2020, 2023]. 
We show that abstentions can also be used to address uncertainty in perturbation types. 1.1 Related work Providing PAC type learning guarantees under adversarial robustness requirements has received an enormous amount of research attention in the past decade [Feige et al., 2015, Cullina et al., 2018, Wang et al., 2018, Awasthi et al., 2019, Montasser et al., 2019, Attias et al., 2019, Montasser et al., 2020, Ashtiani et al., 2020, Montasser et al., 2021, Gourdeau et al., 2021, 2022, Montasser et al., 2022, Attias and Hanneke, 2022, Attias et al., 2022, Bhattacharjee et al., 2021, Awasthi et al., 2022b,a, 2023, Mao et al., 2023]. We will focus here on aspects within this literature that are most relevant to our settings. Most prior works study the sample complexity of adversarial robustness with respect to a fixed type of adversarial perturbation (often a metric ball with a certain radius). However, recent research has developed frameworks of analysis that go beyond learning with respect to a fixed type of perturbations. It has been argued that the correct type of admissible adversarial perturbations (which originally were imperceivable modifications to input instances that lead to misclassification by a learned predictor) should depend on the underlying data-generating process [Bhattacharjee and Chaudhuri, 2021, Chowdhury and Urner, 2022]. These studies define a notion of adaptive robustness with respect to which a predictor should be evaluated. The drawback of these notions is that the correct perturbation type is not available to the learner and thus a predictor cannot be straightforwardly evaluated with respect to this loss. A different way of relaxing the assumption of a fixed, known perturbation type is provided in the framework of tolerance [Ashtiani et al., 2023, Bhattacharjee et al., 2023]. Here, a learner is evaluated with respect to some fixed perturbation type, while being compared to the optimal achievable loss with respect to a larger perturbation type, relaxing the requirement for the learner and reflecting that the exact relevant perturbation type is typically not known (or even uniquely existing). Most relevant to our work is a recent study on PAC learning with respect to unknown perturbation sets [Montasser et al., 2021]. This work provides learning guarantees for the case that the robustness requirement will be with respect to an entirely unknown perturbation type without any prior knowledge about the nature of these perturbations. Those learning bounds assume access to a perfect attack oracle (as we do in some parts of this work), and require the class to have finite Littlestone dimension. In this work, we show that adding some structure to the class of perturbation types (as well as the promise that the true perturbation type will be a member of this class), allows for sample and query complexity bounds that are independent of the hypothesis class s Littlestone dimension. Finally, there is a long line of work studying the benefits of abstentions for classification tasks [Bartlett and Wegkamp, 2008, El-Yaniv and Wiener, 2010, Wiener and El-Yaniv, 2015, Yuan and Wegkamp, 2010, Zhang and Chaudhuri, 2016]. Some recent studies have also shown that the ability to abstain from prediction for manipulated instances can be beneficial for adversarial robustness requirements [Balcan et al., 2020, 2023]. However, the former studies a different setting for the generation of adversarial perturbations, and the latter considers fixed perturbation types. 
1.2 Overview of results

In Section 2, we define our setup, introduce the notions of learnability and loss we investigate, and provide an overview of the different kinds of prior knowledge and oracle access we employ. In particular, this is where we define our partial order on classes of perturbation types. In Section 3 we analyze perturbation type classes for which this order is a total order. Theorem 1 states that every hypothesis class with finite VC-dimension can be robustly learned in the realizable case with respect to any totally ordered perturbation type class. We then prove that this result cannot be extended to the agnostic case (Observation 1). This motivates investigating the benefits of a perfect attack oracle. Theorem 2 shows that the impossibility can be overcome with access to a perfect attack oracle and provides finite sample and oracle-query bounds for the agnostic case. In Section 4, we investigate classes of perturbation types U that are not necessarily totally ordered. We first extend our previous results on learning with access to a perfect attack oracle to the case where U is a union of finitely many totally ordered perturbation classes (Theorem 3). We then establish that there are classes of perturbation types and hypothesis classes which cannot be learned without abstention, even with access to a perfect attack oracle (Observation 3), but which can be learned with abstention without the need for access to a perfect attack oracle (Observation 4). This motivates our investigation of learning with respect to our adversarial abstention loss (Definition 5). We show in Theorem 4 that, in the realizable case, every hypothesis class with finite VC-dimension can be learned with respect to a finite union of totally ordered perturbation type classes. We then consider the case in which H has finite disagreement coefficient and show that such H can be learned with respect to every class of perturbation types (Theorem 5). Lastly, we define a notion of (ϵ, H)-cover for a class of perturbation types U, which states that for every u ∈ U there is a close-to-optimal hypothesis for u in the cover (Definition 7), and introduce the notion of learners for this kind of cover (Definition 8). We then generalize our previous results by showing that a class H is robustly learnable with respect to U if there is a finite-disagreement-cover learner for (H, U) (Theorem 6). While we provide some proof sketches in the main body of the paper, the detailed proofs of all results are in the supplement.

2 Setup

For sets A and B, we let 2^A denote the power set of A and A^B the set of all functions from B to A. Let X be a domain and Y = {0, 1} be a label space. We model the data generation as a distribution P over X × Y, and let P_X denote its marginal over X. A hypothesis or predictor is a function h : X → {0, 1, ⊥}. Thus, on input x ∈ X, a hypothesis h either predicts label 0 or 1 or abstains by outputting ⊥. We call a hypothesis non-abstaining if h(x) ≠ ⊥ for all x ∈ X. A hypothesis class H is a set of non-abstaining predictors. We will let F = {0, 1, ⊥}^X denote the set of all predictors over X. The performance of a predictor h is measured by a loss function ℓ : F × X × Y → R. For a loss ℓ, we use the notation L_P(h) = E_{(x,y)∼P}[ℓ(h, x, y)] for the expected loss of predictor h with respect to distribution P, and L_S(h) = (1/|S|) Σ_{i=1}^{|S|} ℓ(h, x_i, y_i) for the empirical loss on a dataset S = ((x_1, y_1), . . . , (x_m, y_m)).
Further, for a hypothesis class H we will let opt_P(H) = inf_{h∈H} L_P(h) denote the approximation error of class H with respect to P and loss ℓ. The standard loss for classification tasks is the binary loss ℓ^{0/1}(h, x, y) = 1[h(x) ≠ y], where 1[·] denotes the indicator function. The standard loss for adversarial robustness additionally penalizes h on an instance (x, y) for the existence of a perturbation of x that h does not label with y. This loss is defined for a specified perturbation type, which is a function u : X → 2^X that assigns to every instance x ∈ X a set u(x) ⊆ X with x ∈ u(x) of admissible perturbations. The adversarial loss with respect to u is then defined as

ℓ^u(h, x, y) = 1[∃z ∈ u(x) : h(z) ≠ y].

We will use the notation ℓ^{u,∂}(h, x) to denote the adversarial component loss, which indicates whether a domain point x lies close to the decision boundary with respect to the perturbation type u:

ℓ^{u,∂}(h, x) = 1[∃z ∈ u(x) : h(z) ≠ h(x)].

For hypotheses that have the option to abstain from prediction, we propose a variation of the adversarial loss that allows a predictor to abstain on perturbed instances (but not on unperturbed instances) without suffering loss. The following definition captures this modification:

Definition 1 (Adversarial abstention loss). For a perturbation type u : X → 2^X, the adversarial abstention loss is defined by

ℓ^{u,abst}(h, x, y) =
  1 if h(x) ≠ y,
  1 if there exists z ∈ u(x) such that h(z) ≠ y and h(z) ≠ ⊥,
  0 otherwise.

The main difference with the adversarial loss is that if h(z) = ⊥ for an adversarial point z ∈ u(x) with z ≠ x, then there is no penalty and ℓ^{u,abst}(h, x, y) = 0, thus allowing the predictor to abstain on points that were perturbed. This definition models the scenario where the predictor is not penalized if it correctly detects that an adversarial attack has happened and abstains. However, abstaining on non-perturbed points is still penalized. Note that in case a predictor h never abstains (that is, h(x) ≠ ⊥ for all x ∈ X), the two definitions of adversarial loss coincide. We let L^u_P(h), L^{u,abst}_P(h), L^u_S(h), L^{u,abst}_S(h) denote the corresponding expected and empirical losses, and opt^u_P(H), opt^{u,abst}_P(H) the corresponding approximation errors of a class H with respect to a distribution P.

Uncertain perturbation type. In the literature, adversarial robustness is mostly treated with respect to a fixed (known) perturbation type u, while learning with respect to an entirely unknown perturbation type has also been investigated. In this work, we introduce a setting that naturally interpolates between these two extremes. We assume that the true perturbation type u* is unknown to the learner, but promised to lie within a fixed class U of perturbation types. That is, the learner has prior knowledge of a class U with u* ∈ U, where u* is the perturbation type against which the learner will be evaluated (see Definition 3 below). We can think of u* as the perturbation type that the true adversarial environment will employ to manipulate input instances and induce misclassifications. We let U_all denote the class that contains all possible perturbation types. It is easy to see that without any restrictions on the class U, any capability of the learner to (interactively) gather information about u*, or any option to abstain, a learner cannot succeed (see also Observations 1 and 2 below). To gain control over potentially infinite classes of perturbation types, we will now define a partial order on a class U that is induced by a hypothesis class H.
This structure will prove to be conducive to learnability.

Definition 2. Let H ⊆ {0, 1}^X be a hypothesis class. For perturbation types u_1, u_2 we say that u_1 is smaller than u_2 with respect to class H, and write u_1 ⪯_H u_2, if and only if for all h ∈ H and all x ∈ X we have ℓ^{u_1,∂}(h, x) ≤ ℓ^{u_2,∂}(h, x). We say a set of perturbation types U is totally ordered with respect to H if for every u_1, u_2 ∈ U we have either u_1 ⪯_H u_2 or u_2 ⪯_H u_1.

We have u_1 ⪯_H u_2 if, for every function h ∈ H, the margin area (the set of points that receive positive adversarial component loss) with respect to u_1 is contained in the margin area with respect to u_2. This is, for example, the case if all perturbation sets under u_1 are contained in those induced by u_2.

Resources to learners and learnability. A learner is a function A : (X × Y)* → F that takes in a finite sequence S = ((x_1, y_1), . . . , (x_m, y_m)) of labeled examples and outputs a predictor A(S) : X → {0, 1, ⊥}. We will study the following PAC-type [Valiant, 1984] notion of learnability of hypothesis classes H with perturbation type classes U.

Definition 3 (Learnability without abstentions). Let H ⊆ {0, 1}^X be a hypothesis class and U a class of perturbation types. We say that H is robustly learnable with respect to perturbation class U if there exist a learner A and a function m : (0, 1)² → N such that for every ϵ, δ > 0, every m ≥ m(ϵ, δ), every u* ∈ U, and every data-generating distribution P we have

P_{S∼P^m}[L^{u*}_P(A(S)) ≤ opt^{u*}_P(H) + ϵ] ≥ 1 − δ.

We speak of learnability in the realizable case if the above requirement is relaxed to only hold for all distributions P with opt^{u*}_P(H) = 0. Without this assumption, we also speak of learnability in the agnostic case.

If the above definition is fulfilled for a class H, then the function m : (0, 1)² → N in the definition is an upper bound on the sample complexity of the learning task. Note that, while we do not require the learner in the above definition to output a non-abstaining hypothesis, predicting ⊥ will always cause loss with respect to ℓ^{u*}. We can thus assume all considered predictors are non-abstaining classifiers in this setting. Note that the above definition generalizes and unifies previous notions of adversarially robust learnability. We obtain learnability with respect to a known perturbation type (the setting mostly considered in prior works) when U = {u*} is a singleton class. Learning with unknown perturbations [Montasser et al., 2021], on the other hand, is the setting where U is the set of all possible perturbation types. In this work, we are interested in studying more fine-grained conditions on the structure of (the combination of) H and U that allow for learnability. We will call the case where U is neither a singleton set nor the set of all possible perturbation types learning with uncertain perturbation sets. We will show that, even when the class of perturbation types U under consideration is totally ordered with respect to the hypothesis class H, agnostically learning H with U is impossible even in very simple cases (see Observation 1). We will thus, as has been done in earlier studies [Ashtiani et al., 2020, Montasser et al., 2021], additionally allow the learner access to a perfect attack oracle, which we model as follows:
Definition 4 (Perfect Attack Oracle). A perfect attack oracle for perturbation type u is a function O_u : F × X → X ∪ {robust} that takes as input a non-abstaining predictor f and a domain point x and either certifies that f is robust on x with respect to u by outputting robust, or returns an admissible perturbation of x for f. That is,

O_u(f, x) = robust if f(z) = f(x) for all z ∈ u(x), and otherwise O_u(f, x) = z for some z ∈ u(x) with f(z) ≠ f(x).

We say that a class H and perturbation class U are learnable with access to a perfect attack oracle if there exists a function n : (0, 1)² → N such that the conditions in Definition 3 are satisfied by a learner that additionally makes at most n(ϵ, δ) queries to the perfect attack oracle. The function n(·, ·) then is an upper bound on the attack oracle complexity of the learning problem.

In Section 4 we will consider more general pairs of hypothesis and perturbation classes H and U. We will show that if U is not totally ordered with respect to H (or a finite union of such totally ordered classes), there are tasks on which no learner (even with access to a perfect attack oracle) can succeed (see Observation 3 below). For such cases we will explore learnability with respect to the adversarial abstention loss:

Definition 5 (Learnability with abstentions). Let H ⊆ {0, 1}^X be a hypothesis class and U a class of perturbation types. We say that H is robustly learnable with abstentions with respect to perturbation class U if there exist a learner A and a function m : (0, 1)² → N such that for every ϵ, δ > 0, every m ≥ m(ϵ, δ), every u* ∈ U, and every data-generating distribution P we have

P_{S∼P^m}[L^{u*,abst}_P(A(S)) ≤ opt^{u*}_P(H) + ϵ] ≥ 1 − δ.

Note that since H contains only non-abstaining predictors, we have opt^{u*}_P(H) = opt^{u*,abst}_P(H).

Discussion of additional parameters and assumptions. Throughout this paper, we will also work with various standard complexity measures, such as bounded VC-dimension or bounded Littlestone dimension of the hypothesis class H, and bounded VC-dimension of the loss class or the adversarial component loss class. We refer the reader to the appendix for a reminder of the definitions of these notions. We will further assume that learners have the capability to identify empirical risk minimizing hypotheses from a class. It is standard (though often implicit) to assume this as oracle access for the learner as well. We consider the standard ERM oracle, a robust ERM (RERM) oracle (for a fixed perturbation type u), and a maximally robust ERM (MRERM) oracle (for a class U of perturbation types, and realizable samples only). We define the following oracles:

ERM_H : S ↦ h_S ∈ argmin_{h∈H} L^{0/1}_S(h)

RERM^u_H : S ↦ h^u_S ∈ argmin_{h∈H} L^u_S(h)

MRERM^U_H : S ↦ (h^{u_S}_S, u_S), with h^{u_S}_S ∈ argmin_{h∈H} L^{u_S}_S(h), where u_S is ⪯_H-maximal in the set {u ∈ U | min_{h∈H} L^u_S(h) = 0}; the oracle returns an error if L^u_S(h) > 0 for all u ∈ U and h ∈ H.

That is, we assume that MRERM^U_H returns an error if the input sample S is not ℓ^u-realizable for any u ∈ U. If, on the other hand, there does exist a u ∈ U for which S is ℓ^u-realizable, then MRERM^U_H returns a ⪯_H-maximal such perturbation type u_S and a corresponding hypothesis h^{u_S}_S with L^{u_S}_S(h^{u_S}_S) = 0. In particular, we assume that for every non-empty sample S there exists such a maximal element u_S ∈ U and a corresponding ERM hypothesis from H. While these do not always exist in U and H a priori, it has recently been shown that it is possible to embed U and H into slightly larger classes so that the MRERM^U_H oracle is always well defined [Lechner et al., 2023]. See Appendix Section A for more details on this.
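To make these definitions concrete, the following is a minimal Python sketch (our own illustration, not part of the paper's formal development) of a brute-force perfect attack oracle for a perturbation type given as an explicit finite dictionary, together with the evaluation of the standard adversarial loss and the adversarial abstention loss from Definition 1; all function names are illustrative.

```python
from typing import Callable, Dict, Hashable, Optional, Set, Union

Point = Hashable
Label = Optional[int]  # 0 or 1; None encodes the abstention symbol ⊥

def perfect_attack_oracle(f: Callable[[Point], Label], x: Point,
                          u: Dict[Point, Set[Point]]) -> Union[str, Point]:
    """Brute-force perfect attack oracle O_u (Definition 4) for a perturbation type
    given as an explicit dictionary u, where u[x] is the finite set of admissible
    perturbations of x (assumed to contain x itself)."""
    for z in u[x]:
        if f(z) != f(x):
            return z            # an admissible perturbation on which f flips
    return "robust"             # f is constant on u(x)

def adversarial_loss(f: Callable[[Point], Label], x: Point, y: int,
                     u: Dict[Point, Set[Point]]) -> int:
    """Standard adversarial loss ℓ^u(f, x, y) = 1[∃ z ∈ u(x): f(z) ≠ y]."""
    return int(any(f(z) != y for z in u[x]))

def abstention_loss(f: Callable[[Point], Label], x: Point, y: int,
                    u: Dict[Point, Set[Point]]) -> int:
    """Adversarial abstention loss ℓ^{u,abst}(f, x, y) from Definition 1:
    abstaining on a perturbed point is free, abstaining on x itself is not."""
    if f(x) != y:  # wrong label or abstention on the unperturbed point
        return 1
    if any(f(z) is not None and f(z) != y for z in u[x]):
        return 1   # some non-abstaining perturbation is mislabeled
    return 0
```

Note that for a non-abstaining predictor, ℓ^u(f, x, y) can be evaluated with a single oracle call: it equals 1 exactly when f(x) ≠ y or the oracle returns a perturbation, an observation that is useful when estimating robust losses with oracle access.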
Since we focus on studying sample complexity (independently of computational considerations) and consider learners as functions throughout, assuming access to the above oracles does not restrict the validity of our bounds. It is still interesting to understand when their existence is reasonable to assume. Both ERM and RERM oracles have been widely used in the literature, even though they are often computationally hard to implement. Assuming their existence, however, an MRERM oracle can easily be implemented for the case where U is finite and totally ordered by running a binary search over U and calling RERM for the comparison step. If we assume U is parametrized by a k-bit number, then this requires O(k) queries to RERM.

3 Learning with a totally ordered perturbation class

We start by investigating the case where the perturbation class U is totally ordered with respect to the hypothesis class H (see Definition 2). Note that if two perturbation types u_1 and u_2 satisfy u_1(x) ⊆ u_2(x) for all x ∈ X, then u_1 is smaller than u_2, that is u_1 ⪯_H u_2, for every hypothesis class H ⊆ {0, 1}^X. This is, for example, the case when u_1 and u_2 assign to every domain point x a ball (with respect to some metric) centered at x, and u_2 always assigns a ball of radius at least as large as the ball assigned by u_1 (while the balls assigned by u_1 and u_2 are not required to have the same radii uniformly over the space X). In this section, we focus on learning with respect to the standard adversarial loss and thus only consider non-abstaining predictors.

We first show that for H and totally ordered U, whenever the class H has bounded VC-dimension we get learnability in the realizable case. This result is based on adapting a compression argument for adversarially robust learning [Montasser et al., 2019] to the case of uncertain perturbation sets. A very similar adaptation has recently been made for the related problem of strategic classification [Lechner et al., 2023]. The proof of this result can be found in the appendix.

Theorem 1. Let U be a perturbation class that is totally ordered with respect to a hypothesis class H with VC(H) = d < ∞. Then H is learnable in the realizable case with respect to U with sample complexity O(d · 2^d · log(1/δ) / ϵ).

Recall that realizability here means opt^{u*}_P(H) = 0 for the true perturbation type u* (with respect to which the learner is evaluated). We now show that without this assumption learning guarantees are impossible without equipping the learner with additional resources (or weakening the success criteria).

Observation 1. There exist a class H with VC(H) = 1 and a perturbation class U that is totally ordered with respect to H such that H is not learnable with respect to U (in the sense of Definition 3).

We now show that allowing the learner access to a perfect attack oracle yields learnability in the agnostic case.

Theorem 2. Let U be a perturbation class that is totally ordered with respect to a hypothesis class H with VC(H) = d < ∞ and assume access to a perfect attack oracle. Then H is learnable in the agnostic case with respect to U with sample complexity m(ϵ, δ) = O(d · 2^d · log(1/δ) / ϵ²) and oracle complexity O(m(ϵ, δ)²).

Proof. We employ a well-known reduction from agnostic learning to realizable compression-based learning [Montasser et al., 2019]. Since our proof of Theorem 1 for the realizable case of learning with uncertain, totally ordered perturbation sets employs a compression-based learner, for the agnostic case it suffices to show that, given any sample S = ((x_1, y_1), . . . , (x_m, y_m)), we can identify a largest subsample S′ of S that is realizable with respect to the underlying perturbation type u*.
We now outline how the perfect attack oracle can be employed to achieve this. Employing the MRERM^U_H oracle, the learner can generate an ordered list T = ((S_1, h_1, u_1), . . . , (S_n, h_n, u_n)) of subsamples of S that are realizable with respect to some u ∈ U, such that (h_i, u_i) = MRERM^U_H(S_i) for all i and u_i ⪯_H u_j for all i ≤ j. Note that we have u* ⪯_H u_i if and only if the perfect attack oracle returns robust for all sample points in S_i when tested on h_i, that is, O_{u*}(h_i, x) = robust for all x with (x, y) ∈ S_i for some y. Thus, using a binary search over the list T, we can identify the smallest index ι for which u* ⪯_H u_ι. Since we have n ≤ 2^m and every test in the search employs at most m calls to the perfect attack oracle, this search terminates after at most m² queries. Finally, note that all subsamples S_i from T for which i ≥ ι are realizable with respect to u*. Thus we can choose any largest sample among S_ι, . . . , S_n as a maximal realizable subsample with respect to the unknown perturbation type u*.

In Section B we provide additional guarantees for the agnostic case.

4 Beyond totally ordered perturbation classes

We now consider a more general setting where U is not necessarily totally ordered with respect to H.

4.1 Learning with the standard adversarial loss

We start by showing that H and U are learnable with access to a perfect attack oracle if U is the union of k subclasses which are each totally ordered with respect to H. This naturally contains the case that U is finite. Another natural case where this occurs is that of perturbation sets that are balls of certain radii around domain points with respect to one of a number of possible norms. Consider X = R^d and the set U = {u^p_r : x ↦ B^p_r(x) | r ∈ R, p ∈ {0, 1, 2, ∞}}, where we let B^p_r(x) denote the ℓ_p-norm ball of radius r around x. Then the above class U is a union of 4 totally ordered perturbation classes (one for the balls of each norm) with respect to any hypothesis class H.

Theorem 3. Let H be a hypothesis class with VC(H) = d < ∞, and let U be a perturbation class which is a union U = U_1 ∪ U_2 ∪ . . . ∪ U_k of k subclasses that are each totally ordered with respect to class H. Then H is learnable with access to a perfect attack oracle O_{u*} in the realizable case with respect to U with sample complexity O(d · 2^d · log(k/δ) / ϵ + (log(k) · log(1/ϵ) + log(1/δ)) / ϵ²) and query complexity O((log(k) · log(1/ϵ) + log(1/δ)) / ϵ²).

Proof. We can generate k hypotheses h_1, h_2, . . . , h_k that are the outputs of an (ϵ, δ)-successful learner for H with respect to each of the U_i. Since each of the classes U_i is totally ordered with respect to H, the result of Theorem 1 tells us that O(d · 2^d · log(k/δ) / ϵ) sample points suffice to guarantee that, with probability at least 1 − δ, we have L^{u*}_P(h_i) ≤ ϵ for all i with u* ∈ U_i. Thus the class H_k = {h_1, h_2, . . . , h_k} of these k learning outputs has approximation error at most ϵ with respect to the data-generating distribution P. Note that, since we have |H_k| = k, this class also has Littlestone dimension bounded by log(k). Employing a result (Montasser et al. [2021], Theorem 3) on agnostically learning finite Littlestone classes with access to a perfect attack oracle, an additional O((log(k) · log(1/ϵ) + log(1/δ)) / ϵ²) samples and the stated number of queries suffice to output a hypothesis h with L^{u*}_P(h) ≤ 3ϵ.
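The binary-search step in the proof of Theorem 2 can be sketched as follows. This is our own illustrative Python code, assuming the ordered list T of MRERM outputs has already been computed and that at least one of its entries is compatible with the unknown u*; the function names are not from the paper.

```python
from typing import Callable, List, Sequence, Tuple

def certified_on(h: Callable, xs: Sequence, attack_oracle: Callable) -> bool:
    """True iff the perfect attack oracle O_{u*} certifies h as robust on every
    point in xs; by the argument in the proof, this tests whether u* ⪯_H u_i."""
    return all(attack_oracle(h, x) == "robust" for x in xs)

def smallest_compatible_index(T: List[Tuple[Sequence, Callable, object]],
                              attack_oracle: Callable) -> int:
    """Binary search over T = [(S_1, h_1, u_1), ..., (S_n, h_n, u_n)], ordered so
    that u_i ⪯_H u_j for i <= j, for the smallest index ι with u* ⪯_H u_ι.
    Each comparison uses at most |S_i| <= m oracle queries, and there are at most
    log(n) <= m comparisons, which gives the m^2 query bound from the proof."""
    lo, hi = 0, len(T) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        S_mid, h_mid, _u_mid = T[mid]
        if certified_on(h_mid, [x for (x, _y) in S_mid], attack_oracle):
            hi = mid        # u* ⪯_H u_mid, so the smallest such index is <= mid
        else:
            lo = mid + 1    # u_mid is too small; search larger perturbation types
    return lo
```

All subsamples S_i with i at or above the returned index are then realizable with respect to u*, so a largest one among them can be handed to the compression-based realizable learner from Theorem 1.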
It is not difficult to see that, even in the realizable case, and when U is the union of just two totally ordered subclasses, a learner without access to a perfect attack oracle cannot succeed with respect to the standard adversarial loss.

Observation 2. There exist a class H with VC(H) = 1 and a perturbation class U that is the union of two subclasses that are totally ordered with respect to H, such that H is not learnable with respect to U in the realizable case (in the sense of Definition 3) without access to a perfect attack oracle.

For the proof of this observation we refer the reader to the supplementary material. We now show that if we remove even more structure from the set of perturbation types U, then learning success becomes impossible even with access to a perfect attack oracle. The following observation motivates investigating learning with respect to the adversarial abstention loss (rather than the standard adversarial loss).

Observation 3. There exist a class H with VC(H) = 1 and a perturbation class U such that H is not learnable with respect to U (in the sense of Definition 3), even with access to a perfect attack oracle.

Proof. We consider the same domain, class, and distribution as in the proof of the previous observation: X = R, P with P((−3, 1)) = P((3, 0)) = 0.5, and H = {h_t : x ↦ 1[x ≤ t] | t ∈ R}. Further, we consider the class U = {u_{a,b} | a, b ∈ R} of perturbation types u_{a,b} that assign to every point x ≤ 0 a ball of radius a around x and to every point x > 0 a ball of radius b around x. Then, even when promised realizability, a learner cannot distinguish between the different cases u* = u_{a,b} with a + b ≤ 6, neither from samples nor from any finite number of queries to the perfect attack oracle. In particular, if u* = u_{a,b} for some a, b with a + b = 6, predicting with respect to one such pair a, b will induce loss 0.5 for all other such pairs.

4.2 Learning with the adversarial abstention loss

We will now consider learning with respect to the adversarial abstention loss. We first show that learning with abstentions can actually help overcome some of the impossibilities shown above. For this, let us revisit the example seen in Observation 3.

Observation 4. There exist a class H with VC(H) = 1 and a perturbation class U such that H is not learnable with respect to U (in the sense of Definition 3), even with access to a perfect attack oracle, but is robustly learnable with abstentions (in the sense of Definition 5) in the realizable case (without access to a perfect attack oracle).

Proof sketch. We consider the example from the proof of Observation 3. Now consider the subclasses U_1 = {u_{a,0} : a ∈ R} and U_2 = {u_{0,b} : b ∈ R}. Since both U_1 and U_2 are totally ordered, we can learn successful hypotheses h_1 and h_2, respectively. We now define h : X → {0, 1, ⊥} by h(x) = y if h_1(x) = h_2(x) = y, and h(x) = ⊥ otherwise. This is a successful hypothesis. For more detail, we refer the reader to the appendix.

We now continue by showing that, similarly to the case where we have access to a perfect attack oracle, a finite union U = ⋃_{i=1}^k U_i of totally ordered perturbation type classes U_i is robustly learnable with abstentions in the robustly realizable case.

Theorem 4. Let U = ⋃_{i=1}^k U_i, where every U_i is totally ordered with respect to H. Furthermore, let H have finite VC-dimension VC(H) = d.
Then H is robustly learnable with abstentions with respect to U in the robustly realizable case with sample complexity O(d · 2^d · k² · log(k/δ) / ϵ).

4.2.1 Abstention learning with disagreement coefficient

The proof of Theorem 4 shows how bounding the mass of the region on which two learners with small 0/1-loss disagree yields a robust error guarantee for learning with abstentions. In this section, we review the definition of the disagreement coefficient (Hanneke [2007], Hanneke et al. [2014]) and show that hypothesis classes H with finite disagreement coefficient are robustly learnable with abstentions in the realizable case with respect to every class of perturbation types U. For a distribution P_X over X, we denote the P_X-induced difference between two hypotheses by d_{P_X}(h, h′) = P_{x∼P_X}[h(x) ≠ h′(x)]. The ball in H around a hypothesis h with radius r is then denoted by B_{P_X}(h, r) = {h′ ∈ H : d_{P_X}(h, h′) ≤ r}. Lastly, the disagreement region of a set of hypotheses H′ is DIS(H′) = {x ∈ X : ∃h, h′ ∈ H′ with h(x) ≠ h′(x)}. We can now state the definition of the disagreement coefficient.

Definition 6 (Hanneke et al. [2014]). For a distribution P over X × Y and a class H ⊆ {0, 1}^X, let h* ∈ argmin_{h′∈H} L_P(h′). The disagreement coefficient of H with respect to P is defined by

θ_P(H) = sup_{r∈(0,1)} P_X(DIS(B_{P_X}(h*, r))) / r.

Furthermore, the worst-case disagreement coefficient is θ(H) = sup_P θ_P(H), where the supremum is taken over all distributions P over X × Y.

Theorem 5. Let H be a hypothesis class with θ(H) < ∞ and VC(H) = d < ∞. Then the class H is robustly learnable with abstentions in the realizable case with respect to the class of all perturbation types U_all with sample complexity O((d + log(1/δ)) / (ϵ/θ(H))²).

The proof can be found in Appendix D.6, but the main idea is that for any point x, we return a label y ∈ {0, 1} if and only if every h ∈ H with L^{0/1}_S(h) = 0 agrees with y, and otherwise we abstain. Because of the bound on the disagreement coefficient, one can show that for a large enough S this leads to a hypothesis with low abstention loss.

4.2.2 Abstention learning via ϵ-cover

We will now generalize our previous results by introducing the concept of an (ϵ, H)-cover.

Definition 7 ((ϵ, H)-cover of U). Let H ⊆ {0, 1}^X be a hypothesis class, P a distribution over X × Y, and U a class of perturbation types. A set of hypotheses Ĥ is an (ϵ, H)-cover for U with respect to P if for every u ∈ U there exists an h′ ∈ Ĥ such that L^u_P(h′) < inf_{h∈H} L^u_P(h) + ϵ.

We will now provide the definition of a successful cover learner.

Definition 8. Let H ⊆ {0, 1}^X be a hypothesis class and U a class of perturbation types. We say a function A_cover : (X × Y)* → 2^{{0,1}^X} that maps a sample S to a set of hypotheses Ĥ is a successful finite-disagreement-cover learner for (H, U) if there is a function m_{(H,U),cover} : (0, 1)³ → N such that for every ϵ, η, δ ∈ (0, 1), every m ≥ m_{(H,U),cover}(ϵ, η, δ), and every distribution P over X × {0, 1}, with probability at least 1 − δ over S ∼ P^m both of the following statements hold: (i) A_cover(S) is an (ϵ, H)-cover for U with respect to P, and (ii) P_X(DIS(A_cover(S))) ≤ η.

We note that any successful cover learner that is guaranteed to output a finite set is a successful finite-disagreement-cover learner. The case we have seen before, where U is a union of k totally ordered perturbation type classes U_i, can be interpreted as first learning a cover Ĥ of size k, where each element of Ĥ "covers" one of the classes of the union.
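All abstention learners in this section share the same combination step: a finite set of non-abstaining hypotheses (a cover, or the k per-subclass learning outputs) is turned into a predictor that outputs a label only on the agreement region and abstains on the disagreement region of the set. The following is a minimal Python sketch of this rule, with ⊥ encoded as None; the helper name is ours and not from the paper.

```python
from typing import Callable, Iterable, Optional

def agreement_predictor(cover: Iterable[Callable[[object], int]]
                        ) -> Callable[[object], Optional[int]]:
    """Predict a label only where every hypothesis in the (finite) cover agrees;
    abstain (return None) on the disagreement region DIS(cover)."""
    hypotheses = list(cover)

    def h_hat(x) -> Optional[int]:
        labels = {h(x) for h in hypotheses}
        return labels.pop() if len(labels) == 1 else None

    return h_hat

# Example usage (with h1, h2 the per-subclass learning outputs from Observation 4):
# h = agreement_predictor([h1, h2]); h(x) is None exactly when h1(x) != h2(x).
```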
The case in which H has a finite disagreement coefficient can furthermore be seen as a case where the set of all ERM_H hypotheses constitutes an (ϵ, H)-cover Ĥ for every class of perturbation types U. The mass of the disagreement region of Ĥ is then bounded via the disagreement coefficient. We now state our result generalizing these observations.

Theorem 6. A class H is robustly learnable with abstentions with respect to U in the robustly realizable case if there is a successful finite-disagreement-cover learner for (H, U). Furthermore, the sample complexity of robustly learning with abstentions is then bounded by m_{H,U}(ϵ, δ) ≤ m_{(H,U),cover}(ϵ/2, ϵ/2, δ).

Inspired by considerations of adaptive robustness radii [Chowdhury and Urner, 2022, Bhattacharjee and Chaudhuri, 2021], in Section C of the appendix we discuss a setting of abstention learning in which we allow the size of the perturbation sets to vary locally between different regions of the domain. We provide a learning guarantee for this setting that is not covered by our previous results.

5 Conclusion

In this work we relaxed the assumption that we know exactly the perturbation sets that are to be used by an adversary. Our model provides an interpolation between knowing the perturbation sets exactly and not knowing them at all. Many of our results rely on realizability. We have also shown that without realizability, learning is not possible unless the learner receives other feedback, such as access to a perfect attack oracle. It might be interesting to consider some milder relaxations of realizability. For example, instead of assuming opt^{u*}_P(H) = 0, what if we only know that the adversarial Bayes loss is zero for some u ∈ U? While our work focuses on statistical aspects, it will also be interesting to understand the kinds of computation required for our results. For example, we rely on the existence of an MRERM oracle. When is it computationally feasible? Do we require exact MRERM oracles, or can approximate (and hence potentially more tractable) versions also be used for the learning task? Another question is to find a dimension that characterizes learning in our various settings. This was only recently resolved for adversarially robust learning (for the case when u* is known) [Montasser et al., 2022]; however, the dimension proposed in that paper is not easy to define. On the other hand, in the setting where the only information we can obtain about u* is through a perfect attack oracle, the Littlestone dimension of H has been shown to characterize adversarially robust learning [Montasser et al., 2021]. It will be interesting to see whether such a simple dimension can be obtained for the settings considered in our paper.

Acknowledgements

Ruth Urner is also a faculty affiliate member at Toronto's Vector Institute and acknowledges research support through an NSERC Discovery Grant. Tosca Lechner is supported by a Vector Research Grant and a Waterloo Apple PhD Fellowship in Data Science and Machine Learning.

References

Hassan Ashtiani, Vinayak Pathak, and Ruth Urner. Black-box certification and learning under adversarial perturbations. In Proceedings of the 37th International Conference on Machine Learning, ICML, pages 388-398, 2020.

Hassan Ashtiani, Vinayak Pathak, and Ruth Urner. Adversarially robust learning with tolerance. In International Conference on Algorithmic Learning Theory, ALT, pages 115-135, 2023.

Idan Attias and Steve Hanneke. Adversarially robust PAC learnability of real-valued functions. arXiv preprint arXiv:2206.12977, 2022.
Idan Attias, Aryeh Kontorovich, and Yishay Mansour. Improved generalization bounds for robust learning. In Algorithmic Learning Theory, ALT, pages 162 183, 2019. Idan Attias, Steve Hanneke, and Yishay Mansour. A characterization of semi-supervised adversarially robust pac learnability. Advances in Neural Information Processing Systems, 35:23646 23659, 2022. Pranjal Awasthi, Abhratanu Dutta, and Aravindan Vijayaraghavan. On robustness to adversarial examples and polynomial optimization. In Advances in Neural Information Processing Systems, Neur IPS, pages 13760 13770, 2019. Pranjal Awasthi, Anqi Mao, Mehryar Mohri, and Yutao Zhong. H-consistency bounds for surrogate loss minimizers. In International Conference on Machine Learning, pages 1117 1174. PMLR, 2022a. Pranjal Awasthi, Anqi Mao, Mehryar Mohri, and Yutao Zhong. Multi-class h-consistency bounds. Advances in neural information processing systems, 35:782 795, 2022b. Pranjal Awasthi, Anqi Mao, Mehryar Mohri, and Yutao Zhong. Theoretically grounded loss functions and algorithms for adversarial robustness. In International Conference on Artificial Intelligence and Statistics, pages 10077 10094. PMLR, 2023. Maria-Florina Balcan, Avrim Blum, Dravyansh Sharma, and Hongyang Zhang. On the power of abstention and data-driven decision making for adversarial robustness. Co RR, abs/2010.06154, 2020. Maria-Florina Balcan, Steve Hanneke, Rattana Pukdee, and Dravyansh Sharma. Reliable learning for test-time attacks and distribution shift. Co RR, abs/2304.03370, 2023. Peter L. Bartlett and Marten H. Wegkamp. Classification with a reject option using a hinge loss. Journal of Machine Learning Research, 9(59):1823 1840, 2008. Robi Bhattacharjee and Kamalika Chaudhuri. Consistent non-parametric methods for maximizing robustness. In Advances in Neural Information Processing Systems 34 Neur IPS, pages 9036 9048, 2021. Robi Bhattacharjee, Somesh Jha, and Kamalika Chaudhuri. Sample complexity of robust linear classification on separated data. In International Conference on Machine Learning, pages 884 893. PMLR, 2021. Robi Bhattacharjee, Max Hopkins, Akash Kumar, Hantao Yu, and Kamalika Chaudhuri. Robust empirical risk minimization with tolerance. In International Conference on Algorithmic Learning Theory ALT, pages 182 203, 2023. Sadia Chowdhury and Ruth Urner. Robustness should not be at odds with accuracy. In 3rd Symposium on Foundations of Responsible Computing, FORC, pages 5:1 5:20, 2022. Daniel Cullina, Arjun Nitin Bhagoji, and Prateek Mittal. Pac-learning in the presence of adversaries. In Advances in Neural Information Processing Systems, Neur IPS, pages 230 241, 2018. Ran El-Yaniv and Yair Wiener. On the foundations of noise-free selective classification. J. Mach. Learn. Res., 11:1605 1641, 2010. Uriel Feige, Yishay Mansour, and Robert Schapire. Learning and inference in the presence of corrupted inputs. In Conference on Learning Theory, COLT, pages 637 657, 2015. Pascale Gourdeau, Varun Kanade, Marta Kwiatkowska, and James Worrell. On the hardness of robust classification. J. Mach. Learn. Res., 22:273:1 273:29, 2021. Pascale Gourdeau, Varun Kanade, Marta Kwiatkowska, and James Worrell. Sample complexity bounds for robustly learning decision lists against evasion attacks. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI, pages 3022 3028, 2022. Steve Hanneke. A bound on the label complexity of agnostic active learning. 
In Machine Learning, Proceedings of the Twenty-Fourth International Conference ICML, pages 353 360, 2007. Steve Hanneke et al. Theory of disagreement-based active learning. Foundations and Trends in Machine Learning, 7(2-3):131 309, 2014. Tosca Lechner, Ruth Urner, and Shai Ben-David. Strategic classification with unknown user manipulations. In International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 18714 18732. PMLR, 2023. Anqi Mao, Mehryar Mohri, and Yutao Zhong. Cross-entropy loss functions: Theoretical analysis and applications. ar Xiv preprint ar Xiv:2304.07288, 2023. Omar Montasser, Steve Hanneke, and Nathan Srebro. VC classes are adversarially robustly learnable, but only improperly. In Conference on Learning Theory, COLT, pages 2512 2530, 2019. Omar Montasser, Surbhi Goel, Ilias Diakonikolas, and Nathan Srebro. Efficiently learning adversarially robust halfspaces with noise. ar Xiv preprint ar Xiv:2005.07652, 2020. Omar Montasser, Steve Hanneke, and Nathan Srebro. Adversarially robust learning with unknown perturbation sets. In Conference on Learning Theory, COLT, pages 3452 3482, 2021. Omar Montasser, Steve Hanneke, and Nati Srebro. Adversarially robust learning: A generic minimax optimal learner and characterization. Advances in Neural Information Processing Systems, 35: 37458 37470, 2022. Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014. Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In 2nd International Conference on Learning Representations, ICLR, 2014. Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. In 7th International Conference on Learning Representations, ICLR, 2019. Leslie G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134 1142, 1984. Yizhen Wang, Somesh Jha, and Kamalika Chaudhuri. Analyzing the robustness of nearest neighbors to adversarial examples. In Proceedings of the 35th International Conference on Machine Learning, ICML, pages 5120 5129, 2018. Yair Wiener and Ran El-Yaniv. Agnostic pointwise-competitive selective classification. Journal of Artificial Intelligence Research, 52:171 201, 2015. Yao-Yuan Yang, Cyrus Rashtchian, Hongyang Zhang, Ruslan Salakhutdinov, and Kamalika Chaudhuri. A closer look at accuracy vs. robustness. In Advances in Neural Information Processing Systems 33, Neur IPS, 2020. Ming Yuan and Marten H. Wegkamp. Classification methods with reject option based on convex risk minimization. J. Mach. Learn. Res., 11:111 130, 2010. Chicheng Zhang and Kamalika Chaudhuri. The extended littlestone s dimension for learning with mistakes and abstentions. In Proceedings of the 29th Conference on Learning Theory, COLT, pages 1584 1616, 2016. Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, and Michael I. Jordan. Theoretically principled trade-off between robustness and accuracy. In Proceedings of the 36th International Conference on Machine Learning, ICML, pages 7472 7482, 2019. 
A Maximally Robust Empirical Risk Minimization Recall the definition of the empirical risk minimization oracles: ERMH : S 7 h S argminh HL0/1 S (h) RERMu H : S 7 hu S argminh HLu S(h) MRERMU H : S 7 (hu S S , u S) with hu S S argminh HLu S S (h), and u S is H maximal in the set {u U | minh H Lu S(h) = 0} error if Lu S(h) > 0 for all u U and h H While there always exists functions h S and hu S that are solutions to the queries in the first two oracles, the last one requires a bit more care. Let S be some labeled dataset and assume that H and U are hypothesis and perturbation type classes such that S is H-realizable for the true perturbation type u U. We can then define a maximal perturbation type u S for S, H and U as follows: u U such that minh H Lu S(h)=0 u(x). This defines a perturbation type u S(x), which is not necessarily in U but does satisfy u H u S, which suffices for our purposes. If, given u S, the set argminh HLu S S (h) is non-empty, the oracle can return any function in this set as hu S S . We will now argue that we can define a closure H of the hypothesis class H, which has the same VC-dimension as H and such that hu S S H. We here show how such an embedding of H into the closure class H can be constructed in the case of a countable domain X and a perturbation class U that is separable, in the sense that for every U U there exists a countable subset U U , such that for every x X : S u U u(x) = S u U u(x). This is the case for a wide range of perturbation types, in particular in the case for which the sets u(x) are always open balls with respect to some ℓp-norm in Rd. In the general case (where X may be uncountable and U not separable in the above sense) recent work has shown that the existence of a closure class with identical VC-dimension is still guaranteed [Lechner et al., 2023]. The more general construction there employs the set-theoretic concept of filters. It is shown in the context of strategic classification, but the technique is readily available to adversarial perturbation sets. We here illustrate the intuition behind that abstract construction by focusing on countable and separable case. We now assume that X is countable and let (xi)i N be an enumeration of the elements of X. The separability assumption on the perturbation class U implies that for all S there is a countable subset Uc U such that for all x X we have u Uc such that minh H Lu S(h)=0 u(x). For a given sample S, let (ui)i N be an enumeration of the set {u Uc | min h H Lu S(h) = 0} and let (hi)i N be an enumeration of corresponding empirical risk minimizing hypotheses from H: hi = RERMui H(S). We will now define the function h S by successively choosing subsequences from the sequence (hi)i N as follows. We set F0 = (hi)i N and J0 = N. Then, for every i N, we define a set of indices Ji such that Case 1: There exists a k Ji 1 such that hj(xi) = hk(xi) for all j k, j Ji 1. Then we define: Ji = {j Ji 1 | hj(xi) = hk(xi)} Case 2: Both labels are assigned infinitely often to xi by the sequence (hj)j Ji 1. Then we define Ji = {j Ji 1 | hj(xi) = 1} and we define subsequence Fi = (hj)j Ji. Note that, by construction, in each resulting sequence Fi the first i points x1, x2, . . . , xi receive the same label by all functions in the sequence. Now, by choosing indices to select the i-th function from each sequence Fi (by choosing the first index of J1, the second index in J2, the third in J3 and so on), and re-identifying these indices with the natural numbers, we obtain yet another subsequence F = (fi)i N. 
In this sequence we now have for every i N and every k i fk(xi) = fi(xi). That is, this sequence of functions converges pointwise and we can use the above equation define the function h S as follows: hu S S (xi) = fi(xi). It is not difficult to see that the so defined function hu S S satisfied LS u S(hu S S ) = 0. Let let H be defined by adding these function hu S S to H for all possible S. H = H {hu S S {0, 1}X | S (X Y) } We observe that every behavior that H exhibits on a finite subset C X, is already witnessed by a function from H. Thus we get VC( H) = VC(H), and we can always let MRERMU H return a function from H. B Agnostic learning for finite unions of total orders Theorem 7. Let H be a hypothesis class with VC(H) = d < , and let U be a perturbation class, which is a union U = U1 U2 . . . Uk of k subclasses which are each totally ordered with respect to class H. Then H is agnostically learnable with access to a perfect attack oracle Ou with respect to U with sample complexity m1(ϵ, δ) + m2(ϵ, δ), and query complexity n1(ϵ, δ) + n2(ϵ, δ) where: m1(ϵ, δ) = O( d 2d log(k/δ) m2(ϵ, δ) is the sample complexity of agnostically learning a hypothesis class of size k wrt Uall. n1(ϵ, δ) = k (m1(ϵ, δ))2 n2(ϵ, δ) is the query complexity of agnostically learning a hypothesis class of size k wrt Uall Proof. We can generate k hypotheses h1, h2, . . . , hk that are the output of an (ϵ, δ)-successful learner for H with respect to each of the Ui. Since each of the classes Ui are totally ordered with respect to H, the result of Theorem 2 (combined with a union bound) tells us that m1(ϵ, δ) sample points and n1(ϵ, δ) queries suffice to guarantee that, with probability at least 1 δ, we have ℓu (hi) optu P (H) + ϵ for all i with u Ui. Thus the class Hk = {h1, h2, . . . , hk} of these k learning outputs has approximation error at most optu P (H) + ϵ with respect to the data generating distribution P. Note that, since we have |Hk| = k, this class has bounded Littlestone dimension at most log(k), VC-dimension at most log(k), and dual VC-dimension at most k. Employing a result (Montasser et al. [2021], Theorem 4) on agnostically learning finite VC classes with access to a perfect attack oracle, we get that both m2(ϵ, δ) and n2(ϵ, δ) are finite and Lu P (h) optu P (H) + 2ϵ. C ϵ-cover Learning C.1 Abstention learning for regions with different radii Let us now explore the case, where we allow our perturbation types to consist of perturbation sets whose size varies throughout the domain. While a lot of work assumes that the size of the radius is fixed throughout the domain, it is not a prior clear that the distances in the domain correspond perfectly to what humans find perceptible. This was noted in prior works [Bhattacharjee and Chaudhuri, 2021, Chowdhury and Urner, 2022] and motivated their adaptive adversarial loss. We explore such variation of local radii for the setting of uncertain perturbation sets. We have seen examples of this in Observation 3 and Observation 4. We will now generalize these perturbation types. Let X = R. For a function f : X [k], let the perturbation type uf,a1,...,ak defined by: uf,a1,...,ak(x) = Baf(x)(x). That is, we have k different regions in the domain, which are determined by f. Furthermore, we assume that each perturbation set within a region i is a ball with radius of a fixed size ai, while radii can vary between regions. For a fixed function f, let us now consider Uf = {uf,a1,...,ak : a1, . . . , ak R}. Let VC(H) VC(H U)ℓ d. 
We note that in general, that is excluding particular hypothesis sets H and function f, this set of perturbation sets cannot be expressed as a finite union of totally ordered sets. For every i [k] consider the classes Uf,i = {uf,0,...,0,ai,0,...,0 : ai R}. Let P be a realizable distribution with respect to H and u U. Let m(ϵ, δ) = C d+log( 1 δ ) ϵ2 , where C is the constant we require to get a uniform-convergence guarantee (which exists due to the corresponding VCdimensions being finite, see Shalev-Shwartz and Ben-David [2014]), and let m m( ϵ k(2k2+1), δ k). Now for a sample S and any i, let (hi, ui) = MRERMH Uf,i(S). We denote by amax i the maximum realizable radius for component i, i.e. h0,...,0,amax i ,0,...,0 = hi. Now let u = uf,a 1,...,a k be the ground true perturbation set. Then for every i, a i amax i . Let umax = uf,amax 1 ,...,amax k . We note that for every x X, y Y and h F, we have ℓumax(h, x, y) maxi [k] ℓui(h, x, y). Now let h(x) = y if for all i : hi(x) = y otherwise. We note that by the PAC-learning guarantee with respect to 0/1-loss, with probability 1 δ, for every i, j [k] we have a pairwise disagreement between hi and hj of at most 2ϵ (2k2+1)k. Thus the disagreement-region of {hi : i [k]} has mass at most k2ϵ k(2k2+1). Thus, with probability 1 δ over the sample generation the abstention loss of h we have Lu ,abst P (h) Lumax,abst P (h) i=1 Lui,abst P (h) i=1 (Lui P (hi) + PX ({x : h(x) = })) k( ϵ k(2k2 + 1) + k2ϵ k(2k2 + 1)) = ϵ. Note, that we can thus view {h} as an (H, ϵ)-cover of U with respect to P and adversarial abstention loss. D.1 Proof of Theorem 1 Proof. For this result we adapt a compression based argument for the case of fixed perturbation type [Montasser et al., 2019] to the case of a class U of perturbation types. A similar adaptation has recently been shown for the setting of strategic classification [Lechner et al., 2023]. The key difference to the original compression scheme is that we use the MRERMU H oracle for the weak learners in each round of boosting, rather than the simple RERMu H oracle as is done in the original publication [Montasser et al., 2019]. Importantly, the maximal ERM paradigm is well defined on all subsets of an original given data sample, and thus each required weak hypothesis can be encoded with a finite number of samples. Finally, since the maximal perturbation type for any subsample is always at least as large as the true perturbation type (under the realizablity assumption), using MRERMU H for each boosting step we obtain the required guarantee for the true perturbation type (without the learner/compressor requiring knowledge of the true type). We thus obtain the same compression size and implied sample complexity in the case of a totally ordered class of perturbation types as for a fixed perturbation type. D.2 Proof of Observation 1 Proof. Consider a finite domain X = {x1, x2, x3} and distribution P over X {0, 1} with P((x1, 1)) = 1/4, P((x3, 0)) = 3/4 and P((x, y)) = 0 for all (x, y) / {(x1, 1), (x3, 0)}. We define a class U = {u1, u2} containing two perturbation types u1 and u2 defined as follows: u1(x1) = {x1, x2}, u1(x2) = {x2}, u1(x3) = {x3} and u2(x1) = {x1, x2}, u2(x2) = {x2}, u2(x3) = {x2, x3} We let hy1y2y3 denote the function that labels xi with yi on this domain (there are only 8 different non-abstaining predictors over X), and let H = {h110, h100} be a hypothesis class containing only two of these predictors. Clearly VC(H) = 1 and u1 H= u2. 
Note that optu1 P (H) = Lu1 P (h110) = 0 while Lu1 P (hy1y2y3) 1/4 for all predictors hy1y2y3 = h110. We further note that optu2 P (H) = Lu2 P (h100) = Lu2 P (h000) = 1/4 while Lu2 P (hy1y2y3) 3/4 for all predictors hy1y2y3 / {h000, h100}. Samples from P will not allow distinguishing whether the true perturbation type is u1 or u2 (since we are not imposing a realizability assumption here). Thus, no learner will be able to output predictors that are (close to) optimal for all possible perturbation types in U. D.3 Proof of Observation 2 Proof. We consider X = R and H to be the class of threshold classifiers: H = {ht : x 7 1 [x t] | t R}. Further consider U = U U+, where U consists of perturbation types u r that assign each x to a ball (interval) of radius r around x if x 0 and a ball of radius r/2 around x if x > 0, and U+ similarly consists of perturbation types u+ r that assign each x to a ball (interval) of radius r around x if x > 0 and a ball of radius r/2 around x if x 0. Both U and U+ are totally ordered (with respect to any hypothesis class) but their union U is not totally ordered with respect to H. Consider a distribution P with P(( 3, 1) = P((3, 0)) = 0.5. Then, even when promised realizability, a learner cannot distinguish the cases u = u 4 and u = u+ 4 from samples; and performing well with respect to one of these perturbation types will induce loss 0.5 with respect to the other. D.4 Proof of Observation 4 Proof. We consider the example from Observation 3 Now consider the classes U1 = {ua,0 : a R} and U2 = {u0,b R}. Since both U1 and U2 are totally ordered there are successful robust learners A1 for H with respect to U1 and A2 with respect to U2. We run both learners on a sample with a size to guarantee (ϵ/2, δ/2)-success in both cases. We denote the resulting hypotheses with h1 and h2 respectively. We now define h : X {0, 1, } by h(x) = y if h1(x) = h2(x) = y and h(x) = otherwise. We now argue that with probability 1 δ, h has adversarial abstention loss less than ϵ. Let u = ua ,b U be the ground-truth perturbation type. Now let P be any ℓua ,b -realizable distribution. Let ( , ua1,0) = MRERMU1 H (S) and ( , u0,b2) = MRERMU2 H (S). With probability 1, we have both a1 > a and b2 > b . Further, we note every x X and every ua,b U, ℓua,b(h, x, y) is either determined by a, h and y or b, h and y, not both, since either x 0 or x > 0. This means, that for every x X, ℓua1,b2(h, x, y) max{ℓua1,0(h, x, y), ℓu0,b2(h, x, y)}. Combining these observations, we get Lu ,abst P (h) L ua1,b2,abst P (h) L ua1,0,abst P (h) + L u0,b2,abst P (h) ϵ D.5 Proof of Theorem 4 Proof. We assume robust realizability with respect to U. In particular this means that we are in the non-robustly realizable case. Thus for any Ui, we know that the set U i = Ui {{{x} : x X}} is totally ordered and that it contains some ui U i such that H is robustly realizable with respect to the ground truth distribution P and ui. We know from Theorem 1, that for every i, there is a successful maximally robust learner Ai for H with respect to U i in the realizable case, since U i is totally ordered. Now let A be defined by 1 iff for all i [k] we have Ai(S)(x) = 1 0 iff for all i [k] we have Ai(S)(x) = 0 otherwise . Furthermore, let (h, ui(S)) = MRERMUi H(S). Now let u U be the ground-truth perturbation type. We know that there is i [k], such that u ui . Let P be a distribution with optu P (H) = 0. 
Now if S P m with m maxi [k] m( ϵ 2k2+1, δ k), then with probability 1 δ, for every i [k], Lui P (Ai(S)) ϵ 2k2+1 and therefore, L0/1 P (Ai(S)) ϵ 2k2+1. Thus, for any two i, j [k], we have PX ({x X : Ai(S)(x) = Aj(S)(x)}) 2ϵ 2k2 + 1. Thus, Lu ,abst P (A(S)) Lu ,abst P (Ai (S)) + PX ({x X : A(S)(x) = }) Lui ,abst P (Ai (S)) + PX ({x X : i, j, Ai(S)(x) = Aj(S)(x)}) ϵ 2k2 + 1 + 2k2ϵ 2k2 + 1 = ϵ. D.6 Proof of Theorem 5 Proof. Let u Uall be any perturbation set and P any distribution with optu P (H) = 0. Furthermore, let m m H(ϵ, δ) where m H is the sample complexity function in O( d+log( 1 δ ) ( ϵ θ(H) )2 . Let S P m. Let ˆH = {h H : L0/1 S (h) = 0}. Now we define our output hypothesis ˆh by 1 if for every h ˆH : h(x) = 1 0 if for every h ˆH : h(x) = 0 otherwise. We note that there is some h ˆH with Lu P (h ) = L0/1 P (h ) = 0. Since every h ˆH is an ERMH hypothesis and H has VC dimension d, the sample of ERMs gives us Lu ,abst P (ˆh) Lu ,abst P (h ) + PX ({x X : h (x) = ˆh(x)}) Lu P (h ) + PX ({x X : ˆh(x) = }) 0 + PX (DIS(B(h , ϵ θ(H))) = ϵ. D.7 Proof of Theorem 6 Proof. Let m m U,H,cover(ϵ/2, ϵ/2, δ) and S P m. And let u U the ground-truth perturbation set. We define the abstention learner as: 1 if for all h Acover(S) : h(x) = 1 0 if for all h Acover(S) : h(x) = 0 otherwise Then with probability 1 δ, PX (DIS(Acover(S))) ϵ 2 and h Acover(S) with Lu P (h) min h H Lu P (h ) + ϵ/2. Since for every x X either A(S)(x) = h(x) or x Acover(S). Thus, Lu ,abst P (A(S)) Lu ,abst P (h) + PX (DIS(Acover(S))) min h H Lu ,abst P (h ) + ϵ.