# Characterization of Overfitting in Robust Multiclass Classification

Jingyuan Xu, Weiwei Liu
School of Computer Science, Wuhan University
National Engineering Research Center for Multimedia Software, Wuhan University
Institute of Artificial Intelligence, Wuhan University
Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University
{jingyuanxu777,liuweiwei863}@gmail.com

This paper considers the following question: given the number of classes m, the number of robust accuracy queries k, and the number of test examples n, how much can adaptive algorithms robustly overfit the test dataset? We solve this problem by giving near-matching upper and lower bounds on the robust overfitting bias in multiclass classification problems.

1 Introduction

Learning models that are robust to adversarial perturbations has garnered significant research attention in recent years. However, despite this progress, a pervasive issue continues to plague these models, namely robust overfitting [1]. A common approach to overcome overfitting is to divide the dataset into a training set and a holdout (or test) set. Nonetheless, modern machine learning is adaptive in nature: prior information about a model's performance on the test set inevitably influences future modeling choices in extensive experiments and competitions. Recent studies have shown that excessive reuse of the holdout dataset can also lead to overfitting in the non-robust setting [2, 3], and a body of subsequent work has quantitatively explored this phenomenon in the framework of perfect test label reconstruction [4, 5, 6]. Accordingly, in this paper we address the following questions: Can adaptive behavior result in overfitting in an adversarial setting? If so, by how much can adaptive algorithms robustly overfit the test dataset?

To answer these questions, we generalize the framework of perfect reconstruction to the adversarial setting and analyze the average-case performance achievable by an adaptive algorithm, denoted h_U(k, n, m), where k, n and m are the numbers of robust accuracy queries, test samples and classes, respectively. This quantity measures the maximum level of robust overfitting in a multiclass classification problem. In this paper, we derive both upper and lower bounds on h_U(k, n, m), and demonstrate that our upper and lower bounds match within logarithmic factors when n and the distribution D_X of test dataset features are fixed.

1.1 Related works

Perfect test label reconstruction. The question of perfect test label reconstruction in the non-robust setting dates back several decades [7, 8, 9]. The study of bias caused by adaptive reuse of test data starts with the work of [4, 10], which broadly falls in the field of adaptive data analysis [11, 12]. [5] first pose the problem of characterizing the overfitting bias as a function of k, n and m, but they do not give upper and lower bounds of the same order in m, which was left as an open question [13]. [6] close this question and determine the amount of overfitting possible in multiclass classification. A quantitative theory of overfitting in the adversarial setting, however, remains blank.

Correspondence to: Weiwei Liu.
37th Conference on Neural Information Processing Systems (NeurIPS 2023).
Adversarial robustness. It has been shown that deep neural networks are fragile to imperceptible distortions in the input space [14]. Perhaps the most common method to improve adversarial robustness is adversarial training [15, 16]. Theoretically, [17, 18, 19, 20] study the PAC learnability of adversarially robust learning, and [21, 22] study adversarial robustness under self-supervised learning. [23, 24] investigate adversarial robustness from the perspective of ordinary differential equations. Besides, [25] analyze the trade-off between robustness and fairness, and [26] study the worst-class adversarial robustness in adversarial training.

2 Summary of our results

In the remainder of this article, R, N and R^d denote the sets of real numbers, natural numbers and d-dimensional real vectors, respectively. We denote the set {1, ..., n} (for n ∈ N) by [n]. If A and B are sets, B^A denotes the collection of all mappings from A to B, and 2^A denotes the power set of A, that is, the collection of all subsets of A. We denote the indicator function by 1{event}, which is 1 if the event happens and 0 otherwise. If D is a distribution, supp(D) denotes the support of D, defined as the closure of the set of possible values of a random variable distributed according to D. Besides, ‖·‖_p denotes the ℓ_p norm and d_p(·, ·) the distance function induced by the ℓ_p norm. Finally, we use the big tilde notations Õ, Ω̃ and Θ̃ as variants of the big O notations that ignore logarithmic factors.

2.1 Problem formulation

Let X = R^d be the instance space and Y = {1, ..., m} be the label space. Let S = {(x_i, c_i)}_{i=1}^n denote the test set, whose features X_S = {x_1, ..., x_n} are independent and identically distributed (i.i.d.) according to some distribution D_X on X (formally, there is a sigma algebra F ⊆ 2^X of events and D_X is a probability measure on (X, F)). For brevity, we write c = (c_1, ..., c_n) for the vector of test set labels. For a function f: X → Y, its robust accuracy on the test set with respect to (w.r.t.) a small perturbation U: X → 2^X is defined by

Acc_U(f; S) ≜ (1/n) Σ_{i=1}^n 1{∀x' ∈ U(x_i): f(x') = c_i}.

The perturbation set U(x) is required to be nonempty, so some choice of x' is always available. This paper focuses on the case where the perturbation is the p-norm ball with a small radius r, i.e. U(x) = {z ∈ X : ‖z − x‖_p ≤ r} for some p ≥ 1. Taking r = 0 gives the identity perturbation I(x) ≜ {x}; in this case the definition of robust accuracy reduces to standard (non-robust) accuracy.

In this work, we mainly study robust overfitting attack algorithms, which do not have access to the test set S. Instead, they have query access to the robust accuracy of models on S: for any classifier f, the algorithm is able to obtain the value Acc_U(f; S). We refer to this access as a query. A k-query algorithm A makes k queries f_1, ..., f_k on S and, based on the values of X_S and Acc_U(f_1; S), ..., Acc_U(f_k; S), outputs a classifier f̂ = A(S). We say a k-query algorithm A is based on a hypothesis class H ⊆ Y^X if f_1, ..., f_k ∈ H. The performance of the algorithm on S is measured by

h_U(A; S) ≜ E[Acc_U(f̂; S)],

where the expectation is over the algorithm's randomization. It is also of interest to consider the performance of an algorithm when c is drawn according to some distribution D_Y over [m]^n. Letting D = D_X^n × D_Y, we define the performance of an algorithm A w.r.t. D by

h_U(A; D) ≜ E_{S∼D}[h_U(A; S)].

We evaluate algorithms under the assumption that they have no prior knowledge about the test labels; that is, the prior distribution of test labels is uniform over all possible labelings:

h_U(A) ≜ E_{X_S ∼ D_X^n, c ∼ µ_m^n}[h_U(A; S)],

where µ_m^n denotes the uniform distribution over [m]^n. Since the best robust accuracy achievable without making any queries (k = 0) is 1/m, h_U(A; S) − 1/m measures how much A robustly overfits S. The goal of this paper is to find the largest robust overfitting possible for a multiclass classification problem, so we define the performance achievable by an algorithm after making k queries on any S by

h_U(k, n, m) ≜ max_A h_U(A),

and we are interested in the value of h_U(k, n, m) − 1/m, or h_U(k, n, m) equivalently. In the rest of this paper, we aim at deriving bounds on h_U(k, n, m) for given k, n and m.
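To make the query model concrete, the following is a minimal illustrative sketch (our own code, not from the paper) of a robust accuracy query under an ℓ∞ ball. The universal quantifier over U(x_i) is intractable to check exactly for general classifiers, so the sketch approximates it by sampling candidate perturbations; all function and variable names here are our own assumptions.

```python
import numpy as np

def robust_accuracy_query(f, X_test, c_test, radius, n_samples=64, rng=None):
    """Approximate Acc_U(f; S) for U(x) = {z : ||z - x||_inf <= radius}.

    The inner "for all x' in U(x_i)" cannot be checked exactly in general;
    here it is approximated by evaluating f on random points of the ball
    (an adversarial attack could be substituted for a stronger check).
    """
    rng = np.random.default_rng(rng)
    n, d = X_test.shape
    correct = 0
    for x, c in zip(X_test, c_test):
        # sample perturbations inside the l_inf ball, plus the center itself
        noise = rng.uniform(-radius, radius, size=(n_samples, d))
        candidates = np.vstack([x, x + noise])
        preds = np.array([f(z) for z in candidates])
        correct += int(np.all(preds == c))  # robustly correct only if every prediction matches c
    return correct / n

# toy usage: a constant predictor has robust accuracy around 1/m under uniform labels
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, m = 100, 5, 4
    X = rng.normal(size=(n, d))
    c = rng.integers(1, m + 1, size=n)
    const_one = lambda z: 1
    print(robust_accuracy_query(const_one, X, c, radius=0.1, rng=1))
```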
2.2 Our main results

Our bounds on h_U(k, n, m) fall into two regimes, which can be summarized by the following theorem.

Theorem 1 (Informal). Let D_X be the distribution of test sample features. For k = O(n/m),

Φ_DX(n) · (1/m + Ω̃(√(k/(nm)))) ≤ h_U(k, n, m) ≤ 1/m + Õ(√(k/(nm))).

For k = Ω(n/m),

Φ_DX(n) · (1/m + Ω̃(k/n)) ≤ h_U(k, n, m) ≤ min{1, 1/m + Õ(k/n)},

where Φ_DX(n) ≤ 1 and is monotonically decreasing w.r.t. n.

When U = I, our upper bounds match the best known upper bounds [5], while our lower bounds differ from the known optimal lower bounds [6] by a factor of Φ_DX(n). Since overfitting in the context of robust learning places some requirements on the test samples (e.g. the well-separated property for different classes [27]), we may not be able to ensure a non-trivial lower bound on h_U(k, n, m) for every D_X. Intuitively, Φ_DX(n) measures how easy it is to sample n test data features from D_X that are good for robust overfitting. The specific form of Φ_DX(n) is presented in Section 3.2. Note that for a fixed size of S whose features are i.i.d. according to a fixed distribution D_X, the upper and lower bounds match up to logarithmic factors, that is,

h_U(k, n, m) = Θ̃(1/m + √(k/(nm))) for k = O(n/m),  and  h_U(k, n, m) = Θ̃(min{1, 1/m + k/n}) for k = Ω(n/m),

for any fixed n and D_X.

2.3 Overview of our techniques

Next, we give a brief overview of the proof techniques used to obtain the main results. We first note that throughout this paper we use the notion of a corrupted hypothesis [28], which transforms the formulation of robust accuracy into a non-robust one, thus greatly simplifying the proofs. The definition of a corrupted hypothesis is presented at the beginning of Section 3.

We establish the upper bounds via a minimum description length argument, closely following the proof of an analogous result by [5] for the non-robust setting. Note that their bounds can be viewed as trivial upper bounds on h_U(k, n, m), since non-robust accuracy always upper-bounds robust accuracy. We tighten the bounds by taking the query class of the algorithm into account. The details are presented in Theorem 2.

To obtain the lower bounds, we propose computationally efficient algorithms for the two regimes of k. The algorithms are modified from [6], who study the worst-case overfitting bias in the non-robust setting. To take the perturbation of features into account, we extend their queries to the whole feature space X by assigning to each x ∈ X the label of its closest point in X_S in the ℓ_p distance; a sketch of this extension step is given below. The theoretical guarantees are given in Theorems 3 and 4.
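For concreteness, here is a minimal sketch (our own illustrative code, not from the paper) of the nearest-neighbor extension step just described: a labeling of the test features X_S is extended to a classifier on all of X by copying the label of the nearest point in ℓ_p distance.

```python
import numpy as np

def extend_by_nearest_neighbor(X_S, labels_on_S, p=2):
    """Return a classifier f: R^d -> labels with f(x) = labels_on_S[argmin_j ||x - x_j||_p]."""
    X_S = np.asarray(X_S, dtype=float)
    labels_on_S = np.asarray(labels_on_S)

    def f(x):
        dists = np.linalg.norm(X_S - np.asarray(x, dtype=float), ord=p, axis=1)
        return int(labels_on_S[np.argmin(dists)])  # ties resolved by the first index

    return f

# usage: any labeling of the n test features induces a query defined on all of X
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_S = rng.normal(size=(10, 3))
    labels = np.where(np.arange(10) < 5, 1, 2)   # first block labeled 1, rest labeled 2
    f = extend_by_nearest_neighbor(X_S, labels, p=2)
    print(f(X_S[0] + 0.01), f(X_S[7] - 0.01))    # small perturbations keep the label
```

Both lower-bound algorithms in Section 3.2 use this extension so that their queries remain well defined on every point of a perturbation set U(x).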
2.4 Discussion

Although we derive both upper and lower bounds on the robust overfitting bias, there remains a gap in determining whether the lower and upper bounds can be made to match up to a constant factor. In the non-robust setting, it is proven that the log n factor in the upper bounds can be removed for k = 1 [6]. It is interesting to ask whether this result holds in the adversarially robust setting and, more ambitiously, for all k. We leave it as an open question.

3 Proofs of main results

To make the proofs more readable, we introduce the notion of a corrupted hypothesis [28]. Consider a given hypothesis f: X → Y. A labeled adversarial example (x', y) is classified correctly if x' ∈ f^{-1}(y); a labeled example (x, y) is classified robustly correctly if U(x) ⊆ f^{-1}(y). Let Y_⊥ = Y ∪ {⊥}, where ⊥ is the special always-wrong output, i.e. y ≠ ⊥ for every y ∈ Y. We define the mapping κ_U : Y^X → Y_⊥^X by

κ_U(f)(x) = y if U(x) ⊆ f^{-1}(y), and κ_U(f)(x) = ⊥ otherwise.

The corrupted set of hypotheses induced by the perturbation U is then defined by H̃ = {κ_U(f) : f ∈ H}.

3.1 Upper bounds

The upper bounds rely on the following theorem, which shows, in a probabilistic sense, that finding a classifier in a given hypothesis class with the desired robust accuracy requires learning many bits about the test labeling.

Theorem 2. Let m, n, k be positive integers and D^n_m = D_X^n × µ_m^n, where µ_m^n denotes the uniform distribution over [m]^n. Let H ⊆ Y^X be a hypothesis class and U a perturbation. Then there exists a constant C_H ≤ 1 satisfying: for every H-based k-query algorithm A and every δ > 0, with b = (k ln(n+1) + ln(1/δ))/n, we have

Pr_{S∼D^n_m, f=A(S)} { Acc_U(f; S) ≥ C_H/m + 2√(b/m) } ≤ δ,  for k < n/(m(log(n+1) + log(1/δ))),
Pr_{S∼D^n_m, f=A(S)} { Acc_U(f; S) ≥ C_H/m + 2b } ≤ δ,  for k ≥ n/(m(log(n+1) + log(1/δ))).

Proof. For any fixed hypothesis f, write f̃ = κ_U(f). By the definition of corrupted hypotheses, for every i ∈ [n],

Pr_{x_i∼D_X}{∀x' ∈ U(x_i): f(x') = c_i} = Pr{f̃(x_i) = c_i}
  = Pr{f̃(x_i) = c_i | f̃(x_i) ≠ ⊥} Pr{f̃(x_i) ≠ ⊥} + Pr{f̃(x_i) = c_i | f̃(x_i) = ⊥} Pr{f̃(x_i) = ⊥}.

Observe that Pr{f̃(x_i) ≠ ⊥} is a constant depending only on f; let C(f) ≜ Pr{f̃(x_i) ≠ ⊥}. Since Pr{f̃(x_i) = c_i | f̃(x_i) = ⊥} = 0 and, conditioned on f̃(x_i) ≠ ⊥, the uniformly random label c_i equals f̃(x_i) with probability 1/m, we have

Pr_{x_i∼D_X}{∀x' ∈ U(x_i): f(x') = c_i} = Pr{f̃(x_i) = c_i | f̃(x_i) ≠ ⊥} Pr{f̃(x_i) ≠ ⊥} = C(f)/m.

This implies that the indicators 1{∀x' ∈ U(x_i): f(x') = c_i} are Bernoulli random variables with bias C(f)/m. By the Chernoff bound, for any fixed f and ε > 0,

Pr{ (1/n) Σ_{i=1}^n 1{∀x' ∈ U(x_i): f(x') = c_i} ≥ C(f)/m + ε } ≤ e^{−mnε²/(2C(f)+mε)}.

Denote max_{f∈H} C(f) by C_H. Then

Pr{Acc_U(f; S) ≥ C_H/m + ε} ≤ Pr{Acc_U(f; S) ≥ C(f)/m + ε} ≤ e^{−mnε²/(2C(f)+mε)} ≤ e^{−mnε²/(2C_H+mε)}   (1)

holds for every f ∈ H. Consider the execution of A with the responses to the robust accuracy queries fixed to some sequence of values α = (α_1, ..., α_k) ∈ {0, 1/n, ..., 1}^k. Denote the resulting predictor by A_α; its output distribution is fixed, hence by Eq. (1) we have

Pr_{S∼D^n_m, f=A_α} {Acc_U(f; S) ≥ C_H/m + ε} ≤ e^{−mnε²/(2C_H+mε)}.

Denote the set of all possible values of α by V, so |V| = (n+1)^k. For every test set S, the accuracy oracle outputs some response sequence in V. Therefore,

Pr_{S∼D^n_m, f=A(S)} {Acc_U(f; S) ≥ C_H/m + ε} ≤ Σ_{α∈V} Pr_{S∼D^n_m, f=A_α} {Acc_U(f; S) ≥ C_H/m + ε} ≤ (n+1)^k e^{−mnε²/(2C_H+mε)}.

Now if (k ln(n+1) + ln(1/δ))/n ≥ 1/m, take ε = 2b; then by the definition of b, mε = 2mb ≥ 2 ≥ 2C_H. It follows that mnε²/(2C_H + mε) ≥ nε/2 = nb, hence

(n+1)^k e^{−mnε²/(2C_H+mε)} ≤ e^{k ln(n+1) − nb} = e^{ln δ} = δ.

If (k ln(n+1) + ln(1/δ))/n ≤ 1/m, take ε = 2√(b/m); in this case mε = 2√(bm) ≤ 2, thus mnε²/(2C_H + mε) ≥ mnε²/4 = nb and

(n+1)^k e^{−mnε²/(2C_H+mε)} ≤ e^{k ln(n+1) − nb} = e^{ln δ} = δ,

and we complete the proof.
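The corrupted-hypothesis view used in the proof above can be made concrete with the following sketch (our own illustrative code; the perturbation ball is only sampled, so κ_U is merely approximated, and all names are our own). It builds an approximation of κ_U(f) from a classifier f and an ℓ∞ radius, returning a sentinel for ⊥ whenever f is not constant on the sampled ball; the standard accuracy of the corrupted classifier then coincides with the (approximate) robust accuracy of f.

```python
import numpy as np

BOTTOM = -1  # stands for the always-wrong output ⊥

def corrupt(f, radius, n_samples=64, rng=None):
    """Approximate kappa_U(f) for U(x) = {z : ||z - x||_inf <= radius}.

    Returns a function that outputs f(x) if f is constant on a random sample
    of U(x), and BOTTOM otherwise. A sketch only: the true kappa_U quantifies
    over every point of the ball.
    """
    rng = np.random.default_rng(rng)

    def f_corrupted(x):
        x = np.asarray(x, dtype=float)
        noise = rng.uniform(-radius, radius, size=(n_samples, x.shape[0]))
        preds = {f(z) for z in np.vstack([x, x + noise])}
        return preds.pop() if len(preds) == 1 else BOTTOM

    return f_corrupted

# usage: robust accuracy of f equals standard accuracy of corrupt(f)
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    c = rng.integers(1, 4, size=50)
    f = lambda z: 1 if z[0] > 0 else 2        # a simple threshold classifier
    f_tilde = corrupt(f, radius=0.05, rng=1)
    acc = np.mean([f_tilde(x) == y for x, y in zip(X, c)])
    print("approximate robust accuracy:", acc)
```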
The corollary below follows immediately from Theorem 2 and gives the upper bounds on h_U(k, n, m).

Corollary 1. Let m, n, k be positive integers and let U be a perturbation. Then

h_U(k, n, m) ≤ 1/m + 2√((k+1) log(n+1)/(nm)) + 1/n,  for k < n/(m(log(n+1) + log n)),
h_U(k, n, m) ≤ 1/m + (2(k+1) log(n+1) + 1)/n,  for k ≥ n/(m(log(n+1) + log n)).

Proof. Denote Acc_U(f̂; S) by X ∈ [0, 1]. Substitute δ = 1/n in Theorem 2 and notice that C_H ≤ 1 for any hypothesis class H; hence for k < n/(m(log(n+1) + log n)) we have

Pr{ X ≥ 1/m + 2√((k+1) log(n+1)/(nm)) } ≤ 1/n.   (2)

Let c = 1/m + 2√((k+1) log(n+1)/(nm)); it remains to show E[X] ≤ c + 1/n. This is trivial for c ≥ 1. For the case c < 1, let P_X be the probability distribution of X; by the definition of expectation,

E[X] = ∫_0^1 X dP_X = ∫_0^c X dP_X + ∫_c^1 X dP_X ≤ c ∫_0^c dP_X + ∫_c^1 dP_X ≤ c + 1/n,

where we use the fact that ∫_c^1 dP_X ≤ 1/n by Eq. (2) in the last step. The case k ≥ n/(m(log(n+1) + log n)) can be proved using similar arguments, and we complete the proof.

3.2 Lower bounds

The lower bounds on h_U(k, n, m) are derived from two designed algorithms, A_small and A_big(C). A_small distinguishes the cases k = 1 and k > 1; its precise analysis is presented in Theorem 3. A_big(C) accepts a parameter C used to compute an intermediate quantity, and we obtain Theorem 4 by setting C = Φ_DX(n). Finally, the lower bounds on h_U(k, n, m) follow from

h_U(k, n, m) = max_A h_U(A) ≥ max{ h_U(A_small), h_U(A_big(Φ_DX(n))) }.

To simplify the exposition, we first introduce some definitions. For each X_S ∈ X^n, define

Γ_S = { x ∈ X : ∃ j_1 ≠ j_2 ∈ [n] s.t. d_p(x, x_{j_1}) = d_p(x, x_{j_2}) = min_{i∈[n]} d_p(x, x_i) },

the set of points whose nearest neighbor in X_S is not unique, and

Φ_DX(n) = min_{X_S ∈ supp(D_X^n)} ( 1 − ∫_{U(Γ_S)} dP_DX ),

where U(Γ_S) = ∪_{x∈Γ_S} U(x) and P_DX is the probability measure of D_X. With these definitions in place, we present the algorithms.

Algorithm 1 A_small (k = 1)
1. Let f_1 be the all-one query, i.e. f_1(x) = 1 for all x ∈ X.
2. Output f̂ with
   f̂(x) = 1 for all x ∈ X, if Acc_U(f_1; S) ≥ 1/m,
   f̂(x) = 2 for all x ∈ X, otherwise.

Theorem 3. Let n ≥ m and D^n_m = D_X^n × µ_m^n. Then for 1 ≤ k ≤ 1 + n/(2m),

h_U(A_small; D^n_m) ≥ Φ_DX(n) ( 1/m + c √(k/(nm)) )

for an absolute constant c > 0.

Proof. Case k = 1: For l ∈ {1, ..., m}, let N_l be the number of examples with label l. Since the labels are uniformly distributed and f̂ is always a constant predictor, (N_1, ..., N_m) follows a multinomial distribution with parameters (n; 1/m, ..., 1/m). By the construction of f̂, the number of robustly and correctly predicted labels is N_1 1{N_1 ≥ n/m} + N_2 1{N_1 < n/m}, and the expected robust accuracy is

E_{c∼µ_m^n}[h_U(A_small; S)] = (1/n) E[N_1 1{N_1 ≥ n/m} + N_2 1{N_1 < n/m}].

We conclude that E_{c∼µ_m^n}[h_U(A_small; S)] ≥ 1/m + c/√(mn) by citing the following lemma:

Lemma 1 (Lemma 4 in [6]). Let n ≥ m ≥ 2. If (N_1, ..., N_m) is distributed according to a multinomial distribution with parameters (n; 1/m, ..., 1/m), then

E[N_1 1{N_1 ≥ n/m} + N_2 1{N_1 < n/m}] ≥ n/m + c √(n/m)

for an absolute constant c > 0.

Case k > 1: For a fixed S, and hence a fixed f̂ = A_small(S), for each block B_i and each l ∈ {1, ..., m, ⊥}, let N^S_{i,l} be the number of examples x_j with j ∈ B_i such that c_j = l and κ_U(f̂)(x_j) ≠ ⊥, and let N^S_{i,⊥} be the number with κ_U(f̂)(x_j) = ⊥. Since whether κ_U(f̂)(x) equals ⊥ depends only on the distribution of x, (N^S_{i,1}, ..., N^S_{i,m}, N^S_{i,⊥}) follows a multinomial distribution with parameters (|B_i|; C(S)/m, ..., C(S)/m, 1 − C(S)), where C(S) = Pr{κ_U(f̂)(x) ≠ ⊥}.

Algorithm 2 A_small (k > 1)
1. Divide [n] into k − 1 blocks B_1, ..., B_{k−1} such that n/(2(k−1)) ≤ |B_i| ≤ n/(k−1) for each i.
2. For 1 ≤ i ≤ k, define the query f_i satisfying:
   f_i(x_j) = 1 if j ∈ B_1 ∪ B_2 ∪ ... ∪ B_{i−1}, and f_i(x_j) = 2 otherwise, for x_j ∈ X_S;
   f_i(x) = f_i(x_{j*}), where j* = arg min_{j∈[n]} d_p(x, x_j), for x ∉ X_S.
3. Output f̂, which predicts x_j by
   f̂(x_j) = 1 for all j ∈ B_i, if Acc_U(f_{i+1}; S) ≥ Acc_U(f_i; S), and f̂(x_j) = 2 for all j ∈ B_i otherwise,
   and predicts the rest of x by f̂(x) = f̂(x_{j*}), where j* = arg min_{j∈[n]} d_p(x, x_j).
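For concreteness, here is a minimal runnable sketch of A_small for k > 1 (our own illustrative code; the robust accuracy oracle is abstracted as a callback, and the array handling and helper names are our own assumptions). It builds the block queries of Algorithm 2, extends them to all of X by nearest neighbor, and assembles the output predictor from the observed robust accuracies.

```python
import numpy as np

def a_small(X_S, k, acc_oracle, p=2):
    """Sketch of Algorithm 2 (A_small, k > 1).

    X_S:        (n, d) array of test features (known to the attacker).
    acc_oracle: callable mapping a classifier f to Acc_U(f; S), i.e. one
                robust accuracy query (its implementation is outside this sketch).
    Returns the output classifier f_hat: R^d -> {1, 2}.
    """
    X_S = np.asarray(X_S, dtype=float)
    n = len(X_S)
    blocks = np.array_split(np.arange(n), k - 1)           # k-1 blocks of near-equal size

    def nn_extend(labels_on_S):
        def f(x):
            d = np.linalg.norm(X_S - np.asarray(x, float), ord=p, axis=1)
            return int(labels_on_S[np.argmin(d)])
        return f

    # query f_i labels blocks B_1..B_{i-1} with 1 and everything else with 2
    def make_query(i):
        labels = np.full(n, 2)
        for b in blocks[: i - 1]:
            labels[b] = 1
        return nn_extend(labels)

    accs = [acc_oracle(make_query(i)) for i in range(1, k + 1)]

    # pick the better of labels {1, 2} for each block from consecutive accuracies
    out_labels = np.full(n, 2)
    for i, b in enumerate(blocks, start=1):
        if accs[i] >= accs[i - 1]:
            out_labels[b] = 1
    return nn_extend(out_labels)
```

The `acc_oracle` argument can be instantiated, for instance, with a wrapper around the `robust_accuracy_query` sketch from Section 2.1.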
Then f̂ robustly and correctly predicts max{N^S_{i,1}, N^S_{i,2}} examples in B_i, and hence h_U(A_small; S) = (1/n) Σ_{i=1}^{k−1} max{N^S_{i,1}, N^S_{i,2}}. We will show below that

E[max{N^S_{i,1}, N^S_{i,2}}] ≥ C(S)|B_i|/m + c_1 √(C(S)|B_i|/m)

for an absolute constant c_1 > 0. By summing over the blocks, we can lower bound the expected total number of robustly correct predictions made by A_small:

Σ_{i=1}^{k−1} E[max{N^S_{i,1}, N^S_{i,2}}] ≥ C(S) n/m + c_1 (k−1) √(C(S) n/(2(k−1)m)) ≥ C(S) ( n/m + c_1 √((k−1)n/(2m)) ).

To obtain a lower bound on C(S) that is independent of the choice of S, for each S we define g_S satisfying

g_S(x) = c_i if x = x_i ∈ X_S, and g_S(x) = g_S(x_{j*}) with j* = arg min_{i∈[n]} d_p(x, x_i) otherwise,

and let H_n = {g_S : S ∈ (supp(D_X) × Y)^n}. It is easy to see that A_small(S) ∈ H_n for all S. Hence, by the definition of Φ_DX(n),

Φ_DX(n) ≤ min_{g_S ∈ H_n} Pr{κ_U(g_S)(x) ≠ ⊥} ≤ C(S).

We conclude that

Σ_{i=1}^{k−1} E[max{N^S_{i,1}, N^S_{i,2}}] ≥ Φ_DX(n) ( n/m + c_1 √((k−1)n/(2m)) ).

Normalizing by n proves the desired result.

It remains to lower bound E[max{N^S_{i,1}, N^S_{i,2}}]. Since N^S_{i,1} and N^S_{i,2} each follow a binomial distribution with parameters (|B_i|; C(S)/m), we have E[N^S_{i,1} + N^S_{i,2}] = 2C(S)|B_i|/m for all S. Let N' be an independent copy of N^S_{i,2}. Since N^S_{i,1} and N^S_{i,2} are negatively correlated,

E[|N^S_{i,1} − N^S_{i,2}|] ≥ E[|N^S_{i,1} − N'|] ≥ E[|N^S_{i,1} − C(S)|B_i|/m|] ≥ c_0 √( E[(N^S_{i,1} − C(S)|B_i|/m)²] ) ≥ 2c_1 √(C(S)|B_i|/m),

where the second inequality follows from Jensen's inequality since N' is independent of N^S_{i,1} and has mean C(S)|B_i|/m, and c_0, c_1 > 0 are absolute constants. Then we conclude that

E[max{N^S_{i,1}, N^S_{i,2}}] = E[(N^S_{i,1} + N^S_{i,2})/2] + E[|N^S_{i,1} − N^S_{i,2}|]/2 ≥ C(S)|B_i|/m + c_1 √(C(S)|B_i|/m),

which completes the proof.

We then present A_big(C) and its theoretical analysis for k = Ω(n/m).

Algorithm 3 A_big(C)
1. Let t := 1 + Ck/(9 ln m).
2. Define queries f_1, ..., f_k satisfying:
   (a) For 1 ≤ j ≤ t, the sequence (f_1(x_j), ..., f_k(x_j)) is chosen uniformly at random from all sequences in [m]^k in which each element of [m] appears exactly k/m times.
   (b) For j > t, f_i(x_j) = ⊥ for all i = 1, ..., k.
   (c) For x ∉ X_S, f_i(x) = f_i(x_{j*}) for each i ∈ [k], where j* = arg min_{j∈[n]} d_p(x, x_j).
3. Output f̂ such that
   f̂(x_j) = arg max_{y∈[m]} Σ_{i: f_i(x_j)=y} Acc_U(f_i; S) (breaking ties randomly) for 1 ≤ j ≤ t, and f̂(x_j) = 1 for j > t,
   and f̂ predicts the rest of x by f̂(x) = f̂(x_{j*}), where j* = arg min_{j∈[n]} d_p(x, x_j).
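The query construction of Algorithm 3 can be sketched as follows (our own illustrative code; the accuracy oracle is again abstracted, the always-wrong output ⊥ is encoded as 0, and we assume m divides k as in step 2(a)).

```python
import numpy as np

def a_big(X_S, k, m, C, acc_oracle, p=2, rng=None):
    """Sketch of Algorithm 3 (A_big(C)). Assumes m divides k."""
    assert k % m == 0
    rng = np.random.default_rng(rng)
    X_S = np.asarray(X_S, dtype=float)
    n = len(X_S)
    t = min(n, 1 + int(C * k / (9 * np.log(m))))   # step 1 (capped at n as a guard)

    def nn_extend(labels_on_S):
        def f(x):
            d = np.linalg.norm(X_S - np.asarray(x, float), ord=p, axis=1)
            return int(labels_on_S[np.argmin(d)])
        return f

    # step 2(a): for each of the first t points, a balanced random label sequence;
    # step 2(b): the always-wrong output (encoded as 0) beyond the first t points
    balanced = np.repeat(np.arange(1, m + 1), k // m)
    Q = np.zeros((k, n), dtype=int)
    for j in range(t):
        Q[:, j] = rng.permutation(balanced)

    accs = np.array([acc_oracle(nn_extend(Q[i])) for i in range(k)])

    # step 3: accuracy-weighted plurality vote on each of the first t points,
    # label 1 on the remaining points
    out = np.ones(n, dtype=int)
    for j in range(t):
        scores = np.array([accs[Q[:, j] == y].sum() for y in range(1, m + 1)])
        out[j] = 1 + int(np.argmax(scores))        # ties resolved by the smallest label here
    return nn_extend(out)
```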
Theorem 4. Let k > 18m log m / Φ_DX(n) and let D^n_m = D_X^n × µ_m^n. Then

h_U(A_big(Φ_DX(n)); D^n_m) ≥ Φ_DX(n)/m + Φ_DX(n) k/(144 n log m).

Proof. For l ∈ [m], let A_l be the total number of robustly and correctly predicted examples over all the queries that robustly predict the first example as l, i.e.

A_l ≜ n Σ_{i: ∀x'∈U(x_1), f_i(x')=l} Acc_U(f_i; S)
    = Σ_{i: ∀x'∈U(x_1), f_i(x')=l} Σ_{j=1}^n 1{∀x'∈U(x_j): f_i(x') = c_j}
    = W_0 1{l = c_1} + Σ_{i: ∀x'∈U(x_1), f_i(x')=l} Σ_{j=2}^n 1{∀x'∈U(x_j): f_i(x') = c_j},   (3)

where W_0 is the number of queries that predict the whole perturbation set U(x_1) as c_1 (by the balanced construction in step 2(a), there are k/m queries with f_i(x_1) = c_1). Let

M_l ≜ Σ_{i: ∀x'∈U(x_1), f_i(x')=l} Σ_{j=2}^n 1{∀x'∈U(x_j): f_i(x') = c_j};

then for l ≠ c_1, A_{c_1} − A_l = W_0 + M_{c_1} − M_l. Since the indicators 1{∀x'∈U(x_j): f_i(x') = c_j} are independent for j ≠ j' and negatively associated across i for any fixed j, the Chernoff bound gives, for ε < 1,

Pr{ |M_l − E[M_l]| > ε E[M_l] } ≤ 2 exp( −ε² E[M_l]/3 ).

Suppose ε satisfies ε E[M_l] ≤ k/(2m) and ε² E[M_l]/3 ≥ 3 log m. Then by Eq. (3) and the union bound, arg max_{l∈[m]} A_l = c_1, and hence f̂(x_1) = c_1, with probability at least 3/4.

We now derive the bounds on ε. To this end, we first need to bound E[M_l]. Define H_t analogously to H_n in the proof of Theorem 3, and let H̃_t denote its corrupted set. For each g ∈ H̃_t, define g': X → Y_⊥ that takes the value ⊥ on U(x_j) for j = t+1, ..., n and coincides with g otherwise, that is,

g'(x) = ⊥ if x ∈ ∪_{j=t+1}^n U(x_j), and g'(x) = g(x) otherwise.

Let G_n be the set of all such g'. It is easy to see that κ_U(f_1), ..., κ_U(f_k) ∈ G_n. Since min_{g'∈G_n} Pr_{x∼D_X}{g'(x) ≠ ⊥} ≥ Φ_DX(n), we have

E[ 1{∀x'∈U(x_j): f_i(x') = c_j} ] ≥ Φ_DX(n)/m for every query f_i and every 2 ≤ j ≤ t.

Now, for each l, by the linearity of expectation,

Φ_DX(n)(t−1)k/m² ≤ E[M_l] ≤ (t−1)k/m².

Thus ε E[M_l] ≤ k/(2m) holds for ε ≤ (k/(2m)) / ((t−1)k/m²) = m/(2(t−1)), and ε² E[M_l]/3 ≥ 3 log m holds for ε ≥ √( 9 m² log m / (Φ_DX(n)(t−1)k) ).

Therefore, we can find a suitable ε whenever

√( 9 m² log m / (Φ_DX(n)(t−1)k) ) ≤ m/(2(t−1)) < 1.

If we choose t = 1 + Φ_DX(n) k/(36 log m) and k > 18 m log m / Φ_DX(n), the aforementioned conditions hold. Note that Pr{κ_U(f̂)(x) ≠ ⊥} ≥ Φ_DX(n), hence the expected number of robustly and correctly predicted labels is at least

(3/4) t + (Φ_DX(n)/m)(n − t) ≥ Φ_DX(n) n/m + Φ_DX(n) k/(144 log m),

and normalizing by n completes the proof.

4 Conclusion

In this work, we study the overfitting bias in the context of robust multiclass learning. We formally define adaptive algorithms in an adversarial setting and analyze the average-case performance that can be achieved by an adaptive algorithm. We derive both upper and lower bounds, which match within logarithmic factors when the number of test samples and the distribution of the data features are fixed.

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant 61976161 and the Fundamental Research Funds for the Central Universities under Grant 2042022rc0016.

References

[1] Leslie Rice, Eric Wong, and J. Zico Kolter. Overfitting in adversarially robust deep learning. In ICML, volume 119 of Proceedings of Machine Learning Research, pages 8093-8104, 2020.
[2] Avrim Blum and Moritz Hardt. The ladder: A reliable leaderboard for machine learning competitions. In ICML, volume 37 of JMLR Workshop and Conference Proceedings, pages 1006-1014, 2015.
[3] Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do ImageNet classifiers generalize to ImageNet? In ICML, volume 97 of Proceedings of Machine Learning Research, pages 5389-5400, 2019.
[4] Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Leon Roth. Preserving statistical validity in adaptive data analysis. In STOC, pages 117-126, 2015.
[5] Vitaly Feldman, Roy Frostig, and Moritz Hardt. The advantages of multiple classes for reducing overfitting from test set reuse. In ICML, volume 97 of Proceedings of Machine Learning Research, pages 1892-1900, 2019.
[6] Jayadev Acharya and Ananda Theertha Suresh. Optimal multiclass overfitting by sequence reconstruction from hamming queries. In ALT, volume 117 of Proceedings of Machine Learning Research, pages 3-21, 2020.
[7] Paul Erdős and Alfréd Rényi. On two problems of information theory. Magyar Tud. Akad. Mat. Kutató Int. Közl, 8:229-243, 1963.
[8] Vasek Chvátal. Mastermind. Combinatorica, 3(3):325-329, 1983.
[9] Nader H. Bshouty. Optimal algorithms for the coin weighing problem with a spring scale. In COLT, 2009.
[10] Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Roth. The reusable holdout: Preserving validity in adaptive data analysis. Science, 349(6248):636-638, 2015.
[11] Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Roth. Generalization in adaptive data analysis and holdout reuse. In NeurIPS, pages 2350-2358, 2015.
[12] Raef Bassily, Kobbi Nissim, Adam D. Smith, Thomas Steinke, Uri Stemmer, and Jonathan R. Ullman. Algorithmic stability for adaptive data analysis. In STOC, pages 1046-1059, 2016.
[13] Vitaly Feldman, Roy Frostig, and Moritz Hardt. Open problem: How fast can a multiclass test set be overfit? In COLT, volume 99 of Proceedings of Machine Learning Research, pages 3185-3189, 2019.
[14] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.
[15] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
[16] Zekai Wang, Tianyu Pang, Chao Du, Min Lin, Weiwei Liu, and Shuicheng Yan. Better diffusion models further improve adversarial training. In ICML, volume 202 of Proceedings of Machine Learning Research, pages 36246-36263, 2023.
[17] Dong Yin, Kannan Ramchandran, and Peter L. Bartlett. Rademacher complexity for adversarially robust generalization. In ICML, volume 97 of Proceedings of Machine Learning Research, pages 7085-7094, 2019.
[18] Omar Montasser, Steve Hanneke, and Nathan Srebro. VC classes are adversarially robustly learnable, but only improperly. In COLT, volume 99 of Proceedings of Machine Learning Research, pages 2512-2530, 2019.
[19] Jingyuan Xu and Weiwei Liu. On robust multiclass learnability. In NeurIPS, 2022.
[20] Zhengyu Zhou and Weiwei Liu. Sample complexity for distributionally robust learning under chi-square divergence. Journal of Machine Learning Research, 24(230):1-27, 2023.
[21] Zekai Wang and Weiwei Liu. Robustness verification for contrastive learning. In ICML, volume 162 of Proceedings of Machine Learning Research, pages 22865-22883, 2022.
[22] Xin Zou and Weiwei Liu. Generalization bounds for adversarial contrastive learning. Journal of Machine Learning Research, 24:114:1-114:54, 2023.
[23] Hanshu Yan, Jiawei Du, Vincent Y. F. Tan, and Jiashi Feng. On robustness of neural ordinary differential equations. In ICLR, 2020.
[24] Xiyuan Li, Xin Zou, and Weiwei Liu. Defending against adversarial attacks via neural dynamic system. In NeurIPS, 2022.
[25] Xinsong Ma, Zekai Wang, and Weiwei Liu. On the tradeoff between robustness and fairness. In NeurIPS, 2022.
[26] Boqi Li and Weiwei Liu. WAT: Improve the worst-class robustness in adversarial training. In Brian Williams, Yiling Chen, and Jennifer Neville, editors, AAAI, pages 14982-14990, 2023.
[27] Yao-Yuan Yang, Cyrus Rashtchian, Hongyang Zhang, Ruslan Salakhutdinov, and Kamalika Chaudhuri. A closer look at accuracy vs. robustness. In NeurIPS, 2020.
[28] Daniel Cullina, Arjun Nitin Bhagoji, and Prateek Mittal. PAC-learning in the presence of adversaries. In NeurIPS, pages 228-239, 2018.