# probably_approximately_global_robustness_certification__e8ff650f.pdf

Probably Approximately Global Robustness Certification

Peter Blohm 1 Patrick Indri 1 Thomas G artner 1 Sagar Malhotra 1

We propose and investigate probabilistic guarantees for the adversarial robustness of classification algorithms. While traditional formal verification approaches for robustness are intractable and sampling-based approaches do not provide formal guarantees, our approach is able to efficiently certify a probabilistic relaxation of robustness. The key idea is to sample an ϵ-net and invoke a local robustness oracle on the sample. Remarkably, the size of the sample needed to achieve probably approximately global robustness guarantees is independent of the input dimensionality, the number of classes, and the learning algorithm itself. Our approach can, therefore, be applied even to large neural networks that are beyond the scope of traditional formal verification. Experiments empirically confirm that it characterizes robustness better than state-of-the-art sampling-based approaches and scales better than formal methods.

1. Introduction

We present a novel sampling-based procedure to certify the global robustness of a classifier with high probability. Existing robustness verification methods are decision procedures that provide yes/no answers for a given robustness criterion, for a specific point in the input space. In general, however, robustness is a quantitative property that may depend on additional factors, such as the classifier s confidence in a given prediction. As a key contribution, we propose and investigate a novel notion of global robustness that quantifies the robustness of any point, given its prediction confidence. Our guarantee is obtained by checking local robustness on a sufficiently large sample of points. The size of the sample is quantified using a learning-theoretic construct, namely ϵ-nets (Haussler & Welzl, 1986). Our approach is agnostic to the specific method used to perform local robustness

1TU Wien, Austria. Correspondence to: Peter Blohm <peter.blohm@tuwien.ac.at>.

Proceedings of the 42 nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s).

checks and, therefore, to the precise notion of robustness investigated. In this paper, we will consider both formal and adversarial methods to check local robustness.

Neural Network (NN) robustness is a major desideratum as, for non-robust networks, predictions can be drastically changed with only small perturbations to the input (Szegedy et al., 2014). This behavior can be detrimental in safetycritical applications like autonomous driving (Rao & Frtunikj, 2018) or image recognition tasks (Athalye et al., 2018). Hence, methods for evaluating whether NNs are robust to such perturbations are critical to enable their safe and reliable deployment. Significant research efforts have been invested to assess NN robustness, following two main lines of research:

Formal verification methods have been used to provably certify local robustness of NNs (Katz et al., 2019; Wu et al., 2024; Xu et al., 2021). Recent results have extended these techniques to global robustness certification (Athavale et al., 2024) albeit only for NNs with up to a few hundred parameters.

Adversarial methods rely on local optimization to assess whether NNs can withstand adversarial attacks (Goodfellow et al., 2015; Madry et al., 2018; Carlini & Wagner, 2017a). These methods easily scale to large networks, but generally do not provide formal guarantees on the global behavior of the network.

Our approach provides a quantitative characterization of robustness for the whole input space parameterized by the prediction confidence of each point. That is, for each point in the input space we are able to give a high-probability lower bound for its robustness. This only requires independently and identically distributed (iid) samples from the data distribution and access to a local robustness oracle. Such an oracle can be efficiently modeled using existing (formal or adversarial) methods for assessing local robustness. The sample size required by our approach is independent of the input dimensionality, the number of classes, and the learning algorithm itself. Our approach can provide distinct robustness guarantees for each confidence value after a single sampling procedure. This characterization can be used to infer high-probability lower bounds on the robustness of new data points, that were not part of the sampling procedure.

Probably Approximately Global Robustness Certification

This paper is organized as follows. In Section 2, we present the relevant related work. In Section 3, we introduce the preliminaries. Our main contributions are described in Section 4 and Section 5. We present and discuss our experimental evaluation in Section 6. Finally, Section 7 contains concluding remarks.

2. Related Work

After the seminal work by Szegedy et al. (2014) showed that NNs are sensitive to adversarial examples, a number of techniques to find such examples were introduced. Many such approaches rely on gradient computation such as the Fast Gradient Signed Method (FGSM, Goodfellow et al., 2015) and an iterative adaptation of it called Projected Gradient Descent (PGD, Madry et al., 2018). Additionally, the C&W attack (Carlini & Wagner, 2017a) explicitly takes the distance to the original data point into account, to find a particularly close adversarial example. More recently, empirical approaches to assess (Webb et al., 2019; Baluta et al., 2021; Kim et al., 2023) as well as improve (Li et al., 2023; Kim et al., 2023) the robustness of NNs have been introduced, often building from the concept of adversarial training introduced by Goodfellow et al. (2015). While these approaches are applicable to large NNs and can be used to empirically assess robustness, they do not provide theoretical robustness guarantees for new points.

In an effort to obtain provable guarantees, a line of research has developed formal methods for the verification of NNs (Katz et al., 2019; Chen et al., 2021; Xu et al., 2020). While a formally verified NN is provably robust to input perturbations (Casadio et al., 2022; Meng et al., 2022), formal verification is limited to small NNs. In fact, it is generally hard to provide guarantees about the behavior of large networks (Katz et al., 2017) and it is, in particular, hard to detect all adversarial examples (Carlini & Wagner, 2017b). Other approaches have sought to provide approximate (Wu et al., 2020) or statistical (Webb et al., 2019; Cohen et al., 2019; Lecuyer et al., 2019) robustness guarantees, as well as to train NNs which are certifiably robust (Li et al., 2023; Sinha et al., 2018; Wang et al., 2018).

Recent work has specifically addressed global robustness for NNs. Athavale et al. (2024) and Indri et al. (2024) have recently extended existing formal verification methods to certify global robustness for confident predictions, where confidence is quantified using the softmax function. Similarly, Kabaha & Cohen (2024) focus on confident predictions using a margin-based notion of confidence. However, these methods are limited to NNs with only hundreds of parameters. In this paper, we introduce a probabilistic relaxation of the notion of global robustness of Athavale et al. (2024). We use ϵ-nets (Haussler & Welzl, 1986) to provide high-probability guarantees for this notion of robustness.

3. Preliminaries

In this section, we will introduce basic notation and the necessary background for our theory. We will further introduce and motivate the notion of a robustness oracle and prediction confidence. These abstract concepts will allow our results to be easily adapted to different methods of assessing robustness, as shown in our experiments in Section 6.

3.1. Basic Notation

Consider a classification task with n classes on a metric input space X. We define a classifier as f : X Rn, where for any instance x X the classifier returns a vector f(x) Rn. In this vector, the i-th component f(x)i represents the classifier s output for class i, like the output layer in a NN. We say that the predicted class for x is class(x) = arg maxi [1,n] f(x)i. Furthermore, we say conf f(x) R is the prediction confidence of f for class(x). In the rest of this paper, we will assume that for a class class(x) = c the confidence is given by the softmax function, i.e.,

conf f(x) = exp(f(x)c) P j [1,n] exp(f(x)j) (1)

without restricting the generality of our statements. We assume that the data for the classification task follows a distribution D over X. We write X D to indicate that X is a random variable drawn from D, and x D to denote an observed realization of X. If clear from context, probabilities Pr D that are computed with respect to a distribution D will be written as Pr.

3.2. Robustness

We say a classifier f is robust around a point x, if class(f(x)) remains constant within some neighborhood N(x) X of x. In many applications, N(x) is chosen to be an L -ball (Goodfellow et al., 2015; Madry et al., 2018) or another norm-bounded ball (Baluta et al., 2021; Carlini & Wagner, 2017a; Leino et al., 2021; Athavale et al., 2024).

How robust f is around x is assessed by a robustness oracle robf(x). The robustness oracle robf(x) returns the distance ρ to the closest counterexample x N(x) s.t. class(f(x )) = class(f(x)). Examples of robustness oracles robf include verification tools (Katz et al., 2017; Meng et al., 2022; Xu et al., 2021; Athavale et al., 2024), which offer exact results at high computational costs. For such exhaustive search methods, we can state robf(x) = minx N(x){ x x : class(f(x)) = class(f(x ))}. In contrast, heuristic or adversarial methods (Goodfellow et al., 2015; Madry et al., 2018; Carlini & Wagner, 2017a) can efficiently find very close points with different class membership, which can be used as an upper bound for ρ. We will further discuss practically useful robustness oracles based on well-established methods in Section 6.

Probably Approximately Global Robustness Certification

The following definition captures this oracle-based notion of robustness around a given point.

Definition 3.1 (Local ρ-robustness). A classifier f : X Rn is locally ρ-robust around x according to the robustness oracle robf if robf(x) ρ. (2)

Based on this definition, we need to define a notion of global robustness for a given classifier f, so we can guarantee the classifier will behave robustly regardless of the input. A natural definition of global ρ-robustness would require local ρ-robustness for every possible input. However, enforcing this for all inputs in dense spaces like Rm would constrain the classifier to make constant predictions. To address this, a more useful notion of global robustness focuses on specific regions of interest, such as areas where the classifier s prediction confidence conf f(x) is high (Leino et al., 2021; Athavale et al., 2024). More confident predictions require larger changes in the output of f to alter the predicted class, so we consequently expect points classified with high confidence to be more robust. The following definition formalizes global robustness, based on a confidence threshold κ.

Definition 3.2 (Global ρ-κ-robustness). A classifier f : X Rn is globally ρ-κ-robust if it is locally ρ-robust for all x X where the prediction confidence conf f(x) is larger than κ.

x X : conf f(x) κ = robf(x) ρ. (3)

Definition 3.2 is equivalent to the following statement, which states the absence of counterexamples:

x X : robf(x) < ρ conf f(x) κ. (4)

Equation (4) highlights that a classifier is certified to be robust if no counterexample exists, that is, if no point with both high confidence and insufficient robustness is found by the oracle. However, certifying robustness in this manner is intractable in general (Katz et al., 2017). In this paper, we focus instead on bounding the probability of encountering counterexamples.

3.3. Epsilon Nets

Our robustness guarantees build on concepts from computational geometry and learning theory, and our definitions are based on Mitzenmacher & Upfal (2017).

Definition 3.3 (Range space). Let Y be a (possibly infinite) set and R a family of subsets of Y called ranges. A range space is a tuple (Y, R).

Definition 3.4 (VC Dimension, Vapnik & Chervonenkis (2015)). Let (Y, R) be a range space. The Vapnik Chervonenkis (VC) dimension of (Y, R) is the size of the

largest finite set S Y such that for every subset T S, there exists a range R R satisfying R S = T. If no such maximum exists, the VC dimension is infinite.

Definition 3.5 (ϵ-net, Haussler & Welzl (1986)). Let (Y, R) be a range space and D be a probability distribution on Y. A (finite) set N Y is an ϵ-net for Y with respect to D if, for every set R R such that Pr(R) ϵ, the set R contains at least one point from N, i.e.,

R R : Pr(R) ϵ = R N = . (5)

ϵ-nets provide a general notion of coverage. Our approach for certifying PAG-robustness relies on the following theorem for constructing ϵ-nets from iid samples.

Theorem 3.6 (ϵ-nets from iid samples (Mitzenmacher & Upfal, 2017)). Let (Y, R) be a range space with VC dimension d and let DY be a probability distribution on Y. For any 0 < δ, ϵ 1

2, an iid sample from DY of size s is an ϵ-net for Y with probability at least 1 δ if

While the big-O notation of this result is convenient for theoretical asymptotic analyses, it neglects constant factors that may be significant in practice. In their proof, however, Mitzenmacher & Upfal (2017) show that Theorem 3.6 always holds if s satisfies the inequality in the following restatement.

Proposition 3.7 (ϵ-nets from iid samples with constants). Let (Y, R) be a range space with VC dimension d and let D be a probability distribution over Y. For any 0 < ϵ, δ < 1

2, an iid sample from DY of size s is an ϵ-net for Y with probability at least 1 δ if

δ + d ln(2s) ln 1 e sϵ/8 . (7)

Proof. See Appendix A.

We refer to the smallest integer s that satisfies the inequality in Proposition 3.7 as s(ϵ, δ, d), and obtain it by computing

δ + d ln(2s) ln 1 e sϵ/8 .

To obtain sample complexities, we find s(ϵ, δ, d) from Equation (8) using numerical methods.

4. PAG Robustness

In this section, we present our main theoretical results. We show how to get high-probability certificates for the following relaxation of global ρ-κ-robustness (Definition 3.2).

Probably Approximately Global Robustness Certification

Definition 4.1. Given a data-distribution D, a classifier f : X Rn is approximately globally robust according to a robustness oracle robf if, for a given confidence κ, robustness ρ, and X D, we have that

Pr robf(X) < ρ | conf f(X) κ < ϵ. (9)

where 0 < ϵ < 1 is a chosen parameter.

Our relaxation states that all κ-confident predictions of f will also be at least ρ-robust, with high probability. This probabilistic statement is inspired by the work of Athavale et al. (2024), where the confidence threshold is introduced as a mechanism for f to abstain from prediction. A key difference to their work is that we are able to decide the probabilistic statement for all values of (ρ, κ) simultaneously. For this, we devise on a sampling-based methodology using ϵ-nets to obtain the bound in Equation (9).

We rephrase the left-hand side of Equation (9) as

Pr robf(X) < ρ conf f(X) κ

Pr conf f(X) κ (10)

to then separately provide a lower bound for the numerator (using Proposition 4.2) and an upper bound for the denominator (using Lemma 4.3). Both of these bounds can then be realized with the same iid sample. In the rest of this section, we introduce a set of novel definitions required for our results and then proceed with the bounds themselves.

4.1. Quality Space

An important aspect of our method is that our guarantees are independent of the properties of the input space and of the classifier f. This is because our guarantees only consider the two properties of confidence and robustness of a given point in the input space. In this section, we formalize this idea. We define a function q that maps a given point x X to its robustness-confidence tuple:

q(x) 7 (robf(x), conf f(x)). (11)

As q(x) captures precisely the characteristics, or qualities, of each point x that are relevant for our guarantees, it is useful to introduce the concept of a quality space Q. The quality space is a 2-dimensional real space defined by the outputs of the map q. For a given pair (ρ, κ), we say x is a counterexample to global ρ-κ-robustness if q(x) R(ρ, κ), where

R(ρ, κ) := {(ρ , κ ) R2 : ρ < ρ κ κ}. (12)

The quality space allows us to easily define and detect whether f is robust or not. Each R(ρ, κ) corresponds to

Quality space Q

Figure 1. Visualization of the quality space Q and of the region of counterexamples defined in Equation (12) for two possible pairs of robustness-confidence values (ρ, κ) and ( ρ, κ).

the intersection of two-axis aligned half-spaces in Q, as illustrated in Figure 1.

We define the family of these counterexample ranges over all possible (ρ, κ) as

R = {R(ρ, κ) : (ρ, κ) R2}. (13)

Under a data distribution D, we have

Pr(R(ρ, κ)) = Pr q(X) R(ρ, κ)

= Pr robf(X) < ρ conf f(X) κ . (14)

Given a set N X, we use q(N) to denote the set q(N) = {q(x) Q : x N}. With these concepts, we can obtain sample complexities independent from the dimensionality of X. We obtain an iid sample N in X, map it to Q, and interpret q(N) as ϵ-net over the range space (Q, R). This range space is equivalent to the intersection of two axisaligned half-spaces in R2, with a VC-dimension d = 2.

4.2. Bounding the Joint Probability

In this section, we define a method that can check under which conditions and for which (ρ, κ) the inequality

Pr robf(X) < ρ conf f(X) κ < ϵ (15)

holds true. For our guarantee, we consider an iid sample N from the data distribution D. If our iid sample is an ϵ-net, it intersects all ϵ-likely ranges. Therefore, if no counterexample to ρ-κ-robustness is found in an ϵ-net over (Q, R), then R(ρ, κ) is known to be less than ϵ-likely. The following proposition formalizes this idea.

Proposition 4.2. Let D be a data distribution and f : X Rn be a classifier. Given an ϵ-net q(N) over the range space (Q, R), then for a given confidence κ and robustness radius ρ, we have that

q(N) R(ρ, κ) = = Pr R(ρ, κ) < ϵ (16)

Probably Approximately Global Robustness Certification

where R(ρ, κ) denotes the set of counterexamples to ρ-κrobustness.

Proof. As q(N) is an ϵ-net, it holds that

R R : Pr(R) ϵ = q(N) R = . (17)

By contraposition, this statement is equivalent to

R R : Pr(R) < ϵ = q(N) R = . (18)

Then, q(N) R(ρ, κ) = = Pr(R(ρ, κ)) < ϵ.

This result allows us to bound the numerator of Equation (9). Note that Proposition 4.2 allows us to give guarantees on the behavior of f. However, a bound on the joint probability (alone) may lead to vacuous guarantees. In fact, the probability in Equation (15) is trivially zero if we consider a confidence value κ large enough such that Pr(conf f(X) κ) = 0. For this reason, we require additional information about Pr(conf f(X) κ).

4.3. Bounding the prediction confidence

In this section, we provide a method to lower bound the probability of obtaining a given prediction confidence as

Pr(conf f(X) κ) pmin. (19)

Equation (19) is implied by the following statement about the cumulative distribution function of confidence:

Pr(conf f(X) κ) 1 pmin (20)

The following auxiliary lemma provides a general method to obtain a bound for a given quantile of a random variable from an iid sample. Lemma 4.3. Let K be a real-valued random variable, and N be an iid sample of K with |N| = s. Denote with N(i) R the ith-largest element in the sample. Then for parameters 1 > p 1

2 and 0 < δ < 1

2, with probability of at least 1 δ, we have that Pr(K N(i)) p (21)

for any integer i such that

Proof sketch. The number of elements in an iid sample that are smaller or equal to the p-quantile Kp of K is a random variable I Binom(s, p). We are interested in finding the largest integer i such that N(i) Kp holds with probability at least 1 δ and, therefore, the largest integer i such that Pr(I i) 1 δ. We use a Chernoff bound for the deviation of I from its expected value E[I] = sp to obtain an integer i that is a high-probability lower bound for I.

We use Lemma 4.3 with K = conf f(X) and p = 1 pmin to obtain κ values for which

Pr(conf f(X) > κ) pmin. (23)

As Pr(conf f(X) κ) Pr(conf f(X) > κ), we arrive at Equation (19). For compactness, we use

i(s, p, δ) := max i N

to refer to the largest i which respects the bound in Equation (22). In our sample, we refer to the confidence value corresponding to this i as κmax. For confidence values κ κmax, Lemma 4.3 allows us to state with high probability that Pr(conf f(X) > κ) pmin. We cannot, however, make a statement about confidence values that were not observed frequently enough in the sample, i.e., for κ > κmax. This upper bound on confidence values being certified is natural, as our statement relies only on the observed sample.

4.4. PAG Robustness

We have now introduced all necessary tools and conditions to certify a classifier f to be approximately globally robust. So far, we assumed an ϵ-net in the quality space Q as given. We now discuss how to obtain such an ϵ-net using Theorem 3.6 to derive our main contribution: an iid samplingbased procedure that allows us to certify the approximate global robustness of a classifier f with high probability.

Theorem 4.4 (PAG robustness). Let D be a data distribution, f : X Rn be a classifier and robf be a local robustness oracle. With q(x) defined as in Equation (11), with parameters 0 < ϵ, pmin, δ < 1

2. Consider a sample N D with |N| s(ϵ, δ/2, 2) as per Equation (8) and an integer i = i(|N|, 1 pmin, δ/2) as per Equation (24). Then, with a probability of at least 1 δ the following implication holds for all ρ and for all κ N(i):

q(N) R(ρ, κ) = =

Pr robf(X) < ρ | conf f(X) κ < ϵ pmin . (25)

That is, under the absence of counterexamples in N, the classifier f can be certified to be Probably Approximately Globally (PAG) robust.

We present a proof of Theorem 4.4 that builds on the following lemma, which reformulates Proposition 4.2 as conditional probability.

Lemma 4.5. Let X D be a random data point in X, f : X Rn be a classifier, and parameters 0 < ϵ, pmin 1

2 be a parameter. Given an ϵ-net q(N) over the range space

Probably Approximately Global Robustness Certification

(Q, R), for all ρ, κ such that Pr(conf f(X) κ) pmin, it holds that q(N) R(ρ, κ) = =

Pr robf(X) < ρ | conf f(X) κ < ϵ pmin . (26)

Proof. Let us define, for brevity, the two random events Er = {robf(X) < ρ} and Ec = {conf f(X) κ}. By Proposition 4.2 we know

q(N) R(ρ, κ) = = Pr(Er Ec) < ϵ. (27)

Furthermore, by the definition of conditional probability

Pr(Er | Ec) = Pr(Er Ec)

Pr(Ec) . (28)

When Pr(Ec) = Pr(conf f(X) κ) pmin, it holds that

Pr(Er | Ec) Pr(Er Ec)

pmin < ϵ pmin , (29)

which completes the proof.

Proof of Theorem 4.4. We assume that |N| s(ϵ, δ/2, 2). Theorem 3.6 then implies that q(N) is an ϵ-net over (Q, R) with a probability of at least 1 δ/2. We further assert i i(|N|, 1 pmin, δ/2). Lemma 4.3 then implies that Pr(conf f(X) N(i)) pmin with a probability of at least 1 δ/2 as well. We use the union bound to state that both conditions hold true with a probability of at least 1 δ. Given that both N(i) is a valid quantile bound, i.e., Pr(conf f(X) N(i)) pmin, and κ N(i), then Pr(conf f(X) κ) pmin. The theorem then follows from Lemma 4.5.

In some settings, it might not be possible to directly sample from the data distribution, but rather from a distribution D

that is used to approximate the true distribution D. Even in these settings we can transfer our guarantee, as long as we can quantify the difference between the two distributions.

Theorem 4.6 (PAG under distribution shift). Let D be a data distribution, f : X Rn be a classifier and robf be some robustness oracle. Let D be a distribution such that TV(D, D ) = Λ. Consider an iid sample N D s and parameters ϵ, δ and pmin. Consider an s that satisfies the conditions in Theorem 4.4 for D . Then, with probability at least 1 δ q(N) R(ρ, κ) = =

Pr D robf(X) < ρ | conf f(X) κ < ϵ + Λ pmin Λ.

(30) That is, in the absence of counterexamples f can be certified to be PAG robust with respect to the data distribution D.

Proof sketch. The data processing inequality (Raginsky, 2014) can be used to show (Lemma A.2) that Λ is an upper bound for the total variation distance of the distribution of instances sampled from D and D when they are mapped into the quality space Q. The numerator in the right-hand side can be derived by showing (Lemma A.3) that an ϵ-net under D is an (ϵ + Λ)-net under D. The denominator can be derived by observing (Proposition A.5) that the quantiles of D and D can differ by at most Λ. The theorem follows (Appendix A) from these results.

In the following section, we describe how to use the theoretical results obtained so far and practically derive robustness lower-bounds for a given classifier.

5. Robustness Lower-Bounds

In the previous section, we described how to provide global robustness guarantees given an iid sample N is found to be robust. Specifically, for a given choice of parameters ϵ and δ, we bound the required size of N. This single sample N can be used to evaluate the robustness with all tuples (ρ, κ). A lower-bound on the robustness of a point x with prediction confidence κ = conf f(x) can be obtained using the smallest observed robustness for a confidence of at least κ in the sample N. We define a mapping function M(κ) 7 ρ that maps a given confidence κ to a corresponding robustness lower bound ρ:

min x N robf(x)

s.t. conf f(x) κ

UNDEFINED else

Equation (31) assures that no counterexample of ρ-κrobustness is in N for ρ = M(κ), and that the returned ρ satisfies all conditions of Theorem 4.4. The mapping function M(κ) does not return a robustness lower-bound for confidence values larger than κmax. In this case, for the considered sample size the uncertainty for quantile estimation is too large to guarantee Pr(conf f(X) κmax) > 0, as described in Lemma 4.3. Note that, for κ > κmax, we can nevertheless guarantee that Pr robf(X) < ρ conf f(X) κ < ϵ. The map M can be constructed from a sample N in O(|N| log(|N|)) time, as detailed in Algorithm 1.

For a given mapping M, we now discuss how likely it is that our guarantees hold across all confidence values. This is not directly addressed by our analysis so far and requires to bound

Pr robf(X) < M conf f(X) . (32)

We can obtain a bound on this probability as a direct consequence of Proposition 4.2. In this sense, the ρ-κ-mapping

Probably Approximately Global Robustness Certification

Quality space Q

Figure 2. Visualization of the robustness lower-bound map M(κ). Each rectangular region under the curve has a probability mass smaller than ϵ. The yellow line represents the maximum confidence value above which M(κ) is undefined.

M is not only useful to obtain robustness lower-bounds for individual points, but can furthermore be used to bound the probability that for a point x sampled from D, it holds that robf(x) < M(conf f(x)).

Proposition 5.1. Consider a classifier f : X Rn and a ρ-κ-mapping M constructed from an ϵ-net q(N) as in Equation (31). Let |M| be the size of the codomain of M. Then

Pr robf(X) < M conf f(X) < |M|ϵ. (33)

Proof. We use the properties of ϵ-nets together with a union bound based on the size of the codomain of the mapping M. As M consists, by construction, of |M| discrete steps, |M| ranges fully cover the area in the quality space below M. In the worst case, we have to assume their probability mass is additive and proceed with a union bound, proving our statement.

In practical settings, |M| is typically small; we empirically demonstrate this in Section 6. We illustrate the practical effectiveness of our method in experiments with NNs in the following section.

6. Experiments

In this section, we investigate the practical aspects of our theory. We aim to show that our results translate into practical settings and aim to answer the following research questions:

RQ1: How can different methods for checking local robustness be modeled as oracles?

RQ2: How well do our guarantees hold on unseen data in realistic, imperfect conditions?

RQ3: How does the runtime of our verification procedure scale based on network size, parameter choices, and oracle?

RQ4: How well can we capture qualitative differences in the behavior of different NNs?

We answer these questions by applying our procedure to the certification of NNs for MNIST (Deng, 2012) and CIFAR10 (Krizhevsky, 2009).

Setup We train four different network architectures (Ioffe & Szegedy, 2015; He et al., 2016; Mirman et al., 2018) on the two classification problems MNIST and CIFAR10, as illustrated in Table 1. For each architecture, we train, in a cross-validation setup, five instances of the network with standard training and five instances with TRADES (Zhang et al., 2019). We then use the respective validation split of the data to produce our guarantee: we imitate iid samples from the true data distribution by sampling with Gaussian noise from the validation data. As this approximation might not represent the true data distribution perfectly, the theoretical strictness of our guarantees might decrease, according to Theorem 4.6. Similar to this, in practical settings, true iid sampling is not always possible, and we investigate whether our guarantees appear to degrade noticeably.

Table 1. Overview of used NN architectures and oracles.

Dataset Architecture Params Oracle

MNIST Feed Forward 39 k PGD, Li RPA Conv Big 1 663 k Li RPA

CIFAR10 Res Net20 272 k PGD VGG11 BN 9 491 k PGD

We repeat the randomized verification procedure multiple times for each instance of the NNs with different robustness oracles. A detailed description of the training and verification procedure is provided in Appendix C.

Robustness Oracles We provide three robustness oracles, and conduct experiments with one based on adversarial techniques and one based on formal methods, as an answer to RQ1. Our adversarial oracle uses PGD (Madry et al., 2018), with a high number of small gradient steps, to report the closest counterexample found. For the experiments with PGD, we set ϵ = 10 4, pmin = 0.01, δ = 0.01 and thus sample s(ϵ, δ/2, 2) = 989534 images. We use PGD to quantify robustness, by performing many small gradient steps from each data point, and we measure the L distance to the first adversarial example.

The two formal robustness oracles we provide are based on the NN verification tools Marabou (Katz et al., 2019) and

Probably Approximately Global Robustness Certification

auto Li RPA1 based on Li RPA Xu et al. (2020) respectively. We will only conduct multiple experiments using auto Li RPA, due to the high computational demand of Marabou in our setting. For both of these tools, we certify local robustness in L hypercubes of fixed distances, and use binary search to find a lower bound of the radius where the model is robust. In our experiments with auto Li RPA, we choose ϵ = 2.5 10 3, pmin = 0.05 and δ = 0.01. We consequently sample s(ϵ, δ/2, 2) = 31635 images.

Evaluation After the sampling procedure is completed, we construct the map M as in Equation (31) from the sampling split of our data. The map M provides the robustness lower bound obtained with the sampling procedure. We are then interested in empirically checking whether the robustness lower bound we obtained holds on the unseen testing split, that is, whether the testing data respects our guarantees. To this end, we define the following estimators. For a given confidence value κ, we estimate the conditional probability Pr(robf(x ) < M(κ) | conf f(x ) κ) for x Dtest, that is, the probability that the testing data does not respect the robustness lower-bound provided by M.

pκ = |{x : robf(x ) < M(κ) conf f(x ) κ}|

|{x : conf f(x ) κ}| ,

(34) where x Dtest. To estimate the worst-case violation of our guarantees, we consider the ˆp = maxκ κmax pκ and check if ˆp ϵ/pmin. We furthermore report

nc = |{x Dtest : robf(x ) < M(conf f(x )}|. (35)

This helps evaluate Proposition 5.1 empirically. As discussed in Proposition 5.1, |M| denotes the number of steps in M, i.e., the size of its codomain. If Proposition 5.1 holds for our sample, we then expect nc |Dtest||M|ϵ. For both MNIST and CIFAR-10, |Dtest| = 104 and thus |Dtest|ϵ = 1. We then expect nc/|M| < 1 for PGD and nc/|M| < 10 for auto Li RPA.

Results Table 2 reports a summary of our experimental evaluation. We perform a total of 230 experiments on 40 NNs, checking our guarantees against the estimators described above. In the majority of trials (211/230) our estimators suggest that our guarantees hold across all confidence values. To answer RQ2, we discuss possible explanations for the 19 observed violations, besides the natural variance of our estimators defined in Equations (34) and (35).

In the few cases where our estimators indicate that our guarantees are violated on unseen data, the models under analysis still follow the overall trend described by our guarantees (see Figure 3), with only slight violations. Considering our results in more detail, we can observe that 5 of

1github.com/Verified-Intelligence/auto Li RPA

the 6 runs with violations for the Feed Forward network on MNIST originate in one single validation data split, as detailed in Appendix D. This could be, for instance, due to a distribution shift between this specific validation split under Gaussian noise and the true data distribution. A similar phenomenon can be observed for Res Net20 on CIFAR10.

With these possible explanations in mind, we observe a remaining total of 5/180 0.028 trials, where our guarantees seem not to generalize to the unseen test data as expected. This empirical fraction is close to our choice of δ = 0.01, and considering our simple approach of estimating the data distribution, our method can be considered relatively resilient to imperfect sampling settings.

To discuss RQ3, we inspect the runtimes in Table 2. We find that our approach is able to effectively scale to large models like the 10 million parameter VGG11 BN with strict parameter values for ϵ, δ and pmin, that provide strong guarantees. The runtime of our procedure consists mainly of the individual local robustness checks, so by relaxing ϵ, δ and pmin, the runtime can be easily adjusted to desired durations.

We plot the map M and the testing split of CIFAR10 with VGG11 BN in Figure 3. To investigate our method on the more challenging examples, we choose to plot the results for one of the 3 experiments for which we observe pκ > 0.01 for some κ. Even in this case, the lower bound provided by M generally holds on test data and describes well the specific behavior of the networks, despite very few (7) exceptions. The bounds also allow for a clear differentiation between the formal robustness of adversarially and normally trained networks, as observable in Figure 4 in Appendix D too, which answers RQ4 positively.

7. Conclusion

In this work, we use ϵ-nets to devise a sampling-based approach to provide probabilistic guarantees on the global robustness of a given classifier. Our approach is agnostic to the specific robustness oracle and can thus be adapted to a variety of existing approaches. Our guarantees are conditioned on the confidence of a prediction to allow for a flexible characterization of robustness. In our experimental evaluation, we used PGD and auto Li RPA to evaluate local robustness and showed how our method: (i) characterizes the behavior of networks trained with both standard and adversarial methods, and (ii) obtains useful global robustness guarantees that transfer effectively to unseen data.

In future work, we will use other informative properties in addition to confidence, such as the predicted class, to further refine our guarantees. Beyond this, more refined robustness oracles and sampling procedures can be used to further improve the scalability and practical resilience of the obtained guarantees. As our approach is agnostic to the

Probably Approximately Global Robustness Certification

Figure 3. Scatter plot of the CIFAR-10 test dataset Dtest, with |Dtest| = 10000, in the quality space Q with VGG11 BN. The left network is trained with standard methods, the right network is trained robustly with TRADES. The red line depicts the lower bound obtained from validation sample N, with the parameters ϵ = 10 4, δ = pmin = 0.01, with |N| = s(ϵ, δ/2, 2) = 989534. On the right-hand side, 5 data points violate the lower bound. Note that, despite this apparent violation, M tightly fits the test data and illustrates the contrasting robustness behaviors of the networks.

Table 2. Summary results for all experiments. We report worst results aggregated over three to five random seeds and over the different data splits used for the TRADES adversarial training. The subscripts indicate standard training (ST) or adversarial training (AT) with TRADES, and if the used oracle is PGD (P) or auto Li RPA (L). For each experiment, we report the values of ˆp and nc, |M|, where bold numbers denote that the estimators are consistent with our guarantees for all the κ κmax, for all the runs considered. Moreover, we report the number of individual good runs that are consistent with our guarantees even when considering the worst-case ˆp. Finally, we report the average runtime per verification run in minutes. More extensive results are available in Appendix D.

Experiment NN ˆp 103 ϵ/pmin 103 nc |M| good runs runtime

MNISTST,L Feed Forward 0.00 50.00 0 13 15 25/25 8.3 Conv Big 0.00 50.00 0 5 5 15/15 245

MNISTAT,L Feed Forward 1.30 50.00 7 6 9 25/25 8.3 Conv Big 3.39 50.00 44 3 4 13/15 246

MNISTST,P Feed Forward 0.18 10.00 2 38 46 25/25 0.2 MNISTAT,P Feed Forward 16.50 10.00 4 14 22 19/25 0.3

CIFARST,P Res Net20 1.35 10.00 8 3 3 17/25 4.5 VGG11 BN 0.00 10.00 0 4 5 25/25 38.8

CIFARAT,P Res Net20 1.51 10.00 4 24 42 25/25 133.4 VGG11 BN 17.00 10.00 9 26 60 22/25 308

properties of the tested object, we will investigate other use cases where this approach to probabilistic verification could improve on the state of the art.

Acknowledgements

This work was funded in part by the Vienna Science and Technology Fund (WWTF), project Stru DL (ICT22-059); by the Austrian Science Fund (FWF), project Nan OX-ML (6728); by the TU Wien DK Sec Int and by the European Unions Horizon Europe Doctoral Network programme un-

der the Marie-Skłodowska-Curie grant, project Training Alliance for Computational systems chemistry (101072930).

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

Probably Approximately Global Robustness Certification

Athalye, A., Engstrom, L., Ilyas, A., and Kwok, K. Synthesizing robust adversarial examples. In International conference on machine learning, pp. 284 293. PMLR, 2018.

Athavale, A., Bartocci, E., Christakis, M., Maffei, M., Nickovic, D., and Weissenbacher, G. Verifying global twosafety properties in neural networks with confidence. In International Conference on Computer Aided Verification, pp. 329 351. Springer, 2024.

Baluta, T., Chua, Z. L., Meel, K. S., and Saxena, P. Scalable quantitative verification for deep neural networks. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), pp. 312 323, 2021.

Carlini, N. and Wagner, D. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39 57, 2017a.

Carlini, N. and Wagner, D. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec 17, pp. 3 14, New York, NY, USA, 2017b. Association for Computing Machinery. ISBN 9781450352024.

Casadio, M., Komendantskaya, E., Daggitt, M. L., Kokke, W., Katz, G., Amir, G., and Refaeli, I. Neural network robustness as a verification property: A principled case study. In Shoham, S. and Vizel, Y. (eds.), Computer Aided Verification, pp. 219 231, Cham, 2022. Springer International Publishing. ISBN 978-3-031-13185-1.

Chen, Y., Wang, S., Qin, Y., Liao, X., Jana, S., and Wagner, D. Learning security classifiers with verified global robustness properties. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, CCS 21, pp. 477 494, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450384544.

Cohen, J., Rosenfeld, E., and Kolter, Z. Certified adversarial robustness via randomized smoothing. In international conference on machine learning, pp. 1310 1320. PMLR, 2019.

Deng, L. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141 142, 2012.

Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In Bengio, Y. and Le Cun, Y. (eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.

Haussler, D. and Welzl, E. Epsilon-nets and simplex range queries. In Proceedings of the Second Annual Symposium on Computational Geometry, SCG 86, pp. 61 71, New York, NY, USA, 1986. Association for Computing Machinery. ISBN 0897911946.

He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770 778, 2016.

Indri, P., Blohm, P., Athavale, A., Bartocci, E., Weissenbacher, G., Maffei, M., Nickovic, D., G artner, T., and Malhotra, S. Distillation based robustness verification with pac guarantees. In ICML 2024 Next Generation of AI Safety Workshop, 2024.

Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, pp. 448 456. pmlr, 2015.

Kabaha, A. and Cohen, D. D. Verification of neural networks global robustness. Proceedings of the ACM on Programming Languages, 8(OOPSLA1):1010 1039, 2024.

Katz, G., Barrett, C., Dill, D. L., Julian, K., and Kochenderfer, M. J. Reluplex: An efficient smt solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pp. 97 117. Springer, 2017.

Katz, G., Huang, D. A., Ibeling, D., Julian, K., Lazarus, C., Lim, R., Shah, P., Thakoor, S., Wu, H., Zelji c, A., Dill, D. L., Kochenderfer, M. J., and Barrett, C. The marabou framework for verification and analysis of deep neural networks. In Dillig, I. and Tasiran, S. (eds.), Computer Aided Verification, pp. 443 452, Cham, 2019. Springer International Publishing. ISBN 978-3-030-25540-4.

Kim, H., Park, J., Choi, Y., and Lee, J. Fantastic robustness measures: The secrets of robust generalization. In Thirtyseventh Conference on Neural Information Processing Systems, 2023.

Krizhevsky, A. Learning multiple layers of features from tiny images. Technical report, 2009.

Lecuyer, M., Atlidakis, V., Geambasu, R., Hsu, D., and Jana, S. Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy (SP), pp. 656 672, 2019.

Leino, K., Wang, Z., and Fredrikson, M. Globally-robust neural networks. In Meila, M. and Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 6212 6222. PMLR, 18 24 Jul 2021.

Probably Approximately Global Robustness Certification

Li, L., Xie, T., and Li, B. So K: Certified Robustness for Deep Neural Networks. In 2023 IEEE Symposium on Security and Privacy (SP), pp. 1289 1310, 2023.

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.

Meng, M. H., Bai, G., Teo, S. G., Hou, Z., Xiao, Y., Lin, Y., and Dong, J. S. Adversarial robustness of deep neural networks: A survey from a formal verification perspective. IEEE Transactions on Dependable and Secure Computing, pp. 1 1, 2022.

Mirman, M., Gehr, T., and Vechev, M. Differentiable abstract interpretation for provably robust neural networks. In International Conference on Machine Learning, pp. 3578 3586. PMLR, 2018.

Mitzenmacher, M. and Upfal, E. Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis. Cambridge University Press, USA, 2nd edition, 2017. ISBN 110715488X.

Raginsky, M. Strong data processing inequalities and ϕ -sobolev inequalities for discrete channels. IEEE Transactions on Information Theory, 62:3355 3389, 2014.

Rao, Q. and Frtunikj, J. Deep learning for self-driving cars: chances and challenges. In Proceedings of the 1st International Workshop on Software Engineering for AI in Autonomous Systems, SEFAIS 18, pp. 35 38, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 9781450357395.

Sauer, N. On the density of families of sets. Journal of Combinatorial Theory, Series A, 13(1):145 147, July 1972. ISSN 0097-3165.

Shelah, S. A combinatorial problem; stability and order for models and theories in infinitary languages. Pacific Journal of Mathematics, 41(1):247 261, April 1972. ISSN 0030-8730.

Simonyan, K. and Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Bengio, Y. and Le Cun, Y. (eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.

Sinha, A., Namkoong, H., and Duchi, J. Certifiable distributional robustness with principled adversarial training. In International Conference on Learning Representations, 2018.

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.

Vapnik, V. N. and Chervonenkis, A. Y. On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities, pp. 11 30. Springer International Publishing, Cham, 2015. ISBN 978-3-319-21852-6.

Wang, S., Chen, Y., Abdou, A., and Jana, S. Mixtrain: Scalable training of verifiably robust neural networks. ar Xiv preprint ar Xiv:1811.02625, 2018.

Webb, S., Rainforth, T., Teh, Y. W., and Kumar, M. P. A statistical approach to assessing neural network robustness. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. Open Review.net, 2019.

Wu, H., Isac, O., Zelji c, A., Tagomori, T., Daggitt, M., Kokke, W., Refaeli, I., Amir, G., Julian, K., Bassan, S., et al. Marabou 2.0: a versatile formal analyzer of neural networks. In International Conference on Computer Aided Verification, pp. 249 264. Springer, 2024.

Wu, M., Wicker, M., Ruan, W., Huang, X., and Kwiatkowska, M. A game-based approximate verification of deep neural networks with provable guarantees. Theoretical Computer Science, 807:298 329, 2020. ISSN 0304-3975. In memory of Maurice Nivat, a founding father of Theoretical Computer Science - Part II.

Xu, K., Shi, Z., Zhang, H., Wang, Y., Chang, K.-W., Huang, M., Kailkhura, B., Lin, X., and Hsieh, C.-J. Automatic perturbation analysis for scalable certified robustness and beyond. Advances in Neural Information Processing Systems, 33:1129 1141, 2020.

Xu, K., Zhang, H., Wang, S., Wang, Y., Jana, S., Lin, X., and Hsieh, C.-J. Fast and Complete: Enabling complete neural network verification with rapid and massively parallel incomplete verifiers. In International Conference on Learning Representations, 2021.

Zhang, H., Yu, Y., Jiao, J., Xing, E. P., Ghaoui, L. E., and Jordan, M. I. Theoretically principled trade-off between robustness and accuracy. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, ICML 2019, volume 97, pp. 7472 7482. PMLR, 2019.

Probably Approximately Global Robustness Certification

Proposition 3.7 (ϵ-nets from iid samples with constants). Let (Y, R) be a range space with VC dimension d and let D be a probability distribution over Y. For any 0 < ϵ, δ < 1

2, an iid sample from DY of size s is an ϵ-net for Y with probability at least 1 δ if

δ + d ln(2s) ln 1 e sϵ/8 . (7)

Proof. We follow the double sampling argument from Mitzenmacher & Upfal (2017, Theorem 14.8). First, define E1 as the random event that a sample N of size |N| = s is not an ϵ-net:

E1 = R R : Pr(X R) ϵ R N = . (36)

We aim to show Pr(E1) δ for a large enough s. Consider then a second sample T with |T| = s and define E2 as the random event that some range R R does not intersect N, but has a large intersection with T:

E2 = n R R : Pr(X R) ϵ R N = |R T| ϵs

As E(|R T|) = ϵs, then Pr |R T| ϵs

2 should be large. Hence, E1 and E2 should have similar probability in total. Mitzenmacher & Upfal (2017) formalize this intuition with the following expression which considers a fixed range R such that R N = and Pr(X R ) ϵ. In particular, as E2 E1 and consequently E2 = E2 E1, it follows that

Pr(E2) Pr(E1) = Pr(E1 E2)

Pr(E1) = Pr(E2 | E1) Pr |T R | ϵs

For some fixed range R , the random variable S = |R T| follows a binomial distribution Bin(s, p) with expectation E[S] = sp. We can proceed with a Chernoff bound as

Pr |R T| sϵ

Pr |R T| sp

where we used the fact that p ϵ. While Mitzenmacher & Upfal (2017) relax this expression with exp( sϵ

2. In the interest of tighter bounds, we continue instead our derivation without this relaxation. Thus,

Pr(E2) Pr(E1) = Pr(E2 | E1) Pr |T R | ϵs

= Pr(E1) Pr(E2) 1 exp sϵ

Next, we aim to bound the probability of E2 using a larger event E 2. Consider again some fixed range R with

ER = {(R N = ) (|R T| k)} (41)

with k = ϵs

2 . We want to show that Pr(ER) is small. To do so, consider a set of 2s elements and assume to partition it randomly into N and T. ER captures the event that at least k elements in N T intersect R but none of these is in N. Of the 2s s possible partitions of N T, in exactly 2s k s of them, no element of R is in N. Consequently,

Pr(ER) Pr(N R = | |R (N T)| k) (42)

2s s = (2s k)!s!

(2s)!(s k)! = s(s 1) (s k + 1) (2s)(2s 1) (2s k + 1) (43)

2 k = 2 ϵs/2. (44)

The last inequality introduces a further albeit small relaxation in the result, as k s. We then finally consider the event E 2 via the union bound over all the ranges R R, that is,

E 2 = { R R : (R N = ) (|R T| sϵ

Probably Approximately Global Robustness Certification

We then use the Sauer-Shelah Lemma (Sauer, 1972; Shelah, 1972) to argue that we can consider at most (2s)d ranges when projecting R onto N T. By the union bound we then have that Pr(E 2) (2s)d2 sϵ/2. Finally, we arrive at

Pr(E1) Pr(E2) 1 exp sϵ

8 (2s)d2 sϵ/2

We can now simplify the last expression to obtain

(2s)d2 sϵ/2 δ 1 exp sϵ

d ln(2s) + sϵ

ln(2) ln(δ) + ln 1 exp sϵ

+ d ln(2s) ln 1 exp sϵ

which concludes the derivation.

Lemma 4.3. Let K be a real-valued random variable, and N be an iid sample of K with |N| = s. Denote with N(i) R the ith-largest element in the sample. Then for parameters 1 > p 1

2 and 0 < δ < 1

2, with probability of at least 1 δ, we have that

Pr(K N(i)) p (21)

for any integer i such that

Proof. Let Kp R be the p-quantile of K, i.e.,

Pr(K Kp) = p (50)

The number of elements in an iid sample smaller than Kp is a binomially distributed random variable I Binom(s, p). We aim to find an integer i that results in a high-probability lower bound for I, that is, we look for the largest i such that

Pr (I > i) 1 δ (51)

or, equivalently,

pk(1 p)s k < δ. (52)

There is no straightforward closed form expression to find this integer i. We proceed using Chernoff bounds for the deviation of I from E[I] = sp. Recall that Chernoff bounds for some relative deviation η [0, 1] is defined as (Mitzenmacher & Upfal, 2017, Chapter 4)

Pr(I (1 η)E[I]) exp η2E[I]

In our discrete setting, the relative deviation η of I from E[I] is defined as η := sp i

sp . The Chernoff bound can then be written as

Pr(I i) exp (sp i)2

Probably Approximately Global Robustness Certification

We now perform routine calculations to obtain an upper bound for i.

2sp < ln (δ) (55)

Given Equation (58) holds for a particular i, at least i elements in a sample of size s are smaller than Kp with probability at least 1 δ, i.e., Pr(N(i) Kp) 1 δ. Finally, the event N(i) Kp implies Pr(K N(i)) p, which then holds true with a probability of at least 1 δ as well.

Definition A.1 (Total variation distance). Consider two distributions D, D over X and let R X be a measurable set. The total variation (TV) distance between D and D is defined as

TV(D, D ) = sup R X |Pr D (R) Pr D (R)|. (59)

Lemma A.2 (Total variation distance in the quality space Q). Consider two distributions D and D over X and denote with TV(D, D ) their total variation distance. For the function q : X R2 that maps instances into the quality space Q. We denote the distribution of the instances after they are mapped onto Q as Dq, D q. It then holds that

TV(Dq, D q) TV(D, D ). (60)

Proof. For ease of presentation, we discuss a proof of the statement for more general conditions, which correspond to a version of the data processing inequality (Raginsky, 2014). Let F be a σ-algebra on the input space X and consider two probability measures P and Q on (X, F). Let t : X Y be a measurable map and G be a σ-algebra on the space Y. With slight abuse of notation we denote with t(P)(B) := P(t 1(B)), B G the pushforward measure induced by t on (Y, G). We do so analogously for Q too. The total variation distance for probability measures is defined as

TV(t(P), t(Q)) = sup B G |t(P)(B) t(Q)(B)| = sup B G |P(t 1(B)) Q(t 1(B))| (61)

Note that for every measurable set B G in the measure space (Y, G), the preimage t 1(B) F is a measurable set in (X, F). It therefore holds that

sup B G |P(t 1(B)) Q(t 1(B))| sup A F |P(A) Q(A)| = TV(P, Q), (62)

from which it follows that TV(t(P), t(Q)) TV(P, Q). (63)

The statement of lemma holds when t is the function q that maps to the quality space Q, and Y is Q.

Lemma A.3. Given two distributions D and D on X such that TV(D, D ) Λ, if N is an ϵ-net for X with respect to D , then it is an (ϵ + Λ)-net for X with respect to D.

Proof. Using Definition A.1, TV(D, D ) = sup R X |Pr D(R) Pr D (R)| Λ. Specifically, Pr D(R) ϵ + Λ implies that Pr D (R) ϵ. As N is an ϵ-net for X with respect to D , it intersects all R X such that Pr D (R) ϵ, and it thus intersects all R X such that Pr D(R) ϵ + Λ. Therefore, N is an (ϵ + Λ)-net for X with respect to D.

Lemma A.4. Let K be a random variable sampled from D . Let D be another distribution such that TV(D, D ) Λ, and Pr D (K κ) p then we have that Pr D(K κ) p + Λ (64)

Probably Approximately Global Robustness Certification

Proof. The proof follows from the fact that if TV(D, D ) Λ, then probability of no event under D and D can differ by more than Λ.

Proposition A.5. Consider a random variable K D and let N, N(i), p, and δ be as in the setting for Lemma 4.3 so that Pr D (K N(i)) p holds with probability at least 1 δ for any integer i such that i < np p

2np ln (1/δ). Consider a distribution D such that TV(D, D ) Λ. Then, under the distribution D and for the same N, N(i), p, and δ, with probability of at least 1 δ: Pr D(K N(i)) p + Λ. (65)

Proof. Using Definition A.1, for any measurable set R Rn it holds that |Pr D(R) Pr D (R)| Λ. If we consider the event {K N(i)} Rn, with probability of at least 1 δ it holds that:

Pr D(K N(i)) = Pr D (K N(i)) + Λ ( ) p + Λ, (66)

where in ( ) we used Lemma 4.3.

Theorem 4.6 (PAG under distribution shift). Let D be a data distribution, f : X Rn be a classifier and robf be some robustness oracle. Let D be a distribution such that TV(D, D ) = Λ. Consider an iid sample N D s and parameters ϵ, δ and pmin. Consider an s that satisfies the conditions in Theorem 4.4 for D . Then, with probability at least 1 δ q(N) R(ρ, κ) = =

Pr D robf(X) < ρ | conf f(X) κ < ϵ + Λ pmin Λ. (30)

That is, in the absence of counterexamples f can be certified to be PAG robust with respect to the data distribution D.

Proof. We show that the two assumptions in Theorem 4.4 lead to the statement of the theorem if TV(D, D ) = Λ.

1. To satisfy assumption 1. in Theorem 4.4, s 2

δ + d log(2s) , i.e., N is an ϵ-net for X with respect to D with probability at least 1 δ/2. Because of Lemma A.3, N is also an (ϵ + Λ)-net for X with respect to D with probability at least 1 δ/2.

2. To satisfy assumption 2. in Theorem 4.4, i < s(1 pmin) q

2s(1 pmin) ln 2

δ for i = |{x N : conf f(x) < κ}|, i.e., Pr D (conf f(X) κ) 1 pmin with probability at least 1 δ/2. Using Proposition A.5 with p = 1 pmin, it follows that Pr D(conf f(X) κ) 1 pmin + Λ with probability at least 1 δ/2.

By union bound, the probability that either condition does not hold on a given sample N is at most δ/2 + δ/2 = δ. Thus, with probability of at least 1 δ, N will be an (ϵ + Λ)-net for X under D and the quantile bound holds under D. The data processing inequality (Raginsky, 2014) can then be used to show (Lemma A.2) that Λ is an upper bound for the total variation distance of the distribution of instances sampled from D and D when they are mapped into the quality space Q. With this, the theorem follows from the definition of ϵ-nets and Lemma 4.5.

B. Construction of Mapping

In this section, we expand on the definition of the mapping function M in Equation (31). The construction of the map can be performed in a single pass over the sorted sample N, and is illustrated in Algorithm 1. This process is O(|N| log(|N|)), bound by the process of sorting. At inference time, M(κ) returns the ρ corresponding to the largest confidence κ κ.

The size of the mapping |M|, and the corresponding strictness of Proposition 5.1, can be controlled via quantization. Rounding down robustness radii reduces the size of the codomain of M, keeps the guarantees valid, and reduces the spatial requirements of the mapping.

C. Details on Experimental Setup

In this section, we detail how the experiments were conducted. The full code to train, test and analyze the experiments is available at our repository.

Probably Approximately Global Robustness Certification

Algorithm 1 Obtain κ-ρ-mapping Input: ϵ-net N, confidence upper-bound κmax Output: κ-ρ-mapping M(κ) Let M = Let κ = /* iterate through sample in order of increasing robustness ρ */ for (ρ, κ) N in lexicographic order do

if κ < κ κmax then

M = M {κ 7 ρ} // add new step from κ to ρ to the mapping κ = κ end end return M

C.1. Software and Hardware Used

We use Py Torch to conduct our experiments and perform training with the MAIR (Kim et al., 2023) library for support of normal and adversarial training. In all instances, for TRADES, we chose the parameter β = 6. All the experiments were run on a single desktop machine equipped with an Intel i9-11900KF @ 3.50GHz CPU and a NVIDIA Ge Force RTX 3080 GPU.

C.2. Robustness Oracles

PGD We adapt the internal PGD function in MAIR to return the distance to adversarial examples. To get a fine-grained result, we perform many gradient steps with a small step size. For MNIST, we use a step size of 0.5/256 and up to 200 steps to find an adversarial example. We project our examples to valid pixel values between 0 and 1. For CIFAR-10, we use a similar setup with a smaller step size of 0.1/256 and up to 500 steps. We constrain adversarial images to have L distance of at most 0.5 to the original data point.

LIRPA (Xu et al., 2020) offers a Python interface for NN robustness certification. Similar to Marabou, we perform some preprocessing on our MNIST models and use the bound propagation capabilities of Li RPA to obtain a bound. The Li RPA library lets us define a neighbourhood in X and gives bounds for the logits of f in this neighbourhood. To check if the class of f stays constant around a given point x, with class(x) = c, we query Li RPA for an upper bound UB(x)

UB(x) = max x X{f(x ) fc(x ) : x x < ρ}. (67)

If this upper bound UB(x) = 0, then f is robust around x. It is worth noting that the bounds reported by Li RPA are not necessarily tight. This means that Li RPA might underreport robustness radii. Similarly to Marabou, we then use binary search to find the smallest value for ρ.

Marabou While omitted in the main paper due to the better scalability of auto Li RPA, we present Marabou here as an additional example for a robustness oracle. We use the Python interface of the Marabou 2.0 NN verification tool to verify the following query for our network for a given data point x X and a given robustness radius ρ:

x X : ||x x || < ρ class(x) = class(x ). (68)

If the formula is UNSAT, the network is locally ρ-robust. We perform a binary search to get a bounded estimate of the local robustness radius, and obtain a lower bound of the L local robustness radius with 4 bits of precision, up to a value of 0.5. In Figure 6, we choose ϵ = 2.5/ ln(2) 10 3 and δ = 0.01 and pmin = 0.05, with a six-step binary search for ρ. This results in a sample requirement of s(ϵ, δ/2, 2) = 21294.

C.3. NN Architectures

All network architectures were trained with the MAIR library (Kim et al., 2023), using a normal training procedure or TRADES (Zhang et al., 2019) with a β = 6. We trained 5 instances of each architecture on different splits of the training data. For the respective number of parameters, refer to Table 1.

Probably Approximately Global Robustness Certification

Feed Forward Network We considered three-layer, fully-connected Re Lu networks with (768, 50, 10) neurons, trained on MNIST. Refer to our github repository and to Appendix D for the exact (hyper)parameter values.

Res Net20 We downloaded an untrained Res Net20 network from this Github repository. Refer to Appendix D for the exact hyperparameter values.

Conv Big Similarly to the other architectures, we used an untrained Conv Big architecture (Mirman et al., 2018), but trained our instances independently.

VGG11 BN This CIFAR10 network (Simonyan & Zisserman, 2015; Ioffe & Szegedy, 2015) is our largest architecture with approximately 10M trainable parameters.

C.4. Datasets and Splits

We use MNIST and CIFAR10 for our experiments. We train five instances of each architecture in the manner of 5-fold cross-validation. We then use the respective 20% of the training data to sample data points with Gaussian noise added to the data points. The network does not see the validation split during training, and our guarantees are only obtained from the data in the test split. Finally, the test set, both unknown to the network and our verification procedure, is used to test the generalization of our guarantee.

To obtain the sampling datasets, we consider the following settings:

For the PGD oracle, we choose the verification parameters ϵ = 10 4 and δ = pmin = 0.01. This results in a sample size of s(ϵ, δ/2, 2) = 989534.

For the auto Li RPA oracle, due to the more costly local verification procedure, we relax our parameters to ϵ = 2.5 10 3, δ = 0.01 and pmin = 0.05. This results in a sample size of s(ϵ, δ/2, 2) = 31635.

For each of our experiments, the sampling dataset is then obtained by uniformly choosing data from the sampling split and adding Gaussian noise with a mean of 0 and a standard deviation of 8/256 for both.

D. Additional Results

D.1. Additional Results on MNIST

Figure 4. Scatter plot of the MNIST test dataset Dtest, with |Dtest| = 10000, in the quality space Q for our feed forward network. The left network is trained with standard methods, the right network is trained robustly with TRADES. Results for parameters ϵ = 10 4, δ = pmin = 0.01, with |N| = s(ϵ, δ/2, 2) = 989534.

Probably Approximately Global Robustness Certification

Figure 5. Scatter plot of the MNIST test dataset Dtest, with |Dtest| = 10000, in the quality space Q for our feed forward network. The left network is trained with standard methods, the right network is trained robustly with TRADES. Results for parameters ϵ = 2.5 10 3, δ = pmin = 0.01, with |N| = s(ϵ, δ/2, 2) = 31635.

0.00 0.25 0.50 0.75 1.00 Confidence Score (quantiles)

Formal Robustness (L inf distance)

Lower Bound Test Data Max Confidence

MNIST Lower Bound vs Test Data with Standard Training

0.00 0.25 0.50 0.75 1.00 Confidence Score (quantiles)

Formal Robustness (L inf distance)

Lower Bound Test Data Max Confidence

MNIST Lower Bound vs Test Data with TRADES Training

Figure 6. Scatter plot of the MNIST test dataset Dtest, with |Dtest| = 10000, in the quality space Q for our feed forward network. The left network is trained with standard methods, the right network is trained robustly with TRADES (β = 2). Results for parameters ϵ = 2.5/ ln(2) 10 3, δ = 0.01 pmin = 0.05, with |N| = s(ϵ, δ/2, 2) = 21892.