# Auditing Local Explanations is Hard

Robi Bhattacharjee, University of Tübingen and Tübingen AI Center, robi.bhattacharjee@wsii.uni-tuebingen.de

Ulrike von Luxburg, University of Tübingen and Tübingen AI Center, ulrike.luxburg@uni-tuebingen.de

38th Conference on Neural Information Processing Systems (NeurIPS 2024).

In sensitive contexts, providers of machine learning algorithms are increasingly required to give explanations for their algorithms' decisions. However, explanation receivers might not trust the provider, who could potentially output misleading or manipulated explanations. In this work, we investigate an auditing framework in which a third-party auditor or a collective of users attempts to sanity-check explanations: they can query model decisions and the corresponding local explanations, pool all the information received, and then check for basic consistency properties. We prove upper and lower bounds on the number of queries needed for an auditor to succeed within this framework. Our results show that successful auditing requires a potentially exorbitant number of queries, particularly in high-dimensional cases. Our analysis also reveals that a key property is the locality of the provided explanations, a quantity that has so far received little attention in the explainability literature. Looking forward, our results suggest that for complex high-dimensional settings, merely providing a pointwise prediction and explanation could be insufficient, as there is no way for the users to verify that the provided explanations are not completely made up.

## 1 Introduction

Machine learning models are increasingly used to support decision making in sensitive contexts such as credit lending, hiring decisions, admittance to social benefits, crime prevention, and so on. In all these cases, it would be highly desirable for the customers/applicants/suspects to be able to judge whether the model's predictions or decisions are trustworthy. New AI regulation such as the European Union's AI Act can even legally require this. One approach that is often held up as a potential way to achieve transparency and trust is to provide local explanations, where every prediction/decision comes with a human-understandable explanation for this particular outcome (e.g., LIME (Ribeiro et al., 2016), SHAP (Lundberg and Lee, 2017), or Anchors (Ribeiro et al., 2018)).

However, in many real-world scenarios, the explanation receivers may not necessarily trust the explanation providers (Bordt et al., 2022). Imagine a company that uses machine learning tools to assist in screening job applications. Because the company is well-advised to demonstrate fair and equitable hiring, it is plausible that it might bias its explanations towards depicting these properties. And this is easy to achieve: the company is in full control of the machine learning model and the setup of the explanation algorithm, and prior literature (Ghorbani et al., 2019; Dombrowski et al., 2019) has shown that current explainability tools can be manipulated to output desirable explanations. This motivates the question: what restrictions or procedures could be applied to prevent such explanation cheating, and more specifically, what are ways to verify that the provided explanations
are actually trustworthy?

One approach is to require that the explanation providers completely publicize their models, thus allowing users or third-party regulators to verify that the provided explanations are faithful to the actual model being used. However, such a requirement would likely face stiff resistance in settings where machine learning models are valuable intellectual property. In this work, we investigate an alternative approach, where a third-party regulator or a collective of users attempts to verify the trustworthiness of local explanations, based solely on the predictions and explanations over a set of examples. The main idea is that by comparing the local explanations with the actual predictions across enough data, one could, in principle, assess whether the provided explanations actually adhere to the explained model. The goal of our work is to precisely understand when this is possible.

Figure 1: Local explanations (see Section 2.2 for notation). Panel (a): insufficient data for auditing; Panel (b): sufficient data for auditing. In both panels, a set of training points x and their classifications f(x) (red/blue, decision boundary in green) are shown. For three training points (one centered at each ball), a local linear explanation (Rx, gx) is illustrated, where gx is a local linear classifier (black decision boundary) and Rx is a local ball centered at x. Panel (a) depicts a regime where there is insufficient data for verifying how accurately the local explanations approximate the classifier f: none of the provided regions contain enough points to assess the accuracy of the linear explanations. Panel (b) depicts a regime with more training points, allowing us to validate the accuracy of the linear explanations based on how closely they align with the points in their corresponding regions.

### 1.1 Our contributions: data requirements for auditing

We begin by providing a general definition of local explainability that encompasses many popular explainability methods such as Anchors (Ribeiro et al., 2018), SmoothGrad (Smilkov et al., 2017), and LIME (Ribeiro et al., 2016). We define a local explanation for a classifier f at a point x as a pair (Rx, gx), where Rx is a local region surrounding x, and gx is a simple local classifier designed to approximate f over Rx. For example, on continuous data, Anchors always outputs (Rx, gx) where Rx is a hyper-rectangle around x and gx is a constant classifier; gradient-based explanations such as SmoothGrad or LIME implicitly approximate the decision function f by a linear function in a local region around x.

Obviously, any human-accessible explanation that is derived from such a local approximation can only be trustworthy if the local function gx indeed approximates the underlying function f on the local region Rx. Hence, a necessary condition for a local explanation to be trustworthy is that the function gx is close to f on the region Rx, and this should be the case for most data points x sampled from the underlying distribution. To measure how closely a set of local explanations adheres to the original classifier f, we propose an explainability loss function Lγ(E, f), which quantifies how frequently f differs from the local classifier gx on more than a γ-fraction of the local region Rx (see Sec. 2.2 for precise definitions).

We then introduce a formalism for auditing local explanations, where an auditor attempts to estimate the explainability loss Lγ(E, f). In our formalism, the auditor does so with access to the following objects:
1. A set of data points X = {x1, . . . , xn} drawn i.i.d. from the underlying data distribution.
2. The outputs of the classifier on these points, f(X) = {f(x1), . . . , f(xn)}.
3. The provided local explanations for these points, E(f, X) = {E(f, x1), . . . , E(f, xn)}.

Observe that in our formalism, the auditor has only restricted access to the machine learning model and the explanations: they can only interact with them through their evaluations at specific data points. We have chosen this scenario because we believe it to be the most realistic one in many practical situations, where explanation providers try to disclose as little information about their underlying machine learning framework as possible.

In our main result, Theorem 4.1, we provide a lower bound on the amount of data needed for an auditor to accurately estimate Lγ(E, f). A key quantity in our analysis is the locality of the provided explanations. We show that the smaller the provided local regions Rx are, the more difficult it becomes to audit the explainer. Intuitively, this holds because estimating the explainability loss relies on observing multiple points within these regions, as illustrated in Panel (b) of Figure 1. By contrast, if this fails to hold (Panel (a)), then there is no way to validate how accurate the local explanations are. We also complement our lower bound with an upper bound (Theorem 4.2) demonstrating that reasonably large local regions enable auditing within our framework.

Our results imply that the main obstacle to auditing local explanations in this framework is the locality of the provided explanations. As it turns out, this quantity is often prohibitively small in practice, making auditing practically impossible. In particular, for high-dimensional applications, the local regions Rx given by the explainer are often exponentially small in the data dimension. Thus, the explanations cannot be verified in cases where there does not exist any prior trust between the explanation provider and the explanation receivers.

We stress that estimating the explainability loss Lγ(E, f) serves as a first baseline on the path towards establishing trustworthy explanations. It is entirely possible that an explanation provider achieves a small explainability loss (meaning that the local classifiers closely match the global classifier f) but nevertheless provides explanations that are misleading in some other targeted manner. Thus, we view successful auditing in this setting as a necessary but not sufficient condition for trusting an explanation provider.

Our results might have far-reaching practical consequences. In cases where explanations are considered important or might even be required by law, for example by the AI Act, it is a necessary requirement that explanations can be verified or audited (otherwise, they would be completely useless). Our results suggest that in the typical high-dimensional setting of modern machine learning, auditing pointwise explanations is impossible if the auditor only has access to pointwise decisions and corresponding explanations. In particular, collectives of users, for example coordinated by non-governmental organizations (NGOs), are never in a position to audit explanations. The only way forward in auditing explanations would be to appoint a third-party auditor who has more power and more access to the machine learning model, be it access to the full specification of the model function and its parameters, or even to the training data.
Such access could potentially resolve the fundamental issues posed by small local explainability regions in our restricted framework, and could potentially enable the third-party auditor to act as a moderator to establish trust between explanation receivers and explanation providers.

### 1.2 Related Work

Prior work (Yadav et al., 2022; Bhatt et al., 2020; Oala et al., 2020; Poland, 2022) on auditing machine learning models often focuses on applying explainability methods to audit the models, rather than on the explanations themselves. However, there has also been recent work (Leavitt and Morcos, 2020; Zhou et al., 2021) arguing for more rigorous ways to evaluate the performance of various explanation methods. There are numerous approaches for doing so, including evaluation based on human studies (Jesus et al., 2021; Poursabzi-Sangdeh et al., 2021) and on robustness (Alvarez-Melis and Jaakkola, 2018).

There has also been a body of work that evaluates explanations based on the general notion of faithfulness between explanations and the explained predictor. Many approaches (Wolf et al., 2019; Poppi et al., 2021; Tomsett et al., 2020) examine neural-network-specific measures, and typically rely on access to the neural network that would not be present in our setting. Others are specialized to a specific explainability tool, with LIME (Visani et al., 2022; Botari et al., 2020) and SHAP (Huang and Marques-Silva, 2023) being especially popular choices. By contrast, our work considers a general form of local explanation, and studies the problem of auditing such explanations in a restricted-access setting, where the auditor only interacts with explanations through queries.

To our knowledge, the only previous work in a similar setting is (Dasgupta et al., 2022), in which local explanations are similarly audited based on collecting them over a set of data sampled from a data distribution. They consider a quantity called local sufficiency, which directly corresponds to our notion of local loss (Definition 2.3). However, their work is restricted to a discrete setting where local fidelity is evaluated based on instances that receive identical explanations. In particular, they attempt to verify that points receiving identical explanations also receive identical predictions. By contrast, our work lies within a continuous setting, where a local explanation is said to be faithful if it matches the underlying model over a local region.

A central quantity in our analysis is the locality of an explanation, which is a measure of how large the local regions are. Prior work has rarely measured or considered this quantity, with a notable exception being the Anchors method (Ribeiro et al., 2018), which utilizes it to assist in optimizing its constructed explanations. However, that work did not explore this quantity beyond treating it as a fixed parameter.

Finally, we note that other recent work, such as (Bassan and Katz, 2023), provides avenues for producing explanations with certifiable correctness, meaning that they come with a proof that they accurately reflect the underlying model. We view our work as complementary to such methods: it demonstrates the necessity of such ideas by exhibiting the difficulties of auditing generic local explanation methods.

## 2 Local Explanations

### 2.1 Preliminaries

In this work, we restrict our focus to binary classification: we let µ denote a data distribution over R^d, and f : R^d → {±1} be a so-called black-box binary classifier that needs to be explained.
We note that lower bounds shown for binary classification directly imply lower bounds in more complex settings such as multi-class classification or regression.

For any measurable set M ⊆ R^d, we let µ(M) denote the probability mass that µ assigns to M. We will also let supp(µ) denote the support of µ, which is the set of all points x such that µ({x' : ||x − x'|| ≤ r}) > 0 for all r > 0. We define a hyper-rectangle in R^d as a product of intervals, (a1, b1] × · · · × (ad, bd], and let Hd denote the set of all hyper-rectangles in R^d. We let Bd denote the set of all L2-balls in R^d, with the ball of radius r centered at point x being defined as B(x, r) = {x' : ||x − x'|| ≤ r}. We will utilize the following two simple hypothesis classes: Cd, which is the set of the two constant classifiers over R^d, and Ld, which is the set of all linear classifiers over R^d. These classes serve as important examples of simple and interpretable classifiers for constructing local explanations.

### 2.2 Defining local explanations and explainers

One of the most basic and fundamental concepts in explainable machine learning is the notion of a local explanation, which, broadly speaking, is an attempt to explain a complex function's behavior at a specific point. In this section, we describe a general form that such explanations can take, and subsequently demonstrate that two widely used explainability methods, LIME and Anchors, adhere to it. We begin by defining a local explanation for a classifier at a given point.

Definition 2.1. For x ∈ R^d and f : R^d → {±1}, a local explanation for f at x is a pair (Rx, gx), where Rx ⊆ R^d is a region containing x, and gx : Rx → {±1} is a classifier.

Here, gx is typically a simple function intended to approximate the behavior of a complex function f over the region Rx. The idea is that the local nature of Rx simplifies the behavior of f enough to provide intuitive explanations of the classifier's local behavior. Next, we define a local explainer as a map that outputs local explanations.

Definition 2.2. E is a local explainer if for any f : R^d → {±1} and any x ∈ R^d, E(f, x) is a local explanation for f at x. We denote this as E(f, x) = (Rx, gx).

We categorize local explainers based on the types of explanations they output: if R denotes a set of regions in R^d and G denotes a class of classifiers R^d → {±1}, then we say E ∈ E(R, G) if for all f, x, E(f, x) outputs (Rx, gx) with Rx ∈ R and gx ∈ G.

Local explainers are typically constructed for a given classifier f over a given data distribution µ. In practice, different algorithms employ varying amounts of access to both f and µ: for example, SHAP crucially relies on data sampled from µ, whereas gradient-based methods often rely on knowing the actual parameters of the model f. To address all of these situations, our work takes a black-box approach in which we make no assumptions about how a local explainer is constructed from f and µ. Instead, we focus on understanding how to evaluate how effective a given explainer is with respect to a classifier f and a data distribution µ.

### 2.3 Examples of Explainers

We now briefly discuss how various explainability tools used in practice fit into our framework of local explanations.

Anchors: The main idea of Anchors (Ribeiro et al., 2018) is to construct a region around the input point in which the classifier to be explained remains (mostly) constant. Over continuous data, it outputs a local explainer E such that E(x) = (Rx, gx), where gx is a constant classifier with gx(x') = f(x) for all x', and Rx is a hyper-rectangle containing x. It follows that the Anchors method outputs an explainer in the class E(Hd, Cd).
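To make Definitions 2.1 and 2.2 concrete, here is a minimal Python sketch that represents a local explanation as a pair of callables and builds an Anchors-style explainer in E(Hd, Cd). The names `LocalExplanation` and `anchors_style_explainer` are purely illustrative (they are not part of any existing library), and the rectangle widths are simply supplied by the caller rather than searched for as the real Anchors algorithm does.

```python
import numpy as np
from dataclasses import dataclass
from typing import Callable


@dataclass
class LocalExplanation:
    """A local explanation (R_x, g_x) as in Definition 2.1: a membership test
    for the region R_x and a simple local classifier g_x defined on it."""
    contains: Callable[[np.ndarray], bool]   # membership test for R_x
    g: Callable[[np.ndarray], int]           # local classifier g_x -> {-1, +1}


def anchors_style_explainer(f: Callable[[np.ndarray], int],
                            x: np.ndarray,
                            widths: np.ndarray) -> LocalExplanation:
    """Toy explainer in E(H_d, C_d): an axis-aligned hyper-rectangle (x - w, x + w]
    around x together with the constant classifier g_x(x') = f(x).
    Real Anchors searches for the rectangle; here its widths are given."""
    lo, hi = x - widths, x + widths
    label = f(x)                             # the constant local prediction
    return LocalExplanation(
        contains=lambda z: bool(np.all(z > lo) and np.all(z <= hi)),
        g=lambda z: label,                   # ignores its input
    )
```

The half-open membership test mirrors the convention that hyper-rectangles are products of intervals (a, b].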
Gradient-Based Explanations: Many popular explainability tools (Smilkov et al., 2017; Agarwal et al., 2021; Ancona et al., 2018) explain a model's local behavior by using its gradient. By definition, gradients have a natural interpretation as a locally linear model. Because of this, we argue that gradient-based explanations implicitly give local explanations of the form (Rx, gx), where Rx = B(x, r) is a small L2-ball centered at x, and gx is a linear classifier with coefficients based on the gradient. Therefore, while the radius r and the gradient gx being used will vary across explanation methods, the output can nevertheless be interpreted as an explainer in E(Bd, Ld), where Bd denotes the set of all L2-balls in R^d, and Ld denotes the set of all linear classifiers over R^d.

LIME: At a high level, LIME (Ribeiro et al., 2016) also attempts to give local linear approximations to a complex model. However, unlike gradient-based methods, LIME includes an additional feature-wise discretization step where points near the input point x are mapped into a binary representation in {0, 1}^d based on how similar they are to x. As a consequence, LIME can be construed as outputting local explanations of a similar form to those outputted by gradient-based methods.

Finally, as an important limitation of our work, although many well-known local explanations fall within our definitions, this does not hold in all cases. Notably, Shapley-value-based techniques (Lundberg and Lee, 2017) do not conform to the format given in Definition 2.1, as it is clear neither which local regions they correspond to, nor which precise local classifier is being used.

### 2.4 A measure of how accurate an explainer is

We now formalize what it means for a local classifier gx to "approximate" the behavior of f in Rx.

Definition 2.3. For explainer E and point x, we let the local loss L(E, f, x) be defined as the fraction of examples drawn from the region Rx on which gx and f have different outputs. More precisely, we set

L(E, f, x) = Pr_{x'∼µ}[gx(x') ≠ f(x') | x' ∈ Rx].

Here, µ is implicitly used to evaluate E and is omitted from the notation for brevity.

We emphasize that this definition is specific to classification, which is the setting of this work. A similar kind of loss can be constructed for regression tasks based on the mean-squared difference between gx and f.

We contend that maintaining a low local loss across most data points is essential for any reasonable local explainer. Otherwise, the explanations provided by the tool can be made to support essentially any claim, as they no longer have any adherence to the original function f.

To measure the overall performance of an explainer over an entire data distribution, it becomes necessary to aggregate L(E, f, x) over all x ∼ µ. One plausible way to accomplish this would be to average L(E, f, x) over the entire distribution. However, this would leave us unable to distinguish between cases where E gives extremely poor explanations at a small fraction of points as opposed to giving mediocre explanations over a much larger fraction. To remedy this, we opt for a more precise approach in which a user first chooses a local error threshold 0 < γ < 1 such that local explanations that incur a local loss below γ are considered acceptable. They then measure the global loss for E by determining the fraction of examples x drawn from µ that incur a local loss above γ.
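Given sampling access to µ and query access to f (the provider-side view; the auditor introduced in Section 3 has neither), the local loss of Definition 2.3 can be estimated by straightforward Monte Carlo. The sketch below is illustrative only and reuses the hypothetical `LocalExplanation` interface from above.

```python
import numpy as np


def local_loss(f, explanation, sample_from_mu, n_samples=10_000, seed=0):
    """Monte Carlo estimate of L(E, f, x) from Definition 2.3: the probability
    that g_x and f disagree on a draw from mu, conditioned on the draw landing
    in the region R_x. Requires sampling from mu and querying f."""
    rng = np.random.default_rng(seed)
    draws = [sample_from_mu(rng) for _ in range(n_samples)]
    in_region = [z for z in draws if explanation.contains(z)]
    if not in_region:
        return float("nan")   # no mass observed in R_x: the loss cannot be estimated
    disagreements = sum(explanation.g(z) != f(z) for z in in_region)
    return disagreements / len(in_region)
```

Comparing such an estimate against the threshold γ is exactly the per-point check that the explainability loss defined next aggregates over the data distribution.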
Definition 2.4. Let γ > 0 be a user-specified local error threshold. For a local explainer E, we define the explainability loss Lγ(E, f) as the fraction of examples drawn from µ that incur a local loss larger than γ. That is,

Lγ(E, f) = Pr_{x∼µ}[L(E, f, x) ≥ γ].

We posit that the quantity Lγ(E, f) serves as an overall measure of how faithfully explainer E adheres to classifier f, with lower values of Lγ(E, f) corresponding to greater degrees of faithfulness.

### 2.5 A measure of how large local regions are

The outputted local region Rx plays a crucial role in defining the local loss. On one extreme, setting Rx to consist of a single point, {x}, can lead to a perfect loss of 0, as the explainer only needs to output a constant classifier that matches f at x. But these explanations would be obviously worthless, as they provide no insight into f beyond its output f(x). On the other extreme, setting Rx = R^d would require the explainer to essentially replace f in its entirety with gx, which would defeat the purpose of explaining f (as we could simply use gx instead). Motivated by this observation, we define the local mass of an explainer at a point x as follows:

Definition 2.5. The local mass of explainer E with respect to point x and function f, denoted Λ(E, f, x), is the probability mass of the local region outputted at x. That is, if E(f, x) = (Rx, gx), then

Λ(E, f, x) = Pr_{x'∼µ}[x' ∈ Rx].

Based on our discussion above, it is unclear what an ideal local mass is. Thus, we treat this quantity as a property of local explanations rather than a metric for evaluating their validity. As we will later see, this property is quite useful for characterizing how difficult it is to estimate the explainability loss of an explainer. We also give a global characterization of the local mass, called locality.

Definition 2.6. The locality of explainer E with respect to function f, denoted Λ(E, f), is the minimum local mass it incurs. That is,

Λ(E, f) = inf_{x ∈ supp(µ)} Λ(E, f, x).

## 3 The Auditing Framework

Recall that our goal is to determine how explanation receivers can verify provided explanations in situations where there isn't mutual trust. To this end, we provide a framework for auditing local explanations, where an auditor attempts to perform this verification with as little access to the underlying model and explanations as possible. Our framework proceeds with the following steps.

1. The auditor fixes a local error threshold γ.
2. A set of points X = {x1, . . . , xn} is sampled i.i.d. from the data distribution µ.
3. A black-box classifier f is applied to these points. We denote these values by f(X) = {f(x1), . . . , f(xn)}.
4. A local explainer E outputs explanations for f at each point. We denote these explanations by E(f, X) = {E(f, x1), . . . , E(f, xn)}.
5. The auditor outputs an estimate A(X, f(X), E(f, X)) of the explainability loss.

Observe that the auditor only has access to the model f and its corresponding explanations through the set of sampled points. Its only inputs are X, f(X), and E(f, X). In the context of the job application example discussed in Section 1, this would amount to auditing a company based on the decisions and explanations it provided over a set of applicants.
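As a rough sketch of what an auditor in this framework can do, the function below estimates Lγ(E, f) in the spirit of the splitting-based procedure analyzed in Appendix B.1 (Algorithm 1), but with simplified constants: it splits the sample, and for each audited explanation it estimates the local loss from the validation points that happen to fall inside that explanation's region. The function name, the split ratio, and the `min_points` threshold are illustrative assumptions; note that the model f is never queried beyond the provided labels f(X). The explanation objects are assumed to expose the `contains`/`g` interface sketched earlier.

```python
import numpy as np


def audit_explanations(X, fX, explanations, gamma, min_points=30):
    """Sketch of an auditor: estimate the explainability loss L_gamma(E, f)
    using only the sampled points X, the provider's labels fX = f(X), and the
    provided explanations E(f, X). Returns NaN if no audited region contained
    enough validation points to assess its local loss."""
    n = len(X)
    n1 = n // 3
    E1 = explanations[:n1]              # explanations to be audited
    X2, f2 = X[n1:], fX[n1:]            # validation points and their labels
    flagged, audited = 0, 0
    for expl in E1:
        inside = [(z, y) for z, y in zip(X2, f2) if expl.contains(z)]
        if len(inside) < min_points:
            continue                    # too few points in R_x to estimate its local loss
        emp_loss = np.mean([expl.g(z) != y for z, y in inside])
        audited += 1
        flagged += int(emp_loss > gamma)
    return flagged / audited if audited else float("nan")
```

The `min_points` requirement is exactly where the locality enters: when the regions carry little probability mass, almost no explanation can be audited at all.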
In this framework, we can define the sample complexity of an auditor as the amount of data it needs to guarantee an accurate estimate of Lγ(E, f). More precisely, fix a data distribution µ, a classifier f, and an explainer E. Then we have the following:

Definition 3.1. For tolerance parameters ϵ1, ϵ2, δ > 0 and local error threshold γ > 0, we say that an auditor A has sample complexity N(ϵ1, ϵ2, δ, γ) with respect to µ, E, f, if for any n ≥ N(ϵ1, ϵ2, δ, γ), with probability at least 1 − δ over X = {x1, . . . , xn} ∼ µ^n, A outputs an accurate estimate of the explainability loss Lγ(E, f). That is,

Lγ(1+ϵ1)(E, f) − ϵ2 ≤ A(X, f(X), E(f, X)) ≤ Lγ(1−ϵ1)(E, f) + ϵ2.

Next, observe that our sample complexity is specific to the distribution µ, the classifier f, and the explainer E. We made this choice to understand the challenges that different choices of µ, f, and E pose to an auditor. As we will later see, we will bound the auditing sample complexity using the locality (Definition 2.6), which is a quantity that depends on µ, f, and E.

## 4 How much data is needed to audit an explainer?

### 4.1 A lower bound on the sample complexity of auditing

We now give a lower bound on the amount of data needed to successfully audit an explainer. That is, we show that for any auditor A and any data distribution µ, we can find some explainer E and some classifier f such that A is highly likely to give an inaccurate estimate of the explainability loss.

To state our theorem we use the following notation and assumptions. Recall that Hd denotes the set of hyper-rectangles in R^d, and that Cd denotes the set of the two constant binary classifiers over R^d. Additionally, we include a couple of mild technical assumptions about the data distribution µ. We defer a detailed discussion of them to Appendices A.3 and A.1. We now state our lower bound.

Theorem 4.1 (lower bound on the sample complexity of auditing). Let ϵ1, ϵ2 < 1/48 be tolerance parameters, and let γ < 1/3 be any local error threshold. Let µ be any non-degenerate distribution, and λ > 0 be any desired level of locality. Then for any auditor A there exists a classifier f : R^d → {±1} and an explainer E ∈ E(Hd, Cd) such that the following conditions hold.

1. E has locality Λ(E, f) = λ.
2. There exist absolute constants c0, c1 > 0 such that if the auditor receives

   n ≤ c0 / ( max(ϵ1, ϵ2) · λ^(1 − c1 max(ϵ1, ϵ2)) )

   many points, then with probability at least 1/3 over X = {x1, . . . , xn} ∼ µ^n, A gives an inaccurate estimate of Lγ(E, f). That is,

   A(X, f(X), E(f, X)) ∉ [ Lγ(1+ϵ1)(E, f) − ϵ2, Lγ(1−ϵ1)(E, f) + ϵ2 ].

In summary, Theorem 4.1 says that auditing an explainer requires an amount of data that is inversely proportional to its locality. Notably, this result does not require the data distribution to be adversarially chosen, and furthermore applies when the explainer E is guaranteed to have a remarkably simple form, being in E(Hd, Cd).

Proof intuition of Theorem 4.1: The main intuition behind Theorem 4.1 is that estimating the local explainability loss L(E, f, x) requires us to observe samples from the regions Rx. This would allow us to obtain an empirical estimate of L(E, f, x) by simply evaluating the fraction of points from Rx that the local classifier gx misclassifies. This implies that the locality λ is a limiting factor, as it controls how likely we are to observe data within a region Rx. However, this idea alone isn't sufficient to obtain our lower bound. Although the quantity Ω(1/λ^(1 − O(ϵ))) does indeed serve as a lower bound on the amount of data needed to guarantee seeing a large number of points within a region Rx, it is unclear what a sufficient number of observations within Rx is.
Even if we don't have enough points in any single region Rx to accurately estimate L(E, f, x), it is entirely plausible that aggregating loose estimates of L(E, f, x) over a sufficient number of points x might allow us to perform some type of estimation of Lγ(E, f). To circumvent this issue, the key technical challenge is constructing a distribution over functions f and fixing m = O(1/ϵ) such that observing fewer than m points from a given region Rx provides zero information about which function was chosen. We include a full proof in Appendix A.

### 4.2 An upper bound on the sample complexity of auditing

We now show that if λ is reasonably large, then auditing the explainability loss Lγ(E, f) can be accomplished. As mentioned earlier, we stress that succeeding in our setting is not a sufficient condition for trusting an explainer: verifying that the local explanations gx match the overall function f is just one property that a good explainer would be expected to have. Thus, the purpose of our upper bound in this section is to complement our lower bound, and to further support that the locality parameter λ is the main factor controlling the sample complexity of an auditor.

Our auditing algorithm proceeds by splitting the data into two parts, X1 and X2. The main idea is to audit the explanations given for points in X1 by utilizing the data from X2. If we have enough data, then it is highly likely that we see enough points in each local region to do this. We defer full details of this procedure to Appendix B.1. We now give an upper bound on its sample complexity.

Theorem 4.2 (Upper bound on the sample complexity of Algorithm 1). There exists an auditor A for which the following holds. Let µ be a data distribution, f be a classifier, and E be an explainer. Suppose that E has locality λ with respect to µ and f. Let ϵ1, ϵ2, δ be tolerance parameters and let γ > 0 be a local error threshold. Then A has sample complexity at most

N(ϵ1, ϵ2, δ, γ) = O( 1/ϵ2² + 1/(λγ²ϵ1²) ).

This bound shows that the locality is sufficient for bounding the sample complexity of auditing local explanations. We defer a full proof to Appendix B. Observe that the dependency on λ is O(1/λ), which matches the dependency in our lower bound as ϵ1, ϵ2 → 0.

## 5 The locality of practical explainability methods can be extremely small

Theorems 4.1 and 4.2 demonstrate that the locality λ characterizes the amount of data needed for an auditor to guarantee an accurate estimate of the explainability loss Lγ(E, f). It follows that if λ is extremely small, then auditing could require a prohibitive amount of data. This leads to the following question: how small is λ for practical explainability algorithms? To answer this, we examine several commonly used algorithms that adhere to our framework.

We begin with gradient-based methods, which can be construed as providing an explainer in the class E(Bd, Ld), where Bd denotes the set of L2-balls in R^d, and Ld denotes the set of linear classifiers. To understand the impact of dimension on the locality of such explainers, we begin with a simple theoretical example. Let µ be the data distribution over R^d that is a union of three concentric spheres. Specifically, x ∼ µ is equally likely to be chosen uniformly from one of the sets S1 = {x : ||x|| = 1 − α}, S2 = {x : ||x|| = 1}, and S3 = {x : ||x|| = 1 + β}, where α, β are small d-dependent constants (defined in Appendix C). Let f : R^d → {±1} be any classifier such that f(x) = +1 if x ∈ S1 ∪ S3 and f(x) = −1 if x ∈ S2.
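To get a feeling for how quickly ball-shaped local regions lose probability mass as the dimension grows, the following self-contained simulation samples from a three-sphere distribution of this form (with illustrative gaps α = β = 0.3 rather than the paper's exact d-dependent constants from Appendix C) and estimates the local mass of a ball around a point on the outer sphere; the ball radius is chosen small enough that the ball excludes the middle sphere, so a constant +1 local classifier has zero local loss on it.

```python
import numpy as np

ALPHA = BETA = 0.3   # illustrative gaps between the spheres; the paper's exact
                     # d-dependent constants are defined in its Appendix C


def sample_three_spheres(n, d, rng):
    """Draw n points from the mixture that is uniform on one of three concentric
    spheres of radii 1 - ALPHA, 1, 1 + BETA, each chosen with probability 1/3."""
    radii = rng.choice([1 - ALPHA, 1.0, 1 + BETA], size=n)
    directions = rng.standard_normal((n, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return radii[:, None] * directions


def local_mass_of_ball(d, r=0.25, n=200_000, seed=0):
    """Monte Carlo estimate of Pr_{x'~mu}[x' in B(x, r)] for a fixed point x on
    the outer sphere S3. Since r < BETA, the ball misses the middle sphere; the
    question is how quickly its probability mass shrinks with the dimension d."""
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    x[0] = 1 + BETA                      # a fixed point on the outer sphere
    pts = sample_three_spheres(n, d, rng)
    return float(np.mean(np.linalg.norm(pts - x, axis=1) <= r))


for d in (2, 5, 10, 20):
    print(f"d = {d:2d}   estimated local mass ~ {local_mass_of_ball(d):.1e}")
```

Already for moderate d, essentially no sampled point lands inside the region, which is precisely the regime of Panel (a) of Figure 1.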
Observe that µ is a particularly simple data distribution over three spherical manifolds, and f is a simple classifier that distinguishes its two parts. We illustrate this distribution in panel (a) of Figure 2. Despite its simplicity, locally explaining f with linear classifiers faces fundamental challenges, which we illustrate in Figure 2. Choosing a large local neighborhood, as done at point A, leads to issues posed by the curvature of the data distribution, meaning that it is impossible to create an accurate local linear classifier. On the other hand, choosing a neighborhood small enough for local linearity, as done at point B, leads to local regions that are exponentially small with respect to the data dimension.

Figure 2: An illustration of Theorem 5.1, with the concentric blue and red circles depicting the data distribution µ classified by f, and with local explanations depicted at points A and B. Explanations are forced to either have a large local loss (point A) or a low local mass (point B).

We formalize this in the following theorem.

Theorem 5.1 (A high-dimensional example). Let µ, f be as described above, and let E be any explainer in E(Bd, Ld). Let x be any point chosen on the outer sphere S3. Then E outputs an explanation at x that either has a large local loss or a small local mass. That is, either L(E, f, x) ≥ 1/6, or Λ(E, f, x) ≤ 3^(1−d).

Theorem 5.1 demonstrates that if a locally linear explanation achieves even a remotely reasonable local loss, then it necessarily has an extremely small local mass. This suggests that gradient-based explanations will be exponentially local with respect to the data dimension d.

We believe that this is also exhibited in practice, particularly over image data, where explanations are often verified based on perceptual validity rather than relevance to actual training points beyond the point being explained. For example, the explanations given by SmoothGrad (Smilkov et al., 2017) are visualized as pixel-by-pixel saliency maps. These maps often directly correspond to the image being explained, and are clearly highly specific to it (see, e.g., Figure 3 of Smilkov et al. (2017)). As a result, we would hardly expect the implied linear classifier to have much success over almost any other natural image. This in turn suggests that the locality would be extremely small. We also remark that a similar argument can be made for LIME, which also tends to validate its explanations over images perceptually (for example, see Figure 4 of Ribeiro et al. (2016)).

Unlike the previous methods, Anchors (Ribeiro et al., 2018) explicitly seeks to maximize the local mass of its explanations. However, it abandons this approach for image classifiers, where it instead maximizes a modified form of locality based on super-imposing pixels from the image at hand onto other images. While this gives perceptually valid anchors, the other images that fall within the local region are completely unrealistic (as illustrated in Figure 3 of (Ribeiro et al., 2018)), and the true locality parameter is consequently extremely small. Thus, although Anchors can provide useful and auditable explanations in low-dimensional, tabular data settings, we believe that it too suffers from locality issues for high-dimensional data. In particular, we note that it is possible to construct examples similar to Theorem 5.1 that are designed to force highly local Anchors-based explanations.
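Ignoring constants and the ϵ-dependent exponent, the lower bound of Theorem 4.1 scales roughly like 1/λ. A quick back-of-envelope check (our simplification, not a statement from the paper) of what a locality on the order of 3^(1−d) would imply for the number of audit queries:

```python
# Rough order-of-magnitude reading of Theorem 4.1: constants and the
# epsilon-dependent exponent are dropped, leaving "about 1/lambda queries".
for d in (5, 10, 20, 50):
    lam = 3.0 ** (1 - d)
    print(f"d = {d:2d}   locality ~ {lam:.1e}   queries needed ~ {1 / lam:.1e}")
```

Even at d = 50, far below the dimensionality of typical image data, the implied number of queries exceeds 10^23, which is the sense in which auditing becomes practically impossible.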
## 6 Conclusion

Our results in Section 4 demonstrate that the locality of a local explainer characterizes how much data is needed to audit it: smaller local regions require larger amounts of data. Meanwhile, our discussion in Section 5 shows that typical local explanations are extremely local in high-dimensional space. It follows that in many cases, auditing solely based on pointwise decisions and explanations is impossible. Thus, any entity without model access, such as a collective of users, is never in a position to guarantee trust in a machine learning model. We believe that the only way forward is through a more powerful third-party auditor that crucially has more access to the machine learning model, as this could potentially overcome the fundamental challenges posed by small explainability regions. We view investigating the precise types of access this would entail as an important direction for future work that might have broad practical consequences.

Finally, although our definition of local explainers encompasses several widely used explanation methods, we do note that there are notable exceptions, such as SHAP (Lundberg and Lee, 2017), which does not fit into our paradigm. As a consequence, one important direction for future work is expanding our framework to encompass other local explanation methods and examining to what degree they can be audited.

## Acknowledgements

This work has been supported by the German Research Foundation through the Cluster of Excellence "Machine Learning - New Perspectives for Science" (EXC 2064/1, number 390727645) and the Carl Zeiss Foundation through the CZS Center for AI and Law.

## References

Agarwal, S., Jabbari, S., Agarwal, C., Upadhyay, S., Wu, S., and Lakkaraju, H. (2021). Towards the unification and robustness of perturbation and gradient based explanations. In International Conference on Machine Learning (ICML).

Alvarez-Melis, D. and Jaakkola, T. S. (2018). On the robustness of interpretability methods. arXiv preprint 1806.08049.

Ancona, M., Ceolini, E., Öztireli, C., and Gross, M. (2018). Towards better understanding of gradient-based attribution methods for deep neural networks. In International Conference on Learning Representations (ICLR).

Bassan, S. and Katz, G. (2023). Towards formal XAI: formally approximate minimal explanations of neural networks. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2023), volume 13993 of Lecture Notes in Computer Science, pages 187-207. Springer.

Bhatt, U., Xiang, A., Sharma, S., Weller, A., Taly, A., Jia, Y., Ghosh, J., Puri, R., Moura, J. M. F., and Eckersley, P. (2020). Explainable machine learning in deployment. In Conference on Fairness, Accountability, and Transparency (FAccT).

Bordt, S., Finck, M., Raidl, E., and von Luxburg, U. (2022). Post-hoc explanations fail to achieve their purpose in adversarial contexts. In Conference on Fairness, Accountability, and Transparency (FAccT).

Botari, T., Hvilshøj, F., Izbicki, R., and de Carvalho, A. C. P. L. F. (2020). MeLIME: Meaningful local explanation for machine learning models. arXiv preprint 2009.05818.

Dasgupta, S., Frost, N., and Moshkovitz, M. (2022). Framework for evaluating faithfulness of local explanations. In International Conference on Machine Learning (ICML).
Dombrowski, A., Alber, M., Anders, C. J., Ackermann, M., Müller, K., and Kessel, P. (2019). Explanations can be manipulated and geometry is to blame. In Advances in Neural Information Processing Systems (NeurIPS).

Ghorbani, A., Abid, A., and Zou, J. Y. (2019). Interpretation of neural networks is fragile. In AAAI Conference on Artificial Intelligence.

Huang, X. and Marques-Silva, J. (2023). The inadequacy of Shapley values for explainability. arXiv preprint 2302.08160.

Jesus, S. M., Belém, C., Balayan, V., Bento, J., Saleiro, P., Bizarro, P., and Gama, J. (2021). How can I choose an explainer?: An application-grounded evaluation of post-hoc explanations. In Conference on Fairness, Accountability, and Transparency (FAccT).

Leavitt, M. L. and Morcos, A. S. (2020). Towards falsifiable interpretability research. arXiv preprint 2010.12016.

Lundberg, S. M. and Lee, S. (2017). A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems (NeurIPS).

Oala, L., Fehr, J., Gilli, L., Balachandran, P., Leite, A. W., Ramírez, S. C., Li, D. X., Nobis, G., Alvarado, E. A. M., Jaramillo-Gutierrez, G., Matek, C., Shroff, A., Kherif, F., Sanguinetti, B., and Wiegand, T. (2020). ML4H auditing: From paper to practice. In Machine Learning for Health Workshop (ML4H@NeurIPS).

Poland, C. M. (2022). The right tool for the job: Open-source auditing tools in machine learning. arXiv preprint 2206.10613.

Poppi, S., Cornia, M., Baraldi, L., and Cucchiara, R. (2021). Revisiting the evaluation of class activation mapping for explainability: A novel metric and experimental analysis. In CVPR Workshops.

Poursabzi-Sangdeh, F., Goldstein, D. G., Hofman, J. M., Vaughan, J. W., and Wallach, H. M. (2021). Manipulating and measuring model interpretability. In Conference on Human Factors in Computing Systems (CHI).

Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why should I trust you?": Explaining the predictions of any classifier. In Demonstrations Session, Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

Ribeiro, M. T., Singh, S., and Guestrin, C. (2018). Anchors: High-precision model-agnostic explanations. In AAAI Conference on Artificial Intelligence.

Smilkov, D., Thorat, N., Kim, B., Viégas, F. B., and Wattenberg, M. (2017). SmoothGrad: removing noise by adding noise. arXiv preprint 1706.03825.

Tomsett, R., Harborne, D., Chakraborty, S., Gurram, P., and Preece, A. D. (2020). Sanity checks for saliency metrics. In AAAI Conference on Artificial Intelligence.

Visani, G., Bagli, E., Chesani, F., Poluzzi, A., and Capuzzo, D. (2022). Statistical stability indices for LIME: obtaining reliable explanations for machine learning models. Journal of the Operational Research Society, 73(1):91-101.

Wolf, L., Galanti, T., and Hazan, T. (2019). A formal approach to explainability. In AAAI/ACM Conference on AI, Ethics, and Society (AIES).

Yadav, C., Moshkovitz, M., and Chaudhuri, K. (2022). A learning-theoretic framework for certified auditing of machine learning models. arXiv preprint 2206.04740.

Zhou, J., Gandomi, A. H., Chen, F., and Holzinger, A. (2021). Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593.

## A Proof of Theorem 4.1

### A.1 An additional assumption

We also include the assumption that the locality parameter is small compared to the tolerance parameters.
More precisely, we assume that λ < ϵ2 2. We believe this to be an extremely mild assumption considering that we typically operate in the regime where λ is exponentially small with the dimension, d, whereas the tolerance parameters are typically between 10 2 and 10 3. A.2 Main Proof Proof. Fix ϵ1, ϵ2, γ, λ, and µ, as given in the theorem statement. Our goal is to show the existence of classifier f and explainer E so that the auditor, A, is likely to incorrectly estimate the parameter, Lγ(E, f). To do so, our strategy will be instead to consider a distribution over choices of (E, f), and show that in expectation over this distribution, A estimates Lγ(E, f) poorly. To this end, we define the following quantities: 1. Let E be the explainer given in Section A.4. 2. Let f be the random classifier defined in Section A.4. 3. Let n be any integer with n 1 2592λ(1 8max(ϵ1,ϵ2) . 4. Let X be a random variable for a set of points {x1, . . . , xn} drawn i.i.d from µ. 5. Let Y = f (X) be a random variable for {f (x1), f (x2), . . . , f (xn)}. Y has randomness over both f and X . 6. Let n = Rd n { 1}n, and σ denote the measure over n induced by (X, Y ). 7. By definition E s output is independent of function, f . Thus, we will abbreviate A s output by writing A (X, f (X), E(f , X)) = A(X, Y, E). This emphasizes that both X and the output of E are independent of f . 8. We let I denote the interval that the auditor seeks to output an estimate it. That is, I = Lγ(1 + ϵ1)(E, f ) ϵ2, Lγ(1 ϵ1)(E, f ) + ϵ2 . Using this notation, we seek to lower bound that the auditor fails meaning we seek to lower bound, Pr f ,X [A(X, Y, E) / I ] . To do so, let T1 denote the event 2 ϵ2 < Lγ(1+ϵ1)(E, f ) Lγ(1 ϵ1)(E, f ) < 1 and T2 denote the event 2 + 3ϵ2 < Lγ(1+ϵ1)(E, f ) Lγ(1 ϵ1)(E, f ) < 1 The key observation is that any estimate, A(X, Y, E), can be inside at most one of the intervals, [ 1 2 + ϵ2] and [ 1 2 + 5ϵ2]. Using this, we can re-write our desired probability through the following integration. Let (x, y) denote specific choices of (X, Y ). Note that in this context, x represents a set of points in (Rd)n, and y represents a set of labels in { 1}n. We then have the following: Pr f ,X[A(X, Y, E) / I ] = Z n Pr f [A(x, y, E) / I |X = x, Y = y] dσ(x, y) n Pr f [T1|X = x, Y = y]1 (A(x, y, E) / I1) + Pr f [T2|X = x, Y = y]1 (A(x, y, E) / I1) dσ(x, y) n min Pr f [T1|X = x, Y = y], Pr f [T2|X = x, Y = y] dσ(x, y), where the last equation holds because at least one of the events, A(x, y, E) / I1 and A(x, y, E) / I2, must hold. To bound this last quantity, we utilize Lemma A.9. S = (x, y) : Pr[T1|X = x, Y = y], Pr[T0|X = x, Y = y] 2 By Lemma A.9, we have that σ(S ) 5 6. It follows that Pr f ,X[A(X, Y, E) / I ] Z n min Pr f [T1|X = x, Y = y], Pr f [T2|X = x, Y = y] dσ(x, y) S min Pr f [T1|X = x, Y = y], Pr f [T2|X = x, Y = y] dσ(x, y) S 2 5dσ(x, y) which completes the proof as this implies with probability at least 1 3, the Auditors estimate is not sufficiently accurate. A.3 Non-degenerate Distributions Theorem 4.1 includes the assumption that µ is non-degenerate, which is defined as follows. Definition A.1. We say that data distribution µ over Rd is non-degenerate if for all x Rd, there exists 1 i d such that µ ({x : x i = xi}) = 0. Being non-degenerate essentially means that at any point, x, the data distribution µ has a finite probability density with respect to some feature. This condition any distribution with a well-defined density over Rd (i.e. 
such as a Gaussian) and is also met for most practical data-sets in which any of the features is globally continuously distributed (i.e. mass in kg over a distribution of patients). We exclude data distributions with point masses because they can pose particularly simple cases in which there is a strict lower bound on how small the local region assigned to a given point can be. For example, in the extreme case where µ is concentrated on a single point, auditing any model or explanation over µ is trivial. We now show a useful property of non-degenerate distributions. Lemma A.2. Let µ be a non-degenerate distribution and R be a hyper-rectangle. Then R can be partitioned into two hyper-rectangles, R1, R2 such that µ(R1), µ(R2) µ(R) Proof. Let R = (a1, b1] (a2, b2] (ad, bd]. First, suppose that there exists 1 i d such that for all r (ai, bi], µ ({x : x = r} R) µ(R) r = sup r : µ (R {x : xi r}) µ(R) It follows that setting R1 = R {x : xi r } and R2 = R \ R1 will suffice as R1 will have probability mass at least µ(R) 4 and probability mass at most µ(R) Otherwise, suppose that no such i exists. Then, thus, there exists r1, r2, . . . , rd} such that µ(R {x : xi = ri}) > 0. It follows that the point (r1, . . . , rd) violates Definition A.1, which is a contradiction. Thus some i exists, which allows us to apply the above argument, finishing the proof. A.4 Constructing f and E We begin by partitioning the support of µ into hyper-rectangles such that each rectangle has probability mass in the interval [ α 4 , α]. We then further partition these rectangles into a large number of equal parts. Formally, we have the following: Lemma A.3. Let α > 0 be fixed, and K > 0 be any integer. Then for some integer L > 0, there exists a set of hyper-rectangles, {Rj i : 1 i L, 1 j K} such that the following hold: 1. R1 i , . . . RK i partition rectangle Ri. 2. For all 1 i L, α µ(Ri) 4α. 3. For all 1 i L and 1 j K, µ(Ri) 4K µ(Rj i) µ(Ri) Proof. First construct R1, . . . , RL by using the following procedure: 1. Begin with the set A = {R } where R is a large rectangle containing the support of µ. 2. If A contains a rectangle, R, such that µ(R) > 4α, then apply Lemma A.2 to split R into two rectangles with mass at least µ(R) 4 and mass at most 3µ(R) 3. Repeat step 2 until no such rectangles, R, exist. This process clearly must terminate in a set of rectangles each of which has mass in the desired range, and also must terminate as a single rectangle can only be cut at most log 1 Next, to construct R1 i , R2 i , RK i , we simply utilize an analogous procedure, this time starting with {Ri} and replacing α with µ(Ri) We now construct a fixed explainer, E. Definition A.4. Let E denote the explainer so that for all x supp(µ), we have E(x) = (Rx, g+1) where Rx is the unique hyper-rectangle, Ri that contains x, and g+1 is the constant classifier that always outputs +1. We now construct a distribution over functions f, and let f be a random function that follows this distribution. We have the following: Definition A.5. Let f be a random classifier mapping Rd to { 1} constructed as follows. Let m be an integer and 0 p1, . . . , p2m, q1, . . . , q2m 1 be real numbers that satisfy the conditions set forth in Lemma A.10. Then f is constructed with the following steps: 1. Let P be a binary event that occurs with probability 1 2. If P occurs, then set ri = pi for 1 i p2m. Otherwise set ri = qi. 3. If x / L i=1Ri, then f (x) = +1. 4. For each rectangle Ri, randomly select 1 k 2m at uniform. 5. 
For each sub-rectangle Rj i, with probability rk, set f(x) = 1 for all x Rj i, and with probability 1 rk, set f (x) = +1 for all x Rj i. Note that m is constructed based on ϵ1, ϵ2, and γ which we assume to be provided as in the statement of Theorem 4.1. A.5 Properties of f and E We now prove several useful properties of this construction. To do so, we use the following notation: 1. We let f denote the random variable representing the way f is generated. We use f = f to denote the event that f equals a specific function f : Rd { 1}. 2. We let P denote the indicator variable for the binary event used in Section A.4 to construct f. 3. We let m denote the integer from Lemma A.10 that is used to construction f . 4. We let X = (x1, . . . , xn) µn denote a random variable of n i.i.d selected points from µ. We use x to denote a specific instance of X. 5. We let Y = (y1, . . . , yn) be a random variable over labels constructed by setting yi = f (xi). We similarly use y to denote a specific instance of Y . 6. We let σ denote the measure over Rd { 1} n associated with (X, Y ) . 7. We let n denote the domain of the pair of random vectors (X, Y ) (as done in Section A.2) We begin with a bounds on the probability that we see any rectangle that has a large number of points selected from it. Lemma A.6. Let R1, . . . , RL be as defined in section A.4, and m as given. Let U denote the subset of n such that U = {(x, y) : 1 i z, |X Ri| 2m} . Then σ(U) 1 180. Proof. We bound the probability that a single rectangle, Ri, contains at least 2m points from X, and then apply a union bound over all L rectangles. By construction, µ(Ri) 4λ, which implies that for each point xj X the probability that Xj falls within rectangle Ri is at most 4λ. Thus, for any set of 2m distinct points from X, the probability that they all fall within Ri is at most (4λ)2m. By taking a union bound over all n 2m subsets of 2m point from X, and substituting our assumed upper bound for n (point 3. of Section A.2), we have the following Pr[|X Ri| 2m] n 2m 2m 1 2592 max(ϵ1, ϵ2)λ1 8 max(ϵ1,ϵ2) e 2m 41 1 2m λ1 1 2m 2592 max(ϵ1, ϵ2)λ1 8 max(ϵ1,ϵ2) By definition (see Lemma A.10), m 1 16 max(ϵ1,ϵ2). Substituting this, and noting that λ1 1 2m is increasing with respect to m (since λ < 1), we have Pr[|X Ri| 2m] (4λ) e 2m 41 1 2m λ1 1 2m 2592 max(ϵ1, ϵ2)λ1 8 max(ϵ1,ϵ2) (4λ) e8 max(ϵ1, ϵ2) 1 4λ1 8 max(ϵ1,ϵ2) 2592 max(ϵ1, ϵ2)λ1 8 max(ϵ1,ϵ2) Finally, we apply a union bound over all rectangles. Observe that there are at most 1 λ such rectangles because by construction each rectangle has mass at most λ. Thus, our total probability is at most 1 λ λ 180 which is at most 1 180 as desired. Next, we leverage the properties from the construction of f to bound the conditional probability of P = 1 when (x, y) / U. Lemma A.7. Let (x, y) be in the support of σ so that (x, y) / U. Then Pr[P = 1|(X, Y ) = (x, y)] = Pr[P = 0|(X, Y ) = (x, y)] = 1 Proof. Our main idea will be to use Bayes-rule, and show that Pr[(X, Y ) = (x, y)|P = 1] = Pr[(X, Y ) = (x, y)|P = 0. This will suffice due to the fact that the prior distribution for P is uniform over {0, 1}. To do so, we first note that X is independent from P. For this reason, it suffices to show that Pr[Y = y|P = 1, X = x] = Pr[Y = y|P = 0, X = x]. To do so, we will express these probabilities in terms of the real numbers, p1, . . . , p2m and q1, . . . , q2m from which they were constructed (see Definition A.5). For each rectangle, Ri (see Lemma A.3), let Y Ri denote the function values of all points in the set X Ri. 
It follows from step 4 of Definition A.5 that the values in Y Ri and Y Rj are independent from each other. Thus, we can re-write our desired probability as Pr[Y = y|P = 1, X = x] = i=1 Pr (Y Ri) = (y Ri)|P = 1, (X Ri) = (x Ri) . We now analyze the latter quantity for a rectangle, Ri. For convenience, let us relabel indices so that x Ri = {x1, x2, . . . , xl} and y Ri = {y1, . . . , yl} for some integer l 0. We also let X1, . . . , Xl and Y1, . . . , Yl denote the corresponding values for X Ri and Y Ri. We now further assume that that for all xa, xb {x1, . . . , xl}, that xa and xb are contained within different sub-rectangles, Ra i , Rb i (see Definition A.5). If this isn t the case, observe that we can simply remove the pair (xb, yb), as by the construction of f , this will be forced to be identical to (xa, ya). By applying this assumption, we now have that for a given choice of the parameter rk (step 4 of Definition A.5), the values of y1, . . . , yl are mutually independent. Utilizing this, we have Pr (Y Ri) = (y Ri)|P = 1, (X Ri) = (x Ri) = 1 2m Where F is a polynomial of degree l. Here, the expression, yk 2 simply evaluates to pj if yk = 1 (as pj is the probability of observing a 1) and 1 pj otherwise. Next, observe that the only difference when performing this computation for P = 0 is that we use the real numbers, q1, . . . q2m instead. Thus, we have, Pr (Y Ri) = (y Ri)|P = 0, (X Ri) = (x Ri) = 1 2m To show these two expression are equal, by assumption (x, y) / U which implies that l < 2m. Furthermore, by Lemma A.10, P2m k=1 pt k = P2m k=1 qt k,for all 0 t l. It follows that P2m k=1 F(pk) = P2m k=1 F(qk), which implies our desired result. Next, we bound the probability of events related to the value of Lγ, the parameter that the Auditor seeks to estimate. Lemma A.8. Let T1 denote the event that 1 2 ϵ2 < Lγ(1+ϵ1) Lγ(1 ϵ1) < 1 Let T0 denote the event that 1 2 + 3ϵ2 < Lγ(1+ϵ1) < Lγ(1 ϵ1) 1 Then taken over the randomness of the entire construction, Pr[T1, P = 1], Pr[T0, P = 0] 89 180, where P is the binary event defined above. Proof. By definition, Pr[P = 1] = Pr[P = 0] = 1 2. Thus, it suffices to show that Pr[T1|P = 1], Pr[T0|P = 0] 89 We begin with the case that P = 1 (the case for P = 0 will be similar). For each rectangle, Ri, let r(Ri) denote the choice of rk made for Ri in step 5 of Definition A.5. The crucial observation is that the value of r(Ri) nearly determines the local loss that E pays for points in Ri with respect to f . In particular, if the number of sub-rectangles, K, is sufficiently large, then by the law of large numbers, we have that with high probability over the choice of f , for all rectangles Ri and for all x Ri, |L(E, f , x) r(Ri)| < 0.01γ(ϵ). (1) Let us fix K to be any number for which this holds, and assume that this value of K is set throughout our construction. Next, recall by Lemma A.10 that p1 < p2 < < pm < γ(1 2ϵ1) < γ(1 + 2ϵ1) < pm+1 < < p2m. Recall that r(Ri) is chosen at uniform among {p1 . . . p2m} (step 5 of Definition A.5). It follows from Equation 1 that for any x Ri, and for any α {γ(1 ϵ1), γ(1 + ϵ1)} that Pr f [L(E, f , x) α for all x Ri] = 1 Furthermore, because we are conditioning on P = 1, the value of f within each rectangle, Ri, is independent. This implies that we can bound the behavior of Lα(E, f ) by expressing as a sum of independent variables. 
Let α {γ(1 ϵ1), γ(1 + ϵ1)}, we have by Hoeffding s inequality that Pr Lα(E, f ) 1 i=1 µ(Ri)1 (L(E, f , x) α for all x Ri) 2ϵ2 2 PL i=1 µ(Ri)2 1 2 exp 2ϵ2 2 16λ 1 1 180 = 179 The penultimate inequality holds since µ(Ri) 4λ for each Ri, and because there are at most 1 λ such rectangles. The last inequality holds because λ < ϵ2 2 by the assumption in Section A.1. Thus by taking a union bound over both values of α, we have that Lα(E, f ) 1 with probability at least 89 90. This completes our proof for the case P = 1. For P = 0, we can follow a nearly identical argument. The only difference is that the values of q (see Lemma A.10) are selected so that Pr f [L(E, f , x) α] 1 This results in the expected loss falling within a different interval, and an identical analysis using Hoeffding s inequality gives the desired result. The main idea of proving Theorem 4.1 is to show that for many values of X, Y , the conditional probabilities of T1 and T0 occurring are both fairly large. This, in turn, will cause the Auditor to have difficulty as its estimate will necessarily fail for at least one of these events. To further assist with proving this, we have the following additional lemma. Lemma A.9. Let S denote the subset of Rd { 1} n such that S = (x, y) : Pr[T1|(X, Y ) = (x, y)], Pr[T0|(X, Y ) = (x, y)] 2 Then σ(S ) 5 Proof. Let S 1 = {(x, y) : Pr[T1|(X, Y ) = (x, y)] < 2 5}, and similarly S 2 = {(x, y) : Pr[T2|(X, Y ) = (x, y)] < 2 5}. Then S = Rd { 1} n \ (S 1 S 2). Thus it suffices to upper bound the mass of S 1 and S 2. To do so, let U be the set defined in Lemma A.6. Then we have 89 180 Pr[T1, P = 1] (Rd { 1})n Pr[T1, P = 1|(X, Y ) = (x, y)]dσ S 1 Pr[T1, P = 1|(X, Y ) = (x, y)]dσ + Z U\S 1 Pr[T1, P = 1|(X, Y ) = (x, y)]dσ n\(S 1 U) Pr[T1, P = 1|(X, Y ) = (x, y)]dσ 5σ(S 1) + (σ(U) σ(U S 1)) + 1 2 (σ( n \ U) σ (( n \ U) S 1)) 2σ( n \ U) + σ(U) 2 179 180 + 1 180 σ(S 1) 10 Here we are simply leveraging the fact that Pr[P = 1|X, Y = x, y] is precisely 1 2 when x, y are not in U, and consequently that the probability of T1 and P = 1 is at most 2 5, 1, and 1 2 when (x, y) is in the sets S 1, U \ S 1 and ( n \ U) \ S 1 respectively. Finally, simplifying this inequality gives us σ(S 1) 1 12. A completely symmetric argument will similarly give us that σ(S 2) 1 12. Combining these with a union bound, it follows that σ(S ) 5 6, as desired. A.6 Technical Lemmas Lemma A.10. For all 0 < γ, ϵ1, ϵ2 < 1 48, there exists m > 0, and real numbers 0 p1, p2, . . . , p2m, q1, . . . , q2m 1 such that the following four conditions hold: 1. For all 0 t 2m 1, P2m i=1 pt i = P2m i=1 qt i. 2. p1 p2 pm < γ(1 2ϵ1) < γ(1 + 2ϵ1) < pm+1 . . . p2m. 3. q1 q2 qm 1 < qm = qm+1 = γ(1 + 2ϵ1) < qm+2 . . . q2m. 4. 1 4ϵ2 m 1 8 max(2ϵ1,ϵ2) + 1. Proof. Let l denote the largest integer that is strictly smaller than 1 8 max(2ϵ1,ϵ2), and let ϵ = 1 8l. It follows that ϵ max(2ϵ1, ϵ2). Let Pl and Ql be as defined in Lemma A.11. Let m = 2l. Then it follows from the definitions of m, l that m = 2l 1 4 max(2ϵ1, max(ϵ2) 1 4ϵ2 , which proves the first part of property 4 in Lemma A.10. For the second part, by the definition of l, we have l 1 8 max(2ϵ1,ϵ2) 1. Since ϵ1, ϵ2 1 48, it follows that l 2 which implies m = 2l l + 2 1 8 max(2ϵ1, ϵ2) + 1. Next, let p1, . . . , p4l denote the (sorted in increasing order) roots of the polynomial P l (x) = Pl x γ(1 + 2ϵ1) Since the roots of Pl are explicitly given in Lemma A.11, it follows that the middle two roots of P l (x) (which are the values of pm and pm+1) satisfy pm = γ(1 + 2ϵ1 2ϵ), pm+1 = γ(1 + 2ϵ1 + 2ϵ). 
Because $\epsilon > 2\epsilon_1$, these values clearly satisfy the inequalities given by point 2 in the lemma statement. Next, define $q_1, \ldots, q_{4l}$ as the (sorted in increasing order) roots of the polynomial
$$Q_l'(x) \;=\; Q_l\!\left(\frac{x - \gamma(1 + 2\epsilon_1)}{2\gamma\epsilon}\right).$$
Again using Lemma A.11, we see that $q_m = q_{m+1} = \gamma(1 + 2\epsilon_1)$, which satisfies point 3. To see that the $p_i$ and $q_i$ are indeed in the desired range, we simply note that by substitution, both $p_1$ and $q_1$ must be larger than $\gamma(1 + 2\epsilon_1) - 4l(2\gamma\epsilon)$. However, by definition, $4l(2\gamma\epsilon) = \gamma$. Thus this quantity is larger than 0, which implies that $p_1$ and $q_1$ are both positive. Because $\gamma < \frac{1}{10}$, a similar argument implies that $p_{2m}$ and $q_{2m}$ are at most 1. Finally, point 1 follows from the fact that $p_1, \ldots, p_{2m}$ and $q_1, \ldots, q_{2m}$ are the complete sets of roots of two polynomials that agree in all coefficients except the constant term. It follows by basic properties of Newton sums that $\sum_i p_i^t = \sum_i q_i^t$ for $0 \le t \le 2m - 1$, and this proves point 1.
Lemma A.11. For any $l > 0$, let
$$P_l(x) \;=\; \big((x+1)(x+3)\cdots(x+4l-1)\big)\,\big((x-1)(x-3)\cdots(x-4l+1)\big).$$
Let $Q_l(x) = P_l(x) - P_l(0)$. Then $Q_l$ has $2l - 1$ distinct real roots over the interval $(-4l, -1)$, $2l - 1$ distinct real roots over the interval $(1, 4l)$, and a double root at $x = 0$.
Proof. By symmetry, $P_l'(0) = Q_l'(0) = 0$, and by definition $Q_l(0) = 0$. It follows that $x = 0$ is a double root. Next, fix $1 \le i \le l - 1$. By definition, we have that $P_l(4i - 1) = P_l(4i + 1) = 0$, and consequently $Q_l(4i - 1) = Q_l(4i + 1) = -P_l(0)$. Meanwhile, we also have that $P_l(0) = \prod_{j=1}^{2l}(2j-1)^2 > 0$. By directly comparing terms, it follows that $P_l(4i)$ is strictly larger than $P_l(0)$, and hence $Q_l(4i) > 0$. Thus, by the intermediate value theorem, $Q_l$ must have at least one root in both $(4i - 1, 4i)$ and $(4i, 4i + 1)$. Using a similar argument, we can also show that $Q_l$ has at least one root in $(4l - 1, 4l)$. Since $P_l$ is an even function, it follows that $Q_l$ is as well, which means it symmetrically has roots in the intervals $(-4i, -4i + 1)$ for $1 \le i \le l$ and $(-4i - 1, -4i)$ for $1 \le i \le l - 1$. Taken all together, we have constructed $2(l + l - 1) = 4l - 2$ distinct intervals that each contain a root. Since $Q_l$ also has a double root at $x = 0$, it follows that this must account for all of its roots, as $\deg(Q_l) = \deg(P_l) = 4l$.
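To make the construction in Lemmas A.10 and A.11 concrete, the following short numerical check (a sketch; the values of gamma, eps1 and l are hypothetical illustrations, not choices made in the paper) builds the two rescaled root sets and verifies the matching power sums and the ordering properties.

```python
import numpy as np

# Illustrative parameters (hypothetical choices, not from the paper).
gamma, eps1 = 0.02, 0.005
l = 2                      # so m = 2l, and the polynomials have degree 4l
eps = 1.0 / (8 * l)        # the rescaling parameter from the proof of Lemma A.10

# P_l(x) = prod over odd o < 4l of (x + o)(x - o);  Q_l(x) = P_l(x) - P_l(0).
odds = np.arange(1, 4 * l, 2)
P = np.poly(np.concatenate([odds, -odds]))   # monic coefficients of P_l
Q = P.copy()
Q[-1] -= np.polyval(P, 0.0)                  # subtract the constant term P_l(0)

# Roots of P_l and Q_l, rescaled via x -> gamma*(1 + 2*eps1) + 2*gamma*eps*x.
p = np.sort(gamma * (1 + 2 * eps1) + 2 * gamma * eps * np.roots(P).real)
q = np.sort(gamma * (1 + 2 * eps1) + 2 * gamma * eps * np.roots(Q).real)

m = 2 * l
# Property 1 of Lemma A.10: power sums agree for t = 0, ..., 2m - 1.
for t in range(2 * m):
    assert np.isclose((p ** t).sum(), (q ** t).sum()), t
# Properties 2 and 3: the two root sets straddle / sit at gamma*(1 +- 2*eps1).
assert p[m - 1] < gamma * (1 - 2 * eps1) < gamma * (1 + 2 * eps1) < p[m]
assert np.isclose(q[m - 1], gamma * (1 + 2 * eps1))
assert np.isclose(q[m], gamma * (1 + 2 * eps1))
print("power sums of p and q agree up to degree", 2 * m - 1)
```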
B Proof of Theorem 4.2
B.1 Algorithm description
The main idea of our auditor, simple_audit, is to perform a brute-force audit: we choose a set of points, $X_1$, and attempt to assess the accuracy of their local explanations by using a wealth of labeled data, $X_2$, to validate them. Our algorithm uses the following steps (pseudocode is given in Algorithm 1; an illustrative sketch of the same procedure follows the notation below).
1. (lines 1–3) We first partition $X$ based on the tolerance parameters $\epsilon_1, \epsilon_2, \delta$. $X_1$ will be the set of points that we validate, and $X_2$ will be the set of points we use for validation.
2. (line 8) For each point $x$ in $X_1$, we check whether enough points from $X_2$ fall within its local region, $R_x$, to accurately estimate its local loss.
3. (lines 9–13) For each point satisfying the criterion in line 8, we evaluate its empirical local loss and then tally how many points have a loss larger than $\gamma$.
4. (line 17) We output the proportion of points with loss larger than $\gamma$ among all points whose loss we measured.
At a high level, we expect this algorithm to succeed as long as we have enough data in each of the local regions induced from points in $X_1$.
B.2 Notation
We use the following:
1. Let $\delta, \epsilon_1, \epsilon_2, \gamma$ be the tolerance parameters defined in Definition 3.1.
2. Let $\lambda = \Lambda(E, f)$ denote the locality of $E, f$ w.r.t. the data distribution $\mu$.
3. Let $X_1$ be the set of points that are specifically being audited.
4. Let $X_2$ be the set of points being used to audit.
5. Let $|X_1| = m$, as set in line 1 of Algorithm 1.
6. We set $|X_2| = n' = n - m$, where $n$ is lower bounded as required in Theorem 4.2 (see the main proof below).
7. For any $x \in \mathbb{R}^d$, we let $E(x) = (R_x, g_x)$ be the local explanation outputted for $x$ by the explainer $E$.
Algorithm 1 simple_audit$(X, f(X), E(f, X), \epsilon_1, \epsilon_2, \gamma, \delta)$
1: $m \leftarrow \lceil \frac{121}{2\epsilon_2^2}\log\frac{12}{\delta} \rceil$
2: $k \leftarrow \lceil \frac{1}{2\gamma^2\epsilon_1^2}\log\frac{176}{\epsilon_2\delta} \rceil$
3: $X_1 \leftarrow \{x_1, \ldots, x_m\}$, $X_2 \leftarrow X \setminus X_1$
4: $\hat r, \hat b \leftarrow 0$
5: for $x_i \in X_1$ do
6: $\quad (R_{x_i}, g_{x_i}) \leftarrow E(x_i, f)$
7: $\quad X_1^i \leftarrow R_{x_i} \cap X_2$
8: $\quad$ if $|X_1^i| \ge k$ then
9: $\quad\quad \hat L(E, f, x_i) \leftarrow \frac{1}{|X_1^i|}\sum_{x_j \in X_1^i} \mathbf{1}\big(g_{x_i}(x_j) \ne f(x_j)\big)$
10: $\quad\quad$ if $\hat L(E, f, x_i) > \gamma$ then
11: $\quad\quad\quad \hat r \leftarrow \hat r + 1$
12: $\quad\quad$ else
13: $\quad\quad\quad \hat b \leftarrow \hat b + 1$
14: $\quad\quad$ end if
15: $\quad$ end if
16: end for
17: Return $\frac{\hat r}{\hat r + \hat b}$
We also define the following quantities related to estimating how frequently the local loss of the explainer $E$ is above the desired threshold $\gamma$.
1. Let $r^* = \Pr_{x \sim \mu}[L(E, f, x) \ge \gamma(1 + \epsilon_1)]$.
2. Let $g^* = \Pr_{x \sim \mu}[\gamma(1 - \epsilon_1) \le L(E, f, x) \le \gamma(1 + \epsilon_1)]$.
3. Let $b^* = \Pr_{x \sim \mu}[L(E, f, x) \le \gamma(1 - \epsilon_1)]$.
Here, $r^*$ denotes the probability that a point has a large local error, $b^*$ the probability that a point has a low local error, and $g^*$ the probability of an "in-between" case that is near the desired threshold $\gamma$. By the definition of sample complexity (Definition 3.1), the goal of Algorithm 1 is to output an estimate inside the interval $[r^* - \epsilon_2,\ r^* + g^* + \epsilon_2]$.
Next, we define $r, g, b$ as versions of these quantities that are based on the sample $X_1$.
1. Let $r = \Pr_{x \sim X_1}[L(E, f, x) \ge \gamma(1 + \epsilon_1)]$.
2. Let $g = \Pr_{x \sim X_1}[\gamma(1 - \epsilon_1) \le L(E, f, x) \le \gamma(1 + \epsilon_1)]$.
3. Let $b = \Pr_{x \sim X_1}[L(E, f, x) \le \gamma(1 - \epsilon_1)]$.
Observe that while $x$ is drawn uniformly from $X_1$ in these quantities, we still use the true loss with respect to $\mu$, $L(E, f, x)$, to determine whether it falls into $r$, $g$, or $b$. Because of this, it becomes necessary to define two more fully empirical quantities that serve as estimates of $r$ and $b$ (we ignore $g$, as it will merely contribute to a "margin" in our estimates).
1. Let $\hat r = \Pr_{x \sim X_1}\big[\Pr_{x' \sim X_2}[g_x(x') \ne f(x') \mid x' \in R_x] > \gamma$ and $|X_2 \cap R_x| \ge \frac{1}{2(\gamma\epsilon_1)^2}\log\frac{176}{\epsilon_2\delta}\big]$.
2. Let $\hat b = \Pr_{x \sim X_1}\big[\Pr_{x' \sim X_2}[g_x(x') \ne f(x') \mid x' \in R_x] \le \gamma$ and $|X_2 \cap R_x| \ge \frac{1}{2(\gamma\epsilon_1)^2}\log\frac{176}{\epsilon_2\delta}\big]$.
The final estimate outputted by Algorithm 1 is precisely $\frac{\hat r}{\hat r + \hat b}$. Thus, our proof strategy will be to show that for sufficiently large samples, $r, g, b$ are relatively accurate estimates of $r^*, g^*, b^*$, and in turn $\hat r$ and $\hat b$ are relatively accurate estimates of $r$ and $b$. Together, these will imply that our estimate is within the desired interval $[r^* - \epsilon_2,\ r^* + g^* + \epsilon_2]$.
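For concreteness, here is a minimal Python sketch of the procedure in Algorithm 1. The explainer interface (a callable returning a membership test for $R_x$ together with the local classifier $g_x$) and the model callable f are assumptions made for illustration, and the constants in the first two lines simply mirror lines 1–2 of Algorithm 1 as reconstructed above; this is a sketch, not a reference implementation.

```python
import numpy as np

def simple_audit(X, f, E, eps1, eps2, gamma, delta):
    """Brute-force auditing sketch following Algorithm 1.

    X : (n, d) array of sample points.
    f : callable mapping an array of points to labels in {-1, +1}.
    E : callable x -> (in_region, g_x), where in_region(points) returns a boolean
        mask for the local region R_x and g_x(points) returns the local
        classifier's labels.  (This interface is an illustrative assumption.)
    """
    # Lines 1-2: size of the audited set and the per-region sample requirement.
    m = int(np.ceil(121 / (2 * eps2**2) * np.log(12 / delta)))
    k = int(np.ceil(np.log(176 / (eps2 * delta)) / (2 * gamma**2 * eps1**2)))

    X1, X2 = X[:m], X[m:]          # line 3
    y2 = f(X2)
    r_hat, b_hat = 0, 0            # line 4

    for x in X1:                   # lines 5-16
        in_region, g_x = E(x)
        mask = in_region(X2)       # which validation points fall inside R_x
        if mask.sum() >= k:        # line 8: enough data to estimate the local loss
            local_loss = np.mean(g_x(X2[mask]) != y2[mask])   # line 9
            if local_loss > gamma:                            # lines 10-14
                r_hat += 1
            else:
                b_hat += 1

    # Line 17: fraction of measured points whose empirical local loss exceeds gamma.
    return r_hat / (r_hat + b_hat) if (r_hat + b_hat) > 0 else 0.0
```

The check in line 8 is where the procedure becomes expensive in practice: unless the regions $R_x$ carry enough probability mass (i.e., the locality $\lambda$ is not too small), most audited points are simply skipped.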
B.3 The main proof
Proof. (Theorem 4.2) Let
$$n \;\ge\; \frac{121}{2\epsilon_2^2}\log\frac{12}{\delta} \;+\; \frac{\log\frac{176}{\epsilon_2\delta}}{2\lambda\gamma^2\epsilon_1^2}\,\log\!\left(\frac{44\,\log\frac{176}{\epsilon_2\delta}}{\epsilon_2\delta\,\gamma^2\epsilon_1^2}\right).$$
By ignoring log factors, we see that $n = O\!\big(\frac{1}{\epsilon_2^2} + \frac{1}{\lambda\gamma^2\epsilon_1^2}\big)$, thus satisfying the desired requirement in Theorem 4.2. Let $X_1$ and $X_2$ be as in Algorithm 1, and let $m$ and $n'$ denote $|X_1|$ and $|X_2|$ respectively. Directly from Algorithm 1, it follows that
$$m \;\ge\; \frac{121}{2\epsilon_2^2}\log\frac{12}{\delta}, \qquad n' \;\ge\; \frac{\log\frac{176}{\epsilon_2\delta}}{2\lambda\gamma^2\epsilon_1^2}\,\log\!\left(\frac{44\,\log\frac{176}{\epsilon_2\delta}}{\epsilon_2\delta\,\gamma^2\epsilon_1^2}\right).$$
By letting $\epsilon = \epsilon_2/11$ and $k = \frac{\log\frac{16}{\epsilon\delta}}{2(\gamma\epsilon_1)^2}$, we have that $m \ge \frac{1}{2\epsilon^2}\log\frac{12}{\delta}$, and that
$$n' \;\ge\; \frac{\log\frac{16}{\epsilon\delta}}{2\lambda\gamma^2\epsilon_1^2}\,\log\!\left(\frac{4\,\log\frac{16}{\epsilon\delta}}{\epsilon\delta\,\gamma^2\epsilon_1^2}\right) \;=\; \frac{k}{\lambda}\log\frac{8k}{\epsilon\delta}.$$
Our bounds on $m$, $k$ and $n'$ allow us to apply Lemmas B.1 and B.4 along with a union bound to get that the following equations hold simultaneously with probability at least $1 - \delta$ over $X \sim \mu^n$:
$$|r - r^*|, |g - g^*|, |b - b^*| \le \epsilon, \qquad (2)$$
$$r(1 - 2\epsilon) \le \hat r \le r + g + b\epsilon, \qquad (3)$$
$$b(1 - 2\epsilon) \le \hat b \le r\epsilon + g + b. \qquad (4)$$
Recall that our goal is to show that $\frac{\hat r}{\hat r + \hat b} \in [r^* - \epsilon_2,\ r^* + g^* + \epsilon_2]$ holds with probability at least $1 - \delta$. Thus, it suffices to show that this is a simple algebraic consequence of Equations 2, 3, and 4. To this end, we have
$$\frac{\hat r}{\hat r + \hat b} \;\overset{(a)}{\ge}\; \frac{r(1-2\epsilon)}{r(1-2\epsilon) + r\epsilon + g + b} \;\ge\; \frac{r}{r + b + g} - 2\epsilon \;\overset{(b)}{\ge}\; \frac{r^* - \epsilon}{r^* + b^* + g^* + 3\epsilon} - 2\epsilon \;=\; \frac{r^* - \epsilon}{1 + 3\epsilon} - 2\epsilon \;\ge\; r^*(1 - 3\epsilon) - \epsilon - 2\epsilon \;\ge\; r^* - 4\epsilon - 2\epsilon \;\overset{(c)}{\ge}\; r^* - \epsilon_2.$$
Here, step (a) follows from Equations 3 and 4, (b) from Equation 2, and (c) from the choice $\epsilon = \epsilon_2/11$. For the other side of the inequality, we have
$$\frac{\hat r}{\hat r + \hat b} \;\overset{(a)}{\le}\; \frac{r + g + b\epsilon}{r + g + b\epsilon + b(1-2\epsilon)} \;\le\; \frac{r + g + b\epsilon}{(r + g + b)(1-\epsilon)} \;\le\; \frac{r + g}{(r + g + b)(1-\epsilon)} + \frac{\epsilon}{1-\epsilon} \;\le\; \frac{r + g}{r + g + b} + 2\epsilon + \epsilon(1 + 2\epsilon) \;\overset{(b)}{\le}\; \frac{r^* + g^* + 2\epsilon}{r^* + g^* + b^* - 3\epsilon} + 3\epsilon + 2\epsilon^2 \;=\; \frac{r^* + g^* + 2\epsilon}{1 - 3\epsilon} + 3\epsilon + 2\epsilon^2 \;\overset{(c)}{\le}\; (r^* + g^*)(1 + 4\epsilon) + 2\epsilon(1 + 4\epsilon) + 3\epsilon + 2\epsilon^2 \;\le\; r^* + g^* + 6\epsilon + 8\epsilon^2 + 3\epsilon + 2\epsilon^2 \;\overset{(d)}{\le}\; r^* + g^* + 11\epsilon \;\overset{(e)}{\le}\; r^* + g^* + \epsilon_2.$$
Here, step (a) follows from Equations 3 and 4, (b) from Equation 2, (c) from the fact that $\frac{1}{1-3\epsilon} \le 1 + 4\epsilon$, (d) from the fact that $10\epsilon^2 \le 2\epsilon$, and (e) from the choice $\epsilon = \epsilon_2/11$.
B.4 Concentration lemmas
In this section, we show several lemmas that allow us to bound the behavior of the random variables $r, g, b, \hat r$ and $\hat b$ (defined in Section B.2). We also use $m$ and $n'$, as defined in Section B.2, for the sizes $|X_1|$ and $|X_2|$ respectively. Finally, we also let $\epsilon = \epsilon_2/11$.
We begin by bounding the differences between $r, g, b$ and $r^*, g^*, b^*$.
Lemma B.1. Suppose that $m \ge \frac{1}{2\epsilon^2}\log\frac{12}{\delta}$. Then with probability at least $1 - \frac{\delta}{2}$ over $X_1 \sim \mu^m$, the following holds: $|r - r^*|, |g - g^*|, |b - b^*| \le \epsilon$.
Proof. Observe that $r$ is the average of $m$ i.i.d. binary variables, each of which has expected value $r^*$. It follows by Hoeffding's inequality that
$$\Pr[|r - r^*| > \epsilon] \;\le\; 2\exp\big(-2\epsilon^2 m\big) \;\le\; 2\exp\Big(-\log\frac{12}{\delta}\Big) \;=\; \frac{\delta}{6}.$$
By an identical argument, we see that the same holds for $\Pr[|g - g^*| > \epsilon]$ and $\Pr[|b - b^*| > \epsilon]$. Applying a union bound over all three gives us the desired result.
Next, we show that if $n'$ is sufficiently large, then it is highly likely that for any given point $x$, we observe a large number of points from $X_2$ within the explanation region $R_x$.
Lemma B.2. Let $x \in \mathrm{supp}(\mu)$, and let $k > 0$ be an integer. Suppose that $n' \ge \frac{k}{\lambda}\log\frac{8k}{\delta\epsilon}$. Then with probability at least $1 - \frac{\delta\epsilon}{8}$ over $X_2 \sim \mu^{n'}$, $|R_x \cap X_2| \ge k$.
Proof. Partition $X_2$ into $k$ sets, $X_2^1, X_2^2, \ldots, X_2^k$, each of which contains at least $\frac{1}{\lambda}\log\frac{8k}{\delta\epsilon}$ i.i.d. points from $\mu$. Because each point is drawn independently, we have that for any $1 \le i \le k$,
$$\Pr[X_2^i \cap R_x = \emptyset] \;=\; \Big(1 - \Pr_{x' \sim \mu}[x' \in R_x]\Big)^{\frac{1}{\lambda}\log\frac{8k}{\delta\epsilon}} \;\le\; (1 - \lambda)^{\frac{1}{\lambda}\log\frac{8k}{\delta\epsilon}} \;\le\; \frac{\delta\epsilon}{8k}.$$
Here we are using the definition of $\lambda$ as a lower bound on the probability mass of $R_x$. It follows by a union bound that the probability that at least one of the sets in $\{X_2^i \cap R_x : 1 \le i \le k\}$ is empty is at most $\frac{\delta\epsilon}{8}$. Thus with probability at least $1 - \frac{\delta\epsilon}{8}$, all the sets are non-empty, which implies that $|R_x \cap X_2| \ge k$, completing the proof.
Next, we show that if $R_x$ has a sufficient number of points, then it is quite likely for the empirical estimate of the local loss at $x$ to be accurate.
Lemma B.3. Let $x \in \mathrm{supp}(\mu)$, and let $k \ge \frac{\log\frac{16}{\epsilon\delta}}{2(\gamma\epsilon_1)^2}$. Then, conditioned on there being at least $k$ elements from $X_2$ in $R_x$, the empirical local loss at $x$ differs from the true local loss by at most $\gamma\epsilon_1$ with probability at least $1 - \frac{\delta\epsilon}{8}$. That is,
$$\Pr\Bigg[\bigg|L(E, f, x) - \frac{1}{|X_2 \cap R_x|}\sum_{x' \in X_2 \cap R_x} \mathbf{1}\big(g_x(x') \ne f(x')\big)\bigg| > \gamma\epsilon_1 \;\Bigg|\; |X_2 \cap R_x| \ge k\Bigg] \;\le\; \frac{\delta\epsilon}{8}.$$
Proof. The key idea of this lemma is that the distribution of $k$ points drawn from $\mu$ conditioned on being in $R_x$ is precisely the marginal distribution over which $L(E, f, x)$ is defined. In particular, this means that the points in $X_2 \cap R_x$ can be construed as i.i.d. draws from the marginal distribution of $\mu$ over $R_x$. Given this observation, the rest of the proof is a straightforward application of Hoeffding's inequality. Letting $\hat L(E, f, x) = \frac{1}{|X_2 \cap R_x|}\sum_{x' \in X_2 \cap R_x}\mathbf{1}(g_x(x') \ne f(x'))$ and $K = |X_2 \cap R_x|$, we have
$$\Pr\Big[\big|L(E, f, x) - \hat L(E, f, x)\big| > \gamma\epsilon_1 \;\Big|\; K \ge k\Big] \;\le\; 2\exp\big(-2K(\gamma\epsilon_1)^2\big) \;\le\; 2\exp\Big(-\log\frac{16}{\epsilon\delta}\Big) \;=\; \frac{\epsilon\delta}{8},$$
as desired.
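As a rough numerical illustration of how the requirements in Lemmas B.2 and B.3 scale with the tolerances and the locality lambda, consider the following sketch; the parameter values are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical tolerances and locality; these are not values used in the paper.
eps1, eps2, delta, gamma, lam = 0.1, 0.1, 0.05, 0.2, 1e-3
eps = eps2 / 11   # rescaled tolerance, following Section B.4 above

# Lemma B.3: samples needed inside a single region R_x to estimate its local loss.
k = math.log(16 / (eps * delta)) / (2 * (gamma * eps1) ** 2)

# Lemma B.2: size of X2 that guarantees >= k points land in R_x with high probability.
n_prime = k * math.log(8 * k / (delta * eps)) / lam

print(f"per-region sample requirement k  ~ {k:,.0f}")
print(f"validation-set requirement   n' ~ {n_prime:,.0f}")
```

The $\frac{1}{\lambda}$ factor in the second bound is what drives the blow-up for highly local explanations.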
Finally, we use the previous two lemmas to show that r and b closely approximate r and b. Lemma B.4. Let k log 16 ϵδ 2(γϵ1)2 , and suppose that n k log 8k δϵ λ . Then with probability at least 1 δ 2 over X2 µn , the following equations holds: r(1 2ϵ) r r + g + bϵ, b(1 2ϵ) b rϵ + g + b. Proof. We begin by defining subsets of X1 that correspond to r, g, b, r and b . We have 1. Let R = {x X1 : L(E, f, x) γ(1 + ϵ1)}. 2. Let G = {x X1 : γ(1 ϵ1) L(E, f, x) γ(1 + ϵ1)}. 3. Let B = {x X1 : L(E, f, x) γ(1 ϵ1)]}. 4. Let R = x X1 : Prx X2[gx(x ) = f(x )|x Rx] > γ and |X2 Rx| log 176 ϵ2δ 2(γϵ1)2 5. Let B = x X1 : Prx X2[gx(x ) = f(x )|x Rx] γ and |X2 Rx| log 176 ϵ2δ 2(γϵ1)2 Observe that r, g, b, r , and b are the probabilities that x X1 is in the sets R, G, B, R , and B respectively. Our strategy will be to use the previous lemmas to bound the sizes of the intersections, R R, R B, B R, B B . To this end, let x R be an arbitrary point. By Lemma B.2, with probability at least 1 δϵ 8 over X2 µn , x R B . Furthermore, by Lemma B.3 (along with the definition of R), with probability at most δϵ 8 , x B . Applying linearity of expectation along with Markov s inequality, we get the following two bounds: R (X1 \ (R B ) > |R|ϵ EX2[|R (X1 \ (R B )|] R B > |R|ϵ EX2[|R B |] Applying an analogous line of reasoning stating with x B, we also have B (X1 \ (R B ) > |B|ϵ δ B R > |B|ϵ δ Applying a union bound, none of these events occur with probability at least 1 δ 2 over X2 µn . Thus, it suffices to show that they algebraically imply the desired inequalities. To this end, suppose none of them hold. Then we have, = |R B| + |R G| + |R R| |B|ϵ + |G| + |R| |X1| = bϵ + g + r, = |R B| + |R G| + |R R| 0 + 0 + |R| |R B | |R \ (B R )| |R| |R|ϵ |R|ϵ |X1| = r(1 2ϵ). The upper and lower bounds on b are analogous. C Proof of Theorem 5.1 C.1 Definitions and Notation Definition C.1. Let α = 1 3670016d4 and β = 1 3584d2 . Definition C.2. Let S1, S2, S3 be three (d 1)-spheres centered at the origin with radii (1 α), 1, and 1 + β respectively for 0 < α, β. Definition C.3. Let µ denote the data distribution so that x µ is selected by first selecting i {1, 2, 3} at uniform, and then selecting x from Si at uniform. Definition C.4. Let f denote the classifier Rd { 1} such that +1 ||x||2 1 α 2 < ||x||2 1 + β 2 +1 ||x2|| > 1 + β Definition C.5. Let x be an arbitrary point chosen on S3, and let g be any linear classifier, and B(a, r) be any L2-ball that contains x . Lemma C.6. There exists x S2 and 0 θ1, θ2, θ3 π such that S1 B(a, r) = C(S1, x(1 α), θ1), S2 B(a, r) = C(S2, x, θ2), S3 B(a, r) = C(S3, x(1 + β), θ3), where C(S, x, θ) denotes the spherical cap of angle θ centered at x on (d 1)-sphere S (see Definition C.18). C.2 Main Proof We begin by showing that the structure of the data distribution µ provides significant difficulty for linear classifiers. At a high level, the curvature of the spheres, S1, S2, S3, make separating them linearly only possible for small portions of the sphere. We formalize this with the following lemma. Lemma C.7. Let θ π 4 . Let x be an arbitrary point on S2, and let T1(x, θ), T3(x, θ) denote the sets T1(x, θ) = C(S2, x, θ) C(S1, x(1 α), θ), T3(x, θ) = C(S2, x, θ) C(S3, x(1 + β), θ). Let g : Rd { 1} denote any linear classifier. Then g exhibits a loss of at least 1 3 over the conditional distribution of µ restricted to either T1 or T3. 
That is, Pr x µ[g(x ) = f(x )|x T1(x, θ)], Pr x µ[g(x ) = f(x )|x T3(x, θ)] 1 Next, we show that if the local explanation region B(a, r),contains a sufficiently large probability mass, then it also must include a region that takes the form given by T1 or T3 from Lemma C.7. Lemma C.8. Suppose that µ(B(a, r)) 31 d. Let T1 and T3 be as defined in Lemma C.7. Then there exist x S2 and θ π 4 such that at least one of the following hold: T1(x, θ) B(a, r), and µ(T1(x,θ)) µ(B(a,r)) 1 T3(x, θ) B(a, r), and µ(T3(x,θ)) µ(B(a,r)) 1 We are now prepared to prove Theorem 5.1. Proof. (Theorem 5.1) Suppose B(a, r) 31 d. Then by Lemma C.8, there exists θ π 4 such that either T1(x, θ) or T3(x, θ) is a subset of B(a, r) and satisfies the conditions outlined above. Suppose that T1(x, θ) B(a, r) (the other case is analogous). Let g be any linear classifier. Then it follows from Lemmas C.7 and C.8 that the loss g incurs over the conditional distribution of µ over B(a, r) can be bounded as follows: Pr z B(a,r)[g(z) = f(z)] Pr[z T1(x, θ)] Pr[g(z) = f(z)|z T1(x, θ)] which concludes the proof. C.3 Proof of Lemma C.7 We will show that the claim holds for T3(x, θ) as the proof for T1(x, θ) is nearly identical (as α < β). Let w Rd be a unit vector and b R be a scalar such that g(z) = 1 w, z b 1 w, z < b . Our main strategy will be to find a large set of points within T3(x, θ) such that g(z) = g(z(1 + β)) for all z within this set. This will force g to misclassify either z or z(1 + β) which will lead to our desired error bound. To this end, define T = n z C(S2, x, θ) : g(z) = 1, g(z(1 + β)) = +1, | x, z | cos π Lemma C.9. µ(T ) µ(C(S2,x,θ)) 1 10. Proof. Let z be selected at uniform from C(S2, x, θ) \ C S2, x, π Note that z definitionally satisfies that | x, z | cos π 8 . It suffices to upper bound the probability that g(z) = g(z(1 + β)). Let Cϕ = {z : z, x = cos ϕ}. Our main idea is to condition on z Cϕ, and then integrate over all choices of ϕ. That is, if we let ϕ denote the random variable representing the angle between x and z, then Pr z [g(z) = 1, g(z(1 + β)) = +1] = Eϕ Pr z|ϕ[g(z) = 1, g(z(1 + β)) = +1]. We will now bound the latter quantity. Fix any ϕ, and observe that the conditional distribution, z|ϕ can be written as z = x cos ϕ + u sin ϕ where u is a random vector in Rd 1 that is uniformly distributed over the unit sphere, Sd 2 Rd 1. Rewriting the condition that g(z) = g(z(1 + β)) in terms of u, observe that g(z) = 1, g(z(1 + β)) = +1 = w, z b w, z(1 + β) = b 1 + β w, z b = b 1 + β x cos ϕ, w w, u sin ϕ b x cos ϕ, w = w, u s, s + β sin ϕ where s is a constant that depends solely on b, w, x, and ϕ. Note that we are using the fact that |b| (1 + β) as otherwise g would trivially output the same label over all z µ). By applying Lemma C.17 along with the fact that (by definition of ϕ) β sin ϕ β sin π 8 1 1370d2 , we have that u s, s + β sin ϕ which implies the desired result. Lemma C.10. µ(C(S2,x, π 8 ) C(S2, x, π 8 )) µ(C(S2,x,θ)) 7 30. Proof. By symmetry, µ C S2, x, π 8 = µ C S2, x, π 8 so it suffices to bound one of them. Since θ π 4 by assumption, applying Lemma C.20, we have µ C S2, x, π 8 C S2, x, π µ (C(S2, x, θ)) 2µ C S2, x, π µ (C(S2, x, θ)) 2µ C S2, x, π µ C S2, x, π as d 5 in the assumption of Theorem 5.1. We are now prepared to prove the main lemma. Proof. (Lemma C.7) Let A C(S2, x, θ) be defined as the set of all points for which g classifies both the point and its image in (1 + β)S3 correctly. That is, A = {z C(S2, x, θ) : g(z) = 1, g((1 + β)z) = +1}. 
By the previous two lemmas, we have µ(A ) µ(C(S2, x, θ)) µ(T C(S2, x, π 8 ) C(S2, x, π 8 )) µ(C(S2, x, θ)) 3 Each z A is a point for which both z and (1 + β)z are correctly classified, and each z C(S2, x, θ) \ A corresponds to either z being misclassified, or (1 + β)z being misclassified. It follows that the overall accuracy of g over T3(x, θ) is at most Pr z T3(x,θ)[g(z) = f(z)] Pr z C(S2,x,θ)[z A ] + 1 2 Pr z C(S2,x,θ)[z / A ] 1 + Pr z C(S2,x,θ)[z A ] 3 Thus g must incur loss at least 1 3 over T3(x, θ), as desired. C.4 Proof of Lemma C.8 Throughout this section, we assume that µ(B(a, r)) 31 d. Lemma C.11. max(θ1, θ2, θ3) π Proof. Assume towards a contradiction that this does not hold. Let x be as in Lemma C.6. Then by the Definition of µ (Definition C.3) and Lemma C.6, it follows that µ(B(a, r)) = µ(C(S1, x(1 α), θ1)) + µ(C(S2, x, θ2)) + µ(C(S3, x(1 + β), θ3)) 3 (Ψ(θ1) + Ψ(θ2) + Ψ(θ3)) where Ψ is as defined in Section C.6. However, Lemma C.19 implies that Ψ pi 3 31 dΨ(π) = 31 d. This contradicts our assumption on µ(B(a, r)) and implies the desired result. Lemma C.12. r 1 α. Proof. Lemma C.11 implies that B(a, r) must intersect some sphere among S1, S2, S3 in a spherical cap of an angle at least π 3 . Basic geometry implies that r min(rad(S1), rad(S2), rad(S3)) where rad(Si) denotes the radius of Si. The desired result follows from the fact that 1 α rad(S1) rad(S2), rad(S3). Lemma C.13. |θ2 max(θ1, θ3)| 1 4d. Proof. We first compute θ1, θ2, θ3 in terms of a, r, α, and β. We begin with θ2, and note that the expressions for θ1 and θ3 can be similarly derived. To this end, we have S2 B(a, r) = {x : ||x|| = 1, ||x a|| r} = x : ||x|| = 1, x, x 2 x, a + a, a r2 = x : ||x|| = 1, x ||x||, a ||a|| where we use a to denote ||a|| in a slight abuse of notation. It follows from Lemma C.6 that cos θ2 = 1 + a2 r2 We can similarly show that cos θ1 = (1 α)2 + a2 r2 2(1 α)a , cos θ3 = (1 + β)2 + a2 r2 2(1 + β)a . Let h : R R be the function defined as h(s) = s2+a2 r2 2sa . Thus, cos θ1 = h(1 α), cos θ2 = h(1), cos θ3 = h(1 + β). Note that in cases where h is outside of the interval [ 1, 1] (meaning θi would not be defined), we simply set θi equal to π and 0 respectively, as these quantities still accurately describe the intersection between B(a, r) and the corresponding sphere, Si. Case 1: 0 a β 2 By definition, B(a, r) contains x and therefore intersects S3. It follows from the triangle inequality that r 1 + β 2 . However, this implies that B(a, r) must contain the entirety of S2 and S1, which implies that θ1 = θ2 = max(θ1, θ3) = π, thus implying the lemma statement. 2 < a 1 α If r > 1 + 2β, then B(a, r) will contain S1, S2 and S3, which implies θ1 = θ2 = θ3 = π (implying the lemma statement). Thus, assume r 1 + 2β. Differentiating h w.r.t. s gives By Lemma C.12, r2 a2, which implies that h (s) is nonnegative for s [1 α, 1+β]. Furthermore, we have that over the interval, [1 α, 1 + β], 1 + (1 + 2β)2 β 2 2 1 + 1 + 4β + 3.75β2 1 + 1 + 4(0.25) + 3.75(0.25)2 This is obtained by substituting appropriate upper and lower bounds for r, a, s, α, and β. Because h (s) is nonnegative over the interval, we must have that h(1 α) h(1) h(1 + β) which implies θ1 θ2 θ3 (as cos is a decreasing function). It follows from our upper bound on h (s) that | cos θ2 cos (max(θ1, θ3)) | = cos(θ2) cos(θ1) = h(1) h(1 α) 1 α h (s)ds Applying Lemma C.14 implies that |θ2 max(θ1, θ3)| 8 q α β = 1 4d, which implies the lemma statement. Case 3: a > 1 α First suppose that |a r| > 3. 
If r > a + 3, then the triangle inequality implies that S1, S2, S3 B(a, r) which implies the desired result. On the other hand, if r < a 3, then we must have a > 3, and that B(a, r) is disjoint from S1, S2, S3 which again implies the desired result. Thus, we assume that |a r| 3. We now use a similar strategy to the previous case, and bound the derivative, h (s). By substituting that |a r| 3, we have, for s [1 α, 1 + β], |h (s)| = 1 2a 1 + (2a + 3)(3) 1 + (2a + 3)(3) 1 + (2a + 3)(3) 1 2a (1 + 4(2a + 3)) 1 2a (13 + 8a) 4 + 10 = 14. Here we are exploiting the fact that 1 α 3 2 , 0.65. It follows by the same argument given in Case 2 that | cos θ2 cos(max(θ1, θ3))| 14β. Applying Lemma C.14 implies |θ2 max(θ1, θ3)| 4 14β = 1 4d, as desired. Now we are ready to prove the lemma. Proof. (Lemma C.8) Let x be as in Lemma C.6, and let θ = max(θ1, θ2, θ3). Then by applying Lemma C.6 to the Definition of µ (Definition C.3) gives us µ (B(a, r)) = µ (C(S1, x(1 α), θ1)) + µ (C(S2, x, θ2)) + µ (C(S3, x(1 + β), θ3)) Here Ψ denotes the function defined in Section C.6. Next, let θ = min(max(θ1, θ3), θ2). Let T1(x, θ) and T3(x, θ) be as defined in Lemma C.7. Observe that if θ1 θ3, then T1(x, θ) C(S1, x(1 α), θ1) C(S2, x, θ2) B(a, r), and otherwise, T3(x, θ) C(S3, x(1 + β), θ3) C(S2, x, θ2) B(a, r). Thus, at least one of these sets is part of B(a, r). We now show that these sets have the desired mass. By the definition of θ , we have µ(T1(x, θ)) µ(B(a, r)) , µ(T3(x, θ)) µ(B(a, r)) 2µ(C(S2, x, θ)) 3µ(C(S2, x, θ )). Next, Lemma C.11 implies that θ π 3 , and Lemma C.13 implies that θ θ 1 4d. It follows that Substituting this, we find that 2µ(C(S2, x, θ)) 3µ(C(S2, x, θ )) = 2 3Ψ(θ) Ψ(θ ) where the last steps follow from Lemmas C.19 and C.16. This completes the proof. C.5 Technical Lemmas Lemma C.14. Suppose ϕ1, ϕ2 [0, π] such that | cos(ϕ1) cos(ϕ2)| c. Then |ϕ1 ϕ2| 4 c. Proof. WLOG, suppose ϕ1 ϕ2. Let x = ϕ2 ϕ1. Using the sum to product rules, it follows that α | cos ϕ1 cos ϕ2| = 2 sin ϕ1 ϕ2 2 sin ϕ1 + ϕ2 2 sin ϕ1 + ϕ2 However, observe that π ϕ1+ϕ2 2 and that ϕ1+ϕ2 2. It follows that ϕ1+ϕ2 2], which implies that sin ϕ1+ϕ2 2. Substituting this, we have 2 sin ϕ1 + ϕ2 We now do casework based on x. First suppose that x π 2 . Then c 2 sin2 π 4 = 1. By definition, x π, so it follows that x 4 α, implying the desired result. Otherwise, if x π 2 , then sin x 4, as the function t 7 sin(t) t 2 is nonnegative on the interval [0, π 2 ]. Substituting this, we see that c x2 8c < 4 c, as desired. Lemma C.15. For 0 c 1 and 0 θ π, sin(cθ) c sin(θ). Proof. Let f(θ) = sin(cθ) c sin(θ). Observe that f(0) = 0. Furthermore, for θ [0, π], we have f (θ) = c cos(cθ) c cos(θ) = c (cos(cθ) cos(θ)) . Since cos is a decreasing function on the interval [0, π], it follows that cos(cθ) cos(θ), which implies f (θ) 0. Thus f is non-decreasing on the interval, and the desired inequality holds. Lemma C.16. For all x > 1, 1 1 Proof. Let f(x) = 1 1 x x 1. It is well known that limx f(x) = 1 e and limx 1+ f(x) = 1. Thus it suffices to show that f(x) is a non-increasing function. To do so, we will show that ln f(x) is non-increasing by taking its derivative. We have d (ln f(x)) (x 1) ln x 1 dx ((x 1) ln(x 1) (x 1) ln x) = ln(x 1) + x 1 ln(x) + x 1 x (ln(x) ln(x 1)) Lemma C.17. Let z be a point chosen at uniform over S2, and let w be a fixed unit vector. Then if t 1 1370d2 , then for any s R, Pr z [ w, z [s, s + t]] 1 Proof. Let θ denote the random variable that represents the angle between w and z. 
Applying Lemma C.14, it follows that for some choice of s R that Pr z [ w, z [s, s + t] Pr θ [θ [s , s + 4 We now bound this quantity by utilizing the quantity Ψ (defined in Section C.6). We have, Pr θ [θ [s , s + 4 t s sin(d 2) ϕdϕ R π 0 sin(d 2) ϕdϕ t sin(d 2) ϕdϕ 2 0 sin(d 2) ϕdϕ Here we have simply chosen the interval of length 4 t that maximizes the corresponding the integral. Next, we continue by applying Lemmas C.19 and C.16 to get Pr θ [θ [s , s + 4 29(d 1)!1/29 as desired. C.6 Spherical Caps Definition C.18. Let S be a (d 1) sphere centered at the origin, let 0 θ π be an angle, and let x S be a point. We let C(S, x, θ) denote the spherical cap with angle θ centered at x, and it consists of all points, x Sd 1, such that x,x ||x||||x || cos ϕ. Here we take the convention of associating C(Si, xi, 0) with both the empty set and with {xi}. While these are distinct sets, they both have measure 0 under π. We also associate C(Si, xi, π) with the entirety of Si. We let Ψ(θ) denote the ratio of the (d 1)-surface area of the region, C(S, x, θ), to the (d 1)-surface area of the entire sphere. Thus, Ψ(θ) denotes the fraction of the sphere covered by a spherical cap of angle θ. By standard integration over spherical coordinates, we have R θ 0 sin(d 2) ϕdϕ R π 0 sin(d 2) ϕdϕ . Next, we bound Ψ(θ) with the following inequality. Lemma C.19. Let 0 θ π and let 0 c 1. Then Proof. By applying Lemma C.15 to the definition of Ψ, we have the following manipulations. R cϕ 0 sind 2 θdθ R ϕ 0 sind 2 θdθ R ϕ 0 sind 2(cu)(cdu) R ϕ 0 sind 2 θdθ R ϕ 0 (c sin(u))d 2 (cdu) R ϕ 0 sind 2 θdθ We similarly have an upper bound on this ratio. Lemma C.20. Let 0 θ π 2 and 0 c 1. Then Ψ(θ) c sin cϕ Proof. We similarly have, R cϕ 0 sind 2 θdθ R ϕ 0 sind 2 θdθ R ϕ 0 sind 2(cu)(cdu) R ϕ 0 sind 2 θdθ R ϕ 0 sin(u) sin cϕ sin ϕ d 2 (cdu) R ϕ 0 sind 2 θdθ Here we are using the fact that t 7 sin ct sin t is a non-decreasing function for t [0, π]. Neur IPS Paper Checklist Question: Do the main claims made in the abstract and introduction accurately reflect the paper s contributions and scope? Answer: [Yes] Justification: we mention all main results in both the abstract and introduction. Furthermore everything within these sections is revisited in the body. Guidelines: The answer NA means that the abstract and introduction do not include the claims made in the paper. The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations. A No or NA answer to this question will not be perceived well by the reviewers. The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings. It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper. 2. Limitations Question: Does the paper discuss the limitations of the work performed by the authors? Answer: [Yes] Justification: all our assumptions are clearly stated, and we use the limitations of our methods as avenues for future work. We also stress that our framework is by no means comprehensive for evaluating local explanations. Guidelines: The answer NA means that the paper has no limitation while the answer No means that the paper has limitations, but those are not discussed in the paper. The authors are encouraged to create a separate "Limitations" section in their paper. 
The paper should point out any strong assumptions and how robust the results are to violations of these assumptions (e.g., independence assumptions, noiseless settings, model well-specification, asymptotic approximations only holding locally). The authors should reflect on how these assumptions might be violated in practice and what the implications would be. The authors should reflect on the scope of the claims made, e.g., if the approach was only tested on a few datasets or with a few runs. In general, empirical results often depend on implicit assumptions, which should be articulated. The authors should reflect on the factors that influence the performance of the approach. For example, a facial recognition algorithm may perform poorly when image resolution is low or images are taken in low lighting. Or a speech-to-text system might not be used reliably to provide closed captions for online lectures because it fails to handle technical jargon. The authors should discuss the computational efficiency of the proposed algorithms and how they scale with dataset size. If applicable, the authors should discuss possible limitations of their approach to address problems of privacy and fairness. While the authors might fear that complete honesty about limitations might be used by reviewers as grounds for rejection, a worse outcome might be that reviewers discover limitations that aren t acknowledged in the paper. The authors should use their best judgment and recognize that individual actions in favor of transparency play an important role in developing norms that preserve the integrity of the community. Reviewers will be specifically instructed to not penalize honesty concerning limitations. 3. Theory Assumptions and Proofs Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof? Answer: [Yes] Justification: this is a theory paper, and doing so is its main focus. our proofs are all included in the appendix. Guidelines: The answer NA means that the paper does not include theoretical results. All the theorems, formulas, and proofs in the paper should be numbered and crossreferenced. All assumptions should be clearly stated or referenced in the statement of any theorems. The proofs can either appear in the main paper or the supplemental material, but if they appear in the supplemental material, the authors are encouraged to provide a short proof sketch to provide intuition. Inversely, any informal proof provided in the core of the paper should be complemented by formal proofs provided in appendix or supplemental material. Theorems and Lemmas that the proof relies upon should be properly referenced. 4. Experimental Result Reproducibility Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)? Answer: [NA] Justification: This is a theory paper. Guidelines: The answer NA means that the paper does not include experiments. If the paper includes experiments, a No answer to this question will not be perceived well by the reviewers: Making the paper reproducible is important, regardless of whether the code and data are provided or not. If the contribution is a dataset and/or model, the authors should describe the steps taken to make their results reproducible or verifiable. 
Depending on the contribution, reproducibility can be accomplished in various ways. For example, if the contribution is a novel architecture, describing the architecture fully might suffice, or if the contribution is a specific model and empirical evaluation, it may be necessary to either make it possible for others to replicate the model with the same dataset, or provide access to the model. In general, releasing code and data is often one good way to accomplish this, but reproducibility can also be provided via detailed instructions for how to replicate the results, access to a hosted model (e.g., in the case of a large language model), releasing of a model checkpoint, or other means that are appropriate to the research performed. While NeurIPS does not require releasing code, the conference does require all submissions to provide some reasonable avenue for reproducibility, which may depend on the nature of the contribution. For example: (a) If the contribution is primarily a new algorithm, the paper should make it clear how to reproduce that algorithm. (b) If the contribution is primarily a new model architecture, the paper should describe the architecture clearly and fully. (c) If the contribution is a new model (e.g., a large language model), then there should either be a way to access this model for reproducing the results or a way to reproduce the model (e.g., with an open-source dataset or instructions for how to construct the dataset). (d) We recognize that reproducibility may be tricky in some cases, in which case authors are welcome to describe the particular way they provide for reproducibility. In the case of closed-source models, it may be that access to the model is limited in some way (e.g., to registered users), but it should be possible for other researchers to have some path to reproducing or verifying the results. 5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [NA] Justification: This is a theory paper. Guidelines: The answer NA means that the paper does not include experiments requiring code. Please see the NeurIPS code and data submission guidelines (https://nips.cc/public/guides/CodeSubmissionPolicy) for more details. While we encourage the release of code and data, we understand that this might not be possible, so No is an acceptable answer. Papers cannot be rejected simply for not including code, unless this is central to the contribution (e.g., for a new open-source benchmark). The instructions should contain the exact command and environment needed to run to reproduce the results. See the NeurIPS code and data submission guidelines (https://nips.cc/public/guides/CodeSubmissionPolicy) for more details. The authors should provide instructions on data access and preparation, including how to access the raw data, preprocessed data, intermediate data, and generated data, etc. The authors should provide scripts to reproduce all experimental results for the new proposed method and baselines. If only a subset of experiments are reproducible, they should state which ones are omitted from the script and why. At submission time, to preserve anonymity, the authors should release anonymized versions (if applicable). Providing as much information as possible in supplemental material (appended to the paper) is recommended, but including URLs to data and code is permitted. 6.
Experimental Setting/Details Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results? Answer: [NA] Justification: this is a theory paper Guidelines: The answer NA means that the paper does not include experiments. The experimental setting should be presented in the core of the paper to a level of detail that is necessary to appreciate the results and make sense of them. The full details can be provided either with the code, in appendix, or as supplemental material. 7. Experiment Statistical Significance Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments? Answer: [NA] Justification: this is a theory paper Guidelines: The answer NA means that the paper does not include experiments. The authors should answer "Yes" if the results are accompanied by error bars, confidence intervals, or statistical significance tests, at least for the experiments that support the main claims of the paper. The factors of variability that the error bars are capturing should be clearly stated (for example, train/test split, initialization, random drawing of some parameter, or overall run with given experimental conditions). The method for calculating the error bars should be explained (closed form formula, call to a library function, bootstrap, etc.) The assumptions made should be given (e.g., Normally distributed errors). It should be clear whether the error bar is the standard deviation or the standard error of the mean. It is OK to report 1-sigma error bars, but one should state it. The authors should preferably report a 2-sigma error bar than state that they have a 96% CI, if the hypothesis of Normality of errors is not verified. For asymmetric distributions, the authors should be careful not to show in tables or figures symmetric error bars that would yield results that are out of range (e.g. negative error rates). If error bars are reported in tables or plots, The authors should explain in the text how they were calculated and reference the corresponding figures or tables in the text. 8. Experiments Compute Resources Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [NA] Justification: This is a theory paper Guidelines: The answer NA means that the paper does not include experiments. The paper should indicate the type of compute workers CPU or GPU, internal cluster, or cloud provider, including relevant memory and storage. The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute. The paper should disclose whether the full research project required more compute than the experiments reported in the paper (e.g., preliminary or failed experiments that didn t make it into the paper). 9. Code Of Ethics Question: Does the research conducted in the paper conform, in every respect, with the Neur IPS Code of Ethics https://neurips.cc/public/Ethics Guidelines? Answer: [Yes] Justification: this paper does not utilize any datasets or human subjects. It also does not contribute towards any sort of harm. Guidelines: The answer NA means that the authors have not reviewed the Neur IPS Code of Ethics. 
If the authors answer No, they should explain the special circumstances that require a deviation from the Code of Ethics. The authors should make sure to preserve anonymity (e.g., if there is a special consideration due to laws or regulations in their jurisdiction). 10. Broader Impacts Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed? Answer: [Yes] Justification: we discuss consequences of our results. Because they are theoretical, we don t believe our results can be used in a directly harmful manner. Guidelines: The answer NA means that there is no societal impact of the work performed. If the authors answer NA or No, they should explain why their work has no societal impact or why the paper does not address societal impact. Examples of negative societal impacts include potential malicious or unintended uses (e.g., disinformation, generating fake profiles, surveillance), fairness considerations (e.g., deployment of technologies that could make decisions that unfairly impact specific groups), privacy considerations, and security considerations. The conference expects that many papers will be foundational research and not tied to particular applications, let alone deployments. However, if there is a direct path to any negative applications, the authors should point it out. For example, it is legitimate to point out that an improvement in the quality of generative models could be used to generate deepfakes for disinformation. On the other hand, it is not needed to point out that a generic algorithm for optimizing neural networks could enable people to train models that generate Deepfakes faster. The authors should consider possible harms that could arise when the technology is being used as intended and functioning correctly, harms that could arise when the technology is being used as intended but gives incorrect results, and harms following from (intentional or unintentional) misuse of the technology. If there are negative societal impacts, the authors could also discuss possible mitigation strategies (e.g., gated release of models, providing defenses in addition to attacks, mechanisms for monitoring misuse, mechanisms to monitor how a system learns from feedback over time, improving the efficiency and accessibility of ML). 11. Safeguards Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)? Answer: [NA] Justification: no models or data are in this paper. Guidelines: The answer NA means that the paper poses no such risks. Released models that have a high risk for misuse or dual-use should be released with necessary safeguards to allow for controlled use of the model, for example by requiring that users adhere to usage guidelines or restrictions to access the model or implementing safety filters. Datasets that have been scraped from the Internet could pose safety risks. The authors should describe how they avoided releasing unsafe images. We recognize that providing effective safeguards is challenging, and many papers do not require this, but we encourage authors to take this into account and make a best faith effort. 12. Licenses for existing assets Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected? 
Answer: [NA] Justification: no code or models are used. Guidelines: The answer NA means that the paper does not use existing assets. The authors should cite the original paper that produced the code package or dataset. The authors should state which version of the asset is used and, if possible, include a URL. The name of the license (e.g., CC-BY 4.0) should be included for each asset. For scraped data from a particular source (e.g., website), the copyright and terms of service of that source should be provided. If assets are released, the license, copyright information, and terms of use in the package should be provided. For popular datasets, paperswithcode.com/datasets has curated licenses for some datasets. Their licensing guide can help determine the license of a dataset. For existing datasets that are re-packaged, both the original license and the license of the derived asset (if it has changed) should be provided. If this information is not available online, the authors are encouraged to reach out to the asset s creators. 13. New Assets Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets? Answer: [NA] Justification: No released assets Guidelines: The answer NA means that the paper does not release new assets. Researchers should communicate the details of the dataset/code/model as part of their submissions via structured templates. This includes details about training, license, limitations, etc. The paper should discuss whether and how consent was obtained from people whose asset is used. At submission time, remember to anonymize your assets (if applicable). You can either create an anonymized URL or include an anonymized zip file. 14. Crowdsourcing and Research with Human Subjects Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)? Answer: [NA] Justification: no human subjects Guidelines: The answer NA means that the paper does not involve crowdsourcing nor research with human subjects. Including this information in the supplemental material is fine, but if the main contribution of the paper involves human subjects, then as much detail as possible should be included in the main paper. According to the Neur IPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector. 15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained? Answer: [NA] Justification: No human subjects Guidelines: The answer NA means that the paper does not involve crowdsourcing nor research with human subjects. Depending on the country in which research is conducted, IRB approval (or equivalent) may be required for any human subjects research. If you obtained IRB approval, you should clearly state this in the paper. We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the Neur IPS Code of Ethics and the guidelines for their institution. 
For initial submissions, do not include any information that would break anonymity (if applicable), such as the institution conducting the review.