# GLIME: General, Stable and Local LIME Explanation

Zeren Tan (Tsinghua University, thutzr1019@gmail.com), Yang Tian (Tsinghua University, tyanyang04@gmail.com), Jian Li (Tsinghua University, lapordge@gmail.com)

Abstract

As black-box machine learning models grow in complexity and find applications in high-stakes scenarios, it is imperative to provide explanations for their predictions. Although Local Interpretable Model-agnostic Explanations (LIME) [22] is a widely adopted method for understanding model behaviors, it is unstable with respect to random seeds [35, 24, 3] and exhibits low local fidelity (i.e., how well the explanation approximates the model's local behaviors) [21, 16]. Our study shows that this instability stems from small sample weights, which let the regularization term dominate and slow down convergence. Additionally, LIME's sampling neighborhood is non-local and biased towards the reference, resulting in poor local fidelity and sensitivity to the choice of reference. To tackle these challenges, we introduce GLIME, an enhanced framework that extends LIME and unifies several prior methods. Within the GLIME framework, we derive an equivalent formulation of LIME that converges significantly faster and is more stable. By employing a local and unbiased sampling distribution, GLIME generates explanations with higher local fidelity than LIME, and GLIME explanations are independent of the choice of reference. Moreover, GLIME offers users the flexibility to choose a sampling distribution suited to their specific scenarios.

1 Introduction

Why is a patient predicted to have a brain tumor [10]? Why is a credit application rejected [11]? Why is a picture identified as an electric guitar [22]? As black-box machine learning models continue to grow in complexity and are employed in critical applications, it is imperative to provide explanations for their predictions, making interpretability a central concern [1]. In response to this imperative, various explanation methods have been proposed [39, 26, 4, 19, 22, 28, 30], aiming to provide insights into the internal mechanisms of deep learning models.

Among these methods, Local Interpretable Model-agnostic Explanations (LIME) [22] has attracted significant attention, particularly in image classification tasks. LIME explains a prediction by assigning each region within an image a weight indicating the influence of that region on the output. The method segments the image into super-pixels, as illustrated in the lower-left portion of Figure 1a, introduces perturbations, and approximates the model's local predictions with a linear model. The approximation is obtained by solving a weighted Ridge regression problem, which estimates the impact (i.e., weight) of each super-pixel on the classifier's output.

Nevertheless, LIME suffers from significant instability caused by its random sampling procedure [35, 24, 3]. LIME draws a set of samples that perturb the original image. As illustrated in Figure 1a, LIME explanations generated with two different random seeds display notable disparities, despite a large sample size (16384). The Jaccard index, which measures the similarity between two explanations on a scale from 0 to 1 (higher values indicating better similarity), is below 0.4. While many prior studies aim to enhance LIME's stability, some sacrifice computational time for stability [24, 40], and others risk overfitting [35].
The evident drawback of unstable explanations lies in their potential to mislead end-users and hinder the identification of model bugs and biases, given that LIME explanations lack consistency across different random seeds.

Figure 1: GLIME enhances stability and local fidelity compared to LIME. (a) LIME and GLIME explanations for an ImageNet image on the tiny Swin-Transformer with two random seeds, using σ = 0.25 and a sample size of 16384. The values below the panels are the Jaccard index between the two explanation maps (LIME 0.38, GLIME-Binomial 1.00, GLIME-Gauss 0.95) and the R² scores (0.00, 0.99, and 0.56, respectively). LIME demonstrates instability with the default parameter σ, while GLIME consistently provides meaningful explanations. (b) Distribution of LIME and GLIME-Gauss samples in a 2D example (left) and 1D example explanations (right). LIME samples are constrained to one side of r and are more distant from x, and LIME explanations vary with distinct reference points; LIME samples from a biased and non-local neighborhood, a limitation overcome by GLIME.

In addition to its inherent instability, LIME has been found to have poor local fidelity [16, 21]. As depicted in Figure 1a, the R² value for LIME on the sample image approaches zero (see also Figure 4b). This problem arises from LIME's non-local and skewed sampling space, which is biased towards the reference. More precisely, the sampling space of LIME consists of the corner points of the hypercube defined by the explained instance and the selected reference. For instance, in the left panel of Figure 1b, only four red points fall within LIME's sampling space, yet these points are distant from x. As illustrated in Figure 3, the ℓ2 distance between LIME's samples and the input x is approximately $0.7\|x\|_2$ on ImageNet. Although LIME incorporates a weighting function to enforce locality, an explanation cannot be considered local if the samples themselves are non-local, and the resulting explanation therefore lacks local fidelity. Moreover, the hypercube is biased towards the reference, so the explanation only accounts for a portion of the local neighborhood. This bias causes LIME to generate different explanations for different references, as illustrated in Figure 1b (refer to Appendix A.4 for more analysis and results).

To tackle these challenges, we present GLIME, a local explanation framework that generalizes LIME and five other methods: Kernel SHAP [19], SmoothGrad [28], Gradient [36], DLIME [35], and ALIME [24]. Through a flexible sample-distribution design, GLIME produces explanations that are more stable and more faithful. Addressing LIME's instability, we derive within GLIME an equivalent form of LIME, denoted GLIME-BINOMIAL, by integrating the weighting function into the sampling distribution. GLIME-BINOMIAL converges exponentially faster than LIME when the regularization term is present. Consequently, GLIME-BINOMIAL demonstrates improved stability compared to LIME while preserving superior local fidelity (see Figure 4). Furthermore, GLIME enhances both local fidelity and stability by sampling from a local distribution that is independent of any specific reference point.
In summary, our contributions are as follows:

- We conduct an in-depth analysis of the source of LIME's instability, revealing the interplay between the weighting function and the regularization term as the primary cause. We further attribute LIME's suboptimal local fidelity to its non-local and biased sampling space.
- We introduce GLIME, a more general local explanation framework with a flexible design for the sampling distribution. With appropriate choices of sampling distribution and weights, GLIME generalizes LIME and five other preceding local explanation methods.
- By integrating the weights into the sampling distribution, we present a special instance of GLIME with a binomial sampling distribution, denoted GLIME-BINOMIAL. We demonstrate that GLIME-BINOMIAL, while maintaining equivalence to LIME, converges with significantly fewer samples. This indicates that enforcing locality in the sampling distribution is preferable to using a weighting function.
- Regarding local fidelity, GLIME empowers users to design explanation methods with greater local fidelity by selecting a local and unbiased sampling distribution tailored to the specific scenario in which GLIME is applied.

2 Preliminary

2.1 Notations

Let $\mathcal{X}$ and $\mathcal{Y}$ denote the input and output spaces, respectively, where $\mathcal{X} \subseteq \mathbb{R}^D$ and $\mathcal{Y} \subseteq \mathbb{R}$. We specifically consider the scenario in which $\mathcal{X}$ is a space of images, and $f : \mathcal{X} \to \mathcal{Y}$ is a machine learning model accepting an input $x \in \mathcal{X}$. This study focuses on the classification problem, wherein $f$ produces the probability that the image belongs to a certain class, so that $\mathcal{Y} = [0, 1]$. Before computing explanations, a set of features $\{s_i\}_{i=1}^d$ is derived by applying a transformation to $x$. For instance, $\{s_i\}_{i=1}^d$ could be image segments (also referred to as super-pixels in LIME) or feature maps obtained from a convolutional neural network. Alternatively, $\{s_i\}_{i=1}^d$ may correspond to raw features, i.e., $x$ itself. Throughout, $\|\cdot\|_0$, $\|\cdot\|_1$, and $\|\cdot\|_2$ denote the ℓ0, ℓ1, and ℓ2 norms, respectively, and $\odot$ denotes the element-wise product. Boldface letters denote vectors and matrices, while non-boldface letters denote scalars or features. $B_x(\epsilon)$ denotes the ball centered at $x$ with radius $\epsilon$.

2.2 A brief introduction to LIME

In this section, we present the original definition and implementation of LIME [22] in the context of image classification. LIME, as a local explanation method, constructs a linear model when provided with an input $x$ that requires an explanation. The coefficients of this linear model serve as the feature importance explanation for $x$.

Features. For an input $x$, LIME computes a feature importance vector for a set of features. In the image classification setting, LIME first segments $x$ into super-pixels $s_1, \ldots, s_d$ using a segmentation algorithm such as Quickshift [32]. Each super-pixel is regarded as a feature of the input $x$.

Sample generation. LIME then generates samples within the local vicinity of $x$ as follows. First, random samples $z'$ are drawn uniformly from $\{0, 1\}^d$. The $j$-th coordinate $z'_j$ of each sample $z'$ is either 1 or 0, indicating the presence or absence of the super-pixel $s_j$. When $s_j$ is absent, it is replaced by a reference value $r_j$. Common choices for the reference value include a black image, a blurred version of the super-pixel, or the average value of the super-pixel [29, 22, 8].
These $z'$ samples are then mapped to samples in the original input space $\mathbb{R}^D$ by combining them with $x = (s_1, \ldots, s_d)$ using the element-wise product: $z = x \odot z' + r \odot (1 - z')$, where $r$ is the vector of reference values for the super-pixels and $\odot$ is the element-wise product. In other words, $z \in \mathcal{X}$ is an image identical to $x$, except that the super-pixels $s_j$ with $z'_j = 0$ are replaced by their reference values.

Feature attributions. For each sample $z'$ and the corresponding image $z$, we compute the prediction $f(z)$. Finally, LIME solves the following regression problem to obtain a feature importance vector (also known as feature attributions) for the super-pixels:

$$w^{\mathrm{LIME}} = \arg\min_{v} \; \mathbb{E}_{z' \sim \mathrm{Uni}(\{0,1\}^d)}\big[\pi(z')\,(f(z) - v^\top z')^2\big] + \lambda \|v\|_2^2, \qquad (1)$$

where $z = x \odot z' + r \odot (1 - z')$, $\pi(z') = \exp\{-\|\mathbf{1} - z'\|_2^2 / \sigma^2\}$, and $\sigma$ is the kernel width parameter. (A minimal code sketch of this sampling-and-regression procedure is given at the end of this section.)

Remark 2.1. In practice, we draw samples $\{z'_i\}_{i=1}^n$ from the uniform distribution $\mathrm{Uni}(\{0,1\}^d)$ to estimate the expectation in Equation 1. In the original LIME implementation [22], $\lambda = \alpha/n$ for a constant $\alpha > 0$. This choice has been widely adopted in prior studies [40, 8, 19, 9, 5, 20, 34]. We use $\hat{w}^{\mathrm{LIME}}$ to denote the empirical estimate of $w^{\mathrm{LIME}}$.

2.3 LIME is unstable and has poor local fidelity

Instability. To capture the local characteristics of the neighborhood around the input $x$, LIME uses the sample weighting function $\pi(\cdot)$ to assign low weights to samples that exclude many super-pixels and are consequently far from $x$. The parameter $\sigma$ controls the level of locality: a small $\sigma$ assigns high weights only to samples very close to $x$, while a large $\sigma$ also permits notable weights for samples farther from $x$. The default value of $\sigma$ in LIME is 0.25 for image data. However, as depicted in Figure 1a, LIME is unstable, a phenomenon also noted in prior studies [35, 24, 34]. As shown in Section 4, this instability arises from small $\sigma$ values, which lead to very small sample weights and, consequently, slow convergence.

Poor local fidelity. LIME also suffers from poor local fidelity [16, 21]. The sampling space of LIME is depicted in Figure 1b. Generally, LIME's samples are far from the instance being explained, as illustrated in Figure 3, rendering them non-local. Despite LIME's incorporation of weights to promote locality, it fails to accurately explain local behaviors when the samples themselves lack local proximity. Moreover, the sampling space of LIME depends on the reference, resulting in a biased sampling space and a consequent degradation of local fidelity.
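To make the procedure above concrete, the following is a minimal, self-contained sketch of LIME-style sampling and weighted Ridge regression following Equation 1. It is not the official LIME implementation: the function name, the use of scikit-learn's Ridge, and the assumed inputs (a black-box prediction function f, a super-pixel label map segments, and a reference image of the same shape as x) are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(f, x, segments, reference, n_samples=1000, sigma=0.25, alpha=1.0, seed=0):
    """Sketch of Eq. (1): uniform binary masks, exponential kernel weights,
    weighted Ridge regression over super-pixel features."""
    rng = np.random.default_rng(seed)
    d = int(segments.max()) + 1                        # number of super-pixels
    z_prime = rng.integers(0, 2, size=(n_samples, d))  # z' ~ Uni({0,1}^d)

    preds = np.empty(n_samples)
    for i, mask in enumerate(z_prime):
        z = x.copy()
        for j in np.flatnonzero(mask == 0):            # absent super-pixels -> reference values
            z[segments == j] = reference[segments == j]
        preds[i] = f(z)                                # black-box prediction f(z)

    # pi(z') = exp(-||1 - z'||_2^2 / sigma^2); for binary masks this equals
    # exp(-(number of removed super-pixels) / sigma^2).
    weights = np.exp(-(d - z_prime.sum(axis=1)) / sigma**2)

    model = Ridge(alpha=alpha).fit(z_prime, preds, sample_weight=weights)
    return model.coef_                                 # feature importance vector
```

With the default σ = 0.25, most entries of weights are vanishingly small (on the order of 10⁻⁷⁰ for d = 20), which is exactly the regime analyzed in Section 4.1.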
3 A general local explanation framework: GLIME

3.1 The definition of GLIME

We first present the definition of GLIME and show how it computes the explanation vector $w^{\mathrm{GLIME}}$. Analogous to LIME, GLIME constructs a model within a neighborhood of the input $x$, using data sampled from this neighborhood. The coefficients of this model are then used as the feature importance explanation for $x$.

Feature space. For the given input $x \in \mathcal{X} \subseteq \mathbb{R}^D$, the feature importance explanation is computed for a set of features $s = (s_1, \ldots, s_d)$ derived by applying a transformation to $x$. These features can be image segments (referred to as super-pixels in LIME) or feature maps obtained from a convolutional neural network. Alternatively, they can be raw features, i.e., the individual pixels of $x$. LIME, in particular, operates on super-pixels.

Sample generation. Given features $s$, a sample $z'$ is generated from a distribution $\mathcal{P}$ defined on the feature space (in LIME, $s$ are super-pixels produced by a segmentation algorithm such as Quickshift [32] and $\mathcal{P} = \mathrm{Uni}(\{0,1\}^d)$). Note that $z'$ may not belong to $\mathcal{X}$ and cannot be fed directly into the model $f$. Consequently, we reconstruct $z \in \mathbb{R}^D$ in the original input space for each $z'$ and evaluate $f(z)$ (in LIME, a reference $r$ is first chosen and then $z = x \odot z' + r \odot (1 - z')$). Both $z$ and $z'$ are then used to compute feature attributions.

Feature attributions. For each sample $z'$ and its corresponding $z$, we compute the prediction $f(z)$. Our aim is to approximate the local behaviors of $f$ around $x$ using a function $g$ defined on the feature space. $g$ can take various forms, such as a linear model, a decision tree, or any Boolean function over Fourier bases [37]. The loss function $\ell(f(z), g(z'))$ quantifies the approximation gap for the sample $z'$. In the case of LIME, $g(z') = v^\top z'$ and $\ell(f(z), g(z')) = (f(z) - g(z'))^2$. To derive feature attributions, the following optimization problem is solved:

$$w^{\mathrm{GLIME}} = \arg\min_{v} \; \mathbb{E}_{z' \sim \mathcal{P}}\big[\pi(z')\,\ell(f(z), g(z'))\big] + \lambda R(v), \qquad (2)$$

where $\pi(\cdot)$ is a weighting function and $R(\cdot)$ is a regularization function, e.g., $\|\cdot\|_1$ or $\|\cdot\|_2^2$ (the latter is used by LIME). We use $\hat{w}^{\mathrm{GLIME}}$ to denote the empirical estimate of $w^{\mathrm{GLIME}}$.

Connection with existing frameworks. Our formulation exhibits similarities with previous frameworks [22, 12]. The generality of GLIME stems from two aspects: (1) GLIME operates in a broader feature space $\mathbb{R}^d$, in contrast to [22], which is constrained to $\{0,1\}^d$, and [12], which is confined to raw features in $\mathbb{R}^D$; (2) GLIME accommodates a more extensive range of distribution choices tailored to specific use cases.

3.2 An alternative formulation of GLIME without the weighting function

Equation 2 can readily be transformed into an equivalent formulation without the weighting function. This adjustment not only simplifies the formulation but also accelerates convergence, because one samples directly from the transformed distribution (see Section 4.1 and Figure 4a). Specifically, we define the transformed sampling distribution as

$$\widetilde{\mathcal{P}}(z') = \frac{\pi(z')\,\mathcal{P}(z')}{\int \pi(t)\,\mathcal{P}(t)\,dt}.$$

Using $\widetilde{\mathcal{P}}$ as the sampling distribution, Equation 2 can be equivalently expressed as

$$w^{\mathrm{GLIME}} = \arg\min_{v} \; \mathbb{E}_{z' \sim \widetilde{\mathcal{P}}}\big[\ell(f(z), g(z'))\big] + \frac{\lambda}{Z} R(v), \qquad Z = \mathbb{E}_{t \sim \mathcal{P}}[\pi(t)] = \int \pi(t)\,\mathcal{P}(t)\,dt. \qquad (3)$$

The feature attributions obtained by solving Equation 3 are equivalent to those obtained by solving Equation 2 (see Appendix B.1 for a formal proof; a short derivation sketch follows). Therefore, the weighting function $\pi(\cdot)$ is not necessary and can be omitted. Hence, unless otherwise specified, GLIME refers to the framework without the weighting function.
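The equivalence between Equation 2 and Equation 3 is proved formally in Appendix B.1; the following short derivation sketches the argument, writing $Z$ for the normalizing constant defined above.

```latex
% Equations (2) and (3) share the same minimizer: with Z := E_{t~P}[pi(t)] > 0,
\begin{aligned}
\mathbb{E}_{z'\sim\mathcal{P}}\big[\pi(z')\,\ell(f(z),g(z'))\big] + \lambda R(v)
  &= Z\,\mathbb{E}_{z'\sim\widetilde{\mathcal{P}}}\big[\ell(f(z),g(z'))\big] + \lambda R(v)
  && \text{since } \widetilde{\mathcal{P}}(z')=\pi(z')\mathcal{P}(z')/Z \\
  &= Z\Big(\mathbb{E}_{z'\sim\widetilde{\mathcal{P}}}\big[\ell(f(z),g(z'))\big]
      + \tfrac{\lambda}{Z}R(v)\Big).
\end{aligned}
```

Because $Z > 0$ does not depend on $v$, multiplying the objective by $1/Z$ leaves the arg min unchanged, which is why Equation 3 rescales the regularization strength to $\lambda/Z$.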
3.3 GLIME unifies several previous explanation methods

This section shows how GLIME unifies previous methods. For more background on these methods, refer to Appendix A.6.

LIME [22] and GLIME-BINOMIAL. LIME begins by segmenting the pixels $x_1, \ldots, x_D$ into super-pixels $s_1, \ldots, s_d$. A binary vector $z' \sim \mathcal{P} = \mathrm{Uni}(\{0,1\}^d)$ indicates the absence or presence of the corresponding super-pixels, and $z = x \odot z' + r \odot (1 - z')$. The linear model $g(z') = v^\top z'$ is defined on $\{0,1\}^d$. For image explanations, $\ell(f(z), g(z')) = (f(z) - g(z'))^2$, and the default setting is $\pi(z') = \exp(-\|\mathbf{1} - z'\|_0/\sigma^2)$ (which coincides with $\exp(-\|\mathbf{1} - z'\|_2^2/\sigma^2)$ on binary vectors) and $R(v) = \|v\|_2^2$ [22]. Remarkably, LIME is equivalent to the special case GLIME-BINOMIAL, which has no weighting function (see Appendix B.2 for a formal proof). The sampling distribution of GLIME-BINOMIAL is $\mathcal{P}(z', \|z'\|_0 = k) = e^{k/\sigma^2}/(1 + e^{1/\sigma^2})^d$ for $k = 0, 1, \ldots, d$, which is essentially a binomial distribution. To generate a sample $z' \in \{0,1\}^d$, one independently draws $z'_i \in \{0,1\}$ with $\mathbb{P}(z'_i = 1) = 1/(1 + e^{-1/\sigma^2})$ for $i = 1, \ldots, d$. The feature importance vector obtained by solving Equation 3 under GLIME-BINOMIAL is denoted $w^{\mathrm{Binomial}}$.

Kernel SHAP [19]. In our framework, Kernel SHAP takes the same form as LIME, except that $R(v) = 0$ and $\pi(z') = \frac{d-1}{\binom{d}{\|z'\|_0}\,\|z'\|_0\,(d - \|z'\|_0)}$.

SmoothGrad [28]. SmoothGrad operates on raw features, i.e., pixels in the case of an image. Here $z = z' + x$, where $z' \sim \mathcal{N}(0, \sigma^2 I)$. The loss $\ell(f(z), g(z'))$ is the squared loss, while $\pi(z') = 1$ and $R(v) = 0$, as established in Appendix B.6.

Gradient [36]. The Gradient explanation is essentially the limit of SmoothGrad as $\sigma \to 0$.

DLIME [35]. DLIME operates on raw features, where $\mathcal{P}$ is defined over the training data that share the label of the nearest neighbor of $x$. The linear model $g(z') = v^\top z'$ is used with the squared loss $\ell$ and no regularization, i.e., $R(v) = 0$.

ALIME [24]. ALIME employs an auto-encoder trained on the training data, and its feature space is the output space of the auto-encoder. Samples are generated by adding Gaussian noise to $x$. The weighting function is $\pi(z') = \exp(-\|AE(x) - AE(z')\|_1)$, where $AE(\cdot)$ denotes the auto-encoder. The squared loss is used and no regularization function is applied.

4 Stable and locally faithful explanations in GLIME

4.1 GLIME-BINOMIAL converges exponentially faster than LIME

To understand the instability of LIME, we demonstrate that the sample weights in LIME are very small, so the regularization term dominates and LIME tends to produce explanations that are close to zero. The small weights also cause LIME to converge considerably more slowly than GLIME-BINOMIAL, even though both methods converge to the same limit.

Figure 2: The distribution and the weight of $\|z'\|_0$ (each panel plots the distribution of $\|z'\|_0/d$, with the corresponding weight shown as a dotted curve). In LIME, the distribution of $\|z'\|_0$ follows a binomial distribution that is independent of σ. In GLIME-BINOMIAL, when σ is small, $\|z'\|_0$ concentrates around d, while in LIME most samples exhibit negligible weights except for the all-one vector $\mathbf{1}$. As σ increases, the distributions in GLIME-BINOMIAL converge to those in LIME, and all LIME samples attain non-negligible weights.

Small sample weights in LIME. The distribution of the ratio of non-zero elements to the total number of super-pixels, along with the corresponding weights for LIME and GLIME-BINOMIAL, is depicted in Figure 2. Notably, most samples have approximately $d/2$ non-zero elements. However, when σ takes values such as 0.25 or 0.5, a significant portion of samples receives weights that are nearly zero. For instance, when $\sigma = 0.25$ and $\|z'\|_0 = d/2$, $\pi(z')$ reduces to $\exp(-8d)$, which is approximately $10^{-70}$ for $d = 20$. Even with $\|z'\|_0 = d - 1$, $\pi(z')$ equals $e^{-16} \approx 10^{-7}$.
Moreover, since LIME samples from $\mathrm{Uni}(\{0,1\}^d)$, the probability that a sample $z'$ has $\|z'\|_0 = d - 1$ or $d$ is approximately $2 \times 10^{-5}$ when $d = 20$. Therefore, most samples have very small weights. Consequently, the sample estimate of the expectation in Equation 1 tends, with high probability, to be much smaller than the true expectation and is thus inaccurate (see Appendix B.3 for more details). Given the default regularization strength $\lambda = 1$, this imbalance implies that the regularization term dominates the objective of Equation 1. As a result, LIME tends to yield explanations close to zero in such cases, diminishing their meaningfulness and leading to instability.

GLIME converges exponentially faster than LIME in the presence of regularization. By integrating the weighting function into the sampling process, every sample carries a weight of 1 and contributes equally to Equation 3. Our analysis reveals that GLIME requires substantially fewer samples than LIME to leave the regime in which the regularization term dominates. Consequently, GLIME-BINOMIAL converges exponentially faster than LIME. Recall that $\hat{w}^{\mathrm{LIME}}$ and $\hat{w}^{\mathrm{GLIME}}$ denote the empirical solutions of Equation 1 and Equation 3, respectively, obtained by replacing the expectations with sample averages, and that $\hat{w}^{\mathrm{Binomial}}$ is the empirical solution of Equation 3 with the transformed sampling distribution $\widetilde{\mathcal{P}}(z', \|z'\|_0 = k) = e^{k/\sigma^2}/(1 + e^{1/\sigma^2})^d$, $k = 0, 1, \ldots, d$. The following theorem gives the sample complexity bound for LIME (refer to Appendix B.4 for the proof).

Theorem 4.1. Suppose samples $\{z'_i\}_{i=1}^n \sim \mathrm{Uni}(\{0,1\}^d)$ are used to compute the LIME explanation. For any $\epsilon > 0$ and $\delta \in (0,1)$, if $n = \Omega\big(\epsilon^{-2} d^9 2^{8d} e^{8/\sigma^2} \log(4d/\delta)\big)$ and $\lambda \le n$, then $\mathbb{P}(\|\hat{w}^{\mathrm{LIME}} - w^{\mathrm{LIME}}\|_2 < \epsilon) \ge 1 - \delta$, where $w^{\mathrm{LIME}} = \lim_{n \to \infty} \hat{w}^{\mathrm{LIME}}$.

Next, we present the sample complexity bound for GLIME (refer to Appendix B.5 for the proof).

Theorem 4.2. Suppose $z' \sim \mathcal{P}$ is such that the largest eigenvalue of $z'(z')^\top$ is bounded by $R$, $\mathbb{E}[z'(z')^\top] = (\alpha_1 - \alpha_2) I + \alpha_2 \mathbf{1}\mathbf{1}^\top$, $\|\mathrm{Var}(z'(z')^\top)\|_2 \le \nu^2$, and $|(z' f(z))_i| \le M$ for some $M > 0$. Let $\{z'_i\}_{i=1}^n$ be i.i.d. samples from $\mathcal{P}$ used to compute the GLIME explanation $\hat{w}^{\mathrm{GLIME}}$. For any $\epsilon > 0$ and $\delta \in (0,1)$, if $n = \Omega\big(\epsilon^{-2} M^2 \nu^2 d^3 \gamma^4 \log(4d/\delta)\big)$, where $\gamma$ is a function of $\lambda, d, \alpha_1, \alpha_2$, then $\mathbb{P}(\|\hat{w}^{\mathrm{GLIME}} - w^{\mathrm{GLIME}}\|_2 < \epsilon) \ge 1 - \delta$, where $w^{\mathrm{GLIME}} = \lim_{n \to \infty} \hat{w}^{\mathrm{GLIME}}$.

Since GLIME-BINOMIAL samples from a binomial distribution, which is sub-Gaussian with parameters $M = d$, $\nu = 2$, $\alpha_1 = 1/(1 + e^{-1/\sigma^2})$, $\alpha_2 = 1/(1 + e^{-1/\sigma^2})^2$, and $\gamma(\alpha_1, \alpha_2, d) = d e^{2/\sigma^2}$ (refer to Appendix B.5 for the proof), we derive the following corollary.

Corollary 4.3. Suppose $\{z'_i\}_{i=1}^n$ are i.i.d. samples from $\widetilde{\mathcal{P}}(z', \|z'\|_0 = k) = e^{k/\sigma^2}/(1 + e^{1/\sigma^2})^d$, $k = 0, 1, \ldots, d$, used to compute the GLIME-BINOMIAL explanation. For any $\epsilon > 0$ and $\delta \in (0,1)$, if $n = \Omega\big(\epsilon^{-2} d^5 e^{4/\sigma^2} \log(4d/\delta)\big)$, then $\mathbb{P}(\|\hat{w}^{\mathrm{Binomial}} - w^{\mathrm{Binomial}}\|_2 < \epsilon) \ge 1 - \delta$, where $w^{\mathrm{Binomial}} = \lim_{n \to \infty} \hat{w}^{\mathrm{Binomial}}$.

Figure 3: The distribution of sample distances to the original input. The samples produced by LIME are far from the original input x, whereas the samples generated by GLIME are more localized. LIME tends to overlook sampling points that are in close proximity to x.

Comparing the sample complexities in Theorem 4.1 and Corollary 4.3, it becomes evident that LIME requires a factor of samples that is exponential in $d$ and $\sigma^{-2}$ more than GLIME-BINOMIAL to converge. Although both LIME and GLIME-BINOMIAL samples are defined on the binary set $\{0,1\}^d$, the weight $\pi(z')$ associated with a LIME sample $z'$ is notably small.
Consequently, the squared-loss term in LIME is much smaller than that in GLIME-BINOMIAL, so the regularization term dominates the squared loss and drives the solution towards zero. For stable solutions, the squared-loss term must be comparable to the regularization term; hence GLIME-BINOMIAL requires significantly fewer samples than LIME to achieve stability.

4.2 Designing locally faithful explanation methods within GLIME

Non-local and biased sampling in LIME. LIME samples uniformly from $\{0,1\}^d$ and then maps the samples to the original input space $\mathcal{X}$ using a reference. Despite the weighting function intended to promote locality, the samples $\{z_i\}_{i=1}^n$ generated by LIME often exhibit non-local characteristics, limiting their efficacy in capturing the local behaviors of the model $f$ (as depicted in Figure 3). This observation aligns with findings in [16, 21], which demonstrate that LIME frequently approximates global rather than local behaviors of $f$. As illustrated earlier, the weighting function also contributes to LIME's instability, emphasizing the need to enforce locality explicitly in the sampling process.

Local and unbiased sampling in GLIME. In response to these challenges, GLIME introduces a sampling procedure that systematically enforces locality without relying on a reference point. One approach is to sample $z' \sim \mathcal{P} = \mathcal{N}(0, \sigma^2 I)$ and set $z = x + z'$. This method, referred to as GLIME-GAUSS, uses the weighting function $\pi(\cdot) \equiv 1$, with the other components chosen to mirror those of LIME. The resulting feature attributions successfully avoid the issues described above. Similarly, alternative distributions such as $\mathcal{P} = \mathrm{Laplace}(0, \sigma)$ or $\mathcal{P} = \mathrm{Uni}([-\sigma, \sigma]^d)$ can be employed, yielding the explanation methods GLIME-LAPLACE and GLIME-UNIFORM, respectively.

4.3 Sampling distribution selection for user-specific objectives

Users may have specific objectives they wish the explanation method to fulfill. For instance, a user who seeks high local fidelity within a neighborhood of radius ϵ can choose a distribution and corresponding parameters aligned with this objective (as depicted in Figure 5). The flexible design of the sampling distribution in GLIME empowers users to opt for a distribution that matches their particular use case. Furthermore, within the GLIME framework, it is feasible to integrate feature correlation into the sampling distribution, providing additional flexibility. In summary, GLIME affords users the capability to make choices tailored to their individual needs and objectives.

5 Experiments

Dataset and models. Our experiments are conducted on the ImageNet dataset¹. Specifically, we randomly choose 100 classes and select one image at random from each class. The models chosen for explanation are ResNet18 [13] and the tiny Swin-Transformer [18] (refer to Appendix A.7 for results). Our implementation is derived from the official implementation of LIME². The default segmentation algorithm in LIME, Quickshift [32], is employed. Implementation details are provided in Appendix A.1. For experimental results on text data, please refer to Appendix A.9; for results on ALIME, please refer to Appendix A.8. A minimal sketch of GLIME-BINOMIAL (Section 3.3), one of the variants evaluated below, is given after this paragraph.
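The sketch below is illustrative rather than the released implementation; f, segments, and reference are assumed to be a black-box prediction function, a super-pixel label map, and a reference image, as in the LIME sketch of Section 2.2. The only substantive differences from that sketch are the Bernoulli sampling of each coordinate with probability $1/(1 + e^{-1/\sigma^2})$ and the absence of sample weights in the Ridge fit (Equation 3).

```python
import numpy as np
from sklearn.linear_model import Ridge

def glime_binomial(f, x, segments, reference, n_samples=1000, sigma=0.25, alpha=1.0, seed=0):
    """Sketch of GLIME-Binomial (Eq. 3): LIME's kernel is folded into the sampler,
    so every sample enters the Ridge fit with weight 1."""
    rng = np.random.default_rng(seed)
    d = int(segments.max()) + 1
    p_keep = 1.0 / (1.0 + np.exp(-1.0 / sigma**2))     # P(z'_i = 1); close to 1 for sigma = 0.25
    z_prime = (rng.random((n_samples, d)) < p_keep).astype(float)

    preds = np.empty(n_samples)
    for i, mask in enumerate(z_prime):
        z = x.copy()
        for j in np.flatnonzero(mask == 0):            # removed super-pixels -> reference values
            z[segments == j] = reference[segments == j]
        preds[i] = f(z)

    return Ridge(alpha=alpha).fit(z_prime, preds).coef_  # unweighted regression
```

GLIME-GAUSS, described in Section 4.2, instead perturbs the features directly with Gaussian noise ($z = x + z'$, $z' \sim \mathcal{N}(0, \sigma^2 I)$) and likewise fits an unweighted Ridge regression, with no reference image involved.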
Metrics. (1) Stability: To gauge the stability of an explanation method, we calculate the average top-K Jaccard index (JI) for explanations generated with 10 different random seeds. Let $w_1, \ldots, w_{10}$ denote the explanations obtained from the 10 seeds, and let $R_{i,:K}$ denote the indices corresponding to the top-K largest values in $w_i$. We report the average Jaccard index over pairs $R_{i,:K}$ and $R_{j,:K}$, where $\mathrm{JI}(A, B) = |A \cap B| / |A \cup B|$. (2) Local fidelity: To evaluate how well explanations capture the local behaviors of the model, we employ two approaches. For LIME, which uses a non-local sampling neighborhood, we use the R² score returned by the LIME implementation for local fidelity assessment [33]. Within GLIME, we generate samples $\{z_i\}_{i=1}^m$ and $\{z'_i\}_{i=1}^m$ from the neighborhood $B_x(\epsilon)$ and compute the squared difference between the model's output and the explanation's output on these samples: for a sample $z$, we compute $(f(z) - \hat{w}^\top z')^2$ for the explanation $\hat{w}$. The local fidelity of an explanation $\hat{w}$ at the input $x$ is defined as $1 / \big(1 + \frac{1}{m}\sum_i (f(z_i) - \hat{w}^\top z'_i)^2\big)$, following the definition in [16]. To ensure a fair comparison between different distributions in GLIME, we set the variance of each distribution to match that of the Gaussian distribution: when sampling from the Laplace distribution we use $\mathrm{Laplace}(0, \sigma/\sqrt{2})$, and when sampling from the uniform distribution we use $\mathrm{Uni}([-\sqrt{3}\sigma, \sqrt{3}\sigma]^d)$.
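The two metrics above can be computed with a few lines of code; the following sketch is illustrative (the helper names are ours, and the explanation vectors, the model f, and the neighborhood samples are assumed to be provided by the caller).

```python
import numpy as np
from itertools import combinations

def topk_jaccard(explanations, k=20):
    """Average pairwise top-K Jaccard index over explanations from different seeds."""
    tops = [set(np.argsort(-np.asarray(w))[:k]) for w in explanations]
    return float(np.mean([len(a & b) / len(a | b) for a, b in combinations(tops, 2)]))

def local_fidelity(f, w_hat, sample_pairs):
    """1 / (1 + mean squared gap) over (z, z') pairs drawn from B_x(eps), following [16]."""
    gaps = [(f(z) - float(np.dot(w_hat, zp))) ** 2 for z, zp in sample_pairs]
    return 1.0 / (1.0 + float(np.mean(gaps)))
```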
5.1 Stability of LIME and GLIME

LIME's instability and the influence of regularization/weighting. In Figure 4a, it is evident that LIME without the weighting function (LIME + π = 1) is more stable than its weighted counterpart, especially when σ is small (e.g., σ = 0.25, 0.5). This implies that the weighting function contributes to LIME's instability. Additionally, LIME without regularization (LIME + λ = 0) is somewhat more stable than regularized LIME, although the improvement is not substantial. This is because, when σ is small, the sample weights approach zero, causing the Ridge regression problem to become low-rank and its solutions unstable. Conversely, when σ is large, all samples receive significant weights, which reduces the effectiveness of regularization. For instance, when σ = 5 and d = 40, most samples carry weights around 0.45, and even samples with only one non-zero element left have weights of approximately 0.2. In such scenarios, the regularization term does not dominate, even with limited samples. This observation is substantiated by the comparable performance of LIME, LIME + π = 1, and LIME + λ = 0 when σ = 1 and 5. Further results are presented in Appendix A.2.

Enhancing stability in LIME with GLIME. In Figure 4a, LIME attains a Jaccard index of only about 0.4 even with over 2000 samples at the default σ = 0.25. In contrast, both GLIME-BINOMIAL and GLIME-GAUSS provide stable explanations with only 200-400 samples. Moreover, as σ increases, the convergence speed of LIME also improves; however, GLIME-BINOMIAL consistently outperforms LIME, requiring fewer samples for comparable stability. The logarithmic scale of the horizontal axis in Figure 4a highlights the exponentially faster convergence of GLIME compared to LIME.

Convergence of LIME and GLIME-BINOMIAL to a common limit. In Figure 8 of Appendix A.3, we examine the difference and correlation between explanations generated by LIME and GLIME-BINOMIAL. Mean Squared Error (MSE) and Mean Absolute Error (MAE) are employed to quantify the dissimilarity between the explanations, while Pearson correlation and Spearman rank correlation assess their degree of correlation. As the sample size increases, the LIME and GLIME-BINOMIAL explanations become more similar and more strongly correlated. The dissimilarity between their explanations diminishes rapidly, approaching zero when σ is large (e.g., σ = 5).

¹ Code is available at https://github.com/thutzr/GLIME-General-Stable-and-Local-LIME-Explanation
² https://github.com/marcotcr/lime

Figure 4: GLIME consistently enhances stability and local fidelity compared to LIME across various values of σ. (a) Stability of various methods, reported as the Top-20 Jaccard index against the number of samples (log scale); LIME with no regularization and no weighting are denoted LIME + λ = 0 and LIME + π = 1, respectively. LIME exhibits instability when σ is small, whereas GLIME demonstrates enhanced stability across different σ values. Notably, LIME achieves greater stability without weighting or regularization when σ is small, while regularization and weighting have minimal impact on LIME's stability when σ is large. (b) R² comparison between LIME and various GLIME methods with different sampling distributions (σ = 0.25, 0.5, 1.0, 5.0), using 2048 samples per image and method. LIME exhibits nearly zero R² when σ = 0.25 and 0.5, suggesting minimal explanatory power; in general, the R² of LIME is consistently lower than that of GLIME, underscoring GLIME's enhancement of local fidelity.

Figure 5: Local fidelity of GLIME (GLIME-GAUSS, GLIME-LAPLACE, GLIME-UNIFORM) across different neighborhood radii. Explanations produced under a distribution with standard deviation σ can capture behaviors within local neighborhoods whose radii exceed σ.

5.2 Local fidelity of LIME and GLIME

Enhancing local fidelity with GLIME. Figure 4b compares the local fidelity of LIME and of the explanation methods instantiated in GLIME. Using 2048 samples per image to compute the R² score, GLIME consistently demonstrates superior local fidelity compared to LIME. In particular, when σ = 0.25 and 0.5, LIME's local fidelity is close to zero, signifying that the linear approximation $(\hat{w}^{\mathrm{LIME}})^\top z'$ is nearly constant. By explicitly integrating locality into the sampling process, GLIME significantly improves the local fidelity of the explanations.

Local fidelity analysis of GLIME under various sampling distributions. In Figure 5, we assess the local fidelity of GLIME under diverse sampling distributions: $\mathcal{N}(0, \sigma^2 I)$, $\mathrm{Laplace}(0, \sigma/\sqrt{2})$, and $\mathrm{Uni}([-\sqrt{3}\sigma, \sqrt{3}\sigma]^d)$.
The title of each sub-figure corresponds to the standard deviation of these distributions. Notably, the value of σ does not precisely align with the radius ϵ of the intended local neighborhood: local fidelity tends to peak at ϵ values larger than the corresponding σ. Moreover, different sampling distributions achieve their best local fidelity at different ϵ values. This underscores the importance of selecting an appropriate distribution and parameter values based on the specific radius ϵ of the local neighborhood requiring an explanation. Unlike LIME, GLIME provides the flexibility to accommodate such choices. For additional results and analysis, please refer to Appendix A.5.

Figure 6: GLIME helps explain model predictions better than LIME. (a) Human-interpretability results for accurate predictions: participants consistently rated GLIME significantly higher than LIME, indicating that GLIME is a more effective tool for explaining model predictions. (b) Human-interpretability results for incorrect predictions: participants consistently rated GLIME significantly higher than LIME, indicating that GLIME is more effective than LIME at identifying the model's errors.

5.3 Human experiments

In addition to the numerical experiments, we conducted human-interpretability experiments to evaluate whether GLIME provides more meaningful explanations to end-users. The experiments consist of two parts, each involving 10 participants, and proceed as follows.

1. Can GLIME improve the comprehension of the model's predictions? To assess this, we choose images for which the model's predictions are accurate. Participants are presented with the original images, accompanied by explanations generated by both LIME and GLIME. They are then asked to rate how well the explanations from each algorithm align with their intuitive understanding, on a 1-5 scale where 1 indicates a significant mismatch and 5 a strong correspondence.

2. Can GLIME assist in identifying the model's errors? To explore this, we select images for which the model's predictions are incorrect. Participants receive the original images along with explanations generated by both LIME and GLIME. They are then asked to rate how much the explanations help them understand the model's behaviors and uncover the reasons behind the inaccurate predictions, on a 1-5 scale where 1 indicates no assistance and 5 substantial aid.

Figure 6 presents the experimental results. For images with accurate model predictions, participants assigned an average score of 2.96 to LIME and 3.37 to GLIME; on average, GLIME scored 0.41 higher than LIME, and in seven of the ten instances GLIME achieved a higher average score. For images with incorrect model predictions, participants assigned an average score of 2.33 to LIME and 3.42 to GLIME; GLIME outperformed LIME by 1.09 on average and on all ten images. These results strongly indicate that GLIME is better at explaining the model's behaviors.

6 Related work

Post-hoc local explanation methods. In contrast to inherently interpretable models, black-box models can be explained through post-hoc explanation methods, which are broadly categorized as model-agnostic or model-specific. Model-specific approaches, such as Gradient [2, 27], SmoothGrad [28], and Integrated Gradients [30], assume that the explained model is differentiable and that gradient access is available.
For instance, SmoothGrad generates samples from a Gaussian distribution centered at the given input and averages their gradients to mitigate noise. Model-agnostic methods, including LIME [22] and Anchors [23], approximate local model behaviors with interpretable models such as linear models or rule lists. Another widely used model-agnostic method, SHAP [19], provides a unified framework that computes feature attributions based on the Shapley value and adheres to several axioms.

Instability of LIME. Despite being widely employed, LIME is known to be unstable, as evidenced by divergent explanations under different random seeds [35, 34, 38]. Many efforts have been devoted to stabilizing LIME explanations. Zafar et al. [35] introduced a deterministic algorithm that uses hierarchical clustering to group the training data and k-nearest neighbors to select relevant data samples; however, the resulting explanations may not be good local approximations. Addressing this concern, Shankaranarayana et al. [24] trained an auto-encoder to serve as a better weighting function in LIME. Shi et al. [25] incorporated feature correlation into the sampling step and considered a more restricted sampling distribution, thereby enhancing stability. Zhou et al. [40] employed a hypothesis-testing framework to determine the number of samples needed to ensure stable explanations; however, this improvement came at the expense of a substantial increase in computation time.

Impact of references. LIME, along with various other explanation methods, relies on references (also known as baseline inputs) to generate samples. References serve as uninformative inputs meant to represent the absence of features [4, 30, 26]. Choosing an inappropriate reference can lead to misleading explanations; for instance, if a black image is selected as the reference, important black pixels may not be highlighted [15, 6]. The challenge lies in determining the appropriate reference, as different types of references may yield different explanations [14, 6, 15]. In [15], both black and white references are utilized, while [7] employs constant, noisy, and Gaussian-blur references simultaneously. To address the reference-specification issue, [6] proposes Expected Gradients, which treats each instance in the data distribution as a reference and averages the explanations computed across all references.

7 Conclusion

In this paper, we introduce GLIME, a novel framework that extends the LIME method for local feature importance explanations. By explicitly incorporating locality into the sampling procedure and enabling more flexible distribution choices, GLIME mitigates the limitations of LIME, such as instability and low local fidelity. Experimental results on ImageNet demonstrate that GLIME significantly enhances stability and local fidelity compared to LIME. While our experiments primarily focus on image data, our approach readily extends to text and tabular data.

8 Acknowledgement

The authors would like to thank the anonymous reviewers for their constructive comments. Zeren Tan and Jian Li are supported by the National Natural Science Foundation of China Grant (62161146004). Yang Tian is supported by the Artificial and General Intelligence Research Program of Guo Qiang Research Institute at Tsinghua University (2020GQG1017).
[1] Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador García, Sergio Gil-López, Daniel Molina, Richard Benjamins, et al. Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible ai. Information fusion, 58:82 115, 2020. [2] David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. How to explain individual classification decisions. The Journal of Machine Learning Research, 11:1803 1831, 2010. [3] Naman Bansal, Chirag Agarwal, and Anh Nguyen. Sam: The sensitivity of attribution methods to hyperparameters. In Proceedings of the ieee/cvf conference on computer vision and pattern recognition, pages 8673 8683, 2020. [4] Alexander Binder, Grégoire Montavon, Sebastian Lapuschkin, Klaus-Robert Müller, and Wojciech Samek. Layer-wise relevance propagation for neural networks with local renormalization layers. In Artificial Neural Networks and Machine Learning ICANN 2016: 25th International Conference on Artificial Neural Networks, Barcelona, Spain, September 6-9, 2016, Proceedings, Part II 25, pages 63 71. Springer, 2016. [5] Ian C Covert, Scott Lundberg, and Su-In Lee. Explaining by removing: A unified framework for model explanation. The Journal of Machine Learning Research, 22(1):9477 9566, 2021. [6] Gabriel Erion, Joseph D Janizek, Pascal Sturmfels, Scott M Lundberg, and Su-In Lee. Improving performance of deep learning models with axiomatic attribution priors and expected gradients. Nature machine intelligence, 3(7):620 631, 2021. [7] Ruth C Fong and Andrea Vedaldi. Interpretable explanations of black boxes by meaningful perturbation. In Proceedings of the IEEE international conference on computer vision, pages 3429 3437, 2017. [8] Damien Garreau and Dina Mardaoui. What does lime really see in images? In International conference on machine learning, pages 3620 3629. PMLR, 2021. [9] Damien Garreau and Ulrike von Luxburg. Looking deeper into tabular lime. ar Xiv preprint ar Xiv:2008.11092, 2020. [10] Loveleen Gaur, Mohan Bhandari, Tanvi Razdan, Saurav Mallik, and Zhongming Zhao. Explanation-driven deep learning model for prediction of brain tumour status using mri image data. Frontiers in Genetics, page 448, 2022. [11] Rory Mc Grath, Luca Costabello, Chan Le Van, Paul Sweeney, Farbod Kamiab, Zhao Shen, and Freddy Lecue. Interpretable credit application predictions with counterfactual explanations. ar Xiv preprint ar Xiv:1811.05245, 2018. [12] Tessa Han, Suraj Srinivas, and Himabindu Lakkaraju. Which explanation should i choose? a function approximation perspective to characterizing post hoc explanations. ar Xiv preprint ar Xiv:2206.01254, 2022. [13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770 778, 2016. [14] Saachi Jain, Hadi Salman, Eric Wong, Pengchuan Zhang, Vibhav Vineet, Sai Vemprala, and Aleksander Madry. Missingness bias in model debugging. ar Xiv preprint ar Xiv:2204.08945, 2022. [15] Andrei Kapishnikov, Tolga Bolukbasi, Fernanda Viégas, and Michael Terry. Xrai: Better attributions through regions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4948 4957, 2019. [16] Thibault Laugel, Xavier Renard, Marie-Jeanne Lesot, Christophe Marsala, and Marcin Detyniecki. Defining locality for surrogates in post-hoc interpretablity. 
ar Xiv preprint ar Xiv:1806.07498, 2018. [17] Wu Lin, Mohammad Emtiyaz Khan, and Mark Schmidt. Stein s lemma for the reparameterization trick with exponential family mixtures. ar Xiv preprint ar Xiv:1910.13398, 2019. [18] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10012 10022, 2021. [19] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. [20] Christoph Molnar. Interpretable machine learning. Lulu. com, 2020. [21] Amir Hossein Akhavan Rahnama and Henrik Boström. A study of data and label shift in the lime framework. ar Xiv preprint ar Xiv:1910.14421, 2019. [22] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. " why should i trust you?" explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135 1144, 2016. [23] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Anchors: High-precision modelagnostic explanations. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018. [24] Sharath M Shankaranarayana and Davor Runje. Alime: Autoencoder based approach for local interpretability. In Intelligent Data Engineering and Automated Learning IDEAL 2019: 20th International Conference, Manchester, UK, November 14 16, 2019, Proceedings, Part I 20, pages 454 463. Springer, 2019. [25] Sheng Shi, Xinfeng Zhang, and Wei Fan. A modified perturbed sampling method for local interpretable model-agnostic explanation. ar Xiv preprint ar Xiv:2002.07434, 2020. [26] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features through propagating activation differences. In International conference on machine learning, pages 3145 3153. PMLR, 2017. [27] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. ar Xiv preprint ar Xiv:1312.6034, 2013. [28] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. ar Xiv preprint ar Xiv:1706.03825, 2017. [29] Pascal Sturmfels, Scott Lundberg, and Su-In Lee. Visualizing the impact of feature attribution baselines. Distill, 5(1):e22, 2020. [30] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319 3328. PMLR, 2017. [31] Joel A Tropp et al. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning, 8(1-2):1 230, 2015. [32] Andrea Vedaldi and Stefano Soatto. Quick shift and kernel methods for mode seeking. In Computer Vision ECCV 2008: 10th European Conference on Computer Vision, Marseille, France, October 12-18, 2008, Proceedings, Part IV 10, pages 705 718. Springer, 2008. [33] Giorgio Visani, Enrico Bagli, and Federico Chesani. Optilime: Optimized lime explanations for diagnostic computer algorithms. ar Xiv preprint ar Xiv:2006.05714, 2020. [34] Giorgio Visani, Enrico Bagli, Federico Chesani, Alessandro Poluzzi, and Davide Capuzzo. Statistical stability indices for lime: Obtaining reliable explanations for machine learning models. Journal of the Operational Research Society, 73(1):91 101, 2022. 
[35] Muhammad Rehman Zafar and Naimul Mefraz Khan. Dlime: A deterministic local interpretable model-agnostic explanations approach for computer-aided diagnosis systems. arXiv preprint arXiv:1906.10263, 2019. [36] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In Computer Vision - ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I 13, pages 818-833. Springer, 2014. [37] Yifan Zhang, Haowei He, and Yang Yuan. Consistent and truthful interpretation with Fourier analysis. arXiv preprint arXiv:2210.17426, 2022. [38] Yujia Zhang, Kuangyan Song, Yiming Sun, Sarah Tan, and Madeleine Udell. "Why should you trust my explanation?" Understanding uncertainty in LIME explanations. arXiv preprint arXiv:1904.12991, 2019. [39] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2921-2929, 2016. [40] Zhengze Zhou, Giles Hooker, and Fei Wang. S-lime: Stabilized-LIME for model explanation. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, pages 2429-2438, 2021.

A More discussions

A.1 Implementation details

Dataset selection. The experiments use images from the validation set of the ImageNet-1k dataset. To ensure consistency, a fixed random seed (2022) is employed: 100 classes are chosen uniformly at random, and one image is randomly selected from each class.

Models. The pretrained models are sourced from torchvision.models, with the weights parameter set to IMAGENET1K_V1.

Feature transformation. Each image is first cropped to dimensions (224, 224, 3). The quickshift method from scikit-image is then used to segment images into super-pixels, with kernel_size=4, max_dist=200, ratio=0.2, and random_seed=2023. This matches the default setting in LIME, except for the fixed random seed. Keeping the random seed fixed ensures that identical images yield the same super-pixels, thereby isolating the source of instability to the calculation of explanations; different images are still segmented in different ways. (A short sketch of how these settings map onto the LIME library calls is given at the end of this appendix.)

Computing explanations. The procedure follows the original setup of LIME. The hide_color parameter is set to None, so the average value of each super-pixel acts as its reference when the super-pixel is removed. The distance_metric is explicitly set to l2, as recommended for image data in LIME [22]. The default value of alpha in Ridge regression is 1, unless otherwise specified. For each image, the model f infers the most probable label, and the explanation for that label is computed. Ten different random seeds are used to compute explanations for each image; the random_seed parameter of both the LimeImageExplainer and the explain_instance function is set to these seeds.
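The settings above translate into LIME library calls roughly as follows. This is an illustrative sketch, not the authors' released script: the wrapper function explain_image and the predict_fn argument (a batched classifier returning class probabilities, e.g., a wrapped torchvision model) are assumptions, while the segmentation and explainer parameters mirror those listed in this appendix.

```python
import numpy as np
from lime.lime_image import LimeImageExplainer
from lime.wrappers.scikit_image import SegmentationAlgorithm

# Quickshift with the parameters listed above; fixing the seed keeps super-pixels
# identical across runs, so only the explanation step remains a source of randomness.
segmenter = SegmentationAlgorithm('quickshift', kernel_size=4, max_dist=200,
                                  ratio=0.2, random_seed=2023)

def explain_image(image, predict_fn, label, num_samples=2048, seed=0):
    """`label` is the model's top-1 prediction for `image`."""
    explainer = LimeImageExplainer(kernel_width=0.25, random_state=seed)
    explanation = explainer.explain_instance(
        np.asarray(image), predict_fn,
        labels=(label,), top_labels=None,
        hide_color=None,                 # average super-pixel value acts as reference
        num_samples=num_samples,
        segmentation_fn=segmenter,
        distance_metric='l2',
        random_seed=seed,
    )
    return explanation.local_exp[label]  # list of (super-pixel index, weight) pairs
```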
A.2 Stability of LIME and GLIME

Figure 7 illustrates the top-1, top-5, top-10, and average Jaccard indices. Importantly, the results presented in Figure 7 closely align with those in Figure 4a: GLIME consistently provides more stable explanations than LIME.

Figure 7: Top-1 (a), top-5 (b), top-10 (c), and average (d) Jaccard indices for the various methods (LIME, LIME + λ = 0, LIME + π = 1, GLIME-BINOMIAL, GLIME-GAUSS), plotted against the number of samples on a log scale. The average Jaccard index is obtained by averaging the top-1 to top-d Jaccard indices.

A.3 LIME and GLIME-BINOMIAL converge to the same limit

Figure 8 presents the difference and correlation between explanations generated by LIME and GLIME-BINOMIAL. With an increasing sample size, the explanations from LIME and GLIME-BINOMIAL become more similar and more correlated. The difference between their explanations rapidly converges to zero, particularly when σ is large (e.g., σ = 5). While LIME converges more slowly, especially with small σ, it is impractical to keep sampling until their difference fully converges. Nevertheless, the correlation between LIME and GLIME-BINOMIAL strengthens with an increasing number of samples, indicating that they converge to the same limit as the sample size grows.

Figure 8: Difference and correlation between LIME and GLIME-BINOMIAL explanations (σ = 0.25, 0.5, 1.0, 5.0). Mean Squared Error (MSE) and Mean Absolute Error (MAE) measure the dissimilarity between the explanations, and Pearson and Spearman correlation coefficients quantify their correlation. With an increasing number of samples, the explanations from LIME and GLIME-BINOMIAL tend to show greater similarity.

A.4 LIME explanations are different for different references

The earlier work of Jain et al. [14] underscored the instability of LIME with respect to references. As shown in Section 4.2, this instability originates from LIME's sampling distribution, which depends on the chosen reference r. Additional empirical evidence is presented in Figure 9. Six distinct references are considered: black, white, red, blue, and yellow images, and the average value of the removed super-pixel (the default setting of LIME). The average Jaccard indices for explanations computed using these various references are reported in Figure 9. The results underscore LIME's sensitivity to the reference: different references lead LIME to identify different features as the most influential, even with a sample size surpassing 2000. Particularly noteworthy is that, with a sample size exceeding 2000, the top-1 Jaccard index consistently remains below 0.7.
Notably, the dissimilarity and correlation of explanations between LIME and GLIME-BINOMIAL converge more rapidly when σ is higher.

Figure 9: The top-K Jaccard index of explanations computed with different references (σ = 0.25, 0.5, 1, 5) consistently stays below 0.7, even when the sample size exceeds 2000.

Figure 10: The local fidelity of GLIME (GLIME-GAUSS, GLIME-LAPLACE, GLIME-UNIFORM) in the ℓ1 neighborhood.

A.5 The local fidelity of GLIME

Figure 5 presents the local fidelity of GLIME with samples drawn from the ℓ2 neighborhood $\{z : \|z - x\|_2 \le \epsilon\}$ around x. Additionally, Figure 10 and Figure 11 illustrate the local fidelity of GLIME within the ℓ1 neighborhood $\{z : \|z - x\|_1 \le \epsilon\}$ and the ℓ∞ neighborhood $\{z : \|z - x\|_\infty \le \epsilon\}$, respectively. Comparing Figure 5 and Figure 10 reveals that, for the same σ, GLIME can explain the local behaviors of f within a larger radius in the ℓ1 neighborhood than in the ℓ2 neighborhood. This difference arises because $\{z : \|z - x\|_2 \le \epsilon\}$ is a larger neighborhood than $\{z : \|z - x\|_1 \le \epsilon\}$ for the same radius ϵ. Likewise, $\{z : \|z - x\|_\infty \le \epsilon\}$ is a larger neighborhood than $\{z : \|z - x\|_2 \le \epsilon\}$, so local fidelity peaks at a smaller radius ϵ for the ℓ∞ neighborhood than for the ℓ2 neighborhood under the same σ. Remarkably, GLIME-LAPLACE consistently demonstrates superior local fidelity compared to GLIME-GAUSS and GLIME-UNIFORM; nevertheless, for larger ϵ, GLIME-GAUSS sometimes surpasses the others. This observation implies that the choice of sampling distribution should be contingent on the particular radius of the local neighborhood the user intends to explain.

Figure 11: The local fidelity of GLIME in the ℓ∞ neighborhood.

A.6 GLIME unifies several previous methods

Kernel SHAP [19]. Kernel SHAP integrates the principles of LIME and Shapley values. While LIME employs a linear explanation model to locally approximate f, Kernel SHAP seeks a linear explanation model that adheres to the axioms of Shapley values: local accuracy, missingness, and consistency [19]. Achieving this involves careful selection of the loss function ℓ(·,·), the weighting function π(·), and the regularization term R. The choices made in LIME often violate local accuracy and/or consistency, unlike the selections made in Kernel SHAP, which are proven to satisfy these axioms (refer to Theorem 2 in [19]).

Gradient [2, 27]. This method computes the gradient ∇f to assess the impact of each feature under an infinitesimal perturbation [2, 27].

SmoothGrad [28]. Acknowledging that standard gradient explanations may contain noise, SmoothGrad alleviates the noise by averaging gradients within the local neighborhood of the explained input [28]. Consequently, the feature importance scores are computed as $\mathbb{E}_{\epsilon \sim \mathcal{N}(0, \sigma^2 I)}[\nabla f(x + \epsilon)]$.

DLIME [35]. Diverging from random sampling, DLIME seeks a deterministic approach to sample acquisition. It employs agglomerative hierarchical clustering to group the training data and k-nearest neighbours to select the cluster corresponding to the explained instance. The DLIME explanation is then derived by constructing a linear model based on the data points within the identified cluster.

ALIME [24]. ALIME leverages an auto-encoder to assign weights to samples. Initially, an auto-encoder AE(·) is trained on the training data. The method then samples the n points nearest to x from the training dataset. The distance between a sample and the explained instance x is measured by the ℓ1 distance between their embeddings under AE(·): for a sample z, the distance from x is $\|AE(z) - AE(x)\|_1$, and its weight is $\exp(-\|AE(z) - AE(x)\|_1)$. The final explanation is derived by solving a weighted Ridge regression problem.
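As a concrete illustration of the SmoothGrad instance described above, the following minimal PyTorch sketch estimates $\mathbb{E}_{\epsilon \sim \mathcal{N}(0,\sigma^2 I)}[\nabla f(x + \epsilon)]$; the function name and parameter defaults are illustrative, and model is assumed to map a batched input tensor to class scores.

```python
import torch

def smoothgrad(model, x, target_class, n_samples=50, sigma=0.15):
    """Average the input gradient of the target-class score over Gaussian perturbations."""
    model.eval()
    total = torch.zeros_like(x)
    for _ in range(n_samples):
        noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        score = model(noisy.unsqueeze(0))[0, target_class]   # class score f(x + eps)
        score.backward()
        total += noisy.grad.detach()
    return total / n_samples
```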
A.7 Results on tiny Swin-Transformer [18]

The findings on the tiny Swin-Transformer align with those on ResNet18, providing additional confirmation that GLIME enhances stability and local fidelity compared to LIME. Please refer to Figure 12, Figure 13, and Figure 14 for results.

A.8 Comparing GLIME with ALIME

While ALIME [24] improves upon the stability and local fidelity of LIME, GLIME consistently surpasses ALIME. A key difference between ALIME and LIME lies in how distances are measured: ALIME employs an encoder to map samples into an embedding space and computes the distance of a sample from the explained input as ‖AE(z) − AE(x)‖₁, whereas LIME represents a sample by a binary vector z′ ∈ {0, 1}^d and measures its distance from the explained input as ‖1 − z′‖₂. Because ALIME relies on distances in the embedding space to assign weights, samples that fall far from x in that space receive very small weights, which can again cause instability. In our ImageNet experiments comparing GLIME and ALIME, we use the VGG16 model from the imagenet-autoencoder repository (https://github.com/Horizon2333/imagenet-autoencoder) as the encoder in ALIME. The outcomes of these experiments are detailed in Table 1. The findings show that, although ALIME is more stable than LIME, the improvement is not as substantial as that achieved by GLIME, particularly for small σ or small sample sizes.

A.9 Experiment results on IMDb

The DistilBERT model is employed in experimental evaluations on the IMDb dataset, where 100 data points are selected for explanation. The comparison between GLIME-BINOMIAL and LIME is depicted in Figure 15 using the Jaccard index. Our findings indicate that GLIME-BINOMIAL consistently exhibits higher stability than LIME across a range of σ values and sample sizes. Notably, at smaller σ values, GLIME-BINOMIAL demonstrates a substantial improvement in stability compared to LIME.

Figure 12: GLIME markedly enhances both stability and local fidelity compared to LIME across various values of σ (tiny Swin-Transformer). (a) Stability across various methods, reported as the top-20 Jaccard index versus sample size; LIME+λ = 0 and LIME+π = 1 denote LIME without regularization and without weighting, respectively. LIME exhibits instability, particularly when σ is small, whereas GLIME remains stable across σ values. Notably, removing the weighting or the regularization significantly improves LIME's stability when σ is small, while their impact is marginal when σ is large. (b) R² comparison among LIME and the GLIME variants (GLIME-BINOMIAL, GLIME-GAUSS, GLIME-UNIFORM, GLIME-LAPLACE) with different sampling distributions; for each image and method, 2048 samples are used to compute the explanation and the corresponding R². LIME yields nearly zero R² when σ = 0.25 and 0.5, indicating an almost meaningless explanation, and its R² values are consistently lower than those of GLIME, underscoring that GLIME enhances the local fidelity of LIME.
Table 1: Top-20 Jaccard index of GLIME-BINOMIAL, GLIME-GAUSS, and ALIME on ImageNet. GLIME-BINOMIAL and GLIME-GAUSS exhibit significantly higher stability than ALIME, particularly for small σ or limited samples.

# samples        128     256     512     1024
GLIME-BINOMIAL   0.952   0.981   0.993   0.998
GLIME-GAUSS      0.872   0.885   0.898   0.911
ALIME            0.618   0.691   0.750   0.803

GLIME-BINOMIAL   0.596   0.688   0.739   0.772
GLIME-GAUSS      0.875   0.891   0.904   0.912
ALIME            0.525   0.588   0.641   0.688

GLIME-BINOMIAL   0.533   0.602   0.676   0.725
GLIME-GAUSS      0.883   0.894   0.908   0.915
ALIME            0.519   0.567   0.615   0.660

GLIME-BINOMIAL   0.493   0.545   0.605   0.661
GLIME-GAUSS      0.865   0.883   0.898   0.910
ALIME            0.489   0.539   0.589   0.640

Figure 13: Top-1, top-5, top-10, and average Jaccard indices (versus sample size, log scale) for GLIME-BINOMIAL, GLIME-GAUSS, LIME, LIME+λ = 0, and LIME+π = 1 on the tiny Swin-Transformer. The average Jaccard index is the mean of the top-1 through top-d Jaccard indices.

Figure 14: Difference and correlation between LIME and GLIME-BINOMIAL explanations (tiny Swin-Transformer) for σ ∈ {0.25, 0.5, 1.0, 5.0}. Mean Squared Error (MSE) and Mean Absolute Error (MAE) quantify the divergence between the explanations; Pearson and Spearman correlation coefficients gauge their correlation. As the sample size increases, the explanations align more closely, and both the difference and the correlation converge more rapidly for larger σ.

Figure 15: GLIME significantly enhances both stability and local fidelity compared to LIME across various σ values (DistilBERT on IMDb). (a) Stability comparison between GLIME-BINOMIAL and LIME, reported as the top-20 Jaccard index versus sample size for σ ∈ {0.25, 0.5, 1.0, 5.0}. GLIME is more stable than LIME across all σ values, and its stability shows limited sensitivity to σ, whereas LIME only becomes more stable as σ grows. (b) Comparison of R² between LIME and GLIME-BINOMIAL under various sampling distributions; 2048 samples are used to compute each explanation and its R². When σ = 0.25, LIME exhibits nearly negligible R² values.
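The R² values reported in Figures 12 and 15 measure how well the fitted linear explanation reproduces the black-box outputs on the perturbation samples. A minimal sketch of such a computation is given below; the helper is our own and the weighted R² definition is an assumption, not the authors' exact evaluation code.

```python
import numpy as np

def explanation_r2(Z, fZ, w, weights=None):
    """Weighted R^2 of the linear explanation g(z') = w^T z' against black-box values f(z)."""
    if weights is None:
        weights = np.ones(len(fZ))
    residual = fZ - Z @ w
    mean_f = np.average(fZ, weights=weights)
    ss_res = np.average(residual ** 2, weights=weights)
    ss_tot = np.average((fZ - mean_f) ** 2, weights=weights)
    return 1.0 - ss_res / ss_tot
```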
B.1 Equivalent GLIME formulation without π(·)

By integrating the weighting function into the sampling distribution, the problem to be solved is

w^GLIME = argmin_v E_{z∼P}[π(z′) ℓ(f(z), g(z′))] + λ R(v)
        = argmin_v ∫_{R^d} π(z′) ℓ(f(z), g(z′)) P(z) dz + λ R(v)
        = argmin_v [ ∫_{R^d} ℓ(f(z), g(z′)) π(z′) P(z) dz + λ R(v) ] / ∫_{R^d} π(u′) P(u) du
        = argmin_v ∫_{R^d} ℓ(f(z), g(z′)) [π(z′) P(z) / ∫_{R^d} π(u′) P(u) du] dz + λ R(v) / ∫_{R^d} π(u′) P(u) du
        = argmin_v ∫_{R^d} ℓ(f(z), g(z′)) P̃(z) dz + (λ/Z) R(v)
        = argmin_v E_{z∼P̃}[ℓ(f(z), g(z′))] + (λ/Z) R(v),

where P̃(z) = π(z′) P(z) / ∫_{R^d} π(u′) P(u) du and Z = ∫_{R^d} π(u′) P(u) du.

B.2 Equivalence between LIME and GLIME-BINOMIAL

For LIME, P = Uni({0, 1}^d), so every specific z′ with ‖z′‖₀ = k, k = 0, 1, …, d, has probability P(z′) = 1/2^d. Hence

Z = ∫_{R^d} π(u′) P(u) du = Σ_{k=0}^{d} e^{(k−d)/σ²} C(d, k) / 2^d = e^{−d/σ²} (1 + e^{1/σ²})^d / 2^d.

Thus, for a sample z′ with ‖z′‖₀ = k,

P̃(z′) = π(z′) P(z′) / Z = e^{(k−d)/σ²} 2^{−d} / [e^{−d/σ²} (1 + e^{1/σ²})^d 2^{−d}] = e^{k/σ²} / (1 + e^{1/σ²})^d,

which is exactly the sampling distribution of GLIME-BINOMIAL. Therefore, GLIME-BINOMIAL is equivalent to LIME (with the regularization constant rescaled to λ/Z).

B.3 LIME requires many samples to accurately estimate the expectation term in Equation 1

In Figure 2, it is evident that many of the samples generated by LIME have very small weights. Consequently, the sample estimate of the expectation in Equation 1 tends to be much smaller than the true expectation with high probability, and the regularization term then dominates the overall objective. Consider the parameters σ = 0.25, n = 1000, d = 20 (σ and n are the default values in the original implementation of LIME). The probability of obtaining a sample z′ with ‖z′‖₀ = d − 1 or d is approximately d/2^d + 1/2^d = (d + 1)/2^d ≈ 2 × 10⁻⁵. Consider a typical scenario in which |f(z)| ∈ [0, 1] and (f(z) − vᵀz′)² ≈ 0.1 for most z, z′. In this case,

E_{z′∼Uni({0,1}^d)}[π(z′)(f(z) − vᵀz′)²] ≈ 0.1 · Σ_{k=0}^{d} e^{(k−d)/σ²} C(d, k) / 2^d ≈ 10⁻⁷.

However, if no sample with ‖z′‖₀ = d − 1 or d is drawn, then every sample z′ᵢ with ‖z′ᵢ‖₀ ≤ d − 2 has weight π(z′ᵢ) ≤ exp(−2/σ²) ≈ 1.26 × 10⁻¹⁴, so the sample average satisfies

(1/n) Σ_{i=1}^{n} π(z′ᵢ)(f(zᵢ) − vᵀz′ᵢ)² ≲ 1.26 × 10⁻¹⁵ ≪ 10⁻⁷.

The huge gap between the expectation term in Equation 1 and its sample average indicates that, without enough samples, the sample average is not an accurate estimate of E_{z′∼Uni({0,1}^d)}[π(z′)(f(z) − vᵀz′)²]. Moreover, in this regime the regularization term dominates the sample-average term, leading to an underestimate of the intended value of v. In conclusion, the original sampling procedure of LIME, even with its widely used default parameters, should not be expected to yield meaningful explanations.
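A short numerical check of the estimates above (a sketch we added for illustration; the variable names are our own):

```python
from math import comb, exp

d, n, sigma = 20, 1000, 0.25

# Probability that a uniform sample z' has ||z'||_0 in {d-1, d}.
p_high = (d + 1) / 2**d
print(f"P(||z'||_0 >= d-1)  = {p_high:.2e}")            # ~2e-5

# Expectation term E[pi(z') * 0.1] under z' ~ Uni({0,1}^d).
expectation = 0.1 * sum(exp((k - d) / sigma**2) * comb(d, k) for k in range(d + 1)) / 2**d
print(f"expectation term    ~ {expectation:.2e}")        # ~1e-7

# If no sample reaches ||z'||_0 >= d-1, every weight is at most exp(-2/sigma^2).
max_weight = exp(-2 / sigma**2)
print(f"max sample weight   ~ {max_weight:.2e}")         # ~1.3e-14
print(f"sample average      <= {0.1 * max_weight:.2e}")  # ~1.3e-15, far below the expectation
```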
B.4 Proof of Theorem 4.1

Theorem B.1. Suppose samples {z′ᵢ}_{i=1}^{n} ∼ Uni({0, 1}^d) are used to compute the LIME explanation. For any ϵ > 0 and δ ∈ (0, 1), if n = Ω(ϵ⁻² d⁵ 2^{4d} e^{4/σ²} log(4d/δ)) and λ ≪ n, then P(‖ŵ_LIME − w_LIME‖₂ < ϵ) ≥ 1 − δ, where w_LIME = lim_{n→∞} ŵ_LIME.

Proof. To compute the LIME explanation with n samples, the following optimization problem is solved:

ŵ_LIME = argmin_v L(v),   L(v) = (1/n) Σ_{i=1}^{n} π(z′ᵢ)(f(zᵢ) − vᵀz′ᵢ)² + (λ/n) ‖v‖₂².

Setting the gradient of L with respect to v to zero,

−(2/n) Σ_{i=1}^{n} π(z′ᵢ)(f(zᵢ) − vᵀz′ᵢ) z′ᵢ + (2λ/n) v = 0,

which leads to

ŵ_LIME = ( (1/n) Σ_{i=1}^{n} π(z′ᵢ) z′ᵢ(z′ᵢ)ᵀ + (λ/n) I )⁻¹ ( (1/n) Σ_{i=1}^{n} π(z′ᵢ) z′ᵢ f(zᵢ) ).

Denote Σ_n = (1/n) Σᵢ π(z′ᵢ) z′ᵢ(z′ᵢ)ᵀ + (λ/n) I, Γ_n = (1/n) Σᵢ π(z′ᵢ) z′ᵢ f(zᵢ), Σ = lim_{n→∞} Σ_n, and Γ = lim_{n→∞} Γ_n. Then ŵ_LIME = Σ_n⁻¹ Γ_n and w_LIME = Σ⁻¹ Γ.

To prove the concentration of ŵ_LIME, we follow the proofs in [8]: (1) first, we prove the concentration of Σ_n; (2) then, we bound ‖Σ⁻¹‖_F²; (3) next, we prove the concentration of Γ_n; (4) finally, we use the inequality

‖Σ_n⁻¹ Γ_n − Σ⁻¹ Γ‖₂ ≤ ‖Σ⁻¹‖_F ‖Γ_n − Γ‖₂ + 2 ‖Σ⁻¹‖_F² ‖Γ‖₂ ‖Σ_n − Σ‖₂,

which holds when ‖Σ⁻¹(Σ_n − Σ)‖₂ ≤ 0.32 [8]. Before establishing the concentration results, we first derive the expression for Σ.

Expression of Σ. The j-th diagonal entry of Σ_n is (1/n) Σᵢ π(z′ᵢ)(z′_{ij})² + λ/n and its off-diagonal (j, k) entry is (1/n) Σᵢ π(z′ᵢ) z′_{ij} z′_{ik}. Taking n → ∞ gives Σ_n → Σ = (α₁ − α₂) I + α₂ 11ᵀ, where

α₁ = E_{z′∼Uni({0,1}^d)}[π(z′) z′ᵢ] = E[π(z′)(z′ᵢ)²]
   = Σ_{k=0}^{d} e^{(k−d)/σ²} P(z′ᵢ = 1 | ‖z′‖₀ = k) P(‖z′‖₀ = k)
   = Σ_{k=0}^{d} e^{(k−d)/σ²} (k/d) C(d, k) / 2^d
   = Σ_{k=1}^{d} e^{(k−d)/σ²} C(d−1, k−1) / 2^d
   = e^{(1−d)/σ²} (1 + e^{1/σ²})^{d−1} / 2^d = (1 + e^{−1/σ²})^{d−1} / 2^d,

α₂ = E_{z′∼Uni({0,1}^d)}[π(z′) z′ᵢ z′ⱼ]   (i ≠ j)
   = Σ_{k=0}^{d} e^{(k−d)/σ²} P(z′ᵢ = 1, z′ⱼ = 1 | ‖z′‖₀ = k) P(‖z′‖₀ = k)
   = Σ_{k=0}^{d} e^{(k−d)/σ²} [k(k−1)/(d(d−1))] C(d, k) / 2^d
   = Σ_{k=2}^{d} e^{(k−d)/σ²} C(d−2, k−2) / 2^d
   = e^{(2−d)/σ²} (1 + e^{1/σ²})^{d−2} / 2^d = (1 + e^{−1/σ²})^{d−2} / 2^d.

By the Sherman-Morrison formula,

Σ⁻¹ = ((α₁ − α₂) I + α₂ 11ᵀ)⁻¹ = (1/(α₁ − α₂)) ( I − α₂ 11ᵀ / (α₁ + (d−1) α₂) ) = (β₁ − β₂) I + β₂ 11ᵀ,

β₁ = (α₁ + (d−2) α₂) / ((α₁ − α₂)(α₁ + (d−1) α₂)),   β₂ = −α₂ / ((α₁ − α₂)(α₁ + (d−1) α₂)).

Concentration of Σ_n. Since 0 ≤ π(·) ≤ 1 and z′ᵢ ∈ {0, 1}^d, every entry of Σ_n lies in [0, 2]. Moreover,

1/2^d ≤ α₁ = (1 + e^{−1/σ²})^{d−1} / 2^d ≤ 1/2,   1/2^d ≤ α₂ = (1 + e^{−1/σ²})^{d−2} / 2^d ≤ 1/4,   α₁ − α₂ = e^{−1/σ²} (1 + e^{−1/σ²})^{d−2} / 2^d,

so every entry of Σ lies in [0, 1/2] and every entry of Σ_n − Σ is bounded. By the matrix Hoeffding inequality [31], for all t > 0,

P(‖Σ_n − Σ‖₂ ≥ t) ≤ 2d exp(−n t² / (32 d²)).

Bounding ‖Σ⁻¹‖_F². We have ‖Σ⁻¹‖_F² = d β₁² + (d² − d) β₂², with

|β₁| = (α₁ + (d−2) α₂) / ((α₁ − α₂)(α₁ + (d−1) α₂)) ≤ 1/(α₁ − α₂) ≤ 2^d e^{1/σ²},   so β₁² ≤ 2^{2d} e^{2/σ²},
|β₂| = α₂ / ((α₁ − α₂)(α₁ + (d−1) α₂)) = e^{1/σ²} / (α₁ + (d−1) α₂) ≤ d⁻¹ 2^d e^{1/σ²},   so β₂² ≤ d⁻² 2^{2d} e^{2/σ²}.

Therefore ‖Σ⁻¹‖_F² ≤ d 2^{2d} e^{2/σ²} + (d² − d) d⁻² 2^{2d} e^{2/σ²} ≤ 2 d 2^{2d} e^{2/σ²}.

Concentration of Γ_n. Since |f| ≤ 1, every entry of Γ_n and Γ lies in [0, 1]. By Hoeffding's inequality applied coordinate-wise, for all t > 0,

P(‖Γ_n − Γ‖₂ ≥ t) ≤ 2d exp(−n t² / (8d)).

Concentration of ŵ_LIME. When ‖Σ⁻¹(Σ_n − Σ)‖₂ ≤ 0.32 [8],

‖Σ_n⁻¹ Γ_n − Σ⁻¹ Γ‖₂ ≤ ‖Σ⁻¹‖_F ‖Γ_n − Γ‖₂ + 2 ‖Σ⁻¹‖_F² ‖Γ‖₂ ‖Σ_n − Σ‖₂,

and ‖Σ⁻¹(Σ_n − Σ)‖₂ ≤ ‖Σ⁻¹‖₂ ‖Σ_n − Σ‖₂ ≤ 2^{1/2} d^{1/2} 2^d e^{1/σ²} ‖Σ_n − Σ‖₂.

Using the concentration of Σ_n with n ≥ n₁ = 2⁷ d³ 2^{2d} e^{2/σ²} log(4d/δ) and t₁ = 5 · 2^{−2.5} d^{−0.5} 2^{−d} e^{−1/σ²}, we have P(‖Σ_n − Σ‖₂ ≥ t₁) ≤ δ/2, so with probability at least 1 − δ/2,

‖Σ⁻¹(Σ_n − Σ)‖₂ ≤ 2^{1/2} d^{1/2} 2^d e^{1/σ²} ‖Σ_n − Σ‖₂ ≤ 0.32.

For n ≥ n₂ = 2⁸ ϵ⁻² d² 2^{2d} e^{2/σ²} log(4d/δ) and t₂ = 2^{−2.5} d^{−0.5} 2^{−d} e^{−1/σ²} ϵ, we have P(‖Γ_n − Γ‖₂ ≥ t₂) ≤ 2d exp(−n₂ t₂² / (8d)) ≤ δ/2, so with probability at least 1 − δ/2,

‖Σ⁻¹‖_F ‖Γ_n − Γ‖₂ ≤ 2^{1/2} d^{1/2} 2^d e^{1/σ²} t₂ ≤ ϵ/2.

Since ‖Γ‖₂ ≤ √d, choosing n ≥ n₃ = 2⁹ ϵ⁻² d⁵ 2^{4d} e^{4/σ²} log(4d/δ) and t₃ = 2^{−3} d^{−1.5} 2^{−2d} e^{−2/σ²} ϵ gives P(‖Σ_n − Σ‖₂ ≥ t₃) ≤ 2d exp(−n₃ t₃² / (32 d²)) ≤ δ/2, so with probability at least 1 − δ/2,

2 ‖Σ⁻¹‖_F² ‖Γ‖₂ ‖Σ_n − Σ‖₂ ≤ 4 d^{1.5} 2^{2d} e^{2/σ²} t₃ ≤ ϵ/2.

In summary, choosing n ≥ max{n₁, n₂, n₃}, for all ϵ > 0 and δ ∈ (0, 1),

P(‖Σ_n⁻¹ Γ_n − Σ⁻¹ Γ‖₂ ≥ ϵ) ≤ δ.
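As an illustration of the estimator analyzed above, the weighted Ridge solution ŵ = Σ_n⁻¹ Γ_n can be computed in closed form. This is a sketch with our own variable names, not the authors' implementation.

```python
import numpy as np

def weighted_ridge_closed_form(Z, fZ, weights, lam):
    """Closed form w = (Z^T diag(pi) Z / n + (lam/n) I)^{-1} (Z^T diag(pi) fZ / n)."""
    n, d = Z.shape
    Sigma_n = (Z.T * weights) @ Z / n + (lam / n) * np.eye(d)
    Gamma_n = (Z.T * weights) @ fZ / n
    return np.linalg.solve(Sigma_n, Gamma_n)

# Toy check: uniform binary samples with LIME-style weights pi(z') = exp((||z'||_0 - d)/sigma^2).
rng = np.random.default_rng(0)
d, n, sigma, lam = 10, 50_000, 1.0, 1.0
Z = rng.integers(0, 2, size=(n, d)).astype(float)
true_w = rng.normal(size=d)
fZ = Z @ true_w                                  # stand-in black-box values, exactly linear here
pi = np.exp((Z.sum(axis=1) - d) / sigma**2)
w_hat = weighted_ridge_closed_form(Z, fZ, pi, lam)
print(np.round(w_hat - true_w, 2))               # small residual once n is large enough
```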
B.5 Proof of Theorem 4.2 and Corollary 4.3

Theorem B.2. Suppose z′ ∼ P is such that the largest eigenvalue of z′(z′)ᵀ is bounded by R, E[z′(z′)ᵀ] = (α₁ − α₂) I + α₂ 11ᵀ, ‖Var(z′(z′)ᵀ)‖₂ ≤ ν², and |(z′ f(z))ᵢ| ≤ M for some M > 0. Let {z′ᵢ}_{i=1}^{n} be i.i.d. samples from P used to compute the GLIME explanation ŵ_GLIME. For any ϵ > 0 and δ ∈ (0, 1), if n = Ω(ϵ⁻² M² ν² d³ γ⁴ log(4d/δ)), where γ² = d β₁² + (d² − d) β₂², β₁ = (α₁ + (d−2) α₂)/β₀, β₂ = α₂/β₀, and β₀ = (α₁ − α₂)(α₁ + (d−1) α₂), then P(‖ŵ_GLIME − w_GLIME‖₂ < ϵ) ≥ 1 − δ, where w_GLIME = lim_{n→∞} ŵ_GLIME.

Proof. The proof closely resembles that of Theorem 4.1. Employing the same derivation, we deduce that

Σ = (α₁ + λ − α₂) I + α₂ 11ᵀ,   Σ⁻¹ = (β₁ − β₂) I + β₂ 11ᵀ,
β₁ = (α₁ + λ + (d−2) α₂) / ((α₁ + λ − α₂)(α₁ + λ + (d−1) α₂)),   β₂ = −α₂ / ((α₁ + λ − α₂)(α₁ + λ + (d−1) α₂)).

Given that λ_max(z′(z′)ᵀ) ≤ R and ‖Var(z′(z′)ᵀ)‖₂ ≤ ν², the matrix Hoeffding inequality [31] gives, for all t > 0,

P(‖Σ_n − Σ‖₂ ≥ t) ≤ 2d exp(−n t² / (8 ν²)),

and applying Hoeffding's inequality coordinate-wise,

P(‖Γ_n − Γ‖₂ ≥ t) ≤ 2d exp(−n t² / (8 M² d²)).

Additionally, ‖Σ⁻¹‖_F² = d β₁² + (d² − d) β₂² = γ².

Selecting n ≥ n₁ = 2⁵ γ² ν² log(4d/δ) and t₁ = 2³ · 5⁻² γ⁻¹, we obtain P(‖Σ_n − Σ‖₂ ≥ t₁) ≤ 2d exp(−n₁ t₁² / (8 ν²)) ≤ δ/2, so with probability at least 1 − δ/2,

‖Σ⁻¹(Σ_n − Σ)‖₂ ≤ ‖Σ⁻¹‖₂ ‖Σ_n − Σ‖₂ ≤ γ t₁ = 0.32.

Letting n ≥ n₂ = 2⁵ ϵ⁻² M² d² γ² log(4d/δ) and t₂ = 2⁻² ϵ γ⁻¹, we have P(‖Γ_n − Γ‖₂ ≥ t₂) ≤ 2d exp(−n₂ t₂² / (8 M² d²)) ≤ δ/2, so with probability at least 1 − δ/2,

‖Σ⁻¹‖_F ‖Γ_n − Γ‖₂ ≤ γ t₂ = ϵ/4.

Since ‖Γ‖₂ ≤ M√d, choosing n ≥ n₃ = 2⁵ ϵ⁻² M² ν² d γ⁴ log(4d/δ) and t₃ = 2⁻² ϵ M⁻¹ d^{−0.5} γ⁻², we have P(‖Σ_n − Σ‖₂ ≥ t₃) ≤ 2d exp(−n₃ t₃² / (8 ν²)) ≤ δ/2, and with probability at least 1 − δ/2,

‖Σ⁻¹‖_F² ‖Γ‖₂ ‖Σ_n − Σ‖₂ ≤ γ² M √d t₃ = ϵ/4.

Therefore, by choosing n ≥ max{n₁, n₂, n₃}, P(‖Σ_n⁻¹ Γ_n − Σ⁻¹ Γ‖₂ ≥ ϵ) ≤ δ.

Corollary B.3. Suppose {z′ᵢ}_{i=1}^{n} are i.i.d. samples from P(z′, ‖z′‖₀ = k) = e^{k/σ²} / (1 + e^{1/σ²})^d, k = 0, 1, …, d, used to compute the GLIME-BINOMIAL explanation. For any ϵ > 0 and δ ∈ (0, 1), if n = Ω(ϵ⁻² d⁵ e^{4/σ²} log(4d/δ)), then P(‖ŵ_Binomial − w_Binomial‖₂ < ϵ) ≥ 1 − δ, where w_Binomial = lim_{n→∞} ŵ_Binomial.

Proof. For GLIME-BINOMIAL, each coordinate of z′(z′)ᵀ follows a Bernoulli distribution, ensuring bounded variance for both z′(z′)ᵀ and (z′ f(z′))ᵢ. Additionally,

α₁ = E[(z′ᵢ)²] = E[z′ᵢ] = Σ_{k=0}^{d} P(z′ᵢ = 1 | ‖z′‖₀ = k) P(‖z′‖₀ = k)
   = Σ_{k=1}^{d} C(d−1, k−1) e^{k/σ²} / (1 + e^{1/σ²})^d
   = e^{1/σ²} (1 + e^{1/σ²})^{d−1} / (1 + e^{1/σ²})^d = e^{1/σ²} / (1 + e^{1/σ²}),

α₂ = E[z′ᵢ z′ⱼ] = Σ_{k=0}^{d} P(z′ᵢ = 1, z′ⱼ = 1 | ‖z′‖₀ = k) P(‖z′‖₀ = k)
   = Σ_{k=2}^{d} [k(k−1)/(d(d−1))] C(d, k) e^{k/σ²} / (1 + e^{1/σ²})^d
   = e^{2/σ²} (1 + e^{1/σ²})^{d−2} / (1 + e^{1/σ²})^d = e^{2/σ²} / (1 + e^{1/σ²})² = α₁².

Hence,

|β₁| = |(α₁ + λ + (d−2) α₂) / ((α₁ + λ − α₂)(α₁ + λ + (d−1) α₂))| ≤ 1/(α₁ + λ − α₂) ≤ 1/(α₁ − α₂) = e^{−1/σ²} (1 + e^{1/σ²})² ≤ 4 e^{1/σ²},

|β₂|² = |α₂ / ((α₁ + λ − α₂)(α₁ + λ + (d−1) α₂))|² ≤ α₂² / ((α₁ − α₂)(α₁ + (d−1) α₂)²) = α₁ α₂ / ((1 − α₁)(α₁ + (d−1) α₂)²) ≤ α₁ α₂ / ((1 − α₁)((d−1) α₂)²) = e^{−1/σ²} (1 + e^{1/σ²})² / (d−1)² ≤ 4 e^{1/σ²} / (d−1)².

Therefore γ² = d β₁² + (d² − d) β₂² ≤ 16 d e^{2/σ²} + 4 d e^{1/σ²} / (d−1) = O(d e^{2/σ²}). Substituting γ² = O(d e^{2/σ²}) into Theorem B.2 yields n = Ω(ϵ⁻² d⁵ e^{4/σ²} log(4d/δ)).

B.6 Formulation of SmoothGrad

Proposition B.4. SmoothGrad is equivalent to the GLIME formulation with z = z′ + x, z′ ∼ N(0, σ²I), ℓ(f(z), g(z′)) = (f(z) − g(z′))², π(z) = 1, and R(v) = 0. The explanation returned by GLIME for f at x with infinitely many samples under this setting is

w* = (1/σ²) E_{z′∼N(0,σ²I)}[z′ f(x + z′)] = E_{z′∼N(0,σ²I)}[∇f(x + z′)],

which is exactly the SmoothGrad explanation. When σ → 0, w* → ∇f(x + z′)|_{z′=0} = ∇f(x).

Proof. To establish this proposition, we first derive the expression for the GLIME explanation vector w*.

Exact expression of Σ. For each i = 1, …, n, let z′ᵢ ∼ N(0, σ²I). The (j, k) entry of Σ_n is (1/n) Σᵢ z′_{ij} z′_{ik}, so that

Σ = E_{z′∼N(0,σ²I)}[z′(z′)ᵀ] = σ² I   and   Σ⁻¹ = (1/σ²) I.
Consequently, we obtain

w* = Σ⁻¹ Γ = (1/σ²) E_{z′∼N(0,σ²I)}[z′ f(x + z′)] = E_{z′∼N(0,σ²I)}[∇f(x + z′)].

The final equality is a direct consequence of Stein's lemma [17].
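The identity above can be checked numerically on any smooth function. The sketch below compares the two estimators for a simple test function of our own choosing; it is an illustration of Stein's lemma, not code from the paper.

```python
import numpy as np

# Numerical check of (1/sigma^2) E[z' f(x+z')] = E[grad f(x+z')] (Stein's lemma)
# for a simple smooth test function with a known gradient.
def f(z):
    return np.sin(z[..., 0]) + z[..., 1] ** 2 + 0.5 * z[..., 0] * z[..., 2]

def grad_f(z):
    g = np.zeros_like(z)
    g[..., 0] = np.cos(z[..., 0]) + 0.5 * z[..., 2]
    g[..., 1] = 2.0 * z[..., 1]
    g[..., 2] = 0.5 * z[..., 0]
    return g

rng = np.random.default_rng(0)
x, sigma, n = np.array([0.3, -0.7, 1.2]), 0.5, 2_000_000
eps = sigma * rng.standard_normal((n, 3))

glime_gauss = (eps * f(x + eps)[:, None]).mean(axis=0) / sigma**2   # (1/sigma^2) E[z' f(x+z')]
smoothgrad = grad_f(x + eps).mean(axis=0)                           # E[grad f(x+z')]
print(np.round(glime_gauss, 3), np.round(smoothgrad, 3))            # the two estimates agree
```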