# What Does LIME Really See in Images?

Damien Garreau (Université Côte d'Azur, Inria, CNRS, LJAD, France) and Dina Mardaoui (Polytech Nice). Correspondence to: Damien Garreau. Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021.

The performance of modern algorithms on certain computer vision tasks such as object recognition is now close to that of humans. This success was achieved at the price of complicated architectures depending on millions of parameters, and it has become quite challenging to understand how particular predictions are made. Interpretability methods propose to give us this understanding. In this paper, we study LIME, perhaps one of the most popular such methods. On the theoretical side, we show that when the number of generated examples is large, LIME explanations are concentrated around a limit explanation for which we give an explicit expression. We further this study for elementary shape detectors and linear models. As a consequence of this analysis, we uncover a connection between LIME and integrated gradients, another explanation method. More precisely, the LIME explanations are similar to the sum of integrated gradients over the superpixels used in the preprocessing step of LIME.

1. Introduction

Deep neural networks, and deep convolutional neural networks (CNNs) in particular, have changed the way computers look at images (Schmidhuber, 2015). Many specific tasks in computer vision such as character recognition and object recognition are now routinely achieved by personal computers with human-like accuracy. The success of these algorithms seems partly due to the great complexity of the models they encode, the most recent relying on hundreds of layers and millions of parameters. While accuracy is often the only relevant metric for practitioners, there are numerous situations where one is not satisfied if the model is making good predictions for the wrong reasons. We would like to know why the model makes a particular prediction. Responding to this emerging need, many interpretability methods have appeared in the last five years. Among them, model-agnostic methods aim to provide the user with meaningful insights into the inner workings of a specific algorithm without making any specific assumption on the architecture of the model. We refer to Adadi and Berrada (2018); Guidotti et al. (2018) and Linardatos et al. (2021) for recent review papers.

Figure 1. Explaining a prediction with LIME. In this example, the function to be explained f is the likelihood, according to the Inception V3 network, that the input image ξ contains a lion. After a run of LIME with default parameters, the top five positive coefficients are highlighted in the right panel.

In this paper, we study the image version of LIME (Local Interpretable Model-agnostic Explanations, Ribeiro et al., 2016). Let us recall briefly how it operates: in order to explain the prediction of a model f for an example ξ, LIME

1. decomposes ξ into d superpixels, that is, small homogeneous image patches;
2. creates a number of new images x_1, ..., x_n by randomly turning on and off these superpixels;
3. queries the model, getting predictions y_i = f(x_i);
4. builds a local weighted surrogate model β̂_n fitting the y_i to the presence or absence of superpixels.
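As a concrete illustration of this workflow (the setting of Figure 1), here is a minimal sketch assuming the publicly available lime Python package and a Keras Inception V3 model; the image path and any parameter values beyond the defaults mentioned in the text are illustrative choices of ours.

```python
import numpy as np
from PIL import Image
from lime import lime_image
from skimage.segmentation import mark_boundaries
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

# f: the black-box classifier to explain (here, Inception V3 class probabilities).
model = InceptionV3()

def classifier_fn(images):
    # LIME queries the model on batches of perturbed images x_1, ..., x_n.
    return model.predict(preprocess_input(np.copy(images)))

# xi: the image to explain; "lion.jpg" is a placeholder path.
xi = np.array(Image.open("lion.jpg").resize((299, 299)))

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    xi, classifier_fn,
    top_labels=1,
    num_samples=1000,  # n, the number of perturbed examples
)
label = explanation.top_labels[0]
# Highlight the superpixels with the top five positive coefficients, as in Figure 1.
img, mask = explanation.get_image_and_mask(label, positive_only=True, num_features=5)
highlighted = mark_boundaries(img / 255.0, mask)
```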
Each coefficient of β̂_n is associated with a superpixel of the original image ξ and, intuitively, the more positive the coefficient, the more important the superpixel is for the prediction at ξ according to LIME. Generally, the user visualizes β̂_n by highlighting the superpixels associated with the top positive coefficients (usually five, see Figure 1).

The central question underlying this work is that of the soundness of LIME for explaining simple models: before using LIME on deep neural networks, are we sure that the explanations provided make sense for the simplest models? Can we guarantee it theoretically?

Contributions. Our contributions are the following:

- when the number of perturbed examples is large, the interpretable coefficients concentrate with high probability around a vector β that depends only on the model and the example to explain;
- we provide an explicit expression for β, from which we gain some reassurance on LIME. In particular, the explanations are linear in the model;
- for simple shape detectors, we can be more precise in the computation of β and we show that LIME provides meaningful explanations in that case;
- we can also compute β for linear models. The limit explanation takes a very simple form: β_j is the sum of the coefficients multiplied by the pixel values on each superpixel;
- as a consequence, we show experimentally that for models that are sufficiently smooth with respect to their inputs, the outputs of LIME are similar to the sum over superpixels of integrated gradients, another interpretability method.

Related work. While some weaknesses of LIME are well-known, in particular its vulnerability to adversarial attacks (Slack et al., 2020), investigating whether the produced explanations make sense is still an ongoing area of research (see for instance Narodytska et al. (2019)). The present work follows the line of ideas initiated by Garreau and von Luxburg (2020a;b) for the tabular data version of LIME and later extended to text data by Mardaoui and Garreau (2021). In particular, our main result and its proof are similar to the theory laid out in these papers. The interesting differences come from the sampling procedure of LIME for images: there is no superpixel creation step in the text and tabular data versions of the algorithm. Therefore, the exact expression of the limit explanations and the associated conclusions differ.

Organization of the paper. We start by presenting LIME for images in Section 2. Section 3 contains our main results, which are further developed for simple models in Section 4. Finally, we investigate the link between LIME and integrated gradients in Section 5.

2. LIME for Images

From now on, we consider a model f : [0, 1]^D → ℝ as well as a fixed example to explain ξ ∈ [0, 1]^D. Hence D denotes the number of pixels of the images on which f operates. In practice, the inputs of f are always 2- or 3-dimensional arrays. Of particular interest, grayscale images are usually encoded as h × w arrays, whereas RGB images are h × w × 3 arrays, with each channel corresponding to a primary color. We will see that it does not make a difference and our results can be read channel-wise if there is more than one color channel.

2.1. Superpixels

The first step of the LIME operation is to split ξ into superpixels. These are contiguous patches of the image that share color and/or brightness similarities. We refer to Figure 2 for an illustration.
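As a sketch of this segmentation step (the default algorithm, quickshift, is discussed next), one can call scikit-image directly; the parameter values below are illustrative rather than exact package defaults.

```python
import numpy as np
from skimage.segmentation import quickshift
from skimage.util import img_as_float

def superpixels(xi):
    """Return a label array (one label per pixel) and the number d of superpixels.

    xi is assumed to be an RGB image stored as an (h, w, 3) array.
    """
    segments = quickshift(img_as_float(xi), kernel_size=4, max_dist=200, ratio=0.2)
    return segments, len(np.unique(segments))
```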
In the text version of LIME, the counterpart of this superpixel decomposition is a local dictionary where each interpretable feature is a unique word of the text, whereas in the tabular version a complicated discretization procedure is needed. By default, LIME uses the quickshift algorithm to produce these superpixels (Vedaldi and Soatto, 2008). In a nutshell, quickshift is a mode-seeking algorithm that considers the pixels as samples over a 5-dimensional space (3 color dimensions and 2 space dimensions). For any 1 ≤ k ≤ d, we denote by J_k the kth superpixel associated with ξ. Therefore, the d subsets J_1, ..., J_d form a partition of the pixels, that is, J_1 ∪ ··· ∪ J_d = {1, ..., D} and J_k ∩ J_ℓ = ∅ whenever k ≠ ℓ. Note that, even though the superpixels are generally contiguous patches of the image, we do not make this assumption.

2.2. Sampling

As we have seen in Section 1, one of LIME's key ideas is to create new examples from ξ by randomly replacing some superpixels of the image. By default, these chosen superpixels are replaced by the mean color of the superpixel, a procedure that we call mean replacement. It is also possible to choose a specific color as a replacement image. We demonstrate the sampling procedure in Figure 2 as well as the two possible choices for the replacement image.

Let us be more precise and let us assume that ξ is fixed and J_1, ..., J_d are given. The first step of the sampling scheme is to compute the replacement image ξ̄ ∈ [0, 1]^D. If a given color c is provided, then ξ̄_u = c for all 1 ≤ u ≤ D. If no color is provided, then the mean image is computed: for any superpixel J_k, we define ξ̄ ∈ [0, 1]^D by

$$\forall u \in J_k, \quad \bar{\xi}_u = \frac{1}{|J_k|} \sum_{v \in J_k} \xi_v \,. \tag{1}$$

Figure 2. Sampling procedure of LIME for images. The image to explain, ξ, is first split into d superpixels (lower left corner, here d = 72). A replacement image ξ̄ is computed, which is by default the mean of ξ on each superpixel (top row), see Eq. (1). This replacement image can also be filled uniformly with a pre-determined color (bottom row: replacement with the color black). Then, for each new generated example x_i with 1 ≤ i ≤ n, the superpixels are randomly switched on or off depending on the throw of d independent Bernoulli random variables with parameter 1/2. Thus LIME creates n new images where key parts of ξ disappear at random.

Of course, if the input images have several channels, the mean is computed on each channel. Then, for each 1 ≤ i ≤ n, LIME samples a random vector z_i ∈ {0, 1}^d where each coordinate of z_i is i.i.d. Bernoulli with parameter 1/2. Each z_{i,j} corresponds to the activation (z_{i,j} = 1) or inactivation (z_{i,j} = 0) of superpixel j. We call the z_i the interpretable features. To be precise, for any given i ∈ {1, ..., n}, the new example x_i ∈ [0, 1]^D has pixel values given by

$$\forall u \in J_j, \quad x_{i,u} = z_{i,j}\,\xi_u + (1 - z_{i,j})\,\bar{\xi}_u \,. \tag{2}$$

Again, if ξ has several color channels, Eq. (2) is written channel-wise. Note that ξ corresponds to the vector 1 = (1, ..., 1) (all the superpixels of the image are activated).

2.3. Weights

Of course, the new examples x_i can be quite different from the original image. For instance, if most of the z_{i,j} are zero, then x_i is close to ξ̄. Some care is taken when building the surrogate model, and new examples are given a positive weight π_i that takes this proximity into account. By default, these weights are defined by

$$\forall 1 \le i \le n, \quad \pi_i := \exp\left(\frac{-d_{\cos}(\mathbf{1}, z_i)^2}{2\nu^2}\right), \tag{3}$$

where ν > 0 is a positive bandwidth parameter equal to 0.25 by default and d_cos is the cosine distance. Namely,

$$\forall u, v \in \mathbb{R}^d, \quad d_{\cos}(u, v) := 1 - \frac{u^\top v}{\|u\|\,\|v\|} \,.$$
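The sampling scheme of Eqs. (1)-(3) can be summarized in a few lines of numpy; the sketch below assumes a grayscale image xi and a superpixel label array segments of the same shape, and all names are ours.

```python
import numpy as np

def lime_sample(xi, segments, n=1000, nu=0.25, replacement_color=None, seed=None):
    """Return interpretable features z, perturbed images x, and weights pi."""
    rng = np.random.default_rng(seed)
    labels = np.unique(segments)
    d = len(labels)

    # Replacement image xi_bar: mean value per superpixel (Eq. 1) or a fixed color.
    xi_bar = np.empty_like(xi, dtype=float)
    for lab in labels:
        mask = segments == lab
        xi_bar[mask] = xi[mask].mean() if replacement_color is None else replacement_color

    # Interpretable features: d i.i.d. Bernoulli(1/2) coordinates per new example.
    z = rng.integers(0, 2, size=(n, d))

    # New examples (Eq. 2): keep superpixel j where z_ij = 1, replace it otherwise.
    x = np.empty((n,) + xi.shape, dtype=float)
    for i in range(n):
        on = np.isin(segments, labels[z[i] == 1])
        x[i] = np.where(on, xi, xi_bar)

    # Weights (Eq. 3): pi_i = exp(-d_cos(1, z_i)^2 / (2 nu^2)); the cosine distance
    # only depends on s, the number of inactivated superpixels (see below).
    s = d - z.sum(axis=1)
    d_cos = 1.0 - np.sqrt(1.0 - s / d)
    pi = np.exp(-d_cos ** 2 / (2.0 * nu ** 2))
    return z, x, pi
```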
We see that d_cos(1, z_i) takes values near zero if most of the superpixels are activated, and values near 1 in the opposite scenario, as expected. An important remark is that the weights π_i depend only on the number of inactivated superpixels. Indeed, conditionally to z_i having exactly s elements equal to zero, we have z_i^⊤ 1 = d − s and ‖z_i‖ = √(d − s). Since ‖1‖ = √d, using Eq. (3), we deduce that π_i = ψ(s/d), where we defined

$$\forall t \in [0, 1], \quad \psi(t) := \exp\left(\frac{-\bigl(1 - \sqrt{1-t}\bigr)^2}{2\nu^2}\right). \tag{4}$$

2.4. Surrogate Model

The next stage of LIME is to build a surrogate model. More precisely, LIME builds a linear model with the interpretable features z_i as input and the model predictions y_i := f(x_i) as responses. This linear model, in the default implementation, is obtained by (weighted) ridge regression (Hoerl and Kennard, 1970). Formally, the outputs of LIME for model f and image ξ are given by

$$\hat{\beta}^\lambda_n \in \underset{\beta \in \mathbb{R}^{d+1}}{\arg\min} \; \sum_{i=1}^{n} \pi_i \bigl(y_i - \beta^\top z_i\bigr)^2 + \lambda \|\beta\|^2 \,, \tag{5}$$

where λ > 0 is a regularization parameter. We call the coordinates of β̂^λ_n the interpretable coefficients. By convention, z_i is augmented with a coordinate constant equal to 1, and the 0th coordinate of β̂^λ_n is the intercept of the model. Some feature selection procedure can be used: we do not consider such extensions in our analysis and keep to the default implementation, which is ridge. Another important remark is the following: as in the text and tabular cases, LIME uses the default setting of sklearn for the regularization parameter, that is, λ = 1. Hence the first term in Eq. (5) is roughly of order n and the second term of order d. Since we experiment in the large n regime (n = 1000 is the default) and with images split into about 100 superpixels, we are in a situation where n ≫ d. Therefore, we can consider that λ = 0 in our analysis and still recover meaningful results. We will denote by β̂_n the solution of Eq. (5) with λ = 0, that is, ordinary least-squares.

The final step of LIME for images is to display the superpixels associated with the top positive coefficients of β̂^λ_n (usually five, see Figure 1). Part of what makes the method attractive to the practitioner is the ease with which one can read the results from one run of LIME just by looking at the highlighted part of the image. Note that it is also possible to highlight the superpixels associated with the top negative coefficients in another color, to see which parts of the image have a negative influence on the prediction.

3. Main Results

In this section we present our main results. Namely, the concentration of β̂_n around β^f (Section 3.1) and the expression of β^f as a function of f and other quantities (Section 3.2).

3.1. Concentration of β̂_n

When the number of new samples n is large, we expect the empirical explanations provided by LIME to stabilize. Our first result formalizes this intuition.

Theorem 1 (Concentration of β̂_n). Assume that f is bounded by a constant M > 0 on [0, 1]^D. Let ε > 0 and η ∈ (0, 1). Let d be the number of superpixels. Then, there exists β^f ∈ ℝ^{d+1} such that, for every

$$n \gtrsim \max(M, M^2)\, \epsilon^{-2}\, d^{7}\, \mathrm{e}^{4/\nu^2} \log\frac{8d}{\eta}\,,$$

we have $\mathbb{P}\bigl(\|\hat{\beta}_n - \beta^f\| \ge \epsilon\bigr) \le \eta$.

We refer to the appendix for a complete statement (we omitted numerical constants and the intercept for clarity). Intuitively, Theorem 1 means that when n is large, β̂_n stabilizes around β^f. Thus we can focus on β^f to study LIME. The main limitation of Theorem 1 is the dependency on d and ν: the control that we achieve on ‖β̂_n − β^f‖ is quite poor whenever d is too large or ν is too small. Note also that β̂_n is given by the non-regularized version of LIME.
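For concreteness, the surrogate step of Eq. (5) amounts to a weighted ridge regression, which can be written with scikit-learn as follows; z, x, and pi come from the sampling sketch above, f is the black-box model, and setting the regularization close to zero recovers the ordinary least-squares estimator β̂_n studied in Theorem 1. The function name is ours.

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_surrogate(f, z, x, pi, lam=1.0):
    """Weighted ridge fit of Eq. (5); use lam = 0 for the unregularized beta_hat_n."""
    y = np.array([f(x_i) for x_i in x])       # y_i = f(x_i): one query per example
    reg = Ridge(alpha=lam, fit_intercept=True)
    reg.fit(z, y, sample_weight=pi)
    beta_0, beta = reg.intercept_, reg.coef_  # intercept and one coefficient per superpixel
    top_five = np.argsort(beta)[::-1][:5]     # superpixels usually highlighted
    return beta_0, beta, top_five
```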
Theorem 1 is quite similar to Theorem 1 in Garreau and von Luxburg (2020b) and Theorem 1 in Mardaoui and Garreau (2021), which are essentially the same result for the tabular data and the text data versions of LIME. The rate of convergence is slightly better here, but this seems to be an artifact of the proof and we do not think that one should sample less when dealing with images.

3.2. Expression of β^f

In this section we obtain the explicit expression of β^f. Before doing so, we need to introduce additional notation. From now on, we introduce the random variable z ∈ {0, 1}^d such that z_1, ..., z_n are i.i.d. samples of z; it is the only source of randomness in the sampling and all expectations are taken with respect to it. We denote by π and x the associated weights and examples.

Definition 1 (α coefficients). Define α_0 := E[π] and, for any 1 ≤ p ≤ d, α_p := E[π z_1 ··· z_p].

Intuitively, when ν is large, α_p corresponds to the probability that p given superpixels of ξ are turned on. Since the sampling scheme of LIME for images is completely symmetrical, as well as the definition of the weights, we see that this probability does not depend on the exact set of indices, hence the definition of the α coefficients. We show in the appendix that the expected covariance matrix of problem (5) can be written with the first three α coefficients. Though Definition 1 is identical to Definition 3 in Mardaoui and Garreau (2021), the exact expression of the α coefficients is different in this case since the sampling procedure differs.

Proposition 1 (Computation of the α coefficients). Let d ≥ 2 and 0 ≤ p ≤ d. For any ν > 0, it holds that

$$\alpha_p = \frac{1}{2^{d}} \sum_{s=0}^{d-p} \binom{d-p}{s}\, \psi\!\left(\frac{s}{d}\right),$$

where ψ is defined as in Eq. (4). We prove Proposition 1 in the appendix. From the α coefficients, we then form the normalization constant

$$c_d := (d-1)\,\alpha_0\alpha_2 - d\,\alpha_1^2 + \alpha_0\alpha_1 \,,$$

and the σ coefficients:

Definition 2 (σ coefficients). For any d ≥ 2 and ν > 0, define

$$\sigma_1 := -\alpha_1, \qquad \sigma_2 := \frac{(d-2)\,\alpha_0\alpha_2 - (d-1)\,\alpha_1^2 + \alpha_0\alpha_1}{\alpha_1 - \alpha_2}, \qquad \sigma_3 := \frac{\alpha_1^2 - \alpha_0\alpha_2}{\alpha_1 - \alpha_2}.$$

We show in the appendix that the inverse of the expected covariance matrix associated to problem (5) can be expressed with the help of the σ coefficients and c_d. With these notations in hand, we have:

Proposition 2 (Expression of β^f). Under the assumptions of Theorem 1, we have c_d > 0 and, for any 1 ≤ j ≤ d,

$$\beta^f_j = c_d^{-1}\left(\sigma_1\, \mathbb{E}[\pi f(x)] + \sigma_2\, \mathbb{E}[\pi z_j f(x)] + \sigma_3 \sum_{k \neq j} \mathbb{E}[\pi z_k f(x)]\right).$$

Figure 3. In this figure, we show how the theoretical predictions of Proposition 3 compare to practice. We considered a digit from the MNIST dataset (Le Cun et al., 1998). The function to explain takes value 1 if all pixels marked in blue have value greater than τ = 0.5, and 0 otherwise. In each case, we ran LIME with n = 1000 examples, default regularization λ = 1, and zero replacement. We repeated the experiment five times, which gave the boxplot corresponding to the empirical values of the interpretable coefficients for each superpixel. The red crosses correspond to the predictions given by Proposition 3. We see that when the shape is split among p superpixels, each one receives a coefficient approximately equal to 1/2^{p−1}.

We provide a detailed proof of Proposition 2 as well as the expression of the intercept β^f_0 in the appendix. Let us note that Proposition 2 is quite similar to Eq. (6) in Garreau and von Luxburg (2020b) and Eq. (9) in Mardaoui and Garreau (2021).
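To make Proposition 2 concrete, here is a sketch that estimates β^f for a black-box f without invoking the closed form with the σ coefficients: it builds the expected covariance matrix of problem (5) from α_0, α_1, α_2 (as noted above, these are its only ingredients), estimates the remaining expectations by Monte Carlo, and solves the population normal equations. The α coefficients are obtained by conditioning on the number s of inactivated superpixels, which follows from Definition 1 and the identity π = ψ(s/d). All function and variable names are ours.

```python
import numpy as np
from math import comb

def psi(t, nu=0.25):
    # Eq. (4): the weight as a function of the fraction of inactivated superpixels.
    return np.exp(-(1.0 - np.sqrt(1.0 - t)) ** 2 / (2.0 * nu ** 2))

def alpha(p, d, nu=0.25):
    # alpha_p = E[pi z_1 ... z_p]: condition on the number s of zeros among
    # the d - p remaining Bernoulli(1/2) coordinates.
    return sum(comb(d - p, s) * psi(s / d, nu) for s in range(d - p + 1)) / 2.0 ** d

def limit_explanation(f, xi, xi_bar, segments, nu=0.25, n_mc=10_000, seed=0):
    """Monte Carlo estimate of (beta^f_0, beta^f_1, ..., beta^f_d) for a black-box f."""
    rng = np.random.default_rng(seed)
    labels = np.unique(segments)
    d = len(labels)
    a0, a1, a2 = (alpha(p, d, nu) for p in (0, 1, 2))
    # Expected (weighted) covariance matrix of problem (5), intercept included:
    # E[pi] at (0, 0), E[pi z_j] on the first row/column and the diagonal,
    # E[pi z_j z_k] elsewhere.
    Sigma = np.full((d + 1, d + 1), a2)
    Sigma[0, :] = Sigma[:, 0] = a1
    np.fill_diagonal(Sigma, a1)
    Sigma[0, 0] = a0
    # Monte Carlo estimate of E[pi (1, z) f(x)], sampling x as in Section 2.2.
    gamma = np.zeros(d + 1)
    for _ in range(n_mc):
        z = rng.integers(0, 2, size=d)
        x = np.where(np.isin(segments, labels[z == 1]), xi, xi_bar)
        pi = psi((d - z.sum()) / d, nu)
        gamma += pi * f(x) * np.concatenate(([1.0], z))
    gamma /= n_mc
    return np.linalg.solve(Sigma, gamma)
```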
We will see that many properties of β^f are similar to the tabular and text case. Let us now present some immediate consequences of Proposition 2.

Linearity of explanations. As in the tabular and the text setting, the mapping f ↦ β^f is linear. Thus for any model that can be decomposed as a sum, the explanations provided by LIME are the sum of the explanations of the individual models. This is true up to noise coming from the sampling (quantified by Theorem 1) and a small error due to the regularization, which is not taken into account in Theorem 1.

Large bandwidth. Because of the weights π and their complex dependency on the bandwidth ν, it can be difficult to make sense of Proposition 2 in the general case. It is somewhat easier when ν → +∞. Indeed, we show in the appendix that c_d → 1/4, σ_1 → −1/2, σ_2 → 1, and σ_3 → 0. Moreover, π → 1 almost surely. Therefore, by dominated convergence, the expression of β^f simplifies to

$$\beta^f_j = 2\,\bigl(\mathbb{E}[f(x) \mid z_j = 1] - \mathbb{E}[f(x)]\bigr)\,, \tag{6}$$

for any 1 ≤ j ≤ d. In other words, the explanation provided by LIME is proportional to the difference between the mean value of the model conditioned on superpixel j being activated and the mean value of the model for x sampled as explained previously. It seems that Eq. (6) encompasses a desirable trait of LIME for images: when the bandwidth is large, the interpretable coefficient for superpixel j takes large positive values if, in the vicinity of ξ, the model takes significantly larger values when this superpixel is present in the image. Of course, presence or absence of a given superpixel depends on the replacement scheme. Eq. (6) hints that the explanation for superpixel j could be near zero if ξ is close to ξ̄ on J_j, whereas J_j is actually important for the prediction.

4. Expression of β^f for Simple Models

In this section, we get meaningful insights on the explanations provided by LIME for simple models.

4.1. Shape Detectors

We start with a very simple model: a fixed shape detector for grayscale images. To this end, let S := {u_1, ..., u_q} be a set of q distinct pixel indices, the shape. Let τ ∈ (0, 1) be a positive threshold. We define the associated shape detector

$$\forall x \in [0, 1]^D, \quad f(x) := \prod_{u \in S} \mathbf{1}_{x_u > \tau} \,. \tag{7}$$

Readily, f(x) takes the value 1 if the pixels of the shape S are lit up, and 0 otherwise. It is possible to compute β^f in closed form in this case. In fact, the result does not depend on the exact shape to be detected, but rather on how it intersects the LIME superpixels. Let us define

E := {j ∈ {1, ..., d} s.t. J_j ∩ S ≠ ∅},

the set of superpixels intersecting the shape S. We separate this set in two parts, E_+ := {j ∈ E s.t. ξ̄_j > τ} and E_− := {j ∈ E s.t. ξ̄_j ≤ τ}, where ξ̄_j denotes the (constant) value of the replacement image on J_j. Intuitively, E_+ (resp. E_−) contains superpixels that are brighter than τ on average (resp. darker). We also need to define S_+ := {u ∈ S s.t. ξ_u > τ} and S_− := {u ∈ S s.t. ξ_u ≤ τ}. Namely, S_+ (resp. S_−) contains pixels in the shape that have a value greater (resp. smaller) than the threshold. We now make the following assumption:

$$\forall j \in E_+, \quad J_j \cap S_- = \emptyset \,. \tag{8}$$

Intuitively, Eq. (8) means that there is no superpixel intersecting the shape such that the average value of the superpixel activates our detector without the detector being activated. This could happen, for instance, if the intersection of the shape with the superpixel is very dark but the pixels around it are much brighter. It is a reasonable assumption since superpixels are quite homogeneous in color and shape. Moreover, in the case of grayscale images with zero replacement, Eq. (8) is always satisfied since τ > 0.
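For experiments in the spirit of Figure 3, such a shape detector can be written as a small black-box function and fed to the sampling and surrogate sketches above; the mask S_mask (encoding the shape S) and the threshold value are illustrative choices, and the helper name is ours.

```python
import numpy as np

def make_shape_detector(S_mask, tau=0.5):
    """Return f(x) = prod_{u in S} 1{x_u > tau} for grayscale images x.

    S_mask is a boolean array of the same shape as the images, True on the shape S.
    """
    def f(x):
        return float(np.all(x[S_mask] > tau))
    return f
```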
Nevertheless, we provide a result that does not rely on Eq. (8) in the appendix, which specializes into:

Proposition 3 (Computation of β^f, shape detector). Let f be defined as in Eq. (7). Assume that Eq. (8) holds and let p := |E_−|. Then, if there exists j ∈ E such that J_j ∩ S_− ≠ ∅, β^f = 0. Otherwise, for any j ∈ E_−,

$$\beta^f_j = c_d^{-1}\bigl\{\sigma_1 \alpha_p + \sigma_2 \alpha_p + (p-1)\,\sigma_3 \alpha_p + (d-p)\,\sigma_3 \alpha_{p+1}\bigr\}\,,$$

and for any j ∈ {1, ..., d} \ E_−,

$$\beta^f_j = c_d^{-1}\bigl\{\sigma_1 \alpha_p + \sigma_2 \alpha_{p+1} + p\,\sigma_3 \alpha_p + (d-p-1)\,\sigma_3 \alpha_{p+1}\bigr\}\,.$$

The accuracy of Proposition 3 is demonstrated in Figure 3. Note that we use grayscale images for clarity (it is easier to define brightness in this case), but Proposition 3 could be adapted for RGB images. Note that Proposition 3 is reminiscent of Proposition 3 in Mardaoui and Garreau (2021). This is not a surprise, since both these results study LIME for models with analogous structures. However, the results differ since one has to consider the intersections of the shape with the superpixels in the present case. We now make two remarks, still focusing on grayscale images with zero replacement for simplicity: in that case, E_+ is always empty since τ is positive.

Shape included in a single superpixel. In the simple case where the shape is detected and is totally included in superpixel k, it is straightforward from the definitions of the α coefficients and c_d that β^f_k = 1 and β^f_j = 0 otherwise. This is a good property of LIME for images since superpixel k is the only relevant superpixel in this particular situation (see Figure 3).

Shape included in several superpixels. The situation is slightly more complicated when S intersects several superpixels (that is, p > 1). Reading Proposition 3, we see that when ν is large, the coefficient of an intersecting superpixel is approximately 1/2^{p−1}, and approximately 0 otherwise (see Lemma 1 in the appendix for a more precise statement). That is, the importance given by LIME is evenly divided between the superpixels that contain part of the shape. Again, this makes sense since they are the only relevant superpixels in this case (see Figure 3). For instance, if only two superpixels are involved, then the weight is roughly halved. We can note that this halving occurs even if a very small portion of the shape falls into one of the two superpixels.

4.2. Linear Model

We now turn to linear models. That is, f is given by

$$\forall x \in [0, 1]^D, \quad f(x) = \sum_{u=1}^{D} \lambda_u x_u \,, \tag{9}$$

with λ_1, ..., λ_D ∈ ℝ arbitrary coefficients. Note that we can adapt this definition if there is more than one color channel, by considering 3 × D coefficients. Let us compute β^f when f is linear. We show in the appendix the following:

Proposition 4 (Computation of β^f, linear case). Assume that f is defined as in Eq. (9). Then, for any 1 ≤ j ≤ d,

$$\beta^f_j = \sum_{u \in J_j} \lambda_u \,\bigl(\xi_u - \bar{\xi}_u\bigr)\,.$$

When ξ̄ = 0, the coefficients take a very simple expression. Namely, the interpretable coefficient associated with superpixel J_j is the sum of the coefficients of f multiplied by the pixel values on the superpixel. If another replacement is chosen, then the values of the pixels relative to the replacement are taken into account in this product. This seems to make a lot of sense: let us say that the coefficients of f take large positive values on superpixel j. Then LIME will give a high interpretable coefficient to this superpixel, unless the pixel values are small (or very close to the replacement color if another replacement is chosen). This is particularly visible in Figure 4: the λ coefficients take high values in the lower right corner (left panel).
But since the 5th superpixel contains only zero-valued pixels (middle panel), the interpretable coefficient given by LIME for this superpixel vanishes (right panel).

Figure 4. In this figure, we show how the theoretical predictions of Proposition 4 fare in practice. We consider a digit from the MNIST dataset. The function to explain is linear, with λ_{i,j} proportional to i + j, with added white noise (heatmap in the left panel). We ran LIME 5 times with zero replacement and default parameters; the superpixels used are displayed in the middle panel. We see that our predictions match perfectly. As predicted, J_5 receives a coefficient equal to zero whereas f has high coefficients in this area, since the pixel values are equal to zero in this superpixel.

It is also interesting to see that there is no bandwidth dependency in Proposition 4. In a sense, this is to be expected since LIME is doing averages that are scale invariant if the function to explain is linear. Proposition 4 is similar in spirit to Eq. (12) in Mardaoui and Garreau (2021), where the interpretable coefficients provided by LIME for a linear model were also found to be (approximately) equal to the product of the coefficients of the linear model and the feature value.

5. Approximated Explanations

If f is regular enough, it can be written as in Eq. (9) in the vicinity of ξ. If this is the case, an interesting question in light of the results of the previous section is the following: are the explanations given by Proposition 4 close to the LIME explanations? To put it plainly, can we approximately recover the LIME coefficients by summing the partial derivatives of f at ξ over the superpixels? We will see that the answer to this question is yes, with one caveat: simply taking the gradient does not always yield a satisfying linear approximation for complicated functions. We discuss linear approximations of an arbitrary function in Section 5.1 before investigating empirically in Section 5.2.

Figure 5. Predictions given by the Inception V3 network (Szegedy et al., 2016) on a linear path between ξ̄ (α = 0) and ξ (α = 1). Left panel: we see how the predicted values can vary along the path and why only considering the gradient at ξ or ξ̄ may not be a good idea to build a linear approximation. Right panel: we see how the gradient can saturate when the network is very confident in the prediction.

5.1. Linear Approximation

The most natural linear approximation of a function is given by its Taylor expansion truncated at order one. Since we want to approximate f(x), where x is somewhere between ξ̄ and ξ, we could write, for instance, that f(x) ≈ f(ξ) + ∇f(ξ)^⊤ (x − ξ). There are two main objections to doing so in the present case. First, we do not expect f to be linear between ξ̄ and ξ, and taking just one gradient would lead to a poor approximation. We illustrate this behavior in Figure 5 by computing the predictions across a straight line between ξ̄ and ξ. Second, it is a well-known phenomenon in modern architectures that the gradient of the model with respect to the input can saturate when the network is confident in the prediction, for certain activation functions (see, for instance, Krizhevsky et al. (2012)). Since from our point of view f is a black-box model, we do not have information on the activation functions (in fact, we do not even assume that f is a neural network).
Therefore gradients taken at ξ or ξ̄ can be zero, giving us essentially no information on the behavior of f (see Figure 5). For both these reasons, we build a linear approximation of f between ξ̄ and ξ using the averaged gradients on a linear path between ξ̄ and ξ. Formally, we define

$$g_u := \int_0^1 \frac{\partial f}{\partial x_u}\bigl((1-\alpha)\,\bar{\xi} + \alpha\,\xi\bigr)\,\mathrm{d}\alpha \tag{10}$$

the averaged gradient at pixel u. We approximate this integral by a Riemann sum, that is,

$$g^{\mathrm{approx}}_u := \frac{1}{m} \sum_{k=1}^{m} \frac{\partial f}{\partial x_u}\Bigl(\bigl(1-\tfrac{k}{m}\bigr)\,\bar{\xi} + \tfrac{k}{m}\,\xi\Bigr)\,. \tag{11}$$

Subsequently, we approximate f(x) by f(ξ̄) + (x − ξ̄)^⊤ g^approx. Applying Proposition 4 to this approximation, we obtain the approximate explanations

$$\forall 1 \le j \le d, \quad \beta^{\mathrm{approx}}_j = \sum_{u \in J_j} \bigl(\xi_u - \bar{\xi}_u\bigr)\, g^{\mathrm{approx}}_u \,. \tag{12}$$

Inside the sum, we recognize the definition of integrated gradients between ξ̄ and ξ (Sundararajan et al., 2017), another interpretability method. Eq. (12) therefore corresponds to the sum of integrated gradients over superpixel j.

5.2. Experiments

In this section, we show experimentally that LIME explanations are similar to the approximated explanations derived in the previous section. The code of all experiments is available at https://github.com/dgarreau/image_lime_theory.

Setting. We first considered images from the CIFAR-10 dataset (Krizhevsky et al., 2009), that is, 32 × 32 RGB images belonging to ten categories. For a subset of 1000 images of the test set, we computed the explanations given by LIME with default settings, with the exception of the kernel size used by the quickshift algorithm, which we decreased to 1 to get wider superpixels. We compared these explanations with the approximated explanations of Section 5 for four different models. First, we started with a very simple one-hidden-layer neural network, trained to 35% accuracy. We then moved to VGG-like architectures (Simonyan and Zisserman, 2014), progressively increasing the number of blocks in the model (from 1 to 3). For each model, we considered the function corresponding to the most likely class for ξ. We then collected the indices of the superpixels associated with the top five and top ten positive average coefficients. For the sum of integrated gradients, we considered m = 20 steps in Eq. (11), as in Sundararajan et al. (2017). The results are presented in Table 1.

We then moved to more realistic images coming from the test set of the 2017 large scale visual recognition challenge (LSVRC, Russakovsky et al., 2015). We used three pretrained models from the Keras framework: MobileNetV2 (Sandler et al., 2018), DenseNet121 (Huang et al., 2017), and Inception V3 (Szegedy et al., 2016) with default input shape (299, 299, 3). Again, we compared the LIME default explanations to the approximated explanations for 1000 of these images. Qualitative results are presented in Figure 6 for the Inception V3 network, while Table 2 contains the quantitative results.

Figure 6. Comparing the explanations given by LIME vs. approximate explanations obtained by summing the integrated gradients over the LIME superpixels (panels: segmentation, integrated gradient, and linear approximation, for a "damselfly" image at 88% confidence and a "quail" image at 22% confidence). Here we explain the top predicted class for images of the ILSVRC2017 test data with the Inception V3 network. In both cases, we showcase the top five positive coefficients. Qualitatively, the explanations obtained are quite similar, identifying close superpixels when they are not matching exactly.
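The approximated explanations used in these experiments can be computed with a short sketch like the following, which implements the Riemann sum of Eq. (11) and the per-superpixel sum of Eq. (12); we assume a helper grad_f(x) returning the gradient of f at x (for instance via automatic differentiation in the framework hosting the model), and this helper, like the other names, is ours. For simplicity the image is treated as a single-channel array.

```python
import numpy as np

def approx_explanation(grad_f, xi, xi_bar, segments, m=20):
    """Sum of integrated gradients over each superpixel (Eq. 12), with m steps (Eq. 11)."""
    labels = np.unique(segments)
    # Averaged gradient over the linear path from xi_bar (alpha = 0) to xi (alpha = 1).
    g = np.zeros_like(xi, dtype=float)
    for k in range(1, m + 1):
        a = k / m
        g += grad_f((1.0 - a) * xi_bar + a * xi)
    g /= m
    # Integrated gradients (xi_u - xi_bar_u) * g_u, summed over each superpixel J_j.
    ig = (xi - xi_bar) * g
    return np.array([ig[segments == lab].sum() for lab in labels])
```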
Metric. We compared the lists of the top 5 and top 10 positive coefficients via the Jaccard index (also known as the Jaccard similarity coefficient), that is, the size of the intersection divided by the size of the union of the two lists. Hence a Jaccard index of 1 means a perfect match between the identified superpixels, whereas a Jaccard index of 0 means complete disagreement. Note that the Jaccard similarity between a fixed set of size 5 (resp. 10) and a random subset of {1, ..., 60} is equal to 0.05 (resp. 0.06). Here, 60 is the observed average number of superpixels produced by the quickshift algorithm for the images at hand.

Table 1. Comparison between LIME and approximated explanations for CIFAR-10. For each model, we report J_X, the Jaccard similarity between the top X positive coefficients.

| Model | # param. | # layers | acc. | J5 | J10 |
|---|---|---|---|---|---|
| 1-layer | 330K | 1 | 0.35 | 0.99 | 0.99 |
| VGG1 | 1M | 4 | 0.67 | 0.81 | 0.85 |
| VGG2 | 600K | 8 | 0.69 | 0.75 | 0.81 |
| VGG3 | 550K | 12 | 0.70 | 0.71 | 0.76 |

Results. Without being a perfect match, we observe a substantial overlap between the LIME explanations and the approximated explanations for all the models and datasets that we tried. This is particularly striking for simple models. More precisely, the Jaccard similarities observed are several times higher than what a random guess would produce. This is surprising since we are considering a linear approximation of highly non-linear functions. As a matter of fact, the exact values of the interpretable coefficients are quite different. Nevertheless, they are sufficiently close so that the sets of superpixels identified by both methods are consistently overlapping. We notice that this link seems to weaken when the models become too complex, although a third of the identified superpixels are still common for Inception V3. However, visual inspection reveals that the superpixels identified by both methods remain close to each other even when they are distinct (see Figure 6 and additional qualitative results in the appendix).

We want to emphasize that, if the model is not smooth, the link between approximate explanations in the sense of Eq. (12) and LIME does not exist anymore. For instance, a random forest model based on CART trees has gradient equal to zero everywhere. Therefore, the integrated gradient is also zero, and β^approx_j = 0 for any j. We also want to point out that we did not evaluate β^approx as an interpretability method. In particular, it could be the case that the associated explanations are of a lesser quality than LIME's.

Table 2. Comparison between LIME and approximated explanations for LSVRC images.

| Model | # param. | # layers | acc. | J5 | J10 |
|---|---|---|---|---|---|
| MobileNetV2 | 3.5M | 88 | 0.90 | 0.43 | 0.54 |
| DenseNet121 | 8.0M | 121 | 0.92 | 0.42 | 0.44 |
| Inception V3 | 23.9M | 159 | 0.94 | 0.35 | 0.36 |

Computation time. Setting aside the segmentation step, each run of LIME requires n = 1000 queries of the model, whereas the averaged gradient estimation requires m = 20 gradient evaluations. In the favorable scenario where getting a gradient is as costly as a model query, computing the approximated explanations is much faster than LIME.

6. Conclusion

In this paper, we proposed the first theoretical analysis of LIME for images. We showed that the explanations provided make sense for elementary shape detectors and linear models. As a consequence of this analysis, we discovered that for smooth models the interpretable coefficients of LIME for images resemble the sum of integrated gradients over the LIME superpixels. As future work, we plan on tackling more complex models.
A starting point is the study of polynomial functions: obtaining a statement analogous to Proposition 4 would open the door to a more precise expression of the limit explanation depending on the higher-order derivatives of f.

Acknowledgments

This work was partly funded by the UCA DEP grant. The authors want to thank Ulrike von Luxburg for her insights in the writing phase of the paper.

References

A. Adadi and M. Berrada. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access, 6:52138-52160, 2018.

D. Garreau and U. von Luxburg. Explaining the Explainer: A First Theoretical Analysis of LIME. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 2020a.

D. Garreau and U. von Luxburg. Looking Deeper into Tabular LIME. arXiv preprint arXiv:2008.11092, 2020b.

R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, and D. Pedreschi. A survey of methods for explaining black box models. ACM Computing Surveys (CSUR), 51(5):1-42, 2018.

A. E. Hoerl and R. W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55-67, 1970.

G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700-4708, 2017.

A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features from tiny images. 2009.

A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25:1097-1105, 2012.

Y. Le Cun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.

P. Linardatos, V. Papastefanopoulos, and S. Kotsiantis. Explainable AI: A Review of Machine Learning Interpretability Methods. Entropy, 23(1):18, 2021.

D. Mardaoui and D. Garreau. An Analysis of LIME for Text Data. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS), 2021.

N. Narodytska, A. Shrotri, K. S. Meel, A. Ignatiev, and J. Marques-Silva. Assessing heuristic machine learning explanations with model counting. In International Conference on Theory and Applications of Satisfiability Testing, pages 267-278, 2019.

M. T. Ribeiro, S. Singh, and C. Guestrin. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135-1144, 2016.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211-252, 2015.

M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4510-4520, 2018.

J. Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85-117, 2015.

K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

D. Slack, S. Hilgard, E. Jia, S. Singh, and H. Lakkaraju. Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 2020.
M. Sundararajan, A. Taly, and Q. Yan. Axiomatic Attribution for Deep Networks. In Proceedings of the 34th International Conference on Machine Learning (ICML), 2017.

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818-2826, 2016.

A. Vedaldi and S. Soatto. Quick shift and kernel methods for mode seeking. In European Conference on Computer Vision, pages 705-718, 2008.