# Anchors: High-Precision Model-Agnostic Explanations

Marco Tulio Ribeiro (University of Washington) marcotcr@cs.washington.edu
Sameer Singh (University of California, Irvine) sameer@uci.edu
Carlos Guestrin (University of Washington) guestrin@cs.washington.edu

We introduce a novel model-agnostic system that explains the behavior of complex models with high-precision rules called anchors, representing local, "sufficient" conditions for predictions. We propose an algorithm to efficiently compute these explanations for any black-box model with high-probability guarantees. We demonstrate the flexibility of anchors by explaining a myriad of different models for different domains and tasks. In a user study, we show that anchors enable users to predict how a model would behave on unseen instances with less effort and higher precision, as compared to existing linear explanations or no explanations.

## Introduction

Sophisticated machine learning models such as deep neural networks have been shown to be highly accurate for many applications, even though their complexity virtually makes them black boxes. The need for users to understand the behavior of these models has driven a resurgence of interpretable machine learning in recent years, ranging from the design of novel globally interpretable models (Lakkaraju, Bach, and Leskovec 2016; Ustun and Rudin 2015; Wang and Rudin 2015) to local explanations (for individual predictions) that can be computed for any classifier (Baehrens et al. 2010; Ribeiro, Singh, and Guestrin 2016b; Strumbelj and Kononenko 2010).

A question at the core of interpretability is whether humans understand a model well enough to make accurate predictions about its behavior on unseen instances. For the instances where humans can confidently predict the behavior of a model, let (human) precision be the fraction in which they are correct (note that this is human precision, not model precision). High human precision is paramount for real interpretability: one can hardly say they understand a model if they consistently think they know what it will do, but are often mistaken.

Most local approaches provide explanations that describe the local behavior of the model using a linearly weighted combination of the input features (Baehrens et al. 2010; Ribeiro, Singh, and Guestrin 2016b; Strumbelj and Kononenko 2010). Linear functions can capture the relative importance of features in an easy-to-understand manner. However, since these linear explanations are in some way local, it is not clear whether they apply to an unseen instance; in other words, their coverage (the region where the explanation applies) is unclear. Unclear coverage can lead to low human precision, as users may think an insight from an explanation applies to unseen instances even when it does not. When combined with the arithmetic involved in computing the contribution of the features in linear explanations, the human effort required can be quite high. Take for example the LIME (Ribeiro, Singh, and Guestrin 2016b) explanations for two sentiment predictions made by an LSTM in Figure 1.

Figure 1: Sentiment predictions, LSTM. (a) Instances; (b) LIME explanations; (c) Anchor explanations.
Although both explanations are computed to be locally accurate, if one took the explanation on the left and tried to apply it to the sentence on the right, one might be tempted to think that the word "not" would have a positive influence, which it does not. While such explanations provide insight into the model, their coverage is not clear, e.g. when does "not" have a positive influence on sentiment?

In this paper, we introduce novel model-agnostic explanations based on if-then rules, which we call anchors. An anchor explanation is a rule that sufficiently "anchors" the prediction locally, such that changes to the rest of the feature values of the instance do not matter. In other words, for instances on which the anchor holds, the prediction is (almost) always the same. For example, the anchors in Figure 1c state that the presence of the words "not bad" virtually guarantees a prediction of positive sentiment (and "not good" of negative sentiment). Anchors are intuitive, easy to comprehend, and have extremely clear coverage: they only apply when all the conditions in the rule are met, and if they apply, the precision is high (by design).

We demonstrate the usefulness of anchors by applying them to a variety of machine learning tasks (classification, structured prediction, text generation) on a diverse set of domains (tabular, text, and images). We also run a user study, where we observe that anchors enable users to predict how a model would behave on unseen instances with much less effort and higher precision than existing techniques for model-agnostic explanation, or no explanations.

## Anchors as High-Precision Explanations

Given a black-box model f : X → Y and an instance x ∈ X, the goal of local model-agnostic interpretability (Ribeiro, Singh, and Guestrin 2016a; 2016b; Strumbelj and Kononenko 2010) is to explain the behavior of f(x) to a user, where f(x) is the individual prediction for instance x. The assumption is that while the model is globally too complex to be explained succinctly, "zooming in" on individual predictions makes the explanation task feasible. Most model-agnostic methods work by perturbing the instance x according to some perturbation distribution Dx (for simplicity, D from now on). In Ribeiro, Singh, and Guestrin (2016b), we emphasize that the perturbations D (and explanations) must use an interpretable representation (i.e. one that makes sense to humans), even if the model uses an alternative representation of the input.

Let A be a rule (set of predicates) acting on such an interpretable representation, such that A(x) returns 1 if all its feature predicates are true for instance x. For example, in Figure 2a (top), x = "This movie is not bad.", f(x) = Positive, and A(x) = 1 where A = {"not", "bad"}. Let D(·|A) denote the conditional distribution when the rule A applies (e.g. similar texts where "not" and "bad" are present, Figure 2a bottom). A is an anchor if A(x) = 1 and A is a sufficient condition for f(x) with high probability; in our running example, a sample z from D(z|A) is likely predicted as Positive (i.e. f(x) = f(z)). Formally, A is an anchor if

$$E_{D(z|A)}\left[\mathbb{1}_{f(x)=f(z)}\right] \geq \tau, \qquad A(x) = 1. \tag{1}$$

Figure 2: Concrete example of D in (a) and intuition in (b). (a) D and D(·|A); (b) two toy visualizations.
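To make Eq. (1) concrete, the following is a minimal sketch (not the implementation used in this paper) of estimating the precision of a rule by Monte Carlo sampling. The classifier wrapper `f` and the sampler `sample_conditional` for D(z|A) are assumptions introduced for illustration only.

```python
import numpy as np

def estimate_precision(f, x, sample_conditional, n_samples=1000):
    """Monte Carlo estimate of E_{D(z|A)}[1_{f(x)=f(z)}].

    f: black-box predict function mapping an array of instances to labels.
    x: the instance being explained, in the model's input representation.
    sample_conditional: assumed helper drawing n perturbations from D(z|A),
        i.e. perturbed versions of x on which the rule A holds.
    """
    label = f(np.array([x]))[0]            # prediction being explained
    z = sample_conditional(n_samples)      # samples from D(z|A)
    return float(np.mean(f(z) == label))   # fraction with unchanged prediction

def satisfies_eq1(f, x, sample_conditional, tau=0.95, n_samples=1000):
    # Point estimate only: the algorithm described later replaces this fixed
    # sample size with adaptive sampling and a (1 - delta) probabilistic bound.
    return estimate_precision(f, x, sample_conditional, n_samples) >= tau
```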
Figure 2b shows two zoomed-in regions of a complex model, with particular instances (+ and -) being explained. LIME (Ribeiro, Singh, and Guestrin 2016b) explanations work by learning the lines that best approximate the model under D, with some local weighting. The resulting explanations give no indication of how faithful they are (the explanation on the right is a much better local approximation of the black-box model than the one on the left), or what their "local region" is. In contrast, even though they use the same D, anchors are by construction faithful, adapting their coverage to the model's behavior (the anchor on the right of Figure 2b is broader) and making their boundaries clear.

Leaving the discussion of how to compute anchors for later, we now demonstrate their usefulness and flexibility via concrete examples in a variety of domains and models.

**Text Classification:** We have already alluded to Figure 1, where we trained an LSTM model with paraphrastic sentence embeddings (Wieting et al. 2015) to predict the sentiment of reviews. In this case, the features used by the model are uninterpretable. The interpretable representation we use is the presence of individual tokens (words) in the instance. The perturbation distribution D replaces "absent" tokens by random words with the same POS tag, with probability proportional to their similarity in an embedding space (Pennington, Socher, and Manning 2014), so the generated sentences are coherent and of the same length (a code sketch of this distribution appears after Table 2). We show samples from D for a sentence in Figure 2a. The anchor A = {"not", "bad"} is easy to apply: if the words "not" and "bad" are in the sentence, the model will predict Positive with probability at least τ (set to 0.95 from here onwards). If either word (or both) is not present, we know we do not have sufficient information to know what the model will do. Examples from D(z|A) are shown in Figure 2a (bottom).

**Structured prediction:** When the output of the algorithm is a structure, the anchor approach can be used to explain any function of the output. Anchors are particularly suited for structured prediction models: while the global behavior is too complex to be captured by simple interpretable models, the local behavior can usually be represented using short rules. In Table 1, we explain the predictions of a state-of-the-art part-of-speech tagger for the word "play" in different contexts, where we include the tags for neighboring words as part of the interpretable representation. The anchors demonstrate that the model is picking up on reasonable patterns of the English language, e.g. if "play" is preceded by a determiner, it is likely used as a noun.

Table 1: Anchors for the part-of-speech tag of the word "play".

| Instance | If | Predict |
|---|---|---|
| I want to play(V) ball. | previous word is PARTICLE | play is VERB |
| I went to a play(N) yesterday. | previous word is DETERMINER | play is NOUN |
| I play(V) ball on Mondays. | previous word is PRONOUN | play is VERB |

In Table 2, we compute anchors for a multi-layer RNN encoder / attention-based decoder translation system (Vinyals et al. 2015) trained on English-Portuguese parallel corpora. In this case, anchors explain the presence of a word (or certain words) in the translation. The anchors (in bold) are computed with respect to the presence of the words in pink in the Portuguese text.

Table 2: Anchors (in bold) of a machine translation system for the Portuguese word for "This" (in pink in the original; shown here in italics).

| English | Portuguese |
|---|---|
| **This is** the **question** we must address | *Esta* é a questão que temos que enfrentar |
| **This is** the **problem** we must address | *Este* é o problema que temos que enfrentar |
| **This is** what we must address | É *isso* que temos de enfrentar |
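As a concrete illustration of the text perturbation distribution described above, the sketch below keeps the tokens fixed by A untouched and resamples the remaining positions. The helper `same_pos_neighbors` is hypothetical (e.g. embedding-space neighbours filtered by POS tag) and not part of any specific library.

```python
import random

def perturb_sentence(tokens, anchored_positions, same_pos_neighbors,
                     n_samples=10, p_change=0.5):
    """Sketch of D(z|A) for text: coherent, same-length variants of a sentence
    in which the tokens fixed by the rule A are never modified.

    same_pos_neighbors(token): assumed helper returning plausible replacement
    words with the same POS tag, e.g. nearest neighbours in an embedding space.
    """
    samples = []
    for _ in range(n_samples):
        new_tokens = list(tokens)
        for i, tok in enumerate(tokens):
            if i in anchored_positions:      # predicates in A are never perturbed
                continue
            if random.random() < p_change:
                candidates = same_pos_neighbors(tok)
                if candidates:
                    new_tokens[i] = random.choice(candidates)
        samples.append(" ".join(new_tokens))
    return samples
```

In the unconditional distribution D, no positions are anchored; conditioning on A simply amounts to never perturbing the tokens that A fixes.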
The first row in Table 2 means that when the words "This", "is", and "question" appear in English, the translation will include the word "Esta". In Portuguese, the translation of the word "this" depends on the gender of the word it refers to ("esta" for feminine, "este" for masculine), or should be "isso" if its referent is not in the sentence. The anchors show that the model is capturing this behavior, as it always includes "this is" and the word that "this" refers to ("question" is feminine, "problem" is masculine).

**Tabular Classification:** Classification of data in tabular form (categorical and/or continuous features) is a popular application of machine learning. We use a validation dataset to define D, and sample from D(z|A) by fixing the predicates in A and sampling the rest of the row as a whole (see the code sketch below). We show anchors for a few predictions of 400 gradient boosted trees trained on balanced versions of three datasets in Table 3. The anchors provide valuable insight into these models. Marital status appears in many different anchors for predicting whether a person makes > $50K annually (adult dataset). The anchors for predicting recidivism for individuals released from prison (Schmidt and Witte 1988) (rcdv dataset) show that this model is unfair if used for bail decisions, as race and gender feature prominently. For predicting whether a loan on the Lending Club website will turn out bad, i.e. late payment or default (lending dataset), the FICO score is sufficient in the extremes, but the loan amount is taken into consideration otherwise. We note that these are not exhaustive: the models are complex, and these anchors explain their behavior on part of the input space, but not all of it. Anchors for hard predictions (in particular boundary cases) may be longer.

Table 3: Generated anchors for the tabular datasets.

| Dataset | If | Predict |
|---|---|---|
| adult | No capital gain or loss, never married | ≤ 50K |
| adult | Country is US, married, work hours > 45 | > 50K |
| rcdv | No priors, no prison violations, and crime not against property | Not rearrested |
| rcdv | Male, black, 1 to 5 priors, not married, and crime not against property | Re-arrested |
| lending | FICO score ≤ 649 | Bad Loan |
| lending | 649 ≤ FICO score ≤ 699 and $5,400 ≤ loan amount ≤ $10,000 | Good Loan |

**Image Classification:** When explaining the label prediction for an image, we follow Ribeiro, Singh, and Guestrin (2016b), segmenting the image into superpixels (Vedaldi and Soatto 2008) and using the presence or absence of these superpixels as the interpretable representation. In contrast to Ribeiro, Singh, and Guestrin (2016b), instead of hiding superpixels, we define D(z|A) by fixing the superpixels in A to the original image and superimposing another image over the rest of the superpixels. We explain a prediction of the Inception V3 neural network (Szegedy et al. 2015) in Figure 3b. Even though D is quite unrealistic here, the anchor demonstrates that the model focuses on various parts of the dog to determine its breed. An inspection of the (admittedly bizarre) images from D(z|A) in Figure 3c, for which the model predicts "beagle" with high confidence, illustrates that attributes humans would consider essential for a beagle prediction (legs, not being underwater, not being in the sky, not having a human body) are not quite as essential to the neural network.
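Returning to the tabular setting above, the following is a minimal sketch, under simplifying assumptions (equality predicates only), of sampling from D(z|A) by drawing whole rows from a validation set and then fixing the features anchored by A.

```python
import numpy as np

def sample_tabular_conditional(validation_X, anchor, n_samples=100, seed=None):
    """Sketch of D(z|A) for tabular data.

    validation_X: 2-D array of validation rows used to define D.
    anchor: dict mapping feature index -> value required by the rule A
        (a simplification; the anchors in Table 3 also use interval predicates
        such as "work hours > 45", which would need a slightly richer check).
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(len(validation_X), size=n_samples)
    rows = validation_X[idx].copy()        # sample the rest of the row as a whole
    for feature, value in anchor.items():
        rows[:, feature] = value           # fix the predicates in A
    return rows
```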
**Visual Question Answering (VQA):** As a final example, we present anchors for the VQA task: answering a question asked of a reference image. Here, we are interested in identifying which part of the question led to the predicted answer, and thus use the same representation and distribution as in Figure 2b. We explain predictions from the Visual7W open-ended VQA system (Zhu et al. 2016) using Figure 3a as the reference image. In Figure 3d we show a question/prediction pair and its anchor (in bold), as well as samples from D(z|A) and their predictions. The short anchor ("What") reveals that, for this image, the model's answer to many questions will be "dog", contrary to our expectations. In Figure 3e, we show other question/answer pairs and their anchors, providing examples where the behavior of the classifier is aligned with our intuitions (first three) and where it is not (last question).

Figure 3: Anchor explanations for image classification and Visual Question Answering (VQA). (a) Original image; (b) anchor for "beagle"; (c) images where Inception predicts P(beagle) > 90%; (d) VQA anchor ("What", in bold in the original) and samples from D(z|A): "What animal is featured in this picture?" → dog; "What floor is featured in this picture?" → dog; "What toenail is paired in this flowchart?" → dog; "What animal is shown on this depiction?" → dog; (e) more example anchors (in bold in the original): "Where is the dog?" → on the floor; "What color is the wall?" → white; "When was this picture taken?" → during the day; "Why is he lifting his paw?" → to play.

## Related Work

Even in the few cases where having some understanding of a machine learning model's behavior is not a requirement, it is certainly an advantage. Relying only on validation accuracy has many well-studied problems, as practitioners consistently overestimate their model's accuracy (Patel et al. 2008), propagate feedback loops (Sculley et al. 2015), or fail to notice data leaks (Kaufman, Rosset, and Perlich 2011). Compared to other interpretable options, rules fare well; users prefer, trust, and understand rules better than alternatives (Lim, Dey, and Avrahami 2009; Stumpf et al. 2007), in particular rules similar to anchors. Short, disjoint rules are easier to interpret than hierarchies like decision lists or trees (Lakkaraju, Bach, and Leskovec 2016).

A number of approaches construct globally interpretable models, many based on rules (Lakkaraju, Bach, and Leskovec 2016; Letham et al. 2015; Wang and Rudin 2015; Wang et al. 2015). With such models, the user should be able to guess the model's behavior on any example (i.e. perfect coverage). However, these models are not appropriate for many domains: almost no interpretable rule-based system is suitable for text or image applications due to the sheer size of the feature space, or they are simply not accurate enough. Interpretability, in these cases, comes at the cost of flexibility, accuracy, or efficiency (Ribeiro, Singh, and Guestrin 2016a). An alternative is learning a simple (interpretable) model to imitate the black-box model globally (e.g. a decision tree (Craven and Shavlik 1996) or a set of rules (Sanchez et al. 2015)), but this may yield low human precision. Simple models are not able to fully capture the behavior of the complex ones, and thus lead users to wrong conclusions, especially since it is not clear when the simple model is faithful.

To avoid this, local model-agnostic explanations explain individual predictions (instead of the whole model at once). These methods provide a trade-off: each explanation is easy to understand even for complex models and tasks, but only captures the behavior of the model on a local region of the input space. The anchor approach falls within this category (and thus can explain complex models like translation and VQA), together with many forms of linear explanations (Baehrens et al. 2010; Ribeiro, Singh, and Guestrin 2016b; Strumbelj and Kononenko 2010).
As illustrated pictorially in Figure 2b and concretely in Figure 1b, even the local behavior of a model may be extremely non-linear, leading to poor linear approximations and to users being potentially misled as to how the model will behave (e.g. thinking that "not" will usually be positive after seeing Figure 1b, left). Linear explanations are also harder to parse and apply than simple rules, as they involve mental calculations. Finally, even if the black-box model is approximately linear locally, humans may not be able to compute distance functions "correctly", and may think that a local explanation applies when it does not (the unclear coverage problem). Anchors, on the other hand, make their coverage very clear: the user knows exactly when the explanation for an instance generalizes to other instances. By construction, anchors are not only faithful to the original model, but communicate its behavior to the user in a way that virtually guarantees correct understanding, and thus high precision. Anchors are also able to capture non-linear behavior, even locally. In sum, the anchor approach combines the benefits of local model-agnostic explanations with the interpretability of rules, constructed in a way that best supports human understanding.

## Efficiently Computing Anchors

We revisit the problem definition from Eq. (1): given a black-box classifier f, instance x, distribution D, and the desired level of precision τ, an anchor A is a set of feature predicates on x that achieves prec(A) ≥ τ, where

$$\mathrm{prec}(A) = E_{D(z|A)}\left[\mathbb{1}_{f(x)=f(z)}\right]. \tag{2}$$

For an arbitrary D and black-box model f, it is intractable to compute this precision directly. Instead, we introduce a probabilistic definition: anchors satisfy the precision constraint with high probability,

$$P(\mathrm{prec}(A) \geq \tau) \geq 1 - \delta. \tag{3}$$

If multiple anchors meet this criterion, those that describe the behavior of a larger part of the input space are preferred, i.e. ones with the largest coverage. Formally, we define the coverage of an anchor as the probability that it applies to samples from D, i.e. cov(A) = E_{D(z)}[A(z)]. We thus define the search for an anchor as the following combinatorial optimization problem:

$$\max_{A \;\text{s.t.}\; P(\mathrm{prec}(A) \geq \tau) \geq 1 - \delta} \mathrm{cov}(A). \tag{4}$$

The number of all possible anchors is exponential, and it is intractable to solve this problem exactly. While the search for anchors is similar in spirit to Probabilistic Inductive Logic Programming (ILP) (De Raedt and Kersting 2008) and other rule-finding methods, one crucial difference is that we do not assume a dataset a priori; instead, we have perturbation distributions and a black-box model, which we can call in order to estimate precision and coverage bounds under D. While we could in theory generate a very large dataset and then use methods like ILP to find anchors, the number of perturbed samples and predictions from the black-box model would be prohibitive, especially in high-dimensional sparse domains such as text. In order to efficiently explore the model's behavior in the perturbation space, we turn to a multi-armed bandit formulation.
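Both quantities in Eq. (4) can be estimated by sampling, just like the precision in Eq. (1). The sketch below assumes a sampler for the unconditional distribution D and a callable that evaluates the rule A on a perturbed sample; both are illustrative stand-ins, not the code used in this paper.

```python
import numpy as np

def estimate_coverage(rule_applies, sample_unconditional, n_samples=1000):
    """Monte Carlo estimate of cov(A) = E_{D(z)}[A(z)].

    rule_applies(z): returns True if all predicates of the rule A hold on z.
    sample_unconditional(n): assumed sampler drawing n samples from D.
    """
    zs = sample_unconditional(n_samples)
    return float(np.mean([rule_applies(z) for z in zs]))
```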
### Bottom-up Construction of Anchors

We first introduce a bottom-up construction of anchors, which we later extend to search over a space of potential anchors. Here, we incrementally construct the anchor A, initialized with an empty rule, i.e. one that applies to every instance. In each iteration, we generate a number of candidate rules that extend A by one additional feature predicate a_i, i.e. the set of candidate rules in each iteration is A = {A ∧ a_i, A ∧ a_{i+1}, A ∧ a_{i+2}, ...}. We identify the candidate rule with the highest estimated precision (as described next), replace A with the selected candidate, and repeat. If the current candidate rule meets the anchor definition in Eq. (3), we have identified our desired anchor and terminate. Although this approach does not directly optimize the coverage, and instead tries to find the shortest anchor, we note that short anchors are likely to have higher coverage and require less effort from users to understand.

In order to select the best candidate rule in each iteration, we want to estimate the precision of the candidates efficiently. Since we cannot compute the true precision, we rely on samples from D(·|A) to estimate the precision of A; however, a fixed number of samples may be too many or too few for an accurate estimation. Instead, we are interested in a minimal number of calls to f (fewest samples from D) in order to determine which candidate rule has the highest true precision. This problem can be formulated as an instance of the pure-exploration multi-armed bandit problem (Kaufmann and Kalyanakrishnan 2013): each candidate A is an arm, the true precision of A on D(·|A) is the latent reward, and each pull of the arm A is an evaluation of 1_{f(x)=f(z)} on a sample from D(z|A). For such a setting, the KL-LUCB algorithm (Kaufmann and Kalyanakrishnan 2013) can be used to identify the rule with the highest precision. The algorithm works by constructing confidence regions based on KL divergence (Cover and Thomas 1991). In each step, the algorithm selects two distinct rules: the one with the best mean (A) and the one with the highest upper bound (A′), and updates their bounds by drawing one sample each from D(z|A) and D(z′|A′) and computing 1_{f(x)=f(z)} and 1_{f(x)=f(z′)}. This sampling process continues until the lower bound on A is higher than A′'s upper bound, up to a tolerance ε ∈ [0, 1]. If A* is the arm with the highest true precision, the following (proved by Kaufmann and Kalyanakrishnan) holds for the true precision of the chosen rule A:

$$P(\mathrm{prec}(A) \geq \mathrm{prec}(A^*) - \epsilon) \geq 1 - \delta. \tag{5}$$

Algorithm 1 presents an outline of this approach. When evaluating whether the rule chosen by KL-LUCB is an anchor, we need to be confident that it meets our precision criterion. Thus, if for an identified rule A we have prec_lb(A) < τ but prec_ub(A) > τ, we keep sampling from D(·|A) until either we are confident A is an anchor (prec_lb(A) > τ) or not (prec_ub(A) < τ).

Algorithm 1: Identifying the best candidate for the greedy approach.

    function GenerateCands(A, c)
        A_r ← ∅
        for all A ∈ A; a_i ∈ x, a_i ∉ A do
            if cov(A ∧ a_i) > c then           {only high-coverage}
                A_r ← A_r ∪ (A ∧ a_i)           {add as potential anchor}
        return A_r                              {candidate anchors for next round}

    function BestCand(A, D, ε, δ)
        initialize prec, prec_ub, prec_lb estimates for all A ∈ A
        A  ← argmax_A prec(A)
        A′ ← argmax_{A′ ≠ A} prec_ub(A′, δ)     {δ implicit below}
        while prec_ub(A′) − prec_lb(A) > ε do
            sample z ∼ D(z|A), z′ ∼ D(z′|A′)    {sample more}
            update prec, prec_ub, prec_lb for A and A′
            A  ← argmax_A prec(A)
            A′ ← argmax_{A′ ≠ A} prec_ub(A′)
        return A

Algorithm 2: Outline of the beam search.

    function BeamSearch(f, x, D, τ)
        hyperparameters B, ε, δ
        A* ← null, A_0 ← ∅                      {set of candidate rules}
        loop
            A_t ← GenerateCands(A_{t−1}, cov(A*))
            A_t ← B-BestCand(A_t, D, B, δ, ε)    {LUCB}
            if A_t = ∅ then break loop
            for all A ∈ A_t s.t. prec_lb(A, δ) > τ do
                if cov(A) > cov(A*) then A* ← A
        return A*
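The confidence regions used by KL-LUCB are Bernoulli KL confidence intervals. The sketch below shows one common way of computing them by bisection; the exploration rate `beta` is prescribed by Kaufmann and Kalyanakrishnan (2013) and is simply taken as an input here, so treat this as an illustrative reading rather than the exact implementation.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL( Bernoulli(p) || Bernoulli(q) )."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb(p_hat, n, beta, tol=1e-6):
    """Upper bound: largest q >= p_hat with n * KL(p_hat, q) <= beta."""
    lo, hi = p_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if n * bernoulli_kl(p_hat, mid) > beta:
            hi = mid
        else:
            lo = mid
    return lo

def kl_lcb(p_hat, n, beta, tol=1e-6):
    """Lower bound: smallest q <= p_hat with n * KL(p_hat, q) <= beta."""
    lo, hi = 0.0, p_hat
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if n * bernoulli_kl(p_hat, mid) > beta:
            lo = mid
        else:
            hi = mid
    return hi
```

In BestCand, prec_ub and prec_lb would correspond to applying kl_ucb and kl_lcb to each candidate's empirical precision and sample count.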
### Beam-Search for Anchor Construction

Although the greedy approach described so far can find short anchors with the guarantee that, at each step, the choice was near-optimal with high probability, it has two major shortcomings. First, due to its greedy nature, it maintains only a single rule at a time (which it incrementally augments), and thus any suboptimal choice is irreversible. Second, the greedy algorithm is not directly concerned with the coverage of the anchors, and instead returns the shortest anchor that it finds. In order to address both of these concerns, we extend the greedy approach to perform a beam search by maintaining a set of candidate rules, while guiding the search to identify, amongst many possible anchors, the one with the highest coverage.

The algorithm is outlined in Algorithm 2. It is similar in structure to the greedy approach, but with a set of B current candidates instead of a single one. After generating all the possible candidate rules, we select the B best candidates to keep, based on the KL-LUCB approach with multiple arms (the Explore-m setting). For the tolerance ε ∈ [0, 1], this version of KL-LUCB returns a set A of size B that is an ε-approximation of the best set A*, with high probability:

$$P\left(\min_{A \in \mathcal{A}} \mathrm{prec}(A) \geq \min_{A' \in \mathcal{A}^*} \mathrm{prec}(A') - \epsilon\right) \geq 1 - \delta. \tag{6}$$

We omit the description of KL-LUCB for this setting, but the intuition is similar to that of the greedy approach. Further, amongst the multiple anchors that we encounter, we output the one with the highest coverage, thus directly optimizing Eq. (4). This condition is also used for efficient pruning of the search space: we do not store any rule that has lower coverage than the best anchor found so far, since the coverage of a rule can only decrease as more predicates are added. The beam-search algorithm is therefore more likely to return an anchor with higher coverage than the one found by the greedy approach, and thus we use this algorithm for all examples and experiments.

### Hyperparameters and Potential Issues

A rule that exactly matches x is always a valid anchor, albeit one with very low coverage, and thus of little use. Our algorithm can always recover this anchor, and is therefore guaranteed to terminate in a bounded number of iterations. In pathological cases, it is possible for KL-LUCB to require a very large number of samples from D in order to separate two candidate rules with high confidence; however, this can be alleviated by increasing the tolerance ε, the width δ, or by setting a maximum number of samples. In practice, all explanations presented in this paper were generated in a few seconds to a few minutes. We set these parameters to reasonable values, B = 10, ε = 0.1, δ = 0.05, and leave an analysis of the sensitivity of our approach to them for future work.

So far, we have focused on computing anchors for individual predictions. In order to gain a more complete understanding of how the model works, the user needs to examine multiple explanations. Instead of randomly selecting which anchors to show to the user, we would like to identify an optimal set of anchors that represents this global behavior, thereby reducing the user effort. Observing that such an objective is submodular, we adapt the submodular-pick (SP) approach we proposed in Ribeiro, Singh, and Guestrin (2016b) to this setting. In particular, the approach selects K anchors that cover as many instances in the validation set as possible. We use an iterative, greedy method that provides guarantees on the quality of our solution, due to the submodular nature of the optimization (Krause and Golovin 2014); a minimal code sketch of this greedy selection is given below.
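At each iteration the greedy pick adds the anchor with the largest marginal gain in covered validation instances. In the sketch below, the `applies(anchor, instance)` helper is an assumption standing in for predicate evaluation; it is not the paper's implementation.

```python
def submodular_pick(anchors, validation_set, applies, k=10):
    """Greedily select up to k anchors that together cover as many validation
    instances as possible. Coverage is monotone submodular, so greedy selection
    carries the usual (1 - 1/e) approximation guarantee (Krause and Golovin 2014)."""
    coverage_sets = [{i for i, inst in enumerate(validation_set) if applies(a, inst)}
                     for a in anchors]
    covered, chosen = set(), []
    for _ in range(min(k, len(anchors))):
        gains = [len(c - covered) if j not in chosen else -1
                 for j, c in enumerate(coverage_sets)]
        best = max(range(len(anchors)), key=lambda j: gains[j])
        if gains[best] <= 0:
            break                         # no remaining anchor adds new coverage
        chosen.append(best)
        covered |= coverage_sets[best]
    return [anchors[j] for j in chosen]
```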
## Experiments

We evaluate anchor explanations for complex models on a number of tasks, primarily focusing on how well they help users (simulated and human) make accurate predictions about the behavior of the models on unseen instances. Code and data for all the experiments are available at https://github.com/marcotcr/anchor-experiments.

### Simulated Users

For simulated users, we use the tabular datasets previously mentioned (adult, rcdv, and lending). Each dataset is split such that models are trained on the training set, explanations are produced for instances in the validation set, and evaluation is done on instances in the test set. For each dataset, we train three different models: logistic regression (lr), 400 gradient boosted trees (gb), and a multilayer perceptron with two layers of 50 units each (nn). We generate both linear LIME (Ribeiro, Singh, and Guestrin 2016b) and anchor explanations for them.

Table 4: Average precision and coverage with simulated users on 3 tabular datasets and 3 classifiers. lime-n indicates direct application of LIME to unseen instances, while lime-t indicates a threshold tuned using an oracle to achieve the same precision as the anchor approach. The anchor approach maintains very high precision, while naive use of linear explanations leads to varying degrees of precision.

| Dataset | Model | Precision (anchor) | Precision (lime-n) | Coverage (anchor) | Coverage (lime-t) |
|---|---|---|---|---|---|
| adult | logistic | 95.6 | 81.0 | 10.7 | 21.6 |
| adult | gbt | 96.2 | 81.0 | 9.7 | 20.2 |
| adult | nn | 95.6 | 79.6 | 7.6 | 17.3 |
| rcdv | logistic | 95.8 | 76.6 | 6.8 | 17.3 |
| rcdv | gbt | 94.8 | 71.7 | 4.8 | 2.6 |
| rcdv | nn | 93.4 | 65.7 | 1.1 | 1.5 |
| lending | logistic | 99.7 | 80.2 | 28.6 | 12.2 |
| lending | gbt | 99.3 | 79.9 | 28.4 | 9.1 |
| lending | nn | 96.7 | 77.0 | 16.6 | 5.4 |

When simulating users, we compute coverage (the fraction of instances they predict on after seeing explanations) and precision (the fraction of those predictions that are correct) on the complete test set. For each dataset, model, and explanation type, we compute these metrics for the explanation of each instance in the validation data. Simulating when an anchor applies is clear. It is not obvious, however, how real users would use LIME explanations. Ideally, they should only apply explanations to examples that are "close", but it is not clear what the distance function and the threshold for "close" should be, or whether users compute distances on demand. Therefore, in this section we simulate different behaviors, and perform a study with real users in the following section.

In Table 4 (left), we show the average precision for anchor and LIME, assuming users always apply the LIME explanation without any regard to distance (we call such a user lime-n, for naive). It is clear that the anchor approach delivers on the promise of high average precision, for all datasets and models. If users apply LIME naively, on the other hand, they get widely differing levels of precision. Since high precision is a prerequisite for interpretability, we also simulate a user that only makes a prediction with LIME if the application of the linear explanation yields a probability above a certain threshold (setting a threshold on the distance produced strictly worse results); we call this user lime-t. We tune the threshold values for each dataset/model pair in order to obtain the same average precision as the anchor approach on the test set, so that coverage is comparable.
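Concretely, the simulated anchor user above reduces to the following sketch: predict only on test instances where the rule applies, and count how often the model agrees. The `rule_applies` callable and the stored anchor prediction are illustrative stand-ins for the explanation object, not part of the released code.

```python
import numpy as np

def simulated_anchor_user(f, rule_applies, anchor_prediction, test_X):
    """Coverage and precision of a single anchor explanation on a test set.

    Coverage: fraction of test instances on which the simulated user predicts
    (i.e. the rule applies). Precision: fraction of those predictions that
    match the black-box model f.
    """
    mask = np.array([rule_applies(row) for row in test_X])
    coverage = float(mask.mean())
    if not mask.any():
        return coverage, float("nan")     # no covered instances
    precision = float(np.mean(f(test_X[mask]) == anchor_prediction))
    return coverage, precision
```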
A real user would not be able to perform this threshold tuning, as (1) we are "cheating" by looking at the test set, and (2) the threshold values found range from 67% to 91%, often with huge variation within the same dataset for different models. We show the average test coverage of the validation-set explanations for anchor and lime-t in Table 4 (right). There is no clear winner in terms of coverage, demonstrating that even with these impossibly tuned thresholds, LIME is not able to outperform anchors.

Although the user is often interested in specific explanations, most users would prefer a set of explanations that explains most of the model with as little effort on their part as possible: explanations picked using the submodular procedure described before. In Figure 4, we show the coverage (at the same precision level) for gb on two of the datasets (we omit the other models/datasets due to space, but the results are similar) as the user sees more explanations, chosen either via submodular pick (SP-LIME and SP-Anchor) or at random (RP-LIME and RP-Anchor). The results indicate that while the average coverage of random explanations is low (e.g. 4.8% for rcdv), selecting the right set of explanations with submodular pick can give users an understanding of the model's global behavior (e.g. 50% after only 10 explanations). The anchor approach also yields better coverage for the same precision, even though it is unclear whether real users can achieve such high precision with LIME.

Figure 4: Coverage on the test set as the simulated user sees more explanations, at the same precision level, on (a) the adult dataset and (b) the rcdv dataset. While there is no clear winner for random explanations, anchors are better when explanations are picked using submodular pick.

### User Study

We ran a user study with 26 users (students who had taken or were taking a machine learning course), so we could rely on familiarity with concepts such as "model", "prediction", and "cross validation". For this study, we used the adult and rcdv datasets, followed by a multiple-choice VQA system (Ren, Kiros, and Zemel 2015) on two images. While the VQA model predicts one of 1000 labels, we restrict it to the 5 most common answers predicted on questions in D, in order to reduce visual overload. For each dataset and explanation type, we want to evaluate whether users are able to predict the behavior of the model on unseen instances. Our setup for the adult and rcdv datasets has users first browse through 10 predictions without any explanations, then with one, and then with two LIME or anchor explanations. They are asked to predict the behavior of the classifier on 10 random test instances before and 10 instances after seeing each round of explanations.

Table 5: Results of the user study. Precision, perceived coverage, and time per prediction (seconds) for each dataset. Underlined values in the original indicate statistical significance w.r.t. anchors on the same dataset with the same number of explanations. Users consistently achieve high precision with anchors, as opposed to the baselines, with less effort (time).

| Method | Prec. adult | Prec. rcdv | Prec. VQA1 | Prec. VQA2 | Cov. adult | Cov. rcdv | Cov. VQA1 | Cov. VQA2 | Time adult | Time rcdv | Time VQA1 | Time VQA2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No expls | 54.8 | 83.1 | 61.5 | 68.4 | 79.6 | 63.5 | 39.8 | 30.8 | 29.8 ± 14 | 35.7 ± 26 | 18.7 ± 20 | 13.9 ± 20 |
| LIME(1) | 68.3 | 98.1 | 57.5 | 76.3 | 89.2 | 55.4 | 71.5 | 54.2 | 28.5 ± 10 | 24.6 ± 6 | 8.6 ± 3 | 11.1 ± 8 |
| Anchor(1) | 100.0 | 97.8 | 93.0 | 98.9 | 43.1 | 24.6 | 31.9 | 27.3 | 13.0 ± 4 | 14.4 ± 5 | 5.4 ± 2 | 3.7 ± 1 |
| LIME(2) | 89.9 | 72.9 | - | - | 78.5 | 63.1 | - | - | 37.8 ± 20 | 24.4 ± 7 | - | - |
| Anchor(2) | 87.4 | 95.8 | - | - | 62.3 | 45.4 | - | - | 10.5 ± 3 | 19.2 ± 10 | - | - |
The user then goes through the same procedure on the other dataset, with the explanation type not used for the first one. We ask subjects to only make predictions if they are very confident, and to select "I don't know" otherwise. We measure coverage as the fraction of instances where users made a prediction other than "I don't know" (i.e. their perceived coverage), and only measure precision on these instances. This process is repeated for the two VQA images: half the users see LIME for the first image and anchors for the second, and vice versa for the other half, and they predict the model's answers on 20 questions before and after explanations. The explanations are comparable in size: LIME explanations had at most 5 terms in all datasets, while anchors varied from 1 to 5 predicates, depending on the individual prediction.

The results in Table 5, where LIME(1) refers to results after one LIME explanation and so on, show that users with anchors achieve high precision, around 95% for all of the combinations except Anchor(2) on adult. The coverage of users with anchors grows with a second explanation on both datasets. Precision with LIME, on the other hand, varies dramatically, and for VQA1 it is worse than no explanations. The subjects made mostly correct predictions when using anchors (high precision), and knew to select "I don't know" when an instance was not covered by the explanation. In contrast, with LIME or no explanations, users thought they could make confident predictions more often, even though their precision was considerably lower. Further, it took dramatically less time to understand and use anchors than linear explanations, across all datasets and tasks. In a poll, 21 of the 26 users preferred anchors, and 24 of 26 said they would be more precise with anchors; interestingly, the 2 users who said they would be more precise with LIME were actually more precise with anchors. Many commented that it was easier to apply anchors than to combine the weights of LIME explanations, especially with multiple explanations. They also felt more confident in their predictions with anchors. The user study confirms our hypotheses: it is much easier for users to understand the coverage of anchor explanations than of linear explanations, and to achieve a high-precision understanding of the model's behavior (as measured by predicting it on new instances). Anchors are also easier to comprehend and take less effort to apply, as reflected in the times and qualitative feedback.

## Limitations and Future Work

Having demonstrated the flexibility and usefulness of the anchor approach for a wide variety of domains, and having compared it to the state of the art, we now turn to its limitations and to opportunities for future work.

**Overly specific anchors:** Predictions that are near a boundary of the black-box model's decision function, or predictions of very rare classes, may require very specific "sufficient conditions", and thus their anchors may be complex and provide low coverage. We give an example in Figure 5 (adult dataset), where a particular prediction is very near the boundary of the decision function: almost any change to the instance results in a change in prediction. While the very specific anchor in Figure 5c communicates to the user that the conditions necessary for the prediction are very specific, it does not generalize well to other instances, due to its narrowness.
It also does not give much insight into the model's behavior besides the fact that the instance is near a decision boundary. In cases like these, a linear LIME explanation (Figure 5b) may be preferred, due to the potential insights it gives, with the caveat that users may still generalize incorrectly due to unclear coverage.

Figure 5: Explaining a prediction near the decision boundary in the UCI adult dataset. (a) Instance and prediction: 28 < Age ≤ 37, Workclass = Private, Education = High School grad, Marital Status = Married, Occupation = Blue-Collar, Relationship = Husband, Race = White, Sex = Male, Capital Gain = None, Capital Loss = Low, Hours per week ≤ 40, Country = United-States; P(Salary > $50K) = 0.57. (b) LIME explanation. (c) Anchor explanation: IF Country = United-States AND Capital Loss = Low AND Race = White AND Relationship = Husband AND Married AND 28 < Age ≤ 37 AND Sex = Male AND High School grad AND Occupation = Blue-Collar THEN PREDICT Salary > $50K.

**Potentially conflicting anchors:** When using the anchor approach "in the wild", two or more anchors with different predictions may apply to the same test instance. While possible, this situation is unlikely for two reasons: (1) the high-probability precision guarantee that anchors have by construction, and (2) the submodular objective in the pick procedure, which encourages a set of anchors with low overlap. If this were to happen in a real situation, we would want to alert the user, suggest further investigation, and perhaps suggest increasing the precision threshold.

**Complex output spaces:** For certain problems where the output is structured and complex, there is a variety of explanations that may be useful. In this work, we restrict ourselves to explaining certain functions of the output, such as in Tables 1 and 2, leaving the task of explaining the full output space to future work. We emphasize that this is not a problem specific to the anchor approach, but an open problem in the explanation literature. Even for a simpler output space, such as the multi-label classification setting (Tsoumakas and Katakis 2006), it is not clear whether the best option is to explain each label individually or to treat the set of predicted labels as a single label. The former could overwhelm the user if the number of labels is too large, while the latter may lead to non-intuitive or overly complex explanations.

**Realistic perturbation distributions:** All perturbation-based explanation methods depend on a local perturbation distribution that is expressive enough to reveal the model's behavior, while acting on components that are sufficiently interpretable. Finding such distributions for some domains remains a challenge; for example, the perturbations for images provided explanations that led to some insight, but cannot be used to compare explanations across images. Designing such distributions is a line of research that would benefit multiple explanation methods.

## Conclusions

We have argued that high precision and clear coverage are crucial for interpretable explanations of a model's local behavior. We introduced a novel family of rule-based, model-agnostic explanations called anchors, designed to exhibit both of these properties. Anchors highlight the part of the input that is sufficient for the classifier to make the prediction, making them intuitive and easy to understand. We demonstrated the flexibility of the anchor approach by explaining predictions from a variety of classifiers on multiple domains.
In a user study, we showed that anchors not only lead to higher human precision than linear explanations, but also require less effort to understand and apply.

## Acknowledgements

We are grateful to the anonymous reviewers for their feedback. This work was supported in part by ONR award #N00014-13-1-0023, and in part by FICO and Adobe Research. The views expressed are those of the authors and do not reflect the policy or position of the funding agencies.

## References

Baehrens, D.; Schroeter, T.; Harmeling, S.; Kawanabe, M.; Hansen, K.; and Müller, K.-R. 2010. How to explain individual classification decisions. Journal of Machine Learning Research 11.

Cover, T. M., and Thomas, J. A. 1991. Elements of Information Theory. New York, NY, USA: Wiley-Interscience.

Craven, M. W., and Shavlik, J. W. 1996. Extracting tree-structured representations of trained networks. In Advances in Neural Information Processing Systems, 24-30.

De Raedt, L., and Kersting, K. 2008. Probabilistic Inductive Logic Programming. Berlin, Heidelberg: Springer-Verlag. Chapter: Probabilistic Inductive Logic Programming, 1-27.

Kaufman, S.; Rosset, S.; and Perlich, C. 2011. Leakage in data mining: Formulation, detection, and avoidance. In Knowledge Discovery and Data Mining (KDD).

Kaufmann, E., and Kalyanakrishnan, S. 2013. Information complexity in bandit subset selection. In Proceedings of the Twenty-Sixth Annual Conference on Learning Theory (COLT 2013), volume 30 of JMLR Workshop and Conference Proceedings, 228-251. JMLR.

Krause, A., and Golovin, D. 2014. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems. Cambridge University Press.

Lakkaraju, H.; Bach, S. H.; and Leskovec, J. 2016. Interpretable decision sets: A joint framework for description and prediction. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, 1675-1684. New York, NY, USA: ACM.

Letham, B.; Rudin, C.; McCormick, T. H.; and Madigan, D. 2015. Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. Annals of Applied Statistics.

Lim, B. Y.; Dey, A. K.; and Avrahami, D. 2009. Why and why not explanations improve the intelligibility of context-aware intelligent systems. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '09, 2119-2128. New York, NY, USA: ACM.

Patel, K.; Fogarty, J.; Landay, J. A.; and Harrison, B. 2008. Investigating statistical machine learning as a tool for software development. In Human Factors in Computing Systems (CHI).

Pennington, J.; Socher, R.; and Manning, C. D. 2014. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), 1532-1543.

Ren, M.; Kiros, R.; and Zemel, R. S. 2015. Exploring models and data for image question answering. In Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS '15, 2953-2961. Cambridge, MA, USA: MIT Press.

Ribeiro, M. T.; Singh, S.; and Guestrin, C. 2016a. Model-agnostic interpretability of machine learning. In Human Interpretability in Machine Learning Workshop, ICML '16.

Ribeiro, M. T.; Singh, S.; and Guestrin, C. 2016b. "Why should I trust you?": Explaining the predictions of any classifier. In Knowledge Discovery and Data Mining (KDD).

Sanchez, I.; Rocktäschel, T.; Riedel, S.; and Singh, S. 2015. Towards extracting faithful and descriptive representations of latent variable models. In AAAI Spring Symposium on Knowledge Representation and Reasoning (KRR): Integrating Symbolic and Neural Approaches.
Schmidt, P., and Witte, A. D. 1988. Predicting Recidivism in North Carolina, 1978 and 1980. Inter-university Consortium for Political and Social Research.

Sculley, D.; Holt, G.; Golovin, D.; Davydov, E.; Phillips, T.; Ebner, D.; Chaudhary, V.; Young, M.; and Crespo, J.-F. 2015. Hidden technical debt in machine learning systems. In Neural Information Processing Systems (NIPS).

Strumbelj, E., and Kononenko, I. 2010. An efficient explanation of individual classifications using game theory. Journal of Machine Learning Research 11.

Stumpf, S.; Rajaram, V.; Li, L.; Burnett, M.; Dietterich, T.; Sullivan, E.; Drummond, R.; and Herlocker, J. 2007. Toward harnessing user feedback for machine learning. In Proceedings of the 12th International Conference on Intelligent User Interfaces, IUI '07, 82-91. New York, NY, USA: ACM.

Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; and Rabinovich, A. 2015. Going deeper with convolutions. In Computer Vision and Pattern Recognition (CVPR).

Tsoumakas, G., and Katakis, I. 2006. Multi-label classification: An overview. International Journal of Data Warehousing and Mining 3(3).

Ustun, B., and Rudin, C. 2015. Supersparse linear integer models for optimized medical scoring systems. Machine Learning.

Vedaldi, A., and Soatto, S. 2008. Quick shift and kernel methods for mode seeking. In European Conference on Computer Vision, 705-718. Springer.

Vinyals, O.; Kaiser, L.; Koo, T.; Petrov, S.; Sutskever, I.; and Hinton, G. E. 2015. Grammar as a foreign language. In Advances in Neural Information Processing Systems 28 (NIPS 2015), 2773-2781.

Wang, F., and Rudin, C. 2015. Falling rule lists. In Artificial Intelligence and Statistics (AISTATS).

Wang, T.; Rudin, C.; Doshi-Velez, F.; Liu, Y.; Klampfl, E.; and MacNeille, P. 2015. Or's of And's for interpretable classification, with application to context-aware recommender systems. arXiv:1504.07614.

Wieting, J.; Bansal, M.; Gimpel, K.; and Livescu, K. 2015. Towards universal paraphrastic sentence embeddings. CoRR abs/1511.08198.

Zhu, Y.; Groth, O.; Bernstein, M.; and Fei-Fei, L. 2016. Visual7W: Grounded question answering in images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).