# Model Agnostic Multilevel Explanations

Karthikeyan Natesan Ramamurthy, Bhanukiran Vinzamuri, Yunfeng Zhang, Amit Dhurandhar
IBM Research, Yorktown Heights, NY USA 10598
knatesa@us.ibm.com, bhanu.vinzamuri@ibm.com, {zhangyun, adhuran}@us.ibm.com

Abstract

In recent years, post-hoc local instance-level and global dataset-level explainability of black-box models has received a lot of attention. Less attention has been given to obtaining insights at intermediate or group levels, a need outlined in recent works that study the challenges in realizing the guidelines of the General Data Protection Regulation (GDPR). In this paper, we propose a meta-method that, given a typical local explainability method, can build a multilevel explanation tree. The leaves of this tree correspond to local explanations, the root corresponds to the global explanation, and intermediate levels correspond to explanations for groups of data points that the method automatically clusters. The method can also leverage side information, where users can specify points for which they may want the explanations to be similar. We argue that such a multilevel structure can also be an effective form of communication, where one can obtain a few explanations that characterize the entire dataset by considering an appropriate level in our explanation tree. Explanations for novel test points can be cost-efficiently obtained by associating them with the closest training points. When the local explainability technique is generalized additive (viz. LIME, GAMs), we develop a fast approximate algorithm for building the multilevel tree and study its convergence behavior. We show that we produce high fidelity sparse explanations on several public datasets and also validate the effectiveness of the proposed technique based on two human studies, one with experts and the other with non-expert users, on real world datasets.

1 Introduction

A very natural and effective way to communicate is to first provide high level general concepts and only then dive into more of the specifics [1]. In addition, the transition from high level concepts to more and more specific explanations should ideally be as logical or smooth as possible [2, 3]. For example, when you call a service provider there is usually an automated message trying to categorize the problem at a high level, followed by more specific questions. Eventually, if the issue is not resolved, a human representative may intervene to delve into further details. In such cases, information or explanations provided at multiple levels enable others to obtain insights that are otherwise opaque. Recent work [4] has stressed the importance of having such multilevel explanations to successfully meet the requirements of Europe's General Data Protection Regulation (GDPR) [5]. They argue that simply having local or global explanations may not suffice for providing satisfactory explanations in many cases. In fact, even in the widely participated FICO explainability challenge [6], participants were expected to provide not just local explanations but also insights at the intermediate class level. Motivated by this need, in this paper we propose a novel model agnostic multilevel explanation (MAME) method that takes as input a post-hoc local explainability technique for black-box models (e.g., LIME [7]) and an unlabeled dataset. The method then generates multiple explanations for each of the examples corresponding to different degrees of cohesion (i.e.,
parameter tying) between explanations of the examples. This explicitly controllable degree of cohesion determines a level in our multilevel explanation tree. In addition, we constrain the predictions of the explanation model to be close to those of the black-box at each tree node, ensuring fidelity. At the extremes, the leaves would correspond to independent local explanations as with methods like LIME, while the root would correspond to a single global explanation given the high degree of cohesion.

Figure 1: Illustration of multilevel explanations generated by MAME for an industrial pump failure dataset consisting of 2500 wells. We show three levels: the bottom level has four leaves which correspond to example local explanations, the top level corresponds to one global explanation, and an intermediate level corresponds to explanations for two groups highlighted by MAME. Based on expert feedback, these intermediate explanations, although explaining the same type of pump, had semantic meaning as they corresponded to different manufacturer groups that behave noticeably differently.

An illustration of this is given in Figure 1, where multilevel explanations were generated by MAME for a real industrial pump failure dataset (see Section 4.4 for details). We show three levels: the four leaves correspond to example local explanations (amongst many), the root corresponds to one global explanation, and an intermediate level corresponds to explanations for two groups highlighted by MAME. Note that levels are numbered from 1 (leaves) increasing up to the highest value (root). The dotted lines indicate that the nodes are descendants of the node above, but not direct children. Based on expert feedback, these intermediate explanations correspond to pumps from different manufacturers, resulting in noticeably different behaviors. Also note that each level provides distinct enough information not subsumed by just local or global explanations, thus motivating the need for such multilevel explanations. Such explanations can thus help identify key characteristics that bind together different examples at various levels of granularity. They can also provide exemplar based explanations based on the groupings at specific levels. These are provided in the supplement. Our method can also take into account side information, such as similarity in explanations based on class labels, or user specified groupings based on domain knowledge for a subset of examples. Moreover, one can also use non-linear additive models going beyond LIME to generate local explanations. Our method thus provides a lot of flexibility in building multilevel explanations that can be customized apropos a specific application. We prove that our method actually forms a tree, in that examples merged at a particular level of the tree remain together at higher levels. The proposed fast approximate algorithm for obtaining multilevel explanations is proved to converge to the exact solution. We show that we produce high fidelity sparse explanations on several public datasets. We also validate the effectiveness of the proposed technique based on two human studies, one with experts and the other with non-expert users, on real world datasets.
2 Related Work

The most traditional direction in explainable AI is to directly build interpretable models such as rule lists or decision sets [8, 9] on the original data itself, so that no post-hoc interpretability methods are required to uncover the logic behind the proposed actions. These methods, however, may not readily give the best performance in many applications compared to more sophisticated black-box models. Some works provide local [7, 10, 11, 12] as well as global explanations that are both feature and exemplar based [13]. The method proposed by Plumb et al. [13] provides only local explanations and detects global patterns, but does not automatically identify and provide explanations for groups of data. Exemplar based explanations [14, 15] identify a few examples that are representative of a larger dataset. Previous works tend to use distinct models to provide the local and global explanations; hence, consistency between these models could potentially be an issue. Pedreschi et al. [16] propose a high-level approach to cluster local rule-based explanations to learn a local-to-global tree structure. The authors do not explicitly provide any algorithm, and the drawback here is that the explanations at levels other than local may not have high fidelity to the black-box. In TreeExplainer [17], the authors present five new methods that combine efficiently computed exact local SHAP explanations, which additionally account for local feature interactions, to understand the global behavior of the model. Drawbacks are similar to [16]. Tsang et al. [18] propose a hierarchical method to study how the behavior of interactions for a local explanation changes from the instance level to across the whole dataset. A key difference is that they do not fuse explanations as they go higher up the tree, as we do. Moreover, their notion of hierarchy is based on learning higher order interactions between the prediction and input features, which is different from ours. Bhatt et al. [19] propose a method to aggregate explanation functions based on various desiderata such as complexity, faithfulness, etc. Lakkaraju et al. [20] present a model agnostic approach to customize local explanations to an end-user defined feature set of interest. In contrast, MAME allows the end user to specify a priori which explanations should be similar, and also controls for complexity (sparsity) to learn high fidelity explanations. Our approach has relations to convex clustering [21, 22] and its generalizations to multi-task learning [23, 24]. However, our goal is completely different (multilevel post-hoc explainability), and our methodology of computing and using local models that mimic black-box predictions is also different.

3 Methodology

Let $\mathcal{X} \times \mathcal{Y}$ denote the input-output space and $f : \mathcal{X} \to \mathcal{Y}$ be a classification/regression function corresponding to a black-box model. Let $g(\cdot)$ be a potentially non-linear map on a feature vector $x \in \mathcal{X}$. Let $p$ denote the number of features and $n$ the number of instances. For a parameter vector $\theta \in \mathbb{R}^p$, $l(x, \theta) = g(x)^T \theta$ is a generalized additive model, $g(\cdot)$ being a pre-specified map. We can learn this model from the predictions of $f(\cdot)$ for examples near $x$; this provides a local explanation for $x$ given by $\theta$. The similarity $\psi(x, z)$ between $x$ and $z$ can be estimated as $\exp(-d(x, z)/\eta)$, $d(\cdot, \cdot)$ being a distance function and $\eta$ the bandwidth. Let $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ be a dataset of size $n$, where $y_i$ may or may not be known for each $i$. For each $x_i$, we define the neighborhood $N_i = \{z \in \mathcal{X} \mid \psi(x_i, z) \geq \kappa\}$, with $\kappa$ close to 1. In practice, $N_i$ of size $m$ can be generated by randomly perturbing the instance $x_i$ $m$ times, as done in previous works [7].
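To make this neighborhood-based local surrogate concrete, the following is a minimal sketch (an illustration, not the authors' implementation): it perturbs $x_i$ to form $N_i$, weights the perturbations with $\psi(x_i, z) = \exp(-d(x_i, z)/\eta)$, and fits a sparse linear surrogate with $g(\cdot)$ taken as the identity map. The perturbation scale, bandwidth, and sparsity weight are illustrative placeholders.

```python
# Minimal sketch (assumed, not the paper's code) of one local explanation theta_i.
import numpy as np
from sklearn.linear_model import Lasso

def local_explanation(x_i, f, m=500, eta=1.0, alpha=0.01, scale=0.1, seed=0):
    """x_i: (p,) instance; f: black-box mapping an (m, p) array to (m,) predictions."""
    rng = np.random.default_rng(seed)
    p = x_i.shape[0]

    # Neighborhood N_i: m random perturbations of x_i (Gaussian noise here).
    Z = x_i + scale * rng.standard_normal((m, p))

    # Similarity weights psi(x_i, z) = exp(-d(x_i, z) / eta), with Euclidean d.
    w = np.exp(-np.linalg.norm(Z - x_i, axis=1) / eta)

    # Fit the sparse surrogate l(z, theta) = g(z)^T theta (g = identity) to the
    # black-box predictions, weighting each perturbation by its similarity.
    surrogate = Lasso(alpha=alpha)
    surrogate.fit(Z, f(Z), sample_weight=w)
    return surrogate.coef_  # theta_i: the local explanation for x_i
```

For instance, `local_explanation(x, lambda Z: model.predict(Z))` would produce a leaf-level explanation of the kind that serves as input to the multilevel construction below.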
We now define the optimization:

$$\min_{\theta_1,\ldots,\theta_n}\ \sum_{i=1}^{n}\Big[\sum_{z \in N_i}\psi(x_i,z)\,\big(f(z)-g(z)^{T}\theta_i\big)^{2} \;+\; \alpha_i\,\|\theta_i\|_{1}\Big] \;+\; \beta\sum_{(i,j)\in E} w_{ij}\,\|\theta_i-\theta_j\|_{2} \qquad (1)$$

where $E$ is the edge list of the prior knowledge graph with adjacency matrix $W = [w_{ij}]$, $\alpha_i$ controls the sparsity of each local explanation, and $\beta$ controls the degree of cohesion (parameter tying) between explanations. To build the tree, we solve (1) over an increasing sequence of $\beta$ values using a splitting scheme with auxiliary variables $U$, $V$, $Z_1$, and $Z_2$, whose updates are referred to below as (4)-(8). Rather than solving the problem to convergence for every $\beta$, we adopt an algorithmic regularization (AR) scheme: a surrogate regularization parameter $\gamma$ is initialized to a small value $\epsilon$, a single pass of the updates is performed, and $\gamma$ is then multiplied by a step-size $t$ (> 1.0) for the next $k$. We will denote these approximate solutions as $\Theta^{(k)}$, where $k$ corresponds to the index of the set of $\beta$ values.

The detailed algorithm for obtaining the multilevel tree using MAME is described in Algorithm 1. This procedure results in a single tree because each Union operation merges two child nodes (and the sub-trees for which they are roots) to create a parent node; merges only happen upward, and hence there are no cycles. In this algorithm, when the stopping criterion on $V$ is met, a single global model for the entire dataset is obtained (represented by the root of the tree). Step (vi) runs LASSO on the group of examples in each node to get post-processed representative explanations. Since the algorithm implements the AR approach, $\gamma^{(k)}$ is used as a surrogate notation for $\beta$ (which is used in the exact solution).

Algorithm 1: Model Agnostic Multilevel Explanation (MAME) method

Input: Dataset $x_1, \ldots, x_n$, black-box model $f(\cdot)$, the coordinate-wise map $g(\cdot)$, and prior knowledge graph adjacency matrix $W$.

i) Sample neighborhoods $N_i$ for each example $x_i$.
ii) Construct matrix $D$ based on edge list $E$.
iii) Initialize $\Theta^{(0)}, U^{(0)}, V^{(0)}$ and set $k = 0$, multiplicative step-size $t$ (say 1.01), $\gamma^{(0)} = \epsilon$ (say 1e-10), grouping threshold $\tau$ (say 1e-6), $\rho$ (say 2), and tolerance tol (say 1e-6).
iv) Initialize a disjoint set with the leaves of the multilevel tree, $S = \{1, \ldots, n\}$.
while $\|V^{(k+1)}\| >$ tol do
    Obtain $\Theta^{(k+1)}$ by solving (4), $U^{(k+1)}$ by solving (5), and $V^{(k+1)}$ by solving (6) with $\beta$ set to $\gamma^{(k)}$.
    Obtain $Z_1^{(k+1)}, Z_2^{(k+1)}$ by solving (7) and (8).
    For each edge $e_l = (i, j) \in E$, if $\|v_l\| < \tau$, perform Union($i, j$).
    $k = k + 1$; $\gamma^{(k+1)} = \gamma^{(k)} \cdot t$.
end while
v) Recover the multilevel tree by keeping track of the disjoint set unions.
vi) For every group of examples in each tree node, post-process to get representative explanations by optimizing (1) with $\beta = 0$ only for the examples in the group.

The dominant complexity per pass of the while loop in Algorithm 1 is $O(pn(p + n))$. A detailed computational complexity analysis is provided in the supplement.
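The following is a minimal sketch of the tree-recovery bookkeeping in Algorithm 1, with the per-level solves of (4)-(8) abstracted behind a user-supplied `solve_level` callable; the schedule constants and the stopping rule used here (all explanations merged into one group) are simplifying assumptions rather than the exact criteria above.

```python
# Sketch (assumed) of Algorithm 1's disjoint-set bookkeeping; the actual
# updates (4)-(8) are abstracted behind `solve_level`.
import numpy as np

def find(parent, i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path compression
        i = parent[i]
    return i

def build_multilevel_tree(Theta, edges, solve_level,
                          eps=1e-10, t=1.01, tau=1e-6, max_levels=2000):
    """Theta: (n, p) local explanations (the leaves).
    edges: list of (i, j) pairs from the prior knowledge graph E.
    solve_level(Theta, gamma): one pass of the fused updates at strength gamma.
    Returns a list of merges [(level k, i, j), ...] that defines the tree."""
    n = Theta.shape[0]
    parent = list(range(n))                 # disjoint set over the n leaves
    merges, gamma = [], eps
    for k in range(max_levels):
        Theta = solve_level(Theta, gamma)   # stand-in for updates (4)-(8)
        for i, j in edges:
            ri, rj = find(parent, i), find(parent, j)
            # Union(i, j) once the fused explanations (numerically) coincide.
            if ri != rj and np.linalg.norm(Theta[i] - Theta[j]) < tau:
                parent[rj] = ri
                merges.append((k, i, j))
        if len({find(parent, i) for i in range(n)}) == 1:
            break                           # root reached: one global group
        gamma *= t                          # gamma^(k+1) = gamma^(k) * t
    return merges
```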
The exact solution, where (4)-(8) are run until convergence for each $\beta$ value, will be denoted as $\Theta(\beta)$. We show that the approximate solution converges to the exact one (proof in the supplement). This theorem holds true for $\{\Theta^{(k)}\}$ without the post-processing step (vi) in Algorithm 1.

Theorem 3.1. As $(t, \epsilon) \to (1, 0)$, where $t$ is the multiplicative step-size update and $\epsilon$ is the initial regularization level, the sequence of AR-based primal solutions $\{\Theta^{(k)}\}$ and the sequence of exact primal solutions $\{\Theta(\beta)\}$ converge in the following sense:

$$\max\Big\{\,\sup_{\beta}\,\inf_{k}\,\big\|\Theta^{(k)}-\Theta(\beta)\big\|,\ \ \sup_{k}\,\inf_{\beta}\,\big\|\Theta^{(k)}-\Theta(\beta)\big\|\,\Big\}\ \xrightarrow{\ (t,\epsilon)\to(1,0)\ }\ 0. \qquad (9)$$

Comparisons of approximation quality and timing between the two solutions are in the supplement. We now show that our method actually forms a tree, in that explanations of examples that are close together at lower levels will remain at least equally close at higher levels (proof in the supplement).

Lemma 3.2 (Non-expansive map for exact solutions). If $\beta_1, \ldots, \beta_r$ are the regularization parameters for the last term in (1) for $r$ consecutive levels in our multilevel explanations, where $\beta_1 = 0$ is the lowest level, and $\theta_{i,s}$ and $\theta_{j,s}$ denote the (globally) optimal coefficient vectors (or explanations) for $x_i$ and $x_j$ respectively corresponding to level $s \in \{1, \ldots, r\}$, then for $s > 1$ and $w_{ij} > 0$ we have $\|\theta_{i,s} - \theta_{j,s}\|_2 \leq \|\theta_{i,s-1} - \theta_{j,s-1}\|_2$.

4 Experiments

We evaluate our method on three different cases with two baselines (described below): (a) Two Step, and (b) Submodular Pick LIME (SP-LIME) [7]. We set $g(\cdot)$ to be the identity map for numerical features and a one-hot encoding for categorical features in all the experiments for ease of demonstration, though it can be set to any appropriate non-linear map by the user. We first show the quantitative benefit of our method in terms of two measures defined in Section 4.2 on several public datasets. The MAME approach has better explanation fidelity, and its explanations are better matched to the important features of the black-box, when compared to the baselines. Secondly, we conducted a study with data scientist users who were not experts in finance, using a public loan approval dataset. We found that our method provided significantly better insights to these non-experts compared to the baselines. Data scientists are an appropriate audience for our method, as a recent study [26] claims that data scientists are the first consumers of explanations from AI-based systems in most organizations. Finally, we performed a case study involving human experts in the Oil & Gas industry. Insights provided by MAME were semantically meaningful to the experts both when they did and did not provide additional side information.

4.1 Baselines and Miscellaneous Details

Two Step: This is a hierarchical convex clustering [21, 22] of the local LIME explanations for the $n$ instances, given by $\Omega \in \mathbb{R}^{p \times n}$. The objective is written as $\min_{\Theta}\ \|\Omega - \Theta\|_F^2 + \beta \sum_{i<j} w_{ij}\,\|\theta_i - \theta_j\|_2$.
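As a rough illustration of the Two Step idea (and only a stand-in: the baseline above uses hierarchical convex clustering, whereas this sketch substitutes ordinary agglomerative clustering), one can cluster the independently learned local explanations and report within-group averages as intermediate-level explanations. The function and parameter names below are assumptions for illustration.

```python
# Illustrative stand-in (not the paper's Two Step implementation): cluster the
# rows of Omega (n local explanations) and average within clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def grouped_explanations(Omega, n_groups):
    """Omega: (n, p) array whose rows are local explanations theta_i."""
    Z = linkage(Omega, method="ward")                  # hierarchy over explanations
    labels = fcluster(Z, t=n_groups, criterion="maxclust")
    reps = np.vstack([Omega[labels == g].mean(axis=0) for g in np.unique(labels)])
    return labels, reps                                # group labels, group-level explanations
```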