Published as a conference paper at ICLR 2021

DEBIASING CONCEPT-BASED EXPLANATIONS WITH CAUSAL ANALYSIS

Mohammad Taha Bahadori, David E. Heckerman
{bahadorm, heckerma}@amazon.com

ABSTRACT

The concept-based explanation approach is a popular model interpretability tool because it expresses the reasons for a model's predictions in terms of concepts that are meaningful to domain experts. In this work, we study the problem of the concepts being correlated with confounding information in the features. We propose a new causal prior graph for modeling the impact of unobserved variables and a method to remove the impact of confounding information and noise using a two-stage regression technique borrowed from the instrumental variable literature. We also model the completeness of the concept set and show that our debiasing method works when the concepts are not complete. Our synthetic and real-world experiments demonstrate the success of our method in removing biases and improving the ranking of the concepts in terms of their contribution to the explanation of the predictions.

1 INTRODUCTION

Explaining the predictions of neural networks through higher-level concepts (Kim et al., 2018; Ghorbani et al., 2019; Brocki & Chung, 2019; Hamidi-Haines et al., 2018) enables model interpretation on data with complex manifold structure such as images. It also allows the use of domain knowledge during the explanation process. Concept-based explanation has been used for medical imaging (Cai et al., 2019), breast cancer histopathology (Graziani et al., 2018), cardiac MRIs (Clough et al., 2019), and meteorology (Sprague et al., 2019).

When the set of concepts is carefully selected, we can estimate a model in which the discriminative information flows from the feature vectors x through the concept vectors c and reaches the labels y. To this end, we train two models: one that predicts the concept vectors from the features, denoted by ĉ(x), and one that predicts the labels from the predicted concept vector, ŷ(ĉ). This estimation process ensures that for each prediction we have the reasons for the prediction stated in terms of the predicted concept vector ĉ(x).

However, in reality, noise and confounding information (due to, e.g., non-discriminative context) can influence both the feature and concept vectors, resulting in confounded correlations between them. Figure 1 provides evidence for noise and confounding in the CUB-200-2011 dataset (Wah et al., 2011). We train two predictors of the concept vectors, one based on the features, ĉ(x), and one based on the labels, ĉ(y), and compare the Spearman correlation coefficients between their predictions and the true ordinal values of the concepts. Having concepts for which ĉ(x) is more accurate than ĉ(y) could be due to noise, or due to hidden variables independent of the labels that spuriously correlate c and x, leading to undesirable explanations that include confounding or noise.

In this work, using Concept Bottleneck Models (CBMs) (Koh et al., 2020; Losch et al., 2019), we demonstrate a method for removing the confounding and noise from (i.e., debiasing) the explanations given by concept vectors, and we extend the results to the Testing with Concept Activation Vectors (TCAV) technique (Kim et al., 2018). We provide a new causal prior graph to account for confounding information and concept completeness (Yeh et al., 2020). We describe the challenges in estimating our causal prior graph and propose a new learning procedure.
Our estimation technique defines and predicts debiased concepts such that the predictive information of the features maximally flows through them. We show that using a two-stage regression technique from the instrumental variables literature, we can successfully remove the impact of confounding and noise from the predicted concept vectors. Our proposed procedure has three steps: (1) debias the concept vectors using the labels, (2) predict the debiased concept vectors using the features, and (3) use the concept vectors predicted in the second step to predict the labels. Optionally, we can also find the residual predictive information in the features that is not in the concepts.

Figure 1: Spearman correlation coefficients (ρ) of the predictors of the concepts given the features, ĉ(x), and the labels, ĉ(y), for the 312 concepts in the test partition of the CUB-200-2011 dataset (Wah et al., 2011). 112 concepts can be predicted more accurately from the features than from the labels. Concept ids on the x-axis are sorted in increasing order of ρ(ĉ(y), c). We provide the detailed steps to obtain the figure in Section 4.2.

We validate the proposed method using a synthetic dataset and the CUB-200-2011 dataset. On the synthetic data, we have access to the ground truth and show that in the presence of confounding and noise, our debiasing procedure improves the accuracy of recovering the true concepts. On the CUB-200-2011 dataset, we use the RemOve And Retrain (ROAR) framework (Hooker et al., 2019) to show that our debiasing procedure ranks the concepts in the order of their explanatory power more accurately than regular concept bottleneck models. We also show that our debiasing technique improves the accuracy of CBMs in predicting the labels. Finally, using several examples, we qualitatively show when the debiasing helps improve the quality of concept-based explanations.

2 METHODOLOGY

Notations. We follow the notation of Goodfellow et al. (2016) and denote random vectors by bold font letters x and their values by bold symbols x. The notation p(x) is a probability measure on x and dp(x = x) is the infinitesimal probability mass at x = x. We use ŷ(x) to denote the prediction of y given x. In the graphical models, we show the observed and unobserved variables using filled and hollow circles, respectively.

Problem Statement. We assume that during the training phase we are given triplets (xi, ci, yi) for i = 1, . . . , n data points. In addition to the regular features x and labels y, we are given a human-interpretable concept vector c for each data point. Each element of the concept vector measures the degree of existence of the corresponding concept in the features; thus, the concept vector typically has binary or ordinal values. Our goal is to learn to predict y as a function of x and use c for explaining the predictions. Proceeding in two steps, we first learn a function ĉ(x) and then learn another function ŷ(ĉ(x)). The prediction ĉ(x) is the explanation for our prediction ŷ (a minimal sketch of this two-step pipeline is given below). At test time, only the features are given and the prediction+explanation algorithm predicts both ŷ and ĉ. In this paper, we aim to remove the bias and noise components from the estimated concept vector ĉ so that it explains the reasons for the prediction of the labels more accurately.
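For concreteness, the following is a minimal, hypothetical sketch of the standard (non-debiased) two-step pipeline described above. The architectures, dimensions, and training loops are illustrative stand-ins rather than the paper's implementation (which, for example, uses a pretrained ResNet152 to predict concepts from images).

```python
# Sketch of a plain two-step concept bottleneck model: fit c_hat(x), then y_hat(c_hat(x)).
import torch
import torch.nn as nn

n, d_x, d_c, n_classes = 1000, 64, 312, 200          # illustrative sizes
x = torch.randn(n, d_x)                               # features
c = torch.rand(n, d_c)                                # concept annotations in [0, 1]
y = torch.randint(0, n_classes, (n,))                 # class labels

concept_net = nn.Sequential(nn.Linear(d_x, 256), nn.ReLU(), nn.Linear(256, d_c))
label_net = nn.Sequential(nn.Linear(d_c, 256), nn.ReLU(), nn.Linear(256, n_classes))

# Step 1: learn c_hat(x) by regressing the concepts on the features.
opt = torch.optim.Adam(concept_net.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    nn.functional.mse_loss(concept_net(x), c).backward()
    opt.step()

# Step 2: learn y_hat(c_hat) by classifying the labels from the predicted concepts.
with torch.no_grad():
    c_hat = concept_net(x)
opt = torch.optim.Adam(label_net.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    nn.functional.cross_entropy(label_net(c_hat), y).backward()
    opt.step()

# At test time, c_hat(x) is reported as the explanation for the prediction y_hat(c_hat(x)).
```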
To remove this bias and noise, we first propose a new causal prior graph that includes the potential unobserved confounders.

2.1 A NEW CAUSAL PRIOR GRAPH FOR CBMS

Figure 2a shows the ideal situation in explanation via high-level concepts. The generative model corresponding to Figure 2a states that to generate each feature vector xi we first randomly draw the label yi. Given the label, we draw the concepts ci. Given the concepts, we generate the features. The hierarchy in this graph is from nodes with less detailed information (labels) to more detailed ones (features, images).

Figure 2: (a) The ideal view of the causal relationships between the features x, concepts c, and labels y. (b) In a more realistic setting, the unobserved confounding variable u impacts both x and c. The shared information between x and y goes through the discriminative part of the concepts, d. We also model the completeness of the concepts via a direct edge from the features x to the labels y. (c) When we use d̂(y) = E[c|y] in place of d and c, we eliminate the confounding link u → c.

The model in Figure 2a is one explanation for the phenomenon in Figure 1, because the noise in the generation of the concepts allows the x–c edge to be stronger than the c–y edge. However, another (non-mutually exclusive) explanation for this phenomenon is the existence of hidden confounders u, shown in Figure 2b. In this graphical model, u represents the confounders and d represents the unconfounded concepts. Note that we assume the confounders u and the labels y are independent when x and c are not observed. Another phenomenon captured in Figure 2b is the lack of concept completeness (Yeh et al., 2020), which describes the situation where the features, compared to the concepts, have additional predictive information about the labels. The non-linear structural equations corresponding to the causal prior graph in Figure 2b are as follows:

    d = f1(y) + εd,                  (1)
    c = d + h(u),                    (2)
    x = f2(u, d) + f3(y) + εx,       (3)

for some vector functions h, f1, f2, and f3. We have εd ⊥ y and u ⊥ y. Our definition of d in Eq. (2) does not restrict u, because we simply attribute the difference between c and f1(y) to a function of the latent confounder u and noise.

Our causal prior graph in Figure 2b corresponds to a generative process in which, to generate an observed triplet (xi, ci, yi), we first draw a label yi and a confounder vector ui independently. Then we draw the discriminative concepts di based on the label and generate the features xi jointly based on the concepts, label, and confounder. Finally, we draw the observed concept vector ci based on the drawn concept and confounder vectors.

Both causal graphs reflect our assumption that the direction of causality is from the labels to the concepts and then to the features, y → d → x, which ensures that u and y are marginally independent in Figure 2b. This direction also corresponds to moving from more abstract class labels, to concepts, to detailed features. During estimation, we fit the functions in the x → d → y direction, because finding the statistical strength of an edge does not depend on its direction.

Estimation of the model in Figure 2b is challenging because there are two distinct paths through which information from the labels y can reach the features x. Our solution is to prioritize the bottleneck path and estimate along y → d → x, and then estimate the residuals of the regression using the direct y → x path.
Our two-stage estimation technique ensures that the predictive information of the features maximally flows through the concepts. In the next sections, we focus on the first phase and use a two-stage regression technique borrowed from the instrumental variables literature to eliminate the noise and confounding in the estimation of the d → x link.

Algorithm 1 Debiased CBMs
Require: Data tuples (xi, ci, yi) for i = 1, . . . , n.
1: Estimate a model d̂(y) = E[c|y] using the (ci, yi) pairs.
2: Train a neural network as an estimator for p_φ̂(d|x) using the (xi, d̂i) pairs.
3: Use the (xi, yi) pairs to estimate the function gθ by fitting ∫ gθ(d) dp_φ̂(d = d|xi) to yi.
4: Compute the debiased explanations E[d|xi] − (1/n) Σᵢ E[d|xi] for i = 1, . . . , n.
5: return The CBM defined by (p_φ̂, gθ) and the debiased explanations.

2.2 INSTRUMENTAL VARIABLES

Background on two-stage regression. In causal inference, instrumental variables (Stock, 2015; Pearl, 2009), denoted by z, are commonly used to find the causal impact of a variable x on y when x and y are jointly influenced by an unobserved confounder u (i.e., x ← u → y). The key requirement is that z should be correlated with x but independent of the confounding variable u (i.e., z → x → y and z ⊥ u). The commonly used two-stage least squares first regresses x on z to obtain x̂, followed by a regression of y on x̂. Because of the independence between z and u, x̂ is also independent of u; thus, in the second regression the confounding impact of u is eliminated. Our goal is to use the two-stage regression trick again to remove the confounding factors impacting the features and concept vectors. The instrumental variable technique can also be used to eliminate biases due to measurement errors (Carroll et al., 2006).

Two-Stage Regression for CBMs. In our causal graph in Figure 2b, the label y can be used to study the relationship between the concepts d and the features x. We predict d as a function of y and use it in place of the concepts in the concept bottleneck models. The graphical model corresponding to this procedure is shown in Figure 2c, where the link u → c is eliminated. In particular, given the independence relationship y ⊥ u, we have d̂(y) = E[c|y] ⊥ h(u). This is the basis for our debiasing method in the next section.

2.3 THE ESTIMATION METHOD

Our estimation uses the observation that in the graph of Figure 2b the label vector y is a valid instrument for removing the correlations due to u. Combining Eqs. (1) and (2), we have c = f1(y) + h(u) + εd. Taking the expectation with respect to p(c|y), we obtain

    E[c|y] = E[f1(y) + h(u) + εd | y] = f1(y) + E[h(u)] + E[εd].        (4)

The last step holds because both u and εd are independent of y. Thus, the last two terms are constant in x and y and can be removed after estimation. Eq. (4) allows us to remove the impact of u and εd and estimate the denoised and debiased d̂(y) = E[c|y]. We find E[c|y] using a neural network trained on the (ci, yi) pairs and use its outputs as pseudo-observations in place of di (a minimal sketch of this step is given at the end of this subsection). Given our debiased predictions of the discriminative concepts di, we can perform the CBM's two steps of x → d and d → y estimation. Because we use expected values of c in place of d during the learning process (i.e., d̂(y) = E[c|y]), the debiased concept vectors have values within the range of the original concept vectors c. Thus, we do not lose human readability with the debiased concept vectors.
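The sketch below illustrates Lines 1–2 of Algorithm 1 under the assumption of categorical labels, in which case E[c|y] reduces to the per-class average of the concept annotations (the shortcut used in Section 4.2). The network, sizes, and variable names are illustrative assumptions, not the paper's exact implementation.

```python
# Lines 1-2 of Algorithm 1 (sketch): build d_hat(y) = E[c|y] and train a concept
# predictor on (x_i, d_hat_i) pairs instead of the raw, possibly confounded (x_i, c_i).
import torch
import torch.nn as nn

def per_class_concept_means(c, y, n_classes):
    """Line 1: with categorical labels, E[c|y=k] is the mean concept vector of class k."""
    d_y = torch.zeros(n_classes, c.shape[1])
    for k in range(n_classes):
        mask = y == k
        if mask.any():                      # classes absent from the sample keep zeros
            d_y[k] = c[mask].mean(dim=0)
    return d_y

n, d_x, d_c, n_classes = 1000, 64, 312, 200
x, c = torch.randn(n, d_x), torch.rand(n, d_c)
y = torch.randint(0, n_classes, (n,))

d_y = per_class_concept_means(c, y, n_classes)
d_pseudo = d_y[y]                           # d_hat_i: debiased pseudo-observations

# Line 2: fit the concept predictor to the debiased pseudo-concepts.
concept_net = nn.Sequential(nn.Linear(d_x, 256), nn.ReLU(), nn.Linear(256, d_c))
opt = torch.optim.Adam(concept_net.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    nn.functional.mse_loss(concept_net(x), d_pseudo).backward()
    opt.step()
```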
Incorporating Uncertainty in Prediction of Concepts. Our empirical observations show that prediction of the concepts from the features can be highly uncertain. Hence, we present a CBM estimator that takes the uncertainty in the prediction of the concepts into account. We take the conditional expectation of the labels y given the features x as follows:

    E[y|x] = E[gθ(d̂)|x] = ∫ gθ(d) dpφ(d = d|x),        (5)

where pφ(d|x) is the probability function, parameterized by φ, that captures the uncertainty in predicting the concepts from the features, and the function gθ(·) predicts the labels from the debiased concepts.

In summary, we perform the steps in Algorithm 1 to learn debiased CBMs. In Line 3, we approximate the integral using a Monte Carlo approach by drawing from the distribution p_φ̂(d|x) estimated in Line 2; see (Hartford et al., 2017). Depending on the data type of the labels, we can use the appropriate loss function and are not limited to the least-squares loss. Line 4 removes the constant mean that can be due to noise or confounding, as discussed after Eq. (4). In the common case of binary concept vectors, we use the debiased concepts estimated in Line 4 to rank the concepts in the order of their contribution to the explanation of the predictions.

3 DISCUSSIONS

Debiasing TCAVs. While we presented our debiasing algorithm for CBMs, we can easily use it to debias TCAV (Kim et al., 2018) explanations too. We can use the first step to remove the bias due to the confounding and perform TCAV using the d̂ vectors instead of the c vectors. The TCAV method is attractive because, unlike CBMs, it analyzes existing neural networks and does not require defining a new model.

Measuring Concept Completeness. Our primary objective in modeling the concept incompleteness phenomenon in our causal prior graph is to show that our debiasing method does not need the concept completeness assumption to work properly. If we are interested in measuring the degree of completeness of the concepts, we can do so based on the definition by Yeh et al. (2020). To this end, we fit an unrestricted neural network q(x) = E[y|x] to the residuals in Line 3 of our debiasing algorithm. The function q(·) captures the residual information in x. We compare the improvement in prediction accuracy over the accuracy in Line 3 to quantify the degree of concept incompleteness. Because we first predict the labels y using the CBM and then fit q(x) to the residuals, we ensure that the predictive information maximally goes through the CBM link x → d → y.

Prior Work on Causal Concept-Based Explanation. Among the existing works on causal concept-based explanation, Goyal et al. (2019) propose a different causal prior graph to model the spurious correlations among the concepts and remove them using conditional variational autoencoders. In contrast, we aim at handling noise and spurious correlations between the features and concepts using the labels as instruments. Which work is more appropriate for a given problem depends on the assumptions underlying that problem.

The Special Linear Gaussian Case. When the concepts have real continuous values, we might use a simple multivariate Gaussian distribution to model p(d|x) = N(x, σI), σ > 0. If we further use a linear regression to predict the labels from the concepts, the steps simplify to the following (illustrated in the sketch after the list):

1. Learn d̂(y) by regressing ci on yi.
2. Learn d̂̂(x) by regressing d̂i on xi.
3. Learn ŷ(d̂̂) by regressing yi on d̂̂i.
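A compact numerical illustration of these three regressions, assuming real-valued labels and ordinary least squares throughout; the dimensions, data-generating matrices, and variable names are illustrative assumptions.

```python
# Three-stage linear-Gaussian sketch: (1) regress c on y, (2) regress d_hat on x,
# (3) regress y on d_hathat. Steps 2-3 mirror sequential CBM training on d_hat.
import numpy as np

rng = np.random.default_rng(0)
n, p = 5000, 20
y = rng.normal(size=(n, p))            # labels, treated as real vectors here
u = rng.normal(size=(n, p))            # unobserved confounder
d = y @ rng.normal(size=(p, p))        # discriminative concepts
c = d + u @ rng.normal(size=(p, p))    # observed (confounded) concepts
x = d @ rng.normal(size=(p, p)) + u @ rng.normal(size=(p, p))  # features

def ols(a, b):
    """Least-squares coefficient matrix mapping a -> b."""
    return np.linalg.lstsq(a, b, rcond=None)[0]

d_hat = y @ ols(y, c)                  # Step 1: d_hat(y) drops the u-driven part of c
d_hathat = x @ ols(x, d_hat)           # Step 2: debiased concept predictor from features
y_hat = d_hathat @ ols(d_hathat, y)    # Step 3: labels predicted from debiased concepts
```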
These steps show that under the linear Gaussian assumptions, Steps 2 and 3 coincide with the sequential bottleneck training of CBMs (Koh et al., 2020). We only need to change the concepts from ci to d̂i to eliminate the noise and confounding bias.

4 EXPERIMENTS

Evaluation of explanation algorithms is notoriously difficult. Thus, we first present experiments with synthetic data to show that our debiasing technique improves the explanation accuracy when we know the true explanations. Then, on the CUB-200-2011 dataset, we use the ROAR (Hooker et al., 2019) framework to show that the debiasing improves the explanation accuracy. Finally, using several examples, we identify the circumstances in which the debiasing helps.

Figure 3: Correlation between the estimated concept vectors and the true discriminative concept vectors as the number of data points grows, for (a) the fully random design and (b) the orthogonal confounding design. Notice the different ranges of the y-axes and the logarithmic scale of the x-axes.

4.1 SYNTHETIC DATA EXPERIMENTS

We create a synthetic dataset according to the following steps (a numpy sketch of this generation process is given at the end of this subsection):

1. Generate n vectors yi ∈ R^100 with elements distributed according to the unit normal distribution N(0, 1).
2. Generate n vectors ui ∈ R^100 with elements distributed according to the unit normal distribution N(0, 1).
3. Generate n vectors εd,i ∈ R^100 with elements distributed according to the scaled normal distribution N(0, σ = 0.02).
4. Generate n vectors εx,i ∈ R^100 with elements distributed according to the scaled normal distribution N(0, σ = 0.02).
5. Generate matrices W1, W2, W3, W4 ∈ R^{100×100} with elements distributed according to the scaled normal distribution N(0, σ = 0.1).
6. Compute di = W1 yi + εd,i for i = 1, . . . , n.
7. Compute ci = di + W2 ui for i = 1, . . . , n.
8. Compute xi = W3 di + W4 ui + εx,i for i = 1, . . . , n.

In Figure 3a, we plot the correlation between the true unconfounded and noiseless concepts W1 y and the concept vectors estimated with the regular two-step procedure (without debiasing) and with our debiasing method, as a function of the sample size n. The results show that the bias due to confounding does not vanish as we increase the sample size, and that our debiasing technique moves the results closer to the true discriminative concepts.

In the ideal case, when the confounding impact is orthogonal to the impact of the labels, our debiasing algorithm recovers the true concepts more accurately. Figure 3b demonstrates the scenario where W1 ⊥ W2, W4 in the synthetic data generation. To generate matrices with this constraint, we first generate a unitary matrix Q and four diagonal matrices Λ1, . . . , Λ4 with diagonal elements drawn from Uniform(0.1, 1). This choice of distribution for the diagonal elements caps the condition number of the matrices at 10. To satisfy the orthogonality constraint, we set the first 50 diagonal elements of Λ2 and Λ4 and the last 50 diagonal elements of Λ1 to zero. We compute the matrices as Wj = Q Λj Qᵀ for j = 1, . . . , 4. The orthogonality allows perfect separation of the impacts of u and y and perfect debiasing by our method as the sample size grows.

In Figure 6 (Appendix A), we repeat the synthetic experiments in a higher-noise regime where the standard deviation of the noise variables εd and εx is set to σ = 0.1. The results confirm our main claim about our debiasing algorithm.
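As referenced above, here is a minimal numpy sketch of the synthetic generation process, covering both the fully random and the orthogonal confounding designs; the function and variable names are illustrative.

```python
# Synthetic data generator following Section 4.1. In the orthogonal design,
# W_j = Q Lambda_j Q^T share eigenvectors but use disjoint spectra so that the
# confounding (W2, W4) is orthogonal to the label impact (W1).
import numpy as np

rng = np.random.default_rng(0)

def generate(n, p=100, sigma_noise=0.02, orthogonal=False):
    y = rng.normal(size=(n, p))
    u = rng.normal(size=(n, p))
    eps_d = rng.normal(scale=sigma_noise, size=(n, p))
    eps_x = rng.normal(scale=sigma_noise, size=(n, p))
    if orthogonal:
        q, _ = np.linalg.qr(rng.normal(size=(p, p)))          # orthogonal Q
        lams = [np.diag(rng.uniform(0.1, 1.0, size=p)) for _ in range(4)]
        lams[1][: p // 2] = 0.0    # zero first half of Lambda_2
        lams[3][: p // 2] = 0.0    # zero first half of Lambda_4
        lams[0][p // 2 :] = 0.0    # zero last half of Lambda_1
        w1, w2, w3, w4 = (q @ lam @ q.T for lam in lams)
    else:
        w1, w2, w3, w4 = (rng.normal(scale=0.1, size=(p, p)) for _ in range(4))
    d = y @ w1.T + eps_d                    # step 6
    c = d + u @ w2.T                        # step 7
    x = d @ w3.T + u @ w4.T + eps_x         # step 8
    return x, c, y, d

x, c, y, d = generate(n=10_000, orthogonal=True)
```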
4.2 CUB-200-2011 EXPERIMENTS

Dataset and preprocessing. We evaluate the performance of the proposed approach on the CUB-200-2011 dataset (Wah et al., 2011). The dataset includes 11788 pictures (in 5994/5794 train/test partitions) of 200 different types of birds, annotated both for the bird type and for 312 different concepts about each picture. The concept annotations are binary, indicating whether the concept exists or not. However, for each annotation, a four-level certainty score has also been assigned: 1: not visible, 2: guessing, 3: probably, and 4: definitely. We combine the binary annotation and the certainty score to create a 7-level ordinal variable as the annotation for each image, as summarized in Table 1. For simplicity, we map the 7-level ordinal values to uniformly spaced values in the [0, 1] interval. We randomly choose 15% of the training set and hold it out as the validation set.

Table 1: Mapping the concept annotations to real values.

Annotation      Certainty     Ordinal Score   Numeric Map
Doesn't Exist   definitely    0               0
Doesn't Exist   probably      1               1/6
Doesn't Exist   guessing      2               2/6
Doesn't Exist   not visible   3               3/6
Exists          not visible   3               3/6
Exists          guessing      4               4/6
Exists          probably      5               5/6
Exists          definitely    6               1

The result in Figure 1. To compare the association strength between y and c with the association strength between x and c, we train two predictors of the concepts, ĉ(x) and ĉ(y). We use TorchVision's pre-trained ResNet152 network (He et al., 2016) for the prediction of the concepts from the images. Because the labels y are categorical, ĉ(y) is simply the average concept annotation score per class. We use the Spearman correlation to measure the association strengths in the pairs (ĉ(x), c) and (ĉ(y), c), because the concept annotations are ordinal numbers. The concept ids on the x-axis are sorted in terms of increasing values of ρ(ĉ(y), c). The top ten concepts with the largest values of ρ(ĉ(x), c) − ρ(ĉ(y), c) are "has back color::green", "has upper tail color::green", "has upper tail color::orange", "has upper tail color::pink", "has back color::rufous", "has upper tail color::purple", "has back color::pink", "has upper tail color::iridescent", "has back color::purple", and "has back color::iridescent". These concepts are all related to color and can easily be confounded by the context of the images.

Training details for Algorithm 1. We model the distribution of the concept logits as Gaussians with means equal to the ResNet152's logit outputs and a diagonal covariance matrix. We estimate the variance for each concept using the logits of the true concept annotation scores, which are clamped into [0.05, 0.95] to avoid large logit values. In each iteration of the training loop for Line 3, we draw 25 samples from the estimated p(d|x) (a sketch of this sampling step is given below). The predictor of labels from concepts (the function g(·) in Eq. (5)) is a three-layer feed-forward neural network with hidden layer sizes (312, 312, 200). There is a skip connection from the input to the penultimate layer. All algorithms are trained with the Adam optimization algorithm (Kingma & Ba, 2014).
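The following is a hedged PyTorch sketch of Line 3 of Algorithm 1 as described above: treat the concept logits as Gaussian around the concept network's outputs with a fixed diagonal variance, draw samples, and average the label loss over the samples as a Monte Carlo estimate of the integral in Eq. (5). The layer sizes follow the text; everything else (including whether gθ consumes sampled logits or their sigmoids) is an illustrative assumption.

```python
# Monte Carlo training objective for the label head g_theta under Gaussian concept-logit uncertainty.
import torch
import torch.nn as nn

class LabelHead(nn.Module):
    """g_theta: three-layer feed-forward net with a skip connection into the penultimate layer."""
    def __init__(self, d_c=312, n_classes=200):
        super().__init__()
        self.fc1 = nn.Linear(d_c, 312)
        self.fc2 = nn.Linear(312, 312)
        self.out = nn.Linear(312 + d_c, n_classes)   # skip connection via concatenation

    def forward(self, d):
        h = torch.relu(self.fc2(torch.relu(self.fc1(d))))
        return self.out(torch.cat([h, d], dim=-1))

def mc_label_loss(logit_mean, logit_std, y, g_theta, n_samples=25):
    """Monte Carlo estimate of E_{d ~ p(d|x)}[loss(g_theta(d), y)]."""
    losses = []
    for _ in range(n_samples):
        d = logit_mean + logit_std * torch.randn_like(logit_mean)   # sample concept logits
        concepts = torch.sigmoid(d)   # map back to [0, 1] concept scores (one plausible choice)
        losses.append(nn.functional.cross_entropy(g_theta(concepts), y))
    return torch.stack(losses).mean()

g = LabelHead()
logit_mean = torch.randn(8, 312)      # e.g. concept-logit outputs for a batch of images
logit_std = torch.full((312,), 0.5)   # per-concept standard deviation (illustrative value)
y = torch.randint(0, 200, (8,))
loss = mc_label_loss(logit_mean, logit_std, y, g)
loss.backward()
```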
Quantitative Results. Compared to the baseline algorithm, our debiasing technique increases the average Spearman correlation between ĉ(x) and ĉ(y) from 0.406 to 0.508. For the ten concepts listed above, our algorithm increases the average Spearman correlation from 0.283 to 0.389. Our debiasing algorithm also improves generalization in the prediction of the image labels: the debiasing improves the top-5 accuracy of predicting the labels from 39.5% to 49.3%.

To show that our proposed debiasing accurately ranks the concepts in terms of their explanation of the predictions, we use the RemOve And Retrain (ROAR) framework (Hooker et al., 2019). In the ROAR framework, we sort the concepts using the scores E[d|xi] − (1/n) Σᵢ E[d|xi] in ascending order. Then, we mask (set to zero) the least explanatory x% of the concepts according to the scores and retrain the gθ(d) function (a code sketch of this masking step is given at the end of this subsection). We perform this procedure for x ∈ {0, 10, 20, . . . , 80, 90, 95} and record the testing top-5 accuracy of predicting the labels y. We repeat the ROAR experiments 3 times and report the average accuracy as we vary the masking percentage.

Figure 4: The ROAR evaluation: We mask x% of the concepts that are identified by the methods as less explanatory of the labels and retrain the gθ(·) function. We measure the change in the accuracy of predicting the labels y as we increase the masking percentage. For better comparison of the trends, we have normalized them by their first data point (x = 0%).

Figure 4 shows the ROAR evaluation of the regular and debiased algorithms. Because the debiased algorithm is more accurate, for easier comparison we normalize both curves by dividing them by their accuracy at masking percentage x = 0%. An immediate observation is that the plot for the debiased algorithm stays above the regular one, which is a clear indication of its superior performance in identifying the least explanatory concepts. The results show several additional interesting insights too. First, the prediction of the bird species largely relies on a sparse set of concepts, as we can mask 95% of the concepts and still obtain a decent accuracy. Second, masking a small percentage of irrelevant concepts reduces the noise in the features and improves the generalization performance of both algorithms; our debiased algorithm is more successful here, finding the noisy concepts faster (before x = 20% masking). Finally, the debiased accuracy curve is less steep after x = 80%, which again indicates its success in finding the most explanatory concepts.
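A sketch of the ROAR-style masking step referenced above: rank the concepts by the centered debiased scores and zero out the least explanatory fraction before retraining the label head. The original text does not fully specify the masking granularity, so this sketch masks per example; the helper names are illustrative and the retraining call is a placeholder.

```python
# ROAR-style concept masking (sketch): zero the fraction of concepts with the
# smallest centered debiased scores; a label head would then be retrained on the result.
import torch

def roar_mask(d_hat, mask_fraction):
    """Zero out, per example, the mask_fraction of concepts with the smallest centered score."""
    centered = d_hat - d_hat.mean(dim=0, keepdim=True)   # E[d|x_i] - (1/n) sum_i E[d|x_i]
    n_mask = int(mask_fraction * d_hat.shape[1])
    order = centered.argsort(dim=1)                      # ascending: least explanatory first
    masked = d_hat.clone()
    masked.scatter_(1, order[:, :n_mask], 0.0)
    return masked

d_hat = torch.rand(16, 312)                              # debiased concept predictions for a batch
for frac in (0.0, 0.2, 0.5, 0.95):
    masked = roar_mask(d_hat, frac)
    # retrain_label_head(masked, y) and evaluate top-5 accuracy here (placeholders).
```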
Figure 5: Twelve example images where the debiasing helps. A common pattern is that the image context has either prevented or misled the annotator from accurately annotating the concepts. From left to right, the birds are Brandt Cormorant, Pelagic Cormorant, Fish Crow, Fish Crow, Fish Crow, Ivory Gull, Ivory Gull, Green Violetear, Green Violetear, Cape Glossy Starling, Northern Waterthrush, Northern Waterthrush.

Qualitative analysis of the results. In Figure 5, we show 12 images for which d̂ and c are significantly different. A common pattern among the examples is that the context of the image does not allow accurate annotation by the annotators. In images 3, 4, 5, 6, 7, 11, and 12 in Figure 5, the ten color-related concepts listed at the beginning of this section are all set to 0.5, indicating that the annotators have failed in the annotation. However, our algorithm correctly identifies that, for example, Ivory Gulls do not have green-colored backs by predicting ĉ = 0.08, which is closer to ĉ(y) = 0.06 than the annotated c = 0.5. Another pattern is the impact of the color of the environment on the accuracy of the annotations. For example, the second image from the left is an image of a Pelagic Cormorant, whose back and upper tail colors are unlikely to be green, with per-class averages of 0.12 and 0.07, respectively. However, because of the color of the image and the reflections, the annotator has assigned 1.0 to both the "has back color::green" and "has upper tail color::green" concepts. Our algorithm predicts 0.11 and 0.16 for these two concepts, respectively, which are closer to the per-class averages.

Table 2: Examples of differences between the regular and debiased algorithms in ranking the concepts. Each block corresponds to one example image (images omitted) and lists the top 15 concepts under each method.

Example 1
Debiased: has throat color::black, has head pattern::plain, has forehead color::black, has breast color::black, has underparts color::black, has nape color::black, has crown color::black, has primary color::black, has bill color::black, has belly color::black, has wing color::orange, has breast pattern::solid, has upperparts color::orange, has wing pattern::multi-colored, has bill length::about the same as head
Regular: has primary color::black, has wing color::black, has throat color::black, has upperparts color::black, has breast color::black, has primary color::blue, has underparts color::black, has belly color::black, has back color::black, has nape color::black, has upperparts color::blue, has tail pattern::solid, has crown color::blue, has under tail color::black, has forehead color::blue

Example 2
Debiased: has primary color::red, has crown color::red, has forehead color::red, has throat color::red, has nape color::red, has breast color::red, has underparts color::red, has belly color::red, has forehead color::rufous, has upperparts color::red, has crown color::rufous, has nape color::rufous, has wing pattern::multi-colored, has primary color::rufous, has throat color::rufous
Regular: has underparts color::grey, has breast color::grey, has belly color::grey, has belly pattern::multi-colored, has breast pattern::multi-colored, has nape color::grey, has bill length::shorter than head, has breast color::red, has throat color::grey, has upperparts color::red, has back pattern::multi-colored, has underparts color::red, has primary color::grey, has belly color::red, has throat color::red

Example 3
Debiased: has bill length::about the same as head, has belly pattern::spotted, has underparts color::black, has bill shape::dagger, has breast color::black, has belly color::black, has breast pattern::spotted, has back pattern::spotted, has wing pattern::spotted, has nape color::red, has back color::black, has tail pattern::spotted, has under tail color::black, has upper tail color::black, has primary color::buff
Regular: has primary color::brown, has wing color::brown, has upperparts color::brown, has crown color::brown, has back color::brown, has forehead color::brown, has nape color::brown, has throat color::white, has breast pattern::spotted, has under tail color::brown, has breast color::white, has belly color::white, has underparts color::white, has upper tail color::brown, has breast color::brown

Example 4
Debiased: has shape::duck-like, has bill shape::spatulate, has size::medium (9 - 16 in), has bill length::about the same as head, has throat color::buff, has underparts color::brown, has breast color::brown, has belly pattern::spotted, has crown color::brown, has primary color::brown, has belly color::brown, has nape color::brown, has forehead color::brown, has upperparts color::brown, has belly color::purple
Regular: has belly color::grey, has underparts color::grey, has breast color::grey,
has belly color::pink, has belly color::rufous, has underparts color::purple, has belly color::purple, has underparts color::pink, has throat color::grey, has primary color::grey, has belly color::green, has underparts color::rufous, has underparts color::green, has belly color::iridescent, has forehead color::grey

Example 5
Debiased: has throat color::black, has forehead color::black, has crown color::black, has primary color::black, has nape color::black, has breast color::red, has belly color::white, has underparts color::white, has underparts color::red, has back color::black, has primary color::white, has bill shape::cone, has breast pattern::multi-colored, has primary color::red, has upperparts color::black
Regular: has nape color::black, has primary color::black, has nape color::rufous, has primary color::red, has wing color::orange, has nape color::red, has breast color::red, has crown color::black, has upperparts color::orange, has crown color::red, has underparts color::white, has upperparts color::rufous, has forehead color::black, has throat color::red, has back color::purple

Example 6
Debiased: has wing color::blue, has upperparts color::blue, has crown color::rufous, has primary color::blue, has bill color::rufous, has back color::blue, has under tail color::blue, has bill color::red, has upper tail color::blue, has wing pattern::multi-colored, has nape color::rufous, has crown color::brown, has belly color::brown, has nape color::brown, has underparts color::brown
Regular: has primary color::red, has throat color::red, has underparts color::red, has breast color::red, has forehead color::red, has crown color::rufous, has crown color::red, has nape color::red, has belly color::red, has primary color::rufous, has throat color::rufous, has forehead color::rufous, has wing color::red, has nape color::rufous, has underparts color::rufous

In Table 2, we list six examples to show the superior accuracy of the debiased CBM in ranking the concepts in terms of their explanatory power. Moreover, in Table 3 in the appendix, we list the top and bottom concepts in terms of their predictive uncertainty in our debiased CBM.

5 CONCLUSIONS AND FUTURE WORK

Studying concept-based explanation techniques, we provided evidence for the potential existence of spurious associations between the features and concepts due to unobserved latent variables or noise. We proposed a new causal prior graph that models the impact of the noise and latent confounding on the estimated concepts. We showed that, using the labels as instruments, we can remove the impact of the context from the explanations. Our experiments showed that our debiasing technique not only improves the quality of the explanations, but also improves the accuracy of predicting the labels through the concepts. As future work, we will investigate other two-stage regression techniques to find the most accurate debiasing method.

REFERENCES

Lennart Brocki and Neo Christopher Chung. Concept Saliency Maps to Visualize Relevant Features in Deep Generative Models. In ICMLA, 2019.

Carrie J Cai, Emily Reif, Narayan Hegde, Jason Hipp, Been Kim, Daniel Smilkov, Martin Wattenberg, Fernanda Viegas, Greg S Corrado, Martin C Stumpe, et al. Human-centered tools for coping with imperfect algorithms during medical decision-making. In CHI, 2019.

Raymond J Carroll, David Ruppert, Leonard A Stefanski, and Ciprian M Crainiceanu. Measurement error in nonlinear models: a modern perspective. CRC Press, 2006.
James R Clough, Ilkay Oksuz, Esther Puyol-Antón, Bram Ruijsink, Andrew P King, and Julia A Schnabel. Global and local interpretability for cardiac MRI classification. In MICCAI, 2019.

Amirata Ghorbani, James Wexler, James Y Zou, and Been Kim. Towards automatic concept-based explanations. In NeurIPS, pp. 9273–9282, 2019.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT Press, 2016.

Yash Goyal, Uri Shalit, and Been Kim. Explaining Classifiers with Causal Concept Effect (CaCE). arXiv:1907.07165, 2019.

Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pp. 124–132. Springer, 2018.

Mandana Hamidi-Haines, Zhongang Qi, Alan Fern, Fuxin Li, and Prasad Tadepalli. Interactive Naming for Explaining Deep Neural Networks: A Formative Study. arXiv:1812.07150, 2018.

Jason Hartford, Greg Lewis, Kevin Leyton-Brown, and Matt Taddy. Deep IV: A flexible approach for counterfactual prediction. In ICML, pp. 1414–1423, 2017.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.

Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, and Been Kim. A benchmark for interpretability methods in deep neural networks. In NeurIPS, 2019.

Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). In ICML, pp. 2668–2677, 2018.

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.

Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept Bottleneck Models. In ICML, 2020.

Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. arXiv:1907.10882, 2019.

Judea Pearl. Causality. Cambridge University Press, 2009.

Conner Sprague, Eric B Wendoloski, and Ingrid Guch. Interpretable AI for Deep Learning Based Meteorological Applications. In American Meteorological Society Annual Meeting. AMS, 2019.

James H Stock. Instrumental variables in statistics and econometrics. International Encyclopedia of the Social & Behavioral Sciences, 2015.

C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011.

Chih-Kuan Yeh, Been Kim, Sercan O Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On Completeness-aware Concept-Based Explanations in Deep Neural Networks. In NeurIPS, 2020.

A HIGH-NOISE SYNTHETIC EXPERIMENTS

Figure 6: Correlation between the estimated concept vectors and the true discriminative concept vectors as the number of data points grows, in the higher-noise regime, for (a) the fully random design and (b) the orthogonal confounding design. Notice the different ranges of the y-axes and the logarithmic scale of the x-axes.
Table 3: The top 10 most and least uncertain concepts identified by the debiased algorithm.

Most Uncertain: (194) has nape color::black, (260) has primary color::black, (164) has forehead color::black, (7) has bill shape::all-purpose, (132) has throat color::black, (305) has crown color::black, (133) has throat color::white, (150) has bill length::about the same as head, (8) has bill shape::cone, (152) has bill length::shorter than head

Least Uncertain: (216) has wing shape::tapered-wings, (215) has wing shape::broad-wings, (217) has wing shape::long-wings, (214) has wing shape::pointed-wings, (77) has tail shape::fan-shaped tail, (213) has wing shape::rounded-wings, (79) has tail shape::squared tail, (83) has upper tail color::purple, (82) has upper tail color::iridescent, (75) has tail shape::rounded tail