# Decisions, Counterfactual Explanations and Strategic Behavior

Stratis Tsirtsis, Max Planck Institute for Software Systems, Kaiserslautern, Germany (stsirtsis@mpi-sws.org)
Manuel Gomez-Rodriguez, Max Planck Institute for Software Systems, Kaiserslautern, Germany (manuelgr@mpi-sws.org)

34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.

**Abstract.** As data-driven predictive models are increasingly used to inform decisions, it has been argued that decision makers should provide explanations that help individuals understand what would have to change for these decisions to be beneficial ones. However, there has been little discussion on the possibility that individuals may use the above counterfactual explanations to invest effort strategically and maximize their chances of receiving a beneficial decision. In this paper, our goal is to find policies and counterfactual explanations that are optimal in terms of utility in such a strategic setting. We first show that, given a pre-defined policy, the problem of finding the optimal set of counterfactual explanations is NP-hard. Then, we show that the corresponding objective is monotone (non-decreasing) and submodular, which allows a standard greedy algorithm to enjoy approximation guarantees. In addition, we show that the problem of jointly finding both the optimal policy and set of counterfactual explanations reduces to maximizing a non-monotone submodular function. As a result, we can use a recent randomized algorithm to solve the problem, which also offers approximation guarantees. Finally, we demonstrate that, by incorporating a matroid constraint into the problem formulation, we can increase the diversity of the optimal set of counterfactual explanations and incentivize individuals across the whole spectrum of the population to self-improve. Experiments on synthetic and real lending and credit card data illustrate our theoretical findings and show that the counterfactual explanations and decision policies found by our algorithms achieve higher utility than several competitive baselines.

## 1 Introduction

Whenever a bank decides to offer a loan to a customer, a university decides to admit a prospective student, or a company decides to hire a new employee, the decision is increasingly informed by a data-driven predictive model. In all these high-stakes applications, the goal of the predictive model is to provide accurate predictions of the outcomes from a set of observable features, while the goal of the decision maker is to take decisions that maximize a given utility function. For example, in university admissions, the predictive model may estimate the ability of each prospective student to successfully complete the graduate program, while the decision maker may weigh the model's estimate against other socio-economic considerations (e.g., the number of available scholarships, diversity commitments).

In this context, there has been tremendous excitement about the potential of data-driven predictive models to enhance decision making in high-stakes applications. However, there has also been a heated debate about their lack of transparency and explainability [1–5]. As a result, there already exists a legal requirement to grant individuals who are subject to (semi-)automated decision making the right to explanation in the European Union [6, 7].
With this motivation, there has been a flurry of work on interpretable machine learning [8–16], which has predominantly focused on developing methods to find explanations for the predictions made by a predictive model. Within this line of work, the work most closely related to ours [12, 13, 15, 16] aims to find counterfactual explanations that help individuals understand what would have to change for a predictive model to make a positive prediction about them. However, none of these works distinguish between decisions and predictions and, consequently, they cannot be readily used to provide explanations for the decisions taken by a decision maker, which are ultimately what individuals who are subject to (semi-)automated decision making typically care about.

In our work, we build upon a recent line of work that explicitly distinguishes between predictions and decisions [17–22] and pursue the development of methods to find counterfactual explanations for the decisions taken by a decision maker who is assisted by a data-driven predictive model. These counterfactual explanations help individuals understand what would have to change in order to receive a beneficial decision, rather than a positive prediction. Moreover, once we focus on explaining decisions, we cannot overlook the possibility that individuals may use these explanations to invest effort strategically in order to maximize their chances of receiving a beneficial decision. However, this is also an opportunity to find counterfactual explanations that help individuals self-improve and eventually increase the utility of a decision policy, as noted by several studies in economics [23–25] and, more recently, in the computer science literature [21, 26, 27]. For example, if a bank explains to a customer that she will receive the loan she is applying for if she reduces her credit card debt by 20%, she may feel compelled to reduce her overall credit card debt by the proposed percentage to pay less interest, improving her financial situation; this will eventually increase the profit the bank makes when she is able to successfully return the loan. This is in contrast with previous work on interpretable machine learning, which has ignored the influence that (counterfactual) explanations of predictions made by a predictive model may have on the accuracy of predictive models and the utility of decision policies (refer to Appendix A for a discussion of further related work).

**Our contributions.** We cast the above problem as a Stackelberg game in which the decision maker moves first and shares her counterfactual explanations before individuals best respond to these explanations and invest effort to receive a beneficial decision. In this context, we assume that the decision maker takes decisions based on low-dimensional feature vectors since, in many realistic scenarios, the data is summarized by just a small number of summary statistics (e.g., FICO scores) [28, 29]. Under this problem formulation, we first show that, given a pre-defined policy, the problem of finding the optimal set of counterfactual explanations is NP-hard, using a novel reduction from the Set Cover problem [30]. Then, we show that the corresponding objective function is monotone and submodular and, as a direct consequence, a standard greedy algorithm offers approximation guarantees.
In addition, we show that, given a pre-defined set of counterfactual explanations, the optimal policy is deterministic and can be computed in polynomial time. Moreover, building on this result, we can reduce the problem of jointly finding both the optimal policy and set of counterfactual explanations to maximizing a non-monotone submodular function. As a consequence, we can use a recent randomized algorithm to solve the problem, which also offers approximation guarantees. Further, we demonstrate that, by incorporating a matroid constraint into the problem formulation, we can increase the diversity of the optimal set of counterfactual explanations and incentivize individuals across the whole spectrum of the population to self-improve. Experiments using real lending and credit card data illustrate our theoretical findings and show that the counterfactual explanations and decision policies found by the above algorithms achieve higher utility than several competitive baselines (an open-source implementation can be found at https://github.com/Networks-Learning/strategic-decisions).

## 2 Problem Formulation

Given an individual with a feature vector $x \in \{1, \ldots, n\}^d$ and a (ground-truth) label $y \in \{0, 1\}$, we assume a decision $d(x) \in \{0, 1\}$ controls whether the corresponding label is realized. Without loss of generality, we assume each feature takes $n$ different values; this setting fits a variety of real-world scenarios, where continuous features are often discretized into (percentile) ranges. For example, in university admissions, the decision specifies whether a student is admitted ($d(x) = 1$) or rejected ($d(x) = 0$); the label indicates whether the student completes the program ($y = 1$) or drops out ($y = 0$) upon acceptance; and the feature vector $x$ may include her GRE scores, undergraduate GPA percentile, or research experience. Throughout the paper, we will denote the set of feature values as $\mathcal{X} = \{x_1, x_2, \ldots, x_m\}$, where $m = n^d$ denotes the number of feature values, and assume that the number of features $d$ is small, as discussed previously.

Each decision is sampled from a decision policy, $d(x) \sim \pi(d \mid x)$, where, for brevity, we will write $\pi(x) = \pi(d = 1 \mid x)$. For each individual, the label $y$ is sampled from a conditional probability distribution $y \sim P(y \mid x)$ and, without loss of generality, we index the feature values in decreasing order with respect to their corresponding outcome, i.e., $i < j \Rightarrow P(y = 1 \mid x_i) \geq P(y = 1 \mid x_j)$. Moreover, we adopt a Stackelberg game-theoretic formulation in which each individual with initial feature value $x_i$ receives a (counterfactual) explanation from the decision maker by means of a feature value $E(x_i) \in A \subseteq P_\pi := \{x \in \mathcal{X} : \pi(x) = 1\}$ before she (best-)responds. (In practice, individuals with initial feature values $x_i$ such that $\pi(x_i) = 1$ may not receive any explanation, since they are guaranteed to receive a positive decision.) This formulation fits a variety of real-world applications. For example, insurance companies often provide online car insurance simulators that, on the basis of a customer's initial feature value $x_i$, let the customer know whether they are eligible for a particular deal. In case the customer does not qualify, the simulator could provide a counterfactual example $E(x_i)$ under which the individual is guaranteed to be eligible. In the remainder, we will refer to $A$ as the set of counterfactual explanations and, for each individual with initial feature value $x_i$, we will assume she knows neither the counterfactual explanations $A \setminus E(x_i)$ other individuals may receive nor the decision policy $\pi(x)$.
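To make the discrete setup concrete, the following minimal Python sketch (illustrative only; the toy dimensions, the random outcome model and the array names `py1` and `px` are assumptions, not the paper's implementation) enumerates the feature values in $\{1, \ldots, n\}^d$ and indexes them in decreasing order of $P(y = 1 \mid x)$, as described above.

```python
# Minimal sketch of the discrete problem setup (illustrative assumptions only):
# features take values in {1, ..., n}^d and feature values are indexed in
# decreasing order of their outcome P(y = 1 | x).
import itertools
import numpy as np

n, d = 4, 2                      # toy dimensions, so m = n**d = 16 feature values
rng = np.random.default_rng(0)

feature_values = list(itertools.product(range(1, n + 1), repeat=d))  # the set X
m = len(feature_values)          # m = n**d

# Hypothetical ground-truth quantities; in practice they are estimated from data.
py1 = rng.uniform(size=m)        # P(y = 1 | x) for each feature value
px = rng.dirichlet(np.ones(m))   # original feature distribution P(x)

# Index feature values so that i < j  =>  P(y = 1 | x_i) >= P(y = 1 | x_j).
order = np.argsort(-py1)
feature_values = [feature_values[i] for i in order]
py1, px = py1[order], px[order]

print(feature_values[0], py1[0])  # x_1, the feature value with the highest outcome
```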
Now, let $c(x_i, E(x_i))$ be the cost an individual pays for changing from $x_i$ to $E(x_i)$ and $b(\pi, x) = \mathbb{E}_{d \sim \pi(d \mid x)}[d]$ be the (immediate) benefit she obtains from a policy $\pi$, which is just the probability that the individual receives a positive decision. (In practice, the cost for each pair of feature values may be given by a parameterized function.) Then, following Tabibian et al. [21], each individual's best response is to change from her initial feature value $x_i$ to $E(x_i)$ iff the gained benefit outweighs the cost she would pay for changing features, i.e., $E(x_i) \in \{x_j \in \mathcal{X} : b(\pi, x_j) - c(x_i, x_j) \geq b(\pi, x_i)\} := R(x_i)$, and it is to keep her initial feature value $x_i$ otherwise. Here, we will refer to $R(x_i)$ as the region of adaptation. At the population level, the above best response results in a transportation of mass between the original feature distribution $P(x)$ and a new feature distribution $P(x \mid \pi, A)$ induced by the policy $\pi$ and the counterfactual explanations $A$. More specifically, we can readily derive an analytical expression for the induced feature distribution in terms of the original feature distribution, i.e., for all $x_j \in \mathcal{X}$,

$$P(x_j \mid \pi, A) = P(x_j)\,\mathbb{I}\big(R(x_j) \cap A = \emptyset\big) + \sum_{i \in [m]} P(x_i)\,\mathbb{I}\big(E(x_i) = x_j \wedge x_j \in R(x_i)\big).$$

Similarly as in previous work [17, 18, 21, 22], we will assume that the decision maker is rational, has access to (an estimate of) the original feature distribution $P(x)$, and aims to maximize the (immediate) utility $u(\pi, A)$, which is the expected overall profit she obtains, i.e.,

$$u(\pi, A) = \mathbb{E}_{x \sim P(x \mid \pi, A),\, y \sim P(y \mid x),\, d \sim \pi(d \mid x)}\left[ y\,d - \gamma\,d \right] = \mathbb{E}_{x \sim P(x \mid \pi, A)}\left[ \pi(x)\big(P(y = 1 \mid x) - \gamma\big) \right], \tag{1}$$

where $\gamma \in (0, 1)$ is a given constant reflecting economic considerations of the decision maker. For example, in university admissions, the term $\pi(x) P(y = 1 \mid x)$ is proportional to the expected number of students who are admitted and complete the program, the term $\pi(x)\gamma$ is proportional to the number of students who are admitted, and $\gamma$ measures the cost of education in units of graduated students. As a direct consequence, given a feature value $x_i$ and a set of counterfactual explanations $A$, we can conclude that, if $R(x_i) \cap A \neq \emptyset$, the decision maker will provide the counterfactual explanation $E(x_i)$ that yields the largest utility gain under the assumption that individuals best respond, i.e.,

$$E(x_i) = \operatorname*{argmax}_{x \in A \cap R(x_i)} P(y = 1 \mid x) \quad \text{for all } x_i \in \mathcal{X} \setminus P_\pi \text{ such that } R(x_i) \cap A \neq \emptyset, \tag{2}$$

and, if $R(x_i) \cap A = \emptyset$, we arbitrarily assume that $E(x_i) = \operatorname*{argmin}_{x \in A} c(x_i, x)$; in that case, the individual's best response is to keep her initial feature value $x_i$ and thus any choice of counterfactual explanation $E(x_i)$ leads to the same utility. Given the above preliminaries, our goal is to help the decision maker first find the optimal set of counterfactual explanations $A$ for a pre-defined policy, in Section 3, and then both the optimal policy $\pi$ and set of counterfactual explanations $A$, in Section 4.
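The following sketch is a minimal, self-contained Python illustration of the above quantities under a toy instance (the array names `px`, `py1`, `cost` and the random instance are assumptions, not the authors' implementation): it computes the region of adaptation $R(x_i)$, assigns explanations according to Eq. (2), derives the induced distribution $P(x \mid \pi, A)$ and evaluates the utility of Eq. (1).

```python
import numpy as np

rng = np.random.default_rng(0)
m, gamma = 8, 0.5                         # toy instance: m feature values, decision cost gamma
py1 = np.sort(rng.uniform(size=m))[::-1]  # P(y = 1 | x_i), indexed in decreasing order
px = rng.dirichlet(np.ones(m))            # original feature distribution P(x)
cost = rng.uniform(0.1, 1.5, size=(m, m)) # c(x_i, x_j); any non-negative cost matrix works
np.fill_diagonal(cost, 0.0)

def region_of_adaptation(i, pi, cost):
    """R(x_i) = {x_j : b(pi, x_j) - c(x_i, x_j) >= b(pi, x_i)}, with b(pi, x) = pi(x)."""
    return {j for j in range(len(pi)) if pi[j] - cost[i, j] >= pi[i]}

def assign_explanations(A, pi, cost, py1):
    """Eq. (2): give each individual the reachable explanation with the highest outcome."""
    E = {}
    for i in range(len(pi)):
        reachable = [j for j in A if j in region_of_adaptation(i, pi, cost)]
        if reachable:
            E[i] = max(reachable, key=lambda j: py1[j])
    return E

def utility(A, pi, cost, px, py1, gamma):
    """u(pi, A) under best responses: individuals move to E(x_i) iff it lies in R(x_i)."""
    E = assign_explanations(A, pi, cost, py1)
    induced = px.copy()
    for i, j in E.items():                # transportation of mass from x_i to E(x_i)
        induced[i] -= px[i]
        induced[j] += px[i]
    return float(np.sum(induced * pi * (py1 - gamma)))

pi = (py1 >= gamma).astype(float)         # e.g., the threshold policy of Eq. (6)
A = [j for j in range(m) if pi[j] == 1][:2]
print(utility(A, pi, cost, px, py1, gamma))
```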
**Remarks.** Given an individual with initial feature value $x$, one may think that, by providing the counterfactual explanation $E(x) \in A \cap R(x)$ that gives the largest utility gain, the decision maker is not acting in the individual's best interest but rather selfishly, because there may exist another counterfactual explanation $E_m(x) \in A \cap R(x)$ with lower cost for the individual, i.e., $c(x, E_m(x)) \leq c(x, E(x))$. In our work, we argue that the provided counterfactual explanations help the individual achieve a greater self-improvement, and this is likely to result in superior long-term well-being, as illustrated in Figure 7(c) in Appendix E. For example, consider a bank issuing credit cards that wants to maintain credit for trustworthy customers and incentivize the riskier ones to improve their financial status. In this case, $E(x)$ is the explanation that maximally improves the financial status of the individual, making repayment more likely, but requires her to pay a larger (immediate) cost. In contrast, $E_m(x)$ is an alternative explanation that requires the individual to pay a smaller (immediate) cost but, in comparison with $E(x)$, would result in a higher risk of default. In this context, note that the individual would be willing to pay the cost of following either $E(x)$ or $E_m(x)$, since both explanations lie within the region of adaptation $R(x)$. We refer the interested reader to Appendix F.2 for an anecdotal real-world example of $E(x)$ and $E_m(x)$.

As argued very recently [21, 26, 31], due to Goodhart's law, the conditional probability $P(y \mid x)$ may change after individuals best respond if the true causal effect between the observed features $x$ and the outcome variable $y$ is partially described by unobserved features. Moreover, Miller et al. [31] have argued that, to distinguish between gaming and improvement, it is necessary to have access to the full underlying causal graph between the features and the outcome variable. In this work, for simplicity, we assume that $P(y \mid x)$ does not change; however, it would be very interesting to lift this assumption in future work.

## 3 Finding the optimal counterfactual explanations for a policy

In this section, our goal is to find the optimal set of counterfactual explanations $A$ for a pre-defined policy $\pi$, i.e.,

$$A^* = \operatorname*{argmax}_{A \subseteq P_\pi \,:\, |A| \leq k} u(\pi, A), \tag{3}$$

where the cardinality constraint on the set of counterfactual explanations balances the decision maker's obligation to be transparent with trade secrets [32]. More specifically, note that, without this constraint, an adversary could reverse-engineer the entire decision policy $\pi(x)$ by impersonating individuals with different feature values $x$ [33]. As will become clearer in the experimental evaluation in Section 6, our results may persuade decision makers to be transparent about their decision policies, something they are typically reluctant to be despite increasing legal requirements, since we show that transparency increases the utility of the policies. Moreover, throughout this section, we will assume that the decision maker who picks the pre-defined policy is rational and the policy is outcome monotonic [21]. Rationality implies that, for all $x \in \mathcal{X}$ such that $P(y = 1 \mid x) < \gamma$, it holds that $\pi(x) = 0$, since such feature values contribute negatively to the utility in Eq. (1). A policy $\pi$ is called outcome monotonic if $P(y = 1 \mid x_i) \geq P(y = 1 \mid x_j) \Rightarrow \pi(x_i) \geq \pi(x_j)$ for all $x_i, x_j \in \mathcal{X}$, i.e., the higher an individual's outcome $P(y = 1 \mid x)$, the higher her chances of receiving a positive decision. (If the policy $\pi$ is deterministic, our results also hold for non outcome monotonic policies.) Unfortunately, using a novel reduction from the Set Cover problem [30], the following theorem reveals that we cannot expect to find the optimal set of counterfactual explanations in polynomial time (proven in Appendix B.1):

**Theorem 1** The problem of finding the optimal set of counterfactual explanations that maximizes utility under a cardinality constraint is NP-hard.
Even though Theorem 1 is a negative result, we now show that the objective function in Eq. (3) satisfies a set of desirable properties, namely non-negativity, monotonicity and submodularity, which allow a standard greedy algorithm to enjoy approximation guarantees at solving the problem. (A function $f : 2^{\mathcal{X}} \rightarrow \mathbb{R}$ is submodular if, for every $A, B \subseteq \mathcal{X}$ with $A \subseteq B$ and every $x \in \mathcal{X} \setminus B$, it holds that $f(A \cup \{x\}) - f(A) \geq f(B \cup \{x\}) - f(B)$.) To this aim, with a slight abuse of notation, we first express the objective function as a set function $f(A) = u(\pi, A)$, which takes values over the ground set of counterfactual explanations $P_\pi$. Then, we have the following proposition (proven in Appendix B.2):

**Proposition 2** The function $f$ is non-negative, submodular and monotone.

The above result directly implies that the standard greedy algorithm [34] for maximizing a non-negative, submodular and monotone function will find a solution $A$ to the problem such that $f(A) \geq (1 - 1/e)\,f(A^*)$, where $A^*$ is the optimal set of counterfactual explanations. The algorithm starts from a solution set $A = \emptyset$ and iteratively adds to $A$ the counterfactual explanation $x \in P_\pi \setminus A$ that provides the maximum marginal difference $f(A \cup \{x\}) - f(A)$. Algorithm 1 in Appendix C provides a pseudocode implementation of the algorithm. Finally, since the greedy algorithm computes the marginal difference of $f$ for at most $m$ elements per iteration and, following from the proof of Proposition 2, the marginal difference $f(A \cup \{x\}) - f(A)$ can be computed in $O(m)$, it immediately follows that, in our problem, the greedy algorithm has an overall complexity of $O(km^2)$.
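For concreteness, the sketch below is an illustrative Python rendering of this greedy procedure for Eq. (3) under the toy setup of the previous sketch (the helper `utility()` and its array arguments are assumptions carried over from that sketch, not the authors' Algorithm 1): starting from the empty set, it repeatedly adds the candidate explanation from $P_\pi$ with the largest marginal gain.

```python
def greedy_explanations(pi, cost, px, py1, gamma, k):
    """Standard greedy for Eq. (3): pick up to k explanations from the ground set P_pi,
    adding at each step the candidate with the largest marginal utility gain."""
    ground_set = [j for j in range(len(pi)) if pi[j] == 1]   # P_pi
    A = []
    for _ in range(min(k, len(ground_set))):
        current = utility(A, pi, cost, px, py1, gamma)
        gains = {j: utility(A + [j], pi, cost, px, py1, gamma) - current
                 for j in ground_set if j not in A}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:        # monotonicity makes this an early-stop safeguard
            break
        A.append(best)
    return A

# Example usage with the toy instance defined in the previous sketch:
# A_greedy = greedy_explanations(pi, cost, px, py1, gamma, k=2)
```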
## 4 Finding the optimal policy and counterfactual explanations

In this section, our goal is to jointly find the optimal decision policy $\pi^*$ and set of counterfactual explanations $A^*$, i.e.,

$$\pi^*, A^* = \operatorname*{argmax}_{(\pi, A) \,:\, A \subseteq P_\pi,\ |A| \leq k} u(\pi, A), \tag{4}$$

where, similarly as in the previous section, $k$ is the maximum number of counterfactual explanations the decision maker is willing to provide to the population, balancing the right to explanation with trade secrets. By jointly optimizing both the decision policy and the counterfactual explanations, we may obtain an additional gain in utility in comparison with just optimizing the set of counterfactual explanations given the optimal decision policy in a non-strategic setting, as shown in Figure 6 in Appendix D. Moreover, as we will show in the experimental evaluation in Section 6, this additional gain can be significant.

Similarly as in Section 3, we cannot expect to find the optimal policy and set of counterfactual explanations in polynomial time. More specifically, we have the following negative result, which follows from Proposition 4 and a slight extension of the proof of Theorem 1:

**Theorem 3** The problem of jointly finding both the optimal policy and the set of counterfactual explanations that maximize utility under a cardinality constraint is NP-hard.

However, while the problem of finding both the policy and the set of counterfactual explanations appears significantly more challenging than the problem of finding just the set of counterfactual explanations given a pre-defined policy (refer to Eq. (3)), the following proposition shows that the problem is not inherently harder. More specifically, it shows that, for each possible set of counterfactual explanations, the policy that maximizes the utility can be easily computed (proven in Appendix B.3):

**Proposition 4** Given a set of counterfactual explanations $A \subseteq \mathcal{Y} := \{x \in \mathcal{X} : P(y = 1 \mid x) \geq \gamma\}$, the policy $\pi^*_A = \operatorname*{argmax}_{\pi : A \subseteq P_\pi} u(\pi, A)$ that maximizes the utility is deterministic and can be found in polynomial time, i.e.,

$$\pi^*_A(x) = \begin{cases} 1 & \text{if } x \in A \,\vee\, \big(\{x' \in A : P(y = 1 \mid x') > P(y = 1 \mid x) \wedge c(x, x') \leq 1\} = \emptyset \,\wedge\, x \in \mathcal{Y}\big) \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$

(Since the decision maker is rational, she will never provide an explanation that contributes negatively to her utility; hence the restriction $A \subseteq \mathcal{Y}$.) The above result implies that, to set all the values of the optimal decision policy, we only need to perform $O(km)$ comparisons. Moreover, it reveals that, in contrast with the non-strategic setting, the optimal policy given a set of counterfactual explanations is not a deterministic threshold rule with a single threshold [17, 22], i.e.,

$$\pi(x) = \begin{cases} 1 & \text{if } P(y = 1 \mid x) \geq \gamma \\ 0 & \text{otherwise,} \end{cases} \tag{6}$$

but rather a more conservative deterministic decision policy that depends not only on the outcome $P(y = 1 \mid x)$ and $\gamma$ but also on the cost individuals pay to change features. Moreover, we can build on the above result to prove that the problem of finding the optimal decision policy and set of counterfactual explanations can be reduced to maximizing a non-monotone submodular function. To this aim, let $\pi^*_A$ be the optimal policy induced by a given set of counterfactual explanations $A$, as in Proposition 4, and define the set function $h(A) = u(\pi^*_A, A)$ over the ground set $\mathcal{Y}$. Then, we have the following proposition (proven in Appendix B.4):

**Proposition 5** The function $h$ is non-negative, submodular and non-monotone.

Fortunately, there exist efficient algorithms with global approximation guarantees for maximizing a non-monotone submodular function under cardinality constraints. In our work, we use the randomized polynomial time algorithm by Buchbinder et al. [35], which can find a solution $A$ such that $h(A) \geq (1/e)\,h(A^*)$, where $A^*$ and $\pi^*_{A^*}$ are the optimal set of counterfactual explanations and decision policy, respectively. The algorithm is just a randomized variation of the standard greedy algorithm. It starts from a solution set $A = \emptyset$ and iteratively adds one counterfactual explanation $x \in \mathcal{Y} \setminus A$. However, instead of greedily choosing the element $x$ that provides the maximum marginal difference $h(A \cup \{x\}) - h(A)$, it sorts all the candidate elements with respect to their marginal difference and picks one at random among the top $k$. Algorithm 2 in Appendix C provides a pseudocode implementation of the algorithm. Finally, since the above randomized algorithm evaluates $O(km)$ marginal differences and, following from the proof of Proposition 5, each marginal difference of $h$ can be computed in $O(m)$, it readily follows that, in our problem, the algorithm has a complexity of $O(km^2)$.
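The sketch below is again an illustrative Python fragment building on the earlier toy setup (`utility()`, `cost`, `px`, `py1` and `gamma` are assumed from the previous sketches, not the authors' Algorithm 2): it constructs the deterministic policy of Eq. (5) for a given explanation set and then runs a simplified randomized greedy of the kind proposed by Buchbinder et al. [35], picking at each step one of the top-$k$ candidates by marginal gain of $h(A) = u(\pi^*_A, A)$.

```python
import numpy as np

def optimal_policy_given_A(A, cost, py1, gamma):
    """Deterministic policy of Eq. (5): accept x if it is an explanation, or if it lies
    above the threshold (P(y=1|x) >= gamma) and no reachable explanation in A has a
    strictly higher outcome (reachable means c(x, x') <= 1, the maximum possible benefit)."""
    m = len(py1)
    pi = np.zeros(m)
    for x in range(m):
        better_reachable = [xp for xp in A
                            if py1[xp] > py1[x] and cost[x, xp] <= 1]
        if x in A or (py1[x] >= gamma and not better_reachable):
            pi[x] = 1.0
    return pi

def randomized_greedy(cost, px, py1, gamma, k, rng):
    """Simplified randomized greedy for non-monotone submodular maximization: at each
    step, pick uniformly at random among the top-k candidates by marginal gain."""
    ground_set = [x for x in range(len(py1)) if py1[x] >= gamma]   # the ground set Y
    def h(A):
        return utility(A, optimal_policy_given_A(A, cost, py1, gamma),
                       cost, px, py1, gamma)
    A = []
    for _ in range(min(k, len(ground_set))):
        candidates = [x for x in ground_set if x not in A]
        ranked = sorted(candidates, key=lambda x: h(A + [x]) - h(A), reverse=True)
        A.append(rng.choice(ranked[:k]))
    return A, optimal_policy_given_A(A, cost, py1, gamma)

# Example usage with the toy instance defined in the earlier sketch:
# A_rand, pi_rand = randomized_greedy(cost, px, py1, gamma, k=2, rng=np.random.default_rng(1))
```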
## 5 Increasing the diversity of the counterfactual explanations

In many cases, decision makers may like to ensure that individuals across the whole spectrum of the population are incentivized to self-improve. For example, in a loan scenario, the bank may use age group as a feature to estimate the probability that a customer repays the loan; however, it may like to deploy a decision policy that incentivizes individuals across all age groups, in order to improve the financial situation of all. To this aim, the decision maker can increase the diversity of the optimal set of counterfactual explanations by incorporating a matroid constraint into the problem formulation, rather than a cardinality constraint. Formally, consider disjoint sets $X_1, X_2, \ldots, X_l$ such that $\bigcup_i X_i = \mathcal{X}$ and integers $d_1, d_2, \ldots, d_l$ such that $k = \sum_i d_i$. Then, a partition matroid is the collection of sets $\{S \in 2^{\mathcal{X}} : |S \cap X_i| \leq d_i \;\forall i \in [l]\}$. In the loan example, the decision maker could search for a set of counterfactual explanations $A$ within a partition matroid where each $X_i$ corresponds to the feature values covered by one age group and $d_i = k/l$ for all $i \in [l]$. This way, the set of counterfactual explanations $A$ would include explanations for every age group. In this case, the decision maker can rely on a variety of polynomial time algorithms with global guarantees for submodular function maximization under matroid constraints, e.g., the algorithm by Calinescu et al. [36].

## 6 Experiments

In this section, we evaluate Algorithms 1 and 2 using real loan and credit card data and show that the counterfactual explanations and decision policies found by our algorithms achieve higher utility than several competitive baselines. Appendix E contains additional experiments on synthetic data.

**Experimental setup.** We experiment with two publicly available datasets: (i) the lending dataset [37], which contains information about all accepted loan applications in Lending Club during the 2007–2018 period, and (ii) the credit dataset [38], which contains information about a bank's credit card payoffs (we used a version of the credit dataset preprocessed by Ustun et al. [13]). For each accepted loan applicant (or credit card holder), we use various demographic information and financial status indicators as features $x$ and the current loan status (or credit payoff status) as label $y$. Appendix F.1 contains more details on the specific features we used in each dataset and also describes the procedure we followed to approximate $P(y \mid x)$.

To set the values of the cost function $c(x_i, x_j)$, we use the maximum percentile shift among actionable features, similarly as in Ustun et al. [13]; a feature is actionable if an individual can change its value in order to get a positive decision. More specifically, let $L$ be the set of actionable (numerical) features and $\bar{L}$ be the set of non-actionable (discrete-valued) features. In the credit dataset, $\bar{L}$ contains Marital Status, Age Group and Education Level and $L$ contains the remaining features; in the lending dataset, $L$ contains all features. Then, for each pair of feature values $x_i, x_j$, we define the cost function as

$$c(x_i, x_j) = \begin{cases} \alpha \max_{l \in L} \left| Q_l(x_{j,l}) - Q_l(x_{i,l}) \right| & \text{if } x_{i,l} = x_{j,l} \;\; \forall l \in \bar{L} \\ \infty & \text{otherwise,} \end{cases} \tag{7}$$

where $x_{j,l}$ is the value of the $l$-th feature for the feature value $x_j$, $Q_l(\cdot)$ is the CDF of the numerical feature $l \in L$, and $\alpha \geq 1$ is a scaling factor. As an exception, in the credit dataset, we always set the cost $c(x_i, x_j)$ between two feature values to $\infty$ if $Q_l(x_{j,l}) < Q_l(x_{i,l})$ for $l \in \{\text{Total Overdue Counts}, \text{Total Months Overdue}\}$, reflecting the fact that a history of overdue payments cannot be erased. In this context, we acknowledge that more sophisticated cost functions could be designed in terms of feasibility and difficulty of adaptation, taking into account domain knowledge and information about the deployed classifier; however, this goes beyond the scope of our work.
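As an illustration, the following Python sketch computes the percentile-shift cost of Eq. (7) between two feature vectors, returning infinity whenever a non-actionable feature differs. The feature layout, column names and the `make_ecdf` helper are hypothetical, not taken from the paper's implementation.

```python
import numpy as np

def make_ecdf(column_values):
    """Empirical CDF Q_l(.) of a numerical feature, estimated from observed values."""
    sorted_vals = np.sort(np.asarray(column_values, dtype=float))
    return lambda v: np.searchsorted(sorted_vals, v, side="right") / len(sorted_vals)

def percentile_shift_cost(xi, xj, actionable, non_actionable, ecdfs, alpha=2.0):
    """Cost of Eq. (7): alpha times the maximum percentile shift over actionable
    features, or infinity if any non-actionable feature differs."""
    if any(xi[l] != xj[l] for l in non_actionable):
        return float("inf")
    shifts = [abs(ecdfs[l](xj[l]) - ecdfs[l](xi[l])) for l in actionable]
    return alpha * max(shifts) if shifts else 0.0

# Toy usage with hypothetical features: 'debt' is actionable, 'age_group' is not.
data = {"debt": [10, 20, 30, 40, 50, 60], "age_group": [1, 1, 2, 2, 3, 3]}
ecdfs = {"debt": make_ecdf(data["debt"])}
xi = {"debt": 50, "age_group": 2}
xj = {"debt": 20, "age_group": 2}
print(percentile_shift_cost(xi, xj, ["debt"], ["age_group"], ecdfs))  # finite cost
print(percentile_shift_cost(xi, {"debt": 20, "age_group": 3}, ["debt"], ["age_group"], ecdfs))  # inf
```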
Figure 1: Utility achieved by five types of decision policies and counterfactual explanations against the value of the parameter $\alpha$, in the lending dataset (panel (a)) and the credit dataset (panel (b)). The x-axis shows the allowed percentile shift ($1/\alpha$) and the y-axis shows the utility $u(\pi, A)$. In panel (a), the number of feature values is $m = 400$ and, in panel (b), it is $m = 3200$. In both panels, we set $k = 0.05m$ and repeat each experiment 20 times.

Finally, in our experiments, we compare the utility of the following decision policies and counterfactual explanations:

- **Black box:** decisions are taken by the optimal decision policy in the non-strategic setting, given by Eq. (6), and individuals do not receive any counterfactual explanations.
- **Minimum cost:** decisions are taken by the optimal decision policy in the non-strategic setting, given by Eq. (6), and individuals receive counterfactual explanations of minimum cost with respect to their initial feature values, similarly as in previous work [13, 15, 39]. More specifically, we cast the problem of finding the set of counterfactual explanations as the minimization of the weighted average cost individuals pay to change their feature values to the closest counterfactual explanation, i.e., $A_{mc} = \operatorname{argmin}_{A \subseteq P_\pi : |A| \leq k} \sum_{x_i \in \mathcal{X} \setminus P_\pi} P(x_i) \min_{x_j \in A} c(x_i, x_j)$, and observe that this is a version of the k-median problem, which we solve using a greedy heuristic [40].
- **Diverse:** decisions are taken by the optimal decision policy in the non-strategic setting, given by Eq. (6), and individuals receive a set of diverse counterfactual explanations of minimum cost with respect to their initial feature values, similarly as in previous work [16, 41], i.e., $A_{d} = \operatorname{argmax}_{A \subseteq P_\pi : |A| \leq k} \sum_{x_i \in \mathcal{X} \setminus P_\pi} P(x_i)\, \mathbb{I}\big(R(x_i) \cap A \neq \emptyset\big)$. This problem reduces to the weighted version of the maximum coverage problem, which can be solved using a well-known greedy approximation algorithm [42] (see the sketch after this list).
- **Algorithm 1:** decisions are taken by the optimal decision policy in the non-strategic setting, given by Eq. (6), and individuals receive counterfactual explanations given by Eq. (2), where $A$ is found using Algorithm 1.
- **Algorithm 2:** decisions are taken by the decision policy given by Eq. (5) and individuals receive counterfactual explanations given by Eq. (2), where $A$ is found using Algorithm 2.
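As referenced in the Diverse baseline above, here is a small illustrative Python sketch of a greedy weighted maximum-coverage heuristic that selects explanations so as to cover as much probability mass of rejected individuals as possible; `region_of_adaptation`, `pi`, `cost` and `px` are assumed from the earlier toy setup and the sketch is not the paper's implementation.

```python
def diverse_explanations(pi, cost, px, k):
    """Greedy weighted max-coverage baseline: repeatedly add the candidate explanation
    that covers the largest remaining probability mass of rejected individuals, where
    x_i is covered by x_j if x_j lies in the region of adaptation R(x_i)."""
    rejected = [i for i in range(len(pi)) if pi[i] == 0]
    candidates = [j for j in range(len(pi)) if pi[j] == 1]      # ground set P_pi
    covered, A = set(), []
    for _ in range(min(k, len(candidates))):
        def gain(j):
            return sum(px[i] for i in rejected
                       if i not in covered and j in region_of_adaptation(i, pi, cost))
        best = max((j for j in candidates if j not in A), key=gain, default=None)
        if best is None or gain(best) == 0:
            break
        A.append(best)
        covered |= {i for i in rejected if best in region_of_adaptation(i, pi, cost)}
    return A

# Example usage with the toy instance defined in the earlier sketch:
# A_div = diverse_explanations(pi, cost, px, k=2)
```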
**Results.** We start by comparing the utility achieved by each of the decision policies and counterfactual explanations in both datasets, for several values of the parameter $\alpha$, which is proportional to the difficulty of changing features. Figure 1 summarizes the results, which show that Algorithms 1 and 2 consistently outperform all baselines and that, as the cost of adapting to feature values with higher outcome values decreases (smaller $\alpha$), the competitive advantage of jointly optimizing the decision policy and the counterfactual explanations (Algorithm 2) grows significantly. This competitive advantage is more apparent in the credit card dataset because it contains non-actionable features (e.g., credit overdue counts) and, under the optimal decision policy in the non-strategic setting, it is difficult to incentivize individuals who receive a negative decision to improve by optimizing only the set of counterfactual explanations they receive. For specific examples of counterfactual explanations provided by Algorithm 1 and the minimum cost baseline, refer to Appendix F.2.

To understand the differences in utility achieved by the two proposed algorithms, we measure the transportation of mass induced by the policies and counterfactual explanations used in Algorithms 1 and 2 in both datasets, as follows. For each individual in the population whose best response is to change her feature value, we record her outcome $P(y = 1 \mid x)$ before and after the best response. Then, we discretize the outcome values using percentiles. Figure 2 summarizes the results, which show several interesting insights. In the lending dataset, we observe that a large portion of individuals do improve their outcome even if we only optimize the counterfactual explanations (panel (a)). In contrast, in the credit dataset, we observe that, if we only optimize the counterfactual explanations (panel (c)), most individuals do not improve their outcome. That being said, if we jointly optimize the decision policy and counterfactual explanations (panels (b) and (d)), we are able to incentivize a large portion of individuals to self-improve in both datasets.

Figure 2: Transportation of mass induced by the policies and counterfactual explanations used in Algorithms 1 and 2 in both the lending and the credit dataset (panels: (a) Alg. 1, lending; (b) Alg. 2, lending; (c) Alg. 1, credit; (d) Alg. 2, credit). For each individual in the population whose best response is to change her feature value, we record her outcome $P(y = 1 \mid x)$ before the best response (initial $P(y = 1 \mid x)$) and after the best response (final $P(y = 1 \mid x)$). In each panel, the color is proportional to the percentage of individuals who move from the initial to the final $P(y = 1 \mid x)$; we set $\alpha = 2$.

Next, we focus on the lending dataset and evaluate the sensitivity of our algorithms. First, we measure the influence that the number of counterfactual explanations has on the utility achieved by each of the decision policies and counterfactual explanations. As shown in Figure 3(a), our algorithms need only a small number of counterfactual explanations to provide significant gains in utility with respect to all the baselines. Second, we challenge the assumption that individuals do not share the counterfactual explanations they receive with other individuals with different feature values.
To this end, we assume that, given the set of counterfactual explanations $A$ found by Algorithm 2, individuals with initial feature value $x$ receive the counterfactual explanation $E(x) \in A$ given by Eq. (2) and, with probability $p_l$, they also receive an additional explanation $E'(x)$ picked at random from $A$; they then follow the counterfactual explanation that benefits them the most. Figure 3(b) summarizes the results for several values of $p_l$ and several numbers of counterfactual explanations, which show that the policies and explanations provided by Algorithm 2 retain a significant utility advantage even when the leakage probability $p_l$ is large.

Figure 3: Number of counterfactual explanations and information leakage. Panel (a) shows the utility achieved by five types of decision policies and counterfactual explanations against the number of counterfactual explanations $k$. Panel (b) shows the utility achieved by Algorithm 2 against the number of counterfactual explanations $k$ for several values of the leakage probability $p_l$. In both panels, we use the lending dataset, the number of feature values is $m = 400$, we set $\alpha = 2$, and we repeat each experiment involving randomization 20 times.

Finally, we focus on the credit dataset and consider a scenario in which a bank aims not only to keep providing credit to the customers who are more likely to repay but also to provide explanations that incentivize individuals across all age groups to maintain their credit. To this end, we incorporate a partition matroid constraint that ensures the counterfactual explanations are diverse across age groups, as described in Section 5, and use a slightly modified version of Algorithm 1 to solve the constrained problem [34], which enjoys a 1/2 approximation guarantee. Figure 4 summarizes the results, which show that: (i) optimizing under a cardinality constraint leads to an unbalanced set of explanations, favoring the more populated age groups (25 to 59) while completely ignoring the recourse potential of individuals older than 60; and (ii) the relative group improvement, defined as $\sum_{x_i \in X_z \setminus P_\pi} P(x_i)\,[P(y = 1 \mid x_{i \to j}) - P(y = 1 \mid x_i)] \,/\, \sum_{x_i \in X_z \setminus P_\pi} P(x_i)$, where $X_z$ is the set of feature values corresponding to age group $z$ and $x_{i \to j}$ is the best response of individuals with initial feature value $x_i \in X_z$, is more balanced across age groups under the matroid constraint, showing that the matroid constraint can be used to generate counterfactual explanations that help the entire spectrum of the population to self-improve.

Figure 4: Increasing the diversity of the provided counterfactual explanations across age groups (Under 25, 25–39, 40–59, Over 59). Panel (a) shows the population per age group rejected by the optimal threshold policy in the non-strategic setting. Panel (b) compares the age distribution of the counterfactual explanations in $A$ produced by the greedy algorithm under a cardinality and under a matroid constraint. Panel (c) shows the relative improvement of each age group. In all panels, we use the credit dataset and set $k = 32$ and $\alpha = 2$.

## 7 Conclusions

In this paper, we have designed several algorithms that allow us to find the decision policies and counterfactual explanations that maximize utility in a setting in which the individuals who are subject to the decisions taken by the policies use the counterfactual explanations they receive to invest effort strategically. Moreover, we have experimented with synthetic and real lending and credit card data and shown that the counterfactual explanations and decision policies found by our algorithms achieve higher utility than several competitive baselines. By uncovering a previously unexplored connection between strategic machine learning and interpretable machine learning, our work opens up many interesting directions for future work.
For example, we have adopted a specific type of mechanism to provide counterfactual explanations (i.e., one feature value per individual, using a Stackelberg formulation). A natural next step would be to extend our analysis to other types of mechanisms fitting a variety of real-world applications. Moreover, we have assumed that the cost individuals pay to change features is given; however, our algorithms would be more effective if we developed a methodology to reliably estimate the cost function from real observational (or interventional) data. In our work, we have also assumed that features take discrete values and that individuals who are subject to the decisions do not share information among themselves. It would be interesting to lift these assumptions, extend our analysis to real-valued features, and develop decision policies and counterfactual explanations that are robust to information sharing between individuals (refer to Figure 3(b)). Finally, by assuming that $P(y \mid x)$ does not change after individuals best respond, we are implicitly assuming that there are no unobserved features that partially describe the true causal effect between the observed features $x$ and the outcome variable $y$. In practice, this assumption is likely to be violated and $P(y \mid x)$ may change after individuals best respond, as recently noted by Miller et al. [31]. In this context, it would be very interesting to find counterfactual explanations that are robust to unmeasured confounding.

## 8 Broader Impact

In recent years, there has been a heated debate concerning the (lack of) transparency of machine learning models used to inform decisions that have significant consequences for individuals. In this context, decision makers are usually reluctant to share their decision policy with the individuals they decide upon because of trade secret concerns. In our work, we show that, if explanations for (semi-)automated decisions are chosen taking into account how individuals will best respond to them, the explanations can significantly increase the decision maker's utility while also helping the individuals to self-improve, making transparency desirable for both sides.

**Acknowledgements.** This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 945719).

## References

[1] Finale Doshi-Velez and Been Kim. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608, 2017.

[2] Adrian Weller. Challenges for transparency. 2017.

[3] Zachary C. Lipton. The mythos of model interpretability. Queue, 16(3):31–57, 2018.

[4] David Gunning and David W. Aha. DARPA's explainable artificial intelligence program. AI Magazine, 40(2):44–58, 2019.

[5] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.

[6] Paul Voigt and Axel Von dem Bussche. The EU General Data Protection Regulation (GDPR): A Practical Guide, 1st ed. Cham: Springer International Publishing, 2017.

[7] Sandra Wachter, Brent Mittelstadt, and Luciano Floridi. Why a right to explanation of automated decision-making does not exist in the General Data Protection Regulation. International Data Privacy Law, 7(2):76–99, 2017.
[8] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.

[9] Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning, 2017.

[10] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, 2017.

[11] Supriyo Chakraborty, Richard Tomsett, Ramya Raghavendra, Daniel Harborne, Moustafa Alzantot, Federico Cerutti, Mani Srivastava, Alun Preece, Simon Julier, Raghuveer M. Rao, et al. Interpretability of deep learning models: A survey of results. In 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pages 1–6. IEEE, 2017.

[12] Sandra Wachter, Brent Mittelstadt, and Chris Russell. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology, 31:841, 2017.

[13] Berk Ustun, Alexander Spangher, and Yang Liu. Actionable recourse in linear classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 10–19, 2019.

[14] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.

[15] Amir-Hossein Karimi, Gilles Barthe, Borja Belle, and Isabel Valera. Model-agnostic counterfactual explanations for consequential decisions. arXiv preprint arXiv:1905.11190, 2019.

[16] Ramaravind K. Mothilal, Amit Sharma, and Chenhao Tan. Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 2020.

[17] Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 797–806, 2017.

[18] Niki Kilbertus, Manuel Gomez-Rodriguez, Bernhard Schölkopf, Krikamol Muandet, and Isabel Valera. Fair decisions despite imperfect predictions. In AISTATS, 2019.

[19] Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. Human decisions and machine predictions. The Quarterly Journal of Economics, 133(1):237–293, 2018.

[20] Shira Mitchell, Eric Potash, Solon Barocas, Alexander D'Amour, and Kristian Lum. Prediction-based decisions and fairness: A catalogue of choices, assumptions, and definitions. arXiv preprint arXiv:1811.07867, 2018.

[21] Behzad Tabibian, Stratis Tsirtsis, Moein Khajehnejad, Adish Singla, Bernhard Schölkopf, and Manuel Gomez-Rodriguez. Optimal decision making under strategic behavior. arXiv preprint arXiv:1905.09239, 2020.

[22] Isabel Valera, Adish Singla, and Manuel Gomez Rodriguez. Enhancing the accuracy and fairness of human decision making. In Advances in Neural Information Processing Systems, pages 1769–1778, 2018.

[23] S. Coate and G. Loury. Will affirmative-action policies eliminate negative stereotypes? The American Economic Review, 1993.

[24] R. Fryer and G. Loury. Valuing diversity. Journal of Political Economy, 2013.

[25] L. Hu and Y. Chen. A short-term intervention for long-term fairness in the labor market. In WWW, 2018.
[26] Jon Kleinberg and Manish Raghavan. How do classifiers induce agents to invest effort strategically? In Proceedings of the 2019 ACM Conference on Economics and Computation, pages 825–844, 2019.

[27] Juan C. Perdomo, Tijana Zrnic, Celestine Mendler-Dünner, and Moritz Hardt. Performative prediction. arXiv preprint arXiv:2002.06673, 2020.

[28] Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.

[29] Lydia T. Liu, Sarah Dean, Esther Rolf, Max Simchowitz, and Moritz Hardt. Delayed impact of fair machine learning. In Advances in Neural Information Processing Systems, 2018.

[30] Richard M. Karp. Reducibility among combinatorial problems. In Complexity of Computer Computations, pages 85–103. Springer, 1972.

[31] John Miller, Smitha Milli, and Moritz Hardt. Strategic adaptation to classifiers: A causal perspective. arXiv preprint arXiv:1910.10362, 2019.

[32] Solon Barocas, Andrew D. Selbst, and Manish Raghavan. The hidden assumptions behind counterfactual explanations and principal reasons. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 80–89, 2020.

[33] Credit score simulator. https://www.creditkarma.com/tools/credit-score-simulator/.

[34] George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. An analysis of approximations for maximizing submodular set functions I. Mathematical Programming, 14(1):265–294, 1978.

[35] Niv Buchbinder, Moran Feldman, Joseph Naor, and Roy Schwartz. Submodular maximization with cardinality constraints. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1433–1452. SIAM, 2014.

[36] Gruia Calinescu, Chandra Chekuri, Martin Pal, and Jan Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740–1766, 2011.

[37] Lending club dataset. https://www.kaggle.com/wordsforthewise/lending-club/version/3.

[38] I-Cheng Yeh and Che-hui Lien. The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2):2473–2480, 2009.

[39] Gabriele Tolomei, Fabrizio Silvestri, Andrew Haines, and Mounia Lalmas. Interpretable predictions of tree-based ensembles via actionable feature tweaking. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 465–474, 2017.

[40] Roberto Solis-Oba. Approximation algorithms for the k-median problem. In Efficient Approximation and Online Algorithms, pages 292–320. Springer, 2006.

[41] Chris Russell. Efficient search for diverse coherent explanations. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 20–28, 2019.

[42] Dorit S. Hochbaum and Anu Pathria. Analysis of the greedy approach in problems of maximum k-coverage. Naval Research Logistics (NRL), 45(6):615–627, 1998.

[43] Michael Brückner and Tobias Scheffer. Stackelberg games for adversarial prediction problems. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 547–555, 2011.

[44] Nilesh Dalvi, Pedro Domingos, Sumit Sanghai, and Deepak Verma. Adversarial classification. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 99–108, 2004.
[45] Jinshuo Dong, Aaron Roth, Zachary Schutzman, Bo Waggoner, and Zhiwei Steven Wu. Strategic classification from revealed preferences. In Proceedings of the 2018 ACM Conference on Economics and Computation, pages 55–70, 2018.

[46] Moritz Hardt, Nimrod Megiddo, Christos Papadimitriou, and Mary Wootters. Strategic classification. In Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, pages 111–122, 2016.

[47] Lily Hu, Nicole Immorlica, and Jennifer Wortman Vaughan. The disparate effects of strategic manipulation. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 259–268, 2019.

[48] Smitha Milli, John Miller, Anca D. Dragan, and Moritz Hardt. The social cost of strategic classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 230–239, 2019.