# Learning Models for Actionable Recourse

Alexis Ross (Harvard University; Allen Institute for Artificial Intelligence), alexisr@allenai.org
Himabindu Lakkaraju (Harvard University), hlakkaraju@seas.harvard.edu
Osbert Bastani (University of Pennsylvania), obastani@seas.upenn.edu

*Work started at Harvard University.*

**Abstract.** As machine learning models are increasingly deployed in high-stakes domains such as legal and financial decision-making, there has been growing interest in post-hoc methods for generating counterfactual explanations. Such explanations provide individuals adversely impacted by predicted outcomes (e.g., an applicant denied a loan) with recourse, i.e., a description of how they can change their features to obtain a positive outcome. We propose a novel algorithm that leverages adversarial training and PAC confidence sets to learn models that theoretically guarantee recourse to affected individuals with high probability. We demonstrate the efficacy of our approach with extensive experiments on real data.

35th Conference on Neural Information Processing Systems (NeurIPS 2021).

## 1 Introduction

In recent years, there has been growing interest in using machine learning to inform consequential legal and financial decisions, e.g., deciding whether to give an applicant a loan [Hardt et al., 2016], bail to a defendant [Lakkaraju and Rudin, 2017], or parole to a prisoner [Zeng et al., 2017]. Because these decisions have an impact on the lives of the concerned individuals, it is critical to explain why the model made its prediction. Explanations are important not only to ensure that there are no issues with the way the prediction is made (e.g., making sure the decision is free of racial/gender bias [Hardt et al., 2016] and does not suffer from causal issues [Bastani et al., 2017]), but also to give the affected individual a justification for the decision. Thus, there has been a great deal of recent interest in explainable machine learning [Lou et al., 2012, Wang and Rudin, 2015, Ribeiro et al., 2016b, Lakkaraju et al., 2019].

We focus on counterfactual explanations [Wachter et al., 2017], which specify how features can be changed to obtain a different model prediction. These explanations can be used to provide individuals who are negatively impacted by model outcomes with actionable recourses, i.e., actions they can take to receive a positive outcome [Ustun et al., 2019]. For instance, for an applicant denied a loan, an actionable recourse might be: "to get a loan, increase your income by $1,000." Such a recourse must satisfy two properties to be actionable: (i) it only changes features that the individual can realistically modify (e.g., it cannot change race or gender, but can change income), and (ii) the magnitude of the change must be reasonable. An actionable recourse may be considered necessary in some settings because it provides an individual with agency over a consequential decision that affects them.

Prior research has addressed the need for recourses through post-hoc algorithms that compute individual recourses for certain kinds of models, e.g., using integer linear programming in the context of linear models [Ustun et al., 2019], or using gradient descent on the input for differentiable models [Wachter et al., 2017]. However, these approaches do not guarantee that actionable recourses exist; at best, Ustun et al. [2019] guarantee that they find one if it exists, and only for linear models.
In other words, many affected individuals may not even be prescribed any actions that they can take to change their outcome. We aim to provide tools for guaranteeing the existence of actionable recourses for domains in which recourses may be necessary.

We propose a novel algorithm for learning models in a way that is designed to ensure the existence of actionable recourses, so that affected individuals receive recourse with high probability. To the best of our knowledge, our approach is the first to train models for which recourse exists with high probability. It builds on adversarial training [Goodfellow et al., 2015a], which is designed to ensure that models are robust to adversarial examples. At a high level, given a binary classification model $f : \mathcal{X} \to \mathcal{Y}$, where $\mathcal{Y} = \{0, 1\}$, an adversarial example is a perturbation $\delta \in \Delta$ to an input $x \in \mathcal{X}$ such that $f(x + \delta) \neq f(x)$. In a typical adversarial training setting, we choose $\Delta$ to be small in some sense (e.g., in terms of the $L_\infty$ norm of $\delta \in \Delta$) and aim to guarantee that adversarial examples do not exist, i.e., $f(x + \delta) = f(x)$ for all $\delta \in \Delta$. In contrast, in our setting, we intuitively want to ensure that adversarial examples do exist. In particular, given an input $x$ for which $f(x) = 0$, we want there to exist a recourse $\delta \in \Delta$ such that $f(x + \delta) = 1$, where in our case $\Delta$ is the given set of all permissible recourses for the application domain. Thus, we adapt existing adversarial training algorithms to ensure the existence of recourse. As an added benefit, these algorithms can compute recourse significantly faster than existing techniques for general differentiable models [Wachter et al., 2017].

While adversarial training heuristically encourages recourse to exist, it does not provide any theoretical guarantees. We build on PAC confidence sets [Park et al., 2020] to guarantee that recourse exists with high probability (assuming the test distribution is the same as the training distribution).

We evaluate our approach on four real-world datasets that cover lending, recidivism, bail, and credit outcomes. Our results demonstrate that our approach is very effective at improving recourse rates (i.e., the probability that individuals are given recourse) without noticeably reducing accuracy. In addition, we show that we achieve this improvement in recourse rates without noticeably harming the quality of recourses or increasing the brittleness of the underlying model. First, we empirically demonstrate that our approach improves the rate at which models provide recourses that are grounded in reality: we find that our approach encourages the existence of recourses that both obey constraints driven by real-world causal relationships and are in-distribution with respect to the original data. Second, we show that our approach encourages the existence of robust recourses (i.e., recourses that result in positive outcomes even when changed in small ways). Finally, we show that these improvements in recourse rates do not render the underlying classifier brittle.[2]

[2] Our code is available at https://github.com/alexisjihyeross/adversarial_recourse.

**Related work.** Beyond Wachter et al. [2017] and Ustun et al. [2019], other approaches have been proposed for generating recourses [Zhang et al., 2018, Hendricks et al., 2018, Mothilal et al., 2020, Looveren and Klaise, 2019, Poyiadzi et al., 2020, Karimi et al., 2020a,b,c]. However, all of these works focus on how to compute recourse for given predictive models; in contrast, our goal is to learn predictive models that provide recourses at high rates. Any of these methods can be used in conjunction with ours.
Our work also builds on adversarial training [Szegedy et al., 2014, Goodfellow et al., 2015b, Bastani et al., 2016, Shaham et al., 2018]. While recent work in model explainability has leveraged adversarial training to improve the robustness of explanations [Lakkaraju et al., 2020b], their goal is to reduce the rate of adversarial examples, whereas ours is to increase the rate of recourses.

## 2 Problem Formulation

**Preliminaries.** Consider a binary classifier $f_\theta : \mathcal{X} \to \mathcal{Y}$, where $x \in \mathcal{X} \subseteq \mathbb{R}^{n_X}$ are the features, $y \in \mathcal{Y} = \{0, 1\}$ are the labels, and $\theta \in \mathbb{R}^{n_\theta}$ are the parameters. We assume that $f_\theta$ has the form $f_\theta(x) = \mathbb{1}(g_\theta(x) \geq \tau_0)$, where $g_\theta : \mathcal{X} \to [0, 1]$ is a scoring function and $\tau_0 \in \mathbb{R}$ is a decision threshold. We assume that $g_\theta$ is differentiable in $x$, i.e., $\nabla_x g_\theta(x)$ exists almost everywhere.[3]

[3] Note that we have focused on real-valued features. We discuss how our algorithm can be extended to handle categorical features in Section B.4.

**Recourse.** We seek to ensure that individuals given negative outcomes by $f_\theta$ are also given recourse.

Definition 2.1. Given a classifier $f_\theta : \mathcal{X} \to \mathcal{Y}$ and a set $\Delta \subseteq \mathbb{R}^{n_X}$, an input $x \in \mathcal{X}$ has recourse if there exists $\delta \in \Delta$ such that $f_\theta(x + \delta) = 1$. We write $\mathcal{X}_{\text{rec}} \subseteq \mathcal{X}$ to denote the set of inputs for which recourse exists.

The specific design of the set of permissible recourses $\Delta \subseteq \mathbb{R}^{n_X}$ is domain specific. We assume that $0 \in \Delta$; then, recourse automatically exists for positive outcomes $f_\theta(x) = 1$ by taking $\delta = 0$. In addition, we assume that $\Delta$ is a polytope, i.e., $\Delta = \{\delta \in \mathbb{R}^{n_X} \mid A\delta + b \geq 0\}$ for some $A \in \mathbb{R}^{k \times n_X}$ and $b \in \mathbb{R}^k$. That is, $\Delta$ can be expressed as a set of affine constraints; this assumption is required for computational tractability. A standard choice is $\Delta = \{\delta \mid \|\delta\|_\infty \leq \delta_{\max}\}$, which says that the recourse can suggest changes to any feature by a bounded amount. We can apply this constraint after an affine transformation of $\delta$ that appropriately scales different covariates. In addition, we often want to restrict features; e.g., to avoid suggesting that an individual change their age, we can constrain $\delta_i = 0$, or to avoid suggesting that an individual decrease their income to qualify for a loan, we can constrain $\delta_i \geq 0$. In principle, $\Delta$ can also be tailored to individuals, e.g., to disallow suggesting increased education for individuals who cannot afford it. Finally, $\Delta$ should be designed to be large enough that it includes a plausible recourse for every individual, yet small enough that the recourses do not overburden individuals. All of these considerations are domain-specific; we describe our choices of $\Delta$ for the datasets in our experiments in Section 4.

**Probably Approximately has REcourse (PARE).** Our goal is to ensure that recourse exists for most individuals. Given a confidence level $\epsilon \in \mathbb{R}_{>0}$, we say that $\theta$ approximately has recourse if
$$\Pr_{x \sim p(x)}\left[x \in \mathcal{X}_{\text{rec}}\right] \geq 1 - \epsilon,$$
i.e., recourse exists for $f_\theta$ with probability at least $1 - \epsilon$ w.r.t. the distribution $p(x)$ over individuals. Our goal, then, is to design an algorithm for estimating the model parameters $\theta$ so that $f_\theta$ approximately has recourse. To do so, our algorithm takes as input a held-out calibration dataset $Z \subseteq \mathcal{X} \times \mathcal{Y}$ of examples $(x, y) \sim p$, where $p(x, y)$ is the distribution over labeled examples, and outputs model parameters $\hat\theta(Z)$. As in the probably approximately correct (PAC) learning framework [Valiant, 1984], our algorithm might additionally fail due to the randomness in $Z$. Thus, given a second confidence level $\delta \in \mathbb{R}_{>0}$, we say that $\hat\theta$ Probably Approximately has REcourse (PARE) if
$$\Pr_{Z \sim p(Z)}\left[\hat\theta(Z) \text{ approximately has recourse}\right] \geq 1 - \delta, \quad \text{where} \quad p(Z) = \prod_{(x,y) \in Z} p(x, y).$$
In other words, our algorithm produces a model that approximately has recourse with probability at least $1 - \delta$ over $p(Z)$.
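To make the affine representation of $\Delta$ concrete, the following is a minimal sketch (ours, not from the paper's released code) of how $(A, b)$ might be assembled for a dataset with an $L_\infty$ bound on actionable features, immutable features, and increase-only features; the function name and layout are illustrative.

```python
import numpy as np

def build_recourse_polytope(n_features, actionable, increase_only=(), delta_max=0.75):
    """Assemble (A, b) such that the permissible perturbations are
    Delta = {delta : A @ delta + b >= 0}.

    - actionable: indices that may change (bounded by delta_max in absolute value)
    - increase_only: subset of actionable indices constrained to delta_i >= 0
    - all other features are immutable (delta_i = 0)
    """
    rows_A, rows_b = [], []

    def add(coef, offset):
        rows_A.append(coef)
        rows_b.append(offset)

    for i in range(n_features):
        e = np.zeros(n_features)
        e[i] = 1.0
        if i in actionable:
            add(-e, delta_max)   #  delta_i <= delta_max
            add(+e, delta_max)   # -delta_max <= delta_i
            if i in increase_only:
                add(+e, 0.0)     #  delta_i >= 0
        else:
            add(+e, 0.0)         #  delta_i >= 0 and
            add(-e, 0.0)         #  delta_i <= 0, so delta_i = 0
    return np.stack(rows_A), np.array(rows_b)

# Example: 5 features, features 0 and 3 actionable, feature 3 can only increase.
A, b = build_recourse_polytope(5, actionable={0, 3}, increase_only={3})
```

The same $(A, b)$ pair can later be handed to the linear program used to compute perturbations in Section 3.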
**Constructing a PARE classifier.** Note that we can trivially obtain a PARE classifier $f_\theta$ by choosing the decision threshold $\tau_0 = 0$, in which case $f_\theta(x) = 1$ for all $x \in \mathcal{X}$. However, this approach is undesirable since it assigns a positive outcome to all individuals. Instead, we want to maximize the performance of $f_\theta$ (e.g., in terms of accuracy, F1 score, etc.) subject to the constraint that $f_\theta$ is PARE. Thus, we divide the problem of constructing a PARE classifier into two parts: (i) increasing the recourse rate: we train $g_\theta$ in a way that heuristically increases the rate at which inputs $x \in \mathcal{X}$ have recourse (for any $\tau_0$), and (ii) guaranteeing recourse: we choose $\tau_0$ to guarantee that the resulting $f_\theta$ is PARE.

## 3 Our Algorithm

We describe our algorithm for learning models that satisfy PARE while achieving good performance. We describe Step 1 (increasing the recourse rate) in Section 3.1 and Step 2 (guaranteeing recourse) in Section 3.3. We describe ways to compute recourse in Section 3.2.

### 3.1 Step 1: Increasing Recourse Rate

**Background: adversarial training.** Consider a classifier $f_\theta : \mathcal{X} \to \mathcal{Y}$ and perturbations $\Delta \subseteq \mathbb{R}^{n_X}$. Given $x \in \mathcal{X}$, an adversarial example [Szegedy et al., 2014] for $x$ is a perturbation $\delta \in \Delta$ such that $f_\theta(x + \delta) \neq f_\theta(x)$, i.e., the (small) perturbation $\delta$ changes the predicted label of $x$. Adversarial examples are undesirable because they indicate that $f_\theta$ is not robust to small changes to the input that should not affect the class label (e.g., according to human judgment). Thus, there has been a great deal of interest in designing algorithms for improving robustness to adversarial examples. The basic approach is adversarial training [Goodfellow et al., 2015b], which dynamically computes adversarial examples for inputs in the training set and adds these to the objective as additional training examples, as in data augmentation. In particular, given a loss function $\ell : \mathbb{R} \times \mathcal{Y} \to \mathbb{R}$, where $\ell(g_\theta(x), y)$ is the loss for training example $(x, y)$, they seek to compute $\arg\min_\theta L_A(\theta)$, where
$$L_A(\theta) = \mathbb{E}_{p(x,y)}\left[\ell(g_\theta(x), y) + \lambda \max_{\delta \in \Delta} \ell(g_\theta(x + \delta), y)\right].$$
In $L_A(\theta)$, the first term is the supervised learning loss, the second is the adversarially robust loss (i.e., it encourages $g_\theta$ to be robust to adversarial examples $x + \delta$), and $\lambda \in \mathbb{R}_{\geq 0}$ is a hyperparameter. The challenge in optimizing $L_A(\theta)$ is computing the maximum over $\delta \in \Delta$. To address this challenge, existing adversarial training algorithms leverage approximations that enable efficient computation of $\delta$.

**Our approach.** We use adversarial training to learn a model $f_\theta$ for which inputs $x \in \mathcal{X}$ have recourse at higher rates compared to models trained using conventional approaches. There are two key differences compared to adversarial training. First, we want recourse to exist, which corresponds to encouraging the existence of adversarial examples. Second, we only care about changing negative labels $f_\theta(x) = 0$ to positive ones $f_\theta(x + \delta) = 1$, not vice versa. Thus, we want to solve $\arg\min_\theta L_R(\theta)$, where
$$L_R(\theta) = \mathbb{E}_{p(x,y)}\left[\ell(g_\theta(x), y) + \lambda \min_{\delta \in \Delta} \ell(g_\theta(x + \delta), 1)\right]. \tag{1}$$
Compared to $L_A$, the second term of $L_R$ differs in two ways: (i) the maximum over $\delta$ is replaced with a minimum, and (ii) the label $y$ is replaced with the label $1$. Note that when $\lambda = 0$, Eq. 1 reduces to supervised learning. We optimize Eq. 1 using adversarial training [Goodfellow et al., 2015a, Shaham et al., 2018, Lakkaraju et al., 2020a], which performs stochastic gradient descent on $L_R(\theta)$.
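As a rough illustration, the sketch below (assuming PyTorch and a scoring network that outputs logits) shows how a mini-batch estimate of Eq. 1 might be assembled; it is not the paper's reference implementation, and `recourse_perturbations` is a placeholder for the LP-based computation of $\delta^*$ derived next.

```python
import torch
import torch.nn.functional as F

def recourse_augmented_loss(g, x, y, lam, recourse_perturbations):
    """Mini-batch estimate of Eq. 1: the supervised loss plus lambda times the
    loss of the best permissible perturbation pushed toward the positive label.

    g maps a batch of inputs to logits; recourse_perturbations(g, x) is a
    placeholder for the LP-based computation of delta* described next, and is
    treated as a constant with respect to the model parameters.
    """
    logits = g(x).squeeze(-1)
    supervised = F.binary_cross_entropy_with_logits(logits, y.float())

    delta = recourse_perturbations(g, x).detach()   # delta* for each row of x
    recourse_logits = g(x + delta).squeeze(-1)
    recourse = F.binary_cross_entropy_with_logits(
        recourse_logits, torch.ones_like(recourse_logits))

    return supervised + lam * recourse
```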
The key challenge in computing $\nabla_\theta L_R(\theta)$ is computing the gradient of the second term, which can be rewritten as
$$\nabla_\theta \min_{\delta \in \Delta} \ell(g_\theta(x + \delta), 1) = \nabla_\theta\, \ell(g_\theta(x + \delta^*), 1),$$
where $\delta^*$ is the perturbation that maximizes the likelihood of a positive outcome, i.e., minimizes the loss between the predicted and positive outcomes:
$$\delta^* = \arg\min_{\delta \in \Delta} \ell(g_\theta(x + \delta), 1). \tag{2}$$
Computing $\delta^*$ exactly is computationally challenging; thus, we use a Taylor approximation of the loss,
$$\ell(g_\theta(x + \delta), 1) \approx \ell(g_\theta(x), 1) + \nabla_x \ell(g_\theta(x), 1)^\top \delta.$$
Using this approximation, Eq. 2 becomes
$$\delta^* = \arg\min_{\delta \in \Delta} \ell(g_\theta(x + \delta), 1) \approx \arg\min_{\delta \in \Delta} \nabla_x \ell(g_\theta(x), 1)^\top \delta,$$
where we have dropped the term $\ell(g_\theta(x), 1)$ since it is constant with respect to $\delta$. Finally, note that the optimization problem in the last step is a linear program (LP), since we have assumed that $\Delta$ can be expressed as a set of affine constraints and since the objective is linear in $\delta$. In summary, we optimize Eq. 1 using stochastic gradient descent, where at each step we solve an LP to approximate the second term. That is, given parameters $\theta_i$ and example $(x_i, y_i)$ on gradient step $i$, and step size $\eta_i \in \mathbb{R}_{>0}$, we use the stochastic gradient update
$$\theta_{i+1} = \theta_i - \eta_i \left(\nabla_\theta \ell(g_{\theta_i}(x_i), y_i) + \lambda\, \nabla_\theta \ell(g_{\theta_i}(x_i + \delta^*_i), 1)\right), \quad \text{where} \quad \delta^*_i = \arg\min_{\delta \in \Delta} \nabla_x \ell(g_{\theta_i}(x_i), 1)^\top \delta.$$

### 3.2 Computing Recourse

So far, we have focused on how to train a model $f_\theta$ that provides recourse. Once we have trained $f_\theta$, we still need a way to compute recourse for a given individual $x$, i.e., an algorithm $A : \mathcal{X} \to \Delta$ for computing $\delta_x = A(x)$ such that $f_\theta(x + \delta_x) = 1$. We describe three such algorithms; in general, any algorithm designed to output recourses can be used [Karimi et al., 2020a, Poyiadzi et al., 2020].

**Gradient descent.** The approach proposed by Wachter et al. [2017] can be applied directly to compute recourse. They solve the problem
$$\delta_x = \arg\min_{\delta} \left\{\ell(g_\theta(x + \delta), 1) + \lambda' \|\delta\|_2\right\},$$
where $\lambda' \in \mathbb{R}_{\geq 0}$ is a hyperparameter, using gradient descent on $\delta$. The term $\|\delta\|_2$ is designed to encourage the recourse $\delta$ to be small, which is often desirable in practice (we have excluded it from our formulation for simplicity). While this approach is generally effective, it can be very slow since we need to solve an optimization problem for each individual.

**Adversarial training.** We can also use adversarial training to compute recourse, i.e.,
$$\delta_x = \arg\min_{\delta \in \Delta} \nabla_x \ell(g_\theta(x), 1)^\top \delta.$$
This approach approximates the gradient descent approach, but can be computed much more efficiently. Furthermore, since our objective in Eq. 1 is designed to encourage this specific perturbation to provide recourse, it performs nearly as well as gradient descent when $f_\theta$ is trained with $\lambda > 0$.

**Linear approximation.** Finally, Ustun et al. [2019] propose an approach to compute recourse when $f_\theta(x) = \mathbb{1}(\beta^\top x \geq \beta_0)$ is a linear model. In this case, they compute $\delta_x$ using an integer linear program (ILP). For nonlinear models, we can instead use a linear approximation of $g_\theta$ near $x$. Letting $\delta_x = A(x; \beta_0, \beta)$ denote the recourse generated by their algorithm, and using the Taylor expansion $g_\theta(x') \approx g_\theta(x) + \nabla_x g_\theta(x)^\top (x' - x)$, we can use their approach to compute the recourse $\delta_x = A(x; \tau_0 - g_\theta(x), \nabla_x g_\theta(x))$. However, this approach only works well when $g_\theta$ is approximately linear as a function of $x$; otherwise, the Taylor approximation may be poor and $\delta_x$ may not satisfy the desired condition $f_\theta(x + \delta_x) = 1$.
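The LP above is small (one variable per feature), so an off-the-shelf solver suffices. The following sketch (ours; it assumes SciPy's `linprog` and the $(A, b)$ representation of $\Delta$ from Section 2) computes the perturbation used both in the training update of Section 3.1 and as the adversarial training recourse algorithm; the released code may implement this step differently.

```python
import torch
import torch.nn.functional as F
from scipy.optimize import linprog

def solve_recourse_lp(g, x, A, b):
    """Approximate delta* = argmin_{A delta + b >= 0} grad_x loss(g(x), 1)^T delta
    for a single input x (a 1-D tensor); g returns a logit for a batch of inputs.
    """
    x = x.clone().detach().requires_grad_(True)
    logit = g(x.unsqueeze(0)).squeeze()
    loss = F.binary_cross_entropy_with_logits(logit, torch.tensor(1.0))
    loss.backward()
    c = x.grad.detach().numpy()   # gradient of the loss w.r.t. the features

    # A delta + b >= 0 is equivalent to (-A) delta <= b; components of delta
    # may be negative, so we remove linprog's default nonnegativity bounds.
    res = linprog(c=c, A_ub=-A, b_ub=b, bounds=[(None, None)] * len(c),
                  method="highs")
    return torch.as_tensor(res.x, dtype=x.dtype)
```

Applied row by row (or vectorized), its output can serve as the `recourse_perturbations` placeholder in the earlier training-loss sketch.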
### 3.3 Step 2: Guaranteeing Recourse

Finally, we describe how we choose $\tau_0$ to ensure $f_\theta$ provides recourse with high probability. Note that $\tau_0$ also controls the fraction of individuals given a positive outcome without the need for recourse; thus, we choose $\tau_0$ to optimize the performance of $f_\theta$ subject to the constraint that $f_\theta$ is PARE.

**Background: PAC confidence sets.** We build on work constructing PAC confidence sets [Park et al., 2020]. Given $x \in \mathcal{X}$, they construct a model $h_{\theta,\tau} : \mathcal{X} \to \mathcal{P}(\mathcal{Y})$ (where $\mathcal{P}$ denotes the power set) that returns the set of all labels $y$ with score above a threshold $\tau \in \mathbb{R}$, i.e.,
$$h_{\theta,\tau}(x) = \{y \in \mathcal{Y} \mid h_\theta(x, y) \geq \tau\}, \quad \text{where} \quad h_\theta(x, y) = \begin{cases} g_\theta(x) & \text{if } y = 1 \\ 1 - g_\theta(x) & \text{if } y = 0. \end{cases}$$
Given $\epsilon \in \mathbb{R}_{>0}$, we say $\tau$ is approximately correct if $\Pr_{p(x,y)}[y \in h_{\theta,\tau}(x)] \geq 1 - \epsilon$, i.e., $h_{\theta,\tau}(x)$ contains the true label for $x$ with high probability over $p(x, y)$. Note that $\tau = 0$ satisfies this condition, since $h_{\theta,0}(x) = \{0, 1\}$ for all $x$. The goal is to choose $\tau$ as large as possible while ensuring approximate correctness. Park et al. [2020] propose an estimator $\hat\tau$ that takes as input (i) the pretrained model $g_\theta : \mathcal{X} \to [0, 1]$, (ii) a calibration dataset $Z \subseteq \mathcal{X} \times \mathcal{Y}$ of i.i.d. samples $(x, y) \sim p$, and (iii) confidence levels $\epsilon, \delta \in \mathbb{R}_{>0}$, and constructs a threshold $\hat\tau(Z) \in [0, 1]$ that is probably approximately correct (PAC):
$$\Pr_{p(Z)}\left[\hat\tau(Z) \text{ is approximately correct}\right] \geq 1 - \delta. \tag{3}$$
In other words, $\hat\tau(Z)$ is approximately correct with probability at least $1 - \delta$ according to $p(Z)$. Their approach leverages the fact that $\hat\tau(Z)$ is an estimator for a single parameter; thus, they can use learning theory to obtain PAC guarantees [Kearns et al., 1994].

**PARE models via PAC confidence sets.** Given a model $g_\theta : \mathcal{X} \to \mathbb{R}$, an algorithm $A : \mathcal{X} \to \Delta$ for computing recourse, and a calibration set $Z \subseteq \mathcal{X} \times \mathcal{Y}$, our algorithm leverages PAC confidence sets to choose a threshold $\tau_0 = \hat\tau_0(Z)$ that ensures the resulting model $f_{\hat\theta}$ satisfies the PARE constraint. First, our algorithm constructs a new calibration dataset for the PAC confidence set algorithm:
$$Z' = \{(x + A(x), 1) \mid (x, y) \in Z\}.$$
Intuitively, $Z'$ says that the correct label for every input $x + A(x)$ should be $1$, i.e., the recourse constructed by $A$ should satisfy $f_\theta(x + A(x)) = 1$.

| Metric group | Metric / Recourse alg. | Adult Baseline | Adult Ours | Compas Baseline | Compas Ours | Bail Baseline | Bail Ours | German Baseline | German Ours |
|---|---|---|---|---|---|---|---|---|---|
| Performance | F1 score | **0.697** | 0.636 | **0.739** | 0.717 | **0.775** | 0.760 | **0.447** | 0.419 |
| | Accuracy | **0.830** | 0.787 | **0.667** | 0.565 | **0.643** | 0.629 | **0.600** | 0.527 |
| | Precision | **0.621** | 0.555 | **0.655** | 0.561 | **0.646** | 0.644 | **0.364** | 0.317 |
| | Recall | **0.799** | 0.752 | 0.850 | **0.991** | **0.968** | 0.930 | 0.583 | **0.638** |
| Recourse (neg) | Linear approx. | **0.220** | 0.053 | **0.156** | 0.068 | **0.102** | 0.029 | **0.204** | 0.086 |
| | Gradient desc. | 0.210 | **0.496** | 0.579 | **1.000** | 0.317 | **1.000** | 0.804 | **0.864** |
| | Adversarial train. | 0.007 | **0.498** | 0.000 | **0.967** | 0.000 | **0.993** | 0.127 | **0.968** |
| Recourse (all) | Linear approx. | **0.453** | 0.328 | 0.773 | **0.982** | 0.957 | **0.981** | **0.607** | 0.593 |
| | Gradient desc. | 0.461 | **0.661** | 0.883 | **1.000** | 0.970 | **1.000** | 0.890 | **0.920** |
| | Adversarial train. | 0.321 | **0.659** | 0.722 | **0.999** | 0.952 | **0.999** | 0.517 | **0.980** |

Table 1: Performance and recourse for the baseline model ($\lambda = 0$) and the model trained with our algorithm ($\lambda = 0.8$), for each of the three algorithms for computing recourse in Section 3.2. We show mean results across 3 random data splits and bold the higher value between the baseline and our algorithm.

Then, our algorithm constructs $h_{\theta,\hat\tau(Z')}$ using $g_\theta$, $Z'$, and the given $\epsilon, \delta$. The PAC guarantee in Eq. 3 says that with probability at least $1 - \delta$ over $p(Z)$, we have
$$\Pr_{p(x,y)}\left[y' \in h_{\theta,\hat\tau(Z')}(x')\right] \geq 1 - \epsilon, \tag{4}$$
where $(x', y') \in Z'$ is the example constructed from $(x, y) \in Z$ as described above. Note that the outer probability is over $p(Z)$ since $Z'$ is a deterministic function of the random variable $Z$. Plugging in the definitions $x' = x + A(x)$ and $y' = 1$, Eq. 4 becomes
$$\Pr_{p(x,y)}\left[1 \in h_{\theta,\hat\tau(Z')}(x + A(x))\right] \geq 1 - \epsilon,$$
and plugging in the definition of $h_{\theta,\tau}$, it becomes
$$\Pr_{p(x,y)}\left[g_\theta(x + A(x)) \geq \hat\tau(Z')\right] \geq 1 - \epsilon. \tag{5}$$
Then, our algorithm returns $\hat\tau_0(Z) = \hat\tau(Z')$ (with the given $\epsilon, \delta$ as the remaining parameters). Since Eq. 5 is equivalent to the PARE condition, we have:

Theorem 3.1. $\hat\theta$ satisfies the PARE condition.
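For intuition, the following is a simplified sketch of how a threshold can be selected on the recourse-augmented calibration set $Z'$ using a direct binomial tail bound. It is in the spirit of Park et al. [2020], but their estimator differs in details, so treat the construction below as illustrative rather than as their algorithm; the function name is ours.

```python
import numpy as np
from scipy.stats import binom

def pare_threshold(scores, eps, delta):
    """Simplified PAC-style threshold selection on Z' (a sketch in the spirit
    of Park et al. [2020]).

    scores: g_theta(x + A(x)) for each calibration point x. Returns a large
    threshold tau_0 such that, with probability >= 1 - delta over the
    calibration draw, P[g_theta(x + A(x)) >= tau_0] >= 1 - eps.
    """
    scores = np.sort(np.asarray(scores))
    n = len(scores)
    # Largest number of calibration "errors" (scores below the threshold)
    # that is still consistent with a true error rate <= eps.
    k_star = -1
    for k in range(n + 1):
        if binom.cdf(k, n, eps) <= delta:
            k_star = k
        else:
            break
    if k_star < 0:
        return 0.0  # not enough data: fall back to the trivial threshold
    # Any threshold at or below the (k_star + 1)-th smallest score leaves at
    # most k_star calibration points strictly below it.
    return float(scores[k_star])

# Usage: given recourse algorithm A and calibration data Z, evaluate
#   scores = [g_theta(x + A(x)) for (x, y) in Z]
# and set the decision threshold to pare_threshold(scores, eps=0.05, delta=0.05).
```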
## 4 Experiments

We evaluate our approach and show how it can effectively improve recourse rates while preserving accuracy. We also demonstrate how it can improve the correctness and robustness of recourses. In Appendix B, we provide additional results on how our approach affects the fairness of models, as well as results on an NLP task with discrete covariates that demonstrate the flexibility of our framework.

### 4.1 Experimental Setup

**Datasets.** We use four real-world datasets. The first contains adult income information from the 1994 United States Census Bureau [Dua and Graff, 2017]. It includes information about adults' demographics, education, and occupations. Each adult is labeled as making below or above $50K a year, which can be thought of as a proxy for whether an individual will be able to repay a loan. The second contains information collected by ProPublica about criminal defendants' COMPAS recidivism scores [Angwin et al., 2016]. This dataset includes information about defendants' demographics and crimes, and each defendant is labeled as having either a high or low likelihood of reoffending, as measured by the COMPAS assessment tool. The third dataset represents bail outcomes from two different U.S. state courts from 1990-2009 [Schmidt and Witte, 1988]. It includes information about individuals' criminal histories and demographics. Each defendant is labeled as having a high or low risk of recidivism. The fourth dataset is the German credit dataset [Dua and Graff, 2017], which contains individuals' demographic, personal, and financial information. Each applicant is labeled as having either high or low credit risk.

Figure 1: How performance and recourse vary with $\lambda$; $\lambda = 0$ is the baseline and $\lambda > 0$ is our approach. We plot means and standard errors across 3 random data splits. The first row shows performance metrics; the second and third show recourse metrics. (Performance metrics: F1, accuracy. Recourse algorithms: gradient descent, adversarial training.)

Figure 2: How performance and recourse vary with $\tau_0$, for our approach ($\lambda = 0.8$, left) and the baseline ($\lambda = 0$, right). (Performance metrics: F1, accuracy. Recourse metrics: recourse (neg), recourse (all).)

We standardize all continuous features to have mean 0 and variance 1. We randomly split each dataset into 80% train and 20% validation sets. We use 3 random data splits and report the mean across splits for the rest of this section, unless otherwise specified. For compas and adult, we hold out 500 examples from the validation set to form a test set; for German, we hold out 100 examples. For bail, we evaluate on a 500-instance subset of the test set. All results are reported on the test set.[4]

[4] Our processed datasets have sizes: adult 32.5K, compas 6K, bail 8K, German 1K.

**Models.** All models are neural networks with three 100-node hidden layers, dropout probability 0.3, and tanh activations. For evaluation, we choose the epoch achieving the highest validation F1 score.
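A minimal sketch of this architecture (assuming PyTorch; the hyperparameter values are those stated above, while the module layout and function name are ours):

```python
import torch.nn as nn

def make_model(n_features: int) -> nn.Sequential:
    """Three 100-unit hidden layers with tanh activations and dropout 0.3,
    followed by a single output logit (apply a sigmoid to obtain g_theta in [0, 1])."""
    layers, width, in_dim = [], 100, n_features
    for _ in range(3):
        layers += [nn.Linear(in_dim, width), nn.Tanh(), nn.Dropout(p=0.3)]
        in_dim = width
    layers += [nn.Linear(in_dim, 1)]
    return nn.Sequential(*layers)
```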
**Parameters.** We experimented with $\lambda$ values between 0.0 and 2.0 in increments of 0.2 and found that $\lambda = 0.8$ provided the best tradeoff between F1 score and recourse rate across all datasets; we evaluate how our results vary with $\lambda$ below. For $\Delta$, we choose the set of actionable features (i.e., features that can be changed as part of the recourse) based on the dataset (two features for adult, bail, and German; one for compas); we set $\delta_{\max} = 0.75$ after standardizing features (see Appendix B.2 for experiments investigating the effect of varying $\delta_{\max}$). We also include domain-specific linear constraints in $\Delta$; e.g., for the adult dataset, recourse can only require that hours worked increases. See Appendix A.3 for more details about our choices of $\Delta$.

We choose the decision threshold $\tau_0$ to maximize F1 score rather than to obtain PAC guarantees, since our goal is to understand the tradeoffs between model performance and recourse rates with a fixed underlying predictive model $f_\theta$. We evaluate the effects of rigorously choosing $\tau_0$ in Section 4.2.

**Baselines.** To the best of our knowledge, our approach is the first to train models with the objective of providing recourse at a higher rate. Thus, we compare to a baseline that omits our recourse loss, i.e., $\lambda = 0$. For our approach and this baseline, we evaluate the performance of each of the three algorithms for computing recourse described in Section 3.2. We use the alibi implementation [Klaise et al., 2019] of the gradient descent algorithm for computing recourse [Wachter et al., 2017] and set the initial value of the hyperparameter $\lambda' = 0.001$. We use LIME [Ribeiro et al., 2016a] as our linear approximation method with the approach proposed by Ustun et al. [2019].

**Metrics.** We evaluate our approach and the baselines with the following metrics: (i) the standard performance metrics of accuracy and F1 score, (ii) recourse (neg), the proportion of instances $x$ with negative original outcomes that receive recourse, i.e., $f_\theta(x) = 0$ but $f_\theta(x + \delta_x) = 1$, and (iii) recourse (all), the proportion of all instances $x$ such that either $f_\theta(x) = 1$, or $f_\theta(x) = 0$ but $f_\theta(x + \delta_x) = 1$. Metric (iii) is most useful for measuring the rate of positive outcomes, since we want to include individuals who are originally assigned a positive outcome.
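A minimal sketch of how the two recourse metrics can be computed (the function and argument names are ours; `compute_recourse` stands for any of the algorithms in Section 3.2):

```python
import numpy as np

def recourse_metrics(f, compute_recourse, X_test):
    """Compute recourse (neg) and recourse (all) as defined above.

    f maps a feature matrix to 0/1 predictions; compute_recourse returns one
    perturbation per row of its input.
    """
    preds = f(X_test)
    neg = preds == 0

    deltas = compute_recourse(X_test[neg])
    flipped = f(X_test[neg] + deltas) == 1          # recourse found and valid

    recourse_neg = float(flipped.mean()) if neg.any() else float("nan")
    recourse_all = float(((preds == 1).sum() + flipped.sum()) / len(preds))
    return recourse_neg, recourse_all
```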
### 4.2 Efficacy of Our Framework

In Table 1, we show the performance and recourse rates of models trained with our approach and of baseline models. Overall, our approach significantly improves recourse without sacrificing performance. Across datasets, models trained using our approach offer recourse at significantly higher rates than the baseline model, for both the adversarial training and gradient descent approaches to computing recourse. We do not observe this trend when using the linear approximation to compute recourse; in this case, both the baseline and our approach perform poorly. We believe this effect can be explained by poor LIME approximations of $f_\theta$, which are exacerbated by adversarial training since it increases the nonlinearity of $f_\theta$. Figure 1 shows how these results vary with $\lambda$: F1 scores and accuracies are relatively stable as a function of $\lambda$.

In Figure 2, we plot how these results vary with $\tau_0$ (for classifiers trained with $\lambda = 0$ and $\lambda = 0.8$ on a single data split), using the adversarial training method of computing recourse.[5] High values of $\tau_0$ lead to lower recourse rates in all cases. The performance curves are similar for the baseline and for our approach, but the decline in recourse values begins at lower thresholds for the baseline. Thus, our approach improves recourse for most choices of $\tau_0$ without sacrificing performance. The trade-off between recourse and performance depends on the dataset. For adult, performance increases while recourse decreases since the majority label in this dataset is negative, whereas for bail and compas, performance and recourse both decrease as $\tau_0$ increases since the majority label is positive.

[5] Note that the recourse (neg) values are low for $\lambda = 0$ because we use the adversarial training method to compute recourse, which builds on a fast linear approximation and thus does not exhaustively find recourses. We use this method instead of the more effective gradient descent method because the latter is less efficient and would require recomputing recourses for each threshold.

**Performance under PARE guarantees.** Next, we show that we can often obtain PARE guarantees without significantly reducing performance. We compare the performance of choosing the decision threshold $\tau_0$ to maximize F1 score with that of choosing $\tau_0$ to obtain PARE classifiers. Specifically, for the latter, we compute $\tau_0$ using the approach described in Section 3.3 with parameters $\epsilon = \delta = 0.05$. Then, we evaluate models at thresholds in 10 equally spaced increments from 0 to this upper bound and fix $\tau_0$ to the value that maximizes F1 score. For these experiments, we use the adversarial training algorithm to compute recourse; we observed similar trends for the gradient descent algorithm.

Results are shown in Table 2. In all cases, choosing $\tau_0$ to satisfy the PARE condition yields a classifier that returns recourse at a rate of at least $1 - \epsilon = 0.95$, which validates our theoretical guarantees. Furthermore, on three of the four datasets, the F1 score does not significantly decrease when imposing the PARE condition. We do see a decrease in F1 score for the baseline model on the adult dataset, but the decrease for our model is smaller, suggesting that our end-to-end framework of training models and then fixing $\tau_0$ is successful at guaranteeing recourse without a large drop in accuracy.[6]

[6] We can obtain a weaker theoretical guarantee at a smaller cost in performance. For instance, applying PARE to our model in Table 2 with $\epsilon = \delta = 0.25$ results in an F1 score of 0.613 on the adult dataset.

| | Adult F1 | Adult Recourse | Compas F1 | Compas Recourse | Bail F1 | Bail Recourse | German F1 | German Recourse |
|---|---|---|---|---|---|---|---|---|
| BL + F1 max | 0.697 | 0.321 | 0.739 | 0.722 | 0.775 | 0.952 | 0.447 | 0.517 |
| BL + PARE | 0.400 | 0.981 | 0.722 | 0.972 | 0.777 | 0.996 | 0.442 | 0.997 |
| Ours + F1 max | 0.636 | 0.659 | 0.717 | 0.999 | 0.760 | 0.999 | 0.419 | 0.980 |
| Ours + PARE | 0.526 | 0.974 | 0.721 | 0.999 | 0.776 | 0.999 | 0.457 | 1.000 |

Table 2: Impact of choosing the decision threshold $\tau_0$ to satisfy the PARE constraint. We show F1 scores and recourse rates for the baseline model (BL, $\lambda = 0$) and the model trained with our algorithm (Ours, $\lambda = 0.8$) using two methods of determining $\tau_0$: (i) maximizing the F1 score (+F1 max), and (ii) guaranteeing that the model satisfies the PARE constraint (+PARE).

Figure 3: Recourse (neg) for the German dataset using the causal recourse algorithm proposed by Karimi et al. [2020b]. We show means and standard errors across 20 random data splits for varying values of $\lambda$.

**Groundedness of recourses.** One key question is whether the recourses of models trained with our approach are grounded in reality, i.e., whether taking them plausibly changes the ground-truth label. For instance, if an individual is denied a loan, they should be given a recourse such that, if they take the specified action, they actually increase their likelihood of paying back the loan; an individual may increase their income and be given a loan, but still fail to repay it.
We want to ensure that our training approach increases the rate of recourses that obey causal relationships in the world, rather than recourses that exploit spurious correlations between features. Directly evaluating the groundedness of recourses is challenging, since we do not know the ground-truth labels for suggested recourses. Thus, we evaluate whether they are causally grounded and in-distribution.

First, we measure the rate at which causally grounded recourses are offered by models trained with our approach. We apply the algorithm for computing causal recourses (CR) introduced by Karimi et al. [2020b] to evaluate whether our proposed training algorithm improves the rate at which causally grounded recourses are offered. Because CR requires access to an underlying structural causal model (SCM) of the world, we only experiment with the German dataset, for which Karimi et al. [2020b] provide an associated SCM. We measure the proportion of test instances $x$ for which the CR algorithm computes a valid recourse that satisfies the constraint that perturbations be bounded by $\delta_{\max}$, i.e., recourse (neg).[7] As shown in Figure 3, the rate at which recourses are found by the CR algorithm increases with $\lambda$. This finding suggests that our training algorithm encourages the existence of causally grounded recourses.

[7] We select hyperparameters for the CR algorithm that maximize recourse (neg) on the train set.

Second, we evaluate whether the recourses offered by models trained with our approach are in-distribution with respect to the original training data. In line with prior work [Slack et al., 2020], we train classifiers to distinguish between original data instances and recourses computed by the gradient descent algorithm for models trained with our approach ($\lambda = 0.8$). In Table 3, we report the accuracies of these classifiers on a held-out test set. The low classifier accuracies across all datasets indicate that these recourses are largely indistinguishable from original data instances. Thus, our framework encourages the existence of recourses that are in-distribution with respect to the original data.

| | Adult | Compas | Bail | German |
|---|---|---|---|---|
| Neural Network | 0.54 | 0.56 | 0.52 | 0.51 |
| Random Forest | 0.53 | 0.55 | 0.51 | 0.51 |
| Logistic Regression | 0.51 | 0.48 | 0.48 | 0.47 |

Table 3: Accuracies of classifiers trained to distinguish between original data instances and recourses computed using the gradient descent recourse algorithm, for models trained on a single random data split with our approach ($\lambda = 0.8$).

**Robustness.** Another key question is whether the recourses generated using our approach are robust, i.e., whether small changes to the recourse still result in valid recourses. For instance, if an individual increases their income by more (or even slightly less) than the amount suggested in the recourse, they would expect to still be provided with a positive decision. For each computed recourse $\delta$, we compute a noisy recourse $\delta'$ by adding i.i.d. Gaussian noise to each actionable feature of $\delta$, i.e., $\delta'_i = \delta_i + \mathcal{N}(0, 0.1)$. We consider a recourse $\delta$ robust if its noisy recourse $\delta'$ is valid, i.e., if $f_\theta(x + \delta') = 1$.

| Robustness Exp. | Recourse Alg. | Adult BL | Adult Ours | Compas BL | Compas Ours | Bail BL | Bail Ours | German BL | German Ours |
|---|---|---|---|---|---|---|---|---|---|
| Recourse | Grad. desc. | 1.000 | 0.865 | 1.000 | 1.000 | 1.000 | 1.000 | 0.949 | 0.898 |
| | Advers. train. | 1.000 | 0.841 | 1.000 | 1.000 | 1.000 | 1.000 | 0.857 | 1.000 |
| Model | -- | 0.976 | 0.926 | 0.980 | 1.000 | 0.994 | 0.996 | 0.970 | 0.920 |

Table 4: The "Recourse" rows show the percentage of recourses found that are robust to noise; the "Model" row shows the percentage of test inputs for which the model is robust to noise in the recourse dimensions. We show results on a single data split for the baseline ($\lambda = 0$, BL) and our approach ($\lambda = 0.8$, Ours); for the recourse experiment, we show results for the gradient descent and adversarial training recourse algorithms from Section 3.2.

In the "Recourse" rows of Table 4, we report the percentage of recourses found that are robust for the baseline model and for the model trained with our adversarial approach. As shown, our training approach does not significantly reduce the robustness of recourses: there is a slight drop in robustness for the adult dataset and for the gradient descent recourse algorithm on the German dataset; however, for compas and bail, recourses remain robust.
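A minimal sketch of the robustness check described above (names are ours; we treat 0.1 as the scale of the Gaussian noise):

```python
import numpy as np

def recourse_robustness(f, X_neg, deltas, actionable_idx, sigma=0.1, seed=0):
    """Fraction of computed recourses that remain valid after adding i.i.d.
    Gaussian noise to the actionable coordinates.

    X_neg: inputs with negative predictions for which recourses were found;
    deltas: the corresponding recourses; actionable_idx: actionable features.
    """
    rng = np.random.default_rng(seed)
    noisy = deltas.copy()
    noise = rng.normal(0.0, sigma, size=(len(deltas), len(actionable_idx)))
    noisy[:, actionable_idx] += noise
    return float((f(X_neg + noisy) == 1).mean())
```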
**Effect on classifier brittleness.** We also investigate the effect of our adversarial training approach on model brittleness. In particular, we measure how sensitive models trained with our adversarial algorithm are to noise in the recourse dimensions, compared to baseline models. We add i.i.d. Gaussian noise, as described above, to the actionable dimensions of the original inputs, and compute the proportion of test instances for which the trained model is robust to this noise. As shown in the "Model" row of Table 4, our training approach does not significantly increase model brittleness: we observe a small drop in model robustness in the recourse dimensions for the adult and German datasets and a small increase in robustness for the compas and bail datasets. These results suggest that our training approach effectively ensures recourse without rendering classifiers brittle.

## 5 Conclusion

We have proposed a novel algorithm for training models that are guaranteed to provide recourse for reversing adverse outcomes with high probability. Our experiments show that our algorithm trains models that provide recourse at high rates without sacrificing accuracy compared to traditional learning algorithms. Future work includes extending our techniques beyond binary classification.

## Acknowledgements

We would like to thank the anonymous reviewers for their insightful feedback. This work is supported in part by NSF awards #IIS-2008461, #IIS-2040989, and #CCF-1910769, and research awards from the Harvard Data Science Institute, Amazon, Bayer, and Google. The views expressed are those of the authors and do not reflect the official policy or position of the funding agencies.

## References

Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. How we analyzed the COMPAS recidivism algorithm. ProPublica, 2016.

Osbert Bastani, Yani Ioannou, Leonidas Lampropoulos, Dimitrios Vytiniotis, Aditya Nori, and Antonio Criminisi. Measuring neural net robustness with constraints. In Advances in Neural Information Processing Systems, pages 2613–2621, 2016.

Osbert Bastani, Carolyn Kim, and Hamsa Bastani. Interpreting blackbox models via model extraction. arXiv preprint arXiv:1705.08504, 2017.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1423. URL https://www.aclweb.org/anthology/N19-1423.
Dheeru Dua and Casey Graff. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml.

Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015a.

Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015b.

Moritz Hardt, Eric Price, and Nati Srebro. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.

Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, and Zeynep Akata. Generating counterfactual explanations with natural language. arXiv preprint arXiv:1806.09809, 2018.

Amir-Hossein Karimi, Gilles Barthe, Borja Balle, and Isabel Valera. Model-agnostic counterfactual explanations for consequential decisions. In International Conference on Artificial Intelligence and Statistics, pages 895–905, 2020a.

Amir-Hossein Karimi, Bernhard Schölkopf, and Isabel Valera. Algorithmic recourse: from counterfactual explanations to interventions, 2020b.

Amir-Hossein Karimi, Julius von Kügelgen, Bernhard Schölkopf, and Isabel Valera. Algorithmic recourse under imperfect causal knowledge: a probabilistic approach, 2020c.

Michael J. Kearns, Umesh Virkumar Vazirani, and Umesh Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.

Janis Klaise, Arnaud Van Looveren, Giovanni Vacanti, and Alexandru Coca. Alibi: Algorithms for monitoring and explaining machine learning models, 2019. URL https://github.com/SeldonIO/alibi.

Himabindu Lakkaraju and Cynthia Rudin. Learning cost-effective and interpretable treatment regimes. In Artificial Intelligence and Statistics, pages 166–175, 2017.

Himabindu Lakkaraju, Ece Kamar, Rich Caruana, and Jure Leskovec. Faithful and customizable explanations of black box models. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 131–138. ACM, 2019.

Himabindu Lakkaraju, Nino Arsov, and Osbert Bastani. Robust and stable black box explanations. In ICML, 2020a.

Himabindu Lakkaraju, Nino Arsov, and Osbert Bastani. Robust and stable black box explanations. In ICML, 2020b.

Arnaud Van Looveren and Janis Klaise. Interpretable counterfactual explanations guided by prototypes, 2019.

Yin Lou, Rich Caruana, and Johannes Gehrke. Intelligible models for classification and regression. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 150–158, 2012.

Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2574–2582, 2016.

Ramaravind K. Mothilal, Amit Sharma, and Chenhao Tan. Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 607–617, 2020.

Sangdon Park, Osbert Bastani, Nikolai Matni, and Insup Lee. PAC confidence sets for deep neural networks via calibrated prediction. In ICLR, 2020.

Rafael Poyiadzi, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and Peter Flach. FACE: Feasible and actionable counterfactual explanations. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, AIES '20, pages 344–350, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450371100. doi: 10.1145/3375627.3375850. URL https://doi.org/10.1145/3375627.3375850.
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pages 1135–1144, 2016a.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016b.

Peter Schmidt and Ann D. Witte. Predicting recidivism in North Carolina, 1978 and 1980. Inter-university Consortium for Political and Social Research, 1988. URL https://www.ojp.gov/pdffiles1/Digitization/115306NCJRS.pdf.

Uri Shaham, Yutaro Yamada, and Sahand Negahban. Understanding adversarial training: Increasing local stability of supervised models through robust optimization. Neurocomputing, 307:195–204, 2018.

Dylan Slack, Sophie Hilgard, Emily Jia, Sameer Singh, and Himabindu Lakkaraju. Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, AIES '20, pages 180–186, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450371100. doi: 10.1145/3375627.3375830. URL https://doi.org/10.1145/3375627.3375830.

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Seattle, Washington, USA, October 2013. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/D13-1170.

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.

Berk Ustun, Alexander Spangher, and Yang Liu. Actionable recourse in linear classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 10–19, 2019.

Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.

S. Wachter, Brent D. Mittelstadt, and Chris Russell. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. European Economics: Microeconomics & Industrial Organization eJournal, 2017.

Fulton Wang and Cynthia Rudin. Falling rule lists. In Artificial Intelligence and Statistics, pages 1013–1022, 2015.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. HuggingFace's Transformers: State-of-the-art natural language processing, 2019.

Jiaming Zeng, Berk Ustun, and Cynthia Rudin. Interpretable classification models for recidivism prediction. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(3):689–722, 2017.

Xin Zhang, Armando Solar-Lezama, and Rishabh Singh. Interpreting neural network judgments via minimal, stable, and symbolic corrections. In Advances in Neural Information Processing Systems, pages 4874–4885, 2018.

Zhengli Zhao, Dheeru Dua, and Sameer Singh.
Generating natural adversarial examples. In International Conference on Learning Representations (ICLR), 2018.