# Trustworthy Actionable Perturbations

Jesse Friedbaum (1,2), Sudarshan Adiga (2), Ravi Tandon (1,2)

(1) Program in Applied Mathematics, University of Arizona, Tucson, AZ, USA. (2) Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ, USA. Correspondence to: Jesse Friedbaum, Ravi Tandon.

Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

## Abstract

Counterfactuals, or modified inputs that lead to a different outcome, are an important tool for understanding the logic used by machine learning classifiers and how to change an undesirable classification. Even if a counterfactual changes a classifier's decision, however, it may not affect the true underlying class probabilities; i.e., the counterfactual may act like an adversarial attack and fool the classifier. We propose a new framework for creating modified inputs that change the true underlying probabilities in a beneficial way, which we call Trustworthy Actionable Perturbations (TAP). This includes a novel verification procedure to ensure that TAP change the true class probabilities instead of acting adversarially. Our framework also includes new cost, reward, and goal definitions that are better suited to effectuating change in the real world. We present PAC-learnability results for our verification procedure and theoretically analyze our new method for measuring reward. We also develop a methodology for creating TAP and compare our results to those achieved by previous counterfactual methods.

## 1. Introduction

As machine learning (ML) classifiers have experienced widespread adoption in applications that have an outsized impact on individuals' lives (such as credit lending (Leo et al., 2019), college admissions (Martinez Neda et al., 2021) and healthcare (Sauer et al., 2022)), the need to understand classifiers' decision making and how to avoid undesirable classifications has become increasingly important. One of the most important tools for filling this need is the counterfactual: a counterfactual for a given input and classifier is a similar input that results in a different classification. Suppose a classifier is designed to determine whether a loan application represents a good or bad credit risk. If the classifier determines a loan to be a bad credit risk, a counterfactual would be a modified loan application that is classified as a good credit risk, e.g., the individual in a loan application is a bad credit risk, but an otherwise identical applicant who is 5 years younger with a $500 higher monthly income would be a good credit risk.

Wachter et al. (2017) first suggested the use of Counterfactual Explanations (CE) to help understand classifier decision making. Subsequent works explored the use of counterfactuals to help individuals change undesirable classifications (Ustun et al., 2019; Karimi et al., 2021; Poyiadzi et al., 2020). Returning to the example of an individual turned down for a loan, this type of counterfactual would not suggest an individual decrease their age (clearly impossible), but rather make practical changes such as pay off all credit card debt and request a 10% smaller loan. These counterfactuals came to be known as Actionable Counterfactuals (AC) or Algorithmic Recourse (AR). Although these counterfactuals change a classifier's decision, it cannot be assumed they will have the same effect on the real world (Freiesleben, 2022), e.g.,
a change that causes a classifier to determine someone is a good credit risk may not increase the person's odds of paying off the loan in reality. König et al. (2023) point out that a counterfactual could change a classifier's decision without changing the real world if the modifications are not causally linked to the output. For example, having a mailing address in an affluent neighborhood may correlate with higher odds of paying off a loan and changing the address could affect a classifier's decision, but there is no causal link. Accordingly, telling an applicant to change their mailing address to a P.O. box in a wealthy neighborhood would not improve their chances of paying off a loan. König et al. (2023) proposed a framework, called Improvement-Focused Causal Recourse (ICR), to ensure modifications are causally linked to the output.

Figure 1. (a) Overview of the framework for creating Trustworthy Actionable Perturbations (TAP): starting from an undesirable input (e.g., a failed loan application), real-world feasible actions, a cost function and a goal (e.g., an 80% chance of repayment) are used to perturb the mutable features into a modified loan application, which is then verified to determine whether it changes the true class probabilities (accept) or acts adversarially (reject). (b) Comparison of the objectives and features of TAP with Counterfactual Explanations (CE) (Wachter et al., 2017), Actionable Counterfactuals/Algorithmic Recourse (AC/AR) (Ustun et al., 2019; Karimi et al., 2021; Poyiadzi et al., 2020) and Improvement-Focused Causal Recourse (ICR) (König et al., 2023), along the criteria: intended to explain classifier decisions, intended to change classifier decisions, intended to change real-world outcomes, real-world feasible changes, optimal efficiency, flexible multi-class definition of goals, ensured causally related changes, and verification against adversarial examples.

In this paper, we focus on tackling new challenges for this problem, which have not been addressed in prior work. Trustworthy Actionable Perturbations (TAP) focus on three novel improvements for affecting real-world outcomes.

**Trustworthiness Against Adversarial Examples:** Szegedy et al. (2013) showed that ML classifiers are brittle and small modifications to an input can cause misclassifications in otherwise accurate classifiers. Among the various definitions of adversarial behavior, we use the following: modifications to a data point are adversarial if they cause a classifier to be far less accurate on the modified data points than on the original data (Diochnos et al., 2018). These modified inputs are called adversarial examples and the algorithms that create them are called adversarial attacks. The algorithms that create counterfactuals are very similar to adversarial attacks, and Pawelczyk et al.
(2022) showed they produce similar outputs, which leads to the troubling conclusion that many counterfactuals may act as adversarial examples and change the classifier decision (the individual is now offered a loan) without changing the true underlying class probabilities (the individual is still likely to default on the loan). The adversarial vulnerability of classifiers is separate from causality concerns. For this reason, we introduce a novel two-step procedure where (1) we generate a suggested change and (2) we use an independently trained verifier to certify that this change is not acting as an adversarial example. We present a methodology for training this verifier and provide analytical results showing that it is PAC-learnable (Theorem 2.3).

**Flexible Goal Definition:** AC/AR focus solely on the final classification of a data point, but this may not always be sufficient or feasible. For instance, a valid AC/AR may lead to a 51% likelihood of paying off a loan, but this may not satisfy the individual. Additionally, a change that improves a cancer patient's odds of survival from 15% to 40% would not constitute a valid AC/AR even though it would be very useful. Accordingly, our framework defines goals through a target set of acceptable outcomes that can be tailored to an individual's needs, and we demonstrate how these target sets can be designed. We note that ICR (König et al., 2023) and one of the AC/AR methods (Dandl et al., 2020) propose the use of goals other than final classification, but our formulation is more flexible and applies to multi-class scenarios. We develop a principled measure of reward by defining a distance to the target set using statistical divergence. We analyze this distance theoretically in Theorem 2.2.

**Real World Efficiency:** Previous works on CE and AC/AR reduce the amount of change made by a counterfactual by minimizing a weighted ℓp-norm of the changes (with the exception of Ramakrishnan et al. (2020)); however, these norms often fail to represent the real-world cost of a change. Alternatively, we minimize a cost measure built specifically to reflect the real-world costs of a change. By using this measure of real-world cost and a principled measure of reward (distance to the target set), TAP can suggest more efficient advice. We present a few examples of the utility of producing efficient advice through TAP: (a) suggest the course of treatment that would double a patient's odds of survival while requiring the least staff hours; (b) list the skills a job applicant could acquire in the least amount of time that would lead to a high probability of receiving an interview; (c) find the cheapest modifications to a product that would bring it into a more premium price range. We illustrate through experiments on real-world data how the use of application-specific cost functions leads to more efficient advice.

Figure 1(a) illustrates our framework of Trustworthy Actionable Perturbations (TAP) for using feasible actions, true cost and an individualized goal to create an efficient change, which is then verified to ensure that the change affects the true class probabilities instead of acting adversarially. Our goal to change the true class probabilities (real-world outcomes) differs from previous CE and AC/AR works that seek only to change the classifier's decision. We share our goal with ICR, which is focused on ensuring that only features causally related to the class are modified.
Our framework, on the other hand, focuses on ensuring that the changes do not exploit the brittleness of ML classifiers and cause misclassifications. This can occur regardless of whether the modified features are causally related to the output. Figure 1(b) provides a summary of the objectives and features of various existing approaches alongside TAP.

## 2. Trustworthy Actionable Perturbations

**Problem Setting and Goals:** Suppose there is an unknown distribution $(x, C) \sim \mathcal{D}$. Here $x$ is a member of the input space $\mathcal{X} \subseteq \mathbb{R}^d$ and $C \in \{1, \ldots, k\}$ is the class of $x$. We define the true class probabilities $y(x) := (P(C=1 \mid x), \ldots, P(C=k \mid x))$. We let $\mathcal{Y}$ denote the $k$-simplex and use a classifier $M : \mathcal{X} \to \mathcal{Y}$ to estimate $y(x)$. Our goal in designing TAP is as follows: given an input $x$ with an undesirable classification $M(x)$, find the most efficient real-world actions to create a modified input $\tilde{x}$ such that the corresponding true probabilities $y(\tilde{x})$ (and not just $M(\tilde{x})$) are more desirable.

**Real World Actionability:** TAP should only suggest modifications that are feasible in the real world (e.g., not decreasing an individual's age). To this end, we introduce the actionable set $A(x)$ of a data point $x$ as the set of all perturbations of $x$ that are feasible in the real world. For example, if $\mathcal{X}$ represents loan applications with $x_1$ the age of the applicant, $x_2$ the applicant's credit score, $x_3$ the amount of credit and $x_4$ the loan duration, the actionable set could be $A(x) = \{\tilde{x} \in \mathcal{X} \mid \tilde{x}_1 = x_1, \tilde{x}_2 = x_2\}$, i.e., the applicant can change the size and duration of the loan they request, but not their age or credit score. Previous works have examined the complexities of actionability, including causal relations between inputs, e.g., one can't increase their education without an increase in age (Mahajan et al., 2019; Karimi et al., 2020b). All of these considerations, as well as limiting changes to attributes which are believed to have a causal link to the output, can be incorporated into $A(x)$.

**Efficiency:** The definition of the most efficient change depends on the context of the problem and could involve a well-defined value such as cost in dollars or a more nebulous value such as the amount of effort required. We characterize this value with a function $d_\mathcal{X} : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, where $d_\mathcal{X}(x, \tilde{x})$ is the cost of changing $x$ to $\tilde{x}$. For example, if $x$ and $\tilde{x}$ represent resumes, then $d_\mathcal{X}(x, \tilde{x})$ could represent the time it would take to acquire the attributes listed on resume $\tilde{x}$ but not on $x$. We note this function may not be a true distance measure. For example, if $d_\mathcal{X}$ represents the difference in financial cost between two courses of medical treatment, then $d_\mathcal{X}(x, \tilde{x})$ should be negative when $\tilde{x}$ is more affordable than $x$.

**Desirability:** We now define what we mean by a desirable outcome, the goal of a TAP. The target set $T$ is the set of all elements of $\mathcal{Y}$ that would be an acceptable result of a TAP. If we wish to belong to a desirable class $w$ with probability no less than $p$, the target set would have the form $T = \{z \in \mathcal{Y} \mid z_w \ge p\}$. If our goal is rather to avoid some undesirable class $u$, $T$ could be of the form $T = \{z \in \mathcal{Y} \mid z_u \le q\}$ for a fixed $q$.
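As a concrete illustration of the actionable set, the cost function and the simple target sets just described, consider the following minimal Python sketch for the loan example. The feature ordering, the weights and all function names are our own illustrative assumptions; they are not taken from the paper or its released code.

```python
import numpy as np

# Hypothetical feature order for the loan example in the text:
# x = [age, credit_score, loan_amount, loan_duration]
IMMUTABLE = [0, 1]  # age and credit score cannot be changed

def in_actionable_set(x, x_tilde, atol=1e-8):
    """A(x): perturbations that leave the immutable features unchanged."""
    x, x_tilde = np.asarray(x, float), np.asarray(x_tilde, float)
    return bool(np.allclose(x_tilde[IMMUTABLE], x[IMMUTABLE], atol=atol))

def cost(x, x_tilde, weights=(0.0, 0.0, 1.0, 0.5)):
    """Illustrative d_X: a weighted sum of absolute feature changes (weights are made up)."""
    x, x_tilde = np.asarray(x, float), np.asarray(x_tilde, float)
    return float(np.sum(np.asarray(weights) * np.abs(x_tilde - x)))

def in_simple_target_set(y_tilde, desirable_class=None, p=0.8, undesirable_class=None, q=0.2):
    """Membership tests for T = {z : z_w >= p} or T = {z : z_u <= q}."""
    if desirable_class is not None:
        return y_tilde[desirable_class] >= p
    return y_tilde[undesirable_class] <= q
```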
More generally, if we wish to belong to a set of desirable classes $W$ with probability at least $p$ and we wish to belong to a set of undesirable classes $U$ with probability no greater than $q$, we would use

$$T = \left\{ z \in \mathcal{Y} \,\middle|\, \sum_{i \in W} z_i \ge p, \ \sum_{i \in U} z_i \le q \right\}. \tag{1}$$

We must quantify how close a TAP comes to achieving its goal in a principled manner. To do this, we first choose a measure of statistical distance $D(y\|z)$ (we use the Kullback-Leibler (KL) divergence). We then denote by $d_\mathcal{Y}(y, T)$ the distance of $y$ to the target set $T$, defined as follows:

$$d_\mathcal{Y}(y, T) := \inf_{z \in T} D(y\|z). \tag{2}$$

We may now formally define Trustworthy Actionable Perturbations. Let $\epsilon$ represent the budget (the amount of work we are willing to perform), and $\delta$ represent the tolerance (how close the final result must come to our target set $T$).

**Definition 2.1** (($\epsilon$, $\delta$)-Trustworthy Actionable Perturbation). $\tilde{x}$ is an ($\epsilon$, $\delta$)-trustworthy actionable perturbation for $x$ and a target set $T$ if

1. $d_\mathcal{X}(x, \tilde{x}) \le \epsilon$
2. $d_\mathcal{Y}(y(\tilde{x}), T) \le \delta$

In order to verify the second condition we must be able to calculate $d_\mathcal{Y}$. Fortunately, the optimization problem in (2) has a differentiable closed-form solution when $D(y\|z)$ is an f-divergence: a broad class of measures including KL divergence, total variation (TV) and other commonly used statistical distances. An f-divergence is defined as $D(y\|z) = \sum_{i=1}^{k} z_i f\!\left(\frac{y_i}{z_i}\right)$, where $f$ is a convex function satisfying $f(1) = 0$ and $f(0) = \lim_{x \to 0^+} f(x)$ (Polyanskiy & Wu, 2024). Theorem 2.2 describes the solution to (2).

**Theorem 2.2.** If $D(y\|z)$ is an f-divergence with twice differentiable $f$ and $T$ is of form (1), then

$$
d_\mathcal{Y}(y, T) =
\begin{cases}
0 & \text{if } S_W \ge p \text{ and } S_U \le q\\[2pt]
p\,f\!\left(\frac{S_W}{p}\right) + (1-p)\,f\!\left(\frac{1-S_W}{1-p}\right) & \text{if } S_W < p \text{ and } S_U \le (1-S_W)\frac{q}{1-p}\\[2pt]
q\,f\!\left(\frac{S_U}{q}\right) + (1-q)\,f\!\left(\frac{1-S_U}{1-q}\right) & \text{if } S_U > q \text{ and } S_W \ge (1-S_U)\frac{p}{1-q}\\[2pt]
p\,f\!\left(\frac{S_W}{p}\right) + q\,f\!\left(\frac{S_U}{q}\right) + (1-p-q)\,f\!\left(\frac{1-S_W-S_U}{1-p-q}\right) & \text{if } S_U > (1-S_W)\frac{q}{1-p} \text{ and } S_W < (1-S_U)\frac{p}{1-q}
\end{cases}
\tag{3}
$$

where $S_W = \sum_{i \in W} y_i$, $S_U = \sum_{i \in U} y_i$, and the four cases correspond to the sets $A$, $B$, $C$ and $D$, which form a partition of $\mathcal{Y}$ defined and visualized in Figure 2. Furthermore, $d_\mathcal{Y}(y, T)$ is continuously differentiable in $y$ over its entire domain.

Figure 2. Illustration of the partition of $\mathcal{Y}$ (total probability in the desired classes vs. total probability in the undesired classes) used to calculate the distance from the target set $T$ in Theorem 2.2. Although the cost function takes a different functional form in each of the four regions, it is continuously differentiable in the entire space.

Equation (3) in Theorem 2.2 is easily calculable and continuously differentiable despite its piece-wise form, which will be significant when creating TAP (see Section 3). The proof of Theorem 2.2 involves showing that optimization problem (2) is convex and finding a value $z$ that satisfies the KKT conditions. This proof and additional results about $d_\mathcal{Y}$ are found in Appendix A.1.

**Real-world Verifiability of TAP:** Note that TAP are defined with respect to the true class probabilities $y(\tilde{x})$ because TAP should have an effect in the real world. Notwithstanding, $y(\tilde{x})$ is unknown and we must use $M(\tilde{x})$ to create our TAP (more details in Section 3), which introduces the risk that we might produce an $\tilde{x}$ that has the desired effect on $M(\tilde{x})$ but not on $y(\tilde{x})$ (like an adversarial example). This is of particular concern because TAP and all other counterfactuals are created by solving an optimization problem of the form

$$\tilde{x} = \arg\min_{\tilde{x}} \ \text{loss}(\tilde{x}, w) + \lambda\, \text{dist}(\tilde{x}, x), \tag{4}$$

which is precisely how most adversarial examples are created (Pawelczyk et al., 2022).
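The reward measure in Theorem 2.2 is straightforward to evaluate in practice. Below is a minimal NumPy sketch of the closed form (3) for the KL divergence, i.e. $f(t) = t \log t$. The function names, the handling of $f(0)$ and the example target set are our own illustrative choices, not the authors' released implementation.

```python
import numpy as np

def f_kl(t):
    """f for the KL divergence: f(t) = t*log(t), extended by f(0) = 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t > 0, t * np.log(np.clip(t, 1e-300, None)), 0.0)

def d_Y(y, W, U, p, q, f=f_kl):
    """Distance (Eq. 3) from y to T = {z : sum_{i in W} z_i >= p, sum_{i in U} z_i <= q}."""
    y = np.asarray(y, dtype=float)
    SW, SU = y[list(W)].sum(), y[list(U)].sum()
    if SW >= p and SU <= q:                          # region A: already inside the target set
        return 0.0
    if SW < p and SU <= (1 - SW) * q / (1 - p):      # region B: only the desirable-class constraint binds
        return float(p * f(SW / p) + (1 - p) * f((1 - SW) / (1 - p)))
    if SU > q and SW >= (1 - SU) * p / (1 - q):      # region C: only the undesirable-class constraint binds
        return float(q * f(SU / q) + (1 - q) * f((1 - SU) / (1 - q)))
    # region D: both constraints bind
    return float(p * f(SW / p) + q * f(SU / q)
                 + (1 - p - q) * f((1 - SW - SU) / (1 - p - q)))

# Example: three classes; we want class 0 with probability >= 0.8 and class 2 with probability <= 0.1.
print(d_Y([0.3, 0.4, 0.3], W={0}, U={2}, p=0.8, q=0.1))
```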
When counterfactuals were first introduced to ML (Wachter et al., 2017), the concern that counterfactuals would act as adversarial examples was dismissed because the adversarial attacks of the time (1) modified many more features than counterfactuals and (2) were targeted almost exclusively at image data, whereas counterfactuals were proposed for use on tabular data. Since that time, Gourdeau et al. (2021) and Su et al. (2019) demonstrated that adversarial attacks can be effective when changing a very small number of features, and several works (Ballet et al., 2019; Mathov et al., 2020; Cartella et al., 2021; Kumar et al., 2021) have shown that adversarial examples exist on tabular data sets. This implies that verification is necessary to achieve results that can be trusted to change the true class probabilities.

Verifying $\tilde{x}$ may appear similar to detecting adversarial examples, which has been the object of significant research (Yang et al., 2020; Roth et al., 2019; Fidel et al., 2020; Carlini & Wagner, 2017a) with no satisfactory solution. Fortunately, we have an important advantage over detecting adversarial examples: we know the original data point $x$ and exactly how it was modified, i.e., $\tilde{x}$. To capitalize on this knowledge, we propose a novel verification procedure using a classifier $V : \mathcal{X} \times \mathcal{X} \to [0, 1]$ which compares two inputs simultaneously and predicts the probability of the inputs belonging to the same class: the value of $V(x, \tilde{x})$ estimates $P(\tilde{C} = C \mid x, \tilde{x})$, where $\tilde{C}$ is the class of $\tilde{x}$. Because $V$ has a different classification task from $M$, attacks targeted against $M$ should not be effective against $V$, and we can use the discrepancy between the estimates of $M$ and $V$ to determine whether an $\tilde{x}$ acts adversarially on $M$. In order to make this comparison, we use the fact that $P(\tilde{C} = C \mid x, \tilde{x})$ can also be estimated using $M$ by calculating $\sum_{i=1}^{k} M_i(x) M_i(\tilde{x})$. If $\tilde{x}$ acts adversarially we would expect $\sum_{i=1}^{k} M_i(x) M_i(\tilde{x})$ to be very small while $V(x, \tilde{x})$ is large. If $\tilde{x}$ is not adversarial we would expect similar values from both $\sum_{i=1}^{k} M_i(x) M_i(\tilde{x})$ and $V(x, \tilde{x})$. Accordingly, we define

$$\Delta(x, \tilde{x}) := V(x, \tilde{x}) - \sum_{i=1}^{k} M_i(x) M_i(\tilde{x}) \tag{5}$$

and verify that an $\tilde{x}$ is trustworthy only if $\Delta(x, \tilde{x}) < \gamma$. In Section 3, we describe how we select the threshold $\gamma$.

**Training a Verifier & PAC Learnability:** In order to create $V$, we must have data on which it can be trained. We build this difference training data by creating all possible pairs of elements from our original training data and labeling the pairs by whether they belong to the same class (1 for the same class, 0 for different classes). If the original training data is $\{(x^{(i)}, C^{(i)})\}_{i=1}^{n}$, the difference training data is $\{((x^{(i)}, x^{(j)}), z^{(i,j)})\}_{1 \le i, j \le n}$, where $z^{(i,j)} = \mathbb{1}[C^{(i)} = C^{(j)}]$. We use the same architecture for $V$ as for $M$ (only changing the number of inputs and outputs), but differing architectures could also be used.

Now that we have a method for training $V$, we show that training in this way leads to a generalizable verifier. To this end, we next present a probably approximately correct (PAC) bound on $V$'s generalization gap which depends on $n$ (number of training samples), $k$ (number of classes), and $d$ (data dimensionality).

**Theorem 2.3.** Let $R(V)$ be the true risk of a verifier $V$ over data drawn from $\mathcal{D}$ and $\hat{R}_S(V)$ be the empirical risk over a sample $S$ of labelled point pairs drawn i.i.d. from $\mathcal{D}$. Both risks are defined using a bounded loss function $\ell(\cdot, \cdot) \le B_\ell$. Also let $V$ be selected from a function class $\mathcal{V}$. Then for any $\delta \in (0, 1)$, with probability $(1 - \delta)$, the following bound on the generalization gap holds:
$$R(V) - \hat{R}_S(V) = O(\,\cdot\,). \tag{6}$$

Here the terms with explicit dependence on $\delta$ have been suppressed because they are dominated by the term in (6). The precise generalization bound is presented in (43) in Appendix A.2.

To prove Theorem 2.3, we construct a definition of risk that fits this new learning scenario (i.e., learning whether two samples are from the same class, as opposed to conventional classification). This risk takes into account that we expect large imbalances between the number of point pairs from the same class and from different classes. In order to obtain the bounds on the generalization gap, we expand this risk into a sum of terms which can be bounded with existing Rademacher-complexity PAC methods. Finally, we bound the growth of these Rademacher complexity terms as a function of $n$, $k$ and $d$ to arrive at (6). The complete proof, including detailed definitions of $R(V)$ and $\hat{R}_S(V)$ as well as additional discussion, is presented in Appendix A.2.

**Remark 2.4.** The bound in Theorem 2.3 is small as long as $n \gg k^2$ and $n$ is exponentially larger than $d$. The relation between $n$ and $k$ is crucial because it implies that the denominator $n^2 - k^2 n$ in the bound remains positive and grows like $n^2$. This differs from typical PAC bounds, where the primary requirement is that $n$ be exponentially larger than $d$ (Theorem 4.3 in Gottlieb et al. (2016)) with only mild dependence on the number of classes $k$. The key implication of this result is: when using a verifier as described in this paper, as the data sets used increase in the number of classes $k$, it is essential that the amount of training data increases at a rate of at least $k^2$.

## 3. Generating TAP

**Two-Step Creation Method:** We now present and discuss the general optimization framework for creating TAP. Ideally, we would like to solve the following optimization problem:

$$\arg\min_{\tilde{x} \in A(x)} \ d_\mathcal{Y}(y(\tilde{x}), T) + \lambda\, d_\mathcal{X}(\tilde{x}, x),$$

where the scalar parameter $\lambda$ balances the effort ($\epsilon$) versus reward ($\delta$) tradeoff. Solving this optimization would be guaranteed to create an effective TAP; unfortunately, $y(\tilde{x})$ is unknown and we cannot solve this problem directly. Instead, we propose the following two-step procedure: in Step 1, we treat $M(\tilde{x})$ as a surrogate for $y(\tilde{x})$, and in Step 2, we use a verification algorithm to ensure that $\tilde{x}$ is not just fooling the classifier.

$$\text{Step 1:} \quad \arg\min_{\tilde{x} \in A(x)} \ d_\mathcal{Y}(M(\tilde{x}), T) + \lambda\, d_\mathcal{X}(\tilde{x}, x) \tag{7}$$

$$\text{Step 2:} \quad \text{Verify } M(\tilde{x}) \approx y(\tilde{x}), \text{ i.e., } \Delta(x, \tilde{x}) \le \gamma \tag{8}$$

**Solving Step 1:** We solve (7) using gradient descent, which requires us to use differentiable models $M$ and to formulate $d_\mathcal{X}$ in a differentiable manner ($d_\mathcal{Y}$ is differentiable according to Theorem 2.2). We modify our gradient descent to address two challenges. (1) We must ensure that our solution is actionable: $\tilde{x} \in A(x)$. (2) Our solution $\tilde{x}$ must follow any formatting rules associated with the data set (for instance, Boolean variables must be either 0 or 1, categorical features must respect one-hot encoding, etc.). A perturbation that follows these formatting rules is called coherent.

```
Algorithm 1  Generating TAP
Input: classifiers M & V, point x, target set T, learning rate α, verification cut-off γ
x̃ ← x
while x̃ not converged do
    g ← ∇_x̃ [ d_Y(M(x̃), T) + λ d_X(x̃, x) + b(x̃) + p(x̃) ]
    g_j ← 0 for all immutable features j
    x̃ ← x̃ − α g
end while
x̃ ← cond(x̃)                        (project onto the coherent space)
ϵ, δ ← d_X(x̃, x), d_Y(M(x̃), T)
if ϵ and δ requirements NOT met then
    adjust λ (see text for explanation)
    return to while loop
end if
if V(x, x̃) − Σ_{i=1}^{k} M_i(x) M_i(x̃) ≥ γ then
    adjust problem parameters (see text for explanation)
    restart algorithm
end if
return x̃
```
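As a companion to Algorithm 1, the following PyTorch sketch shows one way the Step 1 loop could be implemented: Adam on objective (7) plus the penalty terms b and p (described below), with gradients of immutable features masked to zero. The helper names, hyperparameters and the assumption that M returns logits are ours; the coherence projection cond and the λ adjustment are omitted, so this is a sketch rather than the authors' released implementation.

```python
import torch

def generate_tap_step1(x, model, target_dist, cost_fn, actionable_penalty, coherence_penalty,
                       mutable_mask, lam=1.0, lr=0.01, steps=500):
    """One pass of Step 1 (Eq. 7): minimize d_Y(M(x~), T) + lam * d_X(x~, x) + b(x~) + p(x~).
    All helper callables are user-supplied, differentiable stand-ins."""
    x_tilde = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_tilde], lr=lr)           # ADAM in place of plain gradient descent
    for _ in range(steps):
        opt.zero_grad()
        probs = torch.softmax(model(x_tilde), dim=-1)   # M(x~); assumes the model outputs logits
        loss = (target_dist(probs)                      # d_Y(M(x~), T)
                + lam * cost_fn(x_tilde, x)             # lambda * d_X(x~, x)
                + actionable_penalty(x_tilde)           # b(x~): penalize leaving A(x)
                + coherence_penalty(x_tilde))           # p(x~): keep one-hot groups summing to 1
        loss.backward()
        with torch.no_grad():
            x_tilde.grad *= mutable_mask                # zero the gradient of immutable features
        opt.step()
    return x_tilde.detach()
```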
To solve these two difficulties, we first assume $A(x) = \{\tilde{x} \mid l_i \le \tilde{x}_i \le u_i,\ 1 \le i \le d\}$ for some set of lower bounds $\{l_i\}_{i=1}^{d}$ and upper bounds $\{u_i\}_{i=1}^{d}$. An attribute is immutable if $l_i = u_i$. We ensure actionability by setting all elements of the gradient corresponding to immutable features to zero and by adding a large penalty term $b(\tilde{x})$ to the objective function which punishes points for leaving the actionable set. To ensure coherence, we project the result of our gradient descent onto the coherent space using a function $\text{cond} : \mathbb{R}^m \to \mathcal{X}$ which performs the appropriate value rounding to make an input coherent. We found it useful to introduce a second penalty term $p(\tilde{x})$ which requires that any one-hot encoded features sum to 1. This ensures our answers never stray too far from a coherent point and improves robustness. Details on $b$, $p$ and cond are found in Appendix A.3.3. In practice we also found it useful to replace regular gradient descent with the ADAM algorithm (Kingma & Ba, 2014).

**Solving Step 2:** In Section 2, we discussed the necessity of verification and suggested that a TAP can be trusted if $\Delta(x, \tilde{x}) = V(x, \tilde{x}) - \sum_{i=1}^{k} M_i(x) M_i(\tilde{x})$ is smaller than a threshold $\gamma$. Our process for choosing $\gamma$ starts with deciding on an acceptable risk of eliminating a truly effective TAP (we use 10%). To find the $\gamma$ corresponding to this risk, we calculate $\Delta(x^{(i)}, x^{(j)})$ for a sufficiently large number of pairs $(x^{(i)}, x^{(j)})$ from the testing data such that $C^{(i)} = C^{(j)}$. Finally, we pick $\gamma$ such that only the desired percentage of the $\Delta(x^{(i)}, x^{(j)})$ values (e.g., 10%) are above $\gamma$. The verification procedure is now reduced to eliminating any $\tilde{x}$ that results in $\Delta(x, \tilde{x}) > \gamma$.

Figure 3. Details of the data sets used for testing.

| Dataset | Target set T | Mutable features | Cost measure | Immutable features |
|---|---|---|---|---|
| Adult Income | 90% chance of high income | Weekly hours worked, education level, job type, employer type | Combination of time to improve education and hours worked per week | Age, sex, race, marital status |
| Law School Success | 85% chance of passing | Law school grades, location of BAR exam | Combination of effort to improve grades and physical distance to travel to take the BAR | Sex, race, LSAT score, undergraduate GPA |
| Diabetes Prediction | 25% chance of diabetes | BMI, smoking, fruit, vegetable and alcohol consumption, healthcare, education, income, physical activity | Effort required to lose weight, change health habits, education and income, as a weighted 2-norm | Age, sex, blood pressure, cholesterol, stroke, heart disease, mental, physical and general health, difficulty walking |
| German Credit | 80% chance of repayment | Loan size, loan duration, telephone, money in checking account, money in savings account | Total monetary difference in Deutsche Marks (DM) | Age, sex, marital status, income, dependents, loan purpose, employment length and type, housing, credit history, collateral |

**Adjusting for Suitability and Verifiability:** When creating TAP we will often have a particular budget ($\epsilon$) or tolerance ($\delta$) bound we need to satisfy. To find a suitable TAP we repeat Step 1 of our process, adjusting $\lambda$ until the desired budget or tolerance is met: increasing $\lambda$ to decrease $\epsilon$ and decreasing $\lambda$ to decrease $\delta$. It may also be appropriate to use a variety of $\lambda$ values and plot the $\epsilon$ and $\delta$ values of each resulting TAP (see Figure 4). The user may then select a TAP they see as offering particularly good value.
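The pair construction and threshold selection used for verification can be sketched in a few lines of NumPy. The data layout (records as rows of an array X with an integer label vector C) and all function names are our own assumptions for illustration; training of the verifier V itself is omitted.

```python
import numpy as np

def build_pair_data(X, C):
    """Difference training data for V: all pairs (x_i, x_j) labelled 1[C_i == C_j]."""
    n = len(X)
    ii, jj = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    pairs = np.concatenate([X[ii.ravel()], X[jj.ravel()]], axis=1)   # shape (n*n, 2d)
    labels = (C[ii.ravel()] == C[jj.ravel()]).astype(float)
    return pairs, labels

def delta(M_x, M_x_tilde, V_value):
    """Delta(x, x~) = V(x, x~) - sum_i M_i(x) * M_i(x~)  (Eq. 5)."""
    return float(V_value - np.dot(M_x, M_x_tilde))

def choose_gamma(deltas_same_class, reject_rate=0.10):
    """Pick gamma so that only `reject_rate` of same-class test pairs exceed it."""
    return float(np.quantile(deltas_same_class, 1.0 - reject_rate))
```

A perturbation would then be rejected whenever `delta(...)` exceeds the chosen `gamma`, matching the rule Δ(x, x̃) > γ above.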
When a TAP fails the verification step, there are a few recourses. (1) Sometimes it is sufficient to decrease $\lambda$, putting greater emphasis on reaching the target set. (2) Shrink the target set (increase the value of $p$ and decrease the value of $q$) in order to force the algorithm to find more effective changes. (3) Add a random perturbation to $x$ in order to move the starting point away from the adversarial example. The entire procedure is described in Algorithm 1.

## 4. Experimental Results

**Data Sets:** We compare TAP, counterfactuals and adversarial attacks on four data sets from different fields; data set details are found in Figure 3 and Appendix A.3.1. The associated code can be found at https://github.com/JesseFriedbaum/TAP_code.

Adult Income (Kohavi & Becker, 1996): This data set contains demographic information on Americans labelled by whether they had a high income. The actionable set $A(x)$ allows individuals to increase their education, change jobs and adjust their weekly work hours. The cost function $d_\mathcal{X}$ sums the expected number of years to improve education, a one-year cost to change jobs and the square of the change in hours worked (weighted so an additional 3 hours of work per week is equal to a year spent on education).

Law School Success (Wightman, 1998): This data set contains information on law school students labelled by whether they passed the BAR exam. $A(x)$ allows changes to law school grades (through more studying) and the region where the exam is taken. The cost function $d_\mathcal{X}$ sums the increase in grades and the physical distance travelled to take the BAR. Moving to an adjacent region (e.g., Far West to North West) is weighted equal to increasing grades by one standard deviation.

Diabetes Prediction (Centers for Disease Control and Prevention (CDC), 2015): The individuals in this data set are labelled by whether they have diabetes. We define $A(x)$ to allow changes in health habits, BMI, education and income. We use a weighted 2-norm for $d_\mathcal{X}$ to represent the relative difficulty of making changes. For example, starting to get regular physical activity is weighted the same as dropping one BMI point.

German Credit (Hofmann, 1994): This data set contains loan applications. In $A(x)$, we allow for changes to the loan duration and size and the funds in the checking and savings accounts. We use $d_\mathcal{X}$ to measure the total difference in Deutsche Marks (DM) over all elements of the application.

**Other Methods:** We compare our results against counterfactuals created using the original method proposed to create counterfactuals (Wachter et al., 2017) and the diverse counterfactuals (DICE) method of Mothilal et al. (2020), the most cited methods in the literature. These methods use an $\ell_p$-norm-based cost function that often fails to reflect real-world costs (see the examples below). We also compare TAP against the Carlini & Wagner (2017b) $\ell_2$ adversarial attack, one of the most well-known and effective adversarial attacks. The counterfactuals belong to the same actionable set as the TAP, but the adversarial examples are not limited to an actionable set and may not be coherent.
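To make the application-specific cost measures concrete, the sketch below encodes our reading of the Adult Income cost described above: years needed to improve education, a one-year cost for changing jobs, and a squared penalty on the change in weekly hours scaled so that a 3 hour/week change costs as much as one year of education. The record fields and the exact form of the education term are illustrative assumptions, not the released code.

```python
def adult_income_cost(x, x_tilde):
    """Illustrative d_X for the Adult Income data set (field names are hypothetical)."""
    education = max(x_tilde["education_years"] - x["education_years"], 0)  # years to improve education
    job_change = 1.0 if x_tilde["job"] != x["job"] else 0.0                # one-year cost to change jobs
    hours = (x_tilde["hours_per_week"] - x["hours_per_week"]) ** 2 / 9.0   # 3 extra hours/week = 1 year
    return education + job_change + hours
```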
Figure 4. Cost-benefit plots of the $\epsilon$/$\delta$ values of TAP and counterfactuals for an individual from the Law School data set, with grades measured in standard deviations from the mean (a), and for an individual from the Adult Income data set (b). Each panel plots the perturbation budget ($\epsilon$) against the distance to the target set ($\delta$).
**Models:** Gradient boosted tree algorithms (Friedman, 2001) are considered state-of-the-art architectures for tabular data classification (Shwartz-Ziv & Armon, 2022). Unfortunately, these models are not differentiable and cannot be used with our framework. Instead we use neural networks, which we tuned until they provide accuracy on par with gradient boosted tree models on the same data set. Details on our models' structure and training are given in Appendix A.3.2.

**Representative Examples of TAP and the Trade-off between Cost and Desirability:** We first examine two representative examples of how TAP behave differently than counterfactuals for specific individuals. Figure 4 shows a plot of the $\epsilon$/$\delta$ values of TAP and counterfactuals for one individual in the Law School data set and one individual in the Adult Income data set. We examine the results from the Law School data set: the TAP labelled TAP-1 suggests only a mild (0.2 standard deviation) increase in grades and the relatively short move from the Far West to the Great Lakes region, resulting in a small 11% increase in the chance of passing the BAR. On the other hand, TAP-2 suggests a larger increase in grades and a longer move, which results in a much larger 34% increase in the odds of success. Finally, the counterfactual CF-1 suggests an enormous increase in grades and a massive cross-country move to achieve a 51% increase in the odds of success. Turning our attention to the Adult Income example: TAP-3 suggests a relatively simple increase in education to the master's level, resulting in a 20% increase in the odds of a high income. Alternatively, TAP-4 achieves a 71% increase by suggesting far more changes, including a professional degree and becoming self-employed. The counterfactual CF-2 does not suggest becoming self-employed and produces a smaller 67% increase in the odds of high income, despite also suggesting a professional degree and a drastic 16-hour increase in the hours worked per week.

These examples illustrate two trends. 1) TAP offer both low-cost/low-reward (large-$\delta$/small-$\epsilon$) and high-cost/high-reward options, whereas counterfactual methods (Wachter et al., 2017; Mothilal et al., 2020) offer only high-cost options. This is because TAP are defined by distance to the target set, but counterfactuals are defined as belonging to the desirable class, which rules out any advice that doesn't result in the desirable class being the most likely class. 2) Counterfactuals are prone to suggesting very high-cost outliers. This has two main causes: (a) the $\ell_1$ norm used to create the counterfactuals does not accurately represent real-world effort; for example, this norm considers any change of region to cost the same regardless of the actual distance. (b) Because counterfactuals do not use a target set, they are prone to overshooting the desired goal; for example, CF-1 resulted in a 95% chance of passing the BAR when our goal was only 85%.

**Comparison of TAP vs. Other Approaches:** We now compare TAP, counterfactuals (Wachter et al., 2017; Mothilal et al., 2020) and CW attacks (Carlini & Wagner, 2017b) over the entire data sets. In Figure 5, each bar chart refers to a particular data set and a desired distance $\delta$ to the target set $T$.
Each bar shows the percentage of individuals that a method was able to move inside the goal $\delta$ at a variety of costs $\epsilon$. (Bar charts for all data sets are found in Appendix A.3.4.)

Figure 5. (a) and (b) show the average success rate for moving individuals within a variety of distances ($\delta$) to the target set, for low reward ($\delta = 1$), moderate reward ($\delta = 0.5$) and high reward ($\delta = 0$); the y-axis shows the percentage of individuals within the goal distance, and the x-axis represents different costs ($\epsilon$ values). Panel (a) is the performance comparison over the entire German Credit data set and panel (b) over the entire Diabetes Prediction data set. (c) summarizes success values for all data sets: for each entry, the first value is the success rate before the verification procedure and the second is the success rate after verification with a 10% chance of rejecting valid examples.

Figure 5(c). Percentage of individuals moved within $\delta = 0.5$ of the target (before verification / after verification).

Adult Income (cost $\epsilon$ in years of education)

| Method | ϵ = 6 | ϵ = 12 | ϵ = 18 |
|---|---|---|---|
| TAP (our work) | 34% / 31% | 76% / 62% | 81% / 67% |
| Counterfactual (Wachter et al., 2017) | 7.8% / 6% | 71% / 44% | 78% / 45% |
| DICE (Mothilal et al., 2020) | 17% / 12% | 69% / 49% | 78% / 59% |
| Carlini-Wagner (Carlini & Wagner, 2017b) | 7.5% / 0% | 7.5% / 0% | 7.5% / 0% |

Law School Success (cost $\epsilon$ in $\sigma$ grade increase)

| Method | ϵ = 2 | ϵ = 4 | ϵ = 6 |
|---|---|---|---|
| TAP (our work) | 78% / 75% | 99% / 99% | 100% / 100% |
| Counterfactual (Wachter et al., 2017) | 51% / 48% | 95% / 91% | 99% / 95% |
| DICE (Mothilal et al., 2020) | 39% / 38% | 93% / 92% | 100% / 98% |
| Carlini-Wagner (Carlini & Wagner, 2017b) | 90% / 0% | 98% / 0% | 99% / 0% |

Diabetes Prediction (cost $\epsilon$ in BMI point change)

| Method | ϵ = 4 | ϵ = 8 | ϵ = 12 |
|---|---|---|---|
| TAP (our work) | 93% / 56% | 95% / 59% | 98% / 62% |
| Counterfactual (Wachter et al., 2017) | 14% / 8.1% | 47% / 29% | 64% / 35% |
| DICE (Mothilal et al., 2020) | 21% / 14% | 55% / 37% | 71% / 47% |
| Carlini-Wagner (Carlini & Wagner, 2017b) | 0% / 0% | 0% / 0% | 0% / 0% |

German Credit (cost $\epsilon$ in Deutsche Marks, DM)

| Method | ϵ = 0 | ϵ = 3,500 | ϵ = 7,000 |
|---|---|---|---|
| TAP (our work) | 73% / 58% | 100% / 81% | 100% / 85% |
| Counterfactual (Wachter et al., 2017) | 0% / 0% | 62% / 42% | 81% / 58% |
| DICE (Mothilal et al., 2020) | 0% / 0% | 46% / 35% | 77% / 58% |
| Carlini-Wagner (Carlini & Wagner, 2017b) | 3.8% / 0% | 31% / 0% | 31% / 0% |
The table summarizes this information for all data sets with the upper (red) value in each cell representing the data before the verification procedure and the lower (green) value the success rate after the verification procedure. Consider the bar chart on the top middle which refers to the German Credit data and a goal of δ = 0.5 from the target (the same information as the last three columns of the table). At a ϵ = 0 Deutsche Marks (DM) cost, TAP are able to move 73% of individuals within the goal range by closing empty accounts. Counterfactuals do not match this success until the cost ϵ = 7, 000DM, and CW attacks never achieve more than a 31% success rate. TAP outperform counterfactuals in all of the test scenarios. Impact and Effectiveness of Verifier: The first important take away from the success rates after verification is that the verifier was 100% effective at eliminating Carlini Wagner adversarial examples (visible in the bottom row of the table in Figure 5 c), implying that the verification method does indeed eliminate inputs that fool the classifier. Importantly, the verification procedure also removes a significant number of TAP and counterfactuals. Consider the second column of Figure 5 c: Out of all TAP generated 14% appeared effective but were eliminated by the verification procedure. Counterfactual methods fared even worse with 20% to 27% of counterfactuals eliminated. This reinforces the necessity of a verification procedure. Concluding Remarks & Future Work: In this work, we proposed Trustworthy Actionable Perturbations (TAP) which leverage ML classifiers to find efficient actions to Trustworthy Actionable Perturbations achieve real world results. Our proposed framework introduces a novel verification procedure, flexible definition of goals, and principled reward measure for use in generating counterfactuals. We demonstrated their effectiveness when compared to other methods on data sets from multiple fields. Finally we note that our framework is flexible enough to incorporate contributions from previous works on counterfactuals such as individualized cost measures (De Toni et al., 2023), causal relations between inputs (Mahajan et al., 2019; Karimi et al., 2020b), causal relationships to the output (K onig et al., 2023), and advanced optimization methods (Guidotti et al., 2018; Karimi et al., 2020a). Impact Statement As the use of AI and ML expands into critical applications such as healthcare, criminal justice, and hiring, the importance of explaining decisions deemed unfavorable and providing recourse to such users has grown significantly. In this context, our paper introduces a novel contribution aimed at making recourse mechanisms more trustworthy. We present a flexible framework, Trustworthy Actionable Perturbations (TAP), designed to generate cost-effective recourse which can ensure that the recourse being provided to users results in real-world changes. TAP can be useful to both end-users and institutions that suggest the recourse. The technical tools and the analytical results developed in the paper (including a flexible target set, and a novel pairwise verification procedure) can also find use and lead to new insights for other problems such as cost-sensitive learning and adversarial defense. Acknowledgements We thank the anonymous ICML reviewers and the area chairs for their insightful suggestions. This work was supported by NSF grants CAREER 1651492, CCF-2100013, CNS-2209951, CNS-1822071, CNS-2317192, and by the U.S. 
Department of Energy, Office of Science, Office of Advanced Scientific Computing under Award Number DESC-ERKJ422, and NIH Award R01-CA261457-01A1.

## References

Ballet, V., Renard, X., Aigrain, J., Laugel, T., Frossard, P., and Detyniecki, M. Imperceptible adversarial attacks on tabular data. arXiv preprint arXiv:1911.03274, 2019.

Bartlett, P. L. and Mendelson, S. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463-482, 2002.

Carlini, N. and Wagner, D. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3-14, 2017a.

Carlini, N. and Wagner, D. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39-57, 2017b.

Cartella, F., Anunciacao, O., Funabiki, Y., Yamaguchi, D., Akishita, T., and Elshocht, O. Adversarial attacks for tabular data: Application to fraud detection and imbalanced data. arXiv preprint arXiv:2101.08030, 2021.

Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System survey data (BRFSS), 2015. URL https://www.cdc.gov/brfss/index.html.

Dandl, S., Molnar, C., Binder, M., and Bischl, B. Multi-objective counterfactual explanations. In International Conference on Parallel Problem Solving from Nature, pp. 448-469. Springer, 2020.

De Toni, G., Viappiani, P., Teso, S., Lepri, B., and Passerini, A. Personalized algorithmic recourse with preference elicitation. Transactions on Machine Learning Research, 2023.

Diochnos, D., Mahloujifar, S., and Mahmoody, M. Adversarial risk and robustness: General definitions and implications for the uniform distribution. Advances in Neural Information Processing Systems, 31, 2018.

Fidel, G., Bitton, R., and Shabtai, A. When explainability meets adversarial learning: Detecting adversarial examples using SHAP signatures. In 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1-8. IEEE, 2020.

Freiesleben, T. The intriguing relation between counterfactual explanations and adversarial examples. Minds and Machines, 32(1):77-109, 2022.

Friedman, J. H. Greedy function approximation: A gradient boosting machine. Annals of Statistics, pp. 1189-1232, 2001.

Gottlieb, L.-A., Kontorovich, A., and Krauthgamer, R. Adaptive metric dimensionality reduction. Theoretical Computer Science, 620:105-118, 2016.

Gourdeau, P., Kanade, V., Kwiatkowska, M., and Worrell, J. On the hardness of robust classification. Journal of Machine Learning Research, 22(273):1-29, 2021.

Guidotti, R., Monreale, A., Ruggieri, S., Pedreschi, D., Turini, F., and Giannotti, F. Local rule-based explanations of black box decision systems. arXiv preprint arXiv:1805.10820, 2018.

Hofmann, H. Statlog (German Credit Data). UCI Machine Learning Repository, 1994. DOI: https://doi.org/10.24432/C5NC77.

Karimi, A.-H., Barthe, G., Balle, B., and Valera, I. Model-agnostic counterfactual explanations for consequential decisions. In International Conference on Artificial Intelligence and Statistics, pp. 895-905. PMLR, 2020a.

Karimi, A.-H., von Kügelgen, J., Schölkopf, B., and Valera, I. Algorithmic recourse under imperfect causal knowledge: A probabilistic approach. Advances in Neural Information Processing Systems, 33:265-277, 2020b.

Karimi, A.-H., Schölkopf, B., and Valera, I. Algorithmic recourse: From counterfactual explanations to interventions. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 353-362, 2021.
Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Kohavi, R. and Becker, B. UCI Adult data set. UCI Machine Learning Repository, 1996.

König, G., Freiesleben, T., and Grosse-Wentrup, M. Improvement-focused causal recourse (ICR). In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pp. 11847-11855, 2023.

Kumar, N., Vimal, S., Kayathwal, K., and Dhama, G. Evolutionary adversarial attacks on payment systems. In 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 813-818. IEEE, 2021.

Leo, M., Sharma, S., and Maddulety, K. Machine learning in banking risk management: A literature review. Risks, 7(1):29, 2019.

Mahajan, D., Tan, C., and Sharma, A. Preserving causal constraints in counterfactual explanations for machine learning classifiers. arXiv preprint arXiv:1912.03277, 2019.

Martinez Neda, B., Zeng, Y., and Gago-Masague, S. Using machine learning in admissions: Reducing human and algorithmic bias in the selection process. In Proceedings of the 52nd ACM Technical Symposium on Computer Science Education, pp. 1323, 2021.

Mathov, Y., Levy, E., Katzir, Z., Shabtai, A., and Elovici, Y. Not all datasets are born equal: On heterogeneous data and adversarial examples. arXiv preprint arXiv:2010.03180, 2020.

Mohri, M., Rostamizadeh, A., and Talwalkar, A. Foundations of Machine Learning. MIT Press, 2018.

Mothilal, R. K., Sharma, A., and Tan, C. Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 607-617, 2020.

Naeini, M. P., Cooper, G., and Hauskrecht, M. Obtaining well calibrated probabilities using Bayesian binning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 29, 2015.

Pawelczyk, M., Agarwal, C., Joshi, S., Upadhyay, S., and Lakkaraju, H. Exploring counterfactual explanations through the lens of adversarial examples: A theoretical and empirical analysis. In International Conference on Artificial Intelligence and Statistics, pp. 4574-4594. PMLR, 2022.

Polyanskiy, Y. and Wu, Y. Information Theory: From Coding to Learning. 2024.

Poyiadzi, R., Sokol, K., Santos-Rodriguez, R., De Bie, T., and Flach, P. FACE: Feasible and actionable counterfactual explanations. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pp. 344-350, 2020.

Ramakrishnan, G., Lee, Y. C., and Albarghouthi, A. Synthesizing action sequences for modifying model decisions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 5462-5469, 2020.

Roth, K., Kilcher, Y., and Hofmann, T. The odds are odd: A statistical test for detecting adversarial examples. In International Conference on Machine Learning, pp. 5498-5507. PMLR, 2019.

Sauer, C. M., Dam, T. A., Celi, L. A., Faltys, M., de la Hoz, M. A., Adhikari, L., Ziesemer, K. A., Girbes, A., Thoral, P. J., and Elbers, P. Systematic review and comparison of publicly available ICU data sets: A decision guide for clinicians and data scientists. Critical Care Medicine, 50(6):e581-e588, 2022.

Shwartz-Ziv, R. and Armon, A. Tabular data: Deep learning is not all you need. Information Fusion, 81:84-90, 2022.

Su, J., Vargas, D. V., and Sakurai, K. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 23(5):828-841, 2019.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

Ustun, B., Spangher, A., and Liu, Y. Actionable recourse in linear classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 10-19, 2019.

Wachter, S., Mittelstadt, B. D., and Russell, C. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Cybersecurity, 2017.

Wightman, L. F. LSAC National Longitudinal Bar Passage Study. LSAC Research Report Series, 1998.

Yang, P., Chen, J., Hsieh, C.-J., Wang, J.-L., and Jordan, M. ML-LOO: Detecting adversarial examples with feature attribution. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 6639-6647, 2020.

## A. Appendix

The Appendix is organized as follows:

- A.1 Proof of Theorem 2.2 (analysis of the statistical distance $d_\mathcal{Y}$ to the target set $T$)
- A.2 Proof of Theorem 2.3 (PAC generalization bounds for the verifier)
- A.3 Additional details about the implementation of experiments
  - A.3.1 Details about the data sets and their corresponding cost functions
  - A.3.2 Details about the models used
  - A.3.3 Details about the objective function used for optimization
  - A.3.4 Additional experimental results showing the comparative performance of TAP vs. other methods

### A.1. Proof of Theorem 2.2 (Analysis of the statistical distance $d_\mathcal{Y}$ to the target set $T$)

Recall that our target sets have the form

$$T = \left\{ z \in \mathcal{Y} \,\middle|\, \sum_{i \in W} z_i \ge p, \ \sum_{i \in U} z_i \le q \right\},$$

where either $W$ or $U$ could be empty. Also recall

$$d_\mathcal{Y}(y, T) = \min_{z \in T} D_f(y\|z) = \min_{z \in T} \sum_{i=1}^{k} z_i f\!\left(\frac{y_i}{z_i}\right).$$

We must prove three facts: (A) $d_\mathcal{Y}(y, T)$ has the closed form found in equation (3); (B) this function is continuous; (C) the derivative of the function is continuous.

We begin by proving the closed-form equation. Our proof will be made easier by introducing the notation $N = (W \cup U)^C$ for the neutral classes that are neither desirable nor undesirable. We will use the fact that $1 = S_W + S_U + S_N$ to rewrite (3) as

$$
d_\mathcal{Y}(y, T) =
\begin{cases}
0 & \text{if } S_W \ge p \text{ and } S_U \le q\\[2pt]
p\,f\!\left(\frac{S_W}{p}\right) + (1-p)\,f\!\left(\frac{S_U + S_N}{1-p}\right) & \text{if } S_W < p \text{ and } S_U \le (1-S_W)\frac{q}{1-p}\\[2pt]
q\,f\!\left(\frac{S_U}{q}\right) + (1-q)\,f\!\left(\frac{S_W + S_N}{1-q}\right) & \text{if } S_U > q \text{ and } S_W \ge (1-S_U)\frac{p}{1-q}\\[2pt]
p\,f\!\left(\frac{S_W}{p}\right) + q\,f\!\left(\frac{S_U}{q}\right) + (1-p-q)\,f\!\left(\frac{S_N}{1-p-q}\right) & \text{if } S_U > (1-S_W)\frac{q}{1-p} \text{ and } S_W < (1-S_U)\frac{p}{1-q}
\end{cases}
\tag{9}
$$

where $S_W = \sum_{i \in W} y_i$, $S_U = \sum_{i \in U} y_i$ and $S_N = \sum_{i \in N} y_i$.

The case where $y \in T$ is obvious, so we consider only the case where $y \notin T$. First note that the f-divergence $D_f(y\|z)$ is convex in $z$. Furthermore, $T$ is a convex set. Therefore any $z$ satisfying the KKT conditions is a minimizer. The KKT conditions for this problem can be written as

$$\nabla_z L(z) = 0 \tag{10}$$
$$\sum_{i=1}^{k} z_i = 1 \tag{11}$$
$$p - \sum_{i \in W} z_i \le 0 \tag{12}$$
$$\sum_{i \in U} z_i - q \le 0 \tag{13}$$
$$\mu_1, \mu_2 \ge 0 \tag{14}$$
$$\mu_1 \Big(p - \sum_{i \in W} z_i\Big) = 0 \tag{15}$$
$$\mu_2 \Big(\sum_{i \in U} z_i - q\Big) = 0 \tag{16}$$

where the Lagrangian is defined by

$$L(z) = \sum_{i=1}^{k} z_i f\!\left(\frac{y_i}{z_i}\right) + \lambda\Big(\sum_{i=1}^{k} z_i - 1\Big) + \mu_1\Big(p - \sum_{i \in W} z_i\Big) + \mu_2\Big(\sum_{i \in U} z_i - q\Big).$$

Note that we have neglected to explicitly state the requirement that $0 \le z_i \le 1$ for all $i$. This is because our eventual solution will satisfy these bounds anyway, and omitting these bounds drastically simplifies our calculations. We now rewrite (10) as

$$f\!\left(\frac{y_i}{z_i}\right) - \frac{y_i}{z_i} f'\!\left(\frac{y_i}{z_i}\right) + \lambda - \mu_1 = 0, \quad i \in W \tag{17}$$
$$f\!\left(\frac{y_i}{z_i}\right) - \frac{y_i}{z_i} f'\!\left(\frac{y_i}{z_i}\right) + \lambda + \mu_2 = 0, \quad i \in U \tag{18}$$
$$f\!\left(\frac{y_i}{z_i}\right) - \frac{y_i}{z_i} f'\!\left(\frac{y_i}{z_i}\right) + \lambda = 0, \quad i \in N \tag{19}$$

We now propose that a solution can be found in which the ratios $y_i / z_i$ are constant on each of the sets $W$, $U$ and $N$. That is,

$$z_i = C_W\, y_i \ \ (i \in W), \qquad z_i = C_U\, y_i \ \ (i \in U), \qquad z_i = C_N\, y_i \ \ (i \in N).$$

In that case we can satisfy conditions (17), (18) and (19) (originally (10)) by setting

$$\lambda = C_N^{-1} f'(C_N^{-1}) - f(C_N^{-1}), \qquad \mu_1 = \lambda + f(C_W^{-1}) - C_W^{-1} f'(C_W^{-1}), \qquad \mu_2 = -\lambda - f(C_U^{-1}) + C_U^{-1} f'(C_U^{-1}).$$

We can now reformulate (14) so that it is easier to analyze.
We will first define $h(x) := x f'(x) - f(x)$. Note that because $f(x)$ is convex, $h'(x) = x f''(x) \ge 0$ for all $x \ge 0$, so $h(x)$ is increasing. We can then rewrite our formulas for $\lambda$, $\mu_1$ and $\mu_2$ (from (17), (18) and (19)) as

$$\lambda = h(C_N^{-1}), \qquad \mu_1 = h(C_N^{-1}) - h(C_W^{-1}), \qquad \mu_2 = h(C_U^{-1}) - h(C_N^{-1}).$$

Then $\mu_1 \ge 0$ becomes $h(C_N^{-1}) \ge h(C_W^{-1}) \iff C_N^{-1} \ge C_W^{-1} \iff C_N \le C_W$, and $\mu_2 \ge 0$ similarly becomes $C_N \ge C_U$. This implies (14) is equivalent to

$$C_U \le C_N \le C_W. \tag{20}$$

Our choices of $\lambda$, $\mu_1$ and $\mu_2$ (defined via (17), (18) and (19)) satisfy (10), so we must now find values of $C_W$, $C_U$ and $C_N$ that satisfy (11) through (16). We will consider the three cases illustrated in Figure 6.

Figure 6. The three cases visualized in probability space (axes: total probability in the desired classes vs. total probability in the undesired classes); the region inside the target set is the trivial case.

**Case 1:** Suppose $S_W < p$ and $S_U \le (1-S_W)\frac{q}{1-p}$. Let $C_W = \frac{p}{S_W}$ and $C_U = C_N = \frac{1-p}{S_U + S_N}$. This implies $\mu_2 = 0$, which satisfies (16) and $\mu_2 \ge 0$ (half of (14)). It also implies $\sum_{i \in W} z_i = p$, satisfying (12) and (15). We use the fact $S_U + S_N = 1 - S_W$ in our proof of condition (13):

$$\sum_{i \in U} z_i = \sum_{i \in U} C_U\, y_i = \frac{1-p}{S_U + S_N}\, S_U \le \frac{1-p}{S_U + S_N} \cdot (1-S_W)\frac{q}{1-p} = q.$$

This proves (13) is satisfied. Because $S_W < p$ we have

$$C_W = \frac{p}{S_W} > 1 > \frac{1-p}{1-S_W} = \frac{1-p}{S_U + S_N} = C_N.$$

This implies $\mu_1 > 0$ and satisfies the other half of (14). We have now shown all the KKT conditions are satisfied and we have found a minimizer. We now plug these values into (9) to find a closed form for the distance:

$$d_\mathcal{Y}(y, T) = \min_{z \in T} D_f(y\|z) = \sum_{i \in W} \frac{p\, y_i}{S_W} f\!\left(\frac{S_W}{p}\right) + \sum_{i \in U \cup N} \frac{(1-p)\, y_i}{S_U + S_N} f\!\left(\frac{S_U + S_N}{1-p}\right) = p\, f\!\left(\frac{S_W}{p}\right) + (1-p)\, f\!\left(\frac{S_U + S_N}{1-p}\right).$$

**Case 2:** Suppose $S_U > q$ and $S_W \ge (1-S_U)\frac{p}{1-q}$. Let $C_U = \frac{q}{S_U}$ and $C_W = C_N = \frac{1-q}{S_W + S_N}$. This implies $\mu_1 = 0$, which satisfies (15) and $\mu_1 \ge 0$ (half of (14)). We also have $\sum_{i \in U} z_i = q$, satisfying (13) and (16). We now prove condition (12) is satisfied:

$$\sum_{i \in W} z_i = \sum_{i \in W} C_W\, y_i = \frac{1-q}{S_W + S_N}\, S_W \ge \frac{1-q}{S_W + S_N} \cdot (1-S_U)\frac{p}{1-q} = p.$$

Finally, we prove $C_N \ge C_U$, implying $\mu_2 \ge 0$, which satisfies the other half of (14):

$$C_U = \frac{q}{S_U} < 1 < \frac{1-q}{1-S_U} = \frac{1-q}{S_W + S_N} = C_N.$$

Now that we have proven this is a minimizer, we again plug the solution into (9) to find the distance value:

$$d_\mathcal{Y}(y, T) = q\, f\!\left(\frac{S_U}{q}\right) + (1-q)\, f\!\left(\frac{S_W + S_N}{1-q}\right).$$

**Case 3:** Suppose $S_U > (1-S_W)\frac{q}{1-p}$ and $S_W < (1-S_U)\frac{p}{1-q}$. Let $C_W = \frac{p}{S_W}$, $C_U = \frac{q}{S_U}$ and $C_N = \frac{1-p-q}{S_N}$, in which case $\sum_{i \in W} z_i = p$ (satisfying (12) and (15)) and $\sum_{i \in U} z_i = q$ (satisfying (13) and (16)). The choice of $C_N$ ensures that (11) is satisfied:

$$\sum_{i \in W} z_i + \sum_{i \in U} z_i + \sum_{i \in N} z_i = C_W S_W + C_U S_U + C_N S_N = p + q + (1-p-q) = 1.$$

To show that (14) is satisfied, we note that $S_U > (1-S_W)\frac{q}{1-p}$ implies $C_N > C_U$ and $S_W < (1-S_U)\frac{p}{1-q}$ implies $C_N < C_W$. This proves (20), which is equivalent to (14). Plugging these minimizing values of $z$ into (9) yields

$$d_\mathcal{Y}(y, T) = \sum_{i \in W} \frac{p\, y_i}{S_W} f\!\left(\frac{S_W}{p}\right) + \sum_{i \in U} \frac{q\, y_i}{S_U} f\!\left(\frac{S_U}{q}\right) + \sum_{i \in N} \frac{(1-p-q)\, y_i}{S_N} f\!\left(\frac{S_N}{1-p-q}\right) = p\, f\!\left(\frac{S_W}{p}\right) + q\, f\!\left(\frac{S_U}{q}\right) + (1-p-q)\, f\!\left(\frac{S_N}{1-p-q}\right).$$

This proves the closed form in equation (3), and we may now proceed to show that this function is continuous. To prove continuity we need only show continuity at the piece-wise boundaries, which we evaluate one at a time.

**Boundary 1:** $S_W = p$. The two functions that share this boundary are $0$ and $p f\!\left(\frac{S_W}{p}\right) + (1-p) f\!\left(\frac{1-S_W}{1-p}\right)$. Plugging the boundary into the latter function yields

$$p f\!\left(\frac{p}{p}\right) + (1-p) f\!\left(\frac{1-p}{1-p}\right) = p f(1) + (1-p) f(1) = 0.$$

The two functions are equal on the boundary and the boundary is continuous.

**Boundary 2:** $S_U = q$.
The two functions that share this boundary are $0$ and $q f\!\left(\frac{S_U}{q}\right) + (1-q) f\!\left(\frac{1-S_U}{1-q}\right)$. Plugging the boundary into the latter function yields $q f(1) + (1-q) f(1) = 0$. The two functions are equal on the boundary, so the function is continuous across it.

Boundary 3: $S_U = (1-S_W)\frac{q}{1-p}$. The two functions that share this boundary are $p f\!\left(\frac{S_W}{p}\right) + (1-p) f\!\left(\frac{1-S_W}{1-p}\right)$ and $p f\!\left(\frac{S_W}{p}\right) + q f\!\left(\frac{S_U}{q}\right) + (1-p-q) f\!\left(\frac{1-S_W-S_U}{1-p-q}\right)$. Plugging the boundary into the latter function yields
$$p f\!\left(\frac{S_W}{p}\right) + q f\!\left(\frac{1-S_W}{1-p}\right) + (1-p-q) f\!\left(\frac{1-S_W}{1-p}\right) = p f\!\left(\frac{S_W}{p}\right) + (1-p) f\!\left(\frac{1-S_W}{1-p}\right).$$
The two functions are equal on the boundary, so the function is continuous across it.

Boundary 4: $S_W = (1-S_U)\frac{p}{1-q}$. The two functions that share this boundary are $q f\!\left(\frac{S_U}{q}\right) + (1-q) f\!\left(\frac{1-S_U}{1-q}\right)$ and $p f\!\left(\frac{S_W}{p}\right) + q f\!\left(\frac{S_U}{q}\right) + (1-p-q) f\!\left(\frac{1-S_W-S_U}{1-p-q}\right)$. Plugging the boundary into the latter function yields
$$p f\!\left(\frac{1-S_U}{1-q}\right) + q f\!\left(\frac{S_U}{q}\right) + (1-p-q) f\!\left(\frac{1-S_U}{1-q}\right) = q f\!\left(\frac{S_U}{q}\right) + (1-q) f\!\left(\frac{1-S_U}{1-q}\right).$$
The two functions are equal on the boundary, so the function is continuous across it.

We have now shown continuity on all boundaries, so the function is continuous. Now, to show that the derivative of the function is continuous, we need only show that all partial derivatives exist and agree on the boundaries. We use the closed-form equation (3) found in the body of the paper (which is equivalent to the one found at the beginning of the proof) but suppress $S_N$. This makes it easier to differentiate with respect to $\hat{y}_i$, $i \in W \cup U$:
$$d_{\mathcal{Y}}(\hat{y}, T) = \begin{cases} 0 & \text{if } S_W \geq p \text{ and } S_U \leq q \\ p f\!\left(\frac{S_W}{p}\right) + (1-p) f\!\left(\frac{1-S_W}{1-p}\right) & \text{if } S_W < p \text{ and } S_U \leq (1-S_W)\frac{q}{1-p} \\ q f\!\left(\frac{S_U}{q}\right) + (1-q) f\!\left(\frac{1-S_U}{1-q}\right) & \text{if } S_U > q \text{ and } S_W \geq (1-S_U)\frac{p}{1-q} \\ p f\!\left(\frac{S_W}{p}\right) + q f\!\left(\frac{S_U}{q}\right) + (1-p-q) f\!\left(\frac{1-S_W-S_U}{1-p-q}\right) & \text{if } S_U > (1-S_W)\frac{q}{1-p} \text{ and } S_W < (1-S_U)\frac{p}{1-q} \end{cases}$$

We now take the derivative with respect to a desirable class ($i \in W$):
$$\frac{\partial}{\partial \hat{y}_{i \in W}} d_{\mathcal{Y}}(\hat{y}, T) = \begin{cases} 0 & \text{if } S_W > p \text{ and } S_U < q \\ f'\!\left(\frac{S_W}{p}\right) - f'\!\left(\frac{1-S_W}{1-p}\right) & \text{if } S_W < p \text{ and } S_U < (1-S_W)\frac{q}{1-p} \\ 0 & \text{if } S_U > q \text{ and } S_W > (1-S_U)\frac{p}{1-q} \\ f'\!\left(\frac{S_W}{p}\right) - f'\!\left(\frac{1-S_W-S_U}{1-p-q}\right) & \text{if } S_U > (1-S_W)\frac{q}{1-p} \text{ and } S_W < (1-S_U)\frac{p}{1-q} \end{cases}$$
Now we need only ensure all pieces agree on the boundaries to show that the derivative exists and is continuous.

Boundary 1: $S_W = p$. The two functions that share this boundary are $0$ and $f'\!\left(\frac{S_W}{p}\right) - f'\!\left(\frac{1-S_W}{1-p}\right)$. Plugging the boundary into the latter function yields $f'(1) - f'(1) = 0$. Then setting the derivative at the boundary to $0$ makes the derivative on this boundary continuous.

Boundary 2: $S_U = q$. The two functions that share this boundary are both $0$, and setting the derivative at the boundary to $0$ makes the derivative on this boundary continuous.

Boundary 3: $S_U = (1-S_W)\frac{q}{1-p}$. The two functions that share this boundary are $f'\!\left(\frac{S_W}{p}\right) - f'\!\left(\frac{1-S_W}{1-p}\right)$ and $f'\!\left(\frac{S_W}{p}\right) - f'\!\left(\frac{1-S_W-S_U}{1-p-q}\right)$. Plugging the boundary into the latter function yields $f'\!\left(\frac{S_W}{p}\right) - f'\!\left(\frac{1-S_W}{1-p}\right)$. Then setting the derivative at the boundary to $f'\!\left(\frac{S_W}{p}\right) - f'\!\left(\frac{1-S_W}{1-p}\right)$ makes the derivative on this boundary continuous.

Boundary 4: $S_W = (1-S_U)\frac{p}{1-q}$. The two functions that share this boundary are $0$ and $f'\!\left(\frac{S_W}{p}\right) - f'\!\left(\frac{1-S_W-S_U}{1-p-q}\right)$. We rewrite the boundary as $S_U = 1 - \frac{1-q}{p} S_W$ and plug it into the latter function, which yields $f'\!\left(\frac{S_W}{p}\right) - f'\!\left(\frac{S_W}{p}\right) = 0$. Then setting the derivative at the boundary to $0$ makes the derivative on this boundary continuous.

This yields the continuous partial derivative
$$\frac{\partial}{\partial \hat{y}_{i \in W}} d_{\mathcal{Y}}(\hat{y}, T) = \begin{cases} 0 & \text{if } S_W \geq p \text{ and } S_U \leq q \\ f'\!\left(\frac{S_W}{p}\right) - f'\!\left(\frac{1-S_W}{1-p}\right) & \text{if } S_W < p \text{ and } S_U \leq (1-S_W)\frac{q}{1-p} \\ 0 & \text{if } S_U > q \text{ and } S_W \geq (1-S_U)\frac{p}{1-q} \\ f'\!\left(\frac{S_W}{p}\right) - f'\!\left(\frac{1-S_W-S_U}{1-p-q}\right) & \text{if } S_U > (1-S_W)\frac{q}{1-p} \text{ and } S_W < (1-S_U)\frac{p}{1-q} \end{cases} \tag{21}$$

We now take the derivative with respect to an undesirable class ($i \in U$):
$$\frac{\partial}{\partial \hat{y}_{i \in U}} d_{\mathcal{Y}}(\hat{y}, T) = \begin{cases} 0 & \text{if } S_W > p \text{ and } S_U < q \\ 0 & \text{if } S_W < p \text{ and } S_U < (1-S_W)\frac{q}{1-p} \\ f'\!\left(\frac{S_U}{q}\right) - f'\!\left(\frac{1-S_U}{1-q}\right) & \text{if } S_U > q \text{ and } S_W > (1-S_U)\frac{p}{1-q} \\ f'\!\left(\frac{S_U}{q}\right) - f'\!\left(\frac{1-S_W-S_U}{1-p-q}\right) & \text{if } S_U > (1-S_W)\frac{q}{1-p} \text{ and } S_W < (1-S_U)\frac{p}{1-q} \end{cases}$$
Now we need only ensure that there is agreement on the boundaries.

Boundary 1: $S_W = p$. The two functions that share this boundary are both $0$, and setting the derivative at the boundary to $0$ makes the derivative on this boundary continuous.
Boundary 2: $S_U = q$. The two functions that share this boundary are $0$ and $f'\!\left(\frac{S_U}{q}\right) - f'\!\left(\frac{1-S_U}{1-q}\right)$. Plugging the boundary into the latter function yields $f'(1) - f'(1) = 0$. Then setting the derivative at the boundary to $0$ makes the derivative on this boundary continuous.

Boundary 3: $S_U = (1-S_W)\frac{q}{1-p}$. The two functions that share this boundary are $0$ and $f'\!\left(\frac{S_U}{q}\right) - f'\!\left(\frac{1-S_W-S_U}{1-p-q}\right)$. We rewrite the boundary as $S_W = 1 - \frac{1-p}{q} S_U$ and plug it into the latter function, which yields $f'\!\left(\frac{S_U}{q}\right) - f'\!\left(\frac{S_U}{q}\right) = 0$. Then setting the derivative at the boundary to $0$ makes the derivative on this boundary continuous.

Boundary 4: $S_W = (1-S_U)\frac{p}{1-q}$. The two functions that share this boundary are $f'\!\left(\frac{S_U}{q}\right) - f'\!\left(\frac{1-S_U}{1-q}\right)$ and $f'\!\left(\frac{S_U}{q}\right) - f'\!\left(\frac{1-S_W-S_U}{1-p-q}\right)$. Plugging the boundary into the latter function, using $1 - S_W - S_U = (1-S_U)\frac{1-p-q}{1-q}$, yields $f'\!\left(\frac{S_U}{q}\right) - f'\!\left(\frac{1-S_U}{1-q}\right)$. Then setting the derivative at the boundary to $f'\!\left(\frac{S_U}{q}\right) - f'\!\left(\frac{1-S_U}{1-q}\right)$ makes the derivative on this boundary continuous.

This yields the continuous partial derivative
$$\frac{\partial}{\partial \hat{y}_{i \in U}} d_{\mathcal{Y}}(\hat{y}, T) = \begin{cases} 0 & \text{if } S_W \geq p \text{ and } S_U \leq q \\ 0 & \text{if } S_W < p \text{ and } S_U \leq (1-S_W)\frac{q}{1-p} \\ f'\!\left(\frac{S_U}{q}\right) - f'\!\left(\frac{1-S_U}{1-q}\right) & \text{if } S_U > q \text{ and } S_W \geq (1-S_U)\frac{p}{1-q} \\ f'\!\left(\frac{S_U}{q}\right) - f'\!\left(\frac{1-S_W-S_U}{1-p-q}\right) & \text{if } S_U > (1-S_W)\frac{q}{1-p} \text{ and } S_W < (1-S_U)\frac{p}{1-q} \end{cases} \tag{22}$$

We now present a corollary to Theorem 2.2 that shows explicitly that $d_{\mathcal{Y}}$ decreases when probability is added to the desirable classes and increases when probability is added to the undesirable classes.

Corollary A.1. If $T$ is of form (1) and $f$ is twice differentiable, then $d_{\mathcal{Y}}(\hat{y}, T)$ is decreasing in $\hat{y}_i$ if $i \in W$ and is increasing in $\hat{y}_i$ if $i \in U$.

To prove Corollary A.1, i.e. to show that equation (3) is decreasing in $\hat{y}_i$ for $i \in W$ and increasing in $\hat{y}_i$ for $i \in U$, we need only prove that the partial derivative (21) is non-positive and the partial derivative (22) is non-negative. We will rely heavily on the fact that $f'$ is increasing because $f$ is convex.

We start with (21). Clearly the first and third cases are non-positive, so we proceed to the second case. Because $S_W < p$, we have
$$\frac{S_W}{p} < 1 < \frac{1-S_W}{1-p} \quad\Longrightarrow\quad f'\!\left(\frac{S_W}{p}\right) \leq f'\!\left(\frac{1-S_W}{1-p}\right).$$
Next we prove the partial derivative is non-positive in the fourth case:
$$S_W < (1-S_U)\frac{p}{1-q} \;\Longrightarrow\; S_W - q S_W < p - p S_U \;\Longrightarrow\; S_W - q S_W - p S_W < p - p S_U - p S_W$$
$$\Longrightarrow\; S_W (1-p-q) < p (1 - S_U - S_W) \;\Longrightarrow\; \frac{S_W}{p} < \frac{1-S_W-S_U}{1-p-q} \;\Longrightarrow\; f'\!\left(\frac{S_W}{p}\right) \leq f'\!\left(\frac{1-S_W-S_U}{1-p-q}\right).$$
This shows that (21) is non-positive and (3) is decreasing in $\hat{y}_i$ for $i \in W$.

We now consider (22). Clearly the first two cases are non-negative, so we consider the third case. Because $S_U > q$, we have
$$\frac{S_U}{q} > 1 > \frac{1-S_U}{1-q} \quad\Longrightarrow\quad f'\!\left(\frac{S_U}{q}\right) \geq f'\!\left(\frac{1-S_U}{1-q}\right).$$
We can now prove the fourth case is non-negative:
$$S_U > (1-S_W)\frac{q}{1-p} \;\Longrightarrow\; S_U - p S_U > q - q S_W \;\Longrightarrow\; S_U - p S_U - q S_U > q - q S_W - q S_U$$
$$\Longrightarrow\; S_U (1-p-q) > q (1 - S_W - S_U) \;\Longrightarrow\; \frac{S_U}{q} > \frac{1-S_W-S_U}{1-p-q} \;\Longrightarrow\; f'\!\left(\frac{S_U}{q}\right) \geq f'\!\left(\frac{1-S_W-S_U}{1-p-q}\right).$$
This shows that (22) is non-negative and (3) is increasing in $\hat{y}_i$ for $i \in U$.

Additional Analysis on $d_{\mathcal{Y}}$

The following lemma generalizes the result of Corollary A.1 to any f-divergence when we are restricted to the binary classification case. That is, $d_{\mathcal{Y}}$ exhibits the behavior expected of a reward measure for any f-divergence if we restrict ourselves to the binary classification setting (the reward goes down as the probability of the undesirable class goes up).

Lemma A.2. In the binary classification setting, if $T = \{z \in \mathcal{Y} \mid z_1 \geq p\}$, then $d_{\mathcal{Y}}(\hat{y}, T)$ is decreasing (not necessarily strictly) in $\hat{y}_1$ for $D(\hat{y}\|z)$ any f-divergence.

We now present the proof of Lemma A.2. Recall $d_{\mathcal{Y}}(\hat{y}, T) = \min_{z \in T} D_f(\hat{y}\|z)$. For binary probability distributions $a$ and $b$, the f-divergence has the simple form
$$D_f(b\|a) = a_1 f\!\left(\frac{b_1}{a_1}\right) + (1-a_1) f\!\left(\frac{1-b_1}{1-a_1}\right)$$
for a convex function $f$ with $f(1) = 0$.
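Before continuing with the proof of Lemma A.2, we note that the closed form (3) and the monotonicity in Corollary A.1 are easy to sanity-check numerically. The sketch below is our own illustration (it is not part of the paper's released code): it evaluates the piecewise formula for the KL generator $f(t) = t \log t$ and verifies that adding probability mass to a desirable class never increases $d_{\mathcal{Y}}$. The three-class target set, the choice of $p$, $q$ and the tolerance are arbitrary choices made for the example.

```python
import numpy as np

def f_kl(t):
    # KL generator: f(t) = t * log(t), extended with f(0) = 0 by continuity.
    t = np.asarray(t, dtype=float)
    return np.where(t > 0, t * np.log(np.maximum(t, 1e-300)), 0.0)

def d_Y(y, W, U, p, q, f=f_kl):
    """Piecewise closed form for the distance from y to the target set T (equation (3))."""
    S_W, S_U = y[W].sum(), y[U].sum()
    if S_W >= p and S_U <= q:                        # y is already in T
        return 0.0
    if S_W < p and S_U <= (1 - S_W) * q / (1 - p):   # only the W-constraint binds
        return p * f(S_W / p) + (1 - p) * f((1 - S_W) / (1 - p))
    if S_U > q and S_W >= (1 - S_U) * p / (1 - q):   # only the U-constraint binds
        return q * f(S_U / q) + (1 - q) * f((1 - S_U) / (1 - q))
    # both constraints bind
    return (p * f(S_W / p) + q * f(S_U / q)
            + (1 - p - q) * f((1 - S_W - S_U) / (1 - p - q)))

# Spot-check Corollary A.1: moving mass from a neutral class to a desirable
# class should never increase d_Y.
rng = np.random.default_rng(0)
W, U, p, q = [0], [1], 0.8, 0.1           # hypothetical 3-class target set
for _ in range(1000):
    y = rng.dirichlet(np.ones(3))
    eps = min(1e-3, y[2])
    y2 = y + np.array([eps, 0.0, -eps])   # add probability to the desirable class
    assert d_Y(y2, W, U, p, q) <= d_Y(y, W, U, p, q) + 1e-9
print("monotonicity check passed")
```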
We now show a relationship between the binary form of the f-divergence above and a secant line. To refer to the secant line of a function $g(x)$ from the point $x = \alpha$ to $x = \beta$, evaluated at $\gamma$, we will use the notation $S_g(\alpha, \beta; \gamma)$. When using this notation we will assume that $\alpha \leq \beta$. We assume $a_1 > b_1$ and show that $D_f(b\|a)$ is equal to the secant line of $f(x)$ from $x = \frac{b_1}{a_1}$ to $x = \frac{1-b_1}{1-a_1}$, evaluated at $1$. (Note that $\frac{b_1}{a_1} < 1 < \frac{1-b_1}{1-a_1}$.) We show this simply using the point-slope form:
$$S_f\!\left(\frac{b_1}{a_1}, \frac{1-b_1}{1-a_1};\, x\right) = \left(x - \frac{1-b_1}{1-a_1}\right) \frac{f\!\left(\frac{1-b_1}{1-a_1}\right) - f\!\left(\frac{b_1}{a_1}\right)}{\frac{1-b_1}{1-a_1} - \frac{b_1}{a_1}} + f\!\left(\frac{1-b_1}{1-a_1}\right),$$
so that
$$S_f\!\left(\frac{b_1}{a_1}, \frac{1-b_1}{1-a_1};\, 1\right) = \left(1 - \frac{1-b_1}{1-a_1}\right) \frac{f\!\left(\frac{1-b_1}{1-a_1}\right) - f\!\left(\frac{b_1}{a_1}\right)}{\frac{1-b_1}{1-a_1} - \frac{b_1}{a_1}} + f\!\left(\frac{1-b_1}{1-a_1}\right) = a_1 f\!\left(\frac{b_1}{a_1}\right) + (1-a_1) f\!\left(\frac{1-b_1}{1-a_1}\right) = D_f(b\|a).$$

Now that $D_f(b\|a)$ is related to a secant line, we prove a few facts about secant lines of convex functions: if $g$ is convex, then $S_g(\alpha, \beta; \gamma)$ is decreasing in $\alpha$ and increasing in $\beta$ whenever $\alpha < \gamma < \beta$. Recall that if $g$ is convex, then by definition, for any $v_1 < v_2 < v_3$ we have
$$\frac{g(v_2) - g(v_1)}{v_2 - v_1} \leq \frac{g(v_3) - g(v_1)}{v_3 - v_1} \leq \frac{g(v_3) - g(v_2)}{v_3 - v_2}. \tag{24}$$
Then for any $\beta < \tilde{\beta}$ we have
$$S_g(\alpha, \beta; \gamma) = (\gamma - \alpha)\, m + g(\alpha), \tag{25}$$
$$S_g(\alpha, \tilde{\beta}; \gamma) = (\gamma - \alpha)\, \tilde{m} + g(\alpha), \tag{26}$$
with $m \leq \tilde{m}$ by (24). It follows that for any $\gamma \geq \alpha$,
$$S_g(\alpha, \beta; \gamma) \leq S_g(\alpha, \tilde{\beta}; \gamma), \tag{27}$$
and $S_g(\alpha, \beta; \gamma)$ is increasing in $\beta$. A similar argument shows that $S_g(\alpha, \beta; \gamma)$ is decreasing in $\alpha$ when $\gamma \leq \beta$.

We will use these facts to analyze $d_{\mathcal{Y}}(\hat{y}, T) = \min_{z \in T} D_f(\hat{y}\|z)$. The f-divergence between identical distributions is zero, so we have $d_{\mathcal{Y}}(\hat{y}, T) = 0$ whenever $\hat{y}_1 \geq p$. When $\hat{y}_1 < p$ we have $\frac{\hat{y}_1}{z_1} < 1 < \frac{1-\hat{y}_1}{1-z_1}$ for every $z \in T$, and
$$d_{\mathcal{Y}}(\hat{y}, T) = \min_{z \in T} D_f(\hat{y}\|z) = \min_{z \in T} S_f\!\left(\frac{\hat{y}_1}{z_1}, \frac{1-\hat{y}_1}{1-z_1};\, 1\right),$$
which is decreasing in $\frac{\hat{y}_1}{z_1}$ and increasing in $\frac{1-\hat{y}_1}{1-z_1}$, so to achieve the minimum we use the smallest possible $z_1$, i.e. $z_1 = p$. We may now simplify:
$$d_{\mathcal{Y}}(\hat{y}, T) = \begin{cases} S_f\!\left(\frac{\hat{y}_1}{p}, \frac{1-\hat{y}_1}{1-p};\, 1\right) & \text{if } \hat{y}_1 < p \\ 0 & \text{otherwise.} \end{cases}$$
Note that this is continuous at $\hat{y}_1 = p$ because $S_f(1, 1; 1) = f(1) = 0$. With this closed-form solution for $d_{\mathcal{Y}}(\hat{y}, T)$ we may finish the proof. We have already shown that $S_f\!\left(\frac{\hat{y}_1}{p}, \frac{1-\hat{y}_1}{1-p};\, 1\right)$ is decreasing in $\frac{\hat{y}_1}{p}$ and increasing in $\frac{1-\hat{y}_1}{1-p}$, so increasing $\hat{y}_1$ will decrease $S_f\!\left(\frac{\hat{y}_1}{p}, \frac{1-\hat{y}_1}{1-p};\, 1\right)$, and $d_{\mathcal{Y}}(\hat{y}, T)$ is decreasing in $\hat{y}_1$.

A.2. Proof of Theorem 2.3 (PAC generalization bounds for Verifier)

Let us define $D_i$ as the distribution of the data $x$ conditioned on the event that it is drawn from class $i$. We define a loss function $\ell: \{0, 1\} \times [0, 1] \to \mathbb{R}$ as follows:
$$\ell(z, v) = z\, l(v) + (1-z)\, l(1-v), \tag{28}$$
where $l$ is some differentiable function (e.g., $l(\cdot) = -\log(\cdot)$, which would lead to the cross-entropy loss). Furthermore, we assume that the output of the loss $\ell$ is upper bounded by a constant $B_\ell$ and is Lipschitz. The verifier output $V(x, \tilde{x})$ estimates the probability that $x$ and $\tilde{x}$ belong to the same class. Using this loss, we now define the true risk $R(V)$ of a verifier $V$ as
$$R(V) = \underbrace{\frac{1}{k(k-1)} \sum_{i \neq j} \mathbb{E}_{x^{(i)} \sim D_i,\, x^{(j)} \sim D_j}\big[\ell(0, V(x^{(i)}, x^{(j)}))\big]}_{R^{(\mathrm{diff})}(V)} + \underbrace{\frac{1}{k} \sum_{i=1}^{k} \mathbb{E}_{x, \tilde{x} \sim D_i}\big[\ell(1, V(x, \tilde{x}))\big]}_{R^{(\mathrm{same})}(V)}. \tag{29}$$
The verifier faces two types of inputs that it should be able to distinguish: (a) pairs of inputs that come from the same class (i.e., $x, \tilde{x} \sim D_i$ for some class $i$) and (b) pairs of inputs that belong to different classes (i.e., $x^{(i)} \sim D_i$ and $x^{(j)} \sim D_j$ for some pair of classes $i \neq j$). This formulation of the risk assigns equal value to identifying pairs from the same class and pairs from different classes because both $R^{(\mathrm{diff})}(V)$ (the risk on pairs from different classes) and $R^{(\mathrm{same})}(V)$ (the risk on pairs from the same class) are normalized by dividing by the total number of terms in the sum. Specifically, we normalize the total risk for misclassifying pairs from different classes by $k(k-1)$, which is the number of distinct ordered pairs of classes we can form out of $k$ classes.
Similarly, we normalize the total risk of misclassifying pairs from the same class by $k$. Furthermore, both $R^{(\mathrm{diff})}(V)$ and $R^{(\mathrm{same})}(V)$ assign equal importance to each possible type of class combination (which class the first element of the pair comes from and which class the second element of the pair comes from).

To calculate our empirical risk we will assume we are given $k$ sets $S^{(i)}$, $1 \leq i \leq k$, each containing $n/k$ samples drawn independently from the corresponding $D_i$ as defined above. We index these sets as follows:
$$S^{(i)} = \{x^{(i)}_{(q)}\}_{q=1}^{n/k}, \qquad i = 1, 2, \ldots, k. \tag{30}$$
We define the entire dataset $S$ as
$$S = \bigcup_{i=1}^{k} S^{(i)}. \tag{31}$$
We define our empirical risk for training the verifier over the set $S$ as follows:
$$\hat{R}_S(V) = \underbrace{\frac{1}{k(k-1)} \sum_{i \neq j} \frac{k^2}{n^2} \sum_{q=1}^{n/k} \sum_{r=1}^{n/k} \ell\big(0, V(x^{(i)}_{(q)}, x^{(j)}_{(r)})\big)}_{\hat{R}^{(\mathrm{diff})}_S(V)} + \underbrace{\frac{1}{k} \sum_{i=1}^{k} \frac{1}{\frac{n}{k}\left(\frac{n}{k} - 1\right)} \sum_{q=1}^{n/k} \sum_{\substack{r=1 \\ r \neq q}}^{n/k} \ell\big(1, V(x^{(i)}_{(q)}, x^{(i)}_{(r)})\big)}_{\hat{R}^{(\mathrm{same})}_S(V)}, \tag{32}$$
where $\hat{R}^{(\mathrm{diff})}_S(V)$ denotes the empirical risk of the verifier on inputs from different classes and $\hat{R}^{(\mathrm{same})}_S(V)$ denotes the empirical risk of the verifier on inputs from the same class. It is straightforward to verify that $\hat{R}_S(V)$ is an unbiased estimator of the true risk $R(V)$, i.e., $\mathbb{E}(\hat{R}_S(V)) = R(V)$. Let us define the worst-case generalization gap for a given dataset $S$ as
$$\phi(S) = \sup_{V \in \mathcal{V}} \Big( R(V) - \hat{R}_S(V) \Big), \tag{33}$$
where $\mathcal{V}$ denotes the hypothesis class from which the verifier $V(\cdot)$ is selected. To bound this generalization gap, we will use the notion of Rademacher complexity, which measures the correlation between the function class and random labels in order to upper bound the generalization gap (Mohri et al., 2018). The empirical Rademacher complexity of a hypothesis class over a particular data set is formally defined as follows.

Definition A.3. The empirical Rademacher complexity of a function class $\mathcal{F}$ with respect to the set $\check{S} = \{a_i\}_{i=1}^{n}$ is given by
$$\mathcal{R}_{\check{S}}(\mathcal{F}) = \frac{1}{n}\, \mathbb{E}_{\sigma}\Big[\sup_{f \in \mathcal{F}} \sum_{i=1}^{n} \sigma_i f(a_i)\Big], \tag{34}$$
where the $\sigma_i$ are i.i.d. Rademacher random variables, i.e., $\Pr(\sigma_i = 1) = \Pr(\sigma_i = -1) = \frac{1}{2}$.

In the following steps, we upper bound the generalization gap in (33) using Rademacher complexity. We first bound the generalization gap using the triangle inequality as follows:
$$\phi(S) = \sup_{V \in \mathcal{V}} \Big( R^{(\mathrm{diff})}(V) + R^{(\mathrm{same})}(V) - \hat{R}^{(\mathrm{diff})}_S(V) - \hat{R}^{(\mathrm{same})}_S(V) \Big) \tag{35}$$
$$\leq \sup_{V \in \mathcal{V}} \Big( R^{(\mathrm{diff})}(V) - \hat{R}^{(\mathrm{diff})}_S(V) \Big) + \sup_{V \in \mathcal{V}} \Big( R^{(\mathrm{same})}(V) - \hat{R}^{(\mathrm{same})}_S(V) \Big). \tag{36}$$
The above bound first decomposes the generalization gap into the sum of two generalization gaps, the first over pairs of samples coming from different classes and the second over pairs of samples drawn from the same class. To proceed we will need a few additional definitions: we define $D_i \otimes D_j$ to represent the distribution over pairs $(x^{(i)}, x^{(j)})$ where $x^{(i)}$ is drawn from $D_i$ and $x^{(j)}$ is drawn from $D_j$ independently. We also define the sets
$$S^{(i)} \times S^{(j)} = \begin{cases} \{(x^{(i)}_{(q)}, x^{(j)}_{(r)})\}_{1 \leq q, r \leq n/k} & i \neq j \\ \{(x^{(i)}_{(q)}, x^{(i)}_{(r)})\}_{1 \leq q, r \leq n/k,\, q \neq r} & i = j, \end{cases} \tag{37}$$
and enumerate the elements of each set by $S^{(i)} \times S^{(j)} = \{u^{i,j}_q\}_{q=1}^{(n/k)^2}$ when $i \neq j$. When $i = j$ the enumeration takes the form $S^{(i)} \times S^{(i)} = \{u^{i,i}_q\}_{q=1}^{(n/k)^2 - n/k}$.

Using our definitions of the true and empirical risk, we can now upper bound the above sum as follows:
$$\phi(S) \leq \frac{1}{k(k-1)} \sum_{i \neq j} \sup_{V \in \mathcal{V}} \Big| \mathbb{E}_{u \sim D_i \otimes D_j}\big[\ell(0, V(u))\big] - \frac{1}{|S^{(i)} \times S^{(j)}|} \sum_{q} \ell\big(0, V(u^{(i,j)}_q)\big) \Big| + \frac{1}{k} \sum_{i=1}^{k} \sup_{V \in \mathcal{V}} \Big| \mathbb{E}_{u \sim D_i \otimes D_i}\big[\ell(1, V(u))\big] - \frac{1}{|S^{(i)} \times S^{(i)}|} \sum_{q} \ell\big(1, V(u^{(i,i)}_q)\big) \Big|, \tag{38}$$
where the inequality follows by first splitting the diff and same terms as in (36) and then bounding the absolute value of a sum by the sum of the absolute values (across both the diff and same terms).
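As an aside, the empirical risk (32) is straightforward to compute directly from the $k$ class-conditioned sample sets. The following minimal sketch is our own illustration (not from the paper's released code); the toy verifier, the synthetic class sets and the function names are stand-ins used only to make the normalization of the diff and same terms concrete.

```python
import numpy as np

def binary_ce(z, v, eps=1e-12):
    # Loss from (28) with l(v) = -log(v), i.e. the cross-entropy loss.
    v = np.clip(v, eps, 1 - eps)
    return -(z * np.log(v) + (1 - z) * np.log(1 - v))

def empirical_risk(verifier, class_sets):
    """Empirical verifier risk: 'diff' term averaged over ordered pairs of distinct
    classes, 'same' term averaged over ordered pairs (q, r), q != r, within a class."""
    k = len(class_sets)
    r_diff, r_same = 0.0, 0.0
    for i in range(k):
        Xi = class_sets[i]
        for j in range(k):
            Xj = class_sets[j]
            if i != j:
                scores = np.array([verifier(x, xp) for x in Xi for xp in Xj])
                r_diff += binary_ce(0, scores).mean()
            else:
                scores = np.array([verifier(x, xp)
                                   for a, x in enumerate(Xi)
                                   for b, xp in enumerate(Xi) if a != b])
                r_same += binary_ce(1, scores).mean()
    return r_diff / (k * (k - 1)) + r_same / k

# Toy usage with a stand-in verifier that compares feature means.
rng = np.random.default_rng(1)
class_sets = [rng.normal(loc=c, size=(20, 5)) for c in range(3)]   # k = 3, n/k = 20
toy_verifier = lambda x, xp: float(np.exp(-np.abs(x.mean() - xp.mean())))
print(empirical_risk(toy_verifier, class_sets))
```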
We now apply the standard Rademacher complexity PAC bound (Mohri et al., 2018; Bartlett & Mendelson, 2002) to each of the suprema in (38). This implies that the following inequalities hold with probability $1-\delta$ for any $\delta \in (0, 1)$. (The probability in this case stems from the random selection/draw of $S$.)
$$\phi(S) \leq \frac{1}{k(k-1)} \sum_{i \neq j} \left( 2\mathcal{R}_{S^{(i)} \times S^{(j)}}(\mathcal{V}) + 6 k B_\ell \sqrt{\frac{\log(2/\delta)}{2\,|S^{(i)} \times S^{(j)}|}} \right) + \frac{1}{k} \sum_{i=1}^{k} \left( 2\mathcal{R}_{S^{(i)} \times S^{(i)}}(\mathcal{V}) + 6 k B_\ell \sqrt{\frac{\log(2/\delta)}{2\,|S^{(i)} \times S^{(i)}|}} \right)$$
$$\leq \frac{2}{k(k-1)} \sum_{i \neq j} \mathcal{R}_{S^{(i)} \times S^{(j)}}(\mathcal{V}) + \frac{2}{k} \sum_{i=1}^{k} \mathcal{R}_{S^{(i)} \times S^{(i)}}(\mathcal{V}) + 12 k B_\ell \sqrt{\frac{\log(2/\delta)}{2\,|S^{(i)} \times S^{(i)}|}}, \tag{39}$$
where the final inequality comes from replacing the $\delta$-term for the different-class pairs with the larger $\delta$-term for the same-class pairs, whose sets contain fewer elements. Equation (39) can be interpreted as the sum of three terms: the first term is the average Rademacher complexity over the datasets corresponding to pairs drawn from different classes; the second term is the average Rademacher complexity over the datasets corresponding to pairs drawn from the same class; the third term is a standard term which shows the dependence on $\delta$ (as well as on $n$ and $k$). We now apply the bound on empirical Rademacher complexity
$$\mathcal{R}_{Q}(\mathcal{V}) = O\big(|Q|^{-1/d}\big) \tag{40}$$
with $d$ the dimension of the elements of $Q$ (Gottlieb et al., 2016). To apply this we recall that the dimension of the elements of $S^{(i)} \times S^{(j)}$ is $2d$, that $|S^{(i)} \times S^{(j)}| = (n/k)^2$ when $i \neq j$, and that $|S^{(i)} \times S^{(i)}| = (n/k)^2 - n/k$. Applying our Rademacher complexity bound yields
$$\phi(S) \leq O\!\left(\Big(\tfrac{k}{n}\Big)^{1/d}\right) + O\!\left(\Big(\big(\tfrac{n}{k}\big)^2 - \tfrac{n}{k}\Big)^{-1/(2d)}\right) + 12 k B_\ell \sqrt{\frac{\log(2/\delta)}{2\big((n/k)^2 - n/k\big)}}. \tag{43}$$
The bound in (43) is our final PAC bound, which holds with probability $1-\delta$. However, we expect the $\delta$-containing term to be dominated by the other terms because $k^2/(n^2 - kn) < 1$ and $d$ is expected to be much larger than 1.

A.3. Additional Implementation Details

In this section we give additional details on how we implemented our methods to create the experimental results found in this paper. We also provide code for replicating our results at https://github.com/JesseFriedbaum/TAP_code.

A.3.1. DATA SET AND COST FUNCTION DETAILS

Here we give an additional description of each data set and the corresponding cost functions $d_X$ used in our experiments. As noted in Section 3, we must ensure $d_X$ is differentiable. When dealing with categorical features, costs are by nature discrete (and not differentiable); we show how we were able to write these costs in a differentiable form. Suppose $v \in \mathbb{R}^{\ell}$ is a one-hot encoding of a categorical feature and define the transition cost matrix $A$ such that $A_{i,j}$ is the cost of changing from category $i$ to category $j$. Then $v^{\top} A z$, where $z$ is the encoding of the perturbed categorical feature, represents the cost of changing this categorical feature and is differentiable in $z$.

Adult Income Prediction Dataset: (Kohavi & Becker, 1996) This widely used data set contains information from the 1994 U.S. census, with individuals labelled by whether their annual income was over \$50,000 (about \$100,000 in 2023 when adjusted for inflation). We define our target set $T$ as over 80% probability of high income. Our actionable set allows changes in job type, education and number of hours worked, with all other attributes immutable. The cost function $d_X$ includes the expected number of years to improve education (e.g., two years to go from an associate's degree to a bachelor's degree), a one-year cost to change employer type and the 2-norm of the change in hours worked per week (weighted so 3 hours per week is equivalent to a year spent on education). Here, Trustworthy Actionable Perturbations suggest the best way to improve an individual's odds of making a large income with the least time and effort.
Specifically, $d_X$ is the sum of the costs from changes in (1) hours worked per week, (2) employment type, (3) education and (4) field of work. The cost from a change in hours is given by $h^2/10$, where $h$ is the change in weekly hours worked; this means 3 extra hours of work per week are approximately equivalent to one year of schooling. The cost from a change in employer (the options are government, private, self-employed and other) is always 1 (equivalent to a year spent on education). The possible levels of education are (1) any schooling, (2) High School Degree, (3) Professional Degree, (4) some college, (5) Associate's Degree, (6) Bachelor's Degree, (7) Master's Degree, (8) Doctorate Degree. The cost transition matrix associated with the level of education (as ordered above) is
$$A_{\text{Education}} = \begin{pmatrix} 0 & 2 & 10 & 3 & 4 & 6 & 8 & 11 \\ L & 0 & 8 & 1 & 2 & 4 & 6 & 9 \\ L & L & 0 & L & L & L & 2 & 5 \\ L & L & 7 & 0 & 1 & 3 & 5 & 8 \\ L & L & 6 & L & 0 & 2 & 4 & 7 \\ L & L & 4 & L & L & 0 & 2 & 5 \\ L & L & 4 & L & L & L & 0 & 3 \\ L & L & 4 & L & L & L & L & 0 \end{pmatrix}$$
where $L$ is a large number meant to prevent suggestions that lead to a decrease in education, which is impossible (we use $L = 1{,}000$). These numbers represent the expected number of years required to gain the specified degree (e.g., the cost of going from a high school degree to a bachelor's degree is $A_{2,6} = 4$). Finally, the options for the field of work are (1) Service, (2) Sales, (3) Blue-Collar, (4) White-Collar, (5) Professional, (6) Other. The cost transition matrix associated with the field of work (as ordered above) is
$$A_{\text{Profession}} = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 1 \\ 1 & 0 & 1 & 2 & 3 & 1 \\ 1 & 1 & 0 & 1 & 2 & 1 \\ 1 & 1 & 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 1 & 1 & 0 \end{pmatrix}$$
This represents a minimum cost of 1 for any change in field of work, with higher costs when moving to a relatively more selective field (e.g., service to professional).

Law School Success Prediction Dataset: (Wightman, 1998) This data set contains demographic information and academic records for over 20,000 law school students, labelled by whether or not the student passed the BAR exam. Our target set is an 85% chance of passing the BAR. To create $A(x)$, we suppose that law school performance is merely a projection that can be changed through more studying, allowing us to change the law school grades and the location where the student takes the BAR. The cost function $d_X$ sums the increase in grades (in standard deviations from the mean) and the physical distance travelled to take the BAR, where moving to an adjacent region (e.g., Far West to North West) is weighted the same as increasing grades by one standard deviation. This setup returns the optimal combination of studying harder and moving location to take the BAR. The country was divided into eight regions: (1) Far West, (2) Great Lakes, (3) Mid-South, (4) Mountain West, (5) Mid-West, (6) North East, (7) New England, (8) North West. We use the transition cost matrix
$$A_{\text{Region}} = \begin{pmatrix} 0 & 3 & 4 & 1 & 2 & 6 & 5 & 1 \\ 3 & 0 & 1 & 2 & 1 & 2 & 1 & 3 \\ 4 & 1 & 0 & 2 & 1 & 2 & 1 & 5 \\ 1 & 2 & 2 & 0 & 1 & 4 & 3 & 2 \\ 2 & 1 & 1 & 1 & 0 & 3 & 2 & 3 \\ 6 & 2 & 2 & 4 & 3 & 0 & 1 & 5 \\ 5 & 1 & 1 & 3 & 2 & 1 & 0 & 5 \\ 1 & 3 & 5 & 2 & 3 & 5 & 5 & 0 \end{pmatrix}$$
Moves to adjacent regions result in a cost of 1, while the highest cost of 6 is incurred by moving from the Far West to New England or back.
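To make the differentiable categorical cost $v^{\top} A z$ described at the start of A.3.1 concrete, the following minimal sketch (our own illustration, not taken from the released TAP code) applies it to the education transition matrix above. The relaxed encoding `z`, the function name and the specific perturbation are illustrative assumptions about how an optimizer might represent the perturbed feature.

```python
import torch

L = 1_000.0  # large constant discouraging impossible (backward) education moves
A_education = torch.tensor([
    [0,  2, 10,  3,  4,  6,  8, 11],
    [L,  0,  8,  1,  2,  4,  6,  9],
    [L,  L,  0,  L,  L,  L,  2,  5],
    [L,  L,  7,  0,  1,  3,  5,  8],
    [L,  L,  6,  L,  0,  2,  4,  7],
    [L,  L,  4,  L,  L,  0,  2,  5],
    [L,  L,  4,  L,  L,  L,  0,  3],
    [L,  L,  4,  L,  L,  L,  L,  0],
], dtype=torch.float32)

def categorical_cost(v, z, A):
    """Differentiable cost of moving categorical mass from the original one-hot
    encoding v to the relaxed encoding z: equals sum_j A[i, j] * z_j when v is one-hot at i."""
    return v @ A @ z

# Original feature: High School Degree (index 1); relaxed perturbation that puts
# most of its mass on Bachelor's Degree (index 5).
v = torch.zeros(8); v[1] = 1.0
z = torch.zeros(8, requires_grad=True)
with torch.no_grad():
    z[5] = 0.9
    z[1] = 0.1
cost = categorical_cost(v, z, A_education)
cost.backward()      # gradients flow to z, so this cost can sit inside d_X
print(cost.item())   # 0.9 * A[1, 5] = 3.6 expected years of schooling
print(z.grad)        # equals the row A[1, :], the marginal cost of each category
```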
Diabetes Prediction Dataset: (Centers for Disease Control and Prevention, CDC) This data set contains information on the demographics, health conditions and health habits of 250,000 individuals, labelled by whether the individual is diabetic. It is extracted from the Behavioral Risk Factor Surveillance System (BRFSS), a health-related telephone survey collected annually by the CDC. We define $A(x)$ to allow changes in health habits, BMI, education and income. We use a weighted 2-norm for $d_X$ to represent the relative difficulty of making changes. For example, starting to get regular physical activity is weighted the same as dropping one BMI point. Increasing education, income and health insurance were weighted as more difficult than simply adjusting health habits.

German Credit Dataset: (Hofmann, 1994) This commonly used data set contains information on 1,000 loan applications in Germany, labelled by their credit risk. The actionable set allows for changes in the loan request (term and size) as well as the funds in the applicant's checking and savings accounts and whether the applicant has a telephone. The target set $T$ is a greater than 80% probability of being a good credit risk. The cost function $d_X$ directly measures the total difference in Deutsche Marks (DM) between all elements of the application. No cost was assigned to closing empty accounts. The change in the length of the loan is converted to DM through the individual's monthly disposable income. Finally, we set a flat cost of 50 DM to acquire a telephone.

A.3.2. MODEL DETAILS

We used fully connected feed-forward neural networks. Each network used 3 hidden layers with ReLU activation functions between layers. We tuned the parameters of the neural networks until we achieved accuracy on par with common tree-based classifiers (random forests and histogram gradient boosted trees). Accuracy results are presented in the table below. For all data sets except the German Credit data set, each hidden layer had 60 nodes; the German Credit data set required 120 nodes per layer. Additionally, for the German Credit data set only, we used dropout regularization of 20% on each hidden layer. We trained these models using the ADAM optimizer to minimize cross-entropy loss. We used an 80-10-10 train-validate-test data split and implemented early stopping with the validation data. All Trustworthy Actionable Perturbations, counterfactuals and adversarial examples were created for the testing data. We used an identical architecture for $V$ as for $M$, except for doubling the input size.

                                    Adult Income   Law School Success   Diabetes Prediction   German Credit
Random Forest                            73%              64%                  62%                 74%
Histogram Gradient Boosted Trees         81%              77%                  75%                 69%
Neural Network                           80%              77%                  75%                 75%

We also tested the calibration of our networks by calculating the expected calibration error (ECE) (Naeini et al., 2015). We used 15 bins and record the results below.

                                    Adult Income   Law School Success   Diabetes Prediction   German Credit
ECE (15 bins)                            16%              15%                   7%                 21%

A.3.3. OBJECTIVE FUNCTION DETAILS

In our implementation we formulated the actionability penalty term $b$ as
$$b(x) = G \sum_{i} \Big( \max\{0,\, x_i - u_i\} + \max\{0,\, l_i - x_i\} \Big),$$
with $G$ a sufficiently large constant, where $l_i$ and $u_i$ denote the lower and upper actionable bounds for feature $i$. We formulated our coherence penalty term $p$ analogously, with $P$ another appropriately large constant. The conditioner function cond simply rounded integer and Boolean values to the nearest integer value. For one-hot encoded categorical features, the category with the largest value was set to one and all other categories were set to zero.
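The actionability penalty and the conditioner described above are simple to express in code. The following minimal sketch is our own illustration (not taken from the released TAP code); the value of $G$, the function names and the toy feature layout are illustrative assumptions.

```python
import numpy as np

def actionability_penalty(x, lower, upper, G=1e4):
    """Penalty b: G * sum_i (max(0, x_i - u_i) + max(0, l_i - x_i)).
    Zero inside the actionable box [lower, upper] and grows linearly outside it."""
    over = np.maximum(0.0, x - upper)
    under = np.maximum(0.0, lower - x)
    return G * np.sum(over + under)

def condition(x, int_idx, onehot_groups):
    """Conditioner 'cond': round integer/Boolean features and snap each one-hot
    encoded categorical group to its largest entry."""
    x = x.copy()
    x[int_idx] = np.round(x[int_idx])
    for group in onehot_groups:                    # list of index arrays, one per feature
        snapped = np.zeros(len(group))
        snapped[np.argmax(x[group])] = 1.0
        x[group] = snapped
    return x

# Toy usage: feature 0 continuous, feature 1 integer, features 2-4 one-hot.
x = np.array([37.2, 2.7, 0.2, 0.7, 0.1])
lower = np.array([0.0, 0.0, 0.0, 0.0, 0.0])
upper = np.array([80.0, 3.0, 1.0, 1.0, 1.0])
print(actionability_penalty(x, lower, upper))      # 0.0, since x lies inside the box
print(condition(x, int_idx=[1], onehot_groups=[np.array([2, 3, 4])]))
```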
A.3.4. ADDITIONAL EXPERIMENTAL RESULTS

Here we show success bar charts, similar to those in Figure 5, that compare the efficacy of Trustworthy Actionable Perturbations, counterfactuals (Wachter et al., 2017; Mothilal et al., 2020) and adversarial examples from the Carlini-Wagner $\ell_2$ attack (Carlini & Wagner, 2017b). These charts cover all data sets and an increased number of cost ($\epsilon$) values. Each bar chart refers to a particular data set and desired distance $\delta$ to the target set $T$. Within each chart, the bars show the percentage of individuals that a method was able to successfully move within the goal distance $\delta$ at a variety of costs $\epsilon$. Figure 7 shows the data before the verification procedure has been performed, and Figure 8 shows the data after verification. In these tests, the Trustworthy Actionable Perturbations (in blue) outperform the counterfactuals (in green and orange) in nearly all cases, except when both methods achieve 100% success or in the very high-cost (large $\epsilon$), high-reward ($\delta = 0$) scenarios. Carlini-Wagner attacks (red) are only effective at larger $\delta$ values because they are designed to move a data point just barely inside the target class. The Carlini-Wagner attacks are not required to be actionable (or even feasible), so they do not constitute useful advice. The verifier is able to recognize that these adversarial examples are untrustworthy in all cases.
[Figure 7, panels (a)-(d): success-rate bar charts over the entire Adult Income, Law School, Diabetes and German Credit datasets, each shown at low reward ($\delta = 1$), moderate reward ($\delta = 0.5$) and high reward ($\delta = 0$). The bars compare TAP (our work), single counterfactuals and Carlini-Wagner attacks across a range of costs $\epsilon$ (expressed in years of education, standard deviations of grade increase, BMI points and DM, respectively).]

Figure 7. Performance comparison over entire datasets before verification: The graphs show the average success rate for moving individuals within a variety of distances ($\delta$) to the target set. The y-axis shows the percentage of individuals within the goal distance, and the x-axis represents different costs ($\epsilon$ values) to achieve the goal. These values were obtained before applying the verification procedure.
[Figure 8, panels (a)-(d): success-rate bar charts over the entire Adult Income, Law School, Diabetes and German Credit datasets after verification, each shown at low reward ($\delta = 1$), moderate reward ($\delta = 0.5$) and high reward ($\delta = 0$). The bars compare TAP (our work), single counterfactuals and Carlini-Wagner attacks across a range of costs $\epsilon$.]

Figure 8. Performance comparison over entire datasets after verification: The graphs show the average success rate for moving individuals within a variety of distances ($\delta$) to the target set. The y-axis shows the percentage of individuals within the goal distance, and the x-axis represents different costs ($\epsilon$ values) to achieve the goal. These values were obtained after applying the verification procedure with a 10% chance of eliminating valid inputs.