# Path-Specific Counterfactual Fairness

The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

Silvia Chiappa (csilvia@google.com), DeepMind, London

We consider the problem of learning fair decision systems from data in which a sensitive attribute might affect the decision along both fair and unfair pathways. We introduce a counterfactual approach to disregard effects along unfair pathways that does not incur the same loss of individual-specific information as previous approaches. Our method corrects observations adversely affected by the sensitive attribute, and uses these to form a decision. We leverage recent developments in deep learning and approximate inference to develop a VAE-type method that is widely applicable to complex non-linear models.

## Introduction

Machine learning is increasingly being used to take decisions that can severely affect people's lives, e.g. in policing, education, hiring, lending, and criminal risk assessment (Hoffman, Kahn, and Li 2015; Dieterich, Mendoza, and Brennan 2016). This phenomenon has been accompanied by an increase in concern about disparate treatment caused by model errors and bias in the data. In response to calls from governments and institutions, researchers have started to study how to ensure that learned models do not take decisions that are unfair with respect to sensitive attributes (e.g. race and gender), using different approaches. Among them, the causal framework (Pearl 2000; Dawid 2007; Pearl, Glymour, and Jewell 2016; Peters, Janzing, and Schölkopf 2017) offers an intuitive and powerful way of reasoning about fairness, by viewing unfairness as the presence of an unfair causal effect of the sensitive attribute on the decision (Qureshi et al. 2016; Bonchi et al. 2017; Kilbertus et al. 2017; Kusner et al. 2017; Russell et al. 2017; Zhang and Wu 2017; Zhang, Wu, and Wu 2017; Nabi and Shpitser 2018; Zhang and Bareinboim 2018). Kusner et al. recently introduced a causal, individual-level, definition of fairness, called counterfactual fairness, which states that a decision is fair toward an individual if it coincides with the one that would have been taken in a counterfactual world in which the sensitive attribute were different. Counterfactual fairness assumes that the entire effect of the sensitive attribute on the decision is problematic. This is restrictive for scenarios in which the sensitive attribute might affect the decision along both fair and unfair pathways. For example, in the case of Berkeley's alleged sex bias in graduate admission (Pearl 2000), female applicants were rejected more often than male applicants as they were more often applying to departments with lower admission rates. Such an effect of gender through department choice is not unfair as far as the college is concerned. What would be inadmissible is if the college treated male and female applicants with the same qualifications and applying to the same departments differently because of gender. This complex scenario can be represented by a graphical causal model.
In this model, A, Q, D, and Y are random variables representing respectively gender, qualification, department choice, and admission decision; $A \rightarrow D \rightarrow Y$ is a causal path representing the influence of gender A on admission decision Y through department choice D, and $A \rightarrow Y$ is a causal path representing the direct influence of A on Y. To deal with such scenarios, we propose a novel definition of fairness called path-specific counterfactual fairness, which states that a decision is fair toward an individual if it coincides with the one that would have been taken in a counterfactual world in which the sensitive attribute along the unfair pathways were different. In the Berkeley example, this would mean that an admission decision would be fair toward a female candidate if it would remain the same when pretending that the candidate were male along $A \rightarrow Y$.

We propose an approach that implements path-specific counterfactual fairness by correcting the observations corresponding to variables that are descendants of the sensitive attribute along unfair causal pathways. The correction aims at eliminating the unfair information contained in the observations while retaining fair information. Furthermore, we introduce a latent-variable method that, by leveraging recent developments in deep learning and approximate inference, allows this correction approach to be applied to complex non-linear models. Our correction procedure retains more individual-specific information than previous approaches to path-specific fairness based on constraining the learning of the model parameters to eliminate or reduce unfair effects (Kilbertus et al. 2017; Nabi and Shpitser 2018).

## Background on Causality

Causal relationships among random variables can visually be expressed using graphical causal models (GCMs). A GCM is a special case of a graphical model (see Chiappa for a quick introduction) that captures both independence and causal relations. In this work, we restrict ourselves to directed acyclic graphs, i.e. graphs in which a node cannot be an ancestor of itself. In a directed acyclic graph, the joint distribution over all nodes $p(X_1, \ldots, X_I)$ is given by the product of the conditional distributions of each node $X_i$ given its parents $\text{pa}(X_i)$, i.e. $p(X_1, \ldots, X_I) = \prod_{i=1}^{I} p(X_i \,|\, \text{pa}(X_i))$.

GCMs enable us to give a graphical definition of causes and causal effects: if there exists a directed path from A to Y, then A is a potential cause of Y. Directed paths are also called causal paths. The causal effect of A on Y can be seen as the information that A sends to Y through causal paths, or as the conditional distribution of Y given A restricted to causal paths. This implies that, if there exists at least one open non-causal path between A and Y, then the causal effect of A on Y differs from $p(Y|A)$. An example of such a path is $A \leftarrow C \rightarrow Y$ in the GCM $\mathcal{G}$ of Fig. 1(a): the variable C is said to be a confounder for the effect of A on Y. In this case, the causal effect of $A=a$ on Y can be seen as the conditional distribution $p_{A=a}(Y \,|\, A=a)$ in the modified GCM $\mathcal{G}_{A=a}$, resulting from intervening on A by replacing $p(A|C)$ with a delta distribution $\delta_{A=a}$ (thereby removing the link from C to A) and leaving the remaining conditional distributions $p(Y|A,C)$ and $p(C)$ unaltered.

Figure 1: (a) GCM with a confounder C for the causal effect of A on Y. (b) GCM with one direct path and one indirect causal path from A to Y. (c) GCM with a confounder C for the causal effect of M on Y.
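As an illustration of this distinction, the following minimal simulation (not from the paper; all probabilities are made-up values) contrasts conditioning on A with intervening on A in a binary version of the GCM of Fig. 1(a):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# Binary version of the GCM of Fig. 1(a): C -> A, C -> Y, A -> Y.
C = rng.binomial(1, 0.5, N)
A = rng.binomial(1, np.where(C == 1, 0.8, 0.2))   # p(A|C)
Y = rng.binomial(1, 0.2 + 0.3 * A + 0.4 * C)      # p(Y|A,C)

# Conditioning: p(Y=1|A=1) mixes the causal effect with the open
# non-causal path A <- C -> Y.
p_cond = Y[A == 1].mean()

# Intervening: replace p(A|C) with a delta at A=1 (the link C -> A is
# removed), keep p(C) and p(Y|A,C) unaltered, i.e. simulate G_{A=1}.
Y_do = rng.binomial(1, 0.2 + 0.3 * 1 + 0.4 * C)
p_do = Y_do.mean()

# The same interventional quantity can be recovered from observational
# data by averaging p(Y=1|A=1,C=c) over p(C).
p_adj = sum(Y[(A == 1) & (C == c)].mean() * (C == c).mean() for c in (0, 1))

print(f"p(Y=1|A=1)  = {p_cond:.3f}")   # ~0.82, inflated by confounding
print(f"p_A=1(Y=1)  = {p_do:.3f}")     # ~0.70
print(f"adjustment  = {p_adj:.3f}")    # ~0.70, matches the intervention
```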
The rules of do-calculus (Pearl 2000; Pearl, Glymour, and Jewell 2016) indicate if and how the conditional distribution in the intervened graph can be estimated using observations from $\mathcal{G}$: if C is observed, $p_{A=a}(Y|A=a) = \sum_C p(Y|A=a, C)\, p(C)$, whilst if C is unobserved, estimating the conditional distribution using only observations from $\mathcal{G}$ is not possible; in this case the effect is said to be non-identifiable. We define $Y_{A=a}$ to be the random variable with distribution $p(Y_{A=a}) = p_{A=a}(Y|A=a)$. $Y_{A=a}$ is called a potential outcome variable and we will refer to it with the shorthand $Y_a$. By performing different interventions on A along different causal paths, it is possible to isolate the contribution of the causal effect of A on Y along a group of paths.

**Direct and Indirect Effect.** The simplest cases are the isolation of the contributions along the direct path $A \rightarrow Y$ (direct effect) and along the indirect causal paths $A \rightarrow \cdots \rightarrow Y$ (indirect effect). Suppose that the GCM contains only one indirect causal path, through a variable M, as in Fig. 1(b). We define $Y_a(M_{a'})$ to be the random variable that results from the interventions $A=a$ along $A \rightarrow Y$ and $A=a'$ along $A \rightarrow M \rightarrow Y$. The average direct effect (ADE) and the average indirect effect (AIE) of $A=a$ with respect to $A=a'$ are given by

$$\text{ADE} = \langle Y_a(M_{a'}) \rangle - \langle Y_{a'} \rangle, \qquad \text{AIE} = \langle Y_{a'}(M_a) \rangle - \langle Y_{a'} \rangle,$$

where, e.g., $\langle Y_{a'} \rangle = \int_{Y_{a'}} Y_{a'}\, p(Y_{a'})$. (In this paper, we consider the natural effect, which generally differs from the controlled effect; the latter corresponds to intervening on M.) More generally, the ADE of $A=a$ with respect to $A=a'$ can be estimated by computing the difference between 1) the average effect of $A=a$ along the direct path $A \rightarrow Y$ and $A=a'$ along the indirect causal paths $A \rightarrow \cdots \rightarrow Y$, and 2) the average effect of $A=a'$ along all causal paths. Similarly, the AIE of $A=a$ with respect to $A=a'$ can be estimated by computing the difference between 1) the average effect of $A=a'$ along the direct path $A \rightarrow Y$ and $A=a$ along the indirect causal paths $A \rightarrow \cdots \rightarrow Y$, and 2) the average effect of $A=a'$ along all causal paths.

Under the independence assumption $Y_{a,m} \perp M_{a'}$ (sequential ignorability), $p(Y_a(M_{a'}))$ can be estimated as

$$
\begin{aligned}
p(Y_a(M_{a'})) &= \int_m p(Y_a(M_{a'}) \,|\, M_{a'}=m)\, p(M_{a'}=m) \\
&= \int_m p(Y_{a,m} \,|\, M_{a'}=m)\, p(M_{a'}=m) \\
&= \int_m p(Y_{a,m})\, p(M_{a'}=m), \qquad (1)
\end{aligned}
$$

where to obtain the second line we have used the consistency property (Pearl, Glymour, and Jewell 2016). As there are no confounders, intervening coincides with conditioning, i.e. $p(Y_{a,m}) = p(Y|A=a, M=m)$ and $p(M_{a'}) = p(M|A=a')$. If the GCM contains a confounder for the effect of either A or M on Y, such as C in Fig. 1(c), then $p(Y_{a,m}) \neq p(Y|A=a, M=m)$. In this case, by following similar arguments to the ones used in Eq. (1) but conditioning on C (and therefore assuming $Y_{a,m} \perp M_{a'} \,|\, C$), we obtain

$$p(Y_a(M_{a'})) = \int_{m,c} p(Y|a, m, c)\, p(m|a', c)\, p(c),$$

where we use the notation $p(Y|a, m, c)$ as a shorthand for $p(Y|A=a, M=m, C=c)$. If C is unobserved, the effect is non-identifiable.

**Path-Specific Effect.** In the more complex case in which, rather than computing the direct and indirect effects, we want to isolate the contribution of the effect along a specific group of paths, we can generalize the formulas for the ADE and AIE by using in the first term the variable resulting from performing the intervention $A=a$ along the group of interest and $A=a'$ along the remaining causal paths. For example, consider the GCM of Fig. 2 and assume that we are interested in isolating the effect of A on Y along the direct path $A \rightarrow Y$ and the paths passing through M, $A \rightarrow M \rightarrow \cdots \rightarrow Y$,
namely along the green and dashed green-black links. The path-specific effect (PSE) of $A=a$ with respect to $A=a'$ for this group of paths is given by

$$\text{PSE} = \langle Y_a(M_a, L_{a'}(M_a)) \rangle - \langle Y_{a'} \rangle,$$

where $p(Y_a(M_a, L_{a'}(M_a)))$ can be computed as

$$\int_{c,m,l} p(Y \,|\, a, c, m, l)\, p(l \,|\, a', c, m)\, p(m \,|\, a, c)\, p(c).$$

In the simple case in which the GCM corresponds to a linear model, e.g.

$$
\begin{aligned}
A &\sim \text{Bernoulli}(\pi), \quad C = \epsilon_c, \\
M &= \theta^m + \theta^m_a A + \theta^m_c C + \epsilon_m, \\
L &= \theta^l + \theta^l_a A + \theta^l_c C + \theta^l_m M + \epsilon_l, \\
Y &= \theta^y + \theta^y_a A + \theta^y_c C + \theta^y_m M + \theta^y_l L + \epsilon_y, \qquad (2)
\end{aligned}
$$

where $\epsilon_c$, $\epsilon_m$, $\epsilon_l$ and $\epsilon_y$ are unobserved independent zero-mean Gaussian terms, we have

$$\langle Y_a(M_a, L_{a'}(M_a)) \rangle = \theta^y + \theta^y_m\theta^m + \theta^y_l(\theta^l + \theta^l_m\theta^m) + \theta^y_a a + \theta^y_m\theta^m_a a + \theta^y_l(\theta^l_a a' + \theta^l_m\theta^m_a a).$$

The PSE is therefore given by

$$\text{PSE} = \theta^y_a(a - a') + \theta^y_m\theta^m_a(a - a') + \theta^y_l\theta^l_m\theta^m_a(a - a'). \qquad (3)$$

Shpitser gives a recursive rule for obtaining the variable of interest for computing the PSE, and a graphical method for understanding whether the PSE is identifiable in the presence of unobserved confounders.

Figure 2: GCM corresponding to Eq. (2).

## Path-Specific Counterfactual Fairness

We are interested in complex scenarios in which the sensitive attribute A might affect the decision variable Y along both fair and unfair causal pathways. We assume that A can only take two values $a$ and $a'$, and that $a'$ is a baseline value. Kilbertus et al. and Nabi and Shpitser propose to deal with such scenarios by constraining the learning of the model parameters such that the average of the unfair effect is eliminated or reduced. More specifically, Nabi and Shpitser suggest to perform model training by constraining the unfair PSE of A on Y to lie in a small range. The main limitation of this approach is that, at test time, it requires averaging over all variables that are descendants of the sensitive attribute through the unfair causal pathways. This can negatively impact the system's predictive accuracy, as individual-specific information about those descendants is disregarded. Kilbertus et al. propose to directly identify a set of constraints on the conditional distribution of the decision variable that eliminate the unfair effect. This can easily be done in linear models, but it is unclear how to identify the constraints in more complex non-linear scenarios. Furthermore, this approach also unnecessarily removes information from problematic descendants.

In contrast, we propose to simply correct at test time the decisions of individuals for which $A=a$ by making sure that they coincide with the ones that would have been taken in a counterfactual world in which the sensitive attribute along the unfair pathways were set to the baseline. This requires correcting the observations corresponding to variables that are descendants of the sensitive attribute through unfair pathways, by removing the unfair information induced by the sensitive attribute while retaining the remaining fair information. We achieve this through a generalization of the abduction-action-prediction method for counterfactual reasoning (Pearl 2000). We generally refer to our approach as path-specific counterfactual fairness (PSCF). In the Berkeley alleged sex bias case, for example, PSCF would ensure that the admission decision of a female applicant coincides with the one that would have been taken in a counterfactual world in which her gender $a$ were male $a'$ along the direct path $A \rightarrow Y$, by taking a decision based on the intervention $A=a'$ along $A \rightarrow Y$. To highlight its relation with the approaches of Kilbertus et al.
and of Nabi and Shpitser, we first explain PSCF for the case in which the data-generation mechanism is given by the linear model of Eq. (2) (Fig. 2). Assume that the direct effect of A on Y and the effect through M are considered unfair. PSCF corrects the decision of an individual for which $A=a$ by performing the intervention $A=a'$ along the direct path $A \rightarrow Y$ and the paths passing through M, $A \rightarrow M \rightarrow \cdots \rightarrow Y$, namely along the green and dashed green-black links of Fig. 2. (Notice that the dashed green-black links differ fundamentally from the green links; they contain unfairness only as a consequence of $A \rightarrow M$, corresponding to the parameter $\theta^m_a$, being unfair.) More precisely, assuming that $a'=0$ is the baseline value of A, given an instance $\{a^n = a = 1, c^n, m^n, l^n\}$, the PSCF approach computes a fair prediction $y^n_{\text{PSCF}}$ of $y^n$ as the mean of $p(Y_{a'}(M_{a'}, L_a(M_{a'})) \,|\, a, c^n, m^n, l^n)$. This is achieved by first computing $\epsilon^n_m$ and $\epsilon^n_l$ from $a^n, c^n, m^n, l^n$ and the model equations (abduction), i.e.

$$\epsilon^n_m = m^n - \theta^m - \theta^m_a - \theta^m_c c^n, \qquad \epsilon^n_l = l^n - \theta^l - \theta^l_a - \theta^l_c c^n - \theta^l_m m^n.$$

Then fair transformations of $m^n$ and $l^n$, $m^n_{\text{PSCF}}$ and $l^n_{\text{PSCF}}$, and the fair prediction $y^n_{\text{PSCF}}$ are obtained by substituting $\epsilon^n_m$ and $\epsilon^n_l$ into the model equations with the problematic terms $\theta^m_a$ and $\theta^y_a$ removed (this corresponds to the intervention $A=a'$ along the direct path $A \rightarrow Y$ and the paths passing through M, $A \rightarrow M \rightarrow \cdots \rightarrow Y$), i.e.

$$
\begin{aligned}
m^n_{\text{PSCF}} &= \theta^m + \theta^m_c c^n + \epsilon^n_m, \\
l^n_{\text{PSCF}} &= \theta^l + \theta^l_a + \theta^l_c c^n + \theta^l_m m^n_{\text{PSCF}} + \epsilon^n_l, \\
y^n_{\text{PSCF}} &= \theta^y + \theta^y_c c^n + \theta^y_m m^n_{\text{PSCF}} + \theta^y_l l^n_{\text{PSCF}}. \qquad (4)
\end{aligned}
$$

This approach can be seen as performing a correction on the decision through a correction on all the variables that are descendants of the sensitive attribute along unfair pathways (UP), namely M and L in this case.

To understand the relation with the fair inference on outcomes (FIO) method suggested by Nabi and Shpitser, the PSE for this model (Eq. (3)) with $a=1$ and $a'=0$ takes the form

$$\text{PSE} = \theta^y_a + \theta^m_a(\theta^y_m + \theta^y_l\theta^l_m).$$

FIO consists in performing a constrained learning of the model parameters $\theta$ such that the PSE lies in a small range. After training, a prediction $y^n_{\text{FIO}}$ of $y^n$ for an instance $\{a^n, c^n, m^n, l^n\}$ can be obtained as $y^n_{\text{FIO}} = \langle Y \rangle_{p(Y|a^n,c^n)}$, where $p(Y|a^n, c^n)$ is given by

$$\int_{m,l} p(Y \,|\, a^n, c^n, m, l)\, p(l \,|\, a^n, c^n, m)\, p(m \,|\, a^n, c^n),$$

which for the linear model gives

$$
\begin{aligned}
m^n_{\text{FIO}} &= \hat\theta^m + \hat\theta^m_a a^n + \hat\theta^m_c c^n, \\
l^n_{\text{FIO}} &= \hat\theta^l + \hat\theta^l_a a^n + \hat\theta^l_c c^n + \hat\theta^l_m m^n_{\text{FIO}}, \\
y^n_{\text{FIO}} &= \hat\theta^y + \hat\theta^y_a a^n + \hat\theta^y_c c^n + \hat\theta^y_m m^n_{\text{FIO}} + \hat\theta^y_l l^n_{\text{FIO}},
\end{aligned}
$$

where $\hat\theta$ indicates the learned model parameters. Assume that, at the end of training, $\hat\theta$ for both PSCF and FIO coincides with the true underlying parameters $\theta$, except for $\hat\theta^m_a$ and $\hat\theta^y_a$ in FIO, which are assigned zero values to satisfy the constraint PSE = 0. Then, given an instance $\{a^n = a = 1, c^n, m^n, l^n\}$, we can express $y^n_{\text{PSCF}}$ as $y^n_{\text{PSCF}} = \langle Y \rangle_{p(Y|a^n,c^n,m^n,l^n)} - \text{PSE}$, since

$$y^n_{\text{PSCF}} = \theta^y + \theta^y_c c^n + \theta^y_m m^n + \theta^y_l l^n - \theta^m_a(\theta^y_m + \theta^y_l\theta^l_m);$$

and $y^n_{\text{FIO}}$ as $y^n_{\text{FIO}} = \langle Y \rangle_{p(Y|a^n,c^n)} - \text{PSE}$, since

$$y^n_{\text{FIO}} = \theta^y + \theta^y_c c^n + \theta^y_m \bar m^n + \theta^y_l \bar l^n - \theta^m_a(\theta^y_m + \theta^y_l\theta^l_m),$$

where $\bar m^n = \langle M \rangle_{p(M|a^n,c^n)} = \theta^m + \theta^m_a + \theta^m_c c^n$ and $\bar l^n = \langle L \rangle_{p(L|a^n,c^n)}$ is defined analogously. This formulation highlights the disadvantage of FIO over PSCF in disregarding specific information about the individual, $\epsilon^n_m$ and $\epsilon^n_l$, through the use of $\bar m^n$ and $\bar l^n$. As the constraint PSE = 0 is not necessarily achieved by assigning zero values to $\hat\theta^m_a$ and $\hat\theta^y_a$, this correspondence does not generally hold. As the reason for averaging over M and L, Nabi and Shpitser indicate the need to account for the constraints that are potentially imposed on $\hat\theta^m_a$ and $\hat\theta^l_m$.
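As a concrete check of the linear-model correction, the sketch below (not from the paper; the $\theta$ values are arbitrary and assumed known, whereas in practice they would be estimated) implements the abduction-action-prediction steps of Eq. (4) and verifies that, for individuals with $a=1$, the corrected prediction differs from the factual prediction by exactly the PSE of Eq. (3):

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary parameters for the linear GCM of Eq. (2).
th = dict(m0=0.5, ma=1.2, mc=0.8,
          l0=-0.3, la=0.6, lc=0.4, lm=0.9,
          y0=0.1, ya=1.5, yc=0.2, ym=0.7, yl=0.5)

n = 100_000
a = rng.binomial(1, 0.4, n)
c = rng.normal(0, 1, n)
m = th['m0'] + th['ma'] * a + th['mc'] * c + rng.normal(0, 1, n)
l = th['l0'] + th['la'] * a + th['lc'] * c + th['lm'] * m + rng.normal(0, 1, n)

def pscf_predict(a, c, m, l):
    # Abduction: recover the individual-specific noise from the observations.
    eps_m = m - th['m0'] - th['ma'] * a - th['mc'] * c
    eps_l = l - th['l0'] - th['la'] * a - th['lc'] * c - th['lm'] * m
    # Action + prediction: set A to the baseline a' = 0 along the unfair links
    # A -> M and A -> Y, keep A = a along the fair link A -> L.
    m_pscf = th['m0'] + th['mc'] * c + eps_m
    l_pscf = th['l0'] + th['la'] * a + th['lc'] * c + th['lm'] * m_pscf + eps_l
    return th['y0'] + th['yc'] * c + th['ym'] * m_pscf + th['yl'] * l_pscf

y_fair = pscf_predict(a, c, m, l)
y_factual = th['y0'] + th['ya'] * a + th['yc'] * c + th['ym'] * m + th['yl'] * l

# PSE of Eq. (3) with a = 1, a' = 0.
pse = th['ya'] + th['ma'] * (th['ym'] + th['yl'] * th['lm'])
print(np.allclose((y_factual - y_fair)[a == 1], pse))  # True
print(np.allclose((y_factual - y_fair)[a == 0], 0.0))  # True: baseline individuals unchanged
```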
If a constraint is imposed on a parameter, then the corresponding variable does indeed need to be integrated out to ensure that such a constraint is taken into account in the prediction. For any model, the PSE would contain the parameters corresponding to the UP descendants of A, which means that FIO would always require integrating out the UP descendants. However, even if we a priori identify a set of constraints that give PSE = 0, the UP descendants must be integrated out or corrected from unfairness even if no constraints are imposed on the corresponding parameters. Consider the case discussed above, where we achieve PSE = 0 by setting $\hat\theta^m_a$ and $\hat\theta^y_a$ to zero values. This does not constrain $\hat\theta^l_m$. However, to form a prediction of $y^n$, we would still need to integrate over L, as the observation $l^n$ contains the problematic term $\theta^l_m\theta^m_a$ (entering the prediction as $\theta^y_l\theta^l_m\theta^m_a$), corresponding to the unfair part of the effect of A on L. In this simple case, we could avoid having to integrate over M and L by a priori imposing the constraints $\hat\theta^y_a = 0$ and $\hat\theta^y_m = -\hat\theta^y_l\hat\theta^l_m$, i.e. by constraining the conditional distribution used to form a prediction of $y^n$, $p(Y|A,C,M,L)$. This coincides with the constraint proposed by Kilbertus et al. to avoid proxy discrimination. However, this approach achieves removal of the problematic unfairness in $m^n$ and $l^n$ by cancelling out the entire $m^n$ from the prediction. This is also suboptimal, as all information within $m^n$ is disregarded. Furthermore, it is not clear how to extend this approach to more complex scenarios. In conclusion, the main advantage of our approach is that it allows fair individual-specific information contained in the UP descendants to be retained. This is achieved by leaving the underlying data-generation mechanism unaltered during training.

**Model-Observations Mismatch.** Whilst offering several advantages over previous approaches to path-specific fairness, in the presence of a strong mismatch between the assumed and actual data-generation mechanisms, the PSCF approach described above would most likely not remove unfairness completely. Indeed, in this case the estimates of $\epsilon^n_m$ and $\epsilon^n_l$ would not be independent of the sensitive attribute A. Consider, for example, the case in which we assume the data-generation process of Eq. (2), but the observed $m^n$, $n = 1, \ldots, N$, are generated from a modified version of Eq. (2) containing an extra non-linear term $f(A, C)$. The learned model parameters $\hat\theta$ would not be able to describe this non-linear term, which would therefore be absorbed into the estimate of $\epsilon^n_m$, making it dependent on A, as shown in Fig. 3(a) (continuous lines).

Figure 3: (a) Empirical distribution of the estimate of $\epsilon^n_m$ for the case in which $m^n$ is generated by Eq. (2) with an extra non-linear term $f(A, C)$ (continuous lines). Histograms of $p(H_m|A)$ (crossed lines). (b) GCM with an explicit latent variable for each UP descendant of A.

To solve this issue, we propose to decompose $\epsilon_m$ into two components, i.e. $\epsilon_m = H_m + \eta_m$, and to adopt a training procedure in which $p(H_m|A=a)$, defined as

$$p(H_m \,|\, A=a) = \frac{1}{N_a} \sum_{n : a^n = a} p(H_m \,|\, a^n = a, c^n, m^n, l^n), \qquad (5)$$

where $N_a$ indicates the number of observations for which $a^n = a$, is encouraged to have small dependence on A. We can then use, e.g., the mean of $p(H_m|a^n, c^n, m^n, l^n)$, rather than the estimate of $\epsilon^n_m$. In other words, we make sure that, when estimating the latent randomness associated with an individual, we only pick up the part that does not depend on A, and only use this part to perform the prediction.
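The mismatch illustrated in Fig. 3(a) can be reproduced with a small simulation (not from the paper; the non-linear term $f(A,C)$ and all parameter values are invented for illustration): fitting the linear model of Eq. (2) to data containing an unmodelled term $f(A,C)$ leaves a residual whose distribution differs across the two groups, so the naive estimate of $\epsilon^n_m$ would leak information about A.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

a = rng.binomial(1, 0.5, n)
c = rng.normal(0, 1, n)
# M generated as in Eq. (2) plus an extra non-linear term f(A, C) = 1.5 * A * C^2.
m = 0.5 + 1.2 * a + 0.8 * c + 1.5 * a * c**2 + rng.normal(0, 1, n)

# Fit the (misspecified) linear model M = th0 + th_a A + th_c C + eps_m.
X = np.column_stack([np.ones(n), a, c])
theta_hat, *_ = np.linalg.lstsq(X, m, rcond=None)
eps_hat = m - X @ theta_hat

# The unmodelled term is absorbed into the residual, so the estimate of
# eps_m is no longer independent of A: its group-conditional distributions
# differ (compare the spreads below; cf. Fig. 3(a), continuous lines).
for group in (0, 1):
    r = eps_hat[a == group]
    print(f"A={group}: mean={r.mean():+.3f}  std={r.std():.3f}")
```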
Encouraging independence from A is necessary, as otherwise the estimated $p(H_m|A)$ would be close to the estimate of $\epsilon^n_m$. This is shown by the histograms of $p(H_m|A)$ in Fig. 3(a) (crossed lines), obtained by assuming a Gaussian distribution for $p(H_m)$ and by learning the model parameters using an expectation maximization approach. To more generally ensure that the abduction procedure will not end up with estimates that depend on the sensitive variable, we need to encourage latent independence from A for each descendant of A that needs to be corrected, namely for each UP descendant, and therefore introduce another latent variable for L, $H_l$ (see Fig. 3(b)). We propose a way to encourage independence from A, together with a method that generalizes the PSCF approach described above to complex non-linear models, in the next section.

## PSCF-VAE

Consider more general equations for the GCM of Fig. 3(b), given by

$$
\begin{aligned}
&A \sim \text{Bernoulli}(\pi), \quad C \sim p_\theta(C), \\
&H_m \sim p_\theta(H_m), \quad M \sim p_\theta(M \,|\, A, C, H_m), \\
&H_l \sim p_\theta(H_l), \quad L \sim p_\theta(L \,|\, A, C, M, H_l), \\
&Y \sim p_\theta(Y \,|\, A, C, M, L), \qquad (6)
\end{aligned}
$$

where, if M is categorical, we assume $p_\theta(M|A, C, H_m) = f_\theta(A, C, H_m)$, where $f_\theta(A, C, H_m)$ can be any function (e.g. a neural network); whilst, if M is continuous, we assume that $p_\theta(M|A, C, H_m)$ is Gaussian with mean $f_\theta(A, C, H_m)$. The model likelihood $p_\theta(A, C, M, L, Y)$, and the posterior distributions $p_\theta(H_m|A, C, M, L)$ and $p_\theta(H_l|A, C, M, L)$ required to form fair predictions, are generally intractable. We address this issue with a variational approach that computes Gaussian approximations $q_\phi(H_m|A, C, M, L)$ and $q_\phi(H_l|A, C, M, L)$ of $p_\theta(H_m|A, C, M, L)$ and $p_\theta(H_l|A, C, M, L)$ respectively, parametrized by $\phi$, as discussed in detail below.

After learning $\theta$ and $\phi$, analogously to Eq. (4), we compute a fair prediction $y^n_{\text{PSCF}}$ for an instance $\{a^n = a, c^n, m^n, l^n\}$ as $\langle Y_{a'}(M_{a'}, L_a(M_{a'})) \rangle_{p(Y_{a'}(M_{a'}, L_a(M_{a'})) \,|\, a, c^n, m^n, l^n)}$, estimated using a Monte-Carlo approach. Specifically, we first draw samples $h^{n,i}_m \sim q_\phi(H_m|a, c^n, m^n, l^n)$ and $h^{n,i}_l \sim q_\phi(H_l|a, c^n, m^n, l^n)$, for $i = 1, \ldots, I$, and then form

$$m^{n,i}_{\text{PSCF}} \sim p_\theta(M \,|\, a', c^n, h^{n,i}_m), \qquad l^{n,i}_{\text{PSCF}} \sim p_\theta(L \,|\, a, c^n, m^{n,i}_{\text{PSCF}}, h^{n,i}_l), \qquad y^n_{\text{PSCF}} = \frac{1}{I}\sum_{i=1}^{I} \langle Y \rangle_{p_\theta(Y | a', c^n, m^{n,i}_{\text{PSCF}}, l^{n,i}_{\text{PSCF}})}. \qquad (7)$$

In the experiments, we used I = 500.

If we group the observed and latent variables as $V = \{A, C, M, L, Y\}$ and $H = \{H_m, H_l\}$ respectively, the variational approximation $q_\phi(H|V)$ to the intractable posterior $p_\theta(H|V)$ is obtained by finding the variational parameters $\phi$ that minimize the Kullback-Leibler divergence $\text{KL}(q_\phi(H|V) \,\|\, p_\theta(H|V))$. This is equivalent to maximizing a lower bound $\mathcal{F}_{\theta,\phi}$ on the log of the marginal likelihood, $\log p_\theta(V) \geq \mathcal{F}_{\theta,\phi}$, with

$$\mathcal{F}_{\theta,\phi} = -\big\langle \log q_\phi(H|V) \big\rangle_{q_\phi(H|V)} + \big\langle \log p_\theta(V, H) \big\rangle_{q_\phi(H|V)},$$

where, e.g., $\langle \log q_\phi(H|V) \rangle_{q_\phi(H|V)} = \int_H q_\phi(H|V) \log q_\phi(H|V)$. In our case, rather than $q_\phi(H|V)$, we use $q_\phi(H \,|\, V \setminus Y)$. Our approach is therefore to learn simultaneously the latent embedding and the predictive distributions in Eq. (7). This could be preferable to other causal latent-variable approaches such as the Fair Learning algorithm proposed by Kusner et al., which separately learns a predictor of Y using samples from the previously inferred latent variables and from the non-descendants of A.

In order for $\mathcal{F}_{\theta,\phi}$ to be tractable, conjugacy is required, which heavily restricts the family of models that can be used. This issue can be addressed with a Monte-Carlo approximation known as variational auto-encoding (VAE) (Kingma and Welling 2014; Rezende, Mohamed, and Wierstra 2014). This approach represents H as a non-linear transformation $H = f_\phi(E)$ of a random variable E from a parameter-free distribution $q_\epsilon$.
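A minimal sketch of the Monte-Carlo fair prediction of Eq. (7), assuming trained networks are available as callables (the function and argument names below are hypothetical, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(3)

def pscf_vae_predict(a, c, m, l, enc_m, enc_l, dec_m, dec_l, dec_y,
                     a_baseline=0, n_samples=500):
    """Monte-Carlo estimate of the fair prediction in Eq. (7).

    enc_m, enc_l : return (mean, std) of q_phi(H_m | a, c, m, l) and
                   q_phi(H_l | a, c, m, l)        (assumed trained encoders)
    dec_m        : (a, c, h_m)    -> a sample from p_theta(M | a, c, h_m)
    dec_l        : (a, c, m, h_l) -> a sample from p_theta(L | a, c, m, h_l)
    dec_y        : (a, c, m, l)   -> mean of p_theta(Y | a, c, m, l)
    """
    mu_m, sd_m = enc_m(a, c, m, l)
    mu_l, sd_l = enc_l(a, c, m, l)
    ys = []
    for _ in range(n_samples):
        h_m = mu_m + sd_m * rng.standard_normal(np.shape(mu_m))  # h_m ~ q_phi
        h_l = mu_l + sd_l * rng.standard_normal(np.shape(mu_l))  # h_l ~ q_phi
        m_pscf = dec_m(a_baseline, c, h_m)        # intervention A = a' along A -> M
        l_pscf = dec_l(a, c, m_pscf, h_l)         # A kept at its actual value along A -> L
        ys.append(dec_y(a_baseline, c, m_pscf, l_pscf))  # A = a' along A -> Y
    return np.mean(ys, axis=0)
```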
As we choose $q_\phi$ to be Gaussian, $H = \mu_\phi + \sigma_\phi E$ with $q_\epsilon = \mathcal{N}(0, 1)$ in the univariate case. This enables us to rewrite the bound as

$$\mathcal{F}_{\theta,\phi} = \big\langle -\log q_\phi(H = f_\phi(E)) + \log p_\theta(V, H = f_\phi(E)) \big\rangle_{q_\epsilon}.$$

The first part of the gradient of $\mathcal{F}_{\theta,\phi}$ with respect to $\phi$, $\nabla_\phi \mathcal{F}_{\theta,\phi}$, can be computed analytically, whilst the second part is approximated by

$$\nabla_\phi \big\langle \log p_\theta(V, H = f_\phi(E)) \big\rangle_{q_\epsilon} \approx \frac{1}{I}\sum_{i=1}^{I} \nabla_\phi \log p_\theta(V, h^i = f_\phi(\epsilon^i)), \qquad \epsilon^i \sim q_\epsilon.$$

In the experiments, we used I = 1, as commonly done in the VAE literature. The variational parameters $\phi$ are parametrized by a neural network taking V as input.

**Independence from A.** In order to ensure that $q_\phi(H|A)$, defined similarly to Eq. (5), does not depend on A, we experimented with an adversary approach (Edwards and Storkey 2016) and with a maximum mean discrepancy (MMD) penalization approach (Gretton et al. 2012; Louizos et al. 2016), which gave similar but more stable results. The MMD approach adds a penalty term $\beta \mathcal{L}_{\text{MMD}}(a, a')$ to the bound $\mathcal{F}_{\theta,\phi}$, where $\beta$ is a weighting factor that determines the degree of independence, and therefore might correspond to different levels of fairness. $\mathcal{L}_{\text{MMD}}(a, a')$ is the sum of several terms, one for each latent variable, where e.g. the term for $H_m$ is given by

$$\mathcal{L}^m_{\text{MMD}}(a, a') = \frac{1}{N_a^2}\sum_{i,j} k(h^{a,i}_m, h^{a,j}_m) + \frac{1}{N_{a'}^2}\sum_{i,j} k(h^{a',i}_m, h^{a',j}_m) - \frac{2}{N_a N_{a'}}\sum_{i,j} k(h^{a,i}_m, h^{a',j}_m),$$

where k is a Gaussian kernel, and $h^{a,i}_m$ is a sample from the variational distribution for an individual for which $A=a$.

Figure 4: (a) GCM for the UCI Adult dataset. (b) GCM for the UCI German Credit dataset.

## Experiments

We evaluate the proposed PSCF-VAE method on the UCI Adult and German Credit datasets. As prior distribution $p_\theta$ for each latent variable (Eq. (6)) we used a ten-dimensional Gaussian with diagonal covariance matrix, whilst as $f_\theta$ we used a neural network with one linear layer of size 100 with tanh activation, followed by a linear layer (the outputs were Gaussian means for continuous variables and logits for categorical variables). As variational distribution $q_\phi$ we used a ten-dimensional Gaussian with diagonal covariance, with means and log variances obtained as the outputs of a neural network with two linear layers of size 20 and tanh activation, followed by a linear layer. Training was performed with the Adam optimizer (Kingma and Ba 2015) with learning rate 0.01, mini-batch size 128, and default values $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$. Training was stopped after 20,000 steps.

### The UCI Adult Dataset

The Adult dataset from the UCI repository (Lichman 2013) contains 14 attributes including age, working class, education level, marital status, occupation, relationship, race, gender, capital gain and loss, working hours, and nationality for 48,842 individuals; 32,561 and 16,281 for the training and test sets respectively. The goal is to predict whether the individual's annual income is above or below \$50,000. We assumed the GCM of Fig. 4(a) (following Nabi and Shpitser), where A corresponds to the protected attribute sex, C to the pair age and nationality, M to marital status, L to level of education, R to the triple working class, occupation, and hours per week, and Y to the income class. (We omit race, and capital gain and loss, to use the same attributes as Nabi and Shpitser, although including capital gain and loss would increase test accuracy from 82.7% to 84.7%.) Age, level of education, and hours per week are continuous, whilst sex, nationality, marital status, working class, occupation, and income are categorical. Besides the direct effect $A \rightarrow Y$, the effect of A on Y through marital status, namely along the paths $A \rightarrow M \rightarrow \cdots \rightarrow Y$, is considered unfair.
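The MMD penalty $\mathcal{L}^m_{\text{MMD}}$ defined above can be estimated from mini-batch samples of the variational posteriors; a minimal numpy sketch (the kernel bandwidth and the random inputs are illustrative, and the paper's implementation details may differ):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all pairs of rows.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd_penalty(h_a, h_ap, sigma=1.0):
    """L_MMD term between latent samples of the two groups A=a and A=a'."""
    return (gaussian_kernel(h_a, h_a, sigma).mean()
            + gaussian_kernel(h_ap, h_ap, sigma).mean()
            - 2.0 * gaussian_kernel(h_a, h_ap, sigma).mean())

# Illustrative usage with random stand-ins for posterior samples of H_m.
rng = np.random.default_rng(4)
h_a = rng.normal(0.5, 1.0, size=(256, 10))    # samples for group A = a
h_ap = rng.normal(0.0, 1.0, size=(256, 10))   # samples for group A = a'
print(mmd_penalty(h_a, h_ap))                            # clearly positive: distributions differ
print(mmd_penalty(h_a, rng.normal(0.5, 1.0, (256, 10)))) # near zero: same distribution
```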
Nabi and Shpitser assume that all variables are continuous, except A and Y, and linearly related, except Y, for which $p(Y=1 \,|\, \text{pa}(Y)) = \pi = \sigma(\theta^y + \sum_{X_i \in \text{pa}(Y)} \theta^y_{x_i} X_i)$, where $\sigma(\cdot)$ is the sigmoid function. With the encoding $A \in \{0, 1\}$, where 0 indicates the male baseline value, and under the approximation $\log(\pi/(1-\pi)) \approx \log\pi$, we can write the PSE in the odds-ratio scale as

$$\text{PSE} \approx \exp\!\big(\theta^y_a + \theta^y_m\theta^m_a + \theta^y_l\theta^l_m\theta^m_a + \theta^y_r(\theta^r_m\theta^m_a + \theta^r_l\theta^l_m\theta^m_a)\big).$$

An instance from the test set $\{a^n, c^n, m^n, l^n, r^n\}$ is classified by using

$$p(Y \,|\, a^n, c^n) = \int_{m,l,r} p(Y \,|\, a^n, c^n, m, l, r)\, p(r \,|\, a^n, c^n, m, l)\, p(l \,|\, a^n, c^n, m)\, p(m \,|\, a^n, c^n).$$

In Fig. 5, we show the accuracy obtained by PSCF-VAE on the test set for increasing values of β, ranging from β = 0 (no MMD penalization) to β = 10,000. As we can see, accuracy decreases from 81.2% to 73.4%. Notice that predictions were formed using samples of $H_m$, $H_l$ and $H_r$ also for males, even though not required. Also notice that forming predictions from $p_\theta(Y \,|\, a^n, c^n, m^n, l^n, r^n)$ gives 82.7% accuracy.

Figure 5: Test accuracy of PSCF-VAE on the UCI Adult dataset for increasing values of β.

In Fig. 6, we show histograms of two dimensions of $q_\phi(H_m|A)$ (first and second row) and one dimension of $q_\phi(H_l|A)$ (third row) for β = 0, β = 2,500, and β = 5,000 (left to right) after 20,000 training steps for females (red) and males (blue); these are the only variables that show differences between males and females. As we can see, increasing β induces a reduction in the number of modes in the posterior, which corresponds to information loss. For β = 10,000 all histograms are unimodal (not shown). For β = 5,000, for which accuracy is around 78%, the histograms for females and males are similar; this can therefore be considered a fair accuracy.

Figure 6: Histograms of two dimensions of $q_\phi(H_m|A)$ (first and second row) and one dimension of $q_\phi(H_l|A)$ (third row) for β = 0, β = 2,500, and β = 5,000 (left to right) after 20,000 training steps for females (red) and males (blue).

The unconstrained PSE on this dataset is 3.64. When constraining the PSE to be smaller than 3.7 (thus essentially imposing no constraint), FIO gives 73.8% accuracy, due to the information that is lost by integrating out M, L and R. Constraining the PSE to be smaller than 3.6 also gives 73.8% accuracy. Constraining the PSE to be smaller than 1.05, as suggested by Nabi and Shpitser, gives 73.4% accuracy (Nabi and Shpitser report 72%). These results demonstrate that the loss in accuracy in FIO is due to integrating out M, L and R, rather than to ensuring fairness.

### The UCI German Credit Dataset

The German Credit dataset from the UCI repository contains 20 attributes of 1,000 individuals applying for loans. Each applicant is classified as a good or bad credit risk, i.e. as likely or not likely to repay the loan. We assume the GCM in Fig. 4(b), where A corresponds to the protected attribute sex, C to age, S to the triple status of checking account, savings, and housing, and R to the pair credit amount and repayment duration. The attributes age, credit amount, and repayment duration are continuous, whilst checking account, savings, and housing are categorical. Besides the direct effect $A \rightarrow Y$, we would like to remove the effect of A on Y through S.
We only need to introduce a hidden variable $H_s$ for S, as R does not need to be corrected. We divided the dataset into training and test sets of sizes 700 and 300 respectively. As for the Adult dataset, we varied β from 0 to 10,000. The test accuracy remained 76.0% for all values of β (predictions were formed using samples of $H_s$ also for males). This is the same accuracy as obtained when forming predictions from $p_\theta(Y \,|\, a^n, c^n, s^n, r^n)$. In Fig. 7, we show $q_\phi(H_s|A)$ for one dimension of the variable housing, which shows the most significant difference between females and males, for β = 0 and β = 10,000.

Figure 7: Histograms of $q_\phi(H_s|A)$ for one dimension of the variable housing for β = 0 and β = 10,000 after 20,000 training steps for females (red) and males (blue).

## Conclusions

We have proposed a novel intuitive definition of fairness, path-specific counterfactual fairness, which states that a decision is fair toward an individual if it coincides with the one that would have been taken in a counterfactual world in which the sensitive attribute along the unfair pathways were different. We have introduced a latent inference-projection method, PSCF-VAE, that achieves path-specific counterfactual fairness by correcting the variables that are descendants of the sensitive attribute along unfair pathways during testing, leaving the underlying data-generation mechanism unaltered during training. The proposed method is widely applicable to complex non-linear models. PSCF-VAE requires providing the causal model underlying the data-generation process. Future work will consider relaxing this requirement.

## Acknowledgements

The author would like to thank Thomas P. S. Gillam for his contribution to this work.

## References

- Bonchi, F.; Hajian, S.; Mishra, B.; and Ramazzotti, D. 2017. Exposing the probabilistic causal structure of discrimination. International Journal of Data Science and Analytics 3(1):1–21.
- Chiappa, S. 2014. Explicit-duration Markov switching models. Foundations and Trends in Machine Learning 7(6):803–886.
- Dawid, P. 2007. Fundamentals of statistical causality. Technical report, University College London.
- Dieterich, W.; Mendoza, C.; and Brennan, T. 2016. COMPAS risk scales: Demonstrating accuracy equity and predictive parity.
- Edwards, H., and Storkey, A. 2016. Censoring representations with an adversary. In 4th International Conference on Learning Representations.
- Gretton, A.; Borgwardt, K.; Rasch, M.; Schölkopf, B.; and Smola, A. 2012. A kernel two-sample test. Journal of Machine Learning Research 13:723–773.
- Hoffman, M.; Kahn, L.; and Li, D. 2015. Discretion in hiring.
- Kilbertus, N.; Rojas-Carulla, M.; Parascandolo, G.; Hardt, M.; Janzing, D.; and Schölkopf, B. 2017. Avoiding discrimination through causal reasoning. In Advances in Neural Information Processing Systems 30, 656–666.
- Kingma, D., and Ba, J. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations.
- Kingma, D., and Welling, M. 2014. Auto-encoding variational Bayes. In 2nd International Conference on Learning Representations.
- Kusner, M.; Loftus, J.; Russell, C.; and Silva, R. 2017. Counterfactual fairness. In Advances in Neural Information Processing Systems 30, 4069–4079.
- Lichman, M. 2013. UCI machine learning repository.
- Louizos, C.; Swersky, K.; Li, Y.; Welling, M.; and Zemel, R. 2016. The variational fair autoencoder. In 4th International Conference on Learning Representations.
- Nabi, R., and Shpitser, I. 2018. Fair inference on outcomes.
In Thirty-Second AAAI Conference on Artificial Intelligence.
- Pearl, J.; Glymour, M.; and Jewell, N. 2016. Causal Inference in Statistics: A Primer. Wiley.
- Pearl, J. 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press.
- Peters, J.; Janzing, D.; and Schölkopf, B. 2017. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.
- Qureshi, B.; Kamiran, F.; Karim, A.; and Ruggieri, S. 2016. Causal discrimination discovery through propensity score analysis. arXiv e-prints.
- Rezende, D.; Mohamed, S.; and Wierstra, D. 2014. Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the 31st International Conference on Machine Learning, 1278–1286.
- Russell, C.; Kusner, M.; Loftus, J.; and Silva, R. 2017. When worlds collide: Integrating different counterfactual assumptions in fairness. In Advances in Neural Information Processing Systems 30, 6417–6426.
- Shpitser, I. 2013. Counterfactual graphical models for longitudinal mediation analysis with unobserved confounding. Cognitive Science 37(6):1011–1035.
- Zhang, J., and Bareinboim, E. 2018. Fairness in decision-making: the causal explanation formula. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence.
- Zhang, L., and Wu, X. 2017. Anti-discrimination learning: a causal modeling-based framework. International Journal of Data Science and Analytics 1–16.
- Zhang, L.; Wu, Y.; and Wu, X. 2017. A causal framework for discovering and removing direct and indirect discrimination. In IJCAI, 3929–3935.