# Counterfactual Fairness: Unidentification, Bound and Algorithm

Yongkai Wu, Lu Zhang and Xintao Wu
University of Arkansas
{yw009, lz006, xintaowu}@uark.edu

Fairness-aware learning studies the problem of building machine learning models that are subject to fairness requirements. Counterfactual fairness is a notion of fairness derived from Pearl's causal model, which considers a model fair if, for a particular individual or group, its prediction in the real world is the same as that in the counterfactual world where the individual(s) had belonged to a different demographic group. However, an inherent limitation of counterfactual fairness is that it cannot be uniquely quantified from observational data in certain situations, due to the unidentifiability of the counterfactual quantity. In this paper, we address this limitation by mathematically bounding the unidentifiable counterfactual quantity, and we develop a theoretically sound algorithm for constructing counterfactually fair classifiers. We evaluate our method in experiments using both synthetic and real-world datasets, and compare it with existing methods. The results validate our theory and show the effectiveness of our method.

## 1 Introduction

It is important to develop fairness-aware machine learning algorithms and models such that the decisions made with their assistance are subject to fairness requirements. In recent years, the research community has studied fairness-aware machine learning from the causal perspective [Zhang et al., 2017b; Zhang et al., 2017a; Zhang and Bareinboim, 2018; Nabi and Shpitser, 2018; Zhang et al., 2018b; Zhang et al., 2018a] using causal modeling [Pearl, 2009]. In these works, fairness is generally formulated and quantified as the average causal effect of the sensitive attribute on the decision attribute, and the effect is evaluated via interventions through the post-interventional distributions.

Different from the above works, [Kusner et al., 2017] introduced counterfactual fairness, based on counterfactual inference, which considers the causal effect within a particular individual or group specified by a set of observed profile attributes. The notion of counterfactual fairness is more general than the intervention-based notions, which correspond to the case where the set of profile attributes is empty. Consequently, counterfactual inference is more challenging than intervention: measuring an intervention only requires the post-interventional distribution, whereas counterfactual inference involves both the real world without the intervention and the counterfactual world with the intervention. Researchers have proved that the counterfactual quantity cannot be uniquely computed from the observational data in some situations, which are referred to as the unidentifiable situations [Pearl, 2009]. The unidentifiable situations are a major barrier to the application of counterfactual fairness. In [Kusner et al., 2017], the authors proposed three methods to evade the unidentifiability issue: 1) only non-descendants of the sensitive attribute are used in classification; 2) non-deterministic substitutions of the hidden variables are postulated and inferred based on domain knowledge; or 3) the complete causal model is postulated and estimated, e.g., treating it as an additive noise model and then estimating the error terms. However, the sensitive attribute is usually an inherent characteristic of the data, so many attributes are its descendants.
If all descendants are forbidden, very few attributes remain for training the classifier, which weakens the resultant fair classifier dramatically. It is also over-simplified to postulate the substitutions and their distributions, since the exogenous variables represent all possible sources of randomness, or to presuppose that the causal model, which is supposed to represent the underlying mechanism of the world, is an additive model.

In this paper, we address the problem of learning counterfactually fair classifiers by mathematically bounding the unidentifiable counterfactual quantity. We leverage the counterfactual graph proposed in [Shpitser and Pearl, 2008] to depict the independence relationships among the variables of the real world and the counterfactual world that are of concern in the counterfactual quantity. Then, we adopt the c-component factorization to decompose the counterfactual quantity and identify the terms that are the source of unidentification. We propose a graphical criterion for determining the identifiability of counterfactual fairness and develop lower and upper bounds of counterfactual fairness in unidentifiable situations. Finally, we propose a post-processing method for reconstructing arbitrary classifiers in order to achieve counterfactual fairness. We formulate the reconstruction problem as a linearly constrained optimization problem with the bounded counterfactual fairness criterion as the constraints. In the experiments, we evaluate our methods and compare them with existing ones using real-world datasets and synthetic datasets where the ground truth of counterfactual fairness can be precisely quantified. The results show that our method correctly achieves counterfactual fairness as expected according to our theorems, while obtaining high prediction accuracy. On the contrary, the methods proposed in [Kusner et al., 2017] either fail to achieve counterfactual fairness or suffer from low accuracy due to simplified assumptions.

## 2 Preliminaries

### 2.1 Structural Causal Model and Intervention

**Definition 1 (Structural Causal Model).** A structural causal model $M$ is represented by a tuple $\langle U, P(U), V, F \rangle$ where

1. $U$ is a set of exogenous variables of any types, i.e., discrete, continuous, or mixed. An arbitrary joint probability distribution $P(U)$ is defined over $U$.
2. $V$ is a set of endogenous variables that are determined by variables in $U \cup V$.
3. $F$ is a set of functions mapping from $U \cup V$ to $V$. Specifically, for each $X \in V$, there is a function $f_X \in F$ mapping from $U \cup (V \setminus \{X\})$ to $X$, i.e., $X = f_X(Pa(X), U_X)$, where $Pa(X) \subseteq V \setminus \{X\}$ stands for the endogenous variables that directly determine the value of $X$, and $U_X \subseteq U$ represents all sources of randomness.

A causal model is associated with a causal graph $G$ where each node corresponds to a variable in $V$,¹ and each edge, denoted by an arrow, points from a node $X$ to another node $Y$ if $X$ is an input of $f_Y$. In this manuscript, we focus on the Markovian causal model where all exogenous variables are independent. Thus, the causal graph is simplified by omitting all exogenous variables and the edges associated with them. For any set of nodes $X$, we use $Pa(X)_G$, $Ch(X)_G$, $An(X)_G$, and $De(X)_G$ to denote the sets of parents, children, ancestors, and descendants of $X$ in $G$.

¹ An uppercase letter denotes an attribute and a lowercase letter denotes an attribute value. Similarly, a bold uppercase letter denotes a set of attributes and a bold lowercase letter denotes a set of values.
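To make Definition 1 concrete, here is a minimal sketch of a Markovian structural causal model in Python. The three structural equations and the noise distributions are invented for illustration and are not the model used later in the paper; they only show the pattern of independent exogenous variables $U$ feeding deterministic functions $f_X$.

```python
# A toy Markovian SCM: independent exogenous noise U and structural equations F.
import numpy as np

rng = np.random.default_rng(0)
n = 5
u_a, u_s, u_y = rng.uniform(size=(3, n))           # P(U): independent exogenous variables

A = (u_a < 0.5).astype(int)                         # A = f_A(U_A)
S = (u_s < 0.3 + 0.4 * A).astype(int)               # S = f_S(A, U_S)
Y = (u_y < 0.2 + 0.3 * A + 0.4 * S).astype(int)     # Y = f_Y(A, S, U_Y)

print(np.c_[A, S, Y])                               # an observational sample of the endogenous V
```

An intervention $do(S = s)$ would simply replace the line computing $S$ with the constant $s$, leaving $P(U)$ and the other equations untouched; this is the submodel $M_s$ discussed next.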
In the causal model, the quantitative measurement of causal effects is facilitated by interventions through the do-calculus [Pearl, 2009], which simulates the physical interventions that force some variable $X$ to take certain values $x$. Formally, the intervention that fixes the value of $X$ to $x$ is denoted by $do(x)$. The mathematical meaning of $do(x)$ in a causal model $M$ is defined as the substitution of the equation $X = f_X(Pa(X)_G, U_X)$ with $X = x$. The causal model after performing $do(x)$ is called a submodel, denoted by $M_x$. For another endogenous variable $Y$ which is affected by the intervention, its interventional variant in submodel $M_x$ is denoted by $Y_x$. The distribution of $Y_x$, called the post-intervention distribution of $Y$ under $do(x)$, is denoted by $P(y_x)$. For simplicity, we rewrite $P(y_x)$ as $P_x(y)$, meaning the distribution of (the variant of) $Y$ in submodel $M_x$. Similarly, we can rewrite the conditional distribution of $Y_x$ given $Z_x$, i.e., $P(y_x \mid z_x)$, as $P_x(y \mid z)$. Pearl [Pearl, 2009] proposed three rules of do-calculus to infer post-intervention distributions from observational data, by converting post-intervention distributions to observational distributions.

### 2.2 Counterfactual Inference and Unidentification

In Section 2.1, the causal effect is estimated using intervention, where the post-intervention distribution concerns only the counterfactual world represented by submodel $M_x$. If we infer the post-intervention distribution while conditioning on certain individuals or groups specified by a subset of endogenous variables, the inferred quantity involves two worlds simultaneously: the real world represented by causal model $M$, and the counterfactual world $M_x$. Hence it cannot be resolved by do-calculus directly. Such a causal inference problem is called counterfactual inference, and the distribution of $Y_x$ conditioning on the real-world observation $O = o$ is denoted by $P(y_x \mid o)$. Note that $Y_x$ is a variable in submodel $M_x$, while $O$ are variables in the original causal model $M$. Apparently, inferring $P(y_x \mid o)$ requires knowledge of the connection between the real world and the counterfactual world. This can be done if we have complete knowledge of the causal model. According to [Pearl, 2009], the counterfactual inference can be exactly performed using three steps if the complete model, including all the structural equations, is known:

1. Abduction: Update $P(u)$ by observation $O = o$ to obtain $P(u \mid o)$.
2. Action: Modify $M$ by intervention $do(x)$ to obtain the submodel $M_x$.
3. Prediction: Use the modified submodel $M_x$ and $P(u \mid o)$ to compute the probability of $Y_x$, i.e., the consequence of the counterfactual inference.
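The following sketch walks through the three steps on a toy discrete model whose structural equations and exogenous distribution are fully specified. The model, its parameters, and the query are made up for illustration; the point is only the mechanics of abduction, action, and prediction.

```python
# Abduction-Action-Prediction on a toy discrete model (illustrative equations and parameters).
from itertools import product

# P(U): three independent binary exogenous variables.
p_u = {"UA": 0.5, "US": 0.4, "UY": 0.3}            # P(U_i = 1)

def model(u, do_s=None):
    """Structural equations; do_s overrides the equation for S (the submodel M_s)."""
    ua, us, uy = u
    a = ua                                          # A = f_A(U_A)
    s = do_s if do_s is not None else a ^ us        # S = f_S(A, U_S)
    y = int(a + s + uy >= 2)                        # Y = f_Y(A, S, U_Y)
    return a, s, y

def prob_u(u):
    ua, us, uy = u
    return ((p_u["UA"] if ua else 1 - p_u["UA"]) *
            (p_u["US"] if us else 1 - p_u["US"]) *
            (p_u["UY"] if uy else 1 - p_u["UY"]))

obs = {"A": 1, "S": 0}                              # real-world observation O = o

# 1. Abduction: update P(u) to P(u | o).
posterior = {u: prob_u(u) for u in product((0, 1), repeat=3)
             if model(u)[:2] == (obs["A"], obs["S"])}
norm = sum(posterior.values())
posterior = {u: w / norm for u, w in posterior.items()}

# 2. Action and 3. Prediction: evaluate Y in the submodel M_{S=1} under P(u | o).
answer = sum(w for u, w in posterior.items() if model(u, do_s=1)[2] == 1)
print("P(Y_{S=1} = 1 | A=1, S=0) =", answer)        # the counterfactual quantity
```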
The above method is usually infeasible in practice due to the lack of complete knowledge of the causal model. If we only have the causal graph and observational data, which is a common scenario in the literature, the counterfactual quantity might be evaluated by using the IDC* algorithm developed in [Shpitser and Pearl, 2008]. However, in certain situations where the IDC* algorithm fails, the corresponding counterfactual quantity cannot be uniquely computed from the observational data in theory. These situations are referred to as the unidentifiable situations. One typical unidentifiable situation [Shpitser and Pearl, 2008] is shown in Lemma 1.

**Lemma 1.** Let $X$, $Y$ be two variables such that $Y$ is a parent of $X$. Then $P(Y = y, Y_x = y')$ is unidentifiable if $y \neq y'$.

## 3 Quantifying and Bounding Counterfactual Fairness

Fairness-aware learning is widely studied using causal modeling to capture the causal connection between the sensitive attribute and the challenged decision [Kilbertus et al., 2017; Li et al., 2017; Zhang et al., 2017b; Zhang and Bareinboim, 2018; Nabi and Shpitser, 2018; Chiappa, 2019; Kusner et al., 2017]. We adopt the notion of counterfactual fairness proposed in [Kusner et al., 2017], which formulates fairness as the equivalence of two counterfactual quantities. Although this notion captures the true intuition behind fairness, it faces significant computational challenges due to the unidentifiability of counterfactual inference. In this section, we first give the formal definition of counterfactual fairness for predictive models and explain its physical meaning. Then, we show how to address the above challenges by mathematically bounding the unidentifiable counterfactual quantity.

In our notation, $S \in \{s^+, s^-\}$ denotes the sensitive attribute, $Y \in \{y^+, y^-\}$ denotes the decision, and $X$ denotes the set of other attributes. The historical dataset $D$, drawn from a distribution $P(X, S, Y)$, is used to train a classifier $f : (X, S) \to \hat{Y}$. The underlying mechanism that determines the distribution $P(X, S, \hat{Y})$ is represented by a causal model $M$. The causal graph associated with the causal model is denoted by $G$. Then, counterfactual fairness is defined as follows.

*Figure 1: (a) Causal graph $G$. (b) Counterfactual graph $G'$ for $P(\hat{y}_s \mid s', z)$.*

**Definition 2 (Counterfactual Fairness).** Given a set of attributes $Z \subseteq X$, a classifier $f : (X, S) \to \hat{Y}$ is counterfactually fair w.r.t. $Z$ if, under any observational condition $Z = z$, we have $P(\hat{y}_{s'} \mid s, z) = P(\hat{y}_{s} \mid s, z)$, where $s, s' \in \{s^+, s^-\}$.

Recall that a lowercase letter with a subscript represents a value assignment to the corresponding variable in the submodel; e.g., $\hat{y}_s$ is a value of $\hat{Y}_s$ in the submodel $M_s$. The physical meaning of counterfactual fairness can be interpreted as follows. Consider candidates applying for a job, where a predictive model is used to make the decision $\hat{Y}$. We are concerned with an individual from the disadvantaged group $s^-$ who is specified by a profile $z$. Straightforwardly, the probability of the individual getting the positive decision is $P(\hat{y} \mid s^-, z)$, which is equivalent to $P(\hat{y}_{s^-} \mid s^-, z)$ since the intervention makes no change to the value of $S$ for that individual. Now assume the value of $S$ for this very individual had been changed from $s^-$ to $s^+$. The probability of this individual getting the positive decision after the hypothetical change is given by $P(\hat{y}_{s^+} \mid s^-, z)$. Therefore, if the two probabilities $P(\hat{y}_{s^-} \mid s^-, z)$ and $P(\hat{y}_{s^+} \mid s^-, z)$ are identical, we can claim the individual is treated fairly, as if he/she had been from the other group.

### 3.1 Identification of Counterfactual Quantity

In this section, we identify the source of unidentification for the counterfactual quantity and give a graphical criterion determining the identifiability of the counterfactual quantity. Our method is inspired by the IDC* algorithm, and we further extend it to bound the unidentifiable quantity. The analysis of $P(\hat{y}_s \mid s', z)$ concerns the connection between two causal models, $M$ and $M_s$. Thus, we apply the make-cg algorithm [Shpitser and Pearl, 2008] to the causal graph $G$ to construct a new graph $G'$ that depicts the independence relationships among all variables in $M$ and $M_s$ that are of concern in the analysis. The make-cg algorithm first combines the two causal graphs and makes them share the same exogenous variables $U$, corresponding to the shared causal context or background. Then, it removes the duplicated endogenous nodes that are not affected by $do(s)$. The resultant graph is the so-called counterfactual graph.

Next, we apply the c-component factorization [Tian and Pearl, 2002] to decompose the counterfactual graph $G'$ into disjoint subgraphs called c-components, such that any two nodes in the same c-component are connected by a bi-directed path.² After that, the joint distribution of all variables in the counterfactual graph can be factorized as the product of the conditional distribution of each c-component. Our theoretical analysis will show that if a certain c-component has an unidentifiability issue that cannot be resolved by summation, the corresponding counterfactual quantity is unidentifiable.

² A bi-directed path is a path consisting of bi-directed edges only.
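Before walking through the example, here is a small sketch of the two constructions just described, for the graph that the example will use (Figure 1). The edge list for Figure 1(a) is our assumption, chosen to match the factorization derived below; networkx is only a convenience for the graph bookkeeping.

```python
# Counterfactual graph and c-components for the Figure 1 example (assumed edge list).
import networkx as nx

# Assumed causal graph G of Figure 1(a).
G = nx.DiGraph([("A", "S"), ("A", "B"), ("A", "C"), ("A", "Yhat"),
                ("S", "B"), ("S", "C"), ("B", "Yhat")])

affected = nx.descendants(G, "S")        # nodes whose value changes under do(s): B, C, Yhat

# make-cg style twin construction: duplicate only the affected nodes, cut the edges coming
# from the (now intervened) S in the counterfactual copy, and connect each duplicated pair
# by a bi-directed edge reflecting the shared exogenous variables.
directed = list(G.edges)
for u, v in G.edges:
    if v in affected and u != "S":
        directed.append((u + "_s" if u in affected else u, v + "_s"))
bidirected = [(v, v + "_s") for v in affected]
print("directed edges of G':", sorted(directed))

# c-components = connected components of the bi-directed part (unlinked nodes are singletons).
bi = nx.Graph(bidirected)
bi.add_nodes_from(G.nodes)
components = sorted(sorted(comp) for comp in nx.connected_components(bi))
print(components)   # [['A'], ['B', 'B_s'], ['C', 'C_s'], ['S'], ['Yhat', 'Yhat_s']]
```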
Without loss of generality, we first use an example to illustrate our idea. Consider the causal graph $G$ shown in Figure 1(a), where there are five attributes $A$, $B$, $C$, $S$, $\hat{Y}$: $S$ is the sensitive attribute; $\hat{Y}$ is the prediction of the decision attribute obtained by any classifier; $A$ is an ancestor of $\hat{Y}$ but not a descendant of $S$; $B$ is in the intersection of the ancestors of $\hat{Y}$ and the descendants of $S$; and $C$ is a descendant of $S$ but not an ancestor of $\hat{Y}$. We aim to study the identifiability of $P(\hat{y}_s \mid s', z)$, where $Z$ is an arbitrary subset of $\{A, B, C\}$. The counterfactual graph, denoted by $G'$, is shown in Figure 1(b), where a bi-directed dashed edge indicates that the two nodes share the same exogenous variables. Note that $A$ and $A_s$ are merged into $A$ since they are duplicates. Next, we apply the c-component factorization. In Figure 1(b), there are five c-components: $\{A\}$, $\{S\}$, $\{B, B_s\}$, $\{C, C_s\}$, and $\{\hat{Y}, \hat{Y}_s\}$. We can factorize $P(\hat{y}_s, s', z)$ as

$$P(\hat{y}_s, s', z) = \sum_{x \setminus z,\, \hat{y},\, b_s,\, c_s} R(a)\, R(s')\, R(c, c_s)\, R(b, b_s)\, R(\hat{y}, \hat{y}_s),$$

where $R(\mathbf{w}) = P\big(\mathbf{w} \mid Pa(\mathbf{W})_{G'}\big)$ for any node set $\mathbf{W}$, $x = \{a, b, c\}$, and $z$ is any subset of $x$. Then, we can derive that

$$P(\hat{y}_s \mid s', z) = \frac{\sum_{x \setminus z,\, \hat{y},\, b_s,\, c_s} \Big[ P(a)\, P(s' \mid a)\, P(c, c_s \mid s', a)\, P(b, b_s \mid s', a)\, P(\hat{y}, \hat{y}_s \mid a, b, b_s) \Big]}{P(s', z)}.$$

Note that $c_s$ in $P(c, c_s \mid s', a)$ and $\hat{y}$ in $P(\hat{y}, \hat{y}_s \mid a, b, b_s)$ can be canceled out by summation. By applying m-separation, we can remove $b$ from $P(\hat{y}_s \mid a, b, b_s)$, as $B$ is d-separated from $\hat{Y}_s$ conditioning on $A$ and $B_s$. Thus, we obtain

$$P(\hat{y}_s \mid s', z) = \frac{\sum_{x \setminus z,\, b_s} \Big[ P(a)\, P(s' \mid a)\, P(c \mid s', a)\, P(b, b_s \mid s', a)\, P(\hat{y}_s \mid a, b_s) \Big]}{P(s', z)}. \tag{1}$$

To further analyze Eq. (1), we consider two cases below.

**Case 1 ($B \notin Z$):** In this case, $b$ appears under the summation of Eq. (1), hence $b$ in $P(b, b_s \mid s', a)$ can be canceled out by summation, resulting in $P(b_s \mid s', a)$. Then, we can remove $s'$ from $P(b_s \mid s', a)$, as $B_s$ is d-separated from $S$ conditioning on $A$, resulting in $P(b_s \mid a)$. We can further rewrite $P(\hat{y}_s \mid a, b_s)$ as $P_s(\hat{y} \mid a, b')$, and rewrite $P(b_s \mid a)$ as $P_s(b' \mid a)$. At last, we invoke do-calculus Rule 2 [Pearl, 2009] to convert $P_s(\hat{y} \mid a, b')$ to $P(\hat{y} \mid a, b', s)$, and $P_s(b' \mid a)$ to $P(b' \mid a, s)$. Finally, we obtain

$$P(\hat{y}_s \mid s', z) = \frac{\sum_{x \setminus z \setminus \{b\},\, b'} P(s', a, c)\, P(b' \mid a, s)\, P(\hat{y} \mid a, b', s)}{P(s', z)} = \frac{\sum_{x \setminus z \setminus \{b\}} P(s', a, c)\, P(\hat{y} \mid a, s)}{P(s', z)}. \tag{2}$$
**Case 2 ($B \in Z$):** In this case, since $b$ does not appear under the summation, the term $P(b, b_s \mid s', a)$ cannot be reduced, resulting in

$$P(\hat{y}_s \mid s', z) = \frac{\sum_{x \setminus z,\, b_s} P(s', a, c)\, P(b, b_s \mid a, s')\, P(\hat{y} \mid a, b_s, s)}{P(s', z)}. \tag{3}$$

From the above two cases we see that $P(\hat{y}_s \mid s', z)$ in Case 1 is identifiable, as all terms in Eq. (2) can be read from observational data. One can verify that this result is consistent with the IDC* algorithm. However, in Case 2, since $P(b, b_s \mid s', a)$ in Eq. (3) is unidentifiable according to Lemma 1, $P(\hat{y}_s \mid s', z)$ is also unidentifiable. In this example, the identifiability of $P(\hat{y}_s \mid s', z)$ depends on whether node $B$, the intersection of $S$'s descendants and $\hat{Y}$'s ancestors, is in the set $Z$ or not. We summarize this result as follows.

**Proposition 1.** For the causal graph in Figure 1(a), $P(\hat{y}_s \mid s', z)$ is unidentifiable if and only if $B \in Z$.

### 3.2 Bounding the Unidentifiable Counterfactual Quantity

In Eq. (3), we identify the source of unidentifiability. Next, we derive the lower and upper bounds for $P(\hat{y}_s \mid s', z)$, as shown in the following proposition, which works for both identifiable and unidentifiable situations.

**Proposition 2.** For the causal graph in Figure 1(a) we have

$$P(\hat{y}_s \mid s', z) \leq \frac{\sum_{x \setminus z} P(s', x)\, \max_{m} \{P(\hat{y} \mid s, a, b)\}}{P(s', z)}, \tag{4}$$

$$P(\hat{y}_s \mid s', z) \geq \frac{\sum_{x \setminus z} P(s', x)\, \min_{m} \{P(\hat{y} \mid s, a, b)\}}{P(s', z)}, \tag{5}$$

where $x = \{a, b, c\}$, $z$ is any subset of $x$, $M = \{B\} \cap Z$, and the maximum and minimum are taken over the values $m$ of the variables in $M$.

*Proof.* Suppose $B \in Z$; then $M = \{B\}$. Obviously, we have $P(\hat{y} \mid s, a, b_s) \leq \max_{b} \{P(\hat{y} \mid s, a, b)\}$. By applying this inequality to Eq. (3), we have

$$P(\hat{y}_s \mid s', z) \leq \frac{\sum_{x \setminus z} P(s', a, c)\, \max_{b} \{P(\hat{y} \mid s, a, b)\} \sum_{b_s} P(b, b_s \mid s', a)}{P(s', z)} = \frac{\sum_{x \setminus z} P(s', a, c)\, P(b \mid s', a)\, \max_{b} \{P(\hat{y} \mid s, a, b)\}}{P(s', z)} = \frac{\sum_{x \setminus z} P(s', x)\, \max_{b} \{P(\hat{y} \mid s, a, b)\}}{P(s', z)}.$$

The second step is due to $\sum_{b_s} P(b, b_s \mid s', a) = P(b \mid s', a)$, and the third step is due to $B \perp C \mid A, S$. Similarly, we can replace $\max$ with $\min$ to obtain Eq. (5). If $B \notin Z$, then $M = \emptyset$. In that case, $\max \{P(\hat{y} \mid s, a, b)\} = \min \{P(\hat{y} \mid s, a, b)\} = P(\hat{y} \mid s, a)$, and both Eq. (4) and Eq. (5) become $\sum_{x \setminus z \setminus \{b\}} P(s', a, c)\, P(\hat{y} \mid s, a) \,/\, P(s', z)$, which is consistent with the identifiable situation, i.e., Eq. (2). ∎

### 3.3 Extending to the General Case

The above results can be extended to the general case. Let $A$ denote the ancestors of $\hat{Y}$ that are not descendants of $S$, $B$ denote the intersection of the ancestors of $\hat{Y}$ and the descendants of $S$, and $C$ denote the descendants of $S$ that are not ancestors of $\hat{Y}$, i.e.,

$$A = An(\hat{Y})_G \setminus De(S)_G, \quad B = An(\hat{Y})_G \cap De(S)_G, \quad C = De(S)_G \setminus An(\hat{Y})_G.$$

Note that $A$, $B$, $C$ are disjoint and $X = A \cup B \cup C$. Now we are ready to extend Propositions 1 and 2 to the general case.

**Theorem 1 (Identification of Counterfactual Quantity).** Given a causal graph $G$ and the set of profile attributes $Z$, the counterfactual quantity $P(\hat{y}_s \mid s', z)$ is unidentifiable if and only if $B \cap Z \neq \emptyset$.

**Theorem 2 (Bounds of Counterfactual Quantity).** Given a causal graph $G$ and a set of profile attributes $Z$, we have

$$P(\hat{y}_s \mid s', z) \leq \frac{\sum_{x \setminus z} \Big[ P(s', x)\, \max_{m} \big\{ P\big(\hat{y} \mid s,\; pa(\hat{Y})_G \cap M = m,\; pa(\hat{Y})_G \setminus \{S, M\}\big) \big\} \Big]}{P(s', z)},$$

$$P(\hat{y}_s \mid s', z) \geq \frac{\sum_{x \setminus z} \Big[ P(s', x)\, \min_{m} \big\{ P\big(\hat{y} \mid s,\; pa(\hat{Y})_G \cap M = m,\; pa(\hat{Y})_G \setminus \{S, M\}\big) \big\} \Big]}{P(s', z)},$$

where we partition $B$ into two disjoint sets, $M = B \cap Z$ and $N = B \setminus Z$, and the maximum and minimum are taken over the values $m$ of the parents of $\hat{Y}$ that belong to $M$. The proofs are similar to the previous ones.
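As an illustration, the sketch below checks the criterion of Theorem 1 and estimates the bounds of Proposition 2 from a sample, for the Figure 1 graph with $Z = \{A, B\}$ (so $x \setminus z = \{C\}$ and $M = \{B\}$). The edge list, data-generating process, and classifier are all invented for the example.

```python
# Illustrative check of Theorem 1 and the bounds of Proposition 2 on the Figure 1 example.
import numpy as np
import networkx as nx

# Causal graph of Figure 1(a), with edges as implied by the factorization in Section 3.1.
G = nx.DiGraph([("A", "S"), ("A", "B"), ("A", "C"), ("A", "Yhat"),
                ("S", "B"), ("S", "C"), ("B", "Yhat")])
B_set = nx.ancestors(G, "Yhat") & nx.descendants(G, "S")      # B = An(Yhat) ∩ De(S) = {"B"}
Z = {"A", "B"}
print("identifiable:", len(B_set & Z) == 0)                    # Theorem 1: False here

# Toy observational sample consistent with the graph (binary; s = 1 plays the role of s+).
rng = np.random.default_rng(0)
n = 100_000
a = rng.binomial(1, 0.5, n)
s = rng.binomial(1, 0.3 + 0.4 * a)
b = rng.binomial(1, 0.2 + 0.3 * a + 0.3 * s)
c = rng.binomial(1, 0.5 + 0.3 * s)
yhat = rng.binomial(1, 0.1 + 0.4 * a + 0.4 * b)                # predictions of a fixed classifier

def p(mask):
    return mask.mean()

def bounds(a0, b0):
    """Eqs. (4)-(5) with z = (a0, b0): bounds of P(yhat_{s+} = 1 | s-, z)."""
    num_ub = num_lb = 0.0
    for c0 in (0, 1):                                          # summation over x \ z = {C}
        joint = p((s == 0) & (a == a0) & (b == b0) & (c == c0))            # P(s-, x)
        cond = [p((yhat == 1) & (s == 1) & (a == a0) & (b == b1)) /
                p((s == 1) & (a == a0) & (b == b1)) for b1 in (0, 1)]      # P(yhat | s+, a, b)
        num_ub += joint * max(cond)                            # max over m, with M = {B}
        num_lb += joint * min(cond)
    denom = p((s == 0) & (a == a0) & (b == b0))                # P(s-, z)
    return num_lb / denom, num_ub / denom

for a0 in (0, 1):
    for b0 in (0, 1):
        lb, ub = bounds(a0, b0)
        factual = p((yhat == 1) & (s == 0) & (a == a0) & (b == b0)) / p((s == 0) & (a == a0) & (b == b0))
        print(f"z=(a={a0}, b={b0}): P(yhat_s+ | s-, z) in [{lb:.3f}, {ub:.3f}], P(yhat | s-, z) = {factual:.3f}")
```

Subtracting the factual probability $P(\hat{y} \mid s^-, z)$ from these bounds gives an interval for the unfairness measure defined in the next section.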
## 4 Achieving Counterfactual Fairness in Classification

The derived bounds clear the path towards constructing counterfactually fair classifiers. In this section, we propose a post-processing method for reconstructing any classifier to achieve counterfactual fairness. To this end, we first give a relaxed quantitative criterion of fairness based on Definition 2.

**Definition 3 ($\tau$-Counterfactual Fairness).** Given a profile attribute set $Z \subseteq X$ and a threshold $\tau$, a classifier $f : (X, S) \to \hat{Y}$ is $\tau$-counterfactually fair if, under any condition $Z = z$, $|DE(\hat{y}_{s^- \to s^+} \mid z)| \leq \tau$, where $DE(\hat{y}_{s^- \to s^+} \mid z) = P(\hat{y}_{s^+} \mid s^-, z) - P(\hat{y}_{s^-} \mid s^-, z)$.

In the above definition, $DE(\hat{y}_{s^- \to s^+} \mid z)$ captures the amount of unfairness or discrimination of a classifier, in terms of the difference in the positive decision rate for a certain group of individuals (specified by $z$) between the counterfactual world (where they had been changed to $s^+$) and the real world (where they are actually in $s^-$). If the amount of unfairness of a classifier is smaller than $\tau$, we claim this classifier is (counterfactually) fair. Note that the first term $P(\hat{y}_{s^+} \mid s^-, z)$ has the identification issue, but the second term $P(\hat{y}_{s^-} \mid s^-, z)$ simply equals $P(\hat{y} \mid s^-, z)$, since the intervention $do(s^-)$ makes no change to the value of $S$ for this group. Denoting the upper and lower bounds of $P(\hat{y}_{s^+} \mid s^-, z)$ obtained in Theorem 2 by $ub(P(\hat{y}_{s^+} \mid s^-, z))$ and $lb(P(\hat{y}_{s^+} \mid s^-, z))$ respectively, we obtain the bounds of $DE(\hat{y}_{s^- \to s^+} \mid z)$ as follows.

**Corollary 1 (Bounds of Counterfactual Fairness).** The upper and lower bounds of counterfactual fairness $DE(\hat{y}_{s^- \to s^+} \mid z)$ are given by

$$ub\big(DE(\hat{y}_{s^- \to s^+} \mid z)\big) = ub\big(P(\hat{y}_{s^+} \mid s^-, z)\big) - P(\hat{y} \mid s^-, z), \tag{6}$$

$$lb\big(DE(\hat{y}_{s^- \to s^+} \mid z)\big) = lb\big(P(\hat{y}_{s^+} \mid s^-, z)\big) - P(\hat{y} \mid s^-, z). \tag{7}$$

Corollary 1 can facilitate the detection of unfairness from observational data. Specifically, if $ub(DE(\hat{y}_{s^- \to s^+} \mid z)) \leq \tau$ and $lb(DE(\hat{y}_{s^- \to s^+} \mid z)) \geq -\tau$, then it is guaranteed that $\tau$-counterfactual fairness is satisfied. If $ub(DE(\hat{y}_{s^- \to s^+} \mid z)) < -\tau$ or $lb(DE(\hat{y}_{s^- \to s^+} \mid z)) > \tau$, then it is guaranteed that $\tau$-counterfactual fairness cannot be satisfied. Otherwise, it is uncertain and cannot be determined from the data.

Based on Corollary 1, we then propose an efficient method for constructing counterfactually fair classifiers. Note that the bounds are consistent with the identifiable situations, so the method works in both identifiable and unidentifiable situations. We consider constructing a new decision variable $\tilde{Y}$ from $\hat{Y}$ in the causal model such that $\tau$-counterfactual fairness regarding $\tilde{Y}$ is satisfied. The objective is to find an optimal probabilistic mapping function $P(\tilde{y} \mid \hat{y}, pa(\hat{Y})_G)$ that minimizes the difference between $Y$ and $\tilde{Y}$, measured by the empirical loss $E_D[\ell(Y, \tilde{Y})]$, while the new decisions are counterfactually fair. The formulation of this optimization problem is given below.

**Problem Formulation 1.** Given a dataset $D$ with predictions $\hat{Y}$ made by an arbitrary classifier, we aim to learn a post-processing mapping function $P(\tilde{y} \mid \hat{y}, pa(\hat{Y})_G)$ by solving the following optimization problem:

$$\begin{aligned}
\min \quad & E_D[\ell(Y, \tilde{Y})] \\
\text{s.t.} \quad & \text{for any } z:\; ub\big(DE(\tilde{y}_{s^- \to s^+} \mid z)\big) \leq \tau,\;\; lb\big(DE(\tilde{y}_{s^- \to s^+} \mid z)\big) \geq -\tau, \\
& \sum_{\tilde{y}} P(\tilde{y} \mid \hat{y}, pa(\hat{Y})_G) = 1, \quad 0 \leq P(\tilde{y} \mid \hat{y}, pa(\hat{Y})_G) \leq 1,
\end{aligned}$$

where $\ell(Y, \tilde{Y})$ is the 0-1 loss function.

It is easy to show that Problem Formulation 1 is a linear programming problem with $P(\tilde{y} \mid \hat{y}, pa(\hat{Y})_G)$ as the variables; in particular, the maximum and minimum in the bounds of Theorem 2 expand into one linear constraint per value $m$ of $M$. Note that the distribution $P(\tilde{y} \mid pa(\hat{Y})_G)$ can be obtained by $P(\tilde{y} \mid pa(\hat{Y})_G) = \sum_{\hat{y}} P(\hat{y} \mid pa(\hat{Y})_G)\, P(\tilde{y} \mid \hat{y}, pa(\hat{Y})_G)$. Thus, all constraints are linear w.r.t. $P(\tilde{y} \mid \hat{y}, pa(\hat{Y})_G)$. On the other hand, for the objective function we have

$$E_D[\ell(Y, \tilde{Y})] = \sum_{y, \tilde{y} \in \{y^+, y^-\}} \ell(y, \tilde{y})\, P(\tilde{y}, y) = P(\tilde{y} \neq y),$$

and we also have

$$\begin{aligned}
P(\tilde{y} \neq y) ={}& P(\hat{y} = y)P(\tilde{y} \neq \hat{y}) + P(\hat{y} \neq y)P(\tilde{y} = \hat{y}) \\
={}& \sum_{x, s} P(x, s) \Big\{ P(\hat{y} \neq y \mid x, s)\big[ P(\tilde{y} = y^- \mid \hat{y} = y^-, x, s)\, P(\hat{y} = y^- \mid x, s) + P(\tilde{y} = y^+ \mid \hat{y} = y^+, x, s)\, P(\hat{y} = y^+ \mid x, s) \big] \\
& + P(\hat{y} = y \mid x, s)\big[ P(\tilde{y} = y^+ \mid \hat{y} = y^-, x, s)\, P(\hat{y} = y^- \mid x, s) + P(\tilde{y} = y^- \mid \hat{y} = y^+, x, s)\, P(\hat{y} = y^+ \mid x, s) \big] \Big\}.
\end{aligned}$$

In the above expression, all probabilities except $P(\tilde{y} \mid \hat{y}, x, s)$ are read from the training set $D$, making it a linear expression of $P(\tilde{y} \mid \hat{y}, x, s)$.
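To make Problem Formulation 1 concrete, the sketch below instantiates it for the Figure 1 graph with $Z = \{A, B\}$, where $pa(\hat{Y})_G = \{A, B\}$ and the mapping variables are therefore $P(\tilde{y} \mid \hat{y}, a, b)$. The data and the original classifier are invented, and cvxpy is just one convenient way to express the program; the max/min in the bound constraints is equivalent to one linear constraint per value of $B$, so the problem remains a linear program.

```python
# Sketch of Problem Formulation 1 for the Figure 1 graph with Z = {A, B} (illustrative data).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 50_000
a = rng.binomial(1, 0.5, n)
s = rng.binomial(1, 0.3 + 0.4 * a)                 # s = 1 plays the role of s+, s = 0 of s-
b = rng.binomial(1, 0.2 + 0.3 * a + 0.3 * s)
y = rng.binomial(1, 0.1 + 0.3 * a + 0.5 * b)        # true decision
yhat = rng.binomial(1, 0.1 + 0.3 * a + 0.5 * b)     # predictions of some given classifier
tau = 0.05

def p(mask):
    return mask.mean()

# LP variables: q[yh, a, b] = P(Ytilde = 1 | Yhat = yh, A = a, B = b).
q = {(yh, av, bv): cp.Variable(nonneg=True)
     for yh in (0, 1) for av in (0, 1) for bv in (0, 1)}
constraints = [q[k] <= 1 for k in q]

# Objective: P(Ytilde != Y), linear in q.
objective = 0
for yv, yh, av, bv in np.ndindex(2, 2, 2, 2):
    cell = p((y == yv) & (yhat == yh) & (a == av) & (b == bv))
    objective += cell * (q[(yh, av, bv)] if yv == 0 else 1 - q[(yh, av, bv)])

# P(Ytilde = 1 | S = sv, A = av, B = bv) as an affine expression of q.
def p_tilde(sv, av, bv):
    denom = p((s == sv) & (a == av) & (b == bv))
    return sum(p((yhat == yh) & (s == sv) & (a == av) & (b == bv)) / denom * q[(yh, av, bv)]
               for yh in (0, 1))

# Fairness constraints for every z = (a, b).  With Z = {A, B}, the bounds of Theorem 2
# reduce to the max / min over b' of P(Ytilde = 1 | s+, a, b'), so ub(DE) <= tau and
# lb(DE) >= -tau become the two constraints below.
for av in (0, 1):
    for bv in (0, 1):
        factual = p_tilde(0, av, bv)                            # P(ytilde | s-, z)
        cf_vals = [p_tilde(1, av, bp) for bp in (0, 1)]
        constraints += [cp.maximum(*cf_vals) - factual <= tau,
                        cp.minimum(*cf_vals) - factual >= -tau]

problem = cp.Problem(cp.Minimize(objective), constraints)
problem.solve()
print("minimal expected 0-1 loss of Ytilde:", round(problem.value, 4))
print("q(yh=1, a=1, b=1) =", round(q[(1, 1, 1)].value, 3))
```

One way to deploy the learned mapping is to draw $\tilde{Y} \sim \mathrm{Bernoulli}(q_{\hat{y},a,b})$ for each incoming prediction.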
## 5 Experiments

We evaluate our method and compare it with previous methods on two datasets. To show the correctness of our method, we generate a synthetic dataset from a known causal model with complete knowledge for our evaluation. We also use the Adult dataset [Lichman, 2013] to evaluate these methods in a real-world environment. We evaluate four methods for constructing classifiers: (1) the original learning algorithm without fairness constraints as the baseline (denoted by BL); (2) two methods (denoted by A1 and A3) from [Kusner et al., 2017], where A1 uses only non-descendants of $S$ for building classifiers, and A3 presupposes the additive noise model for estimating the noise terms, which are then used for building classifiers; (3) our method (denoted by CF). By default, the discrimination threshold $\tau$ is set to 0.05.

| # of $z$ | ub | lb | Truth |
|---|---|---|---|
| 1 | 0.399 | 0.105 | 0.328 |
| 2 | 0.471 | 0.177 | 0.467 |
| 3 | 0.147 | -0.082 | -0.038 |
| 4 | 0.374 | 0.145 | 0.145 |

*Table 1: Bounds and ground truth of counterfactual fairness $DE(\hat{y}_{s^- \to s^+} \mid z)$ for all value combinations of $Z$ using the synthetic data.*

### 5.1 Datasets

*Figure 2: (a) Causal graph for the synthetic data. (b) Causal graph for the Adult data. Dashed nodes represent the exogenous variables. Bold nodes represent the profile attributes in $Z$.*

**Synthetic Data.** We manually build a causal model (where all variables are discrete) with complete knowledge of the exogenous variables and the functions (i.e., the contingency table) using Tetrad [Scheines et al., 1998]. The corresponding causal graph is shown in Figure 2a. This causal model consists of 5 endogenous variables, $A$, $S$, $M$, $N$, $Y$, and 5 independent exogenous variables, $U_A$, $U_S$, $U_M$, $U_N$, $U_Y$. For simplicity, all endogenous variables have two domain values and all exogenous variables have three domain values. The distributions of the exogenous variables and the deterministic functions of the endogenous variables are randomly assigned. Then, we generate 100,000 examples from this causal model and split the data into training and testing sets with a ratio of 80/20. We consider $S$ as the sensitive attribute and $Y$ as the decision attribute. The profile attribute set $Z$ contains $A$ and $M$.

| # of $z$ | LR BL | LR A1 | LR A3 | LR CF | SVM BL | SVM A1 | SVM A3 | SVM CF |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.000 | 0.000 | **-0.233** | 0.049 | **0.114** | 0.000 | **0.174** | 0.049 |
| 2 | **1.000** | 0.000 | **1.000** | 0.049 | **0.762** | 0.000 | **0.648** | 0.049 |
| 3 | 0.000 | 0.000 | 0.000 | 0.000 | -0.021 | 0.000 | -0.021 | 0.000 |
| 4 | **1.000** | 0.000 | 0.000 | 0.048 | **1.000** | 0.000 | 0.000 | 0.048 |

*Table 2: Counterfactual fairness for predictions on the synthetic data. Values violating the threshold are highlighted in bold.*

| Accu. (%) | Data | BL | A1 | A3 | CF |
|---|---|---|---|---|---|
| LR | Train | 60.103 | 55.760 | 59.433 | 61.987 |
| LR | Test | 60.421 | 56.563 | 59.713 | 62.512 |
| SVM | Train | 65.710 | 55.760 | 62.466 | 61.977 |
| SVM | Test | 65.841 | 56.563 | 62.542 | 62.463 |

*Table 3: Prediction accuracy for the synthetic data.*

**Adult Data.** This dataset consists of 65,123 records with 11 attributes including education, sex, income, etc. We select 7 attributes, binarize their domain values, and split the dataset into training and testing sets, following the 80/20 ratio.
We apply the PC algorithm implemented in Tetrad to build the causal graph, with the significance threshold for conditional independence testing set to 0.01. We use three tiers in the partial order for temporal priority: sex and age are in Tier 1; education, marital-status, and workclass are in Tier 2; and income is in Tier 3. The causal graph is shown in Figure 2b, where sex is considered the sensitive attribute and income is the decision attribute. age, education, marital-status, and workclass are contained in the profile attribute set $Z$.

### 5.2 Experiment on Synthetic Data

**Quantifying Counterfactual Fairness.** According to Theorem 1, the counterfactual fairness quantity is unidentifiable in this dataset. We evaluate the bounds of counterfactual fairness using Theorem 2. The ground truth (i.e., the exact values of all counterfactual quantities) is computed by applying the Abduction-Action-Prediction method. The results are shown in Table 1, where the first column indicates the indices of the value combinations of $z$. As can be seen, the exact values of $DE(\hat{y}_{s^- \to s^+} \mid z)$ fall within our bounds for all value combinations of $Z$, which validates our theorem.

**Building Counterfactually Fair Classifiers.** We then evaluate the classifier learning methods. For the baseline method, we adopt logistic regression (LR) and support vector machine (SVM). Then, we apply A1, A3, and CF on top of both classifiers. The counterfactual fairness is precisely evaluated using the Abduction-Action-Prediction method and is shown in Table 2 for all the methods. The predictive accuracy is reported in Table 3. As expected, both A1 and CF achieve fairness, but our method achieves higher accuracy than A1, implying that A1 loses more information. On the other hand, we see that BL fails to achieve counterfactual fairness, because it ignores fairness during training. In addition, A3 also fails to achieve counterfactual fairness. This implies that assuming an additive model may produce biased results when the underlying causal model is non-linear.

### 5.3 Experiment on Adult Dataset

We evaluate the fair classifier learning methods using the Adult dataset. Since we do not have the ground truth, we report the bounds of counterfactual fairness for the different methods. Table 4 shows that only A1 and CF achieve counterfactual fairness for all value combinations of $Z$, but our CF consistently achieves higher accuracy than A1, as shown in Table 5. This is as expected, since A1 is proved to be fair in [Kusner et al., 2017] (and is also identifiable according to Theorem 1), but it inevitably leads to lower accuracy as only $S$'s non-descendants are used. For BL and A3 in Table 4, either the lower bound is larger than $\tau$ or the upper bound is less than $-\tau$, indicating that $\tau$-counterfactual fairness is not achieved.

| # of $z$ | BL ub | BL lb | A1 val | A3 ub | A3 lb | CF ub | CF lb |
|---|---|---|---|---|---|---|---|
| 2 | 0.321 | 0.000 | 0.000 | -1.000 | -1.000 | -0.007 | -0.047 |
| 4 | 0.523 | 0.000 | 0.000 | -1.000 | -1.000 | 0.038 | -0.027 |
| 13 | 1.000 | 0.304 | 0.000 | 0.000 | -1.000 | 0.049 | -0.016 |
| 15 | 1.000 | 0.398 | 0.000 | 0.000 | -1.000 | 0.050 | -0.007 |
| 2 | 0.135 | -0.186 | 0.000 | -0.186 | -0.186 | -0.007 | -0.047 |
| 4 | 0.283 | -0.240 | 0.000 | -0.240 | -0.240 | 0.038 | -0.027 |
| 13 | 0.866 | 0.170 | 0.000 | 0.866 | 0.170 | 0.049 | -0.016 |
| 15 | 0.907 | 0.305 | 0.000 | 0.907 | 0.305 | 0.050 | -0.007 |

*Table 4: Counterfactual fairness for predictions on the Adult data.*

| Accu. (%) | Data | BL | A1 | A3 | CF |
|---|---|---|---|---|---|
| LR | Train | 77.728 | 67.624 | 74.845 | 70.433 |
| LR | Test | 77.200 | 66.934 | 73.867 | 69.451 |
| SVM | Train | 78.071 | 67.624 | 77.845 | 70.413 |
| SVM | Test | 77.449 | 66.934 | 77.166 | 69.438 |

*Table 5: Prediction accuracy for the Adult data.*
## 6 Conclusion

We focus on the unidentifiability challenge that arises when applying counterfactual fairness in practice. We decompose the counterfactual quantity and identify the source of unidentification by leveraging the counterfactual graph and the c-component factorization from Pearl's framework. We then develop a criterion of identification and upper/lower bounds for counterfactual fairness. Finally, we formulate counterfactually fair classification as a linear programming problem. Empirical evaluations show that our method is guaranteed to achieve counterfactual fairness in classification, while previous approaches either cannot achieve counterfactual fairness or suffer poor performance due to over-simplified assumptions.

## Acknowledgments

This work was supported in part by NSF 1646654.

## References

[Chiappa, 2019] Silvia Chiappa. Path-Specific Counterfactual Fairness. In The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), 2019.

[Kilbertus et al., 2017] Niki Kilbertus, Mateo Rojas-Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, and Bernhard Schölkopf. Avoiding Discrimination through Causal Reasoning. In Neural Information Processing Systems, 2017.

[Kusner et al., 2017] Matt J. Kusner, Joshua R. Loftus, Chris Russell, and Ricardo Silva. Counterfactual Fairness. In Neural Information Processing Systems, 2017.

[Li et al., 2017] Jiuyong Li, Jixue Liu, Lin Liu, Thuc Duy Le, Saisai Ma, and Yizhao Han. Discrimination detection by causal effect estimation. In 2017 IEEE International Conference on Big Data (Big Data), pages 1087-1094. IEEE, December 2017.

[Lichman, 2013] M. Lichman. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml, 2013.

[Nabi and Shpitser, 2018] Razieh Nabi and Ilya Shpitser. Fair Inference On Outcomes. In The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pages 1931-1940, 2018.

[Pearl, 2009] Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, New York, NY, USA, 2nd edition, 2009.

[Scheines et al., 1998] Richard Scheines, Peter Spirtes, Clark Glymour, Christopher Meek, and Thomas Richardson. The TETRAD Project: Constraint Based Aids to Causal Model Specification. Multivariate Behavioral Research, 33(1):65-117, January 1998.

[Shpitser and Pearl, 2008] Ilya Shpitser and Judea Pearl. Complete Identification Methods for the Causal Hierarchy. Journal of Machine Learning Research, 9:1941-1979, 2008.

[Tian and Pearl, 2002] Jin Tian and Judea Pearl. A General Identification Condition for Causal Effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence and Fourteenth Conference on Innovative Applications of Artificial Intelligence (IAAI 2002), pages 567-573, 2002.

[Zhang and Bareinboim, 2018] Junzhe Zhang and Elias Bareinboim. Fairness in Decision-Making: The Causal Explanation Formula. In The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 2018.

[Zhang et al., 2017a] Lu Zhang, Yongkai Wu, and Xintao Wu. Achieving Non-Discrimination in Data Release. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13-17, 2017, pages 1335-1344. ACM Press, 2017.

[Zhang et al., 2017b] Lu Zhang, Yongkai Wu, and Xintao Wu. A Causal Framework for Discovering and Removing Direct and Indirect Discrimination.
In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, pages 3929-3935, 2017.

[Zhang et al., 2018a] Lu Zhang, Yongkai Wu, and Xintao Wu. Causal modeling-based discrimination discovery and removal: Criteria, bounds, and algorithms. IEEE Transactions on Knowledge and Data Engineering, 2018.

[Zhang et al., 2018b] Lu Zhang, Yongkai Wu, and Xintao Wu. Achieving non-discrimination in prediction. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, pages 3097-3103, 2018.