# Achieving Non-Discrimination in Prediction

Lu Zhang, Yongkai Wu, and Xintao Wu
University of Arkansas
{lz006,yw009,xintaowu}@uark.edu

In discrimination-aware classification, the pre-process methods for constructing a discrimination-free classifier first remove discrimination from the training data and then learn the classifier from the cleaned data. However, they lack a theoretical guarantee on the potential discrimination when the classifier is deployed for prediction. In this paper, we fill this gap by mathematically bounding the discrimination in prediction. We adopt the causal model for modeling the data generation mechanism, and formally define discrimination in the population, in a dataset, and in prediction. We obtain two important theoretical results: (1) discrimination in prediction can still exist even if the discrimination in the training data is completely removed; and (2) not all pre-process methods can ensure non-discrimination in prediction even though they achieve non-discrimination in the modified training data. Based on these results, we develop a two-phase framework for constructing a discrimination-free classifier with a theoretical guarantee. The experiments demonstrate the theoretical results and show the effectiveness of our two-phase framework.

## 1 Introduction

Discrimination-aware classification is receiving increasing attention in the data mining and machine learning fields. Many methods have been proposed for constructing discrimination-free classifiers, which can be broadly divided into three categories: the pre-process methods that modify the training data [Kamiran and Calders, 2009a; Feldman et al., 2015; Zhang et al., 2017; Calmon et al., 2017; Nabi and Shpitser, 2017], the in-process methods that adjust the learning process of the classifier [Calders and Verwer, 2010; Kamishima et al., 2011; 2012; Zafar et al., 2017], and the post-process methods that directly change the predicted labels [Kamiran et al., 2010; Hardt et al., 2016].

All three categories of methods have their respective limitations. The in-process methods usually tweak the classifier or add regularizers in order to correct or penalize discriminatory outcomes during the learning process. However, since the discrimination or fairness constraints are generally not convex, surrogate functions are usually used in the minimization; as a result, additional bias might be introduced by the approximation error of the surrogate. The post-process methods are restricted to those that modify the predicted label of each individual independently. Thus, methods that map the whole dataset or population to a new non-discriminatory one cannot be adopted for post-processing, which means that a number of causal-based discrimination removal methods (e.g., [Zhang et al., 2017; Nabi and Shpitser, 2017]) cannot be applied. In our work, we target the pre-process methods that modify the training data, where some methods only modify the label, such as Massaging [Kamiran and Calders, 2009a; Žliobaitė et al., 2011] and the Causal-Based Removal [Zhang et al., 2017], and some methods also modify data attributes other than the label, such as Preferential Sampling [Kamiran and Calders, 2012; Žliobaitė et al., 2011], Reweighing [Calders et al., 2009], and Disparate Impact Removal [Feldman et al., 2015; Adler et al., 2016].
The fundamental assumption of the pre-process methods is that, since the classifier is learned from a discrimination-free dataset, the future predictions are likely to be discrimination-free as well [Kamiran and Calders, 2009b]. Although this assumption is plausible, there is no theoretical guarantee of how likely that is, or of how discrimination-free the predictions would be, given a training dataset and a classifier. The lack of such guarantees places great uncertainty on the performance of all pre-process methods.

In this paper, we fill this gap by modeling the discrimination in prediction using the causal model [Pearl, 2009]. A causal model is a structural-equation-based mathematical object that describes the causal mechanism of a system. We assume that there exists a fixed but unknown causal model that represents the underlying data generation mechanism of the population. Based on the causal model, we define the causal measure of discrimination in the population as well as in prediction. We then formalize two problems, namely discovering and removing discrimination in prediction. Based on specific assumptions regarding the causal model and the causal measure of discrimination, we conduct a concrete analysis of discrimination. We derive the formula for quantitatively measuring the discriminatory effect in the population from the observable probability distributions. We then derive the corresponding causal measure of the discrimination in prediction, as well as their approximations from the sample dataset. Finally, we link the discrimination in prediction with the discrimination in the training data through a probabilistic condition, which mathematically bounds the probability of the discrimination in prediction lying within a given interval in terms of the training data and the classifier. As a consequence, we obtain two important theoretical results: (1) even if the discrimination in the training data is completely removed, the discrimination in prediction can still exist due to the bias in the classifier; and (2) for removing discrimination, different from the claims of many previous works, not all pre-process methods can ensure non-discrimination in prediction even though they achieve non-discrimination in the modified training data. We show that, to guarantee non-discrimination in prediction, the pre-process methods should only modify the label. Based on these results, we develop a two-phase framework for constructing a discrimination-free classifier with a theoretical guarantee, which provides a guideline for employing existing pre-process methods or designing new ones. The experiments demonstrate the theoretical results and show the effectiveness of our two-phase framework.

## 2 Problem Formulation

### 2.1 Notations and Preliminaries

We consider an attribute space consisting of some protected attributes, the label of a certain decision, and the non-protected attributes. Throughout the paper, we use an uppercase letter, e.g., $X$, to represent an attribute, and a bold uppercase letter, e.g., $\mathbf{X}$, to represent a subset of attributes.
We use a lowercase letter, e.g., $x$, to represent a realization of attribute $X$, and a bold lowercase letter, e.g., $\mathbf{x}$, to represent a realization of $\mathbf{X}$. For ease of presentation, we assume that there is only one protected attribute, denoted by $C$, which is a binary attribute whose domain values are the non-protected group $c^{+}$ and the protected group $c^{-}$. We denote the label of the decision attribute by $L$, which is a binary attribute whose domain values are the positive label $l^{+}$ and the negative label $l^{-}$. Following the convention in machine learning, we also define $l^{+}=1$ and $l^{-}=0$. The set of all non-protected attributes is denoted by $\mathbf{Z}=\{Z_{1},\ldots,Z_{m}\}$. Please refer to the notation table shown as Table 1.

| Notation | Definition |
| --- | --- |
| $C$ | Protected attribute |
| $\mathbf{Z}=\{Z_{1},\ldots,Z_{m}\}$ | Non-protected attributes |
| $L$ | Label of decision |
| $h: C\times\mathbf{Z}\rightarrow L$ | Classifier |
| $M$ | Causal model of population |
| $M_{h}$ | Causal model of prediction |
| $D=\{(c^{(j)},\mathbf{z}^{(j)},l^{(j)})\}$ | Training data |
| $D_{h}=\{(c^{(j)},\mathbf{z}^{(j)},h(c^{(j)},\mathbf{z}^{(j)}))\}$ | Training data w/ predicted labels |

Table 1: Table of notations.

A causal model is formally defined as follows.

**Definition 1 (Causal Model).** A causal model $M$ is a triple $M=\langle\mathbf{U},\mathbf{V},\mathbf{F}\rangle$ where

1. $\mathbf{U}$ is a set of hidden contextual variables that are determined by factors outside the model.
2. $\mathbf{V}$ is a set of observed variables that are determined by variables in $\mathbf{U}\cup\mathbf{V}$.
3. $\mathbf{F}$ is a set of equations mapping from $\mathbf{U}\cup\mathbf{V}$ to $\mathbf{V}$. Specifically, for each $V_{i}\in\mathbf{V}$, there is an equation $f_{i}$ mapping from $\mathbf{U}\cup(\mathbf{V}\backslash V_{i})$ to $V_{i}$, i.e., $v_{i}=f_{i}(pa_{i},u_{i})$, where $pa_{i}$ is a realization of a set of observed variables $PA_{i}\subseteq\mathbf{V}\backslash V_{i}$ called the parents of $V_{i}$, and $u_{i}$ is a realization of a set of hidden variables $U_{i}\subseteq\mathbf{U}$.

The causal effect in the causal model is defined over the intervention that fixes the value of an observed variable $V$ to a constant $v$ while keeping the rest of the model unchanged. The intervention is mathematically formalized as $do(V=v)$ or simply $do(v)$. Then, for any variables $X,Y\in\mathbf{V}$, the distribution of $Y$ after $do(x)$ is defined as [Pearl, 2009]

$$P(y|do(x)) \triangleq P(Y=y|do(X=x)) = \sum_{\{u:\,Y_{x}(u)=y\}} P(u), \tag{1}$$

where $Y_{x}(u)$ denotes the value of $Y$ after intervention $do(x)$ under context $\mathbf{U}=u$. Note that $P(u)$ is an unknown joint distribution of all hidden variables. If the causal model satisfies the Markovian assumption, i.e., (1) the associated causal graph of the causal model is acyclic, and (2) all variables in $\mathbf{U}$ are mutually independent, then $P(y|do(x))$ can be computed from the joint distribution of $\mathbf{V}$ according to the truncated factorization formula [Pearl, 2009]

$$P(y|do(x)) = \sum_{\mathbf{V}'} \prod_{V_{i}\neq X} P(v_{i}|pa_{i})\,\delta_{X=x}, \tag{2}$$

where the summation is a marginalization that traverses all value combinations of $\mathbf{V}'=\mathbf{V}\backslash\{X,Y\}$, and $\delta_{X=x}$ means replacing $X$ with $x$ in each term.

### 2.2 Model Discrimination in Prediction

Assume that there exists a fixed population over the space $C\times\mathbf{Z}\times L$. The values of all the attributes in the population are determined by a causal model $M$, which can be written as

$$\text{Causal Model } M:\quad c=f_{C}(pa_{C},u_{C}),\qquad z_{i}=f_{i}(pa_{i},u_{i})\ \ (i=1,\ldots,m),\qquad l=f_{L}(pa_{L},u_{L}),$$

where $f_{L}$ can be considered as the decision-making process in the real system. Without ambiguity, we also use $M$ to denote the population, and use the terms mechanism and population interchangeably. In practice, $M$ is unknown and we can only observe a sample dataset $D=\{(c^{(j)},\mathbf{z}^{(j)},l^{(j)});\ j=1,\ldots,n\}$ drawn from the population.
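To make Definition 1 and the data-generation view concrete, the following is a minimal sketch of a Markovian causal model written as structural equations in Python. The graph (one non-protected attribute $Z_{1}$ with parent $C$, and $L$ with parents $C$ and $Z_{1}$) and all numeric coefficients are hypothetical illustrations, not the model learned from the Adult data in Section 5; the independent uniform noise terms play the role of the hidden variables $\mathbf{U}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Structural equations of a toy Markovian causal model M (illustrative only).
def f_C(u_c):             # protected attribute: c+ = 1, c- = 0; C has no parent
    return int(u_c < 0.5)

def f_Z1(c, u_z):         # one non-protected attribute with parent C
    return int(u_z < 0.3 + 0.4 * c)

def f_L(c, z1, u_l):      # the decision mechanism f_L with parents C and Z1
    return int(u_l < 0.2 + 0.3 * c + 0.3 * z1)

def sample_population(n):
    """Draw a dataset D = {(c, z, l)} from the (here, fully known) model M."""
    data = []
    for _ in range(n):
        u_c, u_z, u_l = rng.random(3)   # mutually independent hidden variables U
        c = f_C(u_c)
        z1 = f_Z1(c, u_z)
        l = f_L(c, z1, u_l)
        data.append((c, z1, l))
    return data

D = sample_population(2000)
```

This is essentially how the semi-synthetic data in Section 5 are produced, except that there the structural equations of $M$ are first learned from the Adult dataset rather than fixed by hand.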
A classifier $h$ is a function mapping from $C\times\mathbf{Z}$ to $L$. A hypothesis space $H$ is a set of candidate classifiers. A learning algorithm analyzes the dataset $D$ as the training data to find a classifier in $H$ that minimizes the difference between the predicted labels $h(c^{(j)},\mathbf{z}^{(j)})$ and the true labels $l^{(j)}$ ($j=1,\ldots,n$). Once training completes, the classifier is deployed to make predictions on new unlabeled data. It is usually assumed that the unlabeled data is drawn from the same population as the training data, i.e., $M$. Therefore, in prediction, the values of all the attributes other than the label are determined by the same mechanisms as in $M$, while the classifier acts as a new mechanism for determining the value of the label. We consider $M$ with the function $f_{L}(\cdot)$ replaced by the classifier $h(\cdot)$ as the causal model of classifier $h$, denoted by $M_{h}$, which is written as

$$\text{Causal Model } M_{h}:\quad c=f_{C}(pa_{C},u_{C}),\qquad z_{i}=f_{i}(pa_{i},u_{i})\ \ (i=1,\ldots,m),\qquad l=h(c,\mathbf{z}).$$

If we apply the classifier to $D$, we obtain a new dataset $D_{h}$ by replacing the original labels with the predicted labels, i.e., $D_{h}=\{(c^{(j)},\mathbf{z}^{(j)},h(c^{(j)},\mathbf{z}^{(j)}));\ j=1,\ldots,n\}$. Straightforwardly, $D_{h}$ can be considered as a sample drawn from $M_{h}$.

The discrimination in prediction made by classifier $h$ can be naturally defined as the discrimination in $M_{h}$. To this end, we first define a measure of discrimination in $M$ based on the causal relationships specified by $M$, denoted by $DE_{M}$ and called the true discrimination. By adopting the same measure, we denote the discrimination in $M_{h}$ by $DE_{M_{h}}$, called the true discrimination in prediction. Then, we denote the approximation of $DE_{M}$ from dataset $D$ by $DE_{D}$, and similarly denote the approximation of $DE_{M_{h}}$ from dataset $D_{h}$ by $DE_{D_{h}}$.

Our target is to discover and remove the true discrimination in prediction, i.e., $DE_{M_{h}}$, based on a certain causal measure of discrimination defined on $M$, i.e., $DE_{M}$. When calculating $DE_{M_{h}}$ from dataset $D$, we may encounter disturbances such as the sampling error of $D$ and the misclassification of $h$, so we need to compute an analytic approximation to $DE_{M_{h}}$. Thus, we define the problem of discovering discrimination in prediction as follows.

**Problem 1 (Discover Discrimination in Prediction).** Given a causal measure of discrimination defined on $M$, i.e., $DE_{M}$, a sample dataset $D$, and a classifier $h$ trained on $D$, compute an analytic approximation to the true discrimination in prediction, i.e., $DE_{M_{h}}$.

If the true discrimination in prediction is detected according to the approximation, the next step is to remove the discrimination by tweaking the dataset and/or the classifier. Thus, we define the problem of removing discrimination in prediction as follows.

**Problem 2 (Remove Discrimination in Prediction).** Given $DE_{M}$, $D$, and $h$, tweak $D$ and/or $h$ so that $DE_{M_{h}}$ is bounded by a user-defined threshold $\tau$.

## 3 Discover Discrimination in Prediction

In the above general problem definitions, $DE_{M}$ can be any reasonable causal measure of discrimination defined on any causal model. However, a concrete analysis of discrimination must rely on specific assumptions regarding the causal measure of discrimination and the causal model. The remainder of the paper is based on the following assumptions: (1) $M$ satisfies the Markovian assumption; (2) we consider all causal effects (the total effect) of $C$ on $L$ as discriminatory; and (3) $C$ has no parent and $L$ has no child. The first assumption is necessary for computing the causal effect from the observable probability distributions. The second assumption is made because the total causal effect is the causal relationship that is easiest to interpret and estimate; we will extend our results to other discrimination definitions, such as those in [Zhang et al., 2017; Bonchi et al., 2017], in future work.
The last assumption is made to keep our theoretical results concise and can easily be relaxed.

### 3.1 Causal Measure of Discrimination

We first derive the true discrimination in $M$. The key to discrimination is a counterfactual question: would the decision for an individual have been different had the individual belonged to a different protected/non-protected group (e.g., sex, race, age, religion, etc.)? To answer this question, we can perform an intervention on each individual that changes the value of the protected attribute $C$ and see how the label $L$ changes. We consider the difference between the expectation of the labels when performing $do(c^{+})$ for all individuals and the expectation of the labels when performing $do(c^{-})$ for all individuals, and use it as the causal measure of discrimination. To obtain this difference, note that the causal model is completely specified at the individual level once the context $\mathbf{U}=u$ is specified. For each individual specified by $u$, denote the label of individual $u$ by $L_{c^{+}}(u)$ (resp. $L_{c^{-}}(u)$) when $C$ is fixed according to $do(c^{+})$ (resp. $do(c^{-})$). Then, the difference in the label of individual $u$ is given by $L_{c^{+}}(u)-L_{c^{-}}(u)$, and the expected difference of the labels over all individuals is $E[L_{c^{+}}(u)-L_{c^{-}}(u)]$. Based on this analysis, we obtain the following proposition.

**Proposition 1.** Given a causal model $M$, the true discrimination is given by $DE_{M}=P(l^{+}|c^{+})-P(l^{+}|c^{-})$.

*Proof.* The above expectations can be computed as

$$E[L_{c^{+}}(u)] = \sum_{u} L_{c^{+}}(u)P(u) = \sum_{\{u:\,L_{c^{+}}(u)=l^{+}\}} l^{+}P(u) + \sum_{\{u:\,L_{c^{+}}(u)=l^{-}\}} l^{-}P(u) = \sum_{\{u:\,L_{c^{+}}(u)=l^{+}\}} P(u) = P(l^{+}|do(c^{+})), \tag{3}$$

where the last step is according to Eq. (1). According to Eq. (2), we have

$$P(l^{+}|do(c^{+})) = \sum_{\mathbf{z}} P(l^{+}|pa_{L})\delta_{C=c^{+}} \prod_{Z_{i}\in\mathbf{Z}} P(z_{i}|pa_{i})\delta_{C=c^{+}}.$$

On the other hand, since $C$ has no parent, we have

$$P(l^{+}|c^{+}) = \frac{P(l^{+},c^{+})}{P(c^{+})} = \frac{\sum_{\mathbf{z}} P(c^{+})P(l^{+}|pa_{L})\prod_{Z_{i}\in\mathbf{Z}} P(z_{i}|pa_{i})}{P(c^{+})}.$$

Thus, we have $P(l^{+}|do(c^{+}))=P(l^{+}|c^{+})$, leading to $E[L_{c^{+}}(u)]=P(l^{+}|c^{+})$. Similarly we can prove $E[L_{c^{-}}(u)]=P(l^{+}|c^{-})$. Hence, the proposition is proven.

Interestingly, the obtained causal measure of discrimination is the same as the classic statistical discrimination metric risk difference, which is widely used as the non-discrimination constraint in discrimination-aware learning [Romei and Ruggieri, 2014]. Our analysis helps clarify the assumptions and scenarios under which the risk difference applies.

Given dataset $D$, we approximate $DE_{M}$ by the maximum likelihood estimate, denoted by $DE_{D}$, as shown below.

**Proposition 2.** Given a dataset $D$, the discrimination in $D$ is given by $DE_{D}=\hat{P}(l^{+}|c^{+})-\hat{P}(l^{+}|c^{-})$, where

$$\hat{P}(l^{+}|c^{+}) = \sum_{\mathbf{z}} \hat{P}(l^{+}|c^{+},\mathbf{z})\,\hat{P}(\mathbf{z}|c^{+}), \tag{4}$$

with $\hat{P}(\cdot)$ being the conditional frequency in $D$.

Given a classifier $h: C\times\mathbf{Z}\rightarrow L$, denote the predicted label by $\hat{L}$. By adopting the same causal measure of discrimination as $DE_{M}$, we obtain $DE_{M_{h}}$ as follows.

**Proposition 3.** Given a causal model $M$ and a classifier $h$, the true discrimination in prediction is given by $DE_{M_{h}}=P(\hat{l}^{+}|c^{+})-P(\hat{l}^{+}|c^{-})$, where $P(\hat{l}^{+}|c^{+})$ (resp. $P(\hat{l}^{+}|c^{-})$) is the probability of a positive predicted label for the data with $C=c^{+}$ (resp. $C=c^{-}$).

We similarly define $DE_{D_{h}}$ as the maximum likelihood estimate of $DE_{M_{h}}$.

**Proposition 4.** Given a dataset $D$ and a classifier $h$ trained on $D$, the discrimination in $D_{h}$ is given by $DE_{D_{h}}=\hat{P}(\hat{l}^{+}|c^{+})-\hat{P}(\hat{l}^{+}|c^{-})$, where

$$\hat{P}(\hat{l}^{+}|c^{+}) = \sum_{\mathbf{z}} I[h(c^{+},\mathbf{z})=l^{+}]\,\hat{P}(\mathbf{z}|c^{+}), \tag{5}$$

with $I[\cdot]$ the indicator function.
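To ground Propositions 2 and 4, here is a minimal sketch (assuming the same $(c,z,l)$ tuple format as the hypothetical sampler above) that computes the empirical quantities $DE_{D}$ and $DE_{D_{h}}$ directly from group-wise frequencies. Summing over $\mathbf{z}$ as in Eqs. (4) and (5) gives the same numbers, since $\sum_{\mathbf{z}}\hat{P}(l^{+}|c^{+},\mathbf{z})\hat{P}(\mathbf{z}|c^{+})=\hat{P}(l^{+}|c^{+})$ for empirical frequencies.

```python
from collections import Counter

def de_d(data):
    """Empirical discrimination DE_D = P^(l+|c+) - P^(l+|c-) (Proposition 2).
    `data` is a list of (c, z, l) tuples with c = 1 for c+ and c = 0 for c-."""
    pos, tot = Counter(), Counter()
    for c, z, l in data:
        tot[c] += 1
        pos[c] += (l == 1)
    return pos[1] / tot[1] - pos[0] / tot[0]

def de_dh(data, h):
    """Empirical discrimination in prediction DE_Dh (Proposition 4): replace
    every observed label with the label predicted by classifier h, then
    measure the same difference on the relabeled data."""
    return de_d([(c, z, h(c, z)) for c, z, _ in data])

def h_biased(c, z):
    """A deliberately discriminatory toy classifier used for illustration."""
    return int(c == 1 or z == 1)

# print(de_d(D), de_dh(D, h_biased))   # assumes a dataset D sampled as above
```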
### 3.2 Bounding Discrimination in Prediction

To approximate $DE_{M_{h}}$ from $D$, sampling error cannot be avoided since $D$ is only a subset of the entire population. Although exact measurement of the sampling error is generally not feasible because $M$ is unknown, it can be probabilistically bounded. In the following, we first bound the difference between $DE_{M}$ and its approximation $DE_{D}$, and then extend the result to the difference between $DE_{M_{h}}$ and its approximation $DE_{D_{h}}$.

**Proposition 5.** Given a causal model $M$ and a sample dataset $D$ of size $n$, the probability that the difference between $DE_{M}$ and $DE_{D}$ is no larger than $t$ is bounded by

$$P\big(|DE_{M}-DE_{D}|\le t\big) > 1 - 4e^{-\frac{2n^{+}n^{-}t^{2}}{(\sqrt{n^{+}}+\sqrt{n^{-}})^{2}}},$$

where $n^{+}$ and $n^{-}$ ($n^{+}+n^{-}=n$) are the numbers of individuals with $c^{+}$ and $c^{-}$ in $D$.

*Proof.* By the definitions of $DE_{M}$ and $DE_{D}$ we have

$$DE_{M}-DE_{D} = \big(P(l^{+}|c^{+})-\hat{P}(l^{+}|c^{+})\big) + \big(\hat{P}(l^{+}|c^{-})-P(l^{+}|c^{-})\big).$$

Denoting by $l_{+}^{(j)}$ the label of the $j$th individual in $D$ with $C=c^{+}$, we can write $\hat{P}(l^{+}|c^{+})$ as

$$\hat{P}(l^{+}|c^{+}) = \frac{1}{n^{+}}\Big(I[l_{+}^{(1)}=l^{+}] + \cdots + I[l_{+}^{(n^{+})}=l^{+}]\Big),$$

where the indicators $I[l_{+}^{(j)}=l^{+}]$ ($j=1,\ldots,n^{+}$) can be considered as independent random variables bounded by the interval $[0,1]$. Note that $E[\hat{P}(l^{+}|c^{+})]=P(l^{+}|c^{+})$. According to Hoeffding's inequality [Murphy, 2012], we have

$$P\big(\big|P(l^{+}|c^{+})-\hat{P}(l^{+}|c^{+})\big|\ge t\big) \le 2e^{-2n^{+}t^{2}},$$

and similarly $P\big(\big|P(l^{+}|c^{-})-\hat{P}(l^{+}|c^{-})\big|\ge t\big) \le 2e^{-2n^{-}t^{2}}$. Therefore, we have

$$\begin{aligned} P\big(|DE_{M}-DE_{D}|\le t\big) &\ge P\Big(\big|P(l^{+}|c^{+})-\hat{P}(l^{+}|c^{+})\big| + \big|P(l^{+}|c^{-})-\hat{P}(l^{+}|c^{-})\big| \le t\Big) \\ &\ge \max_{0\le x\le t} P\Big(\big|P(l^{+}|c^{+})-\hat{P}(l^{+}|c^{+})\big|\le x\Big)\,P\Big(\big|P(l^{+}|c^{-})-\hat{P}(l^{+}|c^{-})\big|\le t-x\Big) \\ &\ge \max_{0\le x\le t}\big(1-2e^{-2n^{+}x^{2}}\big)\big(1-2e^{-2n^{-}(t-x)^{2}}\big), \end{aligned} \tag{6}$$

where the last line yields the stated bound by substituting $x$ with $\frac{\sqrt{n^{-}}}{\sqrt{n^{+}}+\sqrt{n^{-}}}t$, which makes the two exponents equal.

For extending Proposition 5 to Proposition 6, the difference is that, since $h$ is a classifier that depends on the training data $D$, the indicators $I[h(c_{+}^{(j)},\mathbf{z}_{+}^{(j)})=l^{+}]$ cannot be considered independent. Thus, Hoeffding's inequality cannot be directly applied and a uniform bound over all hypotheses in $H$ is needed.

**Proposition 6.** Given a causal model $M$, a sample dataset $D$, and a classifier $h: C\times\mathbf{Z}\rightarrow L$ from a hypothesis space $H$, the probability that the distance between $DE_{M_{h}}$ and $DE_{D_{h}}$ is no larger than $t$ is bounded by

$$P\big(|DE_{M_{h}}-DE_{D_{h}}|\le t\big) \ge 1-\delta(t), \qquad \delta(t)=\begin{cases} 4|H|\,e^{-\frac{2n^{+}n^{-}t^{2}}{(\sqrt{n^{+}}+\sqrt{n^{-}})^{2}}} & \text{if } H \text{ is finite},\\[4pt] 4\Big[\big(\tfrac{2en^{+}}{d}\big)^{d}+\big(\tfrac{2en^{-}}{d}\big)^{d}\Big]\,e^{-\frac{2n^{+}n^{-}t^{2}}{(\sqrt{n^{+}}+\sqrt{n^{-}})^{2}}} & \text{if } H \text{ is infinite}, \end{cases}$$

with $d$ being the VC dimension of $H$.

*Proof.* According to the definitions of $DE_{M_{h}}$ and $DE_{D_{h}}$,

$$DE_{M_{h}}-DE_{D_{h}} = \big(P(\hat{l}^{+}|c^{+})-\hat{P}(\hat{l}^{+}|c^{+})\big) + \big(\hat{P}(\hat{l}^{+}|c^{-})-P(\hat{l}^{+}|c^{-})\big).$$

Similar to the proof of Proposition 5, we treat $\hat{P}(\hat{l}^{+}|c^{+})$ as the mean of the indicators $I[h(c_{+}^{(j)},\mathbf{z}_{+}^{(j)})=l^{+}]$ ($j=1,\ldots,n^{+}$). According to the generalization bound in statistical learning theory [Vapnik, 1998], if $H$ is finite we have

$$P\big(\big|P(\hat{l}^{+}|c^{+})-\hat{P}(\hat{l}^{+}|c^{+})\big|\ge t\big) \le 2|H|\,e^{-2n^{+}t^{2}},$$

where $|H|$ is the size of $H$. If $H$ is infinite we have

$$P\big(\big|P(\hat{l}^{+}|c^{+})-\hat{P}(\hat{l}^{+}|c^{+})\big|\ge t\big) \le 4\Big(\frac{2en^{+}}{d}\Big)^{d} e^{-2n^{+}t^{2}},$$

where $d$ is the VC dimension of $H$. The proposition then follows in the same way as Eq. (6).

Proposition 6 provides an approximation of $DE_{M_{h}}$ from $DE_{D_{h}}$. However, since pre-process methods deal with the training data, it is imperative to further link $DE_{M_{h}}$ with $DE_{D}$. Next, we give the relation between $DE_{D_{h}}$ and $DE_{D}$ in terms of a bias metric that we refer to as the error bias.

**Definition 2 (Error Bias).** For any classifier $h$ trained on a training dataset $D$, the error bias is given by

$$\varepsilon_{h,D} = \varepsilon_{1}^{+}-\varepsilon_{2}^{+} - (\varepsilon_{1}^{-}-\varepsilon_{2}^{-}),$$

where $\varepsilon_{1}^{+},\varepsilon_{1}^{-}$ are the percentages of false positives on the data with $C=c^{+}$ and $C=c^{-}$ respectively, and $\varepsilon_{2}^{+},\varepsilon_{2}^{-}$ are the percentages of false negatives on the data with $C=c^{+}$ and $C=c^{-}$ respectively, each taken as a fraction of the corresponding group.
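As a quick illustration of Definition 2, the sketch below (same hypothetical data format as before) computes the empirical error bias from the group-wise false-positive and false-negative percentages; Proposition 7 below shows that this quantity is exactly the gap $DE_{D_{h}}-DE_{D}$.

```python
def error_bias(data, h):
    """Error bias eps_{h,D} = (eps1+ - eps2+) - (eps1- - eps2-) (Definition 2).
    eps1 is the fraction of false positives and eps2 the fraction of false
    negatives, each taken over the whole group C = c+ or C = c-."""
    fp = {0: 0, 1: 0}   # false positives per group
    fn = {0: 0, 1: 0}   # false negatives per group
    n = {0: 0, 1: 0}    # group sizes
    for c, z, l in data:
        n[c] += 1
        pred = h(c, z)
        fp[c] += int(pred == 1 and l == 0)
        fn[c] += int(pred == 0 and l == 1)
    return (fp[1] - fn[1]) / n[1] - (fp[0] - fn[0]) / n[0]

# Sanity check of Proposition 7 on the toy data:
# abs((de_dh(D, h_biased) - de_d(D)) - error_bias(D, h_biased)) < 1e-12
```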
**Proposition 7.** For any classifier $h$ trained on $D$, we have $DE_{D_{h}}-DE_{D}=\varepsilon_{h,D}$.

*Proof.* By definition, $\varepsilon_{1}^{+}$ is given by

$$\varepsilon_{1}^{+} = \frac{1}{n^{+}}\sum_{\{j:\,c^{(j)}=c^{+},\,l^{(j)}=l^{-}\}} I[h(c^{(j)},\mathbf{z}^{(j)})=l^{+}],$$

which can be rewritten as

$$\varepsilon_{1}^{+} = \sum_{\mathbf{z}} \hat{P}(\mathbf{z}|c^{+})\,I[h(c^{+},\mathbf{z})=l^{+}]\,\big(1-\hat{P}(l^{+}|c^{+},\mathbf{z})\big).$$

Similarly, $\varepsilon_{2}^{+}$ is given by

$$\varepsilon_{2}^{+} = \sum_{\mathbf{z}} \hat{P}(\mathbf{z}|c^{+})\,I[h(c^{+},\mathbf{z})=l^{-}]\,\hat{P}(l^{+}|c^{+},\mathbf{z}).$$

Subtracting $\varepsilon_{2}^{+}$ from $\varepsilon_{1}^{+}$, we obtain

$$\varepsilon_{1}^{+}-\varepsilon_{2}^{+} = \sum_{\mathbf{z}} \hat{P}(\mathbf{z}|c^{+})\Big(I[h(c^{+},\mathbf{z})=l^{+}]\big(1-\hat{P}(l^{+}|c^{+},\mathbf{z})\big) - I[h(c^{+},\mathbf{z})=l^{-}]\,\hat{P}(l^{+}|c^{+},\mathbf{z})\Big),$$

which is equivalent to

$$\varepsilon_{1}^{+}-\varepsilon_{2}^{+} = \sum_{\mathbf{z}} \hat{P}(\mathbf{z}|c^{+})\Big(I[h(c^{+},\mathbf{z})=l^{+}] - \hat{P}(l^{+}|c^{+},\mathbf{z})\Big).$$

Similarly, for the data with $C=c^{-}$ we have

$$\varepsilon_{1}^{-}-\varepsilon_{2}^{-} = \sum_{\mathbf{z}} \hat{P}(\mathbf{z}|c^{-})\Big(I[h(c^{-},\mathbf{z})=l^{+}] - \hat{P}(l^{+}|c^{-},\mathbf{z})\Big).$$

On the other hand, according to Eqs. (4) and (5) we have

$$DE_{D_{h}}-DE_{D} = \sum_{\mathbf{z}} \hat{P}(\mathbf{z}|c^{+})\Big(I[h(c^{+},\mathbf{z})=l^{+}] - \hat{P}(l^{+}|c^{+},\mathbf{z})\Big) - \sum_{\mathbf{z}} \hat{P}(\mathbf{z}|c^{-})\Big(I[h(c^{-},\mathbf{z})=l^{+}] - \hat{P}(l^{+}|c^{-},\mathbf{z})\Big) = \varepsilon_{1}^{+}-\varepsilon_{2}^{+} - (\varepsilon_{1}^{-}-\varepsilon_{2}^{-}).$$

Letting $\varepsilon_{h,D}=\varepsilon_{1}^{+}-\varepsilon_{2}^{+}-(\varepsilon_{1}^{-}-\varepsilon_{2}^{-})$ completes the proof.

Using Proposition 7, we rewrite Proposition 6 as Theorem 1, which is easier to interpret and utilize in practice.

**Theorem 1.** Given a causal model $M$, a sample dataset $D$, and a classifier $h$ trained on $D$, $DE_{M_{h}}$ is bounded by

$$P\big(|DE_{M_{h}}| \le |DE_{D}+\varepsilon_{h,D}| + t\big) \ge 1-\delta(t),$$

where the meaning of $\delta(t)$ is the same as in Proposition 6.

Theorem 1 gives a criterion of non-discrimination in prediction that incorporates both the discrimination in the training data and the error bias of the classifier, namely that $|DE_{D}+\varepsilon_{h,D}|$ be bounded by a threshold $\tau$. It shows that neither a discrimination-free dataset $D$ alone, i.e., $|DE_{D}|\le\tau$, nor a balanced classifier alone, i.e., $|\varepsilon_{h,D}|\le\tau$, can guarantee non-discriminatory prediction. Instead, it is the sum of $DE_{D}$ and $\varepsilon_{h,D}$ that must stay within the threshold.

## 4 Remove Discrimination in Prediction

This section solves the problem of removing discrimination in prediction: if the criterion $|DE_{D}+\varepsilon_{h,D}|\le\tau$ is not satisfied for a classifier, how can we meet it through modifying the training data and tweaking the classifier? Denote by $D'$ a dataset obtained by modifying $D$, and by $h'$ a new classifier trained on $D'$. Note that when the training data is modified, the error bias of the classifier built on it also changes. Thus, it is difficult to perform the training data modification and the classifier tweaking simultaneously. We propose a framework for modifying the training data and the classifier in two successive phases, summarized in Algorithm 1.

**Algorithm 1: Two-phase framework.**

1. If $|DE_{D}+\varepsilon_{h,D}|\le\tau$, we are done. Otherwise, modify the labels in the training dataset $D$ to obtain a modified dataset $D'$ such that $|DE_{D'}|\le\tau$.
2. Train a classifier $h'$ on $D'$. If $|DE_{D'}+\varepsilon_{h',D'}|\le\tau$, we are done. Otherwise, tweak classifier $h'$ to meet the above requirement.

In the first phase, we modify $D$ to remove the discrimination it contains. The modified dataset $D'$ can be considered as being generated by a causal model $M'$ that differs from $M$ according to the modification. Note that if $|DE_{D'}+\varepsilon_{h',D'}|\le\tau$ is achieved, Theorem 1 ensures the bound on discrimination for $M'_{h'}$, i.e., the discrimination of $h'$ performed on $M'$, but not for $M_{h'}$, i.e., the discrimination of $h'$ performed on the original population. If we only modify the labels of $D$, $M'$ can be written as

$$\text{Causal Model } M':\quad c=f_{C}(pa_{C},u_{C}),\qquad z_{i}=f_{i}(pa_{i},u_{i})\ \ (i=1,\ldots,m),\qquad l=f'_{L}(pa_{L},u_{L}).$$

Then, the causal model of any classifier $h'$ trained on $D'$ and performed on $M'$ is given by

$$\text{Causal Model } M'_{h'}:\quad c=f_{C}(pa_{C},u_{C}),\qquad z_{i}=f_{i}(pa_{i},u_{i})\ \ (i=1,\ldots,m),\qquad l=h'(c,\mathbf{z}),$$

which is equivalent to $M_{h'}$. Thus, non-discrimination in $M'_{h'}$ also means non-discrimination in $M_{h'}$. On the other hand, if we modify attributes other than $L$, then since the new unlabeled data is drawn from the original population, $M_{h'}$ is inconsistent with $M'_{h'}$. As a result, the non-discrimination result obtained on the training data cannot be applied to the prediction on new data.
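Putting the pieces together, the following is a schematic rendering of Algorithm 1 (a sketch, not the authors' implementation), reusing the hypothetical `de_d` and `error_bias` helpers above. Here `learn`, `massage_labels`, and `tweak` are placeholders for any learning algorithm, any label-only repair (e.g., Massaging), and any classifier tweak (e.g., the Random Flip procedure described below).

```python
def two_phase(train, learn, massage_labels, tweak, tau):
    """Schematic two-phase framework (Algorithm 1).
    massage_labels must change only the labels and return D' with |DE_D'| <= tau;
    tweak must adjust the classifier until |DE_D' + eps_{h',D'}| <= tau."""
    h = learn(train)
    if abs(de_d(train) + error_bias(train, h)) <= tau:
        return h                                  # criterion already met
    # Phase 1: label-only repair of the training data.
    repaired = massage_labels(train, tau)
    h_prime = learn(repaired)
    if abs(de_d(repaired) + error_bias(repaired, h_prime)) <= tau:
        return h_prime
    # Phase 2: reduce the error bias of the classifier itself.
    return tweak(h_prime, repaired, tau)
```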
From this argument and Theorem 1, we obtain the following corollary.

**Corollary 1.** Let $D'$ be a dataset obtained by modifying $D$, and let $h'$ be a new classifier trained on $D'$. If $D'$ only modifies the labels, then $|DE_{D'}+\varepsilon_{h',D'}|\le\tau$ is a sufficient condition to achieve

$$P\big(|DE_{M_{h'}}| \le \tau + t\big) \ge 1-\delta(t),$$

where the meaning of $\delta(t)$ is the same as in Proposition 6.

In the second phase, we modify $h'$ to reduce the error bias. Although dealing with a different fairness criterion, existing methods for balancing the misclassification rates (e.g., [Hardt et al., 2016]) can easily be extended to solve this problem. For the purpose of evaluating the correctness of our theoretical results, here we use a simple algorithm, Random Flip, for reducing the error bias, which can be applied to any classifier. After the classifier makes a prediction, Random Flip randomly flips the predicted label with a certain probability $p^{+}$ (resp. $p^{-}$) if the individual has $C=c^{+}$ (resp. $C=c^{-}$) so as to achieve $|DE_{D'}+\varepsilon_{h',D'}|\le\tau$, where $p^{+}$ can be computed according to the predictions of $h'$ over $D'$. Denoting $\sigma=\tau-|DE_{D'}|$, it suffices to make $|\varepsilon_{1}^{+}-\varepsilon_{2}^{+}|\le\sigma/2$ and $|\varepsilon_{1}^{-}-\varepsilon_{2}^{-}|\le\sigma/2$. Assume that $\varepsilon_{1}^{+}-\varepsilon_{2}^{+}>\sigma/2$; then it can easily be shown that $p^{+}$ should satisfy

$$\frac{(\varepsilon_{1}^{+}-\varepsilon_{2}^{+}-\sigma/2)\,n^{+}}{fp+tp} \;\le\; p^{+} \;\le\; \frac{(\varepsilon_{1}^{+}-\varepsilon_{2}^{+})\,n^{+}}{fp+tp},$$

where $fp$ and $tp$ are the numbers of false positives and true positives among the individuals with $C=c^{+}$. A similar result can be obtained if $\varepsilon_{1}^{+}-\varepsilon_{2}^{+}<-\sigma/2$.
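One way the Random Flip tweak might be realized is sketched below for the $c^{+}$ group (the $c^{-}$ group is handled symmetrically). Following the derivation of the bound on $p^{+}$, it assumes that positive predictions are flipped to negative when $\varepsilon_{1}^{+}-\varepsilon_{2}^{+}>\sigma/2$; the helper `de_d` and the data format are the hypothetical ones from the earlier sketches.

```python
import numpy as np

flip_rng = np.random.default_rng(1)

def random_flip(h, data, tau):
    """Wrap classifier h so that, for c+ individuals, a positive prediction is
    flipped to negative with probability p+ chosen from the admissible range."""
    sigma = tau - abs(de_d(data))
    plus = [(c, z, l) for c, z, l in data if c == 1]
    n_plus = len(plus)
    fp = sum(h(c, z) == 1 and l == 0 for c, z, l in plus)   # false positives
    fn = sum(h(c, z) == 0 and l == 1 for c, z, l in plus)   # false negatives
    tp = sum(h(c, z) == 1 and l == 1 for c, z, l in plus)   # true positives
    gap = (fp - fn) / n_plus                                # eps1+ - eps2+
    p_plus = 0.0
    if gap > sigma / 2 and fp + tp > 0:
        # any p+ in [(gap - sigma/2) n+/(fp+tp), gap n+/(fp+tp)] meets the bound
        p_plus = (gap - sigma / 2) * n_plus / (fp + tp)

    def tweaked(c, z):
        pred = h(c, z)
        if c == 1 and pred == 1 and flip_rng.random() < p_plus:
            return 0            # flip the positive prediction to negative
        return pred
    return tweaked
```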
## 5 Empirical Evaluation

### 5.1 Experimental Setup

In this section, we conduct experiments to evaluate our theoretical results. To simulate a population, we adopt a semi-synthetic data generation method. We first learn a causal model $M$ from a real dataset, the Adult dataset [Lichman, 2013], and treat it as the ground truth. We then generate the training data $D$ based on $M$. The causal model is built using the open-source software Tetrad [Glymour and others, 2004]. The Adult dataset consists of 11 attributes including age, education, sex, occupation, income, etc. Due to the sparse data issue, we binarize each attribute's domain values into two classes to reduce the domain sizes. We treat sex as $C$ and income as $L$. The discrimination in $M$ is measured as 0.13, i.e., $DE_{M}=0.13$. Based on the underlying distribution of $M$, we generate a number of training datasets with different sample sizes. When constructing discrimination-free classifiers using the two-phase framework, we select one representative data-modifying algorithm that only modifies $L$, the Massaging (MSG) algorithm [Kamiran and Calders, 2009a]. For other algorithms, we will evaluate their performance in preserving data utility in future work. For comparison, we also include an algorithm that modifies $\mathbf{Z}$, the Disparate Impact Removal (DI) algorithm [Adler et al., 2016]. The proposed Random Flip algorithm is used for tweaking the classifier. We assume a discrimination threshold $\tau=0.05$, i.e., we want to ensure that the discrimination in prediction is not larger than 0.05.

### 5.2 Experiment Results

We first measure the discrimination in each training dataset, i.e., $DE_{D}$. Then, we learn the classifier $h$ from the training data, where two classifiers, a decision tree (DT) and a support vector machine (SVM), are used. By replacing the labels in the training data with the labels predicted by the classifier, we obtain $D_{h}$, whose discrimination is measured as $DE_{D_{h}}$. Finally, we measure the discrimination in prediction, i.e., $DE_{M_{h}}$, according to Proposition 3. For each sample size, we repeat the experiments 100 times by randomly generating 100 sets of training data, and we report the average and variance of each measured discrimination. The results are shown in Table 2.

| Size | $DE_{D}$ | $DE_{D_{h}}$ (DT) | $DE_{D_{h}}$ (SVM) | $DE_{M_{h}}$ (DT) | $DE_{M_{h}}$ (SVM) |
| --- | --- | --- | --- | --- | --- |
| 500 | **0.131** (1.6E-3) | **0.145** (4.1E-3) | **0.132** (8.2E-3) | **0.139** (3.5E-3) | **0.125** (6.8E-3) |
| 2000 | **0.131** (4.8E-4) | **0.129** (1.1E-3) | **0.121** (7.4E-3) | **0.130** (9.4E-4) | **0.120** (7.1E-3) |
| 10000 | **0.129** (8.0E-5) | **0.138** (4.0E-4) | **0.150** (4.3E-3) | **0.138** (3.8E-4) | **0.150** (4.3E-3) |

Table 2: Measured discrimination before discrimination removal; each cell shows the average with the variance in parentheses, and values larger than the threshold are highlighted in bold.

As expected, as the sample size increases, the difference between $DE_{M}$ and $DE_{D}$ decreases, as shown by the variance. Since $DE_{D}>0.05$, the training data contains discrimination. As a result, both the training data with predicted labels, i.e., $D_{h}$, and the prediction, i.e., $M_{h}$, also contain discrimination.

To show the effectiveness of the two-phase framework, we first apply MSG to completely remove the discrimination in the above training data, obtaining the modified training data $D'$. Then, a decision tree $h'$ is built on $D'$, and the Random Flip algorithm is executed to tweak the classifier so that the error bias is less than 0.05, i.e., $|\varepsilon_{h',D'}|\le 0.05$. Finally, we measure the discrimination in $M_{h'}$. For comparison, the same process is also performed for DI. The results are shown in Table 3.

| Size | MSG: $DE_{D'}$ | MSG: $DE_{M_{h'}}$ | MSG: $DE_{M_{h'}}$ (w/o classifier tweaking) | DI: $DE_{D'}$ | DI: $DE_{M_{h'}}$ |
| --- | --- | --- | --- | --- | --- |
| 500 | 0.004 (3.7E-6) | 0.015 (1.0E-3) | 0.068 (4.6E-3) | 2E-4 (1.4E-3) | 0.092 (6.1E-3) |
| 2000 | 0.001 (1.7E-7) | 0.016 (5.3E-4) | 0.067 (4.3E-3) | 0.001 (3.4E-4) | 0.095 (1.6E-3) |
| 10000 | 2E-4 (9.7E-9) | 0.013 (3.3E-4) | 0.061 (3.3E-3) | 0.001 (6.8E-5) | 0.107 (5.4E-4) |

Table 3: Measured discrimination after discrimination removal, using the two-phase framework with MSG and, for comparison, DI (decision tree as the classifier); same format as Table 2.

By using the two-phase framework, discrimination is removed from the training data, as shown by $DE_{D'}$, and, more importantly, removed from the prediction, as shown by $DE_{M_{h'}}$. We also see that, if the classifier tweaking is not performed, the prediction still contains discrimination. However, for DI, even when the discrimination is removed from the training data and the error bias in the classifier is also removed, there still exists discrimination in prediction. These results are consistent with our theoretical conclusions.

## 6 Conclusions

In this paper, we addressed the limitation of the pre-process methods that there is no guarantee about the discrimination in prediction. Our theoretical results show that: (1) only removing discrimination from the training data cannot ensure non-discrimination in prediction for an arbitrary classifier; and (2) when removing discrimination from the training data, one should only modify the labels in order to obtain a non-discrimination guarantee. Based on these results, we developed a two-phase framework for constructing a discrimination-free classifier with a theoretical guarantee. The experiments, which adopt a semi-synthetic data generation method, demonstrate the theoretical results and show the effectiveness of our two-phase framework.

## Acknowledgments

This work was supported in part by NSF 1646654.

## References

[Adler et al., 2016] Philip Adler, Casey Falk, Sorelle A. Friedler, Gabriel Rybeck, Carlos Scheidegger, Brandon Smith, and Suresh Venkatasubramanian. Auditing black-box models for indirect influence. In Proceedings of ICDM 2016, 2016.
[Bonchi et al., 2017] Francesco Bonchi, Sara Hajian, Bud Mishra, and Daniele Ramazzotti. Exposing the probabilistic causal structure of discrimination. International Journal of Data Science and Analytics, 3(1):1–21, 2017.

[Calders and Verwer, 2010] Toon Calders and Sicco Verwer. Three naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery, 21(2):277–292, 2010.

[Calders et al., 2009] Toon Calders, Faisal Kamiran, and Mykola Pechenizkiy. Building classifiers with independency constraints. In IEEE International Conference on Data Mining Workshops (ICDMW '09), pages 13–18. IEEE, 2009.

[Calmon et al., 2017] Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R. Varshney. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems, pages 3995–4004, 2017.

[Feldman et al., 2015] Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In KDD, pages 259–268. ACM, 2015.

[Glymour and others, 2004] Clark Glymour et al. The TETRAD project. http://www.phil.cmu.edu/tetrad, 2004.

[Hardt et al., 2016] Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.

[Kamiran and Calders, 2009a] Faisal Kamiran and Toon Calders. Classifying without discriminating. In 2nd International Conference on Computer, Control and Communication (IC4 2009), pages 1–6. IEEE, 2009.

[Kamiran and Calders, 2009b] Faisal Kamiran and Toon Calders. Discrimination-aware classification. In 21st Benelux Conference on Artificial Intelligence (BNAIC), pages 333–334, 2009.

[Kamiran and Calders, 2012] Faisal Kamiran and Toon Calders. Data preprocessing techniques for classification without discrimination. KAIS, 33(1):1–33, 2012.

[Kamiran et al., 2010] Faisal Kamiran, Toon Calders, and Mykola Pechenizkiy. Discrimination aware decision tree learning. In ICDM, pages 869–874. IEEE, 2010.

[Kamishima et al., 2011] Toshihiro Kamishima, Shotaro Akaho, and Jun Sakuma. Fairness-aware learning through regularization approach. In ICDMW, pages 643–650. IEEE, 2011.

[Kamishima et al., 2012] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. Fairness-aware classifier with prejudice remover regularizer. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 35–50. Springer, 2012.

[Lichman, 2013] M. Lichman. UCI machine learning repository. http://archive.ics.uci.edu/ml, 2013.

[Murphy, 2012] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.

[Nabi and Shpitser, 2017] Razieh Nabi and Ilya Shpitser. Fair inference on outcomes. arXiv preprint arXiv:1705.10378, 2017.

[Pearl, 2009] Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, 2009.

[Romei and Ruggieri, 2014] Andrea Romei and Salvatore Ruggieri. A multidisciplinary survey on discrimination analysis. The Knowledge Engineering Review, 29(5):582–638, 2014.

[Vapnik, 1998] Vladimir Vapnik. Statistical Learning Theory, volume 1. Wiley, New York, 1998.

[Žliobaitė et al., 2011] Indrė Žliobaitė, Faisal Kamiran, and Toon Calders. Handling conditional discrimination. In ICDM, pages 992–1001. IEEE, 2011.

[Zafar et al., 2017] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P. Gummadi. Fairness constraints: Mechanisms for fair classification. In Artificial Intelligence and Statistics, pages 962–970, 2017.
[Zhang et al., 2017] Lu Zhang, Yongkai Wu, and Xintao Wu. A causal framework for discovering and removing direct and indirect discrimination. In IJCAI, 2017.