# Achieving Non-Discrimination in Prediction

Lu Zhang, Yongkai Wu, and Xintao Wu
University of Arkansas
{lz006,yw009,xintaowu}@uark.edu

In discrimination-aware classification, the pre-process methods for constructing a discrimination-free classifier first remove discrimination from the training data and then learn the classifier from the cleaned data. However, they lack a theoretical guarantee on the potential discrimination when the classifier is deployed for prediction. In this paper, we fill this gap by mathematically bounding the discrimination in prediction. We adopt the causal model for modeling the data generation mechanism, and formally define discrimination in the population, in a dataset, and in prediction. We obtain two important theoretical results: (1) discrimination in prediction can still exist even if the discrimination in the training data is completely removed; and (2) not all pre-process methods can ensure non-discrimination in prediction even though they achieve non-discrimination in the modified training data. Based on these results, we develop a two-phase framework for constructing a discrimination-free classifier with a theoretical guarantee. The experiments demonstrate the theoretical results and show the effectiveness of our two-phase framework.

## 1 Introduction

Discrimination-aware classification is receiving increasing attention in the data mining and machine learning fields. Many methods have been proposed for constructing discrimination-free classifiers, which can be broadly divided into three categories: the pre-process methods that modify the training data [Kamiran and Calders, 2009a; Feldman et al., 2015; Zhang et al., 2017; Calmon et al., 2017; Nabi and Shpitser, 2017], the in-process methods that adjust the learning process of the classifier [Calders and Verwer, 2010; Kamishima et al., 2011; 2012; Zafar et al., 2017], and the post-process methods that directly change the predicted labels [Kamiran et al., 2010; Hardt et al., 2016].

All three categories of methods have their respective limitations. The in-process methods usually tweak the classifier or add regularizers in order to correct or penalize discriminatory outcomes during the learning process. However, since the discrimination or fairness constraints are generally not convex, surrogate functions are usually used in the minimization; as a result, additional bias might be introduced by the approximation error of the surrogate. The post-process methods are restricted to those that modify the predicted label of each individual independently. Thus, methods that map the whole dataset or population to a new non-discriminatory one cannot be adopted for post-processing, which means that a number of causal-based discrimination removal methods (e.g., [Zhang et al., 2017; Nabi and Shpitser, 2017]) cannot be applied. In our work, we target the pre-process methods that modify the training data, where some methods only modify the label, such as Massaging [Kamiran and Calders, 2009a; Žliobaitė et al., 2011] and the Causal-Based Removal [Zhang et al., 2017], and some methods also modify data attributes other than the label, such as Preferential Sampling [Kamiran and Calders, 2012; Žliobaitė et al., 2011], Reweighing [Calders et al., 2009], and Disparate Impact Removal [Feldman et al., 2015; Adler et al., 2016].
The fundamental assumption of the pre-process methods is that, since the classifier is learned from a discrimination-free dataset, the future predictions are likely to be discrimination-free as well [Kamiran and Calders, 2009b]. Although this assumption is plausible, there is no theoretical guarantee of how likely that is, or of how discrimination-free the predictions would be, given a training dataset and a classifier. The lack of such guarantees places great uncertainty on the performance of all pre-process methods.

In this paper, we fill this gap by modeling the discrimination in prediction using the causal model [Pearl, 2009]. A causal model is a structural-equation-based mathematical object that describes the causal mechanism of a system. We assume that there exists a fixed but unknown causal model that represents the underlying data generation mechanism of the population. Based on the causal model, we define the causal measure of discrimination in the population as well as in prediction. We then formalize two problems, namely discovering and removing discrimination in prediction. Based on specific assumptions regarding the causal model and the causal measure of discrimination, we conduct a concrete analysis of discrimination. We derive the formula for quantitatively measuring the discriminatory effect in the population from the observable probability distributions. We then derive the corresponding causal measure of the discrimination in prediction, as well as their approximations from the sample dataset. Finally, we link the discrimination in prediction with the discrimination in the training data through a probabilistic condition, which mathematically bounds the probability of the discrimination in prediction lying within a given interval in terms of the training data and the classifier. As a consequence, we obtain two important theoretical results: (1) even if the discrimination in the training data is completely removed, the discrimination in prediction can still exist due to the bias in the classifier; and (2) for removing discrimination, different from the claims of many previous works, not all pre-process methods can ensure non-discrimination in prediction even though they achieve non-discrimination in the modified training data. We show that, to guarantee non-discrimination in prediction, the pre-process methods should only modify the label. Based on these results, we develop a two-phase framework for constructing a discrimination-free classifier with a theoretical guarantee, which provides a guideline for employing existing pre-process methods or designing new ones. The experiments demonstrate the theoretical results and show the effectiveness of our two-phase framework.

## 2 Problem Formulation

### 2.1 Notations and Preliminaries

We consider an attribute space consisting of some protected attributes, the label of a certain decision, and the non-protected attributes. Throughout the paper, we use an uppercase letter, e.g., $X$, to represent an attribute, and a bold uppercase letter, e.g., $\mathbf{X}$, to represent a subset of attributes.
We use a lowercase letter, e.g., $x$, to represent a realization of attribute $X$, and a bold lowercase letter, e.g., $\mathbf{x}$, to represent a realization of $\mathbf{X}$. For ease of presentation, we assume that there is only one protected attribute, denoted by $C$, which is a binary attribute whose domain values are the non-protected group $c^{+}$ and the protected group $c^{-}$. We denote the label of the decision attribute by $L$, which is a binary attribute whose domain values are the positive label $l^{+}$ and the negative label $l^{-}$. Following the convention in machine learning, we also define $l^{+}=1$ and $l^{-}=0$. The set of all non-protected attributes is denoted by $\mathbf{Z}=\{Z_{1},\ldots,Z_{m}\}$. Please refer to the notation table shown as Table 1.

| Notation | Definition |
| --- | --- |
| $C$ | Protected attribute |
| $\mathbf{Z}=\{Z_{1},\ldots,Z_{m}\}$ | Non-protected attributes |
| $L$ | Label of decision |
| $h: C\times\mathbf{Z}\rightarrow L$ | Classifier |
| $M$ | Causal model of population |
| $M_{h}$ | Causal model of prediction |
| $D=\{(c^{(j)},\mathbf{z}^{(j)},l^{(j)})\}$ | Training data |
| $D_{h}=\{(c^{(j)},\mathbf{z}^{(j)},h(c^{(j)},\mathbf{z}^{(j)}))\}$ | Training data w/ predicted labels |

Table 1: Table of notations.

A causal model is formally defined as follows.

**Definition 1 (Causal Model).** A causal model $M$ is a triple $M=\langle\mathbf{U},\mathbf{V},\mathbf{F}\rangle$ where

1. $\mathbf{U}$ is a set of hidden contextual variables that are determined by factors outside the model.
2. $\mathbf{V}$ is a set of observed variables that are determined by variables in $\mathbf{U}\cup\mathbf{V}$.
3. $\mathbf{F}$ is a set of equations mapping from $\mathbf{U}\cup\mathbf{V}$ to $\mathbf{V}$. Specifically, for each $V_{i}\in\mathbf{V}$, there is an equation $f_{i}$ mapping from $\mathbf{U}\cup(\mathbf{V}\backslash V_{i})$ to $V_{i}$, i.e., $v_{i}=f_{i}(pa_{i},u_{i})$, where $pa_{i}$ is a realization of a set of observed variables $PA_{i}\subseteq\mathbf{V}\backslash V_{i}$ called the parents of $V_{i}$, and $u_{i}$ is a realization of a set of hidden variables $U_{i}\subseteq\mathbf{U}$.

The causal effect in the causal model is defined over the intervention that fixes the value of an observed variable $V$ to a constant $v$ while keeping the rest of the model unchanged. The intervention is mathematically formalized as $do(V=v)$ or simply $do(v)$. Then, for any variables $X,Y\in\mathbf{V}$, the distribution of $Y$ after $do(x)$ is defined as [Pearl, 2009]

$$P(y|do(x)) \triangleq P(Y=y|do(X=x)) = \sum_{\{u:\,Y_{x}(u)=y\}} P(u), \tag{1}$$

where $Y_{x}(u)$ denotes the value of $Y$ after intervention $do(x)$ under context $\mathbf{U}=u$. Note that $P(u)$ is an unknown joint distribution of all hidden variables. If the causal model satisfies the Markovian assumption, i.e., (1) the associated causal graph of the causal model is acyclic, and (2) all variables in $\mathbf{U}$ are mutually independent, then $P(y|do(x))$ can be computed from the joint distribution of $\mathbf{V}$ according to the truncated factorization formula [Pearl, 2009]

$$P(y|do(x)) = \sum_{\mathbf{V}'} \prod_{V_{i}\neq X} P(v_{i}|pa_{i})\,\delta_{X=x}, \tag{2}$$

where the summation is a marginalization that traverses all value combinations of $\mathbf{V}'=\mathbf{V}\backslash\{X,Y\}$, and $\delta_{X=x}$ means replacing $X$ with $x$ in each term.

### 2.2 Model Discrimination in Prediction

Assume that there exists a fixed population over the space $C\times\mathbf{Z}\times L$. The values of all the attributes in the population are determined by a causal model $M$, which can be written as

$$\text{Causal Model } M:\quad c=f_{C}(pa_{C},u_{C}),\qquad z_{i}=f_{i}(pa_{i},u_{i})\ \ (i=1,\ldots,m),\qquad l=f_{L}(pa_{L},u_{L}),$$

where $f_{L}$ can be considered as the decision-making process in the real system. Without ambiguity, we also use $M$ to denote the population, and use the terms mechanism and population interchangeably. In practice, $M$ is unknown and we can only observe a sample dataset $D=\{(c^{(j)},\mathbf{z}^{(j)},l^{(j)});\ j=1,\ldots,n\}$ drawn from the population.
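To make Definition 1 and the data-generation view concrete, the following is a minimal sketch of a Markovian causal model written as structural equations in Python. The graph (one non-protected attribute $Z_{1}$ with parent $C$, and $L$ with parents $C$ and $Z_{1}$) and all numeric coefficients are hypothetical illustrations, not the model learned from the Adult data in Section 5; the independent uniform noise terms play the role of the hidden variables $\mathbf{U}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Structural equations of a toy Markovian causal model M (illustrative only).
def f_C(u_c):             # protected attribute: c+ = 1, c- = 0; C has no parent
    return int(u_c < 0.5)

def f_Z1(c, u_z):         # one non-protected attribute with parent C
    return int(u_z < 0.3 + 0.4 * c)

def f_L(c, z1, u_l):      # the decision mechanism f_L with parents C and Z1
    return int(u_l < 0.2 + 0.3 * c + 0.3 * z1)

def sample_population(n):
    """Draw a dataset D = {(c, z, l)} from the (here, fully known) model M."""
    data = []
    for _ in range(n):
        u_c, u_z, u_l = rng.random(3)   # mutually independent hidden variables U
        c = f_C(u_c)
        z1 = f_Z1(c, u_z)
        l = f_L(c, z1, u_l)
        data.append((c, z1, l))
    return data

D = sample_population(2000)
```

This is essentially how the semi-synthetic data in Section 5 are produced, except that there the structural equations of $M$ are first learned from the Adult dataset rather than fixed by hand.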
A classifier $h$ is a function mapping from $C\times\mathbf{Z}$ to $L$. A hypothesis space $H$ is a set of candidate classifiers. A learning algorithm analyzes the dataset $D$ as the training data to find a classifier in $H$ that minimizes the difference between the predicted labels $h(c^{(j)},\mathbf{z}^{(j)})$ and the true labels $l^{(j)}$ ($j=1,\ldots,n$). Once training completes, the classifier is deployed to make predictions on new unlabeled data. It is usually assumed that the unlabeled data is drawn from the same population as the training data, i.e., $M$. Therefore, in prediction, the values of all the attributes other than the label are determined by the same mechanisms as in $M$, while the classifier acts as a new mechanism for determining the value of the label. We consider $M$ with the function $f_{L}(\cdot)$ replaced by the classifier $h(\cdot)$ as the causal model of classifier $h$, denoted by $M_{h}$, which is written as

$$\text{Causal Model } M_{h}:\quad c=f_{C}(pa_{C},u_{C}),\qquad z_{i}=f_{i}(pa_{i},u_{i})\ \ (i=1,\ldots,m),\qquad l=h(c,\mathbf{z}).$$

If we apply the classifier to $D$, we obtain a new dataset $D_{h}$ by replacing the original labels with the predicted labels, i.e., $D_{h}=\{(c^{(j)},\mathbf{z}^{(j)},h(c^{(j)},\mathbf{z}^{(j)}));\ j=1,\ldots,n\}$. Straightforwardly, $D_{h}$ can be considered as a sample drawn from $M_{h}$.

The discrimination in prediction made by classifier $h$ can be naturally defined as the discrimination in $M_{h}$. To this end, we first define a measure of discrimination in $M$ based on the causal relationships specified by $M$, denoted by $DE_{M}$ and called the true discrimination. By adopting the same measure, we denote the discrimination in $M_{h}$ by $DE_{M_{h}}$, called the true discrimination in prediction. Then, we denote the approximation of $DE_{M}$ from dataset $D$ by $DE_{D}$, and similarly denote the approximation of $DE_{M_{h}}$ from dataset $D_{h}$ by $DE_{D_{h}}$.

Our target is to discover and remove the true discrimination in prediction, i.e., $DE_{M_{h}}$, based on a certain causal measure of discrimination defined on $M$, i.e., $DE_{M}$. When calculating $DE_{M_{h}}$ from dataset $D$, we may encounter disturbances such as the sampling error of $D$ and the misclassification of $h$, so we need to compute an analytic approximation to $DE_{M_{h}}$. Thus, we define the problem of discovering discrimination in prediction as follows.

**Problem 1 (Discover Discrimination in Prediction).** Given a causal measure of discrimination defined on $M$, i.e., $DE_{M}$, a sample dataset $D$, and a classifier $h$ trained on $D$, compute an analytic approximation to the true discrimination in prediction, i.e., $DE_{M_{h}}$.

If the true discrimination in prediction is detected according to the approximation, the next step is to remove the discrimination by tweaking the dataset and/or the classifier. Thus, we define the problem of removing discrimination in prediction as follows.

**Problem 2 (Remove Discrimination in Prediction).** Given $DE_{M}$, $D$, and $h$, tweak $D$ and/or $h$ so that $DE_{M_{h}}$ is bounded by a user-defined threshold $\tau$.

## 3 Discover Discrimination in Prediction

In the above general problem definitions, $DE_{M}$ can be any reasonable causal measure of discrimination defined on any causal model. However, a concrete analysis of discrimination must rely on specific assumptions regarding the causal measure of discrimination and the causal model. The remainder of the paper is based on the following assumptions: (1) $M$ satisfies the Markovian assumption; (2) we consider all causal effects (the total effect) of $C$ on $L$ as discriminatory; and (3) $C$ has no parent and $L$ has no child. The first assumption is necessary for computing the causal effect from the observable probability distributions. The second assumption is made because the total causal effect is the causal relationship that is easiest to interpret and estimate; we will extend our results to other discrimination definitions, such as those in [Zhang et al., 2017; Bonchi et al., 2017], in future work.
The last assumption is made to keep our theoretical results concise and can easily be relaxed.

### 3.1 Causal Measure of Discrimination

We first derive the true discrimination in $M$. The key to discrimination is a counterfactual question: would the decision for an individual have been different had the individual belonged to a different protected/non-protected group (e.g., sex, race, age, religion, etc.)? To answer this question, we can perform an intervention on each individual that changes the value of the protected attribute $C$ and see how the label $L$ changes. We consider the difference between the expectation of the labels when performing $do(c^{+})$ for all individuals and the expectation of the labels when performing $do(c^{-})$ for all individuals, and use it as the causal measure of discrimination. To obtain this difference, note that the causal model is completely specified at the individual level once the context $\mathbf{U}=u$ is specified. For each individual specified by $u$, denote the label of individual $u$ by $L_{c^{+}}(u)$ (resp. $L_{c^{-}}(u)$) when $C$ is fixed according to $do(c^{+})$ (resp. $do(c^{-})$). Then, the difference in the label of individual $u$ is given by $L_{c^{+}}(u)-L_{c^{-}}(u)$, and the expected difference of the labels over all individuals is $E[L_{c^{+}}(u)-L_{c^{-}}(u)]$. Based on this analysis, we obtain the following proposition.

**Proposition 1.** Given a causal model $M$, the true discrimination is given by $DE_{M}=P(l^{+}|c^{+})-P(l^{+}|c^{-})$.

*Proof.* The above expectations can be computed as

$$E[L_{c^{+}}(u)] = \sum_{u} L_{c^{+}}(u)P(u) = \sum_{\{u:\,L_{c^{+}}(u)=l^{+}\}} l^{+}P(u) + \sum_{\{u:\,L_{c^{+}}(u)=l^{-}\}} l^{-}P(u) = \sum_{\{u:\,L_{c^{+}}(u)=l^{+}\}} P(u) = P(l^{+}|do(c^{+})), \tag{3}$$

where the last step is according to Eq. (1). According to Eq. (2), we have

$$P(l^{+}|do(c^{+})) = \sum_{\mathbf{z}} P(l^{+}|pa_{L})\delta_{C=c^{+}} \prod_{Z_{i}\in\mathbf{Z}} P(z_{i}|pa_{i})\delta_{C=c^{+}}.$$

On the other hand, since $C$ has no parent, we have

$$P(l^{+}|c^{+}) = \frac{P(l^{+},c^{+})}{P(c^{+})} = \frac{\sum_{\mathbf{z}} P(c^{+})P(l^{+}|pa_{L})\prod_{Z_{i}\in\mathbf{Z}} P(z_{i}|pa_{i})}{P(c^{+})}.$$

Thus, we have $P(l^{+}|do(c^{+}))=P(l^{+}|c^{+})$, leading to $E[L_{c^{+}}(u)]=P(l^{+}|c^{+})$. Similarly we can prove $E[L_{c^{-}}(u)]=P(l^{+}|c^{-})$. Hence, the proposition is proven.

Interestingly, the obtained causal measure of discrimination is the same as the classic statistical discrimination metric risk difference, which is widely used as the non-discrimination constraint in discrimination-aware learning [Romei and Ruggieri, 2014]. Our analysis helps clarify the assumptions and scenarios under which the risk difference applies.

Given dataset $D$, we approximate $DE_{M}$ by the maximum likelihood estimate, denoted by $DE_{D}$, as shown below.

**Proposition 2.** Given a dataset $D$, the discrimination in $D$ is given by $DE_{D}=\hat{P}(l^{+}|c^{+})-\hat{P}(l^{+}|c^{-})$, where

$$\hat{P}(l^{+}|c^{+}) = \sum_{\mathbf{z}} \hat{P}(l^{+}|c^{+},\mathbf{z})\,\hat{P}(\mathbf{z}|c^{+}), \tag{4}$$

with $\hat{P}(\cdot)$ being the conditional frequency in $D$.

Given a classifier $h: C\times\mathbf{Z}\rightarrow L$, denote the predicted label by $\hat{L}$. By adopting the same causal measure of discrimination as $DE_{M}$, we obtain $DE_{M_{h}}$ as follows.

**Proposition 3.** Given a causal model $M$ and a classifier $h$, the true discrimination in prediction is given by $DE_{M_{h}}=P(\hat{l}^{+}|c^{+})-P(\hat{l}^{+}|c^{-})$, where $P(\hat{l}^{+}|c^{+})$ (resp. $P(\hat{l}^{+}|c^{-})$) is the probability of a positive predicted label for the data with $C=c^{+}$ (resp. $C=c^{-}$).

We similarly define $DE_{D_{h}}$ as the maximum likelihood estimate of $DE_{M_{h}}$.

**Proposition 4.** Given a dataset $D$ and a classifier $h$ trained on $D$, the discrimination in $D_{h}$ is given by $DE_{D_{h}}=\hat{P}(\hat{l}^{+}|c^{+})-\hat{P}(\hat{l}^{+}|c^{-})$, where

$$\hat{P}(\hat{l}^{+}|c^{+}) = \sum_{\mathbf{z}} I[h(c^{+},\mathbf{z})=l^{+}]\,\hat{P}(\mathbf{z}|c^{+}), \tag{5}$$

with $I[\cdot]$ the indicator function.
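To ground Propositions 2 and 4, here is a minimal sketch (assuming the same $(c,z,l)$ tuple format as the hypothetical sampler above) that computes the empirical quantities $DE_{D}$ and $DE_{D_{h}}$ directly from group-wise frequencies. Summing over $\mathbf{z}$ as in Eqs. (4) and (5) gives the same numbers, since $\sum_{\mathbf{z}}\hat{P}(l^{+}|c^{+},\mathbf{z})\hat{P}(\mathbf{z}|c^{+})=\hat{P}(l^{+}|c^{+})$ for empirical frequencies.

```python
from collections import Counter

def de_d(data):
    """Empirical discrimination DE_D = P^(l+|c+) - P^(l+|c-) (Proposition 2).
    `data` is a list of (c, z, l) tuples with c = 1 for c+ and c = 0 for c-."""
    pos, tot = Counter(), Counter()
    for c, z, l in data:
        tot[c] += 1
        pos[c] += (l == 1)
    return pos[1] / tot[1] - pos[0] / tot[0]

def de_dh(data, h):
    """Empirical discrimination in prediction DE_Dh (Proposition 4): replace
    every observed label with the label predicted by classifier h, then
    measure the same difference on the relabeled data."""
    return de_d([(c, z, h(c, z)) for c, z, _ in data])

def h_biased(c, z):
    """A deliberately discriminatory toy classifier used for illustration."""
    return int(c == 1 or z == 1)

# print(de_d(D), de_dh(D, h_biased))   # assumes a dataset D sampled as above
```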
### 3.2 Bounding Discrimination in Prediction

To approximate $DE_{M_{h}}$ from $D$, sampling error cannot be avoided since $D$ is only a subset of the entire population. Although exact measurement of the sampling error is generally not feasible because $M$ is unknown, it can be probabilistically bounded. In the following, we first bound the difference between $DE_{M}$ and its approximation $DE_{D}$, and then extend the result to the difference between $DE_{M_{h}}$ and its approximation $DE_{D_{h}}$.

**Proposition 5.** Given a causal model $M$ and a sample dataset $D$ of size $n$, the probability that the difference between $DE_{M}$ and $DE_{D}$ is no larger than $t$ is bounded by

$$P\big(|DE_{M}-DE_{D}|\le t\big) > 1 - 4e^{-\frac{2n^{+}n^{-}t^{2}}{(\sqrt{n^{+}}+\sqrt{n^{-}})^{2}}},$$

where $n^{+}$ and $n^{-}$ ($n^{+}+n^{-}=n$) are the numbers of individuals with $c^{+}$ and $c^{-}$ in $D$.

*Proof.* By the definitions of $DE_{M}$ and $DE_{D}$ we have

$$DE_{M}-DE_{D} = \big(P(l^{+}|c^{+})-\hat{P}(l^{+}|c^{+})\big) + \big(\hat{P}(l^{+}|c^{-})-P(l^{+}|c^{-})\big).$$

Denoting by $l_{+}^{(j)}$ the label of the $j$th individual in $D$ with $C=c^{+}$, we can write $\hat{P}(l^{+}|c^{+})$ as

$$\hat{P}(l^{+}|c^{+}) = \frac{1}{n^{+}}\Big(I[l_{+}^{(1)}=l^{+}] + \cdots + I[l_{+}^{(n^{+})}=l^{+}]\Big),$$

where the indicators $I[l_{+}^{(j)}=l^{+}]$ ($j=1,\ldots,n^{+}$) can be considered as independent random variables bounded by the interval $[0,1]$. Note that $E[\hat{P}(l^{+}|c^{+})]=P(l^{+}|c^{+})$. According to Hoeffding's inequality [Murphy, 2012], we have

$$P\big(\big|P(l^{+}|c^{+})-\hat{P}(l^{+}|c^{+})\big|\ge t\big) \le 2e^{-2n^{+}t^{2}},$$

and similarly $P\big(\big|P(l^{+}|c^{-})-\hat{P}(l^{+}|c^{-})\big|\ge t\big) \le 2e^{-2n^{-}t^{2}}$. Therefore, we have

$$\begin{aligned} P\big(|DE_{M}-DE_{D}|\le t\big) &\ge P\Big(\big|P(l^{+}|c^{+})-\hat{P}(l^{+}|c^{+})\big| + \big|P(l^{+}|c^{-})-\hat{P}(l^{+}|c^{-})\big| \le t\Big) \\ &\ge \max_{0\le x\le t} P\Big(\big|P(l^{+}|c^{+})-\hat{P}(l^{+}|c^{+})\big|\le x\Big)\,P\Big(\big|P(l^{+}|c^{-})-\hat{P}(l^{+}|c^{-})\big|\le t-x\Big) \\ &\ge \max_{0\le x\le t}\big(1-2e^{-2n^{+}x^{2}}\big)\big(1-2e^{-2n^{-}(t-x)^{2}}\big), \end{aligned} \tag{6}$$

where the last line yields the stated bound by substituting $x$ with $\frac{\sqrt{n^{-}}}{\sqrt{n^{+}}+\sqrt{n^{-}}}t$, which makes the two exponents equal.

For extending Proposition 5 to Proposition 6, the difference is that, since $h$ is a classifier that depends on the training data $D$, the indicators $I[h(c_{+}^{(j)},\mathbf{z}_{+}^{(j)})=l^{+}]$ cannot be considered independent. Thus, Hoeffding's inequality cannot be directly applied and a uniform bound over all hypotheses in $H$ is needed.

**Proposition 6.** Given a causal model $M$, a sample dataset $D$, and a classifier $h: C\times\mathbf{Z}\rightarrow L$ from a hypothesis space $H$, the probability that the distance between $DE_{M_{h}}$ and $DE_{D_{h}}$ is no larger than $t$ is bounded by

$$P\big(|DE_{M_{h}}-DE_{D_{h}}|\le t\big) \ge 1-\delta(t), \qquad \delta(t)=\begin{cases} 4|H|\,e^{-\frac{2n^{+}n^{-}t^{2}}{(\sqrt{n^{+}}+\sqrt{n^{-}})^{2}}} & \text{if } H \text{ is finite},\\[4pt] 4\Big[\big(\tfrac{2en^{+}}{d}\big)^{d}+\big(\tfrac{2en^{-}}{d}\big)^{d}\Big]\,e^{-\frac{2n^{+}n^{-}t^{2}}{(\sqrt{n^{+}}+\sqrt{n^{-}})^{2}}} & \text{if } H \text{ is infinite}, \end{cases}$$

with $d$ being the VC dimension of $H$.

*Proof.* According to the definitions of $DE_{M_{h}}$ and $DE_{D_{h}}$,

$$DE_{M_{h}}-DE_{D_{h}} = \big(P(\hat{l}^{+}|c^{+})-\hat{P}(\hat{l}^{+}|c^{+})\big) + \big(\hat{P}(\hat{l}^{+}|c^{-})-P(\hat{l}^{+}|c^{-})\big).$$

Similar to the proof of Proposition 5, we treat $\hat{P}(\hat{l}^{+}|c^{+})$ as the mean of the indicators $I[h(c_{+}^{(j)},\mathbf{z}_{+}^{(j)})=l^{+}]$ ($j=1,\ldots,n^{+}$). According to the generalization bound in statistical learning theory [Vapnik, 1998], if $H$ is finite we have

$$P\big(\big|P(\hat{l}^{+}|c^{+})-\hat{P}(\hat{l}^{+}|c^{+})\big|\ge t\big) \le 2|H|\,e^{-2n^{+}t^{2}},$$

where $|H|$ is the size of $H$. If $H$ is infinite we have

$$P\big(\big|P(\hat{l}^{+}|c^{+})-\hat{P}(\hat{l}^{+}|c^{+})\big|\ge t\big) \le 4\Big(\frac{2en^{+}}{d}\Big)^{d} e^{-2n^{+}t^{2}},$$

where $d$ is the VC dimension of $H$. The proposition then follows in the same way as Eq. (6).

Proposition 6 provides an approximation of $DE_{M_{h}}$ from $DE_{D_{h}}$. However, since pre-process methods deal with the training data, it is imperative to further link $DE_{M_{h}}$ with $DE_{D}$. Next, we give the relation between $DE_{D_{h}}$ and $DE_{D}$ in terms of a bias metric that we refer to as the error bias.

**Definition 2 (Error Bias).** For any classifier $h$ trained on a training dataset $D$, the error bias is given by

$$\varepsilon_{h,D} = \varepsilon_{1}^{+}-\varepsilon_{2}^{+} - (\varepsilon_{1}^{-}-\varepsilon_{2}^{-}),$$

where $\varepsilon_{1}^{+},\varepsilon_{1}^{-}$ are the percentages of false positives on the data with $C=c^{+}$ and $C=c^{-}$ respectively, and $\varepsilon_{2}^{+},\varepsilon_{2}^{-}$ are the percentages of false negatives on the data with $C=c^{+}$ and $C=c^{-}$ respectively, each taken as a fraction of the corresponding group.
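As a quick illustration of Definition 2, the sketch below (same hypothetical data format as before) computes the empirical error bias from the group-wise false-positive and false-negative percentages; Proposition 7 below shows that this quantity is exactly the gap $DE_{D_{h}}-DE_{D}$.

```python
def error_bias(data, h):
    """Error bias eps_{h,D} = (eps1+ - eps2+) - (eps1- - eps2-) (Definition 2).
    eps1 is the fraction of false positives and eps2 the fraction of false
    negatives, each taken over the whole group C = c+ or C = c-."""
    fp = {0: 0, 1: 0}   # false positives per group
    fn = {0: 0, 1: 0}   # false negatives per group
    n = {0: 0, 1: 0}    # group sizes
    for c, z, l in data:
        n[c] += 1
        pred = h(c, z)
        fp[c] += int(pred == 1 and l == 0)
        fn[c] += int(pred == 0 and l == 1)
    return (fp[1] - fn[1]) / n[1] - (fp[0] - fn[0]) / n[0]

# Sanity check of Proposition 7 on the toy data:
# abs((de_dh(D, h_biased) - de_d(D)) - error_bias(D, h_biased)) < 1e-12
```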
**Proposition 7.** For any classifier $h$ trained on $D$, we have $DE_{D_{h}}-DE_{D}=\varepsilon_{h,D}$.

*Proof.* By definition, $\varepsilon_{1}^{+}$ is given by

$$\varepsilon_{1}^{+} = \frac{1}{n^{+}}\sum_{\{j:\,c^{(j)}=c^{+},\,l^{(j)}=l^{-}\}} I[h(c^{(j)},\mathbf{z}^{(j)})=l^{+}],$$

which can be rewritten as

$$\varepsilon_{1}^{+} = \sum_{\mathbf{z}} \hat{P}(\mathbf{z}|c^{+})\,I[h(c^{+},\mathbf{z})=l^{+}]\,\big(1-\hat{P}(l^{+}|c^{+},\mathbf{z})\big).$$

Similarly, $\varepsilon_{2}^{+}$ is given by

$$\varepsilon_{2}^{+} = \sum_{\mathbf{z}} \hat{P}(\mathbf{z}|c^{+})\,I[h(c^{+},\mathbf{z})=l^{-}]\,\hat{P}(l^{+}|c^{+},\mathbf{z}).$$

Subtracting $\varepsilon_{2}^{+}$ from $\varepsilon_{1}^{+}$, we obtain

$$\varepsilon_{1}^{+}-\varepsilon_{2}^{+} = \sum_{\mathbf{z}} \hat{P}(\mathbf{z}|c^{+})\Big(I[h(c^{+},\mathbf{z})=l^{+}]\big(1-\hat{P}(l^{+}|c^{+},\mathbf{z})\big) - I[h(c^{+},\mathbf{z})=l^{-}]\,\hat{P}(l^{+}|c^{+},\mathbf{z})\Big),$$

which is equivalent to

$$\varepsilon_{1}^{+}-\varepsilon_{2}^{+} = \sum_{\mathbf{z}} \hat{P}(\mathbf{z}|c^{+})\Big(I[h(c^{+},\mathbf{z})=l^{+}] - \hat{P}(l^{+}|c^{+},\mathbf{z})\Big).$$

Similarly, for the data with $C=c^{-}$ we have

$$\varepsilon_{1}^{-}-\varepsilon_{2}^{-} = \sum_{\mathbf{z}} \hat{P}(\mathbf{z}|c^{-})\Big(I[h(c^{-},\mathbf{z})=l^{+}] - \hat{P}(l^{+}|c^{-},\mathbf{z})\Big).$$

On the other hand, according to Eqs. (4) and (5) we have

$$DE_{D_{h}}-DE_{D} = \sum_{\mathbf{z}} \hat{P}(\mathbf{z}|c^{+})\Big(I[h(c^{+},\mathbf{z})=l^{+}] - \hat{P}(l^{+}|c^{+},\mathbf{z})\Big) - \sum_{\mathbf{z}} \hat{P}(\mathbf{z}|c^{-})\Big(I[h(c^{-},\mathbf{z})=l^{+}] - \hat{P}(l^{+}|c^{-},\mathbf{z})\Big) = \varepsilon_{1}^{+}-\varepsilon_{2}^{+} - (\varepsilon_{1}^{-}-\varepsilon_{2}^{-}).$$

Letting $\varepsilon_{h,D}=\varepsilon_{1}^{+}-\varepsilon_{2}^{+}-(\varepsilon_{1}^{-}-\varepsilon_{2}^{-})$ completes the proof.

Using Proposition 7, we rewrite Proposition 6 as Theorem 1, which is easier to interpret and utilize in practice.

**Theorem 1.** Given a causal model $M$, a sample dataset $D$, and a classifier $h$ trained on $D$, $DE_{M_{h}}$ is bounded by

$$P\big(|DE_{M_{h}}| \le |DE_{D}+\varepsilon_{h,D}| + t\big) \ge 1-\delta(t),$$

where the meaning of $\delta(t)$ is the same as in Proposition 6.

Theorem 1 gives a criterion of non-discrimination in prediction that incorporates both the discrimination in the training data and the error bias of the classifier, namely that $|DE_{D}+\varepsilon_{h,D}|$ be bounded by a threshold $\tau$. It shows that neither a discrimination-free dataset $D$ alone, i.e., $|DE_{D}|\le\tau$, nor a balanced classifier alone, i.e., $|\varepsilon_{h,D}|\le\tau$, can guarantee non-discriminatory prediction. Instead, it is the sum of $DE_{D}$ and $\varepsilon_{h,D}$ that must stay within the threshold.

## 4 Remove Discrimination in Prediction

This section solves the problem of removing discrimination in prediction: if the criterion $|DE_{D}+\varepsilon_{h,D}|\le\tau$ is not satisfied for a classifier, how can we meet it through modifying the training data and tweaking the classifier? Denote by $D'$ a dataset obtained by modifying $D$, and by $h'$ a new classifier trained on $D'$. Note that when the training data is modified, the error bias of the classifier built on it also changes. Thus, it is difficult to perform the training data modification and the classifier tweaking simultaneously. We propose a framework for modifying the training data and the classifier in two successive phases, summarized in Algorithm 1.

**Algorithm 1: Two-phase framework.**

1. If $|DE_{D}+\varepsilon_{h,D}|\le\tau$, we are done. Otherwise, modify the labels in the training dataset $D$ to obtain a modified dataset $D'$ such that $|DE_{D'}|\le\tau$.
2. Train a classifier $h'$ on $D'$. If $|DE_{D'}+\varepsilon_{h',D'}|\le\tau$, we are done. Otherwise, tweak classifier $h'$ to meet the above requirement.

In the first phase, we modify $D$ to remove the discrimination it contains. The modified dataset $D'$ can be considered as being generated by a causal model $M'$ that differs from $M$ according to the modification. Note that if $|DE_{D'}+\varepsilon_{h',D'}|\le\tau$ is achieved, Theorem 1 ensures the bound on discrimination for $M'_{h'}$, i.e., the discrimination of $h'$ performed on $M'$, but not for $M_{h'}$, i.e., the discrimination of $h'$ performed on the original population. If we only modify the labels of $D$, $M'$ can be written as

$$\text{Causal Model } M':\quad c=f_{C}(pa_{C},u_{C}),\qquad z_{i}=f_{i}(pa_{i},u_{i})\ \ (i=1,\ldots,m),\qquad l=f'_{L}(pa_{L},u_{L}).$$

Then, the causal model of any classifier $h'$ trained on $D'$ and performed on $M'$ is given by

$$\text{Causal Model } M'_{h'}:\quad c=f_{C}(pa_{C},u_{C}),\qquad z_{i}=f_{i}(pa_{i},u_{i})\ \ (i=1,\ldots,m),\qquad l=h'(c,\mathbf{z}),$$

which is equivalent to $M_{h'}$. Thus, non-discrimination in $M'_{h'}$ also means non-discrimination in $M_{h'}$. On the other hand, if we modify attributes other than $L$, then since the new unlabeled data is drawn from the original population, $M_{h'}$ is inconsistent with $M'_{h'}$. As a result, the non-discrimination result obtained on the training data cannot be applied to the prediction on new data.
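Putting the pieces together, the following is a schematic rendering of Algorithm 1 (a sketch, not the authors' implementation), reusing the hypothetical `de_d` and `error_bias` helpers above. Here `learn`, `massage_labels`, and `tweak` are placeholders for any learning algorithm, any label-only repair (e.g., Massaging), and any classifier tweak (e.g., the Random Flip procedure described below).

```python
def two_phase(train, learn, massage_labels, tweak, tau):
    """Schematic two-phase framework (Algorithm 1).
    massage_labels must change only the labels and return D' with |DE_D'| <= tau;
    tweak must adjust the classifier until |DE_D' + eps_{h',D'}| <= tau."""
    h = learn(train)
    if abs(de_d(train) + error_bias(train, h)) <= tau:
        return h                                  # criterion already met
    # Phase 1: label-only repair of the training data.
    repaired = massage_labels(train, tau)
    h_prime = learn(repaired)
    if abs(de_d(repaired) + error_bias(repaired, h_prime)) <= tau:
        return h_prime
    # Phase 2: reduce the error bias of the classifier itself.
    return tweak(h_prime, repaired, tau)
```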
From this argument and Theorem 1, we obtain the following corollary.

**Corollary 1.** Let $D'$ be a dataset obtained by modifying $D$, and let $h'$ be a new classifier trained on $D'$. If $D'$ only modifies the labels, then $|DE_{D'}+\varepsilon_{h',D'}|\le\tau$ is a sufficient condition to achieve

$$P\big(|DE_{M_{h'}}| \le \tau + t\big) \ge 1-\delta(t),$$

where the meaning of $\delta(t)$ is the same as in Proposition 6.

In the second phase, we modify $h'$ to reduce the error bias. Although dealing with a different fairness criterion, existing methods for balancing the misclassification rates (e.g., [Hardt et al., 2016]) can easily be extended to solve this problem. For the purpose of evaluating the correctness of our theoretical results, here we use a simple algorithm, Random Flip, for reducing the error bias, which can be applied to any classifier. After the classifier makes a prediction, Random Flip randomly flips the predicted label with a certain probability $p^{+}$ (resp. $p^{-}$) if the individual has $C=c^{+}$ (resp. $C=c^{-}$) so as to achieve $|DE_{D'}+\varepsilon_{h',D'}|\le\tau$, where $p^{+}$ can be computed according to the predictions of $h'$ over $D'$. Denoting $\sigma=\tau-|DE_{D'}|$, it suffices to make $|\varepsilon_{1}^{+}-\varepsilon_{2}^{+}|\le\sigma/2$ and $|\varepsilon_{1}^{-}-\varepsilon_{2}^{-}|\le\sigma/2$. Assume that $\varepsilon_{1}^{+}-\varepsilon_{2}^{+}>\sigma/2$; then it can easily be shown that $p^{+}$ should satisfy

$$\frac{(\varepsilon_{1}^{+}-\varepsilon_{2}^{+}-\sigma/2)\,n^{+}}{fp+tp} \;\le\; p^{+} \;\le\; \frac{(\varepsilon_{1}^{+}-\varepsilon_{2}^{+})\,n^{+}}{fp+tp},$$

where $fp$ and $tp$ are the numbers of false positives and true positives among the individuals with $C=c^{+}$. A similar result can be obtained if $\varepsilon_{1}^{+}-\varepsilon_{2}^{+}<-\sigma/2$.
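One way the Random Flip tweak might be realized is sketched below for the $c^{+}$ group (the $c^{-}$ group is handled symmetrically). Following the derivation of the bound on $p^{+}$, it assumes that positive predictions are flipped to negative when $\varepsilon_{1}^{+}-\varepsilon_{2}^{+}>\sigma/2$; the helper `de_d` and the data format are the hypothetical ones from the earlier sketches.

```python
import numpy as np

flip_rng = np.random.default_rng(1)

def random_flip(h, data, tau):
    """Wrap classifier h so that, for c+ individuals, a positive prediction is
    flipped to negative with probability p+ chosen from the admissible range."""
    sigma = tau - abs(de_d(data))
    plus = [(c, z, l) for c, z, l in data if c == 1]
    n_plus = len(plus)
    fp = sum(h(c, z) == 1 and l == 0 for c, z, l in plus)   # false positives
    fn = sum(h(c, z) == 0 and l == 1 for c, z, l in plus)   # false negatives
    tp = sum(h(c, z) == 1 and l == 1 for c, z, l in plus)   # true positives
    gap = (fp - fn) / n_plus                                # eps1+ - eps2+
    p_plus = 0.0
    if gap > sigma / 2 and fp + tp > 0:
        # any p+ in [(gap - sigma/2) n+/(fp+tp), gap n+/(fp+tp)] meets the bound
        p_plus = (gap - sigma / 2) * n_plus / (fp + tp)

    def tweaked(c, z):
        pred = h(c, z)
        if c == 1 and pred == 1 and flip_rng.random() < p_plus:
            return 0            # flip the positive prediction to negative
        return pred
    return tweaked
```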
## 5 Empirical Evaluation

### 5.1 Experimental Setup

In this section, we conduct experiments to evaluate our theoretical results. To simulate a population, we adopt a semi-synthetic data generation method. We first learn a causal model $M$ from a real dataset, the Adult dataset [Lichman, 2013], and treat it as the ground truth. We then generate the training data $D$ based on $M$. The causal model is built using the open-source software Tetrad [Glymour and others, 2004]. The Adult dataset consists of 11 attributes including age, education, sex, occupation, income, etc. Due to the sparse data issue, we binarize each attribute's domain values into two classes to reduce the domain sizes. We treat sex as $C$ and income as $L$. The discrimination in $M$ is measured as 0.13, i.e., $DE_{M}=0.13$. Based on the underlying distribution of $M$, we generate a number of training datasets with different sample sizes. When constructing discrimination-free classifiers using the two-phase framework, we select one representative data-modifying algorithm that only modifies $L$, the Massaging (MSG) algorithm [Kamiran and Calders, 2009a]. For other algorithms, we will evaluate their performance in preserving data utility in future work. For comparison, we also include an algorithm that modifies $\mathbf{Z}$, the Disparate Impact Removal (DI) algorithm [Adler et al., 2016]. The proposed Random Flip algorithm is used for tweaking the classifier. We assume a discrimination threshold $\tau=0.05$, i.e., we want to ensure that the discrimination in prediction is not larger than 0.05.

### 5.2 Experiment Results

We first measure the discrimination in each training dataset, i.e., $DE_{D}$. Then, we learn the classifier $h$ from the training data, where two classifiers, a decision tree (DT) and a support vector machine (SVM), are used. By replacing the labels in the training data with the labels predicted by the classifier, we obtain $D_{h}$, whose discrimination is measured as $DE_{D_{h}}$. Finally, we measure the discrimination in prediction, i.e., $DE_{M_{h}}$, according to Proposition 3. For each sample size, we repeat the experiments 100 times by randomly generating 100 sets of training data, and we report the average and variance of each measured discrimination. The results are shown in Table 2.

| Size | $DE_{D}$ | $DE_{D_{h}}$ (DT) | $DE_{D_{h}}$ (SVM) | $DE_{M_{h}}$ (DT) | $DE_{M_{h}}$ (SVM) |
| --- | --- | --- | --- | --- | --- |
| 500 | **0.131** (1.6E-3) | **0.145** (4.1E-3) | **0.132** (8.2E-3) | **0.139** (3.5E-3) | **0.125** (6.8E-3) |
| 2000 | **0.131** (4.8E-4) | **0.129** (1.1E-3) | **0.121** (7.4E-3) | **0.130** (9.4E-4) | **0.120** (7.1E-3) |
| 10000 | **0.129** (8.0E-5) | **0.138** (4.0E-4) | **0.150** (4.3E-3) | **0.138** (3.8E-4) | **0.150** (4.3E-3) |

Table 2: Measured discrimination before discrimination removal; each cell shows the average with the variance in parentheses, and values larger than the threshold are highlighted in bold.

As expected, as the sample size increases, the difference between $DE_{M}$ and $DE_{D}$ decreases, as shown by the variance. Since $DE_{D}>0.05$, the training data contains discrimination. As a result, both the training data with predicted labels, i.e., $D_{h}$, and the prediction, i.e., $M_{h}$, also contain discrimination.

To show the effectiveness of the two-phase framework, we first apply MSG to completely remove the discrimination in the above training data, obtaining the modified training data $D'$. Then, a decision tree $h'$ is built on $D'$, and the Random Flip algorithm is executed to tweak the classifier so that the error bias is less than 0.05, i.e., $|\varepsilon_{h',D'}|\le 0.05$. Finally, we measure the discrimination in $M_{h'}$. For comparison, the same process is also performed for DI. The results are shown in Table 3.

| Size | MSG: $DE_{D'}$ | MSG: $DE_{M_{h'}}$ | MSG: $DE_{M_{h'}}$ (w/o classifier tweaking) | DI: $DE_{D'}$ | DI: $DE_{M_{h'}}$ |
| --- | --- | --- | --- | --- | --- |
| 500 | 0.004 (3.7E-6) | 0.015 (1.0E-3) | 0.068 (4.6E-3) | 2E-4 (1.4E-3) | 0.092 (6.1E-3) |
| 2000 | 0.001 (1.7E-7) | 0.016 (5.3E-4) | 0.067 (4.3E-3) | 0.001 (3.4E-4) | 0.095 (1.6E-3) |
| 10000 | 2E-4 (9.7E-9) | 0.013 (3.3E-4) | 0.061 (3.3E-3) | 0.001 (6.8E-5) | 0.107 (5.4E-4) |

Table 3: Measured discrimination after discrimination removal, using the two-phase framework with MSG and, for comparison, DI (decision tree as the classifier); same format as Table 2.

By using the two-phase framework, discrimination is removed from the training data, as shown by $DE_{D'}$, and, more importantly, removed from the prediction, as shown by $DE_{M_{h'}}$. We also see that, if the classifier tweaking is not performed, the prediction still contains discrimination. However, for DI, even when the discrimination is removed from the training data and the error bias in the classifier is also removed, there still exists discrimination in prediction. These results are consistent with our theoretical conclusions.

## 6 Conclusions

In this paper, we addressed the limitation of the pre-process methods that there is no guarantee about the discrimination in prediction. Our theoretical results show that: (1) only removing discrimination from the training data cannot ensure non-discrimination in prediction for an arbitrary classifier; and (2) when removing discrimination from the training data, one should only modify the labels in order to obtain a non-discrimination guarantee. Based on these results, we developed a two-phase framework for constructing a discrimination-free classifier with a theoretical guarantee. The experiments, which adopt a semi-synthetic data generation method, demonstrate the theoretical results and show the effectiveness of our two-phase framework.

## Acknowledgments

This work was supported in part by NSF 1646654.

## References

[Adler et al., 2016] Philip Adler, Casey Falk, Sorelle A. Friedler, Gabriel Rybeck, Carlos Scheidegger, Brandon Smith, and Suresh Venkatasubramanian. Auditing black-box models for indirect influence. In Proceedings of ICDM 2016, 2016.
[Bonchi et al., 2017] Francesco Bonchi, Sara Hajian, Bud Mishra, and Daniele Ramazzotti. Exposing the probabilistic causal structure of discrimination. International Journal of Data Science and Analytics, 3(1):1–21, 2017.

[Calders and Verwer, 2010] Toon Calders and Sicco Verwer. Three naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery, 21(2):277–292, 2010.

[Calders et al., 2009] Toon Calders, Faisal Kamiran, and Mykola Pechenizkiy. Building classifiers with independency constraints. In IEEE International Conference on Data Mining Workshops (ICDMW '09), pages 13–18. IEEE, 2009.

[Calmon et al., 2017] Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R. Varshney. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems, pages 3995–4004, 2017.

[Feldman et al., 2015] Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In KDD, pages 259–268. ACM, 2015.

[Glymour and others, 2004] Clark Glymour et al. The TETRAD project. http://www.phil.cmu.edu/tetrad, 2004.

[Hardt et al., 2016] Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.

[Kamiran and Calders, 2009a] Faisal Kamiran and Toon Calders. Classifying without discriminating. In 2nd International Conference on Computer, Control and Communication (IC4 2009), pages 1–6. IEEE, 2009.

[Kamiran and Calders, 2009b] Faisal Kamiran and Toon Calders. Discrimination-aware classification. In 21st Benelux Conference on Artificial Intelligence (BNAIC), pages 333–334, 2009.

[Kamiran and Calders, 2012] Faisal Kamiran and Toon Calders. Data preprocessing techniques for classification without discrimination. KAIS, 33(1):1–33, 2012.

[Kamiran et al., 2010] Faisal Kamiran, Toon Calders, and Mykola Pechenizkiy. Discrimination aware decision tree learning. In ICDM, pages 869–874. IEEE, 2010.

[Kamishima et al., 2011] Toshihiro Kamishima, Shotaro Akaho, and Jun Sakuma. Fairness-aware learning through regularization approach. In ICDMW, pages 643–650. IEEE, 2011.

[Kamishima et al., 2012] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. Fairness-aware classifier with prejudice remover regularizer. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 35–50. Springer, 2012.

[Lichman, 2013] M. Lichman. UCI machine learning repository. http://archive.ics.uci.edu/ml, 2013.

[Murphy, 2012] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.

[Nabi and Shpitser, 2017] Razieh Nabi and Ilya Shpitser. Fair inference on outcomes. arXiv preprint arXiv:1705.10378, 2017.

[Pearl, 2009] Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, 2009.

[Romei and Ruggieri, 2014] Andrea Romei and Salvatore Ruggieri. A multidisciplinary survey on discrimination analysis. The Knowledge Engineering Review, 29(5):582–638, 2014.

[Vapnik, 1998] Vladimir Vapnik. Statistical Learning Theory, volume 1. Wiley, New York, 1998.

[Žliobaitė et al., 2011] Indrė Žliobaitė, Faisal Kamiran, and Toon Calders. Handling conditional discrimination. In ICDM, pages 992–1001. IEEE, 2011.

[Zafar et al., 2017] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P. Gummadi. Fairness constraints: Mechanisms for fair classification. In Artificial Intelligence and Statistics, pages 962–970, 2017.
[Zhang et al., 2017] Lu Zhang, Yongkai Wu, and Xintao Wu. A causal framework for discovering and removing direct and indirect discrimination. In IJCAI, 2017.