# Algorithmic Fairness Verification with Graphical Models*

Bishwamittra Ghosh¹, Debabrota Basu², Kuldeep S. Meel¹

¹ School of Computing, National University of Singapore, Singapore
² Équipe Scool, Univ. Lille, Inria, UMR 9189 - CRIStAL, CNRS, Centrale Lille, France

*Source code: https://github.com/meelgroup/justicia
Copyright 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

In recent years, machine learning (ML) algorithms have been deployed in safety-critical and high-stakes decision-making, where the fairness of algorithms is of paramount importance. Fairness in ML centers on detecting bias towards certain demographic populations induced by an ML classifier and proposes algorithmic solutions to mitigate the bias with respect to different fairness definitions. To this end, several fairness verifiers have been proposed that compute the bias in the prediction of an ML classifier, essentially beyond a finite dataset, given the probability distribution of input features. In the context of verifying linear classifiers, existing fairness verifiers are limited in accuracy due to imprecise modeling of correlations among features, and in scalability due to restrictive formulations of the classifiers as SSAT/SMT formulas or due to sampling. In this paper, we propose an efficient fairness verifier, called FVGM, that encodes the correlations among features as a Bayesian network. In contrast to existing verifiers, FVGM proposes a stochastic subset-sum based approach for verifying linear classifiers. Experimentally, we show that FVGM leads to an accurate and scalable assessment for more diverse families of fairness-enhancing algorithms, fairness attacks, and group/causal fairness metrics than the state-of-the-art fairness verifiers. We also demonstrate that FVGM facilitates the computation of fairness influence functions as a stepping stone to detecting the source of bias induced by subsets of features.

## 1 Introduction

The significant improvement of machine learning (ML) over the decades has led to a host of applications of ML in high-stakes decision-making such as college admission (Martinez Neda, Zeng, and Gago-Masague 2021), hiring of employees (Ajunwa et al. 2016), and recidivism prediction (Tollenaar and Van der Heijden 2013; Dressel and Farid 2018). ML algorithms often have an accuracy-centric learning objective, which may cause them to be biased towards parts of the dataset belonging to certain economically or socially sensitive groups (Landy, Barnes, and Murphy 1978; Zliobaite 2015; Berk 2019). The following example illustrates a plausible case of unfairness induced by ML algorithms.

**Example 1.1.** Following (Ghosh, Basu, and Meel 2021, Example 1), let us consider an ML problem where the classifier decides the eligibility of an individual for health insurance given their income and fitness (Figure 1). Here, the sensitive feature age ($A$) follows a Bernoulli distribution, and income ($I$) and fitness ($F$) follow Gaussian distributions. We generate 1000 samples from these distributions and use them to train a Support Vector Machine (SVM) classifier. The decision boundary of this classifier is $9.37I + 9.75F - 0.34A \ge 9.4$, where $A = 1$ denotes the sensitive group "age $\ge$ 40". This classifier selects an individual above and below 40 years of age with probabilities 0.24 and 0.86, respectively. This illustrates a disparate treatment of individuals of the two age groups by the SVM classifier.
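To make Example 1.1 concrete, the following minimal sketch generates a comparable synthetic dataset and measures the group-wise selection rates of a linear SVM. The Gaussian parameters, the random seed, and the scikit-learn calls are our own assumptions (the paper only states the families of distributions), so the learned boundary and the quoted 0.24/0.86 rates will not be reproduced exactly.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical group-dependent distributions in the spirit of Example 1.1.
rng = np.random.default_rng(0)
n = 1000
A = rng.binomial(1, 0.5, n)                      # A = 1 means "age >= 40"
I = rng.normal(np.where(A == 1, 0.4, 0.6), 0.1)  # income, group-dependent mean
F = rng.normal(np.where(A == 1, 0.4, 0.6), 0.1)  # fitness, group-dependent mean
Y = (I + F >= 1.0).astype(int)                   # eligibility label

X = np.column_stack([I, F, A])
clf = SVC(kernel="linear").fit(X, Y)             # linear SVM as in the example

pred = clf.predict(X)
for a in (1, 0):
    print(f"Pr[Y_hat = 1 | A = {a}] = {pred[A == a].mean():.2f}")
```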
Figure 1: Age-dependent distributions of income and fitness in Example 1.1 (axes: income and fitness in [0, 1]; groups: age < 40 and age ≥ 40).

In order to identify and mitigate the bias of ML classifiers, different fairness definitions and fairness algorithms have been proposed (Hardt, Price, and Srebro 2016; Kusner et al. 2017; Mehrabi et al. 2019). In this paper, we focus on two families of fairness definitions: group and causal fairness. Group fairness metrics, such as disparate impact and equalized odds, constrain the probability of the positive prediction of the classifier to be (almost) equal among different sensitive groups (Dwork et al. 2012; Feldman et al. 2015). On the other hand, causal fairness metrics assess the difference in positive predictions if every feature in the causal relation remains identical except the sensitive feature (Nabi and Shpitser 2018; Zhang and Bareinboim 2018).

The early works on fairness verification focused on measuring fairness metrics of a classifier for a given dataset (Bellamy et al. 2018). Naturally, such techniques were limited in enhancing the confidence of users for wide deployment. Consequently, recent verifiers seek to achieve verification beyond a finite dataset and in turn focus on the probability distribution of features (Albarghouthi et al. 2017; Bastani, Zhang, and Solar-Lezama 2019; Ghosh, Basu, and Meel 2021). More specifically, the input to the verifier is a classifier and the probability distribution of features, and the output is an estimate of the fairness metrics that the classifier obtains given the distribution. For Example 1.1, a fairness verifier takes the SVM classifier and the distribution of the features I, F, A as input and outputs the probability of positive prediction of the classifier for different sensitive groups.

In order to solve the fairness-verification problem, existing works have proposed two principled approaches. Firstly, Ghosh, Basu, and Meel (2021) and Albarghouthi et al. (2017) propose formal methods that reduce the problem to the solution of an SSAT or an SMT formula, respectively. Secondly, Bastani, Zhang, and Solar-Lezama (2019) propose a sampling approach that relies on extensively enumerating the conditional probabilities of prediction given different sensitive features and thus incurs high computational cost. Additionally, existing works assume independence of non-sensitive features and consider correlated features within a limited scope, such as conditional probabilities of non-sensitive features w.r.t. sensitive features, while ignoring correlations among non-sensitive features. As a result, the scalability and accuracy of existing verifiers remain a major challenge.

In this work, we seek to remedy the aforementioned situation. As a first step, we focus on linear classifiers, which have attracted significant attention from researchers in the context of fair algorithms (Pleiss et al. 2017; Zafar et al. 2017; Dressel and Farid 2018; John, Vijaykeerthy, and Saha 2020). At this point, it is worth highlighting that our empirical evaluation demonstrates that the existing techniques either fail to scale beyond small examples or provide highly inaccurate estimates for comparatively small linear classifiers.

**Our Contributions.** In this paper, we propose a fairness verification framework, namely FVGM (Fairness Verification with Graphical Models), for accurately and efficiently verifying linear classifiers.
FVGM proposes a novel stochastic subset-sum encoding for linear classifiers with an efficient pseudo-polynomial solution using dynamic programming. To address feature correlations, FVGM considers a graphical model, particularly a Bayesian network, that represents conditional dependence (and independence) among features in the form of a Directed Acyclic Graph (DAG). Experimentally, FVGM is more accurate and scalable than existing fairness verifiers; FVGM can verify group and causal fairness metrics for multiple fairness algorithms. We also demonstrate two novel applications of FVGM as a fairness verifier: (a) detecting fairness attacks, and (b) computing Fairness Influence Functions (FIF) of features as a means of identifying the (un)fairness contribution of a subset of features.

## 2 Background

In this section, we define different fairness metrics proposed for classification. Following that, we state the basics of stochastic subset sum and Bayesian networks, which are the main components of our methodology.

**Fairness in ML.** We consider¹ a dataset $\mathbf{D}$ as a collection of triples $(\mathbf{X}, \mathbf{A}, Y)$ generated from an underlying distribution $\mathcal{D}$. $\mathbf{X} \triangleq \{X_1, \dots, X_{m_1}\}$ are non-sensitive features, whereas $\mathbf{A} \triangleq \{A_1, \dots, A_{m_2}\}$ are categorical sensitive features. $Y \in \{0, 1\}$ is the binary label (or class) of $(\mathbf{X}, \mathbf{A})$. Each non-sensitive feature $X_i$ is sampled from a continuous probability distribution $\mathcal{X}_i$, and each sensitive feature $A_j \in \{0, \dots, N_j\}$ is sampled from a discrete probability distribution $\mathcal{A}_j$. We use $(\mathbf{x}, \mathbf{a})$ to denote the feature-values of $(\mathbf{X}, \mathbf{A})$. For sensitive features, a valuation vector $\mathbf{a} = [a_1, \dots, a_{m_2}]$ is called a compound sensitive group. For example, consider $\mathbf{A} = \{\text{race}, \text{sex}\}$ where race $\in$ {Asian, Color, White} and sex $\in$ {female, male}. Thus $\mathbf{a} = [\text{Asian}, \text{female}]$ is a compound sensitive group. We represent a binary classifier trained on the dataset $\mathbf{D}$ as $\mathcal{M} : (\mathbf{X}, \mathbf{A}) \to \hat{Y}$. Here, $\hat{Y} \in \{0, 1\}$ is the predicted class of $(\mathbf{X}, \mathbf{A})$.

¹We represent sets/vectors by bold letters, and the corresponding distributions by calligraphic letters. We express random variables in uppercase, and an assignment of a random variable in lowercase.

We now discuss different fairness metrics in the literature based on the prediction of a classifier (Feldman et al. 2015; Hardt, Price, and Srebro 2016; Nabi and Shpitser 2018). In this paper, FVGM verifies two families of fairness metrics: group fairness (the first three in the following) and path-specific causal fairness.

1. **Disparate Impact (DI):** A classifier satisfies $(1-\epsilon)$-disparate impact if, for $\epsilon \in [0, 1]$, $\min_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}] \ge (1 - \epsilon) \max_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}]$.
2. **Statistical Parity (SP):** A classifier satisfies $\epsilon$-statistical parity if, for $\epsilon \in [0, 1]$, $\max_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}] - \min_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}] \le \epsilon$.
3. **Equalized Odds (EO):** A classifier satisfies $\epsilon$-equalized odds if, for $\epsilon \in [0, 1]$, $\max_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}, Y = 0] - \min_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}, Y = 0] \le \epsilon$, and $\max_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}, Y = 1] - \min_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}, Y = 1] \le \epsilon$.
4. **Path-specific Causal Fairness (PCF):** Let $\mathbf{a}_{\max} \triangleq \arg\max_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}]$. We consider mediator features $\mathbf{Z} \subseteq \mathbf{X}$ sampled from the conditional distribution $\mathcal{Z}|_{\mathbf{A} = \mathbf{a}_{\max}}$. This emulates the fact that mediator variables have the same sensitive features $\mathbf{a}_{\max}$. For $\epsilon \in [0, 1]$, path-specific causal fairness is defined as $\max_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}, \mathbf{Z}] - \min_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}, \mathbf{Z}] \le \epsilon$.

For all of the above metrics, a lower value of $\epsilon$ indicates higher fairness demonstrated by the classifier $\mathcal{M}$.
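As a quick illustration of how these definitions reduce to group-conditional probabilities of positive prediction, the following sketch evaluates DI, SP, and EO on hypothetical probabilities; all numbers are made up for illustration only.

```python
# Hypothetical group-conditional probabilities of positive prediction.
ppv   = {("Asian", "female"): 0.42, ("White", "male"): 0.61, ("Color", "female"): 0.38}
ppv_y = {0: {g: p - 0.05 for g, p in ppv.items()},   # Pr[Y_hat=1 | A=a, Y=0] (assumed)
         1: {g: p + 0.05 for g, p in ppv.items()}}   # Pr[Y_hat=1 | A=a, Y=1] (assumed)

di = min(ppv.values()) / max(ppv.values())           # (1-eps)-DI holds iff di >= 1 - eps
sp = max(ppv.values()) - min(ppv.values())           # eps-SP holds iff sp <= eps
eo = max(max(d.values()) - min(d.values()) for d in ppv_y.values())
print(f"DI = {di:.2f}, SP = {sp:.2f}, EO = {eo:.2f}")
```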
Following the observation of Ghosh, Basu, and Meel (2021), computing all of the aforementioned fairness metrics is equivalent to computing the maximum and minimum of the probability of positive prediction of the classifier, denoted as $\Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}]$, over all compound sensitive groups $\mathbf{a}$ from $\mathbf{A}$. Thus, in Section 3, we focus on computing the maximum and minimum probability of positive prediction of the classifier and then extend it to assess the corresponding fairness metrics. We call the group for which the probability of positive prediction is maximum (minimum) the most (least) favored group of the classifier.

**Stochastic Subset Sum Problem (S3P).** Let $\mathbf{B} \triangleq \{B_i\}_{i=1}^{|\mathbf{B}|}$ be a set of Boolean variables and $w_i \in \mathbb{Z}$ be the weight of $B_i$. Given a constraint of the form $\sum_{i=1}^{|\mathbf{B}|} w_i B_i = \tau$, for a constant threshold $\tau \in \mathbb{Z}$, the subset-sum problem seeks to compute an assignment $\mathbf{b} \in \{0, 1\}^{|\mathbf{B}|}$ such that the constraint evaluates to true when $\mathbf{B}$ is substituted with $\mathbf{b}$. The subset-sum problem is an NP-complete problem and is well studied in theoretical computer science (Kleinberg and Tardos 2006). The counting version of the subset-sum problem counts all $\mathbf{b}$'s for which the above constraint holds; this problem belongs to the complexity class #P. In this paper, we consider the counting problem for the constraint $\sum_{i=1}^{|\mathbf{B}|} w_i B_i \ge \tau$ where the variables $B_i$ are stochastic. More precisely, we define a stochastic subset-sum problem, namely S3P, that computes $\Pr[\sum_{i=1}^{|\mathbf{B}|} w_i B_i \ge \tau]$. Details of S3P are in Section 3.1.

**Bayesian Network.** In general, a Probabilistic Graphical Model (Koller and Friedman 2009), and specifically a Bayesian network (Pearl 1985; Chavira and Darwiche 2008), encodes the dependencies and conditional independence between a set of random variables. In this paper, we leverage access to a Bayesian network on $\mathbf{X} \cup \mathbf{A}$ that represents the joint distribution over them. A Bayesian network is denoted by a pair $(G, \theta)$, where $G \triangleq (\mathbf{V}, \mathbf{E})$ is a DAG (Directed Acyclic Graph), and $\theta$ is a set of parameters encoding the conditional probabilities induced by the joint distribution under investigation. Each vertex $V_i \in \mathbf{V}$ corresponds to a random variable. Edges $\mathbf{E} \subseteq \mathbf{V} \times \mathbf{V}$ imply conditional dependencies among variables. For each variable $V_i \in \mathbf{V}$, let $\mathrm{Pa}(V_i) \subseteq \mathbf{V} \setminus \{V_i\}$ denote the set of parents of $V_i$. Given $\mathrm{Pa}(V_i)$ and parameters $\theta$, $V_i$ is independent of its other non-descendant variables in $G$. Thus, for the assignment $v_i$ of $V_i$ and $\mathbf{u}$ of $\mathrm{Pa}(V_i)$, the aforementioned semantics of a Bayesian network encodes the joint distribution of $\mathbf{V}$ as:

$$\Pr[V_1 = v_1, \dots, V_{|\mathbf{V}|} = v_{|\mathbf{V}|}] = \prod_{i=1}^{|\mathbf{V}|} \Pr[V_i = v_i \mid \mathrm{Pa}(V_i) = \mathbf{u}; \theta]. \tag{1}$$
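For concreteness, a minimal sketch of the factorization in Eq. (1) on a toy two-node network follows; the structure and CPT values match the toy network used later in Example 3.2 but are otherwise illustrative, and the marginal of P is assumed.

```python
# Toy Bayesian network P -> Q with hypothetical parameters.
cpt = {
    "P": {(): 0.5},                    # Pr[P = 1] (assumed)
    "Q": {(1,): 0.6, (0,): 0.3},       # Pr[Q = 1 | P]
}
parents = {"P": (), "Q": ("P",)}

def joint(assignment):
    """Pr[V1 = v1, ..., Vn = vn] as the product of conditional probabilities, Eq. (1)."""
    prob = 1.0
    for var, value in assignment.items():
        u = tuple(assignment[p] for p in parents[var])
        p1 = cpt[var][u]
        prob *= p1 if value == 1 else 1 - p1
    return prob

print(joint({"P": 1, "Q": 1}))   # 0.5 * 0.6 = 0.30
print(joint({"P": 0, "Q": 1}))   # 0.5 * 0.3 = 0.15
```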
## 3 FVGM: Fairness Verification with Graphical Models

In this section, we present FVGM, a fairness verification framework for linear classifiers that accounts for correlated features represented as a graphical model. The core idea of verifying fairness of a classifier is to compute the probability of positive prediction of the classifier with respect to all compound sensitive groups. To this end, FVGM solves a stochastic subset sum problem, S3P, that is equivalent to computing the probability of positive prediction of the classifier for the most (and the least) favored sensitive group. In this section, we first define S3P and present an efficient dynamic programming solution for S3P. We then extend S3P to consider correlated features as input. Finally, we conclude by discussing fairness verification based on the solution of S3P.

**Problem Formulation.** Given a linear classifier $\mathcal{M} : (\mathbf{X}, \mathbf{A}) \to \hat{Y}$ and a probability distribution $\mathcal{D}$ of $\mathbf{X} \cup \mathbf{A}$, our objective is to compute $\max_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}]$ and $\min_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}]$ with respect to $\mathcal{D}$. In this study, we express a linear classifier $\mathcal{M}$ as

$$\hat{Y} = \mathbb{1}\Big[\sum_i w_{X_i} X_i + \sum_j w_{A_j} A_j \ge \tau\Big].$$

Here, $w$ denotes the weight (or coefficient) of a feature, $\tau$ denotes the bias or offset parameter of the classifier, and $\mathbb{1}$ is an indicator function. Hence, the prediction $\hat{Y} = 1$ if and only if the inner inequality holds. Thus, computing the maximum (resp. minimum) probability of positive prediction is equivalent to finding the assignment of the $A_j$'s for which the probability of satisfying the inner inequality is highest (resp. lowest). We reduce this computation to an instance of S3P. To perform this reduction, we assume the weights $w$ and the bias $\tau$ to be integers, and the features $\mathbf{X} \cup \mathbf{A}$ to be Boolean. In Sec. 3.5, we relax these assumptions and extend to practical settings.

### 3.1 S3P: Stochastic Subset Sum Problem

Now we formally describe the specification and semantics of S3P. S3P operates on a set of Boolean variables $\mathbf{B} = \{B_i\}_{i=1}^{n} \in \{0, 1\}^n$, where $w_i \in \mathbb{Z}$ is the weight of $B_i$ and $n \triangleq |\mathbf{B}|$. Given a constant threshold $\tau \in \mathbb{Z}$, S3P computes the probability that a subset of $\mathbf{B}$ has sum of weights of non-zero variables at least $\tau$. Formally,

$$S(\mathbf{B}, \tau) \triangleq \Pr\Big[\sum_i w_i B_i \ge \tau\Big] \in [0, 1].$$

Aligning with the terminology of stochastic satisfiability (SSAT) (Littman, Majercik, and Pitassi 2001), we categorize the variables $\mathbf{B}$ into two types: (i) chance variables that are stochastic and have an associated probability of being assigned to 1, and (ii) choice variables whose assignment we optimize while computing $S(\mathbf{B}, \tau)$. To specify the category of variables, we consider a quantifier $q_i \in \{\mathsf{R}^{p_i}, \exists, \forall\}$ for each $B_i$. Elaborately, $\mathsf{R}^{p}$ is a randomized quantifier corresponding to a chance variable $B \in \mathbf{B}$, where $p \triangleq \Pr[B = 1]$. In contrast, $\exists$ is an existential quantifier corresponding to a choice variable $B$ such that the Boolean assignment of $B$ maximizes $S(\mathbf{B}, \tau)$. Finally, $\forall$ is a universal quantifier for a choice variable $B$ that fixes an assignment of $B$ that minimizes $S(\mathbf{B}, \tau)$.

Now, we formally present the semantics of $S(\mathbf{B}, \tau)$, provided that each variable $B_i$ has weight $w_i$ and quantifier $q_i$. Let $\mathbf{B}[2:n] \triangleq \{B_j\}_{j=2}^{n}$ be the subset of $\mathbf{B}$ without the first variable $B_1$. Then $S(\mathbf{B}, \tau)$ is recursively defined as:

$$S(\mathbf{B}, \tau) = \begin{cases} \mathbb{1}[\tau \le 0] & \text{if } \mathbf{B} = \emptyset \\ S(\mathbf{B}[2:n], \tau - \max\{w_1, 0\}) & \text{if } q_1 = \exists \\ S(\mathbf{B}[2:n], \tau - \min\{w_1, 0\}) & \text{if } q_1 = \forall \\ p_1 \, S(\mathbf{B}[2:n], \tau - w_1) + (1 - p_1) \, S(\mathbf{B}[2:n], \tau) & \text{if } q_1 = \mathsf{R}^{p_1} \end{cases} \tag{2}$$

Observe that when $\mathbf{B}$ is empty, $S$ is computed as 1 if $\tau \le 0$, and $S = 0$ otherwise. For existential and universal quantifiers, we compute $S$ based on the weight. Specifically, if $q_1 = \exists$, we decrement the threshold $\tau$ by the maximum of $w_1$ and 0: if $w_1 > 0$, $B_1$ is assigned 1, and it is assigned 0 otherwise. Therefore, by solving for an existential variable, we maximize $S$. In contrast, when $q_1 = \forall$, we fix an assignment of $B_1$ that minimizes $S$ by choosing the minimum of $w_1$ and 0. Finally, for randomized quantifiers, we decompose the computation of $S$ into two sub-problems: one sub-problem where $B_1 = 1$ and the updated threshold becomes $\tau - w_1$, and another sub-problem where $B_1 = 0$ and the updated threshold remains the same. Herein, we compute $S$ as the expected output of both sub-problems.

**Remark.** $S(\mathbf{B}, \tau)$ does not depend on the order of $\mathbf{B}$.
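The recursive semantics of Eq. (2) translates directly into code. The following sketch is a naive (exponential-time) evaluation of $S(\mathbf{B}, \tau)$; the variable encoding as (weight, quantifier, probability) triples is our own choice, and the usage line anticipates the instance $P + Q + R - S \ge 2$ of Example 3.1 below. The memoized dynamic program of Section 3.2 replaces this naive evaluation.

```python
# Quantifiers: "E" (existential), "A" (universal), "R" (randomized with probability p).
def s3p(variables, tau):
    """Recursively evaluate S(B, tau) following Eq. (2)."""
    if not variables:                 # base case: B is empty
        return 1.0 if tau <= 0 else 0.0
    (w, q, p), rest = variables[0], variables[1:]
    if q == "E":                      # existential: pick the assignment maximizing S
        return s3p(rest, tau - max(w, 0))
    if q == "A":                      # universal: pick the assignment minimizing S
        return s3p(rest, tau - min(w, 0))
    # randomized: expectation over the two assignments of the chance variable
    return p * s3p(rest, tau - w) + (1 - p) * s3p(rest, tau)

# P + Q + R - S >= 2 with existential P and Pr[Q]=0.4, Pr[R]=0.5, Pr[S]=0.3.
B = [(1, "E", None), (1, "R", 0.4), (1, "R", 0.5), (-1, "R", 0.3)]
print(s3p(B, 2))   # 0.55, the maximum probability of positive prediction
```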
**Computing the Maximum and Minimum Probability of Positive Prediction of Linear Classifiers Using S3P.** For computing $\max_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}]$ of a linear classifier, we set existential quantifiers on the sensitive features $A_j$ and randomized quantifiers on the non-sensitive features $X_i$, and construct the set $\mathbf{B} = \mathbf{A} \cup \mathbf{X}$. The coefficients $w_{A_j}$ and $w_{X_i}$ of the classifier become the weights of $\mathbf{B}$. Also, we get $n = m_1 + m_2$. For the non-sensitive variables $X_i$, which are chance variables, we derive their marginal probability $p_i = \Pr[X_i = 1]$ from the distribution $\mathcal{D}$. According to the semantics of S3P, setting existential quantifiers on $\mathbf{A}$ computes the maximum value of $S(\mathbf{B}, \tau)$, which equals the maximum probability of positive prediction of the classifier. In this case, the inferred assignment of $\mathbf{A}$ implies the most favored group $\mathbf{a}_{\max} = \arg\max_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}]$. In contrast, to compute the minimum probability of positive prediction, we instead assign each variable $A_j$ a universal quantifier while keeping randomized quantifiers over $X_i$, and infer the least favored group $\mathbf{a}_{\min} = \arg\min_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}]$.

### 3.2 A Dynamic Programming Solution for S3P

We propose a dynamic programming approach (Pisinger 1999; Woeginger and Yu 1992) to solve S3P, as the problem has overlapping sub-problems. For example, $S(\mathbf{B}, \tau)$ can be solved by solving $S(\mathbf{B}[2:n], \tau')$, where the updated threshold $\tau'$, called the residual threshold, depends on the original threshold $\tau$ and the assignment of $B_1$, as shown in Eq. (2). Building on this observation, we propose the recursion and terminating conditions leading to our dynamic programming algorithm.

**Recursion.** We consider a function $dp(i, \tau)$ that solves the sub-problem $S(\mathbf{B}[i:n], \tau)$, for $i \in \{1, \dots, n\}$. The semantics of $S(\mathbf{B}, \tau)$ in Eq. (2) induces the recursive definition of $dp(i, \tau)$ as:

$$dp(i, \tau) = \begin{cases} dp(i+1, \tau - \max\{w_i, 0\}) & \text{if } q_i = \exists \\ dp(i+1, \tau - \min\{w_i, 0\}) & \text{if } q_i = \forall \\ p_i \, dp(i+1, \tau - w_i) + (1 - p_i) \, dp(i+1, \tau) & \text{if } q_i = \mathsf{R}^{p_i} \end{cases} \tag{3}$$

Eq. (3) shows that $S(\mathbf{B}, \tau)$ can be solved by instantiating $dp(1, \tau)$, which includes all the variables in $\mathbf{B}$.

**Terminating Conditions.** Let $w_{\text{neg}}$, $w_{\text{pos}}$, and $w_{\text{all}}$ be the sum of negative, positive, and all weights of $\mathbf{B}$, respectively. We observe that $w_{\text{neg}} \le w_{\text{all}} \le w_{\text{pos}}$. Thus, for any $i$, if the residual threshold $\tau \le w_{\text{neg}}$, there is always a subset of $\mathbf{B}[i:n]$ with sum of weights at least $\tau$. Conversely, when $\tau > w_{\text{pos}}$, there is no subset of $\mathbf{B}[i:n]$ with sum of weights at least $\tau$. We leverage these bounds and tighten the terminating conditions of $dp(i, \tau)$ in Eq. (4):

$$dp(i, \tau) = \begin{cases} 1 & \text{if } \tau \le w_{\text{neg}} \\ 0 & \text{if } \tau > w_{\text{pos}} \\ \mathbb{1}[\tau \le 0] & \text{if } i = n + 1 \end{cases} \tag{4}$$

Eq. (3) and (4) together define our dynamic programming algorithm. While deploying the algorithm, we store $dp(i, \tau)$ in memory to avoid repetitive computations. This allows us to achieve a pseudo-polynomial algorithm (Lemma 1) instead of a naive exponential algorithm enumerating all possible assignments. In particular, the time complexity is pseudo-polynomial for chance (random) variables and linear for choice (existential and universal) variables.

**Lemma 1.** Let $n'$ be the number of existential and universal variables in $\mathbf{B}$. Let $w_\exists = \sum_{B_i \in \mathbf{B} \mid q_i = \exists} \max\{w_i, 0\}$ and $w_\forall = \sum_{B_i \in \mathbf{B} \mid q_i = \forall} \min\{w_i, 0\}$ be the considered sums of weights of existential and universal variables, respectively. We can exactly solve S3P using dynamic programming with time complexity $O((n - n')(\tau + |w_{\text{neg}}| - w_\exists - w_\forall) + n')$. The space complexity is $O((n - n')(\tau + |w_{\text{neg}}| - w_\exists - w_\forall))$.
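A memoized sketch of $dp(i, \tau)$ with the terminating conditions of Eq. (4) is shown below; the quantifier encoding and all names are ours, not the FVGM implementation, and the usage lines reproduce the values of Example 3.1.

```python
from functools import lru_cache

def solve_s3p(weights, quant, prob, tau):
    """Memoized dp(i, tau) for S3P (Eq. (3)) with the bounds of Eq. (4)."""
    n = len(weights)
    w_neg = sum(w for w in weights if w < 0)   # smallest achievable sum of weights
    w_pos = sum(w for w in weights if w > 0)   # largest achievable sum of weights

    @lru_cache(maxsize=None)
    def dp(i, t):
        if t <= w_neg:            # Eq. (4): every remaining subset reaches t
            return 1.0
        if t > w_pos:             # Eq. (4): no remaining subset can reach t
            return 0.0
        if i == n:                # Eq. (4): empty suffix
            return 1.0 if t <= 0 else 0.0
        w = weights[i]
        if quant[i] == "E":       # existential: maximize
            return dp(i + 1, t - max(w, 0))
        if quant[i] == "A":       # universal: minimize
            return dp(i + 1, t - min(w, 0))
        # randomized quantifier: expectation over both assignments
        return prob[i] * dp(i + 1, t - w) + (1 - prob[i]) * dp(i + 1, t)

    return dp(0, tau)

# Example 3.1 (existential P, then Q, R, S): maximum probability of positive prediction.
print(solve_s3p((1, 1, 1, -1), ("E", "R", "R", "R"), (None, 0.4, 0.5, 0.3), 2))  # 0.55
# Minimum probability: make P universal instead.
print(solve_s3p((1, 1, 1, -1), ("A", "R", "R", "R"), (None, 0.4, 0.5, 0.3), 2))  # 0.14
```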
**A Heuristic for Faster Computation.** We propose two improvements for a faster computation of the dynamic programming solution. Firstly, we observe that in Eq. (3), existential/universal variables are deterministically assigned based on their weights. Hence, we reorder $\mathbf{B}$ such that existential/universal variables appear earlier in $\mathbf{B}$ than random variables. This allows us to avoid unnecessary repeated exploration of existential/universal variables in $dp$. Moreover, according to the remark in Section 3.1, reordering $\mathbf{B}$ still produces the same exact solution of S3P. Secondly, to reach the terminating conditions of $dp(i, \tau)$ more frequently, we sort $\mathbf{B}$ based on the weights, more specifically within each cluster of random, existential, and universal variables. In particular, if $\tau \ge 0.5(w_{\text{pos}} + w_{\text{neg}})$, $\tau$ is closer to $w_{\text{pos}}$ than to $w_{\text{neg}}$. Hence, we sort each cluster in descending order of weights. Otherwise, we sort in ascending order. We illustrate our dynamic programming approach in Example 3.1.

**Example 3.1.** We consider a linear classifier $P + Q + R - S \ge 2$. Herein, $P$ is a Boolean sensitive feature, and $Q$, $R$, $S$ are Boolean non-sensitive features with $\Pr[Q] = 0.4$, $\Pr[R] = 0.5$, and $\Pr[S] = 0.3$. To compute the maximum probability of positive prediction of the classifier, we impose an existential quantifier on $P$ and randomized quantifiers on the others. This leads us to the observation that $P = 1$ is the optimal assignment as $w_P = 1 > 0$. We then require to compute $\Pr[Q + R - S \ge 1]$, which by dynamic programming is computed as 0.55. The solution is visualized as a search tree in Figure 2a, where we observe that storing the solution of sub-problems in memory avoids repetitive computation, such as exploring the node $(4, 0)$. Similarly, the minimum probability of positive prediction of the classifier is 0.14 (not shown in Figure 2a), where we impose a universal quantifier on $P$ to obtain $P = 0$ as the optimal assignment.

Figure 2: Search tree representation of S3P for computing the maximum probability of positive prediction of the classifier on variables $\mathbf{B} = \{P, Q, R, S\}$ with weights $\{1, 1, 1, -1\}$ and threshold $\tau = 2$. (a) Known marginal probabilities: $0.85\Pr[Q] + 0.35\Pr[\neg Q] = 0.55$. (b) Probabilities computed with a Bayesian network: $0.85\Pr[Q \mid P] + 0.35\Pr[\neg Q \mid P] = 0.65$. Each node is labeled by $(i, \tau')$, where $i$ is the index in $\mathbf{B}$ and $\tau'$ is the residual threshold. The tree is explored using Depth-First Search (DFS) starting with the left child. Within a node, the value at the bottom denotes $dp(i, \tau')$, which is solved recursively based on the sub-problems $dp(i+1, \cdot)$ in the child nodes. Yellow nodes denote existential variables and all other nodes are random variables. Additionally, a green node denotes a collision, in which case a previously computed $dp$ solution is returned. Leaf nodes (gray) are computed based on the terminating conditions in Eq. (4). In Figure 2b, nodes with double circles, such as $\{(1, 2), (2, 1), (2, 2)\}$, are enumerated exponentially to compute conditional probabilities from the Bayesian network.

### 3.3 S3P with Correlated Variables

In S3P as presented in Section 3.1, we consider all Boolean variables to be probabilistically independent. This independence assumption often leads to an inaccurate estimate of the probability of positive prediction of the classifier because both sensitive and non-sensitive features can be correlated in practical fairness problems. Therefore, we extend S3P to include correlations among variables. We consider a Bayesian network $\mathrm{BN} = (G, \theta)$ to represent correlated variables, where $G \triangleq (\mathbf{V}, \mathbf{E})$, $\mathbf{V} \subseteq \mathbf{B}$, $\mathbf{E} \subseteq \mathbf{V} \times \mathbf{V}$, and $\theta$ is the parameter set of the network. In BN, we constrain that there is no conditional probability of choice (i.e., existential and universal) variables, as we optimize their assignment in S3P.
Choice variables, however, can impose conditions on chance (i.e., random) variables. In practice, we achieve this by allowing no incoming edge on choice variables while learning BN (ref. Section 4). For a chance variable $B_i \in \mathbf{V}$, let $\mathrm{Pa}(B_i)$ denote its parents. According to Eq. (1), for an assignment $\mathbf{u}$ of $\mathrm{Pa}(B_i)$, BN ensures that $B_i$ is independent of its other non-descendant variables in $\mathbf{V}$. Hence, in the recursion of Eq. (3), we substitute $p_i$ with $\Pr[B_i = 1 \mid \mathrm{Pa}(B_i) = \mathbf{u}]$. In order to explicate the dependence on $\mathbf{u}$, we denote the expected solution of $S(\mathbf{B}[i:n], \tau)$ as $dp(i, \tau, \mathbf{u})$, which for $B_i \in \mathbf{V}$ is modified as follows:

$$dp(i, \tau, \mathbf{u}) = \Pr[B_i = 0 \mid \mathrm{Pa}(B_i) = \mathbf{u}] \, dp(i+1, \tau, \mathbf{u} \cup \{0\}) + \Pr[B_i = 1 \mid \mathrm{Pa}(B_i) = \mathbf{u}] \, dp(i+1, \tau - w_i, \mathbf{u} \cup \{1\}).$$

Since $dp(i, \tau, \mathbf{u})$ involves $\mathbf{u}$, we initially perform a topological sort of $\mathbf{V}$ so that the assignment of parents is known before computing $dp$ on the child. Moreover, there are $2^{|\mathrm{Pa}(B_i)|}$ assignments of $\mathrm{Pa}(B_i)$, and we compute $dp(i, \tau, \mathbf{u})$ for $\mathbf{u} \in \{0, 1\}^{|\mathrm{Pa}(B_i)|}$ to incorporate all conditional probabilities into S3P. For this enumeration, we do not store $dp(i, \tau, \mathbf{u})$ in memory. However, for $B_i \in \mathbf{B} \setminus \mathbf{V}$, i.e., a variable that does not appear in the network, we instead compute $dp(i, \tau)$ and store it in memory as in Section 3.2, because $B_i$ is not correlated with other variables. Lemma 2 presents the complexity of solving S3P with correlated variables, wherein, unlike Lemma 1, the complexity differentiates between variables in $\mathbf{V}$ (exponential) and in $\mathbf{B} \setminus \mathbf{V}$ (pseudo-polynomial).

**Lemma 2.** Let $\mathbf{V} \subseteq \mathbf{B}$ be the set of vertices in the Bayesian network and $n'$ be the number of existential and universal variables in $\mathbf{B} \setminus \mathbf{V}$. Let $w_\exists = \sum_{B_i \in \mathbf{B} \setminus \mathbf{V} \mid q_i = \exists} \max\{w_i, 0\}$ and $w_\forall = \sum_{B_i \in \mathbf{B} \setminus \mathbf{V} \mid q_i = \forall} \min\{w_i, 0\}$ be the sums of considered weights of existential and universal variables, respectively, that appear only in $\mathbf{B} \setminus \mathbf{V}$. To exactly compute S3P with correlated variables in the dynamic programming approach, the time complexity is $O(2^{|\mathbf{V}|} + (n - n' - |\mathbf{V}|)(\tau + |w_{\text{neg}}| - w_\exists - w_\forall) + n')$ and the space complexity is $O((n - n' - |\mathbf{V}|)(\tau + |w_{\text{neg}}| - w_\exists - w_\forall))$.

**A Heuristic for Faster Computation.** We observe that to encode conditional probabilities, we enumerate all assignments of variables in $\mathbf{V}$. For computing the probability of positive prediction of a linear classifier with correlated features, we consider a heuristic to sort the variables in $\mathbf{B} = \mathbf{A} \cup \mathbf{X}$. Let $\mathbf{V} \subseteq \mathbf{B}$ be the set of vertices in the network and $\mathbf{V}_c = \mathbf{B} \setminus \mathbf{V}$. In this heuristic, we sort the sensitive variables $\mathbf{A}$ by positioning $\mathbf{A} \cap \mathbf{V}$ in the beginning, followed by $\mathbf{A} \cap \mathbf{V}_c$. Then we order the variables in $\mathbf{B}$ such that the variables in $\mathbf{A}$ precede those in $\mathbf{X} \cap \mathbf{V}$, and the variables in $\mathbf{X} \cap \mathbf{V}_c$ follow those in $\mathbf{X} \cap \mathbf{V}$. This ordering allows us to avoid repetitive enumeration of the variables in $\mathbf{V}$, as they are placed earlier in $\mathbf{B}$.

**Example 3.2.** We extend Example 3.1 with a Bayesian network $(G, \theta)$ with $\mathbf{V} = \{P, Q\}$ and $\mathbf{E} = \{(P, Q)\}$. The parameters $\theta$ imply the conditional probabilities $\Pr[Q \mid P] = 0.6$ and $\Pr[Q \mid \neg P] = 0.3$. In Figure 2b, we enumerate all assignments of $P$ and $Q$ to incorporate all conditional probabilities of $Q$ given $P$. We observe, however, that the dynamic programming solution of Section 3.2 still prunes the search space for variables that do not appear in $\mathbf{V}$, such as $\{R, S\}$. Hence, following the calculation in Figure 2b, we obtain the maximum probability of positive prediction of the classifier as 0.65 for $P = 1$. The minimum probability of positive prediction (not shown) is similarly calculated as 0.11 for $P = 0$.
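A simplified sketch of the correlated-variable recursion $dp(i, \tau, \mathbf{u})$ follows. It assumes that variables are already topologically ordered with parents before children, records the assignment of choice variables so that children can condition on them, and, as in the paper, does not memoize network variables. The data-structure choices (parent index tuples, CPT dictionaries) are our own; the usage lines reproduce Example 3.2.

```python
def solve_s3p_bn(weights, quant, parents, cpt, tau):
    """dp(i, tau, u) with Bayesian-network conditionals; no memoization, for brevity."""
    n = len(weights)

    def dp(i, t, assign):
        if i == n:
            return 1.0 if t <= 0 else 0.0
        w = weights[i]
        if quant[i] == "E":                       # existential choice variable
            b = 1 if w > 0 else 0                 # assignment that maximizes S
            return dp(i + 1, t - max(w, 0), {**assign, i: b})
        if quant[i] == "A":                       # universal choice variable
            b = 1 if w < 0 else 0                 # assignment that minimizes S
            return dp(i + 1, t - min(w, 0), {**assign, i: b})
        u = tuple(assign[j] for j in parents[i])  # parents already assigned (topological order)
        p1 = cpt[i][u]                            # Pr[B_i = 1 | Pa(B_i) = u]
        return (p1 * dp(i + 1, t - w, {**assign, i: 1})
                + (1 - p1) * dp(i + 1, t, {**assign, i: 0}))

    return dp(0, tau, {})

# Example 3.2: edge P -> Q with Pr[Q|P] = 0.6, Pr[Q|not P] = 0.3; R, S independent.
weights = [1, 1, 1, -1]                   # P, Q, R, S
quant = ["E", "R", "R", "R"]
parents = [(), (0,), (), ()]
cpt = {1: {(1,): 0.6, (0,): 0.3}, 2: {(): 0.5}, 3: {(): 0.3}}
print(solve_s3p_bn(weights, quant, parents, cpt, 2))   # 0.65 for P = 1
```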
### 3.4 Fairness Verification with Computed Probability of Positive Prediction

Given a classifier $\mathcal{M}$, a distribution $\mathcal{D}$, and a fairness metric $f$, verifying whether the classifier is $\epsilon$-fair ($\epsilon \in [0, 1]$) is equivalent to computing $\mathbb{1}[f(\mathcal{M}|\mathcal{D}) \le \epsilon]$. We now compute $f(\mathcal{M}|\mathcal{D})$ based on the maximum probability of positive prediction $\max_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}]$ and the minimum probability of positive prediction $\min_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}]$ of the classifier. For measuring the fairness metric SP, we compute the difference $\max_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}] - \min_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}]$. We deploy FVGM twice while measuring EO: once for the distribution $\mathcal{D}$ conditioned on $Y = 1$ and once for $Y = 0$. In each case, we compute $\max_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}, Y = y] - \min_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}, Y = y]$ for $y \in \{0, 1\}$ and take the maximum difference as the value of EO. For measuring the causal metric PCF, we compute $\max_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}, \mathbf{Z}]$ and $\min_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}, \mathbf{Z}]$ conditioned on the mediator features $\mathbf{Z}$ and take their difference. To measure DI, we compute the ratio $\min_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}] / \max_{\mathbf{a}} \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}]$. In contrast to the other fairness metrics, DI closer to 1 indicates a higher fairness level. Thus, we verify whether a classifier achieves $(1-\epsilon)$-DI by checking $\mathbb{1}[\mathrm{DI}(\mathcal{M}|\mathcal{D}) \ge 1 - \epsilon]$.

### 3.5 Extension to Practical Settings

For verifying linear classifiers with real-valued features and coefficients, we preprocess them so that FVGM can be invoked. Let $X_c \in \mathbb{R}$ be a continuous real-valued feature with coefficient $w_c \in \mathbb{R}$ in the classifier. We discretize $X_c$ into a set $\mathbf{B}_c$ of $k$ Boolean variables using binning-based discretization and assign a Boolean variable to each bin. Hence, $B_i \in \mathbf{B}_c$ becomes 1 when $X_c$ belongs to the $i$-th bin. Let $\mu_i$ denote the mean of the feature-values within the $i$-th bin. We then set the coefficient of $B_i$ to $w_c \mu_i$. By the law of large numbers, $X_c \approx \sum_i \mu_i B_i$ for infinitely many bins (Grimmett and Stirzaker 2020). Finally, we multiply the coefficients of the discretized variables by $l \in \mathbb{N} \setminus \{0\}$ and round to an integer. The accuracy of the preprocessing step relies on the number of bins $k$ and the multiplier $l$. Therefore, we empirically fine-tune both $k$ and $l$ by comparing the processed classifier with the initial classifier on a validation dataset.
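A possible realization of this binning-based preprocessing is sketched below, assuming equal-width bins; the helper name and the defaults for $k$ and $l$ are hypothetical, and FVGM tunes both by grid search (Section 4).

```python
import numpy as np

def discretize_feature(x, w_c, k=10, l=100):
    """Turn a continuous feature (coefficient w_c) into k Boolean bin indicators
    with integer coefficients round(l * w_c * mu_i), as in Sec. 3.5."""
    edges = np.linspace(x.min(), x.max(), k + 1)
    bin_idx = np.clip(np.digitize(x, edges[1:-1]), 0, k - 1)   # bin index per sample
    booleans = np.eye(k, dtype=int)[bin_idx]                   # one-hot bin membership
    means = np.array([x[bin_idx == i].mean() if np.any(bin_idx == i) else 0.0
                      for i in range(k)])                      # mu_i: mean value in bin i
    int_weights = np.rint(l * w_c * means).astype(int)
    return booleans, int_weights

# Usage on a hypothetical feature with coefficient 2.5:
x = np.random.default_rng(0).normal(0.5, 0.1, 1000)
bools, ws = discretize_feature(x, w_c=2.5, k=5)
print(bools.shape, ws)   # (1000, 5) and one integer weight per bin
```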
## 4 Empirical Performance Analysis

In this section, we empirically evaluate the performance of FVGM. We first present the experimental setup and the objectives of our experiments, followed by the experimental results.

**Experimental Setup.** We implement a prototype of FVGM in Python (version 3.8). We deploy the Scikit-learn library for learning linear classifiers such as Logistic Regression (LR) and Support Vector Machine (SVM) with linear kernels. We perform five-fold cross-validation on each dataset. While the classifier is trained on continuous features, we discretize them to Boolean features to be invoked by FVGM. During discretization, we apply a grid-search to estimate the best bin-size within a maximum of 10 bins. To convert the coefficients of the features into integers, we employ another grid-search to choose the best multiplier within $\{1, 2, \dots, 100\}$. For learning a Bayesian network on the converted Boolean data, we deploy the pgmpy library (Ankan and Panda 2015). For network learning, we apply a hill-climbing search algorithm that learns a DAG structure by optimizing the K2 score (Koller and Friedman 2009). For estimating the parameters of the network, we use the Maximum Likelihood Estimation (MLE) algorithm. We compare FVGM with three existing fairness verifiers: Justicia (Ghosh, Basu, and Meel 2021), FairSquare (Albarghouthi et al. 2017), and VeriFair (Bastani, Zhang, and Solar-Lezama 2019).

### 4.1 Scalability Analysis

**Benchmarks.** We perform the scalability analysis on five real-world datasets studied in the fair ML literature: UCI Adult, German-credit (Doshi-Velez and Kim 2017), COMPAS (Angwin et al. 2016), Ricci (McGinley 2010), and Titanic (https://www.kaggle.com/c/titanic). We consider 100 benchmarks generated from the 5 real-world datasets and report the computation times (for DI and SP) of the different verifiers.

**Results.** In Figure 3, we present the scalability results of the different verifiers. First, we observe that FairSquare often times out (900 seconds) and can solve only 5 benchmarks. This indicates that the SMT-based reduction for linear classifiers cannot scale. Similarly, the SSAT-based verifier Justicia, which performs a pseudo-Boolean to CNF translation for linear classifiers, times out for around 20 out of 100 benchmarks. The sampling-based framework VeriFair has comparatively better scalability than the SMT/SSAT-based frameworks and can solve more than 90 benchmarks. Finally, FVGM achieves impressive scalability by solving all 100 benchmarks with 1 to 2 orders of magnitude runtime improvement over existing verifiers. Therefore, the S3P-based framework FVGM proves to be highly scalable in verifying fairness properties of linear classifiers compared to the state-of-the-art.

Figure 3: A cactus plot presenting the scalability of different fairness verifiers (one panel each for Logistic Regression and Support Vector Machine; verifiers: FVGM, FairSquare, Justicia, VeriFair). The number of solved benchmarks is on the X-axis and the required time is on the Y-axis; a point (x, y) implies that a verifier takes less than or equal to y seconds to compute fairness metrics of x many benchmarks. We consider 100 benchmarks generated from 5 real-world datasets using 5-fold cross-validation. In each fold, we consider {25, 50, 75, 100} percent of the non-sensitive features.

### 4.2 Accuracy Analysis

**Benchmark Generation.** To perform the accuracy analysis, we require the ground truth, which is not available for real-world instances. Therefore, we focus on generating synthetic benchmarks for analytically computing the ground truth of different fairness metrics, such as DI, from the known distribution of features. In each benchmark, we consider $n \in \{2, 3, 4, 5\}$ features including one Boolean sensitive feature, say $A$, generated from a Bernoulli distribution with mean 0.5. We generate non-sensitive features $X_i$ from Gaussian distributions such that $\Pr[X_i \mid A = 1] \sim \mathcal{N}(\mu_i, \sigma^2)$ and $\Pr[X_i \mid A = 0] \sim \mathcal{N}(\mu'_i, \sigma^2)$, where $\mu_i, \mu'_i \in [0, 1]$, $\sigma = 0.1$, and $\mu_i, \mu'_i$ are chosen from a uniform distribution in $[0, 1]$. Finally, we create the label $Y = \mathbb{1}[\sum_{i=1}^{n-1} X_i \ge 0.5 \sum_{i=1}^{n-1} (\mu_i + \mu'_i)]$ such that $Y$ does not directly depend on the sensitive feature. For each $n$, we generate 100 random benchmarks, learn LR and SVM classifiers on them, and compute DI² using the different verifiers.

²To analytically compute DI, let the coefficients of the classifier be $w_i$ for $X_i$ and $w_A$ for $A$, and the bias be $\tau$. Since all non-sensitive features are from Gaussian distributions, the classifier score follows $\Pr[\hat{Y} \mid A = 1] \sim \mathcal{N}(\sum_{i=1}^{n-1} w_i \mu_i, \sigma^2_{\hat{Y}})$ and $\Pr[\hat{Y} \mid A = 0] \sim \mathcal{N}(\sum_{i=1}^{n-1} w_i \mu'_i, \sigma^2_{\hat{Y}})$ with $\sigma^2_{\hat{Y}} = (\sum_{i=1}^{n-1} w_i^2)\sigma^2$. Hence, the probability of positive prediction of the classifier is $1 - \mathrm{CDF}_{\hat{Y}|A=1}(\tau - w_A)$ for $A = 1$ and $1 - \mathrm{CDF}_{\hat{Y}|A=0}(\tau)$ for $A = 0$, where CDF is the cumulative distribution function. Finally, we compute DI by taking the ratio of the minimum and the maximum of the probability of positive prediction of the classifier.
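The synthetic benchmark and the analytic (Exact) DI of footnote 2 can be sketched as follows; the seed, sample size, and the use of scikit-learn's LogisticRegression are our own choices, made only to illustrate the construction.

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, sigma, samples = 4, 0.1, 10_000
mu1, mu0 = rng.uniform(0, 1, n - 1), rng.uniform(0, 1, n - 1)    # mu_i and mu'_i

A = rng.binomial(1, 0.5, samples)
X = np.where(A[:, None] == 1, mu1, mu0) + sigma * rng.standard_normal((samples, n - 1))
Y = (X.sum(axis=1) >= 0.5 * (mu1 + mu0).sum()).astype(int)        # label, independent of A

clf = LogisticRegression().fit(np.column_stack([X, A]), Y)
w, w_A, tau = clf.coef_[0, :-1], clf.coef_[0, -1], -clf.intercept_[0]

# Analytic probability of positive prediction per sensitive group (footnote 2).
sigma_score = np.sqrt((w ** 2).sum()) * sigma
p1 = 1 - norm.cdf(tau - w_A, loc=w @ mu1, scale=sigma_score)      # A = 1
p0 = 1 - norm.cdf(tau, loc=w @ mu0, scale=sigma_score)            # A = 0
print("Exact DI:", min(p1, p0) / max(p1, p0))
```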
**Results.** We assess the accuracy of the competing verifiers in estimating fairness metrics, specifically DI with LR and SVM classifiers. In Figure 4, FVGM computes DI closest to the Exact value for different numbers of features and both types of classifiers. In contrast, Justicia, FairSquare, and VeriFair measure DI far from the Exact value because they ignore correlations among the features. For example, for the SVM classifier with $n = 5$ (right plot in Figure 4), the Exact DI is 0.089 (averaged over 100 random benchmarks). Here, FVGM computes DI as 0.094, while all other verifiers compute DI as at least 0.233. Therefore, FVGM is more accurate than existing verifiers as it explicitly considers correlations among features.

Figure 4: Comparing the average accuracy of different verifiers over 100 synthetic benchmarks while varying the number of features (2 to 5), for Logistic Regression and Support Vector Machine. FVGM yields the closest estimation of the analytically calculated Exact values of DI for LR and SVM classifiers.

## 5 Applications of FVGM

In this section, we apply FVGM to verify fairness-enhancing algorithms and fairness-depreciating attacks. We also demonstrate that FVGM facilitates the computation of fairness influence functions by enabling the detection of bias due to individual features.

| Dataset | A | Algo. | DI | PCF | SP | EO |
|---|---|---|---|---|---|---|
| Adult | race | RW | 0.53 | -0.06 | -0.06 | -0.02 |
| Adult | race | OP | 0.57 | -0.07 | -0.07 | -0.02 |
| Adult | sex | RW | 0.96 | -0.16 | -0.15 | -0.08 |
| Adult | sex | OP | 0.43 | -0.08 | -0.08 | 0.03 |
| COMPAS | race | RW | 0.13 | -0.07 | -0.07 | -0.06 |
| COMPAS | race | OP | 0.15 | -0.08 | -0.08 | -0.05 |
| COMPAS | sex | RW | 0.1 | -0.04 | -0.04 | 0.04 |
| COMPAS | sex | OP | 0.09 | -0.04 | -0.04 | -0.03 |
| German | age | RW | 0.52 | -0.53 | -0.52 | -0.47 |
| German | age | OP | 0.53 | -0.53 | -0.53 | -0.51 |
| German | sex | RW | 0.06 | 0.06 | 0.06 | 0.02 |
| German | sex | OP | 0.12 | 0.12 | 0.12 | 0.07 |

Table 1: Verification of fairness algorithms using FVGM. A denotes the sensitive feature. RW and OP refer to the reweighing and optimized-preprocessing algorithms. Numbers in bold refer to fairness improvement.

**Verifying Fairness-enhancing Algorithms.** We deploy FVGM in verifying the effectiveness of fairness-enhancing algorithms designed to ameliorate bias. For example, fairness pre-processing algorithms can be validated by applying FVGM on the unprocessed and the processed data separately and comparing different fairness metrics. In Table 1, we report the effect of fairness algorithms w.r.t. four fairness metrics: disparate impact (DI), path-specific causal fairness (PCF), statistical parity (SP), and equalized odds (EO). Note that fairness is improved if DI increases and the rest of the metrics decrease. For instance, in most instances for the Adult dataset, the reweighing (RW) (Kamiran and Calders 2012) and optimized pre-processing (OP) (Calmon et al. 2017) algorithms are successful in improving fairness.
The exceptional case is the unfairness regarding the sensitive feature "sex", where the OP algorithm fails to improve the fairness metric EO. Thus, FVGM verifies the enhancement and decrement of fairness by fairness-enhancing algorithms.

**Verifying Fairness Attacks.** We apply FVGM in verifying a fairness poisoning-attack algorithm. This algorithm injects a small fraction of poisoned samples into the training data such that the classifier becomes relatively unfair (Solans, Biggio, and Castillo 2020). We apply this attack to add $\{1, 5, \dots, 160\}$ poisoned samples and measure the corresponding disparate impact and statistical parity. In Figure 5, FVGM verifies that the disparate impact of the classifier decreases and the statistical parity increases, i.e., the classifier becomes more unfair, as the number of poisoned samples increases. Therefore, FVGM shows the potential of being deployed in safety-critical applications to detect fairness attacks. For example, if we set 0.9 as an acceptable threshold of disparate impact, FVGM can raise an alarm once 160 poisoned samples are added.

Figure 5: Verifying a poisoning attack against fairness using FVGM: disparate impact and statistical parity as the number of poisoned samples grows from 1 to 160. The red line denotes the safety margin of the ML model against the attack.

**Fairness Influence Function (FIF): Tracing Sources of Unfairness.** Another application of FVGM as a fairness verifier is to quantify the effect of a subset of features on fairness. Thus, we define the fairness influence function (FIF), which computes the effect of a subset of non-sensitive features $\mathbf{S} \subseteq \mathbf{X}$ on the probability of positive prediction of a classifier given a specific sensitive group $\mathbf{A} = \mathbf{a}$:

$$\mathrm{FIF}(\mathbf{S}) \triangleq \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}, \mathcal{D}] - \Pr[\hat{Y} = 1 \mid \mathbf{A} = \mathbf{a}, \mathcal{D}_{-\mathbf{S}}].$$

FIF allows us to explain the sources of unfairness in the classifier. In practice, we compute the FIF of $\mathbf{S}$ by replacing the probability distribution of $\mathbf{S}$ with a uniformly random distribution, referred to as $\mathcal{D}_{-\mathbf{S}}$, and reporting the difference in the conditional probability of positive prediction of the classifier.

Figure 6: FIF computation for the COMPAS dataset, one panel per sensitive group (features shown include age, priors_count, juv_misd_count, juv_fel_count, juv_other_count, and c_charge_degree). For Female (left plot) and Male, the influence of age decreases and the probability of positive prediction of the classifier increases by different magnitudes.

In Figure 6, we compute the FIF for all features in the COMPAS dataset, separately for two sensitive groups: Female ("sex" = 1) and Male. This dataset concerns the likelihood of a person re-offending within the next two years. We first observe that the base values of the probability of positive prediction are different for the two groups (0.46 vs. 0.61 for Female and Male), thereby showing Male as more probable to re-offend than Female. Moreover, the FIF of the feature "age" is comparatively higher in magnitude for Male than for Female. This implies that, while deciding recidivism, the algorithm assumes that Female individuals across different ages re-offend with almost the same probability, whereas the probability of re-offending for Male individuals highly depends on age. Thus, applying FVGM to compute FIF provides insights about the sources of bias and, thus, indicators to improve fairness.
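For intuition, a minimal FIF-style computation on the toy classifier of Example 3.1 is sketched below. It assumes independent features for brevity (FVGM itself uses the Bayesian network and the S3P solver), and all weights, probabilities, and the threshold are illustrative.

```python
from itertools import product

weights = [1, 1, 1, -1]                  # sensitive feature A, then X1, X2, X3
prob    = [None, 0.4, 0.5, 0.3]          # None: A is fixed to its favored value below
tau     = 2

def ppv(p, a=1):
    """Pr[sum(w * b) >= tau | A = a] by brute force over the non-sensitive bits."""
    total = 0.0
    for bits in product([0, 1], repeat=len(weights) - 1):
        weight_sum = weights[0] * a + sum(w * b for w, b in zip(weights[1:], bits))
        pr = 1.0
        for b, pi in zip(bits, p[1:]):
            pr *= pi if b else 1 - pi
        total += pr * (weight_sum >= tau)
    return total

def fif(subset):
    """Influence of the features in `subset`: replace their distribution by uniform."""
    uniform = [0.5 if i in subset else pi for i, pi in enumerate(prob)]
    return ppv(prob) - ppv(uniform)

print(ppv(prob))     # 0.55, the PPV of the most favored group (A = 1)
print(fif({1}))      # influence of X1: 0.55 - 0.60 = -0.05
```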
## 6 Conclusion

In this paper, we propose FVGM, an efficient fairness verification framework for linear classifiers based on a novel stochastic subset-sum problem. FVGM encodes a graphical model of feature correlations, represented as a Bayesian network, and computes multiple group and causal fairness metrics accurately. We experimentally demonstrate that FVGM is not only more accurate and scalable than the existing verifiers but also applicable to practical fairness tasks, such as verifying fairness attacks and fairness-enhancing algorithms, and computing fairness influence functions. As future work, we aim to design fairness-enhancing algorithms certified by fairness verifiers such as FVGM. Since FVGM serves as an accurate and scalable fairness verifier for linear classifiers, it will be interesting to design such verifiers for other ML models.

## Acknowledgments

This work was supported in part by the National Research Foundation, Singapore under its NRF Fellowship Programme [NRF-NRFFAI1-2019-0004] and AI Singapore Programme [AISG-RP-2018-005], and NUS ODPRT Grant [R-252-000685-13]. The computational work for this paper was performed on resources of the Max Planck Institute for Software Systems, Germany and the National Supercomputing Centre, Singapore (https://www.nscc.sg).

## References

Ajunwa, I.; Friedler, S.; Scheidegger, C. E.; and Venkatasubramanian, S. 2016. Hiring by algorithm: Predicting and preventing disparate impact. Available at SSRN. URL: http://sorelle.friedler.net/papers/SSRN-id2746078.pdf.

Albarghouthi, A.; D'Antoni, L.; Drews, S.; and Nori, A. V. 2017. FairSquare: Probabilistic verification of program fairness. Proceedings of the ACM on Programming Languages, 1(OOPSLA): 1–30.

Angwin, J.; Larson, J.; Mattu, S.; and Kirchner, L. 2016. Machine bias: Risk assessments in criminal sentencing. ProPublica, May, 23.

Ankan, A.; and Panda, A. 2015. pgmpy: Probabilistic graphical models using Python. In Proceedings of the 14th Python in Science Conference (SciPy 2015). Citeseer.

Bastani, O.; Zhang, X.; and Solar-Lezama, A. 2019. Probabilistic verification of fairness properties via concentration. Proceedings of the ACM on Programming Languages, 3(OOPSLA): 1–27.

Bellamy, R. K. E.; Dey, K.; Hind, M.; Hoffman, S. C.; Houde, S.; Kannan, K.; Lohia, P.; Martino, J.; Mehta, S.; Mojsilovic, A.; Nagar, S.; Ramamurthy, K. N.; Richards, J.; Saha, D.; Sattigeri, P.; Singh, M.; Varshney, K. R.; and Zhang, Y. 2018. AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias.

Berk, R. 2019. Accuracy and fairness for juvenile justice risk assessments. Journal of Empirical Legal Studies, 16(1): 175–194.

Calmon, F.; Wei, D.; Vinzamuri, B.; Ramamurthy, K. N.; and Varshney, K. R. 2017. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems, 3992–4001.

Chavira, M.; and Darwiche, A. 2008. On probabilistic inference by weighted model counting. Artificial Intelligence, 172(6-7): 772–799.

Doshi-Velez, F.; and Kim, B. 2017. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.

Dressel, J.; and Farid, H. 2018. The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4: eaao5580.

Dwork, C.; Hardt, M.; Pitassi, T.; Reingold, O.; and Zemel, R. 2012. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 214–226.

Feldman, M.; Friedler, S. A.; Moeller, J.; Scheidegger, C.; and Venkatasubramanian, S. 2015. Certifying and removing disparate impact. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 259–268.

Ghosh, B.; Basu, D.; and Meel, K. S. 2021. Justicia: A stochastic SAT approach to formally verify fairness. In Proceedings of AAAI.

Grimmett, G.; and Stirzaker, D. 2020.
Probability and Random Processes. Oxford University Press.

Hardt, M.; Price, E.; and Srebro, N. 2016. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, 3315–3323.

John, P. G.; Vijaykeerthy, D.; and Saha, D. 2020. Verifying individual fairness in machine learning models. arXiv preprint arXiv:2006.11737.

Kamiran, F.; and Calders, T. 2012. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1): 1–33.

Kleinberg, J.; and Tardos, E. 2006. Algorithm Design. Pearson Education India.

Koller, D.; and Friedman, N. 2009. Probabilistic Graphical Models: Principles and Techniques. MIT Press.

Kusner, M. J.; Loftus, J.; Russell, C.; and Silva, R. 2017. Counterfactual fairness. In Advances in Neural Information Processing Systems, 4066–4076.

Landy, F. J.; Barnes, J. L.; and Murphy, K. R. 1978. Correlates of perceived fairness and accuracy of performance evaluation. Journal of Applied Psychology, 63(6): 751.

Littman, M. L.; Majercik, S. M.; and Pitassi, T. 2001. Stochastic Boolean satisfiability. Journal of Automated Reasoning, 27(3): 251–296.

Martinez Neda, B.; Zeng, Y.; and Gago-Masague, S. 2021. Using machine learning in admissions: Reducing human and algorithmic bias in the selection process. In Proceedings of the 52nd ACM Technical Symposium on Computer Science Education, 1323.

McGinley, A. C. 2010. Ricci v. DeStefano: A masculinities theory analysis. Harv. JL & Gender, 33: 581.

Mehrabi, N.; Morstatter, F.; Saxena, N.; Lerman, K.; and Galstyan, A. 2019. A survey on bias and fairness in machine learning. arXiv preprint arXiv:1908.09635.

Nabi, R.; and Shpitser, I. 2018. Fair inference on outcomes. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.

Pearl, J. 1985. Bayesian networks: A model of self-activated memory for evidential reasoning. In Proceedings of the 7th Conference of the Cognitive Science Society, University of California, Irvine, CA, USA, 15–17.

Pisinger, D. 1999. Linear time algorithms for knapsack problems with bounded weights. Journal of Algorithms, 33(1): 1–14.

Pleiss, G.; Raghavan, M.; Wu, F.; Kleinberg, J.; and Weinberger, K. Q. 2017. On fairness and calibration. arXiv preprint arXiv:1709.02012.

Solans, D.; Biggio, B.; and Castillo, C. 2020. Poisoning attacks on algorithmic fairness. arXiv preprint arXiv:2004.07401.

Tollenaar, N.; and Van der Heijden, P. 2013. Which method predicts recidivism best? A comparison of statistical, machine learning and data mining predictive models. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(2): 565–584.

Woeginger, G. J.; and Yu, Z. 1992. On the equal-subset-sum problem. Information Processing Letters, 42(6): 299–302.

Zafar, M. B.; Valera, I.; Rodriguez, M. G.; and Gummadi, K. P. 2017. Fairness constraints: Mechanisms for fair classification. In Artificial Intelligence and Statistics, 962–970.

Zhang, J.; and Bareinboim, E. 2018. Fairness in decision-making: The causal explanation formula. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.

Zliobaite, I. 2015. On the relation between accuracy and fairness in binary classification. arXiv preprint arXiv:1505.05723.