# Conformalized Fairness via Quantile Regression

Meichen Liu1, Lei Ding1, Dengdeng Yu2, Wulong Liu3, Linglong Kong1, Bei Jiang1
1Department of Mathematical and Statistical Sciences, University of Alberta
2Department of Mathematics, University of Texas at Arlington
3Huawei Noah's Ark Lab Canada
{meichen1,lding1,lkong,bei1}@ualberta.ca, dengdeng.yu@uta.edu, liuwulong@huawei.com
Co-corresponding authors.

Abstract

Algorithmic fairness has received increased attention in socially sensitive domains. While a rich literature on mean fairness has been established, research on quantile fairness remains sparse but vital. To address this need and highlight the significance of quantile fairness, we propose a novel framework for learning a real-valued quantile function under the fairness requirement of Demographic Parity with respect to sensitive attributes, such as race or gender, and thereby derive a reliable fair prediction interval. Using optimal transport and functional synchronization techniques, we establish theoretical guarantees of distribution-free coverage and exact fairness for the prediction interval constructed from fair quantiles. A hands-on pipeline is provided to incorporate flexible quantile regressions with an efficient fairness-adjustment post-processing algorithm. We demonstrate the superior empirical performance of this approach on several benchmark datasets. Our results show the model's ability to uncover the mechanism underlying the fairness-accuracy trade-off in a wide range of societal and medical applications.

1 Introduction

We are increasingly leaning on machine learning systems to tackle human problems. A primary objective is to develop intelligent algorithms that automatically produce accurate decisions which also enjoy equitable properties, as unintended social bias has been identified as a rising concern in various fields [13, 18, 14]. As a means of providing quantitative measures of fairness, a number of metrics have been proposed. These metrics fall into three broad categories: group fairness [3], individual fairness [25], and causality-based fairness [31]. In contrast to causality-based fairness, which requires domain knowledge to develop a fair causal structure, and individual fairness, which seeks equality only between similar individuals, group fairness does not require any prior knowledge and seeks equality for groups as a whole [6]. Among the metrics defined for group fairness, such as equalized odds [9, 29] and predictive rate parity [10], demographic parity (DP) is generic since it does not allow prediction results in aggregate to depend on sensitive attributes [1, 21, 12, 39]. In particular, an algorithm is said to satisfy DP if its prediction is independent of any given sensitive attribute [1].

There have been a number of studies on algorithmic fairness concerning DP [1, 11, 12, 21, 32, 39]. In the context of regression analysis, much attention has been paid to conditional mean inference [1, 11, 12, 32], while few works are concerned with conditional quantiles [37, 39]. Since real-world data often exhibit heterogeneity, contain extreme outliers, or fail to meet standard distributional assumptions such as Gaussianity, a discussion of fairness at conditional quantiles is both rational and essential: conditional quantiles provide a more complete understanding of the dependence structure between the response and explanatory variables [39] and better accommodate asymmetry and extreme tail behavior [38]. It should also be noted that bias or unfairness arising in mean regression may be propagated through quantile regression, and therefore must be dealt with separately; a graphical demonstration is given in Figure 1. More intuitively, consider an example from a Spanish labor market study [17, 18]. The study found that in Spain, in line with other countries, the mean wage gap between men and women is substantial: on average, women earn around 70 percent of what men earn. However, wage gaps are not uniform across pay scales; they are greater at higher quantiles than at lower quantiles. As biases and disparities at different quantiles tend to be overshadowed by the mean behavior of the entire population, we propose a novel framework that seeks fair predictions at different quantiles. It uses optimal transport techniques [2, 12], transforming bias-affected distributions into a fair Wasserstein-2 barycenter through a kernel-based functional synchronization method [8, 45], to provide fair quantile estimators.

Figure 1: An illustration of quantile fairness: for a skewed and heteroscedastic quantile estimation $\{q_{\alpha,i}\}_{i=1}^N$ affected by the sensitive attribute $S \in \{0, 1\}$ (for example, the higher quantiles of a salary distribution), the optimal fair quantile prediction $Q_{g_\alpha}(t)$, $t \in (0, 1)$, is derived through a convex combination of the conditional quantile functions $Q_{q_\alpha|S=0}$ and $Q_{q_\alpha|S=1}$.

Since quantile fairness poses a number of theoretical challenges, no previous work has been able to provide inference results, such as prediction intervals, concerning quantile fairness. It is imperative to keep in mind that fairness is only one of the two legs of the primary goal of modern machine learning algorithms, the other being accuracy. Building a reliable prediction with valid confidence is a significant challenge encountered by many machine learning algorithms [44]. Towards this end, we propose the conformalized fair quantile prediction (CFQP), inspired by the works of Romano et al. [33, 34]. Our analysis demonstrates, both mathematically and experimentally, that CFQP provides finite-sample, distribution-free validity, DP fairness at different quantiles, and precise control of the miscoverage rate, regardless of the underlying quantile algorithm.

Contributions and Outline. In this paper, we propose a new quantile-based method with valid inference that enhances both accuracy and fairness while maintaining a balance between the two. It is a novel framework that allows exact control of prediction miscoverage while simultaneously ensuring quantile fairness. The main contributions are summarized as follows:

i. We transform the problem of searching for quantiles under DP fairness into the construction of multi-marginal Wasserstein-2 barycenters via optimal transport theory [2, 12, 19]. We incorporate a novel kernel smoothing step into this construction, which is particularly advantageous for subgroups whose sample sizes are too small to obtain reliable quantile function estimates.

ii. In Section 4, we propose a conformalized fair quantile regression prediction interval (CFQP) inspired by the works of Romano et al. [33, 34].
It is mathematically proven to achieve distribution-free validity, demographic parity at different quantiles, and exact control of miscoverage rates, regardless of the quantile algorithm used. The theoretical validity of the prediction interval constructed by CFQP and the exact DP of the fair quantile estimators are given in Section 5 and the supplement.

iii. The experimental results presented in Section 6 include a numerical comparison of the proposed CFQP and fair quantile estimation with both state-of-the-art conformal and fairness-oriented methods. By dramatically reducing the discriminatory bias, our method outperforms the state-of-the-art methods while maintaining reasonably short interval lengths.

Related works. Existing approaches for building a fair mean regression broadly fall into three classes: pre-processing, in-processing, and post-processing. In particular, pre-processing methods focus on transforming the data to remove any unwanted bias [5, 31, 43]; in-processing methods aim to build fairness constraints into the training step [1, 4, 25, 29]; post-processing methods modify the trained predictor [11, 12, 28]. Few previous works have focused on quantile fairness and fair prediction intervals; the most closely related is Yang et al. [39], where a different fairness measure was used. Agarwal et al. [1] mentioned that their reduction-based approach can be adapted to quantile regression, while Williamson and Menon [37] brought forward a conditional value-at-risk fairness measure aiming to control the largest subgroup risk. As for interval fairness measures, the approach of Romano et al. [33] achieves equalized coverage among groups but does not impose fairness on the interval endpoints. Methodologically, integrating algorithmic fairness with the Wasserstein-distance-based barycenter problem has been studied in [2, 11, 12, 19, 23]; both in-processing [1, 23] and post-processing [11, 12] methods were proposed to solve classification and mean regression problems. As a post-processing method, our work is distinct from the above-mentioned methods in that it enforces DP-fairness for each population quantile and generates a fair prediction interval accordingly.

Notation. We denote by $[K]$ the set $\{1, \dots, K\}$ for an arbitrary integer $K$; $|S|$ represents the cardinality of a finite set $S$. $\mathbb{E}$ and $\mathbb{P}$ denote expectation and probability, and $\mathbb{1}\{\cdot\}$ is the indicator function. Let $\{Z_n\}_{n=1}^{\infty}$ be a sequence of random variables and $\{k_n\}_{n=1}^{\infty}$ a sequence of positive numbers; we say that $Z_n = O_p(k_n)$ if $\lim_{T \to \infty} \limsup_{n} \mathbb{P}(|Z_n| > T k_n) = 0$, that is, $Z_n/k_n = O_p(1)$. To denote equality in distribution of two random variables $A$ and $B$, we write $A \overset{d}{=} B$.

2 Problem statement

Consider the regression problem where a sensitive characteristic $S$ is available, which under U.S. law [19, 33] may refer to sex, race, age, disability, etc. We observe the triplets $(X_1, S_1, Y_1), \dots, (X_n, S_n, Y_n)$; denote $(X_i, S_i, Y_i)$ by $Z_i$, $i = 1, \dots, n$, where each $Z_i$ is a random variable in $\mathbb{R}^p \times [K] \times \mathbb{R}$. The aim is to predict the unknown value of $Y_{n+1}$ at a test point $(X_{n+1}, S_{n+1})$. Let $P$ be the joint distribution of $Z$; we assume that all the samples $\{Z_i\}_{i=1}^{n+1}$ are drawn exchangeably, with i.i.d. sampling as a special case. Our goal is to construct a marginal distribution-free prediction band $C(X_{n+1}, S_{n+1}) \subseteq \mathbb{R}$ that is likely to cover the unknown response $Y_{n+1}$ with finite-sample (non-asymptotic) validity. Formally, given a desired miscoverage rate $\alpha$, the prediction interval satisfies
$$\mathbb{P}\{Y_{n+1} \in C(X_{n+1}, S_{n+1})\} \ge 1 - \alpha \quad (1)$$
for any joint distribution $P$ and any sample size $n$, while the left and right endpoints of $C(X_{n+1}, S_{n+1})$ satisfy the fairness constraint of Demographic Parity with respect to the sensitive variable $S$.

Demographic Parity. We introduce the quantitative definition of DP in fair regression and connect DP-fairness with a quantile regressor $q_\alpha$. The result that $q_\alpha$ can be projected onto its fair counterpart using optimal transport will be invoked later. Fix a quantile level $\alpha$ (it may refer to $\alpha_{lo}$ or $\alpha_{hi}$, indicating the lower and upper quantile estimates for the prediction band $C(X_{n+1}, S_{n+1})$). Let $q_\alpha: \mathbb{R}^p \times [K] \to \mathbb{R}$ represent an arbitrary conditional quantile predictor. Denote by $\nu_{q_\alpha|s}$ the distribution of $(q_\alpha(X, S) \mid S = s)$; the cumulative distribution function (CDF) of $\nu_{q_\alpha|s}$ is given by
$$F_{\nu_{q_\alpha|s}}(t) = \mathbb{P}(q_\alpha(X, S) \le t \mid S = s). \quad (2)$$
The quantile function $Q_{\nu_{q_\alpha|s}} = F^{-1}_{\nu_{q_\alpha|s}}: [0, 1] \to \mathbb{R}$, namely the generalized inverse of $F_{\nu_{q_\alpha|s}}$, is defined for all levels $t \in (0, 1]$ as
$$Q_{\nu_{q_\alpha|s}}(t) = \inf\{y \in \mathbb{R}: F_{\nu_{q_\alpha|s}}(y) \ge t\}, \quad \text{with } Q_{\nu_{q_\alpha|s}}(0) = Q_{\nu_{q_\alpha|s}}(0^+). \quad (3)$$
To simplify the notation, we write $F_{q_\alpha|s}$ and $Q_{q_\alpha|s}$ instead of $F_{\nu_{q_\alpha|s}}$ and $Q_{\nu_{q_\alpha|s}}$, respectively, for any prediction rule $q_\alpha$. In the following, we introduce the definition of Demographic Parity (DP), which is the most commonly used notion in fairness research [1, 11, 12, 21, 29].

Definition 1 (Demographic Parity). An arbitrary prediction $g: \mathbb{R}^d \times [K] \to \mathbb{R}$ satisfies demographic parity under a distribution $P$ over $(X, S, Y)$ if $g(X, S)$ is statistically independent of the sensitive attribute $S$. Formally, for every $s, s' \in [K]$,
$$\sup_{t \in \mathbb{R}} \big| \mathbb{P}(g(X, S) \le t \mid S = s) - \mathbb{P}(g(X, S) \le t \mid S = s') \big| = 0.$$

Demographic Parity requires the predictions to be independent of the sensitive attribute; it demands that the Kolmogorov-Smirnov distance [26] (the difference between CDFs measured in the $\ell_\infty$ norm) between $\nu_{g|s}$ and $\nu_{g|s'}$ vanish for all categories $s, s'$.

3 Quantile Regression and Conformal Prediction

In this section, we recall the CQR approach for finite-sample, distribution-free prediction interval inference. Quantile regression was proposed by Koenker and Bassett [24] to estimate the $\alpha$-th quantile of the conditional distribution of $Y$ given $\tilde{X} := (X, S)$ for some quantile level $\alpha \in (0, 1)$; since then it has become pervasive, with applications such as providing prediction intervals, detecting outliers, or characterizing the entire conditional distribution [27, 20]. Denote the conditional cumulative distribution of $Y$ given $\tilde{X}$ by $F(y \mid \tilde{X} = \tilde{x}) := \mathbb{P}\{Y \le y \mid \tilde{X} = \tilde{x}\}$. The $\alpha$-th conditional quantile prediction is defined as $q_\alpha(\tilde{x}) := \inf\{y \in \mathbb{R}: F(y \mid \tilde{X} = \tilde{x}) \ge \alpha\}$. Quantile regression can be cast as an optimization problem [27, 42, 30, 36, 41] by minimizing the expected check loss $\mathbb{E}[\rho_\alpha(Y, q_\alpha(\tilde{x})) \mid \tilde{X} = \tilde{x}]$, where
$$\rho_\alpha(y, q_\alpha(\tilde{x})) = \begin{cases} \alpha\,|y - q_\alpha(\tilde{x})| & \text{if } y \ge q_\alpha(\tilde{x}), \\ (1 - \alpha)\,|y - q_\alpha(\tilde{x})| & \text{if } y < q_\alpha(\tilde{x}). \end{cases} \quad (4)$$
Quantile regression offers a principled way of judging the reliability of predictions by building a prediction interval for the new observation $(\tilde{X}_{n+1}, Y_{n+1})$. In contrast to methods that rely on asymptotic approximations, Romano et al. [33, 34] brought forward conformalized quantile regression (CQR), which combines the merits of robust quantile regression with conformal prediction, so that the finite-sample validity in Eq. (1) is guaranteed. Inspired by the split conformal method, split CQR likewise starts by splitting the data into a proper training set and a calibration set, indexed by $I_1$ and $I_2$ respectively.
Given any quantile regression algorithm $\mathcal{Q}$, we then fit two conditional quantile functions $\hat{q}_{\alpha_{lo}}$ and $\hat{q}_{\alpha_{hi}}$ on the proper training set:
$$\{\hat{q}_{\alpha_{lo}}, \hat{q}_{\alpha_{hi}}\} \leftarrow \mathcal{Q}\big(\{(\tilde{X}_i, Y_i): i \in I_1\}\big).$$
Conformity scores are then calculated to quantify the error made by the plug-in prediction interval $\hat{C}(\tilde{x}) = [\hat{q}_{\alpha_{lo}}(\tilde{x}), \hat{q}_{\alpha_{hi}}(\tilde{x})]$. We evaluate the scores on the calibration set as
$$E_k := \max\big\{\hat{q}_{\alpha_{lo}}(\tilde{X}_k) - Y_k,\; Y_k - \hat{q}_{\alpha_{hi}}(\tilde{X}_k)\big\} \quad \text{for each } k \in I_2,$$
where both undercoverage and overcoverage of the interval are taken into consideration [34]. Given a new input $\tilde{X}_{n+1}$, we construct the prediction interval for $Y_{n+1}$ as
$$C(\tilde{X}_{n+1}) = \big[\hat{q}_{\alpha_{lo}}(\tilde{X}_{n+1}) - Q_{1-\alpha}(E, I_2),\; \hat{q}_{\alpha_{hi}}(\tilde{X}_{n+1}) + Q_{1-\alpha}(E, I_2)\big],$$
where $Q_{1-\alpha}(E, I_2)$, the $(1-\alpha)(1 + 1/|I_2|)$-th empirical quantile of $\{E_k: k \in I_2\}$, conformalizes the plug-in prediction interval. Note that the constructed interval $C(\tilde{X}_{n+1})$ could be highly influenced by the sensitive variable $S$.

4 Conformal fair quantile prediction (CFQP)

In this section, we formally describe our proposed conformal fair quantile prediction (CFQP) framework for constructing DP-fairness-constrained prediction intervals. A kernel-smoothed quantile function is introduced during the functional synchronization, which improves the estimation when some subgroups are too small to give reliable sample quantile function estimates.

Definition 2 (Wasserstein-2 distance). Let $\mu$ and $\nu$ be two univariate probability measures with finite second moments. The squared Wasserstein-2 distance between $\mu$ and $\nu$ is defined as
$$W_2^2(\mu, \nu) = \inf_{\gamma \in \Gamma_{\mu,\nu}} \int_{\mathbb{R} \times \mathbb{R}} |x - y|^2 \, d\gamma(x, y),$$
where $\Gamma_{\mu,\nu}$ is the set of probability measures (couplings) on $\mathbb{R} \times \mathbb{R}$ having $\mu$ and $\nu$ as marginals.

Proposition 1 (Fair optimal prediction [12]). Assume, for each $s \in [K]$, that the univariate measure $\nu_{q_\alpha|s}$ has a density, and let $p_s = \mathbb{P}(S = s)$. Then,
$$\min_{g_\alpha \text{ is fair}} \mathbb{E}\big(q_\alpha(X, S) - g_\alpha(X, S)\big)^2 = \min_{\nu} \sum_{s \in [K]} p_s \, W_2^2\big(\nu_{q_\alpha|s}, \nu\big). \quad (5)$$
Moreover, if $g^*_\alpha$ and $\nu^*$ solve the left-hand-side and right-hand-side problems respectively, then $\nu^* = \nu_{g^*_\alpha}$ and, specifically,
$$g^*_\alpha(x, s) = \sum_{s' \in [K]} p_{s'} \, Q_{q_\alpha|s'}\big(F_{q_\alpha|s}(q_\alpha(x, s))\big). \quad (6)$$

Proposition 1 implies that the optimal fair quantile predictor for an input $(x, s)$ is obtained by a nonlinear transformation of the vector $[q_\alpha(x, s)]_{s=1}^K$, linked to a Wasserstein barycenter problem [2, 12]. The explicit closed-form solution comes from [2, 12, 16] and relies on the classical characterization of the optimal coupling for the one-dimensional Wasserstein-2 distance. A rigorous proof is given in [12, 23]. It shows that a minimizer $g^*_\alpha$ of the $L_2$-risk can be used to construct $\nu^*$ and, vice versa, given $\nu^*$ there is an explicit expression, Eq. (6), for the multi-marginal Wasserstein barycenter [2].

First, we provide a sketch of the CFQP approach. We start by splitting the whole training data into a proper training set $I_1$ and a calibration set $I_2$, then fit an arbitrary quantile regression algorithm $\mathcal{Q}$ on $I_1$: $\{\hat{q}_{\alpha_{lo}}, \hat{q}_{\alpha_{hi}}\} \leftarrow \mathcal{Q}(\{(\tilde{X}_i, Y_i): i \in I_1\})$. We apply the fitted quantile algorithm to the calibration set $I_2$ to obtain the predictions $\{\hat{q}_{\alpha_{lo}}(\tilde{X}_i), \hat{q}_{\alpha_{hi}}(\tilde{X}_i)\}_{i \in I_2}$. Since the quantile estimates for $I_2$ will be used for conformalization, it is essential to transform them into fair ones, i.e., $\hat{g}_{\alpha,i}$, $i \in I_2$ (Eq. (9)), through Algorithm 2. Finally, for a test point $\tilde{X}_{n+1}$ we predict two quantile estimates $\hat{q}_\alpha(x, s)$, possibly affected by the sensitive variable $S$, using $\mathcal{Q}$, and then apply the functional synchronization (Algorithm 2) and calibration (Algorithm 1) steps in turn to generate the fairness-constrained prediction interval $C(\tilde{X}_{n+1})$ for $Y_{n+1}$. Next, we explicate in detail how to remove the effect of the sensitive variable from the predicted quantile estimates.
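For illustration, the barycenter map of Eq. (6) can be sketched in a few lines for $K = 2$ groups using plain empirical CDF and quantile plug-ins; the data, helper functions, and variable names below are illustrative assumptions and are not taken from the released implementation.

```python
# Illustrative sketch of Eq. (6) with two groups; data and estimators are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
# Biased quantile predictions q_alpha(X, S), split by the sensitive attribute S in {0, 1}.
q_s0 = rng.normal(loc=0.0, scale=1.0, size=500)    # group S = 0
q_s1 = rng.normal(loc=1.0, scale=2.0, size=500)    # group S = 1 (shifted and more spread out)
groups = {0: q_s0, 1: q_s1}
weights = {0: 0.5, 1: 0.5}                          # group proportions p_s

def ecdf(sample, t):
    """Empirical CDF F_{q|s}(t)."""
    return np.mean(sample <= t)

def fair_transform(q_value, s):
    """Eq. (6): g*(x, s) = sum_{s'} p_{s'} Q_{q|s'}( F_{q|s}( q(x, s) ) )."""
    rank = ecdf(groups[s], q_value)                 # rank of q(x, s) within its own group
    return sum(weights[sp] * np.quantile(groups[sp], rank) for sp in groups)

# Predictions sitting at the same within-group rank are mapped to (approximately) the
# same barycenter value, so the transformed prediction no longer depends on S.
print(fair_transform(np.quantile(q_s0, 0.9), s=0))
print(fair_transform(np.quantile(q_s1, 0.9), s=1))
```

In the actual method, the raw empirical CDF and quantile function above are replaced by the jittered and kernel-smoothed estimators of Eqs. (7) and (8) described next.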
By Proposition 1, the optimal fair quantiles take the form of Eq. (6). We therefore propose an empirical optimal fair quantile estimator $\hat{g}_\alpha$ that relies on the plug-in principle. In particular, Eq. (6) indicates that for each quantile level $\alpha$ and each category $s \in [K]$, we only need estimators of the regression function $\hat{q}_\alpha$, the proportion $\hat{p}_s$, the cumulative distribution function $F_{\hat{q}_\alpha|s}$, and the quantile function $Q_{\hat{q}_\alpha|s}$. Note that we can empirically estimate the CDF and quantile function for each sensitive group in the calibration set $I_2$ separately. Hence, for each quantile level $\alpha$, let $N_s := |I_2^s|$, and let the quantile estimates $(\hat{q}^s_1, \hat{q}^s_2, \dots, \hat{q}^s_{N_s})$ be calculated through the quantile regression $\mathcal{Q}$ fitted on the training set $I_1$ ($\hat{q}^s_i$ depends on the quantile level $\alpha$; we suppress $\alpha$ for notational simplicity). We define the augmented random variable for each data point in $I_2$,
$$\tilde{q}^s_i := \hat{q}^s_i + U^s_i([-\sigma, \sigma]), \quad i \in I_2^s, \; s \in [K],$$
where the $U^s_i$ are i.i.d. random variables, uniformly distributed on $[-\sigma, \sigma]$ for some small positive $\sigma$ and independent of all previously introduced random variables. The noise serves as a smoothing device: the resulting random variables $\tilde{q}^s_i$ are i.i.d. and continuous for any $P(Y \mid \tilde{X})$ and any $\mathcal{Q}$. Otherwise, the original $\hat{q}^s_i$ might have atoms, resulting in a non-zero probability of observing ties within the groups $\{\hat{q}^s_i\}$, $s = 1, \dots, K$. This trick, known as jittering [7, 12], is often used in data visualization and serves here for tie-breaking. Using the above quantities, we build the CDF and quantile function estimators for each subgroup $s \in [K]$ as follows:
$$\hat{F}_{q_\alpha|s}(t) = N_s^{-1} \sum_{i=1}^{N_s} \mathbb{1}\{\tilde{q}^s_i \le t\}, \quad \text{for all } t \in \mathbb{R}, \quad (7)$$
$$\hat{Q}_{2, q_\alpha|s}(t) = \int_0^1 \hat{F}^{-1}_{q_\alpha|s}(v) \, K_h(t - v) \, dv, \quad t \in (0, 1). \quad (8)$$
The smoothed kernel estimator in Eq. (8) was first proposed by Cheng and Parzen [8], where $K_h(\cdot) = K(\cdot/h)/h$ is a kernel function chosen as a probability density that is symmetric around zero, with bandwidth parameter $h > 0$. If the quantile function $Q_{2, q_\alpha|s}$ is differentiable, its derivative $Q'_{2, q_\alpha|s}(t)$, $t \in (0, 1)$, is the quantile density function [8, 45]. We hereby give an estimation bound for Eq. (8) using kernel smoothing. For this purpose, we invoke conditions (A1)-(A3), which are needed for the following proposition; they can also be found in [45] and are included in the supplementary material.

Proposition 2. Under conditions (A1), (A2), and (A3), we have
$$\sup_{t \in [0,1]} \big| \hat{Q}_{2, q_\alpha|s}(t) - Q_{q_\alpha|s}(t) \big| = O_p\big(N_s^{-1/2}\big), \quad s = 1, \dots, K.$$

The motivation for including a smoothing step is twofold. First, smoothing the quantile function eliminates the difficulty of defining arbitrary quantiles from the empirical quantile function when the sample sizes of subgroups are small. Second, the proposed kernel smoothing improves second-order efficiency by alleviating the relative deficiency [15, 45].

Remark 1. One can utilize various kernels, such as the Gaussian or Epanechnikov kernel, with adaptive bandwidths for better practical performance. Other smoothing methods, such as splines or local linear fitting, can likewise be applied with equal effectiveness.

Consequently, for each quantile level $\alpha$, the functionally synchronized quantile estimator is
$$\hat{g}_{\alpha, i} = \sum_{s'=1}^{K} \hat{p}_{s'} \, \hat{Q}_{2, q_\alpha|s'}\big(\hat{F}_{q_\alpha|s}(\tilde{q}^s_i)\big), \quad i \in I_2. \quad (9)$$
The proposed estimator can be viewed as the empirical counterpart, with additional randomization, of the explicit fair optimal formula in Eq. (6).
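To illustrate the synchronization step, the following is a minimal sketch of Eqs. (7)-(9), assuming a Gaussian kernel, a fixed bandwidth $h$, and a midpoint-rule approximation of the integral in Eq. (8); the experiments in Section 6 use local linear smoothing instead, and all function names below are hypothetical.

```python
# Illustrative sketch of the jittering, kernel-smoothed quantile function, and
# functional synchronization steps (Eqs. (7)-(9)); choices here are assumptions.
import numpy as np

def jitter(q_hat, sigma, rng):
    """Pre-step for Eq. (7): add U([-sigma, sigma]) noise to break ties (jittering)."""
    return q_hat + rng.uniform(-sigma, sigma, size=len(q_hat))

def ecdf(q_tilde, t):
    """Eq. (7): empirical CDF of the jittered predictions within one group."""
    return np.mean(q_tilde <= t)

def smoothed_quantile(q_tilde, t, h=0.05, n_grid=512):
    """Eq. (8): integrate the empirical quantile function against the kernel K_h(t - v)."""
    v = (np.arange(n_grid) + 0.5) / n_grid            # midpoint grid on (0, 1)
    emp_qf = np.quantile(q_tilde, v)                  # empirical quantile function F^{-1}(v)
    k = np.exp(-0.5 * ((t - v) / h) ** 2) / (h * np.sqrt(2.0 * np.pi))  # Gaussian K_h(t - v)
    k /= k.sum() / n_grid                             # renormalize kernel mass lost at the boundaries
    return float(np.mean(emp_qf * k))                 # midpoint-rule value of the integral

def g_hat(q_value, s, q_by_group, p_by_group):
    """Eq. (9): plug the within-group rank into each group's smoothed quantile function."""
    rank = ecdf(q_by_group[s], q_value)
    return sum(p_by_group[sp] * smoothed_quantile(q_by_group[sp], rank)
               for sp in q_by_group)
```

Replacing `smoothed_quantile` with a spline or local linear smoother, as allowed by Remark 1, leaves the rest of the construction unchanged.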
To conformalize the adjusted fair quantiles in Eq. (9), we compute conformity scores $E_i$ for each $i \in I_2$ that quantify the error made by the plug-in fair prediction interval $\hat{C}_g(\tilde{x}) = [\hat{g}_{\alpha_{lo}}(\tilde{x}), \hat{g}_{\alpha_{hi}}(\tilde{x})]$. The scores are evaluated on the calibration set as
$$E_i := \max\{\hat{g}_{\alpha_{lo}, i} - Y_i,\; Y_i - \hat{g}_{\alpha_{hi}, i}\}. \quad (10)$$
At the last stage, for a new data point $\tilde{X}_{n+1} = (x, s)$ and $\alpha \in \{\alpha_{lo}, \alpha_{hi}\}$, define $\tilde{q}^s_{1,i} = \hat{q}^s_i + U^s_i([-\sigma, \sigma])$, $i \in I_1^s$, and $\tilde{q}_\alpha(x, s) = \hat{q}_\alpha(x, s) + U([-\sigma, \sigma])$. We use the empirical CDF of the training set (again, $\hat{q}^s_i$ depends on the quantile level $\alpha$),
$$\hat{F}_{1, q_\alpha|s}(t) := \frac{1}{|I_1^s| + 1} \Bigg( \sum_{i=1}^{|I_1^s|} \mathbb{1}\{\tilde{q}^s_{1,i} < t\} + U([0, 1]) \Big( 1 + \sum_{i=1}^{|I_1^s|} \mathbb{1}\{\tilde{q}^s_{1,i} = t\} \Big) \Bigg), \quad (11)$$
to estimate the location $\hat{F}_{1, q_\alpha|s}(\tilde{q}_\alpha(x, s))$. The fair quantile estimator is then built as
$$\hat{g}_\alpha(x, s) = \sum_{s'=1}^{K} \hat{p}_{s'} \, \hat{Q}_{2, q_\alpha|s'}\big(\hat{F}_{1, q_\alpha|s}(\tilde{q}_\alpha(x, s))\big), \quad \alpha \in \{\alpha_{lo}, \alpha_{hi}\}. \quad (12)$$
The fair prediction interval for $Y_{n+1}$ is constructed as
$$C(\tilde{X}_{n+1}) = \big[\hat{g}_{\alpha_{lo}}(x, s) - Q_{1-\alpha}(E, I_2),\; \hat{g}_{\alpha_{hi}}(x, s) + Q_{1-\alpha}(E, I_2)\big], \quad (13)$$
where $Q_{1-\alpha}(E, I_2)$, the $(1-\alpha)(1 + 1/|I_2|)$-th empirical quantile of $\{E_i: i \in I_2\}$, adjusts the plug-in fair prediction interval. We present the pseudo-code of CFQP and of the construction of $\hat{g}_\alpha$ in Eq. (9) in Algorithms 1 and 2, respectively.

Algorithm 1 Split Conformal Fair Prediction (CFQP)
Input: $D = \{(X_i, S_i, Y_i)\}_{i=1}^n$; miscoverage level $\alpha \in (0, 1)$; quantile regression algorithm $\mathcal{Q}$.
1: Randomly split $[n]$ into disjoint proper training and calibration index sets $I_1$, $I_2$.
2: Fit two conditional quantile functions on the training set: $\{\hat{q}_{\alpha_{lo}}, \hat{q}_{\alpha_{hi}}\} \leftarrow \mathcal{Q}(\{(X_i, S_i, Y_i): i \in I_1\})$.
3: Call Functional Synchronization (Algorithm 2) to calculate $\{\hat{g}_{\alpha_{lo}}, \hat{g}_{\alpha_{hi}}\}$ for each $i \in I_2$.
4: Compute $E_i \leftarrow \max\{\hat{g}_{\alpha_{lo}}(X_i) - Y_i,\; Y_i - \hat{g}_{\alpha_{hi}}(X_i)\}$ for $i \in I_2$.
5: Compute $Q_{1-\alpha}(E, I_2) \leftarrow$ the $(1-\alpha)(1 + 1/|I_2|)$-th empirical quantile of $\{E_i: i \in I_2\}$.
6: For a new test point $(x, s)$, compute $\{\hat{g}_{\alpha_{lo}}(x, s), \hat{g}_{\alpha_{hi}}(x, s)\}$ through Algorithm 2.
Output: Fair prediction interval $C(x, s) = [\hat{g}_{\alpha_{lo}}(x, s) - Q_{1-\alpha}(E, I_2),\; \hat{g}_{\alpha_{hi}}(x, s) + Q_{1-\alpha}(E, I_2)]$ for $(X_{n+1}, S_{n+1}) = (x, s)$.

Algorithm 2 Functional Synchronization
Input: Calibration set $\{(X_i, S_i)\}_{i \in I_2}$ or new point $(x, s)$; base quantile estimator $\mathcal{Q}$; slack parameter $\sigma$; training set $\{(X_i, S_i)\}_{i \in I_1}$.
1: if calibration set $\{(X_i, S_i)\}_{i \in I_2}$ then
2:   for $\alpha \in \{\alpha_{lo}, \alpha_{hi}\}$ do
3:     $\{\tilde{q}_\alpha(X_i, S_i)\} \leftarrow \{q_\alpha(X_i, S_i) + U([-\sigma, \sigma])\}_{i \in I_2}$   ▷ $U([-\sigma, \sigma])$ is used for tie-breaking
4:     for $s \in [K]$ do
5:       Compute $\hat{F}_{q_\alpha|s}(t)$ and $\hat{F}^{-1}_{2, q_\alpha|s}(t)$ by Eqs. (7) and (8).
6:       Obtain $\hat{g}_\alpha(X_i, S_i) \leftarrow \sum_{s'=1}^{K} \hat{p}_{s'} \hat{F}^{-1}_{2, q_\alpha|s'}\big(\hat{F}_{q_\alpha|s}(\tilde{q}_\alpha(X_i, S_i))\big)$, $i \in I_2$.
7:     end for
8:   end for
9: else if new test point $(x, s)$ then
10:   for $\alpha \in \{\alpha_{lo}, \alpha_{hi}\}$ do
11:     $\{\tilde{q}^s_{1,\alpha}\} \leftarrow \{\hat{q}^s_\alpha + U([-\sigma, \sigma])\}_{i \in I_1^s}$ and $\tilde{q}_\alpha(x, s) \leftarrow \hat{q}_\alpha(x, s) + U([-\sigma, \sigma])$
12:     Compute $\hat{g}_\alpha(x, s) \leftarrow \sum_{s'=1}^{K} \hat{p}_{s'} \hat{F}^{-1}_{2, q_\alpha|s'}\big(\hat{F}_{1, q_\alpha|s}(\tilde{q}_\alpha(x, s))\big)$ by Eqs. (8) and (7).
13:   end for
14: end if
Output: Fair quantile prediction $\hat{g}_\alpha$ for the calibration set or the new test point $(x, s)$.

5 Theoretical results

In this section, we provide a statistical analysis of the proposed algorithm with coverage and DP-fairness guarantees.

Theorem 1 (Prediction coverage guarantee). If $(\tilde{X}_i, Y_i)$, $i = 1, \dots, n+1$, are exchangeable, then the prediction interval $C(\tilde{X}_{n+1})$ constructed by the split CFQP algorithm satisfies
$$\mathbb{P}\{Y_{n+1} \in C(\tilde{X}_{n+1})\} \ge 1 - \alpha.$$
Moreover, if the conformity scores $E_i$ are almost surely distinct, the prediction interval is nearly exactly calibrated:
$$\mathbb{P}\{Y_{n+1} \in C(\tilde{X}_{n+1})\} \le 1 - \alpha + 1/(|I_2| + 1).$$

Remark 2. Corollary 1 in the supplementary material gives an extension of the conformalization step that allows the coverage error to be spread arbitrarily over the left and right tails. Controlling the left and right tails independently yields a stronger coverage guarantee.
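For concreteness, the conformalization step of Algorithm 1 (Eqs. (10)-(13)), whose output is the interval covered by Theorem 1, can be sketched as follows, assuming the synchronized fair quantiles for the calibration and test points have already been produced by Algorithm 2; the array names and the numpy-based implementation are illustrative, not the released code.

```python
# Illustrative sketch of the conformalization step (Eqs. (10)-(13)); inputs are assumed
# to be the fair quantiles returned by the functional synchronization step.
import numpy as np

def conformalize_fair_interval(y_cal, g_lo_cal, g_hi_cal, g_lo_test, g_hi_test, alpha=0.1):
    """Widen the fair plug-in interval [g_lo, g_hi] so that the coverage in Theorem 1 holds."""
    # Eq. (10): conformity scores on the calibration set I2.
    scores = np.maximum(g_lo_cal - y_cal, y_cal - g_hi_cal)
    # The (1 - alpha)(1 + 1/|I2|)-th empirical quantile of the scores.
    n2 = len(scores)
    k = min(int(np.ceil((1.0 - alpha) * (n2 + 1))), n2)  # guard for very small calibration sets
    q_hat = np.sort(scores)[k - 1]
    # Eq. (13): symmetric correction of the lower and upper fair quantiles.
    return g_lo_test - q_hat, g_hi_test + q_hat
```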
Theorem 2 (Demographic parity guarantee). For any joint distribution $P$ of $(X, S, Y)$, any $\sigma > 0$, and any base quantile estimator $\hat{q}_\alpha: \mathbb{R}^p \times [K] \to \mathbb{R}$ constructed on labeled data, the estimator $\hat{g}_\alpha$ defined in Eq. (12) satisfies
$$(\hat{g}_\alpha(X, S) \mid S = s) \overset{d}{=} (\hat{g}_\alpha(X, S) \mid S = s') \quad \text{for all } s, s' \in [K].$$

The quantile DP guarantees provided by Theorem 2 are derived directly from distribution-free properties of rank and order statistics in Lemma E.1 and Theorem 7.2 of Chzhen and Schreuder [11]; further information can be found in their paper and the references therein. We extend the estimators developed in their work to obtain the estimator $\hat{g}$ for quantile regression, with solid theoretical foundations. In contrast to the seminal work of Chzhen and Schreuder [11], we regard the densities and quantile functions as functional data, namely as samples of stochastic processes. This approach is typically used by economists when dealing with densities of income distributions across populations. The potential performance improvement resulting from the kernel smoothing step is demonstrated in the supplementary material.

6 Experiments

To evaluate our proposed method (we utilize the local linear fitting smoothing method in the experiments), we report the performance of the post-processing fairness adjustment on quantiles through four benchmark datasets: Law School (LAW), Community&Crime (CRIME), MEPS 2016 (MEPS), and Government Salary (GOV). A detailed description of these datasets can be found in the supplementary material. The code for reproducing our results is available at https://github.com/Lei-Ding07/Conformal_Quantile_Fairness.

Table 1: Results reported on the test set over 200 repeated experiments with α = 0.1 (each entry is mean ± standard error). CQR refers to the conformalized quantile regression in [34]. Ln, RF, and NN denote the linear, random forest, and neural network quantile regression models. Our methods are shown in bold.

|  | Coverage (LAW) | Length (LAW) | KS(lo) (LAW) | KS(hi) (LAW) | Coverage (CRIME) | Length (CRIME) | KS(lo) (CRIME) | KS(hi) (CRIME) |
|---|---|---|---|---|---|---|---|---|
| Ln-CQR | 90.16 ± 0.47 | 0.46 ± .004 | 0.39 ± 0.03 | 0.11 ± 0.02 | 90.22 ± 1.88 | 1.30 ± 0.05 | 0.62 ± 0.06 | 0.53 ± 0.06 |
| **Ln-CFQP** | 90.02 ± 0.51 | 0.46 ± .004 | 0.02 ± 0.01 | 0.02 ± 0.01 | 90.44 ± 1.84 | 1.64 ± 0.05 | 0.11 ± 0.03 | 0.12 ± 0.04 |
| RF-CQR | 90.25 ± 0.55 | 0.39 ± .005 | 0.20 ± 0.02 | 0.15 ± 0.02 | 90.27 ± 1.66 | 1.15 ± 0.03 | 0.64 ± 0.05 | 0.59 ± 0.05 |
| **RF-CFQP** | 90.11 ± 0.48 | 0.38 ± .004 | 0.02 ± .008 | 0.02 ± .009 | 90.34 ± 1.84 | 1.54 ± 0.04 | 0.12 ± 0.04 | 0.12 ± 0.03 |
| NN-CQR | 90.00 ± 0.50 | 0.40 ± 0.02 | 0.41 ± 0.07 | 0.18 ± 0.05 | 90.01 ± 1.89 | 1.16 ± 0.05 | 0.70 ± 0.05 | 0.63 ± 0.06 |
| **NN-CFQP** | 90.01 ± 0.51 | 0.39 ± 0.01 | 0.02 ± .009 | 0.03 ± .009 | 89.95 ± 1.62 | 1.54 ± 0.12 | 0.12 ± 0.04 | 0.12 ± 0.03 |

|  | Coverage (MEPS) | Length (MEPS) | KS(lo) (MEPS) | KS(hi) (MEPS) | Coverage (GOV) | Length (GOV) | KS(lo) (GOV) | KS(hi) (GOV) |
|---|---|---|---|---|---|---|---|---|
| Ln-CQR | 89.92 ± 0.66 | 0.66 ± 0.01 | 0.09 ± 0.03 | 0.33 ± 0.05 | 90.00 ± 0.19 | 0.79 ± .002 | 0.26 ± .014 | 0.44 ± 0.02 |
| **Ln-CFQP** | 89.99 ± 0.69 | 0.66 ± 0.01 | 0.03 ± 0.01 | 0.03 ± 0.01 | 90.02 ± 0.19 | 0.78 ± .002 | 0.05 ± 0.01 | 0.04 ± 0.01 |
| RF-CQR | 90.07 ± 0.65 | 0.38 ± .009 | 0.19 ± 0.02 | 0.30 ± 0.03 | 90.03 ± 0.17 | 0.61 ± .002 | 0.29 ± 0.01 | 0.28 ± 0.02 |
| **RF-CFQP** | 90.38 ± 0.60 | 0.39 ± 0.01 | 0.02 ± 0.01 | 0.03 ± 0.01 | 90.03 ± 0.17 | 0.62 ± .002 | 0.05 ± 0.01 | 0.04 ± 0.01 |
| NN-CQR | 89.95 ± 0.68 | 0.37 ± 0.04 | 0.24 ± 0.09 | 0.37 ± 0.06 | 90.01 ± 0.19 | 0.58 ± 0.01 | 0.28 ± 0.03 | 0.32 ± 0.04 |
| **NN-CFQP** | 89.97 ± 0.61 | 0.37 ± 0.04 | 0.03 ± 0.01 | 0.04 ± 0.01 | 90.01 ± 0.18 | 0.59 ± 0.01 | 0.05 ± 0.01 | 0.05 ± 0.01 |

Figure 2: Results for estimating the lower (αlo) and upper (αhi) quantiles using several state-of-the-art DP-fairness methods on all the datasets. "Unfair", "Chzhen", and "Agarwal" stand for the linear quantile model without fairness adjustment, the barycenter method [12], and the reduction-based algorithm [1], respectively. We present the MAE and KS of the lower and upper quantile estimations. A linear quantile model is used in this comparison.
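As a reference for the KS columns in Table 1 and the KS axis in Figure 2, the group-wise Kolmogorov-Smirnov statistic (formally defined in the next paragraph) could be computed along the following lines; this is an illustrative sketch, not the released evaluation code.

```python
# Illustrative sketch of the empirical KS unfairness metric across sensitive groups.
import numpy as np

def ks_unfairness(pred, s):
    """Largest sup-norm gap between empirical CDFs of predictions across groups (needs >= 2 groups)."""
    pred, s = np.asarray(pred), np.asarray(s)
    grid = np.sort(np.unique(pred))                   # evaluate all group CDFs on the observed values
    cdfs = [np.mean(pred[s == g][:, None] <= grid, axis=0) for g in np.unique(s)]
    return max(np.max(np.abs(c1 - c2))
               for i, c1 in enumerate(cdfs) for c2 in cdfs[i + 1:])
```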
We measure the violation of the DP-fairness of the quantiles required by Definition 1 through the empirical Kolmogorov-Smirnov (KS) distance. With $Z_s = \{(X, S, Y) \in Z_{\mathrm{test}}: S = s\}$ for all $s \in [K]$, the value represents the disparity between groups:
$$\mathrm{KS}(g_\alpha) = \max_{s, s' \in [K]} \sup_{t \in \mathbb{R}} \Bigg| \frac{1}{|Z_s|} \sum_{(X, S, Y) \in Z_s} \mathbb{1}\{g_\alpha(X, S) \le t\} - \frac{1}{|Z_{s'}|} \sum_{(X, S, Y) \in Z_{s'}} \mathbb{1}\{g_\alpha(X, S) \le t\} \Bigg|.$$

Experiment results. In Table 1, we report the average performance of the proposed CFQP over 200 random training-test splits, as well as that of the baseline model CQR, in terms of the coverage rate, the length of the prediction interval, and the KS distance of the interval endpoints. We split the training data into proper training and calibration sets of equal size. Throughout the experiments, the nominal miscoverage rate is fixed at α = 0.1. Among pre-existing quantile algorithms, we select three leading variants: linear models [24], random forests [27], and neural networks [35]. Overall, our CFQP, like CQR, constructs prediction bands attaining the desired coverage of around 90%, as claimed in Theorem 1. Random-forest-based approaches tend to be slightly more conservative than the other two with respect to the coverage rate across all four datasets. In the KS columns concerning the DP fairness of the interval endpoints, our CFQP method greatly reduces the discriminatory bias (quantified by KS), by 70% up to 90% compared with CQR. The lengths of the prediction intervals mostly remain the same, except for the CRIME dataset, probably due to its inherently high discriminatory bias between sensitive groups. In addition, CFQP is more robust according to the standard errors over experiment repetitions.

Figure 2 presents the comparison of our post-processing fairness adjustment procedure on quantiles $\hat{g}_\alpha$, evaluated on the test set $Z_{\mathrm{test}} = \{(X_i, S_i, Y_i)\}_{i=1}^{n_{\mathrm{test}}}$, with several state-of-the-art fairness algorithms. Since most of these algorithms target mean prediction, there is no direct comparison with our quantile fairness method; we accordingly modified the existing methods into quantile versions for comparison. A detailed description can be found in the supplementary material. Each point in Figure 2 represents the mean over 200 repeated experiments, with the KS distance on the x-axis and the Mean Absolute Error (MAE) on the y-axis, where $\mathrm{MAE}(g_\alpha) = \frac{1}{n_{\mathrm{test}}} \sum_{Z_{\mathrm{test}}} |Y - g_\alpha(X, S)|$ measures the prediction error of the quantiles; the bars are the standard errors on both axes. Optimal points lie in the bottom-left corner of each plot, where both a small KS distance and a small MAE are achieved. In each subplot, our method consistently performs best, with the smallest KS distance while keeping the MAE equal to or only slightly higher than that of the other methods or the unfair version. Note that, due to the pronounced right skewness of the real datasets, the MAE of the upper quantile estimate is larger than that of the lower quantile for all quantile approaches. Additional ablation studies on computational time and on the kernel smoothing approach are included in the supplementary material.

7 Conclusion and future work

Conformal fair quantile regression is a novel approach for creating fair prediction intervals that attain valid coverage and achieve independence from sensitive attributes, while making only minimal modifications to the quantile endpoints. It is particularly advantageous for heteroskedastic and/or asymmetric datasets and is robust to outliers.
Our method is supported by rigorous distribution-free coverage and exact DP-fairness guarantees, as proved in the theoretical sections. We conducted several real-data experiments demonstrating the effectiveness of our method in achieving exact coverage while imposing DP-fairness in practice; the method outperforms several state-of-the-art approaches in these comparisons. A limitation of our numerical experiments is that we simply utilize the local linear smoothing method when defining the quantile functions of the subgroups; we believe incorporating more flexible kernel smoothing approaches [40, 45] would further improve the experimental performance. As potential future work, it would be valuable to introduce a DP relaxation framework based on an unfairness measure, in a similar manner to [11, 37], allowing the level of unfairness in quantile estimates to be controlled. We also expect to extend the scope to other fairness metrics that depend on the underlying response, such as equalizing the quantile loss across groups by incorporating a fairness penalty term in training, or the fairness metric defined via conditional value-at-risk. In addition, it is worthwhile to explore quantile fairness in other areas of AI, such as NLP [14, 22], computer vision, recommendation systems, etc.

Acknowledgements

This work was supported by the Economic and Social Research Council (ESRC ES/T012382/1) and the Social Sciences and Humanities Research Council (SSHRC 2003-2019-0003) under the scheme of the Canada-UK Artificial Intelligence Initiative. The project title is "BIAS: Responsible AI for Gender and Ethnic Labour Market Equality". We thank Yang Hu, Xiaojun Du, and Matthew Pietrosanu for their helpful discussion and valuable input, and the reviewers for their constructive suggestions and comments.

References

[1] A. Agarwal, M. Dudík, and Z. S. Wu. Fair regression: Quantitative definitions and reduction-based algorithms. In International Conference on Machine Learning, pages 120-129. PMLR, 2019.
[2] M. Agueh and G. Carlier. Barycenters in the Wasserstein space. SIAM Journal on Mathematical Analysis, 43(2):904-924, 2011. doi: 10.1137/100805741.
[3] S. Barocas, M. Hardt, and A. Narayanan. Fairness in machine learning. NIPS Tutorial, 1:2, 2017.
[4] R. Berk, H. Heidari, S. Jabbari, M. Joseph, M. Kearns, J. Morgenstern, S. Neel, and A. Roth. A convex framework for fair regression. arXiv preprint arXiv:1706.02409, 2017.
[5] F. Calmon, D. Wei, B. Vinzamuri, K. Natesan Ramamurthy, and K. R. Varshney. Optimized pre-processing for discrimination prevention. Advances in Neural Information Processing Systems, 30, 2017.
[6] A. Castelnovo, R. Crupi, G. Greco, D. Regoli, I. G. Penco, and A. C. Cosentini. A clarification of the nuances in the fairness metrics landscape. Scientific Reports, 12(1):1-21, 2022.
[7] J. M. Chambers, W. S. Cleveland, B. Kleiner, and P. A. Tukey. Graphical Methods for Data Analysis. Chapman and Hall/CRC, 2018.
[8] C. Cheng and E. Parzen. Unified estimators of smooth quantile and quantile density functions. Journal of Statistical Planning and Inference, 59(2):291-307, 1997.
[9] J. Cho, G. Hwang, and C. Suh. A fair classifier using kernel density estimation. Advances in Neural Information Processing Systems, 33:15088-15099, 2020.
[10] A. Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2):153-163, 2017.
[11] E. Chzhen and N. Schreuder. A minimax framework for quantifying risk-fairness trade-off in regression. The Annals of Statistics, 50(4):2416-2442, 2022.
[12] E. Chzhen, C. Denis, M. Hebiri, L. Oneto, and M. Pontil. Fair regression with Wasserstein barycenters. Advances in Neural Information Processing Systems, 33:7321-7331, 2020.
[13] P. Czarnowska, Y. Vyas, and K. Shah. Quantifying social biases in NLP: A generalization and empirical comparison of extrinsic fairness metrics. Transactions of the Association for Computational Linguistics, 9:1249-1267, 2021.
[14] L. Ding, D. Yu, J. Xie, W. Guo, S. Hu, M. Liu, L. Kong, H. Dai, Y. Bao, and B. Jiang. Word embeddings via causal inference: Gender bias reducing and semantic information preserving. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 11864-11872, 2022.
[15] M. Falk. Relative deficiency of kernel type estimators of quantiles. The Annals of Statistics, pages 261-268, 1984.
[16] S. Gallón, J.-M. Loubes, and E. Maza. Statistical properties of the quantile normalization method for density curve alignment. Mathematical Biosciences, 242(2):129-142, 2013. doi: 10.1016/j.mbs.2012.12.007.
[17] J. García, P. J. Hernández, and A. López-Nicolás. How wide is the gap? An investigation of gender wage differences using quantile regression. Empirical Economics, 26(1):149-167, Mar 2001. doi: 10.1007/s001810000050.
[18] J. Gardeazabal and A. Ugidos. Gender wage discrimination at quantiles. Journal of Population Economics, 18(1):165-179, Mar 2005. doi: 10.1007/s00148-003-0172-z.
[19] T. L. Gouic, J.-M. Loubes, and P. Rigollet. Projection to fairness in statistical learning. arXiv:2005.11720 [cs, math, stat], Jun 2020.
[20] P. Han, L. Kong, J. Zhao, and X. Zhou. A general framework for quantile estimation with incomplete data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 81(2):305-333, 2019.
[21] H. J. Holzer and D. Neumark. Affirmative action: What do we know? Journal of Policy Analysis and Management, 25(2):463-490, 2006.
[22] S. Hu, J. A. Al-Ani, K. D. Hughes, N. Denier, A. Konnikov, L. Ding, J. Xie, Y. Hu, M. Tarafdar, B. Jiang, et al. Balancing gender bias in job advertisements with text-level bias mitigation. Frontiers in Big Data, 5, 2022.
[23] R. Jiang, A. Pacchiano, T. Stepleton, H. Jiang, and S. Chiappa. Wasserstein fair classification. In Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, pages 862-872. PMLR, Aug 2020.
[24] R. Koenker and G. Bassett. Regression quantiles. Econometrica: Journal of the Econometric Society, pages 33-50, 1978.
[25] M. J. Kusner, J. Loftus, C. Russell, and R. Silva. Counterfactual fairness. Advances in Neural Information Processing Systems, 30, 2017.
[26] E. L. Lehmann, J. P. Romano, and G. Casella. Testing Statistical Hypotheses, volume 3. Springer, 2005.
[27] N. Meinshausen and G. Ridgeway. Quantile regression forests. Journal of Machine Learning Research, 7(6), 2006.
[28] S. Milli, J. Miller, A. D. Dragan, and M. Hardt. The social cost of strategic classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 230-239, 2019.
[29] L. Oneto, M. Donini, and M. Pontil. General fair empirical risk minimization. arXiv:1901.10080 [cs, stat], Dec 2019.
[30] M. Pietrosanu, J. Gao, L. Kong, B. Jiang, and D. Niu. Advanced algorithms for penalized quantile and composite quantile regression. Computational Statistics, 36(1):333-346, Mar 2021. doi: 10.1007/s00180-020-01010-1.
[31] D. Plecko and N. Meinshausen. Fair data adaptation with quantile preservation. Journal of Machine Learning Research, 21(242):1-44, 2020.
[32] D. Plečko, N. Bennett, and N. Meinshausen. fairadapt: Causal reasoning for fair data pre-processing. arXiv preprint arXiv:2110.10200, 2021.
[33] Y. Romano, R. Barber, C. Sabatti, and E. Candès. With malice towards none: Assessing uncertainty via equalized coverage. arXiv, 2019. doi: 10.1162/99608f92.03f00592.
[34] Y. Romano, E. Patterson, and E. Candès. Conformalized quantile regression. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
[35] J. W. Taylor. A quantile regression neural network approach to estimating the conditional density of multiperiod returns. Journal of Forecasting, 19(4):299-311, 2000.
[36] Y. Wang, L. Kong, B. Jiang, X. Zhou, S. Yu, L. Zhang, and G. Heo. Wavelet-based LASSO in functional linear quantile regression. Journal of Statistical Computation and Simulation, 89(6):1111-1130, 2019.
[37] R. Williamson and A. Menon. Fairness risk measures. In International Conference on Machine Learning, pages 6786-6797. PMLR, 2019.
[38] Z. Xiao and R. Koenker. Conditional quantile estimation for generalized autoregressive conditional heteroscedasticity models. Journal of the American Statistical Association, 104(488):1696-1712, 2009.
[39] D. Yang, J. Lafferty, and D. Pollard. Fair quantile regression. arXiv preprint arXiv:1907.08646, 2019.
[40] S.-S. Yang. A smooth nonparametric estimator of a quantile function. Journal of the American Statistical Association, 80(392):1004-1011, 1985. doi: 10.2307/2288567.
[41] D. Yu, L. Kong, and I. Mizera. Partial functional linear quantile regression for neuroimaging data analysis. Neurocomputing, 195:74-87, 2016.
[42] D. Yu, L. Zhang, I. Mizera, B. Jiang, and L. Kong. Sparse wavelet estimation in quantile regression with multiple functional predictors. Computational Statistics & Data Analysis, 136:12-29, 2019.
[43] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325-333. PMLR, 2013.
[44] G. Zeni, M. Fontana, and S. Vantini. Conformal prediction: a unified review of theory and new challenges. arXiv preprint arXiv:2005.07972, 2020.
[45] Z. Zhang and H.-G. Müller. Functional density synchronization. Computational Statistics & Data Analysis, 55(7):2234-2249, 2011. doi: 10.1016/j.csda.2011.01.007.

Checklist

1. For all authors...
(a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes]
(b) Did you describe the limitations of your work? [Yes] See Section 7.
(c) Did you discuss any potential negative societal impacts of your work? [No] We currently do not foresee potential negative societal impacts of our work; we anticipate this method will be used in the future by practitioners in real-life scenarios involving job markets, medical science, etc., and could potentially help make decisions that protect the vulnerable and promote social harmony.
(d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]
2. If you are including theoretical results...
(a) Did you state the full set of assumptions of all theoretical results? [Yes]
(b) Did you include complete proofs of all theoretical results? [Yes]
3. If you ran experiments...
(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
(b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes]
(c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [Yes]
(d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]
4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
(a) If your work uses existing assets, did you cite the creators? [Yes]
(b) Did you mention the license of the assets? [N/A]
(c) Did you include any new assets either in the supplemental material or as a URL? [N/A]
(d) Did you discuss whether and how consent was obtained from people whose data you are using/curating? [Yes]
(e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [N/A]
5. If you used crowdsourcing or conducted research with human subjects...
(a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]
(b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
(c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]