Multi-Dimensional Fair Federated Learning

Cong Su1, Guoxian Yu1,2,*, Jun Wang2, Hui Li1,2, Qingzhong Li1,2,*, Han Yu3
1School of Software, Shandong University, Jinan, China
2SDU-NTU Joint Centre for AI Research, Shandong University, Jinan, China
3School of Computer Science and Engineering, Nanyang Technological University, Singapore
csu@mail.sdu.edu.cn, {gxyu, kingjun, lih, lqz}@sdu.edu.cn, han.yu@ntu.edu.sg
*Corresponding authors: Guoxian Yu and Qingzhong Li.

Federated learning (FL) has emerged as a promising collaborative and secure paradigm for training a model from decentralized data without compromising privacy. Group fairness and client fairness are two dimensions of fairness that are important for FL. Standard FL can result in disproportionate disadvantages for certain clients, and it still faces the challenge of treating different groups equitably in a population. The problem of privately training fair FL models without compromising the generalization capability of disadvantaged clients remains open. In this paper, we propose a method, called mFairFL, to address this problem and achieve group fairness and client fairness simultaneously. mFairFL leverages differential multipliers to construct an optimization objective for empirical risk minimization with fairness constraints. Before aggregating locally trained models, it first detects conflicts among their gradients, and then iteratively curates the direction and magnitude of gradients to mitigate these conflicts. Theoretical analysis proves that mFairFL facilitates fairness in model development. Experimental evaluations on three benchmark datasets show significant advantages of mFairFL over seven state-of-the-art baselines.

Introduction

The widespread adoption of machine learning models has raised significant concerns about fairness, spurring the development of fairness criteria and fair models. A multitude of fairness criteria have been proposed, one of the most widely acknowledged being group fairness (Hardt, Price, and Srebro 2016; Ustun, Liu, and Parkes 2019). Group fairness may also be mandated by legal statutes (EU et al. 2012), requiring models to treat distinct groups impartially with respect to sensitive attributes such as age, gender, and race. Building upon these concepts, numerous methods have been introduced to train equitable models, under the premise that the model can directly access the complete training dataset (Zafar et al. 2017; Roh et al. 2021). However, these datasets are often owned by different institutions and cannot be shared due to privacy concerns.

Federated learning (FL) (Wang et al. 2021a) is a distributed learning paradigm that allows multiple data owners to collectively train a model while keeping their data local. If each data owner were to independently train a fair model on its own data and then contribute it for aggregation, akin to FedAvg (McMahan et al. 2017) and FedOPT (Reddi et al. 2020), a promising way would open up for improving model fairness in decentralized contexts.
However, data heterogeneity, i.e., variations in data sizes and distributions across clients, distorts these localized efforts to enhance the fairness of the global model. As a result, a gap emerges between a global model obtained by naively aggregating fair models trained on diverse client datasets and the model that would be obtained in a centralized setting. Meanwhile, simply minimizing the aggregation loss of the federated system can steer the global model toward favoring certain clients and disadvantaging others, a problem referred to as client fairness (CF). Prior efforts have predominantly focused on client fairness, e.g., by re-weighting client aggregation weights (Zhao and Joshi 2022), solving distributed mini-max optimization problems (Mohri, Sivek, and Suresh 2019), or mitigating conflicts between clients (Hu et al. 2022). In contrast, we focus on multi-dimensional fairness, encompassing both group and client fairness, in line with legal and ethical requirements. This dual focus also strongly influences clients' willingness to participate in FL, and thus contributes to more comprehensive and representative data for training the global model. However, the decentralized nature of FL makes it difficult to train an equitable global model, particularly under heterogeneous data distributions across clients. Privately training a fair model from such decentralized, disparate data, while treating each contributing client equitably, remains an open problem. We aim to address it in this paper.

We propose the Multi-dimensional Fair Federated Learning (mFairFL) method, which aims to ensure equity not only among distinct sensitive groups but also across individual clients. The core idea of mFairFL is to optimize client models under fairness constraints. Before aggregating gradients on the central server, mFairFL checks for conflicting gradients among clients by assessing their gradient similarities. It then iteratively adjusts the direction and magnitude of conflicting gradients to mitigate these conflicts. In this way, mFairFL balances equitable treatment and accuracy for both disadvantaged sensitive groups and individual clients. The framework of mFairFL is depicted in Figure 1. Our contributions are as follows: (i) We introduce a fair federated learning framework, mFairFL, and establish its capacity to improve model fairness with respect to sensitive groups in a decentralized data context. (ii) mFairFL formulates fairness optimization as a minimax problem with a group fairness metric as constraints.
It analyzes and adjusts the direction and magnitude of potentially conflicting gradients throughout training, which improves group fairness across the entire population while ensuring impartial treatment of each client by the global model. (iii) Through both theoretical and experimental analysis, we show that mFairFL mitigates gradient conflicts among clients and achieves a higher degree of group fairness than the state of the art (SOTA).

Related Work

With the growing concern surrounding fairness, various approaches have been proposed. To analyze the problem, we categorize fairness models into two types, centralized and federated, based on their training protocols.

Fairness models on centralized data. In the centralized setting, it is common to modify the training framework to attain an appropriate level of group fairness, ensuring that a classifier exhibits comparable performance across different sensitive groups. Techniques addressing group fairness in this setting fall into three types: pre-processing (Salimi et al. 2019), in-processing (Garg et al. 2019), and post-processing (Mishler, Kennedy, and Chouldechova 2021) methods. For a more extensive overview of fairness methods on centralized data, refer to the recent survey (Pessach and Shmueli 2022).

Fairness FL models on decentralized data. In contrast, achieving fairness in the practical FL setting has received limited attention compared to centralized solutions (Wang et al. 2021b). The notion of fairness in FL differs slightly from the standard concept in centralized learning. Client fairness, a popular fairness definition in FL, aims to ensure that all clients (i.e., data owners) achieve similar accuracy.

Figure 1: An overview of mFairFL, which formulates a minimax constrained optimization problem in terms of accuracy and fairness. Before aggregating client gradients, it detects the presence of gradient conflicts, and then mitigates the conflicting gradients through gradient adjustments that align them with the overall fairness objective.

Previous attempts to achieve client fairness in FL include modifying the aggregation weights of clients to achieve a uniform service quality for all clients (Li et al. 2019; Zhao and Joshi 2022; Yue, Nouiehed, and Al Kontar 2023), selecting biased clients for update (Cho et al. 2020), and finding a common update direction for all clients (Hu et al. 2022). Only a few studies are dedicated to group fairness in FL (Abay et al. 2020; Du et al. 2021; Gálvez et al. 2021). For example, Zeng, Chen, and Lee (2021) updated the weight of the local loss of each sensitive group during the global aggregation phase. Ezzeldin et al. (2023) adapted client weights based on the local fairness of each client and its deviation from the global one. However, these methods disregard gradient conflicts, which leads to performance decline and unfavourable outcomes for certain clients. mFairFL aims to eliminate bias towards different groups (group fairness) based on sensitive attributes and to learn a global model that benefits all clients, thereby achieving better client fairness alongside group fairness.
Preliminaries

Federated Learning

Following the typical FL setting (McMahan et al. 2017), suppose there are $K$ different clients, and each client can only access its own dataset $D_k=\{d_k^i=(s_k^i, x_k^i, y_k^i)\}_{i=1}^{n_k}\subseteq D$, where $s_k^i$ is the sensitive attribute, $y_k^i$ is the label, $x_k^i$ collects the other observational attributes of the $i$-th sample of client $k$, and $n_k$ is the number of samples of client $k$. The goal of FL is to train a global model parameterized by $w\in\mathbb{R}^m$ ($m$ is the number of parameters) on the client datasets $D_k$ with guaranteed privacy:

$$\min_{w\in\mathbb{R}^m}\ \sum_{i=1}^{K} p_i L(D_i, w) \quad (1)$$

where $L(D_i, w)=\frac{1}{n_i}\sum_{j=1}^{n_i} l(d_i^j, w)$ is the local objective function of client $i$, with weights $p_i\ge 0$ and $\sum_{i=1}^{K} p_i=1$.

Fairness Notions

In this paper, we focus on three canonical group fairness notions, i.e., Demographic Parity (DP), Equalized Odds (EO), and Accuracy Parity (AP) (Pessach and Shmueli 2022). For ease of exposition, we describe these notions in the centralized setting.

Definition 1 (Demographic Parity) The model's predictions $\hat Y=\hat y$ are statistically independent of the sensitive attribute $S$. The extent of a model's unfairness with respect to Demographic Parity can be measured as:

$$\mathrm{DP}(\hat y)=\big|P[\hat Y=\hat y\mid S=s]-P[\hat Y=\hat y]\big|\quad \forall s\in S \quad (2)$$

Definition 2 (Equalized Odds) Given the label $Y=y$, the predictions $\hat Y=\hat y$ are statistically independent of the sensitive attribute $S$, i.e., for all $s\in S$ and $y\in Y$, we can measure the absolute difference between two prediction rates to quantify how unfair a model is in terms of Equalized Odds:

$$\mathrm{EO}(\hat y)=\big|P[\hat y\mid S=s, Y=y]-P[\hat y\mid Y=y]\big| \quad (3)$$

Definition 3 (Accuracy Parity) The model's misprediction rate is independent of the sensitive attribute:

$$P[\hat y\neq y\mid S=s]=P[\hat y\neq y]\quad \forall s\in S \quad (4)$$

Equivalently, we can measure the degree of unfairness of the model with respect to Accuracy Parity as:

$$\mathrm{AP}(\hat y)=\big|P[L(D,w)\mid S=s]-P[L(D,w)]\big| \quad (5)$$

where $L(D,w)$ is the loss function minimized in problem (1). The above fairness notions can be interpreted as the difference between each group and the overall population (Fioretto, Mak, and Van Hentenryck 2020). Formally, they can be rewritten as:

$$\mathrm{FN}=\big|F(D,w)-F(D_s,w)\big| \quad (6)$$

where $F(D,w)=\frac{1}{n}\sum_{d\in D} f(d,w)$, $F(D_s,w)=\frac{1}{n_s}\sum_{d_s\in D_s} f(d_s,w)$, $D_s$ is the subset of $D$ with $S=s$, and $f$ is one of the fairness notions described above.
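As a concrete illustration of these notions, the following is a minimal sketch (not the authors' code) of the three group-fairness violation measures above, computed from hard predictions with NumPy; the array names `y_true`, `y_pred`, and `s` are illustrative assumptions.

```python
import numpy as np

def dp_violation(y_pred, s):
    """Demographic Parity: max_s |P[Y_hat=1 | S=s] - P[Y_hat=1]| (Eq. 2)."""
    overall = np.mean(y_pred == 1)
    return max(abs(np.mean(y_pred[s == v] == 1) - overall) for v in np.unique(s))

def eo_violation(y_true, y_pred, s):
    """Equalized Odds: max over (s, y) of |P[Y_hat=1 | S=s, Y=y] - P[Y_hat=1 | Y=y]| (Eq. 3)."""
    gaps = []
    for y in np.unique(y_true):
        base = np.mean(y_pred[y_true == y] == 1)
        for v in np.unique(s):
            mask = (y_true == y) & (s == v)
            if mask.any():
                gaps.append(abs(np.mean(y_pred[mask] == 1) - base))
    return max(gaps)

def ap_violation(y_true, y_pred, s):
    """Accuracy Parity: max_s |P[Y_hat != Y | S=s] - P[Y_hat != Y]| (Eqs. 4-5)."""
    overall_err = np.mean(y_pred != y_true)
    return max(abs(np.mean(y_pred[s == v] != y_true[s == v]) - overall_err)
               for v in np.unique(s))
```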
Necessity for FL to Improve Fairness

In this subsection, we analyse the advantage of FL for improving fairness in decentralized settings. To build a fair model in decentralized settings, an intuitive solution (hereon referred to as IndFair) is to independently train a fair local model using client data. Specifically, for client $k$, IndFair trains a fair model by solving the following problem:

$$\min\ L(D_k,w)\quad \text{s.t.}\ |F(D_k,w)-F(D_k^s,w)|\le\alpha,\ \forall s\in S \quad (7)$$

where $\alpha\in[0,1]$ is the fairness tolerance threshold. Let $g_k^\alpha$ be the trained model of client $k$; then the overall performance of IndFair is defined as the mixture of all clients:

$$\begin{cases}\mathrm{Bern}(g_1^\alpha(x,s)), & \text{w.p. } 1/K\\ \quad\vdots & \\ \mathrm{Bern}(g_K^\alpha(x,s)), & \text{w.p. } 1/K\end{cases}=\mathrm{Bern}\Big(\frac{g_1^\alpha(x,s)+\cdots+g_K^\alpha(x,s)}{K}\Big)=\mathrm{Bern}(g_\alpha^{Ind}) \quad (8)$$

where Bern stands for the Bernoulli distribution and "w.p." abbreviates "with probability". On the other hand, we can train a fair global model (hereon referred to as FedFair) on decentralized data through FL. The fair global model $g_\alpha^{Fed}$ is obtained by solving a constrained problem:

$$\min\ L(D,w)\quad \text{s.t.}\ |F(D_k,w)-F(D_k^s,w)|\le\alpha,\ \text{for all } k=1,2,\ldots,K \quad (9)$$

Here, an important question arises: can FedFair achieve better fairness than IndFair? The following theorem gives an affirmative answer.

Theorem 1 (Necessity for FL) If the data distribution is highly heterogeneous across clients, then $\min \mathrm{FN}(g_\alpha^{Ind}) > \min \mathrm{FN}(g_\alpha^{Fed})$.

Theorem 1 means that, in the decentralized setting, there is a fairness gap between federated methods and non-federated ones, and that FL improves fairness. The proof and the formal statement are deferred to the Supplementary file (Su et al. 2023).

The Proposed Approach

Theorem 1 demonstrates that FL can effectively improve model fairness while safeguarding against data leakage in a decentralized context. Nevertheless, directly applying fairness methods within the FL framework may not be optimal. The difficulty arises from the significant heterogeneity of data distributions across clients: local fairness performance can diverge from fairness over the entire population. In addition, client fairness becomes another critical facet of fairness that must be considered. To tackle these issues, we introduce mFairFL, which privately trains a global model with group fairness while mitigating the adverse effects of gradient conflicts among clients, as depicted in Figure 1. mFairFL transforms the fairness-constrained problem into an unconstrained problem that enforces fairness through Lagrange multipliers. In each communication round, every client computes its training loss, its fairness violation measures, and its gradients, and communicates these statistics to the FL server (aggregation phase). The server then identifies and rectifies the direction and magnitude of conflicting gradients before aggregation. The refined model is updated and distributed to the clients (local training phase). This process enables mFairFL to attain an accurate global model that remains equitable for both sensitive groups and individual clients. The following subsections detail our approach.

Local Statistics Computation Phase

Our goal is to train an optimal model from decentralized data while satisfying group fairness. For this purpose, we directly inject the group fairness constraint into model training:

$$w^* = \min_{w\in\mathbb{R}^m} L(D,w)\quad \text{s.t.}\ |F(D,w)-F(D_s,w)|\le\alpha,\ \forall s\in S \quad (10)$$

where $L(D,w)=\frac{1}{n}\sum_{d\in D} l(d,w)$ and $F(D,w)$ is the fairness metric defined in Eq. (6). Let $h(w)=[h_1(w), h_2(w), \ldots, h_{|S|}(w)]$, where $h_s(w)=|F(D,w)-F(D_s,w)|-\alpha$. We use a technique similar to the Lagrangian approach (Fioretto et al. 2021) to relax the constraint:

$$J(w,\alpha)=L(D,w)+\lambda h(w). \quad (11)$$

The relaxation gives the optimization algorithm more freedom to find solutions that may not strictly satisfy all the constraints, but rather approximate them within an acceptable range. Thus, the objective in Eq. (11) can be optimized using gradient descent/ascent:

$$\lambda \leftarrow \lambda + \gamma h(w),\qquad w \leftarrow w - \eta\big(\nabla_w L(D,w)+\lambda \nabla_w h(w)\big). \quad (12)$$
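The following is a minimal sketch, not the authors' implementation, of one primal-dual step of Eq. (12): the multiplier $\lambda$ is increased when the fairness constraints $h(w)$ are violated, while $w$ descends along the combined gradient. The inputs (`grad_loss`, `h_val`, `grad_h`) would come from a client's local data; all names and shapes are illustrative assumptions.

```python
import numpy as np

def primal_dual_step(w, lam, grad_loss, h_val, grad_h, eta=0.01, gamma=0.05):
    """One gradient descent/ascent step of Eq. (12).

    w:         model parameters, shape (m,)
    lam:       Lagrange multipliers, shape (|S|,)
    grad_loss: gradient of L(D, w) w.r.t. w, shape (m,)
    h_val:     constraint values h_s(w) = |F(D,w) - F(D_s,w)| - alpha, shape (|S|,)
    grad_h:    Jacobian of h(w) w.r.t. w, shape (|S|, m)
    """
    lam = lam + gamma * h_val                    # dual ascent on the multipliers
    w = w - eta * (grad_loss + lam @ grad_h)     # primal descent on the relaxed objective
    return w, lam
```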
Based on Eq. (12), each client computes the following statistics, which the server needs to perform model updates:

$$L(D,w);\ \nabla_w L(D,w);\ F(D,w);\ [F(D_s,w)]_{s\in S};\ \nabla_w F(D,w);\ \nabla_w[F(D_s,w)]_{s\in S}. \quad (13)$$

In fact, some of these statistics can be obtained from the others: $F(D,w)=\sum_{s\in S} F(D_s,w)$ and $\nabla_w F(D,w)=\sum_{s\in S}\nabla_w F(D_s,w)$. Therefore, in each communication round, a client reports a statistics vector to the server:

$$z_k = \big[L(D_k,w);\ \nabla_w L(D_k,w);\ [F(D_k^s,w)]_{s\in S};\ \nabla_w[F(D_k^s,w)]_{s\in S}\big] \quad (14)$$

We define the training loss of client $k$ in round $t$ as $l_k^t = L(D_k,w)+\lambda h(w)$, and the updated gradient as $g_k^t = \nabla_w L(D_k,w)+\lambda\nabla_w h(w)$. Let $G^t=\{g_1^t, g_2^t, \ldots, g_K^t\}$ denote the gradients received by the server from the clients, and $L^t=\{l_1^t, l_2^t, \ldots, l_K^t\}$ the received client losses.

Aggregation Phase

During the aggregation phase, the server leverages the $z_k$ provided by the clients to update the global model via Eq. (12). Owing to the diverse data distributions, gradient conflicts emerge among clients. In isolation, these conflicts are not necessarily detrimental, as straightforward gradient averaging can still optimize the global objective function (McMahan et al. 2017). However, when conflicting gradients differ considerably in magnitude, certain clients can suffer pronounced performance drops. For instance, consider training a binary classifier. If a subset of clients holds a majority of the data of one class and gradient conflicts arise between the two classes, the global model can become skewed toward the majority-class clients, compromising performance on the other class. Moreover, even when class balance is maintained among clients, disparities in gradient magnitudes may persist due to divergent sample sizes across clients.

Therefore, before aggregating the clients' gradients in each communication round, mFairFL first checks whether there are any conflicting gradients among clients. If there are gradient conflicts, then there exists at least one pair of client gradients $(g_i^t, g_j^t)$ such that $\cos(g_i^t, g_j^t) < \hat\phi_{ij}^t$, where $\hat\phi_{ij}^t \ge 0$ is the gradient similarity goal of the $t$-th communication round. Note that the interactions among gradients (and thus the similarity goal) change significantly across clients and communication rounds. Thus, mFairFL uses an Exponential Moving Average (EMA) (Wang and Tsvetkov 2021) to set appropriate gradient similarity goals for clients $i$ and $j$ in round $t$:

$$\hat\phi_{ij}^t = \delta\hat\phi_{ij}^{t-1} + (1-\delta)\phi_{ij}^t \quad (15)$$

where $\delta$ is a hyper-parameter and $\phi_{ij}^t=\cos(g_i^t, g_j^t)$ is the computed gradient similarity; specifically, $\hat\phi_{ij}^0=0$.

To mitigate the adverse effects of gradient conflicts among clients, mFairFL introduces a new gradient aggregation strategy. It first arranges the clients' gradients in $G^t$ in ascending order of their loss values. This yields $PO^t$, which specifies the sequence in which each gradient serves as a reference projection target. Then, through an iterative process, mFairFL adjusts the magnitude and direction of the $k$-th client gradient $g_k^t$ so as to satisfy the desired similarity between $g_k^t$ and the target gradient $g_j^t\in PO^t$, following the order prescribed by $PO^t$:

$$g_k^t \leftarrow c_1 g_k^t + c_2 g_j^t \quad (16)$$

Since there are infinitely many valid combinations of $c_1$ and $c_2$, we fix $c_1=1$ and apply the Law of Sines on the plane spanned by $g_k^t$ and $g_j^t$ to calculate $c_2$, obtaining the new gradient for the $k$-th client:

$$g_k^t \leftarrow g_k^t-\frac{\|g_k^t\|\Big(\phi_{kj}^t\sqrt{1-(\hat\phi_{kj}^t)^2}-\hat\phi_{kj}^t\sqrt{1-(\phi_{kj}^t)^2}\Big)}{\sqrt{1-(\hat\phi_{kj}^t)^2}}\,g_j^t \quad (17)$$

The derivation details are deferred to the Supplementary file.
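The following is a minimal sketch, not the authors' code, of conflict detection with an EMA similarity goal (Eq. 15) and a GradVac-style adjustment of a conflicting gradient toward that goal (Wang and Tsvetkov 2021), which Eq. (16)-(17) adapt. The scaling constant `c2` below follows the GradVac form and may differ from the exact derivation in the paper's Supplementary file; all names are illustrative assumptions.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def update_goal(prev_goal, g_i, g_j, delta=0.5):
    """EMA of the pairwise gradient similarity, Eq. (15); phi_hat^0 = 0."""
    return delta * prev_goal + (1 - delta) * cosine(g_i, g_j)

def adjust_gradient(g_k, g_j, goal):
    """Keep c1 = 1 and add c2 * g_j so that cos(g_k', g_j) reaches `goal`
    whenever the measured similarity falls below it (Eq. 16)."""
    phi = cosine(g_k, g_j)
    if phi >= goal:                      # no conflict w.r.t. the similarity goal
        return g_k
    # Law-of-sines coefficient (GradVac-style); assumes goal in (-1, 1).
    c2 = (np.linalg.norm(g_k)
          * (goal * np.sqrt(1 - phi ** 2) - phi * np.sqrt(1 - goal ** 2))
          / (np.linalg.norm(g_j) * np.sqrt(1 - goal ** 2) + 1e-12))
    return g_k + c2 * g_j
```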
Theorem 2 Suppose there is a set of gradients $G=\{g_1, g_2, \ldots, g_K\}$ where $g_i$ always conflicts with $g_j^{t_j}$ before $g_j^{t_j}$ is adjusted to match the similarity goal between $g_j^{t_j}$ and $g_i$ ($g_j^{t_j}$ denotes the gradient obtained by adjusting $g_j$ with the target gradients in $G$ for $t_j$ times). Suppose $\epsilon_1 \le |\cos(g_i^{t_i}, g_j^{t_j})| \le \epsilon_2$ and $0<\epsilon_1<\hat\phi_{ij}\le\epsilon_2\le 1$. Then, for each $g_i\in G$, as long as we iteratively project $g_i$ onto $g_k$'s normal plane (skipping $g_i$ itself) in the ascending order $k=1,2,\ldots,K$, the larger $k$ is, the smaller the upper bound on the conflict between the aggregated gradient of the global model $g_{global}$ and $g_k$. The maximum conflict between $g_{global}$ and $g_k$ is bounded by

$$K\,(\max_i\|g_i\|)^2\,\frac{\epsilon_2 X_{max}(1-X_{min})\big(1-(1-X_{min})^{K-k}\big)}{X_{min}},$$

where $X_{max}=\sqrt{1-\hat\phi^2}$ and $X_{min}=\epsilon_1$.

Theorem 2 shows that the later a client's gradient assumes the role of the projection target, the fewer conflicts it will have with the final averaged gradient computed by mFairFL. Consequently, to improve the model's performance on clients with comparatively poor training performance, we position clients with higher training losses towards the end of the projection target order list $PO^t$. Additionally, these gradients provide the optimal model update direction. To further emphasize client fairness, we permit $\beta K$ clients with suboptimal performance to retain their original gradients. The parameter $\beta$ modulates the extent of conflict mitigation and offers a means to strike a balance: when $\beta=1$, all clients are required to mitigate conflicts with others; when $\beta=0$, all clients preserve their original gradients and mFairFL reduces to FedAvg. By adopting this approach, mFairFL effectively alleviates gradient conflicts, as corroborated by Theorem 2. Consequently, mFairFL can place an upper limit on the maximum conflict between any client's gradient and the aggregated gradient of the global model, and thereby systematically counteracts the detrimental effects of gradient conflicts. Algorithm 1 in the Supplementary file outlines the main procedures of mFairFL. Theorem 3 proves that mFairFL can find the optimal value $w^*$ within a finite number of communications, which explains why mFairFL can effectively train a group- and client-fairness-aware model in the decentralized setting. The proof can be found in the Supplementary file.

Theorem 3 Suppose there are $K$ objective functions $J_1(w), J_2(w), \ldots, J_K(w)$, and each objective function is differentiable and $L$-smooth. Then mFairFL will converge to the optimal $w^*$ within a finite number of steps.
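The following is a minimal sketch, not the authors' Algorithm 1, of the aggregation phase described above: gradients are adjusted against projection targets ordered by ascending client loss ($PO^t$), the highest-loss clients keep their raw gradients, and the results are averaged into the global update. It reuses `adjust_gradient` and the EMA goals from the previous sketch; the mapping of $\beta$ to the number of unadjusted clients follows the stated endpoints ($\beta=1$: every client adjusts, $\beta=0$: FedAvg) and is our reading, and the goal bookkeeping is a simplifying assumption.

```python
import numpy as np

def aggregate(grads, losses, goals, beta=0.5):
    """grads: list of K client gradients; losses: list of K client losses;
    goals: dict mapping a client pair (k, j) to its EMA similarity goal."""
    K = len(grads)
    order = np.argsort(losses)                   # PO^t: lower loss = earlier projection target
    n_keep = K - int(round(beta * K))            # beta=1: all adjust; beta=0: FedAvg
    keep = set(order[K - n_keep:].tolist())      # highest-loss clients keep raw gradients
    adjusted = []
    for k in range(K):
        g = grads[k].copy()
        if k not in keep:
            for j in order:                      # project against targets in PO^t order
                if j == k:
                    continue
                g = adjust_gradient(g, grads[j], goals[(k, j)])
        adjusted.append(g)
    return np.mean(adjusted, axis=0)             # aggregated update for the global model
```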
Experimental Evaluation

Experimental Setup

In this section, we conduct experiments to evaluate the effectiveness of mFairFL on three real-world datasets: Adult (Dua and Graff 2017), COMPAS (ProPublica 2016), and Bank (Moro, Cortez, and Rita 2014). The Adult dataset contains 48,842 samples, with gender treated as the sensitive attribute. The COMPAS dataset contains 7,214 samples, with gender treated as the sensitive attribute. The Bank dataset contains 45,211 samples, with age treated as the sensitive attribute. We split the data among five FL clients in a non-IID manner (due to the page limit, experiments in a more general setting with multiple sensitive attributes and multiple values per sensitive attribute are reported in Table S3 of the Supplementary file).

For comparison, we consider several baseline methods, categorized into three groups: (i) independent training of the fair model within a decentralized context (IndFair); (ii) fair model training via FedAvg (FedAvg-f); (iii) fair model training within a centralized setting (CenFair). We further compare against three SOTA FL methods with group fairness: (i) FedFB (Zeng, Chen, and Lee 2021), which adjusts each sensitive group's weight for aggregation; (ii) FPFL (Gálvez et al. 2021), which enforces fairness by solving a constrained optimization problem; (iii) FairFed (Ezzeldin et al. 2023), which adjusts client weights based on local and global trends of fairness metrics. In addition, we evaluate mFairFL against cutting-edge FL methods that emphasize client fairness, including: (i) q-FFL (Li et al. 2019), which adjusts client aggregation weights using a hyperparameter q; (ii) DRFL (Zhao and Joshi 2022), which automatically adapts client weights during model aggregation; (iii) Ditto (Li et al. 2021), a hybrid approach that merges multitask learning with FL to develop personalized models for each client; and (iv) FedMGDA+ (Hu et al. 2022), which frames FL as a multi-objective optimization problem.

Throughout our experiments, we use a uniform protocol of 10 communication rounds and 20 local epochs for all FL algorithms. For the other methods, we run 200 epochs and use cross-validation on the training sets to determine optimal hyperparameters for the compared methods. All algorithms use ReLU neural networks with four hidden layers, ensuring an equal number of model parameters (further details on the hyperparameter selection for mFairFL can be found in the Supplementary file). We use the same server (Ubuntu 18.04.5, Intel Xeon Gold 6248R and Nvidia RTX 3090) for all experiments.

Estimation on Group Fairness

We conduct a comparative analysis of the accuracy and group fairness of the evaluated methods. To examine the relationship between method performance and data heterogeneity, we extend the setting of McMahan et al. (2017) for constructing heterogeneous data, emphasizing heterogeneity in the sensitive attributes and data sizes across clients. Specifically, we group the datasets by sensitive attribute and randomly assign 30%, 30%, 20%, 10%, 10% of the samples from group 0 and 10%, 20%, 20%, 20%, 30% of the samples from group 1 to the five clients, respectively. The outcomes of this data splitting strategy, including average accuracy with standard deviations and the Demographic Parity violation score of each method, are outlined in Table S2. Furthermore, for datasets with pronounced data heterogeneity, we draw samples from each group across the five clients at ratios of 50%, 10%, 10%, 20%, 10% and 10%, 40%, 30%, 10%, 10%, respectively. The corresponding experimental outcomes are shown in Table 1.
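The following is a minimal sketch, not the authors' preprocessing code, of how the heterogeneous split just described could be constructed: samples are grouped by the sensitive attribute and assigned to five clients according to fixed per-group proportions. The arrays `sensitive` and `ratios` are illustrative assumptions.

```python
import numpy as np

def split_non_iid(sensitive, ratios, seed=0):
    """ratios[g] lists the per-client proportions for sensitive group g,
    e.g. {0: [.3, .3, .2, .1, .1], 1: [.1, .2, .2, .2, .3]}."""
    rng = np.random.default_rng(seed)
    n_clients = len(next(iter(ratios.values())))
    clients = [[] for _ in range(n_clients)]
    for g, props in ratios.items():
        idx = np.where(sensitive == g)[0]
        rng.shuffle(idx)                                   # random assignment within the group
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for c, part in enumerate(np.split(idx, cuts)):
            clients[c].extend(part.tolist())
    return [np.array(c) for c in clients]                  # per-client sample indices
```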
             Adult                             Compas                            Bank
             Acc.  DP    EO    AP    CF        Acc.  DP    EO    AP    CF        Acc.  DP    EO    AP    CF
IndFair      .768  .083  .071  .077  -         .573  .083  .097  .085  -         .831  .028  .025  .029  -
FedAvg-f     .706  .224  .164  .218  .232      .558  .059  .066  .062  .184      .828  .033  .034  .033  .143
FedFB        .779  .014  .007  .011  .058      .557  .023  .019  .021  .033      .837  .014  .016  .009  .067
FPFL         .754  .023  .016  .019  .228      .553  .033  .018  .024  .157      .822  .008  .012  .010  .153
FairFed      .756  .009  .004  .008  .244      .551  .009  .003  .004  .186      .824  .003  .004  .004  .164
FedMGDA+     .837  .238  .237  .238  .063      .635  .136  .141  .137  .044      .874  .084  .077  .085  .065
CenFair      .812  .014  .008  .013  -         .616  .014  .008  .011  -         .866  .001  .000  .002  -
mFairFL      .792  .012  .003  .007  .036      .596  .005  .009  .003  .022      .844  .005  .006  .003  .028

Table 1: Accuracy and group fairness and client fairness violation scores on the three datasets with high data heterogeneity among clients. The best results in fairness are highlighted in boldface. Markers indicate that mFairFL is statistically worse/better than the compared method by a pairwise Student's t-test at the 95% confidence level; '-' means not applicable.

From Tables S2 and 1, we observe that: (i) mFairFL prominently enhances fairness, achieving a level of fairness comparable to CenFair. This confirms mFairFL's efficacy in training fair models for sensitive groups on decentralized data. (ii) The weak group fairness of IndFair shows that, in a decentralized scenario, fairness models trained exclusively on local data fall considerably short of achieving group fairness at the population level. It is also noteworthy that FedAvg-f occasionally exhibits lower accuracy than IndFair. This discrepancy arises from the aggregation strategy of FedAvg-f, which can have unintended consequences for certain clients, whereas IndFair ensures fairness for specific sub-distributions by training each local model separately. (iii) mFairFL distinctly outperforms FedAvg-f in both accuracy and fairness, underscoring the limitations of merely grafting fairness techniques onto the FL paradigm. The group fairness achieved by FedAvg-f lags behind fairness over the entire population, and this gap is particularly pronounced when data heterogeneity among clients is high. By combining fairness techniques with the decentralized nature of FL, and by ensuring advantageous model updates for all clients, mFairFL improves both overall fairness and accuracy. (iv) Notably, mFairFL trades off accuracy and group fairness better than FedFB, FPFL and FairFed, because these methods overlook the detrimental effects of conflicting gradients with large differences in magnitude, which reduces accuracy and harms certain clients. FedMGDA+ frequently yields the highest accuracy, but markedly violates the fairness of model decisions toward disadvantaged groups, since it concentrates solely on aligning client accuracy without mitigating discrimination against sensitive groups. (v) Comparing the results in Table 1 (high heterogeneity) with those in Table S2 (low heterogeneity), mFairFL shows only a marginal decrease in both fairness and accuracy when transitioning from low to high data heterogeneity.
This underscores the robustness of mFairFL to heterogeneous data. The consistency aligns with our expectations, as mFairFL adjusts gradient directions and magnitudes to resolve conflicts, thereby ensuring equitable model updates across all clients. Conversely, FedAvg-f shows notable performance disparities across data heterogeneity levels, with a particularly steep decline under high data heterogeneity. This vulnerability is due to FedAvg-f's simplistic gradient averaging, which does not adequately account for the impact of data heterogeneity on the global model.

Estimation on Client Fairness

The group fairness-aware model trained by mFairFL benefits each participating client without unduly favoring specific clients. To validate this, we evaluate mFairFL in terms of client fairness and compare it against other relevant fairness methods. Client fairness is a useful metric for gauging whether the global model disproportionately favors particular clients while disregarding the rest. To further accentuate the differences among methods, we randomly allocate samples as follows: 50% and 10% of group 0 samples, together with 10%, 20%, and 10% of group 1 samples, are assigned to the 1st, 2nd, 3rd, 4th, and 5th clients, respectively. This deliberately induces pronounced data heterogeneity across clients. FedAvg serves as the baseline in this experiment. The accuracy and client fairness violation scores of each method are presented in Table 2.

                   FedAvg  q-FFL  DRFL   Ditto  FedMGDA+  mFairFL
Adult   Accuracy   .792    .718   .762   .834   .837      .853
        CF Vio.    .219    .081   .084   .074   .063      .035
Compas  Accuracy   .594    .566   .598   .629   .635      .668
        CF Vio.    .179    .058   .031   .024   .044      .018
Bank    Accuracy   .842    .816   .864   .877   .874      .889
        CF Vio.    .147    .110   .090   .068   .065      .022

Table 2: Accuracy (higher is better) and Client Fairness violation score (lower is better) on the three datasets with high heterogeneity among clients. Markers indicate that mFairFL is statistically worse/better than the compared method (pairwise Student's t-test at the 95% confidence level).

Our observations from this comparison are as follows: (i) mFairFL achieves the lowest client fairness violation scores while achieving accuracy on par with the other fairness-focused FL methods. In other words, mFairFL excels at abating the potential biases inherent to FL contexts. By considering conflicting gradients and adjusting their directions and magnitudes, mFairFL guarantees a more equitable distribution of model updates among clients, which tangibly diminishes client fairness violations. In contrast, both q-FFL and DRFL attempt to achieve client fairness by manipulating client aggregation weights, but fail to effectively address conflicts with substantial gradient magnitude disparities. Ditto aims to strike a balance between local and global models, producing personalized models for individual clients.
However, its global model aggregation strategy closely resembles that of FedAvg, which can yield unfavorable outcomes for certain clients. Similarly, FedMGDA+ seeks a shared update direction for all clients during federated training, but overlooks the influential role of gradient magnitudes in model aggregation. Overall, mFairFL outperforms its counterparts in terms of both client fairness and accuracy. (ii) FedAvg performs worst in both accuracy and client fairness. This is traceable to its rudimentary averaging strategy, which disregards the disparate contributions of individual clients: when gradients conflict with significantly divergent magnitudes, FedAvg is prone to overfitting certain clients at the expense of others. The significant performance gap between mFairFL and FedAvg demonstrates the potency of mFairFL in counteracting client conflicts. (iii) To further confirm mFairFL's efficiency, we report the number of iterations required for all compared methods to attain their best performance in Figure S1 of the Supplementary file, which depicts the convergence trajectories of the methods over iterations. Notably, mFairFL reaches strong performance and converges to the best client fairness within a noticeably smaller (or comparable) number of communication rounds. This underscores mFairFL's capacity to train the model efficiently while attaining the desired accuracy and client fairness.

Ablation Study

To examine the necessity of the projection order of mFairFL, we introduce two variants: (i) mFairFL-rnd adjusts the gradients using a random order of projection targets; (ii) mFairFL-rev adjusts the gradients in the opposite order. The experimental settings are the same as in the previous subsection. The results are shown in Table 3.

                 ours-rnd  ours-rev  ours
Adult    Acc.    .774      .768      .792
         DP      .018      .027      .012
         EO      .013      .026      .003
         AP      .017      .028      .007
         CF      .049      .064      .036
Compas   Acc.    .577      .580      .596
         DP      .014      .020      .005
         EO      .015      .018      .009
         AP      .012      .019      .003
         CF      .044      .049      .022
Bank     Acc.    .837      .822      .844
         DP      .009      .023      .005
         EO      .013      .019      .006
         AP      .007      .021      .003
         CF      .047      .077      .028

Table 3: Accuracy (higher is better) and group fairness and client fairness violation scores (lower is better) of mFairFL and its variants. Markers indicate that mFairFL is statistically better than the variant.

It can be observed that the projection order has a significant impact on the effectiveness of gradient projection. mFairFL-rnd ignores the information provided by client losses and thus loses to mFairFL in terms of group fairness and client fairness. mFairFL-rev achieves lower fairness than mFairFL-rnd, indicating that the global model tends to neglect clients with poorer performance when gradients are adjusted in the opposite order of mFairFL. The best multi-dimensional fairness is obtained by mFairFL, which confirms that its loss-based order helps improve fairness.

Conclusions

Addressing both group fairness and client fairness is paramount in FL. This paper introduces mFairFL, a method that addresses these two dimensions of fairness simultaneously. mFairFL formulates the optimization problem as a minimax problem with group fairness constraints.
Through adjustments to conflicting gradients throughout training, mFairFL produces model updates that benefit all clients in an equitable manner. Both theoretical analysis and empirical results confirm that mitigating client conflicts during the global model update improves fairness for sensitive groups, and that mFairFL effectively achieves both group fairness and client fairness. Mitigating the privacy risks of mFairFL is left for future work.

Acknowledgements

This work is supported by NSFC (62031003, 62072380 and 62272276); the National Key Research and Development Program of China (2022YFC3502101); Taishan Scholars Project Special Funding; Xiaomi Young Talents Program; CAAI-Huawei MindSpore Open Fund; the National Research Foundation Singapore and DSO National Laboratories under the AI Singapore Programme (AISG Award No: AISG2-RP-2020-019); and the RIE 2020 Advanced Manufacturing and Engineering (AME) Programmatic Fund (No. A20G8b0102), Singapore.

References

Abay, A.; Zhou, Y.; Baracaldo, N.; Rajamoni, S.; Chuba, E.; and Ludwig, H. 2020. Mitigating bias in federated learning. arXiv preprint arXiv:2012.02447.
Cho, Y. J.; Gupta, S.; Joshi, G.; and Yagan, O. 2020. Bandit-based communication-efficient client selection strategies for federated learning. In Asilomar Conference on Signals, Systems, and Computers, 1066-1069.
Du, W.; Xu, D.; Wu, X.; and Tong, H. 2021. Fairness-aware agnostic federated learning. In SDM, 181-189.
Dua, D.; and Graff, C. 2017. UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed: 2024-02-18.
EU, E.; et al. 2012. Charter of fundamental rights of the European Union. The Review of International Affairs, 63(1147): 109-123.
Ezzeldin, Y. H.; Yan, S.; He, C.; Ferrara, E.; and Avestimehr, A. S. 2023. FairFed: Enabling group fairness in federated learning. In AAAI, 7494-7502.
Fioretto, F.; Mak, T. W.; and Van Hentenryck, P. 2020. Predicting AC optimal power flows: Combining deep learning and Lagrangian dual methods. In AAAI, 630-637.
Fioretto, F.; Van Hentenryck, P.; Mak, T. W.; Tran, C.; Baldo, F.; and Lombardi, M. 2021. Lagrangian duality for constrained deep learning. In ECML PKDD, 118-135.
Gálvez, B. R.; Granqvist, F.; van Dalen, R.; and Seigel, M. 2021. Enforcing fairness in private federated learning via the modified method of differential multipliers. In NeurIPS 2021 Workshop on Privacy in Machine Learning.
Garg, S.; Perot, V.; Limtiaco, N.; Taly, A.; Chi, E. H.; and Beutel, A. 2019. Counterfactual fairness in text classification through robustness. In AAAI/ACM Conference on AI, Ethics, and Society, 219-226.
Hardt, M.; Price, E.; and Srebro, N. 2016. Equality of opportunity in supervised learning. In NeurIPS, 3315-3323.
Hu, Z.; Shaloudegi, K.; Zhang, G.; and Yu, Y. 2022. Federated learning meets multi-objective optimization. TNSE, 9(4): 2039-2051.
Li, T.; Hu, S.; Beirami, A.; and Smith, V. 2021. Ditto: Fair and robust federated learning through personalization. In ICML, 6357-6368.
Li, T.; Sanjabi, M.; Beirami, A.; and Smith, V. 2019. Fair resource allocation in federated learning. In ICLR.
McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; and y Arcas, B. A. 2017. Communication-efficient learning of deep networks from decentralized data. In AISTATS, 1273-1282.
Mishler, A.; Kennedy, E. H.; and Chouldechova, A. 2021. Fairness in risk assessment instruments: Post-processing to achieve counterfactual equalized odds. In ACM Conference on Fairness, Accountability, and Transparency, 386-400.
Mohri, M.; Sivek, G.; and Suresh, A. T. 2019. Agnostic federated learning. In ICML, 4615-4625.
Moro, S.; Cortez, P.; and Rita, P. 2014. A data-driven approach to predict the success of bank telemarketing. Decision Support Systems, 62: 22-31.
Pessach, D.; and Shmueli, E. 2022. A review on fairness in machine learning. ACM Computing Surveys, 55(3): 1-44.
ProPublica. 2016. COMPAS recidivism risk score data and analysis. https://www.propublica.org/datastore/dataset/compas-recidivism-risk-score-data-and-analysis. Accessed: 2024-02-18.
Reddi, S.; Charles, Z.; Zaheer, M.; Garrett, Z.; Rush, K.; Konečný, J.; Kumar, S.; and McMahan, H. B. 2020. Adaptive federated optimization. arXiv preprint arXiv:2003.00295.
Roh, Y.; Lee, K.; Whang, S. E.; and Suh, C. 2021. FairBatch: Batch selection for model fairness. In ICLR.
Salimi, B.; Rodriguez, L.; Howe, B.; and Suciu, D. 2019. Interventional fairness: Causal database repair for algorithmic fairness. In International Conference on Management of Data, 793-810.
Su, C.; Yu, G.; Wang, J.; Li, H.; Li, Q.; and Yu, H. 2023. Multi-dimensional Fair Federated Learning. arXiv:2312.05551.
Ustun, B.; Liu, Y.; and Parkes, D. 2019. Fairness without harm: Decoupled classifiers with preference guarantees. In ICML, 6373-6382.
Wang, J.; Charles, Z.; Xu, Z.; Joshi, G.; McMahan, H. B.; Al-Shedivat, M.; Andrew, G.; Avestimehr, S.; Daly, K.; Data, D.; et al. 2021a. A field guide to federated optimization. arXiv preprint arXiv:2107.06917.
Wang, Z.; Fan, X.; Qi, J.; Wen, C.; Wang, C.; and Yu, R. 2021b. Federated learning with fair averaging. In IJCAI, 1615-1623.
Wang, Z.; and Tsvetkov, Y. 2021. Gradient Vaccine: Investigating and improving multi-task optimization in massively multilingual models. In ICLR.
Yue, X.; Nouiehed, M.; and Al Kontar, R. 2023. GIFAIR-FL: A framework for group and individual fairness in federated learning. INFORMS Journal on Data Science, 2(1): 10-23.
Zafar, M. B.; Valera, I.; Rodriguez, M. G.; and Gummadi, K. P. 2017. Fairness constraints: Mechanisms for fair classification. In AISTATS, 962-970.
Zeng, Y.; Chen, H.; and Lee, K. 2021. Improving fairness via federated learning. In ICLR.
Zhao, Z.; and Joshi, G. 2022. A dynamic reweighting strategy for fair federated learning. In ICASSP, 8772-8776.