
Federated Domain Generalization with Decision Insight Matrix

Tianchi Liao (1), Binghui Xie (2), Lele Fu (1), Sheng Huang (1), Bowen Deng (1), Chuan Chen (1) and Zibin Zheng (1)

(1) Sun Yat-sen University, Guangzhou, China
(2) The Chinese University of Hong Kong, Hong Kong
{liaotch, fulle, huangsh253, dengbw3}@mail2.sysu.edu.cn, bhxie21@cse.cuhk.edu.hk, {chenchuan, zhzibin}@mail.sysu.edu.cn

Federated domain generalization addresses the challenge of developing models that generalize across diverse domains while maintaining data privacy in federated learning settings. Current approaches either compromise privacy constraints or focus narrowly on specific aspects of model invariance, often incurring significant computational overhead. We propose FedDIM, a novel approach that leverages the insight matrix, a fine-grained representation of the model's decision-making process derived from element-wise products between feature vectors and classifier weights. By introducing a regularization term that promotes consistency between individual sample insight matrices and their class-wise mean representations, our method captures both feature and classifier invariance. The approach maintains strict privacy requirements and introduces minimal computational overhead, since it reuses intermediate computations already present in the forward pass. Extensive experiments demonstrate that our method achieves superior out-of-distribution generalization compared to existing federated learning approaches while being simple to implement. Our work provides a new perspective on achieving robust generalization in federated learning through the lens of decision-making processes.

1 Introduction

Federated Learning (FL) enables collaborative model training across distributed clients while preserving data privacy [Liao et al., 2025]. In this paradigm, clients train local models on their private data, and a central server periodically aggregates them into a global model, thereby avoiding direct access to raw data [Li et al., 2020a; Li et al., 2024; Chen et al., 2025]. Although FL has shown promising results when data is independently and identically distributed (i.i.d.) [McMahan et al., 2017; Shen et al., 2025], real-world applications are more complex: clients typically collect data independently, resulting in distinct distributions across domains. Furthermore, during deployment, models frequently encounter data from previously unseen target domains, leading to significant distribution shift [Huang et al., 2023; Liao et al., 2024; Wan et al., 2024]. This scenario, known as Federated Domain Generalization (FedDG) and illustrated in Figure 1, raises a fundamental challenge beyond the data heterogeneity of traditional FL: how can federated models generalize across diverse domains while maintaining privacy constraints?

Corresponding author.

Figure 1: Problem illustration of federated domain generalization.

Traditional domain generalization methods learn invariant relationships explicitly from data or representations [Hu et al., 2024; Fu et al., 2025a; Huang et al., 2025]. However, these methods require a centralized setting in which data or representations are shared across clients, potentially compromising client privacy. To address this challenge, researchers have begun exploring federated domain generalization [Qi et al., 2024]. To capture invariant relationships between clients, various FL methods have been developed that focus on either the feature level or the logit level [Qiao et al., 2024]. [Zhang et al., 2023] proposed a model aggregation method based on locally estimated generalization gaps, but it is limited to scenarios where each training domain is treated as a single client. [Huang et al., 2023] proposed FPL, a prototype aggregation method from the feature perspective: by introducing consistency regularization, it aligns local features with prototypes to explicitly learn invariance in the feature extractor. [Guo et al., 2023] introduced FedIIR, a regularization approach based on local empirical risk minimization that implicitly learns invariance by constraining the parameter space. However, these methods fail to integrate information from both the feature extractor and the classifier [Hu et al., 2023; Zhang et al., 2024; Qi et al., 2025; Fu et al., 2025b].

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI-25)

In response, we propose an approach that emphasizes the decision-making process in the classifier layer of deep neural networks, rather than focusing solely on feature or classifier invariance. In conventional models, the final output logits are computed by multiplying the penultimate layer's features with the classifier's weights. A closer analysis [Chen et al., 2023] reveals that each logit value can be decomposed into a sum of element-wise products between the feature vector and the corresponding weight vector. While most FL methods rely on feature prototypes [Wan et al., 2024; Bai et al., 2024] to learn invariance, the intermediate element-wise products (before summation) retain more fine-grained information. Viewing each product term as a contribution to its corresponding logit, we collect these contributions across all classes into a matrix. This matrix, which we term the insight matrix, encapsulates the model's decision-making process for input classification. By exchanging insight matrices in FL, clients can share cross-domain knowledge representations, thus facilitating invariant learning. Building on insight matrices, we propose a federated domain generalization model, named FedDIM, which provides a new theoretical and practical framework for invariant learning. We assume that different clients have heterogeneous input/output distributions, but that a well-generalized model should make decisions based on cues that are consistent across samples and clients.
Based on this intuition, we propose a regularization term that promotes similarity between each sample's insight matrix and the mean insight matrix of its corresponding class. Our approach offers two key advantages. First, it explicitly combines the semantic information of the encoder and the classifier through fine-grained modeling, thus enhancing the model's ability to adapt to client-domain heterogeneity. Second, the insight matrix is a natural byproduct of the logit computation and therefore adds minimal computational overhead. Experimental results show that FedDIM improves the model's ability to learn invariant relationships across client domains. In summary, the main contributions of this paper are as follows:

New perspective: Taking the privacy requirements of FL into account, we propose a novel method built upon category-wise mean insight matrices. It bridges the gap between methods that focus only on feature invariance and those that focus only on logit invariance, offering new insights into out-of-distribution (OOD) generalization in FL.

Simple yet effective algorithm: We introduce an efficient strategy that leverages insight matrices to enhance model robustness. Our method requires minimal modifications to the standard FedAvg algorithm while achieving superior performance. The lightweight implementation adds only a few lines of code, and its effectiveness is theoretically guaranteed.

Superior performance: We conduct extensive experiments on multiple benchmark datasets. The results demonstrate that FedDIM consistently outperforms existing federated learning methods in out-of-distribution generalization.

2 Preliminaries

2.1 Problem Setting

In federated domain generalization, the data in each client is sampled from a different domain. Let $\mathcal{D}$ denote the set of all client domains. We denote the training domains by $\mathcal{D}_{tr} = \{D_1, \dots, D_M\}$, $\mathcal{D}_{tr} \subset \mathcal{D}$, where $M$ is the number of training domains (or clients). Let $\mathcal{X}$ and $\mathcal{Y}$ denote the input space and target space, respectively, where the target space contains $K$ classes. Each client $c \in \mathcal{D}$ holds a local dataset $\{(x_i^c, y_i^c)\}_{i=1}^{n_c}$, where $n_c$ is the number of samples. Let the loss function be $L(f(x), y)$. Then, for each client $c$, the expected risk is $\mathcal{E}_c(f) = \mathbb{E}_{x^c, y^c} L(f(x^c), y^c)$, and the global expected risk of model $f$ is $\mathcal{E}_{\mathcal{D}_{tr}}(f) = \mathbb{E}_{\mathcal{D}_{tr}} \mathcal{E}_c(f)$. The ideal goal of FL training is to minimize the overall loss on $\mathcal{D}$. In practice, however, FL typically involves a large number of clients with heterogeneous data distributions, and only a subset of clients participates in training. This introduces a distribution shift between the participating clients and those not seen during training, leading to the out-of-distribution (OOD) generalization problem [Qi et al., 2024; Xie et al., 2024]. Therefore, instead of optimizing the expected risk over the entire set of domains, we optimize the following empirical risk objective:

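$$\min_{\theta} \mathcal{E}_{\mathcal{D}}(f) \approx \mathcal{E}_{\mathcal{D}_{tr}}(f) = \frac{1}{M}\sum_{c=1}^{M}\frac{1}{n_c}\sum_{i=1}^{n_c} L\left(f(x_i^c; \theta), y_i^c\right) \qquad (1)$$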

The federated OOD problem cannot be solved directly, since not all potential clients are observed. This makes it more challenging than ordinary heterogeneous FL. To generalize to non-participating clients, the key question of our study is how to learn the invariant relationship between inputs and targets.

2.2 Modeling Decision Insight Matrix

In current deep models, the decision process involves two main steps: (1) the feature extractor, which maps the input from the original feature space to the feature embedding space, i.e., $z = h(\varphi, x) \in \mathbb{R}^D$ with $h: \mathcal{X} \to \mathcal{Z}$; and (2) the classifier, which uses the features to compute the final logits, i.e., $o = g(w, z) \in \mathbb{R}^K$ with $g: \mathcal{Z} \to \hat{\mathcal{Y}}$. Thus the model can be written as $f(\theta) = g(w) \circ h(\varphi)$, where $\theta = (\varphi, w)$.

In the FedDG scenario, clients primarily focus on samples from their local domains, while the server needs to aggregate inter-domain information from multiple clients to learn invariant relationships. Most existing research learns invariance by uploading features or logits to regularize the model, but these approaches have limitations: ❶ Focusing solely on feature invariance overlooks how the classifier weights different feature elements, which may lead to biased estimates of feature importance and thereby weaken the model's generalization ability. ❷ Although logits implicitly encode the relationship between features and the classifier's weights, they provide only coarse numerical values and lack fine-grained recognition of cross-domain generalizability, and thus offer little insight into the underlying decision-making process.

Figure 2: An overview of FedDIM based on the insight matrix. Clients process data from different domains. Each client updates its local model by minimizing the classification loss $L_{ce}$ and the distance $L_{im}$ between the global and local insight matrices, then uploads the model and its local mean insight matrix to the server. The server aggregates them by Eq. (5), updates the global insight matrix by Eq. (6), and sends the global model and matrix back to the clients.

Since each logit value can be decomposed into the sum of the element-wise products of the feature vector and the corresponding weight vector, we argue that these intermediate product terms retain more granular information. Assuming the label set contains $K$ classes, the logits can be represented as $o = W^{\top} z \in \mathbb{R}^K$, where $W \in \mathbb{R}^{D \times K}$ is the classifier weight matrix. For simplicity, we ignore the bias term of the classifier. Based on this decomposition, we treat each product term as a contribution to its corresponding logit. Thus, the logit value of class $k$ can be expressed as

$$o_k = W_{\{:,k\}}^{\top} z = \sum_{j=1}^{D} W_{\{j,k\}} z_j. \qquad (2)$$

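We aggregate the contributions of all classes into a matrix, called the Insight Matrix, which is defined as follows:

$$I = \begin{pmatrix} W_{\{1,1\}} z_1 & W_{\{1,2\}} z_1 & \cdots & W_{\{1,K\}} z_1 \\ W_{\{2,1\}} z_2 & W_{\{2,2\}} z_2 & \cdots & W_{\{2,K\}} z_2 \\ \vdots & \vdots & \ddots & \vdots \\ W_{\{D,1\}} z_D & W_{\{D,2\}} z_D & \cdots & W_{\{D,K\}} z_D \end{pmatrix} \qquad (3)$$

The insight matrix $I \in \mathbb{R}^{D \times K}$ encapsulates key information about the model's decision-making process when classifying an input.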
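The decomposition in Eqs. (2) and (3) can be illustrated with a minimal sketch. It uses plain Python lists rather than a tensor library, and the function names and toy values are assumptions made for illustration, not the authors' implementation. It shows that summing the insight matrix over the feature index recovers the logits.

```python
from typing import List

Matrix = List[List[float]]


def insight_matrix(W: Matrix, z: List[float]) -> Matrix:
    """Return I with I[j][k] = W[j][k] * z[j] (shape D x K), as in Eq. (3)."""
    return [[w_jk * z_j for w_jk in row] for row, z_j in zip(W, z)]


def logits_from_insight(I: Matrix) -> List[float]:
    """Sum each column of I over the feature index j to recover o_k (Eq. (2))."""
    num_classes = len(I[0])
    return [sum(row[k] for row in I) for k in range(num_classes)]


# Toy example with D = 3 features and K = 2 classes (illustrative values only).
W = [[0.5, -1.0],
     [2.0, 0.0],
     [-1.0, 1.5]]
z = [1.0, 2.0, 3.0]

I = insight_matrix(W, z)      # per-feature, per-class contributions
o = logits_from_insight(I)    # column sums equal the logits W^T z
```

Because the matrix is built from quantities the forward pass already computes, the only extra work in this sketch is keeping the products before they are summed.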

3 Methodology

We propose a federated domain generalization solution that uses the insight matrix as the key component for exchanging information between the server and clients. The core idea is that the insight matrix of each sample should be consistent with the mean insight matrix of its class, which implies that insight matrices for the same class across different domains should be similar. This ensures that the model makes classification decisions based on the same reasoning process. The framework of our method is illustrated in Figure 2.

3.1 Local Mean Class Insight Matrix

According to Eq. (2), the client generates an insight matrix for each sample. We define a local mean insight matrix $I_c^{(k)}$ to represent the $k$-th class. For the $c$-th client, it is the mean of the insight matrices of the samples belonging to class $k$:

$$I_c^{(k)} = \frac{1}{|n_{c,k}|} \sum_{(x,y) \in n_{c,k}} W \odot h(\varphi, x), \qquad (4)$$

where $\odot$ denotes the element-wise product (with the feature vector $h(\varphi, x) \in \mathbb{R}^D$ applied to every column of $W$), $I_c^{(k)} \in \mathbb{R}^{D \times K}$, and $n_{c,k}$ denotes the set of samples with class $k$ in client $c$. We compute the mean insight matrix for each class and stack them to obtain the local mean insight matrix $I_c = [I_c^{(1)}, \dots, I_c^{(K)}] \in \mathbb{R}^{K \times D \times K}$.
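A minimal sketch of the class-wise averaging in Eq. (4) follows, again in plain Python. The helper names are hypothetical, features are assumed to be precomputed outputs of the feature extractor, and classes that do not occur on the client are simply left out of the returned dictionary, which is an assumption of this sketch rather than something the text specifies.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def insight_matrix(W: Matrix, z: Sequence[float]) -> Matrix:
    """I[j][k] = W[j][k] * z[j], the per-sample insight matrix."""
    return [[w * z_j for w in row] for row, z_j in zip(W, z)]


def local_mean_insight(W: Matrix,
                       features: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> Dict[int, Matrix]:
    """Average per-sample insight matrices within each class (Eq. (4))."""
    sums: Dict[int, Matrix] = {}
    counts: Dict[int, int] = {}
    for z, y in zip(features, labels):
        I = insight_matrix(W, z)
        if y not in sums:
            sums[y] = [[0.0] * len(row) for row in I]
            counts[y] = 0
        for j, row in enumerate(I):
            for k, value in enumerate(row):
                sums[y][j][k] += value
        counts[y] += 1
    return {y: [[v / counts[y] for v in row] for row in sums[y]] for y in sums}


# Toy client with three samples, D = 2 features, K = 2 classes.
W = [[1.0, 2.0],
     [0.5, -1.0]]
features = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
labels = [0, 0, 1]
I_c = local_mean_insight(W, features, labels)
```

Only these class-level averages leave the client in this sketch, never per-sample features or raw inputs.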

3.2 Global Aggregation and Update

To generalize the global model to unseen clients, it is insufficient to simply aggregate the models of participating clients on the server. Although the clients hold domain data with different distributions, they share the same label space, which enables the participating clients to share a common embedding space. By aggregating the clients' insight matrices based on class information, we can learn invariant relationships in the federated domain generalization scenario. Thus, our aggregated global model and global insight matrix can be expressed as:

$$\theta^{t+1} = \frac{1}{C} \sum_{c \in \mathcal{C}} \theta_c^t, \quad \text{and} \quad I_g = \frac{1}{C} \sum_{c \in \mathcal{C}} I_c, \qquad (5)$$

where $\theta_c^t$ is the model trained by client $c$ in round $t$, $\mathcal{C}$ is the set of clients sampled in the round, and $C$ is the number of clients sampled per round.

In the FL training process, clients are randomly sampled in each round. However, in the FedDG scenario, the significant differences in client data distributions can lead to considerable variation in the aggregated insight matrix from round to round. Therefore, we adopt a momentum update for the global insight matrix in each round. This reduces the impact of noise in each iteration and balances new information with historical information, making the update smoother and preventing overreaction to individual training data:

$$I^{t+1} = (1 - m) I^t + m I_g, \qquad (6)$$

where $m$ is a positive momentum coefficient, $I$ is initialized with the insight matrix computed in the first iteration, and $I_g$ is calculated by Eq. (5).

Algorithm 1 FedDIM
Input: total rounds $T$, local epochs $E$, total number of clients $M$, number of sampled clients $C$, learning rate $\eta$, loss hyperparameter $\lambda$
Server executes:
1: Initialize global model $\theta$ and global insight matrix $I$
2: for each round $t = 1, \dots, T$ do
3:     Server samples a subset $\mathcal{C}$ of clients
4:     for each client $c \in \mathcal{C}$ in parallel do
5:         $\{\theta_c^t, I_c\} \leftarrow$ ClientUpdate($\theta^t$, $I^t$)
6:     end for
7:     Update the global model and calculate the global insight matrix $I_g$ by Eq. (5)
8:     Update the global insight matrix $I^{t+1}$ by Eq. (6)
9: end for
ClientUpdate:
1: Initialize local model $\theta_c^t = \theta^t$
2: for each local epoch $e = 1, \dots, E$ do
3:     Sample a mini-batch $B_c$
4:     Calculate the insight matrix $I_n$ of the $n$-th sample
5:     Calculate the local loss by Eq. (7)
6:     Update local model: $\theta_c^t \leftarrow \theta_c^t - \eta \nabla L_c(\theta_c^t; B_c)$
7: end for
8: Calculate the mean class insight matrix $I_c$ by Eq. (4)
9: return $\theta_c^t$ and $I_c$
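The server-side steps of Eqs. (5) and (6) can be sketched as follows for the insight matrix of a single class. This is an illustrative assumption-laden sketch: the function names are hypothetical, matrices are plain nested lists, and the same uniform averaging would be applied to the model parameters.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def average_matrices(mats: Sequence[Matrix]) -> Matrix:
    """Element-wise mean of equally shaped matrices (I_g in Eq. (5))."""
    n = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[j][k] for m in mats) / n for k in range(cols)]
            for j in range(rows)]


def momentum_update(I_prev: Matrix, I_g: Matrix, m: float) -> Matrix:
    """I^{t+1} = (1 - m) * I^t + m * I_g, as in Eq. (6)."""
    return [[(1.0 - m) * a + m * b for a, b in zip(row_prev, row_g)]
            for row_prev, row_g in zip(I_prev, I_g)]


# Two sampled clients report their mean insight matrix for one class.
client_mats = [[[1.0, 2.0], [3.0, 4.0]],
               [[3.0, 2.0], [1.0, 0.0]]]
I_g = average_matrices(client_mats)

I_prev = [[0.0, 0.0], [4.0, 4.0]]
I_next = momentum_update(I_prev, I_g, m=0.5)
```

In this sketch, `m = 1` replaces the stored matrix with the current round's aggregate, while `m = 0` keeps the stored matrix unchanged, which are the two fixed settings examined in the ablation study.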

3.3 Local Model Update

Clients update their local models to learn invariant relationships and generate consistent insight matrices across clients. To achieve this, we introduce a regularization term in the local loss, which enables the local model to capture the invariant relationship between data and targets during single-domain learning. Specifically, the loss is defined as follows:

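$$L = L_{ce} + L_{im} = \frac{1}{B} \sum_{i=1}^{B} L_{ce}\left(f(x_i^c; \theta), y_i^c\right) + \lambda \frac{1}{B} \sum_{k} \sum_{\{i \mid y_i = k\}} \left\| I_i - I^k \right\|_2, \qquad (7)$$

where $\|\cdot\|_2$ is the $\ell_2$ norm, $L_{ce}$ is the cross-entropy loss, $\lambda$ is the trade-off hyperparameter, and $B$ is the number of samples in a mini-batch. $I_i$ is the insight matrix of the $i$-th sample, and $I^k$ is the global insight matrix for the $k$-th class distributed by the server.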
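To make Eq. (7) concrete, the following sketch evaluates the local loss on one mini-batch of precomputed features. Several points are assumptions of this sketch: the helper names are hypothetical, the $\ell_2$ norm of a matrix is read as the norm of its flattened entries, and only the loss value is computed, since in practice an automatic differentiation framework would supply the gradient step.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def insight_matrix(W: Matrix, z: Sequence[float]) -> Matrix:
    """I[j][k] = W[j][k] * z[j], the per-sample insight matrix."""
    return [[w * z_j for w in row] for row, z_j in zip(W, z)]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for a single sample."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(o - top) for o in logits))
    return log_sum - logits[label]


def matrix_distance(A: Matrix, B: Matrix) -> float:
    """l2 norm of the flattened difference A - B (assumed reading of Eq. (7))."""
    return math.sqrt(sum((a - b) ** 2
                         for ra, rb in zip(A, B) for a, b in zip(ra, rb)))


def local_loss(W: Matrix, features: Sequence[Sequence[float]],
               labels: Sequence[int], global_insight: Dict[int, Matrix],
               lam: float) -> float:
    """Mean cross-entropy plus lam times the mean insight-matrix distance."""
    batch_size = len(labels)
    ce_total, im_total = 0.0, 0.0
    for z, y in zip(features, labels):
        I = insight_matrix(W, z)
        logits = [sum(row[k] for row in I) for k in range(len(W[0]))]
        ce_total += cross_entropy(logits, y)
        im_total += matrix_distance(I, global_insight[y])
    return ce_total / batch_size + lam * im_total / batch_size


# One-sample toy batch; the global class matrix is all zeros for illustration.
W = [[1.0, 0.0], [0.0, 1.0]]
features, labels = [[2.0, 0.0]], [0]
global_insight = {0: [[0.0, 0.0], [0.0, 0.0]]}
loss_plain = local_loss(W, features, labels, global_insight, lam=0.0)
loss_reg = local_loss(W, features, labels, global_insight, lam=0.1)
```

With `lam = 0.0` the value reduces to the plain cross-entropy, matching the ablation variant that degenerates to FedAvg.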

4 Theoretical Analysis

This section presents a theoretical analysis of how our method addresses the distribution shift problem. We first state the following lemma from [Ben-David et al., 2010], which bounds the risk gap between two domains in terms of their distribution divergence.

Lemma 4.1. Let $d_{\mathcal{H}\Delta\mathcal{H}}(A, B)$ denote the domain divergence between two domain distributions $A$ and $B$. The expected risk gap between $A$ and $B$ is bounded as $|\mathcal{E}_A(\theta) - \mathcal{E}_B(\theta)| \le \frac{1}{2} d_{\mathcal{H}\Delta\mathcal{H}}(A, B)$.

We then consider the federated domain generalization setting where the training data follow the distribution $\mathcal{D} = \bigcup_{c=1}^{|C|} \mathcal{D}_c$, with $\mathcal{D}_c$ denoting the data distribution of client $c$ among $|C|$ total clients [Yan and Guo, 2025]. Each client maintains a training set $D_c$ sampled from $\mathcal{D}_c$ with size $n_c = |D_c|$, forming an overall training set $D$ of $n = \sum_c n_c$ samples and model aggregation weights $\{p_c = n_c / n\}$. Let $\mathcal{E}_{\mathcal{D}}(\theta)$ denote the expected risk on $\mathcal{D}$ and $\hat{\mathcal{E}}_{D}(\theta)$ the empirical risk on $D$. We define $\hat{\mathcal{E}}_{D_c}(\theta)$ and $\hat{I}_c(\theta)$ as the two loss terms in Eq. (7).

Theorem 4.2. Let $\hat{\theta}$ be the aggregated global model federatedly trained with the proposed overall loss function. Define $\theta_T^* := \arg\min_{\theta} \mathcal{E}_T(\theta)$ and $\theta_c^* := \arg\min_{\theta_c} \hat{\mathcal{E}}_{D_c}(\theta_c)$. Let $\mathcal{H}$ be a hypothesis space of VC dimension $d$. For any $\delta \in (0, 1)$, with probability at least $1 - \delta$, the generalization gap of the model $\hat{\theta}$ on the unseen testing domain $T$ has the following bound,

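$$\mathcal{E}_T(\hat{\theta}) - \mathcal{E}_T(\theta_T^*) \le \sum_c p_c \Big( \hat{\mathcal{E}}_{D_c}(\hat{\theta}) - \hat{\mathcal{E}}_{D_c}(\theta_c^*) + \hat{I}_c(\hat{\theta}) + d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_c, T) + \dots\, \delta + d \log n_c \,\dots \Big) + \dots \qquad (8)$$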

The complete proof is provided in the supplementary material. In this theorem, $d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_c, T)$ represents the domain divergence between source domain $\mathcal{D}_c$ and target domain $T$, while the remaining additive term represents the optimal model's residual error on both $\mathcal{D}$ and $T$. The theorem establishes that the generalization gap of the global model $\hat{\theta}$ on target domain $T$ is upper-bounded by two components: a weighted average term and the residual error. The weighted average term incorporates each client's empirical risk, insight matrix term, and domain divergence bound. Our proposed method aims to enhance the global model's generalization ability on $T$ by explicitly minimizing the first two terms of the bound. The final term inside the weighted average, which emerges from converting expected loss to empirical loss, is determined by the dataset size.

5 Experiments

5.1 Experimental Setup

Datasets. To evaluate our approach, we conducted experiments on four datasets. Rotated MNIST [Ghifary et al., 2015] consists of 7000 MNIST samples rotated at angles of 0°, 15°, 30°, 45°, 60°, and 75°, resulting in six different domains. PACS [Li et al., 2017] has 9991 images of seven object categories in four domains (photo, art, cartoon, and sketch). VLCS [Fang et al., 2013] has 10729 images of five object categories in four domains (Caltech101, LabelMe, SUN09, and VOC2007). Office Home [Venkateswara et al., 2017] is an image recognition dataset that includes 15,588 images of 65 classes from four domains (art, clipart, product, and real-world). These datasets are commonly used in the domain generalization literature.

We adhere to the experimental methodology of FedIIR [Guo et al., 2023]. For all datasets, we use the leave-one-domain-out strategy [Gulrajani and Lopez-Paz, 2020]: we choose one domain as the test domain, train the model on all remaining domains, and evaluate it on the chosen domain. Each source domain is treated as a client. Following standard practice, we use 90% of the available data for training and 10% for validation.

Considering the FL setting, we explore two scenarios based on the number of clients: the one-domain-one-client scenario and the one-domain-multiple-clients scenario. In the one-domain-one-client scenario, each training domain is treated as an individual client. In the one-domain-multiple-clients scenario, the data from each training domain is randomly partitioned into multiple subsets, with each client holding one subset of a given training domain. The details of the data partitioning are provided in Appendix C.1.

Baselines. We consider two classic federated methods, FedAvg [McMahan et al., 2017] and FedProx [Li et al., 2020b], and four state-of-the-art federated methods for domain generalization, FedADG [Zhang et al., 2021], FedSR [Nguyen et al., 2022], FedIIR [Guo et al., 2023] and FedLGF [Yan and Guo, 2025], as baselines.

Implementation. We design dataset-specific models for each task. For the Rotated MNIST dataset, the feature encoder consists of four convolutional blocks with ReLU activation, group normalization, and average pooling, followed by a linear classifier; the training batch size is 64. For the VLCS and PACS datasets, ResNet-18 is used as the feature encoder, while ResNet-50 is employed for the Office Home dataset. The classifiers for these three datasets consist of two fully connected layers, and the training batch size is 32. For all datasets and scenarios, we set the number of communication rounds T to 100, with E = 1 local epoch per round to accommodate limited local computational resources. Local models are updated using the SGD optimizer with a momentum of 0.9. For the baselines, we use the best hyperparameters reported in the original papers; the optimal hyperparameters of FedDIM were found by grid search. Each experiment was repeated three times and the average is reported.
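The evaluation protocol described above (leave-one-domain-out, a 90/10 train/validation split, and splitting each training domain across several clients) can be sketched as below. The function names, the uniform random partitioning, and the sample count are assumptions for illustration only; the actual partitioning is the one given in Appendix C.1.

```python
import random
from typing import List, Sequence, Tuple


def leave_one_domain_out(domains: Sequence[str],
                         test_domain: str) -> Tuple[List[str], str]:
    """Hold out one domain for testing and train on the remaining ones."""
    return [d for d in domains if d != test_domain], test_domain


def split_train_val(indices: Sequence[int], val_fraction: float = 0.1,
                    seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and reserve a fraction for validation."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def partition_into_clients(indices: Sequence[int], clients_per_domain: int,
                           seed: int = 0) -> List[List[int]]:
    """Randomly split one domain's training indices across several clients."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::clients_per_domain] for i in range(clients_per_domain)]


# PACS with "art" held out; 100 samples per domain is a made-up toy size.
train_domains, test_domain = leave_one_domain_out(
    ["photo", "art", "cartoon", "sketch"], "art")
train_idx, val_idx = split_train_val(range(100))
clients = partition_into_clients(train_idx, clients_per_domain=10)
total_clients = len(train_domains) * len(clients)
```

With three training domains and ten clients per domain, this sketch yields 30 clients, the same client count as the PACS multi-client setting (M=30).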

5.2 Experimental Results

We evaluated all methods in two scenarios, where M denotes the total number of clients and C the number of sampled clients. Table 1 reports the results for the one-domain-one-client scenario, while Table 2 presents the results for the one-domain-multi-client scenario. Detailed results for specific domains are provided in Appendix C.

The experimental results demonstrate that, under the cross-domain client setting, our proposed method consistently outperforms the other state-of-the-art baselines. In terms of average accuracy, FedDIM outperforms the latest baseline FedLGF by 1.20 percentage points in Table 1 and 1.37 percentage points in Table 2. These observations validate the effectiveness of our method compared to existing baselines.

As the total number of clients increases, the performance of all methods declines significantly in the one-domain-multi-client scenario. In particular, FedADG and FedSR exhibit the most noticeable performance drops, likely because the increased number of clients makes it difficult to align the distributions across different source domains. To further validate this hypothesis, we extended the number of clients to 100, with the experimental results provided in Table 3 of the Appendix. Under this setting, both FedADG and FedSR perform worse than FedAvg. Additionally, when the number of clients is large, the performance of FedIIR also drops significantly, potentially because each client then holds fewer samples per category. In contrast, our method exhibits the smallest performance degradation, highlighting its effectiveness and strong generalization capability in multi-client scenarios.

Loss surface visualization. We visualized the loss surface in the one-domain-one-client scenario using the art test domain of the PACS dataset, as shown in Figure 3. In the visualization, we use the global model as the origin and label the local models, consistent with the visualization technique of [Garipov et al., 2018]. Compared to FedAvg, our local models all converge in the flat region of the loss surface, and the resulting global model is therefore more generalizable. In addition, the gap between the global and local models is much smaller, suggesting that our approach has a clear advantage in maintaining a consistent optimization objective across different domains.

Figure 3: Loss surfaces w.r.t. model parameters on the PACS dataset with target domain art.

Visualization. To better demonstrate the effectiveness of our method, we performed t-SNE [Van der Maaten and Hinton, 2008] visualization of the features z on the PACS dataset, as shown in Figure 4. Compared to FedAvg, our method yields clearer class clusters. On the training domains, FedDIM presents a clear block structure, indicating the effectiveness of the training process. When the model is applied to the test domain (dark red part), the clustering structure remains evident. This demonstrates the effectiveness of our method in generalizing to unseen distributions.


| Methods | Rotated MNIST (M=5, C=5), ConvNet | PACS (M=3, C=3), ResNet-18 | VLCS (M=3, C=3), ResNet-18 | Office Home (M=3, C=3), ResNet-50 | Average |
|---|---|---|---|---|---|
| FedAvg | 94.77 ± 0.2 | 83.13 ± 0.1 | 75.38 ± 0.7 | 68.94 ± 0.1 | 80.56 |
| FedProx | 94.41 ± 0.3 | 83.32 ± 0.2 | 76.43 ± 1.2 | 68.04 ± 0.5 | 80.55 |
| FedADG | 94.96 ± 0.0 | 83.28 ± 0.4 | 77.53 ± 0.3 | 68.87 ± 0.4 | 81.16 |
| FedSR | 94.65 ± 0.4 | 83.65 ± 0.3 | 75.48 ± 0.7 | 69.25 ± 0.3 | 80.76 |
| FedIIR | 95.22 ± 0.3 | 83.87 ± 0.3 | 77.75 ± 0.8 | 69.52 ± 0.1 | 81.59 |
| FedLGF | 95.09 ± 0.2 | 84.20 ± 0.5 | 77.23 ± 1.1 | 69.33 ± 0.2 | 81.46 |
| FedDIM | 95.83 ± 0.2 | 84.57 ± 0.4 | 79.12 ± 0.8 | 71.12 ± 0.2 | 82.66 |

Table 1: Performance comparison (%) of all compared methods on Rotated MNIST, PACS, VLCS, and Office Home using leave-one-domain-out validation. Each training domain is treated as a client, and all clients participate in each round of joint training.

| Methods | Rotated MNIST (M=50, C=10), ConvNet | PACS (M=30, C=10), ResNet-18 | VLCS (M=30, C=10), ResNet-18 | Office Home (M=30, C=10), ResNet-50 | Average |
|---|---|---|---|---|---|
| FedAvg | 91.00 ± 0.4 | 76.82 ± 0.5 | 73.75 ± 0.9 | 67.59 ± 0.2 | 77.29 |
| FedProx | 91.11 ± 0.6 | 77.48 ± 0.7 | 74.18 ± 1.4 | 67.73 ± 0.7 | 77.63 |
| FedADG | 92.71 ± 0.3 | 77.89 ± 0.6 | 71.95 ± 1.7 | 67.23 ± 0.2 | 77.44 |
| FedSR | 92.44 ± 0.8 | 78.13 ± 0.5 | 73.33 ± 0.5 | 65.84 ± 0.6 | 77.43 |
| FedIIR | 93.28 ± 0.5 | 79.25 ± 0.5 | 75.12 ± 0.8 | 68.19 ± 0.3 | 78.96 |
| FedLGF | 92.84 ± 0.6 | 79.49 ± 0.6 | 75.79 ± 1.3 | 68.01 ± 0.4 | 79.03 |
| FedDIM | 93.86 ± 0.4 | 80.57 ± 0.5 | 77.12 ± 1.1 | 70.03 ± 0.5 | 80.40 |

Table 2: Performance comparison (%) of all compared methods on Rotated MNIST, PACS, VLCS, and Office Home using leave-one-domain-out validation. The total number of clients exceeds the number of domains, and 10 clients are sampled per round for training.

Figure 4: Visualization of t-SNE embeddings for the PACS dataset with art as the unseen target domain (panels: FedAvg and FedDIM). Different colors represent different domains (photo, cartoon, sketch, and the test domain art). The seven clusters correspond to the classes.

5.3 Ablation Study

To confirm the validity of our approach, we designed the following five variants to assess the independent impact of each component.

W/λ = 0: The method degenerates to FedAvg.

W/m = 0: The insight matrix $I$ is generated from the initial pre-trained model and then kept fixed.

W/m = 1: The insight matrix $I$ is replaced in every round by the one computed in the current round.

W/Fea: We replace the second term of Eq. (7) with a feature-invariance term, $L_{im} = \frac{1}{B} \sum_{k} \sum_{\{i \mid y_i = k\}} \| z_i - z^k \|_2$. Feature invariance focuses on the consistent expression of the features in the middle layer of the model.

W/Log: We replace the second term of Eq. (7) with a logit-invariance term, $L_{im} = \frac{1}{B} \sum_{k} \sum_{\{i \mid y_i = k\}} \| o_i - o^k \|_2$. Logit invariance focuses on the stability of the model's predicted probability, which is the behavior of the output layer.

Table 3 presents the experimental results of these variants. On average, W/m = 0 surpasses the other variants, indicating that using the insight matrix directly from the pre-trained model already facilitates invariant decision-making. W/m = 1 is outperformed by our momentum updating strategy because the limited number of samples in a single batch cannot adequately represent all samples of the same class. We therefore design a dynamic update scheme based on historical information. Moreover, both W/Fea and W/Log outperform the case where λ = 0, because the invariance constraint helps the model achieve better generalization and robustness to some extent. However, the feature-invariance constraint may amplify the influence of irrelevant features, which have large values but correspond to small weights in the decision-making process, thus weakening the overall classification performance. Furthermore, focusing solely on logit invariance does not account for the varying contributions of individual features to the final decision. This can cause small contributions to be boosted so that the summation matches the mean value, leading the model to emphasize irrelevant features and further degrading performance.
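The difference between the three invariance terms compared in the ablation can be seen on a toy case. The sketch below uses hypothetical helper names and made-up values: it constructs two feature vectors that produce identical logits under the same classifier, so a logit-level distance cannot tell them apart, while the insight-matrix distance can.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def insight_matrix(W: Matrix, z: Sequence[float]) -> Matrix:
    """I[j][k] = W[j][k] * z[j]."""
    return [[w * z_j for w in row] for row, z_j in zip(W, z)]


def logits(W: Matrix, z: Sequence[float]) -> List[float]:
    """o_k = sum_j W[j][k] * z[j]."""
    return [sum(W[j][k] * z[j] for j in range(len(z)))
            for k in range(len(W[0]))]


def flatten(M: Matrix) -> List[float]:
    return [v for row in M for v in row]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Same classifier, two different feature vectors (illustrative values).
W = [[1.0, 1.0],
     [1.0, 1.0]]
z_a, z_b = [1.0, 3.0], [3.0, 1.0]

feature_dist = l2_distance(z_a, z_b)
logit_dist = l2_distance(logits(W, z_a), logits(W, z_b))
insight_dist = l2_distance(flatten(insight_matrix(W, z_a)),
                           flatten(insight_matrix(W, z_b)))
```

Here the logit distance is zero even though the two samples reach their logits through different feature contributions, which is the information the insight matrix retains.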


Figure 5: Accuracy convergence curves on the validation and test sets in the one-domain-multiple-clients scenario. There are 50 clients for Rotated MNIST and 30 clients for the other datasets, and 10 clients are sampled in each round for training. The test domains for Rotated MNIST, PACS, VLCS, and Office Home are 0°, art, VOC2007, and art, respectively.

| Methods | F | O | I | M | P | V | H | Avg. |
|---|---|---|---|---|---|---|---|---|
| W/λ=0 | - | - | - | 91.00 | 76.82 | 73.75 | 67.59 | 77.29 |
| W/m=0 | - | - | ✓ | 93.79 | 79.33 | 76.92 | 68.16 | 79.55 |
| W/m=1 | - | - | ✓ | 92.26 | 78.62 | 76.94 | 69.13 | 79.24 |
| W/Fea | ✓ | - | - | 93.84 | 78.23 | 76.82 | 68.96 | 79.46 |
| W/Log | - | ✓ | - | 92.26 | 77.71 | 75.16 | 68.60 | 78.43 |
| FedDIM | - | - | ✓ | 93.86 | 80.57 | 77.12 | 70.03 | 80.40 |

Table 3: Ablation study on four datasets. Under Invariance (columns F, O, I), F stands for feature, O stands for logit value, and I stands for our Insight Matrix. Under Test datasets (columns M, P, V, H), M stands for Rotated MNIST, P for PACS, V for VLCS, and H for Office Home.

5.4 Visualization of the Convergence Process

We present the convergence behavior of different methods in the one-domain-multiple-clients scenario. The learning rate is fixed to the same value for all methods. We report the accuracy of each method on both the validation and test sets, as shown in Figure 5. All methods converge stably on the validation set, with our algorithm achieving relatively superior performance across datasets, particularly on VLCS and Office Home. Although the test accuracy of the different methods fluctuates to varying degrees, it stabilizes for all methods in the later stages of training, indicating good convergence behavior on the test set.

5.5 Parameter Sensitivity Analysis

We investigated the effects of the momentum coefficient $m$ and the loss trade-off parameter $\lambda$ in the one-domain-multi-client scenario. We evaluated the sensitivity of the model on the four datasets over the ranges $\lambda \in \{0.0001, 0.001, 0.01, 0.1, 1\}$ and $m \in \{0.1, 0.3, 0.5, 0.7, 0.9\}$, as shown in Figure 6. The results show that FedDIM performs consistently when $\lambda \in \{0.0001, 0.001, 0.01, 0.1\}$, but performance degrades significantly at $\lambda = 1$. Furthermore, models with static momentum ($m = 0$ or $m = 1$) perform worse than those with dynamic momentum updates, highlighting the importance of momentum in improving generalization.

(a) Trade-off parameter λ

(b) Momentum coefficient m

Figure 6: Average test accuracy (%) for various values of the hyperparameters λ and m in the one-domain-multi-client setting.

6 Conclusion

We study OOD generalization in federated learning from the perspective of fine-grained invariant relationship learning, which captures subtle yet crucial patterns across distributed domains. We propose FedDIM, a simple yet effective method that enhances OOD generalization by leveraging insight matrices to distill domain-invariant relationships from heterogeneous client data. Our theoretical analysis shows that FedDIM can generalize to unseen domains by maintaining consistent relationships across different distributions, while empirical results demonstrate state-of-the-art performance on standard federated domain generalization benchmarks, including the Rotated MNIST, PACS, VLCS, and Office Home datasets.

Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI-25)

Acknowledgments

The research is supported by the National Key R&D Program of China (2023YFB2703700) and the National Natural Science Foundation of China (62176269).

Contribution Statement

Tianchi Liao and Binghui Xie contributed equally to this work.

References

[Bai et al., 2024] Sikai Bai, Jie Zhang, Song Guo, Shuaicheng Li, Jingcai Guo, Jun Hou, Tao Han, and Xiaocheng Lu. Diprompt: Disentangled prompt tuning for multiple latent domain generalization in federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27284-27293, 2024.

[Ben-David et al., 2010] Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. A theory of learning from different domains. Machine learning, 79:151-175, 2010.

[Chen et al., 2023] Liang Chen, Yong Zhang, Yibing Song, Anton Van Den Hengel, and Lingqiao Liu. Domain generalization via rationale invariance. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1751-1760, 2023.

[Chen et al., 2025] Chuan Chen, Tianchi Liao, Xiaojun Deng, Zihou Wu, Sheng Huang, and Zibin Zheng. Advances in robust federated learning: A survey with heterogeneity considerations. IEEE Transactions on Big Data, 2025.

[Fang et al., 2013] Chen Fang, Ye Xu, and Daniel N Rockmore. Unbiased metric learning: On the utilization of multiple datasets and web images for softening bias. In Proceedings of the IEEE International Conference on Computer Vision, pages 1657-1664, 2013.

[Fu et al., 2025a] Lele Fu, Sheng Huang, Yanyi Lai, Tianchi Liao, Chuanfu Zhang, and Chuan Chen. Beyond federated prototype learning: Learnable semantic anchors with hyperspherical contrast for domain-skewed data. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 16648-16656, 2025.

[Fu et al., 2025b] Lele Fu, Sheng Huang, Yanyi Lai, Chuanfu Zhang, Hong-Ning Dai, Zibin Zheng, and Chuan Chen. Federated domain-independent prototype learning with alignments of representation and parameter spaces for feature shift. IEEE Transactions on Mobile Computing, pages 1-16, 2025.

[Garipov et al., 2018] Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry P Vetrov, and Andrew G Wilson. Loss surfaces, mode connectivity, and fast ensembling of dnns. Advances in neural information processing systems, 31, 2018.

[Ghifary et al., 2015] Muhammad Ghifary, W Bastiaan Kleijn, Mengjie Zhang, and David Balduzzi. Domain generalization for object recognition with multi-task autoencoders. In Proceedings of the IEEE international conference on computer vision, pages 2551-2559, 2015.

[Gulrajani and Lopez-Paz, 2020] Ishaan Gulrajani and David Lopez-Paz. In search of lost domain generalization. arXiv preprint arXiv:2007.01434, 2020.

[Guo et al., 2023] Yaming Guo, Kai Guo, Xiaofeng Cao, Tieru Wu, and Yi Chang. Out-of-distribution generalization of federated learning via implicit invariant relationships. In International Conference on Machine Learning, pages 11905-11933. PMLR, 2023.

[Hu et al., 2023] Ming Hu, Zeke Xia, Dengke Yan, Zhihao Yue, Jun Xia, Yihao Huang, Yang Liu, and Mingsong Chen. Gitfl: Uncertainty-aware real-time asynchronous federated learning using version control. In Proceedings of IEEE Real-Time Systems Symposium (RTSS), pages 145-157. IEEE, 2023.

[Hu et al., 2024] Ming Hu, Peiheng Zhou, Zhihao Yue, Zhiwei Ling, Yihao Huang, Anran Li, Yang Liu, Xiang Lian, and Mingsong Chen. Fedcross: Towards accurate federated learning via multi-model cross-aggregation. In IEEE International Conference on Data Engineering (ICDE), pages 2137-2150. IEEE, 2024.

[Huang et al., 2023] Wenke Huang, Mang Ye, Zekun Shi, He Li, and Bo Du. Rethinking federated learning with domain shift: A prototype view. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16312-16322. IEEE, 2023.

[Huang et al., 2025] Sheng Huang, Lele Fu, Yuecheng Li, Chuan Chen, Zibin Zheng, and Hong-Ning Dai. A cross-client coordinator in federated learning framework for conquering heterogeneity. IEEE Transactions on Neural Networks and Learning Systems, 36(5):8828-8842, 2025.

[Li et al., 2017] Da Li, Yongxin Yang, Yi-Zhe Song, and Timothy M Hospedales. Deeper, broader and artier domain generalization. In Proceedings of the IEEE international conference on computer vision, pages 5542-5550, 2017.

[Li et al., 2020a] Li Li, Yuxi Fan, Mike Tse, and Kuo-Yi Lin. A review of applications in federated learning. Computers & Industrial Engineering, 149:106854, 2020.

[Li et al., 2020b] Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, and Virginia Smith. Federated optimization in heterogeneous networks. Proceedings of Machine learning and systems, 2:429-450, 2020.

[Li et al., 2024] Yichen Li, Wenchao Xu, Haozhao Wang, Yining Qi, Ruixuan Li, and Song Guo. Sr-fdil: Synergistic replay for federated domain-incremental learning. IEEE Transactions on Parallel and Distributed Systems, 2024.

[Liao et al., 2024] Tianchi Liao, Lele Fu, Jialong Chen, Zhen Wang, Zibin Zheng, and Chuan Chen. A swiss army knife for heterogeneous federated learning: Flexible coupling via trace norm. Advances in Neural Information Processing Systems, 37:139886-139911, 2024.

[Liao et al., 2025] Tianchi Liao, Lele Fu, Lei Zhang, Lei Yang, Chuan Chen, Michael K Ng, Huawei Huang, and Zibin Zheng. Privacy-preserving vertical federated learning with tensor decomposition for data missing features. IEEE Transactions on Information Forensics and Security, 2025.

[McMahan et al., 2017] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273-1282. PMLR, 2017.

[Nguyen et al., 2022] A Tuan Nguyen, Philip Torr, and Ser Nam Lim. Fedsr: A simple and effective domain generalization method for federated learning. Advances in Neural Information Processing Systems, 35:38831-38843, 2022.

[Qi et al., 2024] Zhuang Qi, Weihao He, Xiangxu Meng, and Lei Meng. Attentive modeling and distillation for out-of-distribution generalization of federated learning. In 2024 IEEE International Conference on Multimedia and Expo (ICME), pages 1-6. IEEE, 2024.

[Qi et al., 2025] Zhuang Qi, Lei Meng, et al. Cross-silo feature space alignment for federated learning on clients with imbalanced data. In The 39th Annual AAAI Conference on Artificial Intelligence (AAAI-25), pages 19986-19994, 2025.

[Qiao et al., 2024] Yu Qiao, Chaoning Zhang, Apurba Adhikary, and Choong Seon Hong. Logit calibration and feature contrast for robust federated learning on non-iid data. arXiv preprint arXiv:2404.06776, 2024.

[Shen et al., 2025] Wei Shen, Wenke Huang, Guancheng Wan, and Mang Ye. Label-free backdoor attacks in vertical federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 20389-20397, 2025.

[Van der Maaten and Hinton, 2008] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(11), 2008.

[Venkateswara et al., 2017] Hemanth Venkateswara, Jose Eusebio, Shayok Chakraborty, and Sethuraman Panchanathan. Deep hashing network for unsupervised domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5018-5027, 2017.

[Wan et al., 2024] Guancheng Wan, Wenke Huang, and Mang Ye. Federated graph learning under domain shift with generalizable prototypes. In Proceedings of the AAAI conference on artificial intelligence, volume 38, pages 15429-15437, 2024.

[Xie et al., 2024] Binghui Xie, Yongqiang Chen, Jiaqi Wang, Kaiwen Zhou, Bo Han, Wei Meng, and James Cheng. Enhancing evolving domain generalization through dynamic latent representations. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 16040-16048, 2024.

[Yan and Guo, 2025] Hao Yan and Yuhong Guo. Local and global flatness for federated domain generalization. In European Conference on Computer Vision, pages 71-87. Springer, 2025.

[Zhang et al., 2021] Liling Zhang, Xinyu Lei, Yichun Shi, Hongyu Huang, and Chao Chen. Federated learning with domain generalization. arXiv preprint arXiv:2111.10487, 2021.

[Zhang et al., 2023] Ruipeng Zhang, Qinwei Xu, Jiangchao Yao, Ya Zhang, Qi Tian, and Yanfeng Wang. Federated domain generalization with generalization adjustment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3954-3963, 2023.

[Zhang et al., 2024] Yudong Zhang, Xu Wang, Pengkun Wang, Binwu Wang, Zhengyang Zhou, and Yang Wang. Modeling spatio-temporal mobility across data silos via personalized federated learning. IEEE Transactions on Mobile Computing, 2024.
