# Trusted Multi-view Learning with Label Noise

Cai Xu, Yilin Zhang, Ziyu Guan*, Wei Zhao
School of Computer Science and Technology, Xidian University
{cxu@, ylzhang3@stu., zyguan@, ywzhao@mail.}xidian.edu.cn
*Corresponding author

Abstract

Multi-view learning methods often focus on improving decision accuracy while neglecting decision uncertainty, which significantly restricts their use in safety-critical applications. To address this issue, researchers propose trusted multi-view methods that learn the class distribution for each instance, enabling the estimation of classification probabilities and uncertainty. However, these methods heavily rely on high-quality ground-truth labels. This motivates us to delve into a new generalized trusted multi-view learning problem: how to develop a reliable multi-view learning model under the guidance of noisy labels? We propose a Trusted Multi-view Noise Refining (TMNR) method to solve this problem. We first construct view-specific opinions using evidential deep neural networks, which consist of belief mass vectors and uncertainty estimates. Subsequently, we design view-specific noise correlation matrices that transform the original opinions into noisy opinions aligned with the noisy labels. Considering label noise originating from low-quality data features and easily confused classes, we ensure that the diagonal elements of these matrices are inversely proportional to the uncertainty, while incorporating class relations into the off-diagonal elements. Finally, we aggregate the noisy opinions and employ a generalized maximum likelihood loss on the aggregated opinion for model training, guided by the noisy labels. We empirically compare TMNR with state-of-the-art trusted multi-view learning and label-noise learning baselines on 5 publicly available datasets. Experimental results show that TMNR outperforms baseline methods in accuracy, reliability and robustness. The code and appendix are released at https://github.com/YilinZhang107/TMNR.

Figure 1: The generalized trusted multi-view learning problem: the model should recognize the feature and model uncertainty caused by low-quality features and noisy labels, respectively.

1 Introduction

Multi-view data is widely present in various real-world scenarios. For instance, in the field of healthcare, a patient's comprehensive condition can be reflected through multiple types of examinations; social media applications often include multi-modal contents such as textual and visual reviews [Liu et al., 2024]. Multi-view learning synthesizes both consistent and complementary information to obtain a more comprehensive understanding of the data. It has generated significant and wide-ranging influence across multiple research areas, including classification [Chen et al., 2024; Wang et al., 2022], clustering [Xu et al., 2023; Huang et al., 2023; Wen et al., 2022], recommendation systems [Lin et al., 2023; Nikzad-Khasmakhi et al., 2021], and large language models [Min et al., 2023].

Most existing multi-view learning methods focus on improving decision accuracy while neglecting decision uncertainty. This significantly limits the application of multi-view learning in safety-critical scenarios such as healthcare. Recently, Han et al. [Han et al., 2020] proposed a pioneering work, Trusted Multi-view Classification (TMC), to solve this problem. TMC calculates and aggregates the evidence of all views from the original data features.
It then utilizes this evidence to parameterize the class distribution, which can be used to estimate the class probabilities and uncertainty. To train the entire model, TMC requires the estimated class probabilities to be consistent with the ground-truth labels. Following this line, researchers propose novel evidence aggregation methods, aiming to enhance reliability and robustness in the presence of feature noise [Gan et al., 2021; Qin et al., 2022], conflictive views [Xu et al., 2024] and incomplete views [Xie et al., 2023].

Regretfully, these trusted multi-view learning methods consistently rely on high-quality ground-truth labels. The labeling task is time-consuming and expensive, especially when dealing with large-scale datasets such as user-generated multi-modal contents in social media applications. This motivates us to delve into a new Generalized Trusted Multi-view Learning (GTML) problem: how to develop a reliable multi-view learning model under the guidance of noisy labels? This problem encompasses two key objectives: 1) detecting and refining the noisy labels during the training stage; 2) recognizing the model's uncertainty caused by noisy labels. For example, instances belonging to classes like "dog" and "wolf" might exhibit similarities and are prone to being mislabeled. Consequently, the model should exhibit higher decision uncertainty in such cases. An intuitive analogy is an intern animal researcher (model) who may not make high-confidence decisions for all animals (instances), but is aware of the cases where a definitive decision is challenging.

In this paper, we propose a Trusted Multi-view Noise Refining (TMNR) method for the GTML problem. We consider label noise arising from two sources: low-quality data features, such as blurred and incomplete features, and easily confused classes, such as "dog" and "wolf". Our objective is to leverage multi-view consistent information for noise detection. To achieve this, we first construct view-specific evidential Deep Neural Networks (DNNs) to learn view-specific evidence, which can be interpreted as the amount of support for each category collected from the data. We then model the view-specific distributions of class probabilities using the Dirichlet distribution, parameterized with the view-specific evidence. These distributions allow us to construct opinions, which consist of belief mass vectors and uncertainty estimates. We design view-specific noise correlation matrices to transform the original opinions into noisy opinions, which align with the noisy labels. Considering that low-quality data features are prone to mislabeling, we require the diagonal elements of the noise correlation matrices to be inversely proportional to the uncertainty. Additionally, we incorporate class relations into the off-diagonal elements. For instance, the elements corresponding to "dog" and "wolf" should have larger values since these two classes are easily mislabeled. Next, we aggregate the noisy opinions to obtain the common evidence. Finally, we employ a generalized maximum likelihood loss on the common evidence, guided by the noisy labels, for model training.
The main contributions of this work are summarized as follows: 1) we propose the generalized trusted multi-view learning problem, which necessitates the model's ability to make reliable decisions despite the presence of noisy guidance; 2) we propose the TMNR method to tackle this problem. TMNR mitigates the negative impact of noisy labels through two key strategies: leveraging multi-view consistent information for detecting and refining noisy labels, and assigning higher decision uncertainty to instances belonging to easily mislabeled classes; 3) we empirically compare TMNR with state-of-the-art trusted multi-view learning and label-noise learning baselines on 5 publicly available datasets. Experimental results show that TMNR outperforms baseline methods in accuracy, reliability and robustness.

2 Related Work

2.1 Deep Multi-view Fusion

Multi-view fusion has demonstrated superior performance in various tasks by effectively combining information from multiple sources or modalities [Liang et al., 2021; Zhou et al., 2023]. According to the fusion strategy, existing deep multi-view fusion methods can be roughly classified into feature fusion [Hu et al., 2024; Xu et al., 2022; Liu et al., 2023] and decision fusion [Jillani et al., 2020; Liu et al., 2022]. A major challenge for feature fusion methods is that each view might exhibit different types and levels of noise at different points in time. Trusted decision fusion methods address this by making view-specific trusted decisions to obtain view-specific reliabilities, and then assigning large weights to the views with high reliability in the multi-view fusion stage. Following this line, Xie et al. [Xie et al., 2023] tackle the challenge of incomplete multi-view classification through a two-stage approach involving completion and evidential fusion. Xu et al. [Xu et al., 2024] focus on making trusted decisions for instances that exhibit conflicting information across multiple views. They propose an effective strategy for aggregating conflicting opinions and theoretically prove that this strategy can exactly model the relation between multi-view common and view-specific reliabilities. However, these trusted multi-view learning methods heavily rely on high-quality ground-truth labels, which may not always be available or reliable in real-world scenarios. This limitation motivates us to delve into the GTML problem, which aims to learn a reliable multi-view learning model under the guidance of noisy labels.

2.2 Label-Noise Learning

In real-world scenarios, the process of labeling data can be error-prone, subjective, or expensive, leading to noisy labels. Label-noise learning refers to the problem of learning from training data that contains noisy labels. In multi-classification tasks, label noise can be categorized into Class-Conditional Noise (CCN) and Instance-Dependent Noise (IDN). CCN occurs when the label corruption process is independent of the data features, and instances in a class are assigned to other classes with a fixed probability. Dealing with CCN often involves correcting the loss by estimating an overall category transition probability matrix [Patrini et al., 2017; Hendrycks et al., 2018]. IDN refers to instances being mislabeled based on both their class and their features. In this work, we focus on IDN as it closely resembles real-world noise. The main challenge lies in approximating the complex and high-dimensional instance-dependent transition matrix. Several approaches have been proposed to address this challenge.
For instance, Cheng et al. [Cheng et al., 2020] propose an instance-dependent sample sieve method that enables the model to process clean and corrupted samples individually. Cheng et al. [Cheng et al., 2022] effectively reduce the complexity of the instance-dependent matrix by streaming embedding. Berthon et al. [Berthon et al., 2021] approximate the transition distribution of each instance using confidence scores. However, the confidence depends on a pre-trained model and may not be reliable. The proposed TMNR reliably bootstraps the correlation matrices based on multi-view opinions, leading to superior performance. In addition, TMNR not only refines the noise in the labels but also recognizes the model's uncertainty caused by noisy labels.

Figure 2: Illustration of the label noise. Each color represents a ground-truth category y.

3 The Method

In this section, we first define the generalized trusted multi-view learning problem, and then present Trusted Multi-view Noise Refining (TMNR) in detail, together with its implementation.

3.1 Notations and Problem Statement

We use $\{\mathbf{x}_n^v \in \mathbb{R}^{d_v}\}_{v=1}^{V}$ to denote the features of the $n$-th instance, which contains $V$ views; $d_v$ denotes the dimension of the $v$-th view. $y_n \in \{1, \dots, K\}$ denotes the ground-truth category, where $K$ is the number of categories. In the generalized trusted multi-view classification problem, the labels of some instances contain noise, as shown in Figure 2. Therefore, we use $\{\tilde{y}_n \in \{1, \dots, K\}\}_{n=1}^{N}$ to denote the set of possibly corrupted, noisy labels. The objective is to learn a trusted classification model from the noisy training instances $\{\{\mathbf{x}_n^v\}_{v=1}^{V}, \tilde{y}_n\}_{n=1}^{N_{train}}$. For the instances of the test set, the model should predict the category $\{y_n\}$ and the uncertainty $\{u_n\}$, which quantifies the uncertainty caused by low-quality features and noisy labels.

3.2 Trusted Multi-view Noise Refining

Pipeline

As shown in Figure 3, we first construct view-specific opinions using evidential DNNs $\{f^v(\cdot)\}_{v=1}^{V}$. To account for the presence of label noise, the view-specific noise correlation matrices $\{T^v\}_{v=1}^{V}$ transform the original opinions into noisy opinions aligned with the noisy labels. Finally, we aggregate the noisy opinions and train the whole model with the noisy labels. Each component is elaborated below.

View-specific Evidence Learning

In this subsection, we introduce evidence theory to quantify uncertainty. Traditional multi-class neural networks usually use a Softmax activation function to obtain the probability distribution over categories. However, this provides only a single-point estimate of the predictive distribution, which can lead to overconfident results even when the predictions are incorrect. This limitation affects the reliability of the results. To address this problem, EDL [Sensoy et al., 2018] introduces the evidential framework of subjective logic [Jøsang, 2016]. It converts traditional DNNs into evidential neural networks by replacing the last activation with a non-negative one (e.g., ReLU) to extract the amount of support (called evidence) for each class. In this framework, the parameter $\boldsymbol{\alpha}$ of the Dirichlet distribution $Dir(\mathbf{p}|\boldsymbol{\alpha})$ is associated with the belief distribution of evidence theory, where $\mathbf{p}$ is a simplex representing the class assignment probabilities. We collect evidence $\{\mathbf{e}_n^v\}$ with the view-specific evidential DNNs $\{f^v(\cdot)\}_{v=1}^{V}$. The corresponding Dirichlet distribution parameter is $\boldsymbol{\alpha}^v = \mathbf{e}^v + 1 = [\alpha_1^v, \dots, \alpha_K^v]^T$. After obtaining the distribution parameter, we can calculate the subjective opinion $O^v = (\mathbf{b}^v, u^v)$ of the view, including the belief masses $\mathbf{b}^v$ and the uncertainty mass $u^v$, where $\mathbf{b}^v = (\boldsymbol{\alpha}^v - 1)/S^v = \mathbf{e}^v/S^v$, $u^v = K/S^v$, and $S^v = \sum_{k=1}^{K} \alpha_k^v$ is the Dirichlet strength.
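To make the opinion construction concrete, below is a minimal PyTorch sketch of a view-specific evidential network. The class name, hidden size, and layer layout are illustrative assumptions; the paper only states that evidence is extracted by fully connected networks with a ReLU layer.

```python
import torch
import torch.nn as nn

class EvidentialNet(nn.Module):
    """Minimal view-specific evidential network: a ReLU head keeps evidence non-negative."""
    def __init__(self, in_dim, num_classes, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_classes), nn.ReLU())

    def forward(self, x):
        e = self.net(x)                      # evidence e^v >= 0
        alpha = e + 1.0                      # Dirichlet parameters alpha^v = e^v + 1
        S = alpha.sum(dim=-1, keepdim=True)  # Dirichlet strength S^v
        belief = e / S                       # belief masses b^v = e^v / S^v
        u = alpha.shape[-1] / S              # uncertainty u^v = K / S^v
        return e, alpha, belief, u
```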
Evidential Noise Forward Correction

In the GTML problem, we expect to train the evidence networks so that their outputs form a clean evidence distribution for the input. To minimize the negative impact of IDN in the training dataset, we modify the outputs of the DNNs with an additional structure that adjusts the loss of each training sample before the parameters of the DNNs are updated. This makes the optimization process immune to label noise and is called evidential noise forward correction. This structure is removed when predicting on test data. For each view $\{\mathbf{x}^v\}_{v=1}^{V}$ of a specific instance, we construct a view-specific noise correlation matrix to model the noise process:

$$T^v = [t_{kj}^v]_{k,j=1}^{K} \in [0, 1]^{K \times K}, \qquad (1)$$

where $t_{kj}^v := P(\tilde{y} = j \mid y = k, \mathbf{x}^v)$ and $\sum_{j=1}^{K} t_{kj}^v = 1$. For the predicted probability distribution, the noisy class-posterior probability is obtained through the correlation matrix:

$$P(\tilde{y} = j \mid \mathbf{x}^v) = \sum_{k=1}^{K} P(\tilde{y} = j, y = k \mid \mathbf{x}^v) = \sum_{k=1}^{K} t_{kj}^v \, P(y = k \mid \mathbf{x}^v). \qquad (2)$$

Based on the evidence theory described in the previous subsection, and considering the constraints within $T^v$ itself, we convert this transfer of predicted probabilities into a transfer of the extracted support at the evidence level:

$$\tilde{e}_j^v = \sum_{k=1}^{K} t_{kj}^v \, e_k^v, \qquad (3)$$

where $\tilde{e}_j^v$ / $e_k^v$ denotes the $j$-th / $k$-th element of the noisy / clean class-posterior evidence $\tilde{\mathbf{e}}^v$ / $\mathbf{e}^v$. Therefore, the clean posterior evidence $\mathbf{e}^v$ predicted for each view is transferred to the noisy posterior evidence $\tilde{\mathbf{e}}^v$. The parameter $\tilde{\boldsymbol{\alpha}}^v$ of the noisy Dirichlet distribution is then computed so as to align with the noisy supervised labels $\tilde{y}$. The entire evidence vector can be calculated by:

$$\tilde{\mathbf{e}}^v = (T^v)^{T} \mathbf{e}^v, \qquad \tilde{\boldsymbol{\alpha}}^v = \tilde{\mathbf{e}}^v + 1. \qquad (4)$$

Figure 3: Illustration of TMNR. We first construct view-specific opinions using evidential DNNs $\{f^v(\cdot)\}_{v=1}^{V}$. Subsequently, the view-specific noise correlation matrices $\{T^v\}_{v=1}^{V}$ transform the original opinions into noisy opinions aligned with the noisy labels. Finally, we aggregate the noisy opinions and train the whole model with the noisy labels.

Trusted Evidential Multi-view Fusion

After obtaining opinions from multiple views, we dynamically integrate them based on uncertainty to produce a combined opinion. We achieve this via Dempster's combination rule [Jøsang, 2016]. We take clean opinion aggregation as an example. Given the clean opinions of two views of the same instance, i.e., $O^1 = (\mathbf{b}^1, u^1)$ and $O^2 = (\mathbf{b}^2, u^2)$, the aggregated opinion $O = O^1 \oplus O^2 = (\mathbf{b}, u)$ is computed as follows:

$$b_k = \frac{1}{1 - C}\left(b_k^1 b_k^2 + b_k^1 u^2 + b_k^2 u^1\right), \qquad u = \frac{1}{1 - C} u^1 u^2, \qquad (5)$$

where $C = \sum_{i \neq j} b_i^1 b_j^2$ measures the conflict between the two opinions. For a set of opinions from multiple views $\{O^v\}_{v=1}^{V}$, the joint multi-view subjective opinion is obtained by $O = O^1 \oplus O^2 \oplus \cdots \oplus O^V$. The corresponding parameters of $Dir(\mathbf{p}|\boldsymbol{\alpha})$ are recovered with $S = K/u$ and $\alpha_k = b_k \times S + 1$, and the final class probability can be estimated via $p_k = \alpha_k / S$.
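As a concrete illustration of the two steps above, the following is a minimal PyTorch sketch of the forward correction in Eqs. (3)-(4) followed by the two-opinion combination in Eq. (5). It operates on a single instance with K classes; the function names, shapes, and the random-evidence usage example are illustrative assumptions, not the authors' implementation.

```python
import torch

def forward_correct(e_clean, T):
    # Eqs. (3)-(4): transfer clean evidence through the view-specific noise
    # correlation matrix, then form the noisy Dirichlet parameters.
    e_noisy = T.t() @ e_clean            # \tilde{e}_j = sum_k t_kj * e_k
    return e_noisy, e_noisy + 1.0        # (\tilde{e}^v, \tilde{alpha}^v)

def combine_opinions(b1, u1, b2, u2):
    # Eq. (5): Dempster's combination rule for two subjective opinions.
    conflict = (b1.unsqueeze(1) * b2.unsqueeze(0)).sum() - (b1 * b2).sum()  # sum_{i != j} b1_i b2_j
    scale = 1.0 / (1.0 - conflict)
    b = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    u = scale * u1 * u2
    return b, u

# Usage on one instance with K classes: fuse two views, then recover Dirichlet parameters.
K = 4
e1, e2 = torch.rand(K) * 5, torch.rand(K) * 5
T1 = T2 = torch.eye(K)                                    # identity = no assumed label noise
(_, a1), (_, a2) = forward_correct(e1, T1), forward_correct(e2, T2)
b1, u1 = (a1 - 1) / a1.sum(), K / a1.sum()
b2, u2 = (a2 - 1) / a2.sum(), K / a2.sum()
b, u = combine_opinions(b1, u1, b2, u2)
alpha = b * (K / u) + 1                                   # S = K / u, alpha_k = b_k * S + 1
```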
3.3 Loss Function

In this section, we describe the optimization of the parameter set $\{\theta, \omega\}$, i.e., the evidence extraction networks $f(\cdot; \theta)$ and the correlation matrices $\{\{T_n^v\}_{v=1}^{V}\}_{n=1}^{N_{train}}$.

Classification Loss

We capture view-specific evidence from a single view $\mathbf{x}^v$ of a sample. The vector $\mathbf{e}^v = f^v(\mathbf{x}^v)$ denotes the clean class-posterior evidence obtained from the corresponding view network. This evidence is corrected through Eq. (4) to yield the noisy class-posterior evidence $\tilde{\mathbf{e}}^v$ and the associated Dirichlet parameter $\tilde{\boldsymbol{\alpha}}^v$. Within the constructed inference framework, the evidential multi-classification loss $\mathcal{L}$ consists of a classification term $\mathcal{L}_{ace}$ and a Kullback-Leibler (KL) divergence term $\mathcal{L}_{KL}$. The classification term $\mathcal{L}_{ace}$ is obtained by adapting the conventional cross-entropy loss, i.e., it is a generalized maximum likelihood loss:

$$\mathcal{L}_{ace}(\tilde{\boldsymbol{\alpha}}^v) = \sum_{k=1}^{K} \tilde{y}_k \left( \psi(\tilde{S}^v) - \psi(\tilde{\alpha}_k^v) \right), \qquad (6)$$

where $\psi(\cdot)$ is the digamma function, $\tilde{\alpha}_k^v$ denotes the $k$-th element of $\tilde{\boldsymbol{\alpha}}^v$, and $\tilde{S}^v = \sum_{k=1}^{K} \tilde{\alpha}_k^v$. Eq. (6) does not ensure that the wrong classes of each sample produce lower evidence, which we would like to shrink to zero. Thus a KL divergence term is added:

$$\mathcal{L}_{KL}(\tilde{\boldsymbol{\alpha}}^v) = KL\left[ D(\mathbf{p}^v \mid \hat{\boldsymbol{\alpha}}^v) \,\Vert\, D(\mathbf{p}^v \mid \mathbf{1}) \right],$$

where $\hat{\boldsymbol{\alpha}}^v = \tilde{\mathbf{y}} + (1 - \tilde{\mathbf{y}}) \odot \tilde{\boldsymbol{\alpha}}^v$ is the Dirichlet parameter adjusted to remove the non-misleading evidence, $\tilde{\mathbf{y}}$ denotes the noisy label $\tilde{y}$ in one-hot form, and the KL divergence between two Dirichlet distributions has a closed form involving the gamma function $\Gamma(\cdot)$. Thus, for a given view's Dirichlet parameter $\tilde{\boldsymbol{\alpha}}^v$, the view-specific loss is

$$\mathcal{L}(\tilde{\boldsymbol{\alpha}}^v) = \mathcal{L}_{ace}(\tilde{\boldsymbol{\alpha}}^v) + \lambda \mathcal{L}_{KL}(\tilde{\boldsymbol{\alpha}}^v), \qquad (7)$$

where $\lambda \in [0, 1]$ is a balance parameter whose value is gradually increased during training to prevent misclassified instances from prematurely converging to a uniform distribution.
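The following is a minimal PyTorch sketch of the per-view loss in Eqs. (6)-(7), using the closed-form KL divergence between a Dirichlet and the uniform Dirichlet. The function name, batch handling, and mean reduction are illustrative assumptions.

```python
import torch

def edl_loss(alpha_noisy, y_onehot, lam):
    """Per-view evidential loss (Eqs. 6-7): digamma cross-entropy plus an
    annealed KL penalty that shrinks evidence assigned to wrong classes."""
    S = alpha_noisy.sum(dim=-1, keepdim=True)
    ace = (y_onehot * (torch.digamma(S) - torch.digamma(alpha_noisy))).sum(dim=-1)

    # Remove non-misleading evidence, then KL( Dir(alpha_hat) || Dir(1) ).
    alpha_hat = y_onehot + (1.0 - y_onehot) * alpha_noisy
    S_hat = alpha_hat.sum(dim=-1, keepdim=True)
    K = alpha_noisy.shape[-1]
    kl = (torch.lgamma(S_hat.squeeze(-1)) - torch.lgamma(torch.tensor(float(K)))
          - torch.lgamma(alpha_hat).sum(dim=-1)
          + ((alpha_hat - 1.0) * (torch.digamma(alpha_hat) - torch.digamma(S_hat))).sum(dim=-1))
    return (ace + lam * kl).mean()
```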
Uncertainty-Guided Correlation Loss

Optimizing the view-specific correlation matrices, which are functions of the high-dimensional input space, is challenging without any underlying assumptions. In the context of the IDN problem, the probability of a sample being mislabeled depends not only on its category but also on its features: when the features are noisy or difficult to discern, the likelihood of mislabeling increases significantly. As highlighted earlier, the uncertainty provided by evidence theory has proven effective in assessing the quality of sample features. It is therefore natural to combine uncertainty estimation with the IDN problem. In our work, we do not directly reduce the complexity of the correlation matrix by simplifying it. Instead, we adopt the assumption that "the higher the uncertainty of the model on a decision, the higher the probability that the sample label is noisy". Based on this assumption, a mild constraint is imposed on the correlation matrix to effectively reduce the degrees of freedom of its linear system. Specifically, based on the obtained Dirichlet parameters $\tilde{\boldsymbol{\alpha}}^v$ and the corresponding opinion uncertainty $u^v$, we impose different constraints on different parts of the correlation matrix $T^v$, encouraging it to transfer evidence for instances with higher uncertainty and to uncover potential labeling-related patterns.

Diagonal elements. The diagonal element $t_{kk}^v$ of $T^v$ corresponds to the probability that the labelled category equals the true category. However, the confidence obtained from the subjective opinion is only informative for the diagonal element corresponding to the labelled category $\tilde{y}$; for $\{t_{kk}^v\}_{k=1, k \neq \tilde{y}}^{K}$, $u^v$ provides no direct information. Therefore, we simply pull the other diagonal elements towards the mean confidence of the corresponding class of samples in the current batch:

$$\mathcal{M}_D(\tilde{\boldsymbol{\alpha}}^v) = \sum_{k=1}^{K} \mathcal{M}_{D_k}(\tilde{\boldsymbol{\alpha}}^v), \qquad (8)$$

$$\mathcal{M}_{D_k}(\tilde{\boldsymbol{\alpha}}^v) = \begin{cases} \left[(1 - u^v) - t_{kk}^v\right]^2, & \text{if } k = \tilde{y}, \\ \left[(1 - \bar{u}_k^v) - t_{kk}^v\right]^2, & \text{if } k \neq \tilde{y}, \end{cases} \qquad (9)$$

where $u^v = K / \sum_{k=1}^{K} \tilde{\alpha}_k^v$ and $\bar{u}_k^v$ is the average of $u^v$ over all samples with label $\tilde{y} = k$ in the current batch.

Non-diagonal elements. The constraints on the diagonal elements can be regarded as guiding the probability of a sample being mislabeled. The probability of being mislabeled as another category is influenced by the inherent relationship between categories. For example, "dog" is more likely to be labeled as "wolf" than as "plane". Considering that samples in the same class tend to be mislabeled as the same wrong class, the transition probabilities in their non-diagonal elements should be close. In addition, since the label information itself may contain noise, we aim to eliminate the misleading influence of erroneous samples within the same class. To this end, we construct an affinity matrix $S^v$ for each view and compute this loss only over the $k$ most similar samples in the same class; for the $n$-th sample,

$$\mathcal{M}_O(\tilde{\boldsymbol{\alpha}}_n^v) = \sum_{m} s_{nm}^v \left\| \hat{T}_n^v - \hat{T}_m^v \right\|^2, \qquad (10)$$

$$s_{nm}^v = \begin{cases} \exp\!\left( -\dfrac{\|\mathbf{x}_n^v - \mathbf{x}_m^v\|^2}{\sigma^2} \right), & \text{if } \mathbf{x}_m^v \in N(\mathbf{x}_n^v, k) \text{ and } \tilde{y}_n = \tilde{y}_m, \\ 0, & \text{otherwise}, \end{cases} \qquad (11)$$

where $s_{nm}^v$ denotes the $(n, m)$-th element of the affinity matrix $S^v$ for the $v$-th view, which measures the similarity between $\mathbf{x}_n^v$ and $\mathbf{x}_m^v$, $N(\mathbf{x}_n^v, k)$ indicates the $k$-nearest neighbours of $\mathbf{x}_n^v$, and $\hat{T}_i^v$ is the matrix obtained by zeroing the diagonal elements of $T_i^v$. Thus, the overall regularization term for the inter-sample uncertainty bootstrap is

$$\mathcal{M}(\tilde{\boldsymbol{\alpha}}^v) = \mathcal{M}_D(\tilde{\boldsymbol{\alpha}}^v) + \mathcal{M}_O(\tilde{\boldsymbol{\alpha}}^v). \qquad (12)$$

Inter-view consistency. In multi-view learning, each view represents different dimensional features of the same instance. We leverage the consistency principle of these views to ensure the overall coherence of the correlation matrices across all views. The consistency loss is

$$\mathcal{L}_{con} = \sum_{v=1}^{V} \sum_{k=1}^{K} \sum_{j=1}^{K} \left| t_{kj}^v - \bar{t}_{kj} \right|, \qquad (13)$$

where $\bar{t}_{kj} = \frac{1}{V} \sum_{v=1}^{V} t_{kj}^v$.

Overall Loss

To sum up, for a multi-view instance $\{\mathbf{x}^v\}_{v=1}^{V}$, we adopt a multi-task strategy so that both the view-specific and the aggregated opinions receive supervision, and we bootstrap the correlation matrices with the designed regularization terms:

$$\mathcal{L}_{all} = \mathcal{L}(\tilde{\boldsymbol{\alpha}}) + \sum_{v=1}^{V} \left[ \mathcal{L}(\tilde{\boldsymbol{\alpha}}^v) + \beta \, \mathcal{M}(\tilde{\boldsymbol{\alpha}}^v) \right] + \gamma \mathcal{L}_{con}, \qquad (14)$$

where $\beta$ and $\gamma$ are hyperparameters that balance the adjusted cross-entropy loss against the uncertainty bootstrap regularization and the inter-view consistency loss, and $\tilde{\boldsymbol{\alpha}}$ is obtained by aggregating the noisy class-posterior parameters $\{\tilde{\boldsymbol{\alpha}}^v\}_{v=1}^{V}$. The overall procedure is summarized in Algorithm 1.

Algorithm 1 TMNR algorithm
/* Training */
Input: Noisy training dataset, hyperparameters β, γ
Output: Parameters of the model
1: Initialize the parameters of the evidential neural networks.
2: Initialize all correlation matrices T as identity matrices.
3: while not converged do
4:   for v = 1 : V do
5:     Obtain clean evidence $\mathbf{e}_n^v$ with $f^v(\mathbf{x}_n^v; \theta)$;
6:     Obtain $\tilde{\mathbf{e}}_n^v$ and $\tilde{\boldsymbol{\alpha}}_n^v$ through Eq. (4);
7:   end for
8:   Aggregate to obtain $\tilde{\boldsymbol{\alpha}}_n$ by Eq. (5);
9:   Calculate the overall loss with Eq. (14);
10:  Update the parameters;
11:  Correct T to satisfy Eq. (1).
12: end while
/* Test */
Calculate the clean joint $Dir(\mathbf{p}|\boldsymbol{\alpha})$ and the corresponding uncertainty $u$ with $f(\cdot; \theta)$.
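For illustration, below is a minimal PyTorch sketch of the diagonal constraint in Eqs. (8)-(9) and the inter-view consistency term in Eq. (13). The batching, the fallback for classes absent from a batch, and the mean reductions are illustrative assumptions, and the k-NN affinity term of Eqs. (10)-(11) is omitted for brevity.

```python
import torch

def diagonal_regularizer(T, u, labels, num_classes):
    """Eqs. (8)-(9): pull t_kk towards (1 - u) for the labelled class and towards
    the batch-average confidence (1 - u_bar_k) for the other classes.
    T: (B, K, K) per-sample correlation matrices, u: (B,) opinion uncertainties,
    labels: (B,) noisy labels."""
    B, K = u.shape[0], num_classes
    u_bar = torch.full((K,), u.mean().item())     # per-class mean uncertainty in the batch
    for k in range(K):
        mask = labels == k
        if mask.any():
            u_bar[k] = u[mask].mean()
    target = (1.0 - u_bar).expand(B, K).clone()   # (1 - u_bar_k) for k != y
    target[torch.arange(B), labels] = 1.0 - u     # (1 - u^v)    for k == y
    diag = torch.diagonal(T, dim1=-2, dim2=-1)    # t_kk
    return ((target - diag) ** 2).sum(dim=-1).mean()

def consistency_regularizer(T_views):
    """Eq. (13): L1 distance of each view's correlation matrix to the cross-view mean.
    T_views: list of V tensors of shape (B, K, K)."""
    T_stack = torch.stack(T_views)                # (V, B, K, K)
    T_mean = T_stack.mean(dim=0, keepdim=True)
    return (T_stack - T_mean).abs().sum(dim=(-2, -1)).mean()
```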
4 Experiments

In this section, we test the effectiveness of the proposed method on 5 real-world multi-view datasets with different proportions of label noise added. In addition, we also verify the model's ability to handle low-quality features and noisy labels.

4.1 Experimental Setup

Datasets. UCI [1] contains features of handwritten numerals ("0"-"9"). The averages of pixels in 240 windows, 47 Zernike moments, and 6 morphological features are used as 3 views. PIE [2] consists of 680 face images of 68 subjects. We extracted 3 views from it: intensity, LBP and Gabor. BBC [3] includes 685 documents from BBC News that can be categorized into 5 categories and are depicted by 4 views. Caltech101 [4] contains 8677 images from 101 categories; features are extracted as different views with 6 different methods: Gabor, Wavelet Moments, CENTRIST, HOG, GIST, and LBP. We use the first 20 categories. Leaves100 [5] consists of 1600 leaf samples from 100 plant species. We extracted shape descriptors, fine-scale edges, and texture histograms as 3 views.

[1] http://archive.ics.uci.edu/dataset/72/multiple+features
[2] https://www.cs.cmu.edu/afs/cs/project/PIE/MultiPie/MultiPie/Home.html
[3] http://mlg.ucd.ie/datasets/segment.html
[4] https://github.com/yeqinglee/mvdata
[5] https://archive.ics.uci.edu/dataset/241/one+hundred+plant+species+leaves+data+set

Compared methods. (1) Single-view uncertainty-aware methods include MCDO [Gal and Ghahramani, 2016], which measures uncertainty by using dropout sampling in both the training and inference phases, and IEDL [Deng et al., 2023], the SOTA method that combines evidential deep learning with the Fisher information matrix. (2) Label-noise refining methods include: FC [Patrini et al., 2017], which corrects the loss function with a CCN transition matrix, and ILFC [Berthon et al., 2021], which estimates an IDN transition matrix by training a naive model on a subset. (3) Multi-view feature fusion methods include: DCCAE [Wang et al., 2015], which trains autoencoders to obtain a common representation between two views, and DCP [Lin et al., 2022], the SOTA method that obtains a consistent representation through a dual contrastive loss and a dual prediction loss. (4) Multi-view decision fusion methods include: ETMC [Han et al., 2022], which estimates uncertainty based on EDL and dynamically fuses the views accordingly to obtain reliable results, and ECML [Xu et al., 2024], the SOTA method that proposes a new opinion aggregation strategy. We summarize the baseline methods in Table 1. For the single-view baselines, we concatenate the feature vectors of different views.

Table 1: Summary of the methods (MCDO, IEDL, FC, ILFC, DCCAE, DCP, ETMC, ECML, and TMNR); a check mark denotes that the corresponding information is used.

Implementation details. We implement all methods on the PyTorch 1.13 framework. In our model, the view-specific evidence is extracted by fully connected networks with a ReLU layer. The correlation matrices are initialized as identity matrices. We utilize the Adam optimizer with a learning rate of 1e-3 and l2-norm regularization set to 1e-5. In all datasets, 20% of the instances are split off as the test set. We run each method 5 times and report the mean values and standard deviations. We follow [Cheng et al., 2020] to generate the instance-dependent label-noise training sets.
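Since the exact noise-generation procedure of Cheng et al. [2020] is not reproduced in this paper, the sketch below is a deliberately simplified, hypothetical stand-in that only conveys the idea of instance-dependent flips: the flip probability and the wrong class both depend on the features. The function name, the random-projection heuristic, and the flip-rate distribution are all assumptions.

```python
import torch

def inject_instance_dependent_noise(features, labels, num_classes, noise_rate, seed=0):
    """Simplified stand-in for instance-dependent label noise (NOT the exact
    procedure of Cheng et al. [2020])."""
    g = torch.Generator().manual_seed(seed)
    n, d = features.shape
    W = torch.randn(d, num_classes, generator=g)          # random projection
    scores = features @ W                                 # per-class "confusability"
    scores[torch.arange(n), labels] = float('-inf')       # exclude the true class
    flip_prob = (torch.rand(n, generator=g) * 2 * noise_rate).clamp(0, 1)
    flip = torch.rand(n, generator=g) < flip_prob         # expected flip rate ~= noise_rate
    noisy = labels.clone()
    noisy[flip] = scores[flip].argmax(dim=1)              # most confusable wrong class
    return noisy
```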
4.2 Experimental Results

Performance comparison. The comparison between TMNR and the baselines on clean and noisy datasets is shown in Table 2. We can observe the following points: (1) On clean training data, TMNR achieves performance comparable to state-of-the-art methods. This finding indicates that the noise forward correction module has minimal negative impact on the model's performance. (2) The performance of multi-view feature fusion methods degrades clearly as the noise ratio increases, because the fused features are badly affected by noisy labels. (3) On noisy training data, especially with high noise ratios, TMNR significantly outperforms all baselines. Such performance is strong evidence that our proposed method effectively reduces the effect of noisy labels through forward correction. We further verify this in the ablation study and the uncertainty evaluation experiments.

Model uncertainty evaluation. In real-world datasets, different categories have different probabilities of being labeled incorrectly. If we can identify the classes that are more likely to be mislabeled during annotation, we can apply specialized processing to them, such as involving experts in secondary labeling. As incorrect labeling leads to increased model uncertainty, our model can effectively identify classes that contain noise by assessing their predicted uncertainty. To observe clear results, we intentionally flipped the labels of samples belonging to classes "0" and "1", as well as classes "8" and "9", within the UCI dataset during training. Subsequently, predictions were made on the test samples, and the average uncertainty for each category was calculated. The results, depicted in Figure 4(a), demonstrate a notable increase in uncertainty for the categories whose labels were corrupted. Figure 4(b) presents a heat map displaying the mean values of all trained correlation matrix parameters. The results clearly illustrate that the model's structure captures the probability of inter-class evidence transfer.

Figure 4: Visualization of the average uncertainty of each category and the correlation matrices.
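A minimal sketch of this analysis is given below: corrupt the training labels within each class pair, then summarize predicted uncertainty per test class. The flip ratio, the grouping by test label, and the function names are illustrative assumptions (the paper does not state how many samples in each pair were flipped).

```python
import torch

def flip_pairs(labels, pairs=((0, 1), (8, 9)), ratio=0.3, seed=0):
    """Swap the labels of a random fraction of the samples in each class pair."""
    g = torch.Generator().manual_seed(seed)
    flipped = labels.clone()
    for a, b in pairs:
        for src, dst in ((a, b), (b, a)):
            idx = (labels == src).nonzero(as_tuple=True)[0]
            pick = idx[torch.randperm(len(idx), generator=g)[: int(ratio * len(idx))]]
            flipped[pick] = dst
    return flipped

def per_class_uncertainty(test_labels, test_uncertainty, num_classes):
    """Mean predicted uncertainty of test instances, grouped by class."""
    return torch.stack([
        test_uncertainty[test_labels == k].mean() if (test_labels == k).any()
        else torch.tensor(float('nan'))
        for k in range(num_classes)
    ])
```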
Correlation matrix evaluation. We analyze the sensitivity of the hyperparameter β on all datasets containing 30% noise. The results are shown in Figure 5. It is evident that the sensitivity of β varies across datasets, yet optimal performance is consistently achieved within the range of 0.05 to 0.1. This observation validates the effectiveness of the regularization applied to the diagonal elements of the correlation matrices. Taking this into consideration, we set β accordingly for the remaining evaluation experiments.

Figure 5: Correlation matrix evaluation. Classification accuracy when adjusting β on all datasets with a 30% noise rate.

Noise | MCDO | IEDL | FC | ILFC | DCCAE | DCP | ETMC | ECML | TMNR
0%  | 97.50±0.72 | 97.70±0.46 | 97.05±0.19 | 95.50±0.76 | 85.75±0.32 | 96.15±0.49 | 96.20±0.73 | 97.05±0.48 | 96.90±0.65
10% | 95.50±0.61 | 95.25±1.25 | 96.20±0.58 | 95.45±0.82 | 85.50±0.70 | 95.55±0.54 | 95.50±0.33 | 95.85±0.19 | 95.95±0.37
20% | 95.40±0.82 | 95.50±0.75 | 95.35±1.06 | 95.15±0.46 | 85.00±0.37 | 95.25±0.87 | 95.35±0.49 | 95.35±0.66 | 95.90±0.84
30% | 92.30±0.80 | 92.50±1.74 | 91.90±1.21 | 93.85±1.51 | 84.50±0.98 | 92.40±1.24 | 93.15±1.49 | 92.85±1.42 | 94.00±1.46
40% | 90.15±1.44 | 90.30±1.99 | 89.90±1.58 | 90.95±1.87 | 84.50±0.78 | 90.30±1.08 | 91.65±1.54 | 92.35±1.37 | 94.65±1.35
50% | 83.65±1.98 | 85.85±1.17 | 83.55±2.93 | 84.85±1.23 | 81.75±1.08 | 83.75±1.64 | 84.10±1.83 | 86.15±1.62 | 88.90±0.49

0%  | 89.71±3.15 | 90.85±3.31 | 71.03±5.02 | 73.28±3.27 | 53.38±1.82 | 87.24±2.48 | 87.06±2.89 | 89.83±2.66 | 89.53±1.89
10% | 77.50±3.37 | 85.74±4.50 | 55.00±3.87 | 66.44±4.12 | 47.06±0.76 | 82.70±3.21 | 83.38±1.28 | 83.09±3.75 | 86.47±1.97
20% | 64.71±1.40 | 80.29±3.58 | 44.85±4.05 | 61.23±3.21 | 47.79±1.87 | 75.76±1.84 | 77.21±1.04 | 76.47±1.54 | 83.24±1.70
30% | 55.44±3.28 | 69.44±2.73 | 33.97±1.94 | 58.98±4.10 | 36.03±0.67 | 65.38±2.49 | 63.97±1.86 | 70.44±4.04 | 73.29±2.08
40% | 46.32±2.03 | 65.29±4.57 | 32.31±5.29 | 51.84±5.29 | 33.82±1.19 | 58.46±2.96 | 61.76±2.33 | 63.97±4.53 | 71.91±2.33
50% | 38.68±3.50 | 53.00±2.88 | 25.44±2.57 | 44.02±4.23 | 30.88±1.38 | 50.85±2.82 | 51.32±3.40 | 55.53±2.30 | 59.85±2.89

0%  | 93.31±1.79 | 92.60±2.04 | 92.12±3.01 | 92.56±1.87 | 92.03±2.48 | 93.20±1.92 | 93.58±1.42 | 91.82±1.93 | 93.51±1.35
10% | 89.34±1.88 | 90.38±1.81 | 89.41±1.57 | 88.28±1.34 | 88.24±1.54 | 88.33±1.41 | 89.93±1.56 | 88.18±1.17 | 90.07±1.53
20% | 86.01±2.86 | 86.93±1.87 | 85.11±2.47 | 86.01±2.02 | 85.41±2.97 | 86.17±2.38 | 86.86±2.73 | 85.11±2.87 | 87.45±2.86
30% | 74.45±4.05 | 81.07±3.56 | 73.87±5.66 | 77.23±2.98 | 77.94±2.36 | 77.80±2.49 | 80.73±2.47 | 74.16±1.27 | 82.04±2.98
40% | 69.64±2.72 | 71.06±1.07 | 70.51±1.76 | 72.67±3.88 | 71.09±2.51 | 70.87±3.01 | 72.85±3.36 | 72.85±3.96 | 75.91±3.44
50% | 56.64±3.59 | 59.04±4.52 | 56.93±2.44 | 57.29±3.90 | 56.27±3.87 | 56.24±3.76 | 57.23±4.30 | 58.83±3.05 | 63.88±4.43

0%  | 71.38±5.06 | 92.35±1.46 | 64.64±5.57 | 85.24±1.90 | 88.03±0.75 | 91.93±1.39 | 92.59±0.86 | 91.13±1.61 | 91.84±1.08
10% | 68.66±5.02 | 91.05±1.26 | 57.28±3.46 | 81.92±2.39 | 86.19±0.98 | 90.74±1.22 | 90.34±1.32 | 91.38±1.59 | 91.09±0.59
20% | 58.41±2.43 | 86.82±2.22 | 42.34±5.38 | 77.74±2.80 | 83.47±1.28 | 87.02±2.31 | 87.74±1.24 | 87.12±0.96 | 87.78±1.26
30% | 54.60±3.58 | 83.01±0.90 | 52.97±2.54 | 73.26±2.11 | 82.85±0.93 | 84.98±1.01 | 86.07±1.10 | 86.11±0.49 | 86.82±0.83
40% | 48.12±5.99 | 71.92±1.95 | 46.57±4.77 | 70.17±3.98 | 75.94±1.89 | 76.90±1.55 | 78.91±0.90 | 77.82±0.68 | 81.59±0.96
50% | 43.89±6.41 | 59.04±1.84 | 35.98±4.78 | 63.28±3.41 | 61.09±1.46 | 65.29±2.52 | 68.79±2.02 | 68.91±1.96 | 72.89±1.97

0%  | 66.12±5.05 | 72.12±0.71 | 64.06±4.40 | 66.27±3.11 | 63.50±0.75 | 73.40±2.18 | 70.62±3.46 | 73.00±1.73 | 73.75±2.55
10% | 63.81±2.87 | 65.38±2.18 | 62.00±3.38 | 63.02±2.94 | 62.19±0.97 | 66.16±2.47 | 66.31±3.16 | 68.31±1.45 | 68.88±2.17
20% | 60.06±3.23 | 63.38±1.67 | 59.63±2.04 | 59.61±3.20 | 61.25±1.24 | 61.30±3.98 | 59.31±3.16 | 62.56±4.67 | 64.19±2.03
30% | 55.31±4.19 | 57.37±3.44 | 52.62±3.47 | 54.75±3.59 | 58.13±0.76 | 57.44±2.80 | 58.75±1.95 | 59.94±5.61 | 60.94±2.04
40% | 47.69±2.26 | 52.88±1.39 | 44.31±2.59 | 48.57±2.16 | 55.13±1.87 | 52.59±1.97 | 51.69±2.27 | 54.69±3.89 | 57.75±2.12
50% | 40.88±3.80 | 48.31±2.11 | 36.44±3.12 | 40.35±3.82 | 50.31±1.89 | 49.67±3.24 | 51.81±1.76 | 50.44±3.10 | 55.63±2.91

Table 2: Classification accuracy (%) of TMNR and baseline methods on the datasets with different proportions of Instance-Dependent Noise. Each block of rows corresponds to one of the five datasets. The Noise column shows the percentage of noisy labelled instances, where 0% denotes clean datasets. The best and the second-best results are highlighted by boldface and underline, respectively.
5 Conclusion

In this paper, we introduced the TMNR method for addressing the generalized trusted multi-view learning problem. TMNR leverages evidential deep neural networks to learn view-specific belief mass vectors and uncertainty estimates. We further designed view-specific noise correlation matrices to effectively correlate the original opinions with the noisy opinions. By aggregating the noisy opinions and training the entire model using the noisy labels, we achieved robust model training. Experimental results on five real-world datasets validated the effectiveness of TMNR, demonstrating its superiority compared to state-of-the-art baseline methods.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant Nos. 62133012, 61936006, 62103314, 62073255, 62303366), the Key Research and Development Program of Shaanxi (Program No. 2020ZDLGY04-07), the Innovation Capability Support Program of Shaanxi (Program No. 2021TD-05) and the Natural Science Basic Research Program of Shaanxi under Grant No. 2023JC-QN-0648.

References

[Berthon et al., 2021] Antonin Berthon, Bo Han, Gang Niu, Tongliang Liu, and Masashi Sugiyama. Confidence scores make instance-dependent label-noise learning possible. In Proceedings of the 38th International Conference on Machine Learning, pages 825–836, 2021.

[Chen et al., 2024] Changrui Chen, Jungong Han, and Kurt Debattista. Virtual category learning: A semi-supervised learning method for dense prediction with extremely limited labels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.

[Cheng et al., 2020] Hao Cheng, Zhaowei Zhu, Xingyu Li, Yifei Gong, Xing Sun, and Yang Liu. Learning with instance-dependent label noise: A sample sieve approach. In International Conference on Learning Representations, 2020.

[Cheng et al., 2022] De Cheng, Tongliang Liu, Yixiong Ning, Nannan Wang, Bo Han, Gang Niu, Xinbo Gao, and Masashi Sugiyama. Instance-dependent label-noise learning with manifold-regularized transition matrix estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16630–16639, 2022.

[Deng et al., 2023] Danruo Deng, Guangyong Chen, Yang Yu, Furui Liu, and Pheng-Ann Heng. Uncertainty estimation by Fisher information-based evidential deep learning. In Proceedings of the 40th International Conference on Machine Learning, pages 7596–7616, 2023.

[Gal and Ghahramani, 2016] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of The 33rd International Conference on Machine Learning, pages 1050–1059, 2016.
[Gan et al., 2021] Jiangzhang Gan, Ziwen Peng, Xiaofeng Zhu, Rongyao Hu, Junbo Ma, and Guorong Wu. Brain functional connectivity analysis based on multi-graph fusion. Medical Image Analysis, 71:102057, 2021.

[Han et al., 2020] Zongbo Han, Changqing Zhang, Huazhu Fu, and Joey Tianyi Zhou. Trusted multi-view classification. In International Conference on Learning Representations, 2020.

[Han et al., 2022] Zongbo Han, Changqing Zhang, Huazhu Fu, and Joey Tianyi Zhou. Trusted multi-view classification with dynamic evidential fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(2):2551–2566, 2022.

[Hendrycks et al., 2018] Dan Hendrycks, Mantas Mazeika, Duncan Wilson, and Kevin Gimpel. Using trusted data to train deep networks on labels corrupted by severe noise. Advances in Neural Information Processing Systems, 31, 2018.

[Hu et al., 2024] Shizhe Hu, Chengkun Zhang, Guoliang Zou, Zhengzheng Lou, and Yangdong Ye. Deep multiview clustering by pseudo-label guided contrastive learning and dual correlation learning. IEEE Transactions on Neural Networks and Learning Systems, 2024.

[Huang et al., 2023] Shudong Huang, Yixi Liu, Ivor W Tsang, Zenglin Xu, and Jiancheng Lv. Multi-view subspace clustering by joint measuring of consistency and diversity. IEEE Transactions on Knowledge and Data Engineering, 35(8):8270–8281, 2023.

[Jillani et al., 2020] Rashad Jillani, Syed Fawad Hussain, and Hari Kalva. Multi-view clustering for fast intra mode decision in HEVC. In 2020 IEEE International Conference on Consumer Electronics (ICCE), 2020.

[Jøsang, 2016] Audun Jøsang. Subjective logic: A formalism for reasoning under uncertainty. Springer, 2016.

[Liang et al., 2021] Xinyan Liang, Qian Guo, Yuhua Qian, Weiping Ding, and Qingfu Zhang. Evolutionary deep fusion method and its application in chemical structure recognition. IEEE Transactions on Evolutionary Computation, 25(5):883–893, 2021.

[Lin et al., 2022] Yijie Lin, Yuanbiao Gou, Xiaotian Liu, Jinfeng Bai, Jiancheng Lv, and Xi Peng. Dual contrastive prediction for incomplete multi-view representation learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):4447–4461, 2022.

[Lin et al., 2023] Zhenghong Lin, Yanchao Tan, Yunfei Zhan, Weiming Liu, Fan Wang, Chaochao Chen, Shiping Wang, and Carl Yang. Contrastive intra- and inter-modality generation for enhancing incomplete multimedia recommendation. In Proceedings of the 31st ACM International Conference on Multimedia, pages 6234–6242, 2023.

[Liu et al., 2022] Wei Liu, Xiaodong Yue, Yufei Chen, and Thierry Denoeux. Trusted multi-view deep learning with opinion aggregation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 7585–7593, 2022.

[Liu et al., 2023] Chengliang Liu, Jie Wen, Zhihao Wu, Xiaoling Luo, Chao Huang, and Yong Xu. Information recovery-driven deep incomplete multiview clustering network. IEEE Transactions on Neural Networks and Learning Systems, 2023.

[Liu et al., 2024] Haoran Liu, Ying Ma, Ming Yan, Yingke Chen, Dezhong Peng, and Xu Wang. Dida: Disambiguated domain alignment for cross-domain retrieval with partial labels. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 3612–3620, 2024.

[Min et al., 2023] Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heintz, and Dan Roth.
Recent advances in natural language processing via large pre-trained language models: A survey. ACM Computing Surveys, 56(2):1–40, 2023.

[Nikzad-Khasmakhi et al., 2021] Narjes Nikzad-Khasmakhi, Mohammad Ali Balafar, M. Reza Feizi-Derakhshi, and Cina Motamed. Berters: Multimodal representation learning for expert recommendation system with transformers and graph embeddings. Chaos, Solitons & Fractals, 151:111260, 2021.

[Patrini et al., 2017] Giorgio Patrini, Alessandro Rozza, Aditya Krishna Menon, Richard Nock, and Lizhen Qu. Making deep neural networks robust to label noise: A loss correction approach. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2233–2241, 2017.

[Qin et al., 2022] Yang Qin, Dezhong Peng, Xi Peng, Xu Wang, and Peng Hu. Deep evidential learning with noisy correspondence for cross-modal retrieval. In Proceedings of the 30th ACM International Conference on Multimedia, pages 4948–4956, 2022.

[Sensoy et al., 2018] Murat Sensoy, Lance Kaplan, and Melih Kandemir. Evidential deep learning to quantify classification uncertainty. Advances in Neural Information Processing Systems, 31, 2018.

[Wang et al., 2015] Weiran Wang, Raman Arora, Karen Livescu, and Jeff Bilmes. On deep multi-view representation learning. In Proceedings of the 32nd International Conference on Machine Learning, pages 1083–1092, 2015.

[Wang et al., 2022] Xu Wang, Peng Hu, Pei Liu, and Dezhong Peng. Deep semisupervised class- and correlation-collapsed cross-view learning. IEEE Transactions on Cybernetics, 52(3):1588–1601, 2022.

[Wen et al., 2022] Jie Wen, Zheng Zhang, Lunke Fei, Bob Zhang, Yong Xu, Zhao Zhang, and Jinxing Li. A survey on incomplete multiview clustering. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 53(2):1136–1149, 2022.

[Xie et al., 2023] Mengyao Xie, Zongbo Han, Changqing Zhang, Yichen Bai, and Qinghua Hu. Exploring and exploiting uncertainty for incomplete multi-view classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19873–19882, 2023.

[Xu et al., 2022] Lei Xu, Hui Wu, Chunming He, Jun Wang, Changqing Zhang, Feiping Nie, and Lei Chen. Multi-modal sequence learning for Alzheimer's disease progression prediction with incomplete variable-length longitudinal data. Medical Image Analysis, 82:102643, 2022.

[Xu et al., 2023] Jie Xu, Yazhou Ren, Xiaoshuang Shi, Heng Tao Shen, and Xiaofeng Zhu. Untie: Clustering analysis with disentanglement in multi-view information fusion. Information Fusion, 100:101937, 2023.

[Xu et al., 2024] Cai Xu, Jiajun Si, Ziyu Guan, Wei Zhao, Yue Wu, and Xiyue Gao. Reliable conflictive multi-view learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 16129–16137, 2024.

[Zhou et al., 2023] Hai Zhou, Zhe Xue, Ying Liu, Boang Li, Junping Du, Meiyu Liang, and Yuankai Qi. Calm: An enhanced encoding and confidence evaluating framework for trustworthy multi-view learning. In Proceedings of the 31st ACM International Conference on Multimedia, pages 3108–3116, 2023.