Topology-Imbalance Learning for Semi-Supervised Node Classification

Deli Chen¹,², Yankai Lin¹, Guangxiang Zhao², Xuancheng Ren², Peng Li¹, Jie Zhou¹, Xu Sun²
¹ Pattern Recognition Center, WeChat AI, Tencent Inc., China
² MOE Key Lab of Computational Linguistics, School of EECS, Peking University
{delichen, yankailin, patrickpli, withtomzhou}@tencent.com
{zhaoguangxiang, renxc, xusun}@pku.edu.cn

Abstract

The class imbalance problem, as an important issue in learning node representations, has drawn increasing attention from the community. Although the imbalance considered by existing studies is rooted in the unequal quantity of labeled examples in different classes (quantity imbalance), we argue that graph data expose a unique source of imbalance arising from the asymmetric topological properties of the labeled nodes, i.e., labeled nodes are not equal in terms of their structural role in the graph (topology imbalance). In this work, we first probe the previously unknown topology-imbalance issue, including its characteristics, causes, and threats to semi-supervised node classification learning. We then provide a unified view to jointly analyze the quantity- and topology-imbalance issues by considering the node influence shift phenomenon with the Label Propagation algorithm. In light of our analysis, we devise an influence conflict detection-based metric, Totoro, to measure the degree of graph topology imbalance, and propose a model-agnostic method, ReNode, to address the topology-imbalance issue by adaptively re-weighting the influence of labeled nodes based on their relative positions to class boundaries. Systematic experiments demonstrate the effectiveness and generalizability of our method in relieving the topology-imbalance issue and promoting semi-supervised node classification. Further analysis unveils the varied sensitivity of different graph neural networks (GNNs) to topology imbalance, which may serve as a new perspective in evaluating GNN architectures.¹

¹ The code is available at https://github.com/victorchen96/ReNode.

35th Conference on Neural Information Processing Systems (NeurIPS 2021).

1 Introduction

Graph is a widely-used data structure [51], where nodes are connected to each other through natural or handcrafted edges. As with other data structures, representation learning for node classification faces the challenge of the quantity-imbalance issue, where the labeling size varies among classes and the decision boundaries of trained classifiers are mainly decided by the majority classes [46]. A series of studies [35, 11, 49] have handled Quantity-Imbalance Node Representation Learning (short as QINL). However, different from other data structures, graph-structured data suffers from another aspect of the imbalance problem: the imbalance caused by the asymmetric and uneven topology of labeled nodes, where the decision boundaries are driven by the labeled nodes close to the topological class boundaries (left of Figure 1), thus interfering with model learning.

Present Work. For the first time, we recognize Topology-Imbalance Node Representation Learning (short as TINL) as a graph-specific imbalance learning topic, which mainly focuses on the decision-boundary shift phenomenon driven by the topology imbalance in graphs and is an essential component of node imbalance learning.

Figure 1: Schematic diagram of the topology-imbalance issue in node representation learning. The color and the hue denote the type and the intensity of each node's received influence from the labeled nodes, respectively. The left shows that nodes close to the boundary are at risk of information conflict and nodes far away from labeled nodes are at risk of information insufficiency. The right shows that our method can decrease the training weights of labeled nodes (R1) close to the class boundary and increase the weights of labeled nodes (B and R2) close to the class centers, thus relieving the topology-imbalance issue.

Compared with the well-explored QINL, which studies the imbalance caused by the numbers of labeled nodes, TINL explores the imbalance caused by the positions of labeled nodes and has the following characteristics:

Ubiquity: Due to the complex connections among graph nodes, the topology of labeled nodes in different classes is naturally asymmetric, and it is difficult to construct a completely symmetric labeled set even with an abundant annotation budget; this makes topology imbalance an essential characteristic of node representation learning.

Perniciousness: The influence from labeled nodes decays with the topological distance [3]. The asymmetric topology of labeled nodes in different classes and the uneven distribution of labeled nodes within the same class cause the influence conflict and influence insufficiency problems (left of Figure 1), respectively, resulting in a shift of decision boundaries.

Orthogonality: Quantity-imbalance studies [49, 8, 5] usually treat the labeled nodes of the same class as a whole and devise solutions based on the total number in each class, while TINL explores the influence of the unique position of each labeled node on decision boundaries. Thus, TINL is independent of QINL in terms of the object of study.

Exploring TINL is of great importance for node representation learning due to its ubiquity and perniciousness. However, methods [17, 22] for quantity imbalance can hardly be applied to TINL because of the orthogonality. To remedy the topology-imbalance issue and thus promote node classification, we propose a model-agnostic training framework, ReNode, to re-weight the labeled nodes according to their positions. We devise the conflict detection-based Topology Relative Location (Totoro) metric, which leverages the interaction among labeled nodes across the whole graph to locate their structural positions. Based on the Totoro metric, we increase the training weights of nodes with small conflict, which are highly likely to be close to topological class centers, to make them play a more pivotal role during training, and vice versa (right of Figure 1). Empirical results in various imbalance scenarios (TINL, QINL, large-scale graphs) and with multiple graph neural networks (GNNs) demonstrate the effectiveness and generalizability of our method. Besides, we provide the sensitivity to topology imbalance as a new evaluation perspective for different GNN architectures.

2 Topology-Imbalance Node Representation Learning

2.1 Notations and Preliminary

In this work, we follow the well-established semi-supervised node classification setting [47, 18] to conduct analyses and experiments.
Given an undirected and unweighted graph $G = (V, E, L)$, where $V$ is the node set represented by the feature matrix $X \in \mathbb{R}^{n \times d}$ ($n = |V|$ is the number of nodes and $d$ is the node embedding dimension), $E$ is the edge set represented by an adjacency matrix $A \in \mathbb{R}^{n \times n}$, and $L \subset V$ is the labeled node set with usually $|L| \ll |V|$, the node classification task is to train a classifier $F$ (usually a GNN) to predict the class label $y$ for the unlabeled node set $U = V \setminus L$. The training sets for the different classes are represented by $(C_1, C_2, \ldots, C_k)$, where $k$ is the number of classes. The labeling ratio $\delta = |L|/|V|$ is the proportion of labeled nodes among all nodes. In this work, we focus on TINL in homogeneously-connected graphs and hope to inspire future studies on the critical topology-imbalance issue.

Figure 2: Node influence and boundary shift caused by quantity- and topology-imbalance. (a) Predictions of GCN and LP: the prediction results of GCN and LP are highly consistent (t-SNE [39] visualization of the CORA dataset; the legend marks the four combinations GCN=T/F, LP=T/F). (b) Uniform sampling: the node influence boundary (the yellow dotted line) is shifted towards the small class from the true class boundary (the black dotted line) under the quantity- and topology-imbalanced scene. (c) Quantity-balanced sampling: the node influence boundary is shifted towards the large class under the quantity-balanced, topology-imbalanced scene. We regard the large class as the positive class to indicate the results.

2.2 Understanding Topology Imbalance via Label Propagation

From Figure 1, we can intuitively perceive the imbalance brought by the positions of labeled nodes; in this part, we further explore the nature of topology imbalance with the well-known Label Propagation [50] algorithm (short as LP) and provide a uniform analysis framework for the comprehensive node imbalance issue. In LP, labels are propagated from the labeled nodes and aggregated along edges, which can also be viewed as a random walk process starting from labeled nodes. The convergence result $Y^{*}$ after repeated propagation is regarded as the nodes' soft labels:

$$Y^{*} = \alpha \left(I - (1-\alpha)\tilde{A}\right)^{-1} Y^{0}, \qquad (1)$$

where $I$ is the identity matrix, $\alpha \in (0, 1]$ is the random walk restart probability, $\tilde{A} = D^{-1/2} A D^{-1/2}$ is the adjacency matrix normalized by the diagonal degree matrix $D$, and $Y^{0}$ is the initial label distribution in which labeled nodes are represented by one-hot vectors. The prediction label for the $i$-th node is $q_i = \arg\max_j Y^{*}_{ij}$. LP is a simple yet successful model [37] and can be unified with GNN models owning the message-passing mechanism [41]. From Figure 2(a), we can empirically find a significant correlation between the results of LP and GCN (T/F indicates the prediction is True/False). The LP prediction $q$ can be viewed as the distribution of the (labeled) node influence [41] (i.e., which class's information most influences each node); hence the boundaries of the node influence can act as an effective reflection of the GNN model decision boundaries, considering the high consistency between LP and GNN. Moreover, node influence offers a unified view of TINL and QINL: ideally, the node influence boundaries should be consistent with the true class boundaries, but both the numbers (QINL) and positions (TINL) of labeled nodes can cause a shift of the node influence boundaries from the true ones, resulting in a deviation of the model decision boundaries. The node imbalance issue is thus composed of topology- and quantity-imbalance. Figure 2 illustrates two examples of node influence boundary shift.
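Before walking through the two examples in Figure 2, the following sketch shows how the LP soft labels of Eq. (1) can be computed. It is a minimal dense-matrix illustration, assuming a graph small enough to solve the linear system directly (practical implementations iterate the propagation instead); the function name is ours, not from the released code.

```python
import numpy as np

def lp_soft_labels(A, Y0, alpha=0.15):
    """Label Propagation soft labels of Eq. (1): Y* = alpha (I - (1-alpha) A~)^(-1) Y0.

    A  : (n, n) adjacency matrix of an undirected, unweighted graph
    Y0 : (n, k) initial labels; one-hot rows for labeled nodes, zero rows elsewhere
    """
    n = A.shape[0]
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d, dtype=float)
    d_inv_sqrt[d > 0] = d[d > 0] ** -0.5
    A_tilde = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # D^(-1/2) A D^(-1/2)
    Y_star = alpha * np.linalg.solve(np.eye(n) - (1 - alpha) * A_tilde, Y0)
    return Y_star, Y_star.argmax(axis=1)                      # soft labels and q_i
```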
In Figure 2(b), when the uniform selection is adopted to generate training set, both the quantity and the topology are imbalanced for model training; then the large class with more total nodes (denotes by blue color) will own stronger influence than the small class with fewer total nodes (denotes by red color) due to the quantity advantage and the node influence boundary is shifted towards the small class. In Figure 2(c), when the quantity-balanced strategy is adopted for sampling training nodes, it will be easier for the small class to has more labeled nodes close to the class boundary and the boundary of the node influence is shifted into the large class. We can find that even when the training set is quantity-balanced, the topology-imbalance issue still exists and hinders the node classification learning. Hence, we can conclude that node imbalance 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5 Conflict Figure 3: Effectiveness of Totoro at (a) Node Level: labeled nodes (t-SNE visualization of the CORA dataset) with less influence conflict (lighter color) are farther-away from class boundaries than those with high conflict (darker color), and (b) Dataset Level: There is a significant negative correlation between the GNN (GCN) performance and overall conflict of the training set (the Pearson correlation coefficient is 0.618 over 50 randomly selected training sets with the p value smaller than 0.01). learning is caused by the joint effect of TINL and QINL. Separately considering TINL or QINL will lead to a one-sided solution to node imbalance learning. 2.3 Measuring Topology Imbalance by Influence Conflict Although we have realized that the imbalance of node topology interferes with model learning, how to measure the labeled node s relative topological position to its class (being far away from or close to the class center) remains the key challenge in handling the topology-imbalance issue due to the complex graph connections and the unknown class labels for most nodes in the graph. As the nodes are homogeneously connected when constructing the graph, even nodes close to the class boundaries own similar characteristics to their neighbors. Thus it is unreliable to leverage the difference between the characteristics of one labeled node and its surrounding subgraphs to locate its topological position. Instead, we propose to utilize the node topology information by considering the node influence conflict across the whole graph and devise the Conflict Detection-based Topology Relative Location metric (Totoro). Similar to Eq (1), we calculate the Personalized Page Rank [27] matrix P to measure node influence distribution from each labeled node: P = α(I (1 α)A ) 1. (2) Node influence conflict denotes topological position. According to related studies [41, 19, 2], P can be viewed as the distribution of influence exerted outward from each node. We assume that if a labeled node v V encounters strong heterogeneous influence from the other classes labeled nodes in the subgraph around node v where node v itself owns great influence, we have the conclusion that node v meets large influence conflict in message passing and it is close to topological class boundaries, and vice versa. Based on this hypothesis, we take the expectation of the influence conflict between the node v and the labeled nodes from other classes when node v randomly walks across the entire graph as a measurement of how topologically close node v is to the center of the class it belongs to. 
The Totoro value of node $v$ is computed as:

$$T_v = \mathbb{E}_{x \sim P_{v,:}}\Big[\sum_{j \in [1,k],\, j \neq y_v} \frac{1}{|C_j|} \sum_{i \in C_j} P_{i,x}\Big], \qquad (3)$$

where $y_v$ is the ground-truth label of node $v$ and $P_{v,:}$ is the Personalized PageRank probability vector of node $v$. A larger Totoro value $T_v$ indicates that node $v$ is topologically closer to class boundaries, and vice versa. The normalization term $1/|C_j|$ is added to make the influence from different classes comparable when computing the conflict. We visualize the node labels and the Totoro values (scaled to [0, 1]) of labeled nodes in Figure 3(a). Labeled nodes with smaller Totoro values are farther away from the class boundaries, demonstrating the effectiveness of Totoro in locating the positions of labeled nodes. Besides, we sum the conflict of all the labeled nodes, $\sum_{v \in L} T_v$, to measure the overall conflict of the dataset, which can be viewed as a metric of the overall topology imbalance given the graph $G$ and the training set $L$. Figure 3(b) shows a significant negative correlation between the overall conflict and the model performance, which further demonstrates the effectiveness of Totoro in measuring the intensity of topology imbalance at the dataset level.

2.4 Alleviating Topology Imbalance by Instance-wise Node Re-weighting

In this section, we introduce ReNode, a model-agnostic training weight schedule mechanism that addresses TINL for general GNN encoders in a plug-and-play manner. Inspired by the analysis in Section 2.2, the ReNode method promotes the training weights of labeled nodes that are close to the topological class centers, so as to make these nodes play a more active role in model learning, and vice versa. Specifically, we devise a cosine annealing mechanism² for the training node weights based on their Totoro values:

$$w_v = w_{\min} + \frac{1}{2}\,(w_{\max} - w_{\min})\Big(1 + \cos\big(\frac{\mathrm{Rank}(T_v)}{|L|}\,\pi\big)\Big), \quad v \in L, \qquad (4)$$

where $w_v$ is the modified training weight of labeled node $v$; $w_{\min}$ and $w_{\max}$ are hyper-parameters indicating the lower and upper bounds of the weight correction factor; and $\mathrm{Rank}(T_v)$ is the ranking order of $T_v$ from the smallest to the largest. The training loss $\mathcal{L}_T$ for the quantity-balanced, topology-imbalanced node classification task is computed as:

$$\mathcal{L}_T = -\frac{1}{|L|} \sum_{v \in L} w_v \sum_{c=1}^{k} y_v^{*c} \log g_v^{c}, \qquad g = \mathrm{softmax}\big(F(X, A, \theta)\big), \qquad (5)$$

where $F$ denotes any GNN encoder with parameters $\theta$, $g_v$ is the GNN output for node $v$, and $y_v^{*}$ is the gold label of node $v$ in one-hot form. By encouraging the positive effects of the labeled nodes near the topological class centers, and reducing the negative effects of those near the topological class boundaries, our ReNode method is expected to minimize the deviation between the node influence boundaries and the true class boundaries, so as to correct the class imbalance caused by the positions of labeled nodes.

ReNode to Jointly Handle TINL and QINL. In this part, we introduce the application of the ReNode method in a more general graph imbalance scenario where both the topology- and quantity-imbalance issues exist. As analyzed in previous sections, TINL and QINL are orthogonal problems. Therefore, our ReNode method, which is based on (labeled) node topology, can be seamlessly combined with existing methods designed for quantity-imbalance learning. Without loss of generality, we present how our ReNode method can be combined with the vanilla class frequency-based re-weight method [17].
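Before formalizing that combination, the sketch below recaps the pipeline defined so far: the Personalized PageRank matrix of Eq. (2), the Totoro conflict of Eq. (3), and the cosine-annealed weights of Eq. (4). It is a dense-matrix sketch for small graphs; the function and variable names are ours, not from the released code.

```python
import numpy as np

def renode_weights(A, labeled, labels, k, alpha=0.15, w_min=0.5, w_max=1.5):
    """Sketch of Totoro (Eqs. 2-3) and the cosine re-weighting of Eq. (4).

    labeled : list of labeled node ids; labels[v] gives the class of labeled node v
    w_min, w_max : bounds of the weight correction factor (hyper-parameters)
    """
    n = A.shape[0]
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d, dtype=float)
    d_inv_sqrt[d > 0] = d[d > 0] ** -0.5
    A_tilde = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    P = alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * A_tilde)   # Eq. (2)

    by_class = {c: [v for v in labeled if labels[v] == c] for c in range(k)}
    T = {}
    for v in labeled:
        # f(x) = sum_{j != y_v} (1/|C_j|) sum_{i in C_j} P[i, x]; T_v = E_{x~P[v,:]}[f(x)]
        conflict = np.zeros(n)
        for j in range(k):
            if j != labels[v] and by_class[j]:
                conflict += P[by_class[j], :].sum(axis=0) / len(by_class[j])
        T[v] = float(P[v, :] @ conflict)                           # Eq. (3)

    rank = {v: r for r, v in enumerate(sorted(labeled, key=T.get))}
    return {v: w_min + 0.5 * (w_max - w_min)
               * (1 + np.cos(rank[v] / len(labeled) * np.pi))
            for v in labeled}                                      # Eq. (4)
```

Note that the smallest-conflict node gets rank 0 and hence the weight $w_{\max}$, so nodes near the topological class centers are up-weighted, as intended.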
The training loss $\mathcal{L}_Q$ for the quantity-imbalanced, topology-imbalanced node classification task is formalized as:

$$\mathcal{L}_Q = -\frac{1}{|L|} \sum_{v \in L} w_v\, \frac{|\bar{C}|}{|C_{y_v}|} \sum_{c=1}^{k} y_v^{*c} \log g_v^{c}, \qquad (6)$$

where $|\bar{C}|$ is the average training size over classes. With this method, the final weight of a labeled node is affected from two perspectives: training examples of the minority classes have higher weights than those of the majority classes, and training examples close to the topological class centers have higher weights than those close to the topological class boundaries.

ReNode for Large-scale Graphs. There are mainly two challenges when applying ReNode to large-scale graphs: (1) how to calculate the PageRank matrix, and (2) how to train the GNN model in an inductive setting [13]. In this work, we follow the PPRGo method [2] to implement our method on large-scale graphs; it decouples the feature learning process from the information propagation process, which removes the dependence on the global graph topology and can be carried out much more efficiently. Following PPRGo, the Personalized PageRank matrix $\hat{P}$ and the corresponding ReNode training factors $\hat{w}$ are generated by the estimation method of Andersen et al. [1], and $\hat{P}$ is directly employed as the aggregation weights from all the other nodes, regardless of their topological distance from the current node:

$$g' = \mathrm{softmax}\big(\hat{P}\, F'(X, \theta')\big), \qquad (7)$$

where $F'$ can be a linear layer or a multi-layer perceptron with parameters $\theta'$. The final training loss for large-scale graphs, $\mathcal{L}_L$, follows Eqs. (5) and (6), replacing $w$ and $g$ with $\hat{w}$ and $g'$.

² We also tried other schedules, and the cosine method works best. More details can be found in Appendix B.

Table 1: ReNode (short as RN) for the pure topology-imbalance issue. We report Weighted-F1 (W-F, %), Macro-F1 (M-F, %) and the corresponding standard deviation for each group of experiments. Significance markers denote results that are significant under a student's t-test with p < 0.05 and p < 0.01, respectively.

| Model | Training | CORA W-F | CORA M-F | CiteSeer W-F | CiteSeer M-F | PubMed W-F | PubMed M-F | Photo W-F | Photo M-F | Computers W-F | Computers M-F |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GCN | w/o RN | 79.1±1.1 | 77.8±1.5 | 66.2±1.0 | 62.0±1.3 | 74.6±2.1 | 74.7±1.9 | 86.8±2.0 | 84.7±1.7 | 74.2±2.6 | 73.6±2.9 |
| GCN | w/ RN | 79.8±0.9 | 78.6±1.2 | 66.9±1.1 | 62.8±1.4 | 76.1±1.5 | 76.1±1.8 | 87.7±2.2 | 85.4±1.9 | 74.7±2.2 | 74.5±2.3 |
| GAT | w/o RN | 76.0±1.7 | 74.9±1.9 | 66.3±2.8 | 62.4±2.6 | 73.9±2.2 | 73.9±2.1 | 88.3±2.0 | 86.2±2.2 | 79.0±2.1 | 78.8±2.3 |
| GAT | w/ RN | 77.7±2.0 | 76.2±1.8 | 67.1±1.9 | 63.2±1.6 | 75.2±2.0 | 75.1±2.5 | 89.1±2.0 | 87.1±2.0 | 78.8±1.9 | 78.7±2.0 |
| PPNP | w/o RN | 80.5±1.6 | 79.1±1.4 | 67.5±1.8 | 63.2±1.6 | 74.6±1.9 | 74.7±1.7 | 89.3±1.3 | 86.8±1.4 | 78.7±1.5 | 77.7±1.7 |
| PPNP | w/ RN | 81.9±0.6 | 80.5±0.8 | 68.1±1.4 | 63.7±2.0 | 76.0±2.0 | 76.1±2.2 | 89.7±1.0 | 87.2±1.3 | 79.0±1.1 | 78.3±1.1 |
| SAGE | w/o RN | 75.1±1.7 | 74.6±1.4 | 67.0±1.4 | 63.0±1.4 | 74.2±2.2 | 74.2±2.1 | 86.2±2.6 | 83.9±2.4 | 73.5±3.4 | 71.6±2.5 |
| SAGE | w/ RN | 75.7±1.7 | 75.1±1.4 | 67.3±1.4 | 63.5±1.2 | 74.9±1.9 | 78.2±2.3 | 86.5±1.7 | 84.1±1.7 | 74.9±3.0 | 72.3±2.5 |
| CHEB | w/o RN | 74.5±1.1 | 73.4±1.1 | 66.8±1.8 | 63.2±1.6 | 75.1±1.8 | 75.2±1.1 | 82.1±2.2 | 79.4±3.5 | 70.3±4.0 | 68.4±3.4 |
| CHEB | w/ RN | 75.3±1.1 | 74.0±1.1 | 67.5±1.6 | 63.8±1.5 | 76.2±1.4 | 76.3±1.2 | 84.8±2.4 | 82.1±2.8 | 70.5±4.0 | 68.6±3.4 |
| SGC | w/o RN | 74.9±2.1 | 73.8±2.1 | 65.7±1.6 | 61.8±1.6 | 72.9±2.3 | 73.1±2.6 | 87.1±1.3 | 84.9±1.1 | 77.4±1.7 | 76.8±1.8 |
| SGC | w/ RN | 77.0±1.1 | 76.0±1.1 | 67.2±1.3 | 62.9±1.8 | 73.7±2.8 | 73.8±2.1 | 87.4±1.5 | 85.2±1.5 | 78.2±1.8 | 77.8±1.2 |

Table 2: Results under different dataset conflict levels (High/Middle/Low). Our ReNode method improves the GNN (GCN) performance most when the conflict level of the graph is high.

| W-F (%) | CORA-H | CORA-M | CORA-L | CiteSeer-H | CiteSeer-M | CiteSeer-L | PubMed-H | PubMed-M | PubMed-L |
|---|---|---|---|---|---|---|---|---|---|
| w/o RN | 76.5±1.3 | 78.4±0.7 | 79.7±0.8 | 62.6±1.5 | 65.3±0.6 | 67.3±1.1 | 72.1±2.4 | 74.7±1.8 | 78.3±1.8 |
| w/ RN | 78.7±0.8 | 79.3±0.6 | 80.4±0.6 | 63.8±1.3 | 66.0±0.8 | 67.5±1.4 | 74.3±2.1 | 75.6±1.9 | 78.8±1.5 |

3 Experiments

In this section, we first introduce the experimental datasets for both transductive and inductive semi-supervised node classification. We then introduce the experiments that verify the effectiveness of the proposed ReNode method in three different imbalance situations: (1) TINL only, (2) TINL and QINL, and (3) large-scale graphs.

3.1 Datasets

We adopt two sets of graph datasets for our experiments. For the transductive setting [13], we take the widely-used Planetoid paper citation graphs [33] (CORA, CiteSeer, PubMed) and the Amazon co-purchase graphs [24] (Photo, Computers) to verify the effectiveness of our method. For the inductive setting, we conduct experiments on the popular Reddit dataset [13] and the enormous MAG-Scholar dataset (coarse-grain version) [2], which has millions of nodes and features. For each of these datasets, we repeat experiments on 5 different dataset splits [34] and run each split 3 times to reduce random variance. More details about the datasets and experiment settings are presented in Appendix A.

3.2 ReNode for the Pure Topology-Imbalance Issue

Settings. When considering topology imbalance only, the labeled set takes a balanced setting and the annotation size for each class is equal to $|L|/k$. Following the most widely-used semi-supervised setting in node classification studies [47, 18], we randomly select 20 nodes in each class for training and 30 nodes per class for validation; all the remaining nodes form the test set. We display the experiment results for the 5 transductive datasets on 6 widely-used GNN models: GCN [18], GAT [40], PPNP [19], GraphSAGE [13] (short as SAGE), ChebGCN [9] (short as CHEB), and SGC [43]. We strictly align the hyper-parameters in each group of experiments to show the pure improvement brought by our ReNode method (similarly hereinafter). The training loss $\mathcal{L}_T$ from Section 2.4 is adopted.

Table 3: ReNode method for the compound scene of TINL and QINL. The imbalance ratio ρ is set to different levels (5 and 10) to test the effect of our method under different imbalance intensities.
| Macro-F1 (%) | CORA ρ=5 | CORA ρ=10 | CiteSeer ρ=5 | CiteSeer ρ=10 | PubMed ρ=5 | PubMed ρ=10 | Photo ρ=5 | Photo ρ=10 | Computers ρ=5 | Computers ρ=10 |
|---|---|---|---|---|---|---|---|---|---|---|
| CE | 60.9±1.5 | 41.0±3.5 | 53.6±2.1 | 47.6±2.8 | 61.0±1.9 | 49.7±2.6 | 62.0±2.7 | 40.7±3.4 | 50.4±2.6 | 35.5±3.2 |
| DR-GCN | 67.7±1.1 | 51.3±1.4 | 54.7±1.7 | 52.5±2.6 | 79.4±1.2 | 78.0±1.6 | 80.8±2.3 | 79.5±2.8 | 66.9±3.5 | 67.4±3.6 |
| RA-GCN | 69.0±1.5 | 51.7±1.7 | 55.6±1.3 | 52.7±2.1 | 80.6±1.8 | 78.1±2.1 | 81.4±2.6 | 79.4±3.2 | 71.2±2.8 | 68.7±3.0 |
| G-SMOTE | 68.1±0.9 | 49.6±1.1 | 54.0±1.6 | 51.8±1.3 | 79.7±1.2 | 76.4±1.5 | 82.2±1.8 | 77.5±2.1 | 71.9±2.5 | 61.3±3.2 |
| RW (w/o RN) | 69.1±1.4 | 49.7±1.6 | 53.6±2.3 | 52.9±2.6 | 80.5±1.5 | 78.0±2.0 | 80.5±2.7 | 80.4±3.3 | 70.5±3.2 | 67.8±4.2 |
| RW (w/ RN) | 70.0±1.3 | 50.1±1.7 | 55.2±1.8 | 54.0±2.5 | 81.2±1.0 | 78.5±2.2 | 83.9±2.1 | 81.3±3.2 | 72.4±2.6 | 70.2±2.4 |
| FOCAL (w/o RN) | 66.4±1.6 | 51.9±1.8 | 54.3±1.3 | 54.0±1.9 | 80.5±0.7 | 78.0±1.6 | 79.3±1.9 | 79.2±2.2 | 65.8±2.7 | 63.9±2.6 |
| FOCAL (w/ RN) | 68.7±0.7 | 52.6±1.9 | 54.6±1.2 | 54.7±1.5 | 80.9±0.8 | 78.7±1.4 | 80.0±2.3 | 80.7±2.9 | 68.6±3.1 | 65.5±3.5 |
| CB (w/o RN) | 69.8±1.5 | 51.5±1.5 | 54.1±1.3 | 53.5±0.8 | 80.6±0.8 | 77.6±1.6 | 77.9±2.6 | 78.8±3.1 | 69.6±2.2 | 64.8±2.9 |
| CB (w/ RN) | 71.1±0.6 | 51.9±1.2 | 54.7±1.6 | 54.3±2.3 | 81.2±1.8 | 78.3±2.6 | 79.6±2.7 | 80.4±3.3 | 73.1±3.1 | 66.5±3.6 |

Results. From Table 1, we find that our ReNode method effectively improves the overall performance (Weighted-F1) and the class-balance performance (Macro-F1) of all 6 GNNs in most cases, which proves the effectiveness and generalizability of our method. Our method considers the graph-specific topology-imbalance issue, which is usually neglected by existing methods, and conducts a fine-grained, self-adaptive adjustment of the training node weights based on their topological positions. We notice that the improvement on the CiteSeer dataset is smaller than on the other datasets; we attribute this to the poor connectivity of CiteSeer, which prevents the conflict detection-based method from reflecting the node topological positions well. To verify the motivation of relieving topology imbalance, we construct training sets with different levels of topology imbalance to test our method.³ Table 2 shows that our ReNode method improves the performance of the GNN (GCN) most when the dataset is highly topologically imbalanced, which demonstrates that our method can effectively alleviate topology imbalance and improve GNN performance.

3.3 ReNode for the Compound Scene of TINL and QINL

Settings. When jointly considering both topology- and quantity-imbalance issues, following existing studies [5, 4], we take the step imbalance setting, in which all the minority classes have the same labeling size $n_i$ and all the majority classes have the same labeling size $n_a = \rho \cdot n_i$. The imbalance ratio $\rho$ denotes the intensity of quantity imbalance and is equal to the ratio of the labeling size of the most frequent class to that of the least frequent class. In this work, the imbalance ratio $\rho$ is set to 5 and 10 for each dataset. The fraction of the majority classes is $\mu$; for all experiments, we set $\mu = 0.5$ and round $\mu \cdot k$ down. The training loss $\mathcal{L}_Q$ from Section 2.4 is adopted. We implement two groups of baselines for comparison: (1) popular quantity-imbalance methods for general scenarios: Re-weight [17] (RW), Focal Loss [22] (Focal), and Class-Balanced Loss [8] (CB); (2) graph-specific quantity-imbalance methods: DR-GCN [35], RA-GCN [11], and GraphSMOTE [49]. To jointly handle the topology- and quantity-imbalance issues and demonstrate their orthogonality, we combine our ReNode method with the three general quantity-imbalance methods (RW, Focal, CB).⁴ The backbone model is GCN [18], and the labeling ratio δ is set to 5%.
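The RW (w/ RN) rows of Table 3 correspond to the combined loss of Eq. (6); the following sketch shows how the two weight factors compose per labeled node. It is a hedged PyTorch sketch under our own naming; `w_topo` is assumed to come from Eq. (4).

```python
import torch
import torch.nn.functional as F

def renode_rw_loss(logits, y, w_topo, class_counts):
    """Topology + quantity re-weighted loss of Eq. (6) (sketch).

    logits       : (|L|, k) GNN outputs for the labeled nodes
    y            : (|L|,)   gold class ids
    w_topo       : (|L|,)   ReNode factors w_v from Eq. (4)
    class_counts : (k,)     number of training nodes per class
    """
    per_node_ce = F.cross_entropy(logits, y, reduction="none")          # -log g_v^{y_v}
    w_quantity = class_counts.float().mean() / class_counts.float()[y]  # avg size / |C_{y_v}|
    return (w_topo * w_quantity * per_node_ce).mean()                   # mean = (1/|L|) sum
```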
Results. From Table 3 (Macro-F1 is reported here for a fair comparison with methods designed for class-balance performance), we find that our ReNode method significantly outperforms both the general and the graph-specific quantity-imbalance methods in most situations by simultaneously alleviating the topology- and quantity-imbalance issues. Even when the training set is severely quantity-imbalanced (ρ = 10), our method still effectively alleviates the imbalance issue and promotes model performance. The performance of the quantity-imbalance methods from the general field (RW, Focal, CB) is on par with or less effective than the graph-specific quantity-imbalance methods (DR-GCN, RA-GCN, G-SMOTE), while the combination of our ReNode method with these general quantity-imbalance methods surpasses the graph-specific ones, which demonstrates that node imbalance learning can be further improved by jointly handling the topology- and quantity-imbalance issues instead of considering the quantity-imbalance issue only.

³ The settings of the dataset topology-imbalance levels are shown in Appendix C.
⁴ The three graph-specific methods have special operations on the training loss, which can hardly be combined with our method.

Figure 4: Experimental results (Weighted-F1, %) on the large-scale Reddit and MAG-Scholar graphs, with labeling sizes 20·k, 50·k, and 100·k under the TINL-only and TINL & QINL settings. Our ReNode method can effectively improve the model performance under different labeling sizes.

Figure 5: Evaluating GNNs (GCN, GAT, PPNP) from the aspect of topology-imbalance sensitivity on CORA, CiteSeer, and PubMed with High/Middle/Low conflict levels (metric: Weighted-F1, %). We can summarize the ranking of topology-imbalance sensitivity as GCN > PPNP > GAT.

3.4 ReNode for Large-scale Graphs

Settings. We conduct experiments on the two large-scale datasets, Reddit and MAG-Scholar, to verify the effectiveness of our ReNode method in the inductive setting. We conduct experiments with different labeling sizes (20/50/100 training nodes per class) and imbalance settings (TINL only; TINL and QINL). The backbone GNN model is PPRGo [2].⁵ For QINL, we take uniform selection to sample training nodes, consistent with PPRGo. The training loss $\mathcal{L}_L$ from Section 2.4 is adopted. Neither the baseline nor our method is combined with any quantity-imbalance method.

Results. In Figure 4, we present the experiment results with different labeling sizes and imbalance settings. Our method effectively promotes performance on the large-scale graphs compared to the popular PPRGo model across different settings, which demonstrates the applicability of our method to extremely large graphs. We also notice that our method brings greater improvement when the labeling size is large; we attribute this to the fact that with more labeled nodes, the positions located by the conflicts among nodes are more accurate, thus yielding more reasonable weight adjustments. On the other hand, when the labeling ratio is extremely small (especially for the enormous MAG-Scholar graph) and the influence conflict between the labeled nodes is negligible, our method exhibits a cold-start problem.
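For reference, the sketch below illustrates the push-style local approximation in the spirit of Andersen et al. [1] that makes per-node PPR affordable on large graphs: it touches only nodes with non-negligible residual instead of forming the $(n \times n)$ inverse of Eq. (2). It is our own simplified illustration, using random-walk normalization rather than the symmetric normalization of Eq. (2), and is not the PPRGo implementation.

```python
from collections import defaultdict, deque

def approx_ppr_row(neighbors, s, alpha=0.15, eps=1e-4):
    """Approximate one row of the PPR matrix by residual pushing (sketch).

    neighbors : dict mapping node id -> list of neighbor ids
    Returns a sparse dict p with p[u] approximating the PPR mass of u w.r.t. source s.
    """
    p, r = defaultdict(float), defaultdict(float)
    r[s] = 1.0
    queue = deque([s])
    while queue:
        u = queue.popleft()
        deg = len(neighbors[u])
        if deg == 0 or r[u] < eps * deg:   # residual too small: nothing to push
            continue
        p[u] += alpha * r[u]               # settle the teleport share at u
        push = (1 - alpha) * r[u] / deg    # spread the rest to u's neighbors
        r[u] = 0.0
        for v in neighbors[u]:
            r[v] += push
            queue.append(v)                # re-checked against the threshold at pop time
    return dict(p)
```

The ReNode factors $\hat{w}$ can then be computed from these sparse rows exactly as in Eqs. (3)–(4), restricted to the labeled sources.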
4 Discussions

4.1 Evaluating GNNs from the Aspect of Topology-Imbalance Sensitivity

In Figure 5, we evaluate the GNNs' capability of handling topology imbalance and find that different GNNs present significant differences in topology-imbalance sensitivity across multiple datasets. The GCN model is susceptible to the topology-imbalance level of the graph, and its performance decays greatly when the topology imbalance increases. In contrast, the GAT model is less sensitive to the topology-imbalance level and achieves the best results when the topology-imbalance level is high. The PPNP model achieves ideal performance when the topology-imbalance level is low, and its performance does not drop as sharply as GCN's when the topology-imbalance level is high. We attribute this to the following: (1) the aggregation operation of GCN is equivalent to directly averaging neighbor features [45], which lacks a noise-filtering mechanism, so GCN is more sensitive to the topology-imbalance level of the graph; (2) the GAT model can dynamically adjust the aggregation weights of different neighbors, which increases its robustness in high topology-imbalance situations but hinders its performance when the graph topology-imbalance level is low and there is less need to filter neighbor information; (3) the infinite-convolution mechanism of the PPNP model makes it possible to aggregate information from distant nodes, enhancing its robustness to graph topology imbalance. Shchur et al. [34] note that the performance ranking of GNNs varies with the training set selection; hence, existing node classification studies [32, 14] usually repeat experiments multiple times with different training sets to reduce this randomness. The results in Figure 5 suggest that topology imbalance can partly explain the randomness of GNN performance caused by the training set selection, and that topology-imbalance sensitivity can be adopted as a new aspect for evaluating different GNN architectures.

⁵ We do not implement other inductive backbone GNN models like Cluster-GCN [7] or APPNP [19] due to their low efficiency [2] in handling the enormous MAG-Scholar dataset.

4.2 Limitations of the Method

Although our ReNode method has proven effective in multiple scenarios, we also note some limitations arising from the complexity of node imbalance learning. First, the ReNode method is devised for homogeneously-connected graphs (linked nodes are expected to be similar, as in the various datasets in our experiments) and needs further adaptation for heterogeneously-connected graphs (such as protein networks). Besides, the ReNode method improves less when the graph connectivity is poor (Section 3.2) or the labeling ratio is extremely low (Section 3.4), because in these cases the conflict level among nodes is low and thus the nodes' topological positions are insufficiently reflected.

5 Related Work

Imbalanced classification problems are widespread in real scenarios and have attracted extensive attention from both academia and industry. Most existing studies on this topic focus on the class-imbalanced quantity distribution [15], where the model's inference ability for the majority classes is significantly better than that for the minority classes [12]. Existing methods for solving the quantity-imbalance issue can be roughly divided into methods for the data selection phase and methods for the model training phase.
Active learning [31, 10, 42] and re-sampling [6, 16, 25] are two classical families designed to construct a quantity-balanced training set. On the other hand, re-weighting is a simple but effective solution for the model training phase, which adjusts the weights of training samples in different classes based on the labeling sizes [17, 30, 8, 5]. However, directly applying these methods to the graph scene lacks consideration of the graph-specific topology-imbalance issue. Unlike re-weight methods that conduct class-level re-weighting, our ReNode method is more fine-grained and assigns a weight to each node individually. Some quantity-imbalance studies (Tomek links [38], NearMiss [23], One-Sided Selection [20]) try to exclude the negative influence of labeled samples close to class boundaries by measuring the similarity of sample features. However, in the graph scene, the prior knowledge contained in node connections is more reliable than directly calculating feature similarity. Besides, the number of labeled nodes is quite small in the semi-supervised setting, so it is not robust to locate their positions by computing similarity among so few nodes; we instead propose to leverage the influence conflict across the whole graph to locate node positions relative to class boundaries.

The graph data structure has a wide range of applications, such as social media [13], stock exchange [21], shopping [34], medicine [44], and transportation [28]. Similar to other data structures, graph node representation learning also suffers from the quantity-imbalance issue [35]. Apart from the universal quantity-balance approaches introduced above, which can be transferred to the graph scene, some graph-specific quantity-imbalance methods have recently been proposed. DR-GCN [35] proposes two types of regularization to tackle quantity imbalance: class-conditioned adversarial training and a latent distribution constraint on unlabeled nodes. RA-GCN [11] proposes to automatically learn to weight the training samples of different classes in an adversarial training manner. AdaGCN [36] proposes to leverage the boosting algorithm to handle the quantity-imbalance issue in node classification. GraphSMOTE [49] combines synthetic node generation and edge generation to up-sample nodes of the minority classes. However, these studies only pay attention to quantity imbalance and overlook topology imbalance. Different from studies [48, 29, 26] that try to locate the absolute positions of all nodes by measuring their distance from selected anchor nodes, our Totoro metric is devised to locate the relative positions of labeled nodes to the class boundaries by considering the influence conflict, and it removes the dependence on anchor nodes. Besides, our relative positions can more accurately reflect node class information because we distinguish the information from different classes, while existing studies [48, 29] treat all anchor nodes the same and ignore the class difference.

6 Conclusion and Future Work

In this work, we recognize topology-imbalance node representation learning (TINL) as a graph-specific imbalance learning problem that has not been studied so far. We find that the topology-imbalance issue widely exists in graphs and severely hinders the learning of node classification.
We unify TINL with quantity-imbalance node representation learning (QINL) by considering the shift of the node influence boundaries from the true class boundaries. To measure the degree of topology imbalance, we devise a conflict detection-based metric, Totoro, to locate node positions, and further propose the ReNode method to adaptively adjust the training weights of labeled nodes based on their topological positions. Extensive empirical results have verified the effectiveness of our method in various settings: TINL only, both TINL and QINL, and large-scale graphs. Besides, we propose topology-imbalance sensitivity as a new metric to evaluate GNNs. Considering the importance of the topology-imbalance issue and the limitations of our approach, advanced methods with stronger theoretical or experimental support are expected in future work. Moreover, since topology imbalance is widespread in graph-related tasks beyond node classification, how to measure and solve topology-imbalance issues in broader graph scopes remains a meaningful challenge for future study.

7 Acknowledgements

We appreciate all the thoughtful and insightful suggestions from the reviewers. This work was supported in part by a Tencent Research Grant and the National Natural Science Foundation of China (No. 61673028). Xu Sun is the corresponding author of this paper.

References

[1] Reid Andersen, Fan R. K. Chung, and Kevin J. Lang. Local Graph Partitioning using PageRank Vectors. In 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2006), 21-24 October 2006, Berkeley, California, USA, Proceedings, pages 475–486. IEEE Computer Society, 2006.

[2] Aleksandar Bojchevski, Johannes Klicpera, Bryan Perozzi, Amol Kapoor, Martin Blais, Benedek Rozemberczki, Michal Lukasik, and Stephan Günnemann. Scaling Graph Neural Networks with Approximate PageRank. In the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2020, pages 2464–2473. ACM, 2020.

[3] Eliav Buchnik and Edith Cohen. Bootstrapped Graph Diffusions: Exposing the Power of Nonlinearity. Proc. ACM Meas. Anal. Comput. Syst., 2(1):10:1–10:19, 2018.

[4] Mateusz Buda, Atsuto Maki, and Maciej A. Mazurowski. A Systematic Study of the Class Imbalance Problem in Convolutional Neural Networks. Neural Networks, 106:249–259, 2018.

[5] Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Aréchiga, and Tengyu Ma. Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 1565–1576, 2019.

[6] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16:321–357, 2002.

[7] Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, and Cho-Jui Hsieh. Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks. In the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019, pages 257–266. ACM, 2019.

[8] Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge J. Belongie. Class-Balanced Loss Based on Effective Number of Samples. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 9268–9277. Computer Vision Foundation / IEEE, 2019.

[9] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst.
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 3837–3845, 2016.

[10] Seyda Ertekin, Jian Huang, and C. Lee Giles. Active Learning for Class Imbalance Problem. In the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2007, pages 823–824, 2007.

[11] Mahsa Ghorbani, Anees Kazi, Mahdieh Soleymani Baghshah, Hamid R. Rabiee, and Nassir Navab. RA-GCN: Graph Convolutional Network for Disease Prediction Problems with Imbalanced Data. arXiv preprint arXiv:2103.00221, 2021.

[12] Haixiang Guo, Yijing Li, Jennifer Shang, Gu Mingyun, Huang Yuanyue, and Gong Bing. Learning from Class-Imbalanced Data: Review of Methods and Applications. Expert Syst. Appl., 73:220–239, 2017.

[13] William L. Hamilton, Zhitao Ying, and Jure Leskovec. Inductive Representation Learning on Large Graphs. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 1024–1034, 2017.

[14] Kaveh Hassani and Amir Hosein Khasahmadi. Contrastive Multi-View Representation Learning on Graphs. In the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 4116–4126. PMLR, 2020.

[15] Haibo He and Edwardo A. Garcia. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng., 21(9):1263–1284, 2009.

[16] Haibo He, Yang Bai, Edwardo A. Garcia, and Shutao Li. ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning. In the International Joint Conference on Neural Networks, IJCNN 2008, part of the IEEE World Congress on Computational Intelligence, WCCI 2008, Hong Kong, China, June 1-6, 2008, pages 1322–1328. IEEE, 2008.

[17] Chen Huang, Yining Li, Chen Change Loy, and Xiaoou Tang. Learning Deep Representation for Imbalanced Classification. In the 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, pages 5375–5384, 2016.

[18] Thomas N. Kipf and Max Welling. Semi-supervised Classification with Graph Convolutional Networks. In the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017.

[19] Johannes Klicpera, Aleksandar Bojchevski, and Stephan Günnemann. Predict then Propagate: Graph Neural Networks meet Personalized PageRank. In the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019.

[20] Miroslav Kubat, Stan Matwin, et al. Addressing the Curse of Imbalanced Training Sets: One-sided Selection. In the 14th International Conference on Machine Learning (ICML 1997), Nashville, Tennessee, USA, July 8-12, 1997, volume 97, pages 179–186. Morgan Kaufmann, 1997.

[21] Wei Li, Ruihan Bao, Keiko Harimoto, Deli Chen, Jingjing Xu, and Qi Su. Modeling the Stock Relation with Graph Network for Overnight Stock Movement Prediction. In the 29th International Joint Conference on Artificial Intelligence, IJCAI 2020, pages 4541–4547. ijcai.org, 2020.

[22] Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Dollár. Focal Loss for Dense Object Detection. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 2999–3007.
IEEE Computer Society, 2017.

[23] Inderjeet Mani and I Zhang. kNN Approach to Unbalanced Data Distributions: A Case Study Involving Information Extraction. In Workshop on Learning from Imbalanced Datasets, volume 126, 2003.

[24] Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton Van Den Hengel. Image-based Recommendations on Styles and Substitutes. In the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, August 9-13, 2015, pages 43–52. ACM, 2015.

[25] Iman Nekooeimehr and Susana K. Lai-Yuen. Adaptive Semi-unsupervised Weighted Oversampling (A-SUWO) for Imbalanced Datasets. Expert Systems with Applications, 46:405–416, 2016.

[26] Sunil Nishad, Shubhangi Agarwal, Arnab Bhattacharya, and Sayan Ranu. GraphReach: Position-Aware Graph Neural Networks using Reachability Estimations. In the 30th International Joint Conference on Artificial Intelligence, IJCAI 2021.

[27] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical report, Stanford InfoLab, 1999.

[28] Nikolay G. Prokoptsev, A. E. Alekseenko, and Yaroslav Aleksandrovich Kholodov. Traffic Flow Speed Prediction on Transportation Graph with Convolutional Neural Networks. Computer Research and Modeling, 10(3):359–367, 2018.

[29] Zhenyue Qin, Saeed Anwar, Dongwoo Kim, Yang Liu, Pan Ji, and Tom Gedeon. Position-Sensing Graph Neural Networks: Proactively Learning Nodes Relative Positions. arXiv preprint arXiv:2105.11346, 2021.

[30] Mengye Ren, Wenyuan Zeng, Bin Yang, and Raquel Urtasun. Learning to Reweight Examples for Robust Deep Learning. In International Conference on Machine Learning, pages 4334–4343. PMLR, 2018.

[31] Pengzhen Ren, Yun Xiao, Xiaojun Chang, Po-Yao Huang, Zhihui Li, Xiaojiang Chen, and Xin Wang. A Survey of Deep Active Learning. arXiv preprint arXiv:2009.00236, 2020.

[32] Yu Rong, Wenbing Huang, Tingyang Xu, and Junzhou Huang. The Truly Deep Graph Convolutional Networks for Node Classification. arXiv preprint arXiv:1907.10903, 2019.

[33] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. Collective Classification in Network Data. AI Magazine, 29(3):93, 2008.

[34] Oleksandr Shchur, Maximilian Mumme, Aleksandar Bojchevski, and Stephan Günnemann. Pitfalls of Graph Neural Network Evaluation. arXiv preprint arXiv:1811.05868, 2018.

[35] Min Shi, Yufei Tang, Xingquan Zhu, David A. Wilson, and Jianxun Liu. Multi-Class Imbalanced Graph Convolutional Network Learning. In the 29th International Joint Conference on Artificial Intelligence, IJCAI 2020, pages 2879–2885. ijcai.org, 2020.

[36] Shuhao Shi, Kai Qiao, Shuai Yang, L. Wang, J. Chen, and Bin Yan. AdaGCN: Adaptive Boosting Algorithm for Graph Convolutional Networks on Imbalanced Node Classification. arXiv preprint arXiv:2105.11625, 2021.

[37] Marina Sokol, Konstantin Avrachenkov, Paulo Gonçalves, and Alexey Mishenin. Generalized Optimization Framework for Graph-based Semi-supervised Learning. In the 12th SIAM International Conference on Data Mining, Anaheim, California, USA, April 26-28, 2012, pages 966–974. SIAM / Omnipress, 2012.

[38] Ivan Tomek et al. Two Modifications of CNN. IEEE Transactions on Systems, Man, and Cybernetics, 1976.

[39] Laurens Van der Maaten and Geoffrey Hinton. Visualizing Data using t-SNE. Journal of Machine Learning Research, 9(11), 2008.

[40] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph Attention Networks.
In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018.

[41] Hongwei Wang and Jure Leskovec. Unifying Graph Convolutional Neural Networks and Label Propagation. arXiv preprint arXiv:2002.06755, 2020.

[42] Xinyue Wang, Bo Liu, Siyu Cao, Liping Jing, and Jian Yu. Important Sampling based Active Learning for Imbalance Classification. Science China Information Sciences, 63(8):1–14, 2020.

[43] Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. Simplifying Graph Convolutional Networks. In International Conference on Machine Learning, pages 6861–6871. PMLR, 2019.

[44] Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, and Vijay Pande. MoleculeNet: A Benchmark for Molecular Machine Learning. Chemical Science, 2018.

[45] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How Powerful are Graph Neural Networks? In the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019.

[46] Yuzhe Yang and Zhi Xu. Rethinking the Value of Labels for Improving Class-Imbalanced Learning. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.

[47] Zhilin Yang, William W. Cohen, and Ruslan Salakhutdinov. Revisiting Semi-supervised Learning with Graph Embeddings. In the 33rd International Conference on Machine Learning, ICML 2016, volume 48 of JMLR Workshop and Conference Proceedings, pages 40–48. JMLR.org, 2016.

[48] Jiaxuan You, Rex Ying, and Jure Leskovec. Position-aware Graph Neural Networks. In the 36th International Conference on Machine Learning, ICML 2019, volume 97 of Proceedings of Machine Learning Research, pages 7134–7143. PMLR, 2019.

[49] Tianxiang Zhao, Xiang Zhang, and Suhang Wang. GraphSMOTE: Imbalanced Node Classification on Graphs with Graph Neural Networks. In WSDM '21, The Fourteenth ACM International Conference on Web Search and Data Mining, Virtual Event, Israel, March 8-12, 2021, pages 833–841. ACM, 2021.

[50] Dengyong Zhou and Christopher J. C. Burges. Spectral Clustering and Transductive Learning with Multiple Views. In the 24th Annual International Conference on Machine Learning, ICML 2007, volume 227, pages 1159–1166. ACM, 2007.

[51] Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, and Maosong Sun. Graph Neural Networks: A Review of Methods and Applications. AI Open, 1, 2020.