# Multi-Channel Pooling Graph Neural Networks

Jinlong Du¹, Senzhang Wang², Hao Miao¹ and Jiaqiang Zhang¹
¹Nanjing University of Aeronautics and Astronautics, Nanjing, China
²Central South University, Changsha, China
{kingloon, haomiao, zhangjq}@nuaa.edu.cn, szwang@csu.edu.cn

## Abstract

Graph pooling is a critical operation to downsample a graph in graph neural networks. Existing coarsening pooling methods (e.g., DiffPool) mostly focus on capturing the global topology structure by assigning the nodes to several coarse clusters, while dropping pooling methods (e.g., SAGPool) try to preserve the local topology structure by selecting the top-k representative nodes. However, an effective method that integrates the two types of methods so that both the local and the global topology structure of a graph are well captured is still lacking. To address this issue, we propose a Multi-channel Graph Pooling method named MuchPool, which captures the local structure, the global structure and node features simultaneously in graph pooling. Specifically, we use two channels to conduct dropping pooling based on the local topology and node features respectively, and one channel to conduct coarsening pooling. Then a cross-channel convolution operation is designed to refine the graph representations of different channels. Finally, the pooling results are aggregated as the final pooled graph. Extensive experiments on six benchmark datasets demonstrate the superior performance of MuchPool. The code of this work is publicly available on GitHub¹.

¹https://github.com/kingloon/Multi Channel Pooling

## 1 Introduction

Convolutional Neural Networks (CNNs) [Montavon et al., 2012] have shown great capability in various machine learning tasks, such as image recognition [He et al., 2016] and video classification [Karpathy et al., 2014]. As images and texts are Euclidean data with regular grid-like structures, convolution and pooling operations can be conveniently conducted on them. Given the powerful capability of CNNs on grid-like data, it is appealing to generalize convolution and pooling operations to graph-structured data [Gori et al., 2005]. However, significantly different from grid-like data, the locality of nodes cannot be clearly defined in a graph, making the generalization of convolution and pooling operations to graphs extremely challenging.

In recent years, there has been a myriad of attempts to extend convolution operations to arbitrarily shaped graphs, referred to as Graph Neural Networks (GNNs). The application of various GNNs to node representation learning [Veličković et al., 2018] has obtained outstanding performance in graph-related learning tasks, such as node classification [Veličković et al., 2018] and link prediction [Schlichtkrull et al., 2018]. However, in some scenarios a graph-level representation is also needed to perform graph-level machine learning tasks, such as predicting the physical property of a given molecule [Wang et al., 2019]. Although considerable efforts have been made in studying the convolution operation in GNNs, one essential problem in graph representation learning, how to effectively perform the pooling operation to downsample graphs, is not fully studied and remains a challenging research problem.
Early graph representation learning models generally utilize simple readout functions (such as mean pooling and max pooling) [Henaff et al., 2015] to summarize all the node representations into a graph representation. Such simple readout methods are inherently flat [Zhang et al., 2019], as the graph topology information is largely ignored. To capture the global topology structure of a graph, coarsening pooling methods, such as DiffPool [Ying et al., 2018], utilize differentiable pooling operators to learn soft assignment matrices that map each node into a set of clusters. Then the nodes of a cluster are pooled into one node to form a coarsened graph. Another line of research proposes dropping pooling methods, including Graph U-Nets [Gao and Ji, 2019], AttPool [Huang et al., 2019] and SAGPool [Lee et al., 2019], to better capture the local topology structure. The general idea behind these approaches is to first select the top-k important nodes of a graph and then drop all the other nodes to downsample the graph. However, an effective method that integrates the two types of methods so that both the local and the global topology structure of a graph are well captured is still lacking.

To address the limitations of existing graph pooling methods, we propose a novel multi-channel graph pooling method named MuchPool, which can effectively integrate the global topology structure, the local topology structure and the node features in graph pooling via three channels. To be specific, we use two channels to conduct dropping pooling based on the local structure and node features respectively, and one channel to conduct coarsening pooling based on the global topology structure. Then a cross-channel graph convolution operation is proposed to integrate and refine the node representations of the three channels. Finally, the pooling results of different channels are aggregated to form the final pooled graph. To utilize MuchPool in the task of graph classification, we implement an end-to-end architecture by stacking graph convolution layers and MuchPool layers to learn a hierarchical representation of a graph. We conduct extensive experiments over six widely used graph datasets in GNNs. The results show that our proposal achieves significant performance improvements on graph classification compared with current state-of-the-art models. We summarize our main contributions as follows.

- We for the first time propose a novel multi-channel graph pooling method that can capture and integrate local patches, coarse-grained structure and node features simultaneously.
- We design a cross-channel graph convolution operation to refine the node representations, and propose a pooling aggregation operation to integrate the pooling results of different channels.
- We conduct extensive experiments on multiple public datasets to demonstrate MuchPool's capability of learning effective graph representations by comparing it with several state-of-the-art methods.

## 2 Related Work

### 2.1 Graph Neural Networks

GNNs can be generally divided into two categories: spectral and spatial approaches. Based on spectral graph theory, Bruna et al. [Bruna et al., 2014] first defined the convolution operation in the Fourier transform domain. When utilizing spectral filters, the graph Laplacian has to be computed, which incurs a heavy computation cost, so this method is hard to generalize to large graphs.
To address this challenge, Kipf and Welling [Kipf and Welling, 2017] proposed a simplified model that utilizes the first-order approximation of the Chebyshev expansion. The spatial approaches define convolution by aggregating the information of a central node and its neighbors, aiming at working on graphs directly. Among them, GraphSAGE [Hamilton et al., 2017] generates node embeddings by aggregating information from sampled neighboring nodes. GAT [Veličković et al., 2018] integrates an attention mechanism into the process of information aggregation. Both spectral and spatial methods can fit within the framework of neural message passing [Gilmer et al., 2017]. Under this framework, a GNN is regarded as a message-passing algorithm where node representations are computed iteratively by aggregating messages from neighboring nodes through edges using a differentiable function.

### 2.2 Graph Pooling

Pooling operations can downsize inputs, thus reducing the number of parameters and enlarging receptive fields, leading to better generalization performance. Recent graph pooling methods can be grouped into two big branches: global pooling and hierarchical pooling. Global graph pooling, also known as a graph readout operation [Xu et al., 2019; Lee et al., 2019], adopts a summation operation or neural networks to integrate all the node embeddings into a vector as the graph representation. Set2Set [Vinyals et al., 2015] uses an LSTM model to find the important nodes in a graph and generate the graph representation. DGCNN [Zhang et al., 2018] sorts node embeddings and then summarizes the graph by concatenating some of them. Though flexible to apply to graphs of different sizes, global graph pooling methods perform pooling only based on node features, and thus some structural information can be lost.

Hierarchical graph pooling methods aim at learning a hierarchical representation via building hierarchical GNNs. DiffPool [Ying et al., 2018] uses graph neural networks to learn a soft assignment matrix mapping nodes to a set of clusters, so it is computationally expensive. Graph U-Nets [Gao and Ji, 2019], SAGPool [Lee et al., 2019] and AttPool [Huang et al., 2019] design top-k node selection strategies to pick out the most important nodes and form a pooled graph with them. Such methods are more efficient as they only need to calculate an importance score for each node, but they neglect the graph topology during this procedure. EdgePool [Diehl, 2019] obtains a pooled graph by contracting the edges in a graph. It is inflexible since it can only cut the number of nodes in half each time. EigenPool [Ma et al., 2019] performs the pooling operation based on the graph Fourier transform. Because it relies on spectral clustering, it is very time-consuming. To sum up, there is still no effective method that combines the top-k based and graph coarsening based methods to yield a better hierarchical graph representation.

## 3 Problem Statement

In this paper, we focus on the graph classification task, which maps each graph to one of a set of labels. Consider an arbitrary graph $G = (A, X)$, where $A \in \{0, 1\}^{n \times n}$ denotes the adjacency matrix and $X \in \mathbb{R}^{n \times d}$ denotes the feature matrix, in which each row represents the $d$-dimensional feature vector of a node. Given a dataset $\mathcal{D} = \{(G_1, y_1), (G_2, y_2), \ldots\}$, the task of graph classification is to learn a mapping function $f: \mathcal{G} \rightarrow \mathcal{Y}$, where $\mathcal{G} = \{G_1, G_2, \ldots, G_n\}$ is the set of input graphs and $\mathcal{Y} = \{y_1, y_2, \ldots, y_n\}$ is the set of labels associated with the graphs.
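To make this notation concrete, the sketch below shows one plausible in-memory representation of a graph $G = (A, X)$ and a labeled dataset $\mathcal{D}$ in PyTorch. The `Graph` dataclass and the `random_graph` helper are illustrative assumptions for this setting, not part of the authors' code.

```python
import torch
from dataclasses import dataclass

@dataclass
class Graph:
    A: torch.Tensor  # dense adjacency matrix, shape (n, n), entries in {0, 1}
    X: torch.Tensor  # node feature matrix, shape (n, d)

def random_graph(n: int, d: int) -> Graph:
    """Build a toy undirected graph with random features (for illustration only)."""
    A = torch.randint(0, 2, (n, n))
    A = ((A + A.t()) > 0).float()   # symmetrize so the adjacency is undirected
    A.fill_diagonal_(0)             # no self-loops in the raw adjacency
    return Graph(A=A, X=torch.randn(n, d))

# A toy labeled dataset D = {(G_1, y_1), (G_2, y_2), ...}; y_i is an integer class label.
dataset = [(random_graph(n=6, d=8), 0), (random_graph(n=9, d=8), 1)]
```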
### 3.1 Graph Convolution Networks

The Graph Convolution Network (GCN) [Kipf and Welling, 2017] is an effective tool to extract discriminative features from graphs and achieves state-of-the-art performance in many machine learning tasks. We use GCN as a module of our graph classification model, which is defined as:

$$X^{(l+1)} = \sigma\left(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} X^{(l)} W^{(l)}\right) \qquad (1)$$

where $\hat{A} = A + I$ is the adjacency matrix with a self-loop added to each node, $\hat{D}$ with $\hat{D}_{ii} = \sum_j \hat{A}_{ij}$ is the degree matrix of $\hat{A}$, and $W^{(l)}$ is a learnable matrix at layer $l$. The initial node features are used in the first graph convolution, i.e., $X^{(0)} = X$.

*Figure 1: The framework of the proposed MuchPool model. Channel 1: local topology based pooling; Channel 2: global topology based pooling; Channel 3: feature based pooling. Step 1: multi-channel pooling; Step 2: cross-channel convolution; Step 3: pooling aggregation.*

## 4 MuchPool: Multi-Channel Graph Pooling

The main idea of MuchPool is to use three channels to perform graph pooling so as to capture different characteristics of a graph, and then aggregate the pooling results of the three channels. The original graph is first fed into three GCN layers to learn node embeddings. Based on this intermediate result, we generate two fine-grained pooled graphs $G_{fine1}$ and $G_{fine2}$, and a coarse-grained graph $G_{coarse}$, as shown in Step 1 of Figure 1. At Step 2, we conduct cross-channel convolution between channel 2 and channels 1 and 3, respectively. At Step 3, we aggregate the outputs of channel 1 and channel 3 to form the final pooled graph $G_{pool}$.

### 4.1 Multi-Channel Pooling

**Channel 1: Local topology based pooling.** In channel 1, we propose to use a local topology based mask method to generate a fine-grained pooled graph $G_{fine1}$. Specifically, we use the degree measurement to rank the importance of the nodes. In other words, the larger the degree of a node, the more important it is to the entire graph. For example, the nodes with high degrees in road networks may represent some central or important areas, and these nodes can better represent the outline of the whole road network [Song et al., 2020; Wang et al., 2020]. The following equations are applied to get the importance score of each node:

$$s_i = \mathrm{sum}(A[i, :]) \qquad (2)$$
$$Idx_1 = \mathrm{rank}(\mathbf{s}_1, k) \qquad (3)$$

where $s_i$ denotes the degree of the $i$-th node of the original graph, $\mathbf{s}_1 = [s_1, s_2, \ldots, s_n]$, $A$ denotes the adjacency matrix of the graph, and $k$ is the number of nodes to be reserved in the fine-grained pooled graph. The $\mathrm{rank}(\mathbf{s}, k)$ function sorts nodes based on their degrees and returns the index set of the $k$ largest values in $\mathbf{s}$.

**Channel 2: Global topology based pooling.** In channel 2, we aim at forming a coarse-grained graph $G_{coarse}$. Many graphs have rich hierarchical structures [Ma et al., 2018], and capturing this kind of structural information is significant for downstream graph classification tasks. Motivated by DiffPool [Ying et al., 2018], we adopt a graph clustering method that uses a GNN to learn a cluster assignment matrix for generating $G_{coarse}$. The assignment matrix is generated by the following equation:

$$S = \mathrm{softmax}\left(\mathrm{GNN}_{pool}(A, X)\right) \in \mathbb{R}^{n \times c} \qquad (4)$$

where $n$ is the number of nodes in the input graph, $c$ is the predefined number of clusters, $\mathrm{GNN}_{pool}$ is the GNN used to cluster nodes, and $X$ and $A$ are the feature and adjacency matrices of the input graph, respectively.
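The following PyTorch sketch illustrates Eqs. (1)-(4): one dense GCN propagation step, the degree-based top-k selection of channel 1, and the soft cluster assignment of channel 2. It is a minimal sketch that assumes a single GCN layer stands in for $\mathrm{GNN}_{pool}$ and uses ReLU for $\sigma$; the function names are ours, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def gcn_layer(A, X, W):
    """One GCN propagation step, Eq. (1): sigma(D^-1/2 (A+I) D^-1/2 X W)."""
    A_hat = A + torch.eye(A.size(0))
    d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
    A_norm = d_inv_sqrt.unsqueeze(1) * A_hat * d_inv_sqrt.unsqueeze(0)
    return F.relu(A_norm @ X @ W)

def channel1_select(A, k):
    """Channel 1: degree scores (Eq. 2) and indices of the k largest degrees (Eq. 3)."""
    s = A.sum(dim=1)                   # s_i = sum(A[i, :])
    return torch.topk(s, k).indices    # Idx_1 = rank(s, k)

def channel2_assignment(A, X, W_pool):
    """Channel 2: soft cluster assignment S = softmax(GNN_pool(A, X)), Eq. (4)."""
    H = gcn_layer(A, X, W_pool)        # GNN_pool approximated here by one GCN layer
    return torch.softmax(H, dim=-1)    # row-wise: each node's probabilities over c clusters
```

In the full model, these selections would be applied to the node embeddings produced by the initial GCN layers rather than to the raw features, as described at the beginning of Section 4.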
Based on the assignment matrix, we can obtain $G_{coarse}$ by the following formulas:

$$X_{coarse} = S^{T} Z \in \mathbb{R}^{c \times d} \qquad (5)$$
$$A_{coarse} = S^{T} A S \in \mathbb{R}^{c \times c} \qquad (6)$$
$$Z = \mathrm{GNN}_{embed}(A, X) \in \mathbb{R}^{n \times d} \qquad (7)$$

where $Z \in \mathbb{R}^{n \times d}$ denotes the node embeddings learned from the input graph, and $X_{coarse}$ and $A_{coarse}$ denote the feature and adjacency matrices of $G_{coarse}$. $S$ and $Z$ are both obtained by GNN modules, and the softmax function is applied in a row-wise manner to determine the assignment probabilities of each node of the input graph over the clusters.

*Figure 2: Hierarchical graph representation learning framework with the MuchPool model for graph classification.*

**Channel 3: Node feature based pooling.** In channel 3, we select the most important nodes based on their feature embeddings. To be specific, we first use a readout module to get a representation of the current graph, and then transform it with a learnable matrix. The process can be formulated as:

$$Z = \mathrm{readout}(X^{(l)}), \quad Z \in \mathbb{R}^{1 \times d} \qquad (8)$$
$$\tilde{Z} = \sigma(Z W), \quad \tilde{Z} \in \mathbb{R}^{1 \times d} \qquad (9)$$

where $Z$ is the shallow graph representation obtained with the mean readout operation, $\tilde{Z}$ is its variant after a nonlinear transformation, and $W \in \mathbb{R}^{d \times d}$ is a learnable parameter matrix. To select the top-k important nodes in the current graph based on features, we calculate an importance score for each node by applying the following formula:

$$s_i = X[i] \cdot \tilde{Z} \qquad (10)$$

where $i$ is the index of node $v_i$, $s_i$ denotes its importance score, $\mathbf{s}_2 = [s_1, s_2, \ldots, s_n]$, and $\cdot$ denotes the inner product. With this score, a fraction of the most important nodes from the perspective of features can be selected, whose indices are obtained with the same $\mathrm{rank}(\cdot, k)$ function:

$$Idx_2 = \mathrm{rank}(\mathbf{s}_2, k) \qquad (11)$$

### 4.2 Cross-Channel Convolution

As shown in Step 2 of Figure 1, with the outputs of channels 1 to 3, we get three pooled graphs $G_{fine1}$, $G_{fine2}$ and $G_{coarse}$. Since the three pooled graphs reflect different aspects of the initial input graph, we next need to fuse them. To this end, we propose a cross-channel convolution operation between $G_{fine1}$ and $G_{coarse}$, and between $G_{fine2}$ and $G_{coarse}$. The cross-channel convolution operation is defined as follows:

$$H_{fine} = \sigma\left(\left[H_{fine} + A_{cross} H_{coarse}\right] W\right) \qquad (12)$$

where $H_{fine} \in \mathbb{R}^{k \times d}$ denotes the node embedding matrix of the fine-grained graph $G_{fine1}$ or $G_{fine2}$, $H_{coarse}$ denotes the node embedding matrix of $G_{coarse}$, and $A_{cross} \in \mathbb{R}^{k \times c}$ denotes the connection relationship between the coarse- and fine-grained pooled graphs, where $k$ and $c$ are the numbers of nodes in the two pooled graphs, respectively. In particular, we obtain $A_{cross}$ by utilizing the assignment matrix $S$ and the important node indices via the following equation:

$$A_{cross}[i] = S[i], \quad i \in Idx_1 \text{ or } i \in Idx_2 \qquad (13)$$

GNNs within the message-passing scheme can be expressed as:

$$H^{k} = \sigma\left(\left[H^{k-1} + A H^{k-1}\right] W^{k}\right) \in \mathbb{R}^{n \times d} \qquad (14)$$

where $H^{k}$ denotes the hidden node embeddings after $k$ steps of graph convolution, $A \in \{0, 1\}^{n \times n}$ denotes the adjacency matrix, and $W^{k} \in \mathbb{R}^{d \times d}$ denotes a learnable weight matrix; all of these operate on the same graph. Compared with Eq. (14), Eq. (12) replaces the adjacency matrix and the neighbor embedding matrix of the fine-grained graph with $A_{cross}$ and $H_{coarse}$, respectively, so that the nodes of the fine-grained graphs can aggregate information from the clusters of the coarse-grained graph.
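Continuing the earlier sketch, the snippet below walks through Eqs. (5)-(13): coarsening with the assignment matrix, the feature-based node scores of channel 3, and the cross-channel convolution that lets fine-grained nodes aggregate information from the coarse clusters. As before, the function names and the use of ReLU for $\sigma$ are illustrative assumptions rather than the paper's released code.

```python
import torch
import torch.nn.functional as F

def coarsen(A, Z, S):
    """Channel 2, Eqs. (5)-(6): pool node embeddings Z and adjacency A into c clusters."""
    X_coarse = S.t() @ Z       # shape (c, d)
    A_coarse = S.t() @ A @ S   # shape (c, c)
    return X_coarse, A_coarse

def channel3_select(X, W, k):
    """Channel 3, Eqs. (8)-(11): score nodes against a transformed mean readout."""
    Z = X.mean(dim=0, keepdim=True)    # mean readout, shape (1, d)
    Z_tilde = F.relu(Z @ W)            # nonlinear transformation, shape (1, d)
    s = (X * Z_tilde).sum(dim=1)       # s_i = <X[i], Z_tilde>
    return torch.topk(s, k).indices    # Idx_2

def cross_channel_conv(H_fine, H_coarse, S, idx, W):
    """Eqs. (12)-(13): refine fine-grained embeddings with messages from the coarse graph."""
    A_cross = S[idx]                                  # rows of S for the k kept nodes, shape (k, c)
    return F.relu((H_fine + A_cross @ H_coarse) @ W)  # Eq. (12)
```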
### 4.3 Pooling Aggregation

With the above operations, we have two refined pooled graphs from channels 1 and 3, as shown in the right part of Figure 1. To reduce the loss of discriminative information for graph classification, we need to aggregate them. We denote the indices of the selected nodes in channel 1 as $Idx_1$ and the embedding matrix after the cross-channel convolution step as $X_1$. Likewise, the indices of the selected nodes and the embedding matrix in channel 3 are $Idx_2$ and $X_2$, respectively. The aggregation of the pooled graphs is performed by the following formulations:

$$Idx = Idx_1 \cup Idx_2 \qquad (15)$$

With this index set, a subgraph that consists of the most representative nodes of the original graph can be extracted, whose adjacency matrix is denoted as:

$$\hat{A} = A[Idx, :] \in \{0, 1\}^{K \times N} \qquad (16)$$

where $K = |Idx|$ is the size of the index set of the nodes in the original graph that will be reserved, and $N$ is the total number of nodes in the original graph. Then, we apply the following two formulas to generate the final pooled graph:

$$A_p = \hat{A} A \hat{A}^{T} \qquad (17)$$

$$X_p[i, :] = \begin{cases} X_1[i, :] & \text{if } i \in Idx \setminus Idx_2 \\ \dfrac{X_1[i, :] + X_2[i, :]}{2} & \text{if } i \in Idx_1 \cap Idx_2 \\ X_2[i, :] & \text{if } i \in Idx \setminus Idx_1 \end{cases} \qquad (18)$$

where $X_p \in \mathbb{R}^{K \times d}$ is the node feature matrix of the aggregated pooled graph, and $A_p \in \{0, 1\}^{K \times K}$ is its adjacency matrix accordingly.
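A rough sketch of this aggregation step, under the same dense-tensor assumptions as before. It treats $X_1$ and $X_2$ as $(n \times d)$ matrices whose rows are addressed by original node indices, which is an assumption made here so that Eq. (18) can be indexed directly; `torch.isin` requires PyTorch 1.10 or later.

```python
import torch

def pooling_aggregation(A, idx1, idx2, X1_full, X2_full):
    """Eqs. (15)-(18): union the selected indices, extract the subgraph adjacency,
    and merge the refined embeddings of channels 1 and 3 node by node."""
    idx = torch.unique(torch.cat([idx1, idx2]))   # Idx = Idx_1 ∪ Idx_2        (Eq. 15)
    A_hat = A[idx, :]                             # shape (K, N)               (Eq. 16)
    A_p = A_hat @ A @ A_hat.t()                   # pooled adjacency, (K, K)   (Eq. 17)

    in1 = torch.isin(idx, idx1)                   # nodes selected by channel 1
    in2 = torch.isin(idx, idx2)                   # nodes selected by channel 3
    X_p = torch.zeros(idx.numel(), X1_full.size(1))
    X_p[in1 & ~in2] = X1_full[idx[in1 & ~in2]]                                  # channel 1 only
    X_p[in1 & in2] = 0.5 * (X1_full[idx[in1 & in2]] + X2_full[idx[in1 & in2]])  # both channels
    X_p[~in1 & in2] = X2_full[idx[~in1 & in2]]                                  # channel 3 only
    return A_p, X_p
```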
### 4.4 Hierarchical Pooling Framework with MuchPool

To use MuchPool in the graph classification task, we implement an end-to-end trainable model by stacking several GCN layers with MuchPool modules inserted. The model framework is illustrated in Figure 2. Concretely, we view a GCN layer followed by a MuchPool layer as a complete functional unit and name it a MuchPool-GCN layer for convenience. A MuchPool-GCN layer takes a graph as input and outputs a pooled graph represented by a new feature matrix and adjacency matrix. The pooled graph is then fed into the next MuchPool-GCN layer and, at the same time, into a readout module, in which the node embeddings are summed up as the graph embedding of this layer. Finally, the graph embeddings of all layers are added up to generate the final graph representation, which is taken as the input of an MLP classifier to predict the label of the original graph.

| Datasets | $G_{avg}$ | $C_{avg}$ | $V_{avg}$ | $E_{avg}$ |
|---|---|---|---|---|
| PROTEINS | 344 | 2 | 14.29 | 715.66 |
| ENZYMES | 600 | 6 | 32.63 | 62.14 |
| D&D | 1178 | 2 | 284.32 | 14.69 |
| NCI1 | 4110 | 2 | 29.87 | 32.30 |
| NCI109 | 4127 | 2 | 29.68 | 32.13 |
| COLLAB | 5000 | 3 | 74.49 | 2457.78 |

*Table 1: Statistics of the datasets. $G_{avg}$, $C_{avg}$, $V_{avg}$ and $E_{avg}$ denote the number of graphs, the number of classes, and the average numbers of nodes and edges, respectively.*

## 5 Experiments

### 5.1 Datasets and Baselines

We use the following 6 widely used datasets for classification tasks to evaluate the performance of our proposed model. D&D and PROTEINS [Dobson and Doig, 2003] are two protein graph datasets, where nodes represent amino acids; the label indicates whether or not a protein is a non-enzyme. NCI1 and NCI109 [Shervashidze et al., 2011] are two biological datasets for anticancer activity classification, where each graph is a compound graph with nodes and edges representing atoms and chemical bonds, respectively. ENZYMES is a dataset of protein tertiary structures, and each enzyme belongs to one of the 6 EC top-level classes. COLLAB [Leskovec et al., 2005] is a scientific collaboration dataset, where nodes represent scientists and edges represent collaborations between two scientists; each graph is labeled with the physics field that the researcher belongs to. Statistics of the datasets are shown in Table 1.

We take three kinds of methods as baselines: (1) kernel-based methods, including the WL subtree kernel [Morris et al., 2019]; (2) GNN-based methods, including GCN [Kipf and Welling, 2017], GIN [Xu et al., 2019], GAT [Veličković et al., 2018], GraphSAGE [Hamilton et al., 2017], Set2Set [Vinyals et al., 2015] and DGCNN [Zhang et al., 2018]; (3) hierarchical graph pooling methods, including DiffPool [Ying et al., 2018], Graph U-Nets [Gao and Ji, 2019] and AttPool [Huang et al., 2019].

### 5.2 Implementation and Experiment Settings

We implement our MuchPool model with the PyTorch framework. The cluster formation part of channel 2 follows the implementation of DiffPool [Ying et al., 2018], with some adjustments according to our problem setting. The dimensions of the node representation and the graph representation are set to 64. The node retention ratio r is set to 0.5 for the three channels in all layers. We adopt the mean pooling function to read out the graph representation. GCN is used as our backbone network. The first MuchPool-GCN layer uses 2 GCN layers, while the subsequent MuchPool-GCN layers use only 1 GCN layer for aggregating information. An MLP consisting of two fully connected layers with 128 neurons follows the final MuchPool-GCN layer, followed by a softmax classifier; it takes the graph representation as input and outputs the category probabilities to finish the prediction task. We follow the experiment setting of the state-of-the-art model Graph U-Nets [Gao and Ji, 2019] and evaluate our model over 20 random seeds using 10-fold cross validation. A total of 200 testing results are used to report the average accuracy and standard deviation for each model on each dataset. We use the Xavier normal distribution [Glorot and Bengio, 2010] for weight initialization, the Adam optimizer for training, and the negative log-likelihood loss function as the training objective. For all datasets, we train our model for 300 epochs and the batch size is set to 16 or 32 (depending on the graph size). The optimal hyperparameters are obtained by grid search.

### 5.3 Performance Comparison

We compare the performance of our proposed MuchPool with the baseline methods on the 6 benchmark datasets for the graph classification task; the accuracy and standard deviation are reported in Table 2. First, one can observe that the performance of MuchPool is superior to its counterparts on five out of six benchmarks. To be specific, MuchPool improves the performance by 3.72% and 5.66% over the best baselines DiffPool and AttPool on the ENZYMES and PROTEINS datasets respectively, which demonstrates the effectiveness of MuchPool. Next, MuchPool consistently and significantly outperforms GCN on all the datasets, indicating the necessity of adding pooling modules in the learning process. This is because MuchPool can extract more useful graph topology information than the global-pooling-based GCN. Compared with other GNN-based models including GCN, GAT, GIN and GraphSAGE, both DiffPool and MuchPool achieve better performance. This reveals that capturing coarse-grained structure is helpful for graph representation learning, especially for graphs with obvious community structure. One can also see that DiffPool does not always show better results than the local topology and feature based pooling methods AttPool and Graph U-Nets. This further verifies the effectiveness of combining the node features, local topology and global topology.
| Methods | ENZYMES | D&D | PROTEINS | NCI1 | NCI109 | COLLAB |
|---|---|---|---|---|---|---|
| WL | 40.97 ± 4.05 | 73.64 ± 3.84 | 75.75 ± 4.57 | 75.21 ± 1.64 | 72.40 ± 1.86 | 78.90 ± 1.90 |
| GCN | 50.00 ± 5.87 | 75.13 ± 4.14 | 74.75 ± 4.63 | 79.68 ± 2.05 | 78.05 ± 1.59 | 71.92 ± 3.24 |
| GAT | 51.00 ± 5.23 | 72.65 ± 3.18 | 77.37 ± 2.95 | 79.88 ± 0.88 | 79.93 ± 1.52 | 75.80 ± 1.60 |
| GIN | 31.11 ± 1.92 | 65.94 ± 1.87 | 68.17 ± 2.39 | 57.49 ± 0.73 | 56.62 ± 0.61 | **80.20 ± 1.90** |
| GraphSAGE | 53.33 ± 3.42 | 77.48 ± 3.20 | 76.73 ± 3.00 | 78.98 ± 1.84 | 77.27 ± 1.66 | 79.70 ± 1.70 |
| Set2Set | 38.00 ± 7.81 | 64.00 ± 6.82 | 70.26 ± 4.06 | 68.95 ± 2.51 | 66.37 ± 6.18 | 65.34 ± 6.44 |
| DGCNN | 49.67 ± 4.27 | 77.50 ± 2.87 | 78.08 ± 3.38 | 75.72 ± 1.77 | 74.02 ± 2.08 | 73.82 ± 1.74 |
| Graph U-Nets | 40.50 ± 3.88 | 82.51 ± 3.17 | 78.26 ± 4.52 | 69.73 ± 1.91 | 70.25 ± 2.09 | 69.94 ± 1.54 |
| AttPool | 50.17 ± 4.04 | 81.07 ± 3.21 | 79.52 ± 3.52 | 81.17 ± 2.24 | 80.23 ± 0.72 | 77.08 ± 2.26 |
| DiffPool | 62.83 ± 7.07 | 80.99 ± 2.98 | 77.62 ± 4.97 | 80.36 ± 1.56 | 78.51 ± 1.20 | 73.94 ± 3.28 |
| MuchPool | **65.17 ± 3.98** | **85.06 ± 3.34** | **84.01 ± 1.73** | **81.29 ± 1.31** | **80.50 ± 1.48** | 74.58 ± 2.63 |

*Table 2: Mean accuracy (10 folds) and standard deviation on the 6 graph classification datasets. We use bold to highlight the best result.*

Moreover, compared with AttPool, MuchPool scores only moderately on the COLLAB dataset. This is because many collaboration graphs in COLLAB show only single-layer community structures, whereas MuchPool uses a three-layer model to learn the graph representation, which is overly complex for such graphs and may lead to overfitting.

### 5.4 Ablation Study

In this subsection, we conduct an ablation study to verify that all three channels are helpful for graph pooling. We compare the complete MuchPool model with its variants obtained by removing one of the three channels while keeping the other parts the same. For convenience, we name the MuchPool model without the local topology based, global topology based and feature based pooling channel as MuchPool-NL, MuchPool-NG and MuchPool-NF, respectively. As shown in Figure 3, the performances of the three variants are all inferior to MuchPool on the four datasets ENZYMES, PROTEINS, NCI1 and NCI109. This demonstrates the effectiveness of the three channels. In particular, MuchPool outperforms MuchPool-NG by 36.64%, 4.63%, 5.08% and 3.90% on the four datasets, respectively. This result suggests that global topology is significantly important in some graphs (for instance, protein molecule graphs), and capturing the global structure is especially useful for the classifier to distinguish graphs with fixed functional units. One can also see that the results of MuchPool-NL and MuchPool-NF are close to each other. This is probably because some nodes are important in terms of both local topology and node features, which leads to a large overlap between the nodes selected by channels 1 and 3.

*Figure 3: Comparison between MuchPool and its variations.*

### 5.5 Parameter Sensitivity Analysis

We further investigate the effects of some important hyperparameters on MuchPool. In detail, we study how the number of neural network layers k and the graph representation dimension d affect the graph classification performance. As shown in Figure 4, MuchPool achieves better performance when setting d = 64 and k = 2, respectively. One can also observe that as the dimension d increases, the classification accuracy shows a slight upward trend. This is because different types of graphs have larger margins in a higher-dimensional representation space, making it easier to distinguish their categories.

*Figure 4: Classification accuracy curves on the ENZYMES and PROTEINS datasets with different values of d and k.*
Considering the number of neural network layers k, a larger k provides more accurate classification results, but too large a k will hurt the performance.

## 6 Conclusion

In this paper, we propose a novel multi-channel graph pooling method named MuchPool. MuchPool uses three channels to capture the local topology, the global topology and the node features separately, and then uses a cross-channel convolution operation to integrate them. Based on MuchPool, we construct an end-to-end hierarchical graph representation learning model for graph classification. We evaluate our proposal on several graph classification benchmark datasets. The results show that MuchPool achieves superior or comparable results compared with state-of-the-art methods.

## Acknowledgements

This work is supported by the Fundamental Research Funds for the Central Universities (No. NZ2020014) and the Guangdong Basic and Applied Basic Research Foundation (2021A1515012239).

## References

[Bruna et al., 2014] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. In Proceedings of ICLR, 2014.

[Diehl, 2019] Frederik Diehl. Edge contraction pooling for graph neural networks. arXiv preprint arXiv:1905.10990, 2019.

[Dobson and Doig, 2003] Paul D Dobson and Andrew J Doig. Distinguishing enzyme structures from non-enzymes without alignments. Journal of Molecular Biology, 330(4):771–783, 2003.

[Gao and Ji, 2019] Hongyang Gao and Shuiwang Ji. Graph U-Nets. In Proceedings of ICML, 2019.

[Gilmer et al., 2017] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In Proceedings of ICML, 2017.

[Glorot and Bengio, 2010] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of AISTATS, 2010.

[Gori et al., 2005] Marco Gori, Gabriele Monfardini, and Franco Scarselli. A new model for learning in graph domains. In Proceedings of IJCNN. IEEE, 2005.

[Hamilton et al., 2017] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Proceedings of NeurIPS, 2017.

[He et al., 2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of CVPR, 2016.

[Henaff et al., 2015] Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163, 2015.

[Huang et al., 2019] Jingjia Huang, Zhangheng Li, Nannan Li, Shan Liu, and Ge Li. AttPool: Towards hierarchical feature representation in graph convolutional networks via attention mechanism. In Proceedings of ICCV, 2019.

[Karpathy et al., 2014] Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. Large-scale video classification with convolutional neural networks. In Proceedings of CVPR, 2014.

[Kipf and Welling, 2017] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In Proceedings of ICLR, 2017.

[Lee et al., 2019] Junhyun Lee, Inyeop Lee, and Jaewoo Kang. Self-attention graph pooling. In Proceedings of ICML, 2019.

[Leskovec et al., 2005] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of SIGKDD, 2005.
[Ma et al., 2018] Jianxin Ma, Peng Cui, Xiao Wang, and Wenwu Zhu. Hierarchical taxonomy aware network embedding. In Proceedings of SIGKDD, 2018.

[Ma et al., 2019] Yao Ma, Suhang Wang, Charu C Aggarwal, and Jiliang Tang. Graph convolutional networks with eigenpooling. In Proceedings of SIGKDD, 2019.

[Montavon et al., 2012] Grégoire Montavon, Geneviève Orr, and Klaus-Robert Müller. Neural Networks: Tricks of the Trade, volume 7700. Springer, 2012.

[Morris et al., 2019] Christopher Morris, Martin Ritzert, Matthias Fey, William L Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks. In Proceedings of AAAI, 2019.

[Schlichtkrull et al., 2018] Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In European Semantic Web Conference. Springer, 2018.

[Shervashidze et al., 2011] Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M Borgwardt. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research, 12(9), 2011.

[Song et al., 2020] Chao Song, Youfang Lin, Shengnan Guo, and Huaiyu Wan. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In Proceedings of AAAI, 2020.

[Veličković et al., 2018] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In Proceedings of ICLR, 2018.

[Vinyals et al., 2015] Oriol Vinyals, Samy Bengio, and Manjunath Kudlur. Order matters: Sequence to sequence for sets. In Proceedings of ICLR, 2015.

[Wang et al., 2019] Xiaofeng Wang, Zhen Li, Mingjian Jiang, Shuang Wang, Shugang Zhang, and Zhiqiang Wei. Molecule property prediction based on spatial graph embedding. Journal of Chemical Information and Modeling, 59(9):3817–3828, 2019.

[Wang et al., 2020] Senzhang Wang, Jiannong Cao, and Philip Yu. Deep learning for spatio-temporal data mining: A survey. IEEE TKDE, 2020.

[Xu et al., 2019] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In Proceedings of ICLR, 2019.

[Ying et al., 2018] Zhitao Ying, Jiaxuan You, Christopher Morris, Xiang Ren, Will Hamilton, and Jure Leskovec. Hierarchical graph representation learning with differentiable pooling. In Proceedings of NeurIPS, 2018.

[Zhang et al., 2018] Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen. An end-to-end deep learning architecture for graph classification. In Proceedings of AAAI, 2018.

[Zhang et al., 2019] Zhen Zhang, Jiajun Bu, Martin Ester, Jianfeng Zhang, Chengwei Yao, Zhi Yu, and Can Wang. Hierarchical graph pooling with structure learning. In Proceedings of NeurIPS, 2019.