Topological Relational Learning on Graphs

Yuzhou Chen, Department of Electrical Engineering, Princeton University, yc0774@princeton.edu
Baris Coskunuzer, Department of Mathematical Sciences, University of Texas at Dallas, coskunuz@utdallas.edu
Yulia R. Gel, Department of Mathematical Sciences, University of Texas at Dallas and National Science Foundation, ygl@utdallas.edu

Abstract

Graph neural networks (GNNs) have emerged as a powerful tool for graph classification and representation learning. However, GNNs tend to suffer from over-smoothing problems and are vulnerable to graph perturbations. To address these challenges, we propose a novel topological neural framework of topological relational inference (TRI) which allows for integrating higher-order graph information into GNNs and for systematically learning a local graph structure. The key idea is to rewire the original graph by using the persistent homology of the small neighborhoods of nodes and then to incorporate the extracted topological summaries as side information into the local algorithm. As a result, the new framework enables us to harness both the conventional information on the graph structure and information on the graph's higher-order topological properties. We derive theoretical stability guarantees for the new local topological representation and discuss their implications for the graph algebraic connectivity. The experimental results on node classification tasks demonstrate that the new TRI-GNN outperforms all 14 state-of-the-art baselines on 6 out of 7 graphs and exhibits higher robustness to perturbations, yielding up to 10% better performance under noisy scenarios.

1 Introduction

Node classification is one of the most active research areas in graph learning. The goal here is, given a single attributed graph G and a small subset of nodes with prior label information, to predict labels of all remaining unlabelled nodes. Applications of node classification are very diverse, ranging from customer attrition analytics to tracking corruption convictions among politicians. Graph neural networks (GNNs) offer a powerful machinery for addressing such problems on large heterogeneous networks, resulting in an emerging field of geometric deep learning (GDL) which adapts deep learning to non-Euclidean objects such as graphs [9, 51, 54, 16]. Although GDL achieves a highly competitive performance in various graph-based classification and prediction tasks, similarly to the image domain, deep learning on graphs is often found to be vulnerable to graph perturbations and adversarial attacks [43, 50, 26]. In turn, most recent results [42, 19] suggest that local graph information may be invaluable for robustifying GDL against graph perturbations and adversarial attacks. Also, as shown by [36, 4, 53] in conjunction with network community learning, local algorithms (i.e., algorithms based only on a small-radius neighborhood around the node) may demonstrate superior performance if coupled with side information in the form of a node labeling positively correlated with the true graph structure. Inspired by these results, the ultimate goal of this paper is to introduce the idea of local algorithms with local topological side information on the similarity of node neighborhood shapes into GNNs.
By shape here, we broadly understand data characteristics which are invariant under continuous transformations such as stretching, bending, and compressing; to study such graph shape properties, we invoke the machinery of topological data analysis (TDA) [21, 11].

Topological Relational Inference: from Matchmaking to Adversarial Graph Learning and Beyond

In particular, to capture more complex graph properties and enhance model robustness, we introduce the concept of topological relational inference (TRI) and propose two novel options for information passing which rely on the local topological structure of each node: (i) inject a new topology-induced multiedge between two nodes based on shape similarity of their local neighborhoods; (ii) enrich each individual node's features by harnessing local topological side information from its neighbors.

The rationale behind our idea is multi-fold. First, we assess not only global graph topology and relationships between the feature sets of two individual nodes, as with the current diffusion mechanisms and random walks on graphs, but we also examine important interactions between the shapes of the feature sets of the node neighborhoods. As a result, our new topology-induced multigraph representation (TIMR) of graph G in (i) strengthens the relationship among nodes which might not (yet) be connected by a visible (or "tangible") edge in G but whose higher-order intrinsic characteristics are very similar. An intuitive analogy is a matchmaking agency that aims to connect two people on a blind date based on a careful assessment of the interplay among similarities of their interests and socio-demographics, as well as those of their close friends, rather than bringing two individuals together through a random walk among all available profiles of potential candidates. Another example is customer churn and retention analytics based on peer effects [15]: the goal there is to classify a node as a potential churn customer based not only on its individual attributes but also on interactions among its neighbors' attributes. Second, such a new multiedge in (i), based on shape similarity of node local neighborhoods, may assist not only the matchmaking agency in coupling the most appropriate individuals (or link prediction, in less romantic and more technical terms), but also enhance the resilience of the graph structure and the associated graph learning. Indeed, a new multiedge in (i) can serve as an essential remedial ingredient in graph defense mechanisms, i.e., as a criterion to clean the perturbed graph by reconstructing edges removed by the attacker or suppressing edges induced by the attacker, depending on the strength of the topology-induced link. Third, by learning shape properties within each local node neighborhood, our TRI allows for systematic recovery of intrinsic higher-order interactions among nodes well beyond the level of pairwise connectivity, and such local topological side information in (ii) enhances accuracy in node classification tasks even on clean graphs.

The significance of our contributions is as follows:
- We propose a novel perspective on graph learning with GNNs, topological relational inference, based on the idea of similarity among shapes of local node neighborhoods.
- We develop a new topology-induced multigraph representation of graphs which systematically accounts for the key local information on attributed graphs and serves as an important remedial ingredient against graph perturbations and attacks.
- We derive theoretical stability guarantees for the new local topological representation of the graph and discuss their implications for the graph algebraic connectivity.
- Our extensive node classification experiments show that TRI-GNN outperforms 14 state-of-the-art baselines on 6 out of 7 graphs and delivers substantially higher robustness (i.e., up to 10% in performance gains under noisy scenarios) than baselines on all 7 datasets.

2 Related Work

Graph Neural Networks. Inspired by the success of the convolution mechanism on image-based tasks, GNNs have attracted increasing attention over the last few years. Building on spectral graph theory, [10] introduced a graph-based convolution in the Fourier domain. However, the complexity of this model is very high since all Laplacian eigenvectors are needed. To tackle this problem, ChebNet [18] integrated spectral graph convolution with Chebyshev polynomials. Then, the Graph Convolutional Networks (GCNs) of [30] simplified the graph convolution with a localized first-order approximation. More recently, various approaches have been proposed that accumulate graph information from a wider neighborhood, using diffusion aggregation and random walks. Such higher-order methods include approximate personalized propagation of neural predictions (APPNP) [31], higher-order graph convolutional architectures (MixHop) [3], multi-scale graph convolution (N-GCN) [2], and Lévy Flights Graph Convolutional Networks (LFGCN) [14]. In addition to random walks, other recent approaches include GNNs on directed graphs (MotifNet) [35], graph convolutional networks with attention mechanisms (GAT, SPAGAN) [48, 52], and graph Markov neural networks (GMNN) [39]. Most recently, Liu et al. [34] consider utilizing information on node neighbors' features in GNNs, proposing the Deep Adaptive Graph Neural Network (DAGNN). However, DAGNN does not account for the important information on the shapes of node neighborhoods.

Persistent Homology for Graph Learning. The machinery of TDA and persistent homology (PH) is increasingly used in conjunction with graph classification, that is, when the goal is to predict a label for the entire graph rather than for individual nodes. Such tools for graph classification with persistent topological signatures include kernel-based methods [46, 40, 33, 55, 32] and neural networks [24, 12]. All of the above methods consider the task of classifying graph labels and are based on assessing global graph topology, while our focus is node classification and our approach is based on evaluating local topological graph properties (i.e., shapes of individual node neighborhoods). Integration of PH into node classification is virtually unexplored. To the best of our knowledge, the closest result in this direction is PEGN-RC [56]. However, the key idea of PEGN-RC is distinctly different from our approach. PEGN-RC only reweights each existing edge, based on the topological information within its edge vicinity and, in contrast to TRI-GNN, neither compares any shapes, nor creates new edges or removes existing ones. Importantly, PEGN-RC does not integrate topology of both the graph and node attributes, while TRI-GNN does.
3 Preliminaries on Topological Data Analysis and Persistent Homology

The machinery of topological data analysis (TDA) and, particularly, persistent homology offers a mathematically rigorous and systematic framework of tools to evaluate shape properties of the observed data, that is, intrinsic data characteristics which are invariant under continuous deformations such as stretching, compressing, and bending [58, 21, 37]. The main premise is that the observed data, which can be, as in our case, a graph G or a point cloud in a Euclidean or any finite metric space, constitute a discrete sample from some unknown metric space M. Our goal is then to recover information on some essential properties of M which has been lost due to sampling. Persistent homology addresses this reconstruction task by counting occurrences of certain patterns, e.g., loops, holes, and cavities, within the shape of M. Such pattern counts and functions thereof, called topological signatures, are then used to characterize intrinsic properties of G.

The approach is implemented in two main steps. We start by associating G with a nested sequence of subgraphs $G_1 \subseteq G_2 \subseteq \ldots \subseteq G_m = G$. Then we monitor the evolution of pattern occurrences (e.g., cycles, cavities, and more generally n-dimensional holes) in this nested sequence of subgraphs. To ensure a systematic and computationally efficient manner of pattern counting, we construct a simplicial complex $C_i$ (e.g., a clique complex) induced by $G_i$. The sequence $\{G_i\}$ induces a filtration: a nested sequence of simplicial complexes $C_1 \subseteq C_2 \subseteq \ldots \subseteq C_m = C$. Now we can not only track patterns but also evaluate the lifespan of each topological feature. Let $b_\sigma$ be the index of the simplicial complex $C_{b_\sigma}$ at which we first record (i.e., birth) the n-dimensional topological feature $\sigma$ (n-cycle), and let $C_{d_\sigma}$ be the first complex at which we observe its disappearance (i.e., death). Then the lifespan, or persistence, of the topological feature $\sigma$ is $d_\sigma - b_\sigma$. To evaluate all topological features together, we consider a persistence diagram (PD), where the multi-set $\mathcal{D}_n = \{(b_\sigma, d_\sigma) \in \mathbb{R}^2 : d_\sigma > b_\sigma\} \cup \Delta$ records the births and deaths of all n-cycles in the filtration $\{C_i\}$. Here, $\Delta = \{(t, t) \mid t \in \mathbb{R}\}$ is the diagonal set, whose points are counted with infinite multiplicity. Different persistence diagrams can be compared based on the cost of the optimal matching between points of the two diagrams, while avoiding topological noise near the diagonal $\Delta$ [13].

Depending on the question at hand, we can construct different filtrations relevant to the problem. In this paper, we consider two different filtrations. First, we consider the sublevel filtration based on a node degree function $f : V \to \mathbb{N}$. As the degree is an integer-valued function, so are our thresholds $\{\alpha_i\} \subset \mathbb{N}$. Our sublevel filtration is then defined as follows. Let $V_{\alpha_i} = \{u \in V \mid f(u) \le \alpha_i\}$. Then, $G_{\alpha_i}$ is the subgraph generated by $V_{\alpha_i}$; in particular, the edge $e_{uv} \in E$ is in $G_{\alpha_i}$ if both $u$ and $v$ are in $V_{\alpha_i}$. We call this the degree-based filtration. In addition, we consider a second filtration defined by an edge-weight function on the graph, where edge weights are induced by similarity of node attributes. We call this the attribute-based filtration (see Section 4). Note that the degree-based filtration is induced only by the graph properties, while the attribute-based filtration is constructed by using the features coming from the observed data.
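To make the degree-based filtration concrete, the following is a minimal sketch (our illustration, not the authors' released implementation) of building the clique-complex filtration of a graph and reading off its persistence diagrams; it assumes the GUDHI and networkx Python libraries are available and that graph nodes are labeled by integers.

```python
import networkx as nx
import gudhi  # GUDHI's SimplexTree handles sublevel (lower-star) filtrations

def degree_filtration_diagrams(G: nx.Graph, max_dim: int = 1):
    """Persistence diagrams of the degree-based sublevel filtration of G.

    A vertex enters the filtration at its degree; an edge enters once both
    endpoints are present (i.e., at the larger of the two endpoint degrees);
    the clique complex is then expanded so that 1-cycles can also die.
    """
    deg = dict(G.degree())
    st = gudhi.SimplexTree()
    for u in G.nodes():
        st.insert([u], filtration=float(deg[u]))
    for u, v in G.edges():
        st.insert([u, v], filtration=float(max(deg[u], deg[v])))
    st.expansion(max_dim + 1)   # add cliques (e.g., triangles) up to dimension max_dim + 1
    st.persistence()            # compute all birth/death pairs
    return {n: st.persistence_intervals_in_dimension(n) for n in range(max_dim + 1)}

# Example usage on a toy graph:
# diagrams = degree_filtration_diagrams(nx.karate_club_graph())
# diagrams[0] lists (birth, death) pairs of 0-cycles (connected components).
```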
4 Topological Relational Inference Graph Neural Network (TRI-GNN)

Problem Statement. Let $G = (V, E)$ be an attributed graph with a set of nodes $V$, a set of edges $E \subseteq V \times V$, and $e_{uv} \in E$ denoting an edge between nodes $u, v \in V$. The total number of nodes in $G$ is $N = |V|$. Let $W \in \mathbb{R}^{N \times N}$ be the adjacency matrix of $G$ such that $\omega_{uv} = 1$ for $e_{uv} \in E$ and $0$ otherwise, and let $D$ be the diagonal $N \times N$ degree matrix with $D_{uu} = \sum_v \omega_{uv}$. For undirected graphs $W$ is symmetric (i.e., $W = W^\top$). To feed directed graphs into the graph neural network architecture, we set $W = (W + W^\top)/2$. Finally, each $u \in V$ is equipped with a set of node features, i.e., $X_u = (X_{u1}, X_{u2}, \ldots, X_{uF})$ represents an F-dimensional feature vector for node $u \in V$, and $X$ is the $N \times F$ matrix of all node features. Our objective is to develop a robust semi-supervised classifier such that we can predict unknown node labels in $G$, given some training set of nodes in $G$ with prior labeling information. Figure 1 illustrates our TRI-GNN model framework.

4.1 Topology-induced Multigraph Representation

The first step in our TRI model with the associated Topology-induced Multigraph Representation (TIMR) of $G$ is to define topological similarity between two node neighborhoods. The key goal here is to go beyond vanilla local optimization algorithms which capture only pairwise similarity of node features [36] and beyond triangle motifs alone as descriptors of higher-order node interactions [35, 47]. Our aim is to systematically extract all n-dimensional topological features and their persistence in each node neighborhood and then to compare node neighborhoods in terms of their exhibited shapes.

Definition 1 (Weighted k-hop Neighborhood). An induced subgraph $G^k_u = (V^k_u, E^k_u) \subseteq G$, equipped with an edge-weight function $\tau$ induced by similarity of node features in $G^k_u$, is called a weighted k-hop neighborhood of node $u \in V$ if: (1) for any $v \in V^k_u$, the shortest path between $u$ and $v$ is at most $k$; (2) the edge-weight function $\tau : V^k_u \times V^k_u \to \mathbb{R}_{\ge 0}$ is such that for any $v, w \in V^k_u$ with $e_{vw} \in E^k_u$, $\tau_{vw} = \lVert X_v - X_w \rVert$, where $\lVert \cdot \rVert$ is either the Euclidean distance (in the case of continuous node features), the Hamming distance (in the case of categorical node features), the Heterogeneous Value Difference Metric (HVDM), or another distance appropriate for mixed-type data [20]. If $\tau_{vw} \equiv 1$ for any $e_{vw} \in E^k_u$, $G^k_u$ reduces to a conventional k-hop neighborhood.

Armed with the edge-weight function $\tau$ induced by node attributes (see Definition 1) or with a node degree function, we now compute a sublevel filtration within each node neighborhood and track the lifespan of each extracted topological feature, e.g., components, loops, cavities, and n-dimensional holes (see the previous section). Here we consider two cases: the attribute-based filtration ($G^k_u$ is equipped with an edge-weight function $\tau$ based on node attributes) and the degree-based filtration ($G^k_u$ is an unweighted graph with $\tau \equiv 1$). We can then compare how topologically similar the shapes exhibited by node neighborhoods are, in terms of the Wasserstein distance between their persistence diagrams.

Definition 2 (Topological Similarity among k-hop Neighborhoods). Let $D(G^k_u)$ and $D(G^k_v)$ be persistence diagrams of the weighted k-hop neighborhood subgraphs $G^k_u$ and $G^k_v$ of nodes $u$ and $v$, respectively. We measure topological similarity between $G^k_u$ and $G^k_v$ with the Wasserstein distance between the corresponding persistence diagrams,

$d_{W_p}(G^k_u, G^k_v) = \left( \inf_{\gamma} \sum_{x \in D(G^k_u)} \lVert x - \gamma(x) \rVert^p \right)^{1/p}$,

where $p \ge 1$ and $\gamma$ is taken over all bijective maps between $D(G^k_u)$ and $D(G^k_v)$, counting their multiplicities.
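For illustration, Definitions 1 and 2 can be instantiated as follows. This is a hedged sketch (not the paper's code): it assumes networkx, GUDHI with its optimal-transport-based `gudhi.wasserstein` module (which requires the POT package), node features stored in an array `X` indexed by node id, and the Euclidean distance for continuous features; swapping the edge filtration value for the degrees of the endpoints recovers the degree-based variant.

```python
import numpy as np
import networkx as nx
import gudhi
from gudhi.wasserstein import wasserstein_distance  # optimal matching between diagrams

def attribute_diagram(G: nx.Graph, X: np.ndarray, u, k: int = 1):
    """0-dimensional persistence diagram of the attribute-based filtration of the
    weighted k-hop neighborhood G^k_u (Definition 1), with tau_vw = ||X_v - X_w||."""
    Gku = nx.ego_graph(G, u, radius=k)            # k-hop neighborhood of node u
    st = gudhi.SimplexTree()
    for v in Gku.nodes():
        st.insert([v], filtration=0.0)            # vertices are present from the start
    for v, w in Gku.edges():
        st.insert([v, w], filtration=float(np.linalg.norm(X[v] - X[w])))
    st.persistence()
    dgm = st.persistence_intervals_in_dimension(0)
    return dgm[np.isfinite(dgm[:, 1])] if len(dgm) else dgm  # drop the essential class

def topo_similarity(G, X, u, v, k: int = 1, p: float = 1.0):
    """d_{W,p}(G^k_u, G^k_v) of Definition 2 (p = 1 in the paper)."""
    return wasserstein_distance(attribute_diagram(G, X, u, k),
                                attribute_diagram(G, X, v, k),
                                order=p, internal_p=2.0)
```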
In our analysis we use $p = 1$. As noted by [36], integrating neighboring nodes whose labeling is positively correlated with the true cluster structure, as side information, into a community recovery process may lead to substantial performance gains. Inspired by these results, we not only inject topological side information into GNNs but also distinguish this side information w.r.t. its strength. We propose a new topology-induced multigraph representation of $G$, where a multiedge between nodes $u$ and $v$ comprises information on (tangible) connectivity between $u$ and $v$ (i.e., existence of $e_{uv}$) as well as on strong and weak topological similarity of the local neighborhoods of $u$ and $v$, regardless of whether $e_{uv}$ exists.

Figure 1: The TRI-GNN semi-supervised learning framework (for more details see Appendix B).

Definition 3 (Topology-induced Multigraph Representation (TIMR)). Consider a graph $G = (V, E)$, with $d_{W,p}(G^k_u, G^k_v)$ as a measure of topological similarity between two k-hop neighborhoods $G^k_u$ and $G^k_v$, for all $u, v \in V$ (see Definition 1). Then a topology-induced multigraph representation of $G$ is defined as $\Omega = (V, E^{topo})$, where $E^{topo}$ is a set of multiedges such that for any $u, v \in V$ and thresholds $\epsilon_1, \epsilon_2 \in \mathbb{R}^+$

$e^{topo}_{uv} = \{\mathbb{1}_{e_{uv} \in E},\ \mathbb{1}_{d_{W,p}(G^k_u, G^k_v) \in [0, \epsilon_1]},\ \mathbb{1}_{d_{W,p}(G^k_u, G^k_v) \in [\epsilon_2, \infty)}\}$. (1)

Note that $\{d_{W,p}(G^k_u, G^k_v) \in [0, \epsilon_1]\}$ and $\{d_{W,p}(G^k_u, G^k_v) \in [\epsilon_2, \infty)\}$ are mutually exclusive events. Hence, multiedges have multiplicity at most 2 and $E^{topo}$ is a multiset over $\{(1, 1, 0); (0, 1, 0); (1, 0, 1); (0, 0, 1)\}$.

The intuition behind TIMR is to reflect the level of shape similarity between any two node neighborhoods, regardless of whether there exists an edge between these nodes in $G$. Later, by using $\Omega$, we induce a graph $W^{topo}$ (Figure 1) defined as follows: if the neighborhoods of two nodes in $G$ are topologically similar, we add an edge between the nodes if there is none; if the neighborhoods of two nodes in $G$ are topologically dissimilar, we remove the edge between the nodes if one exists. Alternatively, we can associate TIMR with positive topology-induced and negative topology-induced adjacency matrices, $W^{topo+}$ and $W^{topo-}$, respectively,

$W^{topo+}_{uv} = \big[\, 0 \le d_{W,p}(G^k_u, G^k_v) < \epsilon_1 \,\big], \qquad W^{topo-}_{uv} = \big[\, \epsilon_2 < d_{W,p}(G^k_u, G^k_v) < \infty \,\big]$, (2)

where $[\cdot]$ is an Iverson bracket, i.e., 1 whenever the condition in the bracket is satisfied, and 0 otherwise. Selection of the hyperparameters $\epsilon_1$ and $\epsilon_2$ can be performed by assessing quantiles of the empirical distribution of shape similarities and then cross-validation. Thresholds $\epsilon_1$ and $\epsilon_2$ are used to reinforce meaningful edge structures and suppress noisy edges.

How does TIMR help? TIMR allows us not only to inject a new edge between two nodes if the shapes of their multi-hop neighborhoods are sufficiently close, but also to eliminate an existing edge if the topological distance between their multi-hop node neighborhoods is higher than the predefined threshold $\epsilon_2$. That is, TIMR adds an edge between nodes whose similarity is detected by persistent homology, and removes the edge between topologically dissimilar nodes. This way, in the new TIMR graph, similar nodes get closer and dissimilar nodes get farther apart, thereby assisting throughout the node classification process. As a result, TIMR also mitigates the impact of noisy edges and reduces the effect of over-fitting. Furthermore, while we have not formally explored TIMR in conjunction with formal defense mechanisms against adversarial attacks, TIMR may be viewed as an essential remedial ingredient in defense.
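As a minimal sketch of Eq. (2) and the rewiring step in NumPy (our illustration, not the released code), assuming the pairwise neighborhood distances $d_{W,p}(G^k_u, G^k_v)$ have already been collected into a matrix, e.g., with the routines sketched above; `eps1` and `eps2` play the roles of $\epsilon_1$ and $\epsilon_2$:

```python
import numpy as np

def timr_rewire(W: np.ndarray, D_topo: np.ndarray, eps1: float, eps2: float):
    """Positive/negative topology-induced adjacencies of Eq. (2) and the rewired graph.

    W      : (N, N) binary adjacency matrix of the original graph G
    D_topo : (N, N) matrix of Wasserstein distances between k-hop neighborhood diagrams
    """
    W_pos = (D_topo < eps1).astype(float)     # strong topological similarity, W^{topo+}
    W_neg = (D_topo > eps2).astype(float)     # strong topological dissimilarity, W^{topo-}
    np.fill_diagonal(W_pos, 0.0)
    np.fill_diagonal(W_neg, 0.0)
    # Rewired graph: add an edge between topologically similar nodes if none exists,
    # and drop an existing edge between topologically dissimilar nodes.
    W_topo = np.clip(W + W_pos, 0.0, 1.0) * (1.0 - W_neg)
    return W_pos, W_neg, W_topo
```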
Indeed, TIMR offers an insight into one of the key defense challenges [27], namely, which systematic criteria we should follow to clean the attacked graph: TIMR suggests recovering edges removed by the attacker with positive topology-induced links and suppressing edges induced by the attacker with negative topology-induced links.

4.2 TIMR Theoretical Stability Guarantees

We now establish theoretical stability properties of the TIMR average degree. The average degree is known to be closely related to performance in node classification tasks [28, 17, 1] and, hence, degree-dependent regularization is often used in adversarial training [38, 49, 44]. Here we prove that under perturbations of the observed graph $G$, the average degrees of the TIMR graphs of the original and distorted copies remain close. That is, the proposed TRI framework allows us to increase the robustness of the graph degree properties with respect to graph perturbations and attacks. Given that persistent homology representations are known to be robust against noise, our result appears to be intuitive. However, integration of graph persistent homology into node classification tasks and its theoretical guarantees remain an untapped area.

Let $G^+$ and $G^-$ be two graphs of the same order. Let $T^+$ and $T^-$ be the TIMR graphs of $G^+$ and $G^-$, constructed by the degree-based filtration. We define the local k-distance between two graphs based on the Wasserstein distance $d_{W_p}$ between their persistence diagrams as follows. Let $u^\pm$ be a node in $G^\pm$ and $G^k_{u^\pm}$ be the k-neighborhood of $u^\pm$ in $G^\pm$. Let $d_k(u^+, u^-) = d_{W_1}\big(D_0(G^k_{u^+}), D_0(G^k_{u^-})\big)$, where $d_{W_1}$ is the Wasserstein-1 distance (see Definition 2) and $D_0(G)$ is the persistence diagram of 0-cycles. Let $\phi : V^+ \to V^-$ be a bijection between the node sets of $G^+$ and $G^-$, respectively. Then, the local k-distance between $G^+$ and $G^-$ is defined as $D_k(G^+, G^-) = \min_{\phi} \sum_{u^+} d_k(u^+, \phi(u^+))$. We now have the following stability result.

Theorem 1 (Stability of Average Degree). Let $G^+$ and $G^-$ be two graphs of the same size and order. Let $T^\pm$ be the TIMR graph induced by $G^\pm$, with thresholds $\epsilon_1$ and $\epsilon_2$. Let $\alpha^\pm$ be the average degree of $T^\pm$. Then, there exists a constant $K(\epsilon_1, \epsilon_2) > 0$ such that $|\alpha^+ - \alpha^-| \le K(\epsilon_1, \epsilon_2)\, D_k(G^+, G^-)$.

The proof of the theorem is given in Appendix A in the supplementary material. Furthermore, we conjecture the following result on the stability of the algebraic connectivity of TIMR graphs in the attribute-based case. Here, the algebraic connectivity $\lambda_2(G)$ is the second smallest eigenvalue of the graph Laplacian $L(G)$, and it is considered a spectral measure of the robustness of the graph [45].

Conjecture: Let $G$ be a graph, and let $G'$ be the graph obtained by adding one edge $e$ to $G$, i.e., $G' = G \cup e$. Let $T, T'$ be the TIMR graphs induced by $G, G'$, respectively. Then, $|\lambda_2(T') - \lambda_2(T)| \le K(\epsilon_1, \epsilon_2)\, |\lambda_2(G') - \lambda_2(G)|$.

Note that this conjecture is not true for degree-based TIMR graphs. The reason is that when one adds an edge $e_{uv}$ to $G$, this operation significantly changes the k-neighborhoods of all nodes in $N_k(u)$ and $N_k(v)$. Since the degree-based TIMR construction depends on the persistence diagrams of these k-neighborhoods, adding such an edge causes an uncontrollable effect on the algebraic connectivity. However, in the attribute-based construction, similarity is based only on the attributes, and the distances in the graph do not have an effect on the edge addition/deletion decision. So, when the original graph $G$ is perturbed by adding an edge, the attributes remain intact, and the TIMR representations of the original and perturbed graphs are identical.
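Since the local k-distance $D_k$ minimizes over node bijections, it is an assignment problem and can be evaluated directly from precomputed per-node 0-dimensional diagrams. The sketch below is an illustration under stated assumptions (not the authors' code): it uses SciPy's Hungarian solver and the GUDHI Wasserstein distance, and assumes infinite-death points have been removed from each diagram.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from gudhi.wasserstein import wasserstein_distance

def local_k_distance(diagrams_plus, diagrams_minus):
    """D_k(G+, G-): minimum over bijections phi of the summed Wasserstein-1
    distances between 0-dimensional diagrams of corresponding k-hop neighborhoods.

    diagrams_plus, diagrams_minus : lists (same length) of (m, 2) arrays, one
    diagram per node of G+ and G-, respectively.
    """
    n = len(diagrams_plus)
    cost = np.zeros((n, n))
    for i, Dp in enumerate(diagrams_plus):
        for j, Dm in enumerate(diagrams_minus):
            cost[i, j] = wasserstein_distance(Dp, Dm, order=1.0, internal_p=2.0)
    rows, cols = linear_sum_assignment(cost)   # optimal bijection phi
    return cost[rows, cols].sum()
```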
4.3 STAN: Learning from Subgraphs, Topology and Attributes of Neighbors

For graphs with continuous or binary node features, higher-order forms of message passing and aggregation (i.e., powers of the adjacency matrix) have been shown to capture important structural graph properties that are inaccessible at the node level. However, such higher-order architectures largely focus on global network topology and do not account for important local graph structures. Also, the performance of such higher-order graph convolution architectures tends to suffer from over-smoothing and to be susceptible to noisy observations, since their propagation schemes recursively update each node's feature vector by aggregating information delivered by all farther nodes captured by a random walk. Here we propose a new recursive feature propagation scheme, STAN, which updates the node features from Subgraphs, Topology, and Attributes of Neighbors. In particular, for each target node, STAN (i) converts these three features into a form of topological edge weights, (ii) calculates a topological average of neighborhood features, and (iii) aggregates them with the initial node features.

Let $X^{(0)}_u = X_u$ be the initial node features for $u \in V$ and $T$ be the number of STAN iterations. Then we iteratively update the feature representation of node $u$, using side information collected from nodes within its k-hop local neighborhood, via

$X^{(t+1)}_u = f_{up}\Big( \phi_{aggr}\big( X^{(t)}_u,\ \alpha(u) \sum_{v \in V_k(u)} \hat{d}_{uv} X^{(t)}_v \big) \Big)$,

where $f_{up}$ is an update function such as a multi-layer perceptron (MLP) or a gated network, $\phi_{aggr}$ is a function that aggregates topological features into node features, such as a sum or a mean, $\alpha(u)$ is a weighting factor which can be set either as a hyperparameter or a fixed scalar, and $\hat{d}_{uv}$ are normalized topological edge weights

$\hat{d}_{uv} = \dfrac{\exp\big( d_{W,p}(G_u, G_v) \big)^{-1}}{\sum_{v \in V_k(u)} \exp\big( d_{W,p}(G_u, G_v) \big)^{-1}}$.

The core principle of STAN is to assign different importance scores to nodes, depending on how topologically similar their neighborhoods are to that of the target node, and to control the impact on the target node prediction when aggregating neighborhood information. For instance, suppose the target node is $u$ and we also have nodes $v$ and $w$. We assign a higher weight to the edge between $u$ and $v$ (if the shapes of their neighborhoods are more similar) and a lower weight to the edge between $u$ and $w$ (if the neighborhoods of $u$ and $w$ have more distinct topology). As a result, the updated node features of $u$ will be more affected by $v$.

Figure 2: STAN for node feature vector extension. The target node u (red) with its 2-hop neighborhood: four 1-hop neighbors (blue) and three 2-hop neighbors (green). Each node is represented by a 3-component feature vector.

We include more discussion of STAN in Appendix B; a minimal code sketch of one STAN iteration is given below.

4.4 Convolution-based TRI-GNN Layer

We now turn to the construction of the TRI-GNN layer. Let $\mathcal{W} \in \mathbb{R}^{N \times N \times 3}$ be a 3-dimensional tensor, where $N$ is the number of nodes and 3 is the number of candidate adjacency matrices (i.e., $W$, $W^{topo+}$, and $W^{topo-}$). Let $\mathcal{W}_{uvr}$ be the r-th type of edge between nodes $u$ and $v$ of $\mathcal{W}$, where $u \in \{1, \ldots, N\}$, $v \in \{1, \ldots, N\}$, and $r \in \{1, 2, 3\}$. That is, we denote $W$, $W^{topo+}$, and $W^{topo-}$ by setting $r = 1, 2, 3$. We then consider a joint topology-induced adjacency matrix $W^{topo}$ with

$W^{topo}_{uv} = 1$ if $\sum_r \mathcal{W}_{uvr} > 0$, and $0$ otherwise. (3)

In the experiments, we utilize the classification function for generalized semi-supervised learning as the default filter engine of TRI-GNN.
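Here is a minimal sketch of one STAN iteration (an illustration under simple choices, not the exact operators used in the experiments, which are detailed in Appendix B): $f_{up}$ is taken to be a residual sum, $\phi_{aggr}$ a plain sum, and the normalized topological edge weights are a softmax of the negative neighborhood distances (equivalent to the $\exp(\cdot)^{-1}$ normalization above), so that topologically closer neighbors contribute more.

```python
import numpy as np

def stan_step(X: np.ndarray, neighborhoods: dict, D_topo: np.ndarray, alpha: float = 0.5):
    """One STAN update X^{(t)} -> X^{(t+1)}.

    X             : (N, F) current node feature matrix
    neighborhoods : maps node u to the list of nodes in its k-hop neighborhood V_k(u)
    D_topo        : (N, N) Wasserstein distances between node-neighborhood diagrams
    alpha         : weighting factor alpha(u), here a fixed scalar
    """
    X_new = X.copy()
    for u, nbrs in neighborhoods.items():
        if not nbrs:
            continue
        w = np.exp(-D_topo[u, nbrs])          # hat{d}_{uv}: larger for more similar shapes
        w = w / w.sum()
        agg = w @ X[nbrs]                     # topological average of neighbor features
        X_new[u] = X[u] + alpha * agg         # residual-sum update of the target node
    return X_new

# T STAN iterations before the convolution layers:
# for _ in range(T):
#     X = stan_step(X, neighborhoods, D_topo)
```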
Given $W^{topo}$, we compute the topological distance-based graph Laplacian $L^{topo} = (\widetilde{D}^{topo})^{-\sigma}\, \widetilde{W}^{topo}\, (\widetilde{D}^{topo})^{\sigma - 1}$, where $\widetilde{W}^{topo} = (W^{topo} + I)^{\rho}$, $\widetilde{D}^{topo}_{uu} = \sum_v \widetilde{W}^{topo}_{uv}$, $\sigma \in [0, 1]$, and $\rho \in (0, \infty)$. Equipped with the new graph Laplacian $L^{topo}$ and the node information matrix $X^{(T)}$ at iteration $T$, TRI-GNN uses the following graph convolution layer

$H^{(\ell+1)} = \psi\Big( \mu \big(I - \mu L^{topo}\big)^{-1} H^{(\ell)} \Theta^{(\ell)} \Big) = \psi\Big( \mu \sum_{i \ge 0} \big(\mu L^{topo}\big)^{i} H^{(\ell)} \Theta^{(\ell)} \Big)$,

where $\mu \in (0, 1]$ is a regularization parameter, $H^{(\ell)} \in \mathbb{R}^{N \times F_\ell}$ and $H^{(\ell+1)} \in \mathbb{R}^{N \times F_{\ell+1}}$ are the input and output activations for layer $\ell$ ($H^{(0)} = X^{(T)}$), $\Theta^{(\ell)} \in \mathbb{R}^{F_\ell \times F_{\ell+1}}$ are the trainable parameters of TRI-GNN, and $\psi$ is an element-wise nonlinear activation function such as ReLU. In the experiments $\mu$ is selected from $\{0.1, 0.2, \ldots, 0.9\}$, and its choice impacts the number of terms in the Taylor expansion. To reduce the computational complexity, based on the Taylor series expansion, we empirically find that truncating at $i_{\max} = R\mu$ is a suitable rule of thumb (where $R \in [2, 50]$), while the closer $\sigma$ is to 0, the more robust the performance is to the choice of regularization parameters. Note that, in order to improve the robustness of TRI-GNN when it is exposed to noise, we can consider a convolution-based TRI-GNN layer with a parallel structure. (A minimal sketch of this layer is given after the experimental overview below.)

Table 1: Performance on semi-supervised classification. Average accuracy (%) and standard deviation (%) in ().

| Method | IEEE 118-Bus | ACTIVSg200 | ACTIVSg500 | ACTIVSg2000 | Cora-ML | CiteSeer | PubMed |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ChebNet | 60.00 (3.11) | 80.63 (4.20) | 95.18 (1.00) | 80.04 (0.37) | 81.45 (1.21) | 70.23 (0.80) | 78.40 (1.10) |
| GCN | 52.86 (0.78) | 82.96 (1.66) | 90.33 (0.68) | 73.36 (0.41) | 81.50 (0.40) | 71.11 (0.72) | 79.00 (0.53) |
| MotifNet | 65.75 (0.77) | 73.20 (1.61) | 95.18 (0.53) | 82.00 (0.44) | - | - | - |
| ARMA | 70.55 (2.23) | 80.07 (3.30) | 94.33 (0.47) | 81.20 (0.23) | 82.80 (0.63) | 72.30 (0.44) | 78.80 (0.30) |
| GAT | 70.23 (1.80) | 83.56 (2.58) | 95.00 (0.47) | 83.00 (0.59) | 83.11 (0.70) | 70.85 (0.70) | 78.56 (0.31) |
| RGCN | 81.80 (1.79) | 83.35 (3.12) | 95.91 (0.50) | 85.23 (0.40) | 82.80 (0.60) | 72.13 (0.50) | 79.11 (0.30) |
| GMNN | 78.88 (2.50) | 84.13 (1.89) | 96.57 (0.52) | 86.21 (0.29) | 83.72 (0.90) | 73.10 (0.79) | 81.80 (0.53) |
| LGCNs | 71.43 (2.20) | 83.78 (1.50) | 95.14 (0.45) | 85.57 (0.25) | 83.35 (0.51) | 73.08 (0.63) | 79.51 (0.22) |
| SPAGAN | 78.55 (2.25) | 81.83 (2.09) | 96.19 (0.40) | 86.00 (0.26) | 83.63 (0.55) | 73.02 (0.41) | 79.60 (0.40) |
| APPNP | 82.05 (2.24) | 81.21 (1.85) | 96.11 (0.45) | 86.77 (0.30) | 83.31 (0.53) | 72.30 (0.51) | 80.12 (0.20) |
| VPN | 82.19 (2.20) | 79.89 (1.82) | 96.23 (0.50) | 86.87 (0.33) | 81.89 (0.57) | 71.40 (0.32) | 79.60 (0.39) |
| LFGCN | 82.20 (2.30) | 81.11 (2.10) | 97.61 (0.47) | 87.74 (0.30) | 84.35 (0.57) | 71.89 (0.77) | 79.60 (0.55) |
| MixHop | 80.05 (2.50) | 80.53 (1.96) | 95.94 (0.40) | 86.10 (0.25) | 81.90 (0.81) | 71.41 (0.40) | 80.81 (0.58) |
| PEGN-RC | 82.30 (1.90) | 83.30 (1.10) | 97.20 (0.50) | 87.62 (0.74) | 82.70 (0.50) | 71.91 (0.63) | 79.40 (0.70) |
| TRI-GNNA | 82.80 (1.64) | 84.93 (1.51) | 97.85 (0.56) | 88.82 (0.56) | 84.80 (0.55) | 73.45 (0.65) | 79.80 (0.52) |
| TRI-GNND | 82.67 (2.08) | 86.18 (0.20) | 98.11 (0.43) | 89.06 (0.44) | 84.98 (0.49) | 73.32 (0.48) | 79.70 (0.50) |

5 Experiments

We now empirically evaluate the effectiveness of our proposed method on seven node-classification benchmarks in a semi-supervised setting, with varying graph sizes and feature types. We run all experiments 50 times and report the average accuracy results and standard deviations.
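For concreteness, the TRI-GNN convolution layer of Section 4.4 can be sketched in PyTorch as below; this is a minimal illustration (not the released implementation), in which the inverse $(I - \mu L^{topo})^{-1}$ is approximated by its truncated Taylor series with $i_{\max} \approx R\mu$ terms and the Laplacian $L^{topo}$ is assumed to be precomputed from $W^{topo}$ as described above.

```python
import torch
import torch.nn as nn

class TRIGNNLayer(nn.Module):
    """One convolution-based TRI-GNN layer (truncated-series sketch)."""

    def __init__(self, in_dim: int, out_dim: int, mu: float = 0.2, num_terms: int = 5):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)  # trainable Theta^{(l)}
        self.mu = mu                                         # regularization parameter mu
        self.num_terms = num_terms                           # i_max ~ R * mu

    def forward(self, H: torch.Tensor, L_topo: torch.Tensor) -> torch.Tensor:
        # H: (N, F_l) input activations; L_topo: (N, N) topological graph Laplacian
        Z = self.theta(H)
        out, term = Z, Z
        for _ in range(self.num_terms):
            term = self.mu * (L_topo @ term)   # next term (mu * L_topo)^i Z of the series
            out = out + term
        return torch.relu(self.mu * out)       # psi = ReLU

# Usage (shapes only): layer = TRIGNNLayer(F_in, F_out); H1 = layer(X_T, L_topo)
```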
5.1 Experimental Settings

Datasets. We compare TRI-GNN with state-of-the-art (SOA) baselines using standard publicly available real and synthetic networks: (1) 3 citation networks [41]: Cora-ML, CiteSeer, and PubMed, where nodes are publications and edges are citations; (2) 4 synthetic power grid networks [8, 7, 23]: the IEEE 118-bus system and the ACTIVSg200, ACTIVSg500, and ACTIVSg2000 systems, where each node represents a load bus, transformer, or generator and we use the total line charging susceptance (BR_B) as the edge weight. For more details see Table I (in Appendix C.1).

Baselines. We compare TRI-GNN with 14 SOA GNNs: (i) 5 higher-order graph convolution architectures: LGCNs [22], APPNP [31], VPN [25], LFGCN [14], and MixHop [3]; (ii) 2 graph attention mechanisms: GAT [48] and SPAGAN [52]; (iii) 4 GNNs with polynomial filters: ChebNet [18], GCN [30], MotifNet [35], and ARMA [6]; (iv) 2 GNNs that learn the generative distribution: GMNN [39] and RGCN [57]; and (v) 1 GNN with topological information: PEGN-RC [56]. In the experiments, we implement two variants of TRI-GNN, i.e., TRI-GNNA (node attributes as the filtration function) and TRI-GNND (node degree as the filtration function). In addition, in our robustness experiments, we select the 4 strongest baselines across the GNN areas of random walks, graph attention networks, higher-order graph convolutional models, and robust graph convolutional networks (i.e., APPNP, GAT, LFGCN, and RGCN). More details on the experimental setup are in Appendix C.2. The source code of TRI-GNN is publicly available at https://github.com/TRI-GNN/TRI-GNN.git.

5.2 Performance on Node Classification

Table 1 shows the node classification results. We observe that TRI-GNN outperforms all baselines on Cora-ML, CiteSeer, and the four power grid networks, while achieving comparable performance on PubMed. This consistently strong performance of TRI-GNN indicates the utility of integrating more complex graph structures through aggregating higher-order topological information. The best results are highlighted in bold for all datasets. As Table 1 shows, TRI-GNND and TRI-GNNA outperform all baselines on all but 1 of the 7 benchmark datasets, yielding relative gains of 0.5-2.5% on clean graphs. Overall, TRI-GNND appears to be the most competitive approach, especially for the ACTIVSg200 and ACTIVSg500 systems. Nevertheless, TRI-GNNA tends to follow TRI-GNND relatively closely. This validates that the framework of topological relational inference can integrate topological information through both the multigraph representation and the feature propagation. On PubMed, GMNN achieves a better accuracy than both TRI-GNN versions because PubMed exhibits the weakest structural information, with very few links per node on average, but both TRI-GNN models still deliver highly competitive results. In addition, we find that for sparser heterogeneous graphs with lower numbers of node attributes per class, richer topological structure (i.e., the synthetic ACTIVSg200 and ACTIVSg2000), and weaker dependency between attributes and classes, PH induced by the graph properties is more important than attribute-induced PH, while for denser, more homogeneous graphs such as IEEE 118-bus, attribute-based filtration is the key. These insights echo findings on the real citation networks.
Ablation Study. Our TRI framework contains two key components: (i) encoding higher-order topological information via TIMR, and (ii) STAN, i.e., updating the node features by aggregating information from subgraphs, topology, and neighborhood features. To glean a deeper insight into how the different components help TRI-GNN achieve highly competitive results, we conduct experiments by removing each component separately, namely (1) TRI-GNND w/o TIMR (denoted w/o $\Omega$), (2) TRI-GNND w/o STAN, and (3) TRI-GNND with TIMR and STN (where STN represents updating the node features by aggregating information only from subgraphs and topology, without neighborhood features). As Table 2 shows, both TIMR ($\Omega$) and STAN indeed yield significant performance improvements on all datasets. The ablation study on the contribution of TIMR and STAN with TRI-GNNA is in Table II of Appendix C.3. The results show that all the components are indispensable.

Table 2: TRI-GNND ablation study. Significance analysis is performed with a one-sided two-sample t-test based on 50 runs; *, **, *** denote p-value < 0.1, 0.05, 0.01 (i.e., significant, statistically significant, highly statistically significant).

| Dataset | TRI-GNND | w/o $\Omega$ | w/o STAN | with $\Omega$, STN |
| --- | --- | --- | --- | --- |
| ACTIVSg200 | 86.18 (0.20) | 82.93 (1.32) | 83.00 (3.17) | 83.25 (1.58) |
| ACTIVSg500 | 98.11 (0.43) | 93.71 (1.03) | 94.31 (1.20) | 93.69 (1.50) |
| Cora-ML | 84.98 (0.49) | 84.26 (0.77) | 84.00 (0.33) | 84.40 (0.56) |
| CiteSeer | 73.32 (0.48) | 73.11 (0.43) | 73.20 (0.55) | 73.19 (0.47) |

Boundary Sensitivity. We select the $\epsilon_1$ and $\epsilon_2$ values based on quantiles of the Wasserstein distances between persistence diagrams (see Figure 3a). The optimal choice of $\epsilon_1$ and $\epsilon_2$ can be obtained via cross-validation. For instance, the optimal values of $\epsilon_1$ and $\epsilon_2$ for the ACTIVSg200 dataset are 0.55 and 2.50, respectively. We consider the lower and upper bounds for $\epsilon_1$ to be 0.50 (the minimum) and the 0.05-quantile, respectively, and the lower and upper bounds for $\epsilon_2$ to be the 0.1-quantile and 6.74 (the maximum), respectively. For $\epsilon_1$, we generate a sequence from 0.50 to 2.00 with an increment of 0.05; for $\epsilon_2$, we generate a sequence from 2.50 to 6.74 with an increment of 0.5. Based on cross-validation, we conduct experiments with different combinations of $\epsilon_1$ and $\epsilon_2$. Figures 3b and 3c show the performance with respect to different threshold combinations ($\epsilon_1$, $\epsilon_2$), where $\epsilon_i^{*}$ ($i \in \{1, 2\}$) denotes that $\epsilon_i$ is not fixed. From the results in Figure 3, we find that ($\epsilon_1$, $\epsilon_2$) around (0.55, 2.50) tends to deliver the optimal performance (accuracy, %). The optimal results differ among graphs and depend on sparsity, label rates, and higher-order graph properties. To further examine the boundary sensitivity, we also evaluate the hyperparameters $\epsilon_1$ and $\epsilon_2$ on ACTIVSg500 in Appendix C.8.

Figure 3: Hyperparameter ($\epsilon_1$, $\epsilon_2$) selection of TRI-GNN on the ACTIVSg200 dataset: (a) Wasserstein distances; (b) ($\epsilon_1$, $\epsilon_2^{*}$); (c) ($\epsilon_1^{*}$, $\epsilon_2$).

Robustness Analysis. To evaluate the model performance under graph perturbations, we test the robustness of the TRI-GNN framework under random attacks (here we present the results for TRI-GNNA; analogous robustness results for TRI-GNND are in Appendix C.4). A random attack randomly generates fake edges and injects them into the existing graph structure. The ratio of the number of fake edges to the number of existing edges varies from 0 to 100%. As Figure 4 shows, TRI-GNNA consistently outperforms all baselines on both the citation and power grid networks under random attacks, delivering gains of up to 10% in highly noisy scenarios (see Cora-ML). These findings indicate that TRI-GNNA is again the most robust GNN node classifier under random attacks. By aggregating local topological information from node features and neighborhood substructures, TRI-GNNA can absorb noisy edges and, as a result, exhibits less sensitivity to graph perturbations and adversarial attacks.
Figure 4: Node classification accuracy under random attacks on (a) ACTIVSg500, (b) ACTIVSg2000, (c) Cora-ML, and (d) CiteSeer (x-axis: ratio of noise edges from 0.0 to 1.0; y-axis: average accuracy), comparing APPNP, GAT, LFGCN, RGCN, and TRI-GNNA.

Computational Costs. In less extreme scenarios, the expected number of nodes in the k-hop neighborhood can be approximated by $\bar{d}^{\,k}$, where $\bar{d}$ is the mean degree of the graph and the mathematical expectation is with respect to the node degree distribution. The computational complexity of the Wasserstein distance between two persistence diagrams is $O(M^3)$, where $M$ is the total number of barcodes in the two persistence diagrams. In the worst-case scenario, if we use a very fine filtering function for the persistence diagrams of k-hop neighborhoods, this gives at most $2\bar{d}^{\,k}$ barcodes (i.e., the number of barcodes in each k-hop neighborhood is at most $\bar{d}^{\,k}$ and we compare two k-hop neighborhoods). Considering that we use the degree filtering function for our persistence diagrams, we expect to have many fewer ($\ll \bar{d}^{\,k}$) barcodes in the 0-th persistence diagram of a k-hop neighborhood by using the technique in [29]. Hence, the overall computational complexity would be of stochastic order $O(N^2 \bar{d}^{\,3k/2})$. It may be costly to run PH for denser graphs (currently, computational complexity is the main roadblock for all PH-based methods on graphs), but as we noted earlier, for denser graphs we may use a degree filtration with lower k-hops, which is still found to be very powerful even for small k. Table II in Appendix C.2 reports the average running time of PD generation and the training time per epoch of our TRI-GNN model on all datasets.

6 Conclusion

We have proposed a novel perspective on node classification with GNNs, based on TDA concepts invoked within each local node neighborhood. The new TRI approach has been shown to deliver highly competitive results on clean graphs and substantial improvements in robustness against graph perturbations. In the future, we plan to apply TRI-GNN as a part of structured graph pooling.

Societal Impact and Limitations. Among the major limitations we currently see is the probable existence of bias in the proposed node classification approach due to potentially unbalanced population groups [5]. Indeed, in the context of social data analysis, given that we analyze peer neighborhoods which themselves tend to be largely formed by the same users as the target node, and hence are likely to be dominated by the abundant group, such bias is very likely to exist and to be non-negligible. Mitigating it is a problem in its own right, but some ideas include using multiple filtration functions based on different node attributes, so that the node neighborhoods are made more diverse and are no longer dominated by a specific group, or removing some potentially sensitive node attributes (e.g., gender) completely from the study (which has its own pros and cons in terms of accuracy).
Acknowledgements

This work is sponsored by the National Science Foundation under award numbers ECCS 2039701, INTERN supplement for ECCS 1824716, DMS 1925346, and the Department of the Navy, Office of Naval Research under ONR award number N000142112226. Part of this material is also based upon work supported by (while serving at) the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation and/or the Office of Naval Research. The authors are also grateful to the NeurIPS anonymous reviewers for many insightful suggestions and engaging discussion which improved the clarity of the manuscript and highlighted new open research directions.

References

[1] Emmanuel Abbe, Enric Boix-Adsera, Peter Ralli, and Colin Sandon. Graph powering and spectral robustness. SIAM Journal on Mathematics of Data Science, 2(1):132–157, 2020.
[2] Sami Abu-El-Haija, Amol Kapoor, Bryan Perozzi, and Joonseok Lee. N-GCN: Multi-scale graph convolution for semi-supervised node classification. In UAI, pages 841–851, 2020.
[3] Sami Abu-El-Haija, Bryan Perozzi, Amol Kapoor, Nazanin Alipourfard, Kristina Lerman, Hrayr Harutyunyan, Greg Ver Steeg, and Aram Galstyan. MixHop: Higher-order graph convolutional architectures via sparsified neighborhood mixing. In ICML, pages 21–29, 2019.
[4] Konstantin Avrachenkov and Maximilien Dreveton. Almost exact recovery in noisy semi-supervised learning. arXiv:2007.14717, 2020.
[5] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning. fairmlbook.org, 2021. http://www.fairmlbook.org.
[6] Filippo Maria Bianchi, Daniele Grattarola, Lorenzo Livi, and Cesare Alippi. Graph neural networks with convolutional ARMA filters. TPAMI, 2021.
[7] Adam B Birchfield, Kathleen M Gegner, Ti Xu, Komal S Shetye, and Thomas J Overbye. Statistical considerations in the creation of realistic synthetic power grids for geomagnetic disturbance studies. Transactions on Power Systems, 32(2):1502–1510, 2016.
[8] Adam B Birchfield, Ti Xu, Kathleen M Gegner, Komal S Shetye, and Thomas J Overbye. Grid structural characteristics as validation criteria for synthetic networks. Transactions on Power Systems, 32(4):3258–3265, 2016.
[9] Michael M Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: going beyond Euclidean data. Signal Processing Magazine, 34(4):18–42, 2017.
[10] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. In ICLR, 2014.
[11] Gunnar Carlsson. Topological pattern recognition for point cloud data. Acta Numerica, 23:289–368, 2014.
[12] Mathieu Carrière, Frédéric Chazal, Yuichi Ike, Théo Lacombe, Martin Royer, and Yuhei Umeda. PersLay: A neural network layer for persistence diagrams and new graph topological signatures. In AISTATS, pages 2786–2796, 2020.
[13] Frédéric Chazal and Bertrand Michel. An introduction to topological data analysis: fundamental and practical aspects for data scientists. Frontiers in Artificial Intelligence, 4, 2021.
[14] Yuzhou Chen, Yulia R Gel, and Konstantin Avrachenkov. LFGCN: Levitating over graphs with Lévy flights. In ICDM, pages 960–965. IEEE, 2020.
[15] Yuzhou Chen, Yulia R Gel, Vyacheslav Lyubchich, and Todd Winship. Deep ensemble classifiers and peer effects analysis for churn forecasting in retail banking. In PAKDD, pages 373–385, 2018.
[16] Zhiqian Chen, Fanglan Chen, Lei Zhang, Taoran Ji, Kaiqun Fu, Liang Zhao, Feng Chen, and Chang-Tien Lu. Bridging the gap between spatial and spectral domains: A survey on graph neural networks. arXiv:2002.11867, 2020.
[17] Hanjun Dai, Hui Li, Tian Tian, Xin Huang, Lin Wang, Jun Zhu, and Le Song. Adversarial attack on graph structured data. In ICML, pages 1115–1124, 2018.
[18] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In NeurIPS, pages 3844–3852, 2016.
[19] Tyler Derr, Yao Ma, Wenqi Fan, Xiaorui Liu, Charu Aggarwal, and Jiliang Tang. Epidemic graph convolutional network. In WSDM, pages 160–168, 2020.
[20] Michel Marie Deza and Elena Deza. Encyclopedia of distances. In Encyclopedia of Distances, pages 1–583. Springer, 2009.
[21] Herbert Edelsbrunner and John Harer. Computational topology: an introduction. American Mathematical Soc., 2010.
[22] Hongyang Gao, Zhengyang Wang, and Shuiwang Ji. Large-scale learnable graph convolutional networks. In ACM SIGKDD, pages 1416–1424, 2018.
[23] Kathleen M Gegner, Adam B Birchfield, Ti Xu, Komal S Shetye, and Thomas J Overbye. A methodology for the creation of geographically realistic synthetic power flow models. In PECI, pages 1–6. IEEE, 2016.
[24] Christoph D Hofer, Roland Kwitt, and Marc Niethammer. Learning representations of persistence barcodes. JMLR, 20(126):1–45, 2019.
[25] Ming Jin, Heng Chang, Wenwu Zhu, and Somayeh Sojoudi. Power up! Robust graph convolutional network via graph powering. In AAAI, 2021.
[26] Wei Jin, Yaxing Li, Han Xu, Yiqi Wang, Shuiwang Ji, Charu Aggarwal, and Jiliang Tang. Adversarial attacks and defenses on graphs: A review, a tool and empirical studies. ACM SIGKDD Explorations Newsletter, pages 19–34, 2021.
[27] Wei Jin, Yao Ma, Xiaorui Liu, Xianfeng Tang, Suhang Wang, and Jiliang Tang. Graph structure learning for robust graph neural networks. In ACM SIGKDD, 2020.
[28] Antony Joseph and Bin Yu. Impact of regularization on spectral clustering. The Annals of Statistics, 44(4):1765–1791, 2016.
[29] Lida Kanari, Adélie Garin, and Kathryn Hess. From trees to barcodes and back again: theoretical and statistical perspectives. Algorithms, 13(12):335, 2020.
[30] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
[31] Johannes Klicpera, Aleksandar Bojchevski, and Stephan Günnemann. Predict then propagate: Graph neural networks meet personalized PageRank. In ICLR, 2019.
[32] Panagiotis Kyriakis, Iordanis Fostiropoulos, and Paul Bogdan. Learning hyperbolic representations of topological features. In ICLR, 2021.
[33] Tam Le and Makoto Yamada. Persistence Fisher kernel: A Riemannian manifold kernel for persistence diagrams. In NeurIPS, pages 10007–10018, 2018.
[34] Meng Liu, Hongyang Gao, and Shuiwang Ji. Towards deeper graph neural networks. In ACM SIGKDD, pages 338–348, 2020.
[35] Federico Monti, Karl Otness, and Michael M Bronstein. MotifNet: a motif-based graph convolutional network for directed graphs. In DSW, pages 225–228. IEEE, 2018.
[36] Elchanan Mossel and Jiaming Xu. Local algorithms for block models with side information. In ITCS, pages 71–80, 2016.
[37] Elizabeth Munch. A user's guide to topological data analysis. Journal of Learning Analytics, 4(2):47–61, 2017.
[38] Sharad Nandanwar and M Narasimha Murty. Structural neighborhood based classification of nodes in a network. In ACM SIGKDD, pages 1085–1094, 2016.
[39] Meng Qu, Yoshua Bengio, and Jian Tang. GMNN: Graph Markov neural networks. In ICML, pages 5241–5250, 2019.
[40] Bastian Rieck, Christian Bock, and Karsten Borgwardt. A persistent Weisfeiler-Lehman procedure for graph classification. In ICML, pages 5448–5458, 2019.
[41] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, 29(3):93, 2008.
[42] Ke Sun, Piotr Koniusz, and Zhen Wang. Fisher-Bures adversary graph convolutional networks. In UAI, pages 465–475. PMLR, 2020.
[43] Lichao Sun, Yingtong Dou, Carl Yang, Ji Wang, Philip S Yu, and Bo Li. Adversarial attack and defense on graph data: A survey. arXiv:1812.10528, 2020.
[44] Yiwei Sun, Suhang Wang, Xianfeng Tang, Tsung-Yu Hsieh, and Vasant Honavar. Non-target-specific node injection attacks on graph neural networks: A hierarchical reinforcement learning approach. In WWW, volume 3, 2020.
[45] Ali Sydney, Caterina Scoglio, and Don Gruenbacher. Optimizing algebraic connectivity by edge rewiring. Applied Mathematics and Computation, 219(10):5465–5479, 2013.
[46] Matteo Togninalli, Elisabetta Ghisu, Felipe Llinares-López, Bastian Rieck, and Karsten Borgwardt. Wasserstein Weisfeiler-Lehman graph kernels. In NeurIPS, pages 6439–6449, 2019.
[47] Francesco Tudisco, Austin R Benson, and Konstantin Prokopchik. Nonlinear higher-order label spreading. In WWW, pages 2402–2413, 2021.
[48] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio. Graph attention networks. In ICLR, 2018.
[49] Shen Wang, Zhengzhang Chen, Jingchao Ni, Xiao Yu, Zhichun Li, Haifeng Chen, and Philip S Yu. Adversarial defense framework for graph neural network. arXiv:1905.03679, 2019.
[50] Huijun Wu, Chen Wang, Yuriy Tyshetskiy, Andrew Docherty, Kai Lu, and Liming Zhu. Adversarial examples on graph data: Deep insights into attack and defense. In IJCAI, 2019.
[51] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S Yu Philip. A comprehensive survey on graph neural networks. TNNLS, 2020.
[52] Yiding Yang, Xinchao Wang, Mingli Song, Junsong Yuan, and Dacheng Tao. SPAGAN: Shortest path graph attention network. In IJCAI, pages 4099–4105, 2019.
[53] Monisha Yuvaraj, Asim K Dey, Vyacheslav Lyubchich, Yulia R Gel, and H Vincent Poor. Topological clustering of multilayer networks. Proceedings of the National Academy of Sciences, 118(21), 2021.
[54] Ziwei Zhang, Peng Cui, and Wenwu Zhu. Deep learning on graphs: A survey. TKDE, 2020.
[55] Qi Zhao and Yusu Wang. Learning metrics for persistence-based summaries and applications for graph classification. In NeurIPS, pages 9859–9870, 2019.
[56] Qi Zhao, Ze Ye, Chao Chen, and Yusu Wang. Persistence enhanced graph neural network. In AISTATS, pages 2896–2906. PMLR, 2020.
[57] Dingyuan Zhu, Ziwei Zhang, Peng Cui, and Wenwu Zhu. Robust graph convolutional networks against adversarial attacks. In ACM SIGKDD, pages 1399–1407, 2019.
[58] Afra Zomorodian and Gunnar Carlsson. Computing persistent homology. Discrete & Computational Geometry, 33(2):249–274, 2005.

Checklist

1. For all authors...
(a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes]
(b) Did you describe the limitations of your work? [Yes] See Section 5.2, scalability.
(c) Did you discuss any potential negative societal impacts of your work? [N/A]
(d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]
2. If you are including theoretical results...
(a) Did you state the full set of assumptions of all theoretical results? [Yes] See Section 4.2.
(b) Did you include complete proofs of all theoretical results? [Yes] See Supplementary Material, Appendix A.
3. If you ran experiments...
(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See Section 5 and Appendix C. Source code and data are publicly available at https://github.com/TRI-GNN/TRI-GNN.git.
(b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Section 5 and Appendix C.
(c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [Yes] See Section 5 and results on statistical hypothesis testing.
(d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Section 5 and Appendix C.
4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
(a) If your work uses existing assets, did you cite the creators? [Yes]
(b) Did you mention the license of the assets? [Yes]
(c) Did you include any new assets either in the supplemental material or as a URL? [Yes]
(d) Did you discuss whether and how consent was obtained from people whose data you're using/curating? [Yes]
(e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [N/A]
5. If you used crowdsourcing or conducted research with human subjects...
(a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]
(b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
(c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]