# Pseudo-Riemannian Graph Convolutional Networks

Bo Xiong (University of Stuttgart, Stuttgart, Germany), Shichao Zhu (IIE, Chinese Academy of Sciences, Beijing, China), Nico Potyka (Imperial College London, London, United Kingdom), Shirui Pan (Griffith University, Queensland, Australia), Chuan Zhou (AMSS, Chinese Academy of Sciences, Beijing, China), Steffen Staab (University of Stuttgart, Stuttgart, Germany; University of Southampton)

Graph convolutional networks (GCNs) are powerful frameworks for learning embeddings of graph-structured data. GCNs are traditionally studied through the lens of Euclidean geometry. Recent works find that non-Euclidean Riemannian manifolds provide specific inductive biases for embedding hierarchical or spherical data; however, they cannot align well with data of mixed graph topologies. We consider a larger class of pseudo-Riemannian manifolds that generalize the hyperboloid and the sphere. We develop new geodesic tools that allow for extending neural network operations into geodesically disconnected pseudo-Riemannian manifolds. As a consequence, we derive a pseudo-Riemannian GCN that models data in pseudo-Riemannian manifolds of constant nonzero curvature in the context of graph neural networks. Our method provides a geometric inductive bias that is sufficiently flexible to model mixed heterogeneous topologies such as hierarchical graphs with cycles. We demonstrate the representational capabilities of this method by applying it to graph reconstruction, node classification and link prediction on a series of standard graphs with mixed topologies. Empirical results demonstrate that our method outperforms Riemannian counterparts when embedding graphs of complex topologies.

1 Introduction

Learning from graph-structured data is a pivotal task in machine learning, for which graph convolutional networks (GCNs) [1, 2, 3, 4] have emerged as powerful graph representation learning techniques. GCNs exploit both the features and the structural properties of graphs, which makes them well-suited for a wide range of applications. For this purpose, graphs are usually embedded in Riemannian manifolds equipped with a positive definite metric. Euclidean geometry is the special case of Riemannian manifolds of constant zero curvature; it can be understood intuitively and has well-defined operations. However, the representation power of Euclidean space is limited [5], especially when embedding complex graphs exhibiting hierarchical structures [6]. Non-Euclidean Riemannian manifolds of constant curvature provide an alternative that accommodates specific graph topologies. For example, the hyperbolic manifold of constant negative curvature has exponentially growing volume and is well suited to represent hierarchical structures such as tree-like graphs [7, 8, 9, 10, 11]. Similarly, the spherical manifold of constant positive curvature is suitable for embedding spherical data in various fields [12, 13, 14], including graphs with cycles. Some recent works [15, 16, 17, 18, 19] have extended GCNs to such non-Euclidean manifolds and have shown substantial improvements.

Equal contribution. Correspondence to bo.xiong@ipvs.uni-stuttgart.de

36th Conference on Neural Information Processing Systems (NeurIPS 2022).

Figure 1: The different submanifolds of a four-dimensional pseudo-hyperboloid of curvature −1 with two time dimensions. By fixing one time dimension x0, the induced submanifolds include (a) a one-sheet hyperboloid, (b) the double cone, and (c) a two-sheet hyperboloid.
Real-world graphs [6], however, usually exhibit highly heterogeneous topological structures, which are best represented by different geometric curvatures. A globally homogeneous geometry lacks the flexibility to model such complex graphs [20]. Instead of using a single manifold, product manifolds [20, 21] that combine multiple Riemannian manifolds have shown advantages when embedding graphs of mixed topologies. However, the curvature distribution of a product manifold is the same at each point, which limits its capability of embedding topologically heterogeneous graphs. Furthermore, Riemannian manifolds are equipped with a positive definite metric, which disallows a faithful representation of the negative eigen-spectrum of input similarities [22]. Going beyond Riemannian manifolds, pseudo-Riemannian manifolds equipped with indefinite metrics constitute a larger class of geometries. Pseudo-Riemannian manifolds of constant nonzero curvature not only generalize the hyperbolic and spherical manifolds, but also contain their submanifolds (cf. Fig. 1), thus providing inductive biases specific to these geometries. Pseudo-Riemannian geometry with constant zero curvature (i.e. Lorentzian spacetime) has been applied to manifold learning for preserving local information of non-metric data [5] and for embedding directed acyclic graphs [23]. To model complex graphs containing both hierarchies and cycles, pseudo-Riemannian manifolds with constant nonzero curvature have recently been applied to graph embedding using non-parametric learning [24, 25, 26], but the representation power of these works is not yet on par with their Riemannian counterparts, mostly because of the absence of geodesic tools for extending neural network operations into pseudo-Riemannian geometry.

In this paper, we take the first step towards extending GCNs into pseudo-Riemannian manifolds, foregoing the requirement of a positive definite metric. Exploiting pseudo-Riemannian geometry for GCNs is non-trivial because of geodesic disconnectedness: there exist broken point pairs that cannot be smoothly connected by a geodesic, leaving the necessary geodesic tools undefined. To deal with this, we develop novel geodesic tools that allow manipulating representations in geodesically disconnected pseudo-Riemannian manifolds. We achieve this by finding diffeomorphic manifolds that provide alternative geodesic operations which smoothly avoid the broken cases. Subsequently, we generalize GCNs to learn representations of complex graphs in pseudo-Riemannian geometry by defining the corresponding operations, such as linear transformation and tangential aggregation. Different from previous works, the initial features of the GCN can be fully defined in Euclidean space. Thanks to the diffeomorphic operations being bijective and differentiable, the standard gradient descent algorithm can be used for optimization.

To summarize, our main contributions are as follows: 1) We present neural network operations in pseudo-Riemannian manifolds with novel geodesic tools, to stimulate the application of pseudo-Riemannian geometry in geometric deep learning. 2) We present a principled framework, pseudo-Riemannian GCN, which generalizes GCNs to pseudo-Riemannian manifolds with indefinite metrics, providing more flexible inductive biases to accommodate complex graphs with mixed topologies. 3) Extensive evaluations on three standard tasks demonstrate that our model outperforms baselines that operate in Riemannian manifolds.
Source code is openly available at https://github.com/xiongbo010/QGCN.

2 Preliminaries

2.1 Pseudo-Riemannian manifolds

A pseudo-Riemannian manifold [27] (M, g) is a smooth manifold M equipped with a non-degenerate and indefinite metric tensor g. Non-degeneracy means that for a given ξ ∈ T_xM, if g_x(ξ, ν) = 0 for all ν ∈ T_xM, then ξ = 0. The metric tensor g induces a scalar product g_x : T_xM × T_xM → R on the tangent space T_xM of each point x ∈ M, where the tangent space T_xM can be seen as the first-order local approximation of M around the point x. The elements of T_xM are called tangent vectors. Indefiniteness means that the metric tensor may take either sign. A principal special case is Riemannian geometry, where the metric tensor is positive definite (i.e. g_x(ξ, ξ) > 0 for all ξ ∈ T_xM with ξ ≠ 0).

Pseudo-hyperboloid. By analogy with the hyperboloid and the sphere in Euclidean space, pseudo-hyperboloids are defined as submanifolds of the ambient pseudo-Euclidean space R^{s,t+1} of dimensionality d = s + t + 1, equipped with the scalar product

$$\langle x, y\rangle_t = -\sum_{i=0}^{t} x_i y_i + \sum_{j=t+1}^{s+t} x_j y_j, \qquad \forall x, y \in R^{s,t+1}.$$

The scalar product induces a norm ‖x‖²_t = ⟨x, x⟩_t that can be used to define a pseudo-hyperboloid

$$Q^{s,t}_{\beta} = \big\{\, x = (x_0, x_1, \dots, x_{s+t})^{\top} \in R^{s,t+1} : \|x\|^2_t = \beta \,\big\},$$

where β is a nonzero real curvature parameter. Q^{s,t}_β is called a pseudo-sphere when β > 0 and a pseudo-hyperboloid when β < 0. Since Q^{s,t}_β is interchangeable with Q^{t+1,s−1}_{−β}, we consider the pseudo-hyperboloid here. Following the terminology of special relativity, a point in Q^{s,t}_β can be interpreted as an event [5], where the first t + 1 dimensions are time dimensions and the last s dimensions are space dimensions. Hyperbolic manifolds H and spherical manifolds S can be obtained as special cases of the pseudo-hyperboloid by setting all time dimensions except one to zero and by setting all space dimensions to zero, respectively, i.e. H^s_β = Q^{s,0}_β and S^t_β = Q^{0,t}_β.

2.2 Geodesic tools of the pseudo-hyperboloid

Geodesics. The generalization of a straight line in Euclidean space to a manifold is called a geodesic [15, 28]. Formally, a geodesic γ is a constant-speed curve γ : [0, 1] → M, λ ↦ γ(λ), joining two points x, y ∈ M while minimizing the curve length L(γ) = ∫₀¹ ‖γ̇(λ)‖ dλ. The geodesic satisfies γ = arg min_γ L(γ) such that γ(0) = x, γ(1) = y and ‖γ̇(λ)‖ = 1. By means of the geodesic, the distance between x, y ∈ Q^{s,t}_β is defined as the arc length of the geodesic γ(λ).

Exponential and logarithmic maps. The connections between a manifold and its tangent spaces are established by the differentiable exponential and logarithmic maps. The exponential map at x is defined as exp_x(ξ) = γ(1), where γ is the geodesic with γ(0) = x and γ̇(0) = ξ; it projects a tangent vector ξ ∈ T_xM to a point exp_x(ξ) ∈ M on the manifold. The logarithmic map log_x : M → T_xM is defined as the inverse of the exponential map (i.e. log_x = exp_x⁻¹). Note that since Q^{s,t}_β is a geodesically complete manifold, the domain D_x of the exponential map is the entire tangent space, i.e. D_x = T_x Q^{s,t}_β. However, as we will explain later, the logarithmic map log_x(y) is only defined when there exists a length-minimizing geodesic between x, y ∈ Q^{s,t}_β. More details can be found in Appendix C.
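For intuition, the pseudo-Euclidean scalar product and the defining constraint of the pseudo-hyperboloid are easy to express in code. The following NumPy sketch is illustrative only (the function names are ours, not from the released QGCN code); it computes ⟨x, y⟩_t under the convention above (first t+1 coordinates are time-like) and checks whether a point satisfies ‖x‖²_t = β.

```python
import numpy as np

def pseudo_scalar_product(x, y, t):
    """<x, y>_t = -sum_{i<=t} x_i y_i + sum_{j>t} x_j y_j (first t+1 coordinates are time-like)."""
    time_part = -np.sum(x[: t + 1] * y[: t + 1])
    space_part = np.sum(x[t + 1 :] * y[t + 1 :])
    return time_part + space_part

def on_pseudo_hyperboloid(x, t, beta, atol=1e-6):
    """Check the defining constraint ||x||_t^2 = <x, x>_t = beta."""
    return np.isclose(pseudo_scalar_product(x, x, t), beta, atol=atol)

# Example on Q^{2,1}_{-1}: two time and two space coordinates in R^{2,2}.
beta, t = -1.0, 1
# The last space coordinate is solved from -(x0^2 + x1^2) + x2^2 + x3^2 = beta.
x = np.array([1.0, 1.0, 0.5, np.sqrt(beta + 1.0**2 + 1.0**2 - 0.5**2)])
print(pseudo_scalar_product(x, x, t), on_pseudo_hyperboloid(x, t, beta))  # -1.0 True
```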
Geodesical connectedness. A pseudo-Riemannian manifold M is connected iff any two points of M can be joined by a piecewise (broken) geodesic with each piece being a smooth geodesic. The manifold is geodesically connected (or g-connected) iff any two points can be smoothly connected by a single geodesic; such a pair of points is called g-connected, and otherwise g-disconnected. Different from Riemannian manifolds, in which geodesic completeness implies g-connectedness (Hopf-Rinow theorem [29]), the pseudo-hyperboloid is geodesically complete but not g-connected: there exist points that cannot be smoothly connected by a geodesic [30]. Formally, in the pseudo-hyperboloid, two points x, y ∈ Q^{s,t}_β are g-connected iff ⟨x, y⟩_t < |β|. The set of points that are g-connected to x ∈ Q^{s,t}_β is called its normal neighborhood U_x = { y ∈ Q^{s,t}_β : ⟨x, y⟩_t < |β| }. For g-disconnected points x, y ∈ Q^{s,t}_β, there exists no tangent vector ξ such that y = exp^β_x(ξ), which implies that the inverse log^β_x(·) is only defined on the normal neighborhood of x. In a nutshell, the geodesic tools are not well-defined for the g-disconnected cases, making it impossible to define the corresponding vector operations.

3 Pseudo-Riemannian GCNs

In this section, we first describe how to tackle the g-disconnectedness of pseudo-Riemannian manifolds. Then we present pseudo-Riemannian GCNs based on the proposed geodesic tools.

3.1 Diffeomorphic geodesic tools

One standard way to tackle g-disconnectedness in differential geometry is to introduce diffeomorphic manifolds in which the operations are well-defined. A diffeomorphic manifold can be derived from a diffeomorphism, defined as follows.

Definition 1 (Diffeomorphism [27]). Given two manifolds M and M′, a smooth map ψ : M → M′ is called a diffeomorphism if ψ is bijective and its inverse ψ⁻¹ is smooth as well. If a diffeomorphism between M and M′ exists, we call them diffeomorphic and write M ≃ M′.

For pseudo-Riemannian manifolds, the following diffeomorphism [24] decomposes the pseudo-hyperboloid into the product manifold of a unit sphere and a Euclidean space.

Theorem 1 (Theorem 4.1 in [24]). For any point x ∈ Q^{s,t}_β, there exists a diffeomorphism ψ : Q^{s,t}_β → S^t_1 × R^s that maps x into the product manifold of a unit sphere and a Euclidean space. The diffeomorphism is given in Appendix D.1.

In light of this, we introduce a new diffeomorphism that maps x into the product manifold of a sphere with curvature 1/β and a Euclidean space.

Theorem 2. For any point x ∈ Q^{s,t}_β, there exists a diffeomorphism ψ : Q^{s,t}_β → S^t_β × R^s that maps x into the product manifold of a sphere and a Euclidean space (proof in Appendix D.2).

Compared with Theorem 1, this diffeomorphism preserves the curvature in its spherical component, which yields useful geometric properties: e.g. the mapped point x_t ∈ S^t_β still lies on the surface of the pseudo-hyperboloid, which, as we explain later, makes it easy to move tangent vectors from the pseudo-hyperboloid to the diffeomorphic manifold. We call ψ the spherical projection.

Exponential and logarithmic maps. Considering that the pseudo-hyperboloid Q^{s,t}_β is g-disconnected, we propose to transfer the logmap and expmap to the diffeomorphic manifold given by ψ : Q^{s,t}_β → S^t_β × R^s, since the product manifold S^t_β × R^s is g-connected. To map tangent vectors between the tangent spaces of Q^{s,t}_β and S^t_β × R^s, we exploit the pushforward, which induces a linear approximation of a smooth map on tangent spaces.

Definition 2 (Pushforward). Suppose that ψ : M → M′ is a smooth map. The differential of ψ at a point x, denoted dψ_x, is a linear map from the tangent space of M at x to the tangent space of M′ at ψ(x), namely dψ_x : T_xM → T_{ψ(x)}M′.
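To make the spherical projection concrete, the NumPy sketch below shows one natural realization of ψ and its inverse under the convention used above (time coordinates first): the forward map rescales the time part onto a sphere of radius √|β| and keeps the space part, and the inverse rescaling is then forced by the constraint ⟨x, x⟩_t = β. The function names and this exact parameterization are our illustration and need not coincide with the closed form given in the paper's Appendix D.2.

```python
import numpy as np

def spherical_projection(x, t, beta):
    """psi: Q^{s,t}_beta -> S^t_beta x R^s. Rescale the time part onto the sphere of
    radius sqrt(|beta|); keep the space part unchanged."""
    time, space = x[: t + 1], x[t + 1 :]
    u = np.sqrt(abs(beta)) * time / np.linalg.norm(time)
    return u, space

def inverse_spherical_projection(u, v, beta):
    """psi^{-1}: S^t_beta x R^s -> Q^{s,t}_beta. The scale of the time part is recovered
    from the constraint -||time||^2 + ||space||^2 = beta (with beta < 0)."""
    scale = np.sqrt((np.dot(v, v) + abs(beta)) / abs(beta))
    return np.concatenate([scale * u, v])

# Round trip on a point of Q^{2,1}_{-1} (two time and two space coordinates).
beta, t = -1.0, 1
x = np.array([1.2, 0.8, 0.3, 0.0])
x[3] = np.sqrt(beta + x[0] ** 2 + x[1] ** 2 - x[2] ** 2)   # enforce <x, x>_t = beta
u, v = spherical_projection(x, t, beta)
print(np.allclose(inverse_spherical_projection(u, v, beta), x))  # True
```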
Intuitively, the pushforward can be used to push tangent vectors in T_x Q^{s,t}_β forward to tangent vectors in T_{ψ(x)}(S^t_β × R^s). Based on this, the new logmap and its inverse expmap are defined by Eq. (1):

$$\widehat{\log}^{\beta}_{x}(y) = \mathrm{d}\psi^{-1}\Big(\log^{S^t_\beta \times R^s}_{\psi(x)}\big(\psi(y)\big)\Big), \qquad \widehat{\exp}^{\beta}_{x}(\xi) = \psi^{-1}\Big(\exp^{S^t_\beta \times R^s}_{\psi(x)}\big(\mathrm{d}\psi(\xi)\big)\Big), \tag{1}$$

where ψ(·) is the spherical projection and ψ⁻¹(·) is its inverse; the mapping of tangent vectors is achieved by the pushforward. The operations $\log^{S^t_\beta \times R^s}(\cdot)$ and $\exp^{S^t_\beta \times R^s}(\cdot)$ in the product manifold are defined as the concatenation of the corresponding operations in the two components:

$$\log^{S^t_\beta \times R^s}(x') = \log^{S^t_\beta}(x'_t) \,\|\, \log^{R^s}(x'_s), \qquad \exp^{S^t_\beta \times R^s}(\xi) = \exp^{S^t_\beta}(\xi_t) \,\|\, \exp^{R^s}(\xi_s), \tag{2}$$

where ‖ denotes concatenation, x′ = ψ(x) consists of the spherical features x′_t ∈ S^t_β and the Euclidean features x′_s ∈ R^s, and ξ is the tangent vector induced by x on Q^{s,t}_β.

We choose points whose space dimensions are all zero as reference points due to the following property.

Theorem 3. For any reference point x ∈ Q^{s,t}_β whose space dimensions are all zero, the induced tangent space of Q^{s,t}_β is equal to the tangent space of its diffeomorphic manifold S^t_β × R^s, namely T_x(S^t_β × R^s) = T_x Q^{s,t}_β (proof in Appendix D.3).

The intuition of the proof is that if the space dimensions are zero, the pushforward (differential) only affects the time dimensions, for which the mapping is just an identity function (see Appendix D.2). In this way, although we transfer the logmap and expmap to the diffeomorphic manifold S^t_β × R^s, the diffeomorphic operations $\widehat{\log}_x(\cdot)$ and $\widehat{\exp}_x(\cdot)$ are still bijective functions between the pseudo-hyperboloid and the tangent space of the manifold itself. Hence, our final operations are actually still defined in the tangent space of the pseudo-hyperboloid. Note that this property only holds when our Theorem 2 is applied and the special reference points with zero space dimensions are chosen. By leveraging the new logmap and expmap, we further formulate the diffeomorphic version of tangential operations as follows.

Tangential operations. For a function f : R^d → R^{d′}, the pseudo-hyperboloid version f^⊗ : Q^{s,t}_β → Q^{s′,t′}_β with s + t = d and s′ + t′ = d′ can be defined by means of $\widehat{\log}^{\beta}_{x}(\cdot)$ and $\widehat{\exp}^{\beta}_{x}(\cdot)$ as Eq. (3):

$$f^{\otimes}(y) := \widehat{\exp}^{\beta}_{x}\big(f(\widehat{\log}^{\beta}_{x}(y))\big), \tag{3}$$

where x is the reference point. Note that this function is a morphism (i.e. (f ∘ g)^⊗ = f^⊗ ∘ g^⊗) and direction preserving (i.e. f^⊗(·)/‖f^⊗(·)‖ = f(·)/‖f(·)‖) [15], making it a natural way to define pseudo-hyperboloid versions of vector operations such as scalar multiplication, matrix-vector multiplication, tangential aggregation and point-wise non-linearities.
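The sketch below illustrates how Eqs. (1)-(3) can be realized at a reference point whose space part is zero, where by Theorem 3 the pushforward acts as the identity and one can work directly with the product-manifold operations (sphere of radius √|β| plus a Euclidean factor). All helper names are ours and the whole block is an illustrative assembly under these assumptions, not the authors' implementation; the small `_psi`/`_psi_inv` helpers repeat the projection from the previous sketch so the block is self-contained.

```python
import numpy as np

def _psi(x, t, beta):
    """Spherical projection (cf. previous sketch): time part onto the sphere of radius sqrt(|beta|)."""
    time, space = x[: t + 1], x[t + 1 :]
    return np.sqrt(abs(beta)) * time / np.linalg.norm(time), space

def _psi_inv(u, v, beta):
    """Inverse projection back onto Q^{s,t}_beta."""
    return np.concatenate([np.sqrt((np.dot(v, v) + abs(beta)) / abs(beta)) * u, v])

def sphere_exp(x, w, r):
    """Exponential map on the sphere of radius r at x (w is tangent at x)."""
    n = np.linalg.norm(w)
    return x if n < 1e-12 else np.cos(n / r) * x + np.sin(n / r) * r * w / n

def sphere_log(x, y, r):
    """Logarithmic map on the sphere of radius r at x."""
    c = np.clip(np.dot(x, y) / r**2, -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-12:
        return np.zeros_like(x)
    u = y - c * x
    return theta * r * u / np.linalg.norm(u)

def diffeo_log(x, y, t, beta):
    """Eqs. (1)-(2): logmap at a reference x with zero space part, via S^t_beta x R^s."""
    r = np.sqrt(abs(beta))
    xu, xv = _psi(x, t, beta)
    yu, yv = _psi(y, t, beta)
    return np.concatenate([sphere_log(xu, yu, r), yv - xv])

def diffeo_exp(x, xi, t, beta):
    """Eqs. (1)-(2): expmap at a reference x with zero space part, mapped back by psi^{-1}."""
    r = np.sqrt(abs(beta))
    xu, xv = _psi(x, t, beta)
    w = xi[: t + 1]
    w = w - (np.dot(w, xu) / np.dot(xu, xu)) * xu      # keep the spherical part tangent at xu
    return _psi_inv(sphere_exp(xu, w, r), xv + xi[t + 1 :], beta)

def tangential(f, x, y, t, beta):
    """Eq. (3): pseudo-hyperboloid version of a Euclidean map f (tangent vectors in ambient coordinates)."""
    return diffeo_exp(x, f(diffeo_log(x, y, t, beta)), t, beta)
```

For instance, passing `f = lambda v: 0.5 * v` yields a tangential scalar multiplication at the chosen reference point.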
Parallel transport. Parallel transport generalizes Euclidean translation to manifolds. Formally, for any two points x and y connected by a geodesic, the parallel transport P^β_{x→y}(·) : T_xM → T_yM is an isomorphism between the two tangent spaces that moves a tangent vector ξ ∈ T_xM along the geodesic to the tangent space T_yM. Parallel transport in the pseudo-hyperboloid can be defined as a combination of Riemannian parallel transports [31]. However, parallel transport is not defined when there exists no geodesic between x and y, i.e., a tangent vector at x cannot be transported to the tangent space of points outside the normal neighborhood U_x. Intuitively, the normal neighborhoods satisfy the following property.

Theorem 4. For any point x ∈ Q^{s,t}_β, the union of the normal neighborhood of x and the normal neighborhood of its antipodal point −x covers the entire manifold, namely U_x ∪ U_{−x} = Q^{s,t}_β (proof in Appendix D.4).

This theorem ensures that if a point y ∉ U_x, then its antipodal point −y ∈ U_x. Besides, T_yM is parallel to T_{−y}M. Hence, P^β_{x→y} can alternatively be defined as P^β_{x→−y} for broken points. This result is crucial for defining the pseudo-hyperboloid addition, such as the bias translation detailed in Section 3.2.

Broken geodesic distance. By means of the geodesic, the induced distance between x and y in the pseudo-hyperboloid is defined as the arc length of the geodesic γ(λ), given by d_γ(x, y) = √|⟨log_x(y), log_x(y)⟩_t|. For broken cases in which log_x(y) is not defined, one approach is to use an approximation as in [24]. Instead, we define the following closed-form distance, given by Eq. (4):

$$d(x, y) = \begin{cases} d_\gamma(x, y), & \text{if } \langle x, y\rangle_t < |\beta|, \\ |\beta| + d_\gamma(-x, y), & \text{if } \langle x, y\rangle_t \ge |\beta|. \end{cases} \tag{4}$$

The intuition is that when x, y ∈ Q^{s,t}_β are g-disconnected, we consider the distance as d(x, y) = d_γ(x, −x) + d_γ(−x, y) or d(x, y) = d_γ(x, −y) + d_γ(−y, y). Since d_γ(x, −x) = d_γ(−y, y) = |β| is a constant and d_γ(−x, y) = d_γ(x, −y), the distance between broken points can be calculated as d(x, y) = |β| + d_γ(−x, y).

To clarify the theoretical contributions: our Theorem 2 is necessary for our Theorem 3, while Theorem 3 is necessary for transferring the GCN operations directly into the tangent space of the pseudo-hyperboloid. Besides, we are the first to formulate the diffeomorphic expmap, logmap and tangential operations of the pseudo-hyperboloid that avoid the broken cases. The theoretical properties of parallel transport and geodesic distance are discussed in the literature [24, 31]; however, we re-formulate parallel transport with Theorem 4 to avoid broken cases and propose a new distance measure using the broken geodesic (Eq. (4)), which differs from the approximated distance in [24].

3.2 Model architecture

GCNs can be interpreted as performing neighborhood aggregation after a linear transformation on the node features of each layer. We present pseudo-Riemannian GCNs (Q-GCN) by deriving the corresponding operations in Q^{s,t}_β with the developed geodesic tools.

Feature initialization. We first map the features from Euclidean space to the pseudo-hyperboloid, considering that the input features of nodes usually live in Euclidean space. Following the feature transformation from Euclidean space to the pseudo-hyperboloid in [24], we initialize the node features by a differentiable mapping φ : R^{s+t+1} → Q^{s,t}_β that can be implemented by a double projection [24] based on Theorem 2, i.e. φ = ψ⁻¹ ∘ ψ. The intuition is that we first map the Euclidean features into the diffeomorphic manifold S^t_β × R^s via ψ(·), and then map them into the pseudo-hyperboloid Q^{s,t}_β via ψ⁻¹(·). The spherical projection is given by Eq. (5):

$$\psi(x) = \Big(\sqrt{|\beta|}\,\frac{\mathbf{t}}{\|\mathbf{t}\|},\; \mathbf{s}\Big)^{\top} = z, \tag{5}$$

where x = (t, s)^⊤ with time part t and space part s, and z = (u, v)^⊤ ∈ S^t_β × R^s with u ∈ S^t_β and v ∈ R^s; the inverse ψ⁻¹ rescales the spherical part so that the constraint ⟨x, x⟩_t = β holds (see Appendix D.2).

Tangential aggregation. The linear combination of neighborhood features is lifted to the tangent space, which is an intrinsic operation on differentiable manifolds [32, 16]. Specifically, Q-GCN aggregates the neighbor embeddings in the tangent space of the reference point o before passing them through a tangential activation function, and then projects the updated representations back to the manifold. Formally, at each layer ℓ, the updated feature of node i is defined as Eq. (6):

$$h^{\ell+1}_i = \widehat{\exp}^{\beta_{\ell+1}}_{o}\Big(\sigma\Big(\sum_{j\in\mathcal{N}(i)\cup\{i\}} w_{ij}\, \widehat{\log}^{\beta_{\ell}}_{o}\big(W^{\ell} \otimes^{\beta_{\ell}} h^{\ell}_j \oplus^{\beta_{\ell}} b^{\ell}\big)\Big)\Big), \tag{6}$$

where σ(·) is the activation function, β_ℓ and β_{ℓ+1} are two layer-wise curvatures, N(i) denotes the one-hop neighborhood of node i, w_{ij} is the aggregation weight of neighbor j, and ⊗, ⊕ denote the two basic operations, i.e. tangential transformation and bias translation, respectively, which are detailed below.
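As a rough PyTorch sketch of the aggregation in Eq. (6), the layer below lifts features to the tangent space, applies a Euclidean transformation and bias, aggregates with normalized adjacency weights, and maps back. The logmap/expmap are supplied as callables (in Q-GCN these would be the diffeomorphic maps of Eq. (1) at the reference point o); the class and argument names are ours, and the layer-wise curvatures and the exact form of ⊗/⊕ are simplified here.

```python
import torch
import torch.nn as nn

class TangentialAggregationLayer(nn.Module):
    """One aggregation step in the spirit of Eq. (6): lift to the tangent space at the
    reference point, transform and aggregate there, apply the activation, map back."""

    def __init__(self, in_dim, out_dim, log_o, exp_o, act=torch.relu):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_dim, in_dim))
        self.bias = nn.Parameter(torch.zeros(out_dim))
        nn.init.xavier_uniform_(self.weight)
        self.log_o, self.exp_o, self.act = log_o, exp_o, act

    def forward(self, h, adj_norm):
        # h: node representations; adj_norm: normalized adjacency with self-loops.
        v = self.log_o(h)                       # lift to the tangent space at o
        v = v @ self.weight.T + self.bias       # Euclidean transformation + bias (simplified)
        v = torch.spmm(adj_norm, v) if adj_norm.is_sparse else adj_norm @ v
        return self.exp_o(self.act(v))          # project back to the manifold

# Interface demo with identity maps (which reduces the layer to a plain Euclidean GCN step):
layer = TangentialAggregationLayer(8, 8, log_o=lambda x: x, exp_o=lambda x: x)
out = layer(torch.randn(5, 8), torch.eye(5))
```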
Tangential transformation. We perform Euclidean transformations on the tangent space by leveraging the expmap and logmap in Eq. (1). Specifically, we first project the hidden features into the tangent space of the south pole o = [√|β|, 0, …, 0]^⊤ using the logmap, then perform a Euclidean matrix multiplication, and finally map the transformed features back to the manifold using the expmap. Formally, at each layer ℓ, the tangential transformation is given by

$$W^{\ell} \otimes^{\beta} h^{\ell} := \widehat{\exp}^{\beta}_{o}\big(W^{\ell}\, \widehat{\log}^{\beta}_{o}(h^{\ell})\big),$$

where ⊗^β denotes the pseudo-hyperboloid tangential multiplication and W^ℓ ∈ R^{d′×d} denotes the layer-wise learnable matrix in Euclidean space.

Bias translation. It is noteworthy that simply stacking multiple layers of the tangential transformation would collapse the composition [32, 15], i.e.

$$\exp^{\beta}_{o}\big(W^{\ell}\log^{\beta}_{o}(\cdots\exp^{\beta}_{o}(W^{1}\log^{\beta}_{o}(x))\cdots)\big) = \exp^{\beta}_{o}\big(W^{\ell}\cdots W^{1}\log^{\beta}_{o}(x)\big),$$

which means that these multiplications could simply be performed in Euclidean space, except for the first logmap and the last expmap. To avoid this model collapse, we perform a bias translation after the tangential transformation. By means of the pseudo-hyperboloid parallel transport, the bias translation is performed by parallel transporting a tangent vector b ∈ T_o Q^{s,t}_β to the tangent space of the point of interest. Finally, the transported tangent vector is mapped back to the manifold with the expmap. Considering that $\widehat{\exp}(\cdot)$ is only defined at points x ∈ Q^{s,t}_β whose space dimensions x_s = 0, we perform the original $\exp^{Q^{s,t}_\beta}(\cdot)$ at the point of interest. The bias translation is formally given by

$$h^{\ell} \oplus^{\beta} b^{\ell} := \begin{cases} \exp^{\beta}_{\tilde h^{\ell}}\big(P^{\beta}_{o\to \tilde h^{\ell}}(b^{\ell})\big), & \text{if } \langle o, \tilde h^{\ell}\rangle_t < |\beta|, \\ \exp^{\beta}_{-\tilde h^{\ell}}\big(P^{\beta}_{o\to -\tilde h^{\ell}}(b^{\ell})\big), & \text{if } \langle o, \tilde h^{\ell}\rangle_t \ge |\beta|, \end{cases}$$

where $\tilde h^{\ell} = W^{\ell} \otimes^{\beta} h^{\ell}$ and ⊕^β denotes the pseudo-hyperboloid addition. For the broken cases where $\langle o, \tilde h^{\ell}\rangle_t \ge |\beta|$, the parallel transport $P^{\beta}_{o\to \tilde h^{\ell}}$ is not defined; in this case, we parallel transport b^ℓ to the tangent space of the antipodal point $-\tilde h^{\ell}$, and then apply $\exp^{\beta}_{-\tilde h^{\ell}}$ to map it back to the manifold. Note that the case $\langle o, \tilde h^{\ell}\rangle_t = |\beta|$ occurs if and only if $\tilde h^{\ell} = -o$, in which case $P^{\beta}_{o\to -\tilde h^{\ell}}$ reduces to the identity map.

3.3 Model training

Having introduced all the building blocks, Q-GCN stacks multiple pseudo-Riemannian GCN layers, and the final embeddings at the last layer can then be used to perform downstream tasks. For graph reconstruction, the objective is to map all nodes into a low-dimensional space such that connected nodes are closer than unconnected nodes. Following [20, 24], we minimize the loss function

$$\mathcal{L} = -\sum_{(u,v)\in\mathcal{D}} \log \frac{e^{-d(u,v)}}{\sum_{v'\in\mathcal{E}(u)} e^{-d(u,v')}}$$

over the set D of connected pairs in the graph, where E(u) = { v′ | (u, v′) ∉ D, v′ ≠ u } is the set of negative examples for node u and d(·, ·) is the distance function defined in Eq. (4). For node classification, we map the output of the last layer of Q-GCN to the tangent space and then perform Euclidean multinomial logistic regression. For link prediction, we use the Fermi-Dirac decoder [8] to compute probability scores for edges, formally given by

$$P(e_{uv} \in E \mid \Theta) = \frac{1}{e^{(d(u,v) - r)/t} + 1},$$

where r and t are hyperparameters and d(u, v) is the distance in the embedding space. We then train Q-GCN by minimizing the cross-entropy loss with negative sampling.

Optimization. Although the model is built in Q^{s,t}_β, the trainable parameters are all defined in Euclidean space through the diffeomorphic mappings. Following the standard tangential optimization strategy [32], the parameters can be optimized with Euclidean optimization by applying the layer-wise diffeomorphic expmap and logmap. An optional strategy is to use a dedicated optimizer as in [24], which we leave for future work.
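The two decoders above are straightforward to write down. The snippet below sketches the Fermi-Dirac edge score and the reconstruction loss, with the distance passed in as a callable (in Q-GCN this would be the broken geodesic distance of Eq. (4)); the function names and the default hyperparameters r = 2.0, t = 1.0 are illustrative, not taken from the paper.

```python
import torch

def fermi_dirac_score(dist, r=2.0, t=1.0):
    """P(e_uv in E) = 1 / (exp((d(u, v) - r) / t) + 1)."""
    return 1.0 / (torch.exp((dist - r) / t) + 1.0)

def reconstruction_loss(dist_fn, emb, pos_pairs, neg_samples):
    """-log( exp(-d(u,v)) / sum_{v' in E(u)} exp(-d(u,v')) ), averaged over connected pairs.
    emb: (N, d) embeddings; pos_pairs: (P, 2) indices; neg_samples: (P, K) indices."""
    u, v = pos_pairs[:, 0], pos_pairs[:, 1]
    d_pos = dist_fn(emb[u], emb[v])                           # (P,)
    d_neg = dist_fn(emb[u].unsqueeze(1), emb[neg_samples])    # (P, K)
    return (d_pos + torch.logsumexp(-d_neg, dim=1)).mean()

# Example with a plain Euclidean distance standing in for Eq. (4):
euclid = lambda a, b: torch.norm(a - b, dim=-1)
emb = torch.randn(10, 4)
pos = torch.tensor([[0, 1], [2, 3]])
neg = torch.randint(0, 10, (2, 5))
print(reconstruction_loss(euclid, emb, pos, neg), fermi_dirac_score(euclid(emb[0], emb[1])))
```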
Complexity analysis. The time complexity is the same as that of a vanilla GCN, O(|V|dd′ + |E|d′), where |V| and |E| are the numbers of nodes and edges, and d and d′ are the dimensions of the input and hidden features. The computation can be parallelized across all nodes. Similar to other non-Euclidean GCNs [32, 33, 17], the mappings between the manifold and the tangent space consume additional computational resources compared with Euclidean GCNs, which remains within acceptable limits.

4 Experiments

We evaluate the effectiveness of Q-GCN on graph reconstruction, node classification and link prediction. First, we study the geometric properties of the datasets, namely the graph sectional curvature [20] and the δ-hyperbolicity [7]. Fig. 4 in the Appendix shows the histograms of sectional curvature and the mean sectional curvature for all datasets. All datasets have both positive and negative sectional curvatures, showing that all graphs contain mixed topologies. To further analyze the degree of hierarchy, we apply δ-hyperbolicity to quantify tree-likeness, as shown in Table 9 in the Appendix. We conjecture that datasets with positive graph sectional curvature or larger δ-hyperbolicity should be suitable for a pseudo-hyperboloid with a larger time dimension, while datasets with negative graph sectional curvature or smaller δ-hyperbolicity should align well with a pseudo-hyperboloid with a smaller time dimension.

4.1 Graph reconstruction

Datasets and baselines. We benchmark graph reconstruction on four real-world graphs: 1) Web-Edu [34], a web network within the .edu domain; 2) Power [35], a power grid distribution network with backbone structure; 3) Bio-Worm [36], a worm gene network; and 4) Facebook [37], a dense social network from Facebook. We compare our method with the Euclidean GCN [2], the hyperbolic GCN (HGCN) [32], a spherical GCN, and product-manifold GCNs (κ-GCN) [33] with three signatures (H5×H5, H5×S5 and S5×S5). Besides, five variants of our model with time dimensions 1, 3, 5, 7 and 10 are implemented for comparison.
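As an aside on the δ-hyperbolicity statistic used above, a rough sampled estimate of the Gromov four-point condition can be computed with networkx as below; this simple sampler is our own illustration and not necessarily the exact procedure behind Table 9.

```python
import random
import networkx as nx

def sampled_delta_hyperbolicity(G, num_samples=5000, seed=0):
    """Estimate Gromov delta via the four-point condition on sampled quadruples:
    with S1 >= S2 >= S3 the sorted sums d(x,y)+d(z,w), d(x,z)+d(y,w), d(x,w)+d(y,z),
    delta(x,y,z,w) = (S1 - S2) / 2; report the maximum over samples."""
    rng = random.Random(seed)
    nodes = list(max(nx.connected_components(G), key=len))   # restrict to the largest component
    dist = dict(nx.all_pairs_shortest_path_length(G.subgraph(nodes)))
    delta = 0.0
    for _ in range(num_samples):
        x, y, z, w = rng.sample(nodes, 4)
        sums = sorted([dist[x][y] + dist[z][w],
                       dist[x][z] + dist[y][w],
                       dist[x][w] + dist[y][z]], reverse=True)
        delta = max(delta, (sums[0] - sums[1]) / 2.0)
    return delta

# A tree is 0-hyperbolic, while a long cycle is not.
print(sampled_delta_hyperbolicity(nx.balanced_tree(2, 5)))  # ~0.0
print(sampled_delta_hyperbolicity(nx.cycle_graph(50)))      # > 0
```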
Table 2: ROC AUC (%) for Link Prediction (LP) and F1 score for Node Classification (NC).

| Method | Airport (δ=1.0) LP | Airport NC | Pubmed (δ=3.5) LP | Pubmed NC | CiteSeer (δ=4.5) LP | CiteSeer NC | Cora (δ=11.0) LP | Cora NC |
|---|---|---|---|---|---|---|---|---|
| GCN | 89.24 ± 0.21 | 81.54 ± 0.60 | 91.31 ± 1.68 | 79.30 ± 0.60 | 85.48 ± 1.75 | 72.27 ± 0.64 | 88.52 ± 0.85 | 81.90 ± 0.41 |
| GAT | 90.35 ± 0.30 | 81.55 ± 0.53 | 87.45 ± 0.00 | 78.30 ± 0.00 | 87.24 ± 0.00 | 71.10 ± 0.00 | 85.73 ± 0.01 | 83.05 ± 0.08 |
| SAGE | 89.86 ± 0.52 | 82.79 ± 0.17 | 90.70 ± 0.07 | 77.30 ± 0.09 | 90.71 ± 0.20 | 69.20 ± 0.10 | 87.52 ± 0.22 | 74.90 ± 0.07 |
| SGC | 89.80 ± 0.34 | 80.69 ± 0.23 | 90.54 ± 0.07 | 78.60 ± 0.30 | 89.61 ± 0.23 | 71.60 ± 0.03 | 89.42 ± 0.11 | 81.60 ± 0.43 |
| HGCN (H16) | 96.03 ± 0.26 | 90.57 ± 0.36 | 96.08 ± 0.21 | 80.50 ± 1.23 | 96.31 ± 0.41 | 68.90 ± 0.63 | 91.62 ± 0.33 | 79.90 ± 0.18 |
| κ-GCN (H16) | 96.35 ± 0.62 | 87.92 ± 1.33 | 96.60 ± 0.32 | 77.96 ± 0.36 | 95.34 ± 0.16 | 73.25 ± 0.51 | 94.04 ± 0.34 | 79.80 ± 0.50 |
| κ-GCN (S16) | 90.38 ± 0.32 | 81.94 ± 0.58 | 94.84 ± 0.13 | 78.80 ± 0.49 | 95.79 ± 0.24 | 72.13 ± 0.51 | 93.20 ± 0.48 | 81.08 ± 1.45 |
| κ-GCN (H8×S8) | 93.10 ± 0.49 | 81.93 ± 0.45 | 94.89 ± 0.19 | 79.20 ± 0.65 | 93.44 ± 0.31 | 73.05 ± 0.59 | 92.22 ± 0.48 | 79.30 ± 0.81 |
| Q-GCN (Q15,1) | 96.30 ± 0.22 | 89.72 ± 0.52 | 95.42 ± 0.22 | 80.50 ± 0.26 | 94.76 ± 1.49 | 72.67 ± 0.76 | 93.14 ± 0.30 | 80.57 ± 0.20 |
| Q-GCN (Q14,2) | 94.37 ± 0.44 | 84.40 ± 0.35 | 96.86 ± 0.37 | 81.34 ± 1.54 | 94.78 ± 0.17 | 73.43 ± 0.58 | 93.41 ± 0.57 | 81.62 ± 0.21 |
| Q-GCN (Q13,3) | 92.53 ± 0.17 | 82.38 ± 1.53 | 96.20 ± 0.34 | 80.94 ± 0.45 | 94.54 ± 0.16 | 74.13 ± 1.41 | 93.56 ± 0.18 | 79.91 ± 0.42 |
| Q-GCN (Q2,14) | 90.03 ± 0.12 | 81.14 ± 1.32 | 94.30 ± 1.09 | 78.40 ± 0.39 | 94.80 ± 0.08 | 72.72 ± 0.47 | 94.17 ± 0.38 | 83.10 ± 0.35 |
| Q-GCN (Q1,15) | 89.07 ± 0.58 | 81.24 ± 0.34 | 94.66 ± 0.18 | 78.11 ± 1.38 | 97.01 ± 0.30 | 73.19 ± 1.58 | 94.81 ± 0.27 | 83.72 ± 0.43 |
| Q-GCN (Q0,16) | 89.01 ± 0.61 | 80.91 ± 0.65 | 94.49 ± 0.28 | 77.90 ± 0.80 | 96.21 ± 0.38 | 72.54 ± 0.27 | 95.16 ± 1.25 | 82.51 ± 0.32 |

Table 1: Graph reconstruction results in mAP (%); the top three results are highlighted. Standard deviations are relatively small (in the range [0, 1.2 × 10⁻²]) and are omitted.

| Model | Web-Edu (curvature -0.6) | Power (-0.3) | Bio-Worm (0.0) | Facebook (0.1) |
|---|---|---|---|---|
| GCN (E10) | 83.66 | 86.61 | 90.19 | 81.73 |
| HGCN (H10) | 88.33 | 93.80 | 93.12 | 83.40 |
| GCN (S10) | 82.72 | 92.73 | 88.98 | 81.04 |
| κ-GCN (H5×H5) | 89.21 | 94.40 | 94.00 | 84.94 |
| κ-GCN (S5×S5) | 86.70 | 94.58 | 90.36 | 84.56 |
| κ-GCN (H5×S5) | 87.96 | 95.82 | 94.74 | 87.73 |
| Q-GCN (Q9,1) | 87.03 | 94.35 | 92.83 | 81.60 |
| Q-GCN (Q7,3) | 99.67 | 100.00 | 97.23 | 87.74 |
| Q-GCN (Q5,5) | 98.49 | 100.00 | 95.75 | 87.03 |
| Q-GCN (Q3,7) | 97.31 | 95.08 | 90.14 | 91.75 |
| Q-GCN (Q0,10) | 82.57 | 94.20 | 88.67 | 83.81 |

Experimental settings. Following [20, 24, 33], we use one-hot embeddings as initial node features. To avoid the time dimensions being 0, we uniformly perturb each dimension with a small random value in the interval [−ε, ε], where ε = 0.02 in practice. In addition, the same 10-dimensional embedding and 2 hidden layers are used for all baselines to ensure a fair comparison. The learning rate is set to 0.01 and the learning rate of the curvature to 0.0001. Q-GCN is implemented with the Adam optimizer. We repeat each experiment 10 times with different random seeds influencing weight initialization and data batching.

Results. Table 1 shows the mean average precision (mAP) [20] of graph reconstruction on the four datasets. Q-GCN achieves the best performance across all benchmarks compared with both single Riemannian manifolds and product manifolds. We observe that, with proper signatures, the product spaces perform better than a single geometry, which is consistent with our statement that the expressive power of a single-geometry view is limited. Specifically, all top-three results are achieved by Q-GCN, with one exception on Power where H5×S5 achieves the third-best performance. More precisely, for datasets with smaller graph sectional curvature such as Web-Edu, Power and Bio-Worm, Q7,3 performs best, while Q3,7 performs best on Facebook, which has positive sectional curvature.
We conjecture that the number of time dimensions controls the geometry of the pseudo-hyperboloid. We find that graphs with more hierarchical structure tend to be embedded best with fewer time dimensions. Analyzing the sectional curvature in Fig. 4, this makes sense, as the mean sectional curvature of Power, Bio-Worm and Web-Edu is negative while it is positive for Facebook. These results give an intuition for determining the best time dimension from the geometric properties of a graph.

4.2 Node classification and link prediction

Datasets and baselines. We consider four benchmark datasets: Airport, Pubmed, CiteSeer and Cora, where Airport is an airline network and Pubmed, CiteSeer and Cora are citation networks. We observe in Fig. 4 that the graph sectional curvatures of the four datasets are consistently negative without significant differences; hence we additionally report the δ-hyperbolicity for comparison in Table 2. GCN [2], GAT [3], SAGE [38] and SGC [4] are used as Euclidean GCN counterparts. For non-Euclidean GCN baselines, we compare against HGCN [32] and κ-GCN [33] with its three variants as explained before. For Q-GCN, we empirically set the time dimension to 1, 2, 3, 14, 15 and 16, yielding six variants, since these settings best reflect the geometric properties of hyperbolic and spherical space, respectively.

Table 3: F1 score for node classification of MLP, HNN and Q-NN on Pubmed, CiteSeer and Cora.

| Method | Pubmed | CiteSeer | Cora |
|---|---|---|---|
| MLP | 72.30 ± 0.30 | 60.22 ± 0.42 | 55.80 ± 0.08 |
| HNN | 74.60 ± 0.40 | 59.92 ± 0.87 | 59.60 ± 0.09 |
| Q-NN (Q15,1) | 74.31 ± 0.33 | 59.33 ± 0.35 | 60.38 ± 0.56 |
| Q-NN (Q14,2) | 76.26 ± 0.31 | 64.33 ± 0.35 | 62.77 ± 0.30 |
| Q-NN (Q13,3) | 75.85 ± 0.79 | 63.65 ± 0.57 | 59.04 ± 0.45 |
| Q-NN (Q2,14) | 74.44 ± 0.68 | 60.48 ± 0.29 | 63.85 ± 0.22 |
| Q-NN (Q1,15) | 73.44 ± 0.28 | 60.33 ± 0.40 | 64.85 ± 0.24 |
| Q-NN (Q0,16) | 73.31 ± 0.17 | 61.05 ± 0.22 | 63.96 ± 0.41 |

Table 4: Running time (sec) of graph reconstruction on Web-Edu and Facebook.

| Manifold | Web-Edu | Facebook |
|---|---|---|
| GCN (E10) | 2284 | 5456 |
| Prod-GCN (H5×S5) | 4336 | 10338 |
| Q-GCN (Q9,1) | 2769 | 6981 |
| Q-GCN (Q7,3) | 3363 | 6303 |
| Q-GCN (Q5,5) | 3620 | 7142 |
| Q-GCN (Q3,7) | 3685 | 7512 |
| Q-GCN (Q1,9) | 3532 | 7980 |
| Q-GCN (Q0,10) | 2778 | 7037 |

Experimental settings. For node classification, we use the same dataset split as [39] for the citation datasets, where 20 nodes per class are used for training, 500 nodes for validation and 1000 nodes for testing. For Airport, we split the nodes into 70/15/15 percent. For link prediction, the edges are split into 85/5/10 percent for training, validation and testing on all datasets. To ensure a fair comparison, we use the same 16-dimensional hidden embedding, 0.01 initial learning rate and 0.0001 learning rate for the curvature. The optimal regularization with weight decay, dropout rate, number of layers and activation function is obtained by grid search for each method. We report the mean accuracy over 10 random seeds influencing weight initialization and batching sequence.

Results. Table 2 shows the averaged ROC AUC for link prediction and the F1 score for node classification. As the δ-hyperbolicity indicates, Airport and Pubmed are more hierarchical than CiteSeer and Cora. For Airport and Pubmed, with dominating hierarchical properties (lower δ), Q-GCNs with fewer time dimensions achieve results on par with hyperbolic-space methods such as HGCN [32] and κ-GCN (H16). For CiteSeer and Cora, with less tree-like structure (higher δ), Q-GCNs achieve state-of-the-art results, showcasing the flexibility of our model to embed complex graphs with different curvatures.
More specifically, Q-GCN with more time dimensions consistently performs best on Cora, while for CiteSeer, albeit Q-GCN achieves the best results, the best-performing variants are not consistent across the two tasks.

4.3 Parameter sensitivity and analysis

Figure 2: The mAP of graph reconstruction with a varying number of time dimensions.

Time dimension. We study the influence of the time dimension on graph reconstruction by varying the number of time dimensions under the condition s + t = 10. Fig. 2 shows that the time dimension t acts as a knob for controlling the geometric properties of Q^{s,t}_β. The best performance is achieved neither in the hyperboloid case (t = 1) nor in the sphere case (t = 10), showcasing the advantage of Q^{s,t}_β for representing graphs of mixed topologies. On Web-Edu, Power and Bio-Worm, which have smaller mean sectional curvature, the performance decreases as t increases beyond an optimum reached at a lower t, while on Facebook, with larger (positive) mean sectional curvature, the performance gradually increases with t until it peaks at a higher t. This is consistent with our hypothesis that graphs with more hierarchical structure tend to be embedded well in Q^{s,t}_β with smaller t, while cyclical data aligns well with larger t. The results give an intuition for determining the best time dimension from the geometric properties of a graph.

Figure 3: Visualization of the learned embeddings for link prediction on Pubmed (left) and Cora (right), where colors denote node classes. We apply (a) spherical projection, (b) hyperbolic projection (3D) and (c) hyperbolic projection (Poincaré disk) to the learned embeddings of Q-GCN to visualize different views of the embeddings.

Q-NN vs. Q-GCN. We also introduce Q-NN, a generalization of the MLP to pseudo-Riemannian manifolds, defined as multiple layers of f(x) = σ(W ⊗^β x ⊕^β b), where σ is the tangential activation. Table 3 shows that Q-NN with an appropriate time dimension outperforms MLP and HNN on node classification, showcasing the expressive power of the pseudo-hyperboloid. Furthermore, compared with the Q-GCN results in Table 2, Q-GCN performs better than Q-NN, suggesting the benefit of neighborhood aggregation equipped with the proposed GCN operations.

Computation efficiency. We compare the running time per epoch of Q-GCN, GCN and Prod-GCN. Table 4 shows that Q-GCN is more efficient than Prod-GCN (H5×S5), mainly because the Euclidean component R^s of our diffeomorphic manifold (S^t_β × R^s) runs faster than the non-Euclidean components in H5×S5. The additional running time mainly comes from the mapping operations and the projection of the time dimensions onto S^t_β; the running time grows as the number of time dimensions increases. Overall, albeit slower than the Euclidean GCN, the running time of all Q-GCN variants is less than twice that of the Euclidean GCN, which is within acceptable limits.

Visualization. To visualize the embeddings learned by Q-GCN, we use the UMAP tool² to project the embeddings learned for link prediction on Pubmed and Cora into low-dimensional spherical and hyperbolic spaces. The projections include a spherical projection onto a 3D sphere, a hyperbolic projection onto a 3D plane, and the 2D Poincaré disk. As shown in Fig. 3 (a, b), for Pubmed, which has more hyperbolic structure, the class separability is more pronounced in the hyperbolic projection than in the spherical projection.
The opposite holds for the less tree-like Cora. Furthermore, Fig. 3 (c) provides a clearer view of the hierarchy: there are more hub nodes near the origin of the Poincaré disk for Pubmed than for Cora, showcasing the dominating tree-likeness of Pubmed.

5 Conclusion

In this paper, we generalize GCNs to pseudo-Riemannian manifolds of constant nonzero curvature with elegant theories of diffeomorphic geodesic tools. The proposed Q-GCN has the flexibility to fit complex graphs with mixed curvatures and shows promising results on graph reconstruction, node classification and link prediction. One limitation is the choice of the time dimension; we provide some insights for choosing it, but this could still be improved, which we leave for future work. The developed geodesic tools are application-agnostic and could be extended to other deep learning methods. We foresee that our work will shed light on the direction of non-Euclidean geometric deep learning.

Acknowledgments

The authors thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Bo Xiong. Bo Xiong was funded by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 860801. Nico Potyka was partially funded by DFG projects Evowipe/COFFEE. Shirui Pan was supported in part by an ARC Future Fellowship (FT210100097). This work was funded by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy EXC 2075-390740016 (SimTech).

²https://umap-learn.readthedocs.io/en/latest/index.html

References

[1] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. In Yoshua Bengio and Yann LeCun, editors, 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014.

[2] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017.

[3] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018.

[4] Felix Wu, Amauri H. Souza Jr., Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Q. Weinberger. Simplifying graph convolutional networks. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 6861-6871. PMLR, 2019.

[5] Ke Sun, Jun Wang, Alexandros Kalousis, and Stéphane Marchand-Maillet. Space-time local embeddings. In Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett, editors, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 100-108, 2015.

[6] Marián Boguñá, Ivan Bonamassa, Manlio De Domenico, Shlomo Havlin, Dmitri Krioukov, and M. Ángeles Serrano. Network geometry. Nature Reviews Physics, pages 1-22, 2021.

[7] Mikhael Gromov. Hyperbolic groups.
In Essays in Group Theory, pages 75-263. Springer, 1987.

[8] Dmitri Krioukov, Fragkiskos Papadopoulos, Maksim Kitsak, Amin Vahdat, and Marián Boguñá. Hyperbolic geometry of complex networks. Physical Review E, 82(3):036106, 2010.

[9] Bo Xiong, Michael Cochez, Mojtaba Nayyeri, and Steffen Staab. Hyperbolic embedding inference for structured multi-label prediction. Advances in Neural Information Processing Systems, 2022.

[10] Menglin Yang, Zhihao Li, Min Zhou, Jiahong Liu, and Irwin King. HICF: Hyperbolic informative collaborative filtering. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2212-2221, 2022.

[11] Menglin Yang, Min Zhou, Jiahong Liu, Defu Lian, and Irwin King. HRCF: Enhancing collaborative filtering via hyperbolic geometric regularization. In Proceedings of the ACM Web Conference 2022, pages 2462-2471, 2022.

[12] Richard C. Wilson, Edwin R. Hancock, Elżbieta Pękalska, and Robert P. W. Duin. Spherical and hyperbolic embeddings of data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(11):2255-2269, 2014.

[13] Yu Meng, Jiaxin Huang, Guangyuan Wang, Chao Zhang, Honglei Zhuang, Lance M. Kaplan, and Jiawei Han. Spherical text embedding. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 8206-8215, 2019.

[14] Michaël Defferrard, Nathanaël Perraudin, Tomasz Kacprzak, and Raphael Sgier. DeepSphere: towards an equivariant graph-based spherical CNN. ICLR 2019 Workshop on Representation Learning on Graphs and Manifolds, 2019.

[15] Octavian-Eugen Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic neural networks. In Advances in Neural Information Processing Systems, pages 5350-5360, 2018.

[16] Qi Liu, Maximilian Nickel, and Douwe Kiela. Hyperbolic graph neural networks. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 8228-8239, 2019.

[17] Shichao Zhu, Shirui Pan, Chuan Zhou, Jia Wu, Yanan Cao, and Bin Wang. Graph geometry interaction learning. In Advances in Neural Information Processing Systems, 2020.

[18] Yiding Zhang, Xiao Wang, Chuan Shi, Nian Liu, and Guojie Song. Lorentzian graph convolutional networks. In International World Wide Web Conference Committee, 2021.

[19] Jindou Dai, Yuwei Wu, Zhi Gao, and Yunde Jia. A hyperbolic-to-hyperbolic graph convolutional network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 154-163, 2021.

[20] Albert Gu, Frederic Sala, Beliz Gunel, and Christopher Ré. Learning mixed-curvature representations in product spaces. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019.

[21] Ondrej Skopek, Octavian-Eugen Ganea, and Gary Bécigneul. Mixed-curvature variational autoencoders. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020.

[22] Julian Laub and Klaus-Robert Müller. Feature discovery in non-metric pairwise data. The Journal of Machine Learning Research, 5:801-818, 2004.
[23] James R. Clough and Tim S. Evans. Embedding graphs in Lorentzian spacetime. PLoS ONE, 12(11):e0187301, 2017.

[24] Marc T. Law and Jos Stam. Ultrahyperbolic representation learning. In Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.

[25] Aaron Sim, Maciej Wiatrak, Angus Brayne, Páidí Creed, and Saee Paliwal. Directed graph embeddings in pseudo-Riemannian manifolds. Thirty-eighth International Conference on Machine Learning, 2021.

[26] Bo Xiong, Shichao Zhu, Mojtaba Nayyeri, Chengjin Xu, Shirui Pan, Chuan Zhou, and Steffen Staab. Ultrahyperbolic knowledge graph embeddings. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022.

[27] Barrett O'Neill. Semi-Riemannian geometry with applications to relativity. Academic Press, 103, 1983.

[28] Thomas James Willmore. An introduction to differential geometry. Courier Corporation, 2013.

[29] Daniel Spiegel. The Hopf-Rinow theorem. Notes available online, 2016.

[30] Fabio Giannoni, Paolo Piccione, and Rosella Sampalmieri. On the geodesical connectedness for a class of semi-Riemannian manifolds. Journal of Mathematical Analysis and Applications, 252(1):444-476, 2000.

[31] Tingran Gao, Lek-Heng Lim, and Ke Ye. Semi-Riemannian manifold optimization. CoRR, 2018.

[32] Ines Chami, Zhitao Ying, Christopher Ré, and Jure Leskovec. Hyperbolic graph convolutional neural networks. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 4869-4880, 2019.

[33] Gregor Bachmann, Gary Bécigneul, and Octavian Ganea. Constant curvature graph convolutional networks. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 486-496. PMLR, 2020.

[34] David Gleich, Leonid Zhukov, and Pavel Berkhin. Fast parallel PageRank: A linear system approach. Yahoo! Research Technical Report YRL-2004-038, available via http://research.yahoo.com/publication/YRL-2004-038.pdf, 13:22, 2004.

[35] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440-442, 1998.

[36] Ara Cho, Junha Shin, Sohyun Hwang, Chanyoung Kim, Hongseok Shim, Hyojin Kim, Hanhae Kim, and Insuk Lee. WormNet v3: a network-assisted hypothesis-generating server for Caenorhabditis elegans. Nucleic Acids Research, 42(W1):W76-W82, 2014.

[37] Julian J. McAuley and Jure Leskovec. Learning to discover social circles in ego networks. In Peter L. Bartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q. Weinberger, editors, Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States, pages 548-556, 2012.

[38] William L. Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs.
In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 1024-1034, 2017.

[39] Zhilin Yang, William W. Cohen, and Ruslan Salakhutdinov. Revisiting semi-supervised learning with graph embeddings. In Maria-Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, volume 48 of JMLR Workshop and Conference Proceedings, pages 40-48. JMLR.org, 2016.

[40] Maximilian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical representations. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 6338-6347, 2017.

[41] Bhuwan Dhingra, Christopher Shallue, Mohammad Norouzi, Andrew Dai, and George Dahl. Embedding text in hyperbolic spaces. In Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12), pages 59-69, New Orleans, Louisiana, USA, 2018. Association for Computational Linguistics.

[42] Ivana Balazevic, Carl Allen, and Timothy M. Hospedales. Multi-relational Poincaré graph embeddings. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 4465-4475, 2019.

[43] Nurendra Choudhary, Nikhil Rao, Sumeet Katariya, Karthik Subbian, and Chandan K. Reddy. Self-supervised hyperboloid representations from logical queries over knowledge graphs. In WWW, pages 1373-1384. ACM / IW3C2, 2021.

[44] Tim R. Davidson, Luca Falorsi, Nicola De Cao, Thomas Kipf, and Jakub M. Tomczak. Hyperspherical variational auto-encoders. In Amir Globerson and Ricardo Silva, editors, Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI 2018, Monterey, California, USA, August 6-10, 2018, pages 856-865. AUAI Press, 2018.

[45] Jiacheng Xu and Greg Durrett. Spherical latent spaces for stable variational autoencoders. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4503-4513, Brussels, Belgium, 2018. Association for Computational Linguistics.

[46] Zhen Han, Peng Chen, Yunpu Ma, and Volker Tresp. DyERNIE: Dynamic evolution of Riemannian manifold embeddings for temporal knowledge graph completion. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7301-7316, Online, 2020. Association for Computational Linguistics.

[47] Marc Law. Ultrahyperbolic neural networks. Advances in Neural Information Processing Systems, 34, 2021.

[48] H. Anciaux. Minimal submanifolds in pseudo-Riemannian geometry. World Scientific, 2011.

[49] Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie Jegelka. Representation learning on graphs with jumping knowledge networks.
In Jennifer G. Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, volume 80 of Proceedings of Machine Learning Research, pages 5449-5458. PMLR, 2018.

[50] Ming Chen, Zhewei Wei, Zengfeng Huang, Bolin Ding, and Yaliang Li. Simple and deep graph convolutional networks. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 1725-1735. PMLR, 2020.

[51] Yulong Pei, Tianjin Huang, Werner van Ipenburg, and Mykola Pechenizkiy. ResGCN: Attention-based deep residual modeling for anomaly detection on attributed networks. CoRR, 2020.