# A Self-Supervised Mixed-Curvature Graph Neural Network

Li Sun¹, Zhongbao Zhang², Junda Ye², Hao Peng³, Jiawei Zhang⁴, Sen Su² and Philip S. Yu⁵

¹School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
²School of Computer Science, Beijing University of Posts and Telecommunications, China
³Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China
⁴IFM Lab, Department of Computer Science, University of California, Davis, CA, USA
⁵Department of Computer Science, University of Illinois at Chicago, IL, USA

ccesunli@ncepu.edu.cn; {zhongbaozb, susen}@bupt.edu.cn; penghao@act.buaa.edu.cn; jiawei@ifmlab.org; psyu@uic.edu

Corresponding Author: Li Sun, ccesunli@ncepu.edu.cn. Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Graph representation learning has received increasing attention in recent years. Most existing methods ignore the complexity of graph structures and restrict graphs to a single constant-curvature representation space, which is in fact only suitable for particular kinds of graph structure. Additionally, these methods follow the supervised or semi-supervised learning paradigm, which notably limits their deployment on the unlabeled graphs of real applications. To address these limitations, we make the first attempt to study self-supervised graph representation learning in mixed-curvature spaces. In this paper, we present a novel Self-Supervised Mixed-Curvature Graph Neural Network (SELFMGNN). To capture the complex graph structures, we construct a mixed-curvature space via the Cartesian product of multiple Riemannian component spaces, and design hierarchical attention mechanisms for learning and fusing graph representations across these component spaces. To enable self-supervised learning, we propose a novel dual contrastive approach. The constructed mixed-curvature space actually provides multiple Riemannian views for contrastive learning. We introduce a Riemannian projector to reveal these views, and utilize a well-designed Riemannian discriminator for the single-view and cross-view contrastive learning within and across the Riemannian views. Finally, extensive experiments show that SELFMGNN captures the complex graph structures and outperforms state-of-the-art baselines.

## Introduction

Graph representation learning (Cui et al. 2018; Hamilton, Ying, and Leskovec 2017) is of fundamental importance in various applications, such as link prediction and node classification (Kipf and Welling 2017), and thus receives increasing attention from both academia and industry. Meanwhile, we have also observed great limitations of the existing graph representation learning methods in two major perspectives, described as follows:

Representation Space: Most existing methods ignore the complexity of real graph structures and limit the graphs to a single constant-curvature representation space (Gu et al. 2019). Such methods can only work well on the particular kinds of structure that they are designed for. For instance, the constant negative curvature hyperbolic space is well-suited for graphs with hierarchical or tree-like structures (Liu, Nickel, and Kiela 2019).
The constant positive curvature spherical space is especially suitable for data with cyclical structures, e.g., triangles and cliques (Bachmann, Bécigneul, and Ganea 2020), and the zero-curvature Euclidean space for grid data (Wu et al. 2021). However, graph structures in reality are usually mixed and complicated rather than uniform, hierarchical in some regions while cyclical in others (Papadopoulos et al. 2012; Ravasz and Barabási 2003). Even more challenging, the curvatures over different hierarchical or cyclical regions can be different, as will be shown in this paper. In fact, this calls for a new representation space to match the wide variety of graph structures, and we seek spaces of mixed curvature to provide better representations.

Learning Paradigm: Learning graph representations usually requires abundant label supervision (Veličković et al. 2018; Chami et al. 2019). Labels are usually scarce in real applications: labeling graphs requires expensive manual annotation or paying for permission, and labels may even be impossible to acquire because of privacy policies. Fortunately, the rich information in graphs provides the potential for self-supervised learning, i.e., learning representations without labels (Liu et al. 2021). Self-supervised graph representation learning is a more favorable choice, particularly when we intend to take advantage of the unlabeled graphs in real applications. Recently, contrastive learning (Veličković et al. 2019; Qiu et al. 2020) has emerged as a successful method for graph self-supervised learning. However, existing self-supervised methods, to the best of our knowledge, cannot be applied to mixed-curvature spaces due to the intrinsic differences in the geometry.

To address these aforementioned limitations, we make the first attempt to study self-supervised graph representation learning in the mixed-curvature space in this paper. To this end, we present a novel Self-Supervised Mixed-Curvature Graph Neural Network, named SELFMGNN. To address the first limitation, we propose to learn the representations in a mixed-curvature space. Concretely, we first construct a mixed-curvature space via the Cartesian product of multiple Riemannian hyperbolic, spherical and Euclidean component spaces, jointly enjoying the strength of different curvatures to match the complicated graph structures. Then, we introduce hierarchical attention mechanisms for learning and fusing representations in the product space. In particular, we design an intra-component attention for the learning within a component space and an inter-component attention for the fusing across component spaces. To address the second limitation, we propose a novel dual contrastive approach to enable self-supervised learning. The constructed mixed-curvature space actually provides multiple Riemannian views for contrastive learning. Concretely, we first introduce a Riemannian projector to reveal these views, i.e., the hyperbolic, spherical and Euclidean views. Then, we introduce the single-view and cross-view contrastive learning. In particular, we utilize a well-designed Riemannian discriminator to contrast positive and negative samples in the same Riemannian view (i.e., the single-view contrastive learning) and concurrently contrast between different Riemannian views (i.e., the cross-view contrastive learning).
In the experiments, we study the curvatures of real graphs and, for the first time, show the advantages of allowing multiple positive- and negative-curvature components, demonstrating the superiority of SELFMGNN. Overall, our main contributions are summarized below:

- Problem: To the best of our knowledge, this is the first attempt to study self-supervised graph representation learning in the mixed-curvature space.
- Model: This paper presents a novel SELFMGNN model, where hierarchical attention mechanisms and a dual contrastive approach are designed for self-supervised learning in the mixed-curvature space, allowing multiple hyperbolic (spherical) components with distinct curvatures.
- Experiments: Extensive experiments show that the curvatures over different hierarchical (spherical) regions of a graph can be different. SELFMGNN captures the complicated graph structures without labels and outperforms the state-of-the-art baselines.

## Preliminaries and Problem Definition

In this section, we first present the preliminaries and notations necessary to construct a mixed-curvature space. Then, we formulate the problem of self-supervised graph representation learning in the mixed-curvature space.

### Riemannian Manifold

A smooth manifold $\mathcal{M}$ generalizes the notion of the surface to higher dimensions. Each point $x \in \mathcal{M}$ is associated with a tangent space $T_x\mathcal{M}$, the first-order approximation of $\mathcal{M}$ around $x$, which is locally Euclidean. On the tangent space $T_x\mathcal{M}$, the Riemannian metric $g_x(\cdot, \cdot): T_x\mathcal{M} \times T_x\mathcal{M} \to \mathbb{R}$ defines an inner product so that geometric notions can be induced. The tuple $(\mathcal{M}, g)$ is called a Riemannian manifold. Transforming between the tangent space and the manifold is done via the exponential and logarithmic maps, respectively. For $x \in \mathcal{M}$, the exponential map at $x$, $\exp_x(v): T_x\mathcal{M} \to \mathcal{M}$, projects the vector $v \in T_x\mathcal{M}$ onto the manifold $\mathcal{M}$. The logarithmic map at $x$, $\log_x(y): \mathcal{M} \to T_x\mathcal{M}$, projects the vector $y \in \mathcal{M}$ back to the tangent space $T_x\mathcal{M}$. For further expositions, please refer to mathematical materials (Kreyszig 1968; Hopper and Andrews 2010).

### Constant Curvature Space

The Riemannian metric also defines a curvature $\kappa(x)$ at each point, which determines how the space is curved. If the curvature is uniformly distributed, $(\mathcal{M}, g)$ is called a constant curvature space of curvature $\kappa$. There are 3 canonical types of constant curvature space that we can define with respect to the sign of the curvature: a positively curved spherical space $\mathbb{S}$ with $\kappa > 0$, a negatively curved hyperbolic space $\mathbb{H}$ with $\kappa < 0$, and the flat Euclidean space $\mathbb{E}$ with $\kappa = 0$. Note that $\|\cdot\|_2$ denotes the Euclidean norm in this paper.

### Problem Definition

In this paper, we propose to study self-supervised graph representation learning in the mixed-curvature space. Without loss of generality, a graph is described as $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{X})$, where $\mathcal{V} = \{v_1, \cdots, v_n\}$ is the node set and $\mathcal{E} = \{(v_i, v_j) \mid v_i, v_j \in \mathcal{V}\}$ is the edge set. We summarize the edges in the adjacency matrix $\mathbf{G}$, where $G_{ij} = 1$ iff $(v_i, v_j) \in \mathcal{E}$, and $0$ otherwise. Each node $v_i$ is associated with a feature vector $\mathbf{x}_i \in \mathbb{R}^d$, and the matrix $\mathbf{X} \in \mathbb{R}^{|\mathcal{V}| \times d}$ represents the features of all nodes. Now, we give the studied problem:

Problem Definition (Self-supervised graph representation learning in the mixed-curvature space). Given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{X})$, the problem of self-supervised graph representation learning in the mixed-curvature space is to learn an encoding function $\Phi: \mathcal{V} \to \mathbb{P}$ that maps each node $v$ to a vector $\mathbf{z}$ in a mixed-curvature space $\mathbb{P}$ capturing the intrinsic complexity of the graph structure, without using any label information.
In other words, the graph representation model should align with the complex graph structures, hierarchical as well as cyclical, and should be learnable without external guidance (labels). Graphs in reality are usually of mixed curvature rather than structured uniformly, i.e., hierarchical in some regions while cyclical in others. A constant-curvature model (e.g., a hyperbolic, spherical or Euclidean model) benefits from its specific bias to better fit particular structure types. To bridge this gap, we propose to work with the mixed-curvature space to cover the complex graph structures in real-world applications.

## SELFMGNN: Our Proposed Model

To address this problem, we present a novel Self-supervised Mixed-curvature Graph Neural Network (SELFMGNN). In a nutshell, SELFMGNN learns graph representations in the mixed-curvature space, and is equipped with a dual contrastive approach to enable its self-supervised learning. We illustrate the architecture of SELFMGNN in Fig. 1. We will elaborate on the mixed-curvature graph representation learning and the dual contrastive approach in the following sections.

Figure 1: Overall architecture of SELFMGNN. In SELFMGNN, we first introduce a mixed-curvature GNN to learn graph representations. Specifically, we perform attentional aggregation within each component space, where the triangle in the figure illustrates its geometry (e.g., triangles curve inward in $\mathbb{H}$), and attentional concatenation among component spaces, whose example with learnable weights is shown on the top right. Then, to enable self-supervised learning, we design a Riemannian projector to reveal different views of the mixed-curvature space, and utilize a well-designed Riemannian discriminator $D^\kappa$ to contrast samples for single- and cross-view contrastive learning, shown in the middle. In practice, we feed the graph and its congruent augmentation, generated by $\Gamma(\cdot)$, into the contrastive learning as specified in Algorithm 1.

### Mixed-curvature GNN

To learn graph representations in a mixed-curvature space, we construct a mixed-curvature space as the Cartesian product of multiple Riemannian component spaces, in which we propose a mixed-curvature GNN with hierarchical attention mechanisms. In particular, we first stack intra-component attentional layers in each component space to learn constant-curvature component representations. Then, we design an inter-component attentional layer across component spaces to fuse these component representations so as to obtain output mixed-curvature representations matching the complex graph structures.

Constructing a Mixed-Curvature Space: We leverage the Cartesian product to construct the mixed-curvature space. With $K$ constant-curvature spaces $\{\mathcal{M}_i\}_{i=1}^{K}$ indexed by the subscript $i$, we perform the Cartesian product over them and obtain the resulting product space $\mathbb{P} = \times_{i=1}^{K} \mathcal{M}_i$, where $\times$ denotes the Cartesian product and $\mathcal{M}_i$ is known as a component space. By fusing multiple constant-curvature spaces, the product space is constructed with non-constant mixed curvatures, matching the complex graph structures.
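The following minimal sketch illustrates the bookkeeping implied by this combinatorial construction: a signature listing the component spaces, plus helpers that split a product-space point into its component embeddings and concatenate them back. The concrete signature, the curvature values, and the helper names are illustrative assumptions, not the authors' released implementation.

```python
import torch

# Illustrative signature of a product space P = H^4 x H^4 x S^4 x S^4 x E^8.
signature = [
    {"type": "H", "dim": 4, "curvature": -1.0},   # hyperbolic component
    {"type": "H", "dim": 4, "curvature": -0.5},
    {"type": "S", "dim": 4, "curvature": +1.0},   # spherical component
    {"type": "S", "dim": 4, "curvature": +0.5},
    {"type": "E", "dim": 8, "curvature": 0.0},    # a single Euclidean component suffices
]
assert sum(c["dim"] for c in signature) == 24     # total dimensionality d

def split_components(x, signature):
    """Split a product-space point x in R^d into its per-component embeddings."""
    return list(torch.split(x, [c["dim"] for c in signature], dim=-1))

def concat_components(parts):
    """Concatenate component embeddings back into a product-space point."""
    return torch.cat(parts, dim=-1)

x = torch.randn(24)                       # a toy point of P
parts = split_components(x, signature)    # one tensor per component space
assert torch.allclose(concat_components(parts), x)
```

Because the construction is purely a concatenation, each component embedding can be processed by its own constant-curvature layers, and only the final fusion needs to know the signature.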
The product space $\mathbb{P}$ with dimensionality $d$ is described by its signature, which has three degrees of freedom per component: i) the model $\mathcal{M}_i$, ii) the dimensionality $d_i$ and iii) the curvature $\kappa_i$, where $\sum_{i=1}^{K} d_i = d$. We use a shorthand notation for repeated components, $(\mathcal{M}^{k})^{j} = \times_{i=1}^{j} \mathcal{M}^{k}$. Note that SELFMGNN can have multiple hyperbolic or spherical components with distinct learnable curvatures, and such a design enables us to cover a wider range of curvatures for better representations. However, we only need one Euclidean space, since the Cartesian product of Euclidean spaces is itself a Euclidean space, $\mathbb{E}^{d_0} = \times_{i=1}^{j} \mathbb{E}^{d_i}$ with $\sum_{i=1}^{j} d_i = d_0$. The Cartesian product introduces a simple and interpretable combinatorial construction of the mixed-curvature space. For $\mathbf{x} = \Vert_{i=1}^{K}\, \mathbf{x}_{\mathcal{M}_i}$ in the product space $\mathbb{P}$, $\mathbf{x}_{\mathcal{M}_i}$ denotes the component embedding in $\mathcal{M}_i$ and $\Vert$ denotes vector concatenation. Thanks to the combinatorial construction, we can first learn representations in each component space and then fuse these representations in the product space.

Model and Operations: Before discussing the learning and fusing of the representations, we give the model of the component spaces and provide the generalized Riemannian operations in the component spaces. We opt for the $\kappa$-stereographic model as it unifies spaces of both positive and negative curvature and unifies operations with the gyrovector formalism. Specifically, the $\kappa$-stereographic model is a smooth manifold $\mathcal{M}^d_\kappa = \{\mathbf{z} \in \mathbb{R}^d \mid -\kappa \|\mathbf{z}\|_2^2 < 1\}$, whose origin is $\mathbf{0} \in \mathbb{R}^d$, equipped with a Riemannian metric $g^\kappa_{\mathbf{z}} = (\lambda^\kappa_{\mathbf{z}})^2 \mathbf{I}$, where $\lambda^\kappa_{\mathbf{z}}$ is given below:

$$\lambda^\kappa_{\mathbf{z}} = 2\left(1 + \kappa \|\mathbf{z}\|_2^2\right)^{-1}. \quad (1)$$

In particular, $\mathcal{M}^d_\kappa$ is the stereographic sphere model for the spherical space ($\kappa > 0$), while it is the Poincaré ball model of radius $1/\sqrt{-\kappa}$ for the hyperbolic space ($\kappa < 0$). We summarize all the necessary operations for this paper in Table 1 with the curvature-aware definition of the trigonometric functions. Specifically, $\tan_\kappa(\cdot) = \tan(\cdot)$ if $\kappa > 0$ and $\tan_\kappa(\cdot) = \tanh(\cdot)$ if $\kappa < 0$. Note that bold letters denote vectors on the manifold.

| Operation | Euclidean space $\mathbb{E}^d$ | $\kappa$-stereographic model ($\mathbb{H}^d$ / $\mathbb{S}^d$) |
|---|---|---|
| Distance metric | $d_{\mathcal{M}}(\mathbf{x},\mathbf{y})=\lVert\mathbf{x}-\mathbf{y}\rVert_2$ | $d^{\kappa}_{\mathcal{M}}(\mathbf{x},\mathbf{y})=\frac{2}{\sqrt{\lvert\kappa\rvert}}\tan_{\kappa}^{-1}\!\left(\sqrt{\lvert\kappa\rvert}\,\lVert(-\mathbf{x})\oplus_{\kappa}\mathbf{y}\rVert_2\right)$ |
| Exponential map | $\exp_{\mathbf{x}}(\mathbf{v})=\mathbf{x}+\mathbf{v}$ | $\exp^{\kappa}_{\mathbf{x}}(\mathbf{v})=\mathbf{x}\oplus_{\kappa}\left(\tan_{\kappa}\!\left(\sqrt{\lvert\kappa\rvert}\,\tfrac{\lambda^{\kappa}_{\mathbf{x}}\lVert\mathbf{v}\rVert_2}{2}\right)\tfrac{\mathbf{v}}{\sqrt{\lvert\kappa\rvert}\,\lVert\mathbf{v}\rVert_2}\right)$ |
| Logarithmic map | $\log_{\mathbf{x}}(\mathbf{y})=\mathbf{y}-\mathbf{x}$ | $\log^{\kappa}_{\mathbf{x}}(\mathbf{y})=\tfrac{2}{\sqrt{\lvert\kappa\rvert}\,\lambda^{\kappa}_{\mathbf{x}}}\tan_{\kappa}^{-1}\!\left(\sqrt{\lvert\kappa\rvert}\,\lVert(-\mathbf{x})\oplus_{\kappa}\mathbf{y}\rVert_2\right)\tfrac{(-\mathbf{x})\oplus_{\kappa}\mathbf{y}}{\lVert(-\mathbf{x})\oplus_{\kappa}\mathbf{y}\rVert_2}$ |
| Addition | $\mathbf{x}\oplus\mathbf{y}=\mathbf{x}+\mathbf{y}$ | $\mathbf{x}\oplus_{\kappa}\mathbf{y}=\tfrac{(1-2\kappa\mathbf{x}^{\top}\mathbf{y}-\kappa\lVert\mathbf{y}\rVert_2^2)\mathbf{x}+(1+\kappa\lVert\mathbf{x}\rVert_2^2)\mathbf{y}}{1-2\kappa\mathbf{x}^{\top}\mathbf{y}+\kappa^2\lVert\mathbf{x}\rVert_2^2\lVert\mathbf{y}\rVert_2^2}$ |
| Scalar-vector multiplication | $r\otimes\mathbf{x}=r\mathbf{x}$ | $r\otimes_{\kappa}\mathbf{x}=\exp^{\kappa}_{\mathbf{0}}\!\left(r\log^{\kappa}_{\mathbf{0}}(\mathbf{x})\right)$ |
| Matrix-vector multiplication | $\mathbf{M}\otimes\mathbf{x}=\mathbf{M}\mathbf{x}$ | $\mathbf{M}\otimes_{\kappa}\mathbf{x}=\exp^{\kappa}_{\mathbf{0}}\!\left(\mathbf{M}\log^{\kappa}_{\mathbf{0}}(\mathbf{x})\right)$ |
| Applying functions | $f(\mathbf{x})$ | $f^{\otimes_{\kappa}}(\mathbf{x})=\exp^{\kappa}_{\mathbf{0}}\!\left(f\!\left(\log^{\kappa}_{\mathbf{0}}(\mathbf{x})\right)\right)$ |

Table 1: Summary of the operations in constant-curvature spaces (hyperbolic $\mathbb{H}^d$, spherical $\mathbb{S}^d$ and Euclidean $\mathbb{E}^d$).

Intra-Component Attentional Layer: This is the building-block layer of the proposed mixed-curvature GNN. In this layer, we update node representations by attentionally aggregating the representations of their neighbors in the constant-curvature component space. As the importance of neighbors is usually different, we introduce the intra-component attention to learn the importance of the neighbors. Specifically, we first lift to the tangent space via $\log^\kappa_{\mathbf{0}}$ and model the importance parameterized by $\boldsymbol{\theta}_{intra}$ as follows:

$$\mathrm{att}_{intra}(\mathbf{z}_i, \mathbf{z}_j) = \sigma\left(\boldsymbol{\theta}_{intra}^{\top} \left[\mathbf{W}\log^\kappa_{\mathbf{0}}(\mathbf{z}_i) \,\Vert\, \mathbf{W}\log^\kappa_{\mathbf{0}}(\mathbf{z}_j)\right]\right), \quad (2)$$

where $\mathbf{W}$ is the shared weight matrix and $\sigma(\cdot)$ denotes the sigmoid activation. $\log^\kappa_{\mathbf{0}}$ and $\exp^\kappa_{\mathbf{0}}$ are the logarithmic and exponential maps defined in Table 1, respectively.
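Before moving on to the aggregation step, the sketch below implements the handful of primitives from Table 1 that the layers rely on: the curvature-aware trigonometric functions, the maps at the origin, the $\kappa$-addition and the distance, restricted to non-zero curvature. The function names, the numerical clamping, and the restriction to the origin as base point are our own simplifications for illustration.

```python
import torch

def tan_k(u, kappa):
    # curvature-aware tangent: tan for spherical (kappa > 0), tanh for hyperbolic (kappa < 0)
    return torch.tan(u) if kappa > 0 else torch.tanh(u)

def arctan_k(u, kappa):
    return torch.atan(u) if kappa > 0 else torch.atanh(u)

def lam(x, kappa):
    # conformal factor lambda_x^kappa = 2 / (1 + kappa * ||x||^2), Eq. (1);
    # needed for maps at general base points and for the midpoint aggregation below
    return 2.0 / (1.0 + kappa * x.pow(2).sum(-1, keepdim=True))

def mobius_add(x, y, kappa):
    # kappa-addition (gyrovector addition) from Table 1
    x2 = x.pow(2).sum(-1, keepdim=True)
    y2 = y.pow(2).sum(-1, keepdim=True)
    xy = (x * y).sum(-1, keepdim=True)
    num = (1 - 2 * kappa * xy - kappa * y2) * x + (1 + kappa * x2) * y
    den = 1 - 2 * kappa * xy + kappa ** 2 * x2 * y2
    return num / den

def exp0(v, kappa):
    # exponential map at the origin (where lambda_0 = 2)
    sk = abs(kappa) ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-7)
    return tan_k(sk * norm, kappa) * v / (sk * norm)

def log0(y, kappa):
    # logarithmic map at the origin (inverse of exp0)
    sk = abs(kappa) ** 0.5
    norm = y.norm(dim=-1, keepdim=True).clamp_min(1e-7)
    return arctan_k(sk * norm, kappa) * y / (sk * norm)

def dist(x, y, kappa):
    # geodesic distance in the kappa-stereographic model, Table 1
    sk = abs(kappa) ** 0.5
    return 2.0 / sk * arctan_k(sk * mobius_add(-x, y, kappa).norm(dim=-1), kappa)

# round-trip sanity check in a hyperbolic component (kappa < 0)
x = 0.1 * torch.randn(5, 4)
assert torch.allclose(exp0(log0(x, -1.0), -1.0), x, atol=1e-5)
```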
Then, we compute the attention weight via softmax:

$$\hat{A}_{ij} = \frac{\exp\left(\mathrm{att}_{intra}(\mathbf{z}_i, \mathbf{z}_j)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{att}_{intra}(\mathbf{z}_i, \mathbf{z}_k)\right)}, \quad (3)$$

where $\mathcal{N}_i$ is the neighborhood node index set of node $v_i$ in the graph. Finally, we add a self-loop so that each node keeps its initial information, i.e., we have $\mathbf{A} = \mathbf{I} + \hat{\mathbf{A}}$.

Aggregation in the traditional Euclidean space is straightforward. However, aggregation in hyperbolic or spherical space is challenging as the space is curved. To bridge this gap, we define the row-wise $\kappa$-left-matrix-multiplication $\boxtimes_\kappa$ similar to (Bachmann, Bécigneul, and Ganea 2020). Let the rows of $\mathbf{Z}$ hold the vectors in the $\kappa$-stereographic model. We have

$$(\mathbf{A} \boxtimes_\kappa \mathbf{Z})_{i\bullet} := \tilde{A}_i \otimes_\kappa \mathrm{mid}_\kappa\left(\{\mathbf{Z}_{j\bullet}\}_{j=1}^{n};\, \mathbf{A}_{i\bullet}\right), \quad (4)$$

where $\tilde{A}_i = \sum_j A_{ij}$, $(\cdot)_{i\bullet}$ denotes the $i$th row, and $\mathrm{mid}_\kappa$ denotes the midpoint defined as follows:

$$\mathrm{mid}_\kappa\left(\{\mathbf{Z}_{j\bullet}\}_{j=1}^{n};\, \mathbf{A}_{i\bullet}\right) = \frac{1}{2} \otimes_\kappa \left(\sum_{j=1}^{n} \frac{A_{ij}\, \lambda^\kappa_{\mathbf{Z}_{j\bullet}}}{\sum_{l=1}^{n} A_{il}\left(\lambda^\kappa_{\mathbf{Z}_{l\bullet}} - 1\right)}\, \mathbf{Z}_{j\bullet}\right), \quad (5)$$

where the notation $\lambda^\kappa_{\mathbf{Z}_{j\bullet}}$ has been defined in Eq. (1). We show that the $\kappa$-left-matrix-multiplication $\boxtimes_\kappa$ performs attentional aggregation, i.e., Theorem 1.

With the attention and aggregation above, we are ready to formulate the unified intra-component layer in a component $\mathcal{M}_i$ of arbitrary curvature $\kappa_i$. Given the input $\mathbf{Z}^{(\ell-1)}_{\mathcal{M}_i}$ holding embeddings in its rows, the $\ell$th layer outputs:

$$\mathbf{Z}^{(\ell)}_{\mathcal{M}_i} = \sigma^{\otimes_\kappa}\left(\mathbf{A}^{(\ell)}_{\mathcal{M}_i} \boxtimes_\kappa \left(\mathbf{Z}^{(\ell-1)}_{\mathcal{M}_i} \otimes_\kappa \mathbf{W}^{(\ell-1)}_{\mathcal{M}_i}\right)\right), \quad (6)$$

where we define the row-wise $\kappa$-right-matrix-multiplication below:

$$\left(\mathbf{Z}^{(\ell-1)}_{\mathcal{M}_i} \otimes_\kappa \mathbf{W}^{(\ell-1)}_{\mathcal{M}_i}\right)_{j\bullet} := \exp^\kappa_{\mathbf{0}}\left(\left(\log^\kappa_{\mathbf{0}}\left(\mathbf{Z}^{(\ell-1)}_{\mathcal{M}_i}\right)\right)_{j\bullet} \mathbf{W}^{(\ell-1)}_{\mathcal{M}_i}\right). \quad (7)$$

In particular, $\mathbf{Z}^{(0)}_{\mathcal{M}_i}$ holds the input features in the $\kappa$-stereographic model. After stacking $L$ layers, the output matrix $\mathbf{Z}_{\mathcal{M}_i} = \mathbf{Z}^{(L)}_{\mathcal{M}_i}$ holds the constant-curvature component embeddings $\mathbf{z}_{\mathcal{M}_i}$ of the component space $\mathcal{M}_i$.

Theorem 1 ($\kappa$-left-matrix-multiplication as attentional aggregation). Let the rows of $\mathbf{H}$ hold the encodings $\mathbf{z}_{\mathcal{M}_i}$, linearly transformed by $\mathbf{W}$, and let $\mathbf{A}$ hold the attentional weights. The $\kappa$-left-matrix-multiplication $\mathbf{A} \boxtimes_\kappa \mathbf{H}$ performs attentional aggregation over the rows of $\mathbf{H}$, i.e., $(\mathbf{A} \boxtimes_\kappa \mathbf{H})_{p\bullet}$ is the linear combination of the rows $\mathbf{H}_{q\bullet}$ with respect to the attentional weights $A_{pq}$, where $q$ enumerates the node indices in the set $\Psi = \{p\} \cup \mathcal{N}_p$ and $\mathcal{N}_p$ is the neighborhood node index set of $v_p$.

Proof. Please refer to https://arxiv.org/abs/2112.05393.

Inter-Component Attentional Layer: This is the output layer of the proposed mixed-curvature GNN. In this layer, we perform attentional concatenation to fuse the constant-curvature representations across component spaces so as to learn mixed-curvature representations in the product space. The importance of each constant-curvature component space is usually different in constructing the mixed-curvature space. Thus, we introduce the inter-component attention to learn the importance of each component space. Specifically, we first lift the component encodings to the common tangent space, and compute their centroid by mean pooling as follows:

$$\boldsymbol{\mu} = \mathrm{Pooling}\left(\left\{\log^\kappa_{\mathbf{0}}\left(\mathbf{W}_i \otimes_\kappa \mathbf{z}_{\mathcal{M}_i}\right)\right\}_{i=1}^{K}\right), \quad (8)$$

where we construct the common tangent space by the linear transformation $\mathbf{W}_i$ and $\log^\kappa_{\mathbf{0}}$. Then, we model the importance of a component by the position of the component embedding relative to the centroid, parameterized by $\boldsymbol{\theta}_{inter}$:

$$\mathrm{att}_{inter}(\mathbf{z}_{\mathcal{M}_i}, \boldsymbol{\mu}) = \sigma\left(\boldsymbol{\theta}_{inter}^{\top}\left(\log^\kappa_{\mathbf{0}}\left(\mathbf{W}_i \otimes_\kappa \mathbf{z}_{\mathcal{M}_i}\right) - \boldsymbol{\mu}\right)\right). \quad (9)$$

Next, we compute the attention weight of each component via the softmax function as follows:

$$w_i = \frac{\exp\left(\mathrm{att}_{inter}(\mathbf{z}_{\mathcal{M}_i}, \boldsymbol{\mu})\right)}{\sum_{k=1}^{K} \exp\left(\mathrm{att}_{inter}(\mathbf{z}_{\mathcal{M}_k}, \boldsymbol{\mu})\right)}. \quad (10)$$

Finally, with the learnable attentional weights, we perform attentional concatenation and obtain the output representation $\mathbf{z} = \Vert_{i=1}^{K}\left(w_i \otimes_\kappa \mathbf{z}_{\mathcal{M}_i}\right)$.
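As a concrete illustration of the attentional aggregation of Eqs. (4) and (5), the sketch below (reusing `exp0`, `log0` and `lam` from the previous snippet) computes the weighted gyro-midpoint and the row-wise $\kappa$-left-matrix-multiplication with an explicit per-node loop for clarity; a practical implementation would vectorize this, and the function names are again illustrative rather than the authors' code.

```python
def kappa_scalar_mul(r, x, kappa):
    # r (scalar) times x on the manifold: r ⊗_κ x = exp_0^κ(r · log_0^κ(x)), Table 1
    return exp0(r * log0(x, kappa), kappa)

def weighted_midpoint(Z, a, kappa):
    # Eq. (5): gyro-midpoint of the rows of Z (n, d) with one attention row a (n,)
    lam_z = lam(Z, kappa)                                  # (n, 1) conformal factors
    denom = (a.unsqueeze(-1) * (lam_z - 1)).sum(0)         # Σ_l a_l (λ_l − 1)
    m = (a.unsqueeze(-1) * lam_z * Z).sum(0) / denom       # tangent-like weighted mean
    return kappa_scalar_mul(0.5, m, kappa)                 # then (1/2) ⊗_κ (·)

def kappa_left_matmul(A, Z, kappa):
    # Eq. (4): (A ⊠_κ Z); each output row scales the midpoint by the row sum of A
    out = []
    for i in range(A.shape[0]):
        mid = weighted_midpoint(Z, A[i], kappa)
        out.append(kappa_scalar_mul(A[i].sum(), mid, kappa))
    return torch.stack(out)
```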
Note that learning representations in the mixed-curvature space not only matches the complex structures of graphs, but also inherently provides positive and negative samples over multiple Riemannian views for contrastive learning, which we discuss in the next part.

### Dual Contrastive Approach

With the combinatorial construction of the mixed-curvature space, we propose a novel dual contrastive approach of single-view and cross-view contrastive learning for the self-supervised learning. To this end, we first design a Riemannian Projector to reveal the hyperbolic, spherical and Euclidean views with respect to the sign of the curvature $\kappa$, and then design a Riemannian Discriminator $D^\kappa$ to contrast the positive and negative samples. As shown in Fig. 1, we contrast the samples in the same Riemannian view (i.e., single-view contrastive learning) and concurrently contrast across different Riemannian views (i.e., cross-view contrastive learning). We summarize the self-supervised learning process of SELFMGNN with the dual contrastive loss in Algorithm 1.

Riemannian Projector: We design the Riemannian projector to reveal the different Riemannian views for contrastive learning. Recall that the mixed-curvature space $\mathbb{P}$ is in essence a combinatorial construction of 3 canonical types of component spaces, i.e., $\mathbb{H}$ ($\kappa < 0$), $\mathbb{S}$ ($\kappa > 0$) and $\mathbb{E}$ ($\kappa = 0$), where $\mathbb{H}$ and $\mathbb{S}$ can have multiple component spaces with distinct learnable curvatures. We can fuse the component encodings of the same space type and obtain 3 canonical Riemannian views: hyperbolic $\mathbf{h} \in \mathbb{H}$, spherical $\mathbf{s} \in \mathbb{S}$ and Euclidean $\mathbf{e} \in \mathbb{E}$. To this end, we design a map, the Riemannian Projector: $(\mathbf{Z})_{i\bullet} \mapsto [\mathbf{h}_i\; \mathbf{e}_i\; \mathbf{s}_i]$, where $(\mathbf{Z})_{i\bullet}$ is the output of the mixed-curvature GNN containing all component embeddings. Specifically, for each space type, we first project the component embedding $\mathbf{z}_{\mathcal{M}_i}$ to the corresponding space of standard curvature via $\mathrm{MLP}^\kappa$ layers defined as follows:

$$\mathrm{MLP}^\kappa(\mathbf{z}_{\mathcal{M}_i}) = \sigma^{\otimes_\kappa}\left(\mathbf{b} \oplus_\kappa \mathbf{M} \otimes_\kappa \mathbf{z}_{\mathcal{M}_i}\right), \quad (11)$$

where $\mathbf{M}$ and $\mathbf{b}$ denote the weight matrix and bias, respectively. Then, we fuse the projected embeddings, taking into account the importance of each component space, via the $\mathrm{mid}_\kappa$ function:

$$\mathbf{v} = \mathrm{mid}_\kappa\left(\{\mathbf{W}_i \otimes_\kappa \mathbf{z}_{\mathcal{M}_i}\}_{i \in \Omega};\; \{w_i\}_{i \in \Omega}\right), \quad (12)$$

where $\Omega$ is the component index set of the given space type and $\mathbf{W}_i$ is the linear transformation. The importance weight $w_i$ is learned by the inter-component attention, and $\mathbf{v} \in \{\mathbf{h}, \mathbf{e}, \mathbf{s}\}$.

Riemannian Discriminator: Contrasting positive and negative samples is fundamental for contrastive learning. However, it is challenging in Riemannian spaces, and existing methods, to our knowledge, cannot be applied to Riemannian spaces due to the intrinsic difference in the geometry. To bridge this gap, we design a novel Riemannian Discriminator to score the agreement between positive and negative samples. The main idea is to lift the samples to the common tangent space and evaluate the agreement score there. Specifically, we utilize the bilinear form to evaluate the agreement. Given two Riemannian views $\mathbf{x}$ and $\mathbf{y}$ of a node, $\mathbf{x}, \mathbf{y} \in \{\mathbf{h}, \mathbf{e}, \mathbf{s}\}$, we give the formulation parameterized by the matrix $\mathbf{D}$ as follows:

$$D^\kappa(\mathbf{x}, \mathbf{y}) = \left(\log^{\kappa_x}_{\mathbf{0}}(\mathbf{x})\right)^{\top} \mathbf{D}\, \log^{\kappa_y}_{\mathbf{0}}(\mathbf{y}), \quad (13)$$

where we construct the common tangent space via $\log^{\kappa_x}_{\mathbf{0}}$, and $\kappa_x$ is the curvature of the corresponding view.

Algorithm 1: Self-supervised Learning of SELFMGNN
Input: graph $\mathcal{G} = (\mathbf{G}, \mathbf{X})$, balance weight $\gamma$, congruent graph generation function $\Gamma(\cdot)$
Output: the mixed-curvature GNN parameters (the Riemannian projector parameters are discarded after training)
while not converging do
  // Views of the original graph Gα:
  Set Gα = G; Zα = Mixed-curvature GNN(Gα, X; θα);
  [Hα Eα Sα] = Riemannian Projector(Zα; φ);
  // Views of the congruent augmentation Gβ:
  Generate a congruent graph Gβ = Γ(G);
  Zβ = Mixed-curvature GNN(Gβ, X; θβ);
  [Hβ Eβ Sβ] = Riemannian Projector(Zβ; φ);
  // Dual contrastive loss:
  for each node vi in Gα and vj in Gβ do
    for Riemannian views x, y ∈ {h, e, s} do
      Single-view contrastive learning with Eqs. (14) and (15);
      Cross-view contrastive learning with Eqs. (16) and (17);
  // Update neural network parameters:
  Compute the gradients of the dual contrastive loss, ∇_{θα, θβ, φ, D_S, D_C} (J_S + γ J_C), and update the parameters.

Single-view Contrastive Learning: SELFMGNN employs single-view contrastive learning in each Riemannian view of the mixed-curvature space. Specifically, we first include a congruent augmentation $\mathcal{G}^\beta$, similar to Chen et al. (2020); Hassani and Ahmadi (2020). Then, we introduce a contrastive discrimination task for a given Riemannian view: for a sample in $\mathcal{G}^\alpha$, we aim to discriminate the positive sample from the negative samples in the congruent counterpart $\mathcal{G}^\beta$. Here, we use the superscripts $\alpha$ and $\beta$ to distinguish the notations of the graph and its congruent augmentation. We formulate the InfoNCE loss (van den Oord, Li, and Vinyals 2018) as follows:

$$\mathcal{L}_S(\alpha, \beta) = -\log \frac{\exp\left(D^\kappa(\mathbf{x}^\alpha_i, \mathbf{x}^\beta_i)\right)}{\sum_{j=1}^{|\mathcal{V}|} \mathbb{I}\{i \neq j\} \exp\left(D^\kappa(\mathbf{x}^\alpha_i, \mathbf{x}^\beta_j)\right)}, \quad (14)$$

where $\mathbf{x}^\beta_i$ and $\mathbf{x}^\beta_j$ are the positive sample and negative samples of $v_i$ in $\mathcal{G}^\alpha$, respectively. $\mathbb{I}\{\cdot\} \in \{0, 1\}$ is an indicator function that returns 1 iff the condition $(\cdot)$ is true ($i \neq j$ in this case). We utilize the Riemannian discriminator $D^\kappa(\cdot, \cdot)$ to evaluate the agreement between the samples. In the single-view contrastive learning, for each Riemannian view, we contrast $\mathcal{G}^\alpha$ against its congruent augmentation $\mathcal{G}^\beta$, and vice versa. Formally, we have the single-view contrastive loss:

$$\mathcal{J}_S = \sum_{\mathbf{x} \in \{\mathbf{h}, \mathbf{e}, \mathbf{s}\}} \left(\mathcal{L}_S(\alpha, \beta) + \mathcal{L}_S(\beta, \alpha)\right). \quad (15)$$

| Method | Citeseer LP | Citeseer NC | Cora LP | Cora NC | Pubmed LP | Pubmed NC | Amazon LP | Amazon NC | Airport LP | Airport NC |
|---|---|---|---|---|---|---|---|---|---|---|
| GCN | 93.6(0.7) | 70.2(0.8) | 91.4(0.7) | 81.3(0.3) | 93.0(0.6) | 78.8(0.2) | 92.9(0.9) | 71.2(1.1) | 90.5(0.4) | 50.8(0.9) |
| GraphSAGE | 87.2(0.9) | 68.2(1.1) | 88.7(0.6) | 78.1(0.8) | 87.7(0.4) | 77.5(0.3) | 91.8(0.5) | 72.9(1.6) | 85.6(1.1) | 47.8(0.8) |
| GAT | 92.9(0.7) | 72.0(0.7) | 93.4(0.4) | 82.1(0.7) | 92.6(0.3) | 77.1(0.7) | 93.9(0.6) | 72.6(0.8) | 91.4(0.6) | 49.3(0.7) |
| DGI | 92.7(0.5) | 71.3(0.7) | 91.8(0.5) | 81.4(0.6) | 92.8(0.7) | 76.6(0.6) | 93.5(0.4) | 72.2(0.3) | 92.5(0.8) | 50.1(0.5) |
| MVGRL | 94.8(0.3) | 72.1(0.8) | 93.2(0.7) | 82.7(0.7) | 95.9(0.2) | 78.9(0.3) | 96.2(0.5) | 74.0(1.0) | 95.1(0.3) | 52.1(1.0) |
| GMI | 95.0(0.6) | 72.5(0.3) | 93.9(0.3) | 81.8(0.2) | 96.5(0.8) | 79.0(0.2) | 96.8(0.7) | 74.5(0.9) | 94.7(0.5) | 51.9(0.7) |
| HGCN | 94.6(0.4) | 71.7(0.5) | 93.2(0.1) | 81.5(0.6) | 96.2(0.2) | 78.5(0.4) | 96.7(0.9) | 75.2(1.3) | 93.6(0.3) | 51.2(0.6) |
| HAT | 93.7(0.5) | 72.2(0.6) | 93.0(0.5) | 83.1(0.7) | 96.3(0.3) | 78.6(0.7) | 96.9(1.1) | 74.1(1.0) | 93.9(0.6) | 51.3(1.0) |
| LGCN | 95.5(0.5) | 72.1(0.7) | 93.7(0.5) | 83.3(0.9) | 96.6(0.2) | 78.6(1.0) | **97.5(0.9)** | 75.1(1.1) | 96.4(0.2) | 52.0(0.9) |
| κ-GCN | 93.8(0.7) | 71.2(0.5) | 92.8(0.8) | 81.6(0.7) | 95.0(0.3) | 78.7(0.6) | 94.8(0.6) | 72.4(1.5) | 93.5(0.7) | 50.9(1.2) |
| SELFMGNN | **96.9(0.3)** | **73.1(0.9)** | **94.6(0.6)** | **83.8(0.8)** | **97.3(0.2)** | **79.6(0.5)** | **97.5(1.0)** | **75.3(0.8)** | **96.9(0.5)** | **52.7(0.7)** |

Table 2: Summary of AUC (%) for link prediction (LP) and classification accuracy (%) for node classification (NC) on the Citeseer, Cora, Pubmed, Amazon and Airport datasets. The highest scores are in bold.
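To make Eqs. (13) and (14) concrete, here is a hedged sketch of the bilinear Riemannian discriminator and the single-view InfoNCE term, reusing `log0` from the κ-stereographic snippet above. The class and function names, the roll-based enumeration of negatives, and the mean over nodes are illustrative choices, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class RiemannianDiscriminator(nn.Module):
    """Eq. (13): bilinear agreement score on the common tangent space at the origin."""
    def __init__(self, dim):
        super().__init__()
        self.D = nn.Parameter(torch.eye(dim))   # learnable bilinear form

    def forward(self, x, kappa_x, y, kappa_y):
        # lift each view to its tangent space at 0 (identity for the Euclidean view)
        tx = x if kappa_x == 0 else log0(x, kappa_x)
        ty = y if kappa_y == 0 else log0(y, kappa_y)
        return (tx @ self.D * ty).sum(-1)       # row-wise (log x)^T D (log y)

def single_view_nce(disc, xa, xb, kappa):
    """Eq. (14), averaged over nodes: positive is x^beta_i, negatives are x^beta_j (j != i)."""
    n = xa.shape[0]
    # scores[i, s] = D(x^alpha_i, x^beta_{(i-s) mod n}); s = 0 gives the positive pair
    scores = torch.stack([disc(xa, kappa, xb.roll(s, dims=0), kappa)
                          for s in range(n)], dim=1)
    pos = scores[:, 0]
    neg = torch.logsumexp(scores[:, 1:], dim=1)   # all j != i
    return (neg - pos).mean()
```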
Cross-view Contrastive Learning: SELFMGNN further employs a novel cross-view contrastive learning. The novelty lies in that our design in essence enjoys the multi-view nature of the mixed-curvature space, i.e., we exploit the multiple Riemannian views of the mixed-curvature space and contrast across different views. Specifically, we formulate the contrastive discrimination task as follows: for a given Riemannian view of $\mathcal{G}^\alpha$, we aim to discriminate the given view from the other Riemannian views of $\mathcal{G}^\beta$. We formulate the InfoNCE loss as follows:

$$\mathcal{L}_C(\alpha, \beta) = -\log \frac{\exp\left(D^\kappa(\mathbf{x}^\alpha_i, \mathbf{x}^\beta_i)\right)}{\sum_{\mathbf{y}} \mathbb{I}\{\mathbf{x} \neq \mathbf{y}\} \exp\left(D^\kappa(\mathbf{x}^\alpha_i, \mathbf{y}^\beta_i)\right)}, \quad (16)$$

where $\mathbb{I}\{\mathbf{x} \neq \mathbf{y}\}$ selects the embeddings of the other Riemannian views. Similarly, we contrast between $\mathcal{G}^\alpha$ and $\mathcal{G}^\beta$ and vice versa, and have the cross-view contrastive loss:

$$\mathcal{J}_C = \sum_{\mathbf{x} \in \{\mathbf{h}, \mathbf{e}, \mathbf{s}\}} \left(\mathcal{L}_C(\alpha, \beta) + \mathcal{L}_C(\beta, \alpha)\right). \quad (17)$$

Dual Contrastive Loss: In SELFMGNN, we integrate the single-view and cross-view contrastive learning, and formulate the dual contrastive loss as follows:

$$\mathcal{J}_{self} = \mathcal{J}_S + \gamma \mathcal{J}_C, \quad (18)$$

where $\gamma$ is the balance weight. The benefit of the dual contrastive loss is that we contrast samples in the same Riemannian view (single-view) and across different Riemannian views (cross-view), comprehensively leveraging the rich information in the mixed-curvature space to encode the graph structure. Finally, SELFMGNN learns representations in the mixed-curvature Riemannian space, capturing the complex structures of graphs without labels.

## Experiments

In this section, we evaluate SELFMGNN on the link prediction and node classification tasks against 10 strong baselines on 5 benchmark datasets. We report the mean and standard deviation over 10 independent runs for each model to achieve fair comparisons.

### Experimental Setups

Datasets: We utilize 5 benchmark datasets, i.e., the widely used Citeseer, Cora, and Pubmed (Kipf and Welling 2017; Veličković et al. 2019), and the recent Amazon and Airport (Zhang et al. 2021). Citeseer, Cora and Pubmed are widely used citation networks, where nodes represent papers and edges represent citations between them. Amazon is a co-purchase graph, where nodes represent goods and edges indicate that two goods are frequently bought together. Airport is an air-traffic graph, where nodes represent airports and edges indicate the traffic connections between them.

Euclidean Baselines: i) Supervised models: GCN (Kipf and Welling 2017), GraphSAGE (Hamilton, Ying, and Leskovec 2017), GAT (Veličković et al. 2018). ii) Self-supervised models: DGI (Veličković et al. 2019), MVGRL (Hassani and Ahmadi 2020), GMI (Peng et al. 2020).

Riemannian Baselines: i) Supervised models: HGCN (Chami et al. 2019), HAT (Gulcehre et al. 2019) and LGCN (Zhang et al. 2021) for the hyperbolic space; κ-GCN (Bachmann, Bécigneul, and Ganea 2020) with positive κ for the spherical space. ii) Self-supervised models: there are no self-supervised Riemannian models in the literature, and thus we propose SELFMGNN to fill this gap.

### Implementation Details

Congruent graph: As suggested by Hassani and Ahmadi (2020), we opt for graph diffusion to generate a congruent augmentation. Specifically, given an adjacency matrix $\mathbf{G}^\alpha$, we use the congruent graph generation function $\Gamma(\cdot)$ to obtain a diffusion matrix $\mathbf{G}^\beta$ and treat it as the adjacency matrix of the congruent augmentation. The diffusion is computed once via the fast approximated and sparsified method of Klicpera, Weißenberger, and Günnemann (2019).
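For intuition, one common instance of such a diffusion is personalized PageRank, which has the closed form $\mathbf{S} = \alpha\left(\mathbf{I} - (1-\alpha)\mathbf{T}\right)^{-1}$ for a normalized transition matrix $\mathbf{T}$. The dense sketch below is only a rough illustration of $\Gamma(\cdot)$: the actual method of Klicpera, Weißenberger, and Günnemann (2019) uses a fast approximated and sparsified computation, and the teleport probability and threshold here are assumed values.

```python
import torch

def ppr_diffusion(G, alpha=0.15, eps=1e-4):
    """Dense personalized-PageRank diffusion as a stand-in for Gamma(.).
    G: (n, n) binary adjacency matrix; alpha and eps are illustrative."""
    n = G.shape[0]
    A = G + torch.eye(n)                          # add self-loops
    d_inv_sqrt = torch.diag(A.sum(1).rsqrt())     # D^{-1/2}
    T = d_inv_sqrt @ A @ d_inv_sqrt               # symmetrically normalized transition matrix
    S = alpha * torch.linalg.inv(torch.eye(n) - (1 - alpha) * T)   # closed-form PPR
    return torch.where(S < eps, torch.zeros_like(S), S)            # sparsify small entries
```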
Signature: The mixed-curvature space is parameterized by its signature, i.e., the space type, curvature and dimensionality of the component spaces. The space type of a component $\mathcal{M}_i$ can be hyperbolic $\mathbb{H}$, spherical $\mathbb{S}$ or Euclidean $\mathbb{E}$, and we utilize combinations of them to cover the mixed and complicated graph structures. The dimensionality $d_{\mathcal{M}_i}$ is a hyperparameter. The curvature $\kappa_{\mathcal{M}_i}$ is a learnable parameter, as our loss is differentiable with respect to the curvature.

Learning manner: Similar to Veličković et al. (2019), the self-supervised models first learn representations without labels and are then evaluated on a specific learning task by directly using these representations for training and testing. The supervised models are trained and tested following Chami et al. (2019). Please refer to the Implementation Notes for further experimental details.

### Link Prediction

The task of link prediction is to predict the probability that two nodes are connected. We utilize the Fermi-Dirac decoder with a distance function to define the probability based on the model outputs $\mathbf{z}$. Formally, the probability is defined as follows:

$$p\left((i, j) \in \mathcal{E} \mid \mathbf{z}_i, \mathbf{z}_j\right) = \left[\exp\left(\left(d_{\mathcal{M}}(\mathbf{z}_i, \mathbf{z}_j)^2 - r\right)/t\right) + 1\right]^{-1}, \quad (19)$$

where $r$ and $t$ are hyperparameters. For each method, $d_{\mathcal{M}}(\mathbf{z}_i, \mathbf{z}_j)$ is the distance function of the corresponding representation space, e.g., $\|\mathbf{z}_i - \mathbf{z}_j\|_2$ for Euclidean models, and for SELFMGNN we have

$$d_{\mathbb{P}}(\mathbf{z}_i, \mathbf{z}_j)^2 = \sum_{l=1}^{K} d^{\kappa_l}_{\mathcal{M}_l}\left((\mathbf{z}_i)_{\mathcal{M}_l}, (\mathbf{z}_j)_{\mathcal{M}_l}\right)^2. \quad (20)$$

We utilize AUC as the evaluation metric and summarize the performance in Table 2. We set the output dimensionality to 24 for all models for fair comparisons. Table 2 shows that SELFMGNN consistently outperforms the self-supervised models in Euclidean space, since it better matches the mixed structures of graphs with the mixed-curvature space. SELFMGNN achieves competitive and even better results compared with the supervised Riemannian baselines. The reason lies in that we leverage the dual contrastive approach to exploit the rich information of the data themselves in the mixed-curvature Riemannian space.

### Node Classification

For node classification, we first discuss the classifier, as none of the existing classifiers, to our knowledge, can work with mixed-curvature spaces. To bridge this gap, inspired by (Liu, Nickel, and Kiela 2019), for the Riemannian models we introduce a Euclidean transformation to generate an encoding summarizing the structure of the node representations. Specifically, we first introduce a set of centroids $\{\boldsymbol{\mu}_1, \cdots, \boldsymbol{\mu}_C\}$, where $\boldsymbol{\mu}_c$ is a centroid in the Riemannian space learned jointly with the learning model. Then, for an output representation $\mathbf{z}_j \in \mathcal{M}$, its encoding is defined as $\boldsymbol{\xi}_j = (\xi_{1j}, \ldots, \xi_{Cj})$, where $\xi_{ij} = d_{\mathcal{M}}(\boldsymbol{\mu}_i, \mathbf{z}_j)$, summarizing the position of $\mathbf{z}_j$ relative to the centroids. Now, we are ready to use logistic regression for node classification, and the likelihood is

$$p(y \mid \mathbf{h}) = \mathrm{sigmoid}\left(\mathbf{w}_C^{\top} \mathbf{h}\right), \quad (21)$$

where $\mathbf{w}_C \in \mathbb{R}^{|C|}$ is the weight vector and $y$ is the label. $\mathbf{h} = \boldsymbol{\xi}$ for the Riemannian models, and $\mathbf{h}$ is the output representation for the Euclidean ones. We utilize classification accuracy (Kipf and Welling 2017) as the evaluation metric and summarize the performance in Table 2. Note that SELFMGNN achieves the best results on all the datasets.
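For concreteness, the following sketch evaluates the Fermi-Dirac decoder of Eq. (19) on top of the product-space squared distance of Eq. (20), reusing `split_components` and `dist` from the earlier snippets and the illustrative `signature` defined there; the values of `r` and `t` are assumptions, as the paper only states that they are hyperparameters.

```python
def product_sq_distance(z_i, z_j, signature):
    # Eq. (20): the squared distance in P is the sum of squared component distances
    total = 0.0
    for xi, xj, comp in zip(split_components(z_i, signature),
                            split_components(z_j, signature), signature):
        k = comp["curvature"]
        if k == 0.0:                                   # Euclidean component
            total = total + (xi - xj).pow(2).sum(-1)
        else:                                          # hyperbolic / spherical component
            total = total + dist(xi, xj, k) ** 2
    return total

def link_probability(z_i, z_j, signature, r=2.0, t=1.0):
    # Eq. (19): Fermi-Dirac decoder; r and t are hyperparameters (values illustrative)
    return 1.0 / (torch.exp((product_sq_distance(z_i, z_j, signature) - r) / t) + 1.0)
```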
## Ablation Study

We give an ablation study on the importance of i) the mixed-curvature space (MCS) and ii) the cross-view contrastive learning. To this end, we include two kinds of variants, Constant Curvature Space (CCS) and Single-view (Single), which are degenerated SELFMGNNs without some functional module. CCS variants work without the Cartesian product and thus are learned by the single-view contrastive loss in a constant-curvature space, e.g., $\mathbb{H}^{24}$ is the variant in the hyperbolic space. Single variants work with the mixed-curvature space, but disable the cross-view contrastive learning. The last group are the proposed SELFMGNNs. A specific instantiation is denoted by its Cartesian product, e.g., we use $(\mathbb{H}^4)^2 \times (\mathbb{S}^4)^2 \times \mathbb{E}^8$ as the default, whose mixed-curvature space is constructed by the Cartesian product of two $\mathbb{H}^4$, two $\mathbb{S}^4$ and one $\mathbb{E}^8$ component spaces, where the superscript is the dimensionality.

We show the classification accuracy of these variants in Table 3, and we find that: i) A mixed-curvature variant with single or dual contrastive learning outperforms its CCS counterpart. The reason lies in that the mixed-curvature space is more flexible than a constant-curvature space in covering the complicated graph structures. ii) Disabling cross-view contrastive learning decreases the performance, as the cross-view contrasting further unleashes the rich information of the data in the mixed-curvature space. iii) Allowing more than one hyperbolic and spherical space can also improve performance, as $(\mathbb{H}^4)^2 \times (\mathbb{S}^4)^2 \times \mathbb{E}^8$ and $(\mathbb{H}^2)^4 \times (\mathbb{S}^2)^4 \times \mathbb{E}^8$ both outperform $\mathbb{H}^8 \times \mathbb{S}^8 \times \mathbb{E}^8$.

| Variant | Signature | Citeseer | Cora | Pubmed |
|---|---|---|---|---|
| CCS | $\mathbb{H}^{24}$ | 72.2(0.7) | 82.1(0.4) | 78.6(0.3) |
| CCS | $\mathbb{S}^{24}$ | 70.5(0.8) | 82.3(0.5) | 77.5(0.4) |
| CCS | $\mathbb{E}^{24}$ | 71.8(1.1) | 81.0(0.7) | 77.3(0.8) |
| Single | $\mathbb{H}^{8} \times \mathbb{S}^{8} \times \mathbb{E}^{8}$ | 72.6(0.3) | 82.7(0.8) | 78.9(0.9) |
| Single | $(\mathbb{H}^{4})^{2} \times (\mathbb{S}^{4})^{2} \times \mathbb{E}^{8}$ | 72.8(0.6) | 83.1(0.6) | 79.2(0.2) |
| Single | $(\mathbb{H}^{2})^{4} \times (\mathbb{S}^{2})^{4} \times \mathbb{E}^{8}$ | 72.9(0.2) | 83.5(0.5) | 79.3(0.5) |
| SELFMGNN | $\mathbb{H}^{8} \times \mathbb{S}^{8} \times \mathbb{E}^{8}$ | 72.8(1.0) | 83.3(0.9) | 79.2(0.6) |
| SELFMGNN | $(\mathbb{H}^{4})^{2} \times (\mathbb{S}^{4})^{2} \times \mathbb{E}^{8}$ | 73.1(0.9) | 83.8(0.5) | 79.6(0.7) |
| SELFMGNN | $(\mathbb{H}^{2})^{4} \times (\mathbb{S}^{2})^{4} \times \mathbb{E}^{8}$ | 73.3(0.5) | 84.1(0.8) | 79.9(1.1) |

Table 3: Ablation study of SELFMGNN for the node classification task in classification accuracy (%).

Furthermore, we discuss the curvatures of the datasets. We report the learned curvatures and weights of each component space for the real-world datasets in Table 4. As shown in Table 4, component spaces of the same space type are learned with different curvatures, showing that the curvatures over different hierarchical or cyclical regions can indeed be different. $(\mathbb{H}^4)^2 \times (\mathbb{S}^4)^2 \times \mathbb{E}^8$ has 2 component spaces for hyperbolic (spherical) geometry, and allowing multiple hyperbolic (spherical) components enables us to cover a wider range of curvatures of the graph, better matching the graph structures. This also explains why $(\mathbb{H}^4)^2 \times (\mathbb{S}^4)^2 \times \mathbb{E}^8$ outperforms $\mathbb{H}^8 \times \mathbb{S}^8 \times \mathbb{E}^8$ in Table 3. With learnable curvatures and weights for each component, SELFMGNN matches the mixed and complicated graph structures and learns more promising graph representations.

| Dataset | $\mathbb{H}^4$ | $\mathbb{H}^4$ | $\mathbb{S}^4$ | $\mathbb{S}^4$ | $\mathbb{E}^8$ |
|---|---|---|---|---|---|
| Citeseer | -0.67 (0.29) | -0.58 (0.19) | +0.82 (0.21) | +2.72 (0.13) | 0 (0.18) |
| Cora | -0.90 (0.18) | -1.31 (0.25) | +0.76 (0.28) | +0.19 (0.08) | 0 (0.21) |
| Pubmed | -1.12 (0.26) | -0.79 (0.34) | +0.59 (0.16) | +1.05 (0.15) | 0 (0.09) |
| Amazon | -0.78 (0.11) | -1.02 (0.48) | +1.13 (0.05) | +2.24 (0.24) | 0 (0.12) |
| Airport | -1.26 (0.30) | -2.15 (0.17) | +1.85 (0.20) | +0.67 (0.18) | 0 (0.15) |

Table 4: Learning results of the mixed-curvature space on the datasets: curvature (weight) of each component space.

## Related Work

SELFMGNN learns representations via a mixed-curvature graph neural network with the dual contrastive approach. Here, we briefly discuss the related work on graph neural networks and contrastive learning.

### Graph Neural Network

Graph Neural Networks (GNNs) achieve state-of-the-art results in graph analysis tasks (Dou et al. 2020; Peng et al. 2021). Recently, a few attempts have been made to marry GNNs and Riemannian representation learning (Gulcehre et al. 2019; Mathieu et al. 2019; Monath et al. 2019), as graphs are inherently non-Euclidean (Krioukov et al. 2010). In hyperbolic space, Nickel and Kiela (2017); Suzuki, Takahama, and Onoda (2019) introduce shallow models, while Liu, Nickel, and Kiela (2019); Chami et al. (2019); Zhang et al.
(2021) formulate deep models (GNNs). Furthermore, Sun et al. (2021) generalize hyperbolic GNNs to dynamic graphs with insights into temporal analysis. Fu et al. (2021) study curvature exploration. Sun et al. (2020); Wang, Sun, and Zhang (2020) propose promising methods that apply hyperbolic geometry to network alignment. Beyond hyperbolic space, Cruceru, Bécigneul, and Ganea (2021) study matrix manifolds of Riemannian spaces, and Bachmann, Bécigneul, and Ganea (2020) generalize GCN to arbitrary constant-curvature spaces. To generalize representation learning, Gu et al. (2019) propose to learn in the mixed-curvature space. Skopek, Ganea, and Bécigneul (2020) introduce the mixed-curvature VAE. Recently, Wang et al. (2021a) model knowledge graph triples in the mixed-curvature space, though in a supervised manner. Distinct from these studies, we propose the first self-supervised mixed-curvature model, allowing multiple hyperbolic (spherical) subspaces, each with distinct curvatures.

### Contrastive Learning

Contrastive learning is an attractive self-supervised learning method that learns representations by contrasting positive and negative pairs (Chen et al. 2020). Here, we discuss the contrastive learning methods on graphs. Specifically, Veličković et al. (2019) contrast patch-summary pairs via the infomax principle. Hassani and Ahmadi (2020) leverage multiple views for contrastive learning. Qiu et al. (2020) formulate a general framework for pre-training. Wan et al. (2021) incorporate generative learning concurrently. Peng et al. (2020) explore a graph-specific infomax for contrastive learning. Xu et al. (2021) aim to learn graph-level representations. Park et al. (2020); Wang et al. (2021b) consider heterogeneous graphs. To the best of our knowledge, existing methods cannot be applied to Riemannian spaces, and we bridge this gap in this paper.

## Conclusion

In this paper, we make the first attempt to study the problem of self-supervised graph representation learning in the mixed-curvature Riemannian space, and present a novel SELFMGNN. Specifically, we first construct the mixed-curvature space via the Cartesian product of Riemannian manifolds and design hierarchical attention mechanisms within and among component spaces to learn graph representations in the mixed-curvature space. Then, we introduce single-view and cross-view contrastive learning to learn graph representations without labels. Extensive experiments show the superiority of SELFMGNN.

## Implementation Notes

Here, we give further implementation notes in order to enhance reproducibility. In SELFMGNN, we stack the attentive aggregation layer twice to learn the component embeddings. We employ a two-layer MLP^κ in the Riemannian projector to reveal the Riemannian views for the self-supervised learning. In the experiments, we set the weight γ to 1, i.e., the single-view and cross-view contrastive learning are considered to have the same importance. A grid search is performed over the learning rate in [0.001, 0.003, 0.005, 0.008, 0.01] and the dropout probability in [0, 0.8] with a step size of 0.1. For all the comparison models, we perform a hyperparameter search on a validation set to obtain the best results, and κ-GCN is trained with positive curvature in particular to evaluate the representation ability of the spherical space. We set the dimensionality to 24 for all the models for a fair comparison.
Note that, in SELFMGNN, the component spaces can be set to arbitrary dimensionalities, and their curvatures and importance weights are learned from the data; we can thereby construct a mixed-curvature space of any dimensionality, matching the curvatures of any dataset.

## Acknowledgments

Hao Peng is partially supported by the National Key R&D Program of China through grant 2021YFB1714800, NSFC through grant 62002007, and the S&T Program of Hebei through grant 21340301D. Sen Su and Zhongbao Zhang are partially supported by the National Key Research and Development Program of China under Grant 2018YFB1003804, and NSFC under Grants U1936103 and 61921003. Philip S. Yu is partially supported by NSF under grants III-1763325, III-1909323, III-2106758, and SaTC-1930941. Jiawei Zhang is partially supported by NSF through grants IIS-1763365 and IIS-2106972 and by UC Davis. This work was also sponsored by the CAAI-Huawei MindSpore Open Fund. Thanks for the computing infrastructure provided by the Huawei MindSpore platform.

## References

Bachmann, G.; Bécigneul, G.; and Ganea, O. 2020. Constant Curvature Graph Convolutional Networks. In Proceedings of ICML, volume 119, 486–496.

Chami, I.; Ying, Z.; Ré, C.; and Leskovec, J. 2019. Hyperbolic Graph Convolutional Neural Networks. In Advances in NeurIPS, 4869–4880.

Chen, T.; Kornblith, S.; Norouzi, M.; and Hinton, G. E. 2020. A Simple Framework for Contrastive Learning of Visual Representations. In Proceedings of ICML, volume 119, 1597–1607.

Cruceru, C.; Bécigneul, G.; and Ganea, O. 2021. Computationally Tractable Riemannian Manifolds for Graph Embeddings. In Proceedings of AAAI, 7133–7141.

Cui, P.; Wang, X.; Pei, J.; and Zhu, W. 2018. A Survey on Network Embedding. IEEE Trans. on Knowledge and Data Engineering, 31(5): 833–852.

Dou, Y.; Liu, Z.; Sun, L.; Deng, Y.; Peng, H.; and Yu, P. S. 2020. Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters. In Proceedings of CIKM, 315–324.

Fu, X.; Li, J.; Wu, J.; Sun, Q.; Ji, C.; Wang, S.; Tan, J.; Peng, H.; and Yu, P. S. 2021. ACE-HGNN: Adaptive Curvature Exploration Hyperbolic Graph Neural Network. In Proceedings of ICDM.

Gu, A.; Sala, F.; Gunel, B.; and Ré, C. 2019. Learning Mixed-Curvature Representations in Product Spaces. In Proceedings of ICLR, 1–21.

Gulcehre, C.; Denil, M.; Malinowski, M.; Razavi, A.; Pascanu, R.; Hermann, K. M.; Battaglia, P.; Bapst, V.; Raposo, D.; Santoro, A.; and de Freitas, N. 2019. Hyperbolic Attention Networks. In Proceedings of ICLR.

Hamilton, W.; Ying, Z.; and Leskovec, J. 2017. Inductive Representation Learning on Large Graphs. In Advances in NeurIPS, 1024–1034.

Hassani, K.; and Ahmadi, A. H. K. 2020. Contrastive Multi-View Representation Learning on Graphs. In Proceedings of ICML, volume 119, 4116–4126.

Hopper, C.; and Andrews, B. 2010. The Ricci Flow in Riemannian Geometry. Springer.

Kipf, T. N.; and Welling, M. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of ICLR.

Klicpera, J.; Weißenberger, S.; and Günnemann, S. 2019. Diffusion Improves Graph Learning. In Advances in NeurIPS, 13333–13345.

Kreyszig, E. 1968. An Introduction to Differential Geometry and Riemannian Geometry. University of Toronto Press.

Krioukov, D.; Papadopoulos, F.; Kitsak, M.; Vahdat, A.; and Boguñá, M. 2010. Hyperbolic Geometry of Complex Networks. Physical Review E, 82(3): 036106.

Liu, Q.; Nickel, M.; and Kiela, D. 2019. Hyperbolic Graph Neural Networks. In Advances in NeurIPS, 8228–8239.

Liu, X.; Zhang, F.; Hou, Z.; Wang, Z.; Mian, L.; Zhang, J.; and Tang, J. 2021.
Self-supervised Learning: Generative or Contrastive. IEEE Trans. on Knowledge and Data Engineering, 1–24.

Mathieu, E.; Le Lan, C.; Maddison, C. J.; Tomioka, R.; and Teh, Y. W. 2019. Continuous Hierarchical Representations with Poincaré Variational Auto-Encoders. In Advances in NeurIPS, 12544–12555.

Monath, N.; Zaheer, M.; Silva, D.; McCallum, A.; and Ahmed, A. 2019. Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space. In Proceedings of KDD, 714–722.

Nickel, M.; and Kiela, D. 2017. Poincaré Embeddings for Learning Hierarchical Representations. In Advances in NeurIPS, 6338–6347.

Papadopoulos, F.; Kitsak, M.; Serrano, M. A.; Boguñá, M.; and Krioukov, D. 2012. Popularity versus Similarity in Growing Networks. Nature, 489(7417): 537.

Park, C.; Kim, D.; Han, J.; and Yu, H. 2020. Unsupervised Attributed Multiplex Network Embedding. In Proceedings of AAAI, 5371–5378.

Peng, H.; Zhang, R.; Dou, Y.; Yang, R.; Zhang, J.; and Yu, P. S. 2021. Reinforced Neighborhood Selection Guided Multi-Relational Graph Neural Networks. ACM Trans. on Information Systems, 1–45.

Peng, Z.; Huang, W.; Luo, M.; Zheng, Q.; Rong, Y.; Xu, T.; and Huang, J. 2020. Graph Representation Learning via Graphical Mutual Information Maximization. In Proceedings of WWW, 259–270.

Qiu, J.; Chen, Q.; Dong, Y.; Zhang, J.; Yang, H.; Ding, M.; Wang, K.; and Tang, J. 2020. GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training. In Proceedings of KDD, 1150–1160.

Ravasz, E.; and Barabási, A.-L. 2003. Hierarchical Organization in Complex Networks. Physical Review E, 67(2): 026112.

Skopek, O.; Ganea, O.; and Bécigneul, G. 2020. Mixed-curvature Variational Autoencoders. In Proceedings of ICLR, 1–54.

Sun, L.; Zhang, Z.; Zhang, J.; Wang, F.; Du, Y.; Su, S.; and Yu, P. S. 2020. Perfect: A Hyperbolic Embedding for Joint User and Community Alignment. In Proceedings of ICDM, 501–510.

Sun, L.; Zhang, Z.; Zhang, J.; Wang, F.; Peng, H.; Su, S.; and Yu, P. S. 2021. Hyperbolic Variational Graph Neural Network for Modeling Dynamic Graphs. In Proceedings of AAAI, 4375–4383.

Suzuki, R.; Takahama, R.; and Onoda, S. 2019. Hyperbolic Disk Embeddings for Directed Acyclic Graphs. In Proceedings of ICML, 6066–6075.

van den Oord, A.; Li, Y.; and Vinyals, O. 2018. Representation Learning with Contrastive Predictive Coding. arXiv:1807.03748, 1–13.

Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; and Bengio, Y. 2018. Graph Attention Networks. In Proceedings of ICLR.

Veličković, P.; Fedus, W.; Hamilton, W. L.; Liò, P.; Bengio, Y.; and Hjelm, R. D. 2019. Deep Graph Infomax. In Proceedings of ICLR, 1–24.

Wan, S.; Pan, S.; Yang, J.; and Gong, C. 2021. Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning. In Proceedings of AAAI, 10049–10057.

Wang, F.; Sun, L.; and Zhang, Z. 2020. Hyperbolic User Identity Linkage across Social Networks. In Proceedings of GLOBECOM, 1–6.

Wang, S.; Wei, X.; dos Santos, C. N.; Wang, Z.; Nallapati, R.; Arnold, A. O.; Xiang, B.; Yu, P. S.; and Cruz, I. F. 2021a. Mixed-Curvature Multi-Relational Graph Neural Network for Knowledge Graph Completion. In Proceedings of WWW, 1761–1771.

Wang, X.; Liu, N.; Han, H.; and Shi, C. 2021b. Self-supervised Heterogeneous Graph Neural Network with Co-contrastive Learning. In Proceedings of KDD.

Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; and Yu, P. S. 2021. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. on Neural Networks and Learning Systems, 32(1): 4–24.
Xu, M.; Wang, H.; Ni, B.; Guo, H.; and Tang, J. 2021. Self-supervised Graph-level Representation Learning with Local and Global Structure. In Proceedings of ICML, volume 139, 11548–11558.

Zhang, Y.; Wang, X.; Shi, C.; Liu, N.; and Song, G. 2021. Lorentzian Graph Convolutional Networks. In Proceedings of WWW, 1249–1261.