# Hyperbolic Graph Diffusion Model

Lingfeng Wen¹, Xuan Tang², Mingjie Ouyang¹, Xiangxiang Shen¹, Jian Yang³, Daxin Zhu⁴, Mingsong Chen¹, Xian Wei¹*

¹MoE Engineering Research Center of Hardware/Software Co-design Technology and Application, East China Normal University; ²School of Communication and Electronic Engineering, East China Normal University; ³School of Geospatial Information, Information Engineering University, China; ⁴Quanzhou Normal University. xwei@sei.ecnu.edu.cn (*corresponding author)

## Abstract

Diffusion generative models (DMs) have achieved promising results in image and graph generation. However, real-world graphs, such as social networks, molecular graphs, and traffic graphs, generally share non-Euclidean topologies and hidden hierarchies; for example, their degree distributions are mostly power-law distributions. Current latent diffusion models embed such hierarchical data in Euclidean space, which introduces distortions and interferes with modeling the distribution. Hyperbolic space, in contrast, has been found to be more suitable for capturing complex hierarchical structures due to its exponential growth property. In order to simultaneously utilize the data generation capabilities of diffusion models and the ability of hyperbolic embeddings to extract latent hierarchical distributions, we propose a novel graph generation method, the Hyperbolic Graph Diffusion Model (HGDM), which consists of an auto-encoder that encodes nodes into successive hyperbolic embeddings and a DM that operates in the hyperbolic latent space. HGDM captures the crucial graph structure distributions by constructing a hyperbolic latent node space that incorporates edge information. Extensive experiments show that HGDM achieves better performance on generic graph and molecule generation benchmarks, with a 48% improvement in generation quality for graphs with highly hierarchical structures.

## Introduction

Graph representation learning and graph generation are important machine learning tasks with wide applications in fields such as social networks, drug discovery, and recommendation systems. Real-world graphs, such as social networks, molecular graphs, user-item interactions, and traffic graphs, generally have non-Euclidean topologies and hidden hierarchies (Asif et al. 2021; Yang et al. 2017; Fan et al. 2019); e.g., their node degree distributions mostly follow power laws (Adamic et al. 2001). It is common practice in graph representation learning to embed node representations into Euclidean space (Chen et al. 2020). However, the polynomially growing capacity of Euclidean space struggles to maintain the intrinsic distances between the huge number of nodes located in long-tailed regions and to differentiate between them. Embedding such graphs into Euclidean space may therefore produce distortions (Linial, London, and Rabinovich 1995). This is especially true for drug molecule graphs, where changes in a few atoms or chemical bonds can have a significant effect on the properties of the molecule (Thornber 1979).

Figure 1: The number of points in 2D hyperbolic and Euclidean space that can be placed at radius r from the center point while maintaining a distance of at least s = 0.1 from each other.
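As a rough numerical companion to Figure 1 (our illustration, not the paper's code): in the Euclidean plane the circumference of a circle of radius r grows linearly as 2πr, while in the hyperbolic plane of curvature −1 it grows as 2π sinh(r), i.e., exponentially. The sketch below counts, approximately, how many points with arc-length spacing s fit on each circle.

```python
import math

# Illustration of Figure 1: how many points fit on a circle of radius r
# around the origin if we require (arc-length) spacing >= s.
# Euclidean circumference: 2*pi*r          (polynomial capacity)
# Hyperbolic circumference: 2*pi*sinh(r)   (exponential capacity, curvature -1)
s = 0.1
for r in [1, 2, 3, 4, 5]:
    n_euclidean = int(2 * math.pi * r / s)
    n_hyperbolic = int(2 * math.pi * math.sinh(r) / s)
    print(f"r={r}: Euclidean ~{n_euclidean:6d}, hyperbolic ~{n_hyperbolic:8d}")
```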
On the contrary, the capacity of hyperbolic space has been found to grow exponentially (Munzner 1997) (see Figure 1). This property makes hyperbolic spaces well-suited for embedding tree-like or scale-free graphs (Chen et al. 2022), and it has been observed that complex hierarchical structures become more prominent and easier to capture in hyperbolic embedding space (Mathieu et al. 2019; Nickel and Kiela 2018). Existing hyperbolic graph neural networks (Wu et al. 2021; Liu, Nickel, and Kiela 2019) show excellent performance in graph representation learning tasks such as node classification and edge prediction. Yang et al. (2022a,b) also demonstrate superior performance over Euclidean methods in predicting long-tail items of user-item interactions for collaborative filtering in recommendation systems.

Furthermore, regarding generative models, diffusion models have achieved remarkable results in synthesizing images (Rombach et al. 2022). A diffusion model gradually perturbs the data distribution during the forward diffusion process and then learns to recover the data distribution from noise through the reverse diffusion process. The Latent Diffusion Model (Rombach et al. 2022) embeds images into a low-dimensional latent space to reduce the computational complexity. Song et al. (2020) unified score matching with Langevin dynamics (SMLD) (Song and Ermon 2019) and denoising diffusion probabilistic modeling (DDPM) (Ho, Jain, and Abbeel 2020) into a system of stochastic differential equations (SDEs), referred to as score-based generative models, which model the gradient of the log probability density of the data, the so-called score. GDSS (Jo, Lee, and Hwang 2022) extended score-based generative models to graph generation tasks and showed that score-based models also outperform existing graph generative methods. However, GDSS operates in Euclidean space, which is inconsistent with the natural properties of graphs, does not adequately model the distribution of graph data, and still leaves room for improvement.

In recent years, efforts have been made to extend diffusion models to different manifolds to take full advantage of the underlying structure of the data. Torsional diffusion (Jing et al. 2022a) generates better molecular conformers by applying diffusion processes on the hypertorus to the torsion angles. Riemannian diffusion models such as RDM (Huang et al. 2022) and RSGM (De Bortoli et al. 2022) extended diffusion models to arbitrary compact and non-compact Riemannian manifolds. However, they only test a few simple predefined distributions on manifolds and do not adequately evaluate Riemannian diffusion models on larger real-world datasets.

In order to simultaneously utilize the excellent data generation capability of diffusion models and the ability of hyperbolic space to embed hierarchical distributions, we propose a two-stage graph diffusion model based on a hyperbolic latent space. First, we train a hyperbolic graph variational auto-encoder (HVAE) to learn hyperbolic representations of nodes. Then, we train two score models simultaneously to learn the reverse diffusion process in hyperbolic space (for nodes) and Euclidean space (for adjacency matrices), which also models the dependency between nodes and adjacency matrices.
In addition, since sampling from diffusion models is more time-consuming than from VAEs or GANs, and to avoid a further increase in time consumption caused by hyperbolic operations, we design a Hyperbolic Graph Attention (HGAT) layer as the basic module of the HVAE and the score-based models, maintaining performance without significantly increasing sampling time. Our approach effectively combines the properties of hyperbolic space with the generative quality advantages of diffusion models. Our main contributions are summarized as follows:

- We propose a novel approach for graph generation, the Hyperbolic Graph Diffusion Model (HGDM). To the best of our knowledge, HGDM is the first hyperbolic diffusion model for graph generation.
- We design a simple and novel hyperbolic graph attention layer that exploits the stronger representational capabilities of hyperbolic space without significantly increasing computation time.
- We consider both generic graph and molecular graph generation tasks. Experiments show that HGDM significantly outperforms all baselines on most metrics.

## Related Work

### Graph Generation

Simonovsky and Komodakis (2018) proposed GraphVAE, a method for generating small graphs with a variational auto-encoder (VAE). MolGAN (De Cao and Kipf 2018) combines generative adversarial networks (GANs) with reinforcement learning objectives to encourage the generation of molecules with specific desired chemical properties. Shi et al. (2020) and Zang and Wang (2020) generate molecules in a flow-based fashion. GraphDF (Luo, Yan, and Ji 2021) is an autoregressive flow-based model that utilizes discrete latent variables. Liu et al. (2021) proposed an energy-based model for generating molecular graphs.

### Hyperbolic Graph Neural Networks

Hyperbolic graph neural networks (HGNNs) (Liu, Nickel, and Kiela 2019; Chami et al. 2019; Dai et al. 2021; Chen et al. 2021) utilize the characteristics of hyperbolic space to better capture the hierarchical structure of graph data. The core idea is to embed node representations into hyperbolic space and measure their relationships with the hyperbolic distance, enabling better handling of graph data with hierarchical structure. Chami et al. (2019) introduced the Hyperbolic Graph Convolutional Network (HGCN), which achieved new state-of-the-art results in learning embeddings of real-world graphs with hierarchical and scale-free features. Through experiments with hyperbolic auto-encoders, Park et al. (2021) found that hyperbolic geometry improves performance on unsupervised graph tasks such as node clustering and link prediction. Mathieu et al. (2019) extended the VAE to the latent space of a Poincaré ball manifold as the P-VAE, which shows better generalization to unseen data and can qualitatively and quantitatively recover hierarchical structures.

### Diffusion Models

After the denoising diffusion probabilistic model (Ho, Jain, and Abbeel 2020) was proposed, a large number of follow-up works improved this type of model, such as DDIM (Song, Meng, and Ermon 2020) and score-based diffusion (Song et al. 2020). Some works extended diffusion models to graph generation tasks. Jo, Lee, and Hwang (2022) proposed a novel score-based graph generation model that generates both node features and adjacency matrices via a system of SDEs defining a diffusion process on graphs. Xu et al. (2022) proposed GeoDiff, a generative model for molecular conformation prediction.
In addition, some works have extended diffusion models to non-Euclidean manifolds to model the data distribution more adequately. Jing et al. (2022b) proposed torsional diffusion, a diffusion framework that operates on the space of torsion angles via a diffusion process on the hypertorus and an extrinsic-to-intrinsic score model. RDM (Huang et al. 2022) and RSGM (De Bortoli et al. 2022) extended diffusion models to arbitrary compact and non-compact Riemannian manifolds, but have not been fully evaluated on more real-world datasets.

## Preliminaries

### Riemannian Manifold

A Riemannian manifold $(\mathcal{M}, g^{\mathcal{M}})$ is a smooth manifold $\mathcal{M}$ equipped with a Riemannian metric $g^{\mathcal{M}}$ on the tangent space $T_x\mathcal{M}$ at every point $x \in \mathcal{M}$. $T_x\mathcal{M}$ is the vector space formed by all tangent vectors at $x$, which is locally a first-order approximation of the manifold at $x$. The metric $g^{\mathcal{M}}$ assigns a positive-definite inner product $\langle \cdot, \cdot \rangle: T_x\mathcal{M} \times T_x\mathcal{M} \to \mathbb{R}$ to the tangent space, which makes it possible to define several geometric properties. For a curve $\gamma: [a, b] \to \mathcal{M}$, the length of $\gamma$ is defined as $L(\gamma) = \int_a^b \|\gamma'(t)\|_g \, \mathrm{d}t$, where $\|\cdot\|_g$ denotes the norm induced by the Riemannian metric, i.e., $\|v\|_g = \sqrt{g(v, v)}$ for any $v \in T_x\mathcal{M}$. A geodesic is the analogue of a straight line in Euclidean geometry, realizing the shortest distance between two points $x, y \in \mathcal{M}$:

$$d_{\mathcal{M}}(x, y) = \inf L(\gamma),$$

where $\gamma$ ranges over curves with $\gamma(a) = x$ and $\gamma(b) = y$. Exponential and logarithmic maps are often described as additions and subtractions on Riemannian manifolds. The exponential map at a point $x \in \mathcal{M}$, denoted $\exp_x: T_x\mathcal{M} \to \mathcal{M}$, takes a tangent vector $v \in T_x\mathcal{M}$ to the point $y \in \mathcal{M}$ obtained by following the geodesic starting at $x$ in the direction of $v$ for a unit amount of time. The logarithmic map is the inverse of the exponential map, $\log_x = \exp_x^{-1}: \mathcal{M} \to T_x\mathcal{M}$.

**Poincaré Ball Model.** A hyperbolic manifold is a Riemannian manifold with constant negative curvature $c$ ($c < 0$). There exist multiple isometric hyperbolic models, including the Poincaré ball model, the Lorentz model, and the Klein model. The Poincaré ball model is denoted $(\mathbb{B}^n_c, g^{\mathbb{B}}_x)$, where $\mathbb{B}^n_c = \{x \in \mathbb{R}^n : \|x\|^2 < -1/c\}$ is an open $n$-dimensional ball with radius $1/\sqrt{|c|}$. Its metric tensor is $g^{\mathbb{B}}_x = (\lambda^c_x)^2 g^E_x$, with conformal factor $\lambda^c_x = 2/(1 + c\|x\|^2)$ and the Euclidean metric $g^E$, i.e., $I_n$. Given two points $x, y \in \mathbb{B}^n_c$, the induced geodesic distance on the Poincaré ball is

$$d^c_{\mathbb{B}}(x, y) = \frac{1}{\sqrt{|c|}} \cosh^{-1}\left(1 - \frac{2c\,\|x - y\|^2}{(1 + c\|x\|^2)(1 + c\|y\|^2)}\right). \tag{1}$$

Ganea, Bécigneul, and Hofmann (2018) derived closed-form formulations for the exponential map,

$$\exp^c_x(v) = x \oplus_c \left(\tanh\left(\frac{\sqrt{|c|}\,\lambda^c_x\,\|v\|}{2}\right) \frac{v}{\sqrt{|c|}\,\|v\|}\right), \tag{2}$$

where $v \in T_x\mathbb{B}$. The logarithmic map is given by

$$\log^c_x(y) = \frac{2}{\sqrt{|c|}\,\lambda^c_x} \tanh^{-1}\left(\sqrt{|c|}\,\|{-x} \oplus_c y\|\right) \frac{-x \oplus_c y}{\|{-x} \oplus_c y\|}. \tag{3}$$

The parallel transport $\mathrm{PT}_{x \to y}: T_x\mathbb{B} \to T_y\mathbb{B}$ moves a vector from one tangent space to another along a curve without changing it, and is given by

$$\mathrm{PT}^c_{x \to y}(v) = \frac{\lambda^c_x}{\lambda^c_y}\,\mathrm{gyr}[y, -x]\,v, \tag{4}$$

where $\oplus_c$ and $\mathrm{gyr}[\cdot, \cdot]\,v$ are the Möbius addition (Ungar 2007) and the gyration operator (Ungar 2008), respectively.
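For concreteness, the following NumPy sketch (ours, not the authors' code) implements the Poincaré ball operations above under the paper's $c < 0$ convention; the function names are illustrative.

```python
import numpy as np

C = -1.0  # curvature of the ball (c < 0), following the paper's convention

def lam(x, c=C):
    # Conformal factor lambda_x^c = 2 / (1 + c * ||x||^2)
    return 2.0 / (1.0 + c * np.dot(x, x))

def mobius_add(x, y, c=C):
    # Möbius addition on the Poincaré ball (Ungar 2007), c < 0 convention
    x2, y2, xy = np.dot(x, x), np.dot(y, y), np.dot(x, y)
    num = (1 - 2*c*xy - c*y2) * x + (1 + c*x2) * y
    den = 1 - 2*c*xy + c**2 * x2 * y2
    return num / den

def dist(x, y, c=C):
    # Geodesic distance, Eq. (1)
    sc = np.sqrt(abs(c))
    arg = 1 - 2*c*np.dot(x - y, x - y) / (
        (1 + c*np.dot(x, x)) * (1 + c*np.dot(y, y)))
    return np.arccosh(arg) / sc

def expmap(x, v, c=C):
    # Exponential map exp_x^c(v), Eq. (2); v lives in the tangent space at x
    sc, nv = np.sqrt(abs(c)), np.linalg.norm(v)
    if nv < 1e-12:
        return x
    return mobius_add(x, np.tanh(sc * lam(x, c) * nv / 2) * v / (sc * nv), c)

def logmap(x, y, c=C):
    # Logarithmic map log_x^c(y), Eq. (3)
    sc = np.sqrt(abs(c))
    u = mobius_add(-x, y, c)
    nu = np.linalg.norm(u)
    if nu < 1e-12:
        return np.zeros_like(x)
    return (2.0 / (sc * lam(x, c))) * np.arctanh(sc * nu) * u / nu

# Sanity check: log and exp should invert each other
x = np.array([0.1, 0.2]); y = np.array([-0.3, 0.4])
assert np.allclose(expmap(x, logmap(x, y)), y)
```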
### Graph Diffusion Model

Diffusion models perturb the data by adding progressively larger noise and then learn to reverse the perturbation. Recently, SMLD (Song and Ermon 2019) and DDPM (Ho, Jain, and Abbeel 2020) were unified into a system of SDEs (Song et al. 2020). The diffusion model perturbs the data through the forward SDE and estimates the score function, that is, the gradient of the log probability density with respect to the data. Sampling is performed by plugging the estimated score function into the reverse-time SDE. GDSS (Jo, Lee, and Hwang 2022) extended diffusion models to graph generation tasks: graph diffusion models perturb both node features and the adjacency matrix, modeling their dependency to learn how to convert noise into graphs.

A graph $G$ with $N$ nodes is defined by its node features $X \in \mathbb{R}^{N \times F}$ and its weighted adjacency matrix $A \in \mathbb{R}^{N \times N}$ as $G = (X, A) \in \mathbb{R}^{N \times F} \times \mathbb{R}^{N \times N} =: \mathcal{G}$, where $F$ is the dimension of the node features. Formally, the diffusion process can be represented as a trajectory of random variables $\{G_t = (X_t, A_t)\}_{t \in [0, T]}$ over a fixed time horizon $[0, T]$, where $G_0$ is a graph from the data distribution $p_{\text{data}}$. The SDE of the diffusion process is

$$\mathrm{d}G_t = \mathbf{f}_t(G_t)\,\mathrm{d}t + g_t(G_t)\,\mathrm{d}\mathbf{w}, \quad G_0 \sim p_{\text{data}}, \tag{5}$$

where $\mathbf{f}_t(\cdot): \mathcal{G} \to \mathcal{G}$ is the linear drift coefficient, $g_t(\cdot): \mathcal{G} \to \mathcal{G} \times \mathcal{G}$ is the diffusion coefficient, and $\mathbf{w}$ is the standard Wiener process. The reverse-time SDE is separated into two parts in (Jo, Lee, and Hwang 2022):

$$\begin{aligned}
\mathrm{d}X_t &= \left[\mathbf{f}_{1,t}(X_t) - g^2_{1,t}\,\nabla_{X_t} \log p(X_t, A_t)\right] \mathrm{d}\bar{t} + g_{1,t}\,\mathrm{d}\bar{\mathbf{w}}_1, \\
\mathrm{d}A_t &= \left[\mathbf{f}_{2,t}(A_t) - g^2_{2,t}\,\nabla_{A_t} \log p(X_t, A_t)\right] \mathrm{d}\bar{t} + g_{2,t}\,\mathrm{d}\bar{\mathbf{w}}_2,
\end{aligned} \tag{6}$$

where $\mathbf{f}_{1,t}$ and $\mathbf{f}_{2,t}$ are linear drift coefficients satisfying $\mathbf{f}_t(X, A) = (\mathbf{f}_{1,t}(X), \mathbf{f}_{2,t}(A))$, $g_{1,t}$ and $g_{2,t}$ are scalar diffusion coefficients, and $\bar{\mathbf{w}}_1, \bar{\mathbf{w}}_2$ are reverse-time standard Wiener processes. $\nabla_{X_t} \log p(X_t, A_t)$ and $\nabla_{A_t} \log p(X_t, A_t)$ are called the partial score functions.

## Hyperbolic Graph Diffusion Model

In this section, we introduce the proposed Hyperbolic Graph Diffusion Model (HGDM). First, we introduce the hyperbolic wrapped normal distribution (HWN) and its sampling procedure. Second, we describe how to perturb data in hyperbolic space and give the training objective. Third, we introduce a novel hyperbolic graph attention layer and the architecture of our model. Finally, we present the sampling method for the reverse diffusion process in hyperbolic space.

### Probability Distributions on Hyperbolic Space

As an alternative to the normal distribution in Euclidean space, the hyperbolic wrapped normal distribution (Nagano et al. 2019; Mathieu et al. 2019) is easy to sample from and has a well-defined density function. For the Poincaré ball manifold, its density function is

$$\mathcal{N}^W_{\mathbb{B}^d_c}(z \mid \mu, \Sigma) = \mathcal{N}\left(\lambda^c_\mu \log^c_\mu(z) \,\middle|\, 0, \Sigma\right) \left(\frac{\sqrt{|c|}\,d^c_p(\mu, z)}{\sinh\left(\sqrt{|c|}\,d^c_p(\mu, z)\right)}\right)^{d-1}, \tag{7}$$

and its log-density is

$$\log p(z) = \log p(v) + (d - 1) \log \frac{\sqrt{|c|}\,d^c_p(\mu, z)}{\sinh\left(\sqrt{|c|}\,d^c_p(\mu, z)\right)}, \tag{8}$$

where $z \in \mathbb{B}^d_c$ and $p(v)$ is the normal density in the tangent space $T_o\mathbb{B}$ at the origin. This distribution can be constructed by defining a Gaussian distribution on the tangent space at the origin of the hyperbolic space, transporting the tangent space to a desired location using Eq. (4), and projecting the distribution onto the hyperbolic space using Eq. (2). We use the HWN as the prior distribution on the hyperbolic manifold; its gradient is convenient to compute using Eq. (8).
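The construction just described translates directly into a sampling routine. Below is our sketch, reusing `lam` and `expmap` from the earlier listing; parallel transport from the origin reduces to scaling by $\lambda^c_o/\lambda^c_\mu = 2/\lambda^c_\mu$ because $\mathrm{gyr}[\mu, 0]$ is the identity.

```python
def sample_hwn(mu, Sigma, c=C, rng=None):
    """Draw one sample from the hyperbolic wrapped normal N^W(mu, Sigma)."""
    if rng is None:
        rng = np.random.default_rng()
    # 1. Draw v ~ N(0, Sigma) in the tangent space at the origin T_o B.
    v = rng.multivariate_normal(np.zeros(len(mu)), Sigma)
    # 2. Parallel-transport v from the origin to T_mu B (Eq. (4));
    #    from the origin this is just a rescaling by 2 / lambda_mu.
    u = (2.0 / lam(mu, c)) * v
    # 3. Project onto the manifold with the exponential map at mu (Eq. (2)).
    return expmap(mu, u, c)

# Usage: a wrapped normal centered away from the origin
mu = expmap(np.zeros(2), np.array([0.3, 0.1]))
z = sample_hwn(mu, 0.1 * np.eye(2))
```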
### Perturbing Data on Hyperbolic Space

We consider the two types of noise perturbation given in (Song et al. 2020), Variance Exploding (VE) and Variance Preserving (VP). For Euclidean features we have

$$p(x(t) \mid x(0)) =
\begin{cases}
\mathcal{N}\left(x(0),\, \left[\sigma^2(t) - \sigma^2(0)\right] I\right) & \text{(VE SDE)} \\
\mathcal{N}\left(x(0)\, e^{-\frac{1}{2}\int_0^t \beta(s)\,\mathrm{d}s},\, I - I e^{-\int_0^t \beta(s)\,\mathrm{d}s}\right) & \text{(VP SDE)}
\end{cases} \tag{9}$$

where $\sigma(t) = \sigma_{\min} (\sigma_{\max}/\sigma_{\min})^t$ for $t \in (0, 1]$ and $\beta(t) = \beta_{\min} + t\,(\beta_{\max} - \beta_{\min})$ for $t \in [0, 1]$. In order to train on different time steps in parallel, we directly give the hyperbolic perturbation kernel $p(x_t \mid x_0)$:

$$p(x(t) \mid x(0)) =
\begin{cases}
\mathcal{N}^W_{\mathbb{B}}\left(x(0),\, \left[\sigma^2(t) - \sigma^2(0)\right] I\right) & \text{(VE SDE)} \\
\mathcal{N}^W_{\mathbb{B}}\left(e^{-\frac{1}{2}\int_0^t \beta(s)\,\mathrm{d}s} \otimes_c x(0),\, I - I e^{-\int_0^t \beta(s)\,\mathrm{d}s}\right) & \text{(VP SDE)}
\end{cases} \tag{10}$$

where $\otimes_c$ is the Möbius scalar multiplication (Ungar 2007).

### Training Objective

The training process consists of two stages. First, we train a hyperbolic graph variational auto-encoder (HVAE) to generate the hyperbolic representations of nodes. The training objective of the HVAE is

$$\mathcal{L}_{\text{VAE}} = \mathcal{L}_{\text{rec}} + \mathcal{L}_{\text{kl}} + \mathcal{L}_{\text{edge}}, \tag{11}$$

a reconstruction loss combined with two regularization terms. The Kullback-Leibler (KL) regularization prevents the latent embeddings from having arbitrarily high variance, and empirically we find it helpful for learning the latent reverse diffusion process. We also add a link prediction regularization objective to encourage the embeddings to preserve the graph structure; it implicitly constrains nodes with high degrees to lie close to the origin (see Figure 2). We use the Fermi-Dirac decoder to compute probability scores for edges:

$$p\left(A_{i,j} = 1 \mid x^{\mathbb{B}}_i, x^{\mathbb{B}}_j\right) = \left[e^{\left(d^c_{\mathbb{B}}(x^{\mathbb{B}}_i, x^{\mathbb{B}}_j)^2 - r\right)/t} + 1\right]^{-1}. \tag{12}$$

Figure 2: The average distance from the origin of the hyperbolic node embeddings as a function of node degree, on QM9, ZINC250k, Enzymes, and Ego-small.

Second, following Jo, Lee, and Hwang (2022), we train two neural networks $s_{\theta,t}$ and $s_{\phi,t}$ simultaneously to estimate the partial score functions. Differently, however, given a graph $G = (X, A) \in \mathbb{R}^{N \times F} \times \mathbb{R}^{N \times N}$ from the data, we first use a hyperbolic encoder to generate the hyperbolic latent node features $X^{\mathbb{B}}_0 \in \mathbb{B}^F_c$, which, combined with the original adjacency matrix $A$, form $G_0 = (X^{\mathbb{B}}_0, A_0)$. Then we sample $G_t = (X^{\mathbb{B}}_t, A_t)$ using Eq. (9) and Eq. (10). We use $s_{\theta,t}$ and $s_{\phi,t}$ to estimate the gradients of the joint log-density so that $s_{\theta,t}(G_t) \approx \nabla_{X^{\mathbb{B}}_t} \log p(X^{\mathbb{B}}_t \mid X^{\mathbb{B}}_0)$ and $s_{\phi,t}(G_t) \approx \nabla_{A_t} \log p(A_t \mid A_0)$. It is worth noting that $\nabla_{X^{\mathbb{B}}_t} \log p(X^{\mathbb{B}}_t \mid X^{\mathbb{B}}_0)$ lies in $T_{X^{\mathbb{B}}_t}\mathbb{B}$. The training objectives are

$$\min_\theta \mathbb{E}_t \left[\lambda_1(t)\, \mathbb{E}_{G_0} \mathbb{E}_{G_t \mid G_0} \left\| s_{\theta,t}(G_t) - \nabla_{X^{\mathbb{B}}_t} \log p\left(X^{\mathbb{B}}_t \mid X^{\mathbb{B}}_0\right) \right\|^2_2 \right],$$

$$\min_\phi \mathbb{E}_t \left[\lambda_2(t)\, \mathbb{E}_{G_0} \mathbb{E}_{G_t \mid G_0} \left\| s_{\phi,t}(G_t) - \nabla_{A_t} \log p\left(A_t \mid A_0\right) \right\|^2_2 \right], \tag{13}$$

where $\lambda_1(t)$ and $\lambda_2(t)$ are positive weighting functions and $t$ is sampled uniformly from $[0, T]$.
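Looking back at Eq. (12), the Fermi-Dirac decoder is a one-liner; the sketch below is ours, reusing `dist` from the earlier listing, with placeholder values for the radius r and temperature t (these hyperparameters are not specified in the text above).

```python
def fermi_dirac_edge_prob(x_i, x_j, r=2.0, t=1.0, c=C):
    # Eq. (12): probability of an edge between nodes embedded at x_i, x_j.
    # Closer embeddings (in hyperbolic distance) get higher edge probability;
    # r (radius) and t (temperature) are illustrative defaults, not the
    # paper's settings.
    return 1.0 / (np.exp((dist(x_i, x_j, c) ** 2 - r) / t) + 1.0)
```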
### Hyperbolic Graph Attention Layer

We propose the Hyperbolic Graph Attention (HGAT) layer as the basic building block of the HVAE and the node score model $s_{\theta,t}$. It is a variant of GAT¹ and has lower computational complexity than HGCN (Chami et al. 2019) (see Table 3). The input of the layer is a set of hyperbolic node features $h^{\mathbb{B}} = \{h^{\mathbb{B}}_1, h^{\mathbb{B}}_2, \ldots, h^{\mathbb{B}}_N\}$, $h^{\mathbb{B}}_i \in \mathbb{B}^d$, where $N$ is the number of nodes and $d$ is the feature dimension. We first map $h^{\mathbb{B}}$ into the tangent space at the origin using Eq. (3), $h^E = \log^c_o(h^{\mathbb{B}})$, in order to perform the attention aggregation mechanism (Veličković et al. 2017). Next, we employ multi-head attention similar to (Veličković et al. 2017), but we add the adjacency feature $A_{ij}$ when computing the attention coefficients $e^k_{ij}$, where $j \in \mathcal{N}_i$:

$$e^k_{ij} = \text{LeakyReLU}\left(W^k_0 \left[h^E_i, h^E_j, A_{ij}\right]\right). \tag{14}$$

We normalize the coefficients across all choices of $j$ using the softmax function:

$$\alpha^k_{ij} = \text{softmax}_j\left(e^k_{ij}\right). \tag{15}$$

Then we concatenate the features of the $K$ heads after computing the linear combination of the features weighted by the normalized attention coefficients:

$$h^{E\prime}_i = \Big\Vert_{k=1}^{K} \sum_{j \in \mathcal{N}_i} \alpha^k_{ij} W^k_1 h^E_j. \tag{16}$$

We map the output $h^{E\prime}_i$ of the attention mechanism back onto the manifold at $h^{\mathbb{B}}_i$ using the exponential map, which can be regarded as a residual connection (He et al. 2016) on the hyperbolic manifold, and finally apply a manifold-preserving activation function $\sigma$ (e.g., ReLU or leaky ReLU) (Liu, Nickel, and Kiela 2019):

$$h^{\mathbb{B}}_{i,\text{out}} = \sigma\left(\exp^c_{h^{\mathbb{B}}_i}\left(\mathrm{PT}^c_{o \to h^{\mathbb{B}}_i}\left(h^{E\prime}_i\right)\right)\right). \tag{17}$$

¹https://github.com/gordicaleksa/pytorch-GAT

### Hyperbolic Score-based Model

We implement a hyperbolic variational auto-encoder to generate the hyperbolic representations $X^{\mathbb{B}}$ of nodes.

**Encoder.** The encoder first uses a hyperbolic embedding layer to convert the input node features $X^E$ into hyperbolic embeddings $X^{0,\mathbb{B}}$. $(X^{0,\mathbb{B}}, A)$ is then passed through an $n$-layer HGAT to learn the structural information of the graph. The output $X^{n,\mathbb{B}}$ is mapped to Euclidean space using Eq. (3) and fed into an MLP that produces the mean $\mu$ and distortion $\sigma$. We use Eq. (2) to map $\mu$ to the Poincaré ball and obtain $X^{\mathbb{B}}$.

**Decoder.** First, we obtain the hyperbolic node features $X^{\mathbb{B}} = \{x_1, x_2, \ldots, x_{|V|}\}$, $x_i \in \mathbb{B}$, via reparameterization. After passing $(X^{\mathbb{B}}, A)$ through $k$ layers of HGAT, we get the output $X^{k,\mathbb{B}}$. The decoder then uses a centroid-distance layer (Liu, Nickel, and Kiela 2019) to convert $X^{k,\mathbb{B}}$ into Euclidean space. The centroid-distance layer is equipped with a list of learnable centroids $\mathcal{C} = [c_1, c_2, \ldots, c_{|\mathcal{C}|}]$ with each $c_i \in \mathbb{B}$. By computing the pairwise distances $\psi_{ij}$ between $c_i$ and $x_j$, we obtain Euclidean features that still utilize the hyperbolic metric, and use them to recover $X^E$.

**Score Model.** We implement the hyperbolic score-based model $s_{\theta,t}(G_t)$ to estimate $\nabla_{X^{\mathbb{B}}_t} \log p(X^{\mathbb{B}}_t, A_t)$. The computation of $s_{\theta,t}$ consists of

$$V' = \text{MLP}\left(\left[\left\{\log^c_o\left(\text{HGAT}\left(H^{\mathbb{B}}_i, A_t\right)\right)\right\}_{i=0}^{K}\right]\right),$$

$$V = \log^c_{X^{\mathbb{B}}_t}\left(\exp^c_o\left(V'\right)\right) \cdot \text{MLP}\left(\left[t, \lambda^c_{X^{\mathbb{B}}_t}\right]\right), \tag{18}$$

where $V = s_{\theta,t}(G_t)$ lies in the tangent space of $X^{\mathbb{B}}_t$, $H^{\mathbb{B}}_{i+1} = \text{HGAT}(H^{\mathbb{B}}_i, A_t)$ with $H^{\mathbb{B}}_0 = X^{\mathbb{B}}_t$ given, $[\cdot]$ denotes concatenation, and $K$ denotes the number of HGAT layers. Unlike the score in Euclidean space, we find that the norm of hyperbolic scores is related to the Poincaré metric $g^{\mathbb{B}}_x = (\lambda^c_x)^2 g^E_x$; since the metric varies with $x$, this makes the score difficult to match. We use an MLP to rescale the output according to the time step and the conformal factor $\lambda^c_x$, which empirically helps the model capture the norm of the true score.

We use the score-based model $s_{\phi,t}(G_t)$ to estimate $\nabla_{A_t} \log p(X^{\mathbb{B}}_t, A_t)$ as follows:

$$s_{\phi,t}(G_t) = \text{MLP}\left(\left[\left\{H^E_i\right\}_{i=0}^{L}\right]\right), \tag{19}$$

where $H^E_{i+1} = \text{GNN}(H^E_i, A_t)$, $H^E_0$ is obtained by a centroid-distance layer, and $L$ is the number of GNN layers.

### Reverse Diffusion Sampling on Hyperbolic Space

We implement the Predictor-Corrector samplers (Song et al. 2020) in hyperbolic space (see Algorithms 1 and 2). We use the Möbius scalar multiplication $\otimes_c$ (Ungar 2007) and the exponential map in Eq. (2) to keep the reverse diffusion process on the manifold; $\{\epsilon_i\}_{i=0}^{N-1}$ are the step sizes for Langevin dynamics.

Algorithm 1: Hyperbolic PC sampling (VE SDE)
1: $x_N \sim \mathcal{N}^W_{\mathbb{B}^d_c}(0, \sigma^2_{\max} I)$
2: for $i = N-1$ to $0$ do
3:  $x'_i \leftarrow \exp^c_{x_{i+1}}\left((\sigma^2_{i+1} - \sigma^2_i)\, s_\theta(x_{i+1}, A_{i+1})\right)$
4:  $x_i \sim \mathcal{N}^W_{\mathbb{B}^d_c}\left(x'_i, \sqrt{\sigma^2_{i+1} - \sigma^2_i}\; I\right)$
5:  for $j = 1$ to $M$ do
6:   $x'_i \leftarrow \exp^c_{x_i}\left(\epsilon_i\, s_\theta(x_i, A_i)\right)$
7:   $x_i \sim \mathcal{N}^W_{\mathbb{B}^d_c}\left(x'_i, \sqrt{2\epsilon_i}\; I\right)$
8:  end for
9: end for
10: return $x_0$

Algorithm 2: Hyperbolic PC sampling (VP SDE)
1: $x_N \sim \mathcal{N}^W_{\mathbb{B}^d_c}(0, I)$
2: for $i = N-1$ to $0$ do
3:  $x'_i \leftarrow \left(2 - \sqrt{1 - \beta_{i+1}}\right) \otimes_c x_{i+1}$
4:  $x'_i \leftarrow \exp^c_{x'_i}\left(\beta_{i+1}\, s_\theta(x_{i+1}, A_{i+1})\right)$
5:  $x_i \sim \mathcal{N}^W_{\mathbb{B}^d_c}\left(x'_i, \sqrt{\beta_{i+1}}\; I\right)$
6:  for $j = 1$ to $M$ do
7:   $x'_i \leftarrow \exp^c_{x_i}\left(\epsilon_i\, s_\theta(x_i, A_i)\right)$
8:   $x_i \sim \mathcal{N}^W_{\mathbb{B}^d_c}\left(x'_i, \sqrt{2\epsilon_i}\; I\right)$
9:  end for
10: end for
11: return $x_0$
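To make Algorithm 1 concrete, here is a condensed NumPy sketch (ours) for a single node embedding; `score_fn` stands in for the trained score model $s_\theta$, and `expmap`/`sample_hwn` are the helpers from the earlier listings.

```python
def hyperbolic_pc_sample_ve(score_fn, A, sigmas, eps, M, d, c=C, rng=None):
    """One VE-SDE Predictor-Corrector chain on the Poincaré ball (sketch)."""
    if rng is None:
        rng = np.random.default_rng()
    # Line 1 of Algorithm 1: start from the wrapped-normal prior.
    x = sample_hwn(np.zeros(d), sigmas[-1] ** 2 * np.eye(d), c, rng)
    for i in range(len(sigmas) - 2, -1, -1):
        var = sigmas[i + 1] ** 2 - sigmas[i] ** 2
        # Predictor: reverse-diffusion step along the exp map, then re-noise
        # with a wrapped normal (score_fn returns a tangent vector at x).
        x = expmap(x, var * score_fn(x, A), c)
        x = sample_hwn(x, var * np.eye(d), c, rng)
        # Corrector: M steps of Langevin dynamics kept on the manifold.
        for _ in range(M):
            x = expmap(x, eps[i] * score_fn(x, A), c)
            x = sample_hwn(x, 2 * eps[i] * np.eye(d), c, rng)
    return x
```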
After sampling, a pre-trained decoder from the HVAE is used to recover $x_0$ into the original node features, such as atom types.

## Experimental Results

In this section, we evaluate our method on generic graph datasets and molecular datasets and compare it with the baselines.

### Generic Graph Generation

**Experimental Setup.** We validate HGDM on four generic graph datasets: (1) Ego-small, 200 small ego graphs drawn from the larger Citeseer network dataset (Sen et al. 2008); (2) Community-small, 100 randomly generated community graphs (Jo, Lee, and Hwang 2022); (3) Enzymes, 587 protein graphs representing the protein tertiary structures of the enzymes in the BRENDA database (Schomburg et al. 2004); and (4) Grid, 100 standard 2D grid graphs (Jo, Lee, and Hwang 2022). We use the maximum mean discrepancy (MMD) to compare the distributions of graph statistics between the same number of generated and test graphs. Following Jo, Lee, and Hwang (2022), we measure the distributions of degree, clustering coefficient, and the number of occurrences of orbits with 4 nodes. We also report the mean graph hyperbolicity value δ of each dataset by computing Gromov's δ-hyperbolicity (Adcock, Sullivan, and Mahoney 2013), a notion from group theory that measures how tree-like a graph is. The lower δ, the more hyperbolic the graph dataset; δ = 0 for trees.

**Baselines.** We compare our proposed method against the following generative models, summarized in Table 1. GraphVAE (Simonovsky and Komodakis 2018) is a VAE-based model. DeepGMG (Li et al. 2018) and GraphRNN (You et al. 2018) are autoregressive RNN-based models. GNF (Liu et al. 2019) is a one-shot flow-based model. GraphAF (Shi et al. 2020) is an autoregressive flow-based model. EDP-GNN (Niu et al. 2020) and GDSS (Jo, Lee, and Hwang 2022) are score-based models. GraphDF (Luo, Yan, and Ji 2021) is an autoregressive flow-based model that utilizes discrete latent variables.

| Method | Deg. | Clus. | Orbit | Avg. | Deg. | Clus. | Orbit | Avg. | Deg. | Clus. | Orbit | Avg. | Deg. | Clus. | Orbit | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| *Dataset* | *Ego-small* | | | | *Community-small* | | | | *Enzymes* | | | | *Grid* | | | |
| DeepGMG | 0.040 | 0.100 | 0.020 | 0.053 | 0.220 | 0.950 | 0.400 | 0.523 | - | - | - | - | - | - | - | - |
| GraphRNN | 0.090 | 0.220 | 0.003 | 0.104 | 0.080 | 0.120 | 0.040 | 0.080 | 0.017 | 0.062 | 0.046 | 0.042 | 0.064 | 0.043 | 0.021 | 0.043 |
| GraphAF | 0.03 | 0.11 | 0.001 | 0.047 | 0.18 | 0.20 | 0.02 | 0.133 | 1.669 | 1.283 | 0.266 | 1.073 | - | - | - | - |
| GraphDF | 0.04 | 0.13 | 0.01 | 0.060 | 0.06 | 0.12 | 0.03 | 0.070 | 1.503 | 1.061 | 0.202 | 0.922 | - | - | - | - |
| GraphVAE | 0.130 | 0.170 | 0.050 | 0.117 | 0.350 | 0.980 | 0.540 | 0.623 | 1.369 | 0.629 | 0.191 | 0.730 | 1.619 | 0.0 | 0.919 | 0.846 |
| GNF | 0.030 | 0.100 | 0.001 | 0.044 | 0.200 | 0.200 | 0.110 | 0.170 | - | - | - | - | - | - | - | - |
| EDP-GNN | 0.052 | 0.093 | 0.007 | 0.051 | 0.053 | 0.144 | 0.026 | 0.074 | 0.023 | 0.268 | 0.082 | 0.124 | 0.455 | 0.238 | 0.328 | 0.340 |
| GDSS | 0.021 | 0.024 | 0.007 | 0.017 | 0.045 | 0.086 | 0.007 | 0.046 | 0.026 | 0.061 | 0.009 | 0.032 | 0.111 | 0.005 | 0.070 | 0.062 |
| HGDM (ours) | 0.015 | 0.023 | 0.003 | 0.014 | 0.017 | 0.050 | 0.005 | 0.024 | 0.045 | 0.049 | 0.003 | 0.032 | 0.137 | 0.004 | 0.048 | 0.063 |
| Improv. over GDSS | 28.6% | 4.2% | 57.1% | 17.6% | 62.2% | 41.9% | 28.6% | 47.8% | -73.1% | 19.7% | 66.7% | 0.0% | -23.4% | 20.0% | 31.4% | -1.6% |

Table 1: Generation results on the generic graph datasets (MMD; lower is better). Dataset statistics: Ego-small (real, 4 ≤ |V| ≤ 18, δ = 0.25); Community-small (synthetic, 12 ≤ |V| ≤ 20, δ = 1); Enzymes (real, 10 ≤ |V| ≤ 125, δ = 1.26); Grid (synthetic, 100 ≤ |V| ≤ 400, δ = 9.76). Results of the baselines are taken from published papers (Niu et al. 2020; Luo, Yan, and Ji 2021; Jo, Lee, and Hwang 2022). A hyphen (-) denotes that the results are not provided in the original paper. We report the mean graph hyperbolicity value δ of each dataset; due to space limitations, the standard deviations are provided in the Appendix.
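To make the evaluation protocol concrete, here is a simplified stand-in (ours) for a squared-MMD estimate between two sets of graph-statistic histograms, using a Gaussian kernel; the actual benchmark code may use a different kernel over these statistics.

```python
def gaussian_mmd(X, Y, sigma=1.0):
    # Biased (V-statistic) squared MMD between sample sets X and Y
    # (rows = samples, e.g., normalized degree histograms of graphs).
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

# Usage: hist_gen, hist_test are (num_graphs, num_bins) arrays of
# degree histograms for generated and test graphs, respectively.
```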
**Results.** Table 1 shows that HGDM significantly outperforms all baselines, including GDSS, achieving the best results on most metrics. In particular, HGDM outperforms GDSS with a 37.1% average decrease of MMD on Ego-small and Community-small, indicating a distinct advantage in generating graphs with low hyperbolicity (δ ≤ 1). Although the graphs in Enzymes deviate from a power-law distribution, HGDM still matches GDSS on the average statistics. The graphs in Grid deviate further from a hierarchical structure (δ = 9.76) and are closer to a flat Euclidean topology with zero curvature, in which case the autoregressive model GraphRNN has an advantage over score-based models such as GDSS. HGDM still achieves results comparable to GDSS on the average statistics and outperforms all other baselines except GraphRNN. This demonstrates the scalability of HGDM, which retains good modeling capability even for graphs with large δ values. In general, we find that our model performs better when generating graphs with small δ-hyperbolicity.

### Molecule Generation

**Datasets.** Following GDSS (Jo, Lee, and Hwang 2022), we test our model on the QM9 (Ramakrishnan et al. 2014) and ZINC250k (Irwin et al. 2012) datasets to assess its ability to learn molecular distributions. The QM9 dataset contains 134k stable small molecules with up to 9 heavy atoms (C, O, N, F). The ZINC250k dataset encompasses 250k chemical compounds with drug-like properties; on average, these molecules are larger (23 heavy atoms) and structurally more complex than those in QM9. Following previous works (Luo, Yan, and Ji 2021; Jo, Lee, and Hwang 2022), the molecules are kekulized by the RDKit library (Landrum et al. 2016) with hydrogen atoms removed. We also report the mean graph hyperbolicity values δ.

**Metrics.** We sample 10,000 molecules with our model and evaluate their quality with the following metrics. Fréchet ChemNet Distance (FCD) (Preuer et al. 2018) evaluates the distance between the training and generated sets by analyzing the activations of the penultimate layer of ChemNet. Neighborhood Subgraph Pairwise Distance Kernel (NSPDK) MMD (Costa and De Grave 2010) computes the MMD between generated and test molecules, considering both node and edge features. FCD and NSPDK MMD are crucial metrics that evaluate how well a model learns the distribution of the training molecules by measuring the proximity of the generated molecules to that distribution: FCD assesses proximity in chemical space, whereas NSPDK MMD measures proximity in terms of graph structure. Validity w/o correction, used for fair comparison with (Jo, Lee, and Hwang 2022), is the proportion of valid molecules without valency correction or edge resampling. Time measures the time for generating 10,000 molecules in the form of RDKit molecules.
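For concreteness, validity without correction can be computed along the following lines with RDKit (our sketch; the paper's exact evaluation script may differ).

```python
from rdkit import Chem

def validity_wo_correction(smiles_list):
    # Fraction of generated molecules that RDKit parses and sanitizes
    # without any post-hoc valency correction or edge resampling.
    valid = 0
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)  # returns None on sanitization failure
        if mol is not None:
            valid += 1
    return valid / len(smiles_list)
```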
**Baselines.** We use GDSS (Jo, Lee, and Hwang 2022) as our main baseline. We also compare the performance of HGDM against GraphAF (Shi et al. 2020), GraphDF (Luo, Yan, and Ji 2021), MoFlow (Zang and Wang 2020), EDP-GNN (Niu et al. 2020), and GraphEBM (Liu et al. 2021).

| Method | Val. w/o corr. (%) | NSPDK MMD | FCD | Time (s) | Val. w/o corr. (%) | NSPDK MMD | FCD | Time (s) |
|---|---|---|---|---|---|---|---|---|
| *Dataset* | *QM9 (δ = 0.7)* | | | | *ZINC250k (δ = 1)* | | | |
| GraphAF | 67 | 0.020 | 5.268 | 2.52e3 | 68 | 0.044 | 16.289 | 5.80e3 |
| GraphDF | 82.67 | 0.063 | 10.816 | 5.35e4 | 89.03 | 0.176 | 34.202 | 6.02e3 |
| MoFlow | 91.36 | 0.017 | 4.467 | 4.60 | 63.11 | 0.046 | 20.931 | 2.45e1 |
| EDP-GNN | 47.52 | 0.005 | 2.680 | 4.40e3 | 82.97 | 0.049 | 16.737 | 9.09e3 |
| GraphEBM | 8.22 | 0.030 | 6.143 | 3.71e1 | 5.29 | 0.212 | 35.471 | 5.46e1 |
| GDSS | 95.79 | 0.003 | 2.813 | 1.14e2 | 95.90 | 0.019 | 16.621 | 2.02e3 |
| HGDM (ours) | 98.04 | 0.002 | 2.131 | 1.23e2 | 93.51 | 0.016 | 17.69 | 2.23e3 |

Table 2: Generation results on the QM9 and ZINC250k datasets. Results of the baselines are taken from (Jo, Lee, and Hwang 2022); results for some methods were obtained by running open-source code. We report the mean graph hyperbolicity value δ of the two datasets. Due to space limitations, results for validity, uniqueness, and novelty are shown in the Appendix.

**Results.** Table 2 shows that our method also performs excellently on molecular generation tasks. On the QM9 dataset, our method is the best on all metrics and significantly outperforms our main comparator, GDSS. HGDM achieves the highest validity without post-hoc valency correction, showing that it can effectively learn the valency rules of molecules. HGDM also outperforms all baselines in NSPDK MMD and FCD, which indicates that HGDM not only efficiently learns the distribution of graph structures from the hyperbolic latent space and benefits from it, but also generates molecules that are close to the data distribution in chemical space. We also report the generation results on ZINC250k in Table 2. As observed on the generic graph datasets, the performance of HGDM in FCD and validity degrades slightly, since ZINC250k deviates more from a hyperbolic structure and has more complex chemical properties than QM9. However, HGDM retains its advantage from the geometric-structure perspective, outperforming all baselines in NSPDK MMD. Overall, HGDM performs similarly to GDSS on ZINC250k and outperforms all other baselines. The superior performance of HGDM on molecule generation validates the ability of our method to efficiently learn the underlying distribution of molecular graphs with multiple node and edge types.

### Ablation Study

We implement a variant of GDSS called GDSS+AE, which incorporates an auto-encoder to generate Euclidean node embeddings and uses GDSS to estimate the score function in the Euclidean latent space. We use GDSS+AE to compare the effect of Euclidean versus hyperbolic latent spaces on graph diffusion. In addition, we test HGDM with the HGCN layer as its building block, called HGDM+hgcn. The results of these variants on QM9 are provided in Table 3.

| Method | Val. w/o corr. (%) | NSPDK MMD | FCD | Time (s) |
|---|---|---|---|---|
| GDSS | 95.79 | 0.003 | 2.813 | 1.14e2 |
| GDSS+AE | 95.95 | 0.003 | 2.615 | 1.17e2 |
| HGDM+hgcn (ours) | 96.64 | 0.002 | 2.331 | 1.48e2 |
| HGDM (ours) | 98.04 | 0.002 | 2.131 | 1.23e2 |

Table 3: Generation results of the variants of GDSS and HGDM on the QM9 dataset.

**Necessity of Hyperbolic Hidden Space.** We find that GDSS+AE slightly improves on the metrics compared to GDSS.
It can be argued that the dense embedding generated by the auto-encoder helps to learn the distribution of the graph, but it is not comparable to our hyperbolic approach due to the restricted capacity growth of Euclidean space.

**Necessity of HGAT Layers.** HGDM+hgcn is comparable to our model on most metrics, but its increase in time cost over GDSS is larger, whereas our method adds only a slight overhead, making it more suitable for scaling up to the generation of large graphs. This demonstrates the importance of our proposed HGAT layer.

## Conclusion

In this work, we proposed the two-stage Hyperbolic Graph Diffusion Model (HGDM) to better learn the distribution of graph data, together with a simple hyperbolic graph attention layer that reduces the extra time cost associated with hyperbolic methods. HGDM overcomes limitations of previous graph generation methods and is closer to the nature of graphs. We found experimentally that learning the distribution in hyperbolic space benefits the quality of generated graphs with power-law distributions. Experimental results on generic graph and molecule generation show that HGDM outperforms existing graph generation methods on most metrics, demonstrating the importance of learning the underlying manifold for graph generation tasks. Code is available at https://github.com/LF-WEN/HGDM.

## Acknowledgments

This work is supported by the National Natural Science Foundation of China (42130112, 62272170). This work is also supported by the Digital Silk Road Shanghai International Joint Lab of Trustworthy Intelligent Software (22510750100), the General Program of Shanghai Natural Science Foundation (23ZR1419300), and the KartoBit Research Network (KRN2201CA).

## References

Adamic, L. A.; Lukose, R. M.; Puniyani, A. R.; and Huberman, B. A. 2001. Search in power-law networks. Physical Review E, 64(4): 046135.

Adcock, A. B.; Sullivan, B. D.; and Mahoney, M. W. 2013. Tree-like structure in large social and information networks. In 2013 IEEE 13th International Conference on Data Mining, 1-10. IEEE.

Asif, N. A.; Sarker, Y.; Chakrabortty, R. K.; Ryan, M. J.; Ahamed, M. H.; Saha, D. K.; Badal, F. R.; Das, S. K.; Ali, M. F.; Moyeen, S. I.; et al. 2021. Graph neural network: A comprehensive review on non-Euclidean space. IEEE Access, 9: 60588-60606.

Chami, I.; Ying, Z.; Ré, C.; and Leskovec, J. 2019. Hyperbolic graph convolutional neural networks. Advances in Neural Information Processing Systems, 32.

Chen, F.; Wang, Y.-C.; Wang, B.; and Kuo, C.-C. J. 2020. Graph representation learning: a survey. APSIPA Transactions on Signal and Information Processing, 9: e15.

Chen, W.; Han, X.; Lin, Y.; Zhao, H.; Liu, Z.; Li, P.; Sun, M.; and Zhou, J. 2021. Fully hyperbolic neural networks. arXiv preprint arXiv:2105.14686.

Chen, Y.; Yang, M.; Zhang, Y.; Zhao, M.; Meng, Z.; Hao, J.; and King, I. 2022. Modeling scale-free graphs with hyperbolic geometry for knowledge-aware recommendation. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, 94-102.

Costa, F.; and De Grave, K. 2010. Fast neighborhood subgraph pairwise distance kernel. In Proceedings of the 26th International Conference on Machine Learning, 255-262. Omnipress, Madison, WI, USA.

Dai, J.; Wu, Y.; Gao, Z.; and Jia, Y. 2021. A hyperbolic-to-hyperbolic graph convolutional network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 154-163.
De Bortoli, V.; Mathieu, E.; Hutchinson, M.; Thornton, J.; Teh, Y. W.; and Doucet, A. 2022. Riemannian score-based generative modeling. arXiv preprint arXiv:2202.02763.

De Cao, N.; and Kipf, T. 2018. MolGAN: An implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973.

Fan, W.; Ma, Y.; Li, Q.; He, Y.; Zhao, E.; Tang, J.; and Yin, D. 2019. Graph neural networks for social recommendation. In The World Wide Web Conference, 417-426.

Ganea, O.; Bécigneul, G.; and Hofmann, T. 2018. Hyperbolic neural networks. Advances in Neural Information Processing Systems, 31.

He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770-778.

Ho, J.; Jain, A.; and Abbeel, P. 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33: 6840-6851.

Huang, C.-W.; Aghajohari, M.; Bose, J.; Panangaden, P.; and Courville, A. C. 2022. Riemannian diffusion models. Advances in Neural Information Processing Systems, 35: 2750-2761.

Irwin, J. J.; Sterling, T.; Mysinger, M. M.; Bolstad, E. S.; and Coleman, R. G. 2012. ZINC: a free tool to discover chemistry for biology. Journal of Chemical Information and Modeling, 52(7): 1757-1768.

Jing, B.; Corso, G.; Chang, J.; Barzilay, R.; and Jaakkola, T. 2022a. Torsional diffusion for molecular conformer generation. arXiv preprint arXiv:2206.01729.

Jing, B.; Corso, G.; Chang, J.; Barzilay, R.; and Jaakkola, T. 2022b. Torsional diffusion for molecular conformer generation. arXiv preprint arXiv:2206.01729.

Jo, J.; Lee, S.; and Hwang, S. J. 2022. Score-based generative modeling of graphs via the system of stochastic differential equations. In International Conference on Machine Learning, 10362-10383. PMLR.

Landrum, G.; et al. 2016. RDKit: Open-source cheminformatics software. http://www.rdkit.org, https://github.com/rdkit/rdkit.

Li, Y.; Vinyals, O.; Dyer, C.; Pascanu, R.; and Battaglia, P. 2018. Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324.

Linial, N.; London, E.; and Rabinovich, Y. 1995. The geometry of graphs and some of its algorithmic applications. Combinatorica, 15: 215-245.

Liu, J.; Kumar, A.; Ba, J.; Kiros, J.; and Swersky, K. 2019. Graph normalizing flows. Advances in Neural Information Processing Systems, 32.

Liu, M.; Yan, K.; Oztekin, B.; and Ji, S. 2021. GraphEBM: Molecular graph generation with energy-based models. arXiv preprint arXiv:2102.00546.

Liu, Q.; Nickel, M.; and Kiela, D. 2019. Hyperbolic graph neural networks. Advances in Neural Information Processing Systems, 32.

Luo, Y.; Yan, K.; and Ji, S. 2021. GraphDF: A discrete flow model for molecular graph generation. In International Conference on Machine Learning, 7192-7203. PMLR.

Mathieu, E.; Le Lan, C.; Maddison, C. J.; Tomioka, R.; and Teh, Y. W. 2019. Continuous hierarchical representations with Poincaré variational auto-encoders. Advances in Neural Information Processing Systems, 32.

Munzner, T. 1997. H3: Laying out large directed graphs in 3D hyperbolic space. In Proceedings of VIZ '97: Visualization Conference, Information Visualization Symposium and Parallel Rendering Symposium, 2-10. IEEE.

Nagano, Y.; Yamaguchi, S.; Fujita, Y.; and Koyama, M. 2019.
A wrapped normal distribution on hyperbolic space for gradient-based learning. In International Conference on Machine Learning, 4693-4702. PMLR.

Nickel, M.; and Kiela, D. 2018. Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In International Conference on Machine Learning, 3779-3788. PMLR.

Niu, C.; Song, Y.; Song, J.; Zhao, S.; Grover, A.; and Ermon, S. 2020. Permutation invariant graph generation via score-based generative modeling. In International Conference on Artificial Intelligence and Statistics, 4474-4484. PMLR.

Park, J.; Cho, J.; Chang, H. J.; and Choi, J. Y. 2021. Unsupervised hyperbolic representation learning via message passing auto-encoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5516-5526.

Preuer, K.; Renz, P.; Unterthiner, T.; Hochreiter, S.; and Klambauer, G. 2018. Fréchet ChemNet distance: a metric for generative models for molecules in drug discovery. Journal of Chemical Information and Modeling, 58(9): 1736-1741.

Ramakrishnan, R.; Dral, P. O.; Rupp, M.; and Von Lilienfeld, O. A. 2014. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 1(1): 1-7.

Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; and Ommer, B. 2022. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10684-10695.

Schomburg, I.; Chang, A.; Ebeling, C.; Gremse, M.; Heldt, C.; Huhn, G.; and Schomburg, D. 2004. BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Research, 32(suppl 1): D431-D433.

Sen, P.; Namata, G.; Bilgic, M.; Getoor, L.; Galligher, B.; and Eliassi-Rad, T. 2008. Collective classification in network data. AI Magazine, 29(3): 93-93.

Shi, C.; Xu, M.; Zhu, Z.; Zhang, W.; Zhang, M.; and Tang, J. 2020. GraphAF: a flow-based autoregressive model for molecular graph generation. arXiv preprint arXiv:2001.09382.

Simonovsky, M.; and Komodakis, N. 2018. GraphVAE: Towards generation of small graphs using variational autoencoders. In Artificial Neural Networks and Machine Learning - ICANN 2018: 27th International Conference on Artificial Neural Networks, Rhodes, Greece, October 4-7, 2018, Proceedings, Part I 27, 412-422. Springer.

Song, J.; Meng, C.; and Ermon, S. 2020. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502.

Song, Y.; and Ermon, S. 2019. Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32.

Song, Y.; Sohl-Dickstein, J.; Kingma, D. P.; Kumar, A.; Ermon, S.; and Poole, B. 2020. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456.

Thornber, C. 1979. Isosterism and molecular modification in drug design. Chemical Society Reviews, 8(4): 563-580.

Ungar, A. A. 2007. The hyperbolic square and Möbius transformations. Banach Journal of Mathematical Analysis, 1(1): 101-116.

Ungar, A. A. 2008. A gyrovector space approach to hyperbolic geometry. Synthesis Lectures on Mathematics and Statistics, 1(1): 1-194.

Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; and Bengio, Y. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903.

Wu, Z.; Jiang, D.; Hsieh, C.-Y.; Chen, G.; Liao, B.; Cao, D.; and Hou, T. 2021. Hyperbolic relational graph convolution networks plus: a simple but highly efficient QSAR-modeling method. Briefings in Bioinformatics, 22(5): bbab112.

Xu, M.; Yu, L.; Song, Y.; Shi, C.; Ermon, S.; and Tang, J. 2022. GeoDiff: a geometric diffusion model for molecular conformation generation.
arXiv preprint arXiv:2203.02923.

Yang, C.; Sun, M.; Zhao, W. X.; Liu, Z.; and Chang, E. Y. 2017. A neural network approach to jointly modeling social networks and mobile trajectories. ACM Transactions on Information Systems (TOIS), 35(4): 1-28.

Yang, M.; Li, Z.; Zhou, M.; Liu, J.; and King, I. 2022a. HICF: Hyperbolic informative collaborative filtering. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2212-2221.

Yang, M.; Zhou, M.; Liu, J.; Lian, D.; and King, I. 2022b. HRCF: Enhancing collaborative filtering via hyperbolic geometric regularization. In Proceedings of the ACM Web Conference 2022, 2462-2471.

You, J.; Ying, R.; Ren, X.; Hamilton, W.; and Leskovec, J. 2018. GraphRNN: Generating realistic graphs with deep auto-regressive models. In International Conference on Machine Learning, 5708-5717. PMLR.

Zang, C.; and Wang, F. 2020. MoFlow: an invertible flow model for generating molecular graphs. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 617-626.