# disentangled_spatiotemporal_graph_generative_models__dd7c0a12.pdf

Disentangled Spatiotemporal Graph Generative Models

Yuanqi Du1, Xiaojie Guo2*, Hengning Cao1, Yanfang Ye3, Liang Zhao4

1George Mason University, Fairfax, US 2JD.COM Silicon Valley Research Center, Mountain View, CA, US 3University of Notre Dame, Notre Dame, US 4Emory University, Atlanta, US ydu6@gmu.edu, liang.zhao@emory.edu

Spatiotemporal graph represents a crucial data structure where the nodes and edges are embedded in a geometric space and can evolve dynamically over time. Nowadays, spatiotemporal graph data is becoming increasingly popular and important, ranging from microscale (e.g. protein folding), to middle-scale (e.g. dynamic functional connectivity), to macro-scale (e.g. human mobility network). Although disentangling and understanding the correlations among spatial, temporal, and graph aspects have been a long-standing key topic in network science, they typically rely on network processing hypothesized by human knowledge. This usually ﬁt well towards the graph properties which can be predeﬁned, but cannot do well for the most cases, especially for many key domains where the human has yet very limited knowledge such as protein folding and biological neuronal networks. In this paper, we aim at pushing forward the modeling and understanding of spatiotemporal graphs via new disentangled deep generative models. Speciﬁcally, a new Bayesian model is proposed that factorizes spatiotemporal graphs into spatial, temporal, and graph factors as well as the factors that explain the interplay among them. A variational objective function and new mutual information thresholding algorithms driven by information bottleneck theory have been proposed to maximize the disentanglement among the factors with theoretical guarantees. Qualitative and quantitative experiments on both synthetic and real-world datasets demonstrate the superiority of the proposed model over the state-of-the-arts by up to 69.2% for graph generation and 41.5% for interpretability.

Introduction There are two major directions on graph learning research in machine learning: 1) graph representation learning (Kipf and Welling 2016a; Veliˇckovi c et al. 2017; Hamilton, Ying, and Leskovec 2017), which aims at encoding graph structural information into (low-dimensional) vector space, and 2) graph generation (You et al. 2018; Simonovsky and Komodakis 2018), which reversely aims at constructing a graph-structured data from low-dimensional space containing the graph generation rules or distribution. In this paper, we focus on the second direction, speciﬁcally for spa-

*Equally contributing as ﬁrst author The corresponding author Copyright 2022, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

tiotemporal graphs. Spatiotemporal graph represents a vital data structure where the nodes and edges are embedded and evolve in a geometric space, as shown in Fig. 1. Nowadays, spatiotemporal graph data is becoming increasingly popular and important, ranging from epidemic, transportation to biological network modeling (Dye and Gay 2003; Ingraham et al. 2019; Stopher and Meyburg 1975; Teng 1985; Guo et al. 2020a; Fu, Xiao, and Sun 2020; Du et al. 2020; Xu et al. 2021a,b; Rahman et al. 2021; Fu et al. 2021b; Zhao 2021; Fu et al. 2021a). For example, the epidemic spreading network and the protein folding process can both be represented as spatiotemporal graphs, respectively. Spatiotemporal graphs cannot be modeled using either the spatial, graph, or temporal information individually, but require the simultaneous characterization of both the data and their interactions, which results in various patterns (Barth elemy 2011). Spatial and graph aspects of information are usually correlated. For example, geographically nearby people tend to befriend in a social network. A pair of atoms that are very close in space tend to have a bond. Moreover, the above interplay between spatial and graph aspects is a dynamic process, thus, the consideration in time aspect is inevitable for a comprehensive modeling. Recently, although spatiotemporal graph deep learning has stimulated a surge of research for graph representation learning (Cui et al. 2019; Wu et al. 2019; Yu, Yin, and Zhu 2017; Khodayar et al. 2019; Roy et al. 2021; Wu et al. 2019; Yu, Yin, and Zhu 2017; Fu et al. 2021c), however, deep generative models for spatiotemporal graph have not been well explored.

Modeling and understanding the generative process of spatiotemporal graphs are a long-lasting research topic in domains such as graph theory and network science. Traditional methods usually extend and integrate network models in spatial networks (e.g., protein and molecule structures) and temporal graphs (e.g., trafﬁc networks and epidemic spreading networks) into spatiotemporal graphs which captures some predeﬁned properties of a graph, e.g., degree distribution, structure of community, clustering patterns. However, these models heavily rely on the predeﬁned network process and rich knowledge of the graph properties, while the network properties and generation principles always remain unknown in the real-world applications, such as models that explain the mechanisms of mental diseases in brain networks during an activity of human beings and protein

The Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)

Figure 1: Spatiotemporal graphs represent crucial data structures where the nodes and edges are embedded in a geometric space and their attribute values can evolve dynamically over time. Each column represents the formulation of a snapshot of a spatiotemporal graph.

structure folding. Another line of research works is computational simulation models of spatiotemporal graphs customized for speciﬁc applications such as epidemics, brain simulator, and transportation simulation (Dye and Gay 2003; Stopher and Meyburg 1975; Teng 1985). However, they are domain-speciﬁc with enormously detailed prior knowledge involved. This motivates us to propose the spatiotemporal graph models which can learn the underlying spatial, temporal, and graph processes as well as their interplay. Recent advanced deep generative models, such as variational auto-encoders (Graph VAE) (Kipf and Welling 2016b), have made important progress towards modeling and understanding each of the three components in spatiotemporal graphs, including graph data (e.g., citation networks) (Kipf and Welling 2016a), spatial networks (e.g., proteins) (Ingraham et al. 2019) and temporal networks (e.g., trafﬁc networks) (Zhang et al. 2020; Zhou et al. 2020), respectively. However, none of the work is designed for spatiotemporal graphs which cannot be effectively modeled by simply integrating the existing techniques, due to several signiﬁcant challenges: (1) It is difﬁcult to build a generic spatiotemporal graph generative framework to model spatial, graph, and temporal jointly. This framework needs to not only capture the intrinsic generic properties shared across all types of spatiotemporal graphs, but also can automatically learn network models that tailor well for speciﬁc applications. (2) It is difﬁcult to effectively distinguish and capture the correlation among the spatial, temporal, and graph information. As shown in Figure 1, in spatiotemporal graphs, some network information are entangled, such as the only spatial-related information (Row 2) and only graph-related information (Row 3), while some spatial and graph properties are correlated (Row 1). (3) It is difﬁcult to theoretically ensure the disentanglement among the independent patterns. In light of model interpretability, disentangling the independent patterns can help better control the properties of the generated spatialtemporal network. Current methods are limited in handling the disentanglement of spatial, temporal, and graph independent properties or factors.

To solve the aforementioned challenges, we propose, to the best of our knowledge, the ﬁrst generic deep generative model framework that models and disentangles spatiotemporal graph data. Speciﬁcally, we ﬁrst propose a novel deep Bayesian network that factorizes spatiotemporal graphs into the time-variant, time-invariant, spatial-graph joint, and independent factors based on inductive bias (Liu et al. 2021). A new objective driven by information-bottleneck theory has been proposed that can maximize the disentanglement of different factors as well as latent variables inside each factor, with theoretical guarantees. To optimize this objective function, a novel information-iterative-thresholding algorithm has been proposed to jointly optimize the objective and optimize its hyperparameters on information bottlenecks with theoretical analysis on optimal conditions. Extensive quantitative and qualitative experiments on two synthetic and two real-world datasets show the superiority of our proposed model over the state-of-the-art graph generative models by up to 69.2% for spatiotemporal graph generation and 41.5% for interpretability.

Related Work

Spatiotemporal Graph Representation Learning

This domain beneﬁts a lot from the deep representation learning techniques for images, sequence, and network data, such as conv LSTM. Currently, the most well-developed ﬁeld is spatiotemporal forecasting in trafﬁc networks (Cui et al. 2019; Wu et al. 2019; Yu, Yin, and Zhu 2017; Khodayar et al. 2019; Roy et al. 2021; Wu et al. 2019; Yu, Yin, and Zhu 2017). Other works (Wang et al. 2019; Zhang et al. 2020) either study spatial network data or temporal graph data. For example, (Wang et al. 2019) considers the spatial locations of nodes, but not capable of generating temporal features. (Zhang et al. 2020) models the temporal features explicitly but do not include spatial features.

Spatiotemporal Graph Generation

The goal is to generate diverse spatiotemporal graphs, motivated by network design and interpretation (Barth elemy 2011). Traditional spatiotemporal network generation largely relies on human-deﬁned heuristics/prior knowledge about the network being modeled, such as epidemic modeling, transportation modeling, protein modeling, etc. (Stopher and Meyburg 1975; Dye and Gay 2003; Teng 1985), which however almost pose no generalizability to other domain (Dahiyat and Mayo 1997). Another line of models are based on prescribed structural assumptions, such as probabilistic models (Barab asi, Albert, and Jeong 1999), conﬁguration models (Bender and Canﬁeld 1978), and stochastic block models (Xu and Hero 2014). These prescribed models aim to model some predeﬁned graph properties, e.g. community structures, clustering patterns, etc, which, however, are not sufﬁcient for real-world graph datasets where the prescribed rules are unkown (Holme 2015; Rozenshtein and Gionis 2019). Deep generative models have rarely been used to tackle the spatiotemporal graph generation problem (Zhang 2019).

Figure 2: Graphical illustration of the proposed models. (a) The Bayesian network of the proposed probabilistic distribution of spatiotemporal graphs. (b) The approximate inference process of the posterior of latent variables, with conditional independence assumption across time snapshots. (c) The Bayesian network of the alternative probabilistic distribution of spatiotemporal graphs, with dependence assumption across time snapshots. (d) The alternative approximate inference model of the posterior of the proposed model, with dependence assumption across time snapshots.

Deep Generative Models on Graphs

Deep generative models achieve great success in computer vision, natural language processing, etc. Recently, increasing attention has been attracted to model graph-structured data by deep generative model. Work in (Simonovsky and Komodakis 2018; You et al. 2018; Wang et al. 2018; Shi et al. 2020; Guo, Du, and Zhao 2020; Du et al. 2021a) transfers popular deep generative models, such as GANs, VAEs, RNNs, Flow-based models, etc., into graph-structured data to model proteins, molecules, etc. and show promising results. For example, graph RNN utilizes an auto-regressive generative model to generate a sequence of nodes and edges by an LSTM model (You et al. 2018). Graphite (Grover, Zweig, and Ermon 2019) and VGAE (Kipf and Welling 2016b) focus on the node-level embedding and form edges between each pair of nodes to generate a graph. Graph VAE (Kipf and Welling 2016b) represents each graph by its edge feature (i.e. adjacency matrix) and node features, and utilizes an VAE model to learn the distribution of the graphs conditioned on the latent representation at the graph-level.

Disentangled Representation Learning

Disentanglement learning aims to learn a disentangled representation that keeps latent variables separate and interpretable for the variations in the data. It has been shown that such disentangled representation improves the robustness against the adversarial attack and increases the generalizability of the model (Alemi et al. 2016). This inspires much work in VAE to study how to disentangle the latent representations which expose real-world semantic factors and the solutions include adding, removing, or altering the weight of the objective term in the generative tasks (Chen et al. 2018; Kim and Mnih 2018; Kumar, Sattigeri, and Balakrishnan 2017; Lopez et al. 2018; Zhao, Song, and Ermon 2017). Recently, the idea to learn a disentangled representation has also been applied to graph generative models. (Guo et al. 2020b; Ma et al. 2019) disentangles the latent variables that expose semantic factors of the nodes, edges, and the graphs. However, learning a disentangled representation for spatiotemporal graph modeling remains largely unexplored.

Methodology

Problem Formulation

A spatiotemporal graph is deﬁned as (S1:T , G1:T ), where T represents number of time frames of the spatiotemporal graphs, and S1:T = {S1, S2, ...ST }, G1:T = {G1, ...GT }. St = (Vt, Ct) represents the geometric information of t-th snapshot of a spatiotemporal graph, where Vt denotes a set of N nodes and Ct RN 3 denotes 3D geometric information. Gt = (Vt, Et, Xt, Et) represents the graph information of t-th snapshot, where Et Vt Vt is the set of edges. Et RN N K refers to the edge weights or adjacent matrix, and K refers to the edge feature dimension. Xt RN M denotes the node feature and M is the length of the node feature vector. It is worth noting that the geometric information St can not be simply treated as part of node features since this type of representation cannot capture and maintain some properties, such as translationand rotation-invariances (Fuchs et al. 2020). This paper aims at proposing a generic data-driven framework for modeling spatiotemporal graphs, under fundamental, necessary factors. First, for any spatiotemporal graphs, there could be patterns that are time-variant and timeinvariant. While time-invariant, spatial and graph information could either be correlated or independent, hence it is important to distinguish and capture these different semantic factors via different latent variables. More concretely, the goal is to learn a posterior p(S1:T , G1:T |Z, F) of the spatiotemporal graphs given four groups of generative latent variables Z = z1:T RL1 for time-variant features and F = (fs RL2, fg RL3, fsg RL4) for timeinvariant features, where we need to capture and disentangle time-variant factors z1:T , time-invariant geometric factors fs, graph factors fg and spatial-graph joint factors fsg. L1, L2, L3, and L4 are the number of variables in each group of factors, respectively. The encoding and generative process of our proposed Spatio Temporal Graph Disentangled Variational Auto-Encoder (STGD-VAE) model is illustrated in Fig. 2(a) and Fig. 2(b). Another implementation of the proposed model following the common time-dependency, namely, STGD-VAE-Dep is illustrated in Fig. 2(c) and Fig. 2(d), and detailed in Supplementary Material.

The Objective on Spatiotemporal Graph Generative Modeling To learn the conditional probability p(S1:T , G1:T |z1:T , F), it is equal to maximizing the marginal likelihood of the observed spatiotemporal graph sequence (S1:T , G1:T ) in expectation over the distribution of the latent representation as Epθ(z1:T ,F )pθ(S1:T , G1:T |z1:T , F). The prior distribution of the latent spaces is described as p(z1:T , F) with the observation of a spatiotemporal graph sequence (S1:T , G1:T ), which, however, is intractable. Therefore, a variational objective is proposed to tackle this problem, where the posterior distribution is approximated by another distribution qφ(z1:T , F|S1:T , G1:T ). The objective can be written as minimizing the Kullback-Leibler Divergence (KLD) between the true prior distribution and the approximate posterior distribution. In order to encourage this disentanglement property of qφ(z1:T , F|S1:T , G1:T ), we introduce a constraint by trying to match the inferred posterior conﬁgurations of the latent factors to the prior p(z1:T , fs, fg, fsg). This can be achieved if we set each prior to be an isotropic unit Gaussian, i.e., N(0, 1), leading to a constrained optimization problem as: max θ,φ ES1:T ,G1:T D[Eqφ(z1:T ,F |S1:T ,G1:T )

log pθ(S1:T , G1:T |z1:T , F)] s.t.DKL(qφ(z1:T , F|S1:T , G1:T )||p(z1:T , F)) < I (1) where D refers to the observed dataset of the spatiotemporal graphs and I speciﬁes the information that ﬂows via the latent representation. The above objective can be further decomposed for simple estimation and implementation of each component based on different pre-deﬁned dependence and independence assumptions in the problem formulation, as stated in Lemma 1. Lemma 1. Given the assumption that: (1) S1:T G1:T |(z1:t, F); (2) S1:T fg and G1:T fs; (3) Gi Gj|(zi, zj, fg, fsg) and Si Sj|(zt, zk, fs, fsg); (4) z1:T (fs, fg, fsg), and z1 z2 z T , the objective of spatiotemoral graph generation can be expressed as max θ,φ ES1:T ,G1:T DEqφ(z1:t,F |G1:T ,S1:T ) XT

t=1[log pθ(Gt|zt, fg, fsg) + log pθ(St|zt, fs, fsg)]

t=1 DKL(qφ(zt|Gt, St)||p(zt)) < It

DKL(qφ(fg|G1:T )||p(fg)) < Ig DKL(qφ(fs|S1:T )||p(fs)) < Is DKL(qφ(fsg|S1:T , G1:T )||p(fsg)) < Isg (2) Proof. In Lemma 1, I is decomposed into four mutualexclusive information capacity, Is, Ig, Isg, and It in Eq. 2. The detailed proof of Lemma 1 can be found in Supplementary Material.

Maximizing the Disentanglement among Spatial, Temporal and Graph Factors One of our goals is to maximize the disentanglement of spatial, temporal, and graph factors. So for example if a factor is

merely related to spatial information, we do not want it to be explained by the spatial-graph joint factor fsg. Analogously, if a factor is invariant to time, we do not want it to be explained by the time-variant factor zt. However, this cannot be guaranteed by Equation 2, whose constraints can only enforce variable-level disentanglement within each type of factor instead of a maximized disentanglement across spatial, temporal, and graph factors. To address the above issue, we ﬁrst re-interpret the constraints by information bottleneck theory (Burgess, Higgins, and Pal 2018). The posterior distribution qφ(zt|St, Gt), qφ(fg|G1:T ),qφ(fs|S1:T ), and qφ(fsg|G1:T , S1:T ) are interpreted as information bottleneck for the reconstruction task Eqφ(Z|G1:T ,S1:T )logpθ(S1:T |z1:T , fs, fsg) and Eqφ(Z|G1:T ,S1:T ) log pθ(G1:T |z1:T , fg, fsg). We propose that, by constraining the information ﬂowing through each time-variant variable zt to be less than the information entropy of time-variant information Ct, namely It Ct, zt will capture and only capture the time-variant information. We also propose that, by constraining the information ﬂowing through the spatial-graph joint variable fsg to be less than the information entropy of the time-invariant correlated spatial-graph factor Csg, namely Isg Csg, fsg will only capture the time-invariant spatial-graph correlated factor. The new objective function is as follows:

max θ,φ ES1:T ,G1:T DEqφ(z1:T ,F |S1:T ,G1:T ) XT

t=1[log pθ(Gt|zt, fg, fsg) + log pθ(St|zt, fs, fsg)]

t=1 DKL(qφ(zt|St, Gt)||p(zt)) < It

DKL(qφ(fg|G1:T )||p(fg)) < Ig DKL(qφ(fs|S1:T )||p(fs)) < Is DKL(qφ(fsg|S1:T , G1:T )||p(fsg)) < Isg Isg Csg, It Ct. (3)

This objective has the properties stated in Theorem 2.

Theorem 2. The objective in Equation 3 guarantees that zt captures and only captures the time-variant information while fsg captures and only captures the spatial-graph joint information.

Proof. The above theorem is proved based on the condition that (1) the sum of Is, Ig, and Isg are large enough to contain the time-variant information, and Is, Ig are large enough to contain the time-invariant spatial-exclusive and graph-exclusive information, (2) It Ct, and Isg Csg. Due to the space limit, the detailed proof is provided in Supplementary Material.

Spatiotemporal Graph Mutual Information Thresholding Algorithm

Eq. 3 is a challenging constrained nonconvex problem that also requires learning its hyperparameters of information bottleneck threshold Isg and It. This section proposes a novel algorithm along with its optimal condition analysis with respect to the information bottleneck threshold.

Given Is and Ig are constants, the second and third constrain can be rewritten based on the Lagrangian algorithm under KKT condition (Mangasarian 1994) as:

R1 = β2(DKL(qφ(fs|S1:T )||p(fs))) + β3(DKL(qφ(fg|G1:T )||p(fg))) (4)

where the Lagrangian multipliers β2 and β3 are the regularization coefﬁcients that control the capacity of the latent space information fs and fg, respectively. Next, since It and Isg in the ﬁrst constraint is a trainable parameter which ensures It Ct and Isg Csg, it can be written as a Lagrangian under the KKT condition as

R2 = β1( YT

t=1 DKL(qφ(zt|Gt, St)||p(zt)) It) (5)

R3 = β4(DKL(qφ(fsg|S1:T , G1:T )||p(fsg)) Isg) (6)

Finally, we can optimize the overall objective as:

max θ,φ ES1:T ,G1:T DEqφ(z1:T ,F |S1:T ,G1:T ) XT

t=1[log pθ(Gt|zt, fg, fsg) + log pθ(St|zt, fst, fsg)]

R1 R2 R3 s.t. Isg Csg, It Ct (7)

It is very hard to directly optimize the above objective since Ct and Csg are unknown. To circumvent the challenge, we propose a novel thresholding strategy consisting of two stages: the ﬁrst stage is to optimize Isg, the second stage is to optimize It, as detailed in Algorithm 1. In short, we increase Isg by α in every K until a stopping criteria is satisﬁed while keeping It at a very low value (Lines 1-6 in Algorithm 1). Then, we stop increasing Isg and increase It by γ every J epoch until a stopping criterion is satisﬁed (Lines 8-14 in Algorithm 1).

Algorithm 1: Information-iterative-thresholding algorithm

Require: The initialized parameter set W; the initialized It = ϵ and Isg = ϵ (It / W Isg / W and ϵ is a very small number, e.g. 1 10 5); the increase step γ, α for optimizing It and Isg; the number of epochs J, K of optimization for each updated It and Isg. Ensure: The optimized parameter set W. 1: while R3 < 0 do {stopping criterion for Isg} 2: Isg := Isg + α 3: for epoch = 1 : K do {increase Isg every K epoch} 4: Compute the gradient of W via backpropagation. 5: Update W based on gradient with Isg and It ﬁxed. 6: end for 7: end while 8: while R2 < 0 do {stopping criterion for It} 9: It := It + γ 10: for epoch = 1 : J do {increase It every J epoch} 11: Compute the gradient of W via backpropagation. 12: Update W based on gradient with It and Isg ﬁxed. 13: end for 14: end while

The proposed optimization strategy guarantees that zt captures and only captures the time-variant information while fsg captures and only captures the spatial-graph joint information based on the following theorem.

Theorem 3. The latent variable zt captures and only captures the time-variant information if R2 < 0 is satisﬁed. The latent variable fsg captures and only captures the timeinvariant spatial-graph correlated information if R3 < 0 is satisﬁed.

Proof. Notably, at initial stage, R3 = 0 and R2 = 0, we then gradually increase It and Isg, and at each step while well-trained, QT t=1 DKL(qφ(zt|Gt, St)||p(zt)) and DKL(qφ(fsg|G1:T , S1:T )||p(fsg)) will keep increasing to catch It and Isg. When R3 < 0 and R2 < 0, it indicates that the information that captured by It and Isg do not increase anymore, namely It = Ct and Isg = Csg. Thus, the whole optimization process can be stopped. During the whole process, the two constraint It Ct and Isg Csg are always satisﬁed, namely, zt always captures and only captures the time-variant information and fsg always captures and only captures the time-invariant spatial-graph correlated information. The detailed proof can be found in Supplementary Material. In practice, we set β1, β2, β3, and β4 as 1 and the model is not sensitive to these parameters.

Our model consists of four encoders which model qφ(fs|S1:T , G1:T ), qφ(fg|S1:T , G1:T ), qφ(fsg|S1:T , G1:T ), and qφ(zt|S1:T , G1:T ) respectively. There are also two decoders to model pθ(Gt|zt, fg, fsg) and pθ(St|zt, fg, fsg), respectively. Speciﬁcally, we utilize a typical graph convolution neural network to encode the graph factors and a typical convolution neural network for the spatial factors. For the spatial-graph correlated factors, we utilize a Spatial-Network Message Passing Neural Network (SMPNN) (Guo, Du, and Zhao 2021), which considers both the spatial and graph information while passing messages. In terms of the temporal factors, we consider that could involve both spatial and graph variance, thus, we take another S-MPNN for the temporal factors. For decoders, we utilize a typical convolution neural network for the spatial factors, and a similar graph decoder proposed in NED-VAE (Guo, Wu, and Zhao 2018) for the graph factors. Due to the space limits, the detailed description of the encoders and decoders, as well as the time complexity analysis are provided in the Supplementary Material.

Experiments Datasets We validate the effectiveness of our proposed models on two synthetic datasets and two real-world datasets, (1) Dynamic Waxman Random Graphs, (2) Dynamic Random Geometry Graphs, (3) Protein Folding Dataset, and (4) Trafﬁc Dataset MERT-LA (Du et al. 2021b). The ﬁrst two are well-known spatial network datasets (Bradonji c, Hagberg, and Percus 2007; Waxman 1988), which randomly place nodes in a geometry and the edges are connected by predeﬁned distance measures, with variances through the time dimension. The

Dataset Method Node Spatial Edge KLD cls KLD ds KLD bet KLD tcorr Avg MI

DSBM N/A N/A 54.95% 0.90 1.10 0.63 0.73 N/A Graph VAE 0.57 0.57 57.14% 1.63 1.82 0.91 0.85 N/A Graph RNN N/A N/A 55.24% 1.97 2.50 1.00 1.35 N/A beta-VAE 0.0012 0.0011 69.05% 0.43 1.61 1.82 0.36 2.25 beta-TCVAE 0.0013 0.0012 69.04% 0.47 1.37 1.56 0.08 2.33 NEND-IPVAE 0.016 0.0008 65.80% 1.39 1.82 2.78 0.11 2.52 STGD-VAE 0.0003 0.0001 69.99% 0.14 0.74 0.40 0.03 2.03 STGD-VAE-Dep 0.0191 0.0005 66.28% 0.45 0.55 0.54 0.38 2.04

DSBM N/A N/A 81.88% 1.77 2.87 3.38 0.64 N/A Graph VAE 0.56 0.74 85.75% 4.46 2.65 1.60 3.08 N/A Graph RNN N/A N/A 85.32% 0.57 1.24 2.40 0.85 N/A beta-VAE 0.0013 0.0017 91.75% 0.34 1.24 1.47 2.15 2.29 beta-TCVAE 0.0018 0.0019 91.62% 0.52 1.58 1.46 2.38 2.24 NED-IPVAE 0.0175 0.0018 89.84% 0.37 1.05 1.72 0.23 2.42 STGD-VAE 0.0004 0.0015 91.88% 0.14 0.72 0.28 0.11 2.07 STGD-VAE-Dep 0.0008 0.0017 91.28% 0.14 0.71 0.26 1.67 2.08

DSBM N/A N/A 70.78% 1.00 0.93 1.15 1.53 N/A Graph VAE N/A 553.82 62.54% 1.26 1.44 1.48 1.90 N/A Graph RNN N/A N/A 71.17% 1.05 1.15 1.43 0.83 N/A beta-VAE N/A 52.74 85.58% 0.16 0.14 0.46 0.61 1.04 beta-TCVAE N/A 35.05 95.80% 0.27 0.58 0.34 0.71 1.09 NED-IPVAE N/A 36.12 92.48% 1.08 0.79 0.44 2.64 1.15 STGD-VAE N/A 28.77 99.79% 0.33 0.21 0.53 0.23 0.70 STGD-VAE-Dep N/A 28.42 96.79% 0.13 0.54 1.55 0.24 0.76

DSBM N/A N/A N/A N/A N/A N/A N/A N/A Graph VAE N/A N/A N/A N/A N/A N/A N/A N/A Graph RNN N/A N/A N/A N/A N/A N/A N/A N/A beta-VAE 7.15 N/A N/A N/A N/A N/A N/A 1.37 beta-TCVAE 8.50 N/A N/A N/A N/A N/A N/A 1.18 NED-IPVAE 31.95 N/A N/A N/A N/A N/A N/A 1.18 STGD-VAE 6.78 N/A N/A N/A N/A N/A N/A 0.69 STGD-VAE-Dep 5.13 N/A N/A N/A N/A N/A N/A 1.06

Table 1: The evaluation results for the generated spatiotemporal graphs for different datasets (KLD cls refers to KLD of graph clustering coefﬁcient. KLD ds refers to KLD of graph density, KLD bet refers to KLD of betweenness centrality, and KLD tcorr refers to KLD of temporal correlation. (Best results are highlighted in bold, The KLD evaluation on the trafﬁc dataset is not reported because the metrics are on graph topology, while the graph topology for the trafﬁc dataset is unchanged.)

protein folding dataset consists of the folding steps of a protein with 8 amino acids (Guo et al. 2020b). The trafﬁc dataset contains sequences of graphs which contains trafﬁc speeds connected by 207 sensors (Jagadish et al. 2014).

Comparison Methods

To validate the proposed models in spatiotemporal graph generation, despite that no previous deep models specially designed for the spatiotemporal graph generation task, we compare with some SOTA graph generation model for generic graphs, including Graph RNN (You et al. 2018), a graph generation method that utilizes recurrent neural network structure to model graphs as sequences of edges and nodes, Graph VAE (Kipf and Welling 2016b), a variational auto-encoder based graph generative model focusing on handling small graphs, and a traditional algorithm DSBM (Xu and Hero 2014) which utilizes stochastic block model for static graph generation and combines it with Markov chains to achieve temporal graph generation. For Graph RNN and Graph VAE, we train individual models for each time step respectively for generating temporal graphs. To validate the signiﬁcance of our proposed disentanglement objective and optimization strategy for spatiotemporal graphs, several SOTA disentanglement techniques are compared including beta-VAE (Higgins, Matthey, and Pal 2016), beta-TC-VAE (Chen et al. 2018), and NED-IPVAE (Guo

et al. 2020b), where the architecture utilized is the same as our model, except with different disentanglement objectives.

Evaluation on Spatiotemporal Graph Generation

To validate the power of our proposed models to learn the representations of spatiotemporal graphs, we evaluate the reconstruction performance among different elements of a spatiotemporal graph, e.g. node, edge, and spatial locations by calculating the difference between the real ones and reconstructed ones. The mean square errors (MSE) is calculated for node attribute prediction and spatial prediction. The accuracy is used for evaluating edge connection prediction. To evaluate the generation performance, we select several common evaluation metrics to measure graph-structured data in multiple aspects, including graph statistics and temporal statistics. Speciﬁcally, we calculate a commonly used measurement for probability distributions, Kullback-Leibler Divergence (KLD), between the real training data distribution and the generated data distribution in terms of: (1) graph density, (2) average clustering coefﬁcient, (3) betweenness centrality, and (4) temporal correlation. The quantitative evaluation on the two synthetic datasets and two real-world datasets are reported in Table 1. First of all, for dynamic Waxman random graphs, STGD-VAE achieves the best overall evaluation which it not only reconstructs the node, spatial, and edge attributes very well, but

(a) Protein folding (b) Traffic Modeling

(c) DWR Graph (d) DRG Graph

Figure 3: Qualitative evaluation on two synthetic datasets and two real-world datasets.

also learns the underlying graph, temporal statistical distribution of the training set. To be speciﬁc, STGD-VAE outperforms the two disentangled VAE models, beta-VAE and beta-TCVAE by 1.3% in reconstructing the graph edge connection. beta-VAE and beta-TCVAE fail to learn the exact underlying property statistical distribution of the graphs. We consider this as a beneﬁt of the disentanglement of different types of features, which allows for clearer and better representations and results in a roughly 62.5% improvement. For the dynamic random geometry graph dataset, the results are very similar to DWR Graph which mainly due to the similar way of generating the synthetic datasets, but with difference geometries. Obviously, our proposed STGD-VAE outperforms the best baseline model, beta-VAE, by at most 69.2% in the reconstruction evaluation and roughly 58.8% in learning the property distribution. In the two real-world datasets, the proposed STGD-VAE and STGD-VAE-Dep outperform other models in reconstructing the spatial locations of the nodes and predicting the contacts between residues by a large margin, which results in up to 19% increase in predicting the spatial locations, and up to 4% increase in contact prediction. However, in some of the graph statistics, our proposed models fail to beat all the baseline models because some of the graph statistics are not designed for the real-world protein structures. However, our proposed models still achieve comparable results in those evaluations. In the last trafﬁc dataset, the graph topology and geometry stay the same, the only changing part is the node feature, which represent the trafﬁc speeds. Our models achieve the best two in the speed prediction task and leave the third as highest as a 25.9% gap.

Evaluation on Disentangled Learning Quantitative Evaluation. For quantitative evaluation, the typical avg MI score (Locatello et al. 2019) is utilized here to evaluate the power of the encoder to learn disentangled representations towards the predeﬁned semantic factors by measuring the mutual distance between the latent variables and the semantic properties. Thus it is calculated based on the learn latent variables and the input graphs in the testing sets. As shown in Table 1, for all the four datasets,

beta-VAE, beta-TCVAE, and NED-IPVAE achieve similar results in terms of avg MI score. Graph RNN, Graph VAE, and DSBM are not capable of disentanglement learning, and thus, no score is reported on this evaluation metric. Between the two proposed models, STGD-VAE and STGD-VAE-Dep achieve comparable results in most of the cases, STGD-VAE is slightly better and have an up to 41.5% improvement over the best baselines. Qualitative Evaluation As in the conventional qualitative evaluation in disentanglement representation learning (Chen et al. 2018; Higgins, Matthey, and Pal 2016), we change the value of one latent variable continuously while ﬁxing the remaining variables to see the variation of the semantic factor it controls. In Fig. 3(a) and 3(b), we visualize the folding process of the protein structures and the trafﬁc modeling process. We can observe that the residues on the right side are slightly folding up and moving towards left. For the trafﬁc dataset, it is worth noting that the trafﬁc speed is constantly changing in different time steps which reﬂects the real-time trafﬁc situations. In Fig. 3(c) and 3(d), we also visualize the changes of the generated graphs when the latent factor zs of our STGD-VAE model change from 5 to 5 in the dynamic Waxman random graph and the dynamic random geometry graph dataset, respectively. Clearly, the spatial location is changed accordingly, from the left-bottom corner to nearly the right-top corner, which shows that the latent variables learn and expose the semantic factors well.

Conclusion In this paper, we introduce STGD-VAE and STGD-VAEDep, to the best of our knowledge, the ﬁrst general deep generative model framework for spatiotemporal graphs. Specifically, we propose a new Bayesian model that factorizes spatiotemporal graphs into spatial, temporal, and graph factors as well as the factors that model the interactions among them. Moreover, a variational objective function and a new mutual information thresholding algorithm based on information bottleneck are proposed to maximize the disentanglement among the factors with theoretical guarantees. The extensive experiments validate the superiority of the models in generative modeling of spatiotemporal graphs.

Acknowledgements

This work was supported by the National Science Foundation (NSF) Grant No. 1755850, No. 1841520, No. 2007716, No. 2007976, No. 1942594, No. 1907805, a Jeffress Memorial Trust Award, Amazon Research Award, NVIDIA GPU Grant, and NSF Computing Innovative Fellowship (Subaward No. 2021CIF-Emory-05).

Alemi, A. A.; Fischer, I.; Dillon, J. V.; and Murphy, K. 2016. Deep variational information bottleneck. ar Xiv preprint ar Xiv:1612.00410. Barab asi, A.-L.; Albert, R.; and Jeong, H. 1999. Mean-ﬁeld theory for scale-free random networks. Physica A: Statistical Mechanics and its Applications, 272(1-2): 173 187. Barth elemy, M. 2011. Spatial networks. Physics Reports, 499(1-3): 1 101. Bender, E. A.; and Canﬁeld, E. R. 1978. The asymptotic number of labeled graphs with given degree sequences. Journal of Combinatorial Theory, Series A, 24(3): 296 307. Bradonji c, M.; Hagberg, A.; and Percus, A. G. 2007. Giant component and connectivity in geographical threshold graphs. In International Workshop on Algorithms and Models for the Web-Graph, 209 216. Springer. Burgess, C. P.; Higgins, I.; and Pal, A. e. a. 2018. Understanding disentangling in β-VAE. ar Xiv preprint ar Xiv:1804.03599. Chen, R. T.; Li, X.; Grosse, R. B.; and Duvenaud, D. K. 2018. Isolating sources of disentanglement in variational autoencoders. In Neur IPS, 2610 2620. Cui, Z.; Henrickson, K.; Ke, R.; and Wang, Y. 2019. Trafﬁc graph convolutional recurrent neural network: A deep learning framework for network-scale trafﬁc learning and forecasting. IEEE Transactions on Intelligent Transportation Systems, 21(11): 4883 4894. Dahiyat, B. I.; and Mayo, S. L. 1997. De novo protein design: fully automated sequence selection. Science, 278(5335): 82 87. Du, W.; Zhang, H.; Du, Y.; Meng, Q.; Chen, W.; Shao, B.; and Liu, T.-Y. 2021a. Equivariant vector ﬁeld network for many-body system modeling. ar Xiv preprint ar Xiv:2110.14811. Du, Y.; Guo, X.; Shehu, A.; and Zhao, L. 2020. Interpretable Molecule Generation via Disentanglement Learning. In Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 1 8. Du, Y.; Wang, S.; Guo, X.; Cao, H.; Hu, S.; Jiang, J.; Varala, A.; Angirekula, A.; and Zhao, L. 2021b. Graph GT: Machine Learning Datasets for Graph Generation and Transformation. In The 35th Conference on Neural Information Processing Systems (Neur IPS 2021). Dye, C.; and Gay, N. 2003. Modeling the SARS epidemic. Science, 300(5627): 1884 1885.

Fu, T.; Gao, W.; Xiao, C.; Yasonik, J.; Coley, C. W.; and Sun, J. 2021a. Differentiable scaffolding tree for molecular optimization. ar Xiv preprint ar Xiv:2109.10469. Fu, T.; Xiao, C.; Li, X.; Glass, L. M.; and Sun, J. 2021b. MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization. In Proceedings of the AAAI Conference on Artiﬁcial Intelligence, volume 35, 125 133. Fu, T.; Xiao, C.; Qian, C.; Glass, L. M.; and Sun, J. 2021c. Probabilistic and Dynamic Molecule-Disease Interaction Modeling for Drug Discovery. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 404 414. Fu, T.; Xiao, C.; and Sun, J. 2020. Core: Automatic molecule optimization using copy & reﬁne strategy. In Proceedings of the AAAI Conference on Artiﬁcial Intelligence, volume 34, 638 645. Fuchs, F. B.; Worrall, D. E.; Fischer, V.; and Welling, M. 2020. SE (3)-transformers: 3D roto-translation equivariant attention networks. ar Xiv preprint ar Xiv:2006.10503. Grover, A.; Zweig, A.; and Ermon, S. 2019. Graphite: Iterative generative modeling of graphs. In International conference on machine learning, 2434 2444. PMLR. Guo, X.; Du, Y.; Tadepalli, S.; Zhao, L.; and Shehu, A. 2020a. Generating tertiary protein structures via an interpretative variational autoencoder. ar Xiv preprint ar Xiv:2004.07119. Guo, X.; Du, Y.; and Zhao, L. 2020. Property Controllable Variational Autoencoder via Invertible Mutual Dependence. In International Conference on Learning Representations. Guo, X.; Du, Y.; and Zhao, L. 2021. Disentangled Deep Generative model for Spatial Networks. ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Guo, X.; Wu, L.; and Zhao, L. 2018. Deep graph translation. ar Xiv preprint ar Xiv:1805.09980. Guo, X.; Zhao, L.; Qin, Z.; Wu, L.; Shehu, A.; and Ye, Y. 2020b. Interpretable Deep Graph Generation with Node Edge Co-Disentanglement. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1697 1707. Hamilton, W. L.; Ying, R.; and Leskovec, J. 2017. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, 1025 1035. Higgins, I.; Matthey, L.; and Pal, A. e. a. 2016. beta-vae: Learning basic visual concepts with a constrained variational framework. Holme, P. 2015. Modern temporal network theory: a colloquium. The European Physical Journal B, 88(9): 1 30. Ingraham, J.; Garg, V.; Barzilay, R.; and Jaakkola, T. 2019. Generative models for graph-based protein design. Advances in Neural Information Processing Systems, 32. Jagadish, H. V.; Gehrke, J.; Labrinidis, A.; Papakonstantinou, Y.; Patel, J. M.; Ramakrishnan, R.; and Shahabi, C. 2014. Big data and its technical challenges. Communications of the ACM, 57(7): 86 94.

Khodayar, M.; Mohammadi, S.; Khodayar, M. E.; Wang, J.; and Liu, G. 2019. Convolutional graph autoencoder: A generative deep neural network for probabilistic spatio-temporal solar irradiance forecasting. IEEE Transactions on Sustainable Energy, 11(2): 571 583. Kim, H.; and Mnih, A. 2018. Disentangling by factorising. ICML. Kipf, T. N.; and Welling, M. 2016a. Semi-supervised classiﬁcation with graph convolutional networks. ar Xiv preprint ar Xiv:1609.02907. Kipf, T. N.; and Welling, M. 2016b. Variational graph autoencoders. ar Xiv preprint ar Xiv:1611.07308. Kumar, A.; Sattigeri, P.; and Balakrishnan, A. 2017. Variational inference of disentangled latent concepts from unlabeled observations. ar Xiv preprint ar Xiv:1711.00848. Liu, Z.; Chen, Y.; Du, Y.; and Tegmark, M. 2021. Physics Augmented Learning: A New Paradigm Beyond Physics Informed Learning. ar Xiv preprint ar Xiv:2109.13901. Locatello, F.; Bauer, S.; Lucic, M.; Raetsch, G.; Gelly, S.; Sch olkopf, B.; and Bachem, O. 2019. Challenging common assumptions in the unsupervised learning of disentangled representations. In international conference on machine learning, 4114 4124. PMLR. Lopez, R.; Regier, J.; Jordan, M. I.; and Yosef, N. 2018. Information constraints on auto-encoding variational bayes. ar Xiv preprint ar Xiv:1805.08672. Ma, J.; Cui, P.; Kuang, K.; Wang, X.; and Zhu, W. 2019. Disentangled graph convolutional networks. In International Conference on Machine Learning, 4212 4221. PMLR. Mangasarian, O. L. 1994. Nonlinear programming. SIAM. Rahman, T.; Du, Y.; Zhao, L.; and Shehu, A. 2021. Generative Adversarial Learning of Protein Tertiary Structures. Molecules, 26(5): 1209. Roy, A.; Roy, K. K.; Ali, A. A.; Amin, M. A.; and Rahman, A. 2021. Uniﬁed Spatio-Temporal Modeling for Trafﬁc Forecasting using Graph Neural Network. ar Xiv preprint ar Xiv:2104.12518. Rozenshtein, P.; and Gionis, A. 2019. Mining temporal networks. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 3225 3226. Shi, C.; Xu, M.; Zhu, Z.; Zhang, W.; Zhang, M.; and Tang, J. 2020. Graphaf: a ﬂow-based autoregressive model for molecular graph generation. ar Xiv preprint ar Xiv:2001.09382. Simonovsky, M.; and Komodakis, N. 2018. Graphvae: Towards generation of small graphs using variational autoencoders. In International Conference on Artiﬁcial Neural Networks, 412 422. Springer. Stopher, P.; and Meyburg, A. 1975. Urban Transportation Modeling and Planning. G - Reference, Information and Interdisciplinary Subjects Series. Lexington Books. ISBN 9780669969412. Teng, P. 1985. A comparison of simulation approaches to epidemic modeling. Annual Review of Phytopathology, 23(1): 351 379.

Veliˇckovi c, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; and Bengio, Y. 2017. Graph attention networks. ar Xiv preprint ar Xiv:1710.10903. Wang, H.; Wang, J.; Wang, J.; Zhao, M.; Zhang, W.; Zhang, F.; Xie, X.; and Guo, M. 2018. Graphgan: Graph representation learning with generative adversarial nets. In Proceedings of the AAAI conference on artiﬁcial intelligence, volume 32. Wang, X.; Li, Z.; Jiang, M.; Wang, S.; Zhang, S.; and Wei, Z. 2019. Molecule property prediction based on spatial graph embedding. Journal of chemical information and modeling, 59(9): 3817 3828. Waxman, B. M. 1988. Routing of multipoint connections. IEEE journal on selected areas in communications, 6(9): 1617 1622. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; and Zhang, C. 2019. Graph wavenet for deep spatial-temporal graph modeling. ar Xiv preprint ar Xiv:1906.00121. Xu, K. S.; and Hero, A. O. 2014. Dynamic stochastic blockmodels for time-evolving social networks. IEEE Journal of Selected Topics in Signal Processing, 8(4): 552 562. Xu, R.; Guo, Y.; Han, X.; Xia, X.; Xiang, H.; and Ma, J. 2021a. Open CDA: an open cooperative driving automation framework integrated with co-simulation. In 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 1155 1162. IEEE. Xu, R.; Xiang, H.; Xia, X.; Han, X.; Liu, J.; and Ma, J. 2021b. OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication. ar Xiv preprint ar Xiv:2109.07644. You, J.; Ying, R.; Ren, X.; Hamilton, W.; and Leskovec, J. 2018. Graphrnn: Generating realistic graphs with deep autoregressive models. In International Conference on Machine Learning, 5708 5717. PMLR. Yu, B.; Yin, H.; and Zhu, Z. 2017. Spatio-temporal graph convolutional networks: A deep learning framework for trafﬁc forecasting. ar Xiv preprint ar Xiv:1709.04875. Zhang, L. 2019. STGGAN: Spatial-temporal graph generation. In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 608 609. Zhang, L.; Zhao, L.; Qin, S.; and Pfoser, D. 2020. TG-GAN: Deep Generative Models for Continuously-time Temporal Graph Generation. ar Xiv preprint ar Xiv:2005.08323. Zhao, L. 2021. Event Prediction in the Big Data Era: A Systematic Survey. ACM Computing Surveys (CSUR), 54(5): 1 37. Zhao, S.; Song, J.; and Ermon, S. 2017. Infovae: Information maximizing variational autoencoders. ar Xiv preprint ar Xiv:1706.02262. Zhou, D.; Zheng, L.; Han, J.; and He, J. 2020. A datadriven graph generative model for temporal interaction networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 401 411.