Published in Transactions on Machine Learning Research (October/2025)

# Scalable Generative Modeling of Weighted Graphs

Richard Williams (rlwilliams34@ucla.edu), Department of Biostatistics, University of California, Los Angeles
Eric Nalisnick (nalisnick@jhu.edu), Department of Computer Science, Johns Hopkins University
Andrew Holbrook (aholbroo@ucla.edu), Department of Biostatistics, University of California, Los Angeles

Reviewed on OpenReview: https://openreview.net/forum?id=yWKkBOcD18&noteId=AQrmkZ9eWM

Weighted graphs are ubiquitous throughout biology, chemistry, and the social sciences, motivating the development of generative models for abstract weighted graph data using deep neural networks. However, most current deep generative models are designed for unweighted graphs and cannot be easily extended to weighted topologies. Among those that do incorporate edge weights, few consider a joint distribution with the topology of the graph. Furthermore, learning a distribution over weighted graphs must account for complex nonlocal dependencies between both the edges of the graph and the corresponding weights of each edge. We develop an autoregressive model, BiGG-E, a nontrivial extension of the BiGG model, that learns a joint distribution over weighted graphs while exploiting sparsity to generate a weighted graph with n nodes and m edges in O((n + m) log n) time. Simulation studies and experiments on a variety of benchmark datasets demonstrate that BiGG-E best captures distributions over weighted graphs while remaining scalable and computationally efficient.

1 Introduction

Graphs are useful mathematical structures for representing data in many domains, with applications ranging from modeling protein-protein interactions (Keretsu & Sarmah, 2016) to predicting time spent in traffic (Stanojevic et al., 2018).
A graph consists of a set of objects, called nodes, and their corresponding connections, called edges, which represent the graph's topology. Edges may carry additional information in the form of edge features, which can be categorical, such as bond types in molecular graphs (Jo et al., 2022), or continuous, such as branch lengths in phylogenetic trees (Semple et al., 2003). Edge weights, in particular, are continuous one-dimensional edge features, and a graph with edge weights comprises a weighted graph. Weighted graphs have many applications, such as in neuroscience (Barjuan et al., 2025), economics (Fagiolo et al., 2010), social networks (Bellingeri et al., 2023), and phylogenetics (Baele et al., 2025).

Generative modeling of graphs is a vibrant research area, and weighted graphs in particular require models that capture both topology and edge-weight distributions. Early approaches such as the Erdős-Rényi (Erdős & Rényi, 1959) and Barabási-Albert (Albert & Barabási, 2002) models offer simple graph-generating mechanisms but fail to capture the subtle dependencies observed between edges in real-world data. These limitations motivate the development of expressive deep generative models capable of learning complex, nonlinear relationships. Despite such advances, modeling graph distributions remains an ongoing challenge due to the combinatorial nature of graphs and the complex dependencies among edges. Furthermore, although incorporating edge weights seems straightforward, jointly modeling discrete topology and continuous weights introduces additional complexity, requiring the model to account for dependencies both within and between these two components.
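To make concrete why the early baselines above are too simple, the following sketch generates a weighted Erdős-Rényi graph (the exponential weight distribution is an illustrative choice of ours, not from the paper): every possible edge appears independently with probability p, and weights are drawn i.i.d., so there is no dependence between edges or between topology and weights.

```python
import random

def weighted_erdos_renyi(n, p, rate=1.0):
    """Sample an undirected weighted Erdos-Renyi graph G(n, p).

    Each of the n*(n-1)/2 possible edges is included independently with
    probability p; included edges receive i.i.d. positive weights (here
    exponential with the given rate, an illustrative choice).
    Returns a list of (i, j, w) triples with i < j.
    """
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if random.random() < p:
                edges.append((i, j, random.expovariate(rate)))
    return edges

random.seed(0)
g = weighted_erdos_renyi(6, 0.5)
```

Because every edge and weight is sampled independently, no dependency structure of the kind observed in real-world graphs can be represented, which is exactly the gap deep generative models aim to close.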
Modern graph generative models including variational autoencoders (VAEs) (Kipf & Welling, 2016; Grover et al., 2019), graph neural networks (Grover et al., 2019), autoregressive models (You et al., 2018; Liao et al., 2019; Li et al., 2018; Dai et al., 2020), and score-based diffusion models (Niu et al., 2020; Jo et al., 2022; Vignac et al., 2023) primarily focus on unweighted graphs. Most limit their scope to modeling distributions over graph topology while offering limited insight into the joint modeling of topology and edge weights, and few provide scalable solutions for learning joint distributions over sparse weighted graphs. Furthermore, a significant computational bottleneck arises when jointly modeling all possible edge connections, whose number scales quadratically with the number of nodes. Hence, models that attempt to jointly model all possible edge connections are computationally slow and infeasible for even moderately sized graphs.

Autoregressive models factorize graph generation node-by-node using a sequential decision process. BiGG (Big Graph Generation; Dai et al., 2020) augments this approach by directly generating the edge set of sparse graphs, scaling to graphs with tens of thousands of nodes. However, existing autoregressive methods, including BiGG, remain limited to unweighted graphs. To address the need for efficient generative modeling over large weighted graphs, we introduce BiGG-E (BiGG-Extension), an autoregressive model that jointly generates both graph topologies and edge weights while preserving the scalability of its unweighted predecessor, BiGG.
We benchmark BiGG-E against three alternatives: (1) Adjacency-LSTM (Adj-LSTM), a fully expressive but computationally inefficient model parameterized with a Long Short-Term Memory (LSTM; Hochreiter & Schmidhuber, 1997) cell; (2) BiGG-MLP, a naive extension that appends encodings of weights to BiGG using a multilayer perceptron (MLP; Rumelhart et al., 1986); and (3) BiGG+GCN, a two-stage model that decouples topology and weight generation. Our contributions are as follows:

- We propose BiGG-E, an application-agnostic generative model that learns joint distributions over sparse weighted graphs.
- We empirically demonstrate that BiGG-E maintains the efficient scaling of BiGG while outperforming BiGG-MLP, Adj-LSTM, and BiGG+GCN.
- All BiGG extensions are orders of magnitude faster than Adj-LSTM and SparseDiff, a diffusion-model competitor.
- We directly evaluate the joint and marginal generative performance of all models on an array of weighted graph distributions.

2 Background

Let G = {G_1, . . . , G_|G|} be an independent sample of weighted graphs from an unknown data-generating distribution p(G_i), for i = 1, . . . , |G|. Each weighted graph is defined as G_i = (V_i, E_i, W_i), where V_i = {v_1, . . . , v_{n_i}} is the set of |V_i| = n_i nodes, E_i ⊆ V_i × V_i is the set of |E_i| = m_i edges, and W_i : V_i × V_i → ℝ+ maps edges to positive edge weights. For notational simplicity, we drop the subscript i, under the assumption that the graphs G_i ∈ G are independent and identically distributed. For any edge (v_i, v_j) ∈ E, the edge weight is W(v_i, v_j) = w_ij; otherwise, it is zero. The weighted adjacency matrix W ∈ ℝ^(n×n) has entries W(v_i, v_j). In the unweighted case, edge weights are 1 when (v_i, v_j) ∈ E and 0 otherwise. We denote the unweighted adjacency matrix by A, which encodes the graph's topology, and use W to specifically refer to the weighted adjacency matrix.
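As a concrete illustration of the notation above, the sketch below (function and variable names are ours, not from the paper) builds the weighted adjacency matrix W and the binary topology matrix A from an edge list:

```python
import numpy as np

def build_adjacency(n, weighted_edges):
    """Build the weighted adjacency matrix W and binary topology matrix A.

    weighted_edges: iterable of (i, j, w) with 0-indexed nodes and w > 0,
    mirroring the definition W(v_i, v_j) = w_ij. Assumes an undirected
    graph, so entries are mirrored across the diagonal.
    """
    W = np.zeros((n, n))
    for i, j, w in weighted_edges:
        W[i, j] = W[j, i] = w
    A = (W > 0).astype(int)  # unweighted topology: 1 iff (v_i, v_j) in E
    return W, A

# A 3-node path graph with positive (e.g., branch-length-like) weights.
W, A = build_adjacency(3, [(0, 1, 0.5), (1, 2, 2.0)])
```

Note that A is recoverable from W by thresholding at zero, but not vice versa, which is why a joint model over (A, W) carries strictly more information than a model of topology alone.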
A weighted graph G under node ordering π is represented by its permuted weighted adjacency matrix W^π, from which the probability of observing G is given by p(G) = p(|V| = n) Σ_π p(W^π(G)) (Dai et al., 2020). Because summing over all n! node permutations quickly becomes intractable, we follow Liao et al. (2019) and assume a single canonical ordering π, yielding the lower-bound estimate p(G) ≥ p(|V| = n) p(W^π(G)). Following Dai et al. (2020), we estimate p(|V| = n) using a multinomial distribution over node counts in the training set, and model p(W^π(G)) with deep autoregressive neural networks parameterized by θ. We assume all graphs are under the canonical ordering π(G) and omit this notation moving forward.

2.2 Related Work

Weighted Graph Generative Models. Although various models incorporate edge and node features in the graph generative process, these features are typically categorical (Kipf & Welling, 2016; Li et al., 2018; Kawai et al., 2019) or are tailored to a specific class of graphs, such as protein graphs (Ingraham et al., 2019). Furthermore, previous work on autoregressive models (You et al., 2018; Liao et al., 2019; Dai et al., 2020) focuses exclusively on unweighted graphs. Most implementations on weighted graphs provide limited insight into the incorporation of edge weights. Graphite (Grover et al., 2019) models weighted graphs by parameterizing a Gaussian random variable, which introduces the possibility of infeasible negative weights. Although score-based models incorporate edge features into the graph generative process, these features are typically categorical (Vignac et al., 2023) or rely on thresholding to produce a weighted adjacency matrix W and only evaluate performance on the binarized adjacency matrix (Niu et al., 2020).

Scalability. Scaling generative models to graphs with thousands of nodes is an ongoing challenge, as the adjacency matrix A has O(n²) entries.
In addition, many VAE (Grover et al., 2019) and diffusion (Niu et al., 2020; Vignac et al., 2023) models utilize graph neural networks, which perform convolutions over the entire adjacency matrix of the graph. SparseDiff (Qin et al., 2024) is a scalable diffusion model for sparse graphs, but it only scales to graphs with hundreds of nodes, while we are interested in scaling to thousands of nodes. Autoregressive models currently scale best to large graphs. Although GraphRNN (You et al., 2018) uses a breadth-first-search ordering scheme to reduce computational overhead, it still trains in O(n²) time. GRAN (Liao et al., 2019) trains in O(n) time by generating the graph one block of nodes at a time, but trades this gain in scalability for worsened sample quality, as the model estimates edge densities per block of nodes. BiGG (Dai et al., 2020) leverages the sparsity of many real-world graphs and directly generates the edge set {e_k} of A: p(A) = ∏_{k=1}^{m} pθ(e_k | {e_l : l < k}).
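Putting the pieces of Section 2 together, the lower bound on the log-likelihood combines the empirical node-count distribution with the autoregressive edge-set factorization. A minimal sketch, where the function names are ours and the per-edge conditional probabilities are toy stand-ins for the outputs of a trained model such as BiGG:

```python
from collections import Counter
import math

def fit_node_count_dist(train_node_counts):
    """Empirical multinomial over node counts: p(|V| = n) = #{i : n_i = n} / N."""
    total = len(train_node_counts)
    return {n: c / total for n, c in Counter(train_node_counts).items()}

def log_prob_lower_bound(n, edge_conditionals, p_n):
    """log p(G) >= log p(|V| = n) + sum_k log p_theta(e_k | {e_l : l < k}).

    edge_conditionals: one conditional probability per edge in the canonical
    generation order. The sum runs over only the m edges present, which is
    the sparsity an edge-set-based autoregressive model exploits.
    """
    return math.log(p_n[n]) + sum(math.log(p) for p in edge_conditionals)

p_n = fit_node_count_dist([10, 10, 12, 10, 12])   # toy training node counts
lb = log_prob_lower_bound(10, [0.9, 0.5, 0.25], p_n)
```

The key point for scalability is that the factorization touches only the m realized edges, not all O(n²) node pairs, so the cost of evaluating the bound grows with the sparsity of the graph rather than the square of its size.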