# Directed Chain Generative Adversarial Networks

Ming Min *1   Ruimeng Hu *1,2   Tomoyuki Ichiba 1

Real-world data can be multimodal distributed, e.g., data describing the opinion divergence in a community, the interspike interval distribution of neurons, and oscillators' natural frequencies. Generating multimodal distributed real-world data has become a challenge to existing generative adversarial networks (GANs). For example, it is often observed that Neural SDEs demonstrate successful performance mainly in generating unimodal time series datasets. In this paper, we propose a novel time series generator, named directed chain GANs (DC-GANs), which inserts a time series dataset (called a neighborhood process of the directed chain, or input) into the drift and diffusion coefficients of directed chain SDEs with distributional constraints. DC-GANs can generate new time series of the same distribution as the neighborhood process, and the neighborhood process provides the key step in learning and generating multimodal distributed time series. The proposed DC-GANs are examined on four datasets, including two stochastic models from social sciences and computational neuroscience, and two real-world datasets on stock prices and energy consumption. To the best of our knowledge, DC-GANs are the first work that can generate multimodal time series data, and they consistently outperform state-of-the-art benchmarks with respect to measures of distribution, data similarity, and predictive ability.

*Equal contribution. 1 Department of Statistics and Applied Probability, University of California, Santa Barbara, CA 93106-3110, USA. 2 Department of Mathematics, University of California, Santa Barbara, CA 93106-3080, USA. Correspondence to: Ming Min.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).

## 1. Introduction

Generative models are important to overcome the limitations of data scarcity, privacy, and cost. In particular, medical data are not easy to get, use, or share due to privacy concerns, and financial time series data are inadequate due to their nonstationary nature. Time-series generative models, instead of seeking to learn the governing equations from real data, aim to discover and learn the data automatically, and to output new data that could plausibly be drawn from the original dataset. Some existing infinite-dimensional generative adversarial networks (GANs) (e.g., Kidger et al. (2021); Li et al. (2022)) have shown successful performance on unimodal time series datasets. However, many real-world phenomena are multimodal distributed, e.g., data describing the opinion divergence in a community (Tsang & Larson, 2014), the interspike interval distribution (Sharma et al., 2018), and oscillators' natural frequencies (Smith & Gottwald, 2019). All of these bring the necessity of developing new generative models for multimodal time series data.

In this paper, we develop a novel time-series generator, named directed chain GANs (DC-GANs), motivated by the formulation of directed chain SDEs (DC-SDEs) (Detering et al., 2020). The drift and diffusion coefficients in DC-SDEs depend on another stochastic process, which we call the neighborhood process, whose distribution is required to be the same as that of the SDE solution.
Different from other GANs, which only use real data in discriminators, our proposed algorithm naturally takes the dataset as the neighborhood process, giving the generator access to data information. This feature enables our model to outperform state-of-the-art methods on many datasets, particularly in the case of multimodal time-series data.

Contribution. We propose a generator for multimodal distributed time series based on DC-SDEs (cf. Definition 2.1), and prove that our model can handle any distribution that Neural SDEs are capable of generating (see Theorem 2.1). To train the generator, we propose to use a combination of two types of discriminators: Sig-WGAN (Ni et al., 2021) and Neural CDEs (Kidger et al., 2020). We notice that data generated directly from DC-GANs can be correlated with the training data, and propose an easy remedy by walking along the directed chain in the path space for further steps (see Theorem 2.2). Combining this with branching the chain using different Brownian noises enables our model to generate unlimited independent fake data. We test our algorithms in four different experiments and show that DC-GANs provide the best performance compared to existing popular models, including Sig-WGAN (Ni et al., 2021), CTFP (Deng et al., 2020), Neural SDEs (Kidger et al., 2021), TimeGAN (Yoon et al., 2019), and the Transformer-based generator TTS-GAN (Li et al., 2022).

Related Literature. Neural ordinary differential equations (Neural ODEs), introduced by Chen et al. (2018), use neural networks to parameterize the vector fields of ODEs and bring a powerful tool for learning time series data. Since then, significant effort has been put into improving Neural ODEs, e.g., Quaglino et al. (2019); Zhang et al. (2019); Massaroli et al. (2020); Hanshu et al. (2019). Incorporating mathematical concepts into the Neural ODE framework makes it possible to analyze and justify its validity, leading to a deeper understanding of the framework itself. For example, Li et al. (2020) and Tzen & Raginsky (2019a) generalized the idea to neural stochastic differential equations (Neural SDEs), providing adjoint equations for efficient training. By integrating rough path theory (Lyons et al., 2007), Kidger et al. (2020) proposed neural controlled differential equations (Neural CDEs), and Morrill et al. (2021) proposed neural rough differential equations for modeling time series. Other examples integrating profound mathematical concepts include using higher-order kernel mean embeddings to capture information filtration (Salvi et al., 2021), and solving high-dimensional partial differential equations through backward stochastic differential equations (Han et al., 2018), to name a few. The model most closely related to ours is the Neural SDEs of Kidger et al. (2021), which use the Wasserstein GAN method to train a stochastic diffusion evolving in a hidden space and achieve great success in simulating time series data. Other successful GAN models for time-series data include Cuchiero et al. (2020); Tzen & Raginsky (2019b); Deng et al. (2020); Kidger et al. (2021); Li et al. (2022); see Brophy et al. (2022) for a recent review. In our numerical experiments, we find that the performance of Neural SDEs is limited in simulating multimodal distributed time series, e.g., as shown in Figure 1 for the stochastic opinion dynamics (Example 1 in Section 4.2).
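To make the structural difference concrete before the formal definitions in Section 2, the following minimal sketch (PyTorch is assumed; the module names `NeuralSDEDrift` and `DCSDEDrift` and the width `HIDDEN` are illustrative, not the authors' implementation) contrasts a Neural SDE drift, which sees only the generator's own state, with a DC-SDE drift, which additionally receives the state of the neighborhood process taken from real data.

```python
import torch
import torch.nn as nn

HIDDEN = 64  # illustrative hidden width, not tuned


class NeuralSDEDrift(nn.Module):
    """Neural SDE drift f(t, x): sees only the generator's own state."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, HIDDEN), nn.Tanh(), nn.Linear(HIDDEN, dim)
        )

    def forward(self, t, x):
        # t: (batch, 1), x: (batch, dim)
        return self.net(torch.cat([t, x], dim=-1))


class DCSDEDrift(nn.Module):
    """DC-SDE drift V0(t, x, x_tilde): also sees the neighborhood state,
    i.e., the real-data path paired with the generated one."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim + 1, HIDDEN), nn.Tanh(), nn.Linear(HIDDEN, dim)
        )

    def forward(self, t, x, x_tilde):
        # t: (batch, 1), x and x_tilde: (batch, dim)
        return self.net(torch.cat([t, x, x_tilde], dim=-1))
```

The diffusion coefficient would be parameterized analogously; the precise formulation of the DC-SDE and its distributional constraint is given in Definition 2.1 below.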
The directed chain is one of the simplest structures in random graph theory, where each node on the graph represents a stochastic process and has interactions only with its neighbor nodes (Figure 4). To the best of our knowledge, Detering et al. (2020) initiated the study of SDE systems on directed chains, followed by Feng et al. (2021a;b) for the analysis of stochastic differential games on such chains with (deterministic and random) interactions. More complicated graph structures have since been studied beyond directed chains. For example, Lacker et al. (2021) analyzed particle behaviors where interactions happen only between neighbors in an undirected graph, proved a Markov random field property, and constructed a Gibbs measure on path space when interactions appear only in the drift; Lacker & Soret (2022) considered stochastic differential games on transitive graphs; Carmona et al. (2022) studied games on a graphon, which has infinitely many nodes. Despite these numerous extensions, we find that the directed chain structure, although simple, is rich enough for generating multimodal time series.

From another viewpoint, DC-SDEs can be understood as the reverse direction of mimicking theorems (Gyöngy, 1986). The idea of mimicking is that for a general SDE (even with path-dependent features), one can construct a Markovian one to mimic its marginal distributions; see Brunick & Shreve (2013) for details on mimicking aspects of Itô processes, including the distributions of running maxima and running integrals. DC-SDEs work in the reverse direction: they can produce marginal distributions that are generated by Markovian SDEs (see Theorem 2.1 for a detailed statement). The benefit of using DC-SDEs, in particular in machine learning, is a stronger fitting ability obtained by embedding the data into a slightly more complicated system.

## 2. Directed Chain SDEs and Signatures

In this section, we introduce two mathematical concepts that serve as the backbones of our algorithm: directed chain SDEs and signatures. In Section 2.1, we identify the central issue of naively generating time series from true data using DC-SDEs: the non-independence of the true data and fake data (Problem 2.1). We then overcome the non-independence issue by the Decorrelating and Branching Phase in Section 3.1, and provide theoretical guarantees for this procedure (Theorem 2.2). In the sequel, we shall use $X_s$, $X_t$ to denote the state of $X$ at times $s$ and $t$, respectively. With no subscript, e.g., by $X$, we mean the whole path from $t = 0$ to $T$.

### 2.1. Directed Chain SDEs (DC-SDEs)

DC-SDEs are the limit of a system of $n$ coupled SDEs interacting homogeneously on a directed chain as $n$ goes to infinity. Below we focus on DC-SDEs and defer the introduction of this limiting process to Appendix A. Under the general setup, DC-SDEs can be of McKean-Vlasov type, where the coefficients take distributions as inputs, corresponding to the $n$-coupled system having mean-field interactions. In our proposed generator, it is sufficient to use the simpler case, DC-SDEs without mean-field interaction, as in the following definition.

Definition 2.1 (DC-SDEs). Fix a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, \mathbb{P})$ and a finite time horizon $[0, T]$.
Let $(X, \tilde{X})$ with $X, \tilde{X} \in L^2(\Omega \times [0, T]; \mathbb{R}^N)$ be a pair of square-integrable stochastic processes satisfying

$$X_t = \xi + \int_0^t V_0(s, X_s, \tilde{X}_s)\, \mathrm{d}s + \int_0^t V_1(s, X_s, \tilde{X}_s)\, \mathrm{d}B_s, \quad t \in [0, T], \tag{1}$$

with the distributional constraint

$$\mathrm{Law}(X_t,\, 0 \le t \le T) = \mathrm{Law}(\tilde{X}_t,\, 0 \le t \le T), \tag{2}$$

where $\mathrm{Law}(\cdot)$ stands for the distribution, the $\mathbb{R}^N$-valued $V_0$ and $\mathbb{R}^{N \times d}$-valued $V_1$ are smooth coefficients satisfying Lipschitz and linear growth conditions, $B$ is a standard $d$-dimensional Brownian motion, and $X_0 := \xi$, $\tilde{X}$, and $B$ are assumed to be independent.

Figure 1. Marginal distributions of real data (blue) and generated data (red) from Example 1 (stochastic opinion dynamics) at $t \in \{0.1, 0.3, 0.5, 0.7, 0.9, 1\}$ in Section 4.2. Figures (a)-(f) are generated by Neural SDEs, and Figures (g)-(l) are generated by DC-GANs. One can see from Figures (e) and (f) that Neural SDEs fail to capture the bimodal distribution.

The existence of a solution to (1) and weak uniqueness in the sense of distribution have been proved under the Lipschitz and linear growth assumptions on the coefficients in Detering et al. (2020) for a simple case, and in Ichiba & Min (2022) for a more general case. Moreover, given the smoothness of the solution under certain additional conditions imposed on the coefficients (cf. Ichiba & Min (2022)), we can derive a partial differential equation (PDE) for the marginal densities of the solution. The associated PDEs then lead to the following theorem: DC-SDEs have at least the same amount of flexibility as Neural SDEs.

Theorem 2.1. Under proper assumptions, for any $Y$ that satisfies a system of Markovian SDEs on $[0, T]$, there exists a unique solution to the DC-SDE (1) with constraint (2), for some $V_0$ and non-degenerate coefficient $V_1$, such that they have the same marginal distributions for all $t \in [0, T]$.

Here, by degenerate we mean that $V_i(t, x, \tilde{x}) := V_i(t, x)$, $i \in \{0, 1\}$, i.e., the coefficients have no dependence on the neighborhood node at all. We defer the proof of Theorem 2.1 to Appendix B.2.

Naturally, if $V_0$ and $V_1$ are known (or learned from data), one can take real data paths as $\tilde{X}$ in (1) and straightforwardly generate paths of $X$ that have the same distribution as $\tilde{X}$ by the constraint (2). However, naively implementing this idea leads to the following potential problem.

Problem 2.1 (Lack of Independence). The distribution of the generated sequence crucially depends on the real data; consequently, to avoid dependence, a single real path can only be used once as $\tilde{X}$ to generate one path of $X$, and thus the number of generated sequences in one run has to be the same as the size of the training dataset. Note that a qualified generator should also be able to generate unlimited independent data that do not depend on the original dataset.

Fortunately, both issues mentioned above can be overcome by the idea behind the following theorem.
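To make the naive generation scheme of Problem 2.1 concrete before stating the remedy, the sketch below (NumPy; the function name `simulate_dc_sde` and its signature are hypothetical, not the authors' code) runs an Euler-Maruyama discretization of (1) in which real paths enter as the neighborhood process and fresh Brownian increments drive the generated paths. The constraint (2) is not imposed by the simulation itself; it is what adversarial training must enforce.

```python
import numpy as np


def simulate_dc_sde(V0, V1, x_tilde, xi, T=1.0, rng=None):
    """Euler-Maruyama discretization of the DC-SDE (1).

    x_tilde: neighborhood (real) paths, shape (batch, steps + 1, dim).
    xi:      initial states, shape (batch, dim).
    V0(t, x, x_tilde_t) -> (batch, dim); V1(t, x, x_tilde_t) -> (batch, dim, d).
    """
    rng = rng or np.random.default_rng()
    batch, steps_plus_1, dim = x_tilde.shape
    steps = steps_plus_1 - 1
    dt = T / steps
    d = V1(0.0, xi, x_tilde[:, 0]).shape[-1]  # Brownian dimension
    x = np.empty_like(x_tilde)
    x[:, 0] = xi
    for k in range(steps):
        t = k * dt
        dB = rng.normal(scale=np.sqrt(dt), size=(batch, d))  # fresh noise
        drift = V0(t, x[:, k], x_tilde[:, k]) * dt
        diffusion = np.einsum("bij,bj->bi", V1(t, x[:, k], x_tilde[:, k]), dB)
        x[:, k + 1] = x[:, k] + drift + diffusion
    return x
```

Each generated path in this sketch uses exactly one real path as its neighbor, which is precisely the dependence that the following theorem allows us to remove by walking further along the chain.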
Theorem 2.2. Under mild non-degeneracy conditions, the correlation between the training data and the generated data in DC-SDEs decays exponentially fast as the distance on the chain increases.

Due to the page limit, we give the formal statement of Theorem 2.2 with a detailed proof in Appendix B.3. We explain how to address the independence problem in the implementation described in Section 3.1. As shown in Appendix B.3, the introduction of independent Brownian motions in (1) is the key to solving the independence problem. We also provide an extreme example (cf. Remark B.1) showing that without the diffusion term $\int V_1 \, \mathrm{d}B$, the system (1)-(2) has only a trivial (deterministic) solution.

### 2.2. Signature

The proposed method utilizes the signature (Lyons et al., 2007), a concept from rough path theory that we briefly introduce for completeness. As an infinitely graded sequence, the signature can be understood as a feature extraction technique for time series data satisfying certain regularity conditions. Let $x : \Omega \times [0, T] \to \mathbb{R}^N$ be a continuous random process, and denote the signature map by $S : x \mapsto S(x) \in T(\mathbb{R}^N)$, where $T(\mathbb{R}^N)$ is the tensor algebra defined on $\mathbb{R}^N$. Then $S(x) := (1, x^1, \ldots, x^i, \ldots)$, where each level $x^i$ is the $i$-fold iterated integral of $x$ over the simplex $\{0 < t_1 < \cdots < t_i < T\}$.
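As a small worked example of the signature levels just introduced, the sketch below (NumPy; the helper name `signature_level_2` is illustrative) computes the first two levels of the signature of a piecewise-linear path via Chen's identity applied increment by increment. In practice, dedicated packages such as `iisignature` or `signatory` would be used, truncating at a higher depth.

```python
import numpy as np


def signature_level_2(path):
    """First two signature levels of a piecewise-linear path.

    path: array of shape (steps + 1, dim), samples of x on [0, T].
    Returns (level1, level2) with level1[i] the increment of coordinate i
    and level2[i, j] the iterated integral of dx^i dx^j.
    """
    dx = np.diff(path, axis=0)  # per-step increments, shape (steps, dim)
    dim = path.shape[1]
    level1 = np.zeros(dim)
    level2 = np.zeros((dim, dim))
    for inc in dx:
        # Chen's identity for a linear segment: the running level-1 term
        # pairs with the new increment, plus the within-segment term.
        level2 += np.outer(level1, inc) + 0.5 * np.outer(inc, inc)
        level1 += inc
    return level1, level2


# Example: depth-2 signature of a 2-dimensional path sampled at 101 points.
t = np.linspace(0.0, 1.0, 101)
path = np.stack([t, np.sin(2 * np.pi * t)], axis=1)
lvl1, lvl2 = signature_level_2(path)
```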