# Neural Jump-Diffusion Temporal Point Processes

Shuai Zhang 1, Chuan Zhou 1 2, Yang Liu 1, Peng Zhang 3, Xixun Lin 4, Zhi-Ming Ma 1

Abstract

We present a novel perspective on temporal point processes (TPPs) by reformulating their intensity processes as solutions to stochastic differential equations (SDEs). In particular, we first prove the equivalent SDE formulations of several classical TPPs, including Poisson processes, Hawkes processes, and self-correcting processes. Based on these proofs, we introduce a unified TPP framework called the Neural Jump-Diffusion Temporal Point Process (NJDTPP), whose intensity process is governed by a neural jump-diffusion SDE (NJDSDE) in which the drift, diffusion, and jump coefficient functions are parameterized by neural networks. Compared to previous works, NJDTPP exhibits model flexibility in capturing intensity dynamics without relying on any specific functional form, and provides theoretical guarantees regarding the existence and uniqueness of the solution to the proposed NJDSDE. Experiments on both synthetic and real-world datasets demonstrate that NJDTPP is capable of capturing the dynamics of intensity processes in different scenarios and significantly outperforms state-of-the-art TPP models in prediction tasks.

1. Introduction

Many real-world scenarios generate large numbers of asynchronous event sequences. Each event consists of a timestamp and a type mark, indicating when the event occurred and what it was. Examples include user activities on social media platforms (Farajtabar et al., 2017), electronic health records in healthcare (Liu & Hauskrecht, 2021), and transaction behaviors in e-commerce systems (Xue et al., 2022).
1 Academy of Mathematics and Systems Science, Chinese Academy of Sciences; 2 School of Cyber Security, University of Chinese Academy of Sciences; 3 Cyberspace Institute of Advanced Technology, Guangzhou University; 4 Institute of Information Engineering, Chinese Academy of Sciences. Correspondence to: Chuan Zhou. Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

Modeling such data has become increasingly important for tasks such as predicting the occurrence of future events (Du et al., 2016; Mei & Eisner, 2017; Zhang et al., 2020a; Zuo et al., 2020), detecting anomalies in event sequences (Liu & Hauskrecht, 2021; Shchur et al., 2021; Zhang et al., 2023), and performing causal inference on events (Xu et al., 2016; Zhang et al., 2020b; Gao et al., 2021).

Temporal point processes (TPPs) (Daley et al., 2003) serve as a useful mathematical tool for modeling sequences of discrete events in continuous time. Classical examples of TPPs include Poisson processes (Kingman, 1992), Hawkes processes (Hawkes, 1971), and self-correcting processes (Isham & Westcott, 1979). A central concept in TPPs is the intensity process^1 (Oakes, 1975), also known as the intensity function (Zhang et al., 2020b), which measures the expected rate of event occurrence given the historical events. While these classical models exhibit favorable statistical properties, the fixed parametric form of their intensity functions prevents them from capturing complicated dynamics.

To enhance the capability of TPP models, there has been a surge of interest in modeling the intensity function as a transformation of the hidden state of a neural network. Depending on the network structure, these TPP models can be divided into two categories: those based on RNNs or Transformers (Du et al., 2016; Zuo et al., 2020; Yang et al., 2022), and those based on continuous-depth neural networks (Jia & Benson, 2019; Chen et al., 2020).
While being more expressive than classical TPPs, the former models usually assume a specific functional form for the intensity function. For example, RMTPP (Du et al., 2016) assumes that the intensity exponentially decreases or increases between events. However, relying on such an assumption limits model expressiveness when the assumption deviates from reality (Omi et al., 2019). The latter models represent the hidden state as the solution to a neural jump stochastic differential equation (Jia & Benson, 2019). These models, however, provide no theoretical guarantee for the global existence and uniqueness of the solution.

In this paper, we provide a new view of TPPs by reformulating the intensity process as the solution to a stochastic differential equation (SDE) (Ikeda & Watanabe, 2014). Specifically, we first derive equivalent SDE formulations of the classical TPPs mentioned above. From these SDE formulations, we observe that the coefficient functions in the SDE play a key role in shaping the evolution of the intensity process over time and in revealing the influences between events. Based on these observations, we introduce the Neural Jump-Diffusion Temporal Point Process (NJDTPP), whose intensity process is governed by a neural jump-diffusion SDE (NJDSDE). The drift, diffusion, and jump coefficient functions in the NJDSDE are parameterized by three neural networks, i.e., the drift net, diffusion net, and jump net. Concretely, the drift net captures the intrinsic evolution of the intensity process, the diffusion net models Gaussian noise via Brownian motion (Wang et al., 2017; 2018), and the jump net captures the influences between events, such as excitatory and inhibitory influences. Remarkably, our NJDTPP model does not require a specific functional form for the intensity function.

^1 In this paper, we use intensity process and intensity function interchangeably.
Instead, by using the drift, diffusion, and jump nets, the solution to the NJDSDE can implicitly determine a free-form intensity process consistent with the observed event data. We summarize our contributions as follows:

- **Theoretical Analysis.** We prove the equivalent SDE formulations of several classical TPPs. For the SDE formulation, we provide a sufficient condition for the existence of a unique positive solution. Moreover, we theoretically analyze the existence and uniqueness of the solution to the proposed neural jump-diffusion SDE.
- **Unified Framework.** By viewing the intensity process as the solution to an SDE, we propose a unified TPP framework, NJDTPP, which can learn a free-form intensity process consistent with the observed data. A number of classical TPPs can be interpreted as special cases of our framework with simple coefficient functions.
- **Extensive Experiments.** We conduct experiments on three synthetic and six real-world datasets to evaluate the performance of NJDTPP. Experimental results show that NJDTPP successfully captures the dynamics of intensity processes and achieves state-of-the-art results in the tasks of likelihood evaluation and event prediction.

2. Related Work

Neural Temporal Point Processes. Neural TPPs, which combine TPPs with neural networks, have received considerable attention (Du et al., 2016; Mei & Eisner, 2017; Zhang et al., 2020a; Zuo et al., 2020; Lin et al., 2021; Yang et al., 2022). While being more expressive than classical parametric models, neural TPPs usually assume a specific functional form for the intensity function. For example, RMTPP (Du et al., 2016) assumes that the intensity exponentially decreases or increases between events; THP (Zuo et al., 2020) utilizes the softplus function so that the intensity between events is approximately linearly interpolated. However, relying on such an assumption can undermine model effectiveness if the assumption deviates from reality.
In addition to the dominant paradigm of parameterizing intensity functions, alternative methods model cumulative intensity functions (Omi et al., 2019) or conditional density functions (Shchur et al., 2019). However, these methods may not fully capture the dynamics of the intensity process. In contrast to existing studies, our model formulates the intensity process as the solution to an SDE without relying on any specific functional form.

Neural Differential Equations. Neural differential equations (NDEs) (Kidger et al., 2021a) are differential equations whose coefficient functions are parameterized by neural networks. Many NDEs, including neural ODEs and their variants (Chen et al., 2018; Rubanova et al., 2019; Kidger et al., 2020; Herrera et al., 2020), as well as neural SDEs (Li et al., 2020; Kong et al., 2020; Kidger et al., 2021a;b), have been proposed for modeling time series. However, there is a distinction between time series and event sequences (Xiao et al., 2017). In time series, time serves only as the index that orders the sequence of values of the target variable. In event sequences, time is a random variable representing the timestamp of an asynchronous event, and time itself is the subject of study. Therefore, many existing NDE-based models are not directly suitable for modeling event sequences. While Jia & Benson (2019) and Chen et al. (2020) utilize NDEs to model event sequences, they actually capture the dynamics of the hidden state of a neural network. Moreover, they focus solely on the jump term, neglecting the diffusion term associated with randomness driven by Brownian motion. In contrast, we incorporate Brownian motion to model Gaussian noise, and, more importantly, our proposed NJDSDE models the dynamics of the intensity process itself.

Equivalent SDE Formulations for TPPs. Wang et al. (2018) provided a jump-diffusion SDE framework for modeling user activities.
They introduced the diffusion term to model Gaussian noise, such as fluctuations in the dynamics caused by unobserved factors. However, their use of fixed linear coefficient functions in the SDE might not fully capture the actual intensity. In contrast, we employ neural networks to parameterize the coefficient functions, allowing for a more flexible model of the intensity that better aligns with the observed data. While De et al. (2016), Zarezade et al. (2017), and Wang et al. (2018) established the equivalent SDE formulation for Hawkes processes, we provide a distinct proof method. Besides, we derive equivalent SDE formulations for several other classical TPPs, such as Poisson processes and self-correcting processes. Moreover, for the SDE formulation, we provide a sufficient condition for the existence of a unique positive solution.

3. Background

In this section, we provide a brief overview of temporal point processes and jump-diffusion stochastic differential equations.

3.1. Temporal Point Processes

A temporal point process (TPP) (Daley et al., 2003) is a stochastic process $\{t_i\}_{i=1}^{\infty}$, in which the non-negative random variable $t_i$ represents the occurrence time of the $i$-th event and $t_i < t_{i+1}$. Such a process can be equivalently represented as a counting process $\{N_t\}_{t \ge 0}$, where $N_t$ is the number of events up to time $t$. The most common way to characterize a TPP is via its intensity process (Oakes, 1975), also known as the intensity function. Specifically, the intensity process of $\{N_t\}_{t \ge 0}$ is a left-continuous process with right limits, $\{\lambda(t \mid \mathcal{F}_{t-})\}_{t \ge 0}$, denoted for simplicity as $\{\lambda_t\}_{t \ge 0}$, where $\lambda_t$ measures the expected rate of events occurring in an infinitesimal window $(t, t+dt]$ given the historical events up to time $t$. Formally,

$$\lambda_t \, dt = \mathbb{P}(dN_t = 1 \mid \mathcal{F}_{t-}) = \mathbb{E}[dN_t \mid \mathcal{F}_{t-}], \quad (1)$$

where $\mathcal{F}_{t-} = \sigma(N_s : 0 \le s < t)$ and the jump size $dN_t = N_{t+dt} - N_t \in \{0, 1\}$.
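As a quick sanity check on Eq.(1), a homogeneous Poisson process with constant rate $\lambda$ satisfies $\mathbb{E}[N_T] = \lambda T$. The NumPy sketch below (our own illustration, not part of the paper) simulates such a process via i.i.d. exponential inter-arrival times and verifies the expected count empirically:

```python
import numpy as np

def simulate_homogeneous_poisson(lam, T, rng):
    """Draw event times of a homogeneous Poisson process with rate lam on [0, T]."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / lam)  # i.i.d. exponential inter-arrival times
        if t > T:
            return np.array(times)
        times.append(t)

rng = np.random.default_rng(0)
lam, T = 2.0, 10.0
counts = [len(simulate_homogeneous_poisson(lam, T, rng)) for _ in range(2000)]
mean_count = np.mean(counts)  # should be close to lam * T = 20
```

Averaging over many realizations, the empirical event count concentrates around $\lambda T$, which is the degenerate (history-independent) case of the conditional expectation in Eq.(1).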
In the following, we review several classical TPPs, whose intensity functions have a fixed parametric form.

Poisson processes (Kingman, 1992). The intensity function of the Poisson process $\{N_t\}_{t \ge 0}$ is independent of the event history. The simplest case is a homogeneous Poisson process, where the intensity is a positive constant:

$$\lambda_t = \lambda > 0. \quad (2)$$

For the more general inhomogeneous Poisson process, the intensity is a function varying over time:

$$\lambda_t = g(t) > 0. \quad (3)$$

Hawkes processes (Hawkes, 1971). The Hawkes process $\{N_t\}_{t \ge 0}$ with the widely used exponential kernel assumes that events are self-exciting. The arrival of a new event results in a sudden increase in intensity, and this influence decays exponentially:

$$\lambda_t = \mu + \alpha \sum_{i:\, t_i < t} e^{-\beta (t - t_i)}, \quad (4)$$

where $\mu > 0$, $\alpha > 0$ and $\beta > 0$.

Self-correcting processes (Isham & Westcott, 1979). In contrast to the Hawkes process, the self-correcting process $\{N_t\}_{t \ge 0}$ assumes that a new event inhibits future events, while the intensity grows exponentially over time:

$$\lambda_t = \exp\Big(\mu t - \sum_{i:\, t_i < t} \alpha\Big), \quad (5)$$

where $\mu > 0$ and $\alpha > 0$.

3.2. Jump-Diffusion Stochastic Differential Equations

One-dimensional autonomous jump-diffusion stochastic differential equations (JDSDEs) (Hanson, 2007) with initial conditions are of the form

$$dX_t = f(X_t) \, dt + g(X_t) \, dW_t + h(X_t) \, dN_t, \quad X_0 = x_0, \quad (6)$$

where $x_0 \in \mathbb{R}$ is the initial value, $f: \mathbb{R} \to \mathbb{R}$ is the drift coefficient function, $g: \mathbb{R} \to \mathbb{R}$ is the diffusion coefficient function, $h: \mathbb{R} \to \mathbb{R}$ is the jump coefficient function, $\{W_t\}_{t \ge 0}$ is a standard Brownian motion, and $\{N_t\}_{t \ge 0}$ is a counting process that jumps at times $\{t_i\}_{i=1}^{\infty}$. Suppose that $\{W_t\}_{t \ge 0}$ and $\{N_t\}_{t \ge 0}$ are independent. It is essential to highlight that the process $\{N_t\}_{t \ge 0}$ in Eq.(6) is the general counting process introduced in Section 3.1, in contrast to many previous works (Cyganowski et al., 2002; Hanson, 2007; Lamberton & Lapeyre, 2011) that focus on a Poisson process.
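To make the flow-and-jump behavior of Eq.(6) concrete, consider the case $f(\lambda) = \beta(\mu - \lambda)$, $g \equiv 0$, and $h \equiv \alpha$, which is exactly the jump SDE proved equivalent to the Hawkes process in Theorem 2 below. The NumPy sketch here (our own numerical check, not from the paper) solves that SDE piecewise, using the exact solution of the linear ODE between jumps, and confirms that it reproduces the closed-form intensity Eq.(4):

```python
import numpy as np

def hawkes_closed_form(t, events, mu, alpha, beta):
    """Eq.(4), evaluated as a left limit (only events strictly before t count)."""
    past = events[events < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

def hawkes_via_jump_sde(t, events, mu, alpha, beta):
    """Piecewise solution of the jump SDE: between jumps,
    d lambda = beta (mu - lambda) dt has the exact solution
    lambda(s) = mu + (lambda_prev - mu) exp(-beta (s - s_prev));
    at each event time the intensity jumps by alpha."""
    lam, s_prev = mu, 0.0                                      # initial value lambda_0 = mu
    for ti in events[events < t]:
        lam = mu + (lam - mu) * np.exp(-beta * (ti - s_prev))  # flow to the jump time
        lam += alpha                                           # jump of size alpha
        s_prev = ti
    return mu + (lam - mu) * np.exp(-beta * (t - s_prev))      # flow to the query time

events = np.array([0.7, 1.3, 2.9])
for t in [0.5, 1.0, 2.0, 4.0]:
    a = hawkes_closed_form(t, events, 0.4, 0.8, 1.2)
    b = hawkes_via_jump_sde(t, events, 0.4, 0.8, 1.2)
    assert abs(a - b) < 1e-10
```

The two evaluations agree to floating-point precision, illustrating how a JDSDE reduces to an ODE (here with $g \equiv 0$) between jumps and applies an instantaneous increment of size $h$ at each event.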
The JDSDE Eq.(6) is interpreted as a stochastic integral equation (Cyganowski et al., 2002):

$$X_t = x_0 + \int_0^t f(X_s) \, ds + \int_0^t g(X_s) \, dW_s + \int_0^t h(X_s) \, dN_s,$$

where the first integral is a Riemann integral, the second is an Itô integral, and the third is a Riemann-Stieltjes integral. In fact, Eq.(6) behaves as an ordinary Itô SDE (Cyganowski et al., 2002; Hanson, 2007) between jumps of $\{N_t\}_{t \ge 0}$. This can be expressed as

$$dX_t = f(X_t) \, dt + g(X_t) \, dW_t, \quad t \in (t_{i-1}, t_i].$$

On the other hand, at a jump time $t_i$, $\{N_t\}_{t \ge 0}$ has a jump of size $\Delta N_{t_i} = 1$, which implies that the process $\{X_t\}_{t \ge 0}$ has a jump of size

$$\Delta X_{t_i} = X_{t_i+} - X_{t_i} = h(X_{t_i}) \, \Delta N_{t_i} = h(X_{t_i}),$$

where $X_{t_i+} = \lim_{s \downarrow t_i} X_s$. Then $X_{t_i+} = X_{t_i} + h(X_{t_i})$.

4. Equivalent SDE Formulations for TPPs

In this section, we derive equivalent SDE formulations of several classical TPPs, expressing each intensity process as the solution to a corresponding SDE. For the SDE formulation, we then provide a sufficient condition for the existence of a unique positive solution.

Theorem 1. The intensity processes of homogeneous and inhomogeneous Poisson processes can be equivalently expressed as solutions to the following ODEs, respectively (these ODEs can be viewed as degenerate forms of SDEs):

$$d\lambda_t = 0, \quad \lambda_0 = \lambda, \quad (7)$$

$$d\lambda_t = g'(t) \, dt, \quad \lambda_0 = g(0), \quad (8)$$

where $\lambda > 0$ and $g(t) > 0$ is assumed to be differentiable.

According to Eq.(2) and Eq.(3), Theorem 1 is evident. Subsequently, we establish equivalent SDE formulations for Hawkes processes and self-correcting processes.

Theorem 2. The intensity process $\{\lambda_t\}_{t \ge 0}$ of the Hawkes process $\{N_t\}_{t \ge 0}$ can be equivalently expressed as the solution to the jump SDE

$$d\lambda_t = \beta(\mu - \lambda_t) \, dt + \alpha \, dN_t, \quad \lambda_0 = \mu. \quad (9)$$

Proof. See Appendix A.1. The proof sketch is as follows. Taking inspiration from (Björk, 2021), we solve the above SDE. Let the jump times of $\{N_t\}_{t \ge 0}$ be $\{t_i\}_{i=1}^{\infty}$; then Eq.(9) behaves as the ODE $d\lambda_t = \beta(\mu - \lambda_t) \, dt$ between these jump times.
At a jump time $t_i$, the jump size is $\alpha$, leading to $\lambda_{t_i+} = \lambda_{t_i} + \alpha$. Iteratively solving this ODE between jumps with the initial value $\lambda_{t_{i-1}+}$, we establish that the intensity process Eq.(4) satisfies Eq.(9).

Theorem 3. The intensity process $\{\lambda_t\}_{t \ge 0}$ of the self-correcting process $\{N_t\}_{t \ge 0}$ can be equivalently expressed as the solution to the jump SDE

$$d\lambda_t = \mu \lambda_t \, dt + \big(e^{-\alpha} - 1\big) \lambda_t \, dN_t, \quad \lambda_0 = 1. \quad (10)$$

The proof of this theorem is similar to the previous one and can be found in Appendix A.2. The following result shows that, under certain conditions, there exists a unique positive solution to an SDE, which means that an SDE can determine the intensity process of a TPP.

Theorem 4. Assume that the ODE $dy_t = f(e^{y_t}) e^{-y_t} \, dt$, $t \ge 0$, $y_0 = y$, has a unique global solution for every $y \in \mathbb{R}$, and let $h: \mathbb{R} \to \mathbb{R}$ be a chosen function such that $h(x) + x > 0$ for $x > 0$. Then the jump SDE

$$d\lambda_t = f(\lambda_t) \, dt + h(\lambda_t) \, dN_t, \quad \lambda_0 = \lambda, \quad (11)$$

has a unique global positive solution for every $\lambda > 0$.

Appendix A.3 includes the detailed proof. In particular, according to this theorem, by setting $f(x) = \mu x$ and $h(x) = (e^{-\alpha} - 1)x$, it follows that Eq.(10) has a unique global positive solution.

From the above equivalent SDE formulations of several classical TPPs, we can clearly see that the coefficient functions within the SDE play a key role in shaping the evolution of the intensity process over time and in revealing the influences between events. For example, in Hawkes processes (Eq.(9)), there exist excitatory influences between events, where each occurrence of an event leads to an instantaneous increase in intensity by $\alpha$. This suggests that, by defining appropriate coefficient functions, it becomes feasible to construct an intensity process consistent with the observed data. These observations motivate our model, the Neural Jump-Diffusion Temporal Point Process.

5. Neural Jump-Diffusion TPPs

In this section, for notational simplicity and reader comprehension, we first model the intensity process of univariate TPPs.
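Before introducing the model, the equivalence in Theorem 3 can also be checked numerically: solving Eq.(10) by alternating deterministic flow and multiplicative jumps reproduces the closed-form self-correcting intensity Eq.(5), and the solution stays strictly positive, as Theorem 4 guarantees. A NumPy sketch (our own illustration, not from the paper):

```python
import numpy as np

def self_correcting_closed_form(t, events, mu, alpha):
    """Eq.(5): lambda_t = exp(mu t - alpha N_{t-}), with N_{t-} = #{t_i < t}."""
    return np.exp(mu * t - alpha * np.sum(events < t))

def self_correcting_via_jump_sde(t, events, mu, alpha):
    """Piecewise solution of Eq.(10): between jumps, d lambda = mu lambda dt
    gives exponential growth; at each event the jump (e^{-alpha} - 1) lambda
    multiplies the intensity by e^{-alpha}, so it remains positive."""
    lam, s_prev = 1.0, 0.0                   # initial value lambda_0 = 1
    for ti in events[events < t]:
        lam *= np.exp(mu * (ti - s_prev))    # deterministic flow to the jump time
        lam += (np.exp(-alpha) - 1.0) * lam  # jump: lam -> e^{-alpha} lam
        s_prev = ti
    return lam * np.exp(mu * (t - s_prev))   # flow to the query time

events = np.array([0.8, 1.5, 3.0])
for t in [0.5, 1.0, 2.0, 4.0]:
    a = self_correcting_closed_form(t, events, 0.6, 0.9)
    b = self_correcting_via_jump_sde(t, events, 0.6, 0.9)
    assert abs(a - b) < 1e-12
```

Note how the jump coefficient $h(x) = (e^{-\alpha} - 1)x$ satisfies the condition $h(x) + x = e^{-\alpha} x > 0$ of Theorem 4, which is what keeps the piecewise solution positive.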
Subsequently, we extend our method to multivariate TPPs, yielding a more comprehensive model.

5.1. Neural Jump-Diffusion Univariate Point Process

Unlike classical TPPs with known linear drift and jump coefficient functions, we consider a general problem where the dynamics of the intensity process are completely unknown. Specifically, we assume access to a large set of event sequences, each denoted as $S = \{t_i\}_{i=1}^n$, representing independent realizations of a counting process $\{N_t\}_{t \ge 0}$. The objective is to identify the unknown dynamics governing the intensity process $\{\lambda_t\}_{t \ge 0}$ of $\{N_t\}_{t \ge 0}$. To this end, we propose the Neural Jump-Diffusion Univariate Point Process, whose intensity process is governed by a neural jump-diffusion SDE (NJDSDE). The drift, diffusion, and jump coefficient functions in the NJDSDE are parameterized by three neural networks, called the drift net, diffusion net, and jump net, respectively. To ensure that the intensity $\{\lambda_t\}_{t \ge 0}$ remains positive, we introduce the log-intensity process $\eta_t := \log \lambda_t$. We formally present the NJDSDE for $\{\eta_t\}_{t \ge 0}$ as follows:

$$d\eta_t = \underbrace{f_{\theta_f}(\eta_t)}_{\text{drift net}} \, dt + \underbrace{g_{\theta_g}(\eta_t)}_{\text{diffusion net}} \, dW_t + \underbrace{h_{\theta_h}(\eta_t)}_{\text{jump net}} \, dN_t, \quad \eta_0 = \log \lambda_0, \quad (12)$$

where $\eta_0 \in \mathbb{R}$ is the initial value, $f_{\theta_f}: \mathbb{R} \to \mathbb{R}$, $g_{\theta_g}: \mathbb{R} \to \mathbb{R}$, $h_{\theta_h}: \mathbb{R} \to \mathbb{R}$, $\{W_t\}_{t \ge 0}$ is a standard Brownian motion (Le Gall, 2016), and $\{N_t\}_{t \ge 0}$ is the counting process mentioned above, which records the occurrence of events. Suppose that $\{W_t\}_{t \ge 0}$ and $\{N_t\}_{t \ge 0}$ are independent. We explain each term in Eq.(12) in detail:

- The drift term $f_{\theta_f}(\eta_t) \, dt$ captures the intrinsic evolution of $\{\eta_t\}_{t \ge 0}$.
- The diffusion term $g_{\theta_g}(\eta_t) \, dW_t$ models Gaussian noise via Brownian motion. Inspired by (Wang et al., 2018), we add the diffusion term to model the impact of noise on the intensity process.
- The jump term $h_{\theta_h}(\eta_t) \, dN_t$ represents the magnitude of the jump, capturing the influence of historical events up to time $t$. Its sign indicates whether the influence is excitatory or inhibitory.
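As an illustration of how the three coefficient nets might look, here is a minimal NumPy sketch of scalar MLPs for $f_{\theta_f}$, $g_{\theta_g}$, and $h_{\theta_h}$. The layer sizes and initialization are our own illustrative choices, not the paper's implementation; tanh is used because it is 1-Lipschitz, which keeps each net uniformly Lipschitz, a property that matters for the well-posedness result discussed below:

```python
import numpy as np

def make_mlp(sizes, rng):
    """A small tanh MLP R -> R. Since tanh is 1-Lipschitz, the whole network is
    uniformly Lipschitz, with constant bounded by the product of weight norms."""
    params = [(rng.normal(0.0, 0.5, (m, n)), np.zeros(n))
              for m, n in zip(sizes[:-1], sizes[1:])]
    def net(x):
        h = np.atleast_1d(np.asarray(x, dtype=float))
        for W, b in params[:-1]:
            h = np.tanh(h @ W + b)    # 1-Lipschitz nonlinearity
        W, b = params[-1]
        return float((h @ W + b)[0])  # linear output layer
    return net

rng = np.random.default_rng(0)
f_net = make_mlp([1, 16, 1], rng)  # drift net
g_net = make_mlp([1, 16, 1], rng)  # diffusion net
h_net = make_mlp([1, 16, 1], rng)  # jump net
```

Swapping tanh for other Lipschitz activations such as ReLU or sigmoid preserves this property.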
The proposed NJDSDE Eq.(12) is a general framework. When $f_{\theta_f}$ is set to $\beta(\mu e^{-\eta_t} - 1)$, $g_{\theta_g}$ is set to $0$, and $h_{\theta_h}$ is set to $\log(1 + \alpha e^{-\eta_t})$ in Eq.(12), the NJDSDE characterizes Hawkes processes. Similarly, when $f_{\theta_f}$ is set to $\mu$, $g_{\theta_g}$ is set to $0$, and $h_{\theta_h}$ is set to $-\alpha$, Eq.(12) characterizes self-correcting processes. The analogous results for Poisson processes are trivial. In other words, the proposed NJDSDE encompasses the classical TPPs mentioned above. In addition, a specific class (but not all) of log-Gaussian Cox processes (Møller et al., 1998) can also be incorporated into our modeling framework Eq.(12). The proofs of these conclusions are detailed in Appendix A.4.

We proceed to investigate the existence and uniqueness of the solution $\{\eta_t\}_{t \ge 0}$ to the proposed NJDSDE. The theoretical analysis in the following theorem provides insight into designing an effective network architecture for the drift net $f_{\theta_f}$, diffusion net $g_{\theta_g}$, and jump net $h_{\theta_h}$.

Theorem 5. Assume that $f_{\theta_f}(x)$, $g_{\theta_g}(x)$, $h_{\theta_h}(x)$ are measurable functions $\mathbb{R} \to \mathbb{R}$, $h_{\theta_h}(x)$ is continuous, and there exists a positive constant $C$ such that for all $x, y \in \mathbb{R}$,

$$|f_{\theta_f}(x) - f_{\theta_f}(y)| + |g_{\theta_g}(x) - g_{\theta_g}(y)| \le C |x - y|.$$

Then for every $\lambda_0 > 0$, there exists a unique adapted left-continuous process $\{\eta_t\}_{t \ge 0}$ with right limits that satisfies Eq.(12).

The proof is available in Appendix A.5. According to Theorem 5, if $f_{\theta_f}(x)$, $g_{\theta_g}(x)$ and $h_{\theta_h}(x)$ are uniformly Lipschitz continuous, then Eq.(12) has a unique strong solution. Thus, we utilize Lipschitz nonlinear activations, such as ReLU, sigmoid, and tanh, within the network architectures, as highlighted in previous works (Anil et al., 2019; Kong et al., 2020; Oh et al., 2024; Lin et al., 2024). Moreover, in this paper, the drift net, diffusion net, and jump net are implemented as three multi-layer perceptrons (MLPs).

Remarks.
We summarize the differences between our model and existing TPP models:

- Different from the SDE formulations of classical TPPs (e.g., Eq.(9)), the coefficient functions in our model are parameterized by neural networks rather than fixed functions. This enables a more flexible modeling of the complex dynamics of the intensity process.
- Compared to neural TPPs (Du et al., 2016; Zuo et al., 2020), our model eliminates the need to assume a specific functional form for the intensity function. Instead, based on the NJDSDE, our model formulates the time evolution of the intensity process in a general manner.
- Furthermore, our model differs from previous TPP models based on neural differential equations (Jia & Benson, 2019; Chen et al., 2020; Song et al., 2024). In addition to incorporating Brownian motion to model Gaussian noise, a key distinction lies in our proposed NJDSDE, which models the dynamics of the intensity process rather than the hidden state.

5.2. Model Training

To learn the model parameters in $f_{\theta_f}$, $g_{\theta_g}$, $h_{\theta_h}$, and the initial value $\eta_0$, we perform Maximum Likelihood Estimation (MLE). For an event sequence $S = \{t_i\}_{i=1}^n$ over the time interval $[0, T]$ with intensity $\lambda_t$, the log-likelihood function (Rasmussen, 2018) is

$$\ell(S) = \sum_{i=1}^n \log \lambda_{t_i} - \int_0^T \lambda_t \, dt = \sum_{i=1}^n \eta_{t_i} - \int_0^T e^{\eta_t} \, dt.$$

In general, the integral term has no closed form. Therefore, we apply numerical integration methods for approximate calculation, such as the trapezoidal rule (Zuo et al., 2020). This requires determining the value of $\eta_t$ at the divided time points. Note that the process $\{\eta_t\}_{t \ge 0}$ is governed by our proposed NJDSDE Eq.(12). That is, on the time interval $(t_{i-1}, t_i]$, $\eta_t$ is governed by the neural SDE

$$d\eta_t = f_{\theta_f}(\eta_t) \, dt + g_{\theta_g}(\eta_t) \, dW_t. \quad (13)$$

At a jump time $t_i$, the jump size of $\eta_t$ is given by

$$\Delta\eta_{t_i} = \eta_{t_i+} - \eta_{t_i} = h_{\theta_h}(\eta_{t_i}) \, \Delta N_{t_i} = h_{\theta_h}(\eta_{t_i}). \quad (14)$$

Then the right limit of $\eta_t$ at $t_i$ is

$$\eta_{t_i+} = \eta_{t_i} + h_{\theta_h}(\eta_{t_i}). \quad (15)$$

Since the solution of neural SDEs (e.g., Eq.(13)) is generally analytically intractable, numerical approximation methods are often required (Kong et al., 2020; Kidger et al., 2021b). We adopt the Euler-Maruyama scheme (Kloeden & Platen, 1992) with a fixed step size due to its computational efficiency. Under such a scheme, the time interval $(t_{i-1}, t_i]$ is divided into $N$ subintervals $t_{i-1} = \tau_0^i < \cdots < \tau_k^i < \cdots < \tau_N^i = t_i$ with step size $\Delta_k^i = \tau_{k+1}^i - \tau_k^i = (t_i - t_{i-1})/N$. Then we discretize Eq.(13) on $(t_{i-1}, t_i]$ by the recursive equation

$$\eta_{\tau_{k+1}^i} = \eta_{\tau_k^i} + f_{\theta_f}(\eta_{\tau_k^i}) \, \Delta_k^i + g_{\theta_g}(\eta_{\tau_k^i}) \, \Delta W_k^i, \quad (16)$$

for $k = 0, 1, \ldots, N-1$ with $\eta_{\tau_0^i} = \eta_{t_{i-1}+}$. Here, $\Delta W_k^i = W_{\tau_{k+1}^i} - W_{\tau_k^i}$ is sampled from $\mathcal{N}(0, \Delta_k^i)$ for numerical computation. The advantage of introducing the log-intensity $\eta_t = \log \lambda_t$ is that a numerical solution of Eq.(12) can be obtained over the entire real line, rather than being restricted to the positive reals. Iteratively applying Eq.(15) and Eq.(16), we can calculate the log-likelihood function as

$$\ell(S) \approx \sum_{i=1}^n \eta_{t_i} - \sum_{i=1}^{n+1} \sum_{k=1}^{N} \frac{\tau_k^i - \tau_{k-1}^i}{2} \Big( e^{\eta_{\tau_{k-1}^i}} + e^{\eta_{\tau_k^i}} \Big), \quad (17)$$

where $\tau_0^1 = 0$, $\tau_N^{n+1} = T$, $\eta_{\tau_0^i} = \eta_{t_{i-1}+}$ and $\eta_{\tau_N^i} = \eta_{t_i}$. The complete algorithm for model training is described in Algorithm 1 in Appendix B.

5.3. Neural Jump-Diffusion Multivariate Point Process

An important example of multivariate TPPs is the multivariate Hawkes process $\boldsymbol{N}_t = (N_t^1, \ldots, N_t^M)^{\top}$, whose intensity process $\boldsymbol{\lambda}_t = (\lambda_t^1, \ldots, \lambda_t^M)^{\top}$ characterizes the influence of past events on future ones in an excitatory manner (Hawkes, 1971): $\lambda_t^m = \mu_0^m + \sum_{i:\, t_i < t} \alpha_{m m_i} e^{-\beta (t - t_i)}$. Analogously to the univariate case, we model the multivariate log-intensity process $\boldsymbol{\eta}_t = \log \boldsymbol{\lambda}_t$ with a multivariate NJDSDE (Eq.(18)); between events, $\boldsymbol{\eta}_t$ evolves as a neural SDE, and at a jump time $t_n$ with event type $m_n$,

$$\boldsymbol{\eta}_{t_n+} = \boldsymbol{\eta}_{t_n} + h_{\theta_h}^{m_n}(\boldsymbol{\eta}_{t_n}). \quad (22)$$

Therefore, similar to the method discussed in Section 5.2 for computing the log-likelihood function, we utilize the Euler-Maruyama scheme to discretize Eq.(22), followed by numerical integration techniques to compute the integrals mentioned above.
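Returning to the univariate computation of Section 5.2, Eqs.(15)-(17) can be sketched compactly in NumPy. The toy coefficient functions below stand in for the trained drift, diffusion, and jump nets; all names and parameter values are our own illustrative choices, not the paper's implementation:

```python
import numpy as np

def njdtpp_loglik(events, T, f, g, h, eta0, N, rng):
    """Sketch of Eqs.(15)-(17): Euler-Maruyama steps (Eq.(16)) between events,
    the jump update (Eq.(15)) at each event, and the trapezoidal rule for the
    compensator integral of exp(eta) (Eq.(17))."""
    grid_times = np.concatenate([[0.0], events, [T]])
    eta, loglik, integral = eta0, 0.0, 0.0
    for a, b in zip(grid_times[:-1], grid_times[1:]):
        dt = (b - a) / N
        for _ in range(N):                 # Eq.(16) on (a, b]
            dW = rng.normal(0.0, np.sqrt(dt))
            eta_next = eta + f(eta) * dt + g(eta) * dW
            integral += 0.5 * dt * (np.exp(eta) + np.exp(eta_next))  # trapezoid
            eta = eta_next
        if b in events:                    # b = t_i: eta is the left limit eta_{t_i}
            loglik += eta                  # sum of log-intensities at event times
            eta = eta + h(eta)             # jump update, Eq.(15)
    return loglik - integral

rng = np.random.default_rng(0)
events = np.array([0.5, 1.2, 2.0])
ll = njdtpp_loglik(events, T=3.0,
                   f=lambda e: -0.1 * e,  # toy drift
                   g=lambda e: 0.1,       # toy diffusion
                   h=lambda e: 0.5,       # toy jump
                   eta0=0.0, N=20, rng=rng)
```

In training, the same computation would be run on the parameterized nets and $\ell(S)$ maximized by gradient ascent; with the diffusion set to zero and constant $\eta_t \equiv 0$, the sketch reduces to the exact value $n \cdot 0 - T$, which makes it easy to verify.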
Here, the value of $\eta_{t_n}$ required for discretizing Eq.(22) can be obtained by discretizing Eq.(18) over the interval $[0, t_n]$ using the Euler-Maruyama scheme and the historical events $\mathcal{H}_{t_{n+1}}$. Following previous works (Zuo et al., 2020; Shi et al., 2023; Xue et al., 2024), the next event type prediction is given by

$$\hat{m}_{n+1} = \arg\max_m \ \lambda_{t_{n+1}}^m / \lambda_{t_{n+1}}^g. \quad (23)$$

6. Experiments

We first test the flexibility of our NJDTPP model by recovering the ground-truth dynamics of the intensity process of classical TPPs. Then, we evaluate the modeling capability for event sequences and the prediction performance of NJDTPP on six real-world datasets. Our code is available at https://github.com/Zh-Shuai/NJDTPP.

6.1. Intensity Process Recovery for Classical TPPs

Synthetic Datasets. We consider the following classical TPPs: (i) Poisson Process: the intensity is given by $\lambda_t = \lambda_0$, where $\lambda_0 = 1.0$; (ii) Hawkes Process: the intensity is given by $\lambda_t = \mu + \alpha \sum_{i:\, t_i < t} e^{-\beta (t - t_i)}$