# Hawkes Process Based on Controlled Differential Equations

Minju Jo, Seungji Kook and Noseong Park
Yonsei University, Seoul, South Korea
{alflsowl12,202132139,noseong}@yonsei.ac.kr

Hawkes processes are a popular framework for modeling the occurrence of sequential events, i.e., occurrence dynamics, in several fields such as social diffusion. In real-world scenarios, the inter-arrival time among events is irregular. However, existing neural network-based Hawkes process models not only i) fail to capture such complicated irregular dynamics but also ii) resort to heuristics to calculate the log-likelihood of events, since they are mostly based on neural networks designed for regular discrete inputs. To this end, we present the concept of Hawkes process based on controlled differential equations (HP-CDE), adopting the neural controlled differential equation (neural CDE) technology, which is a continuous analogue of RNNs. Since HP-CDE continuously reads data, i) irregular time-series datasets can be properly treated while preserving their uneven temporal spacing, and ii) the log-likelihood can be computed exactly. Moreover, as both Hawkes processes and neural CDEs were first developed to model complicated human behavioral dynamics, neural CDE-based Hawkes processes are successful in modeling such occurrence dynamics. In our experiments with 4 real-world datasets, our method outperforms existing methods by non-trivial margins.

## 1 Introduction

Real-world phenomena typically correspond to the occurrence of sequential events with irregular time intervals and numerous event types, ranging from online social network activities to personalized healthcare [Zhao et al., 2015; Enguehard et al., 2020; Stoyan and Penttinen, 2000; Mohler et al., 2011; Ogata, 1999]. Hawkes processes and Poisson point processes are typically used to model such sequential events [Hawkes, 1971; Miles, 1970; Streit, 2010].
However, their basic assumptions are too stringent to model such complicated dynamics, e.g., all past events are assumed to influence the occurrence of the current event. To address this, many advanced techniques have been proposed over the past several years, ranging from classical recurrent neural network (RNN)-based models such as RMTPP [Du et al., 2016] and NHP [Mei and Eisner, 2017] to recent transformer models such as SAHP [Zhang et al., 2020] and THP [Zuo et al., 2020]. Even so, these models still do not treat data in a fully continuous way but resort to heuristics, which is sub-optimal for processing irregular events [Chen et al., 2018; Choi et al., 2021; Yildiz et al., 2019]. Likewise, their heuristic approaches to modeling the continuous time domain impede solving the multivariate integral in the log-likelihood calculation of Eq. (4), leading to approximation methods such as Monte Carlo sampling (cf. Table 1). As a consequence, the strict constraints and/or the inexact calculation of the log-likelihood may induce inaccurate predictions.

| Model | Exact log-likelihood | How to model dynamics |
|---|---|---|
| NHP, SAHP, THP | X | Discrete |
| HP-CDE | O (λ is continuous.) | Continuous & robust to irregular dynamics |

Table 1: Comparison of neural network-based Hawkes process models. λ denotes the conditional intensity function (cf. Eqs. (4), (6), and (7)).

In this work, therefore, we model the occurrence dynamics with differential equations, not only handling the sequential events directly in a continuous time domain but also solving the integral of the log-likelihood exactly. Another motivation for using differential equations is that they have shown several non-trivial successes in modeling human behavioral dynamics [Poli et al., 2019; Rubanova et al., 2019; Jeon et al., 2021]; in particular, we are interested in controlled differential equations. To our knowledge, we are the first to answer the question of whether occurrence dynamics can be modeled as controlled differential equations.
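To make the contrast between Monte Carlo sampling and exact computation of the non-event term concrete, the following toy sketch uses a classical univariate Hawkes process with an exponential kernel, $\lambda(t) = \mu + \alpha \sum_{t_j < t} e^{-\beta (t - t_j)}$, whose integral over $[0, T]$ has a closed form. The kernel, the parameter values, and the assumption that Eq. (4) has the standard point-process form $\sum_j \log \lambda(t_j) - \int \lambda(t)\,dt$ are ours for illustration only; this is not one of the neural models listed in Table 1.

```python
import math
import random


def hawkes_intensity(t, events, mu, alpha, beta):
    """Univariate Hawkes intensity with an exponential kernel (toy assumption)."""
    # Only events strictly before t contribute, i.e., the history H_t.
    return mu + alpha * sum(math.exp(-beta * (t - tj)) for tj in events if tj < t)


def exact_compensator(T, events, mu, alpha, beta):
    """Closed-form integral of the intensity over [0, T] (the non-event term)."""
    return mu * T + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (T - tj)) for tj in events if tj < T
    )


def mc_compensator(T, events, mu, alpha, beta, n_samples, seed=0):
    """Monte Carlo estimate: T times the average intensity at uniform sample times."""
    rng = random.Random(seed)
    total = sum(
        hawkes_intensity(rng.uniform(0.0, T), events, mu, alpha, beta)
        for _ in range(n_samples)
    )
    return T * total / n_samples


def log_likelihood(events, mu, alpha, beta, non_event_term):
    """Event term sum_j log lambda(t_j) minus an exact or estimated non-event term."""
    event_term = sum(
        math.log(hawkes_intensity(tj, events, mu, alpha, beta)) for tj in events
    )
    return event_term - non_event_term


if __name__ == "__main__":
    events, T = [0.5, 1.1, 1.3, 2.8], 4.0
    params = dict(mu=0.2, alpha=0.8, beta=1.5)
    exact = exact_compensator(T, events, **params)
    print(f"exact non-event term: {exact:.4f}")
    for n in (10, 100, 10_000):
        est = mc_compensator(T, events, n_samples=n, **params)
        print(f"MC with {n:>6} samples: {est:.4f}")
```

The Monte Carlo estimate is unbiased but random, so any error in it passes one-for-one into the log-likelihood; the closed form is deterministic. Neural intensities generally lack such a closed form, which is why the choice of how to evaluate this integral matters.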
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23)

Controlled differential equations (CDEs [Lyons et al., 2004]) are among the most suitable tools for building human behavioral models. CDEs were first developed by a financial mathematician to model complicated dynamics in financial markets, a typical application domain of Hawkes processes, since financial transactions are temporal point processes. In particular, neural controlled differential equations (neural CDEs [Kidger et al., 2020]) are a set of techniques to learn CDEs from data with neural networks; their initial value problem (IVP) is written as follows:

$$h(t_b) = h(t_a) + \int_{t_a}^{t_b} f(h(t); \theta_f)\, dZ(t) = h(t_a) + \int_{t_a}^{t_b} f(h(t); \theta_f)\, \dot{Z}(t)\, dt,$$

where $f$ is a CDE function and $h(t)$ is a hidden vector at time $t$. $Z(t)$ is a continuous path created from discrete sequential observations (or events) $\{(z_j, t_j)\}_{j=a}^{b}$ by an appropriate algorithm¹, where, in our case, $z_j$ is a vector containing the information of the $j$-th occurrence, and $t_j \in [t_a, t_b]$ is the time point of the occurrence, with $t_j < t_{j+1}$. Note that neural CDEs keep reading the time derivative of $Z(t)$, denoted $\dot{Z}(t) := \frac{dZ(t)}{dt}$, over time; for this reason, neural CDEs are generally considered continuous RNNs. In addition, neural CDEs are known to be superior in processing irregular time series [Lyons et al., 2004].

Given the neural CDE framework, we propose Hawkes Process based on Controlled Differential Equations (HP-CDE). We let $z_j$ be the sum of the event embedding and the positional embedding, and create a path $Z(t)$ with linear interpolation, a widely used interpolation algorithm for neural CDEs (cf. Figure 2). To obtain the exact log-likelihood, we use an ODE solver to calculate the non-event log-likelihood. Calculating the non-event log-likelihood involves the integral problem in Eq.
(4), and our method can solve it exactly since the conditional intensity function λ, which indicates the instantaneous probability of an event, is defined continuously over time by the neural CDE. In addition, we have three prediction layers to predict the event log-likelihood, the event type, and the event occurrence time (cf. Eqs. (8), (12), and (13) and Figure 3). We conduct event prediction experiments with 4 datasets and 4 baselines. Our method shows outstanding performance in all three aspects: i) event type prediction, ii) event time prediction, and iii) log-likelihood. Our contributions are as follows:

1. We model the continuous occurrence dynamics under the framework of neural CDEs, whose original theory was developed to describe irregular non-linear dynamics. Many real-world Hawkes process datasets have irregular inter-arrival times of events.
2. We then exactly solve the integral problem in Eq. (4) to calculate the non-event log-likelihood, which prior work typically handled with heuristic methods.

¹ One can use interpolation algorithms or neural networks for creating $Z(t)$ from $\{(z_j, t_j)\}_{j=a}^{b}$ [Kidger et al., 2020].

## 2 Preliminaries

### 2.1 Multivariate Point Processes

Multivariate point processes are generative models of an event sequence $X = \{(k_j, t_j)\}_{j=1}^{N}$, where $x_j = (k_j, t_j)$ denotes the $j$-th event in the sequence. This event sequence is a subset of an event stream over a continuous time interval $[t_1, t_N]$, and an observation $x_j$ at time $t_j$ has an event type $k_j \in \{1, \dots, K\}$, where $K$ is the total number of event types. The arrival times of events satisfy $t_1 < t_2 < \dots < t_N$. The point process model learns a probability for every $(k, t)$ pair, where $k \in \{1, \dots, K\}$ and $t \in [t_1, t_N]$. The key feature of multivariate point processes is the intensity function $\lambda_k(t)$: $\lambda_k(t)\,dt$ is the probability that a type-$k$ event occurs in the infinitesimal time interval $[t, t + dt)$.
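The IVP from the introduction and the intensity $\lambda_k(t)$ defined above can be tied together in a toy, untrained sketch. It builds a linearly interpolated path from the events, integrates a neural-CDE-style hidden state with a fixed-step RK4 solver, reads per-type intensities from the hidden state, and carries the non-event integral $\int \sum_k \lambda_k(t)\,dt$ as an extra ODE state, so the integral is computed by the same solver instead of by sampling. Everything here is a hypothetical stand-in, not the authors' HP-CDE: the embedding $z_j = [\text{one-hot}(k_j), t_j]$ replaces the paper's event and positional embeddings, and the tanh CDE function, the softplus intensity head, the zero initial state, and the random weights are all our assumptions. We also assume Eq. (4) is the standard log-likelihood $\sum_j \log \lambda_{k_j}(t_j) - \int_{t_1}^{t_N} \sum_k \lambda_k(t)\,dt$.

```python
import math
import random


def softplus(x):
    """Numerically stable log(1 + exp(x)); keeps every intensity positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class ToyNeuralCDEHawkes:
    """Hypothetical, untrained toy model with fixed random weights (pure Python)."""

    def __init__(self, num_types, hidden=4, seed=0, scale=0.5):
        rng = random.Random(seed)

        def g():
            return rng.gauss(0.0, scale)

        self.K, self.H = num_types, hidden
        self.D = num_types + 1  # path channels: one-hot event type + time
        # Assumed CDE function: f(h) is H x D with f[i][d] = tanh(W[i][d] . h + b[i][d]).
        self.W = [[[g() for _ in range(hidden)] for _ in range(self.D)] for _ in range(hidden)]
        self.b = [[g() for _ in range(self.D)] for _ in range(hidden)]
        # Assumed intensity head: lambda_k(t) = softplus(V[k] . h(t) + c[k]).
        self.V = [[g() for _ in range(hidden)] for _ in range(num_types)]
        self.c = [g() for _ in range(num_types)]
        self.h0 = [0.0] * hidden  # assumed initial hidden state h(t_1)

    def embed(self, k, t):
        # Assumption: z_j = [one-hot(k_j), t_j], standing in for learned embeddings.
        z = [0.0] * self.D
        z[k] = 1.0
        z[-1] = t
        return z

    def intensities(self, h):
        return [softplus(sum(v * x for v, x in zip(self.V[k], h)) + self.c[k])
                for k in range(self.K)]

    def vector_field(self, state, dzdt):
        # state = [h_1, ..., h_H, Lambda]; Lambda accumulates the integral of sum_k lambda_k.
        h = state[:self.H]
        dh = [
            sum(math.tanh(sum(w * x for w, x in zip(self.W[i][d], h)) + self.b[i][d]) * dzdt[d]
                for d in range(self.D))
            for i in range(self.H)
        ]
        return dh + [sum(self.intensities(h))]

    def log_likelihood(self, events, steps_per_segment=20):
        """events: (k_j, t_j) pairs with strictly increasing t_j.

        Returns (log-likelihood, non-event term); the non-event term is the integral
        of sum_k lambda_k(t) over [t_1, t_N], integrated by the same RK4 solver.
        """
        zs = [self.embed(k, t) for k, t in events]
        state = self.h0 + [0.0]  # Lambda(t_1) = 0
        event_term = math.log(self.intensities(state[:self.H])[events[0][0]])
        for j in range(len(events) - 1):
            t0, t1 = events[j][1], events[j + 1][1]
            # Linear interpolation: dZ/dt is constant on [t_j, t_{j+1}].
            # (h(t) on this segment therefore already "sees" z_{j+1}; the toy ignores this.)
            dzdt = [(b - a) / (t1 - t0) for a, b in zip(zs[j], zs[j + 1])]
            dt = (t1 - t0) / steps_per_segment
            for _ in range(steps_per_segment):
                # One classical RK4 step on the augmented state [h, Lambda].
                k1 = self.vector_field(state, dzdt)
                k2 = self.vector_field([s + 0.5 * dt * d for s, d in zip(state, k1)], dzdt)
                k3 = self.vector_field([s + 0.5 * dt * d for s, d in zip(state, k2)], dzdt)
                k4 = self.vector_field([s + dt * d for s, d in zip(state, k3)], dzdt)
                state = [s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + e)
                         for s, a, b, c, e in zip(state, k1, k2, k3, k4)]
            event_term += math.log(self.intensities(state[:self.H])[events[j + 1][0]])
        non_event = state[-1]
        return event_term - non_event, non_event


if __name__ == "__main__":
    model = ToyNeuralCDEHawkes(num_types=2, hidden=4, seed=0)
    seq = [(0, 0.0), (1, 0.7), (0, 1.5), (1, 2.6), (1, 3.1)]
    for steps in (5, 20, 80):
        ll, non_event = model.log_likelihood(seq, steps_per_segment=steps)
        print(f"steps={steps:>3}  log-lik={ll:.6f}  non-event={non_event:.6f}")
```

Because the running integral $\Lambda$ is part of the solver state, the non-event term in this sketch is deterministic and inherits the solver's accuracy: raising `steps_per_segment` refines it, whereas a Monte Carlo estimate with a fixed sample budget stays random. "Exact" here therefore means exact up to the ODE solver's discretization error.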
The Hawkes process, a popular point process model, assumes that the intensity $\lambda_k(t)$ of type $k$ can be calculated from the past events before $t$, the so-called history $\mathcal{H}_t$, and takes the following form: λ k(t) := λk(t|Ht) = µk + X j:tj