# Infinite Hidden Semi-Markov Modulated Interaction Point Process

Peng Lin, Bang Zhang, Ting Guo, Yang Wang, Fang Chen
Data61 CSIRO, Australian Technology Park, 13 Garden Street, Eveleigh NSW 2015, Australia
School of Computer Science and Engineering, The University of New South Wales, Australia
{peng.lin, bang.zhang, ting.guo, yang.wang, fang.chen}@data61.csiro.au

## Abstract

The correlation between events is ubiquitous and important for temporal event modelling. In many cases, correlation exists not only between events' emitted observations, but also between their arrival times. State space models (e.g., the hidden Markov model) and stochastic interaction point process models (e.g., the Hawkes process) have been studied extensively, yet separately, for these two types of correlation. In this paper, we propose a Bayesian nonparametric approach that considers both types of correlation by unifying and generalizing the hidden semi-Markov model and the interaction point process model. The proposed approach can simultaneously model both the observations and the arrival times of temporal events, and it automatically determines the number of latent states from data. A Metropolis-within-particle-Gibbs sampler with ancestor resampling is developed for efficient posterior inference. The approach is tested on both synthetic and real-world data with promising outcomes.

## 1 Introduction

Temporal event modelling is a classic machine learning problem that has drawn enormous research attention for decades. It has wide applications in many areas, such as financial modelling, social event analysis, and seismological and epidemiological forecasting. An event is often associated with an arrival time and an observation, e.g., a scalar or a vector. For example, a trading event in a financial market has a trading time and a trading price. A message in a social network has a posting time and a sequence of words. A main task of temporal event modelling is to capture the underlying correlation between events and use it to predict future events' observations and/or arrival times.

The correlation between event observations can be readily found in many real-world cases in which an event's observation is influenced by its predecessors' observations. For example, the price of a trading event is impacted by former trading prices, and the content of a new social message is affected by the contents of previous messages. The state space model (SSM), e.g., the hidden Markov model (HMM) [16], is one of the most prevalent frameworks that consider such correlation. It models the correlation via latent state dependency. Each event in the HMM is associated with a latent state that can emit an observation. A latent state is independent of all but the most recent state, i.e., Markovianity. Hence, a future event observation can be predicted based on the observed events and the inferred emission and transition mechanisms.

Despite its popularity, the HMM lacks the flexibility to model event arrival times: it only allows fixed inter-arrival times. Due to the strict Markovian constraint, the duration of a state follows a geometric distribution whose parameter is the state's self-transition probability. The hidden semi-Markov model (HSMM) [14, 21] was developed to allow non-geometric state durations. It extends the HMM by allowing the underlying state transition process to be a semi-Markov chain
with a variable duration for each state. In addition to the HMM components, the HSMM models the duration of a state as a random variable, and a state can emit a sequence of observations. The HSMM thus allows variable inter-arrival times, but it still does not consider correlation between events' arrival times.

In many real-world applications, one event can trigger the occurrence of others in the near future. For instance, earthquakes and epidemics are diffusible events, i.e., one occurrence can cause others. Trading events in financial markets arrive in clusters. Information propagation in social networks shows contagious and clustering characteristics. All these events exhibit interaction in terms of arrival times: the likelihood of an event's arrival time is affected by the arrival times of previous events. The stochastic interaction point process (IPP), e.g., the Hawkes process [6], is a widely adopted framework for capturing such arrival time correlation. It models the correlation via a conditional intensity function that specifies the event intensity given all previous events' arrival times. However, unlike SSMs, it lacks the capability of modelling events' latent states and their interactions.

It is clearly desirable in real-world applications to consider both arrival time correlation and observation correlation in a unified manner, so that we can estimate both when and how events will appear. Inspired by the merits of SSMs and IPPs, we propose a novel Bayesian nonparametric approach that unifies and generalizes SSMs and IPPs via a latent semi-Markov state chain with a countably infinite number of states. The latent states govern both the observation emission and the new event triggering mechanisms. An efficient sampling method is developed within the framework of particle Markov chain Monte Carlo (PMCMC) [1] for the posterior inference of the proposed model. We first review closely related techniques in Section 2 and describe the proposed model in Section 3. Section 4 presents the inference algorithm. In Section 5, we show the results of empirical studies on both synthetic and real-world data. Conclusions are drawn in Section 6.

## 2 Preliminaries

In this section, we review the techniques that are closely related to the proposed method, namely the hidden (semi-)Markov model, its Bayesian nonparametric extension, and the Hawkes process.

### 2.1 Hidden (Semi-)Markov Model

The HMM [16] is one of the most popular approaches for temporal event modelling. It utilizes a sequence of latent states with the Markovian property to model the dynamics of temporal events. Each event in the HMM is associated with a latent state that determines the event's observation via an emission probability distribution. The state of an event is independent of all but its most recent predecessor's state (i.e., Markovianity), following a transition probability distribution. The HMM consists of four components: (1) an initial state probability distribution, (2) a finite latent state space, (3) a state transition matrix, and (4) an emission probability distribution. Accordingly, inference for the HMM involves inferring (1) the initial state probability distribution, (2) the sequence of latent states, (3) the state transition matrix, and (4) the emission probability distribution.
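To make the four components concrete, the following minimal sketch draws a sequence of states and observations from a fully specified HMM. All numerical values (the 3-state transition matrix `A`, initial distribution `pi0`, and Gaussian emission parameters) are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative HMM with 3 latent states and Gaussian emissions (assumed values).
pi0 = np.array([0.6, 0.3, 0.1])              # (1) initial state distribution
K = len(pi0)                                 # (2) finite latent state space
A = np.array([[0.8, 0.1, 0.1],               # (3) state transition matrix
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])
mu, sigma = np.array([-2.0, 0.0, 3.0]), 0.5  # (4) emission parameters

def sample_hmm(n):
    """Draw n (state, observation) pairs from the generative HMM."""
    states, obs = np.empty(n, dtype=int), np.empty(n)
    s = rng.choice(K, p=pi0)                 # initial state
    for t in range(n):
        states[t] = s
        obs[t] = rng.normal(mu[s], sigma)    # emit observation from state s
        s = rng.choice(K, p=A[s])            # Markov transition to the next state
    return states, obs

states, obs = sample_hmm(10)
```

Note that every event here advances the chain by exactly one step, which reflects the fixed inter-arrival times discussed next.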
The HMM has proven to be an excellent general framework for modelling sequential data, but it has two significant drawbacks: (1) the durations of events (i.e., the inter-arrival times between events) are fixed to a common value, and the state duration distributions are restricted to a geometric form, which lacks flexibility for real-world applications; (2) the size of the latent state space must be set a priori instead of being learned from data.

The hidden semi-Markov model (HSMM) [14, 21] is a popular extension of the HMM that mitigates the first drawback. It allows latent states to have variable durations, thereby forming a semi-Markov chain, and it reduces to the HMM when durations follow a geometric distribution. In addition to the four components of the HMM, the HSMM has a state duration probability distribution, so inference for the HSMM also involves inferring the duration probability distribution. It is worth noting that the interaction between events in terms of arrival times is neglected by both the HMM and the HSMM.

### 2.2 Hierarchical Dirichlet Process Prior for State Transition

Recent developments in Bayesian nonparametrics help address the second drawback of the HMM. Here, we briefly review the hierarchical Dirichlet process HMM (HDP-HMM). Let $(\Theta, \mathcal{B})$ be a measurable space and $G_0$ be a probability measure on it. A Dirichlet process (DP) $G$ is a distribution of a random probability measure over the measurable space $(\Theta, \mathcal{B})$: for any finite measurable partition $(A_1, \dots, A_r)$ of $\Theta$, the random vector $(G(A_1), \dots, G(A_r))$ follows a finite Dirichlet distribution parameterized by $(\alpha_0 G_0(A_1), \dots, \alpha_0 G_0(A_r))$, where $\alpha_0$ is a positive real number. The HDP is defined based on the DP for modelling grouped data. It is a distribution over a collection of random probability measures over the measurable space $(\Theta, \mathcal{B})$. Each of these random probability measures $G_k$ is associated with a group, and a global random probability measure $G_0$, itself distributed as a DP with concentration parameter $\gamma$ and base probability measure $H$, serves as their shared mean measure.

Because the HMM can be treated as a set of mixture models in a dynamic manner, each of which corresponds to a value of the current state, the HDP becomes a natural choice as the prior over the state transitions [2, 18]. The generative HDP-HMM model can be summarized as:
$$
\beta \mid \gamma \sim \mathrm{GEM}(\gamma), \quad
\pi_k \mid \alpha_0, \beta \sim \mathrm{DP}(\alpha_0, \beta), \quad
\theta_k \mid \lambda, H \sim H(\lambda),
$$
$$
s_n \mid s_{n-1}, (\pi_k)_{k=1}^{\infty} \sim \pi_{s_{n-1}}, \quad
y_n \mid s_n, (\theta_k)_{k=1}^{\infty} \sim F(\theta_{s_n}). \tag{1}
$$
GEM denotes the stick-breaking process. The variable $s_n$ denotes the latent state of the $n$-th event, and $y_n$ represents its observation. The HDP acts as a prior over the infinite transition matrix: each $\pi_k$ is a draw from a DP and depicts the transition distribution from state $k$, and the probability measures from which the $\pi_k$'s are drawn are parameterized by the same discrete base measure $\beta$. $\theta_k$ parameterizes the emission distribution $F$; usually $H$ is set to be conjugate to $F$ to simplify inference. $\gamma$ controls the degree of concentration of the base measure $\beta$, while $\alpha_0$ governs the variability of the prior mean measure across the rows of the transition matrix.

Because the HDP prior does not distinguish self-transitions from transitions to other states, it is vulnerable to unnecessarily frequent state switching and to creating redundant states. Thus, [5] proposed the sticky HDP-HMM, which adds a self-transition bias parameter to the state transition measure, $\pi_k \sim \mathrm{DP}\big(\alpha_0 + \kappa, (\alpha_0 \beta + \kappa \delta_k)/(\alpha_0 + \kappa)\big)$, where $\kappa$ controls the stickiness of the transition matrix.
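The generative process in (1) and the sticky modification can be made concrete with a truncated (weak-limit) stick-breaking construction. The sketch below is a minimal illustration assuming a truncation level `K_max`, fixed hyperparameters `gamma`, `alpha0`, `kappa`, and Gaussian emissions; it is a prior draw for intuition, not the paper's model or inference procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
K_max, gamma, alpha0, kappa = 20, 3.0, 5.0, 4.0  # truncation level + assumed hyperparameters

# beta | gamma ~ GEM(gamma): truncated stick-breaking weights.
v = rng.beta(1.0, gamma, size=K_max)
beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
beta /= beta.sum()                          # renormalize the truncated weights

# pi_k ~ DP(alpha0 + kappa, (alpha0*beta + kappa*delta_k)/(alpha0 + kappa)),
# approximated by a finite Dirichlet; each pi_k is a row of the transition matrix.
pi = np.empty((K_max, K_max))
for k in range(K_max):
    conc = alpha0 * beta
    conc[k] += kappa                        # the kappa*delta_k term favours self-transitions
    pi[k] = rng.dirichlet(conc)

# theta_k ~ H: assumed Gaussian emission means drawn from a N(0, 3^2) base measure.
theta = rng.normal(0.0, 3.0, size=K_max)

# s_n | s_{n-1} ~ pi_{s_{n-1}},  y_n | s_n ~ F(theta_{s_n}) = N(theta_{s_n}, 1).
n_steps, s, ys = 50, 0, []
for _ in range(n_steps):
    s = rng.choice(K_max, p=pi[s])
    ys.append(rng.normal(theta[s], 1.0))
```

Increasing `kappa` visibly lengthens the runs of repeated states, which is exactly the self-transition bias the sticky HDP-HMM introduces.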
### 2.3 Hawkes Process

The stochastic point process [3] is a rich family of models designed for tackling a variety of temporal event modelling problems. A stochastic point process can be defined via its conditional intensity function, which provides a representation equivalent to a counting process for temporal events. Given $N(t)$ denoting the number of events occurring in the time interval $[0, t)$ and $\tau_t$ indicating the arrival times of the temporal events before $t$, the intensity at a time point $t$ conditioned on the arrival times of all previous events is defined as:
$$
\lambda(t \mid \tau_t) = \lim_{\Delta t \to 0} \frac{\mathbb{E}[N(t + \Delta t) - N(t) \mid \tau_t]}{\Delta t}.
$$
It is worth noting that we do not consider edge effects in this paper; hence, no events exist before time 0. A variety of point processes have been developed with distinct functional forms of the intensity for various modelling purposes. The interaction point process (IPP) [4] considers point interactions with an intensity function dependent on historical events. The Hawkes process [7, 6] is one of the most popular and flexible IPPs. Its conditional intensity has the following functional form:
$$
\lambda(t) = \mu(t) + \sum_{t_n < t} \phi(t - t_n),
$$
where $\mu(t)$ is the base intensity and $\phi(\cdot)$ is a triggering kernel capturing the excitation effect of past events on the current intensity.
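As a concrete illustration of this conditional intensity, the sketch below evaluates a Hawkes intensity with a constant base rate and an exponential triggering kernel, and simulates arrival times with Ogata's thinning algorithm. The kernel choice $\phi(\Delta) = a\,b\,e^{-b\Delta}$ and all parameter values are assumptions for illustration, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, a, b = 0.5, 0.8, 1.0  # assumed base rate, excitation weight, decay rate

def intensity(t, history):
    """lambda(t | tau_t) = mu + sum_{t_n < t} a * b * exp(-b * (t - t_n))."""
    past = history[history < t]
    return mu + np.sum(a * b * np.exp(-b * (t - past)))

def simulate_hawkes(T):
    """Ogata's thinning: propose candidates from an upper bound, accept with prob lambda/bound."""
    events, t = [], 0.0
    while True:
        # Valid upper bound: the intensity only decays between events,
        # and the extra a*b covers the jump if an event just occurred at t.
        lam_bar = intensity(t, np.array(events)) + a * b
        t += rng.exponential(1.0 / lam_bar)           # candidate arrival time
        if t > T:
            break
        if rng.uniform() * lam_bar <= intensity(t, np.array(events)):
            events.append(t)                          # accept the candidate
    return np.array(events)

arrivals = simulate_hawkes(T=50.0)
```

The exponential kernel makes the intensity strictly decreasing between events, which is what makes the constant bound `lam_bar` valid until the next candidate; simulated arrivals cluster after accepted events, reflecting the self-excitation the intensity encodes.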