# Infinite Hidden Semi-Markov Modulated Interaction Point Process

Peng Lin, Bang Zhang, Ting Guo, Yang Wang, Fang Chen
Data61 CSIRO, Australian Technology Park, 13 Garden Street, Eveleigh NSW 2015, Australia
School of Computer Science and Engineering, The University of New South Wales, Australia
{peng.lin, bang.zhang, ting.guo, yang.wang, fang.chen}@data61.csiro.au

## Abstract

The correlation between events is ubiquitous and important for temporal event modelling. In many cases, correlation exists not only between events' emitted observations, but also between their arrival times. State space models (e.g., the hidden Markov model) and stochastic interaction point process models (e.g., the Hawkes process) have been studied extensively, yet separately, for these two types of correlation. In this paper, we propose a Bayesian nonparametric approach that considers both types of correlation by unifying and generalizing the hidden semi-Markov model and the interaction point process model. The proposed approach can simultaneously model both the observations and the arrival times of temporal events, and it automatically determines the number of latent states from data. A Metropolis-within-particle-Gibbs sampler with ancestor resampling is developed for efficient posterior inference. The approach is tested on both synthetic and real-world data with promising outcomes.

## 1 Introduction

Temporal event modelling is a classic machine learning problem that has drawn enormous research attention for decades. It has wide applications in many areas, such as financial modelling, social event analysis, and seismological and epidemiological forecasting. An event is often associated with an arrival time and an observation, e.g., a scalar or a vector. For example, a trading event in a financial market has a trading time and a trading price. A message in a social network has a posting time and a sequence of words. A main task of temporal event modelling is to capture the underlying correlation between events and use it to predict future events' observations and/or arrival times.

The correlation between event observations can be readily found in many real-world cases in which an event's observation is influenced by its predecessors' observations. For example, the price of a trading event is impacted by former trading prices, and the content of a new social message is affected by the contents of previous messages. The state space model (SSM), e.g., the hidden Markov model (HMM) [16], is one of the most prevalent frameworks that consider such correlation. It models the correlation via latent state dependency. Each event in the HMM is associated with a latent state that can emit an observation. A latent state is independent of all but the most recent state, i.e., Markovianity. Hence, a future event observation can be predicted based on the observed events and the inferred emission and transition mechanisms.

Despite its popularity, the HMM lacks the flexibility to model event arrival times: it only allows fixed inter-arrival times. Due to the strict Markovian constraint, the duration of a state follows a geometric distribution whose parameter is the state's self-transition probability. The hidden semi-Markov model (HSMM) [14, 21] was developed to allow non-geometric state durations. It extends the HMM by allowing the underlying state transition process to be a semi-Markov chain
with a variable duration for each state. In addition to the HMM components, the HSMM models the duration of a state as a random variable, and a state can emit a sequence of observations. The HSMM thus allows variable inter-arrival times, but it still does not consider correlation between events' arrival times.

In many real-world applications, one event can trigger the occurrence of others in the near future. For instance, earthquakes and epidemics are diffusible events, i.e., one occurrence can cause others. Trading events in financial markets arrive in clusters. Information propagation in social networks shows contagious and clustering characteristics. All these events exhibit interaction in terms of arrival times: the likelihood of an event's arrival time is affected by the arrival times of previous events. The stochastic interaction point process (IPP), e.g., the Hawkes process [6], is a widely adopted framework for capturing such arrival time correlation. It models the correlation via a conditional intensity function that specifies the event intensity given all previous events' arrival times. However, unlike SSMs, it lacks the capability of modelling events' latent states and their interactions.

It is clearly desirable in real-world applications to consider both arrival time correlation and observation correlation in a unified manner, so that we can estimate both when and how events will appear. Inspired by the merits of SSMs and IPPs, we propose a novel Bayesian nonparametric approach that unifies and generalizes SSMs and IPPs via a latent semi-Markov state chain with a countably infinite number of states. The latent states govern both the observation emission and the new event triggering mechanisms. An efficient sampling method is developed within the framework of particle Markov chain Monte Carlo (PMCMC) [1] for the posterior inference of the proposed model. We first review closely related techniques in Section 2 and describe the proposed model in Section 3. Section 4 presents the inference algorithm. In Section 5, we show the results of empirical studies on both synthetic and real-world data. Conclusions are drawn in Section 6.

## 2 Preliminaries

In this section, we review the techniques that are closely related to the proposed method, namely the hidden (semi-)Markov model, its Bayesian nonparametric extension, and the Hawkes process.

### 2.1 Hidden (Semi-)Markov Model

The HMM [16] is one of the most popular approaches for temporal event modelling. It utilizes a sequence of latent states with the Markovian property to model the dynamics of temporal events. Each event in the HMM is associated with a latent state that determines the event's observation via an emission probability distribution. The state of an event is independent of all but its most recent predecessor's state (i.e., Markovianity), following a transition probability distribution. The HMM consists of four components: (1) an initial state probability distribution, (2) a finite latent state space, (3) a state transition matrix, and (4) an emission probability distribution. Accordingly, inference for the HMM involves inferring (1) the initial state probability distribution, (2) the sequence of latent states, (3) the state transition matrix, and (4) the emission probability distribution.
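To make the four components concrete, the following minimal sketch draws a sequence of states and observations from a fully specified HMM. All numerical values (the 3-state transition matrix `A`, initial distribution `pi0`, and Gaussian emission parameters) are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative HMM with 3 latent states and Gaussian emissions (assumed values).
pi0 = np.array([0.6, 0.3, 0.1])              # (1) initial state distribution
K = len(pi0)                                 # (2) finite latent state space
A = np.array([[0.8, 0.1, 0.1],               # (3) state transition matrix
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])
mu, sigma = np.array([-2.0, 0.0, 3.0]), 0.5  # (4) emission parameters

def sample_hmm(n):
    """Draw n (state, observation) pairs from the generative HMM."""
    states, obs = np.empty(n, dtype=int), np.empty(n)
    s = rng.choice(K, p=pi0)                 # initial state
    for t in range(n):
        states[t] = s
        obs[t] = rng.normal(mu[s], sigma)    # emit observation from state s
        s = rng.choice(K, p=A[s])            # Markov transition to the next state
    return states, obs

states, obs = sample_hmm(10)
```

Note that every event here advances the chain by exactly one step, which reflects the fixed inter-arrival times discussed next.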
The HMM has proven to be an excellent general framework for modelling sequential data, but it has two significant drawbacks: (1) the durations of events (i.e., the inter-arrival times between events) are fixed to a common value, and the state duration distributions are restricted to a geometric form, which lacks flexibility for real-world applications; (2) the size of the latent state space must be set a priori instead of being learned from data.

The hidden semi-Markov model (HSMM) [14, 21] is a popular extension of the HMM that mitigates the first drawback. It allows latent states to have variable durations, thereby forming a semi-Markov chain, and it reduces to the HMM when durations follow a geometric distribution. In addition to the four components of the HMM, the HSMM has a state duration probability distribution, so inference for the HSMM also involves inferring the duration probability distribution. It is worth noting that the interaction between events in terms of arrival times is neglected by both the HMM and the HSMM.

### 2.2 Hierarchical Dirichlet Process Prior for State Transition

Recent developments in Bayesian nonparametrics help address the second drawback of the HMM. Here, we briefly review the hierarchical Dirichlet process HMM (HDP-HMM). Let $(\Theta, \mathcal{B})$ be a measurable space and $G_0$ be a probability measure on it. A Dirichlet process (DP) $G$ is a distribution of a random probability measure over the measurable space $(\Theta, \mathcal{B})$: for any finite measurable partition $(A_1, \dots, A_r)$ of $\Theta$, the random vector $(G(A_1), \dots, G(A_r))$ follows a finite Dirichlet distribution parameterized by $(\alpha_0 G_0(A_1), \dots, \alpha_0 G_0(A_r))$, where $\alpha_0$ is a positive real number. The HDP is defined based on the DP for modelling grouped data. It is a distribution over a collection of random probability measures over the measurable space $(\Theta, \mathcal{B})$. Each of these random probability measures $G_k$ is associated with a group, and a global random probability measure $G_0$, itself distributed as a DP with concentration parameter $\gamma$ and base probability measure $H$, serves as their shared mean measure.

Because the HMM can be treated as a set of mixture models in a dynamic manner, each of which corresponds to a value of the current state, the HDP becomes a natural choice as the prior over the state transitions [2, 18]. The generative HDP-HMM model can be summarized as:
$$
\beta \mid \gamma \sim \mathrm{GEM}(\gamma), \quad
\pi_k \mid \alpha_0, \beta \sim \mathrm{DP}(\alpha_0, \beta), \quad
\theta_k \mid \lambda, H \sim H(\lambda),
$$
$$
s_n \mid s_{n-1}, (\pi_k)_{k=1}^{\infty} \sim \pi_{s_{n-1}}, \quad
y_n \mid s_n, (\theta_k)_{k=1}^{\infty} \sim F(\theta_{s_n}). \tag{1}
$$
GEM denotes the stick-breaking process. The variable $s_n$ denotes the latent state of the $n$-th event, and $y_n$ represents its observation. The HDP acts as a prior over the infinite transition matrix: each $\pi_k$ is a draw from a DP and depicts the transition distribution from state $k$, and the probability measures from which the $\pi_k$'s are drawn are parameterized by the same discrete base measure $\beta$. $\theta_k$ parameterizes the emission distribution $F$; usually $H$ is set to be conjugate to $F$ to simplify inference. $\gamma$ controls the degree of concentration of the base measure $\beta$, while $\alpha_0$ governs the variability of the prior mean measure across the rows of the transition matrix.

Because the HDP prior does not distinguish self-transitions from transitions to other states, it is vulnerable to unnecessarily frequent state switching and to creating redundant states. Thus, [5] proposed the sticky HDP-HMM, which adds a self-transition bias parameter to the state transition measure, $\pi_k \sim \mathrm{DP}\big(\alpha_0 + \kappa, (\alpha_0 \beta + \kappa \delta_k)/(\alpha_0 + \kappa)\big)$, where $\kappa$ controls the stickiness of the transition matrix.
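The generative process in (1) and the sticky modification can be made concrete with a truncated (weak-limit) stick-breaking construction. The sketch below is a minimal illustration assuming a truncation level `K_max`, fixed hyperparameters `gamma`, `alpha0`, `kappa`, and Gaussian emissions; it is a prior draw for intuition, not the paper's model or inference procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
K_max, gamma, alpha0, kappa = 20, 3.0, 5.0, 4.0  # truncation level + assumed hyperparameters

# beta | gamma ~ GEM(gamma): truncated stick-breaking weights.
v = rng.beta(1.0, gamma, size=K_max)
beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
beta /= beta.sum()                          # renormalize the truncated weights

# pi_k ~ DP(alpha0 + kappa, (alpha0*beta + kappa*delta_k)/(alpha0 + kappa)),
# approximated by a finite Dirichlet; each pi_k is a row of the transition matrix.
pi = np.empty((K_max, K_max))
for k in range(K_max):
    conc = alpha0 * beta
    conc[k] += kappa                        # the kappa*delta_k term favours self-transitions
    pi[k] = rng.dirichlet(conc)

# theta_k ~ H: assumed Gaussian emission means drawn from a N(0, 3^2) base measure.
theta = rng.normal(0.0, 3.0, size=K_max)

# s_n | s_{n-1} ~ pi_{s_{n-1}},  y_n | s_n ~ F(theta_{s_n}) = N(theta_{s_n}, 1).
n_steps, s, ys = 50, 0, []
for _ in range(n_steps):
    s = rng.choice(K_max, p=pi[s])
    ys.append(rng.normal(theta[s], 1.0))
```

Increasing `kappa` visibly lengthens the runs of repeated states, which is exactly the self-transition bias the sticky HDP-HMM introduces.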
### 2.3 Hawkes Process

The stochastic point process [3] is a rich family of models designed for tackling a variety of temporal event modelling problems. A stochastic point process can be defined via its conditional intensity function, which provides a representation equivalent to a counting process for temporal events. Given $N(t)$ denoting the number of events occurring in the time interval $[0, t)$ and $\tau_t$ indicating the arrival times of the temporal events before $t$, the intensity at a time point $t$ conditioned on the arrival times of all previous events is defined as:
$$
\lambda(t \mid \tau_t) = \lim_{\Delta t \to 0} \frac{\mathbb{E}[N(t + \Delta t) - N(t) \mid \tau_t]}{\Delta t}.
$$
It is worth noting that we do not consider edge effects in this paper; hence, no events exist before time 0. A variety of point processes have been developed with distinct functional forms of the intensity for various modelling purposes. The interaction point process (IPP) [4] considers point interactions with an intensity function dependent on historical events. The Hawkes process [7, 6] is one of the most popular and flexible IPPs. Its conditional intensity has the following functional form:
$$
\lambda(t) = \mu(t) + \sum_{t_n < t} \phi(t - t_n),
$$
where $\mu(t)$ is the base intensity and $\phi(\cdot)$ is a triggering kernel capturing the excitation effect of past events on the current intensity.
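As a concrete illustration of this conditional intensity, the sketch below evaluates a Hawkes intensity with a constant base rate and an exponential triggering kernel, and simulates arrival times with Ogata's thinning algorithm. The kernel choice $\phi(\Delta) = a\,b\,e^{-b\Delta}$ and all parameter values are assumptions for illustration, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, a, b = 0.5, 0.8, 1.0  # assumed base rate, excitation weight, decay rate

def intensity(t, history):
    """lambda(t | tau_t) = mu + sum_{t_n < t} a * b * exp(-b * (t - t_n))."""
    past = history[history < t]
    return mu + np.sum(a * b * np.exp(-b * (t - past)))

def simulate_hawkes(T):
    """Ogata's thinning: propose candidates from an upper bound, accept with prob lambda/bound."""
    events, t = [], 0.0
    while True:
        # Valid upper bound: the intensity only decays between events,
        # and the extra a*b covers the jump if an event just occurred at t.
        lam_bar = intensity(t, np.array(events)) + a * b
        t += rng.exponential(1.0 / lam_bar)           # candidate arrival time
        if t > T:
            break
        if rng.uniform() * lam_bar <= intensity(t, np.array(events)):
            events.append(t)                          # accept the candidate
    return np.array(events)

arrivals = simulate_hawkes(T=50.0)
```

The exponential kernel makes the intensity strictly decreasing between events, which is what makes the constant bound `lam_bar` valid until the next candidate; simulated arrivals cluster after accepted events, reflecting the self-excitation the intensity encodes.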