# continuous_time_bayesian_networks_with_clocks__759a78c0.pdf Continuous-Time Bayesian Networks with Clocks Nicolai Engelmann 1 Dominik Linzner 1 Heinz Koeppl 1 2 Structured stochastic processes evolving in continuous time present a widely adopted framework to model phenomena occurring in nature and engineering. However, such models are often chosen to satisfy the Markov property to maintain tractability. One of the more popular of such memoryless models are Continuous Time Bayesian Networks (CTBNs). In this work, we lift its restriction to exponential survival times to arbitrary distributions. Current extensions achieve this via auxiliary states, which hinder tractability. To avoid that, we introduce a set of node-wise clocks to construct a collection of graph-coupled semi Markov chains. We provide algorithms for parameter and structure inference, which make use of local dependencies and conduct experiments on synthetic data and a data-set generated through a benchmark tool for gene regulatory networks. In doing so, we point out advantages compared to current CTBN extensions. 1. Introduction Dynamics on networks are abundant. Examples span from small scales as in gene regulatory networks (Stella et al., 2014) to large scales as in population or opinion dynamics on social networks (De et al., 2016). All these processes depend on some form of memory. Genes interact via expression of molecules with maturation times that are not exponentially distributed (Klann & Koeppl, 2012), neuronand social dynamics depend on past events (Pillow et al., 2008). While models with non-Markovian behavior exist, e.g. Hawkes processes (Linderman et al., 2016), they either 1Department of Electrical Engineering and Information Technology, Technische Universitat Darmstadt, Darmstadt, Germany 2Department of Biology, Technische Universitat Darmstadt, Darmstadt, Germany. Correspondence to: Nicolai Engelmann . Proceedings of the 37 th International Conference on Machine Learning, Online, PMLR 119, 2020. Copyright 2020 by the author(s). follow a fixed parametrization that hinders their expressiveness, or are defined in discrete time as Dynamic Bayesian Networks (Dagum et al., 1992) and are thus heavily over parametrized if no natural time scale exists. An expressive, but Markovian model in continuous time are Continuous Time Bayesian Networks (CTBNs) (Nodelman et al., 1995). There have been several attempts in extending CTBNs to model non-Markovian behavior (Nodelman et al., 2005; Gopalratnam et al., 2005; Liu et al., 2018). However, in these extensions, a CTBN has to be inflated with arbitrary many auxiliary states. This hinders the scalability of such models. In this manuscript, we consider a different extension based on CTBNs, where we augment each local node of the network with its own clock process. This clock keeps track of the history of the node and determines the survival time of the nodes state. The dynamics of the clocks can be instantiated by arbitrary survival time distributions. We derive augmented CTBNs with clocks as a novel generative model along with likelihoods for a complete observed process in section 2. In section 3, we show how the exact parameter and structure inference can be performed. Finally, in sections 4 and 5 we show not only tractable parameter and structure learning for two example parametric survival time distributions, but as well demonstrate at hand of controlled synthetic experiments that CTBNs can fail as structure classifiers if provided data is non-Markovian, while augmented CTBNs perform well. 1.1. Continuous Time semi-Markov and Markov Chains We begin by laying out relevant key characteristics of homogeneous continuous time semi-Markov chains (CTSMC s) and explore their memoryless special case, continuous time Markov chains (CTMC s). A CTSMC is a non-Markovian jump processes with a renewal property evolving over continuous time, taking values in a countable state space X. It can be completely described by a set of transition rates λ : X ˆ Rě0 ˆ Rě0 ˆ X Ñ Rě0, where λpx, τ, t; x1q denotes the instantaneous rate of the process transitioning from state x to x1 at time t if its last transition happened at time t τ. Using them, we define the exit rate λpx, τ, tq ř x1 x λpx, τ, t; x1q. A CTSMC is homoge- Continuous Time Bayesian Networks with Clocks neous if λpx, τ, t; x1q λpx, τ; x1q is dependent of t only via τ. In this work, we solely consider homogeneous CTSMC s in this sense. To generate a realization of the chain t Xptqutě0, we have to first determine the time s from arrival in state x to the transition to another state, i.e. the survival time, from the cumulative distribution (Limnios & Oprisan, 2001) F ps | xq P p"survival time in state x" ď sq (1) 1 exp ˆ ż s 0 dτ λpx, τq , with the survival function Λ ps | xq 1 F ps | xq. We then draw the next state x1 conditioned on the last state x and the elapsed survival time s from a categorical distribution x1 | x, s Catpλpx, s; x1q{λpx, sqq, where x1 x, x, x1 P X. The property that the next state x1 is drawn independently of the time s can be called time-direction independence (Maes et al., 2009). In this case, we define transition probabilities θpx1, xq λpx, s; x1q{λpx, sq from state x to x1. The quantities λpx, sq and θpxq tθpx1, xq | x1 xu then characterize the process. Since we need to know the time of the last transition to draw the survival time and next state, the process in general is non-Markovian. However, we can augment the chain t Xptqutě0 by a process t Tptqutě0, for which Tptq τ always keeps track of the time since the last transition, resetting back to 0 at each state change. We call t Tptqutě0 the clock of the CTSMC. The two-component process t Xptq, Tptqutě0 on the product state space X ˆ Rě0 is then homogeneous and Markovian (Limnios & Oprisan, 2001). An important special case of a CTSMC is called a CTMC iff λpx, τ; x1q λpx; x1q is constant for all τ. This results in the process t Xptqutě0 becoming Markovian and memoryless on its own, independent of t Tptqutě0. As a result F ps | xq 1 exp p λpxqsq becomes fixed to be an exponential distribution. Note, that a CTMC always satisfies time-direction independence. 1.2. Continuous Time Bayesian Networks A CTBN is a collection of N homogeneous CTMCs t Xnptq | t ě 0, n P t1, . . . , Nuu evolving in parallel over a product state space X X1 ˆ X2 ˆ . . . ˆ XN (Nodelman et al., 1995). As a consequence, we can characterize the state of the CTBN x P X to be a collection of the states of its constituting CTMCs x px1, x2, . . . , x Nq. The processes of a CTBN are conditional on each other, with a dependency structure encoded in a directed graph G p V, Eq (which may contain cycles). Specifically, we associate vertices vn P V with processes Xn and edges pvm, vnq P E with Xm conditioning Xn. We define the parent set of the n th node by parn tvm | pvm, vnq P Eu. We can then collect the processes of parenting nodes as Unptq t Xmptq | m P parnu taking values in Un Ś m Pparn Xm. We denote realizations of this (vector-valued) process by un P Un. CTBNs are defined using the asynchronity of transitions in continuous time, which is that the probability of more than one jump at a given time instance scales in ophq. Under this knowledge we can generate realizations of the CTBN by using the local quantities, node-wise conditional rates λn : Xn ˆ Un Ñ Rě0 and node-wise conditional transition probabilities θn : Xn ˆ Xn ˆ Un Ñ r0, 1s. Further, we can identify the global transition rate as the sum of nodewise rates λ pxq řN n 1 λn pxn, unq. Thus, the time until the global state change x Ñ x1 takes place can be again determined via (1) in the form of an exponential distribution for a CTMC. We can then determine the next state x1 given x by subsequently drawing the changing process Xn and its next state x1 n from categorical distributions with mass functions p pn | xq λn pxn, unq ř m λm pxm, umq (2) p x1 n | x, n λn pxn, un; x1 nq λn pxn, unq . 2. Bayesian Networks of Generic CTSMC s 2.1. Model Definiton We define the augmented CTBN on a semi-continuous state space X1 ˆ Rě0 ˆ X2 ˆ Rě0 ˆ ˆ XN ˆ Rě0, with N local state variables Xn ptq and local clocks Tn ptq for n P t1, . . . , Nu. Like above, we express this collection in a vector px, τq px1, x2, . . . , x N, τ1, τ2, . . . , τNq. We refer to the random variable associated with the collection of all local state variables by X ptq and the one associated with all local clocks by T ptq. Like in original CTBN s, we can argue, that under knowledge of all (generalized) states, the global process obeys the Markov property. We fully inherit the semantics of a graph inducing the dependencies from CTBN s and allow a conditioning only via the discrete state components in X ptq like in other proposed CTBN extensions (Nodelman et al., 2005; Gopalratnam et al., 2005; Liu et al., 2018). Thus, a process being conditionally dependent on another process in the network is only dependent on its discrete state component and not its clock, cf. fig. 1 a). We introduce local conditional transition rates λ pxn, τn, un; x1 nq for a process Xn ptq depending on its state Xn ptq xn, its clock Tn ptq τn and the states of other processes Un ptq un referred to as parent state like above. These rates are completely determined by our choice of the survival time distributions. We show this Continuous Time Bayesian Networks with Clocks I II III IV I II III IV Figure 1. A graphic illustrating the semantic construction of the augmented CTBN. a) shows N 2 nodes with their local clocks. b) depicts their conditional relationships in a small time window h as an equivalent Dynamic Bayesian Network. c) shows a sample trajectory for binary X1 and X2 highlighting the four classes of time-windows for X2: full (I), censored (II), truncated and censored (III), and truncated (IV). d) gives an exemplary time-course for the transition rate of X2 with Gamma and Weibull survival time distributions. in Appendix A.3. Like above, we can define local exit rates λ pxn, τn, unq ř x1n xn λ pxn, τn, un; x1 nq. Similarly to CTBN s, if the associated graph G has an edge pvm, vnq, then Xm ptq P Un ptq is a parent of Xn ptq. We also show in Appendix A.3, that using the Markov property of the global process t Xptq, T ptqu, the asynchronous transition condition and the inherited properties of the constituting CTSMC s are sufficient to fully characterize the global survival time distribution and transition probabilities in terms of the local conditional transition rates of the processes. We do so by integrating the probabilities of all constituting processes keeping their current states xn over small time windows of duration h and thus find the survival function of the global process with respect to Xptq in the limit h Ñ 0 by Λ ps | x, τq exp τn dσ λ pxn, σ, unq Λn ps τn | xn, unq Λn pτn | xn, unq , (3) in terms of local survival functions Λn ps | xn, unq exp şs 0 dσ λ pxn, σ, unq conditioned on the parents state un. An illustration of this approach is given in fig. 1 b). Consequently, introducing the global F ps | x, τq 1 Λ ps | x, τq and local Fn ps | xn, unq 1 Λn ps | xn, unq cumulative distribution functions along with their local density functions fn ps | xn, unq, we can give the density function of the global survival time by p ps | x, τq d F ps | x, τq fn ps τn | xn, unq Λn pτn | xn, unq Λk ps τk | xk, ukq Λk pτk | xk, ukq . Note, that we can describe p ps | x, τq in terms of only a finite set of local distributions, which are themselves independent of the current clock values τ. The query for the next global state x1 can be answered by subsequently drawing the changing process Xn and its next state x1 n given the clock values τ and the duration s from categorical distributions with mass functions p pn | x, τ, sq λn pxn, τn s, unq ř m λm pxm, τm s, umq (5) p x1 n | x, τ, s, n λn pxn, τn s, un; x1 nq λn pxn, τn s, unq . As a result, we update the clock values τn Ñ 0 for the changing process and @m n : τm Ñ τm s for the others. We provide a detailed derivation of the global survival time density and the transition probabilities in Appendix A.3. We are now in the position to generate CTBNs with any parametrized distribution for any state xn and condition un for all n. It is no coincidence, that (4) resembles the density of a minimum distribution, which renders trajectory sampling straight forward and efficient for parametric local survival time distributions despite having to evaluate truncated distributions. This can be done with a Gillespie algorithm (outlined in Appendix C), subsequently drawing global survival times, changing processes and next states. After each step, we have to update x and τ accordingly. Note, that we share the occurrence of truncations with generalized semi-Markov chains, in which triggered events also interrupt the survival procedure and we have to continue bearing truncated distributions (Cassandras & Lafortune, 2006). 2.1.1. PARAMETRIZATION OF LOCAL SURVIVAL TIME DISTRIBUTIONS In (1) from section 1.1 we can see, that the exit rates λpx, sq can be determined if Λ ps | xq is known. This also Continuous Time Bayesian Networks with Clocks holds for the local λn pxn, s, unq given Λn ps | xn, unq in the augmented CTBN. Despite being able to parametrize local distributions by transition rate functions directly, many distributions are conveniently parametrized by a different set of quantities. In consideration of possible applications, we therefore introduce hyperparameters φ t@n, xn, un : φn pxn, unqu representing a chosen distribution per process n, its state xn and parent state un, and consequently let λn pxn, s, unq and Λn ps | xn, unq take on a functional form dependent on φn pxn, unq. We also restrict our considerations to cases, where the local processes in the augmented CTBN obey time-direction independence between state changes, which allows the factorization λn pxn, τn, un; x1 nq θn px1 n | xn, unq λn pxn, τn, unq. As a consequence, φ suffices to describe all survival time distributions. To account for this, we write λn pxn, s, unq λn ps, φ pxn, unqq and Λn ps | xn, unq Λn ps | φ pxn, unqq. To give a short example, for a Weibull distribution in shape-rate parametrization (k, b), we would have the free parameters φn pxn, unq pkn pxn, unq , bn pxn, unqq for local state xn and parent state un, giving Λn ps | φn pxn, unqq (6) exp bn pxn, unq sknpxn, unq λn ps, φn pxn, unqq bn pxn, unq kn pxn, unq sknpxn, unq 1. 2.2. Path Measure The global path measure of our augmented CTBN can be derived from the expressions in (4) and (5), and factorizes in terms of its discrete X ptq and continuous T ptq components for single transitions as a consequence of our simplifications in section 2.1.1. For a fixed number M of transitions it is given by a product of M global density function evaluations (4) and M evaluations of the mass functions representing the categorical transition directions (5). Shown in detail in Appendix A.4, we obtain the density for a single timewindow from entry in state x with clock configuration τ on to our transition to x1 after duration s with the n-th process changing by p x1, τ 1 | x, τ, θ, φ p x1 | x, θ p τ 1 | x1, x, τ, φ θn x1 n | xn, un fn ps τn | φn pxn, unqq Λn pτn | φn pxn, unqq Λk ps τk | φk pxk, ukqq Λk pτk | φk pxk, ukqq , (7) while the factor p px1 | x, θq is independent of the clocks and the survival times. We recognize this from embedded Markov chains in the continuous time context (Pinsky & Karlin, 2010). Further, we must specify x1 x1, . . . , xn 1, x1 n, xn 1, . . . , x N τ 1 pτ1 s, . . . , τn 1 s, 0, τn 1 s, . . . , τN sq . Since successive time-windows condition on px1, τ 1q as the new starting point , we can construct the measure for M transitions by a product of M such time-windows. It is also reasonable to assume the final window to be fully censored, so that there is no transition exactly at the end. In this case, the measure for this time-window reduces to an evaluation of the global survival function (3). Consequently, gathering X xp0q, xp1q, . . . , xp Mq( and T τ p0q, τ p1q, . . . , τ p Mq( , we can write p p X, T | θ, φq p p X | θq p p T | X, φq (8) m 0 p xpm 1q | xpmq, θ m 0 p τ pm 1q | xpm 1q, xpmq, τ pmq, φ . which can be made explicit by directly inserting the respective terms outlined in (7). We give the factors of the path measure associated with the survival times for Weibull and Gamma distributions in section 4 and Appendix A.5. Under knowledge of (8), we can work on the question of inference of model parameters of the augmented CTBN. Before we do so, however, we want to briefly address related work. 2.3. Related Work In the following we want to argue about the design choices of our model and compare them to models existing in literature. This is a discussion about different clock augmentations. Proposed extensions to CTBNs using phase-type distributions (Gopalratnam et al., 2005)(Nodelman et al., 2005) and hidden nodes (Liu et al., 2018) allow per-process survival time related structures. In both models, multiple so-called phases keep track of the survival time and are associated to a so-called generalized state and only this generalized state is relevant for the conditions imposed on a dependent process. The CTBN extension based on hidden nodes in addition only allows conditioning of nodes on binary state spaces. This limits the models applicability and requires comparably large networks to model nodes with many states and arbitrary non-exponential survival time distributions. Continuous Time Bayesian Networks with Clocks While phase-type CTBNs are similar in terms of expressiveness, they suffer from two distinct problems which we aim to avoid. The first problem is limited tractability due to the introduction of auxiliary states, the second being the heavy over-parametrization of phase-type distributions (Hobolth et al., 2019). We alleviate both problems by replacing arbitrary auxiliary states with, also arbitrary, survival time distributions with few parameters. 3. Inference of Model Parameters Model inference can be done with varying efficiency, depending on the choice of the survival time distribution. We can see in (8), that the likelihood factors not only in single time-window terms, but also in transition probabilities θ and survival time distribution parameters φ. We can therefore give the posterior for independent θ and φ by p pθ, φ | X, Tq 9 p p X, T | θ, φq p pθq p pφq p p X | θq p pθq p p T | X, φq p pφq 9 p pθ | Xq p pφ | X, Tq . (9) As a consequence, we can infer θ traditionally in terms of transition probabilities of an embedded Markov chain (Pinsky & Karlin, 2010). This is well studied, and can be done analytically using a conjugate Dirichlet prior, so we refer to the original literature (Nodelman et al., 2003) for the expressions. In (7), we can also see, that each single timewindow only depends on parameters θn px1 n | xn, unq and φn pxn, unq associated with the relevant local xn and parent state un configurations. If we assume all φn pxn, unq to be mutually independent, p pφ | X, Tq also factors in the local marginal posteriors p pφn pxn, unq | X, Tq, significantly reducing the curse of dimensionality in associated inference and learning problems. In this case, the marginal posterior p pφn pxn, unq | X, Tq can be determined via p pφn pxn, unq | X, Tq (10) fn spmq τ pmq n ˇˇˇ φn pxn, unq 1 cpm, n, xn, unq Λn τ pmq n ˇˇˇ φn pxn, unq ˆ Λn spmq τ pmq n ˇˇˇ φn pxn, unq cpm, n, xn, unq ˆ p pφn pxn, unqq , where spmq max τ pm 1q τ pmq( is the duration of the m-th time-window and c pm, n, xn, unq 1 if it did not end with a transition of Xn, i.e. it is censored, otherwise 0. Note, that if τ pmq n 0, the denominator is 1, hence the time-window is not truncated for Xn. An illustration of the four possible cases of censoring and truncation can be seen in fig. 1 c). If we omit the prior on the rhs of (10), we obtain the factor of the likelihood for φn pxn, unq. By inserting our example (6) from section 2.1.1 and recognizing fn ps | φn pxn, unqq λn ps, φn pxn, unqq Λn ps | φn pxn, unqq, we immediately obtain the expression for the Weibull case. If we assume not a fixed graph G imposing structural dependencies in the augmented CTBN but a distribution over graphs p p Gq, we render the parent sets parn (and the constitution of Un) and consequently the θ and φ dependent of G, and can find the posterior of G by p p G | X, Tq 9 p p X | Gq p p T | X, Gq p p Gq (11) Θ dθ p p X | θq p pθ | Gq Φ dφ p p T | X, φq p pφ | Gq p p Gq , where again, while the leftmost integral is analytically solvable, the rightmost can be reduced to a product of lowdimensional integrations under the mentioned assumption that the φn pxn, unq are independent. Then, the term becomes a product of integrals over the single φn pxn, unq with the integrals containing the likelihood term from the rhs of (10). In addition to that, our approach preserves structure modularity. Consequently, as long as the prior p p Gq śN n 1 p p Gnq factors in the same way, where Gn p V, Enq consists of only the incoming edges to process Xn, we can calculate the posterior of the full graph via p p X, T | Gq p p Gq śN n 1 p p X, T | Gnq p p Gnq and then G p V, Ť n Enq, even more reducing dimensionality. Because the augmented CTBN preserves the properties layed out in (9) and (11), it is - in contrast to existing CTBN extensions - possible to perform exact inference fully analytical for some non-exponential distributions. In Appendix A.6 we give the analytical marginal likelihoods (11) and posteriors (9) along with their conjugate priors and sufficient statistics for an augmented CTBN with Rayleigh survival time distributions as an example. 4. Example Survival Time Distributions The description of the augmented CTBN given above is valid for a broad family of parametric and non-parametric survival time distributions. Before we conduct experiments, we want to give two specific implementations of the augmented CTBN based on Weibull and Gamma distributions. Those are not only important in survival analysis (D.R. Cox, 1984) and queueing theory, but also exhibit distinct numeri- Continuous Time Bayesian Networks with Clocks cal properties. In addition to that, Weibull CTBNs have a non-exponential tail in the general case and are therefore naturally hard to approximate by PH distributions used in related publications. Gamma r.v. s on the other hand can be built by sums of exponential r.v. s in the case of integer shape parameters. To give an intuition, we illustrate an exemplary rate progression for a CTBN with Gamma and Weibull distributions in fig. 1 d). 4.1. Weibull CTBN s For the Weibull distribution, we chose the parametrization outlined in section 2.1.1 (see (6). To keep our notation simple, we omit the px, uq in the following, if no ambiguity occurs. Consequently, the factors of the likelihood associated with only the survival time parameters φn pxn, unq pkn, bnq then read p pkn, bn | X, Tq 9 p pkn, bnq b|Sf | n k|Sf | n m 1 skn f,m m 1 skn c,m m 1 skn t,m where the set Sf tsf,1, sf,2, . . . u contains all values, the clock would have displayed just before the considered process transitioned, i.e. τn s. Furthermore, Sc gathers all clock samples, at which a parent change happens, while St gathers the samples at the beginning of the time-windows, i.e. the truncation times. We outline the construction with focus on sample efficiency in Appendix A.5. An advantage of the Weibull distribution is its exponential form and monomial rates. This allows us to evaluate sums of monomials in a logarithmic domain directly, which results in an improved numerical stability, especially compared with a Gamma distribution. Another interesting property of the distribution is its expressibility of suband superexponential tails. 4.2. Gamma CTBN s The parameterization of the Gamma distributions in our CTBN is chosen by a rate α and shape β parameter similar to the Weibull case. The local survival functions of the CTBN for the n-th process can then be given by Λn ps | φn pxn, unqq Γpαnpx, uq, βnpx, uqsq Γpαnpx, uqq , where the Γ in the numerator denotes the upper incomplete gamma function. We can retrieve the exit rates via λn ps, φn pxn, unqq Γ pαn px, uq , βn px, uq sq βn px, uqαnpx, uq sαnpx, uq 1 exp p βn px, uq sq. The likelihood associated with the Gamma distribution can be found in Appendix A.5. Numerically disadvantageous, the logarithm of the incomplete gamma function is comparably hard to obtain and if evaluated in the linear domain first, a wide range of parameter values for different px, uq can cause the likelihood to contain ratios with numerators and denominators close to or below machine precision. 5. Experimental Work In this section, we give simulations for parameter and structure inference based on synthetic trajectories from augmented CTBNs generated using the Gillespie sampling procedure mentioned in section 2.1. In the end, we highlight a result for structure inference of an augmented CTBN under a Gamma distribution assumption on trajectories obtained through Gene Net Weaver (Schaffter et al., 2011) for a gene regulatory network. 5.1. Parameter and Structure Inference on Synthetic Trajectories For this scenario, we sampled trajectories of varying lengths from random augmented CTBN s with four nodes via the Gillespie algorithm from Appendix C and performed parameter and structure inference on the trajectories. First, we describe, how the random graph including the random survival time distribution parameters is sampled. From a uniform degree distribution we determined the size |Un| of the independent parent set for node n. Subsequently we drew a sample from a uniform Dirichlet distribution to obtain a random categorical distribution. To determine the nodes in the parent set, we then chose the |Un| maximum values in the obtained vector. We did so for each node to obtain random independent parent sets. Then, for each independent parameter set in the CTBN associated with the sampled graph, we sampled random parameters for the Weibull and Gamma distributions again from Gamma distributions. Their shapes and rates (Gamma: α1 40, β1 5 for the shape and α2 25, β2 2.5 for the rate generation, Weibull: α1 8, β1 0.5 for the shape and α2 5, β2 3 for the rate) were chosen to lead to reasonable model parameters with a significant variance in the samples and pronounced non-exponential appearance. 5.1.1. PARAMETER INFERENCE For parameter inference, we carry out an experiment for Weibull and Gamma CTBN s each. We give a plot of the mean squared error (MSE) showing quantiles over the 1 000 trajectories in fig. 2. For each of the CTBN s, we chose a random graph and sampled 1 000 trajectories with a size of 10 000 transitions each from it. Note, that the amount of transitions corresponds to the global process. We therefore obtained a significantly smaller amount of samples per parameter set φn pxn, unq of a single node, local state and Continuous Time Bayesian Networks with Clocks Figure 2. a) mean squared error (MSE) distribution for varying numbers of transitions in the Weibull parameter inference scenario. 80% of the calculated MSE s are bounded by the colored area. The blue area corresponds to the shape and the red area to the rate. An exponential convergence to the true parameter sets can be seen as a straight line in the dual log plot. b) & c) AUROC and AUPR for the Weibull structure inference scenario for varying numbers of trajectories compared with a traditional CTBN. With a mean shape of around k 15 for the generated Weibull distributions, the traditional CTBN slowly approaches the true graph, but fails to classify the ground truth with the given number of trajectories. d) MSE similar to a) for the Gamma parameter inference scenario. e) & f) AUROC and AUPR for the Gamma structure inference scenario similar to the Weibull case. A generally faster convergence (mean shape around α 8) compared to Weibull can be seen and although the traditional CTBN fails to classify the graph, it approaches it closely (see inlet plot). parent state. Then, we inferred the parameters for each graph given different numbers of transitions reaching to the maximum of 10 000. For the prior, we chose a uniform prior imposing box constraints from 0.1 to 100 for both parameters. Since we could only perform numerical posterior updates on a discrete grid, we chose to always evaluate a single likelihood to obtain an arbitrary close approximation of the posterior mode in this experiment. To calculate the MAP-estimates, we employed a limited-memory BFGS algorithm using projected gradients to stay in the bounds dictated by the uniform prior. We did so for each of the 1 000 trajectories and then calculated the MSE of each parameter estimate for each inferred number of time-windows over all trajectories. An exponential convergence to the true parameter values can be observed in the plots in fig. 2. 5.1.2. STRUCTURE INFERENCE We performed the structure inference experiment for Gamma and Weibull CTBN s separately as well and compared the results to traditional CTBN s. For each, we generated 500 different random graphs. The following procedure was performed for each of those graphs. We sampled 100 trajectories corresponding to a time-window of 5 units each (roughly 20 to 30 transitions). We chose a uniform prior over the graphs and again the uniform prior from the parameter inference above for those. For each of the trajectories containing the 100 time-windows, we calculated the marginal likelihood (11) of the graph via sample-based numerical integration employing a Romberg integration and performed a posterior update for the respective survival time φ and jump parameters θ. Using the calculated marginal likelihood, we did a posterior update for the graph. We saved the result and proceeded in the same way with the other 99 trajectories until we had 100 subsequent posterior updates for parameters and graph. The posterior updates for the parameters were dumped afterwards. This has been carried out with all the 500 graphs, so that we sampled 100 posterior distributions for each multiple of 5 units of time over the random graphs. For each of the graph posteriors, we gathered the marginal probabilities of the presence of an edge in a weighted adjacency matrix. Using the true graph and the matrices, we then calculated the AUROC and AUPR classification scores. For the structure inference for CTBNs we followed the original paper (Nodelman et al., 2003). We can see in fig. 2, that traditional CTBN s performed surprisingly well when confronted with samples from the Continuous Time Bayesian Networks with Clocks hof B ilv B soh B 0.2 0.4 0.6 Figure 3. a) The weighted adjacency matrix for the gene regulatory network experiment obtained via structure inference from a traditional CTBN, while d) gives the result for the Gamma CTBN. The grayscales indicate the values of the posterior distribution. The traditional CTBN predicts wrong edges, while the Gamma CTBN creates an almost fully connected graph despite a Laplace prior on the edges. The high confidence edges, however, are located at the correct positions in the Gamma case. b) The GRN, from which the data has been generated. It is sourced from the gold standard for E. Coli stored in Gene Net Weaver. c) & f) The AUROC scores over the chosen shape parameters for the Weibull distributions in the structure inference scenario with controlled shape parameters. e) An illustration of the five Weibull distributions with fixed shape. The number next to each curve corresonds to its shape parameter value k. The rate parameters stayed random for each graph, process, local state and parent state combination. Gamma CTBN, but ultimately fail to classify the true graph in both scenarios for the given amount of trajectories. The augmented CTBN, however, converged to the true graph in both scenarios. 5.1.3. EFFECTS OF SHAPE PARAMETERS Since both Weibull and Gamma distributions contain the exponential distribution as a special case for unit shape parameters, we designed a more controlled structure inference scenario for fixed shape parameter values to study its performance impact on traditional CTBN s. Since this impact has shown to be more pronounced for Weibull distributions (see fig. 2), we chose five different fixed shape parameters k ranging from 1 to 9, increasingly distinguishing the result from an exponential distribution. We then conducted structure inference on 100 otherwise random graphs, as we did in section 5.1.2, and calculated posterior updates for 50 trajectories each. In fig. 3, we plot the median AUROC classification score along with the 0.2 and 0.8 quantiles over the chosen shape parameters for the inferred graphs. It is visible that at larger values, the scores reached by the traditional CTBN become increasingly worse compared to the ones for Weibull CTBN s. The small difference at shape k 1, where all distributions are exponential, can be explained by the different prior distributions for the parameters. Nonetheless, it is notable, that the Weibull CTBN shows no disadvantage to the traditional CTBN when confronted with traditional data. The fluctuations in the median can be accounted to numerical inaccuracies and the limited amount of samples. The equal shapes for all involved distributions, however, may also affect the scores for both the Weibull and traditional CTBN s. 5.2. Gene Regulatory Network Finally, we show a scenario on structure inference via timecourse data of gene expression through Gene Net Weaver (Schaffter et al., 2011). For this, we selected a gene regulatory network (GRN) from E. Coli consisting of 4 genes (nodes), see fig. 3 b), and simulated noise-free trajectories at maximum time resolution for a virtual time of 1000 units. We further disabled self-regulation. The exact settings can be found in Appendix B. The values in the trajectories correspond to a relative activation of the gene based on concentrations of species present in the network and therefore continuously range from 0 to 1. As at this stage, we have not formulated our model as a latent model, we need to preprocess data. At first, we thresholded the obtained data to two levels 0 and 1 corresponding to two states per node. To account for the incompleteness, Continuous Time Bayesian Networks with Clocks we interpolated the observation points in the trajectories to hold their current value until the next point, at which they instantly change or not, depending on their value. After we now obtained continuous time discrete trajectories, we removed those, which exhibit less than 8 transitions. The remaining 100 trajectories were used for structure inference in the same way like explained above in the respective experiment. We calculated the marginal probabilities of the presence of an edge using the graph posterior and arranged them in a weighted adjacency matrix. We also performed structure inference using a traditional CTBN on the data. In fig. 3 a) and d) we plot the resulting matrices for both CTBNs. We find that the high-confidence predictions by augmented CTBNs are the correct, while high-confidence predictions of traditional CTBN are incorrect for the given amount of samples. 6. Conclusion In this work, we have presented an augmentation of CTBN s, which aims to inherit all key characteristics of its ancestor, but alleviate its restriction to simple CTMC s, which only allow exponential survival times between state changes. Inference on the model proves to be surprisingly tractable, but the introduction of real-valued clocks raises the models complexity by a significant degree. However, since computational resources are becoming more and more available, it is easier to afford inferring models bearing higher computational cost than it was in the times CTBN s were introduced. Since progress has been made in the field of approximate inference in continuous time discrete state systems with incomplete and noisy evidence recently (El-Hay et al., 2010; Rao & Teh, 2012; Linzner & Koeppl, 2018; Linzner et al., 2019), more general of such models are not longer considered hopelessly intractable. Especially since PH-distributions are highly over-parameterized and their advantages in the use in CTBN s are restricted to cases where incomplete evidence is present, we aim to present a more general approach to arbitrary survival times in CTBN s here in the light of recent progress made in the field. Acknowledgements We thank the anonymous reviewers for helpful comments on the previous version of this manuscript. N. E. and H. K. acknowledge support by the European Research Council (ERC) within the CONSYN project, No. 773196. D. L. and H. K. acknowledge funding by the European Union s Horizon 2020 research and innovation programme (i PC Pediatric Cure, No. 826121) and (Pr ECISE, No. 668858). H. K. acknowledges support by the Hessian research priority programme LOEWE within the project Compu Gene. Cassandras, C. G. and Lafortune, S. Introduction to discrete event systems. Springer-Verlag, Berlin, Heidelberg, 2006. ISBN 0387333320. Dagum, P., Galper, A., and Horvitz, E. Dynamic network models for forecasting. Uncertainty in Artificial Intelligence, (UAI):41 48, 1992. doi: 10.1016/ b978-1-4832-8287-9.50010-4. De, A., Valera, I., Ganguly, N., Bhattacharya, S., and Gomez-Rodriguez, M. Learning and forecasting opinion dynamics in social networks. Advances in Neural Information Processing Systems, (NIPS):397 405, 2016. ISSN 10495258. D.R. Cox, D. O. Analysis of survival data. 1984. doi: 10.1201/9781315137438. El-Hay, T., Cohn, I., Friedman, N., and Kupferman, R. Continuous-time belief propagation. Proceedings of the 27th International Conference on Machine Learning, (ICML):343 350, 2010. Gopalratnam, K., Kautz, H., and Weld, D. S. Extending continuous-time bayesian networks. Proceedings of the 20th National Conference on Artificial Intelligence, (AAAI):981 986, 2005. Hobolth, A., Siri-Jégousse, A., and Bladt, M. Phase-type distributions in population genetics. Theoretical Population Biology, 127:16 32, 2019. ISSN 10960325. doi: 10.1016/j.tpb.2019.02.001. Klann, M. and Koeppl, H. Spatial simulations in systems biology: from molecules to cells. International Journal of Molecular Sciences, 13(6):7798 7827, 2012. ISSN 1422-0067. doi: 10.3390/ijms13067798. Limnios, N. and Oprisan, G. Semi-Markov processes and reliability. Birkhäuser Basel, 01 2001. doi: 10.1007/ 978-1-4612-0161-8. Linderman, S. W., Adams, R. P., and Pillow, J. W. Bayesian latent structure discovery from multi-neuron recordings. Advances in Neural Information Processing Systems, (NIPS):2010 2018, 2016. ISSN 10495258. Linzner, D. and Koeppl, H. Cluster variational approximations for structure learning of continuous-time Bayesian networks from incomplete data. Advances in Neural Information Processing Systems, (Neur IPS):7880 7890, 2018. Linzner, D., Schmidt, M., and Koeppl, H. Scalable structure learning of continuous-time Bayesian networks from incomplete data. Advances in Neural Information Processing Systems, (Neur IPS):1 11, 2019. Continuous Time Bayesian Networks with Clocks Liu, M., Stella, F., Hommersom, A., and Lucas, P. J. Making continuous-time Bayesian networks more flexible. Proceedings of Machine Learning Research, 72(PMLR): 237 248, 2018. Maes, C., Netoˇcný, K., and Wynants, B. Dynamical fluctuations for semi-Markov processes. Journal of Physics A: Mathematical and Theoretical, 42(36):365002, aug 2009. doi: 10.1088/1751-8113/42/36/365002. Nodelman, U., Shelton, C. R., and Koller, D. Continuous time Bayesian networks. Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, (UAI): 378 387, 1995. Nodelman, U., Shelton, C. R., and Koller, D. Learning continuous time Bayesian networks. Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence, (UAI):451 458, 2003. Nodelman, U., Shelton, C. R., and Koller, D. Expectation maximization and complex duration distributions for continuous time Bayesian networks. Proc. Twentyfirst Conference on Uncertainty in Artificial Intelligence, (UAI):pages 421 430, 2005. Pillow, J. W., Shlens, J., Paninski, L., Sher, A., Litke, A. M., Chichilnisky, E. J., and Simoncelli, E. P. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature, 454(7207):995 999, 2008. ISSN 00280836. doi: 10.1038/nature07140. Pinsky, M. and Karlin, S. An introduction to stochastic modeling: fourth edition, pp. 1 563. Elsevier Inc, 12 2010. ISBN 9780123814166. doi: 10.1016/C2009-1-61171-0. Rao, V. and Teh, Y. Mcmc for continuous-time discrete-state systems. Advances in Neural Information Processing Systems, 1(NIPS):701 709, 01 2012. Schaffter, T., Marbach, D., and Floreano, D. Gene Net Weaver: In silico benchmark generation and performance profiling of network inference methods. Bioinformatics, 27(16):2263 2270, 2011. ISSN 13674803. doi: 10.1093/bioinformatics/btr373. Stella, F., Acerbi, E., Zelante, T., and Narang, V. Gene network inference using continuous time bayesian networks: A comparative study and application to th17 cell differentiation. BMC Bioinformatics, 15, 12 2014. doi: 10.1186/s12859-014-0387-x.