# A Functional Dynamic Boltzmann Machine

Hiroshi Kajino, IBM Research - Tokyo, KAJINO@jp.ibm.com

Dynamic Boltzmann machines (DyBMs) are recently developed generative models of time series. They are designed to learn a time series with efficient online learning algorithms, while taking long-term dependencies into account with the help of eligibility traces: recursively updatable memory units that store descriptive statistics of all the past data. The current DyBMs assume a finite-dimensional time series and cannot be applied to a functional time series, in which the dimension goes to infinity (e.g., spatiotemporal data on a continuous space). In this paper, we present a functional dynamic Boltzmann machine (F-DyBM) as a generative model of a functional time series. A technical challenge is to devise an online learning algorithm with which F-DyBM, consisting of functions and integrals, can learn a functional time series using only finite observations of it. We rise to this challenge by combining a kernel-based function approximation method with a statistical interpolation method, and finally derive closed-form update rules. We design numerical experiments to empirically confirm the effectiveness of our solutions. The experimental results demonstrate consistent error reductions compared to baseline methods, from which we conclude that F-DyBM is effective for functional time series prediction.

## 1 Introduction

This work is concerned with learning a time series for forecasting future events. In particular, we focus on a lightweight model that can be trained online while preserving predictive performance as much as possible.
Aside from recent high-performance but complex models, such as the family of recurrent neural networks (RNNs) [Rumelhart et al., 1986; Hochreiter and Schmidhuber, 1997; Sundermeyer et al., 2012], lightweight models are required when training and predicting on devices with low computational power, such as mobile and IoT devices, and when dealing with massive amounts of stream data reported from many sensors. In this light, we focus on the family of vector autoregressive models (VARs) [Lütkepohl, 2005], and above all, on one of its state-of-the-art variants called dynamic Boltzmann machines (DyBMs) [Osogami and Otsuka, 2015; Osogami, 2016; Dasgupta and Osogami, 2017].

Figure 1: Illustration of F-DyBM, modeling a functional pattern f^[t](x) defined on a two-dimensional space. Heat maps represent functional patterns. The current pattern depends on five past patterns and two eligibility traces (which summarize all the past patterns) through weight functions w^[δ] and u_ℓ, respectively.

DyBMs are recently emerging generative models of binary- or real-valued multi-dimensional time series. One of their essential characteristics is a recursively updatable memory unit summarizing all the past data, dubbed an eligibility trace. Its recursive update rule enables us to develop online learning algorithms for DyBMs while capturing long-term dependencies of a time series to increase predictive ability. Osogami [2016] reported up to 20-30% performance gains from eligibility traces. We therefore employ DyBMs as our time-series modeling framework. In this paper, we present a new variant of DyBMs called a functional dynamic Boltzmann machine (F-DyBM), which is able to handle a partially observable functional time series, where at each discrete time step t ∈ ℤ, finite evaluations of a function f^[t](x): X → ℝ are given. Our F-DyBM is mainly motivated by spatiotemporal data.
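To make the role of an eligibility trace concrete, the following is a minimal sketch of a recursively updatable summary of all past observations. The exponential-decay form and the decay rate `lam` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

# Minimal sketch of an eligibility trace: a memory unit that is updated
# recursively at each step, so the full history never needs to be stored.
# The decay-and-accumulate form and `lam` below are illustrative choices.

def update_trace(trace, observation, lam=0.8):
    """Decay the existing trace and fold in the newest observation."""
    return lam * trace + observation

trace = np.zeros(3)
series = [np.array([1.0, 0.0, 2.0]),
          np.array([0.5, 1.0, 0.0]),
          np.array([0.0, 0.5, 1.0])]
for f_t in series:
    trace = update_trace(trace, f_t)

# After 3 steps, trace = lam^2 * f_1 + lam * f_2 + f_3,
# i.e., a geometrically weighted summary of the whole history.
```

Because the update touches only the current trace and the newest observation, the memory and per-step cost stay constant no matter how long the history grows, which is what makes online learning with long-term dependencies feasible.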
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

Assume that a functional time series is a spatiotemporal time series collected from mobile devices, where f^[t](x) corresponds to a temperature, water quality, or air quality observed at location x ∈ X and time step t. This example implies two properties that cannot be handled by the existing DyBMs, which are designed for a vectorial time series. First, it is required to forecast a future value at any location x in a continuous space X, rather than at finitely many fixed locations as in the case of a vectorial time series, in which each dimension corresponds to a fixed location. The existing DyBMs cannot, by themselves, forecast values at infinitely many locations. Second, observation points can move and may appear and disappear abruptly, and in some cases the identities of observations are lost due to privacy concerns, preventing us from constructing a vectorial time series. These two properties clearly highlight the essential difference between a vectorial time series and a functional one. To this end, we develop F-DyBM along with an online learning algorithm. The development of F-DyBM consists of a modeling task and an algorithm-implementation task. The model of F-DyBM can be derived rather straightforwardly from the Gaussian DyBM (G-DyBM) [Osogami, 2016; Dasgupta and Osogami, 2017] by replacing vectors with functions, weight matrices with weight functions, and matrix-vector multiplications with integrals. As a result, we obtain a model of a functional time series as depicted in Fig. 1. By contrast, a learning algorithm cannot be derived in the same direct way, because of the following three technical challenges. First, neither a functional time series nor weight functions can be represented in a computer with only finite memory. Second, the model now involves integrals, which in general cannot be computed efficiently.
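The substitution described above (matrix-vector products becoming integrals) can be illustrated numerically. Where G-DyBM would compute `W @ f` for a weight matrix `W`, the functional model evaluates ∫ w(x, y) f(y) dy for a weight *function* w. The sketch below approximates such an integral on a 1-D grid; the particular choices of w and f are illustrative, not the paper's (the paper avoids this numerical integration altogether via the kernel trick):

```python
import numpy as np

# G-DyBM: contribution to the prediction at unit n is sum_m W[n, m] * f[m].
# F-DyBM: contribution at location x is  integral of w(x, y) * f(y) dy.
# Here we approximate that integral with a Riemann sum on a dense grid.
# Both w and f_past are illustrative stand-ins.

def w(x, y):
    # a smooth, localized weight function
    return np.exp(-(x - y) ** 2)

def f_past(y):
    # a past functional pattern
    return np.sin(y)

grid = np.linspace(-5.0, 5.0, 1001)   # discretization of the domain
dx = grid[1] - grid[0]
x = 0.5                               # location at which we predict
contribution = np.sum(w(x, grid) * f_past(grid)) * dx
# For these choices the integral has the closed form
# sqrt(pi) * exp(-1/4) * sin(x), so contribution is about 0.662.
```

This also makes the second challenge in the text tangible: a naive grid-based evaluation costs O(grid size) per location per lag, which motivates representations where the integrals can be computed analytically.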
Third, typically only finite observations of a functional pattern are available at each time step, which breaks the fully-observable assumption of G-DyBM. Our idea to overcome these difficulties is twofold. The first idea, addressing the first and second challenges, is to model the weight functions with finitely many kernel-based basis functions. We are then able to represent the model with finitely many parameters, and furthermore, the kernel trick allows us to compute the integrals analytically, finally yielding closed-form learning rules. Second, to address the third challenge, we employ a Gaussian process as a core component of our model and estimate a functional pattern from finite observations by maximum-a-posteriori (MAP) estimation. These ideas successfully lead to an online learning algorithm for F-DyBM. The effectiveness of F-DyBM is empirically demonstrated using five real spatiotemporal data sets. As we will discuss in the related-work section, F-DyBM can be interpreted as an extension of VAR and of functional autoregression (FAR) [Bosq, 2000] as well as of G-DyBM: eligibility traces mainly differentiate F-DyBM from FAR and G-DyBM from VAR, while rigorous modeling of a functional time series differentiates F-DyBM from G-DyBM and FAR from VAR. We therefore design the experiments to validate the contribution of each of these innovations. The experimental results indicate that adding eligibility traces decreased the error by 12% on average, and that function-based modeling decreased the error by 11.7% on average compared to a heuristic application of vector-based models. Hence, we conclude that F-DyBM achieves substantial performance improvements because of these two features.

Notation. We employ the following mathematical conventions. For a matrix X = [x₁ ⋯ x_N]ᵀ ∈ ℝ^{N×D} and a function f: ℝ^D → ℝ, we define f(X) := [f(x₁) ⋯ f(x_N)]ᵀ ∈ ℝ^N. For matrices X and Y = [y₁ ⋯
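The second idea above (a Gaussian process with MAP estimation of the functional pattern) can be sketched concretely: under a GP prior with Gaussian observation noise, the MAP estimate of the function coincides with the standard GP posterior mean. The RBF kernel, length scale, and noise level below are illustrative assumptions, not the paper's specific choices:

```python
import numpy as np

# MAP estimate of a function from finitely many noisy evaluations under a
# Gaussian-process prior. With Gaussian noise this is the posterior mean:
#   f_hat(X*) = K(X*, X) (K(X, X) + sigma^2 I)^{-1} y
# Kernel and hyperparameters here are illustrative assumptions.

def rbf(A, B, length=1.0):
    """Gram matrix K(A, B) with K[n, m] = k(a_n, b_m)."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-d2 / (2.0 * length ** 2))

def gp_map(X_obs, y_obs, X_query, sigma2=1e-2):
    K = rbf(X_obs, X_obs) + sigma2 * np.eye(len(X_obs))
    alpha = np.linalg.solve(K, y_obs)
    return rbf(X_query, X_obs) @ alpha

# Scattered observations of an underlying pattern (here, sin).
X_obs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_obs = np.sin(X_obs)
X_query = np.array([0.5])
f_hat = gp_map(X_obs, y_obs, X_query)  # close to sin(0.5)
```

This is exactly the role such an estimate plays for a partially observed functional time series: the finite, possibly moving observation points at step t are interpolated into a full functional pattern before the autoregressive update is applied.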
y_M]ᵀ ∈ ℝ^{M×D} and a function K: ℝ^D × ℝ^D → ℝ, we define K(X, Y) as the N × M matrix whose (n, m)-element is K(x_n, y_m). Let us denote a value associated with time t by f^[t]. A sequence of values from time to t − 1 (including t − 1) is denoted by f^[