# Echo-State Conditional Restricted Boltzmann Machines

Sotirios P. Chatzis
Department of Electrical Engineering, Computer Engineering, and Informatics
Cyprus University of Technology, Limassol 3603, Cyprus
soteri0s@mac.com

## Abstract

Restricted Boltzmann machines (RBMs) are a powerful generative modeling technique, based on a complex graphical model of hidden (latent) variables. Conditional RBMs (CRBMs) are an extension of RBMs tailored to modeling temporal data. A drawback of CRBMs is that they consider only linear temporal dependencies, which limits their capability to capture complex temporal structure; they also require many variables to model long temporal dependencies, which makes them prone to overfitting. To resolve these issues, in this paper we propose the echo-state CRBM (ES-CRBM): our model uses an echo-state network reservoir in the context of CRBMs to efficiently capture long and complex temporal dynamics, with far fewer trainable parameters than conventional CRBMs. In addition, we introduce an (implicit) mixture of ES-CRBM experts (im-ES-CRBM) to further enhance the capabilities of our ES-CRBM model. The introduced im-ES-CRBM allows for better modeling of temporal observations that may comprise a number of latent or observable subpatterns alternating in a dynamic fashion; it also allows for performing sequence segmentation using our framework. We apply our methods to sequential data modeling and classification experiments using public datasets.

## Introduction

Restricted Boltzmann machines (RBMs) (Smolensky 1986) are a popular class of two-layer undirected graphical models that model observations by means of a number of binary hidden (latent) variables (Hinton and Salakhutdinov 2006; Larochelle et al. 2007). A drawback of RBM models is their inadequacy in sequential data modeling, since their (undirected) latent-variable architecture is not designed for capturing temporal dependencies in the modeled data. To resolve this issue, conditional RBMs (CRBMs) have recently been proposed as an extension of RBMs (Taylor, Hinton, and Roweis 2011). CRBMs consider time-varying RBM biases, which are assumed to depend on the values of the previously observed data, in the context of an autoregressive data modeling scheme. Specifically, temporal dependencies are modeled by treating the observable variables at previous time points as additional fixed inputs. This is effected by means of linear autoregressive connections from the past $N$ configurations (time steps) of the observable variables to the current observable and hidden configuration.

On the other hand, echo-state networks (ESNs) are an efficient network structure for recurrent neural network (RNN) training (Lukosevicius and Jaeger 2009; Verstraeten et al. 2007; Maass, Natschlaeger, and Markram 2002). ESNs avoid the shortcomings of typical gradient-descent-based RNN training, which suffers from slow convergence combined with bifurcations and suboptimal estimates of the model parameters (local optima of the optimized objective functions) (Haykin and Principe 1998; Kianifard and Swallow 1996). This is accomplished by setting up the network structure in the following way: a recurrent neural network is randomly created and remains unchanged during training. This RNN is called the reservoir. It is passively excited by the input signal and maintains in its state a nonlinear transformation of the input history. The reservoir is not trained, but only initialized in a random fashion that ensures the satisfaction of some constraints. The desired output signal is generated by a linear readout layer attached to the reservoir, which computes a linear combination of the neuron outputs from the input-excited reservoir (reservoir states). The updates of the reservoir state vectors and network outputs are computed as follows:

$$\phi_{t+1} = (1-\gamma)\, h\!\left(\Lambda \phi_t + \Lambda_{\mathrm{in}}\, x_{t+1}\right) + \gamma\, \phi_t \tag{1}$$

$$y_{t+1} = \Lambda_{\mathrm{readout}}\, [x_{t+1};\, \phi_{t+1}] \tag{2}$$

where $\phi_t$ is the reservoir state at time $t$, $\Lambda$ is the reservoir weight matrix, that is, the matrix of the weights of the synaptic connections between the reservoir neurons, $x_t$ is the observed signal fed to the network at time $t$, $y_t$ is the obtained value of the readout at time $t$, $\gamma \geq 0$ is the retainment rate of the reservoir (with $\gamma > 0$ if leaky-integrator neurons are considered), $\Lambda_{\mathrm{readout}}$ is the (linear) readout weight matrix, $\Lambda_{\mathrm{in}}$ contains the weights between the inputs and the reservoir neurons, and $h(\cdot)$ is the activation function of the reservoir. The parameters $\Lambda_{\mathrm{in}}$ and $\Lambda$ of the network are not trained but only properly initialized (Lukosevicius and Jaeger 2009).
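To make Eqs. (1)-(2) concrete, the following is a minimal numpy sketch of a leaky-integrator reservoir update and linear readout. The dimensions, variable names, and the spectral-radius rescaling used at initialization are illustrative assumptions for the sketch, not settings prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, n_out = 3, 100, 2  # input, reservoir, and readout sizes (illustrative)
gamma = 0.3                     # retainment rate gamma in Eq. (1)

# Reservoir weights Lambda and input weights Lambda_in are randomly
# initialized and never trained; rescaling Lambda to spectral radius < 1
# is one common initialization constraint for the echo-state property.
Lambda = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
Lambda *= 0.9 / np.max(np.abs(np.linalg.eigvals(Lambda)))
Lambda_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))

# Linear readout acting on the concatenation [x_{t+1}; phi_{t+1}] of Eq. (2);
# left random here purely as a placeholder.
Lambda_readout = rng.uniform(-0.5, 0.5, size=(n_out, n_in + n_res))

def esn_step(phi_t, x_next):
    """One reservoir update (Eq. 1) followed by the linear readout (Eq. 2)."""
    phi_next = (1.0 - gamma) * np.tanh(Lambda @ phi_t + Lambda_in @ x_next) \
               + gamma * phi_t
    y_next = Lambda_readout @ np.concatenate([x_next, phi_next])
    return phi_next, y_next

# Passively excite the reservoir with a toy input sequence.
phi = np.zeros(n_res)
for x_next in rng.standard_normal((20, n_in)):
    phi, y = esn_step(phi, x_next)
```

In a full ESN, only $\Lambda_{\mathrm{readout}}$ would be trained, typically by (regularized) linear regression of target outputs on the collected reservoir states, which is precisely what makes ESN training inexpensive compared to gradient-based RNN training.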
Inspired by these advances, in this paper we propose a novel CRBM formulation that exploits the merits of ESN reservoirs to capture complex nonlinear temporal dynamics in the modeled sequential data with increased modeling effectiveness, while entailing considerably fewer trainable model parameters. Training of the proposed model is conducted efficiently by means of contrastive divergence (CD) (Bengio and Delalleau 2008; Hinton 2002), while exact inference is possible in an elegant and computationally inexpensive way, similar to conventional CRBMs. We dub our approach the echo-state CRBM (ES-CRBM). Further, we propose an implicit mixture of ES-CRBM experts (im-ES-CRBM), to incorporate into our model additional information regarding the allocation of the observed data to latent or observable classes, and to effectively capture the transitions between such classes in the observed sequences. This allows both for obtaining better data modeling performance with our framework and for using our methods to perform sequential data classification (sequence segmentation). As we experimentally demonstrate, our methods outperform alternative RBM-based approaches, as well as other state-of-the-art methods, such as CRFs, in both data modeling and classification applications from diverse domains.

## Proposed Approach

### Echo-State Conditional RBM

Let us consider a sequence of observations $\{x_t\}_{t=1}^{T}$, and let each observation $x_t$ be associated with a vector of hidden variables $h_t$. Under the proposed ES-CRBM model, the joint density of the modeled observed and hidden variables yields: $p(x_t, h_t \mid x$