# Variational Temporal Abstraction

Taesup Kim1,3,∗, Sungjin Ahn2, Yoshua Bengio1
1Mila, Université de Montréal, 2Rutgers University, 3Kakao Brain
∗Equal advising; work also done while visiting Rutgers University. Correspondence to taesup.kim@umontreal.ca and sungjin.ahn@rutgers.edu.
33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

We introduce a variational approach to learning and inference of temporally hierarchical structure and representation for sequential data. We propose Variational Temporal Abstraction (VTA), a hierarchical recurrent state space model that can infer the latent temporal structure and thus perform stochastic state transitions hierarchically. We also propose applying this model to implement jumpy imagination in imagination-augmented agent learning in order to make imagination more efficient. In experiments, we demonstrate that the proposed method can model 2D and 3D visual sequence datasets with interpretable temporal structure discovery, and that its application to jumpy imagination enables more efficient agent learning in a 3D navigation task.

## 1 Introduction

Discovering temporally hierarchical structure and representation in sequential data is key to many problems in machine learning. In particular, for an intelligent agent exploring an environment, learning such spatio-temporal structure hierarchically is critical because it can, for instance, enable efficient option learning and jumpy future imagination, abilities central to resolving the sample-efficiency problem (Hamrick, 2019). Without such temporal abstraction, imagination easily becomes inefficient: imagine a person planning a one-hour drive from her office to home by imagining the future at the scale of every second. There is also biological evidence that future imagination is a fundamental function of the human brain (Mullally & Maguire, 2014; Buckner, 2010), believed to be implemented via hierarchical coding in grid cells (Wei et al., 2015).

There have been approaches to learning such hierarchical structure in sequences, such as the HMRNN (Chung et al., 2016). However, as a deterministic model, the HMRNN cannot capture the stochastic nature prevalent in the data. This is a particularly critical limitation for imagination-augmented agents, because exploring various possible futures according to their uncertainty is what makes imagination meaningful in many cases. There are also many probabilistic sequence models that can deal with such stochasticity in sequential data (Chung et al., 2015; Krishnan et al., 2017; Fraccaro et al., 2017). However, unlike the HMRNN, these models cannot automatically discover the temporal structure of the data.

In this paper, we propose the Hierarchical Recurrent State Space Model (HRSSM), which combines the advantages of both worlds: it can discover the latent temporal structure (e.g., subsequences) while also modeling its stochastic state transitions hierarchically. For its learning and inference, we introduce a variational approximate inference approach to deal with the intractability of the true posterior. We also propose to apply the HRSSM to implement efficient jumpy imagination for imagination-augmented agents. We note that the proposed HRSSM is a generic generative sequence model that is not tied to the specific application of imagination-augmented agents but can be applied to any sequential data. In experiments, on 2D bouncing balls and 3D maze exploration, we show that the proposed model can model sequential data with interpretable temporal abstraction discovery. We then show that the model can be applied to improve the efficiency of imagination-augmented agent learning.

The main contributions of the paper are:

1. We propose the Hierarchical Recurrent State Space Model (HRSSM), the first stochastic sequence model that discovers the temporal abstraction structure.
2. We propose applying the HRSSM to imagination-augmented agents so that they can perform efficient jumpy future imagination.
3. In experiments, we showcase the temporal structure discovery and the benefit of the HRSSM for agent learning.

## 2 Proposed Model

### 2.1 Hierarchical Recurrent State Space Models

In our model, we assume that a sequence $X = x_{1:T} = (x_1, \dots, x_T)$ has a latent structure of temporal abstraction that partitions the sequence into $N$ non-overlapping subsequences $X = (X_1, \dots, X_N)$. A subsequence $X_i = x^i_{1:l_i}$ has length $l_i$ such that $T = \sum_{i=1}^{N} l_i$, and we write $L = \{l_i\}$. Unlike previous works (Serban et al., 2017), we treat the number of subsequences $N$ and the subsequence lengths $L$ as discrete latent variables rather than given parameters. This allows our model to discover the underlying temporal structure adaptively and stochastically.

We also assume that each subsequence $X_i$ is generated from a temporal abstraction $z_i$ and that each observation $x_t$ has an observation abstraction $s_t$. The two abstractions form a hierarchy: all observations in $X_i$ are governed by the temporal abstraction $z_i$ in addition to the local observation abstraction $s_t$. As a temporal model, both abstractions undergo transitions, but the temporal abstraction transitions only at the subsequence scale, while the observation abstraction transitions at every time step. This generative process can then be written as follows:

$$p(X, S, L, Z, N) = p(N) \prod_{i=1}^{N} p(X_i, S_i \mid z_i, l_i)\, p(l_i \mid z_i)\, p(z_i \mid z_{<i}).$$
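To make the factorization above concrete, the sketch below performs ancestral sampling from a toy generative process with the same structure: a temporal abstraction that transitions once per subsequence, an observation abstraction that transitions at every step within a subsequence, and a length variable sampled per subsequence. This is only a minimal illustration under simplifying assumptions: the function name `sample_hrssm`, the dimensionalities, the fixed number of subsequences, and the linear-Gaussian transitions are placeholders standing in for the learned neural networks and priors of the actual model.

```python
import numpy as np

def sample_hrssm(num_subseq=3, max_len=5, z_dim=4, s_dim=4, x_dim=8, seed=0):
    """Toy ancestral sampling with the HRSSM factorization.

    Linear-Gaussian placeholders stand in for the learned transition and
    emission networks; only the hierarchical structure mirrors the model.
    """
    rng = np.random.default_rng(seed)
    W_z = 0.9 * np.eye(z_dim)                          # placeholder dynamics for p(z_i | z_{<i})
    W_s = 0.8 * np.eye(s_dim)                          # placeholder dynamics for p(s_t | s_{<t}, z_i)
    W_zs = 0.3 * rng.standard_normal((s_dim, z_dim))   # top-down influence of z_i on s_t
    W_x = 0.5 * rng.standard_normal((x_dim, s_dim))    # placeholder emission weights for p(x_t | s_t)
    W_l = rng.standard_normal((max_len, z_dim))        # length logits as a function of z_i

    z = np.zeros(z_dim)
    observations, abstractions, lengths = [], [], []
    for _ in range(num_subseq):                        # i = 1..N subsequences (N fixed here for simplicity)
        z = W_z @ z + 0.1 * rng.normal(size=z_dim)     # temporal abstraction transition, once per subsequence
        probs = np.exp(W_l @ z)
        probs /= probs.sum()
        l = rng.choice(np.arange(1, max_len + 1), p=probs)  # subsequence length l_i ~ p(l_i | z_i)
        s = np.zeros(s_dim)
        for _ in range(l):                             # t = 1..l_i steps within subsequence i
            s = W_s @ s + W_zs @ z + 0.1 * rng.normal(size=s_dim)  # observation abstraction transition
            x = W_x @ s + 0.1 * rng.normal(size=x_dim)             # emission of the observed frame x_t
            observations.append(x)
        abstractions.append(z.copy())
        lengths.append(l)
    return np.stack(observations), np.stack(abstractions), np.array(lengths)

X, Z, L = sample_hrssm()
print(X.shape, Z.shape, L, L.sum())  # T = sum(L) observations, N temporal abstractions
```

Running the snippet yields a sequence of `sum(L)` observation vectors governed by only `N` temporal abstractions, which is the kind of coarse-to-fine structure the HRSSM is designed to infer from data rather than simulate with hand-picked parameters.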