# Learning Robust Dynamics through Variational Sparse Gating

Arnav Kumar Jain1,2, Shivakanth Sujit2,3, Shruti Joshi2,3, Vincent Michalski1,2, Danijar Hafner4,5, Samira Ebrahimi-Kahou2,3,6

Learning world models from sensory inputs enables agents to plan actions by imagining their future outcomes. World models have previously been shown to improve sample-efficiency in simulated environments with few objects, but have not yet been applied successfully to environments with many objects. In environments with many objects, often only a small number of them are moving or interacting at the same time. In this paper, we investigate integrating this inductive bias of sparse interactions into the latent dynamics of world models trained from pixels. First, we introduce Variational Sparse Gating (VSG), a latent dynamics model that updates its feature dimensions sparsely through stochastic binary gates. Second, we propose a simplified architecture, Simple Variational Sparse Gating (SVSG), that removes the deterministic pathway of previous models, resulting in a fully stochastic transition function that leverages the VSG mechanism. We evaluate the two model architectures in the Bring Back Shapes (BBS) environment, which features a large number of moving objects and partial observability, demonstrating clear improvements over prior models.

## 1 Introduction

Latent dynamics models generate an agent's future states in a compact latent space without feeding the high-dimensional observations back to the model. They have shown promising results on various tasks like video prediction (Karl et al., 2016; Kalman, 1960; Krishnan et al., 2015), model-based Reinforcement Learning (RL) (Hafner et al., 2020; 2021; 2019; Ha and Schmidhuber, 2018), and robotics (Watter et al., 2015).
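As a toy illustration of generating future states purely in latent space, the following NumPy sketch chains a transition function on latent vectors without ever decoding back to observations. The weights, dimensions, and random "policy" here are illustrative placeholders, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
D, A, T = 16, 4, 10  # latent dim, action dim, rollout horizon (illustrative)
W = rng.standard_normal((D, D + A)) * 0.1  # stand-in for learned transition weights

def step(z, a):
    """One latent transition z_{t+1} = f(z_t, a_t); predictions are
    chained in latent space, never decoded back to pixels."""
    return np.tanh(W @ np.concatenate([z, a]))

z = np.zeros(D)
trajectory = []
for t in range(T):
    a = rng.standard_normal(A)  # placeholder for an action from a policy
    z = step(z, a)
    trajectory.append(z)
```

Because each prediction consumes only the compact latent state, the rollout avoids re-encoding high-dimensional observations at every step.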
Generating sequences in the compact latent space reduces accumulating errors, leading to more accurate long-term predictions. Additionally, the lower dimensionality leads to a lower memory footprint. Solving tasks in model-based RL involves learning a world model (Ha and Schmidhuber, 2018) that can predict the outcomes of actions, followed by using these predictions to derive behaviors (Sutton, 1991). Motivated by these benefits, the recently proposed DreamerV1 (Hafner et al., 2020) and DreamerV2 (Hafner et al., 2021) agents achieved state-of-the-art results on a wide range of visual control tasks. Many complex tasks require reliable long-term prediction of dynamics. This is especially true in partially observable environments, where only a subspace is visible to the agent and information must usually be retained accurately over multiple time steps to solve the task. The Dreamer agents (Hafner et al., 2020; 2021) employ a Recurrent State-Space Model (RSSM) (Hafner et al., 2019) comprising a Recurrent Neural Network (RNN). Training RNNs on long sequences is challenging, as they suffer from optimization problems like vanishing gradients (Hochreiter, 1991; Bengio et al., 1994). Different ways of applying sparse updates in RNNs have been investigated (Campos et al., 2017; Neil et al., 2016; Goyal et al., 2019), enabling a subset of state dimensions to remain constant during the update. A sparse update prior can also be motivated by the fact that in the real world, many factors of variation are constant over extended periods of time. For instance, several objects in a physical simulation may be stationary until some force acts upon them.

36th Conference on Neural Information Processing Systems (NeurIPS 2022). 1Université de Montréal, 2Mila Quebec AI Institute, 3École de technologie supérieure, 4University of Toronto, 5Google Brain, 6CIFAR. Correspondence to Arnav Kumar Jain.
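A minimal NumPy sketch of such a sparsely gated recurrent step is shown below: a per-dimension binary gate decides whether to copy the previous state or write a candidate update, so gated-off dimensions stay exactly constant. This is a simplified illustration of the sparse update prior, not the paper's implementation; all weight matrices and names are invented for the example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sparse_recurrent_step(h, x, Wg, Wc, rng, hard=True):
    """One recurrent step where a binary gate decides, per dimension,
    whether to keep the previous state or take a candidate update.
    Dimensions with gate 0 remain exactly constant across the step."""
    inp = np.concatenate([h, x])
    p = sigmoid(Wg @ inp)                                  # per-dimension update probability
    g = (rng.random(p.shape) < p).astype(float) if hard else p
    c = np.tanh(Wc @ inp)                                  # candidate state
    return g * c + (1.0 - g) * h                           # sparse update: copy where g == 0

rng = np.random.default_rng(0)
H, X = 8, 4                                                # state and input dims (illustrative)
h = np.zeros(H)
x = rng.standard_normal(X)
Wg = rng.standard_normal((H, H + X)) * 0.1
Wc = rng.standard_normal((H, H + X)) * 0.1
h_next = sparse_recurrent_step(h, x, Wg, Wc, rng)
```

In a trained model, the hard Bernoulli sample would need a gradient estimator (e.g. straight-through); the sketch only shows the forward pass in which stationary factors of variation can simply be copied.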
Additionally, this is useful in the partially observable setting, where the agent observes a constrained viewpoint and has to keep track of objects that are not visible for many time steps. In this work, we introduce Variational Sparse Gating (VSG), a stochastic gating mechanism that sparsely updates the latent states at each step. The Recurrent State-Space Model (RSSM) was introduced in PlaNet (Hafner et al., 2019), where the model state is composed of two paths: an image representation path and a recurrent path. DreamerV1 (Hafner et al., 2020) and DreamerV2 (Hafner et al., 2021) utilized it to achieve state-of-the-art results in continuous and discrete control tasks. While the image representation path is stochastic to account for multiple possible future states, the recurrent path is deterministic to retain information over multiple time steps and facilitate gradient-based optimization. Hafner et al. (2019) showed that both components are important for solving tasks, with the stochastic part being more important for accounting for partial observability of the initial states. By leveraging the proposed Variational Sparse Gating (VSG) mechanism, we demonstrate that a purely stochastic model with a single component can achieve competitive results; we call this model Simple Variational Sparse Gating (SVSG). To the best of our knowledge, this is the first work to show that purely stochastic models achieve competitive performance on continuous control tasks when compared to leading agents. Existing RL benchmarks (Bellemare et al., 2013; Chevalier-Boisvert et al., 2018; Tassa et al., 2018) do not test agents under combined partial observability and stochasticity. The Atari benchmark (Bellemare et al., 2013) comprises 55 games, but most of the games are deterministic and require substantial compute to train on.
Some tasks in the Atari and Minigrid benchmarks are partially observable but either lack stochasticity or are hard exploration tasks. Moreover, these benchmarks do not allow for controlling the factors of variation. We developed a new partially observable and stochastic environment, called Bring Back Shapes (BBS), where the task is to push objects to a predefined goal area. Solving tasks in BBS requires agents to remember the states of previously observed objects and avoid noisy distractor objects. Furthermore, VSG and SVSG outperformed leading model-based and model-free baselines. We also present studies with varying partial observability and stochasticity to demonstrate that the proposed agents have better memory for tracking observed objects and are more robust to increasing levels of noise. Lastly, the proposed methods were also evaluated on existing benchmarks: DeepMind Control (DMC) (Tassa et al., 2018), DMC with Natural Background (Zhang et al., 2021; Nguyen et al., 2021b), and Atari (Bellemare et al., 2013). On these benchmarks, the proposed method performed better on tasks with changing viewpoints and sparse rewards. Our key contributions are summarized as follows:

- **Variational Sparse Gating:** We introduce Variational Sparse Gating (VSG), where the recurrent states are sparsely updated through a stochastic gating mechanism. A comprehensive empirical evaluation shows that VSG outperforms baselines on tasks requiring long-term memory.
- **Simple Variational Sparse Gating:** We also propose Simple Variational Sparse Gating (SVSG), which has a purely stochastic state and achieves competitive results on continuous control tasks when compared with agents that also use a deterministic component.
- **Bring Back Shapes:** We developed the Bring Back Shapes (BBS) environment to evaluate agents in partially observable and stochastic settings where these variations can be controlled. Our experiments show that the proposed agents are more robust to such variations.
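One way to keep the stochastic binary gates sparse is to regularize their predicted update probabilities toward a Bernoulli prior with a small update rate, penalizing dimensions that want to update too often. The sketch below computes an elementwise Bernoulli KL for this purpose; the prior value and this exact form are illustrative assumptions, not the paper's precise objective:

```python
import numpy as np

def bernoulli_kl(q, p):
    """Elementwise KL(Bern(q) || Bern(p)). With a small prior p, this
    penalizes dimensions whose update probability q is large, which
    encourages sparse state updates."""
    q = np.clip(q, 1e-6, 1 - 1e-6)  # avoid log(0)
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

q = np.array([0.05, 0.5, 0.95])  # predicted per-dimension update probabilities
kl = bernoulli_kl(q, p=0.1)      # sparse prior: update roughly 10% of the time
```

Dimensions near the prior (here 0.05) incur almost no penalty, while dimensions that almost always update (here 0.95) are penalized heavily.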
## 2 Variational Sparse Gating

**Reinforcement Learning:** The visual control task can be formulated as a Partially Observable Markov Decision Process (POMDP) with discrete time steps $t \in [1, T]$. The agent selects action $a_t \sim p(a_t \mid o_{\leq t}, a$