# Continual Reinforcement Learning with Complex Synapses

Christos Kaplanis¹² Murray Shanahan¹³ Claudia Clopath²

¹Department of Computing, Imperial College London. ²Department of Bioengineering, Imperial College London. ³Google DeepMind, London. Correspondence to: Christos Kaplanis.

## Abstract

Unlike humans, who are capable of continual learning over their lifetimes, artificial neural networks have long been known to suffer from a phenomenon known as catastrophic forgetting, whereby new learning can lead to abrupt erasure of previously acquired knowledge. Whereas in a neural network the parameters are typically modelled as scalar values, an individual synapse in the brain comprises a complex network of interacting biochemical components that evolve at different timescales. In this paper, we show that by equipping tabular and deep reinforcement learning agents with a synaptic model that incorporates this biological complexity (Benna & Fusi, 2016), catastrophic forgetting can be mitigated at multiple timescales. In particular, we find that as well as enabling continual learning across sequential training of two simple tasks, it can also be used to overcome within-task forgetting by reducing the need for an experience replay database.

## 1. Introduction

One of the outstanding enigmas in computational neuroscience is how the brain is capable of continual or lifelong learning (Wixted, 2004), acquiring new memories and skills very quickly while robustly preserving old ones. Synaptic plasticity, the ability of the connections between neurons to change their strength over time, is widely considered to be the physical basis of learning in the brain, and knowledge is thought to be distributed across neuronal networks, with individual synapses participating in the storage of several memories. Given this overlapping nature of memory storage, synapses would seem to need to be both labile in response to new experiences and stable enough to retain old memories - a paradox often referred to as the stability-plasticity dilemma (Carpenter & Grossberg, 1987).

Artificial neural networks also have a distributed memory but, unlike the brain, are prone to catastrophic forgetting (McCloskey & Cohen, 1989; French, 1999): when trained on a nonstationary data distribution, such as two distinct tasks in sequence, a network can quickly forget what it learnt from earlier data. In reinforcement learning (RL), where data is typically accumulated online as the agent interacts with the environment, the distribution of experiences is often nonstationary over the training of a single task, as well as across tasks, since (i) experiences are correlated in time and (ii) the agent's policy changes as it learns.

A typical way of addressing nonstationarity of data in deep RL is to store experiences in a replay database and use it to interleave old and new data during training (Mnih et al., 2015). However, this solution does not scale well computationally as the number of tasks grows, and the old data might also become unavailable at some point. Furthermore, it does not explain how the brain achieves continual learning, since the question remains as to how an ever-growing dataset could itself be stored without catastrophic forgetting.
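To make the replay mechanism referred to above concrete, the following is a minimal sketch of the kind of experience replay buffer used in deep RL (Mnih et al., 2015). The class and parameter names (`ReplayBuffer`, `capacity`, `batch_size`) are illustrative choices, not taken from the paper.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of past transitions (illustrative sketch only)."""

    def __init__(self, capacity=100_000):
        # Oldest transitions are silently evicted once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform sampling mixes old and new experiences in each update,
        # breaking the temporal correlations of the online data stream.
        return random.sample(self.buffer, batch_size)
```

Even this simple sketch illustrates the scaling concern raised above: retaining enough data from every task to keep sampling it means the storage requirement grows with the lifetime of the agent.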
One potential answer may arise from the experimental observations that synaptic plasticity occurs at a range of different timescales, including short-term plasticity (Zucker & Regehr, 2002), long-term plasticity (Bliss & Lømo, 1973) and synaptic consolidation (Clopath et al., 2008). Intuitively, the slow components of plasticity could ensure that a synapse retains a memory of a long history of its modifications, while the fast components render the synapse highly adaptable to the formation of new memories, perhaps providing a solution to the stability-plasticity dilemma.

In this paper, we explore whether a biologically plausible synaptic model (Benna & Fusi, 2016), which abstractly models plasticity over a range of timescales, can be applied to mitigate catastrophic forgetting in a reinforcement learning context. Our work is intended as a proof of principle for how the incorporation of biological complexity into an agent's parameters can be useful in tackling the lifelong learning problem. By running experiments with both tabular and deep RL agents, we find that the model helps continual learning across two simple tasks as well as within a single task, by allaying the necessity of an experience replay database, indicating that the incorporation of different timescales of plasticity can correspondingly result in improved behavioural memory over distinct timescales. Furthermore, this is achieved even though the process of synaptic consolidation has no prior knowledge of the timing of changes in the data distribution.

## 2. Background

### 2.1. The Benna-Fusi Model

In this paper, we make use of a synaptic model that was originally derived to maximise the expected signal-to-noise ratio (SNR) of memories over time in a population of synapses undergoing continual plasticity in the form of random, uncorrelated modifications (Benna & Fusi, 2016). The model assumes that the synaptic weight $w$ at time $t$ is determined by its history of modifications $\Delta w(t')$ up until that time, which are filtered by some kernel $r(t - t')$, such that

$$w(t) = \sum_{t' < t} \Delta w(t') \, r(t - t').$$
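As a toy illustration of this filtered-history view, the sketch below evaluates $w(t)$ directly from a stored list of modifications. The exponentially decaying kernel and the function name `weight_at` are assumptions made purely for illustration; they are not the kernel or the dynamics that the Benna-Fusi model actually prescribes.

```python
import math

def weight_at(t, modifications, tau=100.0):
    """Evaluate w(t) = sum_{t' < t} dw(t') * r(t - t') for a toy kernel.

    `modifications` is a list of (t_prime, dw) pairs; the exponential kernel
    r(s) = exp(-s / tau) is an illustrative stand-in for the model's kernel.
    """
    return sum(dw * math.exp(-(t - t_prime) / tau)
               for t_prime, dw in modifications if t_prime < t)

# Example: two potentiations followed by a depression.
history = [(0.0, +1.0), (50.0, +0.5), (120.0, -0.8)]
print(weight_at(200.0, history))  # older modifications contribute less
```

The key property this toy example shares with the model is that recent modifications dominate the current weight while older ones decay gracefully rather than being overwritten outright.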