# Neural Topic Modeling with Continual Lifelong Learning

Pankaj Gupta¹, Yatin Chaudhary¹ ², Thomas Runkler¹, Hinrich Schütze²

¹Corporate Technology, Siemens AG, Munich, Germany. ²CIS, University of Munich (LMU), Munich, Germany. Correspondence to: Pankaj Gupta.

Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020. Copyright 2020 by the author(s).

## Abstract

Lifelong learning has recently attracted attention in building machine learning systems that continually accumulate and transfer knowledge to help future learning. Unsupervised topic modeling has been popularly used to discover topics from document collections. However, the application of topic modeling is challenging due to data sparsity, e.g., in a small collection of (short) documents, and it thus generates incoherent topics and sub-optimal document representations. To address the problem, we propose a lifelong learning framework for neural topic modeling that can continuously process streams of document collections, accumulate topics and guide future topic modeling tasks by knowledge transfer from several sources to better deal with sparse data. In the lifelong process, we particularly investigate jointly: (1) sharing generative homologies (latent topics) over the lifetime to transfer prior knowledge, and (2) minimizing catastrophic forgetting to retain past learning via novel selective data augmentation, co-training and topic regularization approaches. Given a stream of document collections, we apply the proposed Lifelong Neural Topic Modeling (LNTM) framework in modeling three sparse document collections as future tasks and demonstrate improved performance quantified by perplexity, topic coherence and an information retrieval task. Code: https://github.com/pgcool/Lifelong-Neural-Topic-Modeling

## 1. Introduction

Unsupervised topic models, such as LDA (Blei et al., 2003), RSM (Salakhutdinov & Hinton, 2009), DocNADE (Lauly et al., 2017), NVDM (Srivastava & Sutton, 2017), etc., have been popularly used to discover topics from large document collections. However, in sparse data settings the application of topic modeling is challenging due to the limited context in a small document collection or in short documents (e.g., tweets, headlines, etc.), and the topic models produce incoherent topics. To deal with this problem, there have been several attempts (Petterson et al., 2010; Das et al., 2015; Nguyen et al., 2015; Gupta et al., 2019) that introduce prior knowledge, such as pre-trained word embeddings (Pennington et al., 2014), to guide meaningful learning.

[Figure 1. Motivation for Lifelong Topic Modeling: coherent "apple" topics (product line, operating system, fruit) extracted from large collections at past tasks t = 1, ..., T are accumulated in a knowledge base and transferred to a sparse future task T + 1.]

Lifelong Machine Learning (LML) (Thrun & Mitchell, 1995; Mitchell et al., 2015; Hassabis et al., 2017; Parisi et al., 2019) has recently attracted attention in building adaptive computational systems that can continually acquire, retain and transfer knowledge over a lifetime when exposed to continuous streams of information.
In contrast, traditional machine learning is based on isolated learning, i.e., one-shot task learning (OTL) using a single dataset, and thus lacks the ability to continually learn from incrementally available heterogeneous data. The application of the LML framework has shown potential for supervised natural language processing (NLP) tasks (Chen & Liu, 2016), such as sentiment analysis (Chen et al., 2015), relation extraction (Wang et al., 2019), text classification (de Masson d'Autume et al., 2019), etc. Existing works in topic modeling are either based on the OTL approach or on transfer learning (Chen & Liu, 2014) using stationary batches of training data and prior knowledge, without accounting for streams of document collections. Unsupervised (neural) document topic modeling thus remains unexplored with respect to lifelong learning.

In this work, we explore unsupervised (neural) document topic modeling within a continual lifelong learning paradigm to enable knowledge-augmented topic learning over a lifetime. We show that Lifelong Neural Topic Modeling (LNTM) is capable of mining and retaining prior knowledge (topics) from streams of large document collections, and particularly of guiding topic modeling on sparse datasets using knowledge accumulated from several domains over the lifespan. For example, in Figure 1 we have a stream of coherent topics associated with apple, extracted from a stream of large document collections over time t ∈ [1, T] (i.e., past learning). Observe that the word apple is topically contextualized by several domains, i.e., product line, operating system and fruit, at tasks t = 1, t = 2 and t = T, respectively. For the future task T + 1 on a small document collection, the topic (red box) produced without LNTM is incoherent, containing some irrelevant words (marked in red) from various topics. Given a sufficient overlap (marked in green) between the past and future topic words, we aim to help topic modeling for the future task T + 1 such that the topic (red box) becomes semantically coherent (green box), leading to an improved document representation.

Therefore, the goal of LNTM is to (1) detect topic overlap between prior topics t ∈ [1, T] of the knowledge base (KB) and topics of the future task T + 1, (2) positively transfer prior topic information in modeling the future task, (3) retain or minimize forgetting of prior topic knowledge, and (4) continually accumulate topics in the KB over the lifetime. In this work, we particularly focus on addressing the challenge: how to simultaneously mine relevant knowledge from prior topics, transfer the mined topical knowledge and also retain prior topic information under domain shifts over the lifespan?

Contributions: We present a novel lifelong neural topic modeling framework that learns topics for a future task with the proposed approaches of: (1) Topic Regularization, which enables topical knowledge transfer from several domains and prevents catastrophic forgetting of past topics (an illustrative sketch of such a regularized objective is given below); (2) Word-embedding Guided Topic Learning, which introduces prior multi-domain knowledge encoded in word embeddings; and (3) Selective-data Augmentation Learning, which identifies relevant documents from historical collections, learns topics simultaneously with a future task and controls forgetting via selective data replay. We apply the proposed framework in modeling three sparse (future-task) and four large (past-task) document collections in sequence.
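To illustrate contribution (1), the following is a minimal Python sketch of what a topic-regularized training objective for the future task could look like. It assumes the regularizer keeps the future-task topic matrix close to each accumulated past topic matrix, weighted by a per-task relevance score; the function and variable names are illustrative, and this is not the paper's exact objective.

```python
import numpy as np

def lntm_future_task_loss(nll_future, Z_future, past_topic_matrices, lambdas_tr):
    """Sketch of a topic-regularized objective for the future task T+1.

    Assumption (illustrative, not the paper's exact formulation): knowledge
    transfer and forgetting control are realized by penalizing the squared
    Frobenius distance between the future topic matrix Z^{T+1} and each past
    topic matrix Z^t in the knowledge base, weighted by a per-task relevance
    score lambda^t_TR.

    nll_future          : scalar negative log-likelihood on the future collection
    Z_future            : (H, K) topic matrix being learned for task T+1
    past_topic_matrices : list of (H, K) topic matrices Z^1, ..., Z^T
    lambdas_tr          : list of scalars; larger values force stronger imitation
    """
    reg = 0.0
    for Z_t, lam_t in zip(past_topic_matrices, lambdas_tr):
        # a topic-space alignment between Z^t and Z^{T+1} could be applied here;
        # it is omitted to keep the sketch minimal
        reg += lam_t * np.sum((Z_future - Z_t) ** 2)
    return nll_future + reg

# toy usage: two past tasks, H = 4 topics over a K = 10 word vocabulary
rng = np.random.default_rng(0)
Z_new = rng.normal(size=(4, 10))
Z_past = [rng.normal(size=(4, 10)) for _ in range(2)]
print(lntm_future_task_loss(nll_future=120.0, Z_future=Z_new,
                            past_topic_matrices=Z_past, lambdas_tr=[0.1, 0.5]))
```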
Extensive experimental results show improved topic modeling on the future task while retaining past learning, quantified by information retrieval, topic coherence and generalization capabilities.

## 2. Methodology: Lifelong Topic Modeling

In the following section, we describe our contributions in building the Lifelong Neural Topic Modeling framework, including: topic extraction, knowledge mining, retention, transfer and accumulation. See Table 1 for a description of the notation.

Table 1. Description of the notation used in this work

| Notation | Description |
|---|---|
| LNTM | Lifelong Neural Topic Modeling |
| EmbTF | Word-embedding based transfer |
| TR | Topic Regularization |
| SAL | Selective-data Augmentation Learning |
| TopicPool | Pool of accumulated topics |
| WordPool | Pool of accumulated word embeddings |
| Ω^t | A document collection at time/task t |
| (T + 1) | Future task |
| {1, ..., T} | Past tasks |
| Z^t ∈ R^{H×K} | Topic embedding matrix for task t |
| E^t ∈ R^{E×K} | Word embedding matrix for task t |
| Θ | LNTM parameters |
| Φ | LNTM hyper-parameters |
| λ^t_EmbTF | Degree of relevance of E^t ∈ WordPool for task (T + 1) |
| λ^t_TR | Degree of topic imitation/forgetting of Z^t by Z^{T+1} |
| λ^t_SAL | Degree of domain overlap between Ω^t and Ω^{T+1} |
| A^t ∈ R^{H×H} | Topic alignment between Z^t and Z^{T+1} |
| K, D | Vocabulary size, document size |
| E, H | Word embedding dimension, number of topics |
| b ∈ R^K | Visible (input) bias vector |
| c ∈ R^H | Hidden bias vector |
| v | An input document (visible units) |
| L^t | Loss (negative log-likelihood) for task t |
| W ∈ R^{H×K} | Encoding matrix of DocNADE for task (T + 1) |
| U ∈ R^{K×H} | Decoding matrix of DocNADE for task (T + 1) |

Consider a stream of document collections S = {Ω^1, Ω^2, ..., Ω^T, Ω^{T+1}} over a lifetime t ∈ {1, ..., T, T + 1}, where Ω^{T+1} is used to perform future learning. During lifelong learning, we sequentially iterate over S and analyze each document collection Ω^t ∈ S using a novel topic modeling framework that can leverage and retain prior knowledge extracted from each of the lifelong steps {1, ..., t − 1}.

### 2.1. Topic Learning via Neural Topic Model

Within the OTL framework, an unsupervised neural-network-based topic model named Document Neural Autoregressive Distribution Estimation (DocNADE) (Larochelle & Lauly, 2012; Lauly et al., 2017) has been shown to outperform existing topic models based on LDA (Blei et al., 2003; Srivastava & Sutton, 2017) or on neural networks such as Replicated Softmax (RSM) (Salakhutdinov & Hinton, 2009), Autoencoders (Lauly et al., 2017), NVDM (Miao et al., 2016), etc. Additionally, Gupta et al. (2019) have recently demonstrated the competitiveness of DocNADE in transfer learning settings. Thus, we adopt DocNADE as the backbone for discovering topics and building the lifelong topic learning framework.

DocNADE Formulation: For a document (observation vector) v ∈ Ω of size D such that v = (v_1, ..., v_D), each word index v_i takes a value in the vocabulary {1, ..., K} of size K. Inspired by the NADE (Larochelle & Murray, 2011) and RSM (Salakhutdinov & Hinton, 2009) generative modeling architectures, DocNADE computes the joint probability distribution p(v; Θ) = ∏_{i=1}^{D} p(v_i | v_{<i}), where v_{<i} denotes the subvector of preceding word indices (v_1, ..., v_{i−1}).
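To make the autoregressive factorization concrete, here is a minimal NumPy sketch of computing log p(v; Θ) under a DocNADE-style model: a sigmoid hidden layer is built from the sum of embeddings of the preceding words, and a softmax over the vocabulary scores each position. The shapes and random parameters are toy values for illustration; this is a sketch, not the released implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def docnade_log_likelihood(v, W, U, b, c):
    """Log-likelihood log p(v) = sum_i log p(v_i | v_<i) of a document
    under a DocNADE-style autoregressive topic model.

    v : sequence of word indices (length D), each in {0, ..., K-1}
    W : (H, K) encoding matrix; column W[:, w] embeds word w in topic space
    U : (K, H) decoding matrix
    b : (K,) visible bias, c : (H,) hidden bias
    """
    log_p = 0.0
    acc = np.zeros_like(c)              # running sum of embeddings of preceding words
    for v_i in v:
        h_i = sigmoid(c + acc)          # hidden state conditioned on v_<i
        p_i = softmax(b + U @ h_i)      # distribution over the vocabulary at position i
        log_p += np.log(p_i[v_i])       # probability of the observed word
        acc += W[:, v_i]                # include the current word for the next position
    return log_p

# toy usage with random parameters (illustration only)
rng = np.random.default_rng(0)
K, H, D = 50, 8, 6                      # vocabulary size, number of topics, document length
W, U = rng.normal(size=(H, K)) * 0.1, rng.normal(size=(K, H)) * 0.1
b, c = np.zeros(K), np.zeros(H)
doc = rng.integers(0, K, size=D)
print(docnade_log_likelihood(doc, W, U, b, c))
```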