# Streaming Inference for Infinite Feature Models

Rylan Schaeffer ¹ ², Yilun Du ³, Gabrielle Kaili-May Liu ², Ila Rani Fiete ² ⁴

*Equal contribution. ¹Computer Science, Stanford University. ²Brain and Cognitive Sciences, MIT. ³Electrical Engineering and Computer Science, MIT. ⁴McGovern Institute for Brain Research, MIT. Correspondence to: Rylan Schaeffer.

*Proceedings of the 39th International Conference on Machine Learning*, Baltimore, Maryland, USA, PMLR 162, 2022. Copyright 2022 by the author(s).

## Abstract

Unsupervised learning from a continuous stream of data is arguably one of the most common and most challenging problems facing intelligent agents. One class of unsupervised models, collectively termed feature models, attempts unsupervised discovery of latent features underlying the data and includes common models such as PCA, ICA, and NMF. However, if the data arrive in a continuous stream, determining the number of features is a significant challenge, and the number may grow with time. In this work, we make feature models significantly more applicable to streaming data by imbuing them with the ability to create new features, online, in a probabilistic and principled manner. To achieve this, we derive a novel recursive form of the Indian Buffet Process, which we term the Recursive IBP (R-IBP). We demonstrate that R-IBP can be used as a prior for feature models to efficiently infer a posterior over an unbounded number of latent features, with quasilinear average time complexity and logarithmic average space complexity. We compare R-IBP to existing sampling and variational baselines in two feature models (Linear Gaussian and Factor Analysis) and demonstrate on synthetic and real data that R-IBP achieves comparable or better performance in significantly less time.

*Figure 1. Motivation for Infinite Feature Models. As more data are observed, a feature model (here: PCA) requires increasingly more features to explain the data (Omniglot handwritten characters) (left) or else becomes increasingly unable to do so (right).*

## 1. Introduction

Feature models are a broad class of unsupervised probabilistic models that aim to decompose data into an unknown number of unknown features under certain assumptions; the class includes principal component analysis, factor analysis, independent component analysis, non-negative matrix factorization, matching pursuit, and more. A fundamental problem in feature modeling, analogous to the problem in mixture modeling of choosing the number of clusters, is choosing the number of features. Users typically employ one of two approaches: either (1) prespecifying a fixed number of features, or (2) retroactively choosing a number of features after seeing all the data, based on some criterion (e.g., selecting the number of principal components necessary to explain 95% of the variance). In a streaming setting, however, where data are received over time, neither approach suffices. For instance, representing handwritten characters with a fixed number of principal components becomes inadequate as more characters are encountered (Fig. 1). Thus the number of features should flexibly adapt to the data in the streaming context.
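The 95%-of-variance criterion in approach (2) is easy to state concretely. As a minimal sketch (our own NumPy illustration on synthetic data, not code from the paper; the function name is hypothetical), one computes the PCA spectrum of the full dataset and takes the smallest number of components whose cumulative explained-variance ratio reaches the threshold:

```python
import numpy as np

def num_components_for_variance(X, threshold=0.95):
    """Smallest number of principal components explaining `threshold`
    of the total variance of X (rows = observations)."""
    Xc = X - X.mean(axis=0)                    # center the data
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values give the PCA spectrum
    explained = (s ** 2) / np.sum(s ** 2)      # variance ratio per component
    # First index where the cumulative ratio reaches the threshold, plus one.
    return int(np.searchsorted(np.cumsum(explained), threshold) + 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 50))  # low-rank synthetic data
print(num_components_for_variance(X))
```

The catch, and the paper's motivation, is that this criterion requires having seen all the data: in a streaming setting the spectrum keeps changing as new observations arrive, so any retroactively chosen component count becomes stale.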
Such flexibility is a goal not only because the streaming setting for feature models is important in its own right, but also because feature models are a pervasive approach in neuroscience and cognitive science to explain how intelligent agents model the world as they move through it (Olshausen & Field, 1997; Hyvärinen, 2010; Pehlevan et al., 2015). Intelligent agents, from mice to humans to mobile devices, must deal with streaming data, since these agents operate with limited memory that renders storage of and computation on all previously seen data prohibitive. This raises the question of how to perform efficient streaming inference for infinite feature models, a question we answer here. Following the approach of Schaeffer et al. (2021) to efficient streaming inference for infinite mixture models, we first show that the Indian Buffet Process (Griffiths & Ghahramani, 2005), a stochastic process frequently used for Bayesian nonparametric feature models, can be rewritten in a novel form designed for streaming inference with expected quasilinear time and expected logarithmic space complexity. We then demonstrate on both synthetic and real (tabular & non-tabular) data that R-IBP matches or exceeds the performance of five streaming and non-streaming baseline inference algorithms in less time.

## 2. Background

### 2.1. Generative Model

We consider observing a sequence of $N$ $D$-dimensional variables $o_{1:N}$ (with $o_n \in \mathbb{R}^D$) based on a sequence of $N$ $K$-dimensional binary latent variables $z_{1:N}$, with $z_n \in \{0, 1\}^K$, $K$ unknown, and the subscript $1{:}N$ denoting the sequence $(1, 2, \ldots, N)$. Each $z_{nk}$ in the $(N \times K)$-dimensional latent variable matrix $Z$ denotes the presence or absence of the $k$th feature in the $n$th observation. Each feature is some unknown vector $A_k \in \mathbb{R}^D$ drawn i.i.d. from some distribution $p(A)$. Because the number of latent features $K$ is unknown, the Indian Buffet Process (IBP) serves as a flexible prior over the latent indicators:

$$
z_{1:N} \sim \mathrm{IBP}(\alpha, \beta), \qquad A_k \overset{\text{i.i.d.}}{\sim} p(A), \qquad o_n \mid z_n, \{A_k\} \sim p(o \mid z_n, \{A_k\}) \tag{1}
$$

This encompasses many feature models, including Principal Component Analysis, Factor Analysis, Independent Component Analysis, and Non-Negative Matrix Factorization.

### 2.2. Indian Buffet Process

The Indian Buffet Process (IBP) (Griffiths & Ghahramani, 2011) is a two-parameter¹ ($\alpha > 0$, $\beta > 0$) stochastic process that defines a discrete distribution over binary matrices with finitely many rows (observations) and an unbounded number of columns (features). The name IBP arises from imagining customers (rows/observations) arriving sequentially at a buffet that has an infinite number of dishes (columns/features) and selecting which dishes to eat: the $n$th customer selects an integer number of new dishes $\lambda_n \sim \mathrm{Poisson}\big(\alpha \beta / (\beta + n - 1)\big)$ and then selects previous dishes with probability proportional to the number of previous customers who selected those dishes. Denoting the total number of dishes after the first $n$ customers $\Lambda_n = \sum_{n'=1}^{n} \lambda_{n'}$, the IBP defines a conditional distribution for the $n$th row and $k$th column's binary variable $z_{nk}$:

$$
p(z_{nk} = 1 \mid z_{<n, k}) = \frac{\sum_{n' < n} z_{n'k}}{\beta + n - 1}
$$

¹ The IBP originally had a single parameter (Griffiths & Ghahramani, 2005) but was extended to two (Ghahramani et al., 2007) and later three (Teh & Görür, 2009). Our paper applies equally to all, but since our focus is on efficient streaming inference and not on particular properties of an IBP variant, we chose the two-parameter IBP to balance expositional simplicity against model flexibility.
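To make the buffet metaphor concrete, the following sketch samples a binary matrix $Z \sim \mathrm{IBP}(\alpha, \beta)$ by simulating the customers-and-dishes process exactly as described above (a minimal NumPy illustration of the standard sequential scheme, not the paper's R-IBP code; function and variable names are our own):

```python
import numpy as np

def sample_ibp(N, alpha=2.0, beta=1.0, rng=None):
    """Sample a binary matrix Z ~ IBP(alpha, beta) with N rows.

    Customer n takes existing dish k with probability m_k / (beta + n - 1),
    where m_k counts previous customers who took dish k, then samples
    Poisson(alpha * beta / (beta + n - 1)) brand-new dishes.
    """
    rng = rng or np.random.default_rng()
    dish_counts = []                 # m_k for each dish sampled so far
    rows = []
    for n in range(1, N + 1):
        # Revisit existing dishes in proportion to their popularity.
        row = [rng.random() < m / (beta + n - 1) for m in dish_counts]
        # Sample a Poisson number of new dishes, all taken by this customer.
        n_new = rng.poisson(alpha * beta / (beta + n - 1))
        row += [True] * n_new
        dish_counts = [m + z for m, z in zip(dish_counts, row)] + [1] * n_new
        rows.append(row)
    K = len(dish_counts)             # total number of dishes, Lambda_N
    Z = np.zeros((N, K), dtype=int)
    for n, row in enumerate(rows):
        Z[n, :len(row)] = row        # later customers may have more columns
    return Z

Z = sample_ibp(N=10, rng=np.random.default_rng(0))
print(Z.shape, Z.sum(axis=0))        # dish popularity decays with column index
```

Because each customer's expected number of new dishes is $\alpha\beta/(\beta + n - 1)$, the total number of columns grows only logarithmically in $N$ in expectation, which is the property underlying the logarithmic average space complexity quoted in the abstract.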
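To complete the generative model of Eq. (1), a sampled $Z$ can be paired with a concrete likelihood. Below is a hedged sketch of a Linear Gaussian feature model (one of the two feature models the paper evaluates), assuming Gaussian feature and noise distributions with hypothetical variance parameters `sigma_A` and `sigma_o`; it reuses `numpy` and the `Z` drawn by `sample_ibp` above:

```python
def sample_linear_gaussian(Z, D, sigma_A=1.0, sigma_o=0.1, rng=None):
    """Given Z ~ IBP, draw A_k ~ N(0, sigma_A^2 I) i.i.d. and
    o_n ~ N(z_n A, sigma_o^2 I), i.e. Eq. (1) with a Gaussian likelihood."""
    rng = rng or np.random.default_rng()
    N, K = Z.shape
    A = rng.normal(0.0, sigma_A, size=(K, D))          # features A_k in R^D
    O = Z @ A + rng.normal(0.0, sigma_o, size=(N, D))  # observations o_1:N
    return O, A

O, A = sample_linear_gaussian(Z, D=5, rng=np.random.default_rng(1))
```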