Published in Transactions on Machine Learning Research (09/2022)

# Probabilistic Autoencoder

**Vanessa Böhm** (vboehm@berkeley.edu)
Berkeley Center for Cosmological Physics, Department of Physics, University of California, Berkeley, CA, USA
Lawrence Berkeley National Laboratory

**Uroš Seljak** (useljak@berkeley.edu)
Berkeley Center for Cosmological Physics, Department of Physics, University of California, Berkeley, CA, USA
Lawrence Berkeley National Laboratory

Reviewed on OpenReview: https://openreview.net/forum?id=AEoYjvjKVA

## Abstract

Principal Component Analysis (PCA) minimizes the reconstruction error given a class of linear models of fixed component dimensionality. Probabilistic PCA adds a probabilistic structure by learning the probability distribution of the PCA latent space weights, thus creating a generative model. Autoencoders (AE) minimize the reconstruction error in a class of nonlinear models of fixed latent space dimensionality and outperform PCA at fixed dimensionality. Here, we introduce the Probabilistic Autoencoder (PAE) that learns the probability distribution of the AE latent space weights using a normalizing flow (NF). The PAE is fast and easy to train and achieves small reconstruction errors, high sample quality, and good performance in downstream tasks. We compare the PAE to the Variational AE (VAE), showing that the PAE trains faster, reaches a lower reconstruction error, and produces good sample quality without requiring special tuning parameters or training procedures. We further demonstrate that the PAE is a powerful model for probabilistic image reconstruction in the context of Bayesian inference of inverse problems, with inpainting and denoising applications. Finally, we identify the latent space density from the NF as a promising outlier detection metric.
## 1 Introduction

Deep generative models are powerful machine learning models that can learn complex, high-dimensional data likelihoods and generate samples from them. Because of their probabilistic formulation, generative models are becoming an indispensable tool for scientific data analysis in a range of domains including particle physics (Paganini et al., 2018; Stein et al., 2020) and cosmology (Thorne et al., 2021; Reiman et al., 2020).

Variational Autoencoders (VAEs) (Kingma & Welling, 2014; Rezende et al., 2014) are among the most popular generative models. VAEs project the data to a lower-dimensional latent space and reformulate the data likelihood estimation as a variational inference problem. Their training objective is the Evidence Lower BOund (ELBO), which approximates the true data likelihood with a variational ansatz from below. VAEs can be built with expressive architectures, enjoy the benefits of regularization through data compression, and have a firm theoretical foundation. Unlike generative adversarial networks (Goodfellow et al., 2014), another popular class of generative models, VAEs provide an estimator for the data likelihood and a posterior distribution for the latent variables.

Despite their popularity, variational autoencoders have well-known practical limitations. Successful VAE training requires finding a delicate balance between the two terms contributing to the ELBO: the distortion term, which encourages high-quality reconstructions, and the rate term, which controls the sample quality by matching the aggregate posterior with a chosen prior distribution (Alemi et al., 2018). Whether the VAE training process succeeds in striking this balance depends on a number of factors, including the network architectures, the chosen prior, and the class of allowed posterior distributions (Hoffman & Johnson, 2016).
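The rate-distortion split mentioned above is the standard decomposition of the ELBO; in the usual notation (encoder $q_\phi(z|x)$, decoder $p_\theta(x|z)$, and prior $p(z)$, which are not defined in this excerpt) it reads:

```latex
\log p_\theta(x) \;\ge\; \mathrm{ELBO}
  = \underbrace{\mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big]}_{-\,\text{distortion}}
  \;-\; \underbrace{D_{\mathrm{KL}}\big(q_\phi(z|x)\,\big\|\,p(z)\big)}_{\text{rate}}
```

Maximizing the first term alone favors faithful reconstructions, while the KL term pulls the encoded posterior toward the prior; over-weighting either term is the source of the balance problems discussed here.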
In some cases, overly powerful decoders can decouple the latent space from the input (Bowman et al., 2016; Chen et al., 2017) and lead to posterior collapse (van den Oord et al., 2017). A long list of works has dissected and studied the training behavior of VAEs (Alemi et al., 2018; Hoffman & Johnson, 2016) and suggested modifications to remedy common issues. Many fixes add complexity to the VAE model, e.g. by modifying or annealing the ELBO objective (Bowman et al., 2016; Alemi et al., 2017; Higgins et al., 2017; Makhzani et al., 2015), choosing more expressive posterior distributions (Kingma et al., 2016; Rezende & Mohamed, 2015; Salimans et al., 2015; Tran et al., 2016), or using more flexible priors (Bauer & Mnih, 2019; Chen et al., 2017; Tomczak & Welling, 2018).

In this work we take a different approach. We give up on the variational ansatz that lies at the heart of VAEs and instead suggest a conceptually simple model with stable training properties. The Probabilistic Autoencoder (PAE) is motivated by probabilistic principal component analysis (Tipping & Bishop, 1999) and consists of an Autoencoder (AE), which is interpreted probabilistically after training by means of a Normalizing Flow (NF). Both components are comparatively easy to set up and train, and this two-stage setup allows the practitioner to optimize their hyper-parameters (model architecture, training procedure, etc.) independently.

We claim that the PAE is a viable alternative to VAEs despite its conceptual simplicity. We back this claim empirically through ablation studies. Specifically, we compare the performance of the PAE to that of equivalent VAEs in a number of tasks which we think are especially relevant for practical applications: data compression (reconstruction quality), data generation, anomaly detection, and probabilistic data denoising and imputation.
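The two-stage structure described above can be sketched in a few lines. The following is a minimal, hypothetical NumPy illustration (not the authors' implementation): stage one fits an autoencoder by minimizing reconstruction error (here a linear AE obtained via SVD, so that the example stays self-contained), and stage two fits a density model to the resulting latent codes. A real PAE uses a normalizing flow for stage two; here a full-covariance Gaussian stands in for the flow to keep the sketch dependency-free.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy correlated data standing in for a training set (hypothetical).
X = rng.normal(size=(1000, 8)) @ rng.normal(size=(8, 8))

# --- Stage 1: fit the autoencoder (linear AE via SVD for this sketch) ---
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
K = 3  # latent space dimensionality

def encode(x):
    return (x - mu) @ Vt[:K].T

def decode(z):
    return z @ Vt[:K] + mu

# --- Stage 2: fit a density model to the latent codes ---
# A PAE trains a normalizing flow here; a Gaussian is a stand-in.
Z = encode(X)
z_mean, z_cov = Z.mean(axis=0), np.cov(Z.T)

def log_density(z):
    """Latent-space log density; with an NF this would be the flow's log-prob."""
    d = z - z_mean
    _, logdet = np.linalg.slogdet(z_cov)
    return -0.5 * (d @ np.linalg.solve(z_cov, d) + logdet + K * np.log(2 * np.pi))

# Generation: sample from the latent density, then decode to data space.
z_sample = rng.multivariate_normal(z_mean, z_cov)
x_sample = decode(z_sample)
```

Because the two stages share no parameters, the AE architecture and the density model can be tuned independently, which is the practical advantage the text points to; `log_density` is also the quantity used later as an outlier detection metric.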
Our primary contributions are:

1. a simple generative model designed with ease of use and training in mind;
2. a quantitative comparison of this model to variational autoencoders, showing that it performs relevant tasks at comparable quality and accuracy without variational inference;
3. a new anomaly detection metric based on NF density estimation in latent space, which is a byproduct of the PAE but can also be used within the VAE framework.

We make all of our code publicly available.¹

## 2 Motivation: Probabilistic PCA

The probabilistic autoencoder is motivated by linear Principal Component Analysis (PCA) and its probabilistic interpretation, probabilistic principal component analysis (Tipping & Bishop, 1999), which provides a PCA-based data likelihood estimate. A principal component analysis of data x ∈ ℝᴺ at fixed latent space dimensionality K (K