# NAOMI: Non-Autoregressive Multiresolution Sequence Imputation

Yukai Liu (Caltech) yukai@caltech.edu · Rose Yu (Northeastern University) roseyu@northeastern.edu · Stephan Zheng (Caltech, Salesforce) stephan.zheng@salesforce.com · Eric Zhan (Caltech) ezhan@caltech.edu · Yisong Yue (Caltech) yyue@caltech.edu

*This work was done while the author was at Caltech.*

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

## Abstract

Missing value imputation is a fundamental problem in spatiotemporal modeling, from motion tracking to the dynamics of physical systems. Deep autoregressive models suffer from error propagation, which becomes catastrophic for imputing long-range sequences. In this paper, we take a non-autoregressive approach and propose a novel deep generative model: Non-AutOregressive Multiresolution Imputation (NAOMI), which imputes long-range sequences given arbitrary missing patterns. NAOMI exploits the multiresolution structure of spatiotemporal data and decodes recursively from coarse to fine-grained resolutions using a divide-and-conquer strategy. We further enhance our model with adversarial training. When evaluated extensively on benchmark datasets from systems of both deterministic and stochastic dynamics, NAOMI demonstrates significant improvement in imputation accuracy (reducing average error by 60% compared to autoregressive counterparts) and generalization for long-range sequences.

## 1 Introduction

The problem of missing values often arises in real-life sequential data. For example, in motion tracking, trajectories often contain missing data due to object occlusion, trajectories crossing, and the instability of camera motion [1]. Missing values can introduce observational bias into the training data, making learning unstable. Hence, imputing missing values is of critical importance to downstream sequence learning tasks.

*Figure 1: Imputation process of NAOMI in a basketball play given two players (purple and blue) and 5 known observations (black dots). Missing values are imputed recursively from coarse resolution to fine-grained resolution (left to right).*

Sequence imputation has been studied for decades in the statistics literature [2, 3, 4, 5]. Most statistical techniques rely on strong assumptions about the missing patterns, such as missing at random, and do not generalize well to unseen data. Moreover, existing methods do not work well when the proportion of missing data is high and the sequence is long. Recent studies [6, 7, 8, 9] have proposed to use deep generative models for learning flexible missing patterns from sequence data. However, all existing deep generative imputation methods are autoregressive: they model the value at the current timestamp using the values from previous time steps and impute missing data in a sequential manner. Hence, autoregressive models are highly susceptible to compounding errors, which can become catastrophic for long-range sequence modeling. We observe in our experiments that existing autoregressive approaches struggle on sequence imputation tasks with long-range dynamics.

In this paper, we introduce a novel non-autoregressive approach for long-range sequence imputation. Instead of conditioning only on the previous values, we model the conditional distribution given both the history and the (predicted) future. We exploit the multiresolution nature of spatiotemporal sequences and decompose the complex dependency into simpler ones at multiple resolutions. Our model, Non-AutOregressive Multiresolution Imputation (NAOMI), employs a divide-and-conquer strategy to fill in the missing values recursively. Our method is general and can work with various learning objectives. We release an implementation of our model as an open-source project (https://github.com/felixykliu/NAOMI).
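To make the coarse-to-fine decoding order concrete, below is a minimal sketch of one divide-and-conquer schedule: the step between two known observations closest to their midpoint is imputed first, then each half is subdivided recursively. The function name and the exact midpoint rule here are illustrative assumptions for exposition, not the authors' implementation, which additionally conditions each imputed step on learned hidden states summarizing both history and predicted future.

```python
# Illustrative sketch (not the paper's code): the order in which missing
# steps between two observed endpoints would be filled by a recursive
# midpoint (divide-and-conquer) schedule.

def imputation_order(left, right, order=None):
    """Return the order in which missing steps strictly between the
    observed indices `left` and `right` are imputed."""
    if order is None:
        order = []
    if right - left <= 1:              # no missing steps in between
        return order
    mid = (left + right) // 2          # coarsest resolution: the midpoint
    order.append(mid)
    imputation_order(left, mid, order)   # recurse into each half at
    imputation_order(mid, right, order)  # progressively finer resolutions
    return order

# Example: observations at t=0 and t=8; steps 1..7 are missing.
print(imputation_order(0, 8))  # [4, 2, 1, 3, 6, 5, 7]
```

Note how each imputed step (e.g., t=4) immediately becomes a conditioning endpoint for the finer resolutions inside its half, so every prediction is anchored on both sides rather than only on the past.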
In summary, our contributions are as follows:

- We propose a novel non-autoregressive decoding procedure for deep generative models that can impute missing values for spatiotemporal sequences with long-range dependencies.
- We introduce adversarial training using the generative adversarial imitation learning objective with a fully differentiable generator to reduce variance.
- We conduct exhaustive experiments on benchmark sequence datasets including traffic time series, billiards, and basketball trajectories. Our method demonstrates a 60% improvement in accuracy and generates realistic sequences given arbitrary missing patterns.

## 2 Related Work

**Missing Value Imputation.** Existing missing value imputation approaches roughly fall into two categories: statistical methods and deep generative models. Statistical methods often impose strong assumptions on the missing patterns. For example, mean/median averaging [4], linear regression [2], MICE [10], and k-nearest neighbours [11] can only handle data missing at random. Latent variable models with the EM algorithm [12] can impute data missing not at random, but are restricted to certain parametric models. Deep generative models offer a flexible framework for missing data imputation. For instance, [13, 6, 14] develop variants of recurrent neural networks to impute time series. [8, 9, 7] leverage generative adversarial training [15] to learn complex missing patterns. However, all existing imputation models are autoregressive.

**Non-Autoregressive Modeling.** Non-autoregressive models have gained competitive advantages over autoregressive models in natural language processing [16, 17, 18] and speech [19]. For instance, [19] uses a normalizing flow model [20] to train a parallel feed-forward network for speech synthesis. For neural machine translation, [16] introduce a latent fertility model with a sequence of discrete latent variables. Similarly, [17, 18] propose fully deterministic models to reduce the amount of supervision. All these works highlight the strength of non-autoregressive models in decoding sequence data in a scalable fashion. Our work is the first non-autoregressive model for sequence imputation tasks, with a novel recursive decoding algorithm.

**Generative Adversarial Training.** Generative adversarial networks (GANs) [15] introduce a discriminator to replace the maximum likelihood objective, which has sparked a new paradigm of generative modeling. For sequence data, using a discriminator for the entire sequence ignores the sequential dependency and can suffer from mode collapse. [21, 22] develop imitation learning and reinforcement learning to train GANs in the sequential setting. [21] propose generative adversarial imitation learning to combine GANs and inverse reinforcement learning. [22] develop GANs for discrete sequences using reinforcement learning. We use an imitation learning formulation with a differentiable policy.

**Multiresolution Generation.** Our method bears affinity with multiresolution generative models for images, such as Progressive GAN [23] and multiscale autoregressive density estimation [24]. The key difference is that [23, 24] only capture spatial multiresolution structures and assume additive models for different resolutions. We deal with multiresolution spatiotemporal structures and generate predictions recursively. Our method is fundamentally different from hierarchical sequence models [25, 26, 27], as it only keeps track of the most relevant hidden states and updates them on the fly, which is memory efficient and much faster to train.
## 3 Non-Autoregressive Multiresolution Sequence Imputation

Let $X = (x_1, x_2, \ldots, x_T)$ be a sequence of $T$ observations, where each time step $x_t \in \mathbb{R}^D$. $X$ has missing data, indicated by a masking sequence $M = (m_1, m_2, \ldots, m_T)$, where the mask $m_t$ is zero whenever $x_t$ is missing. Our goal is to replace the missing data with reasonable values for a collection of sequences.

A common practice for missing value imputation is to directly model the distribution of the incomplete sequences. One can factorize the probability as $p(x_1, \ldots, x_T) = \prod_t p(x_t \mid x_{<t})$.
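As a toy illustration of this notation (not from the paper), the snippet below builds a sequence $X$ of $T$ observations in $\mathbb{R}^D$ and a binary mask $M$ with $m_t = 0$ at missing steps; under the autoregressive factorization above, an imputer would fill the zeroed rows left to right, conditioning only on $x_{<t}$.

```python
import numpy as np

# Toy illustration of the notation: X is a sequence of T observations in
# R^D, and M is a masking sequence with m_t = 0 wherever x_t is missing.
T, D = 10, 2
rng = np.random.default_rng(0)

X = rng.standard_normal((T, D))   # complete sequence (ground truth, for illustration)
M = rng.integers(0, 2, size=T)    # m_t = 0 marks a missing time step
M[0] = M[-1] = 1                  # assume the endpoints are observed

X_incomplete = X * M[:, None]     # missing rows zeroed out, to be imputed

print("mask:", M)
print("incomplete sequence:\n", X_incomplete)
```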