Published as a conference paper at ICLR 2017

VARIATIONAL RECURRENT ADVERSARIAL DEEP DOMAIN ADAPTATION

Sanjay Purushotham*, Wilka Carvalho*, Tanachat Nilanon, Yan Liu
Department of Computer Science
University of Southern California
Los Angeles, CA 90089, USA
{spurusho,wcarvalh,nilanon,yanliu.cs}@usc.edu
* Co-first authors

ABSTRACT

We study the problem of learning domain-invariant representations for time series data while transferring the complex temporal latent dependencies between domains. Our model, termed Variational Recurrent Adversarial Deep Domain Adaptation (VRADA), is built atop a variational recurrent neural network (VRNN) and is trained adversarially to capture complex temporal relationships that are domain-invariant. This is, as far as we know, the first model to capture and transfer temporal latent dependencies of multivariate time-series data. Through experiments on real-world multivariate healthcare time-series datasets, we empirically demonstrate that learning temporal dependencies improves our model's ability to create domain-invariant representations, allowing it to outperform current state-of-the-art deep domain adaptation approaches.

1 INTRODUCTION

Many real-world applications require effective machine learning algorithms that can learn invariant representations across related time-series datasets, for example precision medicine for patients of various age groups, or mobile application recommendation for users in different locations. In these examples, while the domains (i.e. age group and location) may vary, there exist common predictive patterns that can aid in inferring knowledge from one domain to another. More often than not, some domains have a significantly larger number of observations than others (e.g., respiratory failure in adults vs. children). Therefore, effective domain adaptation of time-series data is in great demand.

The general approach to tackling domain adaptation has been explored under many facets, including reducing the domain discrepancy between the source and target domains (Ben-David et al. (2007)), instance re-weighting (Jiang & Zhai (2007)), subspace alignment (Fernando et al. (2013)), and deep learning (Tzeng et al. (2015); Ganin & Lempitsky (2014)). Many of these approaches work very well for non-sequential data but are not suitable for multivariate time-series data, as they do not usually capture the temporal dependencies present in the data. For sequential data, earlier work has successfully used dynamic Bayesian networks (Huang & Yates (2009)) and recurrent neural networks (Socher et al. (2011)) to learn latent feature representations that are domain-invariant. Unfortunately, these works were either not flexible enough to model non-linear dynamics or did not explicitly capture and transfer the complex latent dependencies needed to perform domain adaptation of time-series data.

In this paper, we address this problem with a model that learns temporal latent dependencies (i.e. dependencies between the latent variables across timesteps) that can be transferred across domains that experience different distributions in their features. We draw inspiration from the Variational Recurrent Neural Network (Chung et al. (2016)) and use variational methods to produce a latent representation that captures underlying temporal latent dependencies.
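As background for how such a variational recurrent representation can be formed, the following is a minimal sketch of a VRNN-style encoder, in which the approximate posterior over each z_t is conditioned on the current input and a recurrent state summarizing the past. This is written in PyTorch as an illustration only; the module and variable names (e.g., `VariationalRecurrentEncoder`, `phi_x`, `phi_z`) are ours rather than the paper's, and the sketch omits the prior network, decoder, and KL/reconstruction losses of a full VRNN.

```python
import torch
import torch.nn as nn

class VariationalRecurrentEncoder(nn.Module):
    """Sketch of a VRNN-style encoder: at each timestep the approximate
    posterior over z_t depends on the current input x_t and the recurrent
    state h_{t-1}, which summarizes x_{<t} and z_{<t}."""

    def __init__(self, x_dim, z_dim, h_dim):
        super().__init__()
        self.phi_x = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.phi_z = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU())
        # approximate posterior q(z_t | x_{<=t}, z_{<t})
        self.enc_mean = nn.Linear(2 * h_dim, z_dim)
        self.enc_logvar = nn.Linear(2 * h_dim, z_dim)
        # recurrence: h_t = f(x_t, z_t, h_{t-1})
        self.rnn = nn.GRUCell(2 * h_dim, h_dim)

    def forward(self, x):
        # x: (seq_len, batch, x_dim)
        batch = x.size(1)
        h = x.new_zeros(batch, self.rnn.hidden_size)
        z_samples = []
        for x_t in x:  # iterate over timesteps
            x_feat = self.phi_x(x_t)
            enc_in = torch.cat([x_feat, h], dim=1)
            mean, logvar = self.enc_mean(enc_in), self.enc_logvar(enc_in)
            # reparameterization trick: z_t = mean + sigma * eps
            z_t = mean + torch.randn_like(mean) * (0.5 * logvar).exp()
            z_samples.append(z_t)
            h = self.rnn(torch.cat([x_feat, self.phi_z(z_t)], dim=1), h)
        # the final latent sample (or h) can serve as the sequence representation
        return torch.stack(z_samples), h
```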
Motivated by the theory of domain adaptation (Ben-David et al. (2010)), we then perform adversarial training on this latent representation, similarly to the Domain Adversarial Neural Network (DANN) (Ganin et al. (2016)), to make the representations invariant across domains. We call our model the Variational Recurrent Adversarial Deep Domain Adaptation (VRADA) model. As far as we know, this is the first model capable of accomplishing unsupervised domain adaptation while transferring temporal latent dependencies for complex multivariate time-series data.

Figure 1: A story of temporal dependency and domain invariance. Panels (a) DNN, (b) R-DANN, and (c) VRADA show t-SNE projections of the latent representations of DNN, R-DANN, and our VRADA model for adaptation from Adult-AHRF to Child-AHRF data. Source data is represented with red circles and target data with blue circles. From left to right, one can see that domain adaptation results in mixing of the source and target domain data distributions. One can also see how encoding more temporal dependency into the latent representation induces more domain-invariant representations: as models capture more underlying factors of variation, the post-domain-adaptation representations gradually smoothen and become more evenly dispersed, indicating that temporal dependency acts synergistically with domain adaptation.

Figure 1 shows an example of the domain-invariant representations learned by different deep learning models, including our VRADA model. From this figure, we can see that our model (VRADA) shows better mixing of the domain distributions than the competing models, indicating that it learns better domain-invariant representations.

In order to prove the efficacy of our model, we perform domain adaptation using real-world healthcare time-series data. We choose healthcare data for two primary reasons. (1) Currently, a standard protocol in healthcare is to build, evaluate, and deploy machine learning models for particular datasets that may perform poorly on unseen datasets with different distributions. For example, models built around patient data from particular age groups perform poorly on other age groups because the features used to train the models have different distributions across the groups (Alemayehu & Warner (2004); Lao et al. (2004); Seshamani & Gray (2004)). Knowledge learned from one group is not transferable to the other group. Domain adaptation seems like a natural solution to this problem, as knowledge needs to be transferred across domains which share features that exhibit different distributions. (2) Healthcare data has multiple attributes recorded per patient visit, and it is longitudinal and episodic in nature. Thus, healthcare data is a suitable platform on which to study a model that seeks to capture complex temporal representations and transfer this knowledge across domains.

The rest of the paper is structured as follows. In the following section, we briefly discuss the current state-of-the-art deep domain adaptation approaches. Afterwards, we present our model mathematically, detailing how it simultaneously learns to capture temporal latent dependencies and create domain-invariant representations. In Section 4, we compare and contrast the performance of the proposed approach with other approaches on two real-world healthcare datasets, and provide an analysis of our domain-invariant representations.

2 RELATED WORK

Domain adaptation is a specific instance of transfer learning in which the feature spaces are shared but their marginal distributions are different.
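To make this setting concrete, a standard formalization of unsupervised domain adaptation is sketched below; the notation (D_S, D_T, n_S, n_T, h) is ours and is not taken verbatim from the paper.

```latex
% Unsupervised domain adaptation: labeled source data, unlabeled target data,
% a shared feature space, and differing marginal distributions.
\[
  \mathcal{D}_S = \{(x_i^S, y_i^S)\}_{i=1}^{n_S}, \qquad
  \mathcal{D}_T = \{x_j^T\}_{j=1}^{n_T},
\]
\[
  x^S, x^T \in \mathcal{X} \quad (\text{shared feature space}), \qquad
  p_S(x) \neq p_T(x) \quad (\text{different marginal distributions}).
\]
% Goal: learn a representation h(x) such that a label predictor trained on
% \{(h(x_i^S), y_i^S)\} also performs well on the unlabeled target samples h(x_j^T).
```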
Good surveys of both transfer learning and domain adaptation have been provided in several previous works (Pan & Yang (2009); Jiang (2008); Patel et al. (2015)). Domain adaptation has been thoroughly studied in computer vision (Saenko et al. (2010); Gong et al. (2012); Fernando et al. (2013)) and natural language processing (NLP) (Blitzer (2007); Foster et al. (2010)) applications. Recently, the deep learning paradigm has become popular in domain adaptation (Chen et al. (2012); Tzeng et al. (2015); Yang & Eisenstein; Long & Wang (2015)) due to its ability to learn rich, flexible, non-linear domain-invariant representations. Here, we briefly discuss two deep domain adaptation approaches which are closely related to our proposed model.

[Figure 2: Block diagram of VRADA. Blue lines show the inference process, $q_{\theta_e}(z_t \mid x_{\leq t}, z_{<t})$.]

Domain Adversarial Neural Networks (DANN)
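DANN (Ganin et al. (2016)) makes representations domain-invariant by training a domain classifier through a gradient reversal layer, so that the feature extractor is pushed to confuse the domain classifier while the label predictor is trained as usual. Below is a minimal sketch of such a layer in PyTorch; the names (`GradReverse`, `grad_reverse`, `lambd`, `domain_clf`) are ours, and the snippet illustrates the general mechanism rather than the authors' implementation.

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; multiplies gradients by -lambda in the
    backward pass, so the feature extractor learns to confuse the domain
    classifier while the domain classifier itself is trained normally."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # gradient w.r.t. x is reversed and scaled; lambd gets no gradient
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage sketch: `features` would be a learned representation (for VRADA, a
# temporal latent representation such as the final z_t / h_t of a recurrent
# encoder); `domain_clf` is any small classifier predicting source vs. target.
#
#   domain_logits = domain_clf(grad_reverse(features, lambd))
#   domain_loss = torch.nn.functional.binary_cross_entropy_with_logits(
#       domain_logits, domain_labels)
```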