# Likelihood Ratios for Out-of-Distribution Detection

Jie Ren, Peter J. Liu, Emily Fertig, Jasper Snoek, Ryan Poplin, Mark A. DePristo, and Joshua V. Dillon (Google Research); Balaji Lakshminarayanan (DeepMind)

**Abstract.** Discriminative neural networks offer little or no performance guarantees when deployed on data not generated by the same process as the training distribution. On such out-of-distribution (OOD) inputs, the prediction may not only be erroneous, but confidently so, limiting the safe deployment of classifiers in real-world applications. One such challenging application is bacteria identification based on genomic sequences, which holds the promise of early disease detection but requires a model that can output low-confidence predictions on OOD genomic sequences from new bacteria that were not present in the training data. We introduce a genomics dataset for OOD detection that allows other researchers to benchmark progress on this important problem. We investigate deep generative model-based approaches for OOD detection and observe that the likelihood score is heavily affected by population-level background statistics. We propose a likelihood ratio method for deep generative models which effectively corrects for these confounding background statistics. We benchmark the OOD detection performance of the proposed method against existing approaches on the genomics dataset and show that our method achieves state-of-the-art performance. We demonstrate the generality of the proposed method by showing that it significantly improves OOD detection when applied to deep generative models of images.
33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

## 1 Introduction

For many machine learning systems, being able to detect data that is anomalous or significantly different from the training data can be critical to maintaining safe and reliable predictions. This is particularly important for deep neural network classifiers, which have been shown to confidently misclassify such out-of-distribution (OOD) inputs into in-distribution classes (Goodfellow et al., 2014; Nguyen et al., 2015). This behaviour can have serious consequences when the predictions inform real-world decisions such as medical diagnosis: falsely classifying a healthy sample as pathogenic, or vice versa, can carry extremely high cost. Dealing with OOD inputs, also referred to as distributional shift, has been recognized as an important problem for AI safety (Amodei et al., 2016). The majority of recent work on OOD detection for neural networks is evaluated on image datasets, where the neural network is trained on one benchmark dataset (e.g. CIFAR-10) and tested on another (e.g. SVHN). While these benchmarks are important, there is a need for more realistic datasets that reflect the challenges of dealing with OOD inputs in practical applications.

Bacterial identification is an important sub-problem in many types of medical diagnosis. For example, diagnosis and treatment of infectious diseases, such as sepsis, relies on the accurate detection of bacterial infections in blood (Blauwkamp et al., 2019). Several machine learning methods have been developed to perform bacteria identification by classifying existing known genomic sequences (Patil et al., 2011; Rosen et al., 2010), including deep learning methods (Busia et al., 2018) which are state-of-the-art.
Even if neural network classifiers achieve high accuracy as measured through cross-validation, deploying them is challenging because real data is highly likely to contain genomes from unseen classes not present in the training data. Different bacterial classes continue to be discovered gradually over the years (see Figure S4 in Appendix C.1), and it is estimated that 60%–80% of genomic sequences belong to as yet unknown bacteria (Zhu et al., 2018; Eckburg et al., 2005; Nayfach et al., 2019). Training a classifier on existing bacterial classes and deploying it may result in OOD inputs being wrongly classified as one of the training classes with high confidence. In addition, OOD inputs can also be contamination from the bacteria's host genomes, such as human, plant, or fungal genomes, which also need to be detected and excluded from predictions (Ponsero & Hurwitz, 2019). Thus, a method for accurately detecting OOD inputs is critical to enable the practical application of machine learning to this important problem.

A popular and intuitive strategy for detecting OOD inputs is to train a generative model (or a hybrid model, cf. Nalisnick et al. (2019)) on training data and use it to detect OOD inputs at test time (Bishop, 1994). However, Nalisnick et al. (2018) and Choi et al. (2018) recently showed that deep generative models trained on image datasets can assign higher likelihood to OOD inputs. We report a similar failure mode for likelihood-based OOD detection using deep generative models of genomic sequences. We investigate this phenomenon and find that the likelihood can be confounded by general population-level background statistics. We propose a likelihood ratio method which uses a background model to correct for the background statistics and enhance the in-distribution-specific features for OOD detection.
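To make the classic density-based strategy concrete, here is a minimal sketch of the generic recipe (fit a generative model to in-distribution data, then threshold its log-likelihood on new inputs). This is a generic illustration, not the paper's proposed method; the function names and the quantile heuristic are our own.

```python
def ood_flags(log_likelihoods, threshold):
    """Density-based OOD rule: flag inputs whose log-likelihood under the
    in-distribution model falls below a chosen threshold."""
    return [ll < threshold for ll in log_likelihoods]

def threshold_at_quantile(in_dist_lls, q=0.05):
    """Pick the threshold as (roughly) the q-quantile of held-out
    in-distribution log-likelihoods, so about a fraction q of
    in-distribution inputs are falsely flagged as OOD."""
    s = sorted(in_dist_lls)
    idx = max(0, min(len(s) - 1, int(q * len(s))))
    return s[idx]
```

As Sections 2 and 3 show, this recipe fails when the raw likelihood is dominated by background statistics, which is the gap the likelihood ratio method addresses.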
While our investigation was motivated by the genomics problem, we found our methodology to be more general; it also shows positive results on image datasets. In summary, our contributions are:

- We create a realistic benchmark for OOD detection, motivated by challenges faced in applying deep learning models to genomics data. The sequential nature of genomic sequences provides a new modality, and we hope it encourages the OOD research community to contribute to machine learning that matters (Wagstaff, 2012).
- We show that the likelihood from deep generative models can be confounded by background statistics. We propose a likelihood ratio method for OOD detection, which significantly outperforms the raw likelihood for deep generative models on image datasets.
- We evaluate existing OOD methods on the proposed genomics benchmark and demonstrate that our method achieves state-of-the-art (SOTA) performance on this challenging problem.

## 2 Background

Suppose we have an in-distribution dataset $\mathcal{D}$ of $(x, y)$ pairs sampled from the distribution $p^*(x, y)$, where $x$ is the extracted feature vector or raw input and $y \in \mathcal{Y} := \{1, \ldots, k, \ldots, K\}$ is the label assigning membership to one of $K$ in-distribution classes. For simplicity, we assume inputs to be discrete, i.e. $x_d \in \{A, C, G, T\}$ for genomic sequences and $x_d \in \{0, \ldots, 255\}$ for images. In general, OOD inputs are samples $(x, y)$ generated from an underlying distribution other than $p^*(x, y)$. In this paper, we consider an input $(x, y)$ to be OOD if $y \notin \mathcal{Y}$: that is, the class $y$ does not belong to one of the $K$ in-distribution classes. Our goal is to accurately detect whether an input $x$ is OOD or not. Many existing methods involve computing statistics using the predictions of (ensembles of) discriminative classifiers trained on in-distribution data, e.g. taking the confidence or entropy of the predictive distribution $p(y|x)$ (Hendrycks & Gimpel, 2016; Lakshminarayanan et al., 2017).
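The confidence and entropy scores from the classifier-based baselines mentioned above can be sketched in a few lines. This is a plain-Python illustration with toy logits; the helper names are ours, not from any particular library.

```python
import math

def softmax(logits):
    """Convert classifier logits to a probability distribution."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_score(logits):
    """Maximum softmax probability: LOW values suggest OOD input."""
    return max(softmax(logits))

def entropy_score(logits):
    """Predictive entropy: HIGH values suggest OOD input."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0)

# A peaked prediction (likely in-distribution) vs. a flat one (possibly OOD).
in_dist_logits = [8.0, 0.5, 0.2, 0.1]
ood_logits = [1.1, 1.0, 0.9, 1.0]
```

For ensembles (Lakshminarayanan et al., 2017), the same scores are typically computed on the averaged predictive distribution across ensemble members.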
An alternative is to use generative model-based methods, which are appealing as they do not require labeled data and directly model the input distribution. These methods fit a generative model $p(x)$ to the input data and then evaluate the likelihood of new inputs under that model. However, recent work has highlighted significant issues with this approach for OOD detection on images, showing that deep generative models such as Glow (Kingma & Dhariwal, 2018) and PixelCNN (Oord et al., 2016; Salimans et al., 2017) sometimes assign higher likelihoods to OOD inputs than to in-distribution inputs. For example, Nalisnick et al. (2018) and Choi et al. (2018) show that Glow models trained on the CIFAR-10 image dataset assign higher likelihood to OOD inputs from the SVHN dataset than to in-distribution CIFAR-10 inputs; Nalisnick et al. (2018), Shafaei et al. (2018) and Hendrycks et al. (2018) show failure modes of PixelCNN and PixelCNN++ for OOD detection.

**Failure of density estimation for OOD detection.** We investigate whether density estimation-based methods work well for OOD detection in genomics. As a motivating observation, we train a deep generative model, more precisely an LSTM (Hochreiter & Schmidhuber, 1997), on in-distribution genomic sequences (composed of {A, C, G, T}) and plot the log-likelihoods of both in-distribution and OOD inputs (see Section 5.2 for the dataset and full experimental details). Figure 1a shows that the histogram of log-likelihoods for OOD sequences largely overlaps with that of in-distribution sequences, with an AUROC of 0.626, making the likelihood unsuitable for OOD detection. Our observations show a failure mode of deep generative models for OOD detection on genomic sequences and are complementary to earlier work that showed similar results for deep generative models on images (Nalisnick et al., 2018; Choi et al., 2018).

Figure 1: (a) Log-likelihood hardly separates in-distribution and OOD inputs, with an AUROC of 0.626.
(b) The log-likelihood is heavily affected by the GC-content of a sequence.

When investigating this failure mode, we discovered that the log-likelihood under the model is heavily affected by a sequence's GC-content; see Figure 1b. GC-content is defined as the percentage of bases that are either G or C. It is used widely in genomic studies as a basic statistic describing overall genomic composition (Sueoka, 1962), and studies have shown that bacteria have an astonishing diversity of genomic GC-content, from 16.5% to 75% (Hildebrand et al., 2010). Bacteria from similar groups tend to have similar GC-content at the population level, but they also have characteristic biological patterns that distinguish them well from each other. The confounding effect of GC-content in Figure 1b makes the likelihood less reliable as a score for OOD detection: an OOD input may receive a higher likelihood than an in-distribution input simply because it has high GC-content (cf. the bottom right of Figure 1b), and not necessarily because it contains characteristic patterns specific to the in-distribution bacterial classes.

## 3 Likelihood Ratio for OOD Detection

We first describe the high-level idea and then describe how to adapt it to deep generative models.

**High-level idea.** Assume that an input $x$ is composed of two components: (1) a background component characterized by population-level background statistics, and (2) a semantic component characterized by patterns specific to the in-distribution data. For example, images can be modeled as backgrounds plus objects; text can be considered a combination of high-frequency stop words plus semantic words (Luhn, 1960); genomes can be modeled as background sequences plus motifs (Bailey & Elkan, 1995; Reinert et al., 2009). More formally, for a $D$-dimensional input $x = x_1, \ldots, x_D$, we assume that there exists an unobserved variable $z = z_1, \ldots, z_D$, where $z_d \in \{B, S\}$ indicates whether the $d$th dimension of the input, $x_d$, is generated from the Background model or the Semantic model. Grouping the semantic and background parts, the input can be factored as $x = \{x_B, x_S\}$, where $x_B = \{x_d \mid z_d = B,\ d = 1, \ldots, D\}$ and $x_S$ is defined analogously. For simplicity, assume that the background and semantic components are generated independently. The likelihood can then be decomposed as follows:

$$p(x) = p(x_B)\, p(x_S). \tag{1}$$

When training and evaluating deep generative models, we typically do not distinguish between these two terms in the likelihood. However, we may want to use just the semantic likelihood $p(x_S)$ to avoid the likelihood being dominated by the background term (e.g. for an OOD input with the same background but a different semantic component). In practice, we only observe $x$, and it is not always easy to split an input into background and semantic parts $\{x_B, x_S\}$. As a practical alternative, we propose training a background model on perturbed inputs. Adding the right amount of perturbation to inputs corrupts the semantic structure in the data, so a model trained on perturbed inputs captures only the population-level background statistics.

Assume that $p_\theta(\cdot)$ is a model trained using in-distribution data, and $p_{\theta_0}(\cdot)$ is a background model that captures general background statistics. We propose a likelihood ratio statistic defined as

$$\mathrm{LLR}(x) = \log \frac{p_\theta(x)}{p_{\theta_0}(x)} = \log \frac{p_\theta(x_B)\, p_\theta(x_S)}{p_{\theta_0}(x_B)\, p_{\theta_0}(x_S)}, \tag{2}$$

where we use the factorization from Equation 1. Assume that (i) both models capture the background information equally well, that is $p_\theta(x_B) \approx p_{\theta_0}(x_B)$, and (ii) $p_\theta(x_S)$ is more peaked than $p_{\theta_0}(x_S)$, as the former is trained on data containing semantic information while the latter model $\theta_0$ is trained on data with noise perturbations. Then the likelihood ratio can be approximated as

$$\mathrm{LLR}(x) \approx \log p_\theta(x_S) - \log p_{\theta_0}(x_S). \tag{3}$$

After taking the ratio, the likelihood for the background component $x_B$ cancels out, and only the likelihood for the semantic component $x_S$ remains. Our method produces a background-contrastive score that captures the significance of the semantics compared with the background model.

**Likelihood ratio for auto-regressive models.** Auto-regressive models are a popular choice for generating images (Oord et al., 2016; Van den Oord et al., 2016; Salimans et al., 2017), sequence data such as genomics (Zou et al., 2018; Killoran et al., 2017) and drug molecules (Olivecrona et al., 2017; Gupta et al., 2018), and text (Jozefowicz et al., 2016). In auto-regressive models, the log-likelihood of an input can be expressed as $\log p_\theta(x) = \sum_{d=1}^{D} \log p_\theta(x_d \mid x_{<d})$, where $x_{<d} = x_1, \ldots, x_{d-1}$.

Figure 4: The design of the training, validation, and test datasets for genomic sequence classification, including in-distribution and OOD data.

Figure 5: (a) The likelihood-ratio score is roughly independent of the GC-content, which makes it less susceptible to background statistics and better suited for OOD detection. (b) ROC curves and AUROCs for OOD detection using the likelihood and the likelihood ratio. (c) Correlation between the AUROC of OOD detection and the distance to the in-distribution classes for the Likelihood Ratio and Ensemble methods.

**OOD detection correlates with distance to in-distribution.** We investigate the effect of the distance between an OOD class and the in-distribution classes on the performance of OOD detection. To measure this distance, we randomly select a representative genome from each of the in-distribution and OOD classes. We use the state-of-the-art alignment-free method for genome comparison, $d_2^S$ (Ren et al., 2018a; Reinert et al., 2009), to compute the genetic distance between each pair of genomes in the set.
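Stepping back, the two ingredients of the proposed method from Section 3, perturbing inputs to build the background model's training data and scoring by the log-likelihood difference, can be sketched as follows. This is an illustrative sketch, not the paper's released code; `log_p_theta` and `log_p_theta0` stand in for per-sequence log-likelihoods of any trained in-distribution and background auto-regressive models, and the mutation rate `mu` is a hyperparameter to tune.

```python
import random

def perturb(seq, mu, alphabet="ACGT", rng=None):
    """Randomly substitute each position with probability mu. This corrupts
    semantic structure (motifs) while roughly preserving population-level
    background statistics, yielding training data for the background model."""
    rng = rng or random.Random(0)
    return "".join(rng.choice(alphabet) if rng.random() < mu else base
                   for base in seq)

def llr(log_p_theta, log_p_theta0, x):
    """Likelihood-ratio score LLR(x) = log p_theta(x) - log p_theta0(x).
    Larger values indicate stronger in-distribution (semantic) evidence."""
    return log_p_theta(x) - log_p_theta0(x)
```

In the full method, `log_p_theta0` would come from a model trained on `perturb`-ed in-distribution sequences, and `llr` replaces the raw log-likelihood as the OOD score.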
This genetic distance is calculated based on the similarity between the normalized nucleotide word frequencies ($k$-tuples) of the two genomes, and studies have shown that it reflects true evolutionary distances between genomes (Chan et al., 2014; Bernard et al., 2016; Lu et al., 2017). For each OOD class, we use the minimum distance between the genome in that class and all genomes in the in-distribution classes as the measure of the genetic distance between this OOD class and the in-distribution. Not surprisingly, the AUROC for OOD detection is positively correlated with the genetic distance (Figure 5c): an OOD class far from the in-distribution is easier to detect. Comparing our likelihood ratio method with one of the best classifier-based methods, the ensemble method, we observe that our method generally achieves higher AUROC across the different OOD classes. Furthermore, the Pearson correlation coefficient (PCC) between the minimum distance and the AUROC is 0.570 for the likelihood ratio method, compared with 0.277 for the classifier-based ensemble of 20 models. The dataset and code for the genomics study are available at https://github.com/google-research/google-research/tree/master/genomics_ood.

## 6 Discussion and Conclusion

We investigate deep generative model-based methods for OOD detection and show that the likelihood of auto-regressive models can be confounded by background statistics, providing an explanation for the failure of PixelCNN for OOD detection observed in recent work (Nalisnick et al., 2018; Hendrycks et al., 2018; Shafaei et al., 2018). We propose a likelihood ratio method that alleviates this issue by contrasting the likelihood against a background model. We show that our method effectively corrects for the background components and significantly improves the accuracy of OOD detection on both image and genomic datasets.
Finally, we create and release a realistic genomic sequence dataset for OOD detection which highlights an important real-world problem, and we hope that it serves as a valuable OOD detection benchmark for the research community.

## Acknowledgments

We thank Alexander A. Alemi, Andreea Gane, Brian Lee, D. Sculley, Eric Jang, Jacob Burnim, Katherine Lee, Matthew D. Hoffman, Noah Fiedel, Rif A. Saurous, Suman Ravuri, Thomas Colthurst, Yaniv Ovadia, the Google Brain Genomics team, and the Google TensorFlow Probability team for helpful feedback and discussions.

## References

Ahlgren, N. A., Ren, J., Lu, Y. Y., Fuhrman, J. A., and Sun, F. Alignment-free oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences. Nucleic Acids Research, 45(1):39–53, 2016.

Alemi, A. A., Fischer, I., and Dillon, J. V. Uncertainty in the variational information bottleneck. arXiv preprint arXiv:1807.00906, 2018.

Alipanahi, B., Delong, A., Weirauch, M. T., and Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nature Biotechnology, 33(8):831, 2015.

Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., and Mané, D. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.

Bailey, T. L. and Elkan, C. The value of prior knowledge in discovering motifs with MEME. In ISMB, volume 3, pp. 21–29, 1995.

Bernard, G., Chan, C. X., and Ragan, M. A. Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer. Scientific Reports, 6:28970, 2016.

Bishop, C. M. Novelty detection and neural network validation. IEE Proceedings — Vision, Image and Signal Processing, 141(4):217–222, 1994.

Bishop, C. M. Regularization and complexity control in feed-forward networks. In Proceedings of the International Conference on Artificial Neural Networks (ICANN), volume 95, pp. 141–148, 1995a.

Bishop, C. M. Training with noise is equivalent to Tikhonov regularization. Neural Computation, 7(1):108–116, 1995b.

Blauwkamp, T. A., Thair, S., Rosen, M. J., Blair, L., Lindner, M. S., Vilfan, I. D., Kawli, T., Christians, F. C., Venkatasubrahmanyam, S., Wall, G. D., et al. Analytical and clinical validation of a microbial cell-free DNA sequencing test for infectious disease. Nature Microbiology, 4(4):663, 2019.

Brady, A. and Salzberg, S. L. Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nature Methods, 6(9):673, 2009.

Bulatov, Y. NotMNIST dataset, 2011. URL http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html.

Busia, A., Dahl, G. E., Fannjiang, C., Alexander, D. H., Dorfman, E., Poplin, R., McLean, C. Y., Chang, P.-C., and DePristo, M. A deep learning approach to pattern recognition for short DNA sequences. bioRxiv, pp. 353474, 2018.

Chan, C. X., Bernard, G., Poirion, O., Hogan, J. M., and Ragan, M. A. Inferring phylogenies of evolving sequences without multiple sequence alignment. Scientific Reports, 4:6504, 2014.

Choi, H., Jang, E., and Alemi, A. A. WAIC, but why? Generative ensembles for robust anomaly detection. arXiv preprint arXiv:1810.01392, 2018.

Eckburg, P. B., Bik, E. M., Bernstein, C. N., Purdom, E., Dethlefsen, L., Sargent, M., Gill, S. R., Nelson, K. E., and Relman, D. A. Diversity of the human intestinal microbial flora. Science, 308(5728):1635–1638, 2005.

Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

Guo, C., Pleiss, G., Sun, Y., and Weinberger, K. Q. On calibration of modern neural networks. arXiv preprint arXiv:1706.04599, 2017.

Gupta, A., Müller, A. T., Huisman, B. J., Fuchs, J. A., Schneider, P., and Schneider, G. Generative recurrent networks for de novo drug design. Molecular Informatics, 37(1-2):1700111, 2018.

He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.

Hendrycks, D. and Gimpel, K. A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136, 2016.

Hendrycks, D., Mazeika, M., and Dietterich, T. G. Deep anomaly detection with outlier exposure. arXiv preprint arXiv:1812.04606, 2018.

Hildebrand, F., Meyer, A., and Eyre-Walker, A. Evidence of selection upon genomic GC-content in bacteria. PLoS Genetics, 6(9):e1001107, 2010.

Hochreiter, S. and Schmidhuber, J. Long short-term memory. Neural Computation, 9(8):1735–1780, November 1997.

Jozefowicz, R., Vinyals, O., Schuster, M., Shazeer, N., and Wu, Y. Exploring the limits of language modeling. arXiv preprint arXiv:1602.02410, 2016.

Killoran, N., Lee, L. J., Delong, A., Duvenaud, D., and Frey, B. J. Generating and designing DNA with deep generative models. arXiv preprint arXiv:1712.06148, 2017.

Kingma, D. P. and Dhariwal, P. Glow: Generative flow with invertible 1x1 convolutions. In NeurIPS, 2018.

Lakshminarayanan, B., Pritzel, A., and Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. In NeurIPS, 2017.

LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

Lee, K., Lee, H., Lee, K., and Shin, J. Training confidence-calibrated classifiers for detecting out-of-distribution samples. arXiv preprint arXiv:1711.09325, 2017.

Lee, K., Lee, K., Lee, H., and Shin, J. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In NeurIPS, 2018.

Liang, S., Li, Y., and Srikant, R. Enhancing the reliability of out-of-distribution image detection in neural networks. arXiv preprint arXiv:1706.02690, 2017.

Lu, Y. Y., Tang, K., Ren, J., Fuhrman, J. A., Waterman, M. S., and Sun, F. CAFE: aCcelerated Alignment-FrEe sequence analysis. Nucleic Acids Research, 45(W1):W554–W559, 2017.

Luhn, H. P. Keyword-in-context index for technical literature (KWIC index). American Documentation, 11(4):288–295, 1960.

Nalisnick, E., Matsukawa, A., Teh, Y. W., Gorur, D., and Lakshminarayanan, B. Do deep generative models know what they don't know? arXiv preprint arXiv:1810.09136, 2018.

Nalisnick, E., Matsukawa, A., Teh, Y. W., Gorur, D., and Lakshminarayanan, B. Hybrid models with deep and invertible features. In ICML, 2019.

Nayfach, S., Shi, Z. J., Seshadri, R., Pollard, K. S., and Kyrpides, N. C. New insights from uncultivated genomes of the global human gut microbiome. Nature, pp. 1, 2019.

Nguyen, A., Yosinski, J., and Clune, J. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 427–436, 2015.

Olivecrona, M., Blaschke, T., Engkvist, O., and Chen, H. Molecular de-novo design through deep reinforcement learning. Journal of Cheminformatics, 9(1):48, 2017.

Oord, A. v. d., Kalchbrenner, N., and Kavukcuoglu, K. Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759, 2016.

Ovadia, Y., Fertig, E., Ren, J., Nado, Z., Sculley, D., Nowozin, S., Dillon, J. V., Lakshminarayanan, B., and Snoek, J. Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift. arXiv preprint arXiv:1906.02530, 2019.

Patil, K. R., Haider, P., Pope, P. B., Turnbaugh, P. J., Morrison, M., Scheffer, T., and McHardy, A. C. Taxonomic metagenome sequence assignment with structured output models. Nature Methods, 8(3):191, 2011.

Ponsero, A. J. and Hurwitz, B. L. The promises and pitfalls of machine learning for detecting viruses in aquatic metagenomes. Frontiers in Microbiology, 10:806, 2019.

Reinert, G., Chew, D., Sun, F., and Waterman, M. S. Alignment-free sequence comparison (I): statistics and power. Journal of Computational Biology, 16(12):1615–1634, 2009.
Ren, J., Bai, X., Lu, Y. Y., Tang, K., Wang, Y., Reinert, G., and Sun, F. Alignment-free sequence analysis and applications. Annual Review of Biomedical Data Science, 1:93–114, 2018a.

Ren, J., Song, K., Deng, C., Ahlgren, N. A., Fuhrman, J. A., Li, Y., Xie, X., and Sun, F. Identifying viruses from metagenomic data by deep learning. arXiv preprint arXiv:1806.07810, 2018b.

Rosen, G. L., Reichenberger, E. R., and Rosenfeld, A. M. NBC: the naive Bayes classification tool webserver for taxonomic classification of metagenomic reads. Bioinformatics, 27(1):127–129, 2010.

Salimans, T., Karpathy, A., Chen, X., Kingma, D. P., and Bulatov, Y. PixelCNN++: A PixelCNN implementation with discretized logistic mixture likelihood and other modifications. In ICLR, 2017.

Shafaei, A., Schmidt, M., and Little, J. J. Does your model know the digit 6 is not a cat? A less biased evaluation of "outlier" detectors. arXiv preprint arXiv:1809.04729, 2018.

Sueoka, N. On the genetic basis of variation and heterogeneity of DNA base composition. Proceedings of the National Academy of Sciences, 48(4):582–592, 1962.

Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., Graves, A., et al. Conditional image generation with PixelCNN decoders. In NeurIPS, 2016.

Wagstaff, K. L. Machine learning that matters. In ICML, 2012.

Yarza, P., Richter, M., Peplies, J., Euzeby, J., Amann, R., Schleifer, K.-H., Ludwig, W., Glöckner, F. O., and Rosselló-Móra, R. The All-Species Living Tree project: a 16S rRNA-based phylogenetic tree of all sequenced type strains. Systematic and Applied Microbiology, 31(4):241–250, 2008.

Zhou, J. and Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nature Methods, 12(10):931, 2015.

Zhu, Z., Ren, J., Michail, S., and Sun, F. Metagenomic unmapped reads provide important insights into human microbiota and disease associations. bioRxiv, pp. 504829, 2018.
Zou, J., Huss, M., Abid, A., Mohammadi, P., Torkamani, A., and Telenti, A. A primer on deep learning in genomics. Nature Genetics, pp. 1, 2018.