# Search Engine Guided Neural Machine Translation

Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O.K. Li
The University of Hong Kong / New York University, CIFAR Azrieli Global Scholar
{jiataogu, wangyong, vli}@eee.hku.hk, kyunghyun.cho@nyu.edu

## Abstract

In this paper, we extend an attention-based neural machine translation (NMT) model by allowing it to access an entire training set of parallel sentence pairs even after training. The proposed approach consists of two stages. In the first stage, the *retrieval stage*, an off-the-shelf, black-box search engine is used to retrieve a small subset of sentence pairs from the training set given a source sentence. These pairs are further filtered using a fuzzy matching score based on edit distance. In the second stage, the *translation stage*, a novel translation model, called search engine guided NMT (SEG-NMT), seamlessly uses both the source sentence and the set of retrieved sentence pairs to perform the translation. Empirical evaluation on three language pairs (En-Fr, En-De, and En-Es) shows that the proposed approach significantly outperforms the baseline, and that the improvement is larger when more relevant sentence pairs are retrieved.

## Introduction

Neural machine translation is a recently proposed paradigm in machine translation, where a single neural network, often consisting of encoder and decoder recurrent networks, is trained end-to-end to map from a source sentence to its corresponding translation (Bahdanau, Cho, and Bengio 2014; Cho et al. 2014; Sutskever, Vinyals, and Le 2014; Kalchbrenner and Blunsom 2013). The success of neural machine translation, which has already been adopted by major industry players in machine translation (Wu et al. 2016; Crego et al. 2016), is often attributed to the advances in building and training recurrent networks as well as the availability of large-scale parallel corpora for machine translation.
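The fuzzy matching filter used in the retrieval stage can be sketched as follows. The token-level edit distance is the standard dynamic-programming Levenshtein distance; the normalization by the longer sentence length and the 0.5 threshold in `filter_pairs` are illustrative assumptions, not the paper's exact settings.

```python
def edit_distance(a, b):
    # classic dynamic-programming Levenshtein distance over token sequences
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

def fuzzy_match_score(src, retrieved):
    # normalized similarity in [0, 1]; 1.0 means identical token sequences
    d = edit_distance(src, retrieved)
    return 1.0 - d / max(len(src), len(retrieved), 1)

def filter_pairs(src, candidates, threshold=0.5):
    # keep retrieved (source, target) pairs whose source side is close
    # enough to the current source sentence
    return [(s, t) for s, t in candidates
            if fuzzy_match_score(src, s) >= threshold]
```

In practice the search engine first narrows the whole training set down to a handful of candidates, so this quadratic-time score only needs to be computed on a small subset.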
Neural machine translation is most characteristically distinguished from existing approaches to machine translation, such as phrase-based statistical machine translation (Koehn, Och, and Marcu 2003), in that it projects a sequence of discrete source symbols into a continuous space and decodes back the corresponding translation. This allows one to easily incorporate auxiliary information into the neural machine translation system, as long as such auxiliary information can be encoded into a continuous space using a neural network. This property has been noticed recently and used for building more advanced translation systems such as multilingual translation (Firat, Cho, and Bengio 2016; Luong et al. 2015), multi-source translation (Zoph and Knight 2016; Firat et al. 2016), multimodal translation (Caglayan et al. 2016) and syntax guided translation (Nadejde et al. 2017; Eriguchi, Tsuruoka, and Cho 2017).

In this paper, we first note that this ability of neural machine translation to incorporate arbitrary meta-data allows us to naturally extend it to a model that explicitly takes into account a full training set consisting of source-target sentence pairs (in this paper we refer to them as a general translation memory). We can build a neural machine translation system that considers not only a given source sentence, which is to be translated, but also a set of training sentence pairs in the process of translation. To do so, we propose a novel extension of attention-based neural machine translation that seamlessly fuses two information streams, corresponding to the current source sentence and a set of retrieved training sentence pairs, respectively.

Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
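One simple way to fuse two information streams is a learned gate that interpolates between a context vector computed from the current source sentence and one computed from the retrieved sentence pairs. The sketch below is an illustrative scalar-gate variant, not the paper's exact architecture; `w_src`, `w_mem`, and `b` stand in for hypothetical learned parameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_fusion(ctx_src, ctx_mem, w_src, w_mem, b):
    # scalar gate g in (0, 1) computed from both context vectors;
    # the fused context is the convex combination g*ctx_src + (1-g)*ctx_mem
    z = sum(w * c for w, c in zip(w_src, ctx_src)) \
      + sum(w * c for w, c in zip(w_mem, ctx_mem)) + b
    g = sigmoid(z)
    return [g * a + (1.0 - g) * m for a, m in zip(ctx_src, ctx_mem)]
```

The gate lets the decoder lean on the retrieved pairs when they are highly relevant and fall back on the source-side context otherwise.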
A major technical challenge, other than designing such a neural machine translation system, is the scale of a training parallel corpus, which often consists of hundreds of thousands to millions of sentence pairs. We address this issue by incorporating an off-the-shelf, black-box search engine into the proposed neural machine translation system. The proposed approach first queries a search engine, which indexes the whole training set, with a given source sentence, and the proposed neural translation system then translates the source sentence while incorporating all the retrieved training sentence pairs. In this way, the proposed translation system automatically adapts to the search engine and its ability to retrieve relevant sentence pairs from a training corpus.

We evaluate the proposed search engine guided neural machine translation (SEG-NMT) on three language pairs (En-Fr, En-De, and En-Es, in both directions) from the JRC-Acquis corpus (Steinberger et al. 2006), which consists of documents from the legal domain. This corpus was selected to demonstrate the efficacy of the proposed approach when a training corpus and a set of test sentences are both from a similar domain. Our experiments reveal that the proposed approach exploits the availability of the retrieved training sentence pairs very well, achieving a significant improvement over the strong baseline of attention-based neural machine translation (Bahdanau, Cho, and Bengio 2014).

*The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)*

## Background

### Neural Machine Translation

In this paper, we start from the recently proposed, and widely used, attention-based neural machine translation model (Bahdanau, Cho, and Bengio 2014). The attention-based neural translation model is a conditional recurrent language model of the conditional distribution $p(Y \mid X)$ over all possible translations $Y = \{y_1, \dots, y_T\}$ given a source sentence $X = \{x_1, \dots, x_{T_x}\}$.
This conditional recurrent language model is an autoregressive model that estimates the conditional probability as

$$p(Y \mid X) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, X),$$

where $y_{<t} = (y_1, \dots, y_{t-1})$ denotes the previously generated target tokens.
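The autoregressive factorization translates directly into code: the sentence log-probability is the sum of per-token conditional log-probabilities, each conditioned on the target prefix and the source sentence. `toy_decoder` below is a hypothetical stand-in for the trained decoder network, returning a uniform next-token distribution so the sketch stays self-contained.

```python
import math

def toy_decoder(prefix, source, vocab):
    # hypothetical stand-in for the NMT decoder: in a real system this
    # would condition on the prefix and the encoded source sentence
    p = 1.0 / len(vocab)
    return {tok: p for tok in vocab}

def sentence_log_prob(target, source, vocab):
    # log p(Y|X) = sum_t log p(y_t | y_<t, X)
    log_p = 0.0
    for t, y_t in enumerate(target):
        dist = toy_decoder(target[:t], source, vocab)
        log_p += math.log(dist[y_t])
    return log_p
```

Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow for long sentences.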