# Translating Pro-Drop Languages with Reconstruction Models

Longyue Wang, ADAPT Centre, Dublin City Univ. (longyue.wang@adaptcentre.ie)
Zhaopeng Tu, Tencent AI Lab (zptu@tencent.com)
Shuming Shi, Tencent AI Lab (shumingshi@tencent.com)
Tong Zhang, Tencent AI Lab (bradymzhang@tencent.com)
Yvette Graham, ADAPT Centre, Dublin City Univ. (yvette.graham@adaptcentre.ie)
Qun Liu, ADAPT Centre, Dublin City Univ. (qun.liu@adaptcentre.ie)

Zhaopeng Tu is the corresponding author.

## Abstract

Pronouns are frequently omitted in pro-drop languages, such as Chinese, generally leading to significant challenges with respect to the production of complete translations. To date, very little attention has been paid to the dropped pronoun (DP) problem within neural machine translation (NMT). In this work, we propose a novel reconstruction-based approach to alleviating DP translation problems for NMT models. Firstly, DPs within all source sentences are automatically annotated with parallel information extracted from the bilingual training corpus. Next, the annotated source sentence is reconstructed from hidden representations in the NMT model. With auxiliary training objectives in terms of reconstruction scores, the parameters of the NMT model are guided to produce enhanced hidden representations that embed the annotated DP information as fully as possible. Experimental results on both Chinese-English and Japanese-English dialogue translation tasks show that the proposed approach significantly and consistently improves translation performance over a strong NMT baseline, which is directly built on the training data annotated with DPs.

## Introduction

In pro-drop languages, such as Chinese and Japanese, pronouns can be omitted from sentences when it is possible to infer the referent from the context. When translating sentences from a pro-drop language to a non-pro-drop language (e.g., Chinese to English), machine translation systems generally fail to translate invisible dropped pronouns (DPs). This problem is especially severe in informal genres such as dialogues and conversation, where pronouns are more frequently omitted to make utterances more compact (Yang, Liu, and Xue 2015). For example, our analysis of a large Chinese-English dialogue corpus showed that around 26% of pronouns were dropped from the Chinese side of the corpus. This high proportion within informal genres shows the importance of addressing the challenge of translating dropped pronouns.

|       | Upper panel        | Bottom panel                               |
|-------|--------------------|--------------------------------------------|
| Input | (它) 根本没那么严重 | 这块面包很美味! 你烤的(它) 吗?              |
| Ref   | It is not that bad | The bread is very tasty! Did you bake it?  |
| SMT   | Wasn't that bad    | This bread, delicious! Did you bake?       |
| NMT   | It's not that bad  | The bread is delicious! Are you baked?     |

Table 1: Examples of translating DPs, where words in brackets are dropped pronouns that are invisible in decoding. The NMT model succeeds on a simple dummy pronoun (upper panel) but fails on a more complicated one (bottom panel); the SMT model fails in both cases.

Researchers have investigated methods of alleviating the DP problem for conventional Statistical Machine Translation (SMT) models, showing promising results (Le Nagard and Koehn 2010; Xiang, Luo, and Zhou 2013; Wang et al. 2016a). Modeling DP translation for the more advanced Neural Machine Translation (NMT) models, however, has received substantially less attention, resulting in low performance in this respect even for state-of-the-art approaches. NMT models, due to their ability to capture semantic information with distributed representations, currently manage to successfully translate only some simple DPs, but still fail on anything more complex. Table 1 shows typical examples in which our strong baseline NMT system fails to accurately translate dropped pronouns.

In this paper, we narrow the gap between correct DP translation and NMT models in order to improve translation quality for pro-drop languages with advanced models. More specifically, we propose a novel reconstruction-based approach to alleviating DP problems for NMT. Firstly, we explicitly and automatically label DPs for each source sentence in the training corpus using alignment information from the parallel corpus (Wang et al. 2016a). Accordingly, each training instance is represented as a triple $(x, y, \hat{x})$, where $x$ and $y$ are the source and target sentences, and $\hat{x}$ is the labelled source sentence.
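To make this annotation step concrete, the following is a minimal sketch of how unaligned English pronouns can be projected back into the Chinese source. The pronoun lists, the placement heuristic, and the function name `annotate_dps` are our own illustrative assumptions, not the exact procedure of Wang et al. (2016a).

```python
# Simplified illustration of DP annotation from word alignments (an assumed
# heuristic, not the exact method of Wang et al. 2016a): an English pronoun
# left unaligned by a word aligner is treated as dropped on the Chinese side
# and projected back into the source, yielding the labelled sentence x_hat.

EN_PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}
# Hypothetical mapping from English pronouns to the Chinese labels we insert.
EN2ZH = {"i": "我", "you": "你", "he": "他", "she": "她",
         "it": "它", "we": "我们", "they": "他们"}

def annotate_dps(src_tokens, tgt_tokens, alignment):
    """alignment: set of (src_idx, tgt_idx) pairs produced by a word aligner.
    Returns x_hat: the source tokens with recalled pronouns inserted."""
    aligned_tgt = {t for _, t in alignment}
    insertions = []  # (source position, pronoun label)
    for t_idx, tgt_word in enumerate(tgt_tokens):
        if tgt_word.lower() in EN_PRONOUNS and t_idx not in aligned_tgt:
            # Heuristic: place the pronoun before the source word aligned to
            # the nearest following aligned target word (or at sentence end).
            pos = min((s for s, t in alignment if t > t_idx),
                      default=len(src_tokens))
            insertions.append((pos, EN2ZH[tgt_word.lower()]))
    x_hat = []
    for s_idx, src_word in enumerate(src_tokens + [None]):
        x_hat.extend(label for p, label in insertions if p == s_idx)
        if src_word is not None:
            x_hat.append(src_word)
    return x_hat
```

For the second example in Table 1, for instance, a word aligner would typically leave the English "it" unaligned, and a sketch of this kind would re-insert 它 into the Chinese source to form $\hat{x}$.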
Next, we apply a standard encoder-decoder NMT model to translate $x$ and obtain two sequences of hidden states, one from the encoder and one from the decoder. We then introduce an additional reconstructor (Tu et al. 2017b) that reconstructs the labelled source sentence $\hat{x}$ from the hidden states of the encoder, the decoder, or both components. The central idea is to guide the corresponding hidden states to embed the recalled source-side DP information and thereby help the NMT model generate the missing pronouns from these enhanced hidden representations. To this end, the reconstructor produces a reconstruction loss, which measures how well the DPs can be recalled and serves as an auxiliary training objective. In addition, the likelihood score produced by the standard encoder-decoder measures the quality of the general translation, the reconstruction score measures the quality of DP translation, and a linear interpolation of these two scores is employed as the overall score for a given translation.
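The following minimal sketch shows how the two scores can be combined, both as a training objective and as an overall score at test time; the weight `lam` and the function names are illustrative assumptions rather than the paper's exact formulation.

```python
def joint_objective(log_p_translation, log_p_reconstruction, lam=1.0):
    """Training loss for one instance (x, y, x_hat), sketched.

    log_p_translation:    log P(y | x) from the standard encoder-decoder.
    log_p_reconstruction: log-score of reconstructing the DP-annotated source
                          x_hat from encoder and/or decoder hidden states.
    lam:                  interpolation weight (an assumed hyper-parameter).
    """
    return -(log_p_translation + lam * log_p_reconstruction)

def overall_score(log_p_translation, log_p_reconstruction, lam=1.0):
    """Score of a candidate translation when reconstruction is also applied
    at test time: a linear interpolation of the likelihood score and the
    reconstruction score, used to rank candidate translations."""
    return log_p_translation + lam * log_p_reconstruction
```

When reconstruction is used only during training, only the joint objective is needed and decoding is unchanged, which is consistent with the decoding-speed figures reported below.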
Experiments on a large-scale Chinese-English corpus show that the proposed approach significantly improves translation performance by addressing the DP translation problem. Furthermore, when reconstruction is applied only in training, it improves parameter estimation by producing better hidden representations that embed the DP information. Results show an improvement over a strong NMT baseline system of +1.35 BLEU points without any increase in decoding speed. When reconstruction is additionally applied during testing, we obtain a further +1.06 BLEU point improvement with only a slight decrease in decoding speed of approximately 18%. Experiments on a Japanese-English translation task show a significant improvement of 1.29 BLEU points, demonstrating the potential universality of the proposed approach across language pairs.

### Contributions

Our main contributions can be summarized as follows:

1. We show that although NMT models advance over SMT models in translating pro-drop languages, there is still large room for improvement;
2. We introduce a reconstruction-based approach to improve dropped pronoun translation;
3. We release a large-scale bilingual dialogue corpus consisting of 2.2M Chinese-English sentence pairs (available at https://github.com/longyuewangdcu/tvsub).

## Background

### Pro-Drop Language Translation

A pro-drop language is a language in which certain classes of pronouns are omitted, making sentences compact yet still comprehensible when the identity of the pronouns can be inferred from the context. Since pronouns carry rich anaphoric information in discourse and sentences in dialogue are generally short, DPs not only result in missing pronoun translations but can also harm the sentence structure and even the semantics of the output. Take the second case in Table 1 as an example: when the object pronoun 它 is dropped, the sentence is translated into "Are you baked?", whereas the correct translation is "Did you bake it?". Such omissions are rarely problematic for humans, who can easily recall the missing pronouns from the context. They do, however, pose challenges for machine translation from a source pro-drop language to a target non-pro-drop language, since translation of such dropped pronouns generally fails.

| Genre    | Sents | ZH-Pro | EN-Pro | DP     |
|----------|-------|--------|--------|--------|
| Dialogue | 2.15M | 1.66M  | 2.26M  | 26.55% |
| Newswire | 3.29M | 2.27M  | 2.45M  | 7.35%  |

Table 2: Extent of DPs in different genres, where "ZH-Pro" and "EN-Pro" are the numbers of pronouns on the Chinese and English sides, and "DP" is the proportion of English pronouns whose Chinese counterparts are dropped. The Dialogue corpus consists of subtitles extracted from movie subtitle websites; the Newswire corpus is CWMT2013 news data.

As shown in Table 2, we analyzed two large Chinese-English corpora and found that around 26.55% of English pronouns are dropped in the dialogue domain, while only 7.35% are dropped in the newswire domain. DPs in formal text genres (e.g., newswire) are not as common as in informal genres (e.g., dialogue), and the most frequently dropped pronoun in Chinese newswire is the third person singular 它 ("it") (Baran, Yang, and Xue 2012), which may not be crucial to translation performance. As the dropped pronoun phenomenon is more prevalent in informal genres, we evaluate our method on the dialogue domain.

### Encoder-Decoder Based NMT

Neural machine translation (Sutskever, Vinyals, and Le 2014; Bahdanau, Cho, and Bengio 2015) has greatly advanced the state of the art in machine translation. The encoder-decoder architecture is now widely employed, where the encoder summarizes the source sentence $x = x_1, x_2, \ldots, x_J$ into a sequence of hidden states $\{h_1, h_2, \ldots, h_J\}$. Based on the encoder-side hidden states, the decoder generates the target sentence $y = y_1, y_2, \ldots, y_I$ word by word with another sequence of decoder-side hidden states $\{s_1, s_2, \ldots, s_I\}$:

$$P(y \mid x) = \prod_{i=1}^{I} P(y_i \mid y_{<i}, x)$$
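As a toy illustration of this factorization (a sketch under our own assumptions: random vectors stand in for real encoder and decoder states, and a simple dot-product attention is used, which need not match the exact architecture of the paper), the probability of a target sentence can be computed from the two sequences of hidden states as follows.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Toy dimensions, purely illustrative.
J, I, d, V = 5, 4, 8, 20             # source length, target length, hidden size, vocab size
h = rng.normal(size=(J, d))          # encoder hidden states {h_1, ..., h_J}
s = rng.normal(size=(I, d))          # decoder hidden states {s_1, ..., s_I}
W_out = rng.normal(size=(V, 2 * d))  # output projection (assumed form)
y = [3, 7, 1, 2]                     # target word ids y_1 ... y_I

# P(y|x) = prod_i P(y_i | y_<i, x): each factor is a softmax over the target
# vocabulary computed from the decoder state s_i and an attention-weighted
# source context c_i (dot-product attention, assumed here for simplicity).
log_prob = 0.0
for i in range(I):
    alpha = softmax(h @ s[i])                 # attention weights over source positions
    c_i = alpha @ h                           # context vector
    scores = W_out @ np.concatenate([s[i], c_i])
    log_prob += np.log(softmax(scores)[y[i]])

print("log P(y|x) =", log_prob)
```

In the actual model the decoder states $s_i$ are computed recurrently as the target words are generated, so the same hidden states that score the translation are also the ones the reconstructor reads to recall the DP information.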