The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

# Neural Machine Translation with Adequacy-Oriented Learning

Xiang Kong (Carnegie Mellon University) xiangk@andrew.cmu.edu
Zhaopeng Tu (Tencent AI Lab) zptu@tencent.com
Shuming Shi (Tencent AI Lab) shumingshi@tencent.com
Eduard Hovy (Carnegie Mellon University) hovy@cs.cmu.edu
Tong Zhang (Tencent AI Lab) bradymzhang@tencent.com

## Abstract

Although Neural Machine Translation (NMT) models have advanced the state of the art in machine translation, they still suffer from problems such as inadequate translation. We attribute this to the fact that standard Maximum Likelihood Estimation (MLE) cannot judge real translation quality, owing to several limitations. In this work, we propose an adequacy-oriented learning mechanism for NMT by casting translation as a stochastic policy in Reinforcement Learning (RL), where the reward is estimated by explicitly measuring translation adequacy. Benefiting from the sequence-level training of the RL strategy and a more accurate reward designed specifically for translation, our model outperforms multiple strong baselines, including (1) standard and coverage-augmented attention models with MLE-based training, and (2) advanced reinforcement and adversarial training strategies with rewards based on both word-level BLEU and character-level CHRF3. Quantitative and qualitative analyses on different language pairs and NMT architectures demonstrate the effectiveness and universality of the proposed approach.

## Introduction

During the past several years, rapid progress has been made in the field of Neural Machine Translation (NMT) (Kalchbrenner and Blunsom 2013; Sutskever, Vinyals, and Le 2014; Bahdanau, Cho, and Bengio 2015; Gehring et al. 2017; Wu et al. 2016; Vaswani et al. 2017).
Although NMT models have advanced the community, they still face the inadequate translation problem: one or multiple parts of the input sentence are not translated (Tu et al. 2016). We attribute this problem to the lack of a mechanism that guarantees the generated translation is as sufficient as a human translation. NMT models are generally trained in an end-to-end manner to maximize the likelihood of the output sentence. Maximum Likelihood Estimation (MLE), however, cannot judge the real quality of the generated translation, due to several limitations:

1. **Exposure bias** (Ranzato et al. 2016): the models are trained on the ground-truth data distribution, but at test time generate target words based on previous model predictions, which can be erroneous.
2. **Word-level loss** (Shen et al. 2016): likelihood is defined at the word level, which may not correlate well with sequence-level evaluation metrics like BLEU.
3. **Focusing more on fluency than adequacy** (Tu et al. 2017): likelihood does not measure how well the complete source information is transformed to the target side, and thus does not correlate well with translation adequacy, a metric regularly employed to assess translation quality in practice.

Some recent work partially alleviates one or two of the above problems with advanced training strategies. For example, the first two problems are tackled by sequence-level training using the REINFORCE algorithm (Ranzato et al. 2016; Bahdanau et al. 2017), minimum risk training (Shen et al. 2016), beam search optimization (Wiseman and Rush 2016), or adversarial learning (Wu et al. 2017; Yang et al. 2018).

(Zhaopeng Tu is the corresponding author. Work was mainly done when Xiang Kong was interning at Tencent AI Lab. Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.)
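To make the sequence-level training idea concrete, the following is a minimal sketch of the REINFORCE surrogate loss: a complete translation is sampled from the model, scored by a sequence-level reward, and the sample's log-likelihood is weighted by that reward (minus a baseline for variance reduction). The function and its inputs are illustrative, not the paper's actual implementation.

```python
def reinforce_loss(token_logprobs, reward, baseline=0.0):
    """Sequence-level REINFORCE surrogate loss (illustrative sketch).

    token_logprobs: log P(y_i | y_<i, x) for each token of a *sampled*
    translation; reward: a scalar sequence-level score of that sample
    (e.g. BLEU, or an adequacy-oriented metric); baseline: a variance-
    reduction term. Minimizing this loss raises the likelihood of samples
    whose reward exceeds the baseline and lowers it for the rest.
    """
    seq_logprob = sum(token_logprobs)
    return -(reward - baseline) * seq_logprob

# Toy example: the same sampled sequence under two different rewards.
good = reinforce_loss([-0.1, -0.2, -0.3], reward=0.9, baseline=0.5)
bad = reinforce_loss([-0.1, -0.2, -0.3], reward=0.2, baseline=0.5)
```

Because the reward is computed on a finished sequence, the training signal can be any task-specific metric, which is what the adequacy-oriented reward below exploits.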
The last problem can be alleviated by introducing an auxiliary reconstruction-based training objective to measure translation adequacy (Tu et al. 2017). In this work, we aim to solve all three problems in a unified framework. Specifically, we model translation as a stochastic policy in Reinforcement Learning (RL) and directly perform policy gradient updates. The RL reward is estimated on a complete sequence produced by the NMT model, and is therefore able to correlate well with a sequence-level, task-specific metric. To explicitly measure translation adequacy, we propose a novel metric called Coverage Difference Ratio (CDR), which is calculated by counting how many source words are under-translated, via directly comparing the generated translation with the human translation. Benefiting from the sequence-level training of the RL strategy and a more accurate reward designed specifically for translation, the proposed approach is able to alleviate all the aforementioned limitations of MLE-based training.

We conduct experiments on Chinese-English and German-English translation tasks, using both the RNN-based NMT model (Bahdanau, Cho, and Bengio 2015) and the recently proposed TRANSFORMER (Vaswani et al. 2017). The consistent improvements across language pairs and NMT architectures demonstrate the effectiveness and universality of the proposed approach. The proposed adequacy-oriented learning improves translation performance not only over a standard attention model, but also over a coverage-augmented attention model (Tu et al. 2016) that alleviates the inadequate translation problem at the word level. In addition, the proposed CDR score consistently outperforms the commonly used word-level BLEU (Papineni et al. 2002) and character-level CHRF3 (Popović 2015) scores in both the reinforcement learning and adversarial learning frameworks, indicating the superiority and necessity of an adequacy-oriented metric in training effective NMT models.
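The core counting idea behind a CDR-style score can be sketched as follows. This is an illustrative simplification, not the paper's formulation: it assumes we can already extract, for the reference and for the model translation, the set of source-word positions each one covers (how these coverage sets are obtained, e.g. from attention weights, is outside this sketch). Source words covered by the reference but missed by the hypothesis count as under-translated.

```python
def coverage_difference_ratio(ref_covered, hyp_covered):
    """Illustrative CDR-style adequacy score (hypothetical helper).

    ref_covered / hyp_covered: sets of source-word positions covered by
    the human reference and the model translation, respectively. The
    score penalizes under-translation: positions the reference covers
    but the hypothesis does not.
    """
    under_translated = ref_covered - hyp_covered
    # 1.0 means no reference-covered source word was missed.
    return 1.0 - len(under_translated) / max(len(ref_covered), 1)

# A hypothesis that misses 1 of the 4 source positions the reference covers:
score = coverage_difference_ratio({0, 1, 2, 3}, {0, 1, 3})
# score == 0.75
```

Unlike word-level likelihood, such a score directly quantifies how much of the source content survives into the output, which is why it serves well as an RL reward for adequacy.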
## Background

Neural Machine Translation (NMT) is an end-to-end framework that directly models the translation probability between a source sentence $\mathbf{x} = x_1, x_2, \ldots, x_J$ and a target sentence $\mathbf{y} = y_1, y_2, \ldots, y_I$ word by word:

$$P(\mathbf{y} \mid \mathbf{x}) = \prod_{i=1}^{I} P(y_i \mid y_{<i}, \mathbf{x})$$
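In log space, this factorization means the sequence score is simply the sum of per-step conditional log-probabilities. A minimal sketch (the function name and the toy distributions are illustrative, not from the paper):

```python
import math

def sequence_logprob(step_distributions, target_ids):
    """Log P(y|x) under the word-by-word factorization.

    step_distributions: one probability distribution over the target
    vocabulary per decoding step (each already conditioned on y_<i and x
    by the decoder); target_ids: the target token indices y_1..y_I.
    Returns the sum of log P(y_i | y_<i, x) over all target positions.
    """
    return sum(math.log(dist[y])
               for dist, y in zip(step_distributions, target_ids))

# Toy 3-word vocabulary, 2-step target sequence [1, 0]:
dists = [[0.2, 0.5, 0.3], [0.8, 0.1, 0.1]]
logp = sequence_logprob(dists, [1, 0])
# exp(logp) == 0.5 * 0.8 == 0.4
```

MLE training maximizes exactly this quantity over the ground-truth targets, which is why its signal stays at the word level.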