# Neural Question Generation with Answer Pivot

The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)

Xiaochuan Wang, Bingning Wang, Ting Yao, Qi Zhang, Jingfang Xu
Sogou Inc., Beijing, 100084, China
{wxc, wangbingning, yaoting, qizhang, xujingfang}@sogou-inc.com

Abstract

Neural question generation (NQG) is the task of generating questions from a given context with deep neural networks. Previous answer-aware NQG methods suffer from the problem that the selected answers focus on named entities, so most of the generated questions are trivial to answer. Answer-agnostic NQG methods reduce the bias towards named entities and increase the model's degrees of freedom, but sometimes generate unanswerable questions, which are of little value to a subsequent machine reading comprehension system. In this paper, we treat the answers as a hidden pivot for question generation and combine the question generation and answer selection processes in a joint model. We achieve state-of-the-art results on the SQuAD dataset according to both automatic metrics and human evaluation.

Introduction

Question generation (QG), or learning to ask, is a challenging problem in natural language understanding, and it has been an active field of research within the context of machine reading comprehension (MRC). Question generation has many useful applications, such as improving MRC (Yuan et al. 2017; Xiao et al. 2018) by providing more training data, generating exercises for educational purposes (Heilman and Smith 2010), and helping dialog systems such as Alexa and Google Assistant. Conventional methods for question generation rely heavily on heuristic rules; sometimes a standalone constituent or dependency parsing tool is needed to generate the handcrafted templates (Mostow and Chen 2009; Heilman and Smith 2010; Rus et al. 2010; Hussein, Elmogy, and Guirguis 2014).
These rule-based systems are brittle and have low generalizability and scalability. Recent work on question generation focuses on using deep neural networks with end-to-end training, which is also known as neural question generation. NQG is based on sequence-to-sequence methods, using mechanisms borrowed from neural machine translation, such as copy (Gülçehre et al. 2016; Zhou et al. 2017) and attention (Bahdanau, Cho, and Bengio 2014; Yuan et al. 2017; Scialom, Piwowarski, and Staiano 2019). NQG shows a great advantage over previous rule-based systems in terms of both question fluency and diversity (Duan et al. 2017; Yuan et al. 2017).

Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Figure 1: Bad cases of answer-aware and answer-agnostic NQG. Predicted answer: "Luther". Answer-aware predicted question: "Who enrolled in law school accord with his father's wishes?" Answer-agnostic predicted question: "Why does Luther sought assurances about life?" In the answer-aware case, the generated answer "Luther" is just a named entity that could be trivially inferred from the subsequent text, with little value to be asked about. In the answer-agnostic case, the generated question is fluent but cannot be answered from the paragraph.

Briefly, NQG systems are mainly divided into two streams: answer-aware and answer-agnostic. In answer-aware NQG systems, the models are given not only the paragraph but also the target answers (Yuan et al. 2017; Sun et al. 2018; Song et al. 2018; Chen, Wu, and Zaki 2019), and they learn to interact with the paragraph and the target answer to generate the specific questions. However, in real applications the answers are not provided, so one must first generate the candidate answers and then produce the questions from them. Dong et al. (2018) found that the generated answers focus on named entities, so the question types are limited to certain types. Furthermore, Golub et al.
(2017) showed that sometimes the selected answers are just arbitrary entities, regardless of their importance in the corresponding paragraph, so the generated questions are trivial and benefit the MRC systems little (Duan et al. 2017). Conversely, current NQG systems are increasingly focusing on answer-agnostic NQG (Subramanian et al. 2018; Kim et al. 2019; Scialom, Piwowarski, and Staiano 2019), which lifts the constraint of knowing the target answers before generating the questions (Du, Shao, and Cardie 2017). Being agnostic to the target answer increases the model's degrees of freedom to generate diverse questions. However, answer-agnostic NQG systems suffer from the fact that the generated questions may be unanswerable (Sun et al. 2018); an example in Figure 1 demonstrates this problem. Furthermore, the lack of corresponding answers limits their application in MRC, where the answers are requisite.

In this paper, we try to combine the advantages of answer-aware and answer-agnostic NQG in a joint model. We treat the answers as a hidden pivot when generating the questions. Concretely, we first generate the hidden answers given the paragraph, and then combine the paragraph and the induced pivot answers to produce the questions; the objective is to maximize the likelihood of the questions. In this way, the model is trained such that better hidden answer pivots yield better questions. Our model can be seen as a compromise between the answer-aware and answer-agnostic models. If we ignore the hidden answer pivot, it reduces to an answer-agnostic model where the answers are bypassed; if we feed the ground-truth answers as the pivot, it behaves like an answer-aware model. Therefore, our model can take the advantages of both worlds. We conduct thorough experiments on SQuAD (Rajpurkar et al. 2016). The proposed model consistently outperforms its pure answer-aware and answer-agnostic counterparts in terms of automatic evaluation metrics.
The human assessment demonstrates that our proposed model can generate questions that are both answerable and diverse. Furthermore, preliminary experiments show that the induced hidden answers accord with the real target answers, even though the model was trained without answer supervision. Finally, the data generated by our model also excels at improving the results of downstream MRC. The code and analysis of this paper will be publicly available.

Related Work

Automatic question generation has received increased attention from the research community. Traditional QG systems are mostly rule-based, sometimes utilizing off-the-shelf NLP tools to get the syntactic structure, dependency relations, and semantic roles of the passage (Mostow and Chen 2009; Heilman and Smith 2010; Chali and Hasan 2015). First, the target answers are generated using rules or semantic roles; next, questions are generated using handcrafted transformation rules or templates. Finally, the generated questions are ranked by features such as keyword matching degree or sentence perplexity (Hussein, Elmogy, and Guirguis 2014; Heilman 2011). The main drawbacks of these symbolic systems are that the rules and templates are expensive to create manually and lack diversity.

Recently, with the development of deep learning and large-scale question answering datasets, and motivated by neural machine translation, Du, Shao, and Cardie (2017) proposed a sequence-to-sequence (seq2seq) architecture combined with an attention mechanism, achieving promising results on the MRC dataset SQuAD. Since then, many works have been proposed to extend this preliminary framework with rich features, such as answer position (Sun et al. 2018), named entity tags (Zhou et al. 2017), or templates (Duan et al. 2017), and to incorporate a copy mechanism to copy words from the context paragraph (Song et al. 2018). However, these methods are all based on maximum likelihood estimation, which has the notorious problem of exposure bias (Ranzato et al.
2015) and other deficiencies during inference (Kumar, Ramakrishnan, and Li 2018; Chen, Wu, and Zaki 2019). Some training objectives other than teacher forcing have been introduced, such as BLEU score (Kumar, Ramakrishnan, and Li 2018), generated question perplexity (Yuan et al. 2017), or word embedding similarities (Chen, Wu, and Zaki 2019). However, Hosking and Riedel (2019) found that although these policy-gradient methods lead to increases in metrics such as BLEU, the metrics are poorly aligned with human judgment, and the model simply learns to exploit the weaknesses of the reward source.

While most NQG models focus on the answer-aware setting, answer-agnostic NQG has recently attracted more and more attention. In the case that only the input passage is given, the system should automatically identify question-worthy parts within the passage and generate questions from them. Du, Shao, and Cardie (2017) learn a sentence selection task to identify the question-worthy sentences in the paragraph using a neural network-based sequence tagging model. Subramanian et al. (2018) train a neural keyphrase extractor to predict the keyphrases within the paragraph. Scialom, Piwowarski, and Staiano (2019) argue that the predicted answer may bias the generated question towards factoid questions; they train a Transformer-based (Vaswani et al. 2017) answer-agnostic model and obtain promising results in terms of human evaluation.

The pros and cons of previous answer-aware and answer-agnostic NQG models motivate us to combine them: our model is built upon answer-agnostic NQG, but we explicitly infer the hidden answers and generate the questions based on the induced hidden answers.

Methodology

In this paper, we denote the context paragraph as C = {c1, c2, ..., cn}; our objective is to predict the target question Q = {q1, q2, ..., qm}.
The whole architecture is built upon the standard encoder-decoder architecture, with multi-head attention as the building block; an additional hidden pivot predictor is introduced to get the candidate answer.

Paragraph Encoder

The paragraph encoder encodes the paragraph into a dense embedding space. In this paper, following the current development of NLP (Radford et al. 2019; Devlin et al. 2018), we adopt the self-attention based Transformer (Vaswani et al. 2017) as the building block. The hidden representation for layer $l$ can be written as:

$$q^l, k^l, v^l = W^l_q h^{l-1},\; W^l_k h^{l-1},\; W^l_v h^{l-1}, \qquad h^l = \mathrm{MultiHeadAttention}(q^l, k^l, v^l) \quad (1)$$

$$\mathrm{MultiHeadAttention}(q, k, v) = \mathrm{softmax}\left(\frac{q^T k}{\sqrt{d}}\right) v \quad (2)$$

where $q^l, k^l, v^l$ are the query, key, and value representations for layer $l$, $W^l_{q,k,v}$ are weight matrices, and $d$ is the hidden size. The first layer is the word embedding layer, and we use the last layer output $H \in \mathbb{R}^{n \times d}$ as the paragraph representation.

Pivot Answer Predictor

Compared with previous answer-agnostic NQG methods, the most significant difference of our model is that we explicitly infer the candidate answer before generating the questions; thus the induced answers act as a pivot in our model. The pivot answer predictor predicts the hidden answer based on the current paragraph. We predict a binary label for the $i$-th word to denote whether the current token is located within the answer span:

$$z_i = \mathrm{MLP}(h_i), \qquad g_i = \sigma(z_i^T w_z), \qquad \hat{z}_i = \begin{cases} 1, & g_i > 0.5 \\ 0, & g_i \le 0.5 \end{cases} \quad (3)$$

where MLP is a multi-layer perceptron and $\sigma$ is the sigmoid activation function. $w_z$ is a weight vector that transforms the hidden representation into a scalar value. $\hat{z}_i$ is a binary indicator denoting whether the current word is in (1) or out of (0) the answer span. Thus, our model can be fitted to scenarios where the answer spans are continuous or discontinuous.
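The per-token labeling in Eq. (3) can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the one-layer tanh MLP, its weight names (`W1`, `b1`), and the random toy inputs are all assumptions made for the example; only the sigmoid projection and the 0.5 threshold come from the equation above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pivot_answer_predictor(H, W1, b1, w_z):
    """Per-token binary answer labeling, following Eq. (3).

    H   : (n, d) paragraph hidden states from the encoder
    W1  : (d, d) weights of a one-layer MLP (illustrative stand-in)
    b1  : (d,)   MLP bias (illustrative stand-in)
    w_z : (d,)   weight vector projecting each token to a scalar score
    Returns g (n,) in-answer probabilities and z (n,) binary indicators.
    """
    Z = np.tanh(H @ W1 + b1)           # z_i = MLP(h_i); a single tanh layer here
    g = sigmoid(Z @ w_z)               # g_i = sigma(z_i^T w_z)
    z = (g > 0.5).astype(np.int64)     # threshold at 0.5 as in Eq. (3)
    return g, z

# Toy example: six tokens with hidden size eight.
rng = np.random.default_rng(0)
n, d = 6, 8
H = rng.normal(size=(n, d))
g, z = pivot_answer_predictor(H, rng.normal(size=(d, d)),
                              np.zeros(d), rng.normal(size=d))
```

Because the indicator is produced independently per token, nothing forces the predicted 1s to be contiguous, which is exactly why the model accommodates discontinuous answer spans.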
After the pivot answer predictor, along with the original paragraph hidden representations $H$, we also have the answer position information, which can guide the subsequent decoder to generate specific questions. As the answer indicator $\hat{z}$ is a binary value, we use an embedding matrix $D \in \mathbb{R}^{2 \times d}$ to embed this indicator into the representation $Z$, and add it to $H$ as the final hidden representation of the encoder:

$$C = H + Z \quad (4)$$

Question Decoder

The question decoder is similar to previous answer-aware NQG models: it takes the paragraph hidden representations and the answer indicator as input and generates the target question in an auto-regressive way. The probability of generating the target token at step $i$ is: p(qi|q
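Eq. (4) amounts to a per-token embedding lookup followed by an element-wise sum. A minimal NumPy sketch, with the function name and the random toy inputs chosen for illustration only:

```python
import numpy as np

def answer_conditioned_encoding(H, z, D):
    """Fuse answer-position information into the encoder output, as in Eq. (4).

    H : (n, d) paragraph hidden states
    z : (n,)   binary answer indicators from the pivot predictor
    D : (2, d) embedding matrix for the two indicator values
    Returns C = H + Z, where row i of Z is D[z[i]].
    """
    Z = D[z]        # look up one embedding row per token indicator
    return H + Z    # element-wise sum; same shape (n, d) as H

# Toy example: five tokens, hidden size four, answer span at positions 1-2.
rng = np.random.default_rng(1)
n, d = 5, 4
H = rng.normal(size=(n, d))
z = np.array([0, 1, 1, 0, 0])
D = rng.normal(size=(2, d))
C = answer_conditioned_encoding(H, z, D)
```

Adding the indicator embedding, rather than concatenating it, keeps the encoder output at the same dimensionality $d$, so the decoder can attend over $C$ exactly as it would over $H$.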