# Response Enhanced Semi-supervised Dialogue Query Generation

Jianheng Huang1,2,3*, Ante Wang1,2,3*, Linfeng Gao1,3, Linfeng Song4, Jinsong Su1,2,3

1School of Informatics, Xiamen University, China
2Shanghai Artificial Intelligence Laboratory, China
3Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan (Xiamen University), Ministry of Culture and Tourism, China
4Tencent AI Lab

{enatsu, wangante, 22920202200799}@stu.xmu.edu.cn, lfsong@tencent.com, jssu@xmu.edu.cn

*These authors contributed equally. Corresponding author.

## Abstract

Leveraging vast and continually updated knowledge from the Internet has been considered an important ability for a dialogue system. The dialogue query generation task is therefore proposed: given a dialogue history, generate search queries that are submitted to a search engine to retrieve relevant websites. In this regard, previous efforts were devoted to collecting conversations with annotated queries and training a query producer (QP) via standard supervised learning. However, these studies still face the challenges of data scarcity and domain adaptation. To address these issues, we propose SemiDQG, a semi-supervised learning framework that improves model performance with unlabeled conversations. Based on the observation that the search query is typically related to the topic of the dialogue response, we train a response-augmented query producer (RA) to provide rich and effective training signals for QP. We first apply a similarity-based query selection strategy to select high-quality RA-generated pseudo queries, which are used to construct pseudo instances for training QP and RA. Then, we adopt the REINFORCE algorithm to further enhance QP, with RA-provided rewards as fine-grained training signals. Experimental results and in-depth analysis on three benchmarks show the effectiveness of our framework in cross-domain and low-resource scenarios. In particular, SemiDQG significantly surpasses ChatGPT and competitive baselines. Our code is available at https://github.com/DeepLearnXMU/SemiDQG.

## Introduction

Recent years have witnessed the burgeoning of pre-trained language models (PLMs) (Lewis et al. 2019; Raffel et al. 2020) and large language models (LLMs), which effectively improve the performance of various downstream tasks and pave the way for artificial general intelligence (AGI) (Goertzel and Pennachin 2007). Regardless of their size, these models can still fail to generate factual content, a problem known as hallucination (Ji et al. 2023; OpenAI 2023). To tackle this issue, researchers have explored incorporating external knowledge from search engines (Komeili, Shuster, and Weston 2022). Typically, to bridge a model with a search engine, a query producer is used to generate search queries for retrieving relevant websites. In this work, we focus on dialogue query generation, which is more challenging as it has to mine user intents from complex dialogue contexts. To train such a query producer, previous studies resort to supervised learning, where conversations with annotated search queries are used to fine-tune a pre-trained model (Lewis et al. 2019; Raffel et al. 2020).
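For concreteness, the following minimal sketch shows how such a supervised query producer can be fine-tuned from a sequence-to-sequence PLM (here, BART via Hugging Face Transformers) on (dialogue history, annotated query) pairs; the model choice, turn separator, and hyperparameters are illustrative assumptions rather than the exact configuration used in this paper.

```python
# Minimal sketch: supervised fine-tuning of a query producer (QP) on
# (dialogue history -> annotated search query) pairs. Model choice, separator
# token, and hyperparameters are illustrative assumptions.
import torch
from torch.optim import AdamW
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
optimizer = AdamW(model.parameters(), lr=3e-5)

# A single toy labeled example; a real dataset would contain many conversations.
history = ["System: Ever been to Ireland in the North Atlantic? Heard it is lovely.",
           "User: I have not been there but I'd love to"]
gold_query = "ireland"

# Concatenate the dialogue turns into one source sequence.
src = tokenizer(" </s> ".join(history), return_tensors="pt", truncation=True)
tgt = tokenizer(gold_query, return_tensors="pt", truncation=True)

model.train()
loss = model(input_ids=src["input_ids"],
             attention_mask=src["attention_mask"],
             labels=tgt["input_ids"]).loss  # standard cross-entropy over the query
loss.backward()
optimizer.step()
optimizer.zero_grad()
```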
However, it is costly to construct a dataset with enough human annotations, and the trained model may still perform poorly on out-of-domain conversations. A common practice to tackle these issues is semi-supervised learning (Yarowsky 1995; Blum and Mitchell 1998), which has been widely investigated in both the CV (Rosenberg, Hebert, and Schneiderman 2005) and NLP (Zhang and Zong 2016; He et al. 2020) fields. It suits the dialogue query generation task well because abundant conversations without annotated queries are easy to obtain. Following the self-training paradigm, we expect the model to generate pseudo queries for unlabeled conversations. In practice, however, some pseudo queries are unsatisfactory, which may lead to error accumulation and performance degradation. Effectively collecting high-quality pseudo queries to construct pseudo instances thus remains a major hurdle in this task.

Fortunately, we notice that a search query can be highly relevant to the topic of its corresponding dialogue response. When the input is augmented with response information, the model can often generate better search queries. As illustrated in the first case of Table 1, the standard query producer (QP), which takes only the dialogue history as input, mistakenly predicts *north atlantic* as the query. In contrast, the response-augmented query producer (RA) predicts the correct query by inferring the mainly discussed topic *ireland* (referred to by *it*) from the response. This demonstrates the potential of RA to generate high-quality pseudo queries, which can subsequently be used to construct pseudo instances for training QP.¹ However, RA may also generate low-quality queries, especially when it is overly influenced by the response. In the second case of Table 1, RA ignores the principal topic *bowling* in the history and mistakenly takes *javelin throw*, another topic in the response, as its prediction. Therefore, it is worth exploring ways to select high-quality RA-generated pseudo queries.

¹Note that we focus on improving the performance of QP because the response information is inaccessible in practical applications.

| | Example 1 | Example 2 |
|---|---|---|
| History | System: Ever been to **Ireland** in the North Atlantic? Heard it is lovely. User: I have not been there but I'd love to | User: I love to go **bowling** with my family, but I'm a horrible bowler. Do you like it? System: Oh, yes, I love bowling. Rolling balls down the lane and knocking down the pins gives me a charge. User: I know! I love it when I just knock one down - lol!! My kids want to win, I just like playing. |
| Response | System: **It**'s not too big but it is the third largest island in Europe so not too small, like a lively and nice place. | System: Since it is one of the major throwing sports, it is a lot like the *javelin throw*. |
| Gold query | ireland | bowling |
| QP's prediction | *north atlantic* | bowling |
| RA's prediction | ireland | *javelin throw* |

Table 1: Two examples from Wizard-of-Wikipedia (WoW, Dinan et al. 2018) with the corresponding dialogue responses, gold queries, and model predictions. QP and RA denote the standard query producer and the response-augmented model, respectively. The main topics (or their referring expressions) that help predict the gold queries are highlighted in bold, and misleading concepts are marked in italics.
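To make the distinction between QP and RA concrete, the sketch below contrasts how the two producers build their inputs and how RA can produce a pseudo query for an unlabeled conversation. It follows the same sequence-to-sequence setup as the previous sketch; the separator and decoding settings are again assumptions made for illustration.

```python
# Sketch: a standard query producer (QP) conditions only on the dialogue
# history, while the response-augmented producer (RA) additionally sees the
# response of the current turn. Separator and decoding settings are assumptions.
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
qp = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
ra = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

history = ["System: Ever been to Ireland in the North Atlantic? Heard it is lovely.",
           "User: I have not been there but I'd love to"]
response = ("System: It's not too big but it is the third largest island in Europe "
            "so not too small, like a lively and nice place.")

qp_input = " </s> ".join(history)               # history only
ra_input = " </s> ".join(history + [response])  # history + response

def generate_query(model, text):
    """Beam-search decoding of a short search query for the given input text."""
    ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids
    out = model.generate(ids, max_length=16, num_beams=4)
    return tokenizer.decode(out[0], skip_special_tokens=True)

# After Stage-1 training, RA's output on an unlabeled conversation serves as a
# pseudo query that may later be selected to construct a pseudo instance.
pseudo_query = generate_query(ra, ra_input)
```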
Based on the above observations, we propose Semi-supervised Dialogue Query Generation (SemiDQG), a novel framework that effectively improves QP under the guidance of RA. Specifically, we first train QP and RA on a labeled dataset. We then use RA to generate pseudo queries for an unlabeled dataset and introduce a query selection strategy, based on the prediction similarity between QP and RA, to select high-quality RA-generated queries (e.g., *ireland* in Table 1). In a semi-supervised manner, the selected queries are used to construct pseudo instances, thereby enhancing the performance of both models. Finally, to further enhance QP, we adopt the REINFORCE algorithm (Williams 1992) on QP-generated candidate queries, with RA-provided rewards serving as fine-grained training signals. Both pseudo-instance construction and the reinforcement learning approach jointly consider the outputs of QP and RA. Our framework can thus fully utilize training signals from RA at different levels of granularity and effectively alleviate the negative effect of the input discrepancy between the two models.

We conduct experiments in cross-domain and low-resource scenarios. In the cross-domain scenario, we construct the transfer settings Wizard-of-Internet (WoI, Komeili, Shuster, and Weston 2022) → Wizard-of-Wikipedia (WoW, Dinan et al. 2018) in English, and DuSinc (Zhou et al. 2022) → KdConv (Zhou et al. 2020) in Chinese. In the low-resource scenario, we focus on WoI as it provides more data for better evaluation. Experimental results show that SemiDQG significantly outperforms ChatGPT and various baselines. Moreover, in-depth analysis validates the effectiveness of the proposed query selection strategy and reinforcement learning method in our framework.

## Related Work

### Search Query Generation

Using a search engine to exploit knowledge from the Internet is gaining popularity for benefiting various knowledge-intensive tasks, such as open-domain QA (Qi et al. 2019; Nakano et al. 2022) and dialogue response generation (Komeili, Shuster, and Weston 2022; Glaese et al. 2022). Early attempts simply take user questions or keywords as search queries, but this has proven ineffective for distinct domains (Xie et al. 2023) or complex dialogue contexts (Wang et al. 2023a). Recent work (Komeili, Shuster, and Weston 2022; Zhou et al. 2022; Wang et al. 2023a) usually trains a query producer to extract or generate search queries, with query generation being more popular due to the limitations of extraction. With the release of various query generation datasets (Komeili, Shuster, and Weston 2022; Zhou et al. 2022), researchers can build their query producers in a supervised manner. As query annotations are costly to collect, some researchers (Qi et al. 2019; Wang et al. 2023a,b) introduce additional supervision signals to train their query producers. Very recently, many LLM products (Thoppilan et al. 2022; Glaese et al. 2022) use prompting techniques to generate search queries instead of adopting an independent query producer. However, prompting heavily relies on the ability of the LLM to understand the prompt. Comparing these two strategies, our experimental results show that even ChatGPT underperforms a smaller task-specific model.
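To illustrate the prompting strategy discussed above, the following is a minimal sketch that turns a dialogue history into a query-generation prompt for an instruction-following LLM; the prompt wording and the example history are assumptions made for illustration, not the prompts evaluated in our experiments.

```python
# Illustrative sketch of the prompting strategy for dialogue query generation.
# The prompt template is an assumption; it is not the exact prompt used here.

def build_query_prompt(dialogue_history: list[str]) -> str:
    """Turn a dialogue history into a prompt asking an LLM for a search query."""
    history_text = "\n".join(dialogue_history)
    return (
        "Given the following conversation, produce a short search engine query "
        "that would retrieve knowledge useful for the next system response.\n\n"
        f"Conversation:\n{history_text}\n\nSearch query:"
    )

history = [
    "System: Ever been to Ireland in the North Atlantic? Heard it is lovely.",
    "User: I have not been there but I'd love to",
]
prompt = build_query_prompt(history)
print(prompt)
# The prompt would then be sent to an instruction-following LLM (e.g., ChatGPT)
# through its chat API, whereas a task-specific query producer generates the
# query directly from the same history after fine-tuning.
```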
### Semi-supervised Learning

As a branch of machine learning, semi-supervised learning exploits knowledge from unlabeled data when labeled data is limited. In this regard, typical methods mainly include self-training (Yarowsky 1995), co-training (Blum and Mitchell 1998; Zhou and Goldman 2004), and tri-training (Zhou and Li 2005). Among them, self-training is one of the earliest approaches and continues to gain popularity in recent years (Amini et al. 2022). For a specific task, it improves a model by iteratively enriching the training data with selected pseudo instances. In NLP, several studies have investigated self-training on text generation tasks, such as neural machine translation (He et al. 2020), text summarization (He et al. 2020), and question generation (Kulshreshtha et al. 2021). Nevertheless, collecting appropriate pseudo instances remains challenging, potentially hindering progress in building more powerful models. In this work, we focus on leveraging semi-supervised learning to further enhance the query producer, as described in the following section.

## Our Framework

Figure 1 illustrates the procedure of our proposed SemiDQG, which can be roughly separated into three stages. In Stage 1, we train a standard query producer (QP) and a response-augmented query producer (RA) on a labeled dataset via supervised learning. In Stage 2, both QP and RA generate pseudo queries for an unlabeled dialogue corpus. Then, based on the prediction similarity between RA and QP, we select high-quality RA-generated queries to construct pseudo instances for training the two models. Nevertheless, due to the input discrepancy between QP and RA, these pseudo instances might not effectively guide QP. Thus, in Stage 3, we employ reinforcement learning to further improve QP, with RA providing rewards as fine-grained training signals. Detailed descriptions are provided in the following subsections.

Figure 1: Our proposed Semi-supervised Dialogue Query Generation (SemiDQG) framework. In Stage 1, we train QP and RA via standard supervised training on labeled data (not shown for clarity). In Stage 2, for each unlabeled conversation, we use RA to generate its pseudo queries q̄. We only keep the query whose similarity score s(q̄) exceeds a given threshold α to construct a pseudo instance. We use these high-quality pseudo instances to train QP and RA. In Stage 3, QP is further enhanced using RA-guided reinforcement learning.

### Stage 1: Train QP and RA with Supervised Learning

As described above, under our framework, we train a QP and an RA via supervised learning in this stage. Formally, given the dialogue history u