# Contrastive Learning Reduces Hallucination in Conversations

Weiwei Sun1, Zhengliang Shi1, Shen Gao1, Pengjie Ren1, Maarten de Rijke2, Zhaochun Ren1*

1Shandong University, Qingdao, China
2University of Amsterdam, Amsterdam, The Netherlands

{weiwei.sun,shizhl}@mail.sdu.edu.cn, {shengao,renpengjie,zhaochun.ren}@sdu.edu.cn, m.derijke@uva.nl

*Corresponding author.

## Abstract

Pre-trained language models (LMs) store knowledge in their parameters and can generate informative responses when used in conversational systems. However, LMs suffer from the problem of hallucination: they may generate plausible-looking statements that are irrelevant or factually incorrect. To address this problem, we propose a contrastive learning scheme, named MixCL. A novel mixed contrastive objective is proposed to explicitly optimize the implicit knowledge elicitation process of LMs, and thus reduce their hallucination in conversations. We also examine negative sampling strategies of retrieved hard negatives and model-generated negatives. We conduct experiments on Wizard-of-Wikipedia, a public, open-domain knowledge-grounded dialogue benchmark, and assess the effectiveness of MixCL. MixCL effectively reduces the hallucination of LMs in conversations and achieves the highest performance among LM-based dialogue agents in terms of relevancy and factuality. We show that MixCL achieves comparable performance to state-of-the-art KB-based approaches while enjoying notable advantages in terms of efficiency and scalability.

## 1 Introduction

Open-domain dialogue agents have received increasing attention in recent years (Freitas et al. 2020; Huang, Zhu, and Gao 2020). In an engaging open-domain dialogue, a large amount of knowledge, such as commonsense (Young et al. 2018) and factual knowledge (Dinan et al. 2019), is involved. To integrate knowledge into dialogue agents, KB-based methods have been proposed to explicitly acquire knowledge from knowledge bases (Young et al. 2018; Dinan et al. 2019). However, KB-based methods suffer from problems of retrieval error (Liu et al. 2022) and inefficiency (Xu et al. 2022).

Meanwhile, recent years have witnessed a rapid development of pre-trained language models (LMs) (Devlin et al. 2019; Brown et al. 2020) and their applications to dialogue tasks (Thoppilan et al. 2022). Large LMs implicitly store knowledge in their parameters during the pre-training stage (Petroni et al. 2019; Zhou et al. 2020) and thus, to some extent, they can serve as knowledge bases to ground open-domain dialogues (Zhao, Wu, and Xu 2020). Such approaches, known as LM-based methods, achieve promising performance in generating informative responses and obviate the drawbacks of KB-based methods. However, LM-based methods have the problem of hallucination (Shuster et al. 2021; Ji et al. 2022): they generate plausible-looking statements that are irrelevant or factually incorrect.

Figure 1: Results of a pilot experiment where annotators were asked to label 200 responses generated by BART on the Wizard-of-Wikipedia dataset for hallucination.
To understand the severity of hallucinations of LMs, we conduct a pilot experiment. We sample 200 responses generated by BART (Lewis et al. 2020) on the Wizard-of-Wikipedia dataset (Dinan et al. 2019) for various topics and conversation turns. These responses are annotated by three well-informed experts in terms of knowledge relevancy and factuality. Based on the results, we group the hallucinations of LMs into two types: intrinsic hallucinations and extrinsic hallucinations. Intrinsic hallucinations are non-factual statements, such as incorrectly predicting a celebrity's birthday. Extrinsic hallucinations are irrelevant or out-of-context responses, such as a description of the history of football when the user asks how many teams are currently in the NFL. Fig. 1 summarizes the outcomes: intrinsic and extrinsic hallucinations account for 24% and 27% of the responses, respectively.

The problem of hallucination is mainly attributable to the optimization recipe: the commonly used maximum likelihood estimation (MLE) with teacher-forcing training encourages the model to imitate the training data blindly, leading to model hallucinations at inference time (Kang and Hashimoto 2020). Most studies on tackling hallucination in conversations focus on KB-based methods and use pre-retrieval (Shuster et al. 2021) or post-editing techniques (Dziri et al. 2021) to improve faithfulness; the hallucination of LM-based agents in eliciting knowledge inside LMs' parameters is still underexplored.

In this paper, we propose Mixed Contrastive Learning (MixCL) to alleviate the hallucinations of LM-based dialogue agents. MixCL explicitly samples the knowledge that is most confusing to the model and reduces its generation probability by contrasting it with the ground truth. To this end, two novel steps are used by MixCL: (i) negative sampling, and (ii) mixed contrastive learning. In the former, we sample the most confusing negative knowledge by retrieving it from the corpus or deriving it via model bootstrapping. In the latter, we propose mixed contrastive learning, inspired by mix-up data augmentation (Zhang et al. 2018), which mixes the positive and negative knowledge at the span level (see the sketch after the contribution list below). Moreover, we propose two mix-up strategies targeting the two types of hallucination: entity-based mix-up and constituency-based mix-up. Finally, MixCL is optimized in an end-to-end manner, thus avoiding the retrieval step during inference and instead using the knowledge inside its parameters.

We conduct experiments on Wizard-of-Wikipedia (Dinan et al. 2019), an open-domain, knowledge-grounded dialogue dataset. Extensive experiments show that MixCL improves the informativeness and relevancy of the responses. Compared with previous LM-based methods (Zhao, Wu, and Xu 2020; Xu et al. 2022; Liu et al. 2022), MixCL achieves improvements of 5% to 15% in terms of response quality and relevancy. Moreover, MixCL achieves performance comparable to state-of-the-art KB-based methods (e.g., KnowledGPT (Zhao et al. 2020)), while offering a 5× speed-up in model inference and superior scalability. The effectiveness of MixCL is also verified through human evaluation and ablation experiments.

Our contributions are as follows: (i) We propose MixCL, which reduces hallucinations of LMs in conversation through contrastive learning. (ii) We propose a hard negative sampling strategy to obtain the most confusing negative knowledge (see Section 5.1). (iii) We propose a mixed contrastive objective to optimize the model at the span level (see Section 5.2). (iv) Experiments on the Wizard-of-Wikipedia dataset show that MixCL effectively reduces the hallucinated content produced by the LM and achieves comparable performance to KB-based approaches. We release our code at https://github.com/sunnweiwei/MixCL.
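To make the entity-based mix-up described above concrete, the following is a minimal sketch under simplifying assumptions: `extract_entities` is a hypothetical helper (e.g., an off-the-shelf NER tagger returning sorted, non-overlapping character spans), and the bookkeeping shown here is illustrative rather than the authors' implementation.

```python
import random

def entity_mixup(positive, negative, extract_entities, mix_ratio=0.3, seed=None):
    """Span-level mix-up sketch: swap a fraction of entity spans in the
    ground-truth (positive) knowledge with entity spans drawn from a hard
    negative, and record which output spans came from the negative so that a
    span-level contrastive loss can lower their likelihood.

    `extract_entities(text)` is a hypothetical helper returning a sorted list
    of non-overlapping (start, end) character spans.
    """
    rng = random.Random(seed)
    pos_spans = extract_entities(positive)
    neg_spans = extract_entities(negative)
    if not pos_spans or not neg_spans:
        return positive, []  # nothing to mix

    pieces, negative_spans, offset, cursor = [], [], 0, 0
    for start, end in pos_spans:
        prefix = positive[cursor:start]
        pieces.append(prefix)                   # keep the positive text between entities
        offset += len(prefix)
        if rng.random() < mix_ratio:
            s, e = rng.choice(neg_spans)        # swap in an entity from the negative
            pieces.append(negative[s:e])
            negative_spans.append((offset, offset + (e - s)))
            offset += e - s
        else:
            pieces.append(positive[start:end])  # keep the positive entity
            offset += end - start
        cursor = end
    pieces.append(positive[cursor:])
    return "".join(pieces), negative_spans
```

A constituency-based variant would follow the same pattern, swapping syntactic constituents instead of entity spans; the recorded negative spans are the ones whose generation probability the mixed contrastive objective pushes down.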
## 2 Related Work

### 2.1 Knowledge-Grounded Dialogues

In open-domain knowledge-grounded dialogues (KGDs), people respond to each other's utterances in a meaningful way by integrating knowledge (Young et al. 2018; Huang, Zhu, and Gao 2020). To integrate knowledge, KB-based methods have been explored (Liu et al. 2018; Young et al. 2018; Dinan et al. 2019); they retrieve knowledge from a corpus through additional information retrieval (IR) modules. Studies on KB-based methods focus on knowledge selection (Meng et al. 2020; Shuster et al. 2021) and knowledge-grounded response generation (Zhao et al. 2020; Zheng and Huang 2021). However, KB-based methods suffer from the problems of retrieval errors (Liu et al. 2022), inefficiency (Xu et al. 2022), and multi-granularity knowledge integration (Wu et al. 2022).

### 2.2 Language Models as Knowledge Bases

Recent years have witnessed a rapid development of language models (LMs) (Brown et al. 2020) and LM-based dialogue agents (Thoppilan et al. 2022). Large LMs store knowledge in their parameters during pre-training and can generate informative responses in conversations (Zhao, Wu, and Xu 2020). Petroni et al. (2019) show that LMs can serve as knowledge bases for downstream tasks (e.g., question answering (Roberts, Raffel, and Shazeer 2020)). On this basis, Zhao, Wu, and Xu (2020) show that LMs can ground open-domain dialogues using their implicit knowledge. Madotto et al. (2020) embed knowledge bases into the model's parameters for end-to-end task-oriented dialogues. Roller et al. (2021) finetune LMs on KGD data. Cui et al. (2021) propose knowledge-enhanced finetuning methods to handle unseen entities. Xu et al. (2022) propose a topic-aware adapter to adapt LMs to KGDs. Liu et al. (2022) propose a multi-stage prompting approach for triggering knowledge in LMs. Wu et al. (2022) propose lexical knowledge internalization to integrate token-level knowledge into the model's parameters. However, existing LM-based methods suffer from the problem of hallucination. In this paper, we optimize the implicit knowledge eliciting process, i.e., reduce the hallucination of LMs in KGD, via the proposed contrastive learning framework MixCL.

### 2.3 Contrastive Learning

Contrastive learning (CL) (Chopra, Hadsell, and LeCun 2005; Chen et al. 2020b) is based on the idea that similar samples should also be close in representation space, and has seen applications in NLP (Gao, Yao, and Chen 2021). CL has been used for optimizing knowledge retrieval processes (Karpukhin et al. 2021; Xiong et al. 2021), where the model learns to identify positive knowledge among negatives. On the task of neural text generation, CL (Jiang et al. 2022), a.k.a. unlikelihood training (Welleck et al. 2020) or negative training (He and Glass 2020), alleviates undesirable properties of the generated output, e.g., repetition (Shirai et al. 2020; Jiang et al. 2022), maliciousness (He and Glass 2020), dullness (Li et al. 2020b, 2022), or inconsistency (Li et al. 2020a). Moreover, Cao and Wang (2021) propose a sentence-level contrastive learning method to reduce the hallucinations of text summarization models.
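As a point of reference for the span-level objective introduced later, a token-level unlikelihood term in the spirit of Welleck et al. (2020) can be sketched as follows; the tensor names and shapes are illustrative assumptions, not taken from any particular codebase.

```python
import torch
import torch.nn.functional as F

def unlikelihood_loss(logits, negative_tokens, mask):
    """Token-level unlikelihood sketch: push down the probability of tokens
    marked as negative (e.g., tokens belonging to undesirable spans).

    logits: (batch, seq_len, vocab) decoder outputs.
    negative_tokens: (batch, seq_len) long tensor of token ids to penalize.
    mask: (batch, seq_len) float tensor, 1 where a token should be penalized.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Log-probability assigned to each negative token at its position.
    neg_log_probs = log_probs.gather(-1, negative_tokens.unsqueeze(-1)).squeeze(-1)
    # -log(1 - p(negative token)); clamp for numerical stability.
    loss = -torch.log1p(-neg_log_probs.exp().clamp(max=1 - 1e-6))
    return (loss * mask).sum() / mask.sum().clamp(min=1)
```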
Unlike existing studies, we propose a mixed contrastive learning framework, MixCL, that eliminates hallucination at the span level with effective negative sampling strategies.

## 3 Problem Formulation

Let x, y, and k be the dialogue context, the corresponding response, and the ground-truth knowledge, respectively. As illustrated in Fig. 2, given a knowledge corpus K, a dialogue agent learns to predict an informative response y based on the dialogue context x using the knowledge in K.

Figure 2: Types of dialogue agents. (a) KB-based dialogue agents explicitly retrieve text-based knowledge from a corpus. (b) LM-based dialogue agents store knowledge in LM parameters and generate responses using implicit knowledge.

As discussed earlier, two approaches are studied in KGD: KB-based methods and LM-based methods. In this paper, we focus on the latter.

KB-based Methods. KB-based dialogue agents (Dinan et al. 2019) ground response generation by explicitly retrieving knowledge from K. Two sub-modules, i.e., a knowledge retriever and a response generator, are employed by KB-based approaches, as shown in Fig. 2 (a).

LM-based Methods. In this paper, we explore language models as knowledge bases for dialogue agents (Zhao, Wu, and Xu 2020; Xu et al. 2022), as illustrated in Fig. 2 (b). In LM-based approaches, the LMs are first pre-trained on K to store the knowledge in their parameters. Then, the models directly generate y given x using the knowledge in their parameters, dispensing with the explicit retrieval step.

## 4 Preliminaries

We propose an LM-based dialogue agent for open-domain KGD. The proposed model $p_\theta(y \mid x)$ is based on a transformer language model with an encoder-decoder architecture. The model is first pre-trained on the corpus K and then finetuned on dialogue data to generate informative responses.

Pre-training on Knowledge Corpus. We employ BART (Lewis et al. 2020) as the pre-trained transformer, which is pre-trained by denoising self-supervised learning:

$$\mathcal{L}_{\mathrm{LM}} = -\mathbb{E}_{k \sim \mathcal{K}} \log p_\theta(k \mid \hat{k}), \qquad (1)$$

where $\mathcal{K}$ is the text-based knowledge corpus (e.g., Wikipedia), $k$ is a text sampled from $\mathcal{K}$, and $\hat{k}$ denotes the text corrupted by corruption functions (e.g., masking, deletion, infilling; Lewis et al. 2020).

Finetuning on Dialogue Datasets. With the pre-trained LM, the model generates the response y given x without an explicit knowledge retrieval step (Zhao, Wu, and Xu 2020; Xu et al. 2022). A maximum likelihood estimation (MLE) training loss on dialogue data with paired (x, y) is employed by previous methods. In MLE, the model learns to predict the ground-truth tokens at each step in a teacher-forcing paradigm (Zhao, Wu, and Xu 2020; Xu et al. 2022):

$$\mathcal{L}_{\mathrm{MLE}} = -\log p_\theta(y \mid x) = -\sum_{t=1}^{|y|} \log p_\theta(y_t \mid y_{<t}, x). \qquad (2)$$
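For concreteness, below is a minimal sketch of the MLE fine-tuning step with teacher forcing using Hugging Face's BART; the checkpoint name and the toy dialogue pair are illustrative assumptions, and this is not the authors' released training code.

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

# Illustrative checkpoint; the paper uses BART, but this exact checkpoint and
# setup are assumptions made for the sketch.
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")

def mle_step(dialogue_context: str, response: str) -> torch.Tensor:
    """One teacher-forcing MLE step: L_MLE = -sum_t log p(y_t | y_<t, x)."""
    inputs = tokenizer(dialogue_context, return_tensors="pt", truncation=True)
    labels = tokenizer(response, return_tensors="pt", truncation=True).input_ids
    # Passing `labels` makes the model compute the token-level cross-entropy,
    # i.e., the negative log-likelihood under teacher forcing, internally.
    outputs = model(**inputs, labels=labels)
    return outputs.loss

loss = mle_step("Do you like football? How many teams are in the NFL?",
                "Yes! The NFL currently has 32 teams.")
loss.backward()  # gradients for a standard optimizer step
```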