# Local Explanation of Dialogue Response Generation

Yi-Lin Tuan¹, Connor Pryor², Wenhu Chen¹, Lise Getoor², William Yang Wang¹
¹ University of California, Santa Barbara  ² University of California, Santa Cruz
{ytuan, wenhuchen, william}@cs.ucsb.edu  {cfpryor, getoor}@ucsc.edu

## Abstract

In comparison to the interpretation of classification models, the explanation of sequence generation models is also an important problem; however, it has seen little attention. In this work, we study model-agnostic explanations of a representative text generation task: dialogue response generation. Dialogue response generation is challenging with its open-ended sentences and multiple acceptable responses. To gain insights into the reasoning process of a generation model, we propose a new method, local explanation of response generation (LERG), that regards the explanations as the mutual interaction of segments in the input and output sentences. LERG views the sequence prediction as uncertainty estimation of a human response and then creates explanations by perturbing the input and calculating the certainty change over the human response. We show that LERG adheres to desired properties of explanation for text generation, including unbiased approximation, consistency, and cause identification. Empirically, our results show that our method consistently outperforms other widely used methods on the proposed automatic and human evaluation metrics for this new task by 4.4-12.8%. Our analysis demonstrates that LERG can extract both explicit and implicit relations between input and output segments.¹

## 1 Introduction

As we use machine learning models in daily tasks, such as medical diagnostics [6, 19] and speech assistants [31], being able to trust the predictions being made has become increasingly important. To understand the underlying reasoning process of complex machine learning models, a sub-field of explainable artificial intelligence (XAI) [2, 17, 36] called local explanation has seen promising results [35]. Local explanation methods [27, 39] often approximate an underlying black-box model by fitting an interpretable proxy, such as a linear model or tree, around the neighborhood of individual predictions. These methods have the advantage of being model-agnostic and locally interpretable.

Traditionally, off-the-shelf local explanation frameworks, such as the Shapley value from game theory [38] and the learning-based Local Interpretable Model-agnostic Explanations (LIME) [35], have been shown to work well on classification tasks with a small number of classes. In particular, there has been work on image classification [35], sentiment analysis [8], and evidence selection for question answering [32]. However, to the best of our knowledge, there has been less work studying explanations of models with sequential output and large class sizes at each time step. An attempt by [1] aims at explaining machine translation by aligning the sentences in the source and target languages. Nonetheless, unlike translation, where it is possible to find almost all word alignments between the input and output sentences, many text generation tasks are not alignment-based. We further explore explanations over sequences that contain implicit and indirect relations between the input and output utterances.

¹ Our code is available at https://github.com/Pascalson/LERG.
In this paper, we study explanations over a set of representative conditional text generation models: dialogue response generation models [45, 55]. These models typically aim to produce an engaging and informative [3, 24] response to an input message. The open-ended sentences and multiple acceptable responses in dialogues pose two major challenges: (1) an exponentially large output space and (2) the implicit relations between the input and output texts. For example, the open-ended prompt "How are you today?" could lead to multiple responses depending on the user's emotion, situation, social skills, expressions, etc. A simple answer such as "Good. Thank you for asking." does not have an explicit alignment to words in the input prompt. Even though this alignment does not exist, it is clear that "good" is the key response to "how are you". To find such crucial corresponding parts in a dialogue, we propose to extract explanations that can answer the question: which parts of the response are influenced the most by parts of the prompt?

To obtain such explanations, we introduce LERG, a novel yet simple method that extracts sorted importance scores for every input-output segment pair from a dialogue response generation model. We view the sequence prediction as the uncertainty estimation of one human response and find a linear proxy that simulates the certainty caused by an input segment on an output segment. We further derive two optimization variations of LERG: one is learning-based [35] and the other is a derived optimum similar to the Shapley value [38]. To theoretically verify LERG, we propose that an ideal explanation of text generation should adhere to three properties: unbiased approximation, intra-response consistency, and cause identification. To the best of our knowledge, our work is the first to explore explanation of dialogue response generation while maintaining all three properties.

To verify whether the explanations are both faithful (the explanation is fully dependent on the model being explained) [2] and interpretable (the explanation is understandable by humans) [14], we conduct comprehensive automatic evaluations and a user study. We evaluate the necessity and sufficiency of the extracted explanation to the generation model by measuring the perplexity change when salient input segments are removed (necessity) and the perplexity when only the salient segments remain (sufficiency). In our user study, we present annotators with only the most salient parts of an input and ask them to select the most appropriate response from a set of candidates. Empirically, our proposed method consistently outperforms baselines on both automatic metrics and human evaluation.

Our key contributions are:

- We propose a novel local explanation method for dialogue response generation (LERG).
- We propose a unified formulation that generalizes local explanation methods towards sequence generation and show that our method adheres to the desired properties for explaining conditional text generation.
- We build a systematic framework to evaluate explanations of response generation, including automatic metrics and a user study.

## 2 Local Explanation

Local explanation methods aim to explain the predictions of an arbitrary model by interpreting the neighborhood of individual predictions [35]. This can be viewed as training a proxy that adds up the contributions of input features to a model's predictions [27].
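As a toy illustration of this additive view (our own sketch, not code from the paper), one crude proxy attributes to each input feature the probability change observed when that feature alone is removed; the estimators formalized below refine this idea with principled perturbation schemes.

```python
import numpy as np

# Hypothetical black-box classifier: returns P(Y = y | kept features),
# where `keep` is a binary mask over the M input features.
def black_box_prob(keep: np.ndarray) -> float:
    hidden_weights = np.array([1.5, 0.2, 0.9])      # unknown to the explainer
    return float(1.0 / (1.0 + np.exp(-(keep @ hidden_weights - 1.0))))

M = 3
full = np.ones(M)

# Simple additive proxy: attribute to each feature the probability drop
# observed when that feature alone is removed (leave-one-out contribution).
phi = np.zeros(M)
for i in range(M):
    perturbed = full.copy()
    perturbed[i] = 0
    phi[i] = black_box_prob(full) - black_box_prob(perturbed)

print("contributions:", phi.round(3))   # feature 0 should matter most here
```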
More formally, given an example with input features $x = \{x_i\}_{i=1}^{M}$ and the corresponding prediction $y$ with probability $f(x) = P_\theta(Y = y \mid x)$ (the classifier is parameterized by $\theta$), we denote the contribution of each input feature $x_i$ as $\phi_i \in \mathbb{R}$ and the concatenation of all contributions as $\phi = [\phi_1, \dots, \phi_M]^T \in \mathbb{R}^M$. Two popular local explanation methods are the learning-based Local Interpretable Model-agnostic Explanations (LIME) [35] and the game-theoretic Shapley value [38].

LIME interprets a complex classifier $f$ by locally approximating a linear classifier around a given prediction $f(x)$. The optimization of the explanation model that LIME uses adheres to:
$$\xi(x) = \arg\min_{\varphi} \left[ \mathcal{L}(f, \varphi, \pi_x) + \Omega(\varphi) \right], \tag{1}$$
where we sample a perturbed input $\tilde{x}$ from $\pi_x(\tilde{x}) = \exp(-D(x, \tilde{x})^2 / \sigma^2)$, taking $D(x, \tilde{x})$ as a distance function and $\sigma$ as the width. $\Omega(\varphi)$ is the model complexity of the proxy $\varphi$. The objective of $\xi(x)$ is to find the simplest $\varphi$ that can approximate the behavior of $f$ around $x$. When using a linear classifier $\phi$ as the $\varphi$ that minimizes $\Omega(\varphi)$ [35], we can formulate the objective function as:
$$\phi = \arg\min_{\phi} \; \mathbb{E}_{\tilde{x} \sim \pi_x} \left( P_\theta(Y = y \mid \tilde{x}) - \phi^T \tilde{z} \right)^2, \tag{2}$$
where $\tilde{z} \in \{0,1\}^M$ is a simplified feature vector of $\tilde{x}$ obtained by a mapping function $h$ such that $\tilde{z} = h(x, \tilde{x}) = \{\mathbb{1}(x_i \in \tilde{x})\}_{i=1}^{M}$. The optimization minimizes the classification error in the neighborhood of $x$ sampled from $\pi_x$. Therefore, using LIME, we can find an interpretable linear model that approximates any complex classifier's behavior around an example $x$.

Shapley value takes the input features $x = \{x_i\}_{i=1}^{M}$ as $M$ independent players who cooperate to achieve a benefit in a game [38]. The Shapley value computes how much each player $x_i$ contributes to the total received benefit:
$$\varphi_i = \sum_{\tilde{x} \subseteq x \setminus \{x_i\}} \frac{|\tilde{x}|! \, (|x| - |\tilde{x}| - 1)!}{|x|!} \left[ P_\theta(Y = y \mid \tilde{x} \cup \{x_i\}) - P_\theta(Y = y \mid \tilde{x}) \right]. \tag{3}$$
To reduce the computational cost, instead of computing all combinations, we can find surrogates $\phi_i$ proportional to $\varphi_i$ and rewrite the above equation as an expectation over $\tilde{x}$ sampled from $P(\tilde{x})$:
$$\phi_i = \frac{|x|}{|x| - 1} \varphi_i = \mathbb{E}_{\tilde{x} \sim P(\tilde{x})} \left[ P_\theta(Y = y \mid \tilde{x} \cup \{x_i\}) - P_\theta(Y = y \mid \tilde{x}) \right], \quad \forall i, \tag{4}$$
where $P(\tilde{x}) = \frac{1}{(|x| - 1)\binom{|x| - 1}{|\tilde{x}|}}$ is the perturb function.² We can also transform the above formulation into an argmin:
$$\phi_i = \arg\min_{\phi_i} \; \mathbb{E}_{\tilde{x} \sim P(\tilde{x})} \left( \left[ P_\theta(Y = y \mid \tilde{x} \cup \{x_i\}) - P_\theta(Y = y \mid \tilde{x}) \right] - \phi_i \right)^2. \tag{5}$$

[Figure 1: The motivation of local explanation for dialogue response generation. (a) Controllable dialogue models; (b) explanation of a classifier; (c) our concept, where (c) = (a) + (b).]

## 3 Local Explanation for Dialogue Response Generation

We aim to explain a model's response prediction to a dialogue history one example at a time, and call this the local explanation of dialogue response generation. We focus on the local explanation for a more fine-grained understanding of the model's behavior.

### 3.1 Task Definition

As depicted in Figure 1, we draw inspiration from the notions of controllable dialogue generation models (Figure 1a) and local explanation in sentiment analysis (Figure 1b). The former uses a concept from predefined classes as the relation between the input text and the response; the latter finds the features that correspond to positive or negative sentiment. We propose to find parts within the input and output texts that are related by an underlying intent (Figure 1c).

We first define the notation for dialogue response generation, which aims to predict a response $y = y_1 y_2 \dots y_N$ given an input message $x = x_1 x_2 \dots x_M$, where $x_i$ is the $i$-th token of sentence $x$ with length $M$ and $y_j$ is the $j$-th token of sentence $y$ with length $N$.
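Before extending these ideas to sequences, a small self-contained sketch may make the sampling view concrete. The snippet below (our own illustration, not the paper's implementation) estimates the Shapley contributions of Eq. (3) for a generic black-box classifier by Monte Carlo sampling in the spirit of Eq. (4): drawing a subset size uniformly and then a uniform subset of that size reproduces the Shapley weighting in expectation.

```python
import random

def shapley_estimate(prob_fn, num_features: int, num_samples: int = 200):
    """Monte Carlo estimate of each feature's Shapley contribution (cf. Eq. 3).

    prob_fn(kept) -> P_theta(Y = y | kept features), where `kept` is a set of
    feature indices.  Drawing a subset size uniformly from 0..M-1 and then a
    uniform subset of that size matches the Shapley weighting in expectation.
    """
    contributions = [0.0] * num_features
    for i in range(num_features):
        others = [j for j in range(num_features) if j != i]
        for _ in range(num_samples):
            k = random.randint(0, len(others))        # subset size in 0..M-1
            subset = set(random.sample(others, k))
            delta = prob_fn(subset | {i}) - prob_fn(subset)
            contributions[i] += delta / num_samples
    return contributions

# Toy black box: the "model" mostly relies on features 0 and 2,
# so their estimated contributions should be roughly 0.4 and 0.3.
def toy_prob(kept):
    return 0.2 + 0.4 * (0 in kept) + 0.3 * (2 in kept)

print(shapley_estimate(toy_prob, num_features=4))
```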
To solve this task, a typical sequence-to-sequence model $f$ parameterized by $\theta$ produces a sequence of probability masses [45]. The probability of $y$ given $x$ can then be computed as the product of this sequence:
$$P_\theta(y \mid x) = P_\theta(y_1 \mid x) \, P_\theta(y_2 \mid x, y_1) \cdots P_\theta(y_N \mid x, y_{1:N-1}).$$
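A minimal sketch of this chain-rule computation is shown below, assuming an off-the-shelf Hugging Face causal dialogue model (DialoGPT) and its convention of separating turns with the end-of-sequence token; the specific checkpoint and interface are our assumptions, not details from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any left-to-right dialogue LM exposing token probabilities can play P_theta;
# DialoGPT-small is used here purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
model.eval()

def response_log_prob(prompt: str, response: str) -> float:
    """log P_theta(y | x) = sum_j log P_theta(y_j | x, y_{<j})  (chain rule)."""
    x_ids = tokenizer.encode(prompt + tokenizer.eos_token, return_tensors="pt")
    y_ids = tokenizer.encode(response + tokenizer.eos_token, return_tensors="pt")
    input_ids = torch.cat([x_ids, y_ids], dim=-1)
    with torch.no_grad():
        logits = model(input_ids).logits              # (1, |x| + |y|, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    for j in range(y_ids.size(-1)):
        pos = x_ids.size(-1) + j - 1                  # logits at pos predict token pos + 1
        total += log_probs[0, pos, y_ids[0, j]].item()
    return total

print(response_log_prob("How are you today?", "Good. Thank you for asking."))
```

Perturbation-based explanations such as LERG repeatedly evaluate quantities like this under modified inputs $\tilde{x}$ to measure how the certainty of the observed human response changes.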