# Domain Adaptation for Semantic Parsing

Zechang Li1,2, Yuxuan Lai1, Yansong Feng1,3 and Dongyan Zhao1,2
1Wangxuan Institute of Computer Technology, Peking University, Beijing, China
2Center for Data Science, Peking University, Beijing, China
3The MOE Key Laboratory of Computational Linguistics, Peking University, China
{zcli18, erutan, fengyansong, zhaody}@pku.edu.cn

## Abstract

Recently, semantic parsing has attracted much attention in the community. Although many neural modeling efforts have greatly improved the performance, the task still suffers from data scarcity. In this paper, we propose a novel semantic parser for domain adaptation, where we have far fewer annotated instances in the target domain than in the source domains. Our semantic parser benefits from a two-stage coarse-to-fine framework, and can thus provide different, targeted treatments for the two stages, i.e., focusing on domain invariant and domain specific information, respectively. In the coarse stage, our novel domain discrimination component and domain relevance attention encourage the model to learn transferable, domain general structures. In the fine stage, the model is guided to concentrate on domain related details. Experiments on a benchmark dataset show that our method consistently outperforms several popular domain adaptation strategies. Additionally, we show that our model can well exploit limited target data to capture the difference between the source and target domains, even when the target domain has far fewer training instances.

## 1 Introduction

Semantic parsing is the task of transforming natural language utterances into meaning representations such as executable structured queries or logical forms. Apart from traditional syntactic-parsing-style models, there have been many recent efforts devoted to end-to-end neural models trained in a supervised manner [Dong and Lapata, 2016; Sun et al., 2018; Bogin et al., 2019]. Such models usually require large amounts of labeled data for training and are often hard to transfer to new domains, since the meaning representations may vary greatly between different domains, e.g., the calendar and housing domains share little similarity in their meaning representations [Wang et al., 2015]. However, there has been relatively little attention to domain adaptation for semantic parsing. This is not an easy task, since one has to deal with the transfer of semantic representations at both the structural and lexical levels, which is often more challenging than transferring a sentence classification model.

| Domain | Instance |
| --- | --- |
| calendar | utterance: meetings attended by two or more people<br>logical form: **listValue** (**countComparative** (**getProperty** (**singleton** en.meeting) (**string** !type)) (**string** attendee) (**string** >=) (**number** 2)) |
| housing | utterance: housing units with 2 neighborhoods<br>logical form: **listValue** (**countComparative** (**getProperty** (**singleton** en.housing_unit) (**string** !type)) (**string** neighborhood) (**string** =) (**number** 2)) |

Table 1: Examples of paired utterances and their logical forms from the OVERNIGHT dataset. The bold tokens in logical forms are usually domain invariant, which can be seen as patterns generalized across different domains.
Moreover, in contrast to other conventional domain transfer tasks, e.g., sentiment analysis, where all labels have been seen in the source domains, semantic parsing models are expected to generate domain specific labels or tokens with only limited target domain annotations; e.g., attendee only appears in the calendar domain. These observations suggest that more effort is required to deal with the query structure transfer and few-shot token generation issues when performing domain adaptation for semantic parsing.

An intuitive solution to this problem is to build a two-stage model, where a coarse level component focuses on learning more general, domain invariant representations, and a fine level component concentrates on more detailed, domain specific representations. Take the two utterances in Table 1 as an example. Although they come from different domains, they both express a comparison between certain properties and values, querying certain types of entities (meeting or housing unit), with several properties (attendee or neighborhood) specified (>= 2 or = 2). We can see that the COMPARATIVE pattern tends to be domain invariant and can be more easily transferred in the coarse level, while domain related tokens, e.g., the category and property names, should be handled in the fine stage.

In this work, we propose a novel two-stage semantic parsing approach for domain adaptation. Our approach is inspired by the recent coarse-to-fine (coarse2fine) architecture [Dong and Lapata, 2018], where the coarse step produces general intermediate representations, i.e., sketches, and the fine step then generates detailed tokens or labels. However, the coarse2fine architecture cannot be applied to domain adaptation directly, because there is no guarantee that the two stages achieve our expected distinct purposes, since the predicate-only intermediate sketch can only provide a distant signal. We thus propose two novel mechanisms, an adversarial domain discrimination and a domain relevance attention, to enhance the encoders and decoders, respectively. They drive the model to learn domain general and domain related representations in the different stages, and help it focus on different clues during decoding. We conduct experiments on the OVERNIGHT dataset [Wang et al., 2015], and outperform conventional semantic parsing and popular domain transfer methods. Further analysis shows that both adversarial domain discrimination and domain relevance attention help make the most of the coarse-to-fine architecture for domain adaptation.

Our contributions are summarized as follows:

- We propose a novel two-stage semantic parsing model for domain adaptation, where the coarse step transfers the domain general structural patterns and the fine step focuses on the differences between domains.
- We design two novel mechanisms, adversarial domain discrimination and domain relevance attention, to enhance the encoders and decoders, which help the model learn domain invariant patterns in the coarse stage while focusing on domain related details in the fine stage.

## 2 Task Definition

Formally, given a natural language utterance $X = x_1, \ldots, x_{|X|}$ with length $|X|$, the semantic parsing task aims at generating a logical form $Y = y_1, \ldots, y_{|Y|}$ with length $|Y|$, which formally represents the meaning of $X$ in a predefined grammar.
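To make the input and output concrete, the following is a minimal sketch (in Python) of how one training instance from Table 1 might be represented: the utterance as a token sequence, the logical form as a target token sequence in the predefined grammar, and a domain tag used later for the adaptation setting. The class and field names are illustrative assumptions, not part of the paper or its released code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParsingInstance:
    """Hypothetical container for one semantic parsing example."""
    domain: str               # e.g., a source domain such as "calendar", or the target domain
    utterance: List[str]      # x_1, ..., x_|X|
    logical_form: List[str]   # y_1, ..., y_|Y|, tokens of the predefined grammar


# First example from Table 1 (calendar domain).
example = ParsingInstance(
    domain="calendar",
    utterance="meetings attended by two or more people".split(),
    logical_form=(
        "listValue ( countComparative ( getProperty ( singleton en.meeting ) "
        "( string !type ) ) ( string attendee ) ( string >= ) ( number 2 ) )"
    ).split(),
)
```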
In the domain adaptation setting, each instance $(x_i, y_i)$ is also associated with a specific domain, e.g., housing or calendar. Specifically, domains with sufficient labeled instances are treated as source domains $D_{S_1}, \ldots, D_{S_k}$. If a domain includes far fewer labeled instances than any source domain, we treat it as a target domain $D_T$, i.e., $|D_{S_i}| \gg |D_T|, \forall i$. We denote the combination of source domains as $D_S$. Our goal is to learn a semantic parser for the target domain by exploring both abundant source domain data and limited target domain annotations.

We propose a Domain-Aware seMantic Parser, DAMP, within the coarse2fine framework [Dong and Lapata, 2018], which introduces an intermediate sketch ($A = a_1, \ldots, a_{|A|}$) to bridge natural language utterances and logical forms. The procedures that generate sketches and logical forms are called the coarse stage and the fine stage, respectively. Our main idea is to disentangle the domain invariant sketches and the domain specific tokens in the two stages. However, it is not appropriate to directly apply the vanilla coarse2fine model to the domain adaptation scenario, since it does not explicitly consider domain information in designing either the sketches or the model architecture. To alleviate this problem, we first approximate sketch tokens as the logical form tokens shared by more than 50% of the source domains, since we assume sketches are domain general and should be shared across different domains. The remaining tokens are regarded as domain related and should be generated in the fine stage. We also introduce multi-task based domain discrimination and domain relevance attention into the encoding and decoding procedures, encouraging the parser to focus on different aspects, i.e., domain general and domain specific, during the coarse and fine stages, respectively. The overview of DAMP is illustrated in Figure 1. The implementation is open source (https://github.com/zechagl/DAMP).

In the coarse stage, utterance representations $U^c = \{u^c_k\}_{k=1}^{|X|}$ are produced by encoder1 given the utterance $X$. Afterwards, $U^c$ is fed into decoder1 via an attention mechanism to generate the sketch $A$. In the fine stage, to capture the utterance information from a different perspective, we adopt another encoder, encoder2, which produces new utterance representations $U^f = \{u^f_k\}_{k=1}^{|X|}$. A third encoder, encoder3, encodes the sketch into sketch representations $S^f = \{s^f_k\}_{k=1}^{|A|}$. decoder2 takes $U^f$ and $S^f$ with an attention mechanism and generates the final logical form $Y$.

### 3.1 Encoder: Domain Discrimination

In order to constrain the utterance encoders in the coarse and fine stages to focus on domain invariant and domain specific information, respectively, we apply a domain discrimination component to $U^c$ and $U^f$. This guides $U^c$ to be more consistent across domains, while keeping $U^f$ distinguishable in the fine stage. Specifically, in the coarse stage, the utterance representations are aggregated via self-attention as:

$$u^c = U^c \alpha^c_e, \qquad \alpha^c_e = \mathrm{softmax}(U^c w^c_{\alpha_e}) \tag{1}$$

where $w^c_{\alpha_e}$ is a trainable parameter. The domain discriminator further computes $p^c = \sigma(w^c_d u^c + b^c_d)$, the probability that the utterance comes from the source domains, where $w^c_d$ and $b^c_d$ are parameters and $\sigma$ denotes the sigmoid function. To encourage the model to confuse the domains, we perform gradient ascent over the negative log-likelihood of $p^c$.
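As a concrete illustration, here is a minimal PyTorch-style sketch of the self-attention pooling in Eq. (1) and the discriminator probability described above; the same kind of module would be applied to both $U^c$ and $U^f$, with the opposing training signals given by the losses in the next paragraphs. Module and variable names are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class DomainDiscriminator(nn.Module):
    """Self-attention pooling over utterance representations, followed by a
    binary source-vs-target domain probability (a sketch of Eq. (1) and p)."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.w_alpha = nn.Linear(hidden_size, 1, bias=False)  # plays the role of w_{alpha_e}
        self.w_d = nn.Linear(hidden_size, 1)                   # plays the role of w_d, b_d

    def forward(self, U: torch.Tensor) -> torch.Tensor:
        # U: (seq_len, hidden) utterance representations from the stage's encoder.
        alpha = torch.softmax(self.w_alpha(U).squeeze(-1), dim=0)  # attention weights over tokens
        u = alpha @ U                                              # pooled utterance vector
        return torch.sigmoid(self.w_d(u)).squeeze(-1)             # p: probability of source domain
```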
The corresponding loss function for a gradient descent optimizer is (note that the minus sign is removed to realize gradient ascent):

$$\mathcal{L}^c_D = \frac{1}{|D|}\left(\sum_{(X,Y) \in D_S} \log p^c + \sum_{(X,Y) \in D_T} \log(1 - p^c)\right) \tag{2}$$

In the fine stage, we obtain $p^f$, the probability of the utterance coming from the source domains, based on $U^f$. But here, our goal is to make the representations more discriminative. Thus conventional gradient descent is adopted, and the corresponding loss function is:

$$\mathcal{L}^f_D = -\frac{1}{|D|}\left(\sum_{(X,Y) \in D_S} \log p^f + \sum_{(X,Y) \in D_T} \log(1 - p^f)\right) \tag{3}$$

Figure 1: Overview of DAMP. The left part is the coarse stage and the right shows the fine stage. The blue module in the middle is the domain discrimination component, while the yellow one shows the domain relevance attention.

### 3.2 Decoder: Domain Relevance Attention

We observe that many words are useful for determining the patterns of sketches, while others are more likely to be associated with the domain specific tokens in the logical forms. Consider the first example in Table 1: domain general words like by two or more are associated with the comparison sketch in the coarse stage, while domain related tokens like meetings and attended help fill in the missing entities and properties during the fine stage. Therefore, we propose a domain relevance attention mechanism to integrate this prior into the decoding procedure. Formally, at time step $t$ of the coarse stage, the predicted distribution is: P(a|a