# Table-to-Text: Describing Table Region with Natural Language

Junwei Bao, Duyu Tang, Nan Duan, Zhao Yan, Yuanhua Lv, Ming Zhou, Tiejun Zhao
Harbin Institute of Technology, Harbin, China; Microsoft Research, Beijing, China; Beihang University, Beijing, China; Microsoft AI and Research, Sunnyvale, CA, USA
baojunwei001@gmail.com, {dutang, nanduan, yuanhual, mingzhou}@microsoft.com, yanzhao@buaa.edu.cn, tjzhao@hit.edu.cn
(Contribution during internship at Microsoft Research Asia.)

**Abstract.** In this paper, we present a generative model that produces a natural language sentence describing a table region, e.g., a row. The model maps a row from a table to a continuous vector and then generates a natural language sentence by leveraging the semantics of the table. To deal with rare words appearing in a table, we develop a flexible copying mechanism that selectively replicates contents from the table into the output sequence. Extensive experiments demonstrate the accuracy of the model and the power of the copying mechanism. On two synthetic datasets, WIKIBIO and SIMPLEQUESTIONS, our model improves the current state-of-the-art BLEU-4 score from 34.70 to 40.26 and from 33.32 to 39.12, respectively. Furthermore, we introduce an open-domain dataset, WIKITABLETEXT, which includes 13,318 explanatory sentences for 4,962 tables. Our model achieves a BLEU-4 score of 38.23, outperforming template-based and language model-based approaches.

## Introduction

A table (see https://en.wikipedia.org/wiki/Table_(information)) is a widely used type of data source on the web; it has a formal structure and contains valuable information. Understanding the meaning of a table and describing its content is an important problem in artificial intelligence, with potential applications such as question answering, building conversational agents, and supporting search engines (Pasupat and Liang 2015; Sun et al. 2016; Yin et al. 2015; Jauhar, Turney, and Hovy 2016; Konstas and Lapata 2013; Li et al. 2016; Yan et al. 2016).

In this paper, we focus on the task of table-to-text generation. The goal is to automatically describe a table region (e.g., a row) with natural language. Table-to-text generation could support many applications, such as search engines and conversational agents. On one hand, it could be used to generate descriptive sentences for structured tables on the web; current search engines could then serve structured tables as answers by treating the generated sentences as keys and the tables as values. On the other hand, tables could also be used as responses by conversational agents, e.g., for intents such as ticket booking and product comparison. However, it is impractical for a conversational agent to read out a table with multiple rows and columns on a smartphone. Table-to-text technology could transform such data into natural language sentences, which could be sent back to users as text or as voice via text-to-speech transformation.

The task of table-to-text generation has three challenges. The first is how to learn a good representation of a table: a table has underlying structures such as attributes, cells, and a caption, and understanding its meaning is the foundation of the subsequent generation steps.
The second challenge is how to automatically generate a natural language sentence that is not only fluent but also closely relevant to the meaning of the table. The third challenge is how to effectively use the informative words from a table, which are typically low-frequency tokens such as named entities and numbers, when generating a sentence.

To address these challenges, we introduce a neural network model that takes a row from a table and generates a natural language sentence describing that row. The backbone of our approach is the encoder-decoder framework, which has been successfully applied in many tasks including machine translation (Kalchbrenner and Blunsom 2013; Sutskever, Vinyals, and Le 2014; Bahdanau, Cho, and Bengio 2014) and dialogue generation (Sordoni et al. 2015). In the encoder, the model leverages table structure to represent a row as a continuous vector. In the decoder, we develop a powerful copying mechanism that is capable of generating rare words from table cells, attributes, and the caption. The entire model can be trained end-to-end with back-propagation. Furthermore, we introduce an open-domain dataset, WIKITABLETEXT, which includes 13,318 explanatory sentences for 4,962 tables.

We conduct experiments on three datasets to verify the effectiveness of the proposed approach. On WIKITABLETEXT, our approach achieves a BLEU-4 score of 38.23, substantially outperforming template-based and neural language model-based approaches. A thorough model analysis shows that our copying mechanism not only dramatically boosts performance, but also selectively replicates appropriate contents from a table into the output sentence. On two synthetic datasets, WIKIBIO and SIMPLEQUESTIONS, our approach improves the state-of-the-art BLEU-4 score from 34.70 to 40.26 and from 33.32 to 39.12, respectively.

This work makes the following contributions:
- We present a neural network approach for table-to-text generation that effectively uses the structure of a table.
- We introduce a powerful copying mechanism that is capable of generating rare words from a table.
- We release an open-domain dataset, WIKITABLETEXT, and hope that it offers opportunities for further research in this area.

## Task Formalization and Dataset

We first formulate the task of table-to-text generation. Afterwards, we present the construction of WIKITABLETEXT, an open-domain dataset of sentences that describe table regions.

**Task Formalization.** A table $T$ is defined as a tuple $T = \langle Attribute, Cell, Caption \rangle$, where $Attribute = \{a_1, \dots, a_N\}$ includes the $N$ attributes (column headers) of the table, and $Cell = \{c_1^1, \dots, c_N^1, \dots, c_1^M, \dots, c_N^M\}$ includes the $N \times M$ cells of the table, where $N$ is the number of columns, $M$ is the number of rows, and $c_i^j$ is the cell at the intersection of the $i$-th column and the $j$-th row. $Caption$ is typically a natural language description of the entire table.

We formulate the task of table-to-text generation as follows: given a region of a table as input, output a natural language sentence that describes the selected region. In this work, we restrict the table region to a single row, and leave larger regions, such as multiple rows or the entire table, to future work. Moreover, the information (cells) in the row can be used selectively when generating a sentence. Figure 1 gives an example that illustrates the task.

*Figure 1: An example of table-to-text generation.*
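To make the formalization concrete, here is a minimal sketch in Python of the $\langle Attribute, Cell, Caption \rangle$ tuple, populated with the Figure 1 example as described in the text below. The attribute names "Year" and "Champion" and the caption "Singapore Cup" are assumptions reconstructed from the target sentence; the Runner-up value is not given in the text, so it is left elided.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Table:
    """A regular table T = <Attribute, Cell, Caption>."""
    attributes: List[str]   # a_1 .. a_N (column headers)
    cells: List[List[str]]  # M rows x N columns; cells[j][i] holds c_i^j
    caption: str            # natural language description of the table

# Hypothetical reconstruction of the Figure 1 table (values assumed
# from the paper's description; the Runner-up cell is not specified).
table = Table(
    attributes=["Year", "Champion", "Runner-up"],
    cells=[
        ["1997", "Singapore Armed Forces", "..."],  # the selected row
    ],
    caption="Singapore Cup",
)

selected_row = table.cells[0]
# Target output, using only the first two cells plus the caption:
# "Singapore Armed Forces was the champion of Singapore Cup in 1997."
```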
Given the selected row, which is highlighted in orange in Figure 1, the goal is to output the descriptive sentence "Singapore Armed Forces was the champion of Singapore Cup in 1997." In this case, only the information from the first two columns is used. It is worth noting that we deal with regular tables in this work and leave irregular tables to future work. We regard a table as regular if it contains no merged attributes or merged cells, and the number of cells in each row equals the number of attributes.

**WIKITABLETEXT.** We now describe the construction of WIKITABLETEXT. We crawl tables from Wikipedia and randomly select 5,000 regular tables, each of which has at least 3 rows and 2 columns. For each table, we randomly select three rows, resulting in 15,000 rows that are further used for manual annotation. Each annotator is given a selected row, the corresponding attributes, and the caption; if a table does not contain a caption, we use its page title instead. Each annotator is asked to write a sentence that describes at least two cells of the row, but is not required to cover every cell. For example, the sentence in Figure 1 does not use the Runner-up column. In addition, we ask annotators not to search the web for the meaning of a table, to ensure that no external knowledge is used; this makes the dataset more suitable for real scenarios. To increase the diversity of the generated language, we assign different rows from the same table to different annotators. If a row is hard to describe, we ask the annotator to write "it's-hard-to-annotate". In the end, we obtain 13,318 row-text pairs. Statistics are given in Table 1. We randomly split the dataset into training (10,000), development (1,318), and test (2,000) sets.

| Type | Value |
| --- | --- |
| Number of tables | 4,962 |
| Number of sentences | 13,318 |
| Avg #sentences per table | 2.68 |
| Avg #words per sentence | 13.91 |
| Avg / Min / Max #words per caption | 3.55 / 1 / 14 |
| Avg / Min / Max #cells per sentence | 3.13 / 2 / 9 |
| Avg / Min / Max #columns per table | 4.02 / 2 / 10 |
| Avg / Min / Max #rows per table | 7.95 / 3 / 19 |

Table 1: Statistics of WIKITABLETEXT.

To the best of our knowledge, WIKITABLETEXT is the first open-domain dataset for table-to-text generation. It differs from WEATHERGOV (Liang, Jordan, and Klein 2009) and ROBOCUP (Chen and Mooney 2008) in that its schemas are not restricted to a specific domain, such as weather forecasting or RoboCup sportscasting. We believe that WIKITABLETEXT poses more challenges and may be more useful in real-world applications. We are aware that WIKITABLEQUESTIONS (Pasupat and Liang 2015) is a widely used dataset for table-based question answering, a task that takes a question and a table as input and outputs an answer. However, we do not use this dataset in this work: table-to-text generation takes a row as input and outputs a sentence describing that row, whereas a portion of the questions in WIKITABLEQUESTIONS involve reasoning over multiple rows, which does not satisfy this constraint. We plan to handle multiple rows as input in future work. Our task is also closely related to infobox-to-biography generation on WIKIBIO (Lebret, Grangier, and Auli 2016) and fact-to-question generation on SIMPLEQUESTIONS (Serban et al. 2016). Both infoboxes and knowledge-base facts can be viewed as special cases of tables. Our task differs from theirs in that our input comes from a table with multiple rows and multiple columns.
Moreover, our dataset differs from that of (Lebret, Grangier, and Auli 2016) in that theirs is restricted to the biography domain, whereas ours is open-domain. We differ from (Serban et al. 2016) in that their task generates questions, while our focus is generating descriptive sentences.

## Background: Sequence-to-Sequence

Our approach is inspired by sequence-to-sequence (seq2seq) learning, which has been successfully applied in many language, speech, and computer vision applications. The main idea of seq2seq learning is to first encode the meaning of a source sequence into a continuous vector with an encoder, and then decode that vector into a target sequence with a decoder. In this section, we briefly introduce the neural networks used for seq2seq learning.

**Encoder.** The goal of the encoder is to represent a variable-length source sequence $x = \{x_1, \dots, x_N\}$ as a fixed-length continuous vector. The encoder can be implemented with various neural architectures, such as a convolutional neural network (CNN) (Meng et al. 2015; Gehring et al. 2016) or a recurrent neural network (RNN) (Cho et al. 2014; Sutskever, Vinyals, and Le 2014). Taking an RNN as an example, it processes a sequence by recursively transforming the current word together with the output vector of the previous step:

$$h_t = f_{enc}(x_t, h_{t-1})$$

where $f_{enc}(\cdot)$ is a nonlinear function and $h_t$ is the hidden vector at time step $t$. The last hidden vector $h_N$ is usually used as the representation of the input sequence $x$.

**Decoder.** The decoder takes the output of the encoder and produces a target sequence $y$. Typically, the decoder is implemented with an RNN that generates a word $y_t$ at each time step $t \in \{1, 2, \dots\}$ based on the representation of $x$ and the previously predicted word sequence $y_{<t}$.
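As a deliberately minimal illustration of this generic encoder-decoder setup (not the paper's full model, which adds a table-aware encoder and a copying mechanism), the following PyTorch sketch wires a GRU encoder to a GRU decoder; all names and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal RNN encoder-decoder matching the equations above:
    h_t = f_enc(x_t, h_{t-1}); the decoder predicts y_t from h_N and y_{<t}."""
    def __init__(self, src_vocab, tgt_vocab, dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt_in):
        # Encoder: run h_t = f_enc(x_t, h_{t-1}); keep the final state h_N.
        _, h_n = self.encoder(self.src_emb(src))
        # Decoder: condition on h_N and the previous target words y_{<t}.
        dec_states, _ = self.decoder(self.tgt_emb(tgt_in), h_n)
        return self.out(dec_states)  # per-step logits over target vocabulary

# Toy usage: batch of 2 source sequences (length 5), predict 4 target steps.
model = Seq2Seq(src_vocab=100, tgt_vocab=100)
src = torch.randint(0, 100, (2, 5))
tgt_in = torch.randint(0, 100, (2, 4))
logits = model(src, tgt_in)  # shape: (2, 4, 100)
```

At inference time, `tgt_in` would be built step by step from the decoder's own predictions; here teacher-forced inputs keep the sketch short.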