# Generating Live Soccer-Match Commentary from Play Data

The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

Yasufumi Taniguchi,¹ Yukun Feng,¹ Hiroya Takamura,¹˒² Manabu Okumura¹

¹ Tokyo Institute of Technology

² National Institute of Advanced Industrial Science and Technology (AIST)

{yasufumi, yukun}@lr.pi.titech.ac.jp, {takamura, oku}@pi.titech.ac.jp

## Abstract

We address the task of generating live soccer-match commentaries from play event data. This task has the following characteristics: (i) each commentary is only partially aligned with events, (ii) play event data contains many types of categorical and numerical attributes, and (iii) live commentaries often mention player names and team names. For these reasons, we propose an encoder for play event data, which is enhanced with a gate mechanism. We also introduce an attention mechanism on events. In addition, we introduce placeholders and their reconstruction mechanism to enable the model to copy appropriate player names and team names from the input data. We conduct experiments on the play data of the English Premier League and provide a discussion of the results, including generated commentaries.

## Introduction

A soccer match consists of a series of numerous play events such as shots, passes, and fouls. Such events are coded as event descriptions (henceforth, events) as in Figure 1 and later used for broadcasting and match analysis. For example, player_id=78412 in the figure means that the player mainly involved in this event is Shinji Okazaki, and x=98.9 y=48.7 indicates the position where this event occurred. The first four lines in the figure provide the main information of the event and the remaining lines starting with

Figure 1: Example of event data. The first four lines provide the main information about the event; the remaining lines provide the additional detailed information.

events; it is not clear which events each commentary is aligned to.
This characteristic is not specific to our experimental data. Publicly available live commentary data is usually sparse rather than play-by-play, as seen on reddit¹ and the official webpage². Other characteristics of this task are that the input data consists of various types of information, including categorical and numerical values, and that named entities (e.g., player names) are often mentioned in the text.

One motivation of this work is that it provides a solution to data-to-text problems with similar characteristics. Another is that this work is a first step towards generating more personalized live commentaries, including those focusing on a particular player and those reflecting the viewpoint of the fans of one team. Although play events are currently coded by human workers, considerable effort is being made to automate the work (e.g., von Hoyningen-Huene 2011), especially with the help of GPS and sensors attached to players (Liu et al. 2009; Buchheit et al. 2014).

The task of commentary generation contains many subtasks. In this paper, we address this task in a relaxed setting; we assume that which players are to be mentioned and when to make a comment are given. We still need to work on content selection, as well as on sentence planning and realization. We will discuss this point later in the paper.

¹ For example, see https://www.reddit.com/r/soccer/comments/8mbxr7/match_thread_real_madrid_vs_liverpool_champions/.

² For example, see https://www.premierleague.com/match/22713.

## Related Work

There are two types of research on text generation for sports matches. One is the generation of a summary for an entire match. For example, van der Lee et al. (2017) addressed the generation of soccer-match summaries separately for the fans of home and visiting teams.
Many other researchers have worked on summary generation for different types of sports matches, including American football (Barzilay and Lapata 2005), Australian football (Lareau, Dras, and Dale 2011), basketball (Wiseman, Shieber, and Rush 2017), and soccer (Bouayad-Agha, Casamayor, and Wanner 2011).

The other type of research is live commentary generation, which we address in this paper. Tanaka-Ishii et al. (1998) and Chen et al. (2008) worked on this task, but with data from simulated soccer matches, in which both the input data and the commentaries are much simpler than ours. The data used in their work contains only player names and play types with timestamps, while the data used in our work contains more detailed information such as player positions and the ball speed. Other researchers worked on live commentary generation from a set of microblog posts (Kubo et al. 2013; Edouard et al. 2017), not from play data. Live commentary generation has also been explored in the domain of chess, where the complete data describing the state of the game is readily available (Kameko, Mori, and Tsuruoka 2015; Jhamtani et al. 2018).

There is much conventional work on data-to-text generation tasks, where template-based approaches are often used. We refer readers to a survey paper (Gatt and Krahmer 2018) for details. Our work differs from such conventional work in that our method is a trainable neural-network-based model. Recent work on data-to-text generation includes product review generation (Dong et al. 2017) and biography generation (Lebret, Grangier, and Auli 2016; Liu et al. 2018; Hachey, Radford, and Chisholm 2017; Sha et al. 2018). Although our task is similar to these two kinds of tasks to a certain extent, it has its own characteristic: the input data is not well aligned with the output text, as discussed later.
Another type of data-to-text generation task is text generation from a series of numerical values such as stock prices (Murakami et al. 2017), as opposed to generation from tables, as in the two kinds of work mentioned above. Wiseman et al. (2017) examined a number of datasets for data-to-text tasks, including summary generation for basketball matches. The task addressed in their paper is similar to ours but differs in that their task is to generate a summary from the statistics of a match after it has ended, while our task is to generate live commentaries.

## Play data of soccer

We use play event data of soccer matches in the English Premier League³ for the 2015/16 season, containing 380 soccer matches, provided by Opta Sports.⁴ The play data of each match consists of a sequence of events. An example of an event is shown in Figure 1. Each event consists of many pieces of information including play category, player names, ball position, time, height of ball, etc.

³ https://www.premierleague.com

Table 1: Statistics of the dataset: the number of commentaries mentioning each number of player names. "5+" means 5 or more.

| # of player names | 1 | 2 | 3 | 4 | 5+ |
|---|---|---|---|---|---|
| # of commentaries | 6825 | 7613 | 2167 | 450 | 85 |

Table 2 shows some of the information described in the event in Figure 1 and the corresponding value types. Note that this is only a part of an event description, which actually contains much more detailed information. There are 70 play categories designated by type_id (e.g., pass, foul, attempt saved, clearance), and 298 subcategories designated by qualifier_id (e.g., long ball, through ball, lob, volley). Although not all the information in this dataset can be obtained automatically with current technology, efforts are being made to enable automatic recognition of play categories and other information using image or video processing and GPS technology (Liu et al. 2009; Buchheit et al. 2014), as argued in the Introduction.
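As a quick check on Table 1, the counts can be tallied to see what share of the listed commentaries mention more than one player and what share mention at most three. The snippet below is only an arithmetic sketch: the numbers are copied from Table 1, the variable names are ours, and the total is the sum of the table entries rather than the size of the full dataset.

```python
# Counts copied from Table 1: number of player names -> number of commentaries.
counts = {"1": 6825, "2": 7613, "3": 2167, "4": 450, "5+": 85}

# Total over the table entries only (not the full dataset).
total = sum(counts.values())

# Commentaries that mention two or more player names.
multi = total - counts["1"]

# Commentaries that mention three or fewer player names.
up_to_three = counts["1"] + counts["2"] + counts["3"]

multi_share = multi / total
up_to_three_share = up_to_three / total

print(f"commentaries listed in Table 1: {total}")
print(f"two or more player names:       {multi} ({multi_share:.1%})")
print(f"three or fewer player names:    {up_to_three} ({up_to_three_share:.1%})")
```

Run as written, this gives 17,140 listed commentaries, of which about 60.2% mention two or more players and about 96.9% mention three or fewer.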
The original dataset provided by Opta Sports contains 663,911 events and 26,340 commentaries; that is, there are far more events than commentaries. On average, each match contains approximately 70 commentaries. The commentaries in this dataset are therefore not in the play-by-play style: most events are ignored, and only important events are described in commentaries.

Table 1 provides statistics of the dataset, showing how many commentaries mention only one player name, two player names, and so on. The table shows that more than 60% of the commentaries contain multiple player names. It also shows that most commentaries mention three or fewer player names.

In this work, we address this generation task under a relaxed setting; we assume that which players are to be mentioned and when to make a comment are given. We therefore preprocess the data as follows. For each live commentary, we first select the events that contain the player names mentioned in the commentary and are timestamped within 5 minutes before and after⁵ the posting time of the commentary. From the selected events, we further select the five closest events on the timeline and associate them with the commentary. We regard such a pair of multiple events and a commentary as one instance.

Even after this relaxation, the commentaries in the data are only partially aligned with events. In addition, each event contains many pieces of information, most of which are not mentioned in the commentaries. Therefore, although content selection is partially done through the relaxation, the task addressed in this work still involves content selection, as well as sentence planning and realization.

Table 2: Example attributes describing the event in Figure 1. Note that this is only a part of an event description, which actually contains much more detailed information.

| attribute | example attribute value | value type |
|---|---|---|
| player name | Shinji Okazaki | categorical |
| play category | goal | categorical |
| time | 82min 29sec | continuous |
| x-y coordinates of the ball | 98.9, 48.7 | continuous |
| details | keeper touched, big chance, fantasy assisted | categorical |

On the other hand, the commentaries sometimes describe information beyond the input play data. For example, "Wenger is furious with Noble on the touchline." is found in a commentary, although the input play data does not contain any information on whether Arsène Wenger, the then head coach of Arsenal F.C., was furious. Some other commentaries contain expressions that are difficult (though not impossible) to generate, such as "scrappy goal" and "deadly cross".

⁴ https://www.optasports.com The example commentary in the Introduction and the example in Figure 1 were also provided by Opta Sports.

⁵ The reason we also use events after the commentary is that the time associated with each commentary sometimes deviates from the time associated with each event.

## Live commentary generation

We use an encoder-decoder model, which receives an event sequence $(x_1, x_2, \ldots, x_n)$ as input and generates a live commentary $(y_1, y_2, \ldots, y_m)$ from the output of the encoder (Sutskever, Vinyals, and Le 2014), where $x_i$ is an event and $y_j$ is a word. As an encoder, we use the multilayer perceptron (MLP), which performed best in Murakami et al. (2017). Since each input is a sequence of events, one might think that a recurrent neural network would work well as an encoder, making the model as a whole a sequence-to-sequence model (Sutskever, Vinyals, and Le 2014). However, according to our observation, the actual inputs do not behave like sequences; they are rather sets of events. In fact, a sequence-to-sequence model did not work well in our preliminary experiments. We therefore focus on MLP in our work.
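Each input to the encoder is the small set of events that preprocessing attaches to a commentary. As a rough illustration of that instance construction (events that involve a mentioned player, within 5 minutes of the commentary, keeping the five closest in time), a minimal sketch might look as follows. The `Event` class, its field names, the inclusive window boundary, and matching on any one mentioned player are assumptions made for illustration; they are not taken from the Opta data format and do not describe the actual implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Event:
    """Hypothetical, heavily reduced view of one play event."""
    time_sec: float    # match clock of the event, in seconds (assumed field)
    player_name: str   # player mainly involved in the event (assumed field)
    type_id: int       # play category


WINDOW_SEC = 5 * 60    # 5 minutes before and after the commentary
MAX_EVENTS = 5         # number of closest events kept per commentary


def select_events(events: Iterable[Event],
                  mentioned_players: Set[str],
                  commentary_time_sec: float) -> List[Event]:
    """Return up to MAX_EVENTS events associated with one commentary."""
    # Keep events that involve a mentioned player and fall inside the window.
    candidates = [
        e for e in events
        if e.player_name in mentioned_players
        and abs(e.time_sec - commentary_time_sec) <= WINDOW_SEC
    ]
    # Keep the events closest in time to the commentary.
    candidates.sort(key=lambda e: abs(e.time_sec - commentary_time_sec))
    closest = candidates[:MAX_EVENTS]
    # Return them in timeline order for readability.
    return sorted(closest, key=lambda e: e.time_sec)


if __name__ == "__main__":
    # Made-up events for illustration only.
    demo = [
        Event(4000.0, "Player A", 1),   # too early: outside the window
        Event(4900.0, "Player A", 1),
        Event(4949.0, "Player A", 16),
        Event(4949.0, "Player B", 1),   # player not mentioned
    ]
    for event in select_events(demo, {"Player A"}, 4950.0):
        print(event)
```

The returned events, paired with the commentary text, would form one instance. Ties in distance are broken by input order here, which is another arbitrary choice of this sketch.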
As a decoder, we use the recurrent neural network language model (RNNLM) (Bahdanau, Cho, and Bengio 2014) with long short-term memory (LSTM) (Hochreiter and Schmidhuber 1997). Figure 2 shows the neural network architecture of our model, which is explained in detail in this section.

### Encoding events

We use a mapping $f$ to convert each event in the input sequence $(x_1, x_2, \ldots, x_n)$ to a vector representation:

$$p_i = f(x_i), \tag{1}$$

where $x_i$ denotes an event consisting of a number of categorical and continuous values. Categorical values are represented as embeddings, and continuous values are preprocessed. Specifically, the x and y coordinates of ball positions, which range from 0 to 100, are divided by 100 so that they are normalized to [0,1]. The time at which the event occurs is converted to relative time; the delivery time of the commentary is subtracted from it. Additional detailed information (i.e., the lines with