# learning_to_represent_edits__8b1c8f0b.pdf

Published as a conference paper at ICLR 2019

LEARNING TO REPRESENT EDITS

Pengcheng Yin*, Graham Neubig
Language Technologies Institute, Carnegie Mellon University
Pittsburgh, PA 15213, USA
{pcyin,gneubig}@cs.cmu.edu

Miltiadis Allamanis, Marc Brockschmidt, Alexander L. Gaunt
Microsoft Research
Cambridge, CB1 2FB, United Kingdom
{miallama,mabrocks,algaunt}@microsoft.com

ABSTRACT

We introduce the problem of learning distributed representations of edits. By combining a neural editor with an edit encoder, our models learn to represent the salient information of an edit and can be used to apply edits to new inputs. We experiment on natural language and source code edit data. Our evaluation yields promising results that suggest that our neural network models learn to capture the structure and semantics of edits. We hope that this interesting task and data source will inspire other researchers to work further on this problem.

## 1 INTRODUCTION

One great advantage of electronic storage of documents is the ease with which we can edit them, and edits are performed in a wide variety of contexts. For example, right before a conference deadline, papers worldwide are finalized and polished, often involving common fixes for grammar, clarity and style. Would it be possible to automatically extract rules from these common edits? Similarly, program source code is constantly changed to implement new features, follow best practices and fix bugs. With the widespread deployment of (implicit) version control systems, these edits are quickly archived, creating a major data stream that we can learn from. In this work, we study the problem of learning distributed representations of edits. We only look at small edits with simple semantics that are more likely to appear often and do not consider larger edits; i.e., we consider *add definite articles* rather than *rewrite act 2, scene 3*.
Concretely, we focus on two questions: (i) Can we group semantically equivalent edits together, so that we can automatically recognize common edit patterns? (ii) Can we automatically transfer edits from one context to another? A solution to the first question would yield a practical tool for copy editors and programmers alike, automatically identifying the most common changes. By leveraging tools from program synthesis, such groups of edits could be turned into interpretable rules and scripts (Rolim et al., 2017). When there is no simple hard rule explaining how to apply an edit, an answer to the second question would be of great use, e.g., to automatically rewrite natural language following some stylistic rule.

We propose to handle edit data in an autoencoder-style framework, in which an edit encoder f is trained to compute a representation of an edit x− → x+, and a neural editor α is trained to construct x+ from the edit representation and x−. This framework ensures that the edit representation is semantically meaningful, and a sufficiently strong neural editor allows this representation to not be specific to the changed element. We experiment with various neural architectures that can learn to represent and apply edits, and hope to direct the attention of the research community to this new and interesting data source, leading to better datasets and stronger models.

Briefly, the contributions of our paper are: (a) in Sect. 2, we present a new and important machine learning task of learning representations of edits; (b) we present a family of models that capture the structure of edits and compute efficient representations in Sect. 3; (c) we create a new source code edit dataset, and release the data extraction code at https://github.com/Microsoft/msrc-dpu-learning-to-represent-edits and the data at http://www.cs.cmu.edu/~pengchey/githubedits.zip; (d) we perform a set of experiments on the learned edit representations in Sect. 4 for natural language text and source code and present promising empirical evidence that our models succeed in capturing the semantics of edits.

*Work done as an intern at Microsoft Research, Cambridge, UK.

Edit 1:
x−: var greenList = trivia == null ? null : trivia.Select(t => t.UnderlyingNode);
x+: var greenList = trivia?.Select(t => t.UnderlyingNode);

Edit applied to a new input:
x−′: me = ((ue != null) ? ue.Operand : null) as MemberExpression;
x+′: me = ue?.Operand as MemberExpression;

Figure 1: Given an edit of x− to x+, f computes an edit representation vector f(x−, x+) ∈ R^n. Using that representation vector, the neural editor α(x−′, f(x−, x+)) applies the same edit to a new x−′. The code snippets shown here are real code change examples from the roslyn open-source compiler project.

In this work, we are interested in learning to represent and apply edits on discrete sequential or structured data, such as text or source code parse trees.¹ Figure 1 gives a graphical overview of the task, described precisely below.

**Edit Representation** Given a dataset of edits {x−(i) → x+(i)}, i = 1 . . . N, where x−(i) is the original version of some object and x+(i) its edited form (see the upper half of Figure 1 for an example), our goal is to learn a representation function f that maps an edit operation x− → x+ to a real-valued edit representation f(x−, x+) ∈ R^n. A desired quality of f is for the computed edit representations to have the property that semantically similar edits have nearby representations in R^n. Having distributed representations also allows other interesting downstream tasks, e.g., unsupervised clustering and visualization of similar edits from large-scale data (e.g., the GitHub commit stream), which would be useful for developing human-assistance toolkits for discovering and extracting emerging edit patterns (e.g., new bug fixes or emerging best practices of coding).

**Neural Editor** Given an edit representation function f, we want to learn to apply edits in a new context. This can be achieved by learning a neural editor α that accepts an edit representation f(x−, x+) and a new input x−′ and generates x+′.² This is illustrated in the lower half of Figure 1.

We cast the edit representation problem as an autoencoding task, where we aim to minimize the reconstruction error of α for the edited version x+ given the edit representation f(x−, x+) and the original version x−. By limiting the capacity of f's output and allowing the model to freely use information about x−, we introduce a bottleneck that forces the overall framework to not simply treat f(x−, x+) as an encoder of x+. The main difference from traditional autoencoders is that in our setup, an optimal solution requires re-using as much information as possible from x− to make the most of the capacity of f.

¹Existing editing systems, e.g., the grammar checker in text editors and the code refactoring module in IDEs, are powered by domain-specific, manually crafted rules, while we aim for a data-driven, domain-agnostic approach.

²We leave the problem of identifying which edit representation f(x−, x+) to apply to x−′ as interesting future work.

Figure 2: (a) Graph representation of the statement u = x + x. Rectangular (resp. rounded) nodes denote tokens (resp. non-terminals). (b) Sequence of tree decoding steps yielding x + x - 23, where x + x is copied (using the TREECP action) from the context graph in (a).
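The encoder/editor interface just described can be made concrete with a deliberately simple, purely symbolic stand-in (this is an illustrative sketch, not the paper's neural model: `edit_encoder`, `editor`, and the anchor-token heuristic are all assumptions of this toy). Here the "representation" of an edit is a list of (anchor, old-span, new-span) triples extracted with `difflib`, and the "editor" re-applies them to a new input:

```python
import difflib

def edit_encoder(x_minus, x_plus):
    # Toy stand-in for the edit encoder f: instead of a vector in R^n,
    # summarize the edit as (anchor, old_span, new_span) triples, where
    # the anchor is the token just before the change in x_minus.
    sm = difflib.SequenceMatcher(a=x_minus, b=x_plus, autojunk=False)
    rep = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            continue
        anchor = x_minus[i1 - 1] if i1 > 0 else None
        rep.append((anchor, tuple(x_minus[i1:i2]), tuple(x_plus[j1:j2])))
    return rep

def editor(x_new, rep):
    # Toy stand-in for the neural editor: re-apply each recorded change
    # wherever its anchor and old span occur in the new input.
    out = list(x_new)
    for anchor, old, new in rep:
        n = len(old)
        for j in range(len(out) - n + 1):
            anchored = anchor is None or (j > 0 and out[j - 1] == anchor)
            if anchored and tuple(out[j:j + n]) == old:
                out[j:j + n] = list(new)
                break
    return out
```

For a grammar-fix edit such as "was send" → "was sent", this toy transfers the change to a new sentence; the neural f and α of the paper replace this brittle exact-match anchoring with learned representations that can generalize across surface forms.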
Formally, given a probabilistic editor function Pα such as a neural network and a dataset {x−(i) → x+(i)}, i = 1 . . . N, we seek to minimize the negative log-likelihood loss L = −Σ_i log Pα(x+(i) | x−(i), f(x−(i), x+(i))). Note that this loss function can be interpreted in two ways: (1) as a conditional autoencoder that encodes the salient information of an edit, given x−, and (2) as an encoder-decoder model that encodes x− and decodes x+ conditioned on the edit representation f(x−, x+). In the rest of this section, we discuss our methods to model Pα and f as neural networks.

### 3.1 NEURAL EDITOR

As discussed above, α should use as much information as possible from x−, and hence, an encoder-decoder architecture with the ability to copy from the input is most appropriate. As we are primarily interested in edits on text and source code in this work, we explored two architectures: a sequence-to-sequence model for text, and a graph-to-tree model for source code, whose known semantics we can leverage both on the encoder as well as on the decoder side. Other classes of edits, for example image manipulation, would most likely be better served by convolutional neural models.

**Sequence-to-Sequence Neural Editor** First, we consider a standard sequence-to-sequence model with attention (over the tokens of x−). The architecture of our sequence-to-sequence model is similar to that of Luong et al. (2015), with the difference that we use a bidirectional LSTM in the encoder and a token-level copying mechanism (Vinyals et al., 2015) that directly copies tokens into the decoded sequence. Whereas in standard sequence-to-sequence models the decoder is initialized with the representation computed by the encoder, we initialize it with the concatenation of the encoder output and the edit representation. We also feed the edit representation as input to the decoder LSTM at each decoding time step. This allows the LSTM decoder to take the edit representation into consideration while generating the output sequence.
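The conditioning scheme above can be sketched numerically as follows. This is a minimal, non-authoritative illustration: it uses a plain tanh RNN cell instead of the paper's LSTM, toy sizes, and made-up parameter names; the softmax output head standing in for Pα is likewise a generic assumption.

```python
import math
import random

random.seed(0)
H, E, N, V = 4, 3, 2, 5  # hidden, token-embedding, edit-rep, vocabulary sizes

def mat(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(M, v):
    return [sum(w * x for w, x in zip(row, v)) for row in M]

# Hypothetical decoder parameters (tanh RNN cell for brevity; the paper uses an LSTM).
W_init = mat(H, H + N)  # projects [encoder output ; edit representation]
W_in = mat(H, E + N)    # each step's input is [token embedding ; edit representation]
W_h = mat(H, H)
W_out = mat(V, H)       # output projection for the softmax over tokens

def init_decoder_state(encoder_state, edit_rep):
    # Initialize the decoder from the concatenation of the encoder
    # output and the edit representation f(x-, x+).
    return [math.tanh(u) for u in matvec(W_init, encoder_state + edit_rep)]

def decoder_step(state, token_emb, edit_rep):
    # The edit representation is fed as input at every decoding time step.
    pre = matvec(W_in, token_emb + edit_rep)
    rec = matvec(W_h, state)
    return [math.tanh(a + b) for a, b in zip(pre, rec)]

def token_nll(state, target_id):
    # One term of the loss -sum_i log P(x+ | x-, f(x-, x+)): a softmax
    # over the vocabulary, then the negative log-probability of the target.
    logits = matvec(W_out, state)
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[target_id] / sum(exps))
```

Summing `token_nll` over the gold tokens of x+ (teacher forcing) gives the per-example training loss; in the real model the copy mechanism mixes this vocabulary softmax with pointer probabilities over the tokens of x−.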
**Graph-to-Tree Neural Editor** Our second model aims to take advantage of the additional structure of x− and x+. To achieve this, we combine a graph-based encoder with a tree-based decoder. We use T(x) to denote a tree representation of an element, e.g., the abstract syntax tree (AST) of a fragment of source code. We extend T(x) into a graph form G(x) by encoding additional relationships (e.g., the "next token" relationship between terminal nodes; see Figure 2(a)). To encode the elements of G(x−) into vector representations, we use a gated graph neural network (GGNN) (Li et al., 2015). Similarly to recurrent neural networks for sequences (such as biRNNs), GGNNs compute a representation for each node in the graph, which can be used in the attention mechanisms of a decoder. Additionally, we use them to obtain a representation of the full input x−, by computing their weighted average following the strategy of Gilmer et al. (2017) (i.e., computing a score for each node, normalizing scores with a softmax, and using the resulting values as weights). Our tree decoder follows the semantic parsing model of Yin & Neubig (2018), which sequentially generates a tree T(x+) as a series of expansion actions a1 . . . aN. The probability of taking an action is modeled as p(at | a