# Adversarial Robustness for Code

Pavol Bielik¹, Martin Vechev¹

¹Department of Computer Science, ETH Zürich, Switzerland. Correspondence to: Pavol Bielik.

Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020. Copyright 2020 by the author(s).

## Abstract

Machine learning, and deep learning in particular, has recently been used to successfully address many tasks in the domain of code, such as finding and fixing bugs, code completion, decompilation, type inference and many others. However, the issue of adversarial robustness of models for code has gone largely unnoticed. In this work, we explore this issue by: (i) instantiating adversarial attacks for code (a domain with discrete and highly structured inputs), (ii) showing that, similar to other domains, neural models for code are vulnerable to adversarial attacks, and (iii) combining existing and novel techniques to improve robustness while preserving high accuracy.

## 1. Introduction

Recent years have seen an increased interest in using deep learning to train models of code for a wide range of tasks including code completion (Brockschmidt et al., 2019; Li et al., 2018), code captioning (Alon et al., 2019; Allamanis et al., 2016; Fernandes et al., 2019), code classification (Mou et al., 2016; Zhang et al., 2019) and bug detection (Allamanis et al., 2018; Pradel & Sen, 2018; Li et al., 2019). Despite substantial progress on training accurate models of code, the issue of robustness has been overlooked. Yet, this is a very important problem shown to affect neural models in different domains (Goodfellow et al., 2015; Szegedy et al., 2014; Papernot et al., 2016).

**Challenges in modeling code.** In our work, we focus on tasks that compute program properties (e.g., type inference), usually addressed via handcrafted static analysis, but for which a number of recent neural models with high accuracy have been introduced (Hellendoorn et al., 2018; Schrouff et al., 2019; Malik et al., 2019). Unsurprisingly, as these works do not consider adversarial robustness, their adversarial accuracy can drop significantly.

Figure 1. Illustration of the three key components used in our work: learning to abstain, adversarial training, and representation refinement. Each point represents a sample; the figure distinguishes a region where the model abstains from making predictions, regions of model prediction, the space Δ of valid modifications for a given sample, and the learned (reduced) space of valid modifications.

However, training both robust and accurate models of code in this setting is non-trivial and requires one to address several key challenges: (i) programs are highly structured and long, containing hundreds of lines of code, (ii) a single discrete program change can affect the prediction of a large number of properties and is much more disruptive than a slight continuous perturbation of a pixel value, and (iii) the property prediction problem is usually undecidable (hence, static analyzers approximate the ideal solution).

**Accurate and robust models of code.** As a first step to address these challenges, we propose a novel method that combines three key components, illustrated in Figure 1. As we show, all of these contribute to achieving accurate and robust models of code.
First, we train a model that abstains (Liu et al., 2019) from making a prediction when uncertain, effectively partitioning the dataset into two parts: one part where the model makes predictions, which should be accurate and robust, and one where the model abstains, for which it is enough to be robust. Second, we instantiate adversarial training (Goodfellow et al., 2015) for the domain of code. Third, we develop a new method to refine the representation used as input to the model by learning which parts of the program are relevant for the prediction. This reduces the number of places that affect the prediction and helps to make adversarial training for code effective. Finally, we create a new algorithm that trains multiple models, each learning a specialized representation that makes robust predictions on a different subset of the dataset. We instantiate our approach on the type prediction task and show its effectiveness: we train a model that improves robustness by 15% while preserving high accuracy.

## 2. Accurate and Robust Models of Code

In this section, we present an overview of our approach. Without loss of generality, we define an input program p to be a sequence of words p = w_{1:n}. The words can correspond to a tokenized version of the program, nodes in an abstract syntax tree corresponding to p, or other suitable program representations. Further, let l ∈ ℕ be a position in the program p that corresponds to a word w_l ∈ W. A training dataset D = {(x_j, y_j)}_{j=1}^N contains a set of samples, where x ∈ X is an input tuple x = ⟨p, l⟩ consisting of a program p and a position l in the program, while y ∈ Y contains the ground-truth label. As an example, the code snippet in Figure 2a contains 12 different samples (x, y), one for each position where a prediction should be made (annotated with their ground-truth types y). Our goal is to learn a function f : X → ℝ^{|Y|}, represented as a neural network, which for a given input program and a position in the program computes the probability distribution over the labels. The model's prediction then corresponds to the label with the highest probability according to f.

**Step 1: Augment the model with an (un)certainty score.** We start by augmenting the standard neural model f with an option to abstain from making a prediction. To achieve this, we adopt the recently proposed approach of Liu et al. (2019) and introduce a selection function g_h : X → ℝ, which measures model certainty. Then, the model is defined to make a prediction only if g_h is confident enough (g_h(x) ≥ h) and to abstain from making a prediction otherwise. Here, h ∈ ℝ is an associated threshold that controls the desired level of confidence. For example, using a high threshold h = 0.9, the model learns to make only five predictions for the program in Figure 2b and will abstain from uncertain predictions such as predicting parameter types.

The first insight from our work is that allowing the model to abstain is beneficial for achieving robustness. This step leads to simpler models, since learning to abstain is easier than learning to predict the correct label. This is in contrast with forcing the model to learn the correct label for all samples, which is infeasible for most practical tasks.

**Step 2: Adversarial training.** Next, we instantiate adversarial training for the domain of code. Concretely, let Δ(x) be a set of valid modifications of sample x and let x + δ denote a new input obtained by applying the modifications in δ ∈ Δ(x) to x.
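As a minimal illustration of such a modification (our own sketch, not the implementation used in the paper), the snippet below represents a program as a token sequence and applies a single valid modification, a variable renaming; the helper name and the token-level view are simplifying assumptions.

```python
# Minimal sketch (assumed token-level representation, not the paper's exact code):
# a "program" is a list of tokens and a modification delta rewrites the token
# list while preserving the semantics and hence all ground-truth type labels.

def rename_variable(tokens, old_name, new_name):
    """Valid modification delta: consistently rename a variable."""
    return [new_name if t == old_name else t for t in tokens]

program = ["(", "hex", ",", "radix", ")", "=>", "{",
           "v", "=", "parseInt", "(", "hex", ".", "substring", "(", "1", ")",
           ",", "radix", ")", ";", "}"]

# x + delta: the program still computes the same values and types.
modified = rename_variable(program, "hex", "color")
```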
As a concrete example, Figure 2c shows a refactoring of the program from Figure 2b that renames hex to color. Even though this change does not affect the types in the program, the model suddenly predicts incorrect types for both the color parameter and the substring function. Further, even though the types of parseInt and v are still correct, the model has become much more uncertain.

Intuitively, our goal is to address this issue and to ensure that the model is robust: for all valid modifications δ ∈ Δ(x), when evaluated on x + δ, the model either abstains or predicts the correct label. Concretely, we use adversarial training (Goodfellow et al., 2015), which instead of minimizing the expected loss on the original distribution E_{(x,y)∼D}[ℓ((f, g_h)(x), y)], as done in standard training, minimizes the expected adversarial loss:

$$\mathbb{E}_{(x,y)\sim D}\Big[\max_{\delta \in \Delta(x)} \ell\big((f, g_h)(x+\delta),\, y\big)\Big] \qquad (1)$$

That is, we minimize the worst-case loss obtained by applying a valid modification to the original sample x. Similar to other domains, the main challenge in this setting is solving the inner maximization over δ ∈ Δ(x) efficiently for the domain of code.

**Standard adversarial training is insufficient.** Although adversarial training has been successfully applied in many domains (Madry et al., 2018; Wong & Kolter, 2018; Sinha et al., 2018; Raghunathan et al., 2018), in our work we show that for code, adversarial training alone is insufficient to achieve model robustness. The key reason is that existing neural models of code typically process the entire program, which can contain hundreds of lines of code. This is problematic as it means that any program change will affect all predictions, and there can be infinitely many program changes in Δ(x). Further, a single discrete program change is much more disruptive in affecting the model than a slight continuous perturbation of a pixel value. At the same time, while not sufficient, in our evaluation we show that adversarial training can be used to improve robustness by 0 to 7%, depending on the model architecture.

**Step 3: Representation refinement.** To address the issue that adversarial training alone does not work well, we develop a novel technique that: (i) learns which parts of the input program are relevant for the given prediction, and (ii) refines the model representation such that only relevant program parts are used as input to the neural network. Essentially, the technique automatically learns an abstraction α which, given a program, produces a relevant representation of that program. Figure 2d shows an example of a possible abstraction α that takes as input the entire program but keeps only the parts relevant for predicting the type of parseInt: it is a method call with name parseInt which has two arguments. To learn the abstraction α, we first represent programs as graphs and then phrase the refinement task as an optimization problem that minimizes the number of graph edges, while ensuring that the accuracy of the model before and after applying α stays roughly the same. Finally, we apply adversarial training, but this time on the abstraction α obtained via representation refinement, resulting in new functions f and g_h. Overall, this results in an adversarially robust model m_i = ⟨f, g_h, α⟩.
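Because Δ(x) is discrete, the inner maximization in Equation (1) cannot be solved with gradient steps as in the continuous setting; a common approximation is to search over candidate modifications and keep the worst one. The sketch below illustrates such a search in simplified form (the helper names, the greedy strategy and the single-sample batching are assumptions, not the exact procedure from Appendix C).

```python
import torch
import torch.nn.functional as F

def worst_case_loss(f, x, y, candidate_deltas, apply_delta):
    """Approximate max_{delta in Delta(x)} loss(f(x + delta), y) by searching over a
    finite set of candidate modifications (renamings, added arguments, dead code, ...).
    f(x) is assumed to return logits of shape (1, |Y|); y is a LongTensor of shape (1,)."""
    worst_loss, worst_input = None, x
    for delta in candidate_deltas:
        x_mod = apply_delta(x, delta)          # x + delta
        loss = F.cross_entropy(f(x_mod), y)    # loss of the modified input
        if worst_loss is None or loss > worst_loss:
            worst_loss, worst_input = loss, x_mod
    return worst_loss, worst_input

# Adversarial training then backpropagates through worst_case_loss
# instead of the standard loss on the unmodified sample.
```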
Figure 2. Overview of the main steps of our approach for learning accurate and adversarially robust models of code: (a) the training dataset D = {(x_j, y_j)}_{j=1}^N; (b) learning to abstain (Appendix B), where the selection function g_h makes a prediction only if confident enough and abstains otherwise; (c) adversarial training (Appendix C) with the worst-case modifications x + δ (e.g., renaming hex to color); (d) representation refinement (Section 3), where the learned abstraction α is used to predict the parseInt return type; and (e) applying the learned robust model m_i = ⟨f, g_h, α⟩ to annotate the programs in D with predicted types and training the next model (Section 4).

**Step 4: Learning accurate models.** Although the model m_i is robust, it provides predictions only for the subset of the samples for which it has enough confidence (i.e., g_h(x) ≥ h). To increase the ratio of samples for which our approach makes a prediction (i.e., does not abstain), we perform two steps: (i) generate a new dataset D_{i+1} by annotating the programs with the predictions made by the learned model m_i and removing successfully predicted samples, and (ii) learn another model m_{i+1} on the new dataset D_{i+1}. We repeat this process for as long as the newly learned model predicts some of the samples in D_{i+1}.

Training multiple models is beneficial because: (i) the models are easier to train as well as easier to make robust, as they do not try to learn all predictions; (ii) it allows conditioning on the predictions learned by earlier models, which helps both interpretability and robustness. For example, the model m_{i+1} can learn that the left-hand side of the assignment v = parseInt(...) has the same type as the right-hand side, since the type of parseInt was already predicted by m_i. Interestingly, if we think of each model as a learned set of rules, we can essentially apply the models to a given program in a fixed-point style (similar to how a traditional sound static analysis works); and (iii) each model learns a different representation α that is specialized for the predictions it makes. For example, while predicting the type of parseInt is independent of the argument values (parseInt(_, _)), predicting the second argument type is not (parseInt(_, radix)). Using a single abstraction to predict both would lead to either reduced robustness or reduced accuracy, depending on which abstraction is used.

**Summary.** Given a training dataset D, our approach learns a set of robust models, each of which makes robust predictions for a different subset of D. To achieve this, we extend existing neural models of code with three key components: the ability to abstain (with an associated uncertainty score), adversarial training, and learning to refine the representation. Given the limited space, we provide a formal description of the first two components, learning to abstain and adversarial training for code, in Appendix B and Appendix C, respectively.
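To make the resulting prediction interface concrete, the following sketch (our own illustration, assuming already-trained components and, for simplicity, a single shared threshold h rather than the per-model thresholds set during training) shows how a list of learned models ⟨f, g_h, α⟩ is applied to a sample: each model predicts only when its selection function is confident enough and otherwise defers to the next model or abstains.

```python
def predict(models, x, h=0.9):
    """models: list of (f, g_h, alpha) triples learned by the approach.
    Returns a label, or the string "abstain" if no model is confident enough."""
    for f, g_h, alpha in models:
        x_refined = alpha(x)              # keep only the relevant parts of the program
        if g_h(x_refined) >= h:           # selection function is confident
            return f(x_refined).argmax()  # most likely label
    return "abstain"
```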
Next, we formally describe the novel component, learning to refine the representation (Section 3), and present our training algorithm that combines all of the components together (Section 4).

## 3. Learning to Refine Representations

As motivated in Section 2, a key issue with many existing neural models for code is that the model prediction f(x) depends on the full program p, even though only small parts of p are typically relevant. We address this issue by learning an abstraction α that takes as input p and produces only the parts relevant for the prediction. That is, α refines the representation given as input to the neural model.

**Overview.** Our method works as follows: (i) we convert the program into a graph representation; (ii) we define the model to be a graph neural network (e.g., (Veličković et al., 2018; Kipf & Welling, 2017; Wu et al., 2019; Li et al., 2016)), which at a high level works by propagating and aggregating messages along graph edges; (iii) because dependencies in graph neural networks are defined by the structure of the graph (i.e., the edges it contains), we phrase the problem of refining the representation as an optimization problem that removes the maximum number of graph edges (i.e., removes the maximum number of dependencies) without degrading model accuracy; and (iv) we show how to solve the optimization problem efficiently by transforming it into an integer linear program (ILP).

**From programs to graphs.** Following prior works, we represent programs using their corresponding abstract syntax trees (ASTs). These are further transformed into graphs, as done in (Allamanis et al., 2018; Brockschmidt et al., 2019), by including additional edges.

Definition 3.1 (Directed Graph). A directed graph is a tuple G = ⟨V, E, ξ_V, ξ_E⟩ where V denotes a set of nodes, E ⊆ V² denotes a set of directed edges, ξ_V : V → ℕ^k is a mapping from nodes to their associated attributes and ξ_E : E → ℕ^m is a mapping from edges to their attributes.

We associate two attributes with each node: type, which corresponds to the type of the AST node (e.g., Block, Identifier, BinaryExpression, etc.), and value, which corresponds to the value associated with the AST node (e.g., +, -, 0, 1, "GET", x, data, etc.). For edges we use a single attribute, the edge type, which can be: (i) ast, for edges that correspond to those included in the AST, (ii) last usage, for edges introduced between any two usages (either read or write) of the same variable, and (iii) returns-to, for edges introduced between a return statement and the function declaration. All edges are initially added in both directions, but can later be removed during training. Depending on the task, more edge types can easily be added.

**Representation refinement.** Our goal is to learn an abstraction function α : ⟨V, E, ξ_V, ξ_E⟩ → ⟨V, E′, ξ_V, ξ_E⟩ with E′ ⊆ E, i.e., a function that removes a subset of the edges from the graph. To quantify the size of the abstraction, we use |α(x)| := |E′| to denote the number of edges after applying α to x.

**Defining valid graph refinements.** Because the goal of representation refinement is to reduce the number of nodes on which a prediction depends, we need to ensure that α itself does not depend on all the graph nodes. This is necessary, as otherwise we would only shift the dependency on the entire program from the model f to the representation refinement α. To achieve this, the decision to include or remove a given edge is made locally, based only on the edge attributes and the attributes of the nodes it connects, as illustrated in the sketch below.
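The sketch below (our own illustration; the dataclass layout and string-valued attributes are simplifying assumptions, whereas the paper encodes attributes as natural numbers via ξ_V and ξ_E) shows a program graph in the sense of Definition 3.1 together with the local edge-filtering semantics; the next paragraph defines the filtering criterion precisely as an edge feature φ.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

@dataclass
class ProgramGraph:
    nodes: List[int]
    node_type: Dict[int, str]              # e.g. "Identifier", "BinaryExpression"
    node_value: Dict[int, str]             # e.g. "+", "parseInt", "radix"
    edges: List[Tuple[int, int]]           # directed edges (s, t)
    edge_type: Dict[Tuple[int, int], str]  # "ast", "last usage", "returns-to"

def refine(graph: ProgramGraph, alpha: Set[Tuple[str, str, str]]) -> ProgramGraph:
    """Keep an edge iff its local feature (edge type, source node type, target node
    type) is in the learned set alpha; the decision never looks at the whole program."""
    kept = [(s, t) for (s, t) in graph.edges
            if (graph.edge_type[(s, t)], graph.node_type[s], graph.node_type[t]) in alpha]
    return ProgramGraph(graph.nodes, graph.node_type, graph.node_value,
                        kept, {e: graph.edge_type[e] for e in kept})
```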
Concretely, for a given edge ⟨s, t⟩ ∈ E, we define an edge feature φ(⟨s, t⟩) := ⟨ξ_E(⟨s, t⟩), ξ_V(s), ξ_V(t)⟩ to be a tuple of the edge attributes and the attributes of the nodes it connects. As a form of regularization, we condition only on the type attribute of each node. We denote the set of all possible edge features Φ to be the range of the function φ evaluated over all edges in D. Further, we define the refinement α as a subset of edge features α ⊆ Φ. Finally, the semantics of executing α over edges E is that only edges whose features are in α are kept, i.e., {e | e ∈ E ∧ φ(e) ∈ α}.

**Problem statement.** Minimize the expected size of the refinement α ⊆ Φ subject to the constraint that the expected loss of the model f stays approximately the same:

$$\arg\min_{\alpha \subseteq \Phi} \sum_{(x,y) \in D} |\alpha(x)| \quad \text{subject to} \quad \sum_{(x,y) \in D} \ell(f(x), y) \approx \sum_{(x,y) \in D} \ell(f(\alpha(x)), y) \qquad (2)$$

Our problem statement is quite general and can be instantiated by: (i) using the Abstain Cross-Entropy as the loss ℓ (Appendix B), and (ii) using the adversarial risk (Appendix C). Allowing the model to abstain from making predictions is especially important in order to obtain a small α (i.e., sparse graphs). This is because the restriction that the model accuracy stays roughly the same is otherwise too strict and would require that most edges are kept. Further, note that the problem formulation is defined over all samples in D, not only those for which the model f predicts the correct label. This is necessary since the model needs to make a prediction for all samples, even if that prediction is to abstain.

**Optimization via integer linear programming (ILP).** To solve Equation 2, the key idea is that for each sample (x, y) ∈ D we first capture the relevance of each node to the prediction made by the model f by computing:

$$a(f, x, y) = \big(\|G_{1,:}\|_1, \ldots, \|G_{|p|,:}\|_1\big), \quad \text{where } G = \nabla_x\, \ell(f(x), y) \in \mathbb{R}^{|p| \times emb}$$

denotes the gradient with respect to the input x = ⟨p, l⟩ and a given prediction y. As positions in p correspond to discrete words, the gradient is computed with respect to their embeddings (of dimension emb). The score for each position in p is computed by applying the L1-norm over the embedding gradients, producing a vector of unnormalized scores a ∈ ℝ^{|p|}. To obtain a probability distribution â(f, x, y) over all positions in p, we normalize the entries of a accordingly.

Then, we phrase the solution of Equation 2 as an optimization problem of including the minimum number of edges necessary for a path to exist between every relevant node (according to â) and the node where the prediction is made. Preserving all paths between the prediction and the relevant nodes encodes the constraint that the expected loss stays approximately the same, since it allows propagating information throughout the graph neural network. This optimization can be naturally encoded as a minimum-cost maximum-flow problem and solved efficiently with off-the-shelf ILP solvers. We provide a formal definition of the ILP encoding as well as concrete examples in Appendix D.

Even though our ILP formulation is very fast (in all our experiments the ILP solver takes less than a second), it does result in a more complex approach compared to an end-to-end trainable solution. We note, however, that an end-to-end trainable solution is also possible.
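Before turning to that alternative, the sketch below illustrates the node-relevance computation from the ILP formulation above (a simplified PyTorch sketch; it assumes a model f that consumes position embeddings directly, which is an assumption about the interface rather than our exact pipeline).

```python
import torch
import torch.nn.functional as F

def relevance_scores(f, embeddings, y):
    """embeddings: tensor of shape (|p|, emb) with requires_grad=True, obtained by
    embedding the |p| program positions; y: ground-truth label as a LongTensor (1,).
    Returns the normalized relevance distribution a_hat over the |p| positions."""
    logits = f(embeddings)                         # shape (1, |Y|)
    loss = F.cross_entropy(logits, y)
    grad, = torch.autograd.grad(loss, embeddings)  # G in R^{|p| x emb}
    a = grad.abs().sum(dim=1)                      # L1-norm per position
    return a / a.sum()                             # normalize to obtain a_hat
```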
As for such an end-to-end alternative, one could define α to be continuous by defining a learnable weight for each edge feature φ, encode the sparsity of α as part of the loss, and extend the graph neural network such that each message propagated along an edge e is scaled according to the corresponding weight of its edge feature φ(e). We have explored this option in the work of Abstreiter et al. (2020).

## 4. Training Algorithm

We now describe our algorithm that combines learning to abstain, adversarial training and representation refinement.

**Training a single adversarially robust model.** The training procedure used to learn a single adversarially robust model is shown in Algorithm 1. The input is a training dataset D and the desired accuracy t_acc that the learned model should have. Here, setting t_acc = 1.0 corresponds to a model that makes no mis-predictions (i.e., 100% accuracy), while t_acc = 0 corresponds to training a model that never abstains.

We start by training a model f and a selection function g_h as described in Appendix B (line 3). At this point we do not use adversarial training and train with a weaker threshold t_acc − ε, as our goal is only to obtain a fast approximation of the samples that can be predicted with high certainty. We use f and g_h to obtain an initial representation refinement α (line 5), which is applied to the dataset D to remove edges that are not relevant according to f and g_h (line 8). After that, we perform adversarial training (line 9) as described in Appendix C. However, instead of training from scratch, we reuse the model f and g_h learned so far, which speeds up training. Next, we refine the representation again (line 5) and, if the new representation is smaller (line 6), we repeat the whole process. Note that the adversarial training also uses the threshold t_acc − ε to account for the fact that the suitable representation is not known in advance. After the training loop finishes, we set the threshold h used by g_h to match the desired accuracy t_acc (more details on this step are provided in Appendix B). The final result is a model consisting of the function f trained to make adversarially robust predictions, the selection function g_h and the abstraction α.

**Incorporating robust predictions.** Once a single model is learned, it makes robust predictions on a subset of the dataset D_predict = {(x, y) | (x, y) ∈ D ∧ g_h(α(x)) ≥ h} and abstains from making a prediction on the remainder of the samples D_abstain = D \ D_predict. Next, for all samples in D_predict, we use the learned model to annotate the position l in the program p (recall that each x = ⟨p, l⟩ consists of a program p and a position l) with the ground-truth label y (denoted as Apply in Algorithm 2). Annotating a program position corresponds to either defining a new attribute (as illustrated in Figure 2e) or replacing an existing attribute (e.g., the value attribute) of a given node. Note that annotating programs is useful only in cases where the same program p is shared by multiple samples (x, y) ∈ D (i.e., multiple predictions are computed for different positions in the same program).

**Main training algorithm.** Our main training algorithm is shown in Algorithm 2. It takes as input the training dataset D and learns multiple models M, each of which makes robust predictions on a different subset of D (as motivated in Section 2). The number of models and the subsets for which they make predictions are not fixed a priori and are learned as part of our training.
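A condensed Python sketch of this outer loop is given below; the helper names robust_train and apply_model stand in for Algorithm 1 and the Apply step and are assumptions for illustration, while the pseudocode listings that follow give the precise procedures.

```python
def accurate_and_robust_train(dataset, robust_train, apply_model, t_acc=1.0):
    """Learn a list of robust models, each covering a different subset of the data.
    robust_train(D, t_acc) -> (f, g_h, alpha)    # Algorithm 1
    apply_model(D, f, g_h, alpha) -> D_abstain   # annotates predicted samples and
                                                 # returns those the model abstained on
    """
    models = []
    while True:
        f, g_h, alpha = robust_train(dataset, t_acc)
        d_abstain = apply_model(dataset, f, g_h, alpha)
        if len(d_abstain) == len(dataset):  # the new model made no predictions: stop
            break
        dataset = d_abstain                 # remaining samples for the next model
        models.append((f, g_h, alpha))
    return models
```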
Algorithm 1: Training procedure used to learn a single adversarially robust model ⟨f, g_h, α⟩.

    1:  function RobustTrain(D, t_acc):
    2:    α_last ← Φ
    3:    f, g_h ← Train(D, t_acc − ε)
    4:    while true do
    5:      α ← RefineRepresentation(D, f, g_h)
    6:      if |α| ≥ |α_last| then break
    7:      α_last ← α
    8:      D ← {(α(x), y) | (x, y) ∈ D}
    9:      f, g_h ← AdversarialTrain(D, f, g_h, t_acc − ε)
    10:   set threshold h in g_h such that the accuracy is t_acc
    11:   return ⟨f, g_h, α⟩

Algorithm 2: Training multiple adversarially robust models, each of which learns to make predictions for a different subset of the dataset D.

    1:  function AccurateAndRobustTrain(D, t_acc = 1.0)
    2:    M ← []
    3:    while true do
    4:      ⟨f, g_h, α⟩ ← RobustTrain(D, t_acc)
    5:      D_abstain ← Apply(D, f, g_h, α)
    6:      if |D_abstain| = |D| then break
    7:      D ← D_abstain
    8:      M ← M ∪ ⟨f, g_h, α⟩
    9:    return M

Model training (line 4 of Algorithm 2) and model application (line 5) are performed as long as a non-empty robust model exists (i.e., it makes at least one prediction). If the goal is to make predictions for all the samples in D, Algorithm 2 is run iteratively, with decreasing values of t_acc, until the full dataset is covered.

**Verifying model correctness.** A natural extension of our approach is to formally verify that the learned models are correct. Even though formally verifying the correctness of all samples is typically infeasible, it is possible to verify a subset of them. This can be achieved since using representation refinement significantly simplifies the problem: instead of proving correctness with respect to all positions (nodes) in the program, only a much smaller set of relevant positions has to be considered. In fact, in some cases the refined representation is so small that it is possible to simply enumerate all valid modifications (e.g., a finite set of valid variable renamings) and check that the model is correct for all of them. Additionally, it would be possible to adapt the recently proposed techniques (Huang et al., 2019; Jia et al., 2019), based on Interval Bound Propagation, that verify robustness to any valid word renaming and word substitution modification. However, applying these techniques to realistic networks in a scalable and precise way is an open problem beyond the scope of our work.

## 5. Evaluation

We instantiated our approach on a task studied by a number of prior works: predicting types for two dynamically typed languages, JavaScript and TypeScript (Hellendoorn et al., 2018; Schrouff et al., 2019; Malik et al., 2019; Raychev et al., 2015). In this task, the need for model robustness is natural, since the model is queried each time a program is modified by the user. Our key results are:

- Our approach learns accurate and adversarially robust models for the task of type inference, achieving 87.7% accuracy while improving robustness from 52.1% to 67.0%.
- We train highly accurate and robust models for a subset of the dataset, with 99.9% accuracy and 99.9% robustness for 29% of the samples.

We implemented our approach in PyTorch (Paszke et al., 2019) and the DGL library (Wang et al., 2019). We used a single Nvidia TITAN RTX for all the experiments. For our dataset, we collect the same top-starred projects on GitHub and perform similar preprocessing steps as Hellendoorn et al. (2018). We provide a detailed description in the supplementary material. The code and datasets are available at: https://github.com/eth-sri/robust-code

**Evaluation metrics.** We use two main evaluation metrics. Accuracy is computed over the unmodified dataset D and corresponds to the accuracy used in prior works.
Concretely, the accuracy is defined as the ratio of samples (x, y) for which the most likely label according to the model f, denoted f(x)_best, is the same as the ground-truth label y:

$$\text{Accuracy} = \frac{1}{|D|} \sum_{(x,y) \in D} \begin{cases} 1 & \text{if } f(x)_{best} = y \\ 0 & \text{otherwise} \end{cases}$$

Robustness is the ratio of samples (x, y) ∈ D for which the model f, evaluated on all valid modifications δ ∈ Δ(x), either abstains or makes a correct prediction:

$$\text{Robustness} = \frac{1}{|D|} \sum_{(x,y) \in D} \begin{cases} 1 & \text{if } \forall \delta \in \Delta(x).\; f(x+\delta)_{best} \in \{y, \text{abstain}\} \\ 0 & \text{otherwise} \end{cases}$$

**Models.** We evaluate five neural model architectures:

LSTM is a bidirectional LSTM with attention which takes as input a sequence of AST nodes, including both types and values, obtained using a pre-order traversal.

DeepTyper is a model proposed by Hellendoorn et al. and consists of a bidirectional LSTM layer, followed by a single-layer graph neural network that connects all variables with the same name (referred to as the consistency layer), followed by another bidirectional LSTM layer. Our only modification is that the input to the model is a sequence of AST types and values, instead of syntactic program tokens.

GCN, GGNN and GNT are three graph neural networks that use as input the graph program representation described in Section 3. Here, GCN is a Graph Convolutional Network (Kipf & Welling, 2017), GGNN is a Gated Graph Neural Network (Li et al., 2016) and GNT is a graph implementation of a recently proposed transformer neural network architecture (Vaswani et al., 2017; Dehghani et al., 2019).

All models were trained with an embedding and hidden size of 128, batch size of 32, dropout of 0.1 (Srivastava et al., 2014), an initial learning rate of 0.001, the Adam optimizer (Kingma & Ba, 2014) and between 10 to 20 epochs.

**Reducing dependencies via dynamic halting.** We further strengthen our GNT model by implementing Adaptive Computation Time (ACT) (Graves, 2016), which dynamically learns how many computational steps each node requires in order to make a prediction. This is in contrast to using a fixed number of steps as done in (Allamanis et al., 2018; Brockschmidt et al., 2019). In our experiments, ACT significantly reduces the number of steps each node performs (half of the nodes perform 3 steps).

**Program modifications.** We instantiate the adversarial training with both the semantic-preserving and label-preserving modifications shown in Table 1. Here, expr is either an existing expression or a new expression consisting of a random binary expression over constants up to depth 3, const is a randomly selected constant that results in a valid expression, and x, y, z are variables in the program scope. Our modifications extend those used by Bielik et al. (2017), but the list is not exhaustive and can be extended further. To measure model robustness, we run the adversarial attack for 1000 iterations for renaming modifications and an additional 300 iterations for structural modifications. These thresholds are rather high and were selected with the goal of closely estimating the true number of adversarial samples. Further, note that since δ ∈ Δ(x) is a set, each iteration explores a set of concrete program modifications.
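Given an enumeration of (a subset of) the modifications in Δ(x) produced by the attack above, the robustness metric defined earlier in this section can be estimated as in the sketch below (our own simplification; predict is assumed to return either a label or the string "abstain").

```python
def robustness(dataset, predict, modifications):
    """dataset: list of (x, y); modifications(x): iterable of modified inputs x + delta;
    predict(x): label or "abstain". A sample counts as robust only if every explored
    modification yields either the correct label or an abstention."""
    robust = 0
    for x, y in dataset:
        if all(predict(x_mod) in (y, "abstain") for x_mod in modifications(x)):
            robust += 1
    return robust / len(dataset)
```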
Table 1. Illustration of the semantic-preserving and label-preserving program modifications used in our work.

Substitutions and renaming:

| Modification | Category | Example |
|---|---|---|
| variable renaming | semantic preserving | x → y |
| object field renaming | semantic preserving | obj.x → obj.y |
| property assignment renaming | semantic preserving | {x : obj} → {y : obj} |
| number substitution | label preserving | 2 → 7 |
| string substitution | label preserving | "get" → "load" |
| boolean substitution | label preserving | true → false |

Structural modifications:

| Modification | Category | Example |
|---|---|---|
| new function parameters | label preserving | def inc(x) → def inc(x, y) |
| new method arguments | label preserving | inc(x) → inc(x, expr) |
| ternary expressions | semantic preserving | expr1 → (expr2) ? expr1 : expr1 |
| array access | semantic preserving | expr → [expr, expr][const] |
| side-effect free expressions | dead code | expr |
| object expressions | dead code | {x : y, z : expr} |

Table 2. Comparison of accuracy and robustness across the models and training techniques considered in our work for the task of type inference. Adversarial training and the ability to abstain are applicable to all models. The representation refinement is designed specifically for models defined over graphs, including GCN, GGNN and GNT.

Standard training with loss ℓ(f(x), y) and adversarial training with loss max_{δ∈Δ(x)} ℓ(f(x+δ), y):

| Model | Accuracy (standard) | Robustness (standard) | Accuracy (adversarial) | Robustness (adversarial) |
|---|---|---|---|---|
| LSTM | 88.2 ± 0.2 | 44.9 ± 1.3 | 87.5 ± 0.4 | 51.9 ± 1.3 |
| DeepTyper | 88.4 ± 0.2 | 52.4 ± 1.2 | 87.1 ± 0.3 | 55.1 ± 2.6 |
| GCN | 82.6 ± 0.6 | 49.1 ± 1.1 | 81.9 ± 0.5 | 49.3 ± 3.1 |
| GGNN | 86.7 ± 0.4 | 52.1 ± 0.4 | 86.1 ± 0.2 | 57.9 ± 1.5 |
| GNT | 89.3 ± 0.9 | 47.4 ± 1.0 | 88.3 ± 0.4 | 50.0 ± 0.5 |

Our work, abstain + adversarial + refinement, with loss max_{δ∈Δ(x)} AbstainCE((f, g_h)(α(x+δ)), y):

| Setting | Model | Accuracy | Robustness |
|---|---|---|---|
| t_acc = 1.00 (abstain ≈ 70%) | GNT | 99.93% | 99.98% |
| t_acc = 1.00 (abstain ≈ 70%) | GGNN | 99.80% | 99.01% |
| t_acc = 0.00 | GNT | 86.6% | 62.3% |
| t_acc = 0.00 | GGNN | 87.7% | 67.0% |

### 5.1. Accurate and Adversarially Robust Models

We summarize the main results in Table 2. The first part shows the median test accuracy and standard deviation of the various models (across three trials trained with different random seeds). The GCN achieves the worst accuracy of 82.6%, while the accuracy of the remaining models is similar, with the GNT model performing best at 89.3%.

**Existing models are not robust.** While highly accurate, all models are also non-robust for up to half of the samples in the dataset. In other words, for every second sample x in our dataset, there exists a modification δ ∈ Δ(x) for which f(x) predicts the type correctly while f(x + δ) mis-predicts it. However, since these models were not trained with the goal of adversarial robustness, it is expected for them to be (at least partially) non-robust.

**Adversarial training alone is insufficient.** To improve the robustness, we next train the models using adversarial training as described in Appendix C. Unfortunately, while adversarial training increases the robustness, it does so only slightly. The best improvement was achieved for the LSTM and GGNN models (7% and 5.8%, respectively). For DeepTyper and GNT the robustness increased by 2.5%, while for GCN it is only 0.2%. This illustrates that while useful, adversarial training alone is not enough.

**Our work: training accurate models with abstain.** The models trained using our approach are shown in Table 2 (bottom). First, we trained our models to be both accurate and robust on a subset of the dataset. This can be achieved by setting the desired accuracy threshold, in our case t_acc = 1.00, which corresponds to training the model to make only correct predictions. For t_acc = 1.00, our approach learns an almost perfect model that is both accurate and robust for 30.0% of samples. Here, GNT learned 7 models and achieved 99.98% robustness, while GGNN learned 8 models with a robustness of 99.01%. Learning multiple models is crucial for achieving higher coverage, as a single model would not abstain for only 17–20% of the samples, compared to 30% using multiple models. The model did not achieve 100% accuracy and robustness for t_acc = 1.00 due to several samples included in the test set.
These samples were mis-predicted because they contained code structure not seen during training and not covered by the modifications δ ∈ Δ(x). This illustrates that it is important that the samples in D are diverse and contain all the language features and corner cases of the programs, or that the modifications Δ(x) are expressive enough that such cases can be discovered automatically during training.

**Our work: improving robustness.** Next, we train models that take advantage of the highly accurate and robust models trained with t_acc = 1.00, but make predictions for all the samples (i.e., do not abstain). This can be achieved by continuing the training while reducing t_acc to zero and conditioning on all the models trained with higher t_acc. In our experiments, we train a single additional model by directly setting t_acc = 0 after training with t_acc = 1.00. The results are shown in Table 2 (bottom) and lead to an additional robustness increase of 9.2% and 12.3% compared to using adversarial training only, for GGNN and GNT respectively. For GNT, the accuracy slightly decreases by 1.7%, which is expected as increasing model robustness typically comes at the cost of reduced accuracy (Tsipras et al., 2019). Interestingly, for GGNN our robust training increases the accuracy over both the adversarial training as well as standard training, by 1.9% and 1.0% respectively.

Table 3. Robustness breakdown for the GNT and GGNN models trained using our approach from Table 2 (bottom), with t_acc = 1.00.

| Model | Dataset | Size | Correct | Incorrect | Abstain |
|---|---|---|---|---|---|
| GNT | D_correct | 29.3% | 90.0% | 0.00% | 10.00% |
| GNT | D_abstain | 70.6% | — | 0.01% | 99.99% |
| GGNN | D_correct | 30.6% | 75.5% | 0.06% | 23.94% |
| GGNN | D_abstain | 69.3% | — | 1.46% | 98.54% |

Correct := ∀δ ∈ Δ(x). (f, g_h)(α(x + δ))_best = y; Incorrect := ∃δ ∈ Δ(x). (f, g_h)(α(x + δ))_best ∉ {y, abstain}.

**Adversarial robustness breakdown.** Table 3 provides a detailed breakdown of the robustness metric for the GNT and GGNN models trained with t_acc = 1.00 from Table 2 (bottom). Here, D_abstain contains samples for which the model abstains from making a prediction and D_correct contains samples for which the model, evaluated on a non-adversarial input (i.e., x without any modification), makes a correct prediction. We use correct to denote that a sample (x, y) is predicted correctly for all possible modifications δ ∈ Δ(x), incorrect denotes the complement of robustness (i.e., there exists a modification that leads to an incorrect prediction), and abstain denotes the remaining samples.

The GNT is precise and keeps predicting the correct label in 90% of cases, abstaining in the rest. This is even though the requirement for correct is very strict and demands a correct prediction under all modifications. When considering D_abstain, the GNT model is also precise and produces an incorrect prediction for only a single sample (0.01%). For GGNN the results are similar, but the model is both less precise (it keeps the correct prediction in 75.5% of cases) and less robust (1.46% of the samples in D_abstain can be modified to cause a mis-prediction). This shows that the majority of robustness errors in Table 2 are due to mis-predicted samples for which the model originally abstained.

## 6. Related Work

Our work is related to a number of different areas, from adversarial machine learning to learning over code.

**Model certainty.** Several approaches have been recently proposed to extend neural models with a certainty measure (Gal & Ghahramani, 2016; Liu et al., 2019; Gal, 2016; Geifman & El-Yaniv, 2017; 2019). In our work, we use the method proposed by Liu et al.
(2019), but apply it in a novel way to the adversarial setting with the goal of training robust models.

**Learning static analyzers from data.** A closely related work addresses the task of learning static analyzers (Bielik et al., 2017): it defines a domain-specific language to represent static analyzers, uses decision tree learning to obtain an interpretable model, and defines a procedure that finds counter-examples the model mis-classifies (used to re-train the model). At a high level, some of the steps are similar, but the actual technical solution is very different, as we address a general class of neural models and do not assume any prior knowledge (i.e., a domain-specific language).

**Adversarial training.** Even though the problem of adversarial robustness of code has been overlooked, adversarial training has already been applied to related domains: natural language processing (Miyato et al., 2017; Papernot et al., 2016; Gao et al., 2018; Liang et al., 2018; Belinkov & Bisk, 2017; Ebrahimi et al., 2018) and graphs (Dai et al., 2018; Zügner et al., 2018; Zügner & Günnemann, 2019). In the domain of graphs, existing works focus on attacking the graph structure (Dai et al., 2018; Zügner et al., 2018; Zügner & Günnemann, 2019) by considering that the nodes are fixed while edges can be added or removed. While this setting is natural for modelling many types of graphs, such approaches do not apply to the domain of code, where graph edges cannot be added and removed arbitrarily. In natural language processing, existing approaches generally perform two steps: (i) measure the contribution of individual words or characters to the prediction (e.g., using gradients (Liang et al., 2018), forward derivatives (Papernot et al., 2016) or head/tail scores (Gao et al., 2018)), and (ii) replace or remove those whose contribution is high (e.g., using dictionaries (Jia et al., 2019), character-level typos (Gao et al., 2018; Belinkov & Bisk, 2017; Ebrahimi et al., 2018), or handcrafted strategies (Liang et al., 2018)). The adversarial training used in our work operates similarly, except our modifications are designed over programs.

**Program representations.** A core challenge of using machine learning for code is designing a suitable program representation used as model input. Due to its simplicity, the most commonly used program representation is a sequence of words, obtained either by tokenizing the program (Hellendoorn et al., 2018) or by linearizing the abstract syntax tree (Li et al., 2018). This, however, ignores the fact that programs do have a rich structure, an issue addressed by representing programs as graphs (Allamanis et al., 2018; Brockschmidt et al., 2019) or as a combination of abstract syntax tree paths (Alon et al., 2019). In our work, we follow the approach proposed in recent works and represent programs as graphs. More importantly, we develop a novel technique that learns to refine the representation based on model predictions instead of fixing it a priori. As shown in our evaluation, this is crucial for learning robust models.

**Adversarial attacks for code.** Concurrently to our work, Yefet et al. (2019) explored the task of generating adversarial attacks for code via gradient-based optimization. In contrast, we introduce an approach to reduce the search space an adversarial attack needs to consider by learning to refine the representation.
Such a reduced search space is useful both for renaming and for structural modifications, whereas gradient-based optimization has so far been explored only for renaming. However, the two approaches are orthogonal and can be combined into one that learns both to reduce the search space and to efficiently find adversarial examples in this reduced search space.

**Type inference.** We evaluated our work on the task of type inference, for which a number of recent works proposed new neural architectures with the goal of improving accuracy. In contrast, the goal of our work is to study and improve the robustness of these models. To achieve this, we compare to two prior works in our evaluation (Schrouff et al., 2019; Hellendoorn et al., 2018). In addition to predicting types from source code, Malik et al. (2019) showed that it is possible to predict parameter types using natural language information obtained from method docstrings. Here, existing attacks on text (LSTM) can be used to assess its robustness, but evaluating text models is outside the scope of our work. Finally, two works concurrent to ours have proposed new models to improve accuracy: Typilus (Allamanis et al., 2020) and LambdaNet (Wei et al., 2020). Both of these works represent programs as graphs and use graph neural networks as the underlying model architecture, which makes our approach applicable. However, we note that for LambdaNet we expect the model to be quite robust, as the authors manually designed a sparse graph representation (by designing a static analysis to extract the type dependence graph) over which to learn.

## 7. Conclusion

We presented a new technique to train accurate and robust neural models of code. Our work addresses two key challenges inherent to the domain of code: the difficulty of computing the correct label for all samples (i.e., the input is an incomplete code snippet and program semantics are unknown) as well as the fact that programs are significantly larger and more structured compared to images or natural language. To address the first challenge, we allow the model to abstain from making a prediction, rather than forcing the model to make predictions for all samples (as done in prior works). To address the second challenge, we learn which parts of the program are relevant for the prediction and abstract away the rest (instead of using the entire program as input).

Further, we introduce a new procedure that trains multiple models, instead of one. This has several advantages, as each model is simpler and thus easier to train robustly, the learned representation is specialized to the kind of predictions it makes, and each model directly conditions on the predictions of prior models (instead of having to re-learn them). However, a disadvantage of our approach is that the models are learned sequentially, which slows down the training (i.e., training 10 models will take 10× more time). To speed up the training, it would be interesting to allow learning multiple models in parallel at each sequential step and then combine them, as explored by Shazeer et al. (2017).

We believe that our work is only one step in addressing the task of adversarially robust models of code and that many challenges remain open. For example, it remains to be seen how effective our approach is on other tasks over code, beyond type inference. Further, we optimize for the worst-case adversarial robustness, which corresponds to learning a robust model for all programs.
An interesting direction for future work is to optimize with respect to those modifications that are common among developers, especially if it is not possible to be robust to all of them. While we checked robustness for a wide range of program modifications, these are still far from exhaustive and more work is needed in defining new ones. Finally, as the number of possible modifications is large and growing, an interesting area is designing how they can be combined efficiently, as explored recently by Ramakrishnan et al. (2020) and Zhang et al. (2020).

## Acknowledgements

We would like to thank the anonymous reviewers who gave useful comments and provided interesting suggestions on how our work can be improved and extended. Further, we would like to acknowledge the work of Hellendoorn et al. (2018), which is publicly available and provided useful infrastructure for generating the datasets used in our work. The research leading to these results was partially supported by an ERC Starting Grant 680358.

## References

Abstreiter, K., Bielik, P., and Vechev, M. Improving robustness for models of code via sparse graph neural networks. Technical report, ETH Zürich, June 2020. URL https://www.research-collection.ethz.ch/handle/20.500.11850/431559.

Allamanis, M., Peng, H., and Sutton, C. A convolutional attention network for extreme summarization of source code. In International Conference on Machine Learning (ICML), 2016.

Allamanis, M., Brockschmidt, M., and Khademi, M. Learning to represent programs with graphs. In International Conference on Learning Representations, ICLR '18, 2018.

Allamanis, M., Barr, E. T., Ducousso, S., and Gao, Z. Typilus: Neural type hints. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '20, pp. 91–105, 2020.

Alon, U., Zilberstein, M., Levy, O., and Yahav, E. code2vec: Learning distributed representations of code. In Proceedings of the 46th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, volume 3 of POPL '19, pp. 40:1–40:29, 2019.

Belinkov, Y. and Bisk, Y. Synthetic and natural noise both break neural machine translation. CoRR, abs/1711.02173, 2017.

Bielik, P., Raychev, V., and Vechev, M. Learning a static analyzer from data. In International Conference on Computer Aided Verification, CAV '17, pp. 233–253, 2017.

Brockschmidt, M., Allamanis, M., Gaunt, A. L., and Polozov, O. Generative code modeling with graphs. In International Conference on Learning Representations, ICLR '19, 2019.

Dai, H., Li, H., Tian, T., Huang, X., Wang, L., Zhu, J., and Song, L. Adversarial attack on graph structured data. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of ICML '18, pp. 1115–1124. PMLR, 2018.

Dehghani, M., Gouws, S., Vinyals, O., Uszkoreit, J., and Kaiser, L. Universal transformers. In International Conference on Learning Representations, ICLR '19, 2019.

Ebrahimi, J., Rao, A., Lowd, D., and Dou, D. HotFlip: White-box adversarial examples for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), ACL '18, pp. 31–36, 2018.

Fernandes, P., Allamanis, M., and Brockschmidt, M. Structured neural summarization. In International Conference on Learning Representations, ICLR '19, 2019.

Gal, Y. Uncertainty in Deep Learning. PhD thesis, University of Cambridge, Department of Engineering, September 2016.

Gal, Y. and Ghahramani, Z.
Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of The 33rd International Conference on Machine Learning, volume 48 of ICML '16, pp. 1050–1059, 2016.

Gao, J., Lanchantin, J., Soffa, M. L., and Qi, Y. Black-box generation of adversarial text sequences to evade deep learning classifiers. CoRR, abs/1801.04354, 2018.

Geifman, Y. and El-Yaniv, R. Selective classification for deep neural networks. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NeurIPS '17, pp. 4885–4894, 2017.

Geifman, Y. and El-Yaniv, R. SelectiveNet: A deep neural network with an integrated reject option. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of ICML '19, pp. 2151–2159, 2019.

Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations, ICLR '15, 2015.

Graves, A. Adaptive computation time for recurrent neural networks. CoRR, abs/1603.08983, 2016.

Hellendoorn, V. J., Bird, C., Barr, E. T., and Allamanis, M. Deep learning type inference. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE '18, 2018.

Huang, P.-S., Stanforth, R., Welbl, J., Dyer, C., Yogatama, D., Gowal, S., Dvijotham, K., and Kohli, P. Achieving verified robustness to symbol substitutions via interval bound propagation. In Empirical Methods in Natural Language Processing, EMNLP '19, 2019.

Jia, R., Raghunathan, A., Göksel, K., and Liang, P. Certified robustness to adversarial word substitutions. In Empirical Methods in Natural Language Processing, EMNLP '19, 2019.

Kingma, D. and Ba, J. Adam: A method for stochastic optimization. In International Conference on Learning Representations, ICLR '14, 2014.

Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, ICLR '17, 2017.

Li, J., Wang, Y., Lyu, M. R., and King, I. Code completion with neural attention and pointer networks. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI '18, pp. 4159–4165, 2018.

Li, Y., Zemel, R., Brockschmidt, M., and Tarlow, D. Gated graph sequence neural networks. In International Conference on Learning Representations, ICLR '16, 2016.

Li, Y., Wang, S., Nguyen, T. N., and Van Nguyen, S. Improving bug detection via context-based code representation learning and attention-based neural networks. Proc. ACM Program. Lang., (OOPSLA):162:1–162:30, 2019.

Liang, B., Li, H., Su, M., Bian, P., Li, X., and Shi, W. Deep text classification can be fooled. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI '18, pp. 4208–4215, 2018.

Liu, Z., Wang, Z., Liang, P. P., Salakhutdinov, R. R., Morency, L.-P., and Ueda, M. Deep gamblers: Learning to abstain with portfolio theory. In Advances in Neural Information Processing Systems 32, NeurIPS '19, pp. 10622–10632, 2019.

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, ICLR '18, 2018.

Malik, R. S., Patra, J., and Pradel, M. NL2Type: Inferring JavaScript function types from natural language information.
In Proceedings of the 41st International Conference on Software Engineering, ICSE '19, pp. 304–315, 2019.

Miyato, T., Dai, A. M., and Goodfellow, I. Adversarial training methods for semi-supervised text classification. In International Conference on Learning Representations, ICLR '17, 2017.

Mou, L., Li, G., Zhang, L., Wang, T., and Jin, Z. Convolutional neural networks over tree structures for programming language processing. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI '16, pp. 1287–1293, 2016.

Papernot, N., McDaniel, P. D., Swami, A., and Harang, R. E. Crafting adversarial input sequences for recurrent neural networks. CoRR, abs/1604.08275, 2016.

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, NeurIPS '19, pp. 8024–8035, 2019.

Pradel, M. and Sen, K. DeepBugs: A learning approach to name-based bug detection. Proc. ACM Program. Lang., (OOPSLA):147:1–147:25, 2018.

Raghunathan, A., Steinhardt, J., and Liang, P. Certified defenses against adversarial examples. In International Conference on Learning Representations, ICLR '18, 2018.

Ramakrishnan, G., Henkel, J., Wang, Z., Albarghouthi, A., Jha, S., and Reps, T. Semantic robustness of models of source code, 2020.

Raychev, V., Vechev, M., and Krause, A. Predicting program properties from "Big Code". In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '15, pp. 111–124, 2015.

Schrouff, J., Wohlfahrt, K., Marnette, B., and Atkinson, L. Inferring JavaScript types using graph neural networks. In Representation Learning on Graphs and Manifolds, ICLR Workshop, 2019.

Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q. V., Hinton, G. E., and Dean, J. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. CoRR, abs/1701.06538, 2017.

Sinha, A., Namkoong, H., and Duchi, J. Certifiable distributional robustness with principled adversarial training. In International Conference on Learning Representations, ICLR '18, 2018.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., 15(1):1929–1958, January 2014. ISSN 1532-4435.

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. In International Conference on Learning Representations, ICLR '14, 2014.

Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., and Madry, A. Robustness may be at odds with accuracy. In International Conference on Learning Representations, ICLR '19, 2019.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems 30, NeurIPS '17, pp. 5998–6008, 2017.

Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. Graph attention networks. In International Conference on Learning Representations, 2018.

Wang, M., Yu, L., Zheng, D., Gan, Q., Gai, Y., Ye, Z., Li, M., Zhou, J., Huang, Q., Ma, C., Huang, Z., Guo, Q., Zhang, H., Lin, H., Zhao, J., Li, J., Smola, A.
J., and Zhang, Z. Deep Graph Library: Towards efficient and scalable deep learning on graphs. In Representation Learning on Graphs and Manifolds, ICLR Workshop, 2019.

Wei, J., Goyal, M., Durrett, G., and Dillig, I. LambdaNet: Probabilistic type inference using graph neural networks. In International Conference on Learning Representations, ICLR '20, 2020.

Wong, E. and Kolter, Z. Provable defenses against adversarial examples via the convex outer adversarial polytope. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of ICML '18, pp. 5286–5295, 2018.

Wu, F., Souza, A., Zhang, T., Fifty, C., Yu, T., and Weinberger, K. Simplifying graph convolutional networks. In Proceedings of the 36th International Conference on Machine Learning, ICML '19, pp. 6861–6871. PMLR, 2019.

Yefet, N., Alon, U., and Yahav, E. Adversarial examples for models of code, 2019.

Zhang, J., Wang, X., Zhang, H., Sun, H., Wang, K., and Liu, X. A novel neural source code representation based on abstract syntax tree. In Proceedings of the 41st International Conference on Software Engineering, ICSE '19, pp. 783–794, 2019.

Zhang, Y., Albarghouthi, A., and D'Antoni, L. Robustness to programmable string transformations via augmented abstract training. In Proceedings of the 37th International Conference on Machine Learning, ICML '20, 2020.

Zügner, D. and Günnemann, S. Adversarial attacks on graph neural networks via meta learning. In International Conference on Learning Representations, ICLR '19, 2019.

Zügner, D., Akbarnejad, A., and Günnemann, S. Adversarial attacks on neural networks for graph data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '18, pp. 2847–2856, 2018.