# Manipulating SGD with Data Ordering Attacks

Ilia Shumailov (University of Cambridge & University of Toronto & Vector Institute) ilia.shumailov@cl.cam.ac.uk
Zakhar Shumaylov (University of Cambridge) zs334@cam.ac.uk
Dmitry Kazhdan (University of Cambridge) dk525@cam.ac.uk
Yiren Zhao (University of Cambridge) yiren.zhao@cl.cam.ac.uk
Nicolas Papernot (University of Toronto & Vector Institute) nicolas.papernot@utoronto.ca
Murat A. Erdogdu (University of Toronto & Vector Institute) erdogdu@cs.toronto.edu
Ross Anderson (University of Cambridge & University of Edinburgh) ross.anderson@cl.cam.ac.uk

35th Conference on Neural Information Processing Systems (NeurIPS 2021).

Machine learning is vulnerable to a wide variety of attacks. It is now well understood that by changing the underlying data distribution, an adversary can poison the model trained with it or introduce backdoors. In this paper we present a novel class of training-time attacks that require no changes to the underlying dataset or model architecture, but instead only change the order in which data are supplied to the model. In particular, we find that the attacker can either prevent the model from learning, or poison it to learn behaviours specified by the attacker. Furthermore, we find that even a single adversarially-ordered epoch can be enough to slow down model learning, or even to reset all of the learning progress. Indeed, the attacks presented here are not specific to the model or dataset, but rather target the stochastic nature of modern learning procedures. We extensively evaluate our attacks on computer vision and natural language benchmarks to find that the adversary can disrupt model training and even introduce backdoors.

## 1 Introduction

The data-driven nature of modern machine learning (ML) training routines puts pressure on data supply pipelines, which become increasingly complex. It is common to find separate disks or whole content distribution networks dedicated to servicing massive datasets. Training is often distributed across multiple workers. This emergent complexity gives a perfect opportunity for an attacker to disrupt ML training while remaining covert. Stochastic gradient descent (SGD) assumes uniform random sampling of items from the training dataset, yet in practice this randomness is rarely tested or enforced. Here, we focus on adversarial data sampling.

It is now well known that malicious actors can poison data and introduce backdoors, forcing ML models to behave differently in the presence of triggers [12]. While such attacks have been shown to pose a real threat, they have so far required the attacker to perturb the dataset used for training. We now show that by simply changing the order in which batches or data points are supplied to a model during training, an attacker can affect model behaviour.

[Table 1: Taxonomy of training time attacks. Columns: dataset knowledge, model knowledge, model specific, changing dataset, adding data, adding perturbations. Rows: our attacks, highlighted in green (Batch Reorder, Batch Reshuffle, Batch Replace), and prior work (Adversarial initialisation [11], BadNets [12], Dynamic triggers [28], Poisoned frogs [32]).]

More precisely, we show that it is possible to perform integrity and availability attacks without adding or modifying any data points. For integrity, an attacker can reduce model accuracy or arbitrarily control its predictions in the presence of particular triggers.
For availability, an attacker can increase the amount of time it takes for the model to train, or even reset the learning progress, forcing the model parameters into a meaningless state.

We present three types of attacks that exploit Batch Reordering, Reshuffling and Replacing, naming them BRRR attacks. We show that an attacker can significantly change model performance by (i) changing the order in which batches are supplied to models during training; (ii) changing the order in which individual data points are supplied to models during training; and (iii) replacing datapoints from batches with other points from the dataset to promote specific data biases. Furthermore, we introduce Batch-Order Poison (BOP) and Batch-Order Backdoor (BOB), the first techniques that enable poisoning and backdooring of neural networks using only clean data and clean labels; an attacker can control the parameter update of a model by an appropriate choice of benign datapoints. Importantly, BRRR attacks require no underlying model access or knowledge of the dataset. Instead, they focus on the stochasticity of gradient descent, disrupting how well individual batches approximate the true distribution that a model is trying to learn.

To summarise, we make the following contributions in this paper:

- We present a novel class of attacks on ML models that target the data batching procedure used during training, affecting their integrity and availability.
- We present a theoretical analysis explaining how and why these attacks work, showing that they target fundamental assumptions of stochastic learning, and are therefore model and dataset agnostic.¹
- We evaluate these attacks on a set of common computer vision and language benchmarks, using a range of different hyper-parameter configurations, and find that an attacker can slow the progress of training, as well as reset it, with just a single epoch of intervention.
- We show that data order can poison models and introduce backdoors, even in a blackbox setup. For a whitebox setup, we find that the adversary can introduce backdoors almost as well as if they used perturbed data. While a baseline CIFAR10 VGG16 model that uses perturbed data gets 99% trigger accuracy, the whitebox BOB attacker gets 91% ± 13 and the blackbox BOB attacker achieves 68% ± 19.

¹ Codebase is available here: https://github.com/iliaishacked/sgd_datareorder
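To make the batch-reordering idea concrete, the sketch below shows one way an attacker in this position could reorder otherwise untouched training batches by the loss a surrogate model assigns to them. It is a minimal illustration of the setting rather than the authors' released code: the surrogate model, the ascending-loss policy, and helper names such as `reorder_batches` are our own assumptions.

```python
# Minimal sketch of a BRRR-style batch-reordering step (assumed PyTorch-like
# workflow; names and the ordering policy are hypothetical, not from the paper).
import torch


def reorder_batches(batches, surrogate, loss_fn, policy="low_to_high"):
    """Return the same clean batches, reordered by surrogate-model loss."""
    scores = []
    with torch.no_grad():
        for x, y in batches:
            scores.append(loss_fn(surrogate(x), y).item())
    order = sorted(range(len(batches)), key=lambda i: scores[i],
                   reverse=(policy == "high_to_low"))
    return [batches[i] for i in order]


def train_one_epoch(model, surrogate, batches, loss_fn, opt, surrogate_opt):
    # The attacker changes only the order; every data point and label is clean.
    for x, y in reorder_batches(batches, surrogate, loss_fn):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

        # Keep the surrogate roughly in sync with the victim, since a blackbox
        # attacker never reads the victim's parameters or loss directly.
        surrogate_opt.zero_grad()
        loss_fn(surrogate(x), y).backward()
        surrogate_opt.step()
```

In the whitebox variant the surrogate would simply be replaced by the model under attack, since its loss can then be computed directly.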
## 2 Related Work

Attacks on integrity: Szegedy et al. [34] and Biggio et al. [6] concurrently discovered the existence of adversarial examples. These samples, containing human-imperceptible perturbations, cause models to output incorrect results during inference. The original whitebox attacks require the adversary to access the model and use gradient information to perform constrained optimisation that maximises the model loss [34, 6, 10, 22]. The attack was later generalised to blackbox setups, where the adversary trains a surrogate model and hopes the generated adversarial samples transfer to the target model [24]. Data-poisoning attacks use data manipulation to cause DNNs to fail on specific test-time instances [17]. Chen et al. demonstrated that manipulating the labels of around 50 training samples is enough to trigger failure [8]. Gu et al. showed that attackers can associate adversarial patterns with labelled images and cause DNNs to overfit to this pattern [12]. Shafahi et al. launched a poisoning attack using instances with clean labels [29]. A number of other works have since created more efficient triggers [28]. It was commonly believed that poisoning attacks on DNNs must involve some malicious manipulation of either the data or the labels at training time. However, this paper shows how poisoning is possible with clean data and clean labels, where the only manipulation is of the batching process at training time.

Attacks on availability: Shumailov et al. first attacked the availability of computer vision and natural language processing models at inference time with sponge examples [30]. Their examples maximise energy utilisation and inference latency, targeting hardware and internal model optimisations. By contrast, this paper targets availability at training time. We show that the attacker can reset or slow down training progress by reordering or reshuffling natural batches. Finally, we note that unlike Shumailov et al., our attacks do not target specific optimisations in hardware or individual models, but instead break the fundamental stochastic assumption of training.

Note on terminology: As we study attacks on training, we use the terms integrity and availability in a slightly different sense from the rest of the literature. By availability, we mean the availability of the training procedure, i.e. the rate at which the model trains, as opposed to model performance on the task. By integrity attacks, we mean attacks that either stop the model from learning its intended task, or force it to learn an additional unrelated task.

## 3 Methodology

### 3.1 Threat model

[Figure 1: The attacker reorders the benign, randomly supplied data based on the surrogate model's outputs. The attacker co-trains the surrogate model with the data that is supplied to the source model.]

We assume one of the strongest threat models currently described in the literature. In particular, our blackbox attacker has no access to the model and no prior knowledge of the training data, whereas our whitebox attacker has access to the model under attack and can compute its loss directly. The attack specifically focuses on the batching part of the ML pipeline, as depicted in Figure 1. We discuss the related work in Section 2.

This attack is realistic and can be instantiated in several ways. The attack code can be infiltrated into: the operating system handling file-system requests; the disk handling individual data accesses; the software that determines how random data sampling is performed; the distributed storage manager; or the machine learning pipeline itself handling prefetch operations. That is a substantial attack surface, and for large models these components may be controlled by different principals.

The attack is also very stealthy. The attacker does not add any noise or perturbation to the data. There are no triggers or backdoors introduced into the dataset. All of the data points are natural. In two of four variants the attacker uses the whole dataset and does not oversample any given point, i.e. the sampling is without replacement. This makes it difficult to deploy simple countermeasures.
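As one concrete illustration of this attack surface, the sketch below shows how a malicious sampler could be slotted into a standard PyTorch `DataLoader` in place of the usual random sampler. The `AdversarialOrderSampler` class and its ranking rule are our own hypothetical example under this threat model, not code from the paper.

```python
# Hypothetical example of the batching-stage attack surface: a drop-in sampler
# that controls only the order in which clean indices reach the victim model.
from torch.utils.data import DataLoader, Sampler


class AdversarialOrderSampler(Sampler):
    """Yields every dataset index exactly once, in an attacker-chosen order."""

    def __init__(self, dataset, rank_fn):
        self.dataset = dataset
        self.rank_fn = rank_fn  # e.g. surrogate-model loss of each data point

    def __iter__(self):
        scores = [self.rank_fn(self.dataset[i]) for i in range(len(self.dataset))]
        # Sampling without replacement: same data, same labels, new order only.
        return iter(sorted(range(len(self.dataset)), key=lambda i: scores[i]))

    def __len__(self):
        return len(self.dataset)


# The victim's training code stays unchanged apart from the sampler it is handed,
# e.g. DataLoader(train_set, batch_size=32,
#                 sampler=AdversarialOrderSampler(train_set, rank_fn)).
```

Because the loader still touches each clean data point exactly once per epoch, simple countermeasures based on inspecting or de-duplicating the data have nothing to detect.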
### 3.2 Primer on stochastic learning and batching

We assume that the defender is trying to train a deep neural network model with parameters $\theta$ operating over $X_i \in X_{\text{train}}$, solving a non-convex optimization problem corresponding to the minimization of a given loss function $L(\theta)$. We will denote the training dataset $X = \{X_i\}$.

We assume a commonly-used loss function defined as the sample average of the per-data-point loss $L_i(\theta) = L(X_i, \theta)$ over the $k$-th batch, where $B$ is the batch size:

$$\hat{L}_{k+1}(\theta) = \frac{1}{B} \sum_{i=kB+1}^{kB+B} L_i(\theta).$$

If we let $N \cdot B$ be the total number of items used for training, then in a single epoch one aims to optimize

$$\hat{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \hat{L}_i(\theta).$$

Optimizing over the $N \cdot B$ samples with the stochastic gradient descent (SGD) algorithm and a learning rate $\eta$ leads to the following weight-update rule over one epoch:

$$\theta_{k+1} = \theta_k + \eta \, \Delta\theta_k, \qquad \Delta\theta_k = -\nabla_{\theta} \hat{L}_k(\theta_k).$$

SGD is often implemented with momentum [25, 33], with $\mu$ and $v$ representing momentum and velocity respectively:

$$v_{k+1} = \mu v_k + \eta \, \Delta\theta_k, \qquad \theta_{k+1} = \theta_k + v_{k+1}.$$

Given the data, SGD's stochasticity comes from the batch sampling procedure. Mini-batched gradients approximate the true gradients of $\hat{L}$, and the quality of this approximation can vary greatly. In fact, assuming an unbiased sampling procedure, i.e. when the $k$-th gradient step corresponds to the $i_k$-th batch with $P(i_k = i) = 1/N$, the batch gradient matches the true gradient in expectation:

$$\mathbb{E}[\nabla \hat{L}_{i_k}(\theta)] = \sum_{i=1}^{N} P(i_k = i) \, \nabla \hat{L}_i(\theta) = \frac{1}{N} \sum_{i=1}^{N} \nabla \hat{L}_i(\theta) = \nabla \hat{L}(\theta). \tag{1}$$

Although this holds in expectation, a given batch taken in isolation can be very far from the mean. This variation has been exploited in the literature to aid training: there is a whole field of variance-reduction techniques for stochastic optimisation [18], as well as active learning [9, 35], (anti-)curriculum learning [5, 20, 13] and core-set construction [2]. Each area looks at identifying and scheduling data subsets that aid training and give a better approximation of the true gradient. In this paper, we turn things round and investigate how an attacker can exploit data order to break training. The explicit stochastic assumption opens a new attack surface through which the attacker can influence the learning process. In particular, let us consider the effect of $N$ SGD steps over one epoch [31]:

$$\theta_{N+1} = \theta_1 - \eta \nabla \hat{L}_1(\theta_1) - \eta \nabla \hat{L}_2(\theta_2) - \cdots - \eta \nabla \hat{L}_N(\theta_N) = \theta_1 - \eta \sum_{j=1}^{N} \nabla \hat{L}_j(\theta_1) + \eta^2 \overbrace{\sum_{j=1}^{N} \sum_{k<j} \nabla^2 \hat{L}_j(\theta_1) \, \nabla \hat{L}_k(\theta_1)}^{\text{data order dependent}} + O(\eta^3).$$
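A toy calculation makes this order dependence tangible. The snippet below is our own illustration, not an experiment from the paper: it runs one epoch of SGD over the same three scalar quadratic "batches" in two different orders. The first-order term is identical in both runs, so the small gap between the final parameters comes entirely from the order-dependent $\eta^2$ term.

```python
# Toy demonstration (our illustration, not from the paper) that end-of-epoch
# weights depend on batch order through the second-order term above.
# Each "batch" i has loss L_i(theta) = a_i / 2 * (theta - c_i)^2,
# so its gradient is a_i * (theta - c_i).

def one_epoch(theta, batches, eta=0.1):
    for a, c in batches:
        theta = theta - eta * a * (theta - c)  # one SGD step on batch (a, c)
    return theta


batches = [(1.0, 0.0), (2.0, 1.0), (0.5, -1.0)]  # the same clean batches
theta0 = 0.5

theta_fwd = one_epoch(theta0, batches)
theta_rev = one_epoch(theta0, list(reversed(batches)))

# First-order contributions are identical for the two orderings, so the gap
# between theta_fwd and theta_rev is O(eta^2).
print(theta_fwd, theta_rev, theta_fwd - theta_rev)
```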