# Certified Unlearning for Neural Networks

Anastasia Koloskova*1, Youssef Allouah*2, Animesh Jha1, Rachid Guerraoui2, Sanmi Koyejo1

*Equal contribution. 1Stanford University, USA. 2EPFL, Switzerland. Correspondence to: Anastasia Koloskova, Youssef Allouah. Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s).

We address the problem of machine unlearning, where the goal is to remove the influence of specific training data from a model upon request, motivated by privacy concerns and regulatory requirements such as the right to be forgotten. Unfortunately, existing methods rely on restrictive assumptions or lack formal guarantees. To this end, we propose a novel method for certified machine unlearning, leveraging the connection between unlearning and privacy amplification by stochastic post-processing. Our method uses noisy fine-tuning on the retain data, i.e., data that does not need to be removed, to ensure provable unlearning guarantees. This approach requires no assumptions about the underlying loss function, making it broadly applicable across diverse settings. We analyze the theoretical trade-offs in efficiency and accuracy and demonstrate empirically that our method not only achieves formal unlearning guarantees but also performs effectively in practice, outperforming existing baselines. Our code is available at https://github.com/stair-lab/certified-unlearningneural-networks-icml-2025

1. Introduction

Machine unlearning, the process of removing the influence of specific training data from a model, has become an increasingly important challenge in modern machine learning (Nguyen et al., 2022). With the widespread adoption of deep learning in fields such as healthcare, natural language processing, and computer vision, concerns over data privacy, security, and control have grown significantly. In particular, regulatory frameworks like the General Data Protection Regulation (GDPR) of the European Union (Voigt & Von dem Bussche, 2017) enforce the right to be forgotten, requiring organizations to delete user data upon request. However, simply removing data from storage is insufficient if the information remains embedded in a trained model. This has led to growing interest in unlearning techniques, which seek to eliminate the influence of specific data points while preserving the overall utility of the model. Achieving efficient and reliable unlearning is particularly challenging for large-scale neural networks, where full retraining from scratch is computationally prohibitive.

The idea of machine unlearning dates back to Cao & Yang (2015) and has since inspired a range of approaches. Broadly, unlearning techniques fall into two categories: exact unlearning, which aims to completely erase the influence of specific data points, and approximate unlearning, which seeks a computationally efficient but approximate removal of information. While exact unlearning offers the strongest theoretical guarantees, it is rarely practical for large-scale models or frequent unlearning requests due to its prohibitive computational costs (Ginart et al., 2019; Bourtoule et al., 2021). As a result, many existing methods adopt relaxed guarantees, but these often come with significant trade-offs.
For instance, some approaches rely on restrictive assumptions about loss functions, such as convex linear models (Guo et al., 2020), while others lack rigorous theoretical guarantees (Graves et al., 2021; Kurmanji et al., 2024) or require extensive retraining (Bourtoule et al., 2021). One common heuristic is fine-tuning on the retained data, potentially combined with gradient ascent on the forget data, to induce catastrophic forgetting in the affected parts of the model (Triantafillou et al., 2024). While this technique has shown promise in reducing the retained influence of forgotten data, it does not inherently provide certifiable guarantees of unlearning, leaving open questions about its reliability in privacy-sensitive applications.

To address these limitations, we propose a new approximate unlearning framework that builds on the concept of privacy amplification by stochastic post-processing (Balle et al., 2019). Our method leverages noisy fine-tuning on retained data to enforce provable unlearning guarantees while maintaining computational efficiency. Unlike existing approaches, our framework does not impose restrictive assumptions on the loss function, making it particularly well-suited for non-convex optimization problems such as deep learning. We interpret each noisy fine-tuning step as a form of stochastic post-processing, ensuring that privacy improves progressively with each iteration while balancing the trade-offs between accuracy and computational cost. Through rigorous theoretical analysis and extensive empirical validation, we demonstrate that our method provides both formal guarantees and strong practical performance, making it a viable solution for real-world machine learning applications where frequent unlearning requests must be handled efficiently.

Our key contributions can be summarized as follows:

- A novel certified unlearning method that integrates noisy fine-tuning with privacy amplification by stochastic post-processing, offering a principled approach to approximate unlearning.
- Rigorous unlearning guarantees that do not depend on restrictive assumptions such as loss function smoothness, making the method applicable to a broad range of models, including deep neural networks.
- Empirical validation in deep learning applications, demonstrating that our approach not only meets formal unlearning guarantees but also surpasses existing baselines in performance and model utility.

1.1. Related Work

Amplification by post-processing. Our approach is inspired by privacy amplification via stochastic post-processing, a concept in differential privacy where randomized transformations that do not use private data enhance privacy guarantees. The foundational work of Feldman et al. (2018) introduced privacy amplification by iteration, demonstrating that when training with differentially private stochastic gradient descent (DP-SGD) on convex objectives, the privacy of an unused training sample improves with the number of optimization steps. Balle et al. (2019) extended this result, refining the analysis of amplification by iteration and introducing amplification by mixing, which establishes that privacy is further strengthened when applying a Markov kernel satisfying specific mixing conditions. Subsequent work by Asoodeh et al. (2020) showed that under bounded domain assumptions, these mixing conditions hold for the Gaussian mechanism, leading to tighter privacy guarantees for DP-SGD.
Our proposed unlearning framework, based on noisy fine-tuning on retained data, operates as a stochastic post-processing step that does not access the data to be forgotten. Thus, we extend privacy amplification techniques beyond convex settings to enable certified unlearning in deep learning models.

Certified unlearning. There is a growing body of work on certified unlearning, but existing approaches are largely inapplicable to neural networks. Most prior methods rely on restrictive assumptions about the model or loss function that do not hold for deep learning. Several works focus on convex tasks (Guo et al., 2020; Neel et al., 2021; Sekhari et al., 2021; Allouah et al., 2025), but this limits their applicability to deep learning. These methods work in practice for logistic regression-style tasks and do not extend to neural networks due to non-convexity. Recent works aim to achieve certified unlearning for non-convex tasks (Golatkar et al., 2020; Chourasia & Shah, 2023; Chien et al., 2024; Mu & Klabjan, 2024; Zhang et al., 2024; Allouah et al., 2025), but still impose significant constraints. All require the loss function to be smooth, limiting them to networks with smooth activations. Most (Chourasia & Shah, 2023; Chien et al., 2024; Mu & Klabjan, 2024) also require knowledge of the smoothness constant, restricting applicability to simpler models where this constant is tractable. Furthermore, Allouah et al. (2025) additionally assumes a unique minimizer, excluding virtually all practical neural architectures. Zhang et al. (2024) additionally requires knowledge of the minimal eigenvalue of the Hessian at the unique optimal model. To the best of our knowledge, our approach is the first certified unlearning method that supports arbitrary non-convex tasks, enabling provable unlearning guarantees for practical deep learning models.

Unlearning applications. A separate line of research focuses on concept unlearning, which aims to remove specific topics or themes from language models, beyond forgetting particular training samples (Liu et al., 2024). For instance, work in this domain has explored techniques for eliminating knowledge of topics like Harry Potter or other potentially harmful or unethical content from large language models (Eldan & Russinovich, 2023). These methods often involve intervention at the representation or knowledge distillation level, rather than enforcing formal guarantees of data removal. In contrast, our work focuses on data point unlearning, ensuring that information associated with specific training samples is provably removed while preserving model utility. Other explored unlearning settings include unlearning in graph neural networks (Chien et al., 2022), in min-max optimization settings (Liu et al., 2023), and in the adversarial setting where the server may forge unlearning (Thudi et al., 2021).

2. Problem Statement

We consider a model x̂ ∈ R^d trained using algorithm A on a dataset D of n training examples, i.e., x̂ = A(D). We place no restrictions on A; it may be SGD, Adam, momentum SGD, etc. An unlearning request specifies a subset Df ⊆ D, referred to as the forget set, which we wish to erase from the model. Ideally, we could retrain the model from scratch on the retain set D \ Df, yielding xu = A(D \ Df), but this is often computationally prohibitive. Instead, we aim to design an approximate unlearning algorithm U that outputs a model retaining no information about Df.
To this end, U takes as input the original model x̂ = A(D), the unlearning request Df, and the full training dataset D. Formally, unifying several prior definitions (Ginart et al., 2019; Guo et al., 2020), we require U to satisfy the guarantees below.

Definition 2.1 ((ε, δ)-unlearning). Let ε ≥ 0, δ ∈ [0, 1]. We say that U is an (ε, δ)-unlearning algorithm for A if there exists a certifying algorithm Ā such that, for any forget and initial datasets Df ⊆ D and any observation O ⊆ R^d,

Pr[U(A(D), D, Df) ∈ O] ≤ e^ε Pr[Ā(D \ Df) ∈ O] + δ,
Pr[Ā(D \ Df) ∈ O] ≤ e^ε Pr[U(A(D), D, Df) ∈ O] + δ.

For simplicity, we refer to approximate unlearning as (ε, δ)-unlearning for some values of ε and δ, inspired by (ε, δ)-differential privacy (Dwork & Roth, 2014). This formulation parallels differential privacy by treating Df as private and D \ Df as public, thereby ensuring (ε, δ)-privacy for the forget set. Indeed, this notion ensures it is statistically difficult to distinguish the output of the unlearning algorithm U from that of a certifying algorithm Ā that has no access to the forget set Df. Importantly, the definition does not fix Ā but only requires its existence, allowing flexibility to capture various prior definitions. For example, setting Ā = A recovers definitions from (Ginart et al., 2019; Guo et al., 2020), while setting Ā = U(A(D \ Df), D \ Df, ∅) aligns with (Allouah et al., 2025; Sekhari et al., 2021). We primarily adopt the latter form in this work. Note that our choice of the certifying algorithm Ā is purely theoretical and does not require running additional computational steps in practice. Finally, we emphasize that Definition 2.1 naturally supports non-adaptive sequential unlearning, where data points are removed one by one without revisiting earlier removals.

Baselines. We now describe two straightforward baselines for achieving (ε, δ)-unlearning. The first baseline is output perturbation, which applies the standard Gaussian mechanism to the original model x̂. The procedure involves clipping the model parameters to ensure bounded sensitivity and adding Gaussian noise to the model's output. Formally, the unlearning procedure is defined as:

x_0 = Π_{C0}(x̂) + ξ_0, where ξ_0 ∼ N(0, (8 C0² ln(1.25/δ) / ε²) I_d),   (1)

where Π_{C0} represents the clipping operation, and ξ_0 is noise sampled from the Gaussian distribution, with sufficient magnitude to ensure privacy (Dwork & Roth, 2014). While theoretically sound, output perturbation often performs poorly in practice, as the required noise magnitude is large, which can significantly degrade the utility of the model. Another baseline is retraining from scratch, where the forget dataset Df is discarded, and the model is retrained from scratch on the remaining data D \ Df using the algorithm A. Although this guarantees perfect unlearning (i.e., (0, 0)-unlearning), it is computationally expensive and requires substantial memory resources, which undermines the efficiency objectives of approximate unlearning.

3. Algorithm

Our approach is based on fine-tuning the model with stochastic gradient descent (SGD) using only the retained data D \ Df, while incorporating regularization. Recall that in this approach, we initialize from the original model x̂ and update it for T ≥ 1 iterations as follows, for every t ∈ {0, ..., T - 1}:

x_{t+1} = x_t - γ (g_t + λ x_t),  x_0 = x̂,   (2)

where g_t is the stochastic gradient computed on the retained data, γ is the learning rate, and λ a regularization parameter.
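As an illustration, the following minimal JAX sketch implements one step of the recursion (2); the flattened parameter vector, gradient function, and retain-data batch are hypothetical placeholders rather than the released code.

```python
import jax.numpy as jnp

def finetune_step(x, batch, grad_fn, gamma=0.01, lam=0.0):
    """One step of recursion (2): x_{t+1} = x_t - gamma * (g_t + lam * x_t).

    x       : flattened parameter vector (initialized at the original model x_hat)
    batch   : mini-batch drawn from the retain set D \\ D_f
    grad_fn : returns the stochastic gradient g_t of the loss at x on `batch`
    """
    g = grad_fn(x, batch)  # stochastic gradient computed on retained data only
    return x - gamma * (g + lam * x)
```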
While this method may induce some empirical forgetting due to the phenomenon of catastrophic forgetting in SGD (Goodfellow et al., 2013), it does not provide formal (ε, δ)-unlearning as required by Definition 2.1. Therefore, to frame each fine-tuning step as a stochastic post-processing operation (Balle et al., 2019) applied to the initial model x̂, we introduce gradient clipping and model clipping, both combined with Gaussian privacy noise. These modifications allow us to map the process to different stochastic post-processing mechanisms (Balle et al., 2019).

Gradient clipping. Our primary approach relies on gradient clipping, where we clip gradients before applying updates, followed by noise addition. This method resembles standard DP-SGD (Abadi et al., 2016) but differs in that gradient updates exclude any private (forget) data points:

x_0 = Π_{C0}(x̂),  x_{t+1} = x_t - γ (Π_{C1}(g_t) + λ x_t) + ξ_{t+1},   (3)

where ξ_{t+1} ∼ N(0, σ² I_d) is Gaussian noise and Π_{C0}, Π_{C1} are the clipping operators of radius C0 > 0 and C1 > 0, respectively.¹ We analyze this method in the theoretical framework of privacy amplification by iteration (Feldman et al., 2018), to show that redistributing noise across multiple steps enables a reduction in noise at the initial step while maintaining strong privacy guarantees. This noise reduction retains the original model's performance. The regularization with parameter λ additionally allows implicitly controlling the norm of the model x_t, which is increased due to the presence of the noise ξ_t.

¹Following standard differential privacy mechanisms, our clipping is by norm, i.e., Π_C(x) := x · min{C/‖x‖, 1} with radius C.

Model clipping. An alternative approach involves model clipping, where each update is clipped to a predefined radius before noise addition:

x_{t+1} = Π_{C2}(x_t - γ (g_t + λ x_t)) + ξ_{t+1},   (4)

where ξ_{t+1} ∼ N(0, σ² I_d) and ξ_0 ∼ N(0, σ_0² I_d). Since the gradient g_t is computed solely on the retain data, the argument of the clipping operator can be interpreted as a post-processing transformation of the private model, given by the mapping ψ(x_t) := x_t - γ (g_t + λ x_t). The clipping and noise addition ensure differential privacy guarantees (Dwork & Roth, 2014). Standard results on post-processing state that such transformations preserve or improve existing differential privacy guarantees. Additionally, privacy amplification by stochastic post-processing (Balle et al., 2019) suggests that each additional step can further enhance privacy beyond that of the previous model.
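Before turning to the analysis, here is a minimal sketch of a single step of each variant, assuming flattened parameters and a stochastic gradient g already computed on the retain data; clipping is by Euclidean norm as in footnote 1. This is an illustration under those assumptions, not the released implementation.

```python
import jax
import jax.numpy as jnp

def clip_by_norm(v, radius):
    """Pi_C(v) = v * min{C / ||v||, 1}, the norm clipping of footnote 1."""
    return v * jnp.minimum(radius / jnp.maximum(jnp.linalg.norm(v), 1e-12), 1.0)

def gradient_clipping_step(x, g, key, gamma, lam, C1, sigma):
    """Update (3): clip the retain-data gradient, take a regularized step, add noise.
    The sequence is initialized at x_0 = Pi_{C0}(x_hat)."""
    xi = sigma * jax.random.normal(key, x.shape)
    return x - gamma * (clip_by_norm(g, C1) + lam * x) + xi

def model_clipping_step(x, g, key, gamma, lam, C2, sigma):
    """Update (4): take a regularized step, clip the resulting iterate, add noise."""
    xi = sigma * jax.random.normal(key, x.shape)
    return clip_by_norm(x - gamma * (g + lam * x), C2) + xi
```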
4. Theoretical Analysis

We now present the approximate unlearning guarantees of the gradient and model clipping variants of our unlearning method introduced in the previous section. We also theoretically compare these with previous non-convex certified unlearning algorithms. We defer all proofs to Appendix A.

4.1. Unlearning Guarantees

We first present in Theorem 4.1 the unlearning guarantees of the Gradient clipping approach, as defined in Equation (3).

Theorem 4.1 (Gradient clipping). Let T ≥ 1, γ, σ > 0, λ ≥ 0, δ ∈ (0, 1), ε ∈ (0, 3 log(1/δ)). Consider T iterations of the unlearning algorithm defined in (3). We obtain (ε, δ)-unlearning if:

1. Without regularization (λ = 0):

σ² = (9 log(1/δ) / (ε² T)) (C0 + C1 γ T)².   (5)

2. With regularization (λ > 0): if γλ ≤ 1/2,

σ² = (72 γλ log(1/δ) / ε²) (C0 (1 - γλ)^T + C1/λ)².   (6)

The proof relies on techniques from privacy amplification by iteration introduced by Feldman et al. (2018), which is a form of privacy amplification through stochastic post-processing (Balle et al., 2019). We extend the original analysis of Feldman et al. (2018), which is applicable only to convex functions, to the non-convex case. The key ingredient in our analysis is the shift-reduction lemma (Lemma A.6), which enables us to control, across iterations, the growth of the Rényi divergence between (i) the unlearned model initialized from the full model and (ii) the unlearned model initialized from training only on the retained data. Unlike the original privacy amplification by iteration framework, which assumes smoothness and convexity of the loss function (Feldman et al., 2018), our approach circumvents these assumptions by introducing gradient clipping. This is reflected in the sufficient noise magnitude required to achieve (ε, δ)-unlearning, as given in (5). Specifically, in the absence of regularization (λ = 0), gradient clipping at threshold C1 induces a dependence of the form:

σ² = O( (log(1/δ)/ε²) (C0²/T + γ² T C1²) ).   (7)

Intuitively, for a large enough number of iterations, we can trade off the effect of the initial clipping radius C0 for a minor cost proportional to the learning rate γ and the per-iteration clipping radius C1. Fortunately, this cost can be controlled by choosing a sufficiently small learning rate γ. In particular, if γ = C0/(C1 T), then asymptotically in T we get σ² = O(C0² log(1/δ)/(ε² T)). This implies that the required noise magnitude decreases as the number of iterations T increases, ultimately tending to zero in the limit T → ∞. Given C0, C1, and γ, we can upper bound the optimal number of unlearning steps by minimizing (7). Specifically, setting T larger than T* := argmin_T (C0²/T + γ² T C1²) leads to more iterations (T > T*) with increased noise per iteration (σ_T ≥ σ_{T*} by definition of T*), while achieving the same (ε, δ)-unlearning.

Finally, in the regularized case λ > 0, the noise expression (6) for achieving (ε, δ)-unlearning simplifies to:

σ² = O( (γ log(1/δ)/ε²) (λ C0² exp(-λγT) + C1²/λ) ).   (8)

Here, regularization enables an exponential reduction in T of the dependence on the initial clipping threshold C0, effectively mitigating its impact over time. However, this comes at the cost of an increased dependence on the per-iteration clipping threshold C1, scaling inversely with the regularization factor λ. While this suggests a smaller noise magnitude per iteration, overly strong regularization may degrade the model's performance, as we further analyze in the next section. Finally, we provide a refined, but complex, formula for noise magnitudes in Theorem A.9 (appendix) using Rényi divergences, which we apply before a precise Rényi-to-DP conversion (Balle et al., 2020) in practice.
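For concreteness, the simplified bounds (5) and (6) can be evaluated directly; the following Python sketch implements them as reconstructed above, and is only illustrative (the experiments instead use the tighter Rényi accounting of Theorem A.9).

```python
import math

def gradient_clipping_sigma2(eps, delta, T, C0, C1, gamma, lam=0.0):
    """Sufficient per-step noise variance for (eps, delta)-unlearning with update (3),
    following Theorem 4.1: formula (5) when lam == 0, formula (6) when lam > 0
    (which additionally assumes gamma * lam <= 1/2), with eps <= 3 * log(1/delta)."""
    assert 0 < eps <= 3 * math.log(1 / delta)
    if lam == 0.0:
        return 9 * math.log(1 / delta) / (eps ** 2 * T) * (C0 + C1 * gamma * T) ** 2
    assert gamma * lam <= 0.5
    return (72 * gamma * lam * math.log(1 / delta) / eps ** 2
            * (C0 * (1 - gamma * lam) ** T + C1 / lam) ** 2)
```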
Next, we state the unlearning guarantees of the Model clipping approach, as defined in Equation (4).

Theorem 4.2 (Model clipping). Let T ≥ 1, C0, C2, σ0, ε > 0, and δ ∈ (0, 1). Denote for every r > 0,

θ_ε(r) := Q(ε/r - r/2) - e^ε Q(ε/r + r/2),   (9)

where for all t ∈ R, Q(t) := (1/√(2π)) ∫_t^∞ e^{-u²/2} du. Consider T iterations of the unlearning algorithm defined in (4). We obtain (ε, δ)-unlearning if

T ≥ (log(1/δ) + log θ_ε(2C0/σ0)) / log(1/θ_ε(2C2/σ)).   (10)

In particular, for any T ≥ 1, ε ∈ (0, 1), it suffices to have

σ² = 8 C2² ln(1.25) ln(1.25/δ) / (σ0² ε²).   (11)

The proof of Theorem 4.2 relies on recent advances in privacy amplification by stochastic post-processing due to Balle et al. (2019) and Asoodeh et al. (2020). The initial iteration in (4) provides (ε_0, δ_0)-unlearning since it is the standard Gaussian mechanism from differential privacy (Dwork & Roth, 2014). Moreover, assuming that the model at step t of Algorithm (4) guarantees (ε_t, δ_t)-unlearning, using the contraction coefficients approach of Asoodeh et al. (2020), we show that the next iteration amplifies the unlearning guarantee as follows:

(ε_{t+1}, δ_{t+1}) = (ε_t, θ_ε(2C2/σ) δ_t),

where the expression of the amplification factor θ_ε(2C2/σ) ∈ (0, 1) is given in (9). This means that after T iterations, our algorithm guarantees (ε_T, δ_T) = (ε_0, θ_ε(2C2/σ)^T δ_0)-unlearning. Stated differently, given any target ε, δ, any noise magnitudes σ², σ0², and clipping thresholds C2, C0, we show that it is sufficient for the number of iterations to be at least that in (10) to guarantee (ε, δ)-unlearning. While the expression of the amplification factor θ_ε(2C2/σ) given in (9) is complex, we remark that it is a decreasing function of σ taking values in (0, 1). In fact, assuming that ε ∈ (0, 1), and given any number of iterations T, we state a simple expression of the sufficient noise magnitude σ² in (11). This simplified expression is only for analytical purposes, since it gives looser unlearning guarantees.
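The condition (10) is easy to evaluate numerically. The sketch below does so with SciPy's Gaussian tail function, using θ_ε as reconstructed in (9); it is an illustration rather than the authors' accounting code.

```python
import math
from scipy.stats import norm

def theta(eps, r):
    """Amplification factor theta_eps(r) = Q(eps/r - r/2) - exp(eps) * Q(eps/r + r/2),
    where Q is the standard Gaussian tail function; cf. (9) and Lemma A.3."""
    Q = norm.sf
    return Q(eps / r - r / 2) - math.exp(eps) * Q(eps / r + r / 2)

def min_iterations(eps, delta, C0, sigma0, C2, sigma):
    """Smallest number of model clipping steps satisfying condition (10)."""
    alpha = theta(eps, 2 * C2 / sigma)   # per-step multiplicative decay of the delta budget
    beta = theta(eps, 2 * C0 / sigma0)   # delta of the initial Gaussian mechanism
    T = (math.log(1 / delta) + math.log(beta)) / math.log(1 / alpha)
    return max(1, math.ceil(T))          # at least one iteration is always run
```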
4.2. Theoretical Comparison

To compare the various unlearning methods, we analyze the number of iterations required and the noise injected per iteration to achieve the same (ε, δ)-unlearning guarantee. An effective method should minimize noise per iteration while keeping the number of iterations reasonable to preserve model accuracy. Since all methods rely on noisy updates, we focus on the magnitude of noise injected. A summary of our findings is provided in Table 1, with details below.

Output perturbation. We recall that this is a natural baseline, defined in (1), whereby we first project the original model with clipping threshold C0 and add noise of magnitude σ² = 8 ln(1.25/δ) C0²/ε², which guarantees (ε, δ)-unlearning for ε, δ ∈ (0, 1), following (Dwork & Roth, 2014, Theorem A.1).

DP training. Training with DP guarantees unlearning for free; we do not need to inject noise after receiving unlearning requests. However, the noise needed may be larger than with other unlearning methods. Indeed, consider unlearning a set of k samples with DP-SGD. For this, we need group DP: if a mechanism is (ε, δ)-DP for single-record changes, then it is (kε, k e^{kε} δ)-DP for any pair of datasets that differ in k records (Vadhan, 2017, Lemma 2.2). Consequently, to attain the same (ε, δ)-unlearning guarantee, one must run DP-SGD with noise σ² = (2C²k²/ε²) (ln(1.25k/δ) + kε). This noise is typically much larger than with our techniques, given that it is at least quadratic in k, which may scale with the size of the dataset. This is in line with recent findings on the theoretical separation between DP and certified unlearning (Sekhari et al., 2021; Allouah et al., 2025).

Gradient clipping. From Theorem 4.1, T ≥ 1 iterations of Gradient clipping (3) with noise magnitude σ² = (9 log(1/δ)/(ε² T)) (C0 + C1 γ T)² satisfy (ε, δ)-unlearning, assuming ε ≤ 3 log(1/δ). Setting T = C0/(γ C1) minimizes the noise to σ² = 36 γ C1 C0 log(1/δ)/ε². This substantially reduces the noise per iteration compared to output perturbation, by a factor of C0/(C1 γ), which is significant when γ ≪ C0/C1. A small learning rate or a large initial clipping radius C0 can make this method particularly effective in preserving model accuracy. For the regularized version of gradient clipping, we recall that noisy gradient descent with T iterations and constant noise level, given by the expression σ² = (72 γλ log(1/δ)/ε²) (C0 (1 - γλ)^T + C1/λ)², satisfies (ε, δ)-unlearning under the assumption ε ≤ 3 log(1/δ). Setting T = (1/(γλ)) log(λ C0/C1) minimizes the noise to σ² = O(C1² γ log(1/δ)/(λ ε²)). This approach outperforms the unregularized variant when λ > C1/C0, requiring a number of iterations that is only logarithmic in the initial clipping radius C0. This suggests that projecting onto a larger set can better preserve accuracy, though it may require stronger regularization, which could degrade performance in some tasks.

Table 1. Summary comparison of certified unlearning accountants for non-convex tasks. C0: initial clipping threshold; C1/C2: running clipping threshold for gradient and model clipping, respectively; γ: learning rate; λ: ℓ2-regularization factor. We ignore absolute constants and the multiplicative factor log(1/δ)/ε², which appears in the noise variance of all methods. We note that the entry Langevin Diffusion also covers the work of Chien et al. (2024). The Model Clipping and Gradient Clipping algorithms effectively reduce the maximum noise per iteration at the cost of doing more noisy SGD steps compared to output perturbation. More details on the comparison are given in Section 4.2.

| Algorithm | Variance of noise injected (max. per iteration) | Iterations | Assumptions |
|---|---|---|---|
| Output Perturbation (baseline) | C0² | 1 | - |
| Gradient Clipping (3) | γ C1 C0 | C0/(γ C1) | - |
| Gradient Clipping (3) (w/ regularization) | γ C1²/λ | log(λ C0/C1)/(γλ) | - |
| Model Clipping (4) | C2² | log(1/δ) | - |
| Langevin Diffusion (Chourasia & Shah, 2023) | no explicit expression | - | smoothness, boundedness, noisy training |
| Rewind-to-Delete (Mu & Klabjan, 2024) | exponential in smoothness constant | - | smoothness, noisy training |

Model clipping. We recall from Theorem 4.2 that T ≥ 1 iterations of Model clipping (4) with noise magnitude σ² = 8 C2² ln(1.25)/ε² + 8 C2² ln(1.25) ln(1.25/δ)/(σ0² ε²) are provably sufficient to achieve (ε, δ)-unlearning. Assume that the initial noise magnitude σ0² is at most 8 C0² ln(1.25/δ)/ε², as the latter magnitude is sufficient to obtain (ε, δ)-unlearning in one iteration as in the output perturbation baseline. Therefore, the order of magnitude of the minimum value of the noise, σ² = 8 C2² ln(1.25)/ε², can be attained within T = ln(1.25/δ) iterations. This represents a significant improvement over the baseline, reducing the noise per iteration by a factor of C0²/C2². If C2 is small or if the initial clipping threshold C0 is aggressive, this reduction can be substantial. Compared to amplification by iteration, this method requires fewer iterations (only logarithmic in 1/δ), though it may introduce more noise per iteration when the learning rate is small.

Prior works. Existing certified unlearning methods that do not assume convexity of the loss function include (Chourasia & Shah, 2023; Chien et al., 2024; Mu & Klabjan, 2024). However, unlike our approach, these methods rely on the assumption that the loss function is smooth² and are tailored to specific training algorithms. For instance, Chourasia & Shah (2023) and Chien et al. (2024) analyze training with noisy projected gradient descent, leveraging both smoothness and specific training dynamics to establish unlearning guarantees. These guarantees stem from the convergence of the training process to a limiting distribution, but additional restrictive assumptions are required. Notably, the smoothness constant is needed not only for theoretical analysis but also to determine the appropriate noise level at each step to achieve (ε, δ)-unlearning. This constraint limits the applicable function class to those where the smoothness constant is easily computable, effectively excluding modern neural networks. Chourasia & Shah (2023) assume that the loss function is bounded, while Chien et al. (2024) require that the training's limiting distribution satisfies an isoperimetric inequality. These constraints significantly limit the applicability of their methods, primarily to smooth convex tasks such as logistic regression (Chien et al., 2024). Similarly, Mu & Klabjan (2024) address non-convex loss functions but assume that training follows gradient descent with output perturbation. Their approach also relies on smoothness and requires injecting noise with a magnitude that scales exponentially with the smoothness parameter, which can be prohibitive in practice. In contrast, our approach removes these smoothness constraints and is not tied to a specific training algorithm, making it applicable to a broader class of learning problems.

²That is, for some L ≥ 0, denoting by L the loss function, we have ‖∇L(x) - ∇L(y)‖ ≤ L ‖x - y‖ for all x, y ∈ R^d.
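To give a rough sense of the scales summarized in Table 1 and discussed in Section 4.2, the short sketch below plugs in one illustrative hyperparameter setting (clipping radii and learning rate mirroring the CIFAR-10 gradient clipping run of Figure 2, and a model clipping radius of the scale used in the appendix tables); absolute constants and the shared log(1/δ)/ε² factor are omitted, as in the table.

```python
# Orders of magnitude of the per-iteration noise variance from Table 1,
# ignoring absolute constants and the common log(1/delta)/eps^2 factor.
# The hyperparameter values below are illustrative only.
C0, C1, C2 = 20.0, 10.0, 0.5   # initial / gradient / model clipping radii
gamma, lam = 0.01, 50.0        # learning rate and l2-regularization factor

output_perturbation = C0 ** 2                 # 400.0, all noise added in a single step
gradient_clipping = gamma * C1 * C0           # 2.0,   over about C0/(gamma*C1) = 200 steps
gradient_clipping_reg = gamma * C1 ** 2 / lam # 0.02,  over about log(lam*C0/C1)/(gamma*lam) steps
model_clipping = C2 ** 2                      # 0.25,  over about log(1/delta) steps

print(output_perturbation, gradient_clipping, gradient_clipping_reg, model_clipping)
```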
5. Experimental Evaluation

In this section, we present an empirical evaluation of our proposed unlearning method, in its two variants, Gradient Clipping (3) and Model Clipping (4), on two benchmark datasets: MNIST (Deng, 2012) and CIFAR-10 (Krizhevsky et al., 2014). We first detail the experimental setup, and then describe the results and observations stemming from Table 2 and Figures 1 and 2.

Figure 1. Accuracy of Gradient and Model Clipping versus compute budget (epochs) on CIFAR-10 (left) and MNIST (right), to satisfy (1, 10^-5)-unlearning. We compare to two baselines, retraining from scratch and output perturbation, detailed in Section 2. Across all compute budgets, gradient and model clipping achieve higher accuracy than the baselines, with the difference being larger for smaller compute budgets.

Table 2. Number of epochs required to reach the target accuracy for our algorithms and the baselines, and their savings compared to retraining from scratch, for the CIFAR-10 dataset. Gradient Clipping and Model Clipping consistently save above 20% of compute, sometimes reaching 50% of compute savings. The output perturbation baseline also consistently improves over retraining from scratch, but is consistently slower than the Gradient and Model Clipping algorithms.

| Target accuracy | Retrain (baseline) | Output Perturbation (baseline) | Gradient Clipping (ours) | Model Clipping (ours) |
|---|---|---|---|---|
| 30% | 6 | 5 (16% faster) | 4 (33% faster) | 3 (50% faster) |
| 35% | 11 | 7 (41% faster) | 6 (50% faster) | 6 (50% faster) |
| 40% | 18 | 12 (33% faster) | 10 (44% faster) | 10 (44% faster) |
| 45% | 23 | 17 (26% faster) | 16 (30% faster) | 17 (26% faster) |
| 50% | 30 | 25 (16% faster) | 23 (23% faster) | 24 (20% faster) |

For MNIST, we train a small neural network with two layers and approximately 4,000 parameters. For CIFAR-10, we use a slightly larger network with two convolutional blocks followed by a linear layer, totaling 20,000 parameters. In both cases, the forget set consists of a randomly selected 10% subset of the full dataset.

Baselines. We compare our methods against the two baselines presented in Section 2: retraining from scratch and output perturbation (1). Retraining from scratch involves fully retraining the model after removing the forget set. Output perturbation applies noise directly to the final model parameters to achieve certified unlearning, before fine-tuning the model on the retain data if the compute budget allows. To the best of our knowledge, no existing method provides certified unlearning guarantees for non-convex tasks without requiring knowledge of the smoothness constant of the loss function.
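For reference, the output perturbation baseline of (1) amounts to a single Gaussian mechanism applied to the trained parameters; a minimal sketch, assuming a flattened parameter vector, is given below.

```python
import jax
import jax.numpy as jnp

def output_perturbation(x_hat, key, C0, eps, delta):
    """Baseline (1): clip the trained model to norm C0, then add Gaussian noise with
    variance 8 * C0^2 * ln(1.25/delta) / eps^2 (standard Gaussian mechanism)."""
    x_clipped = x_hat * jnp.minimum(C0 / jnp.maximum(jnp.linalg.norm(x_hat), 1e-12), 1.0)
    sigma = jnp.sqrt(8.0 * C0 ** 2 * jnp.log(1.25 / delta) / eps ** 2)
    return x_clipped + sigma * jax.random.normal(key, x_hat.shape)
```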
Procedures. When retraining from scratch, the model is reinitialized using the same distribution as in the original training phase. In all experiments, we first train a model on the entire dataset until convergence. We set ε = 1, δ = 10^-5 for all experiments. For our unlearning algorithms, we continue clipping and adding noise until the desired (ε, δ)-unlearning guarantee is met. In all experiments, the privacy target is reached before exhausting the iteration budget, in fewer than 100 iterations (see Appendix B for the exact number of unlearning steps needed to reach the target privacy). We therefore continue fine-tuning the model on the retained dataset without additional noise or clipping, using the same hyperparameters as in retraining from scratch. This means that in all of our experiments unlearning is cheap and effectively finds a new initialization for the fine-tuning process that preserves some information from the original model x̂. All training, unlearning, and fine-tuning phases use stochastic gradient descent (SGD) with a constant step size. Further experimental details are provided in the appendix.

Figure 2. Convergence behavior of Gradient Clipping with γ = 0.01, C0 = 20, C1 = 10, λ = 50, σ = 0.25, and of the retraining-from-scratch baseline, on the CIFAR-10 dataset. The gradient clipping method is applied for the first 30 iterations, followed by standard fine-tuning. Initially, gradient clipping degrades performance but retains useful information, allowing fine-tuning to recover and surpass the retraining baseline quickly.

5.2. Results and Observations

We now present our experimental comparison. We compare the algorithms for a fixed target accuracy and a fixed compute budget, and finally show convergence behavior.

Fixed target accuracy. In Table 2, we present the time required for each algorithm to reach the target accuracy for unlearning on CIFAR-10. Our results show that both gradient clipping and model clipping achieve the desired accuracy in a comparable number of steps, significantly outperforming the baseline methods. Notably, compared to retraining from scratch, our algorithms offer substantial computational savings, reducing the required steps by up to 50%. Interestingly, while the simple output perturbation baseline also improves upon retraining from scratch, its efficiency gains are less pronounced. This suggests that while the output perturbation approach can be beneficial, more advanced unlearning methods such as gradient and model clipping yield considerably greater improvements.

Fixed compute budget. In Figure 1, we show the resulting accuracy for varying compute budgets for the gradient and model clipping approaches on the MNIST and CIFAR-10 datasets. Our experiments demonstrate that our proposed unlearning method, in both its gradient and model clipping variants, consistently achieves higher accuracy compared to output perturbation and retraining from scratch across all compute budgets. This improvement is particularly pronounced in low-compute settings, where retraining from scratch struggles to recover performance due to the limited number of optimization steps. In contrast, our methods effectively leverage the retained model parameters, enabling faster recovery while ensuring certified unlearning.
This provides substantial savings: for example, to reach an accuracy of 40% on the CIFAR-10 dataset, both gradient and model clipping need only 10 epochs, while output perturbation needs 12 epochs (20% longer), and retraining from scratch requires 18 epochs (80% longer). As the compute budget increases, the performance gap between our methods and retraining from scratch gradually narrows. This suggests that while our algorithms provide a strong advantage in resource-constrained scenarios, full retraining may still be the optimal choice given sufficient computing power. However, we note that in practical settings, where compute resources are finite, our approaches offer substantial time savings to reach a particular accuracy.

Convergence curve. In Figure 2, we illustrate the convergence behavior of the gradient clipping algorithm on the CIFAR-10 dataset with parameters γ = 0.01, C0 = 20, C1 = 10, and λ = 50. In that case, unlearning is performed for the first 30 iterations, which significantly decreases the accuracy of the original model, to almost zero. However, during the fine-tuning stage, the accuracy quickly catches up and outperforms retraining from scratch within around one epoch. This suggests that our stochastic post-processing approach does not completely erase all prior training. Despite the initially poor accuracy, the model can quickly recover the useful information stored in it. A similar convergence curve is observed in all other settings, as unlearning is always performed for a relatively small number of steps (fewer than 100, see Appendix B). These findings highlight the robustness of our approach and its adaptability across different datasets and model architectures.

Overall, we observe that both variants of our method, gradient and model clipping, achieve considerable gains of up to 50% time savings over the baselines. Further analysis of our results shows that the noise magnitude and clipping strategies play a crucial role in balancing unlearning guarantees with model utility. We found that gradient clipping has a larger range of hyperparameters that achieve an advantage over the baselines, making it easier to tune.

5.3. Transfer Learning and Comparison with DP-SGD

To evaluate our methods in more complex settings, we conducted experiments on CIFAR-100 and CIFAR-10 using ResNet architectures (He et al., 2016) pretrained on public data (ImageNet (Deng et al., 2009)).

Figure 3. Accuracy of Gradient Clipping versus compute budget (epochs) on CIFAR-10 (left) and CIFAR-100 (right) using a ResNet-18 feature extractor pretrained on public data, to satisfy (1, 10^-5)-unlearning.
In this setting we also compare against DP-SGD with group-privacy baseline as defined in Section 4.2, this produces a certified unlearnt model, so we spend the unlearning compute budget on finetuning the model on the retain data. On CIFAR-10 our method attains 85 % accuracy in 9 epochs, 10 % faster than retrain and 47 % faster than DP-SGD (Fig. 3). The gap widens on CIFAR-100: we reach 60 % accuracy in 32 epochs versus 34 for retrain, while DP-SGD never exceeds 20 % within the 50-epoch budget. The poor DP-SGD curve confirms the theoretical predictions from Sec. 4.2: groupprivacy forces k more noise, k being the number of forget samples, and thus hurts accuracy even under a much weaker privacy budget (ε = 50 vs. our ε = 1) 6. Conclusion and Future Work We introduced a new certified machine unlearning method that provides formal guarantees while remaining broadly applicable to modern neural networks. Our approach leverages the connection between unlearning and privacy amplification through stochastic post-processing, enabling effective removal of data influence without imposing assumptions on the loss function. By applying noisy fine-tuning to the retain set, our methods achieve both theoretical soundness and practical effectiveness, outperforming existing baselines in empirical evaluations. Despite these strengths, our approach has certain limitations. First, the effectiveness of our method is constrained by the curse of dimensionality inherent in differential privacy, which can make scaling to models with very large numbers of parameters more challenging. Second, our unlearning framework is designed specifically for stochastic gradient descent (SGD) during the unlearning stage, as we do not retain memory from earlier steps. However, this restriction does not apply to the initial model training or post-unlearning fine-tuning, allowing for flexibility in those phases. Future work could explore extensions to more complex architectures and alternative optimization methods, potentially improving scalability while maintaining strong unlearning guarantees. Our findings highlight the feasibility of certified unlearning in realistic deep learning settings for the first time, offering a promising direction for privacy-preserving and efficient machine learning. Impact Statement This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here. Acknowledgements SK acknowledges support by NSF 2046795 and 2205329, IES R305C240046, ARPA-H, the Mac Arthur Foundation, Schmidt Sciences, Open AI, and Stanford HAI. YA acknowledges support by SNSF grant 200021 200477. Certified Unlearning for Neural Networks Abadi, M., Chu, A., Goodfellow, I., Mc Mahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, pp. 308 318, 2016. Allouah, Y., Kazdan, J., Guerraoui, R., and Koyejo, S. The utility and complexity of inand out-of-distribution machine unlearning. In The Thirteenth International Conference on Learning Representations, 2025. Asoodeh, S., D ıaz, M., and du Pin Calmon, F. Contraction of Eγ-divergence and its applications to privacy. 2020. URL https://api.semanticscholar.org/ Corpus ID:256808341. Balle, B., Barthe, G., Gaboardi, M., and Geumlek, J. Privacy amplification by mixing and diffusion mechanisms. In Neural Information Processing Systems, 2019. 
URL https://api.semanticscholar.org/CorpusID:168170121.

Balle, B., Barthe, G., Gaboardi, M., Hsu, J., and Sato, T. Hypothesis testing interpretations and Rényi differential privacy. In International Conference on Artificial Intelligence and Statistics, pp. 2496-2506. PMLR, 2020.

Bourtoule, L., Chandrasekaran, V., Choquette-Choo, C. A., Jia, H., Travers, A., Zhang, B., Lie, D., and Papernot, N. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy (SP), pp. 141-159. IEEE, 2021.

Cao, Y. and Yang, J. Towards making systems forget with machine unlearning. In 2015 IEEE Symposium on Security and Privacy, pp. 463-480, 2015. doi: 10.1109/SP.2015.35.

Chien, E., Pan, C., and Milenkovic, O. Certified graph unlearning, 2022. URL https://arxiv.org/abs/2206.09140.

Chien, E., Wang, H. P., Chen, Z., and Li, P. Langevin unlearning: A new perspective of noisy gradient descent for machine unlearning. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=3LKuC8rbyV.

Chourasia, R. and Shah, N. Forget unlearning: Towards true data-deletion in machine learning. In International Conference on Machine Learning, pp. 6028-6073. PMLR, 2023.

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248-255. IEEE, 2009.

Deng, L. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141-142, 2012.

Dwork, C. and Roth, A. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211-407, 2014.

Eldan, R. and Russinovich, M. Who's Harry Potter? Approximate unlearning in LLMs. arXiv preprint arXiv:2310.02238, 2023.

Feldman, V., Mironov, I., Talwar, K., and Thakurta, A. Privacy amplification by iteration. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pp. 521-532. IEEE, 2018.

Ginart, A., Guan, M., Valiant, G., and Zou, J. Y. Making AI forget you: Data deletion in machine learning. Advances in Neural Information Processing Systems, 32, 2019.

Golatkar, A., Achille, A., Ravichandran, A., Polito, M., and Soatto, S. Mixed-privacy forgetting in deep networks. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 792-801, 2020. URL https://api.semanticscholar.org/CorpusID:229678489.

Goodfellow, I. J., Mirza, M., Xiao, D., Courville, A., and Bengio, Y. An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211, 2013.

Graves, L., Nagisetty, V., and Ganesh, V. Amnesiac machine learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 11516-11524, 2021.

Guo, C., Goldstein, T., Hannun, A., and Van Der Maaten, L. Certified data removal from machine learning models. In International Conference on Machine Learning, pp. 3832-3842. PMLR, 2020.

He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.

Krizhevsky, A., Nair, V., and Hinton, G. The CIFAR-10 dataset. Online: http://www.cs.toronto.edu/kriz/cifar.html, 55(5), 2014.

Kurmanji, M., Triantafillou, P., Hayes, J., and Triantafillou, E. Towards unbounded machine unlearning. Advances in Neural Information Processing Systems, 36, 2024.
Liu, J., Lou, J., Qin, Z., and Ren, K. Certified minimax unlearning with generalization rates and deletion capacity. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS '23, Red Hook, NY, USA, 2023. Curran Associates Inc.

Liu, S., Yao, Y., Jia, J., Casper, S., Baracaldo, N., Hase, P., Yao, Y., Liu, C. Y., Xu, X., Li, H., et al. Rethinking machine unlearning for large language models. arXiv preprint arXiv:2402.08787, 2024.

Mironov, I. Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pp. 263-275. IEEE, 2017.

Mu, S. and Klabjan, D. Rewind-to-delete: Certified machine unlearning for nonconvex functions. arXiv preprint arXiv:2409.09778, 2024.

Neel, S., Roth, A., and Sharifi-Malvajerdi, S. Descent-to-delete: Gradient-based methods for machine unlearning. In Algorithmic Learning Theory, pp. 931-962. PMLR, 2021.

Nguyen, T. T., Huynh, T. T., Nguyen, P. L., Liew, A. W.-C., Yin, H., and Nguyen, Q. V. H. A survey of machine unlearning. arXiv preprint arXiv:2209.02299, 2022.

Sekhari, A., Acharya, J., Kamath, G., and Suresh, A. T. Remember what you want to forget: Algorithms for machine unlearning. Advances in Neural Information Processing Systems, 34:18075-18086, 2021.

Smith, L. N. and Topin, N. Super-convergence: Very fast training of residual networks using large learning rates. ArXiv, abs/1708.07120, 2017. URL https://api.semanticscholar.org/CorpusID:23376859.

Thudi, A., Jia, H., Shumailov, I., and Papernot, N. On the necessity of auditable algorithmic definitions for machine unlearning. CoRR, abs/2110.11891, 2021. URL https://arxiv.org/abs/2110.11891.

Triantafillou, E., Kairouz, P., Pedregosa, F., Hayes, J., Kurmanji, M., Zhao, K., Dumoulin, V., Junior, J. J., Mitliagkas, I., Wan, J., et al. Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition. arXiv preprint arXiv:2406.09073, 2024.

Vadhan, S. The Complexity of Differential Privacy, pp. 347-450. Springer International Publishing, Cham, 2017. ISBN 978-3-319-57048-8. doi: 10.1007/978-3-319-57048-8_7. URL https://doi.org/10.1007/978-3-319-57048-8_7.

Van Erven, T. and Harremos, P. Rényi divergence and Kullback-Leibler divergence. IEEE Transactions on Information Theory, 60(7):3797-3820, 2014.

Voigt, P. and Von dem Bussche, A. The EU General Data Protection Regulation (GDPR). A Practical Guide, 1st Ed., Cham: Springer International Publishing, 10(3152676):10-5555, 2017.

Zhang, B., Dong, Y., Wang, T., and Li, J. Towards certified unlearning for deep neural networks. In Proceedings of the 41st International Conference on Machine Learning, ICML '24. JMLR.org, 2024.

A.1. Theorem 4.2: Model Clipping

Preliminaries. We first recall the hockey-stick divergence and previous results on privacy amplification by stochastic post-processing (Balle et al., 2019).

Definition A.1 (Hockey-stick divergence). Let ε ≥ 0, and µ, ν two probability measures defined over R^d. We define

E_ε(µ ‖ ν) := ∫_{R^d} d[µ - e^ε ν]_+ = sup_{A ⊆ R^d} (µ(A) - e^ε ν(A)),

where [·]_+ := max{0, ·}.

Lemma A.2 ((Balle et al., 2019), Theorem 1 (adapted)). Let ε ≥ 0, K be a Markov kernel taking inputs in R^d, and µ, ν be two probability distributions over R^d. We have

E_ε(µK ‖ νK) ≤ E_ε(µ ‖ ν) · sup_{x1, x2 ∈ R^d} E_ε(K(x1) ‖ K(x2)).

Lemma A.3 ((Asoodeh et al., 2020), Lemma 2). Let ε ≥ 0, µ1 ≠ µ2 ∈ R^d, and σ > 0.
We have Eε(N(µ1, σ2Id) N(µ2, σ2Id)) = Q εσ µ1 µ2 µ1 µ2 eεQ εσ µ1 µ2 + µ1 µ2 where for all t R, Q(t) := 1 π R t e u2/2du. Lemma A.4 ((Dwork & Roth, 2014), Theorem A.1 (paraphrased)). Let ε (0, 1) and µ1 = µ2 Rd, σ > 0. We have Eε(N(µ1, σ2Id) N(µ2, σ2Id)) 1.25 exp Main proof. We now proceed to proving the main theorem. Theorem 4.2 (Model clipping). Let T 1, C0, C2, σ0, ε > 0, and δ (0, 1). Denote for every r > 0, θε(r) := Q ε where for all t R, Q(t) := 1 2π R t e u2/2du. Consider T iterations of the unlearning algorithm defined in (4). We obtain (ε, δ)-unlearning if T log(1/δ) + log θε( 2C0 log (1/θε( 2C2 σ )) . (10) In particular, for any T 1, ε (0, 1), it suffices to have σ2 = 8C2 2 ln(1.25) ln(1.25/δ) σ2 0ε2 Proof. Let T 1, C0, C2, σ0, γ > 0, and δ (0, 1), ε (0, 3 log(1/δ)). Consider T iterations of the unlearning algorithm defined in (4), and analogously define the following sequence initialized at the projected model trained without the forget data x 0 := ΠC0(A(D \ Df)) + ξ0, ξ0 N(0, σ2 0Id): x t+1 = ΠC2(x t γG(x t)) + ξ t, ξ t N(0, σ2Id). (15) Recall from Definition A.1 the definition of the hockey-stick divergence Eε. Also, we recall from Lemma A.2 that for any Markov kernel K: Eε(µK νK) sup x1,x2 Rd Eε(K(x1) K(x2)) Eε(µ ν). Certified Unlearning for Neural Networks In particular, by introducing α := supx1,x2 Rd Eε(N(ΠC2(x1 γG(x1)), σ2Id) N(ΠC2(x2 γG(x2)), σ2Id)) we have Eε(xt+1 x t+1) α Eε(xt x t). Applying the above recursively over T iterations, and denoting β := Eε(N(ΠC0(A(D)), σ2 0Id) N(ΠC0(A(D \ Df)), σ2 0Id)), yields: Eε(x T x T ) αT Eε(x0 x 0) = αT β. Therefore, in order to satisfy (ε, δ)-unlearning, it suffices to achieve Eε(x T x T ) δ, which can be achieved by having: T log(1/δ) + log β Now, since for any x1, x2 it holds that ΠC2(x1 γG(x1)) ΠC2(x2 γG(x2)) 2C2 and r 7 Q εσ r r 2σ eεQ εσ r + r 2σ is increasing (Asoodeh et al., 2020), using the exact expression of the hockey-stick divergence between Gaussians from Lemma A.3 yields α = sup x1,x2 Rd Eε(N(ΠC2(x1 γG(x1)), σ2Id) N(ΠC2(x2 γG(x2)), σ2Id)) Similarly, since ΠC0(A(D)) ΠC0(A(D \ Df)) 2C0, we have β = Eε(N(ΠC0(A(D)), σ2 0Id) N(ΠC0(A(D \ Df)), σ2 0Id)) Q εσ0 Therefore, to achieve (ε, δ)-unlearning, it suffices to have T log(1/δ) + log Q εσ0 2C0 C0 eεQ εσ0 2C0 + C0 log Q εσ 2C2 C2 σ eεQ εσ 2C2 + C2 Alternatively, using the simpler upper bound from Lemma A.4 on the hockey-stick divergence between Gaussians, we obtain α = sup x1,x2 Rd Eε(N(ΠC2(x1 γG(x1)), σ2Id) N(ΠC2(x2 γG(x2)), σ2Id)) 1.25 exp σ2ε2 Similarly, we have β = Eε(N(ΠC0(A(D)), σ2 0Id) N(ΠC0(A(D \ Df)), σ2 0Id)) 1.25 exp σ2 0ε2 Therefore, assuming σ2 > 8C2 2 ln(1.25) ε2 , to achieve (ε, δ)-unlearning, it suffices to have T ln(1.25/δ) σ2 0ε2 8C2 2 ln(1.25) . This can be rewritten as σ2 8C2 2 ln(1.25) ε2 + 8C2 2 ln(1.25) ln(1.25/δ) σ2 0ε2 This concludes the proof. Certified Unlearning for Neural Networks A.2. Theorem 4.1: Gradient Clipping Preliminaries. We first recall some important definitions and state useful lemmas before proceeding to the proof of the main theorem. We first recall the definition of the R enyi divergence, which we will mainly use to prove Theorem 4.1. Definition A.5 (R enyi divergence). Let q > 0, q = 1. The q-R enyi divergence between two probability distributions µ and ν is defined as Dq(µ ν) := 1 q 1 log EX ν We recall the shifted R enyi divergence introduced by Feldman et al. (2018). 
For any z 0, q 1, and two distributions µ, ν defined on Rd, we define D(z) q (µ ν) := inf µ : W (µ ,µ) z Dq(µ ν), (16) where W ( , ) := infω Γ( , ) ess sup(x,y) ω x y 2 is the -Wasserstein distance, and Γ(µ , µ) is the collection of couplings of its arguments, i.e., joint measures whose marginals are µ and µ respectively. Lemma A.6 ((Feldman et al., 2018), Lemma 20 (adapted)). Let q 1, z, a 0 and X, Y arbitrary random variables. If ξ, ξ N(0, σ2Id), σ > 0, then D(z) q (X + ξ Y + ξ ) D(z+a) q (X Y ) + qa2 Lemma A.7. Let q 1, z, ρ, C 0, ψ: Rd Rd and X, Y arbitrary random variables. If ψ satisfies x, x Rd, ψ(x ) ψ(x) ρ x x + s, then D(ρz+s) q (ψ(X) ψ(Y )) D(z) q (X Y ). Proof. For any measure µ, we denote by ψ#µ the push-forward measure of µ by ψ. Assume that ψ satisfies x, x Rd, ψ(x ) ψ(x) ρ x x + s. By definition of the -Wasserstein distance, it follows immediately that W (ψ#µ, ψ#ν) ρ W (µ, ν) + s. (17) Therefore, by definition (16) of the shifted R enyi divergence and using the data processing inequality for R enyi divergences (Van Erven & Harremos, 2014), we have D(ρz+s) q (ψ(X) ψ(Y )) = inf µ : W (µ ,ψ(X)) ρz+s Dq(µ ψ(Y )) inf X : W (ψ(X ),ψ(X)) ρz+s Dq(ψ(X ) ψ(Y )) inf X : W (X ,X) z Dq(ψ(X ) ψ(Y )) (Inequality (17)) inf X : W (X ,X) z Dq(X Y ) (Data Processing inequality) = D(z) q (X Y ). This concludes the proof. Lemma A.8. Let γ, λ 0, G: Rd Rd be an arbitrary function, and ψ: x 7 x γ (ΠC(G(x)) + λx). Then ψ satisfies: x, x Rd, ψ(x ) ψ(x) |1 λγ| x x + 2γC. Proof. We have for any x, x Rd that ψ(x) ψ(x ) = x γ (ΠC(G(x)) + λx) x + γ (ΠC(G(x )) + λx ) |1 λγ| x x + γ ΠC(G(x)) ΠC(G(x )) (Triangle inequality) |1 λγ| x x + 2γC. ( ΠC(G(x)) C) Certified Unlearning for Neural Networks Main proof. We are interested in the following iterative unlearning procedure (generalizing (3) to regularization and varying stepsizes and noise variances), starting from the projected model trained on the full data x0 := ΠC0(A(D)), where for all t {0, . . . , T 1}: xt+1 = xt γt (ΠC1(G(xt)) + λxt) + ξt, ξt N(0, σ2 t Id). (18) For the analysis, we analogously define the following sequence initialized at the projected model trained without the forget data x 0 := ΠC0(A(D \ Df)): x t+1 = x t γt (ΠC1(G(x t)) + λx t) + ξ t, ξ t N(0, σ2 t Id). (19) Theorem A.9. Let T, q 1, γ0, . . . , γT 1 0, σ0, . . . , σT 1 > 0, λ 0 and consider the two sequences {xt}0 t T , {x t}0 t T as defined above. Denote by Dq the R enyi divergence of order q. Assume that for every t {0, . . . , T 1}, γtλ < 1. Denote for every t {0, . . . , T 1}, st := 2γt C1, ρt := 1 γtλ. If a0, . . . , a T 1 0 satisfy PT 1 t=0 QT 1 t k=1 ρk at = QT 1 t=0 ρt 2C0 + PT 1 t=0 QT 1 t k=1 ρk st, then Dq(x T x T ) qa2 t 2σ2 t . (20) In particular, we have Dq(x T x T ) q h QT 1 t=0 ρt 2C0 + PT 1 t=0 QT 1 t k=1 ρk st i2 PT 1 t=0 QT 1 t k=1 ρ2 k σ2 t . (21) Proof. Let t {0, . . . , T 1}. Recall the sequence of iterates defined in (18), and analogously define the following sequence initialized at the projected model trained without the forget data x 0 := ΠC0(A(D \ Df)): x t+1 = x t γt (ΠC1(G(x t)) + λx t) + ξ t, ξ t N(0, σ2 t Id). (22) Therefore, for any at 0, using the bound above with Lemma A.6 yields D(zt+1) q xt+1 x t+1 = D(zt+1) q (xt γt (ΠC1(G(xt)) + λxt) + ξt x t γt (ΠC1(G(x t)) + λx t) + ξ t) D(zt+1+at) q (xt γt (ΠC1(G(xt)) + λxt) x t γt (ΠC1(G(x t)) + λx t)) + qa2 t 2σ2 t . Now, using Lemma A.8, and the fact that γt < 1 λ, we establish that ψt : x 7 x γtΠC1(G(x)) satisfies x, x Rd, ψt(x ) ψt(x) (1 λγt) x x + 2γt C1. 
Consequently, denoting st := 2γt C1 and ρt := 1 λγt, using the previous fact and Lemma A.7 in the bound above yields D(zt+1) q xt+1 x t+1 D(zt+1+at) q (xt γt (ΠC1(G(xt)) + λxt) x t γt (ΠC1(G(x t)) + λx t)) + qa2 t 2σ2 t ρt (zt+1+at st)) q (xt x t) + qa2 t 2σ2 t . By denoting zt := 1 ρt (zt+1 + at st), we have by recursion over t {0, . . . , T 1} for any z0, a0, . . . , a T 0: D(z T ) q (x T x T ) D(z0) q (x0 x 0) + qa2 t 2σ2 t , (23) (at st). (24) Certified Unlearning for Neural Networks Observe that, upon taking z0 = 2C0, since x0, x 0 2C0, it is immediate from definition (16) that D(z0) q (x0 x 0) = 0. Additionally, taking z T = 0 in the last equation implies that for all a0, . . . , a T 1 0 such that Dq(x T x T ) = D(0) q (x T x T ) qa2 t 2σ2 t . (26) This concludes the first part of the second statement of the theorem. The second part of the second statement is a direct consequence of setting, for all t {0, . . . , T 1}, # QT 1 t k=1 ρk σ2 t PT 1 k=0 QT 1 k l=1 ρ2 l σ2 t . (27) Theorem 4.1 (Gradient clipping). Let T 1, γ, σ > 0, λ 0, δ (0, 1), ε (0, 3 log(1/δ)). Consider T iterations of the unlearning algorithm defined in (3). We obtain (ε, δ)-unlearning if: 1. Without regularization (λ = 0): σ2 = 9 log(1/δ) ε2T (C0 + C1γT)2 . (5) 2. With regularization (λ > 0): if γλ ( 1 σ2 = 72γλ log(1/δ) C0 (1 γλ)T + C1 Proof. The proof of the first claim follows immediately by taking constant noise variance, stepsize, and zero regularization in the second statement of Theorem A.9, before converting from R enyi to (ε, δ)-unlearning using standard conversion methods (Mironov, 2017). Similarly, the proof of the second claim follows immediately by taking constant noise variance, and stepsize in the second statement of Theorem A.9 (which assumes that γλ < 1), before converting from R enyi to (ε, δ)-unlearning using standard conversion methods (Mironov, 2017). Indeed, we then get that it is sufficient to set σ2 γλ(2 γλ) 2ε (1 (1 γλ)2T ) 2C0 (1 γλ)T + 2C1 λ (1 (1 γλ)T ) 2 . The right-hand side above can be upper bounded by 72γλ log(1/δ) ε2 C0 (1 γλ)T + C1 λ 2 when assuming that γλ 1 2. This concludes the proof. Certified Unlearning for Neural Networks B. Experiments We use small custom networks for training on MNIST and CIFAR10 1 # used for mnist 2 class Tiny Net(nn.Module): 3 num_classes: int 5 @nn.compact 6 def __call__(self, x, train: bool = True, mutable=None): 7 x = x.reshape((x.shape[0], -1)) 8 x = nn.Dense(features=5)(x) 9 x = nn.relu(x) 10 x = nn.Dense(features=self.num_classes)(x) 11 return x 13 class CIFAR10Tinyk Net(nn.Module): 14 num_classes: int 16 @nn.compact 17 def __call__(self, x, train: bool = True): 18 he_init = nn.initializers.he_normal() 19 x = nn.Conv(features=32, kernel_size=(3, 3), padding="same", kernel_init=he_init)( x) 20 x = nn.relu(x) 21 x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2)) 22 x = nn.Conv(features=64, kernel_size=(3, 3), padding="same", kernel_init=he_init)( x) 23 x = nn.relu(x) 24 x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2)) 25 x = x.mean(axis=(1, 2)) 26 x = nn.Dense(self.num_classes, kernel_init=he_init)(x) 27 return x In Tables 3 and 4 we give complete experimental details for the CIFAR and MNIST experiments. 
Dataset: CIFAR-10
Architecture: Tiny Convolution Net (20k params)
Training objective: Cross entropy loss
Evaluation objective: Top-1 accuracy
Batch size: 128
Training learning rate: 0.1
Training learning rate schedule: Linear One Cycle (Smith & Topin, 2017)
Train weight decay: 0.0005
Number of train epochs: 100
Forget set size: 10%
Number of unlearning epochs: 50
Noise schedule: constant
Unlearning learning rate schedule: constant
Post-unlearning learning rate: 0.06
Post-unlearning learning rate schedule: Linear One Cycle
Post-unlearning weight decay: 0.0005
Table 3. Experimental Setting CIFAR-10

Dataset: MNIST
Architecture: Tiny 2-Layer Net (4k params)
Training objective: Cross entropy loss
Evaluation objective: Top-1 accuracy
Batch size: 128
Training learning rate: 0.06
Training learning rate schedule: Linear One Cycle (Smith & Topin, 2017)
Train weight decay: 0.0005
Number of train epochs: 30
Forget set size: 10%
Number of unlearning epochs: 10
Noise schedule: constant
Unlearning learning rate schedule: constant
Post-unlearning learning rate: 0.06
Post-unlearning learning rate schedule: Linear One Cycle
Post-unlearning weight decay: 0.0005
Table 4. Experimental Setting MNIST

ϵ  Compute Budget  λ  C1  γt  C0  Unlearning Steps  σ
1  1  10.0  100.0  0.0001  0.01  1  0.028270
1  2  750.0  10.0  0.0001  0.01  6  0.007752
1  3  750.0  10.0  0.0001  0.01  6  0.007752
1  4  750.0  10.0  0.0001  0.01  6  0.007752
1  5  10.0  100.0  0.0001  0.01  1  0.028270
1  6  750.0  10.0  0.0001  0.01  6  0.007752
1  7  10.0  100.0  0.0001  0.01  1  0.028270
1  8  10.0  100.0  0.0001  0.01  1  0.028270
1  9  10.0  100.0  0.0001  0.01  1  0.028270
1  10  10.0  100.0  0.0001  0.01  1  0.028270
Table 5. Hyperparameters for Gradient Clipping MNIST

ϵ  Compute Budget  C2  σ  η  λ  Unlearning Steps
1  1  0.001  0.01  0.0001  900.00  1
1  2  0.001  0.01  0.0001  900.00  1
1  3  0.001  0.01  0.0001  900.00  1
1  4  0.001  0.01  0.0001  500.00  1
1  5  0.001  0.01  0.0100  0.01  1
1  6  0.010  0.01  0.0010  900.00  6
1  7  0.001  0.01  0.0010  10.00  1
1  8  0.001  0.01  0.0100  900.00  1
1  9  0.001  0.01  0.0001  10.00  1
1  10  0.001  0.01  0.0010  10.00  1
Table 6. Hyperparameters for Model Clipping MNIST

ϵ  Compute Budget (epochs)  λ  C1  γ  C0  Unlearning Steps  σ
1  1  200.0  100.0  0.0010  0.1  1  0.254558
1  4  50.0  10.0  0.0100  1.0  5  0.275702
1  7  50.0  10.0  0.0100  20.0  11  0.256790
1  10  50.0  100.0  0.0001  0.1  10  0.088213
1  13  1.0  10.0  0.0010  0.1  10  0.089197
1  16  50.0  10.0  0.0010  0.1  10  0.077256
1  19  1.0  10.0  0.0010  0.1  10  0.089197
1  22  50.0  100.0  0.0001  0.1  10  0.088213
1  25  500.0  100.0  0.0010  1.0  5  0.275702
1  28  50.0  10.0  0.0100  20.0  11  0.256790
1  31  500.0  100.0  0.0010  20.0  11  0.256790
1  34  50.0  1.0  0.0010  1.0  93  0.012501
1  37  50.0  100.0  0.0010  0.1  1  0.275772
1  40  500.0  100.0  0.0010  1.0  5  0.275702
1  43  1.0  10.0  0.0100  0.1  1  0.281429
1  46  500.0  100.0  0.0010  20.0  11  0.256790
1  49  1.0  10.0  0.0100  0.1  1  0.281429
Table 7. Hyperparameters for Gradient Clipping CIFAR

ϵ  Compute Budget (epochs)  C2  σ  η  λ  Unlearning Steps
1  1  0.200  0.2  0.0100  100.0  6
1  4  0.500  0.5  0.0010  10.0  6
1  7  0.500  0.5  0.0010  10.0  6
1  10  0.625  0.5  0.0001  100.0  9
1  13  0.625  0.5  0.0010  100.0  9
1  16  0.625  0.5  0.0001  0.0  9
1  19  0.500  0.5  0.0001  0.0  6
1  22  0.500  0.5  0.0010  0.0  6
1  25  0.625  0.5  0.0001  10.0  9
1  28  0.625  0.5  0.0010  100.0  9
1  31  0.625  0.5  0.0001  1.0  9
1  34  0.975  0.5  0.0001  10.0  36
1  37  0.625  0.5  0.0100  10.0  9
1  40  0.975  0.5  0.0001  1.0  36
1  43  0.500  0.5  0.0001  10.0  6
1  46  0.975  0.5  0.0001  10.0  36
1  49  0.500  0.5  0.0100  100.0  6
Table 8. Hyperparameters for Model Clipping CIFAR
ϵ  Compute Budget (epochs)  C0  σ
1  1  1.00  9.689610
1  4  0.10  0.968961
1  7  0.10  0.968961
1  10  0.10  0.968961
1  13  0.10  0.968961
1  16  0.10  0.968961
1  19  0.10  0.968961
1  22  0.10  0.968961
1  25  0.10  0.968961
1  28  0.10  0.968961
1  31  0.10  0.968961
1  34  0.10  0.968961
1  37  0.10  0.968961
1  40  0.01  0.096896
1  43  0.01  0.096896
1  46  0.10  0.968961
1  49  0.01  0.096896
Table 9. Hyperparameters for Output Perturbation CIFAR

ϵ  Compute Budget (epochs)  C0  σ
1  1  0.01  0.096896
1  2  0.01  0.096896
1  3  0.01  0.096896
1  4  0.01  0.096896
1  5  0.01  0.096896
1  6  0.01  0.096896
1  7  0.01  0.096896
1  8  0.01  0.096896
1  9  0.01  0.096896
1  10  0.01  0.096896
Table 10. Hyperparameters for Output Perturbation MNIST

C. Transfer Learning Experiments

We use a small three-layer network as the head on top of a frozen ResNet-18 backbone, pretrained on ImageNet, for the transfer learning experiments on CIFAR-10 and CIFAR-100:

```python
import flax.linen as nn


class ThreeLayerNN(nn.Module):
    num_classes: int

    @nn.compact
    def __call__(self, x, train: bool = True, mutable=None):
        x = x.reshape((x.shape[0], -1))   # flatten the backbone features
        x = nn.Dense(features=32)(x)
        x = nn.relu(x)
        x = nn.Dense(features=32)(x)
        x = nn.relu(x)
        x = nn.Dense(features=self.num_classes)(x)
        return x
```

In Tables 11, 12, and 13 we give complete experimental details for the CIFAR-10 and CIFAR-100 transfer learning experiments.

Architecture: Frozen ResNet-18 backbone + 3-layer NN
Training objective: Cross entropy loss
Evaluation objective: Top-1 accuracy
Batch size: 128
Training learning rate: 0.1
Training learning rate schedule: Linear One Cycle (Smith & Topin, 2017)
Train weight decay: 0.0005
Number of train epochs: 100
DP-SGD l2 clip: 0.5
DP-SGD target group ε: 50
DP-SGD target group δ: 0.00001
Forget set size: 10%
DP-SGD forget set size: 0.5%
Number of unlearning epochs: 50
Noise schedule: constant
Unlearning learning rate schedule: constant
Post-unlearning learning rate: 0.06
Post-unlearning learning rate schedule: Linear One Cycle
Post-unlearning weight decay: 0.0005
Table 11. Experimental Setting CIFAR-10 and CIFAR-100

ϵ  Compute Budget (epochs)  λ  C1  γ  C0  Unlearning Steps  σ
1  1  500.0  100.0  0.001  0.01  1  0.148492
1  4  100.0  10.0  0.0001  0.01  10  0.008698
1  7  0.50  1.0  0.001  0.01  10  0.008932
1  10  0.50  10.0  0.0001  0.01  10  0.008943
1  13  0.50  10.0  0.0001  0.01  10  0.008943
1  16  10.0  10.0  0.001  0.01  1  0.028143
1  19  10.0  10.0  0.001  0.01  1  0.028143
1  22  10.0  10.0  0.001  0.01  1  0.028143
1  25  10.0  10.0  0.001  0.01  1  0.028143
1  28  10.0  10.0  0.001  0.01  1  0.028143
1  31  100.0  100.0  0.0001  0.01  1  0.028143
1  34  10.0  1.0  0.01  0.01  1  0.026870
1  37  0.50  10.0  0.001  0.01  1  0.028277
1  40  0.50  10.0  0.001  0.01  1  0.028277
1  43  0.50  10.0  0.001  0.01  1  0.028277
1  46  0.50  10.0  0.001  0.01  1  0.028277
1  49  0.50  10.0  0.001  0.01  1  0.028277
Table 12. Hyperparameters for Gradient Clipping Transfer Learning CIFAR-10
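For reference, the following is a minimal sketch (not the released training code) of how the head can be fit in this setup: since the ResNet-18 weights are never updated, one simple realization precomputes the backbone features once and then trains the ThreeLayerNN head defined above with SGD, a linear one-cycle schedule, and weight decay, matching the settings in Table 11. The function name train_head, the precomputed `features`/`labels` arrays, and the choice of optax primitives are our assumptions; feature extraction with the backbone is assumed to happen outside this sketch.

```python
import jax
import jax.numpy as jnp
import optax


def train_head(rng, features, labels, num_classes, steps, batch_size=128,
               peak_lr=0.1, weight_decay=5e-4):
    """Fit the ThreeLayerNN head on precomputed features from the frozen,
    ImageNet-pretrained ResNet-18 backbone (sketch, not the released code)."""
    model = ThreeLayerNN(num_classes=num_classes)
    params = model.init(rng, features[:1])
    # Linear one-cycle learning rate schedule with peak 0.1, decoupled weight decay.
    schedule = optax.linear_onecycle_schedule(transition_steps=steps, peak_value=peak_lr)
    tx = optax.chain(optax.add_decayed_weights(weight_decay), optax.sgd(schedule))
    opt_state = tx.init(params)

    @jax.jit
    def train_step(params, opt_state, x, y):
        def loss_fn(p):
            logits = model.apply(p, x)
            return optax.softmax_cross_entropy_with_integer_labels(logits, y).mean()

        loss, grads = jax.value_and_grad(loss_fn)(params)
        updates, opt_state = tx.update(grads, opt_state, params)
        return optax.apply_updates(params, updates), opt_state, loss

    n = features.shape[0]
    for _ in range(steps):
        rng, sub = jax.random.split(rng)
        idx = jax.random.choice(sub, n, shape=(batch_size,), replace=False)
        params, opt_state, loss = train_step(params, opt_state, features[idx], labels[idx])
    return params
```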
ϵ  Compute Budget (epochs)  λ  C1  γ  C0  Unlearning Steps  σ
1  1  500  100  0.001  0.01  1  0.148492
1  4  0.50  1  0.001  0.01  10  0.008932
1  7  10  1  0.001  0.01  10  0.008698
1  10  10  0.10  0.01  0.10  30  0.008524
1  13  10  10  0.0001  0.01  10  0.008920
1  16  100  1  0.001  0.10  30  0.008524
1  19  0.50  1  0.001  0.01  10  0.008932
1  22  500  10  0.0001  0.01  10  0.007726
1  25  100  1  0.001  0.10  30  0.008524
1  28  0.50  1  0.001  0.01  10  0.008932
1  31  500  10  0.001  1  10  0.025667
1  34  0.50  10  0.0001  0.01  10  0.008943
1  37  500  10  0.001  1  10  0.025667
1  40  500  10  0.001  1  10  0.025667
1  43  500  10  0.001  1  10  0.025667
1  46  500  10  0.001  1  10  0.025667
1  49  500  10  0.001  1  10  0.025667
Table 13. Hyperparameters for Gradient Clipping Transfer Learning CIFAR-100

In this section, we evaluate how the choice of ε affects the performance of our algorithm. In addition to ε = 1 used in the main paper, we plot the performance of the gradient clipping algorithm (3) for ε = 0.1 and ε = 10, keeping δ = 10^-5 fixed for all values of ε. See Figure 4 for the results. There is very little difference between ε = 1 and ε = 10, whereas ε = 0.1 degrades performance significantly, and the penalty is more visible on the harder task (CIFAR-10).

Figure 4. Accuracy of Gradient Clipping versus compute budget (epochs) on CIFAR-10 (left) and MNIST (right), to satisfy (ε, 10^-5)-unlearning for ε ∈ {0.1, 1, 10}.

Compute Budget (epochs)  ϵ  λ  C1  γ  C0  Unlearning Steps  σ
1  0.1  500  100  0.001  1  5  0.871847
4  0.1  500  100  0.001  1  5  0.871847
7  0.1  500  100  0.001  1  5  0.871847
10  0.1  500  100  0.001  1  5  0.871847
13  0.1  500  100  0.001  1  5  0.871847
16  0.1  500  100  0.001  1  5  0.871847
19  0.1  500  100  0.001  1  5  0.871847
22  0.1  500  100  0.001  1  5  0.871847
25  0.1  500  100  0.001  1  5  0.871847
28  0.1  500  100  0.001  1  5  0.871847
31  0.1  500  100  0.001  1  5  0.871847
34  0.1  500  100  0.001  1  5  0.871847
37  0.1  500  100  0.001  1  5  0.871847
40  0.1  500  100  0.001  1  5  0.871847
43  0.1  500  100  0.001  1  5  0.871847
46  0.1  500  100  0.001  1  5  0.871847
49  0.1  500  100  0.001  1  5  0.871847
1  1  500  100  0.001  1  5  0.275702
4  1  1  10  0.001  0.1  10  0.089197
7  1  500  100  0.001  1  5  0.275702
10  1  1  10  0.001  0.1  10  0.089197
13  1  500  100  0.001  1  5  0.275702
16  1  500  100  0.001  1  5  0.275702
19  1  500  100  0.001  1  5  0.275702
22  1  500  100  0.001  1  5  0.275702
25  1  500  100  0.001  1  5  0.275702
28  1  500  100  0.001  1  5  0.275702
31  1  500  100  0.001  1  5  0.275702
34  1  500  100  0.001  1  5  0.275702
37  1  500  100  0.001  1  5  0.275702
40  1  500  100  0.001  1  5  0.275702
43  1  500  100  0.001  1  5  0.275702
46  1  500  100  0.001  1  5  0.275702
49  1  500  100  0.001  1  5  0.275702
1  10  1  100  0.1  0.1  1  4.512385
4  10  1  10  0.1  0.1  1  0.487463
7  10  0.1  10  0.1  1  1  0.889955
10  10  0.1  10  0.1  0.1  1  0.491488
13  10  0.1  10  0.1  1  1  0.889955
16  10  0.1  10  0.1  0.1  1  0.491488
19  10  1  10  0.1  0.1  1  0.487463
22  10  0.1  10  0.1  0.1  1  0.491488
25  10  0.1  10  0.1  1  1  0.889955
28  10  0.1  0.01  0.1  0.1  70  0.007260
31  10  1  0.1  0.1  10  53  0.026744
34  10  25  5  0.01  10  19  0.071419
37  10  0.1  10  0.1  0.1  1  0.491488
40  10  1  10  0.001  0.1  10  0.028207
43  10  25  5  0.01  10  19  0.071419
46  10  25  5  0.01  10  19  0.071419
49  10  1  0.1  0.1  1  30  0.026955
Table 14. Hyperparameters for Gradient Clipping CIFAR-10 for ε ∈ {10, 1, 0.1}
ϵ  Compute Budget (epochs)  λ  C1  γ  C0  Unlearning Steps  σ
10  1  750  10  0.0001  0.01  6  0.002451
10  2  750  10  0.0001  0.01  6  0.002451
10  3  750  10  0.0001  0.01  6  0.002451
10  4  750  10  0.0001  0.01  6  0.002451
10  5  10  100  0.0001  0.01  1  0.008940
10  6  750  10  0.0001  0.01  6  0.002451
10  7  10  100  0.0001  0.01  1  0.008940
10  8  750  10  0.0001  0.01  6  0.002451
10  9  10  100  0.0001  0.01  1  0.008940
1  1  10  100  0.0001  0.01  1  0.028270
1  2  750  10  0.0001  0.01  6  0.007752
1  3  750  10  0.0001  0.01  6  0.007752
1  4  750  10  0.0001  0.01  6  0.007752
1  5  10  100  0.0001  0.01  1  0.028270
1  6  10  100  0.0001  0.01  1  0.028270
1  7  750  10  0.0001  0.01  6  0.007752
1  8  10  100  0.0001  0.01  1  0.028270
1  9  10  100  0.0001  0.01  1  0.028270
0.1  1  10  100  0.0001  0.01  1  0.089398
0.1  2  10  100  0.0001  0.01  1  0.089398
0.1  3  10  100  0.0001  0.01  1  0.089398
0.1  4  750  10  0.0001  0.01  6  0.024514
0.1  5  10  100  0.0001  0.01  1  0.089398
0.1  6  10  100  0.0001  0.01  1  0.089398
0.1  7  10  100  0.0001  0.01  1  0.089398
0.1  8  10  100  0.0001  0.01  1  0.089398
0.1  9  10  100  0.0001  0.01  1  0.089398
Table 15. Hyperparameters for Gradient Clipping MNIST for ε ∈ {10, 1, 0.1}