# pdudt_provable_decentralized_unlearning_under_dynamic_topologies__8052394f.pdf

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

Jing Qiao 1 2 Yu Liu 1 Zengzhe Chen 1 Mingyi Li 1 Yuan Yuan 3 4 Xiao Zhang 1 Dongxiao Yu 1

This paper investigates decentralized unlearning, aiming to eliminate the impact of a specific client on the whole decentralized system. However, decentralized communication characterizations pose new challenges for effective unlearning: the indirect connections make it difficult to trace the specific client s impact, while the dynamic topology limits the scalability of retraining-based unlearning methods. In this paper, we propose the first Provable Decentralized Unlearning algorithm under Dynamic Topologies, called PDUDT. It allows clients to eliminate the influence of a specific client without additional communication or retraining. We provide rigorous theoretical guarantees for PDUDT, showing it is statistically indistinguishable from perturbed retraining. Additionally, it achieves an efficient convergence rate of O( 1

T ) in subsequent learning, where T is the total communication rounds. This rate matches state-of-the-art results. Experimental results show that compared with the Retrain method, PDUDT saves more than 99% of unlearning time while achieving comparable unlearning performance.

1. Introduction

With the surge in data volume and increasing geographic dispersion of data sources, some collaborative learning paradigms, such as Federated Learning (Mc Mahan et al., 2017) and Decentralized Learning (Lian et al., 2017), have attracted widespread attentions. In the above collaborative scenarios, privacy regulations like GDPR (Voigt & Von dem Bussche, 2017) grant clients the right to withdraw the use

1School of Computer Science and Technology, Shandong University, Qingdao, China 2Zhongtai Securities Institute for Financial Studies, Shandong University, Jinan, China 3School of Software, Shandong University, Jinan, China 4Joint SDUNTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China. Correspondence to: Yuan Yuan <yyuan@sdu.edu.cn>, Xiao Zhang <xiaozhang@sdu.edu.cn>.

Proceedings of the 42 nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s).

of their personal data in any form. For instance, users might wish to delete location information from navigation apps or ask smart voice assistants to forget sensitive conversation content. Simply deleting data does not ensure the right to be forgotten, as its influence remains embedded in the collaboratively trained models (Tao et al., 2024). Therefore, several studies have empirically explored ways to ensure the right to be forgotten for collaborative learning, using techniques such as knowledge distillation (Wu et al., 2022a), class-discriminative pruning (Wang et al., 2022), projected gradient ascent (Wu et al., 2022b), or second-order Ada Hessian optimizer (Liu et al., 2022).

However, the existing works usually rely on a trustworthy central server for coordination, which is not always guaranteed in real-world scenarios (Qiao et al., 2024). To further remove the dependence on the central server, the researchers began to explore how to achieve efficient unlearning in a full decentralized framework. For example, HDUS (Ye et al., 2024) uses distilled seed models to create erasable ensembles for all clients. Similarly, Block FUL (Liu et al., 2024a) is a novel framework with a dual-chain structure, comprising a live chain and an archive chain, to enable unlearning in Blockchained FL. Although these studies provide practical solutions for decentralized unlearning, the theoretical performance analysis still lacks in-depth exploration.

Therefore, we aim to design an efficient decentralized unlearning framework, while also theoretically guaranteeing the effectiveness and soundness of the unlearning process. To achieve this, we need to address the following challenges: (1) Indirect connections complicate the impact chain. In a decentralized system, some clients may not be directly connected to the client initiating the unlearning request, yet they can still be influenced through the information flow. The complexity and unpredictability of model propagation paths make it challenging to accurately trace and mitigate the influence of a specific client. (2) Dynamic topologies make retraining-based methods infeasible. The constantly changing topologies among clients pose significant barriers to retraining-based unlearning methods. Clients that have exited the decentralized system are often unreachable, making it infeasible to revert to an earlier training state (Tao et al., 2024) or retrain the model from scratch. This lack of access to previously participating clients undermines the consistency and feasibility of retraining approaches, espe-

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

Table 1. Comparison with some related works

Algorithm SL w/o retrain Dynamic Theoretical guarantee

topology Unlearning guarantee Convergence rate Unlearning time Space overhead Comp. Comm.

Fed Eraser (Liu et al., 2021) - - - - - Fed Remover (Yuan et al., 2024) - - - - - Fed Unl (Wu et al., 2022a) - - - - - HDUS (Ye et al., 2024) - - - - -

FATS-Unl (Tao et al., 2024) Exact unlearning 1O 1

Kb(t1+T ) 2 > min{1, KR

n } RT - 3O(R max{b, d}) 3O(R max{K, d}) Fed Recovery (Zhang et al., 2023) (ϵ, β)-machine unlearning - O(t1) 0 O(t1nd) PDUDT (This paper) (ϵ, β)-machine unlearning O( 1

T ) O(t1) 0 O(t1Nmaxd)

Note that - means no result or not applicable, SL means Severless , RT means Retraining time , t1 is the round when a client in the learning system issues an unlearning request, and Nmax = maxi,t |N t i | n denotes the maximum number of neighbors over rounds in a decentralized learning system. For the last three columns, we consider the unlearning time and space overhead on a single node, whether it is a server or a client, that needs to perform the unlearning operations. 1 The original convergence rate in Tao et al. s paper is shown as O(1/ ρSMN). For easy comparison, in our Table 1, it is further derived from line 2 of Algorithm 1 in their paper. Here, K is the number of clients participating in each round of training, and b is the sample batch size used to calculate the gradient (in our theoretical result, we set b = 1). Their convergence result is for the total training rounds. Since we have proved the statistical indistinguishability of the unlearning algorithm, we only focus on the convergence behavior after the unlearning operations. 2 Tao et al. s result relies on the time it takes to retrain the parameterized neural network, which is often much longer than ours. In their result, R denotes the total communication rounds, and it holds R = t1 + T in our paper. 3 The space overhead is O(R max{b, d}) for each client and O(R max{K, d}) for the server.

cially when client participation is voluntary and transient. (3) Global unlearning performance is difficult to quantify theoretically. In a decentralized setting, each client must independently forget the effects of a specific client locally. However, there is currently no unified metric to evaluate how these local unlearning operations collectively achieve the desired global unlearning effect. Without centralized oversight, ensuring that local actions align to produce the intended global impact remains a significant open challenge.

Along this line, we propose a provable unlearning algorithm, called PDUDT, for the decentralized framework under dynamic topologies. To make the whole system forget the unlearned client, each client locally uses its own historical gradient submissions, along with those of its neighbors, to perform unlearning operations. Specifically, we compute a sequence of gradient residual approximations using the expected retraining update rule. At the unlearning moment, we subtract the weighted sum of the corresponding approximations from each client s model, where the weights measure the clients contributions to the system. This process only involves using saved historical information, with no additional communication and training process, making it readily adaptable to dynamic topology settings. To provide a theoretically rigorous unlearning guarantee, we first derive an upper bound on the difference between the retrained model and the output model of the proposed algorithm from a global perspective. Then, we use the Gaussian mechanism to mask this gap in the parameter space, ensuring statistical indistinguishability. Finally, we analyze the impact of the unlearning models on the subsequent convergence behavior of decentralized learning.

Our main contributions can be summarized as follows:

To the best of our knowledge, we propose PDUDT, the first provable decentralized unlearning algorithm under

dynamic topologies. It modifies local models using only historical information, eliminating the influence of a specific client. Notably, PDUDT requires no extra communication or neural network retraining, ensuring high efficiency in dynamic settings.

We provide rigorous theoretical guarantee for the PDUDT algorithm. First, we prove that it is statistically indistinguishable from the perturbed retraining method. Then, we derive that the subsequent learning process converges to the scaled difference among the unlearning models from all remaining clients at a rate of O( 1

T ), which matches the state-of-the-art results.

We conduct extensive experiments to verify the performance superiority of the proposed PDUDT algorithm. Specifically, compared with the Retrain method, PDUDT saves over 99% unlearning time, while maintaining outstanding unlearning performance.

2. Related Work

Depending on the precision of forgetting, existing unlearning techniques in the collaborative frameworks can be categorized into two types: Exact Unlearning and Approximate Unlearning.

2.1. Exact Unlearning

Exact unlearning requires that the clients in a collaborative system completely remove the influence of a specific client. To achieve this, the model typically needs to be retrained or partially retrained, which may incur high computational costs and time expenses (Liu et al., 2024b; Yuan et al., 2024; Tao et al., 2024). For example, one exact unlearning technique is the leave-one-out retraining approach, where the model is retrained on the complete dataset, omitting the

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

user s data that needs to be unlearned (Liu et al., 2024b). Fed Remover (Yuan et al., 2024) designs a real-time malicious client detection scheme to quickly perform unlearning operations and implement a global model of unlearning in a minimum number of rounds. FATS-Unl (Tao et al., 2024) aims to backtrack to the point before the client who initiates the unlearning request first participates in collaborative training, performing retraining to achieve exact unlearning. While these methods can fully eliminate the influence of specific clients, their implementation faces challenges. First, heavy reliance on retraining results in high computational overhead and time costs (Liu et al., 2024b). Second, due to the dynamic nature of client participation in collaborative training, recalling previous clients for retraining is often impractical (Zhang et al., 2023).

2.2. Approximate Unlearning

Approximate unlearning provides a more efficient solution, with methods that are typically faster and more computationally economical than exact unlearning (Liu et al., 2024b). Although it may not completely eliminate the influence of specific clients, approximate unlearning is sufficiently effective for most practical scenarios and meets the necessary requirements. Some works have been proposed to improve the understanding of federated approximate unlearning. Fed Eraser (Liu et al., 2021) relies on the central server to store historical submissions of each client, which are then calibrated to accelerate the unlearning process. Fed Recovery (Zhang et al., 2023) provides a federated unlearning scheme to eliminate the influence of a specific client by removing the weighted sum of gradient residuals from the global model.

Considering that the reliability of central servers in practical applications is often not guaranteed, researchers have begun to focus on implementing unlearning in the decentralized frameworks (Wu et al., 2022a; Ye et al., 2024; Lin et al., 2024). For instance, HDUS (Ye et al., 2024) introduces a decentralized unlearning mechanism that leverages distilled seed models to construct erasable ensembles Block FUL (Liu et al., 2024a) supports unlearning in Blockchained Federated Learning, with an innovative framework featuring a dual-chain structure. Similarly, Lin et al. (2024) addresses unlearning in privacy-preserving AIGC systems via coded computing, which requires additional storage and reconstruction overhead. While these studies offer practical solutions for decentralized unlearning, a thorough theoretical performance analysis remains underexplored.

In Table 1, we provide a comprehensive comparison of our approach against some related works across multiple dimensions, including implementation architecture, communication patterns, theoretical unlearning guarantees, and efficiency metrics.

3. Decentralized Unlearning

3.1. Problem Setup

In this paper, we consider a decentralized learning scenario with n clients, denoted as V = {1, , n}. The communication mode in round t is modeled by a doubly stochastic matrix Wt = (W t ij)n n, where W t ij > 0 if client-i and client-j can directly communicate. Specially, we consider a general dynamic scenario where the connections between clients can vary arbitrarily after each round, i.e., the neighboring set N t i = {j|W t ij > 0, j V, j = i} of client-i and weight matrix Wt of clients vary with the rounds. All clients participating in the training collaboratively find a solution to the following general learning problem:

min θ Rd f(θ) := 1

i=1 Eξi Di[Fi(θ, ξi)] | {z } :=fi(θ)

In round t, each client-i receives gradients from all its neighbors, subsequently updating its model through local training that combines these gradients with its own local information according to the communication matrix:

θt i = θt 1 i η

j=1 W t 1 ij Fj(θt 1 j , ξt 1 j ) (2)

After several rounds of training, a client may submit an unlearning request to withdraw its data consent. At this point, due to privacy regulations like GDPR (Voigt & Von dem Bussche, 2017), the decentralized learning system must remove this client s contribution to the global model. Without loss of generality, assume client-n makes the unlearning request at round t1. Ideally, retraining by the n 1 clients, excluding client-n, ensures that the contribution of client-n is fully removed, thus guaranteeing privacy. The retraining process can be expressed as follows:

θt i = θt 1 i η

j=1 W t 1 ij Fj( θt 1 j , ξt 1 j ) (3)

where we can set the initial model as θ0 i = θ0 i . Theoretically, the communication matrix Wt = ( W t ij)(n 1) (n 1) can still be a doubly stochastic one. For example, the Metropolis Hastings method (Awan et al., 2006) can be utilized to generate it. And we denote the neighbor set of client-i (i V\{n}) in round t as N t i = {j| W t ij > 0, j V\{n}, j = i}.

However, retraining from scratch incurs prohibitively high computation and communication costs. Therefore, our goal is to adjust the local model θt1 i for each client that continues training, so that from a global perspective, the adjusted models perform similarly to those retrained by n 1 clients.

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

Algorithm 1 The Perturbed Retraining Algorithm

1: Input: The number of clients participating in retraining n 1, the initial local parameters θ0 i = θ0 i Rd

(i V\{n}), the round t1 when client-n submits an unlearning request, the step size η, the privacy budget ϵ, and the confidence parameter β. 2: for client i = 1, , n 1 (In Parallel) do 3: for round t = 1, , t1 do 4: Compute its local gradient Fi( θt 1 i , ξt 1 i ); 5: Receive all the gradients Fj( θt 1 j , ξt 1 j ) (j N t 1 i ) from its neighbors; 6: Update each local model parameter following θt i =

θt 1 i η n 1 P

j=1 W t 1 ij Fj( θt 1 j , ξt 1 j ) ;

7: end for 8: Set σ = 1

log(1/β), where d1 is

the upper bound discussed in Theorem 4.9; 9: Add perturbation to each local model parameter ˇθt1 i = θt1 i + zi, where zi N(0, (n 1)σ2Id) is a noise from the Gaussian distribution. 10: end for

3.2. Algorithm Design

When client-n issues an unlearning request in round t1, for any client-i (i V\{n}) in the decentralized learning system, its local model can be expressed as

θt1 i = θ0 i η

j=1 W t ij Fj(θt j, ξt j) (4)

Correspondingly, if retrained from scratch to round t1, for any client-i (i V\{n}), it holds that

θt1 i = θ0 i η

j=1 W t ij Fj( θt j, ξt j) (5)

To investigate the effect of client-n, we can subtract Equation (4) from Equation (5), and have

θt1 i θt1 i

j=1 W t ij Fj( θt j, ξt j)

j=1 W t ij Fj(θt j, ξt j))

To make an adjustment to the local model θt1 i of each client-i (i V\{n}), we introduce rt i = η Pn 1 j=1 W t ij Fj( θt j, ξt j) η Pn j=1 W t ij Fj(θt j, ξt j) to represent the gradient residual. Calculating rt i involves the trajectory dynamics of all neighbors of client-i obtained by retraining. Therefore, it is not feasible to directly adjust

Algorithm 2 Decentralized Unlearning Algorithm PDUDT

1: Input: The number of clients n, the initial local parameters θ0 i = θ0 Rd (i V), the round t1 when client-n submits an unlearning request, the step size η, the privacy budget ϵ, and the confidence parameter β. 2: for client i = 1, , n (In Parallel) do 3: for round t = 1, , t1 do 4: Compute and storage its local gradient Fi(θt 1 i , ξt 1 i ); 5: Receive and storage all its neighbors gradients Fj(θt 1 j , ξt 1 j ) (j N t 1 i ); 6: Update each local model parameter following θt i =

θt 1 i η n P

j=1 W t 1 ij Fj(θt 1 j , ξt 1 j ) ;

7: end for 8: end for 9: Receive the unlearning request from client-n. 10: for client i = 1, , n 1 (In Parallel) do 11: Compute the approximations {δt i}t1 1 t=0 of the gradient residuals by Equation (7); 12: Compute the weight pt i for each approximation δt i (t = 0, , t1 1) according to Equation (8); 13: Subtract a weighted sum of δt i from θt1 i to obtain θt1 i based on Equation (9); 14: Set σ = 1

log(1/β), where d1 is

the upper bound discussed in Theorem 4.9; 15: Add perturbation to each local model parameter θu i = θt1 i + zi, where zi N(0, (n 1)σ2Id) is a noise from the Gaussian distribution. 16: end for

the local model θt1 i by subtracting the accumulated gradient residual, i.e., θt1 i Pt1 1 t=0 rt i. Instead, we provide an approximation of the gradient residual rt i, denoted by

j=1 W t ij Fj(θt j, ξt j) η

j=1 W t ij Fj(θt j, ξt j) (7)

However, this approximation δt i does not fully capture the intricate dynamics and interactions among clients during training, especially the influence of client-n over different rounds. To this end, we use pt i to denote the weight of δt i in the t-th round of the learning process. Since each client relies on the gradient information from its neighbors to update its local model, Pn j=1 W t ij fj(θt j, ξt j) 2 naturally weights the contributions of client-i s neighbors gradients in each round. Formally, pt i is expressed as follows:

pt i = Pn j=1 W t ij Fj(θt j, ξt j) 2 Pt1 1 t=0 Pn j=1 W t ij Fj(θt j, ξt j) 2 (8)

Therefore, when an unlearning request is submitted by client-n in round t1, each client will perform the following

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

operation as shown in Equation (9) to remove the influence of client-n to the learning process over rounds.

θt1 i = θt1 i

t=0 pt iδt i, i V\{n} (9)

To achieve the indistinguishability described in Definition 4.1, we introduce random Gaussian noise to θt1 i and θt1 i to mask the gap between them. Specifically, Algorithm 1 outlines the perturbed retraining method, while Algorithm 2 presents our proposed decentralized unlearning algorithm that does not rely on retraining. From a global perspective, the output models of the two algorithms are indistinguishable, as will be discussed in detail in Section 4.

4. Theoretical Guarantee

Before delving into the unlearning performance and convergence behavior, we present some definitions and assumptions required for our theoretical analysis.

Definition 4.1. ((ϵ, β)-Indistinguishability (Neel et al., 2021)): Let X and Y be random variables over domain R. We say that X and Y are (ϵ, β)-Indistinguishable if, for every possible subset S R, the following holds:

Pr(X S) exp(ϵ) Pr(Y S) + β,

Pr(Y S) exp(ϵ) Pr(X S) + β.

Definition 4.2. (Sensitivity (Dwork, 2006)): For a given function q : D Rd, the sensitivity of q is

= max D,D q(D) q(D ) ,

where D and D differ in a single entry.

Definition 4.3. (Gaussian Mechanism (Bun & Steinke, 2016)): Given random variables X N(µ1, σ2Id) and Y N(µ2, σ2Id) satisfying µ1 µ2 , then X and Y are (ϵ, β)-Indistinguishable if it holds

2 log(1/β).

Definition 4.4. (Client-Level (ϵ, β)-Machine Unlearning (Zhang et al., 2023)): An unlearning algorithm MU satisfies (ϵ, β)-machine unlearning with respect to the learning algorithm ML if, for any possible subset of outputs S Rd, the following holds

Pr(ML(V\{n}) S) exp(ϵ) Pr(MU(Ω) S) + β,

Pr(MU(Ω) S) exp(ϵ) Pr(ML(V\{n}) S) + β.

where Ωdenotes the set of cached statistics of each client, such as gradients and intermediate model parameters.

Assumption 4.5. (Lipschitzian gradient). Loss function fi( )s are with Lipschitzian gradients, i.e., For θ, ϕ Rd, it holds that

fi(θ) fi(ϕ) L θ ϕ

Assumption 4.6. (Bounded variance). For any θ Rd, the variance of the stochastic gradient is bounded as follows:

E Fi(θ, ξ) fi(θ) 2 σ2 1,

E fi(θ) f(θ) 2 σ2 2.

Assumption 4.7. (Symmetric double stochastic matrix). In each round t, the communication matrices Wt and Wt

are real double stochastic matrices.

Assumption 4.8. (Spectral gap). For any symmetric doubly stochastic matrices Wt and Wt aforementioned, we assume that ρt,1 = max{|λ2(Wt)|, |λn(Wt)|} < 1 and ρt,2 = max{|λ2( Wt)|, |λn 1( Wt)|} < 1. Specifically, we denote ρ1 = max t ρt,1 and ρ2 = max t ρt,2.

4.1. Unlearning Performance

To explore the unlearning performance of Algorithm 2, we

analyze the gap between 1 n 1

i=1 θt1 i and 1 n 1

i=1 θt1 i in

Theorem 4.9. We then use Definitions 4.1-4.4 to establish the statistical indistinguishability between the two algorithms, as described in Corollary 4.10.

Before obtaining the formal indistinguishability result, we first explore the difference between the average of the remaining n 1 models in Algorithm 2 when client-n revokes data consent and the average model retrained by n 1 clients in Algorithm 1, as shown in Lemma B.1 of Appendix B. We then measure the weighted gradient residual approximation in Lemma B.2 of Appendix B. Based on these two lemmas, we can derive the following result for the gap between the

averages 1 n 1

i=1 θt1 i and 1 n 1

i=1 θt1 i .

Theorem 4.9. Let

D1 = 1 24η2L2ρ2 1nt1,

D3 = ρ2 1 + ρ2 2 + 1 (n 1)2 ,

E2 = ρ2 2 D2 ,

D2 = 1 36η2L2ρ2 2(n 1)t1,

E1 = (6 + 4ρ2 1)ρ2 1 D1 ,

E3 = 8(1 + ρ2 1)ρ2 1 n D1 .

We can obtain the following result if the step size η satisfies 0 < η < min{ q

1 12L2t2 1 , q

1 24L2ρ2 1nt1 , q

1 36L2ρ2 2(n 1)t1 }:

θt1 i 1 n 1

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

v u u u u u t

I1σ2 1 + I2σ2 2 + I3(Ef(θ0) Ef( 1

i=1 θt1 i ))+

I4(Ef(θ0) Ef( 1 n 1

i=1 θt1 i ))

I 1σ2 1 + I 2σ2 2 + I 3(Ef(θ0) Ef( 1

i=1 θt1 i ))

I1 =(24η4L2nt4 1 + 144η5L3t4 1) (E1 + E2 + E3)+

8η2(5 + 2ρ2 1)t3 1 I2 =144η4L2nt4 1(E1 + 3E2 + 2E3) + 48η2(1 + ρ2 1)t3 1

I3 =288η3L2nt3 1(E1 + E3) + 192η(1 + ρ2 1)t2 1 n I4 =288η3L2nt3 1E2

I 1 =6 η2n + 3η3L + 12η4L2ρ2 1n2t1 D1 + 72η5L3ρ2 1nt1 D1

I 2 = 18η2n + 432η4L2ρ2 1n2t1 D2

I 3 = 36ηnt1 + 864η3L2ρ2 1n2t2 1 D1

In Theorem 4.9, the terms I1σ2 1 and I 1σ2 1 reflect the impact of random sampling, the terms I2σ2 2 and I 2σ2 2 capture the local loss heterogeneity, and the remaining terms represent the decentralized training dynamics. And according to Theorem 4.9, the formal indistinguishability guarantee can be summarized as follows. Corollary 4.10. From a global perspective, our PDUDT and its early stopping variant PDUDT (ES) (described in Section 4.3) satisfies (ϵ, β)-machine unlearning with respect to the perturbed retraining algorithm. For every possible subset S R, it holds that

i=1 θu i S) exp(ϵ) Pr( 1 n 1

i=1 ˇθt1 i S)+β,

i=1 ˇθt1 i S) exp(ϵ) Pr( 1 n 1

i=1 θu i S)+β.

4.2. Convergence Analysis

After each client-i (i V\{n}) eliminates the influence of client-n (Line 12 in Algorithm 2), we assume that it continues T rounds of decentralized collaborative learning with the communication topology Wt1+t 1 (t = 1, , T). To simplify the analysis, we regard θu i as the initial model of each client-i (i V\{n}) in these T rounds of training. Then the model update rules are as follows:

ˆθt+1 i = ˆθt i ˆη

j=1 W t1+t ij Fj(ˆθt j, ˆξt j), ˆθ0 i = θu i (10)

Considering that client-n has exited the collaborative learning system, we replace f( ) in Equation (1) with f( ), which is defined as:

f(θ) := 1 n 1

i=1 fi(θ), θ Rd (11)

As a result, after eliminating the client-n influence through Algorithm 2, the convergence of decentralized learning in the subsequent T rounds is characterized by Theorem 4.11. Theorem 4.11. Let

2 16ˆη2L2ρ2 2(n 1)T 1 32ˆη2L2ρ2 2(n 1)T , D5 = 1

D6 = 1 32ˆη2L2ρ2 2(n 1)T.

If the step size satisfies ˆη < q

1 32L2ρ2 2(n 1)T , it holds for

the subsequent T rounds of training:

t=0 E f( 1 n 1

i=1 ˆθt i) 2 +

t=0 E 1 n 1

i=1 f(ˆθt i) 2

i=1 E θu i 1 n 1

i=1 θu i 2 + ˆηLσ2 1 2(n 1)

i=1 θu i ) f

ˆηT + 2ˆη2L2ρ2 2σ2 1(n 1)T D6

+ 16ˆη2L2ρ2 2σ2 2(n 1)T D6 + 16ˆη2L2ρ2 2σ2 2T (n 1)D6

We choose an appropriate step size ˆη in Theorem 4.9 to derive the following result. Corollary 4.12. If the step size satisfies ˆη = n 1

T , and the number of training round further satisfies T max{ 64(1 C)L2ρ2 2(n 1)3

1 2C , (n 1)L} with the constant C (0, 1

2), the following holds for the subsequent T rounds:

t=0 E f( 1 n 1

i=1 ˆθt i) 2

i=1 θu i ) f

n 1 + (1 2C)σ2 2 2(n 1)2 + (1 2C)σ2 1 16

+(1 2C)σ2 2 2 + 3(1 C)L2

i=1 E θu i 1 n 1

In Corollary 4.12, the bound sharply decreases as the number of clients n 1 and training rounds T increase. Specifically, it indicates a rate of O( 1

T ), converging to the scaled difference among the unlearning models from all n 1 clients.

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

4.3. Discussion

In this part, we provide a discussion of our decentralized unlearning algorithm PDUDT regarding privacy considerations, unlearning time, space overhead, and early stopping benefits.

Privacy considerations. Some works have shown that transmitting the raw gradient can also lead to privacy information leakage. Differential privacy technology, as the most commonly used privacy protection method for distributed learning, provides a solution to this concern by adding noise to perturb the gradient. For these decentralized learning frameworks based on differential privacy, the perturbed information of client-n can be regarded as its contribution to the system. As a result, our proposed mechanism can still be used to eliminate the noisy impact of client-n.

Unlearning time. The proposed algorithm involves only simple operations (Lines 7-12 in Algorithm 2) to achieve unlearning. Let t1 denote the training round when client-n submits an unlearning request. For each client, the time complexity to remove the contribution of client-n is O(t1). Specifically, subtracting the weighted gradient residual approximation requires O(t1) time, while computing the distance d1 and sampling noise from a Gaussian distribution each require O(1) time. Furthermore, the unlearning operations are only performed on each client side, so the required communication overhead is 0.

Space overhead. The proposed decentralized unlearning algorithm requires each client-i to save the gradients of its neighbors. Let Nmax = maxi,t |N t i | denote the maximum number of neighbors over rounds t = 0, . . . , t1 1. For each client, it costs O(t1Nmaxd) in memory space to perform the unlearning operations. Although Nmax n always holds, especially in the scenarios like social networks, the required memory increases linearly as the number of neighbors grows. To further reduce memory usage, each client may consider an early stopping strategy based on its resource constraints and model performance.

Early stopping benefits. For neural networks in an overparameterization regime, the NTK theory (Lee et al., 2019) suggests that gradient-based methods converge exponentially to zero training error, with minimal variation in the model s parameters (Zhang et al., 2023). Therefore, each client needs to store only the gradients of its neighbors for the first t1,i t1 rounds to save memory. Under this setup, we can redefine Lines 8-10 in Algorithm 2: each clienti computes the approximations {δt i}t1,i 1 t=0 of the gradient residuals using Equation (7). Then, it calculates the weight p t i for every approximation δt i (t = 0, , t1,i 1) as

p t i = Pn j=1 W t ij fj(θt j, ξt j) 2 Pt1,i 1 t=0 Pn j=1 W t ij fj(θt j, ξt j) 2

and subtract a weighted sum of δt i from θt1 i to obtain θt1 i :

θt1 i = θt1 i

t=0 p t i δt i, i V\{n}.

Furthermore, we can prove that the weighted gradient residual approximation with early stopping is also bounded by the right-hand side of the inequality in Lemma B.2, as detailed in Appendix D.

5. Experiments

In this section, we evaluate PDUDT from many aspects, such as its statistical indistinguishability from the perturbed retraining algorithm, as well as its efficiency and effectiveness of unlearning.

5.1. Experimental Setup

According to the complexity of the learning tasks, we train the CNN model for MNIST (Lecun et al., 1998) and Fashion MNIST (Xiao et al., 2017) datasets, and the Res Net-18 model (He et al., 2016) for CIFAR-10 (Krizhevsky & Hinton, 2009) and SVHN (Netzer et al., 2011) datasets. To show the advantages of our proposed PDUDT algorithm, we compare it with some baseline methods, including Origin (Lian et al., 2017), Retrain, FATS-Unl (Tao et al., 2024), Fed Recovery (Zhang et al., 2023) and HDUS (Ye et al.,

2024). Multiple metrics are used to evaluate the performance of the proposed decentralized unlearning algorithm, including accuracy, unlearning time, communication overhead, and attack success rate. More details can be found in Appendix I.

In our experiments, we work with total n = 10 clients. Specifically, in each round t, whether there is a connection between any two clients is randomly generated. Then, in order to ensure that the communication situation can be modeled as a doubly stochastic matrix, we use the Metropolis Hastings method (Awan et al., 2006) to generate the communication weights among clients. Each client trains with a batch size of 256 for 1 epoch per round, with a step size of 0.001. The unlearning request from client-n is set to occur at round t1 = 100. To save storage space, each client can apply an early stopping strategy by retaining its neighbors information only for the first 80 rounds. After performing the unlearning operations, the remaining n 1 clients continue training collaboratively for an additional 200 rounds. We conduct 100 membership inference attacks, presenting both the average performance and standard deviation.

Our experiments are conducted using Py Torch 2.5.1, Python 3.12, and Cuda 12.1. The experiments run on a cloud server equipped with an Intel(R) Xeon(R) Platinum 8358P CPU and 10 RTX 3090 GPUs, operating on Ubuntu 22.04.

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

0.002 0.004 0.006 0.008 0.010 0.2

Perturbed Retraining PDUDT PDUDT (ES)

0.002 0.004 0.006 0.008 0.010 0.2

0.8 Fashion-MNIST

Perturbed Retraining PDUDT PDUDT (ES)

0.003 0.006 0.009 0.012 0.015 0.0

0.6 CIFAR-10

Perturbed Retraining PDUDT PDUDT (ES)

0.003 0.006 0.009 0.012 0.015 0.0

Perturbed Retraining PDUDT PDUDT (ES)

Figure 1. The accuracy of unlearned models using PDUDT, PDUDT (ES), and perturbed retrained models.

0 1 2 3 4 5 6 7 8 9 Class

Retrain PDUDT PDUDT (ES)

0 1 2 3 4 5 6 7 8 9 Class

1.0 Fashion-MNIST

Retrain PDUDT PDUDT (ES)

0 1 2 3 4 5 6 7 8 9 Class

1.0 CIFAR-10

Retrain PDUDT PDUDT (ES)

0 1 2 3 4 5 6 7 8 9 Class

Retrain PDUDT PDUDT (ES)

Figure 2. The accuracy on each class using PDUDT, PDUDT (ES), and perturbed retrained models.

Table 2. Comparison of the unlearning time across different unlearning methods.

Method Unlearning time (s)

MNIST Fashion-MNIST CIFAR-10 SVHN

Retrain 1322.2 1299.8 1696.2 2747.8 FATS-Unl 573.6 573.4 884.0 1233.2 Fed Recovery 3.5 3.9 8.1 8.5 HDUS 19.3 19.9 24.1 38.0 PDUDT 3.0 3.3 9.6 9.8 PDUDT (ES) 2.3 2.6 7.5 7.8

Table 3. Comparison of the attack precision of MIA across different unlearning methods.

Method Attack precision (%)

MNIST Fashion-MNIST CIFAR-10 SVHN

Origin 65.3 0.9 67.1 1.9 65.3 0.7 68.1 0.5 Retrain 49.4 1.3 49.2 2.2 51.0 0.6 49.5 0.7 FATS-Unl 51.3 1.2 49.6 4.3 53.1 1.2 51.8 1.6 Fed Recovery 50.1 0.7 48.4 0.9 52.5 0.6 51.1 0.3 HDUS 50.2 1.1 51.2 0.7 50.6 0.3 51.2 1.3 PDUDT 49.9 0.6 49.0 1.3 50.1 1.6 51.3 0.3 PDUDT (ES) 50.8 0.6 50.2 0.9 51.1 1.3 51.5 1.4

5.2. Experimental Results

1) Statistical indistinguishability: To examine the effect of different noise scales on the statistical indistinguishability between unlearned models and perturbed retrained models, we vary the values of σ from 0.001 to 0.015. The performance is evaluated in terms of the average model accuracy. Specifically, we start decentralized learning from pre-trained models to control the injected noise and ensure good performance. Figure 1 illustrates the statistical indistinguishability

of our PDUDT algorithm from the Perturbed Retraining method under varying noise scales σ. Across all datasets, PDUDT achieves comparable accuracy to the Perturbed Retraining method under all noise conditions, demonstrating its ability to maintain statistical indistinguishability effectively. Similarly, PDUDT (ES), the space-saving version of PDUDT, also shows comparable performance, though with slight degradation in MNIST dataset. Overall, the results confirm that both PDUDT and PDUDT (ES) can maintain statistical indistinguishability with perturbed retrained models across all noise conditions.

2) The Effectiveness of Unlearning: To evaluate the effectiveness of PDUDT and its space-saving version PDUDT (ES), we record the average accuracy on each class. In this experiment, data from class 9 is exclusively owned by the client who requests unlearning. Therefore, the unlearning performance can be assessed based on the accuracy of this class. From Figure 2, it can be observed that both PDUDT and PDUDT (ES) maintain high performance on Classes 0 8, while their accuracy drops significantly on Class 9, exhibiting behavior similar to the retraining method. This demonstrates that PDUDT and PDUDT (ES) effectively enable the entire system to forget what it has learned from the client initiating the unlearning request.

3) The Efficiency of Unlearning: To verify the efficiency of PDUDT and PDUDT (ES), we evaluate the time required for unlearning operations. In Table 2, our PDUDT and its spacesaving version PDUDT (ES) show impressive performance in both unlearning time. Specifically, they reduce unlearning time by over 99% compared to the Retrain method. The time consumption of Fed Recovery, HDUS, PDUDT and PDUDT (ES) primarily arises from computing gradient residuals or

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

the stored distilled seed models ensemble. In contrast, the time consumption for Retrain and FATS-Unl is mainly due to parameter training within the networks.

4) The Performance of MIA on Unlearned Models: In this experiment, we conduct 100 membership inference attacks, presenting both the average performance and standard deviation across different unlearning methods. As shown in Table 3, the MIA achieves high attack precision on the original models, indicating that the attacker can successfully determine whether the target client s data was used during training. In contrast, after unlearning, its attack precision drops to approximately 50%, demonstrating that the proposed unlearning methods successfully remove the impact of the target client. Notably, the performance of PDUDT and PDUDT (ES) is comparable to that of Retrain and FATSUnl, highlighting that our PDUDT and PDUDT (ES) methods achieve similar unlearning effectiveness without relying on retraining.

6. Conclusion

In this paper, we propose the first provable decentralized unlearning algorithm PDUDT under dynamic topologies. Theoretically, we derive its statistical indistinguishability and the convergence of its subsequent learning process. We conduct extensive experiments to verify the performance superiority of the proposed unlearning algorithm.

Our work inspires future research on provable decentralized unlearning. It paves the way for investigating adaptive mechanisms that enhance unlearning efficiency under dynamic topologies and for extending the unlearning framework to broader decentralized paradigms.

Acknowledgements

This work was supported in part by Joint Key Funds of National Natural Science Foundation of China under Grant U23A20302 and U24B20149, in part by the National Natural Science Foundation of China under Grant 62202273 and 62302247, in part by the National Key Research and Development Program of China under Grant No. 2022YFF0712100, in part by the Postdoctoral Fellowship Program of CPSF under Grant GZC20231460, in part by the China Postdoctoral Science Foundation under Grant 2024M761806.

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here.

Awan, A., Ferreira, R. A., Jagannathan, S., and Grama, A. Distributed uniform sampling in unstructured peer-topeer networks. In Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS 06), volume 9, pp. 223c 223c. IEEE, 2006.

Bun, M. and Steinke, T. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of cryptography conference, pp. 635 658. Springer, 2016.

Dwork, C. Differential privacy. In International colloquium on automata, languages, and programming, pp. 1 12. Springer, 2006.

He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770 778. IEEE Computer Society, 2016.

Krizhevsky, A. and Hinton, G. Learning multiple layers of features from tiny images. Technical Report 0, University of Toronto, Toronto, Ontario, 2009.

Lecun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradientbased learning applied to document recognition. Proceedings of the IEEE, 86(11):2278 2324, 1998. doi: 10.1109/5.726791.

Lee, J., Xiao, L., Schoenholz, S., Bahri, Y., Novak, R., Sohl Dickstein, J., and Pennington, J. Wide neural networks of any depth evolve as linear models under gradient descent. Advances in neural information processing systems, 32, 2019.

Lian, X., Zhang, C., Zhang, H., Hsieh, C.-J., Zhang, W., and Liu, J. Can decentralized algorithms outperform centralized algorithms? a case study for decentralized parallel stochastic gradient descent. Advances in neural information processing systems, 30, 2017.

Lin, Y., Du, H., Gao, Z., Yao, J., Jiang, B., Niyato, D., Li, R., and Zhang, P. Decentralized unlearning for trustworthy aigenerated content (aigc) services. IEEE Network, 2024.

Liu, G., Ma, X., Yang, Y., Wang, C., and Liu, J. Federaser: Enabling efficient client-level data removal from federated learning models. In 2021 IEEE/ACM 29th international symposium on quality of service (IWQOS), pp. 1 10. IEEE, 2021.

Liu, X., Li, M., Wang, X., Yu, G., Ni, W., Li, L., Peng, H., and Liu, R. Decentralized federated unlearning on blockchain. ar Xiv preprint ar Xiv:2402.16294, 2024a.

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

Liu, Y., Xu, L., Yuan, X., Wang, C., and Li, B. The right to be forgotten in federated learning: An efficient realization with rapid retraining. In IEEE INFOCOM 2022-IEEE Conference on Computer Communications, pp. 1749 1758. IEEE, 2022.

Liu, Z., Jiang, Y., Shen, J., Peng, M., Lam, K.-Y., Yuan, X., and Liu, X. A survey on federated unlearning: Challenges, methods, and future directions. ACM Computing Surveys, 57(1):1 38, 2024b.

Mc Mahan, B., Moore, E., Ramage, D., Hampson, S., and y Arcas, B. A. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pp. 1273 1282. PMLR, 2017.

Neel, S., Roth, A., and Sharifi-Malvajerdi, S. Descent-todelete: Gradient-based methods for machine unlearning. In Algorithmic Learning Theory, pp. 931 962. PMLR, 2021.

Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A. Y., et al. Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning, pp. 4. Granada, 2011.

Qiao, J., Zhang, Z., Yue, S., Yuan, Y., Cai, Z., Zhang, X., Ren, J., and Yu, D. Br-defedrl: Byzantine-robust decentralized federated reinforcement learning with fast convergence and communication efficiency. In IEEE INFOCOM 2024-IEEE Conference on Computer Communications, pp. 141 150. IEEE, 2024.

Tao, Y., Wang, C.-L., Pan, M., Yu, D., Cheng, X., and Wang, D. Communication efficient and provable federated unlearning. Proceedings of the VLDB Endowment, 17(5): 1119 1131, 2024.

Voigt, P. and Von dem Bussche, A. The eu general data protection regulation (gdpr). A Practical Guide, 1st Ed., Cham: Springer International Publishing, 10(3152676): 10 5555, 2017.

Wang, J., Guo, S., Xie, X., and Qi, H. Federated unlearning via class-discriminative pruning. In Proceedings of the ACM Web Conference 2022, pp. 622 632, 2022.

Wu, C., Zhu, S., and Mitra, P. Federated unlearning with knowledge distillation. ar Xiv preprint ar Xiv:2201.09441, 2022a.

Wu, L., Guo, S., Wang, J., Hong, Z., Zhang, J., and Ding, Y. Federated unlearning: Guarantee the right of clients to forget. IEEE Network, 36(5):129 135, 2022b.

Xiao, H., Rasul, K., and Vollgraf, R. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. ar Xiv preprint ar Xiv:1708.07747, 2017.

Ye, G., Chen, T., Hung Nguyen, Q. V., and Yin, H. Heterogeneous decentralised machine unlearning with seed model distillation. CAAI Transactions on Intelligence Technology, 2024.

Yuan, Y., Wang, B., Zhang, C., Xiong, Z., Li, C., and Zhu, L. Towards efficient and robust federated unlearning in iot networks. IEEE Internet of Things Journal, 2024.

Zhang, L., Zhu, T., Zhang, H., Xiong, P., and Zhou, W. Fedrecovery: Differentially private machine unlearning for federated learning frameworks. IEEE Transactions on Information Forensics and Security, 2023.

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

Appendix A summarizes the main notations in this paper. Appendix B lists some important theoretical results, including Lemma B.1 and Lemma B.2. Appendix C shows the proof of Lemma B.1. Appendix D shows the proof of Lemma B.2. Appendix E shows the proof of Theorem 4.9. Appendix F shows the proof of Corollary 4.10. Appendix G shows the proof of Theorem 4.11. Appendix H shows the proof of Corollary 4.12. Appendix I shows the experimental details, including datasets and models, baseline methods, metrics and additional experimental results.

A. Notation Table

Table 4. Notations and descriptions.

Notations Descriptions t1 The time when client-n issues an unlearning request η The step size for Algorithms 1-2 σ The noise scale in Algorithms 1-2 ϵ The privacy budget of indistinguishability β The confidence parameter of indistinguishability θt i The local model of client-i in round t (t = 1, , t1) in Algorithm 1 θt i The local model of client-i in round t (t = 1, , t1) in Algorithm 2 ˇθt1 i The perturbed retrained model of client-i in Algorithm 1 rt i The gradient residual in round t related to client-i s neighbors information δt i The approximation of the gradient residual rt i in round t, which is computed by client-i pt i The weight of δt i, which measures the contributions of client-i s neighbors gradients in round t θt1 i The local model of client-i after removing the influence of client-n based on Equation (9) θu i The unlearned model of client-i in Algorithm 2 Fi(θt i, ξt i) The local gradient of client-i related to model θt i and sample ξt i in round t

B. Some important theoretical results

Before obtaining the formal indistinguishability result, we first explore the difference between the average of the remaining n 1 models in Algorithm 2 when client-n revokes data consent and the average model retrained by n 1 clients in Algorithm 1, as shown in Lemma B.1. We then measure the weighted gradient residual approximation in Lemma B.2. Lemma B.1. Let

D1 = 1 24η2L2ρ2 1nt1, D2 = 1 36η2L2ρ2 2(n 1)t1.

If 0 < η < min{ q

1 12L2t2 1 , q

1 24L2ρ2 1nt1 , q

1 36L2ρ2 2(n 1)t1 }, it holds that

i=1 θt1 i 1 n 1

v u u u u u u u u t

(24η4L2σ2 1nt4 1 + 144η5L3σ2 1t4 1) ( (6+4ρ2 1)ρ2 1 D1 + ρ2 2 D2 + 8(1+ρ2 1)ρ2 1 n D1 ) + 8η2(5 + 2ρ2 1)σ2 1t3 1 + 48η2(1 + ρ2 1)σ2 2t3 1+

144η4L2σ2 2t4 1 ( (6+4ρ2 1)ρ2 1n D1 + 3ρ2 2(n 1)

D2 + 16(1+ρ2 1)ρ2 1 D1 ) + ( 2304η3L2ρ2 1(1+ρ2 1)t3 1 D1 + 576η3L2ρ2 1(3+2ρ2 1)nt3 1 D1 +

192η(1+ρ2 1)t2 1 n ) (Ef(θ0) Ef( 1

i=1 θt1 i )) + 288η3L2ρ2 2(n 1)t3 1 D2 (Ef(θ0) Ef( 1 n 1

i=1 θt1 i ))

Lemma B.2. Let

D1 = 1 24η2L2ρ2 1nt1, D2 = 1 36η2L2ρ2 2(n 1)t1

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

D3 = ρ2 1 + ρ2 2 + 1 (n 1)2 .

If 0 < η < min{ q

1 12L2t2 1 , q

1 24L2ρ2 1nt1 , q

1 36L2ρ2 2(n 1)t1 }, the norm of the weighted gradient residual approximation (with

early stopping, discussed in Section 4.3) holds that

t=0 pt iδt i

(Early stopping: 1 n 1

t=0 p t i δt i )

v u u u u t

6 η2n + 3η3L + 12η4L2ρ2 1n2t1 D1 + 72η5L3ρ2 1nt1 D1 D3σ2 1t2 1 + 18η2n + 432η4L2ρ2 1n2t1 D2 D3σ2 2t2 1+ 36ηnt1 + 864η3L2ρ2 1n2t2 1 D1 D3(Ef(θ0) Ef( 1

i=1 θt1 i ))

C. Proof of Lemma B.1

Based on Equation (5), (9), (4) and (7), we can directly obtain Equation (12)-(15):

i=1 θt1 i = θ0 i η n 1

j=1 Fj( θt j, ξt j) (12)

θt1 i =θ0 i η n 1

j=1 W t ij Fj(θt j, ξt j)

j=1 W t nj Fj(θt j, ξt j)

i=1 pt iδt i

=θ0 i η n 1

j=1 Fj(θt j, ξt j) + η n 1

j=1 W t nj Fj(θt j, ξt j)

i=1 pt iδt i (13)

i=1 θt1 i =θ0 i η n 1

j=1 Fj(θt j, ξt j) + η n 1

j=1 W t nj Fj(θt j, ξt j) (14)

i=1 δt i = η n 1

j=1 W t ij Fj(θt j, ξt j) η n 1

j=1 W t ij Fj(θt j, ξt j) + η n 1

j=1 W t nj Fj(θt j, ξt j)

= η n 1 n X

j=1 W t nj Fj(θt j, ξt j) Fn(θt n, ξt n) (15)

Substitute Equation (15) into Equation (13) to get

θt1 i =θ0 i η n 1

j=1 Fj(θt j, ξt j) + η n 1

Fn(θt n, ξt n) +

i=1 (1 pt i)δt i (16)

Consider the gap

i=1 θt1 i 1 n 1

j=1 Fj(θt j, ξt j) + η n 1

j=1 Fj( θt j, ξt j) + η n 1

j=1 W t nj Fj(θt j, ξt j)

Fj( θt j, ξt j) Fj(θt j, ξt j) + η n 1

j=1 W t nj Fj(θt j, ξt j) Fn(θt n, ξt n) (17)

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

Then we have

i=1 θt1 i 1 n 1

i=1 θt1 i 2

j=1 E Fj( θt j, ξt j) fj( θt j) + fj( θt j) fj( 1 n 1

j=1 θt j) + fj( 1 n 1

j=1 θt j) + fj( 1 n 1

j=1 θt j) fj( 1

j=1 θt j) + fj( 1

j=1 θt j) fj(θt j) + fj(θt j) Fj(θt j, ξt j) 2

+ 2η2t1 (n 1)2

j=1 W t nj Fj(θt j, ξt j) 1

j=1 Fj(θt j, ξt j) + 1

j=1 Fj(θt j, ξt j) Fn(θt n, ξt n) 2

=24η2t2 1σ2 1 + 12η2L2t1

j=1 E θt j 1 n 1

j=1 θt j 2 +12η2L2t1

j=1 E 1 n 1

j=1 θt j 1 n 1

j=1 θt j 2 +

t=0 E θt n 1

i=1 θt i 2 +12η2L2t1

i=1 E θt i 1

i=1 θt i 2 +

4η2t1 (n 1)2

t=0 E (e T n W t 1T n n )Gt (n) 2 + 4η2t1

t=0 E (1T n n e T n)Gt (n) 2

24η2t2 1σ2 1 + 12η2L2t1

i=1 E θt i 1 n 1

i=1 θt i 2 +12η2L2t1

i=1 E 1 n 1

i=1 θt i 1 n 1

i=1 θt i 2 +

i=1 E θt i 1

i=1 θt i 2 +4η2(1 + ρ2 1)t1 (n 1)2

t=0 E Gt (n) Ht (n) + Ht (n) 2 (18)

Bound E Gt (n) Ht (n) + Ht (n) 2:

E Gt (n) Ht (n) + Ht (n) 2

i=1 E Fi(θt i, ξt i) fi(θt i) 2 +

i=1 E fi(θt i) fi( 1

i=1 θt i) + fi( 1

i=1 θt i) f( 1

i=1 θt i) + f( 1

i=1 θt i) 2

2nσ2 1 + 6L2 n X

i=1 E θt i 1

i=1 θt i 2 +6nσ2 2 + 6n E f( 1

i=1 θt i) 2 (19)

Bound E θt i 1 n 1 Pn 1 i=1 θt i 2:

E θt i 1 n 1

j=1 Fj( θl j, ξl j)

j=1 W l ij Fj( θl j, ξl j) 2

l=0 ( 1T n 1 n 1 e T i W l)( Gl (n 1) Hl (n 1) + Hl (n 1)) 2

l=0 E ( 1T n 1 n 1 e T i W l)( Gl (n 1) Hl (n 1)) 2 +

l=0 E ( 1T n 1 n 1 e T i W l) Hl (n 1) 2 +

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

l =l E ( 1T n 1 n 1 e T i W l) Hl (n 1), ( 1T n 1 n 1 e T i W l ) Hl

2η2ρ2 2σ2 1(n 1)t + 2η2 ρ2 2

l=0 E Hl (n 1) 2 +

l =l E ( 1T n 1 n 1 e T i W l) Hl (n 1), ( 1T n 1 n 1 e T i W l ) Hl

l=0 E Hl (n 1) 2

i=1 E fi( θl i) fi( 1 n 1

i=1 θl i) + fi( 1 n 1

i=1 θl i) f( 1 n 1

i=1 θl i) + f( 1 n 1

i=1 θl i) 2

i=1 E θl i 1 n 1

i=1 θl i 2 +3σ2 2(n 1)t + 3(n 1)

l=0 E f( 1 n 1

i=1 θl i) 2 (21)

l =l E ( 1T n 1 n 1 e T i W l) Hl (n 1), ( 1T n 1 n 1 e T i W l ) Hl

l =l E 1T n 1 n 1 e T i W l Hl (n 1) 1T n 1 n 1 e T i W l Hl

l =l E( Hl (n 1) 2 + Hl

l =l E Hl (n 1) 2 (22)

Substitute Equation (21) and (22) into Equation (20) to get

E θt i 1 n 1

2η2ρ2 2σ2 1(n 1)t + 12η2ρ2 2σ2 2(n 1)t+

i=1 E θl i 1 n 1

i=1 θl i 2 +12η2ρ2 2(n 1)

l=0 E f( 1 n 1

i=1 θl i) 2 (23)

Summing over t = 0 to t1 1 and i = 1 to n 1 yields:

i=1 E θt i 1 n 1

2η2ρ2 2σ2 1(n 1)2t2 1 + 12η2ρ2 2σ2 2(n 1)2t2 1+

12η2L2ρ2 2(n 1)t1

i=1 E θt i 1 n 1

i=1 θt i 2 +12η2ρ2 2(n 1)2t1

t=0 E f( 1 n 1

i=1 θt i) 2 (24)

Therefore, it holds that

(1 12η2L2ρ2 2(n 1)t1)

i=1 E θt i 1 n 1

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

2η2ρ2 2σ2 1(n 1)2t2 1 + 12η2ρ2 2σ2 2(n 1)2t2 1 + 12η2ρ2 2(n 1)2t1

t=0 E f( 1 n 1

i=1 θt i) 2 (25)

Bound E θt i 1

i=1 θt i 2:

j=1 Fj(θl j, ξl j)

j=1 W l ij Fj(θl j, ξl j) 2

l=0 (1T n n e T i W l)(Gl (n) Hl (n) + Hl (n)) 2

l=0 E (1T n n e T i W l)(Gl (n) Hl (n)) 2 +2η2 t 1 X

l=0 E (1T n n e T i W l)Hl (n) 2 +

l =l E (1T n n e T i W l)Hl (n), (1T n n e T i W l )Hl

2η2ρ2 1σ2 1nt + 4η2ρ2 1

l=0 E Hl (n) 2 (26)

l=0 E Hl (n) 2

l=0 E fi(θl i) fi( 1

i=1 θl i) + fi( 1

i=1 θl i) f( 1

i=1 θl i) + f( 1

i=1 θl i) 2

12η2ρ2 1L2 t 1 X

i=1 E θl i 1

i=1 θl i 2 +12η2ρ2 1σ2 2nt + 12η2ρ2 1n

i=1 θl i) 2 (27)

Substitute Equation (27) into Equation (26) to get

2η2ρ2 1σ2 1nt + 12η2ρ2 1σ2 2nt + 12η2ρ2 1L2 t 1 X

i=1 E θl i 1

i=1 θl i 2 +12η2ρ2 1n

i=1 θl i) 2 (28)

Summing over t = 0 to t1 1 and i = 1 to n yields:

i=1 E θt i 1

2η2ρ2 1σ2 1n2t2 1 + 12η2ρ2 1σ2 2n2t2 1 + 12η2ρ2 1L2nt1

i=1 E θt i 1

i=1 θt i 2 +12η2ρ2 1n2t1

i=1 θt i) 2

Therefore, it holds that

(1 12η2L2ρ2 1nt1)

i=1 E θt i 1

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

2η2ρ2 1σ2 1n2t2 1 + 12η2ρ2 1σ2 2n2t2 1 + 12η2ρ2 1n2t1

i=1 θt i) 2 (30)

According to Equation (18), we have

t=0 E 1 n 1

i=1 θt i 1 n 1

i=1 θt i 2 +E 1 n 1

i=1 θt1 i 1 n 1

i=1 θt1 i 2

24η2t3 1σ2 1 + 12η2L2t2 1 n 1

i=1 E θt i 1 n 1

i=1 θt i 2 +12η2L2t2 1

t=0 E 1 n 1

i=1 θt i 1 n 1

i=1 θt i 2 +

12η2L2t2 1 n 1

i=1 E θt i 1

i=1 θt i 2 +4η2(1 + ρ2 1)t2 1 (n 1)2

t=0 E Gt (n) Ht (n) + Ht (n) 2 (31)

1 12η2L2t2 1 0 η

i=1 θt1 i 1 n 1

i=1 θt1 i 2

24η2t3 1σ2 1 + 12η2L2t2 1 n 1

i=1 E θt i 1 n 1

i=1 θt i 2 +12η2L2t2 1 n 1

i=1 E θt i 1

i=1 θt i 2 +

4η2(1 + ρ2 1)t2 1 (n 1)2

t=0 E Gt (n) Ht (n) + Ht (n) 2 (32)

Then we need to explore the relationship between θt i 1

i=1 θt i 2 and f( 1

i=1 θt i) 2:

θt i = θ0 i η

j=1 W l ij Fj(θl j, ξl j)

i=1 θt i = θ0 i η

j=1 Fj(θl j, ξl j)

i=1 θt 1 i = θ0 i η

j=1 Fj(θl j, ξl j)

Based on Assumption 4.5, we can derive the following:

i=1 θt i) Ef( 1

i=1 θt 1 i )

i=1 Fi(θt 1 i , ξt 1 i ), f( 1

i=1 θt 1 i )

i=1 Fi(θt 1 i , ξt 1 i ) 2

i=1 fi(θt 1 i ), f( 1

i=1 θt 1 i )

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

i=1 fi(θt 1 i ) 2 +E f( 1

i=1 θt 1 i ) 2) + η

i=1 fi(θt 1 i ) f( 1

i=1 θt 1 i ) 2

i=1 ( Fi(θt 1 i , ξt 1 i ) fi(θt 1 i ) + fi(θt 1 i )) 2

η2Lσ2 1 2n + η2L

i=1 fi(θt 1 i ) 2

Therefore, it holds that

i=1 θt i) Ef( 1

i=1 θt 1 i )

η2Lσ2 1 2n + ηL2

i=1 E θt 1 i 1

i=1 θt 1 i 2 η(1 ηL)

i=1 fi(θt 1 i ) 2 η

i=1 θt 1 i ) 2 (33)

Summing over t = 1 to t1 yields:

i=1 θt1 i ) Ef( 1

η2Lσ2 1t1 2n + ηL2

i=1 E θt i 1

i=1 θt i 2 η(1 ηL)

i=1 fi(θt i) 2 η

i=1 θt i) 2

Then we have

i=1 θt i) 2 +η(1 ηL)

i=1 fi(θt i) 2

i=1 θ0 i ) Ef( 1

i=1 θt1 i ) + ηL2

i=1 E θt i 1

i=1 θt i 2 +η2Lσ2 1t1 2n (35)

If the above condition about step size η is satisfied, that is, η q

1 12L2t2 1 , then it holds that

i=1 θt i) 2 η

i=1 θ0 i ) Ef( 1

i=1 θt1 i ) + L2

i=1 E θt i 1

i=1 θt i 2 +ηLσ2 1t1 n (36)

Then we can further derive the Equation (30) as follows:

(1 12η2L2ρ2 1nt1)

i=1 E θt i 1

2η2ρ2 1(σ2 1 + 6σ2 2)n2t2 1 + 12η2ρ2 1n2t1 η

i=1 θ0 i ) Ef( 1

i=1 θt1 i ) + L2

i=1 E θt i 1

i=1 θt i 2 +ηLσ2 1t1 n

Therefore, it holds that

(1 24η2L2ρ2 1nt1)

i=1 E θt i 1

2η2ρ2 1σ2 1n2t2 1 + 12η2ρ2 1σ2 2n2t2 1 + 12η3Lρ2 1σ2 1nt2 1 + 24ηρ2 1n2t1 Ef( 1

i=1 θ0 i ) Ef( 1

i=1 θt1 i ) (38)

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

Next, we need to explore the relationship between θt i 1 n 1

i=1 θt i 2 and f( 1 n 1

i=1 θt i) 2:

θt i = θ0 i η

j=1 W l ij Fj( θl j, ξl j)

i=1 θt i = θ0 i η n 1

j=1 Fj( θl j, ξl j)

i=1 θt 1 i = θ0 i η n 1

j=1 Fj( θl j, ξl j)

Based on Assumption 4.5, we can derive the following:

i=1 θt i) Ef( 1 n 1

i=1 θt 1 i )

i=1 Fi( θt 1 i , ξt 1 i ), f( 1 n 1

i=1 θt 1 i )

i=1 Fi( θt 1 i , ξt 1 i ) 2

B3 =E η n 1

i=1 fi( θt 1 i ), f( 1 n 1

i=1 θt 1 i )

i=1 fi( θt 1 i ) 2 η

2E f( 1 n 1

i=1 θt 1 i ) 2

i=1 fi( θt 1 i ) f( 1 n 1

i=1 θt 1 i ) 2

i=1 fi( θt 1 i ) 2 η

2E f( 1 n 1

i=1 θt 1 i ) 2

fi( θt 1 i ) fi( 1 n 1

i=1 θt 1 i ) + fi( 1 n 1

i=1 θt 1 i ) f( 1 n 1

i=1 θt 1 i ) 2

i=1 fi( θt 1 i ) 2 η

2E f( 1 n 1

i=1 θt 1 i ) 2 +ησ2 2 + ηL2

i=1 E θt 1 i 1 n 1

i=1 θt 1 i 2

i=1 ( Fi( θt 1 i , ξt 1 i ) fi( θt 1 i ) + fi( θt 1 i )) 2

η2Lσ2 1 2(n 1) + η2L

i=1 fi( θt 1 i ) 2

Therefore, it holds that

i=1 θt i) Ef( 1 n 1

i=1 θt 1 i )

i=1 fi( θt 1 i ) 2 η

2E f( 1 n 1

i=1 θt 1 i ) 2 + η2Lσ2 1 2(n 1) + ησ2 2

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

i=1 E θt 1 i 1 n 1

i=1 θt 1 i 2 (39)

Summing over t = 1 to t1 yields:

i=1 θt1 i ) Ef( 1 n 1

t=0 E 1 n 1

i=1 fi( θt i) 2 η

t=0 E f( 1 n 1

i=1 θt i) 2 + η2Lσ2 1t1 2(n 1) + ησ2 2t1

i=1 E θt i 1 n 1

i=1 θt i 2 (40)

Then we have

t=0 E f( 1 n 1

i=1 θt i) 2 η(1 ηL)

t=0 E 1 n 1

i=1 fi( θt i) 2

i=1 θ0 i ) Ef( 1 n 1

i=1 θt1 i ) + η2Lσ2 1t1 2(n 1) + ησ2 2t1 + ηL2

i=1 E θt i 1 n 1

i=1 θt i 2 (41)

Similarly, if the above condition about step size η is satisfied, that is, η q

1 12L2t2 1 , then it holds that

t=0 E f( 1 n 1

i=1 θt i) 2

η Ef( 1 n 1

i=1 θ0 i ) Ef( 1 n 1

i=1 θt1 i ) + ηLσ2 1t1 n 1 + 2σ2 2t1 + 2L2

i=1 E θt i 1 n 1

i=1 θt i 2 (42)

Then we can further derive the Equation (25) as follows:

(1 12η2L2ρ2 2(n 1)t1)

i=1 E θt i 1 n 1

2η2ρ2 2σ2 1(n 1)2t2 1 + 12η2ρ2 2σ2 2(n 1)2t2 1 + 12η2ρ2 2(n 1)2t1 2

η Ef( 1 n 1

i=1 θ0 i ) Ef( 1 n 1

i=1 θt1 i )

+ ηLσ2 1t1 n 1 + 2σ2 2t1 + 2L2

i=1 E θt i 1 n 1

i=1 θt i 2 (43)

Therefore, it holds that

(1 36η2L2ρ2 2(n 1)t1)

i=1 E θt i 1 n 1

2η2ρ2 2σ2 1(n 1)2t2 1 + 12η2ρ2 2σ2 2(n 1)2t2 1 + 12η3Lρ2 2σ2 1(n 1)t2 1 + 24η2ρ2 2σ2 2(n 1)2t2 1+

24ηρ2 2(n 1)2t1 Ef( 1 n 1

i=1 θ0 i ) Ef( 1 n 1

i=1 θt1 i ) (44)

Then based on Equation (19), we can further derive the following:

4η2(1 + ρ2 1)t2 1 (n 1)2

t=0 E Gt (n) Ht (n) + Ht (n) 2

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

i=1 E Fi(θt i, ξt i) fi(θt i) 2 +

i=1 E fi(θt i) fi( 1

i=1 θt i) + fi( 1

i=1 θt i) f( 1

i=1 θt i) + f( 1

i=1 θt i) 2

8η2(1 + ρ2 1)σ2 1nt3 1 (n 1)2 + 24η2(1 + ρ2 1)σ2 2nt3 1 (n 1)2 + 24η2L2(1 + ρ2 1)t2 1 (n 1)2

i=1 E θt i 1

+ 24η2(1 + ρ2 1)nt2 1 (n 1)2

i=1 θt i) 2 (45)

Due to the fact

n (n 1)2 2 n 1 4

we can further derive the Equation (32) as

i=1 θt1 i 1 n 1

i=1 θt1 i 2

8η2(5 + 2ρ2 1)σ2 1t3 1 + 48η2(1 + ρ2 1)σ2 2t3 1 + 12η2L2t2 1 n 1

i=1 E θt i 1 n 1

i=1 θt i 2 +

24η2L2(3 + 2ρ2 1)t2 1 n

i=1 E θt i 1

i=1 θt i 2 +24η2(1 + ρ2 1)nt2 1 (n 1)2

i=1 θt i) 2 (46)

According to Equation (38) and (47), let

D1 = 1 24η2L2ρ2 1nt1 > 0 η <

1 24L2ρ2 1nt1

D2 = 1 36η2L2ρ2 2(n 1)t1 > 0 η <

1 36L2ρ2 2(n 1)t1

i=1 E θt i 1

2η2ρ2 1σ2 1nt2 1 D1 + 12η2ρ2 1σ2 2nt2 1 D1 + 12η3Lρ2 1σ2 1t2 1 D1 + 24ηρ2 1nt1 D1

i=1 θ0 i ) Ef( 1

i=1 θt1 i ) (47)

i=1 E θt i 1 n 1

2η2ρ2 2σ2 1(n 1)t2 1 D2 + 12η2ρ2 2σ2 2(n 1)t2 1 D2 + 12η3Lρ2 2σ2 1t2 1 D2 + 24η2ρ2 2σ2 2(n 1)t2 1 D2 +

24ηρ2 2(n 1)t1

i=1 θ0 i ) Ef( 1 n 1

i=1 θt1 i ) (48)

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

Substitute Equation (36), (47) and (48) into Equation (46) to get

i=1 θt1 i 1 n 1

i=1 θt1 i 2

(24η4L2σ2 1nt4 1 + 144η5L3σ2 1t4 1) ((6 + 4ρ2 1)ρ2 1 D1 + ρ2 2 D2 + 8(1 + ρ2 1)ρ2 1 n D1 ) + 8η2(5 + 2ρ2 1)σ2 1t3 1 + 48η2(1 + ρ2 1)σ2 2t3 1+

144η4L2σ2 2t4 1 ((6 + 4ρ2 1)ρ2 1n D1 + 3ρ2 2(n 1)

D2 + 16(1 + ρ2 1)ρ2 1 D1 )+

(2304η3L2ρ2 1(1 + ρ2 1)t3 1 D1 + 576η3L2ρ2 1(3 + 2ρ2 1)nt3 1 D1 + 192η(1 + ρ2 1)t2 1 n ) (Ef(θ0) Ef( 1

i=1 θt1 i ))+

288η3L2ρ2 2(n 1)t3 1 D2 (Ef(θ0) Ef( 1 n 1

i=1 θt1 i )) (49)

where 0 < η < min{ q

1 12L2t2 1 , q

1 24L2ρ2 1nt1 , q

1 36L2ρ2 2(n 1)t1 }. Then, using Jensen s inequality, we can directly obtain

D. Proof of Lemma B.2

t=0 pt iδt i 2

t=0 pt i 2 E δt i 2

t=0 E δt i 2

j=1 W t ij Fj(θt j, ξt j) 1 n 1

j=1 Fj(θt j, ξt j) + 1

j=1 Fj(θt j, ξt j)

j=1 W t ij Fj(θt j, ξt j)

n Fj(θt j, ξt j) Fn(θt n, ξt n)) 2

t=0 (E ( e T i W t 1T n 1 n 1)Gt (n 1) 2 +E (e T i W t 1T n n )Gt (n) 2 + 1 (n 1)2 E (1T n n e T n)Gt (n) 2)

3η2(ρ2 1 + ρ2 2 + 1 (n 1)2 )t1

t=0 E Gt (n) Ht (n) + Ht (n) 2 (50)

Substitute Equation (19) and (47) into Equation (50) to get

t=0 pt iδt i 2

6 η2n + 3η3L + 12η4L2ρ2 1n2t1 D1 + 72η5L3ρ2 1nt1 D1

D3σ2 1t2 1 + 18η2n + 432η4L2ρ2 1n2t1 D2

D3σ2 2t2 1+

36ηnt1 + 864η3L2ρ2 1n2t2 1 D1

D3(Ef(θ0) Ef( 1

i=1 θt1 i )) (51)

where D3 = ρ2 1 + ρ2 2 + 1 (n 1)2 and step size 0 < η < min{ q

1 12L2t2 1 , q

1 24L2ρ2 1nt1 , q

1 36L2ρ2 2(n 1)t1 }. Then, using Jensen s

inequality, we can directly obtain Lemma B.2 without early stopping.

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

For an early stopping setup, it holds that

t=0 p t i δt i 2

t=0 t1,i p t i 2 E δt i 2

t=0 E δt i 2 (52)

This result shows the same bound with Equation (50). As a result, Lemma B.2 still holds for the early stopping setup.

E. Proof of Theorem 4.9

Subtracting Equation (12) from Equation (13) yields

θt1 i 1 n 1

i=1 θt1 i = 1 n 1

i=1 θt1 i 1 n 1

i=1 θt1 i 1 n 1

t=0 pt iδt i (53)

If 0 < η < min{ q

1 12L2t2 1 , q

1 24L2ρ2 1nt1 , q

1 36L2ρ2 2(n 1)t1 }, it holds that

θt1 i 1 n 1

i=1 θt1 i 1 n 1

i=1 θt1 i + 1 n 1

t=0 pt iδt i

v u u u u u u u u t

(24η4L2σ2 1nt4 1 + 144η5L3σ2 1t4 1) ( (6+4ρ2 1)ρ2 1 D1 + ρ2 2 D2 + 8(1+ρ2 1)ρ2 1 n D1 ) + 8η2(5 + 2ρ2 1)σ2 1t3 1 + 48η2(1 + ρ2 1)σ2 2t3 1+

144η4L2σ2 2t4 1 ( (6+4ρ2 1)ρ2 1n D1 + 3ρ2 2(n 1)

D2 + 16(1+ρ2 1)ρ2 1 D1 ) + ( 2304η3L2ρ2 1(1+ρ2 1)t3 1 D1 + 576η3L2ρ2 1(3+2ρ2 1)nt3 1 D1 +

192η(1+ρ2 1)t2 1 n ) (Ef(θ0) Ef( 1

i=1 θt1 i )) + 288η3L2ρ2 2(n 1)t3 1 D2 (Ef(θ0) Ef( 1 n 1

i=1 θt1 i ))

v u u u u t

6 η2n + 3η3L + 12η4L2ρ2 1n2t1 D1 + 72η5L3ρ2 1nt1 D1 D3σ2 1t2 1 + 18η2n + 432η4L2ρ2 1n2t1 D2 D3σ2 2t2 1+ 36ηnt1 + 864η3L2ρ2 1n2t2 1 D1 D3(Ef(θ0) Ef( 1

i=1 θt1 i ))

which completes the proof.

F. Proof of Corollary 4.10

Corollary 4.10 guarantees the (ϵ, β)-Indistinguishability of Algorithm 2, since it is essentially based on the Gaussian mechanism.

From a global perspective, the average model 1 n 1

i=1 θu i is obtained by our proposed decentralized unlearning algorithm

(Algorithm 2) and 1 n 1

i=1 ˇθt1 i is produced by the retraining algorithm (Algorithm 1).

According to the generation rule of ˇθt1 i and θu i : (ˇθt1 i = θt1 i + zi, where zi N(0, (n 1)σ2Id)

θu i = θt1 i + zi, where zi N(0, (n 1)σ2Id)

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

We can derive that the average models 1 n 1

i=1 ˇθt1 i and 1 n 1

i=1 θu i satisfy:

i=1 ˇθt1 i = 1 n 1

i=1 θt1 i + z

i=1 θu i = 1 n 1

where z N(0, σ2Id).

Therefore, the average models 1 n 1

i=1 ˇθt1 i and 1 n 1

i=1 θu i follow the distributions 1 n 1

i=1 ˇθt1 i N( 1 n 1

i=1 θt1 i , σ2Id)

i=1 θu i N( 1 n 1

i=1 θt1 i , σ2Id).

What s more, according to Theorem 4.9, we can denote d1 as the upper bound discussed in Theorem 4.9, which satisfies

v u u u u u u u u t

(24η4L2σ2 1nt4 1 + 144η5L3σ2 1t4 1) ( (6+4ρ2 1)ρ2 1 D1 + ρ2 2 D2 + 8(1+ρ2 1)ρ2 1 n D1 ) + 8η2(5 + 2ρ2 1)σ2 1t3 1 + 48η2(1 + ρ2 1)σ2 2t3 1+

144η4L2σ2 2t4 1 ( (6+4ρ2 1)ρ2 1n D1 + 3ρ2 2(n 1)

D2 + 16(1+ρ2 1)ρ2 1 D1 ) + ( 2304η3L2ρ2 1(1+ρ2 1)t3 1 D1 + 576η3L2ρ2 1(3+2ρ2 1)nt3 1 D1 +

192η(1+ρ2 1)t2 1 n ) (Ef(θ0) Ef( 1

i=1 θt1 i )) + 288η3L2ρ2 2(n 1)t3 1 D2 (Ef(θ0) Ef( 1 n 1

i=1 θt1 i ))

v u u u u t

6 η2n + 3η3L + 12η4L2ρ2 1n2t1 D1 + 72η5L3ρ2 1nt1 D1 D3σ2 1t2 1 + 18η2n + 432η4L2ρ2 1n2t1 D2 D3σ2 2t2 1+ 36ηnt1 + 864η3L2ρ2 1n2t2 1 D1 D3(Ef(θ0) Ef( 1

i=1 θt1 i ))

Based on Gaussian mechanism (Definition 4.3), the average models 1 n 1

i=1 ˇθt1 i and 1 n 1

i=1 θu i are (ϵ, β)-

Indistinguishability, and thus Algorithm 2 satisfies (ϵ, β)-machine unlearning with

log(1/β) + ϵ p

That completes the proof.

G. Proof of Theorem 4.11

Based on Equation (10), we have

ˆθt i = ˆθ0 i ˆη

j=1 W t1+l 1 ij Fj(ˆθl 1 j , ˆξl 1 j )

i=1 ˆθt i = 1 n 1

i=1 ˆθ0 i ˆη n 1

j=1 Fj(ˆθl 1 j , ˆξl 1 j )

i=1 ˆθt 1 i = 1 n 1

i=1 ˆθ0 i ˆη n 1

j=1 Fj(ˆθl 1 j , ˆξl 1 j )

Therefore, it holds that

ˆθt i 1 n 1

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

=(ˆθ0 i 1 n 1

i=1 ˆθ0 i ) ˆη

j=1 W t1+l 1 ij Fj(ˆθl 1 j , ˆξl 1 j ) 1 n 1

j=1 Fj(ˆθl 1 j , ˆξl 1 j )

=(θu i 1 n 1

i=1 θu i ) ˆη

l=1 ( e T i W t1+l 1 1T n 1 n 1) ˆGl 1 (n 1) (54)

i=1 ˆθt i 1 n 1

i=1 ˆθt 1 i = ˆη n 1

j=1 Fj(ˆθt 1 j , ˆξt 1 j ) (55)

According to Assumption 4.5, we have

i=1 ˆθt i) E f( 1 n 1

i=1 ˆθt 1 i )

i=1 fi(ˆθt 1 i ), f( 1 n 1

i=1 ˆθt 1 i ) + L

i=1 Fi(ˆθt 1 i , ˆξt 1 i ) 2

2E f( 1 n 1

i=1 ˆθt 1 i ) 2 ˆη

i=1 fi(ˆθt 1 i ) 2

2E f( 1 n 1

i=1 ˆθt 1 i ) 1 n 1

i=1 fi(ˆθt 1 i ) 2

i=1 Fi(ˆθt 1 i , ˆξt 1 i ) 1 n 1

i=1 fi(ˆθt 1 i ) + 1 n 1

i=1 fi(ˆθt 1 i ) 2

2E f( 1 n 1

i=1 ˆθt 1 i ) 2 ˆη

i=1 fi(ˆθt 1 i ) 2

i=1 ( fi( 1 n 1

i=1 ˆθt 1 i ) fi(ˆθt 1 i )) 2

i=1 ( Fi(ˆθt 1 i , ˆξt 1 i ) fi(ˆθt 1 i )) 2 + ˆη2L

i=1 fi(ˆθt 1 i ) 2

ˆη2Lσ2 1 2(n 1) ˆη

2E f( 1 n 1

i=1 ˆθt 1 i ) 2 ˆη(1 ˆηL)

i=1 fi(ˆθt 1 i ) 2

i=1 E ˆθt 1 i 1 n 1

i=1 ˆθt 1 i 2

Bound n 1 P

i=1 E ˆθt i 1 n 1

i=1 ˆθt i 2:

i=1 E ˆθt i 1 n 1

i=1 ˆθt i 2

i=1 E (θu i 1 n 1

i=1 θu i ) ˆη

l=1 ( e T i W t1+l 1 1T n 1 n 1) ˆGl 1 (n 1) 2

i=1 E θu i 1 n 1

i=1 θu i 2 +4ˆη2 n 1 X

l=1 ( e T i W t1+l 1 1T n 1 n 1)( ˆGl 1 (n 1) ˆHl 1 (n 1)) 2

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

+ 4ˆη2 n 1 X

l=1 ( e T i W t1+l 1 1T n 1 n 1) ˆHl 1 (n 1) 2

i=1 E θu i 1 n 1

i=1 θu i 2 +4ˆη2ρ2 2σ2 1(n 1)2t + 4ˆη2 n 1 X

l=1 E ( e T i W t1+l 1 1T n 1 n 1) ˆHl 1 (n 1) 2 +

l =l E ( e T i W t1+l 1 1T n 1 n 1) ˆHl 1 (n 1), ( e T i W t1+l 1 1T n 1 n 1) ˆHl 1 (n 1)

i=1 E θu i 1 n 1

i=1 θu i 2 +4ˆη2ρ2 2σ2 1(n 1)2t + 8ˆη2 n 1 X

l=1 ( e T i W t1+l 1 1T n 1 n 1) ˆHl 1 (n 1) 2

i=1 E θu i 1 n 1

i=1 θu i 2 +4ˆη2ρ2 2σ2 1(n 1)2t + 8ˆη2ρ2 2(n 1)

i=1 E fi(ˆθl 1 i ) 2

i=1 E fi(ˆθl 1 i ) fi( 1 n 1

i=1 ˆθl 1 i ) + fi( 1 n 1

i=1 ˆθl 1 i ) f( 1 n 1

i=1 ˆθl 1 i )+

i=1 ˆθl 1 i ) f( 1 n 1

i=1 ˆθl 1 i ) + f( 1 n 1

i=1 ˆθl 1 i ) 2

i=1 E fi(ˆθl 1 i ) fi( 1 n 1

i=1 ˆθl 1 i ) 2 +4

i=1 E fi( 1 n 1

i=1 ˆθl 1 i ) f( 1 n 1

i=1 ˆθl 1 i ) 2 +

i=1 E f( 1 n 1

i=1 ˆθl 1 i ) f( 1 n 1

i=1 ˆθl 1 i ) 2 +4

i=1 E f( 1 n 1

i=1 ˆθl 1 i ) 2

i=1 E fi(ˆθl 1 i ) fi( 1 n 1

i=1 ˆθl 1 i ) 2 +4

i=1 E fi( 1 n 1

i=1 ˆθl 1 i ) f( 1 n 1

i=1 ˆθl 1 i ) 2 +

i=1 E 1 n 1( fn( 1 n 1

i=1 ˆθl 1 i ) 1

i=1 fi( 1 n 1

i=1 ˆθl 1 i )) 2 +4

i=1 E f( 1 n 1

i=1 ˆθl 1 i ) 2

i=1 E ˆθl 1 i 1 n 1

i=1 ˆθl 1 i 2 +4(n 1)σ2 2 + 4σ2 2 n 1 + 4

i=1 E f( 1 n 1

i=1 ˆθl 1 i ) 2 (58)

Substitute Equation (58) into Equation (57) to get

i=1 E ˆθt i 1 n 1

i=1 ˆθt i 2

i=1 E θu i 1 n 1

i=1 θu i 2 +4ˆη2ρ2 2σ2 1(n 1)2t + 32ˆη2ρ2 2σ2 2(n 1)2t + 32ˆη2ρ2 2σ2 2t+

32ˆη2L2ρ2 2(n 1)

i=1 E ˆθl 1 i 1 n 1

i=1 ˆθl 1 i 2 +32ˆη2ρ2 2(n 1)2 t X

l=1 E f( 1 n 1

i=1 ˆθl 1 i ) 2 (59)

Therefore, it holds that

i=1 E ˆθt i 1 n 1

i=1 ˆθt i 2

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

i=1 E θu i 1 n 1

i=1 θu i 2 +4ˆη2ρ2 2σ2 1(n 1)2T 2 + 32ˆη2ρ2 2σ2 2(n 1)2T 2 + 32ˆη2ρ2 2σ2 2T 2+

32ˆη2L2ρ2 2(n 1)T

i=1 E ˆθt i 1 n 1

i=1 ˆθt i 2 +32ˆη2ρ2 2(n 1)2T

t=0 E f( 1 n 1

i=1 ˆθt i) 2 (60)

i=1 E ˆθt i 1 n 1

i=1 ˆθt i 2

i=1 E θu i 1 n 1

i=1 θu i 2 +4ˆη2ρ2 2σ2 1(n 1)2T 2 + 32ˆη2ρ2 2σ2 2(n 1)2T 2 + 32ˆη2ρ2 2σ2 2T 2+

32ˆη2L2ρ2 2(n 1)T

i=1 E ˆθt i 1 n 1

i=1 ˆθt i 2 +32ˆη2ρ2 2(n 1)2T

t=0 E f( 1 n 1

i=1 ˆθt i) 2 (61)

If it satisfies ˆη < q

1 32L2ρ2 2(n 1)T , the following holds:

(1 32ˆη2L2ρ2 2(n 1)T)

i=1 E ˆθt i 1 n 1

i=1 ˆθt i 2

i=1 E θu i 1 n 1

i=1 θu i 2 +4ˆη2ρ2 2σ2 1(n 1)2T 2 + 32ˆη2ρ2 2σ2 2(n 1)2T 2 + 32ˆη2ρ2 2σ2 2T 2+

32ˆη2ρ2 2(n 1)2T

t=0 E f( 1 n 1

i=1 ˆθt i) 2 (62)

Summing from t = 1 to t = T for Equation (56) and substituting Equation (62) into it yields

i=1 ˆθT i ) E f( 1 n 1

i=1 ˆθ0 i )

16ˆη3L2ρ2 2(n 1)T 1 32ˆη2L2ρ2 2(n 1)T ˆη

t=0 E f( 1 n 1

i=1 ˆθt i) 2 ˆη(1 ˆηL)

t=0 E 1 n 1

i=1 fi(ˆθt 1 i ) 2 +

1 2(n 1) 3ˆηL2T 1 32ˆη2L2ρ2 2(n 1)T

i=1 E θu i 1 n 1

i=1 θu i 2 + 2ˆη3L2ρ2 2σ2 1(n 1)T 2

1 32ˆη2L2ρ2 2(n 1)T +

16ˆη3L2ρ2 2σ2 2(n 1)T 2

1 32ˆη2L2ρ2 2(n 1)T + 16ˆη3L2ρ2 2σ2 2T 2

(n 1)(1 32ˆη2L2ρ2 2(n 1)T) + ˆη2Lσ2 1T 2(n 1) (63)

where ˆη < q

1 32L2ρ2 2(n 1)T .

2 16ˆη2L2ρ2 2(n 1)T 1 32ˆη2L2ρ2 2(n 1)T , D5 = 1

D6 = 1 32ˆη2L2ρ2 2(n 1)T.

If the step size ˆη satisfies ˆη < q

1 32L2ρ2 2(n 1)T , we have the following convergence result of the subsequent T rounds of

t=0 E f( 1 n 1

i=1 ˆθt i) 2 +D5 1 T

t=0 E 1 n 1

i=1 f(ˆθt i) 2

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

i=1 E θu i 1 n 1

i=1 θu i 2 + ˆηLσ2 1 2(n 1) +

i=1 θu i ) f

ˆηT + 2ˆη2L2ρ2 2σ2 1(n 1)T D6 +

16ˆη2L2ρ2 2σ2 2(n 1)T D6 + 16ˆη2L2ρ2 2σ2 2T (n 1)D6

which completes the proof.

H. Proof of Corollary 4.12

We assume that D4 C, where C (0, 1

2). Then it holds

16ˆη2L2ρ2 2(n 1)T 1 32ˆη2L2ρ2 2(n 1)T = 1

2 ˆη2 1 2C 64(1 C)L2ρ2 2(n 1)T

and D6 = 1 32ˆη2L2ρ2 2(n 1)T 1 2(1 C)

If we set ˆη = n 1

T , the following should be satisfied

T 2 1 2C 64(1 C)L2ρ2 2(n 1)T T 64(1 C)L2ρ2 2(n 1)3

2T 0 T (n 1)L

As a result, it holds that

i=1 θu i ) f

i=1 θu i ) f

n 1 ˆηLσ2 1 2(n 1) = Lσ2 1 2T 2ˆη2L2ρ2 2σ2 1(n 1)T D6 = (1

2 D4) σ2 1 8 (1 2C)σ2 1 16 16ˆη2L2ρ2 2σ2 2(n 1)T D6 = (1

2 D4) σ2 2 (1 2C)σ2 2 2 16ˆη2L2ρ2 2σ2 2T (n 1)D6 = (1

2 D4) σ2 2 (n 1)2 (1 2C)σ2 2 2(n 1)2

Therefore, if the number of training round T satisfies T max{ 64(1 C)L2ρ2 2(n 1)3

1 2C , (n 1)L} with C (0, 1

2), and the step size ˆη further satisfies ˆη = n 1

T , then Corollary 4.12 holds.

I. Experimental Details

I.1. Datasets and Models

We train the Res Net-18 model (He et al., 2016) and the CNN model on the real-world datasets, including MNIST (Lecun et al., 1998), CIFAR-10 (Krizhevsky & Hinton, 2009), SVHN (Netzer et al., 2011) and Fashion-MNIST (Xiao et al., 2017).

MNIST is a handwritten digit dataset containing 60, 000 training images and 10, 000 test images of grayscale digits (0 9), each with a resolution of 28 28 pixels.

Fashion-MNIST consists of 60, 000 training images and 10,000 testing images, with each being a 28 28 pixel grayscale image representing various clothing items such as T-shirts, pants, dresses, and more.

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

CIFAR-10 includes 60, 000 colored images of 10 common object classes, with 50, 000 images for training and 10, 000 for testing, each at 32 32 pixels with three color channels (RGB).

SVHN consists of real-world house number images from street views, containing 73, 257 training images and 26, 032 test images, also at 32 32 pixels with RGB channels.

I.2. Baseline methods

To show the advantages of our proposed algorithm, we compare it with the following baseline methods:

Origin. The baseline is actually the D-PSGD algorithm (Lian et al., 2017), which does not involve any unlearning operations.

Retrain. This method retrains the decentralized models from scratch on the remaining n 1 clients after removing the target client-n. It can achieve exact unlearning but requires significant time and resources.

FATS-Unl (Tao et al., 2024). It saves the historical global models and the sets of clients that participated in past training rounds on the central server. When a client initiates an unlearning request, the algorithm retrieves the latest model from before the client s initial participation in training and then retrains.

Fed Recovery (Zhang et al., 2023). It relies on a central server to remove the weighted sum of the gradient residuals from the global model to eliminate the influence of a certain client, and adds specific Gaussian noise to make the unlearned model and the retrained model statistically indistinguishable.

HDUS (Ye et al., 2024). Each client s main model relies entirely on local and public datasets and remains unaffected by its neighbors. The collaboration among clients is solely reflected in the integration of distilled seed models from neighbors to make decisions. Therefore, the decision results of the client initiating the unlearning request can be removed by adjusting the integration process.

I.3. Metrics

In our experiments, we use multiple metrics to evaluate the performance of the proposed decentralized unlearning algorithm, including accuracy, unlearning time, communication overhead, and attack success rate.

Accuracy. To examine the statistical indistinguishability, we evaluate the overall accuracy of the unlearning models and the retrained models. To show the effectiveness of our PDUDT, we record the average accuracy on each class after performing a specified number of rounds.

Unlearning time. We evaluate the running time required for the proposed Algorithm 2 to perform the unlearning operations and compare it with other baseline algorithms. For FATS-Unl and Fed Recovery, this is measured on the server side, while for others, it is tracked on the client with the most neighbors.

Attack success rate. Membership Inference Attack (MIA) is employed to determine whether the data samples of a client slated for forgetting were part of the training process. A higher MIA success rate indicates that the global model still retains considerable information about this client s training data, signifying an inadequate unlearning effect. In contrast, a success rate of 50% equivalent to random guessing implies that the model no longer carries exploitable traces of the client s data, thereby demonstrating effective client removal. We perform the membership inference attack on the unlearned model to verify if the proposed unlearning algorithm successfully removed the targeted client s influence.

I.4. Additional experimental results

To show the scalability of PDUDT, we conducted experiments with 20 clients performing a natural language processing (NLP) task. Specifically, we evaluated PDUDT using the Yahoo! Answers dataset with the Bert-tiny model.

For the Yahoo! Answers dataset, Figure 3 demonstrates that our method is statistically indistinguishable from the Perturbed Retraining approach across a range of noise scales. Moreover, Figure 4 highlights the effectiveness of both PDUDT and

PDUDT: Provable Decentralized Unlearning under Dynamic Topologies

0.003 0.006 0.009 0.012 0.015 0.0

Yahoo! Answers

Perturbed Retraining PDUDT PDUDT (ES)

Figure 3. The accuracy of unlearned models using PDUDT, PDUDT (ES), and perturbed retrained models on Yahoo! Answers dataset.

0 1 2 3 4 5 6 7 8 9 Class

Yahoo! Answers

Retrain PDUDT PDUDT (ES)

Figure 4. The accuracy on each class using PDUDT, PDUDT (ES), and perturbed retrained models on Yahoo! Answers dataset.

Table 5. Comparison of the unlearning time and the attack precision of MIA across different unlearning methods on Yahoo! Answers dataset. - means no results or not applicable.

Method Unlearning time (s) Attack precision (%)

Origin - 69.2 0.2 Retrain 2852.2 50.3 0.8 FATS-Unl 2039.9 51.2 0.4 Fed Recovery 13.0 51.8 0.7 HDUS 27.5 51.5 0.3 PDUDT 11.8 50.7 0.7 PDUDT (ES) 10.2 51.1 0.5

its space-efficient variant, PDUDT (ES): While they maintain high performance on classes 0 8, their accuracy noticeably declines for Class 9. Finally, Table 5 summarizes the substantial time savings afforded by our unlearning operations and further confirms the unlearning effectiveness of PDUDT and PDUDT (ES) through comparable membership inference attack (MIA) precision relative to the Retrain method. The results confirm the superior performance of PDUDT, showing good scalability to larger networks and NLP task.