
Published as a conference paper at ICLR 2025

COLLABEDIT: TOWARDS NON-DESTRUCTIVE COLLABORATIVE KNOWLEDGE EDITING

Jiamu Zheng¹, Jinghuai Zhang³, Tianyu Du¹, Xuhong Zhang¹, Jianwei Yin¹, Tao Lin²

¹Zhejiang University ²Westlake University ³University of California, Los Angeles
zhengjaamie@gmail.com, jinghuai1998@g.ucla.edu, zjradty@zju.edu.cn, zhangxuhong@zju.edu.cn, zjuyjw@cs.zju.edu.cn, lintao@westlake.edu.cn

Collaborative learning of large language models (LLMs) has emerged as a new paradigm for utilizing private data from different parties while guaranteeing efficiency and privacy. Meanwhile, Knowledge Editing (KE) for LLMs has garnered increasing attention due to its ability to explicitly manipulate the behaviors of LLMs. However, the collaborative KE case, in which the knowledge edits of multiple parties are aggregated in a privacy-preserving and continual manner, remains unexamined. To this end, this manuscript presents the first investigation of collaborative KE. We start by carefully identifying its three unique challenges: knowledge overlap, knowledge conflict, and knowledge forgetting. We then propose a non-destructive collaborative KE framework, COLLABEDIT, which employs a novel model merging mechanism to mimic the global KE behavior while preventing severe performance drops. Extensive experiments on two canonical datasets demonstrate the superiority of COLLABEDIT over other, destructive baselines, and the results shed light on addressing the three collaborative KE challenges and on future applications. Our code is available at https://github.com/LINs-lab/CollabEdit.

1 INTRODUCTION

[Figure 1 plot: x-axis "Number of edits" (50, 100, 200, 500, 1000, 5000); y-axis "H-mean of (ES, PS, NS)"; curves: Global-edit, Task-Arithmetic, Simple-Average.]

Figure 1: Limits of existing KE methods under the collaborative KE scenarios on the Multi-CounterFact dataset (Meng et al., 2022).

Large Language Models (LLMs) (Achiam et al., 2023; Qiao et al., 2023) have recently emerged as a promising solution toward general artificial intelligence. However, deploying LLMs in practice usually requires customizing them with specific knowledge (Meng et al., 2022), for which re-training LLMs may be expensive and unacceptable (Jang et al., 2023). Accordingly, Knowledge Editing (KE) (Meng et al., 2022; Mitchell et al., 2022a; Tan et al., 2024; Zhang et al., 2023), which allows efficient modification of the knowledge stored in existing models, has been proposed as an alternative solution. To explicitly update LLMs with knowledge from multiple parties or organizations, each possessing a distinct and private dataset (Ye et al., 2024; Wu et al., 2022; McMahan et al., 2017), and to meet their individual demands, current KE methods (Meng et al., 2022; 2023) must first collect the edit requests from these parties, which raises privacy concerns: the edit requests themselves contain sensitive private information and thus cannot be shared. This motivates resorting to the cross-silo collaborative learning paradigm (Wu et al., 2023; Kairouz et al., 2021), which communicates only the locally updated models rather than uploading a list of risky edit requests; we call this setting collaborative KE for LLMs. However, existing KE methods are all designed for the single-party, single-model scenario (Meng et al., 2022; Mitchell et al., 2022a; Tan et al., 2024; Meng et al., 2023). Model merging (MM) techniques (Ortiz-Jimenez et al., 2023; Chronopoulou et al., 2023; Yadav et al., 2023) allow a straightforward extension of KE methods to the collaborative KE scenario. Therefore, as our

Work was done during Jiamu's visit to Westlake University. Corresponding author.


[Figure 2 diagram. (a) Global Editing: clients 1 through n send their edit-request lists (e.g., "Jack Ma founded Alibaba → Tesla", "AirPods was created by Apple → Google", "The president of US is Obama → Biden", "Messi plays soccer → tennis") to the server (global model), which produces the ideal updated global model weights. (b) Collaborative Editing: each client (sub-model) edits on its own list, and the server merges the sub-models with a model-fusion module; the non-destructive fusion (ours) yields weights nearly identical to the ideal ones, whereas the destructive fusion (baseline) does not.]

Figure 2: Comparison of global KE (GLOBAL-EDIT) and collaborative KE.

side contribution, we examine naive combinations of local KE and MM techniques and compare them with the optimal global KE method (GLOBAL-EDIT), finding that all these naive collaborative KE methods are destructive. Specifically, we conduct independent KE (i.e., MEMIT (Meng et al., 2023)) locally on each party and then use model merging techniques such as Simple-Average (Chronopoulou et al., 2023) and Task-Arithmetic (Ilharco et al., 2023) to merge the local models (LLMs) into a global model (LLM). We find that as the number of edits increases, the performance gap between the naive baselines and the optimal GLOBAL-EDIT widens (Figure 1).

Beyond the pitfalls of naive collaborative KE methods, in this manuscript we carefully examine the intervention issues among different parties and identify the knowledge overlap and knowledge conflict challenges. These challenges arise because the global server has no access to the edit requests of each party. In addition, each party's requirement for continual KE inherently results in interventions among different rounds of editing, which corresponds to the knowledge forgetting challenge.

To bridge the gap and provide a deeper understanding of collaborative KE, we first analyze the performance drop between naive collaborative KE methods and GLOBAL-EDIT from a theoretical perspective, based on which we design a novel framework, COLLABEDIT, that enables non-destructive collaborative KE. We further pioneer the exploration of the interventions associated with collaborative KE (namely, the three challenges we identified) and design tailored solutions to effectively address them. Our contributions can be summarized as follows:

• We are the first to propose the collaborative KE paradigm (including naive collaborative KE baselines, GLOBAL-EDIT, and our COLLABEDIT), in which we summarize the unique interventions associated with collaborative KE as three challenges of this novel paradigm.
• We identify the performance gap between naive collaborative KE methods and the upper-bound performance (i.e., GLOBAL-EDIT) through theoretical and empirical analysis.
• To the best of our knowledge, we propose the first non-destructive collaborative KE framework: it is versatile, allows non-destructive integration of existing KE methods, and provides insights into the solution of each challenge.
• Our empirical results demonstrate the effectiveness of our framework compared with baselines, as well as that of the novel solutions to the three challenges built on COLLABEDIT. Our discussions shed light on future research on collaborative KE.

2 RELATED WORK

Knowledge Editing (KE). KE (De Cao et al., 2021; Mitchell et al., 2022a) has received significant attention due to the increasing demand for efficiently updating LLMs. Hypernetwork knowledge editing and direct model editing are the two most representative types of KE methods. Given edit requests, hypernetwork knowledge editing (Mitchell et al., 2022a;b; Tan et al., 2024) leverages a trained hypernetwork to predict the model updates, while direct model editing (Meng et al., 2023; 2022; De Cao et al., 2021) treats an LLM as an associative memory and inserts new memories by solving an optimization problem. However, these KE methods are all designed for a single LLM, which limits their application to the more practical collaborative learning scenario. In this paper, we focus on the SoTA frameworks of the two KE method types above, namely MALMEN (Tan et al., 2024) and MEMIT (Meng et al., 2023), to explore the integration of KE and collaborative learning.


Collaborative learning and model merging. Collaborative learning (Kairouz et al., 2021; Wang et al., 2021; Fan et al., 2024; Mohtashami et al., 2023) allows multiple parties to jointly and continuously learn a machine learning model by sending their updates to a global server for aggregation. Alongside orthogonal techniques that address the data heterogeneity issue (Karimireddy et al., 2020; Li et al., 2019), model aggregation/merging (Li et al., 2023b; Wortsman et al., 2022; Ortiz-Jimenez et al., 2023; Yadav et al., 2023) has emerged as a promising research direction for collaborative learning, in which the global server directly merges model updates in weight space without disclosing the training data of each party. The most commonly used model merging techniques are Simple-Average (SA) (Chronopoulou et al., 2023; Wortsman et al., 2022) and Task-Arithmetic (TA) (Ortiz-Jimenez et al., 2023). Moreover, TIES-merging (Yadav et al., 2023) was recently proposed to further improve merging performance by resolving sign conflicts among different models. However, all existing model merging techniques achieve only destructive editing performance when used for collaborative KE, which inevitably results in knowledge loss during the merging process.

3 PRELIMINARIES OF COLLABORATIVE KNOWLEDGE EDITING

We first introduce the basics of KE in a single LLM. Then, we illustrate the naive approaches to conducting collaborative KE. Finally, we describe the inherent interventions within collaborative KE.

3.1 INTRODUCTION TO KNOWLEDGE EDITING IN A SINGLE LLM

LLMs can answer natural-language queries about facts based on the implicit knowledge encoded in their parameters. Following Meng et al. (2023), we define a fact $f$ as (subject $s$, relation $r$, object $o$), e.g., ($s$ = Danielle Darrieux, $r$ = spoke the language, $o$ = French). Given a sequence of facts $E = \{f_i \mid f_i = (s_i, r_i, o_i)\}$ to edit (denoted as edit requests), knowledge editing aims to maximize the likelihood that the updated LLM $M_\theta$ predicts the desired object $o_i$ for any factual prompt $x \oplus p(s_i, r_i)$, which consists of a prefix $x$ and a templated prompt $p(s_i, r_i)$:

$$\arg\min_{M_\theta} \frac{1}{|E|} \sum_{i=1}^{|E|} \mathbb{E}_x\Big[-\log \Pr_{M_\theta}\big[o_i \mid x \oplus p(s_i, r_i)\big]\Big]. \tag{1}$$

The state-of-the-art knowledge editing methods (Meng et al., 2022; 2023; Tan et al., 2024) found that modifying a small sequence of MLP layers on the critical path of an LLM is sufficient to edit its factual associations. In particular, the linear operation $W_l$ in an MLP layer can act as a key-value store for input keys $K_l$ and memory/knowledge values $M_l$, where the input keys are the intermediate feature vectors that the model computes for a set of edit requests. Knowledge editing modifies each MLP layer so that it associates $K_l$ with the desired $M_l$ by solving $W_l K_l \approx M_l$. For brevity, we describe knowledge editing for a specific layer and omit $l$ throughout the paper. Given a set of facts $E$ to edit (i.e., edit requests), we first obtain their input keys $K = [k_1, \dots, k_{|E|}]$ to layer $l$ via a single forward pass. We also obtain the desired memory values $M = [m_1, \dots, m_{|E|}]$ of layer $l$ that maximize $\Pr[o_i \mid x \oplus p(s_i, r_i)]$. The goal of editing layer $l$ can then be formulated as optimizing the update $\Delta$ such that the updated weight $W^+ = W + \Delta$ associates the input keys $K$ with the desired memory values $M$. Note that the MLP layer also contains previously stored memories of existing knowledge, which should be preserved during knowledge editing. Therefore, we also maintain the associations between the input keys of existing knowledge $K_{\text{init}}$ and their memory values $W K_{\text{init}}$. Following MEMIT (Meng et al., 2023), we derive the closed form of $\Delta$ for a specific layer $l$ as:

$$\Delta = R K^\top (C + K K^\top)^{-1}, \tag{2}$$

where $C = K_{\text{init}} K_{\text{init}}^\top$ is the covariance matrix of the input keys of existing knowledge, and $R = M - W K$ represents the residual error in the output space of layer $l$. See more details in Appendix A.
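To make Eq. (2) concrete, the following is a minimal, dependency-free Python sketch of the closed-form update for one toy layer. It is not the MEMIT implementation: the helper names (`memit_delta`, `inverse`, and so on), the tiny dimensions, and the choice $C = I$ as a stand-in for $K_{\text{init}} K_{\text{init}}^\top$ are illustrative assumptions, and a real implementation would use a tensor library.

```python
import math

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def matadd(a, b, alpha=1.0):
    """Elementwise a + alpha * b."""
    return [[x + alpha * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for tiny dense matrices)."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def frobenius(a):
    return math.sqrt(sum(x * x for row in a for x in row))

def memit_delta(W, K, M, C):
    """Eq. (2): Delta = R K^T (C + K K^T)^{-1} with residual R = M - W K.

    W: d_out x d_in weight; K: d_in x n keys (one column per edit request);
    M: d_out x n target values; C: d_in x d_in covariance of existing keys.
    """
    R = matadd(M, matmul(W, K), alpha=-1.0)
    KT = transpose(K)
    return matmul(matmul(R, KT), inverse(matadd(C, matmul(K, KT))))

# Toy layer with d_in = 3, d_out = 2 and two edit requests.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
M = [[2.0, 0.0], [0.0, 3.0]]
C = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]  # stand-in for K_init K_init^T

delta = memit_delta(W, K, M, C)
W_plus = matadd(W, delta)                      # W+ = W + Delta
res_before = frobenius(matadd(M, matmul(W, K), alpha=-1.0))
res_after = frobenius(matadd(M, matmul(W_plus, K), alpha=-1.0))
```

Because Eq. (2) solves $\Delta (C + K K^\top) = R K^\top$, the update moves $W K$ toward $M$ while the $C$ term discourages disturbing existing keys; in this toy run the residual norm after editing is smaller than before.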

3.2 DESTRUCTIVE MODEL MERGING ENCOUNTERS KNOWLEDGE EDITING

KE in practice involves editing the factual associations of an LLM, such as correcting hallucinations or updating outdated information. This process often requires handling simultaneous edit requests, where multiple parties or clients access and collaboratively contribute to the same LLM service. Although global KE (GLOBAL-EDIT), illustrated in Figure 2(a), represents the ideal editing case, it requires each client to share its edit requests directly with the server, which violates privacy constraints. Collaborative KE, illustrated in Figure 2(b), instead allows each client to edit its local model and relies on the server only to aggregate the edit updates using model merging techniques (Wortsman et al., 2022; Ortiz-Jimenez et al., 2023; Yadav et al., 2023).

However, existing KE algorithms are all designed for a single client and cannot be trivially generalized to the collaborative KE scenario. As evidenced in Figure 1, naively combining existing editing methods with model merging methods yields a dramatic performance drop compared to GLOBAL-EDIT (the upper bound), especially as the number of edits increases. Given the limits of destructive collaborative KE methods, we aim to develop a non-destructive collaborative KE method that achieves editing performance similar to GLOBAL-EDIT, even with a large number of edits.

3.3 INTERVENTIONS WITHIN COLLABORATIVE KNOWLEDGE EDITING

In addition to the performance drop, our proposed concept of collaborative KE also faces several key challenges arising from the unique characteristics of this scenario. By default, we assume a trustworthy and non-adversarial collaborative KE scenario. Collaborative KE employs a global server to aggregate the edits of local clients without disclosing their edit requests, while requiring each client to continually edit the global model by updating its local model over multiple rounds. These challenges stem from the interventions among different clients and among different rounds of editing, and they warrant future research.

3.3.1 INTERVENTIONS AMONG DIFFERENT CLIENTS

In collaborative KE, multiple clients may use similar edit requests to update their local models and send the updated models to the global server for aggregation. The interventions among clients can then be described in terms of the editing event $e := (s_1, r_1, o_1 \to o_2, t_1, m_1)$, which consists of the subject $s_1$, the relation $r_1$, the change from the original object $o_1$ to the new object $o_2$, the editing timestamp $t_1$, and the client model $m_1$.

Knowledge conflict indicates that edit requests from the same or different clients (in the same round¹) share the same subject $s$ and relation $r$ but have different objects $o$. Such a conflict undermines the effectiveness of knowledge editing and may even compromise the overall KE performance. We give the general formulation of a conflicting edit below (a detailed illustration can be found in Table 7 of the Appendix):

$$e_1 = (s_1, r_1, o_1 \to o_2, t_1, m_1), \qquad e_2 = (s_1, r_1, o_1 \to o_3, t_2, m_2), \tag{3}$$

where local models $m_1$ and $m_2$ perform conflicting edits on the same subject $s_1$ and relation $r_1$ at timestamps $t_1$ and $t_2$, respectively, changing the same original object $o_1$ to different objects $o_2$ and $o_3$.

Similar to the composite edit operations described by Li et al. (2024), composite conflict and composite overlap arising from such operations may also occur in collaborative KE scenarios, in even more diverse and complex forms. Here we only briefly introduce the key concept of knowledge conflict and leave a more detailed definition and investigation of this issue for future work.

Knowledge overlap is a simplified case of knowledge conflict in which the object changes ($o_1 \to o_2$ and $o_1 \to o_3$) in the editing events $e_1$ and $e_2$ are identical, i.e., $o_2 = o_3$. Knowledge overlap is also closely related to the overfitting problem in machine learning: excessive overlapping edit requests can degrade the model's editing performance on other edit requests (excluding the repeated ones).
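The conflict (Eq. 3) and overlap definitions can be illustrated with a small Python sketch that classifies pairs of editing events. This is an illustration of the definitions only, not part of COLLABEDIT: it inspects plaintext edit requests, which the global server never sees in collaborative KE, and the `EditEvent` type, its field names, and the example events are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class EditEvent:
    """Hypothetical record for e := (s, r, o_old -> o_new, t, m)."""
    subject: str
    relation: str
    old_object: str
    new_object: str
    timestamp: int
    client: str

def classify_pair(e1, e2):
    """Return 'independent', 'overlap', or 'conflict' per Section 3.3.1.

    Round membership is not checked here; both events are assumed to be
    in the same round.
    """
    if (e1.subject, e1.relation) != (e2.subject, e2.relation):
        return "independent"
    if (e1.old_object, e1.new_object) == (e2.old_object, e2.new_object):
        return "overlap"   # identical object change
    return "conflict"      # same subject and relation, different new objects

def find_interventions(events):
    """List (client_a, client_b, kind) for every non-independent pair."""
    found = []
    for a, b in combinations(events, 2):
        kind = classify_pair(a, b)
        if kind != "independent":
            found.append((a.client, b.client, kind))
    return found

events = [
    EditEvent("Messi", "plays", "soccer", "tennis", 1, "client-1"),
    EditEvent("Messi", "plays", "soccer", "basketball", 2, "client-2"),
    EditEvent("Messi", "plays", "soccer", "tennis", 3, "client-3"),
    EditEvent("AirPods", "was created by", "Apple", "Google", 1, "client-1"),
]
report = find_interventions(events)
```

Here client-1 and client-3 overlap, while client-2 conflicts with both of them; the AirPods edit is independent of the rest.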

3.3.2 INTERVENTIONS AMONG DIFFERENT ROUNDS OF EDITING

The collaborative KE paradigm naturally requires multiple clients to continually update their local models over multiple rounds and thus edit the global model with the latest knowledge. The knowledge forgetting issue therefore arises from the continual arrival of a large number of new edit requests on top of existing knowledge and earlier edit requests.

Assume that each client has a set of old edit requests $E_o$, as well as $m$ sets of new edit requests $E_n = [E_{n_1}, E_{n_2}, \dots, E_{n_m}]$, where the new edit requests are unrelated to the old ones (i.e., their subjects $s$ and relations $r$ differ). The model is initialized by editing it with the old edit requests $E_o$; at the $i$-th round of editing, the local model of each client is updated with the new edit requests $E_{n_i}$, followed by the model aggregation step. The knowledge forgetting issue encountered after $m$ rounds of local editing and global aggregation can then be defined as the editing performance on the old knowledge obtained from the old edit requests $E_o$. In particular, we find that as $m$ increases, the model's performance on the old knowledge $E_o$ deteriorates, as evidenced in Section 5.3.

¹In cases of conflict between edits from different rounds, the overwriting nature of KE means that the later conflicting edit overwrites the earlier one, naturally resolving the conflict.

4 METHODOLOGY

4.1 COLLABEDIT: NON-DESTRUCTIVE COLLABORATIVE KE

To better understand the performance drop, we first explicitly model the relationship between the weight update $\Delta_G$ of the global model under GLOBAL-EDIT and the weight update $\Delta_i$ of each client model $i$ under local editing. For ease of presentation, we consider a collaborative KE scenario with $N$ clients, where each client model has $M$ edit requests. We restrict the theoretical analysis to the single-round editing case and demonstrate the effectiveness of COLLABEDIT for multi-round editing in Remark 2.

Lemma 1 (The relationship between the weight updates from GLOBAL-EDIT and local editing). Take the KE method MEMIT as an example. Following the definitions in Section 3.1, we denote by $C$ an aggregated statistic over the previously stored keys of existing knowledge and use $K_i$ to represent the new keys derived from client $i$'s edits. Then, the relationship between $\Delta_G$ and $\Delta_i$ is given by:

$$\Delta_G = \sum_{i=1}^{N} \Delta_i \, \alpha_i, \qquad \alpha_i := (C + K_i K_i^\top)\Big(C + \sum_{j=1}^{N} K_j K_j^\top\Big)^{-1}. \tag{4}$$

See detailed proof in Appendix B.1.
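Lemma 1 can be checked numerically. The sketch below (pure Python, toy random data, hypothetical helper names) lets each client compute its MEMIT update $\Delta_i$ from the same initial weight $W$, which is the single-round setting of the lemma, and merges only $\{\Delta_i\}$ and $\{K_i K_i^\top\}$ with Eq. (4). It then compares the result with GLOBAL-EDIT on the pooled edit requests; that pooled reference exists only for checking and would violate privacy in practice. The choice $C = 5I$ is an arbitrary assumption.

```python
import random

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def matadd(a, b, alpha=1.0):
    """Elementwise a + alpha * b."""
    return [[x + alpha * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for tiny dense matrices)."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def hstack(a, b):
    """Concatenate two matrices column-wise (pools edit requests)."""
    return [ra + rb for ra, rb in zip(a, b)]

def memit_delta(W, K, M, C):
    """Eq. (2) for one party: Delta = (M - W K) K^T (C + K K^T)^{-1}."""
    R = matadd(M, matmul(W, K), alpha=-1.0)
    KT = transpose(K)
    return matmul(matmul(R, KT), inverse(matadd(C, matmul(K, KT))))

def collabedit_merge(deltas, grams, C):
    """Eq. (4): sum_i Delta_i (C + G_i) (C + sum_j G_j)^{-1}, where G_i = K_i K_i^T."""
    total = C
    for G in grams:
        total = matadd(total, G)
    total_inv = inverse(total)
    merged = None
    for delta, G in zip(deltas, grams):
        term = matmul(matmul(delta, matadd(C, G)), total_inv)
        merged = term if merged is None else matadd(merged, term)
    return merged

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

d_in, d_out, n_clients, edits_per_client = 4, 3, 3, 2
W = rand_matrix(d_out, d_in)                                               # shared initial weight
C = [[5.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_in)]  # toy stand-in for C
client_keys = [rand_matrix(d_in, edits_per_client) for _ in range(n_clients)]
client_values = [rand_matrix(d_out, edits_per_client) for _ in range(n_clients)]

# Clients edit locally and share only Delta_i and K_i K_i^T.
deltas = [memit_delta(W, K, M, C) for K, M in zip(client_keys, client_values)]
grams = [matmul(K, transpose(K)) for K in client_keys]
merged = collabedit_merge(deltas, grams, C)

# Reference only: GLOBAL-EDIT on the pooled (non-private) edit requests.
K_all, M_all = client_keys[0], client_values[0]
for K, M in zip(client_keys[1:], client_values[1:]):
    K_all, M_all = hstack(K_all, K), hstack(M_all, M)
global_delta = memit_delta(W, K_all, M_all, C)

max_gap = max(abs(x - y) for ra, rb in zip(merged, global_delta) for x, y in zip(ra, rb))
```

The two updates agree up to floating-point error, because $\Delta_i (C + K_i K_i^\top) = R_i K_i^\top$, and summing these terms over clients recovers the numerator $R K^\top$ of Eq. (2) for the pooled requests.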

Intuition: If we can estimate $\Delta_G$ using the $\Delta_i$, then we can merge $\{\Delta_i\}_{i=1}^N$ to obtain the same global model as GLOBAL-EDIT and, therefore, achieve non-destructive collaborative KE.

Details of COLLABEDIT: Indeed, $\Delta_G$ can be represented as a weighted sum of the local weight updates $\Delta_i$ with coefficients $\alpha_i$. However, $\alpha_i$ depends on the $K_i$ of all clients, which breaks privacy: $K_i$ is an intermediate feature vector computed by the model from a set of edit requests, and any external party can easily reconstruct the edit requests if $K_i$ is leaked. As a remedy, COLLABEDIT instead directly communicates $K_i K_i^\top$, which we prove in Section 6 to be non-trivial to attack. See our pseudo-code in Appendix D.

Remark 1. Currently, we consider two mainstream types of KE methods (Akyürek et al., 2023), namely (1) locating and editing activations (i.e., direct model editing in Section 2, e.g., MEMIT (Meng et al., 2023) and ROME (Meng et al., 2022)); and (2) training an auxiliary model to directly predict parameters (i.e., hypernetwork knowledge editing in Section 2, e.g., MEND (Mitchell et al., 2022a) and MALMEN (Tan et al., 2024)). Our framework COLLABEDIT is general enough to integrate many other KE methods, which we leave for future work.

Justifying the performance drop for destructive editing approaches. We further analyze the performance degradation of destructive editing approaches as the number of edits increases, as illustrated in Figure 1. For simplicity, we take TASK-ARITHMETIC (Ilharco et al., 2023) with MEMIT as an example. The drop can be explained by:

$$\Delta_G - \tilde{\Delta}_G = \sum_{i=1}^{N} \Delta_i \Big[(C + K_i K_i^\top)\Big(C + \sum_{j=1}^{N} K_j K_j^\top\Big)^{-1} - \lambda I\Big], \tag{5}$$

where $\Delta_G$ and $\tilde{\Delta}_G$ denote the weight updates derived from COLLABEDIT (our non-destructive collaborative KE) and from destructive collaborative KE using TASK-ARITHMETIC with scaling coefficient $\lambda$, respectively. When the number of edits is small², the contribution of the new knowledge $K_i K_i^\top$ is negligible compared to that of the existing knowledge $C$, so $(C + \sum_{j=1}^{N} K_j K_j^\top)^{-1} \approx C^{-1}$, hence $\alpha_i \approx I$ and $\Delta_G \approx \tilde{\Delta}_G$ when $\lambda = 1$. The gap widens as the number of edits increases, which accounts for the continuous decline of TASK-ARITHMETIC's performance relative to GLOBAL-EDIT in Figure 1.

Remark 2 (COLLABEDIT is effective for multi-round editing). Collaborative KE involves multiple clients continuously editing their local models and sharing the updated global model across multiple rounds, and thus requires robust support to ensure seamless knowledge integration and consistent knowledge memorization. Since COLLABEDIT achieves non-destructive collaborative KE in single-round editing, i.e., it mimics aggregating all clients' edit requests in a given round and applying global KE to update the global model, it remains effective for multi-round editing. Note that multi-round editing is equivalent to applying global KE to iteratively update a single LLM multiple times under reasonable editing budgets (Gupta et al., 2024).
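The argument around Eq. (5) can be illustrated by comparing COLLABEDIT's merge with TASK-ARITHMETIC at $\lambda = 1$ (a plain sum of the local updates) on toy data. This is a sketch under assumed data, not a reproduction of Figure 1: the relative size of $\sum_j K_j K_j^\top$ versus $C$ is varied by scaling a hypothetical $C = cI$, so a large $c$ plays the role of few edits and a small $c$ the role of many edits. All names and values are illustrative.

```python
import math
import random

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def matadd(a, b, alpha=1.0):
    """Elementwise a + alpha * b."""
    return [[x + alpha * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sum_matrices(ms):
    total = ms[0]
    for m in ms[1:]:
        total = matadd(total, m)
    return total

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for tiny dense matrices)."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def frobenius(a):
    return math.sqrt(sum(x * x for row in a for x in row))

def memit_delta(W, K, M, C):
    """Eq. (2): Delta = (M - W K) K^T (C + K K^T)^{-1}."""
    R = matadd(M, matmul(W, K), alpha=-1.0)
    KT = transpose(K)
    return matmul(matmul(R, KT), inverse(matadd(C, matmul(K, KT))))

rng = random.Random(1)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

d_in, d_out, n_clients, edits_per_client = 4, 3, 3, 2
W = rand_matrix(d_out, d_in)
client_keys = [rand_matrix(d_in, edits_per_client) for _ in range(n_clients)]
client_values = [rand_matrix(d_out, edits_per_client) for _ in range(n_clients)]

def relative_gap(c):
    """||Delta_G - Delta_TA||_F / ||Delta_G||_F with C = c * I (Eq. 5, lambda = 1)."""
    C = [[c if i == j else 0.0 for j in range(d_in)] for i in range(d_in)]
    deltas = [memit_delta(W, K, M, C) for K, M in zip(client_keys, client_values)]
    grams = [matmul(K, transpose(K)) for K in client_keys]
    total_inv = inverse(sum_matrices([C] + grams))
    collab = sum_matrices([matmul(matmul(d, matadd(C, G)), total_inv)
                           for d, G in zip(deltas, grams)])
    task_arithmetic = sum_matrices(deltas)   # lambda = 1: plain sum of local updates
    return frobenius(matadd(collab, task_arithmetic, alpha=-1.0)) / frobenius(collab)

gap_few_edits = relative_gap(1e4)    # C dominates sum_j K_j K_j^T
gap_many_edits = relative_gap(0.5)   # sum_j K_j K_j^T dominates C
```

With a large $c$, every $\alpha_i$ is close to $I$ and the relative gap is small; with a small $c$, the gap becomes substantial, which is consistent with the trend in Figure 1.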

²We randomly sample 100 edit requests to estimate the norm of $K_i K_i^\top$. We observe that the average ℓ2-norm of $K_i K_i^\top$ is approximately 0.0001% of that of $C$, which supports the claim.


4.2 REMEDY TOWARDS SOLVING INTERVENTION CHALLENGES IN COLLABORATIVE KNOWLEDGE EDITING: SOME CASE STUDIES

Interventions within collaborative KE scenarios are non-trivial to handle, owing to the difficulty of explicitly modeling the impact of edit requests from different clients. COLLABEDIT paves the way by mimicking the optimal GLOBAL-EDIT and enabling non-destructive editing. This subsection presents case studies on how COLLABEDIT sheds light on solving the unique challenges caused by the interventions among different clients (spatial aspect) and editing rounds (temporal aspect), namely knowledge overlap, knowledge conflict, and knowledge forgetting.

4.2.1 EDITING RESIDUAL DETECTS KNOWLEDGE OVERLAP

COLLABEDIT reduces the knowledge overlap challenge in collaborative KE to the overfitting problem in the global KE scenario. In other words, multiple clients editing the same piece of knowledge is equivalent to integrating several identical pieces of knowledge into the global model. Specifically, performing KE on the model yields a weight update $\Delta$ computed from the residual $R_{\text{old}}$, both of which are determined by the input keys $K$. When the same knowledge (i.e., the same $K$) is edited again, the new residual is $R_{\text{new}} = R_{\text{old}} - \Delta K$, and the following equation tracks the dynamics of KE:

$$R_{\text{new}} := R_{\text{old}} - \Delta K = R_{\text{old}} - R_{\text{old}} K^\top (C + K K^\top)^{-1} K. \tag{6}$$

Intuitively, (6) shows that the residual gradually approaches 0 as the same knowledge is edited repeatedly. Because the residual $R$ shrinks toward zero under repeated edits, we can accurately detect knowledge overlap by examining $R$, as demonstrated in Section 5.3.
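The dynamics in Eq. (6) can be simulated by editing the same toy requests repeatedly. Assuming $C$ is positive definite and $K$ has full column rank, the factor $I - K^\top (C + K K^\top)^{-1} K$ has all its eigenvalues in $(0, 1)$, so the residual shrinks geometrically. The sketch below is pure Python; its helper names, toy matrices, and the 0.01 detection threshold (loosely echoing Figure 3) are assumptions. It tracks the residual norm and flags overlap once the norm falls below the threshold.

```python
import math

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def matadd(a, b, alpha=1.0):
    """Elementwise a + alpha * b."""
    return [[x + alpha * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for tiny dense matrices)."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def frobenius(a):
    return math.sqrt(sum(x * x for row in a for x in row))

def memit_delta(W, K, M, C):
    """Eq. (2): Delta = (M - W K) K^T (C + K K^T)^{-1}."""
    R = matadd(M, matmul(W, K), alpha=-1.0)
    KT = transpose(K)
    return matmul(matmul(R, KT), inverse(matadd(C, matmul(K, KT))))

W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # the same two keys, edited repeatedly
M = [[2.0, 0.0], [0.0, 3.0]]
C = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

def residual_norm(weight):
    return frobenius(matadd(M, matmul(weight, K), alpha=-1.0))

def looks_overlapped(weight, threshold=0.01):
    """Hypothetical detector: a near-zero residual suggests K was already edited in."""
    return residual_norm(weight) < threshold

flag_before = looks_overlapped(W)
norms = [residual_norm(W)]
for _ in range(12):                        # apply the same edit 12 times
    W = matadd(W, memit_delta(W, K, M, C))
    norms.append(residual_norm(W))
flag_after = looks_overlapped(W)
```

In this toy setting the residual norm drops at every repetition, so a fresh request is not flagged while a request that has been applied many times is.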

4.2.2 ADDRESSING KNOWLEDGE CONFLICT VIA DATA AUGMENTATION

Recall that in rare cases, edit requests from the same or different clients in the same round may share the same subject $s$ and relation $r$ but have different objects $o$, which is known as knowledge conflict. An ideal solution to knowledge conflict consists of two stages. In the first stage, the global server and clients collaboratively detect the conflict in a privacy-preserving manner. For example, when a knowledge conflict occurs, the global model performs poorly on some edit requests, and the clients who contributed those edits can report the issue. Once the conflict is identified, in the second stage, the server determines which of the conflicting edit requests to retain in the global model based on the clients' reports and a predefined strategy (e.g., FCFS (Zhao & Stankovic, 1989) or FIFO (Morse & Richardson, 1983)). The client whose edit request is selected can then apply data augmentation techniques, such as incorporating relevant knowledge (Li et al., 2024), to strengthen the KE of the selected edit request and effectively resolve the knowledge conflict.
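As a minimal sketch of the selection step, assume that the conflicting edit requests have already been identified (e.g., through client reports) and that the party applying the policy knows each request's arrival timestamp. Under a first-come-first-served policy, the earliest request per (subject, relation) is retained. The `EditRequest` type, the tie-break by client id, the function name, and the example requests are all hypothetical, and the subsequent data augmentation step is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class EditRequest:
    """Hypothetical edit request with an arrival timestamp."""
    subject: str
    relation: str
    new_object: str
    timestamp: int
    client: str

def resolve_fcfs(requests):
    """First-come-first-served: keep the earliest request per (subject, relation).

    Ties on timestamp are broken by client id (an arbitrary, assumed rule).
    Returns (kept, rejected).
    """
    groups = defaultdict(list)
    for req in requests:
        groups[(req.subject, req.relation)].append(req)
    kept, rejected = [], []
    for group in groups.values():
        group.sort(key=lambda r: (r.timestamp, r.client))
        kept.append(group[0])
        rejected.extend(group[1:])
    return kept, rejected

requests = [
    EditRequest("AirPods", "was created by", "Google", 5, "client-2"),
    EditRequest("AirPods", "was created by", "Samsung", 3, "client-4"),
    EditRequest("Messi", "plays", "tennis", 4, "client-1"),
]
kept, rejected = resolve_fcfs(requests)
```

Here client-4's request arrived first and is retained, while client-2's conflicting request is rejected; the unrelated Messi edit is kept unchanged.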

4.2.3 DYNAMIC COVARIANCE MATRIX ALLEVIATES KNOWLEDGE FORGETTING

The previously memorized knowledge may be forgotten by the LLM after a large number of edits, which we term the knowledge forgetting issue. COLLABEDIT simplifies the analysis of this issue: as (2) shows, the covariance matrix $C$ of existing knowledge is immutable, which amplifies forgetting as the number of edits increases. As a remedy, we propose using a dynamic version of $C$, i.e.,

$$C = \beta_0 C_0 + \beta_1 C_1 = \beta_0 C_0 + \beta_1 \sum_{i} K_i K_i^\top, \tag{7}$$

where $\beta_0$ and $\beta_1$ are hyper-parameters that balance the influence of existing knowledge and newly acquired knowledge, $C_0$ is the covariance matrix of existing knowledge, $C_1$ is the accumulated covariance matrix of new knowledge, and $K_i$ denotes the input keys obtained from all the edit requests in the $i$-th round. Continuously updating the covariance matrix with new knowledge effectively mitigates the knowledge forgetting issue, as verified in Section 5.3.
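A minimal sketch of Eq. (7), assuming each round's keys arrive as a $d \times n_i$ matrix: the accumulated term $C_1$ is updated round by round, and the covariance used in Eq. (2) is recomputed from it. The class name and the $\beta$ values in the usage lines are hypothetical choices for illustration, not the values used in the experiments.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    return [[s * x for x in row] for row in a]

class DynamicCovariance:
    """Eq. (7): C = beta0 * C0 + beta1 * C1, with C1 = sum_i K_i K_i^T over rounds."""

    def __init__(self, C0, beta0, beta1):
        self.C0, self.beta0, self.beta1 = C0, beta0, beta1
        n = len(C0)
        self.C1 = [[0.0] * n for _ in range(n)]  # no new knowledge yet

    def add_round(self, K):
        """Accumulate K_i K_i^T for the keys K (d x n_i) of all edits in round i."""
        self.C1 = matadd(self.C1, matmul(K, transpose(K)))

    def current(self):
        """Covariance to plug into Eq. (2) for the next round."""
        return matadd(scale(self.C0, self.beta0), scale(self.C1, self.beta1))

cov = DynamicCovariance(C0=[[1.0, 0.0], [0.0, 1.0]], beta0=1.0, beta1=0.5)
cov.add_round([[1.0], [0.0]])   # round 1: one edit with key (1, 0)
cov.add_round([[0.0], [2.0]])   # round 2: one edit with key (0, 2)
C = cov.current()               # [[1.5, 0.0], [0.0, 3.0]]
```

Keeping $C_1$ as running state means each round only adds its own $K_i K_i^\top$, so earlier keys never have to be revisited.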

5 EXPERIMENTS

5.1 EXPERIMENTAL SETUP

Datasets and models. Following the literature (Meng et al., 2022; 2023), we use Multi-CounterFact (MCF) (Meng et al., 2022) and zsRE (Levy et al., 2017) as datasets and evaluate the editing performance on GPT2-XL (Radford et al., 2019) and GPT-J (6B) (Wang & Komatsuzaki, 2021).


Table 1: Overall editing performance on GPT2-XL. GLOBAL-EDIT is 5000 × 1, meaning that we edit 5000 requests into one model (the global model) at once; GLOBAL-EDIT is the ideal situation. The other methods are merging methods (500 × 10), where we edit 10 models and each model is edited with 500 requests. The GPT2-XL row evaluates the 5000 requests without any editing, i.e., the model's original performance. The Score serves as the overall metric for assessing the performance of each method on each dataset.

| Method | MCF NS | MCF PS | MCF ES | MCF Score | zsRE NA | zsRE PA | zsRE EA | zsRE Score |
|---|---|---|---|---|---|---|---|---|
| GPT2-XL | 78.24 | 23.88 | 21.50 | 29.65 | 24.32 | 21.87 | 22.80 | 22.95 |
| GLOBAL-EDIT | 65.08 | 80.66 | 89.66 | 77.08 | 25.25 | 64.71 | 68.96 | 43.12 |
| TIES-MERGING | 78.46 | 26.35 | 27.16 | 34.27 | 24.94 | 25.99 | 27.59 | 26.12 |
| TASK-ARITHMETIC | 66.84 | 55.19 | 61.66 | 60.85 | 24.97 | 33.66 | 34.80 | 30.45 |
| SIMPLE-AVERAGE | 76.90 | 29.97 | 33.06 | 39.15 | 25.78 | 29.26 | 30.62 | 28.40 |
| COLLABEDIT | 65.26 | 80.67 | 89.70 | 77.18 | 25.21 | 64.27 | 68.40 | 42.95 |

Table 2: Overall editing performance on GPT-J (6B), based on MEMIT (Meng et al., 2023). The experimental setting is identical to GPT2-XL in Table 1. The Score serves as the overall metric.

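| Method | MCF NS | MCF PS | MCF ES | MCF Score | zsRE NA | zsRE PA | zsRE EA | zsRE Score |
|---|---|---|---|---|---|---|---|---|
| GPT-J | 83.45 | 17.17 | 14.78 | 21.75 | 26.99 | 26.25 | 27.04 | 26.75 |
| GLOBAL-EDIT | 57.20 | 96.13 | 99.26 | 79.03 | 28.05 | 88.79 | 92.05 | 51.92 |
| TIES-MERGING | 76.15 | 30.13 | 30.98 | 38.16 | 30.17 | 42.55 | 43.55 | 37.68 |
| TASK-ARITHMETIC | 50.24 | 72.82 | 73.26 | 63.44 | 18.77 | 45.16 | 46.75 | 30.98 |
| SIMPLE-AVERAGE | 78.04 | 41.28 | 54.68 | 54.22 | 29.19 | 47.96 | 51.38 | 40.22 |
| COLLABEDIT | 57.12 | 96.03 | 99.06 | 78.91 | 28.26 | 88.78 | 92.19 | 52.17 |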

Baselines. We compare COLLABEDIT with three naive collaborative KE methods, which apply standard KE algorithms (e.g., MEMIT (Meng et al., 2023) and MALMEN (Tan et al., 2024)) to update the local models and use existing model merging algorithms to merge the local updates into the global model. In particular, we experiment with the three most commonly used model merging algorithms: SIMPLE-AVERAGE (Chronopoulou et al., 2023), TASK-ARITHMETIC (Ortiz-Jimenez et al., 2023), and TIES-MERGING (Yadav et al., 2023).

Evaluation metrics. Unless otherwise mentioned, we use MEMIT as the backend KE algorithm and adopt the same metrics as MEMIT to evaluate editing performance. Strictly following the literature (Meng et al., 2022; 2023; Tan et al., 2024), we use Efficacy Score (ES), Paraphrase Score (PS), Neighborhood Score (NS), N-gram Entropy (NE), Reference Score (RS), and Score (i.e., the harmonic mean of ES, PS, and NS) as metrics for MCF; we use Neighborhood Accuracy (NA), Paraphrase Accuracy (PA), Efficacy Accuracy (EA), and Score (i.e., the harmonic mean of NA, PA, and EA) as metrics for zsRE. When using MALMEN (Tan et al., 2024) as the backend KE algorithm, we adopt the same metrics as MALMEN for a fair comparison, including editing success (EA), generalization success (PA), and locality success (NA). See detailed descriptions in Appendix C.
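Because Score is a harmonic mean, a single low component drags it down, which is why baselines with a high NS still obtain a low Score in Tables 1 and 2. A minimal Python sketch using the standard-library `statistics.harmonic_mean`; the wrapper name `edit_score` is hypothetical, and the inputs are the rounded values printed in Table 1.

```python
from statistics import harmonic_mean

def edit_score(*metrics):
    """Overall Score: harmonic mean of component metrics, e.g. (ES, PS, NS) on MCF."""
    return harmonic_mean(metrics)

# Rounded values from Table 1 (GPT2-XL).
mcf_score = edit_score(89.66, 80.66, 65.08)       # GLOBAL-EDIT: ES, PS, NS
zsre_score = edit_score(68.96, 64.71, 25.25)      # GLOBAL-EDIT: EA, PA, NA
ties_mcf_score = edit_score(27.16, 26.35, 78.46)  # TIES-MERGING: low ES/PS, high NS
```

With these rounded inputs, the GLOBAL-EDIT MCF Score comes out at about 77.09, consistent with the reported 77.08 up to rounding of the component scores; TIES-MERGING's NS of 78.46 cannot lift its Score much above its low ES and PS.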

Evaluation benchmark for conflict knowledge editing scenarios. To conveniently simulate potential collaborative knowledge conflict scenarios and analyze the issues and impacts they may bring, we use GPT-3.5-turbo to reconstruct two existing benchmarks that simulate knowledge conflict. First, we explore the impact of knowledge conflict on model performance using Multi-CounterFact (MCF) (Meng et al., 2022) due to its large scale. For each data point $(s, r, o)$ in MCF, we use GPT-3.5-turbo to generate a conflicting object, i.e., one that shares $s$ and $r$ but differs in $o$. Section E in the Appendix showcases a concrete example of a generated conflicting object. To validate the effectiveness of our two-stage mechanism for resolving knowledge conflict, we use the Easy dataset (Li et al., 2024) for simplicity. This dataset was constructed by creating several additional related knowledge edits for each edit using Wikipedia as the source, which MCF does not include. We also generate a corresponding conflicting object for each edit in this dataset using GPT-3.5-turbo.


Table 3: Overall editing performance on GPT-J (6B) and GPT2-XL, based on MALMEN (Tan et al., 2024). We edit 8 models, and each model is edited with 125 requests from zsRE. The Score serves as the overall metric.

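| Method | GPT2-XL (zsRE) EA | GPT2-XL (zsRE) PA | GPT2-XL (zsRE) NA | GPT2-XL (zsRE) Score | GPT-J (zsRE) EA | GPT-J (zsRE) PA | GPT-J (zsRE) NA | GPT-J (zsRE) Score |
|---|---|---|---|---|---|---|---|---|
| GLOBAL-EDIT | 99.21 | 93.08 | 16.5 | 36.84 | 99.95 | 95.66 | 27.32 | 52.57 |
| TIES-MERGING | 15.52 | 14.85 | 18.68 | 16.18 | 27.86 | 26.76 | 25.18 | 26.55 |
| TASK-ARITHMETIC | 50.37 | 45.79 | 4.28 | 10.89 | 27.59 | 29.6 | 26.3 | 27.76 |
| SIMPLE-AVERAGE | 52.39 | 46.03 | 4.58 | 11.57 | 71.15 | 53.96 | 4.82 | 12.49 |
| COLLABEDIT | 99.06 | 92.66 | 15.49 | 35.11 | 99.62 | 92.88 | 23.25 | 47.01 |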

5.2 EXPERIMENTAL RESULTS OF KE PERFORMANCE

Superior collaborative knowledge editing performance. As shown in Tables 1 and 2, when MEMIT (Meng et al., 2023) is used as the backend KE algorithm, our privacy-preserving solution COLLABEDIT achieves editing performance on par with GLOBAL-EDIT and significantly outperforms the naive model merging methods in terms of Score on both datasets and both models. By combining the weight updates of each client, COLLABEDIT achieves nearly identical performance to GLOBAL-EDIT, ensuring both privacy protection and editing quality. In contrast, there is a significant gap between the performance of the baselines and GLOBAL-EDIT. Table 3 additionally shows the editing performance of COLLABEDIT with MALMEN (Tan et al., 2024) as the backend KE algorithm: COLLABEDIT is capable of performing non-destructive collaborative KE across various mainstream KE methods.

Discussion about the performance of baselines. Although the other baselines (Tables 1 and 2) have relatively higher NS values than GLOBAL-EDIT and COLLABEDIT, we conjecture that this is caused by under-fitting: these model merging methods are not specifically designed for merging weight updates from knowledge editing, as reflected by their low PS, ES, and Score values. The results of the unedited GPT2-XL and GPT-J (6B) models (the first row of each table) further confirm that the high NS of the baselines is largely inherited from the base model itself, exposing their poor collaborative KE effects. Note that NS/NA measures whether the edited model keeps the same answers for the neighborhood prompts of the edit requests. However, editing certain knowledge with existing KE methods (Tan et al., 2024; Meng et al., 2023) inevitably affects the associations of its neighboring prompts, which leads to a similar drop in NS/NA for both GLOBAL-EDIT and COLLABEDIT.

5.3 EXPERIMENTAL RESULTS ON THREE CHALLENGES OF COLLABORATIVE KE

[Figure 3 plot: ℓ2-norm of residual R (y-axis) versus number of repeating edits, 0 to 20 (x-axis), for Global-edit.]

Figure 3: The ℓ2-norm of residual R when data replication happens.

Residual R can effectively detect knowledge overlap. To understand the impact of knowledge overlap, we repeatedly edit the same edit requests into the global model. Figure 3 shows that the ℓ2-norm of the residual R decreases rapidly as the number of repeated edits increases. It falls below 0.01 after 12 repetitions, which is consistent with our theoretical analysis in Section 4.2.1. This implies that the ℓ2-norm of R can be used to detect whether overlapping edits have occurred, which may help practitioners avoid the resulting drop in model performance.
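The shrinking residual in Figure 3 can be illustrated with a toy single-layer model. The sketch below repeatedly applies the closed-form update of Equation (12) in Appendix A, Δ = RK^⊤(C + KK^⊤)^{-1}, to the same batch of edits. After each round it records the ℓ2 (Frobenius) norm of R = V − WK. The dimensions, the isotropic stand-in C = μI and the helper names are hypothetical choices for illustration. In this linear setting, each round multiplies R by (I + K^⊤C^{-1}K)^{-1}, whose eigenvalues lie in (0, 1), so the norm can only decrease.

```python
import numpy as np

def closed_form_update(W, K, V, C):
    """Delta = R K^T (C + K K^T)^{-1} with residual R = V - W K."""
    R = V - W @ K
    A = C + K @ K.T
    # A is symmetric, so solving A X = K R^T and transposing gives R K^T A^{-1}.
    return np.linalg.solve(A, K @ R.T).T

def residual_norms_under_repetition(repeats=20, d=32, d_out=16, n_edits=4, mu=100.0, seed=0):
    """Edit the same batch `repeats` times; return ||V - W K|| before and after each round."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(d_out, d))        # hypothetical layer weights
    K = rng.normal(size=(d, n_edits))      # keys of the repeated edit requests
    V = rng.normal(size=(d_out, n_edits))  # desired memory values
    C = mu * np.eye(d)                     # isotropic stand-in for the key covariance
    norms = [float(np.linalg.norm(V - W @ K))]
    for _ in range(repeats):
        W = W + closed_form_update(W, K, V, C)
        norms.append(float(np.linalg.norm(V - W @ K)))
    return norms

norms = residual_norms_under_repetition()
print([round(n, 3) for n in norms[::5]])  # decreases with every repetition
```

In a real model, keys are not exactly repeated and C is estimated from data. The monotone decrease shown here is therefore only a qualitative analogue of Figure 3.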

Knowledge conflict can compromise the editing performance. To explore the impact of knowledge conflict, we reconstruct MCF with knowledge conflict (see Section 5.1 for details). Each edit request f' = (s', r', o') in the new benchmark corresponds to an edit request f = (s, r, o) in MCF with s' = s, r' = r, and o' ≠ o, following the definition in Section 4.2.2. We randomly sample 5,000 edit requests and their conflicting versions from the two datasets, denoted as E and E', respectively. We then distribute either only the edit requests in E, or both sets (E and E'), across all clients for collaborative KE to understand the impact of E' on E. Table 6 (see Appendix) evaluates the KE performance on E with and without editing the conflicting set E'. The overall KE performance decreases substantially due to conflicting knowledge, especially for PA and EA. These accuracy-based metrics are more rigorous than the success-based metrics (i.e., PS, NS, ES). Meanwhile, NA, which assesses whether irrelevant knowledge is affected, remains nearly unchanged. See Section C in the Appendix for details.
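The same toy linear setting shows why conflicting edits hurt. Suppose two requests share a key k (same subject and relation) but ask for different values v1 and v2 (different objects). The least-squares update of Equation (12) cannot satisfy both, and instead maps k close to their midpoint, so neither object is recalled cleanly. This is a simplified illustration with hypothetical dimensions, not the paper's experimental pipeline.

```python
import numpy as np

def edited_output_for_conflict(d=16, d_out=8, mu=0.01, seed=0):
    """Edit one key toward two different values in the same batch; return the edited output."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(d_out, d))
    k = rng.normal(size=(d, 1))       # shared key: same (s, r)
    v1 = rng.normal(size=(d_out, 1))  # value encoding object o
    v2 = rng.normal(size=(d_out, 1))  # value encoding the conflicting object o'
    K = np.hstack([k, k])
    V = np.hstack([v1, v2])
    R = V - W @ K
    A = mu * np.eye(d) + K @ K.T      # small isotropic C as a stand-in
    delta = np.linalg.solve(A, K @ R.T).T  # R K^T A^{-1}
    return (W + delta) @ k, v1, v2

out, v1, v2 = edited_output_for_conflict()
gap = np.linalg.norm(v1 - v2)
mid_ratio = np.linalg.norm(out - (v1 + v2) / 2) / gap  # near 0: output sits at the midpoint
d1_ratio = np.linalg.norm(out - v1) / gap              # near 0.5
d2_ratio = np.linalg.norm(out - v2) / gap              # near 0.5
print(mid_ratio, d1_ratio, d2_ratio)
```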

Two-stage mechanism with knowledge augmentation can mitigate conflicts. Given the harmful impact of knowledge conflict, we examine our two-stage mechanism (introduced in Section 4.2.2) on the modified Easy dataset (Li et al., 2024). In this scenario, there is no objective standard to determine which edit should be retained. Therefore, we can employ a first-come-first-served (FCFS) (Zhao & Stankovic, 1989) or first-in-first-out (FIFO) (Morse & Richardson, 1983) policy to select the edit to be preserved. Subsequently, we augment the edit requests and obtain weight updates from the selected client.
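Below is a minimal sketch of the FCFS selection step. It assumes each request carries a hypothetical arrival timestamp, and that a conflict means the same subject and relation with a different object. The class and function names are illustrative, not part of COLLABEDIT's released code. The knowledge-augmentation step that follows selection is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class EditRequest:
    client_id: int
    arrival: float  # hypothetical arrival time used for first-come-first-served ordering
    subject: str
    relation: str
    obj: str

def resolve_conflicts_fcfs(
    requests: List[EditRequest],
) -> Tuple[List[EditRequest], List[EditRequest]]:
    """Keep the earliest object for each (subject, relation); drop later conflicting objects."""
    winners: Dict[Tuple[str, str], EditRequest] = {}
    dropped: List[EditRequest] = []
    for req in sorted(requests, key=lambda r: r.arrival):
        key = (req.subject, req.relation)
        first = winners.get(key)
        if first is None:
            winners[key] = req
        elif req.obj != first.obj:
            # Same (s, r) but a different object: a conflict in the sense of Section 4.2.2.
            dropped.append(req)
        # A repeated (s, r, o) is an overlap, not a conflict; only the first copy is kept.
    return list(winners.values()), dropped

requests = [
    EditRequest(1, 2.0, "fpart", "use", "data transfer"),
    EditRequest(0, 1.0, "fpart", "use", "data migration"),
    EditRequest(2, 1.5, "Danielle Darrieux", "mother tongue", "English"),
]
kept, dropped = resolve_conflicts_fcfs(requests)
print([r.obj for r in kept])     # ['data migration', 'English']
print([r.obj for r in dropped])  # ['data transfer']
```

Swapping the sort key gives other ordering policies; FCFS is used here only because the text names it as one option.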

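[Figure 4 plot: relative probability of each answer for the input "What use does fpart have? _", shown before (Conflict Unsolved) and after (Conflict Solved) resolution. The bars are Target: data migration, Conflict: data transfer, and Other: data storage. The values 0.031 and 0.004 are annotated in the plot.]

Figure 4: An example of using data augmentation to address the problem of knowledge conflict.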

First, Figure 4 presents a detailed example of resolving a knowledge conflict. Given the question "What use does fpart have?", two edit requests induce conflicting answers: "data migration" and "data transfer". We define "data migration" as the target knowledge to preserve and "data transfer" as the conflicting knowledge to remove. After adopting the proposed mechanism, we observe the following:
(1) Before resolving the conflict (left), the LLM assigns a large output probability to both "data migration" and "data transfer".
(2) After resolving the conflict (right), the probability of "data migration" slightly increases, while the probability of "data transfer" drops to 0.
Moreover, the probability of unrelated knowledge ("data storage") remains unchanged. These results show that the knowledge conflict is effectively resolved in this example.

Table 4: COLLABEDIT utilizes augmented edit requests to mitigate the knowledge conflict.

Setting | Avg ΔP | Succ
Before Resolve | -18.11 | 37%
After Resolve | 17.6 | 77.6%

Second, we evaluate the proposed two-stage mechanism on a large number of conflicting edit requests. Specifically, we experiment with 1,000 pairs of target knowledge and corresponding conflicting knowledge. Table 4 reports the Average Probability Difference (Avg ΔP) and the Target Success Rate (Succ) before and after resolving the conflicts. Avg ΔP is the output probability of the target knowledge minus that of the conflicting knowledge. Succ is the fraction of cases in which the target knowledge is the final output. A larger Avg ΔP and a higher Succ indicate that the model is more inclined to output the target knowledge, i.e., that the conflict is resolved. As Table 4 shows, our two-stage mechanism effectively mitigates the issue, raising Avg ΔP from -18.11 to 17.6 and Succ from 37% to 77.6%.
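The two quantities in Table 4 can be computed as sketched below. The sketch assumes per-pair output probabilities of the target and conflicting objects are available, along with a flag recording whether the target is the model's top-1 output. Reporting both in percent is an assumption based on the magnitudes in Table 4, and the function name is hypothetical.

```python
import numpy as np

def conflict_resolution_metrics(p_target, p_conflict, top1_is_target):
    """Return (Avg dP, Succ) in percent over a batch of target/conflict pairs."""
    p_target = np.asarray(p_target, dtype=float)
    p_conflict = np.asarray(p_conflict, dtype=float)
    hits = np.asarray(top1_is_target, dtype=bool)
    avg_delta_p = 100.0 * float(np.mean(p_target - p_conflict))  # target minus conflict
    succ = 100.0 * float(np.mean(hits))                           # target is the final output
    return avg_delta_p, succ

avg_dp, succ = conflict_resolution_metrics([0.6, 0.2], [0.1, 0.5], [True, False])
print(avg_dp, succ)
```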

Dynamic C can alleviate knowledge forgetting. As described in Section 3.3.2, we assume that each client has a set of old edit requests $E_o$ (edited initially) and m sets of new edit requests $E_n = [E_n^1, E_n^2, \dots, E_n^m]$. In this experiment there is no conflict between $E_o$ and $E_n$, which allows us to isolate the effect of knowledge forgetting. As shown in Table 5, after many rounds of editing the LLMs produce much lower PS and ES on the knowledge from $E_o$, due to knowledge forgetting. Under the same conditions, we dynamically update the covariance matrix C according to Equation 7 when editing both $E_o$ and $E_n$. The dynamic C significantly mitigates the issue. On GPT-J and MCF, the Score drops only from 79.03 to 78.15, compared with 74.68 for an immutable C.
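Equation 7 is not reproduced in this section. The sketch below assumes it accumulates each round's keys into the covariance, C ← C + KK^⊤, after applying the update of Equation (12). Under that assumption, the sequential updates coincide with recursive least squares over all edits made so far, so old associations stay constrained. An immutable C, by contrast, only protects the knowledge encoded in the initial C. The toy dimensions, the small isotropic initial C and the function names are hypothetical.

```python
import numpy as np

def closed_form_update(W, K, V, C):
    """Delta = R K^T (C + K K^T)^{-1} with R = V - W K (Equation 12)."""
    R = V - W @ K
    return np.linalg.solve(C + K @ K.T, K @ R.T).T

def old_edit_error(dynamic_c, rounds=10, d=64, d_out=32, batch=5, lam=0.01, seed=0):
    """Edit E_o, then `rounds` new batches; return ||V_o - W K_o|| at the end."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(d_out, d))
    C = lam * np.eye(d)  # hypothetical covariance of pre-existing knowledge
    K_old, V_old = rng.normal(size=(d, batch)), rng.normal(size=(d_out, batch))
    new_batches = [
        (rng.normal(size=(d, batch)), rng.normal(size=(d_out, batch))) for _ in range(rounds)
    ]
    for K, V in [(K_old, V_old)] + new_batches:
        W = W + closed_form_update(W, K, V, C)
        if dynamic_c:
            C = C + K @ K.T  # assumed form of the dynamic covariance update
    return float(np.linalg.norm(V_old - W @ K_old))

immutable_err = old_edit_error(dynamic_c=False)
dynamic_err = old_edit_error(dynamic_c=True)
print(immutable_err, dynamic_err)  # the dynamic C keeps the old edits far more intact
```

Both calls draw identical random data, so the only difference between the two runs is whether C is updated.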

Table 5: Dynamic covariance matrix C can alleviate knowledge forgetting. To study the knowledge forgetting issue, we gather all the edit requests in each round and apply global KE to edit the global model. We first edit the global model with $E_o$ and then sequentially apply m sets of aggregated new edit requests, with m set to a large value (m = 1000). We report the editing performance on the old edit requests $E_o$ before and after m rounds of new edits for GPT-J (6B) and GPT2-XL.

Model | Method | MCF NS | MCF PS | MCF ES | MCF Score | zsRE NA | zsRE PA | zsRE EA | zsRE Score
GPT-J | Before m rounds of editing | 57.20 | 96.13 | 99.26 | 79.03 | 28.05 | 88.79 | 92.05 | 51.92
GPT-J | After m rounds of editing (Immutable C) | 65.14 | 76.94 | 84.58 | 74.68 | 24.21 | 61.05 | 66.22 | 41.21
GPT-J | After m rounds of editing (Dynamic C) | 58.15 | 91.62 | 97.32 | 78.15 | 26.54 | 79.34 | 84.40 | 48.28
GPT2-XL | Before m rounds of editing | 65.08 | 80.66 | 89.66 | 77.08 | 25.25 | 64.71 | 68.96 | 43.12
GPT2-XL | After m rounds of editing (Immutable C) | 64.89 | 60.38 | 69.82 | 64.80 | 25.28 | 50.31 | 53.96 | 38.47
GPT2-XL | After m rounds of editing (Dynamic C) | 61.54 | 74.33 | 82.30 | 71.72 | 24.40 | 56.57 | 59.89 | 39.80
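The Score columns in Table 5 behave like the harmonic mean of the three metrics, which matches the "H-mean of (ES, PS, NS)" axis label in Figure 1. The sketch below uses Python's `statistics.harmonic_mean` to reproduce the first GPT-J row. Treating Score as exactly this harmonic mean is our reading of the table, not a definition stated in this section, and the function name is hypothetical.

```python
from statistics import harmonic_mean

def edit_score(metric_a: float, metric_b: float, metric_c: float) -> float:
    """Harmonic mean of three editing metrics (e.g., NS, PS, ES or NA, PA, EA)."""
    return harmonic_mean([metric_a, metric_b, metric_c])

print(round(edit_score(57.20, 96.13, 99.26), 2))  # 79.03 (GPT-J, MCF, before m rounds)
print(round(edit_score(28.05, 88.79, 92.05), 2))  # 51.92 (GPT-J, zsRE, before m rounds)
```

A harmonic mean is dominated by its smallest input, so a high Score requires all three metrics to be high at once.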


6 DISCUSSION ON THE PRIVACY PRESERVATION OF COLLABEDIT

This section theoretically and empirically justifies that COLLABEDIT preserves privacy by sharing only $KK^\top$. We begin by defining the input keys K as:

$$K = [k_1, k_2, \dots, k_M] \in \mathbb{R}^{d \times M}, \tag{8}$$

where d is the dimension of the feature vector and M is the number of edit requests.

Theoretical aspect. We aim to show that it is nontrivial to reconstruct K given $KK^\top$. To this end, we show that for any specific $KK^\top$ there exist infinitely many $\tilde{K}$ (possibly with a different M) that lead to the same $KK^\top$.

Let us assume there exists a matrix $W \in \mathbb{R}^{M \times M}$ that transforms K into $\tilde{K} = KW$ while ensuring $\tilde{K}\tilde{K}^\top = KK^\top$. Then we have:

$$\tilde{K}\tilde{K}^\top = KW(KW)^\top = K(WW^\top)K^\top = KK^\top, \tag{9}$$

where any orthogonal matrix W, i.e., one with $WW^\top = I$, yields a $\tilde{K}$ with the same covariance matrix as K. Infinitely many orthogonal matrices satisfy $WW^\top = I$ when M > 1 (Grove, 2002; Hall & Hall, 2013). We therefore conclude that, from a theoretical perspective, it is nontrivial to reconstruct K from $KK^\top$³.
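Equation (9) is easy to check numerically. The sketch draws random keys K and builds a random orthogonal matrix from a QR decomposition (`np.linalg.qr`). It then confirms that $\tilde{K} = KW$ yields the same shared statistic $KK^\top$ while differing from K. The sizes are arbitrary illustrative choices, and Q plays the role of the orthogonal W in Equation (9).

```python
import numpy as np

rng = np.random.default_rng(0)
d, M = 16, 8                                  # hypothetical key dimension and number of edits
K = rng.normal(size=(d, M))                   # private keys of one client
Q, _ = np.linalg.qr(rng.normal(size=(M, M)))  # a random orthogonal M x M matrix (W in Eq. 9)
K_tilde = K @ Q                               # a different set of keys

same_shared_statistic = np.allclose(K_tilde @ K_tilde.T, K @ K.T)
keys_differ = not np.allclose(K_tilde, K)
print(same_shared_statistic, keys_differ)     # True True
```

Any other orthogonal Q gives another valid $\tilde{K}$, which is the non-uniqueness the argument relies on.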

[Figure 5 plot: average embedding similarity (y-axis) for sequences recovered from K and from $KK^\top$ with M = 1, 4, 8, and 16 (x-axis).]

Figure 5: Average embedding similarity between the recovered sequences (inferred from K, or from $KK^\top$ involving M sequences) and their ground truths. The grey line is the average embedding similarity between two random text sequences.

Empirical aspect. Our objective is to quantify privacy leakage by recovering the input sequences of edit requests solely from the observed K or $KK^\top$. Since K comprises the feature embeddings of the input sequences, we use the state-of-the-art embedding inversion attack GEIA (Li et al., 2023a) to recover input sequences from these embeddings. For generality, we adopt the same setup as GEIA. The key idea is to build a powerful attacker model that decodes sequences from embeddings. Privacy leakage is measured by the embedding similarity (Cer et al., 2017) between the original and recovered sequences, computed with an LLM such as T5-Large (Raffel et al., 2020).

To also measure the privacy leakage of $KK^\top$, we further tailor the attacker model to recover input sequences from $KK^\top$. Because $KK^\top$ is a covariance matrix involving M input sequences, we compute the maximum embedding similarity between the recovered sequence and any of the M sequences. Figure 5 shows that sharing K results in severe privacy leakage: the recovered sequences are close to the original sequences, with large embedding similarity. In contrast, even with a small M such as 8, sharing $KK^\top$ reduces the embedding similarity to 0.69. This is close to the similarity between two random text sequences (grey line). In other words, when M ≥ 8 the sequence recovered from $KK^\top$ is almost unrelated to any of the M sequences. Therefore, COLLABEDIT preserves privacy by sharing $KK^\top$ instead of K.
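The maximum-over-M evaluation can be sketched as below. The sketch assumes the recovered and original sequences have already been embedded by an encoder such as T5-Large, and that embedding similarity is measured with cosine similarity. Both the encoder step and the choice of cosine are assumptions here, and the function name is hypothetical.

```python
import numpy as np

def max_embedding_similarity(recovered, originals):
    """Cosine similarity between one recovered embedding and the closest of M original embeddings."""
    r = np.asarray(recovered, dtype=float)
    O = np.asarray(originals, dtype=float)  # shape (M, embedding_dim)
    sims = (O @ r) / (np.linalg.norm(O, axis=1) * np.linalg.norm(r))
    return float(sims.max())

print(max_embedding_similarity([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0]]))  # about 0.7071
```

Taking the maximum favours the attacker: the recovered sequence counts as leaking if it resembles any one of the M originals.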

7 CONCLUSION AND FUTURE WORKS

In this work, we propose the first collaborative KE framework, COLLABEDIT, which allows multiple parties to jointly edit the knowledge of an LLM without disclosing their private edit requests. In particular, COLLABEDIT leverages model merging to combine the weight updates that each client makes to its local model. Guided by our theoretical analysis, we design the framework to be non-destructive, so that it achieves performance comparable to directly editing a global model with the aggregated edit requests. Building on COLLABEDIT, we further provide remedies for the intervention challenges raised in collaborative KE. Interesting future work includes (1) further improving the performance of KE in collaborative learning scenarios and (2) exploring solutions that fully address the intervention challenges in collaborative KE.

³ Clients typically edit multiple requests into the LLM simultaneously, and may also apply techniques such as Multi-label Editing (MLE) (Li et al., 2024) to augment their knowledge. It is therefore reasonable to assume at least 2 edit requests per round, or this could be enforced by regulation.


ACKNOWLEDGEMENT

This work was partly supported by the NSFC under No. U244120033, 62402418, 62402425, the Key R&D Program of Ningbo under No. 2024Z115, the China Postdoctoral Science Foundation under No. 2024M762829, and the Zhejiang Provincial Priority Funded Postdoctoral Research Project under No. ZJ2024001, the Research Center for Industries of the Future (RCIF) at Westlake University, and the Westlake Education Foundation.

REFERENCES

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.

Afra Akyürek, Eric Pan, Garry Kuwanto, and Derry Wijaya. Dune: Dataset for unified editing. In Empirical Methods in Natural Language Processing (EMNLP), 2023.

Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, and Lucia Specia. SemEval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pp. 1–14, 2017.

Alexandra Chronopoulou, Matthew E Peters, Alexander Fraser, and Jesse Dodge. Adaptersoup: Weight averaging to improve generalization of pretrained language models. In Association for Computational Linguistics (ACL), 2023.

Nicola De Cao, Wilker Aziz, and Ivan Titov. Editing factual knowledge in language models. arXiv preprint arXiv:2104.08164, 2021.

Dongyang Fan, Celestine Mendler-Dünner, and Martin Jaggi. Collaborative learning via prediction consensus. Neural Information Processing Systems (NeurIPS), 2024.

Larry C Grove. Classical groups and geometric algebra, volume 39. American Mathematical Soc., 2002.

Akshat Gupta, Anurag Rao, and Gopala Anumanchipalli. Model editing at scale leads to gradual and catastrophic forgetting. arXiv preprint arXiv:2401.07453, 2024.

Brian C Hall and Brian C Hall. Lie groups, Lie algebras, and representations. Springer, 2013.

Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. Editing models with task arithmetic. In International Conference on Learning Representations (ICLR), 2023.

Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, and Minjoon Seo. Knowledge unlearning for mitigating privacy risks in language models. In Association for Computational Linguistics (ACL), 2023.

Peter Kairouz, H Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al. Advances and open problems in federated learning. Foundations and Trends in Machine Learning, 14(1–2):1–210, 2021.

Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian Stich, and Ananda Theertha Suresh. Scaffold: Stochastic controlled averaging for federated learning. In International Conference on Machine Learning (ICML), pp. 5132–5143. PMLR, 2020.

Omer Levy, Minjoon Seo, Eunsol Choi, and Luke Zettlemoyer. Zero-shot relation extraction via reading comprehension. In Computational Natural Language Learning (CoNLL), 2017.

Haoran Li, Mingshi Xu, and Yangqiu Song. Sentence embedding leaks more information than you expect: Generative embedding inversion attack to recover the whole sentence. arXiv preprint arXiv:2305.03010, 2023a.

Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, and Zhihua Zhang. On the convergence of FedAvg on non-IID data. arXiv preprint arXiv:1907.02189, 2019.

Zexi Li, Tao Lin, Xinyi Shang, and Chao Wu. Revisiting weighted aggregation in federated learning with neural networks. In International Conference on Machine Learning (ICML), pp. 19767–19788. PMLR, 2023b.


Zhoubo Li, Ningyu Zhang, Yunzhi Yao, Mengru Wang, Xi Chen, and Huajun Chen. Unveiling the pitfalls of knowledge editing for large language models. In International Conference on Learning Representations (ICLR), 2024.

Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics (AISTATS), pp. 1273–1282. PMLR, 2017.

Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT. In Neural Information Processing Systems (NeurIPS), 2022.

Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, and David Bau. Mass-editing memory in a transformer. In International Conference on Learning Representations (ICLR), 2023.

Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D Manning. Fast model editing at scale. In International Conference on Learning Representations (ICLR), 2022a.

Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D Manning, and Chelsea Finn. Memory-based model editing at scale. In International Conference on Machine Learning (ICML), pp. 15817–15831. PMLR, 2022b.

Amirkeivan Mohtashami, Florian Hartmann, Sian Gooding, Lukas Zilka, Matt Sharifi, et al. Social learning: Towards collaborative learning with large language models. arXiv preprint arXiv:2312.11441, 2023.

Dale Morse and Gordon Richardson. The LIFO/FIFO decision. Journal of Accounting Research, pp. 106–127, 1983.

Guillermo Ortiz-Jimenez, Alessandro Favero, and Pascal Frossard. Task arithmetic in the tangent space: Improved editing of pre-trained models. In Neural Information Processing Systems (NeurIPS), 2023.

Shuofei Qiao, Yixin Ou, Ningyu Zhang, Xiang Chen, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, and Huajun Chen. Reasoning with language model prompting: A survey. In Association for Computational Linguistics (ACL), 2023.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020. URL http://jmlr.org/papers/v21/20-074.html.

Gilbert Strang. Introduction to linear algebra. SIAM, 2022.

Chenmien Tan, Ge Zhang, and Jie Fu. Massive editing for large language models via meta learning. In International Conference on Learning Representations (ICLR), 2024.

Ben Wang and Aran Komatsuzaki. GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. https://github.com/kingoflolz/mesh-transformer-jax, 2021.

Jianyu Wang, Zachary Charles, Zheng Xu, Gauri Joshi, H Brendan McMahan, Maruan Al-Shedivat, Galen Andrew, Salman Avestimehr, Katharine Daly, Deepesh Data, et al. A field guide to federated optimization. arXiv preprint arXiv:2107.06917, 2021.

Mitchell Wortsman, Gabriel Ilharco, Samir Ya Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, et al. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In International Conference on Machine Learning (ICML), 2022.

Leijie Wu, Song Guo, Junxiao Wang, Zicong Hong, Jie Zhang, and Yaohong Ding. Federated unlearning: Guarantee the right of clients to forget. IEEE Network (IEEE Netw.), 2022.

Leijie Wu, Song Guo, Junxiao Wang, Zicong Hong, Jie Zhang, and Jingren Zhou. On knowledge editing in federated learning: Perspectives, challenges, and future directions. arXiv preprint arXiv:2306.01431, 2023.

Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, and Mohit Bansal. TIES-Merging: Resolving interference when merging models. In Neural Information Processing Systems (NeurIPS), 2023.


Rui Ye, Wenhao Wang, Jingyi Chai, Dihan Li, Zexi Li, Yinda Xu, Yaxin Du, Yanfeng Wang, and Siheng Chen. OpenFedLLM: Training large language models on decentralized private data via federated learning. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2024.

Ningyu Zhang, Yunzhi Yao, and Shumin Deng. Editing large language models. In International Joint Conference on Natural Language Processing and the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL), 2023.

W. Zhao and J.A. Stankovic. Performance analysis of FCFS and improved FCFS scheduling algorithms for dynamic real-time computer systems. In Proceedings of the Real-Time Systems Symposium (RTSS), 1989.


A DETAILS OF KNOWLEDGE EDITING IN A SINGLE LLM

Details of identifying the critical path of MLP layers. Following MEMIT (Meng et al., 2023), we apply causal tracing to LLMs (e.g., GPT-2 XL) to identify the critical path of MLP layers to edit. For consistency, we edit the same set of layers R as MEMIT, e.g., layers 13–17 of GPT-2 XL.

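Details of the closed-form optimization of Δ for a single layer. We optimize the following objective to obtain the optimal edited weights $W + \Delta$ of layer l:

$$W + \Delta = \arg\min_{\hat{W}} \left( \sum_{i=1}^{n} \big\| \hat{W} k_i - m_i \big\|^2 + \sum_{i=n+1}^{n+|E|} \big\| \hat{W} k_i - m_i \big\|^2 \right), \tag{10}$$

where $k_i$ ($1 \le i \le n$) are the old keys derived from existing knowledge and $k_i$ ($n + 1 \le i \le n + |E|$) are the new keys derived from the edit requests E.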

Next, we denote by W the model weights before knowledge editing, by $K_{init} = [k_1, \dots, k_n]$ the old keys derived from existing knowledge, and by $K = [k_{n+1}, \dots, k_{n+|E|}]$ the new keys derived from the edit requests E. Moreover, $M_{init} = [m_1, \dots, m_n] = WK_{init}$ denotes the previously stored memory values of $K_{init}$. $M = [m_{n+1}, \dots, m_{n+|E|}]$ denotes the desired memory values of K that we aim to store. We can solve Equation (10) by applying the normal equation (Strang, 2022):

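$$(W + \Delta)(K_{init}K_{init}^\top + KK^\top) = M_{init}K_{init}^\top + MK^\top,$$

$$WK_{init}K_{init}^\top + WKK^\top + \Delta K_{init}K_{init}^\top + \Delta KK^\top = M_{init}K_{init}^\top + MK^\top. \tag{11}$$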

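In addition, we define two variables:
(1) $C \triangleq K_{init}K_{init}^\top$, the covariance matrix of the input keys of existing knowledge;
(2) $R \triangleq M - WK$, the residual error of the new associations evaluated on the old weights W.
Since $M_{init} = WK_{init}$, the terms $WK_{init}K_{init}^\top$ and $M_{init}K_{init}^\top$ in Equation (11) cancel, leaving $\Delta(C + KK^\top) = (M - WK)K^\top = RK^\top$. We thus obtain the closed-form solution of the weight updates: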

$$\Delta = RK^\top (C + KK^\top)^{-1}. \tag{12}$$

We compute $C = \mu\, \mathbb{E}_k[kk^\top]$, where $\mathbb{E}_k[kk^\top]$ is an uncentered covariance statistic estimated from an empirical sample of vector inputs to the layer (e.g., 100,000 Wikipedia records). µ is a hyperparameter that balances the weighting of new vs. old associations; a typical value is $\mu = 1.5 \times 10^4$, according to MEMIT.
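Equation (12) and the estimate of C translate directly into a few lines of NumPy. This is a sketch on random data. The 1,000 synthetic keys stand in for the Wikipedia-derived statistics, and μ = 1 is used only because the synthetic keys have unit scale. The value 1.5 × 10^4 quoted above applies to real model statistics, not to this toy data. Function names are hypothetical.

```python
import numpy as np

def estimate_key_covariance(sample_keys, mu):
    """C = mu * E[k k^T], the uncentered second moment of sampled layer inputs (one key per column)."""
    n = sample_keys.shape[1]
    return mu * (sample_keys @ sample_keys.T) / n

def closed_form_delta(W, K, M_target, C):
    """Delta = R K^T (C + K K^T)^{-1} with R = M - W K (Equation 12)."""
    R = M_target - W @ K
    A = C + K @ K.T
    return np.linalg.solve(A, K @ R.T).T  # A is symmetric, so this equals R K^T A^{-1}

rng = np.random.default_rng(0)
d, d_out = 32, 16
W = rng.normal(size=(d_out, d))                                   # pre-edit weights of layer l
C = estimate_key_covariance(rng.normal(size=(d, 1000)), mu=1.0)   # toy sample, toy mu
K = rng.normal(size=(d, 4))                                       # keys of 4 edit requests
M_target = rng.normal(size=(d_out, 4))                            # desired memory values
delta = closed_form_delta(W, K, M_target, C)
R = M_target - W @ K
print(np.allclose(delta @ (C + K @ K.T), R @ K.T))               # normal equation holds: True
print(np.linalg.norm(M_target - (W + delta) @ K) < np.linalg.norm(R))  # residual shrinks: True
```

Using `np.linalg.solve` instead of forming an explicit inverse is the usual choice for numerical stability.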

Details of simultaneously editing multiple layers. So far we have illustrated how existing knowledge editing algorithms edit a single layer of the LLM. To edit multiple layers l ∈ R simultaneously, existing editing algorithms (e.g., MEMIT (Meng et al., 2023)) first obtain the desired output vector $z_i$ of the final layer in R that maximizes $\Pr[o_i \mid x \oplus p(s_i, r_i)]$. They then spread the whole residual over all the layers in R by computing the partial residual of each layer l ∈ R as

$$r_i^l = \frac{z_i - W^l k_i^l}{L - l + 1},$$

where L is the last layer in R. The desired memory value of layer l is then $m_i^l = W^l k_i^l + r_i^l$, and Equation (12) is used to edit each layer. For implementation details, please refer to Meng et al. (2023). In this work, we strictly follow their implementation to edit multiple layers simultaneously.
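A small sketch shows why the denominator is L − l + 1. The layers of R are processed in increasing order, and each takes the current residual divided by the number of layers still left. Under this scheme, every layer receives an equal share. The sketch assumes R is a contiguous block of layers (as with layers 13–17 of GPT-2 XL) and that each layer's share fully propagates to the final layer. The function name is hypothetical.

```python
import numpy as np

def spread_residual(total_residual, layers):
    """Split a final-layer residual over the layers of R, visited in increasing order.

    Assumes `layers` is contiguous (so L - l + 1 counts the layers still to be edited) and that
    each layer's share propagates unchanged to the final layer, shrinking the remaining residual.
    """
    L = max(layers)
    remaining = np.asarray(total_residual, dtype=float).copy()
    shares = {}
    for l in sorted(layers):
        share = remaining / (L - l + 1)  # r^l = (z - current output) / (L - l + 1)
        shares[l] = share
        remaining = remaining - share
    return shares

shares = spread_residual([5.0, -10.0], range(13, 18))  # e.g., layers 13-17 of GPT-2 XL
print({l: s.tolist() for l, s in shares.items()})      # every layer gets [1.0, -2.0]
```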

B THEORETICAL ANALYSIS OF THE METHODS

For ease of understanding, we describe knowledge editing for a specific layer l and omit l for brevity. We denote by $\Delta_G$ and $\Delta_i$ the weight updates derived from GLOBAL-EDIT and from client i's edits, respectively. $K_G$ and $K_i$ represent the new keys derived from all edit requests and from client i's edit requests. Following Section 3.1, $R_G$ and $R_i$ represent the residual errors in the output space of layer l derived from all edit requests and from client i's edit requests, respectively. C represents the aggregated statistic over the previously stored keys of existing knowledge. We consider a collaborative editing scenario with N clients, each of which has M edit requests.


B.1 ANALYSIS OF THE NON-DESTRUCTIVE COLLABORATIVE KNOWLEDGE EDITING

Note that $\Delta_i$ and $\Delta_G$ can be computed via Equation (2) as:

$$\Delta_G = R_G K_G^\top (C + K_G K_G^\top)^{-1}, \qquad \Delta_i = R_i K_i^\top (C + K_i K_i^\top)^{-1}. \tag{13}$$

Following the definitions of K and R in Section 3.1, we have:

$$K_i = [k_{(i-1)M+1}, k_{(i-1)M+2}, \dots, k_{iM}],$$

$$R_i = [r_{(i-1)M+1}, r_{(i-1)M+2}, \dots, r_{iM}],$$

$$K_G = [k_1, k_2, \dots, k_{NM}] = [K_1, K_2, \dots, K_N], \qquad R_G = [r_1, r_2, \dots, r_{NM}] = [R_1, R_2, \dots, R_N]. \tag{14}$$

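Then we have:

$$R_G K_G^\top = R_1K_1^\top + R_2K_2^\top + \dots + R_NK_N^\top. \tag{15}$$

According to Equations (13) and (15), we obtain:

$$\begin{aligned} \Delta_G \Big(C + \sum_{j=1}^{N} K_jK_j^\top\Big) &= \Delta_G (C + K_1K_1^\top + \dots + K_NK_N^\top) \\ &= \Delta_G (C + K_GK_G^\top) \\ &= R_GK_G^\top = R_1K_1^\top + R_2K_2^\top + \dots + R_NK_N^\top \\ &= \Delta_1(C + K_1K_1^\top) + \dots + \Delta_N(C + K_NK_N^\top) \\ &= \sum_{i=1}^{N} \Delta_i (C + K_iK_i^\top). \end{aligned} \tag{16}$$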

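According to Equation (16), we finally reach the following conclusion:

$$\Delta_G = \sum_{i=1}^{N} \Delta_i (C + K_iK_i^\top)\Big(C + \sum_{j=1}^{N} K_jK_j^\top\Big)^{-1}. \tag{17}$$

Hence the server can reproduce the GLOBAL-EDIT update exactly from each client's $\Delta_i$ and $K_iK_i^\top$ (together with C), without access to the individual keys or edit requests.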
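Equation (17) can be verified numerically. The sketch checks that the server can reproduce GLOBAL-EDIT from each client's Δ_i and K_iK_i^⊤ (plus C) alone. It uses random residuals R_i and keys K_i rather than values derived from a real model. The helper `right_solve` is a hypothetical convenience for multiplying by an inverse on the right.

```python
import numpy as np

def right_solve(X, A):
    """Return X A^{-1} for a symmetric positive-definite A."""
    return np.linalg.solve(A, X.T).T

rng = np.random.default_rng(0)
d, d_out, N, M = 24, 12, 4, 3          # hypothetical layer sizes, clients, requests per client
S = rng.normal(size=(d, 2 * d))
C = S @ S.T / (2 * d)                  # stand-in SPD statistic of existing knowledge
Ks = [rng.normal(size=(d, M)) for _ in range(N)]      # K_i
Rs = [rng.normal(size=(d_out, M)) for _ in range(N)]  # R_i

# Client side (Eq. 13): local update plus the shared statistic K_i K_i^T.
deltas = [right_solve(R @ K.T, C + K @ K.T) for K, R in zip(Ks, Rs)]
kkts = [K @ K.T for K in Ks]

# Server side (Eq. 17): merge using only Delta_i, K_i K_i^T and C.
A = C + sum(kkts)
merged = right_solve(sum(D @ (C + kkt) for D, kkt in zip(deltas, kkts)), A)

# Reference: GLOBAL-EDIT on the pooled requests (Eq. 13 with K_G, R_G).
K_G, R_G = np.hstack(Ks), np.hstack(Rs)
delta_G = right_solve(R_G @ K_G.T, C + K_G @ K_G.T)
print(np.allclose(merged, delta_G))    # True
```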

B.2 ANALYSIS OF THE GAP BETWEEN TWO EDITING METHODS

According to Equation (17), the relationship between $\Delta_G$ and $\Delta_i$ is:

$$\Delta_G = \Delta_1 (C + K_1K_1^\top) A^{-1} + \dots + \Delta_N (C + K_NK_N^\top) A^{-1}, \tag{18}$$

where $A = C + \sum_{j=1}^{N} K_jK_j^\top$. Furthermore, we denote by $\tilde{\Delta}_G$ the weight updates derived from the destructive collaborative knowledge editing method using Task-Arithmetic (TA). We have:

$$\tilde{\Delta}_G = \lambda (\Delta_1 + \Delta_2 + \dots + \Delta_N). \tag{19}$$

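Then, the gap between $\Delta_G$ and $\tilde{\Delta}_G$ can be calculated as:

$$\Delta_G - \tilde{\Delta}_G = \sum_{i=1}^{N} \Delta_i (C + K_iK_i^\top) A^{-1} - \lambda \sum_{i=1}^{N} \Delta_i = \sum_{i=1}^{N} \Delta_i \Big[ (C + K_iK_i^\top)\Big(C + \sum_{j=1}^{N} K_jK_j^\top\Big)^{-1} - \lambda I \Big]. \tag{20}$$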
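The gap in Equation (20) can also be checked numerically. The sketch computes Δ_G through Equation (18) and the Task-Arithmetic merge for two illustrative scaling factors: λ = 1/N (simple averaging) and λ = 1. It confirms that the direct difference matches the bracketed form of Equation (20) and is nonzero. The random inputs and the helper `right_solve` are hypothetical.

```python
import numpy as np

def right_solve(X, A):
    """Return X A^{-1} for a symmetric positive-definite A."""
    return np.linalg.solve(A, X.T).T

rng = np.random.default_rng(1)
d, d_out, N, M = 24, 12, 4, 3
S = rng.normal(size=(d, 2 * d))
C = S @ S.T / (2 * d)
Ks = [rng.normal(size=(d, M)) for _ in range(N)]
Rs = [rng.normal(size=(d_out, M)) for _ in range(N)]
deltas = [right_solve(R @ K.T, C + K @ K.T) for K, R in zip(Ks, Rs)]
A = C + sum(K @ K.T for K in Ks)
delta_G = sum(right_solve(D @ (C + K @ K.T), A) for D, K in zip(deltas, Ks))  # Eq. (18)

for lam in (1.0 / N, 1.0):  # simple averaging and plain summation of client updates
    ta = lam * sum(deltas)                                                   # Eq. (19)
    gap_formula = sum(
        right_solve(D @ (C + K @ K.T), A) - lam * D for D, K in zip(deltas, Ks)
    )                                                                        # Eq. (20)
    assert np.allclose(delta_G - ta, gap_formula)
    print(lam, np.linalg.norm(delta_G - ta))  # nonzero: TA does not recover GLOBAL-EDIT
```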

C EVALUATION METRICS

C.1 METRICS FOR MULTI-COUNTERFACT

Multi-CounterFact (MCF) contains an assortment of prompts and texts for evaluating model rewrites. For a subject-relation pair $(s_i, r_i)$, knowledge editing aims to rewrite the old object $o_i^c$ with the new desired object $o_i$. We use the same metrics as previous work (Meng et al., 2023) for evaluation:

Efficacy Success (ES) is the proportion of cases where the new object $o_i$ exceeds the old object $o_i^c$ in probability:

$$\mathbb{E}_i \big[ \Pr_{M_\theta}[o_i \mid p(s_i, r_i)] > \Pr_{M_\theta}[o_i^c \mid p(s_i, r_i)] \big]. \tag{21}$$


Paraphrase Success (PS) is the proportion of cases where the new object $o_i$ exceeds the old object $o_i^c$ in probability on rephrasings of the original statement:

$$\mathbb{E}_i \mathbb{E}_{p \in \text{paraphrases}(s_i, r_i)} \big[ \Pr_{M_\theta}[o_i \mid p] > \Pr_{M_\theta}[o_i^c \mid p] \big]. \tag{22}$$

Neighborhood Success (NS) is the proportion of neighborhood prompts (all of which share the same old object $o_i^c$) for which the model still assigns a higher probability to the old object:

$$\mathbb{E}_i \mathbb{E}_{p \in \text{neighborhood prompts}(s_i, r_i)} \big[ \Pr_{M_\theta}[o_i \mid p] < \Pr_{M_\theta}[o_i^c \mid p] \big]. \tag{23}$$
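Equations (21) to (23) share one pattern: an inner average over the prompts of each edit and an outer average over edits. The sketch below assumes the probabilities of the new object $o_i$ and the old object $o_i^c$ have already been read off the model for every prompt. The helper name and the toy numbers are hypothetical. ES is the special case with a single prompt per edit, and NS flips the comparison.

```python
import numpy as np

def nested_success(p_new_per_edit, p_old_per_edit, new_should_win=True):
    """E_i E_p [Pr(o_i|p) > Pr(o_i^c|p)] in percent (uses '<' when new_should_win is False)."""
    per_edit = []
    for p_new, p_old in zip(p_new_per_edit, p_old_per_edit):
        p_new = np.asarray(p_new, dtype=float)
        p_old = np.asarray(p_old, dtype=float)
        wins = p_new > p_old if new_should_win else p_new < p_old
        per_edit.append(wins.mean())          # inner expectation over the prompts of edit i
    return 100.0 * float(np.mean(per_edit))   # outer expectation over edits

es = nested_success([[0.6], [0.2]], [[0.3], [0.5]])                    # one prompt per edit
ps = nested_success([[0.5, 0.1], [0.4]], [[0.2, 0.3], [0.1]])          # paraphrases
ns = nested_success([[0.1, 0.6]], [[0.4, 0.2]], new_should_win=False)  # neighborhood prompts
print(es, ps, ns)  # 50.0 75.0 50.0
```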

C.2 METRICS FOR ZSRE

For consistency, we report the same three accuracy-based metrics as previous work (Meng et al., 2023) to evaluate the editing performance on zsRE when using MEMIT (Meng et al., 2023):

Efficacy Accuracy (EA) is the proportion of edits that the model $M_\theta$ recalls with top-1 accuracy. Specifically, the edited model $M_\theta$ should recall the target object $o_i$ with the largest probability, given a templated prompt $p(s_i, r_i)$ containing $s_i$ and $r_i$:

$$\mathbb{E}_i \Big[ o_i = \arg\max_{o'} \Pr_{M_\theta}[o' \mid p(s_i, r_i)] \Big].$$

Paraphrase Accuracy (PA) is the top-1 accuracy on rephrasings of the original statement:

$$\mathbb{E}_i \mathbb{E}_{p \in \text{paraphrases}(s_i, r_i)} \Big[ o_i = \arg\max_{o'} \Pr_{M_\theta}[o' \mid p] \Big].$$

Neighborhood Accuracy (NA) is the proportion of neighborhood prompts for which the model still correctly predicts the old object $o_i^c$:

$$\mathbb{E}_i \mathbb{E}_{p \in \text{neighborhood prompts}(s_i, r_i)} \Big[ o_i^c = \arg\max_{o'} \Pr_{M_\theta}[o' \mid p] \Big].$$
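The accuracy metrics replace the pairwise comparison with a top-1 check. As a simplified sketch, assume each prompt comes with scores over a small, hypothetical candidate set of objects. The original evaluation may operate at the token level, which this sketch does not reproduce. EA uses the templated prompt, PA the paraphrases, and NA the neighborhood prompts with the old object $o_i^c$ as the answer.

```python
from typing import Dict, List, Sequence

def nested_top1_accuracy(
    scores_per_edit: Sequence[List[Dict[str, float]]], answers: Sequence[str]
) -> float:
    """E_i E_p [answer_i == argmax_o' score(o' | p)] in percent."""
    per_edit = []
    for prompt_scores, answer in zip(scores_per_edit, answers):
        hits = [max(scores, key=scores.get) == answer for scores in prompt_scores]
        per_edit.append(sum(hits) / len(hits))   # inner average over prompts of edit i
    return 100.0 * sum(per_edit) / len(per_edit)  # outer average over edits

scores = [
    [{"English": 0.7, "French": 0.2}, {"English": 0.3, "French": 0.6}],  # edit 1: two prompts
    [{"English": 0.1, "Spanish": 0.8}],                                   # edit 2: one prompt
]
print(nested_top1_accuracy(scores, ["English", "Spanish"]))  # 75.0
```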

D ALGORITHM OF OUR COLLABEDIT

Algorithm 1 COLLABEDIT: Non-destructive Collaborative Knowledge Editing
Require: The number of clients N; edit requests E_i of each client (1 ≤ i ≤ N), where E_i = {(s_ij, r_ij, o_ij)}_j; language model M_θ with weights W^l of layer l; a set of MLP layers to edit R; covariance matrix C of existing knowledge (optional, for direct editing methods, e.g., MEMIT); hyper-network H with learnable parameter κ^l for layer l (optional, for hypernetwork-based editing methods, e.g., MALMEN); a set of prompt templates P.
Ensure: Edited language model M_θ' with updated weights W'^l = W^l + Δ^l of layer l.

1: Δlist = [ ], KKTlist = [ ]
2: for i = 1, …, N do
3:     Δ_i list, KKT_i list ← GetDeltaAndKKT(E_i, M_θ, C, H, P)
4:     Δlist.append(Δ_i list), KKTlist.append(KKT_i list)
5: for l ∈ R do
6:     A ← C                                    ▷ direct editing (e.g., MEMIT)
7:     A ← κ^l I                                ▷ hypernetwork-based editing (e.g., MALMEN)
8:     for i = 1, …, N do
9:         K_i^l K_i^l⊤ = KKTlist[i][l], Δ_i^l = Δlist[i][l]
10:        A ← A + K_i^l K_i^l⊤
11:        Δ_i^l ← Δ_i^l (C + K_i^l K_i^l⊤)         ▷ direct editing
12:        Δ_i^l ← Δ_i^l (κ^l I + K_i^l K_i^l⊤)     ▷ hypernetwork-based editing
13:    W'^l ← W^l + Σ_{i=1}^{N} Δ_i^l A^{-1}


Algorithm 2 GetDeltaAndKKT

1: procedure GETDELTAANDKKT(E_i, M_θ, C, H, P)
2:     for (s_j, r_j, o_j) ∈ E_i do
3:         L_j ← −(1/|P|) Σ_{k=1}^{|P|} log Pr_{M_θ}[o_j | P_k(s_j, r_j)]
4:         optimize z_j ← argmin_{z_j} L_j      ▷ the desired output of the modified layers to output o_j given (s_j, r_j)
5:         Cache ∇L_j
6:     Δlist = [ ], KKTlist = [ ]
7:     for l ∈ R do
8:         h_i^l ← h_i^{l−1} + a_i^l + m_i^l
9:         for (s_j, r_j, o_j) ∈ E_i do
10:            k_i^l ← (1/|P|) Σ_{k=1}^{|P|} k^l(P_k(s_j, r_j))
11:            r_i^l ← (z_j − W^l k_i^l) / (R[−1] − l + 1)      ▷ direct editing
12:            r_i^l ← H(k_i^l, ∇_{k_i^l} L_j) k_i^l            ▷ hypernetwork-based editing
13:        K^l ← [k_1^l, …, k_i^l]
14:        R^l ← [r_1^l, …, r_i^l]
15:        Δ^l ← R^l K^l⊤ (C^l + K^l K^l⊤)^{-1}      ▷ direct editing
16:        Δ^l ← R^l K^l⊤ (λ^l I + K^l K^l⊤)^{-1}    ▷ hypernetwork-based editing
17:        Δlist.append(Δ^l), KKTlist.append(K^l K^l⊤)
18:    return Δlist, KKTlist


Table 6: Knowledge conflict can compromise the editing performance of collaborative KE. E and E' are defined in Section 5.3. "Edit E" indicates that only the requests in E are edited, while "Edit E and E'" indicates that the requests in both sets are edited. We evaluate the editing performance on the edit requests in E.

Model | Method | NS | PS | ES | NA | PA | EA
GPT-J | Edit E | 57.09 | 96.31 | 99.2 | 5.32 | 69.24 | 91.96
GPT-J | Edit E and E' | 60.59 | 85.43 | 91.18 | 5.33 | 27.83 | 48.84
GPT2-XL | Edit E | 64.85 | 81.06 | 89.56 | 8.5 | 38.89 | 58.28
GPT2-XL | Edit E and E' | 63.77 | 69.89 | 78.16 | 7.54 | 15.96 | 24.28

Table 7: A summary of scenarios of knowledge conflict.

Situation | Analysis
m1 = m2 | Two conflicting editing events e1 and e2 are made by the same client. In this case, the client can directly apply knowledge augmentation techniques (e.g., Multi-label Editing (Li et al., 2024)) to overwrite its previous knowledge.
m1 ≠ m2 and t1 = t2 | Two conflicting editing events e1 and e2 are made by different clients in the same round of editing. In this case, we need the two-stage mechanism described in Section 4.2.2 to resolve the conflict.
m1 ≠ m2 and t1 ≠ t2 | Two conflicting editing events e1 and e2 are made by different clients in different rounds of editing. In this case, we need the two-stage mechanism described in Section 4.2.2 to resolve the conflict.
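The case analysis in Table 7 reduces to two comparisons. The sketch below encodes it with a hypothetical `EditEvent` record that holds the client index m and the editing round t. The returned strings only label the handling described in the table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EditEvent:
    client: int    # m in Table 7
    round_id: int  # t in Table 7

def conflict_handling(e1: EditEvent, e2: EditEvent) -> str:
    """Map two conflicting editing events to the handling summarized in Table 7."""
    if e1.client == e2.client:
        return "local overwrite with knowledge augmentation"
    if e1.round_id == e2.round_id:
        return "two-stage mechanism (same round)"
    return "two-stage mechanism (different rounds)"

print(conflict_handling(EditEvent(client=0, round_id=2), EditEvent(client=1, round_id=2)))
# two-stage mechanism (same round)
```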

[Figure 6 plots: H-mean of (ES, PS, NS) (y-axis) versus number of clients, 10 to 70 (x-axis), for Global-Edit and CollabEdit. Panel (a): a total of 5,000 edit requests. Panel (b): every client edits 100 requests.]

Figure 6: Comparison of global KE (GLOBAL-EDIT) and collaborative KE with different numbers of clients.


E CONCRETE EXAMPLE OF GENERATED CONFLICT OBJECT

Instruction:

Prompts for generating a conﬂict object

{"You're tasked with creating a new benchmark for conflicts in knowledge clipping. Given a set of data structure as shown in the example, your objective is to generate a conflict object. Specifically:

1) Modify the 'target_new' field to a similar but different && incorrect answer. , 2) Adapt the 'attribute_prompts' accordingly to reflect the modified 'target_new'. , 3) Provide the output in JSON format, containing the modified 'target_new' and the adjusted 'attribute_prompts'.
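Once the LLM returns its JSON answer, building the conflict record is a small dictionary operation. The sketch assumes the reply is valid JSON containing exactly the two fields requested in step 3) of the prompt. The function name and the trimmed example record are illustrative. The subject and relation are copied unchanged, so the result conflicts with the original in the sense of Section 4.2.2 (s' = s, r' = r, o' ≠ o).

```python
import copy
import json

def build_conflict_record(original: dict, llm_response: str) -> dict:
    """Copy an MCF record, replacing target_new and attribute_prompts with the LLM's reply."""
    patch = json.loads(llm_response)
    record = copy.deepcopy(original)  # leave the original MCF record untouched
    record["requested_rewrite"]["target_new"] = patch["target_new"]
    record["attribute_prompts"] = patch["attribute_prompts"]
    old = original["requested_rewrite"]["target_new"]["str"]
    if record["requested_rewrite"]["target_new"]["str"] == old:
        raise ValueError("the conflict object must change target_new")
    return record

original = {
    "case_id": 0,
    "requested_rewrite": {
        "prompt": "The mother tongue of {} is",
        "subject": "Danielle Darrieux",
        "target_new": {"str": "English", "id": "Q1860"},
    },
    "attribute_prompts": ["The mother tongue of Douglas Adams is"],
}
reply = '{"target_new": {"str": "Spanish", "id": -1}, "attribute_prompts": ["Barack Obama, speaker of"]}'
conflict = build_conflict_record(original, reply)
print(conflict["requested_rewrite"]["target_new"]["str"])  # Spanish
print(original["requested_rewrite"]["target_new"]["str"])  # English
```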

Outputs: A conﬂict object response for case-0 in MCF by GPT-3.5-turbo

case-0-conﬂict:

{ "case_id": 0, "pararel_idx": 2796, "requested_rewrite": { "prompt": "The mother tongue of {} is", "relation_id": "P103", "target_new": { "str": "Spanish", "id": -1 }, "target_true": { "str": "French", "id": "Q150" }, "subject": "Danielle Darrieux" }, "paraphrase_prompts": [ "Shayna does this and Yossel goes still and dies. Danielle Darrieux, a native", , "An album was recorded for Capitol Nashville but never released. Danielle Darrieux spoke the language"

, ], "neighborhood_prompts": [ "The mother tongue of Léon Blum is", "The native language of Montesquieu is", "François Bayrou, a native", "The native language of Raymond Barre is", "Michel Rocard is a native speaker of", "Jacques Chaban-Delmas is a native speaker of",


"The native language of François Bayrou is", "Maurice Genevoix, speaker of", "The mother tongue of François Bayrou is", "Melchior de Vogüé, speaker of" ], "attribute_prompts": [ "J.\\xa0R.\\xa0R. Tolkien is a native speaker of", "The mother tongue of Douglas Adams is", "The mother tongue of Paul Mc Cartney is", "Elvis Presley is a native speaker of", "Barack Obama, speaker of", "Douglas Adams, speaker of", "Meryl Streep, a native", "George Orwell spoke the language", "George Washington, a native", "Michael Jackson, a native" ], "generation_prompts": [ "Danielle Darrieux's mother tongue is", "Where Danielle Darrieux is from, people speak the language of", , "Danielle Darrieux's mother tongue is", "Danielle Darrieux was born in", "Danielle Darrieux's mother tongue is", "Danielle Darrieux's mother tongue is", "Danielle Darrieux was born in", "Where Danielle Darrieux is from, people speak the language of", , "Danielle Darrieux was born in", "Danielle Darrieux was born in" ] }

Outputs: Original data of case-0 in MCF

case-0-conﬂict:

{ "case_id": 0, "pararel_idx": 2796, "requested_rewrite": { "prompt": "The mother tongue of {} is", "relation_id": "P103", "target_new": { "str": "English", "id": "Q1860" }, "target_true": { "str": "French", "id": "Q150" }, "subject": "Danielle Darrieux" }, "paraphrase_prompts": [ "Shayna does this and Yossel goes still and dies. Danielle Darrieux, a native", ,


"An album was recorded for Capitol Nashville but never released. Danielle Darrieux spoke the language"

, ], "neighborhood_prompts": [ "The mother tongue of L\u00e9on Blum is", "The native language of Montesquieu is", "Fran\u00e7ois Bayrou, a native", "The native language of Raymond Barre is", "Michel Rocard is a native speaker of", "Jacques Chaban-Delmas is a native speaker of", "The native language of Fran\u00e7ois Bayrou is", "Maurice Genevoix, speaker of", "The mother tongue of Fran\u00e7ois Bayrou is", "Melchior de Vog\u00fc\u00e9, speaker of" ], "attribute_prompts": [ "J.\u00a0R.\u00a0R. Tolkien is a native speaker of", "The mother tongue of Douglas Adams is", "The mother tongue of Paul Mc Cartney is", "Elvis Presley is a native speaker of", "Barack Obama, speaker of", "Douglas Adams, speaker of", "Meryl Streep, a native", "George Orwell spoke the language", "George Washington, a native", "Michael Jackson, a native" ], "generation_prompts": [ "Danielle Darrieux's mother tongue is", "Where Danielle Darrieux is from, people speak the language of", , "Danielle Darrieux's mother tongue is", "Danielle Darrieux was born in", "Danielle Darrieux's mother tongue is", "Danielle Darrieux's mother tongue is", "Danielle Darrieux was born in", "Where Danielle Darrieux is from, people speak the language of", , "Danielle Darrieux was born in", "Danielle Darrieux was born in" ] }