Published as a conference paper at ICLR 2025

UNIFIED PARAMETER-EFFICIENT UNLEARNING FOR LLMS

Chenlu Ding1, Jiancan Wu1, Yancheng Yuan2, Jinda Lu1, Kai Zhang1, Alex Su1, Xiang Wang1, Xiangnan He1,3
1University of Science and Technology of China
2Hong Kong Polytechnic University
3MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, USTC

ABSTRACT

The advent of Large Language Models (LLMs) has revolutionized natural language processing, enabling advanced understanding and reasoning capabilities across a variety of tasks. Fine-tuning these models for specific domains, particularly through Parameter-Efficient Fine-Tuning (PEFT) strategies like LoRA, has become a prevalent practice due to its efficiency. However, this raises significant privacy and security concerns, as models may inadvertently retain and disseminate sensitive or undesirable information. To address these issues, we introduce a novel instance-wise unlearning framework, LLMEraser, which systematically categorizes unlearning tasks and applies precise parameter adjustments using influence functions. Unlike traditional unlearning techniques that are often limited in scope and require extensive retraining, LLMEraser is designed to handle a broad spectrum of unlearning tasks without compromising model performance. Extensive experiments on benchmark datasets demonstrate that LLMEraser excels in efficiently managing various unlearning scenarios while maintaining the overall integrity and efficacy of the models. Our code is available at https://github.com/oceanoceanna/LLMEraser.
1 INTRODUCTION

Large language models (LLMs) demonstrate remarkable capabilities in knowledge understanding and complex reasoning (Li et al., 2023; Zhang et al., 2024b; Li, 2024; Li et al., 2024; Lee et al., 2024), which has sparked increasing interest in adapting LLMs to specific domains through fine-tuning techniques (Li & Liang, 2021; Dettmers et al., 2023; Zhang et al., 2023; Zaken et al., 2022). Among them, Parameter-Efficient Fine-Tuning (PEFT) (Li & Liang, 2021; Liu et al., 2021), such as LoRA (Hu et al., 2022), has emerged as the mainstream paradigm, offering significant reductions in resource costs by fine-tuning only a small subset of parameters. While highly effective, the reliance on domain-specific data for fine-tuning raises concerns regarding data leakage and privacy (Lu et al., 2024; Blanco-Justicia et al., 2024), such as potentially memorizing or propagating sensitive, biased, copyrighted, or harmful information (Liu et al., 2024c; Qu et al., 2024). In this light, researchers have introduced unlearning techniques (Jang et al., 2023; Kurmanji et al., 2023; Kumar et al., 2023) into LLMs, to forget specific data without the time-consuming and resource-intensive process of retraining.

Prior efforts in exploring unlearning in LLMs primarily focus on removing specific concepts (Kassem et al., 2023; Jang et al., 2023). A typical example is the erasure of an LLM's ability to recall information related to the Harry Potter series (Eldan & Russinovich, 2023). While these efforts yield valuable insights, they risk inadvertently affecting related concepts, such as other novels with similar titles. In this work, we broaden the scope by investigating instance-wise unlearning tasks, which allow us to target more nuanced aspects of model behavior.

Work done at The Hong Kong Polytechnic University. Equal contribution. {dingchenlu200103,wujcan}@gmail.com. Correspondence to Jiancan Wu, Yancheng Yuan, and Xiangnan He: {wujcan@gmail.com, yancheng.yuan@polyu.edu.hk, xiangnanhe@gmail.com}.

Table 1: A summary of existing LLM unlearning methods and their application scenarios. E and A are abbreviations for Exact unlearning and Approximate unlearning, respectively.

Related Work                     Mode   Method
Retrain                          -      Retrain
SISA (Bourtoule et al., 2021)    E      Retrain Sub-model
Fair SISA (Kadhe et al., 2023)   E      Retrain Sub-model
APA (Hu et al., 2024c)           E      Retrain Sub-model
Gradient Ascent                  A      Fine-tuning
EUL (Chen & Yang, 2023)          A      Fine-tuning
E2URec (Wang et al., 2024)       A      Fine-tuning
LLMEraser (Ours)                 A      Parameter Editing

To this end, we first present various instance-wise unlearning tasks for LLMs, as illustrated in Figure 1. More case studies can be found in Appendix G. Specifically, consider a training instance z = (x, y) in a supervised fine-tuning dataset, where x represents the query and y is the response. We can categorize LLM unlearning tasks at the instance level as follows:

Instance Removal (IR): removes the sample z = (x, y) from the training set.
Query Modification (QM): adjusts the input tokens in the query x, such as removing specific noisy tokens or correcting certain erroneous tokens.
Response Correction (RC): corrects the model's response y, including updating outdated answers or rectifying incorrect classification results.

In this work, we focus on unlearning the domain-specific data used solely in PEFT, which requires updating the PEFT adapters (e.g., LoRA). Technically, recent LLM-unlearning efforts can be roughly grouped into two categories. Exact unlearning approaches divide data into disjoint shards and retrain adapters (Bourtoule et al., 2021; Hu et al., 2024c).
Despite their effectiveness, these methods have inherent limitations: they inevitably destroy the model's original structure and still incur retraining costs. Approximate unlearning methods, on the other hand, aim to replicate the performance of the retrained model, often aligning the output on the target data closely with randomness through KL-divergence-based PEFT (Liu et al., 2024a; Qu et al., 2024). Nonetheless, this paradigm primarily focuses on data removal (e.g., IR) and can hardly correct biased or inaccurate data (e.g., QM, RC), as it falls short in guiding the output on the target data towards accurate information rather than mere randomness. See Table 1 for a summary of current LLM unlearning methods, with detailed descriptions available in Appendix A. Overall, both approaches struggle to efficiently handle these instance-wise LLM unlearning tasks and are not specifically designed for unlearning within the PEFT framework. This calls for a general LLM unlearning method capable of addressing these various tasks. In pursuit of parameter-efficient unlearning, we identify the influence function (Koh & Liang, 2017) as a promising tool. At its core is formulating the parameter changes caused by data perturbations in the form of an inverse Hessian-vector-product (Agarwal et al., 2016), where the Hessian matrix represents the curvature of the loss function w.r.t. the model parameters. However, directly applying the influence function to LLMs presents two significant challenges: the expensive cost of calculating the inverse Hessian-vector-product for vast model parameters, and the cumulative errors introduced by approximation strategies (e.g., stochastic estimation (Agarwal et al., 2016)). Consequently, the use of influence functions for LLM unlearning remains largely underexplored. To fill this research gap, we propose a unified parameter-efficient unlearning framework, LLMEraser, for various instance-wise unlearning tasks.
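For reference, the influence-function result of Koh & Liang (2017) that this machinery builds on can be stated as follows; the notation here ($L$, $H_{\hat{\theta}}$, $\hat{\theta}$) is a standard textbook formulation chosen for illustration, not necessarily the paper's own:

```latex
% Up-weighting a training instance z by a small weight \epsilon,
\hat{\theta}_{\epsilon,z}
  = \arg\min_{\theta}\ \frac{1}{n}\sum_{i=1}^{n} L(z_i,\theta) + \epsilon\, L(z,\theta),
% changes the optimal parameters at the rate
\left.\frac{d\hat{\theta}_{\epsilon,z}}{d\epsilon}\right|_{\epsilon=0}
  = -\,H_{\hat{\theta}}^{-1}\,\nabla_{\theta} L(z,\hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i,\hat{\theta}).
```

Deleting z corresponds to setting $\epsilon = -\tfrac{1}{n}$, so the induced parameter change $\Delta\theta \approx \tfrac{1}{n} H_{\hat{\theta}}^{-1}\nabla_{\theta}L(z,\hat{\theta})$ is precisely an inverse Hessian-vector-product, the quantity whose efficient computation is at stake here.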
Specifically, for each type of unlearning task, LLMEraser leverages influence functions to directly calculate the parameter changes in the PEFT adapters and then efficiently updates the adapter parameters, thus bypassing the need for time-consuming model retraining or fine-tuning. Furthermore, we reformulate the calculation of the inverse Hessian-vector-product as a finite-sum quadratic programming problem (Nesterov, 2013; Beck & Teboulle, 2009), significantly reducing computational complexity while mitigating the approximation errors from stochastic estimation. LLMEraser has several advantages: it is model-agnostic, applicable to various instance-wise unlearning tasks, and ensures fast model updates. We conduct experiments on both LLMs and Multimodal Large Language Models (MLLMs), specifically focusing on LLMs for Recommendation (LLM4Rec) as well as MLLM relation mining tasks, to validate the effectiveness of LLMEraser. Our extensive evaluations across these diverse scenarios demonstrate that LLMEraser consistently outperforms state-of-the-art unlearning methods.

Figure 1: 1a: A brief description of the different types of LLM unlearning tasks, with worked examples of Response Correction (e.g., correcting "George Washington" to "Confucius" as the oldest person in a list), Query Modification (e.g., fixing a corrupted equation in the input), and Instance Removal. 1b: The framework of the exact LLM unlearning method and the approximate (KL-divergence-based fine-tuning) unlearning method.

2 PRELIMINARY

This section introduces key concepts underpinning our methodology. We cover instruction tuning to enhance LLMs' understanding of human instructions, followed by PEFT, highlighting LoRA for efficient updates. Lastly, we discuss the influence function, which analyzes parameter changes from data perturbations. These foundations set the stage for the techniques discussed later.

2.1 INSTRUCTION TUNING

Instruction tuning is a key technique that leverages carefully curated datasets of human-annotated instructions and corresponding responses to enhance LLMs' capacity to comprehend and respond to human instructions (Wei et al., 2022; Liu et al., 2023b; Sanh et al., 2022). Given a downstream task dataset Z = {z | z = (x, y)} containing n instances, where x represents a description of the human instruction and y is the corresponding response, LLMs are fine-tuned using the following autoregressive (Brown et al., 2020; Touvron et al., 2023a) objective:

$$\max_{\Theta} \sum_{t=1}^{|y|} \log P_{\Theta}\left(y_t \mid x, y_{<t}\right),$$

where $y_t$ is the $t$-th token of the response and $y_{<t}$ denotes the tokens preceding it.

... we know we can achieve $\min_{0 \le t \le T} \mathbb{E}\,\|\nabla F(\theta_t)\|^{2} \le \epsilon$ for small $\epsilon > 0$ in $\tilde{O}(\epsilon^{-2})$ iterations. This proof also ensures the convergence of the algorithm proposed in Section 3.3.

E DISCUSSION ABOUT THE EFFICIENCY OF LLMERASER

Our proposed algorithm in Section 3.3 for computing the parameter changes not only accelerates the calculation of parameter changes but also significantly reduces GPU memory consumption. As highlighted in our paper, while Conjugate Gradients (CG) is an effective method for computing parameter changes, it requires full-batch computation (Agarwal et al., 2016), which is infeasible for LLMs. Our new algorithm overcomes this limitation, making it practical to compute the adapter's parameter changes in the context of LLMs.
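To make the reformulation concrete, here is a minimal, illustrative sketch (not the paper's actual mini-batch solver) of recovering $H^{-1}v$ without ever inverting $H$: minimize the quadratic $g(x) = \tfrac{1}{2}x^{\top}Hx - v^{\top}x$, whose gradient $Hx - v$ needs only Hessian-vector products. The names `matvec`, `solve_inverse_hvp`, the toy matrix `H`, the step size, and the iteration count are all assumptions for this sketch:

```python
# Illustrative sketch: obtain H^{-1} v by minimizing
#   g(x) = 0.5 * x^T H x - v^T x
# with gradient descent. For positive-definite H the unique minimizer is
# x* = H^{-1} v, and grad g(x) = H x - v needs only Hessian-vector products.

def matvec(H, x):
    """Stand-in for a Hessian-vector-product oracle (toy dense H)."""
    return [sum(h_ij * x_j for h_ij, x_j in zip(row, x)) for row in H]

def solve_inverse_hvp(hvp, v, lr=0.1, steps=2000):
    """Gradient descent on g(x); each step costs one HVP and O(p) memory."""
    x = [0.0] * len(v)
    for _ in range(steps):
        grad = [h_i - v_i for h_i, v_i in zip(hvp(x), v)]
        x = [x_i - lr * g_i for x_i, g_i in zip(x, grad)]
    return x

H = [[2.0, 0.5],
     [0.5, 1.0]]          # toy symmetric positive-definite "Hessian"
v = [1.0, 1.0]
x = solve_inverse_hvp(lambda z: matvec(H, z), v)
# x converges to H^{-1} v = [2/7, 6/7], i.e. approximately [0.2857, 0.8571]
```

Because the objective is a finite sum when $H$ is an average of per-sample Hessians, the full gradient step above can be replaced by mini-batch steps, which is the property the paper exploits.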
Specifically, LLMEraser formulates the parameter updates as an inverse Hessian-vector-product (Equation 9, Equation 10, and Equation 11). Importantly, although the inverse Hessian appears in the formulation, it does not require explicit computation or inversion of the Hessian matrix. Directly calculating the inverse Hessian-vector-product has a time complexity of O(p^3) and a space complexity of O(p^2), as the Hessian matrix needs to be stored, making it highly memory-intensive. Our method transforms the computation of the inverse Hessian-vector-product into the problem of solving for a Hessian-vector-product, enabling efficient resolution through mini-batch algorithms. The Hessian-vector-product, if computed directly via full Hessian matrix multiplication, would have a time and space complexity of O(p^2). However, using Hessian-free HVP techniques, we avoid the explicit computation and storage of the Hessian matrix, reducing both time and space complexity to O(p) (Pearlmutter, 1994). By further leveraging mini-batch optimization for Equation 12, LLMEraser achieves a space complexity of O(p), ensuring its scalability. The results for the LastFM dataset using the LLaRA backbone with LoRA ranks of 8, 16, and 32 are shown in Table 8.

Table 8: Experimental results on the QM task for different LoRA ranks (8, 16, 32), using LLaRA as the LLM4Rec model on the LastFM dataset, where 10% of users have items replaced with noisy interactions. Corrupted refers to the model trained with the noisy data.
Method       LoRA r = 8                 LoRA r = 16                LoRA r = 32
             Hit Ratio@1  Valid Ratio   Hit Ratio@1  Valid Ratio   Hit Ratio@1  Valid Ratio
Retrain      0.4508       1.0000        0.4417       0.9836        0.4215       0.9918
Corrupted    0.4344       0.9918        0.4098       1.0000        0.4016       1.0000
LLMEraser    0.4426       1.0000        0.4344       1.0000        0.4180       1.0000

Table 9: Execution time (measured in seconds) for different LoRA ranks (8, 16, 32) on the QM task, using LLaRA as the LLM4Rec model on the LastFM dataset, where 10% of users have items replaced with noisy interactions.

Method             LoRA r = 8    LoRA r = 16   LoRA r = 32
Retrain            1.68 × 10^4   1.69 × 10^4   1.69 × 10^4
LLMEraser (Ours)   1.50 × 10^3   1.53 × 10^3   1.56 × 10^3

We can observe that LLMEraser effectively reduces the negative impact of noisy data and brings a significant utility gain: the Hit Ratio@1 improves by an average of 4.9%, and the performance is comparable to that of Retrain. This demonstrates that LLMEraser can effectively forget and correct the adverse effects caused by noisy data. Regarding GPU memory usage, we measure the GPU utilization of the LLaRA backbone with the LoRA rank set to 8, 16, and 32. The statistical information and the experimental results (with memory usage measured in megabytes (MB)) are shown in Table 7. The GPU utilization of SISA is identical to that of Retrain because SISA (Bourtoule et al., 2021) effectively requires retraining all parameters (we report the memory usage required to train a single shard). Similarly, fine-tuning-based methods such as gradient ascent also necessitate updating all parameters. The backbone LLM we used is LLaMA2-7B (Touvron et al., 2023b). The runtime results for LoRA with ranks 8, 16, and 32 on the LastFM dataset are shown in Table 9, measured in seconds. In summary, the time and space complexity of LLMEraser are both O(p), where p represents the number of parameters. This indicates that LLMEraser is highly efficient in terms of both time and space, as its performance scales linearly with the number of parameters.
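As a minimal illustration of why an HVP never requires materializing the p×p Hessian, the sketch below approximates H·v from two gradient evaluations via central finite differences, a simple stand-in for the exact automatic-differentiation HVP of Pearlmutter (1994). The quadratic objective, the matrix `A`, and the helper names `grad_F` and `hvp` are toy assumptions, not part of LLMEraser itself:

```python
# Illustrative sketch: a Hessian-vector product H·v from two gradient calls,
#   H v ≈ (grad F(θ + εv) − grad F(θ − εv)) / (2ε),
# so only O(p) vectors are ever stored, never the p×p Hessian.

A = [[2.0, 0.5, 0.0],
     [0.5, 3.0, 1.0],
     [0.0, 1.0, 4.0]]     # implicit Hessian of the toy loss F(θ) = 0.5 θᵀAθ

def grad_F(theta):
    """Gradient of the toy quadratic: grad F(θ) = Aθ."""
    return [sum(a_ij * t_j for a_ij, t_j in zip(row, theta)) for row in A]

def hvp(theta, v, eps=1e-5):
    """Approximate H(θ)·v with two gradients; no Hessian is ever formed."""
    g_plus = grad_F([t + eps * x for t, x in zip(theta, v)])
    g_minus = grad_F([t - eps * x for t, x in zip(theta, v)])
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]

theta = [1.0, -2.0, 0.5]
v = [1.0, 0.0, -1.0]
hv = hvp(theta, v)        # approximately A·v = [2.0, -0.5, -4.0] here
```

In practice, frameworks compute the same quantity exactly with a forward-over-reverse automatic-differentiation pass at the same O(p) memory cost; the finite-difference form is used here only because it fits in a few self-contained lines.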
This efficiency makes LLMEraser a suitable choice for real-world applications where computational resources and time are critical considerations.

F RELATED WORK

F.1 LARGE LANGUAGE MODELS

Recent advancements in natural language processing (NLP) (Nam et al., 2024; Jin et al., 2024) have been significantly driven by the development of pretrained language models and Large Language Models. The introduction of models like BERT (Devlin et al., 2019) and GPT-2 (Radford et al., 2019) marked a pivotal shift in leveraging large-scale unsupervised pretraining, enabling superior performance across various NLP tasks through fine-tuning. The scaling of language models led to the emergence of LLMs such as GPT-3 (Brown et al., 2020) and PaLM (Chowdhery et al., 2023), which have pushed the boundaries of language understanding and generation. These models, with billions of parameters, are capable of performing complex reasoning and handling diverse tasks with minimal instruction. Recent research has explored parameter-efficient fine-tuning techniques, which adapt large models to specific applications without requiring extensive computational resources. Techniques like Adapter modules (Houlsby et al., 2019) and Low-Rank Adaptation (LoRA) (Hu et al., 2022) have gained popularity for their efficiency and effectiveness in maintaining performance while reducing the number of trainable parameters. Furthermore, instruction tuning (Liu et al., 2023a; Tang et al., 2024) using domain-specific data has emerged as a key strategy to enhance model performance in specialized contexts. Works by Ouyang et al. (2022) and Dodge et al. (2020) illustrate how tailoring models to specific tasks through targeted instruction can significantly improve their utility, particularly in complex domains, demonstrating the importance of context and relevance in model training.
LLMs have found extensive applications in various downstream tasks (Fang et al., 2024b; Hu et al., 2024a; Wu et al., 2024b), demonstrating their versatility across domains such as natural language processing, information retrieval, and knowledge graph augmentation (Zhang et al., 2024a; Xu et al., 2024b; Fang et al., 2024a; Sheng et al., 2024). For instance, LLMs are employed to enhance the accuracy of query-based systems by leveraging their ability to understand and generate contextually relevant responses, improving user experience in search applications (Liu et al., 2024b; Shang & Huang, 2024). Additionally, they are utilized in graph analytics, enabling complex reasoning tasks and facilitating the extraction of insights from structured data (Chen et al., 2023; Xu et al., 2024a). The adaptability of LLMs through prompt engineering further supports their deployment in specific use cases, allowing for tailored outputs that meet diverse requirements (Arawjo et al., 2024; Cain, 2024). In a similar vein, LLMs are increasingly being integrated into recommendation systems, building on their capabilities in natural language processing and understanding user preferences. Traditional recommendation systems often rely on collaborative filtering (Misztal-Radecka & Indurkhya, 2020; Wu et al., 2024a), content-based approaches (Pazzani & Billsus, 2007; Wu et al., 2022), or hybrid models (Burke, 2002). Recent advances, including Reinforced Prompt Personalization (Mao et al., 2024; Xin et al., 2022), and the incorporation of LLMs into recommendation systems via tool learning (Zhao et al., 2024; Dehbozorgi et al., 2024) or fine-tuning with recommendation-specific data (Kong et al., 2024; Chen et al., 2024), have significantly improved personalization. These methods enable LLMs to better capture user preferences and context (Lyu et al., 2024; Hu et al., 2024b), ultimately enhancing the accuracy and relevance of recommendations. 
F.2 LARGE LANGUAGE MODELS UNLEARNING

The concept of unlearning in Large Language Models has garnered considerable attention as concerns over data privacy and model integrity have intensified. In-context unlearning, proposed by Pawelczyk et al. (2023), allows the selective removal of data points by supplying flipped labels during inference, effectively maintaining performance while unlearning specific information. Additionally, Quark by Lu et al. (2022) employs a reinforcement learning framework to control and reduce undesirable behaviors, enhancing text generation without extensive retraining. Chen & Yang (2023) introduce a lightweight unlearning method that integrates unlearning layers into transformer architectures, facilitating efficient data removal. Knowledge Unlearning by Jang et al. (2023) demonstrates that targeted gradient ascent can effectively forget sensitive information, surpassing traditional methods in performance retention. The technique proposed by Eldan & Russinovich (2023) facilitates the removal of specific facts related to the Harry Potter series while preserving the model's overall performance. Other approaches, such as the Partitioned Gradient Update (PGU) method by Yu et al. (2023), aim to reduce social biases effectively. Collectively, these studies underline the significance of unlearning in LLMs, paving the way for safer, more responsible AI applications.

G MORE EXAMPLES OF VARIOUS UNLEARNING TASKS

Query Modification Case Study
This user has watched: The Rich Man's Wife [emb], Air Force One [emb], Murder at 1600 [emb], Absolute Power [emb] in the previous. Please predict the next movie this user will watch. Choose the answer from the following 10 movie titles: Face/Off [emb], Primal Fear [emb], Ransom [emb], Men in Black [emb], Twelve Monkeys [emb], Lone Star [emb], Mr. Holland's Opus [emb], Jackie Chan's First Strike [emb], Waiting for Guffman [emb], The Long Kiss Goodnight [emb].
Answer:
Response: Face/Off

After Query Modification:
This user has watched: The Rich Man's Wife [emb], Air Force One [emb], Murder at 1600 [emb], Absolute Power [emb] in the previous. Please predict the next movie this user will watch. Choose the answer from the following 10 movie titles: Face/Off [emb], Primal Fear [emb], Ransom [emb], Men in Black [emb], Twelve Monkeys [emb], Lone Star [emb], Mr. Holland's Opus [emb], Jackie Chan's First Strike [emb], Waiting for Guffman [emb], The Long Kiss Goodnight [emb]. Answer:

Instance Removal Case Study
Given the user's historical interactions, please determine whether the user will enjoy the target new movie by answering "Yes" or "No". User's liked items: God Father. User's disliked items: Star Wars. Target new movie: Iron Man.
Response: No.

Figure 4: Instance Removal Case Study & Query Modification Case Study.

Response Correction Case Study
Query: Is the elephant in red mask standing next to a tree in green mask?
Response: Yes
After Response Correction:
Response: No

Response Correction Case Study
You are a helpful assistant that can answer questions for an image. I will provide you 4 options.
Response Format: Choice: A single character from A, B, C, D.
Which feature best indicates the identity of the object that has a floral pattern and is placed on a chair?
Choices: A. The object's soft texture; B. The indoor setting; C. The wooden chair; D. The background clutter.
After Response Correction

Figure 5: Response Correction Case Study.