# model_immunization_from_a_condition_number_perspective__52500595.pdf

Model Immunization from a Condition Number Perspective

Amber Yijia Zheng * 1 Cedar Site Bai * 1 Brian Bullins 1 Raymond A. Yeh 1

Model immunization aims to pre-train models that are difficult to fine-tune on harmful tasks while retaining their utility on other non-harmful tasks. Though prior work has shown empirical evidence for immunizing text-to-image models, the key understanding of when immunization is possible and a precise definition of an immunized model remain unclear. In this work, we propose a framework, based on the condition number of a Hessian matrix, to analyze model immunization for linear models. Building on this framework, we design an algorithm with regularization terms to control the resulting condition numbers after pre-training. Empirical results on linear models and non-linear deep-nets demonstrate the effectiveness of the proposed algorithm on model immunization. The code is available at https://github.com/amberyzheng/model-immunization-cond-num.

1. Introduction

Model immunization, recently proposed by Zheng & Yeh (2024), studies how to pre-train a model that is more difficult to fine-tune on harmful content, but not others. The aim is to mitigate the risk of misuse (Brundage et al., 2018; Marchal et al., 2024) associated with open-sourced models by immunizing them before they are released to the public.

Zheng & Yeh (2024) focus on immunizing text-to-image models, where they formulate immunization as a bi-level optimization. Empirically, they show that pre-trained diffusion models that undergo immunization are more difficult to fine-tune on a given harmful concept dataset. To quantify this difficulty, they compare the generation quality of models with and without immunization after a fixed number of fine-tuning iterations. While the empirical results are promising, a definition of an immunized model and the circumstances that make immunization possible remain unclear.

*Equal contribution. 1Department of Computer Science, Purdue University. Correspondence to: Raymond A. Yeh <rayyeh@purdue.edu>.

Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s).

To tackle this issue, we propose a framework to study model immunization using the condition number (Golub & Van Loan, 1996). The effectiveness of immunization can be characterized by the condition number of the Hessian matrix. When using gradient-based methods during fine-tuning, a condition number closer to one indicates faster convergence (Boyd & Vandenberghe, 2004), i.e., the model is easier to fine-tune. With this perspective, we observe that the existence of an effective immunization for linear models is related to the angle between the singular vectors of the harmful fine-tuning dataset's covariance matrix and the pre-training dataset's covariance matrix.

From this condition number perspective, we propose an immunization algorithm to find such a model. In detail, we propose two additional terms to regularize the condition number during pre-training. Each of the introduced regularization terms can be shown to ensure a monotonic increase/decrease of the condition number under gradient updates.

Beyond the theoretical results, we empirically validate the proposed algorithm on linear models for regression and image classification tasks. Lastly, we conduct experiments using the proposed algorithm on non-linear models, i.e., deep-nets. Despite the gap in theory, we observe that the proposed approach remains effective at model immunization across ResNet (He et al., 2016) and ViT (Dosovitskiy, 2021).

Our contributions are summarized as follows:

• We introduce a framework based on the condition number to study the task of model immunization. This framework leads to a concrete definition of an immunized model along with a novel experiment setup and evaluation metric to compare the quality of different immunization techniques.

• We propose regularizers to maximize/minimize the condition number, with a guaranteed monotonic increase/decrease when updated with a gradient-based method.

• Together with the task objective and regularizers, we demonstrate that the proposed algorithm effectively immunizes linear models and deep-nets on regression/image classification tasks.


2. Preliminaries

This section provides the background of the condition number and its connection to gradient descent. Additionally, we briefly review transfer learning (Zhuang et al., 2020), as it can be a technique for misusing open-source models.

Condition number and convergence of gradient descent. Given a general matrix $S$, the condition number (Golub & Van Loan, 1996) is defined as

$$\kappa(S) \triangleq \|S\|_2 \, \|S^{\dagger}\|_2 = \sigma_S^{\max} / \sigma_S^{\min}, \tag{1}$$

where $\dagger$ denotes the pseudoinverse and $\sigma_S^{\max/\min}$ corresponds to the max/min singular value of $S$. The condition number is related to the convergence rate of gradient-based algorithms.

Consider an optimization problem $\min_w L(w)$ where $L$ is strongly convex and has a Hessian $\nabla^2 L$ with max/min singular values denoted as $\sigma^{\max/\min}$. In this case, the constant step-size steepest descent algorithm has the following convergence rate (Bubeck, 2015):

$$\|w_t - w^*\|_2 \le \left(1 - \frac{\sigma^{\min}}{\sigma^{\max}}\right)^{t} \|w_0 - w^*\|_2, \tag{2}$$

where $w^*$ denotes the optimal solution, and $w_t$ denotes the steepest descent iterate at step $t$. We can observe that a larger condition number corresponds to slower convergence.
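To make the link between Eq. (1) and Eq. (2) concrete, here is a minimal NumPy sketch. It assumes a toy quadratic $L(w) = \frac{1}{2} w^\top A w$ with a diagonal $A$ and a constant step size $1/\sigma^{\max}$; the matrices and the helper names `condition_number` and `gd_distance` are illustrative choices, not taken from the paper.

```python
import numpy as np

def condition_number(S):
    """kappa(S) = sigma_max / sigma_min, using the nonzero singular values."""
    sigma = np.linalg.svd(S, compute_uv=False)   # sorted in descending order
    sigma = sigma[sigma > 1e-12 * sigma[0]]
    return sigma[0] / sigma[-1]

def gd_distance(A, w0, steps):
    """Gradient descent on the toy quadratic L(w) = 0.5 * w^T A w, whose
    optimum is w* = 0, with constant step 1 / sigma_max.
    Returns ||w_t - w*||_2 after `steps` iterations."""
    step = 1.0 / np.linalg.eigvalsh(A)[-1]       # eigvalsh is ascending
    w = w0.copy()
    for _ in range(steps):
        w = w - step * (A @ w)
    return np.linalg.norm(w)

w0 = np.ones(3)
A_well = np.diag([1.0, 0.8, 0.5])     # kappa = 2
A_ill = np.diag([1.0, 0.1, 0.01])     # kappa = 100
dist_well = gd_distance(A_well, w0, steps=50)
dist_ill = gd_distance(A_ill, w0, steps=50)
print(condition_number(A_well), dist_well)
print(condition_number(A_ill), dist_ill)
```

After the same number of steps, the ill-conditioned problem is still far from its optimum while the well-conditioned one has essentially converged, which is the behavior that immunization tries to induce on the harmful task.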

Condition number regularization. Nenov et al. (2024) proposed a regularizer for minimizing the condition number of a general matrix $S$,

$$R_{\text{well}}(S) = \frac{1}{2}\|S\|_2^2 - \frac{1}{2p}\|S\|_F^2, \tag{3}$$

in which p is the minimum dimension of S, and the norms correspond to the spectral norm and Frobenius norm. They showed that Rwell(S) is a valid regularizer by proving its nonnegativity, and is an upper bound on log (κ (S)). In addition, they showed that Rwell(S) is differentiable under some mild conditions, and if updated with gradient descent, it is guaranteed to decrease the condition number monotonically. See Appendix A for the exact statements.
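The following is a small NumPy sketch of this regularizer, assuming the form $R_{\text{well}}(S) = \frac{1}{2}\|S\|_2^2 - \frac{1}{2p}\|S\|_F^2$. The gradient expression $\sigma_1 u_1 v_1^\top - S/p$ is the standard derivative of the two norms when the largest singular value is unique; it is used here as an assumption of the sketch rather than quoted from Appendix A, and the step size 0.1 is arbitrary.

```python
import numpy as np

def r_well(S):
    """R_well(S) = 0.5 * ||S||_2^2 - ||S||_F^2 / (2p), with p = min(S.shape)."""
    p = min(S.shape)
    sigma = np.linalg.svd(S, compute_uv=False)
    return 0.5 * sigma[0] ** 2 - np.sum(sigma ** 2) / (2 * p)

def grad_r_well(S):
    """Gradient sigma_1 u_1 v_1^T - S / p (valid when sigma_1 is unique)."""
    p = min(S.shape)
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    return sigma[0] * np.outer(U[:, 0], Vt[0]) - S / p

S = np.diag([3.0, 2.0, 1.0])
S_next = S - 0.1 * grad_r_well(S)     # one gradient descent step on R_well
print(r_well(S), np.linalg.cond(S), np.linalg.cond(S_next))
```

In this toy case the step shrinks the largest singular value and grows the others, so the condition number drops from 3 to roughly 2.71, and the regularizer vanishes for the identity matrix, whose singular values are all equal.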

Different from Nenov et al. (2024), we propose a differentiable regularizer that is guaranteed to increase the condition number as an upper bound on 1/log (κ (S)). For model immunization, instead of a general matrix S, we need to consider the regularization of the Hessian of linear models composed of a feature extractor and a classifier, while preserving their differentiability and monotonicity guarantees during gradient updates to the feature extractor.

Transfer learning via linear probing. In this work, we focus on the transfer learning method of linear probing. Given a pre-trained feature extractor $f_\theta : \mathbb{R}^{D_{\text{in}}} \to \mathbb{R}^{D_{\text{hid}}}$, linear probing learns a linear classifier $h_w : \mathbb{R}^{D_{\text{hid}}} \to \mathbb{R}^{D_{\text{out}}}$ over the target dataset $\mathcal{D} = \{(x, y)\}$ using the frozen feature extractor $f_\theta$. This model learning is formulated as the following optimization problem

$$\min_w L(\mathcal{D}, w, \theta) \triangleq \min_w \sum_{(x,y) \in \mathcal{D}} \ell(h_w \circ f_\theta(x), y), \tag{4}$$

where $\ell$ denotes a suitable loss function, e.g., cross-entropy. By keeping $\theta$ fixed, the model leverages features learned from the pre-training task and transfers them to the target task. This approach is effective when the target dataset is too small to train a model from scratch.

3. Immunization with Condition Number

The goal of model immunization is to learn a pre-trained model $g_\omega \circ f_{\theta^{\mathrm{I}}}$, consisting of a classifier $g_\omega$ and an immunized feature extractor $f_{\theta^{\mathrm{I}}}$, such that fine-tuning $f_{\theta^{\mathrm{I}}}$ on a harmful task is difficult, while fine-tuning on other tasks is not. The model should also maintain a good pre-training task performance. Specifically, we study the setting in which a bad actor uses linear probing on a pre-trained linear feature extractor with gradient descent.

Immunization setting. We denote a pre-training dataset as $\mathcal{D}_P = \{(x, y)\}$ and a harmful dataset as $\mathcal{D}_H = \{(x, y)\}$ where $x \in \mathbb{R}^{D_{\text{in}}}$. The bad actor performs linear probing using $\mathcal{D}_H$ following Eq. (4) with an $\ell_2$ loss. We focus our analysis on a linear pre-trained feature extractor without dimensionality reduction, i.e., $f_\theta \triangleq x^\top \theta$ with $\theta \in \mathbb{R}^{D_{\text{in}} \times D_{\text{in}}}$.

Definition 3.1. Under this setting, a model is said to be immunized if it satisfies the following:

(a) It is more difficult to apply linear probing on the harmful task $\mathcal{D}_H$ using the immunized feature extractor $f_{\theta^{\mathrm{I}}}$ than directly on the input data, i.e.,

$$\kappa(\nabla_w^2 L(\mathcal{D}_H, w, \theta^{\mathrm{I}})) \gg \kappa(\nabla_w^2 L(\mathcal{D}_H, w, I)), \tag{5}$$

where $I$ denotes the identity matrix.

(b) It is not more difficult to apply linear probing on other tasks. As there is only one other task $\mathcal{D}_P$, an immunized feature extractor should have

$$\kappa(\nabla_\omega^2 L(\mathcal{D}_P, \omega, \theta^{\mathrm{I}})) \le \kappa(\nabla_\omega^2 L(\mathcal{D}_P, \omega, I)). \tag{6}$$

Note: we use $\omega$ to denote the classifier parameters of the pre-training task and $w$ for the harmful task.

(c) The immunized model should maintain a competitive task performance on the pre-training dataset $\mathcal{D}_P$, i.e.,

$$\min_{\omega,\theta} L(\mathcal{D}_P, \omega, \theta) \approx \min_{\omega} L(\mathcal{D}_P, \omega, \theta^{\mathrm{I}}). \tag{7}$$

For linear models, as long as $\theta^{\mathrm{I}}$ is invertible, exact equality can be achieved.
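The equality claim for linear models follows from a change of variables. A short sketch, assuming the $\ell_2$ loss of this section and writing $X_P$, $Y_P$ for the stacked pre-training inputs and targets (notation the paper only introduces later, in Alg. 1):

$$\min_{\omega} \|X_P \theta^{\mathrm{I}} \omega - Y_P\|_2^2 \;=\; \min_{\tilde{\omega}} \|X_P \tilde{\omega} - Y_P\|_2^2 \;=\; \min_{\omega, \theta} \|X_P \theta \omega - Y_P\|_2^2, \qquad \tilde{\omega} = \theta^{\mathrm{I}} \omega.$$

The first equality holds because $\omega \mapsto \theta^{\mathrm{I}} \omega$ is a bijection when $\theta^{\mathrm{I}}$ is invertible. The second holds because every product $\theta\omega$ is itself some $\tilde{\omega}$, and the choice $\theta = I$ attains every $\tilde{\omega}$.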


3.1. Analysis on Immunized Linear Models

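To provide some intuition on how the feature extractor $\theta$ affects the convergence of linear probing, we study the analytical form of the singular values of the Hessian. For readability, we rewrite linear probing in Eq. (4) by considering $f_\theta \triangleq x^\top \theta$ and an $\ell_2$-loss.

Let $X_H \in \mathbb{R}^{N \times D_{\text{in}}}$ and $Y_H \in \mathbb{R}^{N \times D_{\text{out}}}$ denote data from $\mathcal{D}_H$ stacked into matrices with $N \triangleq |\mathcal{D}_H|$. When using an $\ell_2$-loss, Eq. (4) can be written as

$$\min_w L(\mathcal{D}_H, w, \theta) = \min_w \|(X_H \theta) w - Y_H\|_2^2. \tag{8}$$

In this case, the Hessian matrix is

$$H_H(\theta) \triangleq \nabla_w^2 L(\mathcal{D}_H, w, \theta) = \theta^\top K_H \theta, \tag{9}$$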

where $K_H \triangleq X_H^\top X_H$ is the data covariance matrix.

Proposition 3.2. The singular values of the Hessian matrix in Eq. (9) are given by

$$\sigma_i = \sum_{j=1}^{D_{\text{in}}} \left(\sigma_{\theta,i} \, (u_{\theta,i}^\top q_j) \sqrt{\gamma_j}\right)^2, \quad \forall i \in \{1, \dots, D_{\text{in}}\}. \tag{10}$$

Here, $\sigma_{\theta,i}$ and $u_{\theta,i}$ correspond to the $i$-th singular value and vector of $\theta$. Next, $\gamma_j$ and $q_j$ correspond to the $j$-th singular value and vector of the covariance $K_H$.

Proof sketch. This result can be shown by using the fact that $K_H$ is a symmetric positive semi-definite matrix and decomposing via SVD. The complete proof is provided in Appendix B.1.

From Eq. (10), we can see that the singular values of the Hessian depend on the relative angle between the singular vectors of the feature extractor $\theta$ and those of the data covariance matrix $K_H$. As the feature extractor is shared between the pre-training dataset $\mathcal{D}_P$ and the harmful dataset $\mathcal{D}_H$, the strength of the immunization depends on the relative angle between the singular vectors of $K_P$ and $K_H$. For example, if the singular vectors (sorted by the singular values) are all perfectly aligned between the two, then no $\theta$ can simultaneously maximize $\kappa(\nabla_w^2 L(\mathcal{D}_H, w, \theta))$ and minimize $\kappa(\nabla_\omega^2 L(\mathcal{D}_P, \omega, \theta))$.
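The dependence on the alignment of $K_P$ and $K_H$ can be illustrated with two toy cases. The NumPy sketch below is an assumption-laden example, not an experiment from the paper: the covariances are synthetic, the first case uses the special aligned situation $K_H = 3 K_P$, and the second keeps the same singular vectors but reverses the order of the singular values.

```python
import numpy as np

def hessian(theta, K):
    """Hessian of l2 linear probing, Eq. (9): theta^T K theta."""
    return theta.T @ K @ theta

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))     # shared singular vectors
gamma = np.array([4.0, 3.0, 2.0, 1.0])
K_P = Q @ np.diag(gamma) @ Q.T

# Case 1: K_H has the same singular vectors in the same order (K_H = 3 K_P).
# Any theta then yields identical condition numbers on both tasks.
theta_any = np.eye(4) + 0.1 * rng.normal(size=(4, 4))
kappa_P_aligned = np.linalg.cond(hessian(theta_any, K_P))
kappa_H_aligned = np.linalg.cond(hessian(theta_any, 3.0 * K_P))

# Case 2: same singular vectors, but the singular values are in reverse order.
K_H = Q @ np.diag(gamma[::-1]) @ Q.T
theta = Q @ np.diag(gamma ** -0.5)               # whitens K_P
kappa_P = np.linalg.cond(hessian(theta, K_P))    # ~ 1
kappa_H = np.linalg.cond(hessian(theta, K_H))    # ~ 16, versus cond(K_H) = 4
print(kappa_P_aligned, kappa_H_aligned, kappa_P, kappa_H)
```

In the second case a single $\theta$ makes the pre-training Hessian perfectly conditioned while quadrupling the condition number of the harmful one, which is only possible because the two spectra are ordered differently.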

With a better understanding of the effect of the feature extractor θ on the condition number, we will next present an algorithm to immunize a model.

4. Algorithm for Immunizing a Model

We formulate model immunization as an optimization problem with the following objective:

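$$\min_{\omega,\theta} \; R_{\text{ill}}(H_H(\theta)) + R_{\text{well}}(H_P(\theta)) + L(\mathcal{D}_P, \omega, \theta), \tag{11}$$

where $R_{\text{ill}}$, to be defined in Sec. 4.1, denotes our proposed regularizer to maximize the condition number, $R_{\text{well}}$ in Eq. (3) denotes the regularizer to minimize the condition number, $H_P(\theta) \triangleq \nabla_\omega^2 L(\mathcal{D}_P, \omega, \theta) = \theta^\top K_P \theta$ is the Hessian matrix of the pre-training task, and $L$ denotes the supervised loss.

Algorithm 1 Condition number regularized gradient descent for model immunization
Input: Primary task $\mathcal{D}_P = (X_P, Y_P)$, harmful task input $X_H$, supervised loss $L$, learning rate $\eta$, regularizing constants $\lambda_P, \lambda_H \in \mathbb{R}_+$, model initialization $\theta_0, \omega_0$
1: $K_P = X_P^\top X_P$
2: $K_H = X_H^\top X_H$
3: for $t = 0, 1, \dots, T-1$ do
4: $\quad \omega_{t+1} = \omega_t - \eta \nabla_\omega L(\omega_t, \theta_t; \mathcal{D}_P)$
5: $\quad H_P(\theta_t) = \theta_t^\top K_P \theta_t$, $\; H_H(\theta_t) = \theta_t^\top K_H \theta_t$
6: $\quad \theta_{t+1} = \theta_t - \eta \nabla_\theta L(\omega_t, \theta_t; \mathcal{D}_P) - \eta \lambda_P K_P^{-1} \nabla_\theta R_{\text{well}}(H_P(\theta_t)) - \eta \lambda_H K_H^{-1} \nabla_\theta R_{\text{ill}}(H_H(\theta_t))$
7: end for
Output: Immunized feature extractor $\theta^{\mathrm{I}} \triangleq \theta_T$.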

Each of the terms encourages the model to satisfy one of the three immunization requirements in Definition 3.1. For readability, we have dropped the scalar hyperparameters balancing the terms. We propose to solve Eq. (11) using a gradient-based method as outlined in Alg. 1.

In the remainder of this section, we first introduce the novel regularizer to maximize the condition number of a general matrix, along with its relevant properties (Sec. 4.1). We then show how to incorporate the regularizers $R_{\text{ill}}$ and $R_{\text{well}}$ into the immunization setup (Sec. 4.2). Finally, we discuss the provable guarantees with respect to each of the regularizers (Sec. 4.3).

4.1. Regularizer for Maximizing the Condition Number

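We analyze the condition number of a general matrix $S \in \mathbb{R}^{p_r \times p_c}$ with $p = \min\{p_r, p_c\}$ and $\mathrm{rank}(S) = k \le p$. The compact SVD of $S$ is given by $S = U \mathrm{Diag}(\sigma) V^\top$, in which $\sigma = [\sigma_1, \dots, \sigma_k]$ such that $\sigma_S^{\max} = \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_k = \sigma_S^{\min} > 0$, and $u_i, v_i$ denote the $i$-th column vectors of $U, V$ for $i \in [k]$.

Inspired by the regularizer for minimizing the condition number, we propose its counterpart for maximizing the condition number

$$R_{\text{ill}}(S) = \frac{1}{\frac{1}{2k}\|S\|_F^2 - \frac{1}{2}(\sigma_S^{\min})^2}, \tag{12}$$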

which satisfies the properties in the following theorem.

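Theorem 4.1 (Properties of $\kappa$-maximizing regularizer $R_{\text{ill}}(S)$).

(1) [Nonnegativity] For any $S \in \mathbb{R}^{p_r \times p_c}$, $R_{\text{ill}}(S) \ge 0$, and $R_{\text{ill}}(S) = 0$ if and only if $\kappa(S) = \infty$.

(2) [Upper Bound] $\frac{1}{\log(\kappa(S))} \le (\sigma_S^{\max})^2 R_{\text{ill}}(S)$, i.e., $R_{\text{ill}}(S)$ upper bounds $\frac{1}{\log(\kappa(S))}$ when $\sigma_S^{\max}$ is reasonably away from $\infty$.

(3) [Differentiability] If $\sigma_S^{\min} = \sigma_k < \sigma_i$ for any $i < k$, i.e., $\sigma_S^{\min}$ is unique, then $R_{\text{ill}}(S)$ is differentiable and

$$\nabla_S R_{\text{ill}}(S) = \frac{\sigma_k u_k v_k^\top - \frac{1}{k} S}{\left(\frac{1}{2k}\|S\|_F^2 - \frac{1}{2}(\sigma_S^{\min})^2\right)^2}. \tag{13}$$

(4) [Monotonic Increase] If $\sigma_S^{\min}$ is unique, update $S$ with $\nabla_S R_{\text{ill}}(S)$ such that $S' = S - \eta_2 \nabla_S R_{\text{ill}}(S)$ for $0 < \eta_2 < \frac{k}{k-1}\left(\frac{1}{2k}\|S\|_F^2 - \frac{1}{2}(\sigma_S^{\min})^2\right)^2$, then $\kappa(S') > \kappa(S)$.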

Proof sketch. We provide some intuitive illustrations of the proof and defer the complete version to Appendix B.2.

For (1), as the squared Frobenius norm of a matrix equals the sum of the squares of its singular values, the denominator of $R_{\text{ill}}(S)$ is the average of the squared singular values minus their minimum, ensuring it is nonnegative. It can be shown that $R_{\text{ill}}(S)$ is inversely related to $\kappa(S)$, which indicates that $R_{\text{ill}}(S) = 0$ if and only if $\kappa(S) = \infty$.

For (2), the upper bound holds by the design of $R_{\text{ill}}(S)$ and by applying the mean value inequality on

$$\log \kappa(S)^2 = \log (\sigma_S^{\max})^2 - \log (\sigma_S^{\min})^2. \tag{14}$$

For (3), even though $\sigma_S^{\min}$ is not differentiable since it involves taking the minimum of the singular values, its subdifferential is well-defined (Lewis, 1995). When $\sigma_S^{\min}$ is unique, its subdifferential reduces to a singleton, i.e., its gradient, making $R_{\text{ill}}(S)$ also differentiable.

For (4), one key observation is that the closed-form $\nabla_S R_{\text{ill}}(S)$ shares the same set of singular vectors as $S$, so that the linear relation in the gradient update can be passed on to the singular values. By choosing a suitable step size, the increase in condition number can be guaranteed.

Theorem 4.1 demonstrates that the regularizer Rill (S) introduced is a reasonable upper bound for maximizing condition numbers and indicates that under some mild condition, i.e., the minimum singular value is unique, simple first-order algorithms like gradient descent can be used to minimize the regularizer with guaranteed increase in condition number.
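A minimal NumPy sketch of Eqs. (12) and (13) and of the update in Theorem 4.1 (4) is given below. The helper name `r_ill_and_grad`, the diagonal test matrix, and the choice of half the admissible step size are assumptions made for illustration.

```python
import numpy as np

def r_ill_and_grad(S, tol=1e-12):
    """Return R_ill(S) from Eq. (12), its gradient from Eq. (13), and the
    step-size bound of Theorem 4.1 (4). Assumes sigma_min is unique and
    that not all singular values are equal."""
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    k = int(np.sum(sigma > tol * sigma[0]))          # rank (compact SVD)
    denom = np.sum(sigma[:k] ** 2) / (2 * k) - 0.5 * sigma[k - 1] ** 2
    grad = (sigma[k - 1] * np.outer(U[:, k - 1], Vt[k - 1]) - S / k) / denom ** 2
    eta_max = k / (k - 1) * denom ** 2
    return 1.0 / denom, grad, eta_max

S = np.diag([3.0, 2.0, 1.0])
value, grad, eta_max = r_ill_and_grad(S)
S_next = S - 0.5 * eta_max * grad                    # one gradient descent step
print(value, np.linalg.cond(S), np.linalg.cond(S_next))
```

For this matrix the step halves the smallest singular value and enlarges the other two, so the condition number rises from 3 to 7.5 while the singular vectors stay unchanged, matching the intuition in the proof sketch of (4).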

4.2. Incorporating Regularizers into Immunization

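Given the immunization setup, we now analyze the regularizers $R_{\text{ill}}$ and $R_{\text{well}}$ for matrices with the specific structure of feature covariance matrices, and propose the corresponding algorithm for model immunization.

As illustrated in the immunization setup, the feature extractor $\theta$ is the trainable parameter. For input data $X \in \mathbb{R}^{N \times D_{\text{in}}}$ of the feature extractor, we analyze the condition number of $H(\theta) \triangleq \theta^\top K \theta \in \mathbb{R}^{D_{\text{in}} \times D_{\text{in}}}$ with $\mathrm{rank}(H) = k$ and compact SVD $H = U \mathrm{Diag}(\sigma) V^\top$. Recall that we define $K = X^\top X$ to be the covariance matrix of the data.

In the following theorem, we show that under the same conditions, the introduced regularizers $R_{\text{ill}}(\cdot)$ and $R_{\text{well}}(\cdot)$ are also differentiable w.r.t. $\theta$ when applied to $\theta^\top K \theta$.

Theorem 4.2. For $H(\theta) = \theta^\top K \theta$, if its maximum and minimum singular values $\sigma_1$ and $\sigma_k$ are unique, then

(1) $\nabla_\theta R_{\text{well}}(H(\theta)) = 2K\theta\left(\sigma_1 v_1 v_1^\top - \frac{1}{D_{\text{in}}}\theta^\top K \theta\right)$,

(2) $\nabla_\theta R_{\text{ill}}(H(\theta)) = \dfrac{2K\theta\left(\sigma_k v_k v_k^\top - \frac{1}{k}\theta^\top K \theta\right)}{\left(\frac{1}{2k}\|\theta^\top K \theta\|_F^2 - \frac{1}{2}\sigma_k^2\right)^2}$.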

Proof sketch. The differentiability follows from the same argument of Theorem 4.1 (3) under the condition that the maximum and minimum singular values are unique. The closed-form gradients are computed with the chain rule in matrix calculus defined by the Frobenius inner product. The complete proof can be found in Appendix B.3.
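The closed-form gradients can be sanity-checked against central finite differences. The sketch below is one way to do so in NumPy; the synthetic data, the $1/N$ scaling of $K$, the full-rank assumption ($k = D_{\text{in}}$), and all helper names are choices made for this example only.

```python
import numpy as np

def spectrum(theta, K):
    """Eigen-decomposition of H = theta^T K theta (symmetric PSD, so its
    eigenvalues are its singular values), in ascending order."""
    H = theta.T @ K @ theta
    sigma, V = np.linalg.eigh(H)
    return H, sigma, V

def r_well(theta, K):
    H, sigma, _ = spectrum(theta, K)
    return 0.5 * sigma[-1] ** 2 - np.sum(sigma ** 2) / (2 * H.shape[0])

def r_ill(theta, K):                       # assumes H has full rank
    H, sigma, _ = spectrum(theta, K)
    k = H.shape[0]
    return 1.0 / (np.sum(sigma ** 2) / (2 * k) - 0.5 * sigma[0] ** 2)

def grad_r_well(theta, K):                 # Theorem 4.2 (1)
    H, sigma, V = spectrum(theta, K)
    v1 = V[:, -1]
    return 2 * K @ theta @ (sigma[-1] * np.outer(v1, v1) - H / H.shape[0])

def grad_r_ill(theta, K):                  # Theorem 4.2 (2)
    H, sigma, V = spectrum(theta, K)
    k = H.shape[0]
    vk = V[:, 0]
    denom = np.sum(sigma ** 2) / (2 * k) - 0.5 * sigma[0] ** 2
    return 2 * K @ theta @ (sigma[0] * np.outer(vk, vk) - H / k) / denom ** 2

def finite_difference(f, theta, eps=1e-6):
    """Central finite-difference approximation of the gradient of f."""
    grad = np.zeros_like(theta)
    for idx in np.ndindex(*theta.shape):
        step = np.zeros_like(theta)
        step[idx] = eps
        grad[idx] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4)) * np.array([2.0, 1.5, 1.0, 0.5])
K = X.T @ X / 30
theta = np.eye(4) + 0.1 * rng.normal(size=(4, 4))
err_well = np.max(np.abs(grad_r_well(theta, K)
                         - finite_difference(lambda t: r_well(t, K), theta)))
err_ill = np.max(np.abs(grad_r_ill(theta, K)
                        - finite_difference(lambda t: r_ill(t, K), theta)))
print(err_well, err_ill)
```

Both errors should be at the level of finite-difference noise as long as the extreme singular values of $H(\theta)$ are well separated, which is the uniqueness condition of the theorem.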

With the closed-form gradient of the regularizers w.r.t. $\theta$, we propose our algorithm for model immunization in Alg. 1. Specifically, Alg. 1 employs the general gradient descent framework. Line 4 conducts standard updates for the classifier $\omega$, minimizing the supervised loss $L$. In lines 5 to 6, the regularizers $R_{\text{ill}}$ and $R_{\text{well}}$ are applied on the feature covariance $H_H(\theta)$ of the harmful task and $H_P(\theta)$ of the pre-training task. This is done by updating the feature extractor $\theta$ with the gradients $\nabla_\theta R_{\text{ill}}(H_H)$ and $\nabla_\theta R_{\text{well}}(H_P)$, each normalized by the inverse of its input covariance, together with the gradient of the supervised loss $\nabla_\theta L$.
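The update rule described above can be sketched in NumPy as follows. This is an illustrative reading of Alg. 1 under several assumptions that are not from the paper: a squared-error supervised loss $L = \frac{1}{2}\|X_P \theta \omega - Y_P\|_F^2$, full-rank Hessians, an identity initialization $\theta_0 = I$, synthetic data, and hand-picked values for $\eta$, $\lambda_P$, $\lambda_H$, and $T$.

```python
import numpy as np

def reg_grads(theta, K):
    """Closed-form gradients of Theorem 4.2 for H = theta^T K theta
    (assumes full rank and unique extreme singular values)."""
    H = theta.T @ K @ theta
    sigma, V = np.linalg.eigh(H)                      # ascending order
    D = H.shape[0]
    v1, vk = V[:, -1], V[:, 0]
    g_well = 2 * K @ theta @ (sigma[-1] * np.outer(v1, v1) - H / D)
    denom = np.sum(sigma ** 2) / (2 * D) - 0.5 * sigma[0] ** 2
    g_ill = 2 * K @ theta @ (sigma[0] * np.outer(vk, vk) - H / D) / denom ** 2
    return g_well, g_ill

def immunize(X_P, Y_P, X_H, eta=0.01, lam_P=1.0, lam_H=1.0, T=100, seed=0):
    """Sketch of Alg. 1 with L = 0.5 * ||X_P theta omega - Y_P||_F^2."""
    D = X_P.shape[1]
    rng = np.random.default_rng(seed)
    theta = np.eye(D)                                  # theta_0 (assumed)
    omega = 0.01 * rng.normal(size=(D, Y_P.shape[1]))  # omega_0 (assumed)
    K_P, K_H = X_P.T @ X_P, X_H.T @ X_H                # lines 1-2
    for _ in range(T):                                 # line 3
        resid = X_P @ theta @ omega - Y_P
        grad_omega = theta.T @ X_P.T @ resid
        grad_theta = X_P.T @ resid @ omega.T
        g_well, _ = reg_grads(theta, K_P)              # line 5 (uses H_P)
        _, g_ill = reg_grads(theta, K_H)               # line 5 (uses H_H)
        omega = omega - eta * grad_omega               # line 4
        theta = (theta - eta * grad_theta              # line 6
                 - eta * lam_P * np.linalg.solve(K_P, g_well)
                 - eta * lam_H * np.linalg.solve(K_H, g_ill))
    return theta, omega

rng = np.random.default_rng(1)
N, D = 200, 4
X_P = rng.normal(size=(N, D)) * np.array([0.5, 1.0, 1.5, 2.0]) / np.sqrt(N)
X_H = rng.normal(size=(N, D)) * np.array([2.0, 1.5, 1.0, 0.5]) / np.sqrt(N)
Y_P = X_P @ rng.normal(size=(D, 1))
theta_I, omega = immunize(X_P, Y_P, X_H)

def kappa(theta, X):
    return np.linalg.cond(theta.T @ X.T @ X @ theta)

rir = ((kappa(theta_I, X_H) / kappa(np.eye(D), X_H))
       / (kappa(theta_I, X_P) / kappa(np.eye(D), X_P)))
print(rir)
```

Both gradients are evaluated at $(\omega_t, \theta_t)$ before either parameter is updated, mirroring lines 4 and 6. How large the resulting ratio is depends on the hyperparameters, so the printed value should be read as a toy outcome rather than a reproduction of the reported results.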

4.3. Condition Number Guarantees

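We show in the following theorem that the condition number decrease/increase guarantees introduced in Theorem A.1 (4) and Theorem 4.1 (4) are preserved for $\theta^\top K \theta$ even when the gradient updates are taken in $\theta$ as in Alg. 1, instead of in $\theta^\top K \theta$.

Theorem 4.3. For the trainable feature extractor $\theta$, the feature covariance $H_P(\theta) = \theta^\top K_P \theta$ of the primary task and $H_H(\theta) = \theta^\top K_H \theta$ of the immunization task, with $\mathrm{rank}(H_P) = k_P$, $\mathrm{rank}(H_H) = k_H$ and compact SVDs $H_P(\theta) = U_P \mathrm{Diag}(\sigma_P) V_P^\top$, $H_H(\theta) = U_H \mathrm{Diag}(\sigma_H) V_H^\top$, where $\sigma_P = [\sigma_{P,1}, \dots, \sigma_{P,k_P}]$ and $\sigma_H = [\sigma_{H,1}, \dots, \sigma_{H,k_H}]$,

(1) if $\sigma_{H_P}^{\max}$ is unique, i.e., $\sigma_{H_P}^{\max} = \sigma_{P,1} > \sigma_{P,2}$, update $\theta$ such that $\theta' = \theta - \eta_P K_P^{-1} \nabla_\theta R_{\text{well}}(H_P(\theta))$ for

$$0 < \eta_P < \min\left\{\frac{1}{\left(1 - \frac{1}{D_{\text{in}}}\right)\sigma_{P,1}}, \; \frac{\sqrt{\sigma_{P,1}\sigma_{P,2}} - \sigma_{P,2}}{\frac{2}{D_{\text{in}}}\sigma_{P,2}^2}\right\},$$

then $\kappa(\theta'^\top K_P \theta') < \kappa(\theta^\top K_P \theta)$,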

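(2) if $\sigma_{H_H}^{\min}$ is unique, i.e., $\sigma_{H_H}^{\min} = \sigma_{H,k_H} < \sigma_{H,k_H-1}$, update $\theta$ such that $\theta' = \theta - \eta_H K_H^{-1} \nabla_\theta R_{\text{ill}}(H_H(\theta))$ for

$$0 < \eta_H < \frac{1}{1 - 2\sigma_{H_H}^{\min}/k_H}\left(\frac{1}{2k_H}\|\theta^\top K_H \theta\|_F^2 - \frac{1}{2}\left(\sigma_{H_H}^{\min}\right)^2\right)^2,$$

then $\kappa(\theta'^\top K_H \theta') > \kappa(\theta^\top K_H \theta)$.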

Proof sketch. There is a mismatch between the gradient update, which is taken on $\theta$, and the condition number update, which is observed for $H(\theta)$. To address this, we carefully leverage the structure of the problem, noting that $H(\theta)$, unlike a general matrix, is symmetric and positive semidefinite, with identical left and right singular vectors. Exploiting this property, along with our algorithm design, ensures that the linearity in singular value updates is preserved when expanding $H(\theta')$ using the closed-form gradient in Theorem 4.2. Consequently, a monotonic increase or decrease in the condition number can be guaranteed by appropriately selecting the step size. The full proof is provided in Appendix B.4.
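The effect of the preconditioned updates can be checked numerically on a toy problem. In the NumPy sketch below, the step for $R_{\text{well}}$ is a fraction of the smaller of the two quantities bounding $\eta_P$, while the step for $R_{\text{ill}}$ is a conservative choice made for this sketch (an assumption, not the bound stated in the theorem). The data, the full-rank assumption, and the helper names are likewise illustrative.

```python
import numpy as np

def spectrum(theta, K):
    H = theta.T @ K @ theta
    sigma, V = np.linalg.eigh(H)          # ascending eigenvalues of the PSD matrix H
    return H, sigma, V

def step_well(theta, K, frac=0.1):
    """theta' = theta - eta * K^{-1} grad_theta R_well(H(theta))."""
    H, sigma, V = spectrum(theta, K)
    D = H.shape[0]
    s1, s2, v1 = sigma[-1], sigma[-2], V[:, -1]
    eta = frac * min(1.0 / ((1 - 1 / D) * s1),
                     (np.sqrt(s1 * s2) - s2) / (2 / D * s2 ** 2))
    grad = 2 * K @ theta @ (s1 * np.outer(v1, v1) - H / D)
    return theta - eta * np.linalg.solve(K, grad)

def step_ill(theta, K, frac=0.25):
    """theta' = theta - eta * K^{-1} grad_theta R_ill(H(theta)), with a
    conservative eta chosen for this sketch."""
    H, sigma, V = spectrum(theta, K)
    k = H.shape[0]                         # assumes full rank
    sk, vk = sigma[0], V[:, 0]
    denom = np.sum(sigma ** 2) / (2 * k) - 0.5 * sk ** 2
    eta = frac * denom ** 2 / ((1 - 1 / k) * sk)
    grad = 2 * K @ theta @ (sk * np.outer(vk, vk) - H / k) / denom ** 2
    return theta - eta * np.linalg.solve(K, grad)

def kappa(theta, K):
    return np.linalg.cond(theta.T @ K @ theta)

rng = np.random.default_rng(0)
N, D = 200, 4
X_P = rng.normal(size=(N, D)) * np.array([2.0, 1.5, 1.0, 0.5])
X_H = rng.normal(size=(N, D)) * np.array([0.5, 1.0, 1.5, 2.0])
K_P, K_H = X_P.T @ X_P / N, X_H.T @ X_H / N
theta = np.eye(D) + 0.1 * rng.normal(size=(D, D))

kappa_P_before, kappa_P_after = kappa(theta, K_P), kappa(step_well(theta, K_P), K_P)
kappa_H_before, kappa_H_after = kappa(theta, K_H), kappa(step_ill(theta, K_H), K_H)
print(kappa_P_before, kappa_P_after, kappa_H_before, kappa_H_after)
```

Multiplying the gradient by $K^{-1}$ turns the update into $\theta' = \theta M$ for a symmetric matrix $M$ that shares its eigenvectors with $H(\theta)$, so $H(\theta') = M H(\theta) M$ and each singular value is rescaled independently. That is why a single step in $\theta$ moves the condition number of $H(\theta)$ in the intended direction.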

4.4. Additional Discussion

Implementation considerations. At a glance, it may seem that to implement Alg. 1 using automatic differentiation packages, e.g., PyTorch (Paszke et al., 2019), one would have to implement a custom optimizer and perform multiple update steps. Instead, we observe that by directly modifying the computation graph, the update only requires a single backward pass. This is done by introducing a dummy layer whose forward pass is the identity function and whose backward pass multiplies the gradient by the inverse feature covariance matrix. The dummy layer implementation is inspired by prior work on gradient estimators (Bengio et al., 2013; Roeder et al., 2017). Pseudo-code is provided in Appendix C.3.
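One possible shape of such a dummy layer is sketched below with `torch.autograd.Function`. This is not the pseudo-code of Appendix C.3; the class name `PreconditionGrad`, the helper `r_well`, and the toy data are hypothetical, and only the $R_{\text{well}}$ path is shown.

```python
import torch

class PreconditionGrad(torch.autograd.Function):
    """Identity in the forward pass; the backward pass left-multiplies the
    incoming gradient by a fixed matrix (here an inverse covariance)."""

    @staticmethod
    def forward(ctx, theta, K_inv):
        ctx.save_for_backward(K_inv)
        return theta.view_as(theta)

    @staticmethod
    def backward(ctx, grad_output):
        (K_inv,) = ctx.saved_tensors
        return K_inv @ grad_output, None          # no gradient for K_inv

def r_well(H):
    """R_well of Eq. (3) for a square matrix H."""
    D = H.shape[0]
    return 0.5 * torch.linalg.matrix_norm(H, ord=2) ** 2 - H.pow(2).sum() / (2 * D)

torch.manual_seed(0)
D = 3
X = torch.randn(20, D, dtype=torch.float64)
K = X.T @ X
K_inv = torch.linalg.inv(K)
theta = torch.eye(D, dtype=torch.float64, requires_grad=True)

theta_pre = PreconditionGrad.apply(theta, K_inv)   # identity in the forward pass
H = theta_pre.T @ K @ theta_pre
r_well(H).backward()                               # single backward pass
# theta.grad now holds K^{-1} grad_theta R_well(H(theta))
```

Because only the regularizer branch passes through the dummy layer, the supervised loss can be added to the same graph and its gradient with respect to `theta` stays unscaled, so a standard optimizer step then realizes line 6 of Alg. 1.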

Limitations. The monotonicity guarantees in Theorem 4.3 serve as a theoretical justification for our proposed algorithm, albeit a partial reflection of the application setup. Note that the feature extractor is updated with the gradients of the two regularizers jointly together with that of the supervised loss and the guarantees may not linearly combine as such. In practice, maintaining the balance between κ (HP (θ)) and κ (HH (θ)) requires a proper choice of hyperparameters.

Next, the framework we analyzed focuses on linear feature extractors and on linear probing for transfer learning. We are aware of the practical limitations of this setting. To address this, in the experiments, we empirically study the effect of the proposed method on non-linear models, i.e., deep-nets, and demonstrate our method's potential despite the theoretical gap.

Table 1. Quantitative results of immunization on the House Price dataset (Montoya & DataCanary, 2016), computed over 5 random seeds.

| Method | Eq. (15) (i) | Eq. (15) (ii) | RIR |
|---|---|---|---|
| $R_{\text{ill}}$ Only | 90.02 ± 3.773 | 72.415 ± 3.545 | 1.244 ± 0.021 |
| IMMA | 7.053 ± 1.662 | 3.545 ± 0.880 | 2.001 ± 0.187 |
| Opt κ | 1.518 ± 0.027 | 0.016 ± 0.001 | 92.58 ± 4.492 |
| Ours | 18.92 ± 2.056 | 0.053 ± 0.002 | 356.20 ± 5.491 |

5. Experiments

We evaluate the proposed Alg. 1 on regression and image classification tasks using linear models, and also explore immunizing non-linear models, i.e., deep-nets. Experiment and implementation details are provided in Appendix C.

Evaluation metrics. We introduce the relative immunization ratio (RIR) to quantify the effectiveness of the immunization based on the ratio of the condition numbers of the Hessian, defined as follows:

$$\mathrm{RIR} \triangleq \underbrace{\frac{\kappa(H_H(\theta^{\mathrm{I}}))}{\kappa(H_H(I))}}_{\text{(i)}} \Bigg/ \underbrace{\frac{\kappa(H_P(\theta^{\mathrm{I}}))}{\kappa(H_P(I))}}_{\text{(ii)}}, \tag{15}$$

where $I$ denotes the identity matrix. Each term here measures the ratio between condition numbers with and without the pre-trained feature extractor on (i) the harmful task or (ii) the pre-training task.

A successful immunization is characterized by:

(i) a large ratio $\kappa(H_H(\theta^{\mathrm{I}})) / \kappa(H_H(I))$, i.e., using the immunized feature extractor makes the optimization of linear probing more difficult on the harmful task.

(ii) a small ratio $\kappa(H_P(\theta^{\mathrm{I}})) / \kappa(H_P(I))$, i.e., using the pre-trained extractor does not make optimization more difficult on the pre-training task.

To obtain a single metric, we compare (i) and (ii) relative to each other. In other words, an effective immunized model should have a relative immunization ratio $\mathrm{RIR} \gg 1$.
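For the linear setting, the metric can be computed directly from the stacked inputs. The NumPy sketch below assumes $H(\theta) = \theta^\top X^\top X \theta$ as in Eq. (9); the function names and the two-dimensional toy inputs are illustrative.

```python
import numpy as np

def kappa_hessian(theta, X):
    """Condition number of the linear-probing Hessian theta^T (X^T X) theta."""
    K = X.T @ X
    return np.linalg.cond(theta.T @ K @ theta)

def relative_immunization_ratio(theta_I, X_H, X_P):
    """Return ratio (i), ratio (ii), and RIR = (i) / (ii)."""
    identity = np.eye(theta_I.shape[0])
    ratio_i = kappa_hessian(theta_I, X_H) / kappa_hessian(identity, X_H)
    ratio_ii = kappa_hessian(theta_I, X_P) / kappa_hessian(identity, X_P)
    return ratio_i, ratio_ii, ratio_i / ratio_ii

# Toy inputs: K_P = diag(4, 1) and K_H = diag(1, 4).
X_P = np.diag([2.0, 1.0])
X_H = np.diag([1.0, 2.0])
theta_I = np.diag([0.5, 1.0])      # whitens K_P, stretches K_H
ratio_i, ratio_ii, rir = relative_immunization_ratio(theta_I, X_H, X_P)
print(ratio_i, ratio_ii, rir)      # 4.0, 0.25, 16.0 up to rounding
```

Using the identity as the feature extractor gives a ratio of exactly one on both tasks, which is why values of RIR well above one indicate that the extractor separates the two tasks.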

Baselines. We consider three baselines for comparisons:

• $R_{\text{ill}}$ Only immunizes the model by minimizing only the regularizer $R_{\text{ill}}(H_H)$ as defined in Eq. (12) using gradient descent.

• IMMA (Zheng & Yeh, 2024) is formulated as a bi-level optimization program where both the lower-level and upper-level tasks are solved via gradient descent. In the lower level, it minimizes $L(\mathcal{D}_H, w, \theta)$ w.r.t. $\theta$ to obtain $\theta^*$, and in the upper level, it maximizes $L(\mathcal{D}_H, w, \theta^*) - L(\mathcal{D}_P, \omega, \theta^*)$ w.r.t. $\theta$ by backpropagating through $\theta^*$.

• Opt κ directly minimizes $\kappa(H_P(\theta)) - \kappa(H_H(\theta))$ w.r.t. $\theta$ via gradient descent instead of using our proposed regularizers.

[Figure 1: two panels, "Norm ratio curve on $\mathcal{D}_P$" and "Norm ratio curve on $\mathcal{D}_H$"; x-axis: Epochs (0 to 20000).]

Figure 1. Norm ratio Eq. (16) vs. Epochs. We visualize the convergence of linear probing of different immunized models using gradient descent with an exact line search. Here, Identity corresponds to not using a feature extractor, i.e., $\theta^{\mathrm{I}} = I$. Observe that Ours made the convergence faster on $\mathcal{D}_P$ while slower on $\mathcal{D}_H$ when compared to the other baselines, consistent with the results in Tab. 1.

5.1. Experiments on Immunizing Linear Models

Linear regression task. We use the regression task from the House Prices dataset (Montoya & DataCanary, 2016). We split the data into $\mathcal{D}_P$ and $\mathcal{D}_H$ based on the feature MSZoning. For the pre-training task, we use the target LotArea, and for the harmful task we use the target SalePrice. Both $\mathcal{D}_P$ and $\mathcal{D}_H$ contain input vectors of dimension 79. We immunized the model by running Alg. 1 for 100 epochs with $\eta = 0.005$. We choose $\lambda_P$ and $\lambda_H$ by balancing the gradient norms of $R_{\text{well}}$ and $R_{\text{ill}}$. The implementation details can be found in Appendix C.2.

In Tab. 1, we present the empirical results of immunizing a linear feature extractor $\theta$. We observe that only Opt κ and our method successfully immunize the model, achieving an RIR that is much greater than 1. $R_{\text{ill}}$ Only and IMMA successfully made the harmful task more ill-conditioned, i.e., Eq. (15) (i) went up; however, this comes at the cost of making the other task ill-conditioned as well, i.e., Eq. (15) (ii) also went up.

Next, we demonstrate how a large condition number slows down the convergence of linear probing on the harmful task by analyzing the norm ratio defined as

$$\|w_t - w^*\|_2^2 \, / \, \|w_0 - w^*\|_2^2, \tag{16}$$

which measures how the classifier weights $w_t$ at step $t$ approach the optimal weights $w^*$ during fine-tuning. Note that naively choosing a step size will not reflect the difference in condition number. Hence, we use the exact line search (Boyd & Vandenberghe, 2004), which chooses the step size that minimizes the loss at each iteration.
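The norm ratio under exact line search can be reproduced on a toy quadratic. The NumPy sketch below assumes the loss $\frac{1}{2}(w - w^*)^\top H (w - w^*)$, for which the exact step along the gradient $g$ is $g^\top g / (g^\top H g)$; the Hessians and the starting point are illustrative choices, with the start picked to expose the slow zig-zag behavior of the ill-conditioned case.

```python
import numpy as np

def norm_ratio_curve(H, w_star, w0, steps):
    """Steepest descent with exact line search on the quadratic
    0.5 * (w - w_star)^T H (w - w_star); returns Eq. (16) after each step."""
    w = np.array(w0, dtype=float)
    init = np.sum((w - w_star) ** 2)
    ratios = []
    for _ in range(steps):
        g = H @ (w - w_star)              # gradient of the quadratic loss
        curvature = g @ H @ g
        if curvature > 0:                 # skip the update once converged
            w = w - (g @ g) / curvature * g
        ratios.append(np.sum((w - w_star) ** 2) / init)
    return np.array(ratios)

w_star = np.zeros(2)
w0 = np.array([1.0, 100.0])
curve_well = norm_ratio_curve(np.diag([1.0, 1.0]), w_star, w0, steps=10)    # kappa = 1
curve_ill = norm_ratio_curve(np.diag([100.0, 1.0]), w_star, w0, steps=10)   # kappa = 100
print(curve_well[-1], curve_ill[-1])
```

With a condition number of one the optimum is reached in a single step, whereas for this start the ill-conditioned problem only shrinks each coordinate by a factor of $99/101$ per step, leaving a norm ratio of about 0.67 after ten steps.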

As illustrated in Fig. 1, both our method and Opt κ slow down convergence in $\mathcal{D}_H$ compared to Identity while accelerating convergence in $\mathcal{D}_P$. Furthermore, our method achieves a stronger immunization effect than Opt κ. In contrast, $R_{\text{ill}}$ Only and IMMA slowed the convergence on both the harmful task $\mathcal{D}_H$ and the pre-training task $\mathcal{D}_P$.

Table 2. Quantitative results of immunization on MNIST (LeCun, 1998), computed over 3 random seeds and averaged over all digit pairs. Note that Opt κ has a large STD in RIR, resulting in the deviation between RIR and the ratio of the averaged values.

| Method | Eq. (15) (i) | Eq. (15) (ii) | RIR |
|---|---|---|---|
| $R_{\text{ill}}$ Only | 14.832 ± 1.039 | 8.654 ± 0.606 | 1.933 ± 0.046 |
| IMMA | 4.522 ± 0.139 | 2.774 ± 0.094 | 1.774 ± 0.041 |
| Opt κ | 3.196 ± 1.225 | 0.756 ± 1.171 | 69.73 ± 54.00 |
| Ours | 6.345 ± 0.188 | 0.149 ± 0.009 | 70.04 ± 3.280 |

Image classification task. For image classification, we conduct experiments using MNIST (LeCun, 1998). The MNIST dataset consists of images over 10 digit classes, which can be formulated into 10 independent binary classification tasks. Across all pairs of tasks, we choose one to be the harmful task $\mathcal{D}_H$ and the other to be the pre-training task $\mathcal{D}_P$, resulting in a total of 90 experiments. We ran Alg. 1 for 30 epochs with $\eta = 0.005$ for these experiments. The implementation details can be found in Appendix C.2.

In Tab. 2, we present the quantitative results on these binary task pairs. For each entry, the values are averaged over all 90 pairs. Based on the averaged results, we observe that our method effectively immunizes the linear feature extractor θ on DH without compromising performance on DP. Although Opt κ achieves comparable RIR with our method, the variances of the metric values are relatively large. This indicates that Opt κ is sensitive to random initialization while our method is robust.

In Fig. 2 we further analyze the results by visualizing the log(RIR) for each digit pair. A blue block indicates successful immunization, while a red block indicates failure. It can be observed that $R_{\text{ill}}$ Only fails for all digit pairs, IMMA only succeeds in one pair, and Opt κ fails for 32 out of 90 pairs. In contrast, our method achieves success across all digit pairs, demonstrating its effectiveness for immunization.

[Figure 2: four 10 × 10 heatmaps of log(RIR), with panels "$R_{\text{ill}}$ Only", "IMMA", "Opt κ", and "Ours"; both axes index the digits 0 to 9, and one axis is labeled H.]

Figure 2. Visualization of log(RIR) of binary classification tasks created from MNIST. Each element in the figure corresponds to the log(RIR) of a model immunized against $\mathcal{D}_H$ from the pre-training task of $\mathcal{D}_P$. We color the block blue if $\mathrm{RIR} \gg 1$, and red otherwise. Our method succeeds in immunizing the model across all digit pairs, while the baselines failed in most pairs.

Thus far, we have conducted experiments strictly following the immunization setting that we have proposed in Sec. 3. However, one limitation of the setting is that the feature extractor is assumed to be linear, which limits its real-world potential. To further study the practicality of our method, despite the theoretical gap, we conduct experiments with non-linear models, i.e., deep-nets, on the larger-scale image classification dataset ImageNet.

5.2. Experiments on Immunizing Deep-Nets

Immunization task. In this experiment, we consider a common setup of linear probing on models pre-trained on ImageNet (Deng et al., 2009), i.e., ImageNet serves as $\mathcal{D}_P$. For $\mathcal{D}_H$ we experiment with the Stanford Cars Dataset (Krause et al., 2013) and the Country211 Dataset (Radford et al., 2021). These datasets have been previously used for studying transfer learning (Radford et al., 2021) for image classification. More dataset details are deferred to Appendix C.1.

Experiment setup. For non-linear models, we experiment with the architectures of ResNet18 (He et al., 2016) and ViT (Dosovitskiy, 2021). Here we study a practical setting where a given model with parameters $\theta_0$ has already been trained on $\mathcal{D}_P$ and would undergo immunization to obtain $\theta^{\mathrm{I}}$ to be released to the public.

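Note that as we are now using an initialization of $\theta_0$ and a non-linear feature extractor $f_\theta$, we extend the RIR metric to account for those changes. Specifically, we propose

$$\mathrm{RIR}_{\theta_0} \triangleq \underbrace{\frac{\kappa(\tilde{H}_H(\theta^{\mathrm{I}}))}{\kappa(\tilde{H}_H(\theta_0))}}_{\text{(i)}} \Bigg/ \underbrace{\frac{\kappa(\tilde{H}_P(\theta^{\mathrm{I}}))}{\kappa(\tilde{H}_P(\theta_0))}}_{\text{(ii)}}, \tag{17}$$

where we compare the immunized model $\theta^{\mathrm{I}}$ relative to the initialization model $\theta_0$. Here, $\tilde{H}_H(\theta)$ denotes the Hessian for linear probing on $\mathcal{D}_H$ with a non-linear $f_\theta$, i.e.,

$$\tilde{H}_H(\theta) = \nabla_w^2 L(\mathcal{D}_H, w, \theta) = \tilde{X}_H(\theta)^\top \tilde{X}_H(\theta). \tag{18}$$

Here, $\tilde{X}_H(\theta) \triangleq [f_\theta(x); x \in \mathcal{D}_H] \in \mathbb{R}^{N \times D_{\text{hid}}}$ denotes the concatenation of the features, with dimension $D_{\text{hid}}$, extracted from the input data. Due to memory constraints, we approximate Eq. (17) by randomly sampling 20 groups of training data, each containing 100 samples, and reporting the average values.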

Finally, we also report the task performance after immunization. This is because, as the feature extractor is non-linear we are no longer guaranteed to retain the task performance. For Res Net18, we immunize only the last two convolutional blocks of the trained feature extractor and keep the rest of the parameters frozen as in θ0. For Vi T, we only immunize the final transformer block. We optimize Eq. (11) using SGD with momentum, the default optimizer on Image Net. Further details are provided in Appendix C.2.

Results. We present the quantitative results of immunizing deep-nets in Tab. 3. On both the Cars and Country211 datasets, our method demonstrates strong performance when applied to ResNet18 and ViT, as indicated by $\mathrm{RIR}_{\theta_0} \gg 1$. In comparison, $R_{\text{ill}}$ Only and IMMA did not effectively immunize the models in all evaluated settings. Opt $\kappa$ also succeeds in immunizing the models, but our proposed method outperforms it in $\mathrm{RIR}_{\theta_0}$.

Next, we report the test accuracy of the immunized models on $\mathcal{D}_P$, i.e., ImageNet1K. On the ResNet18 architecture, we observe a reduction in test accuracy from 68.24% for the initialization model $\theta_0$ to 62.36% when $\mathcal{D}_H$ is Cars and to 65.01% when $\mathcal{D}_H$ is Country211. Interestingly, on the ViT architecture the test accuracy increased from 81.78% to 82.79% for Cars and to 83.17% for Country211. These results suggest that it is possible to immunize a non-linear model against the harmful task without losing effectiveness on the other task.

Table 3. Quantitative results of immunizing models pre-trained on ImageNet (Deng et al., 2009), computed over 3 random seeds (mean ± standard deviation). The $\mathcal{D}_P$ test accuracy of the off-the-shelf initialization $\theta_0$ is 68.24% for ResNet18 and 81.78% for ViT. We report $\mathrm{RIR}_{\theta_0}$ to measure the quality of immunization. The test accuracy on $\mathcal{D}_P$ is reported to ensure that performance on the pre-training task is maintained.

| $\mathcal{D}_H$ | Method | ResNet18: Eq. (17) (i) | ResNet18: Eq. (17) (ii) | ResNet18: $\mathrm{RIR}_{\theta_0}$ | ResNet18: $\mathcal{D}_P$ Test Acc. (%) | ViT: Eq. (17) (i) | ViT: Eq. (17) (ii) | ViT: $\mathrm{RIR}_{\theta_0}$ | ViT: $\mathcal{D}_P$ Test Acc. (%) |
|---|---|---|---|---|---|---|---|---|---|
| - | Init. $\theta_0$ | 1.0 | 1.0 | 1.0 | 68.24 | 1.0 | 1.0 | 1.0 | 81.78 |
| Cars | $R_{\text{ill}}$ Only | 1.878 ± 0.034 | 1.786 ± 0.025 | 1.057 ± 0.026 | 63.84 ± 0.292 | 13.121 ± 0.038 | 4.097 ± 0.098 | 3.342 ± 0.048 | 82.21 ± 0.035 |
| Cars | IMMA | 0.866 ± 0.002 | 0.889 ± 0.001 | 0.974 ± 0.002 | 63.57 ± 0.234 | 1.422 ± 0.006 | 2.090 ± 0.043 | 0.702 ± 0.007 | 81.89 ± 0.010 |
| Cars | Opt $\kappa$ | 1.217 ± 0.021 | 0.798 ± 0.005 | 1.527 ± 0.019 | 63.65 ± 0.148 | 3.598 ± 0.510 | 0.171 ± 0.033 | 26.369 ± 2.814 | 82.51 ± 0.085 |
| Cars | Ours | 2.386 ± 0.442 | 0.699 ± 0.062 | 3.467 ± 0.358 | 62.36 ± 0.173 | 7.945 ± 0.247 | 0.323 ± 0.086 | 34.517 ± 0.886 | 82.79 ± 0.200 |
| Country211 | $R_{\text{ill}}$ Only | 20.727 ± 0.791 | 20.675 ± 1.685 | 1.038 ± 0.05 | 62.17 ± 1.599 | 69.291 ± 1.198 | 63.519 ± 6.62 | 1.122 ± 0.097 | 80.73 ± 0.129 |
| Country211 | IMMA | 0.791 ± 0.005 | 0.814 ± 0.006 | 0.972 ± 0.007 | 67.03 ± 0.146 | 6.242 ± 0.203 | 7.599 ± 0.717 | 0.845 ± 0.048 | 82.47 ± 0.036 |
| Country211 | Opt $\kappa$ | 1.538 ± 0.155 | 1.053 ± 0.091 | 1.472 ± 0.043 | 66.81 ± 0.115 | 4.589 ± 0.079 | 0.300 ± 0.106 | 16.498 ± 5.183 | 82.79 ± 0.023 |
| Country211 | Ours | 3.287 ± 0.33 | 0.399 ± 0.034 | 8.714 ± 0.672 | 65.01 ± 0.143 | 20.894 ± 1.425 | 0.700 ± 0.082 | 41.341 ± 0.967 | 83.17 ± 0.075 |

To further show that a larger Eq. (17) (i) indicates a better-immunized model, we report linear-probing (fine-tuning) results on the different feature extractors and provide the test accuracy on $\mathcal{D}_H$, where $\mathcal{D}_H$ is the Stanford Cars dataset. As shown in Fig. 3, our method exhibits the slowest convergence on both ResNet18 and ViT, as indicated by the lowest test accuracy among the compared methods. In summary, our method remains effective on deep-nets, producing models that satisfy the requirements of an immunized model as in Definition 3.1.

6. Related Work

We briefly discuss related research on AI safety and the condition number.

AI safety, model un/re-learning, and immunization. AI safety has received attention lately, specifically in generative AI, due to the impressive progress. We refer the reader to Brundage et al. (2018); Marchal et al. (2024); Bengio et al. (2025) for a more in-depth discussion on this topic. In the following, we will discuss model unlearning, one of the ways to mitigate the potential of misuse, followed by model immunization, which protects a model against relearning.

Machine unlearning was first introduced by Cao & Yang (2015) to remove a user's private information from a model. Approximate unlearning aims to achieve this by modifying the pre-trained model directly using the specific data samples to erase, without requiring full retraining (Nguyen et al., 2020; Wu et al., 2022; Guo et al., 2019; Sekhari et al., 2021; Neel et al., 2021). In the context of text-to-image models, several methods for concept erasure have been proposed. These include inference-time approaches (Brack et al., 2023; Schramowski et al., 2023), fine-tuning of diffusion models (Gandikota et al., 2023; Kim et al., 2023; Kumari et al., 2023), and direct model editing (Zhang et al., 2024; Gandikota et al., 2024).

While promising, these works still face potential risks of the re-emergence or re-learning of harmful data (Zheng & Yeh, 2024; Zheng et al., 2024; Zhan et al., 2024; Bertran et al., 2024; Xu et al., 2025). To avoid relearning or further fine-tuning on harmful data, Zheng & Yeh (2024) propose to immunize text-to-image models against malicious fine-tuning, and Zheng & Yeh (2025) extend model immunization to multi-concept settings. Recent work highlights the importance of preventing re-fine-tuning or distillation on harmful tasks in language models (Huang et al., 2024; Savani et al., 2025; Rosati et al., 2024a; Tamirisa et al., 2024; Rosati et al., 2024b; Henderson et al., 2023) and in encoder probing (Ding et al., 2025), which is closely related to our goal. While we also study the task of model immunization, unlike Zheng & Yeh (2024), who primarily focus on empirical applications to generative tasks, our work aims to provide a more principled understanding of model immunization by analyzing it through the lens of the condition number.

Minimizing Condition Number. Condition number has been a key factor in the convergence rates and accuracies of iterative methods, e.g., Jacobi method (Arioli & Romani, 1985), steepest descent (Luenberger et al., 1984), conjugate gradient (Hestenes et al., 1952), for solving optimization problems from classic linear systems (Saad, 2003) to those with general nonlinear objectives (Nesterov, 2018) concerning modern machine learning applications. It is widely observed that a small condition number tends to speed up convergence and improve accuracy whereas a large condition number could lead to an unstable optimization procedure (Saarinen et al., 1993; Kress, 2012; Bengio et al., 2017; Guille-Escuret et al., 2021).
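To make the link between the condition number and convergence speed concrete, the following minimal sketch (an illustration added here, not an experiment from the paper) runs gradient descent on the quadratic $f(w) = \frac{1}{2} w^\top H w$ with $H = \mathrm{Diag}(\kappa, 1)$ and step size $1/\kappa$, and counts the iterations needed to bring $\|w\|$ below a tolerance. The function name, starting point, and tolerance are arbitrary choices for this example.

```python
import numpy as np

def gd_iterations(kappa, tol=1e-3, max_iter=100_000):
    """Iterations of gradient descent on 0.5 * w^T H w until ||w|| <= tol.

    H = Diag(kappa, 1) has condition number kappa; the step size 1/kappa
    is the reciprocal of the largest eigenvalue (the smoothness constant).
    """
    H = np.diag([float(kappa), 1.0])
    w = np.ones(2)
    step = 1.0 / kappa
    for t in range(1, max_iter + 1):
        w = w - step * (H @ w)  # gradient of 0.5 * w^T H w is H w
        if np.linalg.norm(w) <= tol:
            return t
    return max_iter

iters = {kappa: gd_iterations(kappa) for kappa in (1, 10, 100)}
```

The component along the smallest eigenvalue shrinks by a factor of $1 - 1/\kappa$ per step, so the iteration count grows roughly linearly with $\kappa$. This is the behavior that immunization aims to induce on the harmful task.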

Figure 3. Test accuracy vs. fine-tuning epochs on $\mathcal{D}_H$. We visualize the test accuracy of linear probing, using gradient descent, on the different immunized models pre-trained on ImageNet. Here $\mathcal{D}_H$ is the Stanford Cars dataset. [Two panels: "Fine-tuning accuracy with ResNet-18" and "Fine-tuning accuracy with ViT"; x-axis: Epochs (0 to 50); y-axis: Test Accuracy (%).]

As a result, methods to minimize the condition number in various contexts have been proposed. Preconditioning (Evans, 1968), a technique that involves finding a matrix, i.e., the preconditioner, to multiply with the original matrix so that the resulting matrix has a significantly smaller condition number, is widely used for solving linear systems. The preconditioner can be constructed using methods such as semidefinite programming (Jambulapati et al., 2020; 2023; Qu et al., 2024) or matrix equilibration (Van der Sluis, 1969), and has recently found applications in deep learning (Saratchandran et al., 2024).
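As a small illustration of the preconditioning idea discussed above, the sketch below applies symmetric diagonal (Jacobi) scaling to a badly scaled symmetric positive definite matrix and compares condition numbers before and after. The matrix is a made-up example, and diagonal scaling is only one simple preconditioner; it is not the method of any of the cited works.

```python
import numpy as np

# Hypothetical badly scaled symmetric positive definite matrix.
A = np.array([[100.0, 0.5],
              [0.5,   1.0]])

# Symmetric diagonal scaling: D^{-1/2} A D^{-1/2} with D = Diag(A).
d_inv_sqrt = 1.0 / np.sqrt(np.diag(A))
A_pre = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

kappa_before = np.linalg.cond(A)      # 2-norm condition number
kappa_after = np.linalg.cond(A_pre)
```

After scaling, the matrix has unit diagonal and off-diagonal entries of 0.05, so its eigenvalues are 1.05 and 0.95 and its condition number is close to 1, whereas the original matrix has a condition number of roughly 100.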

Most related to this work, Balazs et al. (2024) propose to regularize the condition number of weight matrices by directly adding the condition number term to the optimization objective and applying (sub)gradient descent. Observing that the condition number is discontinuous and nonconvex, Nenov et al. (2024) propose a differentiable regularizer that minimizes the matrix condition number with a monotonic decrease guarantee when optimized with gradient descent. To the best of our knowledge, no notable effort has been made to increase or maximize the condition number.

7. Conclusion

We propose a framework for studying model immunization through the condition number of the Hessian matrix. We show that immunization can be achieved by increasing the condition number of the Hessian for the harmful task while keeping it stable for the pre-training task. To achieve this, we introduce two differentiable regularizers and propose an algorithm that incorporates them into gradient-based optimization. Empirical results on both linear and deep models demonstrate the effectiveness of our approach to model immunization. We believe that our proposed framework is a first step towards a more principled understanding of model immunization and will ultimately make open-sourced models safer.

Acknowledgements

This project is supported in part by an NSF Award #2420724 and the Ross-Lynn Research Scholar Grant.

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning and Optimization. While there are many potential societal consequences of our work, we believe that the benefits outweigh the harms. Specifically, the topic of model immunization is towards making AI safer.

References

Arioli, M. and Romani, F. Relations between condition numbers and the convergence of the Jacobi method for real positive definite matrices. Numerische Mathematik, 1985.

Balazs, P., Haider, D., Lostanlen, V., and Perfler, F. Trainable signal encoders that are robust against noise. In INTER-NOISE and NOISE-CON Congress and Conference Proceedings, 2024.

Bengio, Y., Léonard, N., and Courville, A. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.

Bengio, Y., Goodfellow, I., and Courville, A. Deep learning. MIT Press, Cambridge, MA, USA, 2017.

Bengio, Y., Mindermann, S., Privitera, D., Besiroglu, T., Bommasani, R., Casper, S., Choi, Y., Fox, P., Garfinkel, B., Goldfarb, D., Heidari, H., Ho, A., Kapoor, S., Khalatbari, L., Longpre, S., Manning, S., Mavroudis, V., Mazeika, M., Michael, J., Newman, J., Ng, K. Y., Okolo, C. T., Raji, D., Sastry, G., Seger, E., Skeadas, T., South, T., Strubell, E., Tramèr, F., Velasco, L., Wheeler, N., Acemoglu, D., Adekanmbi, O., Dalrymple, D., Dietterich, T. G., Felten, E. W., Fung, P., Gourinchas, P.-O., Heintz, F., Hinton, G., Jennings, N., Krause, A., Leavy, S., Liang, P., Ludermir, T., Marda, V., Margetts, H., McDermid, J., Munga, J., Narayanan, A., Nelson, A., Neppel, C., Oh, A., Ramchurn, G., Russell, S., Schaake, M., Schölkopf, B., Song, D., Soto, A., Tiedrich, L., Varoquaux, G., Yao, A., Zhang, Y.-Q., Ajala, O., Albalawi, F., Alserkal, M., Avrin, G., Busch, C., de Carvalho, A. C. P. d. L. F., Fox, B., Gill, A. S., Hatip, A. H., Heikkilä, J., Johnson, C., Jolly, G., Katzir, Z., Khan, S. M., Kitano, H., Krüger, A., Lee, K. M., Ligot, D. V., López Portillo, J. R., Molchanovskyi, O., Monti, A., Mwamanzi, N., Nemer, M., Oliver, N., Pezoa Rivera, R., Ravindran, B., Riza, H., Rugege, C., Seoighe, C., Sheehan, J., Sheikh, H., Wong, D., and Zeng, Y. International AI safety report. Technical Report DSIT 2025/001, 2025. URL https://www.gov.uk/government/publications/international-ai-safety-report-2025.

Bertran, M. A., Tang, S., Kearns, M., Morgenstern, J. H., Roth, A., and Wu, S. Reconstruction attacks on machine unlearning: Simple models are vulnerable. In Proc. NeurIPS, 2024.

Boyd, S. and Vandenberghe, L. Convex optimization. Cambridge University Press, 2004.

Brack, M., Friedrich, F., Hintersdorf, D., Struppek, L., Schramowski, P., and Kersting, K. SEGA: Instructing text-to-image models using semantic guidance. In Proc. NeurIPS, 2023.

Brundage, M., Avin, S., Clark, J., Toner, H., Eckersley, P., Garfinkel, B., Dafoe, A., Scharre, P., Zeitzoff, T., Filar, B., et al. The malicious use of artificial intelligence: Forecasting, prevention, and mitigation. arXiv preprint arXiv:1802.07228, 2018.

Bubeck, S. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 2015.

Cao, Y. and Yang, J. Towards making systems forget with machine unlearning. In IEEE symposium on security and privacy, 2015.

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proc. CVPR, 2009.

Ding, R., Zhou, T., Su, L., Ding, A. A., Xu, X., and Fei, Y. Probe-me-not: Protecting pre-trained encoders from malicious probing. In Proc. NDSS, 2025.

Dosovitskiy, A. An image is worth 16x16 words: Transformers for image recognition at scale. In Proc. ICLR, 2021.

Evans, D. J. The use of pre-conditioning in iterative methods for solving linear equations with symmetric positive definite matrices. IMA Journal of Applied Mathematics, 1968.

Gandikota, R., Materzynska, J., Fiotto-Kaufman, J., and Bau, D. Erasing concepts from diffusion models. In Proc. ICCV, 2023.

Gandikota, R., Orgad, H., Belinkov, Y., Materzyńska, J., and Bau, D. Unified concept editing in diffusion models. In Proc. WACV, 2024.

Golub, G. H. and Van Loan, C. F. Matrix computations. Johns Hopkins University Press, 3rd edition, 1996.

Guille-Escuret, C., Girotti, M., Goujaud, B., and Mitliagkas, I. A study of condition numbers for first-order optimization. In Proc. AISTATS, 2021.

Guo, C., Goldstein, T., Hannun, A., and Van Der Maaten, L. Certified data removal from machine learning models. arXiv preprint arXiv:1911.03030, 2019.

He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proc. CVPR, 2016.

Henderson, P., Mitchell, E., Manning, C., Jurafsky, D., and Finn, C. Self-destructing models: Increasing the costs of harmful dual uses of foundation models. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, 2023.

Hestenes, M. R., Stiefel, E., et al. Methods of conjugate gradients for solving linear systems. NBS Washington, DC, 1952.

Huang, T., Hu, S., Ilhan, F., Tekin, S. F., and Liu, L. Harmful fine-tuning attacks and defenses for large language models: A survey. arXiv preprint arXiv:2409.18169, 2024.

Jambulapati, A., Li, J., Musco, C., Sidford, A., and Tian, K. Fast and near-optimal diagonal preconditioning. arXiv preprint arXiv:2008.01722, 2020.

Jambulapati, A., Li, J., Musco, C., Shiragur, K., Sidford, A., and Tian, K. Structured semidefinite programming for recovering structured preconditioners. In Proc. NeurIPS, 2023.

Kim, S., Jung, S., Kim, B., Choi, M., Shin, J., and Lee, J. Towards safe self-distillation of internet-scale text-to-image diffusion models. arXiv preprint arXiv:2307.05977, 2023.

Kingma, D. P. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Krause, J., Stark, M., Deng, J., and Fei-Fei, L. 3d object representations for fine-grained categorization. In Proc. ICCV Workshops, 2013.

Kress, R. Numerical analysis. Springer Science & Business Media, 2012.

Kumari, N., Zhang, B., Wang, S.-Y., Shechtman, E., Zhang, R., and Zhu, J.-Y. Ablating concepts in text-to-image diffusion models. In Proc. ICCV, 2023.

LeCun, Y. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.

Lewis, A. S. The convex analysis of unitarily invariant matrix functions. Journal of Convex Analysis, 1995.

Luenberger, D. G., Ye, Y., et al. Linear and nonlinear programming, volume 2. Springer, 1984.

Marchal, N., Xu, R., Elasmar, R., Gabriel, I., Goldberg, B., and Isaac, W. Generative AI misuse: A taxonomy of tactics and insights from real-world data. arXiv preprint arXiv:2406.13843, 2024.

Montoya, A. and Data Canary. House prices - advanced regression techniques, 2016. Kaggle.

Mordukhovich, B. S. Variational analysis and applications. Springer, 2018.

Neel, S., Roth, A., and Sharifi-Malvajerdi, S. Descent-to-delete: Gradient-based methods for machine unlearning. In Proc. ALT, 2021.

Nenov, R., Haider, D., and Balazs, P. (Almost) Smooth Sailing: Towards numerical stability of neural networks through differentiable regularization of the condition number. In ICML Differentiable Almost Everything Workshop, 2024.

Nesterov, Y. Lectures on convex optimization, volume 137. Springer, 2018.

Nguyen, Q. P., Low, B. K. H., and Jaillet, P. Variational Bayesian unlearning. In Proc. NeurIPS, 2020.

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. PyTorch: An imperative style, high-performance deep learning library. In Proc. NeurIPS, 2019.

Petersen, K. B., Pedersen, M. S., et al. The matrix cookbook. Technical University of Denmark, 7(15):510, 2008.

Qu, Z., Gao, W., Hinder, O., Ye, Y., and Zhou, Z. Optimal diagonal preconditioning. Operations Research, 2024.

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al. Learning transferable visual models from natural language supervision. In Proc. ICML, 2021.

Rockafellar, R. Convex analysis. Princeton Mathematical Series, 28, 1970.

Roeder, G., Wu, Y., and Duvenaud, D. K. Sticking the landing: Simple, lower-variance gradient estimators for variational inference. In Proc. NeurIPS, 2017.

Rosati, D., Wehner, J., Williams, K., Bartoszcze, L., Gonzales, R., Majumdar, S., Sajjad, H., Rudzicz, F., et al. Representation noising: A defence mechanism against harmful finetuning. In Proc. NeurIPS, volume 37, 2024a.

Rosati, D., Wehner, J., Williams, K., Bartoszcze, L., Sajjad, H., and Rudzicz, F. Immunization against harmful fine-tuning attacks. In Findings of the Association for Computational Linguistics: EMNLP 2024, 2024b.

Saad, Y. Iterative methods for sparse linear systems. SIAM, 2003.

Saarinen, S., Bramley, R., and Cybenko, G. Ill-conditioning in neural network training problems. SIAM Journal on Scientific Computing, 1993.

Saratchandran, H., Wang, T. X., and Lucey, S. Weight conditioning for smooth optimization of neural networks. In Proc. ECCV, 2024.

Savani, Y., Trockman, A., Feng, Z., Schwarzschild, A., Robey, A., Finzi, M., and Kolter, J. Z. Antidistillation sampling. arXiv preprint arXiv:2504.13146, 2025.

Schramowski, P., Brack, M., Deiseroth, B., and Kersting, K. Safe latent diffusion: Mitigating inappropriate degeneration in diffusion models. In Proc. CVPR, 2023.

Sekhari, A., Acharya, J., Kamath, G., and Suresh, A. T. Remember what you want to forget: Algorithms for machine unlearning. In Proc. NeurIPS, 2021.

Tamirisa, R., Bharathi, B., Phan, L., Zhou, A., Gatti, A., Suresh, T., Lin, M., Wang, J., Wang, R., Arel, R., et al. Tamper-resistant safeguards for open-weight LLMs. arXiv preprint arXiv:2408.00761, 2024.

Thomee, B., Shamma, D. A., Friedland, G., Elizalde, B., Ni, K., Poland, D., Borth, D., and Li, L.-J. YFCC100M: The new data in multimedia research. Communications of the ACM, 2016.

Van der Sluis, A. Condition numbers and equilibration of matrices. Numerische Mathematik, 1969.

Wightman, R. PyTorch image models. https://github.com/rwightman/pytorch-image-models, 2019.

Wu, G., Hashemi, M., and Srinivasa, C. Puma: Performance unchanged model augmentation for training data removal. In Proc. AAAI, 2022.

Xu, X., Yue, X., Liu, Y., Ye, Q., Hu, H., and Du, M. Unlearning isn't deletion: Investigating reversibility of machine unlearning in LLMs. arXiv preprint arXiv:2505.16831, 2025.

Zhan, Q., Fang, R., Bindu, R., Gupta, A., Hashimoto, T., and Kang, D. Removing rlhf protections in gpt-4 via fine-tuning. In Proc. NAACL, 2024.

Zhang, G., Wang, K., Xu, X., Wang, Z., and Shi, H. Forget-me-not: Learning to forget in text-to-image diffusion models. In Proc. CVPR, 2024.

Zheng, A. Y. and Yeh, R. A. IMMA: Immunizing text-to-image models against malicious adaptation. In Proc. ECCV, 2024.

Zheng, A. Y. and Yeh, R. A. Multi-concept model immunization through differentiable model merging. In Proc. AAAI, 2025.

Zheng, A. Y., Yang, C.-A., and Yeh, R. A. Learning to obstruct few-shot image classification over restricted classes. In Proc. ECCV, 2024.

Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., Xiong, H., and He, Q. A comprehensive survey on transfer learning. Proceedings of the IEEE, 2020.

The appendix is organized as follows:

In Sec. A, we provide the complete statements of the properties of $R_{\text{well}}(S)$ for minimizing the condition number. In Sec. B, we provide the complete proofs of the propositions and theorems stated in the main paper. In Sec. C, we provide additional experiment details. The code will be open-sourced upon the acceptance of this paper.

A. Properties of the Condition Number Minimizing Regularizer

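Theorem A.1 (Properties of the $\kappa$-minimizing regularizer $R_{\text{well}}(S)$; Theorems 2.1, 2.2, 3.1, 3.2 in Nenov et al. (2024)).

(1) [Nonnegativity] For any $S \in \mathbb{R}^{p_r \times p_c}$, $R_{\text{well}}(S) \ge 0$. If $S \neq 0$, then $R_{\text{well}}(S) = 0$ if and only if $S$ has full rank and $\kappa(S) = 1$.

(2) [Upper Bound] $\kappa(S) \le e^{\,p\,(\sigma_S^{\min})^{-2} R_{\text{well}}(S)}$, i.e., $R_{\text{well}}(S)$ is an upper bound of $\log(\kappa(S))$ as long as $\sigma_S^{\min}$ is bounded away from 0.

(3) [Differentiability] If $\sigma_S^{\max} = \sigma_1 > \sigma_i$ for any $i > 1$, i.e., $\sigma_S^{\max}$ is unique, then $R_{\text{well}}(S)$ is differentiable and its gradient is given by $\nabla_S R_{\text{well}}(S) = \sigma_1 u_1 v_1^\top - \frac{1}{p} S$.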

(4) [Monotonic Decrease] If σmax S is unique, update S with SRwell(S) such that S = S η1 SRwell(S) for 0 < η1 < κ(S) 1 (1 1

p , then κ(S ) < κ(S).
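The properties above can be checked numerically on a small example. The sketch below assumes the regularizer has the form $R_{\text{well}}(S) = \frac{1}{2}\|S\|_2^2 - \frac{1}{2p}\|S\|_F^2$ with $p = \min(p_r, p_c)$. This definition is not restated in this appendix, but it is consistent with the gradient in property (3). The test matrix and the step size `eta` are arbitrary illustrative choices, not the step-size bound of property (4).

```python
import numpy as np

def r_well(S):
    """Assumed form: 0.5 * ||S||_2^2 - ||S||_F^2 / (2p), with p = min(S.shape)."""
    p = min(S.shape)
    return 0.5 * np.linalg.norm(S, 2) ** 2 - np.linalg.norm(S, "fro") ** 2 / (2 * p)

def grad_r_well(S):
    """Gradient from property (3): sigma_1 u_1 v_1^T - S / p."""
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    p = min(S.shape)
    return s[0] * np.outer(U[:, 0], Vt[0]) - S / p

S = np.diag([3.0, 2.0, 1.0])          # kappa(S) = 3, unique largest singular value
eta = 0.1                             # small illustrative step size
S_new = S - eta * grad_r_well(S)      # one gradient step on R_well

kappa_old = np.linalg.cond(S)
kappa_new = np.linalg.cond(S_new)
```

For this matrix the step shrinks the largest singular value to 2.8 and slightly enlarges the others, so the condition number drops from 3 to about 2.71, in line with property (4).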

B. Proof of Propositions and Theorems

B.1. Proof of Proposition 3.2.

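Proposition 3.2. The singular values of the Hessian matrix in Eq. (9) are given by

$$\sigma_i = \sum_{j} \big(\sigma_{\theta,i}\,(u_{\theta,i}^\top q_j)\,\gamma_j\big)^2, \quad \forall i \in \{1, \dots, D_{\text{in}}\}. \tag{10}$$

Here, $\sigma_{\theta,i}$ and $u_{\theta,i}$ correspond to the $i$-th singular value and vector of $\theta$. Next, $\gamma_j$ and $q_j$ correspond to the $j$-th singular value and vector of the covariance $K$.

Proof. Substitute the SVD of $\theta$ and the eigendecomposition of $K$ into $\theta^\top K \theta$:

$$\theta^\top K \theta = (U_\theta \Sigma_\theta V_\theta^\top)^\top (Q \Gamma^2 Q^\top)(U_\theta \Sigma_\theta V_\theta^\top).$$

Simplify the expression:

$$\theta^\top K \theta = V_\theta (\Sigma_\theta U_\theta^\top Q \Gamma^2 Q^\top U_\theta \Sigma_\theta) V_\theta^\top.$$

Define $M = \Sigma_\theta U_\theta^\top Q \Gamma$, so that:

$$\theta^\top K \theta = V_\theta (M M^\top) V_\theta^\top.$$

The elements of $M$ are:

$$M[i, j] = \sigma_{\theta,i}\,(u_{\theta,i}^\top q_j)\,\gamma_j,$$

where the $\sigma_{\theta,i}$ for $i \in [d]$ are the singular values of $\theta$, the $\gamma_j$ for $j \in [d]$ are the diagonal entries of $\Gamma$, and $(u_{\theta,i}^\top q_j)$ measures the alignment between the $i$-th column of $U_\theta$ and the $j$-th column of $Q$.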

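We observe the following decomposition of $M$ into two matrices $O$ and $D$:

$$M = \begin{bmatrix} \ddots & \vdots & \\ \cdots & \sigma_{\theta,i}(u_{\theta,i}^\top q_j)\gamma_j & \cdots \\ & \vdots & \ddots \end{bmatrix} = \underbrace{\begin{bmatrix} \ddots & \vdots & \\ \cdots & \dfrac{\sigma_{\theta,i}(u_{\theta,i}^\top q_j)\gamma_j}{\sqrt{\sum_{j'} \big(\sigma_{\theta,i}(u_{\theta,i}^\top q_{j'})\gamma_{j'}\big)^2}} & \cdots \\ & \vdots & \ddots \end{bmatrix}}_{O} \underbrace{\begin{bmatrix} \ddots & & 0 \\ & \sqrt{\sum_{j} \big(\sigma_{\theta,i}(u_{\theta,i}^\top q_j)\gamma_j\big)^2} & \\ 0 & & \ddots \end{bmatrix}}_{D},$$

where $O$ is an orthonormal matrix, i.e., $O^\top O = I$, and $D = \mathrm{diag}(d_1, \dots, d_d)$ with $d_i = \sqrt{\sum_{j} \big(\sigma_{\theta,i}(u_{\theta,i}^\top q_j)\gamma_j\big)^2}$ is a diagonal matrix. As a result, the diagonal entries of $D^2$ are:

$$d_i^2 = \sum_{j} \big(\sigma_{\theta,i}(u_{\theta,i}^\top q_j)\gamma_j\big)^2.$$

Thus, $M M^\top = (OD)(OD)^\top = O D^2 O^\top$, and the eigenvalues of $\theta^\top K \theta$ are the diagonal entries of $D^2$, given by:

$$\sigma_i = d_i^2 = \sum_{j} \big(\sigma_{\theta,i}(u_{\theta,i}^\top q_j)\gamma_j\big)^2, \quad i = 1, \dots, d.$$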

B.2. Proof of Theorem 4.1

B.2.1. PROOF OF THEOREM 4.1 (1)

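Theorem 4.1. (1) For any $S \in \mathbb{R}^{p_r \times p_c}$, $R_{\text{ill}}(S) \ge 0$, and $R_{\text{ill}}(S) = 0$ if and only if $\kappa(S) = \infty$.

Proof. By definition, $R_{\text{ill}}(S) = \dfrac{1}{\frac{1}{2k}\|S\|_F^2 - \frac{1}{2}(\sigma_S^{\min})^2}$. Denote $\bar{R}_{\text{ill}}(S) = \frac{1}{2}(\sigma_S^{\min})^2 - \frac{1}{2k}\|S\|_F^2$; then we have $R_{\text{ill}}(S) = -\dfrac{1}{\bar{R}_{\text{ill}}(S)}$, and

$$\bar{R}_{\text{ill}}(S) = \frac{1}{2}(\sigma_S^{\min})^2 - \frac{1}{2k}\sum_{i=1}^{k}\sigma_i^2 = \frac{1}{2k}\sum_{i=1}^{k}\big((\sigma_S^{\min})^2 - \sigma_i^2\big) \le 0,$$

since $\forall i \in [k]$, $\sigma_S^{\min} = \sigma_k \le \sigma_i$. As a result, $\bar{R}_{\text{ill}}(S) \le 0$ and $R_{\text{ill}}(S) = -\dfrac{1}{\bar{R}_{\text{ill}}(S)} \ge 0$, i.e., $R_{\text{ill}}(S)$ is non-negative.

Also, by definition, $\sigma_1 = \kappa(S)\,\sigma_k$. Therefore,

$$R_{\text{ill}}(S) = \frac{2}{\frac{1}{k}\sum_{i=1}^{k}\sigma_i^2 - (\sigma_S^{\min})^2} \le \frac{2}{\frac{1}{k}\sigma_1^2 + \frac{k-1}{k}(\sigma_S^{\min})^2 - (\sigma_S^{\min})^2} = \frac{2}{\frac{1}{k}\big(\sigma_1^2 - (\sigma_S^{\min})^2\big)} = \frac{2k}{(\kappa(S)^2 - 1)\,(\sigma_S^{\min})^2}.$$

If $\kappa(S) = \infty$, then $R_{\text{ill}}(S) \le 2k\,(\kappa(S)^2 - 1)^{-1}(\sigma_S^{\min})^{-2} = 0$ for $\sigma_S^{\min} > 0$, which yields $R_{\text{ill}}(S) = 0$ given that $R_{\text{ill}}(S) \ge 0$.

Similarly, we have

$$R_{\text{ill}}(S) = \frac{2}{\frac{1}{k}\sum_{i=1}^{k}\sigma_i^2 - (\sigma_S^{\min})^2} \ge \frac{2}{\frac{k-1}{k}\sigma_1^2 + \frac{1}{k}(\sigma_S^{\min})^2 - (\sigma_S^{\min})^2} = \frac{2}{\frac{k-1}{k}\big(\sigma_1^2 - (\sigma_S^{\min})^2\big)} = \frac{2k}{(k-1)\,(\kappa(S)^2 - 1)\,(\sigma_S^{\min})^2}.$$

If $R_{\text{ill}}(S) = 0$, we have $\kappa(S) \ge \sqrt{\dfrac{2k}{(k-1)\,R_{\text{ill}}(S)\,(\sigma_S^{\min})^2} + 1} = \infty$, which yields $\kappa(S) = \infty$.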

B.2.2. PROOF OF THEOREM 4.1 (2)

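To prove Theorem 4.1 (2), we start by analyzing $\bar{R}_{\text{ill}}(S) = \frac{1}{2}(\sigma_S^{\min})^2 - \frac{1}{2k}\|S\|_F^2$ with the following lemma.

Lemma B.1. For $\bar{R}_{\text{ill}}(S) = \frac{1}{2}(\sigma_S^{\min})^2 - \frac{1}{2k}\|S\|_F^2$,

$$\frac{1}{\kappa(S)} \le e^{\frac{k}{k-1}\,\sigma_1^{-2}\,\bar{R}_{\text{ill}}(S)}. \tag{19}$$

That is, $\bar{R}_{\text{ill}}(S)$ is an upper bound of $\log\frac{1}{\kappa(S)}$, i.e., of $-\log(\kappa(S))$.

Proof. Similar to the proof of Theorem 3.2 in (Nenov et al., 2024),

$$2\bar{R}_{\text{ill}}(S) = (\sigma_S^{\min})^2 - \frac{1}{k}\|S\|_F^2 = (\sigma_S^{\min})^2 - \frac{1}{k}\sum_{i=1}^{k}\sigma_i^2 \ge (\sigma_S^{\min})^2 - \frac{1}{k}\big((k-1)\sigma_1^2 + (\sigma_S^{\min})^2\big) = \frac{k-1}{k}\big((\sigma_S^{\min})^2 - \sigma_1^2\big).$$

In the meantime,

$$2\log\frac{1}{\kappa(S)} = -\log \kappa(S)^2 = \log (\sigma_S^{\min})^2 - \log \sigma_1^2 \le \frac{1}{\sigma_1^2}\big((\sigma_S^{\min})^2 - \sigma_1^2\big),$$

in which the inequality follows from the Mean Value Theorem. As a result,

$$\frac{1}{\kappa(S)} = \sqrt{\frac{(\sigma_S^{\min})^2}{\sigma_1^2}} \le e^{\frac{1}{2\sigma_1^2}\left(\frac{k}{k-1}\,2\bar{R}_{\text{ill}}(S)\right)} = e^{\frac{k}{k-1}\,\sigma_1^{-2}\,\bar{R}_{\text{ill}}(S)}.$$

Theorem 4.1. (2) $\dfrac{1}{\log(\kappa(S))} \le (\sigma_S^{\max})^2\, R_{\text{ill}}(S)$, i.e., $R_{\text{ill}}(S)$ upper bounds $\dfrac{1}{\log(\kappa(S))}$ when $\sigma_S^{\max}$ is reasonably away from $\infty$.

Proof. Taking the logarithm of Lemma B.1, we have

$$-\log(\kappa(S)) \le \frac{k}{k-1}\,\sigma_1^{-2}\,\bar{R}_{\text{ill}}(S).$$

Negating both sides,

$$\log(\kappa(S)) \ge -\frac{k}{k-1}\,\sigma_1^{-2}\,\bar{R}_{\text{ill}}(S).$$

Finally, taking the reciprocal,

$$\frac{1}{\log(\kappa(S))} \le -\frac{k-1}{k}\,\frac{\sigma_1^2}{\bar{R}_{\text{ill}}(S)} = -\frac{k-1}{k}\,\frac{\sigma_1^2}{\frac{1}{2}(\sigma_S^{\min})^2 - \frac{1}{2k}\|S\|_F^2} \le \sigma_1^2\, R_{\text{ill}}(S).$$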
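The bounds established for Theorem 4.1 (1) and (2) can be sanity-checked numerically. The sketch below evaluates $R_{\text{ill}}(S)$ from the singular values and tests both the two-sided bound from the proof of part (1) and the inequality of part (2). It assumes a full-rank matrix with $k = \min(p_r, p_c) \ge 2$ and $\kappa(S) > 1$. The helper names and the random test matrix are illustrative choices.

```python
import numpy as np

def r_ill(S):
    """R_ill(S) = 1 / (||S||_F^2 / (2k) - 0.5 * sigma_min^2), with k singular values."""
    s = np.linalg.svd(S, compute_uv=False)
    k = len(s)
    return 1.0 / (np.sum(s ** 2) / (2 * k) - 0.5 * s[-1] ** 2)

def check_theorem_4_1(S, slack=1e-9):
    """Return (part-1 sandwich holds, part-2 bound holds) for a full-rank S."""
    s = np.linalg.svd(S, compute_uv=False)
    k, kappa, r = len(s), s[0] / s[-1], r_ill(S)
    lower = 2 * k / ((k - 1) * (kappa ** 2 - 1) * s[-1] ** 2)
    upper = 2 * k / ((kappa ** 2 - 1) * s[-1] ** 2)
    part1 = lower <= r * (1 + slack) and r <= upper * (1 + slack)
    part2 = 1.0 / np.log(kappa) <= s[0] ** 2 * r * (1 + slack)
    return part1, part2

rng = np.random.default_rng(0)
S = rng.standard_normal((5, 4))       # hypothetical full-rank test matrix
ok_part1, ok_part2 = check_theorem_4_1(S)
```

For $S = \mathrm{Diag}(3, 2, 1)$, for example, $R_{\text{ill}}(S) = 6/11 \approx 0.545$, which lies between the lower bound $0.375$ and the upper bound $0.75$.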

B.2.3. PROOF OF THEOREM 4.1 (3)

To analyze the differentiability of $R_{\text{ill}}(S) = \dfrac{1}{\frac{1}{2k}\|S\|_F^2 - \frac{1}{2}(\sigma_S^{\min})^2}$, we start by analyzing the differentiability of $\bar{R}_{\text{ill}}(S) = \frac{1}{2}(\sigma_S^{\min})^2 - \frac{1}{2k}\|S\|_F^2$, which needs the following lemma as a prerequisite.

Lemma B.2 (Theorem 3.1 in (Lewis, 1995) without Convexity). If a function $f : \mathbb{R}^p \to \mathbb{R}$ is absolutely symmetric, that is, $\forall x \in \mathbb{R}^p$ and any $y$ that is a permutation of $x$, $f(x) = f(y)$, then $f \circ \sigma$ is differentiable at a matrix $S \in \mathbb{R}^{p_1 \times p_2}$ if and only if $f$ is differentiable at $\sigma = \sigma(S)$. In this case, for the singular value decomposition $S = U\,\mathrm{Diag}(\sigma)\,V^\top$,

$$\nabla (f \circ \sigma)(S) = U\,\mathrm{Diag}(\nabla f(\sigma))\,V^\top.$$

Proof. For the forward direction, by Corollary 2.5 in (Lewis, 1995), for $S = U\,\mathrm{Diag}(\sigma)\,V^\top$,

$$\partial (f \circ \sigma)(S) = \big\{ U\,\mathrm{Diag}(\mu)\,V^\top \;\big|\; \mu \in \partial f(\sigma) \big\}.$$

By Theorem 25.1 in (Rockafellar, 1970), since $f \circ \sigma$ is differentiable at the matrix $S \in \mathbb{R}^{p_1 \times p_2}$, we know that its subdifferential $\partial (f \circ \sigma)(S)$ is a singleton, meaning that $U\,\mathrm{Diag}(\mu)\,V^\top$ is unique and, consequently, $\mu \in \partial f(\sigma)$ is unique. As a result, $\partial f(\sigma)$ is also a singleton, which, again by Corollary 2.5 in (Lewis, 1995), indicates that $f$ is differentiable at $\sigma$. The reverse direction holds following a similar argument.

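Lemma B.3. For $S = U\,\mathrm{Diag}(\sigma)\,V^\top$, in which $\sigma = [\sigma_1, \dots, \sigma_k]$ such that $\sigma_S^{\max} = \sigma_1 \ge \sigma_2 \ge \dots > \sigma_k = \sigma_S^{\min}$, i.e., $\sigma_k < \sigma_i$ for any $i < k$, $\bar{R}_{\text{ill}}(S) = \frac{1}{2}(\sigma_S^{\min})^2 - \frac{1}{2k}\|S\|_F^2$ is differentiable and, for $u_k, v_k$ the $k$-th column vectors of $U, V$,

$$\nabla \bar{R}_{\text{ill}}(S) = \sigma_S^{\min}\, u_k v_k^\top - \frac{1}{k} S.$$

Proof. For $x \in \mathbb{R}^k$, denote

$$\bar{R}_{\text{ill},1}(x) = \min_{i \in [k]} \frac{1}{2} x_i^2, \qquad \bar{R}_{\text{ill},2}(x) = \frac{1}{2k}\sum_{i=1}^{k} x_i^2.$$

With $\bar{R}_{\text{ill}}(S) = \frac{1}{2}(\sigma_S^{\min})^2 - \frac{1}{2k}\|S\|_F^2$, we first analyze $\frac{1}{2}(\sigma_S^{\min})^2$. By the subdifferential of a piecewise minimum given by Proposition 4.9 in (Mordukhovich, 2018), we have for $x \in \mathbb{R}^k$,

$$\partial_x \bar{R}_{\text{ill},1}(x) = \Big\{ x_i e_i \;\Big|\; i \in \arg\min_{j \in [k]} \tfrac{1}{2} x_j^2 \Big\} = \Big\{ x_i e_i \;\Big|\; i \in \arg\min_{j \in [k]} x_j^2 \Big\} = \Big\{ x_i e_i \;\Big|\; i \in \arg\min_{j \in [k]} |x_j| \Big\},$$

in which $e_i$ is the $i$-th vector of the $k$-dimensional standard basis. Therefore,

$$\partial_\sigma \bar{R}_{\text{ill},1}(\sigma) = \Big\{ \sigma_i e_i \;\Big|\; i \in \arg\min_{j \in [k]} \sigma_j \Big\}.$$

Since $\sigma_k < \sigma_i$ for any $i < k$, i.e., the minimum non-zero singular value $\sigma_S^{\min}$ is unique, we know that the subdifferential $\big\{ \sigma_i e_i \mid i \in \arg\min_{j \in [k]} \sigma_j \big\} = \{\sigma_S^{\min} e_k\}$ is a singleton. Therefore, by Theorem 25.1 in (Rockafellar, 1970), we know $\bar{R}_{\text{ill},1}$ is differentiable with respect to $\sigma$ and $\nabla_\sigma \bar{R}_{\text{ill},1}(\sigma) = \sigma_S^{\min} e_k$. Regarding $\sigma = \sigma(S)$ as a function of $S$, in which $\sigma(\cdot)$ represents taking the singular values of a matrix, we have by Corollary 2.5 in (Lewis, 1995)

$$\partial_S \Big( \tfrac{1}{2} (\sigma_S^{\min})^2 \Big) = \partial_S (\bar{R}_{\text{ill},1} \circ \sigma)(S) = \big\{ U\,\mathrm{Diag}(\mu)\,V^\top \;\big|\; \mu \in \partial_\sigma \bar{R}_{\text{ill},1}(\sigma) \big\}.$$

Given that $\bar{R}_{\text{ill},1}$ is differentiable and clearly also absolutely symmetric with respect to $\sigma$, by Lemma B.2, we know $\frac{1}{2}(\sigma_S^{\min})^2$ is also differentiable and

$$\nabla_S \Big( \tfrac{1}{2} (\sigma_S^{\min})^2 \Big) = U\,\mathrm{Diag}\big(\nabla_\sigma \bar{R}_{\text{ill},1}(\sigma)\big)\,V^\top = U\,\mathrm{Diag}(\sigma_S^{\min} e_k)\,V^\top = \sigma_S^{\min}\, u_k v_k^\top.$$

In addition, we have

$$\partial_S \Big( \tfrac{1}{2k} \|S\|_F^2 \Big) = \partial_S \Big( \tfrac{1}{2k} \sum_{i=1}^{k} \sigma_i(S)^2 \Big) = \partial_S (\bar{R}_{\text{ill},2} \circ \sigma)(S) = \big\{ U\,\mathrm{Diag}(\mu)\,V^\top \;\big|\; \mu \in \partial_\sigma \bar{R}_{\text{ill},2}(\sigma) \big\}$$

by Corollary 2.5 in (Lewis, 1995). $\bar{R}_{\text{ill},2}$ is clearly differentiable with $\nabla \bar{R}_{\text{ill},2}(x) = \frac{1}{k} x$. Therefore, again by Lemma B.2,

$$\nabla_S \Big( \tfrac{1}{2k} \|S\|_F^2 \Big) = U\,\mathrm{Diag}\big(\nabla \bar{R}_{\text{ill},2}(\sigma_S)\big)\,V^\top = \frac{1}{k}\, U\,\mathrm{Diag}(\sigma_S)\,V^\top = \frac{1}{k} S.$$

By the linearity of gradients,

$$\nabla \bar{R}_{\text{ill}}(S) = \nabla \Big( \tfrac{1}{2} (\sigma_S^{\min})^2 \Big) - \nabla \Big( \tfrac{1}{2k} \|S\|_F^2 \Big) = \sigma_S^{\min}\, u_k v_k^\top - \frac{1}{k} S,$$

which completes the proof.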

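Theorem 4.1. (3) If $\sigma_S^{\min} = \sigma_k < \sigma_i$ for any $i < k$, then $R_{\text{ill}}(S)$ is differentiable and

$$\nabla_S R_{\text{ill}}(S) = \frac{\sigma_k u_k v_k^\top - \frac{1}{k} S}{\Big( \frac{1}{2k}\|S\|_F^2 - \frac{1}{2}(\sigma_S^{\min})^2 \Big)^2}.$$

Proof. Since $R_{\text{ill}}(S) = \dfrac{1}{\frac{1}{2k}\|S\|_F^2 - \frac{1}{2}(\sigma_S^{\min})^2}$, we have

$$\nabla R_{\text{ill}}(S) = -\frac{\nabla \Big( \frac{1}{2k}\|S\|_F^2 - \frac{1}{2}(\sigma_S^{\min})^2 \Big)}{\Big( \frac{1}{2k}\|S\|_F^2 - \frac{1}{2}(\sigma_S^{\min})^2 \Big)^2} = \frac{\nabla \Big( \frac{1}{2}(\sigma_S^{\min})^2 - \frac{1}{2k}\|S\|_F^2 \Big)}{\Big( \frac{1}{2k}\|S\|_F^2 - \frac{1}{2}(\sigma_S^{\min})^2 \Big)^2} = \frac{\nabla \bar{R}_{\text{ill}}(S)}{\Big( \frac{1}{2k}\|S\|_F^2 - \frac{1}{2}(\sigma_S^{\min})^2 \Big)^2}.$$

By Lemma B.3, we know that if $\sigma_S^{\min} = \sigma_k < \sigma_i$ for any $i < k$, $\bar{R}_{\text{ill}}(S)$ is differentiable and $\nabla \bar{R}_{\text{ill}}(S) = \sigma_S^{\min}\, u_k v_k^\top - \frac{1}{k} S$. Consequently, $R_{\text{ill}}(S)$ is differentiable and

$$\nabla R_{\text{ill}}(S) = \frac{\sigma_S^{\min}\, u_k v_k^\top - \frac{1}{k} S}{\Big( \frac{1}{2k}\|S\|_F^2 - \frac{1}{2}(\sigma_S^{\min})^2 \Big)^2}.$$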
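The closed-form gradient of Theorem 4.1 (3) can be compared against central finite differences. The sketch below does this for a random matrix, whose smallest singular value is unique with probability one. The helper names, the random seed, and the finite-difference step `eps` are illustrative choices.

```python
import numpy as np

def r_ill(S):
    """R_ill(S) = 1 / (||S||_F^2 / (2k) - 0.5 * sigma_min^2)."""
    s = np.linalg.svd(S, compute_uv=False)
    k = len(s)
    return 1.0 / (np.sum(s ** 2) / (2 * k) - 0.5 * s[-1] ** 2)

def grad_r_ill(S):
    """Closed-form gradient: (sigma_k u_k v_k^T - S / k) / g^2."""
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    k = len(s)
    g = np.sum(s ** 2) / (2 * k) - 0.5 * s[-1] ** 2
    return (s[-1] * np.outer(U[:, -1], Vt[-1]) - S / k) / g ** 2

def numerical_grad(f, S, eps=1e-6):
    """Entry-wise central finite differences of a scalar function f at S."""
    G = np.zeros_like(S)
    for idx in np.ndindex(*S.shape):
        E = np.zeros_like(S)
        E[idx] = eps
        G[idx] = (f(S + E) - f(S - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
S = rng.standard_normal((4, 3))       # hypothetical test matrix, k = 3
max_err = np.max(np.abs(grad_r_ill(S) - numerical_grad(r_ill, S)))
```

A small `max_err` relative to the gradient magnitude indicates that the analytic expression and the numerical derivative agree.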

B.2.4. PROOF OF THEOREM 4.1 (4)

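Theorem 4.1. (4) If $\sigma_S^{\min}$ is unique, update $S$ with $\nabla_S R_{\text{ill}}(S)$ such that $S' = S - \eta_2 \nabla_S R_{\text{ill}}(S)$ for $0 < \eta_2 < \dfrac{k}{k-1}\Big( \frac{1}{2k}\|S\|_F^2 - \frac{1}{2}(\sigma_S^{\min})^2 \Big)^2$, then $\kappa(S') > \kappa(S)$.

Proof. Given that $S' = S - \eta_2 \nabla R_{\text{ill}}(S)$ and that

$$\nabla R_{\text{ill}}(S) = \frac{\sigma_S^{\min}\, u_k v_k^\top - \frac{1}{k} S}{\Big( \frac{1}{2k}\|S\|_F^2 - \frac{1}{2}(\sigma_S^{\min})^2 \Big)^2} = \frac{1}{\bar{R}_{\text{ill}}(S)^2}\Big( \sigma_k u_k v_k^\top - \frac{1}{k} S \Big), \qquad \bar{R}_{\text{ill}}(S) = \frac{1}{2}(\sigma_S^{\min})^2 - \frac{1}{2k}\|S\|_F^2,$$

we have

$$\begin{aligned}
S' &= S - \eta_2 \nabla R_{\text{ill}}(S) \\
&= S - \frac{\eta_2}{\bar{R}_{\text{ill}}(S)^2}\Big( \sigma_k u_k v_k^\top - \frac{1}{k} S \Big) \\
&= \Big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} \Big) S - \frac{\eta_2}{\bar{R}_{\text{ill}}(S)^2}\, \sigma_k u_k v_k^\top \\
&= \Big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} \Big) \sum_{i=1}^{k} \sigma_i u_i v_i^\top - \frac{\eta_2}{\bar{R}_{\text{ill}}(S)^2}\, \sigma_k u_k v_k^\top \\
&= \Big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} \Big) \sum_{i=1}^{k-1} \sigma_i u_i v_i^\top + \Big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} - \frac{\eta_2}{\bar{R}_{\text{ill}}(S)^2} \Big) \sigma_k u_k v_k^\top \\
&= U\,\mathrm{Diag}(\sigma_{S'})\,V^\top,
\end{aligned}$$

where $\sigma_{S'} = \Big[ \big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} \big) \sigma_1, \dots, \big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} \big) \sigma_{k-1}, \big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} - \frac{\eta_2}{\bar{R}_{\text{ill}}(S)^2} \big) \sigma_k \Big]$ is the vector formed by the singular values of $S'$, but not necessarily in decreasing order.

Now we argue that $\big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} \big) \sigma_1$ remains the maximum singular value while $\big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} - \frac{\eta_2}{\bar{R}_{\text{ill}}(S)^2} \big) \sigma_k$ remains the minimum. Since $\sigma_k < \sigma_i$ for any $i < k$, i.e., $\sigma_S^{\min} = \sigma_k$ is unique, there must exist $0 < \beta < 1$ such that $\sigma_k = \beta \sigma_{k-1}$.

Also, given that $\eta_2 < \dfrac{k}{k-1}\Big( \frac{1}{2k}\|S\|_F^2 - \frac{1}{2}\sigma_k^2 \Big)^2 = \dfrac{k \bar{R}_{\text{ill}}(S)^2}{k-1}$, we have $1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} - \frac{\eta_2}{\bar{R}_{\text{ill}}(S)^2} > 0$. Therefore,

$$\begin{aligned}
\Big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} - \frac{\eta_2}{\bar{R}_{\text{ill}}(S)^2} \Big) \sigma_k
&= \Big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} \Big) \frac{1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} - \frac{\eta_2}{\bar{R}_{\text{ill}}(S)^2}}{1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2}}\, \sigma_k \\
&= \Big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} \Big) \Bigg( 1 - \frac{\frac{\eta_2}{\bar{R}_{\text{ill}}(S)^2}}{1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2}} \Bigg) \sigma_k \\
&< \Big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} \Big) \sigma_k \\
&< \Big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} \Big) \frac{\sigma_k}{\beta} \\
&= \Big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} \Big) \sigma_{k-1}.
\end{aligned}$$

Since $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_{k-1} > \sigma_k$ and $1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} > 0$, we know that $\sigma_{S'}^{\max} = \big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} \big) \sigma_1$ and $\sigma_{S'}^{\min} = \big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} - \frac{\eta_2}{\bar{R}_{\text{ill}}(S)^2} \big) \sigma_k$. Finally,

$$\kappa(S') = \frac{\sigma_{S'}^{\max}}{\sigma_{S'}^{\min}} = \frac{\big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} \big) \sigma_1}{\big( 1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} - \frac{\eta_2}{\bar{R}_{\text{ill}}(S)^2} \big) \sigma_k} = \frac{1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2}}{1 + \frac{\eta_2}{k \bar{R}_{\text{ill}}(S)^2} - \frac{\eta_2}{\bar{R}_{\text{ill}}(S)^2}}\, \kappa(S) > \kappa(S).$$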
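The monotonic increase can be seen on a small worked example. The sketch below takes one gradient step on $R_{\text{ill}}$ for $S = \mathrm{Diag}(3, 2, 1)$ with a step size equal to half of the upper bound in Theorem 4.1 (4). The matrix and the factor of one half are illustrative choices, and the helper names are introduced for this example only.

```python
import numpy as np

def grad_r_ill(S):
    """Gradient of R_ill from Theorem 4.1 (3), plus the squared denominator."""
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    k = len(s)
    g = np.sum(s ** 2) / (2 * k) - 0.5 * s[-1] ** 2
    grad = (s[-1] * np.outer(U[:, -1], Vt[-1]) - S / k) / g ** 2
    return grad, g ** 2

S = np.diag([3.0, 2.0, 1.0])            # k = 3, kappa(S) = 3
k = 3
grad, g_sq = grad_r_ill(S)

eta_max = k / (k - 1) * g_sq            # step-size bound of Theorem 4.1 (4)
eta = 0.5 * eta_max                     # illustrative choice inside the bound
S_new = S - eta * grad

kappa_old = np.linalg.cond(S)
kappa_new = np.linalg.cond(S_new)
```

Here $\eta_2 / \bar{R}_{\text{ill}}(S)^2 = 0.75$, so the two largest singular values are scaled by $1.25$ and the smallest by $0.5$, giving $S' = \mathrm{Diag}(3.75, 2.5, 0.5)$ and a condition number of $7.5$, up from $3$.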

B.3. Proof of Theorem 4.2

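Theorem 4.2. For $H(\theta) = \theta^\top K \theta$, if its maximum and minimum singular values $\sigma_1$ and $\sigma_k$ are unique, then

(1) $\nabla_\theta R_{\text{well}}(H(\theta)) = 2K\theta\Big( \sigma_1 v_1 v_1^\top - \frac{1}{D_{\text{in}}}\theta^\top K \theta \Big)$,

(2) $\nabla_\theta R_{\text{ill}}(H(\theta)) = \dfrac{2K\theta\big( \sigma_k v_k v_k^\top - \frac{1}{k}\theta^\top K \theta \big)}{\big( \frac{1}{2k}\|\theta^\top K \theta\|_F^2 - \frac{1}{2}\sigma_k^2 \big)^2}$.

Proof. Given that $H(\theta) = \theta^\top K \theta$ for $K = X^\top X$, we know $H(\theta)$ is symmetric and positive semidefinite. Therefore, for the compact SVD $H(\theta) = U\,\mathrm{Diag}(\sigma)\,V^\top$, we have $U = V$.

(1) When the maximum singular value $\sigma_1$ of $H$ is unique, we know from Theorem A.1 (3) that $R_{\text{well}}(H(\theta))$ is differentiable with respect to $H$, and $\nabla_H R_{\text{well}}(H(\theta)) = \sigma_1 u_1 v_1^\top - \frac{1}{D_{\text{in}}} H = \sigma_1 v_1 v_1^\top - \frac{1}{D_{\text{in}}} H$.

Given the form $H(\theta) = \theta^\top K \theta$, we have $dH = (d\theta)^\top K \theta + \theta^\top K (d\theta)$. Furthermore,

$$\begin{aligned}
(dR_{\text{well}})(H(\theta)) &= \big\langle \nabla_H R_{\text{well}}(H(\theta)),\, dH \big\rangle_F \\
&= \mathrm{tr}\Big( \big( \sigma_1 v_1 v_1^\top - \tfrac{1}{D_{\text{in}}} H \big)^\top \big( (d\theta)^\top K \theta + \theta^\top K (d\theta) \big) \Big) \\
&= \mathrm{tr}\Big( \big( \sigma_1 v_1 v_1^\top - \tfrac{1}{D_{\text{in}}} H \big)^\top (d\theta)^\top K \theta \Big) + \mathrm{tr}\Big( \big( \sigma_1 v_1 v_1^\top - \tfrac{1}{D_{\text{in}}} H \big)^\top \theta^\top K (d\theta) \Big) \\
&= \mathrm{tr}\Big( K \theta \big( \sigma_1 v_1 v_1^\top - \tfrac{1}{D_{\text{in}}} H \big)^\top (d\theta)^\top \Big) + \mathrm{tr}\Big( (d\theta) \big( \sigma_1 v_1 v_1^\top - \tfrac{1}{D_{\text{in}}} H \big)^\top \theta^\top K \Big),
\end{aligned}$$

in which $\langle \cdot, \cdot \rangle_F$ denotes the Frobenius inner product, and the last equality follows from the cyclic property of the trace. As a result, following the derivatives of traces as in Eq. (100) and Eq. (104) in Petersen et al. (2008), and using the symmetry of $K$ and of $\sigma_1 v_1 v_1^\top - \frac{1}{D_{\text{in}}} H$,

$$\begin{aligned}
\nabla_\theta R_{\text{well}}(H(\theta)) &= \frac{\partial R_{\text{well}}(H(\theta))}{\partial \theta} \\
&= K \theta \big( \sigma_1 v_1 v_1^\top - \tfrac{1}{D_{\text{in}}} H \big) + \Big( \big( \sigma_1 v_1 v_1^\top - \tfrac{1}{D_{\text{in}}} H \big)^\top \theta^\top K \Big)^\top \\
&= K \theta \big( \sigma_1 v_1 v_1^\top - \tfrac{1}{D_{\text{in}}} H \big) + K^\top \theta \big( \sigma_1 v_1 v_1^\top - \tfrac{1}{D_{\text{in}}} H \big) \\
&= 2 K \theta \big( \sigma_1 v_1 v_1^\top - \tfrac{1}{D_{\text{in}}} H \big).
\end{aligned}$$

(2) When the minimum singular value $\sigma_k$ of $H$ is unique, we know from Theorem 4.1 (3) that $R_{\text{ill}}(H(\theta))$ is differentiable with respect to $H$, and $\nabla_H R_{\text{ill}}(H(\theta)) = \dfrac{\sigma_k u_k v_k^\top - \frac{1}{k} H}{\big( \frac{1}{2k}\|H\|_F^2 - \frac{1}{2}\sigma_k^2 \big)^2}$. Following similar arguments as in (1), we have

$$\nabla_\theta R_{\text{ill}}(H(\theta)) = \frac{2K\theta\big( \sigma_k u_k v_k^\top - \frac{1}{k} H \big)}{\big( \frac{1}{2k}\|H\|_F^2 - \frac{1}{2}\sigma_k^2 \big)^2}.$$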
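Both gradients in Theorem 4.2 can be checked against finite differences. The sketch below assumes a square, full-rank $\theta$, so that $H(\theta)$ is $D_{\text{in}} \times D_{\text{in}}$ with $k = D_{\text{in}}$. It also assumes the form $R_{\text{well}}(H) = \frac{1}{2}\|H\|_2^2 - \frac{1}{2 D_{\text{in}}}\|H\|_F^2$, which is not restated in this appendix. The helper names and random data are hypothetical.

```python
import numpy as np

def r_well(H):
    s = np.linalg.svd(H, compute_uv=False)
    return 0.5 * s[0] ** 2 - np.sum(s ** 2) / (2 * H.shape[0])

def r_ill(H):
    s = np.linalg.svd(H, compute_uv=False)
    return 1.0 / (np.sum(s ** 2) / (2 * len(s)) - 0.5 * s[-1] ** 2)

def grad_theta(theta, K, which):
    """Closed-form gradients of Theorem 4.2 with respect to theta."""
    H = theta.T @ K @ theta
    w, V = np.linalg.eigh(H)                 # eigenvalues in ascending order
    if which == "well":
        G = w[-1] * np.outer(V[:, -1], V[:, -1]) - H / H.shape[0]
    else:
        k = len(w)
        g = np.sum(w ** 2) / (2 * k) - 0.5 * w[0] ** 2
        G = (w[0] * np.outer(V[:, 0], V[:, 0]) - H / k) / g ** 2
    return 2 * K @ theta @ G

def numerical_grad(f, theta, eps=1e-6):
    G = np.zeros_like(theta)
    for idx in np.ndindex(*theta.shape):
        E = np.zeros_like(theta)
        E[idx] = eps
        G[idx] = (f(theta + E) - f(theta - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))              # hypothetical data matrix
K = X.T @ X
theta = rng.standard_normal((3, 3))          # hypothetical feature extractor

def rel_err(which, reg):
    analytic = grad_theta(theta, K, which)
    numeric = numerical_grad(lambda t: reg(t.T @ K @ t), theta)
    return np.max(np.abs(analytic - numeric)) / np.max(np.abs(analytic))

err_well, err_ill = rel_err("well", r_well), rel_err("ill", r_ill)
```

Small relative errors for both regularizers indicate agreement between the closed-form expressions and the numerical derivatives.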

B.4. Proof of Theorem 4.3

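Theorem 4.3. For the trainable feature extractor $\theta$, feature covariance $H_P(\theta) = \theta^\top K_P\theta$ of the primary task and $H_H(\theta) = \theta^\top K_H\theta$ of the immunization task with $\mathrm{rank}(H_P) = k_P$, $\mathrm{rank}(H_H) = k_H$ and compact SVD $H_P(\theta) = U_P\mathrm{Diag}(\sigma_P)V_P^\top$, $H_H(\theta) = U_H\mathrm{Diag}(\sigma_H)V_H^\top$, for $\sigma_P = [\sigma_{P,1}, \ldots, \sigma_{P,k_P}]$, $\sigma_H = [\sigma_{H,1}, \ldots, \sigma_{H,k_H}]$,

(1) if $\sigma^{\max}_{H_P}$ is unique, i.e., $\sigma^{\max}_{H_P} = \sigma_{P,1} > \sigma_{P,2}$, update $\theta$ such that $\theta' = \theta - \eta_P K_P^{-1}\nabla_\theta R_{\text{well}}(H_P(\theta))$ for

$$0 < \eta_P < \min\left\{\frac{1}{\left(1 - \frac{1}{D_{\text{in}}}\right)\sigma_{P,1}},\; \frac{\sqrt{\sigma_{P,1}\sigma_{P,2}} - \sigma_{P,2}}{\frac{2}{D_{\text{in}}}\sigma_{P,2}^2}\right\},$$

then $\kappa(\theta'^\top K_P\theta') < \kappa(\theta^\top K_P\theta)$,

(2) if $\sigma^{\min}_{H_H}$ is unique, i.e., $\sigma^{\min}_{H_H} = \sigma_{H,k_H} < \sigma_{H,k_H-1}$, update $\theta$ such that $\theta' = \theta - \eta_H K_H^{-1}\nabla_\theta R_{\text{ill}}(H_H(\theta))$ for

$$0 < \eta_H < \frac{1}{1 - 2\sigma^{\min}_{H_H}/k_H}\left(\frac{1}{2k_H}\|\theta^\top K_H\theta\|_F^2 - \frac{1}{2}\left(\sigma^{\min}_{H_H}\right)^2\right)^2,$$

then $\kappa(\theta'^\top K_H\theta') > \kappa(\theta^\top K_H\theta)$.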

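Proof. (1) By Theorem 4.2 (1), we know $\nabla_\theta R_{\text{well}}(H_P(\theta)) = 2K_P\theta\left(\sigma_{P,1}v_{P,1}v_{P,1}^\top - \frac{1}{D_{\text{in}}}H_P\right)$. Since $\theta' = \theta - \eta_P K_P^{-1}\nabla_\theta R_{\text{well}}(H_P(\theta))$, and writing $M_P = \sigma_{P,1}v_{P,1}v_{P,1}^\top - \frac{1}{D_{\text{in}}}H_P$ for brevity, we have

$$
\begin{aligned}
\theta'^\top K_P\theta' &= \left(\theta - \eta_P K_P^{-1}\nabla_\theta R_{\text{well}}(H_P(\theta))\right)^{\!\top} K_P \left(\theta - \eta_P K_P^{-1}\nabla_\theta R_{\text{well}}(H_P(\theta))\right) \\
&= \left(\theta - 2\eta_P K_P^{-1}K_P\theta M_P\right)^{\!\top} K_P \left(\theta - 2\eta_P K_P^{-1}K_P\theta M_P\right) \\
&= \left(\theta - 2\eta_P\theta M_P\right)^{\!\top} K_P \left(\theta - 2\eta_P\theta M_P\right) \\
&= \theta^\top K_P\theta - 2\eta_P M_P^\top\theta^\top K_P\theta - 2\eta_P\theta^\top K_P\theta M_P + 4\eta_P^2 M_P^\top\theta^\top K_P\theta M_P \\
&= H_P - 2\eta_P M_P^\top H_P - 2\eta_P H_P M_P + 4\eta_P^2 M_P^\top H_P M_P.
\end{aligned}
$$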

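Since $H_P(\theta) = \theta^\top K_P\theta$ for $K_P = X_P^\top X_P$ is symmetric and positive semidefinite, we know for $H_P(\theta) = U_P\mathrm{Diag}(\sigma_P)V_P^\top$, it holds that $U_P = V_P$. Furthermore,

$$
\begin{aligned}
\sigma_{P,1}v_{P,1}v_{P,1}^\top - \frac{1}{D_{\text{in}}}H_P &= \sigma_{P,1}v_{P,1}v_{P,1}^\top - \frac{1}{D_{\text{in}}}\sum_{i=1}^{k_P}\sigma_{P,i}u_{P,i}v_{P,i}^\top \\
&= \left(1 - \frac{1}{D_{\text{in}}}\right)\sigma_{P,1}u_{P,1}v_{P,1}^\top - \frac{1}{D_{\text{in}}}\sum_{i=2}^{k_P}\sigma_{P,i}u_{P,i}v_{P,i}^\top \\
&= U_P\mathrm{Diag}(\tilde{\sigma}_P)V_P^\top = V_P\mathrm{Diag}(\tilde{\sigma}_P)V_P^\top
\end{aligned}
$$

for $\tilde{\sigma}_P = \left[\left(1 - \frac{1}{D_{\text{in}}}\right)\sigma_{P,1},\; -\frac{1}{D_{\text{in}}}\sigma_{P,2},\; \ldots,\; -\frac{1}{D_{\text{in}}}\sigma_{P,k_P}\right]$. Therefore, plugging this and the SVD of $H_P$ back in,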

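$$
\begin{aligned}
\theta'^\top K_P\theta' &= V_P\mathrm{Diag}(\sigma_P)V_P^\top - 2\eta_P V_P\mathrm{Diag}(\tilde{\sigma}_P)V_P^\top V_P\mathrm{Diag}(\sigma_P)V_P^\top - 2\eta_P V_P\mathrm{Diag}(\sigma_P)V_P^\top V_P\mathrm{Diag}(\tilde{\sigma}_P)V_P^\top \\
&\quad + 4\eta_P^2 V_P\mathrm{Diag}(\tilde{\sigma}_P)V_P^\top V_P\mathrm{Diag}(\sigma_P)V_P^\top V_P\mathrm{Diag}(\tilde{\sigma}_P)V_P^\top \\
&= V_P\mathrm{Diag}(\sigma_P)V_P^\top - 2\eta_P V_P\mathrm{Diag}(\tilde{\sigma}_P)\mathrm{Diag}(\sigma_P)V_P^\top - 2\eta_P V_P\mathrm{Diag}(\sigma_P)\mathrm{Diag}(\tilde{\sigma}_P)V_P^\top \\
&\quad + 4\eta_P^2 V_P\mathrm{Diag}(\tilde{\sigma}_P)\mathrm{Diag}(\sigma_P)\mathrm{Diag}(\tilde{\sigma}_P)V_P^\top \\
&= V_P\mathrm{Diag}(\sigma_P)V_P^\top - 2\eta_P V_P\mathrm{Diag}(\tilde{\sigma}_P\odot\sigma_P)V_P^\top - 2\eta_P V_P\mathrm{Diag}(\sigma_P\odot\tilde{\sigma}_P)V_P^\top + 4\eta_P^2 V_P\mathrm{Diag}(\tilde{\sigma}_P\odot\sigma_P\odot\tilde{\sigma}_P)V_P^\top \\
&= V_P\mathrm{Diag}\left(\sigma_P - 4\eta_P\,\tilde{\sigma}_P\odot\sigma_P + 4\eta_P^2\,\tilde{\sigma}_P\odot\sigma_P\odot\tilde{\sigma}_P\right)V_P^\top \\
&= V_P\mathrm{Diag}(\sigma'_P)V_P^\top,
\end{aligned}
$$

in which $\sigma'_P = \left[\sigma'_{P,1}, \ldots, \sigma'_{P,k_P}\right]$ for

$$
\sigma'_{P,i} =
\begin{cases}
\sigma_{P,1} - 4\eta_P\left(1 - \frac{1}{D_{\text{in}}}\right)\sigma_{P,1}^2 + 4\eta_P^2\left(1 - \frac{1}{D_{\text{in}}}\right)^2\sigma_{P,1}^3 & \text{if } i = 1, \\
\sigma_{P,i} + \frac{4\eta_P}{D_{\text{in}}}\sigma_{P,i}^2 + \frac{4\eta_P^2}{D_{\text{in}}^2}\sigma_{P,i}^3 & \text{if } i > 1,
\end{cases}
$$

$\odot$ denotes the element-wise product, and the second equality holds by the fact that $V_P$ is orthonormal, i.e., $V_P^\top V_P = I$.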


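Since $\sigma^{\max}_{H_P}$ is unique, we know that $\exists\,\alpha > 1$ such that $\sigma_{P,1} = \alpha\sigma_{P,2}$. Therefore,

$$
\begin{aligned}
\sigma'_{P,2} &= \sigma_{P,2} + \frac{4\eta_P}{D_{\text{in}}}\sigma_{P,2}^2 + \frac{4\eta_P^2}{D_{\text{in}}^2}\sigma_{P,2}^3 \\
&= \left(1 + \frac{4\eta_P}{D_{\text{in}}}\sigma_{P,2} + \frac{4\eta_P^2}{D_{\text{in}}^2}\sigma_{P,2}^2\right)\sigma_{P,2} \\
&= \frac{1 + \frac{4\eta_P}{D_{\text{in}}}\sigma_{P,2} + \frac{4\eta_P^2}{D_{\text{in}}^2}\sigma_{P,2}^2}{\alpha}\,\sigma_{P,1}.
\end{aligned}
$$

With $\eta_P < \dfrac{\sqrt{\sigma_{P,1}\sigma_{P,2}} - \sigma_{P,2}}{\frac{2}{D_{\text{in}}}\sigma_{P,2}^2}$, we have

$$1 + \frac{4\eta_P}{D_{\text{in}}}\sigma_{P,2} + \frac{4\eta_P^2}{D_{\text{in}}^2}\sigma_{P,2}^2 < \alpha\left(1 - 4\eta_P\left(1 - \frac{1}{D_{\text{in}}}\right)\sigma_{P,1} + 4\eta_P^2\left(1 - \frac{1}{D_{\text{in}}}\right)^2\sigma_{P,1}^2\right).$$

As a result,

$$\frac{1 + \frac{4\eta_P}{D_{\text{in}}}\sigma_{P,2} + \frac{4\eta_P^2}{D_{\text{in}}^2}\sigma_{P,2}^2}{1 - 4\eta_P\left(1 - \frac{1}{D_{\text{in}}}\right)\sigma_{P,1} + 4\eta_P^2\left(1 - \frac{1}{D_{\text{in}}}\right)^2\sigma_{P,1}^2} < \alpha, \quad\text{i.e.,}\quad \frac{1 + \frac{4\eta_P}{D_{\text{in}}}\sigma_{P,2} + \frac{4\eta_P^2}{D_{\text{in}}^2}\sigma_{P,2}^2}{\alpha} < 1 - 4\eta_P\left(1 - \frac{1}{D_{\text{in}}}\right)\sigma_{P,1} + 4\eta_P^2\left(1 - \frac{1}{D_{\text{in}}}\right)^2\sigma_{P,1}^2.$$

Plugging this result back in,

$$\sigma'_{P,2} = \frac{1 + \frac{4\eta_P}{D_{\text{in}}}\sigma_{P,2} + \frac{4\eta_P^2}{D_{\text{in}}^2}\sigma_{P,2}^2}{\alpha}\,\sigma_{P,1} < \left(1 - 4\eta_P\left(1 - \frac{1}{D_{\text{in}}}\right)\sigma_{P,1} + 4\eta_P^2\left(1 - \frac{1}{D_{\text{in}}}\right)^2\sigma_{P,1}^2\right)\sigma_{P,1} = \sigma'_{P,1}.$$

In addition, $\sigma'_{P,2} = \sigma_{P,2} + \frac{4\eta_P}{D_{\text{in}}}\sigma_{P,2}^2 + \frac{4\eta_P^2}{D_{\text{in}}^2}\sigma_{P,2}^3 \geq \sigma_{P,i} + \frac{4\eta_P}{D_{\text{in}}}\sigma_{P,i}^2 + \frac{4\eta_P^2}{D_{\text{in}}^2}\sigma_{P,i}^3 = \sigma'_{P,i}$ for $i = 3, \ldots, k_P$, since $\sigma_{P,2} \geq \sigma_{P,i}$ for $i = 3, \ldots, k_P$ by definition. Therefore, $\sigma'_{P,1}$ remains the maximum singular value of $\theta'^\top K_P\theta'$, and $\sigma'_{P,k_P}$ the minimum. Finally,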

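$$
\begin{aligned}
\kappa(\theta'^\top K_P\theta') = \frac{\sigma'_{P,1}}{\sigma'_{P,k_P}}
&= \frac{\sigma_{P,1} - 4\eta_P\left(1 - \frac{1}{D_{\text{in}}}\right)\sigma_{P,1}^2 + 4\eta_P^2\left(1 - \frac{1}{D_{\text{in}}}\right)^2\sigma_{P,1}^3}{\sigma_{P,k_P} + \frac{4\eta_P}{D_{\text{in}}}\sigma_{P,k_P}^2 + \frac{4\eta_P^2}{D_{\text{in}}^2}\sigma_{P,k_P}^3} \\
&< \frac{\sigma_{P,1} - 4\eta_P\left(1 - \frac{1}{D_{\text{in}}}\right)\sigma_{P,1}^2 + 4\eta_P^2\left(1 - \frac{1}{D_{\text{in}}}\right)^2\sigma_{P,1}^3}{\sigma_{P,k_P}} \\
&< \frac{\sigma_{P,1}}{\sigma_{P,k_P}} = \kappa(\theta^\top K_P\theta),
\end{aligned}
$$

where the second inequality holds when $\eta_P < \dfrac{1}{\left(1 - \frac{1}{D_{\text{in}}}\right)\sigma_{P,1}}$, which indicates that $-4\eta_P\left(1 - \frac{1}{D_{\text{in}}}\right)\sigma_{P,1}^2 + 4\eta_P^2\left(1 - \frac{1}{D_{\text{in}}}\right)^2\sigma_{P,1}^3 < 0$.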
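The closed-form spectrum $\sigma'_P$ derived in part (1) can be checked on a small example. The sketch below is illustrative only: the matrices, the helper names `well_step` and `predicted_spectrum`, and the step size `eta = 0.02` are assumptions chosen for the demonstration (the step size is simply a small value, not the bound from the theorem). It uses $K_P = \mathrm{Diag}(4, 2, 1)$ and an orthogonal $\theta$, so that $H_P$ has singular values $4, 2, 1$ and $D_{\text{in}} = 3$.

```python
import numpy as np

def well_step(theta, K, eta):
    """One update theta' = theta - eta * K^{-1} grad_theta R_well(theta^T K theta)."""
    H = theta.T @ K @ theta
    d = H.shape[0]
    w, V = np.linalg.eigh(H)          # eigenvalues in ascending order
    v1 = V[:, -1:]                    # top eigenvector (assumed unique)
    grad = 2 * K @ theta @ (w[-1] * v1 @ v1.T - H / d)
    return theta - eta * np.linalg.solve(K, grad)

def predicted_spectrum(sigma, eta, d):
    """Closed-form sigma'_{P,i}; `sigma` must be sorted in descending order."""
    out = sigma + 4 * eta / d * sigma**2 + 4 * eta**2 / d**2 * sigma**3
    a = 1 - 1 / d
    out[0] = sigma[0] - 4 * eta * a * sigma[0]**2 + 4 * eta**2 * a**2 * sigma[0]**3
    return out

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # orthogonal theta
K = np.diag([4.0, 2.0, 1.0])                      # hypothetical K_P
theta, eta = Q, 0.02

theta_new = well_step(theta, K, eta)
H_old = theta.T @ K @ theta
H_new = theta_new.T @ K @ theta_new

spec_new = np.sort(np.linalg.eigvalsh(H_new))[::-1]
spec_pred = predicted_spectrum(np.array([4.0, 2.0, 1.0]), eta, 3)
kappa_old, kappa_new = np.linalg.cond(H_old), np.linalg.cond(H_new)
print(spec_new, spec_pred, kappa_old, kappa_new)
```

In this example the largest singular value shrinks while the others grow slightly, so the condition number drops from 4 to roughly 3.11.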


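(2) Denote $\bar{R}_{\text{ill}}(H_H) = \frac{1}{2}\sigma_{H,k_H}^2 - \frac{1}{2k_H}\|H_H\|_F^2$, then by Theorem 4.2 (2), we know

$$\nabla_\theta R_{\text{ill}}(H_H(\theta)) = \frac{2K_H\theta\left(\sigma_{H,k_H}u_{H,k_H}v_{H,k_H}^\top - \frac{1}{k_H}H_H\right)}{\bar{R}_{\text{ill}}(H_H)^2}.$$

Since $\theta' = \theta - \eta_H K_H^{-1}\nabla_\theta R_{\text{ill}}(H_H(\theta))$, and writing $M_H = \sigma_{H,k_H}u_{H,k_H}v_{H,k_H}^\top - \frac{1}{k_H}H_H$ for brevity, we have

$$
\begin{aligned}
\theta'^\top K_H\theta' &= \left(\theta - \eta_H K_H^{-1}\nabla_\theta R_{\text{ill}}(H_H(\theta))\right)^{\!\top} K_H \left(\theta - \eta_H K_H^{-1}\nabla_\theta R_{\text{ill}}(H_H(\theta))\right) \\
&= \theta^\top K_H\theta - \frac{2\eta_H}{\bar{R}_{\text{ill}}(H_H)^2}M_H^\top\theta^\top K_H\theta - \frac{2\eta_H}{\bar{R}_{\text{ill}}(H_H)^2}\theta^\top K_H\theta M_H + \frac{4\eta_H^2}{\bar{R}_{\text{ill}}(H_H)^4}M_H^\top\theta^\top K_H\theta M_H \\
&= H_H - \frac{2\eta_H}{\bar{R}_{\text{ill}}(H_H)^2}M_H^\top H_H - \frac{2\eta_H}{\bar{R}_{\text{ill}}(H_H)^2}H_H M_H + \frac{4\eta_H^2}{\bar{R}_{\text{ill}}(H_H)^4}M_H^\top H_H M_H.
\end{aligned}
$$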

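Since $H_H(\theta) = \theta^\top K_H\theta$ for $K_H = X_H^\top X_H$ is also symmetric and positive semidefinite, we know for $H_H(\theta) = U_H\mathrm{Diag}(\sigma_H)V_H^\top$, it holds that $U_H = V_H$. Following similar arguments as in (1),

$$
\begin{aligned}
\sigma_{H,k_H}u_{H,k_H}v_{H,k_H}^\top - \frac{1}{k_H}H_H &= -\frac{1}{k_H}\sum_{i=1}^{k_H-1}\sigma_{H,i}u_{H,i}v_{H,i}^\top + \left(1 - \frac{1}{k_H}\right)\sigma_{H,k_H}u_{H,k_H}v_{H,k_H}^\top \\
&= V_H\mathrm{Diag}(\tilde{\sigma}_H)V_H^\top
\end{aligned}
$$

for $\tilde{\sigma}_H = \left[-\frac{1}{k_H}\sigma_{H,1},\; \ldots,\; -\frac{1}{k_H}\sigma_{H,k_H-1},\; \left(1 - \frac{1}{k_H}\right)\sigma_{H,k_H}\right]$. Since $V_H$ is orthonormal, i.e., $V_H^\top V_H = I$,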

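$$
\begin{aligned}
\theta'^\top K_H\theta' &= V_H\mathrm{Diag}(\sigma_H)V_H^\top - \frac{2\eta_H}{\bar{R}_{\text{ill}}(H_H)^2}V_H\mathrm{Diag}(\tilde{\sigma}_H)V_H^\top V_H\mathrm{Diag}(\sigma_H)V_H^\top - \frac{2\eta_H}{\bar{R}_{\text{ill}}(H_H)^2}V_H\mathrm{Diag}(\sigma_H)V_H^\top V_H\mathrm{Diag}(\tilde{\sigma}_H)V_H^\top \\
&\quad + \frac{4\eta_H^2}{\bar{R}_{\text{ill}}(H_H)^4}V_H\mathrm{Diag}(\tilde{\sigma}_H)V_H^\top V_H\mathrm{Diag}(\sigma_H)V_H^\top V_H\mathrm{Diag}(\tilde{\sigma}_H)V_H^\top \\
&= V_H\mathrm{Diag}\left(\sigma_H - \frac{4\eta_H}{\bar{R}_{\text{ill}}(H_H)^2}\tilde{\sigma}_H\odot\sigma_H + \frac{4\eta_H^2}{\bar{R}_{\text{ill}}(H_H)^4}\tilde{\sigma}_H\odot\sigma_H\odot\tilde{\sigma}_H\right)V_H^\top \\
&= V_H\mathrm{Diag}(\sigma'_H)V_H^\top,
\end{aligned}
$$

for $\sigma'_H = \left[\sigma'_{H,1}, \ldots, \sigma'_{H,k_H}\right]$,

$$
\sigma'_{H,i} =
\begin{cases}
\sigma_{H,i} + \frac{4\eta_H}{k_H\bar{R}_{\text{ill}}(H_H)^2}\sigma_{H,i}^2 + \frac{4\eta_H^2}{k_H^2\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,i}^3 & \text{if } i < k_H, \\
\sigma_{H,k_H} - \frac{4\eta_H}{\bar{R}_{\text{ill}}(H_H)^2}\left(1 - \frac{1}{k_H}\right)\sigma_{H,k_H}^2 + \frac{4\eta_H^2}{\bar{R}_{\text{ill}}(H_H)^4}\left(1 - \frac{1}{k_H}\right)^2\sigma_{H,k_H}^3 & \text{if } i = k_H,
\end{cases}
$$

and $\odot$ denotes the element-wise product.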

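Since $\sigma^{\min}_{H_H}$ is unique, we know that $\exists\,\beta \in (0, 1)$ such that $\sigma_{H,k_H} = \beta\sigma_{H,k_H-1}$. Then we have

$$
\begin{aligned}
\sigma'_{H,k_H} &= \sigma_{H,k_H} - \frac{4\eta_H}{\bar{R}_{\text{ill}}(H_H)^2}\left(1 - \frac{1}{k_H}\right)\sigma_{H,k_H}^2 + \frac{4\eta_H^2}{\bar{R}_{\text{ill}}(H_H)^4}\left(1 - \frac{1}{k_H}\right)^2\sigma_{H,k_H}^3 \\
&= \left(1 - \frac{4\eta_H}{\bar{R}_{\text{ill}}(H_H)^2}\left(1 - \frac{1}{k_H}\right)\sigma_{H,k_H} + \frac{4\eta_H^2}{\bar{R}_{\text{ill}}(H_H)^4}\left(1 - \frac{1}{k_H}\right)^2\sigma_{H,k_H}^2\right)\sigma_{H,k_H} \\
&= \left(1 - \frac{4\eta_H}{\bar{R}_{\text{ill}}(H_H)^2}\left(1 - \frac{1}{k_H}\right)\sigma_{H,k_H} + \frac{4\eta_H^2}{\bar{R}_{\text{ill}}(H_H)^4}\left(1 - \frac{1}{k_H}\right)^2\sigma_{H,k_H}^2\right)\beta\,\sigma_{H,k_H-1} \\
&= \left(1 + \frac{4\eta_H\sigma_{H,k_H}}{k_H\bar{R}_{\text{ill}}(H_H)^2} + \frac{4\eta_H^2\sigma_{H,k_H}^2}{k_H^2\bar{R}_{\text{ill}}(H_H)^4} - \frac{4\eta_H\sigma_{H,k_H}}{\bar{R}_{\text{ill}}(H_H)^2} + \frac{4\eta_H^2\sigma_{H,k_H}^2}{\bar{R}_{\text{ill}}(H_H)^4} - \frac{8\eta_H^2\sigma_{H,k_H}^2}{k_H\bar{R}_{\text{ill}}(H_H)^4}\right)\beta\,\sigma_{H,k_H-1}.
\end{aligned}
$$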


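Letting

$$0 < \eta_H < \frac{\bar{R}_{\text{ill}}(H_H)^2}{1 - 2\sigma_{H,k_H}/k_H} = \frac{1}{1 - 2\sigma^{\min}_{H_H}/k_H}\left(\frac{1}{2k_H}\|\theta^\top K_H\theta\|_F^2 - \frac{1}{2}\left(\sigma^{\min}_{H_H}\right)^2\right)^2,$$

we have

$$-\frac{4\eta_H\sigma_{H,k_H}}{\bar{R}_{\text{ill}}(H_H)^2} + \frac{4\eta_H^2\sigma_{H,k_H}^2}{\bar{R}_{\text{ill}}(H_H)^4} - \frac{8\eta_H^2\sigma_{H,k_H}^2}{k_H\bar{R}_{\text{ill}}(H_H)^4} < 0. \tag{21}$$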

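Also, $1 - \frac{4\eta_H}{\bar{R}_{\text{ill}}(H_H)^2}\left(1 - \frac{1}{k_H}\right)\sigma_{H,k_H} + \frac{4\eta_H^2}{\bar{R}_{\text{ill}}(H_H)^4}\left(1 - \frac{1}{k_H}\right)^2\sigma_{H,k_H}^2 = \left(1 - \frac{2\eta_H}{\bar{R}_{\text{ill}}(H_H)^2}\left(1 - \frac{1}{k_H}\right)\sigma_{H,k_H}\right)^2 > 0$ for any $\eta_H > 0$. Given that $\sigma_{H,k_H-1} > \sigma_{H,k_H}$ and Eq. (21),

$$
\begin{aligned}
\beta &< \frac{1 + \frac{4\eta_H}{k_H\bar{R}_{\text{ill}}(H_H)^2}\sigma_{H,k_H-1} + \frac{4\eta_H^2}{k_H^2\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,k_H-1}^2}{1 + \frac{4\eta_H}{k_H\bar{R}_{\text{ill}}(H_H)^2}\sigma_{H,k_H} + \frac{4\eta_H^2}{k_H^2\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,k_H}^2} \\
&< \frac{1 + \frac{4\eta_H}{k_H\bar{R}_{\text{ill}}(H_H)^2}\sigma_{H,k_H-1} + \frac{4\eta_H^2}{k_H^2\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,k_H-1}^2}{1 + \frac{4\eta_H}{k_H\bar{R}_{\text{ill}}(H_H)^2}\sigma_{H,k_H} + \frac{4\eta_H^2}{k_H^2\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,k_H}^2 - \frac{4\eta_H}{\bar{R}_{\text{ill}}(H_H)^2}\sigma_{H,k_H} + \frac{4\eta_H^2}{\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,k_H}^2 - \frac{8\eta_H^2}{k_H\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,k_H}^2},
\end{aligned}
$$

indicating

$$\left(1 + \frac{4\eta_H}{k_H\bar{R}_{\text{ill}}(H_H)^2}\sigma_{H,k_H} + \frac{4\eta_H^2}{k_H^2\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,k_H}^2 - \frac{4\eta_H}{\bar{R}_{\text{ill}}(H_H)^2}\sigma_{H,k_H} + \frac{4\eta_H^2}{\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,k_H}^2 - \frac{8\eta_H^2}{k_H\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,k_H}^2\right)\beta < 1 + \frac{4\eta_H}{k_H\bar{R}_{\text{ill}}(H_H)^2}\sigma_{H,k_H-1} + \frac{4\eta_H^2}{k_H^2\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,k_H-1}^2.$$

Therefore,

$$
\begin{aligned}
\sigma'_{H,k_H} &= \left(1 + \frac{4\eta_H\sigma_{H,k_H}}{k_H\bar{R}_{\text{ill}}(H_H)^2} + \frac{4\eta_H^2\sigma_{H,k_H}^2}{k_H^2\bar{R}_{\text{ill}}(H_H)^4} - \frac{4\eta_H\sigma_{H,k_H}}{\bar{R}_{\text{ill}}(H_H)^2} + \frac{4\eta_H^2\sigma_{H,k_H}^2}{\bar{R}_{\text{ill}}(H_H)^4} - \frac{8\eta_H^2\sigma_{H,k_H}^2}{k_H\bar{R}_{\text{ill}}(H_H)^4}\right)\beta\,\sigma_{H,k_H-1} \\
&< \left(1 + \frac{4\eta_H}{k_H\bar{R}_{\text{ill}}(H_H)^2}\sigma_{H,k_H-1} + \frac{4\eta_H^2}{k_H^2\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,k_H-1}^2\right)\sigma_{H,k_H-1} \\
&= \sigma'_{H,k_H-1}.
\end{aligned}
$$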

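In addition, $\sigma'_{H,k_H-1} = \sigma_{H,k_H-1} + \frac{4\eta_H}{k_H\bar{R}_{\text{ill}}(H_H)^2}\sigma_{H,k_H-1}^2 + \frac{4\eta_H^2}{k_H^2\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,k_H-1}^3 \leq \sigma_{H,i} + \frac{4\eta_H}{k_H\bar{R}_{\text{ill}}(H_H)^2}\sigma_{H,i}^2 + \frac{4\eta_H^2}{k_H^2\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,i}^3 = \sigma'_{H,i}$ for $i = 1, \ldots, k_H - 2$, since $\sigma_{H,k_H-1} \leq \sigma_{H,i}$ for $i = 1, \ldots, k_H - 2$ by definition. That is to say, $\sigma'_{H,k_H}$ remains the minimum singular value of $\theta'^\top K_H\theta'$, and $\sigma'_{H,1}$ the maximum. Finally,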

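$$
\begin{aligned}
\kappa(\theta'^\top K_H\theta') &= \frac{\sigma'_{H,1}}{\sigma'_{H,k_H}} \\
&= \frac{\left(1 + \frac{4\eta_H}{k_H\bar{R}_{\text{ill}}(H_H)^2}\sigma_{H,1} + \frac{4\eta_H^2}{k_H^2\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,1}^2\right)\sigma_{H,1}}{\left(1 + \frac{4\eta_H}{k_H\bar{R}_{\text{ill}}(H_H)^2}\sigma_{H,k_H} + \frac{4\eta_H^2}{k_H^2\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,k_H}^2 - \frac{4\eta_H}{\bar{R}_{\text{ill}}(H_H)^2}\sigma_{H,k_H} + \frac{4\eta_H^2}{\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,k_H}^2 - \frac{8\eta_H^2}{k_H\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,k_H}^2\right)\sigma_{H,k_H}} \\
&> \frac{\left(1 + \frac{4\eta_H}{k_H\bar{R}_{\text{ill}}(H_H)^2}\sigma_{H,1} + \frac{4\eta_H^2}{k_H^2\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,1}^2\right)\sigma_{H,1}}{\left(1 + \frac{4\eta_H}{k_H\bar{R}_{\text{ill}}(H_H)^2}\sigma_{H,k_H} + \frac{4\eta_H^2}{k_H^2\bar{R}_{\text{ill}}(H_H)^4}\sigma_{H,k_H}^2\right)\sigma_{H,k_H}} \\
&> \frac{\sigma_{H,1}}{\sigma_{H,k_H}} = \kappa(\theta^\top K_H\theta),
\end{aligned}
$$

where the first inequality holds by Eq. (21) and the second by $\sigma_{H,1} > \sigma_{H,k_H}$.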
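The effect of the update in part (2) can also be illustrated numerically. The sketch below is a minimal example under stated assumptions: the helper name `ill_step`, the matrices, and the step size `eta = 0.9` are hypothetical choices for the demonstration, and $H_H$ is taken to be full rank so that $k_H = D_{\text{in}} = 3$. It reuses $K_H = \mathrm{Diag}(4, 2, 1)$ with an orthogonal $\theta$, so that $H_H$ has singular values $4, 2, 1$.

```python
import numpy as np

def ill_step(theta, K, eta):
    """One update theta' = theta - eta * K^{-1} grad_theta R_ill(theta^T K theta)."""
    H = theta.T @ K @ theta
    k = int(np.linalg.matrix_rank(H))      # full rank assumed in this sketch
    w, V = np.linalg.eigh(H)               # ascending, so w[0] is sigma_k
    vk = V[:, :1]                          # bottom eigenvector (assumed unique)
    r_bar = 0.5 * w[0] ** 2 - np.linalg.norm(H, "fro") ** 2 / (2 * k)
    grad = 2 * K @ theta @ (w[0] * vk @ vk.T - H / k) / r_bar**2
    return theta - eta * np.linalg.solve(K, grad)

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # orthogonal theta
K = np.diag([4.0, 2.0, 1.0])                      # hypothetical K_H
theta, eta = Q, 0.9

theta_new = ill_step(theta, K, eta)
H_old = theta.T @ K @ theta
H_new = theta_new.T @ K @ theta_new

spec_new = np.sort(np.linalg.eigvalsh(H_new))[::-1]
kappa_old, kappa_new = np.linalg.cond(H_old), np.linalg.cond(H_new)
print(spec_new, kappa_old, kappa_new)
```

Here the smallest singular value shrinks and the others grow, matching the closed form for $\sigma'_{H,i}$, so the condition number increases from 4 to roughly 8.54.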


C. Detailed Experiment Setup

C.1. Datasets

Stanford Cars (Krause et al., 2013) contains 16,185 images of 196 car models and focuses on fine-grained image classification. Country211 (Radford et al., 2021) is a dataset used for country classification based on satellite images, comprising 211 country-level labels, each with 150 training images. This is a subset of the YFCC100M dataset (Thomee et al., 2016) providing user-generated photos and videos, used for domain adaptation evaluation.

C.2. Immunization training details

We summarize the hyper-parameters of training for model immunization in Tab. 4. We choose $\lambda_P$ and $\lambda_H$ by balancing the gradient norms of $R_{\text{well}}$ and $R_{\text{ill}}$. Specifically, we first obtain the scale of $\lambda_P$ and $\lambda_H$ and then search over multiples in $\{1, 2, 3, 5\}$. For linear models, we search the learning rate $\eta$ over the set $\{0.0005, 0.001, 0.005, 0.01\}$ and report the best result. For ImageNet, we follow the default learning rate $\eta = 1 \times 10^{-5}$. The number of epochs is based on early stopping using RIR and the test accuracy. All experiments are conducted using float64 precision to ensure numerical stability and reduce potential inaccuracies in computations.

Table 4. Hyperparameters for immunization training.

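| Dataset | Model | $\eta$ | $\lambda_P$ | $\lambda_H$ | Epochs | Loss $\mathcal{L}$ |
|---|---|---|---|---|---|---|
| House Price | Linear | $0.005$ | $100$ | $1 \times 10^{7}$ | 1000 | Mean squared error |
| MNIST | Linear | $0.001$ | $1$ | $5 \times 10^{7}$ | 30 | Binary cross-entropy (CE) |
| ImageNet vs. Stanford Cars | ResNet18 | $1 \times 10^{-5}$ | $5 \times 10^{-5}$ | $2 \times 10^{6}$ | 3 | Label-smoothing CE |
| ImageNet vs. Country211 | ResNet18 | $1 \times 10^{-5}$ | $1 \times 10^{-4}$ | $2 \times 10^{6}$ | 3 | Label-smoothing CE |
| ImageNet vs. Stanford Cars | ViT | $1 \times 10^{-5}$ | $3 \times 10^{-6}$ | $3 \times 10^{8}$ | 2 | Label-smoothing CE |
| ImageNet vs. Country211 | ViT | $1 \times 10^{-5}$ | $1 \times 10^{-6}$ | $1 \times 10^{8}$ | 2 | Label-smoothing CE |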

Details of immunizing linear models. For the regression task, the linear feature extractor $\theta \in \mathbb{R}^{79 \times 79}$ is a randomly initialized dummy linear layer, as discussed in Sec. 4.4. We handle missing values in the tabular data by filling NaNs with 0. Categorical features are converted into numerical values using LabelEncoder. Finally, the features and labels are normalized using their respective mean and standard deviation. To create $D_P$ and $D_H$, we split the House Prices dataset (Montoya & DataCanary, 2016) by the feature MSZoning. Specifically, all entries where MSZoning = RL are assigned to $D_H$, while the remaining entries form $D_P$.

For the binary classification task on MNIST, the linear feature extractor $\theta \in \mathbb{R}^{784 \times 784}$ is also a randomly initialized dummy linear layer, and we construct a training dataset by selecting two specific target digits. The dataset is created using a custom BinaryDataset class, which filters the original MNIST dataset to include only the chosen digits and assigns new labels: one digit is mapped to label 0 and the other to label 1. To ensure balance in the dataset, we limit the number of samples for each digit to the smaller count between the two. For optimization, we use Adam (Kingma, 2014) with $\beta = (0.9, 0.999)$ and $\epsilon = 1 \times 10^{-8}$ instead of the basic gradient descent in Alg. 1. For the linear model, we compute the Hessian inverse by solving a regularized least-squares system, where the Hessian has shape $\mathbb{R}^{D_{\text{in}} \times D_{\text{in}}}$. Here $D_{\text{in}} = 79$ for the regression task and $D_{\text{in}} = 784$ for the image classification task.
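The filtering and balancing step described above can be sketched in a few lines of plain Python. This is not the BinaryDataset class itself; the function name `make_binary_subset` and the toy label list are hypothetical, and keeping the first occurrences of each digit is an assumption about how the truncation is done.

```python
def make_binary_subset(labels, digit0, digit1):
    """Return (original_index, new_label) pairs for a balanced two-digit subset."""
    idx0 = [i for i, y in enumerate(labels) if y == digit0]
    idx1 = [i for i, y in enumerate(labels) if y == digit1]
    # Balance the classes by truncating both to the smaller count
    n = min(len(idx0), len(idx1))
    pairs = [(i, 0) for i in idx0[:n]] + [(i, 1) for i in idx1[:n]]
    return sorted(pairs)  # restore the original sample order

# Toy stand-in for the MNIST label vector
labels = [3, 7, 3, 1, 7, 3, 9]
subset = make_binary_subset(labels, digit0=3, digit1=7)
print(subset)  # [(0, 0), (1, 1), (2, 0), (4, 1)]
```

The third sample of digit 3 (index 5) is dropped so that both classes contribute two samples each.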

Details of immunizing non-linear models. The pre-trained ResNet18 and ViT are loaded from PyTorch Image Models (Wightman, 2019) with the model names resnet18 and vit_base_patch16_clip_224. We also create the dataset with the built-in function create_dataset from Wightman (2019). The feature embedding sizes for ResNet18 and ViT are 512 and 768, respectively. To facilitate balanced training when dataset sizes differ, we implement a CombinedLoader, which pairs batches from two data loaders. The longer dataset dictates training duration, while the shorter dataset cycles continuously using itertools. The number of epochs reported in Tab. 4 corresponds to the epochs of $D_H$, i.e., the shorter loader.
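A loader of this kind can be sketched as follows. This is a minimal illustration rather than the actual implementation: the class body is an assumption, and plain lists stand in for the two data loaders. It uses `itertools.cycle` to repeat the shorter loader while `zip` stops when the longer one is exhausted.

```python
import itertools

class CombinedLoader:
    """Pair batches from two loaders; the shorter one is cycled."""

    def __init__(self, loader_a, loader_b):
        self.loader_a = loader_a
        self.loader_b = loader_b

    def __len__(self):
        # The longer loader dictates the number of steps per pass
        return max(len(self.loader_a), len(self.loader_b))

    def __iter__(self):
        if len(self.loader_a) >= len(self.loader_b):
            return zip(self.loader_a, itertools.cycle(self.loader_b))
        return zip(itertools.cycle(self.loader_a), self.loader_b)

# Lists of batch names stand in for real data loaders
primary = ["p0", "p1", "p2", "p3", "p4"]
harmful = ["h0", "h1"]
pairs = list(CombinedLoader(primary, harmful))
print(pairs)
```

One design caveat: `itertools.cycle` caches the items of its first pass, so a shuffling loader would replay the same batch order on every repeat. Re-creating the iterator of the shorter loader each time it is exhausted avoids this if fresh shuffling is needed.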

For optimization, we use SGD with Nesterov momentum to optimize Eq. (11), setting an initial learning rate of $1 \times 10^{-5}$ with momentum 0.9. The trainable feature extractor parameters are optimized with zero weight decay, while the classifier parameters use a weight decay of $2 \times 10^{-5}$.


C.3. Pseudo-code of the dummy layer

We provide the pseudo-code for implementing the dummy layer in Fig. 4 below. The DummyLinear layer extends torch.nn.Linear and incorporates an optional preconditioning mechanism in the backward pass using the inverse feature covariance matrix. The LinearFunction class defines the forward and backward computations, where the forward pass applies a standard linear transformation $XW^\top + b$ and stores the input, weight, and bias for gradient computation. In the backward pass, the input gradient is computed normally, while the weight gradient is modified based on whether preconditioning is enabled (use_precond=True). If enabled, the weight gradient is adjusted by solving a regularized linear system with the feature covariance matrix $X^\top X + \epsilon I$ (with $\epsilon$ given by lambda_reg), where the regularization term improves numerical stability.

```python
import torch
import torch.nn as nn


class LinearFunction(torch.autograd.Function):

    @staticmethod
    def forward(ctx, input, weight, bias, lambda_reg, use_precond):
        # Save input tensors for backward
        ctx.save_for_backward(input, weight, bias)
        ctx.lambda_reg = lambda_reg
        ctx.use_precond = use_precond

        # Compute output
        output = input.mm(weight.t())
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        # Retrieve saved tensors
        input, weight, bias = ctx.saved_tensors
        lambda_reg = ctx.lambda_reg
        use_precond = ctx.use_precond

        # Initialize gradients
        grad_input = grad_weight = grad_bias = None

        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)

        if ctx.needs_input_grad[1]:
            base_grad_weight = grad_output.t().mm(input)
            if use_precond:
                XtX = input.t().mm(input)
                lambda_eye = lambda_reg * torch.eye(
                    XtX.size(0), device=XtX.device, dtype=XtX.dtype
                )
                XtX_reg = XtX + lambda_eye
                # Solve the regularized system instead of forming the inverse.
                # The shapes only match for a square weight (D_in x D_in dummy layer).
                grad_weight = torch.linalg.solve(XtX_reg, base_grad_weight)
            else:
                grad_weight = base_grad_weight

        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0)

        # lambda_reg and use_precond receive no gradient
        return grad_input, grad_weight, grad_bias, None, None


class DummyLinear(nn.Linear):

    def forward(self, input, lambda_reg, use_precond):
        # Dynamically decide whether to use the covariance inversion as the preconditioner
        return LinearFunction.apply(input, self.weight, self.bias, lambda_reg, use_precond)
```

Figure 4. Dummy layer with selective inverse feature covariance matrix in backward function.