# certified_robustness_under_bounded_levenshtein_distance__35185e91.pdf

Published as a conference paper at ICLR 2025

CERTIFIED ROBUSTNESS UNDER BOUNDED LEVENSHTEIN DISTANCE

Elias Abad Rocamora , Grigorios G. Chrysos

, Volkan Cevher : LIONS - École Polytechnique Fédérale de Lausanne, Switzerland

: Department of Electrical and Computer Engineering, University of Wisconsin-Madison, USA {elias.abadrocamora, volkan.cevher}@epfl.ch, chrysos@wisc.edu

Text classifiers suffer from small perturbations, that if chosen adversarially, can dramatically change the output of the model. Verification methods can provide robustness certificates against such adversarial perturbations, by computing a sound lower bound on the robust accuracy. Nevertheless, existing verification methods incur in prohibitive costs and cannot practically handle Levenshtein distance constraints. We propose the first method for computing the Lipschitz constant of convolutional classifiers with respect to the Levenshtein distance. We use these Lipschitz constant estimates for training 1-Lipschitz classifiers. This enables computing the certified radius of a classifier in a single forward pass. Our method, Lips Lev, is able to obtain 38.80% and 13.93% verified accuracy at distance 1 and 2 respectively in the AG-News dataset, while being 4 orders of magnitude faster than existing approaches. We believe our work can open the door to more efficient verification in the text domain.

1 INTRODUCTION

Despite the impressive performance of Natural Language Processing (NLP) models (Sutskever et al., 2014; Zhang et al., 2015; Devlin et al., 2019), simple corruptions like typos or synonym substitutions are able to dramatically change the prediction of the model (Belinkov and Bisk, 2018; Alzantot et al., 2018). With newer attacks in NLP becoming stronger (Hou et al., 2023), verification methods become relevant for providing future-proof robustness certificates (Liu et al., 2021).

Constraints on the Levenshtein distance (Levenshtein et al., 1966) provide a good description of the perturbations a model should be robust to (Morris et al., 2020) and strong attacks incorporate such constraints (Gao et al., 2018; Ebrahimi et al., 2018; Liu et al., 2022; Abad Rocamora et al., 2024). Despite the success of verification methods in the text domain, existing methods can only certify probabilistically via randomized smoothing (Cohen et al., 2019; Ye et al., 2020; Huang et al., 2023), or can only handle specifications such as replacements of characters/words, stop-word removal or word duplication (Huang et al., 2019; Jia et al., 2019; Shi et al., 2020; Zhang et al., 2021).

On the performance side, most successful certification methods rely on Interval Bound Propagation (IBP) (Moore et al., 2009), which in the text domain requires multiple forward passes through the first layers of the model (Huang et al., 2019), unlike in the image domain where a single forward pass is enough for verification (Wang et al., 2018). Moreover, IBP has been shown to provide a suboptimal verified accuracy in the image domain (Wang et al., 2021).

In the image domain, a popular approach to get fast robustness certificates is computing upper bounds on the Lipschitz constant of classifiers, and using this information to directly verify with a single forward pass (Hein and Andriushchenko, 2017; Tsuzuku et al., 2018; Latorre et al., 2020; Xu et al., 2022). These methods cannot be trivially applied in NLP because they assume the input to be in an ℓp space such Rd, which is not the case of text input, where the input length can vary and inputs are discrete (characters). Therefore, we need to rethink Lipschitz verification for NLP.

In this work, we introduce the first method able to provide deterministic Levenshtein distance certificates for convolutional classifiers. This is achieved by computing the Lipschitz constant of intermediate layers with respect to the ERP distance (Chen and Ng, 2004). Our Lipschitz constant estimates

Published as a conference paper at ICLR 2025

Table 1: State of the art in Levenstein distance verification and our contributions: Lips Lev is the first to verify deterministically against Levenshtein distance constraints in a single forward pass.

Method Insertions/deletions Deterministic Single forward pass

Huang et al. (2019) Huang et al. (2023) Lips Lev (Ours)

allow enforcing 1-Lipschitzness during training in order to achieve a larger verified accuracy. Our experiments in the AG-News, SST-2, Fake-News and IMDB datasets show non-trivial certificates at distances 1 and 2, taking 4 to 7 orders of magnitudes less time to verify. Furthermore, our method is the only one able to verify under Levenshtein distance larger than 1. We set the foundations for Lipschitz verification in NLP and we believe our method can be extended to more complex models.

Notation: We use uppercase bold letters for matrices X Rm n, lowercase bold letters for vectors x Rm and lowercase letters for numbers x R. Accordingly, the ith row and the element in the i, j position of a matrix X are given by xi and xij respectively. We use the shorthand [n] = {0, 1, , n 1} for any natural number n. Given two matrices A Rm d and B

Rn d A B = A B

R(m+n) d. Concatenating with the empty sequence results in the identity

A = A. We denote as A2: R(m 1) d the matrix obtained by removing the first row. We denote the zero vector as 0 with dimensions appropriate to context. We use the operator | | for the size of sets, e.g., |S(Γ)| and the length of sequences, e.g., for X Rm n, we have |X| = m.

2 RELATED WORK

Lipschitz verification: Hein and Andriushchenko (2017) are the first to study the computation of the Lipschitz constant in order to provide formal guarantees of the robustness of support vector machines and two-layer nueral networks. Tsuzuku et al. (2018) compute Lipschitz constant upper bounds for deeper networks and regularize such upper bounds to improve certificates. Since then, tighter upper bounds for the Lipschitz constant have been proposed (Huang et al., 2021; Fazlyab et al., 2019; Latorre et al., 2020; Shi et al., 2022). A variety of works propose constraining the Lipschitz constant to be 1 during training in order to have automatic robustness certificates (Cisse et al., 2017; Qian and Wegman, 2019; Gouk et al., 2021; Xu et al., 2022). All previous works center in the standard ℓp norms and cannot be applied to the NLP domain. Our work provides the first 1-Lipschitz training method for the Levenshtein distance.

Verfication in NLP: Jia et al. (2019) propose using Interval Bound Propagation via an overapproximation of the embeddings of the set of synonyms of each word. Concurrently, Huang et al. (2019) incorporate this technique for verifying against replacements of nearby characters in the English keyboard. Bonaert et al. (2021); Shi et al. (2020) propose zonotope abstractions and IBP for verifying against synonym substitutions in transformer models. Zhang et al. (2021) propose a verification procedure that can handle a small number of input perturbations for LSTM classifiers. Deviating from these approaches, Ye et al. (2020) propose using randomized smoothing techniques Cohen et al. (2019) in order to verify probabilistically against character substitutions. Huang et al. (2023) used similar techniques in order to probabilistically verify under Levenshtein distance specifications. Zhao et al. (2022) propose a framework to verify under word substitutions via Causal Interventions. Sun and Ruan (2023) derive probable upper and lower bounds of the certified radius under word substitutions. Zhang et al. (2024) employ randomized smoothing to verify against word (synonym) substitutions, insertions, deletions and reorderings. Zeng et al. (2023) propose a randomized smoothing technique that does not rely on knowing how attackers generate synonyms. In Table 1 we highlight the differences with existing works in NLP verification.

3 PRELIMINARIES

Let S(Γ) = {c1c2 cm : ci Γ m N \ 0} be the space of sequences of characters in the alphabet set Γ. We represent sentences S S(Γ) as sequences of one-hot vectors, i.e., S

Published as a conference paper at ICLR 2025

{0, 1}m |Γ| : ||si||1 = 1, i [m]. Given a classification model f : S(Γ) Ro assigning scores to each of the o classes, the predicted class for some S S(Γ) is given by ˆy = arg maxi [o] f(S)i. Our goal is to check whether for a given pair (S, y) (S(Γ) [o]):

f(S )y max ˆy =y f(S )ˆy > 0, S S(Γ) : d Lev(S, S ) k , (1)

where d Lev is the Levenshtein distance (Levenshtein et al., 1966). The Levenshtein distance is defined as follows:

d Lev(S, S ) :=

|S| if |S | = 0 |S | if |S| = 0 d Lev(S2:, S 2:) if s1 = s 1

(d Lev(S2:, S 2:) d Lev(S2:, S ) d Lev(S, S 2:)

otherwise .

The Levenshtein distance captures the number of character replacements, insertions or deletions needed in order to transform S into S and vice-versa. Such constraints are employed in popular NLP attacks in order to enforce the imperceptibility of the attack (Gao et al., 2018; Ebrahimi et al., 2018; Liu et al., 2022; Abad Rocamora et al., 2024) following the findings of Morris et al. (2020).

3.1 INTERVAL BOUND PROPAGATION (IBP)

Existing robustness verification approaches rely on IBP for verifying the robustness of text models (Huang et al., 2019; Jia et al., 2019). IBP relies on the input being constrained in a box. Let x, l, u Rd, every element of x is assumed to be in an interval given by l and u, i.e., li xi ui i [d] or x [l, u] for short. These constraints arise naturally when studying robustness in the ℓ norm, as the constraint x {x(0) + δ : ||δ|| ϵ} can exactly be represented as x [x(0) ϵ, x(0) + ϵ]. IBP consists in a set of rules to obtain interval constraints of the output of a function, given the interval constraints of the input. In the case of an affine mapping f(x) = W x + b, we can easily obtain the interval constraints f(x) [lf(x), uf(x)], x [l, u] with:

lf(x) = W +l + W u + b, uf(x) = W +u + W l + b , (2)

where W + and W are the positive and negative parts of W . In the case of the Re LU activation function σ(x) = max{0, x}, we have that:

lσ(x) = σ(l), uσ(x) = σ(u) . (3)

By applying recursively the simple rules in Eqs. (2) and (3), one can easily verify robustness properties of Re LU fully-connected and convolutional networks (Wang et al., 2018).

Nevertheless, IBP has two main limitations:

a) IBP assumes the input space to be of fixed length, e.g., Rd. b) IBP can only handle interval constrained inputs, e.g., x [l, u].

Limitation a) makes it impossible to verify Levenshtein distance constraints as they include insertion and deletion operations, which change the length of the input sequence. In the literature, limitation a) forces existing verification methods to only consider replacements of characters/words (Huang et al., 2019; Jia et al., 2019; Shi et al., 2020; Bonaert et al., 2021; Zhang et al., 2021).

Limitation b) can be circumvented by building an over approximation of the replacement constraints that can be represented with intervals. In the case of text, one can directly build an over approximation of the embeddings. Let Z = SE Rm d, where S S(Γ) is the sequence of one-hot vectors representing each character/word, and E R|Γ| d is the embedding matrix. Let dedit be the edit distance without insertions and deletions, our constraint in the edit distance (Eq. (1)) translates in the embedding space to the set:

Zk(S) = {S E : dedit(S, S ) k, S S(Γ)} ,

where dedit(S, S ) = Pm i=1 ||si s i|| for any length m sequences of one-hot vectors S, S . We can overapproximate this set with interval constraints such that ˆZ [L, U], with lij = min Z Zk(S) zij

Published as a conference paper at ICLR 2025

and uij = max Z Zk(S) zij. But, because we can replace any character/word at any position, we end up with L = l l l and U = u u u, where:

li = min k [|Γ|] eki, ui = max k [|Γ|] eki, i [d] .

Therefore, this overapproximation contains the embeddings of any S {0, 1}m |Γ| : ||s i||1 = 1, i [m], i.e., every sentence of length m, making verification impossible. To circumvent this, existing methods focus on the synonym replacement task, further restricting Zk(S) to only replace words for a word in a small set of synonyms (Jia et al., 2019; Shi et al., 2020; Bonaert et al., 2021). Alternatively, Huang et al. (2019) compute the over approximation after the pooling layer of the model, circumventing this problem. Nevertheless, their approach requires |Z1(S)| forward passes. This number of forward passes can be in the order of tens of thousands for large m and |Γ|.

Our Lipschitz constant based approach, Lips Lev, can handle sequences of any length and requires a single forward pass through the model.

In Section 4.1 we cover the verification procedure once the Lipschitz constant of a classifier is known. In Section 4.2 we cover the convolutional architectures employed in Huang et al. (2019) and our Lipschitz constant estimation for them. Lastly, we introduce our training strategy in order to achieve non-trivial verified accuracy in Section 4.3. We defer our proofs to Appendix B.

4.1 LIPSCHITZ CONSTANT BASED VERIFICATION

Motivated by the success and efficiency of Lipschitz constant based certification in vision tasks (Huang et al., 2021; Xu et al., 2022), we propose a method of this kind that can handle previously studied models in the character-level classification task (Huang et al., 2019), and provide Levenshtein distance certificates.

Our goal is to compute the global Lipschitz constant. Let gy,ˆy(S) = f(S)y f(S)ˆy be the margin function for classes y and ˆy, we would like to have for some S:

|gy,ˆy(S) gy,ˆy(S )| Gy,ˆy d Lev(S, S ) S S(Γ) , (4)

for some Gy,ˆy R+. Given Eq. (4) is satisfied, the maximum distance up to which we can verify Eq. (1), is lower bounded by:

max{k : gy,ˆy(S ) > 0 S S(Γ) : d Lev(S, S ) k} gy,ˆy(S)

Let k y,ˆy(S) := j gy,ˆy(S)

k , we denote k y(S) := minˆy =y k y,ˆy(S) to be the certified radius.

4.2 LIPSCHITZ CONSTANT ESTIMATION FOR CONVOLUTIONAL CLASSIFIERS

Let S S(Γ) be a sequence of one-hot vectors, our classifier is defined as:

m+l (q 1) X

i=1 f (l) i (S)

W , where f (j)(S) = σ C(j) f (j 1)(S) j = 1, , l SE j = 0 ,

(6) where E Rv d is the embeddings matrix, C(i), i = 1, , l are convolutional layers with kernel size q and hidden dimension k. σ is the Re LU activation function and W Rk o is the last classification layer. This architecture was previously studied in verification by (Huang et al., 2019; Jia et al., 2019).

Our approach to estimate the global Lipschitz constant of such a classifier is to compute the Lipschitz constant of each layer. Then, since the overall function in Eq. (6) is the sequential composition of all of the layers, we can just multiply the Lipschitz constants to obtain the global one. However, in order to be able to do this, we need some metric with respect to which we can compute the Lipschitz

Published as a conference paper at ICLR 2025

constant. The Levenshtein distance cannot be applied, as it can only measure distances between one-hot vectors and the outputs of intermediate layers are sequences of real vectors. For this task, we select the ERP distance (Chen and Ng, 2004): Definition 4.1 (ERP distance (Chen and Ng, 2004)). Let A Rm d and B Rn d be two sequences of m and n real vectors respectively and p 1. The ERP distance is defined as:

ERP(A, B) =

Pm i=1 ||ai||p if n = 0 (B = ) Pn i=1 ||bi||p if m = 0 (A = )

||a1||p + dp

ERP(A2:, B), ||b1||p + dp

ERP(A, B2:), ||a1 b1||p + dp

ERP(A2:, B2:)

The ERP distance is a natural extension of the Levenshtein distance for sequences of real valued vectors. In fact, in the case we compare sequences of one-hot vectors and we set p = , we recover the Levenshtein distance, see Lemma S4.

In the following we define a useful representation of convolutional layers. Definition 4.2 (1D Convolutional layer with zero padding). Let A Rm d be a sequence of ddimensional vectors. Let k be the number of filters and q the kernel size, a convolutional layer C : Rm d R(m+q 1) k with parameters K Rq k d can be represented as:

m+2 (q 1) X

j=1 ˆ Ki,j ˆaj, where ˆ Ki,j = Kj i+1 if 0 j i q 1 00 otherwise , i [m + q 1] ,

and ˆ A = 0(q 1) d A 0(q 1) d R(m+2 (q 1)) d is the zero-padded input. We denote the parameter tensor corresponding to every layer C(i) as K(i).

In Theorem 4.3 we present our Lipschitz constant upper bound. In Corollary 4.4 the Lipschitz constant upper bound is employed to compute the certified radius at a sentence P . The Lipschitz constant upper bound can be further refined considering the local Lipschitz constant of the embedding layer around sentence P , see Remark 4.5. Theorem 4.3 (Lipschitz constant of margins of convolutional models). Let f be defined as in Eq. (6). Let gy,ˆy(S) = f(S)y f(S)ˆy be the margin function for classes y and ˆy. Let p 1. Let P and Q be sequences of one-hot vectors, we have that for any y and ˆy:

|gy,ˆy(P ) gy,ˆy(Q)| ||wˆy wy||r

i=1 M K(i) !

M(E) d Lev (P , Q) ,

where M(K) = Pq i=1 ||Ki||p, M(E) = max{ max i [|Γ|] ||ei||p , max i,j [|Γ|] ||ei ej||p} and 1

Proof. See Appendix B

Corollary 4.4 (Certified radius of convolutional models). Let f be defined as in Eq. (6) and the Lipschitz constant of gy,ˆy be:

Gy,ˆy = ||wˆy wy||r

i=1 M K(i) !

Then, the certified radius of f at the sentence P is given by: k y,ˆy(S) = minˆy =y j gy,ˆy(P )

Remark 4.5 (Local Lipschitz constant of the embedding layer). Let the embeddings of a sentence S be given by SE, we have that for any two sentences P and Q:

ERP(P E, QE) M(E, P ) d Lev(P , Q) ,

where M(E, P ) = max{ max i [|Γ|] ||ei||p , max i |P |,j [d] ||pi E ej||p}, satisfying M(E, P ) M(E).

1In the case p = 1 and p = , we have r = and r = 1 respectively.

Published as a conference paper at ICLR 2025

4.3 TRAINING 1-LIPSCHITZ CLASSIFIERS

Models trained with the standard Cross Entropy loss and Stochastic Gradient Descent (SGD) recipe are not amenable to verification methods, resulting in small certified radiuses. This has motivated the use of specialized training methods in the image domain (Mirman et al., 2018; Gowal et al., 2018; Mueller et al., 2023; Palma et al., 2024). Verification methods in the text domain also require tailored training methods to achieve non-zero certified radiuses (Huang et al., 2019; Jia et al., 2019). Motivated by methods enforcing classifiers to be 1-Lipschitz in the image domain (Xu et al., 2022), we enforce this constraint during training in order to improve certification.

In order to achieve a 1-Lipschitz classifier, we enforce 1-Lipschitzness of every layer by dividing the output of each layer by its Lipschitz constant. This results in our modified classifier being:

m+l (q 1) X

i=1 ˆf (l) i (S)

W M(W ), where ˆf (j)(S) =

( σ(C(j)( ˆ f (j 1)(S))) M(K(j)) j = 1, , l

SE M(E) j = 0 ,

(7) where M(W ) = maxy,ˆy [o] ||wy wˆy||r. Note that the last layer is made 1-Lipschitz with respect to the worst pair of class labels. Incorporating this information and Remark 4.5, we end up with the final Lipschitz constant for the classifier:

Corollary 4.6 (Local Lipschitz constant of modified classifiers). Let ˆf be defined as in Eq. (7). Let ˆgy,ˆy(S) = ˆf(S)y ˆf(S)ˆy be the margin function for classes y and ˆy. Let P and Q be sequences of one-hot vectors, we have that for any y and ˆy:

|ˆgy,ˆy(P ) ˆgy,ˆy(Q)| ||wˆy wy||r

M(W ) M(E, P )

M(E) dlev (P , Q) ,

where M(E) is defined as in Theorem 4.3, M(E, P ) is as in Remark 4.5 and M(W ) = maxy,ˆy [o] ||wy wˆy||r. Note that this Lipschitz constant is local as it depends on P .

Note that the local Lipschitz constant estimate in Corollary 4.6 is guaranteed to be at most 1 as ||wˆy wy||r M(W ) and M(E, P ) M(E). Given this estimate, we can proceed similarly to Corollary 4.4 in order to obtain the certified radius of the modified model. Note that in the forward pass of Eq. (7), we need to compute M(E), M(K(j)) and M(W ), which increases the complexity of a forward pass with respect to Eq. (6). Nevertheless, we observe this can be efficiently done during training as seen in Appendix A.6. Then, the weights of each layer can be divided by its Lipschitz constant, resulting in the same architecture in Eq. (6) with the guarantees of Corollary 4.6.

5 EXPERIMENTS

In this section, we cover our experimental validation. In Section 5.1 we cover the experimental setup and training mechanisms shared among all experiments. In Section 5.2 we compare performance of our approach with existing IBP approaches and the naive brute force verification baseline. Lastly, in Section 5.3 we cover the hyperparameter selection of our method. We define our performance metrics and perform additional experiments in Appendix A.

5.1 EXPERIMENTAL SETUP

We train and verify our models in the sentence classification datasets AG-News (Gulli, 2005; Zhang et al., 2015), SST-2 (Wang et al., 2019), IMDB (Maas et al., 2011) and Fake-News (Lifferth, 2018). We consider all of the characters present in the dataset except for uppercase letters, which we tokenize as lowercase. Each character is tokenized individually and assigned one embedding vector via the matrix E. For all our models and datasets, following Huang et al. (2019), we train models with a single convolutional layer, an embedding size of 150, a hidden size of 100 and a kernel size of 5 for the SST-2 dataset and 10 for the rest of datasets. Following the setup used in Andriushchenko and Flammarion (2020) for adversarial training, we use the SGD optimizer with batch size 128 and a 30-epoch cyclic learning rate scheduler with a maximum value of 100.0, which we select via a grid search in a validation dataset, see Appendix A.5. For every experiment, we report the average results over three random seeds and report the performance over the first 1, 000 samples of the test set. Our

Published as a conference paper at ICLR 2025

standard deviations are reported in Appendix A.3. Due to the extreme time costs of the brute-force and IBP approaches in the Fake-News dataset, we reduce to 50 samples in this dataset. We compute the adversarial accuracy with Charmer (Abad Rocamora et al., 2024). For completeness, we report the performance of Lips Lev over the full 1, 000 samples and k up to 10 in Appendix A.4. All of our experiments are conducted in a single machine with an NVIDIA A100 SXM4 40 GB GPU. Our implementation is available in github.com/LIONS-EPFL/Lips Lev.

Table 2: Verified accuracy under bounded dlev: We report the Clean accuracy (Acc.), Adversarial Accuracy (Adv. Acc.) with Charmer (Abad Rocamora et al., 2024), Verified accuracy (Ver.) and the average runtime in seconds (Time) for the brute-force approach (Brute F), IBP (Huang et al., 2019) and Lips Lev. OOT means the experiment was Out Of Time. means the method does not support dlev > 1. Our method, Lips Lev, is the only method able to provide non-trivial verified accuracies for any k in a single forward pass.

Dataset p k Acc.(%) Charmer Brute F IBP Lips Lev Adv. Acc.(%) Time(s) Ver.(%) Time(s) Ver.(%) Time(s) Ver.(%) Time(s)

1 65.23 47.90 5.70 47.87 16.15 27.77 16.76 32.33 0.0015 2 32.97 5.70 OOT 11.60 0.0015 1 69.63 54.47 5.43 54.43 15.33 18.93 17.56 34.50 0.00140 1 2 37.77 5.43 OOT 12.53 0.00140 1 74.80 62.20 7.32 62.07 29.12 29.10 31.54 38.80 0.00970 2 2 46.47 7.32 OOT 13.93 0.00970

1 63.95 39.68 1.84 39.68 2.27 33.94 2.88 14.68 0.00084 2 19.92 1.84 OOT 0.99 0.00084 1 69.69 45.26 1.91 45.22 2.31 19.00 2.99 18.69 0.0022 1 2 26.11 1.91 OOT 1.83 0.0022 1 69.95 48.81 2.09 48.78 4.23 16.06 5.22 14.57 0.0047 2 2 30.70 2.09 OOT 0.73 0.0047

1 100.00 86.67 66.82 86.67 972.46 85.33 989.84 85.33 0.017 2 76.00 66.82 OOT 68.67 0.017 1 98.00 92.00 67.11 92.00 978.94 91.33 990.32 91.33 0.014 1 2 79.33 67.11 OOT 75.33 0.014 1 98.00 88.67 73.52 88.67 1224.45 87.33 1466.38 87.33 0.0089 2 2 78.00 73.52 OOT 71.33 0.0089

1 74.57 67.43 14.16 67.43 130.49 61.50 138.20 31.37 0.0047 2 59.77 14.16 OOT 5.90 0.0047 1 69.57 61.17 14.44 61.00 134.23 47.30 135.22 28.73 0.0027 1 2 52.20 14.44 OOT 6.80 0.0027 1 60.60 46.87 16.24 46.73 261.99 37.57 308.73 8.67 0.0019 2 2 35.10 16.24 OOT 0.87 0.0019

5.2 COMPARISON WITH IBP AND BRUTE FORCE APPROACHES

In this section, we compare our verification method against a brute-force approach and a modification of the IBP method in (Huang et al., 2019) to handle insertions and deletions of characters.

With the brute-force approach, for every sentence P in the test dataset, we evaluate our model in every sentence in the set {Q : dlev(P , Q) k} and check if there is any missclassification. Since the size of this set grows exponentially with k, we only evaluate the brute-force accuracy for k = 1.

In the case of IBP, we evaluate the classifier up to the pooling layer in every sentence of {Q : dlev(P , Q) k} and then build the overapproximation. In (Huang et al., 2019) it was enough to build this overapproximation for k = 1 and re-scale it to capture larger ks. This is not the case for insertions and deletions, this constrains IBP with Levenshtein distance specifications to work only for k = 1. Overall, the complexity of IBP is the same as the brute-force approach without providing the exact robust accuracy. Because Huang et al. (2019) only considered perturbations of characters nearby in the English keyboard, the maximum perturbation size at k = 1 was very small, e.g., 206 and 722 sentences for SST-2 and AG-News respectively2. In our setup, the maximum perturbation sizes are 33, 742 and 85, 686. This makes it impractical to perform IBP verified training. We train 3 models for each dataset and p {1, 2, } and verify them with the three methods. We report the average time to verify and the clean, adversarial and verified accuracies at k {1, 2}.

2See Table 3 in Huang et al. (2019)

Published as a conference paper at ICLR 2025

Table 3: Regularizing v.s. enforcing Lipschitzness in SST-2: We compare the performance when regularizing the Lipschitz constant (G) during training with λ {0, 0.001, 0.01, 0.1}, against enforcing 1-Lipschitzness through Eq. (7). Regularizing G leads to either models with similar performance to a constant classifier (55.7% for SST-2), or more accurate but non-verifiable models than when using the formulation in Eq. (7).

p = p = 1 p = 2 λ Clean(%) Ver.(%) G Clean(%) Ver.(%) G Clean(%) Ver.(%) G

0 89.0 (0.5) 0.0 (0.0) 2850.2 (80.1) 86.1 (0.4) 0.0 (0.0) 449.6 (3.0) 87.2 (0.2) 0.0 (0.0) 129.1 (2.9) 0.001 80.8 (0.6) 0.0 (0.0) 65.0 (0.9) 84.5 (0.5) 0.0 (0.0) 44.1 (0.7) 86.2 (0.4) 0.0 (0.0) 37.7 (0.3) 0.01 60.1 (1.1) 1.7 (0.1) 1.4 (0.1) 79.7 (0.5) 0.1 (0.0) 6.9 (0.0) 81.6 (0.4) 0.1 (0.0) 8.8 (0.1) 0.1 56.2 (0.0) 55.7 (0.3) 0.0 (0.0) 57.3 (0.0) 53.2 (0.8) 0.1 (0.0) 57.5 (0.9) 34.1 (3.0) 0.1 (0.0)

Eq. (7) 62.8 (0.6) 7.3 (0.1) 1.00 (0.0) 65.6 (0.1) 10.7 (0.2) 1.00 (0.0) 66.6 (0.6) 7.2 (0.1) 1.00 (0.0)

In Table 2, we can observe that the p value has a big influence in the clean accuracy of the models and the verification capability of each method. With p = 2, we observe the highest clean accuracy AGNews and SST-2, with an average of 74.80% and 69.95% respectively. In the case of Fake-News and IMDB, p = provided the best accuracy with 100% and 74.57% respectively. In terms of robust accuracy (Brute F), p = 2 also provides the best performance with 62.07% for AG-News and 48.78% for SST-2, while for Fake-News and IMDB, the best performance was achieved with p = 1 and p = respectively (92% and 74.57%). We observe that IBP obtains the best ratio between clean and verified accuracy when employing p = , providing the best performance in the IMDB and SST-2 datasets at k = 1. Our method, Lips Lev, is able to improve over IBP in AG-News and match IBP in Fake-News at k = 1, being the only method able to verify for k > 1. At distance k = 2, we can observe that the Charmer adversarial accuracy in AG-News, SST-2 and IMDB is significantly larger than the verified accuracy given by Lips Lev. Contrarily, for the Fake-News dataset, Lips Lev is able to have a gap as close as 75.33% v.s. 79.33% with p = 1.

In terms of runtime, our method is from 4 to 7 orders of magnitude faster than brute-force and IBP, which attain similar runtimes. The impossibility of IBP to verify for k > 1 and its larger runtime than brute-force, poses it as an impractical tool for Levenstein distance verification. Our method is the only one able to verify for k > 1, with 13.93% verified accuracy for AG-News, 1.83% for SST-2, 75.33% for Fake-News and 6.80% for IMDB at k = 2.

5.3 REGULARIZING THE LIPSCHITZ CONSTANT

In Section 4.3 we describe how to enforce our convolutional classifier to be 1-Lipschitz. But, is there a better way of improving the final verified accuracy of our models? Because our Lipschitz constant estimate in Theorem 4.3 its differentiable with respect to the parameters of the model, we can regularize this quantity during training in order to achieve a lower Lipschitz constant and hopefully a better verified accuracy. In practice we regularize G = M(W ) M(K(1)) M(E) as defined in Theorem 4.3 and Corollary 4.6.

We train single-layer models with a regularization parameter of λ {0, 0.001, 0.01, 0.1}, where λ = 0 is equivalent to standard training. We initialize the weights of each layer so that their Lipschitz constant is 1. We use a learning rate of 0.01. We measure the final Lipschitz constant of each model and their clean and verified accuracies in a validation set of 1, 000 samples left out from the training set. As a baseline, we report these metrics for the models trained with the formulation in Eq. (7).

In Table 3 we observe that for all the studied norms, when regularizing the Lipschitz constant G, we cannot easily match the performance when using Eq. (7). Regularized models converge to either close-to-constant classifiers (55.7% clean accuracy for SST-2) or present a close-to-zero verified accuracy, This behavior has also been observed practically and theoretically for ℓp spaces (Zhang et al., 2022). The formulation in Eq. (7) allows us to obtain verifiable models without the need to tune hyperparameters.

5.4 THE INFLUENCE OF SENTENCE LENGTH IN VERIFICATION

In this section we study the qualitative characteristics of a sentence leading to better verification properties, specifically, we study the influence of the sentence length in verification. We compute

Published as a conference paper at ICLR 2025

100 200 300 400 500 Sentence length

0 100 200 Sentence length

Average Lentgh

Verified Dataset Method

AG-News Brute F 221.4 248.4 Lips Lev 228.0 258.9

SST-2 Brute F 93.2 109.5 Lips Lev 98.3 126.8

Figure 1: Sentence length distribution for verified and not verified sentences: We report the verified accuracy v.s. sentence length with Lips Lev (Left), and the average length of verified and not verified sentences (Right) at k = 1 in the models trained with p = 2. Shorter sentences are harder to verify in both SST-2 and AG-News with both Lips Lev and the brute-force approach.

the verified accuracy v.s. sentence length at k = 1 for the models in Section 5.2 with Lips Lev and p = 2. For the AG-News we remove the outlier sentences with length larger than 600 characters. The full length distribution is displayed in Appendix A.

In Fig. 1 we can observe that for both verification methods on both datasets, the verified sentences present a larger average length. Additionally, there is a clear increasing tendency in the verified accuracy v.s. sentence length. We believe this is reasonable as single characters perturbations are likely to introduce a smaller semantic change for longer sequences.

6 CONCLUSION

In this work, we propose the first approach able to verify NLP classifiers using the Levenshtein distance constraints. Our approach is based on an upper bound of the Lipschitz constant of convolutional classifiers with respect to the Levenshtein distance. Our method, Lips Lev is able to obtain verified accuracies at any distance k with single forward pass per sample. Moreover, our method is the only existing method that can practically verify for Levenshtein distances larger than k = 1. We expect our work can inspire a new line of works on verifying larger distances and more broadly verifying additional classes of NLP classifiers. We will make the code publicly available upon the publication of this work, our implementation is attached with this submission.

Future directions and challenges: A problem shared with verification methods in the image domain is scalability (Wang et al., 2021). Scaling verification methods to production models is a challenge, that becomes more relevant with the deployment of Large Language Models and their recently discovered vulnerabilities (Zou et al., 2023). Even though our method is the first to practically provide Levenshtein distance certificates in NLP, neither the formulation of Huang et al. (2019) or our formulation covers modern architectures as Transformers (Vaswani et al., 2017). We highlight the main challenges as follows:

i) Tokenizers: Modern Transformer-based classifiers utilize popular tokenizers such as Sentence Piece (Kudo and Richardson, 2018), which aggregate contiguous characters in tokens before feeding them to the model. In order to deal with such non-differentiable piece, methods for computing the Lipschitz constant of tokenizers are needed. ii) Poor performance on character-level tasks: In the case no tokenizer is used, transformers are known to fail in character-level classification tasks like the IMDB classification problem of Long Range Arena (Tay et al., 2021). iii) Non-Lipschitzness of Transformers: Transformers are known to have a non-bounded Lipschitz constant (Kim et al., 2021). In the image domain, verification methods modify the model to be Lipschitz (Qi et al., 2023; Bonaert et al., 2021) or compute local Lipschitz constants (Havens et al., 2024). Nevertheless, it is non-trivial to extend such approaches from ℓp-induced distances to the Levenshtein distance.

Our work sets the mathematical foundations of Lipschitz verification in NLP, opens the door to addressing these challenges and to achieving verifiable architectures beyond convolutional models.

Published as a conference paper at ICLR 2025

ACKNOWLEDGEMENTS

Elias is thankful to Adrian Luis Müller and Ioannis Mavrothalassitis for the amazing discussions in the whiteboard of the Lab. Authors acknowledge the constructive feedback of reviewers and the work of ICLR 25 program and area chairs. We thank Zulip for their project organization tool. ARO - Research was sponsored by the Army Research Office and was accomplished under Grant Number W911NF-24-1-0048. Hasler AI - This work was supported by Hasler Foundation Program: Hasler Responsible AI (project number 21043). SNF project Deep Optimisation - This work was supported by the Swiss National Science Foundation (SNSF) under grant number 200021_205011.

BROADER IMPACT

In this work, we tackle the important problem of verifying the robustness of NLP models against adversarial attacks. By advancing in this area, we can positively impact society by ensuring NLP models deployed in safety critical applications are robust to such perturbations.

Elias Abad Rocamora, Yongtao Wu, Fanghui Liu, Grigorios G. Chrysos, and Volkan Cevher. Revisiting character-level adversarial attacks for language models. In International Conference on Machine Learning (ICML), 2024.

Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang. Generating natural language adversarial examples. In Empirical Methods in Natural Language Processing (EMNLP), 2018.

Maksym Andriushchenko and Nicolas Flammarion. Understanding and improving fast adversarial training. Advances in neural information processing systems (Neur IPS), 2020.

Yonatan Belinkov and Yonatan Bisk. Synthetic and natural noise both break neural machine translation. In International Conference on Learning Representations (ICLR), 2018.

Gregory Bonaert, Dimitar I. Dimitrov, Maximilian Baader, and Martin Vechev. Fast and precise certification of transformers. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021.

Lei Chen and Raymond Ng. On the marriage of lp-norms and edit distance. In Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30, VLDB 04, page 792 803. VLDB Endowment, 2004.

Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. In International conference on machine learning, pages 854 863. PMLR, 2017.

Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning (ICML), 2019.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019.

Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. Hot Flip: White-box adversarial examples for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2018.

Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George Pappas. Efficient and accurate estimation of lipschitz constants for deep neural networks. Advances in neural information processing systems (Neur IPS), 32, 2019.

Published as a conference paper at ICLR 2025

Ji Gao, Jack Lanchantin, Mary Lou Soffa, and Yanjun Qi. Black-box generation of adversarial text sequences to evade deep learning classifiers. In IEEE Security and Privacy Workshops (SPW), 2018.

Henry Gouk, Eibe Frank, Bernhard Pfahringer, and Michael J Cree. Regularisation of neural networks by enforcing lipschitz continuity. Machine Learning, 110:393 416, 2021.

Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. ar Xiv preprint ar Xiv:1810.12715, 2018.

Antonio Gulli. Ag s corpus of news articles, 2005. URL http://groups.di.unipi.it/ ~gulli/AG_corpus_of_news_articles.html.

Aaron J Havens, Alexandre Araujo, Huan Zhang, and Bin Hu. Fine-grained local sensitivity analysis of standard dot-product self-attention. In International Conference on Machine Learning (ICML), 2024.

Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. Advances in neural information processing systems (Neur IPS), 30, 2017.

Bairu Hou, Jinghan Jia, Yihua Zhang, Guanhua Zhang, Yang Zhang, Sijia Liu, and Shiyu Chang. Textgrad: Advancing robustness evaluation in NLP by gradient-driven optimization. In International Conference on Learning Representations (ICLR), 2023.

Po-Sen Huang, Robert Stanforth, Johannes Welbl, Chris Dyer, Dani Yogatama, Sven Gowal, Krishnamurthy Dvijotham, and Pushmeet Kohli. Achieving verified robustness to symbol substitutions via interval bound propagation. In Empirical Methods in Natural Language Processing (EMNLP), 2019.

Yujia Huang, Huan Zhang, Yuanyuan Shi, J Zico Kolter, and Anima Anandkumar. Training certifiably robust neural networks with efficient local lipschitz bounds. Advances in neural information processing systems (Neur IPS), 2021.

Zhuoqun Huang, Neil G Marchant, Keane Lucas, Lujo Bauer, Olga Ohrimenko, and Benjamin I. P. Rubinstein. RS-del: Edit distance robustness certificates for sequence classifiers via randomized deletion. In Advances in neural information processing systems (Neur IPS), 2023.

Robin Jia, Aditi Raghunathan, Kerem Göksel, and Percy Liang. Certified robustness to adversarial word substitutions. In Empirical Methods in Natural Language Processing (EMNLP), 2019.

Hyunjik Kim, George Papamakarios, and Andriy Mnih. The lipschitz constant of self-attention. In International Conference on Machine Learning (ICML), 2021.

Taku Kudo and John Richardson. Sentence Piece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, 2018.

Fabian Latorre, Paul Rolland, and Volkan Cevher. Lipschitz constant estimation of neural networks via sparse polynomial optimization. In International Conference on Learning Representations (ICLR), 2020.

Vladimir I Levenshtein et al. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet physics doklady, volume 10, pages 707 710. Soviet Union, 1966.

Qiyang Li, Saminul Haque, Cem Anil, James Lucas, Roger B Grosse, and Jörn-Henrik Jacobsen. Preventing gradient attenuation in lipschitz constrained convolutional networks. Advances in neural information processing systems (Neur IPS), 32, 2019.

Published as a conference paper at ICLR 2025

William Lifferth. Kaggle fake news, 2018. URL https://kaggle.com/competitions/ fake-news.

Aiwei Liu, Honghai Yu, Xuming Hu, Shu ang Li, Li Lin, Fukun Ma, Yawen Yang, and Lijie Wen. Character-level white-box adversarial attacks against transformers via attachable subwords substitution. In Empirical Methods in Natural Language Processing (EMNLP), 2022.

Changliu Liu, Tomer Arnon, Christopher Lazarus, Christopher Strong, Clark Barrett, Mykel J Kochenderfer, et al. Algorithms for verifying deep neural networks. Foundations and Trends in Optimization, 4, 2021.

Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Association for Computational Linguistics, 2011.

Matthew Mirman, Timon Gehr, and Martin Vechev. Differentiable abstract interpretation for provably robust neural networks. In International Conference on Machine Learning (ICML). PMLR, 2018.

Ramon E. Moore, R. Baker Kearfott, and Michael J. Cloud. Introduction to Interval Analysis. Society for Industrial and Applied Mathematics (SIAM), 2009.

John Morris, Eli Lifland, Jack Lanchantin, Yangfeng Ji, and Yanjun Qi. Reevaluating adversarial examples in natural language. In Findings of the Association for Computational Linguistics: EMNLP 2020, 2020.

Mark Niklas Mueller, Franziska Eckert, Marc Fischer, and Martin Vechev. Certified training: Small boxes are all you need. In International Conference on Learning Representations (ICLR), 2023.

Alessandro De Palma, Rudy R Bunel, Krishnamurthy Dj Dvijotham, M. Pawan Kumar, Robert Stanforth, and Alessio Lomuscio. Expressive losses for verified robustness via convex combinations. In International Conference on Learning Representations (ICLR), 2024.

Xianbiao Qi, Jianan Wang, Yihao Chen, Yukai Shi, and Lei Zhang. Lipsformer: Introducing lipschitz continuity to vision transformers. In International Conference on Learning Representations (ICLR), 2023.

Haifeng Qian and Mark N. Wegman. L2-nonexpansive neural networks. In International Conference on Learning Representations (ICLR), 2019.

Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, and Cho-Jui Hsieh. Robustness verification for transformers. In International Conference on Learning Representations (ICLR), 2020.

Zhouxing Shi, Yihan Wang, Huan Zhang, J Zico Kolter, and Cho-Jui Hsieh. Efficiently computing local lipschitz constants of neural networks via bound propagation. Advances in neural information processing systems (Neur IPS), 35:2350 2364, 2022.

Siqi Sun and Wenjie Ruan. Text Verifier: Robustness verification for textual classifiers with certifiable guarantees. In Findings of the Association for Computational Linguistics (ACL), 2023.

Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. Advances in neural information processing systems (Neur IPS), 27, 2014.

Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, and Donald Metzler. Long range arena : A benchmark for efficient transformers. In International Conference on Learning Representations (ICLR), 2021.

Yusuke Tsuzuku, Issei Sato, and Masashi Sugiyama. Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks. Advances in neural information processing systems (Neur IPS), 31, 2018.

Published as a conference paper at ICLR 2025

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Ł ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems (Neur IPS), 2017.

Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In International Conference on Learning Representations (ICLR), 2019.

Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. Formal security analysis of neural networks using symbolic intervals. In USENIX Conference on Security Symposium, 2018.

Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. Beta CROWN: Efficient bound propagation with per-neuron split constraints for complete and incomplete neural network verification. Advances in neural information processing systems (Neur IPS), 34, 2021.

Michael S Waterman, Temple F Smith, and William A Beyer. Some biological sequence metrics. Advances in Mathematics, 20(3):367 387, 1976.

Xiaojun Xu, Linyi Li, and Bo Li. LOT: Layer-wise orthogonal training on improving l2 certified robustness. In Advances in neural information processing systems (Neur IPS), 2022.

Mao Ye, Chengyue Gong, and Qiang Liu. SAFER: A structure-free approach for certified robustness to adversarial word substitutions. In Annual Meeting of the Association for Computational Linguistics, 2020.

Jiehang Zeng, Jianhan Xu, Xiaoqing Zheng, and Xuanjing Huang. Certified robustness to text adversarial attacks by randomized [MASK]. Computational Linguistics, 2023.

Bohang Zhang, Du Jiang, Di He, and Liwei Wang. Rethinking lipschitz neural networks and certified robustness: A boolean function perspective. In Advances in neural information processing systems (Neur IPS), 2022.

Xiang Zhang, Junbo Zhao, and Yann Le Cun. Character-level convolutional networks for text classification. Advances in neural information processing systems (Neur IPS), 28, 2015.

Xinyu Zhang, Hanbin Hong, Yuan Hong, Peng Huang, Binghui Wang, Zhongjie Ba, and Kui Ren. Text-crs: A generalized certified robustness framework against textual adversarial attacks. In IEEE Symposium on Security and Privacy (SP), 2024.

Yuhao Zhang, Aws Albarghouthi, and Loris D Antoni. Certified robustness to programmable transformations in LSTMs. In Empirical Methods in Natural Language Processing (EMNLP), 2021.

Haiteng Zhao, Chang Ma, Xinshuai Dong, Anh Tuan Luu, Zhi-Hong Deng, and Hanwang Zhang. Certified robustness against natural language attacks by causal intervention. In International Conference on Machine Learning (ICML), 2022.

Andy Zou, Zifan Wang, J Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models. ar Xiv preprint ar Xiv:2307.15043, 2023.

Published as a conference paper at ICLR 2025

A ADDITIONAL EXPERIMENTAL VALIDATION

In Appendix A.1 we present the definition of the performance metrics employed in this work. In Appendix A.5 we present our grid search for selecting the best learning rate for each dataset and p value in the ERP distance Definition 4.1. In Appendix A.6 we evaluate the effect of training with a different number of layers and report their latencies.

A.1 DEFINITION OF PERFORMANCE METRICS

In this section we define the key metrics used to evaluate our models and verification methods. Let 1 () be the indicator function, given a classification model f : S(Γ) Ro assigning scores to each of the o classes and a dataset D = {S(i), y(i)}n i=1 with S(i) S(Γ) and y(i) [o], the clean, adversarial and verified accuracy are defined as:

Definition S1 (Clean accuracy). The clean accuracy is a percentage in [0, 100] that is computed as:

Acc.(f, D) = 100

y(i) = arg max j [o] f(S(i))j

Definition S2 (Adversarial accuracy). Given an adversary A : S(Γ) S(Γ) that perturbs a sentence3. The adversarial accuracy is a percentage in [0, 100] that is computed as:

Adv. Acc.(f, D, A) = 100

y(i) = arg max j [o] f(A(S(i)))j

Definition S3 (Verified accuracy). Given a verification method v returning the certified radius (see Section 4.1) for a given model f and sample (S, y) as v(f, S, y) {0} N. The verified accuracy at distance k is a percentage in [0, 100] that is computed as:

Ver. Acc.(f, D, v, k) = 100

i=1 1 v(f, S(i), y(i)) k .

For simplicity, the arguments of each accuracy function are omitted in the text as they can be inferred from the context.

A.2 SENTENCE LENGTH DISTRIBUTION

In this section, we provide additional details about the sequence length distribution of verified and not verified sentences. Specifically, in Fig. S2 we provide the full distribution of lengths from the experiment in Fig. 1.

A.3 STANDARD DEVIATIONS

In Table S4 we report the results from Table 2 with standard deviations for completeness.

A.4 LARGER DISTANCES FOR FAKE-NEWS

In Section 5.2 we reported the performance in the Fake-News dataset over the first 50 samples of the test set and only up to k = 2. Nevertheless, the speed of Lips Lev allows for more samples and larger distances. In Table S5, we evaluate the performance of Lips Lev over the first 1, 000 test samples and up to k = 10.

3The adversary will usually adhere to some constraints such as d Lev(S, A(S)) k.

Published as a conference paper at ICLR 2025

200 400 600 800 Sentence length

Not verified Verified

0 100 200 Sentence length

Not verified Verified

Figure S2: Sentence length distribution for verified and not verified sentences: We report the histogram of the lengths for verified and not verified sentences at k = 1 with Lips Lev in the models trained with p = 2. Shorter sentences are harder to verify in both SST-2 and AG-News with both Lips Lev and the brute force approach.

Table S4: Verified accuracy under bounded dlev: We report the Clean accuracy (Acc.), Adversarial Accuracy (Adv. Acc.) with Charmer (Abad Rocamora et al., 2024), Verified accuracy (Ver.) and the average runtime in seconds (Time) for the brute-force approach (Brute F), IBP (Huang et al., 2019) and Lips Lev. OOT means the experiment was Out Of Time. means the method does not support dlev > 1. Our method, Lips Lev, is the only method able to provide non-trivial verified accuracies for any k in a single forward pass.

Dataset p k Acc.(%) Charmer Brute F IBP Lips Lev Adv. Acc.(%) Time(s) Ver.(%) Time(s) Ver.(%) Time(s) Ver.(%) Time(s)

1 65.23 (0.12) 47.90 (0.08) 5.70 (0.03) 47.87 (0.09) 16.15 (0.23) 27.77 (0.12) 16.76 (0.26) 32.33 (0.31) 0.0015 (0.00033) 2 32.97 (0.38) 5.70 (0.03) OOT 11.60 (0.45) 0.0015 (0.00033) 1 69.63 (0.19) 54.47 (0.49) 5.43 (0.33) 54.43 (0.53) 15.33 (0.34) 18.93 (0.50) 17.56 (1.62) 34.50 (0.36) 0.00140 (0.00007) 1 2 37.77 (0.46) 5.43 (0.33) OOT 12.53 (0.29) 0.00140 (0.00007) 1 74.80 (0.45) 62.20 (0.75) 7.32 (0.54) 62.07 (0.82) 29.12 (1.88) 29.10 (0.45) 31.54 (0.55) 38.80 (0.29) 0.00970 (0.00044) 2 2 46.47 (0.29) 7.32 (0.54) OOT 13.93 (0.21) 0.00970 (0.00044)

1 63.95 (0.30) 39.68 (0.99) 1.84 (0.05) 39.68 (0.99) 2.27 (0.079) 33.94 (1.11) 2.88 (0.092) 14.68 (0.25) 0.00084 (0.00024) 2 19.92 (1.16) 1.84 (0.05) OOT 0.99 (0.05) 0.00084 (0.00024) 1 69.69 (0.14) 45.26 (0.20) 1.91 (0.03) 45.22 (0.14) 2.31 (0.16) 19.00 (1.08) 2.99 (0.14) 18.69 (0.80) 0.0022 (0.0017) 1 2 26.11 (0.55) 1.91 (0.03) OOT 1.83 (0.00) 0.0022 (0.0017) 1 69.95 (0.32) 48.81 (0.42) 2.09 (0.07) 48.78 (0.43) 4.23 (0.11) 16.06 (1.17) 5.22 (0.49) 14.57 (0.34) 0.0047 (0.0023) 2 2 30.70 (0.81) 2.09 (0.07) OOT 0.73 (0.27) 0.0047 (0.0023)

1 100.00 (0.00) 86.67 (0.94) 66.82 (1.98) 86.67 (0.94) 972.46 (8.15) 85.33 (0.94) 989.84 (8.40) 85.33 (0.94) 0.017 (0.0067) 2 76.00 (1.63) 66.82 (1.98) OOT 68.67 (0.94) 0.017 (0.0067) 1 98.00 (1.63) 92.00 (0.00) 67.11 (1.87) 92.00 (0.00) 978.94 (15.91) 91.33 (0.94) 990.32 (14.85) 91.33 (0.94) 0.014 (0.0056) 1 2 79.33 (2.49) 67.11 (1.87) OOT 75.33 (3.40) 0.014 (0.0056) 1 98.00 (1.63) 88.67 (4.99) 73.52 (2.77) 88.67 (4.99) 1224.45 (8.66) 87.33 (4.11) 1466.38 (294.21) 87.33 (6.80) 0.0089 (0.010) 2 2 78.00 (4.32) 73.52 (2.77) OOT 71.33 (5.25) 0.0089 (0.010)

1 74.57 (5.22) 67.43 (4.70) 14.16 (0.40) 67.43 (4.70) 130.49 (3.38) 61.50 (4.73) 138.20 (6.12) 31.37 (4.54) 0.0047 (0.0015) 2 59.77 (4.81) 14.16 (0.40) OOT 5.90 (1.36) 0.0047 (0.0015) 1 69.57 (7.18) 61.17 (8.92) 14.44 (0.27) 61.00 (8.82) 134.23 (1.64) 47.30 (10.51) 135.22 (0.31) 28.73 (6.94) 0.0027 (0.0025) 1 2 52.20 (9.33) 14.44 (0.27) OOT 6.80 (2.16) 0.0027 (0.0025) 1 60.60 (4.21) 46.87 (0.62) 16.24 (0.49) 46.73 (0.78) 261.99 (62.11) 37.57 (6.35) 308.73 (1.70) 8.67 (5.08) 0.0019 (0.0011) 2 2 35.10 (3.36) 16.24 (0.49) OOT 0.87 (0.66) 0.0019 (0.0011)

Table S5: Lips Lev verified accuracy in Fake News over the first 1, 000 validation samples and up to k = 10. We observe our method is able to verify non-trivial accuracy with even up to 10 character changes.

p Clean Acc. Verified Acc. 1 2 3 4 5 6 7 8 9 10

95.63 (0.19) 87.50 (0.42) 72.43 (0.31) 49.37 (1.54) 31.03 (1.33) 20.03 (1.21) 14.07 (1.33) 9.77 (1.11) 6.67 (0.83) 5.00 (0.85) 3.60 (0.57) 1 95.90 (0.16) 88.93 (1.72) 75.97 (2.91) 56.33 (4.24) 38.50 (4.82) 24.13 (4.14) 15.83 (3.79) 11.30 (3.40) 7.47 (2.40) 5.50 (2.01) 4.27 (1.54) 2 95.00 (0.92) 87.50 (2.70) 70.77 (7.90) 48.37 (11.71) 30.97 (10.06) 20.57 (6.44) 13.07 (4.89) 9.20 (4.27) 6.10 (2.73) 4.57 (2.02) 3.50 (1.71)

A.5 HYPERPARAMETER SELECTION

In order to select the best learning rate in each dataset and p norm for the ERP distance, we compute the clean and verified accuracy at k = 1 in a validation set of 1, 000 samples extracted from each training set. We test the learning rate values {0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000}. We train convolutional models with 1 convolutional layer and the standard embedding, hidden and kernel sizes in Section 5.1. We notice these large learning rates are needed due to the 1-Lipschitz formulation in Eq. (7).

Published as a conference paper at ICLR 2025

0.1 0.5 1.0 5.0 10.0 50.0 100.0 500.0 1000.0 Learning rate

0.1 0.5 1.0 5.0 10.0 50.0 100.0 500.0 1000.0 Learning rate

Figure S3: Learning rate selection for the SST-2 and AG-News datasets: We report the clean and verified accuracy in a validation set of 1,000 sentences extracted from the training split of each dataset and set aside during training. We set the learning rate equal to 100 in the rest of our experiments as it provides a good trade-off between clean and verified accuracy for all norms and datasets.

1 2 3 4 Layers

1 2 3 4 Layers

Verified (k = 1)

Figure S4: Training deeper models in AG-News: We report the clean and verified accuracies with Lips Lev at k = 1 for p {1, 2, }. Clean and verified accuracies decrease with the number of layers. With p = 2 the performance is less degraded with the number of layers.

Based on the results from Fig. S3, we select 100 as our learning rate for the rest of experiments in this work.

A.6 TRAINING DEEPER MODELS

In this section, we study the performance of models with more than one convolutional layer. We train with 1, 2, 3 and 4 convolutional layers with a hidden size of 100 and a kernel size of 5 and 10 for SST-2 and AG-News respectively. We train the models with the 1-Lipschitz formulation in Eq. (7) with p {1, 2, }.

In Figs. S4 and S5 we can observe that increasing the number of layers degrades the clean and verified accuracy for every value of p. Nevertheless, for p = 2, the effect is diminished. Jointly with the improved performance when using p = 2 in Section 5.2, we advocate for its use in the ERP distance. We believe this performance degradation is related to the gradient attenuation phenomenon (Li et al., 2019). It remains an open problem to avoid gradient attenuation in the case where the Lipschitz constant of the ERP distance is enforced to be 1.

In Table S6 we can observe that our models have low latencies. Noticeably, with p = 2 we observe a larger latency than with p {1, }. This is due to the need to compute the espectral norm at each

Published as a conference paper at ICLR 2025

1 2 3 4 Layers

1 2 3 4 Layers

Verified (k = 1)

Figure S5: Training deeper models in SST-2: We report the clean and verified accuracies with Lips Lev at k = 1 for p {1, 2, }. Clean and verified accuracies decrease with the number of layers. With p = 2 the performance is less degraded with the number of layers.

iteration. Nevertheless, this cost is still low and only incurred during training, as by rescaling the weights, we can any model in Eq. (7) formulation as a model in Eq. (6).

Table S6: Latency in seconds for different p and number of layers.

Layers p 1 2 3 4

1 0.0017 (0.0179) 0.0015 (0.0007) 0.0015 (0.0003) 0.0019 (0.0002) 2 0.0225 (0.0045) 0.0393 (0.0026) 0.0567 (0.0028) 0.0740 (0.0036) 0.0007 (0.0000) 0.0011 (0.0000) 0.0015 (0.0000) 0.0019 (0.0000)

A.7 COMPARISON AGAINST RANDOMIZED SMOOTHING

In this section, we analyze the differences in methodology and results, between our approach and the Randomized Smoothing approach in Huang et al. (2023): RS-Del.

As an advantage, RS-Del can be employed with any base classifier f : S(Γ) Ro assigning scores to each of the o classes, without restrictions on the architecture. Nevertheless, neither the classifier verified by RS-Del, nor the certified radius itself are deterministic. In Randomized Smoothing, given a classifier f and given a set of perturbations P(S), a smoothed classifier f(S) = E S Unif.(P(S))

h f( S) i is constructed.

In practice, multiple perturbations (npred) are sampled from P(S) and the prediction is estimated through Monte-Carlo as the average f(S) Pnpred i=1 f( Si), with Si sampled i.i.d uniformly from P(S). Similarly, an α-confidence lower bound of the margin is estimated with nbnd samples, which is then used to obtain an α-confidence estimate of the certified radius. Therefore, the certificates provided with Randomized Smoothing are probabilistic and regarding the smoothed classifier f, not the original f.

Contrarily, Lips Lev imposes the strong constraint of knowing an upper bound of the Lipschitz constant of the margins of f. However, this enables the direct, non-probabilistic, estimate of the certified radius in a single forward pass through f.

Having these differences in mind, we compare the performance of Lips Lev and RD-Del when verifying the models in Table 2. To do so, we run RS-Del with the recommended npred = 1, 000, nbnd = 4, 000, pdel = 0.9 and α = 0.05.

In Table S7 we can observe that RS-Del significantly degrades the clean accuracy of the classifier, this is due to the randomization introduced in the prediction process. However, RS-Del is able to improve the verified accuracy in SST-2 for all p and in IMDB for p = 2. In terms of runtime, we observe that, while being significantly slower than Lips Lev, RS-Del is able to provide certificates in a time range from 1 to 2 seconds, which is considerably faster than Brute-F and IBP.

Published as a conference paper at ICLR 2025

Table S7: Comparison against Randomized Smoothing: We report the Clean accuracy (Acc.), Verified accuracy (Ver.) and the average runtime in seconds (Time) for Lips Lev and RS-Del (Huang et al., 2023). RS-Del reduces drops significantly the clean accuracy of the classifier. RS-Del improves the verified accuracy of Lips Lev in SST-2 for all p and IMDB for p = 2.

Dataset p k Lips Lev RS-Del (Huang et al., 2023) Acc.(%) Ver.(%) Time(s) Acc.(%) Ver.(%) Time(s)

1 65.23 (0.12) 32.33 (0.31) 0.0015 (0.00033) 51.77 (0.17) 2.57 (0.41) 1.08 (0.06) 2 11.60 (0.45) 0.0015 (0.00033) 0.27 (0.05) 1.08 (0.06) 1 69.63 (0.19) 34.50 (0.36) 0.00140 (0.00007) 50.67 (0.90) 3.73 (0.19) 1.05 (0.05) 1 2 12.53 (0.29) 0.00140 (0.00007) 0.80 (0.22) 1.05 (0.05) 1 74.80 (0.45) 38.80 (0.29) 0.00970 (0.00044) 48.20 (0.37) 4.60 (0.41) 1.68 (0.07) 2 2 13.93 (0.21) 0.00970 (0.00044) 0.90 (0.16) 1.68 (0.07)

1 63.95 (0.30) 14.68 (0.25) 0.00084 (0.00024) 57.11 (0.49) 21.98 (1.37) 0.60 (0.04) 2 0.99 (0.05) 0.00084 (0.00024) 3.78 (1.12) 0.60 (0.04) 1 69.69 (0.14) 18.69 (0.80) 0.0022 (0.0017) 56.42 (1.08) 21.10 (1.95) 0.69 (0.13) 1 2 1.83 (0.00) 0.0022 (0.0017) 3.82 (0.76) 0.69 (0.13) 1 69.95 (0.32) 14.57 (0.34) 0.0047 (0.0023) 59.17 (0.99) 22.40 (0.98) 0.94 (0.06) 2 2 0.73 (0.27) 0.0047 (0.0023) 4.86 (0.42) 0.94 (0.06)

1 100.00 (0.00) 85.33 (0.94) 0.017 (0.0067) 80.67 (0.94) 69.33 (2.49) 1.87 (0.10) 2 68.67 (0.94) 0.017 (0.0067) 56.67 (0.94) 1.87 (0.10) 1 98.00 (1.63) 91.33 (0.94) 0.014 (0.0056) 77.33 (1.89) 66.00 (2.83) 1.87 (0.06) 1 2 75.33 (3.40) 0.014 (0.0056) 54.00 (1.63) 1.87 (0.06) 1 98.00 (1.63) 87.33 (6.80) 0.0089 (0.010) 75.33 (8.22) 62.67 (8.99) 2.24 (0.11) 2 2 71.33 (5.25) 0.0089 (0.010) 57.33 (6.60) 2.24 (0.11)

1 74.57 (5.22) 31.37 (4.54) 0.0047 (0.0015) 6.90 (0.00) 0.90 (0.00) 1.64 (0.00) 2 5.90 (1.36) 0.0047 (0.0015) 0.00 (0.00) 1.64 (0.00) 1 69.57 (7.18) 28.73 (6.94) 0.0027 (0.0025) 19.20 (0.00) 3.50 (0.00) 1.74 (0.00) 1 2 6.80 (2.16) 0.0027 (0.0025) 0.40 (0.00) 1.74 (0.00) 1 60.60 (4.21) 8.67 (5.08) 0.0019 (0.0011) 48.36 (36.43) 33.67 (27.15) 2.14 (0.20) 2 2 0.87 (0.66) 0.0019 (0.0011) 18.43 (15.85) 2.14 (0.20)

In this section we introduce the mathematical tools needed to derive our Lipschitz constant upper bounds for each layer in Eq. (6). The section concludes with the proof of our main result in Theorem 4.3. In Appendix B.1 we present some remarks to be considered regarding global and local Lipschitz constants. Definition S1 (Zero-paddings). Let X Xd a sequence of m non-zero vectors. Let l m, a zero padding function Z : Xd Rl d is some function defined by the tuple:

(ik)l k=1 : m ik > ij 1 < j < k if ik = 0 |{k [l] : ik = 0}| = l m if ik = 0

zk(X) = xik if ik = 0 0 if ik = 0 Intuitively, a valid zero-padding function inserts l m zeros in between any vector of the sequence, the beginning or the end. We denote as Zm,l the set of zero paddings from sequences of length m to sequences of length l. Remark S2. Given a matrix A Rm m and a zero padding Z Zm,l, we denote the column and row-wise padding as Z(A) = Z(Z(A ) ) Rl l. Proposition S3 (Alternative definition of dp

ERP). Let dp

ERP be as in Definition 4.1. Let A Rm d

and B Rn d be two sequences. Let Zm,m+n and Zn,m+n be the zero-padding functions from length m and n respectively to length m + n. The ERP distance can be expressed as:

ERP(A, B) = min Za Zm,m+n,Zb Zn,m+n

za k(A) zb k(B) p

Published as a conference paper at ICLR 2025

Lemma S4 (Properties of the ERP distance). Some important properties of the ERP distance are summarized here:

(a) Generalization of edit distance: In the case of having sequences of one-hot vectors A {0, 1}m d : ||ai||1 = 1, and using p = , the ERP distance is equal to the edit distance (Levenshtein et al., 1966).

(b) Invariance to the concatenation of zeros:

ERP(A 0, B) = dp

ERP(0 A, B) = dp

ERP(A, B) A Rm d, B Rn d

(c) Distance to the empty set:

i=1 ||ai||p A Rm d

(d) Symmetry: dp

ERP(A, B) = dp

ERP(B, A) A Rm d, B Rn d

(e) Triangular inequality: For any A Rm d, B Rn d, B Rl d, we have: dp

ERP(A, B) dp

ERP(A, C) + dp

(f) Subdistance: The ERP distance is not a distance because of its invariance to the concatenation of zeros:

ERP(A, A 0) = dp

ERP(A, A) = 0 A Rm d

proof of Lemma S4. Properties (a), (b), (c), (d) and (f) are straightforward from the deffinition, we will prove the triangular inequality (e). This proof follows similarly to the one of Waterman et al. (1976) for the standard Levenshtein distance. Let L = m + n + l, starting from the definition in Proposition S3:

ERP(A, B) + dp

ERP(B, C) = min Za Zm,L,Zb Zn,L Zc Zn,L,Zd Zl,L

za k(A) zb k(B) p

zc j(B) zd j (C) p .

Let Ze, Zf ZL,2L be two zero paddings so that Ze(Zb(B)) = Zf(Zc(B)):

ERP(A, B) + dp

ERP(B, C) = min Za Zm,L,Zb Zn,L Zc Zn,L,Zd Zl,L

ze k(Za(A)) ze k(Zb(B)) p

+ zf k(Zc(B)) zf k(Zd(C)) p

[Triangular ineq. for || ||p] min Za Zm,L,Zb Zn,L Zc Zn,L,Zd Zl,L

ze k(Za(A)) ze k(Zb(B))

+ zf k(Zc(B)) zf k(Zd(C)) p

[Ze(Zb(B)) = Zf(Zc(B))] = min Za Zm,L,Zd Zl,L

ze k(Za(A)) zf k(Zd(C)) p

ERP(A, C) ,

where the last equality follows from ze k(Za) and zf k(Zd) being valid zero paddings.

Published as a conference paper at ICLR 2025

Lemma S5 (Difference of sums). Let A, B Xd, we have that:

proof of Lemma S5.

min Za Zm,m+n,Zb Zn,m+n

k=1 za k(A) zb k(B)

min Za Zm,m+n,Zb Zn,m+n

za k(A) zb k(B) p

Lemma S6 (Difference of means). Let A, B Xd, we have that:

ERP(A, B) .

In the case of A and B being sequences of one-hot vectors, we have that:

m dlev (A, B) if m = n 2 m dlev (A, B) if m = n

proof of Lemma S6. Starting with the first result:

[Lemma S5] |m n|

ERP (A, B) .

Note that since A and B are interchangeable, we immediately have:

ERP (A, B) . (8)

In the case A and B are sequences of one-hot vectors, if m = n, we can directly get the 1/m factor out of the norm and apply Lemma S5 to get the first case. For the case m = n, we can manipulate

Published as a conference paper at ICLR 2025

Eq. (8) to get the desired result:

m dlev (A, B)

[|m n| dlev (A, B) + Triang. ineq.]

j=1 ||bj|| + 1

dlev (A, B)

[||bj|| = 1 j [n]] = 2

m dlev (A, B) .

Lemma S7 (Linear transformations). Let A, B Xd be two sequences and V Rd k. We have that:

ERP(AV , BV ) dp

ERP(A, B) ||V ||p

In the case of sequences of one-hot vectors, we have that:

ERP(AV , BV ) d Lev(A, B) M(V ) ,

M(V ) = max{max i [d] ||vi||p , max i,j [d] ||vi vj||p}

Proof. Follows immediately from Definition 4.1 and the fact that ||AB|| ||A|| ||B|| for any matrices A and B. The second result for one-hot vectors follows immediately from the fact that the biggest change in the embedding sequence that can be produced from a single-character change, is either given by inserting the character with the largest norm embedding (left side of the max), or replacing a character with the character that has the furthest away embedding in the ℓp norm (left side of the max).

Lemma S8 (Elementwise Lipschitz functions). Let dp

ERP be as in Definition 4.1. Let A Rm d

and B Rn d be two sequences. Let f : Rd Rk be a Lipschitz function so that:

||f(a) f(b)||p Lf ||a b||p a, b Rd .

Let F (A) Rm k and F (B) Rn k be the application of f to every vector in both sequences, we immediately have that:

ERP(F (A), F (B)) Lf dp

Lemma S9 (Convolution). Let dp

ERP be as in Definition 4.1. Let P {0, 1}m d and Q {0, 1}n d be two sequences of m and n one hot-vectors respectively. Let the function working with arbitrary sequence length l be F : {0, 1}l d Rl r. Let the convolutional filter C : Rl r R(l+q 1) k with kernel K Rq k r, where q is the kernel size and k is the number of filters. We have that:

ERP (C(F (P )), C(F (Q))) M(K) dp

ERP (F (P ), F (Q)) .

i=1 ||Ki||p .

Published as a conference paper at ICLR 2025

Proof of Lemma S9. Let L = m + n + 2q 2. Starting from the definition of the ERP distance in Lemma S4 and the definition of the convolutional layer in Definition 4.2: dp

ERP(C(F (P )), C(F (Q)))

= min Za Zm+q 1,L, Zb Zn+q 1,L

za k(C(F (P ))) zb k(C(F (Q))) p

= min Za Zm+q 1,L, Zb Zn+q 1,L

j=1 ˆ Ki,j ˆfj(P )

l=1 ˆ Ki,l ˆfl(Q)

= min Za Zm+q 1,L, Zb Zn+q 1,L

j=1 Kjfi+j 1(P )

j=1 Kjfi+j 1(Q)

= min Za Zm+q 1,L, Zb Zn+q 1,L

j=1 Kj za k [fi+j 1(P )]m+q 1 i=1 zb k [fi+j 1(Q)]n+q 1 i=1

min Za Zm+q 1,L, Zb Zn+q 1,L

j=1 ||Kj||p za k [fi+j 1(P )]m+q 1 i=1 zb k [fi+j 1(Q)]n+q 1 i=1 p

= min Za Zm+q 1,L, Zb Zn+q 1,L

j=1 ||Kj||p

za k [fi+j 1(P )]m+q 1 i=1 zb k [fi+j 1(Q)]n+q 1 i=1 p

j=1 ||Kj||p dp

ERP (F (P ), F (Q)) ,

where the last equality follows from the fact that [fi+j 1(P )]m+q 1 i=1 and [fi+j 1(Q)]n+q 1 i=1 are just windows of F (P ) and F (Q) respectively including the complete sequences F (P ) and

F (Q), resulting in dp

ERP [fi+j 1(P )]m+q 1 i=1 , [fi+j 1(Q)]n+q 1 i=1 = dp

ERP (F (P ), F (Q)) j = 1, , q.

Proof of Theorem 4.3. We will bound the absolute value of the difference of outputs for two sentences P , Q S(Γ) of lengths m and n respectively. For any y and ˆy:

|gy,ˆy(P ) gy,ˆy(Q)| :=

m+l (q 1) X

i=1 σ c(l) i (P E)

n+l (q 1) X

j=1 σ c(l) j (QE)

[Hölder s inequality] ||wˆy wy||r

m+l (q 1) X

i=1 σ c(l) i (P E)

n+l (q 1) X

j=1 σ c(l) j (QE)

[Lemma S5] ||wˆy wy||r dp

ERP σ C(l) (P E) , σ C(l) (QE)

[Lemma S8 and Lemma S9 recursively] ||wˆy wy||r

k=1 M(K(k))

ERP (P E, QE)

[Lemma S7] ||wˆy wy||r

k=1 M(K(k))

M(E) d Lev (P , Q) .

B.1 REMARKS REGARDING GLOBAL AND LOCAL LIPSCHITZ CONSTANTS

In this section we introduce some interesting remarks regarding how global and local Lipschitz constants are employed in this work. We define global and local Lipschitz constants as:

Published as a conference paper at ICLR 2025

Definition S10 (Global Lipschitz constant). Let g : S(Γ) R and d Lev be defined as in Section 3. We say G is a global Lipschitz constant if:

|g(P ) g(Q)| G d Lev(P , Q) P , Q S(Γ) .

The Lipschitz constant G in Definition S10 is valid for any two sentences P and Q, one example is our Lipschitz constant in Theorem 4.3. When additional conditions are posed on the set of sentences where G is valid, we say a Lipschitz constant is local. A specific case case of locality is when the Lipschitz constant depends on one of the arguments of the distance as G(P ), this is the kind of local Lipschitz constants we observe in this work:

Definition S11 (Local Lipschitz constant). Let g : S(Γ) R and d Lev be defined as in Section 3. We say G : S(Γ) R+ is a global Lipschitz constant if:

|g(P ) g(Q)| G(P ) d Lev(P , Q) P , Q S(Γ) .

Some examples of such local Lipschitz constants are Remark 4.5 and Corollary 4.6. Some properties to consider regarding global and local Lipschitz constants are: Remark S12 (Global Lipschitz constants upper bound local Lipschitz constants). Let G be a global Lipschitz constant as in Definition S10 and G : S(Γ) R+ a local Lipschitz constant as in Definition S11, we have that: G(P ) G P S(Γ) .

Remark S13 (Local Lipschitz constants might not hold everywhere). Let G : S(Γ) R+ be a local Lipschitz constant as in Definition S11, there might exist some P , Q, R S(Γ) such that:

|g(Q) g(R)| > G(P ) d Lev(Q, R) .

Remarks S12 and S13 highlight the two key aspects of local Lipschitz constants. While the bound is tighter than for global Lipschitz constants (Remark S12), these bounds can only be employed to give the certified radius around the sentence P where we compute the local Lipschitz constant G(P ), otherwise, the Lipschitzness property is lost (Remark S13).