# relational_verification_leaps_forward_with_rabbit__81916367.pdf Relational Verification Leaps Forward with RABBit Tarun Suresh1 Debangshu Banerjee1 Gagandeep Singh1,2 1University of Illinois Urbana-Champaign, 2VMware Research {tsuresh3, db21, ggnds}@illinois.edu We propose RABBit, a Branch-and-Bound-based verifier for verifying relational properties defined over Deep Neural Networks, such as robustness against universal adversarial perturbations (UAP). Existing SOTA complete L -robustness verifiers can not reason about dependencies between multiple executions and, as a result, are imprecise for relational verification. In contrast, existing SOTA relational verifiers only apply a single bounding step and do not utilize any branching strategies to refine the obtained bounds, thus producing imprecise results. We develop the first scalable Branch-and-Bound-based relational verifier, RABBit, which efficiently combines branching over multiple executions with cross-executional bound refinement to utilize relational constraints, gaining substantial precision over SOTA baselines on a wide range of datasets and networks. Code is at this URL. 1 Introduction Deep neural networks (DNNs) are now widely used in safety-critical fields like autonomous driving and medical diagnosis [Amato et al., 2013], where their decisions can have serious consequences. However, understanding and ensuring their reliability is difficult due to their complex and opaque nature. Despite efforts to find and address vulnerabilities, such as adversarial attacks [Goodfellow et al., 2014, Madry et al., 2018, Moosavi-Dezfooli et al., 2017] and adversarial training techniques [Madry et al., 2018], ensuring safety remains a challenge. As a result, extensive research is focused on formally verifying the safety of DNNs. However, most of the existing L robustness verification techniques can not handle relational properties common in practical situations. While significant efforts have been invested in verifying the absence of input-specific adversarial examples within the local neighborhood of test inputs, recent studies [Li et al., 2019a] emphasize that input-specific attacks are impractical regardless. Conversely, practical attack scenarios [Liu et al., 2023, Li et al., 2019b,a] involve the creation of universal adversarial perturbations (UAPs) [Moosavi-Dezfooli et al., 2017], which are crafted to impact a substantial portion of inputs from the training distribution. Ra Ve N [Banerjee et al., 2024b] and subsequently RACoon [Banerjee and Singh, 2024] showed that since the same adversarial perturbation is applied to multiple inputs, the executions on different perturbed inputs are related, exploiting the relationship between different executions significantly improves the precision of the verifier. Despite Ra Ve N and RACoon s ability to leverage cross-executional dependencies, both of them remain imprecise as they only apply a single bounding step and lack refinement using branching strategies used in SOTA complete non-relational verifiers. Key challenges: For precise relational verification, we need efficient algorithms that can effectively combine branching strategies over multiple executions with bounding techniques that can leverage cross-executional dependencies. Theoretically, MILP (Mixed Integer Linear Programming) can exactly encode DNN executions with piecewise linear activation functions like Re LU over any input regions specified by linear inequalities. However, the associated MILP optimization problem is computationally expensive. For instance, encoding k executions of a DNN with nr Re LU activations Equal contribution. Primary Correspondence: tsuresh3@illinois.edu 38th Conference on Neural Information Processing Systems (Neur IPS 2024). introduces O(nr k) integer variables in the worst case. As the cost of MILP optimization grows exponentially with the number of integer variables, even SOTA off-the-shelf solvers like Gurobi [Gurobi Optimization, LLC, 2018] struggle to verify small DNNs for a relational property over k executions within a reasonable time limit. For scalability, SOTA non-relational verifiers like α, βCROWN [Wang et al., 2021] design custom "Branch and Bound" (Ba B) solvers using more scalable differentiable optimization techniques such as gradient descent. However, these verifiers ignore dependencies between multiple executions, resulting in imprecise relational verification. Conversely, the SOTA relational verifier RACoon uses parametric linear relaxation for each activation to avoid integer variables and employs gradient descent to learn parameters that leverage cross-executional dependencies for verification. This method, however, introduces imprecision due to the replacement of non-linear activations with parametric linear approximations. Therefore, precise relational verification requires scalable algorithms that can: a) scale to the large DNNs used in this paper, b) effectively reduce imprecision from parametric linear relaxations, and c) utilize cross-executional dependencies. Our contributions: We advance the state-of-the-art in relational DNN verification by: Efficiently combining branching strategies over multiple DNN executions with a cross-executional bounding method that utilizes dependencies between DNN s outputs from different executions while reducing imprecision resulting from parametric linear relaxations. Developing two "branch and bound" algorithms, each with its own advantages - a) strong bounding: applies cross-execution bounding at each step, branching over all executions. This method provides tighter bounds than RACoon (cross-executional bound refinement without branching) and α, βCROWN (branching without cross-executional bound refinement), b) strong branching: applies cross-execution bounding only at the start to derive fixed linear approximations for each execution. These approximations are then used to branch independently over each execution, exploring more branches per execution. Combining strong bounding and branching results into an efficiently optimizable MILP instance that leverages the benefits of both techniques, outperforming each individually. Performing extensive experiments on popular datasets and various DNNs (standard and robustly trained) to showcase the precision improvement over the current SOTA baselines. 2 Related Works Non-relational DNN verifiers: Given a logical input specification ϕ and an output specification ψ, DNN verifiers formally prove that for all inputs x satisfying ϕ, the output N(x) of the DNN satisfies ψ. If the verification process fails, the verifier generates a counter-example where the output specification ψ does not hold. DNN verifiers are broadly divided into three main types based on their ability to prove properties: - (i) sound but incomplete verifiers which may not always prove property even if it holds [Gehr et al., 2018, Singh et al., 2018, 2019b,a, Zhang et al., 2018, Xu et al., 2020, 2021], (ii) complete verifiers that can always prove the property if it holds [Wang et al., 2018, Gehr et al., 2018, Bunel et al., 2020a,c, Bak et al., 2020, Ehlers, 2017, Ferrari et al., 2022, Fromherz et al., 2021, Wang et al., 2021, Palma et al., 2021, Anderson et al., 2020, Zhang et al., 2022a] and (iii) verifiers with probabilistic guarantees [Cohen et al., 2019, Li et al., 2022]. Beyond the commonly studied L robustness verification problem, several works adapt DNN verification techniques for specific applications, such as robustness against image rotation [Singh et al., 2019b, Balunovic et al., 2019], incremental verification [Ugare et al., 2023, 2024], interpretability [Banerjee et al., 2024a], and certifiable training [Mueller et al., 2023, Palma et al., 2024, Jiang and Singh, 2024]. Relational DNN verifier: Existing relational verifiers fall into two main categories based on the type of relational properties they can handle: (i) verifiers for properties such as UAP and fairness, which are defined across multiple executions of the same DNN [Zeng et al., 2023, Khedr and Shoukry, 2023, Meyer et al., 2024, Banerjee et al., 2024b, Banerjee and Singh, 2024, Banerjee et al., 2024c], and (ii) verifiers for properties like local DNN equivalence, defined over multiple executions of different DNNs on the same input [Paulsen et al., 2020, 2021]. For relational properties defined across multiple executions of the same DNN, existing verifiers [Khedr and Shoukry, 2023] reduce the verification problem to an L robustness problem by constructing a "product DNN" that includes multiple copies of the same DNN. However, the relational verifier in [Khedr and Shoukry, 2023] treats all k executions of the DNN as independent, which results in a loss of precision. On the other hand, [Zeng et al., 2023] (referred to as the I/O formulation) tracks the relationships between inputs used in multiple executions at the input layer, but it does not maintain the relationships between the outputs fed into the subsequent hidden layers. As a result, it achieves only limited improvement over baseline verifiers that treat all executions independently. Ra Ve N [Banerjee et al., 2024b] uses Diff Poly a abstract intepretation based framework to track linear relationships between the outputs at all layers resulting from multiple executions of the same DNN. While Ra Ve N is significantly more precise than the I/O formulation, tracking linear constraints at each layer across all DNN executions can be computationally expensive. The SOTA relational verifier RACoon [Banerjee and Singh, 2024] improves the scalability of Ra Ve N while maintaining Ra Ve N s precision by introducing a new gradient-descent based bounding strategy called cross-executional bound refinement, as detailed in Section 3. There exist, probabilistic verifiers, [Xie et al., 2021, Zhang et al., 2022b] based on randomized smoothing [Cohen et al., 2019] for verifying relational properties. However, these works can only give probabilistic guarantees on smoothed models which have high inference costs. Similar to [Banerjee et al., 2024b, Banerjee and Singh, 2024], in this work, we focus on deterministic relational verifiers for DNNs with Re LU activation. However, RABBit can be extended to activations like Sigmoid with branching methods [Shi et al., 2024] and parametric bounds [Wu et al., 2023]. 3 Preliminaries We provide background on "branch and bound" (Ba B) based non-relational DNN verification, as well as DNN safety properties that can be encoded as relational properties. Non-relational DNN verification: For a single execution, non-relational DNN verification focuses on proving that, for all perturbations x+δ of a given input x specified by ϕ, the network s output y = N(x +δ) meets a specified logical condition ψ. Commonly, safety properties such as L robustness encode the output condition (ψ) as a linear inequality or a conjunction of linear inequalities over the DNN output y Rnl. For instance, an output property could be expressed as ψ(y) = (c T y 0), where c Rnl. Generally, even for DNNs with piecewise-linear activation functions and input constraints defined by linear inequalities, complete verification i.e., always proving the property or finding a counterexample is an NP-complete problem. Given a DNN N : Rn0 Rnl and a property defined by (ϕ, ψ), scalable yet sound (but incomplete) verifiers approximate the network s behavior by computing a linear approximation specified by L Rn0 and b R. For any input x satisfying ϕ, this linear approximation ensures that LT x + b c T N(x). The verifier then aims to show that LT x + b 0 for all x that satisfy ϕ, which implies c T N(x) 0. While LT x + b provides a valid lower bound for c T N(x), it may lack precision. To enhance this precision for piecewise-linear activations, state-of-the-art non-relational verifiers use a branch-and-bound (Ba B) method. In each branching step, the problem is divided into smaller subproblems, while the bounding method computes a valid lower bound for each subproblem. Branching for piecewise linear activation: The non-relational verifier computes L by replacing non-linear activations with linear relaxations, which introduces imprecision. However, for piecewise linear activations like Re LU, it is possible to consider each linear piece separately as different subproblems, avoiding the need for imprecise linear relaxations. For instance, for y = Re LU(x), branching on x and considering the cases x 0 and x 0 allows decomposing Re LU(x) into two distinct linear pieces. Still in the worst case decomposing all Re LU nodes in a DNN results in exponential blowup making it practically infeasible. Therefore, SOTA non-relation verifiers like α, β-CROWN [Wang et al., 2021] greedily pick a small subset of Re LU nodes for branching while using linear relaxations for the rest. We explain the bounding step used for each subproblem below. Bounding with parameter refinement: Obtaining sound linear relaxations of activations σ like Re LU, which are not used for branching, involves computing linear lower bounds σl(x) and upper bounds σu(x) that contain all possible outputs of σ w.r.t all inputs x satisfying ϕ. That is, for all possible input values x of σ, σl(x) σ(x) σu(x) holds. SOTA non-relational verifiers, such as α, β-CROWN, improve precision by using parametric linear relaxations instead of static linear bounds and refine the parameters to facilitate verification of the property (ϕ, ψ). For example, for Re LU(x), the parametric lower bound is Re LU(x) α x with α [0, 1]. Since α x remains a valid lower for any α [0, 1], this allows optimizing α while ensuring the bound remains mathematically correct. Each branched Re LU say y = Re LU(x), introduces two subproblems each with one additional constraint x 0 (or, x 0) where Re LU behaves as a linear function i.e. y = 0 (or, y = x) respectively. To obtain the lower bound of LT x + b over inputs satisfying ϕ with the additional branching constraints α, β-CROWN convert the constrained optimization problem into an unconstrained one by looking at the Lagrangian dual. The dual replaces each branching constraint by augmenting the minimization objective LT x + b with additional terms i.e. LT x + b + β+x for x 0 or LT x + b + β x for x 0 where β+ 0 and β 0. Overall, at high level, α, β-CROWN computes parametric linear approximations L(α,β)T x + b(α,β) and refine the parameters α, β to facilitate verification of (ϕ, ψ). DNN relational properties: Relational properties defined for a DNN N : Rn0 Rnl defined over k executions of N are specified by the tuple (Φ, Ψ). Here, Φ : Rn0 k {true, false} (the input specification) encodes the input region Φt Rn0 k encompassing all potential inputs corresponding to each of the k executions of N. Furthermore, the safety property we expect the outputs of all k executions of N to satisfy is specified by Ψ : Rnl k {true, false} (the output specification). Given N, an input specification Φ and an output specification Ψ, DNN relational verification seeks to formally prove whether x 1, . . . , x k Rn0.Φ(x 1, . . . , x k) = Ψ(N(x 1), . . . N(x k)) or otherwise provide a counterexample. The inputs to the k executions of N are denoted by x 1, . . . , x k and the corresponding outputs are denoted by N(x 1), . . . , N(x k). For the i-th execution, commonly, the input region ϕi t is a L region around a fixed point xi Rn0 defined as ϕi t = {x i Rn0 | x i xi ϵ} and the corresponding output specification ψi(N(x i )) = Vm j=1(ci,j T N(x i ) 0). Consequently, Φ(x 1, . . . , x k) = Vk i=1(x i ϕi t) V Φδ(x 1, . . . , x k) where Φδ(x 1, . . . , x k) encodes the relationship between the inputs used in different execution and Ψ(N(x 1), . . . , N(x k)) = Vk i=1 ψi(N(x i )). Following this, we describe relational properties encoding important DNN safety configurations. UAP verification: In a UAP attack, given a DNN N, the adversary aims to find an adversarial perturbation with a bounded L norm that maximizes the rate at which N misclassifies when the same adversarial perturbation is applied to all inputs from the distribution. The UAP verification problem aims to find the worst-case accuracy of N against the UAP adversary. We refer to this worst-case accuracy as UAP accuracy in the rest of the paper. As shown by Theorem 2 in [Zeng et al., 2023], it is possible to statistically estimate the UAP accuracy of N with respect to the input distribution if one can determine the UAP accuracy of N on k randomly selected images. We focus on the k-UAP verification problem for the rest of the paper as improving the precision of k-UAP verification directly improves the UAP accuracy on the input distribution [Banerjee and Singh, 2024]. The k-UAP verification problem is fundementally different from local L robustness verification since the same adversarial perturbation is applied across the set of inputs. Thus, improving precision for the UAP verification problem requires a relational verifier that can exploit dependencies between the perturbed inputs. We provide the Φ and Ψ of the UAP verification problem in Appendix A.1. 4 Cross-executional Ba B The key distinction between relational and non-relational DNN verification is the dependency between different DNN executions, which necessitates that any precise relational verifier utilizes these crossexecution dependencies. For instance, for k-UAP problem with two images x1, x2 consider the scenario where both x1 and x2 have valid adversarial perturbations δ1 δ1 δ1 and δ2 δ2 δ2 but no common perturbation say δ that works for both x1 and x2. In this case, any non-relational verification that does not account for cross-execution dependencies can never prove the absence of a common perturbation given that both x1, x2 have valid adversarial perturbations. This highlights the importance of utilizing cross-executional dependencies. The SOTA relational verifier RACoon [Banerjee and Singh, 2024] leverages cross-execution dependencies to jointly optimize the α parameters from different executions, significantly improving the precision of relational verification. However, RACoon only uses parametric linear relaxations for non-linear activations and lacks a branching step, resulting in reduced precision, as confirmed by our experimental results in Section 6. To address this, we propose two separate Ba B algorithms, each with its benefits, described in Sections 4.1 and 4.2. Finally, we combine the results to formulate an efficiently optimizable MILP instance in Section 5 4.1 Strong Bounding Before going into the details, we briefly review the cross-executional bound refinement proposed in RACoon. For k-UAP, given any subset S of the k executions, RACoon can verify the absence of any common perturbation that works for all executions in S with cross-executional bound refinement. For all i S, let (Li(αi), bi(αi)) denote the parametric linear approximations corresponding to the i-th execution. Then the optimal value t = maxαi,λi ϵ P i S λi Li(αi) 1+P i S λi ai(αi) 0 proves the absence of a common perturbation δ for S. Here, ϵ is the perturbation bound i.e. δ ϵ, ai(αi) = bi(αi)+Li(αi)T xi and λi [0, 1] with P i S λi = 1 are the cross-executional parameters that relate linear approximations from different execution enabling joint optimization over αis. Next, we detail the first Ba B method - strong bounding that combines cross-executional bounding with branching methods to verify the absence of common perturbation for any subset of n = |S| executions. Branching and bounding: For n k executions, we construct a "product DNN" by duplicating the DNN n times, one for each execution. Formally, a product DNN is a function N n : Rn0 n Rnl n with N n(x1, . . . , xn) = [N(x1), . . . , N(xn)]T . At each branching step, we greedily select a subset of unbranched Re LU activations from the product DNN and branch on them, while using parametric linear relaxations for the rest. We adapt existing greedy branching heuristics, such as Ba BSR [Bunel et al., 2020b], for selecting the candidate Re LU activations. The heuristic computes a score for each unbranched Re LU activation in the product DNN, and we branch on the activations with the highest scores. Next, we detail the bounding method applied to each subproblem resulting from branching. Since the number of subproblems can be large, the bounding method needs to be fast yet capable of leveraging both branching constraints and cross-executional dependencies. However, the crossexecutional bound refinement from RACoon can not handle branching constraints, while the bounding step from α, β-CROWN does not utilize dependencies across executions. Hence, we develop a threestep algorithm for obtaining the optimal value t with fast gradient descent-based methods. First, we replace these branching constraints by introducing dual variables β, resulting in new parametric linear approximations (Li(αi,βi), bi(αi,βi)) for each subproblem for all i S. Then for each subproblem, we introduce additional variables λi for each execution with constraints λi [0, 1] and P i S λi = 1. These λis relate linear approximations from different executions capturing cross-executional dependencies. This reduces finding t for each subproblem to the following optimization problem t = maxαi,βi,λi ϵ P i S λi Li(αi,βi) 1 + P i S λi ai(αi,βi). Here, ai(αi,βi) = bi(αi,βi) + Li(αi,βi)T xi. Finally, we apply projected gradient ascent to refine parameters (αi,βi, λi). The detailed derivation of the bounding step and the proof of correctness is in Appendix B. Precision gains of strong bounding over the baselines are in Section 6.2. Suppose, F(S) denotes the set of subproblems then Theorem 4.1 proves the absence of common perturbation for the subset of executions S. Theorem 4.1. If min F(S) maxαi,βi,λi ϵ P i S λi Li(αi,βi) 1 + P i S λi ai(αi,βi) 0 then executions in S do not have a common perturbation δ Rn0 with δ ϵ. Proof: The detailed proof is in the Appendix B. While strong bounding effectively combines cross-executional refinement with branching, it has the following drawbacks that led to the development of the 2nd Ba B method. First, strong bounding branches over all executions simultaneously, which limits the number of branches explored per execution within a fixed timeout compared to branching on individual executions. For instance, if strong bounding solves m subproblems for n executions, then assuming each execution branched uniformly, each execution gets only m 1 n subproblems. In contrast, given the same timeout, branching individually allows exploring m n subproblems per execution. Second, strong bounding only proves the absence of common perturbation, a relaxation of the k-UAP problem. To mitigate this, RACoon uses parameter refinement to obtain linear approximations and formulate a MILP, providing a more precise bound on k-UAP accuracy. However, for strong bounding, as the number of subproblems increases and each subproblem has a different linear approximation, formulating a MILP with each linear approximation is practically infeasible. Restricting the number of linear approximations can help accommodate MILP formulation by compromising on the strong cross-executional bounding. 4.2 Strong Branching Unlike strong bounding, strong branching explores more branches by branching on each execution independently. Additionally, for each execution, we aim to keep the number of linear approximations small post-branching, ensuring the MILP instance using these approximations remains easy to optimize. To limit the number of linear approximations for each execution i, we fix a set of linear coefficients {L1, . . . , Lm} called "target coefficients" and for each j [m], Lj Rn0 compute valid lower bound b j of the following optimization problem minδ c T N(xi + δ) LT j (xi + δ) with δ ϵ using Ba B. In this case, for all δ Rn0 with δ ϵ the refined bias b j and Lj remain a valid lower bound of c T N(xi +δ) i.e. LT j (xi +δ) + b j c T N(xi +δ). Moreover, since we only refine the bias, the number of linear approximations remains the same as at the start of Ba B, even after branching. Next, we describe how we utilize cross-execution dependencies while branching on each execution independently. Selecting targets: To select target coefficients, we greedily pick subsets of executions and run cross-executional refinement from RACoon without branching on each subset of executions. We describe the greedy selection strategy in Section 5. For each set of executions, we add the linear approximations obtained by cross-executional refinement to the corresponding executions target sets. Cross-executional refinement ensures for each execution set the parameters corresponding to the linear approximations are tailored for the relational verification. Bounding and branching: Given a target coefficient Lt Rn0, since finding the exact solution of minδ c T N(xi + δ) LT t (xi + δ) is computationally expensive, strong branching aims to obtain a tight mathematically correct lower bound on the difference c T N(xi + δ) LT t (xi + δ). For any subproblem, let (L(α,β), b(α,β)) denote the parametric linear approximation. Then for this particular subproblem, for all α,β, L(α,β)T (xi + δ) + b(α,β) c T N(xi + δ) and subsequently: max α,β min δ ϵ(L(α,β) Lt)T (xi + δ) + b(α,β) min δ ϵ c T N(xi + δ) LT t (xi + δ) (1) The optimal solution of the max-min problem in Eq. 1 provides a mathematically correct lower bound of minδ c T N(xi + δ) LT t (xi + δ) for each subproblem. However, it is hard to solve a max-min problem with scalable differentiable optimization techniques like gradient descent typically used for large DNNs considered in this paper. Instead, we compute a closed form of the inner minimization problem reducing the optimization instance to a more tractable maximization problem (Theorem 4.2). Theorem 4.2. For any α,β, if L(α,β) Rn0 and b(α,β) R then min δ ϵ(L(α,β) Lt)T (x + δ) + b(α,β) = ϵ L(α,β) Lt 1 + (L(α,β) Lt)T x + b(α,β). Proof: The proof is in Appendix C. We apply a projected gradient ascent to optimize the maximization with the closed form obtained above (Appendix C.1). The proof of the correctness of the bounding method is in Appendix C. Note the proof of correctness does not necessitate the optimizer to find the global optimum. This is important since gradient ascent may not always converge to the global optimum. Since strong branching branch on each execution independently we reuse the branching strategy of α, β-CROWN. In this section, we detail the algorithm (Algo. 1) that combines the results from strong bounding and strong branching to formulate the MILP. Running strong bounding on all 2k 1 non-empty subsets of k executions is impractical. Therefore, we use a greedy approach to select subsets of executions for strong bounding. Similarly, for strong branching, we greedily select the target linear coefficients. First, we describe both greedy strategies before moving on to the MILP formulation. Elimination of individually verified executions: RABBit maintains a list of unverified indices and eliminates any executions that can be verified individually and does not consider them for subsequent steps (lines 3, 8, and 13 in Algo. 1). For instance, for k-UAP verification, we do not need to consider those executions that are proved to have no adversarial perturbation δ such that δ ϵ. Pruning individually verified executions improves the runtime without any compromise on the precision of the relational verifier (see Theorem B.1 [Banerjee and Singh, 2024]). Greedy target coefficient selection: RABBit first runs RACoon which in turn executes an incomplete non-relational verifier α-CROWN [Xu et al., 2021] eliminating the verified executions (line 8 in Algo. 1). Subsequently, for target selection, RABBit greedily picks the first kt (hyperparameter) executions based on si the lower bound on ψi(N(xi + δ)) as computed by α-CROWN, prioritizing executions with higher si (line 9). Intuitively, for unverified executions, si measures the maximum violation of the output specification ψi(N(xi + δ)) and thus leads to the natural choice of picking executions with smaller violations. For each selected execution i, we choose up to m target coefficients by iterating over all subsets i S considered by RACoon, and selecting linear approximations corresponding to the top m subsets.The cross-executional lower bound t from RACoon decides the priority of each subset S. Subsets S with higher t indicate smaller violations and are more likely to be verified for the absence of a common perturbation, making them suitable for target selection. Selection of subsets of executions for strong bounding: Thereafter, until timeout ζ, we run strong bounding on subsets of executions from individually unverified executions I. For each subset S I, Algorithm 1 RABBit 1: Input: N, (Φ, Ψ), k, kt, timeout ζ 2: Output: M. 3: I {} Unverified indices 4: L {} Linear approximations 5: C {} Cross-verified executions 6: s {} Lower bounds from α-Crown 7: M 0 Initialize verified UAP accuracy 8: (I, L, C, s) RACoon(N, (Φ, Ψ), k) 9: I1 top-kt indices from I based on s 10: for i I1 do 11: b i Strong Branching(ϕi, ψi, L[i]) 12: if Verified(ϕi, ψi, L[i], b i ) then 13: I I \ {i} 14: end if 15: Update Bias(L[i], b i ) 16: M MILP(L, Φ, Ψ, k, I, C) 17: M max (M(Φ, Ψ), Opt(M)) 18: end for 19: I2 top-kt indices from I based on s 20: while time() < ζ do 21: S Greedily select subset of I2 22: t S Strong Bounding(S, Φ, Ψ) 23: if t S 0 then 24: C Append(C, S) 25: M MILP(L, Φ, Ψ, k, I, C) 26: M max (M, Opt(M)) 27: end if 28: end while 29: return M the cross-executional bound obtained by RACoon on S decides its priority. However, considering all non-empty subsets of I can be expensive. Instead, similar to strong branching, we first pick top-kt executions (I2) from I (Algo 1 line 19). We sort all non-empty subsets S I2 based on their priority and, in each iteration, run strong bounding on the highest-priority subset that has not been scheduled yet (Algo 1 line 22). Given a large timeout, RABBit would eventually select all subsets from I2. MILP Formulation: The MILP formulation uses both the refined biases from strong branching (line 11) and the subsets S of executions verified for the absence of common perturbation from strong bounding (line 22) to compute final verified UAP accuracy. RABBit MILP formulation involves three steps. First, we deduce linear constraints between the input and output of N for each unverified execution using linear approximations of N with refined bias obtained by strong branching. Secondly, we add constraints for each subset S verified for the absence of common perturbation with strong bounding. Then, similar to the current SOTA baseline [Banerjee and Singh, 2024], we encode the output specification Ψ as a MILP objective, introducing only O(k) integer variables. Finally, we use an off-the-shelf MILP solver [Gurobi Optimization, LLC, 2018] to optimize the MILP. Ψ encoding: First, we show the MILP objective M that encodes Ψ. We introduce binary variables zi {0, 1} for each individually unverified execution in I where for any perturbation δ Rn0 and δ ϵ, zi = 1 implies ψi(N(xi +δ)) = True. Then the finding the worst case UAP accuracy is equivalent to the following M = 1 k (k |I|) + min δ ϵ P Constraints encoding: We add constraints from strong bounding, strong branching, and from the linear approximation obtained from the call to RACoon (Algo. 1 line 8). Suppose for any subset S I, strong bounding verifies the absence of common perturbation. Then for all δ Rn0 and δ ϵ at least one of the executions from S will always satisfy the corresponding output specification. Hence, for every such S we add the constraint: P i S zi 1. Now, let for any i I, {(L1 i , b1 i ), . . . , (Lm i , bm i )} denote set of linear approximation with bm i either coming from RACoon or from strong branching. Then we add the following constraints zi z i, z i = (oi 0), oi Lj T i (xi + δ) + bj i where j [m], and oi R, z i are new real and integer variables respectively. Limitations: Although RABBit outperforms SOTA verifiers in relational verification, like all deterministic verifiers, whether relational or non-relational (including ours), do not scale to deep neural networks (DNNs) trained on very large datasets such as Image Net. RABBit is sound but incomplete, meaning it may not be able to prove certain relational properties even if they are true. Note that all complete non-relational verifiers are also incomplete for relational properties since they do not track any dependencies between executions. 6 Experimental Evaluation We evaluate the effectiveness of RABBit on multiple relational properties such UAP accuracy ( Table 1) and top-k accuracy (Appendix Table 5), DNNs, and datasets. In our evaluation, we compare Table 1: RABBit Efficacy Analysis for Worst-Case UAP Accuracy Dataset Network Training Perturbation CROWN α CROWN α, β CROWN MN-Ba B GCP-CROWN I/O RACoon Strong Strong RABBit Structure Method Bound (ϵ) Bounding Branching Conv Small Standard 1/255 44.8 45.4 62.2 55.0 61.8 45.4 45.4 63.8 (+1.6) 63.2 (+1.0) 65.4 (+3.2) Conv Small Diff AI 5/255 44.4 49.6 53.8 55.0 53.8 50.4 51.6 59.0 (+4.0) 59.0 (+4.0) 59.8 (+4.8) CIFAR10 Conv Small SABR 2/255 75.2 75.8 79.4 80.0 80.0 76.8 78.2 83.0 (+3.0) 83.8 (+3.8) 84.0 (+4.0) Conv Small CITRUS 2/255 74.8 76.0 79.2 79.6 79.6 77.0 78.8 82.8 (+3.2) 83.2 (+3.6) 83.6 (+4.0) Conv Big Diff AI 2/255 46.6 51.8 61.2 61.6 61.2 53.2 54.8 62.8 (+1.2) 62.6 (+1.0) 63.0 (+1.6) Conv Small Standard 0.07 53.0 59.4 83.6 77.4 84.2 60.0 60.6 84.2 (+0.0) 84.2 (+0.0) 84.8(+0.6) Conv Small Diff AI 0.13 51.8 57.0 76.6 77.0 77.0 57.2 58.4 79.0 (+2.0) 78.6 (+1.6) 80.0 (+3.0) MNIST Conv Small SABR 0.15 27.0 38.0 50.4 51.2 60.2 42.2 45.8 62.6 (+2.4) 62.2 (+2.0) 63.4 (+3.2) Conv Small CITRUS 0.15 28.8 41.6 73.0 69.2 73.0 41.6 44.6 74.0 (+1.0) 73.4 (+0.4) 74.6 (+1.6) RABBit against SOTA baselines, including non-relational verifiers CROWN [Zhang et al., 2018], α-CROWN [Xu et al., 2021], α, β-CROWN [Wang et al., 2021], MN-Ba B [Ferrari et al., 2022], GCP-CROWN [Zhang et al., 2022a], as well as relational verifiers I/O Formulation [Zeng et al., 2023] and RACoon. As previously noted, Ra Ve N adds linear constraints for each layer, which restricts its scalability as the number of executions k increases. Therefore, we compare RABBit with Ra Ve N for a smaller execution count of k = 5, as shown in Appendix Table 4. Additionally, we show that: a) given the same time, RABBit always outperforms the SOTA Ba B-based non-relational verifier α, β-CROWN; b) strong bounding computes a tighter bound on t than α, β-CROWN; and c) we provide an ablation study on ϵ and k used by RABBit. 6.1 Experiment Setup Networks. We use standard convolutional architectures, such as Conv Small and Conv Big, which are used to evaluate both SOTA relational [Banerjee and Singh, 2024] and non-relational verifiers [Wang et al., 2021] (see Table 1). We provide the details of the DNN architectures in the Appendix D.1. We use networks trained using both standard training methods and robust training strategies, such as Diff AI [Mirman et al., 2018], SABR [Mueller et al., 2023], and CITRUS [Xu and Singh, 2024]. Our experiments utilize publicly available pre-trained DNNs sourced from the CROWN repository [Zhang et al., 2020], α, β-CROWN repository [Wang et al., 2021], and ERAN repository [Singh et al., 2019b]. The clean accuracies of these networks are reported in Appendix D.2. Implementation details and hyperparameters. We implemented our method in Python with Pytorch V1.11 on top of SOTA complete non-relational verifier α, β-CROWN [Wang et al., 2021]. We used Gurobi V11.0 as the off-the-shelf MILP solver. For both strong bounding and strong branching, we use Adam [Kingma and Ba, 2014] for parameter learning and run it for 20 iterations on each subproblem. We set the value of kt = 10 for CIFAR-10 and kt = 24 for MNIST networks respectively. We use a single NVIDIA A100-PCI GPU with 40 GB RAM for bound refinement and an Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz with 64 GB RAM for MILP optimization. For any relational property with k executions, we give an overall timeout of k minutes (averaging 1 minute/execution) to RABBit and all baselines. Each MILP instance gets a timeout of 10 minutes. We issue the MILP optimization call on line 25 of Algo. 1 in a separate thread for runtime optimization, ensuring that the MILP optimization process does not unnecessarily block the subsequent iterations of the while loop (line 20 of Algo. 1). 6.2 Experimental Results Effectiveness of RABBit: Table 1 compares the results of RABBit to all baselines across different datasets (column 1) and DNN architectures (column 2) trained with various methods (column 3), with ϵ values defining the L bound of δ in column 4. For each DNN and ϵ, we run RABBit and all the baselines on 10 relational properties each defined with k = 50 randomly selected inputs, and report the worst-case UAP accuracy averaged over the 10 properties. Note that for each DNN, we exclude inputs misclassified by the DNN. We compare the performance of RABBit against SOTA relational and complete non-relational verifiers as well as against strong bounding and strong branching. The results in Table 1 demonstrate that strong bounding, strong branching, and RABBit all outperform the existing SOTA verifiers on all DNNs and ϵ. Notably, RABBit gains up to +4.8% and up to +3.2% improvement in the worst-case UAP accuracy (averaged over 10 runs) for CIFAR10 and MNIST DNNs, respectively. RABBit also efficiently scales to the largest verifiable DNN architectures such as Conv Big, conferring up to +1.6% improvement in worst-case UAP accuracy. In some cases, strong bounding outperforms strong branching, while in others, strong branching outperforms strong bounding, highlighting the importance of both methods. RABBit combines the strengths of both strong branching and strong bounding, producing the best results overall. (a) Diff AI (CIFAR10) (b) SABR (CIFAR10) (c) CITRUS (CIFAR10) Figure 1: Average Worst Case k-UAP accuracy vs Time for Conv Small CIFAR10 DNNs. Time vs UAP Accuracy Analysis: Fig. 1 shows timewise the worst-case UAP accuracy (averaged over 10 runs) for different Conv Small CIFAR10 networks with k = 50 on ϵ values from Table 1. Note that RABBit invokes RACoon, which in turn calls α-CROWN and eliminates verified executions (Line 7 in Algorithm 1). Hence, for a fair comparison, we also run α-CROWN first for α, β-CROWN and then run α, β-CROWN only on the unverified indices. For all DNNs, RABBit consistently outperforms the SOTA Ba B-based non-relational verifier α, β-CROWN at all timestamps. This confirms that the improved precision shown in Table 1 is not dependent on the specific timeout value. (a) Diff AI (CIFAR10) (b) CITRUS (CIFAR10) Figure 2: Timewise Analysis of Average % Improvement in t with Strong Bounding (CIFAR10) (a) Diff AI (MNIST) (b) CITRUS (MNIST) Figure 3: Timewise Analysis of Average % Improvement in t with Strong Bounding (MNIST) Evaluating Bound Improvement: In Figs 2 and 3, we present a timewise analysis of the improvement in t with strong bounding over α, β-CROWN and RACoon. For this experiment, we use Diff AI and CITRUS Conv Small networks with epsilon values from Table 1. For each network and ϵ, we select 30 executions at random and compute the percentage improvement in t with strong bounding over RACoon and α, β-CROWN.We also report the average improvement and 95% confidence intervals for all cases in Table 6 in Appendix G. The results demonstrate that the t with strong bounding is significantly tighter compared to the bounds from the SOTA verifiers α, β-CROWN and RACoon at all timestamps. Furthermore, strong bounding improves t on average by up to 108.7% for CIFAR10 networks and 57.7% for MNIST networks. These results highlight the importance of leveraging dependencies across executions during both branching and bounding to improve precision. Different ϵ and k values: Fig. 4 shows the results of RACoon, α, β-CROWN, and RABBit for k-UAP verification of CIFAR10 Conv Small DNNs for 5 different ϵ values and k = 50. RABBit outperforms RACoon and α, β-CROWN for all evaluated ϵ values, notably improving the worst case k-UAP accuracy by up to 4.8%. Similarly, we analyze the performance of RACoon, α, β-CROWN, and RABBit for k-UAP verification of CIFAR10 Conv Small DNNs with different k values. As presented in Fig. 5, for all k values, RABBit is more precise than both baselines. Expectedly, the worst-case k-UAP accuracy for relational verifiers is higher with larger k values as it is easier to prove the absence of a common perturbation with larger k. (a) Diff AI (CIFAR10) (b) CITRUS (CIFAR10) Figure 4: Average Worst Case k-UAP accuracy vs ϵ for CIFAR10 DNNs. (a) Diff AI (CIFAR10) (b) CITRUS (CIFAR10) Figure 5: Average Worst Case k-UAP accuracy for different k values for CIFAR10 Conv Small DNNs. 7 Conclusion We present RABBit, a general framework for improving the precision of relational verification of DNNs through Ba B methods specifically designed to utilize dependencies across executions. Our experiments, on various DNN architectures, and training methods demonstrate that RABBit significantly outperforms both SOTA relational and non-relational verifiers for relational properties. Although we focus on the worst-case UAP accuracy and top-k accuracy RABBit can be extended to properties involving different DNNs, such as local equivalence of DNN pairs Paulsen et al. [2020] or properties defined over an ensemble of DNNs. Acknowledgement We thank the anonymous reviewers for their insightful comments. This work was supported in part by NSF Grants No. CCF-2238079, CCF-2316233, CNS-2148583. Filippo Amato, Alberto López, Eladia María Peña-Méndez, Petr Vaˇnhara, Aleš Hampl, and Josef Havel. Artificial neural networks in medical diagnosis. Journal of Applied Biomedicine, 11(2), 2013. Ross Anderson, Joey Huchette, Will Ma, Christian Tjandraatmadja, and Juan Pablo Vielma. Strong mixed-integer programming formulations for trained neural networks. Mathematical Programming, 2020. Stanley Bak, Hoang-Dung Tran, Kerianne Hobbs, and Taylor T. Johnson. Improved geometric path enumeration for verifying relu neural networks. In Shuvendu K. Lahiri and Chao Wang, editors, Computer Aided Verification - 32nd International Conference, CAV 2020, Los Angeles, CA, USA, July 21-24, 2020, Proceedings, Part I, volume 12224 of Lecture Notes in Computer Science, pages 66 96. Springer, 2020. doi: 10.1007/978-3-030-53288-8\_4. URL https://doi.org/10. 1007/978-3-030-53288-8_4. Mislav Balunovic, Maximilian Baader, Gagandeep Singh, Timon Gehr, and Martin Vechev. Certifying geometric robustness of neural networks. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/ paper_files/paper/2019/file/f7fa6aca028e7ff4ef62d75ed025fe76-Paper.pdf. Debangshu Banerjee and Gagandeep Singh. Relational DNN verification with cross executional bound refinement. In Forty-first International Conference on Machine Learning, 2024. URL https://openreview.net/forum?id=HOG80Yk4Gw. Debangshu Banerjee, Avaljot Singh, and Gagandeep Singh. Interpreting robustness proofs of deep neural networks. In The Twelfth International Conference on Learning Representations, 2024a. URL https://openreview.net/forum?id=Ev10F9TWML. Debangshu Banerjee, Changming Xu, and Gagandeep Singh. Input-relational verification of deep neural networks. Proc. ACM Program. Lang., 8(PLDI), June 2024b. doi: 10.1145/3656377. URL https://doi.org/10.1145/3656377. Debangshu Banerjee, Changming Xu, and Gagandeep Singh. Scalable relational verification and training for deep neural networks. 2024c. Rudy Bunel, Jingyue Lu, Ilker Turkaslan, Pushmeet Kohli, P Torr, and P Mudigonda. Branch and bound for piecewise linear neural network verification. Journal of Machine Learning Research, 21 (2020), 2020a. Rudy Bunel, Ilker Turkaslan, Philip H. S. Torr, M. Pawan Kumar, Jingyue Lu, and Pushmeet Kohli. Branch and bound for piecewise linear neural network verification. J. Mach. Learn. Res., 21(1), jan 2020b. ISSN 1532-4435. Rudy R Bunel, Oliver Hinder, Srinadh Bhojanapalli, and Krishnamurthy Dvijotham. An efficient nonconvex reformulation of stagewise convex optimization problems. Advances in Neural Information Processing Systems, 33, 2020c. Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 1310 1320. PMLR, 09 15 Jun 2019. URL https://proceedings.mlr. press/v97/cohen19c.html. Ruediger Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis, 2017. Claudio Ferrari, Mark Niklas Mueller, Nikola Jovanovi c, and Martin Vechev. Complete verification via multi-neuron relaxation guided branch-and-bound. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=l_am Hf1oa K. Aymeric Fromherz, Klas Leino, Matt Fredrikson, Bryan Parno, and Corina Pasareanu. Fast geometric projections for local robustness certification. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=z Wy1uxj Dd ZJ. Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. Ai2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), 2018. Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. ar Xiv preprint ar Xiv:1412.6572, 2014. Gurobi Optimization, LLC. Gurobi optimizer reference manual, 2018. Enyi Jiang and Gagandeep Singh. Towards universal certified robustness with multi-norm training, 2024. URL https://arxiv.org/abs/2410.03000. Haitham Khedr and Yasser Shoukry. Certifair: A framework for certified global fairness of neural networks. Proceedings of the AAAI Conference on Artificial Intelligence, 37(7):8237 8245, Jun. 2023. Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. ar Xiv preprint ar Xiv:1412.6980, 2014. Juncheng Li, Shuhui Qu, Xinjian Li, Joseph Szurley, J. Zico Kolter, and Florian Metze. Adversarial music: Real world audio adversary against wake-word detection system. In Proc. Neural Information Processing Systems (Neur IPS), pages 11908 11918, 2019a. Juncheng Li, Frank R. Schmidt, and J. Zico Kolter. Adversarial camera stickers: A physical camerabased attack on deep learning systems. In Proc. International Conference on Machine Learning, ICML, volume 97, pages 3896 3904, 2019b. Linyi Li, Jiawei Zhang, Tao Xie, and Bo Li. Double sampling randomized smoothing. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors, Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 13163 13208. PMLR, 17 23 Jul 2022. URL https://proceedings.mlr.press/v162/li22aa.html. Zikun Liu, Changming Xu, Emerson Sie, Gagandeep Singh, and Deepak Vasisht. Exploring practical vulnerabilities of machine learning-based wireless systems. In 20th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2023, Boston, MA, April 17-19, 2023, pages 1801 1817. USENIX Association, 2023. Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=r Jz IBf ZAb. Anna P. Meyer, Yuhao Zhang, Aws Albarghouthi, and Loris D Antoni. Verified training for counterfactual explanation robustness under data shift, 2024. URL https://arxiv.org/abs/2403. 03773. Matthew Mirman, Timon Gehr, and Martin Vechev. Differentiable abstract interpretation for provably robust neural networks. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 3578 3586. PMLR, 10 15 Jul 2018. URL https://proceedings.mlr.press/ v80/mirman18b.html. Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1765 1773, 2017. Mark Niklas Mueller, Franziska Eckert, Marc Fischer, and Martin Vechev. Certified training: Small boxes are all you need. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=7o Fuxt Jt UMH. Alessandro De Palma, Harkirat S. Behl, Rudy R. Bunel, Philip H. S. Torr, and M. Pawan Kumar. Scaling the convex barrier with active sets. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021, 2021. Alessandro De Palma, Rudy R Bunel, Krishnamurthy Dj Dvijotham, M. Pawan Kumar, Robert Stanforth, and Alessio Lomuscio. Expressive losses for verified robustness via convex combinations. In The Twelfth International Conference on Learning Representations, 2024. URL https:// openreview.net/forum?id=mzy Z4wz Kl M. Brandon Paulsen, Jingbo Wang, and Chao Wang. Reludiff: Differential verification of deep neural networks. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, ICSE 20, page 714 726, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450371216. doi: 10.1145/3377811.3380337. URL https: //doi.org/10.1145/3377811.3380337. Brandon Paulsen, Jingbo Wang, Jiawei Wang, and Chao Wang. Neurodiff: Scalable differential verification of neural networks using fine-grained approximation. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 20, page 784 796, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450367684. doi: 10.1145/3324884.3416560. URL https://doi.org/10.1145/3324884.3416560. Zhouxing Shi, Qirui Jin, J Zico Kolter, Suman Jana, Cho-Jui Hsieh, and Huan Zhang. Formal verification for neural networks with general nonlinearities via branch-and-bound, 2024. URL https://openreview.net/forum?id=ivokw VKY4o. Gagandeep Singh, Timon Gehr, Matthew Mirman, Markus Püschel, and Martin Vechev. Fast and effective robustness certification. Advances in Neural Information Processing Systems, 31, 2018. Gagandeep Singh, Rupanshu Ganvir, Markus Püschel, and Martin Vechev. Beyond the single neuron convex barrier for neural network certification. In Advances in Neural Information Processing Systems, 2019a. Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages, 3(POPL), 2019b. Shubham Ugare, Debangshu Banerjee, Sasa Misailovic, and Gagandeep Singh. Incremental verification of neural networks. Proc. ACM Program. Lang., 7(PLDI), June 2023. doi: 10.1145/3591299. URL https://doi.org/10.1145/3591299. Shubham Ugare, Tarun Suresh, Debangshu Banerjee, Gagandeep Singh, and Sasa Misailovic. Incremental randomized smoothing certification. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=Sde APV1irk. Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. Efficient formal safety analysis of neural networks. In Advances in Neural Information Processing Systems, 2018. Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. Beta-CROWN: Efficient bound propagation with per-neuron split constraints for neural network robustness verification. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, 2021. URL https://openreview.net/ forum?id=ah YIl RBe CFw. Haoze Wu, Teruhiro Tagomori, Alexander Robey, Fengjun Yang, Nikolai Matni, George Pappas, Hamed Hassani, Corina Pasareanu, and Clark Barrett. Toward certified robustness against realworld distribution shifts. In 2023 IEEE Conference on Secure and Trustworthy Machine Learning (Sa TML), pages 537 553. IEEE, 2023. Chulin Xie, Minghao Chen, Pin-Yu Chen, and Bo Li. Crfl: Certifiably robust federated learning against backdoor attacks. In International Conference on Machine Learning, pages 11372 11382. PMLR, 2021. Changming Xu and Gagandeep Singh. Cross-input certified training for universal perturbations, 2024. Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic perturbation analysis for scalable certified robustness and beyond. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS 20, Red Hook, NY, USA, 2020. Curran Associates Inc. ISBN 9781713829546. Kaidi Xu, Huan Zhang, Shiqi Wang, Yihan Wang, Suman Jana, Xue Lin, and Cho-Jui Hsieh. Fast and complete: Enabling complete neural network verification with rapid and massively parallel incomplete verifiers. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=n VZt XBI6LNn. Yi Zeng, Zhouxing Shi, Ming Jin, Feiyang Kang, Lingjuan Lyu, Cho-Jui Hsieh, and Ruoxi Jia. Towards robustness certification against universal perturbations. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id= 7GEv PKxjtt. Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. Advances in neural information processing systems, 31, 2018. Huan Zhang, Hongge Chen, Chaowei Xiao, Sven Gowal, Robert Stanforth, Bo Li, Duane Boning, and Cho-Jui Hsieh. Towards stable and efficient training of verifiably robust neural networks. In Proc. International Conference on Learning Representations (ICLR), 2020. Huan Zhang, Shiqi Wang, Kaidi Xu, Linyi Li, Bo Li, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. General cutting planes for bound-propagation-based neural network verification. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022a. URL https://openreview.net/forum?id=5ha AJAcofjc. Yuhao Zhang, Aws Albarghouthi, and Loris D Antoni. Bagflip: A certified defense against data poisoning. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022b. URL https://openreview.net/ forum?id=Zidk M5b92G. A Formal encoding of relational properties A.1 k-UAP verification Given a set of k points X = {x1, ..., xk} where for all i [k], xi Rn0 and ϵ R we can first define individual input constraints used to define L input region for each execution i [k].ϕi in(x i ) = x i xi ϵ. We define Φδ(x 1, . . . , x k) as follows: Φδ(x 1, . . . , x k) = (i,j [k]) (i