# Scalable Verification of Quantized Neural Networks

Thomas A. Henzinger, Mathias Lechner, Ðorđe Žikelić
IST Austria, Klosterneuburg, Austria
{tah, mlechner, dzikelic}@ist.ac.at

Formal verification of neural networks is an active topic of research, and recent advances have significantly increased the size of the networks that verification tools can handle. However, most methods are designed for verification of an idealized model of the actual network, which works over real arithmetic and ignores rounding imprecisions. This idealization is in stark contrast to network quantization, a technique that trades numerical precision for computational efficiency and is, therefore, often applied in practice. Neglecting rounding errors of such low-bit quantized neural networks has been shown to lead to wrong conclusions about the network's correctness. Thus, the desired approach for verifying quantized neural networks would be one that takes these rounding errors into account. In this paper, we show that verifying the bit-exact implementation of quantized neural networks with bit-vector specifications is PSPACE-hard, even though verifying idealized real-valued networks and satisfiability of bit-vector specifications alone are each in NP. Furthermore, we explore several practical heuristics toward closing the complexity gap between idealized and bit-exact verification. In particular, we propose three techniques for making SMT-based verification of quantized neural networks more scalable. Our experiments demonstrate that our proposed methods allow a speedup of up to three orders of magnitude over existing approaches.

## Introduction

Deep neural networks for image classification typically consist of a large number of sequentially composed layers. Computing the output of such a network for a single input sample may require more than a billion floating-point operations (Tan and Le 2019). Consequently, deploying a trained deep neural network imposes demanding requirements on the computational resources available at the computing device that runs the network. Quantization of neural networks is a technique that reduces the computational cost of running a neural network by reducing the arithmetic precision of the computations inside the network (Jacob et al. 2018). As a result, quantization has been widely adopted in industry for deploying neural networks in a resource-friendly way. For instance, Tesla's Autopilot Hardware 3.0 is designed primarily for running 8-bit quantized neural networks (https://en.wikichip.org/wiki/tesla_(car_company)/fsd_chip).

The verification problem for neural networks consists of checking validity of some input-output relation. More precisely, given two conditions over the inputs and outputs of the network, the goal is to check whether for every input sample that satisfies the input condition, the corresponding output of the neural network satisfies the output condition. Verification of neural networks has many important practical applications, such as checking robustness to adversarial attacks (Szegedy et al. 2013; Tjeng, Xiao, and Tedrake 2019), proving safety in safety-critical applications (Huang et al. 2017), or output range analysis (Dutta, Chen, and Sankaranarayanan 2019), to name a few. There are many efficient methods for verification of neural networks (e.g., Katz et al. 2017; Tjeng, Xiao, and Tedrake 2019; Bunel et al.
2018); however, most of them ignore rounding errors in computations. The few approaches that can handle the semantics of rounding operations are overapproximation-based methods, i.e., incomplete verification (Singh et al. 2018, 2019).

The imprecision introduced by quantization stands in stark contrast with the idealization made by verification methods for standard neural networks, which disregard the rounding errors that arise from the network's semantics. Consequently, verification methods developed for standard networks are not sound for and cannot be applied to quantized neural networks. Indeed, it has recently been shown that specifications that hold for a floating-point representation of a network need not necessarily hold after quantizing the network (Giacobbe, Henzinger, and Lechner 2020). As a result, specialized verification methods that take quantization into account need to be developed, due to the more complex semantics of quantized neural networks. Groundwork on such methods demonstrated that special encodings of networks in terms of Satisfiability Modulo Theories (SMT) (Clark and Cesare 2018) with bit-vector (Giacobbe, Henzinger, and Lechner 2020) or fixed-point (Baranowski et al. 2020) theories present a promising approach towards the verification of quantized networks. However, the size of networks that these tools can handle and the runtimes of these approaches do not match the efficiency of advanced verification methods developed for standard networks, such as Reluplex (Katz et al. 2017) and Neurify (Wang et al. 2018a).

In this paper, we provide the first evidence that the verification problem for quantized neural networks is harder than verification of their idealized counterparts, thus explaining the scalability gap between existing methods for standard and quantized network verification. In particular, we show that verifying quantized neural networks with bit-vector specifications is PSPACE-hard, despite the satisfiability problem of formulas in the given specification logic being in NP. As verification of neural networks without quantization is known to be NP-complete (Katz et al. 2017), this implies that the verification of quantized neural networks is a harder problem.

We then address the scalability limitation of SMT-based methods for verification of quantized neural networks and propose three techniques for their more efficient SMT encoding. First, we introduce a technique for identifying those variables and constraints whose value can be determined in advance, thus decreasing the size of SMT encodings of networks. Second, we show how to encode variables as bit-vectors of minimal necessary bit-width. This significantly reduces the size of the bit-vector encoding of networks in (Giacobbe, Henzinger, and Lechner 2020). Third, we propose a redundancy elimination heuristic which exploits bit-level redundancies occurring in the semantics of the network. Finally, we propose a new method for the analysis of the quantized network's reachable value range, which is based on abstract interpretation and assists our new techniques for SMT encoding of quantized networks. We evaluate our approach on two well-studied adversarial robustness verification benchmarks. Our evaluation demonstrates that the combined effect of our techniques is a speed-up of over three orders of magnitude compared to existing tools.
The rest of this work is organized as follows: First, we provide background and discuss related work on the verification of neural networks and quantized neural networks. We then start with our contribution by showing that the verification problem for quantized neural networks with bit-vector specifications is PSPACE-hard. In the following section, we propose several improvements to the existing SMT encodings of quantized neural networks. Finally, we present our experimental evaluation to assess the performance impact of our techniques.

## Background and Related Work

A neural network is a function $f : \mathbb{R}^n \to \mathbb{R}^m$ that consists of several sequentially composed layers $f = l_1 \circ l_2 \circ \cdots \circ l_k$, with each layer parameterized by learned weight values. Commonly found types of layers are linear layers

$$l(x) = Wx + b, \quad W \in \mathbb{R}^{n_o \times n_i}, \; b \in \mathbb{R}^{n_o}, \qquad (1)$$

ReLU layers $l(x) = \max\{x, 0\}$, and convolutional layers (LeCun et al. 1998).

In practice, the function $f$ is implemented with floating-point arithmetic instead of real-valued computations. To distinguish a neural network from its approximation, we define an interpretation $[\![f]\!]$ as a map which assigns a new function to each network, i.e.,

$$[\![\cdot]\!] : (\mathbb{R}^n \to \mathbb{R}^m) \to (D \to \mathbb{R}^m), \qquad (2)$$

where $D \subseteq \mathbb{R}^n$ is the admissible input domain. For instance, we denote by $[\![f]\!]_{\mathbb{R}} : f \mapsto f$ the idealized real-valued abstraction of a network $f$, whereas $[\![f]\!]_{\text{float32}}$ denotes its floating-point implementation, i.e., the realization of $f$ using 32-bit IEEE floating-point arithmetic (Kahan 1996) instead of real arithmetic.

Evaluating $f$, even under the floating-point interpretation, can be costly in terms of computation and memory resources. In order to reduce these resource requirements, networks are usually quantized before being deployed to end devices (Jacob et al. 2018). Formally, quantization is an interpretation $[\![f]\!]_{\text{int-}k}$ that evaluates a network $f$ using $k$-bit fixed-point arithmetic (Smith et al. 1997), e.g., 4 to 8 bits. Let $[\mathbb{Z}]_k = \{0,1\}^k$ denote the set of all bit-vectors of bit-width $k$. For each layer $l : [\mathbb{Z}]_k^{n_i} \to [\mathbb{Z}]_k^{n_o}$ in $[\![f]\!]_{\text{int-}k}$, we define its semantics $l(x_1, \ldots, x_{n_i}) = (y_1, \ldots, y_{n_o})$ as follows:

$$x'_i = \sum_{j=1}^{n_i} w_{ij} x_j + b_i, \qquad (3)$$

$$x''_i = \mathrm{round}(x'_i, k_i) = \left\lfloor x'_i \cdot 2^{-k_i} \right\rfloor, \quad \text{and} \qquad (4)$$

$$y_i = \max\{0, \min\{2^{N_i} - 1, x''_i\}\}. \qquad (5)$$

Here, $w_{ij}$ and $b_i$ for each $1 \le j \le n_i$ and $1 \le i \le n_o$ denote the learned weights and biases of $f$, and $k_i$ and $N_i$ denote the bit-shift and the cut-off value associated to each variable $y_i$, respectively. Eq. (3) multiplies the inputs $x_j$ with the weight values $w_{ij}$ and adds the bias $b_i$, eq. (4) rounds the result to the nearest valid $k$-bit fixed-point value, and eq. (5) is a non-linear ReLU-N activation function (for quantized neural networks, the double-sided bounded ReLU-N activation is preferred over the standard ReLU activation function (Jacob et al. 2018)). An illustration of how the computations inside a network differ based on the used interpretation is shown in Fig. 1.

### Verification of Neural Networks

The verification problem for a neural network and a given interpretation consists of verifying some input-output relation. More formally, given a neural network $f$, its interpretation $[\![f]\!]$, and two predicates $\varphi$ and $\psi$ over the input domain $D$ and output domain $\mathbb{R}^m$ of $[\![f]\!]$, we want to check validity of the following formula (i.e., whether it holds for each $x \in D$):

$$\varphi(x) \wedge [\![f]\!](x) = y \implies \psi(y). \qquad (6)$$

We refer to the formula in eq. (6) as the formal specification that needs to be proved. In order to formally verify a neural network, it is insufficient to just specify the network without also providing a particular interpretation. A property that holds with respect to one interpretation need not necessarily remain true if we consider a different interpretation.
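Before moving to concrete interpretations, the following minimal Python sketch makes the fixed-point semantics of eqs. (3)–(5) concrete by evaluating one layer of $[\![f]\!]_{\text{int-}k}$. It assumes the rounding of eq. (4) is realized as an arithmetic right shift (floor-style rounding), and it uses a single bit-shift and cut-off value for the whole layer, whereas the paper allows per-neuron values; the helper name and toy weights are illustrative only, not part of the paper's tooling.

```python
import numpy as np

def quantized_layer(x, W, b, shift, N):
    """Evaluate one layer of [[f]]_int-k following eqs. (3)-(5).

    x     : integer input activations (one value per input neuron)
    W, b  : integer weights and biases of the layer
    shift : bit-shift k_i, applied here as an arithmetic right shift
    N     : cut-off bit-width N_i of the ReLU-N activation
    """
    x = np.asarray(x, dtype=np.int64)
    acc = W.astype(np.int64) @ x + b.astype(np.int64)   # eq. (3): wide accumulator
    rounded = acc >> shift                               # eq. (4): floor-style rounding
    return np.clip(rounded, 0, (1 << N) - 1)             # eq. (5): ReLU-N clamp to [0, 2^N - 1]

# Tiny usage example with made-up 6-bit weights.
W = np.array([[3, -6], [2, 5]])
b = np.array([4, -1])
y = quantized_layer([7, 9], W, b, shift=2, N=6)
print(y)  # each output lies in {0, ..., 63}
```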
For example, robustness of the real-valued abstraction does not imply robustness of the floating-point implementation of a network (Jia and Rinard 2020). Ideally, we would like to verify neural networks under the exact semantics used for running the network on the end device, i.e., $[\![f]\!]_{\text{float32}}$ most of the time. However, as verification methods for IEEE floating-point arithmetic are extremely inefficient, research has focused on verifying the idealized real-valued abstraction $[\![f]\!]_{\mathbb{R}}$ of $f$. In particular, efficient methods have been developed for a popular type of networks that only consist of linear and ReLU operations (Figure 2a) (Katz et al. 2017; Ehlers 2017; Tjeng, Xiao, and Tedrake 2019; Bunel et al. 2018). The piecewise linearity of such ReLU networks allows the use of Linear Programming (LP) techniques, which make the verification methods more efficient. The underlying verification problem of ReLU networks with linear inequality specifications was shown to be NP-complete in the number of ReLU operations (Katz et al. 2017); however, advanced tools scale beyond toy networks.

Figure 1: Illustration of how different interpretations of the same network run with different numerical precision. A) $[\![f]\!]_{\mathbb{R}}$ assumes infinite precision. B) $[\![f]\!]_{\text{float32}}$ rounds the mantissa according to the IEEE 754 standard. C) $[\![f]\!]_{\text{int-8}}$ rounds to a fixed number of digits before and after the decimal point. (Note that this figure serves as a hypothetical example in decimal format.)

Although these methods can handle networks of large size, they build on the assumption that

$$[\![f]\!]_{\text{float32}} \approx [\![f]\!]_{\mathbb{R}}, \qquad (7)$$

i.e., that the rounding errors introduced by the IEEE floating-point arithmetic of both the network and the verification algorithm can be neglected. It has recently been shown that this need not always be true. For example, Jia and Rinard (Jia and Rinard 2020) crafted adversarial counterexamples to the floating-point implementation of a neural network whose idealized interpretation was verified to be robust against such attacks, by exploiting subtle numerical differences between $[\![f]\!]_{\text{float32}}$ and $[\![f]\!]_{\mathbb{R}}$.

Figure 2: Illustration of a) the ReLU activation function under real-valued semantics, $y = [\![\mathrm{ReLU}(x)]\!]_{\mathbb{R}}$, and b) the ReLU-N activation under fixed-point semantics, $y = [\![\mathrm{ReLU}\text{-}N(x)]\!]_{\text{int-}k}$.

### Verification of Quantized Neural Networks

The low numerical precision of few-bit fixed-point arithmetic implies that $[\![f]\!]_{\text{int-}k} \neq [\![f]\!]_{\mathbb{R}}$. Indeed, (Giacobbe, Henzinger, and Lechner 2020) constructed a prototypical network that either satisfies or violates a formal specification depending on the numerical precision used to evaluate the network. Moreover, they observed such discrepancies in networks found in practice. Thus, no formal guarantee on $[\![f]\!]_{\text{int-}k}$ can be obtained by verifying $[\![f]\!]_{\mathbb{R}}$ or $[\![f]\!]_{\text{float32}}$. In order to verify fixed-point implementations of (i.e., quantized) neural networks, new approaches are required. Fig. 2 depicts the ReLU activation function for idealized real-valued ReLU networks and for quantized ReLU networks, respectively.
The activation function under fixed-point semantics consists of an exponential number of piecewise-constant intervals, thus making the LP-based techniques, which otherwise work well for real-valued networks, extremely inefficient. Consequently, the approaches developed for idealized real-valued ReLU networks cannot be efficiently applied to quantized networks.

Existing verification methods for quantized neural networks are based on bit-exact Boolean Satisfiability (SAT) and SMT encodings. For 1-bit networks, i.e., binarized neural networks, Narodytska et al. (Narodytska et al. 2018) and (Cheng et al. 2018) proposed to encode the network semantics and the formal specification into a SAT formula, which is then checked by an off-the-shelf SAT solver. While their approach could handle networks of decent size, the use of SAT solving is limited to binarized networks, which are not very common in practice. (Giacobbe, Henzinger, and Lechner 2020) proposed to verify many-bit quantized neural networks by encoding their semantics and specifications into quantifier-free bit-vector SMT (QF_BV) formulas. The authors showed that, by reordering the linear summations inside the network, such monolithic bit-vector SMT encodings could scale to the verification of small but non-trivially sized networks. (Baranowski et al. 2020) introduced an SMT theory for fixed-point arithmetic and showed that the semantics of quantized neural networks can be encoded in this theory very naturally. However, as the authors only proposed prototype solvers for reference purposes, the size of the verified networks was limited.

### Limitations of Neural Network Verification

The existing techniques for verification of idealized real-valued abstractions of neural networks have significantly increased the size of networks that can be verified (Ehlers 2017; Katz et al. 2017; Bunel et al. 2018; Tjeng, Xiao, and Tedrake 2019). However, scalability remains the key challenge hindering formal verification of neural networks in practice. For instance, even the largest networks verified by existing methods (Ruan, Huang, and Kwiatkowska 2018) are tiny compared to the network architectures used for object detection and image classification (He et al. 2016). Regarding the verification of quantized neural networks, no advanced techniques aiming at performance improvements have been studied so far. In this paper, we address the scalability of quantized neural network verification methods that rely on SMT solving.

## Hardness of Verification of Quantized Neural Networks

The size of quantized neural networks that existing verification methods can handle is significantly smaller than that of the real-arithmetic networks that can be verified by state-of-the-art tools (Katz et al. 2017; Tjeng, Xiao, and Tedrake 2019; Bunel et al. 2018). Thus, a natural question is whether this gap in scalability is only because existing methods for quantized neural networks are less efficient, or whether the verification problem for quantized neural networks is computationally harder. In this section, we study the computational complexity of the verification problem for quantized neural networks. For the idealized real-arithmetic interpretation of neural networks, it was shown in (Katz et al. 2017) that, if the predicates on inputs and outputs are given as conjunctions of linear inequalities, then the problem is NP-complete.
The fact that the problem is NP-hard is established by reduction from 3-SAT, and the same argument can be used to show that the verification problem for quantized neural networks is also NP-hard. In this work, we argue that the verification problem for quantized neural networks with bit-vector specifications is in fact PSPACE-hard, and thus harder than verifying real-arithmetic neural networks. Moreover, we show that this holds even for the special case when there are no constraints on the inputs of the network, i.e., when the predicate on inputs is assumed to be a tautology.

The verification problem for a quantized neural network $f$ that we consider consists of checking validity of a given input-output relation formula $[\![f]\!]_{\text{int-}k}(x) = y \implies \psi(y)$. Here, $[\![f]\!]_{\text{int-}k}$ is the $k$-bit fixed-point arithmetic interpretation of $f$, and $\psi$ is a predicate in some specification logic over the outputs of $[\![f]\!]_{\text{int-}k}$. Equivalently, we may also check satisfiability of the dual formula

$$[\![f]\!]_{\text{int-}k}(x) = y \,\wedge\, \neg\psi(y). \qquad (8)$$

In order to study the complexity of the verification problem, we also need to specify the specification logic to which the formula $\psi$ belongs. In this work, we study hardness with respect to the fragment QF_BV2bw of the fixed-size bit-vector logic QF_BV2 (Kovásznai, Fröhlich, and Biere 2016). The fragment QF_BV2bw allows bit-wise logical operations (such as bit-wise conjunction, disjunction, and negation) and the equality operator. The index 2 in QF_BV2bw denotes that the constants and bit-widths are given in binary representation. It was shown in (Kovásznai, Fröhlich, and Biere 2016) that the satisfiability problem for formulas in QF_BV2bw is NP-complete.

Even though QF_BV2bw itself allows only bit-vector operations and not linear integer arithmetic, we show that by introducing dummy output variables in $[\![f]\!]_{\text{int-}k}$ we may still encode formal specifications on outputs that are Boolean combinations of linear inequalities over the network's outputs. Thus, this specification logic is sufficiently expressive to encode the formal specifications most often seen in practice. Let $y_1, \ldots, y_m$ denote the output variables of $[\![f]\!]_{\text{int-}k}$. In order to encode an inequality of the form $a_1 y_1 + \cdots + a_m y_m + b \ge 0$ into the output specification, we do the following: Introduce an additional output neuron $y$ and a directed edge from each output neuron $y_i$ to $y$. Let $a_i$ be the weight of the edge from $y_i$ to $y$, $b$ the bias term of $y$, $k-1$ the bit-shift value of $y$, and $N = k$ the number of bits defining the cut-off value of $y$. Then $y = \mathrm{ReLU}\text{-}N(\mathrm{round}(2^{-(k-1)}(a_1 y_1 + \cdots + a_m y_m + b)))$. Thus, as we work with bit-vectors of bit-width $k$, $y$ is just the sign bit of $a_1 y_1 + \cdots + a_m y_m + b$ preceded by zeros. As $a_1 y_1 + \cdots + a_m y_m + b \ge 0$ holds if and only if the sign bit of $a_1 y_1 + \cdots + a_m y_m + b$ is 0, in order to encode the inequality into the output specification it suffices to encode that $y = 0$, which is a formula expressible in QF_BV2bw. By doing this for each linear inequality in the specification, and since logical operations are allowed by QF_BV2bw, it follows that we may use QF_BV2bw to encode Boolean combinations of linear inequalities over outputs as the formal specifications to be verified.

Our main result in this section is that, if $\psi$ in eq. (8) is assumed to be a formula in QF_BV2bw, then the verification problem for quantized neural networks is PSPACE-hard. Since checking satisfiability of $\psi$ can be done in nondeterministic polynomial time, this means that the additional hardness really comes from the quantized neural networks.
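The sign-bit argument above admits a quick sanity check in plain Python: for any integer $s$ that fits into $w$ bits in two's complement, $s \ge 0$ holds exactly when bit $w-1$ of its representation is 0. The helper below only illustrates this fact; it is not part of the paper's gadget construction.

```python
def sign_bit(s, w):
    # Arithmetic right shift exposes the two's complement sign bit.
    return (s >> (w - 1)) & 1

w = 16
for s in (-300, -1, 0, 1, 300):
    assert (s >= 0) == (sign_bit(s, w) == 0)
print("sign bit is 0 exactly for the non-negative sums")
```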
Theorem 1 (Complexity of verification of QNNs). If the predicate on outputs is assumed to be a formula in QF_BV2bw, the verification problem for quantized neural networks is PSPACE-hard.

Proof sketch. Here we summarize the key ideas of our proof; for the complete proof, see the technical report. To prove PSPACE-hardness, we exhibit a reduction from TQBF, which is known to be PSPACE-complete (Arora and Barak 2009). TQBF is the problem of deciding whether a quantified Boolean formula (QBF) of the form $Q_1 x_1.\, Q_2 x_2. \ldots Q_n x_n.\ \varphi(x_1, x_2, \ldots, x_n)$ is true, where each $Q_i \in \{\forall, \exists\}$ and $\varphi$ is a quantifier-free formula in propositional logic over the variables $x_1, \ldots, x_n$. A QBF formula is true if it admits a truth table for each existentially quantified variable $x_i$, where the truth table for $x_i$ specifies a value in $\{0, 1\}$ for each valuation of those universally quantified variables $x_j$ on which $x_i$ depends (i.e., $x_j$ with $j < i$). Thus, the size of each truth table is at most $2^k$, where $k$ is the total number of universally quantified variables in the formula.

In our reduction, given an instance $Q_1 x_1.\, Q_2 x_2. \ldots Q_n x_n.\ \varphi(x_1, x_2, \ldots, x_n)$ of the TQBF problem, we map it to the corresponding verification problem as follows. The interpretation $[\![f]\!]_{\text{int-}k}$ of the neural network $f$ consists of $n+1$ disjoint gadgets $f_1, \ldots, f_n, g$, each having a single input and a single output neuron of bit-width $2^k$. Note that bit-widths are given in binary representation, thus this is still polynomial in the size of the problem. We use these gadgets to encode all possible inputs to the QBF formula, whereas the postcondition in the verification problem encodes the quantifier-free formula itself. For a universally quantified variable $x_i$, the output of $f_i$ is always a constant vector encoding the values of $x_i$ in each of the $2^k$ valuations of the universally quantified variables (for a fixed ordering of the valuations). For an existentially quantified $x_i$, we use $f_i$ and its input neuron to encode $2^k$ possible choices for the value of $x_i$, one for each valuation of the universally quantified variables, and thus to encode the truth table for $x_i$. Finally, the gadget $g$ is used to return the constant all-ones bit-vector $\mathbf{1}$ of bit-width $2^k$ on any possible input. The predicate $\psi$ on the outputs is then defined as $\psi := (\varphi_{\mathrm{bw}}(y_1, \ldots, y_n) = \mathbf{1})$, where $\varphi_{\mathrm{bw}}$ is the quantifier-free formula in QF_BV2bw identical to $\varphi$, with the only difference being that the inputs of $\varphi_{\mathrm{bw}}$ are bit-vectors of bit-width $2^k$ instead of Boolean variables, and the logical operations are also defined over bit-vectors (again, since bit-widths are encoded in binary representation, this is of polynomial size). The output of $\varphi_{\mathrm{bw}}$ is thus tested for equality with $\mathbf{1}$, i.e., for each valuation of the universally quantified variables and the corresponding values of the existentially quantified variables defined by the truth tables. Our construction ensures that any satisfying input for the neural network induces satisfying truth tables for the TQBF instance and vice versa, which completes the reduction.

Theorem 1 is, to the best of our knowledge, the first theoretical result indicating that the verification problem for quantized neural networks is harder than verifying their idealized real-arithmetic counterparts. It sheds some light on the scalability gap of existing SMT-based methods for their verification, and shows that this gap is not solely due to practical inefficiency of existing methods for quantized neural networks, but also due to the fact that the problem is computationally harder.
While Theorem 1 gives a lower bound on the hardness of verifying quantized neural networks, it is easy to see that an upper bound on the complexity of this problem is NEXP, since the inputs to the verification problem can be of size exponential in the size of the problem. Closing the gap and identifying tight complexity bounds is an interesting direction of future work. Note, though, that the specification logic QF_BV2bw used to encode predicates over outputs is strictly more expressive than what we need to express Boolean combinations of linear integer inequalities, which is the most common form of formal specifications seen in practice. This is because QF_BV2bw also allows logical operations over bit-vectors, and not just over single bits. Nevertheless, our result presents a first step towards understanding the computational hardness of the quantized neural network verification problem.

## Improvements to Bit-Vector SMT-Encodings

In this section, we study efficient SMT encodings of quantized neural networks that improve the scalability of their verification methods. In particular, we propose three simplifications to the monolithic SMT encoding of eqs. (3), (4), and (5) introduced in (Giacobbe, Henzinger, and Lechner 2020), which encodes quantized neural networks and formal specifications as formulas in the QF_BV2 logic:

I) Remove dead branches of the If-Then-Else encoding of the activation function in eq. (5), i.e., branches that are guaranteed to never be taken;
II) Allocate only the minimal number of bits for each bit-vector variable in the formula; and
III) Eliminate sub-expressions from the summation in eq. (3).

To obtain the information needed by techniques I and II, we further propose an abstract interpretation framework for quantized neural networks.

### Abstract Interpretation Analysis

Abstract interpretation (Cousot and Cousot 1977) is a technique for constructing over-approximations of the behavior of a system. Initially developed for software verification, the method has recently been adapted to robustness verification of neural networks and is used to over-approximate the output range of variables in the network. Instead of considering all possible subsets of real numbers, it only considers an abstract domain which consists of subsets of suitable form (e.g., intervals, boxes, or polyhedra). This allows modeling each operation in the network in terms of operations over the elements of the abstract domain, thus over-approximating the semantics of the network. While it leads to some imprecision, abstract interpretation allows more efficient output range analysis for variables. Due to its over-approximating nature, it remains sound for verifying neural networks. Interval (Wang et al. 2018b; Tjeng, Xiao, and Tedrake 2019), zonotope (Mirman, Gehr, and Vechev 2018; Singh et al. 2018), and convex polytope (Katz et al. 2017; Ehlers 2017; Bunel et al. 2018; Wang et al. 2018a) abstractions have emerged in the literature as efficient and yet precise choices for the abstract domains of real-valued neural networks. The obtained abstract domains have been used for output range analysis (Wang et al. 2018b), as well as for removing decision points from the search process of complete verification algorithms (Tjeng, Xiao, and Tedrake 2019; Katz et al. 2017). One important difference between standard and quantized networks is the use of double-sided bounded activation functions in quantized neural networks, i.e., ReLU-N instead of ReLU (Jacob et al. 2018).
This additional non-linear transition, on the one hand, renders linear abstractions less effective, while on the other hand it provides a hard upper bound for each neuron, which bounds the over-approximation error. Consequently, we adopt interval abstractions (IA) on the quantized interpretation of a network to obtain reachability sets for each neuron in the network. As discussed in (Tjeng, Xiao, and Tedrake 2019), using a tighter abstract interpretation poses a trade-off between verification and pre-processing complexity.

### Dead Branch Removal

Suppose that through our abstract interpretation we obtained an interval $[lb, ub]$ for the input $x$ of a ReLU-N operation $y = \mathrm{ReLU}\text{-}N(x)$. Then, we substitute the function by

$$y = \begin{cases} 0 & \text{if } ub \le 0, \\ 2^N - 1 & \text{if } lb \ge 2^N - 1, \\ x & \text{if } lb \ge 0 \text{ and } ub \le 2^N - 1, \\ \max\{0, x\} & \text{if } lb < 0 \text{ and } ub \le 2^N - 1, \\ \min\{2^N - 1, x\} & \text{if } lb \ge 0 \text{ and } ub > 2^N - 1, \\ \max\{0, \min\{2^N - 1, x\}\} & \text{otherwise,} \end{cases}$$

which reduces the number of decision points in the SMT formula.

### Minimum Bit Allocation

A $k$-bit quantized neural network represents each neuron and weight variable by a $k$-bit integer. However, when computing the values of certain types of layers, such as the linear layer in eq. (1), a wider register is necessary. The binary multiplication of a $k$-bit weight and a $k$-bit neuron value results in a number that is represented by $2k$ bits. Furthermore, summing up $n$ such $2k$-bit integers requires

$$b_{\text{naive}} = 2k + \lceil \log_2(n) \rceil + 1 \qquad (9)$$

bits to be safely represented without resulting in an overflow. QF_BV2 reasons over fixed-size bit-vectors, i.e., the bit-width of each variable must be fixed in the formula regardless of the variable's value. (Giacobbe, Henzinger, and Lechner 2020) showed that the number of bits used for the weight and neuron variables in the formula affects the runtime of the SMT-solver significantly. For example, omitting the least significant bit of each variable cuts the runtime on average by half. However, the SMT encoding of (Giacobbe, Henzinger, and Lechner 2020) allocates $b_{\text{naive}}$ bits according to eq. (9) for each accumulation variable of a linear layer. Our approach uses the interval $[lb, ub]$ obtained for each variable by abstract interpretation to compute the minimal number of bits necessary to express any value in the interval. As the signed bit-vector variables are represented in two's complement format, we can compute the bit-width of a variable $x$ with computed interval $[lb, ub]$ by

$$b_{\text{minimal}} = 1 + \lceil \log_2(\max\{|lb|, |ub|\} + 1) \rceil. \qquad (10)$$

For example, a variable whose interval is computed to be $[-200, 977]$ needs only $b_{\text{minimal}} = 1 + \lceil \log_2(978) \rceil = 11$ bits, independently of the layer width $n$. Trivially, one can show that $b_{\text{minimal}} < b_{\text{naive}}$, as $|ub| \le 2^{2k} n$ and $|lb| \le 2^{2k} n$.

### Redundant Multiplication Elimination

Another difference between quantized and standard neural networks is the rounding of the weight values to the nearest representable value of the employed fixed-point format. Consequently, there is a considerable chance that two connections outgoing from the same source neuron will have the same weight value. For instance, assuming an 8-bit network and a uniform weight distribution, the chance of two connections having the same weight value is around 0.4%, compared to the much lower $4 \cdot 10^{-8}\%$ for the same scenario in a floating-point network. Moreover, many weight values exhibit a subtle form of redundancy at the bit level. For instance, both multiplication by 2 and multiplication by 6 contain a shift operation by 1 digit in their binary representation. Thus, the computations

$$y_1 = 3 \cdot x_1, \qquad y_2 = 6 \cdot x_1 \qquad (11)$$

can be rewritten as

$$y_1 = 3 \cdot x_1, \qquad y_2 = y_1 \ll 1, \qquad (12)$$

where $\ll$ is a binary shift to the left by 1 digit. As a result, a multiplication by 6 is replaced by a much simpler shift operation.
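As a minimal illustration of this kind of rewriting, the sketch below scans the distinct outgoing weight magnitudes of a single neuron in ascending order and records, for each, whether its product can be obtained by left-shifting an already computed product. The power-of-two rule shown here is an assumed, simplified stand-in for the full rule set, which is deferred to the technical report.

```python
def factor_multiplications(weights):
    """For each distinct |weight|, plan either a fresh multiplication or a
    shift of an already planned product: ('mul', w) or ('shift', base, s)."""
    plan, done = {}, []
    for w in sorted({abs(v) for v in weights if v != 0}):
        rule = ("mul", w)
        for base in done:
            s, prod = 0, base
            while prod < w:                 # try whether base * 2^s == w
                prod, s = prod << 1, s + 1
            if prod == w and s > 0:
                rule = ("shift", base, s)   # w * x == (base * x) << s
                break
        plan[w] = rule
        done.append(w)
    return plan

# e.g. outgoing weights {3, 6, -12, 5} of one neuron (signs are assumed to be
# handled by a separate negation in the actual encoding):
print(factor_multiplications([3, 6, -12, 5]))
# -> {3: ('mul', 3), 5: ('mul', 5), 6: ('shift', 3, 1), 12: ('shift', 3, 2)}
```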
Based on this motivation, we propose a redundancy elimination heuristic to remove redundant and partially redundant multiplications from the SMT formula. Our heuristic first orders all outgoing weights of a neuron in ascending order and then sequentially applies a rule-matching for each weight value. The rules try to find a simpler way to compute the multiplication of the weight and the neuron value by reusing already performed multiplications. The algorithm and the rules in full are provided in the technical report. Note that a similar idea was introduced by (Cheng et al. 2018) in the form of a neuron factoring algorithm for the encoding of binarized (1-bit) neural networks into SAT formulas. However, the heuristic of (Cheng et al. 2018) removes redundant additions, whereas we consider bit-level redundancies in multiplications. For many-bit quantization, the probability of two neurons sharing more than one incoming weight is negligible, thus making the neuron factoring proposed in (Cheng et al. 2018) less effective.

## Experimental Evaluation

We create an experimental setup to evaluate how much the proposed techniques affect the runtime and efficiency of the SMT-solver. Our reference baseline is the approach of (Giacobbe, Henzinger, and Lechner 2020), which consists of a monolithic and "balanced" bit-vector formulation for the Boolector SMT-solver. We implement our techniques on top of this baseline. We limited our evaluation to Boolector, as other SMT-solvers supporting bit-vector theories, such as Z3 (De Moura and Bjørner 2008), CVC4 (Barrett et al. 2011), and Yices (Dutertre 2014), performed much worse in the evaluation of (Giacobbe, Henzinger, and Lechner 2020).

Our evaluation comprises two benchmarks. Our first evaluation considers the adversarial robustness verification of image classifiers trained on the MNIST dataset (LeCun et al. 1998). In particular, we check the $l_\infty$ robustness of networks against adversarial attacks (Szegedy et al. 2013). Other norms, such as $l_1$ and $l_2$, can be expressed in bit-vector SMT constraints as well, although with potentially negative effects on the solver runtime. In the second evaluation, we repeat the experiment on the slightly more complex Fashion-MNIST dataset (Xiao, Rasul, and Vollgraf 2017). All experiments are run on a 14-core Intel W-2175 CPU with 64GB of memory. We use Boolector (Niemetz, Preiner, and Biere 2015) with the SAT solvers Lingeling (Biere 2017) (baseline only) and CaDiCaL (Biere 2019) (baseline plus our improvements) as the SAT back-end.

| Attack radius | Baseline (+ Lingeling) | Baseline (+ CaDiCaL) | Ours |
|---|---|---|---|
| ε = 1 | 63 (63.6%) | 92 (92.9%) | 99 (100.0%) |
| ε = 2 | 0 (0.0%) | 20 (20.2%) | 94 (94.9%) |
| ε = 3 | 0 (0.0%) | 2 (2.1%) | 71 (74.0%) |
| ε = 4 | 0 (0.0%) | 1 (1.0%) | 54 (55.7%) |

Table 1: Number of solved instances of adversarial robustness verification on the MNIST dataset. Absolute numbers, with percentages of checked instances in parentheses.

| Dataset | Baseline (+ Lingeling) | Baseline (+ CaDiCaL) | Ours |
|---|---|---|---|
| MNIST | 8803 / 8789 | 2798 / 3931 | 5 / 90 |
| Fashion-MNIST | 6927 / 6927 | 3105 / 3474 | 4 / 49 |

Table 2: Median / mean runtime of the adversarial robustness verification process per sample. The reported values only account for non-timed-out samples.

The adversarial robustness specification can be expressed as

$$\|x - x_i\|_\infty \le \varepsilon \,\wedge\, [\![f]\!]_{\text{int-}k}(x) = y \implies y = y_i, \qquad (13)$$

where $(x_i, y_i)$ is a human-labeled test sample and $\varepsilon$ is a fixed attack radius. As shown in eq. (13), the space of possible attacks increases with $\varepsilon$. Consequently, we evaluate with different attack radii $\varepsilon$ and study the runtimes individually.
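To illustrate how a robustness query of this form reaches a bit-vector solver, the sketch below encodes eq. (13) for a toy one-layer network as a quantifier-free bit-vector (QF_BV) satisfiability problem, following the dual formulation of eq. (8). It uses Z3's Python API and made-up weights purely for illustration; the paper's actual encoding targets Boolector and additionally reorders ("balances") the summations, which is not reproduced here.

```python
from z3 import BitVec, BitVecVal, ZeroExt, If, Or, Solver, sat

K = 6        # activation bit-width of the quantized network
ACC = 20     # accumulator width, chosen wide enough to rule out overflow

def bv(v, w=ACC):
    return BitVecVal(int(v), w)

def relu_n(acc, shift, n_bits):
    """Round by arithmetic right shift, then clamp to [0, 2^n_bits - 1]."""
    r = acc >> shift                                   # z3's >> is an arithmetic shift
    lo, hi = bv(0), bv(2 ** n_bits - 1)
    return If(r < lo, lo, If(r > hi, hi, r))

def qlayer(xs, W, b, shift):
    """One quantized layer (eqs. (3)-(5)); xs are ACC-bit bit-vectors."""
    out = []
    for row, bias in zip(W, b):
        acc = bv(bias)
        for w, x in zip(row, xs):
            acc = acc + bv(w) * x
        out.append(relu_n(acc, shift, K))
    return out

# Toy 2-input, 3-class network and one labelled sample (x_i, y_i).
W, b, shift = [[3, -2], [1, 4], [-5, 2]], [1, 0, 2], 2
x_i, y_i, eps = [10, 20], 1, 2

s = Solver()
xs = [BitVec(f"x{j}", K) for j in range(len(x_i))]
xw = [ZeroExt(ACC - K, xj) for xj in xs]               # inputs are unsigned pixel values
for xj, ref in zip(xw, x_i):
    # input condition of eq. (13): x stays in the l_inf ball of radius eps around x_i
    s.add(xj >= bv(max(0, ref - eps)), xj <= bv(min(2 ** K - 1, ref + eps)))
logits = qlayer(xw, W, b, shift)
# dual query as in eq. (8): some other class scores at least as high as class y_i
s.add(Or([logits[c] >= logits[y_i] for c in range(len(logits)) if c != y_i]))
print("adversarial example found" if s.check() == sat else "verified robust")
```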
In particular, for MNIST we check the first 100 test samples with an attack radius of ε = 1, the next 100 test samples with ε = 2, and the next 200 test samples with ε = 3 and ε = 4, respectively. For our Fashion-MNIST evaluation, we reduce the number of samples to 50 per attack radius value for ε > 2 due to time and compute limitations. The network studied in our benchmark consists of four fully-connected layers (784, 64, 32, 10), resulting in 52,650 parameters in total. It was trained using a quantization-aware training scheme with 6-bit quantization. The results for the MNIST evaluation in terms of solved instances and median solver runtime are shown in Table 1 and Table 2, respectively. Table 3 and Table 2 show the results for the Fashion-MNIST benchmark. Both benchmark networks are publicly available (https://github.com/mlech26l/qnn_robustness_benchmarks).

| Attack radius | Baseline (+ Lingeling) | Baseline (+ CaDiCaL) | Ours |
|---|---|---|---|
| ε = 1 | 2 (2.3%) | 44 (50.6%) | 76 (87.4%) |
| ε = 2 | 0 (0.0%) | 7 (7.8%) | 73 (81.1%) |
| ε = 3 | 0 (0.0%) | 1 (2.3%) | 27 (62.8%) |
| ε = 4 | 0 (0.0%) | 0 (0.0%) | 18 (40.9%) |

Table 3: Number of solved instances of adversarial robustness verification on the Fashion-MNIST dataset. Absolute numbers, with percentages of checked instances in parentheses.

### Ablation Analysis

We perform an ablation analysis where we re-run our robustness evaluation with one of our proposed techniques disabled. The objective of our ablation analysis is to understand how the individual techniques contribute to the observed efficiency gains. Due to time and computational limitations, we focus our ablation experiments on MNIST exclusively. The results in Table 4 show that the highest number of solved instances is achieved when all our techniques are enabled. Nonetheless, Table 4 demonstrates that these gains are not equally distributed across the three techniques.

| Method | Total solved instances | Cumulative runtime |
|---|---|---|
| No redundancy elim. | 316 (80.8%) | 7.7 h |
| No minimum bitwidth | 315 (80.6%) | 5.1 h |
| No ReLU simplify | 88 (22.5%) | 83.2 h |
| No abstract interpret. | 107 (27.4%) | 126.0 h |
| All enabled | 318 (81.3%) | 7.9 h |

Table 4: Ablation analysis on the MNIST dataset. The cumulative runtime corresponds to non-timed-out samples.

In particular, the ReLU simplification has a much higher contribution to the gains than the redundancy elimination and minimum bitwidth methods. The limited benefits observed for these two techniques may be explained by the inner workings of the Boolector SMT-solver. Boolector (Niemetz, Preiner, and Biere 2015) is based on a portfolio approach which sequentially applies several different heuristics to find a satisfying assignment of the input formula (Wintersteiger, Hamadi, and De Moura 2009). In particular, Boolector starts with fast but incomplete local search heuristics and falls back to slower but complete bit-blasting (Clark and Cesare 2018) in case the incomplete search is unsuccessful (Niemetz, Preiner, and Biere 2019). Although our redundancy elimination and minimum bitwidth techniques simplify the bit-blasted representation of the encoding, they introduce additional dependencies between different bit-vector variables. As a result, we believe these extra dependencies make the local search heuristics of Boolector less effective, thus enabling only limited performance improvements.

## Conclusion

We show that the problem of verifying quantized neural networks with bit-vector specifications on the inputs and outputs of the network is PSPACE-hard. We tackle this challenging problem by proposing three techniques to make the SMT-based verification of quantized networks more efficient.
Our experiments show that our method outperforms existing tools by several orders of magnitude on adversarial robustness verification instances. Future work is necessary to explore the complexity of quantized neural network verification with respect to different specification logics. On the practical side, our methods point to limitations of monolithic SMT encodings for quantized neural network verification and suggest that future improvements may be obtained by integrating the encoding and the solver steps more tightly.

## Acknowledgments

This research was supported in part by the Austrian Science Fund (FWF) under grant Z211-N23 (Wittgenstein Award), ERC CoG 863818 (FoRM-SMArt), and the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 665385.

## References

Arora, S.; and Barak, B. 2009. Computational Complexity – A Modern Approach. Cambridge University Press. ISBN 978-0-521-42426-4. URL http://www.cambridge.org/catalogue/catalogue.asp?isbn=9780521424264.
Baranowski, M.; He, S.; Lechner, M.; Nguyen, T. S.; and Rakamaric, Z. 2020. An SMT Theory of Fixed-Point Arithmetic. In International Joint Conference on Automated Reasoning (IJCAR), 13–31.
Barrett, C.; Conway, C. L.; Deters, M.; Hadarean, L.; Jovanović, D.; King, T.; Reynolds, A.; and Tinelli, C. 2011. CVC4. In International Conference on Computer Aided Verification (CAV), 171–177. Springer.
Biere, A. 2017. CaDiCaL, Lingeling, Plingeling, Treengeling, YalSAT Entering the SAT Competition 2017. In Balyo, T.; Heule, M.; and Järvisalo, M., eds., Proc. of SAT Competition 2017 – Solver and Benchmark Descriptions, volume B-2017-1 of Department of Computer Science Series of Publications B, 14–15. University of Helsinki.
Biere, A. 2019. CaDiCaL at the SAT Race 2019. In Heule, M.; Järvisalo, M.; and Suda, M., eds., Proc. of SAT Race 2019 – Solver and Benchmark Descriptions, volume B-2019-1 of Department of Computer Science Series of Publications B, 8–9. University of Helsinki.
Bunel, R. R.; Turkaslan, I.; Torr, P.; Kohli, P.; and Mudigonda, P. K. 2018. A unified view of piecewise linear neural network verification. In Conference on Neural Information Processing Systems (NeurIPS), 4795–4804.
Cheng, C.-H.; Nührenberg, G.; Huang, C.-H.; and Ruess, H. 2018. Verification of Binarized Neural Networks via Inter-Neuron Factoring. In Working Conference on Verified Software: Theories, Tools, and Experiments (VSTTE), 279–290.
Clark, B.; and Cesare, T. 2018. Satisfiability Modulo Theories. In Clarke, E. M.; Henzinger, T. A.; Veith, H.; and Bloem, R., eds., Handbook of Model Checking, volume 10, chapter 11. Springer.
Cousot, P.; and Cousot, R. 1977. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL), 238–252.
De Moura, L.; and Bjørner, N. 2008. Z3: An efficient SMT solver. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, 337–340. Springer.
Dutertre, B. 2014. Yices 2.2. In International Conference on Computer Aided Verification (CAV), 737–744. Springer.
Dutta, S.; Chen, X.; and Sankaranarayanan, S. 2019. Reachability analysis for neural feedback systems using regressive polynomial rule inference. In International Conference on Hybrid Systems: Computation and Control (HSCC), 157–168.
Ehlers, R. 2017. Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis (ATVA), 269–286.
Giacobbe, M.; Henzinger, T. A.; and Lechner, M. 2020. How Many Bits Does it Take to Quantize Your Neural Network? In International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 79–97.
He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–778.
Huang, X.; Kwiatkowska, M.; Wang, S.; and Wu, M. 2017. Safety verification of deep neural networks. In International Conference on Computer Aided Verification (CAV), 3–29. Springer.
Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; and Kalenichenko, D. 2018. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2704–2713.
Jia, K.; and Rinard, M. 2020. Exploiting Verified Neural Networks via Floating Point Numerical Error. arXiv preprint arXiv:2003.03021.
Kahan, W. 1996. IEEE standard 754 for binary floating-point arithmetic. Lecture Notes on the Status of IEEE 754(94720-1776): 11.
Katz, G.; Barrett, C.; Dill, D. L.; Julian, K.; and Kochenderfer, M. J. 2017. Reluplex: An efficient SMT solver for verifying deep neural networks. In International Conference on Computer Aided Verification (CAV), 97–117.
Kovásznai, G.; Fröhlich, A.; and Biere, A. 2016. Complexity of fixed-size bit-vector logics. Theory of Computing Systems 59(2): 323–376.
LeCun, Y.; Bottou, L.; Bengio, Y.; and Haffner, P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11): 2278–2324.
Mirman, M.; Gehr, T.; and Vechev, M. 2018. Differentiable abstract interpretation for provably robust neural networks. In International Conference on Machine Learning (ICML), 3575–3583.
Narodytska, N.; Kasiviswanathan, S.; Ryzhyk, L.; Sagiv, M.; and Walsh, T. 2018. Verifying properties of binarized deep neural networks. In AAAI Conference on Artificial Intelligence (AAAI), 6615–6624.
Niemetz, A.; Preiner, M.; and Biere, A. 2015. Boolector 2.0 system description. Journal on Satisfiability, Boolean Modeling and Computation 9: 53–58.
Niemetz, A.; Preiner, M.; and Biere, A. 2019. Boolector at the SMT Competition 2019. Technical report, Stanford University and JKU Linz.
Ruan, W.; Huang, X.; and Kwiatkowska, M. 2018. Reachability analysis of deep neural networks with provable guarantees. In International Joint Conference on Artificial Intelligence (IJCAI), 2651–2659.
Singh, G.; Gehr, T.; Mirman, M.; Püschel, M.; and Vechev, M. 2018. Fast and effective robustness certification. In Conference on Neural Information Processing Systems (NeurIPS), 10825–10836.
Singh, G.; Gehr, T.; Püschel, M.; and Vechev, M. 2019. An abstract domain for certifying neural networks. Proceedings of the ACM on Programming Languages 3(POPL): 1–30.
Smith, S. W.; et al. 1997. The Scientist and Engineer's Guide to Digital Signal Processing, volume 14. California Technical Pub., San Diego.
Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; and Fergus, R. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
Tan, M.; and Le, Q. V. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning (ICML), 6105–6114.
Tjeng, V.; Xiao, K. Y.; and Tedrake, R. 2019. Evaluating robustness of neural networks with mixed integer programming. In International Conference on Learning Representations (ICLR).
Wang, S.; Pei, K.; Whitehouse, J.; Yang, J.; and Jana, S. 2018a. Efficient formal safety analysis of neural networks. In Conference on Neural Information Processing Systems (NeurIPS), 6369–6379.
Wang, S.; Pei, K.; Whitehouse, J.; Yang, J.; and Jana, S. 2018b. Formal security analysis of neural networks using symbolic intervals. In 27th USENIX Security Symposium (USENIX Security 18), 1599–1614.
Wintersteiger, C. M.; Hamadi, Y.; and De Moura, L. 2009. A concurrent portfolio approach to SMT solving. In International Conference on Computer Aided Verification (CAV), 715–720. Springer.
Xiao, H.; Rasul, K.; and Vollgraf, R. 2017. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.