# Nonparametric Inference under B-bits Quantization

Journal of Machine Learning Research 25 (2024) 1-68. Submitted 1/20; Revised 4/23; Published 1/24.

Kexuan Li (kexuan.li.77@gmail.com), Global Biometrics and Data Sciences, Bristol Myers Squibb, Princeton Pike, NJ 08648, USA
Ruiqi Liu (ruiqliu@ttu.edu), Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX 79409, USA
Ganggang Xu (gangxu@bus.miami.edu), Department of Management Science, University of Miami, Coral Gables, FL 33146, USA
Zuofeng Shang (zshang@njit.edu), Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, NJ 07102, USA

Editor: Rina Barber

Abstract

Statistical inference based on lossy or incomplete samples is often needed in research areas such as signal/image processing, medical image storage, remote sensing, and signal transmission. In this paper, we propose a nonparametric testing procedure based on samples quantized to B bits through a computationally efficient algorithm. Under mild technical conditions, we establish the asymptotic properties of the proposed test statistic and investigate how the testing power changes as B increases. In particular, we show that if B exceeds a certain threshold, the proposed nonparametric testing procedure achieves the classical minimax rate of testing (Shang and Cheng, 2015) for spline models. We further extend our theoretical investigations to a nonparametric linearity test and an adaptive nonparametric test, expanding the applicability of the proposed methods. Extensive simulation studies together with a real-data analysis are used to demonstrate the validity and effectiveness of the proposed tests.

Keywords: B-bits quantization, minimax rates of testing, nonparametric inference, smoothing splines

(c) 2024 Kexuan Li, Ruiqi Liu, Ganggang Xu and Zuofeng Shang. For correspondence, please contact Zuofeng Shang and Ganggang Xu. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v25/20-075.html.

1. Introduction

Lossy or incomplete data are commonly encountered in research areas such as machine learning, information theory, and signal processing. To store and process signals in digital devices, quantization is a popular procedure that maps the original measurements from a large (often uncountably infinite) set to a smaller set of possible values. The resulting values are referred to as the quantized samples. With the increasing availability of data, it is of great interest to quantify how data analysis is affected when the data are quantized due to a storage or communication budget constraint, and how to design quantization schemes that minimize the efficiency loss. Statistical inference based on quantized samples is challenging because, in addition to the measurement errors, one also needs to account for the information loss due to the quantization errors. In particular, commonly used standard statistical procedures may not be valid when applied to quantized samples if the quantization errors are ignored.

Research on lossy data has attracted increasing attention recently. The first line of work focuses on b-bit compressive sensing, which aims at reconstructing a sparse signal from a sequence of b-bit quantized outcomes. A 1-bit compressive sensing model was proposed by Boufounos and Baraniuk (2008), and several efficient and provable algorithms have been developed; see, e.g., Gupta et al. (2010); Gopi et al. (2013); Plan and Vershynin (2013); Zhang et al. (2014); Zhu and Gu (2015).
A signal recovery algorithm was proposed by Slawski and Li (2015), which extended the 1-bit compressive sensing model to a b-bit compressive sensing model. A second line of research related to lossy data develops statistical methods based on quantized observations. For example, Lee and Vardeman (2001) studied interval estimation of a normal process mean from rounded data, which was further extended to more general likelihood-based statistical estimation problems (Vardeman and Lee, 2005) and nonparametric regression problems (Benhenni and Rachdi, 2006). Recently, an increasing number of works aim to quantify the impact of quantization on the statistical properties of the resulting estimators. For example, Zhang et al. (2013) established lower bounds on the minimax risks for distributed estimation of parametric models under a communication budget constraint. Suresh et al. (2017) proposed communication-efficient algorithms for distributed mean estimation without probabilistic assumptions on the data. A version of Pinsker's theorem under storage or communication constraints was developed by Zhu and Lafferty (2014) and was further applied to analyze the convergence rate of nonparametric estimation with a limited bits budget by Zhu and Lafferty (2017). More recently, a series of works has investigated high-dimensional and/or nonparametric regression estimation in the distributed learning framework under bits constraints; see, e.g., Zhu and Lafferty (2018); Han et al. (2018); Szabo and van Zanten (2020); Cai and Wei (2021).

Despite the abundant existing literature on statistical modeling of quantized data, research focusing on nonparametric inference based on quantized data is still lacking. This paper aims to fill this gap by proposing a new quantization scheme with a B-bits storage or communication budget such that nonparametric estimation and testing based on the quantized samples remain valid. Specifically, we consider the regression model

$$y_i = g_0(i/n) + \sigma\epsilon_i, \qquad i = 1,\dots,n, \qquad (1)$$

where $g_0(\cdot)$ is a smooth function, the $\epsilon_i$'s are i.i.d. zero-mean errors with unit variance, and $\sigma > 0$ is an unknown constant. The goal is to (a) estimate $g_0(\cdot)$, and (b) test the hypothesis

$$H_0 : g_0(x) = g^*(x) \quad\text{for all } x\in[0,1], \qquad (2)$$

where $g^*(\cdot)$ is a pre-specified deterministic function.

The above model has been extensively studied in the literature (see, e.g., Shang and Cheng, 2017) and is closely related to the well-known Gaussian sequence model and Gaussian white noise model (Tsybakov, 2008). Unlike the existing literature, however, we consider the case in which the original data, denoted $y_1,\dots,y_n$, are generated in a machine $M_1$ and are quantized as soon as they are generated. The quantized data are then stored in $M_1$ or transmitted to another machine $M_2$ for future statistical inference. We assume that only a B-bits budget is available for data storage or communication, which necessitates a data quantization that may invalidate existing estimation and inference methods. Such a research problem is important for applications where data generation and analysis are carried out at different locations. For example, testing $H_0 : g_0(x) = 0$ reveals whether the quantized signals transmitted through a satellite are pure noise.
If $g^*(\cdot)$ is the signal process of a normally functioning machine, testing (2) using only quantized samples enables us to remotely monitor, in real time, whether the machine is working properly.

To meet the B-bits requirement, we propose a two-stage quantization procedure: in the first stage we quantize each individual $y_i$ as $Q(y_i)$, with $Q(\cdot)$ a quantizer, and in the second stage we overwrite these quantized observations by their local averages; see Figure 1 and Algorithm 1 for details. As a result, we obtain a quantized sample of size c, for some c < n, to be stored or transmitted. We demonstrate that, with a carefully chosen c and a well-designed quantizer, the proposed nonparametric estimation and testing procedures are asymptotically valid and efficient even when based only on the quantized data.

Our contributions can be summarized as follows. First, we propose a computationally efficient data quantization algorithm that reduces the size of the raw data to meet the B-bits constraint and, at the same time, reduces the computational complexity from $O(n^3)$ to $O(c^3)$. Second, we establish sufficient conditions on the bits constraint B that warrant the minimax convergence rate for the resulting spline estimators and the minimax rate of testing for the proposed testing procedure. In particular, our results show how the asymptotic power of the proposed testing procedure changes as the bits constraint B increases. Third, we further extend our theoretical investigations to (a) a nonparametric linearity test of the underlying function and (b) an adaptive nonparametric test for the case where the smoothness of the underlying function is unknown. To the best of our knowledge, our work is the first to provide a theoretical investigation of nonparametric inference based on quantized samples.

The rest of the paper is organized as follows. Section 2 describes the proposed general methodologies for data quantization, nonparametric estimation, and nonparametric testing using splines. In Section 3, we investigate the theoretical properties of the spline estimator and the nonparametric test statistic based on quantized samples. In Section 4, we study the asymptotic properties of the nonparametric linearity test statistic and the adaptive nonparametric test statistic under the B-bits constraint. Section 5 presents several simulation studies evaluating the finite-sample performance of the proposed methods, and Section 6 illustrates an application of the proposed methods to the Combined Cycle Power Plant Data.

Notation. Let $\|\cdot\|$ denote the $L_2$-norm, i.e., $\|f\|^2 = \int_0^1 f^2(t)\,dt$, and let $\|\cdot\|_2$ denote the Euclidean norm of a vector. Let $\|\cdot\|_{\sup}$ denote the supremum norm of a function, i.e., $\|f\|_{\sup} = \sup_{t\in[0,1]}|f(t)|$. For two positive sequences $a_n$ and $b_n$, we write $a_n \lesssim b_n$ ($a_n \gtrsim b_n$) if there exists a constant C > 0 such that $a_n \le C b_n$ ($a_n \ge C b_n$) for all $n\ge 1$; $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $a_n \gtrsim b_n$; $a_n \ll b_n$ if $a_n/b_n \to 0$ as $n\to\infty$; and $a_n \gg b_n$ if $a_n/b_n \to\infty$ as $n\to\infty$.

2. Methodology

In this section, we first review some background on classical smoothing spline regression and then give details of the proposed quantization scheme and the nonparametric estimation and testing procedures.

2.1 Review of Classical Smoothing Spline Regression

Throughout this paper, we assume that the underlying true function $g_0(\cdot)$ belongs to the m-order ($m\ge 1$) periodic Sobolev space on $I := [0,1]$, defined as

$$S^m(I) = \Big\{ g(\cdot) = \sum_{\nu=1}^\infty \beta_\nu \phi_\nu(\cdot) : \sum_{\nu=1}^\infty \beta_\nu^2 \gamma_\nu < \infty \Big\},$$

where $\phi_{2k-1}(x) = \sqrt{2}\cos(2\pi k x)$ and $\phi_{2k}(x) = \sqrt{2}\sin(2\pi k x)$ are the trigonometric basis functions, and $\gamma_{2k-1} = \gamma_{2k} = (2\pi k)^{2m}$, for $x\in I$ and $k\ge 1$.
It follows from Wahba (1990) and Gu (2013) that $S^m(I)$ is a reproducing kernel Hilbert space (RKHS) endowed with the inner product $J(f,g) = \int_0^1 f^{(m)}(x)\,g^{(m)}(x)\,dx$ and the reproducing kernel

$$K(x,y) = \frac{(-1)^{m-1}}{(2m)!}\,B_{2m}(|x-y|),$$

where $B_{2m}$ is the Bernoulli polynomial of order 2m. Based on the above assumptions on $g_0(\cdot)$, the classical smoothing spline (ss) estimator of $g_0(\cdot)$ is obtained from the optimization problem

$$\hat g^{ss} \in \arg\min_{g\in S^m(I)}\ \frac{1}{n}\sum_{i=1}^n \big(y_i - g(i/n)\big)^2 + \lambda \int_0^1 [g^{(m)}(x)]^2\,dx. \qquad (3)$$

For $x\in I$, define the function $K_x(\cdot) = K(x,\cdot)$, which belongs to $S^m(I)$. By the representer theorem (Gu, 2013), the solution to (3) has the closed form

$$\hat g^{ss} = \sum_{i=1}^n \theta_i K_{i/n}, \qquad (4)$$

where $\theta = (\theta_1,\dots,\theta_n)^T = n^{-1}(\Sigma_n+\lambda I_n)^{-1}y$, with $\Sigma_n = [K(i/n,i'/n)/n]_{1\le i,i'\le n}\in\mathbb{R}^{n\times n}$, $y = (y_1,\dots,y_n)^T\in\mathbb{R}^n$, and $I_n$ the $n\times n$ identity matrix. To conduct a hypothesis test for (2), a straightforward idea is to construct a test statistic based on the distance between $\hat g^{ss}(\cdot)$ and $g^*(\cdot)$; specifically, we use the squared $L_2$ distance $T^{ss} = \|\hat g^{ss} - g^*\|^2$. With an appropriate normalization, $T^{ss}$ can be shown to be asymptotically normally distributed (Shang and Cheng, 2017; Yang et al., 2020; Liu et al., 2021).

2.2 Two-Stage Quantization

The original observations $y_i$ in (1) are real-valued random variables, each of which literally requires an infinite number of bits to store or transmit. When only B bits are available, the original observations may not be directly accessible for estimation or testing, and hence the classical smoothing spline estimator in (4) is not applicable. This section introduces a two-stage quantization scheme that transforms the $y_i$'s into quantities whose storage or transmission meets the B-bits constraint. The resulting samples are used for optimal inference in the subsequent sections. The two-stage quantization process is illustrated in Figure 1.

[Figure 1: Two-stage quantization process.]

The first stage quantizes the data $y_i$, as soon as they are generated, to at most k distinct values. For convenience, we use the following uniform quantization scheme. We first choose an interval $[t_1, t_{k-1}]$ and take $t_2 < \cdots < t_{k-2}$ to be equally spaced grid points within $[t_1, t_{k-1}]$. Denote $t = (t_1,\dots,t_{k-1})^T\in\mathbb{R}^{k-1}$ and the sub-interval length $C_k(t) := (t_{k-1}-t_1)/(k-2)$. For ease of presentation, we assume that $l_1 = t_1/C_k(t)$ is an integer. Define a quantizer $Q(\cdot)$ as follows:

$$\text{First-stage quantization:}\quad Q(y) = \sum_{j=1}^k \mu_j\,\mathbb{I}[y\in R_j(t)], \quad \mu_j = (l_1+j-1)\,C_k(t), \quad\text{for any } y\in\mathbb{R}, \qquad (5)$$

where $\mu = (\mu_1,\dots,\mu_k)^T\in\mathbb{R}^k$ collects the quantized values and $R_1(t) = (-\infty,t_1]$, $R_2(t) = (t_1,t_2]$, ..., $R_{k-1}(t) = (t_{k-2},t_{k-1}]$, $R_k(t) = (t_{k-1},\infty)$ are the corresponding quantization intervals. Clearly, the $R_j(t)$'s form a partition of the real line with assigned marks $\mu_j$, and Q maps each $y\in\mathbb{R}$ to one of the k marks. Applying Q to the $y_i$'s, we generate n quantized samples $Q(y_1),\dots,Q(y_n)$, each taking at most k distinct values. Storing or transmitting the $Q(y_i)$'s therefore requires $n\log_2 k$ bits, which may still exceed the B-bits budget when $B = o(n)$. For this reason, we propose a second-stage quantization that further reduces the number of storage or transmission bits by locally averaging the $Q(y_i)$'s.
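To make the first-stage map concrete, the following is a minimal NumPy sketch of the uniform quantizer $Q(\cdot)$ in (5). The function name and the vectorized bin search are our own illustrative choices, and we assume, as in the text, that $t_1/C_k(t)$ is an integer.

```python
import numpy as np

def first_stage_quantizer(y, t1, tk1, k):
    """Uniform quantizer Q of (5): map y to one of k marks mu_1, ..., mu_k.

    The grid t1 < ... < t_{k-1} is equally spaced with bin width
    C_k(t) = (t_{k-1} - t1)/(k - 2); values in R_1 = (-inf, t1] get mark
    mu_1 = t1, and values in R_k = (t_{k-1}, inf) get mark mu_k.
    """
    Ck = (tk1 - t1) / (k - 2)                # sub-interval length C_k(t)
    l1 = int(round(t1 / Ck))                 # t1 / C_k(t), assumed integer
    # bin index j in {1, ..., k}: j = 1 iff y <= t1, j = k iff y > t_{k-1}
    j = np.clip(np.ceil((np.asarray(y) - t1) / Ck).astype(int) + 1, 1, k)
    return (l1 + j - 1) * Ck                 # mark mu_j = (l1 + j - 1) C_k(t)
```

Every output equals $l\,C_k(t)$ for some integer $l$, which is exactly the integer structure that the second-stage averaging in Algorithm 1 below exploits.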
The second-stage quantization proceeds as follows. We divide the interval $I = [0,1]$ into c equally spaced sub-intervals, for some $c \le B/\lceil\log_2(k+2)\rceil$. For simplicity, we assume that $\tilde e_n := n/c$ is an integer and that each sub-interval contains $\tilde e_n$ observations. The quantized data from the first-stage quantization, i.e., the $Q(y_i)$'s, are further quantized as $z_i = \tilde l_i\,C_k(t)$, where $\tilde l_i$ is the integer part of the local average:

$$\text{Second-stage quantization:}\quad z_i = \Big[\frac{1}{\tilde e_n}\sum_{j=(i-1)\tilde e_n+1}^{i\tilde e_n}\frac{Q(y_j)}{C_k(t)}\Big]\,C_k(t), \qquad i = 1,\dots,c, \qquad (6)$$

with $[\cdot]$ denoting the integer part. Details of the two-stage quantization are given in Algorithm 1, which writes $Q(y_j) = l_j\,C_k(t)$ and carries out the integer averaging on the fly.

Algorithm 1: Two-Stage Quantization
  Input: data $y_1,\dots,y_n$, generated sequentially.
  Initialization: set $\tilde l_1 = \cdots = \tilde l_c = 0$.
  For $i = 1,\dots,c$:
    set $\mathrm{tmp} = 0$;
    for $j = (i-1)\tilde e_n+1,\dots,i\tilde e_n$:
      (a) First-stage quantization (quantize on the spot): $Q(y_j) = l_j\,C_k(t)$;
      (b) Second-stage quantization (averaging):
        i.  update $\tilde l_i := \tilde l_i + [\,l_j/\tilde e_n\,]$ (integer part);
        ii. if $\mathrm{tmp} + \mathrm{sign}(l_j)(|l_j| \bmod \tilde e_n) \ge \tilde e_n$ then
              $\tilde l_i := \tilde l_i + 1$; $\mathrm{tmp} := \mathrm{tmp} + \mathrm{sign}(l_j)(|l_j| \bmod \tilde e_n) - \tilde e_n$;
            else if $\mathrm{tmp} + \mathrm{sign}(l_j)(|l_j| \bmod \tilde e_n) \le -\tilde e_n$ then
              $\tilde l_i := \tilde l_i - 1$; $\mathrm{tmp} := \mathrm{tmp} + \mathrm{sign}(l_j)(|l_j| \bmod \tilde e_n) + \tilde e_n$;
            else
              $\mathrm{tmp} := \mathrm{tmp} + \mathrm{sign}(l_j)(|l_j| \bmod \tilde e_n)$.
  Output: quantized data $z_1 = \tilde l_1\,C_k(t),\dots,z_c = \tilde l_c\,C_k(t)$.

Based on the definition of the quantizer $Q(\cdot)$ in (5), each $z_i$ in (6) must belong to the interval $[\mu_1 - C_k(t),\,\mu_k + C_k(t)]$ and must be of the form $l\,C_k(t)$ for some integer l. Therefore, the $z_i$'s take at most k+2 distinct values, namely $l\,C_k(t)$ for $l = l_1-1,\ l_1,\ l_1+1,\dots,l_1+k-1,\ l_1+k$. As a result, each $z_i$ requires $b = \lceil\log_2(k+2)\rceil$ bits to store or transmit; hence the entire set of $z_i$'s requires $c\lceil\log_2(k+2)\rceil \le B$ bits, where $\lceil x\rceil$ is the smallest integer greater than or equal to x. In the subsequent sections, we show that, with c and k properly selected, optimal inference based on the $z_i$'s is possible even when $B = o(n)$. For optimal inference in non-regression settings such as the Gaussian sequence model or the Gaussian white-noise model, similar findings were made by Cai and Wei (2021). We remark that although Algorithm 1 focuses on the case where each sub-interval of [0,1] has the same length and contains the same number of observations, it can easily be modified to more general cases without affecting the asymptotic properties; in particular, the simple averaging strategy can be replaced by a certain type of kernel smoothing.

2.3 B-bits Nonparametric Spline Estimation

Given B, we choose k and c such that $c\lceil\log_2(k+2)\rceil = B$, i.e., our two-stage quantization maximizes the use of the available bits. Based on the quantized samples $z_1,\dots,z_c$ from Algorithm 1, the B-bits constrained spline estimator is

$$\hat g^B_{\mu,t,c} \in \arg\min_{g\in S^m(I)}\ \frac{1}{c}\sum_{i=1}^c \big(z_i - g(i/c)\big)^2 + \lambda \int_0^1 [g^{(m)}(x)]^2\,dx. \qquad (7)$$

Similar to (4), the resulting spline estimator $\hat g^B_{\mu,t,c}(\cdot)$ has the explicit expression $\hat g^B_{\mu,t,c} = \sum_{i=1}^c \hat\theta_i K_{i/c}$, where $(\hat\theta_1,\dots,\hat\theta_c)^T = c^{-1}(\Sigma_c+\lambda I_c)^{-1}z$, with $\Sigma_c = [K(i/c,i'/c)/c]_{1\le i,i'\le c}\in\mathbb{R}^{c\times c}$, $z = (z_1,\dots,z_c)^T\in\mathbb{R}^c$, and $I_c$ the $c\times c$ identity matrix. Notice that the optimization in (7) involves only the c quantized observations, and the solution requires inverting only the $c\times c$ matrix $\Sigma_c+\lambda I_c$, which is much less computationally intensive than the classical smoothing spline estimator (4).
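For concreteness, here is a minimal NumPy sketch of the closed form above for the case m = 2, using the Bernoulli-polynomial kernel $K(x,y) = -B_4(|x-y|)/24$ from Section 2.1. The function names are ours, and the sketch takes z and λ as given.

```python
import numpy as np

def bernoulli_kernel_m2(x, y):
    """Reproducing kernel of S^2(I): K(x, y) = -B_4(|x - y|)/24,
    where B_4(u) = u^4 - 2u^3 + u^2 - 1/30 is the Bernoulli polynomial."""
    u = np.abs(np.asarray(x)[:, None] - np.asarray(y)[None, :])
    return -(u**4 - 2*u**3 + u**2 - 1.0/30) / 24

def bbits_spline(z, lam):
    """Closed-form B-bits spline estimator (7): returns g_hat as a callable."""
    c = len(z)
    s = np.arange(1, c + 1) / c                    # design points i/c
    Sigma_c = bernoulli_kernel_m2(s, s) / c        # Sigma_c = [K(i/c, i'/c)/c]
    theta = np.linalg.solve(Sigma_c + lam * np.eye(c), z) / c
    return lambda x: bernoulli_kernel_m2(np.atleast_1d(x), s) @ theta
```

The only linear-algebra cost is solving one c by c system, which is the $O(c^3)$ computation mentioned above. The sketch takes λ as given; its data-driven choice is discussed next.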
Finally, the selection of the tuning parameter λ is crucial; it can be chosen by minimizing the generalized cross-validation (GCV) score:

$$\hat\lambda = \arg\min_{\lambda>0}\ \frac{c\,\big\|[I_c - \Sigma_c(\Sigma_c+\lambda I_c)^{-1}]z\big\|_2^2}{\big[c - \mathrm{trace}\big(\Sigma_c(\Sigma_c+\lambda I_c)^{-1}\big)\big]^2}. \qquad (8)$$

GCV has been widely used in the literature and enjoys appealing theoretical properties in various settings; see, e.g., Wahba (1990); Xu and Huang (2012); Gu (2013); Xu et al. (2018, 2019).

2.4 B-bits Nonparametric Testing

In this section, we propose a test statistic for the null hypothesis (2) based on the B-bits spline estimator $\hat g^B_{\mu,t,c}(\cdot)$. Without loss of generality, we assume $g^*(\cdot)\equiv 0$ in the null hypothesis (2). For a nonzero $g^*(\cdot)$, the observed responses from model (1) can be centered as $y_i^* = y_i - g^*(i/n)$, and the same testing procedure applies to the $y_i^*$'s. To test $H_0 : g_0(x) = 0$, we consider a test statistic based on the squared $L_2$ distance between $\hat g^B_{\mu,t,c}(\cdot)$ and $g^*(\cdot)\equiv 0$:

$$T_{\mu,t,c} = \|\hat g^B_{\mu,t,c}\|^2. \qquad (9)$$

Intuitively, a large value of $T_{\mu,t,c}$ should lead to the rejection of $H_0$. In Theorem 3, we show that under $H_0$ and mild conditions,

$$\frac{c\,T_{\mu,t,c} - \mathrm{trace}(A)\,\tau_k^2}{s_c\,\tau_k^2} \xrightarrow{d} N(0,1), \qquad \text{as } n,c\to\infty,$$

where $\tau_k^2 = \mathrm{Var}(z_1\mid H_0)$, $A = (\Sigma_c+\lambda I_c)^{-1}\Omega_c(\Sigma_c+\lambda I_c)^{-1}$, $\Omega_c = [K^{(2)}(i/c,i'/c)/c]_{1\le i,i'\le c}$ with $K^{(2)}(x,x') := \int_0^1 K(x,y)K(y,x')\,dy$, and $s_c^2 = 2\sum_{1\le i\ne i'\le c} a_{i,i'}^2$ with $a_{i,i'}$ the $(i,i')$th entry of A. In practice, $\tau_k^2$ needs to be estimated from the quantized data as well. We propose the estimator

$$\hat\tau_k^2 = \frac{\tilde\tau_n^2}{2\,\tilde e_n\,(n-1)},$$

where $\tilde\tau_n^2$ is computed by quantization through Algorithm 2 below. Intuitively, this estimator is a re-scaled (by a factor of $\tilde e_n^{-1}$) version of the quantized sample error variance $\frac{1}{2(n-1)}\sum_{j=2}^n \{Q(y_j) - Q(y_{j-1})\}^2$. It is straightforward to show that $\hat\tau_k^2 = \tau_k^2[1 + o_p(1)]$ under mild conditions; see Lemma 11 of Appendix C for details. Consequently, the decision rule for testing (2) at significance level α is

$$\phi_{c,k} = \mathbb{I}\big(|c\,T_{\mu,t,c} - \mathrm{trace}(A)\,\hat\tau_k^2| \ge z_{1-\alpha/2}\, s_c\,\hat\tau_k^2\big), \qquad (10)$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of the standard normal distribution. We reject the null hypothesis (2) if and only if $\phi_{c,k} = 1$.

Algorithm 2: Quantized Estimation of Variance
  Input: data $y_1,\dots,y_n$, generated sequentially.
  Initialization: set $\tilde\tau_n^2 = \mathrm{tmp} = 0$.
  For $j = 1,\dots,n$:
    1. quantize on the spot: compute $Q(y_j)$;
    2. if $j = 1$ then update $\mathrm{tmp} := Q(y_1)$;
       else (a) update $\tilde\tau_n^2 := \tilde\tau_n^2 + (Q(y_j) - \mathrm{tmp})^2$; (b) update $\mathrm{tmp} := Q(y_j)$.
  Output: $\tilde\tau_n^2$.

By the design of the quantizer $Q(\cdot)$ in (5), each $\{Q(y_{j+1}) - Q(y_j)\}^2$, $1\le j\le n-1$, takes at most k+1 distinct values, ranging from $0\cdot C_k(t)^2$ to $k^2 C_k(t)^2$, so that $\tilde\tau_n^2$ ranges over $[0,\,(n-1)k^2 C_k(t)^2]$. Since $\tilde\tau_n^2$ can only take values of the form $l\,C_k(t)^2$ for some integer l, there are at most $(n-1)k^2+1$ distinct values for $\tilde\tau_n^2$, which cost $\lceil\log_2((n-1)k^2+1)\rceil$ bits to store or transmit. Compared with the bit cost of the two-stage quantization, $c\lceil\log_2(k+2)\rceil \le B$, the cost of storing or transmitting $\tilde\tau_n^2$ is negligible when $c\to\infty$, and is hence ignored in the total bit-cost calculation for ease of presentation.
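The following sketch assembles the pieces of this subsection: the rescaled variance estimate of Algorithm 2 and the decision rule (10). It reuses bernoulli_kernel_m2 from the earlier sketch, approximates $K^{(2)}$ by a Riemann sum on the same grid (our own shortcut, not the authors' implementation), and uses illustrative function names.

```python
import numpy as np
from scipy.stats import norm

def tau2_hat(Qy, e_n):
    """Algorithm 2 plus the rescaling of Section 2.4:
    tau_hat_k^2 = tilde_tau_n^2 / (2 * e_n * (n - 1)),
    with tilde_tau_n^2 = sum_j (Q(y_j) - Q(y_{j-1}))^2."""
    Qy = np.asarray(Qy)
    tilde_tau2_n = np.sum(np.diff(Qy) ** 2)
    return tilde_tau2_n / (2 * e_n * (len(Qy) - 1))

def bbits_test(z, lam, tau2, alpha=0.1, kernel=bernoulli_kernel_m2):
    """Decision rule (10) for H0: g0 = 0; returns (reject?, standardized stat)."""
    c = len(z)
    s = np.arange(1, c + 1) / c
    Kmat = kernel(s, s)
    Sigma_c = Kmat / c
    # Omega_c = [K2(i/c, i'/c)/c], K2(x, x') = int_0^1 K(x, u) K(u, x') du,
    # approximated by a Riemann sum on the grid {1/c, ..., 1}
    Omega_c = (Kmat @ Kmat / c) / c
    Minv = np.linalg.inv(Sigma_c + lam * np.eye(c))
    A = Minv @ Omega_c @ Minv
    cT = z @ A @ z                                # c * T_{mu,t,c} = z' A z
    off = A - np.diag(np.diag(A))
    s_c = np.sqrt(2 * np.sum(off ** 2))           # s_c^2 = 2 sum_{i != i'} a_{ii'}^2
    stat = (cT - np.trace(A) * tau2) / (s_c * tau2)
    return abs(stat) >= norm.ppf(1 - alpha / 2), stat
```

The identity $c\,T_{\mu,t,c} = z^T A z$ used here follows from plugging the closed form of $\hat\theta$ into $\|\hat g^B_{\mu,t,c}\|^2$, as in Appendix A.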
2.5 Practical Choice of c and k Given B

The implementation of Algorithms 1 and 2 requires a practical choice of c and k for a given bits budget B. Based on the discussion in Section 2.2, Algorithm 1 requires $B = cb$ with $b = \lceil\log_2(k)\rceil$. Our theoretical investigations in Section 3.3 require that $b \gtrsim \log_2\sqrt{(nh^{1/2} + n(ch)^{-1})\,T_n}$ for some $h\to 0$ and $ch\to\infty$ as $c,n\to\infty$, where $T_n$ is defined in Condition (B). Furthermore, equations (16) and (17) in Section 3.3 reveal that the optimal choice of b depends on the smoothness of the periodic Sobolev space (i.e., m) and the tuning parameter λ. While the former is typically unknown in practice, the latter needs to be chosen by some data-driven criterion such as GCV based on the quantized data, which is not available until the quantization has been carried out. To simplify the calculation and make the quantization algorithm more practical, we propose to use $b = \big\lceil\log_2\sqrt{n\,T_n/\sigma^2}\big\rceil$, which is a valid choice for any m and h and is therefore easy to use in practice. Specifically, given B, we find c and k as

$$c = \max\Big\{c\in\mathbb{Z} : c\,\big\lceil\log_2\sqrt{n\,T_n/\sigma^2}\big\rceil \le B\Big\}, \qquad k = \max\big\{k\in\mathbb{Z} : c\,\lceil\log_2(k)\rceil \le B\big\}. \qquad (11)$$

By the definition in Condition (B), $T_n$ reflects the quantization range, and $\sigma^2$ enters the choice of b so that $T_n/\sigma^2$ is invariant when the $y_i$'s are multiplied by a constant. Under Condition (B), the actual choice of $T_n/\sigma^2$ depends on the distribution of the $\epsilon_i$'s in model (1). If the $\epsilon_i$'s follow a standard Gaussian distribution, it suffices to take $T_n/\sigma^2 = 2.5\log(n)$; therefore, $\sigma^2$ in (11) does not need to be estimated. See the discussion under Condition (B) for more on the choice of $T_n/\sigma^2$.
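A direct transcription of (11), with hypothetical function and argument names:

```python
import numpy as np

def choose_c_k(B, n, Tn_over_sigma2):
    """Practical choice (11): the largest c with c * ceil(log2 sqrt(n Tn/sigma^2)) <= B,
    then the largest k with c * ceil(log2 k) <= B."""
    b = int(np.ceil(np.log2(np.sqrt(n * Tn_over_sigma2))))
    if b < 1 or B < b:
        raise ValueError("bits budget B is too small for this n")
    c = B // b                       # max c with c * b <= B
    k = 2 ** (B // c)                # max k with ceil(log2 k) <= B // c
    return c, k

# Gaussian errors: T_n / sigma^2 = 2.5 log(n), as recommended above, e.g.
# c, k = choose_c_k(B=100, n=1000, Tn_over_sigma2=2.5 * np.log(1000))
```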
3. Asymptotic Theory

We now study the asymptotic properties of the B-bits spline estimator and the nonparametric test statistic. In this section, we restrict our investigation to the simple scenario in which the order m of the periodic Sobolev space is known and fixed, and the exact form of the function $g^*(\cdot)$ in the null hypothesis (2) is also known. Theoretical results for more general cases are deferred to Section 4.

3.1 Estimation Convergence Rate

We first quantify the convergence rate of $\|\hat g^B_{\mu,t,c} - g_0\|^2$. Even though the main focus of this paper is statistical inference based on quantized samples, it is still of interest to study the asymptotic properties of the spline estimator $\hat g^B_{\mu,t,c}(\cdot)$. Define the Sobolev constant

$$c_s \equiv \sup_{g\in S^m(I)} \frac{\|g\|_{\sup}}{\sqrt{J(g,g)}}. \qquad (12)$$

It is known that $c_s$ is positive and finite; see Adams and Fournier (2003). For all our theoretical investigations, we assume that the $\mu_j$'s and $t_j$'s satisfy the following boundedness condition.

Condition (B): Assume that $J(g_0,g_0)\le\rho^2$; denoting $T_n = \min\{t_1^2,\ t_{k-1}^2\}$, it holds that

$$\mu_j^2\ \mathbb{P}\big(|\sigma\epsilon_1| + c_s\rho > \sqrt{T_n}\big) \to 0 \qquad \text{for } j = 1, k.$$

Condition (B) asserts that the values of $\mu_1,\mu_k$ cannot be too large and that $t_1, t_{k-1}$ should be sufficiently large. Recall that in this paper we adopt the uniform quantization scheme, for which Condition (B) is rather mild. Since $J(g_0,g_0)\le\rho^2$, by the definition of $c_s$ in (12) we have $\|g_0\|_{\sup}\le c_s\rho$, and we assume that ρ is finite throughout our theoretical investigation. Condition (B) essentially assumes that $T_n$ is sufficiently large so that all observed $y_i$'s fall within the quantization range with high probability. When the $\epsilon_i$'s follow a sub-Gaussian distribution, it suffices to take $T_n \asymp \log(n)$ for Condition (B) to hold. For distributions with heavier tails, the required order of $\min\{t_1^2, t_{k-1}^2\}$ is larger, e.g., $T_n \asymp [\log(n)]^2$ for sub-exponential distributions. In particular, when the $\epsilon_i$'s follow a normal distribution, it suffices to use $T_n = 2.5\sigma^2\log(n)$. Based on Condition (B), the following theorem establishes an asymptotic upper bound for the estimation error $\mathbb{E}\|\hat g^B_{\mu,t,c} - g_0\|^2$.

Theorem 1. If Condition (B) holds, then

$$\mathbb{E}\|\hat g^B_{\mu,t,c} - g_0\|^2 \lesssim (nh)^{-1} + \lambda + c^{-\min\{2m,3\}} + G_{c,k}(t),$$

where $h = \lambda^{\frac{1}{2m}}$ and $G_{c,k}(t) = 4C_k(t)^2 + G_{c,k,1}(t) + G_{c,k,2}(t)$, with

$$G_{c,k,1}(t) = \frac{2}{n}\sum_{i=1}^n\int_{-\infty}^{t_1}(z-\mu_1)^2\,p\Big(\frac{z-g_0(i/n)}{\sigma}\Big)\,dz, \qquad G_{c,k,2}(t) = \frac{2}{n}\sum_{i=1}^n\int_{t_{k-1}}^{\infty}(z-\mu_k)^2\,p\Big(\frac{z-g_0(i/n)}{\sigma}\Big)\,dz,$$

and $p(\cdot)$ the density of ϵ.

The asymptotic error bound in Theorem 1 can be roughly decomposed into three parts: (1) the estimation error of the smoothing spline estimator based on the fully observed original data, i.e., $(nh)^{-1}+\lambda$ (Wahba, 1990); (2) the estimation error attributable to the first-stage quantization, i.e., $G_{c,k}(t)$; and (3) the estimation bias introduced by the second-stage quantization, i.e., $c^{-\min\{2m,3\}}$. An extreme case is $t_1\to-\infty$, $t_{k-1}\to\infty$ and $C_k(t)\to 0$, i.e., the first-stage quantizer becomes dense enough; in this case $G_{c,k}$ tends to zero, recovering the classical nonparametric estimation setting. Intuitively, if a sufficiently large bits budget B, and consequently sufficiently large values of c and k, can be used, the term $(nh)^{-1}+\lambda$ dominates the upper bound of $\mathbb{E}\|\hat g^B_{\mu,t,c}-g_0\|^2$. As a result, the convergence rate of $\|\hat g^B_{\mu,t,c}-g_0\|^2$ coincides with that of the classical smoothing spline estimator based on the original observations without quantization (Wahba, 1990). A sufficient condition is given in the following corollary.

Corollary 2. Assume that Condition (B) holds and that (1) $C_k(t)^2 \lesssim n^{-2m/(2m+1)}$; (2) as $T\to\infty$, $p(z)$ satisfies $\int_{|z|\ge T} z^2 p(z)\,dz = O(\exp(-T^d))$ with $d \ge \frac{4m}{2m+1}$; (3) $T_n \asymp \log(n)$; and (4) $c \asymp n^{\frac{\max\{1,\,2m/3\}}{2m+1}}$, $\lambda \asymp n^{-\frac{2m}{2m+1}}$. Then

$$\mathbb{E}\|\hat g^B_{\mu,t,c} - g_0\|^2 = O\big(n^{-\frac{2m}{2m+1}}\big),$$

which achieves the optimal convergence rate of smoothing splines without quantization.

Recalling the definition $C_k(t) = (t_{k-1}-t_1)/(k-2)$, under the conditions of Corollary 2 the minimum order of k that achieves the optimal convergence rate is $n^{m/(2m+1)}\sqrt{\log(n)}$, leading to a required $b = \lceil\log_2(k)\rceil \asymp \frac{m}{2m+1}\log_2(n)$. Therefore, the total bits budget is $B = cb \asymp n^{\frac{\max\{1,\,2m/3\}}{2m+1}}\log(n)$. Recently, Zhu and Lafferty (2018) proposed a quantization scheme for the Gaussian sequence model that achieves the same optimal estimation rate with a bits budget $B \asymp n^{\frac{1}{2m+1}}$. Although their bits budget is lower than that of our proposed method, Zhu and Lafferty (2018) achieve this by essentially quantizing only the first $n^{\frac{1}{2m+1}}$ Fourier coefficients of $g_0(\cdot)$ and discarding the remaining Fourier coefficients as zeros. It is unclear how this approach could be extended to making valid nonparametric inferences for $g_0(\cdot)$, which is the main focus of our work. The proposed quantization scheme of Section 2.2 is in spirit closer to the quantization algorithms in Slawski and Li (2018) and references therein, although those works mainly focus on estimation of the parametric linear regression model. In the following subsections, we investigate the impact of the bits budget on the asymptotic properties of the proposed nonparametric testing procedure.
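As a concrete instance (our own arithmetic, not a statement from the paper beyond Corollary 2 itself), take m = 2:

```latex
% Corollary 2 instantiated at m = 2:
\[
\mathbb{E}\,\|\hat g^{B}_{\mu,t,c} - g_0\|^2 = O\!\big(n^{-4/5}\big),
\qquad
c \asymp n^{\max\{1,\,4/3\}/5} = n^{4/15},
\quad
\lambda \asymp n^{-4/5},
\quad
b \asymp \tfrac{2}{5}\log_2 n,
\]
```

so the total budget $B = cb$ is of order $n^{4/15}\log n$, a vanishing fraction of the order-n bits needed to store the raw sample.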
3.2 Asymptotic Distribution of the Test Statistic under H0

In this section, we derive the asymptotic distribution of the test statistic $T_{\mu,t,c}$ under $H_0$. From now on, we write $h = \lambda^{1/(2m)}$ without repeating its definition.

Theorem 3. Suppose that Condition (B) holds and that $h\to 0$, $ch\to\infty$, $b \gtrsim \log_2\sqrt{(nh^{1/2} + n(ch)^{-1})\,T_n}$, and $\mathbb{E}\big(\big[\tilde e_n^{-1}\sum_{j=1}^{\tilde e_n} Q(\epsilon_j)\big]^4\big) = O(c^2 n^{-2})$ as $n,c\to\infty$. Then, under $H_0$,

$$\frac{c\,T_{\mu,t,c} - \mathrm{trace}(A)\,\hat\tau_k^2}{s_c\,\hat\tau_k^2} \xrightarrow{d} N(0,1), \qquad \text{as } n,c\to\infty, \qquad (13)$$

where $T_{\mu,t,c}$, A, $\hat\tau_k^2$ and $s_c$ are as defined in Section 2.4.

Theorem 3 states that under some regularity conditions, the null distribution of the nonparametric test statistic $T_{\mu,t,c}$ for $H_0$ in (2) is asymptotically normal. The proof relies on Stein's exchangeable pair method and is given in the Appendix. We remark that the conditions of Theorem 3 are rather mild. Specifically, the first condition $h\to 0$ requires the tuning parameter λ to shrink to zero, and the second condition $ch\to\infty$ implies that the number of quantized data points, i.e., c, should be sufficiently large. The only condition that needs more discussion is the last one, $\mathbb{E}([\tilde e_n^{-1}\sum_{j=1}^{\tilde e_n} Q(\epsilon_j)]^4) = O(c^2 n^{-2})$, which jointly controls the moments of the $\epsilon_i$'s and the first-stage quantizer $Q(\cdot)$. Proposition 4 below provides a sufficient condition for this assumption.

Proposition 4. Suppose that Condition (B) holds. If $\mathbb{E}(\epsilon_1^4) = O(nc^{-1})$, $C_k(t) = o(1)$ and $\mu_j^4\,\mathbb{P}(\sigma\epsilon_1\in R_j(t)) = O(nc^{-1})$ for $j = 1$ and $k$, then $\mathbb{E}\big(\big[\tilde e_n^{-1}\sum_{j=1}^{\tilde e_n} Q(\epsilon_j)\big]^4\big) = O(c^2 n^{-2})$.

By Theorem 3 and Proposition 4, the validity of the proposed nonparametric testing procedure requires the quantized sample size c to be sufficiently large; in particular, $ch = c\lambda^{1/(2m)}\to\infty$. Recall that the proposed quantization scheme of Section 2.2 requires a total bits budget $B = cb$ with $b = \lceil\log_2(k)\rceil$. As a result, for Theorem 3 to hold, the required bits budget is $B \gtrsim c\log_2\sqrt{(nh^{1/2}+n(ch)^{-1})\,T_n}$, whose lower bound is determined by the tuning parameter λ (or h). In the next subsection, we investigate the impact of λ on the asymptotic testing power against local alternatives, which can be used to determine the optimal asymptotic power achievable with a given bits budget B. For example, we shall show that to achieve the minimax rate of testing, one needs $B \gtrsim n^{\frac{3}{4m+1}}\log_2\big(n^{\frac{2m}{4m+1}}\sqrt{T_n}\big)$.

3.3 Asymptotic Power of the Nonparametric Test

We now examine the asymptotic power of the proposed nonparametric test. For a fixed constant ρ > 0, let $S^m_\rho(I) = \{f\in S^m(I) : J(f,f)\le\rho^2\}$ be the ball of radius ρ in the periodic Sobolev space. We consider the alternative hypothesis

$$H_1 : g_0 \in S^m_\rho(I)\setminus\{0\}. \qquad (14)$$

Based on the definition of the quantized data $z_i$ in (7), their unquantized counterparts can be defined as $\tilde y_i = \tilde e_n^{-1}\sum_{j=(i-1)\tilde e_n+1}^{i\tilde e_n} y_j$ for $i = 1,\dots,c$. Under $H_1$, one has $\mathbb{E}\tilde y_i = \frac{c}{n}\sum_{j=(i-1)\tilde e_n+1}^{i\tilde e_n} g_0(j/n)$ for $i = 1,\dots,c$. To facilitate our theoretical investigation, we introduce the function

$$f(x) = \frac{1}{2\Delta}\int_{\max(x-\Delta,\,0)}^{\min(x+\Delta,\,1)} g_0(s)\,ds, \qquad x\in I, \quad \Delta = \frac{1}{2c}. \qquad (15)$$

It is straightforward to show that $\max_{1\le i\le c}|f(i/c) - \mathbb{E}\tilde y_i| = O(n^{-1})$ and that $\sup_{x\in I}|g_0(x) - f(x)|\to 0$ as $\Delta\to 0$. Theorem 5 below states that, under some regularity conditions, the proposed nonparametric test achieves arbitrarily high power provided that $H_0$ and $H_1$ are sufficiently separated.

Theorem 5. Suppose that Condition (B) holds. If $h\to 0$, $ch\to\infty$, $b\gtrsim\log_2\sqrt{(nh^{1/2}+n(ch)^{-1})\,T_n}$, and $\mathbb{E}([\tilde e_n^{-1}\sum_{j=1}^{\tilde e_n} Q(\epsilon_j)]^4) = O(c^2n^{-2})$, then for any η > 0 there exist positive constants $C_\eta$ and $N_\eta$ such that, for any $c\ge N_\eta$,

$$\inf_{\substack{g\in S^m_\rho(I)\\ \|g\|_c \ge C_\eta\,\delta_{n,c,\lambda}}} \mathbb{P}(\text{reject } H_0 \mid H_1 \text{ is true}) \ge 1-\eta,$$

where $\delta_{n,c,\lambda} = \sqrt{(nh^{1/2})^{-1} + \lambda + (nch^2)^{-1}}$ and $\|g\|_c = \sqrt{\sum_{i=1}^c f^2(i/c)/c}$ with $f(\cdot)$ as defined in (15).

The separation rate $\delta_{n,c,\lambda}$ represents the smallest order of deviation from $H_0$ that can be consistently detected by the proposed test statistic (9), given sufficiently large n and c.
The first part of $\delta_{n,c,\lambda}$, namely $(nh^{1/2})^{-1}+\lambda$, coincides with the separation rate of the classical spline-based nonparametric test using the original observations without quantization; see, e.g., Shang and Cheng (2013); Cheng and Shang (2015); Shang and Cheng (2015, 2017). The remaining part of $\delta_{n,c,\lambda}$, namely $(nch^2)^{-1}$, is an additional term due to the two-stage quantization errors. For given n and c, the separation rate $\delta_{n,c,\lambda}$ can be minimized by choosing an appropriate value of the tuning parameter λ, subject to the constraint $c\lambda^{1/(2m)}\to\infty$. Specifically, by some straightforward algebra, one can show that

$$\inf_{\lambda\,\gtrsim\,c^{-2m}} \delta_{n,c,\lambda} \asymp \begin{cases} n^{-\frac{2m}{4m+1}}, & \text{if } c\gtrsim n^{\frac{3}{4m+1}} \text{ with } \lambda\asymp n^{-\frac{4m}{4m+1}};\\ (nc)^{-\frac{m}{2(m+1)}}, & \text{if } n^{\frac{1}{2m+1}}\lesssim c\lesssim n^{\frac{3}{4m+1}} \text{ with } \lambda\asymp (nc)^{-\frac{m}{m+1}};\\ \lambda^{1/2}, & \text{if } c\lesssim n^{\frac{1}{2m+1}} \text{ with } \lambda\asymp c^{-2m}. \end{cases} \qquad (16)$$

Recall that the total number of bits needed for the proposed quantization scheme of Section 2.2 is $B = cb$, for which Theorem 5 requires $h\to 0$ and $b\gtrsim\log_2\sqrt{(nh^{1/2}+n(ch)^{-1})\,T_n}$. Plugging the optimal smoothing parameter into the lower bound for b, we obtain

$$\inf_{\lambda\,\gtrsim\,c^{-2m}} \delta_{n,c,\lambda} \asymp \begin{cases} n^{-\frac{2m}{4m+1}}, & \text{if } c\gtrsim n^{\frac{3}{4m+1}},\ b\gtrsim \log_2\big(n^{\frac{2m}{4m+1}}\sqrt{T_n}\big);\\ (nc)^{-\frac{m}{2(m+1)}}, & \text{if } n^{\frac{1}{2m+1}}\lesssim c\lesssim n^{\frac{3}{4m+1}},\ b\gtrsim \log_2\big(n^{\frac{2m+3}{4(m+1)}}\,c^{-\frac{2m+1}{4(m+1)}}\sqrt{T_n}\big);\\ \lambda^{1/2}, & \text{if } c\lesssim n^{\frac{1}{2m+1}},\ b\gtrsim \log_2\sqrt{n\,T_n}, \text{ with } \lambda\asymp c^{-2m}. \end{cases} \qquad (17)$$

From (17), we see that when B is sufficiently large, i.e., $B\gtrsim n^{\frac{3}{4m+1}}\log_2\big(n^{\frac{2m}{4m+1}}\sqrt{T_n}\big)$, the minimal separation rate $n^{-\frac{2m}{4m+1}}$ achieves the minimax rate of testing (Shang and Cheng, 2013, 2017; Liu et al., 2020), implying lossless asymptotic testing power using only quantized samples. In this case, the minimal number of bits per data point, i.e., b, does not depend on c but is determined by the smoothness of the function and the tail bound $T_n$ of the error distribution. When B lies between $n^{\frac{1}{2m+1}}\log_2\sqrt{n\,T_n}$ and $n^{\frac{3}{4m+1}}\log_2\big(n^{\frac{2m}{4m+1}}\sqrt{T_n}\big)$, the minimax rate of testing is no longer achievable, but the minimal separation rate still decays polynomially as the original sample size n increases. Furthermore, in this intermediate regime of B, the lower bound for b decreases as c increases, implying that, when allocating the total bits budget B, increasing c rather than b improves the testing power more effectively. Finally, when B is below $n^{\frac{1}{2m+1}}\log_2\sqrt{n\,T_n}$, the asymptotic lower bound for the minimal separation rate is (roughly) of order $c^{-m}$ with $c = B/b$, the number of quantized measurements that can be transmitted or stored, provided that $b\gtrsim\log_2\sqrt{n\,T_n}$.
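For illustration (our own arithmetic), instantiating the first case of (17) at m = 2 with Gaussian errors, so that $T_n \asymp \log n$:

```latex
\[
B \;\gtrsim\; n^{3/(4m+1)} \log_2\!\big(n^{2m/(4m+1)}\sqrt{T_n}\big)\Big|_{m=2}
\;=\; n^{1/3}\,\log_2\!\big(n^{4/9}\sqrt{T_n}\big)
\;\asymp\; n^{1/3}\log n .
\]
```

Thus roughly $n^{1/3}\log n$ total bits suffice for minimax testing power at m = 2, somewhat more than the $n^{4/15}\log n$ bits that Corollary 2 shows suffice for minimax estimation.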
4. Extensions

Our investigations in Section 3 assume that both the hypothesized function $g^*(\cdot)$ in (2) and the order m of the periodic Sobolev space $S^m(I)$ are known. In practice, it may be of interest to test other hypotheses, e.g., whether $g_0$ has a parametric form such as a linear function; meanwhile, the order m is often unknown. We now extend the preceding results to such settings.

4.1 Nonparametric Testing for Linearity of $g_0(\cdot)$

In some applications, we are interested in testing whether $g_0(\cdot)$ resides in a parametric family. In this section, as an illustrative example, we consider testing the linearity of $g_0(\cdot)$:

$$H_0^{linear} : g_0\in\mathcal{L}(I) \quad\text{vs.}\quad H_1^{linear} : g_0\in S^m_\rho(I)\setminus\mathcal{L}(I), \qquad (18)$$

where $\mathcal{L}(I)$ denotes the class of linear functions over $I := [0,1]$. Testing the hypothesis that $g_0(\cdot)$ belongs to another parametric family governed by a finite number of parameters can be handled in the same way with minor modifications.

To test (18), we first obtain the least-squares estimator $\hat g(x)$, $x\in I$, based on the $Q(y_j)$'s, i.e., $\hat g = \arg\min_{g\in\mathcal{L}(I)}\sum_{i=1}^n [g(x_i) - Q(y_i)]^2$ with $x_i = i/n$. Subsequently, we define the new data $y_{linear} = (Q(y_1)-\hat y_1,\dots,Q(y_n)-\hat y_n)^T$, where $\hat y_i = \hat g(i/n)$. Applying the two-stage quantization Algorithm 1 to $y_{linear}$, we obtain the quantized data $z_{linear} = (z_{linear,1},\dots,z_{linear,c})^T$. Following the estimation procedure of Section 2.3, we then obtain a spline estimator $\hat g^B_{linear,\mu,t,c}$ based on the quantized data $z_{linear}$ (a code sketch of this detrending step is given at the end of this subsection). The resulting test statistic is $T_{linear,\mu,t,c} = \|\hat g^B_{linear,\mu,t,c}\|^2$, whose limiting distribution under $H_0^{linear}$ is given by the following theorem.

Theorem 6. Suppose that Condition (B) holds. If, as $n,c\to\infty$, $h\to 0$, $ch\to\infty$, $b\gtrsim\log_2\sqrt{(nh^{1/2}+n(ch)^{-1})\,T_n}$, and $\mathbb{E}([\tilde e_n^{-1}\sum_{j=1}^{\tilde e_n}Q(\epsilon_j)]^4) = O(c^2n^{-2})$, then under $H_0^{linear}$,

$$\frac{c\,T_{linear,\mu,t,c} - \mathrm{trace}(A)\,\hat\tau_k^2}{s_c\,\hat\tau_k^2} \xrightarrow{d} N(0,1), \qquad \text{as } n,c\to\infty, \qquad (19)$$

where A, $\hat\tau_k^2$ and $s_c$ are as defined in Section 2.4 but based on $y_{linear}$.

Theorem 6 is an immediate extension of Theorem 3 to testing the linearity of $g_0(\cdot)$ using only quantized samples, indicating that the proposed nonparametric linearity test is valid under mild conditions. To investigate the power of the proposed linearity test against the alternative $H_1^{linear}$, we define the distance between $g_0(\cdot)$ and the linear function space $\mathcal{L}(I)$ as $\|g_0 - P_{\mathcal{L}(I)}(g_0)\|$, where $P_{\mathcal{L}(I)}(g_0) = \arg\min_{f\in\mathcal{L}(I)}\|g_0 - f\|^2$ is the projection of $g_0(\cdot)$ onto $\mathcal{L}(I)$. The magnitude of $\|g_0 - P_{\mathcal{L}(I)}(g_0)\|$ characterizes how far the true function $g_0(\cdot)$ deviates from the linear functions in $\mathcal{L}(I)$; under the null hypothesis $H_0^{linear}$, one has $\|g_0 - P_{\mathcal{L}(I)}(g_0)\| = 0$. The following theorem describes the asymptotic power of the proposed nonparametric linearity test.

Theorem 7. Suppose that Condition (B) holds. If, as $n,c\to\infty$, $h\to 0$, $ch\to\infty$, $b\gtrsim\log_2\sqrt{(nh^{1/2}+n(ch)^{-1})\,T_n}$, and $\mathbb{E}([\tilde e_n^{-1}\sum_{j=1}^{\tilde e_n}Q(\epsilon_j)]^4) = O(c^2n^{-2})$, then for any η > 0 there exist positive constants $C_\eta$ and $N_\eta$ such that, for any $c\ge N_\eta$,

$$\inf_{\substack{g\in S^m_\rho(I)\\ \|g - P_{\mathcal{L}(I)}(g)\|_c \ge C_\eta\,\delta^{linear}_{n,c,\lambda}}} \mathbb{P}(\text{reject } H_0^{linear}\mid H_1^{linear}\text{ is true}) \ge 1-\eta,$$

where $\delta^{linear}_{n,c,\lambda} = \sqrt{(nh^{1/2})^{-1}+\lambda+(nch^2)^{-1}}$ and $\|g\|_c = \sqrt{\sum_{i=1}^c f^2(i/c)/c}$ with $f(\cdot)$ as defined in (15).

Based on Theorem 7, for a given quantized sample size c the proposed nonparametric linearity test achieves the same separation rate as in (16). Furthermore, the proofs of Theorems 6-7 are similar to those of Theorems 3 and 5, by recognizing that the least-squares estimator $\hat g(\cdot)$ satisfies $\sup_{x\in I}|\hat g(x) - g_0(x)| = O_p(n^{-1/2})$, whose impact is negligible for a nonparametric spline estimator. It is therefore straightforward to extend Theorems 6-7 to testing whether $g_0(\cdot)$ resides in another parametric family, as long as a uniformly root-n consistent parametric estimator $\hat g(\cdot)$ is available.
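A minimal sketch of the detrending step described above (illustrative names; the subsequent quantization and testing reuse the earlier sketches):

```python
import numpy as np

def linearity_detrend(Qy):
    """Section 4.1, step one: fit the least-squares line to the first-stage
    quantized data Q(y_i) at x_i = i/n and return the residuals y_linear,
    which are then passed through Algorithm 1 and the test of Section 2.4."""
    n = len(Qy)
    x = np.arange(1, n + 1) / n
    X = np.column_stack([np.ones(n), x])           # design for g(x) = a + b x
    coef, *_ = np.linalg.lstsq(X, Qy, rcond=None)
    return Qy - X @ coef                           # y_linear,i = Q(y_i) - g_hat(i/n)
```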
4.2 Adaptive Nonparametric Test When m is Unknown

From (16), we can see that the power of the proposed nonparametric test depends crucially on the order m of the periodic Sobolev space in which the underlying true function $g_0(\cdot)$ resides. However, the order m may be unknown in practice. One popular strategy is to set m = 2 regardless of the underlying truth, which may lead to sub-optimal testing power. In this section, we construct an optimal adaptive nonparametric testing procedure, based on quantized samples, that does not require knowing m.

Let $m^*$ denote the unknown true order of the Sobolev space to which $g_0(\cdot)$ belongs, and assume that $m^*$ is an integer between two known integers $m_l$ and $m_u$. For instance, one can set $m_l = 1$ and $m_u = \mathrm{poly}(\log n)$ so that, as n diverges, $m^*$ is guaranteed to belong to $[m_l, m_u]$. For any given integer m, we can calculate the test statistic $T_m := T_{\mu,t,c}$ defined in (9) with the tuning parameter

$$\lambda_m = a_n^{2m}\, n^{-4m/(4m+1)}\, [\log(m_u)]^{2m/(4m+1)},$$

where $a_n$ may depend on n but is free of m. We remark that the upper bound $m_u$ may diverge slowly as $n\to\infty$. Our adaptive nonparametric testing procedure is summarized as follows (a code sketch of Steps 2-3 follows the list).

Step 1. For each $m_l\le m\le m_u$, calculate the standardized test statistic

$$\xi_m = \frac{c\,T_m - \mathrm{trace}(A_m)\,\hat\tau_k^2}{s_{c,m}\,\hat\tau_k^2},$$

where $T_m$, $A_m$, $\hat\tau_k^2$ and $s_{c,m}$ are as defined in Section 2.4.

Step 2. Calculate the maximum of the $\xi_m$'s, i.e., $\xi_{\max} = \max_{m_l\le m\le m_u}\xi_m$.

Step 3. Standardize $\xi_{\max}$ as $\xi^* = C_n(\xi_{\max} - C_n)$, where $C_n$ satisfies $2\pi C_n^2\exp(C_n^2) = m_u^2$.
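A minimal sketch of Steps 2-3 (the per-m statistics $\xi_m$ are computed exactly as in the sketch of Section 2.4, one per candidate $\lambda_m$; the root-finding bracket is an illustrative choice):

```python
import numpy as np
from scipy.optimize import brentq

def adaptive_test(xi_values, m_u, alpha=0.1):
    """Steps 2-3: standardize the maximum of xi_{m_l}, ..., xi_{m_u} and
    compare with the Gumbel quantile q_alpha from Theorem 8."""
    # C_n solves 2 pi C^2 exp(C^2) = m_u^2 (Step 3)
    Cn = brentq(lambda C: 2*np.pi*C**2*np.exp(C**2) - m_u**2, 1e-8, 10.0)
    xi_star = Cn * (np.max(xi_values) - Cn)
    q_alpha = -np.log(-np.log(1 - alpha))     # reject when xi_star exceeds q_alpha
    return xi_star > q_alpha, xi_star
```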
For the validity of the proposed adaptive nonparametric test, we assume that the following Condition (C) holds.

Condition (C): (a) $m_u \lesssim \log^{d_0}(n)$ for some $d_0\in(0,1/2)$; (b) $a_n\, n^{-2/(4m_u+1)}[\log(n)]^2 \to 0$; and (c) $n^{2/(4m_l+1)}\log(n)/(c\,a_n)\to 0$, as $n,c\to\infty$.

Condition (C) requires that the search range for m not be too large, by imposing a slowly diverging upper bound $m_u$. In addition, the total number of quantized samples that need to be transmitted or stored, i.e., c, cannot be too small relative to n, and is jointly determined by $m_l$, $m_u$ and the tuning parameter $a_n$. These conditions are rather mild and have been used in the literature; see, e.g., Liu et al. (2019, 2021). The following theorem describes the asymptotic behavior of $\xi^*$ under $H_0$.

Theorem 8. Suppose that Conditions (B) and (C) hold, $\mathbb{E}([\tilde e_n^{-1}\sum_{j=1}^{\tilde e_n}Q(\epsilon_j)]^4) = O(c^2n^{-2})$, and that

$$b \gtrsim \log_2\sqrt{\big(nh_{m_l}^{1/2} + n(ch_{m_l})^{-1}\big)T_n}\ \vee\ \log_2\sqrt{\big(nh_{m_u}^{1/2} + n(ch_{m_u})^{-1}\big)T_n},$$

where $h_m = \lambda_m^{1/(2m)}$. Then, under $H_0$ in (2), for any $\alpha\in(0,1)$,

$$\mathbb{P}(\xi^* \le q_\alpha) \to 1-\alpha, \qquad \text{as } n,c\to\infty, \qquad (20)$$

where $q_\alpha = -\log(-\log(1-\alpha))$.

The intuition behind Theorem 8 is straightforward: under $H_0$, the limiting distribution of each $\xi_m$ is normal, which suggests that the asymptotic distribution of the maximum $\xi^*$ should be close to an extreme value distribution. We use the techniques developed in Koike (2019) to formalize the proof. Next, we investigate the asymptotic power of the proposed adaptive nonparametric test under the alternative $H_1 : g_0\in S^{m^*}_\rho(I)\setminus\{0\}$.

Theorem 9. Suppose that Conditions (B) and (C) hold, $\mathbb{E}([\tilde e_n^{-1}\sum_{j=1}^{\tilde e_n}Q(\epsilon_j)]^4) = O(c^2n^{-2})$, and that

$$b \gtrsim \log_2\sqrt{\big(nh_{m_l}^{1/2} + n(ch_{m_l})^{-1}\big)T_n}\ \vee\ \log_2\sqrt{\big(nh_{m_u}^{1/2} + n(ch_{m_u})^{-1}\big)T_n}.$$

Then, for any η > 0, there exist positive constants $C_\eta$ and $N_\eta$ such that, for any $c\ge N_\eta$,

$$\inf_{\substack{g\in S^{m^*}_\rho(I)\\ \|g\|_c \ge C_\eta\,\delta_{n,c,a_n}}} \mathbb{P}(\text{reject } H_0\mid H_1\text{ is true}) \ge 1-\eta,$$

where

$$\delta_{n,c,a_n} = n^{-\frac{2m^*}{4m^*+1}}\,[\log(m_u)]^{\frac{m^*}{4m^*+1}}\,\sqrt{a_n^{-2} + c^{-1}a_n^{-2}\,n^{\frac{3}{4m^*+1}}\,[\log(m_u)]^{-\frac{2(m^*+1)}{4m^*+1}} + a_n^{2m^*}}$$

and $\|g\|_c = \sqrt{\sum_{i=1}^c f^2(i/c)/c}$ with $f(\cdot)$ as defined in (15).

Based on the form of the separation rate $\delta_{n,c,a_n}$ in Theorem 9, it is straightforward to show that the minimal separation rate is attained at $a_n = a_0$ for some constant $a_0 > 0$, provided that $c\gtrsim\max\{n^{2/(4m_l+1)}\log(n),\ n^{3/(4m^*+1)}\}\,[\log(m_u)]^{-1/(4m^*+1)}$, so that Condition (C) is met and the second term inside the square root of $\delta_{n,c,a_n}$ is negligible. Specifically, under this condition on c, one has

$$\inf_{a_n>0}\delta_{n,c,a_n} \asymp n^{-\frac{2m^*}{4m^*+1}}\,[\log(m_u)]^{\frac{m^*}{4m^*+1}}. \qquad (21)$$

The minimal separation rate (21) is the same as the one obtained in Liu et al. (2019, 2021) and is minimax for the adaptive nonparametric test. This suggests that, with quantized samples, the proposed adaptive test can still achieve the optimal testing power if the bits budget satisfies $B \gtrsim \max\{n^{2/(4m_l+1)}\log(n),\ n^{3/(4m^*+1)}\}\,[\log(m_u)]^{-1/(4m^*+1)}\log(n)$ and we take $b = \lceil\log_2\sqrt{n\,T_n/\sigma^2}\rceil \asymp \log(n)$ as suggested in Section 2.5. Compared to the minimax rate of testing when m is known, given in (16), the minimal separation rate (21) is inflated only by a factor of $[\log(m_u)]^{m^*/(4m^*+1)}$; this is the price paid for searching over $m_l\le m\le m_u$. Furthermore, we remark that the lower bound on the bits budget B depends not only on the true order $m^*$ but also on the smallest candidate order $m_l$. This can be explained by the fact that $\xi_{m_l}$ in Step 1 of the adaptive test is constructed from an under-smoothed spline estimator, which may have a larger order of estimation bias. In practice, it is convenient to set $m_l = 1$, as suggested by Liu et al. (2021); however, a more accurate guess of $m_l$ may reduce the bits budget B required to achieve the minimax rate of testing.

5. Simulation Studies

In this section, we evaluate the finite-sample performance of the proposed methods through a set of simulation studies. For all simulation settings except Section 5.4, the data are generated from the model

$$y_i = r\,\beta_{3,2}(x_i) + \epsilon_i, \quad\text{with } x_i = \frac{i}{n},\ i = 1,\dots,n, \qquad (22)$$

where $\beta_{3,2}(\cdot)$ is the density function of the beta distribution with parameters 3 and 2, and the $\epsilon_i$'s are independent random errors. Two types of errors are considered: (1) $\epsilon\sim N(0,1)$; (2) $\epsilon\sim N(0,1.5^2)$. We consider r ranging from 0 to 1 and various sample sizes n. In particular, r = 0 is used to examine the empirical size of the proposed test under $H_0$, and other values of r are used to check the empirical power against alternatives. The target significance level is $\alpha = 0.1$.

For all simulation studies, we consider the uniform quantization scheme outlined in Section 2.2. Specifically, for the data quantization step with a given bits budget B, we choose c and k following the approach suggested in Section 2.5, with $T_n/\sigma^2 = 2.5\log(n)$. For each simulation, the quantization range endpoints $t_1, t_{k-1}$ are defined as $t_1 = \mu_0 - \sqrt{2.5\sigma^2\log(n)}$ and $t_{k-1} = \mu_0 + \sqrt{2.5\sigma^2\log(n)}$, where $\mu_0 = \int_0^1 g_0(x)\,dx$ with $g_0(\cdot)$ the regression function in model (1). The use of $\mu_0$ and σ is of limited importance, and they can be replaced with any reasonable alternatives, such as setting $\mu_0 = 0$ or using estimates based on historical data. Summary statistics for each simulation setting are based on 1000 independent simulation runs. Except for Section 5.3, we consider the periodic Sobolev space of order m = 2 with kernel function $K(x,y) = -B_4(|x-y|)/24$, where $B_4$ is the Bernoulli polynomial of order 4. The tuning parameter λ is set to $\lambda = \hat\lambda_{GCV}/\log(c)$, with $\hat\lambda_{GCV}$ picked by GCV.
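A one-function sketch of the data-generating process (22) (illustrative names; SciPy's beta density plays the role of $\beta_{3,2}$):

```python
import numpy as np
from scipy.stats import beta

def simulate_model_22(n, r, sigma=1.0, seed=0):
    """Draw (x_i, y_i) from model (22): y_i = r * beta_{3,2}(i/n) + eps_i,
    with eps_i ~ N(0, sigma^2) i.i.d."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, n + 1) / n
    y = r * beta.pdf(x, 3, 2) + rng.normal(0.0, sigma, size=n)
    return x, y
```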
5.1 Estimation Performance of $\hat g^B_{\mu,t,c}(\cdot)$

In this section, we first evaluate the estimation performance of the spline estimator $\hat g^B_{\mu,t,c}(\cdot)$ defined in (7), which is based only on quantized samples. We generate data from model (22) with r = 0.5, 1 and sample sizes n = 1000, 2000, 3000, 5000, 10000. For each n, we gradually increase the bits budget B from 30 to 1000. The estimation accuracy is evaluated by the mean squared error (MSE), defined as $\|\hat g^B_{\mu,t,c} - g_0\|^2$. The simulation results are summarized in Figure 2, which suggests that the MSEs decrease as n increases in all considered settings. Moreover, as B increases, the MSEs first decrease rapidly and then stabilize. This observation is consistent with the theoretical results established in Section 3.1: increasing B (or, equivalently, c and b) diminishes the impact of the information loss due to data averaging and data quantization, so $\hat g^B_{\mu,t,c}(\cdot)$ becomes more accurate. Furthermore, we observe that after B exceeds a certain threshold, the MSEs of $\hat g^B_{\mu,t,c}(\cdot)$ stabilize, which supports the findings in Corollary 2: when B is sufficiently large, the MSE of $\hat g^B_{\mu,t,c}(\cdot)$ reaches the estimation error lower bound of the classical spline estimator based on the complete data.

[Figure 2: MSEs of spline estimators based on quantized data, as functions of B (30 to 1000) for n = 1000, 2000, 3000, 5000, 10000: r = 0.5 for the left two panels and r = 1 for the right two panels; ϵ ~ N(0,1) for the top two panels and ϵ ~ N(0,1.5²) for the bottom two panels.]

5.2 Nonparametric Test with $g^*(\cdot)\equiv 0$ and m = 2

In this section, we investigate the empirical sizes and powers of the nonparametric test proposed in Section 2.4, with $g^*(\cdot)\equiv 0$ in the null hypothesis (2) and m = 2 treated as known. The data are generated from model (22) with various r and sample sizes n. Figure 3 reports the empirical sizes of the proposed nonparametric test when r = 0 and the empirical powers when r > 0. In all case scenarios, the empirical sizes of the proposed test are close to the target nominal level 0.1 as the sample size n increases. When either r or n increases, the empirical powers of the proposed test gradually approach one, which suggests that the proposed testing procedure is consistent against alternatives that deviate sufficiently (relative to the sample size n) from $H_0$. Furthermore, once the bits budget B exceeds a certain threshold, the empirical powers of the proposed nonparametric test are rather close to each other, which supports our theoretical findings in Section 3.3.

[Figure 3: Empirical rejection rates at the 0.1 significance level as functions of r, for n = 200, 600, 1000, 10000 and various bits budgets B (between 20 and 175): ϵ ~ N(0,1) for the top four panels; ϵ ~ N(0,1.5²) for the bottom four panels.]

5.3 Adaptive Nonparametric Test with an Unknown m

In this section, we investigate the validity and the empirical power of the adaptive nonparametric test proposed in Section 4.2, for which the order parameter m is searched from $m_l = 1$ to $m_u = \log n$.
Figure 4 shows the empirical rejection rates of the proposed adaptive nonparametric test at the 0.1 significance level. When r = 0, the empirical rejection rates are rather close to the nominal level 0.1. For any given r > 0, the empirical rejection rates increase as the sample size n increases. For a fixed n, as r increases, the empirical rejection rates increase steadily and eventually reach 100% for n = 600, n = 1000 and n = 10000. Finally, as long as the bits budget B exceeds a certain threshold, the empirical rejection rates are rather similar in most settings. All these observations are consistent with our theoretical findings in Theorem 9. Furthermore, the empirical rejection rates are smaller than those of the (non-adaptive) nonparametric test of Section 5.2 under the same settings, which is the price paid for adaptivity in m.

[Figure 4: Empirical rejection rates of the adaptive nonparametric test at the 0.1 significance level as functions of r, for n = 200, 600, 1000, 10000 and various B: ϵ ~ N(0,1) for the top four panels; ϵ ~ N(0,1.5²) for the bottom four panels.]

5.4 Nonparametric Linearity Test with m = 2

In this section, we study the empirical performance of the proposed nonparametric linearity test. The data are generated from the model

$$y_i = r\,\beta_{3,2}(x_i) + 3x_i + 2 + \epsilon_i, \quad\text{with } x_i = \frac{i}{n},\ i = 1,\dots,n,$$

where $\beta_{3,2}(\cdot)$ is the density function of the beta distribution with parameters 3 and 2, and the $\epsilon_i$'s are independent random errors. Two types of errors are considered: (1) $\epsilon\sim N(0,1)$; (2) $\epsilon\sim N(0,1.5^2)$. When r = 0, the model satisfies the null hypothesis $H_0^{linear}$, and as r increases, the departure from the linear model becomes increasingly large.

[Figure 5: Empirical rejection rates of the nonparametric linearity test at the 0.1 significance level as functions of r, for n = 200, 600, 1000, 10000 and various B: ϵ ~ N(0,1) for the top four panels; ϵ ~ N(0,1.5²) for the bottom four panels.]
Figure 5 reports the empirical rejection rates of the nonparametric linearity test proposed in Section 4.1 at the significance level 0.1. When r = 0, the empirical rejection rates are close to the nominal size, indicating the validity of the test asserted by Theorem 6. For any given r > 0, the empirical rejection rates increase as the sample size n increases. For a fixed n, as r increases, the empirical rejection rates increase steadily and eventually reach 100%. Finally, as long as the bits budget B exceeds a certain threshold, the empirical rejection rates are rather similar in most settings. All these observations are consistent with our theoretical findings in Theorem 7.

6. Real Data Analysis

In this section, we apply the proposed methods to the Combined Cycle Power Plant Data (Kaya et al., 2012; Tüfekci, 2014), which can be downloaded at https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant. The data set consists of n = 9568 observations from a combined cycle power plant collected over six years (2006-2011). The purpose of our analysis is to explore the relationship between the net hourly electrical energy output of the plant and three environmental factors: temperature, ambient pressure, and relative humidity.

Figure 6 displays the estimated curves based on B-bits quantization (B = 35, 70, 140, 175) and on the full data, using the periodic spline of order m = 2. For the quantization step, we choose $T_n = 2.5\hat\sigma^2\log(n)$, where $\hat\sigma^2$ is the sample variance of the observed data, and c, k are determined as in Section 2.5. We observe that the spline estimator based on quantized data with B = 35, i.e., the green curve, is rather different from the other curves in the two analyses. When the bits budget B increases beyond 70, such differences quickly diminish. This observation demonstrates the effectiveness of the proposed B-bits quantization scheme.

[Figure 6: Spline estimators of energy output versus ambient pressure, temperature, and relative humidity, based on B-bits quantization (B = 35, 70, 140, 175) and the full data; sample size n = 9568.]

Next, we conduct hypothesis tests on the relationship between the net hourly electrical energy output and the three environmental factors. The first test examines whether there is any association between the energy output and the three environmental factors. We consider both the non-adaptive and the adaptive nonparametric tests; for the non-adaptive nonparametric test, m = 2 is used. The p-values are all close to zero, implying strong rejections of the null hypothesis. This is not surprising given the shapes of the spline estimators in Figure 6. Next, Figure 6 suggests a strong linear association between relative humidity and the energy output. Based on this conjecture, we proceed to test whether the associations between the three environmental factors and the energy output are linear or nonlinear, using the nonparametric linearity test proposed in Section 4.1. The p-values for the first two environmental factors, i.e., ambient pressure and temperature, are both close to zero, indicating strong rejections of the null hypothesis.
Figure 7 illustrates the p-values of the nonparametric linearity test for the relationship between relative humidity and energy output as a function of the bits budget B. The nonparametric linearity test based on quantized data fails to reject the null hypothesis, which echoes our conjecture based on Figure 6.

[Figure 7: P-values of the nonparametric linearity test for the relationship between relative humidity and energy output, as a function of B (about 50 to 350).]

7. Discussion

In this paper, we propose a set of nonparametric testing procedures based on quantized observations, including the non-adaptive nonparametric test, the nonparametric linearity test, and the adaptive nonparametric test. The proposed tests are easy to use, being based on the $L_2$ distance between the quantized-data spline estimator and the hypothesized function. We investigate the asymptotic validity and testing power of the proposed tests and show how the asymptotic testing power changes as the bits budget B increases.

We close by discussing two possible extensions. First, the present paper deals only with periodic splines. It would be interesting to extend our results to more general splines or even to kernel ridge regression. The special periodic spline structure greatly reduces the difficulty of the technical proofs: the majority of the proofs can be accomplished by exact calculations based on trigonometric series. For a general RKHS, exact calculations may not be possible, and more involved proofs are needed. Second, the nonparametric linearity test can be easily extended to testing general composite null hypotheses such as $H_0^{general} : g_0(x) = h_0(x,\theta)$ for some function $h_0$ governed by parameters $\theta\in\mathbb{R}^p$ with fixed p. However, when p diverges as n increases, it becomes more challenging to investigate the asymptotic behavior of the proposed test statistic; this is an interesting topic for future research.

References

Robert A. Adams and John J. F. Fournier. Sobolev Spaces, volume 140. Elsevier, 2003.

K. Benhenni and Mustapha Rachdi. Nonparametric estimation of the regression function from quantized observations. Computational Statistics & Data Analysis, 50(11):3067-3085, 2006.

Petros T. Boufounos and Richard G. Baraniuk. 1-bit compressive sensing. In 2008 42nd Annual Conference on Information Sciences and Systems (CISS), pages 16-21, 2008.

Tony Cai and Hongji Wei. Distributed nonparametric function estimation: Optimal rate of convergence and cost of adaptation. arXiv preprint arXiv:2107.00179, 2021.

Guang Cheng and Zuofeng Shang. Joint asymptotics for semi-nonparametric regression models with partially linear structure. The Annals of Statistics, 43(3):1351-1390, 2015.

Larry Goldstein and Yosef Rinott. Multivariate normal approximations by Stein's method and size bias couplings. Journal of Applied Probability, 33(1):1-17, 1996.

Sivakant Gopi, Praneeth Netrapalli, Prateek Jain, and Aditya Nori. One-bit compressed sensing: Provable support and vector recovery. In International Conference on Machine Learning, pages 154-162, 2013.

Chong Gu. Smoothing Spline ANOVA Models, volume 297. Springer Science & Business Media, 2013.

Ankit Gupta, Robert Nowak, and Benjamin Recht. Sample complexity for 1-bit compressed sensing and sparse classification. In 2010 IEEE International Symposium on Information Theory, pages 1553-1557. IEEE, 2010.

Peter Hall. On the rate of convergence of normal extremes. Journal of Applied Probability, 16(2):433-439, 1979.
Yanjun Han, Pritam Mukherjee, Ayfer Özgür, and Tsachy Weissman. Distributed statistical estimation of high-dimensional and nonparametric distributions. In 2018 IEEE International Symposium on Information Theory (ISIT), pages 506-510. IEEE, 2018.

Heysem Kaya, Pınar Tüfekci, and Fikret S Gürgen. Local and global learning methods for predicting power of a combined gas & steam turbine. In Proceedings of the International Conference on Emerging Trends in Computer and Electronics Engineering (ICETCEE), pages 13-18, 2012.

Yuta Koike. Gaussian approximation of maxima of Wiener functionals and its application to high-frequency data. The Annals of Statistics, 47(3):1663-1687, 2019.

Chiang-Sheng Lee and Stephen B Vardeman. Interval estimation of a normal process mean from rounded data. Journal of Quality Technology, 33(3):335-348, 2001.

Meimei Liu, Zuofeng Shang, and Guang Cheng. Sharp theoretical analysis for nonparametric testing under random projection. In Conference on Learning Theory, pages 2175-2209, 2019.

Meimei Liu, Zuofeng Shang, and Guang Cheng. Nonparametric distributed learning under general designs. Electronic Journal of Statistics, 14:3070-3102, 2020.

Meimei Liu, Zuofeng Shang, Yun Yang, and Guang Cheng. Nonparametric testing under randomized sketching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.

Yaniv Plan and Roman Vershynin. Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach. IEEE Transactions on Information Theory, 59(1):482-494, 2013.

Gesine Reinert and Adrian Röllin. Multivariate normal approximation with Stein's method of exchangeable pairs under a general linearity condition. The Annals of Probability, 37(6):2150-2173, 2009.

Zuofeng Shang and Guang Cheng. Local and global asymptotic inference in smoothing spline models. The Annals of Statistics, 41(5):2608-2638, 2013.

Zuofeng Shang and Guang Cheng. Nonparametric inference in generalized functional linear models. The Annals of Statistics, 43(4):1742-1773, 2015.

Zuofeng Shang and Guang Cheng. Computational limits of a distributed algorithm for smoothing spline. The Journal of Machine Learning Research, 18(1):3809-3845, 2017.

Martin Slawski and Ping Li. b-bit marginal regression. In Advances in Neural Information Processing Systems, pages 2062-2070, 2015.

Martin Slawski and Ping Li. On the trade-off between bit depth and number of samples for a basic approach to structured signal recovery from b-bit quantized linear measurements. IEEE Transactions on Information Theory, 64(6):4159-4178, 2018.

Ananda Theertha Suresh, Felix X Yu, Sanjiv Kumar, and H Brendan McMahan. Distributed mean estimation with limited communication. In International Conference on Machine Learning, pages 3329-3337. PMLR, 2017.

Botond Szabó and Harry van Zanten. Adaptive distributed methods under communication constraints. The Annals of Statistics, 48(4):2347-2380, 2020.

Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer Publishing Company, Incorporated, 1st edition, 2008. ISBN 0387790519.

Pınar Tüfekci. Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods. International Journal of Electrical Power & Energy Systems, 60:126-140, 2014.

Stephen B Vardeman and Chiang-Sheng Lee. Likelihood-based statistical estimation from quantized data. IEEE Transactions on Instrumentation and Measurement, 54(1):409-414, 2005.

Grace Wahba.
Spline Models for Observational Data, volume 59. SIAM, 1990.

Ganggang Xu and Jianhua Z Huang. Asymptotic optimality and efficient computation of the leave-subject-out cross-validation. The Annals of Statistics, 40(6):3003-3030, 2012.

Ganggang Xu, Zuofeng Shang, and Guang Cheng. Optimal tuning for divide-and-conquer kernel ridge regression with massive data. In International Conference on Machine Learning, pages 5483-5491. PMLR, 2018.

Ganggang Xu, Zuofeng Shang, and Guang Cheng. Distributed generalized cross-validation for divide-and-conquer kernel ridge regression and its asymptotic optimality. Journal of Computational and Graphical Statistics, 28(4):891-908, 2019.

Yun Yang, Zuofeng Shang, and Guang Cheng. Non-asymptotic analysis for nonparametric testing. In Jacob Abernethy and Shivani Agarwal, editors, Proceedings of Thirty Third Conference on Learning Theory, volume 125 of Proceedings of Machine Learning Research, pages 3709-3755. PMLR, 09-12 Jul 2020.

Lijun Zhang, Jinfeng Yi, and Rong Jin. Efficient algorithms for robust one-bit compressive sensing. pages 820-828, 2014.

Yuchen Zhang, John C Duchi, Michael I Jordan, and Martin J Wainwright. Information-theoretic lower bounds for distributed statistical estimation with communication constraints. In Advances in Neural Information Processing Systems, pages 2328-2336. Citeseer, 2013.

Rongda Zhu and Quanquan Gu. Towards a lower sample complexity for robust one-bit compressed sensing. In International Conference on Machine Learning, pages 739-747, 2015.

Yuancheng Zhu and John Lafferty. Quantized estimation of Gaussian sequence models in Euclidean balls. In Advances in Neural Information Processing Systems, pages 3662-3670, 2014.

Yuancheng Zhu and John Lafferty. Quantized minimax estimation over Sobolev ellipsoids. Information and Inference: A Journal of the IMA, 7(1):31-82, 2017.

Yuancheng Zhu and John Lafferty. Distributed nonparametric regression under communication constraints. In International Conference on Machine Learning, pages 6009-6017, 2018.

Appendix A. Structure of the proofs

In this section, we outline the high-level structure of the proofs for the main theorems. The proof of Theorem 1 is mainly based on Lemma 10, in which we provide an upper bound for the difference between two smoothing spline estimators. The proof of Theorem 3 relies on Stein's exchangeable pair method. Specifically, we first prove the asymptotic normality of
$$\frac{cT_{\mu^*,t,c} - \operatorname{trace}(A)\,\tau_k^{*2}}{s_c\,\tau_k^{*2}}$$
based on the $z_i^*$'s, where the $z_i^*$'s are the quantized samples corresponding to $\mu_j = \mu_j^*$ for $1 \le j \le k$, $\tau_k^{*2} = \operatorname{Var}(z_i^* \mid H_0)$, and
$$\mu_j^* = \frac{\sum_{i=1}^{n} E\{y_i I(y_i \in R_j(t))\}}{\sum_{i=1}^{n} P(y_i \in R_j(t))}. \qquad (23)$$
Next, we prove that
$$\frac{cT_{\mu^*,t,c} - \operatorname{trace}(A)\,\tau_k^{*2}}{s_c\,\tau_k^{*2}} - \frac{cT_{\mu,t,c} - \operatorname{trace}(A)\,\hat\tau_k^{2}}{s_c\,\hat\tau_k^{2}} = o_p(1).$$
In Lemma 11 and Lemma 12, we establish the error rate introduced by quantizing the variance using Algorithm 2, which is needed for the proof of Theorem 3. In Lemma 13, we quantify the difference between the quantized samples under H1 and under H0. In the proof of Theorem 5, we first decompose the test statistic into two parts,
$$\frac{cT_{\mu,t} - \operatorname{trace}(A)\,\hat\tau_k^{2}}{s_c\,\hat\tau_k^{2}} = \frac{z^T A z - (z^0)^T A z^0}{s_c\,\hat\tau_k^{2}} + \frac{(z^0)^T A z^0 - \operatorname{trace}(A)\,\hat\tau_k^{2}}{s_c\,\hat\tau_k^{2}},$$
where $z^0$ is the vector of quantized samples under $H_0: g_0 = 0$. By Theorem 3, the second term is $O_p(1)$. For the first term, it is straightforward to see that
$$z^T A z - (z^0)^T A z^0 = (z - z^0)^T A (z - z^0) + 2(z - z^0)^T A z^0.$$
In Lemma 14, we establish a lower bound for $(z - z^0)^T A z^0$. In Lemma 15 and Lemma 16, we establish the lower bound for $(z - z^0)^T A (z - z^0)$.
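The quadratic-form statistic manipulated in this outline is simple to compute once the quantized group averages are available. The sketch below assembles $\Sigma_c$, $\Omega_c$, and $A = (\Sigma_c + \lambda I_c)^{-1}\Omega_c(\Sigma_c + \lambda I_c)^{-1}$ from the truncated trigonometric series used throughout the appendix. The normalization $s_c^2 = 2\operatorname{trace}(A^2)$ and the first-difference variance proxy for $\hat\tau_k^2$ are our stand-in choices, since their exact definitions are given elsewhere in the paper.

```python
import numpy as np

def trig_kernel(d, power, n_terms=500):
    """K_p(d) = sum_{k>=1} 2 cos(2 pi k d) / (2 pi k)^power: the periodic
    kernel whose eigenvalues are (2 pi k)^{-power} (truncated series)."""
    k = np.arange(1, n_terms + 1)
    return (2 * np.cos(2 * np.pi * np.multiply.outer(d, k))
            / (2 * np.pi * k) ** power).sum(axis=-1)

def quantized_test_statistic(z, lam, m=2):
    """Sketch of the standardized statistic from Appendix A; s_c^2 = 2 trace(A^2)
    and the first-difference estimate of tau_k^2 are assumed normalizations."""
    c = len(z)
    grid = np.arange(1, c + 1) / c
    d = grid[:, None] - grid[None, :]
    Sigma = trig_kernel(d, 2 * m) / c   # Sigma_c = [K(i/c, i'/c)/c]
    Omega = trig_kernel(d, 4 * m) / c   # Omega_c: iterated kernel (squared eigenvalues)
    Sinv = np.linalg.inv(Sigma + lam * np.eye(c))
    A = Sinv @ Omega @ Sinv             # A = (Sigma_c + lam I)^{-1} Omega_c (Sigma_c + lam I)^{-1}
    tau2_hat = np.mean(np.diff(z) ** 2) / 2  # variance proxy from successive differences
    s_c = np.sqrt(2 * np.trace(A @ A))
    return (z @ A @ z - np.trace(A) * tau2_hat) / (s_c * tau2_hat)

# Under H0 (pure-noise group averages) the value is roughly standard normal (Theorem 3)
rng = np.random.default_rng(1)
print(quantized_test_statistic(0.1 * rng.standard_normal(200), lam=1e-4))
```

By Theorem 3 the statistic is asymptotically N(0, 1) under $H_0$, so $H_0$ is rejected when its absolute value exceeds $z_{1-\alpha/2}$.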
In the proof of Theorem 8, observe that for each m the test statistic
$$\xi_m = \frac{cT_m - \operatorname{trace}(A_m)\,\hat\tau_k^2}{s_{c,m}\,\hat\tau_k^2} = \frac{(z^0)^T A_m z^0 - \operatorname{trace}(A_m)\,\hat\tau_k^2}{s_{c,m}\,\hat\tau_k^2}$$
is a quadratic form. Lemma 17 proves that the maximum of such quadratic forms follows an extreme value distribution, and Lemma 18 provides the rate conditions under which Lemma 17 holds. The idea of the proof of Theorem 9 is similar to the proofs of Theorem 5 and Theorem 7.

Appendix B. Notation

In this section, we first summarize some notations which are frequently used throughout the paper for the reader's convenience.

Symbol: Description
$c$: number of groups.
$\tilde n$: number of observations in each group, defined as $\tilde n = n/c$.
$(\mu_1, \dots, \mu_k)^T$: quantized values.
$(t_1, \dots, t_{k-1})^T$: cut-off points of the quantization intervals.
$y = (y_1, \dots, y_n)^T$: vector of responses.
$\tilde y = (\tilde y_1, \dots, \tilde y_c)^T$: averages of the responses, defined as $\tilde y_i = \tilde n^{-1}\sum_{j=(i-1)\tilde n+1}^{i\tilde n} y_j$.
$z = (z_1, \dots, z_c)^T$: vector of quantized samples.
$z^0 = (z^0_1, \dots, z^0_c)^T$: vector of quantized samples under $H_0: g_0 = 0$.
$\tilde z = (\tilde z_1, \dots, \tilde z_c)^T$: vector of truncated quantized samples, where $\tilde z_i = z_i \mathbf{1}(c_s\rho + \sigma|\epsilon_j| \le \sqrt{T_n}$ for all $j = (i-1)\tilde n + 1, \dots, i\tilde n)$.
$\tilde z^0 = (\tilde z^0_1, \dots, \tilde z^0_c)^T$: vector of truncated quantized responses under $H_0 : g_0 = 0$, defined analogously.
$y^{\mathrm{linear}} = (y^{\mathrm{linear}}_1, \dots, y^{\mathrm{linear}}_n)^T$: newly defined data for testing the linearity of $g_0$, where $y^{\mathrm{linear}}_i = Q(y_i) - \hat g(i/n)$ and $\hat g(i/n)$ is the least-squares estimator of $g$.
$z^{\mathrm{linear}} = (z^{\mathrm{linear}}_1, \dots, z^{\mathrm{linear}}_c)^T$: vector of quantized values of the $y^{\mathrm{linear}}_i$.
$z^{\mathrm{linear}}_0$: quantized values of the $y^{\mathrm{linear}}_i$ under $H^{\mathrm{linear}}_0$: $g_0$ is linear.
$\lambda$: smoothing parameter.
$\{\phi_i(x)\}_{i=1}^{\infty}$: trigonometric basis functions.
$K(\cdot,\cdot)$: kernel function.
$\Sigma_c$: kernel matrix, $\Sigma_c = [K(i/c, i'/c)/c]_{1\le i,i'\le c}$.
$\Omega_c$: iterated-kernel matrix, $\Omega_c = [K^2(i/c, i'/c)/c]_{1\le i,i'\le c}$.
$A$: $A = (\Sigma_c + \lambda I_c)^{-1}\Omega_c(\Sigma_c + \lambda I_c)^{-1}$.
$\zeta$: approximation error between the Riemann sum and the integral.
$c_s$: Sobolev constant, defined as $c_s = \sup_{f \in S^m(I)} \|f\|_{\sup}$.
$C_k(t)$: maximum length of the quantization intervals.

Table 1: Some useful notations that are frequently used throughout the paper.

Appendix C. Useful Lemmas

The proofs of the theorems require some preliminary lemmas, which we summarize in this section. Throughout the proofs, we let $\tilde y_i = \tilde n^{-1}\sum_{j=(i-1)\tilde n+1}^{i\tilde n} y_j$, $i = 1, \dots, c$, and we denote by $\hat g^{ss}$ the canonical smoothing spline based on the full data set, by $\hat g^{ss}_c$ the smoothing spline based on the averaged responses $\{\tilde y_1, \dots, \tilde y_c\}$, and by $\hat g^B_{\mu,t,c}$ the desired B-bits estimator, i.e.,
$$\hat g^{ss} = \arg\min_{g\in S^m(I)} \frac{1}{n}\sum_{i=1}^{n} (y_i - g(i/n))^2 + \lambda\int_0^1 [g^{(m)}(x)]^2\,dx,$$
$$\hat g^{ss}_c = \arg\min_{g\in S^m(I)} \frac{1}{c}\sum_{i=1}^{c} (\tilde y_i - g(i/c))^2 + \lambda\int_0^1 [g^{(m)}(x)]^2\,dx,$$
$$\hat g^B_{\mu,t,c} = \arg\min_{g\in S^m(I)} \frac{1}{c}\sum_{i=1}^{c} (z_i - g(i/c))^2 + \lambda\int_0^1 [g^{(m)}(x)]^2\,dx.$$
The following lemma shows that the distance between $\hat g^B_{\mu,t,c}$ and $\hat g^{ss}_c$ can be well controlled by carefully choosing the quantization parameters $\mu$, $t$ and $c$.

Lemma 10 For any $\mu = (\mu_1, \dots, \mu_k)^T \in \mathbb{R}^k$ and $t = (t_1, \dots, t_{k-1})^T \in \mathbb{R}^{k-1}$, it holds that
$$\|\hat g^B_{\mu,t,c} - \hat g^{ss}_c\|^2 \le c^{-1}\sum_{i=1}^{c} (z_i - \tilde y_i)^2. \qquad (24)$$
Proof Recall that bg B µ,t,c = Pc i=1 bθi Ki/c, where (bθ1, . . . , bθc)T = c 1(Σc + λIc) 1z with Σc = [K(i/c, i /c)/c]1 i,i c Rc c, z = (z1, . . . , zc)T Rc, and K( , ) is the kernel function. Similarly, bgss c = Pc i=1 eθi Ki/c, where (eθ1, . . . , eθc)T = c 1(Σc +λIc) 1ey with ey = (ey1, . . . , eyc)T . Let bθ = (bθ1, . . .
, bθc)T , eθ = (eθ1, . . . , eθc)T . By direct calculations, we have bg B µ,t,c bgss c = i=1 (bθi eθi)ϕν(i/c) ΦT ν (bθ eθ) where ϕ2k 1(x) = 2 cos(2πkx), ϕ2k(x) = 2 sin(2πkx) are the trigonometric basis functions, γ2k 1 = γ2k = (2πk)2m, and Φν = (ϕν(1/c), ϕν(2/c), . . . , ϕν(c/c))T . So bg B µ,t,c bgss c 2 = |ΦT ν (bθ eθ)|2 = (bθ eθ)T X ΦνΦT ν γ2ν (bθ eθ) = c 1(z ey)T (Σc + λIc) 1Ωc(Σc + λIc) 1(z ey). (25) We now look at Σc and Ωc. To ease our calculations, for 0 l c 1, we first define the following two notations, cos(2πkl/c) (2πk)2m , dl = 2 cos(2πkl/c) Since d l = d c l, dl = dc l for l = 1, 2, . . . , c 1, we know Σc, Ωc are both symmetric circulant of order c. Furthermore, Σc and Ωc share the same normalized eigenvectors as xr = 1 c(1, εr, ε2r, . . . , ε(c 1)r)T , r = 0, 1, . . . , c 1, Li, Liu, Xu and Shang where ε = exp(2π 1/c). Let M = (x0, x1, . . . , xc 1), and M be the conjugate transpose of M. Clearly, MM = Ic and Σc, Ωc admit the following decomposition Σc = MΛd M , Ωc = MΛd M , (26) where Λd = diag(λd ,0, λd ,1, . . . , λd ,c 1) and Λd = diag(λd,0, λd,1, . . . , λd,c 1) with λd ,l = d 0 + d 1εl + . . . + d c 1ε(c 1)l and λd,l = d0 + d1εl + . . . + dc 1ε(c 1)l. Direct calculations show that ( 2 P k=1 1 (2πkc)2m , l = 0, P k=1 1 [2π(kc l)]2m + P k=0 1 [2π(kc+l)]2m , 1 l c 1 ( 2 P k=1 1 (2πkc)4m , l = 0, P k=1 1 [2π(kc l)]4m + P k=0 1 [2π(kc+l)]4m , 1 l c 1. It is easy to examine that ( 2 d m(2πc) 2m, l = 0, 1 [2π(c l)]2m + 1 (2πl)2m + P k=2 1 [2π(kc l)]2m + P k=1 1 [2π(kc+l)]2m , 1 l c 1, ( 2 dm(2πc) 4m, l = 0, 1 [2π(c l)]4m + 1 (2πl)4m + P k=2 1 [2π(kc l)]4m + P k=1 1 [2π(kc+l)]4m , 1 l c 1, d m(2πc) 2m 1 [2π(kc l)]2m d m(2πc) 2m, d m(2πc) 2m 1 [2π(kc + l)]2m d m(2πc) 2m, 1 [2π(kc l)]4m dm(2πc) 4m, 1 [2π(kc + l)]4m dm(2πc) 4m, k=1 k 2m, d m := k=2 k 2m, dm := k=1 k 4m, dm := k=2 k 4m. (29) It follows from (27) and (28) that λd,l λ2 d ,l for 0 l c 1. Therefore, (z ey)T (Σc + λIc) 1Ωc(Σc + λIc) 1(z ey) = (z ey)T Mdiag λd,0 (λ + λd ,0)2 , . . . , λd,c 1 (λ + λd ,c 1)2 (z ey)T MM (z ey) = (z ey)T (z ey). Nonparametric Inference under B-bits Quantization Therefore, it follows by (25) that bg B µ,t,c bgss c 2 c 1(z ey)T (z ey) = c 1 c X i=1 (zi eyi)2. (30) This completes the proof. Lemma 11 Suppose Condition (B) holds true, and it holds that Ck(t) 0, then we have τ 2 k = V ar(z1|H0) = O(en 1 + Ck(t)2) and bτ 2 k = eτ 2 n 2en(n 1) = τ 2 k 1 + Op(n 1/2 + Ck(t)) = τ 2 k[1 + op(1)]. Proof By the definition of τ 2 k and (6) we have τ 2 k = V ar(z1|H0) = 1 i=1 Q(yi)|H0) + O(Ck(t)2) = 1 en V ar(Q(σϵ1)) + O(Ck(t)2) j=1 µ2 j P (Q(σϵ1) = µj) 1 j=1 µj P (Q(σϵ1) = µj) + O(Ck(t)2) j=1 µ2 j P (σϵ1 Rj(t)) 1 j=1 µj P (σϵ1 Rj(t)) + O(Ck(t)2) = R1 + R2 + O(Ck(t)2). (31) Assume that for 2 s k 1, t1 < t2 < < ts 1 0 < ts < < tk 1 and let p(ϵ) be the density function of ϵ1. Then we have j=2 µ2 j P (σϵ1 Rj(t)) tj 1/σ p(ϵ)dϵt2 j 1 tj 1/σ p(ϵ)dϵ t2 j + Ck (t)2 tj 1/σ ϵ2p(ϵ)dϵ + 2Ck(t)2 s 1 X tj 1/σ p(ϵ)dϵ j=s+1 µ2 j P (σϵ1 Rj(t)) tj 1/σ p(ϵ)dϵt2 j tj 1/σ p(ϵ)dϵ t2 j 1 + Ck(t)2 tj 1/σ ϵ2p(ϵ)dϵ + 2Ck(t)2 k 1 X tj 1/σ p(ϵ)dϵ. Li, Liu, Xu and Shang The fact that |µs| Ck(t) and the above inequalities lead j=2 µ2 j P (σϵ1 Rj(t)) + j=s+1 µ2 j P (σϵ1 Rj(t)) + µ2 1P (σϵ1 R1(t)) + µ2 k P (σϵ1 Rk(t)) + µ2 s P (σϵ1 Rs(t)) R ϵ2p(ϵ)dϵ + 3Ck(t)2 + O(1) . This proves R1 1 en. 
On the other hand, by t2 1 > σ2 and ts 1 = O (Ck(t)) = o(1), we have j=2 µ2 j P (σϵ1 Rj(t)) tj 1/σ p(ϵ)dϵ t2 j 1/2 Ck(t)2 Z tj/σ tj 1/σ p(ϵ)dϵ tj 1/σ ϵ2p(ϵ)dϵ Ck(t)2 s 1 X tj 1/σ p(ϵ)dϵ t1/σ ϵ2p(ϵ)dϵ Ck(t)2 ) This proves R1 1 en, which implies R1 = O(en 1). Using a similar approach, we can prove R2 = O(en 1). From (31), we get τ 2 k = O(en 1 + Ck(t)2). Now we prove bτ 2 k = eτ 2 n 2en(n 1) = τ 2 k[1 + op(1)]. By the definition of eτn, we have eτ 2 n 2en(n 1) = 1 2en(n 1) i=2 (Q(yi) Q(yi 1))2 = 1 2en(n 1) i=2 {Q(g0(i/n) + σϵi) Q(g0((i 1)/n) + σϵi 1)}2 = 1 2en(n 1) n g0(i/n) g0((i 1)/n) + Q(σϵi) Q(σϵi 1) + eδi o2 = 1 2en(n 1) i=2 {Q(σϵi) Q(σϵi 1)}2 + 1 2en(n 1) n g0(i/n) g0((i 1)/n) + eδi o2 + 1 en(n 1) i=2 {Q(σϵi) Q(σϵi 1)} n g0(i/n) g0((i 1)/n) + eδi o , Nonparametric Inference under B-bits Quantization where |eδi| = O(Ck(t)) for i = 1, , n. Note that, by the central limit theorem, it holds that 1 2(n 1) i=2 {Q(σϵi) Q(σϵi 1)}2 = V ar [Q(σϵ1)] + Op(n 1/2), and that g0(i/n) g0((i 1)/n) = O(1/n) by the smoothness of function g0, we have that eτ 2 n 2en(n 1) = 1 h V ar(Q(σϵ1)) + O(n 2 + Ck(t)2) + Op(n 1/2) + O(n 1 + Ck(t)) i = τ 2 k h 1 + Op(n 1/2 + Ck(t)) i , (32) which completes the proof. Lemma 12 Suppose Condition (B) holds true, and it holds that Ck(t) 0. Let bτ 2 k be the quantied variance based on ylinear, then we have that bτ 2 k = eτ 2 n 2en(n 1) = τ 2 k 1 + Op(n 1/2 + Ck(t)) = τ 2 k[1 + op(1)]. Proof By the definition of bτ 2 k, we have eτ 2 n 2en(n 1) = 1 2en(n 1) i=2 (Q(ylinear i ) Q(ylinear i 1 ))2 = 1 2en(n 1) i=2 {Q[Q(g0(i/n) + σϵi) byi] Q[Q(g0((i 1)/n) + σϵi 1)] byi 1}2 = 1 2en(n 1) n g0(i/n) g0((i 1)/n) + byi byi 1 + Q(σϵi) Q(σϵi 1) + eδi o2 , where |eδi| = O(Ck(t)) for i = 1, , n. Note that bg L(I), one has that |byi byi 1| = Op(1/n) by the smoothness of bg. Similar to the proof in Lemma 11, we get bτ 2 k = eτ 2 n 2en(n 1) = τ 2 k 1 + Op(n 1/2 + Ck(t)) = τ 2 k[1 + op(1)]. To ease calculation, we define some useful notations. Let z0 i be the quantized data conditional on g0(x) = 0 and z0 = (z0 1, . . . , z0 c)T . According to (6), we have z0 i 1 j=(i 1)en+1 Q(σϵj) Ck(t), for i = 1, . . . , c. (33) Furthermore, we let ez0 = (ez0 1, . . . , ez0 c)T , ez = (ez1, . . . , ezc)T , where for i = 1, . . . , c, ez0 i = z0 i 1(csρ + σ|ϵj| p Tn for all j = (i 1)en + 1, . . . ien), ezi = zi1(csρ + σ|ϵj| p Tn for all j = (i 1)en + 1, . . . ien). Li, Liu, Xu and Shang Lemma 13 Suppose g is the regression function generating the samples. Suppose Condition (B) holds and σ|ϵj| + csρ Tn holds for all j = 1, . . . , n. Then for any g Sm(I) with J(g) ρ2, it holds that |ezi ez0 i f(i/c)| 4Ck(t) + ζ, i = 1, . . . , c, where f is the corresponding integral equation defined in (15), ζ = max i=1,...,c |f(i/c) 1 en Pien j=(i 1)en+1 g(j/n)| = max i=1,...,c | 1 2 R min(i/c+ ,1) max(i/c ,0) g(s)ds 1 en Pien j=(i 1)en+1 g(j/n)| and = 1 Proof : Suppose σϵi Rj(t) for some 1 j k. Since min{t2 1, t2 k 1} = Tn and csρ+σ|ϵi| Tn, we must have 2 j k 1. Suppose that g(i/n) + σϵi Rl(t) for some 1 l k. Since min{t2 1, t2 k 1} = Tn and by (12) implying |g(i/n)| csρ, we have |yi| = |g(i/n) + σϵi| |g(i/n)| + |σϵi| csρ + |σϵi| p Tn = min{|t1|, |tk 1|}. Therefore, 2 l k 1. Since tj 1 σϵi < tj, tl 1 g(i/n) + σϵi < tl, tj 1 µj < tj, tl 1 µl < tl, we have tl 1 tj < µl µj < tl tj 1, tl 1 tj < g(i/n) < tl tj 1. Hence it holds that |Q(yi) Q(σϵi) g(i/n)| = |µl µj g(i/n)| |tl tl 1| + |tj tj 1| 2Ck(t). (34) Since csρ + σ|ϵj| Tn for all j = 1, . . . 
, n, the result follows from (6) and (33) that |ezi ez0 i f(i/c)| = zi Pien j=(i 1)en+1 Q(yj) Pien j=(i 1)en+1 Q(σϵj) Pien j=(i 1)en+1 Q(yj) Pien j=(i 1)en+1 Q(σϵj) Pien j=(i 1)en+1 ((Q(yj) Q(σϵj)) enf(i/c) en + 2Ck(t) Pien j=(i 1)en+1 {Q(yj) Q(σϵj) g(j/n)} + Pien j=(i 1)en+1 g(j/n) enf(i/c) en + 2Ck(t) Pien j=(i 1)en+1 {Q(yj) Q(σϵj) g(j/n)} + Pien j=(i 1)en+1 g(j/n) enf(i/c) en + 2Ck(t) 4Ck(t) + ζ, where the last inequality follows from (34) and the definition of ζ. Nonparametric Inference under B-bits Quantization Lemma 14 Suppose Condition (B) holds, and h 0, ch . Then for any g Sm(I) with J(g) ρ2, we have sup g Sm ρ (I) E{|(ez ez0)T Az0|2} (1 + (ch2) 1)(τ 2 k + 4Ck(t)2) Pc i=1(|f(i/c)| + 4Ck(t) + ζ)2 8, as c , (35) where ζ = max i=1,...,c | 1 2 R min(i/c+ ,1) max(i/c ,0) g(s)ds 1 en Pien j=(i 1)en+1 g(j/n)| and = 1 Proof : For convenience, let ωi = ezi ez0 i . From Lemma 13 and the fact that ezi ez0 i = 0 if csρ + σ|ϵj| > Tn for some (i 1)en + 1 j ien, it holds that |ωi| |f(i/c)| + 4Ck(t) + ζ for all 1 i c. (36) According to (33) and the fact that j=(i 1)en+1 Q(σϵj) j=(i 1)en+1 σϵj one has that j=(i 1)en+1 Q(σϵj) + Ck(t) 2Ck(t), which further implies that E(|z0 i |2) = V ar(z0 i ) + |E(z0 i )|2 τ 2 k + 4Ck(t)2. For any g Sm(I) with J(g) ρ2, we have E{|(ez ez0)T Az0|2} u,v=1 au,vωuz0 v)2} 1 u1,u2,v1,v2 c au1,v1au2,v2E{ωu1ωu2z0 v1z0 v2} u1,u2,v1 are mutually different au1,v1au2,v1E{ωu1}E{ωu2}E{|z0 v1|2} + X u1 =v1 a2 u1,v1E{ω2 u1}E{|z0 v1|2} u1 =u2 au1,u1au2,u2E{ωu1z0 u1}E{ωu2z0 u2} + X u1 =u2 au1,u1au2,u1E{ωu1|z0 u1|2}E{ωu2} u1 a2 u1,u1E{ω2 u1|z0 u1|2} T1 + T2 + T3 + T4 + T5. To complete the proof, we will analyze the above T1 through T5 terms. Li, Liu, Xu and Shang For T1, we have T1 = (τ 2 k + 4Ck(t)2) X v1 =u1,u2 au1,v1au2,v1E{ωu1}E{ωu2} (τ 2 k + 4Ck(t)2) v1 =u1,u2 au1,v1au2,v1 E{ωu1}E{ωu2} = (τ 2 k + 4Ck(t)2)E{x}T (A A0)2E{x} 2(τ 2 k + 4Ck(t)2)E{x}T (A2 + A2 0)E{x}, where recall A0 = diag(a1,1, . . . , ac,c). Since A Ic and a1,1 = = ac,c 1/(ch) = o(1), we have A2 + A2 0 2Ic (as c ), which, together with (36), further leads to T1 4(τ 2 k + 4Ck(t)2)E{x}T E{x} 4(τ 2 k + 4Ck(t)2) i=1 (|f(i/c)| + 4Ck(t) + ζ)2 4(1 + (ch2) 1)(τ 2 k + 4Ck(t)2) i=1 (|f(i/c)| + 4Ck(t) + ζ)2. For T2, we have T2 = (τ 2 k + 4Ck(t)2) v1 =u1 a2 u1,v1 (τ 2 k + 4Ck(t)2) v1=1 a2 u1,v1 (|fu1| + 4Ck(t) + ζ)2 (1 + (ch2) 1)(τ 2 k + 4Ck(t)2) i=1 (|f(i/c)| + 4Ck(t) + ζ)2, where the last inequality follows from v=1 a2 i,v = 1 λ2 d,r (λ + λd ,r)4 (ch) 1 0. Here the above is uniformly of 1 i c. For T3, Cauchy inequality implies that i=1 ai,i E{ωiz0 i } i=1 a2 i,i|E{ωiz0 i }|2 c(τ 2 k + 4Ck(t)2) i=1 a2 i,i(|f(i/c)| + 4Ck(t) + ζ)2 (1 + (ch2) 1)(τ 2 k + 4Ck(t)2) i=1 (|f(i/c)| + 4Ck(t) + ζ)2, Nonparametric Inference under B-bits Quantization where the last inequality follows from ca2 1,1 = . . . = ca2 c,c (ch2) 1, as c . For T4, we have i =v ai,iav,i E{ωi|z0 i |2}E{ωv} i=1 ai,i E{|ωi| |z0 i |2} X v =i |av,i|E{|ωv|} τ 2 k + 4Ck(t)2 i=1 (|f(i/c)| + 4Ck(t) + ζ) i=1 (|f(i/c)| + 4Ck(t) + ζ)2 τ 2 k + 4Ck(t)2 i=1 (|f(i/c)| + 4Ck(t) + ζ)2 τ 2 k + 4Ck(t)2 i=1 (|f(i/c)| + 4Ck(t) + ζ)2 τ 2 k + 4Ck(t)2 i=1 (|f(i/c)| + 4Ck(t) + ζ)2 (1 + (ch2) 1)(τ 2 k + 4Ck(t)2) i=1 (|f(i/c)| + 4Ck(t) + ζ)2. For T5, it holds that i=1 a2 i,i E{ω2 i |z0 i |2} τ 2 k + 4Ck(t)2 i=1 (|f(i/c)| + 4Ck(t) + ζ)2 (1 + (ch2) 1)(τ 2 k + 4Ck(t)2) i=1 (|f(i/c)| + 4Ck(t) + ζ)2. From the above analysis of T1 through T5, we get that as c , for any g Sm(I) with J(g) ρ2, it follows that E{|(ez ez0)T Az0|2} 8(1 + (ch2) 1)(τ 2 k + 4Ck(t)2) i=1 (|f(i/c)| + 4Ck(t) + ζ)2. 
This proves the desired result. For ν = 1, 2, . . . , c, define Φν = (ϕν(1/c), ϕν(2/c), . . . , ϕν(c/c))T . Let ε = exp(2π 1/c), xr = 1 c(1, εr, ε2r, . . . , ε(c 1)r)T , r = 0, 1, . . . , c 1, and x r be the conjugate transpose of xr. Li, Liu, Xu and Shang Lemma 15 For 0 r c 1 and 1 v c 1, one has that x rΦ2(pc+v) 1 = rc 2 εv I(r = v) + ε v I(r + v = c) , x rΦ2(pc+v) = r 2 εv I(r = v) ε v I(r + v = c) ; x rΦ2(pc+c) 1 = rc x rΦ2(pc+c) = 0. Proof : The proof can be accomplished by direct calculations. For instance, the first case holds by following arguments. For 0 r c 1 and 1 v c 1, x rΦ2(pc+v) 1 = 1 i=1 ε r(i 1) cos 2πvi i=1 ε r(i 1)(εvi + ε vi) i=0 ε (r v)iεv + 1 i=0 ε (r+v)iε v 2 εv I(r = v) + ε v I(r + v = c) . The proof of other cases is similar. Let M = (x0, x1, . . . , xc 1) and M f = (e0(f), e1(f), . . . , ec 1(f))T , where f = (f(1/c), . . . , f(c/c))T . Recall M is the conjugate transpose of M. Suppose f Sm(I) admits Fourier expansion f = P ν=1 fνϕν. Lemma 16 There exists a universal constant ϱ > 0 s.t. for any f Sm(I), f T (Ic A)f ϱc(λ + c 2m)J(f). Proof : For simplicity, denote er = er(f). For 1 r c/2, we have λ2 d ,r λd,r (2πr) 2m + (2π(c r)) 2m + 2 d m(2πc) 2m 2 (2πr) 4m (2πr) 2m + (1 + 21 2m d m)(πc) 2m 2 (2πr) 4m = 2(1 + 21 2m d m)(2πr) 2m(πc) 2m + (1 + 21 2m d m)2(πc) 4m. Nonparametric Inference under B-bits Quantization Therefore, it follows that 1 λd,r (λ + λd ,r)2 = λ2 + 2λλd ,r + λ2 d ,r λd,r (λ + λd ,r)2 2λ λ + λd ,r + λ2 d ,r λd,r (λ + λd ,r)2 2λ + 2(1 + 21 2m d m)(πc) 2m + 22m(1 + 21 2m d m)2(πc) 2m (2πr)2m ϱ m(λ + c 2m)(2πr)2m, (37) where ϱ m = max{2, (2(1+21 2m d m)+22m(1+21 2m d m)2)π 2m}, and d m is defined in (29). By Lemma 15 and direct calculations, for 1 r c 1, we have v=1 f2(pc+v) 1x rΦ2(pc+v) 1 + v=1 f2(pc+v)x rΦ2(pc+v) v=1 f2(pc+v) 1 2εv I(r = v) + rc 2ε v I(v + r = c) v=1 f2(pc+v) 2εv I(r = v) r 2ε v I(r + v = c) f2(pc+r) 1 + f2(pc+c r) 1 + 1f2(pc+c r) . Therefore, it holds that |er|2 = c 2 f2(pc+r) 1 + f2(pc+c r) 1 f2(pc+r) f2(pc+c r) It is easy to see that |er|2 = |ec r|2, r = 1, . . . , c 1. Li, Liu, Xu and Shang For 1 r c/2, we have p=0 f2 2(pc+r) 1(2π(pc + r))2m X p=0 (2π(pc + r)) 2m p=0 f2 2(pc+c r) 1(2π(pc + c r))2m X p=0 (2π(pc + c r)) 2m p=0 f2 2(pc+r)(2π(pc + r))2m X p=0 (2π(pc + r)) 2m p=0 f2 2(pc+c r)(2π(pc + c r))2m X p=0 (2π(pc + c r)) 2m p=0 f2 2(pc+r) 1(2π(pc + r))2m + c p=0 f2 2(pc+c r) 1(2π(pc + c r))2m p=0 f2 2(pc+r)(2π(pc + r))2m + c p=0 f2 2(pc+c r)(2π(pc + c r))2m 2m 2m 1(2πr) 2m, (38) where (38) follows by an elementary inequality p=0 (2π(pc + r)) 2m 2m 2m 1(2πr) 2m, 1 r c/2. Meanwhile, a similar analysis leads to |e0|2 = c 2 p=0 f2(pc+c) 1 p=0 f2 2(pc+c) 1(2π(pc + c))2m X p=0 (2π(pc + c)) 2m p=0 f2 2(pc+c) 1(2π(pc + c))2m p=1 (2πp) 2m. (39) Nonparametric Inference under B-bits Quantization Now it follows from (37), (38) and (39), and elementary facts λd ,r = λd ,c r and λd,r = λd,c r, for 1 r c 1, that f T (Ic A)f 1 λd,r (λ + λd ,r)2 = 1 λd,0 (λ + λd ,0)2 1 λd,r (λ + λd ,r)2 1 λd,r (λ + λd ,r)2 |e0|2 + 2 X 1 λd,r (λ + λd ,r)2 p=0 f2 2(pc+c) 1(2π(pc + c))2m p=1 (2πp) 2m +2ϱ mc(λ + c 2m) 2m 2m 1 p=0 f2 2(pc+r) 1(2π(pc + r))2m p=0 f2 2(pc+c r) 1(2π(pc + c r))2m + p=0 f2 2(pc+r)(2π(pc + r))2m p=0 f2 2(pc+c r)(2π(pc + c r))2m ϱmc(λ + c 2m) f2 2ν 1 + f2 2ν (2πν)2m = ϱmc(λ + c 2m)J(f), where ϱm = max{P p=1(2πp) 2m/2, 4mϱ m/(2m 1)}. It is straightforward to see ϱm is a decreasing function with respect to m, therefore, we choose ϱ = ϱm=1. This proves Lemma 16. 
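The circulant calculations in Lemmas 10, 15 and 16 are easy to verify numerically: since $\Sigma_c$ is a symmetric circulant, the FFT of its first row must reproduce the aliased eigenvalue sums $\lambda_{d',l} = \sum_{k\ge 1}[2\pi(kc-l)]^{-2m} + \sum_{k\ge 0}[2\pi(kc+l)]^{-2m}$. The following short check is ours, not part of the paper; the truncation lengths are arbitrary.

```python
import numpy as np

# Sanity check: the FFT of the first row of the symmetric circulant Sigma_c
# recovers the aliased eigenvalue sums lambda_{d',l} used in Lemmas 10, 15, 16.
c, m, n_terms = 32, 2, 4000
k = np.arange(1, n_terms + 1)
l_over_c = np.arange(c) / c

# first row of Sigma_c: d'_l = (2/c) * sum_k cos(2 pi k l / c) / (2 pi k)^{2m}
row = (2 / c) * (np.cos(2 * np.pi * np.outer(l_over_c, k))
                 / (2 * np.pi * k) ** (2 * m)).sum(axis=1)
fft_eigs = np.real(np.fft.fft(row))

def aliased_eig(l, c, m, K=2000):
    """lambda_{d',l} = sum_{k>=1} (2 pi (kc-l))^{-2m} + sum_{k>=0} (2 pi (kc+l))^{-2m}."""
    ks = np.arange(1, K + 1)
    if l == 0:
        return 2.0 * np.sum((2 * np.pi * ks * c) ** (-2.0 * m))
    return (np.sum((2 * np.pi * (ks * c - l)) ** (-2.0 * m))
            + np.sum((2 * np.pi * (np.arange(K) * c + l)) ** (-2.0 * m)))

formula_eigs = np.array([aliased_eig(l, c, m) for l in range(c)])
print(np.max(np.abs(fft_eigs - formula_eigs)))  # ~0 up to series truncation
```

The same diagonalization gives $\Omega_c$'s eigenvalues $\lambda_{d,l}$ on replacing the exponent $2m$ by $4m$, which is exactly how the bound $\lambda_{d,l} \le \lambda_{d',l}^2$ enters the proofs above.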
The proof of Theorem 8 requires some recent Gaussian approximation result, i.e., Theorem 3.1 in Koike (2019). Lemma 17 For each c N, let Ψc be an c-dimensional centered Gaussian vector with covariance matrix Σc = (Σc(m, m ))1 m,m c and mu 2 be an integer. Also, for each m = ml, . . . , mu, let Am be an c c symmetric matrix and Zc = (Zc,ml, . . . , Zc,mu) be an mu ml + 1 -dimensional centered Gaussian vector with covariance matrix Cc = (Cc(m, m ))ml m,m mu . Set Fc,m := Ψ c AmΨc E Ψ c AmΨc and suppose that the following conditions are satisfied: 1. There is a constant b > 0 such that Cc(m, m) b for every c and every m = ml, . . . , mu. Li, Liu, Xu and Shang 2. maxml m mu E F 4 c,m 3E F 2 c,m 2 log6 mc 0 as c . 3. maxml m,m mu Cc(m, m ) E Fc,m Fc,m log2 mc 0 as c . Then we have P max ml m mu Fc,m x P max ml m mu Zc,m x 0, as c . Proof This is Theorem 3.1 in Koike (2019). The proof of Theorem 9 requires some rate conditions which are summarized in the following lemma. Lemma 18 Suppose λm = a2m n n 4m/(4m+1) log(mu)2m/(4m+1), then for any ml < m < mu , under Condition (C), the following rate conditions hold: hm log2(mu) 0; mu log2(mu)/chm 0; (hm /hm) 1/2 m2 u log2(mu) 0, for any ml m < m < mu; chm/ log(c) ; h1/2 m log(c) 0, where hm = λ 1 2m = ann 2/(4m+1) log(mu)1/(4m+1). Proof It is easy to see hml < hm < hmu. Therefore hm log2(mu) < hmu log2(mu) ann 2/(4mu+1)[log(n)]2 0. mu log2(mu)/chm < mu log2(mu)/chml n2/(4ml+1) log(n)/(can) 0. (hm /hm) 1/2 m2 u log2(mu) = n 4(m m) (4m+1)(4m +1) (log(mu)) 2m 2m (4m+1)(4m +1) m2 u log2(mu) n 4 (4mu+1)2 (log(mu)) 2 (4mu+1)2 m2 u log2(mu) 0. where the last 0 follows from the assumption mu logd0(n) for some d0 (0, 1/2). For the last two terms, one has that chm/ log(c) > chml/ log(c) can n2/(4ml+1) log(n) . h1/2 m log(c) < h1/2 mu log(c) a1/2 n n 1/(4mu+1) log(n) 0. Nonparametric Inference under B-bits Quantization Appendix D. Proofs for main theorems Proof of Theorem 1: It holds that bg B µ,t,c g0 2 2 bg B µ,t,c bgss c 2 + 2 bgss c g0 2, and we analyze these two terms separately. We first analyze bg B µ,t,c bgss c 2. Because |Q(yj) yj| Ck(t)1 n yj k 1 j=2Rj(t) o + |yj µ1|1 {yj R1(t)} + |yj µk|1 {yj Rk(t)} , (zi eyi)2 = j=(i 1)en+1 Q(yj) + 1 j=(i 1)en+1 Q(yj) eyi j=(i 1)en+1 j=(i 1)en+1 Ck(t)1 n yj k 1 j=2Rj(t) o + |yj µ1|1 {yj R1(t)} + |yj µk|1 {yj Rk(t)} )2 + 2Ck(t)2 4Ck(t)2 + 2 j=(i 1)en+1 (yj µ1)21 {yj R1(t)} + 2 j=(i 1)en+1 (yj µk)21 {yj Rk(t)} . Therefore, from Lemma 10, we have E bg B µ,t,c bgss c 2 c 1 c X i=1 E (zi eyi)2 4Ck(t)2 + 2 i=1 E (yi µ1)21 {yi R1(t)} + 2 i=1 E (yi µk)21 {yi Rk(t)} . On the other hand, by elementary calculations we have i=1 E (yi µ1)21 {yi R1(t)} = 2 (z µ1)2p z g0(i/n) i=1 E (yi µk)21 {yi Rk(t)} = 2 tk 1 (z µk)2p z g0(i/n) where p( ) is the distribution of ϵ. Combining the above, we get E bg B µ,t,c bgss c 2 Gc,k(t). (40) Next, we analyze the mean square error of the second term bgss c g0 2. For the sake of theoretical investigation, we introduce the following function, i=1 θnew,i Ki/c, Li, Liu, Xu and Shang where (θnew,1, . . . , θnew,c)T = c 1(Σc + λIc) 1 ef with ef = (f(1/c), . . . , f(c/c))T Rc, and f(x) is the integral function of g0 as defined in (15), i.e., Z min(x+ ,1) max(x ,0) g0(s)ds, where x [0, 1], and = 1 Recall that bgss c = Pc i=1 eθi Ki/c = c 1 P ν=1 ΦT ν (Σc+λIc) 1ey γν ϕν, where (eθ1, . . . , eθc)T = c 1(Σc + λIc) 1ey with ey = (ey1, . . . , eyc)T , ϕ2k 1(x) = 2 cos(2πkx), ϕ2k(x) = 2 sin(2πkx) are the trigonometric basis functions, γ2k 1 = γ2k = (2πk)2m, and Φν = (ϕν(1/c), ϕν(2/c), . . . , ϕν(c/c))T . 
Therefore, we have E(bgss c ) gnew 2 = c 1(eg ef)T (Σc + λIc) 1Ωc(Σc + λIc) 1(eg ef) c 1(eg ef)T (eg ef) = O(n 2), (41) where eg = (eg1, . . . , egc)T and egi = Pien j=(i 1)en+1 g0(j/n) en , i = 1, . . . , c. Next, we evaluate E( bgss c E(bgss c ) 2). Note that Σc, Ωc can be decomposed as Σc = MΛd M , Ωc = MΛd M , as defined in (26). Furthermore, we let ey0 = (ey0 1, . . . , ey0 c)T , where ey0 i = 1 en Pien j=(i 1)en+1 σϵj. Hence, we obtain E( bgss c E(bgss c ) 2) = 1 c2γ2ν ν=1 E ΦT ν (Σc + λIc) 1 ey0 2 ν=1 trace (Σc + λIc) 1 ΦνΦT ν (Σc + λIc) 1 Eeϵ2 (Σc + λIc) 1 X ΦνΦT ν /c γ2ν (Σc + λIc) 1 ! n trace (Σc + λIc) 1 Ωc (Σc + λIc) 1 n trace M (Λd + λIc) 1 Λd (Λd + λIc) 1 M λd,l λ + λd ,l 2 . Nonparametric Inference under B-bits Quantization By expressions of λd,l s, the above is upper bounded by the following 2σ2 dm n 2 d m + (2πc)2mλ 2 + σ2 1 + dm n 1 c 1 X (2π(c l)) 4m + (2πl) 4m (λ + (2π(c l)) 2m + (2πl) 2m)2 2σ2 dm n 2 d m + (2πc)2mλ 2 + 2σ2 1 + dm n 1 X (2πl) 4m + (2π(c l)) 4m (λ + (2πl) 2m + (2π(c l)) 2m)2 2σ2 dm n 2 d m + (2πc)2mλ 2 + 4σ2 1 + dm n 1 X (λ + (2πl) 2m)2 2σ2 dm n 2 d m + (2πc)2mλ 2 + 2σ2 1 + dm (1 + x2m)2 dx (1 + x2m)2 dx , where bm 1 is a constant only depending on m. From the above analysis, we obtain E( bgss c E(bgss c ) 2) n 1 + (nh) 1. Using above analysis and (41), we have E( bgss c gnew 2) n 1 + (nh) 1. (42) Now, we consider the difference between original regression function g0 and the integral function f defined in (15), i.e., f g0 2. By definition, for t [ , 1 ], there exists t between t and t + such that f(t) g0(t) = 1 2 t (g0(s) g0(t)) ds g 0(t)(s t) + g 0(t ) 2 (s t)2 ds 2 (s t)2ds = 2 On the other hand, for t [0, ], there exists t between 0 and t + such that f(t) g0(t) = 1 2 0 (g0(s) g0(t)) ds g 0(t )(s t) ds In a similar way, we obtain f(t) g0(t) 4 g 0(t ) for t [t , 1] and some t [t , 1]. Therefore, by Sobolev inequality, we know R 1 0 |g 0(t)|2dt R 1 0 |g(m) 0 (t)|2dt < and Li, Liu, Xu and Shang R 1 0 |g 0(t)|2dt R 1 0 |g(m) 0 (t)|2dt < , which implies f g0 2 = Z 1 0 |f(t) g(t)|2dt t |f(t) g(t)|2dt + Z t+ 0 |f(t) g(t)|2dt + Z 1 t |f(t) g(t)|2dt = O(c 3). (43) In the end, because both gnew and f belong to Sobolev space, and gnew can be viewed as the approximate error of spline estimates with respect to f without random error. By classical spline theory ((Wahba, 1990)), we know gnew f 2 = O(c 2m + λ). (44) As a consequence, from (42), (43), and (44), we have E bgss c g0 2 3E( bgss c gnew 2) + 3 gnew f 2 + 3 f g0 2 = O (nh) 1 + c 3 + c 2m + λ . Combining the result in (40), we get the desired result. Proof of Corollary 2: Because as |t1| , Gc,k,1(t) = 1 (z µ1)2σ 1p z g0(i/n) 2n 1σ 1 Z t1 g0(i/n) i=1 z2p(z)dz + 2n 1σ 1 Z t1 g0(i/n) i=1 µ2 1p(z)dz. The first term in the above equation is bounded by n 2m/(2m+1) because p(z) satisfies R |z| T z2p(z)dz = O(exp( T d)) and d 4m 2m+1, |t1| p log(n). Due to Condition (B), we know Gc,k,1(t) = O(n 2m/(2m+1)). Similarly, we know Gc,k,2(t) = O(n 2m/(2m+1)). Hence Gc,k(t) = Ck(t)2 + Gc,k,1(t) + Gc,k,2(t) = O(n 2m/(2m+1)). The result follows by Theorem 1 and λ n 2m/(2m+1), c n max{1,2m/3} Proof of Theorem 3: Suppose z i s are the quantized samples corresponding to µj = µ j for 1 j k, where µ j are defined by µ j = Pn i=1 E{yi I(yi Rj(t))} Pn i=1 P(yi Rj(t)) . (45) For p > 0, define the pth order moment of the standardized z i : mp = EH0{|z i /τ k|p}, (46) where EH0 denotes the expectation under H0 and τ 2 k = V ar(z i |H0). Because |µj µ j| Ck(t) for j = 2, . . . 
, k 1, and under Condition (B), we have that τ 2 k = O(cn 1). Fur- thermore, since b log2 p n Tnh1/2 , which implies that Ck(t)4 T 2 n 24b T 2 n n2h T 2 n = Nonparametric Inference under B-bits Quantization (n2h) 1 = o(c2n 2) and the assumption that E([en 1 Pen j=1 Q(ϵj)]4) = O(c2n 2), one has that mp = O(1) for p = 3, 4. Define zsd i = z i /τ k for i = 1, . . . , c. Then zsd i are iid variables with zero-mean and unit variance. Define z = (z 1, . . . , z c)T and zsd = (zsd 1 , . . . , zsd c )T . Define A0 = diag(a1,1, . . . , ac,c) and A1 = A A0. Let B = A1/sc. Define αl = λd,l (λ+λd ,l)2 , l = 0, . . . , c 1. Immediately, for all i = 1, . . . , c, ai,i = c 1 Pc 1 l=0 αl 1/(ch), therefore, i =i a2 i,i = i,i =1 a2 i,i = trace(A2) c h 1(1 1/(ch)) h 1, where the last follows from condition (ch) 1 = o(1). This implies that s2 c h 1. Furthermore, trace(A2) = l=0 α2 l h 1 and trace(A4) = l=0 α4 l h 1. (47) Let T µ ,t,c be the test statistic corresponding to z i s. By (25) it can be shown that c T µ ,t,c = z T Az , which leads to that c T µ ,t,c Pc i=1 ai,iτ 2 k scτ 2 k = z T Az Pc i=1 ai,iτ 2 k scτ 2 k = (zsd)T Azsd Pc i=1 ai,i sc Pc i=1 ai,i((zsd i )2 1) + P 1 i =i c ai,i zsd i zsd i sc = Pc i=1 ai,i((zsd i )2 1) sc + X i =i bi,i zsd i zsd i Q1 + Q2. We first look at Q1. By (46) we have i=1 ai,i((zsd i )2 1)|2}/s2 c = s 2 c i=1 a2 i,i(m4 1) m4 1 which leads to Q1 = o P (1). Define bi,i = 0 for i = 1, . . . , c and B = [bi,i ]1 i,i c. We next analyze Q2. Note that Q2 = (zsd)T Bzsd. Let (ezsd 1 , . . . , ezsd c )T be an independent copy of zsd = (zsd 1 , . . . , zsd c )T . Li, Liu, Xu and Shang Let I be uniform distributed on {1, 2, . . . , c}. Throughout, we let ezsd i , zsd i and I be mutually independent. Define ezsd = (zsd 1 , . . . , zsd I 1, ezsd I , zsd I+1, . . . , zsd c )T . So (zsd, ezsd) is an exchangeable pair (see Reinert and R ollin (2009)), and ezsd = zsd + e I(ezsd I zsd I ), where ej = (0, . . . , 0, 1, 0, . . . , 0)T with 1 being at the jth position for j = 1, . . . , c. Let Q 2 = ((ezsd)T Bezsd. By a simple calculation it can be shown that Q 2 Q2 = (ezsd)T Bezsd (zsd)T Bzsd = 2(ezsd I zsd I )e T I Bzsd. So it follows that E{Q 2 Q2|zsd} = E{2(ezsd I zsd I )e T I Bzsd|zsd} = 2 j=1 E{(ezsd j zsd j )e T j Bzsd|zsd} = 2 Let g 0 : R [0, 1] be a C3-function such that g 0(s) = 1 for s 0 and g 0(s) = 0 for s 1. Let Gu(s) = g 0(ψc(s u)) for u R, where ψc is a positive sequence tending to infinity and satisfying (m2 4 + m2 3 + m4)ψ2 ch = o(1), m3ψ3 ch1/2 = o(1). (49) The existence of such ψc follows by (46). Next we will approximate E{Gu(Q2) Gu(V )} where V N(0, 1). Consider Stein s equation Gu(s) E{Gu(V )} = g(s) zsd g(s), (50) where g and g represent firstand second-order derivatives of g. By Goldstein and Rinott (1996), a solution to (50) is 1 t V )} E{Gu(V )}]dt. (51) Let C1 = g 0 sup, C2 = g 0 sup, and C3 = ...g 0 sup, where ...g 0 is the third-order derivative of g 0. It is easy to see that ...g (s) = 1 t EU{... Gu( Clearly, it holds that g sup Gu sup C2ψ2 c and ...g sup ... Gu sup C3ψ3 c. By exchangeability, 1 2E{(Q 2 Q2)( g(Q 2) + g(Q2))} = 0. So E{(Q 2 Q2) g(Q2)} + 1 2E{(Q 2 Q2)( g(Q 2) g(Q2))} = 0. 
Since E{(Q 2 Q2) f(Q2)} = E{E{Q 2 Q2|w} g(Q2)} = Nonparametric Inference under B-bits Quantization c E{Q2 g(Q2)}, we have E{Q2 g(Q2)} E{ g(Q2)} = c 4E{(Q 2 Q2)( g(Q 2)) g(Q2))} E{ g(Q2)} = c 4E{ g(Q2)(Q 2 Q2)2} E{ g(Q2)} + c 0 (1 t) E{...g (Q2 + t(Q 2 Q2))(Q 2 Q2)3}dt = E{ g(Q2)( i=1 (ezsd i zsd i )2(e T i Bzsd)2 1)} 0 (1 t)E{...g (Q2 + t(Q 2 Q2)) i=1 (ezsd i zsd i )3(e T i Bzsd)3} J1 + J2. (52) Next, we analyze J1 and J2 separately. Let Mp = E{(zsd i )p} for p 1. For J1, by direct examinations we have |J1| C2ψ2 c E{| i=1 (1 + (zsd i )2)(e T i Bzsd)2 1|} C2ψ2 c E{| i=1 Di 1|2}1/2, where Di = (1 + (zsd i )2)(e T i Bzsd)2. Since Pc i=1 E{Di} = 1, we get that i=1 Di 1)2} = E{| i=1 [Di E{Di}]|2} i=1 E{(Di E{Di})2} + X i =i E{(Di E{Di})(Di E{Di })}. (53) The first term of (53) is equal to i=1 [3(3 + M4)(e T i B2ei)2 4(e T i B2ei)2] = (5 + 3M4) i=1 (e T i B2ei)2 = (5 + 3M4) (5 + 3M4)trace(A4)/s4 c (5 + 3M4)h, where the last follows by (47). Li, Liu, Xu and Shang The second term of (53) is equal to E{(1 + (zsd i )2)(1 + (zsd i )2)(e T i Bzsd)2(e T i Bzsd)2} E{(1 + (zsd i )2)(e T i Bzsd)2}E{(1 + (zsd i )2)(e T i Bzsd)2} E{(1 + (zsd i )2)(1 + (zsd i )2)E{(e T i Bzsd)2(e T i Bzsd)2|zsd i , zsd i }} E{(1 + (zsd i )2)(e T i Bzsd)2}E{(1 + (zsd i )2)(e T i Bzsd)2} . We have that E{(e T i Bzsd)2(e T i Bzsd)2|zsd i , zsd i } = E{(bi,i zsd i + X l =i,i bi,lzsd l )2(bi ,izsd i + X l =i,i bi ,lzsd l )2|zsd i , zsd i } = E{(N1 + N2 + N3)(N 1 + N 2 + N 3)|zsd i , zsd i } = E{N1N 1|zsd i , zsd i } + E{N1N 2|zsd i , zsd i } + E{N1N 3|zsd i , zsd i } +E{N2N 1|zsd i , zsd i } + E{N2N 2|zsd i , zsd i } + E{N2N 3|zsd i , zsd i } +E{N3N 1|zsd i , zsd i } + E{N3N 2|zsd i , zsd i } + E{N3N 3|zsd i , zsd i }, where N1 = (P l =i,i bi,lzsd l )2, N2 = 2 P l =i,i bi,lzsd l bi,i zsd i , N3 = b2 i,i (zsd i )2, N 1 = (P l =i,i bi ,lzsd l )2, N 2 = 2 P l =i,i bi ,lzsd l bi ,izsd i , N 3 = b2 i ,i(zsd i )2. By direct calculations, it is easy to see that E{N1N 1|zsd i , zsd i } = M4 X l =i,i b2 i,lb2 i ,l + X l1,l2 =i,i l1 =l2 b2 i,l1b2 i ,l2 + 2 X l1,l2 =i,i l1 =l2 bi,l1bi,l2bi ,l1bi ,l2, E{N1N 2|zsd i , zsd i } = 2M3bi ,izsd i X l =i,i b2 i,lbi ,l, E{N1N 3|zsd i , zsd i } = b2 i ,i(zsd i )2 X l =i,i b2 i,l, E{N2N 1|zsd i , zsd i } = 2M3bi,i zsd i X l =i,i b2 i ,lbi,l, E{N2N 2|zsd i , zsd i } = 4b2 i,i zsd i zsd i X l =i,i bi,lbi ,l, E{N2N 3|zsd i , zsd i } = E{N3N 2|zsd i , zsd i } = 0, E{N3N 1|zsd i , zsd i } = b2 i,i (zsd i )2 X l =i,i b2 i ,l, E{N3N 3|zsd i , zsd i } = b2 i,i b2 i ,i(zsd i )2(zsd i )2. Nonparametric Inference under B-bits Quantization Therefore, it can be shown that i =i [E{Di Di } E{Di}E{Di }] l =i,i b2 i,lb2 i ,l + 4 X l1,l2 =i,i l1 =l2 b2 i,l1b2 i ,l2 l1,l2 =i,i l1 =l2 bi,l1bi,l2bi ,l1bi ,l2 + 4M2 3 X i =i bi,i X l =i,i b2 i,lbi ,l i =i bi ,i X l =i,i bi,lb2 i ,l + 2(1 + M4) X i =i b2 i,i X l =i,i b2 i,l +2(1 + M4) X i =i b2 i,i X l =i,i b2 i ,l + 4M2 3 X i =i b2 i ,i X l =i,i bi,lbi ,l +(1 + M4)2 X i =i b4 i,i 4 X l =i b2 i,l X l =i b2 i ,l l =i,i b2 i,lb2 i ,l + 8 X l1,l2 =i,i l1 =l2 bi,l1bi,l2bi ,l1bi ,l2 + 4M2 3 X i =i bi,i X l =i,i b2 i,lbi ,l i =i bi ,i X l =i,i bi,lb2 i ,l + 2(1 + M4) X i =i b2 i,i X l =i,i b2 i,l +2(1 + M4) X i =i b2 i,i X l =i,i b2 i ,l + 4M2 3 X i =i b2 i ,i X l =i,i bi,lbi ,l + (1 + M4)2 X i =i b4 i,i (M2 4 + 12M2 3 + 10M4 + 13)trace(B4). The last inequality holds because each term in the summation is bounded by trace(B4) multiplied by suitable constants. Since B = (A A0)/sc, we have B2 2(A2 + A2 0)/s2 c and B4 8(A4 + A4 0)/s4 c. 
So it holds that trace(B4) 16s 4 c trace(A4), where the last inequality follows from the trivial fact trace(A4) Pc i=1 a4 i,i. From the above analysis, we get that |J1| C2(16M2 4 + 192M2 3 + 163M4 + 213)ψ2 cs 4 c trace(A4). (54) Li, Liu, Xu and Shang For J2, it holds that 0 (1 t)E{...g (Q2 + t(Q 2 Q2))(ezsd i zsd i )3(e T i Bzsd)3} i=1 E{|ezsd i zsd i |3|e T i Bzsd|3} = 2 ...g sup i=1 E{|ezsd i zsd i |}E{|e T i Bzsd|3} 32C3E{|zsd i |3}ψ3 c i=1 E{|e T i Bzsd|3} 32C3E{|zsd i |3}ψ3 c i=1 E{|e T i Bzsd|4}1/2E{|e T i Bzsd|2}1/2 3C3E{|zsd i |3}ψ3 c i=1 (e T i B2ei)3/2 3C3E{|zsd i |3}ψ3 c i=1 (e T i B2ei)2 c X i=1 e T i B2ei 3/2C3E{|zsd i |3}ψ3 c p 3/2C3E{|zsd i |3}ψ3 cs 2 c p By (47), (54) and s2 c h 1, we have |J1| C2(16M2 4 + 192M2 3 + 163M4 + 213)ψ2 ch, 3/2C3E{|zsd i |3}ψ3 ch1/2. By (49) the following holds uniformly for u R: E{Gu(Q2)} E{Gu(V )} 0, c . (55) Similarly, for e Gu(s) = g 0(ψn(s u)+1), it can be shown that the following statement holds uniformly for u R: E{ e Gu(Q2)} E{ e Gu(V )} 0, c . (56) By elementary facts, we have P(Q2 u) E{Gu(Q2)} P(Q2 u + ψ 1 c ), P(V u) E{Gu(V )} P(V u + ψ 1 c ), P(Q2 u ψ 1 c ) E{ e Gu(Q2)} P(Q2 u), P(V u ψ 1 c ) E{ e Gu(V )} P(V u). (57) By (55), (56) and (57), the following statements hold uniformly for u R, P(Q2 u) P(V u) E{Gu(Q2) Gu(V )} + P(V u + ψ 1 c ) P(V u) 0, P(V u) P(Q2 u) E{ e Gu(V ) e Gu(Q2)} + P(V u) P(V u ψ 1 c ) 0. Nonparametric Inference under B-bits Quantization Hence, as c tends to infinity, sup u R |P(Q2 u) P(V u)| 0. This, together with Q1 = o P (1), proves c T µ ,t,c trace(A)τ 2 k scτ 2 k d N(0, 1), as c . (58) Let zi s and Tµ,t,c be the quantized samples and testing statistics in Theorem 3, then we have c Tµ,t,c Pc i=1 ai,ibτ 2 k scbτ 2 k = c Tµ,t,c c T µ ,t,c scbτ 2 k + c T µ ,t,c Pc i=1 ai,ibτ 2 k scbτ 2 k = R1 + R2. We will analyze these two terms separately. For R1, one has that c Tµ,t,c c T µ ,t,c scbτ 2 k = z T Az z T Az scbτ 2 k = T A scbτ 2 k + 2 T Az scbτ 2 k , (59) where = ( 1, . . . , c)T with i = zi z i which satisfies E( 2 i ) C2 k(t) + 1/n under Condition B. For the first term, since 2 Op c Ck(t)2 + cn 1 and A Ic, it follows that scbτ 2 k c scbτ 2 k Op Ck(t)2 + n 1 = Op = Op nh1/2Ck(t)2 = op(1), where the last equality follows from the condition b log2 p n Tnh1/2 , which implies that 22b Tn (nh1/2)Tn = (nh1/2) 1. Li, Liu, Xu and Shang For the second term in (59), using the fact that ( i , z i )T , ( j, z j )T are independent if i = j, and Ez = 0, it is straightforward to show that E ( E )T Az 2 = i=1 a2 ii E(( i E i )z i )2 + j=1 a2 ij E(( i E i )( i E i )z j z j ) j=1 aijaji E(( i E i )z j ( j E j)z i ) j=1 aiiajj E(( i E i )z i ( j E j)z j ) j=1 ajjaii E(( j E j)z j ( i E i )z i ) i=1 a2 ii Ck(t)2τ 2 k + j=1 a2 ij Ck(t)2τ 2 k j=1 aijaji Ck(t)2τ 2 k + j=1 aiiajj Ck(t)2τ 2 k + j=1 ajjaii Ck(t)2τ 2 k Ck(t)2τ 2 k + 2trace(A2)Ck(t)2τ 2 k + 2[trace(A)]2Ck(t)2τ 2 k . In the proof of (47), we have shown that aii (ch) 1, trace(A) trace(A2) h 1, thence we have that E ( E )T Az 2 h 2Ck(t)2τ 2 k , which implies that scbτ 2 k = Op h 1Ck(t)τ k scbτ 2 k c/n h 1/2c/n Furthermore, since A Ic, we have that E (E )T Az 2 = E (E )T Az z T A(E ) = τ 2 k (E )T A2(E ) τ 2 k E 2 c Ck(t)2τ 2 k , which implies that scbτ 2 k = Op c Ck(t)τ k scbτ 2 k c/n h 1/2c/n and consequently, 2 T Az scbτ 2 k = Op Using the condition b log2 p (nh1/2 + n(ch) 1)Tn , and the fact that h 0, one has that 22b Tn (nh1/2 + n(ch) 1)Tn Tn (nh + n(ch) 1)Tn = (nh + n(ch) 1) 1, Nonparametric Inference under B-bits Quantization which gives that 2 T Az scbτ 2 k = op(1). 
Plugging this back to equation (59), we have that R1 = op(1). Now we analyze R2, by Lemma 11, we have c T µ ,t,c Pc i=1 ai,ibτ 2 k scbτ 2 k = c T µ ,t,c trace(A)τ 2 k + trace(A)τ 2 k trace(A)bτ 2 k scτ 2 k τ 2 k bτ 2 k = c T µ ,t,c trace(A)τ 2 k scτ 2 k + trace(A)(τ 2 k bτ 2 k) scτ 2 k + op(1) = c T µ ,t,c trace(A)τ 2 k scτ 2 k + Op(Ck(t)h 1/2). (nh1/2 + n(ch) 1)Tn, one has that Ckh 1/2 Tn q ch)Tnh = 1 q c/n = O(1). From (58), we get the desired result. Proof of Proposition 4: Suppose pσ( ) is the density of σϵ1. By direct calculations, we have E([en 1 en X j=1 Q(ϵj)]4) = 3σ4( 1 en3 )E2{Q(σϵ1)2} + σ2E{Q(σϵ1)4} For the first term, under Condition (B), we know E{Q(σϵ1)2} = O(1), which implies 3σ4( 1 en3 )E2{Q(σϵ1)2} = O(c2n 2). For the second term, we have that E{Q(σϵ1)4} = j=2 µ4 j P(σϵ1 Rj) + µ4 1P(σϵ1 R1) + µ4 k P(σϵ1 Rk) Rj µ4 jpσ(x)dx + µ4 1P(σϵ1 R1) + µ4 k P(σϵ1 Rk) Rj (|x| + Ck(t))4pσ(x)dx + µ4 1P(σϵ1 R1) + µ4 k P(σϵ1 Rk) Rj σ4x4pσ(x)dx + 8Ck(t)4 + µ4 1P(σϵ1 R1) + µ4 k P(σϵ1 Rk) = 8σ4E{ϵ4 1} + 8Ck(t)4 + µ4 1P(σϵ1 R1) + µ4 k P(σϵ1 Rk). Since Ck(t)4 = o(1), E{ϵ4 1} = O(nc 1) and µ4 j P(σϵ1 Rj(t)) = O(nc 1) for j = 1, k, one has that σ2E{Q(σϵ1)4} en3 = O(c2n 2). Plugging this back to equation (61), we get the desired result. Li, Liu, Xu and Shang Proof of Theorem 5: Without loss of generality, we only consider the case g (x) = 0 in (2). By Condition (B), we have that min{t2 1, t2 k 1} > 4c2 sρ2, as n . Consider the following event: E1 = {σ|ϵi| + csρ p Tn for all 1 i n}. (62) It is easy to show that P(E1) 1 as n under Condition (B). Thus, we choose N η s.t. P(E1) 1 η/3 if c N η. Throughout the proof, we suppose that g Sm ρ (I) is the function that generates the samples and f is the integral function of g defined in (15). Let ωi = ezi ez0 i , ω = (ω1, . . . , ωc)T . It is straightforward to see that ωi = zi z0 i under event E1. Because 22b Tn (nh1/2 + n(ch) 1)Tn Tn (nh + n(ch) 1)Tn = (nh + n(ch) 1) 1 τ 2 k, it follows by Lemma 14 that there exists N s.t., when c N , the following equation holds sup g Sm ρ (I) E{|(ez ez0)T Az0|2} (1 + (ch2) 1)τ 2 k Pc i=1(|f(i/c)| + 4Ck(t) + ζ)2 8, as c . (63) Consider the event |ωT Az0| C η p 1 + (ch2) 1τk i=1 (|f(i/c)| + 4Ck(t) + ζ)2 where C η = p 1 P(E2) E{|ωT Az0|2} (C η)2(1 + (ch2) 1)τ 2 k Pc i=1(|f(i/c)| + 4Ck(t) + ζ)2 η/3, which implies that P (E2) 1 η/3. Let bτ 2 k,0 be the estimated variance under the null. Then one has that (z0)T Az0 trace(A)bτ 2 k scbτ 2 k = (z0)T Az0 trace(A)bτ 2 k,0 + trace(A)bτ 2 k,0 trace(A)bτ 2 k scbτ 2 k,0 bτ 2 k,0 bτ 2 k,0 = (z0)T Az0 trace(A)bτ 2 k,0 scbτ 2 k,0 + trace(A)(bτ 2 k,0 bτ 2 k) scbτ 2 k,0 + op(1) = (z0)T Az0 trace(A)bτ 2 k,0 scbτ 2 k,0 + Op(Ck(t)h 1/2). (nh1/2 + n(ch) 1)Tn, one has that Ckh 1/2 Tn q ch)Tnh = 1 q c/n = O(1). Nonparametric Inference under B-bits Quantization It follows from Theorem 3 that (z0)T Az0 trace(A)bτ 2 k scbτ 2 k = OP (1). Hence, there exists C η > 0 s.t. P(E3) 1 η/3 for all c N η and N , where E3 = (z0)T Az0 trace(A)bτ 2 k scbτ 2 k Let E = E1 E2 E3, then P(E) 1 η for any c N η and N . Suppose g Sm ρ (I) satisfies g c Cηδ , where Cη = max 6ϱρ2, 384, (72C η)2, 6(C η + z1 α/2 + 1) , (64) c 1τ 2 k(1 + sc + (ch2) 1) + λ + c 2m + Ck(t)2 + ζ2, (65) where τ 2 k = O(en 1), ζ = max i=1,...,c f(i/c) 1 en Pien j=(i 1)en+1 g(j/n) = O(n 1). It follows from Lemma 13 that, on E, |ωi f(i/c)| 4Ck(t) + ζ. 
Since A Ic, we get that (ω f)T A(ω f) i=1 (ωi f(i/c))2 32c Ck(t)2 + 2cζ2, which, together with Lemma 16, leads to that 2f T Af (ω f)T A(ω f) = c 2 g 2 c 1 2f T (Ic A)f (ω f)T A(ω f) c 2 g 2 c 1 2ϱc(λ + c 2m)ρ2 32c Ck(t)2 2cζ2. Li, Liu, Xu and Shang Therefore, on E, we have c Tµ,t,c trace(A)bτ 2 k scbτ 2 k = z T Az (z0)T Az0 scbτ 2 k + (z0)T Az0 trace(A)bτ 2 k scbτ 2 k z T Az (z0)T Az0 scbτ 2 k C η = ωT Aω + 2ωT Az0 scbτ 2 k C η c 2 g 2 c 1 2 ϱc(λ+c 2m)ρ2 32c Ck(t)2 2cζ2 2C ητk q (1+ 1 ch2 ) Pc i=1(|f(i/c)|+4Ck(t)+ζ)2 scbτ 2 k C η c 2 g 2 c 1 2 ϱc(λ+c 2m)ρ2 32c Ck(t)2 2cζ2 6C η 1+(ch2) 1τk c g c scbτ 2 k C η (66) 1 2 ϱc(λ+c 2m)ρ2 c 2 g 2c 32c Ck(t)2 c 2 g 2c 2cζ2 c 2 g 2c 6C η 1+(ch2) 1τk c g c c 2 g 2c scbτ 2 k C η c 6 g 2 c scbτ 2 k C η > z1 α/2, (67) where (66) follows from Cη > 12 (see (64)), i.e., i=1 f(i/c)2 = c g 2 c c Cηδ2 48c Ck(t)2, , i=1 f(i/c)2 = c g 2 c c Cηδ2 3cζ2, which leads to i=1 (|f(i/c)| + 4Ck(t) + ζ)2 3 i=1 f(i/c)2 + 48c Ck(t)2 + 3cζ2 9 i=1 f(i/c)2 = 9c g 2 c; and (67) follows from (64), i.e., 1 2ϱc(λ + c 2m)ρ2 c 2 g 2c 1/6, c 2 g 2c 1/6, c 2 g 2c 1/6, 1 + (ch2) 1τk c g c c 2 g 2c 1/6. Nonparametric Inference under B-bits Quantization Then for any g Sm ρ (I) satisfying g c Cηδ , where Cη, δ are defined in (64) (65), there exist Nη max{N η, N } such that for any c Nη, we have P(reject H0|H1 is true) P E and c Tµ,t,c trace(A)bτ 2 k scbτ 2 k = P(E) 1 η. In the end, since h = λ1/(2m), nh1/2Ck(t)2 = o(1), ch , ζ = O(n 1), immediately, one has that g c Cηδ is equivalent as g c Cηδn,c,λ. This proves the desired result. Proof of Theorem 6: Suppose g = βx + α is the true function under Hlinear 0 and yi = g(i/n) + σϵi, i = 1, . . . , n. We use bg to denote the least-square estimator of g based on Q(yi) s. Consider the following two events: E1 = {σ|ϵi| + csρ p Tn for all 1 i n}, E2 = {|g(i/n) bg(i/n)| p Tn for all 1 i n}. It is easy to show that P(E1 E2) 1, as n . Since min{t2 1, t2 k 1} = Tn > 4c2 sρ2 as n , under event E1 E2, for j = 1, . . . , c, one has that σ|ϵj| min{|t1|, |tk 1|}, |yj| min{|t1|, |tk 1|}, |g(j/n) byj| min{|t1|, |tk 1|}. Furthermore, we have Pien j=(i 1)en+1 Q(ylinear j ) en + Ck(t) (68) Pien j=(i 1)en+1 Q Q(yj) byj Pien j=(i 1)en+1 Q yj g(j/n) + Q(yj) yj + g(j/n) byj Pien j=(i 1)en+1 Q yj g(j/n) + Q Q(yj) yj + Q g(j/n) byj en + 7Ck(t) Pien j=(i 1)en+1 Q(σϵj) Pien j=(i 1)en+1 Q g(j/n) byj en + 9Ck(t) z0 i + ςi + 10Ck(t), (69) Pien j=(i 1)en+1 Q g0(j/n) byj en satisfying ςi = Op(1/n + Ck(t)), and equations (68), (69) follow from (6), (33). Let z0 = (z0 1, . . . , z0 c)T , ς = (ς1, . . . , ςc)T . Therefore, the test statistic c Tlinear = c bg B linear,µ,t,c 2 = (zlinear)T Azlinear (z0)T Az0 + ςT Aς + 100 Ck(t)T A Ck(t) + 2 (z0)T Aς + 10(z0)T A Ck(t) + 10ςT A Ck(t) = T1 + T2 + T3 + T4, Li, Liu, Xu and Shang where Ck(t) = (Ck(t), . . . , Ck(t))T . Now we proceed to prove that c Tlinear is dominated by T1. Using the fact that z0 i , z0 j are independent of each other if i = j, and E({z0 i }2) = O(τ 2 k) = O(c/n), E({z0 i }4) = O(c2/n2), then for the first term T1, it is straightforward to show that E (z0)T Az0 2 = i=1 a2 ii E(z0 i )4 + j=1 a2 ij E(z0 i z0 i z0 j z0 j ) + j=1 aijaji E(z0 i z0 j z0 j z0 i ) j=1 aiiajj E(z0 i z0 i z0 j z0 j ) + j=1 ajjaii E(z0 j z0 j z0 i z0 i ) i=1 a2 ii + j=1 a2 ij + j=1 aijaji + j=1 aiiajj + i=1 a2 ii + 2trace(A2) + 2[trace(A)]2 ! In the proof to achieve equation (47), we have shown that aii (ch) 1, trace(A) trace(A2) h 1, thence we have that E (z0)T Az0 2 c2(nh) 2. 
(70) Furthermore, since A Ic, we have that h Pien j=(i 1)en+1 Q g0(j/n) byj i2 = Op c/n + c Ck(t)2 , Ck(t)T A Ck(t) c Ck(t)2. Since Ck(t)2 Tn 22b Tn (nh1/2+n(ch) 1)Tn Tn (nh+n(ch) 1)Tn = (nh + n(ch) 1) 1, one has that c Ck(t)2 c(nh + n(ch) 1) 1 c(nh) 1. Together with the fact that cn 1 c(nh) 1 and equation (70), one has that T2 T1, T3 T1. Similarly, it can be shown that T4 T1. Therefore, c Tlinear T1. The dominated term T1 in c Tlinear is nothing but c Tµ,t,c for testing H0 : g0(x) = 0 based on z0. Therefore, in keep with Lemma 11, the limiting distribution of Tlinear under Hlinear 0 should have the same limiting distribution as Tµ,t,c under H0 : g0(x) = 0. Thus, according to Theorem 3 and Lemma 12, the result is proved. Proof of Theorem 7: The proof of Theorem 7 is similar to Theorem 5. Let g be the function which generates the observations and PL(I)(g) = arg minf L(I) g f 2 be the Nonparametric Inference under B-bits Quantization projection of g( ) to L(I). We further define f(x) be the integral function associated with g PL(I)(g), as defined in (15), that is, Z min(x+ ,1) max(x ,0) [g PL(I)(g)](s)ds, for x [0, 1], = 1 Therefore, g PL(I)(g) c = p Pc i=1 f2(i/c)/c. To proceed, we first define bg be the least squared estimator based on Q PL(I)(g)(j/n)+σϵj which satisfies |bg(i/n) bg (i/n)| Ck(t). Let zlinear i,0 be the zlinear i based on PL(I)(g). According to (6), onw has that zlinear i,0 en 1 ien X j=(i 1)en+1 Q Q PL(I)(g)(j/n) + σϵj bg (j/n) Ck(t), for i = 1, . . . , c. Let zlinear 0 = (zlinear 1,0 , . . . , zlinear c,0 )T . Before proceeding, we first define some notations to ease the calculations. Define ezlinear i,0 = zlinear i,0 1 max{σ|ϵj| + csρ, |g(j/n) bg(j/n)|, |PL(I)(g)(j/n) bg (j/n)|} Tn for all j = (i 1)en + 1, . . . ien , ezlinear i = zlinear i 1 max{σ|ϵj| + csρ, |g(j/n) bg(j/n)|, |PL(I)(g)(j/n) bg (j/n)|} Tn for all j = (i 1)en + 1, . . . ien) . Similar to Lemma 13, we want to find an upper bound of |ezlinear i ezlinear i,0 |. It is straightforward to show that |ezlinear i ezlinear i,0 | (71) j=(i 1)en+1 Q(ylinear j ) Q Q PL(I)(g)(j/n) + σϵj bg (j/n) + 2Ck(t) = en 1 ien X j=(i 1)en+1 Q Q(yj) bg(j/n) Q Q PL(I)(g)(j/n) + σϵj bg (j/n) + 2Ck(t) j=(i 1)en+1 Q(yj) bg(j/n) PL(I)(g)(j/n) σϵj + bg (j/n) + 6Ck(t) j=(i 1)en+1 yj PL(I)(g)(j/n) σϵj + bg (j/n) bg(j/n) + 7Ck(t) j=(i 1)en+1 g(j/n) PL(I)(g)(j/n) + 8Ck(t) |f(i/c)| + 8Ck(t) + ζ, ζ = max 1 i c j=(i 1)en+1 g(j/n) PL(I)(g)(j/n) = O(n 1). Li, Liu, Xu and Shang Suppose that Condition (B) holds, and consider events E 1 = {σ|ϵi| + csρ p Tn for all 1 i n}, E 2 = {|g(i/n) bg(i/n)| p Tn for all 1 i n}, E 3 = {|PL(I)(g)(i/n) bg (i/n)| p Tn for all 1 i n}. It is easy to show that P(E 1 E 2 E 3) 1 as n . Let E = E 1 E 2 E 3. Thus, we choose N η s.t. P(E ) 1 η/3 if n N η. Define ωlinear = (ωlinear 1 , . . . , ωlinear c )T , where ωlinear i = ezlinear i ezlinear i,0 . Obviously, ωlinear i = zlinear i zlinear i,0 under event E . Therefore, by (71), under event E , we have |ωlinear i f(i/c)| 8Ck(t) + ζ. Since A Ic, we get that (ωlinear f)T A(ωlinear f) i=1 (ωlinear i f(i/c))2 128c Ck(t)2 + 2cζ2, (72) where f = (f(1/c), . . . , f(c/c))T . By Lemma 16 and (72), we get the following lower bound: (ωlinear)T Aωlinear 1 2f T Af (ωlinear f)T A(ωlinear f) 2 g PL(I)(g) 2 c 1 2f T (Ic A)f (ωlinear f)T A(ωlinear f) 2 g PL(I)(g) 2 c 1 2ϱc(λ + c 2m)ρ2 128c Ck(t)2 2cζ2. 
(73) Using a similar argument in Lemma 14, and the facts that V ar(zlinear i,0 ) = V ar(z0 i )(1 + op(1)) = τ 2 k(1 + op(1)), Ck(t)2 τ 2 k, as n, c , one has that E{|(ωlinear)T Azlinear 0 |2} (1 + (ch2) 1)τ 2 k Pc i=1(|f(i/c)| + 8Ck(t) + ζ)2 8. Therefore, there exists N s.t., when c N , P(E2) 1 η/3, where event E2 is defined as |(ωlinear)T Azlinear 0 | C η p 1 + (ch2) 1τk i=1 (|f(i/c)| + 8Ck(t) + ζ)2 and C η = p 24/η. From Theorem 6, it is straightforward to show that (zlinear 0 )T Azlinear 0 trace(A)bτ 2 k scbτ 2 k = Op(1). Thus, there exists C η > 0 s.t. P(E3) 1 η/3 for all c N η and N , where E3 = (zlinear 0 )T Azlinear 0 trace(A)bτ 2 k scbτ 2 k Then P(E E2 E3) 1 η for any c N η and N . Nonparametric Inference under B-bits Quantization Suppose g Sm ρ (I) satisfies g PL(I)(g) c = v u u tc 1 c X Z min(i/c+c 1,1) max(i/c c 1,0) [g PL(I)(g)](s)ds 2 = i=1 f2(i/c)/c > Cηδ linear, Cη = max n 6ϱρ2, 1536, (72C η)2, 6(C η + z1 α/2 + 1) o , (75) δ linear = q c 1τ 2 linear(1 + sc + (ch2) 1) + λ + c 2m + Ck(t)2 + ζ2. Then, under event E E2 E3, we have c Tlinear,µ,t,c trace(A)bτ 2 k scbτ 2 k =(zlinear)T Azlinear (zlinear 0 )T Azlinear 0 scbτ 2 k + (zlinear 0 )T Azlinear 0 trace(A)bτ 2 k scbτ 2 k (zlinear)T Azlinear (zlinear 0 )T Azlinear 0 scbτ 2 k C η =(ωlinear)T Aωlinear + 2(ωlinear)T Azlinear 0 scbτ 2 k C η c 2 g PL(I)(g) 2 c 1 2 ϱc(λ+c 2m)ρ2 128c Ck(t)2 2cζ2 2C ητk q (1+ 1 ch2 ) Pc i=1(|f(i/c)|+8Ck(t)+ζ)2 scbτ 2 k C η c 2 g PL(I)(g) 2 c 1 2 ϱc(λ+c 2m)ρ2 128c Ck(t)2 2cζ2 6C η 1+(ch2) 1τk c g PL(I)(g) c scbτ 2 k C η (76) c 2 g PL(I)(g) 2 c 1 1 2 ϱc(λ+c 2m)ρ2 c 2 g PL(I)(g) 2c 128c Ck(t)2 c 2 g PL(I)(g) 2c 2cζ2 c 2 g PL(I)(g) 2c 6C η 1+(ch2) 1τk c g PL(I)(g) c c 2 g PL(I)(g) 2c scˆτ 2 k C η c 6 g PL(I)(g) 2 c scbτ 2 k C η > z1 α/2. (77) where (76) follows from Cη > 192 (see (75)), i.e., v u u t i=1 f(i/c)2 = c g PL(I)(g) c c Cηδ linear 192c Ck(t), i=1 f(i/c)2 = c g PL(I)(g) c c Cηδ linear which leads to i=1 (|f(i/c)|+8Ck(t)+ζ)2 3 i=1 f(i/c)2+192c Ck(t)2+3cζ2 9 i=1 f(i/c)2 = 9c g PL(I)(g) 2 c; Li, Liu, Xu and Shang and (77) follows from (75), i.e., 1 2ϱc(λ + c 2m)ρ2 c 2 g PL(I)(g) 2c 1/6, 128c Ck(t)2 c 2 g PL(I)(g) 2c 1/6, c 2 g PL(I)(g) 2c 1/6, 1 + (ch2) 1τk c g PL(I)(g) c c 2 g PL(I)(g) 2c 1/6. Then for any c Nη max{N η, N }, we have P(reject H0|H1 is true) P E E2 E3 and c Tlinear,µ,t,c trace(A)bτ 2 k scbτ 2 k In the end, by direct calculations, we know that g PL(I)(g) c Cηδ linear is equivalent as g PL(I)(g) c Cηδlinear n,c,λ . This proves the desired result. Proof of Theorem 8: Let ϵ = (ϵ1, . . . , ϵn)T N(0, In) and ey0 = (ey0 1, . . . , ey0 c)T , where Pien j=(i 1)en+1 σϵj en follows a normal distribution. We further define ςi = z0 i ey0 i be the difference of z0 i and ey0 i . Let ς = (ς1, . . . , ςc)T . Consider event E1 = {σ|ϵi| + csρ p Tn for all 1 i n}. Then P(E1) 1 as n , and under event E1, |ey0 i 1 en Pien j=(i 1)en+1 Q(σϵj)| Ck(t). According to (33), one has that |ςi| = Op(Ck(t)). Note that for any given ml m mu , the standardized testing statistic ξm = c Tm trace(Am)bτ 2 k sc,mbτ 2 k = (ς + ey0)T Am(ς + ey0) trace(Am)bτ 2 k sc,mbτ 2 k sc,mbτ 2 k + 2ςT Amey0 sc,mbτ 2 k + (ey0)T Amey0 trace(Am)bτ 2 k sc,mbτ 2 k = J1 + J2 + J3. For J1, notice ςT Amς Pc i=1 ς2 i = Op(c Ck(t)2), and Ck(t)2 (nh1/2 m ) 1 for any ml m mu, one has J1 = op(1). For J2, using a similar argument as (60), we have chm + Ck(t) p 0, for any ml m mu. For J3, we need to use Lemma 17. Let e Am = Am/sc,m, Ψc = en 1/2(ey0 1, . . . , ey0 c)T . Define Fc,m := Ψ c e AmΨc E h Ψ c e AmΨc i . Define Zc = (Zc,ml, . . . 
, Zc,mu) be an mu ml + 1 Nonparametric Inference under B-bits Quantization -dimensional centered Gaussian vector with covariance matrix C = Imu ml+1 Next we need to verify the conditions in Lemma 17. By direct calculations, we have E F 4 c,m 3E F 2 c,m 2 = 48trace( e A4 m) hm s4c,m h3 m. Then we have maxml m mu E F 4 c,m 3E F 2 c,m 2 log6 mu 0. On the other hand, max ml m,m mu C(m, m ) E Fc,m Fc,m m=ml |1 E F 2 c,m | + 2 X ml m m. It follows that E Fc,m Fc,m = 2trace(Am Am )/(sc,msc,m ) (hmhm ) 1/2 (λm + ν 2m)2 ν 4m (λm + ν 2m )2 (hmhm ) 1/2 1 (1 + λmν2m)2(1 + λm ν2m )2 (hmhm ) 1/2 1 1 + (hmν)2m 1 + (hm ν)2m (hmhm ) 1/2 1 1 + (hmx)2m 1 + (hm x)2m dx (hmhm ) 1/2 1 1 + (hmx)2m 1 + (hm x)2m dx hmhm 1 + (hmx)2m 1 + (hm x)2m dx hm/hm 2m m + x p hm /hm 2m dx hm /hm 2m dx = (hm /hm) 1/2 Z 1 1 + x2m dx = (hm /hm) 1/2 Z 1 1 1 + x2m dx + Z 1 1 + x2m dx C0 (hm /hm) 1/2 . Therefore, using Lemma 18, we have ml m 0, such that P(E1) 1 η/3 for all c N η. Follows from Lemma 14 that there exists N s.t., when c N , P(E2) 1 η/3. Furthermore, using Theorem 3, there exists C η > 0 s.t. P(E3) 1 η/3 for all c max{N η, N }. Suppose g Sm ρ (I) satisfies g c Cηδ , where Cη = max 6ϱρ2, 384, (72C η)2, 6(C η + z1 α/2 + 1) , c 1τ 2 k(1 + sc,m log1/2(mu) + (ch2m ) 1) + λ + c 2m + Ck(t)2 + ζ2, τ 2 k = V ar(z1|H0) = V ar(z0 i ) = O(en 1), ζ = max i=1,...,c f(i/c) 1 j=(i 1)en+1 g(j/n) = O(n 1). Since mu , m mu eventually. So we assume m mu. Then it holds that inf g Sm ρ (I) g c Cηδ P(ξ > qα) = inf g Sm ρ (I) g c Cηδ P(ξmax > Cn + qα/Cn) inf g Sm ρ (I) g c Cηδ P(ξm > Cn + qα/Cn) = inf g Sm ρ (I) g c Cηδ P(c Tm trace(Am )bτ 2 k sc,m bτ 2 k > Cn + qα/Cn). Li, Liu, Xu and Shang Similar to the proof of Theorem 5, we know with probability approaching one c Tm trace(Am )ˆτ 2 k sc,m ˆτ 2 k = z T Am z (z0)T Am z0 sc,m ˆτ 2 k + (z0)T Am z0 trace(Am )ˆτ 2 k sc,m ˆτ 2 k = z T Am z (z0)T Am z0 sc,m ˆτ 2 k + Op(1) = ωT Am ω + 2ωT Am z0 scˆτ 2 k + OP (1) 1 2 cm c(λm +c 2m )ρ2 c 2 g 2c 32c Ck(t)2 c 2 g 2c 2cζ2 c 2 g 2c 6C η 1+(ch2m ) 1τk c g c c 2 g 2c sc,m ˆτ 2 k + Op(1). Since Cn (log(mu))1/2 we have c Tm trace(Am )ˆτ 2 k sc,m ˆτ 2 k Cn + qα/Cn. Therefore, for any c Nη max{N η, N }, we have P(reject H0|H1 is true) P E1 E2 E3 and c Tm trace(Am )ˆτ 2 k sc,m ˆτ 2 k In the end, by direct calculations, we know that g c Cηδ is equivalent as g c Cηδn,c,an. This proves the desired result.