# Kernel Quant Tree

Diego Stucchi¹, Paolo Rizzo¹, Nicoló Folloni¹, Giacomo Boracchi¹

¹Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Italy. Correspondence to: Diego Stucchi.

*Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).*

We present Kernel Quant Tree (KQT), a non-parametric change detection algorithm that monitors multivariate data through a histogram. KQT constructs a nonlinear partition of the input space that matches pre-defined target probabilities and specifically promotes compact bins adhering to the data distribution, resulting in a powerful detection algorithm. We prove two key theoretical advantages of KQT: i) statistics defined over the KQT histogram do not depend on the stationary data distribution ϕ0, so detection thresholds can be set a priori to control the false positive rate, and ii) thanks to the kernel functions adopted, the KQT monitoring scheme is invariant to roto-translations of the input data. Consequently, KQT does not require any preprocessing step such as PCA. Our experiments show that KQT achieves higher detection power than state-of-the-art non-parametric change detection methods, and can reliably control the false positive rate.

## 1. Introduction

Change Detection (CD) is the problem of detecting distribution changes ϕ0 → ϕ1 in a datastream, namely detecting when the data-generating process drifts from a stationary distribution ϕ0 towards an unknown post-change distribution ϕ1. Here, we address the problem of batch-wise CD, where data are analyzed in fixed-size batches that, under normal conditions, contain samples drawn from ϕ0. The timely detection of distribution changes and the control of the false alarm rate are fundamental problems that have been widely explored in both the Machine Learning (Gama et al., 2014) and Statistical Process Control (Basseville et al., 1993) literature. Among the many applications of change-detection algorithms, we mention fault detection (Tartakovsky et al., 2006), financial monitoring (Ross et al., 2011), cryptographic attacks (Frittoli et al., 2020), and quality control (Hawkins et al., 2003).

Most CD algorithms consist of three major ingredients: i) a model ϕ̂0 describing stationary data, which is usually learned from a training set, ii) a statistical test, where a test statistic T is computed to assess the consistency of incoming data with ϕ̂0, and iii) a decision rule on T to establish whether a change has occurred. In many real-world multivariate scenarios, estimating a density model for the stationary data is often unfeasible. Therefore, non-parametric methods that describe stationary data by flexible models are preferred. Unfortunately, most non-parametric statistics are based on ranking (Ross & Adams, 2012) and can only be applied to univariate data. In Section 3, we overview the few non-parametric solutions to monitor multivariate datastreams. A relevant example is Quant Tree (QT) (Boracchi et al., 2018), a change detection algorithm based on a histogram partitioning of the input space, which is supported by sound theoretical results. In particular, QT allows operating at a controlled false alarm rate without knowing ϕ0 and without resorting to bootstrap to estimate detection thresholds. A fundamental limitation of QT is that splits are defined along the axes, as in Figure 1(a), resulting in a partitioning that does not always adhere to the input distribution.
To mitigate this problem, a preprocessing stage is typically introduced to align the split directions with the principal components of the training set, as shown in Figure 1(b). While this procedure is often beneficial, we observe (Section 6) that it can worsen the detection performance in some unpredictable cases. Moreover, many bins in Figure 1(a)-(b) have non-finite volumes, which can lead to poor estimation of bin probabilities.

In this paper, we introduce Kernel Quant Tree (KQT), a non-parametric and multivariate CD algorithm that constructs histogram bins via measurable kernel functions, resulting in a powerful CD test. In contrast with the QT algorithm, which constructs bins by axis-aligned splits, KQT partitions the space into K − 1 compact bins defined by kernel functions evaluated on the training data. An additional bin, denoted as the residual bin, is non-compact and gathers all the points that do not fall in any other bin. Figure 1(c)-(d)-(e) shows that the KQT bins are compact subsets of the domain.

*Figure 1. Quant Tree generates bins as intersections of hyperplanes, performing cuts along the axes (a). After preprocessing through PCA, the cuts are oriented along the principal directions (b). Kernel Quant Tree generates bins that are subsets of d-dimensional spheres according to the underlying kernel functions, namely the Euclidean (c), Mahalanobis (d) and Weighted Mahalanobis (e) distances.*

Our intuition is that compact bins increase the flexibility when modeling ϕ0 by fitting a histogram h to training data. Moreover, estimating the bin probabilities under ϕ0, which are fundamental to compute the test statistic Th, is less accurate on non-compact bins. In Section 5, we prove that KQT features two theoretical properties that have significant implications in change detection. First, the distribution of the test statistic Th computed from a KQT histogram h does not depend on the stationary distribution ϕ0. Consequently, detection thresholds τ can be set a priori as in QT, without knowing ϕ0. Second, the monitoring performed by KQT using specific kernel functions is not influenced by preprocessing based on roto-translations, including alignment to principal components. Thanks to these properties, KQT outperforms state-of-the-art alternatives on a broad experimental testbed illustrated in Section 6. In particular, KQT achieves better detection performance than the alternatives independently of preprocessing steps based on roto-translations.

In summary, these are our main contributions: i) We present Kernel Quant Tree, a non-parametric CD method based on a histogram whose bins are defined by kernel functions. KQT achieves state-of-the-art detection performance on multivariate datastreams; ii) We prove that statistics defined over the KQT histograms do not depend on ϕ0, but only on a few KQT parameters. This enables control of the FPR by thresholds τ set a priori, via Monte Carlo simulations; iii) We prove that the monitoring performed by KQT is independent of any preprocessing by roto-translations.

## 2. Problem Formulation

We address the problem of change detection in batch-wise monitoring settings, where stationary data are realizations of a random vector X with unknown probability density function ϕ0. We assume that a training set of stationary samples TR = {x1, . . . , xN} ⊂ R^d is provided, and that incoming data are processed in batches W of ν samples each.
We denote by W ∼ ϕ0 the case where all the samples in the batch W are drawn from ϕ0. Our goal is to design a CD algorithm that: i) detects distribution changes in incoming batches, and ii) controls the False Positive Rate (FPR), namely the probability of mistakenly detecting a change in stationary data. We formulate this CD problem as a hypothesis test to establish whether W ∼ ϕ0 (null hypothesis) or W ∼ ϕ1 ≠ ϕ0, where ϕ1 is the unknown post-change distribution. We pursue the mainstream approach of computing a test statistic T on each batch W and detecting a change when

T(W) > τ,     (1)

where τ ∈ R is the threshold that we set to control the FPR. For the sake of simplicity, we assume that a batch W is either drawn from ϕ0 or from a different unknown distribution ϕ1 ≠ ϕ0. However, CD algorithms can in principle detect batches drawn from a mixture of ϕ0 and ϕ1, even though the detection power is expected to be lower in this case.

## 3. Related Work

Change detection in multivariate datastreams is a challenging problem, which can be significantly simplified when ϕ0 belongs to a known parametric family, since the model ϕ̂0 is obtained by estimating its parameters. The most popular solutions pursuing this approach consist in monitoring the likelihood of incoming data with respect to ϕ̂0 fitted on TR. Viable options for ϕ̂0 are Gaussian processes (Saatçi et al., 2010), Gaussian Mixtures (Kuncheva, 2011) or kernel density estimators (Krempl, 2011). In (Kuncheva, 2011) and (Kuncheva & Faithfull, 2013), the Semiparametric Log Likelihood (SPLL) algorithm fits a Gaussian Mixture Model (GMM) to TR and compares incoming batches with batches from TR by a likelihood test. However, in SPLL it is not possible to set the detection threshold a priori to control the FPR, as the distribution of the test statistic depends on ϕ0. Moreover, a GMM approximation of ϕ0 might not always fit real-world data, as demonstrated by our experiments on high-dimensional datasets.

There are only a few recent multivariate methods that perform non-parametric change detection, namely, that assume that both ϕ0 and ϕ1 are unknown. Among these, we focus on histogram-based algorithms, since these are non-parametric by design and can efficiently process datastreams in batches. As such, histogram-based algorithms represent very practical solutions to monitor multivariate datastreams. Density Tree (Criminisi et al., 2012) constructs a space partitioning by iteratively splitting regions to maximize an information-gain metric. In this case, the distribution of the test statistic depends on ϕ0, thus detection thresholds need to be set by bootstrap on TR. Equal Intensity K-means (EIKM) (Liu et al., 2020) divides the input space using K-means clustering, resulting in bins that yield an equal probability under ϕ0. EIKM is designed to handle multimodal distributions, e.g., Gaussian Mixtures, and the detection thresholds are given by asymptotic approximations of the Pearson test statistic. Quant Tree (Boracchi et al., 2018) defines a partitioning S of the input space into K bins by axis-aligned cuts. The theoretical properties of QT guarantee that the distribution of test statistics defined over bin probabilities does not depend on ϕ0, which allows setting detection thresholds a priori, with synthetically generated data, through a very efficient scheme. The splits are performed such that the probability of a stationary sample falling in each bin is close to a set of target probabilities {πk} provided as input parameters.
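To make the histogram-based monitoring scheme shared by these methods concrete, the following Python sketch scores a batch against a fixed partition by counting the samples falling in each bin and computing a Pearson-like statistic, then applies the decision rule T(W) > τ of (1). It is an illustration only: the one-dimensional quantile partition, the uniform target probabilities and the threshold value are placeholders, not the output of any specific method discussed here.

```python
import numpy as np

def pearson_statistic(batch, assign_bin, bin_probs):
    """Pearson-like statistic comparing the observed bin counts of a batch
    with the counts expected under the (estimated) stationary bin probabilities."""
    nu = batch.shape[0]
    K = len(bin_probs)
    counts = np.bincount([assign_bin(x) for x in batch], minlength=K)
    expected = nu * np.asarray(bin_probs)
    return float(np.sum((counts - expected) ** 2 / expected))

# Toy usage: a 1-d quantile partition built on a synthetic training set.
rng = np.random.default_rng(0)
train = rng.standard_normal((4096, 1))           # stand-in for the training set TR
K = 8
edges = np.quantile(train[:, 0], np.arange(1, K) / K)   # K-1 split points
assign = lambda x: int(np.searchsorted(edges, x[0]))
probs = np.full(K, 1 / K)                        # target probabilities pi_k
W = rng.standard_normal((128, 1))                # a stationary batch
tau = 20.0                                       # placeholder threshold
T = pearson_statistic(W, assign, probs)
print(T, "change detected:", T > tau)            # decision rule T(W) > tau
```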
Since the data splits are limited to the axis directions, QT requires a preprocessing stage whose outcome, in terms of detection power, is uncertain, as demonstrated in our experiments (Section 6). KQT preserves the properties of QT in terms of setting detection thresholds and FPR control, and overcomes this limitation by constructing compact bins that are not affected by roto-translations, thus better approximating the probability measure of each bin under ϕ0.

## 4. Kernel Quant Tree

We present Kernel Quant Tree (KQT)¹, a CD algorithm that solves a major limitation of Quant Tree (QT) while generalizing and extending its theoretical guarantees. The KQT histogram is constructed by iteratively splitting the input space R^d into K bins {Sk} such that the probability of a stationary sample x ∼ ϕ0 falling in Sk is close to a target probability πk; the target probabilities {πk} are provided as input parameters. The peculiarity of KQT is that each bin Sk for k < K is defined by a measurable kernel function fk : R^d → R and a split value qk ∈ R, and corresponds to a compact set in R^d. We denote as Generalized Quant Tree (GQT) partitioning the resulting histogram h = {(Sk, π̂k)}, k = 1, . . . , K, which yields a partition of the input space R^d, where π̂k is the empirical probability of bin Sk estimated on TR.

¹Code available at github.com/diegocarrera89/quantTree.

The bins of the partition are

S1 = {x ∈ R^d | f1(x) ≤ q1}
S2 = {x ∈ R^d \ S1 | f2(x) ≤ q2}
. . .
S_{K−1} = {x ∈ R^d \ ∪_{j<K−1} Sj | f_{K−1}(x) ≤ q_{K−1}}
SK = R^d \ ∪_{j<K} Sj.

Algorithm 1 summarizes the construction: it takes as input a training set TR and the target probabilities {πk}. The rationale underpinning the space partitioning of KQT is to iteratively construct bins by selecting sublevel sets of the kernel functions fk. In particular, S1 is defined as the sublevel set of f1 with respect to the split value q1:

S1 = {x ∈ R^d | f1(x) ≤ q1}.     (3)

The other bins Sk, 1 < k < K, are obtained by isolating sublevel sets from the space that is not yet assigned to a bin:

Sk = {x ∈ ∩_{j<k} S̄j | fk(x) ≤ qk}  for 1 < k < K,     (4)

where S̄j denotes the complement of Sj in R^d. The last bin, SK = R^d \ ∪_{j<K} Sj, gathers all the points that do not fall in any other bin.

# Supplementary Material

A GQT histogram h = {(Sk, π̂k)}, k = 1, . . . , K, is used to monitor incoming batches W through a statistic Th, and a change is detected when

Th(W) > τ.     (1)

A peculiarity of GQT is that each bin Sk is defined as a subset of the sublevel set of a measurable kernel function fk : R^d → R. In this section, we prove Theorem 5.1 of the main article, which we recall here:

**Theorem 1.** Let h = {(Sk, π̂k)}, k = 1, . . . , K, be a Generalized Quant Tree histogram constructed using measurable functions fk : R^d → R for all k. Let Th be a statistic defined over batches W such that Th(W) only depends on the number of samples y1, . . . , yK of W falling in the bins of h. Then, the distribution of Th over stationary batches W ∼ ϕ0 depends only on the batch size ν, the number of training points N and the target probabilities {πk}.

Theorem 1 implies that the distribution of Th computed over stationary batches by a GQT is independent of ϕ0, d and {fk}, thus allowing us to empirically estimate its distribution and compute a threshold τ such that the False Positive Rate (FPR) achieved by GQT is controlled. The threshold computation strategy is presented in Section 5.2 of the main article.
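The distribution-free property stated in Theorem 1 can be exploited numerically: since the distribution of Th over stationary batches depends only on N, ν and {πk}, one can simulate training data from any convenient distribution, build a histogram matching the same target probabilities, and estimate the (1 − α)-quantile of Th over simulated stationary batches. The Python sketch below illustrates this idea for a Pearson-like statistic over bin counts on univariate uniform data; it is an illustration of the property, not the exact scheme of Section 5.2 of the main article.

```python
import numpy as np

def simulate_threshold(N, nu, target_probs, alpha=0.05, n_batches=10000, seed=0):
    """Estimate a detection threshold tau by Monte Carlo simulation.
    By Theorem 1 the distribution of the statistic over stationary batches depends
    only on N, nu and the target probabilities, so univariate uniform data suffice."""
    rng = np.random.default_rng(seed)
    target_probs = np.asarray(target_probs, dtype=float)
    K = len(target_probs)

    # Synthetic training set and a quantile histogram matching the target probabilities.
    train = rng.uniform(size=N)
    edges = np.quantile(train, np.cumsum(target_probs)[:-1])   # K-1 split points
    train_counts = np.bincount(np.searchsorted(edges, train), minlength=K)
    pi_hat = train_counts / N                                   # empirical bin probabilities
    expected = nu * pi_hat

    stats = np.empty(n_batches)
    for i in range(n_batches):
        batch = rng.uniform(size=nu)
        counts = np.bincount(np.searchsorted(edges, batch), minlength=K)
        stats[i] = np.sum((counts - expected) ** 2 / expected)

    return float(np.quantile(stats, 1.0 - alpha))

tau = simulate_threshold(N=4096, nu=128, target_probs=np.full(16, 1 / 16))
print(f"estimated threshold for a 5% FPR: {tau:.2f}")
```

The same simulated quantile can then be reused for any data dimension and any kernel functions, since neither enters the distribution of the statistic under stationarity.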
Theorem 1 is a generalization of Theorem 1 from [Boracchi et al., 2018] and its proof follows the same structure, based on three propositions. Here, we prove the first of these propositions:

**Proposition 1.** Let x1, x2, . . . , xM be i.i.d. realizations of a continuous random vector X defined over D ⊆ R^d. Let f : R^d → R be a measurable function, and let Z = f(X). We denote by z(1) ≤ z(2) ≤ · · · ≤ z(M) the sorted images of {xj} through f. For any L ∈ {1, 2, . . . , M}, we define the sublevel set

Qf,L := {x ∈ D : f(x) ≤ z(L)}.     (2)

Then, the random variable p = PX(Qf,L) is distributed as Beta(L, M − L + 1).

*Proof.* We prove the proposition by showing that p is an order statistic of the uniform distribution, which in turn follows a Beta distribution [Lehmann et al., 2005]. Since f is a measurable function for the considered probability space and X is a continuous random variable (r.v.) in R^d, by the properties of continuous r.v. [Papoulis and Pillai, 2002], we have that Z = f(X) is also a continuous r.v. in R. Then, we define U = FZ(Z), where FZ is the cdf of Z. Since FZ is monotonically non-decreasing, we can also define the inverse cdf as

FZ⁻¹(t) = inf {z ∈ R | FZ(z) ≥ t}.     (3)

Then, we have that

FU(u) = PU(U ≤ u) = PZ(FZ(Z) ≤ u) = PZ(Z ≤ FZ⁻¹(u)) = FZ(FZ⁻¹(u)) = u,     (4)

hence U is a uniform random variable, since its cumulative distribution function is the identity. Recall that we assumed that X is defined over D, i.e., PX(R^d \ D) = 0. Then, exploiting (4), we can express p as follows:

p = PX(Qf,L) = PX(x ∈ D | f(x) ≤ z(L)) = PZ(z ∈ R | z ≤ z(L)) = PU(u ∈ [0, 1] | u ≤ u(L)) = u(L),     (5)

where we define u(L) = FZ(z(L)). From (5), we have that p is the L-th order statistic of M samples from the uniform distribution, and its distribution is Beta(L, M − L + 1) [Balakrishnan and Rao, 1998].

We refer the reader to [Boracchi et al., 2018] for a thorough description of the derivation of the proof of Theorem 1 from Proposition 1.

## 2.2 Centroid Selection and Invariance to Roto-Translation

Kernel Quant Tree (KQT) defines a partition of the input space by iteratively splitting it into bins Sk that match a target probability, as shown in Figure 2 of the main article. The KQT bins are defined as subsets of sublevel sets of the adopted measurable kernel functions fk. We report here the formal definition of the KQT histogram bins:

S1 = {x ∈ R^d | f1(x) ≤ q1},
Sk = {x ∈ ∩_{j<k} S̄j | fk(x) ≤ qk}  for 1 < k < K.

In the following, Φ denotes a roto-translation of R^d, D the distance underlying the kernel functions, ck the bin centroids, and Xk the training samples not yet assigned to the first k bins; primed quantities refer to the histogram constructed on the transformed training set Φ(X0). For the first bin we have

X′1 = {x′ ∈ Φ(X0) | D′(x′, c′1) > q′1} = {Φ(x) | x ∈ X0, D(x, c1) > q1} = Φ(X1).     (20)

Now, suppose that for all j < k we have c′j = Φ(cj), S′j = Φ(Sj) and X′j = Φ(Xj). Then, we have that

x ∈ ∩_{j<k} S̄j and fk(x) ≤ qk  ⟺  D(x, cj) > qj for all j < k and D(x, ck) ≤ qk
                                ⟺  D′(Φ(x), c′j) > q′j for all j < k and D′(Φ(x), c′k) ≤ q′k,

and, consequently, the number of samples from any batch W ⊂ R^d falling in Sk is the same as the number of samples of W′ = Φ(W) falling in S′k. Then, we have that π̂k = π̂′k and Th(W) = Th′(W′).
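The equalities above can be checked numerically for the Mahalanobis kernel: applying a roto-translation Φ(x) = Rx + t, together with the corresponding transformation of the centroid and of the covariance matrix, leaves all distances, and hence all bin assignments, unchanged. The following Python snippet is a minimal illustrative check (not part of the released code); the data, centroid and covariance choices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
X = rng.standard_normal((1000, d)) @ rng.standard_normal((d, d))   # training data
c = X.mean(axis=0)                                                  # a centroid
Sigma = np.cov(X, rowvar=False)                                     # covariance of the kernel

# Random roto-translation Phi(x) = R x + t (R orthogonal via QR decomposition).
R, _ = np.linalg.qr(rng.standard_normal((d, d)))
t = rng.standard_normal(d)
X_prime = X @ R.T + t
c_prime = R @ c + t
Sigma_prime = R @ Sigma @ R.T       # covariance of the transformed data

def mahalanobis_sq(X, c, Sigma):
    """Squared Mahalanobis distance of each row of X from the centroid c."""
    diff = X - c
    return np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)

d_before = mahalanobis_sq(X, c, Sigma)
d_after = mahalanobis_sq(X_prime, c_prime, Sigma_prime)
print(np.allclose(d_before, d_after))   # True: bin assignments are unchanged
```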
## 3 More Experiments and Discussion

This section extends the experimental evaluation of KQT from Section 6 of the main article to corroborate the findings discussed there. First, we investigate the real-world datasets employed in our experiments, showing that these do not follow a Gaussian distribution. Then, we perform additional experiments on high-dimensional data to investigate the control of the FPR in this challenging scenario. Finally, we extend the results from the main article by comparing the proposed centroid selection strategies and reporting the complete results for both the low- and high-ratio settings.

### 3.1 Remarks about the real-world datasets

In Section 6.1 of the main article, we introduce the real-world datasets that are used in our experiments. The INSECTS dataset [Souza et al., 2020] is a benchmark for concept-drift detection algorithms and comprises data describing the wing-beat frequency of six species of insects at different temperatures. The other datasets are from the UCI Machine Learning Repository [Dua and Graff, 2017] and from [Dal Pozzolo et al., 2017], and comprise data following a unique distribution, thus requiring the introduction of artificial distribution changes for our experiments. We standardize these datasets and add a negligible amount of noise η ∼ N(0, σ) to each component to prevent the many repeated values from harming the histogram construction. Table 1 lists all the datasets and reports their dimension d and the level σ of noise applied to their components.

Since data in the synthetic settings are drawn from Gaussian distributions, one could argue that KQT provided with the Mahalanobis or Weighted Mahalanobis kernels has an advantage over the alternatives. However, this is not true for the real-world datasets considered in our experiments, which are far from Gaussian. This claim is empirically supported by the low detection performance of SPLL, which is itself based on a GMM. To confirm this intuition, we also run the Shapiro-Wilk normality test [Shapiro and Wilk, 1965], a hypothesis test used to determine whether a population {xi}, i = 1, . . . , n, in R is drawn from a univariate Gaussian distribution. If the p-value associated with the test is lower than 0.05, then we can conclude that the population is not normally distributed. Since the marginals of a multivariate Gaussian distribution are univariate Gaussian distributions, we show that the real-world datasets introduced in Section 6.1 of the main article are not drawn from multivariate Gaussians by showing that their covariates are not. For this purpose, we extract a subset of n = 4096 samples from the real-world datasets and perform the Shapiro-Wilk test on each of their covariates. Table 1 reports the p-values yielded by the test, averaged over the covariates and over 250 iterations of the test performed over different subsets. The p-values obtained in these tests are all orders of magnitude below 0.05 (from about 10⁻³ down to less than 10⁻¹⁶), thus confirming that the real-world datasets employed in our experiments are not drawn from multivariate Gaussian distributions.

Table 1: List of the real-world datasets employed in the experiments. For each dataset, we report the dimension d, the noise level σ and the average p-value of the Shapiro-Wilk test computed on the marginals.

| Dataset | Name | d | σ | p-value | Reference |
|---|---|---|---|---|---|
| El Nino Southern Oscillation | nino | 5 | 10⁻³ | 3.3 · 10⁻³ | [Dua and Graff, 2017] |
| Physicochemical Properties of PTS | protein | 9 | — | 7.5 · 10⁻⁸ | [Dua and Graff, 2017] |
| Forest Covertype I | spruce | 10 | 10⁻¹ | 2.5 · 10⁻⁹ | [Dua and Graff, 2017] |
| Forest Covertype II | lodgepole | 10 | 10⁻¹ | 1.8 · 10⁻⁸ | [Dua and Graff, 2017] |
| Credit Card Fraud Detection | credit | 28 | 10⁻³ | 9.9 · 10⁻⁸ | [Dal Pozzolo et al., 2017] |
| Insects Flying Behavior | INSECTS | 33 | — | < 10⁻¹⁶ | [Souza et al., 2020] |
| Sensorless Drive Diagnosis | sensorless | 48 | 10⁻³ | 4.9 · 10⁻⁸ | [Dua and Graff, 2017] |
| MiniBooNE Particle Identification | particle | 50 | 10⁻³ | 8.5 · 10⁻³ | [Dua and Graff, 2017] |
| UNSW Swarm Behavior | swarm | 2400 | — | 1.8 · 10⁻⁹ | [Dua and Graff, 2017] |
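The per-covariate normality check described above can be reproduced with a few lines of Python using scipy; the sketch below averages Shapiro-Wilk p-values over the covariates of random subsets, as an illustration of the procedure. Dataset loading is omitted and the array `data` is a synthetic stand-in for a real dataset.

```python
import numpy as np
from scipy.stats import shapiro

def average_shapiro_pvalue(data, n=4096, n_iter=250, seed=0):
    """Average Shapiro-Wilk p-value over all covariates of `data` (shape (N, d)),
    computed on `n_iter` random subsets of `n` samples each."""
    rng = np.random.default_rng(seed)
    pvals = []
    for _ in range(n_iter):
        idx = rng.choice(data.shape[0], size=min(n, data.shape[0]), replace=False)
        subset = data[idx]
        pvals.append(np.mean([shapiro(subset[:, j]).pvalue
                              for j in range(subset.shape[1])]))
    return float(np.mean(pvals))

# Example on clearly non-Gaussian synthetic data: the average p-value is tiny.
data = np.random.default_rng(1).exponential(size=(20000, 5))
print(average_shapiro_pvalue(data, n_iter=10))
```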
### 3.2 Curse of dimensionality

In this section, we investigate the ability of KQT to control the FPR as the data dimension d increases. Our experiments (Section 6.4 of the main article) have shown that Kernel Quant Tree with the Weighted Mahalanobis distance deviates from the desired FPR when the data dimension grows. As discussed in the article, this deviation is due to the challenge of fitting a Gaussian Mixture Model (GMM) to high-dimensional data. To analyze the impact of dimensionality on Kernel Quant Tree, we perform experiments in three synthetic settings with d ∈ {4, 8, 16, 32, 64, 128}. In these settings, denoted as unimodal, bimodal, and trimodal, the stationary distribution ϕ0 is defined as a GMM with 1, 2, and 3 Gaussian components, respectively. Then, we use the CCM framework [Alippi et al., 2017] to generate a post-change distribution by applying a roto-translation to each Gaussian component of ϕ0, such that the Kullback-Leibler distance between each component and the resulting post-change component is fixed to 1. We perform each experiment twice, once with N = 4096 training samples and once with N = 16384, to show that the limitation of KQT with the Weighted Mahalanobis distance can be avoided when a large training set is available.

Table 2 reports the FPR achieved by KQT adopting different distances in the unimodal, bimodal and trimodal settings for all the considered dimensions d and training set sizes N. In all experiments we construct a KQT histogram with K = 16 bins, we set the detection thresholds to yield an FPR α = 5%, and we test KQT on 5000 stationary batches of ν = 128 samples. In the experiment with N = 4096 (left columns), we notice that when d increases, the FPR achieved by KQT using the Mahalanobis and Weighted Mahalanobis distances deviates further from the target value. In contrast, when we train KQT on N = 16384 samples (right columns), the deviation from the target FPR is significantly reduced. To further corroborate our hypothesis that the issue lies in the GMM fitting, we compute the average condition number of the covariance matrices of the GMM components obtained when using KQT with the Weighted Mahalanobis distance. These results show that, on the high-dimensional data, using a larger training set yields covariance matrices with smaller condition numbers.
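The condition-number diagnostic mentioned above can be computed directly from a fitted GMM. The minimal sketch below uses scikit-learn's GaussianMixture as a stand-in for the mixture model underlying the Weighted Mahalanobis kernel (the specific implementation used in our experiments is not essential here) and reports the average condition number of the component covariances.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def average_condition_number(train, n_components=3, seed=0):
    """Fit a GMM with full covariances and return the average condition
    number of the component covariance matrices."""
    gmm = GaussianMixture(n_components=n_components, covariance_type='full',
                          random_state=seed).fit(train)
    return float(np.mean([np.linalg.cond(S) for S in gmm.covariances_]))

# Toy comparison: a larger training set typically yields better-conditioned covariances.
rng = np.random.default_rng(0)
d = 64
A = rng.standard_normal((d, d))
sample = lambda n: rng.standard_normal((n, d)) @ A.T
print(average_condition_number(sample(4096)), average_condition_number(sample(16384)))
```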
Table 2: Comparison between the FPR achieved by KQT using the Euclidean, Mahalanobis, and Weighted Mahalanobis distances in the synthetic settings for various dimensions d and training set sizes N. In parentheses, the average condition numbers of the covariance matrices of the GMM used by KQT with the Weighted Mahalanobis distance. Values above the target FPR of 5% are shown in bold.

| Setting | d | Euclidean, N=4096 | Euclidean, N=16384 | Mahalanobis, N=4096 | Mahalanobis, N=16384 | Weighted Maha., N=4096 | Weighted Maha., N=16384 |
|---|---|---|---|---|---|---|---|
| unimodal | 4 | 4.88% | - | 4.77% | 4.84% | 4.79% (20.1) | 4.85% (16.3) |
| unimodal | 8 | 4.83% | - | 4.81% | 4.86% | 4.71% (32.1) | 4.83% (37.7) |
| unimodal | 16 | 4.81% | - | 4.81% | 4.89% | 4.88% (99.5) | 4.79% (67.7) |
| unimodal | 32 | 4.84% | - | 4.95% | 4.88% | 4.99% (150.6) | 4.81% (123.6) |
| unimodal | 64 | 4.84% | - | **5.80%** | 4.95% | **5.81%** (315.2) | 4.87% (223.8) |
| unimodal | 128 | 4.91% | - | **16.52%** | **5.31%** | **77.74%** (344.0) | **5.45%** (307.0) |
| bimodal | 4 | 4.83% | - | 4.76% | 4.88% | 4.77% (12.9) | 4.89% (13.7) |
| bimodal | 8 | 4.88% | - | 4.86% | 4.87% | 4.79% (40.0) | 4.84% (36.3) |
| bimodal | 16 | 4.83% | - | 4.88% | 4.82% | 4.88% (68.3) | 4.87% (90.3) |
| bimodal | 32 | 4.86% | - | 4.95% | 4.86% | **5.36%** (177.2) | 4.83% (120.4) |
| bimodal | 64 | 4.89% | - | **5.66%** | 4.86% | **5.70%** (253.9) | **5.03%** (220.3) |
| bimodal | 128 | 4.84% | - | **15.44%** | **5.32%** | **76.60%** (276.9) | **5.46%** (244.7) |
| trimodal | 4 | 4.72% | - | 4.86% | 4.85% | 4.82% (24.9) | 4.84% (16.1) |
| trimodal | 8 | 4.84% | - | 4.79% | 4.82% | 4.83% (31.7) | 4.80% (38.2) |
| trimodal | 16 | 4.85% | - | 4.85% | 4.80% | 4.86% (66.1) | 4.83% (59.9) |
| trimodal | 32 | 4.81% | - | 4.91% | 4.84% | **5.13%** (108.7) | 4.86% (120.7) |
| trimodal | 64 | 4.93% | - | **5.67%** | 4.87% | **5.53%** (209.4) | **5.01%** (176.9) |
| trimodal | 128 | 4.81% | - | **15.86%** | **5.37%** | **77.49%** (258.1) | **5.47%** (214.2) |

### 3.3 Comparing the centroid selection strategies

In the main article, we propose two strategies for the centroid selection, namely, maximizing the information gain introduced by splitting X_{k−1} into X_k and X̄_k, and minimizing the Gini index of the distances between the centroid and the training samples in X_{k−1}. In Table 1 of the article, we report the average FPR and AUC achieved by KQT using the maximization of the information gain as a centroid selection strategy. Here, Table 3 reports the results achieved by KQT with the Euclidean, Mahalanobis, and Weighted Mahalanobis distances for both strategies and shows that their performance is comparable in every experimental setting.

Table 3: Comparison between the detection performance achieved by KQT with the Euclidean, Mahalanobis and Weighted Mahalanobis distances, when selecting the centroids by maximization of the Information Gain (left) and minimization of the Gini Index (right), in the high-ratio setting (K = 16, ν = 128 points). The table reports the achieved FPR (top) and AUC (bottom). In parentheses, the standard deviation.

FPR:

| Dataset | IG Euclidean | IG Mahalanobis | IG Weighted Maha. | Gini Euclidean | Gini Mahalanobis | Gini Weighted Maha. |
|---|---|---|---|---|---|---|
| unimodal | 4.86% (0.47%) | 4.82% (0.45%) | 4.83% (0.48%) | 4.83% (0.47%) | 4.81% (0.48%) | 4.83% (0.49%) |
| bimodal | 4.80% (0.46%) | 4.81% (0.44%) | 4.80% (0.45%) | 4.81% (0.47%) | 4.84% (0.46%) | 4.83% (0.46%) |
| nino | 5.00% (0.53%) | 5.02% (0.53%) | 5.01% (0.54%) | 5.02% (0.55%) | 5.06% (0.54%) | 5.02% (0.54%) |
| protein | 4.97% (0.52%) | 4.98% (0.54%) | 5.03% (0.55%) | 4.99% (0.54%) | 5.03% (0.56%) | 5.06% (0.53%) |
| spruce | 4.82% (0.47%) | 4.84% (0.49%) | 4.90% (0.47%) | 4.84% (0.49%) | 4.85% (0.48%) | 4.88% (0.49%) |
| lodgepole | 4.85% (0.49%) | 4.80% (0.47%) | 4.90% (0.50%) | 4.84% (0.50%) | 4.82% (0.49%) | 4.93% (0.51%) |
| credit | 4.89% (0.48%) | 4.85% (0.46%) | 5.06% (0.56%) | 4.89% (0.50%) | 4.90% (0.52%) | 5.10% (0.61%) |
| insects (1→2) | 4.91% (0.50%) | 4.93% (0.52%) | 5.19% (0.64%) | 4.90% (0.49%) | 4.92% (0.54%) | 5.19% (0.61%) |
| insects (2→3) | 4.92% (0.52%) | 4.96% (0.52%) | 5.25% (0.62%) | 4.88% (0.50%) | 4.93% (0.52%) | 5.24% (0.63%) |
| insects (3→4) | 4.90% (0.52%) | 4.88% (0.53%) | 5.22% (0.64%) | 4.91% (0.55%) | 4.92% (0.53%) | 5.24% (0.64%) |
| insects (4→5) | 4.91% (0.51%) | 4.92% (0.54%) | 5.25% (0.65%) | 4.92% (0.52%) | 4.95% (0.49%) | 5.28% (0.66%) |
| insects (5→6) | 4.90% (0.56%) | 4.92% (0.53%) | 5.26% (0.72%) | 4.90% (0.55%) | 4.91% (0.57%) | 5.29% (0.71%) |
| sensorless | 4.82% (0.49%) | 5.01% (0.56%) | 7.42% (1.61%) | 4.87% (0.48%) | 4.98% (0.58%) | 7.54% (1.56%) |
| particle | 4.81% (0.46%) | 4.94% (0.52%) | 5.80% (1.02%) | 4.84% (0.48%) | 4.93% (0.52%) | 5.86% (1.00%) |

AUC:

| Dataset | IG Euclidean | IG Mahalanobis | IG Weighted Maha. | Gini Euclidean | Gini Mahalanobis | Gini Weighted Maha. |
|---|---|---|---|---|---|---|
| unimodal | 0.946 (0.105) | 0.993 (0.016) | 0.994 (0.013) | 0.946 (0.103) | 0.994 (0.015) | 0.994 (0.014) |
| bimodal | 0.904 (0.118) | 0.954 (0.060) | 0.968 (0.042) | 0.903 (0.119) | 0.955 (0.056) | 0.970 (0.039) |
| nino | 0.607 (0.072) | 0.904 (0.138) | 0.922 (0.122) | 0.609 (0.071) | 0.903 (0.139) | 0.922 (0.122) |
| protein | 0.617 (0.074) | 0.993 (0.035) | 0.995 (0.027) | 0.615 (0.074) | 0.993 (0.030) | 0.994 (0.030) |
| spruce | 0.601 (0.066) | 1.000 (0.000) | 1.000 (0.000) | 0.600 (0.068) | 1.000 (0.000) | 1.000 (0.000) |
| lodgepole | 0.654 (0.099) | 1.000 (0.000) | 1.000 (0.000) | 0.653 (0.099) | 1.000 (0.000) | 1.000 (0.000) |
| credit | 0.602 (0.053) | 0.780 (0.146) | 1.000 (0.000) | 0.605 (0.055) | 0.787 (0.141) | 1.000 (0.000) |
| insects (1→2) | 0.962 (0.035) | 0.972 (0.039) | 0.993 (0.019) | 0.961 (0.038) | 0.970 (0.037) | 0.994 (0.015) |
| insects (2→3) | 1.000 (0.001) | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.001) | 1.000 (0.000) | 1.000 (0.000) |
| insects (3→4) | 0.904 (0.054) | 0.942 (0.049) | 0.990 (0.024) | 0.903 (0.058) | 0.946 (0.044) | 0.989 (0.025) |
| insects (4→5) | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000) |
| insects (5→6) | 0.985 (0.016) | 0.992 (0.008) | 1.000 (0.000) | 0.986 (0.014) | 0.991 (0.009) | 1.000 (0.001) |
| sensorless | 0.542 (0.027) | 1.000 (0.000) | 1.000 (0.000) | 0.543 (0.026) | 1.000 (0.000) | 1.000 (0.000) |
| particle | 0.555 (0.030) | 0.976 (0.051) | 0.985 (0.039) | 0.556 (0.032) | 0.977 (0.051) | 0.985 (0.042) |

### 3.4 Complete experimental results

In this section, we report the complete results of the high- and low-ratio experiments presented in Section 6 of the main article. For each result, we include the corresponding confidence interval. Table 5 reports the FPR and AUC achieved by the methods presented in Section 6.3 of the main article in the high-ratio setting, namely when ν = 128 and K = 16. As already discussed in the paper, Quant Tree, Kernel Quant Tree and EIKM achieve an empirical FPR close to the target α = 5% in most experiments. In contrast, SPLL and PCA-SPLL mostly exceed the target and Density Tree largely overshoots it. However, KQT with the Weighted Mahalanobis distance does not control the FPR accurately when d increases, and we speculate that this is due to the GMM underlying the definition of the distance. This known limitation is discussed in Section 6.4 of the main article and investigated in Section 3.2 of this document. As for the AUC, KQT with the Weighted Mahalanobis distance outperforms the alternatives in most settings. At the bottom of Table 5, we report the ranking of each method computed from the AUC, together with the p-value of the Nemenyi post-hoc statistic, which shows that the advantage of the best-performing method is statistically significant. In the main article, we also discuss the performance of Quant Tree, which shows how the preprocessing by PCA decreases the detection performance in some cases. Remarkably, the KQT monitoring is invariant under roto-translations (see Section 5.3 of the main article) and surpasses Quant Tree independently of the application of the PCA preprocessing.

Table 4 reports the FPR and AUC achieved by the considered methods in the low-ratio setting, namely when ν = 64 and K = 32. The results of this experiment are overall in line with the high-ratio setting. However, as we speculate in Section 6.3 of the main article, histograms can better model a data distribution when the expected number of points per bin ν/K is large. This low-ratio setting confirms our speculation, as the considered methods achieve an AUC lower than in the high-ratio experiment on most datasets. However, KQT with the Weighted Mahalanobis distance still achieves the best AUC with a statistically significant advantage over the alternatives, as demonstrated by the Nemenyi post-hoc test. Moreover, the Pearson test statistic is discrete and in the low-ratio setting assumes fewer distinct values. Thus, it is more challenging to set detection thresholds, and the results show that the empirical FPR of Quant Tree and Kernel Quant Tree is slightly lower than in the high-ratio experiment.
Table 4: FPR (top) and AUC (bottom) achieved by the considered methods in the low-ratio setting, namely when histogram-based methods define K = 32 bins and the monitored batches contain ν = 64 points. In parentheses, the width of the 95% confidence interval of the results.

FPR:

| Dataset | QT | QT (PCA) | KQT (Euclidean) | KQT (Mahalanobis) | KQT (Weighted Maha.) | EIKM | SPLL (C=3) | SPLL (C=3, PCA) | DT | DT (PCA) |
|---|---|---|---|---|---|---|---|---|---|---|
| unimodal | 4.29% (0.34%) | 4.27% (0.32%) | 4.30% (0.32%) | 4.30% (0.34%) | 4.28% (0.33%) | 4.91% (0.44%) | 6.10% (0.66%) | 7.24% (1.03%) | 7.08% (1.01%) | 7.06% (1.00%) |
| bimodal | 4.31% (0.32%) | 4.27% (0.32%) | 4.29% (0.33%) | 4.32% (0.33%) | 4.29% (0.33%) | 4.91% (0.44%) | 6.29% (0.79%) | 7.57% (1.26%) | 6.99% (1.00%) | 6.92% (0.97%) |
| nino | 5.01% (0.38%) | 5.18% (0.38%) | 5.23% (0.37%) | 5.18% (0.38%) | 5.21% (0.38%) | 4.91% (0.48%) | 7.28% (1.16%) | 9.60% (1.75%) | 6.77% (0.98%) | 6.86% (1.09%) |
| protein | 4.98% (0.38%) | 5.13% (0.39%) | 5.17% (0.39%) | 5.19% (0.40%) | 5.21% (0.43%) | 4.92% (0.47%) | 15.38% (3.01%) | 10.74% (2.28%) | 6.81% (0.98%) | 6.73% (1.08%) |
| spruce | 4.26% (0.33%) | 4.29% (0.32%) | 4.30% (0.32%) | 4.29% (0.33%) | 4.29% (0.34%) | 4.85% (0.45%) | 13.42% (3.15%) | 13.64% (3.18%) | 6.76% (0.99%) | 6.89% (1.08%) |
| lodgepole | 4.27% (0.32%) | 4.27% (0.34%) | 4.29% (0.35%) | 4.30% (0.36%) | 4.31% (0.34%) | 4.93% (0.52%) | 12.57% (3.45%) | 12.73% (3.46%) | 6.89% (1.00%) | 6.74% (1.06%) |
| credit | 4.34% (0.39%) | 4.46% (0.46%) | 4.39% (0.43%) | 4.38% (0.42%) | 4.47% (0.44%) | 4.91% (0.61%) | 11.88% (2.16%) | 23.23% (3.66%) | 6.74% (1.04%) | 6.75% (1.02%) |
| insects (1→2) | 4.69% (0.47%) | 4.84% (0.57%) | 4.84% (0.58%) | 4.84% (0.56%) | 4.97% (0.59%) | 4.91% (0.45%) | 6.73% (1.63%) | 8.05% (1.95%) | 6.86% (1.04%) | 6.77% (1.10%) |
| insects (2→3) | 4.73% (0.50%) | 4.87% (0.59%) | 4.85% (0.56%) | 4.86% (0.60%) | 5.04% (0.61%) | 4.90% (0.45%) | 6.67% (1.51%) | 8.02% (1.77%) | 6.83% (1.03%) | 6.77% (1.04%) |
| insects (3→4) | 4.72% (0.49%) | 4.84% (0.56%) | 4.85% (0.58%) | 4.84% (0.58%) | 5.02% (0.60%) | 4.93% (0.48%) | 7.29% (1.75%) | 8.71% (2.03%) | 6.80% (1.07%) | 6.75% (1.03%) |
| insects (4→5) | 4.69% (0.50%) | 4.84% (0.58%) | 4.82% (0.57%) | 4.83% (0.60%) | 4.99% (0.64%) | 4.90% (0.44%) | 6.56% (1.55%) | 7.89% (1.87%) | 6.83% (1.03%) | 6.71% (1.03%) |
| insects (5→6) | 4.77% (0.53%) | 4.90% (0.56%) | 4.90% (0.56%) | 4.89% (0.58%) | 5.07% (0.60%) | 4.86% (0.43%) | 7.18% (1.72%) | 8.09% (1.89%) | 6.77% (1.04%) | 6.76% (1.00%) |
| sensorless | 4.29% (0.35%) | 4.43% (0.39%) | 4.30% (0.34%) | 4.35% (0.36%) | 5.32% (0.65%) | 4.94% (0.49%) | 4.93% (1.19%) | 4.29% (0.71%) | 6.53% (1.06%) | 6.67% (1.07%) |
| particle | 4.28% (0.32%) | 4.32% (0.36%) | 4.30% (0.35%) | 4.33% (0.35%) | 4.71% (0.46%) | 4.86% (0.50%) | 6.92% (1.52%) | 8.80% (1.96%) | 6.70% (1.12%) | 6.78% (1.12%) |

AUC:

| Dataset | QT | QT (PCA) | KQT (Euclidean) | KQT (Mahalanobis) | KQT (Weighted Maha.) | EIKM | SPLL (C=3) | SPLL (C=3, PCA) | DT | DT (PCA) |
|---|---|---|---|---|---|---|---|---|---|---|
| unimodal | 0.883 (0.101) | 0.936 (0.059) | 0.881 (0.115) | 0.957 (0.031) | 0.957 (0.031) | 0.779 (0.157) | 0.975 (0.016) | 0.972 (0.047) | 0.676 (0.148) | 0.712 (0.174) |
| bimodal | 0.796 (0.109) | 0.825 (0.102) | 0.821 (0.112) | 0.868 (0.075) | 0.885 (0.062) | 0.709 (0.130) | 0.859 (0.132) | 0.845 (0.154) | 0.636 (0.106) | 0.652 (0.122) |
| nino | 0.736 (0.143) | 0.804 (0.170) | 0.555 (0.042) | 0.809 (0.173) | 0.830 (0.163) | 0.511 (0.012) | 0.739 (0.176) | 0.771 (0.191) | 0.630 (0.111) | 0.545 (0.048) |
| protein | 0.848 (0.104) | 0.980 (0.055) | 0.582 (0.048) | 0.985 (0.050) | 0.991 (0.035) | 0.508 (0.009) | 0.906 (0.118) | 0.945 (0.098) | 0.638 (0.117) | 0.599 (0.081) |
| spruce | 1.000 (0.003) | 1.000 (0.000) | 0.590 (0.060) | 1.000 (0.000) | 1.000 (0.000) | 0.504 (0.005) | 1.000 (0.002) | 1.000 (0.002) | 1.000 (0.001) | 1.000 (0.001) |
| lodgepole | 1.000 (0.001) | 1.000 (0.001) | 0.639 (0.085) | 1.000 (0.000) | 1.000 (0.000) | 0.506 (0.006) | 1.000 (0.002) | 1.000 (0.002) | 1.000 (0.000) | 1.000 (0.000) |
| credit | 0.611 (0.043) | 0.813 (0.144) | 0.550 (0.021) | 0.685 (0.137) | 1.000 (0.000) | 0.504 (0.005) | 0.565 (0.060) | 0.624 (0.108) | 0.603 (0.051) | 0.739 (0.116) |
| insects (1→2) | 0.976 (0.022) | 0.887 (0.054) | 0.902 (0.041) | 0.967 (0.025) | 0.959 (0.037) | 0.698 (0.057) | 0.733 (0.033) | 0.786 (0.031) | 1.000 (0.000) | 0.999 (0.002) |
| insects (2→3) | 0.968 (0.031) | 0.981 (0.018) | 0.994 (0.005) | 0.999 (0.001) | 1.000 (0.001) | 0.895 (0.052) | 1.000 (0.000) | 0.999 (0.000) | 0.987 (0.008) | 0.996 (0.009) |
| insects (3→4) | 0.915 (0.045) | 0.804 (0.068) | 0.821 (0.046) | 0.909 (0.036) | 0.940 (0.051) | 0.688 (0.066) | 0.691 (0.021) | 0.687 (0.023) | 0.995 (0.003) | 0.987 (0.007) |
| insects (4→5) | 0.989 (0.015) | 0.991 (0.015) | 0.998 (0.003) | 1.000 (0.001) | 1.000 (0.000) | 0.878 (0.071) | 1.000 (0.000) | 1.000 (0.000) | 0.999 (0.001) | 0.994 (0.010) |
| insects (5→6) | 0.974 (0.017) | 0.890 (0.044) | 0.922 (0.023) | 0.961 (0.019) | 0.982 (0.011) | 0.860 (0.051) | 0.932 (0.007) | 0.933 (0.008) | 0.997 (0.001) | 0.996 (0.002) |
| sensorless | 0.832 (0.112) | 0.999 (0.008) | 0.523 (0.013) | 1.000 (0.000) | 1.000 (0.000) | 0.501 (0.003) | 1.000 (0.000) | 1.000 (0.000) | 0.715 (0.139) | 0.582 (0.083) |
| particle | 0.860 (0.129) | 0.876 (0.107) | 0.530 (0.015) | 0.922 (0.101) | 0.941 (0.089) | 0.503 (0.004) | 0.786 (0.143) | 0.861 (0.127) | 0.706 (0.143) | 0.526 (0.035) |
| Average Ranking | 5.32 | 4.96 | 7.28 | 3.78 | 2.96 | 9.54 | 5.26 | 4.97 | 5.33 | 5.60 |
| Nemenyi p-value | < 10⁻¹⁶ | < 10⁻¹⁶ | < 10⁻¹⁶ | < 10⁻¹⁶ | — | < 10⁻¹⁶ | < 10⁻¹⁶ | < 10⁻¹⁶ | < 10⁻¹⁶ | < 10⁻¹⁶ |

Table 5: FPR (top) and AUC (bottom) achieved by the considered methods in the high-ratio setting, namely when histogram-based methods define K = 16 bins and the monitored batches contain ν = 128 points. In parentheses, the width of the 95% confidence interval of the results.

FPR:

| Dataset | QT | QT (PCA) | KQT (Euclidean) | KQT (Mahalanobis) | KQT (Weighted Maha.) | EIKM | SPLL (C=3) | SPLL (C=3, PCA) | DT | DT (PCA) |
|---|---|---|---|---|---|---|---|---|---|---|
| unimodal | 4.83% (0.48%) | 4.81% (0.46%) | 4.86% (0.47%) | 4.82% (0.45%) | 4.83% (0.48%) | 4.82% (0.53%) | 5.46% (0.75%) | 5.92% (1.04%) | 7.84% (1.16%) | 7.75% (1.17%) |
| bimodal | 4.80% (0.45%) | 4.81% (0.46%) | 4.80% (0.46%) | 4.81% (0.44%) | 4.80% (0.45%) | 4.82% (0.51%) | 5.53% (0.75%) | 6.02% (1.06%) | 7.65% (1.20%) | 7.62% (1.09%) |
| nino | 5.04% (0.49%) | 4.99% (0.50%) | 5.00% (0.53%) | 5.02% (0.53%) | 5.01% (0.54%) | 4.83% (0.55%) | 6.14% (1.21%) | 7.69% (2.05%) | 7.55% (1.20%) | 7.57% (1.16%) |
| protein | 4.97% (0.50%) | 4.98% (0.56%) | 4.97% (0.52%) | 4.98% (0.54%) | 5.03% (0.55%) | 4.88% (0.61%) | 13.15% (3.54%) | 8.42% (2.33%) | 7.65% (1.25%) | 7.64% (1.25%) |
| spruce | 4.81% (0.50%) | 4.83% (0.48%) | 4.82% (0.47%) | 4.84% (0.49%) | 4.90% (0.47%) | 4.86% (0.59%) | 11.43% (3.93%) | 11.56% (3.97%) | 7.56% (1.21%) | 7.57% (1.16%) |
| lodgepole | 4.83% (0.47%) | 4.82% (0.50%) | 4.85% (0.49%) | 4.80% (0.47%) | 4.90% (0.50%) | 4.92% (0.57%) | 10.78% (4.64%) | 10.89% (4.68%) | 7.60% (1.14%) | 7.58% (1.12%) |
| credit | 4.83% (0.47%) | 4.96% (0.54%) | 4.89% (0.48%) | 4.85% (0.46%) | 5.06% (0.56%) | 4.96% (0.68%) | 8.67% (2.26%) | 16.06% (3.63%) | 7.63% (1.15%) | 7.59% (1.23%) |
| insects (1→2) | 4.92% (0.50%) | 4.93% (0.51%) | 4.91% (0.50%) | 4.93% (0.52%) | 5.19% (0.64%) | 4.93% (0.63%) | 5.90% (2.04%) | 6.48% (2.15%) | 7.57% (1.16%) | 7.60% (1.20%) |
| insects (2→3) | 4.93% (0.53%) | 4.91% (0.54%) | 4.92% (0.52%) | 4.96% (0.52%) | 5.25% (0.62%) | 4.96% (0.65%) | 5.54% (1.85%) | 6.16% (1.96%) | 7.60% (1.19%) | 7.59% (1.23%) |
| insects (3→4) | 4.92% (0.48%) | 4.89% (0.52%) | 4.90% (0.52%) | 4.88% (0.53%) | 5.22% (0.64%) | 4.89% (0.58%) | 6.09% (1.99%) | 6.69% (2.11%) | 7.59% (1.19%) | 7.54% (1.17%) |
| insects (4→5) | 4.92% (0.50%) | 4.95% (0.52%) | 4.91% (0.51%) | 4.92% (0.54%) | 5.25% (0.65%) | 4.91% (0.61%) | 5.48% (1.76%) | 6.01% (1.84%) | 7.63% (1.24%) | 7.56% (1.17%) |
| insects (5→6) | 4.91% (0.54%) | 4.90% (0.54%) | 4.90% (0.56%) | 4.92% (0.53%) | 5.26% (0.72%) | 4.90% (0.64%) | 5.86% (2.05%) | 6.19% (2.11%) | 7.61% (1.22%) | 7.63% (1.24%) |
| sensorless | 4.84% (0.50%) | 5.01% (0.55%) | 4.82% (0.49%) | 5.01% (0.56%) | 7.42% (1.61%) | 4.93% (0.61%) | 4.33% (1.03%) | 4.83% (0.76%) | 7.55% (1.19%) | 7.58% (1.22%) |
| particle | 4.85% (0.50%) | 4.87% (0.51%) | 4.81% (0.46%) | 4.94% (0.52%) | 5.80% (1.02%) | 4.84% (0.61%) | 5.93% (2.01%) | 6.07% (2.05%) | 7.52% (1.10%) | 7.60% (1.19%) |

AUC:

| Dataset | QT | QT (PCA) | KQT (Euclidean) | KQT (Mahalanobis) | KQT (Weighted Maha.) | EIKM | SPLL (C=3) | SPLL (C=3, PCA) | DT | DT (PCA) |
|---|---|---|---|---|---|---|---|---|---|---|
| unimodal | 0.957 (0.079) | 0.976 (0.057) | 0.946 (0.105) | 0.993 (0.016) | 0.994 (0.013) | 0.874 (0.154) | 0.996 (0.006) | 0.989 (0.040) | 0.786 (0.167) | 0.806 (0.190) |
| bimodal | 0.900 (0.110) | 0.930 (0.090) | 0.904 (0.118) | 0.954 (0.060) | 0.968 (0.042) | 0.821 (0.158) | 0.915 (0.126) | 0.895 (0.164) | 0.751 (0.155) | 0.767 (0.160) |
| nino | 0.845 (0.143) | 0.905 (0.135) | 0.607 (0.072) | 0.904 (0.138) | 0.922 (0.122) | 0.528 (0.029) | 0.816 (0.172) | 0.841 (0.183) | 0.726 (0.152) | 0.582 (0.081) |
| protein | 0.899 (0.104) | 0.985 (0.051) | 0.617 (0.074) | 0.993 (0.035) | 0.995 (0.027) | 0.514 (0.015) | 0.918 (0.118) | 0.954 (0.093) | 0.704 (0.148) | 0.595 (0.085) |
| spruce | 0.999 (0.014) | 1.000 (0.000) | 0.601 (0.066) | 1.000 (0.000) | 1.000 (0.000) | 0.507 (0.007) | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.002) | 1.000 (0.002) |
| lodgepole | 1.000 (0.000) | 1.000 (0.000) | 0.654 (0.099) | 1.000 (0.000) | 1.000 (0.000) | 0.511 (0.016) | 1.000 (0.002) | 1.000 (0.002) | 1.000 (0.000) | 1.000 (0.000) |
| credit | 0.698 (0.079) | 0.867 (0.127) | 0.602 (0.053) | 0.780 (0.146) | 1.000 (0.000) | 0.508 (0.011) | 0.597 (0.085) | 0.660 (0.132) | 0.695 (0.091) | 0.820 (0.131) |
| insects (1→2) | 0.998 (0.005) | 0.962 (0.048) | 0.962 (0.035) | 0.972 (0.039) | 0.993 (0.019) | 0.836 (0.071) | 0.810 (0.035) | 0.866 (0.029) | 1.000 (0.000) | 1.000 (0.000) |
| insects (2→3) | 0.993 (0.017) | 0.995 (0.012) | 1.000 (0.001) | 1.000 (0.000) | 1.000 (0.000) | 0.962 (0.014) | 1.000 (0.000) | 1.000 (0.000) | 0.999 (0.002) | 1.000 (0.001) |
| insects (3→4) | 0.983 (0.029) | 0.897 (0.078) | 0.904 (0.054) | 0.942 (0.049) | 0.990 (0.024) | 0.835 (0.084) | 0.753 (0.025) | 0.745 (0.028) | 1.000 (0.000) | 1.000 (0.001) |
| insects (4→5) | 0.998 (0.008) | 0.997 (0.008) | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000) | 0.950 (0.021) | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000) | 0.999 (0.004) |
| insects (5→6) | 0.999 (0.003) | 0.971 (0.033) | 0.985 (0.016) | 0.992 (0.008) | 1.000 (0.000) | 0.963 (0.017) | 0.979 (0.004) | 0.979 (0.005) | 1.000 (0.000) | 1.000 (0.000) |
| sensorless | 0.862 (0.120) | 1.000 (0.004) | 0.542 (0.027) | 1.000 (0.000) | 1.000 (0.000) | 0.502 (0.003) | 1.000 (0.000) | 1.000 (0.000) | 0.738 (0.179) | 0.595 (0.104) |
| particle | 0.886 (0.116) | 0.931 (0.090) | 0.555 (0.030) | 0.976 (0.051) | 0.985 (0.039) | 0.506 (0.006) | 0.838 (0.135) | 0.901 (0.112) | 0.798 (0.140) | 0.542 (0.054) |
| Average Ranking | 5.24 | 4.93 | 7.08 | 3.82 | 2.98 | 9.37 | 5.57 | 5.34 | 5.11 | 5.56 |
| Nemenyi p-value | < 10⁻¹⁶ | < 10⁻¹⁶ | < 10⁻¹⁶ | < 10⁻¹⁶ | — | < 10⁻¹⁶ | < 10⁻¹⁶ | < 10⁻¹⁶ | < 10⁻¹⁶ | < 10⁻¹⁶ |

## References

[Alippi et al., 2017] Alippi, C., Boracchi, G., and Carrera, D. (2017). CCM: Controlling the change magnitude in high dimensional data. In Angelov, P., Manolopoulos, Y., Iliadis, L., Roy, A., and Vellasco, M., editors, Advances in Big Data, pages 216–225, Cham. Springer International Publishing.

[Balakrishnan and Rao, 1998] Balakrishnan, N. and Rao, C. R. (1998). Handbook of Statistics, Vol. 16: Order Statistics: Theory and Methods.

[Boracchi et al., 2018] Boracchi, G., Carrera, D., Cervellera, C., and Maccio, D. (2018). QuantTree: Histograms for change detection in multivariate data streams. In International Conference on Machine Learning, pages 639–648. PMLR.

[Dal Pozzolo et al., 2017] Dal Pozzolo, A., Boracchi, G., Caelen, O., Alippi, C., and Bontempi, G. (2017). Credit card fraud detection: a realistic modeling and a novel learning strategy. IEEE Transactions on Neural Networks and Learning Systems, 29(8):3784–3797.

[Dua and Graff, 2017] Dua, D. and Graff, C. (2017). UCI Machine Learning Repository.

[Lehmann et al., 2005] Lehmann, E. L., Romano, J. P., and Casella, G. (2005). Testing Statistical Hypotheses, volume 3. Springer.

[Papoulis and Pillai, 2002] Papoulis, A. and Pillai, S. U. (2002). Probability, Random Variables, and Stochastic Processes. Tata McGraw-Hill Education.

[Shapiro and Wilk, 1965] Shapiro, S. S. and Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52(3/4):591–611.

[Souza et al., 2020] Souza, V. M., dos Reis, D. M., Maletzke, A. G., and Batista, G. E. (2020). Challenges in benchmarking stream learning algorithms with real-world data. Data Mining and Knowledge Discovery, 34(6):1805–1858.

[Tipping, 1999] Tipping, M. E. (1999). Deriving cluster analytic distance functions from Gaussian mixture models.