# Locally private online change point detection

Thomas Berrett
Department of Statistics, University of Warwick
Coventry, CV4 7AL, U.K.
tom.berrett@warwick.ac.uk

Yi Yu
Department of Statistics, University of Warwick
Coventry, CV4 7AL, U.K.
yi.yu.2@warwick.ac.uk

Abstract

We study online change point detection problems under the constraint of local differential privacy (LDP) where, in particular, the statistician does not have access to the raw data. As a concrete problem, we study a multivariate nonparametric regression problem. At each time point $t$, the raw data are assumed to be of the form $(X_t, Y_t)$, where $X_t$ is a $d$-dimensional feature vector and $Y_t$ is a response variable. Our primary aim is to detect changes in the regression function $m_t(x) = \mathbb{E}(Y_t \mid X_t = x)$ as soon as a change occurs. We provide algorithms which respect the LDP constraint, which control the false alarm probability, and which detect changes with a minimal (minimax rate-optimal) delay. To quantify the cost of privacy, we also present the optimal rate in the benchmark, non-private setting. These non-private results are also new to the literature and are thus of independent interest. In addition, we study the univariate mean online change point detection problem under privacy constraints. This serves as a blueprint for studying more complicated private change point detection problems.

1 Introduction

Online change point detection has been an active statistical research area for decades, originating from the demand for reliable quality control mechanisms under time and resource constraints [e.g. 34]. In recent years, due to advances in technology, the applications of online change point detection extend well beyond quality control and include climatology, speech recognition and image processing, among many others. While collecting the data as they are generated, one wishes to detect the underlying distributional changes as soon as the changes occur.

As the ability to collect and store data grows exponentially, protecting users' privacy has become one of the central concerns of data science. In practice, many of the systems that we monitor contain sensitive information. For instance, change point algorithms are used in cyber security to detect attacks, with the ultimate aim often being to protect private information [24, 30, 1]. Other common application areas include public health [21, 17] and finance [18], in which many of the data involved are highly personal. Given the prevalence of such problems, we address two questions in this paper: whether we can detect changes without the need for direct access to the sensitive raw data; and what the cost of protecting privacy is in terms of detection delay and accuracy.

Traditional anonymisation of data has been shown to be an outdated method of privacy protection, particularly in multivariate settings [28, 26]. Even when direct identifiers such as names and locations are removed, data sets often contain enough information for researchers to identify individual subjects, and hence there is a need to quantify and control the release of privileged information. Formal privacy constraints provide us with a rigorous framework in which we may tackle such problems and explore the fundamental limits of private methods. Constraints of this type have been imposed in analyses
carried out by Apple [31, 29], Google [16] and Microsoft [10], and provide organisations with a way of demonstrating compliance with the General Data Protection Regulation [7]. The most popular privacy constraint is that of differential privacy [14], which assumes the existence of a third party who can be trusted to handle all the raw data. In situations where this assumption cannot be made, which are the focus of our work, we strengthen this constraint and require our algorithms to satisfy local differential privacy [e.g. 20, 12], which, in particular, insists that the raw data are not accessed by anyone except their original holder. Solutions to a wide range of statistical problems have been given, and much underlying theory has been developed [e.g. 35, 12, 13, 27, 3, 15, 19].

In this paper, we are concerned with a multivariate nonparametric regression online change point detection problem, with privatised data. To be specific, we assume that the original data are $\{(X_i, Y_i)\}_{i \in \mathbb{N}} \subseteq \mathbb{R}^d \times \mathbb{R}$, where the regression functions
$$m_i(x) = \mathbb{E}(Y_i \mid X_i = x), \quad x \in \mathbb{R}^d, \; i \in \mathbb{N}, \quad (1)$$
satisfy that $m_{i+1} \neq m_i$ if and only if $i = \Delta$. The goal is to find a stopping time $\hat{t}$ which minimises the detection delay $(\hat{t} - \Delta)_+$, while controlling the false alarm probability $\mathbb{P}(\hat{t} < \Delta)$.

With the concern of maintaining privacy, we do not directly have access to the original data $\{(X_i, Y_i)\}_{i \in \mathbb{N}}$, but only a privatised version. Specifically, the original data are transmitted through an $\alpha$-locally differentially private ($\alpha$-LDP) channel for some fixed $\alpha > 0$. The privatised data are denoted by $\{(W_i, Z_i)\}_{i \in \mathbb{N}}$. In our upper bounds we restrict attention to non-interactive mechanisms [see e.g. 12], and so a privacy mechanism is given by a sequence $\{Q_i\}_{i \in \mathbb{N}}$ of conditional distributions, with the interpretation that $(W_i, Z_i) \mid (X_i, Y_i) = (x_i, y_i) \sim Q_i(\cdot \mid (x_i, y_i))$. For $\{Q_i\}_{i \in \mathbb{N}}$ to satisfy the $\alpha$-LDP constraint we require that
$$\sup_{i \in \mathbb{N}} \sup_{A} \sup_{(x,y), (x',y') \in \mathbb{R}^d \times \mathbb{R}} \frac{Q_i(A \mid (x, y))}{Q_i(A \mid (x', y'))} \le e^{\alpha}. \quad (2)$$
In our lower bounds, however, we allow mechanisms to be sequentially interactive [e.g. 12], so that a privacy mechanism is given by $\{Q_i\}_{i \in \mathbb{N}}$ with the interpretation that
$$(W_i, Z_i) \mid \{(X_i, Y_i, W_{i-1}, Z_{i-1}, \dots, W_1, Z_1) = (x_i, y_i, w_{i-1}, z_{i-1}, \dots, w_1, z_1)\} \sim Q_i(\cdot \mid (x_i, y_i, w_{i-1}, z_{i-1}, \dots, w_1, z_1)).$$
Here the $\alpha$-LDP constraint requires that
$$\sup_{i \in \mathbb{N}} \sup_{A} \sup_{(x,y), (x',y') \in \mathbb{R}^d \times \mathbb{R}} \sup_{w_1, z_1, \dots, w_{i-1}, z_{i-1}} \frac{Q_i(A \mid (x, y, w_{i-1}, z_{i-1}, \dots, w_1, z_1))}{Q_i(A \mid (x', y', w_{i-1}, z_{i-1}, \dots, w_1, z_1))} \le e^{\alpha}.$$
Since our upper and lower bounds match up to a logarithmic factor, we may conclude that the simpler non-interactive procedures result in optimal performance for this problem. We will assume throughout that $\alpha \le 1$, though this can be relaxed to $\alpha \le C$ for any $C > 0$. This restricts attention to the strongest constraints, and is often the regime of primary interest [e.g. 12].

1.1 Summary of contributions and related literature

To the best of our knowledge, this is the first work on a few fronts. Firstly, this is the first paper to consider change point detection under local privacy constraints. Previous work has focused on the central model of differential privacy, where there exists a third party trusted to have access to all of the data. [9] use established tools from the central model of differential privacy to detect changes in both the offline and online settings. The pre- and post-change distributions are assumed to be known, and a private version of the likelihood ratio statistic is analysed.
Further development of these ideas, in particular the extension to detection of multiple changes in the online setting, is given in [39]. [6] give differentially private tests of simple hypotheses, shown to be optimal up to constants, which are then applied to the change point detection problem with known pre- and post-change distributions. In a setting in which the distributions are unknown, [8] develop private versions of the Mann-Whitney test to detect a change in location. The problem has also been studied under different notions of central privacy [23].

Secondly, this is the first paper to study the fundamental limits in multivariate nonparametric regression change point detection problems. We have derived the minimax rate of the detection delay, allowing the jump size $\|m_\Delta - m_{\Delta+1}\|_\infty$, the variance of the additive noise $\sigma^2$ and the privacy constraint $\alpha$ to vary with the location $\Delta$ of the change point. There is a vast body of literature discussing the detection boundary and optimal estimation in offline change point analysis [e.g. 33, 36]. Counterparts in online change point analysis are relatively scarce; existing work includes univariate mean changes [e.g. 37] and dynamic networks [e.g. 38]. On a separate note, multivariate nonparametric regression estimation under privacy constraints is studied in [4], and classification is studied in [2].

In addition, we have also provided the analysis and results for the univariate mean online change point detection problem with privatised data. This is, arguably, the simplest privatised online change point detection problem. The analysis and results shown in this paper enrich statisticians' toolboxes and serve as a benchmark for more complex problems.

2 Methodology

In this section, we describe our private change point detection algorithm, which takes the privatised data as input. The whole algorithm consists of two key ingredients: (1) the privacy mechanism and (2) the change point detection method.

The privacy mechanism. Throughout this paper, a binned estimator lies at the core of the analysis. Recall that the raw data at time point $i$ include a $d$-dimensional feature vector $X_i$, which we assume is supported within some bounded set $\mathcal{X} \subseteq \mathbb{R}^d$, and a univariate response variable $Y_i$. We denote by $\{A_{h,j}\}_{j=1,\dots,N_h}$ a set of cubes of volume $h^d$, such that $\{\mathcal{X} \cap A_{h,j}\}_{j=1,\dots,N_h}$ is a partition of $\mathcal{X}$, and write $x_{h,j}$ for the centre of $A_{h,j}$. The data point $(X_i, Y_i)$ is then randomised by taking
$$W_{i,j} = \mathbb{1}\{X_i \in A_{h,j}\} + 4\alpha^{-1}\epsilon_{i,j} \quad \text{and} \quad Z_{i,j} = [Y_i]_{-M}^{M}\,\mathbb{1}\{X_i \in A_{h,j}\} + 4M\alpha^{-1}\zeta_{i,j},$$
where $\{\epsilon_{i,j}, \zeta_{i,j}\}$ are independent and identically distributed standard Laplace random variables, and where $[Y]_{-M}^{M} = \min(M, \max(Y, -M))$ with $M > 0$ a truncation parameter. It is shown in Proposition 1 in [4] [see also 2] that this non-interactive mechanism is an $\alpha$-LDP channel. We emphasise that privacy is guaranteed without any assumptions on the distribution of $(X, Y)$.

The change point detection method. Given the privatised data $\{(W_{i,j}, Z_{i,j})\}$, we propose Algorithm 1 for online change point detection, with the CUSUM statistic defined in Definition 1 and the nonparametric estimators involved defined in Definition 2.

Algorithm 1 Online change point detection via CUSUM statistics
INPUT: Privatised data $\{(W_{u,j}, Z_{u,j})\}_{u=1,2,\dots;\,j=1,\dots,N_h}$ and thresholds $\{b_{s,t},\, 1 \le s < t < \infty\} \subseteq \mathbb{R}$.
$t \leftarrow 1$; FLAG $\leftarrow 0$;
while FLAG $= 0$ do
    $t \leftarrow t + 1$;
    FLAG $\leftarrow 1 - \prod_{s=1}^{t-1} \mathbb{1}\big\{\widehat{D}_{s,t} \le b_{s,t}\big\}$;
end while
OUTPUT: $t$.
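To make the pipeline concrete, the following is a minimal Python sketch of the privacy mechanism above together with the detection loop of Algorithm 1, based on the CUSUM statistic and binned estimators formalised in Definitions 1 and 2 below. It specialises to $d = 1$ for readability; all function names (`privatise`, `regression_estimate`, `cusum_stat`, `detect`) are our own illustrative choices, the threshold function `b` is left abstract, and the quadratic per-step cost is deliberately unoptimised. This is a sketch under the stated assumptions, not the authors' released implementation.

```python
import numpy as np

def privatise(x, y, bins, h, M, alpha, rng):
    """alpha-LDP randomisation of one raw data point (x, y), with d = 1:
    noisy bin indicators W and noisy truncated responses Z (Section 2)."""
    indic = np.array([b <= x < b + h for b in bins], dtype=float)  # 1{x in A_{h,j}}
    y_trunc = np.clip(y, -M, M)                                    # [y]_{-M}^{M}
    W = indic + (4.0 / alpha) * rng.laplace(size=len(bins))
    Z = y_trunc * indic + (4.0 * M / alpha) * rng.laplace(size=len(bins))
    return W, Z

def regression_estimate(W, Z, lo, hi):
    """Binned estimator of Definition 2 from privatised rows lo..hi (inclusive,
    0-indexed): ratio of averaged Z to averaged W, zeroed on unstable bins."""
    n = hi - lo + 1
    mu = W[lo:hi + 1].mean(axis=0)
    nu = Z[lo:hi + 1].mean(axis=0)
    keep = mu >= np.log(n + 1) / n           # 1{mu >= log(t - s + 2)/(t - s + 1)}
    return np.where(keep, nu / np.where(keep, mu, 1.0), 0.0)

def cusum_stat(W, Z, s, t):
    """CUSUM statistic of Definition 1 at candidate split s, given rows 0..t-1."""
    m_left = regression_estimate(W, Z, 0, s - 1)    # uses observations 1..s
    m_right = regression_estimate(W, Z, s, t - 1)   # uses observations s+1..t
    return np.sqrt(s * (t - s) / t) * np.max(np.abs(m_left - m_right))

def detect(stream, bins, h, M, alpha, b, rng):
    """Algorithm 1: stop at the first t such that D_{s,t} > b(s, t) for some s."""
    rows_W, rows_Z = [], []
    for t, (x, y) in enumerate(stream, start=1):
        W_t, Z_t = privatise(x, y, bins, h, M, alpha, rng)
        rows_W.append(W_t); rows_Z.append(Z_t)
        W, Z = np.array(rows_W), np.array(rows_Z)
        if any(cusum_stat(W, Z, s, t) > b(s, t) for s in range(1, t)):
            return t                                # stopping time
    return None                                     # no change declared
```

For instance, with $\mathcal{X} = [0, 1]$ and bandwidth $h = 0.2$, one may take `bins = np.arange(0.0, 1.0, 0.2)`, so that $N_h = 5$.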
Definition 1. Given a sequence $\{(W_{t,j}, Z_{t,j})\}$ and a pair of integers $1 \le s < t$, we define the CUSUM statistic
$$\widehat{D}_{s,t} = \max_{j=1,\dots,N_h} \sqrt{\frac{s(t-s)}{t}} \left|\widehat{m}_{1:s}(x_{h,j}) - \widehat{m}_{(s+1):t}(x_{h,j})\right|,$$
where $\widehat{m}_{\cdot:\cdot}(\cdot)$ is defined in Definition 2.

Definition 2. Given a sequence $\{(W_{t,j}, Z_{t,j})\}$ and a pair of integers $1 \le s < t$, with $h > 0$ being our bandwidth parameter, we define the regression function estimator as
$$\widehat{m}_{s:t}(x) = \frac{\widehat{\nu}_{s:t}(A_{h,j})}{\widehat{\mu}_{s:t}(A_{h,j})}\,\mathbb{1}\left\{\widehat{\mu}_{s:t}(A_{h,j}) \ge \frac{\log(t-s+2)}{t-s+1}\right\}, \quad \text{if } x \in A_{h,j},$$
where
$$\widehat{\nu}_{s:t}(A_{h,j}) = (t-s+1)^{-1}\sum_{s \le i \le t} Z_{i,j} \quad \text{and} \quad \widehat{\mu}_{s:t}(A_{h,j}) = (t-s+1)^{-1}\sum_{s \le i \le t} W_{i,j}.$$

Algorithm 1 is a standard online change point detection procedure. Whenever a new data point stamped at $t$ is collected, one checks whether a change point has occurred by checking whether any of $(\widehat{D}_{s,t})_{s=1,\dots,t-1}$ has exceeded its pre-specified threshold. Admittedly, both the storage and computation costs of this procedure are high. As pointed out in [37], without knowing the pre- and/or post-change distributions, there is no existing algorithm achieving a constant computational cost $O(1)$ whenever a new data point is collected. Of course, one can also be smarter: instead of scanning through $s \in \{1, \dots, t-1\}$, one can scan through a dyadic grid only, as described in [38].

The CUSUM statistic $\widehat{D}_{s,t}$, defined in Definition 1, is a normalised difference between two estimated functions, before and after time point $s$, evaluated at each cell in the partition. It is used to examine, given all the data up to $t$, whether $s$ is a possible change point. It can be viewed as a sample version of the quantity
$$\max_{s=1}^{t-1} \sqrt{\frac{s(t-s)}{t}}\left\|s^{-1}\sum_{i=1}^{s} m_i - (t-s)^{-1}\sum_{i=s+1}^{t} m_i\right\|_\infty,$$
where $\|\cdot\|_\infty$ is the sup-norm of a function. In [25], a similar CUSUM statistic was proposed for the nonparametric density change point detection problem. In our setting, we are interested in change points in the regression functions. The construction of the estimators is detailed in Definition 2. This estimator has previously been studied in the one-sample estimation scenario in [4], where it was shown to be a universally strongly consistent estimator of the true regression function.

3 Theory

In this section, we present our core results. All the assumptions are collected in Section 3.1. The theoretical guarantees of Algorithm 1 are presented in Section 3.2, which also includes a minimax lower bound result showing that our proposed method is optimal, off by at most a logarithmic factor. To quantify the cost of maintaining $\alpha$-LDP, we also provide a benchmark result for the same data generating mechanism but without privacy in Section 3.3.

3.1 Assumptions

Assumption 1 (Setup). Assume that $\{(X_i, Y_i)\}_{i \in \mathbb{N}}$ is a sequence of independent random objects taking values in $\mathcal{X} \times \mathbb{R}$, such that $\{X_i\}_{i \in \mathbb{N}}$ are independent, each with distribution $\mu$ on $\mathcal{X}$. We assume that there exists an absolute constant $c_{\min} > 0$ such that $\mu(A_{h,j}) \ge c_{\min} h^d$ for all $A_{h,j}$ in the partition of $\mathcal{X}$. Assume that the regression functions $m_i(\cdot)$, given by (1), are well defined for $\mu$-almost all $x$ and $i \in \mathbb{N}$, and that there exists an absolute constant $C_{\mathrm{Lip}} > 0$ with
$$\sup_{i \in \mathbb{N}} |m_i(x_1) - m_i(x_2)| \le C_{\mathrm{Lip}}\|x_1 - x_2\|, \quad x_1, x_2 \in \mathcal{X}.$$
We also assume that there exists $\sigma > 0$ such that for all $\lambda \in \mathbb{R}$ we have
$$\sup_{i \in \mathbb{N}} \sup_{x \in \mathbb{R}^d} \mathbb{E}\left[e^{\lambda\{Y_i - m_i(x)\}} \,\middle|\, X_i = x\right] \le e^{\lambda^2\sigma^2/2},$$
and that $\sup_{i \in \mathbb{N}} \sup_{x \in \mathbb{R}^d} |m_i(x)| \le M_0$ for some $M_0 > 0$.

Assumption 1 is the main model assumption. It is a general feature of the local differential privacy constraint that it is not possible to work with unbounded parameter spaces [see e.g. 11, Appendix G].
Thus, we assume that the feature vectors are supported in a bounded set $\mathcal{X}$, and we moreover assume that the regression functions $m_i$ are bounded, though these bounds are allowed to vary with the pre-change sample size $\Delta$. In addition, we also assume that the regression functions are Lipschitz with a constant $C_{\mathrm{Lip}}$. The Lipschitz condition can easily be relaxed to other types of continuity condition, e.g. Hölder continuity. We assume that the additive noise is sub-Gaussian with parameter $\sigma$, which is allowed to vary with the pre-change sample size. Assumption 1 is required to ensure that our algorithm controls the false alarm probability correctly and to ensure its optimality.

Assumption 2 (No change point). Assume that $m_1 = m_2 = \cdots$.

Assumption 3 (One change point). Assume that there exists a positive integer $\Delta \ge 1$ such that $m_1 = \cdots = m_\Delta \neq m_{\Delta+1} = m_{\Delta+2} = \cdots$. In addition, let $\kappa = \|m_\Delta - m_{\Delta+1}\|_\infty$.

Assumption 4 (Signal-to-noise ratio). There exists a sufficiently large absolute constant $C_{\mathrm{SNR}} > 0$ such that
$$\frac{\kappa^2 h^{2d}\alpha^2 \Delta}{\max\{\sigma^2, M_0^2\}} \ge C_{\mathrm{SNR}} \log\left\{\frac{\Delta}{c_{\min} h^{2d} \gamma}\right\},$$
where $\gamma \in (0, 1)$ is the desired bound on the false alarm probability and $h$ is the bin-width used in constructing the estimators.

Assumptions 2, 3 and 4 describe different scenarios and aspects of the change point assumptions. Assumption 2 formally describes the case where there is no change point, and Assumption 3 depicts the scenario where there is one change point. Recall that in Definition 1 we proposed the CUSUM statistic, which is a sample version of a normalised sup-norm difference between two functions. This is to be consistent with $\kappa$, the characterisation of the jump, introduced in Assumption 3.

Assumption 4 is the signal-to-noise ratio condition, and is required when we study the optimality of our algorithm in detecting changes. Recall that $\kappa$ is the jump size, $\sigma$ is the fluctuation size, $M_0$ is the upper bound on the mean functions $m_i$, $\alpha$ is the privacy constraint and $\Delta$ is the size of the pre-change sample. Assumption 4 requires that $\kappa^2 h^{2d}\alpha^2\Delta\min\{M_0^{-2}, \sigma^{-2}\}$ is larger than a logarithmic factor. This in fact allows $\kappa$, $\sigma$, $M_0$ and $\alpha$ to vary as $\Delta$ diverges. It may appear unnatural to involve tuning parameters in the signal-to-noise ratio condition. We remark that the involvement of $h$ in Assumption 4 is to provide more flexibility in the tuning parameter selection. We will elaborate on this point in Section 3.2.

3.2 Optimal online change point detection with privatised data

Theorem 1 below is the main result, which shows that for any $\gamma \in (0, 1)$, with properly chosen tuning parameters, with probability at least $1 - \gamma$, Algorithm 1 raises no false alarm and has a detection delay upper bounded by $\epsilon$ in (7).

Theorem 1. Consider the setting described in Assumption 1. Let $\gamma \in (0, 1)$ and $\hat{t}$ be the stopping time returned by Algorithm 1 with inputs $\{(W_{t,j}, Z_{t,j})\}_{t,j=1,2,\dots}$ and $\{b_{s,t}\}_{t=2,3,\dots;\,s=1,\dots,t-1}$, where
$$b_{s,t} = \begin{cases} 2(M - M_0)e^{-(M-M_0)^2/(2\sigma^2)} + \sqrt{d}\,h + \dfrac{M}{c_{\min} h^d \alpha}\sqrt{\dfrac{t}{s(t-s)}\log\left(\dfrac{72t^3}{\gamma}\right)}, & \text{if } \dfrac{s(t-s)}{t} \ge \dfrac{64}{c_{\min}^2 h^{2d}\alpha^2}\log\left(\dfrac{72t^3}{\gamma}\right), \\ \infty, & \text{otherwise.} \end{cases} \quad (3)$$
Assume that the truncation parameter satisfies
$$M \ge M_1 = M_0 + \sigma\sqrt{2\log(2 + \sigma/h) + \log\log(2 + \sigma/h)} \quad (4)$$
and that the bandwidth satisfies $h \le C\kappa$, where $C > 0$ is an absolute constant. If Assumption 2 holds, then
$$\mathbb{P}(\hat{t} < \infty) < \gamma. \quad (5)$$
Under Assumption 3, we have
$$\mathbb{P}(\hat{t} \le \Delta) \le \gamma \quad (6)$$
for any $\Delta \ge 1$. If Assumptions 3 and 4 both hold, then $\mathbb{P}(\Delta < \hat{t} \le \Delta + \epsilon) \ge 1 - \gamma$, where
$$\epsilon = C_\epsilon \frac{M^2}{\kappa^2 h^{2d}\alpha^2}\log\left(\frac{\Delta}{h^{2d} c_{\min}\gamma}\right) \quad (7)$$
and $C_\epsilon > 0$ depends only on $d$.
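As an illustration of how the two-case threshold rule might be implemented, the following sketch encodes (3) as we have reconstructed it above: a finite threshold, summing a truncation-bias term, a smoothing-bias term and a stochastic term, whenever the effective sample size $s(t-s)/t$ of the split is large enough, and an infinite threshold otherwise. The constants are indicative only; in practice, thresholds of this shape are calibrated by permutation, as in Section 4.

```python
import numpy as np

def threshold(s, t, d, h, M, M0, sigma, alpha, c_min, gamma):
    """Sketch of b_{s,t} in (3). Returns np.inf in case (ii), where the split
    (s, t) does not carry enough effective sample size to estimate both
    segments, so that no change is ever declared at such a pair."""
    log_term = np.log(72.0 * t**3 / gamma)
    eff = s * (t - s) / t                    # effective sample size of the split
    if eff < 64.0 * log_term / (c_min**2 * h**(2 * d) * alpha**2):
        return np.inf                        # case (ii): threshold set to infinity
    bias_trunc = 2.0 * (M - M0) * np.exp(-(M - M0)**2 / (2.0 * sigma**2))
    bias_smooth = np.sqrt(d) * h             # Lipschitz smoothing bias over a bin
    stochastic = M / (c_min * h**d * alpha) * np.sqrt(log_term / eff)
    return bias_trunc + bias_smooth + stochastic
```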
To better understand Theorem 1, we first inspect the sources of error in the procedure. The estimators of the regression functions $m_i$ are defined in Definition 2: a binned estimator averaging over cubes of volume $h^d$. This is a typical nonparametric estimator, which brings in both bias and variance. On top of this, due to the privacy constraints, we truncate the responses at level $M$ in the privacy channel. The truncation level $M$ should be an upper bound on $M_1$, a high-probability upper bound on the response consisting of the upper bound on the regression function plus that on the additive noise, so that the truncation bias is no larger than the bias due to smoothing. On the other hand, larger values of $M$ result in larger variance, due to the need to add more noise in the privacy mechanism. If $M < M_1$ then the change point may be undetectable, as the bias could be larger than the signal. The same phenomenon occurs in nonparametric testing and change point detection problems, where the smoothing parameter should not be so large as to mask the signal.

Algorithm 1 declares the existence of a change point at time $t$ if there exists an integer pair $(s, t)$ such that $\widehat{D}_{s,t} > b_{s,t}$. The threshold sequence is detailed in (3). It is separated into two cases: (i) when $s(t-s)/t$ is large enough for both $\widehat{m}_{1:s}$ and $\widehat{m}_{(s+1):t}$ to be estimated accurately, and (ii) otherwise. In view of the sources of errors, we can see that in case (i), with probability at least $1 - \gamma$, the threshold $b_{s,t}$ is set to be the sum of upper bounds on all sources of errors. In case (ii), the threshold is set to be infinity, so that we never declare a change.

When there is no change point, or when there is a change point but $t \le \Delta$, the thresholds are, with probability at least $1 - \gamma$, upper bounds on the statistics. When there is a change point and $t > \Delta$, due to Assumption 4, one can always let $s = \Delta$, so that there are enough samples to provide a good estimator of $m_\Delta$, the pre-change regression function. If $t$ is not large enough for case (i) to hold, then we cannot provide a good estimator of $m_{\Delta+1}$. Once enough data are collected after the change point, Algorithm 1 is able to tell the difference and declare a change point, with delay upper bounded by $\epsilon$ in (7). Note that the conditions of case (i) will be satisfied before time $\Delta + \epsilon$.

We require the bandwidth $h \le C\kappa$ to ensure that the binning does not smooth out the jump, and this condition is necessary. In practice, the bin-width can be chosen in a data-driven way [e.g. 38]. In view of Assumption 4, the smaller $h$ is, the larger $\kappa^2\alpha^2\Delta/\max\{\sigma^2, M_0^2\}$ needs to be. In the best case where $h \asymp \kappa$, Assumption 4 and $\epsilon$ in (7) read as
$$\frac{\kappa^{2+2d}\alpha^2\Delta}{\max\{\sigma^2, M_0^2\}} \gtrsim \log\left(\frac{\Delta}{c_{\min}\kappa^{2d}\gamma}\right) \quad \text{and} \quad \epsilon = C_\epsilon \frac{M^2}{\kappa^{2+2d}\alpha^2}\log\left(\frac{\Delta}{\kappa^{2d} c_{\min}\gamma}\right). \quad (8)$$

We remark that the detection delay in (7) is of order
$$\frac{M^2}{\kappa^2 h^{2d}\alpha^2}\log\left(\frac{\Delta}{h^{2d}\gamma}\right) \asymp \frac{(M - M_0)^2 + M_0^2 + \sigma^2}{\kappa^2 h^{2d}\alpha^2}\log\left(\frac{\Delta}{h^{2d}\gamma}\right),$$
which again reflects all three sources of errors. Now the question is whether this rate can be improved. To answer this question, we consider a simplified scenario, in which we assume that $M_0 \asymp \sigma$.

Theorem 2 (Lower bound). Denote by $\mathcal{P}_{\kappa,\sigma,\Delta}$ the class of distributions satisfying Assumptions 1 and 3, and assume $M_0 \asymp \sigma$. Given $\alpha > 0$, let $\mathcal{Q}_\alpha$ be the collection of all sequentially interactive $\alpha$-locally differentially private mechanisms. Given $\gamma > 0$, consider the class of change point estimators
$$\mathcal{D}(\gamma) = \left\{T : \, T \text{ is a stopping time with respect to the natural filtration and satisfies } \mathbb{P}(T \le \Delta) \le \gamma\right\}.$$
Then, for sufficiently small $\gamma$, it holds that
$$\inf_{Q \in \mathcal{Q}_\alpha}\, \inf_{\hat{t} \in \mathcal{D}(\gamma)}\, \sup_{P \in \mathcal{P}_{\kappa,\sigma,\Delta}} \kappa^{2+2d}\sigma^{-2}\alpha^2\,\mathbb{E}_P(\hat{t} - \Delta)_+ \gtrsim \log(1/\gamma).$$
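To place the two bounds side by side, one may take $h \asymp \kappa$ and $M \asymp M_0 \asymp \sigma$, which is permitted by (4) up to logarithmic factors; the display below is our own unpacking of the comparison, rather than part of either theorem.

```latex
% Upper bound (Theorem 1 with h ~ kappa, M ~ sigma) versus lower bound
% (Theorem 2): the two rates agree up to the logarithmic factor.
\[
  \underbrace{\epsilon \;\asymp\; \frac{\sigma^2}{\kappa^{2+2d}\alpha^2}
    \log\!\Big(\frac{\Delta}{\kappa^{2d}c_{\min}\gamma}\Big)}_{\text{delay of Algorithm 1, see (8)}}
  \qquad\text{versus}\qquad
  \underbrace{\mathbb{E}_P(\hat{t}-\Delta)_+ \;\gtrsim\; \frac{\sigma^2}{\kappa^{2+2d}\alpha^2}
    \log\!\Big(\frac{1}{\gamma}\Big)}_{\text{Theorem 2}}.
\]
```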
Theorem 2 studies the private minimax rate of the detection delay in the framework proposed in [12]. Compared with standard minimax theory, new tools are required in order to deal with an arbitrary privacy mechanism in $\mathcal{Q}_\alpha$. We use Lemma 1 in [12, Supplementary material], which provides a uniform bound on the log-likelihood ratio of two distributions seen through a private channel. To the best of our knowledge, this is the first time that these tools have been applied to change point problems. Although, in our upper bounds, we only had to consider non-interactive mechanisms, this lower bound also applies to general sequentially interactive mechanisms. In particular, this shows that the use of interactive mechanisms is unnecessary in our problem.

Since in the minimax sense the lower bound is taken to be the infimum over all possible estimators, to make the results in Assumption 4 and Theorem 1 comparable, we let $h \asymp \kappa$ and compare Theorem 2 with (8). Theorem 2 shows that the lower bound on the detection delay is of order $\sigma^2\kappa^{-(2+2d)}\alpha^{-2}\log(1/\gamma)$, which compared to (8) is off by a logarithmic factor, and therefore shows that Algorithm 1 is nearly minimax rate optimal.

We now provide a sketch of the proof of Theorem 2, focusing on the non-interactive case for simplicity. The proof is conducted in an online change point detection framework used in [22] and [37], based on a change of measure argument. The LDP ingredient used in this proof is devised by [12]. To be specific, we first construct a pair of distributions based on Lipschitz regression functions $f_1$ and $f_2$ on some ball $B(0, r_{\mathcal{X}})$, where $r_{\mathcal{X}} = O(1)$, $f_1 \equiv 0$ and $f_2(x) = (\kappa - \|x\|)\mathbb{1}\{x \in B(0, \kappa)\}$. Note that, when $\kappa$ is small, the regression functions only differ in a small region, and thus this is a difficult change to detect. Given $f_1$ and $f_2$, the two distributions of $(X, Y)$ are denoted by $P_1$ and $P_2$, where $X$ has the same marginal distribution under both, while $Y \mid X = x \sim \mathrm{Unif}[f_k(x) - \sigma, f_k(x) + \sigma]$ under $P_k$, for $k = 1, 2$.

Letting $Q$ be any $\alpha$-LDP non-interactive privacy mechanism, write $P^n_{\kappa,\sigma,\nu}$ for the joint distribution of the first $n$ privatised data points when $(X_i, Y_i) \sim P_1$ for $i \le \nu$ and $(X_i, Y_i) \sim P_2$ for $i > \nu$. By controlling the size of the log-likelihood ratio $Z_{\nu,n} = \log(\mathrm{d}P^n_{\kappa,\sigma,\nu}/\mathrm{d}P^n_{\kappa,\sigma,\infty})$, we show that any algorithm whose probability of false alarm is bounded by $\gamma$ will require at least of order $\sigma^2\log(1/\gamma)/(\kappa^{2+2d}\alpha^2)$ observations after the change point to reliably detect the change from $P_1$ to $P_2$ at time $\nu$. As each observation is independent, we may decompose $Z_{\nu,n} = \sum_{i=\nu+1}^{n} Z_i$. A straightforward calculation shows that $d_{\mathrm{TV}}(P_1, P_2) \asymp \kappa^{d+1}/\sigma$: the conditional densities of $Y$ given $X = x$ differ only when $x \in B(0, \kappa)$, so that $d_{\mathrm{TV}}(P_1, P_2) \asymp \sigma^{-1}\int_{B(0,\kappa)}(\kappa - \|x\|)\,\mathrm{d}x \asymp \kappa^{d+1}/\sigma$. Ideas from [12] are then applied to see that $|Z_i| \lesssim \alpha\kappa^{d+1}/\sigma$ and $\mathbb{E}Z_i \lesssim \alpha^2\kappa^{2d+2}/\sigma^2$ for any $\alpha$-LDP privacy mechanism. We use the Azuma-Hoeffding inequality to extend this and show that, conditionally on $W_1, \dots, W_\nu$, with high probability,
$$\max_{1 \le t \le \sigma^2\log(1/\gamma)/(\kappa^{2+2d}\alpha^2)} Z_{\nu,\nu+t} \le \frac{3}{4}\log(1/\gamma).$$
Using the change of measure argument [e.g. 22, 37], the above bound allows us to complete the proof.

3.3 Optimal online change point detection with non-private data

As a benchmark, we provide the non-private counterparts of Theorems 1 and 2 in this subsection. For completeness, we also detail the counterparts of Definitions 1 and 2 and Algorithm 1 in Definitions 3 and 4 and Algorithm 2, respectively. We conclude this subsection with comparisons of the results in the private and non-private cases, quantifying the cost of maintaining privacy.

Definition 3. Given a sequence $\{(X_t, Y_t)\}_{t \in \mathbb{N}}$ and a pair of integers $1 \le s < t$, we define the CUSUM statistic
$$\widetilde{D}_{s,t} = \max_{i=1,\dots,t} \sqrt{\frac{s(t-s)}{t}}\left|\widetilde{m}_{1:s}(X_i) - \widetilde{m}_{(s+1):t}(X_i)\right|,$$
where $\widetilde{m}_{\cdot:\cdot}(\cdot)$ is defined in Definition 4.
Definition 4. Given a sequence $\{(X_t, Y_t)\}_{t \in \mathbb{N}}$, a pair of integers $1 \le s < t$ and a tuning parameter $h > 0$, we define the regression function estimator as
$$\widetilde{m}_{s:t}(x) = \frac{\widetilde{\nu}_{s:t}(A_{h,j})}{\widetilde{\mu}_{s:t}(A_{h,j})}, \quad \text{if } x \in A_{h,j},$$
where
$$\widetilde{\nu}_{s:t}(A_{h,j}) = (t-s+1)^{-1}\sum_{s \le i \le t} Y_i\mathbb{1}\{X_i \in A_{h,j}\} \quad \text{and} \quad \widetilde{\mu}_{s:t}(A_{h,j}) = (t-s+1)^{-1}\sum_{s \le i \le t} \mathbb{1}\{X_i \in A_{h,j}\}.$$

Algorithm 2 Online change point detection via CUSUM statistics
INPUT: Data $\{(X_u, Y_u)\}_{u=1,2,\dots} \subseteq \mathbb{R}^d \times \mathbb{R}$ and thresholds $\{\widetilde{b}_{s,t},\, t = 2, 3, \dots;\, s = 1, \dots, t-1\} \subseteq \mathbb{R}$.
$t \leftarrow 1$; FLAG $\leftarrow 0$;
while FLAG $= 0$ do
    $t \leftarrow t + 1$;
    FLAG $\leftarrow 1 - \prod_{s=1}^{t-1}\mathbb{1}\big\{\widetilde{D}_{s,t} \le \widetilde{b}_{s,t}\big\}$;
end while
OUTPUT: $t$.

Assumption 5 (Non-private signal-to-noise ratio). There exists a sufficiently large absolute constant $C_{\mathrm{SNR}} > 0$ such that $\kappa^2 h^d \sigma^{-2}\Delta \ge C_{\mathrm{SNR}}\log\{\Delta/(\gamma h^d)\}$.

The following two theorems are the non-private counterparts of Theorems 1 and 2. Theorem 4 shows that the detection delay rate we obtain in Theorem 3 is optimal, off by at most a logarithmic factor.

Theorem 3. Consider the setting described in Assumption 1. Let $\gamma \in (0, 1)$ and $\widetilde{t}$ be the stopping time returned by Algorithm 2 with inputs $\{(X_t, Y_t)\}_{t=1,2,\dots}$ and $\{\widetilde{b}_{s,t}\}_{t=2,3,\dots;\,s=1,\dots,t-1}$, where
$$\widetilde{b}_{s,t} = C_b\left\{5\log(t) + \log(32/\gamma)\right\}^{1/2} \quad (9)$$
for an appropriate constant $C_b > 0$. Assume that the bandwidth satisfies $h \le C\kappa$. If Assumption 2 holds, then
$$\mathbb{P}(\widetilde{t} < \infty) < \gamma. \quad (10)$$
Under Assumption 3, we have
$$\mathbb{P}(\widetilde{t} \le \Delta) \le \gamma \quad (11)$$
for any $\Delta \ge 1$. If Assumptions 3 and 5 both hold, then
$$\mathbb{P}(\Delta < \widetilde{t} \le \Delta + \epsilon) \ge 1 - \gamma - (c_{\min}h^d\Delta)^{-1}\exp(-c_{\min}h^d\Delta), \quad (12)$$
where $\epsilon = C_\epsilon \kappa^{-2}h^{-d}\sigma^2\log(\Delta/\gamma)$, with $C_\epsilon > 0$ depending only on $d$.

Theorem 4. Denote by $\mathcal{P}_{\kappa,\sigma,\Delta}$ the class of distributions satisfying Assumptions 1 and 3. For any $\gamma \in (0, 1)$, consider the class of change point estimators
$$\mathcal{D}(\gamma) = \left\{T : \, T \text{ is a stopping time with respect to the natural filtration and satisfies } \mathbb{P}(T \le \Delta) \le \gamma\right\}.$$
Then, for any sufficiently small $\gamma$, it holds that
$$\inf_{\hat{t} \in \mathcal{D}(\gamma)}\, \sup_{P \in \mathcal{P}_{\kappa,\sigma,\Delta}} \kappa^{2+d}\sigma^{-2}\,\mathbb{E}_P(\hat{t} - \Delta)_+ \gtrsim \log(1/\gamma).$$

We are now ready to quantify the cost of maintaining privacy in the multivariate nonparametric regression online change point detection scenario. As we have pointed out, in order to maintain privacy, we require $\mathcal{X}$ to be a bounded set and the regression functions to be upper bounded by $M_0$, and an extra tuning parameter $M$ is introduced in the truncation. None of these is needed in the non-private case. A more prominent difference lies in the detection delay rate. In the non-private case, the denominator is $\kappa^{2+d}$, which stems from optimal nonparametric estimation [e.g. 32]. In the private case, the corresponding rate is $\kappa^{2+2d}$. This is in line with the literature on local differential privacy, where the curse of dimensionality is typically worse in nonparametric and high-dimensional problems than in their non-private counterparts [12, 2, 5, 3]. Another difference is in terms of the privacy parameter $\alpha$, which only shows up in the private case. Recalling that we restrict ourselves to the most interesting regime $\alpha \in (0, 1]$, the effect of $\alpha$ is twofold: (1) comparing Assumptions 4 and 5, we see that the private case requires a larger signal by a factor of $\alpha^{-2}$; and (2) comparing the detection delay rates in Theorems 1 and 3, we see that the rate in the private case is also inflated by a factor of $\alpha^{-2}$.

4 Numerical study

In this section we present the results of a numerical study of our locally private method's performance. We consider raw data $(X_1, Y_1), \dots, (X_n, Y_n)$ with $n = 10000$, $\Delta = 5000$, $X_i \sim \mathrm{Unif}[0, 1]$ and $Y_i \mid X_i = x \sim \mathrm{Unif}[m_i(x) - 1/2, m_i(x) + 1/2]$, where $m_\Delta \equiv 0$ and $m_{\Delta+1}(x) = (1/2)\min(1, \max(5 - 10x, -1))$. To speed up computation, we modify Algorithm 1 so that $\widehat{D}_{s,t}$ is only calculated and compared to its threshold when $t \in \{n/100, 2n/100, \dots, n\}$.
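The following sketch reproduces this data-generating mechanism and the coarse evaluation grid; the function names are ours, and the clipping constant $-1$ in `m_post` follows our reading of the displayed formula. The permutation-based threshold calibration described next is sketched after it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, change = 10_000, 5_000                 # sample size and true change point

def m_post(x):
    """Post-change regression function m_{Delta+1}(x) = 0.5*min(1, max(5-10x, -1))."""
    return 0.5 * np.minimum(1.0, np.maximum(5.0 - 10.0 * x, -1.0))

def generate_stream(n, change, rng):
    """Raw data: X ~ Unif[0,1]; Y | X = x ~ Unif[m(x)-1/2, m(x)+1/2],
    with m = 0 before the change and m = m_post afterwards."""
    X = rng.uniform(0.0, 1.0, size=n)
    m = np.where(np.arange(1, n + 1) <= change, 0.0, m_post(X))
    Y = m + rng.uniform(-0.5, 0.5, size=n)
    return X, Y

X, Y = generate_stream(n, change, rng)
eval_grid = np.arange(n // 100, n + 1, n // 100)  # t in {n/100, 2n/100, ..., n}
```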
We privatise the data using $\alpha$ in the range $\{1, 1.5, 2, \dots, 6\}$. For each value of $\alpha$, we assume that we have a privatised sample of size $n$ from the pre-change distribution, here $\mathrm{Unif}[0,1] \times \mathrm{Unif}[-1/2, 1/2]$. The choices $M = 1$ and $h = 0.2$ are used to privatise the data and calculate the test statistics. We permute this sample $B = 1000$ times to choose our thresholds, as follows: for a range of values of $C$, we run the modified Algorithm 1 on each permutation of the privatised data with the choice
$$b_{s,t} = \begin{cases} C\sqrt{\dfrac{t}{s(t-s)h^2\alpha^2}\log\left(\dfrac{t}{\gamma h}\right)}, & \text{if } \dfrac{s(t-s)}{t} \ge \dfrac{C^2}{h^2\alpha^2}\log\left(\dfrac{t}{\gamma h}\right), \\ \infty, & \text{otherwise,} \end{cases}$$
and we choose the minimal value of $C$ for which the overall false alarm probability is bounded above by $\gamma = 0.1$. With the thresholds chosen, we ran the experiment over 1000 repetitions; the results are presented in Figures 1 and 2. The false detection probability was bounded by $\gamma$ for all values of $\alpha$.

[Figure 1: Proportion of trials with a change flagged, as a function of $\alpha \in \{1, \dots, 6\}$.]

[Figure 2: Average detection delay given a flag, as a function of $\alpha \in \{1, \dots, 6\}$.]

Full details of the implementation and simulation study can be found in the code available online.
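A sketch of this permutation-based calibration is given below, assuming the `cusum_stat` helper from the sketch in Section 2 and a `make_b(C)` factory producing the threshold displayed above; both names, and the exhaustive scan over permutations, are our own illustrative choices rather than the released code.

```python
import numpy as np

def false_alarm_rate(W, Z, eval_grid, b, B, rng):
    """Estimate the false alarm probability of the modified Algorithm 1 by
    running it on B permutations of a privatised pre-change sample (W, Z)."""
    n = W.shape[0]
    alarms = 0
    for _ in range(B):
        perm = rng.permutation(n)
        Wp, Zp = W[perm], Z[perm]
        # flag if any (s, t) on the coarse evaluation grid exceeds its threshold
        if any(cusum_stat(Wp[:t], Zp[:t], s, t) > b(s, t)
               for t in eval_grid for s in range(1, t)):
            alarms += 1
    return alarms / B

def calibrate_C(W, Z, eval_grid, make_b, C_grid, gamma, B, rng):
    """Choose the minimal constant C whose threshold keeps the permutation
    false alarm rate below gamma (gamma = 0.1 and B = 1000 in the paper)."""
    for C in sorted(C_grid):
        if false_alarm_rate(W, Z, eval_grid, make_b(C), B, rng) <= gamma:
            return C
    raise ValueError("no C in the grid controls the false alarm rate")
```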
5 Conclusions

We studied a multivariate, nonparametric regression, online, privatised change point detection problem. The method we propose is shown to be minimax optimal in terms of its detection delay, with a theory-guided tuning parameter. As a benchmark, we have also provided its counterpart for non-private data. The comparisons enable us to understand the cost of maintaining privacy.

In addition to the main results in the paper, in Appendix A we investigate an online univariate mean change point detection problem with privatised data. It includes a minimax lower bound on the detection delay and a polynomial-time algorithm which provides a matching upper bound, save for a logarithmic factor. The framework we set up in Appendix A serves as a blueprint for studying private online change detection with more complex data types. Comparing the results in Appendix A to those in the non-private setting [e.g. 37], and comparing these differences with those we examined at the end of Section 3.3, one can see that we pay different costs for privacy for different data types. This leads to our future work: studying private change point detection in high-dimensional, functional or other nonparametric data, and understanding the tradeoff between accuracy and privacy in these more challenging situations.

Regarding the setup we have in this paper, it can easily be adjusted to allow for multiple change points. This is because, with large probability, our algorithm will not declare false alarms and will correctly detect a change point with a delay $\epsilon$ that is a small fraction of $\Delta$. Therefore, provided that any two consecutive change points are sufficiently far apart, refreshing the algorithm whenever a change point is declared enables us to detect multiple change points accurately. Another interesting but challenging future direction is to allow temporal dependence, say weakly dependent time series. It is not even clear how to define valid privacy mechanisms in this setting: if the time points were strongly dependent, then our mechanisms would give more information as time progressed. In fact, this is an open problem even without the presence of change points. We would also need to adjust the concentration inequalities we currently use to ones suitable for weakly dependent data.

Supplementary material

The supplementary material contains all the technical details and code of this paper.

Acknowledgments and Disclosure of Funding

Funding in direct support of this work: DMS-EPSRC EP/V013432/1.

A A privatised univariate mean online change point detection

This is a self-contained section, and the notation used here is independent of that in the rest of the paper. We start by considering a simple Laplace privacy mechanism and CUSUM statistics.

Assumption 6. Assume that the privatised data are $\{Z_1, Z_2, \dots\}$ satisfying $Z_i = X_i + \epsilon_i/\alpha$, where (i) $\{X_1, X_2, \dots\}$ is the original data, a sequence of independent random variables with unknown means $\mathbb{E}(X_i) = f_i$, $i = 1, 2, \dots$, such that $\sup_{i=1,2,\dots}\|X_i\|_{\psi_2} \le \sigma$; (ii) $\{\epsilon_1, \epsilon_2, \dots\}$ is a sequence of independent and identically distributed standard Laplace random variables, i.e. each $\epsilon_t$ has density $z \mapsto \exp(-|z|)/2$ on $\mathbb{R}$; and (iii) $\alpha > 0$ is a pre-specified value.

Note that, when $X_i$ takes values in an interval $[a, a+1]$ of length one, the privacy mechanism given by $Z_i$ is $\alpha$-LDP. This can be generalised to intervals of any fixed length by increasing the scale of the Laplace noise appropriately. To extend to unbounded variables we may truncate, as in our main methodology in Section 2.

Assumption 7. Assume that there exists a positive integer $\Delta \ge 1$ such that $f_1 = \cdots = f_\Delta \neq f_{\Delta+1} = f_{\Delta+2} = \cdots$. Let $\kappa = |f_\Delta - f_{\Delta+1}|$.

Assumption 8. There exists a sufficiently large absolute constant $C_{\mathrm{SNR}} > 0$ such that $\kappa^2(\sigma^2 + 4\alpha^{-2})^{-1}\Delta \ge C_{\mathrm{SNR}}\log(\Delta/\gamma)$.

We consider the CUSUM statistics
$$\widehat{D}_{s,t} = \left|\left\{\frac{t-s}{ts}\right\}^{1/2}\sum_{l=1}^{s} Z_l - \left[\frac{s}{t(t-s)}\right]^{1/2}\sum_{l=s+1}^{t} Z_l\right|$$
and run Algorithm 1.

Theorem 5. Consider the setting described in Assumption 6. Let $\gamma \in (0, 1)$ and $\hat{t}$ be the stopping time returned by Algorithm 1 with inputs $\{Z_t\}_{t=1,2,\dots}$ and $\{b_t\}_{t=2,3,\dots}$, where
$$b_t = C_b\left(\sigma^2 + 4\alpha^{-2}\right)^{1/2}\log^{1/2}(t/\gamma) \quad (13)$$
for an appropriate constant $C_b > 0$. If $\Delta = \infty$, then
$$\mathbb{P}(\hat{t} < \infty) < \gamma. \quad (14)$$
Under Assumption 7, we have
$$\mathbb{P}(\hat{t} \le \Delta) \le \gamma \quad (15)$$
for any $\Delta \ge 1$. If Assumptions 7 and 8 both hold, then
$$\mathbb{P}\left\{\Delta < \hat{t} \le \Delta + C_d\,\kappa^{-2}(\sigma^2 + 4\alpha^{-2})\log(\Delta/\gamma)\right\} \ge 1 - \gamma,$$
where $C_d > 0$ depends only on $d$ and satisfies $C_d < C_{\mathrm{SNR}}$.

Theorem 6. Denote by $\mathcal{P}_{\kappa,\sigma,\Delta}$ the class of distributions satisfying Assumptions 6 and 7, supported on an interval of fixed finite length. Denote by $\mathcal{Q}_\alpha$ the collection of all sequentially interactive $\alpha$-locally differentially private mechanisms, $\alpha > 0$. Consider the class of change point estimators
$$\mathcal{D}(\gamma) = \left\{T : \, T \text{ is a stopping time with respect to the natural filtration and satisfies } \mathbb{P}(T \le \Delta) \le \gamma\right\}.$$
Then, for small enough $\gamma$, and for $\kappa, \sigma > 0$ with $\kappa < 2\sigma$ and $\kappa + 2\sigma < 1$, it holds that
$$\inf_{Q \in \mathcal{Q}_\alpha}\, \inf_{\hat{t} \in \mathcal{D}(\gamma)}\, \sup_{P \in \mathcal{P}_{\kappa,\sigma,\Delta}} \mathbb{E}_P(\hat{t} - \Delta)_+ \ge c\,\kappa^{-2}\sigma^2\alpha^{-2}\log(1/\gamma),$$
where $c > 0$ is an absolute constant.
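The mechanism and statistic of this appendix are simple enough to sketch directly. The following Python fragment implements the Laplace mechanism of Assumption 6, the CUSUM statistic above and the Algorithm 1 loop; the function names are ours, and the threshold function `b` is left to the user, e.g. a rule of the shape (13) or a permutation-calibrated one as in Section 4.

```python
import numpy as np

def privatise_univariate(x, alpha, rng):
    """Laplace mechanism of Assumption 6: Z = X + eps/alpha, with eps a
    standard Laplace variable; alpha-LDP when X lies in an interval of length 1."""
    return x + rng.laplace() / alpha

def cusum_univariate(Z, s, t):
    """CUSUM statistic |{(t-s)/(ts)}^{1/2} sum_{l<=s} Z_l
    - {s/(t(t-s))}^{1/2} sum_{l>s} Z_l| from the first t privatised points."""
    Z = np.asarray(Z[:t], dtype=float)
    left = np.sqrt((t - s) / (t * s)) * Z[:s].sum()
    right = np.sqrt(s / (t * (t - s))) * Z[s:t].sum()
    return abs(left - right)

def detect_mean_change(stream, alpha, b, rng):
    """Algorithm 1 on the privatised univariate stream: stop at the first t
    with max_{1 <= s < t} D_{s,t} > b(t)."""
    Z = []
    for t, x in enumerate(stream, start=1):
        Z.append(privatise_univariate(x, alpha, rng))
        if t >= 2 and max(cusum_univariate(Z, s, t) for s in range(1, t)) > b(t):
            return t
    return None
```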
References

[1] Ibrahim Ethem Bagci, Utz Roedig, Ivan Martinovic, Matthias Schulz, and Matthias Hollick. Using channel state information for tamper detection in the internet of things. In Proceedings of the 31st Annual Computer Security Applications Conference, pages 131-140, 2015.
[2] Thomas Berrett and Cristina Butucea. Classification under local differential privacy. Annales de l'ISUP, 63 (80 ans de Denis Bosq), 2019.
[3] Thomas B. Berrett and Cristina Butucea. Locally private non-asymptotic testing of discrete distributions is faster using interactive mechanisms. Advances in Neural Information Processing Systems 34, 2020.
[4] Thomas B. Berrett, László Györfi, and Harro Walk. Strongly universally consistent nonparametric regression and classification with privatised data. Electronic Journal of Statistics, 15(1):2430-2453, 2021.
[5] Cristina Butucea, Amandine Dubois, Martin Kroll, and Adrien Saumard. Local differential privacy: elbow effect in optimal density estimation and adaptation over Besov ellipsoids. Bernoulli, 26(3):1727-1764, 2020.
[6] Clément L. Canonne, Gautam Kamath, Audra McMillan, Adam Smith, and Jonathan Ullman. The structure of optimal private tests for simple hypotheses. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 310-321, 2019.
[7] Aloni Cohen and Kobbi Nissim. Towards formalizing the GDPR's notion of singling out. Proceedings of the National Academy of Sciences, 117(15):8344-8352, 2020.
[8] Rachel Cummings, Sara Krehbiel, Yuliia Lut, and Wanrong Zhang. Privately detecting changes in unknown distributions. In Proceedings of the 37th International Conference on Machine Learning, volume 119, pages 2227-2237, 2020.
[9] Rachel Cummings, Sara Krehbiel, Yajun Mei, Rui Tuo, and Wanrong Zhang. Differentially private change-point detection. In Advances in Neural Information Processing Systems, volume 31, 2018.
[10] Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. Collecting telemetry data privately. In Advances in Neural Information Processing Systems, pages 3571-3580, 2017.
[11] John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. Local privacy, data processing inequalities, and minimax rates. arXiv preprint arXiv:1302.3203, 2013.
[12] John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. Minimax optimal procedures for locally private estimation. Journal of the American Statistical Association, 113(521):182-201, 2018.
[13] John C. Duchi and Feng Ruan. The right complexity measure in locally private estimation: It is not the Fisher information. arXiv preprint arXiv:1806.05756, 2018.
[14] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pages 265-284. Springer, 2006.
[15] Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. Amplification by shuffling: from local to central differential privacy via anonymity. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2468-2479. SIAM, 2019.
[16] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 1054-1067, 2014.
[17] Dario Gregori, Danila Azzolina, Corrado Lanera, Ilaria Prosepe, Nicolas Destro, Giulia Lorenzoni, and Paola Berchialla. A first estimation of the impact of public health actions against COVID-19 in Veneto (Italy). Journal of Epidemiology & Community Health, 74(10):858-860, 2020.
[18] David J. Hand and Gordon Blunt. Prospecting for gems in credit card data. IMA Journal of Management Mathematics, 12(2):173-200, 2001.
[19] Matthew Joseph, Aaron Roth, Jonathan Ullman, and Bo Waggoner. Local differential privacy for evolving data. arXiv preprint arXiv:1802.07128, 2018.
[20] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. Extremal mechanisms for local differential privacy. In Advances in Neural Information Processing Systems, volume 27, pages 2879-2887, 2014.
[21] Taha A. Kass-Hout, Zhiheng Xu, Paul McMurray, Soyoun Park, David L. Buckeridge, John S. Brownstein, Lyn Finelli, and Samuel L. Groseclose. Application of change point analysis to daily influenza-like illness emergency department visits. Journal of the American Medical Informatics Association, 19(6):1075-1081, 2012.
[22] Tze Leung Lai. Information bounds and quick detection of parameter changes in stochastic systems. IEEE Transactions on Information Theory, 44(7):2917-2929, 1998.
[23] T. S. Lau and W. Peng Tay. Privacy-aware quickest change detection. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5999-6003, 2020.
[24] Céline Lévy-Leduc and François Roueff. Detection and localization of change-points in high-dimensional network traffic data. Annals of Applied Statistics, 3(2):637-662, 2009.
[25] Oscar Hernan Madrid Padilla, Yi Yu, Daren Wang, and Alessandro Rinaldo. Optimal nonparametric multivariate change point detection and localization. arXiv preprint arXiv:1910.13289, 2019.
[26] Luc Rocher, Julien M. Hendrickx, and Yves-Alexandre De Montjoye. Estimating the success of re-identifications in incomplete datasets using generative models. Nature Communications, 10(1):1-9, 2019.
[27] Angelika Rohde and Lukas Steinberger. Geometrizing rates of convergence under local differential privacy constraints. Annals of Statistics, 48(5):2646-2670, 2020.
[28] Latanya Sweeney. k-anonymity: a model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05):557-570, 2002.
[29] Jun Tang, Aleksandra Korolova, Xiaolong Bai, Xueqiang Wang, and Xiaofeng Wang. Privacy loss in Apple's implementation of differential privacy on MacOS 10.12. arXiv preprint arXiv:1709.02753, 2017.
[30] Alexander G. Tartakovsky. Rapid detection of attacks in computer networks by quickest changepoint detection methods. In Data Analysis for Network Cyber-Security, pages 33-70. World Scientific, 2014.
[31] Apple Differential Privacy Team. Learning with privacy at scale. https://machinelearning.apple.com/research/learning-with-privacy-at-scale. Accessed: 30-09-2021.
[32] Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer Science & Business Media, 2008.
[33] Nicolas Verzelen, Magalie Fromont, Matthieu Lerasle, and Patricia Reynaud-Bouret. Optimal change-point detection and localization. arXiv preprint arXiv:2010.11470, 2020.
[34] W. Allen Wallis. The Statistical Research Group, 1942-1945. Journal of the American Statistical Association, 75(370):320-330, 1980.
[35] Di Wang, Marco Gaboardi, and Jinhui Xu. Empirical risk minimization in non-interactive local differential privacy revisited. In Advances in Neural Information Processing Systems, pages 965-974, 2018.
[36] Yi Yu. A review on minimax rates in change point detection and localisation. arXiv preprint arXiv:2011.01857, 2020.
[37] Yi Yu, Oscar Hernan Madrid Padilla, Daren Wang, and Alessandro Rinaldo. A note on online change point detection. arXiv preprint arXiv:2006.03283, 2020.
[38] Yi Yu, Oscar Hernan Madrid Padilla, Daren Wang, and Alessandro Rinaldo. Optimal network online change point localisation. arXiv preprint arXiv:2101.05477, 2021.
[39] Wanrong Zhang, Sara Krehbiel, Rui Tuo, Yajun Mei, and Rachel Cummings. Single and multiple change-point detection with differential privacy. Journal of Machine Learning Research, 22(29):1-36, 2021.