# Tracking the Best Expert Privately

Hilal Asi*¹, Vinod Raman*², Aadirupa Saha*³

We design differentially private algorithms for the problem of prediction with expert advice under dynamic regret, also known as tracking the best expert. Our work addresses three natural types of adversaries (stochastic with shifting distributions, oblivious, and adaptive) and designs algorithms with sub-linear regret for all three cases. In particular, under a shifting stochastic adversary where the distribution may shift $S$ times, we provide an $\varepsilon$-differentially private algorithm whose expected dynamic regret is at most $O\big(\sqrt{ST\log(NT)} + \frac{S\log(NT)}{\varepsilon}\big)$, where $T$ and $N$ are the time horizon and number of experts, respectively. For oblivious adversaries, we give a reduction from dynamic regret minimization to static regret minimization, resulting in an upper bound of $O\big(\sqrt{ST\log(NT)} + \frac{ST^{1/3}\log(T/\delta)\log(NT)}{\varepsilon^{2/3}}\big)$ on the expected dynamic regret, where $S$ now denotes the allowable number of switches of the best expert. Finally, similar to static regret, we establish a fundamental separation between oblivious and adaptive adversaries for the dynamic setting: while our algorithms show that sub-linear regret is achievable for oblivious adversaries in the high-privacy regime $\varepsilon \ll \sqrt{S/T}$, we show that any $(\varepsilon, \delta)$-differentially private algorithm must suffer linear dynamic regret under adaptive adversaries for $\varepsilon \ll \sqrt{S/T}$. To complement this lower bound, we give an $\varepsilon$-differentially private algorithm that attains sub-linear dynamic regret under adaptive adversaries whenever $\varepsilon \gg \sqrt{S/T}$.

*Equal contribution. ¹Apple. ²Department of Statistics, University of Michigan, Ann Arbor; work done while interning at Apple. ³University of Illinois, Chicago; the majority of the work was done while the author was with Apple Research. Correspondence to: Vinod Raman.

Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s).

1. Introduction

Online learning with experts is a fundamental problem in machine learning, where an online algorithm interacts with an adversary for $T$ rounds (Cesa-Bianchi & Lugosi, 2006). In the general form of the problem with $N$ experts, at each round $t$ the environment chooses a loss vector $\ell_t : [N] \to [0,1]$, upon which the learner chooses an expert $J_t \in [N]$ from the pool of $N$ experts. In the classical setting of online learning, we measure the loss of the learning algorithm compared to the loss of the best expert in hindsight, denoted as the (static) regret

$$\sum_{t=1}^{T} \ell_t(J_t) \;-\; \min_{j^\star \in [N]} \sum_{t=1}^{T} \ell_t(j^\star).$$

However, comparing against a single fixed expert can often be unrealistic in practical applications. Even the best fixed expert may perform poorly on average over the entire loss sequence, especially when loss sequences dynamically change over time or undergo significant distributional shifts, as is common in stochastic settings. This limitation motivates the concept of dynamic regret (Herbster & Warmuth, 1998; Wei et al., 2016), which provides a more flexible and robust benchmark. Unlike static regret, which evaluates against the best fixed expert, dynamic regret compares against a sequence of changing experts, enabling the model to adapt to evolving environments. In particular, the dynamic regret is

$$\sum_{t=1}^{T} \ell_t(J_t) \;-\; \min_{j^\star_1,\dots,j^\star_T \in [N]} \sum_{t=1}^{T} \ell_t(j^\star_t),$$

subject to the constraint that the sequence $\{j^\star_t\}$ does not switch too often.
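To make the benchmark concrete, the following is a minimal sketch (our own illustration, not from the paper) of how one could compute the comparator term $\min_{j^\star_1,\dots,j^\star_T}\sum_t \ell_t(j^\star_t)$ under a budget of $S$ switches, via dynamic programming over (switch count, current expert) states. The function name and interface are hypothetical.

```python
import numpy as np

def best_comparator_loss(losses: np.ndarray, S: int) -> float:
    """Cumulative loss of the best expert sequence with at most S switches.

    `losses` has shape (T, N); entry (t, j) is the loss of expert j at round t.
    dp[s][j] = minimum cumulative loss over the rounds seen so far, ending at
    expert j, having used at most s switches.
    """
    T, N = losses.shape
    dp = np.tile(losses[0], (S + 1, 1)).astype(float)  # round 0: any start, 0 switches used
    for t in range(1, T):
        new_dp = np.empty_like(dp)
        for s in range(S + 1):
            stay = dp[s]                           # keep playing the same expert
            if s > 0:
                best_prev = dp[s - 1].min()        # switch in from the best expert so far
                new_dp[s] = np.minimum(stay, best_prev) + losses[t]
            else:
                new_dp[s] = stay + losses[t]
        dp = new_dp
    return float(dp[S].min())
```

Subtracting this quantity from an algorithm's realized cumulative loss gives its dynamic regret on a particular loss sequence; the recursion runs in $O(TSN)$ time.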
Intuitively, dynamic regret measures how well the algorithm competes against the best possible sequence of decisions that could adapt to changes, constrained by a limited number of switches between experts. This notion is particularly relevant for real-world applications like financial markets, where optimal strategies vary with market conditions, or recommendation systems, where user preferences evolve over time.

By inspecting the definitions, it is clear that minimizing dynamic regret is harder than minimizing static regret. This difficulty manifests even if one measures dynamic regret against sequences of experts with a single switch. As a result, existing algorithms for minimizing dynamic regret modify the standard Multiplicative Weights Algorithm (Littlestone & Warmuth, 1994) to explicitly account for the fact that they are being evaluated against sequences of experts that switch.

While dynamic regret in online learning offers a more practical approach to modeling non-stationary environments, its applicability in sensitive real-world scenarios often requires additional considerations, particularly around privacy. In many online learning problems, the loss functions used to guide expert selection are derived from sensitive data, such as user interactions, medical information, or financial records. Ensuring that the learning process does not inadvertently reveal private details about the data is crucial for maintaining trust and complying with legal and ethical standards. This challenge motivates the integration of differential privacy into online learning from experts in the dynamic setting. However, existing work on private online learning (Jain et al., 2012a; Smith & Thakurta, 2013; Jain & Thakurta, 2014a; Agarwal & Singh, 2017a; Asi et al., 2023b; 2024) is limited to the static setting. As a result, existing privacy-preserving algorithms struggle to adapt to non-stationary environments, where the optimal expert may shift over time, leading to suboptimal performance.

Our work addresses this gap by initiating the study of private online prediction from experts in the dynamic setting. We formally define this problem and study it with respect to three natural types of adversaries. We develop new algorithms and lower bounds for each of these adversaries, demonstrating the near-optimality of our algorithms in several settings, and the hardness of the dynamic setting compared to the well-studied static setting.

1.1. Our Contributions

In this work, we initiate and systematically study the problem of tracking the best expert privately through the lens of online prediction with dynamic regret guarantees. We present a comprehensive study of the problem for three different types of adversaries:

1. Shifting stochastic adversaries, where the losses are sampled from distributions that may shift over time,
2. Oblivious adversaries, which choose the loss functions before the interaction with the algorithm, and
3. Adaptive adversaries, the most powerful type of adversary, which can choose their loss functions as a function of the interaction with the learning algorithm.

We highlight the following key results:

Shifting stochastic adversaries. We design an $\varepsilon$-differentially private algorithm with an expected dynamic regret bound of $O\big(\sqrt{ST\log(TN)} + \frac{S\log(TN)}{\varepsilon}\big)$, where $T$, $N$, and $S$ represent the time horizon, number of experts, and number of distribution shifts, respectively.
We also give a lower bound of $\Omega\big(\sqrt{ST\log N} + \frac{S\log(N/S)}{\varepsilon}\big)$ for this setting, demonstrating the near-optimality of this algorithm. Key to our algorithm is the sparse-vector technique, which we deploy in order to identify a new shift in the distribution without paying a large cost in privacy.

Oblivious adversaries. We develop a new algorithm for the oblivious setting through a reduction from private online prediction in the dynamic setting to the static setting. Applying this reduction with existing algorithms for private prediction in the static setting (Asi et al., 2023b), we obtain an upper bound of $O\big(\sqrt{TS\log(NT)} + \frac{ST^{1/3}\log(T/\delta)\log(NT)}{\varepsilon^{2/3}}\big)$ on the expected dynamic regret.

Adaptive adversaries. We establish a separation between oblivious and adaptive adversaries in the dynamic setting. To this end, we show that any $(\varepsilon,\delta)$-differentially private algorithm must suffer linear dynamic regret $\Omega(T)$ under adaptive adversaries for $\varepsilon \ll \sqrt{S/T}$. In contrast, our upper bounds for oblivious adversaries show that sub-linear regret is still possible for $\varepsilon \ll \sqrt{S/T}$. Finally, we provide a new algorithm that obtains sub-linear regret for $\varepsilon \gg \sqrt{S/T}$. This establishes $\varepsilon \approx \sqrt{S/T}$ as a critical threshold for learning under adaptive adversaries in the dynamic setting, where learning becomes infeasible for $\varepsilon \ll \sqrt{S/T}$ but is attainable for larger values of $\varepsilon$.

1.2. Related Works

Private Online Learning and Prediction with Expert Advice. Differentially private online learning was first studied by Dwork et al. (2010a) in the context of continual observation. Jain et al. (2012b) extend these results to online convex programming by using gradient-based algorithms to achieve differential privacy. Following this work, Guha Thakurta & Smith (2013) privatize the Follow-the-Approximate-Leader template to obtain sharper guarantees for online convex optimization. For prediction with expert advice, Dwork et al. (2014) and Jain & Thakurta (2014b) give the first private online learning algorithms. More recently, Agarwal & Singh (2017b) study private online linear optimization and achieve regret bounds that scale like $O(\sqrt{T}/\varepsilon)$. Using this result, they show that for the setting of prediction with expert advice, it is possible to obtain a regret bound that scales like $O\big(\sqrt{T\log N} + \frac{N\log(N)\log^2 T}{\varepsilon}\big)$, improving upon the work of Dwork et al. (2014) and Jain & Thakurta (2014b). For large $N$, this upper bound was further improved to $O\big(\sqrt{T\log N} + \frac{T^{1/3}\log N}{\varepsilon}\big)$ and $O\big(\sqrt{T\log N} + \frac{T^{1/3}\log N}{\varepsilon^{2/3}}\big)$ by Asi et al. (2023b) and Asi et al. (2024), respectively, under an oblivious adversary.

Table 1. Summary of our bounds on the dynamic regret for different settings of the adversary. We omit logarithmic factors in $T$ and $1/\delta$.

| Adversary | Upper Bound | Lower Bound |
|---|---|---|
| Shifting stochastic | $\sqrt{ST\log N} + \frac{S\log N}{\varepsilon}$ | $\sqrt{ST\log N} + \frac{S\log(N/S)}{\varepsilon}$ |
| Oblivious | $\sqrt{ST\log N} + \frac{ST^{1/3}\log N}{\varepsilon^{2/3}}$ | $\sqrt{ST\log N} + \frac{S\log(N/S)}{\varepsilon}$ |
| Adaptive | $\frac{\sqrt{ST}\log^{1.5} N}{\varepsilon} + \frac{S\log N}{\varepsilon}$ | $\sqrt{ST\log N} + \frac{S}{(\varepsilon\log\frac{T}{S+1})^2}$ |

Recent work also studies private prediction with expert advice in the realizable setting, where there is a zero-loss expert (Asi et al., 2023a). Asi et al. (2023b) also study private prediction with expert advice under stochastic and adaptive adversaries. Under a stochastic adversary, they reduce private online learning to private offline learning and give an $(\varepsilon,\delta)$-differentially private online learning algorithm with expected regret $O\big(\sqrt{T\log N} + \frac{\log N}{\varepsilon}\big)$.
Under adaptive adversaries, Asi et al. (2023b) prove a lower bound showing that any $(\varepsilon,\delta)$-differentially private online algorithm with $\varepsilon \le \frac{1}{\sqrt{T}}$ cannot achieve sublinear regret. This result established a separation between the achievable regret bounds under oblivious and adaptive adversaries.

Non-private Dynamic and Adaptive Regret Minimization. In the context of prediction with expert advice, dynamic regret minimization is also known as tracking the best expert (Littlestone & Warmuth, 1994; Herbster & Warmuth, 1998; Gyorgy et al., 2012; Bousquet & Warmuth, 2002; Vovk, 1997). This setting was first introduced by Herbster & Warmuth (1998; 2001), who noted that static regret is only meaningful for stationary environments. Following this work, there has been significant interest in obtaining dynamic regret bounds for various settings. For example, Wei et al. (2016) study dynamic regret bounds in non-stationary stochastic environments, while Zinkevich (2003) and Hall & Willett (2013) have studied dynamic regret for online optimization problems. Other works have also focused on obtaining first- and second-order dynamic regret bounds (Zhang et al., 2018; Lu & Zhang, 2019) and obtaining guarantees for dynamic regret for stochastic and oblivious adversaries simultaneously (Luo & Schapire, 2015). Most relevant to this paper is the work by Lu & Zhang (2019), who provide a simple modification to the standard Multiplicative Weights Algorithm (Littlestone & Warmuth, 1994) that obtains near-optimal dynamic regret under an oblivious adversary.

A closely related notion to dynamic regret is adaptive regret (Littlestone & Warmuth, 1994; Hazan & Seshadhri, 2007). Here, the goal is to obtain sublinear regret within every contiguous sub-interval of the time horizon. Several works have established deep connections between adaptive and dynamic regret for the setting of prediction with expert advice (Adamskiy et al., 2012; Cesa-Bianchi et al., 2012; Daniely et al., 2015). In fact, for prediction with expert advice, it is known that the dynamic regret can be upper bounded by the adaptive regret, and hence adaptive regret minimization is sufficient for dynamic regret minimization (Luo & Schapire, 2015). In this paper, we use this connection between adaptive and dynamic regret minimization to derive bounds on the dynamic regret under stochastic adversaries under privacy constraints.

2. Preliminaries

Let $N \in \mathbb{N}$ denote the number of experts and $\ell : [N] \to [0,1]$ denote an arbitrary loss function that maps an expert to a bounded loss. For an abstract sequence $z_1,\dots,z_n$, we abbreviate it as $z_{1:n}$. For a measurable space $(X, \sigma(X))$, we let $\Delta_X$ denote the set of all probability measures on $X$. For $N \in \mathbb{N}$, we also let $\Delta_N$ denote the set of all distributions over $\{1,\dots,N\}$. We let $\mathrm{Laplace}(\lambda)$ denote the Laplace distribution with mean zero and scale $\lambda$, whose probability density function is $f_\lambda(x) = \frac{1}{2\lambda}\exp\big(-\frac{|x|}{\lambda}\big)$. Finally, we let $[N] := \{1,\dots,N\}$ for $N \in \mathbb{N}$.

2.1. Prediction with Expert Advice and Static Regret

In the classical problem of online prediction with expert advice, a learning algorithm $A$ plays a sequential game against an adversary over $T \in \mathbb{N}$ rounds. In full generality, the adversary first picks a sequence of functions $f_1, f_2, \dots, f_T$ such that $f_t : [N] \times [N]^{t-1} \to [0,1]$ for all $t \in [T]$. Then, in each round $t \in [T]$, the learner, using the history of the game, selects (potentially randomly) an expert $J_t \in [N]$.
Finally, the adversary reveals the loss function $\ell_t := f_t(\cdot, J_{1:t-1})$ and the learner suffers the loss $\ell_t(J_t)$. The goal of the learner is to adaptively select experts $J_1,\dots,J_T \in [N]$ so as to minimize its expected (static) regret

$$R_A(f_{1:T}, N) := \mathbb{E}_A\Big[\sum_{t=1}^{T} f_t(J_t, J_{1:t-1}) - \min_{j^\star\in[N]}\sum_{t=1}^{T} f_t(j^\star, J_{1:t-1})\Big],$$

where the expectation is taken only with respect to the randomness of the learning algorithm.

2.2. Dynamic Regret

Motivated by concerns of distribution shift, there has been significant interest in minimizing a stronger notion of expected regret termed expected dynamic regret (Herbster & Warmuth, 1998). Unlike expected (static) regret, where the goal is to compete against the best fixed expert in hindsight, dynamic regret measures the performance of the player against a comparison sequence of experts $j_{1:T} \in [N]^T$. To make the problem tractable, we constrain the comparison sequence of experts to have at most $S$ switches, where $S \in \mathbb{N}$ is known to the player before the game begins. Namely, a sequence of $T$ experts $j_{1:T} \in [N]^T$ has at most $S$ switches if $\sum_{t=1}^{T-1} \mathbb{1}\{j_{t+1} \neq j_t\} \le S$. Then,

$$C(T, S) := \Big\{ j_{1:T} \in [N]^T : \sum_{t=1}^{T-1} \mathbb{1}\{j_{t+1} \neq j_t\} \le S \Big\}$$

is the set of all $T$-length expert sequences with at most $S$ switches. For a sequence of functions $f_1, f_2,\dots,f_T$, we can now define the expected dynamic regret of an algorithm $A$ by comparing its cumulative loss to that of the best sequence of experts in $C(T,S)$:

$$\mathrm{DR}_A(f_{1:T}, N, S) := \mathbb{E}_A\Big[\sum_{t=1}^{T} f_t(J_t, J_{1:t-1}) - \min_{j^\star_{1:T}\in C(T,S)}\sum_{t=1}^{T} f_t(j^\star_t, J_{1:t-1})\Big].$$

We make a few remarks about the definition of dynamic regret. First, note that $R_A(f_{1:T}, N) = \mathrm{DR}_A(f_{1:T}, N, 0)$. Second, $\mathrm{DR}_A(f_{1:T}, N, S_1) \le \mathrm{DR}_A(f_{1:T}, N, S_2)$ for $S_1 \le S_2$: as $S$ gets larger, the set of comparison sequences $C(T, S)$ gets larger, making minimizing dynamic regret harder. Lastly, we stress that while dynamic regret restricts the number of switches in the comparison sequence of experts, the player is not restricted in the number of switches it can make. This is crucial for being able to obtain sublinear expected dynamic regret.
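As a concrete illustration (our own toy example, not from the paper, reusing the hypothetical `best_comparator_loss` sketch from Section 1), on the sequence below every fixed expert suffers cumulative loss $T/2$, while a comparator with a single switch suffers zero loss, so the static benchmark is vacuous while the dynamic one is meaningful.

```python
import numpy as np

# Two experts over T rounds: expert 0 is perfect in the first half,
# expert 1 in the second half.
T, N = 1000, 2
losses = np.ones((T, N))
losses[: T // 2, 0] = 0.0   # expert 0 perfect early
losses[T // 2 :, 1] = 0.0   # expert 1 perfect late

best_fixed = losses.sum(axis=0).min()             # = T/2 = 500
one_switch = best_comparator_loss(losses, S=1)    # = 0
print(best_fixed, one_switch)
```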
By placing restrictions on how $f_1, f_2, \dots, f_T$ can be chosen, one gets different types of adversaries, leading to different definitions of worst-case expected dynamic regret. In this paper, we consider three adversaries: (1) shifting stochastic, (2) oblivious, and (3) adaptive.

The strongest of the three is the adaptive adversary. For an adaptive adversary, no restrictions are placed: the adversary can pick any sequence of functions $f_1, f_2, \dots, f_T$, leading to the worst-case expected dynamic regret

$$\mathrm{DR}^{\mathrm{adap}}_A(T, N, S) := \sup_{f_1,\dots,f_T} \mathrm{DR}_A(f_{1:T}, N, S).$$

A weakening of the adaptive adversary is the oblivious adversary. This adversary must first pick a sequence of loss vectors $\ell_1,\dots,\ell_T$ independently of $J_{1:T}$ and then construct the sequence of functions $f_{\ell_1}, f_{\ell_2},\dots,f_{\ell_T}$ such that $f_{\ell_t}(j_t, j_{1:t-1}) := \ell_t(j_t)$. We define the worst-case expected regret under an oblivious adversary as

$$\mathrm{DR}^{\mathrm{obl}}_A(T, N, S) := \sup_{\ell_1,\dots,\ell_T} \mathrm{DR}_A(f_{\ell_1},\dots,f_{\ell_T}, N, S).$$

Finally, the shifting stochastic adversary is the weakest of the three. Here, the adversary must first pick a sequence of $S$ distributions $D_1,\dots,D_S \in \Delta([0,1]^N)$ and a sequence of $S-1$ time points $t_1,\dots,t_{S-1} \in [T-1]$. The adversary draws loss functions $\ell_1,\dots,\ell_T$ such that $\ell_t \sim D_s$ if and only if $t \in [t_s, t_{s+1})$ (with the conventions $t_0 = 1$ and $t_S = T+1$), and constructs the functions $f_{\ell_1}, f_{\ell_2},\dots,f_{\ell_T}$. Abusing notation by omitting the dependence on $t_{1:S-1}$, the worst-case expected regret under a stochastic adversary is

$$\mathrm{DR}^{\mathrm{stoc}}_A(T, N, S) := \sup_{D_{1:S}} \;\mathbb{E}_{\ell_{1:T}\sim D_{1:S}}\big[\mathrm{DR}_A(f_{\ell_1},\dots,f_{\ell_T}, N, S)\big].$$

Note that in the definition of $\mathrm{DR}^{\mathrm{stoc}}_A(T,N,S)$, the same $S$ is used to constrain both the number of distributions that the adversary can pick and the number of switches in the comparison sequence of experts. Analogous versions of worst-case expected (static) regret under adaptive, oblivious, and shifting stochastic adversaries follow by placing the same restrictions on how $f_1,\dots,f_T$ can be chosen. As an example, the worst-case expected (static) regret under an adaptive adversary will be written as $R^{\mathrm{adap}}_A(T, N) = \sup_{f_1,\dots,f_T} R_A(f_{1:T}, N)$.

Without privacy concerns, the worst-case expected regret is $\Theta(\sqrt{T\log N})$ for all three types of adversaries and can be obtained by running a single algorithm, the Multiplicative Weights Algorithm (MWA) (Littlestone & Warmuth, 1994). Likewise, the worst-case expected dynamic regret under stochastic, oblivious, and adaptive adversaries is also known to be $\Theta(\sqrt{TS\log N})$, and is achieved by modifying MWA to ensure that the probability of playing any expert never drops too low (where "low" depends on $S$ and $T$) (Herbster & Warmuth, 1998; Wei et al., 2016).
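A minimal non-private sketch of this idea, in the spirit of Fixed Share (Herbster & Warmuth, 1998): after each multiplicative-weights step, mix in a little uniform mass so that no expert's probability drops too low. The function name, learning rate, and mixing rate below are illustrative choices, not the paper's.

```python
import numpy as np

def fixed_share(losses, S, rng=np.random.default_rng(0)):
    """Non-private tracking of the best expert via Fixed Share.

    After each multiplicative-weights update, mass gamma/N is redistributed
    to every expert, so no probability ever drops below roughly gamma/N;
    this is what makes competing with switching comparators possible.
    """
    T, N = losses.shape
    eta = np.sqrt((S + 1) * np.log(N * T) / T)   # learning rate (up to constants)
    gamma = S / max(T - 1, 1)                    # roughly one switch budget per round
    w = np.full(N, 1.0 / N)
    total = 0.0
    for t in range(T):
        j = rng.choice(N, p=w)                   # play a random expert from w
        total += losses[t, j]
        w = w * np.exp(-eta * losses[t])         # multiplicative-weights step
        w /= w.sum()
        w = (1 - gamma) * w + gamma / N          # fixed-share mixing step
    return total
```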
2.3. Differential Privacy

We adopt the notion of differential privacy for prediction with expert advice from Asi et al. (2023b). Consider an abstract space $Z$ and a function $\ell : [N]\times Z \to [0,1]$. Every $z \in Z$ now induces a loss function $\ell(\cdot, z) \in [0,1]^N$. Accordingly, stochastic, oblivious, and adaptive adversaries can be equivalently defined in terms of picking $z_1,\dots,z_T \in Z^T$ and a function $\ell : [N]\times Z \to [0,1]$. For completeness' sake, we make this explicit below.

A stochastic adversary under dynamic regret now picks a sequence of $S$ distributions $D_1,\dots,D_S \in \Delta_Z$, a sequence of time points $t_1,\dots,t_{S-1}$, and a function $\ell : [N]\times Z \to [0,1]$. The loss function at a time point $t \in [t_s, t_{s+1})$ is obtained by sampling $z_t \sim D_s$ and outputting $\ell_t = \ell(\cdot, z_t)$. An oblivious adversary selects a sequence $z_1,\dots,z_T \in Z^T$ and a function $\ell : [N]\times Z \to [0,1]$ before the game begins. The loss function at time $t\in[T]$ is then defined as $\ell_t := \ell(\cdot, z_t)$. Finally, an adaptive adversary picks a sequence $z_1,\dots,z_T \in Z^T$ and a sequence of functions $\ell_t : [N]\times[N]^{t-1}\times Z \to [0,1]$. The loss function at time $t\in[T]$ is then defined by $\ell_t(\cdot, J_{1:t-1}, z_t)$, where $J_{1:t-1}$ are the random variables representing the actions of the player. Note that given an input $z_{1:T}$, stochastic and oblivious adversaries are fully parameterized by a function $\ell: [N]\times Z\to[0,1]$, while adaptive adversaries are parameterized by a sequence of functions $\ell_1,\dots,\ell_T$ such that $\ell_t : [N]\times[N]^{t-1}\times Z\to[0,1]$.

With this equivalent representation in mind, we are now ready to define our notion of differential privacy. A dataset is a sequence of elements $z_1,\dots,z_T$. Two datasets, $z_{1:T}$ and $z'_{1:T}$, are neighboring if they differ at exactly one time point $t\in[T]$. Let $A_{\mathrm{Adv}}(z_{1:T}) = J_1,\dots,J_T$ be the sequence of random variables denoting the experts played by $A$ when interacting with the adversary $\mathrm{Adv}$ that is given inputs $z_{1:T}$.

Definition 2.1 (Adaptive Differential Privacy). A randomized algorithm $A$ is $(\varepsilon,\delta)$-differentially private against adaptive adversaries if, for all neighboring datasets $z_{1:T}, z'_{1:T}\in Z^T$, any potentially adaptive adversary $\mathrm{Adv}$, and all events $E \subseteq [N]^T$, we have that

$$\mathbb{P}\big[A_{\mathrm{Adv}}(z_{1:T}) \in E\big] \le e^{\varepsilon}\,\mathbb{P}\big[A_{\mathrm{Adv}}(z'_{1:T}) \in E\big] + \delta.$$

If $\delta = 0$, we say that $A$ is $\varepsilon$-differentially private. The following mechanisms will also be useful building blocks for several of our algorithms.

Laplace Mechanism. Let $X$ be an arbitrary set and $n\in\mathbb{N}$. Suppose $f : X^n \to \mathbb{R}$ is a query with sensitivity $\Delta$ (i.e., for all pairs of datasets $x_{1:n}, x'_{1:n}\in X^n$ that differ in exactly one index, we have $|f(x_{1:n}) - f(x'_{1:n})| \le \Delta$). Then, for every $\varepsilon$, the Laplace mechanism $M : X^n\to\mathbb{R}$ is defined as $M(x_{1:n}) = f(x_{1:n}) + Z$, where $Z \sim \mathrm{Laplace}(\Delta/\varepsilon)$.

Lemma 2.2 ((Dwork & Roth, 2014), Theorem 3.6). The Laplace mechanism is $\varepsilon$-differentially private.

Report-Noisy-Max Mechanism. The report-noisy-max mechanism is a differentially private algorithm that aims to select the item with the highest count. More specifically, given an input dataset $x_{1:n}\in X^n$ and $K$ count queries $c_1,\dots,c_K : X^n\to\mathbb{R}$ that are 1-sensitive, report-noisy-max returns

$$j = \arg\max_{i\in[K]} \; c_i(x_{1:n}) + Z_i, \quad \text{where } Z_i \sim \mathrm{Laplace}(2/\varepsilon).$$

Lemma 2.3 ((Dwork & Roth, 2014), Claim 3.9). The report-noisy-max algorithm is $\varepsilon$-differentially private.

Sparse vector technique. We recall the sparse-vector technique (Dwork & Roth, 2014). Given an input $x_{1:n}\in X^n$, the algorithm takes a stream of queries $q_1, q_2,\dots,q_T : X^n\to\mathbb{R}$ in an online manner and aims to identify queries whose value is above zero. We assume that each $q_i$ is 1-sensitive, that is, $|q_i(x_{1:n}) - q_i(x'_{1:n})| \le 1$ for neighboring datasets $x_{1:n}, x'_{1:n}$ that differ in a single element. We have the following guarantee.

Lemma 2.4 ((Dwork & Roth, 2014), Theorem 3.24). Let $x_{1:n}\in X^n$ be an input dataset. For $\beta > 0$, there is an $\varepsilon$-differentially private algorithm (AboveThreshold) that halts at a time $k\in[T+1]$ such that for $\alpha = \frac{8(\log T + \log(2/\beta))}{\varepsilon}$, with probability at least $1-\beta$:
- for all $t < k$, $q_t(x_{1:n}) \le \alpha$;
- $q_k(x_{1:n}) \ge -\alpha$ or $k = T+1$.

To facilitate the notation for using AboveThreshold in our algorithms, we assume that it has the following components:

1. Initialize_Sparse_Vec(ε, β): initializes a new instance of AboveThreshold with privacy parameter ε and failure probability parameter β. This returns an instance (data structure) Q that supports the following test-above-threshold function.
2. Q.Test_Abo_Thr(q): tests if the query q is above threshold. In that case, the algorithm stops and does not accept more queries.
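The following is a minimal sketch of these three primitives, with the standard noise scales from Dwork & Roth (2014). The class and function names mirror, but are not, the paper's Initialize_Sparse_Vec/Test_Abo_Thr interface; the implementation details are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(value, sensitivity, eps):
    """Release a Delta-sensitive statistic with Laplace(Delta/eps) noise."""
    return value + rng.laplace(scale=sensitivity / eps)

def report_noisy_max(counts, eps):
    """Return the index of the (noisily) largest of K 1-sensitive counts."""
    noisy = np.asarray(counts, dtype=float) + rng.laplace(scale=2.0 / eps, size=len(counts))
    return int(np.argmax(noisy))

class AboveThreshold:
    """Sparse vector technique: scan 1-sensitive queries online and halt at
    the first one that (noisily) exceeds the threshold (here: zero)."""

    def __init__(self, eps, threshold=0.0):
        self.eps = eps
        self.noisy_threshold = threshold + rng.laplace(scale=2.0 / eps)
        self.halted = False

    def test_above_threshold(self, q_value):
        if self.halted:
            raise RuntimeError("this AboveThreshold instance has already halted")
        if q_value + rng.laplace(scale=4.0 / self.eps) >= self.noisy_threshold:
            self.halted = True   # stop accepting queries, as in Lemma 2.4
            return True
        return False
```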
3. SVT-based Algorithm for Stochastic Adversaries

We begin our algorithmic contribution by studying the stochastic setting, where we develop an SVT-based algorithm that obtains roughly $\sqrt{ST} + S/\varepsilon$ dynamic regret against stochastic adversaries. The starting point of our algorithm is the lazy algorithm of Asi et al. (2023b) for private prediction from experts with static regret. We show in Section 3.1 that this algorithm obtains near-optimal adaptive regret (Hazan & Seshadhri, 2007); that is, it obtains regret roughly $\sqrt{w}$ over any sub-interval of size $w$ against a stochastic adversary. Then, we present our main algorithm in Section 3.2, which obtains near-optimal dynamic regret against shifting stochastic adversaries.

3.1. Optimal Adaptive Regret for Stationary Environments

In this section, we consider the simple stochastic setting where all losses are sampled from a fixed distribution $P$. We show that a version of the existing algorithm of Asi et al. (2023b) obtains a stronger guarantee than the original paper proved: it obtains near-optimal adaptive regret for stochastic adversaries, meaning that it obtains roughly $\sqrt{w}$ regret over any sub-interval of size $w$.

Algorithm 1 Limited Updates for Online Optimization with Stochastic Adversaries
Require: privacy parameter $\varepsilon$
1: Set $j_0 \in [N]$ arbitrarily
2: for $t = 1$ to $T$ do
3:   if $t = 2^\ell$ for some integer $\ell \ge 1$ then
4:     Run the report-noisy-max procedure to get $j_t = \arg\min_{j\in[N]} \sum_{i=t/2}^{t-1} \ell_i(j) + Z_t(j)$, where $Z_t(j) \sim \mathrm{Laplace}(2/\varepsilon)$
5:   else
6:     Let $j_t = j_{t-1}$
7:   end if
8:   Receive $\ell_t : [N]\to[0,1]$
9:   Pay cost $\ell_t(j_t)$
10: end for

The following theorem states the adaptive regret guarantees of Algorithm 1. We defer the proof to Appendix B.1.

Theorem 3.1. Let $\ell_1,\dots,\ell_T : [N]\to[0,1]$ be sampled i.i.d. from a distribution $P$. Then Algorithm 1 is $\varepsilon$-differentially private, and for any $t\in[T]$ and $w\in[T-t]$, with probability $1-\beta$,

$$\sum_{i=t}^{t+w} \ell_i(j_i) - \min_{j\in[N]}\sum_{i=t}^{t+w}\ell_i(j) \;\le\; \frac{16\log(NT/\beta)\log T}{\varepsilon} + O\big(\sqrt{w\log(TN/\beta)}\big).$$

3.2. Optimal Dynamic Regret for Shifting Stochastic Adversaries

In this section, we develop our main algorithm for the stochastic setting. Our algorithm is based on iteratively running the algorithm for the stationary setting (Algorithm 1). Moreover, to adapt to shifting distributions, our algorithm uses the sparse-vector technique to test whether the underlying distribution of the losses has changed. To this end, we use SVT to test whether the regret of the internal algorithm is too large, indicating a shift in the distribution. We present the full details in Algorithm 2.

Algorithm 2 SVT-based algorithm
Require: privacy parameter $\varepsilon$, failure probability $\beta$
1: $t_1 = 1$, $i = 1$
2: Start a new instance of Algorithm 1 from $t_i$ with privacy parameter $\varepsilon/2$
3: $Q$ = Initialize_Sparse_Vec($\varepsilon/2$, $\beta/T$)
4: while $t < T$ do
5:   Receive new loss $\ell_t$
6:   Use Algorithm 1 to play $j_t$
7:   Define $\alpha = \frac{16(2\log T + \log(2/\beta))}{\varepsilon}$ and $\mathrm{Reg}_w := \frac{16\log(NT/\beta)\log T}{\varepsilon} + \sqrt{w\log(TN/\beta)}$
8:   For each $w \le t - t_i$, define the query $q_t^w := \sum_{i=t-w}^{t}\ell_i(j_i) - \min_{j\in[N]}\sum_{i=t-w}^{t}\ell_i(j) - \mathrm{Reg}_w - \alpha - 1$
9:   if $Q$.Test_Abo_Thr($q_t^w$) is true for some $w$ then
10:    $i \leftarrow i + 1$
11:    $t_i = t$
12:    Go to line 2 and restart a new instance of Algorithm 1
13:  end if
14: end while

The following theorem summarizes the dynamic regret guarantees of Algorithm 2. We defer the proof to Appendix B.2.

Theorem 3.2 (Upper Bound on Expected Dynamic Regret for Shifting Stochastic Adversaries). Let $A$ denote Algorithm 2 when run with $\varepsilon$ and $\beta = 1/T$. Then $A$ is $\varepsilon$-differentially private and its expected dynamic regret $\mathrm{DR}^{\mathrm{stoc}}_A(T,N,S)$ is upper bounded by

$$O\Big(\sqrt{ST\log(NT)} + \frac{S\log(NT)\log T}{\varepsilon}\Big).$$

Proof (sketch). The privacy proof follows directly from the guarantees of the SVT mechanism and Algorithm 1, as each user is used in the instantiation of both Algorithm 1 and SVT, each with parameter $\varepsilon/2$. The utility proof follows from two main ingredients, which we prove in Lemma B.3 and Lemma B.4. The first shows that if the distribution shifts at most $S$ times, then SVT will return true at most $S$ times with high probability. The second shows that as long as SVT has not restarted the instantiation of the internal algorithm, its regret in any sub-interval (adaptive regret) will be small. Building on these two lemmas, we can prove an upper bound on the dynamic regret (see Appendix B.2 for full details).
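To make the interplay between the lazy learner and SVT concrete, here is a minimal sketch of Algorithm 2, assuming the `report_noisy_max` and `AboveThreshold` sketches from Section 2.3. The constants in `reg` follow our reconstruction of $\mathrm{Reg}_w$ and $\alpha$, and the batch-style interface is for illustration only.

```python
import numpy as np

def svt_tracking(losses, eps, beta):
    """Sketch of Algorithm 2: run the lazy learner (Algorithm 1) and restart it
    whenever AboveThreshold reports that the regret of some trailing window
    exceeds Reg_w, signalling a likely distribution shift."""
    T, N = losses.shape
    alpha = 16 * (2 * np.log(T) + np.log(2 / beta)) / eps

    def reg(w):  # Reg_w from Theorem 3.1, as reconstructed above
        return (16 * np.log(N * T / beta) * np.log(T) / eps
                + np.sqrt(w * np.log(T * N / beta)))

    suffered = np.zeros(T)   # loss actually paid in each round
    t0 = 0                   # first round of the current phase
    while t0 < T:
        svt = AboveThreshold(eps / 2)   # fresh SVT instance for this phase
        j = 0                           # arbitrary initial expert (Algorithm 1, line 1)
        t, fired = t0, False
        while t < T and not fired:
            off = t - t0
            if off >= 2 and off & (off - 1) == 0:        # power-of-two round in phase
                window = losses[t0 + off // 2 : t].sum(axis=0)
                j = report_noisy_max(-window, eps / 2)   # lazy refresh (noisy argmin)
            suffered[t] = losses[t, j]
            for w in range(1, off + 1):                  # queries q_t^w over trailing windows
                q = (suffered[t - w : t + 1].sum()
                     - losses[t - w : t + 1].sum(axis=0).min()
                     - reg(w) - alpha - 1.0)
                if svt.test_above_threshold(q):
                    fired = True
                    break
            t += 1
        t0 = t   # restart Algorithm 1 and SVT from the next round
    return suffered.sum()
```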
Lower bound for shifting adversaries. We can extend the lower bound for the static setting (Asi et al., 2023b) to our dynamic setting. Indeed, the static setting has a lower bound of $\Omega(\log(N)/\varepsilon)$ on the expected regret. We can construct an adversary which splits the rounds into $S$ phases, where in each phase it uses the static lower bound over a disjoint subset of the experts of size $N/S$. Given the independence of these phases, this implies that the regret in each phase is lower bounded by $\log(N/S)/\varepsilon$. Summing over all $S$ phases, we get that the dynamic regret is lower bounded by $S\log(N/S)/\varepsilon$. Finally, note that in the most common setting of parameters where $S \le T \le N^{1-\rho}$ for some constant $\rho > 0$, this lower bound becomes $\Omega(S\log(N)/\varepsilon)$, matching our upper bound.

4. Upper Bounds for Oblivious Adversaries

To obtain our upper bounds on expected dynamic regret against oblivious adversaries, we reduce private dynamic regret minimization to private static regret minimization. That is, our main result is a conversion of a private online learning algorithm minimizing static regret into a private online learning algorithm minimizing dynamic regret. By doing so, we are able to leverage the recent results of Asi et al. (2024), who obtain the best-known expected (static) regret guarantees for private online learning under oblivious adversaries.

Theorem 4.1 (Private Static Regret $\Rightarrow$ Private Dynamic Regret). Let $\varepsilon, \delta \in (0,1)$. Suppose there exists an $(\varepsilon,\delta)$-differentially private algorithm $A$ whose worst-case expected regret under an oblivious adversary is at most $R^{\mathrm{obl}}_A(T, N)$. Then, there exists an $(\varepsilon,\delta)$-differentially private algorithm $B$ such that $\mathrm{DR}^{\mathrm{obl}}_B(T, N, S) \le R^{\mathrm{obl}}_A(T, (NT)^{2S})$.

Proof. Let $\varepsilon, \delta \in (0,1)$. Fix the time horizon $T\in\mathbb{N}$, the number of experts $N\in\mathbb{N}$, and the number of switches $S\in\mathbb{N}$. Suppose there exists an $(\varepsilon,\delta)$-differentially private algorithm $A$ whose worst-case expected regret under an oblivious adversary is at most $R^{\mathrm{obl}}_A(T, N)$. Consider the following algorithm $B$. Let $[T]^{\le c}$ be the set of all strictly increasing tuples over $[T]$ of size at most $c$. Before the game begins, $B$ first constructs the class of meta-experts

$$E := \big\{ e_{t_{1:c},\, j_{1:c+1}} : c \le S,\; t_{1:c} \in [T]^{\le c},\; j_{1:c+1}\in[N]^{c+1} \big\},$$

where the meta-expert $e_{t_{1:c}, j_{1:c+1}} : [T]\to[N]$ plays expert $j_i$ from time point $t_{i-1}$ to $t_i$ for every $i\in[c+1]$, with $t_0 = 1$ and $t_{c+1} = T$. Then, $B$ initializes $A$ with the set of meta-experts $E$. In each round $t\in[T]$, $B$ queries $A$, receives a (potentially random) meta-expert $E_t\in E$ from $A$, and plays the expert $J_t\in[N]$ played by the meta-expert $E_t$ on round $t$. That is, $J_t := E_t(t)$. After observing the true loss vector $\ell_t : [N]\to[0,1]$, $B$ computes a meta-loss vector $\tilde\ell_t : E\to[0,1]$ such that $\tilde\ell_t(e) := \ell_t(e(t))$ for all $e\in E$ and passes $\tilde\ell_t$ to $A$, which then updates itself. We claim that: (1) $B$'s expected dynamic regret is at most $R^{\mathrm{obl}}_A(T, (NT)^{2S})$ and (2) $B$ is $(\varepsilon,\delta)$-differentially private.

We start by proving Claim (1). Observe that

$$|E| \;\le\; \sum_{c=0}^{S}\binom{T}{c}\, N^{c+1} \;\le\; (S+1)\, T^S N^{S+1} \;\le\; (NT)^{2S}.$$

Therefore, by the guarantees of $A$, we have that

$$\mathbb{E}\Big[\sum_{t=1}^{T}\tilde\ell_t(E_t)\Big] - \min_{e\in E}\sum_{t=1}^{T}\tilde\ell_t(e) \;\le\; R^{\mathrm{obl}}_A(T, (NT)^{2S}).$$

By definition of the meta-loss vectors, we have that

$$\mathbb{E}\Big[\sum_{t=1}^{T}\ell_t(J_t)\Big] - \min_{e\in E}\sum_{t=1}^{T}\ell_t(e(t)) \;\le\; R^{\mathrm{obl}}_A(T, (NT)^{2S}).$$

Let $j^\star_{1:T}\in[N]^T$ be the minimizer of $\sum_{t=1}^{T}\ell_t(j_t)$ such that $c^\star := \sum_{t=1}^{T-1}\mathbb{1}\{j^\star_{t+1}\neq j^\star_t\} \le S$. Let $(t^\star_1,\dots,t^\star_{c^\star})$ be the time points where the switches in $j^\star_{1:T}$ occur. Observe that there exists a meta-expert $e^\star\in E$ which plays the corresponding expert between time points $t^\star_{i-1}$ and $t^\star_i$ for every $i\in[c^\star+1]$. Accordingly, we have that $e^\star(t) = j^\star_t$ for all $t\in[T]$ and

$$\mathbb{E}\Big[\sum_{t=1}^{T}\ell_t(J_t)\Big] - \sum_{t=1}^{T}\ell_t(j^\star_t) \;\le\; R^{\mathrm{obl}}_A(T, (NT)^{2S}),$$

completing the proof of Claim (1).

We now prove Claim (2). Consider two neighboring sequences of loss functions $\ell_1,\dots,\ell_T$ and $\ell'_1,\dots,\ell'_T$ which differ at exactly one time point $t^\star$. Consider the sequences of meta-loss vectors $\tilde\ell_1,\dots,\tilde\ell_T$ and $\tilde\ell'_1,\dots,\tilde\ell'_T$ that $B$ would construct and pass to $A$ had it been run on $\ell_{1:T}$ and $\ell'_{1:T}$, respectively. Observe that $\tilde\ell_{1:T}$ and $\tilde\ell'_{1:T}$ are also neighboring sequences of loss functions that differ only at time point $t^\star$. Hence, the outputs of $A$ when run on $\tilde\ell_{1:T}$ and $\tilde\ell'_{1:T}$ are $(\varepsilon,\delta)$-indistinguishable. The proof is complete after noting that the output of $B$ is the result of post-processing the output of $A$, since the outputs of the meta-experts are fixed and do not depend on the observed loss sequence.
We now provide concrete upper bounds on the expected dynamic regret under oblivious adversaries by instantiating Theorem 4.1 with existing private algorithms from the literature. First, we recall the regret guarantee of the private online learning algorithm from Asi et al. (2024).

Proposition 4.2 (Upper Bound on Expected Regret for Oblivious Adversaries (Asi et al., 2024)). Fix $\varepsilon,\delta\in(0,1)$. There exists an $(\varepsilon,\delta)$-differentially private algorithm $A$ such that

$$R^{\mathrm{obl}}_A(T, N) = O\Big(\sqrt{T\log N} + \frac{T^{1/3}\log(T/\delta)\log N}{\varepsilon^{2/3}}\Big).$$

Instantiating Theorem 4.1 with the algorithm guaranteed by Proposition 4.2 then gives the following corollary.

Corollary 4.3 (Upper Bounds on Expected Dynamic Regret for Oblivious Adversaries). Fix $\varepsilon,\delta\in(0,1)$. There exists an $(\varepsilon,\delta)$-differentially private algorithm $B$ such that

$$\mathrm{DR}^{\mathrm{obl}}_B(T, N, S) = O\Big(\sqrt{ST\log(NT)} + \frac{ST^{1/3}\log(T/\delta)\log(NT)}{\varepsilon^{2/3}}\Big).$$

We highlight that Corollary 4.3 provides the first known upper bounds on expected dynamic regret under oblivious adversaries and differential privacy. Unfortunately, unlike our algorithms in Sections 3 and 5, the algorithm obtaining the upper bound in Corollary 4.3 is not efficient, as it requires constructing a set of experts that is exponential in the time horizon. In Section 5, we give an efficient $\varepsilon$-differentially private algorithm whose expected dynamic regret is at most $O\big(\frac{\sqrt{ST}\log^{1.5}(NT)}{\varepsilon} + \frac{S\log(NT)}{\varepsilon}\big)$ under an adaptive adversary. Clearly, the same upper bound holds for oblivious adversaries. However, this upper bound is weaker than the one we get in Corollary 4.3.

We end this section by noting that efficient dynamic-regret-minimizing algorithms for oblivious adversaries do exist if one does not care about privacy. Unfortunately, unlike for static regret, privatizing existing dynamic regret minimizing algorithms is not as straightforward. As an example, Lu & Zhang (2019) give an efficient non-private algorithm for minimizing dynamic regret by projecting the distributions over experts obtained after the multiplicative weights update onto a clipped simplex. This ensures that the probability of playing any particular expert is sufficiently lower bounded. Unfortunately, this clipping operation is challenging under privacy constraints, as now we have to privatize each gradient separately instead of privatizing the sum of gradients via the binary tree mechanism (Dwork et al., 2010a). We leave whether one can achieve the upper bound in Corollary 4.3 via an efficient algorithm as an open question.
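For intuition, here is a minimal sketch of the meta-expert construction behind Theorem 4.1. It is only feasible for tiny $T$, $N$, $S$, since $|E|$ can be as large as $(NT)^{2S}$; the names are our own.

```python
import itertools
import numpy as np

def build_meta_experts(T, N, S):
    """Enumerate the meta-expert class E from Theorem 4.1 (tiny T, N, S only).

    Each meta-expert is a length-T expert sequence with at most S switches,
    encoded as an integer array of experts per round."""
    metas = []
    for c in range(S + 1):
        for switch_times in itertools.combinations(range(1, T), c):
            for experts in itertools.product(range(N), repeat=c + 1):
                seq = np.empty(T, dtype=int)
                bounds = (0,) + switch_times + (T,)
                for i in range(c + 1):
                    seq[bounds[i] : bounds[i + 1]] = experts[i]
                metas.append(seq)
    return np.array(metas)

def lift_losses(losses, metas):
    """Meta-loss of meta-expert e at round t is the loss of the base expert
    e(t); entry (t, m) below equals losses[t, metas[m, t]]."""
    T = losses.shape[0]
    return losses[np.arange(T)[:, None], metas.T]   # shape (T, |E|)
```

Feeding `lift_losses(...)` to any private static-regret learner and playing `metas[m][t]` whenever the learner picks meta-expert $m$ implements the reduction; privacy is preserved because the lifting is a per-round, data-independent re-indexing of the loss vector.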
5. Dynamic Regret for Adaptive Adversaries

Under expected (static) regret, Asi et al. (2023b) prove a separation between oblivious and adaptive adversaries. In particular, for every $\varepsilon \gg 1/T$, there exists an $(\varepsilon,\delta)$-differentially private online learning algorithm whose expected regret under oblivious adversaries is sublinear in the time horizon $T$. However, this is not the case under adaptive adversaries: for any $\varepsilon \ll 1/\sqrt{T}$, every $(\varepsilon,\delta)$-differentially private online learning algorithm must suffer expected regret which grows linearly with $T$. In this section, we prove a qualitatively similar, but quantitatively stronger, separation between private expected regret minimization under oblivious dynamic adversaries and adaptive dynamic adversaries.

5.1. Lower Bounds for Adaptive Adversaries

Our first result is a lower bound which roughly shows that when $\varepsilon = o(\sqrt{S/T})$, sublinear expected dynamic regret is not possible under adaptive adversaries. Our lower bound construction builds upon the lower bound construction of Asi et al. (2023b) for expected (static) regret under adaptive adversaries. Namely, if there are $S$ switches, then our lower bound follows by using $S$ different copies of the lower bound construction from Asi et al. (2023b) for adaptive adversaries.

Theorem 5.1 (Lower Bound on Expected Dynamic Regret for Adaptive Adversaries). Let $S \ge 0$, let $T$ be sufficiently large, and let $N \ge 2^{T/(S+1)}$. Let $\varepsilon \le 1$ and $\delta \le \frac{(S+1)^3}{T^3}$. If $A$ is $(\varepsilon,\delta)$-differentially private, then

$$\mathrm{DR}^{\mathrm{adap}}_A(T, N, S) = \Omega\Big(\min\Big\{T,\; \frac{S}{(\varepsilon\log\frac{T}{S+1})^2}\Big\}\Big).$$

Theorem 5.1 roughly implies that when $\varepsilon \ll \sqrt{S/T}$, every $(\varepsilon,\delta)$-differentially private online learner must suffer expected dynamic regret $\Omega(T)$ under an adaptive adversary. This is in stark contrast to Theorem C.2, which shows that sublinear expected dynamic regret is still possible when $S \ll T$ under an oblivious adversary. The proof of Theorem 5.1 can be found in Appendix D.

5.2. Upper Bounds for Adaptive Adversaries

Our second result is an upper bound which shows that sublinear expected dynamic regret under an adaptive adversary is possible as long as $\varepsilon = \omega(\sqrt{S/T})$. To do so, we modify an existing efficient (non-private) algorithm for regret minimization under adaptive dynamic adversaries. Namely, we design a private version of Algorithm 2 of Lu & Zhang (2019) by adding independent Laplace noise to the loss vectors before using them to update the distribution over the experts. For completeness' sake, we include this modified algorithm below. Let $\Delta'_N := \{w\in\Delta_N : \min_j w(j) \ge \frac{S}{NT}\}$ denote the clipped simplex, let $\phi : \Delta'_N \to \mathbb{R}$ denote the negative Shannon entropy $\phi(w) := \sum_{j=1}^N w(j)\log w(j)$, and let $D_\phi(\cdot\|\cdot) : \Delta'_N\times\Delta'_N\to\mathbb{R}_{\ge0}$ denote the Bregman divergence with respect to $\phi$, defined as $D_\phi(w_1\|w_2) := \phi(w_1) - \phi(w_2) - \langle w_1 - w_2, \nabla\phi(w_2)\rangle$.

Algorithm 3 Private Online Learner for Adaptive Adversaries
1: Input: $\eta > 0$, $\varepsilon > 0$
2: Initialize: $w_1(i) = \frac{1}{N}$ for $i\in[N]$
3: for $t = 1, 2, \dots, T$ do
4:   Draw expert $J_t \sim w_t$
5:   Observe loss vector $\ell_t$ and suffer loss $\ell_t(J_t)$
6:   Sample $Z_t(i)\sim\mathrm{Laplace}(\frac{1}{\varepsilon})$ and define $\tilde\ell_t(i) = \ell_t(i) + Z_t(i)$ for all $i\in[N]$
7:   Update $w_{t+1} = \arg\min_{w\in\Delta'_N}\; \langle w, \eta\tilde\ell_t\rangle + D_\phi(w\|w_t)$
8: end for

Theorem 5.2 (Upper Bound on Expected Dynamic Regret for Adaptive Adversaries). Let $A$ denote Algorithm 3 when run with $\varepsilon\in(0,1)$ and $\eta = \varepsilon\sqrt{\frac{S}{T\log(NT)}}$. Then $A$ is $\varepsilon$-differentially private and

$$\mathrm{DR}^{\mathrm{adap}}_A(T, N, S) = O\Big(\frac{\sqrt{ST}\log^{1.5}(NT)}{\varepsilon} + \frac{S\log(NT)}{\varepsilon}\Big).$$

The proof of Theorem 5.2 follows by combining techniques from Lu & Zhang (2019) and Agarwal & Singh (2017a), and is deferred to Appendix C.
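A minimal sketch of Algorithm 3 follows. The only non-standard step is the constrained update in line 7, which factors into an unconstrained multiplicative-weights step followed by a KL (entropy-Bregman) projection onto the clipped simplex. The water-filling projection routine below is our own implementation of that step; the sketch assumes $1 \le S \le T$ so the clipped simplex is non-empty.

```python
import numpy as np

def kl_project_clipped(v, floor):
    """KL (entropy-Bregman) projection of distribution v onto the clipped
    simplex {w : w_j >= floor, sum w = 1}: coordinates below a water level
    are raised to the floor and the rest are rescaled."""
    clipped = np.zeros(len(v), dtype=bool)
    while True:
        theta = (1.0 - floor * clipped.sum()) / v[~clipped].sum()
        newly = (~clipped) & (theta * v < floor)
        if not newly.any():
            break
        clipped |= newly
    w = np.maximum(theta * v, floor)
    w[clipped] = floor
    return w

def private_tracking_omd(losses, S, eps, rng=np.random.default_rng(0)):
    """Sketch of Algorithm 3: multiplicative weights on Laplace-noised losses,
    followed by a KL projection onto the clipped simplex."""
    T, N = losses.shape
    eta = eps * np.sqrt(S / (T * np.log(N * T)))   # as in Theorem 5.2
    floor = S / (N * T)
    w = np.full(N, 1.0 / N)
    total = 0.0
    for t in range(T):
        j = rng.choice(N, p=w)
        total += losses[t, j]
        noisy = losses[t] + rng.laplace(scale=1.0 / eps, size=N)  # line 6
        v = w * np.exp(-eta * noisy)                              # unconstrained MW step
        v /= v.sum()
        w = kl_project_clipped(v, floor)                          # line 7
    return total
```

The factorization into an unconstrained step plus a Bregman projection is the standard mirror-descent decomposition of the argmin in line 7, so the sketch matches the update rule exactly up to floating-point arithmetic.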
6. Discussion

In this paper, we provide the first private online learning algorithms for dynamic regret minimization against three types of adversaries: shifting stochastic, oblivious, and adaptive. We highlight important directions for future work.

Optimal bounds for oblivious adversaries. In Section 4, we provided an upper bound of $O\big(\sqrt{ST\log(NT)} + \frac{ST^{1/3}\log(T/\delta)\log(NT)}{\varepsilon^{2/3}}\big)$ on the expected dynamic regret under an oblivious adversary. We leave open whether one can prove a matching lower bound or an improved upper bound.

Efficient algorithms for oblivious adversaries. Unlike for stochastic and adaptive adversaries, our algorithm for oblivious adversaries is not efficient: it constructs a set of experts that is exponential in the time horizon $T$. This motivates the design of efficient algorithms for dynamic regret minimization under oblivious adversaries with matching or better regret bounds. Unfortunately, our current attempts at designing efficient private algorithms against oblivious adversaries have been unsuccessful, as it is not clear how to privatize existing efficient non-private algorithms for dynamic regret. Perhaps central to the difficulty is the tension between lazy updating and obtaining small dynamic regret. Existing techniques for obtaining private online learning algorithms under static regret rely on privatizing existing (non-private) lazy algorithms that do not switch their played experts too often (Asi et al., 2023c; 2024). Unfortunately, it is plausible that such lazy algorithms cannot obtain good dynamic regret, as switching which expert is played is crucial to tracking the best expert.

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

References

Adamskiy, D., Koolen, W. M., Chernov, A., and Vovk, V. A closer look at adaptive regret. In Algorithmic Learning Theory: 23rd International Conference, ALT 2012, pp. 290-304. Springer, 2012.

Agarwal, N. and Singh, K. The price of differential privacy for online learning. In Proceedings of the 34th International Conference on Machine Learning, pp. 32-40, 2017a.

Agarwal, N. and Singh, K. The price of differential privacy for online learning. In International Conference on Machine Learning, pp. 32-40. PMLR, 2017b.

Asi, H., Feldman, V., Koren, T., and Talwar, K. Near-optimal algorithms for private online optimization in the realizable regime. In Proceedings of the 40th International Conference on Machine Learning, 2023a.

Asi, H., Feldman, V., Koren, T., and Talwar, K. Private online prediction from experts: Separations and faster rates. In Proceedings of the Thirty Sixth Annual Conference on Computational Learning Theory, 2023b.

Asi, H., Feldman, V., Koren, T., and Talwar, K. Private online prediction from experts: Separations and faster rates. In The Thirty Sixth Annual Conference on Learning Theory, pp. 674-699. PMLR, 2023c.

Asi, H., Koren, T., Liu, D., and Talwar, K. Private online learning via lazy algorithms. arXiv preprint arXiv:2406.03620, 2024.

Bousquet, O. and Warmuth, M. K. Tracking a small set of experts by mixing past posteriors. Journal of Machine Learning Research, 3(Nov):363-396, 2002.

Cesa-Bianchi, N. and Lugosi, G. Prediction, Learning, and Games. Cambridge University Press, 2006.

Cesa-Bianchi, N., Gaillard, P., Lugosi, G., and Stoltz, G. Mirror descent meets fixed share (and feels no regret). Advances in Neural Information Processing Systems, 25, 2012.

Daniely, A., Gonen, A., and Shalev-Shwartz, S. Strongly adaptive online learning. In International Conference on Machine Learning, pp. 1405-1411. PMLR, 2015.

Duchi, J. C. Information theory and statistics. Lecture Notes for Statistics 311/EE 377, Stanford University, 2019. URL http://web.stanford.edu/class/stats311/lecture-notes.pdf. Accessed May 2019.
Dwork, C. and Roth, A. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211-407, 2014.

Dwork, C., Naor, M., Pitassi, T., and Rothblum, G. N. Differential privacy under continual observation. In Proceedings of the Forty-Second ACM Symposium on Theory of Computing, pp. 715-724, 2010a.

Dwork, C., Rothblum, G. N., and Vadhan, S. Boosting and differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pp. 51-60. IEEE, 2010b.

Dwork, C., Roth, A., et al. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211-407, 2014.

Guha Thakurta, A. and Smith, A. (Nearly) optimal algorithms for private online learning in full-information and bandit settings. Advances in Neural Information Processing Systems, 26, 2013.

Gyorgy, A., Linder, T., and Lugosi, G. Efficient tracking of large classes of experts. IEEE Transactions on Information Theory, 58(11):6709-6725, 2012.

Hall, E. and Willett, R. Dynamical models and tracking regret in online convex programming. In International Conference on Machine Learning, pp. 579-587. PMLR, 2013.

Hazan, E. and Seshadhri, C. Adaptive algorithms for online decision problems. In Electronic Colloquium on Computational Complexity (ECCC), volume 14, 2007.

Herbster, M. and Warmuth, M. K. Tracking the best expert. Machine Learning, 32(2):151-178, 1998.

Herbster, M. and Warmuth, M. K. Tracking the best linear predictor. Journal of Machine Learning Research, 1:281-309, 2001.

Jain, P. and Thakurta, A. (Near) dimension independent risk bounds for differentially private learning. In Proceedings of the 31st International Conference on Machine Learning, pp. 476-484, 2014a.

Jain, P. and Thakurta, A. G. (Near) dimension independent risk bounds for differentially private learning. In International Conference on Machine Learning, pp. 476-484. PMLR, 2014b.

Jain, P., Kothari, P., and Thakurta, A. Differentially private online learning. In Proceedings of the Twenty Fifth Annual Conference on Computational Learning Theory, 2012a.

Jain, P., Kothari, P., and Thakurta, A. Differentially private online learning. In Conference on Learning Theory, pp. 24.1-24.34. JMLR Workshop and Conference Proceedings, 2012b.

Kairouz, P., Oh, S., and Viswanath, P. The composition theorem for differential privacy. In International Conference on Machine Learning, pp. 1376-1385. PMLR, 2015.

Littlestone, N. and Warmuth, M. K. The weighted majority algorithm. Information and Computation, 108(2):212-261, 1994.

Lu, S. and Zhang, L. Adaptive and efficient algorithms for tracking the best expert. arXiv preprint arXiv:1909.02187, 2019.

Luo, H. and Schapire, R. E. Achieving all with no parameters: AdaNormalHedge. In Conference on Learning Theory, pp. 1286-1304. PMLR, 2015.

Smith, A. and Thakurta, A. (Nearly) optimal algorithms for private online learning in full-information and bandit settings. In Advances in Neural Information Processing Systems 26, 2013.

Vovk, V. Derandomizing stochastic prediction strategies. In Proceedings of the Tenth Annual Conference on Computational Learning Theory, pp. 32-44, 1997.

Wei, C.-Y., Hong, Y.-T., and Lu, C.-J. Tracking the best expert in non-stationary stochastic environments. Advances in Neural Information Processing Systems, 29, 2016.

Zhang, L., Lu, S., and Zhou, Z.-H. Adaptive online learning in dynamic environments. Advances in Neural Information Processing Systems, 31, 2018.

Zinkevich, M. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 928-936, 2003.
A. Privacy Properties

Lemma A.1 (Basic Composition (Corollary 3.15 in Dwork et al. (2014))). Let $X, Y_1, Y_2,\dots,Y_T$ be arbitrary sets and $n\in\mathbb{N}$. Let $A_1, A_2,\dots,A_T$ be a sequence of randomized algorithms where $A_1 : X^n\to Y_1$ and $A_t : Y_1\times\dots\times Y_{t-1}\times X^n \to Y_t$ for all $t = 2, 3,\dots,T$. If for every $t\in[T]$ and every $y_{1:t-1}\in Y_1\times Y_2\times\dots\times Y_{t-1}$ we have that $A_t(y_{1:t-1}, \cdot)$ is $\varepsilon_t$-differentially private, then the overall algorithm $A : X^n\to Y_1\times Y_2\times\dots\times Y_T$, defined as

$$A(x_{1:n}) = \big(A_1(x_{1:n}),\, A_2(A_1(x_{1:n}), x_{1:n}),\, \dots,\, A_T(A_1(x_{1:n}), A_2(A_1(x_{1:n}), x_{1:n}),\dots,x_{1:n})\big),$$

satisfies $\big(\sum_{t=1}^{T}\varepsilon_t\big)$-differential privacy.

Lemma A.3 (Advanced Composition (Dwork et al., 2010b; Kairouz et al., 2015)). In the setting of Lemma A.1, if for every $t\in[T]$ and every $y_{1:t-1}\in Y_1\times\dots\times Y_{t-1}$ we have that $A_t(y_{1:t-1}, \cdot)$ is $\varepsilon_t$-differentially private, then for every $\delta' > 0$, the overall algorithm $A$ satisfies $(\varepsilon', \delta')$-differential privacy, where

$$\varepsilon' = \sqrt{2\log(1/\delta')\sum_{t=1}^{T}\varepsilon_t^2} + \sum_{t=1}^{T}\varepsilon_t\big(e^{\varepsilon_t}-1\big).$$

Post-processing will also be useful.

Lemma A.4 (Post-Processing (Proposition 2.1 in Dwork et al. (2014))). Let $X, Y, Z$ be arbitrary sets and $n\in\mathbb{N}$. Let $A : X^n\to Y$ and $B : Y\to Z$ be randomized algorithms. If $A$ is $(\varepsilon,\delta)$-differentially private, then the composed algorithm $B\circ A : X^n\to Z$ is also $(\varepsilon,\delta)$-differentially private.
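As a small illustration of how these lemmas are used in our analyses (for instance, splitting $\varepsilon$ between Algorithm 1 and SVT in Algorithm 2), here is a sketch of basic-composition accounting with Laplace mechanisms; the helper and its interface are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def composed_release(x, queries, eps_per_query):
    """Answer k 1-sensitive queries, each via an eps_t-Laplace mechanism.

    By basic composition (Lemma A.1), the overall release is eps-DP with
    eps equal to the sum of the per-query budgets."""
    answers = [q(x) + rng.laplace(scale=1.0 / e)
               for q, e in zip(queries, eps_per_query)]
    total_eps = sum(eps_per_query)   # the privacy cost Lemma A.1 certifies
    return answers, total_eps
```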
B. Missing Proofs for Section 3

B.1. Proof of Theorem 3.1

The proof of Theorem 3.1 is based on the following two lemmas. The first lemma is a concentration result which shows that the average loss of each expert in each sub-interval is close to its expectation.

Lemma B.1. Let $\ell_1,\dots,\ell_T : [N]\to[0,1]$ be sampled i.i.d. from a distribution $P$. Then with probability $1-\beta$, for all $j\in[N]$, $t\in[T]$, and $w\in[T-t]$,

$$\Big|\sum_{i=t}^{t+w}\ell_i(j) - w\,\mathbb{E}_{\ell\sim P}[\ell(j)]\Big| \le \sqrt{2w\log(TN/\beta)}.$$

The second lemma proves that the static regret of the algorithm with respect to the population minimizer is small.

Lemma B.2. Let $\ell_1,\dots,\ell_T : [N]\to[0,1]$ be sampled i.i.d. from a distribution $P$. Then with probability $1-3\beta$, for all $t\in[T]$ and $w\in[T-t]$,

$$\sum_{i=t}^{t+w}\ell_i(j_i) - w\min_{j\in[N]}\mathbb{E}[\ell(j)] \le \frac{16\log(NT/\beta)\log T}{\varepsilon} + O\big(\sqrt{w\log(TN/\beta)}\big).$$

Building on Lemma B.1 and Lemma B.2, we can now proceed to prove Theorem 3.1.

Proof (of Theorem 3.1). The privacy follows immediately from the guarantees of the report-noisy-max mechanism (Lemma 2.3): indeed, the algorithm uses the data only through the invocation of the report-noisy-max algorithm. Moreover, note that each data point $\ell_t$ is used in a single instantiation of the report-noisy-max mechanism. Now we proceed to prove utility. Using Lemma B.1 and Lemma B.2, we have

$$\sum_{i=t}^{t+w}\ell_i(j_i) - \min_{j\in[N]}\sum_{i=t}^{t+w}\ell_i(j) = \Big(\sum_{i=t}^{t+w}\ell_i(j_i) - w\min_{j\in[N]}\mathbb{E}[\ell(j)]\Big) + \Big(w\min_{j\in[N]}\mathbb{E}[\ell(j)] - \min_{j\in[N]}\sum_{i=t}^{t+w}\ell_i(j)\Big) \le \frac{16\log(NT/\beta)\log T}{\varepsilon} + O\big(\sqrt{w\log(TN/\beta)}\big),$$

where the first term is bounded by Lemma B.2 and the second term is bounded by applying Lemma B.1 to the empirical minimizer.

Now it remains to prove our two lemmas. We begin with the proof of Lemma B.1.

Proof (of Lemma B.1). Fix $j\in[N]$, $t\in[T]$, and $w\in[T-t]$. Since $\ell_i(j)\in[0,1]$, Hoeffding's inequality ((Duchi, 2019), Corollary 4.1.10) implies that with probability at least $1-\beta/(T^2 N)$,

$$\Big|\sum_{i=t}^{t+w}\ell_i(j) - w\,\mathbb{E}_{\ell\sim P}[\ell(j)]\Big| \le \sqrt{2w\log(TN/\beta)}.$$

Taking a union bound over all $j, t, w$ proves the claim.

Finally, we prove Lemma B.2.

Proof (of Lemma B.2). First, concentration of Laplace random variables ((Dwork & Roth, 2014), Fact 3.7) implies that $|Z_t(j)| \le \frac{2\log(NT/\beta)}{\varepsilon}$ for all $j\in[N]$ and all $t$, with probability at least $1-\beta$. Let $j^\star = \arg\min_{j\in[N]}\mathbb{E}[\ell(j)]$. Then Lemma B.1 implies that for all $t = 2^\ell$, we have

$$\mathbb{E}_{\ell\sim P}[\ell(j_t)] \le \frac{2}{t}\sum_{i=t/2}^{t-1}\ell_i(j_t) + 2\sqrt{\frac{\log(TN/\beta)}{t}} \le \frac{2}{t}\Big(\sum_{i=t/2}^{t-1}\ell_i(j^\star) + Z_t(j^\star) - Z_t(j_t)\Big) + 2\sqrt{\frac{\log(TN/\beta)}{t}} \le \frac{2}{t}\sum_{i=t/2}^{t-1}\ell_i(j^\star) + \frac{8\log(NT/\beta)}{\varepsilon t} + 2\sqrt{\frac{\log(TN/\beta)}{t}} \le \mathbb{E}_{\ell\sim P}[\ell(j^\star)] + \frac{8\log(NT/\beta)}{\varepsilon t} + 4\sqrt{\frac{\log(TN/\beta)}{t}},$$

where the second inequality follows from the definition of $j_t$ in the algorithm. Based on the lazy structure of the algorithm, this implies that for all $t\in[T]$,

$$\mathbb{E}_{\ell\sim P}[\ell(j_t)] \le \mathbb{E}_{\ell\sim P}[\ell(j^\star)] + \frac{16\log(NT/\beta)}{\varepsilon t} + 8\sqrt{\frac{\log(TN/\beta)}{t}}.$$

Now we get that

$$\sum_{i=t}^{t+w}\ell_i(j_i) - w\min_{j\in[N]}\mathbb{E}[\ell(j)] = \sum_{i=t}^{t+w}\big(\ell_i(j_i) - \mathbb{E}[\ell(j_i)]\big) + \sum_{i=t}^{t+w}\big(\mathbb{E}[\ell(j_i)] - \min_{j\in[N]}\mathbb{E}[\ell(j)]\big) \le \sum_{i=t}^{t+w}\big(\ell_i(j_i) - \mathbb{E}[\ell(j_i)]\big) + \frac{16\log(NT/\beta)\log T}{\varepsilon} + O\big(\sqrt{w\log(TN/\beta)}\big).$$

For the first term, note that for $W_i = \ell_i(j_i) - \mathbb{E}[\ell(j_i)]$, the sequence $\{W_i\}$ is a bounded-difference martingale. We can use Azuma's inequality ((Duchi, 2019), Corollary 4.2.4) to get that, with probability at least $1-\beta$,

$$\sum_{i=t}^{t+w}\big(\ell_i(j_i) - \mathbb{E}[\ell(j_i)]\big) \le \sqrt{2w\log(T/\beta)}.$$

This proves the claim.

B.2. Proof of Theorem 3.2

For our analysis, we build on the following two lemmas. The first shows that if SVT identifies an above-threshold query, then there must have been a distribution shift with high probability.

Lemma B.3. Fix $i$. Then there is a distribution shift in the range $[t_i, t_{i+1}]$ with probability $1-2\beta$.

Proof. Assume towards a contradiction that there is no distribution shift in the range $[t_i, t_{i+1}]$. By Theorem 3.1, Algorithm 1 has near-optimal adaptive regret if the distribution does not change; that is, for all $w \le t_{i+1} - t_i$ we have

$$\sum_{t=t_{i+1}-w}^{t_{i+1}}\ell_t(j_t) - \min_{j\in[N]}\sum_{t=t_{i+1}-w}^{t_{i+1}}\ell_t(j) \le \mathrm{Reg}_w.$$

However, as SVT identifies an above-threshold query at time $t_{i+1}$, the guarantee of SVT (Lemma 2.4) implies that there is $w \le t_{i+1} - t_i$ such that $q_{t_{i+1}}^w \ge -\alpha$, implying that

$$\sum_{t=t_{i+1}-w}^{t_{i+1}}\ell_t(j_t) - \min_{j\in[N]}\sum_{t=t_{i+1}-w}^{t_{i+1}}\ell_t(j) \ge \mathrm{Reg}_w + 1.$$

Therefore, we get a contradiction.

Our second lemma shows that as long as SVT did not identify an above-threshold query, the adaptive regret of the internal algorithm will be small.

Lemma B.4. Fix $i$ and let $t'_1, t'_2 \in [t_i, t_{i+1}-1]$. Letting $w = t'_2 - t'_1$, we have with probability $1-\beta$,

$$\sum_{t=t'_1}^{t'_2}\ell_t(j_t) - \min_{j\in[N]}\sum_{t=t'_1}^{t'_2}\ell_t(j) \le \mathrm{Reg}_w + 2\alpha + 1.$$

Proof. Note that SVT did not identify an above-threshold query at time $t'_2$; otherwise we would have $t'_2 = t_{i+1}$. Therefore, setting $w = t'_2 - t'_1$, the guarantees of the SVT mechanism for the query $q_{t'_2}^w$ imply that $q_{t'_2}^w \le \alpha$, and therefore

$$\sum_{t=t'_1}^{t'_2}\ell_t(j_t) - \min_{j\in[N]}\sum_{t=t'_1}^{t'_2}\ell_t(j) \le \mathrm{Reg}_w + 2\alpha + 1.$$

This proves the claim.

Now we are ready to prove Theorem 3.2. The privacy proof follows directly from the guarantees of the SVT mechanism and Algorithm 1, as each user is used in the instantiation of both Algorithm 1 and SVT, each with parameter $\varepsilon/2$. Now we proceed to prove utility.
Based on Lemma B.3, for a shifting stochastic adversary with $S$ shifts, the algorithm restarts its internal procedure at most $\hat S \le S$ times. Let $t_1,\dots,t_{\hat S}$ denote these times, with the convention $t_{\hat S + 1} = T+1$. Note that the dynamic regret of the algorithm is

$$\max_{j^\star_{1:T}:\;\sum_{t=1}^{T-1}\mathbb{1}\{j^\star_t\neq j^\star_{t+1}\}\le S}\;\sum_{t=1}^{T}\big(\ell_t(j_t) - \ell_t(j^\star_t)\big) = \max_{j^\star_{1:T}\in C(T,S)}\;\sum_{i=1}^{\hat S}\sum_{t=t_i}^{t_{i+1}-1}\big(\ell_t(j_t) - \ell_t(j^\star_t)\big).$$

Let $L_i = t_{i+1} - t_i$ and let $S_i = \sum_{t=t_i}^{t_{i+1}-1}\mathbb{1}\{j^\star_t \neq j^\star_{t+1}\}$ be the number of switches in $\{j^\star_t\}$ that the adversary makes inside the range $[t_i, t_{i+1}]$. We will prove that for all $i$, with high probability,

$$\sum_{t=t_i}^{t_{i+1}-1}\big(\ell_t(j_t) - \ell_t(j^\star_t)\big) \le 9\sqrt{(S_i+1)L_i\log(TN/\beta)} + (S_i+1)\Big(\frac{16\log(NT/\beta)\log T}{\varepsilon} + 2\alpha + 1\Big). \quad (1)$$

Using inequality (1), we can now prove the theorem. Indeed, we get that the dynamic regret is upper bounded by

$$\sum_{i=1}^{\hat S}\Big(9\sqrt{(S_i+1)L_i\log(TN/\beta)} + (S_i+1)\Big(\frac{16\log(NT/\beta)\log T}{\varepsilon} + 2\alpha + 1\Big)\Big) \le 9\sqrt{\sum_{i=1}^{\hat S}(S_i+1)}\cdot\sqrt{\log(TN/\beta)\sum_{i=1}^{\hat S} L_i} + 2S\Big(\frac{16\log(NT/\beta)\log T}{\varepsilon}+2\alpha+1\Big) \le 9\sqrt{2ST\log(TN/\beta)} + 2S\Big(\frac{16\log(NT/\beta)\log T}{\varepsilon}+2\alpha+1\Big) = O\Big(\sqrt{ST\log(TN/\beta)} + \frac{S\log(TN/\beta)\log T}{\varepsilon}\Big),$$

where the first inequality uses the Cauchy-Schwarz inequality, the second uses $\sum_i (S_i+1) \le 2S$ and $\sum_i L_i = T$, and the last step follows since $\alpha = \frac{16(2\log T + \log(2/\beta))}{\varepsilon}$.

It remains to prove inequality (1). We fix $i = 1$ without loss of generality. Let $\tilde t_1,\dots,\tilde t_{S_1}\in[t_1, t_2]$ denote the switching times of the sequence of experts $\{j^\star_t\}$ inside the range $[t_1, t_2]$ (with $\tilde t_0 = t_1$ and $\tilde t_{S_1+1} = t_2$), and let $j^\star_{1,1},\dots,j^\star_{1,S_1+1}$ denote the experts played in this range. Using Lemma B.4, we now get

$$\sum_{t=t_1}^{t_2-1}\big(\ell_t(j_t) - \ell_t(j^\star_t)\big) = \sum_{s=1}^{S_1+1}\sum_{t=\tilde t_{s-1}}^{\tilde t_s - 1}\big(\ell_t(j_t) - \ell_t(j^\star_{1,s})\big) \le \sum_{s=1}^{S_1+1}\Big(\sqrt{(\tilde t_s - \tilde t_{s-1})\log(TN/\beta)} + \frac{16\log(NT/\beta)\log T}{\varepsilon} + 2\alpha + 1\Big) \le 9\sqrt{(S_1+1)(t_2-t_1)\log(TN/\beta)} + (S_1+1)\Big(\frac{16\log(NT/\beta)\log T}{\varepsilon} + 2\alpha + 1\Big),$$

where the last step uses the Cauchy-Schwarz inequality. This proves that with probability $1-\beta$, the dynamic regret is upper bounded by $O\big(\sqrt{ST\log(TN/\beta)} + \frac{S\log(TN/\beta)\log T}{\varepsilon}\big)$. Picking $\beta = 1/T$ gives the bound in expectation, as the dynamic regret is always bounded by $T$.

C. Proof of Theorem 5.2

We first review a folklore result which states that for online learning algorithms which do not depend on the realizations of their past plays, the expected regret under adaptive adversaries is at most the expected regret under oblivious adversaries.

Theorem C.1 (Exercise 4.1 in Cesa-Bianchi & Lugosi (2006)). Let $A : ([0,1]^N)^\star \to \Delta([N])$ be any (randomized) online learning algorithm which maps a sequence of loss vectors to a distribution over experts. That is, for any sequence of loss functions $\ell_1,\dots,\ell_T$, the prediction of $A$ on round $t\in[T]$ only depends on the loss vectors $\ell_1,\dots,\ell_{t-1}$. Then, $R^{\mathrm{adap}}_A(T,N) \le R^{\mathrm{obl}}_A(T,N)$.

As a consequence of Theorem C.1 and the fact that the distributions constructed by Algorithm 3 do not depend on the realizations of its past plays, it is without loss of generality to consider an oblivious dynamic adversary. To that end, we first prove the following result.

Theorem C.2. Fix a sequence of loss functions $\ell_1,\dots,\ell_T$. Algorithm 3, when run with $\varepsilon, \eta > 0$, is $\varepsilon$-differentially private and satisfies

$$\mathbb{E}\Big[\sum_{t=1}^{T}\ell_t(J_t)\Big] - \min_{j_{1:T}\in C(T,S)}\sum_{t=1}^{T}\ell_t(j_t) \le O\Big(\frac{\eta\,T\log^2(NT)}{\varepsilon^2} + \frac{S\log(NT)}{\eta} + \frac{S\log(NT)}{\varepsilon}\Big).$$

The following lemma about Laplace vectors will be useful.

Lemma C.3 (Norms of Laplace Vectors (Fact C.1 in Agarwal & Singh (2017b))). If $Z_1,\dots,Z_T\sim(\mathrm{Laplace}(\lambda))^N$, then

$$\mathbb{P}\big(\exists\, t\in[T] : \|Z_t\|_\infty^2 \ge 10\lambda^2\log^2(NT)\big) \le \frac{1}{T^2}.$$

We are now equipped to prove Theorem C.2. Our proof of utility will closely follow Theorem 1 in Lu & Zhang (2019), but account for the fact that the loss vectors used to update the algorithm can now contain large negative entries.

Proof (of utility in Theorem C.2). Let $\ell_1,\dots,\ell_T$ be the sequence of losses chosen by the oblivious adversary. Let $Z_1,\dots,Z_T$ be the sequence of Laplace random vectors sampled in line 6 of Algorithm 3. Observe that $Z_t\sim(\mathrm{Laplace}(1/\varepsilon))^N$ for all $t\in[T]$.
Let $F$ be the event that there exists a $t\in[T]$ such that $\|Z_t\|_\infty^2 \ge \frac{10\log^2(NT)}{\varepsilon^2}$. Then, by Lemma C.3, we know that $\mathbb{P}(F) \le \frac{1}{T^2}$. Fix any sequence of experts $j_{1:T}\in C(T,S)$. Observe that

$$\mathbb{E}\Big[\sum_{t=1}^{T}\big(\ell_t(J_t)-\ell_t(j_t)\big)\Big] \le \mathbb{E}\Big[\sum_{t=1}^{T}\big(\ell_t(J_t)-\ell_t(j_t)\big)\,\Big|\,F^c\Big] + T\,\mathbb{P}(F) \le \mathbb{E}\Big[\sum_{t=1}^{T}\big(\ell_t(J_t)-\ell_t(j_t)\big)\,\Big|\,F^c\Big] + 1.$$

Using the facts that $\mathbb{E}[Z_t\,|\,F^c] = 0$, that the randomness in $Z_t$ is independent of that of Algorithm 3, and that $J_t$, being a function of only the past loss vectors $\tilde\ell_1,\dots,\tilde\ell_{t-1}$, is independent of $Z_t$, we have that

$$\mathbb{E}\Big[\sum_{t=1}^{T}\big(\tilde\ell_t(J_t)-\tilde\ell_t(j_t)\big)\,\Big|\,F^c\Big] = \mathbb{E}\Big[\sum_{t=1}^{T}\big(\ell_t(J_t)-\ell_t(j_t)\big)\,\Big|\,F^c\Big].$$

It now suffices to upper bound $\mathbb{E}\big[\sum_{t=1}^{T}\tilde\ell_t(J_t) - \sum_{t=1}^{T}\tilde\ell_t(j_t)\,\big|\,F^c\big]$. To do so, we follow the proof of Theorem 1 in Lu & Zhang (2019) and modify it where necessary to account for the fact that $\|\tilde\ell_t\|_\infty \le \frac{4\log(NT)}{\varepsilon}$ under the event $F^c$. Let $R\subseteq[T-1]$ be the subset of time points such that for every $s\in R$ we have $j_{s+1}\neq j_s$. Note that $|R|\le S$ by definition. Split $[T]$ into $|R|+1$ disjoint intervals $[i_1, i_2),\dots,[i_{|R|+1}, i_{|R|+2})$ with $i_1 = 1$ and $i_{|R|+2} = T+1$, such that for every $s\in[|R|+1]$ we have $j_{i_s} = j_{i_s+1} = \dots = j_{i_{s+1}-1}$. Fix some $s\in[|R|+1]$; the expected regret in the $s$-th interval is

$$\mathbb{E}\Big[\sum_{t=i_s}^{i_{s+1}-1}\big(\langle w_t, \tilde\ell_t\rangle - \tilde\ell_t(j_t)\big)\,\Big|\,F^c\Big].$$

Define the one-hot vectors $e_1,\dots,e_T$ such that $e_t(j) := \mathbb{1}\{j = j_t\}$. Then, we can write

$$\mathbb{E}\Big[\sum_{t=i_s}^{i_{s+1}-1}\big(\langle w_t,\tilde\ell_t\rangle - \tilde\ell_t(j_t)\big)\,\Big|\,F^c\Big] = \mathbb{E}\Big[\sum_{t=i_s}^{i_{s+1}-1}\langle w_t - e_t, \tilde\ell_t\rangle\,\Big|\,F^c\Big]. \quad (2)$$

Further define $\bar e_t\in\Delta'_N$ such that $\bar e_t(j) := \big(1-\frac{S}{T}\big)e_t(j) + \frac{S}{NT}$. Decompose the right-hand side of Equation (2) as

$$\mathbb{E}\Big[\sum_{t=i_s}^{i_{s+1}-1}\langle w_t - \bar e_t, \tilde\ell_t\rangle\,\Big|\,F^c\Big] + \mathbb{E}\Big[\sum_{t=i_s}^{i_{s+1}-1}\langle \bar e_t - e_t, \tilde\ell_t\rangle\,\Big|\,F^c\Big].$$

Using Hölder's inequality and the fact that $\|\tilde\ell_t\|_\infty \le \frac{4\log(NT)}{\varepsilon}$, we can bound

$$\langle \bar e_t - e_t, \tilde\ell_t\rangle \le \|\bar e_t - e_t\|_1\,\|\tilde\ell_t\|_\infty \le \frac{2S}{T}\cdot\frac{4\log(NT)}{\varepsilon} = \frac{8S\log(NT)}{T\varepsilon}.$$

Plugging this in, we then have that

$$\mathbb{E}\Big[\sum_{t=1}^{T}\big(\tilde\ell_t(J_t) - \tilde\ell_t(j_t)\big)\,\Big|\,F^c\Big] \le \sum_{s=1}^{|R|+1}\mathbb{E}\Big[\sum_{t=i_s}^{i_{s+1}-1}\langle w_t - \bar e_t, \tilde\ell_t\rangle\,\Big|\,F^c\Big] + \frac{8S\log(NT)}{\varepsilon}.$$

Decompose $\langle w_t - \bar e_t, \tilde\ell_t\rangle$ as $\langle w_t - w_{t+1}, \tilde\ell_t\rangle + \langle w_{t+1} - \bar e_t, \tilde\ell_t\rangle$. By the proof of Lemma 3 in Lu & Zhang (2019), we have that $\langle w_t - w_{t+1}, \tilde\ell_t\rangle \le \eta\|\tilde\ell_t\|_\infty^2$. Thus, under event $F^c$, we have that

$$\langle w_t - w_{t+1}, \tilde\ell_t\rangle \le \frac{10\,\eta\log^2(NT)}{\varepsilon^2}.$$

Thus, we can write

$$\mathbb{E}\Big[\sum_{t=1}^{T}\big(\tilde\ell_t(J_t)-\tilde\ell_t(j_t)\big)\,\Big|\,F^c\Big] \le \sum_{s=1}^{|R|+1}\mathbb{E}\Big[\sum_{t=i_s}^{i_{s+1}-1}\langle w_{t+1}-\bar e_t, \tilde\ell_t\rangle\,\Big|\,F^c\Big] + \frac{10\,\eta T\log^2(NT)}{\varepsilon^2} + \frac{8S\log(NT)}{\varepsilon},$$

and it suffices to bound the first term on the right-hand side. We can do so by following the same steps as on pages 17-18 of Lu & Zhang (2019). Namely, under the event $F^c$, define a convex function on the clipped simplex: $f(w) := \langle w, \eta\tilde\ell_t\rangle + D_\phi(w\|w_t)$. The update rule in Algorithm 3 can now be written as $w_{t+1} = \arg\min_{w\in\Delta'_N} f(w)$. By first-order optimality, we have that $\langle w_{t+1} - \bar e_t, \nabla f(w_{t+1})\rangle \le 0$. This gives us that

$$\eta\,\langle w_{t+1} - \bar e_t, \tilde\ell_t\rangle \le \langle \bar e_t - w_{t+1}, \nabla\phi(w_{t+1}) - \nabla\phi(w_t)\rangle.$$

Thus, we can write

$$\langle w_{t+1} - \bar e_t, \tilde\ell_t\rangle \le \frac{1}{\eta}\langle \bar e_t, \nabla\phi(w_{t+1}) - \nabla\phi(w_t)\rangle - \frac{1}{\eta}\langle w_{t+1}, \nabla\phi(w_{t+1}) - \nabla\phi(w_t)\rangle = \frac{1}{\eta}\langle \bar e_t, \nabla\phi(w_{t+1})-\nabla\phi(w_t)\rangle - \frac{1}{\eta}D_\phi(w_{t+1}\|w_t) \le \frac{1}{\eta}\langle \bar e_t, \nabla\phi(w_{t+1})-\nabla\phi(w_t)\rangle,$$

where the equality is by definition of the Bregman divergence (for the negative entropy over distributions, $\langle w_{t+1}, \nabla\phi(w_{t+1}) - \nabla\phi(w_t)\rangle = D_\phi(w_{t+1}\|w_t)$) and the last inequality is due to the fact that the Bregman divergence is always non-negative. Summing over the interval, within which $\bar e_t = \bar e_{i_s}$ is constant, the sum telescopes:

$$\mathbb{E}\Big[\sum_{t=i_s}^{i_{s+1}-1}\langle w_{t+1}-\bar e_t, \tilde\ell_t\rangle\,\Big|\,F^c\Big] \le \frac{1}{\eta}\,\mathbb{E}\big[\langle \bar e_{i_s}, \nabla\phi(w_{i_{s+1}})-\nabla\phi(w_{i_s})\rangle\,\big|\,F^c\big] = \frac{1}{\eta}\,\mathbb{E}\Big[\sum_{j=1}^{N}\bar e_{i_s}(j)\log\frac{w_{i_{s+1}}(j)}{w_{i_s}(j)}\,\Big|\,F^c\Big] \le \frac{1}{\eta}\sum_{j=1}^{N}\bar e_{i_s}(j)\log\frac{NT}{S} \le \frac{\log(NT)}{\eta},$$

using that $w(j) \ge \frac{S}{NT}$ on the clipped simplex. Thus, overall, we have that

$$\mathbb{E}\Big[\sum_{t=1}^{T}\big(\tilde\ell_t(J_t)-\tilde\ell_t(j_t)\big)\,\Big|\,F^c\Big] \le \frac{(|R|+1)\log(NT)}{\eta} + \frac{10\,\eta T\log^2(NT)}{\varepsilon^2} + \frac{8S\log(NT)}{\varepsilon} \le \frac{(S+1)\log(NT)}{\eta} + \frac{10\,\eta T\log^2(NT)}{\varepsilon^2} + \frac{8S\log(NT)}{\varepsilon}.$$

To complete the proof, recall that

$$\mathbb{E}\Big[\sum_{t=1}^{T}\big(\ell_t(J_t)-\ell_t(j_t)\big)\Big] \le \mathbb{E}\Big[\sum_{t=1}^{T}\big(\tilde\ell_t(J_t)-\tilde\ell_t(j_t)\big)\,\Big|\,F^c\Big] + 1 = O\Big(\frac{S\log(NT)}{\eta} + \frac{\eta T\log^2(NT)}{\varepsilon^2} + \frac{S\log(NT)}{\varepsilon}\Big).$$

The upper bound in Theorem 5.2 follows after picking $\eta = \varepsilon\sqrt{\frac{S}{T\log(NT)}}$.

Proof (of privacy in Theorem C.2). Let $\ell_1,\dots,\ell_T$ and $\ell'_1,\dots,\ell'_T$ be two sequences of neighboring loss vectors. Suppose they differ at time step $t^\star$. Observe that the plays of Algorithm 3 are a post-processing of the noisy losses $\tilde\ell_1,\dots,\tilde\ell_T$ and $\tilde\ell'_1,\dots,\tilde\ell'_T$.
The distributions of the noisy losses for the two neighboring sequences are identical except on round $t^\star$. However, since each loss vector has sensitivity 1, by the Laplace mechanism and Lemma 2.2, we know that the output distribution of the noisy loss vector in round $t^\star$ is $\varepsilon$-differentially private. Thus, the overall algorithm is also $\varepsilon$-differentially private.

Theorem 5.2 in the main text follows by composing Theorem C.1 and Theorem C.2.

D. Proof of Theorem 5.1

Before we prove Theorem 5.1, we recap the lower bound from Asi et al. (2023b).

Proposition D.1 (Lower Bound on Expected Regret for Adaptive Adversaries (Asi et al., 2023b)). Let $T$ be sufficiently large and $N \ge 2^T$. Let $\varepsilon \le 1$ and $\delta \le \frac{1}{T^3}$. If $A$ is $(\varepsilon,\delta)$-differentially private, then

$$R^{\mathrm{adap}}_A(T, N) = \Omega\Big(\min\Big\{T, \frac{1}{(\varepsilon\log T)^2}\Big\}\Big).$$

As mentioned in the preliminaries, an adaptive adversary for $A$ for time horizon $T$ is simply a sequence of functions $f_1, f_2,\dots,f_T$ such that at time point $t\in[T]$, the function $f_t : [N]\times[N]^{t-1}\to[0,1]$ maps the past plays $J_1,\dots,J_{t-1}$ of the learning algorithm to a loss vector $f_t(\cdot, J_{1:t-1})\in[0,1]^N$. Likewise, an online learning algorithm $A$ for time horizon $T$ is a function $A : ([0,1]^N\times[N])^\star\to\Delta_N$ which, at time point $t\in[T]$, takes in the past loss vectors $\ell_1,\dots,\ell_{t-1}$ and its own past plays $J_1,\dots,J_{t-1}$, and outputs a distribution in $\Delta_N$. We will use these representations of an adaptive adversary and an algorithm to prove a lower bound on expected dynamic regret for adaptive adversaries.

Proof (of Theorem 5.1). Fix $S \ge 0$ and suppose without loss of generality that $S+1$ divides $T$. Let $T' = \frac{T}{S+1}$. Let $\varepsilon \le 1$ and $\delta \le \big(\frac{1}{T'}\big)^3$. Let $A$ be any $(\varepsilon,\delta)$-differentially private online learning algorithm. Then, by Proposition D.1, there exists a sequence of functions $f^1_1, f^1_2,\dots,f^1_{T'}$ such that

$$\mathbb{E}\Big[\sum_{t=1}^{T'} f^1_t(J_t, J_{1:t-1}) - \min_{j^\star_1\in[N]}\sum_{t=1}^{T'} f^1_t(j^\star_1, J_{1:t-1})\Big] \ge \Omega\Big(\min\Big\{T', \frac{1}{(\varepsilon\log T')^2}\Big\}\Big),$$

where $J_t$ are the random variables denoting the predictions of $A$ on rounds $t\in[T']$. However, now observe that we can use Proposition D.1 again, starting at round $t = T'+1$, with respect to the new internal state of $A$ at round $T'+1$ after fixing $J_1,\dots,J_{T'}$. That is, by fixing $J_1,\dots,J_{T'}$, the algorithm $A$ induces a new online learning algorithm $A' : ([0,1]^N\times[N])^\star\to\Delta_N$ such that on input $(\ell_1, i_1),\dots,(\ell_n, i_n)\in([0,1]^N\times[N])^\star$ we have that

$$A'\big((\ell_1,i_1),\dots,(\ell_n,i_n)\big) := A\big((f^1_1(\cdot), J_1), (f^1_2(\cdot, J_1), J_2),\dots,(f^1_{T'}(\cdot, J_{1:T'-1}), J_{T'}), (\ell_1,i_1),\dots,(\ell_n,i_n)\big).$$

By post-processing, we have that $A'$ is also $(\varepsilon,\delta)$-differentially private. Note that $A'$ is random, as it is a function of $J_1,\dots,J_{T'}$. Nevertheless, Proposition D.1 guarantees the existence of a sequence of functions $\bar f_1, \bar f_2,\dots,\bar f_{T'}$ for $A'$ such that

$$\mathbb{E}\Big[\sum_{t=T'+1}^{2T'} \bar f_{t-T'}(J_t, J_{T'+1:t-1}) - \min_{j^\star_2\in[N]}\sum_{t=T'+1}^{2T'} \bar f_{t-T'}(j^\star_2, J_{T'+1:t-1})\Big] \ge \Omega\Big(\min\Big\{T', \frac{1}{(\varepsilon\log T')^2}\Big\}\Big),$$

where now $J_{T'+1},\dots,J_{2T'}$ are the random variables denoting the predictions of $A'$. Recall that $\bar f_1,\dots,\bar f_{T'}$ are a function of the realized values of $J_1,\dots,J_{T'}$, and hence are fixed once one specifies $J_1,\dots,J_{T'}$. Thus, we can define $f_{T'+1},\dots,f_{2T'}$ such that for every $t\in[T'+1 : 2T']$ and any $j_{1:t-1}\in[N]^{t-1}$, we have that $f_t(\cdot, j_{1:t-1}) := \bar f_{t-T'}(\cdot, j_{T'+1:t-1})$, where $\bar f_1,\dots,\bar f_{T'}$ is the aforementioned strategy of the adversary when one fixes $J_1 = j_1,\dots,J_{T'} = j_{T'}$. Now, observe that $f_{T'+1},\dots,f_{2T'}$ are not random and can be computed by the adversary before the game begins.
Moreover, by construction, we have that

$$\mathbb{E}\Big[\sum_{s=1}^{2}\sum_{t=(s-1)T'+1}^{sT'} f_t(J_t, J_{1:t-1}) - \min_{j^\star_{1:2T'}\in C(2T',1)}\sum_{t=1}^{2T'} f_t(j^\star_t, J_{1:t-1})\Big] \ge \Omega\Big(2\min\Big\{T', \frac{1}{(\varepsilon\log T')^2}\Big\}\Big).$$

Repeating this same argument $S$ times gives a sequence of functions $f_1, f_2,\dots,f_T$, defining the strategy of the adaptive adversary, such that

$$\mathbb{E}\Big[\sum_{s=1}^{S+1}\sum_{t=(s-1)T'+1}^{sT'} f_t(J_t, J_{1:t-1}) - \min_{j^\star_{1:T}\in C(T,S)}\sum_{t=1}^{T} f_t(j^\star_t, J_{1:t-1})\Big] \ge \Omega\Big((S+1)\min\Big\{T', \frac{1}{(\varepsilon\log T')^2}\Big\}\Big) = \Omega\Big(\min\Big\{T, \frac{S}{(\varepsilon\log\frac{T}{S+1})^2}\Big\}\Big).$$

This completes the proof.