# Projection-Free Bandit Optimization with Privacy Guarantees

Alina Ene,¹ Huy L. Nguyen,² Adrian Vladu³
¹ Department of Computer Science, Boston University
² Khoury College of Computer and Information Science, Northeastern University
³ CNRS & IRIF, Université de Paris
aene@bu.edu, hu.nguyen@northeastern.edu, vladu@irif.fr

## Abstract

We design differentially private algorithms for the bandit convex optimization problem in the projection-free setting. This setting is important whenever the decision set has a complex geometry, and access to it is done efficiently only through a linear optimization oracle, hence Euclidean projections are unavailable (e.g. matroid polytope, submodular base polytope). This is the first differentially private algorithm for projection-free bandit optimization, and in fact our bound matches the best known non-private projection-free algorithm and the best known private algorithm, even for the weaker setting when projections are available.

## Introduction

Online learning is a fundamental optimization paradigm employed in settings where one needs to make decisions in an uncertain environment. Such methods are essential for a range of practical applications: ad serving (McMahan et al. 2013), dynamic pricing (Lobel, Leme, and Vladu 2018; Mao, Leme, and Schneider 2018), or recommender systems (Abernethy et al. 2007) are only a few examples. These techniques are highly dependent on access to certain user data, such as search history, list of contacts, etc., which may expose sensitive information about a particular person. As these tools become ubiquitous on the internet, one can witness a surge in the collection of user data at massive scales. This is a tremendous problem, since by obtaining information about the behavior of algorithms run on these data, adversarial entities may learn potentially sensitive information. This could then be traced to a particular user, even if the users were anonymized to begin with (Dwork, Roth et al. 2014).

To mitigate the threat of diminishing user privacy, one can leverage the power of differential privacy (Dwork et al. 2006), a notion of privacy which ensures that the output of an algorithm is not sensitive to the presence of a particular user's data. Therefore, based on this output, one cannot determine whether a user presents one or more given attributes.

Differentially private learning algorithms have been studied in several settings, and a large number of recent works addressed the challenge of designing general optimization primitives with privacy guarantees (Jain, Kothari, and Thakurta 2012; Agarwal and Singh 2017; Bassily, Smith, and Thakurta 2014; Abadi et al. 2016; Wang, Ye, and Xu 2017; Iyengar et al. 2019). In this paper, we further advance this line of research by offering differentially private algorithms for a very general task: the bandit convex optimization problem in the case where the space of decisions that the learning algorithm can make exhibits complex geometry.
Bandit convex optimization is an extremely general framework for online learning, motivated by the natural setting where, after making a decision, the algorithm only learns the loss associated with its action, and nothing about the other possible decisions it could have made (as opposed to the weaker full-information model, where the losses associated with all possible decisions are revealed). Algorithms for this problem are highly dependent on the geometric properties of the space of decisions, and their performance usually depends on the ability to perform certain projections onto this space (Ben-Tal and Nemirovski 2001; Jaggi 2013). For large-scale problems, this requirement may be prohibitive, as decisions may have to satisfy certain constraints (the set of recommendations must be diverse enough, or the set of ads to be displayed must satisfy a given budget). Projection-free methods overcome this issue by exploiting the fact that some canonical decision sets often encountered in applications (matroid polytope, submodular base polytope, flow polytope, convex relaxations of low-rank matrices) have efficient linear optimization oracles. One can therefore use these efficient oracles in conjunction with a projection-free method to obtain algorithms that can be deployed for real-world applications.

In this work we bridge the requirements of privacy and efficiency for online learning, building on the works of (Garber and Kretzu 2020; Garber and Hazan 2013b), and obtain the first differentially private algorithm for projection-free bandit optimization. To do so we leverage a generic framework for online convex optimization in the presence of noise, which we then adapt to our specific setting in a modular fashion.

**Our Contributions.** We give the first differentially private algorithm for the bandit convex optimization problem in the projection-free setting (we defer the definition of $(\varepsilon, \delta)$-privacy to Definition 2 and the problem statement to the Preliminaries section). We summarize the regret guarantees of our algorithm in the following theorem and compare them with the state of the art guarantees in the private and non-private settings. Our main focus is on the dependency on the dimension $n$ of the ambient space, the number $T$ of iterations, and the privacy budget $\varepsilon$. For ease of comparison, we use the $\tilde O$ notation to hide poly-logarithmic factors in $n$ and $T$, as well as parameters such as the Lipschitz constant of the loss functions. The precise guarantees can be found in Lemma 8 (for $(\varepsilon, 0)$-privacy) and Lemma 11 (for $(\varepsilon, \delta)$-privacy).

**Theorem 1.** *Let $D \subseteq \mathbb{R}^n$ be a convex domain for which we have access to a linear optimization oracle. Assume that for every $t \ge 1$, $f_t$ is convex and $L$-Lipschitz. Furthermore, suppose that $\max_{x,y \in D} \|x - y\| \le D$. Then there exists an algorithm PRIVATEBANDIT (Algorithm 1) which performs projection-free convex optimization in the bandit setting such that one of the following two properties holds:*

- *the algorithm is $(\varepsilon, 0)$-differentially private and, assuming $L = O(1)$ and $D = O(1)$, achieves an expected regret of $R_T = \tilde O\left(T^{3/4} n^{3/2} / \varepsilon\right)$;*
- *the algorithm is $(\varepsilon, \delta)$-differentially private and, assuming $L = O(1)$ and $D = O(1)$, achieves an expected regret of $R_T = \tilde O\left((T^{3/4} n^{1/2} + T^{1/2} n)\,\varepsilon^{-1} \log^{O(1)}(1/\delta)\right)$ whenever $\delta = 1/(n + T)^{O(1)}$.*
In the non-private setting, the state of the art regret guarantee for projection-free bandit optimization is $\tilde O(n^{1/2} T^{3/4})$, due to Garber and Kretzu (2020). (Their paper allows for a tradeoff among the parameters; this bound is optimized for the case when $T \ge n$.) The regret guarantee of our algorithm matches the guarantee of Garber and Kretzu up to an $n/\varepsilon$ factor in the $(\varepsilon, 0)$ regime, and up to a $1/\varepsilon$ factor in the $(\varepsilon, \delta)$ regime whenever $T \ge n^2$.

Prior works in the private setting require projections to be available. The state of the art guarantees for private bandit optimization with projections are achieved by the work of Thakurta and Smith (2013). Thakurta and Smith focus on $(\varepsilon, 0)$-privacy and obtain a regret bound of $\tilde O(n\, T^{3/4}/\varepsilon)$. A variant of their algorithm can be used for $(\varepsilon, \delta)$-privacy and obtains a regret bound of $\tilde O(\sqrt{n}\, T^{3/4}/\varepsilon)$. Our algorithm's guarantee matches the best guarantee with projections for $(\varepsilon, \delta)$-privacy and is worse by a $\sqrt{n}$ factor for $(\varepsilon, 0)$-privacy. We leave it as an interesting open problem to improve the bound for $(\varepsilon, 0)$-privacy to match the one using projections.

**Our Techniques.** In the process of obtaining our main result, we develop a common abstraction of noisy mirror descent that captures both online bandit optimization and private optimization (the NOISYOCO framework). This allows us to analyze the impact of the noise introduced to protect privacy on the regret of the online optimization. Once the framework is set up, it only remains to analyze the noise level needed to ensure the appropriate privacy guarantee, and one immediately obtains the corresponding regret bound.

However, analyzing the noise is in itself a non-trivial challenge. In the case of $(\varepsilon, \delta)$-privacy, we give a strong concentration bound allowing us to match the privacy-regret tradeoff achieved with projections (see Lemmas 9 and 10). By using this concentration bound and ignoring the tail of the distribution, we obtain stronger results under $(\varepsilon, \delta)$-differential privacy than the straightforward approach. In contrast, under $(\varepsilon, 0)$-differential privacy one cannot ignore what happens in the tail of the distribution, and understanding the algorithm in that regime seems difficult.

Our algorithms use established techniques from differential privacy, such as the Laplace and Gaussian mechanisms and tree-based aggregation. However, integrating these techniques with optimization methods requires some degree of care. For example, our $(\varepsilon, \delta)$-privacy bound is derived using a matrix concentration inequality which crucially relies on the randomized smoothing technique used for obtaining gradient estimates. This is a key ingredient for obtaining the correct $\tilde O(n^{1/2})$ dependence on the dimension in the $(\varepsilon, \delta)$ regime.

Our algorithms can be viewed as a proof of concept for a general approach to deriving differentially private optimization methods. Previous results in this area can be recovered by following our approach: inject the maximum amount of noise that does not change the convergence rate asymptotically, then analyze the privacy loss. This is very simple, but it paves the way for further development of practical differentially private algorithms without requiring major changes in their implementation: simply replace certain components of the algorithm with a black-box implementation of the required differentially private mechanism. We believe our framework is general, and that it facilitates further progress in differentially private optimization.
We demonstrate our framework by instantiating it with the Laplace mechanism (to obtain an $(\varepsilon, 0)$-private algorithm) and with the Gaussian mechanism (to obtain an $(\varepsilon, \delta)$-private algorithm). It would be interesting to apply our framework to other notions of differential privacy, such as concentrated differential privacy (Dwork and Rothblum 2016; Bun and Steinke 2016) and Rényi differential privacy (Mironov 2017).

**Other Related Work.** The theory of online convex optimization is truly extensive and has seen many developments in recent years. Here we only discuss the relevant works that involve projection-free and/or differentially private online learning algorithms. The class of projection-free online learning algorithms was initiated by the work of Hazan and Kale (2012) in the context of online convex optimization, where full gradients are revealed after making a decision. This was further extended to multiple regimes (Garber and Hazan 2013b,a, 2015), including the bandit setting (Chen, Zhang, and Karbasi 2018; Garber and Kretzu 2020).

As discussed above, Thakurta and Smith (2013) achieve the state of the art regret guarantee for $(\varepsilon, 0)$-private online bandit optimization when projections are available. For general Lipschitz functions, their regret is $\tilde O(n\, T^{3/4}/\varepsilon)$. In the specific case where the adversary is oblivious and the loss functions are strongly convex, they improve this to $\tilde O(n\, T^{2/3}/\varepsilon)$. In a different line of work, Agarwal and Singh (2017) obtained improved bounds for the case where the losses are linear functions and for the multi-armed bandit problem (a generalization of the learning with experts framework), with regret $\tilde O(T^{2/3}/\varepsilon)$ and $\tilde O(n\, T^{2/3}/\varepsilon)$, respectively. These results, however, concern only a restricted class of bandit optimization problems, so for the general bandit convex optimization problem the result of Thakurta and Smith (2013) still stands as the best one. In fact, even in the non-private setting, improving the regret of the bandit convex optimization problem from $\tilde O(T^{3/4})$ to $\tilde O(T^{2/3})$ (Awerbuch and Kleinberg 2004; Dani and Hayes 2006) or below (Dani, Kakade, and Hayes 2008; Abernethy, Hazan, and Rakhlin 2008; Bubeck, Lee, and Eldan 2017) requires stronger access to the set of actions than just projections (such as via a self-concordant barrier), and involves performing expensive computations. Indeed, Bubeck, Lee, and Eldan (2017) are the first to achieve both optimal regret $\tilde O(T^{1/2})$ and polynomial running time per iteration.

## Preliminaries

**Bandit Convex Optimization.** In the bandit convex optimization problem, an algorithm iteratively selects actions $x_t$ (using a possibly randomized strategy) from a convex set $D \subseteq \mathbb{R}^n$. After selecting an action, the loss caused by this choice, $f_t(x_t)$, is revealed, where $f_t : D \to \mathbb{R}$ is a convex function unknown to the algorithm. After $T$ iterations, the algorithm compares its total loss $\sum_{t=1}^T f_t(x_t)$ to the smallest loss it could have incurred by choosing a fixed strategy throughout all the iterations, $\min_{x \in D} \sum_{t=1}^T f_t(x)$. The difference between these two losses is called the regret:
$$R_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in D} \sum_{t=1}^{T} f_t(x),$$
and the goal is to minimize its expectation over the randomized choices of the algorithm.
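To make this interaction concrete, the following minimal sketch simulates the bandit protocol and computes the realized regret. The `learner` interface and `domain_argmin` helper are illustrative placeholders, not part of the algorithms in this paper.

```python
def bandit_regret(learner, losses, domain_argmin):
    """Simulate T rounds of bandit feedback and compute the regret.

    losses: a list of loss functions f_1, ..., f_T (each maps an action to a scalar).
    learner.act() proposes an action x_t in D; only the scalar f_t(x_t) is revealed
    to the learner, never the function f_t itself or its gradient.
    domain_argmin(losses) returns the best fixed action in hindsight.
    """
    played = []
    for f in losses:
        x = learner.act()           # decision x_t, chosen from D
        played.append((f, x))
        learner.feedback(f(x))      # bandit feedback: a single scalar
    x_star = domain_argmin(losses)  # comparator: argmin_x sum_t f_t(x)
    return sum(f(x) for f, x in played) - sum(f(x_star) for f in losses)
```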
**Differential Privacy.** Differential privacy (Dwork et al. 2006) is a rigorous framework used to control the amount of information leaked when performing computations on a private data set. In our framework, we seek algorithms which ensure that the amount of information an adversary can learn about a particular loss function $f_t$ is minimal, i.e., that the output is almost independent of whether $f_t$ appears in the sequence of loss functions occurring throughout the execution of the algorithm. For completeness, we define differential privacy in the context of the loss functions encountered in the bandit convex optimization problem.

**Definition 2** ($(\varepsilon, \delta)$-differential privacy). *A randomized online learning algorithm $\mathcal{A}$ on the action set $D$ is $(\varepsilon, \delta)$-differentially private if for any two sequences of loss functions $F = (f_1, \ldots, f_T)$ and $F' = (f'_1, \ldots, f'_T)$ differing in at most one element, for all $S \subseteq D^T$ it holds that*
$$\Pr[\mathcal{A}(F) \in S] \le e^{\varepsilon} \Pr[\mathcal{A}(F') \in S] + \delta.$$

One obstacle that arises in the context of bandit optimization is that changing a single loss function may alter the set of actions returned in the future by the algorithm.

**The Projection-Free Setting.** While classical online optimization methods have a long history of development, they rely in general on the ability to perform projections onto the feasible set $D$ of actions. For example, one may want to choose actions that correspond to points inside a matroid polytope, or other complicated domains. In such situations, it is computationally infeasible to perform projections onto $D$, and designing algorithms where all the actions lie inside this domain becomes a challenging task. In the case of online optimization, this issue is mitigated by projection-free methods (Jaggi 2013; Garber and Hazan 2015; Dudik, Harchaoui, and Malick 2012; Shalev-Shwartz, Gonen, and Shamir 2011), where the high complexity of the description of $D$ is balanced by the existence of a linear optimization oracle over this domain. Among these, the conditional gradient method (also known as Frank-Wolfe) (Bubeck et al. 2015) is the best known. In our setting, we treat the case where, although $D$ may be very complicated, we have access to a linear optimization oracle that, given any direction $v \in \mathbb{R}^n$, returns $\arg\min_{x \in D} \langle v, x \rangle$. Such oracles are readily available for interesting domains such as the matroid polytope or the submodular base polytope.

**Parameters and Assumptions.** We write vectors and matrices in bold. We use $\langle x, y \rangle$ to denote inner products, and $\|x\| = (\sum_i x_i^2)^{1/2}$ to denote the $\ell_2$ norm of a vector $x$. When considering norms other than $\ell_2$ we specify them explicitly: $\|x\|_p = (\sum_i |x_i|^p)^{1/p}$. We let $B^n_p$ be the $n$-dimensional unit $\ell_p$ ball and $S^n_p$ the boundary of $B^n_p$, i.e. the $n$-dimensional unit $\ell_p$ sphere. We consider optimizing over a convex domain $D \subseteq \mathbb{R}^n$ for which we have access to a linear optimization oracle. We define the diameter of the domain as $D = \max_{x,y \in D} \|x - y\|$. We say that a function $f : D \to \mathbb{R}$ is $L$-Lipschitz iff $|f(x) - f(y)| \le L \|x - y\|$ for all $x, y \in D$, and that a differentiable function $f$ is $\beta$-smooth iff $\|\nabla f(x) - \nabla f(y)\| \le \beta \|x - y\|$ for all $x, y \in D$. We say that $f$ is $\alpha$-strongly convex iff $\|\nabla f(x) - \nabla f(y)\| \ge \alpha \|x - y\|$. In our setting, all functions $f_t$ satisfy the standard assumption of being $L$-Lipschitz. Just like in prior works (Thakurta and Smith 2013; Agarwal and Singh 2017), we further assume that the number of iterations $T$ we run the algorithm for is known ahead of time. This assumption can be eliminated via a standard reduction using the doubling trick (see (Auer et al. 1995) and (Chen, Zhang, and Karbasi 2018)), which invokes the base algorithm repeatedly while doubling the horizon $T$ at each invocation, at the expense of an extra $O(\log T)$ factor in the privacy loss; a sketch of this reduction appears below.
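The following minimal sketch illustrates the doubling trick; `run_base_algorithm` is an assumed primitive that plays one fixed-horizon instance of the base algorithm, and is not part of the paper's pseudocode.

```python
def doubling_trick(run_base_algorithm, total_rounds):
    """Run a fixed-horizon online algorithm without knowing T in advance.

    run_base_algorithm(T) plays exactly T rounds. Restarting on horizons
    1, 2, 4, ... uses O(log T) invocations; since each restart consumes a
    fresh privacy budget, the privacy loss grows by an O(log T) factor,
    while regret bounds of the form O(T^c), c < 1, only degrade by a
    constant factor when summed over the geometric epochs.
    """
    played, horizon = 0, 1
    while played < total_rounds:
        T = min(horizon, total_rounds - played)
        run_base_algorithm(T)   # fresh instance for this epoch
        played += T
        horizon *= 2
    return played
```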
We further assume that all $f_t$'s are defined within a region that slightly exceeds the boundary of $D$. This assumption is required since one of the techniques employed here requires having $f_t$ defined over $D + \zeta B^n_2$ for a small scalar $\zeta$. The assumption can be removed via a simple scaling trick whenever $D$ contains an $\ell_2$ ball centered at the origin (similarly to (Garber and Kretzu 2020)); we explain how to do so in the full version (Ene, Nguyen, and Vladu 2021). Finally, in order to be able to appropriately privatize the losses $f_t(x_t)$, we need to bound their magnitude. To this end, we assume that each $f_t$ achieves $0$ loss at some point within $D$, which via the Lipschitz condition and the diameter bound automatically implies that $|f_t(x_t)| \le LD$ for all $t$. Other related works (Flaxman, Kalai, and McMahan 2004; Agarwal, Dekel, and Xiao 2010; Thakurta and Smith 2013) simply use a fixed upper bound $|f_t(x_t)| \le B$ for some fixed parameter $B$, but we prefer this convention in order to reduce the number of parameters to control, since we are focused mainly on the dependence of the regret on $T$, $n$, and $\varepsilon$.

**Mirror Maps and the Fenchel Conjugate.** In general, convex optimization implicitly relies on the existence of a mirror map $\omega : D \to \mathbb{R}$ with desirable properties (see (Ben-Tal and Nemirovski 2001) for an extensive treatment of these objects). This is used in order to properly interface iterates and gradient updates, since in Banach spaces these are of different types. In this paper, we use $\omega(x) = \frac{1}{2}\|x\|_2^2$, although other choices can be used depending on the geometry of $D$. We define the Fenchel conjugate of $\omega$ as $\omega^* : \mathbb{R}^n \to \mathbb{R}$ such that
$$\omega^*(y) = \max_{x \in D}\ \langle y, x \rangle - \omega(x). \qquad (0.1)$$
Furthermore, if $\omega$ is strongly convex, then $\omega^*$ is smooth and differentiable (Nesterov 2005), and its gradient satisfies
$$\nabla \omega^*(y) = \arg\max_{x \in D}\ \langle y, x \rangle - \omega(x). \qquad (0.2)$$

**Smoothing.** We use the randomized smoothing technique from (Flaxman, Kalai, and McMahan 2004) in order to smooth the loss functions $f_t$. This technique is crucial for obtaining gradient estimators despite having only value access.

**Lemma 3** ((Flaxman, Kalai, and McMahan 2004)). *Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex and $L$-Lipschitz function. Then the smoothing $\hat f(x) = \mathbb{E}_{u \sim B^n_2}\, f(x + \zeta u)$ satisfies the following properties: (1) $|f(x) - \hat f(x)| \le \zeta L$; (2) $\hat f$ is convex and $L$-Lipschitz; (3) $\nabla \hat f(x) = \frac{n}{\zeta}\, \mathbb{E}_{u \sim S^n_2}\, f(x + \zeta u)\, u$.*

**Tree-Based Aggregation.** An essential ingredient of the algorithm is maintaining partial sums of the gradient estimators witnessed so far. We use a variant of the algorithm from (Dwork et al. 2010; Jain, Kothari, and Thakurta 2012), as implemented in (Agarwal and Singh 2017). We use the algorithm as a black box and only rely on the properties stated in Theorem 4 below. We include a description of the TREEBASEDAGG algorithm in the full version (Ene, Nguyen, and Vladu 2021) for completeness.

**Theorem 4** ((Jain, Kothari, and Thakurta 2012; Agarwal and Singh 2017)). *Let $\{\ell_t\}_{t=1}^T$ be a sequence of vectors in $\mathbb{R}^n$, and let $Y_1$ and $Y_2$ be promises such that $\|\ell_t\|_1 \le Y_1$ and $\|\ell_t\|_2 \le Y_2$ for all $t$. Let $\varepsilon, \delta > 0$, $\lambda \ge Y_1 \log T / \varepsilon$, and $\sigma \ge Y_2 \log T \log(T/\delta) / \varepsilon$. There is an algorithm, TREEBASEDAGG, that first outputs $\hat L_0$ and then iteratively takes $\ell_t$ as input and returns an approximate partial sum $\hat L_t$ for $1 \le t \le T$. The algorithm can be specified with a noise distribution $P$ over $\mathbb{R}^n$ so that the output sequence $\{\hat L_t\}_{t=1}^T$ satisfies $\hat L_t = \sum_{s=1}^t \ell_s + \sum_{r=1}^{\log T} Z_r$, where $Z_r \sim P$, and furthermore:*

- *when $P$ is coordinate-wise $\mathrm{Lap}(0, \lambda)$, the sequence $\{\hat L_t\}_{t=1}^T$ is $(\varepsilon, 0)$-differentially private;*
- *when $P$ is coordinate-wise $N(0, \sigma^2)$, the sequence $\{\hat L_t\}_{t=1}^T$ is $(\varepsilon, \delta)$-differentially private.*
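For intuition, here is a minimal sketch of the binary-tree mechanism underlying TREEBASEDAGG, assuming a generic coordinate-wise noise sampler; the class name and interface are illustrative, not the implementation used by (Agarwal and Singh 2017). Each released prefix sum aggregates the true inputs plus at most $\lceil \log_2 T \rceil$ independent noise vectors, matching the form of Theorem 4.

```python
import numpy as np

class TreeBasedAggSketch:
    """Sketch of tree-based aggregation (Dwork et al. 2010).

    A level-l node covers a dyadic block of 2^l consecutive inputs and is
    published once, with fresh noise; the prefix [1, t] decomposes into
    O(log T) closed dyadic blocks, so each output carries O(log T) noise terms.
    """

    def __init__(self, T, n, noise):        # noise(n) -> one sample from P
        self.n, self.noise = n, noise
        self.levels = int(np.ceil(np.log2(max(T, 2)))) + 1
        self.open_sums = [np.zeros(n) for _ in range(self.levels)]
        self.noisy_nodes = {}                # (level, block index) -> noisy block sum
        self.t = 0

    def update_and_release(self, v):
        """Ingest vector v at time t and return a noisy partial sum of v_1..v_t."""
        self.t += 1
        for l in range(self.levels):
            self.open_sums[l] += v
            if self.t % (1 << l) == 0:       # a level-l block just closed
                self.noisy_nodes[(l, self.t >> l)] = self.open_sums[l] + self.noise(self.n)
                self.open_sums[l] = np.zeros(self.n)
        out, s = np.zeros(self.n), 0
        for l in reversed(range(self.levels)):   # binary expansion of t
            if self.t - s >= (1 << l):
                s += 1 << l
                out += self.noisy_nodes[(l, s >> l)]
        return out
```

Since each input touches $O(\log T)$ tree nodes, instantiating `noise` with coordinate-wise Laplace noise of scale $Y_1 \log T / \varepsilon$ corresponds to the $(\varepsilon, 0)$ case of Theorem 4.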
## Algorithm

The algorithm is described in Algorithm 1. It builds on the work of Garber and Kretzu (2020) and uses the smoothing (Lemma 3) and tree-based aggregation (Theorem 4) routines designed in previous work (see the Preliminaries section). The algorithm follows the structure of an online mirror descent algorithm. It performs a sequence of iterations, and in each iteration it makes a guess $x_t$ based on the previous outcomes. The iterations are divided into $T_{\text{round}}$ batches, each of size $T_{\text{batch}}$ (thus $T = T_{\text{round}} \cdot T_{\text{batch}}$). Each batch $R$ is treated as a round of online mirror descent, with a twist: in parallel, we compute the best regularized response $\tilde x_R$ for the outcomes revealed in the first $R - 1$ batches (line 14), and use the previously computed $\tilde x_{R-1}$ for all iterations in batch $R$ (lines 6 to 12). Three remarks are in order:

- Computing the best regularized response to previous outcomes requires maintaining the sum of gradients from previous rounds. The tree-based aggregation method (Theorem 4) is used to maintain these sums accurately while preserving privacy (line 13).
- In each iteration of a batch, the algorithm only has access to the function value, not the gradient, so we use the smoothing technique (Lemma 3): the function value at a perturbation of $\tilde x_{R-1}$ is used to obtain a stochastic estimate of the gradient of a smoothed proxy of the objective function. Thus each iteration in the same batch uses a different perturbation of the same response $\tilde x_{R-1}$; a sketch of this estimator appears after the algorithm.
- We only compute an approximation of the best regularized response, using the conditional gradient method in line 14. The precision to which this is computed is chosen in such a way that the number of iterations required by conditional gradient matches the number of iterations in a batch, so that we can charge each call to the linear optimization oracle over $D$ to one iteration of the bandit algorithm.

The algorithm must be specified with a noise distribution $P$ over $\mathbb{R}^n$, which we use to privatize the partial sums and thereby strike the right tradeoff between privacy and regret. To obtain an $(\varepsilon, 0)$-private algorithm, we set $P$ to be coordinate-wise Laplace noise $\mathrm{Lap}(0, \lambda)$. To obtain an $(\varepsilon, \delta)$-private algorithm, we set $P$ to be coordinate-wise Gaussian noise $N(0, \sigma^2)$. The precise choices of the parameters $\lambda$ and $\sigma^2$ are established in Lemmas 8 and 11.

**Algorithm 1** PRIVATEBANDIT($T$, $P$, $D$)
*Input:* time horizon $T$, symmetric noise distribution $P$, diameter of domain $D$.

```
 1: T_round = T^(1/2), T_batch = T / T_round, η = D / (T^(3/4) n^(1/2) L), ζ = n^(1/2) D / T^(1/4)
 2: initialize TREEBASEDAGG for a sequence of length T_round and noise P
 3: for R = 1 to T_round do
 4:   execute in parallel:
 5:     g̃_R = 0
 6:     for r = 1 to T_batch do
 7:       t = (R - 1) T_batch + r
 8:       sample u_t uniformly from the unit sphere S^n_2
 9:       x_t = x̃_{R-1} + ζ u_t
10:       query F_t = (n / ζ) f_t(x̃_{R-1} + ζ u_t)
11:       g̃_R = g̃_R + F_t u_t
12:     end for
13:     // update the partial sum of noisy gradients:
        s̃_R = TREEBASEDAGG(g̃_R, R)
14:     solve min_{x ∈ D} (1/2) ‖x‖₂² − ⟨η s̃_{R−1}, x⟩ via conditional gradient,
        to precision ε_cg = D / T_batch^(1/2); let x̃_R be the output
15: end for
```

Our algorithm can be viewed as a slight simplification and generalization of the algorithm of Garber and Kretzu (2020). Specifically, we do not require any specific stopping conditions and case analysis for solving the inner conditional gradient routine, and we can extend it to more general geometries defined by the mirror map. Additionally, the NOISYOCO framework allows us to handle the noise introduced by the differentially private mechanisms without making any further changes to the algorithm or its analysis.
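The per-iteration queries in lines 8-11 implement the one-point gradient estimator of Lemma 3. A minimal sketch, with an illustrative function name:

```python
import numpy as np

def smoothed_grad_estimate(f, x, zeta, rng):
    """One-point gradient estimate via randomized smoothing (Lemma 3).

    Returns (n / zeta) * f(x + zeta * u) * u for u uniform on the unit sphere,
    which is an unbiased estimator of the gradient of the smoothed loss
    f_hat(x) = E_{u ~ B} f(x + zeta * u).
    """
    n = x.shape[0]
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)            # uniform random direction on S^n_2
    return (n / zeta) * f(x + zeta * u) * u
```

Summing $T_{\text{batch}}$ such estimates at the same center $\tilde x_{R-1}$ produces the vector $\tilde g_R$ that Algorithm 1 feeds to the tree-based aggregation.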
This framework may be of further interest for designing differentially private optimization methods. In the following section, we show the following regret guarantee.

**Lemma 5.** *Let $\mu = \mathbb{E}_{X \sim P} \|X\|$, and let $D$ be the diameter of the domain $D$, $n$ the ambient dimension, and $L$ an upper bound on the Lipschitz constant of the loss functions $f_t$. Algorithm PRIVATEBANDIT achieves a regret of*
$$R_T = O\left(T^{3/4} n^{1/2} L D + T^{1/4} D \mu \log T\right).$$

## Noisy Mirror Descent Framework and Regret Analysis

In this section, we sketch the regret analysis for Algorithm 1. We derive the algorithm's regret guarantee via the NOISYOCO framework, a meta-algorithm for online convex optimization with noise. We show that Algorithm 1 is an instantiation of this meta-algorithm and derive its regret guarantees from the guarantee for NOISYOCO.

### Noisy Mirror Descent Framework

Here we describe and analyze the NOISYOCO algorithm (Algorithm 2) for online convex optimization with noise. We assume that we perform online convex optimization over a convex domain $D$ endowed with a strongly convex mirror map $\omega : D \to \mathbb{R}$ such that $\max_{x \in D} \omega(x) \le D_\omega^2$.

**Algorithm 2** NOISYOCO($T$)
*Input:* time horizon $T$.

```
1: η = D_ω / (κ T^(1/2)), s_0 = 0
2: for t = 1 to T do
3:   x̃_t = NOISYMAP(−η s_{t−1})
4:   output x̃_t and query g̃_t = NOISYGRAD(f_t, x̃_t)
5:   s_t = s_{t−1} + g̃_t
6: end for
```

We also assume $(\kappa, \gamma)$-noisy gradient access, defined as follows:

- a noisy gradient oracle for $f_t$: given $x$, it returns a randomized $\tilde g = \text{NOISYGRAD}(f_t, x)$ such that $\mathbb{E}\,\tilde g = \nabla f_t(x)$ and $\mathbb{E}\,\|\tilde g\|^2 \le \kappa^2$;
- a noisy gradient oracle for $\omega^*$: given $g$, it returns a randomized $\tilde x = \text{NOISYMAP}(g)$ such that $\mathbb{E}\,\|\nabla \omega^*(g) - \tilde x\| \le \gamma$.

Under these conditions we can derive the following regret guarantee. We give the proof in the full version (Ene, Nguyen, and Vladu 2021).

**Lemma 6.** *Given an instance of online convex optimization with $(\kappa, \gamma)$-noisy gradient access, the algorithm NOISYOCO obtains an expected regret of*
$$R_T = O\left(T^{1/2} \kappa D_\omega + T \kappa \gamma\right).$$

### Regret Analysis

The regret analysis for Algorithm 1 is based on the guarantee for NOISYOCO from Lemma 6. It follows from mapping the steps of Algorithm 1 to the framework of NOISYOCO and bounding the parameters involved. In order to do so, we explain how the NOISYGRAD and NOISYMAP routines are implemented by Algorithm 1. We then proceed to bound the $\kappa$ and $\gamma$ parameters corresponding to this specific instantiation, which yields the desired result. Here we describe the steps required for the analysis; we give the detailed proofs in the full version (Ene, Nguyen, and Vladu 2021).

The first step is to reduce the problem to minimizing regret on a family of functions $\{\tilde f_R\}_{R=1}^{T_{\text{round}}}$, where $\tilde f_R = \sum_{t=1}^{T_{\text{batch}}} \hat f_{(R-1) T_{\text{batch}} + t}$. This introduces two sources of error: one from using the smoothed $\hat f$ instead of $f$, and another from using different iterates $x_t = \tilde x_{R-1} + \zeta u_t$ in the same round, even though batching iterations effectively constrains all the iterates in a fixed batch to be equal. These errors are easy to control, and add at most $O(T \zeta L)$ to the regret.

For the family of functions $\{\tilde f_R\}_{R=1}^{T_{\text{round}}}$ we implement NOISYGRAD as
$$\tilde g_R := \text{NOISYGRAD}(\tilde f_R, \tilde x_{R-1}) = \sum_{t=(R-1) T_{\text{batch}} + 1}^{R\, T_{\text{batch}}} \frac{n}{\zeta}\, f_t(\tilde x_{R-1} + \zeta u_t)\, u_t,$$
which is an unbiased estimator of $\nabla \tilde f_R(\tilde x_{R-1})$, per Lemma 3. We thus bound $\kappa^2$ by showing that $\mathbb{E}\,\|\tilde g_R\|^2 \le T_{\text{batch}} (L D n / \zeta)^2 + T_{\text{batch}}^2 L^2$. Furthermore, the output of NOISYMAP is implemented in line 14 by running conditional gradient to approximately minimize a quadratic over the feasible domain $D$; a sketch of this subroutine follows.
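A minimal sketch of that conditional gradient subroutine, assuming only access to the linear optimization oracle; the function names are illustrative:

```python
def conditional_gradient(lo_oracle, b, steps):
    """Frank-Wolfe sketch for the line-14 subproblem
        min_{x in D}  (1/2) ||x||^2 - <b, x>,
    using only lo_oracle(v) = argmin_{x in D} <v, x>.
    """
    x = lo_oracle(-b)                  # an arbitrary feasible starting vertex
    for k in range(1, steps + 1):
        grad = x - b                   # gradient of the 1-smooth quadratic
        v = lo_oracle(grad)            # feasible point minimizing <grad, .>
        gamma = 2.0 / (k + 2)          # standard Frank-Wolfe step size
        x = x + gamma * (v - x)
    # after `steps` iterations the objective gap is O(D^2 / steps), so on the
    # order of T_batch steps suffice for the precision required in line 14
    return x
```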
The error in the noisy map implementation comes from (1) only approximately minimizing the quadratic, (2) using a noisy partial sum of gradient estimators rather than an exact one, and (3) using the stale partial sum approximation $\tilde s_{R-1}$ for round $R + 1$, instead of $\tilde s_R$. We show that the error parameter corresponding to this NOISYMAP implementation can be bounded as
$$\gamma \le \eta\,(\mu \log T + \kappa) + \frac{20 D}{T_{\text{batch}}^{1/2}}.$$
In the full version (Ene, Nguyen, and Vladu 2021), we bound the specific parameters corresponding to these implementations. Plugging these bounds into Lemma 6 yields the proof of Lemma 5, after appropriately balancing the parameters $T_{\text{batch}}$, $T_{\text{round}}$, $\zeta$.

## Privacy Analysis

In this section, we instantiate Algorithm 1 with an appropriate noise distribution $P$ in order to derive our $(\varepsilon, 0)$-private algorithm and our $(\varepsilon, \delta)$-private algorithm. As mentioned earlier, we use Laplace noise for $(\varepsilon, 0)$-privacy and obtain the guarantee in Lemma 8, and we use Gaussian noise for $(\varepsilon, \delta)$-privacy and obtain the guarantee in Lemma 11. First, we describe the proofs for the $(\varepsilon, 0)$-privacy regime, where we employ Laplace noise, since they show how this framework allows us to trade off regret and privacy.

**Lemma 7** (Privacy with Laplace noise). *Let $P$ be coordinate-wise $\mathrm{Lap}(0, T^{1/2} n L \log T / \varepsilon)$. The algorithm PRIVATEBANDIT($T$, $P$) is $(\varepsilon, 0)$-differentially private.*

*Proof.* First we bound the $\ell_1$ norm of the vectors whose partial sums are maintained by the tree-based aggregation method in Algorithm 1. Each vector contributing to these partial sums is obtained by adding up $T_{\text{batch}} = T^{1/2}$ unit $\ell_2$ vectors, each scaled by a factor that is absolutely bounded by $M = L D n / \zeta = T^{1/4} n^{1/2} L$, so its $\ell_1$ norm is bounded by $n^{1/2}\, T_{\text{batch}}^{1/2}\, M \le T^{1/2} n L$ (using that the squared $\ell_2$ norm of such a sum is at most $T_{\text{batch}} M^2$ up to lower-order terms, as in the bound on $\mathbb{E}\,\|\tilde g_R\|^2$ above). Therefore, by Theorem 4, releasing $T_{\text{round}} = T^{1/2}$ such partial sums causes a loss of privacy of at most $\varepsilon$ whenever $\lambda \ge T^{1/2} n L \log T / \varepsilon$. $\square$

Using Lemma 7 we can now bound the regret of the $(\varepsilon, 0)$-differentially private algorithm.

**Lemma 8** (Regret with Laplace noise). *Let $P$ be coordinate-wise $\mathrm{Lap}(0, T^{1/2} n L \log T / \varepsilon)$. The algorithm PRIVATEBANDIT($T$, $P$) has regret*
$$R_T = O\left(T^{3/4} n^{1/2} L D + \frac{T^{3/4} n^{3/2} L D \log^2 T}{\varepsilon}\right).$$

*Proof.* Per Lemma 5, we only need to upper bound the expected $\ell_2$ norm of an $n$-dimensional vector whose coordinates are independently sampled from $\mathrm{Lap}(0, T^{1/2} n L \log T / \varepsilon)$. Indeed, we have $\mu = O\left(n^{1/2} \cdot T^{1/2} n L \log T / \varepsilon\right)$. Plugging this into the regret guarantee from Lemma 5 gives the desired result. $\square$

We note that the regret guarantee we achieved has an undesirable dependence on the dimension. In the remainder of this section, we show that we can obtain improved guarantees if we settle for $(\varepsilon, \delta)$-differential privacy instead, which we achieve by using Gaussian noise. This is also better suited to our setting, since the regret bound we proved depends on the $\ell_2$ norms of the injected noise vectors, which is exactly what governs the privacy loss in this case. A novel and precise error analysis in Lemmas 9 and 10 enables us to obtain the same regret bound as when projection is available for $(\varepsilon, \delta)$-privacy. We additionally use the fact that the randomized smoothing technique further constrains, with high probability, the norm of the vectors $\tilde g_R$ we employ to update the partial sums. In order to do this, we use a concentration inequality for random vectors (Lemma 9), which allows us to obtain an improved guarantee on privacy at the expense of a slight increase in the $\delta$ parameter.
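Before turning to the Gaussian case, here is a quick illustrative numeric check (not part of the paper's analysis) of the norm bound $\mu = O(n^{1/2} \lambda)$ for coordinate-wise Laplace noise, as used in the proof of Lemma 8:

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam, trials = 200, 1.5, 2000

# coordinate-wise Lap(0, lam) noise vectors, as injected by TREEBASEDAGG
samples = rng.laplace(0.0, lam, size=(trials, n))
mu_hat = np.linalg.norm(samples, axis=1).mean()

# E||X|| <= sqrt(E||X||^2) = sqrt(2 n) * lam, and concentrates at this scale
print(f"empirical mu: {mu_hat:.2f}   sqrt(2n)*lam: {np.sqrt(2 * n) * lam:.2f}")
```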
Compared to the $(\varepsilon, 0)$ case, allowing this low-probability failure event enables us to save roughly a factor of $\tilde O(T^{1/4} n^{1/2})$ in the norm of the vectors we use to update the partial sums via tree-based aggregation. In turn, this allows us to use less noise to ensure privacy, and we therefore obtain an improved regret. We start with the following lemma, which establishes a high-probability bound on the $\ell_2$ norm of a sum of random vectors multiplied by an adversarially chosen set of scalars.

**Lemma 9.** *Let $u_1, \ldots, u_k \in B_2(1)$ be independent, mean-zero random vectors in $\mathbb{R}^n$ with $\mathbb{E}\, u_i u_i^\top \preceq I/n$ (e.g., uniform on the unit sphere). Then, with probability at least $1 - \delta$, for every vector $c \in \mathbb{R}^k$ one has*
$$\left\|\sum_{i=1}^k c_i u_i\right\| \le 10\, \|c\| \left(\log \frac{n + k}{\delta} + \sqrt{\left(1 + \frac{k}{n}\right)\log \frac{n + k}{\delta}}\right).$$

*Proof.* Consider the family of matrices $\{Z_i\}_{i=1}^k \subseteq \mathbb{R}^{n \times k}$, where $Z_i$ has its $i$-th column equal to $u_i$ and all other entries equal to $0$. Therefore $\mathbb{E}\, Z_i = 0$ and $\|Z_i\| \le 1$. Furthermore, by definition $Z_i Z_i^\top = u_i u_i^\top$ and $Z_i^\top Z_i = \|u_i\|^2\, 1_i 1_i^\top$, so $\mathbb{E}\, Z_i Z_i^\top \preceq I/n$ and $\mathbb{E}\, Z_i^\top Z_i \preceq 1_i 1_i^\top$. The variance parameter is therefore
$$\nu = \max\left(\left\|\sum_{i=1}^k \mathbb{E}\, Z_i Z_i^\top\right\|, \left\|\sum_{i=1}^k \mathbb{E}\, Z_i^\top Z_i\right\|\right) \le \max\left(\left\|\frac{k}{n} I\right\|, \|I\|\right) \le 1 + \frac{k}{n}.$$
Letting $Z = \sum_{i=1}^k Z_i$ and using matrix Bernstein (Tropp et al. 2015), we see that
$$\Pr\left[\|Z\| \ge t\right] \le (n + k) \exp\left(-\frac{t^2}{2(1 + k/n) + 2t/3}\right).$$
Hence for $t = 10\left(\log \frac{n+k}{\delta} + \sqrt{\left(1 + \frac{k}{n}\right)\log \frac{n+k}{\delta}}\right)$ we get $\|Z\| \le t$ with probability at least $1 - \delta$. Therefore, with probability at least $1 - \delta$, we have $\left\|\sum_i c_i u_i\right\| = \|Z c\| \le \|Z\|\, \|c\| \le t\, \|c\|$, which implies what we needed. $\square$

Using the above lemma, we can obtain a tighter bound on the privacy loss when using Gaussian noise.

**Lemma 10** (Privacy with Gaussian noise). *Let $P$ be coordinate-wise $N(0, \sigma^2)$, where*
$$\sigma = O\left(\frac{T^{1/4} n^{1/2} L \log T \log(T/\delta)}{\varepsilon} \left(\log \frac{n+T}{\delta} + \sqrt{\left(1 + \frac{T^{1/2}}{n}\right) \log \frac{n+T}{\delta}}\right)\right).$$
*The algorithm PRIVATEBANDIT($T$, $P$) is $(\varepsilon, \delta)$-differentially private.*

*Proof.* Using Lemma 9 and a union bound, with probability at least $1 - \delta_0 T^{1/2}$ the $\ell_2$ norm of each of the $T_{\text{round}} = T^{1/2}$ vectors contributing to the partial sums maintained in the tree-based aggregation routine is at most
$$M = O\left(T^{1/4} n^{1/2} L \left(\log \frac{n+T}{\delta_0} + \sqrt{\left(1 + \frac{T_{\text{batch}}}{n}\right) \log \frac{n+T}{\delta_0}}\right)\right).$$
By Theorem 4, maintaining these partial sums is thus $(\varepsilon, \delta_0 T^{1/2} + \delta_1)$-differentially private, where $\varepsilon = M \log T \log(T/\delta_1) / \sigma$. Hence setting $\delta_1 = \delta/2$, $\delta_0 = \delta/(2 T^{1/2})$, and $\sigma = M \log T \log(T/\delta_1) / \varepsilon$ yields an $(\varepsilon, \delta)$-differentially private algorithm. $\square$

**Lemma 11** (Regret with Gaussian noise). *Let $\delta = 1/(n + T)^{O(1)}$. The algorithm PRIVATEBANDIT($T$, $N(0, \sigma^2)$), where $\sigma$ is chosen according to Lemma 10 such that the algorithm is $(\varepsilon, \delta)$-private, has regret*
$$R_T = O\left(T^{3/4} n^{1/2} L D + \frac{T^{1/2} n L D \log^2 T \log(T/\delta) \log\frac{n+T}{\delta}}{\varepsilon} + \frac{T^{3/4} n^{1/2} L D \log^2 T \log(T/\delta) \sqrt{\log\frac{n+T}{\delta}}}{\varepsilon}\right).$$

*Proof.* Per Lemma 5, we only need to upper bound the expected norm of an $n$-dimensional vector whose coordinates are sampled from $N(0, \sigma^2)$. In this case we have $\mu = O(n^{1/2} \sigma)$, so plugging this into the regret guarantee from Lemma 5 we obtain regret
$$R_T = O\left(T^{3/4} n^{1/2} L D + T^{1/4} n^{1/2} D \sigma \log T\right),$$
which implies the result after substituting $\sigma$. $\square$

The proof of Theorem 1 now follows by combining Lemmas 7, 8, 10, and 11.
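As an illustrative empirical check of the concentration phenomenon behind Lemma 9 (not part of the paper's analysis), the spectral norm of a matrix whose columns are independent uniform unit vectors is of the same order as the $\sqrt{1 + k/n}$ scale appearing in the lemma:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, trials = 100, 2000, 50

worst = 0.0
for _ in range(trials):
    U = rng.standard_normal((n, k))
    U /= np.linalg.norm(U, axis=0, keepdims=True)   # columns uniform on S^n_2
    # sup_c ||U c|| / ||c|| is exactly the spectral norm ||U||
    worst = max(worst, np.linalg.norm(U, ord=2))

print(f"max ||Z|| over {trials} trials: {worst:.2f}   sqrt(1 + k/n): {np.sqrt(1 + k / n):.2f}")
```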
## Discussion and Open Problems

We saw how one can derive differentially private algorithms starting from a very basic framework for noisy online convex optimization. Our analysis builds on advances in both differential privacy and online convex optimization, combines their techniques in non-trivial ways, and introduces new ideas. Among others, a novel and precise error analysis in Lemmas 9 and 10 enables us to obtain the same regret bound as when projection is available for $(\varepsilon, \delta)$-privacy, in contrast to $(\varepsilon, 0)$-privacy. To the best of our knowledge, this is a rare case where such a difference between the two privacy settings arises. We think it is an interesting direction for future work to obtain an analogous improvement in the $(\varepsilon, 0)$-privacy setting. It would also be interesting to see if this generic method, in conjunction with tools from differential privacy, can be used to obtain more private learning algorithms.

A few outstanding questions remain. Since $\tilde O(T^{3/4})$ is also the best known regret bound in the non-private setting, it would be interesting to improve this result, which may lead to an improved differentially private algorithm. Furthermore, in the non-private setting with projections, obtaining algorithms with lower regret requires stronger control of the geometry of the domain. This makes differential privacy more difficult to achieve, since even the simplest algorithms with improved regret require solving a linear system of equations, which is much more sensitive to noise than vanilla gradient steps. Developing a more fine-grained understanding of these problems via differential privacy poses an exciting challenge.

## Acknowledgments

AE was supported in part by NSF CAREER grant CCF-1750333, NSF grant CCF-1718342, and NSF grant III-1908510. HLN was supported in part by NSF CAREER grant CCF-1750716 and NSF grant CCF-1909314. AV was supported in part by NSF grant CCF-1718342.

## References

Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H. B.; Mironov, I.; Talwar, K.; and Zhang, L. 2016. Deep learning with differential privacy. In ACM SIGSAC Conference on Computer and Communications Security, 308–318.

Abernethy, J.; Canini, K.; Langford, J.; and Simma, A. 2007. Online collaborative filtering. Technical report, University of California at Berkeley.

Abernethy, J. D.; Hazan, E.; and Rakhlin, A. 2008. Competing in the dark: An efficient algorithm for bandit linear optimization. In Conference on Learning Theory (COLT), 263–273.

Agarwal, A.; Dekel, O.; and Xiao, L. 2010. Optimal algorithms for online convex optimization with multi-point bandit feedback. In Conference on Learning Theory (COLT), 28–40.

Agarwal, N.; and Singh, K. 2017. The price of differential privacy for online learning. In International Conference on Machine Learning (ICML), 32–40.

Auer, P.; Cesa-Bianchi, N.; Freund, Y.; and Schapire, R. E. 1995. Gambling in a rigged casino: The adversarial multi-armed bandit problem. In IEEE Foundations of Computer Science (FOCS), 322–331.

Awerbuch, B.; and Kleinberg, R. D. 2004. Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. In ACM Symposium on Theory of Computing (STOC), 45–53.

Bassily, R.; Smith, A.; and Thakurta, A. 2014. Private empirical risk minimization: Efficient algorithms and tight error bounds. In IEEE Foundations of Computer Science (FOCS), 464–473.

Ben-Tal, A.; and Nemirovski, A. 2001. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, volume 2. SIAM.

Bubeck, S.; Lee, Y. T.; and Eldan, R. 2017. Kernel-based methods for bandit convex optimization. In ACM Symposium on Theory of Computing (STOC), 72–85.

Bubeck, S.; et al. 2015. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning 8(3-4): 231–357.

Bun, M.; and Steinke, T. 2016. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference (TCC), 635–658.

Chen, L.; Zhang, M.; and Karbasi, A. 2018. Projection-free bandit convex optimization. arXiv preprint arXiv:1805.07474.

Dani, V.; and Hayes, T. P. 2006.
Robbing the bandit: Less regret in online geometric optimization against an adaptive adversary. In ACM-SIAM Symposium on Discrete Algorithms (SODA), 937–943.

Dani, V.; Kakade, S. M.; and Hayes, T. P. 2008. The price of bandit information for online optimization. In Neural Information Processing Systems (NeurIPS), 345–352.

Dudik, M.; Harchaoui, Z.; and Malick, J. 2012. Lifted coordinate descent for learning with trace-norm regularization. In Artificial Intelligence and Statistics (AISTATS), 327–336.

Dwork, C.; McSherry, F.; Nissim, K.; and Smith, A. 2006. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference (TCC), 265–284.

Dwork, C.; Naor, M.; Pitassi, T.; and Rothblum, G. N. 2010. Differential privacy under continual observation. In ACM Symposium on Theory of Computing (STOC), 715–724.

Dwork, C.; Roth, A.; et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9(3-4): 211–407.

Dwork, C.; and Rothblum, G. N. 2016. Concentrated differential privacy. arXiv preprint arXiv:1603.01887.

Ene, A.; Nguyen, H. L.; and Vladu, A. 2021. Projection-free bandit optimization with privacy guarantees. arXiv preprint arXiv:2012.12138.

Flaxman, A. D.; Kalai, A. T.; and McMahan, H. B. 2004. Online convex optimization in the bandit setting: Gradient descent without a gradient. arXiv preprint arXiv:cs/0408007.

Garber, D.; and Hazan, E. 2013a. A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization. arXiv preprint arXiv:1301.4666.

Garber, D.; and Hazan, E. 2013b. Playing non-linear games with linear oracles. In IEEE Foundations of Computer Science (FOCS), 420–428.

Garber, D.; and Hazan, E. 2015. Faster rates for the Frank-Wolfe method over strongly-convex sets. In International Conference on Machine Learning (ICML), 541–549.

Garber, D.; and Kretzu, B. 2020. Improved regret bounds for projection-free bandit convex optimization. In Conference on Artificial Intelligence and Statistics (AISTATS), 2196–2206.

Hazan, E.; and Kale, S. 2012. Projection-free online learning. In International Conference on Machine Learning (ICML), 521–528.

Iyengar, R.; Near, J. P.; Song, D.; Thakkar, O.; Thakurta, A.; and Wang, L. 2019. Towards practical differentially private convex optimization. In IEEE Symposium on Security and Privacy (SP), 299–316.

Jaggi, M. 2013. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In International Conference on Machine Learning (ICML), 427–435.

Jain, P.; Kothari, P.; and Thakurta, A. 2012. Differentially private online learning. In Conference on Learning Theory (COLT), 24.1–24.34.

Lobel, I.; Leme, R. P.; and Vladu, A. 2018. Multidimensional binary search for contextual decision-making. Operations Research 66(5): 1346–1361.

Mao, J.; Leme, R.; and Schneider, J. 2018. Contextual pricing for Lipschitz buyers. In Neural Information Processing Systems (NeurIPS), 5643–5651.

McMahan, H. B.; Holt, G.; Sculley, D.; Young, M.; Ebner, D.; Grady, J.; Nie, L.; Phillips, T.; Davydov, E.; Golovin, D.; et al. 2013. Ad click prediction: A view from the trenches. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 1222–1230.

Mironov, I. 2017. Rényi differential privacy. In IEEE Computer Security Foundations Symposium (CSF), 263–275.

Nesterov, Y. 2005. Smooth minimization of non-smooth functions. Mathematical Programming 103(1): 127–152.

Shalev-Shwartz, S.; Gonen, A.; and Shamir, O. 2011.
Large-scale convex minimization with a low-rank constraint. arXiv preprint arXiv:1106.1622.

Thakurta, A. G.; and Smith, A. 2013. (Nearly) optimal algorithms for private online learning in full-information and bandit settings. In Neural Information Processing Systems (NeurIPS), 2733–2741.

Tropp, J. A.; et al. 2015. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning 8(1-2): 1–230.

Wang, D.; Ye, M.; and Xu, J. 2017. Differentially private empirical risk minimization revisited: Faster and more general. In Neural Information Processing Systems (NeurIPS), 2722–2731.