# optimistic_optimization_of_a_brownian__75513dbf.pdf

Optimistic Optimization of a Brownian

Jean-Bastien Grill Michal Valko R emi Munos Seque L team, INRIA Lille - Nord Europe, France and Deep Mind Paris, France jbgrill@google.com michal.valko@inria.fr munos@google.com

We address the problem of optimizing a Brownian motion. We consider a (random) realization W of a Brownian motion with input space in [0, 1]. Given W, our goal is to return an ε-approximation of its maximum using the smallest possible number of function evaluations, the sample complexity of the algorithm. We provide an algorithm with sample complexity of order log2(1/ε). This improves over previous results of Al-Mharmah and Calvin (1996) and Calvin et al. (2017) which provided only polynomial rates. Our algorithm is adaptive each query depends on previous values and is an instance of the optimism-in-the-face-of-uncertainty principle.

1 Introduction to sample-efﬁcient Brownian optimization

We are interested in optimizing a sample of a standard Brownian motion on [0, 1], denoted by W. More precisely, we want to sequentially select query points tn [0, 1], observe W(tn), and decide when to stop to return a point bt and its value c M = W bt in order to well approximate its maximum M supt [0,1] W(t). The evaluations tn can be chosen adaptively as a function of previously observed values W(t1), ..., W(tn 1). Given an ε > 0, our goal is to stop evaluating the function as early as possible while still being able to return bt such that with probability at least 1 ε, M W bt ε. The number of function evaluations required by the algorithm to achieve this ε-approximation of the maximum deﬁnes the sample-complexity.

Motivation There are two types of situations where this problem is useful. The ﬁrst type is when the random sample function W (drawn from the random process) already exists prior to the optimization. Either it has been generated before the optimization starts and the queries correspond to reading values of the function already stored somewhere. For example, ﬁnancial stocks are stored at a high temporal resolution and we want to retrieve the maximum of a stock using a small number of memory queries. Alternatively, the process physically exists and the queries correspond to measuring it.

Another situation is when the function does not exist prior to the optimization but is built simultaneously as it is optimized. In other words, observing the function actually creates it. An application of this is when we want to return a sample of the maximum (and the location of the maximum) of a Brownian motion conditioned on a set of already observed values. For example, in Bayesian optimization for Gaussian processes, a technique called Thomson sampling (Thompson, 1933; Chapelle and Li, 2011; Russo et al., 2018; Basu and Ghosh, 2018) requires returning the maximum of a sampled function drawn from the posterior distribution. The problem considered in the present paper can be seen as a way to approximately perform this step in a computationally efﬁcient way when this Gaussian process is a Brownian motion.

Moreover, even though our algorithm comes from the ideas of learning theory, it has applications beyond it. For instance, in order to computationally sample a solution of a stochastic differential equation, Hefter and Herzwurm (2017) express its solution as a function of the Brownian motion W and its running minimum. They then need, as a subroutine, an algorithm for the optimization of Brownian motion to compute its running minimum. We are giving them that and it is light-speed fast.

32nd Conference on Neural Information Processing Systems (Neur IPS 2018), Montr eal, Canada.

Prior work Al-Mharmah and Calvin (1996) provide a non-adaptive method to optimize a Brownian motion. They prove that their method is optimal among all non-adaptive methods and their sample complexity is polynomial of order 1/ ε. More recently, Calvin et al. (2017) provided an adaptive algorithm with a sample complexity lower than any polynomial rate showing that adaptability to previous samples yields a signiﬁcant algorithmic improvement. Yet their result does not guarantee a better rate than a polynomial one.

Our contribution We introduce the algorithm OOB = optimistic optimization of the Brownian motion. It uses the optimism-in-face-of-uncertainty apparatus: Given n 1 points already evaluated, we deﬁne a set of functions Un in which W lies with high probability. We then select the next query point tn where the maximum of the most optimistic function of Un is reached: tn arg maxt [0,1] maxf Un f(t). This begets a simple algorithm that requires an expected number of queries of the order of log2(1/ε) to return an ε-approximation of the maximum, with probability at least 1 ε w.r.t. the random sample of the Brownian motion. Therefore, our sample complexity is better than any polynomial rate.

Solving an open problem Munos (2011) provided sample complexity results for optimizing any function f characterized by two coefﬁcients (d, C) where d is the near-optimality dimension and C the corresponding constant (see his Deﬁnition 3.1). It is deﬁned as the smallest d 0 such that there exists a semi-metric ℓand a constant C > 0, such that, for all ε > 0, the maximal number of disjoint ℓ-balls of radius O(ε) with center in {x, f(x) supx f(x) ε} is less than Cε d. Under the assumption that f is locally (around one global maximum) one-sided Lipschitz with respect to ℓ(see his Assumption 2), he proved that for a function f characterized by (d = 0, C), his DOO algorithm has a sample complexity of O(C log(1/ε)), whereas for a function characterized by (d > 0, C), the sample complexity of DOO is O C/εd . Our result answers a question he raised: What is the near-optimality dimension of a Brownian-motion? The Brownian motion being a stochastic process, this quantity is a random variable so we consider the number of disjoint balls in expectation. We show that for any ε, there exists some particular metric ℓε such that the Brownian motion W is ℓε-Lipschitz with probability 1 ε, and there exists a constant C(ε) = O(log(1/ε)) such that (d = 0, C(ε)) characterizes the Brownian motion. However, there exists no constant C < independent of ε such that (d = 0, C) characterizes the Brownian motion. Therefore, we solved this open problem. Our answer is compatible with our result that our algorithm has a sample complexity of O(log2(1/ε)).

2 New algorithm for Brownian optimization

Our algorithm OOB is a version of DOO (Munos, 2011) with a modiﬁed upper bound on the function, in order to be able to optimize stochastic processes. Consider the points t1 < t1 < ... < tn evaluated so far and t0 = 0. OOB deﬁnes an upper conﬁdence bound B[ti,ti+1] for each interval [ti, ti+1] with i {0, ..., n 1} and samples W in the middle of the interval with the highest upper-conﬁdence bound. Algorithm 1 reveals its pseudo-code.

Algorithm 1 OOB algorithm

1: Input: ε 2: Init: I {[0, 1]}, t1 = W(1) 3: for i = 2, 3, 4, . . . do 4: [a, b] arg max I I BI {break ties arbitrarily} 5: if ηε(b a) ε then 6: break 7: end if 8: ti W a+b

9: I {I [a, a+b

2 , b]}\{[a, b]} 10: end for 11: Output: location btε arg maxti W(ti) and its value W btε

More formally, let ε be the required precision, the only given argument of the algorithm. For any 0 a < b 1, the interval [a, b] is associated with an upper bound B[a,b] deﬁned by

B[a,b] max(W(a), W(b)) + ηε(b a), where δ > 0 s.t. εδ 1

OOB keeps track of a set I of intervals [a, b] with W already being sampled at a and b. The algorithm ﬁrst samples W(1), W(1) N(0, 1), in order to initialize the set I to the singleton {[0, 1]}. Then, OOB keeps splitting the interval I I associated with the highest upper bound BI quam necessarium.

3 Guarantees: OOB is correct and sample-efﬁcient

Let M supt [0,1] W(t) be the maximum of the Brownian motion, btε the output of OOB called with parameter ε > 0, and Nε the number of Brownian evaluations performed until OOB terminates. All are random variables that depend on the Brownian motion W. We now voice our main result. Theorem 1. There exists a constant c > 0 such that for all ε < 1/2,

P M W btε > ε ε and E[Nε] c log2(1/ε).

The ﬁrst inequality quantiﬁes the correctness of our estimator c Mε = W btε . Given a realization of the Brownian motion, our OOB is deterministic. The only source of randomness comes from the realization of the Brownian. Therefore, being correct means that among all possible realizations of the Brownian motion, there is a subset of measure at least 1 ε on which OOB outputs an estimate c Mε which is at most ε-away from the true maximum. Such guarantee is called probably approximately correct (PAC). The second inequality quantiﬁes performance. We claim that the expectation (over W) of the number of samples that OOB needs to optimize this realization with precision ε is O log2(1/ε) .

Corollary 1. We get the classic (δ, ε)-PAC guarantee easily. For any δ > 0 and ε > 0, choose ε = min(δ, ε) and apply Theorem 1 for ε from which we get P M W btε > ε ε which is stronger than P M W btε > ε δ. Similarly, E[Nε ] c log2(1/ε ) 4c(log(1/ε) + log(1/δ))2. Remark 1. Our PAC guarantee is actually stronger than stated in Theorem 1. Indeed, the PAC guarantee analysis can be done conditioned on the collected function evaluations and get

P M W btε > ε W(t1), ..., W(t Nε) ε,

from which taking the expectation on both sides gives the ﬁrst part of Theorem 1. This means that the unfavorable cases, i.e., the Brownian realizations for which M c Mε > ε, are not concentrated on some subsets of Brownian realizations matching some evaluations in t1, ..., t Nε. In other words, the PAC guarantee also holds when restricted to the Brownian realizations matching the evaluations in t1, ..., t Nε only. This is possible because Nε is not ﬁxed but depends on the evaluations done by OOB.

One difference from the result of Calvin et al. (2017) is that theirs is with respect to the Lp norm. For their algorithm, they prove that with n samples it returns btn [0, 1] such that

r > 1, p > 1, cr,p, E h M W(btn) pi1/p cr,p/nr.

To express their result in the same formalism as ours, we ﬁrst choose to achieve accuracy ε2 and compute the number of samples nε2 needed to achieve it. Then, for p = 1, we apply Markov inequality and get that for all r > 1 there exists cr,1 such that

P M W(btnε2 ) > ε ε and Nε cr,1/ε1/r.

On the other hand, in our Theorem 1 we give a poly-logarithmic bound for the sample complexity and we are in the business because this is better than any polynomial rate.

4 Analysis and the proof of the main theorem

We provide a proof of the main result. Let Iﬁn be the set I of intervals tracked by OOB when it ﬁnishes. We deﬁne an event C such that for any interval I of the form I = [k/2h, (k + 1)/2h] with k and h being two integers where 0 k < 2h, the process W is lower than BI on the interval I.

Deﬁnition 1. Event C is deﬁned as

sup t [k/2h,(k+1)/2h] W(t) B[k/2h,(k+1)/2h]

Event C is a proxy for the Lipschitz condition on W for the pseudo-distance d(x, y) = p

|y x| ln(1/|y x|) because ηε(δ) scales with δ as p

δ ln (1/δ). We show that it holds with high probability. To show it, we make use of the Brownian bridge which is the process Br(t) (W(t)|W(1) = 0). Lemma 1 follows from the Markov property of the Brownian combined with a bound on the law of the maximum of Br(t) to bound the probability P[supt I W(t) BI] for any I of the form [k/2h, (k + 1)/2h] and a union bound over all these intervals. Lemma 1. For any ε, event C from Deﬁnition 1 holds with high probability. In particular,

Proof. For any interval I,

BI = max(W(a), W(b)) + ηε(b a) (by deﬁnition of BI)

= W(a) + W(b)

2 + |W(a) W(b)|

2 + ηε(b a) (max(x, y)=(x+y+|x y|)/2)

W(a) + W(b)

s W(a) W(b)

2 + (ηε(b a))2 x, y > 0, (x+y)2 x2+y2 .

We now deﬁne

2h , h,k W(th,k) W(th,k+1)

2 , ηh ηε(b a),

sup t [k/2h,(k+1)/2h] W(t) > B[k/2h,(k+1)/2h]

First, for any a < b, the law of the maximum of a Brownian bridge gives us

x max(W(a), W(b)): P

sup t [a,b] W(t)>x W(a)=Wa, W(b)=Wb

=exp 2(x Wa)(x Wb)

Combining it with the deﬁnition of Ah,k and the ﬁrst inequality of the proof we get

P Ah,k W(th, k), W(th, k + 1)

2 W (th,k+1) W (th,k)

2 h,k + η2 h W (th,k) W (th,k+1)

2 h,k + η2 h

th,k+1 th,k

= exp 2h+1 q

2 h,k + η2 h h,k q

2 h,k + η2 h + h,k

= exp 2h+1η2 h = exp 2h+1 5 2 2h ln 2h

By deﬁnition, C T h=0 T2h 1 k=0 Ac h,k = S h=0 S2h 1 k=0 Ah,k. By union bound on all Ah,k we get

k=0 P[Ah,k] X

Lemma 1 is useful for two reasons. As we bound the sample complexity on event C and the complementary event in two different ways, we can use Lemma 1 to combine the two bounds to prove Proposition 2 in the end. We also use a weak version of it, bounding ε5 by ε to prove our PAC guarantee. For this purpose, we combine the deﬁnition of C with the termination condition of OOB to get that under event C, the best point c Mε so far, is close to the maximum M of the Brownian up to ε. Since C holds with high probability, we have the following PAC guarantee which is the ﬁrst part of the main theorem.

Proposition 1. The estimator c Mε = W btε is probably approximately correct with

P h M c Mε > ε i ε.

Proof. Let Inext = [a, b] be the interval that the algorithm would split next, if it was not terminated. Since the algorithm only splits the interval with the highest upper bound then Bnext = sup I Iﬁn BI. Also let Imax Iﬁn be one of the intervals where a maximum is reached, tmax arg maxt [0,1] W(t) M and tmax Imax. Then, on event C,

M BImax BInext = max(W(a), W(b)) + ηε(b a).

Since the algorithm terminated, we have that ηε(b a) ε and therefore,

max(W(a), W(b)) M ε,

which combined with Lemma 1 ﬁnishes the proof as ε5 ε.

In fact, Proposition 1 is the easy-to-obtain part of the main theorem. We are now left to prove that the number of samples needed to achieve this PAC guarantee is low. As the next step, we deﬁne the near-optimality property. A point t is said to be η-near-optimal when its value W(t) is close to the maximum M of the Brownian motion up to η. Check out the precise deﬁnition below.

Deﬁnition 2. When an (h, k, η) veriﬁes W k

2h M η, we say that the point t = (k/2h) is η-nearoptimal. We deﬁne Nh(η) as the number of η-near-optimal points among 0/2h, 1/2h, ..., 2h/2h ,

k 0, ..., 2h, such that W k

Notice that Nh(η) is a random variable that depends on a particular realization of the Brownian motion. The reason why we are interested in the number of near-optimal points is that the points the algorithm will sample are ηε 1/2h -near-optimal. Since we use the principle of optimism in face of uncertainty, we consider the upper bound of the Brownian motion and sample where this upper bound is the largest. If our bounds on W hold, i.e., under event C, then any interval I with optimistic bound BI < M is never split by the algorithm. This is true when C holds because if the maximum of W is reached in Imax, then BImax M > BI which shows that Imax is always chosen over I. Therefore, a necessary condition for an interval [a, b] to be split is that max(W(a), W(b)) M η which means that either a or b or both are η-near-optimal which is the key point of Lemma 2. Lemma 2. Under event C, the number of evaluated points Nε by the algorithm veriﬁes

h=0 Nh ηε 1/2h , with hmax being the smallest h such that ηε 1/2h ε.

Lemma 2 explicitly links the near-optimality from Deﬁnition 2 with the number of samples Nε performed by OOB before it terminates. Here, we use the optimism-in-face-of-uncertainty principle which can be applied to any function. In particular, we deﬁne a high-probability event C under which the number of samples is bounded by the number of near-optimal points Nh(ηh) for all h hmax.

Proof. Let I = [a, b] be an interval of It such that max(W(a), W(b))+ηε(b a) < M. Let Inext It be the interval that the algorithm would split after t function evaluations.. Since the algorithm only splits the interval with the highest upper bound, then BInext = sup I It BI. Moreover, if we let Imax It be one of the intervals where a maximum is reached, tmax arg maxt [0,1] W(t) M and tmax Imax, then on event C,

max(W(a), W(b)) + ηε(b a) BI < M BImax BInext.

Therefore, under C, a necessary condition for an interval I = [a, b] to be split during the execution of OOB is that max(W(a), W(b)) M ηε(b a), which means that either a or b or both are ηε(b a)-near-optimal. From the termination condition of the algorithm, we know that any interval that is satisfying I = [k/2h, (k + 1)/2h] with h hmax will not be split during the execution.1 Therefore, another necessary condition for an interval I = [a, b] to be split during the execution is that b a > 1/2hmax. Writing ηh ηε 1/2h , we deduce from these two necessary conditions that

1This holds despite ηε( ) is not always decreasing.

2h or k + 1

2h is ηh-near-optimal

2h is ηh-near-optimal + 1 k + 1

2h is ηh-near-optimal

2h is ηh-near-optimal = 2

h=0 Nh(ηh).

We now prove a property speciﬁc to W by bounding the number of near-optimal points of the Brownian motion in expectation. We do it by rewriting it as two Brownian meanders (Durrett et al., 1977), both starting at the maximum of the Brownian, one going backward and the other one forward with the Brownian meander W + deﬁned as

t [0, 1] W +(t) |W(τ + t(1 τ))| 1 τ , where τ sup{t [0, 1] : W(t) = 0}.

We use that the Brownian meander W + can be seen as a Brownian motion conditioned to be positive (Денисов, 1983). This is the main ingredient of Lemma 3. Lemma 3. For any h and η, the expected number of near-optimal points is bounded as

E[Nh(η)] 6η22h.

This lemma answers a question raised by Munos (2011): What is the near-optimality dimension of the Brownian motion? We set ηh ηε 1/2h . In dimension one with the pseudo-distance ℓ(x, y) = ηε(|y x|), the near-optimality dimension measures the rate of increase of Nh(ηh), the number of ηh-near-optimal points in [0, 1] of the form k/2h. In Lemma 3, we prove that in expectation, this number increases as O η2 h2h = O(log(1/ε)), which is constant with respect to h. This means that for a given ε, there is a metric under which the Brownian is Lipschitz with probability at least 1 ε and has a near-optimality dimension d = 0 with C = O(log(1/ε)).

The ﬁnal sample complexity bound is essentially constituted by one O(log(1/ε)) term coming from the standard DOO error for deterministic function optimization and another O(log(1/ε)) term because we need to adapt our pseudo-distance ℓto ε such that the Brownian is ℓ-Lipschitz with probability 1 ε. The product of the two gives the ﬁnal sample complexity bound of O log2(1/ε) .

Proof of Lemma 3. We denote by W, the Brownian motion whose maximum M is ﬁrst hit at the point deﬁned as t M = inf{t [0, 1]; W(t) = M} and B+ a Brownian meander (Durrett et al., 1977). We also deﬁne

B+ 0 (t) M W(t M t t M)) t M and B+ 1 (t) M W(t M + t(1 t M)) 1 t M

If L= denotes the equality in distribution, then Theorem 1 of Денисов (1983) asserts that

B+ L= B+ 0 L= B+ 1 and t M is independent from both B+ 0 and B+ 1 . We upper-bound the expected number of η-near-optimal points for any integer h 0 and any η > 0,

E[Nh(η)] = E

k=0 E 1 W k

k=0 E 1 W k

k=0 E 1 B+ 0

k=0 E 1 B+ 1

< η 1 t M k

If X and Y are independent then for any function f,

E[f(X, Y )] = E[E[f(X, Y )|Y ]] (law of total expectation) supy E[f(X, y)|Y = y] (for any Z, E[Z] supw Z(w)) = supy E[f(X, y)]. (because X and Y are independent)

Since t M is independent from B+ 0 and B+ 1 , then using the above with X =(B+ 0 , B+ 1 ), Y = t M, and

f : (x0, x1), y

we can claim that

E[Nh(η)] = E[f(X, Y )] sup u [0,1] E[f(X, u)]

sup u [0,1]

k=0 E 1 B+ 0

+ sup u [0,1]

k=0 E 1 B+ 1

= sup u [0,1]

+ sup u [0,1]

k= u2h P B+ 1

= 2 sup u [0,1]

= 2 sup u [0,1] {α1 + α2 + α3 + α4},

k= 2hη2 P B+ 0

k= u2h 2hη2 P B+ 0

Since a probability is always upper-bounded by 1, we bound α1 and α4 both by η22h, to get that α1 + α4 2η22h. We now bound the remaining probabilities appearing in the above expression by integrating over the distribution function of Brownian meander (Durrett et al., 1977, Equation 1.1),

P B+ 0 (t) < x = 2

y exp y2/(2t)

exp w2/(2(1 t))

2π(1 t) dw dy

0 y2 exp y2

2πt(1 t) 2x3

2πt(1 t) = 2 3

where the ﬁrst two inequalities are obtained by upperbounding the terms exp( ) by one. Now, we use the above bound to bound α2 + α3,

k= 2hη2 P B+ 0

u2h + k u2h

Changing the indexation as k = k + u2h , we get

2hη2 1 π η22h η22h,

where in the last line we used that for any k0 1,

1 u3/2 du = 1

1 u3/2 du = 1

k3/2 0 + 2 k0 3 k0

We ﬁnally have

u, α1 + α2 + α3 + α4 3η22h

and therefore E[Nh(η)] 6η22h.

To conclude the analysis, we put Lemma 2 and Lemma 3 together in order to bound the sample complexity conditioned on event C.

Lemma 4. There exists c > 0 such that for all ε 1/2, E[Nε | C ] c log2(1/ε).

Proof. By deﬁnition of hmax,

ε2 5 2 2hmax ln 2hmax

2hmax/ε 2hmax ,

from which we deduce that εε2 5 2 2hmax/2 and therefore 2hmax 25

which gives us an upper bound on hmax,

hmax ln(25/4)

ln 2 + 5 ln(1/ε)

ln 2 = O(log(1/ε)).

Furthermore, using Lemma 1, we get that for any ε 1/2,

E[Nh(ηh)] = E[Nh(ηh)| C ]P[C] + E[Nh(ηh)| Cc]P[Cc] E[Nh(ηh)| C ](1 ε) E[Nh(ηh)| C ]

We now use Lemma 2 and Lemma 3 to get

E[Nε| C ] 2

5 2 ln(2h+1/ε)

h=0 ln 2h+1/ε

h=1 (ln(1/ε) + h ln 2) = O h2 max + hmax log(1/ε) = O log2(1/ε) .

We also bound the sample complexity on Cc, the event complementary to C, by the total number of possible intervals [k/2h, (k + 1)/2h] with ηε 1/2h ε. Then, we combine it with Lemma 1, Lemma 2, and Lemma 3 to get Proposition 2 which is the second part of the main theorem. Proposition 2. There exists c > 0 such that for all ε < 1/2,

E[Nε] c log2(1/ε).

Proof. From the law of total expectation,

E[Nε] = E[Nε| C ]P[C] + E[Nε|Cc]P[Cc] E[Nε| C ] + ε5E[Nε|Cc].

By Lemma 4, we have that E[Nε| C ] = O log2(1/ε) . Now let hmax be the maximum h at which points are evaluated. Then,

h=0 2h = 2hmax+1 1 2hmax+1.

As in the proof of Lemma 4, we get 2hmax+1 = O ε 5 . We ﬁnally obtain that

E[Nε] O log2(1/ε) + O(1) = O log2(1/ε) .

Proposition 2 together with Proposition 1 establish the proof of Theorem 1, the holy grail.

5 Numerical evaluation of OOB

For an illustration, we ran a simple experiment and for different values of ε, we computed the average empirical sample complexity Nε on 250 independent runs that you can see on the left plot. We also plot one point for each run of OOB instead of averaging the sample complexity, to be seen on the right. The experiment indicates a linear dependence between the sample complexity and log2(1/ε).

6 Conclusion and ideas for extensions and generalizations

We presented OOB, an algorithm inspired by DOO (Munos, 2011) that efﬁciently optimizes a Brownian motion. We proved that the sample complexity of OOB is of order O log2(1/ε) which improves over the previously best known bound (Calvin et al., 2017). As we are not aware of a lower bound for the Brownian motion optimization, we do not know whether O log2(1/ε) is the minimax rate of the sample complexity or if there exists an algorithm that can do even better.

What would be needed to do if we wanted to use our approach for Gaussian processes with a different kernel? The optimistic approach we took is quite general and only Lemma 3 would need additional work as this is the ingredient most speciﬁc to Brownian motion. Notice, that Lemma 3 bounds the number of near-optimal nodes of a Brownian motion in expectation. To bound the expected number of near-optimal nodes, we use the result of Денисов (1983) which is based on 2 components:

1 A Brownian motion can be rewritten as an afﬁne transformation of a Brownian motion conditioned to be positive, translated by an (independent) time at which the Brownian motion attains its maximum. 2 A Brownian motion conditioned to be positive is a Brownian meander. It requires some additional work to prove that a Brownian motion conditioned to be positive is actually properly deﬁned.

A similar result for another Gaussian process or its generalization of our result to a larger class of Gaussian processes would need to adapt or generalize these two items in Lemma 3. On the other hand, the adaptation or generalization of the other parts of the proof would be straightforward. Moreover, for the second item, the full law of the process conditioned to be positive is actually not needed, only the local time of the Gaussian process conditioned to be positive at points near zero.

Acknowledgements This research was supported by European CHIST-ERA project DELTA, French Ministry of Higher Education and Research, Nord-Pas-de-Calais Regional Council, Inria and Otto-von-Guericke-Universit at Magdeburg associated-team north-European project Allocate, French National Research Agency projects Ex Tra-Learn (n.ANR-14-CE24-0010-01) and Bo B (n.ANR-16CE23-0003), FMJH Program PGMO with the support to this program from Criteo, a doctoral grant of Ecole Normale Sup erieure, and Maryse & Michel Grill.

Hisham Al-Mharmah and James M. Calvin. Optimal random non-adaptive algorithm for global optimization of Brownian motion. Journal of Global Optimization, 8(1):81 90, 1996.

Kinjal Basu and Souvik Ghosh. Analysis of Thompson sampling for Gaussian process optimization in the bandit setting. ar Xiv preprint ar Xiv:1705.06808, 2018.

James M. Calvin, Mario Hefter, and Andr e Herzwurm. Adaptive approximation of the minimum of Brownian motion. Journal of Complexity, 39(C):17 37, 2017.

Olivier Chapelle and Lihong Li. An empirical evaluation of Thompson sampling. In Neural Information Processing Systems. 2011.

И. В. Денисов. Случайное блуждание и винеровский процесс, рассматриваемые из точки максимума. Теория вероятностей и ее применения, 28(4):785 788, 1983.

Richard T. Durrett, Donald L. Iglehart, and Douglas R. Miller. Weak convergence to Brownian meander and Brownian excursion. The Annals of Probability, 5(1):117 129, 1977.

Mario Hefter and Andr e Herzwurm. Optimal strong approximation of the one-dimensional squared Bessel process. Communications in Mathematical Sciences, 15(8):2121 2141, 2017.

R emi Munos. Optimistic optimization of deterministic functions without the knowledge of its smoothness. In Neural Information Processing Systems, 2011.

Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen. A tutorial on Thompson sampling. Foundations and Trends in Machine Learning, 2018.

William R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25:285 294, 1933.