# Multi-Fidelity Best-Arm Identification

Riccardo Poiani (DEIB, Politecnico di Milano) riccardo.poiani@polimi.it
Alberto Maria Metelli (DEIB, Politecnico di Milano) albertomaria.metelli@polimi.it
Marcello Restelli (DEIB, Politecnico di Milano) marcello.restelli@polimi.it

In several real-world applications, a learner has access to multiple environment simulators, each with a different precision (e.g., simulation accuracy) and cost (e.g., computational time). In such a scenario, the learner faces the trade-off between selecting expensive accurate simulators or preferring cheap imprecise ones. We formalize this setting as a multi-fidelity variant of the stochastic best-arm identification problem, where querying the original arm is expensive, but multiple and biased approximations (i.e., fidelities) are available at lower costs. The learner's goal, in this setting, is to sequentially choose which simulator to query in order to minimize the total cost, while guaranteeing to identify the optimal arm with high probability. We first derive a lower bound on the identification cost, assuming that the maximum bias of each fidelity is known to the learner. Then, we propose a novel algorithm, Iterative Imprecise Successive Elimination (IISE), which provably reduces the total cost w.r.t. algorithms that ignore the multi-fidelity structure and whose cost complexity upper bound mimics the structure of the lower bound. Furthermore, we show that the cost complexity of IISE can be further reduced when the agent has access to a more fine-grained knowledge of the error introduced by the approximators. Finally, we numerically validate IISE, showing the benefits of our method in simulated domains.

1 Introduction

In the multi-armed bandit setting [26], the agent selects, at each interaction round, an arm and observes a sample drawn from its reward distribution. Unlike the regret minimization setting [26], where the agent tries to maximize the cumulative return, in best-arm identification (BAI) problems [10, 6, 2], the evaluation index relates exclusively to the quality of the arm recommended at the end of the exploration phase. More specifically, in the fixed-confidence setting [17], the agent aims at identifying, with probability 1 − δ, the arm with the largest mean reward, while minimizing the sample complexity, i.e., the number of collected samples necessary to make the final decision. Due to its generality, the BAI framework encompasses a wide spectrum of applications, ranging from medical trials [38] to hyper-parameter optimization [27], crowdsourcing [47], recommender systems [25], simulation-based studies in physics [43], and A/B testing [14]. However, in practical cases, querying an arm to obtain a sample from its reward distribution might be expensive. For instance, in the context of physics simulation studies, where pulling an arm corresponds to the evaluation of a complex model under some given arm parameters, intensive use of computing power is required to obtain the reward we are interested in. Nevertheless, simpler and, consequently, cheaper and biased, models might be available to the agent. Another example is stochastic planning, where the agent needs to compute the best action (i.e., arm) that maximizes the future sum of rewards over a maximum horizon H. However, while in principle it is possible to plan
considering the entire horizon, cheaper and biased estimates can be obtained by cutting the depth of the search to h < H. For these scenarios, the multi-fidelity (MF) framework [9, 21, 22], which exploits this additional opportunity, has recently gained attention in the literature. The main rationale behind these methodologies is to exploit less expensive approximations in specific regions of the arm space to limit the resources allocated to distinguish options that are far from optimal and do not require high accuracy.

So far, most of the multi-fidelity literature has focused on black-box function optimization, with the goal of optimizing a target function f when multiple approximations of f are available [15, 33, 22, 34, 39, 40, 19, 28, 11]. Among these works, the ones more related to the bandit literature are [22, 19, 39, 40, 11]; under different smoothness assumptions, these authors aimed at designing strategies that explore the arm space to minimize the simple regret, i.e., the difference between the true optimal value of f and the best unbiased query made to f during the optimization process. Other works studied cumulative regret analysis [21, 20], active learning [46, 32], reinforcement learning [9, 30, 41], and applied approaches [36, 44, 31]. In this paper, on the other hand, we focus on the understudied multi-fidelity fixed-confidence BAI setting. More specifically, we study the minimization of the cost complexity, a novel metric that we introduce by extending the notion of sample complexity to the multi-fidelity setting, weighting each sample with the related cost. The goal of the agent is to output the optimal arm with probability 1 − δ, while minimizing the total cost. To this purpose, as commonly done in the literature [9, 21, 20, 39, 11], we assume that the agent has some knowledge of the quality of the approximators, which is usually expressed in terms of upper bounds on the maximum bias that the fidelities introduce on the arm set. This term corresponds to the maximum distance between the means of the desired (but expensive) arm distributions and their approximations.

Contributions. In this paper, we introduce the notion of cost complexity for the multi-fidelity BAI setting (Section 2). We derive a lower bound for this novel index, under the standard assumption that the agent is equipped with knowledge of the maximum bias that each fidelity has w.r.t. the original, expensive and unbiased bandit model (Section 3). As we shall show, our result highlights several properties of the problem along with its challenges. Furthermore, we propose the Iterative Imprecise Successive Elimination (IISE) algorithm (Section 4), for which we derive cost complexity upper bounds that, under certain assumptions, mimic the structure of the lower bound, and which is provably robust w.r.t. BAI algorithms that ignore the multi-fidelity structure. We also show that the performance of IISE can be further improved when a more fine-grained knowledge of the behavior of the approximations is available to the agent (Section 5). After reviewing the literature (Section 6), we propose numerical experiments on simulated domains that empirically verify our claims (Section 7).

2 Problem Formulation

In the classical fixed-confidence BAI setting [17], the agent interacts with a bandit model ν := (ν_1, . . . , ν_K), where K ∈ ℕ is the number of arms and ν_i denotes the reward distribution associated with arm i ∈ [K] := {1, . . . , K}. Denote with µ_i the expectation of ν_i and assume that the K arms are ordered such that µ_1 > µ_2 ≥ · · · ≥ µ_K.
We assume that the reward distributions ν_i are subgaussian with known scale parameter σ². At each interaction round t ∈ ℕ, the agent selects an arm I_t ∈ [K] and receives a reward R_t sampled from ν_{I_t}. Let T_i(t) be the total number of pulls of arm i ∈ [K] up to time t ∈ ℕ. Given a confidence level δ ∈ (0, 1), the goal of the agent is to recommend an arm Î(τ) ∈ [K] that corresponds to the optimal arm 1 with probability at least 1 − δ, while minimizing the sample complexity $\rho(\tau) := \sum_{i \in [K]} \mathbb{E}[T_i(\tau)]$, where τ is the stopping time of the agent. Any algorithm that recommends the best arm with probability 1 − δ is said to be δ-correct. For every sub-optimal arm i > 1, we define its gap w.r.t. the optimal arm as Δ_i := µ_1 − µ_i.

In this paper, we differ from this classical setting as, for each arm i ∈ [K], we assume the agent has access to M − 1 ∈ ℕ fidelity distributions ν_{i,1}, ν_{i,2}, . . . , ν_{i,M−1}, each σ²-subgaussian, with means µ_{i,1}, µ_{i,2}, . . . , µ_{i,M−1}. We unify the notation by introducing a maximum fidelity M to encode the true arm distributions the agent is interested in, i.e., ν_{i,M} := ν_i with mean µ_{i,M} := µ_i. As in previous works [9, 21, 20, 39, 11], we assume that there is a relation encoding how much information distributions of fidelity m < M contain w.r.t. the true arm distributions. More specifically, for each fidelity m ∈ [M], it is always possible to write µ_{i,m} := µ_{i,M} + ξ_{i,m} for some appropriate ξ_{i,m} ∈ ℝ. Then, we assume that the agent has access to ξ_m ≥ max_{i∈[K]} |ξ_{i,m}|, from which it follows that |µ_{i,m} − µ_{i,M}| ≤ ξ_m. We notice that ξ_m represents an upper bound on the maximum bias of fidelity m ∈ [M]. At each interaction round t ∈ ℕ, the agent selects an arm I_t ∈ [K], a fidelity m_t ∈ [M], and observes a reward R_t drawn from ν_{I_t,m_t}. The cost of gathering samples at fidelity m is known and specified by λ_m > 0. We make the standard assumption that λ_1 < λ_2 < · · · < λ_M and ξ_1 > ξ_2 > · · · > ξ_M := 0.¹ At this point, let T_{i,m}(t) be the number of pulls of arm i ∈ [K] at fidelity m ∈ [M] up to time t ∈ ℕ; we define the cost complexity as:

$$c(\tau) := \sum_{i \in [K]} \sum_{m \in [M]} \lambda_m \, \mathbb{E}\left[T_{i,m}(\tau)\right], \qquad (1)$$

where τ is the stopping time at which the agent will output the optimal arm 1 with probability 1 − δ.

3 Cost Complexity Lower Bound

In this section, we discuss the intrinsic complexity of the multi-fidelity BAI problem by providing and analyzing a lower bound on the cost complexity (proof in Appendix B).

Theorem 1. Consider a multi-fidelity bandit model ν with Gaussian distributions ν_{i,m} = N(µ_{i,m}, σ²) such that |µ_{i,m} − µ_{i,M}| ≤ ξ_m for every i ∈ [K] and m ∈ [M]. Then, for any δ-correct algorithm and δ ≤ 0.15, it holds that:

$$c(\tau) \;\ge\; \log\left(\frac{1}{2.4\,\delta}\right)\left(\min_{m \in \mathcal{M}_1} \frac{\lambda_m}{\mathrm{KL}(\nu_{1,m}, \bar{\nu}_{2,m})} + \sum_{i=2}^{K} \min_{m \in \mathcal{M}_i} \frac{\lambda_m}{\mathrm{KL}(\nu_{i,m}, \bar{\nu}_{1,m})}\right), \qquad (2)$$

where KL(p, q) is the Kullback-Leibler divergence between distributions p and q, $\bar{\nu}_{2,m} = N(\mu_{2,M} + \xi_m, \sigma^2)$, $\bar{\nu}_{1,m} = N(\mu_{1,M} - \xi_m, \sigma^2)$, $\mathcal{M}_1 := \{m \in [M] : \mu_{1,m} > \mathbb{E}_{x \sim \bar{\nu}_{2,m}}[x]\}$, and $\mathcal{M}_i := \{m \in [M] : \mathbb{E}_{x \sim \bar{\nu}_{1,m}}[x] > \mu_{i,m}\}$ for i > 1.

This bound reveals several properties of the multi-fidelity fixed-confidence BAI problem. A single term $\lambda_m \, \mathrm{KL}^{-1}(\nu_{i,m}, \bar{\nu}_{1,m})$ with i ≠ 1 can be interpreted as a lower bound on the minimum cost that any δ-correct algorithm needs to pay to conclude that arm i is sub-optimal. Suppose, for the moment, that $\mathcal{M}_i$ contains every fidelity m ∈ [M]. Then, once we fix m ∈ $\mathcal{M}_i$, its related component $\lambda_m \, \mathrm{KL}^{-1}(\nu_{i,m}, \bar{\nu}_{1,m})$ is the cost-based multi-fidelity counterpart of the usual $\mathrm{KL}^{-1}(\nu_i, \nu_1)$ which appears in the standard BAI lower bound [24]. This can be seen as a lower bound on the cost to discard arm i using samples gathered at fidelity m. Notice that, in this sense, the term $\bar{\nu}_{1,m}$ replaces the usual ν_1 to take the bias of fidelity m into account.
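To make the role of the sets $\mathcal{M}_i$ and of the per-arm terms concrete, the following minimal sketch evaluates $\lambda_m \, \mathrm{KL}^{-1}(\nu_{i,m}, \bar{\nu}_{1,m})$ (and the analogous term for the optimal arm) on a small Gaussian instance. All numbers are hypothetical and the $\log(1/(2.4\delta))$ prefactor is omitted; the snippet only illustrates how the feasible sets and the minima are formed.

```python
# Hedged numerical sketch of the per-arm terms of Theorem 1 (Gaussian fidelities).
# Means, biases, and costs below are hypothetical and chosen only for illustration.
import numpy as np

sigma2 = 1.0                       # common variance sigma^2
xi = np.array([0.4, 0.0])          # max-bias bounds xi_m (xi_M = 0)
lam = np.array([1.0, 100.0])       # costs lambda_m, increasing in m
# mu[i, m] = mean of arm i at fidelity m; the last column is the true mean mu_{i,M}
mu = np.array([[1.0, 1.0],         # arm 1 (optimal)
               [0.9, 0.7],         # arm 2 (small gap)
               [0.3, 0.2]])        # arm 3 (large gap)

def kl_gauss(m1, m2):
    """KL divergence between N(m1, sigma2) and N(m2, sigma2)."""
    return (m1 - m2) ** 2 / (2 * sigma2)

K, M = mu.shape
mu_true = mu[:, -1]
terms = {}
for i in range(1, K):              # sub-optimal arms
    # M_i: fidelities where the shifted optimal arm nu_bar_{1,m} still beats arm i
    feas = [m for m in range(M) if mu_true[0] - xi[m] > mu[i, m]]
    terms[i + 1] = min(lam[m] / kl_gauss(mu[i, m], mu_true[0] - xi[m]) for m in feas)
# term for the optimal arm, against the shifted second-best arm nu_bar_{2,m}
feas1 = [m for m in range(M) if mu[0, m] > mu_true[1] + xi[m]]
terms[1] = min(lam[m] / kl_gauss(mu[0, m], mu_true[1] + xi[m]) for m in feas1)
print(terms)   # the cheap fidelity is admissible only for the large-gap arm
```

In this toy instance, the cheap fidelity belongs to $\mathcal{M}_i$ only for the arm with the large gap, which is precisely the effect discussed next.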
The minimum, consequently, encodes a lower bound on the cost of the most convenient fidelity with which we can discard i. The reason why we have a minimum over $\mathcal{M}_i$ instead of [M] is that, given a fidelity m ∈ [M], m might not be useful to discard i, since Δ_i might be too small for the bias that m introduces. This concept is further clarified in the following remark:

Remark 1. Consider the following problem instance: for each m ∈ [M], let µ_{1,m} = µ_{1,M} − ξ_m and µ_{i,m} = µ_{i,M} + ξ_m for i ≠ 1. Then, Theorem 1 reduces (up to constants and the log(1/δ) factor) to:

$$\sum_{i=2}^{K} \min_{m \in [M] :\, \Delta_i > 2\xi_m} \frac{\lambda_m \sigma^2}{(\Delta_i - 2\xi_m)^2}.$$

As we can see, if Δ_i < 2ξ_m for some m, the component related to fidelity m is not present in the lower bound. Furthermore, it is interesting to note that the setting of Remark 1 represents a worst-case fidelity-dependent scenario. Indeed, each fidelity m ∈ [M − 1] underestimates the reward of the optimal arm and overestimates the ones of sub-optimal arms.

Finally, we show how Theorem 1 highlights the main challenge of our setting. For each value of ξ_m, there exists a multi-fidelity bandit model ν for which the lower bound is given by $\widetilde{\Omega}\left(\sum_{i=2}^{K} \frac{\lambda_M \sigma^2}{\Delta_i^2}\right)$ (given Remark 1, it is sufficient to consider gaps Δ_i such that the only fidelity in the minimization set is M). It is clear that this cost complexity can be matched (up to log factors) by many BAI algorithms available in the literature (e.g., [10]) that ignore the multi-fidelity structure of the problem. The main issue, in this sense, is the fact that the agent does not know, a priori, the values of Δ_i. To build an algorithm that achieves robustness (i.e., not being worse than these uninformed strategies), then, we need to control how much we spend to explore the arm distributions at fidelity m < M. More generally, this reasoning also extends to building algorithms that match the lower bound. In this sense, we observe, even for the fixed-confidence BAI setting, the adversarial behavior that fidelities might have, as previously reported in [21] for the finite-armed bandit regret minimization setting.

¹ Cases in which λ_i = λ_{i+1} or ξ_i = ξ_{i+1} hold can easily be ruled out. Indeed, with the same cost one could gather a more precise sample using another fidelity, or, equivalently, one could gather a sample of the same precision using a cheaper fidelity; i.e., we can remove these dominated fidelities prior to running any method.

Figure 1: A visualization of how IISE works. Fidelities range from m = 1 (low cost, high bias) to m = M (high cost, low bias); at each fidelity m, IISE performs arm elimination and, once the confidence radius drops below the threshold α_m, it moves to fidelity m + 1. At the beginning, IISE pulls arms at the cheapest fidelity in a Successive Elimination fashion. Using samples from these biased distributions, it tries to discard the sub-optimal arms i whose gap Δ_i is sufficiently large. Once a given number of samples, determined by α_m, has been reached, IISE moves to the subsequent fidelity. This process goes on until the optimal arm is identified with high probability.

4 Iterative Imprecise Successive Elimination

In this section, we present our algorithm, Iterative Imprecise Successive Elimination (IISE, Algorithm 1), to solve multi-fidelity fixed-confidence BAI problems. In particular, while developing our method, we pursue the following two objectives. First, we want to guarantee robustness w.r.t. uninformed strategies that blindly use only the highest fidelity M to identify the best arm. Indeed, we expect a good agent to be able to reduce the total cost by exploiting the peculiar multi-fidelity structure of the problem.
Second, we aim at providing a strategy whose upper-bound structure resembles, as closely as possible, that of the lower bound presented in Section 3.

Our solution (pseudo-code in Algorithm 1 and visual representation in Figure 1) builds on the Successive Elimination algorithm [10]. More specifically, we proceed by epochs, one for each fidelity m ∈ [M], considered in increasing order. In each phase m ∈ [M], we query all the active arms, i.e., those that have not been discarded yet, through their approximated distributions ν_{i,m} at fidelity m. We denote the set of active arms with S ⊆ [K]. IISE keeps the usual empirical mean estimates µ̂_{i,m,t} of the means µ_{i,m} of the reward distributions ν_{i,m} for each i ∈ S, and discards all the arms i for which the following applies:

$$\exists\, j \in S :\ \hat{\mu}_{j,m,t} - U\!\left(t, \tfrac{\delta}{KM}, \xi_m\right) \;\ge\; \hat{\mu}_{i,m,t} + U\!\left(t, \tfrac{\delta}{KM}, \xi_m\right), \qquad (3)$$

where U(t, δ, ξ) is a symmetric bound on the empirical means (defined below). Condition (3) aims at eliminating all the active arms i whose estimated empirical mean upper bound falls below the empirical mean lower bound of another active arm j. More specifically, we set $U(t, \delta, \xi) := B(t, \delta) + \xi = \sqrt{\frac{2\sigma^2 \log(4t^2/\delta)}{t}} + \xi$. As we can see, U(t, δ, ξ) is composed of the usual concentration term B(t, δ) plus the fidelity bias ξ. As our proof highlights, this additional term is required to ensure that the optimal arm remains, with high probability, in the active set S until termination.²

Algorithm 1 Iterative Imprecise Successive Elimination (IISE)
Require: multi-fidelity bandit model ν, confidence δ, thresholds {α_m}_{m=1}^{M}, bias bounds {ξ_m}_{m=1}^{M}
1: S ← [K]
2: m ← 1 and t ← 0
3: while |S| > 1 do
4:   if α_m ≥ 4B(t, δ/(KM)) then
5:     m ← m + 1 and t ← 0
6:   end if
7:   pull all arms in S at fidelity m and set t ← t + 1
8:   update µ̂_{j,m,t} for all j ∈ S
9:   S ← S \ {i ∈ S : ∃ j ∈ S, µ̂_{j,m,t} − U(t, δ/(KM), ξ_m) ≥ µ̂_{i,m,t} + U(t, δ/(KM), ξ_m)}
10: end while
11: return S

² Omitting ξ in the elimination rule might wrongly eliminate the optimal arm just because, at low fidelity, its approximated distribution has a negative bias.

The duration of each epoch (i.e., how much IISE tries to exploit fidelity m) is determined by a threshold parameter α_m (to be defined later). At each round, the value of B(t, δ/(KM)) decreases, and as soon as 4B(t, δ/(KM)) drops below α_m, IISE switches to the next fidelity m + 1. Notice that, to achieve δ-correctness, it is required to set α_M to 0, so that the last phase is guaranteed to identify the optimal arm, and α_m > 0 for m < M, so that each phase is assured to terminate in a finite number of rounds. Then, keeping generic thresholds α_m that satisfy the previous conditions, we can show that IISE is δ-correct and its cost complexity is upper bounded as in the following theorem (proof in Appendix C).

Theorem 2. If α_m > 0 for every m ∈ [M − 1] and α_M = 0, then, with probability at least 1 − δ, IISE returns the optimal arm 1 with cost complexity c(τ) upper bounded by:

$$c(\tau) \le O\!\left(\sum_{i=2}^{K}\left[\frac{\lambda_{m_i}\sigma^2}{(\Delta_i - 4\xi_{m_i})^2}\log\!\left(\frac{\sigma^2 M K}{(\Delta_i - 4\xi_{m_i})^2\,\delta}\right) + \sum_{m=1}^{m_i - 1}\frac{\lambda_m \sigma^2}{\alpha_m^2}\log\!\left(\frac{\sigma^2 M K}{\alpha_m^2\,\delta}\right)\right]\right),$$

where m_i is the smallest m ∈ [M] for which Δ_i ≥ 4ξ_m + α_m holds.

We notice that the cost required by IISE to discard an arm i ≠ 1 is composed of two terms. The first one is the cost of discarding arm i using samples at fidelity m_i, while the second one is the cost of trying to discard arm i at the lower fidelities m < m_i. As our proof reveals, m_i is the smallest fidelity m for which IISE ensures that arm i will be discarded.
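To fix ideas, below is a minimal Python sketch of the loop of Algorithm 1, assuming Gaussian rewards. Here `sample(i, m)` is a hypothetical oracle returning one reward of arm i at fidelity m (0-indexed), corner cases such as the t = 0 check are made explicit, and the principled choice of the thresholds α_m is deferred to Section 4.1; this is a simplified illustration, not the authors' implementation.

```python
# A simplified, illustrative rendering of Algorithm 1 (IISE).
import math

def iise(sample, K, M, delta, sigma2, xi, alpha):
    """Successive elimination across fidelities; returns the surviving arm index."""
    def B(t, d):                          # concentration radius B(t, delta)
        return math.sqrt(2 * sigma2 * math.log(4 * t * t / d) / t)

    d = delta / (K * M)
    S = set(range(K))                     # active arms
    m, t = 0, 0                           # current fidelity and within-epoch counter
    sums = [0.0] * K                      # reward sums at the current fidelity
    while len(S) > 1:
        if t > 0 and m < M - 1 and alpha[m] >= 4 * B(t, d):
            m, t = m + 1, 0               # switch to the next (more precise) fidelity
            sums = [0.0] * K              # estimates restart at the new fidelity
        t += 1
        for i in S:                       # pull every active arm once at fidelity m
            sums[i] += sample(i, m)
        mu_hat = {i: sums[i] / t for i in S}
        U = B(t, d) + xi[m]               # confidence radius inflated by the bias bound
        best_lcb = max(mu_hat[j] - U for j in S)
        S = {i for i in S if mu_hat[i] + U > best_lcb}    # elimination rule (3)
    return S.pop()
```

Note that, with alpha[M-1] = 0, the sketch never leaves the highest fidelity once it reaches it, mirroring the δ-correctness requirement above.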
4.1 On the Choice of the Thresholds α_m

In this section, we highlight one of the main challenges of the MF setting, as mentioned in Section 3. Since we are unaware of the suboptimality gaps Δ_i, we may attempt to choose the α_m so as to minimize the cost complexity of Theorem 2 w.r.t. the worst-case choice of the suboptimality gaps Δ_i, that is:

$$\max_{\boldsymbol{\Delta}}\ \sum_{i=2}^{K}\left[\frac{\lambda_{m_i}\sigma^2}{(\Delta_i - 4\xi_{m_i})^2}\log\!\left(\frac{\sigma^2 M K}{(\Delta_i - 4\xi_{m_i})^2\,\delta}\right) + \sum_{m=1}^{m_i - 1}\frac{\lambda_m\sigma^2}{\alpha_m^2}\log\!\left(\frac{\sigma^2 M K}{\alpha_m^2\,\delta}\right)\right], \qquad (4)$$

where $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_M)$ and $\boldsymbol{\Delta} = (\Delta_2, \ldots, \Delta_K)$ are the threshold and gap vectors. However, it is easy to verify that Equation (4) is minimized for α_m = +∞ for all m < M and α_M = 0, i.e., by an algorithm that exploits the highest fidelity M only. As we can see, IISE reflects the peculiar issue of the MF-BAI lower bound: in the worst case, any fidelity m < M is not useful to reduce the identification cost, since it introduces too much bias w.r.t. Δ_i, regardless of its cost. For this reason, as done in previous works [21], we restrict the class of problems with some assumptions whose goal is to make the fidelity structure provably convenient. More specifically:

Assumption 1 (Costs and Biases Relationship). For every fidelity m ∈ [M − 1], it holds that:

$$\max_{k < m} \frac{\left(\sqrt{\lambda_k} - \sqrt{\lambda_m}\right)^2}{(\xi_m - \xi_k)^2} \;\le\; \min_{k > m} \frac{\left(\sqrt{\lambda_k} - \sqrt{\lambda_m}\right)^2}{(\xi_m - \xi_k)^2}. \qquad (5)$$

Assumption 1 deserves some comments. First of all, it directly encodes a relationship between costs and biases. To provide an interpretation, we notice that the assumption is always satisfied for: (i) costs that grow at least linearly,³ and (ii) ξ's that decay sufficiently fast. Given a sub-optimal arm i and two fidelities m < m', we now want to characterize the situations in which gathering samples at m is more convenient, in terms of identification cost, w.r.t. m'. We formalize this concept in the following definition.

Definition 1 (Fidelity Convenience). Consider a sub-optimal arm i ≠ 1 and a fidelity m < M such that Δ_i > 4ξ_m. We say that fidelity m is convenient for arm i if:

$$\frac{\lambda_m}{(\Delta_i - 4\xi_m)^2} \;\le\; \min_{m' > m} \frac{\lambda_{m'}}{(\Delta_i - 4\xi_{m'})^2}. \qquad (6)$$

In Equation (6), we are comparing the cost of discarding i by gathering samples at phase m only with the cost of discarding i using a more precise fidelity m' > m. Therefore, solving it for Δ_i, we identify the gaps of the arms that can be conveniently discarded at fidelity m rather than at m':

$$\Delta_i \;\ge\; \max_{M \ge m' > m} \frac{4\left(\xi_m\sqrt{\lambda_{m'}} - \xi_{m'}\sqrt{\lambda_m}\right)}{\sqrt{\lambda_{m'}} - \sqrt{\lambda_m}} \;=\; \max_{M \ge m' > m}\left(4\xi_{m'} + \frac{4(\xi_m - \xi_{m'})\sqrt{\lambda_{m'}}}{\sqrt{\lambda_{m'}} - \sqrt{\lambda_m}}\right). \qquad (7)$$

To discard all i that satisfy Equation (7), it is sufficient to set the thresholds α_m for m < M as follows:

$$\alpha_m := \max_{M \ge m' > m} \frac{4(\xi_m - \xi_{m'})\sqrt{\lambda_{m'}}}{\sqrt{\lambda_{m'}} - \sqrt{\lambda_m}}. \qquad (8)$$

At this point, under Assumption 1, it is possible to prove that, with this peculiar threshold choice, the cost complexity of IISE can be rewritten in the following way (proof in Appendix C).

³ More formally, $(\sqrt{\lambda_{i+1}} - \sqrt{\lambda_i})^2 \le (\sqrt{\lambda_{j+1}} - \sqrt{\lambda_j})^2$ for $i, j \in [M-1]$ and $j > i$.
⁴ In the discounted setting, this is equivalent to searching for an ε-optimal action.

Theorem 3. Under Assumption 1, selecting the thresholds α_m as in Equation (8), with probability at least 1 − δ, IISE returns the optimal arm with cost complexity c(τ) upper bounded by:

$$c(\tau) \le \widetilde{O}\!\left(\sum_{i=2}^{K} \min_{m \in [M] :\, \Delta_i > 4\xi_m} \frac{\lambda_m \sigma^2}{(\Delta_i - 4\xi_m)^2}\right).$$

As we can appreciate, this result comes with two desired properties. First of all, IISE is provably robust w.r.t. algorithms that ignore the multi-fidelity structure of the problem. We remark that, to have this nice property, Assumption 1 does not need to hold for each m ∈ [M]: it is always possible to pre-select a subset of fidelities for which the assumption holds. Secondly, the bound in Theorem 3 closely mimics the behavior of the lower bound (Equation 2). The only difference stands in the fact that 2ξ_m is now replaced with 4ξ_m. We leave this gap to be filled in future work. Notice, again, that the similarity also holds for the case in which we select a subset of fidelities for which the assumption holds. In this case, however, our minimization will be on a restricted fidelity set w.r.t. the original one.
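Before moving on, here is a small sketch of the threshold rule in Equation (8). The cost and bias vectors in the usage example are the Synthetic A parameters reported in Section 7; everything else is a straightforward transcription of the formula and is only meant as an illustration.

```python
# Thresholds alpha_m of Equation (8); alpha_M = 0 so that the last phase is exact.
import math

def iise_thresholds(lam, xi):
    """alpha_m = max_{m' > m} 4 (xi_m - xi_{m'}) sqrt(lam_{m'}) / (sqrt(lam_{m'}) - sqrt(lam_m))."""
    M = len(lam)
    alpha = [0.0] * M
    for m in range(M - 1):
        alpha[m] = max(
            4 * (xi[m] - xi[mp]) * math.sqrt(lam[mp])
            / (math.sqrt(lam[mp]) - math.sqrt(lam[m]))
            for mp in range(m + 1, M)
        )
    return alpha

# Usage example with the Synthetic A costs and bias bounds of Section 7:
print(iise_thresholds([1, 10, 100, 1000], [1.15, 0.225, 0.015, 0.0]))
```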
4.3 Nearly-Optimal Identification

As for the Successive Elimination algorithm, one can easily modify IISE so that it returns an ε-optimal arm. In this case, we can show that the identification cost is upper bounded by (proof in Appendix C):

$$\widetilde{O}\!\left(\sum_{i \notin K_\epsilon} \min_{m :\, \Delta_i > 4\xi_m} \frac{\lambda_m \sigma^2}{(\Delta_i - 4\xi_m)^2} \;+\; \sum_{i \in K_\epsilon} \min_{m :\, \epsilon > 4\xi_m} \frac{\lambda_m \sigma^2}{(\epsilon - 4\xi_m)^2}\right),$$

where $K_\epsilon := \{i \in [K] : \mu_i > \mu_1 - \epsilon\}$. As usual, for arms that are not ε-optimal we pay the original cost of Theorem 3. For the ones in K_ε, instead, their cost is actually clipped by ε. In multi-fidelity, this has the further implication that, depending on the values of the biases, the costs and the required accuracy, we might never query the most expensive unbiased fidelity M. In particular, let us focus on a two-armed bandit problem with two fidelity values, and consider Δ_2 < ε. Then, we will never query M if:

$$\epsilon \;\ge\; 4\xi_m + \frac{4\xi_m\sqrt{\lambda_m}}{\sqrt{\lambda_M} - \sqrt{\lambda_m}}.$$

The first term guarantees that IISE can discard arm 2 using fidelity m; the second one, instead, encodes the convenience of m w.r.t. M in terms of costs. As λ_M increases, this term shrinks to zero. When λ_M → +∞, the condition reduces to ε > 4ξ_m. This means that if fidelity m is precise enough for the desired accuracy ε, we completely neglect fidelity M.

5 Breaking the Cost Barrier

We now show that (a slightly modified version of) IISE can further reduce the cost complexity when more fine-grained knowledge on the type of error that the approximators introduce is available. The main intuition behind the following reasoning is that what matters when exploiting a fidelity m < M is not the bias it introduces, but whether it preserves the order of the arms w.r.t. the means of the true reward distributions ν_{i,M}. Indeed, suppose to have access to a fidelity with a very large bias but which fully preserves the order on the means of the arms. In this scenario, it is possible to run a standard BAI technique on this fidelity only and still obtain the optimal arm. We now formalize how much a certain fidelity m preserves the order with the novel concept of maximum bias variation γ_m:

$$\gamma_m := \max_{i,j \in [K]} |\xi_{i,m} - \xi_{j,m}|.$$

Under the assumption that γ_1 ≥ γ_2 ≥ · · · ≥ γ_M := 0 are available to the learner, IISE can be conveniently modified to obtain the following bound on the cost complexity (proofs, pseudocode, and formal statement in Appendix D):⁵

$$\widetilde{O}\!\left(\sum_{i=2}^{K} \min_{m \in [M] :\, \Delta_i > 2\gamma_m} \frac{\lambda_m \sigma^2}{(\Delta_i - 2\gamma_m)^2}\right). \qquad (9)$$

To obtain such a result, the only required modifications stand in the elimination condition, which now becomes $\hat{\mu}_{j,m,t} - B(t, \tfrac{\delta}{KM}) \ge \hat{\mu}_{i,m,t} + B(t, \tfrac{\delta}{KM}) + \gamma_m$, and in the definition of the thresholds α_m, in which the 4ξ's are swapped with 2γ's. We will refer to this modified algorithm as IISE-γ. As our proofs show, IISE-γ can also exploit upper bounds φ_m on the true maximum bias variation (i.e., φ_m ≥ γ_m); the complexity is as in Equation (9), but with γ_m replaced by φ_m. We remark that, by definition, γ_m ≤ 2ξ_m, which provably demonstrates the benefits of exploiting order-aware knowledge in the MF-BAI setting. Interestingly, we also notice that in the worst-case scenario (i.e., γ_m = 2ξ_m), Equation (9) implicitly recovers the cost complexity of Theorem 3.

⁵ The cost complexity bound is obtained under an assumption equivalent to Assumption 1 in which all ξ's are replaced with γ's.

Practical Relevance of Maximum Bias Variation. Finally, we highlight that knowledge of upper bounds φ_m is available in a large class of problems. In particular, consider settings in which the means at fidelity m < M are underestimations of the ones at fidelity M, i.e., µ_{i,M} − µ_{i,m} ≥ 0 for all i ∈ [K] and m ∈ [M]. This is the case, for instance, of the stochastic planning application of Section 4, but it is also common in other settings, such as training models with fewer iterations.
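As a quick numerical illustration of this underestimation regime, the following sketch computes γ_m on a hypothetical bias matrix ξ_{i,m} and compares it with the maximum-bias bound ξ_m; all numbers are made up for illustration.

```python
# gamma_m (maximum bias variation) vs. xi_m (maximum bias) on a hypothetical
# bias matrix in which every fidelity underestimates the true means (xi_{i,m} <= 0).
import numpy as np

xi_im = np.array([[-0.30, -0.04],      # xi_{i,m} = mu_{i,m} - mu_{i,M}, arms on rows
                  [-0.25, -0.05],
                  [-0.10, -0.02]])

xi_m = np.abs(xi_im).max(axis=0)                    # maximum bias per fidelity
gamma_m = xi_im.max(axis=0) - xi_im.min(axis=0)     # equals max_{i,j} |xi_{i,m} - xi_{j,m}|
print("xi_m    :", xi_m)                            # [0.30 0.05]
print("gamma_m :", gamma_m)                         # [0.20 0.03], never above xi_m here
```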
In these problems, it always holds that γ_m ≤ ξ_m (proof in Appendix D). Therefore, we can set φ_m = ξ_m in IISE-γ to obtain the improved cost complexity of Equation (9), where 2γ_m is replaced with 2ξ_m.

Table 1: Cost complexity results (mean and 95% confidence interval over 100 runs), expressed as a percentage of the cost complexity of Successive Elimination (see Section 7).

| Algorithm | Synthetic A | Synthetic B | Yahtzee |
| --- | --- | --- | --- |
| MFE | 212.43 ± 3.89 | 3330.27 ± 62.59 | 342.75 ± 0.69 |
| IISE | 11.47 ± 0.31 | 27.06 ± 0.77 | 0.53 ± 0.01 |
| IISE-γ | 2.53 ± 0.06 | 6.39 ± 0.17 | 0.001 ± 0.01 |

6 Related Works

In this section, we review the literature with particular attention to best-arm identification and the multi-fidelity setting.

Best-Arm Identification. The BAI setting has aroused the interest of the research community for a long time [10, 6, 2]. In particular, [10] proposed the Successive Elimination algorithm, together with the first gap-dependent upper bound on the sample complexity: $O\left(\sum_{i=2}^{K} \Delta_i^{-2}\left(\log(\delta^{-1}) + \log(K) + \log(\Delta_i^{-2})\right)\right)$. This algorithm is known to match the lower bound up to logarithmic factors. Indeed, [29, 24] showed that, for all possible instances, every δ-correct algorithm requires at least $\Omega\left(\sum_{i=2}^{K} \Delta_i^{-2}\log(\delta^{-1})\right)$ samples to output the optimal arm with probability 1 − δ. To overcome the logarithmic mismatch, several works have proposed both tighter upper [12, 23, 16, 7, 13, 8] and lower bounds [16, 13, 8]. Concerning the BAI literature, we leverage the change-of-distribution arguments presented in [24] to build lower bounds for our novel cost complexity metric. As for the upper bound, instead, to favor simplicity of exposition and intuition, we build on the Successive Elimination algorithm. Clearly, our proofs can straightforwardly incorporate the techniques from more refined analyses [e.g., 23] to tighten the logarithmic dependencies.

Multiple Fidelity. The use of multiple fidelities has attracted particular attention in the Bayesian Optimization (BO) [5] field, with a large variety of algorithms that operate under different assumptions and with different goals [15, 33, 34, 28]. In particular, a recent line of works [22, 19] has focused on minimizing the simple regret suffered when a certain budget has been spent to optimize a target function under smoothness conditions guaranteed by Gaussian processes [37]. The study of the simple regret has also been extended to black-box function optimization [39, 40, 11] under hierarchical partition assumptions on the target function. However, all previous works differ significantly from the study we present in this paper. Indeed, they optimize different goals w.r.t. our novel cost complexity notion and make structured assumptions about the target function (i.e., the arm space). With most of them [21, 20, 39, 11], we share the assumption on the knowledge of the upper bound on the maximum bias ξ_m for each fidelity. Of particular interest w.r.t. our work is [11]: the authors highlighted the concept that what matters when exploiting a fidelity m < M is how much it preserves the order of the arms rather than the bias it introduces. However, their assumption about order preservation is expressed in a general form; with the maximum bias variation γ_m, instead, we propose a closed-form expression for this concept, and we quantitatively show its impact in terms of benefits with respect to the upper bound on the maximum bias.
We also showed that upper bounds φm on this metric are known and available in a large class of problems, and that our algorithm can successfully exploit this knowledge to reduce identification cost. The work most closely related to ours is [21]. The authors consider the finite arm setting with known upper bounds on the maximum bias for each fidelity. However, while we take into account the identification cost, in [21], the authors analyze a pseudo-regret notion that sets as an optimal strategy the one that invests the whole budget in querying the optimal arm. Interestingly, we got similar results, in terms of structure, for what concerns the lower bound, although we remark that the proof techniques are different since we deal with a different performance metric. As for the proposed algorithm and the upper bounds, our proofs rely on novel concepts such as the fidelity convenience (i.e., Definition 1), which are tailored to the identification cost we have analyzed. In addition, as we have seen in Section 4, we make a different assumption w.r.t. [21]: our assumption directly states a relationship between costs and biases, which is more reasonable since the goal of the multi-fidelity setting is to trade-off between the two. Finally, w.r.t. [21], we analyze the effect of more fine-grained knowledge on different fidelity, which provably leads to performance improvements. To conclude, some works [9, 30, 41] have also investigated the use of multi-fidelity simulators in Reinforcement Learning (RL) [42] domains. Among these works, [9] develops fixed confidence algorithms (up to a desired precision ϵ > 0) both for the RL setting and for the bandit one. However, their setting is significantly different from ours as they do not consider each fidelity to be related to a given cost. Instead, they simply aim at minimizing the number of sub-optimal steps at fidelity M, while controlling the number of samples taken in fidelity m < M to ensure they are polynomial. 7 Numerical Validation In this section, we provide a numerical validation of our theoretical claims in both synthetic and simulated domains. To prove that IISE is successful at exploiting fidelity, we compare it with: (i) Successive Elimination (SE) [10], which exploits only maximum fidelity, and with a version of IISE that chooses the thresholds αm so as to exploit every fidelity as much as possible (i.e., αm close to 0). We call this modified version Maximum Fidelity Exploit (MFE) algorithm. Table 1 reports all the results in terms of the percentage of cost complexity of a given algorithm w.r.t. the one of Successive Elimination. The value 100 corresponds to the relative cost complexity of SE. Synthetic bandits. We start presenting some experiments on two synthetic bandits (Synthetic A and Synthetic B) with randomly generated arms. These experiments aim at validating our robustness results along with the benefit of the order-aware knowledge. Synthetic A setting parameters are K = 2000, M = 4, λ = [1, 10, 100, 1000], ξ = [1.15, 0.225, 0.015, 0], γ = [0.3, 0.05, 0.001, 0]; for Synthetic B, instead, K = 1000, M = 5, λ = [16, 64, 256, 1024, 4096], ξ = [1.15, 0.45, 0.105, 0.0105, 0], γ = [0.3, 0.1, 0.01, 0.001, 0]. To make the γ and ξ settings directly comparable, we generated the arms such that the fidelity index m is left unchanged (i.e., it simultaneously holds that λ1 > λ2 > > λM, ξ1 > ξ2 > > ξM, and γ1 > γ2 > > γM). Further details on this generation process can be found in Appendix E. 
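The exact generation procedure is described in Appendix E. As a purely illustrative stand-in (assuming Gaussian rewards with unit variance, which is not stated above), the following sketch draws a random instance consistent with the Synthetic A parameters K, λ, and ξ reported in this section; it does not, however, enforce the reported γ_m values.

```python
# Illustrative generator of a synthetic multi-fidelity bandit instance; the actual
# procedure used for the experiments is the one described in Appendix E.
import numpy as np

rng = np.random.default_rng(0)
K, sigma = 2000, 1.0                                 # sigma is a hypothetical choice
lam = np.array([1, 10, 100, 1000])                   # costs lambda_m (Synthetic A)
xi = np.array([1.15, 0.225, 0.015, 0.0])             # max-bias bounds xi_m (Synthetic A)

mu_true = rng.uniform(0.0, 5.0, size=K)              # hypothetical true means mu_{i,M}
biases = rng.uniform(-xi, xi, size=(K, len(xi)))     # xi_{i,m} in [-xi_m, xi_m]
biases[:, -1] = 0.0                                  # fidelity M is unbiased
mu = mu_true[:, None] + biases                       # fidelity means mu_{i,m}

def sample(i, m):
    """One Gaussian reward of arm i at fidelity m; usable with the IISE sketch above."""
    return rng.normal(mu[i, m], sigma)
```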
As we can appreciate from Table 1, IISE substantially reduces the cost complexity w.r.t. SE, and, its order-aware version IISE-γ, obtains further improvements. Table 1 also highlights the importance of choosing a good threshold αm. Indeed, MFE insists in exploiting a given fidelity, even when there is no convenience in doing so, thus obtaining significantly worse results than SE. Yahtzee. We now apply our algorithm to the problem of choosing the first action in the stochastic planning domain introduced in Section 4. More specifically, we consider the Yahtzee game [3]. Yahtzee is a sequential dice game in which a player rolls 5 dice up to 3 times and then chooses a particular move that assigns a score to the final dice combination. For instance, typical moves are: Sixes with a score given by the sum of dice with the number 6, and Yahtzee , that assigns 50 points if all the dice show the same number, and 0 otherwise. The total number of these possible moves is 14 (i.e., our branching factor Λ). The game proceeds in 13 rounds. During each round t, the player selects a move and obtains the corresponding points. We consider the variant of Yathzee in which, if the player has already selected a certain move at t < t, then, at t, that move will lead to reward 0. Furthermore, during each round, the player will roll the 5 dice only once, after which a move has to be selected. Further details are provided in Appendix E. Table 1 shows that our method significantly reduces the total number of generated nodes, thus lowering successfully the total cost of identifying the best action, and, that exploiting the upper bounds on the maximum bias variation instead of ξ s is beneficial in terms of total cost. Finally, MFE overexploits fidelity and underperforms SE. 8 Conclusions and Discussion In this work, we have studied, for the first time, the variant of the fixed-confidence BAI scenario in which the agent has access to multiple approximations with different costs and biases of the original bandit model. We proposed the novel notion of cost complexity to evaluate the performance that a given algorithm obtains in these situations. We derived a lower bound on this index, which shows several interesting properties of the problem, along with its main challenges. We also presented a novel algorithm, IISE, which provably reduces the cost complexity compared to algorithms that ignore the availability of multiple fidelity. More specifically, IISE discards a sub-optimal arm i with the most convenient, in terms of cost and fidelity, thus closely mimicking the shape of the lower bound. Furthermore, we also showed that the maximum bias is not what actually matters in identifying the best arm. Indeed, better guarantees can be obtained with the knowledge of the maximum bias variation, an original concept that we introduce to encode how much a given fidelity preserves order. To close the problem from a theoretical perspective, it is reasonable to assume that both upper and lower bounds could be improved in settings that do not fall under Assumption 1. More specifically, the lower bound of Theorem 1, that considers the minimum over the entire fidelity set, seems currently optimistic since it does not encode the cost that is needed to explore at other fidelities. To tackle this issue, proof techniques that rely on the characteristic time function [13] might lead to tighter lower bounds. 
Consequently, adapting to the MF setting their well-known Track and Stop" strategy [13] might provide upper bounds that asymptotically match the lower bound in its characteristic time form. However, results obtainable from these approaches usually come at the cost of lower interpretability (and, depending on the nature of the resulting characteristic function, computational complexity). On the other hand, Theorem 1 provides several intuitions behind the challenges of the Multi-Fidelity BAI setting, and IISE is computationally efficient. Furthermore, whenever Assumption 1 does not hold, the cost of exploring" lower fidelities becomes more and more relevant. Indeed, Assumption 1 guarantees that there is a large enough fidelity gap" (i.e., fidelity differs in terms of the joint costs-biases relationship specified by the assumption). The main issue in settings, where this gap is small, roots down to the chicken-egg problem of MF-BAI. On one side, we would like to exploit these approximators to discard sub-optimal arms; on the other side, we do not know if these approximators will be useful (i.e., in the worst case, we pay an exploration cost). In the case, where the fidelity-gap" is small, this problem is repeated over and over for all these fidelities close to one another (i.e., the cumulative sum of the exploration costs becomes non-negligible). In this sense, tighter lower bounds (such as the ones previously suggested) might shed light on whether such assumptions are necessary or not, or if tighter assumptions are possible/necessary. This represents an exciting line for future research. To conclude, we note that previous studies [45, 18, 1] highlighted the strong connections between fixed-confidence BAI and learning near-optimal RL policies while minimizing the number of samples gathered in the environment. In this sense, our ideas pave the way for future work that aims at learning these policies when multiple simulators are available. Indeed, previous efforts in MF-RL [9, 30, 41] do not directly take the challenging cost-precision trade-off into account. [1] Aymen Al Marjani and Alexandre Proutiere. Adaptive sampling for best policy identification in markov decision processes. In International Conference on Machine Learning, pages 7459 7468. PMLR, 2021. [2] Jean-Yves Audibert, Sébastien Bubeck, and Rémi Munos. Best arm identification in multi-armed bandits. In COLT, pages 41 53. Citeseer, 2010. [3] Andrea Bonarini, Alessandro Lazaric, and Marcello Restelli. Yahtzee: a large stochastic environment for rl benchmarks. In NIPS 2005 Workshop on Reinforcement Learning Benchmarks and Bake-offs II, pages 43 48, 2005. [4] Stéphane Boucheron, Gábor Lugosi, and Olivier Bousquet. Concentration inequalities. In Summer school on machine learning, pages 208 240. Springer, 2003. [5] Eric Brochu, Vlad M Cora, and Nando De Freitas. A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. ar Xiv preprint ar Xiv:1012.2599, 2010. [6] Sébastien Bubeck, Rémi Munos, and Gilles Stoltz. Pure exploration in multi-armed bandits problems. In International conference on Algorithmic learning theory, pages 23 37. Springer, 2009. [7] Lijie Chen and Jian Li. On the optimal sample complexity for best arm identification. ar Xiv preprint ar Xiv:1511.03774, 2015. [8] Lijie Chen, Jian Li, and Mingda Qiao. Towards instance optimal bounds for best arm identification. In Conference on Learning Theory, pages 535 592. PMLR, 2017. 
[9] Mark Cutler, Thomas J Walsh, and Jonathan P How. Reinforcement learning with multi-fidelity simulators. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 3888 3895. IEEE, 2014. [10] Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Pac bounds for multi-armed bandit and markov decision processes. In International Conference on Computational Learning Theory, pages 255 270. Springer, 2002. [11] Côme Fiegel, Victor Gabillon, and Michal Valko. Adaptive multi-fidelity optimization with fast learning rates. In International Conference on Artificial Intelligence and Statistics, pages 3493 3502. PMLR, 2020. [12] Victor Gabillon, Mohammad Ghavamzadeh, and Alessandro Lazaric. Best arm identification: A unified approach to fixed budget and fixed confidence. Advances in Neural Information Processing Systems, 25, 2012. [13] Aurélien Garivier and Emilie Kaufmann. Optimal best arm identification with fixed confidence. In Conference on Learning Theory, pages 998 1027. PMLR, 2016. [14] Steven R Howard and Aaditya Ramdas. Sequential estimation of quantiles with applications to a/b testing and best-arm identification. Bernoulli, 28(3):1704 1728, 2022. [15] Deng Huang, Theodore T Allen, William I Notz, and R Allen Miller. Sequential kriging optimization using multiple-fidelity evaluations. Structural and Multidisciplinary Optimization, 32(5):369 382, 2006. [16] Kevin Jamieson, Matthew Malloy, Robert Nowak, and Sébastien Bubeck. lil ucb: An optimal exploration algorithm for multi-armed bandits. In Conference on Learning Theory, pages 423 439. PMLR, 2014. [17] Kevin Jamieson and Robert Nowak. Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting. In 2014 48th Annual Conference on Information Sciences and Systems (CISS), pages 1 6. IEEE, 2014. [18] Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, and Michal Valko. Planning in markov decision processes with gap-dependent sample complexity. Advances in Neural Information Processing Systems, 33:1253 1263, 2020. [19] Kirthevasan Kandasamy, Gautam Dasarathy, Junier Oliva, Jeff Schneider, and Barnabas Poczos. Multi-fidelity gaussian process bandit optimisation. Journal of Artificial Intelligence Research, 66:151 196, 2019. [20] Kirthevasan Kandasamy, Gautam Dasarathy, Junier B Oliva, Jeff Schneider, and Barnabás Póczos. Gaussian process bandit optimisation with multi-fidelity evaluations. Advances in neural information processing systems, 29, 2016. [21] Kirthevasan Kandasamy, Gautam Dasarathy, Barnabas Poczos, and Jeff Schneider. The multifidelity multi-armed bandit. Advances in neural information processing systems, 29, 2016. [22] Kirthevasan Kandasamy, Gautam Dasarathy, Jeff Schneider, and Barnabás Póczos. Multi-fidelity bayesian optimisation with continuous approximations. In International Conference on Machine Learning, pages 1799 1808. PMLR, 2017. [23] Zohar Karnin, Tomer Koren, and Oren Somekh. Almost optimal exploration in multi-armed bandits. In International Conference on Machine Learning, pages 1238 1246. PMLR, 2013. [24] Emilie Kaufmann, Olivier Cappé, and Aurélien Garivier. On the complexity of best-arm identification in multi-armed bandit models. The Journal of Machine Learning Research, 17(1):1 42, 2016. [25] Pushmeet Kohli, Mahyar Salek, and Greg Stoddard. A fast bandit algorithm for recommendation to users with heterogenous tastes. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 27, pages 1135 1141, 2013. 
[26] Tor Lattimore and Csaba Szepesvári. Bandit algorithms. Cambridge University Press, 2020. [27] Lisha Li, Kevin Jamieson, Giulia De Salvo, Afshin Rostamizadeh, and Ameet Talwalkar. Hyperband: A novel bandit-based approach to hyperparameter optimization. The Journal of Machine Learning Research, 18(1):6765 6816, 2017. [28] Shibo Li, Wei Xing, Robert Kirby, and Shandian Zhe. Multi-fidelity bayesian optimization via deep neural networks. Advances in Neural Information Processing Systems, 33:8521 8531, 2020. [29] Shie Mannor and John N Tsitsiklis. The sample complexity of exploration in the multi-armed bandit problem. Journal of Machine Learning Research, 5(Jun):623 648, 2004. [30] Alonso Marco, Felix Berkenkamp, Philipp Hennig, Angela P Schoellig, Andreas Krause, Stefan Schaal, and Sebastian Trimpe. Virtual vs. real: Trading off simulations and physical experiments in reinforcement learning with bayesian optimization. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 1557 1563. IEEE, 2017. [31] Arturo Molina-Cristóbal, Patrick R Palmer, Benjamin A Skinner, and Geoffrey T Parks. Multifidelity simulation modelling in optimization of a submarine propulsion system. In 2010 IEEE Vehicle Power and Propulsion Conference, pages 1 6. IEEE, 2010. [32] Quan Nguyen, Arghavan Modiri, and Roman Garnett. Nonmyopic multifidelity acitve search. In International Conference on Machine Learning, pages 8109 8118. PMLR, 2021. [33] Victor Picheny, David Ginsbourger, Yann Richet, and Gregory Caplin. Quantile-based optimization of noisy computer experiments with tunable precision. Technometrics, 55(1):2 13, 2013. [34] Matthias Poloczek, Jialei Wang, and Peter Frazier. Multi-information source optimization. Advances in neural information processing systems, 30, 2017. [35] Martin L Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014. [36] Dev Rajnarayan, Alex Haas, and Ilan Kroo. A multifidelity gradient-free optimization method and application to aerodynamic design. In 12th AIAA/ISSMO multidisciplinary analysis and optimization conference, page 6020, 2008. [37] Carl Edward Rasmussen. Gaussian processes in machine learning. In Summer school on machine learning, pages 63 71. Springer, 2003. [38] Herbert Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5):527 535, 1952. [39] Rajat Sen, Kirthevasan Kandasamy, and Sanjay Shakkottai. Multi-fidelity black-box optimization with hierarchical partitions. In International conference on machine learning, pages 4538 4547. PMLR, 2018. [40] Rajat Sen, Kirthevasan Kandasamy, and Sanjay Shakkottai. Noisy blackbox optimization using multi-fidelity queries: A tree search approach. In The 22nd international conference on artificial intelligence and statistics, pages 2096 2105. PMLR, 2019. [41] Varun Suryan, Nahush Gondhalekar, and Pratap Tokekar. Multifidelity reinforcement learning with gaussian processes: model-based and model-free algorithms. IEEE Robotics & Automation Magazine, 27(2):117 128, 2020. [42] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2018. [43] Brijen Thananjeyan, Kirthevasan Kandasamy, Ion Stoica, Michael Jordan, Ken Goldberg, and Joseph Gonzalez. Resource allocation in multi-armed bandit exploration: Overcoming sublinear scaling with adaptive parallelism. In International Conference on Machine Learning, pages 10236 10246. PMLR, 2021. 
[44] Felipe AC Viana, Valder Steffen, Sergio Butkewitsch, and Marcus de Freitas Leal. Optimization of aircraft structural components by using nature-inspired algorithms and multi-fidelity approximations. Journal of Global Optimization, 45(3):427 449, 2009. [45] Andrea Zanette, Mykel J Kochenderfer, and Emma Brunskill. Almost horizon-free structureaware best policy identification with a generative model. Advances in Neural Information Processing Systems, 32, 2019. [46] Chicheng Zhang and Kamalika Chaudhuri. Active learning from weak and strong labelers. Advances in Neural Information Processing Systems, 28, 2015. [47] Yuan Zhou, Xi Chen, and Jian Li. Optimal pac multiple arm identification with applications to crowdsourcing. In International Conference on Machine Learning, pages 217 225. PMLR, 2014. 1. For all authors... (a) Do the main claims made in the abstract and introduction accurately reflect the paper s contributions and scope? [Yes] See for instance the contributions subsection. It list, one by one, the content of each of the section of the paper. (b) Did you describe the limitations of your work? [Yes] See Section 4. (c) Did you discuss any potential negative societal impacts of your work? [No] Did not find any. (d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes] 2. If you are including theoretical results... (a) Did you state the full set of assumptions of all theoretical results? [Yes] We stated formally all the theoretical results throughout the main text (i.e., Theorem 1, Theorem 2, Theorem 3, Proposition 1). We highlighted the set of assumptions in each claim. For extensions (e.g. Equation (9) we provide formal statements in the Appendix, but set of assumptions has been specified also in the main text. (b) Did you include complete proofs of all theoretical results? [Yes] We provide a detailed appendix with complete proofs of all theoretical results. 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] Experiments are detailed in Section 7 and further documented in Appendix E. Further details can be found within the codebase. (c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [Yes] As specified in Section 7, we report mean and 95% confidence intervals using 100 runs. (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] Yes, see Appendix E. 4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets... (a) If your work uses existing assets, did you cite the creators? [N/A] (b) Did you mention the license of the assets? [N/A] (c) Did you include any new assets either in the supplemental material or as a URL? [N/A] (d) Did you discuss whether and how consent was obtained from people whose data you re using/curating? [N/A] (e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [N/A] 5. If you used crowdsourcing or conducted research with human subjects... (a) Did you include the full text of instructions given to participants and screenshots, if applicable? 
[N/A] (b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A] (c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]