# dichotomous_optimistic_search_to_quantify_human_perception__06c8cce9.pdf Dichotomous Optimistic Search to Quantify Human Perception Julien Audiffren 1 In this paper we address a variant of the continuous multi-armed bandits problem, called the threshold estimation problem, which is at the heart of many psychometric experiments. Here, the objective is to estimate the sensitivity threshold for an unknown psychometric function Ψ, which is assumed to be non decreasing and continuous. Our algorithm, Dichotomous Optimistic Search (DOS), efficiently solves this task by taking inspiration from hierarchical multi-armed bandits and Black-box optimization. Compared to previous approaches, DOS is model free and only makes minimal assumption on Ψ smoothness, while having strong theoretical guarantees that compares favorably to recent methods from both Psychophysics and Global Optimization. We also empirically evaluate DOS and show that it significantly outperforms these methods, both in experiments that mimics the conduct of a psychometric experiment, and in tests with large pulls budgets that illustrate the faster convergence rate. 1. Introduction Psychophysics investigates the connection between physical stimuli and the subjective responses (such as sensations, or perceptions) they produce. This field of research has widespread applications, including the study of attention (Scheuerman et al., 2017), and the evaluation of treatments for pain relief (Nir et al., 2011). One of the key aspect of Psychophysics is the evaluation of human perception, which is generally assessed by performing psychometric experiments. They unfold as follows : the experimenter presents to an individual, called the observer, a sequence of stimuli of varying intensities (for instance, the volume of a specific sound, see e.g. (Hirahara, 2004)), and try to measure how often the different intensities are perceived by 1Departement of Neuroscience, University of Fribourg, Fribourg, Switzerland. Correspondence to: Julien Audiffren . Proceedings of the 38 th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s). the observer. In particular, many experiments are interested in measuring the sensitivity threshold, where the stimulus is just noticeable (Kontsevich & Tyler, 1999). Human perception is generally modeled as follows. For any given stimulus intensity s, the stimulus is perceived by the observer with (unknown) probability µs, and the relation between s and µs for a given observer is called the psychometric function, noted Ψ, and is defined such that Ψ(s) = µs for any stimulus intensity s. By definition, Ψ is assumed to be non-decreasing (the stronger the stimulus, the easier it is to perceive it) and continuous (Leek, 2001). Given a target perception probability µ , the objective of these psychometric experiments is to estimate the sensitivity threshold s such that Ψ(s ) µ . To achieve this, the experimenter must both choose a sequence of stimuli to present to the observer (which must be as short as possible, to limit the fatigue of the observer (Wichmann & Hill, 2001a)) and estimate s given the observer responses. In this work, this task will be referred to as the threshold estimation problem. One of the most commonly used technique for solving this problem is the constant stimuli method (Wichmann & Hill, 2001a), where the observer is presented with a fixed sequence of stimuli, spanning the range of sensation from imperceptible to consistently perceptible. After collecting the observer responses, the parameters of Ψ are estimated using the maximum likelihood method where the shape of Ψ is assumed to be known, e.g. a Gaussian cumulative distribution function (c.d.f.). Finally, s is estimated as Ψ 1(µ ). However, this approach suffers from many limitations. First, the parametric models have been shown to be frequently inconsistent with the observations, or to require many small empirical corrections to fit the data with an acceptable accuracy (Wichmann & Hill, 2001b). Second, the fixed sequence does not account for individual specificity, a key problem as it has been shown that the sensitivity threshold can vary by a factor of ten between individuals (Benson et al., 1989). Consequently, there has been an increased interest in using adaptive algorithms (Leek, 2001), where an agent A) aims at estimating directly the sensitivity threshold without estimating Ψ, therefore reducing the difficulty of the problem and B) adapts the sequence of stimulus intensity based on the observer responses. One popular such method in DOS to Quantify Human Perception Psychophysics is arguably the staircase (and its iterations) (Cornsweet, 1962; Lengyel & Fiser, 2019). However these methods generally rely on strong assumptions about the smoothness and shape of Ψ, and seldomly provide theoretical guarantees on the the estimator they provide. Therefore, proposing a novel, model free and principled algorithm that addresses the threshold estimation problem may bring significant improvements to psychometric experiments. Fortunately, in recent years multiple techniques have been developed for the stochastic adaptive optimization of blackbox functions of unknown smoothness (see e.g. (Grill et al., 2015; Shang et al., 2019)). In particular, in hierarchical bandits methods, the agent uses various concentration inequalities to explore a hierarchical partition of the arm space, progressively narrowing the candidate subspace that may contain the maximum. While the task of locating the extremum of a noisy function is different from threshold estimation, the two problems present similarities with respect to the difficulties they encounter. Therefore, methods and algorithms developed in the former may give valuable hindsights into addressing the latter. Following this idea, in this paper, we show that the threshold estimation problem can be rewritten as a pure exploration continuous multi-armed bandit problem, with interesting twists (Section 2). Then, we introduce the Dichotomous Optimistic Search algorithm (DOS), that takes inspiration from hierarchical bandits, and black box optimization to solve this problem (Section 4). The idea behind DOS is to perform a stochastic continuous binary search, while achieving the correct trade off between the depth of the binary tree, and the confidence in its noisy comparisons. This method has multiple advantage over existing algorithms. First, DOS is model free, in the sense that it does not requires the knowledge of the shape of the psychometric function and only assumes that Ψ is mildly smooth around s (in fact, we show that local H older continuity is a sufficient condition). Moreover, the agent does not require the knowledge of the smoothness of Ψ, or the use of a suited hierarchical partition. Second, DOS has advantageous theoretical guarantees: we prove (see Theorem 1) that the simple regret RT of DOS is upper bounded as follows (log T)2(log log T) This highlights the advantage of DOS over existing methods used in psychometric experiments, which to the best of our knowledge, do not have similar guarantee except in narrow settings. Moreover, this upper bound compares favorably to state of the art results, such as the simple regret bounds of POO (Grill et al., 2015), as it is independent of the local near-optimality dimension (a measure of the optimization problem difficulty, see e.g. (Shang et al., 2019)). Third, we also extensively evaluate DOS in a wide range of experiments (Section 5). We show that our agent significantly outperforms adaptive psychometric methods and recent global optimization methods in both experiments that mimic the conduct of a psychometric experiment, and in tests with large pulls budgets that illustrate the faster convergence rate of our agent. 2. The Threshold Estimation Problem Notation. Let T denote the time horizon (i.e. the maximum number of stimulus presented during the experiment), I R the (closed) interval of possible stimuli, Ψ : I 7 [0, 1] the psychometric function, µ [0, 1] the target probability, s .= Ψ 1(µ ) the sensitivity threshold. Finally, let µmin = infs I Ψ(s) and µmax = sups I Ψ(s). In the following we assume without any loss of generality that I = [0, 1] . We also assume that the target threshold is strictly reachable, i.e. µmin < µ < µmax . Due to the nature of the task (detecting stimuli of various intensity), the psychometric function is commonly assumed to be continuous and strictly increasing (see e.g. (Leek, 2001)). The objective of the threshold estimation problem is to find an estimator ˆs of the sensitivity threshold s with at most T stimuli. I, T and µ are known to the agent (here the experimenter), but Ψ and s are not. The process unfolds as follows. For each round t [1, . . . , T], first the agent pulls an arm (i.e. chooses an intensity) s I and then the environment (here the observer) draws an independent Bernoulli random variable with mean Ψ(s), and communicates the result to the agent, representing the detection of the stimulus. At time t = T, the agent returns the arm ˆs that is her best guess for the target stimulus s . The performance of the agent is then evaluated using the simple regret R, defined as R(ˆs) = |µ Ψ(ˆs)|. (1) Note that (1) is similar to the definition of simple regret in hierarchical bandits (Valko et al., 2013). The relation between the two is discussed later in this section. Remark 1 (Lapses and Guessing). Due to the subjective nature of perception, in general µmin > 0 and µmax < 1, even for completely undetectable (resp. unmissable) stimuli (Wichmann & Hill, 2001a). Indeed, µmin is identified as the guess rate, i.e. the chance that the observer correctly guesses the answer independently of the stimulus. Similarly 1 µmax is called the lapse rate and represents the probability of the observer missing the stimulus due to factors external to the experiment (such as blinking for a visual stimulus). These values are generally unknown at the beginning of the experiment, and most methods in psychophysics require the use of heuristics to estimate µmin and µmax (Wichmann & Hill, 2001a), before using these estimate to normalize the data. Importantly, this is not the case for DOS, which does DOS to Quantify Human Perception not require any prior information on µmin and µmax . In the rest of this paper, we make the following mild assumption on the smoothness of Ψ, which is required to prove DOS theoretical guarantees. Assumption 1 (Ψ is smooth around s ). There exists ν > 0, and 0 < ρ < 1 such that h > 0, s I, |s s | 2 h = |Ψ(s) Ψ(s )| νρh This hypothesis implies that Ψ is smooth enough around s , and prevents the well known find the needle in a haystack problem of global optimization (Valko et al., 2013). This assumption is similar to the ones used in recent black-box optimization methods of function of unknown smoothness such as (Grill et al., 2015). Notably, Assumption 1 does not restrict the choice of possible Ψ, as shown by Lemma 1, whose proof can be found in the supplementary materials. Lemma 1. Let Ψ : I 7 [0, 1], and V be a neighborhood of s Then, α > 0 Ψ is locally α H older continuous on V = Ψ satisfies Assumption 1. Proof. By definition, r > 0 such that V contains a ball B of radius r > 0 centered on s . Ψ is α H older on B, thus C > 0 such that s B, |Ψ(s) Ψ(s )| C|s s |α, hence it is easy to see that Assumption 1 is satisfied on B for ρ = 2 α and ν = C. For I \ B, we just observe that s I \ B, |s s | > r = 2 log r log 2 . and thus that Assumption 1 is satisfied on I \ B for ρ = 2 α log r log 2 . Hence the conclusion. In particular, all continuously differentiable Ψ (and consequently all the commonly used psychometric functions) satisfy Assumption 1. Importantly, DOS does not require the knowledge of the smoothness parameters (ν, ρ). Relation with Global Optimization Let Ψ be a psychometric function and µ a target probability. Define f as: f : I 7 [ 1, 0] s |µ Ψ(s)|. (2) It is easy to see that f admits s as its unique maximum, and that f(s ) = 0. Moreover, the regret defined by (1) is equivalent to the usual definition of simple regret for f (see e.g. (Bubeck et al., 2011)). Similarly, Assumption 1 implies a similar smoothness condition for f around its maximum. Therefore, (2) draws a link between the black box optimization of f and the threshold estimation of Ψ. However, since Ψ is unknown and can only be observed through the realizations of Bernoulli random variables, it is impossible to directly use global optimization strategies to solve the threshold optimization problem. Nevertheless, this transformation is useful to draw parallels between the two problems and their solutions. 3. Related Works Threshold Estimation in Psychophysics. Several adaptive algorithms have proposed to solve the threshold estimation problem in psychometric experiments. On the one hand, the staircase algorithm, arguably the most popular adaptive method, has been discussed and improved upon significantly in recent years (Wichmann & J akel, 2018). However, this method can only be used for a very limited list of target probability (such as µ = 0.5) (Brown, 1996), and convergence is only guaranteed for specific shape of the psychometric function (such as Gaussian c.d.f.) (Levitt, 1971). On the other hand, there has been increasing interest in parametric Bayesian adaptive algorithms. The purpose of these methods is generally to estimate the entire function Ψ, using a model based approach (e.g. Ψ is assumed to be a Gaussian c.d.f.). Notably, in (Kontsevich & Tyler, 1999) the authors proposed a method which aims at each step to minimize the entropy of the distribution of possible parameters for Ψ, while in (Shen & Richards, 2012), the authors introduced a sampling method that aims at minimizing the variance of each parameter. However, these methods also require the prior knowledge on the psychometric function shape, which significantly limits their applications, and (Garc ıa-P erez & Alcal a-Quintana, 2007; Hatzfeld et al., 2016) have empirically shown that all the aforementioned methods produce significantly worse estimations and might even diverge when this assumption is false. More recently, several works have proposed to use Gaussian Processes (GP) to approximate Ψ (Gardner et al., 2015a;b; Song et al., 2017). These algorithms are compatible with a much wider range of possible Ψ and can outperform other methods for some applications (see e.g. (Gardner et al., 2015a)). However these methods have a different, more general objective as they generally aim to estimate Ψ, and thus are more costly than methods that only approximate s . Moreover, they require the choice of a proper kernel and a mean function, as well as a grid of hyperparameters; and they perform best when these elements are hand crafted for the problem (Gardner et al., 2015a), using prior information regarding Ψ and may perform poorly when these assumptions are false and when the kernel is ill-suited, see e.g. Section 5. Conversely, our method, DOS, is completely model free, a significant advantage when nothing is known about the psychometric function, a situation that occurs frequently in psychophysics research (see e.g. (Sch utz et al., 2008)). Finally, and contrarily to the aforementioned methods, DOS has strong theoretical guarantees regarding its estimation of DOS to Quantify Human Perception s (see Theorem 1). Global Optimization. To address the problem of black box optimization of an unknown function f in presence of noise, two families of solutions have been proposed. In the first, f is assumed to have some strong global smoothness, such as a Lipschitz condition (see e.g. the Lipschitz multi armed bandit problem (Kleinberg et al., 2008; 2019)). In the second category, f is only assumed to have some local smoothness around its maximum (see e.g. (Valko et al., 2013)). This framework leads to arguably more difficult problems, in particular when the smoothness is unknown. However it has been shown that even in this setting it is possible to achieve near optimal regret bounds, for instance by using a hierarchical bandit approach such as POO Parallel Optimistic Optimization, see e.g. (Grill et al., 2015; Shang et al., 2019). This latest setting is the closest to our problem. Indeed, Assumption 1 can be seen as similar to their minimal assumption (see e.g. (Grill et al., 2015)) but Assumption 1 is slightly weaker in the sense that in our case the agent does not have access to a hierarchical partition that is well suited for the function f. Importantly, in both cases the smoothness parameters (ν, ρ) are unknown. One other significant difference between the two settings is the non decreasing property of Ψ. There is no equivalent hypothesis in the global optimization setting; and this property of Ψ is key to DOS, our solution to the threshold optimization problem, and its significantly better regret bound (see Theorem 1). Other related works. The threshold estimation problem shares some similarities with the noisy bissection problem (Chakraborty et al., 2011) and the learning the demand curve problem (Chhabra & Das, 2011). However, they are multiple crucial differences between these topics. For instance, in the later, the objective can be reformulated as minimizing the cumulative regret, instead of the simple regret (1) leading to very different, non equivalent solutions (Bubeck et al., 2011). Additionally, in both cases strong assumptions are made on the properties of the noise (e.g. Gaussian, (Jedynak et al., 2012)), the shape of the function or its smoothness (Chakraborty et al., 2011) which is very different from our model free approach, and cannot be easily adapted. Finally, the closest works to ours are arguably (Fontaine et al., 2020; Audiffren, 2021). The first has been developed simultaneously and independently, and while their algorithm also uses repeated pull of a candidate arm to obtain high probability comparison, it significantly differs from ours as : A) their algorithm was designed to optimize the cumulative regret, and the simple regret bound that can be derived from their work is significantly worse than ours, and B) their method cannot be applied to our problem (and conversely), as they rely on different, non equivalent hypotheses as well as different types of feedback (noisy gradient in their case). The second is an extended abstract that discusses general ideas related to DOS, but does not provide any theoretical analysis or regret bound and only limited empirical evaluation, which are main contributions of this work. 4. Contributions In this section, we introduce DOS and discuss its key ideas before proving an upper bound for its simple regret which compares favorably to the existing regret bounds in global optimization. We provide sketches of proof for the different results the detailed proofs can be found in the supplementary materials. In the following, we use log2(T) .= log(log(T)). Let κ denote the number of different arms that are pulled by DOS during the attributed time budget T. Since there is a continuous set of possible arms, and κ T, most arms will never get pulled; and in the following we say that the agent activates an arm when she pulls it for the first time. For any 1 i κ, we use si (resp. Ni(t), ˆµi(t) and µi) to denote the stimulus value (resp. the number of pulls, the empirical average and the true probability value) associated to the i-th activated arm at time t. Finally, let i = |µi µ | i.e. the regret obtained by the agent if she chooses to return the arm i when t = T. DOS strategy. The general idea of DOS is inspired by the deterministic dichotomous search algorithm : the agent aims to produce a sequence of intervals I1, . . . , Iκ I such that 1 i κ, |Ii| 2 i and s Ii (3) Note that if (3) is true, then |sκ s | 2 κ and Assumption 1 implies that νρκ (4) in other words, the sequence sκ (resp µκ) converges exponentially fast toward s (resp µ ). To produce this sequence, DOS proceeds as follows. To obtain Ii+1 given the interval Ii, the agent activates si+1 = 2ki+1 2i+1 the arm located at the center of Ii and repeatedly pulls this new arm, until the time budget is elapsed (t = T) or one of the two possible new arm activation criteria is satisfied. Then she compares µ , the target probability, and ˆµi+1(Ni+1), i.e. the empirical proportion of stimuli of intensity si+1 that were detected. Depending on the result, the agent defines the next interval Ii+1 in the sequence as : 2i+1 , 2ki + 1 if µ < ˆµi+1, 2ki + 1 2i+1 , 2ki + 2 DOS to Quantify Human Perception Algorithm 1 DOS Parameters µ (objective), T (time horizon) Initialization i 1 (current arm), s1 1/2 (current stimulus ), N1 0 (number of pulls of s1), ˆµ1 0 (empirical average of s1), t 0 (total pulls), S = NULL the latest promising arm, N as in (8) and BT ( ) as in (7). Main Loop While t T : If |µ ˆµi(t)| > 2BT (Ni(t)) or Ni(t) > N : If Ni(t) > N Then S i End If Activate new arm: i i + 1 and ( si 1 + (1/2i) if µ > ˆµi 1 si 1 (1/2i) if µ ˆµi 1 End If Sample arm si, update t, Ni, ˆµi End While Output: si , where i = S if S = NULL, κ otherwise. Here the agent leverages the fact that Ψ is monotonically increasing. Importantly, this strategy presents key differences with the deterministic dichotomous search, which are discussed below. The pseudocode for DOS can be found in Algorithm 1. Note that DOS is completely model-free, in the sense that it does not uses any parameter or prior knowledge regarding Ψ, including (ν, ρ), the parameters of Ψ local smoothness. DOS noisy comparisons. Contrarily to the deterministic setting, here the agent has only access to noisy observations of Ψ(si). Therefore, for any arm si the agent can only compare ˆµi and µ , and can never be sure if Ψ(si) µ . To quantify this uncertainty, DOS uses a Hoeffding-Chernoff concentration bound (see e.g. (Auer et al., 2007)): P |µi ˆµi(t)| > ε 2 exp 2Ni(t)ε2 . (5) Let Qi be the event where DOS reaches the wrong conclusion about the position of the arm si with respect to s : ( ˆµi(T) < µ if µi µ , ˆµi(T) µ if µi < µ , (6) and let qi .= P(Qi). While qi can be reduced by pulling the arm si multiple times and thus reducing the confidence interval the number of pulls required increases drastically as the distance i decreases. This is compounded by (4), as i can be expected to decrease exponentially as i increases. Meanwhile, decreasing the uncertainty of the comparisons comes at a cost on the number of activated arms, as more time is spent on each arm and (4) gives insight into the link between the number of activated arms κ and the quality of the arm sκ. Hence, the agent must achieve a proper trade-off between two opposite objective: Confidence: Pull each activated arm more to increase confidence in the comparison between ˆµ and µ , Depth: Increase the number of activated arm to improve the bound for κ provided by (4). Moreover, identifying the correct trade-off between the two objectives is complicated by the fact that while Ψ is assumed to satisfy Assumption 1, the parameters ν and ρ are unknown to the agent. This raise additional difficulties as these parameters are crucial to the quality of the upper regret bound (4) and to assess the behavior of Ψ around s . This conundrum is discussed below. Activation criteria. DOS uses two different activation rules to achieve the proper trade-off between Confidence and Depth. These two rules both rely on (5), but with different perspectives. The first rule forces the activation of a new arm if |µ ˆµi(t)| > BT (Ni(t)) .= 3 Ni(t) . (7) Note that (7) is a confidence interval commonly used for the optimism against uncertainty principle, see e.g. (Auer et al., 2007). If (7) is achieved, then the agent is considered confident enough to activate the next arm, regardless of the number of pulls Ni, as stated by the following Lemma: Lemma 2. If (7) is satisfied for the arm si, then qi < 2 Proof. Without any loss of generality, assume that ˆµi > µ . Using the second triangular inequality µi µ > ˆµi µ |ˆµi µi|, hence the conclusion by using (5) and (7). However, the number of pull required to achieve (7) can be too large, in particular if i is small. Hence the second rule plays an important role in achieving the aforementioned exploration trade-off, by setting a maximum number of pulls for the arm i before the activation of the next arm. Indeed, this rules forces the activation of a new arm if Ni(t) > N .= T (log T)(log2 T) Therefore, (8) provides a lower bound on the depth of the search: independently of observed results, DOS activates at least κ = (log T)(log2 T) arms. Moreover, it can be shown (see Lemma 3) that if (8) occurs, then si is a DOS to Quantify Human Perception promising arm, i.e. i is small enough for si to be a good estimator of s . These arms are key to DOS estimation of the sensitivity threshold, as discussed below. Lemma 3. If (7) is false, but (8) is satisfied, then with probability at least 1 2 T 3 , (log T)2(log2 T) Proof. Note that since (7) is not satisfied for Ni = N , we have |µ ˆµi(t)| < BT (N ). Hence, using the triangular inequality, i = |µ µi| |µ ˆµi| + |µi ˆµi|. The result is then obtained by using the concentration inequality (5) and the definition of N (8). DOS final output. When the time horizon is reached (t = T), there are two possible scenarios. In the first case, the activation rule (8) was used at least once, and the agent found at least one promising arm. Then the agent returns the last encountered promising arm, and the regret incurred is controlled by Lemma 3. In the second case, no promising arm was found during the exploration process and all arms were activated using (7). In this scenario, the agent returns the last activated arm sκ. To upper bound κ, let A .= { t T, i κ, |µi ˆµi(t)| BT (Ni)} . In other words, A is the event where the empirical average of all activated arms are concentrated around their true mean. First note that on A , if an arm is activated using (7), then the algorithm necessarily the right conclusion when comparing ˆµi and µ and thus (3) is true. Thus, in this scenario, the event A has a probability close to one, as stated by the following Lemma. Lemma 4. P (A ) 1 2 Proof. First note that on A , DOS always reach the right conclusion when comparing ˆµi and µ after the activation rule (7). Thus in this scenario the sequence of arms si is fixed (but not their respective number of pulls) and the result is then obtained by taking the union bound on all arms and on all times t < T. Since (3) is true, then (4) is also true and the regret of sκ is directly related to the number of activated arms κ. The following proposition proves a lower bound for κ. Proposition 1. Let i defined as i .= log T (log2 T)(log T)2 1 2 log(1/ρ) Then all the following properties are true (A) if log2 T > 1/ log ρ and T > 16, then i κ (B) i > i , A a.s. i ν (log2 T)(log T)2 Proof. (A) is proved by using the fact that the definition of N (8) implies that κ log(T) log2(T). (B) is derived from (4) by using the definition of i (9). In other words, (B) upper bounds DOS regret provided that the agent has activated at least i arms, and (A) states that for T large enough, more than i arms are activated. By combining all the previous results, it is possible to prove the following upper bound on the regret incurred by DOS: Theorem 1 (Upper Bound on simple regret). Assume that Hypothesis 1 is true. Then, T > 0, the simple regret of DOS RT is upper bounded by E(RT ) (3 + ν) (log T)2 log2(T) Proof. This results from noting that by definition RT 1, and then using Lemma 3, Lemma 4 and Proposition 1. Note that (10) does not depend on ρ, i.e. the bound is uniform for any value of ρ < 1. This is a very important property, as ρ is directly linked with the difficulty of the problem. Indeed, for a value of ρ close to one, the set of arms that are both A) close enough to require very large number of comparisons to be eliminated and B) not close enough to be a sufficiently good estimator of the sensitivity threshold may be very large. This problem is related to the notion of near optimality dimension, discussed below. Comparison with POO regret bound. Using the transformation described in (2), it is interesting to compare (10) to the upper regret bound for POO, which achieves state of the art performance in black box optimization problems (Shang et al., 2019). POO regret bound relies on the notion of near optimality dimension (Grill et al., 2015), for which we provide below an equivalent definition in the threshold estimation setting. Definition 1 (Near optimality dimension). Let ν > 0 and 0 < ρ < 1. The near optimality dimension of Ψ, noted d(ν, ρ), is defined as d(ν,ρ) .= inf d R+ : C, h > 0, Ψ 1(µ + 2νρh) Ψ 1(µ 2νρh) C(2ρd ) ho Intuitively, the d(ν, ρ) represents the measure of the size of the near optimal set; the larger, the more candidates for the optimal arm. The following regret bound has been shown for POO (Grill et al., 2015; Shang et al., 2019): DOS to Quantify Human Perception Figure 1. Comparison of the evolution of the average regret over 100 runs as a function of the number of stimuli presented to the observer, for a time horizon of T = 200, for each psychometric function. The standard deviation is reported using the shaded area. Theorem 2 (Theorem 1 from (Grill et al., 2015)). Let ρmax < 1 , νmax R. Then K > 0 R, such that ρ ρmax, ν νmax, Note that while (10) has an additional p log2 T term, (11) is worse than (10) in the threshold estimation problem for two reasons. First, the constant K in (11) depends on ρmax, and K + when ρmax 1. Therefore, (11) is not uniform on ρ, and requires additional information on Ψ smoothness to be used. Second, the rate of convergence of (11) is strictly worse than (10) for any Ψ that have a strictly positive near optimality dimension. The following proposition shows the existence of such function in the threshold estimation problem, highlighting the advantage of DOS. Proposition 2 (Non zero optimality dimension). There exist psychometric functions Ψ satisfying Assumption 1 such that min ν,ρ dΨ(ν, ρ) > log 2 > 0. 5. Experiments We now evaluate the performance of DOS in three different settings, with multiple psychometric functions. First, we use a small time budget (T = 200) this aims at reproducing the constraints of real psychometric experiments, where only a few hundred stimuli can be presented to the observer before the fatigue and learning effects significantly interfere with the experiment (Wichmann & Hill, 2001a). Second, we take interest in DOS performance for large values of T, to illustrate its faster convergence rate. The final set of experiments, described below, aim at illustrating the advantages and limitations of the different procedure with very small time budget (T = 50 and 100). All experiments were performed with custom script using Python 3.7, unless mentioned otherwise. Baselines. We compare DOS to three methods used in Psychophysics : Staircase, arguably the most commonly used methods in Psychophysics (see e.g. (Lengyel & Fiser, 2019)), Psimethod (Kontsevich & Tyler, 1999) is a popular Bayesian algorithm that assumes that Ψ is e.g. a Gaussian c.d.f., and GP, a recent algorithm for psychometrics experiments based on Gaussian Processes see e.g.(Gardner et al., 2015a; Song et al., 2017). We also compared DOS to the black box optimization algorithm POO (Parallel Optimistic Optimization)(Grill et al., 2015), using (2) to transform the threshold estimation problem. Finally, in the last round of experiment, we compare also DOS to Quest+ (Watson, 2017), a recent algorithm that improves over Psimethod and is particularly efficient for small time budgets. For Staircase, Psi Method and Quest+, we used the implementation provided by (Peirce et al., 2019), and the parameters recommanded by their respective papers. We used the dichotomous partition of [0,1] for POO and DOS, and ran GP with the kernel and hyperparameters recommended by (Song et al., 2017). Psychometric Functions We used six different psychometric functions to assess the behavior of DOS. They can be roughly divided into two families, flat and steep , DOS to Quantify Human Perception Table 1. Average ( standard deviation) over 100 runs of the simple regret, for each method and Psychometric functions with time horizon T = 500, 2000 or 5000. The reported regret has been multiplied by 10 for readability purpose. The best results are written in bold. T Ψ GP DOS Staircase POO Psi Method steep N 1.01 ( 0.68) 0.48 ( 0.33) 0.81 ( 0.70) 1.30 ( 0.23) 1.52 ( 1.20) steep β 0.85 ( 0.56) 0.43 ( 0.36) 0.98 ( 0.76) 1.24 ( 0.25) 1.29 ( 0.85) 500 steep Ψm 0.80 ( 0.69) 0.25 ( 0.26) 0.86 ( 0.90) 1.17 ( 0.27) 0.62 ( 0.45) flat N 0.54 ( 0.47) 0.46 ( 0.36) 0.52 ( 0.44) 0.86 ( 0.23) 0.61 ( 0.29) flat β 1.02 ( 0.84) 0.49 ( 0.38) 0.67 ( 0.54) 1.06 ( 0.23) 1.12 ( 0.35) flat Ψm 0.84 ( 1.30) 0.35 ( 0.35) 0.65 ( 0.64) 0.88 ( 0.18) 0.58 ( 0.13) steep N 0.85 ( 0.95) 0.31 ( 0.26) 1.05 ( 0.77) 1.04 ( 0.30) 1.09 ( 0.87) steep β 0.78 ( 0.94) 0.40 ( 0.31) 1.09 ( 0.71) 0.98 ( 0.21) 1.22 ( 0.97) 2000 steep Ψm 0.89 ( 0.70) 0.21 ( 0.24) 0.73 ( 0.78) 0.73 ( 0.15) 0.72 ( 0.36) flat N 0.48 ( 0.36) 0.44 ( 0.34) 0.65 ( 0.42) 0.73 ( 0.15) 0.59 ( 0.25) flat β 0.87 ( 0.74) 0.43 ( 0.30) 0.71 ( 0.52) 0.80 ( 0.16) 1.03 ( 0.17) flat Ψm 0.58 ( 0.46) 0.28 ( 0.29) 0.48 ( 0.50) 0.64 ( 0.14) 0.58 ( 0.13) steep N 0.76 ( 0.90) 0.27 ( 0.21) 1.10 ( 0.75) 0.84 ( 0.18) 0.96 ( 1.02) steep β 0.7 ( 0.79) 0.28 ( 0.26) 0.93 ( 0.73) 0.75 ( 0.13) 1.19 ( 0.91) 5000 steep Ψm 0.92 ( 0.61) 0.12 ( 0.16) 0.79 ( 0.85) 0.54 ( 0.13) 0.70 ( 0.28) flat N 0.44 ( 0.65) 0.31 ( 0.27) 0.54 ( 0.54) 0.53 ( 0.11) 0.50 ( 0.16) flat β 0.64 ( 0.64) 0.34 ( 0.26) 0.63 ( 0.57) 0.67 ( 0.10) 1.02 ( 0.17) flat Ψm 0.55 ( 0.27) 0.17 ( 0.20) 0.58 ( 0.56) 0.47 ( 0.12) 0.68 ( 0.13) depending on their behavior around the objective s flatter functions varying less around s . They include two functions, Nflat and Nsteep, based on a Gaussian c.d.f., with mean and standard deviation of respectively m = 0.35 and σ = 0.5 for Nflat (resp. m = 0.66 and σ = 0.2 for Nsteep). This is the most advantageous setting for Staircase and Psi Method, as this setting satisfies the Gaussian hypothesis. Two other functions, βflat and βsteep, are based on the c.d.f. of Beta variables, with respectively α = 2 and β = 1 for βflat (resp. α = 2 and β = 5 for Nsteep). While not Gaussian, these functions are usual c.d.f. and are strongly smooth (continuously differentiable). The last two functions, Ψm flat and Ψm steep, are non decreasing H older continuous functions (the hardest setting) defined as : Ψm(s + x) = µ + 1x>0|x|k+ 1x<0|x|k where k+ = 1.5 and k = 0.5 for Ψm flat (resp. k+ = 1 and k = 0.3 for Ψm steep). For each function, the objective is to identify the stimulus s such that µ = 0.707 (for flat functions) or µ = 0.5 (for steep functions) values that are reachable by Staircase. Importantly, the psychometric functions were clipped to the probability interval [0.2, 0.8] for steep functions, [0.1, 0.9] for flat functions, to represent arbitrary guess and lapse rates. During the experiments, only the value of µ was provided to the different algorithms. Small Time Horizon. Figure 1 reports the evolution average simple regret over 100 runs for T = 200. First, note that all methods tend to perform better on flat functions than on their steep counterpart since they vary less around s , it is easier to find a reasonably good estimator. In particular, all methods achieve good results for Ψm flat, which is significantly flat to the right of s . Second, Psi Method performs poorly for non Gaussian Ψ, as they violate the assumption on its shape. Moreover, µmin and µmax were considered unknown which is generally the case in practice and this lack of prior information is known to impact the performance of Bayesian methods (Garc ıa-P erez & Alcal a Quintana, 2007). Third, while POO seems to converge toward the solution for every function, it achieves the worst regret in all the studied settings, as the rate of convergence is slow (POO cannot take advantage of the monotonic property of Ψ). Finally, while GP and Staircase achieve reasonable performance, DOS provides one of the best estimation if not the best in all these settings, particularly for difficult functions such as Ψm steep.Interestingly, the regret trajectory of DOS sometimes increases for a short time. This is due to the fact that when the agent activates a new arm, she might moves from a si > s to a si+1 < s , (or vice versa), and the regret may increase temporarily. However, as additional arms are pulled, the sequence si converges toward s , resulting in a small RT . Large Time Horizon. Table 1 reports the average simple regret over 100 runs for different algorithm and psychometric function for three different time horizons : T = 500, 2000 and 5000. It can be seen that DOS outperforms its competitor in every case, and its advantage increases with T. Interestingly, Staircase and Psi Method do not appear to converge, as they display little improvement between T = 500 and T = 5000. This is due to the fact that these DOS to Quantify Human Perception Table 2. Average ( standard deviation) over 100 runs of the simple regret in the psychometric setting, for each method and Psychometric functions with time horizon T = 50 and 100 . The reported regret has been multiplied by 10 for readability purpose. Setting ψ T DOS Stair. Psi M. Quest+ GP POO Yes/No Gaussian 50 1.01(0.50) 0.98(0.72) 0.94(0.58) 0.90(0.41) 1.52(0.46) 1.72(0.66) 100 0.70(0.47) 0.90(0.64) 0.72(0.32) 0.70(0.50) 1.22(0.34) 1.51(0.60) H older 50 1.08(0.56) 1.17(0.55) 1.22(0.91) 1.31(1.11) 1.73(1.66) 1.68(0.70) 100 0.83(0.58) 1.03(0.62) 1.12(0.72) 1.15(0.87) 1.25(0.75) 1.38 (0.62) 2-AFC Gaussian 50 1.14(0.60) 1.28(0.73) 1.53(0.90) 1.33(0.62) 1.67(0.64) 1.70(0.67) 100 0.90(0.56) 1.15(0.69) 1.33(0.69) 1.20(0.55) 1.27(0.44) 1.52(0.37) H older 50 1.05(0.58) 1.41(0.71) 1.74(1.10) 1.70(1.23) 1.65(0.63) 1.69(0.81) 100 0.85(0.47) 1.25(0.72) 1.68(1.08) 1.71(1.25) 1.45(0.54) 1.40(0.79) methods rely on manually constructed grids of hyperparameter (discretization of loc and scale for Psi Method; scale of step size for Staircase) which need to be tuned to Ψ to ensure convergence for large values of T, and thus requires prior knowledge of the function shape. Conversely, POO and GP tend to converge toward s , but DOS appears to consistently outperforms them. For POO, this can be seen as an illustration of DOS better regret bounds, while GP has no such theoretical guarantees. Finally, GP appears to be performing poorly for Ψm steep; this can be explained by the fact that its kernel was suboptimal to estimate non standard functions such as Ψm steep. Mimicking Psychometric Experiments. In Table 2, we compared all the procedures in a setting which mimics the behavior of a small budget psychometric experiment (T = 50 and 100). We used two frameworks, which illustrates two common type of experimental settings : YES/NO (µmin = 0, µmax = 1 , µ = 0.5) and 2-AFC (µmin = 0.5, µmax = 0.96, µ = 0.707), and two psychometric functions : the steep Gaussian c.d.f. and H older function previously described. The Yes/No Gaussian (resp. 2-AFC H older) setting was expected to be the most (resp. least) favorable for Bayesian methods (i.e. Psi Method and Quest+). Table 2 shows that indeed, for the first setting, Bayesian methods are slightly better than DOS, and that all methods are very close for T = 100 Yes/No Gaussian c.d.f.. Conversely, DOS is slightly better than its counterparts for T = 50 Yes/No H older and 2-AFC Gaussian, and significantly better in the other cases. In summary, these experiments show that DOS performs almost as good as its Bayesian counterparts in their ideal setting, and quickly outperforms them in non ideal settings or when T increases. 6. Discussion In this works, we discussed a new method for solving the threshold estimation problem, DOS. Compared to previ- ous works in global optimization such as POO (Grill et al., 2015), we showed that DOS has better regret bounds, and consistently performs better in our experiments. This is due to the fact that POO was developed for a different, more general setting and thus cannot take advantage of the properties of psychometric functions. Compared to other methods used in Psychophysics (Staircase, Psi Method, GP), DOS is completely model free, does not require any assumption on the shape of Ψ, is parameter free, has strong theoretical guarantees and performs better empirically. Importantly, DOS is not a replacement for Bayesian psychometric methods such as Psi Method and GP. Indeed, these model-based methods are designed to estimate the entire Ψ function, while DOS only estimates s (and thus is able to do it more efficiently). Consequently, they remain the tool of choice to use when the shape of the Ψ is known (e.g. Gaussian c.d.f.), and when the general behavior of the function is of interest to the experimenter for instance when establishing a person s audiogram (Gardner et al., 2015b). However, when little is known about Ψ, or when the objective is to estimate s the most frequent case in Psychophysics research, see e.g. (Sch utz et al., 2008; Garc ıa-P erez & Alcal a-Quintana, 2007) then DOS is a significantly better solution. Future works might include the application of DOS to other problems outside the domain of Psychophysics, such as the handover of signal on a cellular network, see e.g. (Sun et al., 2019). DOS to Quantify Human Perception Audiffren, J. Quantifying human perception with multiarmed banditss. In The 20th International Conference on Autonomous Agents and Multiagent Systems, 2021. Auer, P., Ortner, R., and Szepesv ari, C. Improved Rates for the Stochastic Continuum-Armed Bandit Problem. In Bshouty, N. H. and Gentile, C. (eds.), Learning Theory, Lecture Notes in Computer Science, pp. 454 468, Berlin, Heidelberg, 2007. Springer. ISBN 978-3-540-72927-3. doi: 10.1007/978-3-540-72927-3 33. Benson, A. J., Hutt, E. C., and Brown, S. F. Thresholds for the perception of whole body angular movement about a vertical axis. Aviation, Space, and Environmental Medicine, 60(3):205 213, 1989. ISSN 0095-6562(Print). Brown, L. G. Additional rules for the transformed up-down method in psychophysics. Perception & Psychophysics, 58(6):959 962, 1996. Bubeck, S., Munos, R., and Stoltz, G. Pure exploration in finitely-armed and continuous-armed bandits. Theoretical Computer Science, 412(19):1832 1852, April 2011. ISSN 0304-3975. doi: 10.1016/j.tcs.2010.12. 059. URL http://www.sciencedirect.com/ science/article/pii/S030439751000767X. Chakraborty, M., Das, S., and Magdon-Ismail, M. Nearoptimal target learning with stochastic binary signals. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, pp. 69 76, 2011. Chhabra, M. and Das, S. Learning the demand curve in posted-price digital goods auctions. In The 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 1, pp. 63 70, 2011. Cornsweet, T. N. The Staircase-Method in Psychophysics. The American Journal of Psychology, 75 (3):485, September 1962. ISSN 00029556. doi: 10. 2307/1419876. URL https://www.jstor.org/ stable/1419876?origin=crossref. Fontaine, X., Mannor, S., and Perchet, V. An adaptive stochastic optimization algorithm for resource allocation. In Algorithmic Learning Theory, pp. 319 363, 2020. Garc ıa-P erez, M. A. and Alcal a-Quintana, R. Bayesian adaptive estimation of arbitrary points on a psychometric function. British Journal of Mathematical and Statistical Psychology, 60(1):147 174, 2007. ISSN 2044-8317. doi: 10.1348/000711006X104596. URL https://onlinelibrary.wiley.com/doi/ abs/10.1348/000711006X104596. Gardner, J., Malkomes, G., Garnett, R., Weinberger, K. Q., Barbour, D., and Cunningham, J. P. Bayesian active model selection with an application to automated audiometry. Advances in neural information processing systems, 28:2386 2394, 2015a. Gardner, J. R., Song, X., Weinberger, K. Q., Barbour, D. L., and Cunningham, J. P. Psychophysical detection testing with bayesian active learning. In UAI, pp. 286 295, 2015b. Grill, J.-B., Valko, M., and Munos, R. Black-box optimization of noisy functions with unknown smoothness. December 2015. URL https://hal.inria.fr/hal01222915. Hatzfeld, C., Hoang, V. Q., and Kupnik, M. It s All About the Subject - Options to Improve Psychometric Procedure Performance. In Bello, F., Kajimoto, H., and Visell, Y. (eds.), Haptics: Perception, Devices, Control, and Applications, volume 9774, pp. 394 403. Springer International Publishing, Cham, 2016. ISBN 978-3-31942320-3 978-3-319-42321-0. doi: 10.1007/978-3-31942321-0-36. URL http://link.springer.com/ 10.1007/978-3-319-42321-0-36. Hirahara, T. Physical characteristics of headphones used in psychophysical experiments. Acoustical science and technology, 25(4):276 285, 2004. Jedynak, B., Frazier, P. I., and Sznitman, R. Twenty questions with noise: Bayes optimal policies for entropy loss. Journal of Applied Probability, 49(1):114 136, 2012. Kleinberg, R., Slivkins, A., and Upfal, E. Multi-armed bandits in metric spaces. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pp. 681 690, 2008. Kleinberg, R., Slivkins, A., and Upfal, E. Bandits and experts in metric spaces. Journal of the ACM (JACM), 66 (4):1 77, 2019. Kontsevich, L. L. and Tyler, C. W. Bayesian adaptive estimation of psychometric slope and threshold. Vision Research, 39(16):2729 2737, August 1999. ISSN 0042-6989. doi: 10.1016/S0042-6989(98)002855. URL http://www.sciencedirect.com/ science/article/pii/S0042698998002855. Leek, M. R. Adaptive procedures in psychophysical research. Perception & Psychophysics, 63(8):1279 1292, November 2001. ISSN 1532-5962. doi: 10.3758/ BF03194543. URL https://doi.org/10.3758/ BF03194543. DOS to Quantify Human Perception Lengyel, G. and Fiser, J. The relationship between initial threshold, learning, and generalization in perceptual learning. Journal of Vision, 19(4):28 28, April 2019. ISSN 1534-7362. doi: 10.1167/19.4. 28. URL https://jov.arvojournals.org/ article.aspx?articleid=2732298. Levitt, H. Transformed Up-Down Methods in Psychoacoustics. The Journal of the Acoustical Society of America, 49 (2B):467 477, February 1971. ISSN 0001-4966. doi: 10. 1121/1.1912375. URL https://asa.scitation. org/doi/abs/10.1121/1.1912375. Nir, R.-R., Granovsky, Y., Yarnitsky, D., Sprecher, E., and Granot, M. A psychophysical study of endogenous analgesia: the role of the conditioning pain in the induction and magnitude of conditioned pain modulation. European journal of pain, 15(5):491 497, 2011. Peirce, J., Gray, J. R., Simpson, S., Mac Askill, M., H ochenberger, R., Sogo, H., Kastman, E., and Lindeløv, J. K. Psycho Py2: Experiments in behavior made easy. Behavior Research Methods, 51(1):195 203, February 2019. ISSN 1554-3528. doi: 10.3758/s13428-018-01193y. URL https://doi.org/10.3758/s13428018-01193-y. Scheuerman, J., Venable, K. B., Anderson, M. T., and Golob, E. J. Modeling spatial auditory attention: Handling equiprobable attended locations. In CAID@ IJCAI, pp. 1 7, 2017. Sch utz, A. C., Braun, D. I., Kerzel, D., and Gegenfurtner, K. R. Improved visual sensitivity during smooth pursuit eye movements. Nature neuroscience, 11(10):1211 1216, 2008. Shang, X., Kaufmann, E., and Valko, M. General parallel optimization a without metric. In Algorithmic Learning Theory, pp. 762 788, March 2019. URL http://proceedings.mlr.press/ v98/xuedong19a.html. Shen, Y. and Richards, V. M. A maximum-likelihood procedure for estimating psychometric functions: Thresholds, slopes, and lapses of attention. The Journal of the Acoustical Society of America, 132(2):957 967, August 2012. ISSN 0001-4966. doi: 10.1121/1. 4733540. URL https://asa.scitation.org/ doi/full/10.1121/1.4733540. Song, X. D., Garnett, R., and Barbour, D. L. Psychometric function estimation by probabilistic classification. The Journal of the Acoustical Society of America, 141(4): 2513 2525, 2017. Sun, L., Hou, J., and Shu, T. Optimal handover policy for mmwave cellular networks: A multi-armed bandit approach. In 2019 IEEE Global Communications Conference (GLOBECOM), pp. 1 6. IEEE, 2019. Valko, M., Carpentier, A., and Munos, R. Stochastic Simultaneous Optimistic Optimization. In International Conference on Machine Learning, pp. 19 27, February 2013. URL http://proceedings.mlr.press/ v28/valko13.html. Watson, A. B. Quest+: A general multidimensional bayesian adaptive psychometric method. Journal of Vision, 17(3): 10 10, 2017. Wichmann, F. A. and Hill, N. J. The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & Psychophysics, 63(8):1293 1313, November 2001a. ISSN 1532-5962. doi: 10.3758/BF03194544. URL https://doi.org/10.3758/BF03194544. Wichmann, F. A. and Hill, N. J. The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63(8):1314 1329, November 2001b. ISSN 1532-5962. doi: 10.3758/ BF03194545. URL https://doi.org/10.3758/ BF03194545. Wichmann, F. A. and J akel, F. Methods in psychophysics. Stevens Handbook of Experimental Psychology and Cognitive Neuroscience, 5:1 42, 2018.