Optimal Posted-Price Mechanism in Microtask Crowdsourcing

Zehong Hu, Jie Zhang
Rolls-Royce@NTU Corporate Lab, School of Computer Science and Engineering
Nanyang Technological University, Singapore
huze0004@e.ntu.edu.sg

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

Abstract

Posted-price mechanisms are widely adopted to decide the price of tasks in popular microtask crowdsourcing. In this paper, we propose a novel posted-price mechanism which not only outperforms existing mechanisms but also avoids their need for a finite price range. These advantages are achieved by converting the pricing problem into a multi-armed bandit problem and designing an optimal algorithm that exploits the unique features of microtask crowdsourcing. We theoretically show the optimality of our algorithm and prove that the performance upper bound can be achieved without a prior price range. We also conduct extensive experiments using real price data to verify the advantages and practicability of our mechanism.

1 Introduction

Crowdsourcing is an economical method to outsource tasks to online workers [Slivkins and Vaughan, 2014]. Depending on the types of tasks, crowdsourcing takes different forms. Microtask crowdsourcing, one of the most widely adopted forms, targets small tasks such as labeling images and filling out surveys [Gao et al., 2015]. These tasks are usually repetitive and easy for an individual to perform. In microtask crowdsourcing, a requester often needs to accomplish hundreds of tasks within a given budget. The requester thus posts these tasks on microtask crowdsourcing platforms along with the price for each task. Workers who are willing to perform the tasks submit their solutions and are paid the prescribed price. Microtask crowdsourcing has become increasingly prevalent in many domains, especially in collecting training data for machine learning algorithms [Difallah et al., 2015; Simpson et al., 2015; Wang and Zhou, 2016].
One of the key challenges in microtask crowdsourcing is to determine the price of tasks properly. Overpricing causes inefficient use of the budget, whereas underpricing may lead to an insufficient number of participating workers [Anari et al., 2014]. Thus, various pricing mechanisms have been proposed [Singla and Krause, 2013; Singer and Mittal, 2013], among which the posted-price mechanism is one of the most attractive branches, because it only requires workers to make reject-or-accept decisions, which greatly facilitates its practical usage. With these binary decisions, the mechanism can learn the worker model online and accordingly adjust the price to be optimal. Nevertheless, existing posted-price mechanisms are still inadequate in two aspects. Firstly, they overlook the unique features of microtask crowdsourcing: the number of workers willing to accept the task is unknown and increases monotonically with the price, whereas the number of workers allowed by the limited budget is accurately known and decreases monotonically. These features can be employed to guide the price adjustment of posted-price mechanisms. Secondly, the performance of existing posted-price mechanisms significantly degrades if the number of possible prices is very large. Thus, the requester is required to input a proper range of prices in advance, which causes much inconvenience. In this paper, we propose a novel posted-price mechanism to exploit the unique features of microtask crowdsourcing. More specifically, we first convert the pricing problem into an equivalent multi-armed bandit (MAB) problem. Then, we develop an algorithm that offers each coming worker the minimum price at which the anticipated number of workers willing to accept the task approximately equals the number of workers allowed by the budget. Due to the monotonicity features of the two numbers mentioned above, our algorithm is proven to be optimal.
In addition, these features ensure that our algorithm will never explore overly high prices and thus does not need a price range set in advance. To empirically validate the advantages of our mechanism over existing ones, we conduct extensive experiments using three popular worker models as well as real-world price data collected from MTurk, a widely-adopted microtask crowdsourcing platform. Experimental results confirm that our mechanism achieves almost the same performance as the idealized case where the accurate worker model is known in advance (i.e., the optimal price is used from the very beginning). We also carry out robustness tests to ensure the practicability of our mechanism.

2 Related Work

Existing posted-price mechanisms assume that a worker accepts a task if the offered price is higher than the cost. Without prior knowledge about workers' costs, these mechanisms learn the cost distribution online by counting the acceptance frequency. To maximize the requester's revenue under the uncertainty about workers' costs, different price selection algorithms have been employed. Badanidiyuru et al. [2012] discretize the range of workers' costs into a geometric progression, and use a heuristic policy to pick the price from the progression. Singla and Krause [2013] then propose an improved policy which selects the price with the highest upper confidence bound of the expected revenue under the budget constraint. Badanidiyuru et al. [2013] and Agrawal and Devanur [2014] propose to employ linear programming (LP) to choose the optimal price. The LP problem is formulated with the upper and lower confidence bounds of the expected revenue and the budget cost of all possible prices. To summarize, when a worker comes, the existing mechanisms estimate the expected revenue for all possible prices using the learned worker model, in order to choose the optimal price.
However, the learned worker model may be inaccurate, causing inaccuracy in the estimated revenue. Besides, an essential condition of this approach is that the number of possible prices has to be finite, and its accuracy becomes even lower as the number of possible prices grows. Thereby, a finite prior price range must be provided. In contrast, our mechanism only compares the estimated number of workers willing to accept the task with the accurately known number of workers allowed by the limited budget. Without the comparison among the possibly inaccurate estimates of expected revenue, the optimal price computed by our mechanism is more accurate. Meanwhile, since our mechanism only cares about the minimum price at which the two numbers mentioned above are equal, there is no need for a proper price range to be given in advance. Note that there are many other pricing mechanisms proposed in the crowdsourcing literature. These mechanisms consider a different architecture or scenario. For example, procurement auctions determine the price based on workers' truthful bids about their costs [Singer and Mittal, 2013; Zhang et al., 2014; Chandra et al., 2015]. In microtask crowdsourcing, due to the large number of workers, this architecture significantly increases the communication burden and is easily affected by workers' bounded rationality [Rivas, 2015]. In [Badanidiyuru et al., 2012] and [Singla and Krause, 2013], posted-price mechanisms also show competitive and even better performance than procurement auctions. Therefore, we focus on the posted-price mechanism in this paper. Another popular direction of crowdsourcing studies is to determine payment according to the quality of work [Yin and Chen, 2015; Liu and Chen, 2016; Radanovic et al., 2016]. Research in this direction tries to elicit more effort from workers, while our aim is to recruit more workers. These two aspects are in fact complementary.
3 Modeling of Microtask Crowdsourcing

In microtask crowdsourcing, a requester posts tasks along with the price for each task. Workers can accept or reject the task. If they accept and finish one task, they will be paid the offered price. In this section, we formulate the requester and worker models, and study the unique features, optimal price and performance metric of microtask crowdsourcing.

3.1 Requester and Worker Models

The requester gets revenue from the completed tasks. In this paper, we assume that each task has unit value. Thus, the requester wishes to maximize the number of completed tasks within the given budget B. Assume there are N workers in the market and workers will finish the task if accepting it. The worker model depicts how workers decide to accept or reject the task. Here, we give three typical worker models which will be utilized in our experiments:

Private Cost Model [Singla and Krause, 2013] assumes that worker $w_i$ accepts a task only if the offered price $p_i$ is not lower than the cost $c_i$ for performing the task.

Discrete Choice Model [Gao and Parameswaran, 2014] describes humans' preference for tasks with higher utility. It assumes worker $w_i$ accepts price $p_i$ with probability $\exp[U(p_i)]/\{\exp[U(p_i)] + M_i\}$, where $M_i$ denotes the effects of other tasks and $U(p_i) = \alpha_i p_i + \beta_i$ denotes the utility of performing the offered task.

Reference Payment Model [Yin and Chen, 2015] depicts humans' habit of maintaining a reference payment level $r_i$. It assumes worker $w_i$ accepts price $p_i$ with probability $\{1 + \exp[-\alpha_i - \beta_i(p_i - r_i)]\}^{-1}$, where $\alpha_i$ and $\beta_i$ denote worker $i$'s interest and activeness, respectively.

In this paper, we assume the stochastic arrival of workers. This setting is widely adopted in research on microtask crowdsourcing [Singla and Krause, 2013]. Under this assumption, workers arrive one at a time, and the decision parameters of different workers (e.g. the cost $c_i$) are the same or i.i.d. sampled from the same distribution.
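As an illustration of the three worker models above, their acceptance rules can be sketched as follows; the function names and all parameter values are illustrative placeholders, not from the paper:

```python
import math

# Sketch of the three worker models' acceptance probabilities (Section 3.1).
# Parameter values used in the docstrings below are illustrative assumptions.

def private_cost_accepts(p, c):
    """Private Cost Model: accept iff the offered price p covers the cost c."""
    return p >= c

def discrete_choice_prob(p, alpha, beta, M):
    """Discrete Choice Model: accept with prob exp(U)/(exp(U)+M), U = alpha*p + beta."""
    u = alpha * p + beta
    return math.exp(u) / (math.exp(u) + M)

def reference_payment_prob(p, alpha, beta, r):
    """Reference Payment Model: accept with prob 1/(1 + exp(-alpha - beta*(p - r)))."""
    return 1.0 / (1.0 + math.exp(-alpha - beta * (p - r)))
```

All three induce an acceptance probability that is monotonically increasing in the offered price, which is the property the rest of the paper relies on.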
Thus, the input-output relationship of the worker models mentioned above can be described by a probability function $F(p)$, which denotes the probability that the coming worker accepts the offered price $p$. Since prices in practice must be positive, we require the support of $F(p)$ to be $(0, +\infty)$ (i.e. $F(p) = 0$ for $p \le 0$). Note that the stochastic arrival assumption may be violated in real crowdsourcing markets, and the number of workers $N$ is also always changing. Therefore, in our experiments, we evaluate the robustness of our mechanism in more practical settings where these factors are all considered.

3.2 Unique Features, Optimal Price and Metric

If we choose a price $p$ for all tasks, the requester's expected revenue will be $U(p) = \min\{N F(p), B/p\}$. The first term denotes the expected number of workers accepting the price, and the second term represents the budget constraint. Then, we can summarize the features of microtask crowdsourcing:

- Higher prices attract more workers, i.e. the acceptance probability function $F(p)$ is monotonically increasing;
- Higher prices allow fewer workers to be recruited, i.e. the budget constraint $B/p$ is monotonically decreasing;
- The optimal price $p^* = \arg\max_p U(p)$ is the point where the budget equals the cost of recruiting all workers willing to accept the price, i.e. $N F(p^*) = B/p^*$.

Besides, the available prices in practice should be discrete (e.g. 1, 2, ... cents), and the optimal price $p^*$ may be unavailable. Thus, we write the possible prices as an increasing sequence ($p_1 < p_2 < \dots$), and denote the available optimal price with the optimal subscript $k^* = \arg\max_k U(p_k)$. We also introduce a performance metric for posted-price mechanisms, termed regret.
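The unique features and the discrete optimal price $k^*$ can be illustrated with a small numeric sketch. The acceptance function $F$ below (uniform private costs on $(0, 200]$) and the values of $N$ and $B$ are illustrative assumptions, not taken from the paper:

```python
# Sketch of Section 3.2: expected revenue U(p) = min{N*F(p), B/p}, and the
# discrete optimal price k* maximizing U(p_k). All constants are illustrative.

N, B = 10_000, 400_000            # workers in the market, total budget (cents)
prices = list(range(1, 201))      # available discrete prices (cents)

def F(p):
    return min(p / 200.0, 1.0)    # Pr(cost <= p) for costs ~ U(0, 200]

def U(p):
    return min(N * F(p), B / p)   # workers willing vs. workers affordable

k_star = max(range(len(prices)), key=lambda k: U(prices[k]))
p_star = prices[k_star]           # here N*F(p) = 50p meets B/p at p ~ 89.4
```

With these numbers the two monotone curves cross between 89 and 90 cents, so the discrete optimum is $p_{k^*} = 89$ with $U(89) = 4450$ expected recruits.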
The regret denotes the expected revenue gap between mechanism $\mathcal{M}$ and always taking the optimal price $p_{k^*}$:

$R(N, B) = U(p_{k^*}, B, N) - U(\mathcal{M}, B, N)$    (1)

Maximizing the requester's revenue is thus equivalent to minimizing the regret. Note that a stochastic combination of different prices may generate higher expected revenue than the optimal price $p^*$ [Badanidiyuru et al., 2013]. Nevertheless, finding the optimal combination requires a much more accurate $F(p)$ than identifying the single optimal price. Thus, the mechanisms targeting the optimal combination need to explore sub-optimal prices more, and their performance turns out to be worse than ours in our experimental evaluation.

4 Indirect Design of Posted-Price Mechanism

When selecting prices, we face a dilemma between exploring possibly better prices and exploiting the current best price. A general idea to resolve this dilemma is to compute and select the current optimal price while considering the uncertainties of the learned worker model. However, it is not easy to properly measure the uncertainties. In addition, the budget constraint is another factor that needs attention. The requester accumulates revenue by recruiting workers, and the recruitment stops when the budget is exhausted. The stopping point is in fact not fixed due to the change of selected prices, making it difficult to decide the current optimal price. Thus, designing a mechanism that can also exploit the unique features of microtask crowdsourcing becomes even more challenging. We adopt an indirect approach to design our mechanism. More specifically, instead of directly solving the pricing problem of microtask crowdsourcing, we first define an equivalent multi-armed bandit (MAB) problem. It not only has the same optimal solution as microtask crowdsourcing but also inherits all the unique features.
The only difference is that this equivalent problem has a fixed stopping point, which simplifies computing the current optimum. For this equivalent problem, we then develop an algorithm which optimally exploits the unique features of microtask crowdsourcing and use it as the core of our mechanism. For clarity, all the theoretical analysis, including the optimality of our MAB algorithm, the linkage between the two problems on regret, and the regret under an infinite price range, is provided in Section 5.

4.1 Equivalent MAB Problem

We first define the equivalent MAB problem using the notations explained in microtask crowdsourcing. Suppose that we need to repeatedly select a price $p_i \in \{p_1, \dots, p_K\}$ for $N$ times. After selecting the price $p_i$ in round $n$, we can observe a stochastic signal $X_n$ which follows a Bernoulli distribution with mean $F(p_i)$. Meanwhile, the reward for choosing $p_i$ is $U_i = \min\{F_i, C_i\}$, where $F_i = F(p_i)$ and $C_i = B/(N p_i)$. The rewards are accumulated in the background, so we can only know the rewards after finishing all the $N$ selections. In fact, this new problem just equally divides the expected revenue in microtask crowdsourcing into $N$ rounds, i.e. $U_i = U(p_i)/N$. This conversion helps us bypass the difficulty caused by the changing stopping point when trying to exploit the unique features of microtask crowdsourcing. It is also easy to conclude that this equivalent problem inherits all the unique features of microtask crowdsourcing and reaches its optimum at the optimal price $p_{k^*}$. Based on the three unique features, we introduce the following two kinds of possible optimal prices (POPs):

POP-1 is the price $p_k$ that satisfies $C_k > F_k \ge C_{k+1}$;
POP-2 is the price $p_k$ that satisfies $F_k \ge C_k > F_{k-1}$.

Here, $k \in \{1, \dots, K\}$. For completeness, we add the conventions that $p_0 = 0$ and $p_{K+1} = +\infty$. The rationale behind introducing POPs lies in the following theorem:

Theorem 1. There is always one and only one POP-1 or POP-2.
The optimal price $p_{k^*}$ must be POP-1 or POP-2.

Proof of Existence: Due to the monotonicity of $F(p)$ and $B/p$, $F_i - C_i$ is monotonically increasing when $i$ changes from 1 to $K$. Meanwhile, the conventions $p_0 = 0$ and $p_{K+1} = +\infty$ lead to $F_0 - C_0 < 0$ and $F_{K+1} - C_{K+1} > 0$, respectively. Therefore, there must exist $k$ satisfying $F_k - C_k < 0$ and $F_{k+1} - C_{k+1} \ge 0$. In this case, if $F_k \ge C_{k+1}$, then $p_k$ is POP-1; otherwise, $p_{k+1}$ is POP-2.

Proof of Uniqueness: Let $p_k$ be POP-1. Then, for $i < k$, $F_i \le F_k < C_k \le C_{i+1} < C_i$. For $i > k$, $F_i \ge F_{i-1} \ge F_k \ge C_{k+1} \ge C_i$. Thus, $p_{i \ne k}$ cannot be POP-1 or POP-2. Besides, let $p_k$ be POP-2. Then, for $i < k$, $F_i \le F_{k-1} < C_k \le C_{i+1} < C_i$. For $i > k$, $F_i \ge F_k \ge C_k \ge C_{i-1} > C_i$. Thus, $p_{i \ne k}$ also cannot be POP-1 or POP-2.

Proof of Optimality: Let $p_k$ be POP-1 or POP-2. From the uniqueness proof, we have $F_i \le \min\{C_k, F_k\}$ for $i < k$ and $C_i \le \min\{C_k, F_k\}$ for $i > k$. Therefore, $U_k \ge U_{i \ne k}$, and $p_k$ equals the optimal price $p_{k^*}$.

4.2 Optimal MAB Algorithm

Algorithm 1: Optimal MAB Algorithm (OA-MAB)
Input: $n$, $S = \{p_1, \dots, p_K\}$, $\mu_i(n)$ for $i = 1 \dots K$
1: Search for the minimum POP in $S$ based on $\mu_i(n)$;
2: Output the current optimal price $p_{k(n)}$, where
   $k(n) = \hat{k}$ if $p_{\hat{k}}$ is POP-1;
   $k(n) = \hat{k}$ if $p_{\hat{k}}$ is POP-2 and $l_{\hat{k}} - 1 \notin 2\mathbb{N}$;
   $k(n) = \hat{k}$ if $p_{\hat{k}}$ is POP-2, $l_{\hat{k}} - 1 \in 2\mathbb{N}$ and $b_{\hat{k}-1} < C_{\hat{k}}$;
   $k(n) = \hat{k} - 1$ if $p_{\hat{k}}$ is POP-2, $l_{\hat{k}} - 1 \in 2\mathbb{N}$ and $b_{\hat{k}-1} \ge C_{\hat{k}}$;
and $\hat{k}$ is the subscript of the minimum POP.

Here we describe the optimal algorithm for the MAB problem, a novel price selection algorithm whose regret is proven to match the Lai-Robbins regret lower bound in Section 5.1. To formally describe the algorithm, we need the following notations. Let $N_i(n)$ denote the number of times that price $p_i$ has been selected up to round $n$. The empirical estimate of the acceptance probability $F_i$ in round $n$ is $\mu_i(n) = \frac{1}{N_i(n)} \sum_{t=1}^{n} \mathbb{1}\{k(t) = i\} X_t$, where $k(t)$ denotes the subscript of the selected price in round $t$, and we set $\mu_i(n) = 1$ if $N_i(n) = 0$.
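The minimum-POP search in line 1 of Algorithm 1 can be sketched as follows, using the empirical estimates $\mu_i$ in place of $F_i$ and $C_i = B/(N p_i)$. The boundary conventions follow $p_0 = 0$ (so $\mu_0 = 0$) and $p_{K+1} = +\infty$ (so $C_{K+1} = 0$); the example arrays in the usage note are illustrative:

```python
# Sketch of the minimum-POP search: scan prices in increasing order and return
# the first k that is POP-1 (C_k > mu_k >= C_{k+1}) or POP-2 (mu_k >= C_k > mu_{k-1}).
# mu and C are lists of length K; indices are 1-based in the return value.

def find_min_pop(mu, C):
    """Return (k, kind) for the minimum possible optimal price."""
    K = len(mu)
    for k in range(1, K + 1):
        m_k, c_k = mu[k - 1], C[k - 1]
        c_next = C[k] if k < K else 0.0       # C_{K+1} = 0 since p_{K+1} = +inf
        m_prev = mu[k - 2] if k > 1 else 0.0  # mu_0 = 0 since F(p) = 0 for p <= 0
        if c_k > m_k >= c_next:
            return k, "POP-1"
        if m_k >= c_k > m_prev:
            return k, "POP-2"
    # Unreachable when mu is nondecreasing and C is decreasing (Theorem 1),
    # but empirical estimates need not be monotone, so fall back to the top.
    return K, "POP-1"
```

For instance, with `mu = [0.1, 0.4, 0.7, 0.9]` and `C = [0.8, 0.5, 1/3, 0.25]` the search returns `(2, "POP-1")`, the point where the two monotone sequences cross.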
Besides, we employ the KL-divergence to compute the upper confidence bound of the estimate as:

$b_i(n, \mu_i, N_i) = \sup\{q \ge \mu_i(n) : N_i(n)\,\mathrm{KL}[\mu_i(n), q] \le \log(n) + 3 \log(\log(n))\}$    (2)

with the conventions that $b_i(n, \mu_i, 0) = 1$ and $b_i(n, 1, N_i) = 1$. Here, $\mathrm{KL}[x, y] = x \log(\frac{x}{y}) + (1 - x) \log(\frac{1 - x}{1 - y})$ denotes the KL-divergence between two Bernoulli distributions whose means are $x$ and $y$, respectively. We adopt this upper confidence bound formulation because Garivier and Cappé [2011] prove it to outperform other formulations for Bernoulli distributions. Furthermore, we define $l_i(n)$ as the number of times that $p_i$ has been identified as not only the minimum possible optimal price (being POP-1 or POP-2) but also POP-2. The pseudo-code of the algorithm is presented in Algorithm 1.

4.3 Posted-Price Mechanism

Our posted-price mechanism is presented in Algorithm 2. The input parameters include the budget $B$, the number of workers $N$ and the minimum price gap $\delta_p$. Here, $\delta_p$ is a parameter of the microtask crowdsourcing platform and denotes the minimum payment unit (e.g. 1 cent). Due to the search for the minimum possible optimal price in our MAB algorithm, the probability of exploring overly large prices decreases exponentially. We call this property probability decay and will further explain it in Section 5.3. This special property ensures that the performance of our MAB algorithm is unaffected when $K \to +\infty$. Thus, our mechanism does not need the input of a finite price range, which is required by other existing mechanisms. Furthermore, to strictly ensure the offered price is not higher than the remaining budget $B_n$, we use $B_n$ to update the available price set $S$ in our mechanism.
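The bound $b_i$ in Equation (2) has no closed form, but since $\mathrm{KL}[\mu, q]$ is increasing in $q$ for $q \ge \mu$, it can be computed by bisection. This is a sketch; the clipping constant `eps` and the iteration count are implementation assumptions:

```python
import math

# Sketch of the KL-UCB upper confidence bound in Equation (2): the largest
# q >= mu with N_i * KL(mu, q) <= log(n) + 3*log(log(n)), found by bisection.

def kl_bernoulli(x, y, eps=1e-12):
    """KL divergence between Bernoulli(x) and Bernoulli(y), clipped for safety."""
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

def kl_ucb(mu, n_i, n, iters=60):
    """Upper confidence bound for an empirical Bernoulli mean mu after n_i pulls."""
    if n_i == 0 or mu >= 1.0:
        return 1.0                      # conventions b(n, mu, 0) = b(n, 1, .) = 1
    budget = math.log(n) + 3 * math.log(max(math.log(n), 1e-12))
    lo, hi = mu, 1.0
    for _ in range(iters):              # bisection on q in [mu, 1]
        mid = (lo + hi) / 2
        if n_i * kl_bernoulli(mu, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```

As expected of a confidence bound, the result tightens toward $\mu_i$ as the number of pulls $N_i$ grows.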
Algorithm 2: Optimal Posted-Price Mechanism (OPPM)
1: Parameters: $B$, $N$, $\delta_p$
2: Initialize: worker counter $n = 1$; available budget $B_n = B$; number of recruited workers $U = 0$; number of available prices $K = \lfloor B/\delta_p \rfloor$; estimate of $F$: $\mu_i = 1$ for $i = 1 \dots K$
3: while $B_n > \delta_p$ and $n \le N$ do
4:   $K \leftarrow \lfloor B_n/\delta_p \rfloor$, $S \leftarrow \{\delta_p, 2\delta_p, \dots, K\delta_p\}$;
5:   $p_k \leftarrow$ OA-MAB($n, S, \mu_1, \dots, \mu_K$);
6:   Offer price $p_{k(n)}$ to the coming worker;
7:   Observe the decision: $X_n = 0$ (reject) or $1$ (accept);
8:   Update: $U \leftarrow U + X_n$; $B_{n+1} \leftarrow B_n - p_k X_n$; $\mu_k \leftarrow (N_k \mu_k + X_n)/(N_k + 1)$.

5 Theoretical Analysis

In this section, we provide extensive theoretical analysis to support our indirect approach to designing the mechanism.

5.1 Optimality of the MAB Algorithm

To prove the optimality of Algorithm 1, we first derive the Lai-Robbins regret lower bound of the MAB problem defined in Section 4.1 [Lai and Robbins, 1985]. This classic bound represents the best possible performance that can be achieved by any uniformly good algorithm. We say an algorithm $\pi$ is uniformly good if its regret is at most $O(\log N)$ for all possible $F(p)$. Besides, we keep $B/N = \mathrm{const}$ when $N \to +\infty$. This setting ensures the optimal price $p_{k^*}$ to be unchanged and greatly facilitates the asymptotic analysis.

Theorem 2. In the MAB problem defined in Section 4.1, any uniformly good algorithm $\pi$ satisfies:

$\liminf_{N \to \infty} \frac{R_\pi(N)}{\log(N)} = \begin{cases} 0 & p_{k^*} \text{ is POP-1} \\ \frac{C_{k^*} - F_{k^*-1}}{\mathrm{KL}[F_{k^*-1},\, C_{k^*}]} & p_{k^*} \text{ is POP-2} \end{cases}$    (3)

Proof. To prove this theorem, we first employ Theorem 1 in [Graves and Lai, 1997]¹ and get $\liminf_{N \to \infty} [R_\pi(N)/\log(N)] \ge w(F)$, where $w(F)$ equals the output of the following LP problem:

$\min \sum_{j=1}^{K} w_j\, [U_{k^*}(F) - U_j(F)]$  s.t.  $\inf_{\check{F} \in Z(F)} \sum_{j \ne k^*} w_j\, \mathrm{KL}[F_j, \check{F}_j] \ge 1$    (4)

where $w_j \ge 0$ and $\check{F}$ denotes a bad distribution that has the same value as $F$ at $k^*$ but provides the largest rewards at another price $\check{k} \ne k^*$. Even starting from $k^*$, any algorithm still needs to explore other prices to distinguish $F$ and $\check{F}$. Otherwise, it may miss the real optimal price $\check{k}$ if we substitute $F$ with $\check{F}$.
All the bad distributions form the set $Z(F)$:

$Z(F) = \{\check{F} \mid \check{F}_{k^*} = F_{k^*},\ U_{k^*}(\check{F}) < \max_k U_k(\check{F})\}$    (5)

Secondly, we solve the LP with the monotonicity of $F(p)$ and $C(p)$ considered. The detailed deduction is similar to that in Theorem 4.1 of [Combes and Proutiere, 2014]; therefore, we only provide the results here: if $p_{k^*}$ is POP-1, $Z(F) = \emptyset$ and $w(F) = 0$; if $p_{k^*}$ is POP-2, $\inf_{\check{F} \in Z(F)} \sum_{j \ne k^*} w_j\, \mathrm{KL}[F_j, \check{F}_j] = w_{k^*-1}\, \mathrm{KL}[F_{k^*-1}, C_{k^*}]$ and $w(F) = (C_{k^*} - F_{k^*-1})/\mathrm{KL}[F_{k^*-1}, C_{k^*}]$.

Then, let $R$ be the regret of Algorithm 1. The optimality of our algorithm is guaranteed by the following theorem:

Theorem 3. The regret upper bound of Algorithm 1 equals the Lai-Robbins regret lower bound, namely

$\limsup_{N \to \infty} R/\log(N) = \liminf_{N \to \infty} R_\pi/\log(N)$.    (6)

To prove this theorem, we first decompose the rounds where sub-optimal prices are selected into the following sets:

$\{n \le N \mid k(n) \ne k^*\} \subseteq A_1 \cup A_2 \cup A_3$,    (7)

where $A_1 = \{n \mid \hat{k}(n) < k^*\}$ and $A_2 = \{n \mid \hat{k}(n) > k^*\}$ denote the cases where the POP is wrongly identified, and $A_3 = \{n \mid \hat{k}(n) = k^*, k(n) = k^* - 1\}$ represents the cases where the POP is correct but the price is wrongly selected. Next, based on this decomposition, we bound the regret of Algorithm 1 as:

Theorem 4. $R \le E|A_1| + E|A_2| + (C_{k^*} - F_{k^*-1})\, E|A_3|$, where $|\cdot|$ denotes the size of a set.

Proof. Considering the facts that $U_{k^*} \le C_{k^*}$, $U_{k^*-1} = F_{k^*-1}$ and $0 < U_j \le U_{k^*}$ hold for all $j$, we have $R(B, N) \le E|A_1| + E|A_2| + (C_{k^*} - F_{k^*-1})\, E|A_3|$, where $N_j$ denotes the total number of times that $p_j$ is selected.

Furthermore, we can bound $E|A_1|$, $E|A_2|$ and $E|A_3|$ as follows:

Theorem 5. $E|A_1| < +\infty$.

¹ Condition (2.14) for Theorem 1 in [Graves and Lai, 1997] is equivalent to $F(p_{k^*-1}) > 0$, which is satisfied in all cases.

Proof. According to the settings of Algorithm 1, we derive

$Q_j \stackrel{\mathrm{def}}{=} \{n \mid \hat{k} = j\}$, $W_j \stackrel{\mathrm{def}}{=} \{n \mid \hat{k} = j, k(n) = j\}$, $|W_j| \ge |Q_j|/2$    (8)

using the fact that $p_j$ is selected at least whenever $l_j - 1 \notin 2\mathbb{N}$. For $j < k^*$, to achieve $\hat{k} = j$, $\mu_j$ should satisfy $\mu_j \ge C_{j+1}$. Meanwhile, $F_j < C_{j+1}$.
Thus, $\mu_j > F_j$ holds for all elements in $W_j$. According to Theorem 4.1 of [Rajeev et al., 1989], $E|W_j| < +\infty$, and hence $E|A_1| \le \sum_{j < k^*} E|Q_j| \le 2 \sum_{j < k^*} E|W_j| < +\infty$.

Theorem 6. $E|A_2| = O(\log\log(N))$.

Proof. For $j > k^*$, we can define the following sets:

$Q_j \stackrel{\mathrm{def}}{=} \{n \mid p_j \text{ is POP-1}\}$, $W_j \stackrel{\mathrm{def}}{=} \{n \mid p_j \text{ is POP-2}\}$    (9)

Furthermore, we can decompose the set $W_j$ as:

$X_j \stackrel{\mathrm{def}}{=} \{n \in W_j \mid N_{j-1}(n) \ge l_j(n)/4\}$, $Y_j \stackrel{\mathrm{def}}{=} \{n \in W_j \mid N_{j-1}(n) < l_j(n)/4\}$    (10)

where $l_j(n)$ denotes the number of times that $p_j$ has been identified as the minimum POP and POP-2. Similarly to Theorem 5, we can prove $E|Q_j| < +\infty$ and $E|X_j| < +\infty$. Then, we can derive

$Z_j \stackrel{\mathrm{def}}{=} \{n \in W_j \mid b_{j-1}(n) < C_j\}$, $|Y_j| \le 4|Z_j|$    (11)

using the fact that $p_j$ is selected at least $l_j(n)/4$ times under the conditions that $l_j - 1 \in 2\mathbb{N}$ and $b_{j-1}(n) < C_j$. Since $C_j < F_{j-1}$, according to Theorem 2 of [Garivier and Cappé, 2011], $E|Z_j| \le \gamma \log\log(N)$, where $\gamma$ is a constant. Thus, $E|A_2| \le \sum_{j > k^*} (E|Q_j| + E|W_j|) = O(\log\log(N))$.

Theorem 7. If $p_{k^*}$ is POP-1, $E|A_3| < +\infty$. If $p_{k^*}$ is POP-2, $E|A_3| = O((1 + \epsilon) \log(N)/\mathrm{KL}[F_{k^*-1}, C_{k^*}])$, where $\epsilon$ can be any positive number [Garivier and Cappé, 2011].

Proof. $A_3$ denotes the case where $p_{k^*}$ is identified as POP-2 in Algorithm 1 and $k^* - 1$ is selected because $b_{k^*-1} \ge C_{k^*}$. If $k^*$ is actually POP-1, similarly to Theorem 5, we can prove $E|A_3| < +\infty$. If $k^*$ is POP-2, we get $F_{k^*-1} < C_{k^*} \le b_{k^*-1}$. According to Theorem 2 of [Garivier and Cappé, 2011], $E|A_3| = O((1 + \epsilon) \log(N)/\mathrm{KL}[F_{k^*-1}, C_{k^*}])$.

Using Theorems 4-7, we can finally conclude Theorem 3.

5.2 Regret of Our Mechanism

Since the stopping point of posted-price mechanisms is not fixed, the regret of our mechanism differs slightly from the regret of the proposed MAB algorithm. Thus, we here derive the regret upper bound of our mechanism.

Theorem 8. The expected regret of our mechanism satisfies:

$R(N, B) \le E|A_1| + (F_{k^*} - F_{k^*-1})\, E|A_3| + [K_r \delta_p F_{k^*} - p_{k^*}]\, E|A_2|/p_{k^*}$    (12)

where $K_r > k^*$ denotes the subscript of the highest price explored by our mechanism. Due to the probability decay explained in Section 5.3, $K_r$ may be far smaller than $\lfloor B/\delta_p \rfloor$. The proof of Theorem 8 is similar to the proof of Lemma 1 in [Singla and Krause, 2013].
Thus, we omit it here. The core idea is to derive the connection between the expected stopping point and the regret. Furthermore, Theorems 4 and 8 show that the regrets of both the algorithm and the posted-price mechanism are linearly proportional to $E|A_1|$, $E|A_2|$ and $E|A_3|$. This linkage explains the rationale behind our indirect approach, which is to first develop the algorithm with the lowest regret in the equivalent MAB problem and then design our mechanism based on that optimal algorithm.

5.3 The Regret under an Infinite Price Range

In our mechanism, the price range is set by the budget $B$. However, in real markets, $B$ is usually very large, leading to a large number of possible prices. To avoid the low efficiency of exploring a large price space, existing mechanisms all require a prior price range as input. By contrast, the regret of our mechanism is not affected by an infinitely increasing price range. To demonstrate this, we analyze the probability distribution of the largest price explored by our mechanism:

Theorem 9 (Probability Decay). There exists a probability $p_d < 1$ which ensures $\Pr(K_r = k^* + n) \le p_d^{\,n-1}$.

Proof. If Algorithm 1 outputs $K_r$, $p_{K_r}$ or $p_{K_r+1}$ must be identified as the minimum POP. Thus, $\mu_k < C_k$ must hold for $k^* < k < K_r$. In this case, we have

$\Pr(K_r = k^* + n) \le \prod_{k=k^*+1}^{k^*+n-1} \Pr(\mu_k \le C_k)$    (13)

Meanwhile, since the initial value of $\mu_i$ is set to 1 in Algorithm 2, to satisfy $\mu_k < C_k$, $p_k$ must be tried at least once. Considering the fact that $F_k > C_k$ and the Chernoff-Hoeffding bound [Auer et al., 2002], we get

$\Pr(\mu_k \le C_k) \le e^{-2N_k(F_k - C_k)^2} \le e^{-2\Delta^2}$    (14)

where $\Delta = F_{k^*+1} - C_{k^*+1}$. Combining Equations 13 and 14, we can conclude Theorem 9 by setting $p_d = \exp(-2\Delta^2)$.

Then, we conduct an asymptotic analysis of the regret:

Theorem 10. When $N \to \infty$ and $B/N = \mathrm{const}$,

$R(N, B) = \begin{cases} O(\log\log N) & p_{k^*} \text{ is POP-1} \\ O\!\left(\frac{F_{k^*} - F_{k^*-1}}{\mathrm{KL}[F_{k^*-1},\, C_{k^*}]} \log N\right) & p_{k^*} \text{ is POP-2} \end{cases}$

Proof.
Considering the probability decay, we can bound the expected value of the right-hand side of Equation 12 using

$\sum_{n=1}^{\infty} K_r \delta_p \Pr(K_r = k^* + n) \le \sum_{n=1}^{\infty} (k^* + n)\, \delta_p\, p_d^{\,n}$    (15)

The other terms in Equation 12 are not affected by $K_r$. Furthermore, we can compute the above infinite series as

$\sum_{n=1}^{\infty} (k^* + n)\, p_d^{\,n} = (k^* + 1) Z(p_d) + Z^2(p_d) < +\infty$    (16)

where $Z(p_d) = p_d/(1 - p_d)$. Thus, the effect of the infinite price range ($B \to \infty$) is bounded by a finite constant. Hence, using Theorems 5-7, we can conclude Theorem 10.

Theorem 10 shows that our mechanism not only outperforms the state-of-the-art mechanism BP-UCB [Singla and Krause, 2013], but also classic MAB algorithms such as UCB-1 [Auer et al., 2002] and OSUB [Combes and Proutiere, 2014]. Above all, the performance of our mechanism is almost unaffected by the number of possible prices, which is attractive for practical usage but unreachable by existing posted-price mechanisms and MAB algorithms.

6 Experimental Evaluation

In this section, we empirically compare our mechanism with state-of-the-art mechanisms including BP-UCB [Singla and Krause, 2013], PD-BwK [Badanidiyuru et al., 2013] and UCB-BwK [Agrawal and Devanur, 2014]. The idealized case with the optimal price $p_{k^*}$ known in advance and always selected is also employed for comparison. The testbeds are built based on the three worker models mentioned in Section 3.1 and the real-world price data collected from MTurk. In our experiments, workers come sequentially. The mechanism offers a price to each worker. The worker decides to accept or reject the price according to the worker model. After observing the worker's decision, the mechanism updates the price offered to the next worker. The settings of our experiments are shown in Table 1. The price unit in all experiments is the cent. The expected revenue is estimated with the mean of 100 runs. Besides, we keep the ratio between the budget $B$ and the number of workers $N$ fixed so as to ensure the optimal price $p_{k^*}$ to be unchanged and thus a fair comparison.

Table 1: Experimental settings

Expt. | Worker Model | Price Range¹ | B/N
#1 | Private Cost Model: $c_i \sim U[5, 200]$² | [5, 200] | 40
#2 | Discrete Choice Model: $\alpha_i = 1/15$, $\beta_i = 0.39$, $M_i = 2000$ | [1, 200] | 30
#3 | Reference Payment Model: $\alpha_i \in \{0, 1, 3\}$, $\beta_i \in \{0, 1, 3\}$, $r_i \in \{20, 60, 120\}$, evenly distributed | [1, 200] | 70
#4 | The data (Fig. 1e) collected using MTurk-Tracker [Difallah et al., 2015] | [1, 100] | 10

¹ The price range is set for the benchmark mechanisms.
² Here, $U$ denotes the uniform distribution.

[Figure 1: Experiment results on different worker models. (a)-(d) Expected revenue versus the number of workers N for Expts. #1-#4, comparing BP-UCB, PD-BwK, UCB-BwK, our mechanism, and the idealized case using $p_{k^*}$; (e) prices in MTurk (2011-2016); (f) selected prices in Expt. #1.]

The results are shown in Fig. 1(a-d), respectively. We can conclude that our mechanism remarkably outperforms state-of-the-art mechanisms. It even achieves the same performance as the idealized case where $p_{k^*}$ is known and always selected.
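As a toy replication of the private-cost setup, the following self-contained sketch runs a simplified version of OPPM (the minimum-POP search only, omitting the KL-UCB tie-breaking step of OA-MAB) against workers whose costs are uniform on $(0, 200]$; all constants are illustrative assumptions:

```python
import random

# Simplified, self-contained sketch of the OPPM loop (Algorithm 2): offer the
# current minimum POP, observe accept/reject, update the empirical estimate.
# The KL-UCB tie-breaking step is omitted; constants are illustrative.

def simulate_oppm(B=400_000, N=10_000, dp=1, seed=0):
    rng = random.Random(seed)
    K = int(B // dp)
    mu = [1.0] * (K + 1)            # mu[i] estimates F(i*dp); init 1 as in Alg. 2
    cnt = [0] * (K + 1)             # times each price has been offered
    Bn, recruited = float(B), 0
    C = lambda i: B / (N * i * dp)  # budget ratio C_i = B / (N * p_i)
    for n in range(1, N + 1):
        if Bn <= dp:                # stop when the budget is exhausted
            break
        K = min(K, int(Bn // dp))   # never offer more than the remaining budget
        k = 1                       # minimum-POP search (line 1 of OA-MAB)
        while k < K:
            m_prev = mu[k - 1] if k > 1 else 0.0
            if (C(k) > mu[k] >= C(k + 1)) or (mu[k] >= C(k) > m_prev):
                break               # p_k is POP-1 or POP-2
            k += 1
        price = k * dp
        accept = 1 if price >= rng.uniform(0.0, 200.0) else 0  # private cost
        recruited += accept
        Bn -= price * accept
        cnt[k] += 1
        mu[k] = (mu[k] * (cnt[k] - 1) + accept) / cnt[k]       # Alg. 2, line 8
    return recruited, Bn

recruited, remaining = simulate_oppm()
```

Because untried prices keep the optimistic estimate $\mu_i = 1$, the search walks the price upward one step at a time and settles near the crossing point of $N F(p)$ and $B/p$, mirroring the behavior reported in Fig. 1f.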
To explain the reason for the optimal performance of our mechanism, we compare the selected price distributions in Fig. 1f.

[Figure 2: Robustness tests against: (a) the inaccurate worker number (expected utility versus relative error); (b) the non-stochastic arrival of workers (expected revenue versus the number of workers N).]

We can observe an overstaying of BP-UCB at the current optimal price, which is caused by excessive exploitation of the current estimate of $F(p)$. UCB-BwK, on the other hand, shows excessive exploration of possibly better prices. By contrast, our mechanism performs much better in balancing exploitation and exploration. It can not only move quickly to $p_{k^*}$, but also accurately stop exploration around $p_{k^*}$. Note that PD-BwK behaves similarly to UCB-BwK and thus is not included for comparison in Fig. 1f. To further verify the practical usability of our mechanism, we conduct a robustness analysis by considering two abnormal cases where the assumptions used for design are violated. Here, we use the private cost model as the testbed, and the settings are the same as in Expt. #1. In Fig. 2a, we compare the performance of different mechanisms when the number of workers $N$ is not accurately given. The real number of workers $N$ is 20000, and the relative error is calculated as $(\tilde{N} - N)/N$, where $\tilde{N}$ is the value offered to the mechanism. The comparison shows that our mechanism has distinct advantages over state-of-the-art mechanisms in this abnormal case. Fig. 2b presents the comparison of different mechanisms in a non-stochastic setting where two groups, with private costs uniformly distributed in [5, 100) and [100, 200] respectively, arrive one after another. This setting can model the phenomenon that working at night may cost workers more than working during the day.
The results show that the performance advantage of our mechanism is preserved in this non-stochastic setting.

7 Conclusion and Future Work

In this paper, we propose an optimal posted-price mechanism for microtask crowdsourcing. Compared with existing mechanisms, our mechanism not only performs better but also requires fewer inputs. To demonstrate these advantages, we first prove the optimality of our algorithm by showing that its regret matches the Lai-Robbins regret lower bound. This lower bound applies to any algorithm and represents the best achievable performance. Then, we prove that the regret of our mechanism is not affected by an infinite price range. Besides, the empirical results on various worker models and the real price data collected from MTurk also verify the advantages of our mechanism. For future work, we will improve our mechanism in two aspects. Firstly, when searching for the optimal price, our mechanism needs to try prices one by one; adaptively changing the search step size could enhance efficiency. Secondly, work quality could be taken into account by designing a more practical payment scheme consisting of a base salary and a bonus-penalty component.

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

Acknowledgments

This work was conducted within the Rolls-Royce@NTU Corporate Lab with support from the National Research Foundation (NRF) Singapore under the Corp Lab@University Scheme.

References

[Agrawal and Devanur, 2014] Shipra Agrawal and Nikhil R Devanur. Bandits with concave rewards and convex knapsacks. In Proc. of ACM EC, 2014.
[Anari et al., 2014] Nima Anari, Gagan Goel, and Afshin Nikzad. Mechanism design for crowdsourcing: An optimal 1-1/e competitive budget-feasible mechanism for large markets. In Proc. of FOCS, 2014.
[Auer et al., 2002] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002.
[Badanidiyuru et al., 2012] Ashwinkumar Badanidiyuru, Robert Kleinberg, and Yaron Singer. Learning on a budget: Posted price mechanisms for online procurement. In Proc. of ACM EC, 2012.
[Badanidiyuru et al., 2013] Ashwinkumar Badanidiyuru, Robert Kleinberg, and Aleksandrs Slivkins. Bandits with knapsacks. In Proc. of FOCS, 2013.
[Chandra et al., 2015] Praphul Chandra, Yadati Narahari, Debmalya Mandal, and Prasenjit Dey. Novel mechanisms for online crowdsourcing with unreliable, strategic agents. In Proc. of AAAI, 2015.
[Combes and Proutiere, 2014] Richard Combes and Alexandre Proutiere. Unimodal bandits: Regret lower bounds and optimal algorithms. In Proc. of ICML, 2014.
[Difallah et al., 2015] Djellel Eddine Difallah, Michele Catasta, Gianluca Demartini, Panagiotis G Ipeirotis, and Philippe Cudré-Mauroux. The dynamics of micro-task crowdsourcing: The case of Amazon MTurk. In Proc. of WWW, 2015.
[Gao and Parameswaran, 2014] Yihan Gao and Aditya Parameswaran. Finish them!: Pricing algorithms for human computation. Proceedings of the VLDB Endowment, 7(14):1965–1976, 2014.
[Gao et al., 2015] Yang Gao, Yan Chen, and KJ Ray Liu. On cost-effective incentive mechanisms in microtask crowdsourcing. IEEE Transactions on Computational Intelligence and AI in Games, 7(1):3–15, 2015.
[Garivier and Cappé, 2011] Aurélien Garivier and Olivier Cappé. The KL-UCB algorithm for bounded stochastic bandits and beyond. In Proc. of COLT, 2011.
[Graves and Lai, 1997] Todd L Graves and Tze Leung Lai. Asymptotically efficient adaptive choice of control laws in controlled Markov chains. SIAM Journal on Control and Optimization, 35(3):715–743, 1997.
[Lai and Robbins, 1985] Tze Leung Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4–22, 1985.
[Liu and Chen, 2016] Yang Liu and Yiling Chen. Learning to incentivize: Eliciting effort via output agreement. In Proc. of IJCAI, 2016.
[Radanovic et al., 2016] Goran Radanovic, Boi Faltings, and Radu Jurca. Incentives for effort in crowdsourcing using the peer truth serum. ACM Transactions on Intelligent Systems and Technology, 7(4):48, 2016.
[Rajeev et al., 1989] Rajeev Agrawal, Demosthenis Teneketzis, and Venkatachalam Anantharam. Asymptotically efficient adaptive allocation schemes for controlled iid processes: Finite parameter space. IEEE Transactions on Automatic Control, 34(3), 1989.
[Rivas, 2015] Javier Rivas. Mechanism design and bounded rationality: The case of type misreporting. Mathematical Social Sciences, 78:6–13, 2015.
[Simpson et al., 2015] Edwin D Simpson, Matteo Venanzi, Steven Reece, Pushmeet Kohli, John Guiver, Stephen J Roberts, and Nicholas R Jennings. Language understanding in the wild: Combining crowdsourcing and machine learning. In Proc. of WWW, 2015.
[Singer and Mittal, 2013] Yaron Singer and Manas Mittal. Pricing mechanisms for crowdsourcing markets. In Proc. of WWW, 2013.
[Singla and Krause, 2013] Adish Singla and Andreas Krause. Truthful incentives in crowdsourcing tasks using regret minimization mechanisms. In Proc. of WWW, 2013.
[Slivkins and Vaughan, 2014] Aleksandrs Slivkins and Jennifer Wortman Vaughan. Online decision making in crowdsourcing markets: Theoretical challenges. ACM SIGecom Exchanges, 12(2):4–23, 2014.
[Wang and Zhou, 2016] Lu Wang and Zhi-Hua Zhou. Learning by crowdsourcing. In Proc. of IJCAI, 2016.
[Yin and Chen, 2015] Ming Yin and Yiling Chen. Bonus or not? Learn to reward in crowdsourcing. In Proc. of IJCAI, 2015.
[Zhang et al., 2014] Xinglin Zhang, Zheng Yang, Zimu Zhou, Haibin Cai, Lei Chen, and Xiangyang Li. Free market of crowdsourcing: Incentive mechanism design for mobile sensing. IEEE Transactions on Parallel and Distributed Systems, 25(12):3190–3200, 2014.