# maxing_and_ranking_with_few_assumptions__eeac6397.pdf Maxing and Ranking with Few Assumptions Moein Falahatgar Yi Hao Alon Orlitsky Venkatadheeraj Pichapati Vaishakh Ravindrakumar University of California, San Deigo {moein,yih179,alon,dheerajpv7,vaishakhr}@ucsd.edu PAC maximum selection (maxing) and ranking of n elements via random pairwise comparisons have diverse applications and have been studied under many models and assumptions. With just one simple natural assumption: strong stochastic transitivity, we show that maxing can be performed with linearly many comparisons yet ranking requires quadratically many. With no assumptions at all, we show that for the Borda-score metric, maximum selection can be performed with linearly many comparisons and ranking can be performed with O(nlog n) comparisons. 1 Introduction 1.1 Motivation Maximum selection (maxing) and sorting using pairwise comparisons are among the most practical and fundamental algorithmic problems in computer science. As is well-known, maxing requires n 1 comparisons, while sorting takes (nlog n) comparisons. The probabilistic version of this problem, where comparison outcomes are random, is of significant theoretical interest as well, and it too arises in many applications and diverse disciplines. In sports, pairwise games with random outcomes are used to determine the best, or the order, of teams or players. Similarly Trueskill [1] matches video gamers to create their ranking. It is also used for a variety of online applications such as to learn consumer preferences with the popular A/B tests, in recommender systems [2], for ranking documents from user clickthrough data [3, 4], and more. The popular crowd sourcing website GIFGIF [5] shows how pairwise comparisons can help associate emotions with many animated GIF images. Visitors are presented with two images and asked to select the one that better corresponds to a given emotion. For these reasons, and because of its intrinsic theoretical interest, the problem received a fair amount of attention. 1.2 Terminology and previous results One of the first studies in the area, [6] assumed n totally-ordered elements, where each comparison errs with the same, known, probability < 1 2. It presented a maxing algorithm that uses O( n δ ) comparisons to output the maximum with probability 1 δ, and a ranking algorithm that uses O( n δ ) comparisons to output the ranking with probability 1 δ. These results have been and continue to be of great interest. Yet this model has two shortcomings. It assumes that there is only one random comparison probability, , and that its value is known. In practice, comparisons have different, and arbitrary, probabilities, and they are not known in advance. To address more realistic scenarios, researchers considered more general probabilistic models. Consider a set of n elements, without loss of generality [n] def= {1,2,...,n}. A probabilistic model, or model for short, is an assignment of a preference probability pi,j [0,1] for every i j [n], reflecting the probability that i is preferred when compared with j. We assume that repeated comparisons are independent and that there are no draws , hence pj,i = 1 pi,j. 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA. 2, we say that i is preferable to j and write i j. Element i is maximal in a model if i j for all j i. And a permutation 1,..., n is a ranking if i j for all i j. Observe that the first element of any ranking is always maximal. For example, for n = 3, p1,2 = 1 2, p1,3 = 1 3, and p2,3 = 2 3, we have 1 2, 2 1, 3 1, and 2 3. Hence 2 is the unique maximum, and 2,3,1 is the unique ranking. We seek algorithms that without knowing the underlying model, use pairwise comparisons to find a maximal element and a ranking. Two concerns spring to mind. First, there may be two elements i,j with pi,j arbitrarily close to half, requiring arbitrarily many comparisons just to determine which is preferable to the other. This concern has a common remedy, that we also adopt. The PAC paradigm, e.g. [7, 8], that requires the algorithm s output to be only Probably Approximately Correct. def= pi,j 1 2 be the centered preference probability. Note that pi,j 0 iff i is preferable to j. If pi,j we say that i is -preferable to j. For 0 < < 1 2, an element i [n] is -maximum if it is -preferable to all other elements, namely, pi,j j i. Given > 0, 1 2 δ > 0, a PAC maxing algorithm must output an -maxima with probability 1 δ, henceforth abbreviated with high probability (WHP). Similarly, a permutation 1,..., n of {1,...,n} is an -ranking if i is -preferable to j for all i j, and a PAC ranking algorithm must output an -ranking WHP. Note that in this paper, we consider δ 1 2, the more practical regime. For larger values of δ, one can use our algorithms with δ = 1 The second concern is that not all models have a ranking, or even a maximal element. For example, for p1,2 = p2,3 = p3,1 = 1, or the more opaque yet interesting non-transitive coins [9], each element is preferable to the cyclically next, hence there is no maximal element and no ranking. A standard approach, that again we too will adopt, to address this concern is to consider structured models. The simplest may be parametric models, of which one of the more common is Placket Luce (PL) [10, 11], where each element i is associated with an unknown positive number ai and pi,j = ai ai+aj . [12] derived a PAC maxing algorithm that uses O( n δ) comparisons and a PAC ranking algorithm that uses O( n 2 log nlog n δ) comparisons for any PL model. Related results for the Mallows model under a non-PAC paradigm were derived by [13]. But significantly more general, and more realistic, non-parametric, models may also have maxima and rankings. A model is strongly stochastically transitive (SST), if i j and j k imply pi,k max(pi,j,pj,k). By simple induction, every SST model has a maximum element and a ranking. And one additional property, that is perhaps more difficult to justify, has proved helpful in constructing maxing and sorting PAC algorithms. A tournament satisfies the stochastic triangle inequality if i j and j k imply that pi,k pi,j + pj,k. In Section 4 we show that if a model has a ranking, then an -ranking can be found WHP via O( n2 δ ) comparisons. For all models that satisfy both SST and triangle inequality, [7] derived a PAC maxing algorithm that uses O( n δ) comparisons. [14] eliminated the log n factor and showed that O n δ comparisons suffice and are optimal, and constructed a nearly-optimal PAC ranking algorithm that uses O( n log n(log log n)3 2 ) comparisons for all δ 1 n, off by a factor of O((log log n)3) from optimum. Lower-bounds follow from an analogy to [15, 6]. Observe that since the PL model satisfies both SST and triangle inequality, these results also improve the corresponding PL results. Finally, we consider models that are not SST, or perhaps don t have maximal elements, rankings, or even their -equivalents. In all these cases, one can apply a weaker order relation. The Borda score s(i) def= 1 n j pi,j is the probability that i is preferable to another, randomly selected, element. Element i is Borda maximal if s(i) = maxj s(j), and -Borda maximal if s(i) maxj s(j) . A PAC Borda-maxing algorithm outputs an -Borda maximal element WHP (with probability 1 δ). Similarly, a Borda ranking is a permutation i1,...,in such that for all 1 j n 1, s(ij) s(ij+1). An -Borda ranking is a permutation where for all 1 j k n, s(ij) s(ik) . A PAC Borda-ranking algorithm outputs an -Borda ranking WHP. Recall that Borda scores apply to all models. As noted in [16, 17, 8, 18] considering elements with nearly identical Borda scores shows that exact Borda-maxing and ranking requires arbitrarily many comparisons. [8] derived a PAC Borda ranking, and therefore also maxing, algorithms that use 2 ) comparisons. [19] derived a O( n log n δ )) PAC Borda ranking algorithm for restricted setting. However note that several simple models, including p1,2 = p2,3 = p3,1 = 1 do not belong to this model. [20, 21, 22] considered deterministic adversarial versions of this problem that has applications in [23]. Finally, we note that all our algorithms are adaptive, where each comparison is chosen based on the outcome of previous comparisons. Non-adaptive algorithms were discussed in [24, 25, 26, 27]. 2 Results and Outline Our goal is to find the minimal assumptions that enable efficient algorithms for these problems. In particular, we would like to see if we can eliminate the somewhat less-natural triangle inequality. With two algorithmic problems: maxing and ranking, and one property SST and one special metric Borda scores, the puzzle consists of four main questions. 1) With just SST (and no triangle inequality) are there: a) PAC maxing algorithms with O(n) comparisons? b) PAC ranking algorithms with near O(nlog n) comparisons? 2) With no assumptions at all, but for the Borda-score metric, are there: a) PAC Borda-maxing algorithms with O(n) comparisons? b) PAC Borda-ranking algorithms with near O(nlog n) comparisons? We essentially resolve all four questions. 1a) Yes. In Section 3, Theorem 6, we use SST alone to derive a O n δ comparisons PAC maxing algorithm. Note that this is the same complexity as with triangle inequality, and it matches the lower bound. 1b) No. In Section 4, Theorem 7, we show that there are SST models where any PAC ranking algorithm with 1 4 requires (n2) comparisons. This is significantly higher than the roughly O(nlog n) comparisons needed with triangle inequality, and is close to the O(n2 log n) comparisons required without any assumptions. 2a) Yes. In Section 5, Theorem 8, we derive a PAC Borda maxing algorithm that without any model assumptions requires O n δ comparisons which is order optimal. 2b) Yes. In Section 5, Theorem 9, we derive a PAC Borda ranking algorithm that without any model assumptions requires O n δ comparisons. Beyond the theoretical results sections, in Section 6, we provide experiments on simulated data. In Section 7, we discuss the results. 3.1 SEQ-ELIMINATE Our main building block is a simple, though sub-optimal, algorithm SEQ-ELIMINATE that sequentially eliminates one element from input set to find an -maximum under SST. SEQ-ELIMINATE uses O n δ comparisons and w.p. 1 δ, finds an -maximum. Even for simpler models [15] we know that an algorithm needs n δ comparisons to find an -maximum w.p. 1 δ. Hence the number of comparisons used by SEQ-ELIMINATE is optimal up to a constant factor when δ 1 n but can be log n times the lower bound for δ = 1 By SST, any element that is -preferable to absolute maximum element of S is an -maximum of S. Therefore if we can reduce S to a subset S of size O( n log n) that contains an absolute maximum of S using O n δ comparisons, we can then use SEQ-ELIMINATE to find an -maximum of S and the number of comparisons is optimal up to constants. We provide one such reduction in subsection 3.2. Sequential elimination techniques have been used before [13] to find an absolute maximum. In such approaches, a running element is maintained, and is compared and replaced with a competing element in S if the latter is found to be better with confidence 1 δ n. Note that if the running and competing elements are close to each other, this technique can take an arbitrarily long time to declare the winner. But since we are interested in finding only an -maximum, SEQ-ELIMINATE circumvents this issue. We later show that SEQ-ELIMINATE needs to update the running element r with the competing element c if pc,r and retain r if pc,r 0. If 0 < pc,r < , replacing or retaining r doesn t affect the performance of SEQ-ELIMINATE significantly. Thus, in other words we ve reduced the problem to testing whether pc,r 0 or pc,r . Assuming that testing problem always returns the right answer, since SEQ-ELIMINATE never replaces the running element with a worse element, either the output is the absolute maximum b or b is never the running element. If b is eliminated against running element r then pb ,r and hence r is an -maximum and since the running element only gets better, the output is an -maximum. We first present a testing procedure COMPARE that we use to update the running element in SEQELIMINATE. 3.1.1 COMPARE COMPARE(i,j, l, u,δ) takes two elements i and j, and two biases u > l, and with confidence 1 δ, determines whether pi,j is l or u. For this, COMPARE compares the two elements 2 ( u l)2 log(2 δ) times. Let ˆpi,j be the fraction of times i beats j, and let ˆ pi,j def= ˆpi,j 1 2. If ˆ pi,j < ( l + u) 2, COMPARE declares pi,j l (returns 1), and otherwise it declares pi,j u (returns 2). Due to lack of space, we present the algorithm COMPARE in Appendix A.1 along with certain improvements for better performance in practice . In the Lemma below, we bound the number of comparisons used by COMPARE and prove its correctness. Proof is in A.2. Lemma 1. For u > l, COMPARE(i,j, l, u,δ) uses 2 ( u l)2 log 2 δ comparisons and if pi,j l, then w.p. 1 δ, it returns 1, else if pi,j u, w.p. 1 δ, it returns 2. Now we present SEQ-ELIMINATE that uses the testing subroutine COMPARE and finds an - maximum. 3.1.2 SEQ-ELIMINATE Algorithm SEQ-ELIMINATE takes a variable set S, selects a random running element r S and repeatedly uses COMPARE(c,r,0, ,δ n) to compare r to a random competing element c S r. If COMPARE returns 1 i.e., deems pc,r 0, it retains r as the running element and eliminates c from S, but if COMPARE returns 2 i.e., deems pc,r , it eliminates r from S and updates c as the new running element. Algorithm 1 SEQ-ELIMINATE 1: inputs 2: Set S, bias , confidence δ 3: n S 4: r a random c S, S = S {r} 5: while S do 6: Pick a random c S, S = S {c}. 7: if COMPARE(c,r,0, , δ n) = 2 then 8: r c 9: end if 10: end while 11: return r We now bound the number of comparisons used by SEQ-ELIMINATE(S, ,δ) and prove its correctness. Proof is in A.3. Theorem 2. SEQ-ELIMINATE(S, ,δ) uses O S δ comparisons, and w.p. 1 δ outputs an -maximum. 3.2 Reduction Recall that, for δ 1 n, SEQ-ELIMINATE is order-wise optimal. For δ 1 n, here we present a reduction procedure that uses O n δ comparisons and w.p. 1 δ, outputs a subset S of size O( nlog n) and an element a such that either a is a 2 3-maximum or S contains an absolute maximum of S. Combining the reduction with SEQ-ELIMINATE results in an order-wise optimal algorithm. We form the reduced subset S by pruning S. We compare each element e S with an anchor element a, test whether pe,a 0 or pe,a 2 3 using COMPARE, and retain all elements e for which COMPARE returns the second hypothesis. For S to be of size O( nlog n) we d like to pick an anchor element that is among the top O( nlog n) elements. But this can be computationally hard and we show that it suffices to pick an anchor that is not 3-preferable to at most O( nlog n) elements in S. An element a is called an ( ,n )-good anchor if a is not -preferable to at most n elements, i.e., {e e S and pe,a > } n . We now present the subroutine PICK-ANCHOR that finds a good anchor element. 3.2.1 Picking Anchor Element PICK-ANCHOR(S,n , ,δ) uses O n n 2 log 1 δ log n n δ comparisons and w.p. 1 δ, outputs an ( ,n )-good anchor element. PICK-ANCHOR first picks randomly a set Q of n n log 2 δ elements from S without replacement. This ensures that w.p. 1 δ, Q contains at least one of the top n elements. We then use SEQ-ELIMINATE to find an -maximum of Q. Let the absolute maximum element of Q be denoted as q . Now an -maximum of Q is -preferable to q . Further, if Q contains an element in the top n elements, there exists n n elements worse than q in S. Thus by SST, the -maximum of Q is also -preferable to these n n elements and hence the output of PICK-ANCHOR is an ( ,n )-good anchor element. PICK-ANCHOR is shown in appendix A.4 We now bound the number of comparisons used by PICK-ANCHOR and prove its correctness. Proof is in A.5. Lemma 3. PICK-ANCHOR(S,n , ,δ) uses O n n 2 log 1 δ log n n δ comparisons and w.p. 1 δ, outputs an ( ,n )-good anchor element. Remark 4. Note that PICK-ANCHOR(S,cn, ,δ) uses Oc 1 2 comparisons where the constant depends only on c but not on n. Hence it is advantageous to use this method to pick nearmaximum element when n is large. We now present PRUNE that takes an anchor element as input and prunes the set S using the anchor. 3.2.2 Pruning Given an ( l,n )-good anchor element a, w.p. 1 δ 2, PRUNE(S,a,n , l, u,δ) outputs a subset S of size 2n . Further, any element e that is at least u-better than a i.e., pe,a u is in S w.p. 1 δ 2. PRUNE prunes S in multiple rounds. In each round t, for every element e in S, PRUNE tests whether pe,a l or pe,a u using COMPARE(e,a, l, u,δ 2t+1) and eliminates e if the first hypothesis i.e., pe,a l is returned. By Lemma 1, an element e that is u better than a i.e., pe,a u passes the tth round of pruning w.p. 1 δ 2t+1. Thus by union bound, the probability that such an element is not present in the pruned set is t=1 δ 2t+1 δ 2. Now for element e that is not l-better than a i.e., pe,a l, by Lemma 1, the first hypothesis is returned w.p. 1 δ 4. Hence w.h.p., the number of such elements (not l-better elements) is reduced by a factor of δ after each round. Since a is an ( l,n )-good anchor element, there are at most n elements atleast l-better than a. Thus the number of elements left in the pruned set after round t is at most n + nδt. Thus PRUNE succeeds eventually in reducing the size to 2n (in log1 δ n n rounds). Algorithm 2 PRUNE 1: inputs 2: Set S, element a, size n , lower bias l, upper bias u, confidence δ. 3: t 1 4: S1 S 5: while St > 2n and t < log2 n do 6: Initialize: Qt 7: for e in St do 8: if COMPARE(e,a, l, u,δ 2t+1) = 1 then 9: Qt Qt {e} 10: end if 11: end for 12: St+1 St Qt 13: t t + 1 14: end while 15: return St. We now bound the number of comparisons used by PRUNE and prove its correctness. Proof is in A.6. Lemma 5. If n 6nlog n, δ 1 n and a is an ( l,n )-good anchor element, then w.p. 1 δ PRUNE(S,a,n , l, u,δ) uses O n ( u l)2 log 1 δ comparisons and outputs a set of size less than 2n . Further if a is not an u-maximum of S then w.p. 1 δ 2, the output set contains an absolute maximum element of S. 3.3 Full Algorithm We now present the main algorithm, OPT-MAXIMIZE that w.p. 1 δ, uses O n δ comparisons and outputs an -maximum. For δ 1 n, SEQ-ELIMINATE uses O( n δ ) comparisons and hence we directly use SEQ-ELIMINATE. Below we assume δ > 1 Here OPT-MAXIMIZE first finds an ( 3, 6nlog n)-good anchor element a using PICK-ANCHOR(S, 6nlog n, 3, δ 4). Then using PRUNE(S,a, 6nlog n, 3,2 3, δ 4) with a, OPT-MAXIMIZE prunes S to a subset S of size 2 6nlog n such that if a is not a 2 3 maximum i.e. pb ,a > 2 3, S contains the absolute maximum b w.p. 1 δ 2. OPT-MAXIMIZE then checks if a is a 2 3 maximum by using COMPARE(e,a,2 3, ,δ (4n)) for every element e S . If COMPARE returns first hypothesis for every e S then OPT-MAXIMIZE outputs a or else OPT-MAXIMIZE outputs SEQ-ELIMINATE(S , , δ Note that only one of these three cases is possible: (1) a is a 2 3-maximum, (2) a is not an - maximum and (3) a is an -maximum but not a 2 3-maximum. In case (1), since a is a 2 3maximum, by Lemma 1, w.p. 1 δ 4, COMPARE returns the first hypothesis for every e S and OPT-MAXIMIZE outputs a. In both cases (2) and (3), as stated above, w.p. 1 δ 2, S contains the absolute maximum b . Now in case (2) since a is not an -maximum, by Lemma 1, w.p. 1 δ (4n), COMPARE(b ,a,2 3, ,δ (4n)) returns the second hypothesis. Thus OPT-MAXIMIZE outputs SEQ-ELIMINATE(S , ,δ 4), which w.p. 1 δ 4, returns an -maximum of S (recall that an -maximum of S is an -maximum of S if S contains b ). Finally in case (3), OPT-MAXIMIZE either outputs a or SEQ-ELIMINATE(S , ,δ 4) and either output is an -maximum w.p. 1 δ. In the below Theorem, we bound comparisons used by OPT-MAXIMIZE and prove its correctness. Proof is in A.7. Theorem 6. W.p. 1 δ, OPT-MAXIMIZE(S, ,δ) uses O( n δ ) comparisons and outputs an -maximum. Recall that [14] considered a model with both SST and stochastic triangle inequality and derived an -ranking with O n log n(log log n)3 2 comparisons for δ = 1 n. By constrast, we consider a more Algorithm 3 OPT-MAXIMIZE 1: inputs 2: Set S, bias , confidence δ. 3: if δ 1 n then 4: return SEQ-ELIMINATE(S, ,δ) 5: end if 6: a PICK-ANCHOR(S, 6nlog n, 3, δ 7: S PRUNE(S,a, 6nlog n, 3,2 3, δ 4) 8: for element e in S do 9: if COMPARE(e,a, 2 4n) = 2 then 10: return SEQ-ELIMINATE(S , , δ 4) 11: end if 12: end for 13: return a general model without stochastic triangle inequality and show that even a 1 4-ranking with just SST takes (n2) comparisons for δ 1 To establish the lower bound, we reduce the problem of finding 1 4-ranking to finding a coin with bias 1 among n(n 1) 2 1 other fair coins. For this, we consider the following model with n elements {a1,a2,...,an}: pa1,an = 1 2, pai,aj = µ(0 < µ < 1 n10), when i < j and (i,j) (1,n). Note that this model satisfies SST but not stochastic triangle inequality. Also note that any ranking where a1 precedes an is an 1 4-ranking and thus the algorithm only needs to order a1 and an correctly. Now the output of a comparison between any two elements other than a1 and an is essentially a fair coin toss (since µ is very small). Thus if we output a ranking without querying comparison between a1 and an, then the ranking is correct w.p. 1 2 since a1 and an must necessarily be ordered correctly. Now if an algorithm uses only n2 20 comparisons then the probability that the algorithm queried at least one comparison between a1 and an is less than 1 2 and hence cannot achieve a confidence of 7 8. Proof sketch in B.1. Theorem 7. There exists a model that satisfies SST for which any algorithm requires (n2) comparisons to find a 1 4-ranking with probability 7 8. We also present a trivial -ranking algorithm in Appendix B.2 that for any stochastic model with ranking (Weak Stochastic Transitivity), uses O( n2 δ ) comparisons and outputs an -ranking w.p. 1 δ. 5 Borda Scores We show that for general models, using O( n δ ) comparisons w.p. 1 δ, we can find an -Borda maximum and using O( n δ ) comparisons w.p. 1 δ, we can find an -Borda ranking. Recall that Borda score s(e) of an element e is the probability that e is preferable to an element picked randomly from S i.e., s(e) = 1 n f S pe,f. We first make a connection between Borda scores of elements and the traditional multi armed bandit setting. In the Bernoulli multi armed setting, every arm a is associated with a parameter q(a) and pulling that arm results in a reward B(q(a)), a Bernoulli random variable with parameter q(a). Observe that we can simulate our pairwise comparisons setting as a traditional bandit arms setting by comparing an element with a random element where in our setting, for every element e, the associated parameter is s(e). Thus PAC optimal algorithms derived under traditional bandit setting work for PAC Borda score setting too. [28] and several others derived a PAC maximum arm selection algorithms that use O( n δ ) comparisons and find an arm with parameter at most less than the highest. This implies an -Borda maxing algorithm with the same complexity. Proof follows from reduction to Bernoulli multi-armed bandit setting. Theorem 8. There exists an algorithm that uses O( n δ ) comparisons and w.p. 1 δ, outputs an -Borda maximum. For -Borda ranking, we note that if we compare an element e with 2 δ random elements, w.p. 1 δ n, the fraction of times e wins approximates the Borda score of e to an additive error of 2. Ranking based on these approximate scores results in an -Borda ranking. We present BORDARANKING in C.1 that uses 2n δ comparisons and w.p. 1 δ outputs an -Borda ranking. Proof in C.1. Theorem 9. BORDA-RANKING(S, ,δ) uses 2n δ comparisons and w.p. 1 δ outputs an -Borda ranking. 6 Experiments In this section we validate the performance of our algorithms using simulated data. Since we essentially derived a negative result for -ranking, we consider only our -maxing algorithms - SEQELIMINATE and OPT-MAXIMIZE for experiments. All results are averaged over 100 runs. 0 200 400 600 800 1000 1200 1400 1600 1800 2000 Number of elements Sample Complexity OPT-MAXIMIZE SEQ-ELIMINATE (a) small values of n 0 5000 10000 15000 Number of elements Sample Complexity OPT-MAXIMIZE SEQ-ELIMINATE (b) large values of n Figure 1: Comparison of SEQ-ELIMINATE and OPT-MAXIMIZE Similar to [14, 7], we consider the stochastic model pi,j = 0.6 i < j. We use maxing algorithms to find 0.05-maximum with error probability δ = 0.1. Note that i = 1 is the unique 0.05-maximum under this model. In Figure 1, we compare the performance of SEQ-ELIMINATE and OPT-MAXIMIZE over different ranges of n. Figures 1(a), 1(b) show that for small n i.e., n 1300 SEQ-ELIMINATE performs well and for large n i.e., n 1300, OPT-MAXIMIZE performs well. Since we are using δ = 0.1, the experiment suggests that for δ 1 n1 3 , OPT-MAXIMIZE uses fewer comparisons as compared to SEQ-ELIMINATE. Hence it would be beneficial to use SEQ-ELIMINATE for δ 1 n1 3 and OPT-MAXIMIZE for higher values of δ. In further experiments, we use δ = 0.1 and n < 1000 so we use SEQ-ELIMINATE for better performance. We compare SEQ-ELIMINATE with BTM-PAC [7], KNOCKOUT [14], Mallows MPI [13], and AR [16] . KNOCKOUT and BTM-PAC are PAC maxing algorithms for models with SST and stochastic triangle inequality requirements. AR finds an element with maximum Borda score. Mallows finds the absolute best element under Weak Stochastic Transitivity. We again consider the model: pi,j = 0.6 i < j and try to find a 0.05-maximum with error probability δ = 0.1. Note that this model satisfies both SST and stochastic triangle inequality and under this model all these algorithms can find an -maximum. From Figure 2(a), we can see that BTM-PAC performs worse for even small values of n and from Figure 2(b), we can see that AR performs worse for higher values of n. One possible reason is that BTM-PAC is tailored for reducing regret in the bandit setting and in the case of AR, Borda scores of elements become approximately the same with increasing number of elements, leading to more comparisons. For this reason, we drop BTM-PAC and AR for further experiments. We also tried PLPAC [12] but it fails to achieve required accuracy of 1 δ since it is designed primarily for Plackett-Luce. For example, we considered the previous setting pi,j = 0.6 i < j with n = 100 and tried to find a 0.09-maximum with δ = 0.1. Even though PLPAC used almost same number of comparisons (57237) as SEQ-ELIMINATE (56683), PLPAC failed to find 0.09-maxima 20 out of 100 runs whereas SEQ-ELIMINATE found the maximum in all 100 runs. In figure 3, we compare algorithms SEQ-ELIMINATE, KNOCKOUT [14] and Mallows MPI [13] for models that do not satisfy stochastic triangle inequality. In Figure 3(a), we consider the stochastic model p1,j = 1 2 + q j n 2, p1,j = 1 j > n 2 and pi,j = 1 2 + q 1 < i < j where q 0.05 and we pick n = 10. Observe that this model satisfies SST but not stochastic triangle inequality. Here 7 10 15 Number of elements Sample Complexity SEQ-ELIMINATE KNOCKOUT Mallows MPI AR BTM-PAC (a) small values of n 50 100 200 500 Number of elements Sample Complexity SEQ-ELIMINATE KNOCKOUT Mallows MPI AR (b) large valuesof n Figure 2: Comparison of Maxing Algorithms with Stochastic Triangle Inequality again, we try to find a 0.05-maximum with δ = 0.1. Note that any i n 2 is a 0.05 maximum. From Figure 3(a), we can see that Mallows MPI uses more comparisons as q decreases since Mallows MPI is not a PAC algorithm and tries to find the absolute maximum. Even though KNOCKOUT performs better than Mallows MPI, it fails to output a 0.05 maximum with probability 0.12 for q = 0.001 and 0.26 for q = 0.0001. Thus KNOCKOUT can fail when the model doesn t satisfy stochastic triangle inequality. We give an explanation for this behavior in Appendix D. By constrast, even for q = 0.0001, SEQ-ELIMINATE outputted a 0.05 maximum in all runs and outputted the abosulte maximum in 76% of trials. We can also see that SEQ-ELIMINATE uses much fewer comparisons compared to the other two algorithms. In Figure 3(b), we compare SEQ-ELIMINATE and Mallows MPI on the Mallows model, a model which doesn t satisfy stochastic triangle inequality. Mallows model can be specified with one parameter φ. We consider n = 10 elements and find a 0.05-maximum with error probablility δ = 0.05. From Figure 3(b) we can see that the performance of Mallows MPI gets worse as φ approaches 1, since comparison probabilities get close to 1 2 whereas SEQ-ELIMINATE is not affected. 0.04 0.02 0.01 0.001 0.0001 104 Sample Complexity SEQ-ELIMINATE Mallows MPI (a) No Triangle Inequality 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 102 Sample Complexity SEQ-ELIMINATE Mallows MPI (b) Mallows Model Figure 3: Comparison of SEQ-ELIMINATE and MALLOWSMPI over Mallows Model One more experiment is presented in Appendix E. 7 Conclusion We extended the study of PAC maxing and ranking to general models which satisfy SST but not stochastic triangle inequality. For PAC maxing, we derived an algorithm with linear complexity. For PAC ranking, we showed a negative result that any algorithm needs (n2) comparisons. We thus showed that removal of stochastic triangle inequality constraint does not affect PAC maxing but affects PAC ranking. We also ran experiments over simulated data and showed that our PAC maximum selection algorithms are better than other maximum selection algorithms. For unconstrained models, we derived algorithms for PAC Borda maxing and PAC Borda ranking by making connections with traditional multi-armed bandit setting. Acknowledgments We thank NSF for supporting this work through grants CIF-1564355 and CIF1619448. [1] Ralf Herbrich, Tom Minka, and Thore Graepel. Trueskill: a bayesian skill rating system. In Proceedings of the 19th International Conference on Neural Information Processing Systems, pages 569 576. MIT Press, 2006. 1.1 [2] Jialei Wang, Nathan Srebro, and James Evans. Active collaborative permutation learning. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 502 511. ACM, 2014. 1.1 [3] Filip Radlinski and Thorsten Joachims. Active exploration for learning rankings from clickthrough data. In Proceedings of the 13th ACM SIGKDD, pages 570 579. ACM, 2007. 1.1 [4] Filip Radlinski, Madhu Kurup, and Thorsten Joachims. How does clickthrough data reflect retrieval quality? In Proceedings of the 17th ACM conference on Information and knowledge management, pages 43 52. ACM, 2008. 1.1 [5] http://www.gif.gf/. 1.1 [6] Uriel Feige, Prabhakar Raghavan, David Peleg, and Eli Upfal. Computing with noisy information. SIAM Journal on Computing, 23(5):1001 1018, 1994. 1.2 [7] Yisong Yue and Thorsten Joachims. Beat the mean bandit. In Proc. of the ICML, pages 241 248, 2011. [8] R obert Busa-Fekete, Bal azs Sz or enyi, and Eyke H ullermeier. Pac rank elicitation through adaptive sam- pling of stochastic pairwise preferences. In AAAI, 2014. 1.2 [9] Wikipedia. Nontransitive dice Wikipedia, the free encyclopedia. http://en.wikipedia.org/w/ index.php?title=Nontransitive%20dice&oldid=779713322, 2017. [Online; accessed 19-May2017]. 1.2 [10] Robin L Plackett. The analysis of permutations. Applied Statistics, pages 193 202, 1975. 1.2 [11] R Duncan Luce. Individual choice behavior: A theoretical analysis. Courier Corporation, 2005. 1.2 [12] Bal azs Sz or enyi, R obert Busa-Fekete, Adil Paul, and Eyke H ullermeier. Online rank elicitation for plackett-luce: A dueling bandits approach. In NIPS, pages 604 612, 2015. 1.2, 6 [13] R obert Busa-Fekete, Eyke H ullermeier, and Bal azs Sz or enyi. Preference-based rank elicitation using statistical models: The case of mallows. In Proc. of the ICML, pages 1071 1079, 2014. 1.2, 3.1, 6, 6 [14] Moein Falahatgar, Alon Orlitsky, Venkatadheeraj Pichapati, and Ananda Theertha Suresh. Maximum selection and ranking under noisy comparisons. In International Conference on Machine Learning, pages 1088 1096, 2017. 1.2, 4, 6, 6, A.1, D [15] Yuan Zhou, Xi Chen, and Jian Li. Optimal pac multiple arm identification with applications to crowd- sourcing. In International Conference on Machine Learning, pages 217 225, 2014. 1.2, 3.1 [16] Reinhard Heckel, Nihar B Shah, Kannan Ramchandran, and Martin J Wainwright. Active ranking from pairwise comparisons and when parametric assumptions don t help. ar Xiv preprint ar Xiv:1606.08842, 2016. 1.2, 6 [17] Tanguy Urvoy, Fabrice Clerot, Raphael F eraud, and Sami Naamane. Generic exploration and k-armed voting bandits. In Proc. of the ICML, pages 91 99, 2013. 1.2 [18] Kevin Jamieson, Sumeet Katariya, Atul Deshpande, and Robert Nowak. Sparse dueling bandits. In Artificial Intelligence and Statistics, pages 416 424, 2015. 1.2 [19] David Timothy Lee, Ashish Goel, Tanja Aitamurto, and Helene Landemore. Crowdsourcing for participa- tory democracies: Efficient elicitation of social choice functions. In Second AAAI Conference on Human Computation and Crowdsourcing, 2014. 1.2 [20] Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, and Ananda Theertha Suresh. Sorting with adversarial comparators and application to density estimation. In Information Theory (ISIT), 2014 IEEE International Symposium on, pages 1682 1686. IEEE, 2014. 1.2 [21] Jayadev Acharya, Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, and Ananda Theertha Suresh. Maximum selection and sorting with adversarial comparators and an application to density estimation. ar Xiv preprint ar Xiv:1606.02786, 2016. 1.2 [22] Mikl os Ajtai, Vitaly Feldman, Avinatan Hassidim, and Jelani Nelson. Sorting and selection with imprecise comparisons. ACM Transactions on Algorithms (TALG), 12(2):19, 2016. 1.2 [23] Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, and Ananda Theertha Suresh. Near-optimal-sample estimators for spherical gaussian mixtures. NIPS, 2014. 1.2 [24] Arun Rajkumar and Shivani Agarwal. A statistical convergence perspective of algorithms for rank aggre- gation from pairwise data. In Proc. of the ICML, pages 118 126, 2014. 1.2 [25] Sahand Negahban, Sewoong Oh, and Devavrat Shah. Iterative ranking from pair-wise comparisons. In NIPS, pages 2474 2482, 2012. 1.2 [26] Sahand Negahban, Sewoong Oh, and Devavrat Shah. Rank centrality: Ranking from pairwise compar- isons. Operations Research, 2016. 1.2 [27] Minje Jang, Sunghyun Kim, Changho Suh, and Sewoong Oh. Top-k ranking from pairwise comparisons: When spectral ranking is optimal. ar Xiv preprint ar Xiv:1603.04153, 2016. 1.2 [28] Yuan Zhou and Xi Chen. Optimal pac multiple arm identification with applications to crowdsourcing.