# Active Ranking and Matchmaking, with Perfect Matchings

Hafedh El Ferchichi 1 2, Matthieu Lerasle 1 2, Vianney Perchet 1 2 3

We address the challenge of actively ranking a set of items/players with varying values/strengths. The comparison outcomes are random, with greater noise the closer the values. A crucial requirement is that, at each iteration of the algorithm, all items must be compared once, i.e., an iteration is a perfect matching. Furthermore, we assume that comparing two players with closely matched strengths incurs no cost while, in contrast, a unit cost is associated with comparing players whose strength difference is more substantial. Our secondary objective is to determine an optimal matching between players with respect to this cost function: we propose and analyze an algorithm that draws on concepts from both AKS sorting networks and bandit theory. Our algorithm achieves both objectives with high probability, and the total cost is optimal (up to logarithmic terms).

1 CREST, ENSAE, IP Paris; 2 FAIRPLAY joint team; 3 Criteo AI Lab. Correspondence to: Hafedh El Ferchichi. Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

1. Introduction

Background and motivation: Sorting through pairwise comparisons is a foundational challenge in computer science, with significance extending to diverse applications (Bengs et al., 2021). However, in many practical scenarios, these comparisons are inherently random. Consider, for instance, the assessment or ranking of players in games or sports such as Chess, where Elo scores are employed (Elo, 1978). When two players or teams engage repeatedly, the outcome is not consistently identical; the closer their values or strengths, the closer the probability of winning is to 1/2. Consequently, the task of ranking items or players based on these noisy, random comparisons has become a critical objective (Minka et al., 2018; Bengs et al., 2021). Microsoft's TrueSkill software, for instance, is employed to match and rank thousands of Xbox gamers by leveraging their historical performance records, reflecting the outcomes of past matches (Herbrich et al., 2006; Minka et al., 2018).

A noteworthy challenge arises from the dual nature of ranking and proposing engaging matches; the two goals, though complementary, can be difficult to harmonize. On one hand, ranking requires exploring potential comparisons: repeatedly pairing the same two players, for instance, provides no insight into how they compare with the rest of the pool. On the other hand, this exploration may produce matches between players with substantially different values. While informative for ranking purposes, such lopsided comparisons may lack interest for the involved players, as the outcome tends to be predictable, with one player consistently prevailing. Recognizing this challenge, platforms aim to rank players while concurrently pairing individuals of relatively similar skills (Minka et al., 2018). These principles align with the broader framework of active ranking (Falahatgar et al., 2017b; Zoghi et al., 2017; Szörényi et al., 2015; Saha & Gopalan, 2019; Ren et al., 2019).
The majority of active ranking algorithms hinge on determining the minimum number of comparisons (sample complexity) required to produce the accurate ranking with a predetermined confidence level (Falahatgar et al., 2017b;a; 2018; Ren et al., 2019). However, a common trait of these algorithms is their tendency to leave certain players unpaired. Typically, the algorithm repeatedly compares players who are challenging to discriminate, with these individuals being selected many times (Ren et al., 2019). Conversely, players with significantly greater (or lesser) strength than the majority are often left unpaired after a few games. This characteristic proves undesirable in the context of video games, where a player left unattended for numerous rounds may disengage from the platform. To address this concern, we introduce an additional constraint on active ranking algorithms. Specifically, at each iteration (or round), these algorithms are required to pair all N players (equivalently, to perform N/2 comparisons in parallel). In essence, the algorithm, at each round, selects a perfect matching of players based on the outcomes of previous rounds, collects the results of all pairs, and proceeds to the next round.

Acknowledging that players generally prefer matchups against opponents of comparable skill levels, we assign a cost to each proposed pair, set to 0 if the difference in values between the players in the pair is small and to 1 if the difference is substantial. Consequently, the cost of a matching is the sum of the costs associated with its pairs.

Model and notations: We consider a scenario involving N items or players (for simplicity, we assume that N is even), each associated with an unknown rank denoted (r(1), r(2), . . . , r(N)). The primary objective is to discern this ranking through pairwise noisy comparisons. Specifically, when two items i and j are paired, the outcome of the comparison is "i beats j" with probability p(i, j) and "j beats i" with the complementary probability p(j, i) = 1 − p(i, j). For the sake of simplicity, we assume the absence of ties, although ties would not pose a significant challenge. An implicit, albeit crucial, assumption underlying this framework is the independence of the results of queried comparisons (Falahatgar et al., 2017a;b; Heckel et al., 2019; Saha & Gopalan, 2019).

To reconstruct the ranking from the comparison results, we introduce a structure on the matrix of the p(i, j). The initial assumption is that p(i, j) > 1/2 if i has a better rank than j, denoted r(i) < r(j). Consequently, the key parameters in noisy ranking are the ε(i, j) = p(i, j) − 1/2, the excess probability of i winning against j. Notably, ε(j, i) = −ε(i, j) and ε(i, j) ≥ 0 whenever r(i) < r(j). The magnitude |ε(i, j)| is interpreted as the skill gap between players i and j. We shall consider the following standard assumptions on the matrix of the ε(i, j) (Yue et al., 2012):

Assumption 1.1 (Strong Stochastic Transitivity, SST; Tversky & Russo, 1969). For any i, j, k ∈ [N], if r(i) < r(j) < r(k), then ε(i, k) ≥ max(ε(i, j), ε(j, k)).

Assumption 1.1 posits that if player i is stronger than j (ε(i, j) ≥ 0) and player j is stronger than k (ε(j, k) ≥ 0), then it is easier for player i to beat k than to beat j. This assumption is commonplace and satisfied in various models, with additional details provided in Appendix A.

Assumption 1.2 (Stochastic Triangle Inequality, STI). If r(i) < r(j) < r(k), then ε(i, k) ≤ ε(i, j) + ε(j, k).
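To make the comparison model and the two assumptions concrete, the following short Python sketch builds a small synthetic instance (using the Bradley-Terry-Luce form introduced in the example below; the strengths `theta` are illustrative assumptions), samples one noisy comparison, and numerically verifies SST and STI on every triple.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical BTL-style instance: strengths theta are decreasing,
# so the true ranking is the identity.
theta = np.array([2.0, 1.4, 1.1, 0.3, 0.0, -0.7])
N = len(theta)

# Skill-gap matrix: eps[i, j] = p(i, j) - 1/2 with p(i, j) = 1 / (1 + exp(theta_j - theta_i)).
eps = 1.0 / (1.0 + np.exp(theta[None, :] - theta[:, None])) - 0.5

def compare_once(i, j):
    """One noisy comparison: returns True iff i beats j."""
    return rng.random() < 0.5 + eps[i, j]

# Numerical check of SST and STI on every ordered triple r(i) < r(j) < r(k)
# (here the true ranking is the identity, so i < j < k).
for i, j, k in itertools.combinations(range(N), 3):
    assert eps[i, k] >= max(eps[i, j], eps[j, k]) - 1e-12   # SST
    assert eps[i, k] <= eps[i, j] + eps[j, k] + 1e-12       # STI
print("SST and STI hold on this instance; sample outcome of (0 vs 5):", compare_once(0, 5))
```

Any instance of the form ε(i, j) = F(θi − θj) with F nondecreasing, odd and concave on the positive half-line (as in the logistic case above) passes both checks.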
Assumption 1.2 reflects a local skill ordering: when players i and j, as well as players j and k, exhibit minimal differences in strength (i.e., 0 ≤ ε(i, j), ε(j, k) ≪ 1), player i should not be significantly stronger than player k. Further discussion of this assumption is provided in Appendix A.

To learn the true ranking r(·), the algorithm sequentially selects, at each round t, a perfect matching Mt, i.e., a partition of the player set into pairs. In other words, Mt = {{g_{t,1}, g_{t,2}}, {g_{t,3}, g_{t,4}}, . . . , {g_{t,N−1}, g_{t,N}}}, where {g_{t,1}, g_{t,2}, . . . , g_{t,N}} = {1, 2, . . . , N}. Henceforth, we refer to perfect matchings simply as "matchings".

Example (Bradley-Terry-Luce model): The Bradley-Terry-Luce model (BTL) (Bradley & Terry, 1952; Chetrite et al., 2017; Corff et al., 2018; Diel et al., 2020; Chen et al., 2022; Gao et al., 2023) serves as a well-established parametric model for defining p(i, j). In this model, each player possesses a corresponding strength denoted θi, and the probability p(i, j) is given by the formula exp(θi) / (exp(θi) + exp(θj)). It is straightforward to verify that BTL adheres to both Assumptions 1.1 and 1.2. More generally, the winning excess probabilities can be expressed as ε(i, j) = F(θi − θj), where F, referred to as the model function, is any nondecreasing Lipschitz function satisfying F(−x) = −F(x) and F(+∞) ≤ 1/2.

Matching cost: As previously mentioned, one of the motivating examples pertains to efficient matchmaking in video games (Minka et al., 2018; Herbrich et al., 2006). The objective is to devise a matchmaking system that selects a matching Mt at each round. For the sake of simplicity, we assume a fixed set of players, all eager to participate in each round against an opponent with a sufficiently similar skill level; otherwise, they express dissatisfaction (if the game is perceived as too easy or too hard). The platform's dual aim is to rank players while minimizing the number of dissatisfactions. To model this, we consider a fixed known threshold ε*, such that players i and j are content with their pairing if, and only if, |ε(i, j)| ≤ ε*. Specifically, the cost of a matching M is the sum of the costs associated with the generated pairings, i.e., C_M = Σ_{{i,j} ∈ M} 1{|ε(i, j)| > ε*}.

Problem statement and objectives: An algorithm, at each round t, selects a matching Mt and observes the outcomes of the induced random comparisons (and only those). The overarching objective is to sequentially and adaptively choose matchings M1, M2, . . . , Mt so that, after some (random) number of rounds T, the algorithm provides two key outputs, correct with probability at least 1 − δ, where δ ∈ (0, 1) is a predetermined confidence parameter. The first one is an estimated ranking r̂(·) and the second one a matching M̂ minimizing the matching cost. The algorithm's performance is evaluated based on either the sample complexity (the number of rounds T needed or, equivalently, the number of comparisons NT/2) or, alternatively, the cumulative regret, defined as

Σ_{t=1}^{T} C_{M_t} − T·C*,   (1)

where C* = min_M C_M, the minimum being taken over all perfect matchings. It is important to note that T is a random stopping time determined by the algorithm, not a fixed budget of comparisons.

Main results: Our main contributions are threefold. The first contribution is an algorithm that outputs a weakened version r̂ of the ranking, termed an (ε, δ)-PAC ranking. In essence, any two items such that ε(i, j) > ε satisfy r̂(i) < r̂(j), in which case r̂ is said to be ε-correct; this holds with high probability, i.e., at least 1 − δ.
The sample complexity (number of comparisons) is O(N ε^{-2} log³(N) log(N/δ)), and the time complexity (number of rounds) is O(ε^{-2} log³(N) log(N/δ)). The second contribution utilizes the aforementioned algorithm to obtain the exact ranking in O(log³(N) log(N/δ) / min_i Δ_i²) rounds, where Δ_i = ε(σ(i), σ(i+1)) and σ(i) denotes the i-th ranked player, i.e., σ = r^{-1}. The third contribution establishes that the latter algorithm can be employed to learn an optimal matching, with an additional regret term (hiding log log terms) of order O(Σ_i log(N/δ) / Δ_{ε*,i}²), where Δ_{ε*,i} = Δ_i − ε*.

1.1. Related works

Active ranking, exact and PAC: The first related algorithm (Feige et al., 1994) is tailored to the case of N totally ordered elements, where comparisons between any two of them have the same known probability of error α < 1/2. Its sample complexity is of order O(N log(N/δ) / (1/2 − α)²) to retrieve an exact ranking with probability greater than 1 − δ. A major limitation of this algorithm lies in the assumption of a uniform probability of error on any comparison, known and bounded away from 1/2. In many practical scenarios, comparisons are prone to errors with different probabilities, which can be arbitrarily close to 1/2, and their values are unknown. These limitations were partially mitigated in (Falahatgar et al., 2017b;a; Saha & Gopalan, 2019; Ren et al., 2018), which developed algorithms that find an (ε, δ)-PAC ranking in O(N log(N/δ) / ε²) comparisons, achieving optimal sample complexity (Falahatgar et al., 2017a). Subsequent improvements (Ren et al., 2019; Saha & Gopalan, 2019; 2020) led to reduced sample complexities leveraging individual skill gaps, ultimately reaching a sample complexity of O(Σ_i (log log(1/Δ_i) + log(N/δ)) / Δ_i²). Unfortunately, these algorithms are sequential in nature, querying comparisons one by one without parallelization or matching constraints. Specifically, they construct a tree-like structure sequentially exploited for efficient ranking, either with Binary-Insertion Sort (Ren et al., 2019) or Quick-Sort (Szörényi et al., 2015). In both cases, a form of congestion arises at the root of the tree, as every unsorted player must play a substantial number of times against the root's player. This is impractical in real-world applications such as the video-game example, as it leads to significant waiting times before getting paired. Our algorithm addresses the problem of (ε, δ)-PAC ranking in the fully parallelized case, with a sample complexity and a time complexity within a log³(N) factor of the corresponding lower bounds (obtained in the uniform-error case). The SST and STI assumptions we consider are relatively mild and commonly encountered (Falahatgar et al., 2017b;a).

Combinatorial bandits: The matching problem can be reframed as a specific instance of combinatorial semi-bandits (Cesa-Bianchi & Lugosi, 2012), a domain that has recently garnered significant attention and witnessed improvements in regret minimization across multiple settings (Wang & Chen, 2018; Giraud et al., 2019; Perrault et al., 2019; 2020a;b; Sentenac et al., 2021; Merlis & Mannor, 2021; Hou et al., 2023). The combinatorial structure is evident, as the set of all matchings on a graph with N vertices has a cardinality of N! / (2^{N/2} (N/2)!) = Ω((N/e)^{N/2}); the short computation below illustrates this growth. The main difference, though, is that our cost function is quite different. Nonetheless, the standard algorithms and arguments still apply (i.e., computing the number of times each action must be sampled would follow the same line of proof), although computing the regret requires different but straightforward computations.
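To illustrate the size of the action set mentioned above, the following standalone sketch evaluates the number of perfect matchings of the complete graph on N vertices via the closed form N!/(2^{N/2}(N/2)!) and compares it to the (N/e)^{N/2} scale.

```python
import math

def num_perfect_matchings(n: int) -> int:
    """Number of perfect matchings of the complete graph K_n for even n,
    i.e. n! / (2^(n/2) * (n/2)!) = (n-1)!!."""
    assert n % 2 == 0
    return math.factorial(n) // (2 ** (n // 2) * math.factorial(n // 2))

for n in (4, 8, 16, 32):
    exact = num_perfect_matchings(n)
    approx = (n / math.e) ** (n / 2)   # the Omega((N/e)^(N/2)) growth rate quoted above
    print(f"N={n:3d}: {exact} matchings   (N/e)^(N/2) ≈ {approx:.3e}")
```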
In the ranking from matching problem, combinatorial bandit algorithms would incur a regret scaling, discarding log(N) terms, as O N2 log(1/δ) , where min denotes the expected gap between an optimal matching and the best suboptimal matching (Merlis & Mannor, 2020; Kveton et al., 2015). We refer to Section 2.1 and Appendix E for more details. In contrast, our algorithm incurs a regret scaling as O N log(1/δ) , where 0 := mini min{ i, ε ,i} is another problem parameter that is typically much bigger than min. We refer to Section 2.1 for such examples. Parallel Sorting algorithms: Parallel comparison-based sorting algorithms, suitable for parallel computing, have been explored early on (Knuth et al., 1973; Batcher, 1968; Valiant, 1975). A specific class of parallel sorting algorithms is the sorting networks (Knuth et al., 1973; Ajtai et al., 1983), with the property that no element is involved in multiple comparisons at the same round. This property is crucial for relevance, as players to be ranked can only engage in one game at a time. For a comprehensive introduction to sorting networks, refer to (Knuth et al., 1973). Various methods exist for constructing sorting networks, such as Batcher sorting networks (Batcher, 1968). The sample complexity of Batcher sorting networks is of order O N log2(N) , implying a time complexity of at least O log2(N) , as there are at most N/2 comparisons simultaneously. Another notable sorting network is the AKSPaterson sorting network (Ajtai et al., 1983; Paterson, 1990), with a sample complexity of O (N log(N)) and a time complexity of O (log N). However, it comes with a hidden constant, denoted as D, which might be large (Natvig, 1990; Active Ranking and Matchmaking, with Perfect Matchings Seiferas, 2009). Since the AKS sorting network has a time complexity of O(log N), it enables the ranking of N elements using only O(log N) matchings. We shall leverage the AKS-Paterson algorithm as a blackbox to construct a selection scheme for sequentially choosing matchings and finding an (ε, δ) PAC ranking. 1.2. Organization of the paper Section 2 is dedicated to the first objective: recovering the exact ranking. The necessity of this step is illustrated in Section 2.1, highlighting its role in obtaining an optimal matching. Subsequently, we delve into the description of the ranking algorithm in Section 2.2, where we first outline the relevant properties of AKS-Paterson (Ajtai et al., 1983; Paterson, 1990) for this problem. We then explain how to employ AKS-Paterson (or any other sorting network with O(log N) time complexity) to obtain an (ε, δ)-PAC ranking, and then how to reach an exact ranking by refining ε further and further. In Section 3, we demonstrate how to leverage an exact ranking to output the optimal matching with a small (almost minimax optimal) regret. More general comments, proofs and most pseudocodes are postponed to the Appendix, due to space limit. In this section, we start by giving the general scheme to obtain an (ε, δ)-PAC ranking by carefully adapting the classical AKS-Paterson sorting network. Next, we show how to obtain the exact ranking using additional techniques. We give a general bound for the sample complexity (number of comparisons used to retrieve an (ε, δ)-PAC ranking and to retrieve the exact ranking) that depends on the instance. 2.1. 
Exact ranking is needed for optimal matching

As the cost function satisfies c(i, j) = 0 if |ε(i, j)| ≤ ε*, it might give the impression that the optimal matching could be recovered from any (ε*/2)-correct ranking (or, more generally, from some Ω(ε*)-correct ranking). Unfortunately, this is not the case, as proved by the following instances, illustrated in Figures 1 and 2. This proves that the exact ranking is (sometimes) necessary; of course, this is not always the case (for instance, if |ε(i, j)| < ε* for all pairs (i, j)).

Given some arbitrarily small parameter η > 0, the first instance is described by ε(1, i) = ε* + η/2 for any i > 2 and ε(1, 2) = ε* − η/2, the other gaps all being equal to η, i.e., ε(i, j) = η for all j > i > 1. In this instance, the optimal matching has a cost of 0: it matches player 1 to player 2. As a consequence, it requires ranking all N − 1 weakest players (or at least finding the exact best one among them) to pair 1 with 2. Notice that ranking the weakest N − 1 players requires an η-correct ranking.

Figure 1. In this instance, finding the second-ranked item is necessary to compute the optimal matching, irrespective of η.

Figure 2. In this instance, checking that a matching is suboptimal (without using the ranking) can be arbitrarily complex for small η.

We also illustrate the complexity of the problem with another simple instance with only 4 vertices (players are ranked by their index, i.e., r(i) = i). The skill gaps are defined, for η ≤ Δ ≤ ε*, as ε(1, 2) = ε(3, 4) = ε* − Δ, ε(2, 3) = Δ + η, and ε(1, 3) = ε(1, 4) = ε(2, 4) = ε* + η. The optimal matching is {1-2, 3-4}, with a cost of 0. It is not difficult to show that ranking the items and then finding this matching has a global cost (neglecting all log terms) of O(1/Δ²). On the other hand, if the ranking is ignored, detecting that the matching {2-3, 1-4} is suboptimal has a cost of Ω(1/η²), which can be arbitrarily larger. A reason behind this result is that the cost of a matching is not monotonous in the cumulative skill gap of its comparisons: in this example, the optimal matching has a cumulative skill gap of 2(ε* − Δ) ≈ 2ε*, while the suboptimal one has a cumulative skill gap of ε* + Δ + 2η ≈ ε*, hence twice smaller. This also explains why naive combinatorial bandit algorithms would perform poorly.

Once again, we emphasize that it is not always the case that the exact ranking must be known before finding an optimal matching. There are many instances of the problem with many different optimal matchings. The trivial example is when all the comparisons are costless, i.e., |ε(i, j)| < ε* for all i, j ∈ [N]. In that case, the true ranking is irrelevant. In Appendix D we also show that, if all ε(σ(i), σ(i + 1)) ≤ ε*/2, then any (ε*/2)-correct ranking can be used to build an optimal matching. The interested reader can check that our algorithms for building optimal matchings can be adapted to avoid looking for the exact ranking in these favorable situations.

2.2. PAC ranking

For a detailed presentation of the AKS-Paterson sorting network, we refer to (Chvatal; Paterson, 1990). The main property of this sorting algorithm is that it has a time complexity of O(log N): with noiseless comparisons, it outputs the sorted input list using only O(log N) matchings. To handle noise, we will use the subroutine COMPARE as a substitute for the deterministic comparison procedure; a schematic combination of a sorting network with such a noisy comparator is sketched below.
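The following self-contained sketch illustrates this combination: a sorting network whose layers consist of disjoint comparisons (each layer is a partial matching, to be completed arbitrarily into a perfect matching in the setting of this paper), driven by a noisy comparator. Batcher's odd-even mergesort is used purely as an illustrative stand-in for AKS-Paterson (its depth is O(log² N) rather than O(log N)), and the plain majority vote is only a placeholder for the actual COMPARE routine described next; the instance, strengths and budget are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def batcher_comparators(n):
    """Comparator pairs (i, j), i < j, of Batcher's odd-even mergesort (n a power of two)."""
    def merge(lo, hi, r):
        step = 2 * r
        if step < hi - lo:
            yield from merge(lo, hi, step)
            yield from merge(lo + r, hi, step)
            yield from ((i, i + r) for i in range(lo + r, hi - r, step))
        else:
            yield (lo, lo + r)
    def sort(lo, hi):
        if hi - lo >= 1:
            mid = lo + (hi - lo) // 2
            yield from sort(lo, mid)
            yield from sort(mid + 1, hi)
            yield from merge(lo, hi, 1)
    return list(sort(0, n - 1))

def layers(comparators):
    """Greedily group comparators into rounds in which no position appears twice,
    so each round is a partial matching of the positions."""
    rounds, used = [], []
    for (i, j) in comparators:
        if not rounds or i in used or j in used:
            rounds.append([]); used = []
        rounds[-1].append((i, j)); used += [i, j]
    return rounds

def noisy_compare(a, b, eps_matrix, budget):
    """Majority vote over `budget` noisy games: a placeholder for COMPARE."""
    wins = sum(rng.random() < 0.5 + eps_matrix[a, b] for _ in range(budget))
    return a if wins >= budget / 2 else b

N = 16
theta = np.sort(rng.normal(size=N))[::-1]                      # strengths, best first
eps = 1 / (1 + np.exp(theta[None, :] - theta[:, None])) - 0.5  # BTL-style gap matrix
positions = list(rng.permutation(N))                           # positions hold player ids
for rnd in layers(batcher_comparators(N)):                     # one round = disjoint pairs
    for (lo_pos, hi_pos) in rnd:
        a, b = positions[lo_pos], positions[hi_pos]
        winner = noisy_compare(a, b, eps, budget=25)
        # The estimated stronger player is placed at the lower position,
        # so the final order is best-first.
        positions[lo_pos], positions[hi_pos] = winner, (b if winner == a else a)
print("estimated order (best first):", positions)
```

With a sufficient per-pair budget, the output is correct up to swaps of players whose gap is below the resolution of the noisy comparator, which is exactly the (ε, δ)-PAC guarantee targeted below.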
COMPARE, whose simple pseudocode is postponed to Appendix F.1, takes as input two players i and j and two parameters ε and δ, used as a quality requirement for the performed comparison. Precisely, COMPARE returns the stronger player, together with the flag "Confident", with probability at least 1 − δ whenever the gap |ε(i, j)| is larger than the input threshold ε. Lemma 2.1 below shows that COMPARE indeed behaves as described.

Lemma 2.1 (Theoretical performance of COMPARE). COMPARE terminates after b = O(ε^{-2} log(1/δ)) comparisons and, if ε ≤ |ε(i, j)|, it returns the strongest player and the flag "Confident" with probability at least 1 − δ. Conversely, if the flag "Confident" is returned, then |ε(i, j)| > ε with probability at least 1 − δ. In the other case, when ε ≥ |ε(i, j)|, COMPARE returns the best player with probability greater than 1/2.

Lemma 2.2 below is an implication of Lemma 2.1. The proof is relegated to Appendix C.1. Note that the constant D present in the following statement is independent of N and is a characteristic of the AKS-Paterson algorithm (the constant in its O(log N) time complexity, see Section 1.1).

Lemma 2.2. Using COMPARE as the comparison procedure in the AKS-Paterson algorithm, with parameters ε/(2D² log(N)) and δ' = δ/(N D log N), outputs a ranking that is (ε, δ)-PAC in at most O(ε^{-2} N log³(N) log(N/δ)) comparisons and in O(ε^{-2} log³(N) log(N/δ)) rounds. This algorithm will be referred to as AKS(ε, δ).

The sample complexity for retrieving an (ε, δ)-PAC ranking is bounded from below by Ω(N ε^{-2} log(N/δ)) (Ren et al., 2018). Thus, using the AKS-Paterson algorithm with COMPARE as a comparison procedure is at most a log³(N) factor shy of the lower bound. Conversely, our method yields a time complexity of O(ε^{-2} log³(N) log(N/δ)), which is much faster than the state of the art (Falahatgar et al., 2017b; Ren et al., 2019), which requires a time complexity of O(N ε^{-2} log(N/δ)).

2.3. Exact ranking

In this section, we leverage the confidence statement of COMPARE to design Algorithm CASCADINGAKS, which retrieves an exact ranking with high probability (stated otherwise, a (0, δ)-PAC ranking algorithm). The main intuition is to work by phases s = 1, 2, . . ., where each phase ends with an εs-correct estimated ranking, for εs = 1/2^s. More specifically, a phase ends with the construction of a clustering of similarly skilled players (two players in different clusters are correctly ranked with high enough probability), and the next phase refines this clustering, until each cluster is a singleton. The parameters are εs = 1/2^s, with the associated confidence δs = 6δ/(π²s²). We shall denote by r̂s the estimated ranking obtained at the end of phase s, which is εs-correct with probability at least 1 − Σ_{l=1}^{s} δl.

2.3.1. PRELIMINARY LEMMAS

This subsection is devoted to simple, easy-to-state and easy-to-prove lemmas that will be useful in the construction and the refinement of the clustering. Lemma 2.3 explains how comparisons flagged as "Confident" allow this confidence to propagate across the ranking.

Lemma 2.3. Let i, j ∈ [N] such that r(i) < r(j) and ε(i, j) > ε. Let r̂ be an ε-correct ranking. Then for all k ∈ [N] such that r̂(j) < r̂(k), it holds that r(i) < r(k).

Proof. Suppose, for the sake of contradiction, that there is an element k such that r(j) < r(k) and r̂(k) < r̂(i). Then, since r(i) < r(j), Assumption 1.1 implies that ε(i, k) ≥ ε(i, j) > ε. This contradicts the fact that r̂ is ε-correct.

Lemma 2.4 is a direct consequence of Lemma 2.3; it provides a simple way to implement a divide-and-conquer strategy for ranking.

Lemma 2.4.
Let j1, j2, j3 [N] such that r(j1) < r(j2) < r(j3) and ε(j1, j2) > ε, ε(j2, j3) > ε. Let ˆr be an ε-correct ranking. Then for all i, k [N] such that ˆr(i) < ˆr(j1) and ˆr(j3) < ˆr(k), it holds that r(i) < r(k). Proof. Applying Lemma 2.3 on (j1, j2) and (j2, j3) gives the following: for all i, k [N] such that ˆr(i) < ˆr(j1) and ˆr(j3) < ˆr(k), it holds that r(i) < r(j2) < r(k). In the situation described in Lemma 2.4, two elements u and v satisfying ˆr(u) < ˆr(i) and ˆr(k) < ˆr(v) do not need to be compared. This is the purpose of flagging comparisons as confident or not, and is a key property in the construction of the clustering detailed in the following section. 2.3.2. CONSTRUCTION OF A CLUSTERING In this subsection, we illustrate how to cluster a set of players [N]. We call the sub-algorithm performing this task ANCHORING and its pseudo-code is given in Section F.2. We Active Ranking and Matchmaking, with Perfect Matchings Figure 3. Illustration of comparisons involving ˆσ(i) (Edges) queried by Anchoring. shall assume that there was no cluster created yet (this happens if the skill gaps are all very small). In the subsequent phases of the main algorithm, this clustering procedure will be performed independently on each cluster created (as in standard hierarchical clustering). The clustering relies on anchor points , A1, A2, . . . , AK (chosen data-adaptively for some integer K N and of increasing rank in ˆrs), such that COMPARE(Ak, Aℓ, εs, δs+1) is confident, and ˆrs(Ak) ˆrs(Aℓ) is not too large, of order O(max{|Nεs(Ak)|, |Nεs(Aℓ)|)}, where Nε(i) = {j [N], |ε(i, j)| ε} is the ε-neighborhood of i. The clustering generated by anchor points is defined by Ck = n i [N], ˆrs(Ak 1) < ˆrs(i) < ˆrs(Ak) o , with the convention that ˆrs(A0) = 0 and ˆrs(AK+1) = N + 1. The anchor points Ak are added to either Ck or Ck+1 depending on their parity, see Algorithms 5 and 6 for details. A key quantity of a clustering is its maximal weight, defined as Wˆr(A1, . . . , AK) = maxk [K+1] |Ck|. To find anchor points, we introduce the following graph Gˆrs, whose vertices set is [N] and {i, j} is an edge of Gˆrs if and only if |ˆrs(i) ˆrs(j)| = 2m for any integer m N. Algorithm 3 samples edges of Gˆrs (following COMPARE procedure) and identify, for each player i, two different elements ms(i) and Ms(i) such that m(i) is confidently weaker than i and M(i) is confidently stronger than i. Lemma 2.5 shows that |ˆrs(i) ˆrs(ms(i))| = O(max {|N2εs(i)|, |N2εs(ms(i))|} and |ˆrs(i) ˆrs(Ms(i))| = O(max {|N2εs(i)|, |N2εs(Ms(i))|}. We illustrate Algorithm 3 behavior in Figure 3. The objective is to identify elements confidently ranked below/above ˆσ(i). To achieve this, it is compulsory to seek items, in red, not in Nε(ˆσ(i)), whose elements are in white. Since ˆr(i) is an approximate ranking, red and white items can be intertwined, thus to identify elements confidently ranked below/above ˆσ(i), one must seek far from ˆσ(i). Lemma 2.5. Let ˆr be an ε-correct ranking. Then Anchoring(N, ˆr, ε, δ/(N log N)) has a time complexity of order O log(N/δ) ε2 maxi [N] log (|N2ε(i)|) . It returns a set of anchor points A1, . . . , AK such that Figure 4. Illustration of the behavior CASCADINGAKS. i is confidently between A2j 1 and A2j+2, but its rank with respect to A2j and A2j+1 is uncertain. Wˆr(A1, . . . , AK) = O(maxi [N] |N2ε(i)|). The proof of this lemma is relegated to Appendix C.2. 2.3.3. 
FROM (εt, δ)-PAC TO (εt+1, δ)-PAC RANKING In this section, we describe how the Algorithm CASCADINGAKS builds an εs-correct ranking from a εs-correct one. The main idea is relatively natural: to improve the estimated ranking, it applies AKS(εs+1, δs+1) to the different elements of a partition induced by the clusters (but twice, and not only once). At a high level, the first iteration of MULTIAKS produces a ranking that is locally εs+1-correct (on the red cells U1, U2, ... in Figure 4), but not globally, as misrankings between elements of Uj and Uj 1 or Uj+1 could still exist but not with cells further away: through a second application of MULTIAKS on the blue cells U 1, U 2, ... Figure 4, the mis-rankings left are eliminated. It thus produces an εs+1-correct ranking ˆrs. The pseudocode is given in Algorithm 1. For the sake of readability, CASCADINGAKS has been separated into subprotocoles, detailed in Appendix F.3. Figure 4 illustrates the behavior of CASCADINGAKS. Lemma 2.6. The time complexity of transitioning from stage s to stage s + 1 of Algorithm 1 is O 1 ε2 s+1 log N log3 max i [N] N2εs+1(i) . (2) Furthermore, the ranking obtained at the end of stage s is εs+1-correct with probability at least 1 Ps+1 l=1 δl. The instance-dependent bound on the sample complexity depends on the size of the skill gaps between a player and those ranked immediately above/below him, as shown by the following result. Theorem 2.7. Each player i is involved in at most O log3 N min{ 2 i , 2 i 1} δ + log log 1 min{ i, i 1} Active Ranking and Matchmaking, with Perfect Matchings Algorithm 1 CASCADINGAKS(δ) Input: items {1, . . . , N}, confidence level δ > 0; Initialize: ˆr0 (a random permutation [N]), s = 1 1: while L = [1, . . . , 1] do 2: εs (1/2)s; 3: (A1, . . . , AK) ANCHORING(N, ˆrs 1, εs, δs/3); 4: ℓ= K+1 2 ; 5: C1, . . . , Ck+1 CLUSTERING((A1, .., AK+1), ˆrs); 6: (U1, .., Uℓ) GRP1((A1, .., AK), (C1, .., CK+1)); 7: ˆr = MULTIAKS((U1, . . . , Uℓ), εs, δs/6); 8: C1, . . . , CK+1 CLUSTERING((A1, . . . , AK), ˆr ); 9: (U 1, . . . , U ℓ) GRP2((A1, . . . , AK), (C 1, . . . , C K+1)); 10: ˆrs = MULTIAKS((U 1, . . . , U ℓ), εs, δs/6) 11: L = CHECK(εs, δs/3)(ˆrs); 12: for i [N] do 13: if i is correctly ranked (L[i] = 1) then 14: i can be moved to the gap estimation phase; 15: end if 16: end for 17: s s + 1; 18: end while 19: return ˆrs comparisons before its rank can be confidently identified. For the proof of Theorem 2.7, see Appendix C.4. Depending on min{ i, i 1}, the rank of player i can be confidently determined sooner than the exact ranking (Algorithm 1, line 14). Thus CASCADINGAKS can move players easily ranked (large min{ i, i 1}) to the gap estimation phase before exact ranking is achieved. Precisely, as soon as ˆr(i 1), ˆr(i), ˆr(i+1) are confidently ranked, GAPESTIMATION proceeds to estimate the gaps between ˆr(i 1), ˆr(i) and between ˆr(i), ˆr(i + 1). This is relevant in practice, but there are instances where this does not happen (see the discussion in Section 2.1). This motivates the following corollary whose proof is postponed to Appendix C.4. Corollary 2.8. Algorithm 1 has a time complexity of log3 N mini [N] 2 i δ ) + log log 1 mini [N] 2 i (4) It returns the true ranking with probability at least 1 δ. 3. Gap estimation and Regret In this section, we assume that Algorithm 1 has output an exact ranking, hence the remaining objective is to identify an optimal matching. Without loss of generality and for the sake of notations, we can assume that the true ranking is the identity, r(i) = i. 
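To make the target of this phase concrete, the following sketch assumes (for illustration only) that the gaps are known: it forms a matching greedily from left to right along the ranking, pairing adjacent ranks whenever their gap is at most ε*, as in Lemma 3.1 below, and checks by brute force over all perfect matchings that the resulting cost is optimal. The instance and threshold are illustrative assumptions.

```python
import itertools
import numpy as np

# Illustrative instance: true ranking is the identity, gaps from a BTL-type model.
theta = np.array([1.6, 1.5, 0.8, 0.75, 0.1, -0.9])   # strengths, best first
N, eps_star = len(theta), 0.1
eps = 1 / (1 + np.exp(theta[None, :] - theta[:, None])) - 0.5

def cost(matching):
    """Cost of a perfect matching: number of pairs with skill gap above eps_star."""
    return sum(abs(eps[i, j]) > eps_star for (i, j) in matching)

def all_perfect_matchings(players):
    """Brute-force enumeration (fine for small N)."""
    if not players:
        yield []
        return
    first, rest = players[0], players[1:]
    for k, partner in enumerate(rest):
        for sub in all_perfect_matchings(rest[:k] + rest[k + 1:]):
            yield [(first, partner)] + sub

# Greedy scheme of Lemma 3.1: scan ranks left to right, pair i with i+1 whenever
# the adjacent gap is small enough; leftover players are paired arbitrarily.
greedy, leftovers, i = [], [], 0
while i < N:
    if i + 1 < N and eps[i, i + 1] <= eps_star:
        greedy.append((i, i + 1)); i += 2
    else:
        leftovers.append(i); i += 1
greedy += list(zip(leftovers[0::2], leftovers[1::2]))

best = min(cost(m) for m in all_perfect_matchings(list(range(N))))
print("greedy cost:", cost(greedy), " optimal cost:", best)
assert cost(greedy) == best
```

The brute-force check is only for illustration; the point of this section is to reach the same matching without knowing the gaps, paying only for the comparisons needed to classify each adjacent gap as above or below ε*.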
Lemma 3.1 shows that, in the optimal matching, all pairs with a cost of zero are between two players of adjacent ranks i and j = i + 1. This implies that it only remains to detect which of the skill gaps ε(i, i + 1) are above or below ε*.

Lemma 3.1. Let V = {1, . . . , N} and let E be a set of edges such that the graph G = (V, E) verifies: if {i, j} ∈ E, then, for every k and l such that i ≤ k, l ≤ j, {k, l} ∈ E. Then the matching obtained by traversing the graph from 1 to N, matching i to i + 1 whenever possible, is a matching of maximal size.

This lemma will be applied to the ε*-adjacency graph Gε* = ([N], Eε*) defined by: the edge {i, j} belongs to Eε* if and only if ε(i, j) ≤ ε*. The maximal matching of this graph is an optimal matching with respect to the cost function. The fact that Gε* satisfies the condition of Lemma 3.1 is a direct consequence of Assumption 1.1.

The pseudocode of Algorithm GAPESTIMATION, which learns the ε*-adjacency graph, is given in Appendix F.5, as it is quite similar to standard multi-armed bandit techniques (more precisely, to best-arm identification). The main idea is to start from the graph G0 = ([N], E0) with E0 = {{i, i + 1}, i ∈ [N − 1]} and to sequentially remove edges once the algorithm detects that they do not belong to Gε* with high probability. This creates a nested sequence of graphs Gt = ([N], Et), i.e., such that Eε* ⊆ Et+1 ⊆ Et, with high probability. The algorithm, using the subprotocol SAMPLEMATCHING, whose pseudocode is postponed to Appendix F.4, chooses at stage t a maximal matching of Gt (completed arbitrarily into a perfect matching of [N]).

The main intuition is the following. Each time the selected matching contains an edge {i, i + 1} ∈ Et that does not belong to Eε*, the regret might increase by one, but this can only happen finitely many times. Indeed, after some time, the algorithm will eventually learn that {i, i + 1} ∉ Eε*. It only remains to control when this happens, using a variant of LIL-UCB (Jamieson et al., 2014). An edge {i, i + 1} is removed from Et as soon as

p̂_{i,i+1}(t) ≥ 1/2 + ε* + u(T_{i,i+1}(t), δ),

where T_{i,i+1}(t) = Σ_{s=1}^{t} 1{{i, i + 1} was sampled at step s} · 1{{i, i + 1} ∈ Es} is the number of times the selected matching contained the pair {i, i + 1} while it (erroneously) belonged to Es, p̂_{i,i+1}(t) is the empirical frequency of wins of player i against player i + 1, and u(n, δ) = 2 √( log(2n log(n)/δ) / n ). This algorithm yields the guarantees given in the following Lemma 3.2.

Lemma 3.2. With probability at least 1 − δ, for any i ∈ [N] such that Δ_{ε*,i} := Δ_i − ε* > 0, it holds that

sup_t T_{i,i+1}(t) ≤ (32 / Δ_{ε*,i}²) · log( (2N/δ) · log(32 / Δ_{ε*,i}²) ).

The lemma is a consequence of the law of the iterated logarithm, see (Jamieson et al., 2014), and its proof is somewhat standard in the multi-armed bandit literature, hence omitted. It yields the following regret bound.

Corollary 3.3. The regret of learning an optimal matching is upper-bounded by

Σ_{i ∈ [N−1]: ε(i,i+1) > ε*} (1 / Δ_{ε*,i}²) · log( (N/δ) · log(1 / Δ_{ε*,i}²) ).

Proof. Lemma 3.2 provides an upper bound on the number of comparisons between i and i + 1 required to learn that Δ_i > ε*, i.e., that this comparison is costly. Let us assume that those comparisons are indeed costly (otherwise, there is no issue) and that the algorithm keeps sampling the pair until learning it. Since the graph Et contains Eε*, the estimated cost of the matching chosen by SAMPLEMATCHING is smaller than the cost of the optimal matching. Consequently, any costly edge willingly put in the matching (line 6 of SAMPLEMATCHING) corresponds to one costly edge of the optimal matching.
Hence the total regret is the number of times a costly edge is sampled while belonging to the current edge set Et. 3.1. Wrapping up and Minimax Optimality By combining Theorem 2.7 and Corollary 3.3, we get the following Theorem 3.4. It is minimax optimal (up to polylog term) in the sense that, for any given values of i and ε ,i, there exists an instance (satisfying both assumptions 1.1 and 1.2) where this bound cannot be improved (up to the log3 term). We however acknowledge that the regret can be, in some instances, lower than the above bound and refer to Section D). Theorem 3.4. The regret of learning an optimal matching without knowing the ranking is upper bounded by i [N 1]: i>ε 1 2 ε ,i log δ log( 1 2 ε ,i ) 1 2 i log(N 2 i ) . (7) The optimality proof relies on the following three 4-vertices instances, where η1 < η2 ε , illustrated in Figure 5: Instance A: ε(1, 2) = ε(3, 4) = ε + η1 and ε(2, 3) = η2 Instance B: ε(1, 2) = ε(3, 4) = ε η1 and ε(2, 3) = η2 Instance C: ε(1, 3) = ε(2, 4) = ε η1 and ε(3, 2) = η2. and the other ε(i, j) are all strictly bigger than ε . Straightforward computations show that the optimal matching in Instance A is{2 3, 1 4} with a cost 1. In Instance B (resp. C), on the other hand, the optimal matching is {1 2, 4 3} (resp. {1 3, 4 2}), for a cost of 0. In all instances, any other matching has an additional cost of at least 1. Notice that in instances B and C, computing the optimal matching requires the true ranking (no matter η). Standard online learning arguments (Kaufmann et al., 2016; Bubeck et al., 2013) yield that in order to distinguish between Instances A and B/C, one must sample an edge {1, 2} or {3, 4} Ω( log(1/δ) η2 1 ) times. And the ranking of 2 or 3, to distinguish between instances B and C, requires Ω( log(1/δ) η2 2 ) samples of the edge {2, 3}. An interesting feature of this example is that it illustrates that even though matching players 2 to 3 has no direct cost (as the skill gap is very small), this pairing induces some indirect but unavoidable loss to other players, that are matched in unbalanced games. As a consequence, distinguishing between A and B/C incurs a cost of Ω( log(1/δ) η2 1 ) in Instance A, while distinguishing between B and C incurs a cost of Ω( log(1/δ) η2 2 ) in both Instances B and C. In particular, if the choice of the instance is made uniform at random, the expected cost is of order Ω log 1/δ η2 1 + log 1/δ . Duplicating this 4-vertices example, independently N/4 times, gives a regret that has to scale as R(δ) Ω N log N/δ η2 1 + N log N/δ i [N 1],ε(i,i+1)>ε 1 2 ε ,i log N 1 2 i log N This construction can be generalized to any values of ε ,i and i, by adapting values of η1 and η2 per 4instances. 4. Conclusion and future work We designed an algorithm with time complexity optimal for ranking and optimal matching, but up to log3 N. A natural research direction would be to close this gap. Moreover, these algorithms are based on the AKS sorting network and suffer from its drawback, a large constant hidden in the O. Avoiding AKS network, with the same result, is another exciting challenge. Impact Statement This paper presents work whose goal is to advance the field of Machine Learning and Sorting. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here. Active Ranking and Matchmaking, with Perfect Matchings Figure 5. The 3 different instances and their optimal matching Acknowledgements Hafedh El Ferchichi is partially supported by Hi! Paris. 
Vianney Perchet acknowledges support from the French National Research Agency (ANR) under grant number ANR19-CE23-0026 as well as the support grant, as well as from the grant Investissements d Avenir (Lab Ex Ecodec/ANR11-LABX-0047). Ajtai, M., Koml os, J., and Szemer edi, E. An 0 (n log n) sorting network. In Proceedings of the fifteenth annual ACM symposium on Theory of computing, pp. 1 9, 1983. Batcher, K. E. Sorting networks and their applications. In Proceedings of the April 30 May 2, 1968, spring joint computer conference, pp. 307 314, 1968. Bengs, V., Busa-Fekete, R., El Mesaoudi-Paul, A., and H ullermeier, E. Preference-based online learning with dueling bandits: A survey. The Journal of Machine Learning Research, 22(1):278 385, 2021. Bradley, R. A. and Terry, M. E. Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika, 39(3/4):324 345, 1952. Bubeck, S., Perchet, V., and Rigollet, P. Bounded regret in stochastic multi-armed bandits. In Conference on Learning Theory, pp. 122 134. PMLR, 2013. Cesa-Bianchi, N. and Lugosi, G. Combinatorial bandits. Journal of Computer and System Sciences, 78(5):1404 1422, 2012. Chen, P., Gao, C., and Zhang, A. Y. Optimal full ranking from pairwise comparisons. The Annals of Statistics, 50 (3):1775 1805, 2022. Chetrite, R., Diel, R., and Lerasle, M. The number of potential winners in bradley-terry model in random environment. Annals of Applied Probability, 27(3):1372 1394, 2017. Chvatal, V. Lecture notes on the aks sorting network. URL https://users.encs.concordia. ca/ chvatal/notes/aks.pdf. Corff, S. L., Lerasle, M., and Vernet, E. A bayesian nonparametric approach for generalized bradley-terry models in random environment. ar Xiv preprint ar Xiv:1808.08104, 2018. Devroye, L. Laws of the iterated logarithm for order statistics of uniform spacings. The Annals of Probability, 9(5): 860 867, 1981. Diel, R., Le Corff, S., and Lerasle, M. Learning the distribution of latent variables in paired comparison models with round-robin scheduling. Bernoulli, 26(4):2670 2698, 2020. Elo, A. E. The Rating of Chessplayers, Past and Present. Arco Pub., New York, 1978. ISBN 0668047216. Falahatgar, M., Hao, Y., Orlitsky, A., Pichapati, V., and Ravindrakumar, V. Maxing and ranking with few assumptions. Advances in Neural Information Processing Systems, 30, 2017a. Falahatgar, M., Orlitsky, A., Pichapati, V., and Suresh, A. T. Maximum selection and ranking under noisy comparisons. In International Conference on Machine Learning, pp. 1088 1096. PMLR, 2017b. Falahatgar, M., Jain, A., Orlitsky, A., Pichapati, V., and Ravindrakumar, V. The limits of maxing, ranking, and preference learning. In International conference on machine learning, pp. 1427 1436. PMLR, 2018. Feige, U., Raghavan, P., Peleg, D., and Upfal, E. Computing with noisy information. SIAM Journal on Computing, 23 (5):1001 1018, 1994. Gao, C., Shen, Y., and Zhang, A. Y. Uncertainty quantification in the bradley terry luce model. Information and Inference: A Journal of the IMA, 12(2):1073 1140, 2023. Giraud, C., Issartel, Y., Leh ericy, L., and Lerasle, M. Pairmatching: links prediction with adaptive queries. ar Xiv preprint ar Xiv:1905.07342, 2019. Active Ranking and Matchmaking, with Perfect Matchings Heckel, R., Shah, N. B., Ramchandran, K., and Wainwright, M. J. Active ranking from pairwise comparisons and when parametric assumptions do not help. 2019. Herbrich, R., Minka, T., and Graepel, T. Trueskill : a bayesian skill rating system. 
Advances in neural information processing systems, 19, 2006. Hoare, C. A. Quicksort. The computer journal, 5(1):10 16, 1962. Hou, Y., Tan, V. Y., and Zhong, Z. Probably anytime-safe stochastic combinatorial semi-bandits. In International Conference on Machine Learning, pp. 13353 13409. PMLR, 2023. Jamieson, K., Malloy, M., Nowak, R., and Bubeck, S. lil ucb: An optimal exploration algorithm for multiarmed bandits. In Conference on Learning Theory, pp. 423 439. PMLR, 2014. Kaufmann, E., Capp e, O., and Garivier, A. On the complexity of best arm identification in multi-armed bandit models. Journal of Machine Learning Research, 17:1 42, 2016. Knuth, D. E. et al. The art of computer programming, volume 3. Addison-Wesley Reading, MA, 1973. Kveton, B., Wen, Z., Ashkan, A., and Szepesvari, C. Tight regret bounds for stochastic combinatorial semi-bandits. In Artificial Intelligence and Statistics, pp. 535 543. PMLR, 2015. Merlis, N. and Mannor, S. Tight lower bounds for combinatorial multi-armed bandits. In Conference on Learning Theory, pp. 2830 2857. PMLR, 2020. Merlis, N. and Mannor, S. Lenient regret for multi-armed bandits. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 8950 8957, 2021. Minka, T., Cleven, R., and Zaykov, Y. Trueskill 2: An improved bayesian skill rating system. Technical Report, 2018. Natvig, L. Logarithmic time cost optimal parallel sorting is not yet fast in practice! In Conference on High Performance Networking and Computing: Proceedings of the 1990 ACM/IEEE conference on Supercomputing, volume 12, pp. 486 494. Citeseer, 1990. Paterson, M. S. Improved sorting networks with o (log n) depth. Algorithmica, 5(1-4):75 92, 1990. Perrault, P., Perchet, V., and Valko, M. Exploiting structure of uncertainty for efficient matroid semi-bandits. In International Conference on Machine Learning, pp. 5123 5132. PMLR, 2019. Perrault, P., Boursier, E., Valko, M., and Perchet, V. Statistical efficiency of thompson sampling for combinatorial semi-bandits. Advances in Neural Information Processing Systems, 33:5429 5440, 2020a. Perrault, P., Valko, M., and Perchet, V. Covariance-adapting algorithm for semi-bandits with application to sparse outcomes. In Conference on Learning Theory, pp. 3152 3184. PMLR, 2020b. Ren, W., Liu, J., and Shroff, N. B. Pac ranking from pairwise and listwise queries: Lower bounds and upper bounds. ar Xiv preprint ar Xiv:1806.02970, 2018. Ren, W., Liu, J. K., and Shroff, N. On sample complexity upper and lower bounds for exact ranking from noisy comparisons. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. Saha, A. and Gopalan, A. Active ranking with subset-wise preferences. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 3312 3321. PMLR, 2019. Saha, A. and Gopalan, A. From pac to instance-optimal sample complexity in the plackett-luce model. In International Conference on Machine Learning, pp. 8367 8376. PMLR, 2020. Seiferas, J. Sorting networks of logarithmic depth, further simplified. Algorithmica, 53(3):374 384, 2009. Sentenac, F., Yi, J., Calauzenes, C., Perchet, V., and Vojnovic, M. Pure exploration and regret minimization in matching bandits. In International Conference on Machine Learning, pp. 9434 9442. PMLR, 2021. Sz or enyi, B., Busa-Fekete, R., Paul, A., and H ullermeier, E. Online rank elicitation for plackett-luce: A dueling bandits approach. Advances in Neural Information Processing Systems, 28, 2015. Tversky, A. and Russo, J. E. 
Substitutability and similarity in binary choices. Journal of Mathematical psychology, 6(1):1 12, 1969. Valiant, L. G. Parallelism in comparison problems. SIAM Journal on Computing, 4(3):348 355, 1975. Wang, S. and Chen, W. Thompson sampling for combinatorial semi-bandits. In International Conference on Machine Learning, pp. 5114 5122. PMLR, 2018. Yue, Y., Broder, J., Kleinberg, R., and Joachims, T. The k-armed dueling bandits problem. Journal of Computer and System Sciences, 78(5):1538 1556, 2012. Active Ranking and Matchmaking, with Perfect Matchings Zoghi, M., Tunys, T., Ghavamzadeh, M., Kveton, B., Szepesvari, C., and Wen, Z. Online learning to rank in stochastic click models. In International conference on machine learning, pp. 4199 4208. PMLR, 2017. A. Discussion of Assumptions 1.1 and 1.2 The main results of the different algorithms rely on Assumptions 1.1 and 1.2 that we restate here for completeness. Assumption A.1. Strong Stochastic Transitivity (SST). For any i, j, k [N], if r(i) < r(j) < r(k) then: ε(i, k) max{ε(i, j), ε(j, k)}. (8) Assumption A.2. Stochastic Triangular inequality (STI). For any i, j, k [N], if r(i) < r(j) < r(k) then: ε(i, k) ε(i, j) + ε(j, k). (9) Consequences of (SST) (SST) have two main consequences: 1. There is a ranking r that is consistent with (ε(i, j))i,j [N]. 2. If r(i) < r(j) < r(k) and ε(i, j) > ε then ε(i, k) > ε . Consequence 1 can be obtained with less restrictive assumptions like Weak Stochastic transitivity (WST), the minimum assumption required for the existence of a ranking consistent with the skill gaps (ε(i, j))i,j [N]. Assumption A.3. Weak Stochastic transitivity (WST). For any i, j, k [N], if ε(i, j) 0 and ε(j, k) 0 then ε(i, k) 0. If only A.3 holds, even though a ranking exists and is consistent with the skill gaps, its existence is not useful. Indeed, to decide whether a given player i has an opponent j with |ε(i, j)| ε , the gaps between i and any other player must be estimated, which yields a sample complexity of Ω P i ε with probability at least 1 δ Proof. W.l.o.g, assume that ε(i, j) > 0, then COMPARE terminates after b = 2ε 2 log(1/δ) = O(ε 2 log 1/δ) comparisons. Since the algorithm is symmetric and ε(i, j) > 0, then COMPARE returns j with probability smaller than 1/2. Now, suppose that ε < ε(i, j), It remains to prove that COMPARE returns i and that it is Confident with probability at least 1 δ. Let E be the event that ˆpi,j < 1/2 + 2ε. Then, by Chernoff bound P(E) exp 2 (ε)2 b δ (14) If E does not happen, then COMPARE returns i (the stronger player) and Confident . This holds with probability at least 1 δ/2 For convenience, we restate the Lemma 2.2 Lemma C.2. Using COMPARE in the AKS-Paterson algorithm as a comparison procedure with the parameters ε/(2D log(N)) and δ = δ/(N log3 N), the ranking retrieved is (ε, δ)-PAC in at most O ε 2N log (N) log (N/δ) comparisons. To prove this lemma, the only used property of AKSPaterson algorithm is that it is a network of comparators, meaning that it can be expressed as a finite sequence (Mt, Lt, Rt)t, where Mt is a perfect bipartite matching between elements Lt [N] and those of Rt [N]. Let σ0 be an arbitrary ordering (σ(i) is the element in position i). ˆσ1(and subsequently rt) is obtain as follows: if i L1 and j R1 are paired in M1, then ˆσ0(i) and σ0(j) are compared(For now, comparisons are noiseless). After the comparison: σ1(j) gets the larger value and σ1(i) gets the smaller one. 
In summary, positions in [N] with indices in R1 receive the larger values and elements with indices in L1 receive the smaller values. σt is built similarly, by induction. Hence, σt is the ranking resulting from applying the t first element of the sequence (Mt, Lt, Rt). Let ˆσt the ranking Active Ranking and Matchmaking, with Perfect Matchings resulting from applying the same sequence, but with a faulty comparator: COMPARE (with parameters (ε, δ) ) instead of a perfect comparison. Then we have the following lemma: Lemma C.3. The following bounds hold that: |ε(ˆσt(i), σt(i))| tε. (15) and this, simultaneously on all i [N], with probability at least 1 δt N/2 Proof. At time t, t N/2 comparisons have been queried to construct ˆσt. Hence, all queries of COMPARE are simultaneous correct with a probability at least 1 δt N/2. For the rest of the proof, suppose that all the queries of COMPARE are correct. The proof is by induction: for t = 0: ˆσ0 = σ0 (no comparison queried yet). Let t N. Suppose that equation 15 hold for t. Let i Lt+1 and j Rt+1 be two indices that are paired in Mt+1. W.l.o.g, suppose that ε(σt(i), σt(j)) > 0, so much that σt+1(i) = σt(j) and σt+1(j) = σt(i). ˆσt(i) and ˆσt(j) are compared using COMPARE. If ˆσt+1(i) = ˆσt(j) and ˆσt+1(j) = ˆσt(i), then Equation 15 trivially holds for i and j because ε(ˆσt(i), σt(i)) = ε(ˆσt+1(i), σt+1(i)) and ε(ˆσt(j), σt(j)) = ε(ˆσt+1(j), σt+1(j)). If instead ˆσt+1(i) = ˆσt(i) and ˆσt+1(j) = ˆσt(j), this means that ε(ˆσt(i), ˆσt(j)) ε. It follows from equation 10 (which is a consequence of assumption 1.2) that ε(ˆσt+1(i), σt+1(i)) = ε(ˆσt(i), σt(j) (16) ε(ˆσt(i), ˆσt(j)) + ε(ˆσt(j), σt(j)) (17) ε + tε = (t + 1)ε (18) Moreover, ε(σt(j), σt(i)) < 0, hence the same reasoning yields ε(σt+1(i), ˆσt+1(i)) = ε(σt(j), ˆσt(i)) (19) ε(σt(j), σt(i)) + ε(σt(i), ˆσt(i)) (20) In summary |ε(σt+1(i), ˆσt+1(i))| (t+1)ε. The same reasoning applied to j yields |ε(σt+1(j), ˆσt+1(j))| (t+ 1)ε. Hence the lemma. Proof. Lemma 2.2 is a direct consequence of Lemma C.3: for ε = ε/(2D log N) and δ = δ/(DN log N), the retrieved ranking ˆr verifies the following relation with respect to the true ranking r: i [N], |ε(σ(i), ˆσ(i))| ε/2 (22) Let i < j [N] two positions. Let ˆσ(i) and ˆσ(j) be the players at those ranks according to ˆσ (the rank of player ˆσ(i) is i). It is sufficient to show that ε(ˆσ(j), ˆσ(i)) ε to conclude that ˆr is indeed ε-correct. The Equation 10 and Lemma C.3 yield: ε(ˆσ(j), ˆσ(i)) ε(ˆσ(j), σ(j)) (23) + ε(σ(j), σ(i)) (24) + ε(σ(i), ˆσ(i)) (25) Hence the lemma. C.2. Proof of Lemma 2.5 For convenience, the Lemma 2.5 is restated: Lemma C.4. Let ˆr be an ε-correct ranking. Then Anchoring(N, ˆr, ε, δ) has a time complexity of order ε2 maxi [N] log (|N2ε(i)|) . It returns a set of anchor points A1, . . . , AK such that Wˆr(A1, . . . , AK) = O(maxi [N] |N2ε(i)|). Both these claims hold simultaneously with probability at least 1 δ The proof of Lemma 2.5 relies on the following observation: Lemma C.5. Let ˆr be a an ε-correct ranking. Let i [N] and j Nε(i). Then: |ˆr(i) ˆr(j)| |N2ε(i)| (27) Proof. First, notice that since ˆr is ε-correct, |ˆr(i) r(i)| |Nε(i)|. Indeed, to shift the rank of i with s, at least s misrankings are needed. Since i cannot be misranked with other elements than those of Nε(i), the shift s |Nε(i)|. Similarly for every element j Nε(i), we have that |ˆr(j) r(j)| |Nε(j)|. Moreover, for every j Nε(i), it holds that Nε(j) N2ε(i). This is a direct consequence of Assumption 1.2. 
Now, notice that |ˆr(i) ˆr(j)| counts the number of elements ranked between i and j in ˆr. Since ˆr is ε-correct, there is two types of elements that are between i and j in ˆr: Elements that are between i and j in the true ranking r: these are included in Nε(i) because j Nε(i). Elements that are mismatched with respect to either i or j: these are elements of Nε(i) and Nε(j) which are both included in N2ε(i). Hence, every element ranked between i and j in ˆr is an element of N2ε(i) Hence, for every j Nε(i), it holds that |ˆr(i) ˆr(j)| |N2ε(i)|. This concludes the proof. Now, the proof of Lemma 2.5 is presented Active Ranking and Matchmaking, with Perfect Matchings Proof. First, Anchoring(N, ˆr, ε, δ) queries at most 2N log N times COMPARE, with each query having a confidence of δ 2N log N . It follows from a union bound that all queries are simultaneously correct with probability at least 1 δ. For the rest of the proof, assume it is the case. There are two claims to the lemma: 1. Prove that Anchoring(N, ˆr, ε, δ/(N log N)) finds an anchoring (A1, A2, . . . , AK) satisfying Wˆr(A1, . . . , AK) = O(maxi [N] |N2ε(i)|). 2. Prove that the comparisons queried can be organized into O log(N/δ) ε2 maxi [N] log |N2ε(i)| matchings, proving the upper bound on the time complexity. To prove the first claim, make use of Lemma C.5. Indeed, if z = 1 + log(|N2ε(i)|) and m(i) and M(i) are two elements such that ˆr(i) ˆr(M(i)) = 2z, and ˆr(i) ˆr(m(i)) = 2z. Since 2z > |N2ε(i)|, m(i) and M(i) are both outside of Nε(i) making comparing them to i a confident comparison. Now, consider the following anchoring: A1 = ˆr 1(1) Ai+1 = m(Ai) this anchoring has a weight function of Wˆr(A1, . . . , AK) = max i [K] |Ci| max i [K] 2|N2ε(Ai)| (28) This is because |Ci| 2 max{|N2ε(Ai 1)|, |N2ε(Ai)|}. In conclusion, the anchoring (A1, . . . , AK) constructed satisfies the claim Wˆr(A1, . . . , AK) = O maxi [N] |N2ε(i)| . This prove the first claim. The second claim states that Algorithm 3 has a time complexity of O log(N/δ) ε2 maxi [N] log |N2ε(i)| . First, it follows from the first claim that every element i will encounter an opponent that is confidently stronger than him (M(i)) and one that is confidently weaker than him (m(i)) at stage z = 1 + log |N2ε(i)| of Algorithm 3 at the latest. Hence, at stage zm = 1 + maxi [N] log |N2ε(i)| , every element i has either encountered corresponding opponents m(i) and M(i), or either ˆr(i) 2z < 1 or ˆr(i) + 2Z > N. In both cases, L[i] = U[i] = 1 for all i [N] (step 3 in Algorithm 3), hence Algorithm 3 halts at stage zm at the latest. Now, it remains to show that each stage costs O( log(N/δ) ε2 ) matchings. we proceed as follows: W.l.o.g., suppose that ˆr = Id. Recall that at stage z, Algorithm 3 queries comparisons between elements i, j such that |i j| = 2z. It is sufficient to show that all the queries of stage can be covered with only 2 matchings. The proof of that is as follows: [N] is divided into consecutive segments Sk = {2z+1(k 1) + 1, . . . , 2z+1k} for k {1, . . . , P = N/(2z+1) } of size 2z+1 each, with the last segment being SP +1 = 2z+1P + 1, . . . , N (possibly empty). For k P, segment Sk is divided into two halves: a left half SL k = {2z+1(k 1) + 1, . . . , 2z+1k + 2z} (29) and right half SR k = {2z+1k + 2z + 1, 2z+1k}. (30) SP +1 is also divided into SL P +1 = {2z+1(P) + 1, . . . , min{N, 2z+1(P + 1) + 2z}} (31) and SR P +1 = {2z+1(P + 1) + 2z + 1, . . . , N} (32) whenever it makes sense. 
The following properties are straightforward: There is no queries involving elements that are both in the same set SR k or SL k for any k = 1, . . . , P + 1. If a comparison between i and j is queried with i j, then there is a k [P + 1]] such that i SR k and j SL k+1 or i SL k and j SR k . Elements of SL 1 and SR P +1 are involved in one query each. If we match the elements of SL k to those of SR k in increasing order, we get a perfect matching on [N] \ SP +1 totaling to N/2z+1 queries of COMPARE. Notice that in this way, all queries of COMPARE that involve elements of SL 1 are already addressed. If SP +1 has more than 2z elements, match elements of SL P +1 with those of SP +1 in an increasing order corresponding to the queries. Match what is left arbitrarily. Notice that this way, all queries between elements of SL P +1 and those of SR P +1 are addressed. In particular, all queries involving SR P +1 are addressed too. If SP +1 has 2z or less elements, match arbitrarily. In summary, all queries involving elements between SL k and elements of SR k for all k [P + 1]. The leftover queries (involving elements from SR k with elements from SL k+1, for k [P]) are addressed now in a similar way. Elements of SR k are matched with those of SL k+1 in increasing order. Since no queries involving elements of SL 1 are left for this stage z, they can be matched arbitrarily. It remains to indicate the way in which elements of SR P are matched. They are matched to those of SL P +1 whenever possible. Match potential leftover arbitrarily. This way, all queried comparisons of stage z have been addressed. Active Ranking and Matchmaking, with Perfect Matchings In conclusion, we separated the queries of COMPARE, queried by Algorithm 3 in a single stage into two matchings. Since each comparisons is performed O log N/δ ε2 times by COMPARE, each matching is repeated O log N/δ ε2 too, im- plying a time complexity of log N/δ ε2 per stage of Algorithm 3, hence the second claim. C.3. Proof of Lemma 2.6 For convenience, Lemma 2.6 is restated: Lemma C.6. The time complexity of transitioning from stage s to stage s + 1 of Algorithm 1 is O 1 ε2 s+1 log N log3 max i [N] N2εs+1(i) . (33) Furthermore, the ranking obtained at the end of stage s is εs+1-correct with probability at least 1 Ps+1 l=1 δl. Proof. This lemma is composed of two claims: 1. At the end of stage s, the ranking ˆrs is εs-correct with probability at least 1 Ps l=1 δl. 2. The time complexity of stage s is log3 max i [N] N2εs 1(i) . (34) The proof is conducted through induction Suppose that ˆrs is indeed εs-correct. First(line 3), Algorithm 1 generates anchors (A1, . . . , AK) that provides a clustering of maximum cluster size of O maxi [N] |N2εs 1| . This step have a time complex- ity of O log(N/δ) εs maxi [N]|N2εs 1|(i) , and is sound with probability at least 1 δs/3. Now that a clustering is available, Algorithm 1 constructs (Uj)j as indicated in line 6. We have that |Uj| |Cj| + |Cj| + 2 = O maxi [N] |N2εs 1(i)| . Hence, line 7 would have a time complexity of O log(N/δs) ε2s log3(maxi [N] |N2εs 1(i)|) . This step is correct with a probability at least 1 δs/6: this is because Algorithm F.3, is correct with a probability at least 1 δ/6 (using a union bound). A similar reasoning is applied to line 10. As will be shown later, |C j| |Cj+1| + |Cj| + |Cj 1|. It follows that maxj |U j| = O maxi [N] |N2εs 1(i)| . Hence, with the same reasoning as for step 1, the time complexity of step 1 is shown to also be O log(N/δs) ε2s maxi [N] log3 |N2εs 1(i)| . 
This step is also correct with probability at least $1-\delta_s/6$. The last step of the stage is the checking step (halting condition); it is also correct with probability at least $1-\delta_s/3$. Hence, stage $s$ has a time complexity of $O\big(\frac{\log(N/\delta_s)}{\varepsilon_s^2}\log^3(\max_{i\in[N]}|N_{2\varepsilon_{s-1}}(i)|)\big)$, and all the steps of stage $s$ are simultaneously correct with probability at least $1-\delta_s$.

To conclude the proof, it remains to show that if all the steps of stage $s$ are correct, then the ranking $\hat r_s$ obtained is indeed $\varepsilon_s$-correct. We show the following claims, which hold for any set of anchors $A_1,\dots,A_K$:
1. $\hat r'$ is $\varepsilon_s$-correct on each $U_j$.
2. $\hat r'$ is $\varepsilon_{s-1}$-correct on $[N]$.
3. For all $j\in\{1,\dots,\lfloor\frac{K+1}{2}\rfloor\}$ and all $(i_1,i_2)\in C'_{2j}\times[N]$: if $\hat r'(i_1) < \hat r'(i_2)$ and $\varepsilon(i_1,i_2) < -\varepsilon_s$, then $i_2\in C'_{2j+1}$.
4. $\hat r_s$ is indeed $\varepsilon_s$-correct.

The third claim simply means that the only remaining misrankings preventing the ranking $\hat r'$ from being $\varepsilon_s$-correct are between elements of $C'_{2j}$ and $C'_{2j+1}$, for $j\in\{1,\dots,\lfloor\frac{K+1}{2}\rfloor\}$.

1) $\hat r'$ is $\varepsilon_s$-correct on each $U_j$ by direct application of Lemma 2.2.

2) Since $\hat r'$ is $\varepsilon_s$-correct on each $U_j$, it is $\varepsilon_{s-1}$-correct there too; for comparisons between elements $i_1\in U_{j_1}$ and $i_2\in U_{j_2}$ with $j_1\neq j_2$, $\varepsilon_{s-1}$-correctness is inherited from $\hat r_{s-1}$ (the elements of each $U_j$ are the same according to $\hat r_s$ and according to $\hat r_{s-1}$).

3) The remaining misrankings are narrowed down as follows:
- There are no misrankings between elements of $C'_{2j}$ and those of $C'_l$ for $l\in[\lfloor\frac{K+1}{2}\rfloor]\setminus\{2j-2,\dots,2j+2\}$: this is a direct application of Lemma 2.4.
- Elements $i_1\in C'_{2j-2}$ satisfy $r(i_1) < r(A_{2j-1})$: this is a direct application of Lemma 2.3 to $(A_{2j-2}, A_{2j-1})$ and $\hat r'$ (since it is $\varepsilon_{s-1}$-correct).
- Let $(i_1,i_2)\in C'_{2j-2}\times C'_{2j}$. Either $r(i_1) < r(i_2)$, in which case there is nothing to prove because $\hat r'$ and $r$ are in agreement, or $r(i_2) < r(i_1)$. Recall that elements $i_2\in C'_{2j}$ verify $\varepsilon(i_2, A_{2j-1}) < \varepsilon_s$ (because $\hat r'$ is $\varepsilon_s$-correct on $U_j$ and $A_{2j-1}\in U_j$). If $r(i_2) < r(i_1) < r(A_{2j-1})$, then it follows from Assumption 1.1 that $\varepsilon(i_2,i_1) \le \varepsilon(i_2, A_{2j-1}) \le \varepsilon_s$. This means that $i_1$ and $i_2$ are not misranked at precision $\varepsilon_s$, thus eliminating misrankings between elements of $C'_{2j}$ and those of $C'_{2j-2}$. Misrankings between elements of $C'_{2j}$ and those of $C'_{2j+2}$ are eliminated with the same reasoning.
- Misrankings between elements of $C'_{2j}$ and those of $C'_{2j-1}$ are automatically eliminated since $\hat r'$ is $\varepsilon_s$-correct on $U_j$, which includes $C'_{2j}$ and $C'_{2j-1}$.

Hence, the only remaining misrankings are potentially between elements of $C'_{2j}$ and those of $C'_{2j+1}$.

4) The last AKS step corrects these potential misrankings, as it applies AKS with precision $\varepsilon_s$ on $U'_j$, which contains $C'_{2j}$ and $C'_{2j+1}$. Hence, after this step, the algorithm has eliminated all potential misrankings preventing $\hat r_s$ from being $\varepsilon_s$-correct, so $\hat r_s$ is indeed $\varepsilon_s$-correct. This is true provided that all the steps are simultaneously correct, which happens with probability at least $1-2\delta_s/3$.

The proof is concluded by induction. For $s=0$, $\hat r_0$ is $\varepsilon_0 = 1/2$-correct with probability 1 (every permutation is such). Let $s\in\mathbb N$ and suppose that, at the end of stage $s$, $\hat r_s$ is $\varepsilon_s$-correct with probability at least $1-\sum_{l=1}^{s}2\delta_l/3$. Then, as shown, $\hat r_{s+1}$ is $\varepsilon_{s+1}$-correct with probability at least $1-2\delta_{s+1}/3$ given that $\hat r_s$ is $\varepsilon_s$-correct. Since $\hat r_s$ is $\varepsilon_s$-correct with probability at least $1-\sum_{l=1}^{s}2\delta_l/3$, a union bound shows that $\hat r_{s+1}$ is $\varepsilon_{s+1}$-correct with probability at least $1-\sum_{l=1}^{s+1}2\delta_l/3$.
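The grouping step invoked at lines 6 and 10 of Algorithm 1 can be illustrated with a small Python sketch following the GRP1 pseudo-code of Appendix F.3. The anchor and cluster values below are arbitrary illustration data, and the membership test "$A_{2j-2}$ not already used by $U_{j-1}$" is a reconstruction (the relation symbol is lost in the extracted pseudo-code).

```python
def grp1(anchors, clusters):
    """Sketch of the grouping of line 6 of Algorithm 1 (GRP1, Appendix F.3).

    anchors  = [A_1, ..., A_K]       (items, in ranking order)
    clusters = [C_1, ..., C_{K+1}]   (lists of items; C_j lies between A_{j-1} and A_j)
    Builds U_j = C_{2j-1} ∪ {A_{2j-1}} ∪ C_{2j}, padded with a neighbouring anchor
    when possible so that each U_j has even size and
    |U_j| <= |C_{2j-1}| + |C_{2j}| + 2."""
    K = len(anchors)
    A = {j: anchors[j - 1] for j in range(1, K + 1)}          # 1-indexed anchors
    C = {j: list(clusters[j - 1]) for j in range(1, K + 2)}   # 1-indexed clusters
    groups = {}
    for j in range(1, (K + 1) // 2 + 1):
        U = C[2 * j - 1] + [A[2 * j - 1]] + C.get(2 * j, [])
        if len(U) % 2 == 1:
            prev = groups.get(j - 1, [])
            # pad with A_{2j-2} if it was not consumed by U_{j-1}, else with A_{2j}
            if 2 * j - 2 in A and A[2 * j - 2] not in prev:
                U.append(A[2 * j - 2])
            elif 2 * j in A:
                U.append(A[2 * j])
        groups[j] = U
    return [groups[j] for j in sorted(groups)]

if __name__ == "__main__":
    anchors = [4, 9, 13]                                       # A_1, A_2, A_3 (arbitrary)
    clusters = [[1, 2, 3], [5, 6, 7, 8], [10, 11, 12], [14, 15, 16]]
    for j, U in enumerate(grp1(anchors, clusters), start=1):
        assert len(U) % 2 == 0
        print(f"U_{j} = {sorted(U)}")
```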
C.4. Proofs of Theorem 2.7 and Corollary 2.8

Theorem C.7. Each player $i$ is involved in at most
$$O\bigg(\frac{\log^3 N}{\min\{\Delta_i^2, \Delta_{i-1}^2\}}\Big(\log\frac N\delta + \log\log\frac{1}{\min\{\Delta_i,\Delta_{i-1}\}}\Big)\bigg) \qquad (35)$$
comparisons before its rank can be confidently identified.

Proof. Let $i\in[N]$. If $\varepsilon_s^2 < \min\{\Delta_i^2, \Delta_{i-1}^2\}$, then $i$ is flagged as correctly ranked at the checking step. Hence, $i$ is involved in at most
$$\sum_{l=1}^{s}\frac{\log^3(N)}{\varepsilon_l^2}\log\frac{\pi^2Nl^2}{6\delta} = \log^3(N)\sum_{l=1}^{s}4^l\log\frac{\pi^2Nl^2}{6\delta} \le 2\log^3(N)\,4^s\log\frac{\pi^2Ns^2}{6\delta} = O\Big(\log^3(N)\,4^s\log\frac{Ns^2}{\delta}\Big)$$
comparisons. For $s = \big\lceil\log\frac{1}{\min\{\Delta_i,\Delta_{i-1}\}}\big\rceil$, $i$ is correctly ranked and has been involved in at most
$$O\bigg(\frac{\log^3 N}{\min\{\Delta_i^2,\Delta_{i-1}^2\}}\Big(\log\frac N\delta + \log\log\frac{1}{\min\{\Delta_i,\Delta_{i-1}\}}\Big)\bigg)$$
comparisons.

Corollary C.8. Algorithm 1 has a time complexity of
$$O\bigg(\frac{\log^3 N}{\min_{i\in[N]}\Delta_i^2}\Big(\log\frac N\delta + \log\log\frac{1}{\min_{i\in[N]}\Delta_i}\Big)\bigg). \qquad (40)$$
It returns the true ranking with probability at least $1-\delta$.

Proof. If $\varepsilon_s < \min_{i\in[N]}\Delta_i$, then, since $\hat r_s$ is $\varepsilon_s$-correct, it corresponds to the true ranking (no gap is small enough to induce an error). The probability of error of Algorithm 1 is upper bounded by (union bound)
$$\sum_{l\ge1}\mathbb P\big(\{\text{an error occurred at stage } l\}\big) \le \sum_{l\ge1}\frac{6\delta}{\pi^2l^2} = \delta. \qquad (41)$$
Hence, Algorithm 1 retrieves the exact ranking in the claimed time complexity with probability at least $1-\delta$.
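The bookkeeping behind Theorem C.7 and Corollary C.8 can be verified numerically. In the sketch below, the per-stage confidence $\delta_l = 6\delta/(\pi^2 l^2)$ is taken from the display above, while the precision schedule $\varepsilon_l = 2^{-l}$ (so that $1/\varepsilon_l^2 = 4^l$) is an assumption consistent with the $4^l$ factors in the proof.

```python
import math

# Numerical check of the bookkeeping used in Theorem C.7 and Corollary C.8.
# Assumption: per-stage precision eps_l = 2**(-l) and per-stage confidence
# delta_l = 6*delta / (pi**2 * l**2), consistent with the displays above.
delta = 0.05

# Union bound: the per-stage error probabilities sum back to (approximately) delta.
total_error = sum(6 * delta / (math.pi ** 2 * l ** 2) for l in range(1, 100_001))
assert abs(total_error - delta) < 1e-5
print(f"sum of delta_l ~= {total_error:.6f} (target {delta})")

# Per-player cost up to stage s: sum_l 4**l * log(pi^2 N l^2 / (6 delta))
# is dominated, up to a factor 2, by its last term, as used in the proof.
N, s = 1024, 10
terms = [4 ** l * math.log(math.pi ** 2 * N * l ** 2 / (6 * delta)) for l in range(1, s + 1)]
assert sum(terms) <= 2 * terms[-1]
print(f"sum of stage costs = {sum(terms):.3e} <= 2 * last term = {2 * terms[-1]:.3e}")
```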
D. Going beyond worst cases

The underlying concept behind thresholding the cost of a matching stems from the idea that obtaining an exact ranking is prohibitively expensive and that games do not have to be exactly balanced (i.e., $\varepsilon(i,j)$ exactly equal to $0$) to be interesting. Unfortunately, and as discussed above, thresholding does not allow bypassing the exact ranking retrieval process, and in general, obtaining the exact ranking is unavoidable (as instances exist where the optimal matching is unique, and retrieving it implies retrieving the ranking; see also Section 2.1). However, there are also many situations, obviously not the worst-case ones, where retrieving an optimal matching is nearly as costly as obtaining an $\varepsilon^\star$-correct ranking. One typical such situation is given in Theorem D.2. We proceed to prove and illustrate this result in the remainder of this section.

W.l.o.g., we suppose in this section that the items are correctly ranked by their index, that is, $i < j$ implies $\varepsilon(i,j) > 0$. For any estimated ranking $r_\varepsilon$, possibly indexed by $\varepsilon$, we let $\sigma_\varepsilon = r_\varepsilon^{-1}$, so that any element $i$ is immediately preceded in the ranking $r_\varepsilon$ by $j = \sigma_\varepsilon(r_\varepsilon(i)-1)$. The following lemma bounds the gap between $j$ and $i$ when $r_\varepsilon$ is $\varepsilon$-correct and $i-1$ and $i$ are sufficiently close.

Lemma D.1. Let $r_\varepsilon$ denote an $\varepsilon$-correct ranking and assume that $\varepsilon + \varepsilon(i-1,i) \le \varepsilon^\star$. Then $j = \sigma_\varepsilon(r_\varepsilon(i)-1)$ satisfies $|\varepsilon(j,i)| \le \varepsilon^\star$.

Proof. Suppose first that $j > i$, i.e., $j$ is actually weaker than $i$, so that $i$ and $j$ are improperly compared by $r_\varepsilon$. As $r_\varepsilon$ is $\varepsilon$-correct, this implies that $0 < \varepsilon(i,j) \le \varepsilon \le \varepsilon^\star$. Suppose now that $j < i$, i.e., $j$ is in reality stronger than $i$. Then either $j = i-1$, so by hypothesis $0 < \varepsilon(j,i) = \varepsilon(i-1,i) \le \varepsilon^\star$, or $j < i-1$, so $j$ and $i-1$ are improperly compared by $r_\varepsilon$. In this last case, by Assumption 1.2, $0 < \varepsilon(j,i) \le \varepsilon(j,i-1) + \varepsilon(i-1,i) \le \varepsilon + \varepsilon(i-1,i) \le \varepsilon^\star$.

Lemma D.1 allows us to prove the following theorem, which exhibits a typical situation where a satisfying matching can be obtained using only an $\varepsilon$-correct ranking.

Theorem D.2. Assume that there exists $\varepsilon > 0$ such that, for any $i\in[N-1]$, $\varepsilon(i,i+1) \le \varepsilon^\star - \varepsilon$. Let $r_\varepsilon$ denote any $\varepsilon$-correct ranking, and let $M_\varepsilon$ denote the matching $\big(\{\sigma_\varepsilon(2i-1), \sigma_\varepsilon(2i)\},\ i\in[N/2]\big)$ that pairs the items ranked $2i-1$ and $2i$ by $r_\varepsilon$. Then $M_\varepsilon$ is a costless matching: $C_{M_\varepsilon} = 0$.

Proof. It is sufficient to prove that, for any $i\in[N/2]$, $|\varepsilon(\sigma_\varepsilon(2i-1), \sigma_\varepsilon(2i))| \le \varepsilon^\star$. Fix $j = \sigma_\varepsilon(2i)$; we have $\varepsilon(j-1,j) \le \varepsilon^\star - \varepsilon$ by assumption, so by Lemma D.1, $\sigma_\varepsilon(2i-1) = \sigma_\varepsilon(r_\varepsilon(j)-1)$ satisfies $|\varepsilon(\sigma_\varepsilon(2i-1), \sigma_\varepsilon(2i))| \le \varepsilon^\star$.

A nice consequence of Theorem D.2 is that, if all $\varepsilon(i,i+1) \le \varepsilon^\star/2$, then costless matchings can be built from any $\varepsilon^\star/2$-correct ranking. The condition $\varepsilon(i,i+1) \le \varepsilon^\star/2$ can easily be tested online, and it is straightforward to adapt our sampling policy in this case to prevent it from looking for a perfect ranking. The details are left to the interested reader.

A natural question arising from the preceding result is whether the condition that all $\varepsilon(i,i+1) \le \varepsilon^\star/2$ is met in some examples. It turns out to be typically the case, as shown by the following example. Assume that any item $i$ has a value $X_i$ that is a nonnegative random variable, say uniform on $[0,1]$ to fix ideas, sampled independently from the other $X_j$. Assume moreover that $\varepsilon(i,j) = \frac{X_i - X_j}{2}$. The rank $r$ is then the function whose inverse $\sigma = r^{-1}$ satisfies $X_{\sigma(1)} > \dots > X_{\sigma(N)}$; in words, the strongest item is the one with the highest value. In this setting, the condition $\max_{i\in[N-1]}\varepsilon(\sigma(i),\sigma(i+1)) \le \varepsilon^\star/2$ is thus met as soon as the maximal spacing between uniforms satisfies $\max_{i\in[N-1]} X_{\sigma(i)} - X_{\sigma(i+1)} \le \varepsilon^\star$. The maximal spacing of uniforms has been extensively studied in probability and precise asymptotics are available, see (Devroye, 1981) for example, showing that it scales asymptotically as $\log N/N$. Therefore, as long as $\frac{\log N}{N} < c\,\varepsilon^\star$ for some small constant $c$, the condition $\max_{i\in[N-1]}\varepsilon(\sigma(i),\sigma(i+1)) \le \varepsilon^\star/2$ is met with high probability in this example.
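The uniform example can be illustrated with a short simulation. The sketch below takes $\varepsilon \to 0$ (so that the $\varepsilon$-correct ranking is the true one) and uses arbitrary illustration values $N = 200$ and $\varepsilon^\star = 0.05$: it estimates how often the spacing condition holds and checks that, whenever it does, the matching of Theorem D.2 is indeed costless.

```python
import math
import random

def max_spacing(xs):
    """Maximal gap between consecutive order statistics of the values xs."""
    s = sorted(xs, reverse=True)
    return max(a - b for a, b in zip(s, s[1:]))

def adjacent_matching_cost(xs, eps_star):
    """Cost of the matching pairing the items ranked 2i-1 and 2i by the true ranking.
    An edge costs 1 when |eps(i, j)| = |X_i - X_j| / 2 exceeds eps_star."""
    s = sorted(xs, reverse=True)
    return sum(1 for k in range(0, len(s) - 1, 2) if abs(s[k] - s[k + 1]) / 2 > eps_star)

if __name__ == "__main__":
    random.seed(0)
    N, eps_star, trials = 200, 0.05, 500
    hits = costless = 0
    for _ in range(trials):
        xs = [random.random() for _ in range(N)]
        if max_spacing(xs) <= eps_star:          # condition of Theorem D.2 (with eps -> 0)
            hits += 1
            costless += (adjacent_matching_cost(xs, eps_star) == 0)
    print(f"P(max spacing <= eps_star) ~ {hits / trials:.2f}  (log N / N = {math.log(N) / N:.3f})")
    print(f"matching costless whenever the condition holds: {costless == hits}")
```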
E. Connections with Comb UCB (Kveton et al., 2015)

In this section, we shall draw some connections between the online ranking problem and combinatorial bandits, even though the two problems are quite different. We shall try as much as possible to use the same notation as (Kveton et al., 2015), and we will only provide high-level arguments (as, again, the Comb UCB analysis does not hold in the ranking problem, where the loss of sampling an edge is not the expectation of some random variable, but a non-linear transformation of it).

Formally, a stochastic combinatorial semi-bandit is a tuple $B = (E, \Theta, P)$, where $E = \{1,\dots,L\}$ is a finite set of $L$ items, $\Theta \subseteq 2^E$ is a non-empty set of feasible subsets of $E$, and $P$ is a probability distribution over the unit cube $[0,1]^E$. The items in the set $E$ are associated with a vector of stochastic weights/losses $w\sim P$, whose $e$-th component, $w(e)$, is the weight of item $e$. The expected weights of the items are defined as $\bar w = \mathbb E_{w\sim P}[w]$. The loss of choosing solution $A$ under the realization of the weights $w$ is simply (this is where the main difference between bandits and ranking lies)
$$f(A, w) = \sum_{e\in A} w(e). \qquad (42)$$
The maximum number of chosen items is defined as $K = \max_{A\in\Theta}|A|$. Let $(w_s)_{s=1}^{t}$ be an i.i.d. sequence of $t$ weights drawn from $P$. At time $s$, the learning agent chooses a solution $A_s\in\Theta$ based on its observations of the weights up to time $s$, suffers the loss $f(A_s, w_s)$, and observes the weights of all chosen items at time $s$, $\{(e, w_s(e)) : e\in A_s\}$. The learning agent interacts with the environment $t$ times, and its goal is to minimize its expected cumulative loss over this time. If the agent knew $P$ a priori, the optimal action would be to choose the optimal solution
$$A^\star = \arg\min_{A\in\Theta} f(A, \bar w) \qquad (43)$$
at all steps $s$. The quality of the agent's policy is measured by its expected cumulative regret
$$\mathbb E\Big[\sum_{s=1}^{t} R(A_s, w_s)\Big],$$
where $R(A_s, w_s) = f(A_s, w_s) - f(A^\star, w_s)$ is the instantaneous regret of the agent at time $s$.

The Comb UCB algorithm (Kveton et al., 2015) works as follows. First, it computes an upper confidence bound (UCB) on the expected weight of each item $e$, which would rewrite in the ranking problem as follows (since the weight is $0$ if the skill gap is smaller than $\varepsilon^\star$):
$$U_t(e) = \min\Big\{\mathbb 1\big\{\hat w_{T_{t-1}(e)}(e) + c_{t-1,T_{t-1}(e)} > \tfrac12 - \varepsilon^\star\big\},\ \mathbb 1\big\{\hat w_{T_{t-1}(e)}(e) - c_{t-1,T_{t-1}(e)} < \tfrac12 + \varepsilon^\star\big\}\Big\},$$
where $\hat w_s(e)$ is the average of $s$ observed weights of item $e$, $T_t(e)$ is the number of times item $e$ has been observed after $t$ rounds, and $c_{t,s}$ is the radius of a confidence interval around $\hat w_s(e)$ at time $t$ such that $\bar w(e)\in C(e,s,t) = [\hat w_s(e) - c_{t,s},\ \hat w_s(e) + c_{t,s}]$ holds with high probability. Second, Comb UCB calls the optimization oracle to solve the combinatorial problem on the UCBs:
$$A_t = \arg\min_{A\in\Theta} f(A, U_t). \qquad (46)$$
Finally, Comb UCB chooses $A_t$, observes the weights of all chosen items, and updates the estimates of $\bar w(e)$ for these items.

In the case at hand, $E = \{\{i,j\} : i\neq j\}$, so that $L = \frac{N(N-1)}{2}$, the feasible set $\Theta$ is the set of maximum-size matchings, which implies that $K = N/2$, and $P$ is a product of independent Bernoulli variables such that $\omega_{i,j}\sim\mathrm{Ber}(p(i,j))$.

Consider the following situation: $\varepsilon(i,i+1) = \varepsilon^\star - \Delta$ and $\varepsilon(i,i+k) = \varepsilon^\star + \Delta$ if $k\ge2$. To show that this situation yields a regret of order $\frac{N^2\log(N/\delta)}{\Delta_{\min}^2}$, we can assume that the true edges of the graph $G$ are kept in the estimated graph. To eliminate another edge, Comb UCB, which does not take the ranking structure into account, must have sampled it $\frac{\log(N/\delta)}{\Delta^2}$ times. This adds up to a total cost of $O\big(\frac{N^2\log(N/\delta)}{\Delta^2}\big)$. Checking that in this case $\Delta = \Delta_{\min}$ is straightforward.

F. Omitted Pseudo-codes

F.1. Pseudo-code of COMPARE

Algorithm 2 COMPARE(i, j, ε, δ)
Initialize: b;
1: for t ← 1 to b do
2:   Compare i and j; update w_{i,j} ← w_{i,j} + 1 if i wins;
3: end for
4: p̂_{i,j} ← w_{i,j}/b;
5: if p̂_{i,j} > 1/2 then
6:   Output i;
7: else
8:   Output j;
9: end if
10: if |p̂_{i,j} − 1/2| > 2ε then
11:   Output "Confident";
12: else
13:   Output "Not Confident";
14: end if
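A direct Python transcription of Algorithm 2, with the comparison oracle simulated by Bernoulli draws, reads as follows. The batch size b is not recoverable from the extracted pseudo-code; the Hoeffding-style choice b = ⌈log(2/δ)/(2ε²)⌉ used below is an assumption.

```python
import math
import random

def compare(win_prob, eps, delta, rng=random.Random(0)):
    """Sketch of Algorithm 2 (COMPARE) for two players i, j.

    win_prob = p(i, j), the (unknown) probability that i beats j; the
    comparison oracle is simulated here by Bernoulli draws. The batch size b
    is an assumption (Hoeffding-style choice), not taken from the paper.
    Returns (winner, confident) with winner in {"i", "j"}."""
    b = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    wins = sum(rng.random() < win_prob for _ in range(b))   # i's wins over b games
    p_hat = wins / b
    winner = "i" if p_hat > 0.5 else "j"
    confident = abs(p_hat - 0.5) > 2 * eps                  # test of line 10
    return winner, confident

if __name__ == "__main__":
    # Large gap: the output should typically be ("i", True).
    print(compare(win_prob=0.8, eps=0.05, delta=0.01))
    # Nearly balanced pair: typically not confident.
    print(compare(win_prob=0.52, eps=0.05, delta=0.01))
```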
F.2. Pseudo-code of ANCHORING

Algorithm 3 ANCHORING(N, r̂, ε, δ)
Input: r̂: ranking, ε > 0, δ > 0;
Initialize: L = U = [0, . . . , 0] ∈ ℝ^N, z = 0, G: empty graph (without edges);
1: while L ≠ [1, . . . , 1] or U ≠ [1, . . . , 1] do
2:   for i, j ∈ [N] such that |r̂(i) − r̂(j)| = 2^z do
3:     (k, string) = COMPARE({i, j}, ε, δ/(2N log N));   ▷ requires 2 perfect matchings;
4:     {ℓ} = {i, j} \ {k};
5:     if string = "Confident" then
6:       L(k) = 1; U(ℓ) = 1;
7:       add the edge {i, j} to the graph G;
8:     end if
9:   end for
10:  for i ∈ [N] such that r̂(i) < 2^z do
11:    L(i) = 1;
12:  end for
13:  for i ∈ [N] such that r̂(i) + 2^z > N do
14:    U(i) = 1;
15:  end for
16:  z = z + 1;
17: end while
18: return arg min W_{r̂}(A_1, . . . , A_ℓ) over paths (A_1, . . . , A_ℓ) of G such that r̂(A_i) ≤ r̂(A_{i+1}) for all i.

F.3. Sub-protocols used in Algorithm 1

Algorithm 4 CHECK(ε, δ)
Input: a ranking r̂ and its inverse σ̂ = r̂^{−1};
Initialize: L = [0, . . . , 0] ∈ ℝ^N;
1: for i ∈ [N] do
2:   if COMPARE({σ̂(i), σ̂(i+1)}, 2ε, δ/(N+1)) = (σ̂(i), "Confident") and COMPARE({σ̂(i), σ̂(i−1)}, 2ε, δ/(N+1)) = (σ̂(i−1), "Confident") then
3:     L[σ̂(i)] = 1;   ▷ σ̂(i) is correctly ranked;
4:   end if
5: end for   ▷ requires 2 perfect matchings;
6: return L;

Algorithm 5 GRP1((A_1, . . . , A_K), (C_1, . . . , C_{K+1}))
Input: (A_1, . . . , A_K): anchors; (C_1, . . . , C_{K+1}): clusters;
Initialize: A_0 = σ̂(1), U_0 = ∅, C_0 = ∅, A_{K+1} = σ̂(N).
1: for j ∈ {1, . . . , ⌊(K+1)/2⌋} do
2:   U_j ← C_{2j−1} ∪ {A_{2j−1}} ∪ C_{2j};
3:   if (size of U_j is odd) and (A_{2j−2} ∉ U_{j−1}) then
4:     U_j ← U_j ∪ {A_{2j−2}};
5:   else if (size of U_j is odd) then
6:     U_j ← U_j ∪ {A_{2j}};
7:   end if
8: end for
9: return (U_1, . . . , U_{⌊(K+1)/2⌋});

Algorithm 6 GRP2((A_1, . . . , A_K), (C_1, . . . , C_{K+1}))
Input: (A_1, . . . , A_K): anchors; (C_1, . . . , C_{K+1}): clusters;
Initialize: A_0 = σ̂(1), U_0 = ∅, C_0 = ∅, A_{K+1} = σ̂(N).
1: U_0 ← C_1;
2: if size of U_0 is odd then
3:   U_0 ← U_0 ∪ {A_1};
4: end if
5: for j ∈ {1, . . . , ⌊(K+1)/2⌋} do
6:   U_j ← C_{2j} ∪ {A_{2j}} ∪ C_{2j+1};
7:   if (size of U_j is odd) and (A_{2j−1} ∉ U_{j−1}) then
8:     U_j ← U_j ∪ {A_{2j−1}};
9:   else if (size of U_j is odd) then
10:    U_j ← U_j ∪ {A_{2j+1}};
11:  end if
12: end for
13: return (U_1, . . . , U_{⌊(K+1)/2⌋});

Algorithm 7 CLUSTERING((A_1, . . . , A_K), r̂)
Input: (A_1, . . . , A_K): anchors; r̂: ranking;
Initialize: A_0 = σ̂(1), A_{K+1} = σ̂(N).
1: for j ∈ {1, . . . , K+1} do
2:   C_j ← { i ∈ [N] : r̂(A_{j−1}) < r̂(i) < r̂(A_j) };
3: end for
4: return (C_1, . . . , C_{K+1});

Algorithm 8 MULTIAKS((U_1, . . . , U_L), ε, δ)
Input: (U_1, . . . , U_L): list of sets;
Output: (r̂_1, . . . , r̂_L): rankings on U_1, . . . , U_L;
1: for j ∈ {1, . . . , L} do
2:   r̂_j ← AKS(U_j, ε, δ/L);   ▷ this loop is run in parallel;
3: end for
4: return (r̂_1, . . . , r̂_L);

F.4. Pseudo-code of SAMPLEMATCHING

Algorithm 9 SAMPLEMATCHING
Input: graph G, counts (n_{i,i+1})_i ∈ ℕ, (r_{i,i+1})_i ∈ ℕ;
Output: M: maximal matching on G;
1: Construct C_1, . . . , C_k, the connected components of G;
2: for j ∈ [k] do
3:   M_j = arg min over maximal matchings m on C_j of max_{{i,i+1} ∈ m} r_{i,i+1}/n_{i,i+1} − u(n_{i,i+1});
4: end for
5: M = ∪_j M_j;
6: Complete M arbitrarily into a perfect matching of [N];
7: return M;

F.5. Pseudo-code of GAPESTIMATION

Algorithm 10 GAPESTIMATION
1: for i ∈ [N−1] do
2:   Set e_{i,i+1} = 1, n_{i,i+1} = r_{i,i+1} = l_{i,i+1} = 0;
3: end for
4: while True do
5:   G = ([N], {e_{i,i+1}});
6:   M = SAMPLEMATCHING(G, (n_{i,i+1})_i, (r_{i,i+1})_i);   ▷ requires 1 perfect matching;
7:   Query comparisons according to M;
8:   for i ∈ [N−1] do
9:     n_{i,i+1} ← n_{i,i+1} + 1({i, i+1} ∈ M);
10:    r_{i,i+1} ← r_{i,i+1} + 1(i won against i+1);
11:  end for
12:  for i ∈ [N−1] do
13:    ε̂(i, i+1) = r_{i,i+1}/n_{i,i+1} − 1/2;
14:    e_{i,i+1} = 1{ε̂(i, i+1) − u(n_{i,i+1}, δ) ≤ ε⋆};
15:    l_{i,i+1} = 1{|ε̂(i, i+1) − ε⋆| > u(n_{i,i+1}, δ)};
16:  end for
17:  if there is a maximal matching M⋆ of G whose edges e ∈ M⋆ all satisfy l_e = 1 then
18:    return M⋆;
19:    break;
20:  end if
21: end while
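The per-edge tests at lines 13–15 of Algorithm 10 can be sketched as follows. The confidence radius u(n, δ) is not specified in the extracted text, so the Hoeffding-style radius below is an assumption, as is the exact form of the kept/resolved indicators, which follows the reconstruction given above.

```python
import math

def hoeffding_radius(n, delta):
    """Assumed confidence radius u(n, delta); the exact form used by
    GAPESTIMATION is not given in the extracted pseudo-code."""
    return math.sqrt(math.log(2 / delta) / (2 * n)) if n > 0 else float("inf")

def edge_tests(r, n, eps_star, delta):
    """Sketch of lines 13-15 of Algorithm 10 for the edge {i, i+1}.

    r = number of wins of i against i+1, n = number of games played.
    Returns (eps_hat, e, l):
      e = 1 if the gap is still plausibly <= eps_star (edge kept in G),
      l = 1 if eps_star lies outside the confidence interval (edge resolved)."""
    u = hoeffding_radius(n, delta)
    eps_hat = r / n - 0.5 if n > 0 else 0.0
    e = int(eps_hat - u <= eps_star)
    l = int(abs(eps_hat - eps_star) > u)
    return eps_hat, e, l

if __name__ == "__main__":
    eps_star, delta = 0.1, 0.05
    # A nearly balanced pair after many games: kept (e = 1) and resolved (l = 1).
    print(edge_tests(r=510, n=1000, eps_star=eps_star, delta=delta))
    # A lopsided pair after many games: dropped (e = 0) and resolved (l = 1).
    print(edge_tests(r=800, n=1000, eps_star=eps_star, delta=delta))
```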