# Eliciting Kemeny Rankings

Anne-Marie George¹, Christos Dimitrakakis¹,²
¹University of Oslo, Norway; ²University of Neuchâtel, Switzerland
annemage@uio.no, christos.dimitrakakis@unine.ch

We formulate the problem of eliciting agents' preferences with the goal of finding a Kemeny ranking as a Dueling Bandits problem. Here the bandits' arms correspond to alternatives that need to be ranked and the feedback corresponds to a pairwise comparison between alternatives by a randomly sampled agent. We consider both sampling with and without replacement, i.e., the possibility to ask the same agent about some comparison multiple times or not. We find approximation bounds for Kemeny rankings dependent on confidence intervals over estimated winning probabilities of arms. Based on these we state algorithms to find Probably Approximately Correct (PAC) solutions and elaborate on their sample complexity for sampling with or without replacement. Furthermore, if all agents' preferences are strict rankings over the alternatives, we provide means to prune confidence intervals and thereby guide a more efficient elicitation. We formulate several adaptive sampling methods that use look-aheads to estimate how much confidence intervals (and thus approximation guarantees) might be tightened. All described methods are compared on synthetic data.

1 Introduction

Decision making based on the preferences of a group of individuals is tackled in various settings by AI systems. To ensure a good societal outcome, the fair aggregation of the society members' preferences over alternatives plays a central role. Many such aggregation functions for agents' or voters' preferences have been proposed by politicians, mathematicians, economists and computer scientists. Research in the area of Social Choice, and more specifically Voting Theory, analyses and develops such aggregation functions algorithmically and axiomatically (Brandt et al. 2016). While impossibility results force all such functions to be flawed by computational complexity or violation of fairness guarantees, e.g. (Arrow 1950; Satterthwaite 1975), many are well researched and applied in practice. However, it is usually assumed that voters specify their preferences completely before the desired aggregation can be determined. This can be excessively time consuming or costly for voters, especially for large numbers of alternatives as in movie or online-shopping platforms. Next, we describe our contributions for efficiently eliciting aggregated rankings under Kemeny's rule, related work, and the outline of this paper.

Contributions and Related Work

We formulate the problem of eliciting voter preferences as a Dueling Bandit problem (Bengs et al. 2021) in which arms of bandits correspond to alternatives, and the bandit feedback for pulling two arms corresponds to a random voter's preference over the two arms. This feedback is then summarised in a matrix of winning probabilities over arms. Our goal is to elicit an aggregated ranking of voter preferences based on some voting rule. Here, the dueling bandit formulation is applicable for any voting rule that is a C2 function according to Fishburn's classification of preference aggregation functions (Fishburn 1977). That is, functions that only require the margin of voters that prefer one alternative over another as input, and not the exact combinations of pairwise preferences the voters have.
For such functions, we can thus assume the input to be a preference matrix indicating the fraction of voters preferring one alternative over another for all pairs of alternatives. The results by (Boehmer et al. 2022) imply that it is NP-hard to decide whether a given matrix corresponds to a set of voters' preference orders (i.e., linear orders) over the alternatives. It follows that it is not possible to find a characterisation of such matrices that can be checked in polynomial time. Nevertheless, we identify properties of such matrices.

Contribution 1: In Section 2, we connect the vote elicitation and dueling bandit frameworks for C2 functions, and propose three properties of preference matrices which will help in devising adaptive elicitation strategies: Completeness, Triangle Inequality and Realisable Borda Scores.

Next, we focus our attention on the voting rule devised by John Kemeny in 1959 (Kemeny 1959), which is a preference aggregation function with compelling properties such as monotonicity and Condorcet winner consistency (Young and Levenglick 1978). The output of this rule is a set of rankings over the alternatives which minimise the Kemeny score. The Kemeny score measures voter disagreement in terms of pairwise disagreement over the ranked alternatives, i.e., Kendall's tau distance (Kendall 1938). In general, it is NP-hard to compute Kemeny rankings (Bartholdi, Tovey, and Trick 1989), but they are polynomial-time computable for specific types of preferences, such as single-peaked ones (Brandt et al. 2015). A perk of Kemeny's rule, as a C2 function, is that it can be readily applied to any matrix. That is, even when voters only specify parts of their preferences, i.e., an incomplete order over the alternatives, or when their preferences are non-transitive. The corresponding matrix in that case only contains fractions of voters that specified their preference over two alternatives. This enables us to compute an approximate Kemeny ranking even when not all preferences have been elicited yet. Our goal then is to sample pairs of arms (i.e., query voters to compare two alternatives) such that the approximate Kemeny ranking based on the sample means is close (for some distance measure) to an optimal Kemeny ranking. Note that while approximation algorithms exist for Kemeny rankings, e.g., (Betzler et al. 2008), the approximation bounds usually concern the approximation w.r.t. a given instance. Contrarily, we consider the approximation of Kemeny rankings w.r.t. a similar instance while applying an exact method.

Contribution 2: In Section 3, we formulate approximation bounds for (1) the Kendall tau distance and (2) the Kemeny score difference between the true Kemeny ranking of a matrix of winning probabilities and the Kemeny ranking of an approximate matrix. These bounds are based on confidence intervals around sample means for the winning probabilities.

Note that bandit problems with the goal of finding a ranking over the arms (as opposed to one or several winning arms) have been analysed for the C2 functions Borda and Copeland, see e.g. (Busa-Fekete, Szörényi, and Hüllermeier 2014; Falahatgar et al. 2017; Lin and Lu 2018), and for other winner concepts such as von Neumann winners (Balsubramani et al. 2016). These works establish probably approximately correct (PAC) algorithms and analyse their sample complexity. To the best of our knowledge, such considerations do not exist for Kemeny rankings. In fact, (Bengs et al. 2021) propose Kemeny rankings as a direction for future work.
Also, no related work considers sampling from a fixed population without replacement (as would be reasonable when given a set of voters which shall not be queried twice about the same pair of alternatives). What is more, approximations of rankings w.r.t. Kemeny scores (i.e., agents' disagreement) have not been considered under any aggregation function so far.

Contribution 3: In Section 4, we formulate PAC algorithms for Kemeny rankings w.r.t. Kemeny scores of a matrix of winning probabilities for two distinct cases of sampling methods: (1) sampling bandit feedback from voters u.a.r., i.e., sampling with replacement, and (2) sampling bandit feedback from a hypergeometric distribution, i.e., sampling without replacement, meaning that voters cannot be asked multiple times about the same pair of alternatives. These PAC algorithms are based on our approximation results (see Contribution 2) and do not assume any properties for the matrix of winning probabilities. We give concrete sample complexities for desired approximation values and probabilities.

Contribution 4: In Section 5, we show how to prune confidence intervals for winning probabilities of arms based on the properties of preference matrices (see Contribution 1), i.e., for the case that the matrix originates from a set of (linear) preference orders. We discuss several adaptive sampling methods that use look-aheads to anticipate possible effects of pruning. Finally, we analyse the sample complexity experimentally on synthetic data for all developed methods.

An extended version of this paper, including more details, experimental results and all proofs of results marked with (★), can be found under https://arxiv.org/abs/2312.11663.

2 Voting and Bandits

In a voting setting, we assume a set N of voters and a set C = [k] of k candidates or alternatives. Let L(C) be the set of all linear orders or rankings over C. We usually consider the preferences of the voters over the alternatives to be linear orders ≻_v ∈ L(C) for v ∈ N. A typical goal in this setting is to aggregate the preferences of voters into an outcome ranking or to determine a winner. A social preference function is a function that maps a preference profile P = {≻_v | v ∈ N} to one or several output rankings. One example of such a rule is Kemeny's rule, which aims to minimise the disagreement of the voters with the output ranking.

In a dueling bandits setting, we have a set C = [k] of k arms of bandits. In every time step, we pull two arms, yielding feedback of which one is better. That is, when pulling arms i and j we get feedback i ≻ j with some (unknown) probability q_ij and feedback j ≻ i with inverse probability q_ji = 1 − q_ij. The winning probabilities can be summarised in a matrix Q ∈ [0, 1]^{k×k} where by convention q_ii = 0.5 for all i ∈ [k]. Let Q(k) = {Q ∈ [0, 1]^{k×k} | q_ji = 1 − q_ij and q_ii = 0.5 for all i, j ∈ [k]} be the set of such matrices. We assume the true matrix of winning probabilities is unknown and must be approximated, with the ultimate goal of finding a good ranking of the arms. This goal can be expressed by a measure of regret, and the literature in this field usually analyses regret bounds in the number of samples.

To connect the two settings, we make the following observation: A preference profile P ∈ L(C)^{|N|} of voters N over alternatives C = [k] can be translated to a preference matrix Q ∈ Q(k) by defining q_ij = |{v ∈ N | i ≻_v j}| / |N| and q_ii = 0.5 for all i, j ∈ [k]. Thus, the alternatives are seen as arms of bandits and the winning probabilities q_ij are equal to the fraction of voters preferring alternative i over j.
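To make this translation concrete, the following minimal Python sketch builds a preference matrix from a profile of linear orders. The function name `preference_matrix` and the list-of-rankings input format are our own illustrative choices, not part of the paper's published code.

```python
from itertools import combinations

def preference_matrix(profile, k):
    """Preference matrix Q from a profile of linear orders.

    `profile` is a list of rankings, each a list of the k alternatives
    (labelled 0..k-1) from most to least preferred.  Q[i][j] is the
    fraction of voters ranking i above j; diagonal entries are 0.5.
    """
    n = len(profile)
    Q = [[0.5] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        wins_i = sum(1 for ranking in profile if ranking.index(i) < ranking.index(j))
        Q[i][j] = wins_i / n
        Q[j][i] = 1 - Q[i][j]
    return Q

# Three voters over three alternatives: two report 0 > 1 > 2, one reports 2 > 1 > 0.
print(preference_matrix([[0, 1, 2], [0, 1, 2], [2, 1, 0]], 3))
```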
We denote the set of such preference matrices that originate from preference profiles by P(k, n), where k is the number of alternatives and n the number of voters. Because voters' preferences are linear orders, and thus transitive, P(k, n) ⊆ Q(k). We show further properties of such matrices.

Lemma 1 (★). Let P = {≻_v ∈ L([k]) | v ∈ [n]} be a profile of n voters and k alternatives. Then Q ∈ P(k, n) with q_ij = (1/n)·|{v ∈ [n] | i ≻_v j}| and q_ii = 0.5 satisfies:
1. n·q_ij ∈ {0, 1, . . . , n} and q_ij + q_ji = 1 for all distinct i, j ∈ [k]. (Completeness)
2. q_lj + q_ji ≥ q_li for all distinct i, j, l ∈ [k]. (Triangle Inequality)
3. Σ_{i∈A} Σ_{j∈[k]\{i}} q_ij ≤ |A|·(k − (|A|+1)/2) for all A ⊆ [k]. (Realisable Borda Scores)

The next example shows that mapping preference profiles to preference matrices is not injective.

Example 1. Consider the two preference profiles P1, P2 over three alternatives below.
P1 = {1 ≻ 2 ≻ 3, 1 ≻ 2 ≻ 3, 3 ≻ 2 ≻ 1}
P2 = {1 ≻ 2 ≻ 3, 2 ≻ 1 ≻ 3, 3 ≻ 1 ≻ 2}.
The two profiles map to the same preference matrix Q:
Q = ( 1/2  2/3  2/3 ;  1/3  1/2  2/3 ;  1/3  1/3  1/2 ).

3 Kemeny's Rule

Kemeny's rule outputs (after applying some tie-breaking) a strict ranking of alternatives / arms that minimises the disagreement of the voters. To measure disagreement of voters with a ranking we can apply Kendall's tau distance (or "inversion distance"). The Kendall-tau distance between two preference orders ≻_v, ≻_w ∈ L(C) is defined as follows (where 𝟙 is the indicator function):
d_Kt(≻_v, ≻_w) = Σ_{{i,j}⊆C} 𝟙(i ≻_v j)·𝟙(j ≻_w i) = Σ_{(i,j): i ≻_v j} 𝟙(j ≻_w i).
The Kemeny score of a ranking ≻ ∈ L(C) given the preference rankings of voters N is defined as KS({≻_v}_{v∈N}, ≻) = Σ_{v∈N} d_Kt(≻, ≻_v), and the set of Kemeny rankings (before tie-breaking) is K({≻_v}_{v∈N}) = argmin_{≻ ∈ L(C)} KS({≻_v}_{v∈N}, ≻).

In the same spirit of measuring disagreement, we can define the Kemeny rankings for a given matrix Q ∈ [0, 1]^{k×k}, where q_ij is the probability that arm i beats arm j. The Kemeny score of a ranking ≻ ∈ L(C) with C = [k] is proportional to the probability that the outcome of a random sample of a pair of arms is not aligned with ≻, i.e., KS(Q, ≻) = Σ_{(i,j): i ≻ j} q_ji, and the set of Kemeny rankings is K(Q) = argmin_{≻ ∈ L(C)} KS(Q, ≻).

In general, Kemeny's rule is not resolute, i.e., the minimum Kemeny score may be attained by several rankings. A tie-breaking rule allows us to choose one among the rankings that minimise the Kendall-tau distance. In this paper, we assume ties are broken deterministically whenever necessary.

Given only an approximation of a preference matrix Q, we can also only approximate its Kemeny ranking. To measure the quality of such an approximation, two natural distance measures for such rankings can be applied: the Kendall-tau distance between the true and the approximate ranking, or the difference in their Kemeny scores w.r.t. the true matrix Q. In the following, we analyse approximation guarantees based on both measures.

While Kemeny's rule is NP-hard to compute, there exist polynomial-time approximation algorithms (Betzler et al. 2008). For an approximate Kemeny ranking of an approximate matrix, we can simply combine the approximation bounds. Assume matrix Q has Kemeny ranking τ and its approximation matrix Q̂ has Kemeny ranking τ̂ that is a ρ-approximation w.r.t. Kemeny scores, i.e., KS(Q, τ̂) ≤ KS(Q, τ) + ρ. Then an additive α-approximation τ′ of τ̂, i.e., KS(Q̂, τ′) ≤ KS(Q̂, τ̂) + α, yields KS(Q, τ′) ≤ KS(Q, τ) + ρ + α. Similarly, a multiplicative λ-approximation τ′ with KS(Q̂, τ′) ≤ (1 + λ)·KS(Q̂, τ̂) yields KS(Q, τ′) ≤ (1 + λ)·KS(Q, τ) + (1 + λ)·ρ.
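The definitions above translate directly into code. Below is a small exhaustive-search sketch for KS(Q, ·) and K(Q) with deterministic (lexicographic) tie-breaking; it is only practical for small k, since it enumerates all k! rankings, and the function names are our own.

```python
from itertools import permutations

def kendall_tau(r1, r2):
    """d_Kt: number of pairs ordered differently by rankings r1 and r2 (best first)."""
    pos1 = {a: p for p, a in enumerate(r1)}
    pos2 = {a: p for p, a in enumerate(r2)}
    return sum(1 for a in r1 for b in r1
               if pos1[a] < pos1[b] and pos2[a] > pos2[b])

def kemeny_score(Q, ranking):
    """KS(Q, ranking): sum of q_ji over all pairs (i, j) where i is ranked above j."""
    pos = {a: p for p, a in enumerate(ranking)}
    k = len(Q)
    return sum(Q[j][i] for i in range(k) for j in range(k)
               if i != j and pos[i] < pos[j])

def kemeny_ranking(Q):
    """A Kemeny ranking of Q, ties broken lexicographically (exhaustive search)."""
    k = len(Q)
    return min(permutations(range(k)), key=lambda r: (kemeny_score(Q, r), r))
```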
Kendall-tau Distance of Approx. Kemeny Rankings

Consider two rankings τ, τ̂ ∈ L(C) with |C| = k. If the two rankings are equal, their Kendall-tau distance is 0; if they are reversed, their Kendall-tau distance is maximal, i.e., (k−1)k/2. More generally, the Kendall-tau distance of any two rankings τ, τ̂ ∈ L(C) with |C| = k is an integer in {0, . . . , (k−1)k/2}. The following lemma shows that the Kendall-tau distance of the Kemeny rankings K(Q) and K(Q̂) of a preference matrix Q and a close approximation Q̂ of Q can be maximal. This is proven by construction of an example with values in Q that are very close to 0.5. Thus, no matter the number of samples, when considering the Kemeny ranking based on the sample means, we cannot guarantee any improvement in terms of Kendall-tau distance.

Lemma 2 (★). For any small¹ ε > 0, there exist matrices Q, Q̂ ∈ Q(k) with ‖Q − Q̂‖₁ = ε and τ = K(Q) and τ̂ = K(Q̂) such that d_Kt(τ, τ̂) = (k−1)k/2.

¹For the construction in our proof, we require ε to be sufficiently small in terms of (k−1)k.

If the matrix of winning probabilities is indeed a preference matrix, i.e., reflecting fractions of n voters, then a reasonable restriction is to ask every voter at most once about their preference over a pair of arms. Disregarding inference reached through transitivity of voter preferences, a total of n·(k−1)k/2 queries is necessary to recover the true preference matrix. However, if values in the preference matrix are close to 0.5, even asking (n−1)·(k−1)k/2 queries can still lead to an approximate Kemeny ranking with maximal Kendall-tau distance to the true Kemeny ranking.

Lemma 3 (★). For any number of arms k > 2 and number of voters n > 2, there exist preference profiles with corresponding preference matrices Q ∈ P(k, n) and Q̂ ∈ P(k, n−1) with ‖Q − Q̂‖₁ = (k−1)k / (2(n−1)) and τ = K(Q) and τ̂ = K(Q̂) such that d_Kt(τ, τ̂) = (k−1)k/2.

This shows that especially for large numbers of voters, the approximation error for approximating matrix Q can be very small while the Kendall-tau distance is maximal. A similar result can be obtained when querying voters about all but one pair of alternatives. Consider a Condorcet cycle for which q₁₂ = q₂₃ = q₃₁ = 2/3. Then the tie-breaking rule, e.g., 1 ≻ 2 ≻ 3, coincides with the Kemeny ranking, with a score of 4/3. When eliciting all preferences over all pairs of arms but (2, 3), and initialising q₂₃ = q₃₂ = 0.5, the approximated Kemeny ranking is 3 ≻ 2 ≻ 1, independent of tie-breaking, yielding a maximal KT-distance.

The results in this section are based on examples for which Kemeny rankings have very high Kemeny scores. Thus, while yielding very different Kemeny rankings, the matrices in the proofs also yield similar voter satisfaction. In the next section, we concentrate on bounds of Kemeny score differences between Kemeny rankings of similar matrices.

Kemeny Score of Approx. Kemeny Rankings

We can bound the difference of Kemeny scores for two Kemeny rankings of similar matrices.

Lemma 4 (★). Let Q, Q̂ ∈ Q(k) with ‖Q − Q̂‖₁ ≤ ε. Furthermore, let τ be a Kemeny ranking of Q and τ̂ a Kemeny ranking of Q̂. Then KS(Q, τ̂) − KS(Q, τ) ≤ ε.

Note that because τ is a Kemeny ranking w.r.t. Q, the difference of Kemeny scores of τ and τ̂ cannot be negative. Lemma 4 shows that with a close approximation of a matrix we can obtain a Kemeny ranking that approximates the Kemeny score, i.e., minimal voter disagreement, well.
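Lemma 4 is easy to sanity-check numerically. The sketch below perturbs random winning-probability matrices and verifies that the Kemeny-score gap never exceeds the L1 distance; it reuses the hypothetical `kemeny_score` and `kemeny_ranking` helpers from the sketch above.

```python
import random

def l1_distance(Q, Qh):
    k = len(Q)
    return sum(abs(Q[i][j] - Qh[i][j]) for i in range(k) for j in range(k) if i != j)

def check_lemma4(k=4, noise=0.05, trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        Q = [[0.5] * k for _ in range(k)]
        Qh = [[0.5] * k for _ in range(k)]
        for i in range(k):
            for j in range(i + 1, k):
                Q[i][j] = rng.random()
                Q[j][i] = 1 - Q[i][j]
                Qh[i][j] = min(1.0, max(0.0, Q[i][j] + rng.uniform(-noise, noise)))
                Qh[j][i] = 1 - Qh[i][j]
        tau, tau_hat = kemeny_ranking(Q), kemeny_ranking(Qh)
        gap = kemeny_score(Q, tau_hat) - kemeny_score(Q, tau)
        assert -1e-9 <= gap <= l1_distance(Q, Qh) + 1e-9  # Lemma 4: 0 <= gap <= ||Q - Qh||_1
```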
When sampling preferences, or more specifically feedback from dueling bandits, we can express the value of our approximation of the winning matrix by confidence intervals for matrix entries. For this type of uncertainty quantification, we can adjust the approximation bounds from Lemma 4.

Corollary 1. Let Q, Q̂ ∈ Q(k). Consider Kemeny rankings τ of Q and τ̂ of Q̂. Suppose q_ij ∈ [q̂_ij − c_ji, q̂_ij + c_ij] for some matrix C ∈ R_{≥0}^{k×k} holds for all i, j ∈ [k], i ≠ j, with probability (1 − δ). Then, with probability (1 − δ), KS(Q, τ̂) − KS(Q, τ) ≤ 2 Σ_{1≤i<j≤k} max(c_ij, c_ji).

Sampling With Replacement

Proposition 1 (★). Consider k ≥ 2 arms with matrix of winning probabilities Q ∈ Q(k). Let δ < 0.5 be the approximation probability and y := ln(k(k−1)/δ) > 1. Furthermore, let Q̂ ∈ Q(k) be the matrix of sample averages after t samples (with replacement) of each pair of arms. Then c^t_δ = √(y/(2t)) is a (1 − δ)-confidence bound for all entries in Q̂. Then, with probability (1 − δ), we have KS(Q, τ̂) − KS(Q, τ) ≤ k(k−1)·c^t_δ for Kemeny ranking τ of Q and Kemeny ranking τ̂ of Q̂ + 𝟙·c^t_δ.

Given a desired approximation rate ρ, we can reformulate Proposition 1 and obtain the necessary number of samples.

Corollary 2. For k ≥ 2 arms, Q ∈ Q(k) with Kemeny ranking τ, fixed approximation guarantee 0 < ρ ≤ k(k−1)/2 and probability δ, Algorithm 1 is a (δ, ρ)-PAC algorithm. That is, after t = ⌈x²y/2⌉ samples of each pair of arms, Algorithm 1 returns ranking τ̂ such that, with probability (1 − δ), we have KS(Q, τ̂) − KS(Q, τ) ≤ ρ. Here x := k(k−1)/ρ and y := ln(k(k−1)/δ) > 1.

Sampling Without Replacement

Suppose that, while sampling voters uniformly at random, none of the voters v ∈ N can be asked twice about their preferences over a pair of arms. Then the number s out of t asked voters that prefer i to j follows a hypergeometric distribution: s ∼ Hypergeometric(n, n·q_ij, t). In this case, we can refine the confidence bounds of our sampled estimation of the true fractions of voters q_ij preferring i to j. This leads to a better approximation for a fixed number of samples, or conversely, a lower sample complexity for a fixed approximation goal, compared to sampling with replacement. More concretely, it follows from the next propositions that Algorithm 2 is a PAC algorithm for Kemeny rankings w.r.t. Kemeny scores when sampling without replacement.

Algorithm 2: Kemeny_El(Q, ρ, δ): Sampling w/o Replacement from Q ∈ P(k, n)
  Q̂ ← {1/2}^{k×k}  ▷ Estimate of Q
  x := k(k−1)/ρ, y := ln(k(k−1)/δ)  ▷ Constants
  if n < (x²y + 4)/2 then
    t := (2n + x²y·n)/(2n + x²y), c := √((n−t)(t+1)·y/(2t²n))  ▷ Sample size and confidence bound (small n)
  else
    t := x²y·(n+1)/(2n + x²y), c := √((n−t+1)·y/(2tn))  ▷ Sample size and confidence bound (large n)
  end if
  for arms i, j ∈ [k] do
    s ∼ Hypergeometric(n, n·q_ij, t)
    q̂_ij ← s/t
    q̂_ji ← 1 − q̂_ij
  end for
  return K(Q̂ + 𝟙·c)

Proposition 2 (★). Consider n > 1 voters with preference matrix Q ∈ P(k, n) over k ≥ 2 alternatives. Let δ < 0.5 be the approximation probability and y := ln(k(k−1)/δ) > 1. Furthermore, let Q̂ ∈ Q(k) be the matrix of sample averages after t uniformly random samples from the population for each pair of arms (without replacement). Then c^t_{δ,n} = √((n−t+1)·y/(2tn)) is a (1 − δ)-confidence bound for all entries in Q̂. Furthermore, with probability (1 − δ) we have KS(Q, τ̂) − KS(Q, τ) ≤ k(k−1)·c^t_{δ,n} for Kemeny ranking τ of Q and Kemeny ranking τ̂ of Q̂ + 𝟙·c^t_{δ,n}. For given ρ, we can guarantee an approximation KS(Q, τ̂) − KS(Q, τ) ≤ ρ with probability (1 − δ) after t = x²y·(n+1)/(2n + x²y) samples (without replacement) of each pair of arms, where x := k(k−1)/ρ.
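A sketch of the sampling-without-replacement elicitation in the spirit of Algorithm 2 follows. The sample-size and confidence-bound expressions are taken from the reconstructed formulas above and should be treated as assumptions; the function names and the list-of-rankings voter representation are ours, and `kemeny_ranking` is the exhaustive helper from the earlier sketch.

```python
import math
import random

def sample_size_and_confidence(k, n, rho, delta):
    """Per-pair sample size t and confidence bound c for sampling without replacement."""
    x = k * (k - 1) / rho
    y = math.log(k * (k - 1) / delta)
    if n < (x * x * y + 4) / 2:       # small population: more than half of the voters per pair
        t = math.ceil((2 * n + x * x * y * n) / (2 * n + x * x * y))
        c = math.sqrt((n - t) * (t + 1) * y / (2 * t * t * n))
    else:                             # large population
        t = math.ceil(x * x * y * (n + 1) / (2 * n + x * x * y))
        c = math.sqrt((n - t + 1) * y / (2 * t * n))
    return t, c

def elicit_without_replacement(profile, k, rho, delta, seed=0):
    """Query t distinct voters per pair and return the Kemeny ranking of Q_hat + c."""
    n = len(profile)
    rng = random.Random(seed)
    t, c = sample_size_and_confidence(k, n, rho, delta)
    Qh = [[0.5] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            voters = rng.sample(range(n), t)   # no voter asked twice about the pair (i, j)
            s = sum(1 for v in voters if profile[v].index(i) < profile[v].index(j))
            Qh[i][j], Qh[j][i] = s / t, 1 - s / t
    Q_opt = [[Qh[i][j] + (c if i != j else 0) for j in range(k)] for i in range(k)]
    return kemeny_ranking(Q_opt)
```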
For the case that we have more samples than half of the population size, i.e., t > n/2, and a limited population size, we can apply better bounds on the confidences over the winning probabilities. This leads to the following result.

Proposition 3 (★). Assume a fixed voter population of size n > 1 with preference matrix Q ∈ P(k, n) over k ≥ 2 alternatives. Let δ < 0.5 be the approximation probability and y := ln(k(k−1)/δ) > 1. Furthermore, let Q̂ ∈ Q(k) be the matrix of sample averages after t > n/2 uniformly random samples for each pair of arms (without replacement). Then c^t_{δ,n} = √((n−t)(t+1)·y/(2t²n)) is a (1 − δ)-confidence bound for all entries in Q̂. Furthermore, with probability (1 − δ) we have KS(Q, τ̂) − KS(Q, τ) ≤ k(k−1)·c^t_{δ,n} for Kemeny ranking τ of Q and Kemeny ranking τ̂ of Q̂ + 𝟙·c^t_{δ,n}. For given ρ, and if the population is of size n < (x²y + 4)/2, we can guarantee an approximation KS(Q, τ̂) − KS(Q, τ) ≤ ρ with probability (1 − δ) after t = (x²y·n + 2n)/(2n + x²y) samples (without replacement) of each pair of arms, where x := k(k−1)/ρ.

Comparison

Note that Propositions 2 and 3 use improved confidence bounds for the estimated winning probabilities compared to Proposition 1. This is not surprising, since we are sampling from a fixed population without replacement, and expect the confidence bound to be dependent on the relation between the number of samples and the population size. Furthermore, unsurprisingly, this improves the sample complexity. We can quantify the advantage as follows.

Corollary 3. Let x := k(k−1)/ρ and y := ln(k(k−1)/δ) > 1. Consider the confidence bounds after t samples from Proposition 1, Proposition 2 and Proposition 3:
c = √(y/(2t)), c′ = √((n−t+1)·y/(2tn)), c″ = √((n−t)(t+1)·y/(2t²n)).
Then we have c′ = √((n−t+1)/n)·c and c″ = √((n−t)(t+1)/(tn))·c, i.e., tighter confidence bounds when sampling without replacement. Furthermore, c′ > c″ if and only if t > n/2. For a fixed approximation guarantee ρ with k(k−1)/n ≤ ρ, consider the sample complexities from Propositions 1, 2 and 3:
t = x²y/2, t′ = x²y·(n+1)/(2n + x²y), t″ = (x²y·n + 2n)/(2n + x²y).
Then we have t′ = (2(n+1)/(2n + x²y))·t and t″ = ((2n + 4n/(x²y))/(2n + x²y))·t. Under the reasonable assumption that 2/n < ρ, k ≥ 2 and n ≥ 2, this means that fewer samples are needed when sampling without replacement than with replacement. Furthermore, for n < (x²y + 4)/2 we have t > t″.

This comparison also shows that the more samples are taken, the starker the difference between confidence bounds when sampling with and without replacement. Furthermore, in order to reach an approximation of ρ, the smaller ρ and δ, the fewer samples we need when sampling without replacement compared to sampling with replacement.
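For intuition, the quantities in Corollary 3 can be tabulated directly. The sketch below evaluates the three confidence bounds and the three sample complexities for concrete values; the formulas follow the reconstructed expressions above and the function name is ours.

```python
import math

def corollary3_quantities(k=6, n=10, rho=1.5, delta=0.05, t=8):
    """Confidence bounds after t samples and per-pair sample complexities (Props 1-3)."""
    x = k * (k - 1) / rho
    y = math.log(k * (k - 1) / delta)
    c   = math.sqrt(y / (2 * t))                              # with replacement
    c1  = math.sqrt((n - t + 1) * y / (2 * t * n))            # without replacement
    c2  = math.sqrt((n - t) * (t + 1) * y / (2 * t * t * n))  # without replacement, t > n/2
    T   = x * x * y / 2
    T1  = x * x * y * (n + 1) / (2 * n + x * x * y)
    T2  = (x * x * y * n + 2 * n) / (2 * n + x * x * y)
    return {"bounds": (c, c1, c2), "complexities": (T, T1, T2)}

# With n = 10 and t = 8 > n/2, these values give c > c1 > c2 and T > T1 > T2,
# illustrating the advantage of sampling without replacement.
```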
5 Pruning of Confidence Intervals

In Section 4 we estimated the confidence intervals of the winning probabilities q_ij by the use of some known concentration inequalities. This treatment delivered symmetric bounds that, given the same number of samples, are the same for all pairs of arms i, j ∈ [k]. However, the values q_ij of a preference matrix Q ∈ P(k, n) are not independent. In fact, Lemma 1 implies that q_ij + q_ji = 1 for all i, j ∈ [k]. We can restrict confidence intervals to comply with this constraint.

Lemma 6 (★). Assume that the sampled means q̂_ij after some samples from a preference matrix Q ∈ P(k, n) have (possibly asymmetric) confidence intervals: q_ij ∈ [q̂_ij − c_ij, q̂_ij + c_ij] for all arms i, j ∈ [k] with probability (1 − δ). Define matrix C̄ ∈ [0, 1]^{k×k} by c̄_ij := min(c_ij, c_ji). Then with probability (1 − δ) for all arms i, j ∈ [k]:
q_ij ∈ [q̂_ij − c̄_ji, q̂_ij + c̄_ij].   (1)

Lemma 1 also implies that for a preference matrix Q ∈ P(k, n) a triangle inequality must hold for all triples of arms: q_lj + q_ji ≥ q_li for all i, j, l ∈ [k]. We can comply with this constraint by pruning confidence intervals further.

Lemma 7 (★). Assume that the sampled means q̂_ij after some samples from a preference matrix Q ∈ P(k, n) have (possibly asymmetric) confidence intervals: q_ij ∈ [q̂_ij − c_ji, q̂_ij + c_ij] for all arms i, j ∈ [k] with probability (1 − δ). Define matrix C̄ ∈ [0, 1]^{k×k} by c̄_ij := min(c_ij, min_{l∈[k]} (q̂_il + c_il + q̂_lj + c_lj − q̂_ij)). Then with probability (1 − δ) for all arms i, j ∈ [k]:
q_ij ∈ [q̂_ij − c̄_ji, q̂_ij + c̄_ij].   (2)

An example of pruning confidence intervals can be found in Appendix D of (George and Dimitrakakis 2023). After pruning intervals once, it might be feasible to prune the new confidence intervals even more. This procedure is described by Algorithm 3.

Algorithm 3: Pruning_CI(Q̂, C): Pruning confidence intervals q_ij ∈ [q̂_ij − c_ij, q̂_ij + c_ij] for all i, j ∈ [k] of preference matrix Q ∈ P(k, n)
  c_ij ← min(c_ij, c_ji) for all i, j ∈ [k]  ▷ Upper/lower confidence bounds based on Lemma 6
  c_ij ← min(c_ij, 1 − q̂_ij) for all i, j ∈ [k]  ▷ Upper/lower confidence bounds based on q_ij ∈ [0, 1]
  C̄ ← C
  repeat
    C ← C̄
    for arms i, j ∈ [k] do
      c̄_ij ← min(c_ij, min_{l∈[k], l≠i,j} (q̂_il + c_il + q̂_lj + c_lj − q̂_ij))
      c̄_ij ← max(−q̂_ij, c̄_ij)
    end for
  until C == C̄
  return C̄

Note that Lemma 7 allows c̄_ij to be negative, indicating that the sample mean q̂_ij overestimates the actual winning probability q_ij. This does not influence the approximation result given in Lemma 5. However, we require that 1 ≥ c̄_ij + q̂_ij ≥ 0, which is enforced in Algorithm 3 by the updates c_ij ← min(c_ij, 1 − q̂_ij) and c̄_ij ← max(−q̂_ij, c̄_ij). Algorithm 3 converges because in every iteration of the repeat-loop at least one value c̄_ij strictly decreases and values never increase, while maintaining a lower bound of −q̂_ij. Furthermore, the initial confidence bounds and sample averages build a discrete set of values, so there are only countably many combinations that can attain the minimum in the updates.
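A compact Python sketch of the iterative pruning is given below. It combines the symmetrisation of Lemma 6, the clipping to [0, 1], and the triangle-inequality update of Lemma 7, and sweeps until no bound shrinks further. Unlike Algorithm 3, it updates bounds in place within a sweep; that simplification and the function name are ours.

```python
def prune_confidence_intervals(Qh, C, tol=1e-12):
    """Prune confidence bounds C for sample means Qh of a preference matrix.

    The interval for q_ij is [Qh[i][j] - C[j][i], Qh[i][j] + C[i][j]].
    Uses q_ij + q_ji = 1 (Lemma 6), q_ij in [0, 1], and the triangle
    inequality q_ij <= q_il + q_lj (Lemma 7).
    """
    k = len(Qh)
    # Lemma 6 symmetrisation and clipping of the upper bound to 1:
    C = [[min(C[i][j], C[j][i], 1 - Qh[i][j]) for j in range(k)] for i in range(k)]
    changed = True
    while changed:
        changed = False
        for i in range(k):
            for j in range(k):
                if i == j:
                    continue
                via = min((Qh[i][l] + C[i][l] + Qh[l][j] + C[l][j] - Qh[i][j]
                           for l in range(k) if l not in (i, j)), default=C[i][j])
                new = max(-Qh[i][j], min(C[i][j], via))   # keep Qh[i][j] + C[i][j] >= 0
                if new < C[i][j] - tol:
                    C[i][j] = new
                    changed = True
    return C
```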
We can now refine the elicitation methods from Section 4.

Adaptive Sampling Strategies

Recall that our approximation bounds for Kemeny rankings from Section 3 are determined by the confidence intervals' sizes. Furthermore, the confidence bounds for sampling with or without replacement given in Section 4 can only be influenced by the number of pulls of the pairs of arms. Thus, uniform sampling guarantees to always sample from a pair of arms with the largest confidence interval. However, when pruning confidence intervals after every new sample, the most uncertain pair of arms, i.e., one with the largest confidence interval, might not necessarily be one with the lowest number of pulls so far. We can also estimate the outcome and effect of the next ℓ samples. To simplify notation and computational complexity, we concentrate on such look-aheads with ℓ = 1. This allows the following sampling strategies:

- Uniform: Sample a pair with minimal number of pulls.
- Opportunistic: Sample a pair with maximal confidence interval size.
- Optimistic: Sample a pair which, for the best-case outcome, gives a maximal reduction in the sum of all confidence interval sizes.
- Pessimistic: Sample a pair which, for the worst-case outcome, gives a maximal reduction in the sum of all confidence interval sizes.
- Bayesian/Realistic: Sample a pair which, in expectation w.r.t. the current estimation of winning probabilities, gives a maximal reduction in the sum of all confidence interval sizes.

Note that if in one round no pruning is possible for any sample outcome for any pair of arms, all sampling strategies collapse to opportunistic sampling. If additionally all pairs have been pulled equally often, this is the same as uniform sampling, as any sample outcome will give the same reduction.

Experiments

Setup: We generate preference matrices Q ∈ P(k, n) for n = 10 voters uniformly at random for up to k = 9 arms. We set an approximation value of ρ = 0.1·k(k−1)/2, i.e., 10% of the worst-case difference between approximated and optimal Kemeny score, and approximation probability (1 − δ) = 0.95. For better comparison, we compare all sampling methods on the same preference matrices. Furthermore, we average all our results over runs on several instances. Note that the computational complexity for sampling with replacement only allows us to conduct experiments on 10 instances and for few arms, while for sampling without replacement we use 100 instances for experiments with up to 9 arms. To compare Kemeny scores at every sampling step, it is required to compute Kemeny rankings, which is a known NP-hard problem. We use a simple ILP formulation for this. All code is written in Python and is publicly available under https://github.com/annemage/Eliciting-Kemeny-Rankings.
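The paper states that a simple ILP is used to compute Kemeny rankings but does not spell it out; the sketch below shows one standard linear-ordering formulation using the PuLP library, which should be read as an illustration rather than the authors' exact model.

```python
import pulp

def kemeny_ranking_ilp(Q):
    """Kemeny ranking of matrix Q via a standard linear-ordering ILP.

    Binary x[i, j] = 1 iff alternative i is ranked above j; the objective
    KS(Q, ranking) = sum of q_ji over ranked-above pairs (i, j) is minimised.
    """
    k = len(Q)
    prob = pulp.LpProblem("kemeny", pulp.LpMinimize)
    x = {(i, j): pulp.LpVariable(f"x_{i}_{j}", cat="Binary")
         for i in range(k) for j in range(k) if i != j}
    prob += pulp.lpSum(Q[j][i] * x[i, j] for (i, j) in x)
    for i in range(k):
        for j in range(i + 1, k):
            prob += x[i, j] + x[j, i] == 1                     # exactly one order per pair
    for i in range(k):
        for j in range(k):
            for l in range(k):
                if len({i, j, l}) == 3:
                    prob += x[i, j] + x[j, l] + x[l, i] <= 2   # forbid 3-cycles => transitivity
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    wins = {i: sum(round(x[i, j].value()) for j in range(k) if j != i) for i in range(k)}
    return sorted(range(k), key=lambda i: -wins[i])            # order alternatives by wins
```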
Results and Analysis: As all sampling methods correspond to uniform sampling when no pruning is taken into account, we restrict our attention here to the case where we apply Algorithm 3 to the confidence intervals indicated in Corollary 3 w.r.t. the current numbers of pulls for every new sample.

[Figure 1: Average confidence bounds for uniformly at random generated instances with n = 10 voters and δ = 0.05. (a) Sampling with replacement; averages over 10 instances; k = 4, ρ = 0.6; average true Kemeny score 2.08. (b) Sampling without replacement; averages over 100 instances; k = 6, ρ = 1.5; average true Kemeny score 5.618.]

Unsurprisingly, our experiments show that the Kemeny score for a current approximation is closer to the true Kemeny score than the approximation bounds given by the total length of confidence intervals (see Lemma 5) would suggest, i.e., our approximation bounds are not tight. Because it is the approximation bounds, however, that influence the sampling strategies, we show a comparison of these in Figure 1 for a sequence of samples. Note that here the length of the x-axis corresponds to the theoretical sample complexity given in Corollary 3. Obviously, the two plots confirm that sampling without replacement is much more efficient for all sampling methods. Note that if for an instance sampling terminated because the approximation bound ρ was reached early, averages are taken over the remaining instances only.

It can easily be seen in Figure 1a that when sampling with replacement, only uniform and opportunistic sampling reach the given approximation bound ρ within the theoretical sample complexity of 6576 samples. Note that in these experiments the look-ahead methods (optimistic, pessimistic and realistic sampling) often get stuck by sampling the same pairs of arms, which apparently do not lead to enough pruning of confidence intervals to be worth it. This is surprising especially for pessimistic sampling, as this method should be conservative w.r.t. pruning possibilities. Furthermore, uniform and opportunistic sampling behave exactly the same and reach the desired approximation ρ in exactly the theoretical sample complexity. This might imply that pruning has no effect in these instances, which could be an artifact of how the confidence intervals are computed in this case. On the other hand, it can easily be seen from Figure 1b that for sampling without replacement even uniform sampling needs on average fewer than the theoretical number of samples indicated in Corollary 3 when pruning is applied, showing that pruning of confidence intervals is applicable in our instances. Here, the adaptive sampling strategies show a clear advantage over sampling uniformly. While opportunistic sampling seems the best choice when only sampling few pairs of arms, it is outperformed by optimistic, pessimistic and realistic (Bayesian) look-aheads for more samples. Naturally, the look-ahead methods can anticipate any possibility of pruning the confidence intervals, which seems to give them an advantage in the long run. However, at this point we would like to point out that opportunistic sampling has a much lower computational complexity than the look-ahead methods, which is an obvious advantage in practice. Optimistic and realistic sampling behave similarly. They need fewer samples and give a better approximation than pessimistic sampling. This might be because pessimistic sampling is too conservative in reacting to chances of pruning. For different k, experimental results show similar behaviour. Please refer to Appendix D in (George and Dimitrakakis 2023) for more results regarding different k and a comparison to preference profiles generated from a Mallows model (Mallows 1957) with φ = 0.2 (i.e., preferences being fairly similar) or uniformly at random generated single-peaked preferences, which always admit a Condorcet winner.

6 Conclusion

We phrased the problem of eliciting voters' preferences for determining a Kemeny ranking in terms of a Dueling Bandit problem and found that, while there are no efficient PAC algorithms for approximating the Kendall tau distance, we can formulate PAC algorithms for approximating the Kemeny score. Here, our approximation bounds for the Kemeny score depend on the sum of confidence bounds for the approximated winning probabilities of arms, rendering uniform sampling the best sampling strategy w.r.t. these bounds. We analyse the sample complexity for uniform sampling of arms for sampling with and without replacement. Further, we described ways of pruning confidence bounds in order to reduce the sample complexity, yielding non-uniform sampling strategies, and demonstrate the effect of our methods experimentally. In future work, one could transfer the methodology taken in this paper for finding confidence-bound-based approximation bounds to other C2 voting rules. Furthermore, the Kemeny score could be an interesting approximation measure for PAC algorithms under different voting rules.

Acknowledgements

This work was supported under the NRC Grant No. 302203 "Algorithms and Models for Socially Beneficial Artificial Intelligence".

References

Arrow, K. J. 1950. A difficulty in the concept of social welfare. Journal of Political Economy, 58(4): 328–346.
Balsubramani, A.; Karnin, Z.; Schapire, R. E.; and Zoghi, M. 2016. Instance-dependent regret bounds for dueling bandits. In Conference on Learning Theory, 336–360. PMLR.
Bartholdi, J.; Tovey, C. A.; and Trick, M. A. 1989. Voting schemes for which it can be difficult to tell who won the election. Social Choice and Welfare, 6: 157–165.
Bengs, V.; Busa-Fekete, R.; El Mesaoudi-Paul, A.; and Hüllermeier, E. 2021. Preference-based online learning with dueling bandits: A survey. The Journal of Machine Learning Research, 22(1): 278–385.
Betzler, N.; Fellows, M. R.; Guo, J.; Niedermeier, R.; and Rosamond, F. A. 2008. Fixed-parameter algorithms for Kemeny scores. In Algorithmic Aspects in Information and Management: 4th International Conference, AAIM 2008, Shanghai, China, June 23–25, 2008, Proceedings, 60–71. Springer.
Boehmer, N.; Faliszewski, P.; Niedermeier, R.; Szufa, S.; and Was, T. 2022. Understanding distance measures among elections. arXiv preprint arXiv:2205.00492.
Brandt, F.; Brill, M.; Hemaspaandra, E.; and Hemaspaandra, L. A. 2015. Bypassing combinatorial protections: Polynomial-time algorithms for single-peaked electorates. Journal of Artificial Intelligence Research, 53: 439–496.
Brandt, F.; Conitzer, V.; Endriss, U.; Lang, J.; and Procaccia, A. D. 2016. Handbook of Computational Social Choice. Cambridge University Press.
Busa-Fekete, R.; Szörényi, B.; and Hüllermeier, E. 2014. PAC rank elicitation through adaptive sampling of stochastic pairwise preferences. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 28(1).
Falahatgar, M.; Orlitsky, A.; Pichapati, V.; and Suresh, A. T. 2017. Maximum selection and ranking under noisy comparisons. In International Conference on Machine Learning, 1088–1096. PMLR.
Fishburn, P. C. 1977. Condorcet social choice functions. SIAM Journal on Applied Mathematics, 33(3): 469–489.
George, A.-M.; and Dimitrakakis, C. 2023. Eliciting Kemeny Rankings. arXiv:2312.11663.
Kemeny, J. G. 1959. Mathematics without numbers. Daedalus, 88(4): 577–591.
Kendall, M. G. 1938. A new measure of rank correlation. Biometrika, 30(1/2): 81–93.
Lin, C.-C.; and Lu, C.-J. 2018. Efficient mechanisms for peer grading and dueling bandits. In Asian Conference on Machine Learning, 740–755. PMLR.
Mallows, C. L. 1957. Non-Null Ranking Models. I. Biometrika, 44(1/2): 114–130.
Satterthwaite, M. A. 1975. Strategy-proofness and Arrow's conditions: Existence and correspondence theorems for voting procedures and social welfare functions. Journal of Economic Theory, 10(2): 187–217.
Young, H. P.; and Levenglick, A. 1978. A consistent extension of Condorcet's election principle. SIAM Journal on Applied Mathematics, 35(2): 285–300.