# Eliciting Kemeny Rankings

Anne-Marie George¹, Christos Dimitrakakis¹,²
¹University of Oslo, Norway; ²University of Neuchâtel, Switzerland
annemage@uio.no, christos.dimitrakakis@unine.ch

We formulate the problem of eliciting agents' preferences with the goal of finding a Kemeny ranking as a Dueling Bandits problem. Here the bandits' arms correspond to alternatives that need to be ranked and the feedback corresponds to a pairwise comparison between alternatives by a randomly sampled agent. We consider both sampling with and without replacement, i.e., the possibility to ask the same agent about some comparison multiple times or not. We find approximation bounds for Kemeny rankings dependent on confidence intervals over estimated winning probabilities of arms. Based on these we state algorithms to find Probably Approximately Correct (PAC) solutions and elaborate on their sample complexity for sampling with or without replacement. Furthermore, if all agents' preferences are strict rankings over the alternatives, we provide means to prune confidence intervals and thereby guide a more efficient elicitation. We formulate several adaptive sampling methods that use look-aheads to estimate how much confidence intervals (and thus approximation guarantees) might be tightened. All described methods are compared on synthetic data.

1 Introduction

Decision making based on the preferences of a group of individuals is tackled in various settings by AI systems. To ensure a good societal outcome, the fair aggregation of the society members' preferences over alternatives plays a central role. Many such aggregation functions for agents' or voters' preferences have been proposed by politicians, mathematicians, economists and computer scientists. Research in the area of Social Choice, and more specifically Voting Theory, analyses and develops such aggregation functions algorithmically and axiomatically (Brandt et al. 2016). While impossibility results force all such functions to be flawed by computational complexity or violation of fairness guarantees, e.g. (Arrow 1950; Satterthwaite 1975), many are well researched and applied in practice. However, it is usually assumed that voters specify their preferences completely before the desired aggregation can be determined. This can be excessively time consuming or costly for voters, especially for large numbers of alternatives as in movie or online-shopping platforms. Next, we describe our contributions for efficiently eliciting aggregated rankings under Kemeny's rule, related work, and the outline of this paper.

Contributions and Related Work

We formulate the problem of eliciting voter preferences as a Dueling Bandit problem (Bengs et al. 2021) in which arms of bandits correspond to alternatives, and the bandit feedback for pulling two arms corresponds to a random voter's preference over the two arms. This feedback is then summarised in a matrix of winning probabilities over arms. Our goal is to elicit an aggregated ranking of voter preferences based on some voting rule. Here, the dueling bandit formulation is applicable for any voting rule that is a C2 function according to Fishburn's classification of preference aggregation functions (Fishburn 1977). That is, functions that only require the margin of voters that prefer one alternative over another as input, and not the exact combinations of pairwise preferences the voters have.
For such functions, we can thus assume the input to be a preference matrix indicating the fraction of voters preferring one alternative over another for all pairs of alternatives. The results by (Boehmer et al. 2022) imply that it is NP-hard to decide whether a given matrix corresponds to a set of voters' preference orders (i.e., linear orders) over the alternatives. It follows that it is not possible to find a characterisation of such matrices that can be checked in polynomial time. Nevertheless, we identify properties of such matrices.

Contribution 1: In Section 2, we connect the vote elicitation and dueling bandit frameworks for C2 functions, and propose three properties of preference matrices which will help in devising adaptive elicitation strategies: Completeness, Triangle Inequality and Realisable Borda Scores.

Next, we focus our attention on the voting rule devised by John Kemeny in 1959 (Kemeny 1959), which is a preference aggregation function with compelling properties such as monotonicity and Condorcet winner consistency (Young and Levenglick 1978). The output of this rule is a set of rankings over the alternatives which minimise the Kemeny score. The Kemeny score measures voter disagreement in terms of pairwise disagreement over the ranked alternatives, i.e., Kendall's tau distance (Kendall 1938). In general, it is NP-hard to compute Kemeny rankings (Bartholdi, Tovey, and Trick 1989), but they are polynomial-time computable for specific types of preferences, such as single-peaked ones (Brandt et al. 2015). A perk of Kemeny's rule, as a C2 function, is that it can be readily applied to any matrix. That is, even when voters only specify parts of their preferences, i.e., an incomplete order over the alternatives, or when their preferences are non-transitive. The corresponding matrix in that case only contains fractions of voters that specified their preference over two alternatives. This enables us to compute an approximate Kemeny ranking even when not all preferences have been elicited yet. Our goal then is to sample pairs of arms (i.e., query voters to compare two alternatives) such that the approximate Kemeny ranking based on the sample means is close (for some distance measure) to an optimal Kemeny ranking. Note that while approximation algorithms exist for Kemeny rankings, e.g., (Betzler et al. 2008), the approximation bounds usually concern the approximation w.r.t. a given instance. Contrarily, we consider the approximation of Kemeny rankings w.r.t. a similar instance while applying an exact method.

Contribution 2: In Section 3, we formulate approximation bounds for (1) the Kendall tau distance and (2) the Kemeny score difference between the true Kemeny ranking of a matrix of winning probabilities and the Kemeny ranking of an approximate matrix. These bounds are based on confidence intervals around sample means for the winning probabilities.

Note that bandit problems with the goal of finding a ranking over the arms (as opposed to one or several winning arms) have been analysed for the C2 functions Borda and Copeland, see e.g. (Busa-Fekete, Szörényi, and Hüllermeier 2014; Falahatgar et al. 2017; Lin and Lu 2018), and for other winner concepts such as von Neumann winners (Balsubramani et al. 2016). These works establish probably approximately correct (PAC) algorithms and analyse their sample complexity. To the best of our knowledge, such considerations do not exist for Kemeny rankings. In fact, (Bengs et al. 2021) propose Kemeny rankings as a direction for future work.
Also, no related work considers sampling from a fixed population without replacement (as would be reasonable when given a set of voters which shall not be queried twice about the same pair of alternatives). What is more, approximations of rankings w.r.t. Kemeny scores (i.e., agents' disagreement) have not been considered under any aggregation function so far.

Contribution 3: In Section 4, we formulate PAC algorithms for Kemeny rankings w.r.t. Kemeny scores of a matrix of winning probabilities for two distinct cases of sampling methods: (1) sampling bandit feedback from voters u.a.r., i.e., sampling with replacement, and (2) sampling bandit feedback from a hypergeometric distribution, i.e., sampling without replacement, meaning that voters cannot be asked multiple times about the same pair of alternatives. These PAC algorithms are based on our approximation results (see Contribution 2) and do not assume any properties for the matrix of winning probabilities. We give concrete sample complexities for desired approximation values and probabilities.

Contribution 4: In Section 5, we show how to prune confidence intervals for winning probabilities of arms based on the properties of preference matrices (see Contribution 1), i.e., for the case that the matrix originates from a set of (linear) preference orders. We discuss several adaptive sampling methods that use look-aheads to anticipate possible effects of pruning. Finally, we analyse the sample complexity experimentally on synthetic data for all developed methods.

An extended version of this paper, including more details, experimental results and all proofs of results marked with (★), can be found under https://arxiv.org/abs/2312.11663.

2 Voting and Bandits

In a voting setting, we assume a set N of voters and a set C = [k] of k candidates or alternatives. Let L(C) be the set of all linear orders or rankings over C. We usually consider the preferences of the voters over the alternatives to be linear orders ≻_v ∈ L(C) for v ∈ N. A typical goal in this setting is to aggregate the preferences of voters into an outcome ranking or to determine a winner. A social preference function is a function that maps a preference profile P = {≻_v | v ∈ N} to one or several output rankings. One example of such a rule is Kemeny's rule, which aims to minimise the disagreement of the voters with the output ranking.

In a dueling bandits setting, we have a set C = [k] of k arms of bandits. In every time step, we pull two arms, yielding feedback of which one is better. That is, when pulling arms i and j we get feedback i ≻ j with some (unknown) probability q_ij and feedback j ≻ i with inverse probability q_ji = 1 − q_ij. The winning probabilities can be summarised in a matrix Q ∈ [0, 1]^{k×k} where by convention q_ii = 0.5 for all i ∈ [k]. Let Q(k) = {Q ∈ [0, 1]^{k×k} | q_ji = 1 − q_ij and q_ii = 0.5 for all i, j ∈ [k]} be the set of such matrices. We assume the true matrix of winning probabilities is unknown and must be approximated, with the ultimate goal of finding a good ranking of the arms. This goal can be expressed by a measure of regret, and the literature in this field usually analyses regret bounds in the number of samples.

To connect the two settings, we make the following observation: A preference profile P ∈ L(C)^{|N|} of voters N over alternatives C = [k] can be translated to a preference matrix Q ∈ Q(k) by defining q_ij = |{v ∈ N | i ≻_v j}| / |N| and q_ii = 0.5 for all i, j ∈ [k]. Thus, the alternatives are seen as arms of bandits and the winning probabilities q_ij are equal to the fraction of voters preferring alternative i over j.
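To make this translation concrete, the following minimal Python sketch builds a preference matrix from a profile of linear orders. The function name `preference_matrix` and the list-of-rankings input format are our own illustrative choices, not part of the paper's published code.

```python
from itertools import combinations

def preference_matrix(profile, k):
    """Preference matrix Q from a profile of linear orders.

    `profile` is a list of rankings, each a list of the k alternatives
    (labelled 0..k-1) from most to least preferred.  Q[i][j] is the
    fraction of voters ranking i above j; diagonal entries are 0.5.
    """
    n = len(profile)
    Q = [[0.5] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        wins_i = sum(1 for ranking in profile if ranking.index(i) < ranking.index(j))
        Q[i][j] = wins_i / n
        Q[j][i] = 1 - Q[i][j]
    return Q

# Three voters over three alternatives: two report 0 > 1 > 2, one reports 2 > 1 > 0.
print(preference_matrix([[0, 1, 2], [0, 1, 2], [2, 1, 0]], 3))
```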
We denote the set of such preference matrices that originate from preference profiles by P(k, n), where k is the number of alternatives and n the number of voters. Because voters' preferences are linear orders, and thus transitive, P(k, n) ⊆ Q(k). We show further properties of such matrices.

Lemma 1 (★). Let P = {≻_v ∈ L([k]) | v ∈ [n]} be a profile of n voters and k alternatives. Then Q ∈ P(k, n) with q_ij = (1/n)·|{v ∈ [n] | i ≻_v j}| and q_ii = 0.5 satisfies:
1. n·q_ij ∈ {0, 1, . . . , n} and q_ij + q_ji = 1 for all distinct i, j ∈ [k]. (Completeness)
2. q_lj + q_ji ≥ q_li for all distinct i, j, l ∈ [k]. (Triangle Inequality)
3. Σ_{i∈A} Σ_{j∈[k]\{i}} q_ij ≤ |A|·(k − (|A|+1)/2) for all A ⊆ [k]. (Realisable Borda Scores)

The next example shows that mapping preference profiles to preference matrices is not injective.

Example 1. Consider the two preference profiles P1, P2 over three alternatives below.
P1 = {1 ≻ 2 ≻ 3, 1 ≻ 2 ≻ 3, 3 ≻ 2 ≻ 1}
P2 = {1 ≻ 2 ≻ 3, 2 ≻ 1 ≻ 3, 3 ≻ 1 ≻ 2}.
The two profiles map to the same preference matrix Q:
Q = ( 1/2  2/3  2/3 ;  1/3  1/2  2/3 ;  1/3  1/3  1/2 ).

3 Kemeny's Rule

Kemeny's rule outputs (after applying some tie-breaking) a strict ranking of alternatives / arms that minimises the disagreement of the voters. To measure disagreement of voters with a ranking we can apply Kendall's tau distance (or "inversion distance"). The Kendall-tau distance between two preference orders ≻_v, ≻_w ∈ L(C) is defined as follows (where 𝟙 is the indicator function):
d_Kt(≻_v, ≻_w) = Σ_{{i,j}⊆C} 𝟙(i ≻_v j)·𝟙(j ≻_w i) = Σ_{(i,j): i ≻_v j} 𝟙(j ≻_w i).
The Kemeny score of a ranking ≻ ∈ L(C) given the preference rankings of voters N is defined as KS({≻_v}_{v∈N}, ≻) = Σ_{v∈N} d_Kt(≻, ≻_v), and the set of Kemeny rankings (before tie-breaking) is K({≻_v}_{v∈N}) = argmin_{≻ ∈ L(C)} KS({≻_v}_{v∈N}, ≻).

In the same spirit of measuring disagreement, we can define the Kemeny rankings for a given matrix Q ∈ [0, 1]^{k×k}, where q_ij is the probability that arm i beats arm j. The Kemeny score of a ranking ≻ ∈ L(C) with C = [k] is proportional to the probability that the outcome of a random sample of a pair of arms is not aligned with ≻, i.e., KS(Q, ≻) = Σ_{(i,j): i ≻ j} q_ji, and the set of Kemeny rankings is K(Q) = argmin_{≻ ∈ L(C)} KS(Q, ≻).

In general, Kemeny's rule is not resolute, i.e., the minimum Kemeny score may be attained by several rankings. A tie-breaking rule allows us to choose one among the rankings that minimise the Kendall-tau distance. In this paper, we assume ties are broken deterministically whenever necessary.

Given only an approximation of a preference matrix Q, we can also only approximate its Kemeny ranking. To measure the quality of such an approximation, two natural distance measures for such rankings can be applied: the Kendall-tau distance between the true and the approximate ranking, or the difference in their Kemeny scores w.r.t. the true matrix Q. In the following, we analyse approximation guarantees based on both measures.

While Kemeny's rule is NP-hard to compute, there exist polynomial-time approximation algorithms (Betzler et al. 2008). For an approximate Kemeny ranking of an approximate matrix, we can simply combine the approximation bounds. Assume matrix Q has Kemeny ranking τ and its approximation matrix Q̂ has Kemeny ranking τ̂ that is a ρ-approximation w.r.t. Kemeny scores, i.e., KS(Q, τ̂) ≤ KS(Q, τ) + ρ. Then an additive α-approximation τ′ of τ̂, i.e., KS(Q̂, τ′) ≤ KS(Q̂, τ̂) + α, yields KS(Q, τ′) ≤ KS(Q, τ) + ρ + α. Similarly, a multiplicative λ-approximation τ′ with KS(Q̂, τ′) ≤ (1 + λ)·KS(Q̂, τ̂) yields KS(Q, τ′) ≤ (1 + λ)·KS(Q, τ) + (1 + λ)·ρ.
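The definitions above translate directly into code. Below is a small exhaustive-search sketch for KS(Q, ·) and K(Q) with deterministic (lexicographic) tie-breaking; it is only practical for small k, since it enumerates all k! rankings, and the function names are our own.

```python
from itertools import permutations

def kendall_tau(r1, r2):
    """d_Kt: number of pairs ordered differently by rankings r1 and r2 (best first)."""
    pos1 = {a: p for p, a in enumerate(r1)}
    pos2 = {a: p for p, a in enumerate(r2)}
    return sum(1 for a in r1 for b in r1
               if pos1[a] < pos1[b] and pos2[a] > pos2[b])

def kemeny_score(Q, ranking):
    """KS(Q, ranking): sum of q_ji over all pairs (i, j) where i is ranked above j."""
    pos = {a: p for p, a in enumerate(ranking)}
    k = len(Q)
    return sum(Q[j][i] for i in range(k) for j in range(k)
               if i != j and pos[i] < pos[j])

def kemeny_ranking(Q):
    """A Kemeny ranking of Q, ties broken lexicographically (exhaustive search)."""
    k = len(Q)
    return min(permutations(range(k)), key=lambda r: (kemeny_score(Q, r), r))
```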
Kendall-tau Distance of Approx. Kemeny Rankings

Consider two rankings τ, τ̂ ∈ L(C) with |C| = k. If the two rankings are equal, their Kendall-tau distance is 0; if they are reversed, their Kendall-tau distance is maximal, i.e., (k−1)k/2. More generally, the Kendall-tau distance of any two rankings τ, τ̂ ∈ L(C) with |C| = k is an integer in {0, . . . , (k−1)k/2}. The following lemma shows that the Kendall-tau distance of the Kemeny rankings K(Q) and K(Q̂) of a preference matrix Q and a close approximation Q̂ of Q can be maximal. This is proven by construction of an example with values in Q that are very close to 0.5. Thus, no matter the number of samples, when considering the Kemeny ranking based on the sample means, we cannot guarantee any improvement in terms of Kendall-tau distance.

Lemma 2 (★). For any small¹ ε > 0, there exist matrices Q, Q̂ ∈ Q(k) with ‖Q − Q̂‖₁ = ε and τ = K(Q) and τ̂ = K(Q̂) such that d_Kt(τ, τ̂) = (k−1)k/2.

¹For the construction in our proof, we require ε to be sufficiently small in terms of (k−1)k.

If the matrix of winning probabilities is indeed a preference matrix, i.e., reflecting fractions of n voters, then a reasonable restriction is to ask every voter at most once about their preference over a pair of arms. Disregarding inference reached through transitivity of voter preferences, a total of n·(k−1)k/2 queries is necessary to recover the true preference matrix. However, if values in the preference matrix are close to 0.5, even asking (n−1)·(k−1)k/2 queries can still lead to an approximate Kemeny ranking with maximal Kendall-tau distance to the true Kemeny ranking.

Lemma 3 (★). For any number of arms k > 2 and number of voters n > 2, there exist preference profiles with corresponding preference matrices Q ∈ P(k, n) and Q̂ ∈ P(k, n−1) with ‖Q − Q̂‖₁ = (k−1)k / (2(n−1)) and τ = K(Q) and τ̂ = K(Q̂) such that d_Kt(τ, τ̂) = (k−1)k/2.

This shows that especially for large numbers of voters, the approximation error for approximating matrix Q can be very small while the Kendall-tau distance is maximal. A similar result can be obtained when querying voters about all but one pair of alternatives. Consider a Condorcet cycle for which q₁₂ = q₂₃ = q₃₁ = 2/3. Then the tie-breaking rule, e.g., 1 ≻ 2 ≻ 3, coincides with the Kemeny ranking, with a score of 4/3. When eliciting all preferences over all pairs of arms but (2, 3), and initialising q₂₃ = q₃₂ = 0.5, the approximated Kemeny ranking is 3 ≻ 2 ≻ 1, independent of tie-breaking, yielding a maximal KT-distance.

The results in this section are based on examples for which Kemeny rankings have very high Kemeny scores. Thus, while yielding very different Kemeny rankings, the matrices in the proofs also yield similar voter satisfaction. In the next section, we concentrate on bounds of Kemeny score differences between Kemeny rankings of similar matrices.

Kemeny Score of Approx. Kemeny Rankings

We can bound the difference of Kemeny scores for two Kemeny rankings of similar matrices.

Lemma 4 (★). Let Q, Q̂ ∈ Q(k) with ‖Q − Q̂‖₁ ≤ ε. Furthermore, let τ be a Kemeny ranking of Q and τ̂ a Kemeny ranking of Q̂. Then KS(Q, τ̂) − KS(Q, τ) ≤ ε.

Note that because τ is a Kemeny ranking w.r.t. Q, the difference of Kemeny scores of τ and τ̂ cannot be negative. Lemma 4 shows that with a close approximation of a matrix we can obtain a Kemeny ranking that approximates the Kemeny score, i.e., minimal voter disagreement, well.
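Lemma 4 is easy to sanity-check numerically. The sketch below perturbs random winning-probability matrices and verifies that the Kemeny-score gap never exceeds the L1 distance; it reuses the hypothetical `kemeny_score` and `kemeny_ranking` helpers from the sketch above.

```python
import random

def l1_distance(Q, Qh):
    k = len(Q)
    return sum(abs(Q[i][j] - Qh[i][j]) for i in range(k) for j in range(k) if i != j)

def check_lemma4(k=4, noise=0.05, trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        Q = [[0.5] * k for _ in range(k)]
        Qh = [[0.5] * k for _ in range(k)]
        for i in range(k):
            for j in range(i + 1, k):
                Q[i][j] = rng.random()
                Q[j][i] = 1 - Q[i][j]
                Qh[i][j] = min(1.0, max(0.0, Q[i][j] + rng.uniform(-noise, noise)))
                Qh[j][i] = 1 - Qh[i][j]
        tau, tau_hat = kemeny_ranking(Q), kemeny_ranking(Qh)
        gap = kemeny_score(Q, tau_hat) - kemeny_score(Q, tau)
        assert -1e-9 <= gap <= l1_distance(Q, Qh) + 1e-9  # Lemma 4: 0 <= gap <= ||Q - Qh||_1
```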
When sampling preferences, or more specifically feedback from dueling bandits, we can express the value of our approximation of the winning matrix by confidence intervals for matrix entries. For this type of uncertainty quantification, we can adjust the approximation bounds from Lemma 4.

Corollary 1. Let Q, Q̂ ∈ Q(k). Consider Kemeny rankings τ of Q and τ̂ of Q̂. Suppose q_ij ∈ [q̂_ij − c_ji, q̂_ij + c_ij] for some matrix C ∈ R_{≥0}^{k×k} holds for all i, j ∈ [k], i ≠ j, with probability (1 − δ). Then, with probability (1 − δ), KS(Q, τ̂) − KS(Q, τ) ≤ 2 Σ_{1≤i<j≤k} max(c_ij, c_ji).

Sampling With Replacement

Proposition 1 (★). Consider k ≥ 2 arms with matrix of winning probabilities Q ∈ Q(k). Let δ < 0.5 be the approximation probability and y := ln(k(k−1)/δ) > 1. Furthermore, let Q̂ ∈ Q(k) be the matrix of sample averages after t samples (with replacement) of each pair of arms. Then c^t_δ = √(y/(2t)) is a (1 − δ)-confidence bound for all entries in Q̂. Then, with probability (1 − δ), we have KS(Q, τ̂) − KS(Q, τ) ≤ k(k−1)·c^t_δ for Kemeny ranking τ of Q and Kemeny ranking τ̂ of Q̂ + 𝟙·c^t_δ.

Given a desired approximation rate ρ, we can reformulate Proposition 1 and obtain the necessary number of samples.

Corollary 2. For k ≥ 2 arms, Q ∈ Q(k) with Kemeny ranking τ, fixed approximation guarantee 0 < ρ ≤ k(k−1)/2 and probability δ, Algorithm 1 is a (δ, ρ)-PAC algorithm. That is, after t = ⌈x²y/2⌉ samples of each pair of arms, Algorithm 1 returns ranking τ̂ such that, with probability (1 − δ), we have KS(Q, τ̂) − KS(Q, τ) ≤ ρ. Here x := k(k−1)/ρ and y := ln(k(k−1)/δ) > 1.

Sampling Without Replacement

Suppose that, while sampling voters uniformly at random, none of the voters v ∈ N can be asked twice about their preferences over a pair of arms. Then the number s out of t asked voters that prefer i to j follows a hypergeometric distribution: s ∼ Hypergeometric(n, n·q_ij, t). In this case, we can refine the confidence bounds of our sampled estimation of the true fractions of voters q_ij preferring i to j. This leads to a better approximation for a fixed number of samples, or conversely, a lower sample complexity for a fixed approximation goal, compared to sampling with replacement. More concretely, it follows from the next propositions that Algorithm 2 is a PAC algorithm for Kemeny rankings w.r.t. Kemeny scores when sampling without replacement.

Algorithm 2: Kemeny_El(Q, ρ, δ): Sampling w/o Replacement from Q ∈ P(k, n)
  Q̂ ← {1/2}^{k×k}  ▷ Estimate of Q
  x := k(k−1)/ρ, y := ln(k(k−1)/δ)  ▷ Constants
  if n < (x²y + 4)/2 then
    t := (2n + x²y·n)/(2n + x²y), c := √((n−t)(t+1)·y/(2t²n))  ▷ Sample size and confidence bound (small n)
  else
    t := x²y·(n+1)/(2n + x²y), c := √((n−t+1)·y/(2tn))  ▷ Sample size and confidence bound (large n)
  end if
  for arms i, j ∈ [k] do
    s ∼ Hypergeometric(n, n·q_ij, t)
    q̂_ij ← s/t
    q̂_ji ← 1 − q̂_ij
  end for
  return K(Q̂ + 𝟙·c)

Proposition 2 (★). Consider n > 1 voters with preference matrix Q ∈ P(k, n) over k ≥ 2 alternatives. Let δ < 0.5 be the approximation probability and y := ln(k(k−1)/δ) > 1. Furthermore, let Q̂ ∈ Q(k) be the matrix of sample averages after t uniformly random samples from the population for each pair of arms (without replacement). Then c^t_{δ,n} = √((n−t+1)·y/(2tn)) is a (1 − δ)-confidence bound for all entries in Q̂. Furthermore, with probability (1 − δ) we have KS(Q, τ̂) − KS(Q, τ) ≤ k(k−1)·c^t_{δ,n} for Kemeny ranking τ of Q and Kemeny ranking τ̂ of Q̂ + 𝟙·c^t_{δ,n}. For given ρ, we can guarantee an approximation KS(Q, τ̂) − KS(Q, τ) ≤ ρ with probability (1 − δ) after t = x²y·(n+1)/(2n + x²y) samples (without replacement) of each pair of arms, where x := k(k−1)/ρ.
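A sketch of the sampling-without-replacement elicitation in the spirit of Algorithm 2 follows. The sample-size and confidence-bound expressions are taken from the reconstructed formulas above and should be treated as assumptions; the function names and the list-of-rankings voter representation are ours, and `kemeny_ranking` is the exhaustive helper from the earlier sketch.

```python
import math
import random

def sample_size_and_confidence(k, n, rho, delta):
    """Per-pair sample size t and confidence bound c for sampling without replacement."""
    x = k * (k - 1) / rho
    y = math.log(k * (k - 1) / delta)
    if n < (x * x * y + 4) / 2:       # small population: more than half of the voters per pair
        t = math.ceil((2 * n + x * x * y * n) / (2 * n + x * x * y))
        c = math.sqrt((n - t) * (t + 1) * y / (2 * t * t * n))
    else:                             # large population
        t = math.ceil(x * x * y * (n + 1) / (2 * n + x * x * y))
        c = math.sqrt((n - t + 1) * y / (2 * t * n))
    return t, c

def elicit_without_replacement(profile, k, rho, delta, seed=0):
    """Query t distinct voters per pair and return the Kemeny ranking of Q_hat + c."""
    n = len(profile)
    rng = random.Random(seed)
    t, c = sample_size_and_confidence(k, n, rho, delta)
    Qh = [[0.5] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            voters = rng.sample(range(n), t)   # no voter asked twice about the pair (i, j)
            s = sum(1 for v in voters if profile[v].index(i) < profile[v].index(j))
            Qh[i][j], Qh[j][i] = s / t, 1 - s / t
    Q_opt = [[Qh[i][j] + (c if i != j else 0) for j in range(k)] for i in range(k)]
    return kemeny_ranking(Q_opt)
```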
For the case that we have more samples than half of the population size, i.e., t > n/2, and a limited population size, we can apply better bounds on the confidences over the winning probabilities. This leads to the following result.

Proposition 3 (★). Assume a fixed voter population of size n > 1 with preference matrix Q ∈ P(k, n) over k ≥ 2 alternatives. Let δ < 0.5 be the approximation probability and y := ln(k(k−1)/δ) > 1. Furthermore, let Q̂ ∈ Q(k) be the matrix of sample averages after t > n/2 uniformly random samples for each pair of arms (without replacement). Then c^t_{δ,n} = √((n−t)(t+1)·y/(2t²n)) is a (1 − δ)-confidence bound for all entries in Q̂. Furthermore, with probability (1 − δ) we have KS(Q, τ̂) − KS(Q, τ) ≤ k(k−1)·c^t_{δ,n} for Kemeny ranking τ of Q and Kemeny ranking τ̂ of Q̂ + 𝟙·c^t_{δ,n}. For given ρ, and if the population is of size n < (x²y + 4)/2, we can guarantee an approximation KS(Q, τ̂) − KS(Q, τ) ≤ ρ with probability (1 − δ) after t = (x²y·n + 2n)/(2n + x²y) samples (without replacement) of each pair of arms, where x := k(k−1)/ρ.

Comparison

Note that Propositions 2 and 3 use improved confidence bounds for the estimated winning probabilities compared to Proposition 1. This is not surprising, since we are sampling from a fixed population without replacement, and expect the confidence bound to be dependent on the relation between the number of samples and the population size. Furthermore, unsurprisingly, this improves the sample complexity. We can quantify the advantage as follows.

Corollary 3. Let x := k(k−1)/ρ and y := ln(k(k−1)/δ) > 1. Consider the confidence bounds after t samples from Proposition 1, Proposition 2 and Proposition 3:
c = √(y/(2t)), c′ = √((n−t+1)·y/(2tn)), c″ = √((n−t)(t+1)·y/(2t²n)).
Then we have c′ = √((n−t+1)/n)·c and c″ = √((n−t)(t+1)/(tn))·c, i.e., tighter confidence bounds when sampling without replacement. Furthermore, c′ > c″ if and only if t > n/2. For a fixed approximation guarantee ρ with k(k−1)/n ≤ ρ, consider the sample complexities from Propositions 1, 2 and 3:
t = x²y/2, t′ = x²y·(n+1)/(2n + x²y), t″ = (x²y·n + 2n)/(2n + x²y).
Then we have t′ = (2(n+1)/(2n + x²y))·t and t″ = ((2n + 4n/(x²y))/(2n + x²y))·t. Under the reasonable assumption that 2/n < ρ, k ≥ 2 and n ≥ 2, this means that fewer samples are needed when sampling without replacement than with replacement. Furthermore, for n < (x²y + 4)/2 we have t > t″.

This comparison also shows that the more samples are taken, the starker the difference between confidence bounds when sampling with and without replacement. Furthermore, in order to reach an approximation of ρ, the smaller ρ and δ, the fewer samples we need when sampling without replacement compared to sampling with replacement.
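For intuition, the quantities in Corollary 3 can be tabulated directly. The sketch below evaluates the three confidence bounds and the three sample complexities for concrete values; the formulas follow the reconstructed expressions above and the function name is ours.

```python
import math

def corollary3_quantities(k=6, n=10, rho=1.5, delta=0.05, t=8):
    """Confidence bounds after t samples and per-pair sample complexities (Props 1-3)."""
    x = k * (k - 1) / rho
    y = math.log(k * (k - 1) / delta)
    c   = math.sqrt(y / (2 * t))                              # with replacement
    c1  = math.sqrt((n - t + 1) * y / (2 * t * n))            # without replacement
    c2  = math.sqrt((n - t) * (t + 1) * y / (2 * t * t * n))  # without replacement, t > n/2
    T   = x * x * y / 2
    T1  = x * x * y * (n + 1) / (2 * n + x * x * y)
    T2  = (x * x * y * n + 2 * n) / (2 * n + x * x * y)
    return {"bounds": (c, c1, c2), "complexities": (T, T1, T2)}

# With n = 10 and t = 8 > n/2, these values give c > c1 > c2 and T > T1 > T2,
# illustrating the advantage of sampling without replacement.
```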
5 Pruning of Confidence Intervals

In Section 4 we estimated the confidence intervals of the winning probabilities q_ij by the use of some known concentration inequalities. This treatment delivered symmetric bounds that, given the same number of samples, are the same for all pairs of arms i, j ∈ [k]. However, the values q_ij of a preference matrix Q ∈ P(k, n) are not independent. In fact, Lemma 1 implies that q_ij + q_ji = 1 for all i, j ∈ [k]. We can restrict confidence intervals to comply with this constraint.

Lemma 6 (★). Assume that the sampled means q̂_ij after some samples from a preference matrix Q ∈ P(k, n) have (possibly asymmetric) confidence intervals: q_ij ∈ [q̂_ij − c_ij, q̂_ij + c_ij] for all arms i, j ∈ [k] with probability (1 − δ). Define matrix C̄ ∈ [0, 1]^{k×k} by c̄_ij := min(c_ij, c_ji). Then with probability (1 − δ) for all arms i, j ∈ [k]:
q_ij ∈ [q̂_ij − c̄_ji, q̂_ij + c̄_ij].   (1)

Lemma 1 also implies that for a preference matrix Q ∈ P(k, n) a triangle inequality must hold for all triples of arms: q_lj + q_ji ≥ q_li for all i, j, l ∈ [k]. We can comply with this constraint by pruning confidence intervals further.

Lemma 7 (★). Assume that the sampled means q̂_ij after some samples from a preference matrix Q ∈ P(k, n) have (possibly asymmetric) confidence intervals: q_ij ∈ [q̂_ij − c_ji, q̂_ij + c_ij] for all arms i, j ∈ [k] with probability (1 − δ). Define matrix C̄ ∈ [0, 1]^{k×k} by c̄_ij := min(c_ij, min_{l∈[k]} (q̂_il + c_il + q̂_lj + c_lj − q̂_ij)). Then with probability (1 − δ) for all arms i, j ∈ [k]:
q_ij ∈ [q̂_ij − c̄_ji, q̂_ij + c̄_ij].   (2)

An example of pruning confidence intervals can be found in Appendix D of (George and Dimitrakakis 2023). After pruning intervals once, it might be feasible to prune the new confidence intervals even more. This procedure is described by Algorithm 3.

Algorithm 3: Pruning_CI(Q̂, C): Pruning confidence intervals q_ij ∈ [q̂_ij − c_ij, q̂_ij + c_ij] for all i, j ∈ [k] of preference matrix Q ∈ P(k, n)
  c_ij ← min(c_ij, c_ji) for all i, j ∈ [k]  ▷ Upper/lower confidence bounds based on Lemma 6
  c_ij ← min(c_ij, 1 − q̂_ij) for all i, j ∈ [k]  ▷ Upper/lower confidence bounds based on q_ij ∈ [0, 1]
  C̄ ← C
  repeat
    C ← C̄
    for arms i, j ∈ [k] do
      c̄_ij ← min(c_ij, min_{l∈[k], l≠i,j} (q̂_il + c_il + q̂_lj + c_lj − q̂_ij))
      c̄_ij ← max(−q̂_ij, c̄_ij)
    end for
  until C == C̄
  return C̄

Note that Lemma 7 allows c̄_ij to be negative, indicating that the sample mean q̂_ij overestimates the actual winning probability q_ij. This does not influence the approximation result given in Lemma 5. However, we require that 1 ≥ c̄_ij + q̂_ij ≥ 0, which is enforced in Algorithm 3 by the updates c_ij ← min(c_ij, 1 − q̂_ij) and c̄_ij ← max(−q̂_ij, c̄_ij). Algorithm 3 converges because in every iteration of the repeat-loop at least one value c̄_ij strictly decreases and values never increase, while maintaining a lower bound of −q̂_ij. Furthermore, the initial confidence bounds and sample averages build a discrete set of values, so there are only countably many combinations that can attain the minimum in the updates.
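A compact Python sketch of the iterative pruning is given below. It combines the symmetrisation of Lemma 6, the clipping to [0, 1], and the triangle-inequality update of Lemma 7, and sweeps until no bound shrinks further. Unlike Algorithm 3, it updates bounds in place within a sweep; that simplification and the function name are ours.

```python
def prune_confidence_intervals(Qh, C, tol=1e-12):
    """Prune confidence bounds C for sample means Qh of a preference matrix.

    The interval for q_ij is [Qh[i][j] - C[j][i], Qh[i][j] + C[i][j]].
    Uses q_ij + q_ji = 1 (Lemma 6), q_ij in [0, 1], and the triangle
    inequality q_ij <= q_il + q_lj (Lemma 7).
    """
    k = len(Qh)
    # Lemma 6 symmetrisation and clipping of the upper bound to 1:
    C = [[min(C[i][j], C[j][i], 1 - Qh[i][j]) for j in range(k)] for i in range(k)]
    changed = True
    while changed:
        changed = False
        for i in range(k):
            for j in range(k):
                if i == j:
                    continue
                via = min((Qh[i][l] + C[i][l] + Qh[l][j] + C[l][j] - Qh[i][j]
                           for l in range(k) if l not in (i, j)), default=C[i][j])
                new = max(-Qh[i][j], min(C[i][j], via))   # keep Qh[i][j] + C[i][j] >= 0
                if new < C[i][j] - tol:
                    C[i][j] = new
                    changed = True
    return C
```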
We can now refine the elicitation methods from Section 4.

Adaptive Sampling Strategies

Recall that our approximation bounds for Kemeny rankings from Section 3 are determined by the confidence intervals' sizes. Furthermore, the confidence bounds for sampling with or without replacement given in Section 4 can only be influenced by the number of pulls of the pairs of arms. Thus, uniform sampling guarantees to always sample from a pair of arms with the largest confidence interval. However, when pruning confidence intervals after every new sample, the most uncertain pair of arms, i.e., one with the largest confidence interval, might not necessarily be one with the lowest number of pulls so far. We can also estimate the outcome and effect of the next ℓ samples. To simplify notation and computational complexity, we concentrate on such look-aheads with ℓ = 1. This allows the following sampling strategies:

- Uniform: Sample a pair with minimal number of pulls.
- Opportunistic: Sample a pair with maximal confidence interval size.
- Optimistic: Sample a pair which, for the best-case outcome, gives a maximal reduction in the sum of all confidence interval sizes.
- Pessimistic: Sample a pair which, for the worst-case outcome, gives a maximal reduction in the sum of all confidence interval sizes.
- Bayesian/Realistic: Sample a pair which, in expectation w.r.t. the current estimation of winning probabilities, gives a maximal reduction in the sum of all confidence interval sizes.

Note that if in one round no pruning is possible for any sample outcome for any pair of arms, all sampling strategies collapse to opportunistic sampling. If additionally all pairs have been pulled equally often, this is the same as uniform sampling, as any sample outcome will give the same reduction.

Experiments

Setup: We generate preference matrices Q ∈ P(k, n) for n = 10 voters uniformly at random for up to k = 9 arms. We set an approximation value of ρ = 0.1·k(k−1)/2, i.e., 10% of the worst-case difference between approximated and optimal Kemeny score, and approximation probability (1 − δ) = 0.95. For better comparison, we compare all sampling methods on the same preference matrices. Furthermore, we average all our results over runs on several instances. Note that the computational complexity for sampling with replacement only allows us to conduct experiments on 10 instances and for few arms, while for sampling without replacement we use 100 instances for experiments with up to 9 arms. To compare Kemeny scores at every sampling step, it is required to compute Kemeny rankings, which is a known NP-hard problem. We use a simple ILP formulation for this. All code is written in Python and is publicly available under https://github.com/annemage/Eliciting-Kemeny-Rankings.
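The paper states that a simple ILP is used to compute Kemeny rankings but does not spell it out; the sketch below shows one standard linear-ordering formulation using the PuLP library, which should be read as an illustration rather than the authors' exact model.

```python
import pulp

def kemeny_ranking_ilp(Q):
    """Kemeny ranking of matrix Q via a standard linear-ordering ILP.

    Binary x[i, j] = 1 iff alternative i is ranked above j; the objective
    KS(Q, ranking) = sum of q_ji over ranked-above pairs (i, j) is minimised.
    """
    k = len(Q)
    prob = pulp.LpProblem("kemeny", pulp.LpMinimize)
    x = {(i, j): pulp.LpVariable(f"x_{i}_{j}", cat="Binary")
         for i in range(k) for j in range(k) if i != j}
    prob += pulp.lpSum(Q[j][i] * x[i, j] for (i, j) in x)
    for i in range(k):
        for j in range(i + 1, k):
            prob += x[i, j] + x[j, i] == 1                     # exactly one order per pair
    for i in range(k):
        for j in range(k):
            for l in range(k):
                if len({i, j, l}) == 3:
                    prob += x[i, j] + x[j, l] + x[l, i] <= 2   # forbid 3-cycles => transitivity
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    wins = {i: sum(round(x[i, j].value()) for j in range(k) if j != i) for i in range(k)}
    return sorted(range(k), key=lambda i: -wins[i])            # order alternatives by wins
```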
Results and Analysis: As all sampling methods correspond to uniform sampling when no pruning is taken into account, we restrict our attention here to the case where we apply Algorithm 3 to the confidence intervals indicated in Corollary 3 w.r.t. the current numbers of pulls for every new sample.

[Figure 1: Average confidence bounds for uniformly at random generated instances with n = 10 voters and δ = 0.05. (a) Sampling with replacement; averages over 10 instances; k = 4, ρ = 0.6; average true Kemeny score 2.08. (b) Sampling without replacement; averages over 100 instances; k = 6, ρ = 1.5; average true Kemeny score 5.618.]

Unsurprisingly, our experiments show that the Kemeny score for a current approximation is closer to the true Kemeny score than the approximation bounds given by the total length of confidence intervals (see Lemma 5) would suggest, i.e., our approximation bounds are not tight. Because it is the approximation bounds, however, that influence the sampling strategies, we show a comparison of these in Figure 1 for a sequence of samples. Note that here the length of the x-axis corresponds to the theoretical sample complexity given in Corollary 3. Obviously, the two plots confirm that sampling without replacement is much more efficient for all sampling methods. Note that if for an instance sampling terminated because the approximation bound ρ was reached early, averages are taken over the remaining instances only.

It can easily be seen in Figure 1a that when sampling with replacement, only uniform and opportunistic sampling reach the given approximation bound ρ within the theoretical sample complexity of 6576 samples. Note that in these experiments the look-ahead methods (optimistic, pessimistic and realistic sampling) often get stuck by sampling the same pairs of arms, which apparently do not lead to enough pruning of confidence intervals to be worth it. This is surprising especially for pessimistic sampling, as this method should be conservative w.r.t. pruning possibilities. Furthermore, uniform and opportunistic sampling behave exactly the same and reach the desired approximation ρ in exactly the theoretical sample complexity. This might imply that pruning has no effect in these instances, which could be an artifact of how the confidence intervals are computed in this case. On the other hand, it can easily be seen from Figure 1b that for sampling without replacement even uniform sampling needs on average fewer than the theoretical number of samples indicated in Corollary 3 when pruning is applied, showing that pruning of confidence intervals is applicable in our instances. Here, the adaptive sampling strategies show a clear advantage over sampling uniformly. While opportunistic sampling seems the best choice when only sampling few pairs of arms, it is outperformed by optimistic, pessimistic and realistic (Bayesian) look-aheads for more samples. Naturally, the look-ahead methods can anticipate any possibility of pruning the confidence intervals, which seems to give them an advantage in the long run. However, at this point we would like to point out that opportunistic sampling has a much lower computational complexity than the look-ahead methods, which is an obvious advantage in practice. Optimistic and realistic sampling behave similarly. They need fewer samples and give a better approximation than pessimistic sampling. This might be because pessimistic sampling is too conservative in reacting to chances of pruning. For different k, experimental results show similar behaviour. Please refer to Appendix D in (George and Dimitrakakis 2023) for more results regarding different k and a comparison to preference profiles generated from a Mallows model (Mallows 1957) with φ = 0.2 (i.e., preferences being fairly similar) or uniformly at random generated single-peaked preferences, which always admit a Condorcet winner.

6 Conclusion

We phrased the problem of eliciting voters' preferences for determining a Kemeny ranking in terms of a Dueling Bandit problem and found that, while there are no efficient PAC algorithms for approximating the Kendall tau distance, we can formulate PAC algorithms for approximating the Kemeny score. Here, our approximation bounds for the Kemeny score depend on the sum of confidence bounds for the approximated winning probabilities of arms, rendering uniform sampling the best sampling strategy w.r.t. these bounds. We analyse the sample complexity for uniform sampling of arms for sampling with and without replacement. Further, we described ways of pruning confidence bounds in order to reduce the sample complexity, yielding non-uniform sampling strategies, and demonstrate the effect of our methods experimentally. In future work, one could transfer the methodology taken in this paper for finding confidence-bound-based approximation bounds to other C2 voting rules. Furthermore, the Kemeny score could be an interesting approximation measure for PAC algorithms under different voting rules.

Acknowledgements

This work was supported under the NRC Grant No. 302203 "Algorithms and Models for Socially Beneficial Artificial Intelligence".

References

Arrow, K. J. 1950. A difficulty in the concept of social welfare. Journal of Political Economy, 58(4): 328–346.
Balsubramani, A.; Karnin, Z.; Schapire, R. E.; and Zoghi, M. 2016. Instance-dependent regret bounds for dueling bandits. In Conference on Learning Theory, 336–360. PMLR.
Bartholdi, J.; Tovey, C. A.; and Trick, M. A. 1989. Voting schemes for which it can be difficult to tell who won the election. Social Choice and Welfare, 6: 157–165.
Bengs, V.; Busa-Fekete, R.; El Mesaoudi-Paul, A.; and Hüllermeier, E. 2021. Preference-based online learning with dueling bandits: A survey. The Journal of Machine Learning Research, 22(1): 278–385.
Betzler, N.; Fellows, M. R.; Guo, J.; Niedermeier, R.; and Rosamond, F. A. 2008. Fixed-parameter algorithms for Kemeny scores. In Algorithmic Aspects in Information and Management: 4th International Conference, AAIM 2008, Shanghai, China, June 23–25, 2008, Proceedings, 60–71. Springer.
Boehmer, N.; Faliszewski, P.; Niedermeier, R.; Szufa, S.; and Was, T. 2022. Understanding distance measures among elections. arXiv preprint arXiv:2205.00492.
Brandt, F.; Brill, M.; Hemaspaandra, E.; and Hemaspaandra, L. A. 2015. Bypassing combinatorial protections: Polynomial-time algorithms for single-peaked electorates. Journal of Artificial Intelligence Research, 53: 439–496.
Brandt, F.; Conitzer, V.; Endriss, U.; Lang, J.; and Procaccia, A. D. 2016. Handbook of Computational Social Choice. Cambridge University Press.
Busa-Fekete, R.; Szörényi, B.; and Hüllermeier, E. 2014. PAC rank elicitation through adaptive sampling of stochastic pairwise preferences. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 28(1).
Falahatgar, M.; Orlitsky, A.; Pichapati, V.; and Suresh, A. T. 2017. Maximum selection and ranking under noisy comparisons. In International Conference on Machine Learning, 1088–1096. PMLR.
Fishburn, P. C. 1977. Condorcet social choice functions. SIAM Journal on Applied Mathematics, 33(3): 469–489.
George, A.-M.; and Dimitrakakis, C. 2023. Eliciting Kemeny Rankings. arXiv:2312.11663.
Kemeny, J. G. 1959. Mathematics without numbers. Daedalus, 88(4): 577–591.
Kendall, M. G. 1938. A new measure of rank correlation. Biometrika, 30(1/2): 81–93.
Lin, C.-C.; and Lu, C.-J. 2018. Efficient mechanisms for peer grading and dueling bandits. In Asian Conference on Machine Learning, 740–755. PMLR.
Mallows, C. L. 1957. Non-Null Ranking Models. I. Biometrika, 44(1/2): 114–130.
Satterthwaite, M. A. 1975. Strategy-proofness and Arrow's conditions: Existence and correspondence theorems for voting procedures and social welfare functions. Journal of Economic Theory, 10(2): 187–217.
Young, H. P.; and Levenglick, A. 1978. A consistent extension of Condorcet's election principle. SIAM Journal on Applied Mathematics, 35(2): 285–300.