# Graph Resistance and Learning from Pairwise Comparisons

Julien M. Hendrickx*, Alex Olshevsky*, Venkatesh Saligrama

*Equal contribution. Correspondence to: Alex Olshevsky. Julien M. Hendrickx is with the Department of Mathematical Engineering, ICTEAM, UCLouvain, Belgium; Alex Olshevsky and Venkatesh Saligrama are with the Department of Electrical and Computer Engineering, Boston University, USA. Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s). This work was supported by National Science Foundation Grant CCF-1527618, Office of Naval Research Grant N0014-18-1-2257, the U.S. Department of Homeland Security, Science and Technology Directorate, Office of University Programs, under Grant 2013-ST-061-ED0001, the Communauté française de Belgique - Actions de Recherche Concertées, and a WBI World Excellence Fellowship.

**Abstract.** We consider the problem of learning the qualities of a collection of items by performing noisy comparisons among them. Following the standard paradigm, we assume there is a fixed comparison graph, and every neighboring pair of items in this graph is compared $k$ times according to the Bradley-Terry-Luce model (where the probability that an item wins a comparison is proportional to the item quality). We are interested in how the relative error in quality estimation scales with the comparison graph in the regime where $k$ is large. We prove that, after a known transition period, the relevant graph-theoretic quantity is the square root of the resistance of the comparison graph. Specifically, we provide an algorithm whose relative error decays with the square root of the graph resistance, along with a matching lower bound (up to log factors), so that the algorithm is minimax optimal. Its performance guarantee, both in terms of the graph and of the skewness of the item quality distribution, outperforms earlier results.

## 1. Introduction

This paper considers quality estimation from pairwise comparisons, which is a common method of preference elicitation from users. For example, the preference of a customer for one product over another can be thought of as the outcome of a comparison. Because customers are idiosyncratic, such outcomes will be noisy functions of the quality of the underlying items. A similar problem arises in crowdsourcing systems, which must strive for accurate inference even in the presence of unreliable or error-prone participants. Because crowdsourced tasks pay relatively little, errors are common; even among workers making a genuine effort, inherent ambiguity in the task might lead to some randomness in the outcome. These considerations make the underlying estimation algorithm an important part of any crowdsourcing scheme.

Our goal is accurate inference of true item quality from a collection of outcomes of noisy comparisons. We use one of the simplest parametric models for the outcome of comparisons, the Bradley-Terry-Luce (BTL) model, which associates a real-valued quality with each item and posits that customers select an item with probability proportional to its quality. Given a comparison graph capturing which pairs of items are to be compared, our goal is to understand how accuracy scales in terms of this graph when participants make choices according to the BTL model. We focus on the regime where each pair of items in the graph is compared many times.
In this regime, we are able to give a satisfactory answer to the underlying question. Informally, we prove that, up to various constants and logarithms, the relative estimation error scales with the square root of measures of resistance in the underlying graph. Specifically, we propose an algorithm whose performance scales with the graph resistance, as well as a matching lower bound. The gap between our upper and lower bounds depends only on the logarithm of the confidence level and on the skewness of the item qualities. Additionally, our performance guarantees scale better in terms of item skewness than previous work.

### 1.1. Formal Problem Statement

We are given an undirected comparison graph $G(V, E)$, where each node $i$ has a positive weight $w_i$. If $(i, j) \in E$, then we perform $k$ comparisons between $i$ and $j$. The outcomes of these comparisons are i.i.d. Bernoulli, and, according to the BTL model, the probability that $i$ wins a given comparison is

$$p_{ij} = \frac{w_i}{w_i + w_j}. \tag{1}$$

The goal is to recover the weights $w_i$ from the outcomes of these comparisons. Because multiplying all $w_i$ by the same constant does not affect the distribution of outcomes, we can only recover a scaled version of the weight vector $w$. Our goal will thus be to come up with a vector of estimated weights $\hat W$ that is close, in a scale-invariant sense, to the true but unknown vector¹ $w$. A natural error measure turns out to be the absolute value of the sine of the angle defined by $w$ and $\hat W$, which can also be expressed as (see Lemma ?? in the Supplementary Information)

$$|\sin(\hat W, w)| = \inf_{\alpha \in \mathbb{R}} \frac{\|\hat W - \alpha w\|_2}{\|\alpha w\|_2}. \tag{2}$$

In other words, $|\sin(\hat W, w)|$ is the relative error to the closest normalization of the true quality vector $w$. We will also discuss the connection between this error measure and others later in the paper.

¹We follow the usual convention of denoting random variables by capital letters, which is why $\hat W$ is capitalized while $w$ is not.

Following earlier literature, we assume that

$$\max_{i, j \in V} \frac{w_i}{w_j} \le b$$

for some constant $b$. The number $b$ can be thought of as a measure of the skewness of the underlying item qualities. Our goal is to understand how the error between $\hat W$ and $w$ scales as a function of the comparison graph $G$.
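To make the model and the error measure concrete, here is a minimal simulation sketch (our own illustration, not code from the paper; the function and variable names are our choices). It draws $k$ Bernoulli comparisons per edge according to Eq. (1) and evaluates the scale-invariant error of Eq. (2) in its equivalent projection form.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_comparisons(w, edges, k):
    """For each edge (i, j), draw k i.i.d. Bernoulli outcomes with
    P(i wins) = w[i] / (w[i] + w[j]) and return the empirical win rates F_ij."""
    F = {}
    for (i, j) in edges:
        p_ij = w[i] / (w[i] + w[j])       # BTL win probability, Eq. (1)
        F[(i, j)] = rng.binomial(k, p_ij) / k
    return F

def sine_error(w_hat, w):
    """|sin(w_hat, w)| from Eq. (2): the relative error to the closest
    normalization of w, i.e. the fraction of w_hat orthogonal to span(w)."""
    w_unit = w / np.linalg.norm(w)
    residual = w_hat - (w_hat @ w_unit) * w_unit   # component orthogonal to w
    return np.linalg.norm(residual) / np.linalg.norm(w_hat)

# Toy example: a 4-cycle with skewness b = 4.
w = np.array([1.0, 2.0, 4.0, 2.0])
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(simulate_comparisons(w, edges, k=1000))
```

Here `sine_error` uses the fact that the infimum in Eq. (2) is attained by the projection of $\hat W$ onto the line spanned by $w$; the 4-cycle weights are an arbitrary toy choice.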
### 1.2. Literature Review

The dominant approach to recommendation systems relies on inferring item quality from raw scores provided by users (see (Jannach et al., 2016)). However, such scores might be poorly calibrated and inconsistent; alternative approaches that offer users simpler choices might perform better. Our starting point is the Bradley-Terry-Luce (BTL) model of Eq. (1), dating back to (Bradley & Terry, 1952; Luce, 2012), which models how individuals make noisy choices between items. A number of other models in the literature have also been used as the basis of inference; we mention the Mallows model introduced in (Mallows, 1957) and the PL and Thurstone models (see the description in (Hajek et al., 2014)). However, we focus here solely on the BTL model.

Our work is most closely related to the papers (Negahban et al., 2012) and (Negahban et al., 2016). These works proposed an eigenvector calculation which, provided the number of comparisons is sufficiently large, successfully recovers the true weights $w$ from the outcomes of noisy comparisons. The main result of (Negahban et al., 2016) states that, given a comparison graph, if the number of comparisons per edge satisfies a certain lower bound, then it is possible to construct an estimate $\hat W$ satisfying

$$\frac{\|\hat W - w\|_2}{\|w\|_2} \le O\left( \frac{b^{5/2}\, d_{\max}}{d_{\min}(1 - \lambda)} \sqrt{\frac{\log n}{k\, d_{\max}}} \right) \tag{3}$$

with high probability, where $d_{\min}, d_{\max}$ are, respectively, the smallest and largest degrees in the comparison graph, $1 - \lambda$ is the spectral gap of a certain normalized Laplacian of the comparison graph, and both $w, \hat W$ are normalized so that their entries sum to one. It can be proved (see Lemma ??) that the relative error on the left-hand side of Eq. (3) is within a factor $b$ of the measure $|\sin(\hat W, w)|$, provided that $\max_{i,j} \hat W_i / \hat W_j \le b$, so asymptotically these two measures differ only by a factor depending on the skewness $b$.

The problem of recovering $w$ was further studied in (Rajkumar & Agarwal, 2014), where the comparison graph was taken to be the complete graph but with comparisons on edges made at non-uniform rates. The sample complexity of recovering the true weights was provided as a function of the smallest sampling rate over pairs of items. A somewhat more general setting was considered in (Shah et al., 2016), which studied a wider class of noisy comparison models that includes the BTL model as a special case. Upper and lower bounds on the minimax optimal rates of estimation, depending on the eigenvalues of a corresponding Laplacian, were obtained for the absolute error in several different metrics; in one of these metrics, the Laplacian semi-metric, the upper and lower bounds were tight up to constant factors. Similarly to (Shah et al., 2016), our goal is to understand the dependence on the underlying graph, albeit in the simpler setting of the BTL model. Our approach to the problem very closely parallels that of (Jiang et al., 2011), where a collection of potentially inconsistent rankings is optimally reconciled by solving an optimization problem over the comparison graph. However, whereas (Jiang et al., 2011) solves a linear programming problem, we will use a linear least-squares approach, after a certain logarithmic change of variable.

We now move on to discuss work more distantly related to the present paper. The problem we study here is related, but not identical, to the so-called noisy sorting problem, introduced in (Braverman & Mossel, 2009), where better items win with probability at least $1/2 + \delta$ for some positive $\delta$. This assumption does not hold for the BTL model with arbitrary weights. Noisy sorting was also studied in the more general setting of ranking models satisfying a transitivity condition in (Shah et al., 2017) and (Pananjady et al., 2017), where near-optimal minimax rates were derived. Finally, optimal minimax rates for noisy sorting were recently demonstrated in (Mao et al., 2017).

There are a number of variations of this problem that have been studied in the literature which we do not survey at length due to space constraints. For example, the papers (Yue et al., 2012; Szörényi et al., 2015) considered the online version of this problem with a corresponding notion of regret, (Chen & Suh, 2015) considered recovering the top $K$ ranked items, (Falahatgar et al., 2017; Agarwal et al., 2017; Maystre & Grossglauser, 2015) consider recovering a ranked list of the items, and (Ajtai et al., 2016) consider a model where comparisons are not noisy if the item qualities are sufficiently far apart.
We refer the reader to the references within those papers for more details on related work in these directions.

### 1.3. Our Approach

We construct our estimate $\hat W$ by solving a log-least-squares problem, described next. We denote by $F_{ij}$ the fraction of times node $i$ wins the comparison against its neighbor $j$, and we further set $R_{ij} = F_{ij}/F_{ji}$. As the number of comparisons on each edge goes to infinity, $R_{ij}$ approaches $w_i/w_j$ with probability one. Our method consists in finding $\hat W$ as

$$\hat W = \arg\min_{v \in \mathbb{R}^n_+} \sum_{(i,j) \in E} \left( \log(v_i/v_j) - \log R_{ij} \right)^2. \tag{4}$$

This can be done efficiently by observing that it amounts to solving, in the least-squares sense, the linear system of equations $\log \hat W_i - \log \hat W_j = \log R_{ij}$ for all $(i,j) \in E$. Let $B$ be the incidence matrix² of the comparison graph. Stacking the $R_{ij}$ into a vector $R$, we can then write

$$B^T \log \hat W = \log R.$$

Least-squares solutions satisfy $B B^T \log \hat W = B \log R$, or equivalently $L \log \hat W = B \log R$, where $L = B B^T$ is the graph Laplacian. Finally, a solution is given by

$$\log \hat W = L^{\dagger} B \log R, \tag{5}$$

where $L^{\dagger}$ is the Moore-Penrose pseudoinverse of $L$. By the classic results of (Spielman & Teng, 2014), Eq. (5) can be solved for $\hat W$ to accuracy $\epsilon$ in nearly linear time in the size of the input, specifically in $O(|E| \log^c n \log(1/\epsilon))$ operations for some constant $c > 0$. We note that, for connected graphs, all solutions of (4) are equal up to a multiplicative constant and are thus equivalent in terms of the criterion (2).

²Given a directed graph with $n$ nodes and $|E|$ edges, the incidence matrix is the $n \times |E|$ matrix whose $\ell$th column has a $1$ in the row corresponding to the source of edge $\ell$, a $-1$ in the row corresponding to the destination of edge $\ell$, and zeros elsewhere. For an undirected graph, an incidence matrix is obtained by first orienting the edges arbitrarily.
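The estimator of Eqs. (4)-(5) amounts to a few lines of linear algebra. The following is a minimal sketch (our own illustration; at small scale a dense Moore-Penrose pseudoinverse is adequate, while the nearly-linear-time solvers of (Spielman & Teng, 2014) mentioned above would replace it at scale).

```python
import numpy as np

def estimate_weights(F, edges, n):
    """Log-least-squares estimate of Eq. (5): log W_hat = L^+ B log R.

    F: dict mapping each oriented edge (i, j) to the empirical win rate F_ij,
       assumed to lie strictly between 0 and 1 so that log R_ij is finite.
    Returns W_hat normalized so that the entries of log W_hat sum to zero.
    """
    B = np.zeros((n, len(edges)))                 # node-by-edge incidence matrix
    log_R = np.zeros(len(edges))
    for idx, (i, j) in enumerate(edges):
        B[i, idx], B[j, idx] = 1.0, -1.0
        log_R[idx] = np.log(F[(i, j)] / (1.0 - F[(i, j)]))  # R_ij = F_ij / F_ji
    L = B @ B.T                                   # graph Laplacian L = B B^T
    log_W = np.linalg.pinv(L) @ B @ log_R         # Eq. (5)
    return np.exp(log_W - log_W.mean())           # fix the arbitrary scale
```

Combined with `simulate_comparisons` from the sketch in Section 1.1, `estimate_weights(simulate_comparisons(w, edges, k), edges, len(w))` runs the full pipeline.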
### 1.4. Our Contribution

We will find it useful to view the graph as a circuit with a unit resistor on each edge; $\Omega_{ij}$ will denote the effective resistance between nodes $i$ and $j$ in this circuit, $\Omega_{\max}$ denotes the largest of these resistances over all pairs of nodes $i, j = 1, \ldots, n$, and similarly $\Omega_{\mathrm{avg}}$ denotes the average resistance over all pairs. We will use $E_{ij}$ to denote the set of edges lying on at least one simple path starting at $i$ and terminating at $j$, with $E_{\max}$ denoting the largest of the $|E_{ij}|$. Naturally, $E_{\max}$ is upper bounded by the total number of edges in the comparison graph.

The performance of our algorithm is described by the following theorem.

**Theorem 1.** *Let $\delta \in (0, e^{-1})$. There exist absolute constants $c_1, c_2, c_3$ such that, if $C_{n,\delta} \ge c_1 \log(n/\delta)$, $k \ge c_2 E_{\max} C_{n,\delta}^2$, and $k \ge c_3 \Omega_{\max} b^2 (1 + \log(1/\delta))$, then, with probability at least $1 - \delta$,*

$$\sin(\hat W, w)^2 \le O\left( \min\left(b^2 \Omega_{\max},\, b^4 \Omega_{\mathrm{avg}}\right) \frac{1 + \log(1/\delta)}{k} + \frac{E_{\max}\, C_{n,\delta}^2}{k^2} \right).$$

The main feature of this theorem is the favorable form of the bound in the setting where $k$ is large. Then only the leading term

$$\min\left(b^2 \Omega_{\max},\, b^4 \Omega_{\mathrm{avg}}\right) \frac{1 + \log(1/\delta)}{k}$$

dominates the expression on the right-hand side. Taking square roots, it follows that, asymptotically,

$$\sin(\hat W, w) = \tilde O\left( \sqrt{\frac{\min(b^2 \Omega_{\max},\, b^4 \Omega_{\mathrm{avg}})}{k}} \right),$$

where the $\tilde O$ notation hides a logarithmic factor in $\delta$.

Our other main result shows that, in the regime where $k$ is large, there is very little room for improvement.

**Theorem 2.** *For any comparison graph $G$, and for any algorithm, as long as $k \ge c\, \sqrt{\lambda_{\max}(L)}\, n\, \Omega_{\mathrm{avg}}$ for some absolute constant $c$, we have*

$$\sup_{w \in \mathbb{R}^n_+} \mathbb{E}\left[ \sin(\hat W, w) \right] \ge \Omega\left( \sqrt{\frac{\Omega_{\mathrm{avg}}}{k}} \right),$$

*where, as before, $L$ is the graph Laplacian.*

Comparing Theorem 1 with Theorem 2, we see that the performance bounds of Theorem 1 are minimax optimal, at least up to the logarithmic factor in the confidence level $\delta$ and the dependence on the skewness factor $b$. We can thus conclude that the square root of the graph resistance is the key graph-theoretic property which captures how the relative error decays when learning from pairwise comparisons. This observation is the main contribution of this paper.

### 1.5. Comparison to Previous Work

Table 1 quantifies how much the bound of Theorem 1, expressed in terms of $\Omega_{\max}$, improves the asymptotic decay rate on various graphs over the bound (3) of (Negahban et al., 2016). The $\tilde O$ notation ignores log factors. Both random graphs are taken at a constant multiple of the threshold which guarantees connectivity; for Erdos-Renyi this means $p = O((\log n)/n)$, and for a geometric random graph this means connecting nodes at random positions in the unit square when they are $O(\sqrt{(\log n)/n})$ apart.

*Table 1. Comparison, for different families of graphs, of $\tilde O\big(\frac{b^{5/2}\, d_{\max}}{d_{\min}(1-\lambda)} \frac{1}{\sqrt{d_{\max}}}\big)$ and $\tilde O(b \sqrt{\Omega_{\max}})$, which are, respectively, the asymptotic bound (3) of (Negahban et al., 2016) and the first bound from our Theorem 1. The common decay in $k^{-1/2}$ is omitted for the sake of conciseness.*

| Graph | Eq. (3) | Theorem 1 |
| --- | --- | --- |
| Line | $b^{5/2} n^2$ | $b \sqrt{n}$ |
| Circle | $b^{5/2} n^2$ | $b \sqrt{n}$ |
| 2D grid | $b^{5/2} n$ | $b$ |
| 3D grid | $b^{5/2} n^{2/3}$ | $b$ |
| Star graph | $b^{5/2} \sqrt{n}$ | $b$ |
| Two stars joined at centers | $b^{5/2} n^{1.5}$ | $b$ |
| Barbell graph | $b^{5/2} n^{3.5}$ | $b \sqrt{n}$ |
| Geometric random graph | $b^{5/2} n$ | $b$ |
| Erdos-Renyi | $b^{5/2}$ | $b$ |

Most of the scalings for the eigenvalues of normalized Laplacians used in Table 1 are either known or easy to derive. For an analysis of the eigenvalue of the barbell graph³, we refer the reader to (Landau & Odlyzko, 1981); for mixing times of the geometric random graph, to (Avin & Ercal, 2007); and for the resistance of an Erdos-Renyi graph, to (Sylvester, 2016).

In terms of worst-case performance in the number of nodes, our bound grows at worst as $\tilde O(b \sqrt{n/k})$, using the observation that $\Omega_{\max} = O(n)$. By contrast, for the barbell graph, the bound of (Negahban et al., 2016) grows as $\tilde O(b^{5/2} n^{3.5} / \sqrt{k})$, and it is not hard to see that this is actually its worst-case scaling in terms of the number of nodes. Finally, we note that these comparisons use slightly different error measures: $|\sin(\hat W, w)|$ on our end, versus the relative error in the 2-norm after $w, \hat W$ have been normalized to sum to one, used by (Negahban et al., 2016). To compare both in terms of the latter, we could multiply our bounds by $b$ (see Lemma ??).

³Following (Wilf, 1989), the barbell graph refers to two complete graphs on $n/3$ vertices connected by a line of $n/3$ vertices.

### 1.6. Notation

The remainder of this paper is dedicated to the proof of Theorem 1 (Theorem 2 is proved in the Supplementary Information). First, however, we collect some notation we will have occasion to use. As mentioned earlier, we let $F_{ij}$ be the empirical rate of success of item $i$ in the $k$ comparisons between $i$ and $j$; thus $E[F_{ij}] = p_{ij}$, so that the previously introduced $R_{ij}$ can be expressed as $R_{ij} = F_{ij}/F_{ji}$. We also let $\rho_{ij} = w_i/w_j = p_{ij}/p_{ji}$, to which $R_{ij}$ should converge asymptotically. We will make a habit of stacking any of the quantities defined into vectors; thus $F$, for example, denotes the vector in $\mathbb{R}^{|E|}$ which stacks up the quantities $F_{ij}$, with the choice of $i$ and $j$ consistent with the orientation of the edges in the incidence matrix $B$. The vectors $p$ and $\rho$ are defined likewise.
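Before turning to the proof, we note that the resistance quantities $\Omega_{\max}$ and $\Omega_{\mathrm{avg}}$ driving Theorems 1 and 2 are straightforward to compute numerically, via the standard identity $\Omega_{ij} = (e_i - e_j)^T L^{\dagger} (e_i - e_j)$ (used in Eq. (18) below). A minimal sketch, again our own illustration:

```python
import numpy as np

def effective_resistances(L):
    """All pairwise effective resistances from the Laplacian L, using
    Omega_ij = (e_i - e_j)^T L^+ (e_i - e_j) = L+_ii + L+_jj - 2 L+_ij."""
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2 * Lp

# Example: the line (path) graph on n nodes, for which Omega_max = n - 1.
n = 6
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # adjacency matrix
L = np.diag(A.sum(axis=1)) - A                                # Laplacian
Omega = effective_resistances(L)
print(Omega.max())                      # Omega_max: 5.0 here
print(Omega.sum() / (n * (n - 1)))      # Omega_avg over all ordered pairs i != j
```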
## 2. Proof of the Algorithm Performance (Theorem 1)

We begin the proof with a sequence of lemmas which work their way up to the main theorem. The first step is to introduce some notation for the comparisons on the edge $(i,j)$. Let $X_{ij}$ be the outcome of a single coin toss comparing items $i$ and $j$. Using the standard formula for the variance of a Bernoulli random variable, we obtain

$$\operatorname{Var}(X_{ij}) = p_{ij}(1 - p_{ij}) = \frac{w_i w_j}{(w_i + w_j)^2} = \frac{1}{\rho_{ij} + 2 + \rho_{ij}^{-1}} =: \frac{1}{v_{ij}}, \tag{6}$$

where we have defined $v_{ij} = \rho_{ij} + 2 + \rho_{ij}^{-1}$. (For instance, $\rho_{ij} = 2$ gives $p_{ij} = 2/3$ and $\operatorname{Var}(X_{ij}) = 2/9 = 1/4.5$.) Observe that $v_{ij}$ is always upper bounded by $3 + \max(\rho_{ij}, \rho_{ji}) \le 3 + b \le 4b$, where we recall that $b \ge \max_{i,j} w_i/w_j \ge 1$.

We first argue that all $F_{ij}$ are reasonably close to their expected values. For the sake of concision, we state the following assumption about the constants, $\delta$, $k$, and the quantity $C_{n,\delta}$. Note that some of the intermediate results hold under weaker assumptions, but we omit these details for the sake of simplicity.

**Assumption 1.** *We have $\delta \le e^{-1}$, $C_{n,\delta} \ge c_1 \log(n/\delta)$, and $k \ge c_2\, b\, (C_{n,\delta} + 1) \max\{\Omega_{\max}, E_{\max}\}$.*

The following lemma is a standard application of Chernoff's inequality. For completeness, a proof is included in Section ?? of the Supplementary Information.

**Lemma 1.** *There exist absolute constants $c_1, c_2$ such that, under Assumption 1, with probability at least $1 - \delta$,*

$$|F_{ij} - p_{ij}| \le \sqrt{\frac{C_{n,\delta}}{k\, v_{ij}}} \quad \text{for all } (i,j) \in E.$$

The next lemma provides a convenient expression for the quantity $\log \hat W - \log w$ in terms of the measurement errors $F - p$. Note that the normalization assumption entails no loss of generality, since $w$ is defined up to a multiplicative constant, and it is directly satisfied if $\hat W$ is obtained from (5).

**Lemma 2.** *Suppose $w$ is normalized so that $\sum_{i=1}^n \log w_i = 0$. There exist absolute constants $c_1, c_2 > 0$ such that, under Assumption 1, with probability $1 - \delta$,*

$$\log \hat W - \log w = L^{\dagger} B V (F - p) + L^{\dagger} B \Delta, \tag{7}$$

$$\|\Delta\|_{\infty} \le O\left( \frac{b\, C_{n,\delta}}{k} \right), \tag{8}$$

*where $V$ is the $|E| \times |E|$ diagonal matrix whose entries are the $v_{ij}$, for all edges $(i,j) \in E$.*

*Proof.* By definition, $\log w_i - \log w_j = \log \rho_{ij}$ for all $(i,j) \in E$, which we can write as $B^T \log w = \log \rho$. It follows that $\log w = (B B^T)^{\dagger} B \log \rho = L^{\dagger} B \log \rho$, since $w$ is assumed normalized so that $\sum_{i=1}^n \log w_i = 0$. Combining this with Eq. (5), we obtain

$$\log \hat W - \log w = L^{\dagger} B (\log R - \log \rho). \tag{9}$$

We thus turn our attention to analyzing the vector $\log R - \log \rho$. Our analysis will condition on the event that, for all $(i,j) \in E$,

$$|F_{ij} - p_{ij}| \le \sqrt{\frac{C_{n,\delta}}{k\, v_{ij}}}, \tag{10}$$

which, by Lemma 1, holds with probability at least $1 - \delta$. We will call this event $A$.

We begin with one implication that comes from putting together event $A$ and our assumption $k \ge c_1 b\, C_{n,\delta}$ (contained in Assumption 1) for a constant $c_1$ that we can choose: namely, that we may assume

$$\max_{(i,j) \in E} |F_{ij} - p_{ij}| \le \frac{\min(p_{ij}, p_{ji})}{5}. \tag{11}$$

Indeed, from Eq. (10), for this last equation to hold it suffices to have $k \ge 25\, C_{n,\delta} / (v_{ij} p_{ij}^2)$ for all $(i,j) \in E$. Observing that

$$\frac{1}{v_{ij}\, p_{ij}^2} = \frac{p_{ij}\, p_{ji}}{p_{ij}^2} = \rho_{ji} \le b,$$

we see that assuming $k \ge 25\, b\, C_{n,\delta}$ is sufficient for Eq. (11) to hold conditionally on event $A$.

Our analysis of $\log R - \log \rho$ begins with the observation that, since

$$R_{ij} = \frac{1 - F_{ji}}{F_{ji}}, \qquad \rho_{ij} = \frac{1 - p_{ji}}{p_{ji}},$$

we have

$$\log R_{ij} - \log \rho_{ij} = \log\left( \frac{1}{F_{ji}} - 1 \right) - \log\left( \frac{1}{p_{ji}} - 1 \right).$$

Next we use the Taylor expansion of the function $g(x) = \log(1/x - 1)$, for which

$$g'(x) = \frac{1}{x(x-1)}, \qquad g'(p_{ji}) = -v_{ij}, \qquad g''(x) = \frac{1 - 2x}{x^2 (1-x)^2},$$

to obtain that $\log R_{ij} - \log \rho_{ij}$ can be expressed as

$$v_{ij}(F_{ij} - p_{ij}) + \frac{1}{2} \cdot \frac{1 - 2 z_{ji}}{z_{ji}^2 (1 - z_{ji})^2}\, (F_{ij} - p_{ij})^2, \tag{12}$$

where $z_{ji}$ lies between $p_{ji}$ and $F_{ji}$ (and $1 - z_{ji}$ thus lies between $p_{ij}$ and $F_{ij}$). We can rewrite this equality in the condensed form

$$\log R - \log \rho = V(F - p) + \Delta, \tag{13}$$

where $\Delta$ collects the second terms in (12), which we now bound. Because we have conditioned on event $A$, which, as discussed above, implies $|F_{ji} - p_{ji}| \le \min(p_{ij}, p_{ji})/5$, we actually have that $z_{ji} \in [0.8\, p_{ji},\, 1.2\, p_{ji}]$ and that $1 - z_{ji}$, lying between $p_{ij}$ and $F_{ij}$, belongs to $[0.8\, p_{ij},\, 1.2\, p_{ij}]$. Hence

$$|\Delta_{ij}| \le \frac{1}{2} \cdot \frac{1}{0.8^4\, p_{ij}^2\, p_{ji}^2}\, (F_{ij} - p_{ij})^2 \le c_3\, v_{ij}\, \frac{C_{n,\delta}}{k}$$

for $c_3 = \frac{1}{2 \cdot 0.8^4}$, where we have used (10) and the identity $1/(p_{ij} p_{ji}) = v_{ij}$ for the last inequality. Plugging this into Eqs. (13) and (9) completes the proof of Eq. (7), and Eq. (8) follows from the last inequality combined with the fact that $v_{ij} \le 4b$ for all $(i,j) \in E$. ∎
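As a quick numerical sanity check of the linearization (13) on a single edge (our own illustration), the first-order term $v_{ij}(F_{ij} - p_{ij})$ captures $\log R_{ij} - \log \rho_{ij}$ up to the small remainder $\Delta_{ij}$:

```python
import numpy as np

rng = np.random.default_rng(1)
w_i, w_j, k = 3.0, 1.0, 10_000
p = w_i / (w_i + w_j)                    # p_ij
rho = w_i / w_j                          # rho_ij
v = rho + 2 + 1 / rho                    # v_ij from Eq. (6); equals 1/(p*(1-p))

F = rng.binomial(k, p) / k               # empirical win rate F_ij
lhs = np.log(F / (1 - F)) - np.log(rho)  # log R_ij - log rho_ij
rhs = v * (F - p)                        # first-order term from Eq. (12)
print(lhs, rhs)                          # nearly equal; the gap is Delta_ij
```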
The following lemma bounds how much the ratios of our estimates $\hat W_\ell$ differ from the corresponding ratios of the true weights $w_\ell$. To state it, we will use the notation $Q_{ij} = (e_i - e_j)(e_i - e_j)^T$, where $e_i$ denotes the $i$th standard basis vector. Furthermore, we define the product

$$\langle x, y \rangle_{(i,j)} = x^T B^T L^{\dagger} Q_{ij} L^{\dagger} B\, y, \qquad \|x\|^2_{(i,j)} = \langle x, x \rangle_{(i,j)}. \tag{14}$$

Observe that the matrix $B^T L^{\dagger} Q_{ij} L^{\dagger} B$ is positive semidefinite, which implies by standard arguments that

$$\langle x + y,\, x + y \rangle_{(i,j)} \le 2\, \langle x, x \rangle_{(i,j)} + 2\, \langle y, y \rangle_{(i,j)}$$

holds for all vectors $x, y$.

**Lemma 3.** *Suppose $w$ is normalized so that $\sum_{i=1}^n \log w_i = 0$. There exist absolute constants $c_1, c_2 > 0$ such that, under Assumption 1, with probability $1 - \delta$, we have, for all pairs $i, j = 1, \ldots, n$,*

$$\left( \log \frac{\hat W_i}{\hat W_j} - \log \frac{w_i}{w_j} \right)^2 \le 2\, \|V(F - p)\|^2_{(i,j)} + 2\, \|\Delta\|^2_{(i,j)}, \tag{15}$$

*where $\|\Delta\|_\infty \le O(b\, C_{n,\delta}/k)$ as in Lemma 2.*

*Proof.* Observe that, on the one hand, using Lemma 2,

$$\begin{aligned} (\log \hat W - \log w)^T Q_{ij} (\log \hat W - \log w) &= \left( L^{\dagger} B V (F - p) + L^{\dagger} B \Delta \right)^T Q_{ij} \left( L^{\dagger} B V (F - p) + L^{\dagger} B \Delta \right) \\ &= \langle V(F-p) + \Delta,\; V(F-p) + \Delta \rangle_{(i,j)} \\ &\le 2\, \langle V(F-p),\, V(F-p) \rangle_{(i,j)} + 2\, \langle \Delta, \Delta \rangle_{(i,j)}, \end{aligned} \tag{16}$$

which is the right-hand side of (15). On the other hand, observe that

$$(\log \hat W - \log w)^T Q_{ij} (\log \hat W - \log w) = \left( (\log \hat W_i - \log w_i) - (\log \hat W_j - \log w_j) \right)^2 = \left( \log \frac{\hat W_i}{\hat W_j} - \log \frac{w_i}{w_j} \right)^2. \tag{17}$$

Combining Eq. (16) with Eq. (17) completes the proof. ∎

Having proved Lemma 3, we now analyze each of the terms on the right-hand side of Eq. (15). We begin with the second term, i.e., with $\|\Delta\|^2_{(i,j)}$. To bound it, we will need the following inequality.

**Lemma 4.** *For any $\Delta \in \mathbb{R}^{|E|}$, we have*

$$|\Delta^T B^T L^{\dagger} (e_i - e_j)| \le \|\Delta\|_{\infty} \sqrt{\Omega_{ij}\, |E_{ij}|},$$

*where, recall, $\Omega_{ij}$ is the resistance between nodes $i$ and $j$, and $E_{ij}$ is the set of edges belonging to some simple path from $i$ to $j$.*

*Proof.* The result follows from circuit theory, and we sketch it out along with the relevant references. The key idea is that the vector $u = B^T L^{\dagger}(e_i - e_j)$ has a simple electrical interpretation: we have $u \in \mathbb{R}^{|E|}$, and the $\ell$th entry of $u$ is the current on edge $\ell$ when a unit of current is injected at node $i$ and removed at node $j$. For details, see the discussion in Section 4.1 of (Vishnoi, 2013). The lemma follows from several consequences of this interpretation. First, the entries of $u$ form an acyclic flow from $i$ to $j$; this follows, for example, from Thompson's principle, which asserts that the current flow minimizes energy (see Theorem 4.8 of (Vishnoi, 2013)). Moreover, Thompson's principle further asserts that $\Omega_{ij} = \|u\|_2^2$. Finally, by the flow decomposition theorem (Theorem 3.5 in (Ahuja et al., 2017)), we can decompose this flow along simple paths from $i$ to $j$; this implies $|\mathrm{supp}(u)| \le |E_{ij}|$. With these facts in mind, we apply Cauchy-Schwarz to obtain

$$\|u\|_1 \le \|u\|_2 \sqrt{|\mathrm{supp}(u)|} \le \sqrt{\Omega_{ij}\, |E_{ij}|},$$

and then conclude the proof using Hölder's inequality:

$$|\Delta^T B^T L^{\dagger}(e_i - e_j)| = |\Delta^T u| \le \|\Delta\|_{\infty}\, \|u\|_1 \le \|\Delta\|_{\infty} \sqrt{\Omega_{ij}\, |E_{ij}|}. \qquad \blacksquare$$
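The electrical interpretation behind Lemma 4 is easy to verify numerically. The sketch below (our own illustration) computes the current vector $u = B^T L^{\dagger}(e_i - e_j)$ on a small graph and checks Thompson's principle $\Omega_{ij} = \|u\|_2^2$ together with the $\ell_1$ bound of the lemma:

```python
import numpy as np

# Small test graph: a 4-cycle plus a chord.
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
B = np.zeros((n, len(edges)))
for idx, (a, c) in enumerate(edges):
    B[a, idx], B[c, idx] = 1.0, -1.0
L = B @ B.T
Lp = np.linalg.pinv(L)

i, j = 0, 2
e = np.zeros(n); e[i], e[j] = 1.0, -1.0
u = B.T @ Lp @ e                       # current on each edge for a unit flow i -> j

omega_ij = e @ Lp @ e                  # resistance via the Laplacian identity
print(np.isclose(u @ u, omega_ij))     # True: Thompson's principle
print(np.abs(u).sum() <= np.sqrt(omega_ij * len(edges)))  # Lemma 4, with |E_ij| <= |E|
```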
As a corollary, we are able to bound the second term in Eq. (15). The proof follows immediately by combining Lemma 4 with Lemma 3.

**Corollary 1.** *There exist absolute constants $c_1, c_2 > 0$ such that, under Assumption 1, with probability $1 - \delta$, we have, for all pairs $i, j = 1, \ldots, n$,*

$$\|\Delta\|^2_{(i,j)} \le O\left( \Omega_{ij}\, |E_{ij}|\, \frac{b^2\, C_{n,\delta}^2}{k^2} \right).$$

We now turn to the first term in Eq. (15), which is bounded in the next lemma.

**Lemma 5.** *There exist absolute constants $c_1, c_2$ such that, under Assumption 1, with probability $1 - \delta$, we have, for all pairs $i, j = 1, \ldots, n$,*

$$\|V(F - p)\|^2_{(i,j)} \le O\left( \Omega_{ij}\, b^2\, \frac{1 + \log(1/\delta)}{k} \right).$$

*Proof.* The random variable $X_{ij} - p_{ij}$ (where, recall, $X_{ij}$ is the outcome of a single comparison between nodes $i$ and $j$) is zero-mean and supported on an interval of length 1, and is consequently subgaussian⁴ with parameter 1 (see Section 5.3 of (Lattimore & Szepesvári, 2018)). By standard properties of subgaussian random variables, it follows that $v_{ij}(F_{ij} - p_{ij})$ is subgaussian with parameter $\tau = v_{ij}/\sqrt{k} \le 4b/\sqrt{k}$. Setting $M = B^T L^{\dagger} Q_{ij} L^{\dagger} B$, it then follows from Theorem 2.1 of (Hsu et al., 2012) for subgaussian random vectors, applied to $\|V(F - p)\|^2_{(i,j)} = \left( (e_i - e_j)^T L^{\dagger} B\, V(F - p) \right)^2$, that for any $t \ge 1$, with probability at least $1 - e^{-t}$,

$$\|V(F - p)\|^2_{(i,j)} \le \frac{16 b^2}{k}\left( \operatorname{tr}(M) + 2\sqrt{\operatorname{tr}(M^2)\, t} + 2\,\|M\|\, t \right) \le \frac{16 b^2}{k}\, \operatorname{tr}(M)\, (1 + 4t),$$

where we have used $\sqrt{t} \le t$, $\operatorname{tr}(M^2) \le \operatorname{tr}(M)^2$, and $\|M\| \le \operatorname{tr}(M)$. We now compute this trace:

$$\operatorname{tr}(M) = \operatorname{tr}(B^T L^{\dagger} Q_{ij} L^{\dagger} B) = \operatorname{tr}(Q_{ij} L^{\dagger} B B^T L^{\dagger}) = \operatorname{tr}(Q_{ij} L^{\dagger}) = (e_i - e_j)^T L^{\dagger} (e_i - e_j) = \Omega_{ij}, \tag{18}$$

where the second equality uses the cyclic property of the trace, the third uses the well-known property $A^{\dagger} A A^{\dagger} = A^{\dagger}$ of the Moore-Penrose pseudoinverse (see Section 2.9 of (Drineas & Mahoney, 2018)), and the last equality uses a well-known relation between resistances and Laplacian pseudoinverses; see Chapter 4 of (Vishnoi, 2013). The result then follows by applying the above with $t = \log(1/\delta)$ (recall that $\delta \le e^{-1}$, so $t \ge 1$). ∎

⁴A random variable $Y$ is said to be subgaussian with parameter $\tau$ if $E[e^{\lambda Y}] \le e^{\tau^2 \lambda^2 / 2}$ for all $\lambda$.

Having obtained the bounds in the preceding sequence of lemmas, we now return to Lemma 3 and plug in the results we have obtained. The result is the following lemma.

**Lemma 6.** *There exist absolute constants $c_1, c_2 > 0$ such that, under Assumption 1, with probability $1 - \delta$, we have, for all pairs $i, j = 1, \ldots, n$,*

$$\left[ \frac{\hat W_i}{\hat W_j} - \rho_{ij} \right]^2 \le O\left( \rho_{ij}^2\, \frac{b\, \Omega_{ij}}{k} \left( b\,(1 + \log(1/\delta)) + \frac{b\, |E_{ij}|\, C_{n,\delta}^2}{k} \right) \right).$$

*Proof.* By putting together Lemma 3 with Corollary 1 and Lemma 5, we obtain that, with probability at least $1 - \delta$,

$$\left( \log \frac{\hat W_i}{\hat W_j} - \log \frac{w_i}{w_j} \right)^2 \le O\left( \frac{b\, \Omega_{ij}}{k} \left( b\,(1 + \log(1/\delta)) + \frac{b\, |E_{ij}|\, C_{n,\delta}^2}{k} \right) \right). \tag{19}$$

Observe that, for a sufficiently large $c_2$, if $k \ge c_2 |E_{ij}| C_{n,\delta}^2$, then the term $b(1 + \log(1/\delta)) + b |E_{ij}| C_{n,\delta}^2 / k$ is bounded by $O(b(1 + \log(1/\delta)))$. Hence, if $k$ is also at least $c_2 b^2 \Omega_{ij} (1 + \log(1/\delta))$ (which holds due to Assumption 1), Eq. (19) implies

$$\left| \log \frac{\hat W_i}{\hat W_j} - \log \frac{w_i}{w_j} \right| \le 1. \tag{20}$$

A particular implication is that

$$\max\left\{ e^{\log(\hat W_i / \hat W_j)},\; e^{\log(w_i / w_j)} \right\} \le e^{1 + \log(w_i / w_j)}.$$

Applying the inequality $|e^a - e^b| \le \max\{e^a, e^b\}\, |a - b|$ to (20) then leads to

$$\left| \frac{\hat W_i}{\hat W_j} - \rho_{ij} \right| \le e^{1 + \log(w_i / w_j)} \left| \log \frac{\hat W_i}{\hat W_j} - \log \frac{w_i}{w_j} \right|,$$

and now, using $e^{\log(w_i / w_j)} = \rho_{ij}$, the proof follows by combining the last equation with Eq. (19). ∎

The next lemma demonstrates how to convert Lemma 6 into a bound on the relative error between $\hat W$ and the true weight vector $w$.

**Lemma 7.** *Suppose we have that*

$$\left[ \frac{\hat W_i}{\hat W_j} - \rho_{ij} \right]^2 \le \rho_{ij}^2\, s_{ij}(k) \quad \text{for all } i, j = 1, \ldots, n,$$

*and fix an index $\ell \in \{1, \ldots, n\}$. Then there hold*

$$\sin(w, \hat W)^2 \le \max_j s_{j\ell}(k), \tag{21}$$

$$\sin(w, \hat W)^2 \le b^2\, s_{\mathrm{avg}}(k), \tag{22}$$

*where $s_{\mathrm{avg}} = \frac{1}{n^2} \sum_{a,b=1}^{n} s_{ab}$.*

*Proof.* It follows from Lemma ?? that, for all $\alpha$,

$$\sin(w, \hat W)^2 \le \frac{\|\hat W - \alpha w\|_2^2}{\|\alpha w\|_2^2}.$$

Taking $\alpha = \hat W_\ell / w_\ell$, we get

$$\frac{\|\hat W - \alpha w\|_2^2}{\|\alpha w\|_2^2} = \frac{\sum_i \left( \hat W_i - \frac{\hat W_\ell}{w_\ell} w_i \right)^2}{\sum_i \frac{\hat W_\ell^2}{w_\ell^2}\, w_i^2} = \frac{\sum_i \left( \frac{\hat W_i}{\hat W_\ell} - \rho_{i\ell} \right)^2}{\sum_i \rho_{i\ell}^2}.$$

Using the assumption of this lemma, we obtain

$$\frac{\|\hat W - \alpha w\|_2^2}{\|\alpha w\|_2^2} \le \frac{\sum_i s_{i\ell}(k)\, \rho_{i\ell}^2}{\sum_i \rho_{i\ell}^2}, \tag{23}$$

from which (21) follows. Another consequence of (23) is that

$$\frac{\|\hat W - \alpha w\|_2^2}{\|\alpha w\|_2^2} \le \frac{\left( \max_i \rho_{i\ell}^2 \right) \sum_i s_{i\ell}(k)}{n\, \min_j \rho_{j\ell}^2} \le \frac{b^2}{n} \sum_{i=1}^n s_{i\ell}(k), \tag{24}$$

where we used

$$\frac{\max_i \rho_{i\ell}}{\min_j \rho_{j\ell}} = \frac{\max_i w_i / w_\ell}{\min_j w_j / w_\ell} = \max_{i,j} \frac{w_i}{w_j} \le b.$$
Observe now that, since $s_{\mathrm{avg}} = \frac{1}{n^2} \sum_{\ell} \sum_i s_{i\ell}$, there must exist at least one $\ell$ for which $\frac{1}{n} \sum_{i=1}^n s_{i\ell} \le s_{\mathrm{avg}}$. Hence (22) follows from (24). ∎

Having proven this last lemma, Theorem 1 follows immediately by combining Lemma 6 and Lemma 7.

## 3. Experiments

The purpose of this section is two-fold. First, we would like to demonstrate that simulations are consistent with Theorem 1; in particular, we would like to see error scalings that are consistent with the average resistance rather than, e.g., the spectral gap. Second, we wish to observe that, although our results are asymptotic, in practice the scaling with resistance appears immediately, even for small $k$. Since our main contribution is theoretical, and since we do not claim that our algorithm is better than available methods in practice, we do not perform a comparison to other methods in the literature. Additional details about our experiments are provided in Section ?? of the Supplementary Information.

We begin with Erdos-Renyi comparison graphs. Figure 1 shows the evolution of the error with the number $k$ of comparisons per edge. The error decreases as $O(1/\sqrt{k})$, as predicted; moreover, this is already the case for small values of $k$. Next we move to the influence of graph properties. Figure 2 shows that the average error is asymptotically constant when $n$ grows while the expected degree $d := (n-1)p$ is kept constant, and that it decreases as $O(1/\sqrt{d})$ when the expected degree grows while $n$ is kept constant. This is consistent with our analysis in Table 1, and with the results of (Boumal & Cheng, 2014) showing that the average resistance $\Omega_{\mathrm{avg}}$ of Erdos-Renyi graphs evolves as $O(1/d)$.

We next consider lattice graphs in Figure 3. For the 3D lattice, the error appears to converge to a constant when $n$ grows, which is consistent with our results, since the average resistance of the 3D lattice is bounded independently of $n$. The trend for the 2D lattice also appears consistent with the $O(\sqrt{\log n})$ bound predicted by our results, since the resistance of the 2D lattice evolves as $O(\log n)$.

*Figure 1. Error evolution with the number $k$ of comparisons per edge in Erdos-Renyi graphs of 100 nodes, for different expected degrees $d = (n-1)p$, with $b = 10$. Each line corresponds to a different expected degree. The results are averaged over $N_{\mathrm{test}} = 100$ tests. The dashed line is proportional to $1/\sqrt{k}$.*

*Figure 2. Error evolution with the number of nodes $n$ for different expected degrees $d = (n-1)p$ (a), and with the expected degree $(n-1)p$ for different numbers of nodes $n$ (b). There are $k = 100$ comparisons per edge, $b = 5$, and results are averaged over $N_{\mathrm{test}} = 50$ tests. The dashed line in (b) is proportional to $1/\sqrt{d}$.*

*Figure 3. Error evolution with the number of nodes for regular lattices in 2D (a) and 3D (b). Each line corresponds to a different choice of $b$. The number of comparisons is $k = 100$ per edge. Results are averaged over $N_{\mathrm{test}} = 1000$ and $N_{\mathrm{test}} = 2000$ tests, respectively. The dashed line in (a) is proportional to $\sqrt{\log n}$.*
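A minimal version of the Erdos-Renyi experiment can be reproduced by combining the sketches from Sections 1.1 and 1.3 (again our own illustration, not the authors' experiment code; the win rates are clipped away from 0 and 1 so that $\log R_{ij}$ stays finite at small $k$):

```python
import numpy as np

rng = np.random.default_rng(2)

n, p_edge, b = 100, 0.1, 10.0
w = b ** rng.uniform(0.0, 1.0, size=n)    # qualities with skewness at most b
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p_edge]

for k in [10, 100, 1000]:
    F = simulate_comparisons(w, edges, k)                  # Section 1.1 sketch
    F = {e: np.clip(f, 1 / (2 * k), 1 - 1 / (2 * k)) for e, f in F.items()}
    W_hat = estimate_weights(F, edges, n)                  # Section 1.3 sketch
    print(k, sine_error(W_hat, w))        # error should shrink roughly as 1/sqrt(k)
```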
## 4. Conclusion

Our main contribution has been to demonstrate, by a combination of upper and lower bounds, that the error in quality estimation from pairwise comparisons scales with the square root of the graph resistance. Our work motivates a number of open questions. First, our upper and lower bounds are not tight with respect to the skewness measure $b$. We conjecture that the scaling of $\tilde O(\sqrt{b\, \Omega_{\mathrm{avg}}/k})$ for the relative error is optimal, but upper or lower bounds matching this quantity are currently unknown. Second, it would be interesting to obtain non-asymptotic versions of the results presented here. Our simulations are consistent with the asymptotic scaling $\tilde O(\sqrt{\Omega_{\mathrm{avg}}/k})$ (ignoring the dependence on $b$) taking effect immediately, but at the moment we can only prove that this scaling governs the behavior as $k \to \infty$.

## References

Agarwal, A., Agarwal, S., Assadi, S., and Khanna, S. Learning with limited rounds of adaptivity: Coin tossing, multi-armed bandits, and ranking from pairwise comparisons. In Conference on Learning Theory, pp. 39-75, 2017.

Ahuja, R. K., Magnanti, T. L., and Orlin, J. B. Network Flows: Theory, Algorithms, and Applications. Pearson Education, 2017.

Ajtai, M., Feldman, V., Hassidim, A., and Nelson, J. Sorting and selection with imprecise comparisons. ACM Transactions on Algorithms, 12(2):19, 2016.

Avin, C. and Ercal, G. On the cover time and mixing time of random geometric graphs. Theoretical Computer Science, 380:2-22, 2007.

Boumal, N. and Cheng, X. Concentration of the Kirchhoff index for Erdős-Rényi graphs. Systems & Control Letters, 74:74-80, 2014.

Bradley, R. A. and Terry, M. E. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324-345, 1952.

Braverman, M. and Mossel, E. Sorting from noisy information. arXiv preprint arXiv:0910.1191, 2009.

Chen, Y. and Suh, C. Spectral MLE: Top-K rank aggregation from pairwise comparisons. In International Conference on Machine Learning, pp. 371-380, 2015.

Drineas, P. and Mahoney, M. W. Lectures on randomized numerical linear algebra. The Mathematics of Data, 25:1, 2018.

Duchi, J. Assouad's method. https://web.stanford.edu/class/stats311/Lectures/lec-04.pdf. Lecture notes.

Falahatgar, M., Orlitsky, A., Pichapati, V., and Suresh, A. T. Maximum selection and ranking under noisy comparisons. arXiv preprint arXiv:1705.05366, 2017.

Hajek, B. and Raginsky, M. Statistical learning theory. http://maxim.ece.illinois.edu/teaching/SLT/SLT.pdf. Book draft.

Hajek, B., Oh, S., and Xu, J. Minimax-optimal inference from partial rankings. In Advances in Neural Information Processing Systems, pp. 1475-1483, 2014.

Hsu, D., Kakade, S., and Zhang, T. A tail inequality for quadratic forms of subgaussian random vectors. Electronic Communications in Probability, 17, 2012.

Jannach, D., Resnick, P., Tuzhilin, A., and Zanker, M. Recommender systems beyond matrix completion. Communications of the ACM, 59(11):94-102, 2016.

Jiang, X., Lim, L.-H., Yao, Y., and Ye, Y. Statistical ranking and combinatorial Hodge theory. Mathematical Programming, 127(1):203-244, 2011.

Landau, H. and Odlyzko, A. Bounds for eigenvalues of certain stochastic matrices. Linear Algebra and its Applications, 38:5-15, 1981.

Lattimore, T. and Szepesvári, C. Bandit Algorithms. http://downloads.tor-lattimore.com/banditbook/book.pdf, 2018. Book draft.

Luce, R. D. Individual Choice Behavior: A Theoretical Analysis. Courier Corporation, 2012.

Mallows, C. L. Non-null ranking models. I. Biometrika, 44(1/2):114-130, 1957.

Mao, C., Weed, J., and Rigollet, P. Minimax rates and efficient algorithms for noisy sorting. arXiv preprint arXiv:1710.10388, 2017.

Maystre, L. and Grossglauser, M. Just sort it! A simple and effective approach to active preference learning. arXiv preprint arXiv:1502.05556, 2015.

Negahban, S., Oh, S., and Shah, D. Iterative ranking from pair-wise comparisons. In Advances in Neural Information Processing Systems, pp. 2474-2482, 2012.
Negahban, S., Oh, S., and Shah, D. Rank centrality: Ranking from pairwise comparisons. Operations Research, 65(1):266-287, 2016.

Pananjady, A., Mao, C., Muthukumar, V., Wainwright, M. J., and Courtade, T. A. Worst-case vs average-case design for estimation from fixed pairwise comparisons. arXiv preprint arXiv:1707.06217, 2017.

Rajkumar, A. and Agarwal, S. A statistical convergence perspective of algorithms for rank aggregation from pairwise data. In International Conference on Machine Learning, pp. 118-126, 2014.

Shah, N. B., Balakrishnan, S., Bradley, J., Parekh, A., Ramchandran, K., and Wainwright, M. J. Estimation from pairwise comparisons: Sharp minimax bounds with topology dependence. The Journal of Machine Learning Research, 17(1):2049-2095, 2016.

Shah, N. B., Balakrishnan, S., Guntuboyina, A., and Wainwright, M. J. Stochastically transitive models for pairwise comparisons: Statistical and computational issues. IEEE Transactions on Information Theory, 63(2):934-959, 2017.

Spielman, D. A. and Teng, S.-H. Nearly linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems. SIAM Journal on Matrix Analysis and Applications, 35(3):835-885, 2014.

Sylvester, J. A. Random walk hitting times and effective resistance in sparsely connected Erdős-Rényi random graphs. arXiv preprint arXiv:1612.00731, 2016.

Szörényi, B., Busa-Fekete, R., Paul, A., and Hüllermeier, E. Online rank elicitation for Plackett-Luce: A dueling bandits approach. In Advances in Neural Information Processing Systems, pp. 604-612, 2015.

Tao, T. Topics in Random Matrix Theory, volume 132. American Mathematical Society, 2012.

Vishnoi, N. Lx = b. Foundations and Trends in Theoretical Computer Science, 8, 2013.

Wilf, H. S. The editor's corner: The white screen problem. The American Mathematical Monthly, 96(8):704-707, 1989.

Yue, Y., Broder, J., Kleinberg, R., and Joachims, T. The K-armed dueling bandits problem. Journal of Computer and System Sciences, 78(5):1538-1556, 2012.