# Weighted Voting Via No-Regret Learning

Nika Haghtalab, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, nhaghtal@cs.cmu.edu

Ritesh Noothigattu, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, riteshn@cmu.edu

Ariel D. Procaccia, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, arielpro@cs.cmu.edu

## Abstract

Voting systems typically treat all voters equally. We argue that perhaps they should not: voters who have supported good choices in the past should be given higher weight than voters who have supported bad ones. To develop a formal framework for desirable weighting schemes, we draw on no-regret learning. Specifically, given a voting rule, we wish to design a weighting scheme such that applying the voting rule, with voters weighted by the scheme, leads to choices that are almost as good as those endorsed by the best voter in hindsight. We derive possibility and impossibility results for the existence of such weighting schemes, depending on whether the voting rule and the weighting scheme are deterministic or randomized, as well as on the social choice axioms satisfied by the voting rule.

## 1 Introduction

In most elections, voters are entitled to equal voting power. This principle underlies the "one person, one vote" doctrine, and is enshrined in the United States Supreme Court ruling in Reynolds v. Sims (1964). But there are numerous voting systems in which voters do, in fact, have different weights. Standard examples include the European Council, where (for certain decisions) the weight of each member country is proportional to its population; and corporate voting procedures where stockholders have one vote per share. Some historical voting systems are even more pertinent: Sweden's 1866 system weighted voters by wealth, giving especially wealthy voters as many as 5000 votes; and a Belgian system, used for a decade at the end of the 19th century, gave (at least) one vote to each man, (at least) two votes to each educated man, and three votes to men who were both educated and wealthy (Congleton 2011).

The last two examples can be seen as (silly, from a modern viewpoint) attempts to weight voters by merit, using wealth and education as measurable proxies thereof. We believe that the basic idea of weighting voters by merit does itself have merit. But we propose to measure a voter's merit by the quality of his past votes. That is, a voter who has supported good choices in the past should be given higher weight than a voter who has supported bad ones.

This high-level scheme is, arguably, most applicable to repeated aggregation of objective opinions. For example, consider a group of engineers trying to decide which prototype to develop, based on an objective measure of success such as projected market share. If an engineer supported a certain prototype and it turned out to be a success, she should be given higher weight compared to her peers in future decisions; if it turned out to be a failure, her weight should be lowered. Similar examples include a group of investors selecting companies to invest in, and a group of decision makers in a movie studio choosing movie scripts to produce.
Importantly, the recently launched, not-for-profit website RoboVote.org already provides public access to voting tools for precisely these situations, albeit using methods that always treat all voters equally (Procaccia, Shah, and Zick 2016). Our goal in this paper, therefore, is to augment existing voting methods with weights, in a way that keeps track of voters' past performance, and guarantees good choices over time.

The main conceptual problem we face is the development of a formal framework in which one can reason about desirable weighting schemes. In that sense, our work is related to papers in computational social choice (Brandt et al. 2016) that study weighted voting in the context of manipulation, control, and bribery in elections (Conitzer, Sandholm, and Lang 2007; Zuckerman, Procaccia, and Rosenschein 2009; Faliszewski, Hemaspaandra, and Hemaspaandra 2009; 2015). To address this problem, we build on the no-regret learning literature, but depart from the classic setting in several ways, some superficial and some fundamental. Specifically, instead of experts, we have a set of $n$ voters. In each round, each voter reveals a ranking over a set of alternatives, and the loss of each alternative is determined. In addition, we are given a (possibly randomized) voting rule, which receives weighted rankings as input, and outputs the winning alternative. The voting rule is not part of our design space; it is exogenous and fixed throughout the process. The loss of a voter in round $t$ is given by assigning his ranking all the weight (equivalently, imagining that all voters have that ranking), applying the voting rule, and measuring the loss of the winning alternative (or the expected loss, if the rule is randomized). As in the classic setting, our benchmark is the best voter in hindsight (but we also discuss the stronger benchmark of best voter weights in hindsight in Section 6).

At first glance, it may seem that our setting easily reduces to the classic one, by treating voters as experts. But our loss is computed by applying the given voting rule to the entire profile of weighted rankings, and therein lies the rub. For the same reason, our work is quite different from papers on online learning algorithms for ranking, where the algorithm chooses a ranking of objects at each stage and suffers a loss based on the relevance of the ranking (Radlinski, Kleinberg, and Joachims 2008; Chaudhuri and Tewari 2015). This leads to our main research question:

*For which voting rules is there a weighting scheme such that the difference between our average per-round loss and that of the best voter goes to zero as the number of rounds goes to infinity?*

In Section 4, we devise no-regret weighting schemes for any voting rule, under two classic feedback models: full information and partial information. While these results make no assumptions on the voting rule, the foregoing weighting schemes heavily rely on randomization. By contrast, deterministic weighting schemes seem more desirable, as they are easier to interpret and explain. In Section 5, therefore, we restrict our attention to deterministic weighting schemes. We find that if the voting rule is itself deterministic, it admits a no-regret weighting scheme if and only if it is constant on unanimous profiles. Because this property is not satisfied by any reasonable rule, the theorem should be interpreted as a strong impossibility result. We next consider randomized voting rules, and find that they give rise to much more subtle results, which depend on the properties of the voting rule in question.

## 2 Preliminaries

Our work draws on social choice theory and online learning. In this section we present important concepts and results from each of these areas in turn.
### 2.1 Social Choice

We consider a set $[n] = \{1, \ldots, n\}$ of voters and a set $A$ of $m$ alternatives. A vote $\sigma : A \to [m]$ is a linear ordering, i.e., a ranking or permutation of the alternatives. That is, for any vote $\sigma$ and alternative $a$, $\sigma(a)$ denotes the position of alternative $a$ in vote $\sigma$. For any $a, b \in A$, $\sigma(a) < \sigma(b)$ indicates that alternative $a$ is preferred to $b$ under vote $\sigma$. We also denote this preference by $a \succ_\sigma b$. We denote the set of all $m!$ possible votes over $A$ by $L(A)$.

A vote profile $\vec{\sigma} \in L(A)^n$ denotes the votes of the $n$ voters. Furthermore, given a vote profile $\vec{\sigma} \in L(A)^n$ and a weight vector $w \in \mathbb{R}^n_{\geq 0}$, we define the anonymous vote profile corresponding to $\vec{\sigma}$ and $w$, denoted $\pi \in [0,1]^{|L(A)|}$, by setting
$$\pi_\sigma = \frac{1}{\|w\|_1} \sum_{i=1}^{n} w_i\, \mathbb{1}(\sigma_i = \sigma), \quad \forall \sigma \in L(A).$$
That is, $\pi$ is an $|L(A)|$-dimensional vector such that for each vote $\sigma \in L(A)$, $\pi_\sigma$ is the fraction of the total weight on $\sigma$. When needed, we write $\pi_{\vec{\sigma},w}$ to clarify the vote profile and weight vector to which the anonymous vote profile corresponds. Note that $\pi_{\vec{\sigma},w}$ only contains the anonymized information about $\vec{\sigma}$ and $w$, i.e., the anonymous vote profile remains the same even when the identities of the voters change.

To aggregate the (weighted) votes into a distribution over alternatives, we next introduce the concept of (anonymous) voting rules. Let $\Delta(L(A))$ be the set of all possible anonymous vote profiles. Similarly, let $\Delta(A)$ denote the set of all possible distributions over $A$. An anonymous voting rule is a function $f : \Delta(L(A)) \to \Delta(A)$ that takes as input an anonymous vote profile $\pi$ and returns a distribution over the alternatives, indicated by a vector $f(\pi)$, where $f(\pi)_a$ is the probability that alternative $a$ is the winner under $\pi$. We say that a voting rule $f$ is deterministic if for any $\pi \in \Delta(L(A))$, $f(\pi)$ has support of size 1, i.e., there is a unique winner.

An anonymous deterministic voting rule $f$ is called strategyproof if for any voter $i \in [n]$, any two vote profiles $\vec{\sigma}$ and $\vec{\sigma}'$ for which $\sigma_j = \sigma'_j$ for all $j \neq i$, and any weight vector $w$, it holds that either $a = a'$ or $a \succ_{\sigma_i} a'$, where $a$ and $a'$ are the winning alternatives in $f(\pi_{\vec{\sigma},w})$ and $f(\pi_{\vec{\sigma}',w})$, respectively. In words, whenever voter $i$ reports $\sigma'_i$ instead of $\sigma_i$, the outcome does not improve according to the true ranking $\sigma_i$. While strategyproofness is a natural property to be desired in a voting rule, the celebrated Gibbard-Satterthwaite Theorem (Gibbard 1973; Satterthwaite 1975) shows that non-dictatorial strategyproof deterministic voting rules do not exist (the theorem also requires a range of size at least 3). Subsequently, Gibbard (1977) extended this result to randomized voting rules. Before presenting his extension, we introduce some additional definitions.

Given a loss function over the alternatives, denoted by a vector $\ell \in [0,1]^m$, the expected loss of the alternative chosen by the rule $f$ under an anonymous vote profile $\pi$ is
$$L_f(\pi, \ell) \triangleq \mathbb{E}_{a \sim f(\pi)}[\ell_a] = f(\pi) \cdot \ell.$$
The higher the loss, the worse the alternative. We say that the loss function $\ell$ is consistent with vote $\sigma \in L(A)$ if for all $a, b \in A$, $a \succ_\sigma b \Leftrightarrow \ell_a < \ell_b$. An anonymous randomized rule $f$ is strategyproof if for any voter $i \in [n]$, any two vote profiles $\vec{\sigma}$ and $\vec{\sigma}'$ for which $\sigma_j = \sigma'_j$ for all $j \neq i$, any weight vector $w$, and any loss function $\ell$ that is consistent with $\sigma_i$, we have $L_f(\pi_{\vec{\sigma},w}, \ell) \leq L_f(\pi_{\vec{\sigma}',w}, \ell)$.
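To make the notation concrete, here is a minimal sketch (illustrative Python, not part of the paper) of computing the anonymous vote profile $\pi_{\vec{\sigma},w}$ and the expected loss $L_f(\pi, \ell)$; random dictatorship is used only as a hypothetical example of an anonymous rule.

```python
from collections import defaultdict

def anonymous_profile(profile, weights):
    """Compute pi_{sigma,w}: the fraction of the total weight on each ranking.

    profile: one ranking per voter, each a tuple of alternatives ordered
             from most to least preferred.
    weights: nonnegative voter weights (same length as profile).
    """
    total = sum(weights)
    pi = defaultdict(float)
    for ranking, w in zip(profile, weights):
        pi[ranking] += w / total
    return pi

def random_dictatorship(pi):
    """A hypothetical anonymous rule: pick a ranking with probability
    pi_sigma and output its top choice (a distribution over unilaterals)."""
    dist = defaultdict(float)
    for ranking, mass in pi.items():
        dist[ranking[0]] += mass
    return dist

def expected_loss(f, pi, loss):
    """L_f(pi, ell) = sum_a f(pi)_a * ell_a."""
    return sum(prob * loss[a] for a, prob in f(pi).items())

# Two voters with weights 2 and 1 over alternatives {a, b, c}.
profile = [("a", "b", "c"), ("b", "a", "c")]
weights = [2.0, 1.0]
pi = anonymous_profile(profile, weights)  # {('a','b','c'): 2/3, ('b','a','c'): 1/3}
loss = {"a": 0.1, "b": 0.5, "c": 0.9}
print(expected_loss(random_dictatorship, pi, loss))  # (2/3)*0.1 + (1/3)*0.5
```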
The next proposition is an interpretation of a result of Gibbard (1977) on the structural property shared by all strategyproof randomized voting rules, applied to anonymous voting rules.

Proposition 2.1. Any strategyproof randomized rule is a distribution over a collection of the following types of rules:

1. Anonymous unilaterals: $g$ is an anonymous unilateral if there exists a function $h : L(A) \to A$ for which
$$g(\pi) = \sum_{\sigma \in L(A)} \pi_\sigma\, e_{h(\sigma)},$$
where $e_a$ is the unit vector that has 1 in the coordinate corresponding to $a \in A$, and 0 in all other coordinates.

2. Duples: $g$ is a duple rule if $|\{a \in A \mid \exists \pi \text{ such that } g(\pi)_a \neq 0\}| \leq 2$.

Examples of strategyproof randomized voting rules include randomized positional scoring rules and the randomized Copeland rule, which were previously studied in this context (Conitzer and Sandholm 2006; Procaccia 2010). The reader is referred to the full version of the paper (Haghtalab, Noothigattu, and Procaccia 2017) for more details.

### 2.2 Online Learning

We next describe the general setting of online learning, also known as learning from experts. We consider a game between a learner and an adversary. There is a set of actions (a.k.a. experts) $X$ available to the learner, a set of actions $Y$ available to the adversary, and a loss function $c : X \times Y \to [0, 1]$ that is known to both parties. In every time step $t \in [T]$, the learner chooses a distribution, denoted by a vector $p^t \in \Delta(X)$, over the actions in $X$, and the adversary chooses an action $y^t$ from the set $Y$. The learner then receives a loss of $c(x^t, y^t)$ for $x^t \sim p^t$. At this point, the learner receives some feedback regarding the action of the adversary. In the full information setting, the learner observes $y^t$ before proceeding to the next time step. In the partial information setting, the learner only observes the loss $c(x^t, y^t)$.

The regret of the algorithm is defined as the difference between its total expected loss and that of the best fixed action in hindsight. The goal of the learner is to minimize its expected regret, that is, minimize
$$\mathbb{E}[\mathrm{Reg}_T] \triangleq \mathbb{E}\!\left[\sum_{t=1}^{T} c(x^t, y^t)\right] - \min_{x \in X} \sum_{t=1}^{T} c(x, y^t),$$
where the expectation is taken over the choice of $x^t \sim p^t$, and any other random choices made by the algorithm and the adversary. An online algorithm is called a no-regret algorithm if $\mathbb{E}[\mathrm{Reg}_T] \in o(T)$. In words, the average regret of the learner must go to 0 as $T \to \infty$. In general, deterministic algorithms, for which $\|p^t\|_\infty = 1$, can suffer linear regret, because the adversary can choose a sequence of actions $y^1, \ldots, y^T$ on which the algorithm makes sub-optimal decisions at every round. Therefore, randomization is one of the key aspects of no-regret algorithms.

Many online no-regret algorithms are known for the full information and the partial information settings. In particular, the HEDGE algorithm (Freund and Schapire 1995) is one of the earliest results in this space for the full information setting. At time $t+1$, HEDGE picks each action $x$ with probability $p^{t+1}_x \propto \exp(-\eta\, C^t(x))$, for $C^t(x) = \sum_{s=1}^{t} c(x, y^s)$ and $\eta = \Theta\!\left(\sqrt{2\ln(|X|)/T}\right)$.

Proposition 2.2 (Freund and Schapire 1995). HEDGE has regret $\mathbb{E}[\mathrm{Reg}_T] \in O\!\left(\sqrt{T \ln(|X|)}\right)$.

For the partial information setting, the EXP3 algorithm of Auer et al. (2002) can be thought of as a variant of the HEDGE algorithm with importance weighting.
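For concreteness, here is a minimal sketch of HEDGE in the full-information experts setting (illustrative Python that is not part of the paper; the `loss_fn` interface is a hypothetical stand-in for the full-information feedback).

```python
import math
import random

def hedge(num_actions, T, loss_fn):
    """Minimal HEDGE sketch: maintain cumulative losses and sample actions
    with probability proportional to exp(-eta * cumulative_loss).

    loss_fn(t, x) returns the loss in [0, 1] of action x at round t; in the
    full information setting it can be queried for every action after the round.
    """
    eta = math.sqrt(2.0 * math.log(num_actions) / T)
    cum_loss = [0.0] * num_actions
    total = 0.0
    for t in range(T):
        weights = [math.exp(-eta * c) for c in cum_loss]
        z = sum(weights)
        probs = [w / z for w in weights]
        x = random.choices(range(num_actions), weights=probs)[0]
        total += loss_fn(t, x)
        # Full information: observe the loss of every action this round.
        for a in range(num_actions):
            cum_loss[a] += loss_fn(t, a)
    return total
```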
Specifically, at time $t+1$, EXP3 picks each action $x$ with probability $p^{t+1}_x \propto \exp(-\eta\, \hat{C}^t(x))$, for $\eta = \Theta\!\left(\sqrt{2\ln(|X|)/(T|X|)}\right)$ and
$$\hat{C}^t(x) = \sum_{s=1}^{t} \frac{\mathbb{1}(x^s = x)\, c(x, y^s)}{p^s_x}.$$
In other words, EXP3 is similar to HEDGE, except that instead of taking into account the total loss of an action, $C^t(x)$, it takes into account an estimate of the loss, $\hat{C}^t(x)$.

## 3 Problem Formulation

In this section, we formulate the question of how one can design a weighting scheme that effectively weights the rankings of voters based on the history of their votes and the performance of the selected alternatives. We consider a setting where $n$ voters participate in a sequence of elections that are decided by a known voting rule $f$. In each election, voters submit their rankings over a different set of $m$ alternatives so as to elect a winner. Given an adversarial sequence of voters' rankings $\vec{\sigma}^{1:T}$ and alternative losses $\ell^{1:T}$ over a span of $T$ elections, the best voter is the one whose rankings lead to the election of the winners with smallest loss overall. We call this voter the best voter in hindsight. (See Section 6 for a discussion of a stronger benchmark: best weight vector in hindsight.)

When the sequence of elections is not known a priori, the best voter is not known either. In this case, the weighting scheme has to take an online approach to weighting the voters' rankings. That is, at each time step $t \leq T$, the weighting scheme chooses a weight vector $w^t$, possibly at random, to weight the rankings of the voters. After the election is held, the weighting scheme receives some feedback regarding the quality of the alternatives in that election, typically in the form of the loss of the elected alternative or that of all alternatives. Using the feedback, the weighting scheme then re-weights the voters' rankings based on their performance so far.

Our goal is to design a weighting scheme that weights the rankings of the voters at each time step, and elects winners with overall expected loss that is almost as small as that of the best voter in hindsight. We refer to the expected difference between these losses as the expected regret. Formally, let $L_f(\pi, \ell) \triangleq \sum_{a \in A} f(\pi)_a\, \ell_a$ be the expected loss of the (possibly randomized) voting rule $f$ under the anonymous preference profile $\pi$ and loss vector $\ell$. Then the expected regret is
$$\mathbb{E}[\mathrm{Reg}_T] \triangleq \mathbb{E}\!\left[\sum_{t=1}^{T} L_f(\pi_{\vec{\sigma}^t, w^t}, \ell^t) - \min_{i \in [n]} \sum_{t=1}^{T} L_f(\pi_{\vec{\sigma}^t, e_i}, \ell^t)\right],$$
where the expectation is taken over any additional source of randomness in the adversarial sequence or the algorithm. In particular, we seek a weighting scheme for which the average expected regret goes to zero as the time horizon $T$ goes to infinity, at a rate that is polynomial in the number of voters and alternatives. That is, we wish to achieve $\mathbb{E}[\mathrm{Reg}_T] \in \mathrm{poly}(n, m) \cdot o(T)$. This is our version of a no-regret algorithm.

The type of feedback is an important factor in designing a weighting scheme. Analogously to the online learning models described in Section 2.2, we consider two types of feedback, full information and partial information. In the full information case, after a winner is selected at time $t$, the quality of all alternatives and the rankings of the voters at that round are revealed to the weighting scheme. Note that this information is sufficient for computing the loss of each voter's rankings so far. This would be the case, for example, if the alternatives are companies to invest in. On the other hand, in the partial information setting only the loss of the winner is revealed.
This type of feedback is appropriate when the alternatives are product prototypes: we cannot know how successful an undeveloped prototype would have been, but obviously we can measure the success of a prototype that was selected for development. More formally, in the full information setting the choice of $w^{t+1}$ can depend on $\vec{\sigma}^{1:t}$ and $\ell^{1:t}$, while in the partial information setting it can only depend on $\vec{\sigma}^{1:t}$ and $\ell^s_{a^s}$ for $s \leq t$, where $a^s$ is the alternative that won the election at time $s$.

No doubt the reader has noted that the above problem formulation is closely related to the general setting of online learning. Using the language of online learning introduced in Section 2.2, the weight vector $w^t$ corresponds to the learner's action $x^t$, the vote profile and alternative losses $(\vec{\sigma}^t, \ell^t)$ correspond to the adversary's action $y^t$, the expected loss of the weighting scheme $L_f(\pi_{\vec{\sigma}^t, w^t}, \ell^t)$ corresponds to the loss of the learning algorithm $c(x^t, y^t)$, and the best-in-hindsight voter (or weight vector $e_{i^*}$) corresponds to the best-in-hindsight action.

## 4 Randomized Weights

In this section, we develop no-regret algorithms for the full information and partial information settings. We essentially require no assumptions on the voting rule, but also impose no restrictions on the weighting scheme. In particular, the weighting scheme may be randomized, that is, the weights can be sampled from a distribution over weight vectors. This allows us to obtain general positive results.

As we just discussed, our setting is closely related to the classic online learning setting. Here, we introduce an algorithm analogous to HEDGE that works in the full information setting of Section 3 and achieves no-regret guarantees.

Algorithm 1: Full information setting, using randomized weights.
- Input: adversarial sequences $\vec{\sigma}^{1:T}$ and $\ell^{1:T}$, and parameter $\eta = \sqrt{2\ln(n)/T}$.
- For $t = 1, \ldots, T$:
  - Play weight vector $e_i$ with probability $p^t_i \propto \exp\!\left(-\eta \sum_{s=1}^{t-1} L_f(\pi_{\vec{\sigma}^s, e_i}, \ell^s)\right)$.
  - Observe $\ell^t$ and $\vec{\sigma}^t$.

Theorem 4.1. For any anonymous voting rule $f$ and $n$ voters, Algorithm 1 has regret $O(\sqrt{T \ln n})$ in the full information setting.

Proof sketch. At a high level, this algorithm only considers weight vectors that correspond to a single voter. At every time step, the algorithm chooses a distribution over such weight vectors and applies the voting rule to one such weight vector that is drawn at random from this distribution. This is equivalent to applying the HEDGE algorithm to a set of actions, each of which is a weight vector that corresponds to a single voter. In addition, the loss of the benchmark weighting scheme is the smallest loss that one can get from following one such weight vector. Therefore, the theorem follows from Proposition 2.2.

Let us now address the partial information setting. One may wonder whether the above approach, i.e., reducing our problem to online learning and using a standard algorithm, directly extends to the partial information setting (with the EXP3 algorithm). The answer is that it does not. In particular, in the classic setting of online learning with partial information feedback, the algorithm can compute the estimated loss of the action it just played, that is, the algorithm can compute $c(x^t, y^t)$. In our problem setting, however, the weighting scheme only observes $\vec{\sigma}^t$ and $\ell^t_{a^t}$ for the specific alternative $a^t$ that was elected at this time step. Since the losses of the other alternatives remain unknown, the weighting scheme cannot even compute the expected loss of the specific voter $i^t$ it selected at time $t$, i.e., $L_f(\pi_{\vec{\sigma}^t, e_{i^t}}, \ell^t)$.
Therefore, we cannot directly use the EXP3 algorithm by imagining that the voters are actions, as we do not obtain the partial information feedback that the algorithm requires. Nevertheless, we can design a new algorithm inspired by EXP3.

Algorithm 2: Partial information setting, using randomized weights.
- Input: adversarial sequences $\vec{\sigma}^{1:T}$ and $\ell^{1:T}$, and parameter $\eta = \sqrt{2\ln(n)/(Tn)}$.
- Let $\hat{L}^0 = \mathbf{0}$.
- For $t = 1, \ldots, T$:
  - For $i = 1, \ldots, n$: let $p^t_i \propto \exp(-\eta\, \hat{L}^{t-1}_i)$.
  - Draw $i^t \sim p^t$ and play weight vector $e_{i^t}$; observe the vote profile $\vec{\sigma}^t$, the alternative $a^t \sim f(\pi_{\vec{\sigma}^t, e_{i^t}})$, and its loss $\ell^t_{a^t}$.
  - Let $\hat{\ell}^t$ be the vector such that $\hat{\ell}^t_{i^t} = \ell^t_{a^t}/p^t_{i^t}$ and $\hat{\ell}^t_i = 0$ for $i \neq i^t$.
  - Let $\hat{L}^t = \hat{L}^{t-1} + \hat{\ell}^t$.

Theorem 4.2. For any anonymous voting rule $f$ and $n$ voters, Algorithm 2 has regret $O(\sqrt{Tn \ln n})$ in the partial information setting.

To prove the theorem, we show that certain properties, which are necessary for the performance of EXP3, still hold in our setting. Specifically, Lemma 4.3 asserts that $\hat{\ell}^t$ yields an unbiased estimator of the expected loss of the weighting scheme. Moreover, it states that for any voter $i^*$, $\hat{L}^t_{i^*}$ is an unbiased estimator of the loss that the weighting scheme would have received if it followed the rankings of voter $i^*$ throughout the sequence of elections. Lemma 4.4 then establishes that the variance of this estimator is small.

Lemma 4.3. For any $t$, any $i^*$, $i^t \sim p^t$, and $a^t \sim f(\pi_{\vec{\sigma}^t, e_{i^t}})$, we have
$$\mathbb{E}_{i^t, a^t}\!\left[\sum_{i=1}^{n} p^t_i\, \hat{\ell}^t_i\right] = \mathbb{E}_{i^t}\!\left[L_f(\pi_{\vec{\sigma}^t, e_{i^t}}, \ell^t)\right] \quad\text{and}\quad \mathbb{E}\!\left[\hat{L}^T_{i^*}\right] = \sum_{t=1}^{T} L_f(\pi_{\vec{\sigma}^t, e_{i^*}}, \ell^t).$$

Proof. For ease of notation, we suppress $t$ when it is clear from the context. First note that $\hat{\ell}$ is zero in all of its elements, except for $\hat{\ell}_{i^t}$. So
$$\sum_{i=1}^{n} p_i\, \hat{\ell}_i = p_{i^t}\, \hat{\ell}_{i^t} = p_{i^t} \cdot \frac{\ell_{a^t}}{p_{i^t}} = \ell_{a^t}.$$
Therefore, we have
$$\mathbb{E}_{i^t, a^t}\!\left[\sum_{i=1}^{n} p_i\, \hat{\ell}_i\right] = \mathbb{E}_{i^t, a^t}\!\left[\ell_{a^t}\right] = \mathbb{E}_{i^t}\!\left[L_f(\pi_{\vec{\sigma}, e_{i^t}}, \ell)\right].$$
For clarity of presentation, let $\hat{\ell}^{i,a}$ be an alternative representation of $\hat{\ell}$ when $i^t = i$ and $a^t = a$. Note that $\hat{\ell}^{i,a}_{i'} \neq 0$ only if $i' = i$. We have
$$\mathbb{E}\!\left[\hat{L}^T_{i^*}\right] = \sum_{t=1}^{T} \mathbb{E}_{i^t, a^t}\!\left[\hat{\ell}^{\,i^t, a^t}_{i^*}\right] = \sum_{t=1}^{T} \sum_{i=1}^{n} p^t_i\, \mathbb{E}_{a \sim f(\pi_{\vec{\sigma}^t, e_i})}\!\left[\hat{\ell}^{\,i,a}_{i^*}\right] = \sum_{t=1}^{T} p^t_{i^*}\, \mathbb{E}_{a \sim f(\pi_{\vec{\sigma}^t, e_{i^*}})}\!\left[\frac{\ell^t_a}{p^t_{i^*}}\right] = \sum_{t=1}^{T} \mathbb{E}_{a \sim f(\pi_{\vec{\sigma}^t, e_{i^*}})}\!\left[\ell^t_a\right] = \sum_{t=1}^{T} L_f(\pi_{\vec{\sigma}^t, e_{i^*}}, \ell^t).$$

Lemma 4.4. For any $t$, $i^t \sim p^t$, and $a^t \sim f(\pi_{\vec{\sigma}^t, e_{i^t}})$, we have
$$\mathbb{E}_{i^t, a^t}\!\left[\sum_{i=1}^{n} p^t_i\, (\hat{\ell}^t_i)^2\right] \leq n.$$

Proof. For ease of notation, we suppress $t$ when it is clear from the context. Since $\hat{\ell}$ is zero in all of its elements, except for $\hat{\ell}_{i^t}$, we have
$$\sum_{i=1}^{n} p_i\, (\hat{\ell}_i)^2 = p_{i^t}\, (\hat{\ell}_{i^t})^2 = p_{i^t}\left(\frac{\ell_{a^t}}{p_{i^t}}\right)^2 = \frac{(\ell_{a^t})^2}{p_{i^t}}.$$
Therefore,
$$\mathbb{E}_{i^t, a^t}\!\left[\sum_{i=1}^{n} p_i\, (\hat{\ell}_i)^2\right] = \mathbb{E}_{i^t, a^t}\!\left[\frac{(\ell_{a^t})^2}{p_{i^t}}\right] = \sum_{i=1}^{n} p_i\, \mathbb{E}_{a \sim f(\pi_{\vec{\sigma}, e_i})}\!\left[\frac{(\ell_a)^2}{p_i}\right] = \sum_{i=1}^{n} \mathbb{E}_{a \sim f(\pi_{\vec{\sigma}, e_i})}\!\left[(\ell_a)^2\right] \leq n.$$

We are now ready to prove the theorem.

Proof of Theorem 4.2. We use a potential function, given by
$$\Phi^t \triangleq -\frac{1}{\eta} \ln\!\left(\sum_{i=1}^{n} \exp(-\eta\, \hat{L}^{t-1}_i)\right).$$
We prove the claim by analyzing the expected increase in this potential function at every time step. Note that
$$\Phi^{t+1} - \Phi^t = -\frac{1}{\eta} \ln\!\left(\frac{\sum_{i=1}^{n} \exp(-\eta\, \hat{L}^{t-1}_i - \eta\, \hat{\ell}^t_i)}{\sum_{i=1}^{n} \exp(-\eta\, \hat{L}^{t-1}_i)}\right) = -\frac{1}{\eta} \ln\!\left(\sum_{i=1}^{n} p^t_i \exp(-\eta\, \hat{\ell}^t_i)\right).$$
Taking the expected increase in the potential function over the random choices of $i^t$ and $a^t$ for all $t = 1, \ldots, T$, we have
$$\mathbb{E}\!\left[\Phi^{T+1} - \Phi^1\right] = \sum_{t=1}^{T} \mathbb{E}_{i^t, a^t}\!\left[\Phi^{t+1} - \Phi^t\right] \geq \sum_{t=1}^{T} \mathbb{E}\!\left[-\frac{1}{\eta}\ln\!\left(1 - \eta \sum_{i=1}^{n} p^t_i\, \hat{\ell}^t_i + \frac{\eta^2}{2} \sum_{i=1}^{n} p^t_i\, (\hat{\ell}^t_i)^2\right)\right] \geq \sum_{t=1}^{T} \mathbb{E}\!\left[\sum_{i=1}^{n} p^t_i\, \hat{\ell}^t_i - \frac{\eta}{2} \sum_{i=1}^{n} p^t_i\, (\hat{\ell}^t_i)^2\right] \geq \sum_{t=1}^{T} \mathbb{E}_{i^t}\!\left[L_f(\pi_{\vec{\sigma}^t, e_{i^t}}, \ell^t)\right] - \frac{\eta T n}{2},$$
where the second transition follows from the identity above and the fact that $e^{-x} \leq 1 - x + \frac{x^2}{2}$ for all $x \geq 0$, the third transition follows from $\ln(1-x) \leq -x$, and the last transition holds by Lemmas 4.3 and 4.4. On the other hand, $\Phi^1 = -\frac{1}{\eta}\ln n$, and for any $i^*$,
$$\Phi^{T+1} = -\frac{1}{\eta}\ln\!\left(\sum_{i=1}^{n}\exp(-\eta\, \hat{L}^T_i)\right) \leq -\frac{1}{\eta}\ln\!\left(\exp(-\eta\, \hat{L}^T_{i^*})\right) = \hat{L}^T_{i^*}.$$
Therefore, by Lemma 4.3,
$$\mathbb{E}\!\left[\Phi^{T+1} - \Phi^1\right] \leq \mathbb{E}\!\left[\hat{L}^T_{i^*}\right] + \frac{\ln n}{\eta} = \sum_{t=1}^{T} L_f(\pi_{\vec{\sigma}^t, e_{i^*}}, \ell^t) + \frac{\ln n}{\eta}.$$
Combining the two bounds, choosing $i^*$ to be the best voter in hindsight, and using the parameter value $\eta = \sqrt{2\ln(n)/(Tn)}$, we obtain
$$\mathbb{E}\!\left[\sum_{t=1}^{T} L_f(\pi_{\vec{\sigma}^t, e_{i^t}}, \ell^t)\right] - \min_{i \in [n]} \sum_{t=1}^{T} L_f(\pi_{\vec{\sigma}^t, e_i}, \ell^t) \leq \frac{\ln n}{\eta} + \frac{\eta T n}{2} = \sqrt{2 T n \ln n}.$$

## 5 Deterministic Weights

One of the key aspects of the weighting schemes we used in the previous section is randomization.
In such weighting schemes, the weights of the voters depend not only on their performance so far, but also on the algorithm's coin flips. In practice, voters would most likely prefer weighting schemes that depend only on their past performance, and are therefore easier to interpret. In this section, we focus on designing weighting schemes that are deterministic in nature. Formally, a deterministic weighting scheme is an algorithm that at time step $t+1$ deterministically chooses one weight vector $w^{t+1}$ based on the history of play, i.e., the sequences $\vec{\sigma}^{1:t}$, $\ell^{1:t}$, and $a^{1:t}$.

In this section, we seek an answer to the following question: For which voting rules is there a no-regret deterministic weighting scheme? In contrast to the results established in the previous section, we find that the properties of the voting rule play an important role here. In the remainder of this section, we show possibility and impossibility results for the existence of such weighting schemes under randomized and deterministic voting rules.

We begin our search for deterministic weighting schemes by considering deterministic voting rules. Note that in this case the winning alternatives are induced deterministically by the weighting scheme, so the weight vector $w^{t+1}$ should be deterministically chosen based on the sequences $\vec{\sigma}^{1:t}$ and $\ell^{1:t}$. We establish an impossibility result: essentially no deterministic weighting scheme is no-regret for a deterministic voting rule. Specifically, we show that a deterministic no-regret weighting scheme exists for a deterministic voting rule if and only if the voting rule is constant on unanimous profiles.

Definition 5.1. A voting rule $f$ is constant on unanimous profiles if and only if for all $\sigma, \sigma' \in L(A)$, $f(e_\sigma) = f(e_{\sigma'})$, where $e_\sigma$ denotes the anonymous vote profile that has all of its weight on ranking $\sigma$.

Theorem 5.2. For any deterministic voting rule $f$, a deterministic weighting scheme with regret $o(T)$ exists if and only if $f$ is constant on unanimous profiles. This is true in both the full information and partial information settings.

Proof. We first prove that for any voting rule that is constant on unanimous profiles there exists a deterministic weighting scheme that is no-regret. Consider such a voting rule $f$ and a simple deterministic weighting scheme that uses weight vector $w^t = e_1$ for every time step $t \leq T$ (so it does not use feedback, whether full or partial, at all). Note that at each time step $t$ and for any voter $i \in [n]$,
$$f(\pi_{\vec{\sigma}^t, w^t}) = f(e_{\sigma^t_1}) = f(e_{\sigma^t_i}) = f(\pi_{\vec{\sigma}^t, e_i}),$$
where the second transition holds because $f$ is constant on unanimous profiles. As a result, $L_f(\pi_{\vec{\sigma}^t, w^t}, \ell^t) = L_f(\pi_{\vec{\sigma}^t, e_i}, \ell^t)$. In words, the total loss of the weighting scheme is the same as the total loss of any individual voter, so this weighting scheme has 0 regret.

Next, we prove that if $f$ is not constant on unanimous profiles then for any deterministic weighting scheme there is an adversarial sequence of $\vec{\sigma}^{1:T}$ and $\ell^{1:T}$ that leads to regret of $\Omega(T)$, even in the full information setting. Take any such voting rule $f$ and let $\tau, \tau' \in L(A)$ be such that $f(e_\tau) \neq f(e_{\tau'})$. At time $t$, the adversary chooses $\vec{\sigma}^t$ and $\ell^t$ based on the deterministic weight vector $w^t$ as follows: the adversary sets $\vec{\sigma}^t$ to be such that $\sigma^t_1 = \tau$ and $\sigma^t_j = \tau'$ for all $j \neq 1$. Let alternative $a^t$ be the winner of profile $\pi_{\vec{\sigma}^t, w^t}$, i.e., $f(\pi_{\vec{\sigma}^t, w^t}) = e_{a^t}$. The adversary then sets $\ell^t_{a^t} = 1$ and $\ell^t_x = 0$ for all $x \neq a^t$. Therefore, the weighting scheme incurs a loss of 1 at every step, and its total loss is
$$\sum_{t=1}^{T} L_f(\pi_{\vec{\sigma}^t, w^t}, \ell^t) = \sum_{t=1}^{T} \ell^t_{a^t} = T.$$
Let us now consider the total loss that the ranking of any individual voter incurs. By design, for any $j > 1$,
$$f(\pi_{\vec{\sigma}^t, e_1}) = f(e_\tau) \neq f(e_{\tau'}) = f(\pi_{\vec{\sigma}^t, e_j}).$$
Therefore, for at least one voter $i \in [n]$, $f(\pi_{\vec{\sigma}^t, e_i}) \neq e_{a^t}$. Note that such a voter receives a loss of 0, so the combined loss of all voters is at most $n - 1$. Over all time steps, the total combined loss of all voters is at most $T(n-1)$. As a result, the best voter incurs a loss of at most $\frac{(n-1)T}{n}$, i.e., the average loss. We conclude that the regret of the weighting scheme is
$$\sum_{t=1}^{T} L_f(\pi_{\vec{\sigma}^t, w^t}, \ell^t) - \min_{i \in [n]} \sum_{t=1}^{T} L_f(\pi_{\vec{\sigma}^t, e_i}, \ell^t) \geq T - \frac{(n-1)T}{n} = \frac{T}{n} \in \Omega(T).$$

Theorem 5.2 indicates that we need to allow randomness (either in the weighting scheme or in the voting rule) if we wish to have no-regret guarantees. As stated before, we would like to have a deterministic weighting scheme, so that the weights of voters are not decided by coin flips. This leaves us with no choice other than having a randomized voting rule. Nonetheless, one might argue in favor of having a deterministic voting rule and a randomized weighting scheme, claiming that this is equivalent because the randomness has simply been shifted from the voting rule to the weights. To that imaginary critic we say that allowing the voting rule to be randomized makes it possible to achieve strategyproofness (see Section 2.1), which cannot be satisfied by a deterministic voting rule.

We next show that for any voting rule that is a distribution over unilaterals there exist deterministic weighting schemes that are no-regret. An important family of strategyproof randomized voting rules, randomized positional scoring rules (see the full version of the paper (Haghtalab, Noothigattu, and Procaccia 2017)), can be represented as distributions over unilaterals, hence the theorem allows us to design a no-regret weighting scheme for any randomized positional scoring rule.

The weighting schemes that we use build on Algorithms 1 and 2 directly. In more detail, we consider deterministic weighting schemes that at time $t$ use weight vector $p^t$ and a randomly drawn candidate $a^t \sim f(\pi_{\vec{\sigma}^t, p^t})$, where $p^t$ is computed according to Algorithm 1 or 2. The key insight behind these weighting schemes is that, if $f$ is a distribution over unilaterals, we have
$$\mathbb{E}_{i \sim p^t}\!\left[f(\pi_{\vec{\sigma}^t, e_i})\right] = f(\pi_{\vec{\sigma}^t, p^t}), \quad (5)$$
where the left-hand side is a vector of expectations. That is, the outcome of the voting rule $f(\pi_{\vec{\sigma}^t, p^t})$ can alternatively be implemented by applying the voting rule to the ranking of a single voter $i$ drawn at random from the distribution $p^t$. This is exactly what Algorithms 1 and 2 do. Therefore, the deterministic weighting schemes induce the same distribution over alternatives at every time step as their randomized counterparts, and achieve the same regret. The next theorem, whose complete proof appears in the full version of the paper (Haghtalab, Noothigattu, and Procaccia 2017), formalizes this discussion.

Theorem 5.3. For any voting rule that is a distribution over unilaterals, there exist deterministic weighting schemes with regret of $O(\sqrt{T \ln n})$ and $O(\sqrt{Tn \ln n})$ in the full-information and partial-information settings, respectively.

The theorem states that there exist no-regret deterministic weighting schemes for any voting rule that is a distribution over unilaterals. It is natural to ask whether being a distribution over unilaterals is, in some sense, also a necessary condition. While we do not give a complete answer to this question, we are able to identify a sufficient condition for not having no-regret deterministic weighting schemes. To this end, we introduce a classic concept.
An alternative $a \in A$ is a Condorcet winner in a given vote profile if, for every $b \in A \setminus \{a\}$, a majority of voters rank $a$ above $b$. A deterministic rule is Condorcet consistent if it selects a Condorcet winner whenever one exists in the given vote profile; see the full version of the paper (Haghtalab, Noothigattu, and Procaccia 2017) for formal definitions. We extend the notion of Condorcet consistency to randomized rules.

Definition 5.4. For a set of alternatives $A$ such that $|A| = m$, a randomized voting rule $f : \Delta(L(A)) \to \Delta(A)$ is probabilistically Condorcet consistent with gap $\delta(m)$ if for any anonymous vote profile $\pi$ that has a Condorcet winner $a$, and for all alternatives $x \in A \setminus \{a\}$, $f(\pi)_a \geq f(\pi)_x + \delta(m)$.

In words, a randomized voting rule is probabilistically Condorcet consistent if the Condorcet winner has strictly higher probability of being selected than any other alternative, by a gap of $\delta(m)$. As an example, a significant strategyproof randomized voting rule, the randomized Copeland rule (defined in the full version of the paper (Haghtalab, Noothigattu, and Procaccia 2017)), is probabilistically Condorcet consistent with gap $\delta(m) = \Omega(1/m^2)$.

Theorem 5.5. For a set of alternatives $A$ such that $|A| = m$, let $f$ be a probabilistically Condorcet consistent voting rule with gap $\delta(m)$, and suppose there are $n$ voters for $n \geq 2\lceil 3/(2\delta(m)) \rceil + 1$. Then any deterministic weighting scheme will suffer regret of $\Omega(T)$ under $f$ (in the worst case), even in the full information setting.

The theorem's proof is relegated to the full version of the paper (Haghtalab, Noothigattu, and Procaccia 2017). It is interesting to note that Theorems 5.3 and 5.5 together imply that distributions over unilaterals are not probabilistically Condorcet consistent. This is actually quite intuitive: distributions over unilaterals are local, in that they look at each voter separately, whereas Condorcet consistency is a global property. In fact, these theorems can be used to prove, in an especially convoluted and indirect way, a simple result from social choice theory (Moulin 1983): no positional scoring rule is Condorcet consistent!

## 6 Discussion

We conclude by discussing several conceptual points.

**A natural, stronger benchmark.** In our model (see Section 3), we are competing with the best voter in hindsight. But our action space consists of weight vectors. It is therefore natural to ask whether we can compete with the best weight vector in hindsight (hereinafter, the stronger benchmark). Clearly the stronger benchmark is indeed at least as hard, because the best voter $i^*$ corresponds to the weight vector $e_{i^*}$. Therefore, our impossibility results for competing against the best voter in hindsight (Theorems 5.2 and 5.5) extend to the stronger benchmark. Moreover, voting rules that are distributions over unilaterals exhibit a linear structure in which the outcome of the voting rule nicely decomposes across individual voters; under such voting rules, the benchmark of best weights in hindsight is equivalent to the benchmark of best voter in hindsight (see the short sketch below). Therefore, Theorem 5.3 also holds for the stronger benchmark, and, in summary, each and every result of Section 5 extends to the stronger benchmark. By contrast, Theorems 4.1 and 4.2 do not hold for the stronger benchmark; the question of identifying properties of voting rules (beyond distributions over unilaterals) that admit randomized no-regret weighting schemes under the stronger benchmark remains open.
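To see why this equivalence holds for distributions over unilaterals, here is a short sketch of the decomposition in the notation of Section 2 (the formal treatment appears in the full version of the paper). For a single anonymous unilateral $g$ with associated function $h$,
$$L_g(\pi_{\vec{\sigma}, w}, \ell) = \sum_{\sigma \in L(A)} (\pi_{\vec{\sigma}, w})_\sigma\, \ell_{h(\sigma)} = \sum_{i=1}^{n} \frac{w_i}{\|w\|_1}\, \ell_{h(\sigma_i)},$$
and by linearity of expectation the same decomposition holds for any distribution over unilaterals. Hence the cumulative loss of any weight vector is a convex combination of the voters' individual cumulative losses, and it is minimized by placing all the weight on a single voter, i.e., at some vertex $e_{i^*}$ of the simplex.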
We describe the stronger benchmark in more detail, and formalize the above arguments, in the full version of the paper (Haghtalab, Noothigattu, and Procaccia 2017).

**Changing the sets of alternatives and voters over time.** We wish to emphasize that the set of alternatives at each time step, i.e., in each election, can be completely different. Moreover, the number of alternatives could be different. In fact, our positive results do not even depend on the number of alternatives $m$, so we can simply set $m$ to be an upper bound. By contrast, we do need the set of voters to stay fixed throughout the process, but this is consistent with our motivating examples (e.g., a group of partners in a small venture capital firm would face different choices at every time step, but the composition of the group rarely changes).

**Optimizing the voting rule.** Throughout the paper, the voting rule is exogenous. One might ask whether it makes sense to optimize the choice of voting rule itself, in order to obtain good no-regret learning results. Our answer is "yes and no". On the one hand, we believe our results do give some guidance on choosing between voting rules. For example, from this viewpoint, one might prefer randomized Borda (which admits no-regret algorithms under a deterministic weighting scheme) to randomized Copeland (which does not). On the other hand, many considerations are factored into the choice of voting rule: social choice axioms, optimization of additional objectives (Procaccia, Shah, and Zick 2016; Boutilier et al. 2015; Elkind, Faliszewski, and Slinko 2009; Conitzer and Sandholm 2005), and simplicity. It is therefore best to think of our approach as augmenting voting rules that are already in place.

## Acknowledgments

The authors were partially supported by NSF grants IIS-1350598, IIS-1714140, CCF-1525932, and CCF-1733556; by ONR grants N00014-16-1-3075 and N00014-17-1-2428; by a Sloan Research Fellowship; and by an MSR PhD Fellowship.

## References

Auer, P.; Cesa-Bianchi, N.; Freund, Y.; and Schapire, R. E. 2002. The nonstochastic multi-armed bandit problem. SIAM Journal on Computing 32(1):48-77.

Boutilier, C.; Caragiannis, I.; Haber, S.; Lu, T.; Procaccia, A. D.; and Sheffet, O. 2015. Optimal social choice functions: A utilitarian view. Artificial Intelligence 227:190-213.

Brandt, F.; Conitzer, V.; Endriss, U.; Lang, J.; and Procaccia, A. D., eds. 2016. Handbook of Computational Social Choice. Cambridge University Press.

Chaudhuri, S., and Tewari, A. 2015. Online ranking with top-1 feedback. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS), 129-137.

Congleton, R. 2011. Perfecting Parliament: Constitutional Reform, Liberalism, and the Rise of Western Democracy. Cambridge University Press.

Conitzer, V., and Sandholm, T. 2005. Common voting rules as maximum likelihood estimators. In Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI), 145-152.

Conitzer, V., and Sandholm, T. 2006. Nonexistence of voting rules that are usually hard to manipulate. In Proceedings of the 21st AAAI Conference on Artificial Intelligence (AAAI), 627-634.

Conitzer, V.; Sandholm, T.; and Lang, J. 2007. When are elections with few candidates hard to manipulate? Journal of the ACM 54(3):1-33.

Elkind, E.; Faliszewski, P.; and Slinko, A. 2009. On distance rationalizability of some voting rules. In Proceedings of the 12th Conference on Theoretical Aspects of Rationality and Knowledge (TARK), 108-117.
Faliszewski, P.; Hemaspaandra, E.; and Hemaspaandra, L. 2009. How hard is bribery in elections? Journal of Artificial Intelligence Research 35:485-532.

Faliszewski, P.; Hemaspaandra, E.; and Hemaspaandra, L. 2015. Weighted electoral control. Journal of Artificial Intelligence Research 52:507-542.

Freund, Y., and Schapire, R. E. 1995. A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the 2nd European Conference on Computational Learning Theory (EuroCOLT), 23-37.

Gibbard, A. 1973. Manipulation of voting schemes. Econometrica 41:587-602.

Gibbard, A. 1977. Manipulation of schemes that mix voting with chance. Econometrica 45:665-681.

Haghtalab, N.; Noothigattu, R.; and Procaccia, A. D. 2017. Weighted voting via no-regret learning. arXiv:1703.04756 [cs.GT].

Moulin, H. 1983. The Strategy of Social Choice, volume 18 of Advanced Textbooks in Economics. North-Holland.

Procaccia, A. D.; Shah, N.; and Zick, Y. 2016. Voting rules as error-correcting codes. Artificial Intelligence 231:1-16.

Procaccia, A. D. 2010. Can approximation circumvent Gibbard-Satterthwaite? In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI), 836-841.

Radlinski, F.; Kleinberg, R.; and Joachims, T. 2008. Learning diverse rankings with multi-armed bandits. In Proceedings of the 25th International Conference on Machine Learning (ICML), 784-791.

Satterthwaite, M. 1975. Strategy-proofness and Arrow's conditions: Existence and correspondence theorems for voting procedures and social welfare functions. Journal of Economic Theory 10:187-217.

Zuckerman, M.; Procaccia, A. D.; and Rosenschein, J. S. 2009. Algorithms for the coalitional manipulation problem. Artificial Intelligence 173(2):392-412.