# primarily_about_primaries__0a6c6949.pdf The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19) Primarily about Primaries Allan Borodin University of Toronto bor@cs.toronto.edu Omer Lev Ben-Gurion University omerlev@bgu.ac.il Nisarg Shah University of Toronto nisarg@cs.toronto.edu Tyrone Strangway University of Toronto tyrone@cs.toronto.edu Much of the social choice literature examines direct voting systems, in which voters submit their ranked preferences over candidates and a voting rule picks a winner. Real-world elections and decision-making processes are often more complex and involve multiple stages. For instance, one popular voting system filters candidates through primaries: first, voters affiliated with each political party vote over candidates of their own party and the voting rule picks a candidate from each party, which then compete in a general election. We present a model to analyze such multi-stage elections, and conduct the first quantitative comparison (to the best of our knowledge) of the direct and primary voting systems with two political parties in terms of the quality of the elected candidate. Our main result is that every voting rule is guaranteed to perform almost as well (i.e., within a constant factor) under the primary system as under the direct system. Surprisingly, the converse does not hold: we show settings in which there exist voting rules that perform significantly better under the primary system than under the direct system. Introduction If I could not go to heaven but with a party, I would not go there at all. Thomas Jefferson, 1789 Thomas Jefferson, like many of the US constitution s authors, believed that political parties and factions are a bad thing (see also Hamilton, Madison, and Jay (1787)). This view stemmed from a long history of British and English political history, in which prison sentences and executions were possible outcomes in the battle between factions for supremacy at the Royal court (Simms 2007). However, both in Britain and in the Unites States, once their respective legislative assemblies gained political force, parties turned out to be quite unavoidable. Even Jefferson had to start his own party, which ended up quite successful, and was able to vanquish the opposing party from political existence (Wilentz 2005). Fast forwarding to today, political parties have become the bedrock of parliamentary politics throughout the world. In particular, one of political parties main roles if not the Copyright c 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. most important (especially in presidential systems) is to select the candidates which are voted for by the general public. The mechanisms by which parties make this selection are varied, and they have evolved significantly throughout the past 150 years. But in the past few decades there has been a marked shift by parties throughout the world towards increasing the ability of individual party (or unaffiliated) members to influence the outcome, and in some cases, to be the only element to determine party candidates (Cross and Blais 2011). In particular, US parties have changed their election methods since the 1970s to focus the selection of presidential, congressional and state-wide candidates on popular support by party members via primaries (Cohen et al. 2008). Despite this long and established role of parties in whittling down the candidate field in elections, the treatment of a parties role in elections within the multiagent systems community has been quite limited. While various candidate manipulation attacks have been investigated (e.g., Sybil attacks (Conitzer and Yokoo 2010)), and there is recent research into parties as a collection of similar minded candidates (e.g., in gerrymandering, across different districts), the role of parties in removing candidates has not been explored. The focus of this paper is the primary voting system, in which each party s electorate selects a winner from among the party s candidates, and among these primary winners, an ultimate election winner is selected by the general public. Our goal is to compare this system to the direct voting system, in which all voters directly vote over all candidates. Our Results Our contribution is twofold. First, we formulate a model which allows a quantitative comparison of the two voting systems. Our model is a spatial model of voting in which voters and candidates lie in an underlying multi-dimensional space, and voter preferences are single-peaked. This allows us to compare each candidate s social utility in terms of its total distance to the voters. We make the evaluation metric formal using the notion of distortion advocated by a recent line of research (Procaccia and Rosenschein 2006; Boutilier et al. 2015). Our results focus on 2 parties, each selecting a single candidate, with both candidates presented to the general voting public. Second, we use this model to present a comparison of the direct and primary voting systems. In particular, we show that no voting rule performs much worse (by more than a constant factor) under the primary system than under the direct system. While the converse holds in some cases, we show settings in which it does not, and there exist voting rules that perform much better under the primary system than under the direct system. It is important to note that while we write of parties, voters and elections, this multi-step model applies to a variety of decision-making processes by agents. An organization selecting an employee of the month may ask each unit to nominate a single candidate, and then choose from amongst them. A city may ask its regional subdivisions to assess which roads require urgent fixing, and then the city council decides from these options where to invest its efforts. Fundamentally, in many cases where the potential number of options is huge, it is common to use subdivisions to cull the options and present only a few of them for discussion and vote. In such cases, our multi-stage model is apt. Related Work The analysis of regular, direct elections is long and varied, both in the social sciences and in AI (Brandt et al. 2016). In our particular setting, the voters are located in a metric space, with their preferences related to their distance from candidates. Such settings have been widely investigated in the social science literature since the work of Downs (1957), recently summarized by Schofield (2008). In particular, we focus on the idea of distortion in such elections, a topic broached by Procaccia and Rosenschein (2006) for voters with a utility function, but investigated in the context of voters in metric spaces in a series of papers (Anshelevich, Bhardwaj, and Postl 2015; Anshelevich and Postl 2017; Skowron and Elkind 2017) for most common voting rules. Feldman, Fiat, and Golomb (2016) explored such a setting for strategyproof mechanisms. Discussing changes to the set of candidates has been mainly focused on two paths of research. Strategic candidates, investigation of which began with the work of Dutta, Jackson, and Le Breton (2001) followed by Dutta, Jackson, and Le Breton (2002) discussing strategic candidacy in tournaments, and recently further explored by Brill and Conitzer (2015), Polukarov et al. (2015) and others deals mainly with finding equilibria. The other is the addition and removal of candidates, as a form of control manipulation, which was studied by Bartholdi III, Tovey, and Trick (1992); see the summaries by Brandt et al. (2016) and Rothe (2015). Investigating parties selection methods and their effect on the election has mostly been done in the social sciences. Kenig (2009) details the range of selection methods parties use, and there has been significant focus on more democratic methods for leader selection (Cross and Katz 2013), which seems to be a general trend in many Western countries (Cross and Blais 2011). There is also significant literature on particular party elections in various countries, such as Britain (Jobson and Wickham-Jones 2010), Belgium (Wauters 2010), Israel (Hazan 1997), and many others. Naturally, the most widely examined country is the US, in which political parties have been a fixture of political life since its early days (Wilentz 2005). The most recent extensive summary of research on it is due to Cohen et al. (2008), who try to explain how party power-brokers influence the party membership vote. Norpoth (2004) uses primary data to predict election results, and notably Sides et al. (2018) show that primary voters are very similar to regular voters. In computational fields, recent interest in proxy voting (Cohensius et al. 2017; Kahng, Mackenzie, and Procaccia 2018), in which voters give other agents the ability to vote for them, may be related to how modern parties are viewed and analyzed. For k N, define [k] = {1, . . . , k}. Let V = [n] denote a set of n voters, and A denote a set of m candidates. We assume that voters and candidates lie in an underlying metric space M = (S, d), where S is a set of points and d is a distance function satisfying the triangle inequality and symmetry. More precisely, there exists an embedding ρ : V A S mapping each voter and candidate to a point in S. For a set X V A, we slightly abuse the notation and let ρ(X) = {ρ(x) : x X}. Also, for x, x V A, we often use d(x, x ) instead of d(ρ(x), ρ(x )) for notational convenience. In this work, we assume that voters and candidates additionally have an affiliation with a political party. Specifically, we study a setting with two parties1, denoted 1 and 1. The party affiliation function π : V A { 1, 1} maps each voter and candidate to the party they are affiliated with. For p { 1, 1}, let Vp = π 1(p) V , Ap = π 1(p) A, np = |Vp|, and mp = |Ap|. We require np 1 for each p { 1, 1}. Our main result (Theorem 2) holds even if there are independent voters not affiliated with either party. Collectively, an instance is the tuple I = (V, A, M, ρ, π). Given I, our goal is to find a candidate a A as the winner. The social cost of a is its total distance to the voters, denoted CI(a) = P i V d(i, a). For party p { 1, 1}, let CI p(a) = P i Vp d(i, a). Hence CI(a) = CI 1(a) + CI 1(a). We also use for X V , CI X(a) = P i X d(i, a). Given an instance I, we would like to choose a candidate a OPT arg mina A CI(a) that minimizes the social cost. We shall drop the instance from superscripts if it is clear from the context. However, we do not observe the full instance. Specifically, we do not know the underlying metric M or the embedding function ρ. Instead, each voter i N submits a a vote, which is a ranking (strict total order) i over the candidates in A by their distance to the voter. Specifically, for all i N and a, b A, a i b d(i, a) d(i, b). The voter is allowed to break ties between equidistant candidates arbitrarily. The vote profile I = ( 1, . . . , n) is the collection of votes. Given an instance I, its corresponding election EI = (V, A, I, π) contains all observable information. In the families of instances that we consider, we fix the 1The model can be extended to multiple parties in a reasonably straightforward manner, but for simplicity s sake, we shall focus on only 2 parties. number of candidates m and let the number of voters n to be arbitrarily large. This choice is justified because in many typical elections (e.g., political ones), voters significantly outnumber candidates. Let Iα m,M be the family of instances satisfying the following conditions. Each party has at least an α fraction of the voters affiliated with it, i.e., np α n for each p { 1, 1}. Note that α [0, 0.5]: α = 0.5 is the strictest (exactly half of the voters are affiliated with each party), while α = 0 imposes no conditions; in the latter case, we omit the superscript α. The number of candidates is at most m. In particular, we shall focus on a few cases of M : M = This allows M to be an arbitrary metric space. M = Rk The metric space should be M = (Rk, d), where d is the standard Euclidean distance. M = sep-Rk This means the embedding ρ must be such that ρ(V 1 A 1) and ρ(V1 A1) are linearly separable.2 In this case we shall take the metric to be M = (Rk, d) with d as the standard Euclidean distance. In plain words, the voters and candidates affiliated with each party reside in a certain part of the metric space, separate from those affiliated with the other party. In a single dimension, this means there exists a threshold on the line such that voters and candidates affiliated with one party lie to the left of it, while those affiliated with the other party lie to the right. Note that this is the only choice of M that restricts the embedding ρ based on party affiliation π. These families of instances are related by the following relation. For all k, Iα m,sep-Rk Iα m,sep-Rk+1 Iα m,Rk+1 Iα m, Voting Rules and Distortion A voting rule f takes an election as input, and returns a winning candidate from A. We say that the cost-approximation of f on instance I is φ(f, I) = CI(f(EI)) mina A CI(a), and given a family of instances I, the distortion of f with respect to I is φI(f) = sup I I φ(f, I). Since distortion is a worst-case notion, we have that when I I , φI(f) φI (f) for every voting rule f. Standard voting rules choose the winning candidate independently of party affiliations. These include rules such as plurality, Borda count, k-approval, veto, and STV. We refer readers to the book by Brandt et al. (2016) for their 2Two sets of points are linearly separable if the interiors of their convex hulls are disjoint, or equivalently, if there exists a hyperplane that contains each set in a distinct closed halfspace. definitions. We call a voting rule affiliation-independent if f(E) = f(E ) when elections E and E differ only in their party affiliation functions. Since an affiliationindependent voting rule f ignores party affiliations, we have φIα m,sep-Rk (f) = φIα m,Rk (f). All of the above-mentioned rules, in addition to being affiliation-independent, share the property of being unanimous, i.e., they return candidate a when a is the top choice of all voters. Stages and Primaries Given an affiliation-independent voting rule f, voting systems with primaries employ a specific process to choose the winner, essentially resulting in a different voting rule bf that operates on a given election E = (V, A, , π) as follows: 1. First, it creates two primary elections: for p { 1, 1}, define Ep = (Vp, Ap, p, πp), where p denotes the preferences of voters in Vp over candidates in Ap, and πp : Vp {p} is a constant function. 2. Next, it computes the winning candidate in each primary election (primary winner) using rule f: for p { 1, 1}, let a p = f(Ep). 3. Finally, let Eg = (V, a 1, a 1 , g, π) be the general election, where g denotes the preferences of all voters over the two primary winners. The winning candidate is bf(E) = maj(Eg), which is what most voting rules become when dealing with only 2 candidates.3 Here, maj is the majority rule, which, given two candidates, picks the one that a majority of voters prefer; our results are independent of its tie-breaking. This resembles systems employed by the main US, Canadian and other countries parties, in which a party s members vote on their party s candidates to select a winner of their primary. In other systems, such selection could be a multi-stage process. Given an affiliation-independent voting rule f, the goal of this paper is to compare its performance under the direct system, in which f is applied on the given election directly, to its performance under the primary system, in which bf is applied on the given election instead. Formally, given a family of instances I and an affiliation-independent voting rule f, we wish to compare φI(f) and φI( bf) (henceforth, the distortion of f with respect to I under the direct and the primary systems, respectively). Small Primaries Are Terrible Recall that in a family of instances Iα m,M, we require that at least α fraction of voters be affiliated with each party, i.e., np αn for each p { 1, 1}. In other words, each primary election must have at least αn voters. 3If one of the parties has no affiliated candidates, then the primary winner of the other party becomes the overall winner. In a setting with more than 2 parties, or where each party nominates several candidates, the general election can use f to determine the outcome (or use some other voting process). We first show that when a primary election can have very few voters (α = 0), every reasonable voting rule has an unbounded distortion in the primary system, even with respect to our most stringent family of instances Im,sep-R. Theorem 1. For m 3, φIm,sep-R( bf) = for every affiliation-independent unanimous voting rule f. Proof. Consider an instance I Im,sep-R in which voter 1 is located at 0 and affiliated with party 1, while the remaining n 1 voters are located at 1 and affiliated with party 1. All m candidates are affiliated with party 1: one is at 0, and the rest are at 1. The candidate a at 0 becomes the primary winner of party 1, and trivially becomes the overall winner. Its social cost is C(a ) = n 1. In contrast, an optimal candidate a OP T at 1 has social cost C(a OP T ) = 1. Hence, C(a ) C(a OP T ) = n 1. Since the number of voters n is un- bounded, φIm,sep-R( bf) = . Theorem 1 continues to hold even if we require that at least a constant fraction of candidates be affiliated with each party: we could simply move a constant fraction of the candidates from 1 to 3, and the proof would still go through. On the other hand, if we require that at least a constant fraction of voters be affiliated with each party, the result changes dramatically. Large Primaries Are Never Much Worse For every affiliation-independent voting rule f, we bound the distortion of bf in terms of the distortion of f for every instance. Note that this is stronger than comparing the worstcase distortions of f and bf over a family of instances. Given an instance I = (V, A, M, ρ, π) and party p { 1, 1}, we say that Ip = (Vp, Ap, M, ρp, πp) is the primary instance of party p, where ρp and πp are restrictions of ρ and π to Vp Ap. The primary election Ep of party p is precisely the election corresponding to instance Ip. Theorem 2. Let I = (V, A, M, ρ, π) be an instance. For p { 1, 1}, let Ip be the primary instance of party p, and np = |Vp| αn. Then, φ( bf, I) 3 1 α + max(φ(f, I 1), φ(f, I1)) Further, for a socially optimal candidate a OP T arg mina A CI(a), we have a bound depending only on the distortion of the primary election of its party: φ( bf, I) 3 1 α + φ(f, Iπ(a OP T )) For each family of instances I that we study, it holds that for every instance I I, both its primary instances, if seen as direct elections, are also in I (since the party division has no effect on the direct election distorition). Hence, we can convert the instance-wise comparison to a worst-case comparison. Corollary 3. For every α > 0, k N, family of instances I Im, , Im,Rk, Im,sep-Rk , and affiliation-independent voting rule f, φI( bf) 3 1 α + φI(f) Note that φI(f) 1 by definition. Hence, we can write φI( bf) 6 α φI(f). In other words, for every affiliationindependent voting rule f, its distortion under the primary system is at most a constant times bigger than its distortion under the direct system, with respect to every family of instances that we consider. To prove Theorem 2, we need two lemmas. The first lemma shows that if the distortion of a rule f for a primary instance Ip is low, then the corresponding primary winner a p is nearly as good as any candidate in Ap for the overall election as well. The intuition is that when voters in V \ Vp drive up the social cost of a p (i.e., when they are far from a ), they must do so for every candidate in Ap. Lemma 4. Let a p denote the primary winner of party p. Then C(a p) 1 α + φ(f, Ip) α min a Ap C(a). Proof. Let θ = φ(f, Ip) (hence θ 1). Fix an arbitrary a Ap. Then, CVp(a p) θ CVp(a). Now, C(a p) = CVp(a p) + CV \Vp(a p) θ CVp(a) + CV \Vp(a) + (n np) d(a, a p) θ C(a) + (n np) d(a p, a), (1) where the second transition follows due to the triangle inequality. We also have d(a p, a) d(a p, i) + d(i, a) for any i Vp. Thus, d(a p, a) CVp(a p) + CVp(a) np CVp(a) 1 + θ np C(a). (2) Substituting Equation (2) into Equation (1), and using the fact that n np α , we get the desired result. Our next lemma shows that the primary winner that wins the general election is not much worse than the primary winner that loses the general election. Lemma 5. Let a 1 and a 1 be the two primary winners, a a 1, a 1 be the winner of the general election, and ba a 1, a 1 \ {a }. Then, C(a ) 3 C(ba). Proof. Because a wins the general election by a majority vote, there must exist X V , |X| n 2 such that d(i, a ) d(i, ba) for every i X. Combining with the triangle inequality d(a , ba) d(a , i) + d(i, ba), we get d(i, ba) d(a ,ba) 2 for every i X. Hence, 2 d(a , ba) 2 d(a , ba) 4 n C(ba). (3) Now, we have i X d(i, a ) + X i V \X d(i, a ) i X d(i, ba) + X i V \X (d(i, ba) + d(a , ba)) 2 d(a , ba), (4) where the second transition follows because d(i, a ) d(i, ba) for i X, and d(i, a ) d(i, ba) + d(a , ba) due to the triangle inequality. Substituting Equation (3) into Equation (4), we get the desired result. We are now ready to prove our main result. Proof of Theorem 2. Recall that a 1 and a 1 are the primary winners. Let p { 1, 1} be such that a p is the winner of the general election. Let a OP T arg mina A C(a) be a socially optimal candidate. We consider three cases. Case 1: a OP T Ap. That is, the optimal candidate and the winner are affiliated with the same party. In this case, Lemma 4 yields φ( bf, I) 1 α + φ(f, Ip) α = 1 α + φ(f, Iπ(a OP T )) Case 2: a OP T A p, a OP T = a p. That is, the optimal candidate and the winner are affiliated with different parties, and the optimal candidate is a primary winner. In this case, Lemma 5 yields φ( bf, I) 3. Case 3: a OP T A p, a OP T = a p. That is, the optimal candidate is affiliated with a party different than that of the winner, and is not a primary winner. In this case, we use both Lemmas 4 and 5 to derive φ( bf, I) = C(a p) C(a OP T ) = C(a p) C(a p) C(a p) C(a OP T ) 3 1 α + φ(f, I p) = 3 1 α + φ(f, Iπ(a OP T )) Thus, in each case, we have the desired approximation. Note that our proof of Theorem 2 does not preclude the existence of independent voters. Specifically, we can allow the party affiliation function π to map a subset of voters V0 to a neutral choice (say 0), have these voters only vote in the general election and not in either primary election under the primary system, and the proof of Theorem 2 would still go through. Additionally, we can also relax the restriction that all voters vote in the general election. That is, we can assume that a subset of voters Vg V vote in the general election under the primary system, assume |Vg| γn, and a version of Theorem 2 in which the constant 3 in the bound is replaced by 4 γ γ would hold. This requires a generalization of Lemma 5, omitted due to space constraints. Finally, notice that we do not use the assumption that both parties use the same voting rule f in their primaries. The theorem extends easily to allow the use of different voting rules, with the distortion under the primary system still being bounded in terms of the maximum of the distortions in the two primary elections. Large Primaries Are Not Better Without Party Separability While we showed in the previous section that a voting rule does not perform much worse under the primary system than under the direct system, we now show that it does not perform any better either, at least in the worst case over all instances with at most m candidates. The result continues to hold even if we require each party to have at least a constant fraction of the voters. Note that this result is weaker than Theorem 2 because it is a worst-case comparison instead of an instance-wise comparison. However, it still applies to all voting rules f. It applies to any metric that does not require separability of parties, in particular to Im, and Im,Rk. Theorem 6. For α [0, 0.5], k N, M , Rk (i.e. when the metric space does not require party separability), and affiliation-independent voting rule f, we have φIα m,M( bf) φIα m,M(f). Proof. We shall denote Iα m,M as I. We want to show that for every instance I I, there exists an instance I I such that φ( bf, I ) φ(f, I). This would imply the desired result. Fix an instance I = (V, A, M, ρ, π) I. Let a OP T A denote an optimal candidate in I, and a = f(EI). Note that φ(f, I) = CI(a ) CI(a OP T ). Construct instance I = (V , A, M, ρ , π ) as follows. Let V = V e V , where e V is a new set of voters and |e V | = |V |. Let ρ (x) = ρ(x) for all x V A, and ρ(x) = ρ(a OP T ) for all x e V . That is, ρ matches ρ for existing voters and candidates, and the new voters are co-located with a OP T . Let π (x) = 1 for all x V A, and π (x) = 1 for all x e V . That is, all existing voters and candidates are affiliated with party 1, while all new voters are affiliated with party 1. First, let us check that I I. Since I has m candidates, so does I . Further, in I , we have |V 1| = |V 1| = |V |/2, which satisfies the constraint corresponding to every α [0, 0.5]. Hence, we have I I. Let us apply bf on I . One of its primary instances, I 1, is precisely I. Hence, the primary winner of party 1 is f(I 1) = f(I) = a . Because there are no candidates affiliated with party 1, a becomes the overall winner.4 4Even if we require each party to have at least one affiliated candidate, the proof essentially continues to hold. In this case, we can add one candidate affiliated with party 1 that is located suffi- Next, CI (a ) CI(a ) because V V . Also, CI (a OP T ) = CI(a OP T ) because a OP T has zero distance to all voters in V \ V . Together, they yield φ( bf, I ) = CI (a ) CI (a OP T ) CI(a ) CI(a OP T ) = φ(f, I), (5) as desired. Our proof actually establishes a slightly stronger result. Instead of showing φIα m,M( bf) φIα m,M(f), we actually show φI0.5 m,M( bf) φIα m,M(f). Separability and Its Advantages The analysis for Im,sep-Rk is not so straightforward. In the proof of Theorem 6, we co-located the new voters affiliated with party 1 and a OP T affiliated with party 1. This was allowed because non-separable metrics like and Rk place no constraints on the embedding. With Iα m,sep-Rk, we need the voters and candidates affiliated with one party to be separated from those affiliated with the other. Hence, this operation of putting all of one party s voters at the location of a OP T belonging to another party would be allowed only if, in the original instance I, a OP T is on the boundary of the convex hull of ρ(V A). While this is not the case for all instances, we only need this in at least one worst-case instance for f, i.e., for at least one I Iα m,sep-Rk with φ(f, I) = φIα m,sep-Rk (f). Equation (5) would then yield the desired result. More generally, it is sufficient if, given any ϵ > 0, we can find an instance I such that φ(f, I) φIα m,sep-Rk (f) ϵ and a OP T is at distance at most ϵ from the boundary of the convex hull of ρ(V A). Interestingly, (Anshelevich, Bhardwaj, and Postl 2015) show that this is indeed the case for plurality and Borda count (see the proof of their Theorem 4). Thus, we have the following. Proposition 7. Let f be plurality or Borda count. Then, φI0.5 m,sep-R( bf) φIm, (f). However, known worst cases for the Copeland rule (Anshelevich, Bhardwaj, and Postl 2015) and STV (Skowron and Elkind 2017) do not satisfy this requirement. It is unknown if these rules admit a different worst case that satisfies it. This raises an important question. Does Proposition 7 hold for all affiliation-independent voting rules? We shall shortly answer this negatively. More precisely, we construct an affiliation-independent voting rule f such that φIα m,sep-R( bf) φIα m,sep-R(f) for every α > 0. That is, with large primaries, f performs much better under the primary system than under the direct system, when voters and candidates are embedded on a line and the separability condition is imposed. ciently far from all the voters, ensuring that a still becomes the overall winner. This would show φIα m+1,M( bf) φIα m,M(f) because instance I may now have m + 1 candidates. Note that instances in Im,sep-R are highly structured. For instance, it is known that when voters and candidates are embedded on a line, there always exists a weak Condorcet winner (Black 1948), and selecting such a candidate results in a distortion of at most 3 (Anshelevich, Bhardwaj, and Postl 2015). Hence, we have φIm,sep-R(f) = 3 for every Condorcet-consistent, affiliation-independent voting rule f.5 Our aim in this section is to construct an affiliationindependent voting rule ffail that with respect to Im,sep-R has an unbounded distortion in the direct system, but at most a constant distortion in the primary system. Definition 1. Let ffail be an affiliation-independent voting rule that operates on election E = (V, A, ) as follows. Let A = {a1, . . . , am}, and t = (m + 1)/2. Special Case: If m 9, m is odd, n m2, and has the following structure, then return a1. 1. For voter 1, a1 1 . . . 1 am. 2. For voter 2, am 2 . . . 2 a1. 3. For voter 3, at 1 is the most preferred, and am 2 3 a1 3 am 1 3 am. 4. For voter 4, at+1 is the most preferred, and a3 4 am 4 a2 4 a1. 5. For j [m 2], for voter i = 4 + (2j 1), aj+1 i aj+2 i aj, and for voter i = 4 + 2j, aj+1 i aj i aj+2. 6. For every other voter v, at is the most preferred. If E does not fall under the special case, then apply any Condorcet consistent voting rule (e.g., Copeland s rule). Note that m being odd ensures that t is an integer, and m 9 ensures that a1, a3, at 1, at, at+1, am 2, and am are all distinct candidates. The significance of n m2 will be clear later. We will now establish that a worst-case instance of ffail falls under the special case; for this instance, we need to show that at is socially optimal; that ffail returns a1 on this instance; and most importantly, that the structure of ensures that the optimal candidate at is sufficiently far from both the leftmost and the rightmost candidates. We prove this last fact in the following lemma. Lemma 8. Let I = (V, A, M, ρ, π) Im,sep-R be an instance for which the corresponding election EI falls under the special case of ffail. Then the following holds. 1. Either ρ(a1) . . . ρ(am), or ρ(a1) . . . ρ(am), or |ρ(A)| = 2. 2. If |ρ(A)| = 2, min {d(at, a1), d(at, am)} d(a1,am) The proof of the lemma, and the proof of the next theorem is omitted due to space constraints. Theorem 9. For m 9 and constant α (0, 0.5], φIα m,sep-R( bffail) is upper bounded by a constant, whereas φIα m,sep-R(ffail) is unbounded. 5(Anshelevich, Bhardwaj, and Postl 2015) also proved that no affiliation-independent (deterministic) voting rule can have distortion better than 3, even with respect to Im,sep-R. plurality Borda STV Copeland maximin Direct Primary-split Primary-random plurality Borda STV Copeland maximin Direct Primary-split Primary-random Figure 1: The distortion values for various settings (Direct, Primary-split and Primary-random) and voting rules for k {1, 3}. Each bar represents the average distortion over 1000 instances. Within each figure, all of the bars represent average distortion over the same set of voter and candidate locations, but not necessarily party affiliations. ANOVA on all elections returned p < 10 9 when k = 1 and p < 10 5 when k = 3 Using Simulations to Go Beyond Worst Case So far we compared the distortion of a voting rule under the direct and primary systems, taken in the worst case over a family of instances. In practice, such worst case instances may not arise naturally. In this section, we compare the distortion of a voting rule under the direct and primary systems, in the average case over simulated instances. To control for the effect of vastly different numbers of voters or candidates affiliated with the two parties, we place the restriction that half of the voters and half of the candidates must be affiliated with each party. Furthermore, we wish to examine instances in which voters and candidates affiliated with one party are separated from those affiliated with the other party. We fix the metric space to be [0, 1]k for k {1, 3, 5, 7, 9} with the Euclidean distance. We generate 1000 instances satisfying the following restrictions. First, we place a set V of n = 1000 voters at uniformly random locations in [0, 1]k. Next, we find a hyperplane dividing voters into two equal halves. Due to symmetry, we simply find a threshold t on the kth coordinate such that the locations of half of the voters (call this set V 1) have a smaller kth coordinate, while the locations of the rest (call this set V1) have a greater kth coordinate. We do not affiliate the voters yet. Next, we place m 2 = 10 candidates (call this set A 1) uniformly at random on one side of the hyperplane, and m 2 = 10 candidates (call this set A1) uniformly at random on the other side. Once the locations of the voters and candidates are fixed, we create two instances. In one instance ( split ), we assign V 1 A 1 to party 1, and assign V1 A1 to party 1. This instance belongs to I0.5 m,sep-Rk. In the other instance ( random ), we assign half of the voters and half of the candidates chosen uniformly at random to party 1, and the rest to party 1. This instance belongs to I0.5 m,Rk, but not necessarily to I0.5 m,sep-Rk. This allows us to directly compare the effect that separability has on the distortion. Finally, we run five voting rules plurality, Borda, STV, Copeland and maximin on both instances under the direct and primary systems, and measure the distortion. Note that their distortion under the direct system would be identical for both instances because the two instances only differ in party affiliations. Thus, for each rule, we obtain three numbers: Direct, Primary-split, and Primary-random. We average the distortion numbers across the 1000 instances. See Figure 1 for selected simulation results. The results for k > 3 resemble those for k = 3 in their comparison of the direct versus the primary system; the only difference is that the overall distortions are lower for higher k. To compare the three distortion numbers in each case, we ran a repeated measures ANOVA comparing the distortion values, and in all but 2 of 25 cases, had a p value under 0.05 (the two cases had k = 9). Perhaps the most important observation is that our simulation results stand in direct contrast to our worst-case bounds. In almost all of our settings, the distortion under the primary system (split and random) is better than that the distortion under its direct counterpart, and often shows a significant improvement. This is especially noticeable in the non Condorcet consistent rules (plurality, Borda and STV), as in all but 3 of 15 cases the distortion significantly improves the primary system in both cases. This effect is most pronounced with plurality. With Condorcet consistent rules, the distortion values are very low, regardless of whether the direct system is used or the primary system. In general, as we increase the dimension k, the groups become more homogenous and the p values grow, while distortions approach 1. Overall, the simulation results show a distortion that is far below our theoretical worse-case results. We suspect that the reason for this difference might have to do with our choice of uniform voter and candidate distributions, and distortion numbers might differ under different distributions. Discussion Our paper initiates the novel quantitative study of multistage elections (and their comparison to single-stage elections), but leaves plenty to explore. Some directions are fairly straightforward extensions of our results. The most straightforward question is to tighten our bounds. There is also the question of explaining the trends we observe in the average case, which sometimes differ from our worst-case results. A next step would be to study realistic distributional models of voter preferences and candidate positions in the political spectrum, and analyze their effect on distortion. Other extensions are seemingly more involved. Extending our framework to more than two parties requires the use of a ranked voting rule in the general election, which may significantly affect the analysis. Interestingly, such an extension would also incorporate independent candidates because one can imagine an independent candidate to be a party of their own. Examining the use of multiple and different voting rules as Narodytska and Walsh (2013) do for two-step voting (though without candidate elimination between stages) is an enticing direction. For example, in a multi-party direct system, we may use plurality, whereas in the primary system, the parties may use STV. It is also reasonable to consider that each party has its own voting rule. It would be interesting as well to examine party manipulation techniques in primary systems. Similarly, it is reasonable to believe that candidates may also strategically shift, to some extent, their location following the primaries, to make themselves more appealing to the general electorate. We believe that the study of multi-stage elections and party mechanisms can not only contribute novel theoretical challenges to tackle, but can also bring research on computational social choice closer to reality and increase its impact. Acknowledgements This work was partially supported by NSERC under the Discovery Grants program. Anshelevich, E., and Postl, J. 2017. Randomized social choice functions under metric preferences. Journal of Artificial Intelligence Research 58(1):797 827. Anshelevich, E.; Bhardwaj, O.; and Postl, J. 2015. Approximating optimal social choice under metric preferences. In 29th, 777 783. Bartholdi III, J. J.; Tovey, C. A.; and Trick, M. A. 1992. How hard is it to control an election? Mathematical and Computer Modelling 16(8 9):27 40. Black, D. 1948. On the rationale of group decision-making. Journal of Political Economy 56:23 34. Boutilier, C.; Caragiannis, I.; Haber, S.; Lu, T.; Procaccia, A. D.; and Sheffet, O. 2015. Optimal social choice functions: A utilitarian view. Artificial Intelligence 227:190 213. Brandt, F.; Conitzer, V.; Endriss, U.; Lang, J.; and Procaccia, A. D., eds. 2016. Handbook of Computational Social Choice. Cambridge University Press. Brill, M., and Conitzer, V. 2015. Strategic voting and strategic candidacy. In 29th, 819 826. Cohen, M.; Karol, D.; Noel, H.; and Zaller, J. 2008. The Party Decides: Presidential Nominations Before and After Reform. University of Chicago Press. Cohensius, G.; Mannor, S.; Meir, R.; Meirom, E.; and Orda, A. 2017. Proxy voting for better outcomes. In 16th, 858 866. Conitzer, V., and Yokoo, M. 2010. Using mechanism design to prevent false-name manipulations. AI Magazine 31(4):65 77. Cross, W. P., and Blais, A. 2011. Who selects the party leader? Party Politics 18(2):127 150. Cross, W. P., and Katz, R. S., eds. 2013. The Challenges of Intra Party Democracy. Oxford University Press. Downs, A. 1957. An Economic Theory of Democracy. Harper and Row. Dutta, B.; Jackson, M. O.; and Le Breton, M. 2001. Strategic candidacy and voting procedures. Econometrica 69(4):1013 1037. Dutta, B.; Jackson, M. O.; and Le Breton, M. 2002. Voting by successive elimination and strategic candidacy. Journal of Economic Theory 103(1):190 218. Feldman, M.; Fiat, A.; and Golomb, I. 2016. On voting and facility location. In Proceedings of the 17th ACM Conference on Economics and Computation (EC), 269 286. Hamilton, A.; Madison, J.; and Jay, J. 1787. The Federalist Papers. The Independent Journal. Hazan, R. Y. 1997. The 1996 intra-party elections in israel: Adopting party primaries. Electoral Studies 16(1):95 103. Jobson, R., and Wickham-Jones, M. 2010. Gripped by the past: Nostalgia and the 2010 labour party leadership contest. British Politics 5(4):525 548. Kahng, A.; Mackenzie, S.; and Procaccia, A. D. 2018. Liquid democracy: An algorithmic perspective. In 32nd, 1095 1102. Kenig, O. 2009. Classifying party leaders selection methods in parliamentary democracies. Journal of Elections, Public Opinion and Parties 19(4):433 447. Narodytska, N., and Walsh, T. 2013. Manipulating two stage voting rules. In 12th, 423 430. Norpoth, H. 2004. From primary to general election: A forecast of the presidential vote. PS: Political Science & Politics 37(4):737 740. Polukarov, M.; Obraztsova, S.; Rabinovich, Z.; Kruglyi, A.; and Jennings, N. R. 2015. Convergence to equilibria in strategic candidacy. In 24th, 624 630. Procaccia, A. D., and Rosenschein, J. S. 2006. The distortion of cardinal preferences in voting. In Proceedings of the 10th International Workshop on Cooperative Information Agents (CIA), 317 331. Rothe, J., ed. 2015. Economics and Computation. Springer. Schofield, N. 2008. The Spatial Model of Politics. Number 95 in Routledge Frontiers of Political Economy. Routledge. Sides, J.; Tausanovitch, C.; Vavreck, L.; and Warshaw, C. 2018. On the representativeness of primary electorates. British Journal of Political Science 1 9. Simms, B. 2007. Three Victories and a Defeat: The Rise and Fall of the First British Empire, 1714-1783. Allen Lane. Skowron, P., and Elkind, E. 2017. Social choice under metric preferences: Scoring rules and stv. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI), 706 712. Wauters, B. 2010. Explaining participation in intra-party elections: Evidence from belgian political parties. Party Politics 16(2):237 259. Wilentz, S. 2005. The Rise of American Democracy. Norton.