# On Strategyproof Conference Peer Review

Yichong Xu¹, Han Zhao¹, Xiaofei Shi² and Nihar B. Shah¹
¹Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
²Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
{yichongx, han.zhao, nihars}@cs.cmu.edu, xiaofeis@andrew.cmu.edu
Equal contribution.

## Abstract

We consider peer review in a conference setting where there are conflicts between the reviewers and the submissions. Under such conflicts, reviewers can manipulate their reviews in a strategic manner to influence the final rankings of their own papers. Present-day peer-review systems are not designed to guard against such strategic behavior, beyond minimal (and insufficient) checks such as not assigning a paper to a conflicted reviewer. In this work, we address this problem through the lens of social choice, and present a theoretical framework for strategyproof and efficient peer review. Given a conflict graph that satisfies a simple property, we first present and analyze a flexible framework for reviewer assignment and review aggregation that guarantees not only strategyproofness but also a natural efficiency property (unanimity). Our framework is based on the so-called partitioning method, and can be treated as a generalization of this type of method to conference peer-review settings. We then empirically show that the requisite property on the (authorship) conflict graph is indeed satisfied in the ICLR-17 submissions data, and further demonstrate a simple trick to make the partitioning method more practically appealing in conference peer-review settings. Finally, we complement our positive results with negative theoretical results, where we prove that under slightly stronger requirements, it is impossible for any algorithm to be both strategyproof and efficient.

## 1 Introduction

Peer review serves as an effective solution for quality evaluation in reviewing processes, especially in academic paper review [Dörfler et al., 2017, Shah et al., 2017] and massive open online courses (MOOCs) [Díez Peláez et al., 2013, Piech et al., 2013, Shah et al., 2013]. However, despite its scalability, competitive peer review faces the serious challenge of being vulnerable to strategic manipulations [Anderson et al., 2007, Thurner and Hanel, 2011, Alon et al., 2011, Kurokawa et al., 2015, Kahng et al., 2018]. By manipulating the ratings or rankings they provide, reviewers may be able to increase the chance that their own submissions get accepted. As noted by Thurner and Hanel [2011], even a small number of selfish, strategic reviewers can drastically reduce the quality of the scientific standard. The importance of peer review in academia and its considerable influence over the careers of researchers thus significantly underscores the need to design peer-review systems that are insulated from strategic manipulations. Present-day peer-review systems are, however, ill-equipped to handle strategic behavior, and do not have any rigorous framework in place beyond basic checks such as not assigning a paper to a conflicted reviewer. It is easy to show that these checks are insufficient, that is, reviewers can manipulate the final ranking of their conflicted papers by strategically manipulating their reports in current peer-review systems. In this work, we present a framework for conference peer review that addresses the problem of strategic behavior.
Our problem setup comprises a set of submitted papers and a set of reviewers. We are given a graph which we term the conflict graph. The conflict graph is a bipartite graph with the reviewers and papers as the two partitions of vertices, and an edge between a reviewer and a paper if there exists a conflict between them. Conflicts may arise due to authorship or other reasons such as institutional conflicts. Given this, there are two design steps in the peer-review procedure: (i) assigning each paper to a subset of reviewers, and (ii) aggregating the review results provided by the reviewers to give a final evaluation of each paper. We focus on ordinal preferences where each reviewer is asked to give a total ranking of the assigned papers, as ordinal data avoids biases and miscalibrations and provides a more direct comparison between papers [Barnett, 2003, Stewart et al., 2005, Douceur, 2009, Tsukida and Gupta, 2011, Shah et al., 2013, Shah et al., 2016]. Some modern large conferences (e.g., NeurIPS [Shah et al., 2017]) have also collected ordinal preferences for experimental purposes. Moreover, our methods can be extended easily to a score-based review system (cf. Section 4.1). Under this setting, we require our mechanism to output a total ranking over all papers, since this automated output in practice would serve as a guideline for program chairs to make their decisions (e.g., orals, posters, best papers, etc.).

We consider two important goals for designing a good conference review procedure: strategyproofness and efficiency. In our context, strategyproofness means that no reviewer can change the outcome of any paper with which she/he has a conflict, and efficiency means that the final output of the procedure reflects the opinion of most reviewers. Here, we consider efficiency in terms of the notion of unanimity in social choice theory: an agreement among all reviewers must be reflected in the final aggregation.

In addition to these conceptual contributions, we make several technical contributions towards this important problem. We first design a peer-review framework which theoretically guarantees strategyproofness along with a notion of efficiency that we term group unanimity. Under only a mild and realistic assumption on the conflict graph, we establish our positive results for the peer-review design task. Importantly, our framework and the specific algorithms are quite flexible in that they guarantee strategyproofness and unanimity while also leaving significant room for program chairs to implement their choice of decision-making strategies. On the practical front, we validate that the aforementioned assumption indeed holds in practice via an empirical analysis of the submissions made to the International Conference on Learning Representations 2017 (ICLR-17). Furthermore, we demonstrate a simple trick to make the partitioning method more practically appealing for conference peer review and illustrate it on the ICLR-17 data. We also complement our positive results with negative results showing that one cannot expect to meet requirements that are stronger than those provided by our framework. First, we show that under mild assumptions on the conflict graph, there is no algorithm that satisfies pairwise unanimity, which is a stronger notion of efficiency than group unanimity, and is also known as Pareto efficiency in the social choice literature [Brandt et al., 2016].
Second, we provide a conjecture and supporting results to show that if we require the assignment to satisfy a simple connectivity condition, our negative result continues to hold even when the notion of strategyproofness is made extremely weak. These negative results highlight the intrinsic hardness of designing strategyproof conference review systems.

## 2 Related Work

As early as the 1970s, Gibbard and Satterthwaite were already aware of the importance of a healthy voting rule that guarantees strategyproofness in the setting of social choice [Gibbard, 1973, Satterthwaite, 1975]. Recently, the fact that prominent peer-review mechanisms are manipulable has further called for strategyproof peer-review mechanisms [Merrifield and Saari, 2009, Hazelrigg, 2013]. Our work is most closely related to a series of works on strategyproof peer selection [Alon et al., 2011, Holzman and Moulin, 2013, Fischer and Klimm, 2015, Kurokawa et al., 2015, Aziz et al., 2016, Kahng et al., 2018], where agents cannot benefit from misreporting their preferences over other agents. In all these works, each agent is essentially required to evaluate all the other agents except herself. This is impractical for conference peer review, where each reviewer only has to review a small subset of submissions. In light of such constraints, Kurokawa et al. [2015] propose a strategyproof (impartial) mechanism in which each agent is only required to review a few other agents, and provide associated approximation guarantees; however, their algorithm might return an empty set. Building on their work, Aziz et al. [2016] propose an improved mechanism for peer selection which is strategyproof and satisfies a natural monotonicity property. However, even if the target output size is k, it may return a subset of size strictly larger than k. Past works focus on applications to peer grading and grant proposal review, and hence consider only the case where every reviewer is conflicted with exactly one paper. In contrast, our setting of conference peer review is more challenging: we allow each reviewer to have conflicts with multiple papers and each paper to have conflicts with multiple reviewers. Formally, the conflict graph C under conference peer review is a more general bipartite graph, where conflicts between reviewers and papers can arise not only from authorship, but also from advisor-advisee relationships, institutional conflicts, etc.

## 3 Problem Setting

We define $R := \{r_1, \ldots, r_m\}$ to be the set of $m$ reviewers and $P := \{p_1, \ldots, p_n\}$ to be the set of $n$ submitted papers. To characterize conflicts of interest, we use a bipartite graph $C$ with vertices $(R, P)$, where an edge connects a reviewer $r$ and a paper $p$ if there exists some conflict of interest between $r$ and $p$. Given the set of submitted papers and reviewers, the conflict graph is fixed and cannot be controlled. Note that the conflict graph $C$ defined above can be viewed as a realistic generalization of the authorship graph in the previously studied settings of peer grading and grant proposal review, in which each reviewer (paper) is connected to at most one paper (reviewer). The review process is modeled by a second bipartite graph $G$, termed the review graph, that also has the reviewers and submissions $(R, P)$ as its vertices. This review graph has an edge between a reviewer and a paper if that reviewer reviews that submission.
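For concreteness, here is a minimal sketch of how the conflict graph $C$ and the review graph $G$ could be represented; the `BipartiteGraph` class and the toy reviewers and papers are illustrative placeholders rather than anything prescribed by the framework.

```python
from collections import defaultdict

class BipartiteGraph:
    """Bipartite graph over reviewers R and papers P, stored as adjacency sets."""
    def __init__(self, reviewers, papers):
        self.reviewers = set(reviewers)
        self.papers = set(papers)
        self.adj = defaultdict(set)  # reviewer -> set of papers it is connected to

    def add_edge(self, reviewer, paper):
        assert reviewer in self.reviewers and paper in self.papers
        self.adj[reviewer].add(paper)

    def papers_of(self, reviewer):
        return self.adj[reviewer]

# Conflict graph C: an edge (r, p) means reviewer r has a conflict with paper p
# (authorship, advisor-advisee relationship, institutional conflict, ...).
C = BipartiteGraph(reviewers=["r1", "r2", "r3"], papers=["p1", "p2", "p3", "p4"])
C.add_edge("r1", "p1")
C.add_edge("r2", "p2")
C.add_edge("r2", "p3")  # a reviewer may be conflicted with several papers

# Review graph G: an edge (r, p) means reviewer r is assigned to review paper p.
G = BipartiteGraph(reviewers=["r1", "r2", "r3"], papers=["p1", "p2", "p3", "p4"])
for r, p in [("r1", "p2"), ("r1", "p3"), ("r2", "p1"), ("r2", "p4"), ("r3", "p1"), ("r3", "p4")]:
    G.add_edge(r, p)
```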
For each reviewer $r_i$ ($i \in [m]$), we let $P_i \subseteq P$ denote the set of papers assigned to this reviewer for review, or in other words, the neighborhood of node $r_i$ in the bipartite graph $G$. To ensure balanced workloads across reviewers, we require that every reviewer is assigned at most $\mu$ papers for some integer $1 \le \mu \le n$. In other words, every node in $R$ has at most $\mu$ neighbors (in $P$) in graph $G$. Additionally, each paper must be reviewed by at least $\lambda$ reviewers for some integer $1 \le \lambda \le m$. Thus every node in the set $P$ must have at least $\lambda$ neighbors (in $R$) in the graph $G$. For any (directed or undirected) graph $H$, we let the notation $E_H$ denote the set of edges in graph $H$.

At the end of the reviewing period, each reviewer provides a total ranking of the papers that she/he reviewed. For any set of papers $P' \subseteq P$, we let $\Pi(P')$ denote the set of all permutations of papers in $P'$. Furthermore, for any paper $p_j \in P'$ and any permutation $\pi(P') \in \Pi(P')$, we let $\pi_j(P')$ denote the position of paper $p_j$ in the permutation $\pi(P')$. At the end of the reviewing period, each reviewer $r_i$ submits a total ranking $\pi^{(i)}(P_i) \in \Pi(P_i)$ of the papers in $P_i$. We define a (partial) ranking profile $\Sigma := (\pi^{(1)}(P_1), \ldots, \pi^{(m)}(P_m))$ as the collection of rankings from all the reviewers. When the assignment $P_1, \ldots, P_m$ of papers to reviewers is fixed, we use the shorthand $(\pi^{(1)}, \ldots, \pi^{(m)})$ for the profile $\Sigma$. For any subset of papers $P' \subseteq P$, we let $\Sigma|_{P'}$ denote the restriction of $\Sigma$ to only the induced rankings on $P'$. Finally, when the ranking under consideration is clear from context, we use the notation $p \succ p'$ to say that paper $p$ is ranked higher than paper $p'$.

Under this problem setup, the goal is to jointly design: (a) a paper-reviewer assignment scheme, that is, the edges of the graph $G$, and (b) an associated review aggregation rule $f: \prod_{i=1}^{m} \Pi(P_i) \to \Pi(P)$ which maps the ranking profile to an aggregate total ranking of all papers. For any aggregation function $f$, we let $f_j(\Sigma)$ be the position of paper $p_j$ when the input to $f$ is the profile $\Sigma$. Although we assume ordinal feedback from the reviewers, our results continue to hold if we have review scores as our input instead of rankings; our framework is flexible enough to take the scores into account (cf. Section 4.1). In what follows we define the strategyproofness and efficiency properties that any conference review mechanism $f$ should satisfy under our setting. Due to space limits, we omit all the proofs and refer interested readers to an online version [Xu et al., 2018] for further details.

### 3.1 Strategyproofness

In the context of conference review, strategyproofness is defined with respect to a given conflict graph $C$. It means that a reviewer cannot change the position of her conflicting papers by manipulating the ranking she provides.

Definition 1 (Strategyproofness, SP). A review process $(G, f)$ is called strategyproof with respect to a conflict graph $C$ if for every reviewer $r_i \in R$ and every paper $p_j \in P$ with $(r_i, p_j) \in E_C$, under assignment $G$, for every pair of profiles that differ only in the ranking given by reviewer $r_i$, the position of $p_j$ is unchanged. Formally, for every $\Sigma = (\pi^{(1)}, \ldots, \pi^{(i-1)}, \pi^{(i)}, \pi^{(i+1)}, \ldots, \pi^{(m)})$ and $\Sigma' = (\pi^{(1)}, \ldots, \pi^{(i-1)}, \pi^{(i)\prime}, \pi^{(i+1)}, \ldots, \pi^{(m)})$, we have $f_j(\Sigma) = f_j(\Sigma')$.
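To make Definition 1 concrete, the sketch below brute-forces the check on a tiny instance: for every reviewer and every paper she is conflicted with, it verifies that no unilateral change of her submitted ranking moves that paper's position in the aggregate output. The toy positional-scoring rule `borda_like` is only an illustrative stand-in for $f$, not a rule proposed in this paper.

```python
from itertools import permutations, product

def is_strategyproof(assignments, conflicts, aggregate):
    """Brute-force check of Definition 1 on a tiny instance.

    assignments : {reviewer: tuple of assigned papers}   (the sets P_i)
    conflicts   : {reviewer: set of conflicted papers}   (edges of the conflict graph C)
    aggregate   : maps a profile {reviewer: ranking tuple} to a total ranking (tuple)
    """
    reviewers = list(assignments)
    options = [list(permutations(assignments[r])) for r in reviewers]
    for combo in product(*options):              # enumerate every possible profile
        profile = dict(zip(reviewers, combo))
        base = aggregate(profile)
        for r in reviewers:                      # every unilateral deviation by r
            for dev in permutations(assignments[r]):
                out = aggregate({**profile, r: dev})
                if any(base.index(p) != out.index(p) for p in conflicts.get(r, ())):
                    return False                 # r moved a paper she is conflicted with
    return True

def borda_like(profile):
    """Toy positional-scoring aggregation (illustrative only, not the paper's rule)."""
    scores = {}
    for ranking in profile.values():
        for pos, p in enumerate(ranking):
            scores[p] = scores.get(p, 0) + (len(ranking) - pos)
    return tuple(sorted(scores, key=lambda p: (-scores[p], p)))

assignments = {"r1": ("p2", "p3"), "r2": ("p1", "p3")}
conflicts = {"r1": {"p1"}}   # r1 is conflicted with p1 (e.g., authorship)
print(is_strategyproof(assignments, conflicts, borda_like))   # False: r1 can shift p1's position
```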
A strategyproof peer-review procedure alone is inadequate with respect to any practical requirements: simply giving out a fixed, arbitrary evaluation makes the peer-review procedure strategyproof. We therefore consider efficiency of the procedure in the next section, to ensure that authors receive meaningful and helpful feedback on their work.

### 3.2 Efficiency (Unanimity)

We consider efficiency of a peer-review process through the notion of unanimity, which is one of the most prevalent and classic properties used to measure the efficiency of a voting system in the social choice literature [Fishburn, 2015]. At a colloquial level, unanimity states that when there is a common agreement among all reviewers, the aggregation of the opinions must also respect this agreement. We discuss two kinds of unanimity, termed group unanimity (GU) and pairwise unanimity (PU). Both kinds of unanimity impose requirements on the aggregation function for a given paper-reviewer assignment. Group unanimity is defined as follows:

Definition 2 (Group Unanimity, GU). The pair $(G, f)$ is said to be group unanimous (GU) if the following condition holds for every possible profile $\Sigma$: For every set of papers $P' \subset P$ such that every reviewer ranks the papers she reviewed from $P'$ higher than those she reviewed from $P \setminus P'$, the output $f(\Sigma)$ must satisfy $p_x \succ p_y$ for every pair of papers $p_x \in P'$ and $p_y \in P \setminus P'$ such that at least one reviewer has reviewed both $p_x$ and $p_y$.

Intuitively, group unanimity says that if the papers can be partitioned into two sets such that every reviewer who has reviewed papers from both sets agrees that the papers she has reviewed from the first set are better than those she reviewed from the second set, then the final output ranking should respect this agreement. Our second notion of unanimity, termed pairwise unanimity, is a local refinement of group unanimity. This notion is identical to the classical notion of unanimity stated in Arrow's impossibility theorem [Arrow, 1950]. Notice that the classical notion of unanimity considers every reviewer to review all papers (that is, $P_i = P$ for all $i \in [m]$), whereas our notion is generalized to settings where reviewers may review only a subset of papers.

Definition 3 (Pairwise Unanimity, PU). We define $(G, f)$ to be pairwise unanimous (PU) if the following condition holds for every possible profile $\Sigma$ and every pair of papers $p_{j_1}, p_{j_2} \in P$: If at least one reviewer has reviewed both $p_{j_1}$ and $p_{j_2}$, and all the reviewers that have reviewed both $p_{j_1}$ and $p_{j_2}$ agree that $p_{j_1} \succ p_{j_2}$, then $f_{j_1}(\Sigma) < f_{j_2}(\Sigma)$.

An important property is that pairwise unanimity is stronger than group unanimity.

Proposition 1. If $(G, f)$ is pairwise unanimous, then $(G, f)$ is also group unanimous.

## 4 Positive Results

In this section we consider the design of paper-reviewer assignments and aggregation rules for strategyproofness and group unanimity (efficiency). Prior works on this topic consider a specific and restricted class of conflict graphs, namely those with one-to-one relations between papers and reviewers, which do not capture conference peer-review settings. We consider a more general class of conflict graphs and present a framework based on the partitioning method [Alon et al., 2011], which we show achieves group unanimity and strategyproofness. The key idea is to assign a paper to a reviewer only if there is no path between this paper and reviewer in the conflict graph C.
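The sketch below illustrates this key idea, under the natural reading that every assigned reviewer-paper pair must lie in different connected components of $C$: it labels the components of the bipartite conflict graph with a BFS and rejects any assignment that pairs a reviewer with a paper reachable from her. All identifiers are illustrative.

```python
from collections import defaultdict, deque

def conflict_components(conflict_edges, reviewers, papers):
    """Label connected components of the bipartite conflict graph C via BFS."""
    adj = defaultdict(set)
    for r, p in conflict_edges:
        adj[("R", r)].add(("P", p))
        adj[("P", p)].add(("R", r))
    label, comp = {}, 0
    for node in [("R", r) for r in reviewers] + [("P", p) for p in papers]:
        if node in label:
            continue
        queue = deque([node])
        label[node] = comp
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in label:
                    label[v] = comp
                    queue.append(v)
        comp += 1
    return label

def respects_key_idea(assignment, conflict_edges, reviewers, papers):
    """True iff no reviewer is assigned a paper reachable from her in C."""
    label = conflict_components(conflict_edges, reviewers, papers)
    return all(label[("R", r)] != label[("P", p)]
               for r, assigned in assignment.items() for p in assigned)

reviewers, papers = ["r1", "r2", "r3"], ["p1", "p2", "p3"]
conflict_edges = [("r1", "p1"), ("r2", "p1")]   # r1 and r2 share a conflicted paper
assignment = {"r1": ["p2"], "r2": ["p3"], "r3": ["p1"]}
print(respects_key_idea(assignment, conflict_edges, reviewers, papers))  # True
```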
We then empirically demonstrate, using submission data from the ICLR-17 conference, that this class of conflict graphs is indeed representative of peer-review settings. In addition to establishing feasibility, we present a simple trick to improve the practical appeal of our framework (and more generally the partitioning method) for conference peer review.

### 4.1 The Divide-and-Rank Framework

We now present our Divide-and-Rank framework, consisting of the reviewer-assignment algorithm (Algorithm 1) and the rank-aggregation algorithm (Algorithm 2). At a high level, our framework partitions the reviewers and papers for assignment, and aggregates the reviews by computing a ranking which is consistent with any group agreements. Divide-and-Rank works for a general conflict graph C as long as the conflict graph can be divided into two reasonably-sized disconnected components. Importantly, the framework is simple yet flexible in that the assignment within each partition and the aggregation among certain groups of papers can leverage any existing algorithm for assignment and aggregation respectively, which is useful as it allows further optimization of various other metrics in addition to strategyproofness and unanimity.

The Divide-and-Rank assignment algorithm begins by partitioning the conflict graph into two disconnected components that meet the requirements specified by $\mu$ and $\lambda$.

    Algorithm 1: Divide-and-Rank assignment
    Input: conflict graph C, parameters λ, µ, assignment algorithm A
    Output: an assignment of reviewers to papers
     1: (R_C, P_C), (R_C̄, P_C̄) ← Partition(C, λ, µ)
     2: use algorithm A to assign the papers P_C̄ to the reviewers R_C
     3: use algorithm A to assign the papers P_C to the reviewers R_C̄
     4: return the union of the assignments from steps 2 and 3

     6: procedure Partition(conflict graph C, parameters λ, µ)
     7:   run a BFS on C to get its K connected components {(R_k, P_k)}, k = 1, ..., K
     8:   let r_k = |R_k| and p_k = |P_k| for all k ∈ [K]
     9:   initialize a table T[·, ·, ·] ∈ {0, 1}^(K × (m+1) × (n+1)) so that T[1, r_1, p_1] = T[1, 0, 0] = 1 and all other entries are 0
    10:   for k = 2 to K do
    11:     T[k, r, p] = T[k−1, r, p] ∨ T[k−1, r − r_k, p − p_k] for all 0 ≤ r ≤ m, 0 ≤ p ≤ n
    12:   end for
    13:   if there is no pair 0 ≤ r ≤ m, 0 ≤ p ≤ n with T[K, r, p] = 1 such that max{p/(m − r), (n − p)/r} ≤ µ/λ, return ERROR
    14:   use standard backtracking in the table T[·, ·, ·] to return (R_C, P_C) and (R_C̄, P_C̄)
    15: end procedure

Although dividing into more groups can lead to similar unanimity and strategyproofness properties, we use two groups for simplicity and computational efficiency. The subroutine Partition first runs a breadth-first search (BFS) to partition the original conflict graph into K connected components, where the k-th connected component contains $r_k \ge 0$ reviewers and $p_k \ge 0$ papers. Next, the algorithm performs dynamic programming to compute all the possible subset sums achievable by the K connected components. Here $T[k, r, p] = 1$ means that there exists a partition of the first k components such that one side of the partition has r reviewers and p papers, and $T[k, r, p] = 0$ otherwise. The last step is to check whether there exists a subset of components satisfying the requirement and, if so, to run a standard backtracking algorithm along the table to find the actual subset. Clearly, Partition runs in $O(Knm)$ time, and since $K \le n + m$, it runs in polynomial time in the size of the input conflict graph C.
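For concreteness, here is a minimal Python sketch of the Partition subroutine described above, using the feasibility condition max{p/(m−r), (n−p)/r} ≤ µ/λ for a split with r reviewers and p papers on one side (cf. Theorem 1 below); the dynamic-programming table and the backtracking mirror the pseudocode, and all function names are illustrative.

```python
def partition(components, m, n, mu, lam):
    """Sketch of the Partition subroutine: split the connected components of the
    conflict graph into two groups meeting the (mu, lam) workload requirements.

    components : list of (r_k, p_k) reviewer/paper counts of the K connected
                 components found by BFS on the conflict graph C
    Returns the indices of the components on one side, or None if infeasible.
    """
    K = len(components)
    # T[k][r][p] is True iff some subset of the first k components has exactly
    # r reviewers and p papers in total (base case shifted to k = 0).
    T = [[[False] * (n + 1) for _ in range(m + 1)] for _ in range(K + 1)]
    T[0][0][0] = True
    for k in range(1, K + 1):
        rk, pk = components[k - 1]
        for r in range(m + 1):
            for p in range(n + 1):
                T[k][r][p] = T[k - 1][r][p] or (
                    r >= rk and p >= pk and T[k - 1][r - rk][p - pk])

    def feasible(r, p):
        # Each side's papers are reviewed by the other side's reviewers:
        # p papers need p*lam review slots from (m - r) reviewers, and the
        # remaining (n - p) papers need (n - p)*lam slots from r reviewers.
        return p * lam <= (m - r) * mu and (n - p) * lam <= r * mu

    target = next(((r, p) for r in range(m + 1) for p in range(n + 1)
                   if T[K][r][p] and feasible(r, p)), None)
    if target is None:
        return None                       # ERROR: no feasible two-way split exists
    chosen, (r, p) = [], target
    for k in range(K, 0, -1):             # standard backtracking through the table
        rk, pk = components[k - 1]
        if not T[k - 1][r][p]:            # component k must lie on the chosen side
            chosen.append(k - 1)
            r, p = r - rk, p - pk
    return chosen

# Example: three components with (reviewers, papers) = (2,1), (1,2), (2,2).
print(partition([(2, 1), (1, 2), (2, 2)], m=5, n=5, mu=3, lam=2))   # e.g. [2]
```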
In the next step, the algorithm assigns papers to reviewers in a fashion that guarantees each paper is reviewed by at least $\lambda$ reviewers and each reviewer reviews at most $\mu$ papers. The assignment of papers in any individual component (to reviewers in the other component) can be done using any assignment algorithm (taken as an input A), as long as the algorithm satisfies the $(\mu, \lambda)$-requirements. Possible choices for the algorithm A include the popular Toronto paper matching system [Charlin and Zemel, 2013] and others [Hartvigsen et al., 1999, Garg et al., 2010, Stelmakh et al., 2018].

We then introduce the aggregation procedure in Algorithm 2. Generally speaking, the papers in each component are aggregated separately using the subroutine Contract-and-Sort. This aggregation in Contract-and-Sort is performed by computing a topological ordering of all strongly connected components (SCCs) according to the reviews, and then ranking the papers within each SCC using any arbitrary aggregation algorithm (taken as an input B); in case there are multiple topological orderings, any one of them suffices.

    Algorithm 2: Divide-and-Rank aggregation
    Input: profile Σ = (π^(1)(P_1), ..., π^(m)(P_m)); groups (R_C, P_C) and (R_C̄, P_C̄) with |P_C| ≤ |P_C̄|; aggregation algorithm B
    Output: total ranking of all papers
     1: compute Σ_C as the restriction of the profile Σ to only the papers in P_C, and Σ_C̄ as the restriction of Σ to only the papers in P_C̄
     2: π_C ← Contract-and-Sort(B, Σ_C)
     3: π_C̄ ← Contract-and-Sort(B, Σ_C̄)
     4: define I = {⌊n/|P_C|⌋, ⌊2n/|P_C|⌋, ..., n}
     5: return the total ranking obtained by filling the papers in P_C into the positions in I in the order given by π_C, and the papers in P_C̄ into the positions in [n] \ I in the order given by π_C̄

     7: procedure Contract-and-Sort(aggregation algorithm B, profile Σ̃ = (π^(1), ..., π^(m')))
     8:   build a directed graph G_Σ̃ with the papers in Σ̃ as its vertices and no edges
     9:   for each i ∈ [m'] do
    10:     writing π^(i) = (p_{j_1} ≻ ... ≻ p_{j_{t_i}}), add a directed edge from p_{j_k} to p_{j_{k+1}} in G_Σ̃ for all k ∈ [t_i − 1]
    11:   end for
    12:   compute a topological ordering of the strongly connected components (SCCs) of G_Σ̃
    13:   for every SCC of G_Σ̃, compute a permutation of the papers in that component using algorithm B
    14:   return the permutation of all papers that is consistent with the topological ordering of the SCCs and with the permutations within the SCCs
    15: end procedure

Possible choices for the algorithm B include the modified Borda count [Emerson, 2013], Plackett-Luce aggregation [Hajek et al., 2014], or others [Caragiannis et al., 2017]. Moving back to Algorithm 2, the two rankings returned by Contract-and-Sort for the two components are simply interlaced to obtain a total ranking over all the papers: the slots for $\pi_C$ are reserved in the set $I$, and $[n] \setminus I$ contains the slots for the remaining papers. In our extended version of the paper we also show that the interleaving causes only a small change with respect to an underlying optimal ranking.
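The following sketch shows one way the Contract-and-Sort subroutine and the interleaving in steps 4-5 could be realized, using Tarjan's algorithm for SCCs and a simple average-position rule as an illustrative stand-in for the aggregation algorithm B; it is a sketch under these assumptions, not a reference implementation.

```python
from collections import defaultdict

def contract_and_sort(rankings, aggregate_within):
    """Sketch of Contract-and-Sort: build the directed agreement graph from the
    rankings, find its strongly connected components (SCCs), order the SCCs
    topologically, and order papers inside each SCC with the supplied rule B."""
    papers = {p for rk in rankings for p in rk}
    edges = defaultdict(set)
    for rk in rankings:
        for a, b in zip(rk, rk[1:]):     # edge from each paper to the next one in the ranking
            edges[a].add(b)

    # Tarjan's SCC algorithm; SCCs are produced in reverse topological order.
    index, low, on_stack, stack, sccs, counter = {}, {}, set(), [], [], [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in edges[v]:
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = []
            while True:
                w = stack.pop(); on_stack.discard(w); comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in papers:
        if v not in index:
            strongconnect(v)

    ordered = []
    for comp in reversed(sccs):          # reverse Tarjan output = topological (best-first) order
        ordered.extend(aggregate_within(comp, rankings))
    return ordered

def average_position(component, rankings):
    """Illustrative stand-in for algorithm B: order papers in a component by their
    average position over the rankings that contain them (ties broken by name)."""
    def avg(p):
        pos = [rk.index(p) for rk in rankings if p in rk]
        return sum(pos) / len(pos)
    return sorted(component, key=lambda p: (avg(p), p))

def interleave(pi_C, pi_Cbar, n):
    """Steps 4-5 of Algorithm 2: fill the smaller group's papers into the evenly
    spaced slots I = {floor(k*n/|P_C|) : k = 1..|P_C|} and the rest elsewhere."""
    slots = {(k * n) // len(pi_C) for k in range(1, len(pi_C) + 1)}
    it_C, it_rest = iter(pi_C), iter(pi_Cbar)
    return [next(it_C) if pos in slots else next(it_rest) for pos in range(1, n + 1)]

# Toy run: two reviewers disagree on p1 vs p2 (an SCC) but agree both beat p3.
pi_C = contract_and_sort([["p1", "p2", "p3"], ["p2", "p1", "p3"]], average_position)
print(interleave(pi_C, ["q1", "q2", "q3"], n=6))   # ['q1', 'p1', 'q2', 'p2', 'q3', 'p3']
```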
The following theorem now shows that Divide-and-Rank satisfies group unanimity and is strategyproof.

Theorem 1. Suppose C can be partitioned into two groups $(R_C, P_C)$ and $(R_{\bar C}, P_{\bar C})$ such that there are no edges in C across the groups and that $\max\{|P_C|/|R_{\bar C}|,\ |P_{\bar C}|/|R_C|\} \le \mu/\lambda$. Then Divide-and-Rank is group unanimous and strategyproof.

Remark. Our Divide-and-Rank framework aptly handles the various nuances of real-world conference peer review, which render other existing methods inapplicable. This includes the facts that each reviewer can have conflicts with multiple papers and each paper can have conflicts with multiple reviewers, and furthermore that each reviewer may review only a subset of papers. Even under this challenging setting, our framework guarantees that no reviewer can influence the ranking of her own paper via strategic behavior, and that it is efficient from a social choice perspective.

Extension to review scores. Our framework easily extends to a score-based setting, wherein each reviewer $r_i$ provides a score $s_{ij}$ for every paper $p_j \in P_i$. The assignment algorithm remains the same in this setting; for aggregation, we can use the same procedure with the ranking induced by the review scores. The only difference is that in step 10 of Contract-and-Sort, we add an edge from $p_{j_1}$ to $p_{j_2}$ for every pair of papers with $s_{ij_1} > s_{ij_2}$. This makes sure that the graph fully reflects the opinion of the reviewer and does not impose constraints on papers that are equally rated. On the other hand, the aggregation algorithm B can also leverage the review scores for a more granular ranking (e.g., average scores). Our definitions of unanimity and strategyproofness can also be straightforwardly extended to the score setting, and our framework preserves these properties under the extended definitions. See our extended version of this paper [Xu et al., 2018] for further details.

### 4.2 Analysis of ICLR-17 Submissions

Our Divide-and-Rank framework is based on a partitioning method that relies on a partition of the set of reviewers and papers such that there is no conflict across the partition. In this subsection we restrict attention to the authorship conflict graph, and empirically verify that the partitioning conditions indeed hold in a conference peer-review setting using data from the ICLR-17 conference. We then demonstrate how to make the partitioning method more appealing for conference peer review. In particular, we show that removing only a small number of reviewers can drastically reduce the size of the largest component in the conflict graph, thus providing great flexibility towards partitioning the papers and authors.

We analyzed all papers submitted to ICLR-17, taking the given authorship relationship as the conflict graph. ICLR-17 received 489 submissions by 1,417 authors; we believe it is a good representative of a medium-sized modern conference. It is important to note that we consider only the set of authors as the entire reviewer pool (since we do not have access to the actual reviewer identities). Adding reviewers from outside the set of authors would only improve the results, since these additional reviewers have no edges in the (authorship) conflict graph. We first investigate the existence of (moderately sized) components in the conflict graph, and find that the authorship graph is not only disconnected, but also has more than 250 components. The largest connected component (CC) contains 133 (that is, about 27%) of all papers, and the second largest CC is much smaller, empirically verifying our assumptions in Theorem 1.
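The component statistics reported here (and the reviewer-removal heuristic discussed next) can be reproduced along the following lines, assuming the authorship data is available as a list of (author, paper) pairs; the toy data and all names are illustrative.

```python
from collections import defaultdict, deque

def component_stats(author_paper_edges, drop_authors=frozenset()):
    """(#authors, #papers) per connected component of the authorship conflict
    graph, optionally after removing some authors from the reviewer pool."""
    adj = defaultdict(set)
    for a, p in author_paper_edges:
        if a in drop_authors:
            _ = adj[("P", p)]            # keep the paper as an (isolated) vertex
            continue
        adj[("A", a)].add(("P", p))
        adj[("P", p)].add(("A", a))
    seen, sizes = set(), []
    for node in list(adj):
        if node in seen:
            continue
        queue, authors, papers = deque([node]), 0, 0
        seen.add(node)
        while queue:                      # BFS over one component
            u = queue.popleft()
            authors += u[0] == "A"
            papers += u[0] == "P"
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        sizes.append((authors, papers))
    return sizes

def top_degree_authors(author_paper_edges, k):
    """Heuristic from this subsection: the k authors with the most papers
    (i.e., highest degree in the authorship conflict graph)."""
    degree = defaultdict(int)
    for a, _ in author_paper_edges:
        degree[a] += 1
    return set(sorted(degree, key=degree.get, reverse=True)[:k])

# Toy (author, paper) records standing in for the ICLR-17 authorship data.
edges = [("a1", "p1"), ("a1", "p2"), ("a2", "p2"), ("a1", "p3"), ("a3", "p3"), ("a4", "p4")]
print(sorted(component_stats(edges), reverse=True))                               # [(3, 3), (1, 1)]
print(sorted(component_stats(edges, top_degree_authors(edges, 1)), reverse=True)) # largest CC shrinks
```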
The partitioning method was previously considered for the problem of peer grading [Kahng et al., 2018], where the setting is quite homogeneous in that each reviewer (student) goes through the same course and hence any paper (homework) can be assigned to any reviewer. In peer review, however, different reviewers typically have different areas of expertise, and hence their ability to review a paper varies by the area of the paper. In order to accommodate this diversity of expertise in peer review, one must have greater flexibility in the paper assignment. In our analysis, the largest CC contains 372 authors and 133 papers. It is reasonable to expect that a large number of reviewers with the expertise required to review these 133 papers would also lie in the same CC; thus a naïve application of Divide-and-Rank would assign these 133 papers to reviewers who may have much less expertise. This is indeed a concern, and in what follows, we discuss a simple yet effective way to ameliorate this problem.

We show empirically, using the ICLR-17 data, that by removing only a small number of authors from the reviewer pool, the conflict graph can be made much sparser, allowing for a more flexible application of Divide-and-Rank. Specifically, we remove a small fraction of authors from the reviewer pool using the simple heuristic of removing the authors with the maximum degree in the (authorship) conflict graph. We then study the statistics of the resulting conflict graph (with all papers but only the remaining reviewers) in terms of the number and sizes of the CCs. The results are shown in Table 1. After removing 100 authors, which is only about 7% of the pool, the number of papers in the largest CC reduces by 94% to just 8. Likewise, the number of authors in the largest CC reduces from 371 to as few as 28. This demonstrates that despite all the idiosyncrasies of conference peer review, Divide-and-Rank and the partitioning method can be made practically applicable.

| #Authors removed from reviewer pool | 0 | 5 | 10 | 15 | 20 | 50 | 100 |
|---|---|---|---|---|---|---|---|
| #Components | 253 | 268 | 278 | 292 | 302 | 334 | 389 |
| #Authors in largest CC | 371 | 313 | 304 | 228 | 205 | 55 | 28 |
| #Papers in largest CC | 133 | 114 | 110 | 82 | 74 | 18 | 8 |

Table 1: Statistics of the conflict graph on removing a small number (about 7%) of authors from the reviewer pool of 1,417 authors.

## 5 Negative Results

The positive results in the previous section focus on group unanimity, which is weaker than the conventional notion of unanimity, also known as pairwise unanimity. Moreover, the framework induces a disconnected review graph, whereas the review graphs of conferences today are typically connected [Shah et al., 2017]. It is thus natural to ask the following: Can a peer-review system with a connected review graph satisfy these properties? Can a strategyproof peer-review system be pairwise unanimous? In this section we present some negative results toward these questions, thereby highlighting the critical impediments towards stronger results. Before stating our results, we give another notion of strategyproofness, which is significantly weaker than the notion of strategyproofness in Definition 1, and is hence termed weak strategyproofness.

Definition 4 (Weak Strategyproofness, WSP). A review process $(G, f)$ is called weakly strategyproof if for every reviewer $r_i$ there exists some paper $p_j \in P$ such that for every pair of distinct profiles (under assignment $G$) $\Sigma = (\pi^{(1)}, \ldots, \pi^{(i-1)}, \pi^{(i)}, \pi^{(i+1)}, \ldots, \pi^{(m)})$ and $\Sigma' = (\pi^{(1)}, \ldots, \pi^{(i-1)}, \pi^{(i)\prime}, \pi^{(i+1)}, \ldots, \pi^{(m)})$, it is guaranteed that $f_j(\Sigma) = f_j(\Sigma')$.
| Unanimity | Strategyproof | Requirement on G | Possible? | Reference |
|---|---|---|---|---|
| Pairwise | None | Mild (see Corollary 1) | No | Theorem 2 |
| Group | Weak | Mild (connected G) | Conjecture: No | Proposition 2 |
| Group | Yes | None | Yes | Theorem 1 |

Table 2: Summary of our negative results (first two rows of the table), and a comparison to our positive result (third row).

In other words, weak strategyproofness requires that for each reviewer there is at least one paper (not necessarily authored by this reviewer) whose ranking cannot be influenced by the reviewer. As the name suggests, strategyproofness is strictly stronger than weak strategyproofness when each reviewer has at least one paper of conflict. We define the notion of weak strategyproofness mainly for the theoretical purposes of our negative results, since WSP is too weak to be practical. However, even this extremely weak requirement is impossible to satisfy. We summarize our results in Table 2. We have shown the properties of group unanimity and strategyproofness for Divide-and-Rank; as the first direction of possible extension, we show in Theorem 2 that the slightly stronger notion of pairwise unanimity is impossible to satisfy under mild assumptions, even without strategyproofness constraints. We then explore the second direction of extension, requiring a connected G, and conjecture that group unanimity and weak strategyproofness are impossible to satisfy simultaneously under this setting.

### 5.1 Impossibility of Pairwise Unanimity

In order to precisely state our result, we first introduce the notion of a review-relation graph $H$. Given a paper-review assignment $\{P_i\}_{i=1}^{m}$, the review-relation graph $H$ is an undirected graph with $[n]$ as its vertices, in which any two papers $p_{j_1}$ and $p_{j_2}$ are connected if and only if there exists at least one reviewer who reviews both papers. With this preliminary in place, we are now ready to state our results:

Theorem 2. If $H$ has a cycle of length 3 or more and no single reviewer reviews all the papers in the cycle, then there is no review process $(G, f)$ that is pairwise unanimous.

The proof of Theorem 2 is similar to a Condorcet-cycle argument. In the corollary below we give some direct implications of the condition in Theorem 2 when $|P_1| = \cdots = |P_m| = \mu$, that is, when every reviewer ranks the same number of papers.

Corollary 1. Suppose $|P_1| = \cdots = |P_m| = \mu \ge 2$. If $(G, f)$ is pairwise unanimous, then the following conditions hold: (i) $H$ does not contain any cycle of length $\mu + 1$ or more. (ii) The sets of papers reviewed by any pair of reviewers $r_{i_1}$ and $r_{i_2}$ must satisfy $|P_{i_1} \cap P_{i_2}| \in \{0, 1, \mu\}$. In words, if a pair of reviewers review more than one common paper, they must review exactly the same set. (iii) The number of distinct sets in $P_1, \ldots, P_m$ is at most $\frac{n-1}{\mu-1}$.

Remark. In modern conferences [Shah et al., 2017], each reviewer usually reviews 3 to 6 papers. If the review process is pairwise unanimous, then by Corollary 1(iii) the number of distinct review sets is much smaller than the number of reviewers; this severely limits the design of the review sets, since many reviewers would be forced to review identical sets of papers. Condition (ii) is also a relatively strong requirement, since the specialization of reviewers might not allow for such a limit on the intersection of review sets. For instance, in practice there is a large number of pairs of reviewers who review more than one common paper but do not review the same set of papers [Shah et al., 2017]. In summary, Theorem 2 and Corollary 1 show the difficulty of satisfying pairwise unanimity, even without strategyproofness.
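As an illustration of how restrictive these conditions are, the sketch below constructs the review-relation graph $H$ from an assignment $\{P_i\}$ and checks the necessary condition of Corollary 1(ii); the toy assignment and names are illustrative.

```python
from itertools import combinations

def review_relation_graph(assignments):
    """Edges of the review-relation graph H: two papers are adjacent iff some
    reviewer reviews both of them."""
    edges = set()
    for papers in assignments.values():
        for a, b in combinations(sorted(papers), 2):
            edges.add((a, b))
    return edges

def satisfies_corollary_1_ii(assignments, mu):
    """Necessary condition of Corollary 1(ii): every pair of review sets must
    intersect in 0, 1, or mu papers (disjoint, one shared paper, or identical)."""
    sets = list(assignments.values())
    return all(len(set(a) & set(b)) in (0, 1, mu)
               for a, b in combinations(sets, 2))

# Each reviewer reviews mu = 3 papers.
assignments = {
    "r1": {"p1", "p2", "p3"},
    "r2": {"p2", "p3", "p4"},   # shares two papers with r1 but not the full set
    "r3": {"p4", "p5", "p6"},
}
print(len(review_relation_graph(assignments)))          # number of edges in H
print(satisfies_corollary_1_ii(assignments, mu=3))      # False: r1 and r2 violate (ii)
```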
### 5.2 Group Unanimity and Strategyproofness for a Connected Review Graph

Having shown that pairwise unanimity is too strong a requirement to satisfy, we now consider another direction for extension: conditions on the review graph $G$. A natural question follows: under what condition on the review graph $G$ are both group unanimity and strategyproofness possible? Although we leave the question of finding the exact condition open, we conjecture that if we require $G$ to be connected, then group unanimity and strategyproofness cannot be simultaneously satisfied. To convey our insights, we analyze an extremely simplified review setting and present a negative result under this setting.

Proposition 2. Consider any $n \ge 4$ and suppose $P = P_1 \cup P_2 \cup P_3 \cup P_4$, where $P_1, P_2, P_3, P_4$ are disjoint non-empty sets of papers. Consider a review graph $G$ with $m = 3$ reviewers, where reviewer $r_1$ reviews $P_1 \cup P_2$, $r_2$ reviews $P_2 \cup P_3$, and $r_3$ reviews $P_3 \cup P_4$. Then there is no aggregation function $f$ that is both weakly strategyproof and group unanimous.

Proposition 2 thus shows that for the simple review graph considered in its statement, group unanimity and weak strategyproofness cannot hold at the same time. We conjecture that such a negative result may hold for more general connected review graphs, which could be shown by identifying a component of the general review graph satisfying the condition of Proposition 2. This shows that our design process for the review graph in Section 4 is quite essential.

## 6 Discussion

We provide a framework and associated algorithms to impart strategyproofness to conference peer review. Our framework, besides guaranteeing strategyproofness, is importantly very flexible in allowing the program chairs to use the decision-making criteria of their choice. We complement these positive results with negative results showing that it is impossible for any algorithm to remain strategyproof and satisfy the stronger notion of pairwise unanimity. Future work includes considering efficiency from a statistical perspective and characterizing the precise set of conflict-of-interest graphs that permit (or do not permit) strategyproofness.

## Acknowledgements

YX was partially supported by DARPA award FA8750-172-0130. HZ would like to acknowledge the support of the DARPA XAI project, contract FA87501720152. NBS was partially supported by NSF grants 1755656 and 1763734.

## References

[Alon et al., 2011] Noga Alon, Felix Fischer, Ariel Procaccia, and Moshe Tennenholtz. Sum of us: Strategyproof selection from the selectors. In TARK. ACM, 2011.

[Anderson et al., 2007] Melissa S Anderson, Emily A Ronning, Raymond De Vries, and Brian C Martinson. The perverse effects of competition on scientists' work and relationships. Science and Engineering Ethics, 2007.

[Arrow, 1950] Kenneth J Arrow. A difficulty in the concept of social welfare. Journal of Political Economy, 1950.

[Aziz et al., 2016] Haris Aziz, Omer Lev, Nicholas Mattei, Jeffrey S Rosenschein, and Toby Walsh. Strategyproof peer selection: Mechanisms, analyses, and experiments. In AAAI, pages 397–403, 2016.

[Barnett, 2003] William Barnett. The modern theory of consumer behavior: Ordinal or cardinal? The Quarterly Journal of Austrian Economics, 6(1):41–65, 2003.

[Brandt et al., 2016] Felix Brandt, Vincent Conitzer, Ulle Endriss, Jérôme Lang, and Ariel D Procaccia. Handbook of Computational Social Choice. 2016.
[Caragiannis et al., 2017] Ioannis Caragiannis, Xenophon Chatzigeorgiou, George A Krimpas, and Alexandros A Voudouris. Optimizing positional scoring rules for rank aggregation. In AAAI, pages 430–436, 2017.

[Charlin and Zemel, 2013] L. Charlin and R. S. Zemel. The Toronto Paper Matching System: An automated paper-reviewer assignment system, 2013.

[Díez Peláez et al., 2013] Jorge Díez Peláez, Óscar Luaces Rodríguez, Amparo Alonso Betanzos, Alicia Troncoso, and Antonio Bahamonde Rionda. Peer assessment in MOOCs using preference learning via matrix factorization. In NIPS Workshop on Data Driven Education, 2013.

[Dörfler et al., 2017] Florian Dörfler, Yuanzhang Xiao, and Mihaela van der Schaar. Incentive design in peer review: Rating and repeated endogenous matching. IEEE Transactions on Network Science and Engineering, 2017.

[Douceur, 2009] John R Douceur. Paper rating vs. paper ranking. ACM SIGOPS Operating Systems Review, 2009.

[Emerson, 2013] Peter Emerson. The original Borda count and partial voting. Social Choice and Welfare, 2013.

[Fischer and Klimm, 2015] Felix Fischer and Max Klimm. Optimal impartial selection. SIAM Journal on Computing, 44(5):1263–1285, 2015.

[Fishburn, 2015] Peter C Fishburn. The Theory of Social Choice. Princeton University Press, 2015.

[Garg et al., 2010] N. Garg, T. Kavitha, A. Kumar, K. Mehlhorn, and J. Mestre. Assigning papers to referees. Algorithmica, 58(1):119–136, 2010.

[Gibbard, 1973] Allan Gibbard. Manipulation of voting schemes: A general result. Econometrica: Journal of the Econometric Society, pages 587–601, 1973.

[Hajek et al., 2014] Bruce Hajek, Sewoong Oh, and Jiaming Xu. Minimax-optimal inference from partial rankings. In Advances in Neural Information Processing Systems, pages 1475–1483, 2014.

[Hartvigsen et al., 1999] David Hartvigsen, Jerry C. Wei, and Richard Czuchlewski. The conference paper-reviewer assignment problem. Decision Sciences, 1999.

[Hazelrigg, 2013] GA Hazelrigg. Dear Colleague Letter: Information to principal investigators (PIs) planning to submit proposals to the Sensors and Sensing Systems (SSS) program October 1, 2013, deadline. 2013.

[Holzman and Moulin, 2013] Ron Holzman and Hervé Moulin. Impartial nominations for a prize. Econometrica, 81(1):173–196, 2013.

[Kahng et al., 2018] Anson Kahng, Yasmine Kotturi, Chinmay Kulkarni, David Kurokawa, and Ariel D Procaccia. Ranking wily people who rank each other. In AAAI, 2018.

[Kurokawa et al., 2015] David Kurokawa, Omer Lev, Jamie Morgenstern, and Ariel D Procaccia. Impartial peer review. In IJCAI, pages 582–588, 2015.

[Merrifield and Saari, 2009] Michael R Merrifield and Donald G Saari. Telescope time without tears: A distributed approach to peer review. Astronomy & Geophysics, 2009.

[Piech et al., 2013] Chris Piech, Jonathan Huang, Zhenghao Chen, Chuong Do, Andrew Ng, and Daphne Koller. Tuned models of peer assessment in MOOCs. arXiv preprint arXiv:1307.2579, 2013.

[Satterthwaite, 1975] Mark Allen Satterthwaite. Strategy-proofness and Arrow's conditions: Existence and correspondence theorems for voting procedures and social welfare functions. Journal of Economic Theory, 1975.

[Shah et al., 2013] Nihar B Shah, Joseph K Bradley, Abhay Parekh, Martin Wainwright, and Kannan Ramchandran. A case for ordinal peer-evaluation in MOOCs. In NIPS Workshop on Data Driven Education, 2013.

[Shah et al., 2016] Nihar B Shah, Sivaraman Balakrishnan, Joseph Bradley, Abhay Parekh, Kannan Ramchandran, and Martin J Wainwright.
Estimation from pairwise comparisons: Sharp minimax bounds with topology dependence. The Journal of Machine Learning Research, 2016.

[Shah et al., 2017] Nihar B Shah, Behzad Tabibian, Krikamol Muandet, Isabelle Guyon, and Ulrike von Luxburg. Design and analysis of the NIPS 2016 review process. arXiv preprint arXiv:1708.09794, 2017.

[Stelmakh et al., 2018] Ivan Stelmakh, Nihar B Shah, and Aarti Singh. PeerReview4All: Fair and accurate reviewer assignment in peer review. 2018.

[Stewart et al., 2005] Neil Stewart, Gordon DA Brown, and Nick Chater. Absolute identification by relative judgment. Psychological Review, 112(4):881, 2005.

[Thurner and Hanel, 2011] Stefan Thurner and Rudolf Hanel. Peer-review in a world with rational scientists: Toward selection of the average. The European Physical Journal B, 84(4):707–711, 2011.

[Tsukida and Gupta, 2011] Kristi Tsukida and Maya R Gupta. How to analyze paired comparison data. Technical report, DTIC Document, 2011.

[Xu et al., 2018] Yichong Xu, Han Zhao, Xiaofei Shi, and Nihar B Shah. On strategyproof conference peer review. arXiv preprint arXiv:1806.06266, 2018.