# combating_collusion_rings_is_hard_but_possible__80e5e16e.pdf

Combating Collusion Rings is Hard but Possible

Niclas Boehmer1, Robert Bredereck2, Andr e Nichterlein1

1 Technische Universit at Berlin, Faculty IV, Algorithmics and Computational Complexity, Berlin, Germany 2 Humboldt-Universit at zu Berlin, Institut f ur Informatik, Algorithm Engineering, Berlin, Germany niclas.boehmer@tu-berlin.de, robert.bredereck@hu-berlin.de, andre.nichterlein@tu-berlin.de

A recent report of Littmann published in the Communications of the ACM outlines the existence and the fatal impact of collusion rings in academic peer reviewing. We introduce and analyze the problem CYCLE-FREE REVIEWING that aims at ﬁnding a review assignment without the following kind of collusion ring: A sequence of reviewers each reviewing a paper authored by the next reviewer in the sequence (with the last reviewer reviewing a paper of the ﬁrst), thus creating a review cycle where each reviewer gives favorable reviews. As a result, all papers in that cycle have a high chance of acceptance independent of their respective scientiﬁc merit. We observe that review assignments computed using a standard Linear Programming approach typically admit many short review cycles. On the negative side, we show that CYCLE-FREE REVIEWING is NP-hard in various restricted cases (i.e., when every author is qualiﬁed to review all papers and one wants to prevent that authors review each other s or their own papers or when every author has only one paper and is only qualiﬁed to review few papers). On the positive side, among others, we show that, in some realistic settings, an assignment without any review cycles of small length always exists. This result also gives rise to an efﬁcient heuristic for computing (weighted) cycle-free review assignments, which we show to be of excellent quality in practice.

1 Introduction

As recently pointed out by Littman (2021), the integrity and legitimacy of scientiﬁc conference publications (particularly important in the context of computer science) is threatened by so-called collusion rings , which are sets of authors that unethically review and support each other while breaking anonymity and hiding conﬂicts of interest. Despite the fact that details are usually not disclosed for various reasons, it is inevitable that the process of assigning papers to reviewers is the key point to engineer technical barriers against such incidents. Whereas assignments at very small venues could be performed manually, support by (semi-)automatic systems becomes necessary already for medium-size conferences. Today computational support for ﬁnding review assignments is well-established and has improved the quality of the reviewing and paper assignment process in many

Copyright 2022, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

ways (see the surveys of Shah (2021) and Price and Flach (2017) for details). Still there is huge potential for improving processes and further computational support is urgently requested (Price and Flach 2017; Shah 2021). When aiming to prevent collusion rings, one of the most basic properties one can request from a review assignment is that the assignment does not contain any review cycle of length z, that is, a sequence of z agents each reviewing a paper authored by the next agent in the sequence (with the last agent reviewing a paper authored by the ﬁrst). This property is of high practical relevance: For example, in the AAAI 21 review assignment the non-existence of review cycles of length at most z = 2 was a soft constraint (Leyton-Brown and Mausam 2021). Yet, there is a lack of systematic studies concerning the computation of such assignments. Motivated by this, we propose and analyze CYCLE-FREE REVIEWING, the problem of computing an assignment of papers to agents that is free of review cycles of length at most z, both from a theoretical and practical perspective.

Related Work. The literature is rich in the general context of peer reviewing (see, e. g., the works of Goldsmith and Sloan (2007); Taylor (2008); Garg et al. (2010); Long et al. (2013); Lian et al. (2018); Kobren, Saha, and Mc Callum (2019); Stelmakh, Shah, and Singh (2021) on computational aspects of ﬁnding a good review assignment, and the survey of Shah (2021)). Closest to our work are Barrot et al. (2020) and Guo et al. (2018). In the context of product reviewing, among others, Barrot et al. (2020) propose and analyze a restricted case which translates to our setting as follows: Given a set of single-author papers and a set of agents each writing a single paper and each having some conﬂicts of interest over papers, ﬁnd a review assignment of papers to agents, where each agent serves as a reviewer providing one review and each paper must receive one review. They show that in this setting ﬁnding an assignment without review cycles of length at most z corresponds to ﬁnding a 2-factor without cycles of length at most z, which is known to be NP-hard for z 5 but polynomial-time solvable for z 3 (Hell et al. 1988). Closer to our setting is that of Guo et al. (2018), who also consider the computation of cycle-free review assignments. They propose two simple heuristics and conduct experiments measuring the quality of their heuristics and the number of review cycles in a weight-

The Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)

maximizing solution on two instances, mostly focusing on the inﬂuence of the number of reviews per paper and per reviewer.

Outline and Contributions. Our contribution is threefold. First, in Section 3, we show the intractability of CYCLE-FREE REVIEWING in various restricted settings: We show NP-hardness even when just forbidding review cycles of length at most two in sparse and dense settings (e.g., if each reviewer can review only few or can review almost all papers, see Theorems 1 to 3). Furthermore, solving a question left open by Barrot et al. (2020), we show NP-hardness if each agent writes just one singleauthor paper and can review only few papers (Theorem 4). Second, in Section 4, we develop greedy heuristics. In contrast to Guo et al. (2018) we provide a theoretical analysis for the heuristics. In particular, we prove that, if the considered instance satisﬁes certain near-realistic conditions (such as that each paper has few authors and that for each paper there are many possible reviewers), then these heuristics are guaranteed to output a z-cycle-free review assignments in polynomial time. Third, in Section 5, we present and discuss the results of our experiments. Our core results are: 1. Existing linear-programming-based methods for computing maximum-weight review assignments (as often used in practice) produce assignments where a high fraction (20% or more) of agents and papers belong to some review cycles of length two. 2. For z {2, 3, 4} maximum-weight z-cycle-free assignments computed by one of our heuristics (see Section 4) or computed via Integer Linear Programming are almost as good as the maximum-weight review assignments with cycles (solution quality loss less than 4% resp. 1%). 3. Somewhat surprisingly, we show that adding additional reviewers that are authors of some papers to the reviewer pool increases the number of papers that belong to review cycles in maximum-weight (non cycle-free) assignments.

2 Preliminaries For n N, we set [n] := {1, . . . , n}. In an instance of CYCLE-FREE REVIEWING, we are given a set P of papers and set A of agents, where each paper p P is authored by a subset aut(p) A of agents. Moreover, we are given for each agent a A a subset rev(a) P of papers the agent is qualiﬁed to review1. We capture this information in a bipartite graph (A P, EA EP ) with EA = {(a, p) | a A, p rev(a)} and EP = {(p, a) | p P, a aut(p)} (see also Table 1 for an overview). A (peer) review assignment E EA is a subset of edges from agents to papers, where we say that a reviews p in E if (a, p) E . Given a review assignment E EA, for an agent a A, let N +(a, E ) = {p P | (a, p) E } be the subset of papers agent a reviews in E and, for a paper p P, let N (p, E ) = {a A | (a, p) E } be the subset of agents that review

1Being qualiﬁed to review can encode that the agent is capable of reviewing the paper or that the agent does not have a conﬂict of interest with one of the co-authors or both.

Variable Explanation

V = A P vertex set consisting of agents A and papers P with n A = |A| and n P = |P| EA (a, p) EA A P shows a can review p EP (p, a) EP P A shows a authors p N (v, E) in-neighbors of v V wrt. E V 2 , i. e., N (v, E) := {u V | (u, v) E} N +(v, E) out-neighbors of v V wrt. E V 2 , i. e., N +(v, E) := {u V | (v, u) E} U, + U maximum inand out-degree in U resp., e. g., U := maxu U |N (u, EA EP )| δ U , δ+ U minimum inand out-degree in U resp., e. g., δ+ U := minu U |N +(u, EA EP )| A, δ A maximum resp. minimum number of papers per author + P , δ+ P maximum resp. minimum number of authors per paper + A, δ+ A maximum resp. minimum number of papers any author is qualiﬁed to review P , δ P maximum resp. minimum number of potential reviewers for any paper

Table 1: Notation overview

p in E . For c, d N a review assignment E EA is called c-d-valid if each agent reviews at most c papers and each paper is reviewed by d agents, that is, |N +(a, E )| c for all a A and |N (p, E )| = d for all p P. In a review assignment E EA, we say that papers p1, . . . , pz and agents a1, . . . , az form a review cycle (of length z) if ai is an author of pi (pi, ai) EP for all i [z], ai reviews pi+1 in E (ai, pi+1) E for i [z 1] and az reviews p1 in E (az, p1) E . Notably, a review cycle of length z in E corresponds to a directed cycle of length 2z in (A P, E EP ) and a review cycle of length one corresponds to an author reviewing one of its own papers. We say that a review assignment E is z-cycle free if there is no review cycle of length i [z] in E . Using this notation, we deﬁne our central problem and refer to Table 1 for further necessary variable deﬁnitions: [WEIGHTED] CYCLE-FREE REVIEWING Input: A directed bipartite graph (A P, EA EP ) and non-negative integers creviewer, dpaper, and z [and a weight function w : EA 7 Z and an integer W]. Question: Is there a creviewer-dpaper-valid and z-cyclefree review assignment E EA [of weight at least W, i.e., P e E w(e) W]?

We refer to P e E w(e) as the weight of the review assignment E . The proofs (or their completions) for results marked by ( ) can be found in the full version (Boehmer, Bredereck, and Nichterlein 2021).

3 NP-Hardness in Various Restricted Cases From the work of Barrot et al. (2020, Theorem 4.12) it follows that CYCLE-FREE REVIEWING is NP-hard in the

single-author-single-paper setting ( A = + P = 1) even if creviewer = dpaper = 1 and z = 2. However, as in reality instances of CYCLE-FREE REVIEWING are hardly arbitrary but have a quite strong structure, in this section we prove that the NP-hardness of CYCLE-FREE REVIEWING upholds even if the given instance fulﬁlls further quite restrictive conditions, e.g., each agent is qualiﬁed to review all papers or our problem speciﬁc parameters ( A, + P , + A, P , creviewer, dpaper, z) are small constants.

Sparse Review Graph and Small Weights. We start by considering the case where all our parameters are small. Speciﬁcally, we show the NP-hardness of CYCLE-FREE REVIEWING for arbitrarily z 2 even if each paper is only authored by at most two agents, each agent authors at most two papers, each agent is only qualiﬁed to review at most three papers, and for each paper only at most three agents are qualiﬁed to review it (see Table 1 for deﬁnitions).

Theorem 1 ( ). For any z 2, CYCLE-FREE REVIEWING is NP-hard, even if + A = P = 3, A = + P = 2, n A = n P , and creviewer = dpaper = 1. For any z 2, δ P 1 δ+ A, CYCLE-FREE REVIEWING is NP-hard, even if A = + P = 2, n A = n P , and creviewer = dpaper = 1. Both hardness results still hold if agents are not allowed to review papers of co-authors.

Proof sketch. To show the ﬁrst statement, we reduce from the NP-hard variant of 3-SAT where each literal appears in two clauses (Berman, Karpinski, and Scott 2003). We set creviewer = dpaper = 1 and z 2. Given a propositional formula ϕ, for each variable x, we introduce three agents ax, a x, bx and three papers px, p x, and qx. Agents ax and bx are qualiﬁed to review px, agents a x and bx are qualiﬁed to review p x and agents ax and a x are qualiﬁed to review qx. Intuitively, either does ax review px (which corresponds to setting x to false) or a x review p x (which corresponds to setting x to true). For each clause c = ℓ1 ℓ2 ℓ3, we introduce an agent ac, two dummy agents, and three papers p1 c, p2 c, and p3 c. Agent ac and both dummy agents are qualiﬁed to review all three papers. Notably, for one i [3], ac needs to review pi c (which corresponds to c being fulﬁlled because of ℓi). Concerning the authors of each paper, for each clause c = ℓ1 ℓ2 ℓ3 and i [3], ac is an author of pℓi and aℓi is an author of pi c. Note that letting ac review pi c for some c C and i [3] results in a 2-cycle if and only if aℓi reviews pℓi, which corresponds to setting ℓi to false. Thus, the assignments of variables encoded in a 1-1-valid 2-cycle-free review assignment needs to satisfy ϕ, as for each c C, ac reviews pi c for some i [3]. This results in a cycle except if a ℓi reviews p ℓi which corresponds to setting ℓi to true. Conversely, it is possible to show that each satisfying assignment for ϕ results in a 1-1-valid cycle-free review assignment.

While we prove hardness for arbitrary δ+ A and δ P , in our construction (presented in our full version (Boehmer, Bredereck, and Nichterlein 2021)), there are always agents that are not qualiﬁed to review many papers (around 2

3) and always papers that cannot be reviewed by many agents

3). Thus, interpreting a qualiﬁcation as the absence of a conﬂict of interest, for our NP-hardness agents need to have many conﬂicts. In Section 4, we prove that this does not happen by accident, as if the number of conﬂicts per agent/paper (and A, + P , creviewer, and dpaper) are small , then CYCLE-FREE REVIEWING always admits a solution. In WEIGHTED CYCLE-FREE REVIEWING it is possible to encode the qualiﬁcations of agents into weights: If we modify the reduction from above and give an agent-reviewer pair weight one if the agent is qualiﬁed to review the paper and weight zero otherwise, we get that WEIGHTED CYCLEFREE REVIEWING is NP-hard even if each agent is qualiﬁed to review all papers and we have few non-zero weights.

Corollary 1. For any z 2, WEIGHTED CYCLE-FREE PEER REVIEWING is NP-hard, even if each agent is qualiﬁed to review all papers, each agent gives only at most three papers a non-zero weight, for each paper at most three agents give it a non-zero weight, A = + P = 2, n A = n P , and creviewer = dpaper = 1.

No Conﬂicts of Interest. We now extend the hardness from Corollary 1 for the case where each agent is qualiﬁed to review all papers (no conﬂicts) to the unweighted case. However, our new reduction relies on the existence of papers with many authors and agents authoring many papers.

Theorem 2 ( ). CYCLE-FREE REVIEWING is NP-hard even if each agent is qualiﬁed to review all papers, n A = n P , creviewer = dpaper = 1, and z = 2.

The reduction from Theorem 2 heavily relies on the possibility that an agent reviews a paper written by an agent with whom she has a joint paper. As some conferences might declare an automatic conﬂict of interest for co-authors, we now consider the case where an agent is qualiﬁed to review all papers that are not authored by one of her co-authors:

Theorem 3 ( ). CYCLE-FREE REVIEWING is NP-hard even if each agent is qualiﬁed to review all papers that are not written by one of her co-authors, creviewer = dpaper = 1, and z = 2.

Single-Author-Single-Paper Setting. In their theoretical analysis, Barrot et al. (2020) focus on CYCLE-FREE REVIEWING where each agent writes a single-author paper (we speak of an agent and its paper interchangeably) and qualiﬁcations are symmetric, i.e., if an agent a is qualiﬁed to review agent b, then b is qualiﬁed to review a. They prove that this problem is NP-hard for creviewer = dpaper = 1 and z = 5 (without bounds on + A or P ) but polynomial-time solvable for arbitrary creviewer = dpaper for z = 2. We close the gap between these two results and extend their general picture by proving that for creviewer = dpaper = 2, CYCLE-FREE REVIEWING is NP-hard for z = 3 even if qualiﬁcations are symmetric and each agent is only qualiﬁed to review four agents, i.e., we need to decide for each agent a which two of these four agents review a and which two of these agents will get a review from a.

Theorem 4 ( ). CYCLE-FREE REVIEWING is NP-hard, even if z = 3, creviewer = dpaper = 2, A = + P = 1,

n A = n P , each agent is qualiﬁed to review exactly four papers and if an agent a can review the paper written by agent b, then b can review the paper of a.

4 Polynomial-Time Solvable Special Cases In this section, we identify conditions under which a shortcycle-free review assignment provably exists and can be computed in polynomial time. As we will see in our experiments, the subsequently presented algorithms provide short-cycle-free review assignments even beyond the theoretical limitations we discuss below. As we are interested in computing z-cycle-free review assignments for z 1, no author is allowed to review one of its own papers. That is why throughout this section we assume that we do not have (a, p) EA and (p, a) EP at the same time. Our algorithms in this section are based on the following simple observation: Given a partial z-cycle-free review assignment E and a paper p P that requires more assigned reviewers, the number of potential reviewers that would create a z-cycle if assigned to review p is bounded by a function in z, the maximum number + P of authors per paper, and the maximum number creviewer of reviews per agent; the precise function is given in the subsequent proofs. Thus, assuming that the minimum number δ P of potential reviewers for each paper is large compared to z, + P , dpaper, and creviewer, for each paper p there are always reviewers that can be assigned to review p without creating a z-cycle. Note that in practice we can expect that z, dpaper, and creviewer are quite small. Moreover, while the minimum number of ﬁtting reviewers might be not very large, it is not uncommon to assign papers to reviewers that are not perfect . Thus, interpreting δ P as the number of community members that do not have a conﬂict of interest actually yields relative large values for δ P in practice. We start with a very restrictive setting and then, step by step, generalize the approach and the results. First, each paper is written by exactly one author, each agent has at most one paper and we want a completely cycle-free review assignment (i. e., z-cycle-free for every z N). This of course implies that some agents cannot be authors of papers and so the number n P of papers is smaller than the number n A of agents. However, it allows Algorithm 1 to work (implicitly) with the topological ordering of the (acyclic) review assignment while constructing it.

Proposition 1. If A 1 = δ+ P = + P , dpaper creviewer, and δ P n P + dpaper, then Algorithm 1 computes a dpaperdpaper-valid and completely cycle-free review assignment in linear time.

Proof. We ﬁrst show the correctness of Algorithm 1. Clearly, if in each iteration of the loop in Line 3 the set of eligible reviewers R (see Line 6) is of size at least dpaper, then a completely cycle-free review assignment is created as each agent only reviews papers from agents occurring later during the algorithm. Observe that if |Si| n A δ P + dpaper for i {0, . . . , n P 1}, then in iteration i we have |R| dpaper: There are at most n A δ P agents in Si that cannot review p (the corresponding edge is not in EA) and, thus, at

Algorithm 1: A greedy algorithm computing a dpaperdpaper-valid completely cycle-free assignment E .

1 E ; S0 agents without papers /* φi(a) is the free reviewing capacity of a before iteration i of the for-loop from Line 3; each agent reviews at most dpaper papers */

2 foreach a A do φ0(a) dpaper /* Assign reviewers to one paper per iteration: */

3 for i 0 to n P 1 do

4 foreach a A do φi+1(a) φi(a)

5 select some (p, a) EP where p has no reviews yet /* collect qualified reviewers and assign dpaper of them to p */

6 R {b Si | (b, p) EA}

7 for j 1 to dpaper do

8 arbitrary b R reviews p: E E {(b, p)}

9 φi+1(b) φi(b) 1; R R \ {b} /* collect possible reviewers for next paper */

10 Si+1 := {a} {b Si | φi+1(b) > 0}

11 return E

least dpaper agents in Si are eligible to review p. It remains to show that |Si| n A δ P +dpaper for all i {0, . . . , n P 1} follows from our assumptions. By assumption of the lemma we have n P δ P dpaper. Hence, |S0| = n A n P n A δ P + dpaper. We next show that |Si| |S0| for all i [n P 1]. Observe that at the start we have P a S0 φ0(a) = |S0| dpaper. Moreover, after the ith iteration of the loop in Line 3 we have P a Si+1 φi+1(a) = P a Si φi(a) as each paper gets dpaper reviews and the reviewer a in Si+1 \ Si starts with φi+1(a) = dpaper. Observe that φi(a) dpaper for all a A and i {0, . . . , n P 1}. Thus, we have |Si|dpaper P a Si φi(a) = P a S0 φ0(a) = |S0|dpaper and, hence, |Si| |S0|. This completes the correctness proof. As to the running time, everything outside the loop starting in Line 3 clearly runs in linear time. As to the part inside the loop, note that by keeping just one array of length n A we can store the values of φ in linear time. Moreover, the reviewers for p are selected arbitrarily from R, which is doable in |N (p, EA)| time. Hence, the loop in Line 3 can be processed in O(n P + |EA|) time. Thus, the overall algorithm runs in O(n A + n P + |EA|), that is, linear time.

For our next result we replace the completely cycle-free property of the resulting review assignment with z-cycle freeness. This implies that the idea of constructing the review assignment along its topological ordering (as done by Algorithm 1) cannot be employed. Instead, Algorithm 2 constructs greedily a maximal z-cycle-free assignment and then extends the assignment by replacing one review assignment by two other assignments. The argument behind the replace-

Algorithm 2: Greedy algorithm to compute a creviewerdpaper-valid z-cycle-free review assignment E .

2 while p P : |N (p, E )| < dpaper do

3 if (a, p) EA \ E : E {(a, p)} is z-cycle free and |N +(a, E )| < creviewer then

/* Case 1: greedy assignment of reviews as long as no z-cycles are created: */

4 E E {(a, p)}

/* Case 2: replace one review assignment by two: */

6 pick (a , p ) E and a A so that |N +(a, E )| < creviewer, (a , p), (a, p ) EA, and (E \ {(a , p )}) {(a , p), (a, p )} is z-cycle free

7 E (E \ {(a , p )}) {(a , p), (a, p )}

ment strategy is an extension of the argument in Algorithm 1 that there are always enough reviewers to assign in Lines 7 to 9. To keep our arguments simple we ﬁrst consider the case that each agent reviews at most one paper and each paper requires one review. Moreover, as before, we are in the setting that each paper has one author and each agent authors at most one paper. Formally, we have the following.

Proposition 2. If A 1 = δ+ P = + P = creviewer = dpaper, n A n P , δ+ A > z, δ P > z, and n P δ+ A + δ P 2z, then Algorithm 2 computes a creviewer-dpaper-valid z-cyclefree review assignment in polynomial time.

Proof. Obviously, Algorithm 2 terminates after at most n P iterations of the while loop as in each iteration the number of assigned reviews increases. Moreover, a creviewer-dpaper-valid z-cycle-free review assignment is returned if a, a , p as described in case 2 (Line 6) always exist. To prove their existence, we introduce some notation. For some v A P let N + z (v, E EP ) be the z-out-neighborhood of v, that is, the set of vertices that can be reached from v in the review graph (A P, E EP ) via a path of length at least one and at most 2z. Similarly, let N z (v, E EP ) be the 2z-in-neighborhood of v, that is, the set of vertices that can reach v in the review graph (A P, E EP ) via a path of length at least one and at most 2z. Note that if v N z (v, E EP ), then also v N + z (v, E EP ) and v is contained in a review cycle of length z (that is a directed cycle of length 2z in (A P, E EP )). Subsequently, we present upper bounds on the size of N z (v, E EP ) and N + z (v, E EP ) for v A P thereby proving the existence of a, a , p . Let p P be the paper without reviewer selected in Line 2 when the algorithm enters case 2. Let Ap A be

the set of agents that could review p without creating a zcycle, that is, Ap := {a A | (a, p) EA E {(a, p)} is z-cycle free}. Since dpaper = creviewer = + P = 1, there are at most z agents whose assignment to review p would create a review cycle, that is, |N + z (p, E EP ) A| z, and thus |Ap| δ P z. Since we are in case 2, no more review assignments could be added without creating a zcycle. Hence, the algorithm assigned the at least δ P z potential reviewers in Ap to different papers. Let Pp be the set of these papers. Since dpaper = creviewer = 1 we have |Pp| = |Ap| δ P z. Let a A be an arbitrary agent without assigned review, that is, p : (a, p ) E . Since dpaper = creviewer = A = 1, we have |N z (a, E EP ) P| z. Thus, there are δ+ A z papers that a could review without creating a z-cycle; let Pa denote the set of these papers. Since we assume that n P δ+ A + δ P 2z, it follows that there is a p Pa Pp. By deﬁnition of Pp there is an agent a with (a , p ) E and a Ap. Thus, a, a , p exist and E can be updated to (E \ {(a , p )}) {(a , p), (a, p )} in Line 7.

We now turn our attention to our general case where agents can author and review many papers and papers can have multiple authors and can require several reviews. While the conditions that guarantee the existence of a z-cycle-free review assignment need adjustments, we can still use Algorithm 2 together with a correctness proof that follows a similar pattern as the proof of Proposition 2.

Theorem 5 ( ). If, n A creviewer n P dpaper, δ+ A > 2( A dpaper)z+creviewer, δ P > 2( + P creviewer)z+dpaper, and n P δ+ A 2( A dpaper)z creviewer+(creviewer/dpaper)(δ P 2( + P creviewer)z dpaper), then Algorithm 2 computes a creviewerdpaper-valid z-cycle-free review assignment in polynomial time.

To simplify the statement of Theorem 5 consider a symmetric case where n A n P , δ P = δ+ A, and + P = A. For brevity, set n := n P , δ := δ P , and := + P . Let coi be the maximum number of papers any agent is not qualiﬁed to review/has a conﬂict of interest with, that is, coi = n δ. Setting creviewer = 6 and dpaper = 3 as in our experiments we get:

Corollary 2. If n 6 1.5 coi + z(6z 2+3z), then there always exists a 6-3-valid z-cycle-free review assignment that can be found in polynomial time.

Considering that AAAI 22 had 9,251 submissions and that there was a submission limit of 10 papers per author and assuming that each paper has at most ten authors (implying that = 10) and that each author has at most 700 conﬂict of interests, it follows that there is a 6-3-valid 2-cycle-free review assignment computable with Algorithm 2. As we see in the experiments in the next section, our algorithm returns 2/3/4-cycle-free review assignments even well beyond the theoretical guarantees given above. We also remark that Algorithm 2 allows for an easy extension to the weighted case which we use in our experiments in the next section. To this end, in the ﬁrst case (Line 4) we do not pick an arbitrary edge (a, p) but a eligible edge of maximum weight to be added to the assignment E .

5 Experiments

In this section, we compare the weight of review assignments computed using different methods and analyze the occurrences of review cycles.2 For this, we use a dataset from the 2018 International Conference on Learning Representations (ICLR 18) prepared by Xu et al. (2019). Xu et al. (2019) collected all 911 papers submitted to ICLR 18 and the identity of all 2428 authors. As reviewers identities are unknown, they considered all authors to be reviewers and computed for each author-paper pair a similarity score.3

From the dataset of Xu et al. (2019), we created multiple instances of WEIGHTED CYCLE-FREE REVIEWING as follows. Given a number n P of papers and a ratio r AP of the numbers of agents and papers, we sample a subset of n P of the 911 ICLR 18 papers and set this as our set of papers. Subsequently, we compute the set of all authors of one of these papers and sample a subset of r AP n P authors and set this as our set of agents. Notably, the created instances can be seen as particularly challenging when it comes to avoiding review cycles, as in reality also uncritical reviewers, i.e., reviewers that do not author any paper, exist. As done in other papers using the same dataset, we focus on the case with dpaper = 3 and creviewer = 6, i.e., every paper needs exactly three reviews and each agent can review at most six papers (Xu et al. 2019; Jecmen et al. 2020). We consider three different types of review assignments: As optimal we denote a maximum-weight creviewer-dpapervalid review assignment. Such an assignment can be computed using a simple Linear Program (LP) as, for instance, described by Taylor (2008). As optimal z-cycle free we denote a maximum-weight creviewer-dpaper-valid z-cycle-free review assignment. This solution can be computed by treating the LP of Taylor (2008) as an Integer Linear Program (ILP) and adding for each possible i-cycle for i [z] a separate constraint which imposes that at least one of the agent-paper pairs from the cycle is not assigned. We solved all (I)LPs using Gurobi Optimization, LLC (2021). Lastly, as heuristic z-cycle free , we denote a creviewer-dpaper-valid z-cycle-free review assignment computed by the weighted variant of Algorithm 2 as described at the end of Section 4.4 In all experiments conducted in this section, the heuristic always returned a solution despite the fact that most of the time we are beyond the setting in which Theorem 5 guarantees this behavior of the heuristic. In experiment I presented in the following subsection, for z = 2/3/4, an unoptimized implementation of our heuristic was always able to ﬁnd a z-cycle-free review assignment in less than 30 seconds, being on average around 2 times faster than the optimal LP, on average around 3.7 times faster than the optimal 2-cycle free ILP, and on average more than 100 times faster than the optimal 3-cycle free ILP.

2The code for our experiments is available at github.com/nboehmer/Combating-Collusion-Rings-is-Hard-but-Possible. 3To the best of our knowledge, in all other publicly available datasets, there are similarity scores for reviewer-paper pairs but the link between the identities of authors and reviewers is missing. 4We could not use the heuristics of Guo et al. (2018) as these are not available and their algorithm details are ambiguous.

200 400 600 800 0.96 0.965 0.97 0.975 0.98 0.985 0.99 0.995

number of papers

normalized weight

2-cycle free heuristic cycle free 3-cycle free optimal cycle free 4-cycle free

Figure 1: For different values of z, weight of an optimal/heuristic z-cycle-free assignment divided by the weight of an optimal assignment.

5.1 Experiment I In this experiment, we focus on the case where the total number of needed reviews is the same as the total number of reviews that can be written, which is in some sense the most challenging but probably also one of the more realistic scenarios. Speciﬁcally, for n P {150, 175, . . . , 900}, we prepared 100 instances with r AP = 0.5 as described above and computed for each of these instances the optimal, heuristic 2/3/4-cycle-free, and optimal 2-cycle-free review assignment. Moreover, for all instances with n P 225, we also computed the optimal 3-cycle-free review assignment (for larger instances the ILP solver run out of memory.) To measure the price of z-cycle freeness , in Figure 1, we display the weights of different cycle-free review assignments divided by the weight of an optimal review assignment. What stands out here is that by forbidding 2-cycles the assignment s weight is, on average, only reduced by at most 0.8% (if the optimal 2-cycle-free assignment is used). Turning to the results produced by our heuristic, the quality decrease for 2/3/4-cycle-free assignments lies, on average, around 3.1%, 3.2%, and 3.3%. The weight of assignments computed using our heuristic is thus clearly worse than the weight of the optimal cycle-free assignment, yet still not far away from the the weight of an optimal assignment. What is particularly surprising here is that for both our heuristic and the optimal cycle-free assignment, whether 2, 3 or 4 cycles are forbidden seems to be rather irrelevant for the quality decrease. All in all, it is encouraging that 2/3/4-cycle freeness can be realized at a low cost independent of whether our heuristic or an ILP is used. The necessity of dealing with review cycles is underlined by Figure 2. Here, we show the fraction of agents that are contained in at least one review cycle of some length in an optimal assignment and in a heuristic 2/3-cycle-free assignment. Overall, as the number of papers increases the fraction of agents contained in review cycles constantly decreases, yet for all considered values of n P the results are worrisome. In the optimal assignment for 150 papers, the fraction of agents contained in a review cycle of length at most 2/3/4 is, on average, 40%/58%/76% , while even for 900 papers, still 32%/41%/55% of agents are contained in a review cycle. Considering heuristic z-cycle-free assignments, the fraction

200 400 600 800 0.2 0.3 0.4 0.5 0.6 0.7 0.8

number of papers

fraction of agents in review cycle

2-cycles optimal 2/3-cycles heuristic 2-cycle free 2/3/4-cycles heuristic 3-cycle free

Figure 2: Fraction of agents that are part of a review cycle of at most some length for different types of assignment.

of agents contained in a cycle of length z +1 is considerably lower than for the optimal solution but still non-negligible (the results for optimal 2/3-cycle-free assignments are similar to the displayed results for our heuristic). We also computed the fraction of papers that are contained in at least one review cycle. The results are as in Figure 2 with all values roughly halved, e.g, even in the optimal assignment for 900 papers, 15%/20%/27% of papers are contained in a review cycle of length at most 2/3/4. An intuitive explanation for this difference between agents and papers is that the number of papers is twice the number of agents and that there exist some papers without reviewing authors. Overall, it is striking that even for a high number of papers, in an optimal assignment around 15% of papers could have a considerably higher chance of getting accepted if two agents coordinate to give each others paper better reviews and 32% of reviewers would have an opportunity to participate in such a collusion.

5.2 Experiment II In this experiment, we analyze how the results from experiment I depend on the assumption that the supply and demand of reviews exactly matches. In particular, as describe before, for r AP {0.5, 0.6, . . . , 1.9, 2} we prepared 100 instances with 200 papers and r AP 200 agents (we also repeated this experiment for 400 and 600 papers producing similar results) and computed the different types of review assignments. Considering the assignment weights, increasing r AP from 0.5 to 2, the normalized weight of an optimal 2/3cycle-free assignment decreases by 0.005 to 0.987/0.985, while the normalized weight of a heuristic 2/3/4-cycle-free assignment increases by 0.01 to 0.982/0.979/0.976: our heuristic performs particularly well if there are (considerably) more reviews available then needed; this supports our theoretical statements for our heuristic in Section 4. Turning to the possible impact of review cycles, we visualize the fraction of agents/papers contained in a review cycle in an optimal assignment in Figure 3.5 While the fraction of agents contained in a review cycle constantly and signiﬁcantly decreases if more and more agents are added, the

5For readability, we do not display the values for the optimal/heuristic cycle-free assignment, as their relationship to the optimal assignment is again similar as in Figure 2.

.6 .8 1 1.21.41.61.8 2

number of agents/number of papers

fraction contained in review cylce

2-cycles fraction agents 2/3-cycles fraction papers 2/3/4-cycles

Figure 3: Fraction of agents/papers that are part of a review cycle of at most some length in an optimal assignment for 200 papers and between 100 and 400 agents.

fraction of papers contained in a cycle constantly increases. The former observation is quite intuitive, as when more and more agents are added, the average review load decreases and even if the number of review cycles remains the same, it is likely that the fraction of agents contained in one gets smaller. The latter observation is less intuitive but probably a consequence of the fact that, starting with r AP = 0.5, for some papers none of the authors is part of the agent set, implying that these papers cannot be part of a review cycle; however, if we start to add more and more agents, more and more papers can potentially be part of a review cycle. Overall, it might be quite counter intuitive that adding more and more reviewers (that are also authors) to the reviewer pool does not decrease the number of papers contained in a review cycle but increases them.

6 Conclusion Our work provides a ﬁrst systematic analysis of CYCLEFREE REVIEWING. On the theoretical side, we show that CYCLE-FREE REVIEWING is a computationally hard problem even in very restricted settings, yet practically relevant polynomial-time solvable special cases exist. In our practical analysis, we could show that in assignments that do not care for review cycles a high fraction of authors and papers will likely be part of a short review cycle. While collusion rings can certainly also emerge without the existence of review cycles, for example, when authors coordinate over multiple conferences (Littman 2021; Shah 2021), allowing so many easy opportunities means to leave a huge door unlocked without good reason: Our heuristic signiﬁcantly improves the situation, since it seems to always ﬁnd a cyclefree review assignment at a very low quality loss. For future work, it would be valuable to further investigate the limits of our heuristic. While our current bounds are certainly not tight, there are also clear limitations for possible extensions imposed by our NP-hardness results in quite restrictive settings from Section 3. However, a concrete and practically very relevant open question is whether the minimum degree in our analysis can be replaced by the average degree; this would make the results much more robust against outliers. Finally, due to the lack of data, we tested our model on just one dataset. Obtaining more data to test our and other models on would be extremely valuable.

Acknowledgments

NB was supported by the DFG project Ma Mu (NI 369/19) and by the DFG project Com Soc-MPMS (NI 369/22). RB was partially supported by the DFG project AFFA (BR 5207/1 and NI 369/15). This work was started at the research retreat of the TU Berlin Algorithmics and Computational Complexity group held in September 2020.

References Barrot, N.; Lemeilleur, S.; Paget, N.; and Safﬁdine, A. 2020. Peer Reviewing in Participatory Guarantee Systems: Modelisation and Algorithmic Aspects. In Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 20), 114 122. IFAAMAS. Berman, P.; Karpinski, M.; and Scott, A. D. 2003. Approximation Hardness of Short Symmetric Instances of MAX3SAT. ECCC, (49). Boehmer, N.; Bredereck, R.; and Nichterlein, A. 2021. Combating Collusion Rings is Hard but Possible. Co RR, abs/2112.08444. Garg, N.; Kavitha, T.; Kumar, A.; Mehlhorn, K.; and Mestre, J. 2010. Assigning Papers to Referees. Algorithmica, 58(1): 119 136. Goldsmith, J.; and Sloan, R. H. 2007. The AI Conference Paper Assignment Problem. In Proceedings of the 22nd AAAI Conference Workshop on Preference Handling for Artiﬁcial Intelligence (MPREF 07), 53 57. AAAI Press. Guo, L.; Wu, J.; Chang, W.; Wu, J.; and Li, J. 2018. K-Loop Free Assignment in Conference Review Systems. In Proceedings of the 2018 International Conference on Computing, Networking and Communications (ICNC 18), 542 547. IEEE Computer Society. Gurobi Optimization, LLC. 2021. Gurobi Optimizer Reference Manual. https://www.gurobi.com. Hell, P.; Kirkpatrick, D. G.; Kratochv ıl, J.; and Kr ız, I. 1988. On Restricted Two-Factors. SIAM J. Discret. Math., 1(4): 472 484. Jecmen, S.; Zhang, H.; Liu, R.; Shah, N. B.; Conitzer, V.; and Fang, F. 2020. Mitigating Manipulation in Peer Review via Randomized Reviewer Assignments. In Advances in Neural Information Processing Systems 33 (Neur IPS 20). Kobren, A.; Saha, B.; and Mc Callum, A. 2019. Paper Matching with Local Fairness Constraints. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 19), 1247 1257. ACM. Leyton-Brown, K.; and Mausam. 2021. AAAI 2021 - introduction. https://slideslive.com/38952457/aaai-2021introduction?ref=account-folder-79533-folders. Minute 8 onwards in the video. Lian, J. W.; Mattei, N.; Noble, R.; and Walsh, T. 2018. The Conference Paper Assignment Problem: Using Order Weighted Averages to Assign Indivisible Goods. In Proceedings of the Thirty-Second AAAI Conference on Artiﬁcial Intelligence, (AAAI 18), 1138 1145. AAAI Press.

Littman, M. L. 2021. Collusion rings threaten the integrity of computer science research. Commun. ACM, 64(6): 43 44. Long, C.; Wong, R. C.; Peng, Y.; and Ye, L. 2013. On Good and Fair Paper-Reviewer Assignment. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining (ICDM 13), 1145 1150. IEEE Computer Society. Price, S.; and Flach, P. A. 2017. Computational support for academic peer review: a perspective from artiﬁcial intelligence. Commun. ACM, 60(3): 70 79. Shah, N. B. 2021. Systemic Challenges and Solutions on Bias and Unfairness in Peer Review. http://www.cs.cmu. edu/ nihars/preprints/Survey Peer Review.pdf. Stelmakh, I.; Shah, N.; and Singh, A. 2021. Peer Review4All: Fair and Accurate Reviewer Assignment in Peer Review. J. Mach. Learn. Res., 22(163): 1 66. Taylor, C. J. 2008. On the optimal assignment of conference papers to reviewers. https://repository.upenn.edu/ cis reports/889. Xu, Y.; Zhao, H.; Shi, X.; and Shah, N. B. 2019. On Strategyproof Conference Peer Review. In Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI 19), 616 622. ijcai.org.