Combating Collusion Rings is Hard but Possible
Abstract
A recent report of Littman [Commun. ACM ’21] outlines the existence and the fatal impact of collusion rings in academic peer reviewing. We introduce and analyze the problem Cycle-Free Reviewing that aims at finding a review assignment without the following kind of collusion ring: a sequence of reviewers each reviewing a paper authored by the next reviewer in the sequence (with the last reviewer reviewing a paper of the first), thus creating a review cycle where each reviewer gives favorable reviews. As a result, all papers in that cycle have a high chance of acceptance independent of their respective scientific merit.
We observe that review assignments computed using a standard Linear Programming approach typically admit many short review cycles. On the negative side, we show that Cycle-Free Reviewing is NP-hard in various restricted cases (i.e., when every author is qualified to review all papers and one wants to prevent authors from reviewing each other’s or their own papers, or when every author has only one paper and is only qualified to review few papers). On the positive side, among others, we show that, in some realistic settings, an assignment without any review cycles of small length always exists. This result also gives rise to an efficient heuristic for computing (weighted) cycle-free review assignments, which we show to be of excellent quality in practice.
1 Introduction
As recently pointed out by Littman (2021), the integrity and legitimacy of scientific conference publications (particularly important in the context of computer science) are threatened by so-called “collusion rings”, which are sets of authors that unethically review and support each other while breaking anonymity and hiding conflicts of interest. Although details are usually not disclosed for various reasons, the process of assigning papers to reviewers is clearly the key point at which to engineer technical barriers against such incidents. Whereas assignments at very small venues could be performed manually, support by (semi-)automatic systems becomes necessary already for medium-size conferences. Today, computational support for finding review assignments is well-established and has improved the quality of the reviewing and paper assignment process in many ways (see the surveys of Shah (2021) and Price and Flach (2017) for details). Still, there is huge potential for improving processes, and further computational support is urgently requested (Price and Flach, 2017; Shah, 2021).
When aiming to prevent collusion rings, one of the most basic properties one can request from a review assignment is that the assignment does not contain any review cycle of length , that is, a sequence of agents each reviewing a paper authored by the next agent in the sequence (with the last agent reviewing a paper authored by the first). This property is of high practical relevance: For example, in the AAAI’21 review assignment the non-existence of review cycles of length at most was a soft constraint (Leyton-Brown and Mausam, 2021). Yet, there is a lack of systematic studies concerning the computation of such assignments. Motivated by this, we propose and analyze Cycle-Free Reviewing, the problem of computing an assignment of papers to agents that is free of review cycles of length at most , both from a theoretical and practical perspective.
1.1 Related Work
The literature is rich in the general context of peer reviewing (see, e. g., the works of Goldsmith and Sloan (2007); Taylor (2008); Garg et al. (2010); Long et al. (2013); Lian et al. (2018); Kobren et al. (2019); Stelmakh et al. (2021) on computational aspects of finding a “good” review assignment, and the survey of Shah (2021)). Closest to our work are Barrot et al. (2020) and Guo et al. (2018). In the context of product reviewing, among others, Barrot et al. (2020) propose and analyze a restricted case which translates to our setting as follows: Given a set of single-author papers and a set of agents each writing a single paper and each having some conflicts of interest over papers, find a review assignment of papers to agents, where each agent serves as a reviewer providing one review and each paper must receive one review. They show that in this setting finding an assignment without review cycles of length at most corresponds to finding a -factor without cycles of length at most , which is known to be NP-hard for but polynomial-time solvable for (Hell et al., 1988). Closer to our setting is that of Guo et al. (2018), who also consider the computation of cycle-free review assignments. They propose two simple heuristics and conduct experiments measuring the quality of their heuristics and the number of review cycles in a weight-maximizing solution on two instances, mostly focusing on the influence of the number of reviews per paper and per reviewer.
1.2 Outline and Contributions
Our contribution is threefold. First, in Section 3, we show the intractability of Cycle-Free Reviewing in various restricted settings: We show NP-hardness even when just forbidding review cycles of length at most two in “sparse” and “dense” settings (e.g., if each reviewer can review only “few” or can review “almost all” papers, see Theorems 1, 2 and 3). Furthermore, solving a question left open by Barrot et al. (2020), we show NP-hardness if each agent writes just one single-author paper and can review only few papers (Theorem 4).
Second, in Section 4, we develop greedy heuristics. In contrast to Guo et al. (2018), we provide a theoretical analysis for the heuristics. In particular, we prove that, if the considered instance satisfies certain near-realistic conditions (such as that each paper has few authors and that for each paper there are many possible reviewers), then these heuristics are guaranteed to output a -cycle-free review assignment in polynomial time.
Third, in Section 5, we present and discuss the results of our experiments. Our core results are:
1. Existing linear-programming-based methods for computing maximum-weight review assignments (as often used in practice) produce assignments where a high fraction (20% or more) of agents and papers belong to some review cycle of length two.
2. Maximum-weight -cycle-free assignments computed by one of our heuristics (see Section 4) or via Integer Linear Programming are almost as good as maximum-weight review assignments with cycles (solution quality loss of less than 4% and 1%, respectively).
3. Somewhat surprisingly, we show that adding reviewers that are authors of some papers to the reviewer pool increases the number of papers that belong to review cycles in maximum-weight (non-cycle-free) assignments.
2 Preliminaries
For , we set . In an instance of Cycle-Free Reviewing, we are given a set of papers and a set of agents, where each paper is authored by a subset of agents. Moreover, we are given for each agent a subset of papers the agent is qualified to review. (Being “qualified to review” can encode that the agent is capable of reviewing the paper, that the agent does not have a conflict of interest with one of the co-authors, or both.) We capture this information in a bipartite graph with and (see also Table 1 for an overview). A (peer) review assignment is a subset of edges from agents to papers, where we say that reviews in if . Given a review assignment , for an agent , let be the subset of papers agent reviews in and, for a paper , let be the subset of agents that review in . For a review assignment is called --valid if each agent reviews at most papers and each paper is reviewed by agents, that is, for all and for all . In a review assignment , we say that papers and agents form a review cycle (of length ) if is an author of () for all , reviews in () for and reviews in (). Notably, a review cycle of length in corresponds to a directed cycle of length in and a review cycle of length one corresponds to an author reviewing one of its own papers. We say that a review assignment is -cycle-free if there is no review cycle of length in .
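The definition of a review cycle can be made concrete with a small sketch. The following Python code is only an illustration under stated assumptions: the dictionary-based encoding (`authors`, `reviews`) and all function names are hypothetical and not part of the paper, and "cycle-free" is read as "no review cycle of length at most z", as in the introduction. It builds the directed relation "agent a reviews a paper authored by agent b" and searches for a shortest directed cycle in it; a shortest such cycle always uses pairwise distinct papers, so it is a review cycle in the sense defined above.

```python
from collections import deque


def reviewer_successors(authors, reviews):
    """Map each reviewing agent to the agents whose papers it reviews.

    authors: dict paper -> set of authoring agents (hypothetical encoding)
    reviews: dict agent -> set of papers the agent reviews
    """
    successors = {}
    for agent, papers in reviews.items():
        targets = successors.setdefault(agent, set())
        for paper in papers:
            targets.update(authors.get(paper, ()))
    return successors


def shortest_review_cycle(authors, reviews):
    """Return the length of a shortest review cycle, or None if none exists."""
    successors = reviewer_successors(authors, reviews)
    best = None
    for start in successors:
        # Breadth-first search from `start`; every edge leading back to
        # `start` closes a cycle of length distance[current] + 1.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in successors.get(current, ()):
                if nxt == start:
                    length = distance[current] + 1
                    if best is None or length < best:
                        best = length
                elif nxt not in distance:
                    distance[nxt] = distance[current] + 1
                    queue.append(nxt)
    return best


def is_cycle_free(authors, reviews, z):
    """True if the assignment has no review cycle of length at most z."""
    shortest = shortest_review_cycle(authors, reviews)
    return shortest is None or shortest > z
```

For instance, with three single-author papers where each author reviews the next author's paper, this sketch would report a shortest cycle of length three, so the assignment counts as cycle-free for z = 2 but not for z = 3.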
Variable | Explanation |
---|---|
vertex set consisting of agents and papers with and | |
shows can review | |
shows authors | |
in-neighbors of wrt. , i. e., | |
out-neighbors of wrt. , i. e., | |
, | maximum in- and out-degree in resp., e. g., |
, | minimum in- and out-degree in resp., e. g., |
, | maximum resp. minimum number of papers per author |
, | maximum resp. minimum number of authors per paper |
, | maximum resp. minimum number of papers any author is qualified to review |
, | maximum resp. minimum number of potential reviewers for any paper |
Using this notation, we define our central problem and refer to Table 1 for further necessary variable definitions:
[Weighted] Cycle-Free Reviewing
Input: A directed bipartite graph and non-negative integers , , and [and a weight function and an integer ].
Question: Is there a --valid and -cycle-free review assignment [of weight at least , i.e., ]?
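To illustrate the decision problem on toy inputs, the following sketch enumerates all candidate assignments by brute force. It is not one of the paper's algorithms: the names `max_per_agent`, `reviews_per_paper`, `qualified`, and `find_assignment` are hypothetical, and the sketch assumes that each paper must receive exactly `reviews_per_paper` reviews and that no review cycle of length at most z is allowed. Its running time is exponential, in line with the hardness results of Section 3.

```python
from itertools import combinations, product


def has_cycle_up_to(authors, assignment, z):
    """True if some agent reaches itself in at most z reviewer -> author steps."""
    successors = {}
    for agent, paper in assignment:
        successors.setdefault(agent, set()).update(authors.get(paper, ()))

    def returns_to(start, current, steps_left):
        if steps_left == 0:
            return False
        return any(
            nxt == start or returns_to(start, nxt, steps_left - 1)
            for nxt in successors.get(current, ())
        )

    return any(returns_to(agent, agent, z) for agent in successors)


def find_assignment(papers, authors, qualified, max_per_agent, reviews_per_paper, z):
    """Return a valid, cycle-free list of (agent, paper) pairs, or None.

    papers: list of papers
    authors: dict paper -> set of authoring agents
    qualified: dict paper -> set of agents qualified to review it
    """
    # For every paper, list all ways to pick exactly `reviews_per_paper` reviewers.
    options = [
        list(combinations(sorted(qualified[paper]), reviews_per_paper))
        for paper in papers
    ]
    for choice in product(*options):
        assignment = [(a, p) for p, reviewers in zip(papers, choice) for a in reviewers]
        load = {}
        for agent, _ in assignment:
            load[agent] = load.get(agent, 0) + 1
        if max(load.values(), default=0) > max_per_agent:
            continue  # some agent would review too many papers
        if not has_cycle_up_to(authors, assignment, z):
            return assignment
    return None
```

The enumeration picks reviewers paper by paper, so the per-paper constraint holds by construction and only the per-agent load and the cycle condition need to be checked afterwards.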
3 NP-Hardness in Various Restricted Cases
From the work of Barrot et al. (2020, Theorem 4.12) it follows that Cycle-Free Reviewing is NP-hard in the single-author-single-paper setting () even if and . However, as real-world instances of Cycle-Free Reviewing are hardly arbitrary but quite strongly structured, in this section we prove that the NP-hardness of Cycle-Free Reviewing persists even if the given instance fulfills further quite restrictive conditions, e.g., each agent is qualified to review all papers or our problem-specific parameters () are small constants.
3.1 Sparse Review Graph and Small Weights
We start by considering the case where all our parameters are small. Specifically, we show the NP-hardness of Cycle-Free Reviewing for arbitrarily even if each paper is only authored by at most two agents, each agent authors at most two papers, each agent is only qualified to review at most three papers, and for each paper only at most three agents are qualified to review it (see Table 1 for definitions).
Theorem 1.
For any , Cycle-Free Reviewing is NP-hard, even if , , , and . The hardness results still hold if agents are not allowed to review papers of co-authors.
Proof.
We reduce from an NP-hard variant of Satisfiability where each clause consists of exactly three literals and each variable occurs positive in at most two clauses and negative in at most two clauses (Berman et al., 2003).
Construction.
Given an instance of Satisfiability consisting of a set of variables and a set of clauses, we set and to some integer greater than one. We construct the set of agents and the set of papers as follows. For each variable , we introduce three agents , , and and three papers , , and ( has no author and can be considered as a dummy paper). Agents and are qualified to review , agents and are qualified to review and agents and are qualified to review . Intuitively, either does review (which corresponds to setting to false) or review (which corresponds to setting to true).
For each clause , we introduce three agents , , and and three papers , , and where is qualified to review for . Moreover, we introduce two dummy agents that are both qualified to review , , and and two dummy papers that , , and are all qualified to review. Notably, for one , needs to review (which corresponds to being fulfilled because of ).
Concerning the authors of each paper, for each clause and , is an author of and is an author of .
It is easy to see that each agent is only qualified to review at most three papers and that for each paper only at most three agents are qualified to review it. Moreover, as each literal only appears in at most two clauses, every paper has at most two authors and each agent authors at most two papers. Moreover, note that , implying that each agent has to review exactly one paper.
() Let be the set of variables that are set to true in a satisfying assignment of the given Satisfiability instance. Then, for , we assign to , to , and to , while for , we assign to , to , and to . For a clause , let with be a literal from that is set to true by the given assignment (such a literal exists because the given assignment is satisfying). Then, we set to review . The two dummy agents from this clause are assigned arbitrarily to for and the agents for are assigned arbitrarily to the two dummy papers. To show that the constructed assignment does not contain a review cycle (of arbitrary length) note that only papers that have an author are papers for some literal (which are authored by for some and where appears in as the th literal) and papers for some and (which are authored by ). Thus, every review cycle of length at least two needs to contain an agent for some and and , where appears in as the th literal, and reviews and reviews . For to review it needs to hold that the given assignment satisfies . However, by our construction of the review assignment, reviewing implies that is satisfied. Thus, no review cycle exists.
() Assume we are given a --valid -cycle-free review assignment. Let . We claim that the assignment which sets all variables in to true and all variables in to false satisfies the given formula. Assume for the sake of contradiction that there exists a clause which is not satisfied by . As the given assignment is --valid and we have the same number of agents and papers in the constructed instance, there is a such that reviews . Note that by the same reasoning, for each , either does review or review . Thus, if a literal is not satisfied by , then reviews . As is not satisfied by , reviews . Thus, and form a review cycle of length two, as reviews , which is authored by , and reviews , which is authored by , a contradiction. ∎
The above reduction crucially relies on the “sparsity” of the qualifications, i.e., that each agent is qualified to review between two and three papers and that for each paper only two or three agents are qualified to review it. Motivated by the observation that, in practice, reviewers are typically qualified to review more than just two or three papers and that for each paper there typically exist more than just two or three qualified reviewers, it is a natural question whether our above hardness result extends to this case. We answer this question in the affirmative by proving hardness for arbitrary and , i.e., for the case where each agent is qualified to review at least papers and for each paper there exist at least agents that are qualified to review it:
Proposition 1.
For any , , Cycle-Free Reviewing is NP-hard, even if , , and .
Proof.
Let . We reduce from the restricted NP-hard variant of Cycle-Free Reviewing considered in Theorem 1. Given an instance of Cycle-Free Reviewing with , we modify the instance by introducing two sets and of agents each and two sets and of papers each. All agents from are qualified to review all papers from and from . In addition to being qualified to review some papers from (as captured in ), all agents from are qualified to review all papers from . Moreover, all agents from are qualified to review all papers from . Thereby, all agents are qualified to review at least papers and for each paper at least agents are qualified to review it. Notably, we still have . Thus, as agents from are only qualified to review papers from and , all papers from need to be reviewed by agents from (which is always possible to do in without creating a review cycle as no paper from has an author). Similarly, as papers from can only be reviewed by agents from and , all agents from need to review papers from (which is always possible to do in without creating a review cycle as no paper from has an author). Thus, all agents from need to review papers from from which the correctness of the reduction directly follows.
Lastly, note that we did not modify the set of authors for any paper from and did not add papers with an author. Thus, it still holds in the modified instance that each agent authors at most two papers and each paper has at most two authors (). ∎
While we prove hardness for arbitrary and , in our construction from Proposition 1, there are always agents that are not qualified to review “many” papers (around ) and always papers that cannot be reviewed by “many” agents (around ). Thus, interpreting a qualification as the absence of a conflict of interest, our NP-hardness result requires agents to have many conflicts. In Section 4, we prove that this is no accident: if the number of conflicts per agent/paper (and , , , and ) is “small”, then Cycle-Free Reviewing always admits a solution.
In Weighted Cycle-Free Reviewing it is possible to encode the “qualifications” of agents into weights: If we modify the reduction from above and give an agent-paper pair weight one if the agent is qualified to review the paper and weight zero otherwise, we get that Weighted Cycle-Free Reviewing is NP-hard even if each agent is qualified to review all papers and we have few non-zero weights.
Corollary 1.
For any , Weighted Cycle-Free Reviewing is NP-hard, even if each agent is qualified to review all papers, each agent gives only at most three papers a non-zero weight, for each paper at most three agents give it a non-zero weight, , , and .
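The weight encoding behind Corollary 1 can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the choice of the weight threshold (the number of assigned pairs) is an assumption of this sketch, since the text does not spell it out. With 0/1 weights, an assignment reaches that threshold exactly when every assigned pair was qualified in the unweighted instance.

```python
def qualification_weights(agents, papers, qualified):
    """Weight one for originally qualified (agent, paper) pairs, zero otherwise.

    qualified: set of (agent, paper) pairs from the unweighted instance.
    In the weighted instance every agent may review every paper.
    """
    return {
        (agent, paper): 1 if (agent, paper) in qualified else 0
        for agent in agents
        for paper in papers
    }


def assignment_weight(weights, assignment):
    """Total weight of a collection of (agent, paper) pairs."""
    return sum(weights[pair] for pair in assignment)


def uses_only_qualified_pairs(weights, assignment):
    # Assumed threshold: the total weight equals the number of assigned pairs
    # if and only if every pair has weight one.
    return assignment_weight(weights, assignment) == len(assignment)
```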
3.2 No Conflicts of Interest
We now extend the hardness from Corollary 1 for the case where each agent is qualified to review all papers (no conflicts) to the unweighted case. However, our new reduction relies on the existence of papers with many authors and agents authoring many papers.
To show that Cycle-Free Reviewing is NP-hard even if each agent is qualified to review all papers, , , and (Theorem 2), we reduce from Multicolored Independent Set where we are given a graph with vertices partitioned into sets (to which we refer as color classes) and the question is whether there exists a subset of vertices, containing one vertex from each class, that are pairwise non-adjacent. We denote as the number of vertices in the first color class and assume without loss of generality that and that for (note that we can do so because we can always add vertices that are connected to all other vertices and put them into one of the color classes).
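As a reference point for the source problem of this reduction, the following brute-force sketch decides Multicolored Independent Set on small inputs. The input encoding (a list of color classes and a list of undirected edges) and the function name are assumptions made for illustration.

```python
from itertools import product


def multicolored_independent_set(color_classes, edges):
    """Return one vertex per color class, pairwise non-adjacent, or None.

    color_classes: list of lists of vertices (one list per color)
    edges: iterable of 2-tuples of vertices (undirected)
    """
    adjacent = {frozenset(edge) for edge in edges}
    # Try every way of picking exactly one vertex from each color class.
    for choice in product(*color_classes):
        if all(
            frozenset((u, v)) not in adjacent
            for i, u in enumerate(choice)
            for v in choice[i + 1:]
        ):
            return list(choice)
    return None
```

The search tries every combination of one vertex per class, so it is only meant to make the problem statement tangible, not to be efficient.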
Construction.
Given an instance of Multicolored Independent Set , we construct an instance of Cycle-Free Reviewing as follows. For each color , we add a special agent and a special paper . Moreover, for each vertex , we add a vertex agent and a vertex paper . Further, we add dummy agents and dummy papers . Lastly, we insert an agent and a paper .
The paper is authored by all vertex agents and dummy agents. For color , is authored by all vertex and dummy agents from colors and agent . Further, all dummy papers for are authored by the special agent . For a vertex , paper is authored by the special agent and all agents corresponding to vertices from or to vertices adjacent to in , i.e., is authored by agents . Each agent is qualified to review all papers and we set and .
Lemma 1.
If the given instance of Multicolored Independent Set is a YES-instance, then the constructed instance of Cycle-Free Reviewing is a YES-instance.
Proof.
Let be a independent set of size in the given Multicolored Independent Set instance with for . From this we construct a solution for the constructed Cycle-Free Reviewing instance as follows. Agent reviews paper . For , special agent reviews special paper . Vertex agents are assigned arbitrarily to dummy papers . Lastly, vertex agent reviews paper and the dummy agents from class are assigned arbitrarily to the remaining vertex papers from this class. Note that by construction, the described assignment is - valid. Moreover, it is easy to verify that no agent reviews a paper authored by it so it remains to check for reviewing cycles of length two. All special agents are only authors of papers from their color class but review papers authored solely by agents outside their color class. Thus there exist no review cycles involving special agents. All papers wrote are reviewed by special agents so cannot be part of a review cycle. Dummy agents only write papers that are reviewed by special agents and so no dummy agent can be part of a review cycle. Thus, every possible review cycle of length two needs to involve two vertex agents. As no dummy paper is written by a vertex agent, the only vertex agents that review papers authored by other vertex agents are those assigned to vertex papers, i.e., agents . Assume for the sake of contradiction that (which reviews paper ) forms a cycle with reviewer with . However, from this it follows by the definition of a review cycle that is an author of paper , which implies that contradicting that is an independent set. ∎
We now turn to proving the backwards direction of the reduction. To do this, we first identify several assignments that need to be made in all solutions to the constructed Cycle-Free Reviewing instance. We start by proving that needs to review .
Lemma 2.
In every - valid -cycle-free assignment in the constructed instance , reviews .
Proof.
Recall that all agents except all special agents and agent are authors of . So for the sake of contradiction let us assume that special agent for some reviews . However, to prevent a reviewing cycle, this implies that only the remaining special agents and can review papers written by . However, as is an author of all vertex papers corresponding to vertices from and we have assumed that each set consists of more than vertices, these agents are not enough to review all papers written by , a contradiction. ∎
We next prove that reviews for all . For this, we need the following lemma:
Lemma 3.
In every - valid -cycle-free assignment in the constructed instance , if reviews paper for , then only vertex and dummy agents from class and special agents can review dummy and vertex papers from class .
Proof.
Note that the special agent is an author of all dummy and vertex papers from color class . Moreover, paper is authored by all dummy and vertex agents from color classes different from . Thus, if reviews , then no vertex or dummy agent from a class different from can review papers written by . As authors all dummy and vertex papers from class , the lemma follows. ∎
Using this, we are able to prove that each special agent reviews the corresponding special paper.
Lemma 4.
In every - valid -cycle-free assignment in the constructed instance , for , reviews .
Proof.
By Lemma 2, is assigned to , which is authored by all dummy agents and vertex agents. Thus, to prevent the existence of reviewing cycles, only special agents can review papers written by . As for each , is written by , it follows that the set of agents needs to review the set of papers . For the sake of contradiction, let us assume that special agent reviews paper for . We assume without loss of generality that (if there exists a pair where reviews paper with there also has to exist one with ). By Lemma 3 and as special agents need to review special papers, from this it follows that only dummy and vertex agents from color can review the vertex and dummy papers from class (which are all written by ). As we have assumed that , the number of these agents () does not suffice to review all of these papers (), a contradiction. ∎
We are now ready to prove the correctness of the backwards direction of the reduction:
Lemma 5.
If the constructed instance of Cycle-Free Reviewing is a YES-instance, then the given instance of Multicolored Independent Set is a YES-instance.
Proof.
From Lemma 2, Lemma 3, and Lemma 4 it follows that for each color every vertex and dummy agent from this color class needs to review a vertex or dummy paper from this color class and that each vertex or dummy paper from this color class needs to be reviewed by a vertex or dummy agent from this color class. As there exist dummy agents from color class but vertex papers at least one vertex paper from color class needs to be reviewed by a vertex agent from color class . Note that for each , agent is an author of all vertex papers except . Thus, for each color there needs to exist (at least) one agent for some that reviews . So let be a list of those agents (containing one vertex agent from each color class). We claim that forms an independent set in . For the sake of contradiction assume that for , then by construction it follows that who reviews paper is an author of paper and similarly who reviews is an author of . Thus, and form a reviewing cycle, a contradiction. ∎
Theorem 2.
Cycle-Free Reviewing is NP-hard even if each agent is qualified to review all papers, , , and .
The reduction from Theorem 2 heavily relies on the possibility that an agent reviews a paper written by an agent with whom she has a joint paper. As some conferences might declare an automatic conflict of interest for co-authors, we now consider the case where an agent is qualified to review all papers that are not authored by one of her co-authors:
Theorem 3.
Cycle-Free Reviewing is NP-hard even if each agent is qualified to review all papers that are not written by one of her co-authors, , and .
Proof.
We reduce from Cycle-Free Reviewing with , and where agents are not qualified to review papers of co-authors, which is NP-hard as proven in Theorem 1. We assume without loss of generality that for each paper there is one agent who is not qualified to review it.
Construction.
Given an instance of Cycle-Free Reviewing, we construct a new instance with agents and papers and and . We start by setting . Next, we add agents , , , and to . For each agent and each paper , we insert an agent to and add a so-called agent paper which is authored by and to . For each paper , we introduce an agent to . Moreover, we introduce dummy agents to . We introduce five different (types of) papers in :
• For each paper , we introduce a paper to that is written by all authors of , agent and by agents for all agents that are not qualified to review in .
• We introduce a paper authored by and for all and .
• We introduce a paper authored by and for all and to .
• We introduce a paper authored by and all dummy agents and for each .
• We introduce a paper authored by and all dummy agents and for each .
Each agent is qualified to review all papers that are not written by one of her co-authors.
() Given a --valid -cycle-free review assignment for , we construct a --valid -cycle-free review assignment for as follows. All agents from still review the same papers as in the given assignment (which they are all still qualified to do because we have not added or removed any papers with two authors from apart from copies of papers from ). Agent reviews , agent reviews , reviews , and reviews (which they are all qualified to do). Moreover, the dummy agents are assigned arbitrarily to the agent papers, which they are qualified to review because dummy agents only author papers together with agents and and .
Concerning review cycles, note that agents do not review any paper. Moreover, dummy agents only review papers written by agents but are only reviewed by and and thus cannot be part of a review cycle. Furthermore, no agent from can be part of a review cycle because there was no such cycle in the given review assignment and all agents that author a paper reviewed by an agent from are not part of a review cycle. Thus, any review cycle needs to consist of , , , and . Note that reviews a paper of , reviews a paper of , reviews a paper of , and reviews a paper of . Thus, these agents form a -cycle but no -cycle.
() Given a --valid -cycle-free review assignment for , we claim that this assignment restricted to agents from and papers from is a solution to the given instance . To prove this, we will argue for all agents from that they cannot review a paper from from which the correctness directly follows, as we have not added any authors from to papers from . Fix some paper . We now iterate over all agents and argue why they cannot review . As we have assumed in that for all papers there is an agent not qualified to review it, it follows that has an author for some and author in . For agent and it holds that both have a joint paper with and thus cannot review . Next, note that as, for each and , is either identical to or has a joint paper with none of these agents can review . Lastly, for some , for , and and have a joint paper with , which is an author of . Thus, all these agents (and thereby no agent from ) can review .
∎
3.3 Single-Author-Single-Paper Setting
In their theoretical analysis, Barrot et al. (2020) focus on Cycle-Free Reviewing where each agent writes a single-author paper (we speak of an agent and its paper interchangeably) and qualifications are symmetric, i.e., if an agent is qualified to review agent , then is qualified to review . They prove that this problem is NP-hard for and (without bounds on or ) but polynomial-time solvable for arbitrary for . We close the gap between these two results and extend their general picture by proving that for , Cycle-Free Reviewing is NP-hard for even if qualifications are symmetric and each agent is only qualified to review four agents, i.e., we need to decide for each agent which two of these four agents review and which two of these agents will get a review from .
Theorem 4.
Cycle-Free Reviewing is NP-hard, even if , , , , each agent is qualified to review exactly four papers and if an agent can review the paper written by agent , then can review the paper of .
To prove Theorem 4, we reduce from Two-in-Four-Satisfiability, a variant of Satisfiability, where given a propositional formula over variables where each clause contains four different literals, the question is whether there exists an assignment of variables such that in each clause exactly two out of four literals are satisfied. As, to the best of our knowledge, this variant of Satisfiability has not been considered before, we start by proving that it is NP-hard even if each variable appears exactly twice positive and twice negative:
Proposition 2.
Two-in-Four-Satisfiability is NP-hard, even if each variable appears exactly twice positive and twice negative.
Proof.
In Monotone Not-All-Equal 3-Sat, we are given a propositional formula where each clause contains three different positive literals and the question is whether there is a variable assignment such that in each clause at least one literal is set to true and at least one is set to false. Reducing Monotone Not-All-Equal 3-Sat to Two-in-Four-Satisfiability (without any additional restrictions) is straightforward: Given an instance of Monotone Not-All-Equal 3-Sat, for each clause, we introduce a new variable which we add to the clause. Thereby, we can extend a valid assignment for the Monotone Not-All-Equal 3-Sat instance by setting for a clause the newly introduced variable to true if originally sets only one literal from this clause to true and to false if originally sets only one literal from this clause to false. The reverse direction is immediate. However, to achieve that each variable appears twice positive and twice negative, a slightly more involved approach is needed.
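The straightforward reduction described above (without the additional occurrence restrictions) can be sketched and checked by brute force on small formulas. The clause encoding (tuples of variable names), the naming of the fresh variables, and all function names are assumptions made for illustration.

```python
from itertools import product


def nae3_to_two_in_four(clauses):
    """Add one fresh positive variable to every monotone 3-clause.

    clauses: list of 3-tuples of variable names (all literals positive).
    The fresh variable of clause i is named ("fresh", i) (hypothetical naming).
    """
    return [tuple(clause) + (("fresh", i),) for i, clause in enumerate(clauses)]


def brute_force(clauses, clause_ok):
    """Return an assignment under which every clause passes `clause_ok`, or None.

    clause_ok receives the number of true variables in a clause.
    """
    variables = sorted({v for clause in clauses for v in clause}, key=repr)
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(clause_ok(sum(assignment[v] for v in clause)) for clause in clauses):
            return assignment
    return None


def not_all_equal(true_count):
    # In a 3-clause: at least one variable true and at least one false.
    return 1 <= true_count <= 2


def exactly_two(true_count):
    # In a 4-clause: exactly two variables true.
    return true_count == 2
```

On small inputs one can confirm that `brute_force(clauses, not_all_equal)` and `brute_force(nae3_to_two_in_four(clauses), exactly_two)` are either both `None` or both satisfying assignments, which mirrors the two directions of the argument.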
In fact, for simplicity, we reduce from the NP-hard variant of Monotone Not-All-Equal 3-Sat where each variable appears in exactly four clauses (Darmann and Döcker, 2020). Given an instance over variables of Monotone Not-All-Equal 3-Sat, note that needs to be even, as there are and needs to be an integer. We now construct a new propositional formula over variable set as follows. For each clause for , we add variables , , , and to and clauses and to . Now, every variable appears once negative and once positive. It remains to link the copies of each variable.
We do this for each variable separately. Let be some original variable and let denote the list of all clauses where appears in . We introduce dummy variables and to and add clauses , , , and to . As for each , exactly one of and needs to be set to true, these clauses enforce that , , , and all have the same truth value. Lastly, for , we add twice the clause , which is always trivially satisfied.
The correctness of the reduction is immediate and all variables appear twice positive and twice negative in . ∎
Using this, we are now ready to prove Theorem 4:
Proof of Theorem 4.
We reduce from Two-in-Four-Satisfiability where each variable appears exactly twice positive and twice negative.
Construction.
Given an instance of Two-in-Four-Satisfiability consisting of a propositional formula over variables , for , we denote as and the indices of the two clauses in which variable appears positive and as and the indices of the two clauses in which variable appears negative. From this, we construct an instance of Peer Cycle-Free Reviewing as follows. For , we introduce four agents , , , and (constituting a gadget modeling this variable). Moreover, for , we introduce one agent . Qualifications are symmetric, i.e., if agent is qualified to review , then is qualified to review . For , is qualified to review , , , and (and the other way round). Moreover, is qualified to review , , , and (and the other way round). Lastly, and are qualified to review each other and and are qualified to review each other (where is taken modulo ). We set and .
(⇒) Let be an assignment of variables in that is a solution to the given Two-in-Four-Satisfiability instance. For , we let review and review . Moreover, we let review and review .
For where is set to true by , we let and review and review and . Moreover we let review and and we let and review . Conversely, for where is set to false by , we let and review , we let review and . Moreover, we let review and and let and review .
As sets exactly two literals in each clause to true and two to false, for each , is reviewed by two agents and reviews two agents. The same also holds for all other agents, implying that the constructed review assignment is --valid. It is easy to see that there are no -cycles. Moreover, as no two agents and for are qualified to review each other and, for no , are there two agents that are both qualified to review and that are qualified to review each other, each -cycle needs to solely consist of agents from a gadget corresponding to a single variable. So let us fix some . The only possible -cycles consist of , and or , and . However, there is no such -cycle, as either reviews both and and both and review , or reviews both and and both and review . Thus, the constructed assignment is -cycle-free.
(⇐) Assume we are given a --valid -cycle-free review assignment in the constructed Peer Cycle-Free Reviewing instance. Assume that reviews in the given assignment (if reviews an analogous argument works). We now argue that needs to review . Assume for the sake of contradiction that this is not the case, then as is reviewed by and , she needs to review and . However, to prevent a -cycle, then needs to review and , a contradiction (as gives three reviews). Next, we want to argue that reviews . For the sake of contradiction, assume that reviews . Then, already gets two reviews and thus needs to review and . However, as already reviews either or review which leads to a -cycle together with and . Applying the same arguments inductively, it follows that for , review and reviews and that reviews .
Further, observe that for each agents and either both review or both get reviews from . For the sake of contradiction, assume that this is not the case. If reviews and reviews , then we have a -cycle consisting of these three agents. Otherwise, reviews and reviews . However, as the given assignment is --valid, from this it follows that reviews and reviews , which leads to a -cycle consisting of , , and . Thus, we have reached a contradiction proving our initial claim. Moreover, as the given assignment is --valid, in case that and both review , then reviews and , and in case that reviews both and , then and both review . We now construct an assignment by, for , setting variable to true if and review and to false if and review . Using our argument from above, it follows that is well-defined. Moreover, as the given assignment is --valid, if sets a literal to true, then the agents corresponding to this literal review the agents corresponding to the two clauses in which the literal appears. Similarly, if sets a literal to false, then the agents corresponding to this literal get a review from the two agents corresponding to the two clauses in which the literal appears. Thus, as each agent corresponding to a clause gets and issues two reviews (as the given assignment is --valid), it follows that sets for each clause exactly two literals to true and thus that is a solution to the given instance of Two-in-Four-Satisfiability. ∎
4 Polynomial-Time Solvable Special Cases
In this section, we identify conditions under which a short-cycle-free review assignment provably exists and can be computed in polynomial time. As we will see in our experiments, the subsequently presented algorithms provide short-cycle-free review assignments even beyond the theoretical limitations we discuss below. As we are interested in computing -cycle-free review assignments for , no author is allowed to review one of its own papers. That is why throughout this section we assume that we do not have and at the same time.
Our algorithms in this section are based on the following simple observation: Given a partial -cycle-free review assignment and a paper that requires more assigned reviewers, the number of potential reviewers that would create a -cycle, if assigned to review , is bounded by a function in , the maximum number of authors per paper, and the maximum number of reviews per agent; the precise function is given in the subsequent proofs. Thus, assuming that the minimum number of potential reviewers for each paper is large compared to , , , and , for each paper there are always reviewers that can be assigned to review without creating a -cycle. Note that in practice we can expect that , , and are quite small. Moreover, while the minimum number of fitting reviewers might not be very large, it is not uncommon to assign papers to reviewers that are not “perfect”. Thus, interpreting as the number of community members that do not have a conflict of interest actually yields relatively large values for in practice.
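The observation relies on a local test: does assigning a given reviewer to a given paper close a short review cycle? A minimal Python sketch of such a test is shown here. The representation of the review graph (a dictionary mapping each reviewer to the set of authors whose papers they review) and the function names are assumptions made for illustration; the sketch also assumes that the current partial assignment is already free of short cycles.

```python
from collections import deque


def within_distance(graph, source, target, max_len):
    """True if target is reachable from source via a directed path with at most max_len arcs."""
    if source == target:
        return True
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_len:
            continue  # do not expand beyond the length bound
        for nxt in graph.get(node, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return False


def creates_short_cycle(graph, reviewer, authors, k):
    """Would the new arcs reviewer -> author close a review cycle of length at most k?

    A cycle of length at most k appears exactly if some author already reaches
    the reviewer via at most k - 1 arcs (an author equal to the reviewer is a
    self-review, i.e., a cycle of length one).
    """
    return any(within_distance(graph, a, reviewer, k - 1) for a in authors)


# 'a' reviews a paper of 'b', and 'b' reviews a paper of 'c'.
graph = {"a": {"b"}, "b": {"c"}}
print(creates_short_cycle(graph, "c", ["a"], 3))  # would 'c' reviewing a paper of 'a' close a cycle?
```

Because the search is breadth-first and stops at depth k - 1, the number of visited agents is bounded in terms of the out-degrees in the review graph, which mirrors the counting argument in the text.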
We start with a very restrictive setting and then, step by step, generalize the approach and the results. First, each paper is written by exactly one author, each agent has at most one paper and we want a completely cycle-free review assignment (i. e., -cycle-free for every ). This of course implies that some agents cannot be authors of papers and so the number of papers is smaller than the number of agents. However, it allows Algorithm 1 to work (implicitly) with the topological ordering of the (acyclic) review assignment while constructing it.
Proposition 3.
If , , and , then Algorithm 1 computes a --valid and completely cycle-free review assignment in linear time.
Proof.
We first show the correctness of Algorithm 1. Clearly, if in each iteration of the loop in Algorithm 1 the set of eligible reviewers (see Algorithm 1) is of size at least , then a completely cycle-free review assignment is created as each agent only reviews papers from agents “occurring” later during the algorithm. Observe that if for , then in iteration we have : There are at most agents in that cannot review (the corresponding edge is not in ) and, thus, at least agents in are eligible to review . It remains to show that for all follows from our assumptions. By assumption of the proposition we have . Hence, . We next show that for all . Observe that at the start we have . Moreover, after the th iteration of the loop in Algorithm 1 we have as each paper gets reviews and the reviewer in starts with . Observe that for all and . Thus, we have and, hence, . This completes the correctness proof.
As to the running time, everything outside the loop starting in Algorithm 1 clearly runs in linear time. As to the part inside the loop, note that by keeping just one array of length we can store the values of in linear time. Moreover, the reviewers for are selected arbitrarily from , which is doable in time. Hence, the loop in Algorithm 1 can be processed in time. Thus, the overall algorithm runs in , that is, linear time. ∎
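The idea of building the assignment along a fixed order, so that every reviewer occurs before the authors they review, can be sketched in Python as follows. This is a simplified illustration and not Algorithm 1 itself: the function name, the argument names, and the representation of qualifications as a set of (agent, paper) pairs are assumptions, and the sketch makes no attempt to reach the linear running time shown in the proof.

```python
def topological_assignment(order, paper_of, qualified, reviews_per_paper, max_load):
    """Assign reviewers so that each reviewer precedes the reviewed author in `order`.

    order: list of agents (agents without a paper should come first)
    paper_of: dict mapping an agent to their single paper (missing if none)
    qualified: set of (agent, paper) pairs the agent may review
    Returns a dict paper -> list of reviewers, or None if some paper cannot be covered.
    """
    load = {agent: 0 for agent in order}
    assignment = {}
    for pos, author in enumerate(order):
        paper = paper_of.get(author)
        if paper is None:
            continue  # this agent has no paper that needs reviews
        # Only earlier agents are eligible, so all arcs point "forward" in the order.
        eligible = [r for r in order[:pos]
                    if load[r] < max_load and (r, paper) in qualified]
        if len(eligible) < reviews_per_paper:
            return None
        chosen = eligible[:reviews_per_paper]  # arbitrary choice among eligible agents
        for r in chosen:
            load[r] += 1
        assignment[paper] = chosen
    return assignment


order = ["r1", "r2", "a", "b"]
paper_of = {"a": "pa", "b": "pb"}
qualified = {(x, p) for x in order for p in ("pa", "pb")}
print(topological_assignment(order, paper_of, qualified, 2, 2))
```

Since every review arc goes from an earlier to a later agent in the order, the resulting review graph cannot contain a directed cycle of any length.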
For our next result we replace the completely cycle-free property of the resulting review assignment with -cycle freeness. This implies that the idea of constructing the review assignment along its topological ordering (as done by Algorithm 1) cannot be employed. Instead, Algorithm 2 constructs greedily a maximal -cycle-free assignment and then extends the assignment by replacing one review assignment by two other assignments. The argument behind the replacement strategy is an extension of the argument in Algorithm 1 that there are always enough reviewers to assign in Algorithms 1 to 1.
To keep our arguments simple we first consider the case that each agent reviews at most one paper and each paper requires one review. Moreover, as before, we are in the setting that each paper has one author and each agent authors at most one paper. Formally, we have the following.
Proposition 4.
If , , , , and , then Algorithm 2 computes a --valid -cycle-free review assignment in polynomial time.
Proof.
Obviously, Algorithm 2 terminates after at most iterations of the while loop as in each iteration the number of assigned reviews increases. Moreover, a --valid -cycle-free review assignment is returned if as described in case 2 (Algorithm 2) always exist. To prove their existence, we introduce some notation. For some let be the -out-neighborhood of , that is, the set of vertices that can be reached from in the review graph via a path of length at least one and at most . Similarly, let be the -in-neighborhood of , that is, the set of vertices that can reach in the review graph via a path of length at least one and at most . Note that if , then also and is contained in a review cycle of length (that is a directed cycle of length in ). Subsequently, we present upper bounds on the size of and for thereby proving the existence of .
Let be the paper without reviewer selected in Algorithm 2 when the algorithm enters case 2. Let be the set of agents that could review without creating a -cycle, that is, is -cycle free. Since , there are at most agents whose assignment to review would create a review cycle, that is, , and thus . Since we are in case 2, no more review assignments could be added without creating a -cycle. Hence, the algorithm assigned the at least potential reviewers in to different papers. Let be the set of these papers. Since we have .
Let be an arbitrary agent without assigned review, that is, . Since , we have . Thus, there are papers that could review without creating a -cycle; let denote the set of these papers. Since we assume that , it follows that there is a . By definition of there is an agent with and . Thus, exist and can be updated to in Algorithm 2. ∎
We now turn our attention to our general case where agents can author and review many papers and papers can have multiple authors and can require several reviews. While the conditions that guarantee the existence of a -cycle-free review assignment need adjustments, we can still use Algorithm 2 together with a correctness proof that follows a similar pattern as the proof of Proposition 4.
Theorem 5.
If, , , , and , then Algorithm 2 computes a --valid -cycle-free review assignment in polynomial time.
Proof.
We use the same notation as in the proof of Proposition 4 and similarly to this proof we need to show that as described in Algorithm 2 actually always exist.
Let be the paper with a missing review selected in Algorithm 2 and the algorithm entered case 2. Let be the set of agents that could review without creating a -cycle, that is, is -cycle free. As every paper has at most authors and every author has at most assigned papers to review, it follows that
Thus, , as at most agents are already assigned to and at most agents cannot review because this would cause a review cycle of length at most . When case 2 was entered, no more review assignment could be added without creating a -cycle. Hence, the algorithm assigned the potential reviewers in already to different papers. Let be the set of these papers. Note that .
Let be an arbitrary agent that can do one more review, that is, . Using a similar argument as above, we can show . Thus, there are more than papers that could review additionally without creating a -cycle; let denote the set of these papers. Note that by our assumptions that and , and are both non-empty. Since , it follows that there is a . By definition of there is an agent with and . Thus, exist and can be updated to in Algorithm 2. This finishes the correctness proof. ∎
To simplify the statement of Theorem 5 consider a “symmetric” case where , , and . For brevity, set , , and . Let be the maximum number of papers any agent is not qualified to review/has a conflict of interest with, that is, . Setting and as in our experiments we get:
Corollary 2.
If , then there always exists a --valid -cycle-free review assignment that can be found in polynomial time.
Considering that AAAI’22 had 9,251 submissions and that there was a submission limit of papers per author and assuming that each paper has at most ten authors (implying that ) and that each author has at most 700 conflicts of interest, it follows that there is a --valid -cycle-free review assignment computable with Algorithm 2.
As we see in the experiments in the next section, our algorithm returns -cycle-free review assignments even well beyond the theoretical guarantees given above. We also remark that Algorithm 2 allows for an easy extension to the weighted case which we use in our experiments in the next section. To this end, in the first case (Algorithm 2) we do not pick an arbitrary edge but an eligible edge of maximum weight to be added to the assignment .
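The weighted greedy phase can be sketched in Python as follows. The sketch covers only the first case (adding eligible edges in order of decreasing weight) and deliberately omits the replacement step of the second case; all names and data structures are assumptions made for illustration. Because an edge that is ineligible at some point stays ineligible during the greedy phase, a single pass over the edges sorted by weight is equivalent to repeatedly picking an eligible edge of maximum weight.

```python
def reaches(arcs, source, target, max_len):
    """True if target is reachable from source using at most max_len arcs."""
    if source == target:
        return True
    frontier, seen = {source}, {source}
    for _ in range(max_len):
        frontier = {n for v in frontier for n in arcs.get(v, ()) if n not in seen}
        if target in frontier:
            return True
        seen |= frontier
    return False


def greedy_phase(weights, authors, reviews_per_paper, max_load, k):
    """Greedily add (agent, paper) pairs by decreasing weight, avoiding cycles of length <= k.

    weights: dict (agent, paper) -> similarity score of qualified pairs
    authors: dict paper -> list of authoring agents
    Returns dict paper -> list of assigned reviewers (possibly incomplete).
    """
    arcs = {}  # review graph: reviewer -> set of authors they review
    load = {}  # number of reviews assigned to each agent so far
    reviewers = {paper: [] for paper in authors}
    for (agent, paper), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        if len(reviewers[paper]) >= reviews_per_paper:
            continue
        if load.get(agent, 0) >= max_load:
            continue
        # Skip the pair if some author already reaches the agent within k - 1 arcs.
        if any(reaches(arcs, a, agent, k - 1) for a in authors[paper]):
            continue
        reviewers[paper].append(agent)
        load[agent] = load.get(agent, 0) + 1
        arcs.setdefault(agent, set()).update(authors[paper])
    return reviewers


authors = {"p1": ["a"], "p2": ["b"], "p3": ["c"]}
weights = {("b", "p1"): 0.9, ("a", "p2"): 0.8, ("c", "p2"): 0.5,
           ("a", "p3"): 0.4, ("c", "p1"): 0.3, ("b", "p3"): 0.2}
print(greedy_phase(weights, authors, reviews_per_paper=1, max_load=1, k=2))
```

The greedy phase alone may leave some papers without enough reviewers; this is the situation in which the replacement step of Algorithm 2 is needed.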
5 Experiments
In this section, we compare the weight of review assignments computed using different methods and analyze the occurrences of review cycles. (The code for our experiments is available at github.com/n-boehmer/Combating-Collusion-Rings-is-Hard-but-Possible.) For this, we use a dataset from the 2018 International Conference on Learning Representations (ICLR ’18) prepared by Xu et al. (2019). Xu et al. (2019) collected all papers submitted to ICLR ’18 and the identity of all authors. As the reviewers’ identities are unknown, they considered all authors to be reviewers and computed for each author-paper pair a similarity score. (To the best of our knowledge, in all other publicly available datasets, there are similarity scores for reviewer-paper pairs but the link between the identities of authors and reviewers is missing, as this is considered sensitive information.)
From the dataset of Xu et al. (2019), we created multiple instances of Weighted Cycle-Free Reviewing as follows. Given a number of papers and a ratio of the numbers of agents and papers, we sample a subset of of the ICLR ’18 papers and set this as our set of papers. Subsequently, we compute the set of all authors of one of these papers and sample a subset of authors and set this as our set of agents. Notably, the created instances can be seen as particularly challenging when it comes to avoiding review cycles, as in reality also “uncritical” reviewers, i.e., reviewers that do not author any paper, exist.
As done in other papers using the same dataset, we focus on the case with and , i.e., every paper needs exactly three reviews and each agent can review at most six papers (Xu et al., 2019; Jecmen et al., 2020). We consider three different types of review assignments: As “optimal” we denote a maximum-weight --valid review assignment. Such an assignment can be computed using a simple Linear Program (LP) as, for instance, described by Taylor (2008). As “optimal -cycle free” we denote a maximum-weight --valid -cycle-free review assignment. This solution can be computed by treating the LP of Taylor (2008) as an Integer Linear Program (ILP) and adding for each possible -cycle for a separate constraint which imposes that at least one of the agent-paper pairs from the cycle is not assigned. We solved all (I)LPs using Gurobi (Gurobi Optimization, LLC, 2021). Lastly, as “heuristic -cycle free”, we denote a --valid -cycle-free review assignment computed by the weighted variant of Algorithm 2 as described at the end of Section 4. (We could not use the heuristics of Guo et al. (2018) as these are not available and their algorithm details are ambiguous.) In all experiments conducted in this section, the heuristic always returned a solution despite the fact that most of the time we are beyond the setting in which Theorem 5 guarantees this behavior of the heuristic. In experiment I presented in the following subsection, for , an unoptimized implementation of our heuristic was always able to find a -cycle-free review assignment in less than seconds, being on average around times faster than the “optimal” LP, on average around times faster than the “optimal -cycle free” ILP, and on average more than times faster than the “optimal -cycle free” ILP.
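One common way to write such a cycle constraint is sketched below. The notation is an assumption introduced for illustration: a binary variable $x_{a,p}$ equals 1 if agent $a$ is assigned to review paper $p$, and $C$ denotes the set of agent-paper pairs forming one potential review cycle. Requiring that at least one pair of $C$ is not assigned then reads:

```latex
\sum_{(a,p) \in C} x_{a,p} \;\le\; |C| - 1
\qquad \text{for every potential review cycle } C \text{ of the forbidden lengths.}
```

Since each $x_{a,p}$ is binary, the sum can reach $|C|$ only if all pairs of the cycle are assigned, which is exactly what the constraint rules out.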
5.1 Experiment I
In this experiment, we focus on the case where the total number of needed reviews is the same as the total number of reviews that can be written, which is in some sense the most “challenging” but probably also one of the more realistic scenarios. Specifically, for , we prepared instances with as described above and computed for each of these instances the optimal, heuristic //-cycle-free, and optimal -cycle-free review assignment. Moreover, for all instances with , we also computed the optimal -cycle-free review assignment (for larger instances the ILP solver ran out of memory).
To measure the “price of -cycle freeness”, in Figure 3, we display the weights of different cycle-free review assignments divided by the weight of an optimal review assignment. What stands out here is that forbidding the existence of -cycles only comes at the cost of decreasing the assignment’s weight by on average at most (if the optimal -cycle-free assignment is used). Turning to the results produced by our heuristic, the quality decrease for //-cycle-free assignments lies, on average, around , , and . The weight of assignments computed using our heuristic is thus clearly worse than the weight of the optimal cycle-free assignment, yet still not far away from the weight of an optimal assignment. What is particularly surprising here is that for both our heuristic and the optimal cycle-free assignment, whether , or cycles are forbidden seems to be rather irrelevant for the quality decrease. All in all, it is encouraging that //-cycle freeness can be realized at a low cost independent of whether our heuristic or an ILP is used.
The necessity of dealing with review cycles is underlined by the data displayed in Figure 3. Here, we show the fraction of agents that are contained in at least one review cycle of some length in an optimal assignment and in a heuristic -cycle-free assignment. Overall, as the number of papers increases the fraction of agents contained in review cycles constantly decreases, yet for all considered values of the results are worrisome. In the optimal assignment for papers, the fraction of agents contained in a review cycle of length at most is, on average, , while even for papers, still of agents are contained in a review cycle. Considering heuristic -cycle-free assignments, the fraction of agents contained in a cycle of length is considerably lower than for the optimal solution but still non-negligible (the results for optimal -cycle-free assignments are similar to the displayed results for our heuristic).
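The measured quantity, the fraction of agents contained in at least one review cycle of length at most some bound, can be computed with a bounded breadth-first search from each agent. The following Python sketch is an illustration only; the function names and the dictionary-based input format are assumptions and do not describe the authors' experimental code.

```python
def in_short_cycle(arcs, agent, k):
    """True if `agent` lies on a directed review cycle of length at most k."""
    frontier, seen = {agent}, set()
    for _ in range(k):
        frontier = {n for v in frontier for n in arcs.get(v, ()) if n not in seen}
        if agent in frontier:
            return True  # the search returned to its starting agent
        seen |= frontier
    return False


def fraction_in_short_cycles(assignment, authors, agents, k):
    """Fraction of agents on some review cycle of length at most k.

    assignment: dict paper -> list of reviewers
    authors: dict paper -> list of authoring agents
    """
    arcs = {}  # review graph: reviewer -> set of authors they review
    for paper, reviewers in assignment.items():
        for reviewer in reviewers:
            arcs.setdefault(reviewer, set()).update(authors[paper])
    hits = sum(in_short_cycle(arcs, agent, k) for agent in agents)
    return hits / len(agents)


assignment = {"p1": ["b"], "p2": ["a"], "p3": ["a"]}
authors = {"p1": ["a"], "p2": ["b"], "p3": ["c"]}
print(fraction_in_short_cycles(assignment, authors, ["a", "b", "c", "d"], 2))
```

In this toy instance, agents a and b review each other's papers, so they form a review cycle of length two, while c and d are not part of any cycle.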
We also computed the fraction of papers that are contained in at least one review cycle (see Figure 3). The results are as in Figure 3 with all values roughly halved, e.g., even in the optimal assignment for papers, of papers are contained in a review cycle of length at most . An intuitive explanation for this difference between agents and papers is that the number of papers is twice the number of agents and that there exist some papers without reviewing authors. Overall, it is striking that even for a high number of papers, in an optimal assignment around of papers could have a considerably higher chance of getting accepted if two agents coordinate to give each other’s papers better reviews and of reviewers would have an opportunity to participate in such a collusion.
5.2 Experiment II
In this experiment, we analyze how the results from experiment I depend on the assumption that the supply and demand of reviews match exactly. In particular, as described before, for we prepared instances with papers and agents (we also repeated this experiment for and papers producing similar results) and computed the different types of review assignments. Considering the assignment weights (see Figure 5), increasing from to , the normalized weight of an optimal -cycle-free assignment decreases by to , while the normalized weight of a heuristic -cycle-free assignment increases by to : our heuristic performs particularly well if there are (considerably) more reviews available than needed; this supports our theoretical statements for our heuristic in Section 4.
Turning to the possible impact of review cycles, we visualize the fraction of agents/papers contained in a review cycle in an optimal assignment in Figure 5. (For readability, we do not display the values for the optimal/heuristic cycle-free assignment, as their relationship to the optimal assignment is again similar as in Figure 3.) While the fraction of agents contained in a review cycle constantly and significantly decreases if more and more agents are added, the fraction of papers contained in a cycle constantly increases. The former observation is quite intuitive, as when more and more agents are added, the average review load decreases and even if the number of review cycles remains the same, it is likely that the fraction of agents contained in one gets smaller. The latter observation is less intuitive but probably a consequence of the fact that, starting with , for some papers none of the authors is part of the agent set, implying that these papers cannot be part of a review cycle; however, if we start to add more and more agents, more and more papers can potentially be part of a review cycle. Overall, it might be quite counterintuitive that adding more and more reviewers (that are also authors) to the reviewer pool does not decrease the number of papers contained in a review cycle but increases it.
6 Conclusion
Our work provides a first systematic analysis of Cycle-Free Reviewing. On the theoretical side, we show that Cycle-Free Reviewing is a computationally hard problem even in very restricted settings, yet practically relevant polynomial-time solvable special cases exist. In our practical analysis, we showed that in assignments that do not take review cycles into account, a high fraction of authors and papers will likely be part of a short review cycle. While collusion rings can certainly also emerge without the existence of review cycles, for example, when authors coordinate over multiple conferences (Littman, 2021; Shah, 2021), allowing so many easy opportunities means leaving a huge door unlocked without good reason: Our heuristic significantly improves the situation, since it seems to always find a cycle-free review assignment at a very low quality loss.
For future work, it would be valuable to further investigate the limits of our heuristic. While our current bounds are certainly not tight, there are also clear limitations for possible extensions imposed by our NP-hardness results in quite restrictive settings from Section 3. However, a concrete and practically very relevant open question is whether the minimum degree in our analysis can be replaced by the average degree; this would make the results much more robust against outliers. Finally, due to the lack of data, we tested our model on just one dataset. Obtaining more data to test our and other models on would be extremely valuable.
Acknowledgments
NB was supported by the DFG project MaMu (NI 369/19) and by the DFG project ComSoc-MPMS (NI 369/22). RB was partially supported by the DFG project AFFA (BR 5207/1 and NI 369/15). This work was started at the research retreat of the TU Berlin Algorithms and Computational Complexity group held in September 2020.
References
- Barrot et al. [2020] Nathanaël Barrot, Sylvaine Lemeilleur, Nicolas Paget, and Abdallah Saffidine. Peer reviewing in participatory guarantee systems: Modelisation and algorithmic aspects. In Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS ’20), pages 114–122. IFAAMAS, 2020.
- Berman et al. [2003] Piotr Berman, Marek Karpinski, and Alex D. Scott. Approximation hardness of short symmetric instances of MAX-3SAT. ECCC, (49), 2003.
- Darmann and Döcker [2020] Andreas Darmann and Janosch Döcker. On a simple hard variant of not-all-equal 3-sat. Theor. Comput. Sci., 815:147–152, 2020.
- Garg et al. [2010] Naveen Garg, Telikepalli Kavitha, Amit Kumar, Kurt Mehlhorn, and Julián Mestre. Assigning papers to referees. Algorithmica, 58(1):119–136, 2010.
- Goldsmith and Sloan [2007] Judy Goldsmith and Robert H. Sloan. The AI conference paper assignment problem. In Proceedings of the 22nd AAAI Conference Workshop on Preference Handling for Artificial Intelligence (MPREF ’07), pages 53–57. AAAI Press, 2007.
- Guo et al. [2018] Longhua Guo, Jie Wu, Wei Chang, Jun Wu, and Jianhua Li. K-loop free assignment in conference review systems. In Proceedings of the 2018 International Conference on Computing, Networking and Communications (ICNC ’18), pages 542–547. IEEE Computer Society, 2018.
- Gurobi Optimization, LLC [2021] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2021. URL https://www.gurobi.com.
- Hell et al. [1988] Pavol Hell, David G. Kirkpatrick, Jan Kratochvíl, and Igor Kríz. On restricted two-factors. SIAM J. Discret. Math., 1(4):472–484, 1988.
- Jecmen et al. [2020] Steven Jecmen, Hanrui Zhang, Ryan Liu, Nihar B. Shah, Vincent Conitzer, and Fei Fang. Mitigating manipulation in peer review via randomized reviewer assignments. In Advances in Neural Information Processing Systems 33 (NeurIPS ’20), 2020.
- Kobren et al. [2019] Ari Kobren, Barna Saha, and Andrew McCallum. Paper matching with local fairness constraints. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’19), pages 1247–1257. ACM, 2019.
- Leyton-Brown and Mausam [2021] K. Leyton-Brown and Mausam. AAAI 2021 - introduction. https://slideslive.com/38952457/aaai-2021-introduction?ref=account-folder-79533-folders, 2021. Minute 8 onwards in the video.
- Lian et al. [2018] Jing Wu Lian, Nicholas Mattei, Renee Noble, and Toby Walsh. The conference paper assignment problem: Using order weighted averages to assign indivisible goods. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI ’18), pages 1138–1145. AAAI Press, 2018.
- Littman [2021] Michael L. Littman. Collusion rings threaten the integrity of computer science research. Commun. ACM, 64(6):43–44, 2021.
- Long et al. [2013] Cheng Long, Raymond Chi-Wing Wong, Yu Peng, and Liangliang Ye. On good and fair paper-reviewer assignment. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining (ICDM ’13), pages 1145–1150. IEEE Computer Society, 2013.
- Price and Flach [2017] Simon Price and Peter A. Flach. Computational support for academic peer review: a perspective from artificial intelligence. Commun. ACM, 60(3):70–79, 2017.
- Shah [2021] Nihar B. Shah. Systemic challenges and solutions on bias and unfairness in peer review. http://www.cs.cmu.edu/~nihars/preprints/SurveyPeerReview.pdf, 2021.
- Stelmakh et al. [2021] Ivan Stelmakh, Nihar Shah, and Aarti Singh. PeerReview4All: Fair and accurate reviewer assignment in peer review. J. Mach. Learn. Res., 22(163):1–66, 2021.
- Taylor [2008] Camillo J. Taylor. On the optimal assignment of conference papers to reviewers. https://repository.upenn.edu/cis_reports/889, 2008.
- Xu et al. [2019] Yichong Xu, Han Zhao, Xiaofei Shi, and Nihar B. Shah. On strategyproof conference peer review. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI ’19), pages 616–622. ijcai.org, 2019.