poster
Counterfactual Evaluation of Peer Review Assignment Strategies in Computer Science and Artificial Intelligence
keywords:
peer review
statistics
artificial intelligence
Objective Artificial intelligence (AI) is now widely used to
assign reviewers to papers.1
The assignment relies on 3 key
sources of data1: (1) AI-computed similarities between the
text of the submitted paper and reviewers’ past articles, (2)
reviewer-provided preferences expressing which papers they
would like to review, and (3) overlap between the paper’s
topics as specified by authors and reviewers’ self-reported
areas of expertise. However, it is unknown which of these
sources, or which combination thereof, leads to the best
reviewer assignment outcomes.
Design To assign reviewers to papers, 2 venues recently used
randomized algorithms2 designed to combat fraud: the 2021
Theory and Practice of Differential Privacy (TPDP) Workshop
with 35 reviewers and 95 full papers and the Association for
the Advancement of Artificial Intelligence (AAAI) 2022
Conference on Artificial Intelligence with
3145 reviewers and 8450 full papers. To compute overall
similarities between each reviewer-paper pair, TPDP
weighted the AI-computed text similarities by wtext (range,
0-1) and reviewers’ preferences by 1 − wtext; AAAI weighted
the AI-computed text similarities by wtext (range, 0-1) and the
overlap between the paper’s topics and the reviewers’ topical
areas by 1 − wtext (reviewers’ preferences were also used by
AAAI but were not considered in this study). The randomized
assignment2 then maximized the total
similarity of the assigned reviewer-paper pairs, subject to the
constraint that no reviewer was assigned to any given paper
with probability greater than 0.5 in TPDP or 0.52 in AAAI. In
this study, the
randomization in the assignment was leveraged to estimate
the counterfactual quality of alternative assignment
strategies. Specifically, the effect on the overall quality of the
reviewer-paper assignment was evaluated for (1) introducing
randomness into the assignment process and (2) varying the
weights of the different sources of information. The quality of
each counterfactual reviewer-paper assignment was measured
using reviewers’ self-reported expertise and confidence in
their review.
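To illustrate the approach, the following minimal Python sketch shows how an overall similarity could be formed from two weighted sources and how the assignment probabilities of a randomized assignment could be reused to estimate the quality an alternative strategy would have achieved. The function names, data layout, and the inverse-probability-weighting estimator are illustrative assumptions for exposition, not the authors’ actual implementation.
# Illustrative sketch only; names and the inverse-probability-weighting
# estimator are assumptions, not the study's actual method.
def combined_similarity(s_text, s_other, w_text):
    # Overall similarity: weight w_text on the AI-computed text similarity
    # and 1 - w_text on the second source (reviewer preferences for TPDP,
    # reviewer-paper topic overlap for AAAI).
    return w_text * s_text + (1.0 - w_text) * s_other

def counterfactual_quality(assigned_pairs, quality, p_deployed, p_alternative):
    # Reweight the quality observed for each assigned reviewer-paper pair
    # (e.g., the reviewer's self-reported expertise) by the ratio of its
    # assignment probability under the alternative strategy to its
    # probability under the deployed randomized strategy. This is possible
    # because the deployed assignment capped probabilities below 1.
    total = sum(quality[r, p] * p_alternative[r, p] / p_deployed[r, p]
                for r, p in assigned_pairs)
    # Average per observed assignment; assuming the alternative strategy
    # makes the same total number of assignments, this estimates its
    # average per-assignment quality.
    return total / len(assigned_pairs)
For example, reviews collected under a deployed TPDP assignment with wtext = 0.5 and probabilities capped at 0.5 could, under these assumptions, be reweighted to estimate the quality of a wtext = 0.8 assignment without rerunning the review process.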
Results The results are tabulated in Table 26.3 First,
introducing randomness by limiting the probability of any
reviewer-paper assignment led to a marginal reduction in
assignment quality for TPDP and a slightly larger reduction for
AAAI. Second, for TPDP, placing more weight on the AI-
computed text similarities (wtext = 0.8) instead of equally
weighting the text similarities and the reviewers’ preferences
(wtext = 0.5) resulted in a higher reviewer-paper assignment
quality. Third, for AAAI, placing more weight on the AI-
computed text similarities (wtext = 0.75) instead of equally
weighting the text similarity and the reviewer-paper topical
area overlap (wtext = 0.5) led to a similar assignment quality.
Conclusions In addition to its original purpose of mitigating
fraud, randomness in reviewer assignments can help improve
AI-based automated assignment by enabling counterfactual
analysis of alternative assignment strategies, at the cost of a
small reduction in assignment quality.
References
1. Shah N. Challenges, experiments, and computational
solutions in peer review. Commun ACM. 2022;65(6):76-87.
doi:10.1145/3528086
2. Jecmen S, Zhang H, Liu R, Shah N, Conitzer V, Fang F.
Mitigating manipulation in peer review via randomized
reviewer assignments. Adv Neural Inf Process Syst.
2020;33:12533-12545.
3. Imbens GW, Manski CF. Confidence intervals for partially
identified parameters. Econometrica. 2004;72(6):1845-1857.
doi:10.1111/j.1468-0262.2004.00555.x
Conflict of Interest Disclosures None reported.
Funding/Support This work was supported by the US National
Science Foundation CAREER award (1942124), which supports
research on the fundamentals of learning from people with
applications to peer review.
Acknowledgments We thank Gautam Kamath and Rachel
Cummings for allowing us to conduct this study in TPDP and Melisa
Bok and Celeste Martinez Gomez from OpenReview.net for helping
with the APIs of OpenReview.net.