Automatic Classification of Peer Review Recommendation

Diego Kozlowski

UNDERLINE DOI: https://doi.org/10.48448/hxgc-x276

poster

Peer Review Congress 2022 • September 11, 2022 • Chicago, United States

keywords: bibliometrics, informatics, and scientometrics; peer review; metadata

Objective Peer review plays a fundamental role in scholarly publishing, but its legitimacy has been increasingly questioned. A growing literature discusses how reviewers’ demographic characteristics and biases might lead to disparities in research dissemination.1,2 Because the extent to which reviewers determine the outcomes for papers may vary, it is important to examine the relationship between reviewers’ recommendations and editors’ decision-making. However, reviewer recommendations are often embedded in the text of the review rather than recorded separately. This work proposes a method for automatic detection of recommendations based on review text.

Design The automatic classification used a rule-based algorithm that searched for the presence of 1 or more phrases that signal the reviewer’s recommendation, which fell into 1 of 4 categories defined during the hand-coding process: accept, minor revision, major revision, or reject. The algorithm considered the different combinations of signal phrases to define the outcome. The list of signal phrases was built iteratively over 3 rounds of hand-coding and fuzzy matching of sentences, and the combinations were defined to maximize precision on the hand-coded cases. This study used Publons’ data set, which contained 3,310,791 reviews from 25,934 journals. A total of 600 cases were hand-coded, and a subset of 200 reviews was used to evaluate performance. The gender of reviewers was inferred by matching first and last names to curated lists of country-specific gendered names, including the US Census.3
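The rule-based approach described in the Design can be illustrated with a minimal sketch. The abstract does not list the signal phrases or the combination rules, so the patterns in `SIGNAL_PHRASES`, the entries in `COMBINATION_RULES`, and the function names below are hypothetical placeholders, not the study's actual algorithm. The sketch returns no recommendation for unrecognized combinations, which is one possible way to favor precision over coverage.

```python
import re
from typing import FrozenSet, Optional

# Hypothetical signal phrases per category (assumption: the study's real
# list, built over 3 rounds of hand-coding, is not given in the abstract).
SIGNAL_PHRASES = {
    "accept": [
        r"\brecommend(?:ed)? (?:for )?(?:acceptance|publication)\b",
        r"\baccept(?:ed)? as is\b",
    ],
    "minor revision": [r"\bminor revisions?\b"],
    "major revision": [r"\bmajor revisions?\b"],
    "reject": [
        r"\brecommend(?:ed)? reject(?:ion|ing)\b",
        r"\bshould be rejected\b",
    ],
}

# Hypothetical rules for reviews that match more than one category.
# Any combination not listed here yields no recommendation.
COMBINATION_RULES = {
    frozenset({"accept", "minor revision"}): "minor revision",
    frozenset({"minor revision", "major revision"}): "major revision",
}


def matched_categories(text: str) -> FrozenSet[str]:
    """Return every category with at least one signal phrase in the text."""
    lowered = text.lower()
    return frozenset(
        category
        for category, patterns in SIGNAL_PHRASES.items()
        if any(re.search(pattern, lowered) for pattern in patterns)
    )


def classify_review(text: str) -> Optional[str]:
    """Assign a recommendation, or None when no rule applies."""
    categories = matched_categories(text)
    if not categories:
        return None  # no explicit recommendation detected
    if len(categories) == 1:
        return next(iter(categories))
    return COMBINATION_RULES.get(categories)  # None for unlisted combinations
```

For example, under these placeholder rules, `classify_review("I recommend acceptance after minor revisions.")` resolves the accept and minor revision signals to a minor revision, while a review with no signal phrase is left unclassified.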

Results On the test set, 81% of assigned recommendations were correct according to hand coding (n = 149). Because the inclusion of additional phrases was associated with lower accuracy, this figure might indicate an upper bound in our experiment, given the limits of the current data and the idiosyncrasies of peer review language. Nonetheless, the algorithm’s accuracy was comparable to the rate of agreement between human hand coders (88%; n = 60). Over the full data, 14.3% of reviews were assigned a recommendation by this method (n = 473,443). This was comparable to the hand-coded identification of 18.3% of reviews containing an explicit recommendation (n = 399). From these results, we concluded that the inclusion of an explicit recommendation remains relatively uncommon in peer review, with the majority of peer reviewers leaving the final decision on the manuscript to the editor, although there was large variation between journals. Initial results nonetheless showed gender differences in reviewing behavior, with higher retrieval rates associated with reviewers who identified as men.
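The coverage figure above can be checked directly from the reported counts, and accuracy on a hand-coded sample can be computed in the same spirit. The sketch below is illustrative only: the function names are hypothetical, and treating accuracy as the share of assigned recommendations that match the hand code is an assumption based on the wording of the Results.

```python
from typing import Optional, Sequence


def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100 * part / whole


def accuracy_on_assigned(
    predicted: Sequence[Optional[str]], hand_coded: Sequence[str]
) -> float:
    """Percentage of assigned (non-None) predictions that match the hand code."""
    if len(predicted) != len(hand_coded):
        raise ValueError("inputs must have the same length")
    assigned = [(p, h) for p, h in zip(predicted, hand_coded) if p is not None]
    correct = sum(1 for p, h in assigned if p == h)
    return percentage(correct, len(assigned))


# Counts reported in the abstract: reviews assigned a recommendation
# out of all reviews in the data set.
coverage = percentage(473_443, 3_310_791)
print(f"{coverage:.1f}%")  # 14.3%
```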

Conclusions This work is among the first benchmarks for automatic classification of review recommendations on a large-scale, cross-domain database. Though preliminary, it paves the way for future developments, including studies of potential biases and inequalities in scholarly publishing through examination of the relationship between reviewer characteristics and review outcomes.

References

1. Lee CJ, Sugimoto CR, Zhang G, Cronin B. Bias in peer review. J Am Soc Inf Sci Tec. 2013;64(1):2-17. doi:10.1002/asi.22784
2. Sun M, Barry Danfa J, Teplitskiy M. Does double-blind peer review reduce bias? evidence from a top computer science conference. J Am Soc Inf Sci Tec. 2022;73(6):811-819. doi:10.1002/asi.24582
3. Larivière V, Ni C, Gingras Y, Cronin B, Sugimoto CR. Bibliometrics: global gender disparities in science. Nature. 2013;504:211-213. doi:10.1038/504211a

Conflict of Interest Disclosures None reported.

Funding/Support This work was supported by the Doctoral Training Unit on Data-Driven Computational Modelling and Applications (DRIVEN), which is funded by the Luxembourg National Research Fund under the PRIDE programme (PRIDE17/12252781).

Acknowledgment We thank Brad Demarest and Chaoqun Ni for contributions to an earlier phase of the project. We also thank the team at Publons who implemented the gender inference algorithm on their data before giving us access.

Additional Information Diego Kozlowski is a co-corresponding author.