Searching for Misconduct and Paper Mills in Peer-Review Comments

Adam Day

Peer Review Congress 2022 • September 11, 2022 • Chicago, United States

Keywords: misconduct; editorial and peer review process; peer review

Objective To test and compare methods for detecting duplicated text in peer reviews submitted by 2 or more reviewers.

Design Peer review fraud is a significant concern [1,2]. A data set of peer review comments submitted to SAGE Publishing was analyzed to search for duplicate text, a possible sign of fake peer review [3]. Peer review comments for each article peer reviewed by 19 SAGE Publishing journals were downloaded from the ScholarOne peer review management system and loaded into a Pandas DataFrame. Journals were chosen based on the availability of data; therefore, the data set should be considered biased. Similar comments were found using several search methods, including MinHash Locality Sensitive Hashing (MinHash LSH) for detecting near-duplicate text strings, and Elasticsearch, a scalable full-text search engine, combined with RapidFuzz, a fast string-comparison library, for distinguishing similar from dissimilar comments.
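To make the near-duplicate step concrete, the following is a minimal sketch of MinHash signatures with LSH banding, written with the Python standard library only. It is not the code used in the study: the shingle size, the number of hash functions, the band count, and the review identifiers are all illustrative assumptions, and a production pipeline would more likely rely on an existing MinHash LSH library and score the resulting candidates with a string-comparison library such as RapidFuzz.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_PERM = 64  # hash functions per signature (illustrative assumption)
BANDS = 16     # LSH bands, so 64 / 16 = 4 rows per band (illustrative assumption)


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased comment."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, shingle):
    """Deterministic 64-bit hash of a shingle; the seed selects the hash function."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text):
    """Signature = minimum hash of the shingle set under each hash function."""
    sh = shingles(text)
    return tuple(min(_hash(seed, s) for s in sh) for seed in range(NUM_PERM))


def candidate_pairs(comments):
    """Return pairs of ids whose signatures agree on at least one full band.

    comments: dict mapping a (hypothetical) review id to its comment text.
    """
    rows = NUM_PERM // BANDS
    buckets = defaultdict(set)
    for review_id, text in comments.items():
        sig = minhash_signature(text)
        for band in range(BANDS):
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets[key].add(review_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def jaccard(a, b):
    """Exact Jaccard similarity of two comments' shingle sets, for verification."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


# Made-up comments, for illustration only.
example_comments = {
    "rev-1": "The study is interesting and the methods are appropriate for the question asked",
    "rev-2": "The study is interesting and the methods are appropriate for the question posed",
    "rev-3": "Several figures lack axis labels which makes interpretation difficult",
}
for left, right in sorted(candidate_pairs(example_comments)):
    score = jaccard(example_comments[left], example_comments[right])
    print(left, right, round(score, 2))
```

Banding trades recall for speed: only comments that collide in at least one band are compared directly, so most dissimilar pairs are never scored. Candidate pairs are still probabilistic, which is why the sketch re-checks each one with an exact similarity measure.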

Results Of 62,974 peer reviewer accounts used to evaluate 66,815 articles, 357 accounts (0.6%) were identified that produced reviews with partially or fully duplicated comments. One large cluster of 47 accounts that shared a number of reports was associated with a number of articles rejected because of suspected paper mill activity. This finding suggests that the 47 accounts in the cluster were fake reviewer accounts administered by a paper mill. In total, 972 articles (1.5%) had reviews from reviewer accounts associated with duplicate commenting activity, and 77 articles had reviews from the 47 suspected paper mill accounts (Figure 33). Different search methods identified different suspect accounts and clusters. These searches included (1) a search for exact duplicates, which took 16 seconds to load data into memory and less than 1 second to execute and found 29 accounts that had produced duplicate comments; and (2) a search for similar comments using Elasticsearch, which took 18 minutes and 29 seconds to index and 9 hours, 19 minutes, and 2 seconds to execute and found 204 accounts that had produced similar comments.
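The exact-duplicate search and the grouping of accounts into clusters can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the `(account_id, article_id, comment)` schema is hypothetical, the light normalization (lowercasing and whitespace collapsing) is an added choice, and treating clusters as connected components of accounts that share a comment is one plausible reading of "cluster" that the abstract does not confirm.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def duplicate_groups(reviews):
    """Group accounts by identical (normalized) comment text.

    reviews: iterable of (account_id, article_id, comment) tuples
             (a hypothetical schema, not the ScholarOne export format).
    Returns one set of accounts per comment submitted by 2 or more accounts.
    """
    by_text = defaultdict(set)
    for account, _article, comment in reviews:
        by_text[normalize(comment)].add(account)
    return [accounts for accounts in by_text.values() if len(accounts) > 1]


def account_clusters(groups):
    """Merge overlapping account sets into clusters using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for accounts in groups:
        first, *rest = sorted(accounts)
        find(first)  # register the account even if it has no partner yet
        for other in rest:
            parent[find(other)] = find(first)

    clusters = defaultdict(set)
    for account in list(parent):
        clusters[find(account)].add(account)
    # Largest clusters first, members in sorted order.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


# Made-up records, for illustration only.
example_reviews = [
    ("acct-1", "art-1", "Good paper. Accept."),
    ("acct-2", "art-2", "good paper.  accept."),
    ("acct-2", "art-3", "Needs more citations."),
    ("acct-3", "art-4", "Needs more citations."),
    ("acct-4", "art-5", "A comment nobody else wrote."),
]
print(account_clusters(duplicate_groups(example_reviews)))
```

Grouping on a dictionary key is a single pass over the data, which is consistent with an exact search running far faster than a similarity search. As the conclusions note, a cluster found this way is only a lead: shared text can have innocent explanations, so each case needs review in context.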

Conclusions Efficient methods for identifying possible peer review fraud and paper mill activity were described. The methods should be tested on broader peer review data sets and in other settings. When duplication is found, the findings must be considered in context before a judgment can be made about whether misconduct has occurred.

References

1. Misra DP, Ravindran V, Agarwal V. Integrity of authorship and peer review practices: challenges and opportunities for improvement. J Korean Med Sci. 2018;33(46):e287. doi:10.3346/jkms.2018.33.e287
2. Cohen A, Pattanaik S, Kumar P, et al. Organised crime against the academic peer review system. Br J Clin Pharmacol. 2016;81(6):1012-1017. doi:10.1111/bcp.12992
3. Dadkhah M, Kahani M, Borchardt G. A method for improving the integrity of peer review. Sci Eng Ethics. 2018;24(5):1603-1610. doi:10.1007/s11948-017-9960-9

Conflict of Interest Disclosures None reported.