poster
Degree of Text Similarity and Prevalence of Potential Plagiarism in Biomedical Research Articles According to Linguistic Background and Field of Study
keywords:
copyright and intellectual property
ethics and ethical concerns
misconduct
Objective Text similarity detection software is widely used
by biomedical journals to screen submitted manuscripts for
potential plagiarism, with some journals rejecting
manuscripts with high overall similarity scores (eg, >40%)
without further review. However, considering that overall
scores may be vulnerable to false positives resulting from
common phrases, certain guidelines suggest examining the
single-source scores to detect potential plagiarism.1 The
degree of text similarity and prevalence of potential
plagiarism in biomedical articles were examined according to
linguistic background (English-speaking vs non–English-
speaking) and field of study (clinical vs nonclinical).
Design This cross-sectional study was performed in June
2020 and followed the STROBE reporting guideline. We
analyzed the iThenticate similarity reports of 480 articles
randomly selected from an open access multidisciplinary
journal, PLoS One. The articles were categorized by country (8
preselected countries) as English-speaking (USA, UK, Canada,
Australia) vs non–English-speaking (Korea, China, France,
Italy) and by field of study (6 fields) as clinical (cardiology,
gastroenterology, oncology) vs nonclinical (molecular biology,
genetics, microbiology). The degree of text similarity was
defined as the overall iThenticate score, and the presence of
potential plagiarism was defined as either (1) a single-source
score of greater than 10% according to the Springer Nature
guideline1 or (2) an overall score of greater than 40%, which is a
cutoff used at some journals for considering editorial
actions.2,3 The similarity score for each manuscript section
was measured by calculating the proportion of highlighted
text in that section using ImageJ.
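The two definitions of potential plagiarism given above can be illustrated with a minimal Python sketch. The class and function names are hypothetical and are not part of iThenticate or of the study's own workflow; only the 10% and 40% cutoffs come from the text. The two criteria are kept separate because the study reports prevalence for each cutoff individually.

```python
from dataclasses import dataclass
from typing import List

# Cutoffs stated in the Design section (percent). Constant names are hypothetical.
SINGLE_SOURCE_CUTOFF = 10.0
OVERALL_CUTOFF = 40.0


@dataclass
class SimilarityReport:
    """Hypothetical container for the scores read from one similarity report."""
    overall_score: float               # overall similarity score, in percent
    single_source_scores: List[float]  # per-source similarity scores, in percent


def flag_single_source(report: SimilarityReport) -> bool:
    """Criterion 1: any single-source score greater than 10%."""
    return max(report.single_source_scores, default=0.0) > SINGLE_SOURCE_CUTOFF


def flag_overall(report: SimilarityReport) -> bool:
    """Criterion 2: overall score greater than 40%."""
    return report.overall_score > OVERALL_CUTOFF
```

Both comparisons are strict ("greater than"), so a report with an overall score of exactly 40% is not flagged under criterion 2.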
Results The degree of text similarity differed significantly
among countries, with articles from non–English-speaking
countries having higher scores than those from English-
speaking countries (30.9% vs 23.8%, respectively; P < .001)
(Table 39). Among the non–English-speaking countries,
there was no significant difference in the degree of text
similarity between Asian and European countries (31.7% vs
30.1%, respectively; P = .27). Text similarity also differed
among fields of study, with clinical articles having higher
scores than nonclinical articles (29.5% vs 25.2%, respectively;
P < .001). Section-level measurement showed that the
Methods section had the highest degree of text similarity among
manuscript sections. The overall prevalence of potential
plagiarism was 13.5% (65/480) and 13.8% (66/480)
according to the single-source score cutoff of greater than
10% and the overall score cutoff of greater than 40%,
respectively. Except for the lower prevalence of potential
plagiarism in English-speaking than in non–English-speaking
countries according to the overall score cutoff (5.4% vs 22.1%,
respectively; P < .001), no statistically significant differences
were noted between English-speaking and non–English-speaking
countries, between Asian and European countries, or between
clinical and nonclinical articles.
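As a check on the arithmetic, the overall prevalence figures reported above can be reproduced from the stated counts (65/480 and 66/480). The sketch below is illustrative only; the function name is hypothetical, and rounding half up to 1 decimal place is an assumption about how the percentages were reported.

```python
from decimal import Decimal, ROUND_HALF_UP


def prevalence_percent(flagged: int, total: int) -> Decimal:
    """Return flagged/total as a percentage rounded half up to 1 decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    pct = Decimal(flagged) * 100 / Decimal(total)
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Counts taken from the Results section.
print(prevalence_percent(65, 480))  # single-source cutoff of >10%: prints 13.5
print(prevalence_percent(66, 480))  # overall cutoff of >40%: prints 13.8
```

Decimal arithmetic is used instead of floats so that 66/480 (exactly 13.75%) rounds to 13.8% rather than depending on binary floating-point representation.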
Conclusions While the degree of text similarity differed
significantly according to linguistic background and field of
study, the prevalence of potential plagiarism was similar
across countries and fields of study. Clinical researchers in
non–English-speaking countries in particular may benefit
from receiving English-language writing education to avoid
unintended text similarity.
References
1. Springer. Plagiarism prevention with CrossCheck. Accessed
February 24, 2022. https://www.springer.com/gp/authors-editors/editors/plagiarism-prevention-with-crosscheck/4238
2. IEEE Robotics & Automation Society. Information for
IROS editors. Accessed June 14, 2022. https://www.ieee-ras.org/conferences-workshops/financially-co-sponsored/iros/information-for-editors
3. ARRUS Journal of Mathematics and Applied Science.
Plagiarism policy. Accessed June 14, 2022. https://jurnal.ahmar.id/index.php/mathscience/plagiarism
Conflict of Interest Disclosures None reported.
Funding/Support This work was supported by grant 2019-781
from the Asan Institute for Life Sciences at Asan Medical Center,
Seoul, South Korea.
Role of the Funder/Sponsor The funder had no role in the
design and conduct of the study; collection, management, analysis,
and interpretation of the data; preparation, review, or approval of
the abstract; and decision to submit the abstract for presentation.