Poster
Quality of Reporting of Randomized Clinical Trials in Artificial Intelligence: A Systematic Review
Keywords: quality of reporting; reporting guidelines; artificial intelligence
Objective The aim of this study was to evaluate the reporting quality of randomized clinical trials (RCTs) of artificial intelligence (AI) in health care from 2015 to 2020 against the Consolidated Standards of Reporting Trials–Artificial Intelligence (CONSORT-AI)1 guideline.
Design In this systematic review, the PubMed and Embase databases were searched to identify eligible studies published from 2015 to 2020. Articles were included if AI (defined as AI, machine learning, or deep learning) was used as an intervention for a medical condition, if there was evidence of randomization, and if the study included a control group. Studies were excluded if they were nonrandomized, secondary studies, or post hoc analyses; if the intervention was not AI; if the target condition was not a medical disease; or if the study pertained to medical education. The included studies were graded by 2 independent reviewers using the CONSORT-AI checklist, which comprises 43 items. Any disagreements were resolved by consensus after discussion with a senior reviewer. Each item was scored as fully reported, partially reported, or not reported; irrelevant items were labeled as not applicable. The results were tabulated, and descriptive statistics were reported.
Results A total of 939 potential abstracts were screened, from which 73 full-text articles were reviewed for eligibility. Fifteen studies were included in the review. The number of participants ranged from 28 to 1058. Studies spanned the following medical fields: medicine (n = 2), psychiatry (n = 3), gastroenterology (n = 5), cardiology (n = 2), ophthalmology (n = 1), endocrinology (n = 1), and neurology (n = 1). Studies were from China (n = 6), the United States (n = 6), the United Kingdom (n = 1), the Netherlands (n = 1), and Israel (n = 1). Only 3 items of the CONSORT-AI checklist were fully reported in all 15 studies. Five items were not applicable in more than 85% of the studies (13 of 15). Three of 15 studies (20%) failed to report more than 50% of the CONSORT-AI checklist items.
Conclusions Reporting quality of RCTs of AI in health care was suboptimal. Because reporting varied across the analyzed RCTs, caution must be exercised when interpreting their outcomes.
Reference
1. Liu X, Faes L, Calvert MJ, Denniston AK. Extension of the CONSORT and SPIRIT statements. Lancet. 2019;394(10205):1225.
Conflict of Interest Disclosures None reported.