Poster
Quality of Reporting of Randomized Clinical Trials in Artificial Intelligence: A Systematic Review
Keywords: quality of reporting; reporting guidelines; artificial intelligence
Objective The aim of this study was to evaluate the reporting quality of randomized clinical trials (RCTs) of artificial intelligence (AI) in health care from 2015 to 2020 against the Consolidated Standards of Reporting Trials–Artificial Intelligence (CONSORT-AI)1 guideline.
Design In this systematic review, the PubMed and Embase databases were searched to identify eligible studies published from 2015 to 2020. Articles were included if AI (defined as AI, machine learning, or deep learning) was used as an intervention for a medical condition, if there was evidence of randomization, and if the study included a control group. Studies were excluded if they were nonrandomized, secondary studies, or post hoc analyses; if the intervention was not AI; if the target condition was not a medical disease; or if the study pertained to medical education. The included studies were graded by 2 independent reviewers using the CONSORT-AI checklist, which comprises 43 items. Any disagreements were resolved by consensus after discussion with a senior reviewer. Each item was scored as fully reported, partially reported, or not reported; irrelevant items were labeled as not applicable. The results were tabulated, and descriptive statistics were reported.
Results A total of 939 potential abstracts were screened, from which 73 full-text articles were reviewed for eligibility. Fifteen studies were included in the review. The number of participants ranged from 28 to 1058. Studies spanned the following medical fields: medicine (n = 2), psychiatry (n = 3), gastroenterology (n = 5), cardiology (n = 2), ophthalmology (n = 1), endocrinology (n = 1), and neurology (n = 1). Studies were from China (n = 6), the United States (n = 6), the United Kingdom (n = 1), the Netherlands (n = 1), and Israel (n = 1). Only 3 items of the CONSORT-AI checklist were fully reported in all 15 studies. Five items were not applicable in more than 85% of the studies (13 of 15). Three of 15 studies (20%) failed to report more than 50% of the CONSORT-AI checklist items.
Conclusions Reporting quality of RCTs of AI in health care was suboptimal. Because reporting varied across the analyzed RCTs, caution must be exercised when interpreting their outcomes.
Reference
1. Liu X, Faes L, Calvert MJ, Denniston AK. Extension of the CONSORT and SPIRIT statements. Lancet. 2019;394(10205):1225.
Conflict of Interest Disclosures None reported.