VIDEO DOI: https://doi.org/10.48448/t9mm-hf38

poster

AMA Research Challenge 2024

November 07, 2024

Virtual only, United States

Evaluating AI in Type 2 Diabetes Care: ChatGPT vs Internal Medicine Residents.

Background: Type 2 diabetes mellitus (T2DM) affects 11.6% of the US population and presents a significant global health challenge. Managing T2DM is expensive, costing Medicare $5,000 per person-year. A reliable source of information that the general population can easily access could significantly improve overall health literacy. With approximately 5% of all internet searches related to healthcare, generative artificial intelligence (GeAI) tools, including OpenAI's Chat Generative Pre-trained Transformer (ChatGPT), have gained popularity among patients seeking medical advice. This study aims to evaluate the accuracy of ChatGPT's responses to common questions on the prevention, diagnosis, and management of T2DM and to compare them with responses provided by first-year Internal Medicine residents enrolled in an ACGME-accredited program.

Methods: In this single-blinded observational study, a set of frequently asked questions on T2DM was compiled by physicians experienced in type 2 diabetes care and used to assess the accuracy of GeAI responses relative to those of resident physicians. The latest version of ChatGPT (GPT-4) served as the GeAI tool for the study. Each question was presented to ChatGPT three times, and two researchers independently summarized the majority response. The same set of questions was presented to 11 first-year Internal Medicine residents, and their responses were summarized by two members of the research team. Three board-certified internal medicine physicians, blinded to the source of each response, then reviewed and scored each response as either appropriate (aligning with the standard of care) or inappropriate (inconsistent with the standard of care).
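As an illustration only (the study itself relied on manual summarization by the researchers), the repeated-querying step described above could be scripted roughly as follows; the sample questions, helper names, and exact model snapshot are assumptions, and the study's real question set is not reproduced here.

# Minimal illustrative sketch, assuming the openai>=1.0 Python client:
# ask GPT-4 the same question three times and collect the replies so that
# reviewers can later summarize the majority response by hand.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical example questions; the study's actual FAQ list is not shown here.
QUESTIONS = [
    "What HbA1c target is appropriate for most adults with type 2 diabetes?",
    "How can type 2 diabetes be prevented in people with prediabetes?",
]

def ask_repeatedly(question: str, n: int = 3) -> list[str]:
    """Return the model's replies to the same question asked n times."""
    replies = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="gpt-4",  # model named in the abstract; exact snapshot is an assumption
            messages=[{"role": "user", "content": question}],
        )
        replies.append(resp.choices[0].message.content)
    return replies

if __name__ == "__main__":
    for q in QUESTIONS:
        print(q)
        for i, reply in enumerate(ask_repeatedly(q), start=1):
            print(f"  reply {i}: {reply[:100]}")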

Results: Of the 25 responses generated by ChatGPT, 92% (23/25) were rated appropriate and 8% (2/25) inappropriate, compared with 84% (21/25) appropriate and 16% (4/25) inappropriate among the residents' responses.
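The reported percentages follow directly from these counts; a minimal sketch of the arithmetic (the counts come from this abstract, everything else is illustrative):

# Proportion arithmetic behind the percentages reported above.
chatgpt = {"appropriate": 23, "inappropriate": 2}
residents = {"appropriate": 21, "inappropriate": 4}

def rates(counts: dict[str, int]) -> dict[str, float]:
    total = sum(counts.values())  # 25 scored responses per group
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

print(rates(chatgpt))    # {'appropriate': 92.0, 'inappropriate': 8.0}
print(rates(residents))  # {'appropriate': 84.0, 'inappropriate': 16.0}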

Conclusion: ChatGPT exhibited high consistency in providing appropriate responses, with 92% scored as appropriate. This suggests that ChatGPT's responses were compatible with the standard of care and even slightly better than those provided by resident physicians. While promising, this study focused on response appropriateness and did not assess patients' understanding of the responses provided. Furthermore, it did not evaluate the impact of human factors such as sympathy in delivering health-related information, which are essential to compassionate care. Continued research is needed to fully realize AI's benefits in healthcare while ensuring ethical implementation.
