Background
Patients increasingly consult online resources and Artificial Intelligence (AI) chatbots for medical information. However, the post-pandemic surge in medical misinformation and the rapid proliferation of AI have raised concerns about the quality and comprehensibility of these sources. Accurate, easy-to-understand patient education is crucial to procedural outcomes and prognosis, especially for procedures that require meticulous postoperative management, such as tracheostomy.

Methods
We evaluated five popular, high-ranking AI chatbots (ChatGPT, DeepSeek, Google Gemini, Microsoft Copilot, and Grok 3) alongside responses from a senior otolaryngologist, addressing 12 Frequently Asked Questions (FAQs) on tracheostomy care. FAQ selection was guided by a combination of search-listening tools (Google Trends, KeywordsPeopleUse, AlsoAsked, AnswerThePublic, Kwrds.ai), manual Google searches, and clinician input to reflect authentic patient language. Three blinded, board-certified otolaryngologists independently rated each response's accuracy, clarity, relevance, completeness, usefulness, and reference reliability using the Quality Analysis of Medical Artificial Intelligence (QAMAI), a standardized instrument developed to assess AI-generated content. Reviewers also recorded overall preferences and reasons for ranking a response as suboptimal. Readability and descriptive text analysis were performed with the Textstat and Natural Language Toolkit libraries in Python (Python Software Foundation), using the Simple Measure of Gobbledygook (SMOG), Flesch-Kincaid Grade Level (FKGL), and Reading Ease (FKRE) metrics; a minimal sketch of this analysis follows the abstract.

Results
Across the 12 FAQs, there were no significant differences in accuracy between AI-generated and otolaryngologist-generated responses. Google Gemini received higher overall preference and QAMAI scores than Microsoft Copilot (p<0.05). In terms of completeness, Gemini (p<0.001), DeepSeek (p<0.05), Grok 3 (p<0.05), and ChatGPT (p<0.05) outperformed the otolaryngologist, and each provided content deemed of "good quality" by QAMAI standards. However, Gemini also had the longest reading time (p<0.001) and the highest word (p<0.001), long-word (p<0.001), and difficult-word counts (p<0.001). Readability analyses showed that all models produced text at a high reading level of at least 10th to 13th grade. Inter-rater reliability among the three reviewers was 78%.

Conclusion
AI chatbots are a promising source of accurate, on-demand, and comprehensive health information for patients and families considering or managing tracheostomy care. Google Gemini demonstrated strength in content completeness and overall preference, highlighting the potential of AI to supplement time-limited physician-patient interactions. However, all AI- and otolaryngologist-generated responses exceeded the American Medical Association's recommended 6th-grade reading level, indicating a need for readability optimization. Future studies should focus on refining the readability of AI-generated content and assessing patient preferences across models.
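The sketch below illustrates, under stated assumptions, the kind of readability and descriptive text analysis named in Methods. It is not the authors' exact script: the 6-character "long word" threshold and the sample response text are illustrative assumptions, and only documented Textstat and NLTK calls are used.

```python
# Hedged sketch of the Methods readability/descriptive analysis using Textstat and NLTK.
# Assumptions (not from the source): long words = alphabetic tokens longer than 6 characters;
# the sample text is invented for illustration.
import textstat
import nltk
from nltk.tokenize import word_tokenize

# Tokenizer models required by word_tokenize (resource name varies by NLTK version).
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)


def analyze_response(text: str) -> dict:
    """Compute the readability metrics and descriptive counts reported in the study."""
    tokens = [w for w in word_tokenize(text) if w.isalpha()]
    return {
        # Readability metrics: SMOG, FKGL, and Reading Ease (FKRE)
        "smog": textstat.smog_index(text),
        "fkgl": textstat.flesch_kincaid_grade(text),
        "fkre": textstat.flesch_reading_ease(text),
        # Descriptive text statistics
        "word_count": len(tokens),
        "long_word_count": sum(len(w) > 6 for w in tokens),  # assumed threshold
        "difficult_word_count": textstat.difficult_words(text),
        "reading_time_s": textstat.reading_time(text),  # seconds, Textstat default rate
    }


if __name__ == "__main__":
    sample = (
        "Clean the inner cannula of your tracheostomy tube at least twice a day, "
        "or more often if secretions build up."
    )
    for metric, value in analyze_response(sample).items():
        print(f"{metric}: {value}")
```

In a study like this, each chatbot or clinician response would be passed through a function of this kind and the per-response metrics compared across sources.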