Lecture image placeholder

Premium content

Access to this content requires a subscription. You must be a premium user to view this content.

Monthly subscription - $9.99Pay per view - $4.99Access through your institutionLogin with Underline account
Need help?
Contact us
Lecture placeholder background
VIDEO DOI: https://doi.org/10.48448/znv4-qp91

poster

ACL 2024

August 13, 2024

Bangkok, Thailand

Evaluating Large Language Models for Health-related Queries with Presuppositions

keywords:

false assumptions

evaluation of llms

presuppositions

health

benchmark

As corporations rush to integrate large language models (LLMs) it is critical that they provide factually accurate information, that is robust to any presuppositions that a user may express. In this work, we introduce UPHILL, a dataset consisting of health-related queries with varying degrees of presuppositions. Using UPHILL, we evaluate the factual accuracy and consistency of InstructGPT, ChatGPT, GPT-4 and Bing Copilot models. We find that while model responses rarely contradict true health claims (posed as questions), all investigated models fail to challenge false claims. Alarmingly, responses from these models agree with 23-32% of the existing false claims, and 49-55% with novel fabricated claims. As we increase the extent of presupposition in input queries, responses from all models except Bing Copilot agree with the claim considerably more often, regardless of its veracity. Given the moderate factual accuracy, and the inability of models to challenge false assumptions, our work calls for a careful assessment of current LLMs for use in high-stakes scenarios.

Downloads

SlidesTranscript English (automatic)

Next from ACL 2024

MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering
poster

MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering

ACL 2024

Phillip SchneiderFlorian MatthesJuraj Vladika
Juraj Vladika and 2 other authors

13 August 2024

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Lectures
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2023 Underline - All rights reserved