EMNLP 2025

November 05, 2025

Suzhou, China


Automatically generated radiology reports often receive high scores from existing evaluation metrics but fail to earn clinicians’ trust. This gap reveals fundamental flaws in how current metrics assess the quality of generated reports. We rethink the design and evaluation of these metrics and propose a clinically grounded Meta-Evaluation Framework. We define criteria spanning clinical alignment and key metric capabilities, including discrimination, robustness, and monotonicity. Using a fine-grained dataset of ground-truth and rewritten report pairs annotated with error types and clinical significance labels, we systematically evaluate widely used metrics and uncover their limitations, such as failing to distinguish clinically significant errors, over-penalizing harmless variations, or lacking consistency across error severity levels. Our framework and dataset offer guidance for building more clinically reliable evaluation methods.
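
To make the capability criteria concrete, the sketch below shows one way discrimination and monotonicity checks could be phrased for an arbitrary report metric. It is an illustrative assumption, not the framework's actual implementation: the helper names (`token_f1`, `is_monotonic`, `discrimination_gap`), the toy set-based token F1 metric, and the example sentences are all hypothetical and are not taken from the paper or its dataset.

```python
from typing import Callable, Sequence

Metric = Callable[[str, str], float]


def token_f1(reference: str, candidate: str) -> float:
    """Toy set-based token F1; a hypothetical stand-in for any report metric."""
    ref = set(reference.lower().split())
    cand = set(candidate.lower().split())
    overlap = len(ref & cand)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def is_monotonic(metric: Metric, reference: str,
                 rewrites_by_severity: Sequence[str]) -> bool:
    """True if the score never increases as rewrites go from least to most severe."""
    scores = [metric(reference, r) for r in rewrites_by_severity]
    return all(a >= b for a, b in zip(scores, scores[1:]))


def discrimination_gap(metric: Metric, reference: str,
                       harmless: str, significant: str) -> float:
    """Score of a harmless rewrite minus score of a clinically significant error.

    A positive gap means the metric ranks the harmless rewrite higher, as desired.
    """
    return metric(reference, harmless) - metric(reference, significant)


# Hypothetical example sentences, invented for illustration only.
reference = "no pleural effusion or pneumothorax"
harmless = "there is no evidence of pleural effusion or pneumothorax"
significant = "large pleural effusion or pneumothorax"

gap = discrimination_gap(token_f1, reference, harmless, significant)
print(f"discrimination gap: {gap:.3f}")
```

In this toy setting the gap comes out negative: the lexical metric scores the harmless paraphrase below the rewrite that flips the finding, which is the kind of over-penalization of harmless variation and missed clinically significant error that the abstract describes.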

