EMNLP 2025

November 07, 2025

Suzhou, China

Although Large Language Models (LLMs) have demonstrated significant potential in medical diagnostics and clinical decision-making, existing biomedical NLP benchmarks focus primarily on qualitative reasoning tasks. They lack rigorous evaluation of the quantitative computation that is used extensively in clinical settings, particularly in Chinese-language scenarios. To address this gap, we introduce CMedCalc-Bench, the first fine-grained benchmark specifically designed for Chinese medical calculation tasks. CMedCalc-Bench consists of 69 typical calculation tasks spanning multiple clinical domains such as cardiology, endocrinology, nephrology, and emergency medicine, and features over 1,000 real-world Chinese clinical cases. We develop a multi-stage evaluation framework that evaluates clinical entity extraction and numerical computation separately, so that model deficiencies can be diagnosed at each stage. Experimental results show that existing mainstream models significantly underperform on Chinese medical computation tasks, with inaccurate entity recognition and imprecise numerical calculation as the critical failure modes.
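As an illustration of what stage-wise scoring could look like, the sketch below scores one case in three ways: whether the extracted entities match the reference, whether the model's arithmetic is consistent with the entities it extracted, and whether the final answer matches the reference. This is not the benchmark's actual implementation. The names `evaluate_case` and `StageResult`, the tolerance values, and the choice of body mass index as the example calculation are all assumptions made for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class StageResult:
    extraction_ok: bool   # did the model extract the right clinical values?
    computation_ok: bool  # is the answer consistent with what it extracted?
    final_ok: bool        # does the answer match the reference answer?


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index, used here only as an example calculation."""
    return weight_kg / height_m ** 2


def evaluate_case(gold_entities, pred_entities, pred_answer, formula, rel_tol=0.01):
    """Hypothetical stage-wise scoring of a single case (assumed design)."""
    # Stage 1: every reference entity must be present with a matching value.
    extraction_ok = all(
        name in pred_entities
        and math.isclose(pred_entities[name], value, rel_tol=1e-6)
        for name, value in gold_entities.items()
    )

    # Stage 2: recompute the formula from the model's OWN extracted values,
    # so an arithmetic slip is not confused with an extraction error.
    if all(name in pred_entities for name in gold_entities):
        recomputed = formula(**{n: pred_entities[n] for n in gold_entities})
        computation_ok = math.isclose(pred_answer, recomputed, rel_tol=rel_tol)
    else:
        computation_ok = False

    # End to end: compare against the answer derived from reference entities.
    final_ok = math.isclose(pred_answer, formula(**gold_entities), rel_tol=rel_tol)
    return StageResult(extraction_ok, computation_ok, final_ok)


# Example: the model misreads the height but computes correctly from it.
gold = {"weight_kg": 70.0, "height_m": 1.75}
pred = {"weight_kg": 70.0, "height_m": 1.70}
result = evaluate_case(gold, pred, pred_answer=24.22, formula=bmi)
```

In this example the final answer is wrong, but the per-stage flags attribute the failure to extraction rather than to arithmetic, which is the kind of diagnosis a multi-stage framework is meant to support.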


