EMNLP 2025

November 05, 2025

Suzhou, China


Systems that answer questions by reviewing the scientific literature are becoming increasingly feasible. To draw reliable conclusions, these systems should take into account the quality of the available evidence, placing more weight on studies that use a valid methodology. We present a benchmark for measuring the methodological strength of biomedical papers, drawing on the risk-of-bias framework used for systematic reviews. Derived from over 500 biomedical studies, the three benchmark tasks cover expert reviewers' judgments of the studies' research methodologies, including their assessments of the risk of bias in these studies. The benchmark includes a human-validated annotation pipeline for fine-grained alignment of reviewers' judgments with research paper sentences. Our analyses show that a system's reasoning and retrieval capabilities affect its risk-of-bias assessment. The dataset is available at https://github.com/RoBBR-Benchmark/RoBBR.
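
As a rough illustration of how a system's risk-of-bias labels could be scored against such a benchmark, the sketch below loads a split and computes label agreement with expert judgments. The file name (robbr_test.json) and field names (id, bias_label) are illustrative assumptions, not the repository's actual schema; consult the RoBBR repository for the real data format and evaluation protocol.

```python
# Minimal sketch: scoring a system's risk-of-bias labels against expert labels.
# The file name and field names below are assumptions for illustration only;
# see https://github.com/RoBBR-Benchmark/RoBBR for the actual schema.
import json
from pathlib import Path


def load_examples(path: Path) -> list[dict]:
    """Load benchmark examples, assumed to be stored as a JSON list of dicts."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def accuracy(examples: list[dict], predictions: dict[str, str]) -> float:
    """Fraction of examples whose predicted label matches the expert label."""
    correct = sum(
        1
        for ex in examples
        if predictions.get(ex["id"]) == ex["bias_label"]  # hypothetical fields
    )
    return correct / len(examples) if examples else 0.0


if __name__ == "__main__":
    examples = load_examples(Path("robbr_test.json"))  # hypothetical file name
    # Trivial majority-style baseline: predict the same label for every study.
    preds = {ex["id"]: "high_risk" for ex in examples}
    print(f"Label agreement with experts: {accuracy(examples, preds):.3f}")
```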

