EMNLP 2025

November 07, 2025

Suzhou, China

We present IMO-Bench, a suite of advanced reasoning benchmarks that aims for robust evaluation and specifically targets the level of the International Mathematical Olympiad (IMO), the most prestigious venue for competitive math. IMO-Bench consists of diverse and challenging problems vetted by a panel of top IMO medalists and mathematicians. The first benchmark, IMO-AnswerBench, consists of 400 problems with verifiable answers, curated from past Olympiad competitions and then altered by experts to make evaluation more robust. The latest frontier models struggle on this benchmark, with final-answer accuracy below 48%. To advance the field beyond simple short-answer evaluation, we design IMO-ProofBench, which consists of both basic and novel problems and comes with detailed grading guidelines for full-proof evaluation. Expert grading reveals that even the best model achieves less than 36% on this benchmark. To reduce grading cost, we share an automatic grader for the basic set that correlates highly with human expert evaluations. Finally, we construct IMO-MistakeBench, a benchmark for identifying the first incorrect step in a full solution. Together, we hope that IMO-Bench contributes to advancing robust mathematical reasoning.
