EMNLP 2025

November 06, 2025

Suzhou, China

Large language models (LLMs) are increasingly integral as productivity assistants, but existing benchmarks fall short of rigorously evaluating their real-world instruction-following capabilities. Current benchmarks often (i) lack sufficient multilinguality, (ii) fail to capture the implicit constraints inherent in user requests, and (iii) overlook the complexities of multi-turn dialogue. To address these gaps and provide a more realistic assessment, we introduce ProductivityBench, a benchmark designed specifically for LLM-based productivity assistants. ProductivityBench features input prompts in 12 languages, incorporates intra-instance multilingual instructions, applies rigorous evaluation criteria that capture both explicit and implicit constraints, and includes complex multi-turn dialogue scenarios with both accumulating constraints and context switches. Furthermore, to ensure reliable evaluation, we refined the constraints using an LLM validator. Extensive experiments demonstrate that ProductivityBench is significantly more challenging than existing benchmarks; for instance, a strong model like GPT-o1 achieved only a 69.07% overall pass rate. ProductivityBench offers a demanding and realistic assessment of LLMs in practical productivity settings, highlighting both their capabilities and their limitations.
