EMNLP 2025

November 07, 2025

Suzhou, China

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

Recently, researchers have turned to synthetic tasks for evaluation of large language models' long-context capabilities, as they offer more flexibility than realistic benchmarks in scaling both input length and dataset size. However, existing synthetic tasks typically target narrow skill sets such as retrieving information from massive input, limiting their ability to comprehensively assess model capabilities. Furthermore, existing benchmarks often pair each task with a different input context, creating confounding factors that prevent fair cross-task comparison. To address these limitations, we introduce SYNC, a new evaluation suite of synthetic tasks spanning domains including graph understanding and translation. Each domain includes three tasks designed to test a wide range of capabilities---from retrieval, to multi-hop tracking, and to global context understanding that that requires chain-of-thought (CoT) reasoning. Crucially, all tasks share the same context, enabling controlled comparisons of model performance. We evaluate 14 LLMs on SYNC and observe substantial performance drops on more challenging tasks, underscoring the benchmark’s difficulty. Additional experiments highlight the necessity of CoT reasoning and demonstrate that \benchmark poses a robust challenge for future models.

Downloads

SlidesPaperTranscript English (automatic)

Next from EMNLP 2025

PRIME: Large Language Model Personalization with Cognitive Dual-Memory and Personalized Thought Process
poster

PRIME: Large Language Model Personalization with Cognitive Dual-Memory and Personalized Thought Process

EMNLP 2025

Xinliang Frederick ZhangNick Beauchamp
Nick Beauchamp and 2 other authors

07 November 2025

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2025 Underline - All rights reserved