EMNLP 2025

November 05, 2025

Suzhou, China

Language models (LMs), like other neural networks, often favor shortcut heuristics based on surface-level patterns. LMs in particular behave like n-gram models early in training, but eventually learn hierarchical syntax to correctly apply grammatical rules out-of-distribution (OOD). This paper uses case studies of English grammar to explore how complex, diverse training data drives OOD generalization. We construct a framework that unifies our understanding of random variation with training dynamics, rule selection with memorization, and data diversity with complexity. We show that these factors are nuanced, and that intermediate levels of diversity and complexity lead to inconsistent behavior across random seeds and to unstable training dynamics. Our findings emphasize the critical role of training data in shaping generalization patterns and illuminate how competing model strategies lead to inconsistent training outcomes.
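
To make the contrast between a surface-level shortcut and a hierarchical grammatical rule concrete, the sketch below uses English yes/no question formation, a standard case study in this line of work (not necessarily the exact case study in this paper): a linear heuristic fronts the first auxiliary it encounters, while the hierarchical rule fronts the main-clause auxiliary. The example sentence, rule names, and hand-supplied parse index are illustrative assumptions, not taken from the paper.

    # Illustrative sketch (hypothetical example): a linear "shortcut" rule vs.
    # the hierarchical rule for English yes/no question formation.

    def linear_rule(tokens):
        """Shortcut heuristic: front the FIRST auxiliary in the string."""
        i = next(idx for idx, tok in enumerate(tokens) if tok == "is")
        return [tokens[i]] + tokens[:i] + tokens[i + 1:]

    def hierarchical_rule(tokens, main_aux_index):
        """Grammatical rule: front the MAIN-CLAUSE auxiliary.
        The main-clause auxiliary position would normally come from a
        syntactic parse; here it is supplied by hand for illustration."""
        i = main_aux_index
        return [tokens[i]] + tokens[:i] + tokens[i + 1:]

    sentence = "the cat that is black is sleeping".split()

    print(" ".join(linear_rule(sentence)) + "?")
    # -> "is the cat that black is sleeping?"   (ungrammatical)

    print(" ".join(hierarchical_rule(sentence, main_aux_index=5)) + "?")
    # -> "is the cat that is black sleeping?"   (grammatical)

On in-distribution sentences with a single auxiliary, both rules produce the same output, so a model can fit the training data with either; only OOD sentences like the one above reveal which rule was learned.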

Downloads

  • Slides
  • Paper
  • Transcript (English, automatic)
