
Language models have great potential as cognitive models for studying human language acquisition, but current models are far less data-efficient than human learners. Children acquire language from 100 million words or fewer, but large language models are trained on trillions of words. We discuss the prospects for improving language models’ developmental plausibility through a meta-analysis of results from the 2023 BabyLM Challenge. BabyLM was a competition that invited participants to train a language model on a 100-million-word corpus including transcribed speech and child-appropriate texts. Results from over 30 submissions showed that new machine learning techniques and increased training iterations yielded models that outperformed leading large language models in grammar, language understanding, and linguistic generalization, while cognitively plausible approaches such as curriculum learning were less effective. We discuss the implications of these and other findings for computational cognitive modeling and explore ways to ensure that future competitions contribute to cognitive science.
Authors:
Alex Warstadt: ETH Zurich; Aaron Mueller: Northeastern University; Leshem Choshen: IBM; Ethan Gotlieb Wilcox: ETH Zurich; Chengxu Zhuang: MIT; Adina Williams: Meta Platforms Inc.; Ryan Cotterell: Institute for Machine Learning; Tal Linzen: New York University
