EMNLP 2025

November 07, 2025

Suzhou, China


Recent work shows strong transfer-learning capability to unseen languages in sequence-to-sequence neural networks, under the assumption that high-quality word representations are available for the target language. We evaluate whether this direction is a viable path forward for translation from low-resource languages by investigating how much data is required to learn such high-quality word representations. We first show that learning word embeddings separately from the translation model enables rapid adaptation to new languages with only a few hundred sentences of parallel data. To test whether the current bottleneck in transfer to low-resource languages lies mainly in learning the word representations, we then train word embedding models on varying amounts of data and plug them into a machine translation model. We show that in this simulated low-resource setting, with only 500 parallel sentences and 31,250 sentences of monolingual data, we can exceed 15 BLEU on FLORES for unseen languages. Finally, we investigate why the results are less favorable on a real low-resource language, and find fault with the publicly available multilingual language-modelling datasets.
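
The sketch below illustrates the "plug-in" idea the abstract describes: word embeddings are trained separately on monolingual data and then copied into (and frozen in) the embedding layer of a translation model, so the remaining parameters can be adapted with only a small amount of parallel data. This is a minimal, assumed PyTorch-style implementation; the model class, tensor shapes, and the stand-in embedding tensor are illustrative and not the authors' actual code.

```python
import torch
import torch.nn as nn

# Hedged sketch: plug separately trained word embeddings into a translation
# model. TranslationModel and all sizes below are illustrative assumptions.

vocab_size, embed_dim = 8000, 512

class TranslationModel(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.src_embed = nn.Embedding(vocab_size, embed_dim)
        self.tgt_embed = nn.Embedding(vocab_size, embed_dim)
        self.transformer = nn.Transformer(d_model=embed_dim, batch_first=True)
        self.out = nn.Linear(embed_dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src = self.src_embed(src_ids)   # (batch, src_len, embed_dim)
        tgt = self.tgt_embed(tgt_ids)   # (batch, tgt_len, embed_dim)
        return self.out(self.transformer(src, tgt))

model = TranslationModel(vocab_size, embed_dim)

# Stand-in for embeddings trained separately on monolingual data for the new
# language; in practice this would be loaded from a saved embedding model.
pretrained = torch.randn(vocab_size, embed_dim)

# Copy the pre-trained vectors into the source-side embedding layer and freeze
# them, so only the rest of the model needs to adapt to the new language.
with torch.no_grad():
    model.src_embed.weight.copy_(pretrained)
model.src_embed.weight.requires_grad = False
```
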

