EMNLP 2025

November 08, 2025

Suzhou, China

Multilingual Pre-trained Language Models (multiPLMs) trained with the Masked Language Modeling (MLM) objective exhibit suboptimal performance on cross-lingual downstream tasks for Low-Resource Languages (LRLs). Continually pre-training these multiPLMs with the Translation Language Modeling (TLM) objective on parallel data improves cross-lingual performance. However, both MLM and TLM mask tokens randomly, which does not guarantee optimal representation learning. In this paper, we introduce a novel masking strategy, Linguistic Entity Masking (LEM), to improve the cross-lingual representations of existing multiPLMs. In contrast to MLM and TLM, LEM limits masking to the linguistic entities nouns, verbs, and named entities, which hold higher prominence in a sentence. We hypothesise that masking these linguistically significant entities contributes to more effective representation learning. We validate this hypothesis empirically on two downstream tasks with three LRL pairs, English-Sinhala, English-Tamil, and Sinhala-Tamil, and show that LEM-based learning yields superior results compared to MLM+TLM.
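For illustration only, here is a minimal sketch of entity-restricted masking as the abstract describes it. It assumes spaCy's English pipeline for POS and named-entity tagging; the paper's actual tagger, masking rate, and subword handling are not given on this page, so those choices below are placeholders, and real MLM/TLM training would mask subword token IDs rather than surface words.

```python
import random
import spacy

# Assumed tagger and mask rate; both are placeholders, not the paper's setup.
nlp = spacy.load("en_core_web_sm")
MASK = "[MASK]"
MASK_RATE = 0.15  # standard MLM rate, assumed here

def lem_mask(sentence: str) -> str:
    """Mask only nouns, verbs, and named-entity tokens (word-level sketch)."""
    doc = nlp(sentence)
    # Tokens belonging to any named-entity span.
    ent_token_ids = {tok.i for ent in doc.ents for tok in ent}
    out = []
    for tok in doc:
        # Candidate linguistic entities: nouns (incl. proper nouns), verbs, NEs.
        is_candidate = tok.pos_ in {"NOUN", "PROPN", "VERB"} or tok.i in ent_token_ids
        out.append(MASK if is_candidate and random.random() < MASK_RATE else tok.text)
    return " ".join(out)

print(lem_mask("The president of France visited Colombo last week."))
```

Unlike random masking, every masked position here is guaranteed to be a noun, verb, or named entity, which is the core distinction the abstract draws between LEM and MLM/TLM.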
