EMNLP 2025

November 05, 2025

Suzhou, China

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

Arabic diacritics, similar to short vowels in English, provide phonetic and grammatical information but are typically omitted in written Arabic, leading to ambiguity. Diacritization (aka diacritic restoration or vowelization) is essential for natural language processing. This paper advances Arabic diacritization through the following contributions: first, we propose a methodology to analyze and refine a large diacritized corpus to improve training quality. Second, we introduce WikiNews-2024, a multi-reference evaluation methodology with an updated version of the standard benchmark WikiNews-2014''. In addition, we propose a BiLSTM-based model that achieves state-of-the-art results with 3.12% and 2.70% WER on WikiNews-2014 and WikiNews-2024. Moreover, we develop a model that preserves user-specified diacritics while maintaining accuracy. Lastly, we demonstrate that augmenting training data enhances performance in low-resource settings.

Downloads

SlidesPaperTranscript English (automatic)

Next from EMNLP 2025

WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning
poster

WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning

EMNLP 2025

+4
Julian McAuley and 6 other authors

05 November 2025

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2025 Underline - All rights reserved