EMNLP 2025

November 08, 2025

Suzhou, China


Arabic Word Sense Disambiguation (WSD) remains challenging due to the language’s rich morphology, pervasive polysemy, and lack of large annotated resources. This study benchmarks four generative Large Language Models (LLMs)—GPT-4o, LLaMA 3.1-8B, Qwen 2.5-7B, and Gemma 2-9B—on two public Arabic WSD datasets under both zero-shot and fine-tuned conditions. Results show that GPT-4o achieves the strongest zero-shot performance (79% accuracy, 66% macro-F1), while parameter-efficient fine-tuning of the open models via LoRA closes this gap and, in places, surpasses it: Qwen 2.5-7B attains 90.77% accuracy and 83.98% F1 on Dataset A, and LLaMA 3.1-8B reaches 88.51% accuracy and 69.41% F1 on Dataset B. The findings demonstrate that medium-sized open LLMs can serve as competitive, reproducible baselines for Arabic sense-level understanding when modest supervision is applied.
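The abstract's fine-tuning setup relies on LoRA, which adapts a frozen weight matrix W by adding a trainable low-rank update (alpha/r)·A·B instead of updating W itself. The sketch below illustrates that core mechanism in plain Python; the rank r, scaling alpha, and toy matrices are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal sketch of the LoRA forward pass: y = x·W + (alpha/r)·x·A·B.
# All dimensions and values here are toy assumptions for illustration.

def matmul(X, Y):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_forward(x, W, A, B, alpha, r):
    """Frozen base projection x·W plus the scaled low-rank update x·A·B."""
    base = matmul(x, W)                  # frozen pretrained path
    delta = matmul(matmul(x, A), B)      # trainable low-rank path (rank r)
    scale = alpha / r
    return [[bv + scale * dv for bv, dv in zip(brow, drow)]
            for brow, drow in zip(base, delta)]

# Toy dimensions: d_in = d_out = 2, rank r = 1
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen weight (identity, for clarity)
A = [[1.0], [1.0]]             # trainable down-projection (d_in × r)
B = [[0.5, 0.5]]               # trainable up-projection (r × d_out)
x = [[1.0, 2.0]]               # one input row vector

y = lora_forward(x, W, A, B, alpha=1.0, r=1)
print(y)  # → [[2.5, 3.5]]: base output [1.0, 2.0] shifted by the low-rank update
```

Only A and B (2·d·r parameters per adapted matrix, versus d² for W) receive gradients, which is what makes fine-tuning 7B–9B models tractable on modest hardware.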


