Arabic Word Sense Disambiguation (WSD) remains challenging due to the language's rich morphology, pervasive polysemy, and scarcity of large annotated resources. This study benchmarks four generative Large Language Models (LLMs)—GPT-4o, LLaMA 3.1-8B, Qwen 2.5-7B, and Gemma 2-9B—on two public Arabic WSD datasets under both zero-shot and fine-tuned conditions. Results show that GPT-4o achieves the strongest zero-shot performance (79% accuracy, 66% macro-F1), while parameter-efficient fine-tuning of open models via LoRA closes this gap and surpasses it. Qwen 2.5-7B attains 90.77% accuracy and 83.98% macro-F1 on Dataset A, and LLaMA 3.1-8B reaches 88.51% accuracy and 69.41% macro-F1 on Dataset B. The findings demonstrate that medium-sized open LLMs can serve as competitive, reproducible baselines for Arabic sense-level understanding when modest supervision is applied.
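The abstract reports both accuracy and macro-F1, which can diverge sharply (e.g. 79% vs. 66% in the zero-shot setting) when sense distributions are skewed. A minimal sketch of how these two WSD metrics are computed from gold and predicted sense labels — the sense labels below are purely illustrative, not drawn from the datasets in the study:

```python
# Sketch of the two metrics reported for Arabic WSD: accuracy and macro-F1.
# The sense labels here are hypothetical examples, not from Dataset A or B.

def accuracy(gold, pred):
    """Fraction of instances whose predicted sense matches the gold sense."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def macro_f1(gold, pred):
    """Unweighted mean of per-sense F1 scores.

    Macro-F1 gives rare senses the same weight as frequent ones,
    which is why it typically trails accuracy on skewed sense
    distributions, as in the zero-shot results above.
    """
    labels = set(gold) | set(pred)
    f1s = []
    for label in labels:
        tp = sum(g == p == label for g, p in zip(gold, pred))
        fp = sum(p == label and g != label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        rec = tp / (tp + fn) if (tp + fn) else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if (prec + rec) else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical gold/predicted sense labels for four test instances.
gold = ["bank_river", "bank_money", "bank_money", "bank_river"]
pred = ["bank_river", "bank_money", "bank_river", "bank_river"]
print(accuracy(gold, pred))            # 0.75
print(round(macro_f1(gold, pred), 3))  # 0.733
```

Here the minority sense (`bank_money`) is recalled only half the time, so macro-F1 (0.733) falls below accuracy (0.75) — the same pattern, at larger scale, behind the accuracy/macro-F1 gaps in the reported results.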
