workshop paper
Challenges in Urdu Machine Translation
keywords:
machine translation
low-resource languages
Recent advances in Neural Machine Translation (NMT) have significantly improved performance on a range of translation benchmarks. However, NMT systems still face numerous challenges when translating low-resource languages. In this work, we investigate the specific issues that machine translation systems encounter with Urdu, a low-resource Indo-Aryan language. We first conduct a comprehensive evaluation of four diverse models on Urdu machine translation: GPT-3.5 (a large language model), opus-mt-en-ur (a bilingual English-to-Urdu translation model), NLLB (a multilingual model trained to translate between 200 languages), and IndicTrans2 (a model specialized for translating low-resource Indian languages). The results show that IndicTrans2 significantly outperforms the other models on Urdu machine translation. We explain the reasons for IndicTrans2's superior performance, highlight the specific challenges that each model encounters in Urdu translation, and suggest directions for further improving Urdu machine translation systems.