

poster
Feriji: A French-Zarma Parallel Corpus, Glossary & Translator
keywords:
zarma
nlp
machine translation
Machine Translation (MT) is a rapidly expanding field that has advanced significantly in recent years, with the development of models capable of translating between many languages with remarkable accuracy. However, African languages remain poorly represented in this field because of their linguistic complexity and limited resources. This applies to Zarma, a dialect of Songhay (of the Nilo-Saharan language family) spoken by over 5 million people across Niger and neighboring countries (Lewis et al., 2016). This paper introduces Feriji, the first robust French-Zarma parallel corpus and glossary designed for MT. The corpus of 33,059 aligned sentences and the glossary of 4,062 words represent a significant step toward addressing the shortage of resources for Zarma. We fine-tune three large language models on our dataset, and the best-performing model obtains a BLEU score of 30.06. We further evaluate the models through human judgements of fluency, comprehension, and readability, as well as of the importance and impact of the corpus and models. Our contributions help bridge a significant language gap and promote an important but overlooked indigenous African language.
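For readers unfamiliar with the BLEU figure quoted above, the following is a minimal sketch of standard corpus-level BLEU (clipped n-gram precisions up to order 4, combined by a geometric mean and scaled by a brevity penalty). It is an illustration only, not the evaluation code behind the reported 30.06: whitespace tokenization, a single reference per sentence, and the absence of smoothing are all assumptions here, and standard tools such as sacreBLEU apply their own tokenization, so their scores may differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale.

    Assumptions (for illustration): one reference per hypothesis,
    whitespace tokenization, and no smoothing of zero precisions.
    """
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # a zero precision makes the geometric mean zero
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"]))
```

Because matches and totals are summed over the whole corpus before the precisions are taken, a score like 30.06 describes the test set as a whole rather than an average of per-sentence scores.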