News classification is a key task in Natural Language Processing (NLP) that involves the automatic categorization of articles into predefined topics. While extensively studied in high-resource languages such as English, low-resource languages such as Amharic face significant challenges owing to limited datasets and the absence of language-specific model adaptations. This study contributes to Amharic news classification by expanding the existing Amharic news dataset from 50,000 to 144,000 articles, nearly tripling its size. We evaluate five transformer-based models—mBERT, XLM-R, DistilBERT, AfriBERTa, and AfroXLM—on this enriched dataset to determine which yields the greatest gains in classification accuracy and effectiveness. Model performance was assessed using accuracy, precision, recall, and F1-score, with careful attention to computational efficiency and overfitting. Among the models, AfriBERTa and XLM-R achieved the highest F1-scores of 90.25% and 90.11%, respectively, significantly surpassing the Naïve Bayes baseline, which reached only 60.3% accuracy on the original 50,000-article dataset. These findings underscore the importance of both advanced transformer architectures and larger, high-quality datasets in boosting classification performance for under-resourced languages such as Amharic, offering valuable insights and resources for future NLP research.
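A minimal sketch of the fine-tuning and evaluation pipeline the abstract describes, using the Hugging Face Transformers `Trainer` API. The checkpoint name (`castorini/afriberta_base`), file names, column names, label count, and hyperparameters are illustrative assumptions, not the authors' exact configuration; the expanded 144,000-article dataset and the paper's training settings are not reproduced here.

```python
# Sketch: fine-tune a transformer for Amharic news classification and report
# the metrics named in the abstract (accuracy, precision, recall, F1).
# All names marked "assumed" are placeholders, not the authors' setup.
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "castorini/afriberta_base"  # assumed checkpoint; swap in mBERT, XLM-R, etc.
NUM_LABELS = 6                      # assumed number of news categories

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=NUM_LABELS
)

# Assumed CSVs with a "text" column and an integer-encoded "label" column.
data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})
data = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
)

def compute_metrics(eval_pred):
    """Compute the study's four evaluation metrics from model logits."""
    logits, labels = eval_pred
    preds = logits.argmax(axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted"
    )
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=data["train"],
    eval_dataset=data["test"],
    tokenizer=tokenizer,  # enables dynamic padding per batch
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```

Comparing models under this setup is then a matter of re-running the same loop with a different `MODEL` checkpoint, which is one plausible way the five transformers could have been evaluated against a common dataset split.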
