Relation extraction (RE) is a core task in natural language processing, crucial for semantic understanding, knowledge graph construction, and enhancing downstream applications. However, Arabic RE remains challenging due to the language’s rich morphology, orthographic ambiguity, syntactic complexity, and wide dialectal variation. To advance research in this area, we present the largest and most diverse Arabic RE corpus to date: over 33K sentences (approximately 550K tokens) annotated with approximately 15K relation triples across 40 relation types. All annotations were manually curated by expert annotators, with an inter-annotator agreement of 85.2% (Cohen's κ), indicating high reliability. We benchmark the dataset using both supervised models and in-context learning with LLMs. Supervised models obtain an F1 score of 92.89% for relation extraction, while LLMs achieve 72.73% F1 in joint entity and relation extraction. These results establish strong baselines and expose key challenges, paving the way for future work in Arabic RE.
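For readers less familiar with the two metrics reported above, the following is a minimal sketch of how Cohen's κ and an exact-match F1 over relation triples are commonly computed. It is not the authors' evaluation code: the function names `cohens_kappa` and `triple_f1`, the exact-match criterion, and the toy labels are all assumptions made for illustration, and the abstract does not say how either score was actually calculated.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def triple_f1(gold, predicted):
    """F1 over (head, relation, tail) triples, counted by exact match."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data with placeholder labels (hypothetical, not from the corpus).
annotator_1 = ["rel_x", "rel_x", "rel_y", "rel_y"]
annotator_2 = ["rel_x", "rel_x", "rel_y", "rel_x"]
kappa = cohens_kappa(annotator_1, annotator_2)

gold_triples = [("e1", "rel_x", "e2"), ("e3", "rel_y", "e4")]
pred_triples = [("e1", "rel_x", "e2"), ("e5", "rel_y", "e6")]
f1 = triple_f1(gold_triples, pred_triples)
```

In this sketch a predicted triple counts as correct only if its head, relation, and tail all match a gold triple, which is one common convention for joint entity and relation extraction; more lenient matching schemes would give higher scores.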