AAAI 2026

January 24, 2026

Singapore, Singapore


We introduce SampurNER, a fine-grained named entity recognition (FgNER) dataset encompassing all 22 scheduled Indian languages, spoken by more than two billion people across various countries. While manual annotation for FgNER resources is often labor-intensive and expensive, distant supervision methods have been employed as a viable alternative. However, such datasets are often noisy, with entity mentions tagged with multiple types, requiring computationally intensive noise-aware models for effective FgNER. Moreover, resources for both coarse-grained and fine-grained named entity recognition in Indian languages remain scarce. To address this, we propose an entity-anchored machine translation framework that leverages the largest manually annotated English FgNER dataset, Few-NERD, to create a large-scale FgNER dataset in 22 languages. On average, the dataset comprises over 153k sentences, 354k entities, and 3.3M tokens per language. The languages covered are: Assamese (as), Bengali (bn), Bodo (brx), Dogri (doi), Gujarati (gu), Hindi (hi), Kannada (kn), Kashmiri (ks), Konkani (gom), Maithili (mai), Malayalam (ml), Manipuri (mni), Marathi (mr), Nepali (ne), Odia (or), Punjabi (pa), Sanskrit (sa), Santali (sat), Sindhi (sd), Tamil (ta), Telugu (te), and Urdu (ur). Rigorous analyses and human evaluations confirm the high quality of the dataset and demonstrate the effectiveness of the entity-anchored machine translation framework, with up to a 9% increase in F1-score over the current state-of-the-art. Additionally, we extend our analysis to zero-shot, multilingual, and cross-lingual settings, investigating the influence of language family and script similarity on cross-lingual FgNER performance.
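The abstract does not detail how entity anchoring works inside the translation pipeline, but a common way to realize this idea is to wrap each annotated entity span in unique marker tokens before machine translation, then recover the projected spans from the translated output. The sketch below is a hypothetical illustration of that marker-based projection, not the authors' actual pipeline; the names `anchor_entities` and `recover_entities`, the `[Ei]`/`[/Ei]` marker scheme, and the identity "translation" stand-in are all assumptions for demonstration.

```python
import re

def anchor_entities(tokens, entities):
    """Wrap each (start, end, label) span in paired markers [Ei] ... [/Ei].
    Spans are assumed sorted by start, non-overlapping, end-exclusive."""
    pieces, prev = [], 0
    for i, (s, e, _) in enumerate(entities):
        pieces += tokens[prev:s] + [f"[E{i}]"] + tokens[s:e] + [f"[/E{i}]"]
        prev = e
    pieces += tokens[prev:]
    return " ".join(pieces)

def recover_entities(translated, labels):
    """Parse markers out of the translated sentence, returning the clean
    token list plus the projected (start, end, label) spans."""
    toks, spans, open_at = [], [], {}
    for tok in translated.split():
        m = re.fullmatch(r"\[E(\d+)\]", tok)
        if m:
            open_at[int(m.group(1))] = len(toks)  # span opens here
            continue
        m = re.fullmatch(r"\[/E(\d+)\]", tok)
        if m:
            i = int(m.group(1))
            spans.append((open_at[i], len(toks), labels[i]))  # span closes
            continue
        toks.append(tok)
    return toks, spans

# Toy round trip with an identity "translation" standing in for a real
# MT system (which would preserve the marker tokens while translating).
sent = "Kalpana Chawla was born in Karnal".split()
ents = [(0, 2, "person-astronaut"), (5, 6, "location-city")]
marked = anchor_entities(sent, ents)
toks, projected = recover_entities(marked, [lbl for _, _, lbl in ents])
```

After a real translation step, `recover_entities` would yield entity spans aligned to the target-language word order, which is the crux of projecting English Few-NERD annotations into the 22 target languages.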
