IJCNLP-AACL 2025

December 21, 2025

Mumbai, India

keywords:

synthetic data from LLMs

low resource languages

named entity recognition

We explore whether synthetic datasets generated by large language models from a few high-quality seed samples are useful for low-resource named entity recognition, considering 11 languages from three language families. Our results suggest that synthetic data created from such seed data is a reasonable choice when no labeled data is available, and that it is better than using entirely automatically labeled data. However, a small amount of high-quality data, coupled with cross-lingual transfer from a related language, always offers better performance. Data and code are available at: https://github.com/grvkamath/low-resource-syn-ner.
