IJCNLP-AACL 2025

December 20, 2025

Mumbai, India


keywords: ol chiki, santali, indian language, speech recognition, cross-lingual, low-resource

India, with its large population, has two official and twenty-two scheduled languages, making it one of the most linguistically diverse nations. Although Santali is one of the scheduled languages, it remains low-resource. Ol Chiki is recognized as the official script for Santali, yet many writers continue to use the Bengali, Devanagari, Odia, and Roman scripts. In tribute to the upcoming centenary of the Ol Chiki script, we present an Automatic Speech Recognition (ASR) system for Santali that transcribes in Ol Chiki. Our approach uses cross-lingual transfer learning: Whisper models pre-trained on Bengali and on Hindi are adapted to Santali speech with Ol Chiki transcriptions. Adapting the Bengali pre-trained model yielded a Word Error Rate (WER) of 28.47%, whereas adapting the Hindi pre-trained model yielded a WER of 34.50%. Both results were obtained with the Whisper Small model.
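
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the reference and the system output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used for the reported results. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace; the presentation does not
    state which tokenization or text normalization was used.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens: one substitution (b -> x) and one deletion (d)
# against four reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Under this definition, a WER of 28.47% corresponds to roughly 28 word-level errors per 100 reference words. Because insertions count as errors, WER can exceed 100%.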


