EMNLP 2025

November 08, 2025

Suzhou, China

The Nüshu script, originating from Jiangyong County, China, is the world’s only known writing system historically created and used exclusively by women. Although Natural Language Processing (NLP) efforts have begun digitizing limited Nüshu-Chinese text pairs, computational access to the script remains highly restricted because of its handwritten, visual nature and the absence of multimodal tools. We contribute two novel datasets: NüshuVision, an image corpus of 500 rendered sentences in traditional vertical, right-to-left orthography, and NüshuStrokes, the first sequential handwriting recordings of all 397 Unicode Nüshu characters by an expert calligrapher. Benchmarking five leading Chinese OCR systems on NüshuVision yields a Character Error Rate (CER) of 1.0 for every system. Fine-tuning Microsoft’s TrOCR model reduces CER to 0.67. These resources mark a crucial step toward multimodal processing of Nüshu and present a new paradigm for culturally sensitive language revitalization.
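
For readers unfamiliar with the metric, CER is commonly computed as the character-level edit distance between the reference and the OCR output, divided by the reference length. The abstract does not state the exact definition used, so the following is only a minimal sketch under that common definition; the function names `edit_distance` and `cer` are illustrative and not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r
                curr[j - 1] + 1,     # insert h
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate, assumed here to be edit distance / reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Two Nüshu code points read as two unrelated Chinese characters:
# both positions are substitutions, so the rate is 2 / 2.
print(cer("\U0001B170\U0001B171", "\u4e00\u4e8c"))  # 1.0
```

Python strings iterate by code point, so Nüshu characters, which lie outside the Basic Multilingual Plane, each count as one character. Under this definition a CER of 1.0 means the number of edits equals the reference length, and a CER of 0.67 corresponds to roughly two edits for every three reference characters.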
