The Nüshu script, originating from Jiangyong County, China, is the world’s only known writing system historically created and used exclusively by women. Although Natural Language Processing (NLP) efforts have begun digitizing limited Nüshu-Chinese text pairs, computational access to the script remains highly restricted because of its handwritten, visual nature and the absence of multimodal tools. We contribute two novel datasets: NüshuVision, an image corpus of 500 rendered sentences in traditional vertical, right-to-left orthography, and NüshuStrokes, the first sequential handwriting recordings of all 397 Unicode Nüshu characters by an expert calligrapher. Benchmarking five leading Chinese OCR systems on NüshuVision yields a Character Error Rate (CER) of 1.0 for every system. Fine-tuning Microsoft’s TrOCR model reduces CER to 0.67. These resources mark a crucial step toward multimodal processing of Nüshu and present a new paradigm for culturally sensitive language revitalization.
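For readers unfamiliar with the metric, CER is conventionally the character-level edit distance between the recognized text and the reference transcription, divided by the reference length. The sketch below illustrates that standard definition only. It is not the evaluation code used for NüshuVision, and the function names and sample code points are assumptions chosen for illustration.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Illustrative code points from the Unicode Nüshu block (not real corpus data).
    reference = "\U0001B170\U0001B171\U0001B172"
    hypothesis = "\U0001B170\U0001B171\U0001B173"  # last character misrecognized
    print(f"CER = {cer(reference, hypothesis):.2f}")
```

Under this definition, a CER of 1.0 means the number of character edits needed to recover the reference equals the reference length, and a CER of 0.67 means roughly two edits for every three reference characters. Python strings index by code point, so supplementary-plane Nüshu characters each count as one character here.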
