EMNLP 2025

November 06, 2025

Suzhou, China

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

Bronze inscriptions from early China are often fragmentary, with missing or undeciphered characters limiting linguistic and historical analysis. Addressing this challenge requires models that can generalize across orthographic variation and diachronic script change. This paper introduces three contributions to support computational processing of bronze inscriptions: (i) a fully digitized and Unicode-encoded corpus of over 40,000 inscriptional characters; (ii) a glyph network linking diachronic variants to shared semantic anchors; and (iii) a masked language modeling (MLM) framework with variant-aware augmentation, alongside a periodization classification task. Experiments show that domain-adaptive pretraining and glyph-aware modeling substantially improve restoration accuracy.

Downloads

SlidesPaperTranscript English (automatic)

Next from EMNLP 2025

TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
poster

TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection

EMNLP 2025

+6
Yunchu Bai and 8 other authors

06 November 2025

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2026 Underline - All rights reserved