VIDEO DOI: https://doi.org/10.48448/0b36-yk33

workshop paper

ACL 2024

August 15, 2024

Bangkok, Thailand

CuReD: Deep Learning Optical Character Recognition for Cuneiform Text Editions and Legacy Materials

keywords:

legacy materials

cuneiform

ocr/htr

digitization

cultural heritage

human-in-the-loop

Cuneiform documents, written in one of the earliest known writing systems, are abundant textual sources for the ancient past. Experts publish editions of these texts in transliteration using specialized typesetting, but most of these editions remain in traditional printed books or legacy materials, inaccessible to computational analysis. Off-the-shelf OCR systems are insufficient for digitizing them without adaptation. We present CuReD (Cuneiform Recognition-Documents), a deep learning-based, human-in-the-loop OCR pipeline for digitizing scanned transliterations of cuneiform texts. CuReD achieves a character error rate of 9% on clean data and 11% on representative scans. We digitized a challenging sample of transliterated cuneiform documents, as well as lexical index cards from the University of Pennsylvania Museum, demonstrating that our platform can enable computational analysis and expand machine-readable cuneiform text datasets. Our results provide the first human-in-the-loop pipeline and interface for digitizing transliterated cuneiform sources and legacy materials, enabling the enrichment of digital sources for these low-resource languages.
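The abstract reports OCR quality as character error rate (CER). The sketch below assumes the standard definition: the character-level Levenshtein distance (insertions, deletions, substitutions) between the OCR output and a reference transcription, divided by the reference length. This page does not state the paper's exact normalization, so the NFC step here is an assumption. It is added because transliterations carry diacritics such as š that can be encoded in more than one way. The function names `levenshtein` and `cer` are hypothetical and not taken from the CuReD code.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions
    needed to turn hyp into ref (classic dynamic programming, two rows)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # ref char missing from hyp (deletion)
                curr[j - 1] + 1,         # extra char in hyp (insertion)
                prev[j - 1] + (r != h),  # match (0) or substitution (1)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate; assumes NFC normalization of both strings
    so that precomposed and combining diacritics compare equal."""
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substitution (š -> s) over an 8-character reference.
    print(cer("ša-ar-ru", "sa-ar-ru"))  # 0.125
```

Read this way, a CER of 9% means roughly 9 character edits per 100 reference characters. These are the edits a human corrector in the loop would still need to make. Because insertions are counted, CER can exceed 100% on badly over-segmented output.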
