VIDEO DOI: https://doi.org/10.48448/tkw5-q831

workshop paper

ACL 2024

August 15, 2024

Bangkok, Thailand

Rosetta Balcanica: Deriving a "Gold Standard" Neural Machine Translation (NMT) Parallel Dataset from High-Fidelity Resources for Western Balkan Languages

keywords:

parallel datasets

low-resource languages

neural machine translation

Rosetta Balcanica is an ongoing effort to expand resources for low-resource Western Balkan languages. The effort focuses on discovering accurately translated, officially mapped, and curated parallel language resources, and on preparing them for use as neural machine translation (NMT) datasets. Several of the guiding principles, practices, and methods employed by Rosetta Balcanica generalize and could apply to resource expansion efforts for other low-resource languages. With this goal in mind, we first present our rationale and approach for discovering and using meticulously translated and officially curated low-resource language resources, and our use of these resources to develop a parallel "gold standard" translation training resource. Second, we describe our specific methodology for developing NMT datasets from these resources and publishing them to Hugging Face Hub, a widely used and accessible repository for natural language processing. Finally, we discuss the trade-offs and limitations of our current approach, along with the roadmap for future development and expansion of the Rosetta Balcanica language resource.
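The abstract does not specify the record format or cleaning rules used for the published dataset. As an illustration only, the sketch below assumes two already sentence-aligned lists of segments (segment i of the source translates segment i of the target) and turns them into the nested `{"translation": {lang: text}}` records that many Hugging Face translation datasets use. The function name `build_parallel_records`, the language codes (`en`, `sq`), and the empty-segment, length-ratio, and deduplication filters are hypothetical choices, not the authors' documented pipeline.

```python
from typing import Dict, List


def build_parallel_records(
    src_segments: List[str],
    tgt_segments: List[str],
    src_lang: str = "en",        # hypothetical source language code
    tgt_lang: str = "sq",        # hypothetical target language code
    max_len_ratio: float = 3.0,  # hypothetical misalignment threshold
) -> List[Dict[str, Dict[str, str]]]:
    """Pair sentence-aligned segments into translation records (illustrative sketch).

    Assumes the two lists are already aligned line by line; this is not
    the Rosetta Balcanica implementation.
    """
    if len(src_segments) != len(tgt_segments):
        raise ValueError("source and target must have the same number of segments")

    records: List[Dict[str, Dict[str, str]]] = []
    seen = set()
    for src, tgt in zip(src_segments, tgt_segments):
        src, tgt = src.strip(), tgt.strip()
        # Skip pairs where either side is empty after trimming whitespace.
        if not src or not tgt:
            continue
        # Skip pairs whose character-length ratio suggests a misalignment.
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio > max_len_ratio:
            continue
        # Skip exact duplicate pairs so repeated boilerplate is counted once.
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        records.append({"translation": {src_lang: src, tgt_lang: tgt}})
    return records
```

A list in this shape could then, for example, be loaded with `datasets.Dataset.from_list(records)` and uploaded with `Dataset.push_to_hub(repo_id)`; the repository name and any further splitting into train/validation/test sets would be project-specific decisions.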
