VIDEO DOI: https://doi.org/10.48448/cyx1-bf30

poster

ACL 2024

August 13, 2024

Bangkok, Thailand

Document-Level Machine Translation with Large-Scale Public Parallel Corpora

keywords: contrastive evaluation, open data, natural language processing, document-level, machine translation, dataset

Although document-level machine translation has inherent advantages over sentence-level machine translation, because document context gives the model additional information, most translation systems still operate at the sentence level. This is primarily due to the severe lack of publicly available large-scale parallel corpora at the document level. We release a large-scale open parallel corpus with document context, extracted from ParaCrawl in five language pairs, along with code to compile document-level datasets for any language pair supported by ParaCrawl. We train context-aware models on these datasets and find improvements both in overall translation quality and on targeted document-level phenomena. We also analyse how much long-range information is useful for modelling some of these discourse phenomena and find that models are able to utilise context from several preceding sentences.
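
As an illustration of what "context from several preceding sentences" can look like in training data, here is a minimal sketch that prefixes each sentence pair with up to `k` preceding pairs from the same document. It is not taken from the released code: the function name `add_context`, the `<sep>` separator token, and the concatenation scheme are all assumptions chosen for illustration, and the abstract does not say how the authors' context-aware models consume context.

```python
from typing import List, Tuple

# Hypothetical separator token; an assumption, not taken from the paper.
SEP = "<sep>"


def add_context(
    doc: List[Tuple[str, str]], k: int, sep: str = SEP
) -> List[Tuple[str, str]]:
    """Prefix each (source, target) pair with up to k preceding pairs.

    `doc` is one document as an ordered list of aligned sentence pairs.
    Context never crosses document boundaries because the function only
    ever sees a single document.
    """
    joiner = f" {sep} "
    out = []
    for i, (src, tgt) in enumerate(doc):
        start = max(0, i - k)  # fewer than k sentences exist near the start
        ctx_src = [s for s, _ in doc[start:i]]
        ctx_tgt = [t for _, t in doc[start:i]]
        out.append((joiner.join(ctx_src + [src]), joiner.join(ctx_tgt + [tgt])))
    return out


# Toy German-English document (invented example data).
doc = [
    ("Ich habe einen Hund.", "I have a dog."),
    ("Er ist braun.", "It is brown."),
]
examples = add_context(doc, k=1)
```

With `k=0` the function returns plain sentence-level pairs, so the same routine could, under these assumptions, produce both a sentence-level baseline and context-augmented variants for comparing how much preceding context helps.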
