AI-Assisted Human Evaluation of Machine Translation

Annually, research teams spend large amounts of money to evaluate the quality of machine translation systems (WMT, inter alia). This is expensive because it requires a lot of expert human labor. In the recently adopted annotation protocol, Error Span Annotation (ESA), annotators mark erroneous parts of the translation and then assign a final score. Much of the annotators' time is spent scanning the translation for possible errors. In our work, we help the annotators by pre-filling the error annotations with recall-oriented automatic quality estimation. With this AI assistance, we obtain annotations at the same quality level while cutting the time per error span by more than half (71 s/error span → 31 s/error span). The biggest advantage of the ESA^AI protocol is accurate priming of annotators (pre-filled error spans) before they assign the final score. This alleviates a potential automation bias, which we confirm to be low. In our experiments, we find that the annotation budget can be further reduced by almost 25% by filtering out examples that the AI deems likely to be correct.
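The budget-saving step can be made concrete with a minimal Python sketch. It assumes a recall-oriented QE system has already produced error spans for each segment. It also assumes that a segment with no predicted spans counts as "likely correct" and is skipped. The `Segment` class and both helper functions are hypothetical. This simple rule illustrates the idea and is not the exact filtering criterion used in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Hypothetical container: source, MT output, and QE-predicted error spans
    # given as (start, end) character offsets into the translation.
    source: str
    translation: str
    predicted_spans: list[tuple[int, int]] = field(default_factory=list)


def split_for_annotation(segments):
    """Send segments with predicted errors to annotators (spans pre-filled);
    set aside segments with no predicted errors as 'likely correct'."""
    to_annotate = [s for s in segments if s.predicted_spans]
    skipped = [s for s in segments if not s.predicted_spans]
    return to_annotate, skipped


def budget_saved(segments):
    """Fraction of segments that no longer need human annotation."""
    _, skipped = split_for_annotation(segments)
    return len(skipped) / len(segments) if segments else 0.0


segs = [
    Segment("Hallo Welt", "Hello world"),
    Segment("Guten Morgen", "Good tomorrow", [(5, 13)]),
    Segment("Danke", "Thanks"),
    Segment("Bis bald", "Until soon", [(0, 10)]),
]
to_annotate, skipped = split_for_annotation(segs)
print(len(to_annotate), budget_saved(segs))  # prints: 2 0.5
```

In practice the skip decision would more likely use a QE score threshold than the bare presence of spans. The sketch keeps the simplest rule so the data flow stays easy to follow.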

Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies

Ten years ago a single metric, BLEU, governed progress in machine translation research. For better or worse, there is no such consensus today, and consequently it is difficult for researchers to develop and retain intuitions about metric deltas that drove earlier research and deployment decisions. This paper investigates the “dynamic range” of a number of modern metrics in an effort to provide a collective understanding of the meaning of differences in scores both within and among metrics; in other words, we ask "what point difference x in metric y is required between two systems for humans to notice?". We conduct our evaluation on a new large dataset, ToShip23, using it to discover the deltas at which metrics achieve system-level differences that are meaningful to humans, which we measure by pairwise system accuracy. We additionally show that this method of establishing delta-accuracy is more stable with respect to test-set size than the standard use of statistical p-values. Where data size permits, we also explore the effect of metric deltas and accuracy across finer-grained features such as translation direction, domain, and system closeness.
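Pairwise system accuracy can be sketched as follows. This minimal illustration assumes one system-level metric score and one human score per system. It reads "accuracy at delta x" as the accuracy over system pairs whose metric scores differ by at least x. `pairwise_accuracy` and the toy scores are hypothetical; they are not the paper's released code or ToShip23 data.

```python
from itertools import combinations


def pairwise_accuracy(metric, human, min_delta=0.0):
    """Hypothetical helper: among system pairs whose metric scores differ by at
    least `min_delta`, return (fraction where metric and humans agree on the
    better system, number of pairs considered)."""
    agree = total = 0
    for a, b in combinations(sorted(metric), 2):
        m_delta = metric[a] - metric[b]
        h_delta = human[a] - human[b]
        if m_delta == 0 or abs(m_delta) < min_delta:
            continue  # pair too close for the metric to call
        total += 1
        if m_delta * h_delta > 0:  # same sign: metric and humans agree
            agree += 1
    return (agree / total if total else float("nan")), total


# Toy system-level scores (invented for illustration only).
metric = {"A": 80.0, "B": 79.5, "C": 75.0, "D": 74.0}
human = {"A": 70.0, "B": 71.0, "C": 60.0, "D": 61.0}

for d in (0.0, 2.0):
    acc, n = pairwise_accuracy(metric, human, d)
    print(d, n, round(acc, 3))
# prints:
# 0.0 6 0.667
# 2.0 4 1.0
```

In this toy example, raising `min_delta` drops the two close pairs on which the metric and humans disagree. That is the intuition behind asking how large a metric delta must be before humans notice it.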

Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References

Most research on natural language generation (NLG) relies on evaluation benchmarks with only a limited number of references per sample, which may result in poor correlations with human judgements. The underlying reason is that the same meaning can be expressed in many different forms, so evaluating against one or a few references may not accurately reflect the quality of the model's hypotheses. To address this issue, this paper presents a simple and effective method, named **Div-Ref**, that enhances existing evaluation benchmarks by enriching the number of references. We leverage large language models (LLMs) to diversify the expression of a single reference into multiple high-quality ones that cover the semantic space of the reference sentence as much as possible. We conduct comprehensive experiments to empirically demonstrate that diversifying the expression of references can significantly enhance the correlation between automatic and human evaluation. The idea is also compatible with recent LLM-based evaluation, which can likewise benefit from incorporating multiple references. *We strongly encourage future generation benchmarks to include more references, even if they are generated by LLMs, since this is a one-time effort.* We release all the code and data at https://github.com/RUCAIBox/Div-Ref to facilitate research.
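The multi-reference idea can be illustrated with a small sketch built on three assumptions. A toy unigram-F1 metric stands in for BLEU, BERTScore, or an LLM-based evaluator. Hand-written paraphrases replace the LLM paraphrasing step. Scores are aggregated by taking the maximum over references. None of these is necessarily how Div-Ref computes or aggregates scores.

```python
from collections import Counter


def unigram_f1(hypothesis, reference):
    """Toy lexical-overlap metric (a stand-in for BLEU, BERTScore, etc.)."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def multi_ref_score(hypothesis, references, metric=unigram_f1):
    """Score against every reference and keep the best match (max aggregation)."""
    return max(metric(hypothesis, r) for r in references)


hyp = "the meeting was postponed"
single = ["the meeting was delayed"]
# Hand-written paraphrases standing in for LLM-generated references.
diversified = single + [
    "they postponed the meeting",
    "the meeting was postponed until later",
]
print(round(multi_ref_score(hyp, single), 3), round(multi_ref_score(hyp, diversified), 3))
# prints: 0.75 0.8
```

With max aggregation, a correct hypothesis that uses a different word from the original reference ("postponed" vs. "delayed") can still be matched by one of the diversified references.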

Poster Session 8 - MTM: Machine Translation, Multilinguality and Language Diversity

poster

### Welcome to 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics

Welcome to the 2025 meeting of the Nations of the Americas Chapter of the Association for Computational Linguistics! I am proud to help organize the first NAACL conference to carry the new name of our organization, one that emphasizes inclusion for all of the Americas. I am also pleased to welcome you to Albuquerque, New Mexico, a state whose unique blend of cultural influences will make for an excellent backdrop for NAACL 2025, especially with this year’s special theme on NLP in a Multicultural World. 
**[Continue reading...](https://drive.google.com/file/d/1jX-qGhqVSZZCIrAnJaz798pflu5Irrdn/view?usp=sharing)**

*- Colin Cherry, Google, NAACL 2025 General Chair* 

[![](https://assets.underline.io/markdown_image/1/image/b087f8a4dc5816d6e1a6514e59c59ac3.png)](https://drive.google.com/file/d/1T96GzPqObXrMln2BMByCSXTSizjTg69P/view?usp=sharing)


NAACL 2025


This poster session includes Main Conference posters, TACL and Demos.

In-Person Poster Session 5

### Welcome!
The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) will take place in Bangkok, Thailand from August 11th to 16th, 2024. Our Virtual Poster Sessions will take place online Thursday, August 22, 2024.


ACL 2024


Virtual Poster Session 2

## Welcome to 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics!
This year, the conference is in Mexico City. NAACL was actually already planned for Mexico City in 2021, but due to the pandemic the entire conference was moved online. This year, finally, we get to go! So it is my sincere pleasure to welcome you to Mexico City, whether in person or virtually. Having the conference in Mexico City is a good opportunity to emphasize that NAACL is our flagship conference for ACL members not only in North America but also in Central and South America, even though NAACL has been bearing “North” in its name. At this year’s conference, we have a theme to match, with a theme track on the Languages of Latin America to showcase the linguistic diversity of the region.
 
The opportunity to present at NAACL should not depend on a researcher’s travel budget, or their family status. This is why it is so important to make virtual participation at NAACL as good an experience as possible – but we want to also provide a good experience for in-person participants. As a community, we are still working out the best way to do that. This year at NAACL, we are trying out a big virtual poster session ahead of the conference, in the hope that this will make for a lively and interactive experience. At the same time, we are reducing virtual oral presentations, which seem to be particularly tricky to make work well. A big thanks to the NAACL program chairs and to Luciana Benotti for all their ideas and work to improve the virtual experience. And participants, virtual as well as in-person: Please let us know what worked for you and what didn’t, so we can continue to improve hybrid conferences. 
 
I have been lucky to work with many amazing people. Without their insight, dedication and patience, and without the many hours of work they put in, NAACL would not have been possible. A huge thank you to the program chairs Helena Gomez, Kevin Duh, and Steve Bethard – you are the best! 
 
Finally, I would like to thank all authors, invited speakers and panelists, area chairs and reviewers, the volunteers organizing and chairing sessions, and all attendees, in-person and virtual. Thank you for helping us make NAACL 2024 come to life. 
 
Welcome and hope you all enjoy the conference! 
Katrin Erk 
The University of Texas at Austin 
NAACL-2024 General Chair 
*You can read the full Welcome message in the [Conference Handbook (downloadable)](https://drive.google.com/file/d/1H1NvW0VASQjkSCYgw3mr-3l4yYDSxz6B/view?usp=sharing)*


NAACL 2024


Tom Kocmi

3 Presentations

- AI-Assisted Human Evaluation of Machine Translation
- Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies
- Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References
