EMNLP 2025

November 07, 2025

Suzhou, China


As automatic metrics become increasingly strong and widely adopted, the risk of unintentionally "gaming the metric" during model development rises. This issue is caused by metric interference (MINT), i.e., the use of the same or related metrics for both model tuning and evaluation. MINT can mislead practitioners into being overoptimistic about the performance of their systems: as system outputs become a function of the interfering metric, their estimated quality loses correlation with human judgments. In this work, we analyze two common cases of MINT in machine translation-related tasks: filtering of training data, and decoding with quality signals. Importantly, we find that MINT strongly distorts instance-level metric scores, even when metrics are not directly optimized for. This calls into question the common strategy of evaluating with a different, yet related, metric that is not used for tuning. To address this problem, we propose MINTAdjust, a method for more reliable evaluation under MINT. MINTAdjust takes as input scores of other metrics on outputs from models not subject to interference, and produces adjusted scores for the interfering metric. On the WMT24 MT shared task test set, MINTAdjust ranks translations and systems more accurately than state-of-the-art metrics across a majority of language pairs, especially for high-quality systems. Furthermore, MINTAdjust outperforms AutoRank, the ensembling method used by the organizers.
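The abstract does not say how MINTAdjust computes its adjusted scores. The sketch below is only one possible reading of the input/output description, not the authors' method: it assumes a simple one-variable least-squares fit that learns, on outputs from models not subject to interference, how the interfering metric relates to another metric, and then applies that fit to the other metric's scores on interfered outputs. The function names `fit_line` and `adjust_scores` and all numbers are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired scores")
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("other-metric scores must not all be identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def adjust_scores(clean_other, clean_interfering, interfered_other):
    """Hypothetical adjustment (an assumption, not the published method).

    clean_other / clean_interfering: paired scores of another metric and of
    the interfering metric on outputs from models NOT subject to interference.
    interfered_other: the other metric's scores on outputs that ARE subject
    to interference. Returns predicted interfering-metric scores for them.
    """
    slope, intercept = fit_line(clean_other, clean_interfering)
    return [slope * x + intercept for x in interfered_other]


# Made-up illustrative scores, not taken from the paper.
clean_other = [0.2, 0.4, 0.6, 0.8]
clean_interfering = [0.3, 0.5, 0.7, 0.9]
adjusted = adjust_scores(clean_other, clean_interfering, [0.5, 0.7])
```

With these made-up numbers the fitted line is approximately y = x + 0.1, so the adjusted scores come out close to 0.6 and 0.8. The actual method may combine several metrics or use a different model entirely.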


