EMNLP 2025

November 05, 2025

Suzhou, China


Multi-modal Dialogue Summarization (MDS) is an important task with broad applications. Developing and improving MDS models calls for strong automatic evaluation methods, which can save substantial time and cost. However, such methods can only be developed and validated against a meta-evaluation benchmark with human annotations, and no such benchmark yet exists. This shortage motivates us to introduce MDSEval, the first meta-evaluation benchmark for MDS, providing data-summary pairs and human annotations of summary quality across eight aspects. Beyond the benchmark dataset, we propose a novel filtering framework based on Mutually Exclusive Key Information (MEKI) across modalities, which we use to enhance data quality. Our work is also the first to define key evaluation aspects for MDS. Our findings reveal that current multi-modal evaluation methods struggle to rate summaries generated by advanced MLLMs fairly. Our dataset, filtering framework, defined evaluation aspects, and findings will greatly benefit the development of MDS evaluation methods.
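The abstract names the MEKI filtering idea but not its mechanics. A minimal sketch of one plausible reading, where each dialogue sample carries sets of key-information items extracted per modality and samples are kept only when each modality contributes information the other lacks; the item-extraction step, scoring rule, and threshold below are all assumptions, not the paper's actual method:

```python
# Hypothetical sketch of Mutually Exclusive Key Information (MEKI) filtering.
# Assumes key-information items have already been extracted from each
# modality (e.g. dialogue text vs. shared images) as sets of strings.

def meki_score(text_keys: set, image_keys: set) -> float:
    """Fraction of key-information items appearing in exactly one modality."""
    union = text_keys | image_keys
    if not union:
        return 0.0
    exclusive = (text_keys - image_keys) | (image_keys - text_keys)
    return len(exclusive) / len(union)

def filter_samples(samples, threshold=0.5):
    """Keep samples where both modalities carry enough unique key info."""
    return [s for s in samples
            if meki_score(s["text_keys"], s["image_keys"]) >= threshold]

sample = {"text_keys": {"meeting at 3pm", "budget cut"},
          "image_keys": {"budget cut", "venue map"}}
print(round(meki_score(sample["text_keys"], sample["image_keys"]), 2))  # 0.67
```

A high MEKI score indicates a genuinely multi-modal sample: summarizing it requires both modalities, which is what makes it useful for stress-testing evaluation methods.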


