Multi-modal Dialogue Summarization (MDS) is an important task with wide-ranging applications. Developing and improving MDS models calls for strong automatic evaluation methods, which can save substantial time and cost; building such methods, in turn, requires a meta-evaluation benchmark with human annotations. The lack of such a benchmark motivates us to introduce MDSEval, the first meta-evaluation benchmark for MDS, providing data-summary pairs together with human annotations of summary quality across eight aspects. Beyond the benchmark dataset itself, we propose a novel filtering framework based on Mutually Exclusive Key Information (MEKI) across modalities, which we use to enhance data quality. Further, our work is the first to define key evaluation aspects for MDS. Our findings reveal that current multi-modal evaluation methods struggle to fairly rate summaries generated by advanced multimodal large language models (MLLMs). Our dataset, filtering method, defined evaluation aspects, and findings will benefit the development of future MDS evaluation methods.
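As a purely illustrative sketch (not the MDSEval implementation), the snippet below shows one plausible shape for a MEKI-style filter: it keeps a sample only when each modality contributes enough key information that the other modality does not cover. The helpers key_info and meki_score, the caption-based image surrogate, and the 0.3 threshold are all hypothetical placeholders.

```python
# Illustrative sketch only: one possible structure for filtering dialogue-image
# pairs by Mutually Exclusive Key Information (MEKI). Helper functions and the
# threshold are hypothetical, not the method described in the paper.
from dataclasses import dataclass


@dataclass
class Sample:
    dialogue: str
    image_caption: str  # assumed textual surrogate for the image modality


def key_info(text: str) -> set[str]:
    """Hypothetical key-information extractor (e.g., salient content words)."""
    return {tok.lower().strip(".,!?") for tok in text.split() if len(tok) > 4}


def meki_score(a: set[str], b: set[str]) -> float:
    """Fraction of key information in `a` that is NOT covered by `b`."""
    if not a:
        return 0.0
    return len(a - b) / len(a)


def keep_sample(s: Sample, threshold: float = 0.3) -> bool:
    """Keep a sample only if each modality contributes enough exclusive key
    information, so a good summary must draw on both dialogue and image."""
    text_ki = key_info(s.dialogue)
    image_ki = key_info(s.image_caption)
    return (
        meki_score(text_ki, image_ki) >= threshold
        and meki_score(image_ki, text_ki) >= threshold
    )
```

The intent of such a filter is to discard samples whose summary could be written from a single modality alone, which would make the benchmark unable to distinguish truly multi-modal summarizers from text-only ones.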