EMNLP 2025

November 07, 2025

Suzhou, China


The ability to critique is vital for models to self-improve and to serve as reliable AI assistants. While extensively studied in language-only settings, multimodal critique by Large Multimodal Models (LMMs) remains underexplored despite their growing capabilities in tasks like captioning and visual reasoning. In this work, we introduce MM-Critic, a holistic benchmark for evaluating the critique ability of LMMs across three dimensions: basic, correlation, and comparison. Covering 8 task types and over 500 tasks, MM-Critic collects responses from LMMs of various model sizes. To enhance evaluation reliability, we design expert-informed scoring rubrics that guide GPT-4o in annotating responses and generating reference critiques, which serve as anchors for trustworthy judgments. Extensive experiments validate the effectiveness of MM-Critic and provide a comprehensive assessment of leading LMMs' critique capabilities. Further analysis reveals key insights, including the correlation between response quality and critique ability, and the varying difficulty of critique across evaluation dimensions.
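To make the rubric-anchored evaluation described above concrete, the following is a minimal, hypothetical sketch of how a judge model could be prompted with an expert-informed rubric and a reference critique as an anchor. It is not the authors' implementation: the rubric text, prompt wording, and function names are illustrative assumptions, and it simply assumes an OpenAI-style chat API with GPT-4o as the judge.

```python
# Hypothetical sketch of rubric-anchored critique scoring (not the paper's code).
# Assumes the OpenAI Python client and an API key in the environment;
# RUBRIC and the prompt wording are illustrative, not taken from MM-Critic.
from openai import OpenAI

client = OpenAI()

RUBRIC = """Score the candidate critique from 1 (poor) to 10 (excellent):
- Does it identify the actual errors in the model response?
- Are its explanations correct and consistent with the reference critique?
- Is it specific and actionable rather than generic?"""


def score_critique(task: str, response: str, critique: str, reference_critique: str) -> str:
    """Ask a judge model to grade `critique` against the rubric, using an
    expert-informed reference critique as the anchor for the judgment."""
    prompt = (
        f"{RUBRIC}\n\n"
        f"Task:\n{task}\n\n"
        f"Model response under critique:\n{response}\n\n"
        f"Reference critique (anchor):\n{reference_critique}\n\n"
        f"Candidate critique to evaluate:\n{critique}\n\n"
        "Return only the numeric score."
    )
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return completion.choices[0].message.content.strip()
```

In this kind of setup, the reference critique keeps the judge's scores grounded: instead of grading a candidate critique in isolation, the judge compares it against an expert-validated account of what the errors actually are, which is one plausible reading of how the "anchor" role described in the abstract could work.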

Downloads

  • Slides
  • Paper
  • Transcript (English, automatic)

