EMNLP 2025

November 05, 2025

Suzhou, China


Multimodal Large Language Models (MLLMs) have shown promise in visual-textual reasoning, with Multimodal Chain-of-Thought (MCoT) prompting significantly enhancing interpretability. However, existing MCoT methods rely on rationale-rich datasets and largely focus on inter-object reasoning, overlooking the intra-object understanding crucial for image classification. To address this gap, we propose a novel method to convert any image classification dataset into one augmented with MCoTs. Inspired by Concept Bottleneck Models (CBMs), our approach reformulates concept-based representations into concise, interpretable reasoning chains guided by weak supervision. Experiments across ten datasets show that our generated MCoTs not only improve interpretability by 37% but also lead to gains in classification accuracy when used to fine-tune MLLMs. Our work bridges concept-based interpretability and generative MCoT reasoning, providing a generalizable framework for enhancing MLLMs in fine-grained visual understanding.
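To make the dataset-conversion idea concrete, here is a minimal illustrative sketch of how CBM-style concept annotations could be reformulated into a concise MCoT rationale. The concept names, scores, thresholds, and template wording below are assumptions for illustration only, not the authors' actual pipeline.

```python
# Hypothetical sketch: turn weakly supervised, CBM-style concept scores for
# one image into a short Multimodal Chain-of-Thought (MCoT) rationale.
# Concept names and the sentence template are illustrative assumptions.

def concepts_to_mcot(label, concept_scores, top_k=3):
    """Build a concise reasoning chain from noisy concept scores."""
    # Weak supervision: keep only the top-k most salient concepts,
    # since no gold per-step rationales are available.
    salient = sorted(concept_scores.items(),
                     key=lambda kv: kv[1], reverse=True)[:top_k]
    steps = [f"The image shows {name} (confidence {score:.2f})."
             for name, score in salient]
    # Close the chain with the intra-object attributes -> class decision.
    steps.append(f"These intra-object attributes together indicate: {label}.")
    return " ".join(steps)

# Example: a fine-grained bird-classification sample (illustrative values).
sample = {
    "label": "cardinal",
    "concepts": {"red plumage": 0.94, "crest on head": 0.88,
                 "short conical beak": 0.71, "webbed feet": 0.05},
}
print(concepts_to_mcot(sample["label"], sample["concepts"]))
```

Pairs of (image, generated rationale, label) produced this way could then serve as fine-tuning data for an MLLM, which is the role the generated MCoTs play in the experiments described above.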

