EMNLP 2025

November 05, 2025

Suzhou, China

Multimodal large language models (MLLMs) have demonstrated strong performance across diverse multimodal tasks. However, their application to emotion recognition in natural images remains under-explored: MLLMs struggle with ambiguous emotional expressions and implicit affective cues, a capability that is crucial for affective understanding but largely overlooked. To address these challenges, we propose MERMAID, a novel multi-agent framework that integrates an emotion-guided visual augmentation module and a multi-perspective self-reflection module, enabling agents to interact across modalities, reinforce subtle emotional semantics, and operate autonomously, thereby enhancing emotion recognition. Extensive experiments demonstrate that MERMAID outperforms existing methods, achieving absolute accuracy gains of 8.70%–27.90% across diverse benchmarks and showing greater robustness in emotionally diverse scenarios.
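The abstract describes an augment-predict-reflect loop among cooperating agents. Below is a minimal Python sketch of how such a pipeline might be wired together; every class, method, and heuristic in it is an illustrative assumption (the paper's actual agents, prompts, and MLLM backbone are not given on this page), not MERMAID's real implementation.

```python
# Hypothetical sketch of a MERMAID-style multi-agent loop.
# All names and heuristics are illustrative assumptions, not the paper's API.
from dataclasses import dataclass


@dataclass
class Observation:
    image_caption: str          # stand-in for the raw image input
    augmented_cues: list[str]   # emotion-guided visual augmentations


class AugmentationAgent:
    """Stand-in for the emotion-guided visual augmentation module."""

    def augment(self, image_caption: str) -> Observation:
        # A real system would highlight affect-bearing image regions;
        # here we merely surface keyword cues as a placeholder.
        cues = [w for w in image_caption.split()
                if w in {"smiling", "crying", "alone"}]
        return Observation(image_caption, cues)


class RecognitionAgent:
    """Proposes an emotion label from the augmented observation."""

    def predict(self, obs: Observation) -> str:
        return "joy" if "smiling" in obs.augmented_cues else "sadness"


class ReflectionAgent:
    """Stand-in for multi-perspective self-reflection: critiques the
    prediction from several viewpoints and accepts it or asks for a retry."""

    perspectives = ("context", "facial cues", "scene semantics")

    def accept(self, obs: Observation, label: str) -> bool:
        # Placeholder check: accept only if some cue supports the label.
        return bool(obs.augmented_cues)


def mermaid_style_pipeline(image_caption: str, max_rounds: int = 3) -> str:
    augmenter = AugmentationAgent()
    recognizer = RecognitionAgent()
    reflector = ReflectionAgent()

    obs = augmenter.augment(image_caption)
    label = recognizer.predict(obs)
    for _ in range(max_rounds):
        if reflector.accept(obs, label):
            break
        # Reflection rejected the label: re-augment and try again.
        obs = augmenter.augment(image_caption)
        label = recognizer.predict(obs)
    return label


print(mermaid_style_pipeline("a child smiling at a birthday party"))  # -> "joy"
```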

Downloads

  • Slides
  • Paper
  • Transcript English (automatic)

Next from EMNLP 2025

Hidden in Plain Sight: Reasoning in Underspecified and Misspecified Scenarios for Multimodal LLMs
poster

Xinze Guan and 6 other authors

05 November 2025
