Understanding multimodal metaphors is a crucial pathway for machines to comprehend human cognition. However, current research remains constrained by superficial dataset annotations, insufficient systematic evaluation of large language models (LLMs), and fragmented task frameworks. To bridge these gaps, this paper proposes a systematic solution: (I) We present the largest fine-grained Multi-task Multimodal Metaphor Understanding Challenge Dataset (M$^{3}$UCD), built via multi-perspective collaborative annotation; it contains 15,345 samples, each annotated with 12 manual attribute labels. (II) We systematically benchmark the capacity boundaries of LLMs in metaphor understanding; the evaluation results reveal the persistent challenges LLMs face in this domain while validating the effectiveness and potential of M$^{3}$UCD. (III) We develop a concise, unified multi-task baseline framework and demonstrate its effectiveness in enhancing the metaphor understanding capabilities of multimodal LLMs (MLLMs). M$^{3}$UCD will be publicly released to advance metaphor research.
$\textit{Disclaimer}$: M$^{3}$UCD contains samples with potentially sensitive content (e.g., sarcasm, offensiveness, fake news, cultural references).