The rapid advancement of large language models (LLMs) has revitalised research in Emotion Recognition in Conversation (ERC). However, existing LLM-based ERC approaches operate solely on textual input, whereas multimodal LLM (MLLM)-based emotion recognition methods in non-conversational scenarios typically perform only basic multimodal fusion and fail to consider speaker-sensitive contextual dependencies, which limits their performance on ERC tasks. To integrate multimodal cues effectively and to address these limitations in handling contextual dependencies, we propose a novel LLM-based framework, Causal-ERC, which captures context representations within each modality and incorporates them into the LLM. Moreover, our experimental results show that LLMs perform poorly on long conversations. To improve LLMs' ability to model long conversations, we adjust the corresponding causal prompts according to the causal type of each utterance. Experiments on two benchmark multimodal ERC (MERC) datasets demonstrate that our Causal-ERC framework consistently outperforms existing state-of-the-art approaches and improves LLMs' performance in long-context scenarios.