Multimodal emotion recognition plays a crucial role in enhancing the intelligence of human-computer interaction and emotional understanding. However, conventional approaches face challenges such as the scarcity of annotated data, significant modality heterogeneity, and temporal misalignment. To address these issues, we propose DHCM-CACL, a novel self-supervised emotion recognition framework that integrates EEG signals and facial expressions. In the pre-training phase, a Dynamic Hierarchical Cross-modal Mamba (DHCM) module models long-term dependencies through dynamic state matrices, suppresses noise with forgetting gates, and constructs a hierarchical cross-modal interaction structure, effectively achieving cross-modal temporal alignment and mitigating modality heterogeneity. A Confidence-Adaptive Contrastive Learning (CACL) module then weights each sample's contrastive loss by gated confidence signals derived from DHCM, prioritizing reliable samples while suppressing noisy instances, thereby improving representation reliability and generalization under limited data. In the fine-tuning phase, a cross-modal attention gating mechanism reinforces temporal associations, and an evidence-aware joint optimization objective yields probabilistic credibility estimates alongside each emotion prediction. Experiments on the DEAP and MAHNOB-HCI datasets demonstrate that our approach achieves state-of-the-art emotion classification performance in both subject-dependent and subject-independent settings.
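To make the CACL idea concrete, below is a minimal PyTorch sketch of a confidence-weighted cross-modal contrastive loss. It is an illustration, not the paper's implementation: the function name, the InfoNCE formulation, the temperature value, and the assumption that DHCM exposes a per-sample confidence in [0, 1] are all ours; the paper's exact gating signal and loss may differ.

```python
import torch
import torch.nn.functional as F

def confidence_adaptive_contrastive_loss(z_eeg, z_face, confidence, temperature=0.1):
    """Confidence-weighted cross-modal InfoNCE loss (illustrative sketch).

    z_eeg:      (B, D) EEG embeddings from the pre-training encoder
    z_face:     (B, D) facial-expression embeddings, paired row-wise with z_eeg
    confidence: (B,) gated per-sample confidence in [0, 1], assumed to come
                from the DHCM forgetting gates (hypothetical interface)
    """
    z_eeg = F.normalize(z_eeg, dim=-1)
    z_face = F.normalize(z_face, dim=-1)

    # Pairwise cosine similarities between modalities; positives lie on the diagonal.
    logits = z_eeg @ z_face.t() / temperature                     # (B, B)
    targets = torch.arange(z_eeg.size(0), device=z_eeg.device)

    # Symmetric InfoNCE (EEG->face and face->EEG), kept per-sample so that
    # each pair can be reweighted before reduction.
    loss_e2f = F.cross_entropy(logits, targets, reduction="none")
    loss_f2e = F.cross_entropy(logits.t(), targets, reduction="none")
    per_sample = 0.5 * (loss_e2f + loss_f2e)                      # (B,)

    # Adaptive weighting: reliable pairs dominate the gradient,
    # noisy pairs are suppressed.
    weights = confidence / (confidence.sum() + 1e-8)
    return (weights * per_sample).sum()
```

The key design point this sketch captures is that the confidence signal enters only as a convex reweighting of an otherwise standard symmetric contrastive objective, so low-confidence (noisy) pairs shrink toward zero influence without being discarded outright.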
