Facial Expression Recognition (FER) seeks to classify affective states from facial images and remains a challenging problem due to variations in real-world conditions. The FER task becomes particularly complex in unconstrained environments characterized by partial occlusions, varying head poses, and other in-the-wild factors. To address these problems, current approaches rely on extensive learnable parameters and complex model architectures, which can lead to overfitting and cause the FER model to focus on non-discriminative facial regions. In this work, we propose HKAFER, a model that adaptively enhances visual expression representations by efficiently fine-tuning the image encoder of large Visual Foundation Models (VFMs) and Vision-Language Models (VLMs). Specifically, we introduce Heterogeneous Kronecker Adaptation (HeKA), which composes multi-scale Kronecker-product adapters in parallel, offering significantly more diverse subspaces in which to learn the incremental matrices. In addition, we propose a Dual-Branch Interactive Router (DBIR) that dynamically assigns weights to the adapters, promoting collaboration and information flow among them. In this way, HKAFER effectively captures robust spatial features and regional associations. Experimental results demonstrate that the proposed model not only outperforms state-of-the-art methods on several FER benchmarks but also uses significantly fewer trainable parameters.
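The abstract does not give the exact architecture, but the core mechanism it names (parameter-efficient incremental weights built from Kronecker products, mixed by a router) can be sketched in plain numpy. Everything below is illustrative: the factor shapes, the softmax router standing in for DBIR, and all variable names are assumptions, not the authors' implementation.

```python
import numpy as np

def kronecker_adapter(A, B):
    # Incremental weight delta_W = A kron B: a (p x p) and an (r x r)
    # factor yield a (p*r x p*r) update while training only
    # p*p + r*r parameters instead of (p*r)^2.
    return np.kron(A, B)

rng = np.random.default_rng(0)
d = 12  # hypothetical hidden size of the frozen image encoder layer

# Two "scales": different factor sizes whose product both equal d,
# giving heterogeneous subspaces for the incremental matrices.
scales = [(3, 4), (4, 3)]  # assumed factor sizes; 3 * 4 = 4 * 3 = d
adapters = []
for p, r in scales:
    A = rng.standard_normal((p, p)) * 0.01  # small init, near-zero update
    B = rng.standard_normal((r, r)) * 0.01
    adapters.append(kronecker_adapter(A, B))

# A softmax over learned logits (a stand-in for the paper's DBIR)
# dynamically weights the parallel adapters.
logits = rng.standard_normal(len(adapters))
weights = np.exp(logits) / np.exp(logits).sum()
delta_W = sum(w * dW for w, dW in zip(weights, adapters))

# Adapted forward pass: frozen weight plus the routed incremental update.
W_frozen = rng.standard_normal((d, d))
x = rng.standard_normal(d)
y = (W_frozen + delta_W) @ x
```

The parameter saving is the point of the Kronecker construction: here each adapter trains 3*3 + 4*4 = 25 values instead of the 144 of a full 12 by 12 update, and the router adds only one logit per adapter.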
