Multi-modal salient object detection (SOD) improves over its uni-modal counterpart by exploiting the complementary strengths of different modalities. However, this improvement relies on complete multi-modal information, which is difficult to guarantee in practice due to sensor failures and transmission errors. To address this issue, we propose a robust multi-modal SOD framework that adapts to modality-missing situations while maintaining comparable performance in modality-complete cases. Nevertheless, flexibly handling both modality-missing and modality-complete inputs and integrating their corresponding multi-modal features in a unified framework is non-trivial. To this end, we realize this framework with a Cascaded Mixture-of-Experts (CMoE) network that sequentially incorporates a missing-aware MoE and a multi-modal MoE. Specifically, the missing-aware MoE introduces zero, copy, and alter experts with a soft router to adaptively reconstruct feature representations for both missing and non-missing modalities, assisted by an expert modulation loss that guides the router to modulate the weights of the experts according to the missing condition. The multi-modal MoE introduces two homogeneous uni-modal experts that separately learn modality-specific knowledge tailored to each modality, and dynamically combines their outputs through a soft router. This cascaded architecture gives CMoE the flexibility to handle varying input conditions. Extensive experiments on RGB-D and RGB-T SOD datasets, under both modality-missing and modality-complete settings, demonstrate the effectiveness of the proposed method. Code and models will be made publicly available.
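Since the code has not yet been released, the PyTorch sketch below is only one plausible reading of the cascaded design described in the abstract. All names, shapes, and design choices here are assumptions: the experts are modeled as small MLPs, a missing modality is assumed to arrive as an all-zero feature, and the expert modulation loss is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def mlp(dim):
    """A small feed-forward expert; the paper's exact expert design is unknown."""
    return nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))


class MissingAwareMoE(nn.Module):
    """Soft mixture of zero / copy / alter experts for one modality."""

    def __init__(self, dim):
        super().__init__()
        self.alter = mlp(dim)                # synthesizes the feature from the other modality
        self.router = nn.Linear(2 * dim, 3)  # soft router over the three experts

    def forward(self, feat, other_feat):
        # Route on both modalities so the router can sense which one is missing
        # (here a missing modality is assumed to be encoded as zeros).
        w = F.softmax(self.router(torch.cat([feat, other_feat], dim=-1)), dim=-1)
        experts = torch.stack(
            [torch.zeros_like(feat),         # zero expert
             feat,                           # copy expert
             self.alter(other_feat)],        # alter expert (cross-modal reconstruction)
            dim=-1)
        # Weighted sum of expert outputs; also return the routing weights,
        # which the expert modulation loss would supervise during training.
        return (experts * w.unsqueeze(-2)).sum(-1), w


class CascadedMoE(nn.Module):
    """Missing-aware MoE per modality, followed by a multi-modal fusion MoE."""

    def __init__(self, dim):
        super().__init__()
        self.recon_a = MissingAwareMoE(dim)
        self.recon_b = MissingAwareMoE(dim)
        self.uni_a, self.uni_b = mlp(dim), mlp(dim)  # homogeneous uni-modal experts
        self.fuse_router = nn.Linear(2 * dim, 2)

    def forward(self, feat_a, feat_b):
        # Stage 1: reconstruct each modality's features under possible missingness.
        rec_a, w_a = self.recon_a(feat_a, feat_b)
        rec_b, w_b = self.recon_b(feat_b, feat_a)
        # Stage 2: dynamically combine the uni-modal experts with a soft router.
        w = F.softmax(self.fuse_router(torch.cat([rec_a, rec_b], dim=-1)), dim=-1)
        fused = torch.stack([self.uni_a(rec_a), self.uni_b(rec_b)], dim=-1)
        return (fused * w.unsqueeze(-2)).sum(-1), (w_a, w_b)


# Usage: RGB + depth token features, with the depth modality missing (zeroed).
rgb = torch.randn(2, 196, 256)
depth = torch.zeros(2, 196, 256)
fused, (w_rgb, w_depth) = CascadedMoE(256)(rgb, depth)
print(fused.shape)  # torch.Size([2, 196, 256])
```

Under this reading, the same forward pass serves both input cases: with complete inputs the routers can favor the copy experts, while with a missing modality they can shift weight to the alter expert, which is what the abstract's expert modulation loss would encourage.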