With the widespread deployment of cross-modal retrieval in real-world scenarios, ensuring robustness against adversarial attacks is increasingly critical. Notably, deep cross-modal hashing is highly vulnerable to adversarial attacks owing to its discrete, low-dimensional hash codes, while existing defense methods often fail to suppress perturbations embedded in vulnerable features and lack the capacity to model modality-specific structural differences, resulting in suboptimal adversarial robustness. To address these challenges, we propose a novel Disentangled Representation-Focused Generative Defense (DRFGD) framework for attack-tolerant cross-modal hashing. Without altering the structure of the retrieval model, DRFGD defends against adversarial attacks by disentangling input representations into adversarial-robust and adversarial-vulnerable components via an efficient dual-branch semantic-aware encoder. Guided by the disentangled robust features, an attack-tolerant generative module synthesizes semantically aligned, perturbation-resilient examples for robust adversarial training, substantially strengthening the collaborative defense against attackers. Consequently, semantically consistent hash codes can be obtained that enhance adversarial robustness in complex cross-modal attack scenarios. Extensive experiments on public benchmarks demonstrate that DRFGD substantially improves retrieval robustness under various attack scenarios and outperforms state-of-the-art defenses.
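To make the core idea concrete, the sketch below illustrates the dual-branch disentanglement described in the abstract: features are split into an adversarial-robust and an adversarial-vulnerable component, and hash codes are derived only from the robust branch, so perturbations absorbed by the vulnerable branch are less likely to flip bits. This is a minimal, hypothetical NumPy sketch with toy linear-ReLU branches and random, untrained weights; the dimensions (`d_in`, `d_comp`, `n_bits`) and helper names are illustrative assumptions, not the paper's actual architecture or training objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 512-d input features, 64-d disentangled
# components, 32-bit hash codes (illustrative, not from the paper).
d_in, d_comp, n_bits = 512, 64, 32
W_rob = rng.normal(scale=0.01, size=(d_in, d_comp))
W_vul = rng.normal(scale=0.01, size=(d_in, d_comp))
W_hash = rng.normal(scale=0.01, size=(d_comp, n_bits))

def branch(x, W):
    # Toy linear-ReLU branch; DRFGD uses deep semantic-aware encoders.
    return np.maximum(0.0, x @ W)

def disentangle(x):
    # Dual-branch split into adversarial-robust and adversarial-vulnerable parts.
    return branch(x, W_rob), branch(x, W_vul)

def hash_code(z_robust):
    # Hash codes come only from the robust component, so perturbation
    # energy captured by the vulnerable branch does not reach the bits.
    return np.sign(z_robust @ W_hash)

x_clean = rng.normal(size=(1, d_in))
x_adv = x_clean + 0.01 * rng.normal(size=(1, d_in))  # small perturbation

z_r_clean, _ = disentangle(x_clean)
z_r_adv, _ = disentangle(x_adv)

codes_clean = hash_code(z_r_clean)
codes_adv = hash_code(z_r_adv)
# Fraction of hash bits that agree between clean and perturbed inputs.
agreement = float(np.mean(codes_clean == codes_adv))
```

Even with untrained random weights, a small input perturbation leaves most sign bits unchanged; the framework's training losses would further push perturbation sensitivity into the vulnerable branch.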