Distribution-Consistency-Guided Multi-modal Hashing

Rong-Cheng Tu

AAAI 2025 • February 27, 2025 • Philadelphia, United States

Multi-modal hashing methods, which map instances into hash codes, have gained popularity in multi-modal retrieval tasks due to their fast retrieval speed and low storage requirements. Compared with unsupervised multi-modal hashing methods, supervised methods achieve better performance by using labels as supervisory signals. However, almost all current supervised multi-modal hashing methods implicitly assume that training sets contain no noisy labels. In real-world scenarios, where manual annotation often introduces incorrect labels, these noisy labels greatly harm the performance of supervised multi-modal hashing retrieval. To address this issue, we first discover a significant distribution consistency pattern through experiments, i.e., the 1-0 distribution of the presence or absence of each category in the label is consistent with the high-low distribution of the similarity scores of the hash codes relative to each category center. Then, inspired by this pattern, we propose a novel Distribution-Consistency-Guided Multi-modal Hashing (DCGMH) method, which aims to filter and reutilize noisy labels via the distribution consistency pattern to enhance retrieval performance.
Specifically, the proposed method first randomly initializes several category centers, each representing the centroid of its category's region, which are used to compute the similarity scores of the hash codes relative to each category center. Noisy and clean labels are then separated via the consistency pattern between the 1-0 distribution of labels and the high-low distribution of similarity scores, mitigating the impact of noisy labels. Subsequently, a reconstruction strategy, designed indirectly from the distribution consistency pattern, is applied to the filtered noisy labels: high-confidence ones are corrected, while low-confidence ones are treated as unlabeled for unsupervised learning, thereby further enhancing the model’s performance. Extensive experiments on three widely used datasets demonstrate the superiority of the proposed method over state-of-the-art baselines in multi-modal retrieval tasks.
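
The filtering and reconstruction steps described above can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the similarity measure, the rule for deciding "high" versus "low", or the confidence criterion, so cosine similarity, the `threshold` and `margin` parameters, and the function names below are all assumptions chosen for illustration. The category centers are also fixed by hand here for reproducibility, whereas the method initializes them randomly.

```python
import numpy as np

def similarity_scores(codes, centers):
    """Cosine similarity of each hash code to each category center (assumed measure)."""
    codes = codes / np.linalg.norm(codes, axis=1, keepdims=True)
    centers = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    return codes @ centers.T  # shape: (n_samples, n_categories)

def split_and_reconstruct(labels, scores, threshold=0.0, margin=0.2):
    """Hypothetical filter: a label is clean if its 1-0 pattern matches the
    high-low pattern of the scores; noisy labels are corrected only when
    every score is at least `margin` away from `threshold`."""
    labels = labels.astype(bool)
    high = scores > threshold                      # assumed "high" vs "low" rule
    clean = (high == labels).all(axis=1)           # 1-0 pattern matches high-low pattern
    confident = (np.abs(scores - threshold) >= margin).all(axis=1)
    corrected = ~clean & confident                 # noisy, high confidence: relabel
    unlabeled = ~clean & ~confident                # noisy, low confidence: drop label
    new_labels = labels.copy()
    new_labels[corrected] = high[corrected]
    return new_labels.astype(int), clean, corrected, unlabeled

# Two hand-picked 4-bit category centers and three hash codes.
centers = np.array([[1, 1, 1, 1], [1, -1, 1, -1]], dtype=float)
codes = np.array([[-1, 1, 1, 1], [1, 1, 1, -1], [1, 1, -1, -1]], dtype=float)
labels = np.array([[1, 0], [1, 0], [1, 0]])

scores = similarity_scores(codes, centers)
new_labels, clean, corrected, unlabeled = split_and_reconstruct(labels, scores)
```

In this toy input, the first sample's label agrees with its scores and is kept as clean, the second is relabeled because both of its scores are clearly high, and the third is flagged as unlabeled because its scores sit at the threshold. In an actual training loop the unlabeled samples would feed an unsupervised objective, which this sketch does not model.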