
AAAI 2026

January 22, 2026

Singapore, Singapore


Pretrained vision-language models (VLMs), especially CLIP, excel at adapting to downstream tasks through fine-tuning with sufficient high-quality labeled data. However, real-world training data often contains noisy labels, leading to significant performance degradation when models are naively fine-tuned on them. Existing noisy label learning methods for VLMs typically leverage the model's own pretrained knowledge, either via zero-shot predictions or vanilla self-training based on them, to identify and handle noisy samples. Crucially, these approaches blindly trust the VLM's pretrained knowledge, which can introduce endogenous confirmation bias: erroneous pretrained priors lead to incorrect noise detection, further amplifying the bias and corrupting the model. To overcome this limitation, we propose the Debiased Knowledge Adaptation Framework (DKAF), which empowers the model to challenge and correct potentially flawed zero-shot predictions. DKAF operates in three progressive phases: (1) Clean Sample Selection. We introduce a cross-modal collaborative pseudo-labeling strategy to train a robust noisy label detector, explicitly mitigating confirmation bias by aggregating diverse signals beyond the model's initial zero-shot view. (2) Noisy Label Refinement. For samples identified as noisy, we apply a dual-modal consistency strategy to selectively correct their labels, leveraging alignment between the visual and textual modalities to guide refinement while minimizing reliance on potentially biased internal knowledge. (3) Model Adaptation. The model is progressively fine-tuned on the jointly curated dataset of selected clean samples and corrected noisy samples, promoting robust adaptation to the target task. Extensive experiments on nine benchmark datasets (with both synthetic and real-world noise) demonstrate that DKAF consistently outperforms state-of-the-art multimodal noisy label learning methods. Notably, under high-noise conditions, DKAF achieves average accuracy improvements of 3.28%.
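The dual-modal consistency idea in phase (2) can be sketched as follows: a sample's label is corrected only when a textual view (similarity to class-name text embeddings) and a visual view (similarity to prototypes of already-selected clean images) confidently agree on the same class. This is a minimal illustrative sketch, not the paper's implementation; the function name `refine_labels`, the threshold `tau`, and the CLIP-style temperature of 100 are all assumptions for the example.

```python
import numpy as np

def l2norm(x):
    """Normalize rows to unit length (cosine-similarity-ready)."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def refine_labels(img_emb, text_class_emb, visual_prototypes, noisy_labels, tau=0.6):
    """Hypothetical dual-modal consistency label refinement.

    img_emb:           (N, D) image embeddings from the frozen VLM encoder
    text_class_emb:    (C, D) class-name text embeddings (textual view)
    visual_prototypes: (C, D) mean embeddings of selected clean images (visual view)
    noisy_labels:      (N,)   possibly corrupted integer labels
    A sample is relabeled only when both views predict the same class and
    the textual view is confident (softmax probability above tau);
    otherwise its original label is kept.
    """
    img = l2norm(img_emb)
    txt_sim = img @ l2norm(text_class_emb).T       # (N, C) cosine similarities
    vis_sim = img @ l2norm(visual_prototypes).T    # (N, C)

    txt_pred = txt_sim.argmax(axis=1)
    vis_pred = vis_sim.argmax(axis=1)

    # CLIP-style temperature-scaled softmax over the textual view
    txt_prob = np.exp(txt_sim * 100.0)
    txt_prob /= txt_prob.sum(axis=1, keepdims=True)
    conf = txt_prob[np.arange(len(img)), txt_pred]

    agree = (txt_pred == vis_pred) & (conf > tau)
    return np.where(agree, txt_pred, noisy_labels)
```

For example, an image whose embedding clearly aligns with class 0 in both views, but which carries noisy label 1, would be relabeled to 0; a sample where the two views disagree keeps its given label.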


Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2025 Underline - All rights reserved