AAAI 2026

January 24, 2026

Singapore, Singapore

Text-to-image person re-identification (TIReID) aims to retrieve the most relevant pedestrian images from an image gallery based on natural language descriptions. Recent studies have achieved significant performance improvements by leveraging Masked Language Modeling (MLM) to align fine-grained information through local matching. However, three problems remain. On the text side, randomly masking tokens may disrupt the semantic relationships between local tokens, leading to feature misalignment. On the image side, redundant patches in pedestrian images hinder information interaction across modalities. In addition, noisy image-text pairs complicate learning, as the model may be misled into recognizing incorrect patterns. To address these issues, we propose a robust fine-grained local alignment framework based on Key Phrase Dynamic Mask (KPDM). First, we strengthen the semantic relationships between text tokens by implementing an "adjective + noun" phrase-level masking strategy, mitigating local misalignment. Additionally, we integrate cross-layer importance estimation to highlight key pedestrian image representations while removing redundant image features. Building on this, we design a novel frequency-based masked language loss (FMLM) to supervise fine-grained semantic-level local alignment. Second, we propose a trusted consensus partitioning mechanism that uses intra-identity image-text similarity distributions to identify noisy pairs, enhancing model robustness. Extensive experiments show that our method achieves 67.95% Rank-1 and 51.88% mAP on the RSTPReid dataset, exceeding the previous state-of-the-art by 2.6% and 1%, respectively. Furthermore, KPDM achieves Rank-1 accuracies of 75.97% on the CUHK-PEDES dataset and 67.78% on the ICFG-PEDES dataset, outperforming earlier methods.
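The phrase-level masking idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how phrases are detected or how the "dynamic" selection works. The sketch assumes part-of-speech tags are already available (here written by hand, using `ADJ` and `NOUN` labels), and the names `find_adj_noun_spans`, `mask_phrases`, and `mask_prob` are hypothetical. It only shows the difference from random token masking: an adjective run and the noun it modifies are masked together as one unit.

```python
import random

MASK = "[MASK]"  # placeholder mask token; the real tokenizer's token may differ


def find_adj_noun_spans(tags):
    """Return (start, end) pairs, end exclusive, for one or more ADJ tags
    immediately followed by a NOUN tag."""
    spans = []
    i = 0
    while i < len(tags):
        if tags[i] == "ADJ":
            j = i
            while j < len(tags) and tags[j] == "ADJ":
                j += 1
            if j < len(tags) and tags[j] == "NOUN":
                spans.append((i, j + 1))
                i = j + 1
                continue
            i = j  # adjective run not followed by a noun: skip it
            continue
        i += 1
    return spans


def mask_phrases(tokens, tags, mask_prob=0.5, rng=None):
    """Mask each adjective + noun phrase as a whole with probability mask_prob.

    Assumption: a fixed per-phrase probability stands in for whatever
    selection rule the paper actually uses.
    """
    rng = rng or random.Random()
    masked = list(tokens)
    for start, end in find_adj_noun_spans(tags):
        if rng.random() < mask_prob:
            for k in range(start, end):
                masked[k] = MASK
    return masked


# Hand-tagged example description (tags are an assumption, not tagger output).
tokens = ["a", "woman", "wearing", "a", "red", "jacket",
          "and", "black", "leather", "boots"]
tags = ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN",
        "CCONJ", "ADJ", "ADJ", "NOUN"]

spans = find_adj_noun_spans(tags)
all_masked = mask_phrases(tokens, tags, mask_prob=1.0)
print(spans)
print(all_masked)
```

With `mask_prob=1.0` every detected phrase is masked, so "red jacket" and "black leather boots" disappear as complete units while "woman" (a noun with no adjective) is left intact. In practice the tags would come from a part-of-speech tagger and the masked positions would be mapped onto the model's subword tokens.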
