Person re-identification (ReID) aims to retrieve images of a target pedestrian from either a visual query (image-to-image, I2I) or a textual description (text-to-image, T2I). Although both tasks share the same retrieval objective, they face distinct challenges: I2I focuses on learning discriminative identity representations, while T2I emphasizes cross-modal semantic alignment. Existing approaches typically handle these tasks separately or combine them naively, which often leads to task interference and performance degradation. To address this, we propose a unified framework that leverages task-aware prompt learning to jointly optimize both tasks. Specifically, we design a Task-Routed Transformer that introduces dual classification tokens within a shared visual encoder to decouple task-specific representations. On top of this, we develop a Task-Conditioned Prompt Alignment module that constructs hierarchical prompts by integrating identity-level learnable tokens with sample-level pseudo-text tokens. These pseudo-tokens are converted from image or text features via modality-specific decoders, injecting fine-grained instance-level semantics into the prompts. Furthermore, we introduce a Cross-Modal Prompt Regularization strategy to enforce semantic alignment in the prompt token space, encouraging pseudo-prompts to preserve source-modality semantics while enhancing cross-modal transferability. Extensive experiments on multiple benchmark datasets demonstrate that our approach effectively mitigates task interference and achieves state-of-the-art performance on both I2I and T2I person ReID tasks.
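The dual-classification-token routing described above can be illustrated with a minimal schematic sketch. This is not the paper's implementation: the encoder is a pass-through stand-in, the function and token names are hypothetical, and tokens are plain Python lists; the sketch only shows how a task flag selects which classification token's output serves as the task-specific representation within a shared encoder.

```python
# Hypothetical sketch of task-routed dual classification tokens.
# A shared encoder processes [cls_i2i, cls_t2i, patch tokens]; the task
# flag then reads out that task's own classification token, so the two
# tasks' representations are decoupled while the backbone stays shared.

def shared_encoder(tokens):
    # Stand-in for a shared transformer encoder: passes tokens through unchanged.
    return tokens

def task_routed_forward(patch_tokens, task):
    cls_i2i = [0.0]  # learnable token dedicated to image-to-image retrieval
    cls_t2i = [1.0]  # learnable token dedicated to text-to-image retrieval
    sequence = [cls_i2i, cls_t2i] + patch_tokens
    outputs = shared_encoder(sequence)
    # Route: each task reads its own classification token's output.
    return outputs[0] if task == "i2i" else outputs[1]
```

With the pass-through encoder, routing simply returns the token reserved for the requested task; in a real model both tokens would attend to the same patch tokens but be supervised by task-specific objectives.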
