Open-Set Domain Generalization (OSDG) aims to generalize to unseen target domains that contain open classes, and its core challenge lies in identifying unknown samples never encountered during training. Recently, CLIP has shown impressive performance in OSDG, yet it remains caught in the trade-off between the structural risk of known classes and the open-space risk of unknown classes, and it is prone to over-confidence, especially when distinguishing known-like unknown samples. To this end, we propose a Semantic-enhanced CLIP (SeeCLIP) framework that leverages fine-grained semantics to strengthen unknown detection, accommodating both risks while enabling precise discrimination among categories. In SeeCLIP, we propose a semantic-aware prompt enhancement module that extracts fine-grained key semantic features and establishes fine-grained vision-language alignment. For prompt learning, we propose duplex contrastive learning, which jointly optimizes duplex losses so that the unknown prompt stays similar to known prompts overall yet exhibits key semantic differences. We also design a semantic-guided diffusion module to capture nuanced semantics during generation: by injecting perturbed key semantics into a diffusion model as control conditions, it generates challenging pseudo-open samples with high similarity yet low belongingness to known classes. We further formulate a generalization bound for OSDG and show that SeeCLIP achieves a lower generalization risk. Extensive experiments on benchmark datasets validate the superiority of SeeCLIP, which outperforms state-of-the-art methods by nearly 3% in accuracy and 5% in H-index.