Recent vision-language model (VLM)-based methods have achieved promising results in zero-shot out-of-distribution (OOD) detection by effectively leveraging local patch features. However, the zero-shot nature inherently comes with two limitations: 1) imperfect local feature prototypes; 2) lack of OOD prototypes. In this paper, we propose Intra-Image Mining (IIM), a lightweight framework designed to overcome these limitations in a few-shot manner. IIM is motivated by the fact that local patches within an image often exhibit diverse semantics, with some patches deviating from the main class concept. Therefore, for each image, we first select the top-$k$ class prototype-related patches as positive samples and leverage them to refine and optimize the local feature prototype. Then, the next top-$k$ among the remaining patches are selected as negatives—serving as OOD signals to construct OOD prototypes. This process yields coherent local positives and challenging negatives, effectively enhancing the model’s local feature discrimination. In addition, we propose a novel OOD evaluation method named Symmetric Maximum Concept Matching (S-MCM). While existing approaches typically adopt an image-to-text scheme—comparing image features to textual class prototypes—S-MCM further incorporates a text-to-image perspective, leading to more reliable OOD detection. We also propose two benchmarks to analyze the impact of semantic diversity within the ID dataset. Built on a frozen VLM, IIM, in conjunction with S-MCM, achieves consistent gains in OOD detection on ImageNet-1k and other benchmarks, outperforming prior methods in FPR95 and AUROC across various few-shot settings.
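The core ideas above — ranking patches by prototype similarity to mine positives and pseudo-OOD negatives, and scoring symmetrically in both image-to-text and text-to-image directions — can be sketched as follows. This is a minimal illustration assuming cosine-similarity ranking and softmax-based scoring; the function names, the exact negative-selection rule, and the way the two directional scores are combined are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np


def select_patches(patch_feats, class_proto, k=5):
    """Rank patches by cosine similarity to the class prototype;
    take the top-k as positives and the next top-k as pseudo-OOD
    negatives (hypothetical sketch of the IIM selection step)."""
    p = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    c = class_proto / np.linalg.norm(class_proto)
    sims = p @ c                      # similarity of each patch to the prototype
    order = np.argsort(-sims)         # most prototype-aligned first
    positives = order[:k]             # coherent local positives
    negatives = order[k:2 * k]        # challenging near-miss negatives
    return positives, negatives


def s_mcm_score(image_feat, patch_feats, class_protos, tau=1.0):
    """Hypothetical symmetric matching score: an image-to-text term
    (global image feature vs. class prototypes) plus a text-to-image
    term (each prototype vs. its best-matching patch)."""
    def norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    def softmax(z):
        e = np.exp(z / tau - (z / tau).max())
        return e / e.sum()

    img2txt = norm(image_feat) @ norm(class_protos).T                 # [C]
    txt2img = (norm(class_protos) @ norm(patch_feats).T).max(axis=1)  # [C]
    # Higher score = more in-distribution; low scores flag OOD inputs.
    return softmax(img2txt).max() + softmax(txt2img).max()
```

In this sketch the selected negatives would be pooled across the few-shot images to build OOD prototypes, while the refined positives update the local feature prototype; both selection and scoring operate on features from a frozen VLM.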