Embedding-based generalized zero-shot learning (GZSL) models typically first build robust latent correlations between visual and attribute features so that knowledge learned on seen classes can transfer to unseen categories. Despite leveraging attributes as priors and learning a shared embedding space, current methods exhibit two critical flaws. First, attributes of heterogeneous granularity are treated uniformly, leading to semantic ambiguity. Second, class-level misclassifications seldom align with attribute-level errors, so models cannot target the specific attributes responsible. To overcome these limitations, we introduce Structured Attribute-Guided Enhancement (SAGE), a unified framework for GZSL. A consensus-aware bidirectional attention module first synchronizes visual–semantic focus regions via a mutual-distillation scheme. Next, we partition all attributes into pairwise-disjoint subsets (Global, Context, and Local) and couple each subset with visual features extracted at the matching spatial scale. Finally, we design a cross-sample, subset-aware distillation mechanism: when a sample is misclassified, SAGE identifies the culpable attribute subset, retrieves high-confidence prototypes from a memory bank, and applies a Kullback–Leibler (KL) divergence constraint to the corresponding feature branch. Comprehensive experiments and ablations on the challenging AwA2, CUB, and SUN benchmarks validate the contribution of each component, with SAGE setting a new state of the art on all three datasets. These findings underscore SAGE’s robustness and versatility, marking a substantial advance in generalized zero-shot learning and paving the way for broader zero-resource recognition.
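The subset-aware distillation step described above can be illustrated with a minimal numpy sketch: given per-attribute predictions for a misclassified sample and a high-confidence class prototype retrieved from a memory bank, the attribute subset with the largest KL divergence is flagged as culpable and would receive the distillation constraint. The subset boundaries, array sizes, and function names here are illustrative assumptions for a toy 12-attribute space, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical partition of a 12-dim attribute vector into the three
# pairwise-disjoint subsets named in the abstract (boundaries are assumed).
ATTRIBUTE_SUBSETS = {
    "global": slice(0, 4),
    "context": slice(4, 8),
    "local": slice(8, 12),
}


def kl_div(p, q, eps=1e-8):
    """KL(p || q) between two non-negative score vectors.

    Each vector is normalized to a probability distribution first;
    eps guards against log(0)."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


def culpable_subset(pred_attrs, prototype_attrs):
    """Return (name, scores): the subset whose predicted attribute
    distribution diverges most from the prototype's, plus per-subset
    KL scores. In SAGE-style training, the KL term for that subset's
    feature branch would then be added to the loss."""
    scores = {
        name: kl_div(pred_attrs[idx], prototype_attrs[idx])
        for name, idx in ATTRIBUTE_SUBSETS.items()
    }
    return max(scores, key=scores.get), scores


# Toy usage: the sample agrees with the prototype on Global and Context
# attributes but diverges sharply on the Local ones.
prototype = np.ones(12)
pred = np.ones(12)
pred[8] = 10.0
pred[9:12] = 0.1
name, scores = culpable_subset(pred, prototype)
print(name)  # the "local" subset carries the largest divergence
```

Computing the KL score per subset, rather than over the full attribute vector, is what lets the constraint be applied only to the feature branch responsible for the error instead of penalizing all branches uniformly.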