Open-vocabulary object detection (OVOD) holds promise for remote sensing, yet the domain gap between natural and aerial images hinders generalization. Dominant backgrounds, sparse labels with limited semantics, and the difficulty of semi-supervised training pose significant challenges. We introduce SOAR (Semi-supervised Open-vocabulary Aerial Object Recognition via Dual-aware Enhanced Prior Denoising), which generates pseudo-labels for semi-supervised training by learning implicit foreground priors and denoising them efficiently. Specifically, we dynamically extract background features and implicitly model foreground priors, which we treat as noisy ground truth and pass through a refiner to obtain pseudo-labels. We also introduce a dual-aware query enhancement (DAQE) module that integrates language and foreground prior information to improve query selection and feature augmentation. In addition, we address the sparsity of label information through expansion and aggregation techniques, further improving model performance. On the open-vocabulary object detection task on the DIOR dataset, our method achieves a mean Average Precision (mAP) of 68.5% and a Harmonic Mean (HM) of 55.9%, outperforming the previous state-of-the-art model's mAP of 61.6% and HM of 53.6%. Our approach offers a new solution to the open-vocabulary challenge in aerial object detection. The source code will be made available.
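
The abstract reports a Harmonic Mean (HM) without defining it. A minimal sketch of how such a score is typically computed, assuming (this is not stated in the abstract) that HM is the harmonic mean of the mAP on base classes and the mAP on novel classes, as is common in open-vocabulary detection. The function name `harmonic_mean` and the input values are hypothetical and are not results from the paper.

```python
def harmonic_mean(base_map: float, novel_map: float) -> float:
    """Harmonic mean of two non-negative scores (assumed: base and novel mAP)."""
    if base_map < 0 or novel_map < 0:
        raise ValueError("scores must be non-negative")
    total = base_map + novel_map
    if total == 0:
        # Both scores are zero; define the harmonic mean as zero.
        return 0.0
    return 2 * base_map * novel_map / total


# Hypothetical inputs for illustration only; not taken from the paper.
hm = harmonic_mean(80.0, 40.0)
print(round(hm, 2))
```

Under this assumption, the harmonic mean is pulled toward the lower of the two scores, so a model cannot reach a high HM by doing well on base classes alone.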