AAAI 2026

January 25, 2026

Singapore, Singapore

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

Monocular 3D object detection offers a cost-effective solution for autonomous driving, but it suffers from the ill-posed depth and a limited field of view. These constraints lead to the lack of geometric cues and reduced accuracy in occluded or truncated scenes. While recent approaches incorporate additional depth information to address geometric ambiguity, they overlook the importance of visual cues essential for robust object recognition. In this paper, we propose MonoCLUE that enhances monocular 3D detection by leveraging both local clustering and generalized scene memory of visual features. First, we perform K-means clustering on visual features to capture distinct object-level appearance visual parts (e.g., bonnet, car roof), which improves the detection of partially visible objects. The clustered features are then propagated across the entire region to capture objects with similar appearances. Second, we construct a generalized scene memory by aggregating clustered features across images, providing consistent appearance representations that generalize scenes. This improves the consistency of object-level features, enabling stable detection across varying environments. Lastly, we integrate both local cluster features and generalized scene memory into object queries, guiding attention toward informative regions in the feature map. Exploiting an unified local clustering and generalized scene memory strategy, MonoCLUE enables robust monocular 3D detection under occlusion and limited visibility. Our proposed model achieves state-of-the-art performance on the KITTI benchmark.

Downloads

Paper

Next from AAAI 2026

Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness
poster

Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness

AAAI 2026

+1
Xiang Chen and 3 other authors

25 January 2026

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2025 Underline - All rights reserved