Accurate 3D vehicle pose and shape reconstruction from monocular images remains a formidable challenge for autonomous driving, particularly for distant, occluded, or small objects. Existing methods often suffer from geometric ambiguity in depth estimation and structural hollowness in shape recovery, primarily due to inadequate multi-scale feature aggregation and inflexible prior modeling. To overcome these limitations, a novel framework termed MonoVPR is proposed, which integrates dynamic context adaptation with progressive geometry refinement. Specifically, a Hierarchical Dual-Context Attention (HDCA) module is introduced to resolve scale-dependent degradation through gated cross-attention across multi-resolution feature maps, dynamically fusing object-centric geometric cues with scene-centric semantics. For shape refinement, the Bounded Iterative Mesh Refiner (BIMR) is developed, in which template-guided deformations are progressively optimized via multi-head deformable attention and a tanh-bounded correction loop, keeping reconstructions physically plausible. Extensive experiments on the ApolloCar3D benchmark demonstrate that MonoVPR achieves state-of-the-art performance, with particularly strong results in reconstructing geometrically consistent shapes and accurate poses in challenging long-range and occluded scenarios.
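The gated cross-attention idea behind HDCA can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the abstract does not specify the projections, the gate form, or how resolution levels are combined, so this version assumes that scene tokens at each level serve directly as keys and values, that a single scalar sigmoid gate per level is computed from the concatenated object feature and attended context, and that levels are combined by a gated residual sum. The names `gated_dual_context`, `gate_weights`, and `gate_bias` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Numerically stable softmax over a list of attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cross_attend(query: Vector, tokens: List[Vector]) -> Vector:
    # Scaled dot-product attention of one object query over the tokens of a
    # single feature level; tokens act as both keys and values (assumption).
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(len(query))]


def gated_dual_context(
    obj_feat: Vector,
    scene_levels: List[List[Vector]],
    gate_weights: Vector,
    gate_bias: float,
) -> Vector:
    # Hypothetical HDCA-style fusion: for each resolution level, attend from
    # the object feature to scene tokens, then add the attended context
    # scaled by a sigmoid gate computed from the concatenation [obj_feat, context].
    fused = list(obj_feat)
    for tokens in scene_levels:
        context = cross_attend(obj_feat, tokens)
        g = sigmoid(dot(gate_weights, obj_feat + context) + gate_bias)
        fused = [f + g * c for f, c in zip(fused, context)]
    return fused


# Toy example: a 2-D object feature and two scene levels (fine and coarse).
obj = [1.0, 0.0]
levels = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]
closed = gated_dual_context(obj, levels, [0.0] * 4, gate_bias=-50.0)
opened = gated_dual_context(obj, levels, [0.0] * 4, gate_bias=50.0)
```

With a strongly negative `gate_bias` every gate closes and the output collapses to the object feature; with a strongly positive bias all scene context is admitted. A learned gate would sit between these extremes, choosing per object how much context each resolution contributes.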
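The tanh-bounded correction loop in BIMR can be sketched as follows, under stated assumptions. The mesh is taken to be a list of 3D vertex tuples initialized from a template, and the learned offset predictor (multi-head deformable attention in the paper) is replaced by an arbitrary callable returning raw offsets. The function `bounded_refine`, its `max_step` and `num_iters` parameters, and the toy `wild_predictor` are hypothetical; the point is only that squashing each update with `tanh` caps how far any vertex can move per iteration.

```python
import math
from typing import Callable, List, Tuple

Vertex = Tuple[float, float, float]


def bounded_refine(
    template: List[Vertex],
    predict_offsets: Callable[[List[Vertex], int], List[Vertex]],
    num_iters: int = 3,
    max_step: float = 0.05,
) -> List[Vertex]:
    # Start from the template and apply num_iters rounds of corrections.
    # Each raw per-axis offset d is squashed to max_step * tanh(d), so one
    # round moves a coordinate by at most max_step, and the total drift from
    # the template is at most num_iters * max_step per axis.
    verts = [tuple(v) for v in template]
    for it in range(num_iters):
        raw = predict_offsets(verts, it)
        verts = [
            tuple(c + max_step * math.tanh(d) for c, d in zip(v, o))
            for v, o in zip(verts, raw)
        ]
    return verts


def wild_predictor(verts: List[Vertex], it: int) -> List[Vertex]:
    # Toy stand-in that always proposes huge, unbounded offsets.
    return [(100.0, -100.0, 0.0) for _ in verts]


refined = bounded_refine([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], wild_predictor)
```

Even though the toy predictor asks for offsets of ±100, each coordinate moves by at most `max_step` per iteration, so three iterations with `max_step=0.05` shift any vertex by at most 0.15 along each axis.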