AAAI 2026

January 22, 2026

Singapore, Singapore

3D visual grounding (3DVG) identifies objects in 3D scenes from language descriptions, with applications in augmented reality and embodied AI. Existing zero-shot approaches leverage 2D vision–language models (VLMs) by converting 3D spatial information (SI) into forms amenable to VLM processing, typically composite visual inputs such as specified-view renderings or video sequences with overlaid object markers. However, this VLM ⊕ SI paradigm yields entangled visual representations that force the VLM to process all of the cluttered cues at once, making it hard to exploit spatial–semantic relationships effectively. In this work, we propose a new VLM ⊗ SI paradigm that externalizes the 3D SI into a form from which the VLM can incrementally retrieve only what it needs during its reasoning process. We instantiate this paradigm with a novel View-on-Graph (VoG) method, which organizes the scene into a multi-modal, multi-layer scene graph and allows the VLM to operate as an active agent that selectively accesses the necessary cues as it traverses the scene. This design offers two intrinsic advantages: (i) by structuring 3D context into a spatially and semantically coherent scene graph rather than confounding the VLM with densely entangled visual inputs, it makes spatial–semantic relationships easier to exploit and lowers the VLM's reasoning difficulty; and (ii) by actively exploring and reasoning over the scene graph, it naturally produces transparent, step-by-step traces for interpretable 3DVG. Extensive experiments demonstrate the effectiveness of the proposed VLM ⊗ SI paradigm and show that VoG achieves state-of-the-art zero-shot performance, establishing structured scene exploration as a promising strategy for advancing zero-shot 3DVG.
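To make the idea of incremental retrieval more concrete, the following is a minimal Python sketch of an agent that queries a scene graph one lookup at a time and records each step. It is an illustration only, not the VoG implementation: the `ObjectNode` and `SceneGraph` classes, the `find_by_label` and `neighbours` calls, the single-layer graph, and the hand-written rule standing in for the VLM are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    """One object in the scene; every field here is hypothetical."""
    obj_id: int
    label: str
    center: tuple  # (x, y, z) position, units assumed to be metres


@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)      # obj_id -> ObjectNode
    relations: dict = field(default_factory=dict)  # obj_id -> [(relation, other_id)]

    def add_object(self, node):
        self.nodes[node.obj_id] = node
        self.relations.setdefault(node.obj_id, [])

    def add_relation(self, src, relation, dst):
        self.relations[src].append((relation, dst))

    # Retrieval calls that an agent could issue one at a time.
    def find_by_label(self, label):
        return [n.obj_id for n in self.nodes.values() if n.label == label]

    def neighbours(self, obj_id, relation):
        return [dst for rel, dst in self.relations[obj_id] if rel == relation]


def ground(graph, target_label, relation, anchor_label):
    """Resolve '<target> <relation> <anchor>' using on-demand lookups.

    A fixed rule stands in for the VLM; it only shows the access pattern
    and the step-by-step trace, not any real reasoning.
    """
    trace = []
    candidates = graph.find_by_label(target_label)
    trace.append(f"find_by_label({target_label!r}) -> {candidates}")
    for cand in candidates:
        linked = graph.neighbours(cand, relation)
        trace.append(f"neighbours({cand}, {relation!r}) -> {linked}")
        if any(graph.nodes[o].label == anchor_label for o in linked):
            trace.append(f"answer: {cand}")
            return cand, trace
    trace.append("answer: None")
    return None, trace


# Toy scene: two chairs, one of them near a window.
graph = SceneGraph()
graph.add_object(ObjectNode(0, "chair", (0.0, 0.0, 0.0)))
graph.add_object(ObjectNode(1, "chair", (3.0, 1.0, 0.0)))
graph.add_object(ObjectNode(2, "window", (3.2, 1.5, 1.0)))
graph.add_relation(1, "near", 2)

target, trace = ground(graph, "chair", "near", "window")
```

In this toy scene, the query "chair near window" touches only the chair nodes and their `near` edges, and `trace` holds a readable record of each lookup, loosely mirroring the selective access and interpretable traces described in the abstract.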
