AAAI 2026

January 25, 2026

Singapore, Singapore

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

The rapid advancement of vision-language models (VLMs) in 3D domains has accelerated research in text-query-guided point cloud processing, though existing methods underperform in point-level segmentation due to inadequate 3D-text alignment that limits local feature-text context linking. To address this limitation, we propose $\textbf{MR-COSMO}$, a Visual-Text $\textbf{M}$emory $\textbf{R}$ecall and Direct $\textbf{C}$r$\textbf{OS}$s-$\textbf{MO}$dal Alignment Method for Query-Driven 3D Segmentation, establishing explicit alignment between 3D point clouds and text/2D image data through a dedicated direct cross-modal alignment module while implementing a visual-text memory module with specialized feature banks. This direct alignment mechanism enables precise fusion of geometric and semantic features, while the memory module employs specialized banks storing text features, visual features, and their correspondence mappings to dynamically enhance scene-specific representations via attention-based knowledge recall. Comprehensive experiments across 3D instruction, reference, and semantic segmentation benchmarks confirm state-of-the-art performance.

Downloads

Paper

Next from AAAI 2026

From Pixels to Logic: A Perception-Reasoning Decomposition Framework for Open-World Referring Expression Comprehension
poster

From Pixels to Logic: A Perception-Reasoning Decomposition Framework for Open-World Referring Expression Comprehension

AAAI 2026

+1
Lihong Huang and 3 other authors

25 January 2026

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2025 Underline - All rights reserved