AAAI 2026

January 22, 2026

Singapore, Singapore


Generating high-quality, controllable, and structurally consistent 3D scenes is a fundamental yet challenging task, especially in complex multi-object environments. We present SceneGenesis, a unified framework for 3D scene synthesis that systematically integrates semantic structural priors with mesh-guided video-geometry fusion. The process begins with a semantic structural initialization module, which leverages large language models to convert textual scene prompts into category-aware object descriptions. These are transformed into structured meshes by combining procedural approximations for large-scale objects with pretrained mesh generators for fine-grained assets, enabling precise layout control and scene scalability. To synthesize rich, style-controllable appearances, we render depth and semantic maps from the initialized scene and condition a pretrained video diffusion model on them to generate geometry-aware multi-view video sequences, where a consistency-guided latent fusion strategy further enhances temporal consistency across long sequences. Crucially, we introduce a mesh-guided video-geometry fusion module that reconstructs coherent 3D Gaussian scenes by aligning mesh priors with video outputs. This module incorporates mesh-conditioned fragment initialization, progressive geometric refinement, and structure-aware optimization, significantly enhancing global geometric fidelity and visual realism. Extensive experiments demonstrate that SceneGenesis enables flexible style variation and object-level editing while achieving superior controllability, scalability, and 3D structural quality, offering an effective solution for 3D scene synthesis.
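The abstract describes a three-stage pipeline: semantic structural initialization, geometry-conditioned video diffusion, and mesh-guided video-geometry fusion. The data flow between these stages can be sketched as below; note that every function name, data shape, and stub heuristic here is an illustrative assumption made for exposition, not the authors' actual API, and the real models (the LLM, mesh generators, and video diffusion model) are replaced by placeholders.

```python
# Hedged sketch of the SceneGenesis pipeline's stage-to-stage data flow.
# All names and structures are assumptions; heavy models are stubbed out.

def semantic_structural_init(prompt: str) -> list[dict]:
    """Stage 1 (stubbed): parse a scene prompt into category-aware object
    descriptions, then build coarse meshes -- procedural approximations
    for large-scale objects, a pretrained generator for fine assets."""
    objects = [
        {"category": word, "scale": "large" if word in {"room", "floor", "wall"} else "fine"}
        for word in prompt.split()
    ]
    return [
        {"object": obj, "mesh_source": "procedural" if obj["scale"] == "large" else "pretrained_generator"}
        for obj in objects
    ]

def video_diffusion_stage(meshes: list[dict], style: str, n_frames: int = 4) -> list[dict]:
    """Stage 2 (stubbed): render depth and semantic maps from the mesh
    layout, then condition a video diffusion model on them to produce a
    multi-view sequence with a chosen appearance style."""
    conditioning = [
        {"depth_map": f"depth({m['object']['category']})",
         "semantic_map": m["object"]["category"]}
        for m in meshes
    ]
    return [{"frame": i, "style": style, "cond": conditioning} for i in range(n_frames)]

def mesh_guided_fusion(meshes: list[dict], frames: list[dict]) -> dict:
    """Stage 3 (stubbed): align mesh priors with the video output to
    reconstruct a 3D Gaussian scene (fragment initialization, progressive
    refinement, and structure-aware optimization are elided)."""
    return {"gaussian_fragments": len(meshes), "frames_used": len(frames)}

# End-to-end flow: prompt -> meshes -> conditioned video -> Gaussian scene.
meshes = semantic_structural_init("room table lamp")
frames = video_diffusion_stage(meshes, style="cozy")
scene = mesh_guided_fusion(meshes, frames)
print(scene)  # {'gaussian_fragments': 3, 'frames_used': 4}
```

The key point the sketch conveys is that the mesh layout produced in stage 1 is reused twice: once as rendering input that conditions the video diffusion model, and again as a geometric prior during the final Gaussian reconstruction.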
