LiDAR Semantic Scene Completion (SSC) in autonomous driving requires predicting both dense occupancy and semantic labels from a sparse input point cloud. Existing methods typically adopt a cascaded architecture for feature dilation and semantic abstraction, which blurs distinctive geometric patterns and reduces feature discriminability. Moreover, conventional processing of the ground-truth labels overlooks how predictable each target voxel is given the input, resulting in ill-posed supervision and discarding informative voxels. To address these limitations, we propose Sparse-Dense Net (SDNet), a dual-branch architecture that processes the input points through parallel sparse and dense encoders. The complementary features are aligned and fused by a Sparse Dense Feature Fusion (SDFF) module and further refined by a Feature Propagation (FP) module. Additionally, we introduce an input-aware label refinement strategy, comprising Sparse-Guided Filtering (SGF) to remove unpredictable targets and Ignored Voxel Recycling (IVR) to exploit informative ignored voxels for auxiliary supervision. Together, these components improve both feature learning and label quality. Extensive experiments on the SemanticKITTI and nuScenes OpenOccupancy datasets validate the effectiveness of our approach: SDNet achieves state-of-the-art performance on both datasets and ranks 1st on the official SemanticKITTI benchmark with 42.1 mIoU, outperforming the previous best by 4.2 points (+11.1%).
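To make the label-refinement idea concrete, the following is a minimal, hypothetical sketch of how input-aware refinement could work on a voxel grid: voxels whose ground-truth label has no nearby input evidence are filtered out (the SGF idea), while originally ignored voxels that do lie near input evidence are recycled as auxiliary targets (the IVR idea). The function names, the Chebyshev-distance heuristic, the `IGNORE` value, and the occupancy-only auxiliary target are all illustrative assumptions, not the paper's actual implementation.

```python
IGNORE = 255  # a common "ignore" label value in SSC datasets (assumption)


def near_input(voxel, occupied, radius):
    """True if `voxel` is within `radius` (Chebyshev distance) of any
    input-occupied voxel, i.e. the target has supporting input evidence."""
    x, y, z = voxel
    return any(
        max(abs(x - ox), abs(y - oy), abs(z - oz)) <= radius
        for ox, oy, oz in occupied
    )


def refine_labels(gt, occupied, radius=1):
    """Input-aware label refinement (illustrative only).

    gt:       dict mapping voxel coordinate -> semantic label (or IGNORE)
    occupied: set of voxel coordinates occupied by the sparse input
    Returns (main_targets, aux_targets).
    """
    main, aux = {}, {}
    for voxel, label in gt.items():
        supported = near_input(voxel, occupied, radius)
        if label != IGNORE:
            # SGF-style filtering: keep only targets the input can plausibly
            # predict; mask the rest as IGNORE.
            main[voxel] = label if supported else IGNORE
        elif supported:
            # IVR-style recycling: an ignored voxel near input evidence is
            # still informative, so reuse it as an auxiliary occupancy target.
            aux[voxel] = 1
    return main, aux
```

Running it on a toy grid: a labeled voxel adjacent to the input is kept, a distant labeled voxel is masked, and an ignored voxel near the input is recycled into the auxiliary set.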
