Singapore

Rectification flow Transformers (RFTs) have shown promising performance in diffusion-based image synthesis, but are typically confined to lower-resolution scenarios, limiting their ability to generate high-resolution images. Existing resolution extrapolation approaches often suffer from excessive computational overhead, resulting in prolonged inference times. We propose LookFlow, a training-free high-resolution synthesis framework that accelerates inference while preserving visual quality. Building on pretrained text-to-image RFTs, LookFlow employs a dynamic lookahead guidance flow mechanism to refine high-resolution velocity predictions by leveraging multi-timestep lookahead information extracted from a low-resolution flow. Additionally, reusing temporally similar features across consecutive timesteps drastically reduces computation and inference time. Extensive experiments on COCO demonstrate that LookFlow robustly scales resolutions from $4\times$ to $25\times$, achieving a speedup of up to $2.01\times$ while maintaining competitive visual fidelity.

AAAI 2026

LookFlow: Training-Free and Efficient High-Resolution Image Synthesis via Dynamic Lookahead Guidance Flow

high-resolution generation

diffusion models

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.



We present a novel framework for high-fidelity novel view synthesis (NVS) from sparse images, addressing key limitations in recent feed-forward 3D Gaussian Splatting (3DGS) methods built on Vision Transformer (ViT) backbones. While ViT-based pipelines offer strong geometric priors, they are often constrained by low-resolution inputs due to computational costs. Moreover, existing generative enhancement methods tend to be 3D-agnostic, resulting in inconsistent structures across views, especially in unseen regions.
To overcome these challenges, we design a Dual-Domain Detail Perception Module, which enables handling high-resolution images without being limited by the ViT backbone, and endows Gaussians with additional features to store high-frequency details. We develop a feature-guided diffusion network, which can preserve high-frequency details during the restoration process. We introduce a unified training strategy that enables joint optimization of the ViT-based geometric backbone and the diffusion-based refinement module. Experiments demonstrate that our method can maintain superior generation quality across multiple datasets.

One-Shot Refiner: Boosting Feed-forward Novel View Synthesis via One-Step Diffusion

Robust medical image classification under input corruption and bag-level annotation remains a critical challenge in clinical AI applications. We propose **QAPNet**, a Quantum-Attentive Patchwise Network that integrates quantum neural encoding, additive attention-based instance reweighting, and prototype-contrastive regularization for reliable diagnosis from degraded inputs. Our framework uses a sliding-window strategy to divide each MRI image into overlapping patches, each of which is encoded via an 8-qubit quantum circuit using $RY$-based noise-sensitive layers, yielding expressive low-dimensional representations without classical CNNs. A lightweight additive attention mechanism computes instance-wise importance weights that enable interpretable and noise-aware bag-level aggregation. To enhance robustness, we apply a contrastive loss that aligns clean and noisy embeddings and enforce prototype-guided clustering via class-wise centroids. We evaluate QAPNet across seven benchmark medical imaging datasets under three levels of additive Gaussian noise ($\sigma \in \{5\%, 10\%, 30\%\}$). QAPNet consistently outperforms eight strong baselines, achieving up to $+20.8\%$ higher accuracy on OASIS (with $30\%$ noise) and $+17.7\%$ on PathMNIST, and maintains stable performance ($<4\%$ degradation) in all settings. Ablation studies confirm the critical role of quantum encoding, attention-based aggregation, and prototype contrastive learning. These results suggest that QAPNet offers a scalable and interpretable architecture for real-world noisy medical imaging tasks, bridging quantum representation learning and robust clinical prediction.

QAPNet: A Quantum-Attentive Patchwise Network for Robust Medical Image Classification Under Noisy Inputs
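The sliding-window patching and additive attention aggregation described in the QAPNet abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the projection `V`, and the scoring vector `w` are hypothetical, the quantum encoder is omitted entirely (embeddings are passed in directly), and the scoring form `w . tanh(V h)` is the common additive attention pooling used in multiple-instance learning, assumed here for illustration.

```python
import math


def sliding_patches(image, size, stride):
    """Return overlapping size x size patches from a 2D list, in row-major order."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            patches.append([row[c:c + size] for row in image[r:r + size]])
    return patches


def additive_attention_pool(embeddings, V, w):
    """Pool per-patch embeddings into one bag-level vector.

    Assumed scoring form: score_i = w . tanh(V h_i), weights = softmax(scores).
    Returns the weighted sum of embeddings and the attention weights.
    """
    scores = []
    for h in embeddings:
        hidden = [math.tanh(sum(v * x for v, x in zip(v_row, h))) for v_row in V]
        scores.append(sum(w_k * z for w_k, z in zip(w, hidden)))
    top = max(scores)  # subtract the max before exponentiating for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(embeddings[0])
    bag = [sum(a * h[d] for a, h in zip(weights, embeddings)) for d in range(dim)]
    return bag, weights
```

The returned weights sum to one, so they can be read as per-patch importance scores, which is the sense in which attention-based aggregation is usually called interpretable.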

Large language models (LLMs) exhibit strong generative capabilities and have shown great potential in code generation. Existing chain-of-thought (CoT) prompting methods enhance model reasoning by eliciting intermediate steps, but suffer from two major limitations. First, their uniform application tends to induce overthinking on simple tasks. Second, they lack intention abstraction in code generation, such as explicitly modeling core algorithmic design and efficiency, leading models to focus on surface-level structures while neglecting the global problem objective. Inspired by the cognitive economy principle of engaging structured reasoning only when necessary to conserve cognitive resources, we propose RoutingGen, a novel difficulty-aware routing framework that dynamically adapts prompting strategies for code generation. For simple tasks, it adopts few-shot prompting; for more complex ones, it invokes a structured reasoning strategy, termed Intention Chain-of-Thought (ICoT), which we introduce to guide the model in capturing task intention, such as the core algorithmic logic and its time complexity. Experiments across three models and six standard code generation benchmarks show that RoutingGen achieves state-of-the-art performance in most settings, while reducing total token usage by 46.37% on average across settings. Furthermore, ICoT outperforms six existing prompting baselines on challenging benchmarks.

Intention Chain-of-Thought Prompting with Dynamic Routing for Code Generation

Self-supervised monocular depth estimation methods lose substantial accuracy on dynamic objects because they assume a static scene.
Existing approaches for dynamic scenes suffer from two critical shortcomings: 1) reliance on supervised segmentation models (requiring costly annotations) or computationally intensive multi-branch models to isolate moving objects, and 2) simple integration of 2D/3D motion flow without reliable supervision for dynamic objects. 
We propose AdaDepth, a two‑stage framework that jointly performs unsupervised scene decomposition and dynamic-aware depth learning. In the initial structural stage, our geometry-motion joint scene decomposition (GMoDecomp) module ensures the robust generation of a depth prior and simultaneously partitions the scene into multiple regions through the fusion of geometric and motion cues. 
In the region-adaptive refinement stage, we exploit the depth prior and decomposed regions to introduce motion-aware and geometry-consistent constraints, effectively improving depth estimation in dynamic scenes. 
AdaDepth achieves accurate depth prediction in highly dynamic scenes without relying on external labels or specialized segmentation models. Extensive experiments on KITTI, Cityscapes, and Waymo Open demonstrate its superiority over state-of-the-art approaches.

AdaDepth: Exploiting Inherent Scene Information for Self-Supervised Depth Estimation in Dynamic Scenes

Reinforcement Fine-tuning (RFT) methods such as Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO) have demonstrated strong capabilities in aligning Large Language Models (LLMs) with human preferences. However, these approaches often suffer from limited data efficiency, necessitating extensive on-policy rollouts to maintain competitive performance. We propose PSPO (Prompt-Level Prioritization and Experience-Weighted Smoothing for Efficient Policy Optimization), a lightweight yet effective enhancement to GRPO that improves training stability and sample efficiency through two complementary techniques. First, we introduce an experience-weighted reward smoothing mechanism, which uses exponential moving averages to track group-level reward statistics for each prompt. This enables more stable advantage estimation across training steps without storing entire trajectories, allowing the model to capture historical reward trends in a lightweight and memory-efficient manner. Second, we adopt a prompt-level prioritized sampling strategy, an online data selection method inspired by prioritized experience replay. It dynamically emphasizes higher-impact prompts based on their relative advantages, thereby improving data efficiency. Experiments on multiple mathematical reasoning benchmarks and models show that PSPO achieves comparable or better accuracy than GRPO while significantly accelerating convergence and maintaining low computational and memory overhead.

PSPO: Prompt-Level Prioritization and Experience-Weighted Smoothing for Efficient Policy Optimization
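The two ideas named in the PSPO abstract, a per-prompt exponential moving average of group rewards and advantage-based prompt prioritization, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's algorithm: the class name, the `decay` value, the use of the smoothed mean as the advantage baseline, and the priority rule (mean absolute advantage plus a small floor) are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict


class PromptRewardTracker:
    """Keeps an exponential moving average (EMA) of group rewards per prompt."""

    def __init__(self, decay=0.9):
        self.decay = decay                        # hypothetical smoothing factor
        self.ema_mean = {}                        # prompt_id -> smoothed mean reward
        self.priority = defaultdict(lambda: 1.0)  # prompt_id -> sampling weight

    def update(self, prompt_id, rewards):
        """Blend this step's group mean into the EMA and return advantages."""
        group_mean = sum(rewards) / len(rewards)
        if prompt_id in self.ema_mean:
            old = self.ema_mean[prompt_id]
            baseline = self.decay * old + (1 - self.decay) * group_mean
        else:
            baseline = group_mean                 # first visit: no history yet
        self.ema_mean[prompt_id] = baseline
        advantages = [r - baseline for r in rewards]
        # Assumed priority rule: mean absolute advantage plus a small floor,
        # so prompts with zero advantage can still be sampled occasionally.
        mean_abs = sum(abs(a) for a in advantages) / len(advantages)
        self.priority[prompt_id] = mean_abs + 1e-3
        return advantages

    def sample(self, prompt_ids, k, rng=random):
        """Draw k prompts (with replacement) in proportion to their priority."""
        weights = [self.priority[p] for p in prompt_ids]
        return rng.choices(prompt_ids, weights=weights, k=k)
```

Only one scalar per prompt is stored for the baseline, which matches the abstract's point that reward history is tracked without keeping entire trajectories.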

While neural solvers have shown remarkable performance on Vehicle Routing Problems (VRPs), two key challenges persist. First, it remains difficult to determine which parts of the input graph are most critical for making optimal routing decisions during the decoding stage. Second, current neural models are typically trained on smaller problem instances (50-100 nodes), and their ability to generalize to large-scale scenarios is underexplored. To address these challenges, we introduce a novel U-Net architecture that captures multi-level information, enhancing the decision-making process in the decoder. Building on this, we propose a unified neural solver for a wide range of Vehicle Routing Problems. Our extensive experiments demonstrate the effectiveness of this framework on both small and large-scale problem instances, showcasing its superior performance and generalization capabilities.

Scale-Net: A Hierarchical U-Net Framework for Cross-Scale Generalization in Multi-Task Vehicle Routing

Recent advancements in large language models (LLMs) have greatly improved their ability to perform complex reasoning tasks through Long Chain-of-Thought (CoT). However, this approach often results in substantial redundancy, impairing computational efficiency and causing significant delays in real-time applications. To improve efficiency, current methods often rely on human-defined difficulty priors, which do not align with the LLM's self-perceived difficulty, leading to inefficiencies. In this paper, we introduce the Dynamic Reasoning-Boundary Self-Awareness Framework (DR. SAF), which enables LLMs to dynamically assess and adjust their reasoning depth in response to problem complexity. DR. SAF integrates three key components: Boundary Self-Awareness Alignment, Adaptive Reward Management, and a Boundary Preservation Mechanism. These components allow models to optimize their reasoning processes, balancing efficiency and accuracy without compromising performance. Our experimental results demonstrate that DR. SAF achieves a 49.27% reduction in total response tokens with minimal loss in accuracy. The framework also delivers a 6.59x gain in token efficiency and a 5x reduction in training time, making it well-suited to resource-limited settings. During extreme training, DR. SAF can even surpass traditional instruction-based models in token efficiency with more than 16% accuracy improvement.

Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Significant Gains in Reasoning Efficiency in Large Language Models

Pixel-level feature attributions are an important tool in eXplainable AI for Computer Vision (XCV), providing visual insights into how image features influence model predictions. The Owen formula for hierarchical Shapley values has been widely used to interpret machine learning (ML) models and their learned representations. However, existing hierarchical Shapley approaches do not exploit the multiscale structure of image data, leading to slow convergence and weak alignment with the actual morphological features. Moreover, no prior Shapley method has leveraged data-aware hierarchies for Computer Vision tasks, leaving a gap in model interpretability of structured visual data.

To address this, this paper introduces ShapBPT, a novel data-aware XCV method based on the hierarchical Shapley formula. 
ShapBPT assigns Shapley coefficients to a multiscale hierarchical structure tailored for images, the Binary Partition Tree (BPT). 
By using this data-aware hierarchical partitioning, ShapBPT ensures that feature attributions align with intrinsic image morphology, effectively prioritizing relevant regions while reducing computational overhead.
This advancement connects hierarchical Shapley methods with image data, providing a more efficient and semantically meaningful approach to visual interpretability. Experimental results confirm ShapBPT’s effectiveness, demonstrating superior alignment with image structures and improved efficiency over existing XCV methods, and a 20-subject user study confirms that ShapBPT explanations are preferred by humans.

ShapBPT: Image Feature Attributions Using Data-Aware Binary Partition Trees

Large Vision-Language Models (LVLMs) often suffer from object hallucination, making erroneous judgments about the presence of objects in images. We posit that this primarily stems from spurious correlations arising when models strongly associate highly co-occurring objects during training, leading to hallucinated objects influenced by visual context. Current benchmarks mainly focus on hallucination detection but lack a formal characterization and quantitative evaluation of spurious correlations in LVLMs. To address this, we introduce causal analysis into the object recognition scenario of LVLMs, establishing a Structural Causal Model (SCM). Utilizing the language of causality, we formally define spurious correlations arising from co-occurrence bias. To quantify the influence induced by these spurious correlations, we develop Causal-HalBench, a benchmark specifically constructed with counterfactual samples and integrated with comprehensive causal metrics designed to assess model robustness against spurious correlations. Concurrently, we propose an extensible pipeline for the construction of these counterfactual samples, leveraging the capabilities of proprietary LVLMs and Text-to-Image (T2I) models for their generation. Our evaluations on mainstream LVLMs using Causal-HalBench demonstrate that these models are susceptible to spurious correlations, albeit to varying extents.

Causal-HalBench: Uncovering LVLMs Object Hallucinations Through Causal Intervention

Missing graph attributes pose a significant challenge in graph representation learning. Some existing graph attribute completion methods adopt the shared-space hypothesis or employ end-to-end frameworks to perform single-attribute imputation. However, these models can only generate one single attribute with a few specific patterns that either adhere to prior knowledge or are optimal for downstream tasks, making it difficult to capture the full range of variations in the target attribute distribution. This limitation negatively impacts the model's generalizability and efficiency.

To address this issue, we propose a new method based on a graph denoising diffusion model, called **Multi-attribute Imputation Graph Denoising Diffusion Model (MIGDiff)**, which can generate multiple high-quality attributes. Specifically, it employs a **Dual-source Auto-encoder** on existing attributes and graph topology to extract reliable knowledge, which serves as a condition for training the diffusion module.

Within diffusion, noise is added to the structural embeddings of nodes without attributes in the forward process. In the reverse process, a **Structure-aware Denoising Network** is devised to integrate feature and structural information via an attention mechanism and to perform neighbor‑guided refinement based on graph connectivity, thereby enhancing denoising and accurately recovering missing attributes while effectively maintaining structural consistency and distributional fidelity.

During generation, multiple initial values are sampled to produce diverse attribute imputations, avoiding focusing on a few easy-to-learn patterns. Extensive experiments conducted on four public datasets highlight the state-of-the-art performance of MIGDiff in both attribute imputation and node classification tasks.
